[based on mm-unstable, 651c8c1d7359]
Optimize mseal checks by removing the separate can_modify_mm() step and instead checking the individual VMAs while the various operations are already iterating through the VMA tree. This provides a nice speedup and restores performance parity with pre-mseal kernels[3].
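Schematically, the change replaces a dedicated range walk with a per-VMA check done during the operation's own traversal (a simplified sketch; the real helpers are in the patches below):

/* Before: a dedicated walk over the affected range, only to look for VM_SEALED */
bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end)
{
	struct vm_area_struct *vma;

	VMA_ITERATOR(vmi, mm, start);

	for_each_vma_range(vmi, vma, end)
		if (vma->vm_flags & VM_SEALED)
			return false;

	return true;
}

/* After: the flag is tested on each VMA as munmap/mprotect/mremap/madvise
 * visit it anyway during their own iteration. */
static inline bool can_modify_vma(struct vm_area_struct *vma)
{
	return !(vma->vm_flags & VM_SEALED);
}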
This series also depends on the powerpc series that removes arch_unmap()[2], which is already in mm-unstable.
will-it-scale mmap1_process[1] -t 1 results:
Before (commit 3450fe2b574b4345e4296ccae395149e1a357fee):
min:277605 max:277605 total:277605
min:281784 max:281784 total:281784
min:277238 max:277238 total:277238
min:281761 max:281761 total:281761
min:274279 max:274279 total:274279
min:254854 max:254854 total:254854
measurement
min:269143 max:269143 total:269143
min:270454 max:270454 total:270454
min:243523 max:243523 total:243523
min:251148 max:251148 total:251148
min:209669 max:209669 total:209669
min:190426 max:190426 total:190426
min:231219 max:231219 total:231219
min:275364 max:275364 total:275364
min:266540 max:266540 total:266540
min:242572 max:242572 total:242572
min:284469 max:284469 total:284469
min:278882 max:278882 total:278882
min:283269 max:283269 total:283269
min:281204 max:281204 total:281204
After this patch set:
min:280580 max:280580 total:280580
min:290514 max:290514 total:290514
min:291006 max:291006 total:291006
min:290352 max:290352 total:290352
min:294582 max:294582 total:294582
min:293075 max:293075 total:293075
measurement
min:295613 max:295613 total:295613
min:294070 max:294070 total:294070
min:293193 max:293193 total:293193
min:291631 max:291631 total:291631
min:295278 max:295278 total:295278
min:293782 max:293782 total:293782
min:290361 max:290361 total:290361
min:294517 max:294517 total:294517
min:293750 max:293750 total:293750
min:293572 max:293572 total:293572
min:295239 max:295239 total:295239
min:292932 max:292932 total:292932
min:293319 max:293319 total:293319
min:294954 max:294954 total:294954
This was a completely unscientific test, but it seems to show gains of around 5-10% in ops per second.
Oliver performed his own tests and showed[3] a similar ~5% gain in them.
[1]: mmap1_process does mmap and munmap in a loop. I didn't bother testing multithreading cases.
[2]: https://lore.kernel.org/all/20240807124103.85644-1-mpe@ellerman.id.au/
[3]: https://lore.kernel.org/all/ZrMMJfe9aXSWxJz6@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/all/202408041602.caa0372-oliver.sang@intel.com/
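For context, the mmap1_process workload mentioned in [1] boils down to a loop like the following (an illustrative sketch, not the actual will-it-scale source; the 128 MiB size is an assumption):

#include <sys/mman.h>
#include <stdlib.h>

#define MEMSIZE (128 * 1024 * 1024)

int main(void)
{
	for (;;) {
		/* map and immediately unmap a chunk of anonymous memory */
		void *p = mmap(NULL, MEMSIZE, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			abort();
		munmap(p, MEMSIZE);
	}
}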
Changes in v3:
- Moved can_modify_vma to vma.h instead of internal.h (Lorenzo)
- Fixed a bug in munmap where we used the wrong VMA pointer
- Added tests for the previous munmap bug
- Moved the mremap source vma check upwards, to stop us from unmapping dest while the source is sealed (Liam)

Changes in v2:
- Rebased on top of mm-unstable
- Removed a superfluous check in mremap (Jeff Xu)
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
Pedro Falcato (7):
      mm: Move can_modify_vma to mm/vma.h
      mm/munmap: Replace can_modify_mm with can_modify_vma
      mm/mprotect: Replace can_modify_mm with can_modify_vma
      mm/mremap: Replace can_modify_mm with can_modify_vma
      mseal: Replace can_modify_mm_madv with a vma variant
      mm: Remove can_modify_mm()
      selftests/mm: add more mseal traversal tests
 mm/internal.h                           |  16 -----
 mm/madvise.c                            |  13 +---
 mm/mmap.c                               |  11 +---
 mm/mprotect.c                           |  12 +---
 mm/mremap.c                             |  32 ++-------
 mm/mseal.c                              |  55 ++--------------
 mm/vma.c                                |  19 ++++--
 mm/vma.h                                |  35 ++++++++++
 tools/testing/selftests/mm/mseal_test.c | 111 +++++++++++++++++++++++++++++++-
 9 files changed, 174 insertions(+), 130 deletions(-)
---
base-commit: 651c8c1d735983040bec4f71d0e2e690f3c0fc2d
change-id: 20240816-mseal-depessimize-f39d9f4c32c6
Best regards,
Move can_modify_vma to vma.h so it can be inlined properly (with the intent to remove can_modify_mm callsites).
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
 mm/mseal.c | 17 -----------------
 mm/vma.h   | 28 ++++++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 17 deletions(-)
diff --git a/mm/mseal.c b/mm/mseal.c
index 15bba28acc00..2170e2139ca0 100644
--- a/mm/mseal.c
+++ b/mm/mseal.c
@@ -16,28 +16,11 @@
 #include <linux/sched.h>
 #include "internal.h"
 
-static inline bool vma_is_sealed(struct vm_area_struct *vma)
-{
-	return (vma->vm_flags & VM_SEALED);
-}
-
 static inline void set_vma_sealed(struct vm_area_struct *vma)
 {
 	vm_flags_set(vma, VM_SEALED);
 }
 
-/*
- * check if a vma is sealed for modification.
- * return true, if modification is allowed.
- */
-static bool can_modify_vma(struct vm_area_struct *vma)
-{
-	if (unlikely(vma_is_sealed(vma)))
-		return false;
-
-	return true;
-}
-
 static bool is_madv_discard(int behavior)
 {
 	switch (behavior) {
diff --git a/mm/vma.h b/mm/vma.h
index 6efdf1768a0a..e979015cc7fc 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -361,4 +361,32 @@ struct vm_area_struct *vma_iter_prev_range(struct vma_iterator *vmi)
 	return mas_prev_range(&vmi->mas, 0);
 }
 
+#ifdef CONFIG_64BIT
+
+static inline bool vma_is_sealed(struct vm_area_struct *vma)
+{
+	return (vma->vm_flags & VM_SEALED);
+}
+
+/*
+ * check if a vma is sealed for modification.
+ * return true, if modification is allowed.
+ */
+static inline bool can_modify_vma(struct vm_area_struct *vma)
+{
+	if (unlikely(vma_is_sealed(vma)))
+		return false;
+
+	return true;
+}
+
+#else
+
+static inline bool can_modify_vma(struct vm_area_struct *vma)
+{
+	return true;
+}
+
+#endif
+
 #endif /* __MM_VMA_H */
* Pedro Falcato pedro.falcato@gmail.com [240816 20:18]:
Move can_modify_vma to vma.h so it can be inlined properly (with the intent to remove can_modify_mm callsites).
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
mm/mseal.c | 17 ----------------- mm/vma.h | 28 ++++++++++++++++++++++++++++ 2 files changed, 28 insertions(+), 17 deletions(-)
diff --git a/mm/mseal.c b/mm/mseal.c index 15bba28acc00..2170e2139ca0 100644 --- a/mm/mseal.c +++ b/mm/mseal.c @@ -16,28 +16,11 @@ #include <linux/sched.h> #include "internal.h" -static inline bool vma_is_sealed(struct vm_area_struct *vma) -{
- return (vma->vm_flags & VM_SEALED);
-}
static inline void set_vma_sealed(struct vm_area_struct *vma) { vm_flags_set(vma, VM_SEALED); } -/*
- check if a vma is sealed for modification.
- return true, if modification is allowed.
- */
-static bool can_modify_vma(struct vm_area_struct *vma) -{
- if (unlikely(vma_is_sealed(vma)))
return false;
- return true;
-}
static bool is_madv_discard(int behavior) { switch (behavior) { diff --git a/mm/vma.h b/mm/vma.h index 6efdf1768a0a..e979015cc7fc 100644 --- a/mm/vma.h +++ b/mm/vma.h @@ -361,4 +361,32 @@ struct vm_area_struct *vma_iter_prev_range(struct vma_iterator *vmi) return mas_prev_range(&vmi->mas, 0); } +#ifdef CONFIG_64BIT
+static inline bool vma_is_sealed(struct vm_area_struct *vma) +{
- return (vma->vm_flags & VM_SEALED);
+}
If you respin, I'd support dropping this entirely as it seems unnecessary.
Either way, Reviewed-by: Liam R. Howlett Liam.Howlett@Oracle.com
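(For illustration — the suggested simplification would presumably be to drop the vma_is_sealed() wrapper and test the flag directly, roughly as below. This is an editorial sketch of the suggestion, not code from the thread:)

static inline bool can_modify_vma(struct vm_area_struct *vma)
{
	return !(vma->vm_flags & VM_SEALED);
}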
On Mon, Aug 19, 2024 at 9:15 PM Liam R. Howlett Liam.Howlett@oracle.com wrote:
If you respin, I'd support dropping this entirely as it seems unnecessary.
ACK, I'll fold this into the next patch if the need for v4 arises.
Either way, Reviewed-by: Liam R. Howlett Liam.Howlett@Oracle.com
Thank you for the speedy review(s)!
On Sat, Aug 17, 2024 at 01:18:28AM GMT, Pedro Falcato wrote:
Move can_modify_vma to vma.h so it can be inlined properly (with the intent to remove can_modify_mm callsites).
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
Thanks for moving to vma.h rather than internal.h! Definitely seems to me to be the correct place for it.
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
We were doing an extra mmap tree traversal just to check if the entire range is modifiable. This can be done when we iterate through the VMAs instead.
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
 mm/mmap.c | 11 +----------
 mm/vma.c  | 19 ++++++++++++-------
 2 files changed, 13 insertions(+), 17 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 3af256bacef3..30ae4cb5cec9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1740,16 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		unsigned long start, unsigned long end, struct list_head *uf,
 		bool unlock)
 {
-	struct mm_struct *mm = vma->vm_mm;
-
-	/*
-	 * Check if memory is sealed, prevent unmapping a sealed VMA.
-	 * can_modify_mm assumes we have acquired the lock on MM.
-	 */
-	if (unlikely(!can_modify_mm(mm, start, end)))
-		return -EPERM;
-
-	return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
+	return do_vmi_align_munmap(vmi, vma, vma->vm_mm, start, end, uf, unlock);
 }
 
 /*
diff --git a/mm/vma.c b/mm/vma.c
index 84965f2cd580..5850f7c0949b 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -712,6 +712,12 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
 			goto map_count_exceeded;
 
+		/* Don't bother splitting the VMA if we can't unmap it anyway */
+		if (!can_modify_vma(vma)) {
+			error = -EPERM;
+			goto start_split_failed;
+		}
+
 		error = __split_vma(vmi, vma, start, 1);
 		if (error)
 			goto start_split_failed;
@@ -723,6 +729,11 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	 */
 	next = vma;
 	do {
+		if (!can_modify_vma(next)) {
+			error = -EPERM;
+			goto modify_vma_failed;
+		}
+
 		/* Does it split the end? */
 		if (next->vm_end > end) {
 			error = __split_vma(vmi, next, end, 0);
@@ -815,6 +826,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	__mt_destroy(&mt_detach);
 	return 0;
 
+modify_vma_failed:
 clear_tree_failed:
 userfaultfd_error:
 munmap_gather_failed:
@@ -860,13 +872,6 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
 	if (end == start)
 		return -EINVAL;
 
-	/*
-	 * Check if memory is sealed, prevent unmapping a sealed VMA.
-	 * can_modify_mm assumes we have acquired the lock on MM.
-	 */
-	if (unlikely(!can_modify_mm(mm, start, end)))
-		return -EPERM;
-
 	/* Find the first overlapping VMA */
 	vma = vma_find(vmi, end);
 	if (!vma) {
* Pedro Falcato pedro.falcato@gmail.com [240816 20:18]:
We were doing an extra mmap tree traversal just to check if the entire range is modifiable. This can be done when we iterate through the VMAs instead.
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
Although this has the side effect of potentially splitting the first vma if it is not mseal()'ed, there really is no risk here since someone making an invalid call that splits the vma for whatever reason can modify the bad call to achieve the same split. That is, this doesn't help or hinder an attacker.
Reviewed-by: Liam R. Howlett Liam.Howlett@Oracle.com
On Sat, Aug 17, 2024 at 01:18:29AM GMT, Pedro Falcato wrote:
We were doing an extra mmap tree traversal just to check if the entire range is modifiable. This can be done when we iterate through the VMAs instead.
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
mm/mmap.c | 11 +---------- mm/vma.c | 19 ++++++++++++------- 2 files changed, 13 insertions(+), 17 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c index 3af256bacef3..30ae4cb5cec9 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1740,16 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long start, unsigned long end, struct list_head *uf, bool unlock) {
- struct mm_struct *mm = vma->vm_mm;
- /*
* Check if memory is sealed, prevent unmapping a sealed VMA.
* can_modify_mm assumes we have acquired the lock on MM.
*/
- if (unlikely(!can_modify_mm(mm, start, end)))
return -EPERM;
- return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
- return do_vmi_align_munmap(vmi, vma, vma->vm_mm, start, end, uf, unlock);
}
Oh I like this. Want more mm/mmap.c stuff to look like this, abstracting actual functionality to mm/vma.c...
/* diff --git a/mm/vma.c b/mm/vma.c index 84965f2cd580..5850f7c0949b 100644 --- a/mm/vma.c +++ b/mm/vma.c @@ -712,6 +712,12 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count) goto map_count_exceeded;
/* Don't bother splitting the VMA if we can't unmap it anyway */
if (!can_modify_vma(vma)) {
error = -EPERM;
goto start_split_failed;
}
- error = __split_vma(vmi, vma, start, 1); if (error) goto start_split_failed;
@@ -723,6 +729,11 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, */ next = vma; do {
if (!can_modify_vma(next)) {
error = -EPERM;
goto modify_vma_failed;
}
- /* Does it split the end? */ if (next->vm_end > end) { error = __split_vma(vmi, next, end, 0);
@@ -815,6 +826,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, __mt_destroy(&mt_detach); return 0;
+modify_vma_failed: clear_tree_failed: userfaultfd_error: munmap_gather_failed: @@ -860,13 +872,6 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, if (end == start) return -EINVAL;
- /*
* Check if memory is sealed, prevent unmapping a sealed VMA.
* can_modify_mm assumes we have acquired the lock on MM.
*/
- if (unlikely(!can_modify_mm(mm, start, end)))
return -EPERM;
This means we will arch_unmap() first, before realising we can't unmap; however, there are a number of other error conditions that would cause a similar outcome in do_vmi_align_munmap(), so I don't think that's a problem.
LGTM, Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato pedro.falcato@gmail.com wrote:
We were doing an extra mmap tree traversal just to check if the entire range is modifiable. This can be done when we iterate through the VMAs instead.
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
mm/mmap.c | 11 +---------- mm/vma.c | 19 ++++++++++++------- 2 files changed, 13 insertions(+), 17 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c index 3af256bacef3..30ae4cb5cec9 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1740,16 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long start, unsigned long end, struct list_head *uf, bool unlock) {
struct mm_struct *mm = vma->vm_mm;
/*
* Check if memory is sealed, prevent unmapping a sealed VMA.
* can_modify_mm assumes we have acquired the lock on MM.
*/
if (unlikely(!can_modify_mm(mm, start, end)))
return -EPERM;
Another approach to improve perf is to clone the vmi (since it already point to the first vma), and pass the cloned vmi/vma into can_modify_mm check, that will remove the cost of re-finding the first VMA.
The can_modify_mm then continues from cloned VMI/vma till the end of address range, there will be some perf cost there. However, most address ranges in the real world are within a single VMA, in practice, the perf cost is the same as checking the single VMA, 99.9% case.
This will help preserve the nice sealing feature (if one of the vma is sealed, the entire address range is not modified)
On Wed, Aug 21, 2024 at 5:16 PM Jeff Xu jeffxu@chromium.org wrote:
Another approach to improve perf is to clone the vmi (since it already point to the first vma), and pass the cloned vmi/vma into can_modify_mm check, that will remove the cost of re-finding the first VMA.
The can_modify_mm then continues from cloned VMI/vma till the end of address range, there will be some perf cost there. However, most address ranges in the real world are within a single VMA, in practice, the perf cost is the same as checking the single VMA, 99.9% case.
This will help preserve the nice sealing feature (if one of the vma is sealed, the entire address range is not modified)
Please drop it. No one wants to preserve this. Everyone is in sync when it comes to the solution except you.
On Wed, Aug 21, 2024 at 9:24 AM Pedro Falcato pedro.falcato@gmail.com wrote:
On Wed, Aug 21, 2024 at 5:16 PM Jeff Xu jeffxu@chromium.org wrote:
Another approach to improve perf is to clone the vmi (since it already point to the first vma), and pass the cloned vmi/vma into can_modify_mm check, that will remove the cost of re-finding the first VMA.
The can_modify_mm then continues from cloned VMI/vma till the end of address range, there will be some perf cost there. However, most address ranges in the real world are within a single VMA, in practice, the perf cost is the same as checking the single VMA, 99.9% case.
This will help preserve the nice sealing feature (if one of the vma is sealed, the entire address range is not modified)
Please drop it. No one wants to preserve this. Everyone is in sync when it comes to the solution except you.
Still, this is another option that will very likely address the perf issue.
-Jeff
-- Pedro
On Wed, Aug 21, 2024 at 09:33:06AM GMT, Jeff Xu wrote:
On Wed, Aug 21, 2024 at 9:24 AM Pedro Falcato pedro.falcato@gmail.com wrote:
On Wed, Aug 21, 2024 at 5:16 PM Jeff Xu jeffxu@chromium.org wrote:
Another approach to improve perf is to clone the vmi (since it already point to the first vma), and pass the cloned vmi/vma into can_modify_mm check, that will remove the cost of re-finding the first VMA.
The can_modify_mm then continues from cloned VMI/vma till the end of address range, there will be some perf cost there. However, most address ranges in the real world are within a single VMA, in practice, the perf cost is the same as checking the single VMA, 99.9% case.
This will help preserve the nice sealing feature (if one of the vma is sealed, the entire address range is not modified)
Please drop it. No one wants to preserve this. Everyone is in sync when it comes to the solution except you.
Still, this is another option that will very likely address the perf issue.
Nack to your approach. Feel free to send a follow up series replacing Pedro's with yours for review if you feel differently, and stop stalling things. Thanks.
* Jeff Xu jeffxu@chromium.org [240821 12:33]:
On Wed, Aug 21, 2024 at 9:24 AM Pedro Falcato pedro.falcato@gmail.com wrote:
On Wed, Aug 21, 2024 at 5:16 PM Jeff Xu jeffxu@chromium.org wrote:
Another approach to improve perf is to clone the vmi (since it already point to the first vma), and pass the cloned vmi/vma into can_modify_mm check, that will remove the cost of re-finding the first VMA.
The can_modify_mm then continues from cloned VMI/vma till the end of address range, there will be some perf cost there. However, most address ranges in the real world are within a single VMA, in practice, the perf cost is the same as checking the single VMA, 99.9% case.
This will help preserve the nice sealing feature (if one of the vma is sealed, the entire address range is not modified)
Please drop it. No one wants to preserve this. Everyone is in sync when it comes to the solution except you.
Still, this is another option that will very likely address the perf issue.
The cost of cloning the vmi is a memory copy, while the cost of not cloning the vmi is a re-walk of the tree. Neither of these are free.
Both can be avoided by checking the vma flag during the existing loop, which is what is done in this patch set. This is obviously lower cost of either of the above options since they do extra work and still have to check the vma flag(s).
I think you are confusing the behaviour of the munmap() call when you state 'partial success' with a potential split operation that may happen prior to aborting a munmap() call.
What could happen in the failure scenario is that you'd end up with two vmas instead of one mapping a particular area - but the mseal flag is checked prior to allowing a split to happen, so it'll only split non-mseal()'ed vmas.
So what mseal() used to avoid is a call that could potentially split a vma that isn't mseal()'ed, while this patch will allow it to be split. This is how it has been for a very long time and it's okay.
Considering how this affects the security model of mseal(), it means the attacker could still accomplish the same feat of splitting that first vma by altering the call to munmap() to avoid an mseal()'ed vma, so there isn't much lost or gained here security wise - if any.
Thanks, Liam
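(To make the user-visible munmap() semantics discussed above concrete — a small userspace sketch, assuming a 64-bit kernel with mseal() support and a libc that exposes the mseal() wrapper; otherwise syscall(__NR_mseal, ...) would be needed:)

#include <sys/mman.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, 2 * psz, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* Seal only the second page; this splits the mapping into two VMAs. */
	if (mseal(p + psz, psz, 0))
		perror("mseal");

	/* The range covers a sealed VMA, so the whole munmap() is refused... */
	if (munmap(p, 2 * psz))
		perror("munmap");	/* expected: EPERM */

	/* ...and both pages are still mapped and writable afterwards. */
	p[0] = 1;
	p[psz] = 1;
	return 0;
}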
On Wed, Aug 21, 2024 at 09:15:52AM GMT, Jeff Xu wrote:
On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato pedro.falcato@gmail.com wrote:
We were doing an extra mmap tree traversal just to check if the entire range is modifiable. This can be done when we iterate through the VMAs instead.
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
mm/mmap.c | 11 +---------- mm/vma.c | 19 ++++++++++++------- 2 files changed, 13 insertions(+), 17 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c index 3af256bacef3..30ae4cb5cec9 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1740,16 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long start, unsigned long end, struct list_head *uf, bool unlock) {
struct mm_struct *mm = vma->vm_mm;
/*
* Check if memory is sealed, prevent unmapping a sealed VMA.
* can_modify_mm assumes we have acquired the lock on MM.
*/
if (unlikely(!can_modify_mm(mm, start, end)))
return -EPERM;
Another approach to improve perf is to clone the vmi (since it already point to the first vma), and pass the cloned vmi/vma into can_modify_mm check, that will remove the cost of re-finding the first VMA.
The can_modify_mm then continues from cloned VMI/vma till the end of address range, there will be some perf cost there. However, most address ranges in the real world are within a single VMA, in practice, the perf cost is the same as checking the single VMA, 99.9% case.
This will help preserve the nice sealing feature (if one of the vma is sealed, the entire address range is not modified)
This is tantamount to saying 'why not drop the entire series and try a totally different approach?' and is frankly a little rude and unprofessional to put as a comment midway through a series like this.
I don't agree this sealed property is 'nice', every single other form of failure/inability to perform operations on a VMA is permitted to result in partial operations and an error code.
If a VMA is sealed and causes an operation to fail, that's fine. We run on arguments and facts in the kernel, not feelings.
Please provide this kind of generalised critique at the 0/7 patch. And try to be vaguely civil.
return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
return do_vmi_align_munmap(vmi, vma, vma->vm_mm, start, end, uf, unlock);
}
/* diff --git a/mm/vma.c b/mm/vma.c index 84965f2cd580..5850f7c0949b 100644 --- a/mm/vma.c +++ b/mm/vma.c @@ -712,6 +712,12 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count) goto map_count_exceeded;
/* Don't bother splitting the VMA if we can't unmap it anyway */
if (!can_modify_vma(vma)) {
error = -EPERM;
goto start_split_failed;
}
error = __split_vma(vmi, vma, start, 1); if (error) goto start_split_failed;
@@ -723,6 +729,11 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, */ next = vma; do {
if (!can_modify_vma(next)) {
error = -EPERM;
goto modify_vma_failed;
}
/* Does it split the end? */ if (next->vm_end > end) { error = __split_vma(vmi, next, end, 0);
@@ -815,6 +826,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, __mt_destroy(&mt_detach); return 0;
+modify_vma_failed: clear_tree_failed: userfaultfd_error: munmap_gather_failed: @@ -860,13 +872,6 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, if (end == start) return -EINVAL;
/*
* Check if memory is sealed, prevent unmapping a sealed VMA.
* can_modify_mm assumes we have acquired the lock on MM.
*/
if (unlikely(!can_modify_mm(mm, start, end)))
return -EPERM;
/* Find the first overlapping VMA */ vma = vma_find(vmi, end); if (!vma) {
-- 2.46.0
Avoid taking an extra trip down the mmap tree by checking the vmas directly. mprotect (per POSIX) tolerates partial failure.
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
 mm/mprotect.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 446f8e5f10d9..0c5d6d06107d 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -611,6 +611,9 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
 	unsigned long charged = 0;
 	int error;
 
+	if (!can_modify_vma(vma))
+		return -EPERM;
+
 	if (newflags == oldflags) {
 		*pprev = vma;
 		return 0;
@@ -769,15 +772,6 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 		}
 	}
 
-	/*
-	 * checking if memory is sealed.
-	 * can_modify_mm assumes we have acquired the lock on MM.
-	 */
-	if (unlikely(!can_modify_mm(current->mm, start, end))) {
-		error = -EPERM;
-		goto out;
-	}
-
 	prev = vma_prev(&vmi);
 	if (start > vma->vm_start)
 		prev = vma;
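(Regarding the "tolerates partial failure" wording above, the user-visible effect is roughly the following — a hypothetical userspace sketch, assuming mseal() support:)

#include <sys/mman.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, 2 * psz, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* Seal the second page only. */
	mseal(p + psz, psz, 0);

	/*
	 * The first (unsealed) VMA may already have been downgraded to
	 * PROT_READ by the time the sealed one is hit and -EPERM is returned.
	 */
	if (mprotect(p, 2 * psz, PROT_READ))
		perror("mprotect");	/* expected: EPERM */
	return 0;
}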
* Pedro Falcato pedro.falcato@gmail.com [240816 20:19]:
Avoid taking an extra trip down the mmap tree by checking the vmas directly. mprotect (per POSIX) tolerates partial failure.
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
Reviewed-by: Liam R. Howlett Liam.Howlett@Oracle.com
On Sat, Aug 17, 2024 at 01:18:30AM GMT, Pedro Falcato wrote:
Avoid taking an extra trip down the mmap tree by checking the vmas directly. mprotect (per POSIX) tolerates partial failure.
Pretty sure this also applies to any such mXXX() operation, though I haven't read the formalised POSIX spec. But in practice, this is how it is :)
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
mm/mprotect.c | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c index 446f8e5f10d9..0c5d6d06107d 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -611,6 +611,9 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb, unsigned long charged = 0; int error;
- if (!can_modify_vma(vma))
return -EPERM;
I'm glad to get rid of the unlikely() too, imo these should _only_ be added based on actual data to back them up rather than because the programmer instinctively 'feels' that something is unlikely from the compiler's point of view.
if (newflags == oldflags) { *pprev = vma; return 0; @@ -769,15 +772,6 @@ static int do_mprotect_pkey(unsigned long start, size_t len, } }
- /*
* checking if memory is sealed.
* can_modify_mm assumes we have acquired the lock on MM.
*/
- if (unlikely(!can_modify_mm(current->mm, start, end))) {
error = -EPERM;
goto out;
- }
This will allow the vm_ops->mprotect() caller to run on the vma before initiating the mprotect() fixup, a quick survey suggests that sgx uses this to see if mprotect() should be permitted in sgx_vma_mprotect() (so fine), and um uses it to actually do an mprotect() call on host memory (honestly fine too).
Looking at the struct vm_operations_struct declaration I see:
	/*
	 * Called by mprotect() to make driver-specific permission
	 * checks before mprotect() is finalised. The VMA must not
	 * be modified.  Returns 0 if mprotect() can proceed.
	 */
	int (*mprotect)(struct vm_area_struct *vma, unsigned long start,
			unsigned long end, unsigned long newflags);
Which explicitly says DO NOT MODIFY THE VMA.
So we're good.
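(A hypothetical driver-side hook, just to illustrate the contract quoted above — the callback only gets to veto the operation and must leave the VMA untouched; mydrv_* names are made up:)

static int mydrv_mprotect(struct vm_area_struct *vma, unsigned long start,
			  unsigned long end, unsigned long newflags)
{
	/* e.g. never allow the mapping to become executable */
	if (newflags & VM_EXEC)
		return -EACCES;

	return 0;	/* let the generic mprotect fixup proceed */
}

static const struct vm_operations_struct mydrv_vm_ops = {
	.mprotect = mydrv_mprotect,
};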
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
Delegate all can_modify checks to the proper places. Unmap checks are done in do_munmap() (et al). The source VMA check is done purposefully before unmapping, to keep the original mseal semantics.
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
 mm/mremap.c | 32 ++++++--------------------------
 1 file changed, 6 insertions(+), 26 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index e7ae140fc640..24712f8dbb6b 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -902,19 +902,6 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 	if ((mm->map_count + 2) >= sysctl_max_map_count - 3)
 		return -ENOMEM;
 
-	/*
-	 * In mremap_to().
-	 * Move a VMA to another location, check if src addr is sealed.
-	 *
-	 * Place can_modify_mm here because mremap_to()
-	 * does its own checking for address range, and we only
-	 * check the sealing after passing those checks.
-	 *
-	 * can_modify_mm assumes we have acquired the lock on MM.
-	 */
-	if (unlikely(!can_modify_mm(mm, addr, addr + old_len)))
-		return -EPERM;
-
 	if (flags & MREMAP_FIXED) {
 		/*
 		 * In mremap_to().
@@ -1052,6 +1039,12 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 		goto out;
 	}
 
+	/* Don't allow remapping vmas when they have already been sealed */
+	if (!can_modify_vma(vma)) {
+		ret = -EPERM;
+		goto out;
+	}
+
 	if (is_vm_hugetlb_page(vma)) {
 		struct hstate *h __maybe_unused = hstate_vma(vma);
 
@@ -1079,19 +1072,6 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 		goto out;
 	}
 
-	/*
-	 * Below is shrink/expand case (not mremap_to())
-	 * Check if src address is sealed, if so, reject.
-	 * In other words, prevent shrinking or expanding a sealed VMA.
-	 *
-	 * Place can_modify_mm here so we can keep the logic related to
-	 * shrink/expand together.
-	 */
-	if (unlikely(!can_modify_mm(mm, addr, addr + old_len))) {
-		ret = -EPERM;
-		goto out;
-	}
-
 	/*
 	 * Always allow a shrinking remap: that just unmaps
 	 * the unnecessary pages..
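(For illustration of the semantics kept here: mremap() on a sealed source fails up front with EPERM, before either the source or a MREMAP_FIXED destination gets unmapped. A hypothetical userspace sketch, assuming mseal() support:)

#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, psz, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	mseal(p, psz, 0);

	/* Growing, shrinking or moving a sealed mapping is rejected. */
	if (mremap(p, psz, 2 * psz, MREMAP_MAYMOVE) == MAP_FAILED)
		perror("mremap");	/* expected: EPERM */
	return 0;
}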
* Pedro Falcato pedro.falcato@gmail.com [240816 20:19]:
Delegate all can_modify checks to the proper places. Unmap checks are done in do_unmap (et al). The source VMA check is done purposefully before unmapping, to keep the original mseal semantics.
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
Reviewed-by: Liam R. Howlett Liam.Howlett@Oracle.com
On Sat, Aug 17, 2024 at 01:18:31AM GMT, Pedro Falcato wrote:
Delegate all can_modify checks to the proper places. Unmap checks are done in do_unmap (et al). The source VMA check is done purposefully before unmapping, to keep the original mseal semantics.
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
mm/mremap.c | 32 ++++++-------------------------- 1 file changed, 6 insertions(+), 26 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c index e7ae140fc640..24712f8dbb6b 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -902,19 +902,6 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, if ((mm->map_count + 2) >= sysctl_max_map_count - 3) return -ENOMEM;
- /*
* In mremap_to().
* Move a VMA to another location, check if src addr is sealed.
*
* Place can_modify_mm here because mremap_to()
* does its own checking for address range, and we only
* check the sealing after passing those checks.
*
* can_modify_mm assumes we have acquired the lock on MM.
*/
- if (unlikely(!can_modify_mm(mm, addr, addr + old_len)))
return -EPERM;
I'm honestly confused as to why the original implementation felt it necessary to split the checks. I guess for the purposes of efficiency? But doesn't seem efficient to me.
if (flags & MREMAP_FIXED) { /* * In mremap_to(). @@ -1052,6 +1039,12 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, goto out; }
- /* Don't allow remapping vmas when they have already been sealed */
- if (!can_modify_vma(vma)) {
ret = -EPERM;
goto out;
- }
This is much better, and having it be a VMA check is so obviously correct here. Again confused as to why this implemented at an mm granularity anyway...
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
Replace can_modify_mm_madv() with a single vma variant, and associated checks in madvise.
While we're at it, also invert the order of checks in: if (unlikely(is_ro_anon(vma) && !can_modify_vma(vma))
Checking if we can modify the vma itself (through vm_flags) is certainly cheaper than is_ro_anon() due to arch_vma_access_permitted() looking at e.g pkeys registers (with extra branches) in some architectures.
This patch allows for partial madvise success when finding a sealed VMA, which historically has been allowed in Linux.
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
 mm/internal.h |  2 --
 mm/madvise.c  | 13 +++----------
 mm/mseal.c    | 17 ++++-------------
 mm/vma.h      |  7 +++++++
 4 files changed, 14 insertions(+), 25 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index ca422aede342..1db320650539 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1363,8 +1363,6 @@ static inline int can_do_mseal(unsigned long flags)
 
 bool can_modify_mm(struct mm_struct *mm, unsigned long start,
 		unsigned long end);
-bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
-		unsigned long end, int behavior);
 #else
 static inline int can_do_mseal(unsigned long flags)
 {
diff --git a/mm/madvise.c b/mm/madvise.c
index 89089d84f8df..4e64770be16c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1031,6 +1031,9 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
 	struct anon_vma_name *anon_name;
 	unsigned long new_flags = vma->vm_flags;
 
+	if (unlikely(!can_modify_vma_madv(vma, behavior)))
+		return -EPERM;
+
 	switch (behavior) {
 	case MADV_REMOVE:
 		return madvise_remove(vma, prev, start, end);
@@ -1448,15 +1451,6 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 	start = untagged_addr_remote(mm, start);
 	end = start + len;
 
-	/*
-	 * Check if the address range is sealed for do_madvise().
-	 * can_modify_mm_madv assumes we have acquired the lock on MM.
-	 */
-	if (unlikely(!can_modify_mm_madv(mm, start, end, behavior))) {
-		error = -EPERM;
-		goto out;
-	}
-
 	blk_start_plug(&plug);
 	switch (behavior) {
 	case MADV_POPULATE_READ:
@@ -1470,7 +1464,6 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 	}
 	blk_finish_plug(&plug);
 
-out:
 	if (write)
 		mmap_write_unlock(mm);
 	else
diff --git a/mm/mseal.c b/mm/mseal.c
index 2170e2139ca0..fdd1666344fa 100644
--- a/mm/mseal.c
+++ b/mm/mseal.c
@@ -75,24 +75,15 @@ bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end)
 }
 
 /*
- * Check if the vmas of a memory range are allowed to be modified by madvise.
- * the memory ranger can have a gap (unallocated memory).
- * return true, if it is allowed.
+ * Check if a vma is allowed to be modified by madvise.
  */
-bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, unsigned long end,
-		int behavior)
+bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior)
 {
-	struct vm_area_struct *vma;
-
-	VMA_ITERATOR(vmi, mm, start);
-
 	if (!is_madv_discard(behavior))
 		return true;
 
-	/* going through each vma to check. */
-	for_each_vma_range(vmi, vma, end)
-		if (unlikely(is_ro_anon(vma) && !can_modify_vma(vma)))
-			return false;
+	if (unlikely(!can_modify_vma(vma) && is_ro_anon(vma)))
+		return false;
 
 	/* Allow by default. */
 	return true;
diff --git a/mm/vma.h b/mm/vma.h
index e979015cc7fc..da31d0f62157 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -380,6 +380,8 @@ static inline bool can_modify_vma(struct vm_area_struct *vma)
 	return true;
 }
 
+bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior);
+
 #else
 
 static inline bool can_modify_vma(struct vm_area_struct *vma)
@@ -387,6 +389,11 @@ static inline bool can_modify_vma(struct vm_area_struct *vma)
 	return true;
 }
 
+static inline bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior)
+{
+	return true;
+}
+
 #endif
 
 #endif /* __MM_VMA_H */
* Pedro Falcato pedro.falcato@gmail.com [240816 20:18]:
Replace can_modify_mm_madv() with a single vma variant, and associated checks in madvise.
While we're at it, also invert the order of checks in: if (unlikely(is_ro_anon(vma) && !can_modify_vma(vma))
Checking if we can modify the vma itself (through vm_flags) is certainly cheaper than is_ro_anon() due to arch_vma_access_permitted() looking at e.g pkeys registers (with extra branches) in some architectures.
This patch allows for partial madvise success when finding a sealed VMA, which historically has been allowed in Linux.
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
Reviewed-by: Liam R. Howlett Liam.Howlett@Oracle.com
On Sat, Aug 17, 2024 at 01:18:32AM GMT, Pedro Falcato wrote:
Replace can_modify_mm_madv() with a single vma variant, and associated checks in madvise.
While we're at it, also invert the order of checks in: if (unlikely(is_ro_anon(vma) && !can_modify_vma(vma))
Checking if we can modify the vma itself (through vm_flags) is certainly cheaper than is_ro_anon() due to arch_vma_access_permitted() looking at e.g pkeys registers (with extra branches) in some architectures.
This patch allows for partial madvise success when finding a sealed VMA, which historically has been allowed in Linux.
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
mm/internal.h | 2 -- mm/madvise.c | 13 +++---------- mm/mseal.c | 17 ++++------------- mm/vma.h | 7 +++++++ 4 files changed, 14 insertions(+), 25 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h index ca422aede342..1db320650539 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1363,8 +1363,6 @@ static inline int can_do_mseal(unsigned long flags)
bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end); -bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
unsigned long end, int behavior);
#else static inline int can_do_mseal(unsigned long flags) { diff --git a/mm/madvise.c b/mm/madvise.c index 89089d84f8df..4e64770be16c 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1031,6 +1031,9 @@ static int madvise_vma_behavior(struct vm_area_struct *vma, struct anon_vma_name *anon_name; unsigned long new_flags = vma->vm_flags;
- if (unlikely(!can_modify_vma_madv(vma, behavior)))
return -EPERM;
- switch (behavior) { case MADV_REMOVE: return madvise_remove(vma, prev, start, end);
@@ -1448,15 +1451,6 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh start = untagged_addr_remote(mm, start); end = start + len;
- /*
* Check if the address range is sealed for do_madvise().
* can_modify_mm_madv assumes we have acquired the lock on MM.
*/
- if (unlikely(!can_modify_mm_madv(mm, start, end, behavior))) {
error = -EPERM;
goto out;
- }
- blk_start_plug(&plug); switch (behavior) { case MADV_POPULATE_READ:
@@ -1470,7 +1464,6 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh } blk_finish_plug(&plug);
-out: if (write) mmap_write_unlock(mm); else diff --git a/mm/mseal.c b/mm/mseal.c index 2170e2139ca0..fdd1666344fa 100644 --- a/mm/mseal.c +++ b/mm/mseal.c @@ -75,24 +75,15 @@ bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end) }
/*
- Check if the vmas of a memory range are allowed to be modified by madvise.
- the memory ranger can have a gap (unallocated memory).
- return true, if it is allowed.
*/
- Check if a vma is allowed to be modified by madvise.
-bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, unsigned long end,
int behavior)
+bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior) {
struct vm_area_struct *vma;
VMA_ITERATOR(vmi, mm, start);
if (!is_madv_discard(behavior)) return true;
/* going through each vma to check. */
for_each_vma_range(vmi, vma, end)
if (unlikely(is_ro_anon(vma) && !can_modify_vma(vma)))
return false;
- if (unlikely(!can_modify_vma(vma) && is_ro_anon(vma)))
return false;
Not your fault, but I find it extremely irritating that something this subtle has literally zero comments.
mseal()'d + user does not have permission to modify pages = potentially discards, as per the original message:
6> Some destructive madvice() behaviors (e.g. MADV_DONTNEED) for anonymous memory, when users don't have write permission to the memory. Those behaviors can alter region contents by discarding pages, effectively a memset(0) for anonymous memory.
For something so invasive to just leave this as implied + needing to look up the commit message to understand is just... yeah. But again, not your fault...
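(For reference, the "effectively a memset(0)" behaviour being described: discarding anonymous memory with MADV_DONTNEED makes it read back as zeroes, e.g.:)

#include <sys/mman.h>
#include <unistd.h>
#include <string.h>
#include <assert.h>

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, psz, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	memset(p, 0xaa, psz);
	/* Discard the page; the next touch gets a fresh zero page. */
	madvise(p, psz, MADV_DONTNEED);
	assert(p[0] == 0);
	return 0;
}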
I remain baffled that the original implementation tried to do these things at mm granularity.
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
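To make the user-visible rule concrete, here is a minimal userspace sketch (not taken from the series or its selftests; it assumes a kernel and headers that provide __NR_mseal): a discarding madvise on sealed read-only anonymous memory fails with EPERM, while non-destructive advice still succeeds.

	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <errno.h>
	#include <assert.h>

	int main(void)
	{
		long psz = sysconf(_SC_PAGESIZE);
		/* Read-only anonymous mapping: a discard would wipe it to zeroes. */
		void *p = mmap(NULL, psz, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		assert(p != MAP_FAILED);
		assert(syscall(__NR_mseal, p, psz, 0) == 0);

		/* MADV_DONTNEED is a discarding behavior, so it is refused... */
		assert(madvise(p, psz, MADV_DONTNEED) == -1 && errno == EPERM);
		/* ...while non-destructive advice is still allowed. */
		assert(madvise(p, psz, MADV_WILLNEED) == 0);
		return 0;
	}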
With no more users in the tree, we can finally remove can_modify_mm().
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
---
 mm/internal.h | 14 --------------
 mm/mseal.c    | 21 ---------------------
 2 files changed, 35 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 1db320650539..3b738b0ad893 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1361,25 +1361,11 @@ static inline int can_do_mseal(unsigned long flags)
 	return 0;
 }
 
-bool can_modify_mm(struct mm_struct *mm, unsigned long start,
-		unsigned long end);
 #else
 static inline int can_do_mseal(unsigned long flags)
 {
 	return -EPERM;
 }
-
-static inline bool can_modify_mm(struct mm_struct *mm, unsigned long start,
-		unsigned long end)
-{
-	return true;
-}
-
-static inline bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
-		unsigned long end, int behavior)
-{
-	return true;
-}
 #endif
 
 #ifdef CONFIG_SHRINKER_DEBUG
diff --git a/mm/mseal.c b/mm/mseal.c
index fdd1666344fa..28cd17d7aaf2 100644
--- a/mm/mseal.c
+++ b/mm/mseal.c
@@ -53,27 +53,6 @@ static bool is_ro_anon(struct vm_area_struct *vma)
 	return false;
 }
 
-/*
- * Check if the vmas of a memory range are allowed to be modified.
- * the memory ranger can have a gap (unallocated memory).
- * return true, if it is allowed.
- */
-bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end)
-{
-	struct vm_area_struct *vma;
-
-	VMA_ITERATOR(vmi, mm, start);
-
-	/* going through each vma to check. */
-	for_each_vma_range(vmi, vma, end) {
-		if (unlikely(!can_modify_vma(vma)))
-			return false;
-	}
-
-	/* Allow by default. */
-	return true;
-}
-
 /*
  * Check if a vma is allowed to be modified by madvise.
  */
* Pedro Falcato pedro.falcato@gmail.com [240816 20:18]:
With no more users in the tree, we can finally remove can_modify_mm().
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
Reviewed-by: Liam R. Howlett Liam.Howlett@Oracle.com
On Sat, Aug 17, 2024 at 01:18:33AM GMT, Pedro Falcato wrote:
With no more users in the tree, we can finally remove can_modify_mm().
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
Add more mseal traversal tests across VMAs, where we could possibly screw up sealing checks. These test more across-vma traversal for mprotect, munmap and madvise. Particularly, we test for the case where a regular VMA is followed by a sealed VMA.
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
---
 tools/testing/selftests/mm/mseal_test.c | 111 +++++++++++++++++++++++++++++++-
 1 file changed, 110 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c
index 259bef4945e9..0d4d40fb0f88 100644
--- a/tools/testing/selftests/mm/mseal_test.c
+++ b/tools/testing/selftests/mm/mseal_test.c
@@ -766,6 +766,42 @@ static void test_seal_mprotect_partial_mprotect(bool seal)
 	REPORT_TEST_PASS();
 }
 
+static void test_seal_mprotect_partial_mprotect_tail(bool seal)
+{
+	void *ptr;
+	unsigned long page_size = getpagesize();
+	unsigned long size = 2 * page_size;
+	int ret;
+	int prot;
+
+	/*
+	 * Check if a partial mseal (that results in two vmas) works correctly.
+	 * It might mprotect the first, but it'll never touch the second (msealed) vma.
+	 */
+
+	setup_single_address(size, &ptr);
+	FAIL_TEST_IF_FALSE(ptr != (void *)-1);
+
+	if (seal) {
+		ret = sys_mseal(ptr + page_size, size);
+		FAIL_TEST_IF_FALSE(!ret);
+	}
+
+	ret = sys_mprotect(ptr, size, PROT_EXEC);
+	if (seal)
+		FAIL_TEST_IF_FALSE(ret < 0);
+	else
+		FAIL_TEST_IF_FALSE(!ret);
+
+	if (seal) {
+		FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
+		FAIL_TEST_IF_FALSE(prot == 0x4);
+	}
+
+	REPORT_TEST_PASS();
+}
+
+
 static void test_seal_mprotect_two_vma_with_gap(bool seal)
 {
 	void *ptr;
@@ -983,6 +1019,41 @@ static void test_seal_munmap_vma_with_gap(bool seal)
 	REPORT_TEST_PASS();
 }
 
+static void test_seal_munmap_partial_across_vmas(bool seal)
+{
+	void *ptr;
+	unsigned long page_size = getpagesize();
+	unsigned long size = 2 * page_size;
+	int ret;
+	int prot;
+
+	/*
+	 * Check if a partial mseal (that results in two vmas) works correctly.
+	 * It might unmap the first, but it'll never unmap the second (msealed) vma.
+	 */
+
+	setup_single_address(size, &ptr);
+	FAIL_TEST_IF_FALSE(ptr != (void *)-1);
+
+	if (seal) {
+		ret = sys_mseal(ptr + page_size, size);
+		FAIL_TEST_IF_FALSE(!ret);
+	}
+
+	ret = sys_munmap(ptr, size);
+	if (seal)
+		FAIL_TEST_IF_FALSE(ret < 0);
+	else
+		FAIL_TEST_IF_FALSE(!ret);
+
+	if (seal) {
+		FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
+		FAIL_TEST_IF_FALSE(prot == 0x4);
+	}
+
+	REPORT_TEST_PASS();
+}
+
 static void test_munmap_start_freed(bool seal)
 {
 	void *ptr;
@@ -1735,6 +1806,37 @@ static void test_seal_discard_ro_anon(bool seal)
 	REPORT_TEST_PASS();
 }
 
+static void test_seal_discard_across_vmas(bool seal)
+{
+	void *ptr;
+	unsigned long page_size = getpagesize();
+	unsigned long size = 2 * page_size;
+	int ret;
+
+	setup_single_address(size, &ptr);
+	FAIL_TEST_IF_FALSE(ptr != (void *)-1);
+
+	if (seal) {
+		ret = seal_single_address(ptr + page_size, page_size);
+		FAIL_TEST_IF_FALSE(!ret);
+	}
+
+	ret = sys_madvise(ptr, size, MADV_DONTNEED);
+	if (seal)
+		FAIL_TEST_IF_FALSE(ret < 0);
+	else
+		FAIL_TEST_IF_FALSE(!ret);
+
+	ret = sys_munmap(ptr, size);
+	if (seal)
+		FAIL_TEST_IF_FALSE(ret < 0);
+	else
+		FAIL_TEST_IF_FALSE(!ret);
+
+	REPORT_TEST_PASS();
+}
+
+
 static void test_seal_madvise_nodiscard(bool seal)
 {
 	void *ptr;
@@ -1779,7 +1881,7 @@ int main(int argc, char **argv)
 	if (!pkey_supported())
 		ksft_print_msg("PKEY not supported\n");
 
-	ksft_set_plan(82);
+	ksft_set_plan(88);
 
 	test_seal_addseal();
 	test_seal_unmapped_start();
@@ -1825,12 +1927,17 @@ int main(int argc, char **argv)
 	test_seal_mprotect_split(false);
 	test_seal_mprotect_split(true);
 
+	test_seal_mprotect_partial_mprotect_tail(false);
+	test_seal_mprotect_partial_mprotect_tail(true);
+
 	test_seal_munmap(false);
 	test_seal_munmap(true);
 	test_seal_munmap_two_vma(false);
 	test_seal_munmap_two_vma(true);
 	test_seal_munmap_vma_with_gap(false);
 	test_seal_munmap_vma_with_gap(true);
+	test_seal_munmap_partial_across_vmas(false);
+	test_seal_munmap_partial_across_vmas(true);
 
 	test_munmap_start_freed(false);
 	test_munmap_start_freed(true);
@@ -1862,6 +1969,8 @@ int main(int argc, char **argv)
 	test_seal_madvise_nodiscard(true);
 	test_seal_discard_ro_anon(false);
 	test_seal_discard_ro_anon(true);
+	test_seal_discard_across_vmas(false);
+	test_seal_discard_across_vmas(true);
 	test_seal_discard_ro_anon_on_rw(false);
 	test_seal_discard_ro_anon_on_rw(true);
 	test_seal_discard_ro_anon_on_shared(false);
On Sat, Aug 17, 2024 at 1:18 AM Pedro Falcato pedro.falcato@gmail.com wrote:
/*
* Check if a partial mseal (that results in two vmas) works correctly.
* It might unmap the first, but it'll never unmap the second (msealed) vma.
*/
Bah, obviously this comment isn't true, munmap is never partial. I'll change this locally for v4 if there ends up being one.
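A possible rewording along those lines, matching munmap's all-or-nothing behavior (illustrative only; not the wording that ultimately landed):

	/*
	 * Check that a partial mseal (that results in two vmas) works correctly:
	 * since the second vma is sealed, munmap of the whole range must fail
	 * and nothing may be unmapped.
	 */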
* Pedro Falcato pedro.falcato@gmail.com [240818 02:36]:
On Sat, Aug 17, 2024 at 1:18 AM Pedro Falcato pedro.falcato@gmail.com wrote:
Bah, obviously this comment isn't true, munmap is never partial. I'll change this locally for v4 if there ends up being one.
Besides this comment, the patch looks good.
Reviewed-by: Liam R. Howlett Liam.Howlett@Oracle.com
On Sat, Aug 17, 2024 at 01:18:34AM GMT, Pedro Falcato wrote:
Add more mseal traversal tests across VMAs, where we could possibly screw up sealing checks. These test more across-vma traversal for mprotect, munmap and madvise. Particularly, we test for the case where a regular VMA is followed by a sealed VMA.
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
Other than the comment, LGTM.
Thanks for this series!
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
Hi Pedro
On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato pedro.falcato@gmail.com wrote:
Add more mseal traversal tests across VMAs, where we could possibly screw up sealing checks. These test more across-vma traversal for mprotect, munmap and madvise. Particularly, we test for the case where a regular VMA is followed by a sealed VMA.
Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
tools/testing/selftests/mm/mseal_test.c | 111 +++++++++++++++++++++++++++++++- 1 file changed, 110 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c index 259bef4945e9..0d4d40fb0f88 100644 --- a/tools/testing/selftests/mm/mseal_test.c +++ b/tools/testing/selftests/mm/mseal_test.c @@ -766,6 +766,42 @@ static void test_seal_mprotect_partial_mprotect(bool seal) REPORT_TEST_PASS(); }
+static void test_seal_mprotect_partial_mprotect_tail(bool seal) +{
void *ptr;
unsigned long page_size = getpagesize();
unsigned long size = 2 * page_size;
int ret;
int prot;
/*
* Check if a partial mseal (that results in two vmas) works correctly.
* It might mprotect the first, but it'll never touch the second (msealed) vma.
*/
setup_single_address(size, &ptr);
FAIL_TEST_IF_FALSE(ptr != (void *)-1);
if (seal) {
ret = sys_mseal(ptr + page_size, size);
You are allocating 2 pages, and I assume you are sealing the second page, so the size should be page_size:

	ret = sys_mseal(ptr + page_size, page_size);
FAIL_TEST_IF_FALSE(!ret);
}
ret = sys_mprotect(ptr, size, PROT_EXEC);
if (seal)
FAIL_TEST_IF_FALSE(ret < 0);
else
FAIL_TEST_IF_FALSE(!ret);
if (seal) {
FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
FAIL_TEST_IF_FALSE(prot == 0x4);
To test partial mprotect, the test needs to add a check that the first page was changed. Also, to avoid the merge, a PROT_NONE page could be added in front.
}
REPORT_TEST_PASS();
+}
static void test_seal_mprotect_two_vma_with_gap(bool seal) { void *ptr; @@ -983,6 +1019,41 @@ static void test_seal_munmap_vma_with_gap(bool seal) REPORT_TEST_PASS(); }
+static void test_seal_munmap_partial_across_vmas(bool seal) +{
void *ptr;
unsigned long page_size = getpagesize();
unsigned long size = 2 * page_size;
int ret;
int prot;
/*
* Check if a partial mseal (that results in two vmas) works correctly.
* It might unmap the first, but it'll never unmap the second (msealed) vma.
*/
setup_single_address(size, &ptr);
FAIL_TEST_IF_FALSE(ptr != (void *)-1);
if (seal) {
ret = sys_mseal(ptr + page_size, size);
ret = sys_mseal(ptr + page_size, page_size);
FAIL_TEST_IF_FALSE(!ret);
}
ret = sys_munmap(ptr, size);
if (seal)
FAIL_TEST_IF_FALSE(ret < 0);
else
FAIL_TEST_IF_FALSE(!ret);
if (seal) {
FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
FAIL_TEST_IF_FALSE(prot == 0x4);
To test partial unmap, the test needs to add a check that the first page was freed. Also, to avoid the merge, a PROT_NONE page needs to be in front.
The test_seal_munmap_partial_across_vmas test shows the behavior difference between the in-loop and out-of-loop approaches. Previously, neither VMA would be freed; now the first VMA will be freed and the second (sealed) VMA won't.
This brings us to the line you previously mentioned [1], and I quote: "munmap is atomic and always has been. It's required by POSIX."
So this is no longer true for the current series. Linux doesn't need to be POSIX compliant, per a previous conversation on this topic with Linus [2], so I'm open to that. If this is accepted by Linus, it would be better to add comments to the munmap code or tests for future readers (in case they are curious about the reasoning).
[1] https://lore.kernel.org/linux-mm/CAKbZUD3_3KN4fAyQsD+=p3PV8svAvVyS278umX40Eh... [2] https://lore.kernel.org/linux-mm/CAHk-=wgDv5vPx2xoxNQh+kbvLsskWubGGGK69cqF_i...
}
REPORT_TEST_PASS();
+}
On Wed, Aug 21, 2024 at 4:56 PM Jeff Xu jeffxu@chromium.org wrote:
Hi Pedro
On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato pedro.falcato@gmail.com wrote:
ret = sys_mseal(ptr + page_size, size);
You are allocating 2 pages, and I assume you are sealing the second page, so the size should be page_size:

	ret = sys_mseal(ptr + page_size, page_size);
Yes, good catch, it appears to be harmless but ofc down to straight luck. I'll send a fixup for this and the other mistake down there.
FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
FAIL_TEST_IF_FALSE(prot == 0x4);
To test partial mprotect, the test needs to add a check that the first page was changed. Also, to avoid the merge, a PROT_NONE page could be added in front.
No, I'm leaving partial mprotect to be undefined. It doesn't make sense to constraint ourselves, since POSIX wording is already loose.
FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
FAIL_TEST_IF_FALSE(prot == 0x4);
To test partial unmap, the test needs to add a check that the first page was freed. Also, to avoid the merge, a PROT_NONE page needs to be in front.
I'm not testing partial unmap. Partial unmap does not happen. I have told you this before.
The test_seal_munmap_partial_across_vmas test shows the behavior difference between the in-loop and out-of-loop approaches. Previously, neither VMA would be freed; now the first VMA will be freed and the second (sealed) VMA won't.
This brings us to the line you previously mentioned [1], and I quote: "munmap is atomic and always has been. It's required by POSIX."
This is still true, the comment was a copy-and-paste mindslip. Please read the email thread. It has been fixed up by Andrew.
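Put as a userspace sketch of the behavior Pedro is describing (illustrative, not code from the selftests; it assumes headers that define __NR_mseal): because every VMA in the range is checked before anything is unmapped, the whole munmap call fails and both mappings survive.

	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <errno.h>
	#include <assert.h>

	int main(void)
	{
		long psz = sysconf(_SC_PAGESIZE);
		/* layout: [page 0: regular anon] [page 1: sealed anon] */
		char *p = mmap(NULL, 2 * psz, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		assert(p != MAP_FAILED);
		assert(syscall(__NR_mseal, p + psz, psz, 0) == 0);

		/* The whole call is rejected; neither page has been unmapped. */
		assert(munmap(p, 2 * psz) == -1 && errno == EPERM);
		p[0] = 1;	/* first page is still mapped and writable */
		return 0;
	}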
On Wed, Aug 21, 2024 at 9:20 AM Pedro Falcato pedro.falcato@gmail.com wrote:
On Wed, Aug 21, 2024 at 4:56 PM Jeff Xu jeffxu@chromium.org wrote:
Hi Pedro
I'm not testing partial unmap. Partial unmap does not happen. I have told you this before.
ok. Then this test should be as below? (need to add PROT_NONE page before and after)

	size = get_vma_size(ptr, &prot);
	FAIL_TEST_IF_FALSE(size == 2 * page_size);
	FAIL_TEST_IF_FALSE(prot == 0x4);
This is still true, the comment was a copy-and-paste mindslip. Please read the email thread. It has been fixed up by Andrew.
Which thread/patch by Andrew ? Could you please send it to me ? (I might missed that)
On Wed, Aug 21, 2024 at 5:27 PM Jeff Xu jeffxu@chromium.org wrote:
On Wed, Aug 21, 2024 at 9:20 AM Pedro Falcato pedro.falcato@gmail.com wrote:
On Wed, Aug 21, 2024 at 4:56 PM Jeff Xu jeffxu@chromium.org wrote:
I'm not testing partial unmap. Partial unmap does not happen. I have told you this before.
ok. Then this test should be as below? (need to add PROT_NONE page before and after)

	size = get_vma_size(ptr, &prot);
	FAIL_TEST_IF_FALSE(size == 2 * page_size);
	FAIL_TEST_IF_FALSE(prot == 0x4);
That doesn't work because this region spans two vmas. I'll write something similar for the fixup.
This is still true, the comment was a copy-and-paste mindslip. Please read the email thread. It has been fixed up by Andrew.
Which thread/patch by Andrew ? Could you please send it to me ? (I might missed that)
This thread and this patch: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches...
On Wed, Aug 21, 2024 at 6:28 PM Pedro Falcato pedro.falcato@gmail.com wrote:
ok. Then this test should be as below? (need to add PROT_NONE page before and after)

	size = get_vma_size(ptr, &prot);
	FAIL_TEST_IF_FALSE(size == 2 * page_size);
	FAIL_TEST_IF_FALSE(prot == 0x4);
That doesn't work because this region spans two vmas. I'll write something similar for the fixup.
Actually, I won't, because this might cause spurious test failures on e.g. !TOPDOWN mmap architectures. setup_single_address (and co) need a fresh coat of paint (wrt PROT_NONE guard regions around it) and I don't want to be the one to do it, at least not as part of this series.
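For what it's worth, a guarded setup helper could look roughly like the sketch below; the name setup_guarded_address and its shape are made up here, and it assumes the includes already present in mseal_test.c. The idea is just to reserve PROT_NONE pages on both sides so the region under test can neither merge with nor spill into neighbouring mappings, regardless of whether mmap grows top-down or bottom-up.

	static void setup_guarded_address(unsigned long size, void **ptr_out)
	{
		unsigned long page_size = getpagesize();
		char *base;

		/* Reserve the test region plus one PROT_NONE guard page on each side. */
		base = mmap(NULL, size + 2 * page_size, PROT_NONE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (base == MAP_FAILED) {
			*ptr_out = (void *)-1;
			return;
		}

		/* Open up the middle; the outer PROT_NONE pages remain as guards. */
		if (mprotect(base + page_size, size, PROT_READ | PROT_WRITE)) {
			*ptr_out = (void *)-1;
			return;
		}

		*ptr_out = base + page_size;
	}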
Andrew, please squash this small patch with this one. It directly addresses a problem found in review.
(I was told this is the preferred way to send small fixups, and I don't think anyone wants a v4 over this trivial issue)
Thank you!
----8<----
From 614e5dc27073c39579c863ebdff4396948e28e03 Mon Sep 17 00:00:00 2001
From: Pedro Falcato pedro.falcato@gmail.com
Date: Thu, 22 Aug 2024 00:20:19 +0100
Subject: [PATCH] selftests/mm: Fix mseal's length

We accidentally msealed too much, which could overrun and try to mseal other
regions. This seems to be harmless (at least on top-down architectures) due to
various reasons all aligning, but may very well cause spurious test failures on
e.g. bottom-up mmap architectures.

Signed-off-by: Pedro Falcato pedro.falcato@gmail.com
---
 tools/testing/selftests/mm/mseal_test.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c
index 0d4d40fb0f88..0c41513219ae 100644
--- a/tools/testing/selftests/mm/mseal_test.c
+++ b/tools/testing/selftests/mm/mseal_test.c
@@ -783,7 +783,7 @@ static void test_seal_mprotect_partial_mprotect_tail(bool seal)
 	FAIL_TEST_IF_FALSE(ptr != (void *)-1);
 
 	if (seal) {
-		ret = sys_mseal(ptr + page_size, size);
+		ret = sys_mseal(ptr + page_size, page_size);
 		FAIL_TEST_IF_FALSE(!ret);
 	}
 
@@ -1036,7 +1036,7 @@ static void test_seal_munmap_partial_across_vmas(bool seal)
 	FAIL_TEST_IF_FALSE(ptr != (void *)-1);
 
 	if (seal) {
-		ret = sys_mseal(ptr + page_size, size);
+		ret = sys_mseal(ptr + page_size, page_size);
 		FAIL_TEST_IF_FALSE(!ret);
 	}