When merging of the previous VMA fails after the vma iterator has been moved to the previous entry, the vma iterator must be advanced to ensure the caller takes the correct action on the next vma iterator event. Fix this by adding a vma_next() call to the error path.
Users may experience higher CPU usage, most likely in very low memory situations.
Link: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejti...
Closes: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejti...
Fixes: 18b098af2890 ("vma_merge: set vma iterator to correct position.")
Cc: stable@vger.kernel.org
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index b56a7f0c9f85..b5bc4ca9bdc4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -968,14 +968,14 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
 				vma_pgoff = curr->vm_pgoff;
 				vma_start_write(curr);
 				remove = curr;
-				err = dup_anon_vma(next, curr);
+				err = dup_anon_vma(next, curr, &anon_dup);
 			}
 		}
 	}
 
 	/* Error in anon_vma clone. */
 	if (err)
-		return NULL;
+		goto anon_vma_fail;
 
 	if (vma_start < vma->vm_start || vma_end > vma->vm_end)
 		vma_expanded = true;
@@ -988,7 +988,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
 	}
 
 	if (vma_iter_prealloc(vmi, vma))
-		return NULL;
+		goto prealloc_fail;
 
 	init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
 	VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
@@ -1016,6 +1016,12 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
 	vma_complete(&vp, vmi, mm);
 	khugepaged_enter_vma(res, vm_flags);
 	return res;
+
+prealloc_fail:
+anon_vma_fail:
+	if (merge_prev)
+		vma_next(vmi);
+	return NULL;
 }
 
 /*
On Wed, 27 Sep 2023 12:07:44 -0400 "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:
> When merging of the previous VMA fails after the vma iterator has been moved to the previous entry, the vma iterator must be advanced to ensure the caller takes the correct action on the next vma iterator event. Fix this by adding a vma_next() call to the error path.
>
> Users may experience higher CPU usage, most likely in very low memory situations.

Looking through this thread:

> Closes: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejti...

I'm seeing no indication that the effect is CPU consumption? Jann is expecting bogus oom-killing?
* Andrew Morton <akpm@linux-foundation.org> [230927 13:14]:
> On Wed, 27 Sep 2023 12:07:44 -0400 "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:
> > When merging of the previous VMA fails after the vma iterator has been moved to the previous entry, the vma iterator must be advanced to ensure the caller takes the correct action on the next vma iterator event. Fix this by adding a vma_next() call to the error path.
> >
> > Users may experience higher CPU usage, most likely in very low memory situations.
>
> Looking through this thread:
>
> > Closes: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejti...
>
> I'm seeing no indication that the effect is CPU consumption? Jann is expecting bogus oom-killing?

His testing injected a bogus OOM, but since the vma iterator may get stuck in an "I can merge! Oh, I'm out of memory" loop, because vma_merge() is called with the same VMA each time, I would expect increased CPU usage when almost out of memory until a task is killed. I don't think he expected a bogus OOM, since we are using GFP_KERNEL during mm/internal.h:vma_iter_prealloc() calls.
* Liam R. Howlett <Liam.Howlett@Oracle.com> [230927 13:26]:
> * Andrew Morton <akpm@linux-foundation.org> [230927 13:14]:
> > On Wed, 27 Sep 2023 12:07:44 -0400 "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:
> > > When merging of the previous VMA fails after the vma iterator has been moved to the previous entry, the vma iterator must be advanced to ensure the caller takes the correct action on the next vma iterator event. Fix this by adding a vma_next() call to the error path.
> > >
> > > Users may experience higher CPU usage, most likely in very low memory situations.
> >
> > Looking through this thread:
> >
> > > Closes: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejti...
> >
> > I'm seeing no indication that the effect is CPU consumption? Jann is expecting bogus oom-killing?
>
> His testing injected a bogus OOM, but since the vma iterator may get stuck in an "I can merge! Oh, I'm out of memory" loop, because vma_merge() is called with the same VMA each time, I would expect increased CPU usage when almost out of memory until a task is killed. I don't think he expected a bogus OOM, since we are using GFP_KERNEL during mm/internal.h:vma_iter_prealloc() calls.

The initial call to vma_merge() is correct, but on the second call vma is the same as prev, so it won't attempt to merge prev again. I think it would only cause one extra call to vma_merge().

So I think you are correct: CPU usage will not increase very much. But there also will not be a bogus OOM. There will just be two calls to vma_merge() for the same VMA when there is an OOM event and we could have merged prev.

I doubt the user would notice anything, and they have bigger memory issues at that time.
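The "one extra call" reasoning above can be illustrated with a toy userspace model. This is a sketch under stated assumptions, not kernel code: the iterator is reduced to an array index, and `toy_next_after_failed_merge()` and its parameters are hypothetical names invented for the illustration. It only shows where the caller ends up after a failed merge attempt, with and without the extra step forward on the error path.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model only: the "iterator" is an index into a sorted array of
 * entries. Returns the index the caller lands on when it advances once
 * after a failed merge attempt on entry cur.
 */
size_t toy_next_after_failed_merge(size_t cur, bool merge_prev, bool fix_applied)
{
	size_t pos = cur;

	if (merge_prev && pos > 0)
		pos--;		/* step back to the previous entry */

	/* ... the allocation fails here, so the merge is abandoned ... */

	if (fix_applied && merge_prev && pos < cur)
		pos++;		/* error path steps forward again */

	return pos + 1;		/* the caller advances to its next entry */
}
```

In this model, when the step back happened and the error path does not step forward, the caller's next entry is the one it just processed; that corresponds to the single redundant vma_merge() call discussed above. With the step forward, the caller moves on to the following entry.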
On Wed, Sep 27, 2023 at 12:07:44PM -0400, Liam R. Howlett wrote:
> +++ b/mm/mmap.c
> @@ -968,14 +968,14 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
>  				vma_pgoff = curr->vm_pgoff;
>  				vma_start_write(curr);
>  				remove = curr;
> -				err = dup_anon_vma(next, curr);
> +				err = dup_anon_vma(next, curr, &anon_dup);

This isn't bisectable. dup_anon_vma() doesn't gain a third argument until patch 2.
* Matthew Wilcox <willy@infradead.org> [230927 16:06]:
> On Wed, Sep 27, 2023 at 12:07:44PM -0400, Liam R. Howlett wrote:
> > +++ b/mm/mmap.c
> > @@ -968,14 +968,14 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> >  				vma_pgoff = curr->vm_pgoff;
> >  				vma_start_write(curr);
> >  				remove = curr;
> > -				err = dup_anon_vma(next, curr);
> > +				err = dup_anon_vma(next, curr, &anon_dup);
>
> This isn't bisectable. dup_anon_vma() doesn't gain a third argument until patch 2.

Ah, I'll respin and retest.

Sorry about that,
Liam
On 9/27/23 18:07, Liam R. Howlett wrote:
> When merging of the previous VMA fails after the vma iterator has been moved to the previous entry, the vma iterator must be advanced to ensure the caller takes the correct action on the next vma iterator event. Fix this by adding a vma_next() call to the error path.
>
> Users may experience higher CPU usage, most likely in very low memory situations.

Maybe we could say explicitly that before this fix, vma_merge will be called twice on the same vma, which to the best of our knowledge does not cause anything worse than some wasted cycles because vma == prev, but it's fragile?

> Link: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejti...
> Closes: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejti...
> Fixes: 18b098af2890 ("vma_merge: set vma iterator to correct position.")
> Cc: stable@vger.kernel.org
> Cc: Jann Horn <jannh@google.com>
> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
>  mm/mmap.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index b56a7f0c9f85..b5bc4ca9bdc4 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -968,14 +968,14 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
>  				vma_pgoff = curr->vm_pgoff;
>  				vma_start_write(curr);
>  				remove = curr;
> -				err = dup_anon_vma(next, curr);
> +				err = dup_anon_vma(next, curr, &anon_dup);
>  			}
>  		}
>  	}
>
>  	/* Error in anon_vma clone. */
>  	if (err)
> -		return NULL;
> +		goto anon_vma_fail;
>
>  	if (vma_start < vma->vm_start || vma_end > vma->vm_end)
>  		vma_expanded = true;

The vma_iter_config() actions done in this part are something we don't need to undo?

> @@ -988,7 +988,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
>  	}
>
>  	if (vma_iter_prealloc(vmi, vma))
> -		return NULL;
> +		goto prealloc_fail;
>
>  	init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
>  	VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
> @@ -1016,6 +1016,12 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
>  	vma_complete(&vp, vmi, mm);
>  	khugepaged_enter_vma(res, vm_flags);
>  	return res;
> +
> +prealloc_fail:
> +anon_vma_fail:
> +	if (merge_prev)
> +		vma_next(vmi);
> +	return NULL;
>  }
>
>  /*
* Vlastimil Babka <vbabka@suse.cz> [230929 05:52]:
> On 9/27/23 18:07, Liam R. Howlett wrote:
> > When merging of the previous VMA fails after the vma iterator has been moved to the previous entry, the vma iterator must be advanced to ensure the caller takes the correct action on the next vma iterator event. Fix this by adding a vma_next() call to the error path.
> >
> > Users may experience higher CPU usage, most likely in very low memory situations.
>
> Maybe we could say explicitly that before this fix, vma_merge will be called twice on the same vma, which to the best of our knowledge does not cause anything worse than some wasted cycles because vma == prev, but it's fragile?

I will modify this statement again in v3.

> > Link: https://urldefense.com/v3/__https://lore.kernel.org/linux-mm/CAG48ez12VN1JAO...
> > Closes: https://urldefense.com/v3/__https://lore.kernel.org/linux-mm/CAG48ez12VN1JAO...
> > Fixes: 18b098af2890 ("vma_merge: set vma iterator to correct position.")
> > Cc: stable@vger.kernel.org
> > Cc: Jann Horn <jannh@google.com>
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> > ---
> >  mm/mmap.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index b56a7f0c9f85..b5bc4ca9bdc4 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -968,14 +968,14 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> >  				vma_pgoff = curr->vm_pgoff;
> >  				vma_start_write(curr);
> >  				remove = curr;
> > -				err = dup_anon_vma(next, curr);
> > +				err = dup_anon_vma(next, curr, &anon_dup);
> >  			}
> >  		}
> >  	}
> >
> >  	/* Error in anon_vma clone. */
> >  	if (err)
> > -		return NULL;
> > +		goto anon_vma_fail;
> >
> >  	if (vma_start < vma->vm_start || vma_end > vma->vm_end)
> >  		vma_expanded = true;
>
> The vma_iter_config() actions done in this part are something we don't need to undo?

Oh, right. Thanks.

This made me realise that my prealloc_fail path assumes there is a 'curr' vma that will cause the vma_next() to set the correct range. We may actually be in case 1 or 4, which means we are looking to add a VMA to the gap; in this case, vma_next() would go beyond where it was at the start of the function, and may cause issues since we do not return an error.

So the current patch fixes the problem Jann discovered (and with any iterators), but the issue may exist in other error scenarios today or in the future. There also may be an issue with next merging failing and skipping a vma...

Looking through the code, there are functions that do not match the entire vma iterator location due to trimming (mbind_range, mlock_fixup), so using the start/end that was passed in may not accurately represent the range passed in through the vma iterator. Although, those ranges do point to the correct location in the tree - they just may be smaller.

All the callers have `addr` within the range of the vma iterator, so it would be safe to do vma_iter_set(vmi, addr); and vma_iter_load(vmi) to restore the location and range. Safest would be to save the vma iterator start and end location and restore those after a re-walk, but this seems unnecessary and would complicate backporting.

> > @@ -988,7 +988,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> >  	}
> >
> >  	if (vma_iter_prealloc(vmi, vma))
> > -		return NULL;
> > +		goto prealloc_fail;
> >
> >  	init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
> >  	VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
> > @@ -1016,6 +1016,12 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> >  	vma_complete(&vp, vmi, mm);
> >  	khugepaged_enter_vma(res, vm_flags);
> >  	return res;
> > +
> > +prealloc_fail:
> > +anon_vma_fail:
> > +	if (merge_prev)
> > +		vma_next(vmi);
> > +	return NULL;
> >  }
> >
> >  /*
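The gap concern raised in this reply, and the idea of restoring the position from `addr` rather than stepping forward, can be sketched with another toy userspace model. Everything here is an assumption made for illustration: `toy_range`, `toy_load()`, `toy_step_forward()` and `toy_map` are hypothetical names, and they do not reproduce the behaviour of vma_iter_set(), vma_iter_load() or vma_next(). The sketch only contrasts "look the position up again by address" with "blindly step forward one entry" when the address of interest lies in a gap.

```c
#include <stddef.h>

/* Toy stand-in for a VMA: a half-open address range [start, end). */
struct toy_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Hypothetical lookup by address: returns the index of the range that
 * contains addr, or -1 when addr falls in a gap between ranges.
 */
int toy_load(const struct toy_range *r, size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= r[i].start && addr < r[i].end)
			return (int)i;
	}
	return -1;
}

/* Hypothetical blind step forward from a stepped-back index. */
int toy_step_forward(int index)
{
	return index + 1;
}

/* Two ranges with a gap at [0x2000, 0x4000). */
const struct toy_range toy_map[] = {
	{ 0x1000, 0x2000 },
	{ 0x4000, 0x5000 },
};
```

If the address being worked on is 0x2000 (inside the gap) and the position was moved back to index 0, stepping forward lands on index 1, whose range starts beyond the gap, while looking the address up again reports the gap itself. In this toy model the lookup does not depend on how far the position was moved beforehand.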
linux-stable-mirror@lists.linaro.org