The patch titled
Subject: mm/mempolicy: fix set_mempolicy_home_node() previous VMA pointer
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-mempolicy-fix-set_mempolicy_home_node-previous-vma-pointer.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: mm/mempolicy: fix set_mempolicy_home_node() previous VMA pointer
Date: Thu, 28 Sep 2023 13:24:32 -0400
The two users of mbind_range() are expecting that mbind_range() will
update the pointer to the previous VMA, or return an error. However,
set_mempolicy_home_node() does not call mbind_range() if there is no VMA
policy. The fix is to update the pointer to the previous VMA prior to
continuing to iterate the VMAs when there is no policy.
Users may experience a WARN_ON() during VMA policy updates when updating
a range of VMAs on the home node.
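For context, a minimal hedged userspace sketch (not part of the patch) that exercises this path: only part of a mapping gets a VMA policy, so the loop hits both the no-policy "continue" case fixed below and mbind_range(). The x86_64 syscall numbers and the MPOL_BIND value, taken from the uapi headers, are assumptions here.
```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define NR_MBIND			237	/* x86_64; assumption */
#define NR_SET_MEMPOLICY_HOME_NODE	450	/* x86_64; assumption */
#define MPOL_BIND			2	/* from linux/mempolicy.h */

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned long nodemask = 1;		/* node 0 */
	char *p = mmap(NULL, 4 * page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	/* Give only the middle pages a policy; the outer VMAs keep none,
	 * which is the "!old" case the fix above touches. */
	if (syscall(NR_MBIND, p + page, 2 * page, MPOL_BIND, &nodemask,
		    8 * sizeof(nodemask), 0))
		perror("mbind");

	/* Walks all VMAs in the range, skipping the ones without policy. */
	if (syscall(NR_SET_MEMPOLICY_HOME_NODE, (unsigned long)p,
		    (unsigned long)(4 * page), 0 /* node */, 0 /* flags */))
		perror("set_mempolicy_home_node");
	return 0;
}
```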
Link: https://lkml.kernel.org/r/20230928172432.2246534-1-Liam.Howlett@oracle.com
Link: https://lore.kernel.org/linux-mm/CALcu4rbT+fMVNaO_F2izaCT+e7jzcAciFkOvk21HG…
Fixes: f4e9e0e69468 ("mm/mempolicy: fix use-after-free of VMA iterator")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Yikebaer Aizezi <yikebaer61(a)gmail.com>
Closes: https://lore.kernel.org/linux-mm/CALcu4rbT+fMVNaO_F2izaCT+e7jzcAciFkOvk21HG…
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/mempolicy.c~mm-mempolicy-fix-set_mempolicy_home_node-previous-vma-pointer
+++ a/mm/mempolicy.c
@@ -1543,8 +1543,10 @@ SYSCALL_DEFINE4(set_mempolicy_home_node,
* the home node for vmas we already updated before.
*/
old = vma_policy(vma);
- if (!old)
+ if (!old) {
+ prev = vma;
continue;
+ }
if (old->mode != MPOL_BIND && old->mode != MPOL_PREFERRED_MANY) {
err = -EOPNOTSUPP;
break;
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-add-mas_active-to-detect-in-tree-walks.patch
maple_tree-add-mas_underflow-and-mas_overflow-states.patch
mmap-fix-vma_iterator-in-error-path-of-vma_merge.patch
mmap-fix-error-paths-with-dup_anon_vma.patch
mm-mempolicy-fix-set_mempolicy_home_node-previous-vma-pointer.patch
mmap-add-clarifying-comment-to-vma_merge-code.patch
The patch titled
Subject: mmap: fix error paths with dup_anon_vma()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mmap-fix-error-paths-with-dup_anon_vma.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: mmap: fix error paths with dup_anon_vma()
Date: Thu, 28 Sep 2023 13:16:33 -0400
When the calling function fails after the dup_anon_vma(), the duplication
of the anon_vma is not being undone. Add the necessary unlink_anon_vmas()
call to the error paths that are missing them.
This issue showed up during inspection of the error path in vma_merge()
for an unrelated vma iterator issue.
Users may experience increased memory usage, which may be problematic as
the failure would likely be caused by a low memory situation.
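As a rough illustration of the idiom the fix introduces, here is a plain-C sketch with stand-in names (an assumption of ours, not kernel code): the duplication helper reports what it copied through an out parameter so the caller can undo the work when a later step fails.
```c
#include <stdlib.h>
#include <string.h>

struct obj { char *anon; };

/* Duplicate src->anon into dst; record dst in *dup so it can be undone. */
static int dup_anon(struct obj *dst, struct obj *src, struct obj **dup)
{
	if (src->anon && !dst->anon) {
		dst->anon = strdup(src->anon);
		if (!dst->anon)
			return -1;
		*dup = dst;
	}
	return 0;
}

static int later_step_fails(void) { return 1; }	/* simulate -ENOMEM */

static int expand(struct obj *dst, struct obj *src)
{
	struct obj *anon_dup = NULL;

	if (dup_anon(dst, src, &anon_dup))
		return -1;
	if (later_step_fails())
		goto nomem;
	return 0;
nomem:
	if (anon_dup) {			/* undo the duplication on failure */
		free(anon_dup->anon);
		anon_dup->anon = NULL;
	}
	return -1;
}

int main(void)
{
	struct obj src = { "anon" }, dst = { NULL };

	(void)expand(&dst, &src);
	return 0;
}
```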
Link: https://lkml.kernel.org/r/20230928171634.2245042-3-Liam.Howlett@oracle.com
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: <stable(a)vger.kernel.org>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 22 +++++++++++++++-------
1 file changed, 15 insertions(+), 7 deletions(-)
--- a/mm/mmap.c~mmap-fix-error-paths-with-dup_anon_vma
+++ a/mm/mmap.c
@@ -587,7 +587,7 @@ again:
* Returns: 0 on success.
*/
static inline int dup_anon_vma(struct vm_area_struct *dst,
- struct vm_area_struct *src)
+ struct vm_area_struct *src, struct vm_area_struct **dup)
{
/*
* Easily overlooked: when mprotect shifts the boundary, make sure the
@@ -597,6 +597,7 @@ static inline int dup_anon_vma(struct vm
if (src->anon_vma && !dst->anon_vma) {
vma_assert_write_locked(dst);
dst->anon_vma = src->anon_vma;
+ *dup = dst;
return anon_vma_clone(dst, src);
}
@@ -624,6 +625,7 @@ int vma_expand(struct vma_iterator *vmi,
unsigned long start, unsigned long end, pgoff_t pgoff,
struct vm_area_struct *next)
{
+ struct vm_area_struct *anon_dup = NULL;
bool remove_next = false;
struct vma_prepare vp;
@@ -633,7 +635,7 @@ int vma_expand(struct vma_iterator *vmi,
remove_next = true;
vma_start_write(next);
- ret = dup_anon_vma(vma, next);
+ ret = dup_anon_vma(vma, next, &anon_dup);
if (ret)
return ret;
}
@@ -661,6 +663,8 @@ int vma_expand(struct vma_iterator *vmi,
return 0;
nomem:
+ if (anon_dup)
+ unlink_anon_vmas(anon_dup);
return -ENOMEM;
}
@@ -860,6 +864,7 @@ struct vm_area_struct *vma_merge(struct
{
struct vm_area_struct *curr, *next, *res;
struct vm_area_struct *vma, *adjust, *remove, *remove2;
+ struct vm_area_struct *anon_dup = NULL;
struct vma_prepare vp;
pgoff_t vma_pgoff;
int err = 0;
@@ -927,18 +932,18 @@ struct vm_area_struct *vma_merge(struct
vma_start_write(next);
remove = next; /* case 1 */
vma_end = next->vm_end;
- err = dup_anon_vma(prev, next);
+ err = dup_anon_vma(prev, next, &anon_dup);
if (curr) { /* case 6 */
vma_start_write(curr);
remove = curr;
remove2 = next;
if (!next->anon_vma)
- err = dup_anon_vma(prev, curr);
+ err = dup_anon_vma(prev, curr, &anon_dup);
}
} else if (merge_prev) { /* case 2 */
if (curr) {
vma_start_write(curr);
- err = dup_anon_vma(prev, curr);
+ err = dup_anon_vma(prev, curr, &anon_dup);
if (end == curr->vm_end) { /* case 7 */
remove = curr;
} else { /* case 5 */
@@ -954,7 +959,7 @@ struct vm_area_struct *vma_merge(struct
vma_end = addr;
adjust = next;
adj_start = -(prev->vm_end - addr);
- err = dup_anon_vma(next, prev);
+ err = dup_anon_vma(next, prev, &anon_dup);
} else {
/*
* Note that cases 3 and 8 are the ONLY ones where prev
@@ -968,7 +973,7 @@ struct vm_area_struct *vma_merge(struct
vma_pgoff = curr->vm_pgoff;
vma_start_write(curr);
remove = curr;
- err = dup_anon_vma(next, curr);
+ err = dup_anon_vma(next, curr, &anon_dup);
}
}
}
@@ -1018,6 +1023,9 @@ struct vm_area_struct *vma_merge(struct
return res;
prealloc_fail:
+ if (anon_dup)
+ unlink_anon_vmas(anon_dup);
+
anon_vma_fail:
if (merge_prev)
vma_next(vmi);
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-add-mas_active-to-detect-in-tree-walks.patch
maple_tree-add-mas_underflow-and-mas_overflow-states.patch
mmap-fix-vma_iterator-in-error-path-of-vma_merge.patch
mmap-fix-error-paths-with-dup_anon_vma.patch
The patch titled
Subject: mmap: fix vma_iterator in error path of vma_merge()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mmap-fix-vma_iterator-in-error-path-of-vma_merge.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: mmap: fix vma_iterator in error path of vma_merge()
Date: Thu, 28 Sep 2023 13:16:32 -0400
When merging of the previous VMA fails after the vma iterator has been
moved to the previous entry, the vma iterator must be advanced to ensure
the caller takes the correct action on the next vma iterator event. Fix
this by adding a vma_next() call to the error path.
Users would not notice the effects as it would result in an extra
vma_merge() call if the first call ended due to an out-of-memory event.
Link: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejt…
Link: https://lkml.kernel.org/r/20230928171634.2245042-2-Liam.Howlett@oracle.com
Fixes: 18b098af2890 ("vma_merge: set vma iterator to correct position.")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejt…
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
--- a/mm/mmap.c~mmap-fix-vma_iterator-in-error-path-of-vma_merge
+++ a/mm/mmap.c
@@ -975,7 +975,7 @@ struct vm_area_struct *vma_merge(struct
/* Error in anon_vma clone. */
if (err)
- return NULL;
+ goto anon_vma_fail;
if (vma_start < vma->vm_start || vma_end > vma->vm_end)
vma_expanded = true;
@@ -988,7 +988,7 @@ struct vm_area_struct *vma_merge(struct
}
if (vma_iter_prealloc(vmi, vma))
- return NULL;
+ goto prealloc_fail;
init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
@@ -1016,6 +1016,12 @@ struct vm_area_struct *vma_merge(struct
vma_complete(&vp, vmi, mm);
khugepaged_enter_vma(res, vm_flags);
return res;
+
+prealloc_fail:
+anon_vma_fail:
+ if (merge_prev)
+ vma_next(vmi);
+ return NULL;
}
/*
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-add-mas_active-to-detect-in-tree-walks.patch
maple_tree-add-mas_underflow-and-mas_overflow-states.patch
mmap-fix-vma_iterator-in-error-path-of-vma_merge.patch
mmap-fix-error-paths-with-dup_anon_vma.patch
When calling mbind() with MPOL_MF_{MOVE|MOVEALL} | MPOL_MF_STRICT, the
kernel should attempt to migrate all existing pages, and return -EIO if
there is a misplaced or unmovable page. Then commit 6f4576e3687b
("mempolicy: apply page table walker on queue_pages_range()") messed up
the return value and didn't break the VMA scan early anymore when
MPOL_MF_STRICT was specified alone. The return value problem was fixed by
commit a7f40cfe3b7a
("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified"),
but it broke the VMA walk early when an unmovable page is met, which may
cause some pages not to be migrated as expected.
The code should conceptually do:

    if (MPOL_MF_MOVE|MOVEALL)
        scan all vmas
        try to migrate the existing pages
        return success
    else if (MPOL_MF_MOVE* | MPOL_MF_STRICT)
        scan all vmas
        try to migrate the existing pages
        return -EIO if unmovable or migration failed
    else /* MPOL_MF_STRICT alone */
        break early if meets unmovable and don't call mbind_range() at all
    else /* none of those flags */
        check the ranges in test_walk, EFAULT without mbind_range() if discontig.
This patch fixes the behavior: when an unmovable page is met, the walk now
records it in queue_pages.has_unmovable and keeps going, and -EIO is only
returned when MPOL_MF_STRICT was also specified.
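For reference, a hedged userspace sketch (not part of the patch) of the second case above: mbind(2) with MPOL_MF_MOVE | MPOL_MF_STRICT should try to migrate every existing page and report EIO if any page was misplaced or unmovable. The x86_64 syscall number and the MPOL_* values, copied from the uapi header, are assumptions here.
```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define NR_MBIND	237		/* x86_64; assumption */
#define MPOL_BIND	2		/* from linux/mempolicy.h */
#define MPOL_MF_STRICT	(1 << 0)
#define MPOL_MF_MOVE	(1 << 1)

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned long nodemask = 1;	/* bind to node 0 */
	char *p = mmap(NULL, 4 * page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;
	memset(p, 0, 4 * page);		/* fault the pages in first */

	/* Migrate the existing pages; EIO means misplaced/unmovable. */
	if (syscall(NR_MBIND, p, 4 * page, MPOL_BIND, &nodemask,
		    8 * sizeof(nodemask), MPOL_MF_MOVE | MPOL_MF_STRICT))
		fprintf(stderr, "mbind: %s\n", strerror(errno));
	return 0;
}
```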
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Rafael Aquini <aquini(a)redhat.com>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: David Rientjes <rientjes(a)google.com>
Cc: <stable(a)vger.kernel.org> # v4.9+
Signed-off-by: Yang Shi <yang(a)os.amperecomputing.com>
---
mm/mempolicy.c | 39 +++++++++++++++++++--------------------
1 file changed, 19 insertions(+), 20 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 42b5567e3773..f1b00d6ac7ee 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -426,6 +426,7 @@ struct queue_pages {
unsigned long start;
unsigned long end;
struct vm_area_struct *first;
+ bool has_unmovable;
};
/*
@@ -446,9 +447,8 @@ static inline bool queue_folio_required(struct folio *folio,
/*
* queue_folios_pmd() has three possible return values:
* 0 - folios are placed on the right node or queued successfully, or
- * special page is met, i.e. huge zero page.
- * 1 - there is unmovable folio, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
- * specified.
+ * special page is met, i.e. zero page, or unmovable page is found
+ * but continue walking (indicated by queue_pages.has_unmovable).
* -EIO - is migration entry or only MPOL_MF_STRICT was specified and an
* existing folio was already on a node that does not follow the
* policy.
@@ -479,7 +479,7 @@ static int queue_folios_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
if (!vma_migratable(walk->vma) ||
migrate_folio_add(folio, qp->pagelist, flags)) {
- ret = 1;
+ qp->has_unmovable = true;
goto unlock;
}
} else
@@ -495,9 +495,8 @@ static int queue_folios_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
*
* queue_folios_pte_range() has three possible return values:
* 0 - folios are placed on the right node or queued successfully, or
- * special page is met, i.e. zero page.
- * 1 - there is unmovable folio, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
- * specified.
+ * special page is met, i.e. zero page, or unmovable page is found
+ * but continue walking (indicated by queue_pages.has_unmovable).
* -EIO - only MPOL_MF_STRICT was specified and an existing folio was already
* on a node that does not follow the policy.
*/
@@ -508,7 +507,6 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
struct folio *folio;
struct queue_pages *qp = walk->private;
unsigned long flags = qp->flags;
- bool has_unmovable = false;
pte_t *pte, *mapped_pte;
pte_t ptent;
spinlock_t *ptl;
@@ -538,11 +536,12 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
if (!queue_folio_required(folio, qp))
continue;
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
- /* MPOL_MF_STRICT must be specified if we get here */
- if (!vma_migratable(vma)) {
- has_unmovable = true;
- break;
- }
+ /*
+ * MPOL_MF_STRICT must be specified if we get here.
+ * Continue walking vmas due to MPOL_MF_MOVE* flags.
+ */
+ if (!vma_migratable(vma))
+ qp->has_unmovable = true;
/*
* Do not abort immediately since there may be
@@ -550,16 +549,13 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
* need migrate other LRU pages.
*/
if (migrate_folio_add(folio, qp->pagelist, flags))
- has_unmovable = true;
+ qp->has_unmovable = true;
} else
break;
}
pte_unmap_unlock(mapped_pte, ptl);
cond_resched();
- if (has_unmovable)
- return 1;
-
return addr != end ? -EIO : 0;
}
@@ -599,7 +595,7 @@ static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask,
* Detecting misplaced folio but allow migrating folios which
* have been queued.
*/
- ret = 1;
+ qp->has_unmovable = true;
goto unlock;
}
@@ -620,7 +616,7 @@ static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask,
* Failed to isolate folio but allow migrating pages
* which have been queued.
*/
- ret = 1;
+ qp->has_unmovable = true;
}
unlock:
spin_unlock(ptl);
@@ -756,12 +752,15 @@ queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
.start = start,
.end = end,
.first = NULL,
+ .has_unmovable = false,
};
const struct mm_walk_ops *ops = lock_vma ?
&queue_pages_lock_vma_walk_ops : &queue_pages_walk_ops;
err = walk_page_range(mm, start, end, ops, &qp);
+ if (qp.has_unmovable)
+ err = 1;
if (!qp.first)
/* whole range in hole */
err = -EFAULT;
@@ -1358,7 +1357,7 @@ static long do_mbind(unsigned long start, unsigned long len,
putback_movable_pages(&pagelist);
}
- if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
+ if (((ret > 0) || nr_failed) && (flags & MPOL_MF_STRICT))
err = -EIO;
} else {
up_out:
--
2.39.0
The quilt patch titled
Subject: mmap: fix vma_iterator in error path of vma_merge()
has been removed from the -mm tree. Its filename was
mmap-fix-vma_iterator-in-error-path-of-vma_merge.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: mmap: fix vma_iterator in error path of vma_merge()
Date: Wed, 27 Sep 2023 12:07:44 -0400
When merging of the previous VMA fails after the vma iterator has been
moved to the previous entry, the vma iterator must be advanced to ensure
the caller takes the correct action on the next vma iterator event. Fix
this by adding a vma_next() call to the error path.
Users may experience higher CPU usage, most likely in very low memory
situations.
Link: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejt…
Link: https://lkml.kernel.org/r/20230927160746.1928098-2-Liam.Howlett@oracle.com
Fixes: 18b098af2890 ("vma_merge: set vma iterator to correct position.")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejt…
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
--- a/mm/mmap.c~mmap-fix-vma_iterator-in-error-path-of-vma_merge
+++ a/mm/mmap.c
@@ -968,14 +968,14 @@ struct vm_area_struct *vma_merge(struct
vma_pgoff = curr->vm_pgoff;
vma_start_write(curr);
remove = curr;
- err = dup_anon_vma(next, curr);
+ err = dup_anon_vma(next, curr, &anon_dup);
}
}
}
/* Error in anon_vma clone. */
if (err)
- return NULL;
+ goto anon_vma_fail;
if (vma_start < vma->vm_start || vma_end > vma->vm_end)
vma_expanded = true;
@@ -988,7 +988,7 @@ struct vm_area_struct *vma_merge(struct
}
if (vma_iter_prealloc(vmi, vma))
- return NULL;
+ goto prealloc_fail;
init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
@@ -1016,6 +1016,12 @@ struct vm_area_struct *vma_merge(struct
vma_complete(&vp, vmi, mm);
khugepaged_enter_vma(res, vm_flags);
return res;
+
+prealloc_fail:
+anon_vma_fail:
+ if (merge_prev)
+ vma_next(vmi);
+ return NULL;
}
/*
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-add-mas_active-to-detect-in-tree-walks.patch
maple_tree-add-mas_underflow-and-mas_overflow-states.patch
mmap-fix-error-paths-with-dup_anon_vma.patch
mmap-add-clarifying-comment-to-vma_merge-code.patch
The quilt patch titled
Subject: mm: lock VMAs skipped by a failed queue_pages_range()
has been removed from the -mm tree. Its filename was
mm-lock-vmas-skipped-by-a-failed-queue_pages_range.patch
This patch was dropped because it is obsolete
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: mm: lock VMAs skipped by a failed queue_pages_range()
Date: Mon, 18 Sep 2023 14:16:08 -0700
When queue_pages_range() encounters an unmovable page, it terminates its
page walk. This walk, among other things, locks the VMAs in the range.
This termination might result in some VMAs being left unlocked after
queue_pages_range() completes. Since do_mbind() continues to operate on
these VMAs despite the failure from queue_pages_range(), it will encounter
an unlocked VMA, leading to a BUG().
This mbind() behavior has been modified several times before and might
need some changes to either finish the page walk even in the presence of
unmovable pages or to error out immediately after the failure to
queue_pages_range(). However that requires more discussions, so to fix
the immediate issue, explicitly lock the VMAs in the range if
queue_pages_range() failed. The added condition does not save much but is
added for documentation purposes to understand when this extra locking is
needed.
Link: https://lkml.kernel.org/r/20230918211608.3580629-1-surenb@google.com
Fixes: 49b0638502da ("mm: enable page walking API to lock vmas during the walk")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: syzbot+b591856e0f0139f83023(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000f392a60604a65085@google.com/
Acked-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 3 +++
1 file changed, 3 insertions(+)
--- a/mm/mempolicy.c~mm-lock-vmas-skipped-by-a-failed-queue_pages_range
+++ a/mm/mempolicy.c
@@ -1342,6 +1342,9 @@ static long do_mbind(unsigned long start
vma_iter_init(&vmi, mm, start);
prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
+ /* If queue_pages_range failed then not all VMAs might be locked */
+ if (ret)
+ vma_start_write(vma);
err = mbind_range(&vmi, vma, &prev, start, end, new);
if (err)
break;
_
Patches currently in -mm which might be from surenb(a)google.com are
selftests-mm-add-uffdio_remap-ioctl-test.patch
svm_leave_nested(), similar to a nested VM exit, gets the vCPU out of nested
mode and thus should end the local inhibition of AVIC on this vCPU.
Failure to do so can lead to hangs on guest reboot.
Raise the KVM_REQ_APICV_UPDATE request to refresh the AVIC state of the
current vCPU in this case.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
---
arch/x86/kvm/svm/nested.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index dd496c9e5f91f28..3fea8c47679e689 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1253,6 +1253,9 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)
nested_svm_uninit_mmu_context(vcpu);
vmcb_mark_all_dirty(svm->vmcb);
+
+ if (kvm_apicv_activated(vcpu->kvm))
+ kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
}
kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
--
2.26.3
The following problem has existed since x2avic was enabled in KVM:
svm_set_x2apic_msr_interception() is called to enable the interception of
the x2apic MSRs.
In particular, it is called at the moment the guest resets its APIC.
Assuming that the guest's APIC was in x2apic mode, the reset will bring
it back to xapic mode.
svm_set_x2apic_msr_interception(), however, has an erroneous check for
'!apic_x2apic_mode()' which prevents it from doing anything in this case.
As a result, all x2apic MSRs are left unintercepted, and that
exposes the bare-metal x2apic (if enabled) to the guest.
Oops.
Remove the erroneous '!apic_x2apic_mode()' check to fix that.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
---
arch/x86/kvm/svm/svm.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9507df93f410a63..acdd0b89e4715a3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -913,8 +913,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
if (intercept == svm->x2avic_msrs_intercepted)
return;
- if (!x2avic_enabled ||
- !apic_x2apic_mode(svm->vcpu.arch.apic))
+ if (!x2avic_enabled)
return;
for (i = 0; i < MAX_DIRECT_ACCESS_MSRS; i++) {
--
2.26.3
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 737dd811a3dbfd7edd4ad2ba5152e93d99074f83
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092019-aptitude-device-3909@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
737dd811a3db ("ata: libahci: clear pending interrupt status")
93c7711494f4 ("ata: ahci: Drop pointless VPRINTK() calls and convert the remaining ones")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 737dd811a3dbfd7edd4ad2ba5152e93d99074f83 Mon Sep 17 00:00:00 2001
From: Szuying Chen <chensiying21(a)gmail.com>
Date: Thu, 7 Sep 2023 16:17:10 +0800
Subject: [PATCH] ata: libahci: clear pending interrupt status
When a CRC error occurs, the HBA asserts an interrupt to indicate an
interface fatal error (PxIS.IFS). The ISR clears PxIE and PxIS, then
does error recovery. If the adapter receives another SDB FIS
with an error (PxIS.TFES) from the device before the start of the EH
recovery process, the interrupt signaling the new SDB cannot be
serviced as PxIE was cleared already. This in turn results in the HBA
being unable to issue any command during the error recovery process after
setting PxCMD.ST to 1 because PxIS.TFES is still set.
According to AHCI 1.3.1 specifications section 6.2.2, fatal errors
notified by setting PxIS.HBFS, PxIS.HBDS, PxIS.IFS or PxIS.TFES will
cause the HBA to enter the ERR:Fatal state. In this state, the HBA
shall not issue any new commands.
To avoid this situation, introduce the function
ahci_port_clear_pending_irq() to clear pending interrupts before
executing a COMRESET. This follows the AHCI 1.3.1 - section 6.2.2.2
specification.
Signed-off-by: Szuying Chen <Chloe_Chen(a)asmedia.com.tw>
Fixes: e0bfd149973d ("[PATCH] ahci: stop engine during hard reset")
Cc: stable(a)vger.kernel.org
Reviewed-by: Niklas Cassel <niklas.cassel(a)wdc.com>
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index e2bacedf28ef..f1263364fa97 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -1256,6 +1256,26 @@ static ssize_t ahci_activity_show(struct ata_device *dev, char *buf)
return sprintf(buf, "%d\n", emp->blink_policy);
}
+static void ahci_port_clear_pending_irq(struct ata_port *ap)
+{
+ struct ahci_host_priv *hpriv = ap->host->private_data;
+ void __iomem *port_mmio = ahci_port_base(ap);
+ u32 tmp;
+
+ /* clear SError */
+ tmp = readl(port_mmio + PORT_SCR_ERR);
+ dev_dbg(ap->host->dev, "PORT_SCR_ERR 0x%x\n", tmp);
+ writel(tmp, port_mmio + PORT_SCR_ERR);
+
+ /* clear port IRQ */
+ tmp = readl(port_mmio + PORT_IRQ_STAT);
+ dev_dbg(ap->host->dev, "PORT_IRQ_STAT 0x%x\n", tmp);
+ if (tmp)
+ writel(tmp, port_mmio + PORT_IRQ_STAT);
+
+ writel(1 << ap->port_no, hpriv->mmio + HOST_IRQ_STAT);
+}
+
static void ahci_port_init(struct device *dev, struct ata_port *ap,
int port_no, void __iomem *mmio,
void __iomem *port_mmio)
@@ -1270,18 +1290,7 @@ static void ahci_port_init(struct device *dev, struct ata_port *ap,
if (rc)
dev_warn(dev, "%s (%d)\n", emsg, rc);
- /* clear SError */
- tmp = readl(port_mmio + PORT_SCR_ERR);
- dev_dbg(dev, "PORT_SCR_ERR 0x%x\n", tmp);
- writel(tmp, port_mmio + PORT_SCR_ERR);
-
- /* clear port IRQ */
- tmp = readl(port_mmio + PORT_IRQ_STAT);
- dev_dbg(dev, "PORT_IRQ_STAT 0x%x\n", tmp);
- if (tmp)
- writel(tmp, port_mmio + PORT_IRQ_STAT);
-
- writel(1 << port_no, mmio + HOST_IRQ_STAT);
+ ahci_port_clear_pending_irq(ap);
/* mark esata ports */
tmp = readl(port_mmio + PORT_CMD);
@@ -1603,6 +1612,8 @@ int ahci_do_hardreset(struct ata_link *link, unsigned int *class,
tf.status = ATA_BUSY;
ata_tf_to_fis(&tf, 0, 0, d2h_fis);
+ ahci_port_clear_pending_irq(ap);
+
rc = sata_link_hardreset(link, timing, deadline, online,
ahci_check_ready);
In later revisions of AMD's APM, there is a new 'incomplete IPI' exit code:
"Invalid IPI Vector - The vector for the specified IPI was set to an
illegal value (VEC < 16)"
Note that tests on a Zen2 machine show that this VM exit doesn't happen and
instead AVIC just does nothing.
Add support for this exit code by doing nothing, instead of filling
the kernel log with errors.
Also replace the unthrottled pr_err() that fires when another unknown
incomplete-IPI exit happens with WARN_ONCE()
(e.g. in case AMD adds yet another 'Invalid IPI' exit reason).
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
---
arch/x86/include/asm/svm.h | 1 +
arch/x86/kvm/svm/avic.c | 5 ++++-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 19bf955b67e0da0..3ac0ffc4f3e202b 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -268,6 +268,7 @@ enum avic_ipi_failure_cause {
AVIC_IPI_FAILURE_TARGET_NOT_RUNNING,
AVIC_IPI_FAILURE_INVALID_TARGET,
AVIC_IPI_FAILURE_INVALID_BACKING_PAGE,
+ AVIC_IPI_FAILURE_INVALID_IPI_VECTOR,
};
#define AVIC_PHYSICAL_MAX_INDEX_MASK GENMASK_ULL(8, 0)
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 2092db892d7d052..c44b65af494e3ff 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -529,8 +529,11 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
case AVIC_IPI_FAILURE_INVALID_BACKING_PAGE:
WARN_ONCE(1, "Invalid backing page\n");
break;
+ case AVIC_IPI_FAILURE_INVALID_IPI_VECTOR:
+ /* Invalid IPI with vector < 16 */
+ break;
default:
- pr_err("Unknown IPI interception\n");
+ WARN_ONCE(1, "Unknown avic incomplete IPI interception\n");
}
return 1;
--
2.26.3
#regzbot introduced v6.1.52..v6.1.53
#regzbot introduced: ed134f284b4ed85a70d5f760ed0686e3cd555f9b
We hit this regression when updating our guest VM kernel from 6.1.52 to
6.1.53 -- bisecting shows the problem was introduced
in ed134f284b4ed85a70d5f760ed0686e3cd555f9b -- vfs, security: Fix automount
superblock LSM init problem, preventing NFS sb sharing --
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
We're getting an EINVAL in `selinux_set_mnt_opts` in
`security/selinux/hooks.c` when mounting a folder in a guest VM where
selinux is disabled. We're mounting from another folder that we suspect has
selinux labels set from the host. The EINVAL is getting set in the
following block...
```
if (!selinux_initialized(&selinux_state)) {
if (!opts) {
/* Defer initialization until selinux_complete_init,
after the initial policy is loaded and the security
server is ready to handle calls. */
goto out;
}
rc = -EINVAL;
pr_warn("SELinux: Unable to set superblock options "
"before the security server is initialized\n");
goto out;
}
```
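For what it's worth, a minimal sketch that we believe should hit this branch (our assumption, not the exact kata-containers mount path) is passing an SELinux context mount option before any policy is loaded, which makes opts non-NULL:
```c
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
	/* Needs CAP_SYS_ADMIN; the context value is only illustrative. */
	if (mount("none", "/mnt", "tmpfs", 0,
		  "context=system_u:object_r:tmp_t:s0"))
		perror("mount");	/* expect EINVAL before policy load */
	return 0;
}
```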
We can reproduce 100% of the time but don't currently have a simple
reproducer as the problem was found in our build service which uses
kata-containers (with cloud-hypervisor and rootfs mounted via virtio-blk).
We have not checked the mainline as we currently are tied to 6.1.x.
-Simon
commit 18789be8e0d9fbb78b2290dcf93f500726ed19f0 upstream.
Please apply to 6.4 and 6.5.
Do not allow the CS35L56 to be put into its lowest power
"hibernation" mode. This only affects I2C because "hibernation"
is already disabled on SPI and SoundWire.
Recent firmwares need a different wake-up sequence. Until
that sequence has been specified, the chip "hibernation" mode
must be disabled, otherwise it can intermittently fail to wake.
Backport note: This is the same change as the upstream commit, deleting
one line, but the upstream commit would not apply cleanly on older
branches because of minor differences in the surrounding code.
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Link: https://lore.kernel.org/r/20230912133841.3480466-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
sound/soc/codecs/cs35l56-i2c.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/sound/soc/codecs/cs35l56-i2c.c b/sound/soc/codecs/cs35l56-i2c.c
index c613a2554fa3..494adabd4f43 100644
--- a/sound/soc/codecs/cs35l56-i2c.c
+++ b/sound/soc/codecs/cs35l56-i2c.c
@@ -27,7 +27,6 @@ static int cs35l56_i2c_probe(struct i2c_client *client)
return -ENOMEM;
cs35l56->dev = dev;
- cs35l56->can_hibernate = true;
i2c_set_clientdata(client, cs35l56);
cs35l56->regmap = devm_regmap_init_i2c(client, regmap_config);
--
2.30.2
Hi Greg, Sasha,
The following small batch contains two more fixes for a WARNING splat on
chain unregistration and UaF in the flowtable unregistration that is
exercised from netns path for 5.10 -stable.
I am using original commit IDs for reference:
1) 6069da443bf6 ("netfilter: nf_tables: unregister flowtable hooks on netns exit")
2) f9a43007d3f7 ("netfilter: nf_tables: double hook unregistration in netns path")
Please, apply.
Thanks.
Pablo Neira Ayuso (2):
netfilter: nf_tables: unregister flowtable hooks on netns exit
netfilter: nf_tables: double hook unregistration in netns path
net/netfilter/nf_tables_api.c | 68 ++++++++++++++++++++++++++---------
1 file changed, 52 insertions(+), 16 deletions(-)
--
2.30.2
Hi,
Some hangs are reported on GFX9 hardware related to pre-emption. The
hangs that are root caused to this are fixed by this commit from 6.6:
8cbbd11547f6 ("drm/amdgpu: set completion status as preempted for the
resubmission")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2447#note_2100975
Can you please bring this back to 6.5.y?
Thanks,
PIPE_CONTROL_FLUSH_L3 is not needed for aux invalidation
so don't set that.
Fixes: 78a6ccd65fa3 ("drm/i915/gt: Ensure memory quiesced before invalidation")
Cc: Jonathan Cavitt <jonathan.cavitt(a)intel.com>
Cc: Andi Shyti <andi.shyti(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v5.8+
Cc: Andrzej Hajda <andrzej.hajda(a)intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Cc: Matt Roper <matthew.d.roper(a)intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay(a)intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
Cc: Prathap Kumar Valsan <prathap.kumar.valsan(a)intel.com>
Cc: Tapani Pälli <tapani.palli(a)intel.com>
Cc: Mark Janes <mark.janes(a)intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das(a)intel.com>
---
drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
index 0143445dba83..ba4c2422b340 100644
--- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
@@ -271,8 +271,17 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 mode)
if (GRAPHICS_VER_FULL(rq->i915) >= IP_VER(12, 70))
bit_group_0 |= PIPE_CONTROL_CCS_FLUSH;
+ /*
+ * L3 fabric flush is needed for AUX CCS invalidation
+ * which happens as part of pipe-control so we can
+ * ignore PIPE_CONTROL_FLUSH_L3. Also PIPE_CONTROL_FLUSH_L3
+ * deals with Protected Memory which is not needed for
+ * AUX CCS invalidation and lead to unwanted side effects.
+ */
+ if (mode & EMIT_FLUSH)
+ bit_group_1 |= PIPE_CONTROL_FLUSH_L3;
+
bit_group_1 |= PIPE_CONTROL_TILE_CACHE_FLUSH;
- bit_group_1 |= PIPE_CONTROL_FLUSH_L3;
bit_group_1 |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
bit_group_1 |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
/* Wa_1409600907:tgl,adl-p */
--
2.41.0
Per the "SMC calling convention specification", the 64-bit calling
convention can only be used when the client is 64-bit. Whereas the
32-bit calling convention can be used by either a 32-bit or a 64-bit
client.
Currently during SCM probe, irrespective of the client, 64-bit calling
convention is made, which is incorrect and may lead to the undefined
behaviour when the client is 32-bit. Let's fix it.
Cc: stable(a)vger.kernel.org
Fixes: 9a434cee773a ("firmware: qcom_scm: Dynamically support SMCCC and legacy conventions")
Reviewed-by: Elliot Berman <quic_eberman(a)quicinc.com>
Signed-off-by: Kathiravan Thirumoorthy <quic_kathirav(a)quicinc.com>
---
Changes in V3:
- reworded the commit title and msg
- pick up the R-b tag
Changes in V2:
- Added the Fixes tag and cc'd stable mailing list
---
drivers/firmware/qcom_scm.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
index c2c7fafef34b..520de9b5633a 100644
--- a/drivers/firmware/qcom_scm.c
+++ b/drivers/firmware/qcom_scm.c
@@ -215,6 +215,12 @@ static enum qcom_scm_convention __get_convention(void)
if (likely(qcom_scm_convention != SMC_CONVENTION_UNKNOWN))
return qcom_scm_convention;
+ /*
+ * Per the "SMC calling convention specification", the 64-bit calling
+ * convention can only be used when the client is 64-bit, otherwise
+ * system will encounter the undefined behaviour.
+ */
+#if IS_ENABLED(CONFIG_ARM64)
/*
* Device isn't required as there is only one argument - no device
* needed to dma_map_single to secure world
@@ -235,6 +241,7 @@ static enum qcom_scm_convention __get_convention(void)
forced = true;
goto found;
}
+#endif
probed_convention = SMC_CONVENTION_ARM_32;
ret = __scm_smc_call(NULL, &desc, probed_convention, &res, true);
---
base-commit: 8fff9184d1b5810dca5dd1a02726d4f844af88fc
change-id: 20230925-scm-d62c6cd1947b
Best regards,
--
Kathiravan Thirumoorthy <quic_kathirav(a)quicinc.com>
The patch titled
Subject: mmap: fix error paths with dup_anon_vma()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mmap-fix-error-paths-with-dup_anon_vma.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: mmap: fix error paths with dup_anon_vma()
Date: Wed, 27 Sep 2023 12:07:45 -0400
When the calling function fails after the dup_anon_vma(), the duplication
of the anon_vma is not being undone. Add the necessary unlink_anon_vmas()
call to the error paths that are missing them.
This issue showed up during inspection of the error path in vma_merge()
for an unrelated vma iterator issue.
Users may experience increased memory usage, which may be problematic as
the failure would likely be caused by a low memory situation.
Link: https://lkml.kernel.org/r/20230927160746.1928098-3-Liam.Howlett@oracle.com
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Cc: Jann Horn <jannh(a)google.com>
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
--- a/mm/mmap.c~mmap-fix-error-paths-with-dup_anon_vma
+++ a/mm/mmap.c
@@ -587,7 +587,7 @@ again:
* Returns: 0 on success.
*/
static inline int dup_anon_vma(struct vm_area_struct *dst,
- struct vm_area_struct *src)
+ struct vm_area_struct *src, struct vm_area_struct **dup)
{
/*
* Easily overlooked: when mprotect shifts the boundary, make sure the
@@ -597,6 +597,7 @@ static inline int dup_anon_vma(struct vm
if (src->anon_vma && !dst->anon_vma) {
vma_assert_write_locked(dst);
dst->anon_vma = src->anon_vma;
+ *dup = dst;
return anon_vma_clone(dst, src);
}
@@ -624,6 +625,7 @@ int vma_expand(struct vma_iterator *vmi,
unsigned long start, unsigned long end, pgoff_t pgoff,
struct vm_area_struct *next)
{
+ struct vm_area_struct *anon_dup = NULL;
bool remove_next = false;
struct vma_prepare vp;
@@ -633,7 +635,7 @@ int vma_expand(struct vma_iterator *vmi,
remove_next = true;
vma_start_write(next);
- ret = dup_anon_vma(vma, next);
+ ret = dup_anon_vma(vma, next, &anon_dup);
if (ret)
return ret;
}
@@ -661,6 +663,8 @@ int vma_expand(struct vma_iterator *vmi,
return 0;
nomem:
+ if (anon_dup)
+ unlink_anon_vmas(anon_dup);
return -ENOMEM;
}
@@ -860,6 +864,7 @@ struct vm_area_struct *vma_merge(struct
{
struct vm_area_struct *curr, *next, *res;
struct vm_area_struct *vma, *adjust, *remove, *remove2;
+ struct vm_area_struct *anon_dup = NULL;
struct vma_prepare vp;
pgoff_t vma_pgoff;
int err = 0;
@@ -927,18 +932,18 @@ struct vm_area_struct *vma_merge(struct
vma_start_write(next);
remove = next; /* case 1 */
vma_end = next->vm_end;
- err = dup_anon_vma(prev, next);
+ err = dup_anon_vma(prev, next, &anon_dup);
if (curr) { /* case 6 */
vma_start_write(curr);
remove = curr;
remove2 = next;
if (!next->anon_vma)
- err = dup_anon_vma(prev, curr);
+ err = dup_anon_vma(prev, curr, &anon_dup);
}
} else if (merge_prev) { /* case 2 */
if (curr) {
vma_start_write(curr);
- err = dup_anon_vma(prev, curr);
+ err = dup_anon_vma(prev, curr, &anon_dup);
if (end == curr->vm_end) { /* case 7 */
remove = curr;
} else { /* case 5 */
@@ -954,7 +959,7 @@ struct vm_area_struct *vma_merge(struct
vma_end = addr;
adjust = next;
adj_start = -(prev->vm_end - addr);
- err = dup_anon_vma(next, prev);
+ err = dup_anon_vma(next, prev, &anon_dup);
} else {
/*
* Note that cases 3 and 8 are the ONLY ones where prev
@@ -1018,6 +1023,9 @@ struct vm_area_struct *vma_merge(struct
return res;
prealloc_fail:
+ if (anon_dup)
+ unlink_anon_vmas(anon_dup);
+
anon_vma_fail:
if (merge_prev)
vma_next(vmi);
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-add-mas_active-to-detect-in-tree-walks.patch
maple_tree-add-mas_underflow-and-mas_overflow-states.patch
mmap-fix-vma_iterator-in-error-path-of-vma_merge.patch
mmap-fix-error-paths-with-dup_anon_vma.patch
mmap-add-clarifying-comment-to-vma_merge-code.patch
The patch titled
Subject: mmap: fix vma_iterator in error path of vma_merge()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mmap-fix-vma_iterator-in-error-path-of-vma_merge.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: mmap: fix vma_iterator in error path of vma_merge()
Date: Wed, 27 Sep 2023 12:07:44 -0400
When merging of the previous VMA fails after the vma iterator has been
moved to the previous entry, the vma iterator must be advanced to ensure
the caller takes the correct action on the next vma iterator event. Fix
this by adding a vma_next() call to the error path.
Users may experience higher CPU usage, most likely in very low memory
situations.
Link: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejt…
Link: https://lkml.kernel.org/r/20230927160746.1928098-2-Liam.Howlett@oracle.com
Fixes: 18b098af2890 ("vma_merge: set vma iterator to correct position.")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejt…
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
--- a/mm/mmap.c~mmap-fix-vma_iterator-in-error-path-of-vma_merge
+++ a/mm/mmap.c
@@ -968,14 +968,14 @@ struct vm_area_struct *vma_merge(struct
vma_pgoff = curr->vm_pgoff;
vma_start_write(curr);
remove = curr;
- err = dup_anon_vma(next, curr);
+ err = dup_anon_vma(next, curr, &anon_dup);
}
}
}
/* Error in anon_vma clone. */
if (err)
- return NULL;
+ goto anon_vma_fail;
if (vma_start < vma->vm_start || vma_end > vma->vm_end)
vma_expanded = true;
@@ -988,7 +988,7 @@ struct vm_area_struct *vma_merge(struct
}
if (vma_iter_prealloc(vmi, vma))
- return NULL;
+ goto prealloc_fail;
init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
@@ -1016,6 +1016,12 @@ struct vm_area_struct *vma_merge(struct
vma_complete(&vp, vmi, mm);
khugepaged_enter_vma(res, vm_flags);
return res;
+
+prealloc_fail:
+anon_vma_fail:
+ if (merge_prev)
+ vma_next(vmi);
+ return NULL;
}
/*
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-add-mas_active-to-detect-in-tree-walks.patch
maple_tree-add-mas_underflow-and-mas_overflow-states.patch
mmap-fix-vma_iterator-in-error-path-of-vma_merge.patch
mmap-fix-error-paths-with-dup_anon_vma.patch
mmap-add-clarifying-comment-to-vma_merge-code.patch
Upstream commit 717c6ec241b5 ("can: sja1000: Prevent overrun stalls with
a soft reset on Renesas SoCs") fixes an issue with Renesas own SJA1000
CAN controller reception: the Rx buffer is only 5 messages long, so when
the bus is loaded (e.g. a message every 50us), overruns may easily
happen. Upon an overrun situation, due to a possible internal crosstalk
situation, the controller enters a frozen state which can only be
unlocked with a soft reset (found experimentally). The solution was to offload
a call to sja1000_start() in a threaded handler. This needs to happen in
process context as this operation requires to sleep. sja1000_start()
basically enters "reset mode", performs a proper software reset and
returns back into "normal mode".
Since this fix was introduced, we no longer observe any stalls in
reception. However it was sporadically observed that the transmit path
would now freeze. Further investigation blamed the fix mentioned above,
and especially the reset operation. Reproducing the reset in a loop
helped identify what could possibly go wrong. The sja1000 is a single
Tx queue device, which leverages the netdev helpers to process one Tx
message at a time. The logic is: the queue is stopped, the message sent
to the transceiver, once properly transmitted the controller sets a
status bit which triggers an interrupt, in the interrupt handler the
transmission status is checked and the queue woken up. Unfortunately, if
an overrun happens, we might perform the soft reset precisely between
the transmission of the buffer to the transceiver and the advent of the
transmission status bit. We would then stop the transmission operation
without re-enabling the queue, leading to all further transmissions to
be ignored.
The reset interrupt can only happen while the device is "open", and
after a reset we anyway want to resume normal operations, no matter if a
packet to transmit got dropped in the process, so we shall wake up the
queue. Restarting the device and waking-up the queue is exactly what
sja1000_set_mode(CAN_MODE_START) does. In order to be consistent about
the queue state, we must acquire a lock both in the reset handler and in
the transmit path to ensure serialization of both operations. As the
reset handler might still be called after the transmission of a frame to
the transceiver but before it actually gets transmitted, we must ensure
we don't leak the skb, so we free it (the behavior is consistent, no
matter if there was an skb on the stack or not).
Fixes: 717c6ec241b5 ("can: sja1000: Prevent overrun stalls with a soft reset on Renesas SoCs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
This patch was written and tested on a slightly older stable kernel as
this is the kernel that runs on the boards which showed the problem, but
there should be no difference with upstream kernels.
drivers/net/can/sja1000/sja1000.c | 13 ++++++++++++-
drivers/net/can/sja1000/sja1000.h | 1 +
drivers/net/can/sja1000/sja1000_platform.c | 2 ++
3 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/net/can/sja1000/sja1000.c b/drivers/net/can/sja1000/sja1000.c
index ae47fc72aa96..fe1d818f5d51 100644
--- a/drivers/net/can/sja1000/sja1000.c
+++ b/drivers/net/can/sja1000/sja1000.c
@@ -297,6 +297,8 @@ static netdev_tx_t sja1000_start_xmit(struct sk_buff *skb,
if (can_dropped_invalid_skb(dev, skb))
return NETDEV_TX_OK;
+ spin_lock(&priv->tx_lock);
+
netif_stop_queue(dev);
fi = dlc = cf->can_dlc;
@@ -335,6 +337,8 @@ static netdev_tx_t sja1000_start_xmit(struct sk_buff *skb,
sja1000_write_cmdreg(priv, cmd_reg_val);
+ spin_unlock(&priv->tx_lock);
+
return NETDEV_TX_OK;
}
@@ -394,9 +398,16 @@ static void sja1000_rx(struct net_device *dev)
static irqreturn_t sja1000_reset_interrupt(int irq, void *dev_id)
{
struct net_device *dev = (struct net_device *)dev_id;
+ struct sja1000_priv *priv = netdev_priv(dev);
netdev_dbg(dev, "performing a soft reset upon overrun\n");
- sja1000_start(dev);
+
+ spin_lock(&priv->tx_lock);
+
+ can_free_echo_skb(dev, 0);
+ sja1000_set_mode(dev, CAN_MODE_START);
+
+ spin_unlock(&priv->tx_lock);
return IRQ_HANDLED;
}
diff --git a/drivers/net/can/sja1000/sja1000.h b/drivers/net/can/sja1000/sja1000.h
index 9f041d027dcc..85def6329edb 100644
--- a/drivers/net/can/sja1000/sja1000.h
+++ b/drivers/net/can/sja1000/sja1000.h
@@ -166,6 +166,7 @@ struct sja1000_priv {
void __iomem *reg_base; /* ioremap'ed address to registers */
unsigned long irq_flags; /* for request_irq() */
spinlock_t cmdreg_lock; /* lock for concurrent cmd register writes */
+ spinlock_t tx_lock; /* lock for serializing transmissions and soft resets */
u16 flags; /* custom mode flags */
u8 ocr; /* output control register */
diff --git a/drivers/net/can/sja1000/sja1000_platform.c b/drivers/net/can/sja1000/sja1000_platform.c
index f33bad164813..ca3bcab461d8 100644
--- a/drivers/net/can/sja1000/sja1000_platform.c
+++ b/drivers/net/can/sja1000/sja1000_platform.c
@@ -305,6 +305,8 @@ static int sp_probe(struct platform_device *pdev)
priv->can.clock.freq = clk_get_rate(clk) / 2;
}
+ spin_lock_init(&priv->tx_lock);
+
platform_set_drvdata(pdev, dev);
SET_NETDEV_DEV(dev, &pdev->dev);
--
2.34.1
The patch titled
Subject: mm/page_alloc: correct start page when guard page debug is enabled
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-page_alloc-correct-start-page-when-guard-page-debug-is-enabled.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Kemeng Shi <shikemeng(a)huaweicloud.com>
Subject: mm/page_alloc: correct start page when guard page debug is enabled
Date: Wed, 27 Sep 2023 17:44:01 +0800
When guard page debug is enabled and set_page_guard() returns success, we
fail to advance 'page' to point to the start of the next split range, so we
will unexpectedly split a page range that does not contain the target page.
Move the start page update before set_page_guard() to fix this.
As we split at the wrong target page, the split pages cannot merge back to
the original order when the target page is put back, and the split pages
other than the target page are not usable. To be specific:
Consider the target page is the third page in a buddy page of order 2.
| buddy-2 | Page | Target | Page |
After breaking down to the target page, we will only set the first page to
Guard because of the bug.
| Guard | Page | Target | Page |
When we try put_page_back_buddy() with the target page, the buddy page of
the target is neither guard nor buddy, so it is not possible to reconstruct
the original page of order 2.
| Guard | Page | buddy-0 | Page |
All pages except the target page are not in the free list and are not usable.
Link: https://lkml.kernel.org/r/20230927094401.68205-1-shikemeng@huaweicloud.com
Fixes: 06be6ff3d2ec ("mm,hwpoison: rework soft offline for free pages")
Signed-off-by: Kemeng Shi <shikemeng(a)huaweicloud.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/page_alloc.c~mm-page_alloc-correct-start-page-when-guard-page-debug-is-enabled
+++ a/mm/page_alloc.c
@@ -6475,6 +6475,7 @@ static void break_down_buddy_pages(struc
next_page = page;
current_buddy = page + size;
}
+ page = next_page;
if (set_page_guard(zone, current_buddy, high, migratetype))
continue;
@@ -6482,7 +6483,6 @@ static void break_down_buddy_pages(struc
if (current_buddy != target) {
add_to_free_list(current_buddy, zone, high, migratetype);
set_buddy_order(current_buddy, high);
- page = next_page;
}
}
}
_
Patches currently in -mm which might be from shikemeng(a)huaweicloud.com are
mm-page_alloc-correct-start-page-when-guard-page-debug-is-enabled.patch
mm-compaction-use-correct-list-in-move_freelist_head-tail.patch
mm-compaction-call-list_is_first-last-more-intuitively-in-move_freelist_head-tail.patch
mm-compaction-correctly-return-failure-with-bogus-compound_order-in-strict-mode.patch
mm-compaction-remove-repeat-compact_blockskip_flush-check-in-reset_isolation_suitable.patch
mm-compaction-improve-comment-of-is_via_compact_memory.patch
mm-compaction-factor-out-code-to-test-if-we-should-run-compaction-for-target-order.patch
The following commit has been merged into the timers/core branch of tip:
Commit-ID: 1a6a464774947920dcedcf7409be62495c7cedd0
Gitweb: https://git.kernel.org/tip/1a6a464774947920dcedcf7409be62495c7cedd0
Author: Frederic Weisbecker <frederic(a)kernel.org>
AuthorDate: Tue, 12 Sep 2023 12:44:06 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Wed, 27 Sep 2023 16:54:03 +02:00
timers: Tag (hr)timer softirq as hotplug safe
Specific stress involving frequent CPU-hotplug operations, such as
running rcutorture, may trigger the following message:
"NOHZ tick-stop error: local softirq work is pending, handler #02!!!"
This happens in the CPU-down hotplug process, after
CPUHP_AP_SMPBOOT_THREADS whose teardown callback parks ksoftirqd, and
before the target CPU shuts down through CPUHP_AP_IDLE_DEAD. In this
fragile intermediate state, softirqs waiting for threaded handling may be
forever ignored and eventually reported by the idle task as in the above
example.
However some vectors are known to be safe as long as the corresponding
subsystems have teardown callbacks handling the migration of their
events. The above error message reports pending timers softirq although
this vector can be considered as hotplug safe because the
CPUHP_TIMERS_PREPARE teardown callback performs the necessary migration
of timers after the death of the CPU. Hrtimers also have a similar
hotplug handling.
Therefore this error message, as far as (hr-)timers are concerned, can
be considered spurious and the relevant softirq vectors can be marked as
hotplug safe.
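For reference, a standalone sketch of the mask arithmetic (the enum
order and the mask mirror include/linux/interrupt.h; the check is a
simplification of the idle-path report, so treat it as an illustration):

    #include <stdio.h>

    enum { HI_SOFTIRQ, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
           BLOCK_SOFTIRQ, IRQ_POLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ,
           HRTIMER_SOFTIRQ, RCU_SOFTIRQ };

    #define BIT(n) (1u << (n))
    #define SOFTIRQ_HOTPLUG_SAFE_MASK \
            (BIT(TIMER_SOFTIRQ) | BIT(IRQ_POLL_SOFTIRQ) | \
             BIT(HRTIMER_SOFTIRQ) | BIT(RCU_SOFTIRQ))

    int main(void)
    {
        /* "handler #02" is the pending bitmask: 0x02 == BIT(TIMER_SOFTIRQ) */
        unsigned int pending = BIT(TIMER_SOFTIRQ);

        if (pending & ~SOFTIRQ_HOTPLUG_SAFE_MASK)
            printf("would warn: handler #%02x\n", pending);
        else
            printf("hotplug safe, no warning (pending=%#04x)\n", pending);
        return 0;
    }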
Fixes: 0345691b24c0 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle")
Signed-off-by: Frederic Weisbecker <frederic(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230912104406.312185-6-frederic@kernel.org
---
include/linux/interrupt.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index a92bce4..4a1dc88 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -569,8 +569,12 @@ enum
* 2) rcu_report_dead() reports the final quiescent states.
*
* _ IRQ_POLL: irq_poll_cpu_dead() migrates the queue
+ *
+ * _ (HR)TIMER_SOFTIRQ: (hr)timers_dead_cpu() migrates the queue
*/
-#define SOFTIRQ_HOTPLUG_SAFE_MASK (BIT(RCU_SOFTIRQ) | BIT(IRQ_POLL_SOFTIRQ))
+#define SOFTIRQ_HOTPLUG_SAFE_MASK (BIT(TIMER_SOFTIRQ) | BIT(IRQ_POLL_SOFTIRQ) |\
+ BIT(HRTIMER_SOFTIRQ) | BIT(RCU_SOFTIRQ))
+
/* map softirq index to softirq name. update 'softirq_to_name' in
* kernel/softirq.c when adding a new softirq.
On Mon, Apr 03, 2023 at 09:02:42PM +0200, Marek Vasut wrote:
> Do not generate the HS front and back porch gaps, the HSA gap and
> EOT packet, as per "SN65DSI83 datasheet SLLSEC1I - SEPTEMBER 2012
> - REVISED OCTOBER 2020", page 22, these packets are not required.
> This makes the TI SN65DSI83 bridge work with Samsung DSIM on i.MX8MN.
>
> Signed-off-by: Marek Vasut <marex(a)denx.de>
Hello,
can you please queue this up for kernel v6.1-stable?
commit ca161b259cc84fe1f4a2ce4c73c3832cf6f713f1
Author: Marek Vasut <marex(a)denx.de>
Commit: Robert Foss <rfoss(a)kernel.org>
drm/bridge: ti-sn65dsi83: Do not generate HFP/HBP/HSA and EOT packet
Do not generate the HS front and back porch gaps, the HSA gap and
EOT packet, as per "SN65DSI83 datasheet SLLSEC1I - SEPTEMBER 2012
- REVISED OCTOBER 2020", page 22, these packets are not required.
This makes the TI SN65DSI83 bridge work with Samsung DSIM on i.MX8MN.
Signed-off-by: Marek Vasut <marex(a)denx.de>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Robert Foss <rfoss(a)kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230403190242.224490-1-marex…
It solves a real issue where some displays do not work without it.
Thanks,
Francesco
When allocating the pages for the bss, the start address needs to be
rounded down instead of up.
Otherwise the start of the bss segment may be left unmapped.
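As an illustration, a standalone sketch of the two rounding macros (a
hedged mirror of the definitions in fs/binfmt_elf.c, assuming 4k pages
to match the report below):

    #include <stdio.h>

    #define ELF_MIN_ALIGN    0x1000UL   /* assumption: 4k pages */
    #define ELF_PAGESTART(v) ((v) & ~(ELF_MIN_ALIGN - 1))
    #define ELF_PAGEALIGN(v) (((v) + ELF_MIN_ALIGN - 1) & ~(ELF_MIN_ALIGN - 1))

    int main(void)
    {
        unsigned long start = 0x41ffe8UL;   /* bss start from the report */

        printf("rounded up (old):   %#lx\n", ELF_PAGEALIGN(start)); /* 0x420000 */
        printf("rounded down (new): %#lx\n", ELF_PAGESTART(start)); /* 0x41f000 */
        return 0;
    }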
This was reported to happen on aarch64:
Memory allocated by set_brk():
Before: start=0x420000 end=0x420000
After: start=0x41f000 end=0x420000
The triggering binary looks like this:
Elf file type is EXEC (Executable file)
Entry point 0x400144
There are 4 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x0000000000000178 0x0000000000000178 R E 0x10000
LOAD 0x000000000000ffe8 0x000000000041ffe8 0x000000000041ffe8
0x0000000000000000 0x0000000000000008 RW 0x10000
NOTE 0x0000000000000120 0x0000000000400120 0x0000000000400120
0x0000000000000024 0x0000000000000024 R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
Section to Segment mapping:
Segment Sections...
00 .note.gnu.build-id .text .eh_frame
01 .bss
02 .note.gnu.build-id
03
Reported-by: Sebastian Ott <sebott(a)redhat.com>
Closes: https://lore.kernel.org/lkml/5d49767a-fbdc-fbe7-5fb2-d99ece3168cb@redhat.co…
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
I'm not really familiar with the ELF loading process, so putting this
out as RFC.
An example binary compiled with aarch64-linux-gnu-gcc 13.2.0 is available
at https://test.t-8ch.de/binfmt-bss-repro.bin
---
fs/binfmt_elf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 7b3d2d491407..4008a57d388b 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -112,7 +112,7 @@ static struct linux_binfmt elf_format = {
static int set_brk(unsigned long start, unsigned long end, int prot)
{
- start = ELF_PAGEALIGN(start);
+ start = ELF_PAGESTART(start);
end = ELF_PAGEALIGN(end);
if (end > start) {
/*
---
base-commit: aed8aee11130a954356200afa3f1b8753e8a9482
change-id: 20230914-bss-alloc-f523fa61718c
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Buggy BIOSes may have zero-length records in the FPDT table; as a
result, fpdt_process_subtable() spins in an eternal loop.
Break out of the loop if the record length is zero.
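A minimal sketch of the failure mode (the record layout is an
illustrative assumption, not the exact structures from
drivers/acpi/acpi_fpdt.c):

    /*
     * If a firmware record reports length 0, "offset" never advances and
     * the walk below never terminates; the fix breaks out instead.
     */
    struct record_header {
        unsigned char type;
        unsigned char length;
    };

    static void walk_records(void *table, unsigned int total_len)
    {
        unsigned int offset = 0;

        while (offset < total_len) {
            struct record_header *r =
                (struct record_header *)((char *)table + offset);

            offset += r->length;    /* length == 0: offset is stuck */
            if (!r->length)
                break;              /* the fix: bail out on the FW bug */
        }
    }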
Fixes: d1eb86e59be0 ("ACPI: tables: introduce support for FPDT table")
Cc: stable(a)vger.kernel.org
Signed-off-by: Vasily Khoruzhick <anarsoul(a)gmail.com>
---
drivers/acpi/acpi_fpdt.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/acpi/acpi_fpdt.c b/drivers/acpi/acpi_fpdt.c
index a2056c4c8cb7..53d8f9601a55 100644
--- a/drivers/acpi/acpi_fpdt.c
+++ b/drivers/acpi/acpi_fpdt.c
@@ -194,6 +194,11 @@ static int fpdt_process_subtable(u64 address, u32 subtable_type)
record_header = (void *)subtable_header + offset;
offset += record_header->length;
+ if (!record_header->length) {
+ pr_info(FW_BUG "Zero-length record found.\n");
+ break;
+ }
+
switch (record_header->type) {
case RECORD_S3_RESUME:
if (subtable_type != SUBTABLE_S3PT) {
--
2.42.0
high_memory is the virtual address of the 'highest physical address' in
the system. But __va(get_num_physpages() << PAGE_SHIFT) is not what we
want because there may be holes in the physical address space. On the
other hand, max_low_pfn is calculated from memblock_end_of_DRAM(), which
corresponds exactly to the highest physical address, so use it for the
high_memory calculation.
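A hedged numeric sketch with a hypothetical hole in the physical map
(the addresses and 16k page size are illustrative assumptions):

    #include <stdio.h>

    int main(void)
    {
        unsigned long page_shift = 14;  /* 16k pages */
        /* Hypothetical layout: RAM at [0, 256M) and [512M, 768M). */
        unsigned long num_physpages = (512UL << 20) >> page_shift;
        unsigned long max_low_pfn = (768UL << 20) >> page_shift;

        /* Points into the hole at 512M -- wrong. */
        printf("get_num_physpages() top: %#lx\n", num_physpages << page_shift);
        /* Real end of DRAM at 768M -- right. */
        printf("max_low_pfn top:         %#lx\n", max_low_pfn << page_shift);
        return 0;
    }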
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Chong Qiao <qiaochong(a)loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
arch/loongarch/kernel/numa.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/loongarch/kernel/numa.c b/arch/loongarch/kernel/numa.c
index c7d33c489e04..6e65ff12d5c7 100644
--- a/arch/loongarch/kernel/numa.c
+++ b/arch/loongarch/kernel/numa.c
@@ -436,7 +436,7 @@ void __init paging_init(void)
void __init mem_init(void)
{
- high_memory = (void *) __va(get_num_physpages() << PAGE_SHIFT);
+ high_memory = (void *) __va(max_low_pfn << PAGE_SHIFT);
memblock_free_all();
}
--
2.39.3
On Tue, Sep 26, 2023 at 10:02:38PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 26, 2023 at 04:17:33PM +0000, Carlos Llamas wrote:
> >
> > This issue is hurting the performance of our stable 6.1 releases. Does
> > it make sense to backport these patches into stable branches once they
> > land in mainline? I would assume we want to fix the perf regression
> > there too?
>
> Note that these patches are in tip/sched/core, slated for the next merge
> window.
We can wait, no problem. I just wanted to make sure we also patch stable
if needed. Elliot, would you be able to send a backport of your patches
to stable once they land in mainline in the next merge window?
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: stable(a)vger.kernel.org
--
Carlos Llamas
The patch titled
Subject: selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-fix-awk-usage-in-charge_reserved_hugetlbsh-and-hugetlb_reparenting_testsh-that-may-cause-error.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Juntong Deng <juntong.deng(a)outlook.com>
Subject: selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error
Date: Wed, 27 Sep 2023 02:19:44 +0800
According to the awk manual, the -e option does not need to be specified
in front of 'program' (unless you need to mix program-file).
The redundant -e option can cause errors when users use awk
implementations other than gawk (for example, mawk does not support the
-e option).
Error Example:
awk: not an option: -e
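A fuller transcript of the difference (hedged; assumes a system where
/usr/bin/awk is mawk):

    $ echo "a b c" | awk -e '{print $3}'
    awk: not an option: -e
    $ echo "a b c" | awk '{print $3}'
    c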
Link: https://lkml.kernel.org/r/VI1P193MB075228810591AF2FDD7D42C599C3A@VI1P193MB0…
Signed-off-by: Juntong Deng <juntong.deng(a)outlook.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/charge_reserved_hugetlb.sh | 4 ++--
tools/testing/selftests/mm/hugetlb_reparenting_test.sh | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
--- a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh~selftests-mm-fix-awk-usage-in-charge_reserved_hugetlbsh-and-hugetlb_reparenting_testsh-that-may-cause-error
+++ a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
@@ -25,7 +25,7 @@ if [[ "$1" == "-cgroup-v2" ]]; then
fi
if [[ $cgroup2 ]]; then
- cgroup_path=$(mount -t cgroup2 | head -1 | awk -e '{print $3}')
+ cgroup_path=$(mount -t cgroup2 | head -1 | awk '{print $3}')
if [[ -z "$cgroup_path" ]]; then
cgroup_path=/dev/cgroup/memory
mount -t cgroup2 none $cgroup_path
@@ -33,7 +33,7 @@ if [[ $cgroup2 ]]; then
fi
echo "+hugetlb" >$cgroup_path/cgroup.subtree_control
else
- cgroup_path=$(mount -t cgroup | grep ",hugetlb" | awk -e '{print $3}')
+ cgroup_path=$(mount -t cgroup | grep ",hugetlb" | awk '{print $3}')
if [[ -z "$cgroup_path" ]]; then
cgroup_path=/dev/cgroup/memory
mount -t cgroup memory,hugetlb $cgroup_path
--- a/tools/testing/selftests/mm/hugetlb_reparenting_test.sh~selftests-mm-fix-awk-usage-in-charge_reserved_hugetlbsh-and-hugetlb_reparenting_testsh-that-may-cause-error
+++ a/tools/testing/selftests/mm/hugetlb_reparenting_test.sh
@@ -20,7 +20,7 @@ fi
if [[ $cgroup2 ]]; then
- CGROUP_ROOT=$(mount -t cgroup2 | head -1 | awk -e '{print $3}')
+ CGROUP_ROOT=$(mount -t cgroup2 | head -1 | awk '{print $3}')
if [[ -z "$CGROUP_ROOT" ]]; then
CGROUP_ROOT=/dev/cgroup/memory
mount -t cgroup2 none $CGROUP_ROOT
@@ -28,7 +28,7 @@ if [[ $cgroup2 ]]; then
fi
echo "+hugetlb +memory" >$CGROUP_ROOT/cgroup.subtree_control
else
- CGROUP_ROOT=$(mount -t cgroup | grep ",hugetlb" | awk -e '{print $3}')
+ CGROUP_ROOT=$(mount -t cgroup | grep ",hugetlb" | awk '{print $3}')
if [[ -z "$CGROUP_ROOT" ]]; then
CGROUP_ROOT=/dev/cgroup/memory
mount -t cgroup memory,hugetlb $CGROUP_ROOT
_
Patches currently in -mm which might be from juntong.deng(a)outlook.com are
selftests-mm-fix-awk-usage-in-charge_reserved_hugetlbsh-and-hugetlb_reparenting_testsh-that-may-cause-error.patch
Live-migrations under qemu result in guest corruption when
the following three conditions are all met:
* The source host CPU has capabilities that themselves
extend those of the guest CPU's fpstate->user_xfeatures
* The source kernel emits guest_fpu->user_xfeatures
with respect to the host CPU (i.e. it *does not* have
the "Fixes:" commit)
* The destination kernel enforces that the xfeatures
in the buffer given to KVM_SET_IOCTL are compatible
with the guest CPU (i.e., it *does* have the "Fixes:"
commit)
When these conditions are met, the semantic changes to
fpstate->user_xfeatures trigger a subtle bug in qemu that
results in qemu failing to put the XSAVE architectural
state into KVM.
qemu then both ceases to put the remaining (non-XSAVE) x86
architectural state into KVM and makes the fateful mistake
of resuming the guest anyway. This usually results in
immediate guest corruption, silent or not.
Due to the grave nature of this qemu bug, attempt to
retain the behavior of old kernels by clamping the xfeatures
specified in the buffer given to KVM_SET_IOCTL so that
they align with the guest's fpstate->user_xfeatures, instead
of returning an error.
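A standalone sketch of the clamping idea (the xfeature bit positions
are illustrative of the XSTATE layout; this is not the
kvm_vcpu_ioctl_x86_set_xsave() code itself):

    #include <stdio.h>

    int main(void)
    {
        unsigned long long x87 = 1ULL << 0, sse = 1ULL << 1, avx = 1ULL << 2;
        unsigned long long avx512 = 7ULL << 5;

        /* Buffer produced by an old source kernel (host-derived). */
        unsigned long long hdr_xfeatures = x87 | sse | avx | avx512;
        /* What the guest fpstate actually advertises. */
        unsigned long long user_xfeatures = x87 | sse | avx;

        /* Instead of failing the ioctl, clamp to the guest's set. */
        hdr_xfeatures &= user_xfeatures;
        printf("accepted xfeatures: %#llx\n", hdr_xfeatures);
        return 0;
    }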
Fixes: ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0")
Cc: stable(a)vger.kernel.org
Cc: Leonardo Bras <leobras(a)redhat.com>
Signed-off-by: Tyler Stachecki <tstachecki(a)bloomberg.net>
---
arch/x86/kvm/x86.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6c9c81e82e65..baad160b592f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5407,11 +5407,21 @@ static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
struct kvm_xsave *guest_xsave)
{
+ union fpregs_state *ustate = (union fpregs_state *) guest_xsave->region;
+ u64 user_xfeatures = vcpu->arch.guest_fpu.fpstate->user_xfeatures;
+
if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
return 0;
- return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu,
- guest_xsave->region,
+ /*
+ * In previous kernels, kvm_arch_vcpu_create() set the guest's fpstate
+ * based on what the host CPU supported. Recent kernels changed this
+ * and only accept ustate containing xfeatures that the guest CPU is
+ * capable of supporting.
+ */
+ ustate->xsave.header.xfeatures &= user_xfeatures;
+
+ return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu, ustate,
kvm_caps.supported_xcr0,
&vcpu->arch.pkru);
}
--
2.30.2
Callers of sock_sendmsg(), and similarly kernel_sendmsg(), in kernel
space may observe their value of msg_name change in cases where BPF
sendmsg hooks rewrite the send address. This has been confirmed to break
NFS mounts running in UDP mode and has the potential to break other
systems.
Soon, support will be added for BPF sockaddr hooks for Unix sockets
which introduces the ability to modify the msg->msg_namelen value.
This patch:
1) Creates a new function called __sock_sendmsg() with same logic as the
old sock_sendmsg() function.
2) Replaces calls to sock_sendmsg() made by __sys_sendto() and
__sys_sendmsg() with __sock_sendmsg() to avoid an unnecessary copy,
as these system calls are already protected.
3) Makes a copy of msg->msg_name to insulate callers (see the sketch
after this list).
4) Makes a copy of msg->msg_namelen to insulate callers in anticipation
of the aforementioned change to support Unix sockets.
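The insulation pattern, as a hedged standalone sketch (structures are
reduced to the relevant fields and hook_rewrite() models a BPF sendmsg
hook; this is not the net/socket.c code itself):

    #include <string.h>

    struct sockaddr_storage { char data[128]; };
    struct msghdr { void *msg_name; int msg_namelen; };

    /* Models a BPF hook that rewrites the destination in place. */
    static void hook_rewrite(struct msghdr *msg)
    {
        memset(msg->msg_name, 0, msg->msg_namelen);
    }

    /* Operate on a stack copy so the caller's sockaddr survives. */
    static void send_insulated(struct msghdr *msg)
    {
        struct sockaddr_storage address;
        void *save_addr = msg->msg_name;
        int save_addrlen = msg->msg_namelen;

        if (msg->msg_name) {
            memcpy(&address, msg->msg_name, msg->msg_namelen);
            msg->msg_name = &address;
        }
        hook_rewrite(msg);              /* caller's buffer untouched */
        msg->msg_name = save_addr;
        msg->msg_namelen = save_addrlen;
    }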
Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/
Link: https://lore.kernel.org/bpf/202309231339.L2O0CrMU-lkp@intel.com/T/#m181770a…
Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jordan Rife <jrife(a)google.com>
---
net/socket.c | 31 +++++++++++++++++++++++++------
1 file changed, 25 insertions(+), 6 deletions(-)
diff --git a/net/socket.c b/net/socket.c
index c8b08b32f097e..107a257a75186 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -737,6 +737,14 @@ static inline int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg)
return ret;
}
+static int __sock_sendmsg(struct socket *sock, struct msghdr *msg)
+{
+ int err = security_socket_sendmsg(sock, msg,
+ msg_data_left(msg));
+
+ return err ?: sock_sendmsg_nosec(sock, msg);
+}
+
/**
* sock_sendmsg - send a message through @sock
* @sock: socket
@@ -747,10 +755,21 @@ static inline int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg)
*/
int sock_sendmsg(struct socket *sock, struct msghdr *msg)
{
- int err = security_socket_sendmsg(sock, msg,
- msg_data_left(msg));
+ struct sockaddr_storage *save_addr = (struct sockaddr_storage *)msg->msg_name;
+ int save_addrlen = msg->msg_namelen;
+ struct sockaddr_storage address;
+ int ret;
- return err ?: sock_sendmsg_nosec(sock, msg);
+ if (msg->msg_name) {
+ memcpy(&address, msg->msg_name, msg->msg_namelen);
+ msg->msg_name = &address;
+ }
+
+ ret = __sock_sendmsg(sock, msg);
+ msg->msg_name = save_addr;
+ msg->msg_namelen = save_addrlen;
+
+ return ret;
}
EXPORT_SYMBOL(sock_sendmsg);
@@ -1138,7 +1157,7 @@ static ssize_t sock_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (sock->type == SOCK_SEQPACKET)
msg.msg_flags |= MSG_EOR;
- res = sock_sendmsg(sock, &msg);
+ res = __sock_sendmsg(sock, &msg);
*from = msg.msg_iter;
return res;
}
@@ -2174,7 +2193,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
if (sock->file->f_flags & O_NONBLOCK)
flags |= MSG_DONTWAIT;
msg.msg_flags = flags;
- err = sock_sendmsg(sock, &msg);
+ err = __sock_sendmsg(sock, &msg);
out_put:
fput_light(sock->file, fput_needed);
@@ -2538,7 +2557,7 @@ static int ____sys_sendmsg(struct socket *sock, struct msghdr *msg_sys,
err = sock_sendmsg_nosec(sock, msg_sys);
goto out_freectl;
}
- err = sock_sendmsg(sock, msg_sys);
+ err = __sock_sendmsg(sock, msg_sys);
/*
* If this is sendmmsg() and sending to current destination address was
* successful, remember it.
--
2.42.0.515.g380fc7ccd1-goog
In snapshot_write_next(), sync_read is set and unset in three different
spots unnecessarily. As a result there is a subtle bug where loading the
first page after the metadata unconditionally sets sync_read to 0. If
this first pfn was actually a highmem page, then the returned buffer
will be the global "buffer", and the page needs to be loaded
synchronously.
That is, I'm not sure we can always assume the following to be safe:
handle->buffer = get_buffer(&orig_bm, &ca);
handle->sync_read = 0;
Because get_buffer() can call get_highmem_page_buffer(), which can
return 'buffer'.
The easiest way to address this is to just set sync_read before
snapshot_write_next() returns if handle->buffer == buffer.
Signed-off-by: Brian Geffon <bgeffon(a)google.com>
Cc: stable(a)vger.kernel.org
---
kernel/power/snapshot.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 190ed707ddcc..362e6bae5891 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -2780,8 +2780,6 @@ int snapshot_write_next(struct snapshot_handle *handle)
if (handle->cur > 1 && handle->cur > nr_meta_pages + nr_copy_pages + nr_zero_pages)
return 0;
- handle->sync_read = 1;
-
if (!handle->cur) {
if (!buffer)
/* This makes the buffer be freed by swsusp_free() */
@@ -2824,7 +2822,6 @@ int snapshot_write_next(struct snapshot_handle *handle)
memory_bm_position_reset(&zero_bm);
restore_pblist = NULL;
handle->buffer = get_buffer(&orig_bm, &ca);
- handle->sync_read = 0;
if (IS_ERR(handle->buffer))
return PTR_ERR(handle->buffer);
}
@@ -2834,9 +2831,8 @@ int snapshot_write_next(struct snapshot_handle *handle)
handle->buffer = get_buffer(&orig_bm, &ca);
if (IS_ERR(handle->buffer))
return PTR_ERR(handle->buffer);
- if (handle->buffer != buffer)
- handle->sync_read = 0;
}
+ handle->sync_read = (handle->buffer == buffer);
handle->cur++;
/* Zero pages were not included in the image, memset it and move on. */
--
2.42.0.515.g380fc7ccd1-goog
The cxl_test unit test environment models a CXL topology for
sysfs/user-ABI regression testing. It uses interface mocking via the
"--wrap=" linker option to redirect cxl_core routines that parse
hardware registers with versions that just publish objects, like
devm_cxl_enumerate_decoders().
Starting with:
Commit 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port")
...port register enumeration is moved into devm_cxl_add_port(). This
conflicts with the "cxl_test avoids emulating registers" stance, so either
either the port code needs to be refactored (too violent), or modified
so that register enumeration is skipped on "fake" cxl_test ports
(annoying, but straightforward).
This conflict has happened previously and the "check for platform
device" workaround to avoid intrusive refactoring was deployed in those
scenarios. In general, refactoring should only benefit production code;
test code needs to remain minimally intrusive to the greatest extent
possible.
This was missed previously because it may sometimes just cause warning
messages to be emitted, but it can also cause test failures. The
backport to -stable is only nice to have for clean cxl_test runs.
Fixes: 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port")
Cc: <stable(a)vger.kernel.org>
Reported-by: Alison Schofield <alison.schofield(a)intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/cxl/core/port.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 724be8448eb4..7ca01a834e18 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+#include <linux/platform_device.h>
#include <linux/memregion.h>
#include <linux/workqueue.h>
#include <linux/debugfs.h>
@@ -706,16 +707,20 @@ static int cxl_setup_comp_regs(struct device *dev, struct cxl_register_map *map,
return cxl_setup_regs(map);
}
-static inline int cxl_port_setup_regs(struct cxl_port *port,
- resource_size_t component_reg_phys)
+static int cxl_port_setup_regs(struct cxl_port *port,
+ resource_size_t component_reg_phys)
{
+ if (dev_is_platform(port->uport_dev))
+ return 0;
return cxl_setup_comp_regs(&port->dev, &port->comp_map,
component_reg_phys);
}
-static inline int cxl_dport_setup_regs(struct cxl_dport *dport,
- resource_size_t component_reg_phys)
+static int cxl_dport_setup_regs(struct cxl_dport *dport,
+ resource_size_t component_reg_phys)
{
+ if (dev_is_platform(dport->dport_dev))
+ return 0;
return cxl_setup_comp_regs(dport->dport_dev, &dport->comp_map,
component_reg_phys);
}
Hi kernel maintainers!
My computer doesn't boot with kernels newer than 6.1.45.
Here's what happens:
- system boots in initramfs
- detects my encrypted ZFS pool and asks for password
- mounts the system, pivots to it, starts the real init
- before any daemon has had time to start, the system hangs and the
kernel writes to the console
"nvme 0000:04:00.0: Unable to change power state from D3cold to D0,
device inaccessible"
- if I reboot directly without powering off (using magic sysrq or
panic=10), even the UEFI complains about not finding any storage to
boot from.
- after a real power off, I can boot using a kernel <= 6.1.45.
The bug has been discussed here:
https://bugzilla.kernel.org/show_bug.cgi?id=217705
My laptop is a Dell XPS 15 9560 (Intel 7700hq).
I bisected between 6.1.45 and 6.1.46 and found this commit
commit 8ee39ec479147e29af704639f8e55fce246ed2d9
Author: Ricky WU <ricky_wu(a)realtek.com>
Date: Tue Jul 25 09:10:54 2023 +0000
misc: rtsx: judge ASPM Mode to set PETXCFG Reg
commit 101bd907b4244a726980ee67f95ed9cafab6ff7a upstream.
ASPM Mode is ASPM_MODE_CFG need to judge the value of clkreq_0
to set HIGH or LOW, if the ASPM Mode is ASPM_MODE_REG
always set to HIGH during the initialization.
Cc: stable(a)vger.kernel.org
Signed-off-by: Ricky Wu <ricky_wu(a)realtek.com>
Link:
https://lore.kernel.org/r/52906c6836374c8cb068225954c5543a@realtek.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
drivers/misc/cardreader/rts5227.c | 2 +-
drivers/misc/cardreader/rts5228.c | 18 ------------------
drivers/misc/cardreader/rts5249.c | 3 +--
drivers/misc/cardreader/rts5260.c | 18 ------------------
drivers/misc/cardreader/rts5261.c | 18 ------------------
drivers/misc/cardreader/rtsx_pcr.c | 5 ++++-
6 files changed, 6 insertions(+), 58 deletions(-)
If I build 6.1.51 with this commit reverted, my laptop works again,
confirming that this commit is to blame.
Also, blacklisting `rtsx_pci_sdmmc` and `rtsx_pci`, while preventing use
of the SD card reader, allows the system to boot.
I can't try 6.4 or 6.5 because my system depends on ZFS.
Have a nice day,
Paul Grandperrin
The new adjustment should be based on the base frequency, not on
I40E_PTP_40GB_INCVAL, in i40e_ptp_adjfine().
This issue was introduced in commit 3626a690b717 ("i40e: use
mul_u64_u64_div_u64 for PTP frequency calculation"), and was fixed in
commit 1060707e3809 ("ptp: introduce helpers to adjust by scaled
parts per million"). However the latter is a new feature and hasn't been
backported to the stable releases.
This issue affects both v6.0 and v6.1, and v6.1 is an LTS version.
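A worked example of the scaled_ppm arithmetic (the base increment value
is illustrative; the kernel uses mul_u64_u64_div_u64() to keep the
intermediate product from overflowing):

    #include <stdio.h>

    int main(void)
    {
        unsigned long long freq = 0x0100000000ULL; /* illustrative base incval */
        unsigned long long scaled_ppm = 65536;     /* +1 ppm, 16 fractional bits */
        unsigned long long diff = freq * scaled_ppm / (1000000ULL << 16);

        /* The fix: apply diff to the same base it was computed from. */
        printf("adj = 0x%llx (freq 0x%llx + diff %llu)\n",
               freq + diff, freq, diff);
        return 0;
    }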
Fixes: 3626a690b717 ("i40e: use mul_u64_u64_div_u64 for PTP frequency calculation")
Cc: <stable(a)vger.kernel.org> # 6.1
Signed-off-by: Yajun Deng <yajun.deng(a)linux.dev>
---
drivers/net/ethernet/intel/i40e/i40e_ptp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index ffea0c9c82f1..97a9efe7b713 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -361,9 +361,9 @@ static int i40e_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
1000000ULL << 16);
if (neg_adj)
- adj = I40E_PTP_40GB_INCVAL - diff;
+ adj = freq - diff;
else
- adj = I40E_PTP_40GB_INCVAL + diff;
+ adj = freq + diff;
wr32(hw, I40E_PRTTSYN_INC_L, adj & 0xFFFFFFFF);
wr32(hw, I40E_PRTTSYN_INC_H, adj >> 32);
--
2.25.1
Hi,
I noticed a regression report on Bugzilla [1]. Quoting from it:
> Description:
> When booting into Linux 6.4.4, system no longer recognizes touchpad input (confirmed with xinput). On the lts release, 6.1.39, the input is still recognized.
>
> Additional info:
> * package version(s): Linux 6.4.4, 6.1.39
> * Device: ELAN1206:00 04F3:30F1 Touchpad
>
> Steps to reproduce:
> - Install 6.4.4 with Elan Touchpad 1206
> - Reboot
>
> The issue might be related to bisected commit id: 7b63a88bb62ba2ddf5fcd956be85fe46624628b9
> This is the only recent commit related to Elantech drivers I've noticed that may have broken the input.
See Bugzilla for the full thread:
To the reporter (Verot): Can you attach dmesg and lspci output?
Anyway, I'm adding this regression to be tracked by regzbot:
#regzbot introduced: 7b63a88bb62ba2 https://bugzilla.kernel.org/show_bug.cgi?id=217701
#regzbot title: OOB protocol access fix breaks Elan Touchpad 1206
Thanks.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217701
--
An old man doll... just what I always wanted! - Clara
From: Dave Chinner <dchinner(a)redhat.com>
[ Upstream commit 7cf2b0f9611b9971d663e1fc3206eeda3b902922 ]
Currently inodegc work can sit queued on the per-cpu queue until
the workqueue is either flushed or the queue reaches a depth that
triggers work queuing (and later throttling). This means that we
could queue work that waits for a long time for some other event to
trigger flushing.
Hence instead of just queueing work at a specific depth, use a
delayed work that queues the work at a bound time. We can still
schedule the work immediately at a given depth, but we no longer need
to worry about leaving a number of items on the list that won't get
processed until external events prevail.
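The queueing pattern, as a minimal sketch (the depth threshold is
illustrative; the real decision lives in xfs_inodegc_want_queue_work()):

    #include <linux/workqueue.h>

    static void inodegc_queue_sketch(struct workqueue_struct *wq,
                                     struct delayed_work *work, int items)
    {
        unsigned long queue_delay = 1;  /* jiffies: bound the wait */

        if (items >= 32)                /* illustrative depth threshold */
            queue_delay = 0;            /* deep queue: run immediately */

        /* mod_delayed_work() also shortens an already-pending timer. */
        mod_delayed_work(wq, work, queue_delay);
    }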
Signed-off-by: Dave Chinner <dchinner(a)redhat.com>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik(a)gmail.com>
Acked-by: Darrick J. Wong <djwong(a)kernel.org>
---
fs/xfs/xfs_icache.c | 36 ++++++++++++++++++++++--------------
fs/xfs/xfs_mount.h | 2 +-
fs/xfs/xfs_super.c | 2 +-
3 files changed, 24 insertions(+), 16 deletions(-)
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 5e44d7bbd8fc..2c3ef553f5ef 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -458,7 +458,7 @@ xfs_inodegc_queue_all(
for_each_online_cpu(cpu) {
gc = per_cpu_ptr(mp->m_inodegc, cpu);
if (!llist_empty(&gc->list))
- queue_work_on(cpu, mp->m_inodegc_wq, &gc->work);
+ mod_delayed_work_on(cpu, mp->m_inodegc_wq, &gc->work, 0);
}
}
@@ -1851,8 +1851,8 @@ void
xfs_inodegc_worker(
struct work_struct *work)
{
- struct xfs_inodegc *gc = container_of(work, struct xfs_inodegc,
- work);
+ struct xfs_inodegc *gc = container_of(to_delayed_work(work),
+ struct xfs_inodegc, work);
struct llist_node *node = llist_del_all(&gc->list);
struct xfs_inode *ip, *n;
@@ -2021,6 +2021,7 @@ xfs_inodegc_queue(
struct xfs_inodegc *gc;
int items;
unsigned int shrinker_hits;
+ unsigned long queue_delay = 1;
trace_xfs_inode_set_need_inactive(ip);
spin_lock(&ip->i_flags_lock);
@@ -2032,19 +2033,26 @@ xfs_inodegc_queue(
items = READ_ONCE(gc->items);
WRITE_ONCE(gc->items, items + 1);
shrinker_hits = READ_ONCE(gc->shrinker_hits);
- put_cpu_ptr(gc);
- if (!xfs_is_inodegc_enabled(mp))
+ /*
+ * We queue the work while holding the current CPU so that the work
+ * is scheduled to run on this CPU.
+ */
+ if (!xfs_is_inodegc_enabled(mp)) {
+ put_cpu_ptr(gc);
return;
-
- if (xfs_inodegc_want_queue_work(ip, items)) {
- trace_xfs_inodegc_queue(mp, __return_address);
- queue_work(mp->m_inodegc_wq, &gc->work);
}
+ if (xfs_inodegc_want_queue_work(ip, items))
+ queue_delay = 0;
+
+ trace_xfs_inodegc_queue(mp, __return_address);
+ mod_delayed_work(mp->m_inodegc_wq, &gc->work, queue_delay);
+ put_cpu_ptr(gc);
+
if (xfs_inodegc_want_flush_work(ip, items, shrinker_hits)) {
trace_xfs_inodegc_throttle(mp, __return_address);
- flush_work(&gc->work);
+ flush_delayed_work(&gc->work);
}
}
@@ -2061,7 +2069,7 @@ xfs_inodegc_cpu_dead(
unsigned int count = 0;
dead_gc = per_cpu_ptr(mp->m_inodegc, dead_cpu);
- cancel_work_sync(&dead_gc->work);
+ cancel_delayed_work_sync(&dead_gc->work);
if (llist_empty(&dead_gc->list))
return;
@@ -2080,12 +2088,12 @@ xfs_inodegc_cpu_dead(
llist_add_batch(first, last, &gc->list);
count += READ_ONCE(gc->items);
WRITE_ONCE(gc->items, count);
- put_cpu_ptr(gc);
if (xfs_is_inodegc_enabled(mp)) {
trace_xfs_inodegc_queue(mp, __return_address);
- queue_work(mp->m_inodegc_wq, &gc->work);
+ mod_delayed_work(mp->m_inodegc_wq, &gc->work, 0);
}
+ put_cpu_ptr(gc);
}
/*
@@ -2180,7 +2188,7 @@ xfs_inodegc_shrinker_scan(
unsigned int h = READ_ONCE(gc->shrinker_hits);
WRITE_ONCE(gc->shrinker_hits, h + 1);
- queue_work_on(cpu, mp->m_inodegc_wq, &gc->work);
+ mod_delayed_work_on(cpu, mp->m_inodegc_wq, &gc->work, 0);
no_items = false;
}
}
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 86564295fce6..3d58938a6f75 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -61,7 +61,7 @@ struct xfs_error_cfg {
*/
struct xfs_inodegc {
struct llist_head list;
- struct work_struct work;
+ struct delayed_work work;
/* approximate count of inodes in the list */
unsigned int items;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index df1d6be61bfa..8fe6ca9208de 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1061,7 +1061,7 @@ xfs_inodegc_init_percpu(
gc = per_cpu_ptr(mp->m_inodegc, cpu);
init_llist_head(&gc->list);
gc->items = 0;
- INIT_WORK(&gc->work, xfs_inodegc_worker);
+ INIT_DELAYED_WORK(&gc->work, xfs_inodegc_worker);
}
return 0;
}
--
2.42.0.515.g380fc7ccd1-goog
The patch titled
Subject: mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-mempolicy-keep-vma-walk-if-both-mpol_mf_strict-and-mpol_mf_move-are-specified.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Yang Shi <yang(a)os.amperecomputing.com>
Subject: mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
Date: Wed, 20 Sep 2023 15:32:42 -0700
When calling mbind() with MPOL_MF_{MOVE|MOVEALL} | MPOL_MF_STRICT, the
kernel should attempt to migrate all existing pages, and return -EIO if
there is a misplaced or unmovable page. Then commit 6f4576e3687b
("mempolicy: apply page table walker on queue_pages_range()") messed up
the return value and no longer broke the VMA scan early when only
MPOL_MF_STRICT was specified. The return value problem was fixed by
commit a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when
MPOL_MF_STRICT is specified"), but it made the VMA walk break early when
an unmovable page is met, which may cause some pages not to be migrated
as expected.
The code should conceptually do:
if (MPOL_MF_MOVE|MOVEALL)
    scan all vmas
    try to migrate the existing pages
    return success
else if (MPOL_MF_MOVE* | MPOL_MF_STRICT)
    scan all vmas
    try to migrate the existing pages
    return -EIO if unmovable or migration failed
else /* MPOL_MF_STRICT alone */
    break early if meets unmovable and don't call mbind_range() at all
else /* none of those flags */
    check the ranges in test_walk, EFAULT without mbind_range() if discontig.
Fixed the behavior.
Link: https://lkml.kernel.org/r/20230920223242.3425775-1-yang@os.amperecomputing.…
Fixes: a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Yang Shi <yang(a)os.amperecomputing.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Rafael Aquini <aquini(a)redhat.com>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: David Rientjes <rientjes(a)google.com>
Cc: <stable(a)vger.kernel.org> [4.9+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 39 +++++++++++++++++++--------------------
1 file changed, 19 insertions(+), 20 deletions(-)
--- a/mm/mempolicy.c~mm-mempolicy-keep-vma-walk-if-both-mpol_mf_strict-and-mpol_mf_move-are-specified
+++ a/mm/mempolicy.c
@@ -426,6 +426,7 @@ struct queue_pages {
unsigned long start;
unsigned long end;
struct vm_area_struct *first;
+ bool has_unmovable;
};
/*
@@ -446,9 +447,8 @@ static inline bool queue_folio_required(
/*
* queue_folios_pmd() has three possible return values:
* 0 - folios are placed on the right node or queued successfully, or
- * special page is met, i.e. huge zero page.
- * 1 - there is unmovable folio, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
- * specified.
+ * special page is met, i.e. zero page, or unmovable page is found
+ * but continue walking (indicated by queue_pages.has_unmovable).
* -EIO - is migration entry or only MPOL_MF_STRICT was specified and an
* existing folio was already on a node that does not follow the
* policy.
@@ -479,7 +479,7 @@ static int queue_folios_pmd(pmd_t *pmd,
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
if (!vma_migratable(walk->vma) ||
migrate_folio_add(folio, qp->pagelist, flags)) {
- ret = 1;
+ qp->has_unmovable = true;
goto unlock;
}
} else
@@ -495,9 +495,8 @@ unlock:
*
* queue_folios_pte_range() has three possible return values:
* 0 - folios are placed on the right node or queued successfully, or
- * special page is met, i.e. zero page.
- * 1 - there is unmovable folio, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
- * specified.
+ * special page is met, i.e. zero page, or unmovable page is found
+ * but continue walking (indicated by queue_pages.has_unmovable).
* -EIO - only MPOL_MF_STRICT was specified and an existing folio was already
* on a node that does not follow the policy.
*/
@@ -508,7 +507,6 @@ static int queue_folios_pte_range(pmd_t
struct folio *folio;
struct queue_pages *qp = walk->private;
unsigned long flags = qp->flags;
- bool has_unmovable = false;
pte_t *pte, *mapped_pte;
pte_t ptent;
spinlock_t *ptl;
@@ -538,11 +536,12 @@ static int queue_folios_pte_range(pmd_t
if (!queue_folio_required(folio, qp))
continue;
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
- /* MPOL_MF_STRICT must be specified if we get here */
- if (!vma_migratable(vma)) {
- has_unmovable = true;
- break;
- }
+ /*
+ * MPOL_MF_STRICT must be specified if we get here.
+ * Continue walking vmas due to MPOL_MF_MOVE* flags.
+ */
+ if (!vma_migratable(vma))
+ qp->has_unmovable = true;
/*
* Do not abort immediately since there may be
@@ -550,16 +549,13 @@ static int queue_folios_pte_range(pmd_t
* need migrate other LRU pages.
*/
if (migrate_folio_add(folio, qp->pagelist, flags))
- has_unmovable = true;
+ qp->has_unmovable = true;
} else
break;
}
pte_unmap_unlock(mapped_pte, ptl);
cond_resched();
- if (has_unmovable)
- return 1;
-
return addr != end ? -EIO : 0;
}
@@ -599,7 +595,7 @@ static int queue_folios_hugetlb(pte_t *p
* Detecting misplaced folio but allow migrating folios which
* have been queued.
*/
- ret = 1;
+ qp->has_unmovable = true;
goto unlock;
}
@@ -620,7 +616,7 @@ static int queue_folios_hugetlb(pte_t *p
* Failed to isolate folio but allow migrating pages
* which have been queued.
*/
- ret = 1;
+ qp->has_unmovable = true;
}
unlock:
spin_unlock(ptl);
@@ -756,12 +752,15 @@ queue_pages_range(struct mm_struct *mm,
.start = start,
.end = end,
.first = NULL,
+ .has_unmovable = false,
};
const struct mm_walk_ops *ops = lock_vma ?
&queue_pages_lock_vma_walk_ops : &queue_pages_walk_ops;
err = walk_page_range(mm, start, end, ops, &qp);
+ if (qp.has_unmovable)
+ err = 1;
if (!qp.first)
/* whole range in hole */
err = -EFAULT;
@@ -1361,7 +1360,7 @@ static long do_mbind(unsigned long start
putback_movable_pages(&pagelist);
}
- if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
+ if (((ret > 0) || nr_failed) && (flags & MPOL_MF_STRICT))
err = -EIO;
} else {
up_out:
_
Patches currently in -mm which might be from yang(a)os.amperecomputing.com are
mm-mempolicy-keep-vma-walk-if-both-mpol_mf_strict-and-mpol_mf_move-are-specified.patch
From: Song Shuai <suagrfillet(a)gmail.com>
The pt_level array uses CONFIG_PGTABLE_LEVELS to display page table
names. But if the page mode is downgraded from the kernel cmdline or
restricted by the hardware in 64BIT, it will give a wrong name.
For example, using no4lvl for sv39, ptdump names the 1G mapping "PUD"
when it should be "PGD":
0xffffffd840000000-0xffffffd900000000 0x00000000c0000000 3G PUD D A G . . W R V
So select "P4D/PUD" or "PGD" via pgtable_l5/4_enabled to correct it.
Fixes: e8a62cc26ddf ("riscv: Implement sv48 support")
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Signed-off-by: Song Shuai <suagrfillet(a)gmail.com>
Link: https://lore.kernel.org/r/20230712115740.943324-1-suagrfillet@gmail.com
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230830044129.11481-3-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
---
arch/riscv/mm/ptdump.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index 20a9f991a6d7..e9090b38f811 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -384,6 +384,9 @@ static int __init ptdump_init(void)
kernel_ptd_info.base_addr = KERN_VIRT_START;
+ pg_level[1].name = pgtable_l5_enabled ? "P4D" : "PGD";
+ pg_level[2].name = pgtable_l4_enabled ? "PUD" : "PGD";
+
for (i = 0; i < ARRAY_SIZE(pg_level); i++)
for (j = 0; j < ARRAY_SIZE(pte_bits); j++)
pg_level[i].mask |= pte_bits[j].mask;
--
2.42.0
The patch titled
Subject: mm, memcg: reconsider kmem.limit_in_bytes deprecation
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-memcg-reconsider-kmemlimit_in_bytes-deprecation.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Michal Hocko <mhocko(a)suse.com>
Subject: mm, memcg: reconsider kmem.limit_in_bytes deprecation
Date: Thu, 21 Sep 2023 09:38:29 +0200
This reverts commits 86327e8eb94c ("memcg: drop kmem.limit_in_bytes") and
partially reverts 58056f77502f ("memcg, kmem: further deprecate
kmem.limit_in_bytes") which have incrementally removed support for the
kernel memory accounting hard limit. Unfortunately it has turned out that
there is still userspace depending on the existence of
memory.kmem.limit_in_bytes [1]. The underlying functionality is not
really required but the non-existent file just confuses the userspace
which fails in the result. The patch to fix this on the userspace side
has been submitted but it is hard to predict how it will propagate through
the maze of 3rd party consumers of the software.
Now, reverting 86327e8eb94c alone is not an option because there is
another set of userspace which cannot cope with ENOTSUPP returned when
writing to the file. Therefore we have to go and revisit 58056f77502f as
well. There are two ways to go ahead. Either we give up on the
deprecation and fully revert 58056f77502f as well or we can keep
kmem.limit_in_bytes but make the write a noop and warn about the fact.
This should work for both known breaking workloads, which depend on the
file's existence but not on the hard limit enforcement.
Note to backporters to stable trees. a8c49af3be5f ("memcg: add per-memcg
total kernel memory stat") introduced in 4.18 has added memcg_account_kmem
so the accounting is not done by obj_cgroup_charge_pages directly for v1
anymore. Prior kernels need to add it explicitly (thanks to Johannes for
pointing this out).
Link: http://lkml.kernel.org/r/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1e… [1]
Link: https://lkml.kernel.org/r/ZRE5VJozPZt9bRPy@dhcp22.suse.cz
Fixes: 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
Fixes: 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes")
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Jeremi Piotrowski <jpiotrowski(a)linux.microsoft.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Tejun heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/admin-guide/cgroup-v1/memory.rst | 7 +++++++
mm/memcontrol.c | 14 ++++++++++++++
2 files changed, 21 insertions(+)
--- a/Documentation/admin-guide/cgroup-v1/memory.rst~mm-memcg-reconsider-kmemlimit_in_bytes-deprecation
+++ a/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -92,6 +92,13 @@ Brief summary of control files.
memory.oom_control set/show oom controls.
memory.numa_stat show the number of memory usage per numa
node
+ memory.kmem.limit_in_bytes Deprecated knob to set and read the kernel
+ memory hard limit. Kernel hard limit is not
+ supported since 5.16. Writing any value to
+ this file will not have any effect, same as if
+ nokmem kernel parameter was specified.
+ Kernel memory is still charged and reported
+ by memory.kmem.usage_in_bytes.
memory.kmem.usage_in_bytes show current kernel memory allocation
memory.kmem.failcnt show the number of kernel memory usage
hits limits
--- a/mm/memcontrol.c~mm-memcg-reconsider-kmemlimit_in_bytes-deprecation
+++ a/mm/memcontrol.c
@@ -3097,6 +3097,7 @@ static void obj_cgroup_uncharge_pages(st
static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
unsigned int nr_pages)
{
+ struct page_counter *counter;
struct mem_cgroup *memcg;
int ret;
@@ -3867,6 +3868,13 @@ static ssize_t mem_cgroup_write(struct k
case _MEMSWAP:
ret = mem_cgroup_resize_max(memcg, nr_pages, true);
break;
+ case _KMEM:
+ pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. "
+ "Writing any value to this file has no effect. "
+ "Please report your usecase to linux-mm(a)kvack.org if you "
+ "depend on this functionality.\n");
+ ret = 0;
+ break;
case _TCP:
ret = memcg_update_tcp_max(memcg, nr_pages);
break;
@@ -5078,6 +5086,12 @@ static struct cftype mem_cgroup_legacy_f
},
#endif
{
+ .name = "kmem.limit_in_bytes",
+ .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
+ .write = mem_cgroup_write,
+ .read_u64 = mem_cgroup_read_u64,
+ },
+ {
.name = "kmem.usage_in_bytes",
.private = MEMFILE_PRIVATE(_KMEM, RES_USAGE),
.read_u64 = mem_cgroup_read_u64,
_
Patches currently in -mm which might be from mhocko(a)suse.com are
mm-memcg-reconsider-kmemlimit_in_bytes-deprecation.patch
Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the
IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was
introduced in iommu_dma_init_domain() to fall back if not supported, but
this check runs too late: by that point, devices have been attached to
the IOMMU, and apple-dart's attach_dev() callback does not expect
IOMMU_DOMAIN_DMA_FQ domains.
Change the logic so the IOMMU_DOMAIN_DMA codepath is the default,
instead of explicitly enumerating all types.
Fixes an apple-dart regression in v6.5.
Cc: regressions(a)lists.linux.dev
Cc: stable(a)vger.kernel.org
Suggested-by: Robin Murphy <robin.murphy(a)arm.com>
Fixes: a4fdd9762272 ("iommu: Use flush queue capability")
Signed-off-by: Hector Martin <marcan(a)marcan.st>
---
Changes in v2:
- Fixed the issue in apple-dart instead of the iommu core, per Robin's
suggestion.
- Link to v1: https://lore.kernel.org/r/20230922-iommu-type-regression-v1-1-1ed3825b2c38@…
---
drivers/iommu/apple-dart.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 2082081402d3..0b8927508427 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -671,8 +671,7 @@ static int apple_dart_attach_dev(struct iommu_domain *domain,
return ret;
switch (domain->type) {
- case IOMMU_DOMAIN_DMA:
- case IOMMU_DOMAIN_UNMANAGED:
+ default:
ret = apple_dart_domain_add_streams(dart_domain, cfg);
if (ret)
return ret;
---
base-commit: ce9ecca0238b140b88f43859b211c9fdfd8e5b70
change-id: 20230922-iommu-type-regression-25b4f43df770
Best regards,
--
Hector Martin <marcan(a)marcan.st>
During SCM probe, to identify the SCM convention, an scm call is made
with SMC_CONVENTION_ARM_64 followed by SMC_CONVENTION_ARM_32. Based on
the result, the convention to be used is decided.
IPQ chipsets starting from IPQ807x support both 32-bit and 64-bit kernel
variants, however the TZ firmware runs in 64-bit mode. When running a
32-bit kernel, an scm call made with SMC_CONVENTION_ARM_64 causes a
system crash due to the difference between the ARM and AARCH64 register
sets, which are accessed by the TZ.
To avoid this, use SMC_CONVENTION_ARM_64 only on ARM64 builds.
Cc: stable(a)vger.kernel.org
Fixes: 9a434cee773a ("firmware: qcom_scm: Dynamically support SMCCC and legacy conventions")
Signed-off-by: Kathiravan T <quic_kathirav(a)quicinc.com>
---
Changes in V2:
- Added the Fixes tag and cc'd stable mailing list
drivers/firmware/qcom_scm.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
index fde33acd46b7..db6754db48a0 100644
--- a/drivers/firmware/qcom_scm.c
+++ b/drivers/firmware/qcom_scm.c
@@ -171,6 +171,7 @@ static enum qcom_scm_convention __get_convention(void)
if (likely(qcom_scm_convention != SMC_CONVENTION_UNKNOWN))
return qcom_scm_convention;
+#if IS_ENABLED(CONFIG_ARM64)
/*
* Device isn't required as there is only one argument - no device
* needed to dma_map_single to secure world
@@ -191,6 +192,7 @@ static enum qcom_scm_convention __get_convention(void)
forced = true;
goto found;
}
+#endif
probed_convention = SMC_CONVENTION_ARM_32;
ret = __scm_smc_call(NULL, &desc, probed_convention, &res, true);
--
2.17.1
From: Alisa-Dariana Roman <alisa.roman(a)analog.com>
The avdd and the reference voltage are two different sources but the
reference voltage was assigned according to the avdd supply.
Add vref regulator structure and set the reference voltage according to
the vref supply from the devicetree.
In case vref supply is missing, reference voltage is set according to
the avdd supply for compatibility with old devicetrees.
Fixes: b581f748cce0 ("staging: iio: adc: ad7192: move out of staging")
Signed-off-by: Alisa-Dariana Roman <alisa.roman(a)analog.com>
Cc: stable(a)vger.kernel.org
---
v1 -> v2
- use dev_err_probe()
Link: https://lore.kernel.org/lkml/20230923225827.75681-1-alisadariana@gmail.com/
drivers/iio/adc/ad7192.c | 29 +++++++++++++++++++++++++----
1 file changed, 25 insertions(+), 4 deletions(-)
diff --git a/drivers/iio/adc/ad7192.c b/drivers/iio/adc/ad7192.c
index 69d1103b9508..b64fd365f83f 100644
--- a/drivers/iio/adc/ad7192.c
+++ b/drivers/iio/adc/ad7192.c
@@ -177,6 +177,7 @@ struct ad7192_chip_info {
struct ad7192_state {
const struct ad7192_chip_info *chip_info;
struct regulator *avdd;
+ struct regulator *vref;
struct clk *mclk;
u16 int_vref_mv;
u32 fclk;
@@ -1008,10 +1009,30 @@ static int ad7192_probe(struct spi_device *spi)
if (ret)
return dev_err_probe(&spi->dev, ret, "Failed to enable specified DVdd supply\n");
- ret = regulator_get_voltage(st->avdd);
- if (ret < 0) {
- dev_err(&spi->dev, "Device tree error, reference voltage undefined\n");
- return ret;
+ st->vref = devm_regulator_get_optional(&spi->dev, "vref");
+ if (IS_ERR(st->vref)) {
+ if (PTR_ERR(st->vref) != -ENODEV)
+ return PTR_ERR(st->vref);
+
+ ret = regulator_get_voltage(st->avdd);
+ if (ret < 0)
+ return dev_err_probe(&spi->dev, ret,
+ "Device tree error, AVdd voltage undefined\n");
+ } else {
+ ret = regulator_enable(st->vref);
+ if (ret) {
+ dev_err(&spi->dev, "Failed to enable specified Vref supply\n");
+ return ret;
+ }
+
+ ret = devm_add_action_or_reset(&spi->dev, ad7192_reg_disable, st->vref);
+ if (ret)
+ return ret;
+
+ ret = regulator_get_voltage(st->vref);
+ if (ret < 0)
+ return dev_err_probe(&spi->dev, ret,
+ "Device tree error, Vref voltage undefined\n");
}
st->int_vref_mv = ret / 1000;
--
2.34.1
From: Alisa-Dariana Roman <alisadariana(a)gmail.com>
The avdd and the reference voltage are two different sources but the
reference voltage was assigned according to the avdd supply.
Add vref regulator structure and set the reference voltage according to
the vref supply from the devicetree.
In case vref supply is missing, reference voltage is set according to
the avdd supply for compatibility with old devicetrees.
Fixes: b581f748cce0 ("staging: iio: adc: ad7192: move out of staging")
Signed-off-by: Alisa-Dariana Roman <alisa.roman(a)analog.com>
Cc: stable(a)vger.kernel.org
---
drivers/iio/adc/ad7192.c | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)
diff --git a/drivers/iio/adc/ad7192.c b/drivers/iio/adc/ad7192.c
index 69d1103b9508..c414fed60dd3 100644
--- a/drivers/iio/adc/ad7192.c
+++ b/drivers/iio/adc/ad7192.c
@@ -177,6 +177,7 @@ struct ad7192_chip_info {
struct ad7192_state {
const struct ad7192_chip_info *chip_info;
struct regulator *avdd;
+ struct regulator *vref;
struct clk *mclk;
u16 int_vref_mv;
u32 fclk;
@@ -1008,10 +1009,32 @@ static int ad7192_probe(struct spi_device *spi)
if (ret)
return dev_err_probe(&spi->dev, ret, "Failed to enable specified DVdd supply\n");
- ret = regulator_get_voltage(st->avdd);
- if (ret < 0) {
- dev_err(&spi->dev, "Device tree error, reference voltage undefined\n");
- return ret;
+ st->vref = devm_regulator_get_optional(&spi->dev, "vref");
+ if (IS_ERR(st->vref)) {
+ if (PTR_ERR(st->vref) != -ENODEV)
+ return PTR_ERR(st->vref);
+
+ ret = regulator_get_voltage(st->avdd);
+ if (ret < 0) {
+ dev_err(&spi->dev, "Device tree error, AVdd voltage undefined\n");
+ return ret;
+ }
+ } else {
+ ret = regulator_enable(st->vref);
+ if (ret) {
+ dev_err(&spi->dev, "Failed to enable specified Vref supply\n");
+ return ret;
+ }
+
+ ret = devm_add_action_or_reset(&spi->dev, ad7192_reg_disable, st->vref);
+ if (ret)
+ return ret;
+
+ ret = regulator_get_voltage(st->vref);
+ if (ret < 0) {
+ dev_err(&spi->dev, "Device tree error, Vref voltage undefined\n");
+ return ret;
+ }
}
st->int_vref_mv = ret / 1000;
--
2.34.1
Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the
IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was
introduced in iommu_dma_init_domain() to fall back if not supported, but
this check runs too late: by that point, devices have been attached to
the IOMMU, and the IOMMU driver might not expect FQ domains at
ops->attach_dev() time.
Ensure that we immediately clamp FQ domains to plain DMA if not
supported by the driver at device attach time, not later.
This regressed apple-dart in v6.5.
Cc: regressions(a)lists.linux.dev
Cc: stable(a)vger.kernel.org
Fixes: a4fdd9762272 ("iommu: Use flush queue capability")
Signed-off-by: Hector Martin <marcan(a)marcan.st>
---
drivers/iommu/iommu.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3bfc56df4f78..12464eaa8d91 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2039,6 +2039,15 @@ static int __iommu_attach_device(struct iommu_domain *domain,
if (unlikely(domain->ops->attach_dev == NULL))
return -ENODEV;
+ /*
+ * Ensure we do not try to attach devices to FQ domains if the
+ * IOMMU does not support them. We can safely fall back to
+ * non-FQ.
+ */
+ if (domain->type == IOMMU_DOMAIN_DMA_FQ &&
+ !device_iommu_capable(dev, IOMMU_CAP_DEFERRED_FLUSH))
+ domain->type = IOMMU_DOMAIN_DMA;
+
ret = domain->ops->attach_dev(domain, dev);
if (ret)
return ret;
---
base-commit: ce9ecca0238b140b88f43859b211c9fdfd8e5b70
change-id: 20230922-iommu-type-regression-25b4f43df770
Best regards,
--
Hector Martin <marcan(a)marcan.st>
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 7fda67e8c3ab6069f75888f67958a6d30454a9f6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092055-disband-unveiling-f6cc@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
7fda67e8c3ab ("ext4: fix rec_len verify error")
46c116b920eb ("ext4: verify dir block before splitting it")
f036adb39976 ("ext4: rename "dirent_csum" functions to use "dirblock"")
b886ee3e778e ("ext4: Support case-insensitive file name lookups")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7fda67e8c3ab6069f75888f67958a6d30454a9f6 Mon Sep 17 00:00:00 2001
From: Shida Zhang <zhangshida(a)kylinos.cn>
Date: Thu, 3 Aug 2023 14:09:38 +0800
Subject: [PATCH] ext4: fix rec_len verify error
With the configuration PAGE_SIZE 64k and filesystem blocksize 64k,
a problem occurred when more than 13 million files were directly created
under a directory:
EXT4-fs error (device xx): ext4_dx_csum_set:492: inode #xxxx: comm xxxxx: dir seems corrupt? Run e2fsck -D.
EXT4-fs error (device xx): ext4_dx_csum_verify:463: inode #xxxx: comm xxxxx: dir seems corrupt? Run e2fsck -D.
EXT4-fs error (device xx): dx_probe:856: inode #xxxx: block 8188: comm xxxxx: Directory index failed checksum
When enough files are created, the fake_dirent->reclen will be 0xffff,
which does not equal the blocksize 65536, i.e. 0x10000.
This is not the case when the blocksize is 4k: there, the
fake_dirent->reclen will be 0x1000, which does equal the blocksize 4k,
i.e. 0x1000.
The problem seems to be related to the limitation of the 16-bit field
when the blocksize is set to 64k.
To address this, helpers like ext4_rec_len_{from,to}_disk have already
been introduced to convert between the encoded and the plain form of
rec_len.
So fix this by using the helper here, and at all the other places in
this file too.
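For reference, a condensed sketch of the decode side for large-page-size
builds (the real ext4_rec_len_from_disk() in fs/ext4/ext4.h also handles
rec_len values above 64k; EXT4_MAX_REC_LEN is 0xffff upstream):

#define EXT4_MAX_REC_LEN ((1 << 16) - 1)

static inline unsigned int rec_len_from_disk(unsigned int len,
					     unsigned int blocksize)
{
	/* a full 64k record cannot fit in the 16-bit on-disk field, so
	 * it is stored as 0xffff and decoded back to the block size */
	if (len == EXT4_MAX_REC_LEN || len == 0)
		return blocksize;
	return len;
}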
Cc: stable(a)kernel.org
Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes")
Suggested-by: Andreas Dilger <adilger(a)dilger.ca>
Suggested-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Shida Zhang <zhangshida(a)kylinos.cn>
Reviewed-by: Andreas Dilger <adilger(a)dilger.ca>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Link: https://lore.kernel.org/r/20230803060938.1929759-1-zhangshida@kylinos.cn
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index c0f0b4e2413b..c1ceccab05f5 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -343,17 +343,17 @@ static struct ext4_dir_entry_tail *get_dirent_tail(struct inode *inode,
struct buffer_head *bh)
{
struct ext4_dir_entry_tail *t;
+ int blocksize = EXT4_BLOCK_SIZE(inode->i_sb);
#ifdef PARANOID
struct ext4_dir_entry *d, *top;
d = (struct ext4_dir_entry *)bh->b_data;
top = (struct ext4_dir_entry *)(bh->b_data +
- (EXT4_BLOCK_SIZE(inode->i_sb) -
- sizeof(struct ext4_dir_entry_tail)));
- while (d < top && d->rec_len)
+ (blocksize - sizeof(struct ext4_dir_entry_tail)));
+ while (d < top && ext4_rec_len_from_disk(d->rec_len, blocksize))
d = (struct ext4_dir_entry *)(((void *)d) +
- le16_to_cpu(d->rec_len));
+ ext4_rec_len_from_disk(d->rec_len, blocksize));
if (d != top)
return NULL;
@@ -364,7 +364,8 @@ static struct ext4_dir_entry_tail *get_dirent_tail(struct inode *inode,
#endif
if (t->det_reserved_zero1 ||
- le16_to_cpu(t->det_rec_len) != sizeof(struct ext4_dir_entry_tail) ||
+ (ext4_rec_len_from_disk(t->det_rec_len, blocksize) !=
+ sizeof(struct ext4_dir_entry_tail)) ||
t->det_reserved_zero2 ||
t->det_reserved_ft != EXT4_FT_DIR_CSUM)
return NULL;
@@ -445,13 +446,14 @@ static struct dx_countlimit *get_dx_countlimit(struct inode *inode,
struct ext4_dir_entry *dp;
struct dx_root_info *root;
int count_offset;
+ int blocksize = EXT4_BLOCK_SIZE(inode->i_sb);
+ unsigned int rlen = ext4_rec_len_from_disk(dirent->rec_len, blocksize);
- if (le16_to_cpu(dirent->rec_len) == EXT4_BLOCK_SIZE(inode->i_sb))
+ if (rlen == blocksize)
count_offset = 8;
- else if (le16_to_cpu(dirent->rec_len) == 12) {
+ else if (rlen == 12) {
dp = (struct ext4_dir_entry *)(((void *)dirent) + 12);
- if (le16_to_cpu(dp->rec_len) !=
- EXT4_BLOCK_SIZE(inode->i_sb) - 12)
+ if (ext4_rec_len_from_disk(dp->rec_len, blocksize) != blocksize - 12)
return NULL;
root = (struct dx_root_info *)(((void *)dp + 12));
if (root->reserved_zero ||
@@ -1315,6 +1317,7 @@ static int dx_make_map(struct inode *dir, struct buffer_head *bh,
unsigned int buflen = bh->b_size;
char *base = bh->b_data;
struct dx_hash_info h = *hinfo;
+ int blocksize = EXT4_BLOCK_SIZE(dir->i_sb);
if (ext4_has_metadata_csum(dir->i_sb))
buflen -= sizeof(struct ext4_dir_entry_tail);
@@ -1335,11 +1338,12 @@ static int dx_make_map(struct inode *dir, struct buffer_head *bh,
map_tail--;
map_tail->hash = h.hash;
map_tail->offs = ((char *) de - base)>>2;
- map_tail->size = le16_to_cpu(de->rec_len);
+ map_tail->size = ext4_rec_len_from_disk(de->rec_len,
+ blocksize);
count++;
cond_resched();
}
- de = ext4_next_entry(de, dir->i_sb->s_blocksize);
+ de = ext4_next_entry(de, blocksize);
}
return count;
}
From: Zheng Yejian <zhengyejian1(a)huawei.com>
The 'bytes' info in file 'per_cpu/cpu<X>/stats' means the number of
bytes in the cpu buffer that have not been consumed. However, currently
after consuming data by reading file 'trace_pipe', the 'bytes' info
is not updated as expected.
# cat per_cpu/cpu0/stats
entries: 0
overrun: 0
commit overrun: 0
bytes: 568 <--- 'bytes' is problematical !!!
oldest event ts: 8651.371479
now ts: 8653.912224
dropped events: 0
read events: 8
The root cause is incorrect accounting of cpu_buffer->read_bytes. To fix it:
1. When accounting 'read_bytes', count each consumed event in rb_advance_reader();
2. When accounting 'entries_bytes', exclude the discarded padding event which
is smaller than the minimum size, because it is invisible to the reader. Then
use rb_page_commit() instead of BUF_PAGE_SIZE when accounting for
page-based read/remove/overrun.
Also correct the comments of ring_buffer_bytes_cpu() in this patch.
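For context, the 'bytes' stat is the difference of those two counters; a
sketch of the relation (matching ring_buffer_bytes_cpu(), simplified to
omit the cpumask check):

static u64 unconsumed_bytes(struct ring_buffer_per_cpu *cpu_buffer)
{
	/* both counters must account at the same granularity, or the
	 * difference goes stale once a partially-filled page is read */
	return local_read(&cpu_buffer->entries_bytes) -
	       cpu_buffer->read_bytes;
}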
Link: https://lore.kernel.org/linux-trace-kernel/20230921125425.1708423-1-zhengye…
Cc: stable(a)vger.kernel.org
Fixes: c64e148a3be3 ("trace: Add ring buffer stats to measure rate of events")
Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index a1651edc48d5..28daf0ce95c5 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -354,6 +354,11 @@ static void rb_init_page(struct buffer_data_page *bpage)
local_set(&bpage->commit, 0);
}
+static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage)
+{
+ return local_read(&bpage->page->commit);
+}
+
static void free_buffer_page(struct buffer_page *bpage)
{
free_page((unsigned long)bpage->page);
@@ -2003,7 +2008,7 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned long nr_pages)
* Increment overrun to account for the lost events.
*/
local_add(page_entries, &cpu_buffer->overrun);
- local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
+ local_sub(rb_page_commit(to_remove_page), &cpu_buffer->entries_bytes);
local_inc(&cpu_buffer->pages_lost);
}
@@ -2367,11 +2372,6 @@ rb_reader_event(struct ring_buffer_per_cpu *cpu_buffer)
cpu_buffer->reader_page->read);
}
-static __always_inline unsigned rb_page_commit(struct buffer_page *bpage)
-{
- return local_read(&bpage->page->commit);
-}
-
static struct ring_buffer_event *
rb_iter_head_event(struct ring_buffer_iter *iter)
{
@@ -2517,7 +2517,7 @@ rb_handle_head_page(struct ring_buffer_per_cpu *cpu_buffer,
* the counters.
*/
local_add(entries, &cpu_buffer->overrun);
- local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
+ local_sub(rb_page_commit(next_page), &cpu_buffer->entries_bytes);
local_inc(&cpu_buffer->pages_lost);
/*
@@ -2660,9 +2660,6 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
event = __rb_page_index(tail_page, tail);
- /* account for padding bytes */
- local_add(BUF_PAGE_SIZE - tail, &cpu_buffer->entries_bytes);
-
/*
* Save the original length to the meta data.
* This will be used by the reader to add lost event
@@ -2676,7 +2673,8 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
* write counter enough to allow another writer to slip
* in on this page.
* We put in a discarded commit instead, to make sure
- * that this space is not used again.
+ * that this space is not used again, and this space will
+ * not be accounted into 'entries_bytes'.
*
* If we are less than the minimum size, we don't need to
* worry about it.
@@ -2701,6 +2699,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
/* time delta must be non zero */
event->time_delta = 1;
+ /* account for padding bytes */
+ local_add(BUF_PAGE_SIZE - tail, &cpu_buffer->entries_bytes);
+
/* Make sure the padding is visible before the tail_page->write update */
smp_wmb();
@@ -4215,7 +4216,7 @@ u64 ring_buffer_oldest_event_ts(struct trace_buffer *buffer, int cpu)
EXPORT_SYMBOL_GPL(ring_buffer_oldest_event_ts);
/**
- * ring_buffer_bytes_cpu - get the number of bytes consumed in a cpu buffer
+ * ring_buffer_bytes_cpu - get the number of bytes unconsumed in a cpu buffer
* @buffer: The ring buffer
* @cpu: The per CPU buffer to read from.
*/
@@ -4723,6 +4724,7 @@ static void rb_advance_reader(struct ring_buffer_per_cpu *cpu_buffer)
length = rb_event_length(event);
cpu_buffer->reader_page->read += length;
+ cpu_buffer->read_bytes += length;
}
static void rb_advance_iter(struct ring_buffer_iter *iter)
@@ -5816,7 +5818,7 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
} else {
/* update the entry counter */
cpu_buffer->read += rb_page_entries(reader);
- cpu_buffer->read_bytes += BUF_PAGE_SIZE;
+ cpu_buffer->read_bytes += rb_page_commit(reader);
/* swap the pages */
rb_init_page(bpage);
--
2.40.1
Hi Greg, Sasha,
The following list shows patches that you can cherry-pick to -stable 6.5.
I am using original commit IDs for reference:
1) 7ab9d0827af8 ("netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention")
2) 4e5f5b47d8de ("netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC")
3) 1d16d80d4230 ("netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails")
4) 7606622f20da ("netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration")
5) 44a76f08f7ca ("netfilter: nf_tables: fix memleak when more than 255 elements expired")
Please, apply.
Thanks.
Florian Westphal (1):
netfilter: nf_tables: fix memleak when more than 255 elements expired
Pablo Neira Ayuso (4):
netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention
netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC
netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails
netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration
include/net/netfilter/nf_tables.h | 7 ++++---
net/netfilter/nf_tables_api.c | 32 ++++++++++++++++++++++++++-----
net/netfilter/nft_set_hash.c | 11 ++++-------
net/netfilter/nft_set_pipapo.c | 4 ++--
net/netfilter/nft_set_rbtree.c | 8 +++-----
5 files changed, 40 insertions(+), 22 deletions(-)
--
2.30.2
23 September 2023.
Hello,
I would like to share a business proposal with you. For further
details, reply in English.
Regards
Mrs Victoria Cleland
_________________________
Secretary: Lamya Essaoui
From: Ming Lei <ming.lei(a)redhat.com>
commit d36a9ea5e7766961e753ee38d4c331bbe6ef659b upstream.
For blk-mq, the queue release handler is usually called after
blk_mq_freeze_queue_wait() returns. However, the
q_usage_counter->release() handler may not have run yet at that time, so
this can cause a use-after-free.
Fix the issue by moving percpu_ref_exit() into blk_free_queue_rcu().
Since ->release() is called with the rcu read lock held, it is agreed
that the race should be covered in the caller, per the discussion in the
two links.
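A generic sketch of the resulting lifetime rule, using an illustrative
'struct obj' (this is not the block-layer code itself): tearing the ref
down inside the RCU callback guarantees ->release() has finished first.

#include <linux/percpu-refcount.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct obj {
	struct percpu_ref ref;
	struct rcu_head rcu;
};

static void obj_free_rcu(struct rcu_head *head)
{
	struct obj *o = container_of(head, struct obj, rcu);

	/* safe: the grace period covering ->release() has elapsed */
	percpu_ref_exit(&o->ref);
	kfree(o);
}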
Backport-notes: Not a clean cherry-pick since a lot has changed,
however essentially the same fix.
Reported-by: Zhang Wensheng <zhangwensheng(a)huaweicloud.com>
Reported-by: Zhong Jinghua <zhongjinghua(a)huawei.com>
Link: https://lore.kernel.org/linux-block/Y5prfOjyyjQKUrtH@T590/T/#u
Link: https://lore.kernel.org/lkml/Y4%2FmzMd4evRg9yDi@fedora/
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Yu Kuai <yukuai3(a)huawei.com>
Cc: Dennis Zhou <dennis(a)kernel.org>
Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Link: https://lore.kernel.org/r/20221215021629.74870-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Saranya Muruganandam <saranyamohan(a)google.com>
---
block/blk-sysfs.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index a582ea0da74f..a82bdec923b2 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -737,6 +737,7 @@ static void blk_free_queue_rcu(struct rcu_head *rcu_head)
struct request_queue *q = container_of(rcu_head, struct request_queue,
rcu_head);
+ percpu_ref_exit(&q->q_usage_counter);
kmem_cache_free(blk_get_queue_kmem_cache(blk_queue_has_srcu(q)), q);
}
@@ -762,8 +763,6 @@ static void blk_release_queue(struct kobject *kobj)
might_sleep();
- percpu_ref_exit(&q->q_usage_counter);
-
if (q->poll_stat)
blk_stat_remove_callback(q, q->poll_cb);
blk_stat_free_callback(q->poll_cb);
--
2.42.0.515.g380fc7ccd1-goog
Hi Greg, Sasha,
The following list shows the backported patches; this batch targets
garbage collection (GC) / set timeout fixes that address possible UaF
and memleaks. I am using original commit IDs for reference:
1) 212ed75dc5fb ("netfilter: nf_tables: integrate pipapo into commit protocol")
2) 24138933b97b ("netfilter: nf_tables: don't skip expired elements during walk")
3) 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
4) f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
5) c92db3030492 ("netfilter: nft_set_hash: mark set element as dead when deleting from packet path")
6) a2dd0233cbc4 ("netfilter: nf_tables: remove busy mark and gc batch API")
7) 7845914f45f0 ("netfilter: nf_tables: don't fail inserts if duplicate has expired")
8) 6a33d8b73dfa ("netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path")
9) 02c6c24402bf ("netfilter: nf_tables: GC transaction race with netns dismantle")
10) 720344340fb9 ("netfilter: nf_tables: GC transaction race with abort path")
11) 8357bc946a2a ("netfilter: nf_tables: use correct lock to protect gc_list")
12) 8e51830e29e1 ("netfilter: nf_tables: defer gc run if previous batch is still pending")
13) 2ee52ae94baa ("netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction")
14) 96b33300fba8 ("netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention")
15) 6d365eabce3c ("netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails")
16) b079155faae9 ("netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration")
17) cf5000a7787c ("netfilter: nf_tables: fix memleak when more than 255 elements expired")
Please, apply.
Thanks.
Florian Westphal (4):
netfilter: nf_tables: don't skip expired elements during walk
netfilter: nf_tables: don't fail inserts if duplicate has expired
netfilter: nf_tables: defer gc run if previous batch is still pending
netfilter: nf_tables: fix memleak when more than 255 elements expired
Pablo Neira Ayuso (13):
netfilter: nf_tables: integrate pipapo into commit protocol
netfilter: nf_tables: GC transaction API to avoid race with control plane
netfilter: nf_tables: adapt set backend to use GC transaction API
netfilter: nft_set_hash: mark set element as dead when deleting from packet path
netfilter: nf_tables: remove busy mark and gc batch API
netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path
netfilter: nf_tables: GC transaction race with netns dismantle
netfilter: nf_tables: GC transaction race with abort path
netfilter: nf_tables: use correct lock to protect gc_list
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention
netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails
netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration
include/net/netfilter/nf_tables.h | 125 +++++------
net/netfilter/nf_tables_api.c | 341 +++++++++++++++++++++++++++---
net/netfilter/nft_set_hash.c | 87 +++++---
net/netfilter/nft_set_pipapo.c | 115 ++++++----
net/netfilter/nft_set_rbtree.c | 157 ++++++++------
5 files changed, 589 insertions(+), 236 deletions(-)
--
2.30.2
The patch titled
Subject: arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
arm64-hugetlb-fix-set_huge_pte_at-to-work-with-all-swap-entries.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
Date: Fri, 22 Sep 2023 12:58:04 +0100
When called with a swap entry that does not embed a PFN (e.g.
PTE_MARKER_POISONED or PTE_MARKER_UFFD_WP), the previous implementation of
set_huge_pte_at() would either cause a BUG() to fire (if CONFIG_DEBUG_VM
is enabled) or cause a dereference of an invalid address and subsequent
panic.
arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries.
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written.
It previously did this by grabbing the folio out of the pte and querying
its size.
However, there are cases when the pte being set is actually a swap entry.
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries. And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.
But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN. And this causes the code to go
bang. The triggering case is for the uffd poison test, commit
99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, courtesy of commit
8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7. Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.
Arguably, the root cause is really due to commit 18f3962953e4 ("mm:
hugetlb: kill set_huge_swap_pte_at()"), which aimed to simplify the
interface to the core code by removing set_huge_swap_pte_at() (which took
a page size parameter) and replacing it with calls to set_huge_pte_at()
where the size was inferred from the folio, as described above. While that
commit didn't break anything at the time, it did break the interface
because it couldn't handle swap entries without PFNs. And since then new
callers have come along which rely on this working. But given the
brokenness is only observable after commit 8a13897fb0da ("mm: userfaultfd:
support UFFDIO_POISON for hugetlbfs"), that one gets the Fixes tag.
Now that we have modified the set_huge_pte_at() interface to pass the huge
page size in the previous patch, we can trivially fix this issue.
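For reference, a condensed sketch of what num_contig_ptes() derives from
the size (the real helper in arch/arm64/mm/hugetlbpage.c also rejects
unexpected sizes and handles the folded-PMD configuration):

static int num_contig_ptes(unsigned long size, size_t *pgsize)
{
	*pgsize = size;

	switch (size) {
	case CONT_PMD_SIZE:
		*pgsize = PMD_SIZE;
		return CONT_PMDS;	/* several pmds form one huge pte */
	case CONT_PTE_SIZE:
		*pgsize = PAGE_SIZE;
		return CONT_PTES;	/* several ptes form one huge pte */
	default:			/* PUD_SIZE or PMD_SIZE */
		return 1;
	}
}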
Link: https://lkml.kernel.org/r/20230922115804.2043771-3-ryan.roberts@arm.com
Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev(a)linux.ibm.com>
Cc: Alexandre Ghiti <alex(a)ghiti.fr>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Helge Deller <deller(a)gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley(a)HansenPartnership.com>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Sven Schnelle <svens(a)linux.ibm.com>
Cc: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/arm64/mm/hugetlbpage.c | 17 +++--------------
1 file changed, 3 insertions(+), 14 deletions(-)
--- a/arch/arm64/mm/hugetlbpage.c~arm64-hugetlb-fix-set_huge_pte_at-to-work-with-all-swap-entries
+++ a/arch/arm64/mm/hugetlbpage.c
@@ -241,13 +241,6 @@ static void clear_flush(struct mm_struct
flush_tlb_range(&vma, saddr, addr);
}
-static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
-{
- VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));
-
- return page_folio(pfn_to_page(swp_offset_pfn(entry)));
-}
-
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned long sz)
{
@@ -257,13 +250,10 @@ void set_huge_pte_at(struct mm_struct *m
unsigned long pfn, dpfn;
pgprot_t hugeprot;
- if (!pte_present(pte)) {
- struct folio *folio;
-
- folio = hugetlb_swap_entry_to_folio(pte_to_swp_entry(pte));
- ncontig = num_contig_ptes(folio_size(folio), &pgsize);
+ ncontig = num_contig_ptes(sz, &pgsize);
- for (i = 0; i < ncontig; i++, ptep++)
+ if (!pte_present(pte)) {
+ for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
set_pte_at(mm, addr, ptep, pte);
return;
}
@@ -273,7 +263,6 @@ void set_huge_pte_at(struct mm_struct *m
return;
}
- ncontig = find_num_contig(mm, addr, ptep, &pgsize);
pfn = pte_pfn(pte);
dpfn = pgsize >> PAGE_SHIFT;
hugeprot = pte_pgprot(pte);
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
mm-hugetlb-add-huge-page-size-param-to-set_huge_pte_at.patch
arm64-hugetlb-fix-set_huge_pte_at-to-work-with-all-swap-entries.patch
Hi Greg, Sasha,
The following list shows the backported patches; this batch targets
garbage collection (GC) / set timeout fixes that address possible UaF
and memleaks. I am using original commit IDs for reference:
1) 24138933b97b ("netfilter: nf_tables: don't skip expired elements during walk")
2) 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
3) f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
4) c92db3030492 ("netfilter: nft_set_hash: mark set element as dead when deleting from packet path")
5) a2dd0233cbc4 ("netfilter: nf_tables: remove busy mark and gc batch API")
6) 7845914f45f0 ("netfilter: nf_tables: don't fail inserts if duplicate has expired")
7) 6a33d8b73dfa ("netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path")
8) 02c6c24402bf ("netfilter: nf_tables: GC transaction race with netns dismantle")
9) 720344340fb9 ("netfilter: nf_tables: GC transaction race with abort path")
10) 8357bc946a2a ("netfilter: nf_tables: use correct lock to protect gc_list")
11) 8e51830e29e1 ("netfilter: nf_tables: defer gc run if previous batch is still pending")
12) 2ee52ae94baa ("netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction")
13) 96b33300fba8 ("netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention")
14) 4a9e12ea7e70 ("netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC")
15) 6d365eabce3c ("netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails")
16) b079155faae9 ("netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration")
17) cf5000a7787c ("netfilter: nf_tables: fix memleak when more than 255 elements expired")
Please, apply.
Thanks.
Florian Westphal (4):
netfilter: nf_tables: don't skip expired elements during walk
netfilter: nf_tables: don't fail inserts if duplicate has expired
netfilter: nf_tables: defer gc run if previous batch is still pending
netfilter: nf_tables: fix memleak when more than 255 elements expired
Pablo Neira Ayuso (13):
netfilter: nf_tables: GC transaction API to avoid race with control plane
netfilter: nf_tables: adapt set backend to use GC transaction API
netfilter: nft_set_hash: mark set element as dead when deleting from packet path
netfilter: nf_tables: remove busy mark and gc batch API
netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path
netfilter: nf_tables: GC transaction race with netns dismantle
netfilter: nf_tables: GC transaction race with abort path
netfilter: nf_tables: use correct lock to protect gc_list
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention
netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC
netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails
netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration
include/net/netfilter/nf_tables.h | 126 +++++-----
net/netfilter/nf_tables_api.c | 369 ++++++++++++++++++++++++------
net/netfilter/nft_set_hash.c | 87 ++++---
net/netfilter/nft_set_pipapo.c | 67 +++---
net/netfilter/nft_set_rbtree.c | 161 +++++++------
5 files changed, 547 insertions(+), 263 deletions(-)
--
2.30.2
Instead of constantly checking each possibility of the maple state,
create a fast path that will skip over checking unlikely states.
Cc: stable(a)vger.kernel.org
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
---
include/linux/maple_tree.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index e41c70ac7744..f66f5f78f8cf 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -511,6 +511,15 @@ static inline bool mas_is_paused(const struct ma_state *mas)
return mas->node == MAS_PAUSE;
}
+/* Check if the mas is pointing to a node or not */
+static inline bool mas_is_active(struct ma_state *mas)
+{
+ if ((unsigned long)mas->node >= MAPLE_RESERVED_RANGE)
+ return true;
+
+ return false;
+}
+
/**
* mas_reset() - Reset a Maple Tree operation state.
* @mas: Maple Tree operation state.
--
2.40.1
The NAND core complies with the ONFI specification, which itself
mentions that after any program or erase operation, a status check
should be performed to see whether the operation was finished *and*
successful.
The NAND core offers helpers to finish a page write (sending the
"PAGE PROG" command, waiting for the NAND chip to be ready again, and
checking the operation status). But in some cases, advanced controller
drivers might want to optimize this and craft their own page write
helper to leverage additional hardware capabilities, thus not always
using the core facilities.
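The check those core facilities encapsulate looks roughly like this
(condensed from nand_prog_page_end_op() in
drivers/mtd/nand/raw/nand_base.c):

	u8 status;
	int ret;

	/* after sending PAGE PROG and waiting for the chip to become
	 * ready again, read the chip status and check the fail bit */
	ret = nand_status_op(chip, &status);
	if (ret)
		return ret;

	if (status & NAND_STATUS_FAIL)
		return -EIO;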
Some drivers, like this one, do not use the core helper to finish a page
write because the final cycles are automatically managed by the
hardware. In this case, additional care must be taken to manually
perform the final status check.
Let's read the NAND chip status at the end of the page write helper and
return -EIO upon error.
Cc: stable(a)vger.kernel.org
Fixes: 02f26ecf8c77 ("mtd: nand: add reworked Marvell NAND controller driver")
Reported-by: Aviram Dali <aviramd(a)marvell.com>
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
Hello Aviram,
I have not tested this, but based on your report I believe the status
check is indeed missing here and could sometimes lead to unnoticed
partial writes.
Please test on your side and reply with your Tested-by if you validate
the change.
Any backport on kernels predating v4.17 will likely fail because of a
folder rename, so you will have to do the backport manually if needed.
Thanks,
Miquèl
---
drivers/mtd/nand/raw/marvell_nand.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/marvell_nand.c b/drivers/mtd/nand/raw/marvell_nand.c
index 30c15e4e1cc0..576441095012 100644
--- a/drivers/mtd/nand/raw/marvell_nand.c
+++ b/drivers/mtd/nand/raw/marvell_nand.c
@@ -1162,6 +1162,7 @@ static int marvell_nfc_hw_ecc_hmg_do_write_page(struct nand_chip *chip,
.ndcb[2] = NDCB2_ADDR5_PAGE(page),
};
unsigned int oob_bytes = lt->spare_bytes + (raw ? lt->ecc_bytes : 0);
+ u8 status;
int ret;
/* NFCv2 needs more information about the operation being executed */
@@ -1195,7 +1196,18 @@ static int marvell_nfc_hw_ecc_hmg_do_write_page(struct nand_chip *chip,
ret = marvell_nfc_wait_op(chip,
PSEC_TO_MSEC(sdr->tPROG_max));
- return ret;
+ if (ret)
+ return ret;
+
+ /* Check write status on the chip side */
+ ret = nand_status_op(chip, &status);
+ if (ret)
+ return ret;
+
+ if (status & NAND_STATUS_FAIL)
+ return -EIO;
+
+ return 0;
}
static int marvell_nfc_hw_ecc_hmg_write_page_raw(struct nand_chip *chip,
@@ -1624,6 +1636,7 @@ static int marvell_nfc_hw_ecc_bch_write_page(struct nand_chip *chip,
int data_len = lt->data_bytes;
int spare_len = lt->spare_bytes;
int chunk, ret;
+ u8 status;
marvell_nfc_select_target(chip, chip->cur_cs);
@@ -1660,6 +1673,14 @@ static int marvell_nfc_hw_ecc_bch_write_page(struct nand_chip *chip,
if (ret)
return ret;
+ /* Check write status on the chip side */
+ ret = nand_status_op(chip, &status);
+ if (ret)
+ return ret;
+
+ if (status & NAND_STATUS_FAIL)
+ return -EIO;
+
return 0;
}
--
2.34.1
We currently provide the physical address of the DMA region
rather than the output of dma_map_resource() which is obviously wrong.
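The expected pairing, sketched (signatures per <linux/dma-mapping.h>;
error handling condensed): the unmap must be given the dma_addr_t that
dma_map_resource() returned, not the CPU physical address it was mapped
from.

nandc->base_dma = dma_map_resource(dev, res->start, resource_size(res),
				   DMA_BIDIRECTIONAL, 0);
if (dma_mapping_error(dev, nandc->base_dma))
	return -EIO;

/* ... and on the error/teardown path: */
dma_unmap_resource(dev, nandc->base_dma, resource_size(res),
		   DMA_BIDIRECTIONAL, 0);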
Fixes: 7330fc505af4 ("mtd: rawnand: qcom: stop using phys_to_dma()")
Cc: stable(a)vger.kernel.org
Reviewed-by: Manivannan Sadhasivam <mani(a)kernel.org>
Signed-off-by: Bibek Kumar Patro <quic_bibekkum(a)quicinc.com>
---
v5: Incorporated suggestions from Miquel/Mani
- Added tag to automatically include this patch in stable tree.
v4: Incorporated suggestion from Miquel
- Modified title and commit description.
https://lore.kernel.org/all/20230912115903.1007-1-quic_bibekkum@quicinc.com/
v3: Incorporated comments from Miquel
- Modified the commit message and title as per suggestions.
https://lore.kernel.org/all/20230912101814.7748-1-quic_bibekkum@quicinc.com/
v2: Incorporated comments from Pavan/Mani.
https://lore.kernel.org/all/20230911133026.29868-1-quic_bibekkum@quicinc.co…
v1: https://lore.kernel.org/all/20230907092854.11408-1-quic_bibekkum@quicinc.co…
drivers/mtd/nand/raw/qcom_nandc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/qcom_nandc.c b/drivers/mtd/nand/raw/qcom_nandc.c
index 64499c1b3603..b079605c84d3 100644
--- a/drivers/mtd/nand/raw/qcom_nandc.c
+++ b/drivers/mtd/nand/raw/qcom_nandc.c
@@ -3444,7 +3444,7 @@ static int qcom_nandc_probe(struct platform_device *pdev)
err_aon_clk:
clk_disable_unprepare(nandc->core_clk);
err_core_clk:
- dma_unmap_resource(dev, res->start, resource_size(res),
+ dma_unmap_resource(dev, nandc->base_dma, resource_size(res),
DMA_BIDIRECTIONAL, 0);
return ret;
}
--
2.17.1
Defining a prctl flag as an int is a footgun because on a 64 bit machine
and with a variadic implementation of prctl (like in musl and glibc),
when used directly as a prctl argument, it can get cast to long with
garbage upper bits, which would result in unexpected behavior.
This patch changes the constant to an unsigned long to eliminate that
possibility. This does not break UAPI.
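A minimal userspace illustration of the fixed behavior (assumes headers
new enough to define PR_SET_MDWE): with the flag defined as (1UL << 0),
the argument below is a full long, so no garbage upper bits can reach
the kernel.

#include <stdio.h>
#include <sys/prctl.h>

int main(void)
{
	/* PR_MDWE_REFUSE_EXEC_GAIN is now an unsigned long, so the
	 * variadic prctl() receives a full-width argument */
	if (prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0L, 0L, 0L))
		perror("PR_SET_MDWE");
	return 0;
}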
Fixes: b507808ebce2 ("mm: implement memory-deny-write-execute as a prctl")
Cc: stable(a)vger.kernel.org
Signed-off-by: Florent Revest <revest(a)chromium.org>
Suggested-by: Alexey Izbyshev <izbyshev(a)ispras.ru>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Acked-by: Catalin Marinas <catalin.marinas(a)arm.com>
---
include/uapi/linux/prctl.h | 2 +-
tools/include/uapi/linux/prctl.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 3c36aeade991..9a85c69782bd 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -283,7 +283,7 @@ struct prctl_mm_map {
/* Memory deny write / execute */
#define PR_SET_MDWE 65
-# define PR_MDWE_REFUSE_EXEC_GAIN 1
+# define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)
#define PR_GET_MDWE 66
diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h
index 3c36aeade991..9a85c69782bd 100644
--- a/tools/include/uapi/linux/prctl.h
+++ b/tools/include/uapi/linux/prctl.h
@@ -283,7 +283,7 @@ struct prctl_mm_map {
/* Memory deny write / execute */
#define PR_SET_MDWE 65
-# define PR_MDWE_REFUSE_EXEC_GAIN 1
+# define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)
#define PR_GET_MDWE 66
--
2.42.0.rc2.253.gd59a3bf2b4-goog
Hi Dear, on two occasions I have sent an email to you which you have
not responded to. Can you find time to respond to me now?
Good morning!
May I present a solution that enables real-time monitoring of every car, including its position, fuel consumption, and mileage?
In addition, our tool minimizes vehicle maintenance costs, shortens travel times, and simplifies planning routes and deliveries.
More than 49,000 customers already benefit from our knowledge and experience. We monitor 809,000 vehicles worldwide, which is our best calling card.
Please reply by e-mail if we could discuss together the potential of using such a solution in your company.
Regards
Mateusz Talaga
Hi All,
This series fixes a bug in arm64's implementation of set_huge_pte_at(), which
can result in an unprivileged user causing a kernel panic. The problem was
triggered when running the new uffd poison mm selftest for HUGETLB memory. This
test (and the uffd poison feature) was merged for v6.6-rc1. However, upon
inspection there are multiple other pre-existing paths that can trigger this
bug.
Ideally, I'd like to get this fix in for v6.6 if possible? And I guess it should
be backported too, given there are call sites where this can theoretically
happen that pre-date v6.6-rc1 (I've cc'ed stable(a)vger.kernel.org).
Description of Bug
------------------
arm64's huge pte implementation supports multiple huge page sizes, some of which
are implemented in the page table with contiguous mappings. So set_huge_pte_at()
needs to work out how big the logical pte is, so that it can also work out how
many physical ptes (or pmds) need to be written. It does this by grabbing the
folio out of the pte and querying its size.
However, there are cases when the pte being set is actually a swap entry. But
this also used to work fine, because for huge ptes, we only ever saw migration
entries and hwpoison entries. And both of these types of swap entries have a PFN
embedded, so the code would grab that and everything still worked out.
But over time, more calls to set_huge_pte_at() have been added that set swap
entry types that do not embed a PFN. And this causes the code to go bang. The
triggering case is for the uffd poison test, commit 99aa77215ad0 ("selftests/mm:
add uffd unit test for UFFDIO_POISON"), which sets a PTE_MARKER_POISONED swap
entry. But review shows there are other places too (PTE_MARKER_UFFD_WP).
If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise, it
will dereference a bad pointer in page_folio():
static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
{
VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));
return page_folio(pfn_to_page(swp_offset_pfn(entry)));
}
So the root cause is due to commit 18f3962953e4 ("mm: hugetlb: kill
set_huge_swap_pte_at()"), which aimed to simplify the interface to the core code
by removing set_huge_swap_pte_at() (which took a page size parameter) and
replacing it with calls to set_huge_pte_at() where the size was inferred
from the folio, as described above. While that commit didn't break anything at
the time, it did break the interface because it couldn't handle swap entries
without PFNs. And since then new callers have come along which rely on this
working.
Fix
---
The simplest fix would have been to revert the dodgy cleanup commit, but since
things have moved on, this would have required an audit of all the new
set_huge_pte_at() call sites to see if they should be converted to
set_huge_swap_pte_at(). As per the original intent of the change, it would also
leave us open to future bugs when people invariably get it wrong and call the
wrong helper.
So instead, I've converted the first parameter of set_huge_pte_at() to be a vma
rather than an mm. This means that the arm64 code can easily recover the huge
page size in all cases. It's a bigger change, due to needing to touch the arches
that implement the function, but it is entirely mechanical, so in my view, low
risk.
I've compile-tested all touched arches; arm64, parisc, powerpc, riscv, s390 (and
additionally x86_64). I've additionally booted and run mm selftests against
arm64, where I observe the uffd poison test is fixed, and there are no other
regressions.
Patches
-------
patches 1-7: Convert core mm and arches to pass vma instead of mm
patch: 8: Fixes the arm64 bug
Patches based on v6.6-rc2.
Thanks,
Ryan
Ryan Roberts (8):
parisc: hugetlb: Convert set_huge_pte_at() to take vma
powerpc: hugetlb: Convert set_huge_pte_at() to take vma
riscv: hugetlb: Convert set_huge_pte_at() to take vma
s390: hugetlb: Convert set_huge_pte_at() to take vma
sparc: hugetlb: Convert set_huge_pte_at() to take vma
mm: hugetlb: Convert set_huge_pte_at() to take vma
arm64: hugetlb: Convert set_huge_pte_at() to take vma
arm64: hugetlb: Fix set_huge_pte_at() to work with all swap entries
arch/arm64/include/asm/hugetlb.h | 2 +-
arch/arm64/mm/hugetlbpage.c | 22 ++++----------
arch/parisc/include/asm/hugetlb.h | 2 +-
arch/parisc/mm/hugetlbpage.c | 4 +--
.../include/asm/nohash/32/hugetlb-8xx.h | 3 +-
arch/powerpc/mm/book3s64/hugetlbpage.c | 2 +-
arch/powerpc/mm/book3s64/radix_hugetlbpage.c | 2 +-
arch/powerpc/mm/nohash/8xx.c | 2 +-
arch/powerpc/mm/pgtable.c | 7 ++++-
arch/riscv/include/asm/hugetlb.h | 2 +-
arch/riscv/mm/hugetlbpage.c | 3 +-
arch/s390/include/asm/hugetlb.h | 8 +++--
arch/s390/mm/hugetlbpage.c | 8 ++++-
arch/sparc/include/asm/hugetlb.h | 8 +++--
arch/sparc/mm/hugetlbpage.c | 8 ++++-
include/asm-generic/hugetlb.h | 6 ++--
include/linux/hugetlb.h | 6 ++--
mm/damon/vaddr.c | 2 +-
mm/hugetlb.c | 30 +++++++++----------
mm/migrate.c | 2 +-
mm/rmap.c | 10 +++----
mm/vmalloc.c | 5 +++-
22 files changed, 80 insertions(+), 64 deletions(-)
--
2.25.1
The patch titled
Subject: mm: make PR_MDWE_REFUSE_EXEC_GAIN an unsigned long
has been added to the -mm mm-unstable branch. Its filename is
mm-make-pr_mdwe_refuse_exec_gain-an-unsigned-long.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Florent Revest <revest(a)chromium.org>
Subject: mm: make PR_MDWE_REFUSE_EXEC_GAIN an unsigned long
Date: Mon, 28 Aug 2023 17:08:56 +0200
Defining a prctl flag as an int is a footgun because on a 64 bit machine
and with a variadic implementation of prctl (like in musl and glibc), when
used directly as a prctl argument, it can get cast to long with garbage
upper bits, which would result in unexpected behavior.
This patch changes the constant to an unsigned long to eliminate that
possibility. This does not break UAPI.
Link: https://lkml.kernel.org/r/20230828150858.393570-5-revest@chromium.org
Fixes: b507808ebce2 ("mm: implement memory-deny-write-execute as a prctl")
Signed-off-by: Florent Revest <revest(a)chromium.org>
Suggested-by: Alexey Izbyshev <izbyshev(a)ispras.ru>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Acked-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Ayush Jain <ayush.jain3(a)amd.com>
Cc: Greg Thelen <gthelen(a)google.com>
Cc: Joey Gouly <joey.gouly(a)arm.com>
Cc: KP Singh <kpsingh(a)kernel.org>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Szabolcs Nagy <Szabolcs.Nagy(a)arm.com>
Cc: Topi Miettinen <toiwoton(a)gmail.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/uapi/linux/prctl.h | 2 +-
tools/include/uapi/linux/prctl.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--- a/include/uapi/linux/prctl.h~mm-make-pr_mdwe_refuse_exec_gain-an-unsigned-long
+++ a/include/uapi/linux/prctl.h
@@ -283,7 +283,7 @@ struct prctl_mm_map {
/* Memory deny write / execute */
#define PR_SET_MDWE 65
-# define PR_MDWE_REFUSE_EXEC_GAIN 1
+# define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)
#define PR_GET_MDWE 66
--- a/tools/include/uapi/linux/prctl.h~mm-make-pr_mdwe_refuse_exec_gain-an-unsigned-long
+++ a/tools/include/uapi/linux/prctl.h
@@ -283,7 +283,7 @@ struct prctl_mm_map {
/* Memory deny write / execute */
#define PR_SET_MDWE 65
-# define PR_MDWE_REFUSE_EXEC_GAIN 1
+# define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)
#define PR_GET_MDWE 66
_
Patches currently in -mm which might be from revest(a)chromium.org are
kselftest-vm-fix-tabs-spaces-inconsistency-in-the-mdwe-test.patch
kselftest-vm-fix-mdwes-mmap_fixed-test-case.patch
kselftest-vm-check-errnos-in-mdwe_test.patch
mm-make-pr_mdwe_refuse_exec_gain-an-unsigned-long.patch
mm-add-a-no_inherit-flag-to-the-pr_set_mdwe-prctl.patch
kselftest-vm-add-tests-for-no-inherit-memory-deny-write-execute.patch
Hi,
We are a professional bags manufacturer from China supplying all types of customized bags. If you're interested, please don't hesitate to contact me.
Best regards,
Steven
The patch titled
Subject: maple_tree: add mas_is_active() to detect in-tree walks
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
maple_tree-add-mas_active-to-detect-in-tree-walks.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: maple_tree: add mas_is_active() to detect in-tree walks
Date: Thu, 21 Sep 2023 14:12:35 -0400
Patch series "maple_tree: Fix mas_prev() state regression".
Pedro Falcato contacted me on IRC with an mprotect regression which was
bisected back to the iterator changes for maple tree. Root cause analysis
showed the mas_prev() running off the end of the VMA space (previous from
0) followed by mas_find(), would skip the first value.
This patchset introduces maple state underflow/overflow so the sequence of
calls on the maple state will return what the user expects.
This patch (of 2):
Instead of constantly checking each possibility of the maple state,
create a fast path that will skip over checking unlikely states.
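An illustrative sketch of the point of the helper (mas_needs_setup() is
a made-up name, not part of the patch): the special states (MAS_START,
MAS_PAUSE, MAS_NONE, ...) are encoded as small reserved values, so a
single range check replaces a chain of per-state tests.

static inline bool mas_needs_setup(struct ma_state *mas)
{
	/* anything at or above MAPLE_RESERVED_RANGE is a real node
	 * pointer, i.e. an in-tree walk already in progress */
	return !mas_is_active(mas);
}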
Link: https://lkml.kernel.org/r/20230921181236.509072-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230921181236.509072-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Pedro Falcato <pedro.falcato(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/maple_tree.h | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/include/linux/maple_tree.h~maple_tree-add-mas_active-to-detect-in-tree-walks
+++ a/include/linux/maple_tree.h
@@ -511,6 +511,15 @@ static inline bool mas_is_paused(const s
return mas->node == MAS_PAUSE;
}
+/* Check if the mas is pointing to a node or not */
+static inline bool mas_is_active(struct ma_state *mas)
+{
+ if ((unsigned long)mas->node >= MAPLE_RESERVED_RANGE)
+ return true;
+
+ return false;
+}
+
/**
* mas_reset() - Reset a Maple Tree operation state.
* @mas: Maple Tree operation state.
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-add-mas_active-to-detect-in-tree-walks.patch
maple_tree-add-mas_underflow-and-mas_overflow-states.patch
The patch titled
Subject: nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-potential-use-after-free-in-nilfs_gccache_submit_read_data.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Pan Bian <bianpan2016(a)163.com>
Subject: nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
Date: Thu, 21 Sep 2023 23:17:31 +0900
In nilfs_gccache_submit_read_data(), brelse(bh) is called to drop the
reference count of bh when the call to nilfs_dat_translate() fails. If
the reference count hits 0 and its owner page gets unlocked, bh may be
freed. However, bh->b_page is dereferenced to put the page after that,
which may result in a use-after-free bug. This patch moves the release
operation after unlocking and putting the page.
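The hazardous ordering, sketched (not the full function; bh is the
struct buffer_head being submitted):

	brelse(bh);		 /* may free bh if this drops the last ref */
	unlock_page(bh->b_page); /* use-after-free: bh may already be gone */
	put_page(bh->b_page);

Hence the fix unlocks and puts the page first, and releases bh last.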
NOTE: The function in question is only called in GC, and in combination
with current userland tools, address translation using DAT does not occur
in that function, so the code path that causes this issue will not be
executed. However, it is possible to run that code path by intentionally
modifying the userland GC library or by calling the GC ioctl directly.
[konishi.ryusuke(a)gmail.com: NOTE added to the commit log]
Link: https://lkml.kernel.org/r/1543201709-53191-1-git-send-email-bianpan2016@163…
Link: https://lkml.kernel.org/r/20230921141731.10073-1-konishi.ryusuke@gmail.com
Fixes: a3d93f709e89 ("nilfs2: block cache for garbage collection")
Signed-off-by: Pan Bian <bianpan2016(a)163.com>
Reported-by: Ferry Meng <mengferry(a)linux.alibaba.com>
Closes: https://lkml.kernel.org/r/20230818092022.111054-1-mengferry@linux.alibaba.c…
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/gcinode.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/nilfs2/gcinode.c~nilfs2-fix-potential-use-after-free-in-nilfs_gccache_submit_read_data
+++ a/fs/nilfs2/gcinode.c
@@ -73,10 +73,8 @@ int nilfs_gccache_submit_read_data(struc
struct the_nilfs *nilfs = inode->i_sb->s_fs_info;
err = nilfs_dat_translate(nilfs->ns_dat, vbn, &pbn);
- if (unlikely(err)) { /* -EIO, -ENOMEM, -ENOENT */
- brelse(bh);
+ if (unlikely(err)) /* -EIO, -ENOMEM, -ENOENT */
goto failed;
- }
}
lock_buffer(bh);
@@ -102,6 +100,8 @@ int nilfs_gccache_submit_read_data(struc
failed:
unlock_page(bh->b_page);
put_page(bh->b_page);
+ if (unlikely(err))
+ brelse(bh);
return err;
}
_
Patches currently in -mm which might be from bianpan2016(a)163.com are
nilfs2-fix-potential-use-after-free-in-nilfs_gccache_submit_read_data.patch
Callers of sock_sendmsg(), and similarly kernel_sendmsg(), in kernel
space may observe their value of msg_name change in cases where BPF
sendmsg hooks rewrite the send address. This has been confirmed to break
NFS mounts running in UDP mode and has the potential to break other
systems.
This patch:
1) Creates a new function called __sock_sendmsg() with same logic as the
old sock_sendmsg() function.
2) Replaces calls to sock_sendmsg() made by __sys_sendto() and
__sys_sendmsg() with __sock_sendmsg() to avoid an unnecessary copy,
as these system calls are already protected.
3) Modifies sock_sendmsg() so that it makes a copy of msg_name if
present before passing it down the stack to insulate callers from
changes to the send address.
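A sketch of the caller-side guarantee this restores (illustrative
function, not from the patch; NFS over UDP via sunrpc is the reported
real-world caller):

#include <linux/net.h>
#include <linux/in.h>
#include <linux/socket.h>
#include <linux/uio.h>

static int send_to(struct socket *sock, struct sockaddr_in *daddr,
		   struct kvec *vec, size_t len)
{
	struct msghdr msg = {
		.msg_name	= daddr,
		.msg_namelen	= sizeof(*daddr),
	};

	/* a BPF sendmsg hook may still rewrite the address used for
	 * this send, but *daddr itself now stays intact for reuse */
	return kernel_sendmsg(sock, &msg, vec, 1, len);
}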
Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/
Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jordan Rife <jrife(a)google.com>
---
v3->v4: Maintain reverse xmas tree order for variable declarations.
Remove precondition check for msg_namelen.
v2->v3: Add "Fixes" tag.
v1->v2: Split up original patch into patch series. Perform address copy
in sock_sendmsg() instead of sock->ops->sendmsg().
net/socket.c | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/net/socket.c b/net/socket.c
index c8b08b32f097e..a39ec136f5cff 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -737,6 +737,14 @@ static inline int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg)
return ret;
}
+static int __sock_sendmsg(struct socket *sock, struct msghdr *msg)
+{
+ int err = security_socket_sendmsg(sock, msg,
+ msg_data_left(msg));
+
+ return err ?: sock_sendmsg_nosec(sock, msg);
+}
+
/**
* sock_sendmsg - send a message through @sock
* @sock: socket
@@ -747,10 +755,19 @@ static inline int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg)
*/
int sock_sendmsg(struct socket *sock, struct msghdr *msg)
{
- int err = security_socket_sendmsg(sock, msg,
- msg_data_left(msg));
+ struct sockaddr_storage *save_addr = (struct sockaddr_storage *)msg->msg_name;
+ struct sockaddr_storage address;
+ int ret;
- return err ?: sock_sendmsg_nosec(sock, msg);
+ if (msg->msg_name) {
+ memcpy(&address, msg->msg_name, msg->msg_namelen);
+ msg->msg_name = &address;
+ }
+
+ ret = __sock_sendmsg(sock, msg);
+ msg->msg_name = save_addr;
+
+ return ret;
}
EXPORT_SYMBOL(sock_sendmsg);
@@ -1138,7 +1155,7 @@ static ssize_t sock_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (sock->type == SOCK_SEQPACKET)
msg.msg_flags |= MSG_EOR;
- res = sock_sendmsg(sock, &msg);
+ res = __sock_sendmsg(sock, &msg);
*from = msg.msg_iter;
return res;
}
@@ -2174,7 +2191,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
if (sock->file->f_flags & O_NONBLOCK)
flags |= MSG_DONTWAIT;
msg.msg_flags = flags;
- err = sock_sendmsg(sock, &msg);
+ err = __sock_sendmsg(sock, &msg);
out_put:
fput_light(sock->file, fput_needed);
@@ -2538,7 +2555,7 @@ static int ____sys_sendmsg(struct socket *sock, struct msghdr *msg_sys,
err = sock_sendmsg_nosec(sock, msg_sys);
goto out_freectl;
}
- err = sock_sendmsg(sock, msg_sys);
+ err = __sock_sendmsg(sock, msg_sys);
/*
* If this is sendmmsg() and sending to current destination address was
* successful, remember it.
--
2.42.0.459.ge4e396fd5e-goog
I'm announcing the release of the 5.10.196 kernel.
This release is only needed by any 5.10.y user that uses configfs; it
resolves a regression in 5.10.195 in that subsystem. Note that many
kernel subsystems use configfs for configuration so to be safe, you
probably want to upgrade if you are not sure.
The updated 5.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
fs/configfs/dir.c | 2 --
2 files changed, 1 insertion(+), 3 deletions(-)
Greg Kroah-Hartman (2):
Revert "configfs: fix a race in configfs_lookup()"
Linux 5.10.196
Commit 4dba12881f88 ("dm zoned: support arbitrary number of devices")
made the pointers to additional zoned devices be stored in a dynamically
allocated dmz->ddev array. However, this array is never freed.
Free it when cleaning up zoned device information inside
dmz_put_zoned_device(). Assigning NULL to dmz->ddev elements doesn't make
sense there as they are not supposed to be reused later and the whole dmz
target structure is being cleaned anyway.
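As a userspace sketch of the resulting cleanup shape (illustrative names,
not the dm-zoned API): drop every reference held in the dynamically
allocated array, then free the array itself, with no need to NULL the
slots of a container that is going away:
~~~
#include <stdlib.h>

struct zdev { int refs; };

static void put_zdev(struct zdev *d)
{
	if (d && --d->refs == 0)
		free(d);
}

static void put_all_zdevs(struct zdev **ddev, int nr_ddevs)
{
	for (int i = 0; i < nr_ddevs; i++)
		if (ddev[i])
			put_zdev(ddev[i]);
	free(ddev);	/* the allocation the original code leaked */
}
~~~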
Found by Linux Verification Center (linuxtesting.org).
Fixes: 4dba12881f88 ("dm zoned: support arbitrary number of devices")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
drivers/md/dm-zoned-target.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index ad8e670a2f9b..e25cd9db6275 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -753,12 +753,10 @@ static void dmz_put_zoned_device(struct dm_target *ti)
struct dmz_target *dmz = ti->private;
int i;
- for (i = 0; i < dmz->nr_ddevs; i++) {
- if (dmz->ddev[i]) {
+ for (i = 0; i < dmz->nr_ddevs; i++)
+ if (dmz->ddev[i])
dm_put_device(ti, dmz->ddev[i]);
- dmz->ddev[i] = NULL;
- }
- }
+ kfree(dmz->ddev);
}
static int dmz_fixup_devices(struct dm_target *ti)
--
2.42.0
On 9/20/23 11:42, Ian Rogers wrote:
> On Wed, 20 Sept 2023 at 11:13, Florian Fainelli <f.fainelli(a)gmail.com> wrote:
>
> On 9/20/23 04:30, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 5.10.196
> release.
> > There are 83 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied,
> please
> > let me know.
> >
> > Responses should be made by Fri, 22 Sep 2023 11:28:09 +0000.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> >
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.196-r…
> > or in the git tree and branch at:
> >
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
>
> perf fails to build on ARM, ARM64 and MIPS with:
>
> fixdep: error opening depfile:
> /local/users/fainelli/buildroot/output/bmips/build/linux-custom/tools/perf/pmu-events/.pmu-events.o.d:
> No such file or directory
> make[5]: *** [pmu-events/Build:33:
> /local/users/fainelli/buildroot/output/bmips/build/linux-custom/tools/perf/pmu-events/pmu-events.o]
> Error 2
> make[4]: *** [Makefile.perf:653:
> /local/users/fainelli/buildroot/output/bmips/build/linux-custom/tools/perf/pmu-events/pmu-events-in.o]
> Error 2
> make[3]: *** [Makefile.perf:229: sub-make] Error 2
> make[2]: *** [Makefile:70: all] Error 2
> make[1]: *** [package/pkg-generic.mk:294:
> /local/users/fainelli/buildroot/output/bmips/build/linux-tools/.stamp_built]
> Error 2
> make: *** [Makefile:27: _all] Error 2
>
> this is caused by 653fc524e350b62479529140dc9abef05abbcc29 ("perf
> build:
> Update build rule for generated files"). Reverting that commit plus
> 5804de1f2324ddcfe3f0b6ad58fcfe4d344e0471 ("perf jevents: Switch
> build to
> use jevents.py") gets us going again.
>
>
> Given the perf tool is backward compatible, does doing backports make sense?
For bugfixes, certainly; this does not appear to be one, though?
--
Florian
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: qcom: camss: Fix pm_domain_on sequence in probe
Author: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Date: Wed Aug 30 16:16:06 2023 +0100
We need to make sure camss_configure_pd() happens before
camss_register_entities() as the vfe_get() path relies on the pointer
provided by camss_configure_pd().
Fix the ordering sequence in probe to ensure the pointers vfe_get() demands
are present by the time camss_register_entities() runs.
To facilitate backporting to stable kernels, I've moved the
configure_pd() call early in the probe() function so that, irrespective
of the existence of the old error-handling jump labels, this patch
should still apply everywhere from -next circa Aug 2023 back to v5.13
inclusive.
Fixes: 2f6f8af67203 ("media: camss: Refactor VFE power domain toggling")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
drivers/media/platform/qcom/camss/camss.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
---
diff --git a/drivers/media/platform/qcom/camss/camss.c b/drivers/media/platform/qcom/camss/camss.c
index f11dc59135a5..75991d849b57 100644
--- a/drivers/media/platform/qcom/camss/camss.c
+++ b/drivers/media/platform/qcom/camss/camss.c
@@ -1619,6 +1619,12 @@ static int camss_probe(struct platform_device *pdev)
if (ret < 0)
goto err_cleanup;
+ ret = camss_configure_pd(camss);
+ if (ret < 0) {
+ dev_err(dev, "Failed to configure power domains: %d\n", ret);
+ goto err_cleanup;
+ }
+
ret = camss_init_subdevices(camss);
if (ret < 0)
goto err_cleanup;
@@ -1678,12 +1684,6 @@ static int camss_probe(struct platform_device *pdev)
}
}
- ret = camss_configure_pd(camss);
- if (ret < 0) {
- dev_err(dev, "Failed to configure power domains: %d\n", ret);
- return ret;
- }
-
pm_runtime_enable(dev);
return 0;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 57a943ebfcdb4a97fbb409640234bdb44bfa1953
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091622-outpost-audio-2222@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
57a943ebfcdb ("drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma")
5d945cbcd4b1 ("drm/amd/display: Create a file dedicated to planes")
60693e3a3890 ("Merge tag 'amd-drm-next-5.20-2022-07-14' of https://gitlab.freedesktop.org/agd5f/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 57a943ebfcdb4a97fbb409640234bdb44bfa1953 Mon Sep 17 00:00:00 2001
From: Melissa Wen <mwen(a)igalia.com>
Date: Thu, 31 Aug 2023 15:12:28 -0100
Subject: [PATCH] drm/amd/display: enable cursor degamma for DCN3+ DRM legacy
gamma
For DRM legacy gamma, AMD display manager applies implicit sRGB degamma
using a pre-defined sRGB transfer function. It works fine for DCN2
family where degamma ROM and custom curves go to the same color block.
But, on DCN3+, degamma is split into two blocks: degamma ROM for
pre-defined TFs and `gamma correction` for user/custom curves and
degamma ROM settings don't apply to the cursor plane. To get DRM legacy
gamma working as expected, enable cursor degamma ROM for implicit sRGB
degamma on HW with this configuration.
Cc: stable(a)vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2803
Fixes: 96b020e2163f ("drm/amd/display: check attr flag before set cursor degamma on DCN3+")
Signed-off-by: Melissa Wen <mwen(a)igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 2198df96ed6f..cc74dd69acf2 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -1269,6 +1269,13 @@ void amdgpu_dm_plane_handle_cursor_update(struct drm_plane *plane,
attributes.rotation_angle = 0;
attributes.attribute_flags.value = 0;
+ /* Enable cursor degamma ROM on DCN3+ for implicit sRGB degamma in DRM
+ * legacy gamma setup.
+ */
+ if (crtc_state->cm_is_degamma_srgb &&
+ adev->dm.dc->caps.color.dpp.gamma_corr)
+ attributes.attribute_flags.bits.ENABLE_CURSOR_DEGAMMA = 1;
+
attributes.pitch = afb->base.pitches[0] / afb->base.format->cpp[0];
if (crtc_state->stream) {
From: Zhihao Cheng <chengzhihao1(a)huawei.com>
commit d919a1e79bac890421537cf02ae773007bf55e6b upstream.
Commit 7bc3e6e55acf06 ("proc: Use a list of inodes to flush from proc")
moved proc_flush_task() behind __exit_signal(). After that change,
process systemd can spend long periods at high CPU usage while releasing
a task, given the following concurrent processes:
systemd ps
kernel_waitid stat(/proc/tgid)
do_wait filename_lookup
wait_consider_task lookup_fast
release_task
__exit_signal
__unhash_process
detach_pid
__change_pid // remove task->pid_links
d_revalidate -> pid_revalidate // 0
d_invalidate(/proc/tgid)
shrink_dcache_parent(/proc/tgid)
d_walk(/proc/tgid)
spin_lock_nested(/proc/tgid/fd)
// iterating opened fd
proc_flush_pid |
d_invalidate (/proc/tgid/fd) |
shrink_dcache_parent(/proc/tgid/fd) |
shrink_dentry_list(subdirs) ↓
shrink_lock_dentry(/proc/tgid/fd) --> race on dentry lock
Function d_invalidate() will remove the dentry from the hash first, but why
does proc_flush_pid() process dentry '/proc/tgid/fd' before dentry
'/proc/tgid'? That's because proc_pid_make_inode() adds proc inodes in
reverse order by invoking hlist_add_head_rcu(). But proc should not add
any inodes under '/proc/tgid' except '/proc/tgid/task/pid', fix it by
adding inode into 'pid->inodes' only if the inode is /proc/tgid or
/proc/tgid/task/pid.
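A standalone sketch of the ordering effect described above (generic list
code, not the kernel's hlist implementation): inserting at the head of a
singly linked list means the node added last is visited first, which is
how '/proc/tgid/fd' ended up being flushed before '/proc/tgid':
~~~
#include <stdio.h>

struct node { const char *name; struct node *next; };

static void add_head(struct node **head, struct node *n)
{
	n->next = *head;
	*head = n;
}

int main(void)
{
	struct node tgid = { "/proc/tgid", NULL };
	struct node fd = { "/proc/tgid/fd", NULL };
	struct node *head = NULL;

	add_head(&head, &tgid);		/* created first */
	add_head(&head, &fd);		/* created later, iterated first */

	for (struct node *n = head; n; n = n->next)
		printf("%s\n", n->name);	/* prints fd before tgid */
	return 0;
}
~~~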
Performance regression:
Create 200 tasks; each task opens one file 50,000 times. Kill all
tasks when opened files exceed 10,000,000 (cat /proc/sys/fs/file-nr).
Before fix:
$ time killall -wq aa
real 4m40.946s # During this period, we can see 'ps' and 'systemd'
taking high cpu usage.
After fix:
$ time killall -wq aa
real 1m20.732s # During this period, we can see 'systemd' taking
high cpu usage.
Link: https://lkml.kernel.org/r/20220713130029.4133533-1-chengzhihao1@huawei.com
Fixes: 7bc3e6e55acf06 ("proc: Use a list of inodes to flush from proc")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216054
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Suggested-by: Brian Foster <bfoster(a)redhat.com>
Reviewed-by: Brian Foster <bfoster(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Yu Kuai <yukuai3(a)huawei.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[ bp: Context adjustments ]
Signed-off-by: Suraj Jitindar Singh <surajjs(a)amazon.com>
---
fs/proc/base.c | 46 ++++++++++++++++++++++++++++++++++++++--------
1 file changed, 38 insertions(+), 8 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index a484c30bd5cf..712948e97991 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1881,7 +1881,7 @@ void proc_pid_evict_inode(struct proc_inode *ei)
put_pid(pid);
}
-struct inode *proc_pid_make_inode(struct super_block * sb,
+struct inode *proc_pid_make_inode(struct super_block *sb,
struct task_struct *task, umode_t mode)
{
struct inode * inode;
@@ -1910,11 +1910,6 @@ struct inode *proc_pid_make_inode(struct super_block * sb,
/* Let the pid remember us for quick removal */
ei->pid = pid;
- if (S_ISDIR(mode)) {
- spin_lock(&pid->lock);
- hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
- spin_unlock(&pid->lock);
- }
task_dump_owner(task, 0, &inode->i_uid, &inode->i_gid);
security_task_to_inode(task, inode);
@@ -1927,6 +1922,39 @@ struct inode *proc_pid_make_inode(struct super_block * sb,
return NULL;
}
+/*
+ * Generating an inode and adding it into @pid->inodes, so that task will
+ * invalidate inode's dentry before being released.
+ *
+ * This helper is used for creating dir-type entries under '/proc' and
+ * '/proc/<tgid>/task'. Other entries(eg. fd, stat) under '/proc/<tgid>'
+ * can be released by invalidating '/proc/<tgid>' dentry.
+ * In theory, dentries under '/proc/<tgid>/task' can also be released by
+ * invalidating '/proc/<tgid>' dentry, we reserve it to handle single
+ * thread exiting situation: Any one of threads should invalidate its
+ * '/proc/<tgid>/task/<pid>' dentry before released.
+ */
+static struct inode *proc_pid_make_base_inode(struct super_block *sb,
+ struct task_struct *task, umode_t mode)
+{
+ struct inode *inode;
+ struct proc_inode *ei;
+ struct pid *pid;
+
+ inode = proc_pid_make_inode(sb, task, mode);
+ if (!inode)
+ return NULL;
+
+ /* Let proc_flush_pid find this directory inode */
+ ei = PROC_I(inode);
+ pid = ei->pid;
+ spin_lock(&pid->lock);
+ hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
+ spin_unlock(&pid->lock);
+
+ return inode;
+}
+
int pid_getattr(const struct path *path, struct kstat *stat,
u32 request_mask, unsigned int query_flags)
{
@@ -3341,7 +3369,8 @@ static struct dentry *proc_pid_instantiate(struct dentry * dentry,
{
struct inode *inode;
- inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
+ inode = proc_pid_make_base_inode(dentry->d_sb, task,
+ S_IFDIR | S_IRUGO | S_IXUGO);
if (!inode)
return ERR_PTR(-ENOENT);
@@ -3637,7 +3666,8 @@ static struct dentry *proc_task_instantiate(struct dentry *dentry,
struct task_struct *task, const void *ptr)
{
struct inode *inode;
- inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
+ inode = proc_pid_make_base_inode(dentry->d_sb, task,
+ S_IFDIR | S_IRUGO | S_IXUGO);
if (!inode)
return ERR_PTR(-ENOENT);
--
2.34.1
Hi Greg,
I recently found a bug in the rsvp traffic classifier in the Linux kernel.
This classifier is already retired in the upstream but affects stable
releases.
The symptom of the bug is that the kernel can be tricked into accessing a
wild pointer, thus crashing the kernel.
Since it is just a crash and cannot be used for LPE, I do not want to
trouble security(a)kernel.org. And since the classifier is already
retired in the upstream, I cannot report there.
Since it affects stable releases, I decided to report it here. If it
is not appropriate, I apologize in advance and wonder what would be a
good channel to report bugs that only affect stable releases and have no
equivalent fix upstream.
[Root Cause]
The root cause of the bug is a slab-out-of-bounds access, but since the
offset from the original pointer is an `unsigned int` fully controlled by
users, the behaviour is usually a wild pointer access.
In `rsvp_change()`, the TCA_RSVP_PINFO attribute is taken from userspace without any validation:
~~~
static int rsvp_change(...)
{
......
if (tb[TCA_RSVP_PINFO]) {
pinfo = nla_data(tb[TCA_RSVP_PINFO]);
f->spi = pinfo->spi;
f->tunnelhdr = pinfo->tunnelhdr;
}
......
if (pinfo) {
s->dpi = pinfo->dpi;
s->protocol = pinfo->protocol;
s->tunnelid = pinfo->tunnelid;
}
......
}
~~~
As a result, later when the classifier actually does the classification
in `rsvp_classify`:
~~~
TC_INDIRECT_SCOPE int RSVP_CLS(struct sk_buff *skb, const struct tcf_proto *tp,
struct tcf_result *res)
{
......
	if (s->dpi.mask & (*(u32 *)(xprt + s->dpi.offset) ^ s->dpi.key))
......
}
~~~
`xprt + s->dpi.offset` becomes a wild pointer and crashes the kernel.
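The failure mode can be reproduced in a few lines of plain C (a
hypothetical illustration, not the classifier code): an attacker-controlled
32-bit offset is added to a buffer pointer with no bounds check:
~~~
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t dpi_match(const uint8_t *xprt, size_t len,
			  uint32_t offset, uint32_t key)
{
	uint32_t v;

	/* The check the buggy code is missing:
	 * if (offset > len - sizeof(v)) return 0; */
	memcpy(&v, xprt + offset, sizeof(v));	/* wild read for huge offsets */
	return v ^ key;
}

int main(void)
{
	uint8_t pkt[64] = { 0 };

	dpi_match(pkt, sizeof(pkt), 8, 0x1234);	/* in bounds: fine */
	/* dpi_match(pkt, sizeof(pkt), 0x40000000, 0) would read ~1GB past
	 * the buffer; the in-kernel analogue oopses on the wild pointer. */
	return 0;
}
~~~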
[Severity]
This will cause a local denial-of-service.
[Patch]
I don't know enough about this subsystem to suggest a proper patch, but
I would suggest retiring the rsvp classifier completely, just like
upstream.
[Affected Version]
I confirmed that this bug affects v5.10, v6.1, and v6.2.
[Proof-of-Concept]
A POC file is attached to this email.
[Splash]
A kernel oops splash is attached to this email.
Best,
Kyle Zeng
From: valis <sec(a)valis.email>
Commit 76e42ae831991c828cffa8c37736ebfb831ad5ec upstream.
[ Fixed small conflict, as the 'fnew->ifindex' assignment is not protected
by CONFIG_NET_CLS_IND upstream since a51486266c3 ]
When fw_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
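A compact userspace analogue of the lifetime bug (illustrative, not the
cls_fw code): copying the binding into the replacement filter without
taking a new reference, then unbinding the old filter, drops the class
refcount that the surviving binding still relies on:
~~~
#include <stdio.h>

struct tc_class { int filter_cnt; };

static void unbind_filter(struct tc_class *c)
{
	if (c && --c->filter_cnt == 0)
		printf("class freed here\n");	/* stands in for the real free */
}

int main(void)
{
	struct tc_class cls = { .filter_cnt = 1 };
	struct tc_class *old_res = &cls;
	struct tc_class *new_res;

	new_res = old_res;	/* the removed 'fnew->res = f->res' copy */
	unbind_filter(old_res);	/* success path unbinds the old filter */

	/* new_res still points at cls, whose count just reached zero:
	 * any later use through it is a use-after-free. Not copying the
	 * binding into the new filter is exactly the fix. */
	(void)new_res;
	return 0;
}
~~~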
Fixes: e35a8ee5993b ("net: sched: fw use RCU")
Reported-by: valis <sec(a)valis.email>
Reported-by: Bing-Jhong Billy Jheng <billy(a)starlabs.sg>
Signed-off-by: valis <sec(a)valis.email>
Signed-off-by: Jamal Hadi Salim <jhs(a)mojatatu.com>
Reviewed-by: Victor Nogueira <victor(a)mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela(a)mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan(a)starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-3-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Luiz Capitulino <luizcap(a)amazon.com>
---
net/sched/cls_fw.c | 1 -
1 file changed, 1 deletion(-)
Valis, Greg,
I noticed that 4.14 is missing this fix while we backported all three fixes
from this series to all stable kernels:
https://lore.kernel.org/all/20230729123202.72406-1-jhs@mojatatu.com
Is there a reason to have skipped 4.14 for this fix? It seems we need it.
This is only compile-tested though; it would be good to have confirmation
from Valis that the issue is present on 4.14 before applying.
- Luiz
diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index e63f9c2e37e5..7b04b315b2bd 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -281,7 +281,6 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
return -ENOBUFS;
fnew->id = f->id;
- fnew->res = f->res;
#ifdef CONFIG_NET_CLS_IND
fnew->ifindex = f->ifindex;
#endif /* CONFIG_NET_CLS_IND */
--
2.40.1
From: Christian König <christian.koenig(a)amd.com>
The offset is just 32 bits here, so this can potentially overflow if
somebody specifies a large value. Instead, reduce the size to calculate
the last possible offset.
The error handling path also incorrectly drops the reference to the user
fence BO, resulting in a potential reference count underflow.
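The arithmetic is easy to demonstrate standalone (a sketch of the integer
behaviour, not the driver code): with a 32-bit offset, 'offset + 8' can
wrap to a small value and slip past the old check, while 'offset > size - 8'
cannot wrap, because the size is already known to be exactly one page:
~~~
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t size = 4096;			/* the BO is one page */
	uint32_t offset = UINT32_MAX - 3;	/* attacker-chosen value */

	/* Old form: offset + 8 wraps around to 4, so the check passes. */
	printf("old check rejects: %d\n", (offset + 8) > size);	/* 0 */

	/* New form: size - 8 is 4088, no wraparound, check rejects. */
	printf("new check rejects: %d\n", offset > (size - 8));	/* 1 */
	return 0;
}
~~~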
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Reviewed-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
(cherry picked from commit 35588314e963938dfdcdb792c9170108399377d6)
---
This is a backport of 35588314e963 ("drm/amdgpu: fix amdgpu_cs_p1_user_fence")
to 6.5 and older stable kernels because the original patch does not apply
cleanly as is.
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index fb78a8f47587..946d031d2520 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -127,7 +127,6 @@ static int amdgpu_cs_p1_user_fence(struct amdgpu_cs_parser *p,
struct drm_gem_object *gobj;
struct amdgpu_bo *bo;
unsigned long size;
- int r;
gobj = drm_gem_object_lookup(p->filp, data->handle);
if (gobj == NULL)
@@ -139,23 +138,14 @@ static int amdgpu_cs_p1_user_fence(struct amdgpu_cs_parser *p,
drm_gem_object_put(gobj);
size = amdgpu_bo_size(bo);
- if (size != PAGE_SIZE || (data->offset + 8) > size) {
- r = -EINVAL;
- goto error_unref;
- }
+ if (size != PAGE_SIZE || data->offset > (size - 8))
+ return -EINVAL;
- if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
- r = -EINVAL;
- goto error_unref;
- }
+ if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))
+ return -EINVAL;
*offset = data->offset;
-
return 0;
-
-error_unref:
- amdgpu_bo_unref(&bo);
- return r;
}
static int amdgpu_cs_p1_bo_handles(struct amdgpu_cs_parser *p,
--
2.41.0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092016-bronze-dolphin-b917@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
4ca8e03cf2bf ("btrfs: check for BTRFS_FS_ERROR in pending ordered assert")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 24 Aug 2023 16:59:04 -0400
Subject: [PATCH] btrfs: check for BTRFS_FS_ERROR in pending ordered assert
If we do fast tree logging we increment a counter on the current
transaction for every ordered extent we need to wait for. This means we
expect the transaction to still be there when we clear pending on the
ordered extent. However if we happen to abort the transaction and clean
it up, there could be no running transaction, and thus we'll trip the
"ASSERT(trans)" check. This is obviously incorrect, and the code
properly deals with the case that the transaction doesn't exist. Fix
this ASSERT() to only fire if there's no trans and we don't have
BTRFS_FS_ERROR() set on the file system.
CC: stable(a)vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index b46ab348e8e5..345c449d588c 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -639,7 +639,7 @@ void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
refcount_inc(&trans->use_count);
spin_unlock(&fs_info->trans_lock);
- ASSERT(trans);
+ ASSERT(trans || BTRFS_FS_ERROR(fs_info));
if (trans) {
if (atomic_dec_and_test(&trans->pending_ordered))
wake_up(&trans->pending_wait);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092057-deceiving-jukebox-5350@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
4ca8e03cf2bf ("btrfs: check for BTRFS_FS_ERROR in pending ordered assert")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 24 Aug 2023 16:59:04 -0400
Subject: [PATCH] btrfs: check for BTRFS_FS_ERROR in pending ordered assert
If we do fast tree logging we increment a counter on the current
transaction for every ordered extent we need to wait for. This means we
expect the transaction to still be there when we clear pending on the
ordered extent. However if we happen to abort the transaction and clean
it up, there could be no running transaction, and thus we'll trip the
"ASSERT(trans)" check. This is obviously incorrect, and the code
properly deals with the case that the transaction doesn't exist. Fix
this ASSERT() to only fire if there's no trans and we don't have
BTRFS_FS_ERROR() set on the file system.
CC: stable(a)vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index b46ab348e8e5..345c449d588c 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -639,7 +639,7 @@ void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
refcount_inc(&trans->use_count);
spin_unlock(&fs_info->trans_lock);
- ASSERT(trans);
+ ASSERT(trans || BTRFS_FS_ERROR(fs_info));
if (trans) {
if (atomic_dec_and_test(&trans->pending_ordered))
wake_up(&trans->pending_wait);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x ec5fa9fcdeca69edf7dab5ca3b2e0ceb1c08fe9a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092029-banter-truth-cf72@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
ec5fa9fcdeca ("drm/amd/display: Adjust the MST resume flow")
1e5d4d8eb8c0 ("drm/amd/display: Ext displays with dock can't recognized after resume")
028c4ccfb812 ("drm/amd/display: force connector state when bpc changes during compliance")
d5a43956b73b ("drm/amd/display: move dp capability related logic to link_dp_capability")
94dfeaa46925 ("drm/amd/display: move dp phy related logic to link_dp_phy")
630168a97314 ("drm/amd/display: move dp link training logic to link_dp_training")
d144b40a4833 ("drm/amd/display: move dc_link_dpia logic to link_dp_dpia")
a28d0bac0956 ("drm/amd/display: move dpcd logic from dc_link_dpcd to link_dpcd")
a98cdd8c4856 ("drm/amd/display: refactor ddc logic from dc_link_ddc to link_ddc")
4370f72e3845 ("drm/amd/display: refactor hpd logic from dc_link to link_hpd")
0e8cf83a2b47 ("drm/amd/display: allow hpo and dio encoder switching during dp retrain test")
7462475e3a06 ("drm/amd/display: move dccg programming from link hwss hpo dp to hwss")
e85d59885409 ("drm/amd/display: use encoder type independent hwss instead of accessing enc directly")
ebf13b72020a ("drm/amd/display: Revert Scaler HCBlank issue workaround")
639f6ad6df7f ("drm/amd/display: Revert Reduce delay when sink device not able to ACK 00340h write")
e3aa827e2ab3 ("drm/amd/display: Avoid setting pixel rate divider to N/A")
180f33d27a55 ("drm/amd/display: Adjust DP 8b10b LT exit behavior")
b7ada7ee61d3 ("drm/amd/display: Populate DP2.0 output type for DML pipe")
ea192af507d9 ("drm/amd/display: Only update link settings after successful MST link train")
be9f6b222c52 ("drm/amd/display: Fix fallback issues for DP LL 1.4a tests")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ec5fa9fcdeca69edf7dab5ca3b2e0ceb1c08fe9a Mon Sep 17 00:00:00 2001
From: Wayne Lin <wayne.lin(a)amd.com>
Date: Tue, 22 Aug 2023 16:03:17 +0800
Subject: [PATCH] drm/amd/display: Adjust the MST resume flow
[Why]
Today, drm_dp_mst_topology_mgr_resume() resumes the mst branch so it is
ready to handle mst mode and also immediately does the mst topology
probing. This gives the driver a chance to fire a hotplug event before
the old state is restored, and userspace will then react to the hotplug
event based on a wrong state.
[How]
Adjust the mst resume flow as:
1. Set dpcd to resume the mst branch status
2. Restore the source's old state
3. Do the mst resume topology probing
For drm_dp_mst_topology_mgr_resume(), it's better to pull the topology
probing work out into a second-step procedure of the mst resume. A
follow-up patch in drm will do that.
Reviewed-by: Chao-kai Wang <stylon.wang(a)amd.com>
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Acked-by: Stylon Wang <stylon.wang(a)amd.com>
Signed-off-by: Wayne Lin <wayne.lin(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index ca129983a08b..c6fd34bab358 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2340,14 +2340,62 @@ static int dm_late_init(void *handle)
return detect_mst_link_for_all_connectors(adev_to_drm(adev));
}
+static void resume_mst_branch_status(struct drm_dp_mst_topology_mgr *mgr)
+{
+ int ret;
+ u8 guid[16];
+ u64 tmp64;
+
+ mutex_lock(&mgr->lock);
+ if (!mgr->mst_primary)
+ goto out_fail;
+
+ if (drm_dp_read_dpcd_caps(mgr->aux, mgr->dpcd) < 0) {
+ drm_dbg_kms(mgr->dev, "dpcd read failed - undocked during suspend?\n");
+ goto out_fail;
+ }
+
+ ret = drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL,
+ DP_MST_EN |
+ DP_UP_REQ_EN |
+ DP_UPSTREAM_IS_SRC);
+ if (ret < 0) {
+ drm_dbg_kms(mgr->dev, "mst write failed - undocked during suspend?\n");
+ goto out_fail;
+ }
+
+ /* Some hubs forget their guids after they resume */
+ ret = drm_dp_dpcd_read(mgr->aux, DP_GUID, guid, 16);
+ if (ret != 16) {
+ drm_dbg_kms(mgr->dev, "dpcd read failed - undocked during suspend?\n");
+ goto out_fail;
+ }
+
+ if (memchr_inv(guid, 0, 16) == NULL) {
+ tmp64 = get_jiffies_64();
+ memcpy(&guid[0], &tmp64, sizeof(u64));
+ memcpy(&guid[8], &tmp64, sizeof(u64));
+
+ ret = drm_dp_dpcd_write(mgr->aux, DP_GUID, guid, 16);
+
+ if (ret != 16) {
+ drm_dbg_kms(mgr->dev, "check mstb guid failed - undocked during suspend?\n");
+ goto out_fail;
+ }
+ }
+
+ memcpy(mgr->mst_primary->guid, guid, 16);
+
+out_fail:
+ mutex_unlock(&mgr->lock);
+}
+
static void s3_handle_mst(struct drm_device *dev, bool suspend)
{
struct amdgpu_dm_connector *aconnector;
struct drm_connector *connector;
struct drm_connector_list_iter iter;
struct drm_dp_mst_topology_mgr *mgr;
- int ret;
- bool need_hotplug = false;
drm_connector_list_iter_begin(dev, &iter);
drm_for_each_connector_iter(connector, &iter) {
@@ -2369,18 +2417,15 @@ static void s3_handle_mst(struct drm_device *dev, bool suspend)
if (!dp_is_lttpr_present(aconnector->dc_link))
try_to_configure_aux_timeout(aconnector->dc_link->ddc, LINK_AUX_DEFAULT_TIMEOUT_PERIOD);
- ret = drm_dp_mst_topology_mgr_resume(mgr, true);
- if (ret < 0) {
- dm_helpers_dp_mst_stop_top_mgr(aconnector->dc_link->ctx,
- aconnector->dc_link);
- need_hotplug = true;
- }
+ /* TODO: move resume_mst_branch_status() into drm mst resume again
+ * once topology probing work is pulled out from mst resume into mst
+ * resume 2nd step. mst resume 2nd step should be called after old
+ * state getting restored (i.e. drm_atomic_helper_resume()).
+ */
+ resume_mst_branch_status(mgr);
}
}
drm_connector_list_iter_end(&iter);
-
- if (need_hotplug)
- drm_kms_helper_hotplug_event(dev);
}
static int amdgpu_dm_smu_write_watermarks_table(struct amdgpu_device *adev)
@@ -2774,7 +2819,8 @@ static int dm_resume(void *handle)
struct dm_atomic_state *dm_state = to_dm_atomic_state(dm->atomic_obj.state);
enum dc_connection_type new_connection_type = dc_connection_none;
struct dc_state *dc_state;
- int i, r, j;
+ int i, r, j, ret;
+ bool need_hotplug = false;
if (amdgpu_in_reset(adev)) {
dc_state = dm->cached_dc_state;
@@ -2872,7 +2918,7 @@ static int dm_resume(void *handle)
continue;
/*
- * this is the case when traversing through already created
+ * this is the case when traversing through already created end sink
* MST connectors, should be skipped
*/
if (aconnector && aconnector->mst_root)
@@ -2932,6 +2978,27 @@ static int dm_resume(void *handle)
dm->cached_state = NULL;
+ /* Do mst topology probing after resuming cached state*/
+ drm_connector_list_iter_begin(ddev, &iter);
+ drm_for_each_connector_iter(connector, &iter) {
+ aconnector = to_amdgpu_dm_connector(connector);
+ if (aconnector->dc_link->type != dc_connection_mst_branch ||
+ aconnector->mst_root)
+ continue;
+
+ ret = drm_dp_mst_topology_mgr_resume(&aconnector->mst_mgr, true);
+
+ if (ret < 0) {
+ dm_helpers_dp_mst_stop_top_mgr(aconnector->dc_link->ctx,
+ aconnector->dc_link);
+ need_hotplug = true;
+ }
+ }
+ drm_connector_list_iter_end(&iter);
+
+ if (need_hotplug)
+ drm_kms_helper_hotplug_event(ddev);
+
amdgpu_dm_irq_resume_late(adev);
amdgpu_dm_smu_write_watermarks_table(adev);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x e5c624f027ac74f97e97c8f36c69228ac9f1102d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092033-smother-cannabis-e03d@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
e5c624f027ac ("tracing: Have event inject files inc the trace array ref count")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e5c624f027ac74f97e97c8f36c69228ac9f1102d Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 6 Sep 2023 22:47:16 -0400
Subject: [PATCH] tracing: Have event inject files inc the trace array ref
count
The event inject files add events for a specific trace array. For an
instance, if the file is opened and the instance is deleted, reading or
writing to the file will cause a use after free.
Up the ref count of the trace_array when an event inject file is opened.
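A sketch of the open/release pairing the fix moves to (a userspace
analogue; the real helpers are tracing_open_file_tr() and
tracing_release_file_tr()): every open pins the trace array with a
reference and the matching release drops it, so the instance cannot
disappear while the file is held:
~~~
#include <stdio.h>

struct trace_array { int ref; };

static void get_tr(struct trace_array *tr) { tr->ref++; }

static void put_tr(struct trace_array *tr)
{
	if (--tr->ref == 0)
		printf("instance can now be freed\n");
}

int main(void)
{
	struct trace_array tr = { .ref = 1 };	/* owned by the instance dir */

	get_tr(&tr);	/* .open: pin the array */
	put_tr(&tr);	/* .release: unpin; instance still alive */
	put_tr(&tr);	/* instance removal drops the final reference */
	return 0;
}
~~~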
Link: https://lkml.kernel.org/r/20230907024804.292337868@goodmis.org
Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Zheng Yejian <zhengyejian1(a)huawei.com>
Fixes: 6c3edaf9fd6a ("tracing: Introduce trace event injection")
Tested-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_inject.c b/kernel/trace/trace_events_inject.c
index abe805d471eb..8650562bdaa9 100644
--- a/kernel/trace/trace_events_inject.c
+++ b/kernel/trace/trace_events_inject.c
@@ -328,7 +328,8 @@ event_inject_read(struct file *file, char __user *buf, size_t size,
}
const struct file_operations event_inject_fops = {
- .open = tracing_open_generic,
+ .open = tracing_open_file_tr,
.read = event_inject_read,
.write = event_inject_write,
+ .release = tracing_release_file_tr,
};
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x e5c624f027ac74f97e97c8f36c69228ac9f1102d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092032-groom-mustiness-69eb@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
e5c624f027ac ("tracing: Have event inject files inc the trace array ref count")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e5c624f027ac74f97e97c8f36c69228ac9f1102d Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 6 Sep 2023 22:47:16 -0400
Subject: [PATCH] tracing: Have event inject files inc the trace array ref
count
The event inject files add events for a specific trace array. For an
instance, if the file is opened and the instance is deleted, reading or
writing to the file will cause a use after free.
Up the ref count of the trace_array when an event inject file is opened.
Link: https://lkml.kernel.org/r/20230907024804.292337868@goodmis.org
Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Zheng Yejian <zhengyejian1(a)huawei.com>
Fixes: 6c3edaf9fd6a ("tracing: Introduce trace event injection")
Tested-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_inject.c b/kernel/trace/trace_events_inject.c
index abe805d471eb..8650562bdaa9 100644
--- a/kernel/trace/trace_events_inject.c
+++ b/kernel/trace/trace_events_inject.c
@@ -328,7 +328,8 @@ event_inject_read(struct file *file, char __user *buf, size_t size,
}
const struct file_operations event_inject_fops = {
- .open = tracing_open_generic,
+ .open = tracing_open_file_tr,
.read = event_inject_read,
.write = event_inject_write,
+ .release = tracing_release_file_tr,
};
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x ef064187a9709393a981a56cce1e31880fd97107
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092004-excavate-unending-0257@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
ef064187a970 ("drm/amd/display: fix the white screen issue when >= 64GB DRAM")
c0fb85ae02b6 ("drm/amd/display: setup system context in dm_init")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ef064187a9709393a981a56cce1e31880fd97107 Mon Sep 17 00:00:00 2001
From: Yifan Zhang <yifan1.zhang(a)amd.com>
Date: Fri, 8 Sep 2023 16:46:39 +0800
Subject: [PATCH] drm/amd/display: fix the white screen issue when >= 64GB DRAM
Dropping bits 31:4 of the page table base is wrong: it makes the page
table base point to the wrong address if the phys addr is beyond 64GB.
Dropping bits 31:4 of page_table_start/end is also unnecessary, since
dcn20_vmid_setup will do that. Also, while we are at it, clean up the
assignments using upper_32_bits()/lower_32_bits() and AMDGPU_GPU_PAGE_SHIFT.
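The masking bug is visible with a couple of lines of arithmetic (a
standalone sketch mirroring the helper's semantics, not the DM code):
'& 0xF' keeps only 4 bits of the high word, which is enough for addresses
below 64GB and silently wrong above it:
~~~
#include <stdint.h>
#include <stdio.h>

static uint32_t upper_32_bits(uint64_t v) { return (uint32_t)(v >> 32); }

int main(void)
{
	uint64_t pt_base_lo = 1ULL << 35;	/* 32GB: fits in 36 bits */
	uint64_t pt_base_hi = 1ULL << 36;	/* 64GB: needs bit 36 */

	/* Old code kept only bits 32..35 of the high part. */
	printf("32GB high part: %#x vs masked %#x\n",
	       upper_32_bits(pt_base_lo), upper_32_bits(pt_base_lo) & 0xF);
	printf("64GB high part: %#x vs masked %#x\n",
	       upper_32_bits(pt_base_hi), upper_32_bits(pt_base_hi) & 0xF);
	/* The 64GB case masks 0x10 down to 0x0: the page table base now
	 * points at the wrong address, hence the white screen. */
	return 0;
}
~~~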
Cc: stable(a)vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2354
Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
Acked-by: Harry Wentland <harry.wentland(a)amd.com>
Reviewed-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang(a)amd.com>
Co-developed-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 88ba8b66de1f..6a0ea15936ae 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1274,11 +1274,15 @@ static void mmhub_read_system_context(struct amdgpu_device *adev, struct dc_phy_
pt_base = amdgpu_gmc_pd_addr(adev->gart.bo);
- page_table_start.high_part = (u32)(adev->gmc.gart_start >> 44) & 0xF;
- page_table_start.low_part = (u32)(adev->gmc.gart_start >> 12);
- page_table_end.high_part = (u32)(adev->gmc.gart_end >> 44) & 0xF;
- page_table_end.low_part = (u32)(adev->gmc.gart_end >> 12);
- page_table_base.high_part = upper_32_bits(pt_base) & 0xF;
+ page_table_start.high_part = upper_32_bits(adev->gmc.gart_start >>
+ AMDGPU_GPU_PAGE_SHIFT);
+ page_table_start.low_part = lower_32_bits(adev->gmc.gart_start >>
+ AMDGPU_GPU_PAGE_SHIFT);
+ page_table_end.high_part = upper_32_bits(adev->gmc.gart_end >>
+ AMDGPU_GPU_PAGE_SHIFT);
+ page_table_end.low_part = lower_32_bits(adev->gmc.gart_end >>
+ AMDGPU_GPU_PAGE_SHIFT);
+ page_table_base.high_part = upper_32_bits(pt_base);
page_table_base.low_part = lower_32_bits(pt_base);
pa_config->system_aperture.start_addr = (uint64_t)logical_addr_low << 18;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 5229a658f6453362fbb9da6bf96872ef25a7097e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092038-rebound-flashy-4766@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
5229a658f645 ("ext4: do not let fstrim block system suspend")
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()")
bd2eea8d0a6b ("ext4: remove the 'group' parameter of ext4_trim_extent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5229a658f6453362fbb9da6bf96872ef25a7097e Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:55 +0200
Subject: [PATCH] ext4: do not let fstrim block system suspend
Len Brown has reported that system suspend sometimes fails due to the
inability to freeze a task working in ext4_trim_fs() for one minute.
Trimming a large filesystem on a disk that slowly processes discard
requests can indeed take a long time. Since discard is just an advisory
call, it is perfectly fine to interrupt it at any time and return the
number of discarded blocks up to that moment. Do that when we detect that
the task is being frozen.
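As a userspace analogue of that pattern (freezing() and
fatal_signal_pending() are kernel-only; a signal flag stands in here): a
long advisory operation polls a stop condition each iteration and returns
the work completed so far instead of an error:
~~~
#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t stop_requested;

static void on_signal(int sig) { (void)sig; stop_requested = 1; }

static long trim_groups(long ngroups)
{
	long done = 0;

	for (long g = 0; g < ngroups; g++) {
		if (stop_requested)
			break;	/* advisory work: a partial count is a valid result */
		/* ... issue discards for group g ... */
		done++;
	}
	return done;	/* groups trimmed so far, not -ERESTARTSYS */
}

int main(void)
{
	signal(SIGINT, on_signal);
	printf("trimmed %ld groups\n", trim_groups(1000000));
	return 0;
}
~~~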
Cc: stable(a)kernel.org
Reported-by: Len Brown <lenb(a)kernel.org>
Suggested-by: Dave Chinner <david(a)fromorbit.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 09091adfde64..1e599305d85f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/nospec.h>
#include <linux/backing-dev.h>
+#include <linux/freezer.h>
#include <trace/events/ext4.h>
/*
@@ -6916,6 +6917,11 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
EXT4_CLUSTER_BITS(sb);
}
+static bool ext4_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6949,8 +6955,8 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
+ if (ext4_trim_interrupted())
+ return count;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -7072,6 +7078,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
for (group = first_group; group <= last_group; group++) {
+ if (ext4_trim_interrupted())
+ break;
grp = ext4_get_group_info(sb, group);
if (!grp)
continue;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 5229a658f6453362fbb9da6bf96872ef25a7097e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092037-underpaid-casing-6ccf@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
5229a658f645 ("ext4: do not let fstrim block system suspend")
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()")
bd2eea8d0a6b ("ext4: remove the 'group' parameter of ext4_trim_extent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5229a658f6453362fbb9da6bf96872ef25a7097e Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:55 +0200
Subject: [PATCH] ext4: do not let fstrim block system suspend
Len Brown has reported that system suspend sometimes fails due to the
inability to freeze a task working in ext4_trim_fs() for one minute.
Trimming a large filesystem on a disk that slowly processes discard
requests can indeed take a long time. Since discard is just an advisory
call, it is perfectly fine to interrupt it at any time and return the
number of discarded blocks up to that moment. Do that when we detect that
the task is being frozen.
Cc: stable(a)kernel.org
Reported-by: Len Brown <lenb(a)kernel.org>
Suggested-by: Dave Chinner <david(a)fromorbit.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 09091adfde64..1e599305d85f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/nospec.h>
#include <linux/backing-dev.h>
+#include <linux/freezer.h>
#include <trace/events/ext4.h>
/*
@@ -6916,6 +6917,11 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
EXT4_CLUSTER_BITS(sb);
}
+static bool ext4_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6949,8 +6955,8 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
+ if (ext4_trim_interrupted())
+ return count;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -7072,6 +7078,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
for (group = first_group; group <= last_group; group++) {
+ if (ext4_trim_interrupted())
+ break;
grp = ext4_get_group_info(sb, group);
if (!grp)
continue;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 5229a658f6453362fbb9da6bf96872ef25a7097e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092036-outline-champion-bd90@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
5229a658f645 ("ext4: do not let fstrim block system suspend")
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()")
bd2eea8d0a6b ("ext4: remove the 'group' parameter of ext4_trim_extent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5229a658f6453362fbb9da6bf96872ef25a7097e Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:55 +0200
Subject: [PATCH] ext4: do not let fstrim block system suspend
Len Brown has reported that system suspend sometimes fails due to the
inability to freeze a task working in ext4_trim_fs() for one minute.
Trimming a large filesystem on a disk that slowly processes discard
requests can indeed take a long time. Since discard is just an advisory
call, it is perfectly fine to interrupt it at any time and return the
number of discarded blocks up to that moment. Do that when we detect that
the task is being frozen.
Cc: stable(a)kernel.org
Reported-by: Len Brown <lenb(a)kernel.org>
Suggested-by: Dave Chinner <david(a)fromorbit.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 09091adfde64..1e599305d85f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/nospec.h>
#include <linux/backing-dev.h>
+#include <linux/freezer.h>
#include <trace/events/ext4.h>
/*
@@ -6916,6 +6917,11 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
EXT4_CLUSTER_BITS(sb);
}
+static bool ext4_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6949,8 +6955,8 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
+ if (ext4_trim_interrupted())
+ return count;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -7072,6 +7078,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
for (group = first_group; group <= last_group; group++) {
+ if (ext4_trim_interrupted())
+ break;
grp = ext4_get_group_info(sb, group);
if (!grp)
continue;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 5229a658f6453362fbb9da6bf96872ef25a7097e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092035-atop-usual-3d91@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
5229a658f645 ("ext4: do not let fstrim block system suspend")
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()")
bd2eea8d0a6b ("ext4: remove the 'group' parameter of ext4_trim_extent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5229a658f6453362fbb9da6bf96872ef25a7097e Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:55 +0200
Subject: [PATCH] ext4: do not let fstrim block system suspend
Len Brown has reported that system suspend sometimes fails due to the
inability to freeze a task working in ext4_trim_fs() for one minute.
Trimming a large filesystem on a disk that slowly processes discard
requests can indeed take a long time. Since discard is just an advisory
call, it is perfectly fine to interrupt it at any time and return the
number of discarded blocks up to that moment. Do that when we detect that
the task is being frozen.
Cc: stable(a)kernel.org
Reported-by: Len Brown <lenb(a)kernel.org>
Suggested-by: Dave Chinner <david(a)fromorbit.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 09091adfde64..1e599305d85f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/nospec.h>
#include <linux/backing-dev.h>
+#include <linux/freezer.h>
#include <trace/events/ext4.h>
/*
@@ -6916,6 +6917,11 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
EXT4_CLUSTER_BITS(sb);
}
+static bool ext4_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6949,8 +6955,8 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
+ if (ext4_trim_interrupted())
+ return count;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -7072,6 +7078,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
for (group = first_group; group <= last_group; group++) {
+ if (ext4_trim_interrupted())
+ break;
grp = ext4_get_group_info(sb, group);
if (!grp)
continue;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 5229a658f6453362fbb9da6bf96872ef25a7097e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092034-campsite-isolated-53bf@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
5229a658f645 ("ext4: do not let fstrim block system suspend")
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5229a658f6453362fbb9da6bf96872ef25a7097e Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:55 +0200
Subject: [PATCH] ext4: do not let fstrim block system suspend
Len Brown has reported that system suspend sometimes fails due to the
inability to freeze a task working in ext4_trim_fs() within one minute.
Trimming a large filesystem on a disk that slowly processes discard
requests can indeed take a long time. Since discard is just an advisory
call, it is perfectly fine to interrupt it at any time and return the
number of blocks discarded up to that moment. Do that when we detect
that the task is being frozen.
Cc: stable(a)kernel.org
Reported-by: Len Brown <lenb(a)kernel.org>
Suggested-by: Dave Chinner <david(a)fromorbit.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 09091adfde64..1e599305d85f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/nospec.h>
#include <linux/backing-dev.h>
+#include <linux/freezer.h>
#include <trace/events/ext4.h>
/*
@@ -6916,6 +6917,11 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
EXT4_CLUSTER_BITS(sb);
}
+static bool ext4_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6949,8 +6955,8 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
+ if (ext4_trim_interrupted())
+ return count;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -7072,6 +7078,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
for (group = first_group; group <= last_group; group++) {
+ if (ext4_trim_interrupted())
+ break;
grp = ext4_get_group_info(sb, group);
if (!grp)
continue;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 5229a658f6453362fbb9da6bf96872ef25a7097e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092033-sash-truffle-be6d@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
5229a658f645 ("ext4: do not let fstrim block system suspend")
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5229a658f6453362fbb9da6bf96872ef25a7097e Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:55 +0200
Subject: [PATCH] ext4: do not let fstrim block system suspend
Len Brown has reported that system suspend sometimes fails due to the
inability to freeze a task working in ext4_trim_fs() within one minute.
Trimming a large filesystem on a disk that slowly processes discard
requests can indeed take a long time. Since discard is just an advisory
call, it is perfectly fine to interrupt it at any time and return the
number of blocks discarded up to that moment. Do that when we detect
that the task is being frozen.
Cc: stable(a)kernel.org
Reported-by: Len Brown <lenb(a)kernel.org>
Suggested-by: Dave Chinner <david(a)fromorbit.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 09091adfde64..1e599305d85f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/nospec.h>
#include <linux/backing-dev.h>
+#include <linux/freezer.h>
#include <trace/events/ext4.h>
/*
@@ -6916,6 +6917,11 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
EXT4_CLUSTER_BITS(sb);
}
+static bool ext4_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6949,8 +6955,8 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
+ if (ext4_trim_interrupted())
+ return count;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -7072,6 +7078,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
for (group = first_group; group <= last_group; group++) {
+ if (ext4_trim_interrupted())
+ break;
grp = ext4_get_group_info(sb, group);
if (!grp)
continue;
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 5229a658f6453362fbb9da6bf96872ef25a7097e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092033-banker-expensive-aa8d@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
5229a658f645 ("ext4: do not let fstrim block system suspend")
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5229a658f6453362fbb9da6bf96872ef25a7097e Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:55 +0200
Subject: [PATCH] ext4: do not let fstrim block system suspend
Len Brown has reported that system suspend sometimes fails due to the
inability to freeze a task working in ext4_trim_fs() within one minute.
Trimming a large filesystem on a disk that slowly processes discard
requests can indeed take a long time. Since discard is just an advisory
call, it is perfectly fine to interrupt it at any time and return the
number of blocks discarded up to that moment. Do that when we detect
that the task is being frozen.
Cc: stable(a)kernel.org
Reported-by: Len Brown <lenb(a)kernel.org>
Suggested-by: Dave Chinner <david(a)fromorbit.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 09091adfde64..1e599305d85f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/nospec.h>
#include <linux/backing-dev.h>
+#include <linux/freezer.h>
#include <trace/events/ext4.h>
/*
@@ -6916,6 +6917,11 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
EXT4_CLUSTER_BITS(sb);
}
+static bool ext4_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6949,8 +6955,8 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
+ if (ext4_trim_interrupted())
+ return count;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -7072,6 +7078,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
for (group = first_group; group <= last_group; group++) {
+ if (ext4_trim_interrupted())
+ break;
grp = ext4_get_group_info(sb, group);
if (!grp)
continue;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092028-snore-semantic-95ac@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()")
bd2eea8d0a6b ("ext4: remove the 'group' parameter of ext4_trim_extent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:54 +0200
Subject: [PATCH] ext4: move setting of trimmed bit into
ext4_try_to_trim_range()
Currently we set the group's trimmed bit in ext4_trim_all_free() based
on the return value of ext4_try_to_trim_range(). However, when we want
to abort trimming because of a suspend attempt, we want to return
success from ext4_try_to_trim_range() without setting the trimmed bit.
Instead of awkwardly propagating this information, just move the
setting of the trimmed bit into ext4_try_to_trim_range() and set it
when the whole group has been trimmed.
Cc: stable(a)kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c91db9f57524..09091adfde64 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6906,6 +6906,16 @@ __acquires(bitlock)
return ret;
}
+static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
+ ext4_group_t grp)
+{
+ if (grp < ext4_get_groups_count(sb))
+ return EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ return (ext4_blocks_count(EXT4_SB(sb)->s_es) -
+ ext4_group_first_block_no(sb, grp) - 1) >>
+ EXT4_CLUSTER_BITS(sb);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6913,9 +6923,12 @@ __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
__releases(ext4_group_lock_ptr(sb, e4b->bd_group))
{
ext4_grpblk_t next, count, free_count;
+ bool set_trimmed = false;
void *bitmap;
bitmap = e4b->bd_bitmap;
+ if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
+ set_trimmed = true;
start = max(e4b->bd_info->bb_first_free, start);
count = 0;
free_count = 0;
@@ -6930,16 +6943,14 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
int ret = ext4_trim_extent(sb, start, next - start, e4b);
if (ret && ret != -EOPNOTSUPP)
- break;
+ return count;
count += next - start;
}
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current)) {
- count = -ERESTARTSYS;
- break;
- }
+ if (fatal_signal_pending(current))
+ return -ERESTARTSYS;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -6951,6 +6962,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
break;
}
+ if (set_trimmed)
+ EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+
return count;
}
@@ -6961,7 +6975,6 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
* @start: first group block to examine
* @max: last group block to examine
* @minblocks: minimum extent block count
- * @set_trimmed: set the trimmed flag if at least one block is trimmed
*
* ext4_trim_all_free walks through group's block bitmap searching for free
* extents. When the free extent is found, mark it as used in group buddy
@@ -6971,7 +6984,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
static ext4_grpblk_t
ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_grpblk_t start, ext4_grpblk_t max,
- ext4_grpblk_t minblocks, bool set_trimmed)
+ ext4_grpblk_t minblocks)
{
struct ext4_buddy e4b;
int ret;
@@ -6988,13 +7001,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_lock_group(sb, group);
if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
- minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
+ minblocks < EXT4_SB(sb)->s_last_trim_minblks)
ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
- if (ret >= 0 && set_trimmed)
- EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
- } else {
+ else
ret = 0;
- }
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
@@ -7027,7 +7037,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
ext4_fsblk_t first_data_blk =
le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
- bool whole_group, eof = false;
int ret = 0;
start = range->start >> sb->s_blocksize_bits;
@@ -7046,10 +7055,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
goto out;
}
- if (end >= max_blks - 1) {
+ if (end >= max_blks - 1)
end = max_blks - 1;
- eof = true;
- }
if (end <= first_data_blk)
goto out;
if (start < first_data_blk)
@@ -7063,7 +7070,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
/* end now represents the last cluster to discard in this group */
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- whole_group = true;
for (group = first_group; group <= last_group; group++) {
grp = ext4_get_group_info(sb, group);
@@ -7082,13 +7088,11 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
* change it for the last group, note that last_cluster is
* already computed earlier by ext4_get_group_no_and_offset()
*/
- if (group == last_group) {
+ if (group == last_group)
end = last_cluster;
- whole_group = eof ? true : end == EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- }
if (grp->bb_free >= minlen) {
cnt = ext4_trim_all_free(sb, group, first_cluster,
- end, minlen, whole_group);
+ end, minlen);
if (cnt < 0) {
ret = cnt;
break;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092026-imperfect-carnivore-b6de@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()")
bd2eea8d0a6b ("ext4: remove the 'group' parameter of ext4_trim_extent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:54 +0200
Subject: [PATCH] ext4: move setting of trimmed bit into
ext4_try_to_trim_range()
Currently we set the group's trimmed bit in ext4_trim_all_free() based
on the return value of ext4_try_to_trim_range(). However, when we want
to abort trimming because of a suspend attempt, we want to return
success from ext4_try_to_trim_range() without setting the trimmed bit.
Instead of awkwardly propagating this information, just move the
setting of the trimmed bit into ext4_try_to_trim_range() and set it
when the whole group has been trimmed.
Cc: stable(a)kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c91db9f57524..09091adfde64 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6906,6 +6906,16 @@ __acquires(bitlock)
return ret;
}
+static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
+ ext4_group_t grp)
+{
+ if (grp < ext4_get_groups_count(sb))
+ return EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ return (ext4_blocks_count(EXT4_SB(sb)->s_es) -
+ ext4_group_first_block_no(sb, grp) - 1) >>
+ EXT4_CLUSTER_BITS(sb);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6913,9 +6923,12 @@ __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
__releases(ext4_group_lock_ptr(sb, e4b->bd_group))
{
ext4_grpblk_t next, count, free_count;
+ bool set_trimmed = false;
void *bitmap;
bitmap = e4b->bd_bitmap;
+ if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
+ set_trimmed = true;
start = max(e4b->bd_info->bb_first_free, start);
count = 0;
free_count = 0;
@@ -6930,16 +6943,14 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
int ret = ext4_trim_extent(sb, start, next - start, e4b);
if (ret && ret != -EOPNOTSUPP)
- break;
+ return count;
count += next - start;
}
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current)) {
- count = -ERESTARTSYS;
- break;
- }
+ if (fatal_signal_pending(current))
+ return -ERESTARTSYS;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -6951,6 +6962,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
break;
}
+ if (set_trimmed)
+ EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+
return count;
}
@@ -6961,7 +6975,6 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
* @start: first group block to examine
* @max: last group block to examine
* @minblocks: minimum extent block count
- * @set_trimmed: set the trimmed flag if at least one block is trimmed
*
* ext4_trim_all_free walks through group's block bitmap searching for free
* extents. When the free extent is found, mark it as used in group buddy
@@ -6971,7 +6984,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
static ext4_grpblk_t
ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_grpblk_t start, ext4_grpblk_t max,
- ext4_grpblk_t minblocks, bool set_trimmed)
+ ext4_grpblk_t minblocks)
{
struct ext4_buddy e4b;
int ret;
@@ -6988,13 +7001,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_lock_group(sb, group);
if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
- minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
+ minblocks < EXT4_SB(sb)->s_last_trim_minblks)
ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
- if (ret >= 0 && set_trimmed)
- EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
- } else {
+ else
ret = 0;
- }
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
@@ -7027,7 +7037,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
ext4_fsblk_t first_data_blk =
le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
- bool whole_group, eof = false;
int ret = 0;
start = range->start >> sb->s_blocksize_bits;
@@ -7046,10 +7055,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
goto out;
}
- if (end >= max_blks - 1) {
+ if (end >= max_blks - 1)
end = max_blks - 1;
- eof = true;
- }
if (end <= first_data_blk)
goto out;
if (start < first_data_blk)
@@ -7063,7 +7070,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
/* end now represents the last cluster to discard in this group */
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- whole_group = true;
for (group = first_group; group <= last_group; group++) {
grp = ext4_get_group_info(sb, group);
@@ -7082,13 +7088,11 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
* change it for the last group, note that last_cluster is
* already computed earlier by ext4_get_group_no_and_offset()
*/
- if (group == last_group) {
+ if (group == last_group)
end = last_cluster;
- whole_group = eof ? true : end == EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- }
if (grp->bb_free >= minlen) {
cnt = ext4_trim_all_free(sb, group, first_cluster,
- end, minlen, whole_group);
+ end, minlen);
if (cnt < 0) {
ret = cnt;
break;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092027-worried-darkened-ffcc@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()")
bd2eea8d0a6b ("ext4: remove the 'group' parameter of ext4_trim_extent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:54 +0200
Subject: [PATCH] ext4: move setting of trimmed bit into
ext4_try_to_trim_range()
Currently we set the group's trimmed bit in ext4_trim_all_free() based
on the return value of ext4_try_to_trim_range(). However, when we want
to abort trimming because of a suspend attempt, we want to return
success from ext4_try_to_trim_range() without setting the trimmed bit.
Instead of awkwardly propagating this information, just move the
setting of the trimmed bit into ext4_try_to_trim_range() and set it
when the whole group has been trimmed.
Cc: stable(a)kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c91db9f57524..09091adfde64 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6906,6 +6906,16 @@ __acquires(bitlock)
return ret;
}
+static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
+ ext4_group_t grp)
+{
+ if (grp < ext4_get_groups_count(sb))
+ return EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ return (ext4_blocks_count(EXT4_SB(sb)->s_es) -
+ ext4_group_first_block_no(sb, grp) - 1) >>
+ EXT4_CLUSTER_BITS(sb);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6913,9 +6923,12 @@ __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
__releases(ext4_group_lock_ptr(sb, e4b->bd_group))
{
ext4_grpblk_t next, count, free_count;
+ bool set_trimmed = false;
void *bitmap;
bitmap = e4b->bd_bitmap;
+ if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
+ set_trimmed = true;
start = max(e4b->bd_info->bb_first_free, start);
count = 0;
free_count = 0;
@@ -6930,16 +6943,14 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
int ret = ext4_trim_extent(sb, start, next - start, e4b);
if (ret && ret != -EOPNOTSUPP)
- break;
+ return count;
count += next - start;
}
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current)) {
- count = -ERESTARTSYS;
- break;
- }
+ if (fatal_signal_pending(current))
+ return -ERESTARTSYS;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -6951,6 +6962,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
break;
}
+ if (set_trimmed)
+ EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+
return count;
}
@@ -6961,7 +6975,6 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
* @start: first group block to examine
* @max: last group block to examine
* @minblocks: minimum extent block count
- * @set_trimmed: set the trimmed flag if at least one block is trimmed
*
* ext4_trim_all_free walks through group's block bitmap searching for free
* extents. When the free extent is found, mark it as used in group buddy
@@ -6971,7 +6984,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
static ext4_grpblk_t
ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_grpblk_t start, ext4_grpblk_t max,
- ext4_grpblk_t minblocks, bool set_trimmed)
+ ext4_grpblk_t minblocks)
{
struct ext4_buddy e4b;
int ret;
@@ -6988,13 +7001,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_lock_group(sb, group);
if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
- minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
+ minblocks < EXT4_SB(sb)->s_last_trim_minblks)
ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
- if (ret >= 0 && set_trimmed)
- EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
- } else {
+ else
ret = 0;
- }
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
@@ -7027,7 +7037,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
ext4_fsblk_t first_data_blk =
le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
- bool whole_group, eof = false;
int ret = 0;
start = range->start >> sb->s_blocksize_bits;
@@ -7046,10 +7055,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
goto out;
}
- if (end >= max_blks - 1) {
+ if (end >= max_blks - 1)
end = max_blks - 1;
- eof = true;
- }
if (end <= first_data_blk)
goto out;
if (start < first_data_blk)
@@ -7063,7 +7070,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
/* end now represents the last cluster to discard in this group */
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- whole_group = true;
for (group = first_group; group <= last_group; group++) {
grp = ext4_get_group_info(sb, group);
@@ -7082,13 +7088,11 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
* change it for the last group, note that last_cluster is
* already computed earlier by ext4_get_group_no_and_offset()
*/
- if (group == last_group) {
+ if (group == last_group)
end = last_cluster;
- whole_group = eof ? true : end == EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- }
if (grp->bb_free >= minlen) {
cnt = ext4_trim_all_free(sb, group, first_cluster,
- end, minlen, whole_group);
+ end, minlen);
if (cnt < 0) {
ret = cnt;
break;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092026-charger-blah-7144@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()")
bd2eea8d0a6b ("ext4: remove the 'group' parameter of ext4_trim_extent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:54 +0200
Subject: [PATCH] ext4: move setting of trimmed bit into
ext4_try_to_trim_range()
Currently we set the group's trimmed bit in ext4_trim_all_free() based
on the return value of ext4_try_to_trim_range(). However, when we want
to abort trimming because of a suspend attempt, we want to return
success from ext4_try_to_trim_range() without setting the trimmed bit.
Instead of awkwardly propagating this information, just move the
setting of the trimmed bit into ext4_try_to_trim_range() and set it
when the whole group has been trimmed.
Cc: stable(a)kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c91db9f57524..09091adfde64 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6906,6 +6906,16 @@ __acquires(bitlock)
return ret;
}
+static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
+ ext4_group_t grp)
+{
+ if (grp < ext4_get_groups_count(sb))
+ return EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ return (ext4_blocks_count(EXT4_SB(sb)->s_es) -
+ ext4_group_first_block_no(sb, grp) - 1) >>
+ EXT4_CLUSTER_BITS(sb);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6913,9 +6923,12 @@ __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
__releases(ext4_group_lock_ptr(sb, e4b->bd_group))
{
ext4_grpblk_t next, count, free_count;
+ bool set_trimmed = false;
void *bitmap;
bitmap = e4b->bd_bitmap;
+ if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
+ set_trimmed = true;
start = max(e4b->bd_info->bb_first_free, start);
count = 0;
free_count = 0;
@@ -6930,16 +6943,14 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
int ret = ext4_trim_extent(sb, start, next - start, e4b);
if (ret && ret != -EOPNOTSUPP)
- break;
+ return count;
count += next - start;
}
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current)) {
- count = -ERESTARTSYS;
- break;
- }
+ if (fatal_signal_pending(current))
+ return -ERESTARTSYS;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -6951,6 +6962,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
break;
}
+ if (set_trimmed)
+ EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+
return count;
}
@@ -6961,7 +6975,6 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
* @start: first group block to examine
* @max: last group block to examine
* @minblocks: minimum extent block count
- * @set_trimmed: set the trimmed flag if at least one block is trimmed
*
* ext4_trim_all_free walks through group's block bitmap searching for free
* extents. When the free extent is found, mark it as used in group buddy
@@ -6971,7 +6984,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
static ext4_grpblk_t
ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_grpblk_t start, ext4_grpblk_t max,
- ext4_grpblk_t minblocks, bool set_trimmed)
+ ext4_grpblk_t minblocks)
{
struct ext4_buddy e4b;
int ret;
@@ -6988,13 +7001,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_lock_group(sb, group);
if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
- minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
+ minblocks < EXT4_SB(sb)->s_last_trim_minblks)
ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
- if (ret >= 0 && set_trimmed)
- EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
- } else {
+ else
ret = 0;
- }
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
@@ -7027,7 +7037,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
ext4_fsblk_t first_data_blk =
le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
- bool whole_group, eof = false;
int ret = 0;
start = range->start >> sb->s_blocksize_bits;
@@ -7046,10 +7055,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
goto out;
}
- if (end >= max_blks - 1) {
+ if (end >= max_blks - 1)
end = max_blks - 1;
- eof = true;
- }
if (end <= first_data_blk)
goto out;
if (start < first_data_blk)
@@ -7063,7 +7070,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
/* end now represents the last cluster to discard in this group */
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- whole_group = true;
for (group = first_group; group <= last_group; group++) {
grp = ext4_get_group_info(sb, group);
@@ -7082,13 +7088,11 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
* change it for the last group, note that last_cluster is
* already computed earlier by ext4_get_group_no_and_offset()
*/
- if (group == last_group) {
+ if (group == last_group)
end = last_cluster;
- whole_group = eof ? true : end == EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- }
if (grp->bb_free >= minlen) {
cnt = ext4_trim_all_free(sb, group, first_cluster,
- end, minlen, whole_group);
+ end, minlen);
if (cnt < 0) {
ret = cnt;
break;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092025-handlebar-decimal-2fff@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:54 +0200
Subject: [PATCH] ext4: move setting of trimmed bit into
ext4_try_to_trim_range()
Currently we set the group's trimmed bit in ext4_trim_all_free() based
on the return value of ext4_try_to_trim_range(). However, when we want
to abort trimming because of a suspend attempt, we want to return
success from ext4_try_to_trim_range() without setting the trimmed bit.
Instead of awkwardly propagating this information, just move the
setting of the trimmed bit into ext4_try_to_trim_range() and set it
when the whole group has been trimmed.
Cc: stable(a)kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c91db9f57524..09091adfde64 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6906,6 +6906,16 @@ __acquires(bitlock)
return ret;
}
+static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
+ ext4_group_t grp)
+{
+ if (grp < ext4_get_groups_count(sb))
+ return EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ return (ext4_blocks_count(EXT4_SB(sb)->s_es) -
+ ext4_group_first_block_no(sb, grp) - 1) >>
+ EXT4_CLUSTER_BITS(sb);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6913,9 +6923,12 @@ __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
__releases(ext4_group_lock_ptr(sb, e4b->bd_group))
{
ext4_grpblk_t next, count, free_count;
+ bool set_trimmed = false;
void *bitmap;
bitmap = e4b->bd_bitmap;
+ if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
+ set_trimmed = true;
start = max(e4b->bd_info->bb_first_free, start);
count = 0;
free_count = 0;
@@ -6930,16 +6943,14 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
int ret = ext4_trim_extent(sb, start, next - start, e4b);
if (ret && ret != -EOPNOTSUPP)
- break;
+ return count;
count += next - start;
}
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current)) {
- count = -ERESTARTSYS;
- break;
- }
+ if (fatal_signal_pending(current))
+ return -ERESTARTSYS;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -6951,6 +6962,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
break;
}
+ if (set_trimmed)
+ EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+
return count;
}
@@ -6961,7 +6975,6 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
* @start: first group block to examine
* @max: last group block to examine
* @minblocks: minimum extent block count
- * @set_trimmed: set the trimmed flag if at least one block is trimmed
*
* ext4_trim_all_free walks through group's block bitmap searching for free
* extents. When the free extent is found, mark it as used in group buddy
@@ -6971,7 +6984,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
static ext4_grpblk_t
ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_grpblk_t start, ext4_grpblk_t max,
- ext4_grpblk_t minblocks, bool set_trimmed)
+ ext4_grpblk_t minblocks)
{
struct ext4_buddy e4b;
int ret;
@@ -6988,13 +7001,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_lock_group(sb, group);
if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
- minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
+ minblocks < EXT4_SB(sb)->s_last_trim_minblks)
ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
- if (ret >= 0 && set_trimmed)
- EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
- } else {
+ else
ret = 0;
- }
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
@@ -7027,7 +7037,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
ext4_fsblk_t first_data_blk =
le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
- bool whole_group, eof = false;
int ret = 0;
start = range->start >> sb->s_blocksize_bits;
@@ -7046,10 +7055,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
goto out;
}
- if (end >= max_blks - 1) {
+ if (end >= max_blks - 1)
end = max_blks - 1;
- eof = true;
- }
if (end <= first_data_blk)
goto out;
if (start < first_data_blk)
@@ -7063,7 +7070,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
/* end now represents the last cluster to discard in this group */
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- whole_group = true;
for (group = first_group; group <= last_group; group++) {
grp = ext4_get_group_info(sb, group);
@@ -7082,13 +7088,11 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
* change it for the last group, note that last_cluster is
* already computed earlier by ext4_get_group_no_and_offset()
*/
- if (group == last_group) {
+ if (group == last_group)
end = last_cluster;
- whole_group = eof ? true : end == EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- }
if (grp->bb_free >= minlen) {
cnt = ext4_trim_all_free(sb, group, first_cluster,
- end, minlen, whole_group);
+ end, minlen);
if (cnt < 0) {
ret = cnt;
break;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092024-quintuple-veteran-81ec@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:54 +0200
Subject: [PATCH] ext4: move setting of trimmed bit into
ext4_try_to_trim_range()
Currently we set the group's trimmed bit in ext4_trim_all_free() based
on the return value of ext4_try_to_trim_range(). However, when we want
to abort trimming because of a suspend attempt, we want to return
success from ext4_try_to_trim_range() without setting the trimmed bit.
Instead of awkwardly propagating this information, just move the
setting of the trimmed bit into ext4_try_to_trim_range() and set it
when the whole group has been trimmed.
Cc: stable(a)kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c91db9f57524..09091adfde64 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6906,6 +6906,16 @@ __acquires(bitlock)
return ret;
}
+static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
+ ext4_group_t grp)
+{
+ if (grp < ext4_get_groups_count(sb))
+ return EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ return (ext4_blocks_count(EXT4_SB(sb)->s_es) -
+ ext4_group_first_block_no(sb, grp) - 1) >>
+ EXT4_CLUSTER_BITS(sb);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6913,9 +6923,12 @@ __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
__releases(ext4_group_lock_ptr(sb, e4b->bd_group))
{
ext4_grpblk_t next, count, free_count;
+ bool set_trimmed = false;
void *bitmap;
bitmap = e4b->bd_bitmap;
+ if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
+ set_trimmed = true;
start = max(e4b->bd_info->bb_first_free, start);
count = 0;
free_count = 0;
@@ -6930,16 +6943,14 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
int ret = ext4_trim_extent(sb, start, next - start, e4b);
if (ret && ret != -EOPNOTSUPP)
- break;
+ return count;
count += next - start;
}
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current)) {
- count = -ERESTARTSYS;
- break;
- }
+ if (fatal_signal_pending(current))
+ return -ERESTARTSYS;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -6951,6 +6962,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
break;
}
+ if (set_trimmed)
+ EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+
return count;
}
@@ -6961,7 +6975,6 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
* @start: first group block to examine
* @max: last group block to examine
* @minblocks: minimum extent block count
- * @set_trimmed: set the trimmed flag if at least one block is trimmed
*
* ext4_trim_all_free walks through group's block bitmap searching for free
* extents. When the free extent is found, mark it as used in group buddy
@@ -6971,7 +6984,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
static ext4_grpblk_t
ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_grpblk_t start, ext4_grpblk_t max,
- ext4_grpblk_t minblocks, bool set_trimmed)
+ ext4_grpblk_t minblocks)
{
struct ext4_buddy e4b;
int ret;
@@ -6988,13 +7001,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_lock_group(sb, group);
if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
- minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
+ minblocks < EXT4_SB(sb)->s_last_trim_minblks)
ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
- if (ret >= 0 && set_trimmed)
- EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
- } else {
+ else
ret = 0;
- }
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
@@ -7027,7 +7037,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
ext4_fsblk_t first_data_blk =
le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
- bool whole_group, eof = false;
int ret = 0;
start = range->start >> sb->s_blocksize_bits;
@@ -7046,10 +7055,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
goto out;
}
- if (end >= max_blks - 1) {
+ if (end >= max_blks - 1)
end = max_blks - 1;
- eof = true;
- }
if (end <= first_data_blk)
goto out;
if (start < first_data_blk)
@@ -7063,7 +7070,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
/* end now represents the last cluster to discard in this group */
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- whole_group = true;
for (group = first_group; group <= last_group; group++) {
grp = ext4_get_group_info(sb, group);
@@ -7082,13 +7088,11 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
* change it for the last group, note that last_cluster is
* already computed earlier by ext4_get_group_no_and_offset()
*/
- if (group == last_group) {
+ if (group == last_group)
end = last_cluster;
- whole_group = eof ? true : end == EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- }
if (grp->bb_free >= minlen) {
cnt = ext4_trim_all_free(sb, group, first_cluster,
- end, minlen, whole_group);
+ end, minlen);
if (cnt < 0) {
ret = cnt;
break;
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092023-outfit-renounce-262c@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:54 +0200
Subject: [PATCH] ext4: move setting of trimmed bit into
ext4_try_to_trim_range()
Currently we set the group's trimmed bit in ext4_trim_all_free() based
on the return value of ext4_try_to_trim_range(). However, when we want
to abort trimming because of a suspend attempt, we want to return
success from ext4_try_to_trim_range() without setting the trimmed bit.
Instead of awkwardly propagating this information, just move the
setting of the trimmed bit into ext4_try_to_trim_range() and set it
when the whole group has been trimmed.
Cc: stable(a)kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c91db9f57524..09091adfde64 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6906,6 +6906,16 @@ __acquires(bitlock)
return ret;
}
+static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
+ ext4_group_t grp)
+{
+ if (grp < ext4_get_groups_count(sb))
+ return EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ return (ext4_blocks_count(EXT4_SB(sb)->s_es) -
+ ext4_group_first_block_no(sb, grp) - 1) >>
+ EXT4_CLUSTER_BITS(sb);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6913,9 +6923,12 @@ __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
__releases(ext4_group_lock_ptr(sb, e4b->bd_group))
{
ext4_grpblk_t next, count, free_count;
+ bool set_trimmed = false;
void *bitmap;
bitmap = e4b->bd_bitmap;
+ if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
+ set_trimmed = true;
start = max(e4b->bd_info->bb_first_free, start);
count = 0;
free_count = 0;
@@ -6930,16 +6943,14 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
int ret = ext4_trim_extent(sb, start, next - start, e4b);
if (ret && ret != -EOPNOTSUPP)
- break;
+ return count;
count += next - start;
}
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current)) {
- count = -ERESTARTSYS;
- break;
- }
+ if (fatal_signal_pending(current))
+ return -ERESTARTSYS;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -6951,6 +6962,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
break;
}
+ if (set_trimmed)
+ EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+
return count;
}
@@ -6961,7 +6975,6 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
* @start: first group block to examine
* @max: last group block to examine
* @minblocks: minimum extent block count
- * @set_trimmed: set the trimmed flag if at least one block is trimmed
*
* ext4_trim_all_free walks through group's block bitmap searching for free
* extents. When the free extent is found, mark it as used in group buddy
@@ -6971,7 +6984,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
static ext4_grpblk_t
ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_grpblk_t start, ext4_grpblk_t max,
- ext4_grpblk_t minblocks, bool set_trimmed)
+ ext4_grpblk_t minblocks)
{
struct ext4_buddy e4b;
int ret;
@@ -6988,13 +7001,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_lock_group(sb, group);
if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
- minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
+ minblocks < EXT4_SB(sb)->s_last_trim_minblks)
ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
- if (ret >= 0 && set_trimmed)
- EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
- } else {
+ else
ret = 0;
- }
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
@@ -7027,7 +7037,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
ext4_fsblk_t first_data_blk =
le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
- bool whole_group, eof = false;
int ret = 0;
start = range->start >> sb->s_blocksize_bits;
@@ -7046,10 +7055,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
goto out;
}
- if (end >= max_blks - 1) {
+ if (end >= max_blks - 1)
end = max_blks - 1;
- eof = true;
- }
if (end <= first_data_blk)
goto out;
if (start < first_data_blk)
@@ -7063,7 +7070,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
/* end now represents the last cluster to discard in this group */
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- whole_group = true;
for (group = first_group; group <= last_group; group++) {
grp = ext4_get_group_info(sb, group);
@@ -7082,13 +7088,11 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
* change it for the last group, note that last_cluster is
* already computed earlier by ext4_get_group_no_and_offset()
*/
- if (group == last_group) {
+ if (group == last_group)
end = last_cluster;
- whole_group = eof ? true : end == EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- }
if (grp->bb_free >= minlen) {
cnt = ext4_trim_all_free(sb, group, first_cluster,
- end, minlen, whole_group);
+ end, minlen);
if (cnt < 0) {
ret = cnt;
break;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 737dd811a3dbfd7edd4ad2ba5152e93d99074f83
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092018-italics-animation-cbfc@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
737dd811a3db ("ata: libahci: clear pending interrupt status")
93c7711494f4 ("ata: ahci: Drop pointless VPRINTK() calls and convert the remaining ones")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 737dd811a3dbfd7edd4ad2ba5152e93d99074f83 Mon Sep 17 00:00:00 2001
From: Szuying Chen <chensiying21(a)gmail.com>
Date: Thu, 7 Sep 2023 16:17:10 +0800
Subject: [PATCH] ata: libahci: clear pending interrupt status
When a CRC error occurs, the HBA asserts an interrupt to indicate an
interface fatal error (PxIS.IFS). The ISR clears PxIE and PxIS and then
performs error recovery. If the adapter receives another SDB FIS
with an error (PxIS.TFES) from the device before the start of the EH
recovery process, the interrupt signaling the new SDB cannot be
serviced as PxIE was cleared already. This in turn leaves the HBA
unable to issue any command during the error recovery process after
setting PxCMD.ST to 1 because PxIS.TFES is still set.
According to the AHCI 1.3.1 specification, section 6.2.2, fatal errors
notified by setting PxIS.HBFS, PxIS.HBDS, PxIS.IFS or PxIS.TFES will
cause the HBA to enter the ERR:Fatal state. In this state, the HBA
shall not issue any new commands.
To avoid this situation, introduce the function
ahci_port_clear_pending_irq() to clear pending interrupts before
executing a COMRESET. This follows the AHCI 1.3.1 specification,
section 6.2.2.2.
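The status registers involved are write-1-to-clear: writing back the value
just read acks exactly the bits that were pending. A minimal illustrative
sketch of that pattern (clear_w1c() is a hypothetical name, not part of the
patch; readl()/writel() are the usual kernel MMIO accessors):

static void clear_w1c(void __iomem *reg)
{
	u32 val = readl(reg);	/* snapshot the currently pending bits */

	if (val)
		writel(val, reg);	/* writing the 1s back clears those bits */
}

In the diff below, ahci_port_clear_pending_irq() applies this pattern to
PORT_SCR_ERR and PORT_IRQ_STAT, then acks the port's bit in the host-level
HOST_IRQ_STAT register with a direct write of 1 << ap->port_no.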
Signed-off-by: Szuying Chen <Chloe_Chen(a)asmedia.com.tw>
Fixes: e0bfd149973d ("[PATCH] ahci: stop engine during hard reset")
Cc: stable(a)vger.kernel.org
Reviewed-by: Niklas Cassel <niklas.cassel(a)wdc.com>
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index e2bacedf28ef..f1263364fa97 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -1256,6 +1256,26 @@ static ssize_t ahci_activity_show(struct ata_device *dev, char *buf)
return sprintf(buf, "%d\n", emp->blink_policy);
}
+static void ahci_port_clear_pending_irq(struct ata_port *ap)
+{
+ struct ahci_host_priv *hpriv = ap->host->private_data;
+ void __iomem *port_mmio = ahci_port_base(ap);
+ u32 tmp;
+
+ /* clear SError */
+ tmp = readl(port_mmio + PORT_SCR_ERR);
+ dev_dbg(ap->host->dev, "PORT_SCR_ERR 0x%x\n", tmp);
+ writel(tmp, port_mmio + PORT_SCR_ERR);
+
+ /* clear port IRQ */
+ tmp = readl(port_mmio + PORT_IRQ_STAT);
+ dev_dbg(ap->host->dev, "PORT_IRQ_STAT 0x%x\n", tmp);
+ if (tmp)
+ writel(tmp, port_mmio + PORT_IRQ_STAT);
+
+ writel(1 << ap->port_no, hpriv->mmio + HOST_IRQ_STAT);
+}
+
static void ahci_port_init(struct device *dev, struct ata_port *ap,
int port_no, void __iomem *mmio,
void __iomem *port_mmio)
@@ -1270,18 +1290,7 @@ static void ahci_port_init(struct device *dev, struct ata_port *ap,
if (rc)
dev_warn(dev, "%s (%d)\n", emsg, rc);
- /* clear SError */
- tmp = readl(port_mmio + PORT_SCR_ERR);
- dev_dbg(dev, "PORT_SCR_ERR 0x%x\n", tmp);
- writel(tmp, port_mmio + PORT_SCR_ERR);
-
- /* clear port IRQ */
- tmp = readl(port_mmio + PORT_IRQ_STAT);
- dev_dbg(dev, "PORT_IRQ_STAT 0x%x\n", tmp);
- if (tmp)
- writel(tmp, port_mmio + PORT_IRQ_STAT);
-
- writel(1 << port_no, mmio + HOST_IRQ_STAT);
+ ahci_port_clear_pending_irq(ap);
/* mark esata ports */
tmp = readl(port_mmio + PORT_CMD);
@@ -1603,6 +1612,8 @@ int ahci_do_hardreset(struct ata_link *link, unsigned int *class,
tf.status = ATA_BUSY;
ata_tf_to_fis(&tf, 0, 0, d2h_fis);
+ ahci_port_clear_pending_irq(ap);
+
rc = sata_link_hardreset(link, timing, deadline, online,
ahci_check_ready);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 737dd811a3dbfd7edd4ad2ba5152e93d99074f83
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092016-cut-spearfish-b5dd@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
737dd811a3db ("ata: libahci: clear pending interrupt status")
93c7711494f4 ("ata: ahci: Drop pointless VPRINTK() calls and convert the remaining ones")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 737dd811a3dbfd7edd4ad2ba5152e93d99074f83 Mon Sep 17 00:00:00 2001
From: Szuying Chen <chensiying21(a)gmail.com>
Date: Thu, 7 Sep 2023 16:17:10 +0800
Subject: [PATCH] ata: libahci: clear pending interrupt status
When a CRC error occurs, the HBA asserts an interrupt to indicate an
interface fatal error (PxIS.IFS). The ISR clears PxIE and PxIS, then
does error recovery. If the adapter receives another SDB FIS
with an error (PxIS.TFES) from the device before the start of the EH
recovery process, the interrupt signaling the new SDB cannot be
serviced as PxIE was cleared already. This in turn leaves the HBA
unable to issue any command during the error recovery process after
setting PxCMD.ST to 1 because PxIS.TFES is still set.
According to the AHCI 1.3.1 specification, section 6.2.2, fatal errors
notified by setting PxIS.HBFS, PxIS.HBDS, PxIS.IFS or PxIS.TFES will
cause the HBA to enter the ERR:Fatal state. In this state, the HBA
shall not issue any new commands.
To avoid this situation, introduce the function
ahci_port_clear_pending_irq() to clear pending interrupts before
executing a COMRESET. This follows the AHCI 1.3.1 specification,
section 6.2.2.2.
Signed-off-by: Szuying Chen <Chloe_Chen(a)asmedia.com.tw>
Fixes: e0bfd149973d ("[PATCH] ahci: stop engine during hard reset")
Cc: stable(a)vger.kernel.org
Reviewed-by: Niklas Cassel <niklas.cassel(a)wdc.com>
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index e2bacedf28ef..f1263364fa97 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -1256,6 +1256,26 @@ static ssize_t ahci_activity_show(struct ata_device *dev, char *buf)
return sprintf(buf, "%d\n", emp->blink_policy);
}
+static void ahci_port_clear_pending_irq(struct ata_port *ap)
+{
+ struct ahci_host_priv *hpriv = ap->host->private_data;
+ void __iomem *port_mmio = ahci_port_base(ap);
+ u32 tmp;
+
+ /* clear SError */
+ tmp = readl(port_mmio + PORT_SCR_ERR);
+ dev_dbg(ap->host->dev, "PORT_SCR_ERR 0x%x\n", tmp);
+ writel(tmp, port_mmio + PORT_SCR_ERR);
+
+ /* clear port IRQ */
+ tmp = readl(port_mmio + PORT_IRQ_STAT);
+ dev_dbg(ap->host->dev, "PORT_IRQ_STAT 0x%x\n", tmp);
+ if (tmp)
+ writel(tmp, port_mmio + PORT_IRQ_STAT);
+
+ writel(1 << ap->port_no, hpriv->mmio + HOST_IRQ_STAT);
+}
+
static void ahci_port_init(struct device *dev, struct ata_port *ap,
int port_no, void __iomem *mmio,
void __iomem *port_mmio)
@@ -1270,18 +1290,7 @@ static void ahci_port_init(struct device *dev, struct ata_port *ap,
if (rc)
dev_warn(dev, "%s (%d)\n", emsg, rc);
- /* clear SError */
- tmp = readl(port_mmio + PORT_SCR_ERR);
- dev_dbg(dev, "PORT_SCR_ERR 0x%x\n", tmp);
- writel(tmp, port_mmio + PORT_SCR_ERR);
-
- /* clear port IRQ */
- tmp = readl(port_mmio + PORT_IRQ_STAT);
- dev_dbg(dev, "PORT_IRQ_STAT 0x%x\n", tmp);
- if (tmp)
- writel(tmp, port_mmio + PORT_IRQ_STAT);
-
- writel(1 << port_no, mmio + HOST_IRQ_STAT);
+ ahci_port_clear_pending_irq(ap);
/* mark esata ports */
tmp = readl(port_mmio + PORT_CMD);
@@ -1603,6 +1612,8 @@ int ahci_do_hardreset(struct ata_link *link, unsigned int *class,
tf.status = ATA_BUSY;
ata_tf_to_fis(&tf, 0, 0, d2h_fis);
+ ahci_port_clear_pending_irq(ap);
+
rc = sata_link_hardreset(link, timing, deadline, online,
ahci_check_ready);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 737dd811a3dbfd7edd4ad2ba5152e93d99074f83
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092015-hazard-genre-fff0@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
737dd811a3db ("ata: libahci: clear pending interrupt status")
93c7711494f4 ("ata: ahci: Drop pointless VPRINTK() calls and convert the remaining ones")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 737dd811a3dbfd7edd4ad2ba5152e93d99074f83 Mon Sep 17 00:00:00 2001
From: Szuying Chen <chensiying21(a)gmail.com>
Date: Thu, 7 Sep 2023 16:17:10 +0800
Subject: [PATCH] ata: libahci: clear pending interrupt status
When a CRC error occurs, the HBA asserts an interrupt to indicate an
interface fatal error (PxIS.IFS). The ISR clears PxIE and PxIS, then
does error recovery. If the adapter receives another SDB FIS
with an error (PxIS.TFES) from the device before the start of the EH
recovery process, the interrupt signaling the new SDB cannot be
serviced as PxIE was cleared already. This in turn leaves the HBA
unable to issue any command during the error recovery process after
setting PxCMD.ST to 1 because PxIS.TFES is still set.
According to the AHCI 1.3.1 specification, section 6.2.2, fatal errors
notified by setting PxIS.HBFS, PxIS.HBDS, PxIS.IFS or PxIS.TFES will
cause the HBA to enter the ERR:Fatal state. In this state, the HBA
shall not issue any new commands.
To avoid this situation, introduce the function
ahci_port_clear_pending_irq() to clear pending interrupts before
executing a COMRESET. This follows the AHCI 1.3.1 specification,
section 6.2.2.2.
Signed-off-by: Szuying Chen <Chloe_Chen(a)asmedia.com.tw>
Fixes: e0bfd149973d ("[PATCH] ahci: stop engine during hard reset")
Cc: stable(a)vger.kernel.org
Reviewed-by: Niklas Cassel <niklas.cassel(a)wdc.com>
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index e2bacedf28ef..f1263364fa97 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -1256,6 +1256,26 @@ static ssize_t ahci_activity_show(struct ata_device *dev, char *buf)
return sprintf(buf, "%d\n", emp->blink_policy);
}
+static void ahci_port_clear_pending_irq(struct ata_port *ap)
+{
+ struct ahci_host_priv *hpriv = ap->host->private_data;
+ void __iomem *port_mmio = ahci_port_base(ap);
+ u32 tmp;
+
+ /* clear SError */
+ tmp = readl(port_mmio + PORT_SCR_ERR);
+ dev_dbg(ap->host->dev, "PORT_SCR_ERR 0x%x\n", tmp);
+ writel(tmp, port_mmio + PORT_SCR_ERR);
+
+ /* clear port IRQ */
+ tmp = readl(port_mmio + PORT_IRQ_STAT);
+ dev_dbg(ap->host->dev, "PORT_IRQ_STAT 0x%x\n", tmp);
+ if (tmp)
+ writel(tmp, port_mmio + PORT_IRQ_STAT);
+
+ writel(1 << ap->port_no, hpriv->mmio + HOST_IRQ_STAT);
+}
+
static void ahci_port_init(struct device *dev, struct ata_port *ap,
int port_no, void __iomem *mmio,
void __iomem *port_mmio)
@@ -1270,18 +1290,7 @@ static void ahci_port_init(struct device *dev, struct ata_port *ap,
if (rc)
dev_warn(dev, "%s (%d)\n", emsg, rc);
- /* clear SError */
- tmp = readl(port_mmio + PORT_SCR_ERR);
- dev_dbg(dev, "PORT_SCR_ERR 0x%x\n", tmp);
- writel(tmp, port_mmio + PORT_SCR_ERR);
-
- /* clear port IRQ */
- tmp = readl(port_mmio + PORT_IRQ_STAT);
- dev_dbg(dev, "PORT_IRQ_STAT 0x%x\n", tmp);
- if (tmp)
- writel(tmp, port_mmio + PORT_IRQ_STAT);
-
- writel(1 << port_no, mmio + HOST_IRQ_STAT);
+ ahci_port_clear_pending_irq(ap);
/* mark esata ports */
tmp = readl(port_mmio + PORT_CMD);
@@ -1603,6 +1612,8 @@ int ahci_do_hardreset(struct ata_link *link, unsigned int *class,
tf.status = ATA_BUSY;
ata_tf_to_fis(&tf, 0, 0, d2h_fis);
+ ahci_port_clear_pending_irq(ap);
+
rc = sata_link_hardreset(link, timing, deadline, online,
ahci_check_ready);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 737dd811a3dbfd7edd4ad2ba5152e93d99074f83
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092014-ritzy-preaching-56d2@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
737dd811a3db ("ata: libahci: clear pending interrupt status")
93c7711494f4 ("ata: ahci: Drop pointless VPRINTK() calls and convert the remaining ones")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 737dd811a3dbfd7edd4ad2ba5152e93d99074f83 Mon Sep 17 00:00:00 2001
From: Szuying Chen <chensiying21(a)gmail.com>
Date: Thu, 7 Sep 2023 16:17:10 +0800
Subject: [PATCH] ata: libahci: clear pending interrupt status
When a CRC error occurs, the HBA asserts an interrupt to indicate an
interface fatal error (PxIS.IFS). The ISR clears PxIE and PxIS, then
does error recovery. If the adapter receives another SDB FIS
with an error (PxIS.TFES) from the device before the start of the EH
recovery process, the interrupt signaling the new SDB cannot be
serviced as PxIE was cleared already. This in turn leaves the HBA
unable to issue any command during the error recovery process after
setting PxCMD.ST to 1 because PxIS.TFES is still set.
According to the AHCI 1.3.1 specification, section 6.2.2, fatal errors
notified by setting PxIS.HBFS, PxIS.HBDS, PxIS.IFS or PxIS.TFES will
cause the HBA to enter the ERR:Fatal state. In this state, the HBA
shall not issue any new commands.
To avoid this situation, introduce the function
ahci_port_clear_pending_irq() to clear pending interrupts before
executing a COMRESET. This follows the AHCI 1.3.1 specification,
section 6.2.2.2.
Signed-off-by: Szuying Chen <Chloe_Chen(a)asmedia.com.tw>
Fixes: e0bfd149973d ("[PATCH] ahci: stop engine during hard reset")
Cc: stable(a)vger.kernel.org
Reviewed-by: Niklas Cassel <niklas.cassel(a)wdc.com>
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index e2bacedf28ef..f1263364fa97 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -1256,6 +1256,26 @@ static ssize_t ahci_activity_show(struct ata_device *dev, char *buf)
return sprintf(buf, "%d\n", emp->blink_policy);
}
+static void ahci_port_clear_pending_irq(struct ata_port *ap)
+{
+ struct ahci_host_priv *hpriv = ap->host->private_data;
+ void __iomem *port_mmio = ahci_port_base(ap);
+ u32 tmp;
+
+ /* clear SError */
+ tmp = readl(port_mmio + PORT_SCR_ERR);
+ dev_dbg(ap->host->dev, "PORT_SCR_ERR 0x%x\n", tmp);
+ writel(tmp, port_mmio + PORT_SCR_ERR);
+
+ /* clear port IRQ */
+ tmp = readl(port_mmio + PORT_IRQ_STAT);
+ dev_dbg(ap->host->dev, "PORT_IRQ_STAT 0x%x\n", tmp);
+ if (tmp)
+ writel(tmp, port_mmio + PORT_IRQ_STAT);
+
+ writel(1 << ap->port_no, hpriv->mmio + HOST_IRQ_STAT);
+}
+
static void ahci_port_init(struct device *dev, struct ata_port *ap,
int port_no, void __iomem *mmio,
void __iomem *port_mmio)
@@ -1270,18 +1290,7 @@ static void ahci_port_init(struct device *dev, struct ata_port *ap,
if (rc)
dev_warn(dev, "%s (%d)\n", emsg, rc);
- /* clear SError */
- tmp = readl(port_mmio + PORT_SCR_ERR);
- dev_dbg(dev, "PORT_SCR_ERR 0x%x\n", tmp);
- writel(tmp, port_mmio + PORT_SCR_ERR);
-
- /* clear port IRQ */
- tmp = readl(port_mmio + PORT_IRQ_STAT);
- dev_dbg(dev, "PORT_IRQ_STAT 0x%x\n", tmp);
- if (tmp)
- writel(tmp, port_mmio + PORT_IRQ_STAT);
-
- writel(1 << port_no, mmio + HOST_IRQ_STAT);
+ ahci_port_clear_pending_irq(ap);
/* mark esata ports */
tmp = readl(port_mmio + PORT_CMD);
@@ -1603,6 +1612,8 @@ int ahci_do_hardreset(struct ata_link *link, unsigned int *class,
tf.status = ATA_BUSY;
ata_tf_to_fis(&tf, 0, 0, d2h_fis);
+ ahci_port_clear_pending_irq(ap);
+
rc = sata_link_hardreset(link, timing, deadline, online,
ahci_check_ready);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 24e0e61db3cb86a66824531989f1df80e0939f26
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092059-broiling-pumice-a10f@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
24e0e61db3cb ("ata: libata: disallow dev-initiated LPM transitions to unsupported states")
7fe183c773c4 ("ata: start separating SATA specific code from libata-core.c")
a52fbcfc7b38 ("ata: move EXPORT_SYMBOL_GPL()s close to exported code")
10a663a1b151 ("ata: ahci: Add shutdown to freeze hardware resources of ahci")
95364f36701e ("ata: make qc_prep return ata_completion_errors")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24e0e61db3cb86a66824531989f1df80e0939f26 Mon Sep 17 00:00:00 2001
From: Niklas Cassel <niklas.cassel(a)wdc.com>
Date: Mon, 4 Sep 2023 22:42:56 +0200
Subject: [PATCH] ata: libata: disallow dev-initiated LPM transitions to
unsupported states
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In AHCI 1.3.1, the register description for CAP.SSC:
"When cleared to ‘0’, software must not allow the HBA to initiate
transitions to the Slumber state via aggressive link power management nor
the PxCMD.ICC field in each port, and the PxSCTL.IPM field in each port
must be programmed to disallow device initiated Slumber requests."
In AHCI 1.3.1, the register description for CAP.PSC:
"When cleared to ‘0’, software must not allow the HBA to initiate
transitions to the Partial state via aggressive link power management nor
the PxCMD.ICC field in each port, and the PxSCTL.IPM field in each port
must be programmed to disallow device initiated Partial requests."
Ensure that we always set the corresponding bits in PxSCTL.IPM, such that
a device is not allowed to initiate transitions to power states which are
unsupported by the HBA.
DevSleep is always initiated by the HBA; however, for completeness, set the
corresponding bit in PxSCTL.IPM such that aggressive link power management
cannot transition to DevSleep if DevSleep is not supported.
sata_link_scr_lpm() is used by libahci, ata_piix and libata-pmp.
However, only libahci has the ability to read the CAP/CAP2 register to see
if these features are supported. Therefore, in order to not introduce any
regressions on ata_piix or libata-pmp, create flags that indicate that the
respective feature is NOT supported. This way, the behavior for ata_piix
and libata-pmp should remain unchanged.
This change is based on a patch originally submitted by Runa Guo-oc.
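The IPM field occupies bits 11:8 of SControl, and each bit set in it
disallows one device-initiated transition. A minimal sketch of the mask
computation the diff below performs (ipm_disallow_mask() is a hypothetical
helper, not part of the patch; the flag names and bit values match the diff):

static u32 ipm_disallow_mask(unsigned long host_flags)
{
	u32 ipm = 0;

	if (host_flags & ATA_HOST_NO_PART)
		ipm |= 0x1 << 8;	/* device may not request Partial */
	if (host_flags & ATA_HOST_NO_SSC)
		ipm |= 0x2 << 8;	/* device may not request Slumber */
	if (host_flags & ATA_HOST_NO_DEVSLP)
		ipm |= 0x4 << 8;	/* device may not request DevSleep */

	return ipm;	/* OR into SControl after clearing bits 11:8 */
}

For example, a host flagged ATA_HOST_NO_PART | ATA_HOST_NO_SSC yields
(0x1 | 0x2) << 8 = 0x300, blocking device-initiated Partial and Slumber
requests while leaving DevSleep transitions permitted.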
Signed-off-by: Niklas Cassel <niklas.cassel(a)wdc.com>
Fixes: 1152b2617a6e ("libata: implement sata_link_scr_lpm() and make ata_dev_set_feature() global")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index abb5911c9d09..08745e7db820 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1883,6 +1883,15 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
else
dev_info(&pdev->dev, "SSS flag set, parallel bus scan disabled\n");
+ if (!(hpriv->cap & HOST_CAP_PART))
+ host->flags |= ATA_HOST_NO_PART;
+
+ if (!(hpriv->cap & HOST_CAP_SSC))
+ host->flags |= ATA_HOST_NO_SSC;
+
+ if (!(hpriv->cap2 & HOST_CAP2_SDS))
+ host->flags |= ATA_HOST_NO_DEVSLP;
+
if (pi.flags & ATA_FLAG_EM)
ahci_reset_em(host);
diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
index 5d31c08be013..a701e1538482 100644
--- a/drivers/ata/libata-sata.c
+++ b/drivers/ata/libata-sata.c
@@ -396,10 +396,23 @@ int sata_link_scr_lpm(struct ata_link *link, enum ata_lpm_policy policy,
case ATA_LPM_MED_POWER_WITH_DIPM:
case ATA_LPM_MIN_POWER_WITH_PARTIAL:
case ATA_LPM_MIN_POWER:
- if (ata_link_nr_enabled(link) > 0)
- /* no restrictions on LPM transitions */
+ if (ata_link_nr_enabled(link) > 0) {
+ /* assume no restrictions on LPM transitions */
scontrol &= ~(0x7 << 8);
- else {
+
+ /*
+ * If the controller does not support partial, slumber,
+ * or devsleep, then disallow these transitions.
+ */
+ if (link->ap->host->flags & ATA_HOST_NO_PART)
+ scontrol |= (0x1 << 8);
+
+ if (link->ap->host->flags & ATA_HOST_NO_SSC)
+ scontrol |= (0x2 << 8);
+
+ if (link->ap->host->flags & ATA_HOST_NO_DEVSLP)
+ scontrol |= (0x4 << 8);
+ } else {
/* empty port, power off */
scontrol &= ~0xf;
scontrol |= (0x1 << 2);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 52d58b13e5ee..bf4913f4d7ac 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -222,6 +222,10 @@ enum {
ATA_HOST_PARALLEL_SCAN = (1 << 2), /* Ports on this host can be scanned in parallel */
ATA_HOST_IGNORE_ATA = (1 << 3), /* Ignore ATA devices on this host. */
+ ATA_HOST_NO_PART = (1 << 4), /* Host does not support partial */
+ ATA_HOST_NO_SSC = (1 << 5), /* Host does not support slumber */
+ ATA_HOST_NO_DEVSLP = (1 << 6), /* Host does not support devslp */
+
/* bits 24:31 of host->flags are reserved for LLD specific flags */
/* various lengths of time */