Hi kernel maintainers!
My computer doesn't boot with kernels newer than 6.1.45.
Here's what happens:
- system boots in initramfs
- detects my encrypted ZFS pool and asks for password
- mounts the system, pivots to it, starts the real init
- before any daemon has time to start, the system hangs and the kernel
writes to the console:
"nvme 0000:04:00.0: Unable to change power state from D3cold to D0,
device inaccessible"
- if I reboot directly without powering off (using magic sysrq or
panic=10), even the UEFI complains about not finding any storage to
boot from.
- after a real power off, I can boot using a kernel <= 6.1.45.
The bug has been discussed here:
https://bugzilla.kernel.org/show_bug.cgi?id=217705
My laptop is a Dell XPS 15 9560 (Intel i7-7700HQ).
I bisected between 6.1.45 and 6.1.46 and found this commit
commit 8ee39ec479147e29af704639f8e55fce246ed2d9
Author: Ricky WU <ricky_wu(a)realtek.com>
Date: Tue Jul 25 09:10:54 2023 +0000
misc: rtsx: judge ASPM Mode to set PETXCFG Reg
commit 101bd907b4244a726980ee67f95ed9cafab6ff7a upstream.
If the ASPM mode is ASPM_MODE_CFG, the value of clkreq_0 needs to be
checked to set PETXCFG HIGH or LOW; if the ASPM mode is ASPM_MODE_REG,
always set PETXCFG HIGH during initialization.
Cc: stable(a)vger.kernel.org
Signed-off-by: Ricky Wu <ricky_wu(a)realtek.com>
Link:
https://lore.kernel.org/r/52906c6836374c8cb068225954c5543a@realtek.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
drivers/misc/cardreader/rts5227.c | 2 +-
drivers/misc/cardreader/rts5228.c | 18 ------------------
drivers/misc/cardreader/rts5249.c | 3 +--
drivers/misc/cardreader/rts5260.c | 18 ------------------
drivers/misc/cardreader/rts5261.c | 18 ------------------
drivers/misc/cardreader/rtsx_pcr.c | 5 ++++-
6 files changed, 6 insertions(+), 58 deletions(-)
If I build 6.1.51 with this commit reverted, my laptop works again,
confirming that this commit is to blame.
Also, blacklisting `rtsx_pci_sdmmc` and `rtsx_pci`, while preventing use
of the SD card reader, allows the system to boot.
I can't try 6.4 or 6.5 because my system depends on ZFS.
Have a nice day,
Paul Grandperrin
In i40e_ptp_adjfine(), the new adjustment should be based on the base
frequency, not on I40E_PTP_40GB_INCVAL.
This issue was introduced in commit 3626a690b717 ("i40e: use
mul_u64_u64_div_u64 for PTP frequency calculation"), and was fixed in
commit 1060707e3809 ("ptp: introduce helpers to adjust by scaled
parts per million"). However the latter is a new feature and hasn't been
backported to the stable releases.
This issue affects both v6.0 and v6.1, and v6.1 is an LTS release.
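For illustration, here is a minimal sketch of the corrected computation
(the wrapper name is illustrative, not the driver code;
mul_u64_u64_div_u64() is the kernel helper the driver already uses):

static u64 i40e_adjust_incval(u64 freq, long scaled_ppm)
{
	bool neg_adj = scaled_ppm < 0;
	u64 diff;

	if (neg_adj)
		scaled_ppm = -scaled_ppm;

	/* scaled_ppm is parts per million scaled by 2^16 */
	diff = mul_u64_u64_div_u64(freq, (u64)scaled_ppm,
				   1000000ULL << 16);

	/* adjust relative to the base frequency, not I40E_PTP_40GB_INCVAL */
	return neg_adj ? freq - diff : freq + diff;
}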
Fixes: 3626a690b717 ("i40e: use mul_u64_u64_div_u64 for PTP frequency calculation")
Cc: <stable(a)vger.kernel.org> # 6.1
Signed-off-by: Yajun Deng <yajun.deng(a)linux.dev>
---
drivers/net/ethernet/intel/i40e/i40e_ptp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index ffea0c9c82f1..97a9efe7b713 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -361,9 +361,9 @@ static int i40e_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
1000000ULL << 16);
if (neg_adj)
- adj = I40E_PTP_40GB_INCVAL - diff;
+ adj = freq - diff;
else
- adj = I40E_PTP_40GB_INCVAL + diff;
+ adj = freq + diff;
wr32(hw, I40E_PRTTSYN_INC_L, adj & 0xFFFFFFFF);
wr32(hw, I40E_PRTTSYN_INC_H, adj >> 32);
--
2.25.1
Hi,
I noticed a regression report on Bugzilla [1]. Quoting from it:
> Description:
> When booting into Linux 6.4.4, system no longer recognizes touchpad input (confirmed with xinput). On the lts release, 6.1.39, the input is still recognized.
>
> Additional info:
> * package version(s): Linux 6.4.4, 6.1.39
> * Device: ELAN1206:00 04F3:30F1 Touchpad
>
> Steps to reproduce:
> - Install 6.4.4 with Elan Touchpad 1206
> - Reboot
>
> The issue might be related to bisected commit id: 7b63a88bb62ba2ddf5fcd956be85fe46624628b9
> This is the only recent commit related to Elantech drivers I've noticed that may have broken the input.
See Bugzilla for the full thread [1].
To the reporter (Verot): Can you attach dmesg and lspci output?
Anyway, I'm adding this regression to be tracked by regzbot:
#regzbot introduced: 7b63a88bb62ba2 https://bugzilla.kernel.org/show_bug.cgi?id=217701
#regzbot title: OOB protocol access fix breaks Elan Touchpad 1206
Thanks.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217701
--
An old man doll... just what I always wanted! - Clara
From: Dave Chinner <dchinner(a)redhat.com>
[ Upstream commit 7cf2b0f9611b9971d663e1fc3206eeda3b902922 ]
Currently inodegc work can sit queued on the per-cpu queue until
the workqueue is either flushed or the queue reaches a depth that
triggers work queuing (and later throttling). This means that we
could queue work that waits for a long time for some other event to
trigger flushing.
Hence instead of just queueing work at a specific depth, use a
delayed work that queues the work at a bound time. We can still
schedule the work immediately at a given depth, but we no longer need
to worry about leaving a number of items on the list that won't get
processed until external events prevail.
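As a rough sketch of the pattern (names are illustrative, not the XFS
code): queue with zero delay once the depth threshold is hit, otherwise
with a short bound delay, so queued items can no longer sit around
indefinitely:

static void gc_queue(struct workqueue_struct *wq,
		     struct delayed_work *dwork, bool over_threshold)
{
	/* run ASAP at depth, otherwise within one jiffy */
	unsigned long delay = over_threshold ? 0 : 1;

	/*
	 * mod_delayed_work() shortens the timer if the work is already
	 * pending with a longer delay, so hitting the threshold later
	 * promotes already-queued work to run immediately.
	 */
	mod_delayed_work(wq, dwork, delay);
}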
Signed-off-by: Dave Chinner <dchinner(a)redhat.com>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik(a)gmail.com>
Acked-by: Darrick J. Wong <djwong(a)kernel.org>
---
fs/xfs/xfs_icache.c | 36 ++++++++++++++++++++++--------------
fs/xfs/xfs_mount.h | 2 +-
fs/xfs/xfs_super.c | 2 +-
3 files changed, 24 insertions(+), 16 deletions(-)
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 5e44d7bbd8fc..2c3ef553f5ef 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -458,7 +458,7 @@ xfs_inodegc_queue_all(
for_each_online_cpu(cpu) {
gc = per_cpu_ptr(mp->m_inodegc, cpu);
if (!llist_empty(&gc->list))
- queue_work_on(cpu, mp->m_inodegc_wq, &gc->work);
+ mod_delayed_work_on(cpu, mp->m_inodegc_wq, &gc->work, 0);
}
}
@@ -1851,8 +1851,8 @@ void
xfs_inodegc_worker(
struct work_struct *work)
{
- struct xfs_inodegc *gc = container_of(work, struct xfs_inodegc,
- work);
+ struct xfs_inodegc *gc = container_of(to_delayed_work(work),
+ struct xfs_inodegc, work);
struct llist_node *node = llist_del_all(&gc->list);
struct xfs_inode *ip, *n;
@@ -2021,6 +2021,7 @@ xfs_inodegc_queue(
struct xfs_inodegc *gc;
int items;
unsigned int shrinker_hits;
+ unsigned long queue_delay = 1;
trace_xfs_inode_set_need_inactive(ip);
spin_lock(&ip->i_flags_lock);
@@ -2032,19 +2033,26 @@ xfs_inodegc_queue(
items = READ_ONCE(gc->items);
WRITE_ONCE(gc->items, items + 1);
shrinker_hits = READ_ONCE(gc->shrinker_hits);
- put_cpu_ptr(gc);
- if (!xfs_is_inodegc_enabled(mp))
+ /*
+ * We queue the work while holding the current CPU so that the work
+ * is scheduled to run on this CPU.
+ */
+ if (!xfs_is_inodegc_enabled(mp)) {
+ put_cpu_ptr(gc);
return;
-
- if (xfs_inodegc_want_queue_work(ip, items)) {
- trace_xfs_inodegc_queue(mp, __return_address);
- queue_work(mp->m_inodegc_wq, &gc->work);
}
+ if (xfs_inodegc_want_queue_work(ip, items))
+ queue_delay = 0;
+
+ trace_xfs_inodegc_queue(mp, __return_address);
+ mod_delayed_work(mp->m_inodegc_wq, &gc->work, queue_delay);
+ put_cpu_ptr(gc);
+
if (xfs_inodegc_want_flush_work(ip, items, shrinker_hits)) {
trace_xfs_inodegc_throttle(mp, __return_address);
- flush_work(&gc->work);
+ flush_delayed_work(&gc->work);
}
}
@@ -2061,7 +2069,7 @@ xfs_inodegc_cpu_dead(
unsigned int count = 0;
dead_gc = per_cpu_ptr(mp->m_inodegc, dead_cpu);
- cancel_work_sync(&dead_gc->work);
+ cancel_delayed_work_sync(&dead_gc->work);
if (llist_empty(&dead_gc->list))
return;
@@ -2080,12 +2088,12 @@ xfs_inodegc_cpu_dead(
llist_add_batch(first, last, &gc->list);
count += READ_ONCE(gc->items);
WRITE_ONCE(gc->items, count);
- put_cpu_ptr(gc);
if (xfs_is_inodegc_enabled(mp)) {
trace_xfs_inodegc_queue(mp, __return_address);
- queue_work(mp->m_inodegc_wq, &gc->work);
+ mod_delayed_work(mp->m_inodegc_wq, &gc->work, 0);
}
+ put_cpu_ptr(gc);
}
/*
@@ -2180,7 +2188,7 @@ xfs_inodegc_shrinker_scan(
unsigned int h = READ_ONCE(gc->shrinker_hits);
WRITE_ONCE(gc->shrinker_hits, h + 1);
- queue_work_on(cpu, mp->m_inodegc_wq, &gc->work);
+ mod_delayed_work_on(cpu, mp->m_inodegc_wq, &gc->work, 0);
no_items = false;
}
}
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 86564295fce6..3d58938a6f75 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -61,7 +61,7 @@ struct xfs_error_cfg {
*/
struct xfs_inodegc {
struct llist_head list;
- struct work_struct work;
+ struct delayed_work work;
/* approximate count of inodes in the list */
unsigned int items;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index df1d6be61bfa..8fe6ca9208de 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1061,7 +1061,7 @@ xfs_inodegc_init_percpu(
gc = per_cpu_ptr(mp->m_inodegc, cpu);
init_llist_head(&gc->list);
gc->items = 0;
- INIT_WORK(&gc->work, xfs_inodegc_worker);
+ INIT_DELAYED_WORK(&gc->work, xfs_inodegc_worker);
}
return 0;
}
--
2.42.0.515.g380fc7ccd1-goog
The patch titled
Subject: mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-mempolicy-keep-vma-walk-if-both-mpol_mf_strict-and-mpol_mf_move-are-specified.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Yang Shi <yang(a)os.amperecomputing.com>
Subject: mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
Date: Wed, 20 Sep 2023 15:32:42 -0700
When calling mbind() with MPOL_MF_{MOVE|MOVEALL} | MPOL_MF_STRICT, the
kernel should attempt to migrate all existing pages and return -EIO if
there is a misplaced or unmovable page. Then commit 6f4576e3687b
("mempolicy: apply page table walker on queue_pages_range()") messed up
the return value and didn't break the VMA scan early anymore when
MPOL_MF_STRICT alone was specified. The return value problem was fixed
by commit a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when
MPOL_MF_STRICT is specified"), but that broke out of the VMA walk early
when an unmovable page was met, which may cause some pages not to be
migrated as expected.
The code should conceptually do:

if (MPOL_MF_MOVE|MOVEALL)
        scan all vmas
        try to migrate the existing pages
        return success
else if (MPOL_MF_MOVE* | MPOL_MF_STRICT)
        scan all vmas
        try to migrate the existing pages
        return -EIO if unmovable or migration failed
else /* MPOL_MF_STRICT alone */
        break early if meets unmovable and don't call mbind_range() at all
else /* none of those flags */
        check the ranges in test_walk, EFAULT without mbind_range() if discontig.
Fixed the behavior.
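To make the expected semantics concrete, a minimal userspace sketch
(illustrative only, not part of the patch): with MPOL_MF_MOVE |
MPOL_MF_STRICT the whole range should be scanned and migrated before
-EIO is reported for any unmovable page:

#include <errno.h>
#include <numaif.h>	/* mbind(); link with -lnuma */
#include <stdio.h>

static int bind_move_strict(void *addr, unsigned long len,
			    const unsigned long *nodemask,
			    unsigned long maxnode)
{
	if (mbind(addr, len, MPOL_BIND, nodemask, maxnode,
		  MPOL_MF_MOVE | MPOL_MF_STRICT) == -1) {
		/*
		 * EIO: some page was misplaced or unmovable; after this
		 * fix, movable pages in the range have still been
		 * migrated rather than being skipped.
		 */
		perror("mbind");
		return -errno;
	}
	return 0;
}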
Link: https://lkml.kernel.org/r/20230920223242.3425775-1-yang@os.amperecomputing.…
Fixes: a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Yang Shi <yang(a)os.amperecomputing.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Rafael Aquini <aquini(a)redhat.com>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: David Rientjes <rientjes(a)google.com>
Cc: <stable(a)vger.kernel.org> [4.9+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 39 +++++++++++++++++++--------------------
1 file changed, 19 insertions(+), 20 deletions(-)
--- a/mm/mempolicy.c~mm-mempolicy-keep-vma-walk-if-both-mpol_mf_strict-and-mpol_mf_move-are-specified
+++ a/mm/mempolicy.c
@@ -426,6 +426,7 @@ struct queue_pages {
unsigned long start;
unsigned long end;
struct vm_area_struct *first;
+ bool has_unmovable;
};
/*
@@ -446,9 +447,8 @@ static inline bool queue_folio_required(
/*
* queue_folios_pmd() has three possible return values:
* 0 - folios are placed on the right node or queued successfully, or
- * special page is met, i.e. huge zero page.
- * 1 - there is unmovable folio, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
- * specified.
+ * special page is met, i.e. zero page, or unmovable page is found
+ * but continue walking (indicated by queue_pages.has_unmovable).
* -EIO - is migration entry or only MPOL_MF_STRICT was specified and an
* existing folio was already on a node that does not follow the
* policy.
@@ -479,7 +479,7 @@ static int queue_folios_pmd(pmd_t *pmd,
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
if (!vma_migratable(walk->vma) ||
migrate_folio_add(folio, qp->pagelist, flags)) {
- ret = 1;
+ qp->has_unmovable = true;
goto unlock;
}
} else
@@ -495,9 +495,8 @@ unlock:
*
* queue_folios_pte_range() has three possible return values:
* 0 - folios are placed on the right node or queued successfully, or
- * special page is met, i.e. zero page.
- * 1 - there is unmovable folio, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
- * specified.
+ * special page is met, i.e. zero page, or unmovable page is found
+ * but continue walking (indicated by queue_pages.has_unmovable).
* -EIO - only MPOL_MF_STRICT was specified and an existing folio was already
* on a node that does not follow the policy.
*/
@@ -508,7 +507,6 @@ static int queue_folios_pte_range(pmd_t
struct folio *folio;
struct queue_pages *qp = walk->private;
unsigned long flags = qp->flags;
- bool has_unmovable = false;
pte_t *pte, *mapped_pte;
pte_t ptent;
spinlock_t *ptl;
@@ -538,11 +536,12 @@ static int queue_folios_pte_range(pmd_t
if (!queue_folio_required(folio, qp))
continue;
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
- /* MPOL_MF_STRICT must be specified if we get here */
- if (!vma_migratable(vma)) {
- has_unmovable = true;
- break;
- }
+ /*
+ * MPOL_MF_STRICT must be specified if we get here.
+ * Continue walking vmas due to MPOL_MF_MOVE* flags.
+ */
+ if (!vma_migratable(vma))
+ qp->has_unmovable = true;
/*
* Do not abort immediately since there may be
@@ -550,16 +549,13 @@ static int queue_folios_pte_range(pmd_t
* need migrate other LRU pages.
*/
if (migrate_folio_add(folio, qp->pagelist, flags))
- has_unmovable = true;
+ qp->has_unmovable = true;
} else
break;
}
pte_unmap_unlock(mapped_pte, ptl);
cond_resched();
- if (has_unmovable)
- return 1;
-
return addr != end ? -EIO : 0;
}
@@ -599,7 +595,7 @@ static int queue_folios_hugetlb(pte_t *p
* Detecting misplaced folio but allow migrating folios which
* have been queued.
*/
- ret = 1;
+ qp->has_unmovable = true;
goto unlock;
}
@@ -620,7 +616,7 @@ static int queue_folios_hugetlb(pte_t *p
* Failed to isolate folio but allow migrating pages
* which have been queued.
*/
- ret = 1;
+ qp->has_unmovable = true;
}
unlock:
spin_unlock(ptl);
@@ -756,12 +752,15 @@ queue_pages_range(struct mm_struct *mm,
.start = start,
.end = end,
.first = NULL,
+ .has_unmovable = false,
};
const struct mm_walk_ops *ops = lock_vma ?
&queue_pages_lock_vma_walk_ops : &queue_pages_walk_ops;
err = walk_page_range(mm, start, end, ops, &qp);
+ if (qp.has_unmovable)
+ err = 1;
if (!qp.first)
/* whole range in hole */
err = -EFAULT;
@@ -1361,7 +1360,7 @@ static long do_mbind(unsigned long start
putback_movable_pages(&pagelist);
}
- if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
+ if (((ret > 0) || nr_failed) && (flags & MPOL_MF_STRICT))
err = -EIO;
} else {
up_out:
_
Patches currently in -mm which might be from yang(a)os.amperecomputing.com are
mm-mempolicy-keep-vma-walk-if-both-mpol_mf_strict-and-mpol_mf_move-are-specified.patch
From: Song Shuai <suagrfillet(a)gmail.com>
pt_level uses CONFIG_PGTABLE_LEVELS to display page table names, but if
the page table mode is downgraded from the kernel cmdline or restricted
by the hardware in 64BIT, it will give the wrong name.
For example, using no4lvl for sv39, ptdump named the 1G mapping "PUD"
when it should be "PGD":
0xffffffd840000000-0xffffffd900000000 0x00000000c0000000 3G PUD D A G . . W R V
So select "P4D/PUD" or "PGD" via pgtable_l5/4_enabled to correct it.
Fixes: e8a62cc26ddf ("riscv: Implement sv48 support")
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Signed-off-by: Song Shuai <suagrfillet(a)gmail.com>
Link: https://lore.kernel.org/r/20230712115740.943324-1-suagrfillet@gmail.com
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230830044129.11481-3-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
---
arch/riscv/mm/ptdump.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index 20a9f991a6d7..e9090b38f811 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -384,6 +384,9 @@ static int __init ptdump_init(void)
kernel_ptd_info.base_addr = KERN_VIRT_START;
+ pg_level[1].name = pgtable_l5_enabled ? "P4D" : "PGD";
+ pg_level[2].name = pgtable_l4_enabled ? "PUD" : "PGD";
+
for (i = 0; i < ARRAY_SIZE(pg_level); i++)
for (j = 0; j < ARRAY_SIZE(pte_bits); j++)
pg_level[i].mask |= pte_bits[j].mask;
--
2.42.0
The patch titled
Subject: mm, memcg: reconsider kmem.limit_in_bytes deprecation
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-memcg-reconsider-kmemlimit_in_bytes-deprecation.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Michal Hocko <mhocko(a)suse.com>
Subject: mm, memcg: reconsider kmem.limit_in_bytes deprecation
Date: Thu, 21 Sep 2023 09:38:29 +0200
This reverts commit 86327e8eb94c ("memcg: drop kmem.limit_in_bytes") and
partially reverts 58056f77502f ("memcg, kmem: further deprecate
kmem.limit_in_bytes"), which have incrementally removed support for the
kernel memory accounting hard limit. Unfortunately, it has turned out
that there is still userspace depending on the existence of
memory.kmem.limit_in_bytes [1]. The underlying functionality is not
really required, but the non-existent file confuses userspace, which
then fails as a result. A patch to fix this on the userspace side has
been submitted, but it is hard to predict how it will propagate through
the maze of 3rd-party consumers of the software.
Now, reverting 86327e8eb94c alone is not an option, because there is
another set of userspace which cannot cope with ENOTSUPP returned when
writing to the file. Therefore we have to revisit 58056f77502f as well.
There are two ways to go ahead: either we give up on the deprecation and
fully revert 58056f77502f too, or we keep kmem.limit_in_bytes but make
the write a noop and warn about it. The latter should work for both
known breaking workloads, which depend on the file's existence but not
on the hard limit enforcement.
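For illustration, a minimal userspace sketch (the cgroup mount point
and group name are assumptions) of the behavior this patch restores:
the write is accepted but is a documented noop:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* hypothetical cgroup v1 hierarchy path */
	int fd = open("/sys/fs/cgroup/memory/test/memory.kmem.limit_in_bytes",
		      O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* succeeds after this patch, but has no effect on accounting */
	if (write(fd, "104857600", 9) != 9)
		perror("write");
	close(fd);
	return 0;
}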
Note to backporters to stable trees: a8c49af3be5f ("memcg: add per-memcg
total kernel memory stat"), introduced in 5.18, added memcg_account_kmem,
so the accounting is not done by obj_cgroup_charge_pages directly for v1
anymore. Prior kernels need to add it explicitly (thanks to Johannes for
pointing this out).
Link: http://lkml.kernel.org/r/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1e… [1]
Link: https://lkml.kernel.org/r/ZRE5VJozPZt9bRPy@dhcp22.suse.cz
Fixes: 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
Fixes: 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes")
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Jeremi Piotrowski <jpiotrowski(a)linux.microsoft.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Tejun heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/admin-guide/cgroup-v1/memory.rst | 7 +++++++
mm/memcontrol.c | 14 ++++++++++++++
2 files changed, 21 insertions(+)
--- a/Documentation/admin-guide/cgroup-v1/memory.rst~mm-memcg-reconsider-kmemlimit_in_bytes-deprecation
+++ a/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -92,6 +92,13 @@ Brief summary of control files.
memory.oom_control set/show oom controls.
memory.numa_stat show the number of memory usage per numa
node
+ memory.kmem.limit_in_bytes Deprecated knob to set and read the kernel
+ memory hard limit. Kernel hard limit is not
+ supported since 5.16. Writing any value to
+ this file has no effect, same as if the
+ nokmem kernel parameter was specified.
+ Kernel memory is still charged and reported
+ by memory.kmem.usage_in_bytes.
memory.kmem.usage_in_bytes show current kernel memory allocation
memory.kmem.failcnt show the number of kernel memory usage
hits limits
--- a/mm/memcontrol.c~mm-memcg-reconsider-kmemlimit_in_bytes-deprecation
+++ a/mm/memcontrol.c
@@ -3097,6 +3097,7 @@ static void obj_cgroup_uncharge_pages(st
static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
unsigned int nr_pages)
{
+ struct page_counter *counter;
struct mem_cgroup *memcg;
int ret;
@@ -3867,6 +3868,13 @@ static ssize_t mem_cgroup_write(struct k
case _MEMSWAP:
ret = mem_cgroup_resize_max(memcg, nr_pages, true);
break;
+ case _KMEM:
+ pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. "
+ "Writing any value to this file has no effect. "
+ "Please report your usecase to linux-mm(a)kvack.org if you "
+ "depend on this functionality.\n");
+ ret = 0;
+ break;
case _TCP:
ret = memcg_update_tcp_max(memcg, nr_pages);
break;
@@ -5078,6 +5086,12 @@ static struct cftype mem_cgroup_legacy_f
},
#endif
{
+ .name = "kmem.limit_in_bytes",
+ .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
+ .write = mem_cgroup_write,
+ .read_u64 = mem_cgroup_read_u64,
+ },
+ {
.name = "kmem.usage_in_bytes",
.private = MEMFILE_PRIVATE(_KMEM, RES_USAGE),
.read_u64 = mem_cgroup_read_u64,
_
Patches currently in -mm which might be from mhocko(a)suse.com are
mm-memcg-reconsider-kmemlimit_in_bytes-deprecation.patch
Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the
IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was
introduced in iommu_dma_init_domain() to fall back if not supported, but
this check runs too late: by that point, devices have been attached to
the IOMMU, and apple-dart's attach_dev() callback does not expect
IOMMU_DOMAIN_DMA_FQ domains.
Change the logic so the IOMMU_DOMAIN_DMA codepath is the default,
instead of explicitly enumerating all types.
Fixes an apple-dart regression in v6.5.
Cc: regressions(a)lists.linux.dev
Cc: stable(a)vger.kernel.org
Suggested-by: Robin Murphy <robin.murphy(a)arm.com>
Fixes: a4fdd9762272 ("iommu: Use flush queue capability")
Signed-off-by: Hector Martin <marcan(a)marcan.st>
---
Changes in v2:
- Fixed the issue in apple-dart instead of the iommu core, per Robin's
suggestion.
- Link to v1: https://lore.kernel.org/r/20230922-iommu-type-regression-v1-1-1ed3825b2c38@…
---
drivers/iommu/apple-dart.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 2082081402d3..0b8927508427 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -671,8 +671,7 @@ static int apple_dart_attach_dev(struct iommu_domain *domain,
return ret;
switch (domain->type) {
- case IOMMU_DOMAIN_DMA:
- case IOMMU_DOMAIN_UNMANAGED:
+ default:
ret = apple_dart_domain_add_streams(dart_domain, cfg);
if (ret)
return ret;
---
base-commit: ce9ecca0238b140b88f43859b211c9fdfd8e5b70
change-id: 20230922-iommu-type-regression-25b4f43df770
Best regards,
--
Hector Martin <marcan(a)marcan.st>
During SCM probe, to identify the SCM convention, an scm call is made
with SMC_CONVENTION_ARM_64 followed by SMC_CONVENTION_ARM_32, and the
result decides which convention is to be used.
IPQ chipsets starting from IPQ807x support both 32-bit and 64-bit kernel
variants, but the TZ firmware runs in 64-bit mode. When running a 32-bit
kernel, the scm call made with SMC_CONVENTION_ARM_64 causes a system
crash due to the difference between the ARM and AARCH64 register sets,
which are accessed by the TZ.
To avoid this, use SMC_CONVENTION_ARM_64 only on ARM64 builds.
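Conceptually, the probe order becomes the following (a simplified
sketch; scm_probe_succeeds() is an illustrative stand-in for the real
__scm_smc_call() probe):

static enum qcom_scm_convention probe_convention(void)
{
#if IS_ENABLED(CONFIG_ARM64)
	/* only a 64-bit kernel may safely probe the 64-bit convention */
	if (scm_probe_succeeds(SMC_CONVENTION_ARM_64))
		return SMC_CONVENTION_ARM_64;
#endif
	return SMC_CONVENTION_ARM_32;
}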
Cc: stable(a)vger.kernel.org
Fixes: 9a434cee773a ("firmware: qcom_scm: Dynamically support SMCCC and legacy conventions")
Signed-off-by: Kathiravan T <quic_kathirav(a)quicinc.com>
---
Changes in V2:
- Added the Fixes tag and cc'd stable mailing list
drivers/firmware/qcom_scm.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
index fde33acd46b7..db6754db48a0 100644
--- a/drivers/firmware/qcom_scm.c
+++ b/drivers/firmware/qcom_scm.c
@@ -171,6 +171,7 @@ static enum qcom_scm_convention __get_convention(void)
if (likely(qcom_scm_convention != SMC_CONVENTION_UNKNOWN))
return qcom_scm_convention;
+#if IS_ENABLED(CONFIG_ARM64)
/*
* Device isn't required as there is only one argument - no device
* needed to dma_map_single to secure world
@@ -191,6 +192,7 @@ static enum qcom_scm_convention __get_convention(void)
forced = true;
goto found;
}
+#endif
probed_convention = SMC_CONVENTION_ARM_32;
ret = __scm_smc_call(NULL, &desc, probed_convention, &res, true);
--
2.17.1
From: Alisa-Dariana Roman <alisa.roman(a)analog.com>
The avdd supply and the reference voltage are two different sources, but
the reference voltage was assigned according to the avdd supply.
Add a vref regulator structure and set the reference voltage according
to the vref supply from the devicetree.
In case the vref supply is missing, the reference voltage is set
according to the avdd supply for compatibility with old devicetrees.
Fixes: b581f748cce0 ("staging: iio: adc: ad7192: move out of staging")
Signed-off-by: Alisa-Dariana Roman <alisa.roman(a)analog.com>
Cc: stable(a)vger.kernel.org
---
v1 -> v2
- use dev_err_probe()
Link: https://lore.kernel.org/lkml/20230923225827.75681-1-alisadariana@gmail.com/
drivers/iio/adc/ad7192.c | 29 +++++++++++++++++++++++++----
1 file changed, 25 insertions(+), 4 deletions(-)
diff --git a/drivers/iio/adc/ad7192.c b/drivers/iio/adc/ad7192.c
index 69d1103b9508..b64fd365f83f 100644
--- a/drivers/iio/adc/ad7192.c
+++ b/drivers/iio/adc/ad7192.c
@@ -177,6 +177,7 @@ struct ad7192_chip_info {
struct ad7192_state {
const struct ad7192_chip_info *chip_info;
struct regulator *avdd;
+ struct regulator *vref;
struct clk *mclk;
u16 int_vref_mv;
u32 fclk;
@@ -1008,10 +1009,30 @@ static int ad7192_probe(struct spi_device *spi)
if (ret)
return dev_err_probe(&spi->dev, ret, "Failed to enable specified DVdd supply\n");
- ret = regulator_get_voltage(st->avdd);
- if (ret < 0) {
- dev_err(&spi->dev, "Device tree error, reference voltage undefined\n");
- return ret;
+ st->vref = devm_regulator_get_optional(&spi->dev, "vref");
+ if (IS_ERR(st->vref)) {
+ if (PTR_ERR(st->vref) != -ENODEV)
+ return PTR_ERR(st->vref);
+
+ ret = regulator_get_voltage(st->avdd);
+ if (ret < 0)
+ return dev_err_probe(&spi->dev, ret,
+ "Device tree error, AVdd voltage undefined\n");
+ } else {
+ ret = regulator_enable(st->vref);
+ if (ret) {
+ dev_err(&spi->dev, "Failed to enable specified Vref supply\n");
+ return ret;
+ }
+
+ ret = devm_add_action_or_reset(&spi->dev, ad7192_reg_disable, st->vref);
+ if (ret)
+ return ret;
+
+ ret = regulator_get_voltage(st->vref);
+ if (ret < 0)
+ return dev_err_probe(&spi->dev, ret,
+ "Device tree error, Vref voltage undefined\n");
}
st->int_vref_mv = ret / 1000;
--
2.34.1
From: Alisa-Dariana Roman <alisadariana(a)gmail.com>
The avdd supply and the reference voltage are two different sources, but
the reference voltage was assigned according to the avdd supply.
Add a vref regulator structure and set the reference voltage according
to the vref supply from the devicetree.
In case the vref supply is missing, the reference voltage is set
according to the avdd supply for compatibility with old devicetrees.
Fixes: b581f748cce0 ("staging: iio: adc: ad7192: move out of staging")
Signed-off-by: Alisa-Dariana Roman <alisa.roman(a)analog.com>
Cc: stable(a)vger.kernel.org
---
drivers/iio/adc/ad7192.c | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)
diff --git a/drivers/iio/adc/ad7192.c b/drivers/iio/adc/ad7192.c
index 69d1103b9508..c414fed60dd3 100644
--- a/drivers/iio/adc/ad7192.c
+++ b/drivers/iio/adc/ad7192.c
@@ -177,6 +177,7 @@ struct ad7192_chip_info {
struct ad7192_state {
const struct ad7192_chip_info *chip_info;
struct regulator *avdd;
+ struct regulator *vref;
struct clk *mclk;
u16 int_vref_mv;
u32 fclk;
@@ -1008,10 +1009,32 @@ static int ad7192_probe(struct spi_device *spi)
if (ret)
return dev_err_probe(&spi->dev, ret, "Failed to enable specified DVdd supply\n");
- ret = regulator_get_voltage(st->avdd);
- if (ret < 0) {
- dev_err(&spi->dev, "Device tree error, reference voltage undefined\n");
- return ret;
+ st->vref = devm_regulator_get_optional(&spi->dev, "vref");
+ if (IS_ERR(st->vref)) {
+ if (PTR_ERR(st->vref) != -ENODEV)
+ return PTR_ERR(st->vref);
+
+ ret = regulator_get_voltage(st->avdd);
+ if (ret < 0) {
+ dev_err(&spi->dev, "Device tree error, AVdd voltage undefined\n");
+ return ret;
+ }
+ } else {
+ ret = regulator_enable(st->vref);
+ if (ret) {
+ dev_err(&spi->dev, "Failed to enable specified Vref supply\n");
+ return ret;
+ }
+
+ ret = devm_add_action_or_reset(&spi->dev, ad7192_reg_disable, st->vref);
+ if (ret)
+ return ret;
+
+ ret = regulator_get_voltage(st->vref);
+ if (ret < 0) {
+ dev_err(&spi->dev, "Device tree error, Vref voltage undefined\n");
+ return ret;
+ }
}
st->int_vref_mv = ret / 1000;
--
2.34.1
Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the
IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was
introduced in iommu_dma_init_domain() to fall back if not supported, but
this check runs too late: by that point, devices have been attached to
the IOMMU, and the IOMMU driver might not expect FQ domains at
ops->attach_dev() time.
Ensure that we immediately clamp FQ domains to plain DMA if not
supported by the driver at device attach time, not later.
This regressed apple-dart in v6.5.
Cc: regressions(a)lists.linux.dev
Cc: stable(a)vger.kernel.org
Fixes: a4fdd9762272 ("iommu: Use flush queue capability")
Signed-off-by: Hector Martin <marcan(a)marcan.st>
---
drivers/iommu/iommu.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3bfc56df4f78..12464eaa8d91 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2039,6 +2039,15 @@ static int __iommu_attach_device(struct iommu_domain *domain,
if (unlikely(domain->ops->attach_dev == NULL))
return -ENODEV;
+ /*
+ * Ensure we do not try to attach devices to FQ domains if the
+ * IOMMU does not support them. We can safely fall back to
+ * non-FQ.
+ */
+ if (domain->type == IOMMU_DOMAIN_DMA_FQ &&
+ !device_iommu_capable(dev, IOMMU_CAP_DEFERRED_FLUSH))
+ domain->type = IOMMU_DOMAIN_DMA;
+
ret = domain->ops->attach_dev(domain, dev);
if (ret)
return ret;
---
base-commit: ce9ecca0238b140b88f43859b211c9fdfd8e5b70
change-id: 20230922-iommu-type-regression-25b4f43df770
Best regards,
--
Hector Martin <marcan(a)marcan.st>
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 7fda67e8c3ab6069f75888f67958a6d30454a9f6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092055-disband-unveiling-f6cc@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
7fda67e8c3ab ("ext4: fix rec_len verify error")
46c116b920eb ("ext4: verify dir block before splitting it")
f036adb39976 ("ext4: rename "dirent_csum" functions to use "dirblock"")
b886ee3e778e ("ext4: Support case-insensitive file name lookups")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7fda67e8c3ab6069f75888f67958a6d30454a9f6 Mon Sep 17 00:00:00 2001
From: Shida Zhang <zhangshida(a)kylinos.cn>
Date: Thu, 3 Aug 2023 14:09:38 +0800
Subject: [PATCH] ext4: fix rec_len verify error
With the configuration PAGE_SIZE 64k and filesystem blocksize 64k,
a problem occurred when more than 13 million files were directly created
under a directory:
EXT4-fs error (device xx): ext4_dx_csum_set:492: inode #xxxx: comm xxxxx: dir seems corrupt? Run e2fsck -D.
EXT4-fs error (device xx): ext4_dx_csum_verify:463: inode #xxxx: comm xxxxx: dir seems corrupt? Run e2fsck -D.
EXT4-fs error (device xx): dx_probe:856: inode #xxxx: block 8188: comm xxxxx: Directory index failed checksum
When enough files are created, fake_dirent->rec_len will be 0xffff,
which does not equal the blocksize 65536, i.e. 0x10000.
It is a different condition when the blocksize is 4k: when enough files
are created, fake_dirent->rec_len will be 0x1000, which equals the
blocksize 4k, i.e. 0x1000.
The problem seems to be related to the limitation of the 16-bit field
when the blocksize is set to 64k.
To address this, helpers like ext4_rec_len_{from,to}_disk have already
been introduced to convert between the encoded and the plain form of
rec_len. So fix this one by using the helper, and all the others in
this file too.
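For reference, a simplified paraphrase of the decode helper (the real
one lives in fs/ext4/ext4.h; details are abridged here): because
rec_len is a 16-bit on-disk field, a 64KiB record cannot be stored
verbatim and has to be encoded:

static inline unsigned int rec_len_from_disk(__le16 dlen,
					     unsigned blocksize)
{
	unsigned len = le16_to_cpu(dlen);

	if (blocksize < 65536)
		return len;		/* fits in 16 bits as-is */
	if (len == 65535 || len == 0)	/* EXT4_MAX_REC_LEN or unset */
		return blocksize;
	/* the low 2 bits carry bits 16-17 of the real length */
	return (len & 65532) | ((len & 3) << 16);
}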
Cc: stable(a)kernel.org
Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes")
Suggested-by: Andreas Dilger <adilger(a)dilger.ca>
Suggested-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Shida Zhang <zhangshida(a)kylinos.cn>
Reviewed-by: Andreas Dilger <adilger(a)dilger.ca>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Link: https://lore.kernel.org/r/20230803060938.1929759-1-zhangshida@kylinos.cn
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index c0f0b4e2413b..c1ceccab05f5 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -343,17 +343,17 @@ static struct ext4_dir_entry_tail *get_dirent_tail(struct inode *inode,
struct buffer_head *bh)
{
struct ext4_dir_entry_tail *t;
+ int blocksize = EXT4_BLOCK_SIZE(inode->i_sb);
#ifdef PARANOID
struct ext4_dir_entry *d, *top;
d = (struct ext4_dir_entry *)bh->b_data;
top = (struct ext4_dir_entry *)(bh->b_data +
- (EXT4_BLOCK_SIZE(inode->i_sb) -
- sizeof(struct ext4_dir_entry_tail)));
- while (d < top && d->rec_len)
+ (blocksize - sizeof(struct ext4_dir_entry_tail)));
+ while (d < top && ext4_rec_len_from_disk(d->rec_len, blocksize))
d = (struct ext4_dir_entry *)(((void *)d) +
- le16_to_cpu(d->rec_len));
+ ext4_rec_len_from_disk(d->rec_len, blocksize));
if (d != top)
return NULL;
@@ -364,7 +364,8 @@ static struct ext4_dir_entry_tail *get_dirent_tail(struct inode *inode,
#endif
if (t->det_reserved_zero1 ||
- le16_to_cpu(t->det_rec_len) != sizeof(struct ext4_dir_entry_tail) ||
+ (ext4_rec_len_from_disk(t->det_rec_len, blocksize) !=
+ sizeof(struct ext4_dir_entry_tail)) ||
t->det_reserved_zero2 ||
t->det_reserved_ft != EXT4_FT_DIR_CSUM)
return NULL;
@@ -445,13 +446,14 @@ static struct dx_countlimit *get_dx_countlimit(struct inode *inode,
struct ext4_dir_entry *dp;
struct dx_root_info *root;
int count_offset;
+ int blocksize = EXT4_BLOCK_SIZE(inode->i_sb);
+ unsigned int rlen = ext4_rec_len_from_disk(dirent->rec_len, blocksize);
- if (le16_to_cpu(dirent->rec_len) == EXT4_BLOCK_SIZE(inode->i_sb))
+ if (rlen == blocksize)
count_offset = 8;
- else if (le16_to_cpu(dirent->rec_len) == 12) {
+ else if (rlen == 12) {
dp = (struct ext4_dir_entry *)(((void *)dirent) + 12);
- if (le16_to_cpu(dp->rec_len) !=
- EXT4_BLOCK_SIZE(inode->i_sb) - 12)
+ if (ext4_rec_len_from_disk(dp->rec_len, blocksize) != blocksize - 12)
return NULL;
root = (struct dx_root_info *)(((void *)dp + 12));
if (root->reserved_zero ||
@@ -1315,6 +1317,7 @@ static int dx_make_map(struct inode *dir, struct buffer_head *bh,
unsigned int buflen = bh->b_size;
char *base = bh->b_data;
struct dx_hash_info h = *hinfo;
+ int blocksize = EXT4_BLOCK_SIZE(dir->i_sb);
if (ext4_has_metadata_csum(dir->i_sb))
buflen -= sizeof(struct ext4_dir_entry_tail);
@@ -1335,11 +1338,12 @@ static int dx_make_map(struct inode *dir, struct buffer_head *bh,
map_tail--;
map_tail->hash = h.hash;
map_tail->offs = ((char *) de - base)>>2;
- map_tail->size = le16_to_cpu(de->rec_len);
+ map_tail->size = ext4_rec_len_from_disk(de->rec_len,
+ blocksize);
count++;
cond_resched();
}
- de = ext4_next_entry(de, dir->i_sb->s_blocksize);
+ de = ext4_next_entry(de, blocksize);
}
return count;
}
From: Zheng Yejian <zhengyejian1(a)huawei.com>
The 'bytes' info in file 'per_cpu/cpu<X>/stats' means the number of
bytes in the cpu buffer that have not been consumed. However, currently,
after consuming data by reading file 'trace_pipe', the 'bytes' info
is not updated as expected.
# cat per_cpu/cpu0/stats
entries: 0
overrun: 0
commit overrun: 0
bytes: 568 <--- 'bytes' is problematical !!!
oldest event ts: 8651.371479
now ts: 8653.912224
dropped events: 0
read events: 8
The root cause is incorrect accounting of cpu_buffer->read_bytes. To fix it:
1. When accounting 'read_bytes', account the consumed event in
rb_advance_reader();
2. When accounting 'entries_bytes', exclude the discarded padding event
which is smaller than the minimum size, because it is invisible to the
reader. Then use rb_page_commit() instead of BUF_PAGE_SIZE where
accounting for page-based read/remove/overrun.
Also correct the comment of ring_buffer_bytes_cpu() in this patch.
Link: https://lore.kernel.org/linux-trace-kernel/20230921125425.1708423-1-zhengye…
Cc: stable(a)vger.kernel.org
Fixes: c64e148a3be3 ("trace: Add ring buffer stats to measure rate of events")
Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index a1651edc48d5..28daf0ce95c5 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -354,6 +354,11 @@ static void rb_init_page(struct buffer_data_page *bpage)
local_set(&bpage->commit, 0);
}
+static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage)
+{
+ return local_read(&bpage->page->commit);
+}
+
static void free_buffer_page(struct buffer_page *bpage)
{
free_page((unsigned long)bpage->page);
@@ -2003,7 +2008,7 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned long nr_pages)
* Increment overrun to account for the lost events.
*/
local_add(page_entries, &cpu_buffer->overrun);
- local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
+ local_sub(rb_page_commit(to_remove_page), &cpu_buffer->entries_bytes);
local_inc(&cpu_buffer->pages_lost);
}
@@ -2367,11 +2372,6 @@ rb_reader_event(struct ring_buffer_per_cpu *cpu_buffer)
cpu_buffer->reader_page->read);
}
-static __always_inline unsigned rb_page_commit(struct buffer_page *bpage)
-{
- return local_read(&bpage->page->commit);
-}
-
static struct ring_buffer_event *
rb_iter_head_event(struct ring_buffer_iter *iter)
{
@@ -2517,7 +2517,7 @@ rb_handle_head_page(struct ring_buffer_per_cpu *cpu_buffer,
* the counters.
*/
local_add(entries, &cpu_buffer->overrun);
- local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
+ local_sub(rb_page_commit(next_page), &cpu_buffer->entries_bytes);
local_inc(&cpu_buffer->pages_lost);
/*
@@ -2660,9 +2660,6 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
event = __rb_page_index(tail_page, tail);
- /* account for padding bytes */
- local_add(BUF_PAGE_SIZE - tail, &cpu_buffer->entries_bytes);
-
/*
* Save the original length to the meta data.
* This will be used by the reader to add lost event
@@ -2676,7 +2673,8 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
* write counter enough to allow another writer to slip
* in on this page.
* We put in a discarded commit instead, to make sure
- * that this space is not used again.
+ * that this space is not used again, and this space will
+ * not be accounted into 'entries_bytes'.
*
* If we are less than the minimum size, we don't need to
* worry about it.
@@ -2701,6 +2699,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
/* time delta must be non zero */
event->time_delta = 1;
+ /* account for padding bytes */
+ local_add(BUF_PAGE_SIZE - tail, &cpu_buffer->entries_bytes);
+
/* Make sure the padding is visible before the tail_page->write update */
smp_wmb();
@@ -4215,7 +4216,7 @@ u64 ring_buffer_oldest_event_ts(struct trace_buffer *buffer, int cpu)
EXPORT_SYMBOL_GPL(ring_buffer_oldest_event_ts);
/**
- * ring_buffer_bytes_cpu - get the number of bytes consumed in a cpu buffer
+ * ring_buffer_bytes_cpu - get the number of bytes unconsumed in a cpu buffer
* @buffer: The ring buffer
* @cpu: The per CPU buffer to read from.
*/
@@ -4723,6 +4724,7 @@ static void rb_advance_reader(struct ring_buffer_per_cpu *cpu_buffer)
length = rb_event_length(event);
cpu_buffer->reader_page->read += length;
+ cpu_buffer->read_bytes += length;
}
static void rb_advance_iter(struct ring_buffer_iter *iter)
@@ -5816,7 +5818,7 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
} else {
/* update the entry counter */
cpu_buffer->read += rb_page_entries(reader);
- cpu_buffer->read_bytes += BUF_PAGE_SIZE;
+ cpu_buffer->read_bytes += rb_page_commit(reader);
/* swap the pages */
rb_init_page(bpage);
--
2.40.1
Hi Greg, Sasha,
The following list shows patches that you can cherry-pick to -stable 6.5.
I am using original commit IDs for reference:
1) 7ab9d0827af8 ("netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention")
2) 4e5f5b47d8de ("netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC")
3) 1d16d80d4230 ("netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails")
4) 7606622f20da ("netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration")
5) 44a76f08f7ca ("netfilter: nf_tables: fix memleak when more than 255 elements expired")
Please, apply.
Thanks.
Florian Westphal (1):
netfilter: nf_tables: fix memleak when more than 255 elements expired
Pablo Neira Ayuso (4):
netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention
netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC
netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails
netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration
include/net/netfilter/nf_tables.h | 7 ++++---
net/netfilter/nf_tables_api.c | 32 ++++++++++++++++++++++++++-----
net/netfilter/nft_set_hash.c | 11 ++++-------
net/netfilter/nft_set_pipapo.c | 4 ++--
net/netfilter/nft_set_rbtree.c | 8 +++-----
5 files changed, 40 insertions(+), 22 deletions(-)
--
2.30.2
From: Ming Lei <ming.lei(a)redhat.com>
commit d36a9ea5e7766961e753ee38d4c331bbe6ef659b upstream.
For blk-mq, the queue release handler is usually called after
blk_mq_freeze_queue_wait() returns. However, the
q_usage_counter->release() handler may not have run yet at that time, so
this can cause a use-after-free.
Fix the issue by moving percpu_ref_exit() into blk_free_queue_rcu().
Since ->release() is called with the rcu read lock held, it was agreed
that the race should be covered in the caller, per the discussion in the
two links.
Backport-notes: Not a clean cherry-pick since a lot has changed,
however essentially the same fix.
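In generic terms the pattern is to defer percpu_ref_exit() to the RCU
callback that frees the object, since ->release() runs under
rcu_read_lock() (sketch with illustrative names, not the block-layer
code):

struct my_queue {
	struct percpu_ref usage_counter;
	struct rcu_head rcu_head;
};

static void my_queue_free_rcu(struct rcu_head *rcu_head)
{
	struct my_queue *q = container_of(rcu_head, struct my_queue,
					  rcu_head);

	/* a grace period has elapsed, so ->release() has finished */
	percpu_ref_exit(&q->usage_counter);
	kfree(q);
}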
Reported-by: Zhang Wensheng <zhangwensheng(a)huaweicloud.com>
Reported-by: Zhong Jinghua <zhongjinghua(a)huawei.com>
Link: https://lore.kernel.org/linux-block/Y5prfOjyyjQKUrtH@T590/T/#u
Link: https://lore.kernel.org/lkml/Y4%2FmzMd4evRg9yDi@fedora/
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Yu Kuai <yukuai3(a)huawei.com>
Cc: Dennis Zhou <dennis(a)kernel.org>
Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Link: https://lore.kernel.org/r/20221215021629.74870-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Saranya Muruganandam <saranyamohan(a)google.com>
---
block/blk-sysfs.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index a582ea0da74f..a82bdec923b2 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -737,6 +737,7 @@ static void blk_free_queue_rcu(struct rcu_head *rcu_head)
struct request_queue *q = container_of(rcu_head, struct request_queue,
rcu_head);
+ percpu_ref_exit(&q->q_usage_counter);
kmem_cache_free(blk_get_queue_kmem_cache(blk_queue_has_srcu(q)), q);
}
@@ -762,8 +763,6 @@ static void blk_release_queue(struct kobject *kobj)
might_sleep();
- percpu_ref_exit(&q->q_usage_counter);
-
if (q->poll_stat)
blk_stat_remove_callback(q, q->poll_cb);
blk_stat_free_callback(q->poll_cb);
--
2.42.0.515.g380fc7ccd1-goog
Hi Greg, Sasha,
The following list shows the backported patches. This batch targets
garbage collection (GC) / set timeout fixes that address possible UaF
and memleaks. I am using original commit IDs for reference:
1) 212ed75dc5fb ("netfilter: nf_tables: integrate pipapo into commit protocol")
2) 24138933b97b ("netfilter: nf_tables: don't skip expired elements during walk")
3) 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
4) f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
5) c92db3030492 ("netfilter: nft_set_hash: mark set element as dead when deleting from packet path")
6) a2dd0233cbc4 ("netfilter: nf_tables: remove busy mark and gc batch API")
7) 7845914f45f0 ("netfilter: nf_tables: don't fail inserts if duplicate has expired")
8) 6a33d8b73dfa ("netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path")
9) 02c6c24402bf ("netfilter: nf_tables: GC transaction race with netns dismantle")
10) 720344340fb9 ("netfilter: nf_tables: GC transaction race with abort path")
11) 8357bc946a2a ("netfilter: nf_tables: use correct lock to protect gc_list")
12) 8e51830e29e1 ("netfilter: nf_tables: defer gc run if previous batch is still pending")
13) 2ee52ae94baa ("netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction")
14) 96b33300fba8 ("netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention")
15) 6d365eabce3c ("netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails")
16) b079155faae9 ("netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration")
17) cf5000a7787c ("netfilter: nf_tables: fix memleak when more than 255 elements expired")
Please, apply.
Thanks.
Florian Westphal (4):
netfilter: nf_tables: don't skip expired elements during walk
netfilter: nf_tables: don't fail inserts if duplicate has expired
netfilter: nf_tables: defer gc run if previous batch is still pending
netfilter: nf_tables: fix memleak when more than 255 elements expired
Pablo Neira Ayuso (13):
netfilter: nf_tables: integrate pipapo into commit protocol
netfilter: nf_tables: GC transaction API to avoid race with control plane
netfilter: nf_tables: adapt set backend to use GC transaction API
netfilter: nft_set_hash: mark set element as dead when deleting from packet path
netfilter: nf_tables: remove busy mark and gc batch API
netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path
netfilter: nf_tables: GC transaction race with netns dismantle
netfilter: nf_tables: GC transaction race with abort path
netfilter: nf_tables: use correct lock to protect gc_list
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention
netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails
netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration
include/net/netfilter/nf_tables.h | 125 +++++------
net/netfilter/nf_tables_api.c | 341 +++++++++++++++++++++++++++---
net/netfilter/nft_set_hash.c | 87 +++++---
net/netfilter/nft_set_pipapo.c | 115 ++++++----
net/netfilter/nft_set_rbtree.c | 157 ++++++++------
5 files changed, 589 insertions(+), 236 deletions(-)
--
2.30.2
The patch titled
Subject: arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
arm64-hugetlb-fix-set_huge_pte_at-to-work-with-all-swap-entries.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
Date: Fri, 22 Sep 2023 12:58:04 +0100
When called with a swap entry that does not embed a PFN (e.g.
PTE_MARKER_POISONED or PTE_MARKER_UFFD_WP), the previous implementation of
set_huge_pte_at() would either cause a BUG() to fire (if CONFIG_DEBUG_VM
is enabled) or cause a dereference of an invalid address and subsequent
panic.
arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries.
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written.
It previously did this by grabbing the folio out of the pte and querying
its size.
However, there are cases when the pte being set is actually a swap entry.
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries. And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.
But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN. And this causes the code to go
bang. The triggering case is for the uffd poison test, commit
99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, courtesy of commit
8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7. Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.
Arguably, the root cause is really due to commit 18f3962953e4 ("mm:
hugetlb: kill set_huge_swap_pte_at()"), which aimed to simplify the
interface to the core code by removing set_huge_swap_pte_at() (which took
a page size parameter) and replacing it with calls to set_huge_pte_at()
where the size was inferred from the folio, as described above. While that
commit didn't break anything at the time, it did break the interface
because it couldn't handle swap entries without PFNs. And since then new
callers have come along which rely on this working. But given the
brokenness is only observable after commit 8a13897fb0da ("mm: userfaultfd:
support UFFDIO_POISON for hugetlbfs"), that one gets the Fixes tag.
Now that we have modified the set_huge_pte_at() interface to pass the huge
page size in the previous patch, we can trivially fix this issue.
Link: https://lkml.kernel.org/r/20230922115804.2043771-3-ryan.roberts@arm.com
Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev(a)linux.ibm.com>
Cc: Alexandre Ghiti <alex(a)ghiti.fr>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Helge Deller <deller(a)gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley(a)HansenPartnership.com>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Sven Schnelle <svens(a)linux.ibm.com>
Cc: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/arm64/mm/hugetlbpage.c | 17 +++--------------
1 file changed, 3 insertions(+), 14 deletions(-)
--- a/arch/arm64/mm/hugetlbpage.c~arm64-hugetlb-fix-set_huge_pte_at-to-work-with-all-swap-entries
+++ a/arch/arm64/mm/hugetlbpage.c
@@ -241,13 +241,6 @@ static void clear_flush(struct mm_struct
flush_tlb_range(&vma, saddr, addr);
}
-static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
-{
- VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));
-
- return page_folio(pfn_to_page(swp_offset_pfn(entry)));
-}
-
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned long sz)
{
@@ -257,13 +250,10 @@ void set_huge_pte_at(struct mm_struct *m
unsigned long pfn, dpfn;
pgprot_t hugeprot;
- if (!pte_present(pte)) {
- struct folio *folio;
-
- folio = hugetlb_swap_entry_to_folio(pte_to_swp_entry(pte));
- ncontig = num_contig_ptes(folio_size(folio), &pgsize);
+ ncontig = num_contig_ptes(sz, &pgsize);
- for (i = 0; i < ncontig; i++, ptep++)
+ if (!pte_present(pte)) {
+ for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
set_pte_at(mm, addr, ptep, pte);
return;
}
@@ -273,7 +263,6 @@ void set_huge_pte_at(struct mm_struct *m
return;
}
- ncontig = find_num_contig(mm, addr, ptep, &pgsize);
pfn = pte_pfn(pte);
dpfn = pgsize >> PAGE_SHIFT;
hugeprot = pte_pgprot(pte);
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
mm-hugetlb-add-huge-page-size-param-to-set_huge_pte_at.patch
arm64-hugetlb-fix-set_huge_pte_at-to-work-with-all-swap-entries.patch
Hi Greg, Sasha,
The following list shows the backported patches; this batch targets
garbage collection (GC) / set timeout fixes that address possible UaF
and memleaks. I am using original commit IDs for reference:
1) 24138933b97b ("netfilter: nf_tables: don't skip expired elements during walk")
2) 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
3) f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
4) c92db3030492 ("netfilter: nft_set_hash: mark set element as dead when deleting from packet path")
5) a2dd0233cbc4 ("netfilter: nf_tables: remove busy mark and gc batch API")
6) 7845914f45f0 ("netfilter: nf_tables: don't fail inserts if duplicate has expired")
7) 6a33d8b73dfa ("netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path")
8) 02c6c24402bf ("netfilter: nf_tables: GC transaction race with netns dismantle")
9) 720344340fb9 ("netfilter: nf_tables: GC transaction race with abort path")
10) 8357bc946a2a ("netfilter: nf_tables: use correct lock to protect gc_list")
11) 8e51830e29e1 ("netfilter: nf_tables: defer gc run if previous batch is still pending")
12) 2ee52ae94baa ("netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction")
13) 96b33300fba8 ("netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention")
14) 4a9e12ea7e70 ("netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC")
15) 6d365eabce3c ("netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails")
16) b079155faae9 ("netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration")
17) cf5000a7787c ("netfilter: nf_tables: fix memleak when more than 255 elements expired")
Please, apply.
Thanks.
Florian Westphal (4):
netfilter: nf_tables: don't skip expired elements during walk
netfilter: nf_tables: don't fail inserts if duplicate has expired
netfilter: nf_tables: defer gc run if previous batch is still pending
netfilter: nf_tables: fix memleak when more than 255 elements expired
Pablo Neira Ayuso (13):
netfilter: nf_tables: GC transaction API to avoid race with control plane
netfilter: nf_tables: adapt set backend to use GC transaction API
netfilter: nft_set_hash: mark set element as dead when deleting from packet path
netfilter: nf_tables: remove busy mark and gc batch API
netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path
netfilter: nf_tables: GC transaction race with netns dismantle
netfilter: nf_tables: GC transaction race with abort path
netfilter: nf_tables: use correct lock to protect gc_list
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention
netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC
netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails
netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration
include/net/netfilter/nf_tables.h | 126 +++++-----
net/netfilter/nf_tables_api.c | 369 ++++++++++++++++++++++++------
net/netfilter/nft_set_hash.c | 87 ++++---
net/netfilter/nft_set_pipapo.c | 67 +++---
net/netfilter/nft_set_rbtree.c | 161 +++++++------
5 files changed, 547 insertions(+), 263 deletions(-)
--
2.30.2
Instead of constantly checking each possibility of the maple state,
create a fast path that will skip over checking unlikely states.
Cc: stable(a)vger.kernel.org
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
---
include/linux/maple_tree.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index e41c70ac7744..f66f5f78f8cf 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -511,6 +511,15 @@ static inline bool mas_is_paused(const struct ma_state *mas)
return mas->node == MAS_PAUSE;
}
+/* Check if the mas is pointing to a node or not */
+static inline bool mas_is_active(struct ma_state *mas)
+{
+ if ((unsigned long)mas->node >= MAPLE_RESERVED_RANGE)
+ return true;
+
+ return false;
+}
+
/**
* mas_reset() - Reset a Maple Tree operation state.
* @mas: Maple Tree operation state.
--
2.40.1
The NAND core complies with the ONFI specification, which itself
mentions that after any program or erase operation, a status check
should be performed to see whether the operation was finished *and*
successful.
The NAND core offers helpers to finish a page write (sending the
"PAGE PROG" command, waiting for the NAND chip to be ready again, and
checking the operation status). But in some cases, advanced controller
drivers might want to optimize this and craft their own page write
helper to leverage additional hardware capabilities, thus not always
using the core facilities.
Some drivers, like this one, do not use the core helper to finish a page
write because the final cycles are automatically managed by the
hardware. In this case, additional care must be taken to manually
perform the final status check.
Let's read the NAND chip status at the end of the page write helper and
return -EIO upon error.
Cc: stable(a)vger.kernel.org
Fixes: 02f26ecf8c77 ("mtd: nand: add reworked Marvell NAND controller driver")
Reported-by: Aviram Dali <aviramd(a)marvell.com>
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
Hello Aviram,
I have not tested this, but based on your report I believe the status
check is indeed missing here and could sometimes lead to unnoticed
partial writes.
Please test on your side and reply with your Tested-by if you validate
the change.
Any backport on kernels predating v4.17 will likely fail because of a
folder rename, so you will have to do the backport manually if needed.
Thanks,
Miquèl
---
drivers/mtd/nand/raw/marvell_nand.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/marvell_nand.c b/drivers/mtd/nand/raw/marvell_nand.c
index 30c15e4e1cc0..576441095012 100644
--- a/drivers/mtd/nand/raw/marvell_nand.c
+++ b/drivers/mtd/nand/raw/marvell_nand.c
@@ -1162,6 +1162,7 @@ static int marvell_nfc_hw_ecc_hmg_do_write_page(struct nand_chip *chip,
.ndcb[2] = NDCB2_ADDR5_PAGE(page),
};
unsigned int oob_bytes = lt->spare_bytes + (raw ? lt->ecc_bytes : 0);
+ u8 status;
int ret;
/* NFCv2 needs more information about the operation being executed */
@@ -1195,7 +1196,18 @@ static int marvell_nfc_hw_ecc_hmg_do_write_page(struct nand_chip *chip,
ret = marvell_nfc_wait_op(chip,
PSEC_TO_MSEC(sdr->tPROG_max));
- return ret;
+ if (ret)
+ return ret;
+
+ /* Check write status on the chip side */
+ ret = nand_status_op(chip, &status);
+ if (ret)
+ return ret;
+
+ if (status & NAND_STATUS_FAIL)
+ return -EIO;
+
+ return 0;
}
static int marvell_nfc_hw_ecc_hmg_write_page_raw(struct nand_chip *chip,
@@ -1624,6 +1636,7 @@ static int marvell_nfc_hw_ecc_bch_write_page(struct nand_chip *chip,
int data_len = lt->data_bytes;
int spare_len = lt->spare_bytes;
int chunk, ret;
+ u8 status;
marvell_nfc_select_target(chip, chip->cur_cs);
@@ -1660,6 +1673,14 @@ static int marvell_nfc_hw_ecc_bch_write_page(struct nand_chip *chip,
if (ret)
return ret;
+ /* Check write status on the chip side */
+ ret = nand_status_op(chip, &status);
+ if (ret)
+ return ret;
+
+ if (status & NAND_STATUS_FAIL)
+ return -EIO;
+
return 0;
}
--
2.34.1
We currently provide the physical address of the DMA region
rather than the output of dma_map_resource(), which is obviously wrong.
Fixes: 7330fc505af4 ("mtd: rawnand: qcom: stop using phys_to_dma()")
Cc: stable(a)vger.kernel.org
Reviewed-by: Manivannan Sadhasivam <mani(a)kernel.org>
Signed-off-by: Bibek Kumar Patro <quic_bibekkum(a)quicinc.com>
---
v5: Incorporated suggestions from Miquel/Mani
- Added tag to automatically include this patch in stable tree.
v4: Incorporated suggestion from Miquel
- Modified title and commit description.
https://lore.kernel.org/all/20230912115903.1007-1-quic_bibekkum@quicinc.com/
v3: Incorporated comments from Miquel
- Modified the commit message and title as per suggestions.
https://lore.kernel.org/all/20230912101814.7748-1-quic_bibekkum@quicinc.com/
v2: Incorporated comments from Pavan/Mani.
https://lore.kernel.org/all/20230911133026.29868-1-quic_bibekkum@quicinc.co…
v1: https://lore.kernel.org/all/20230907092854.11408-1-quic_bibekkum@quicinc.co…
drivers/mtd/nand/raw/qcom_nandc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/qcom_nandc.c b/drivers/mtd/nand/raw/qcom_nandc.c
index 64499c1b3603..b079605c84d3 100644
--- a/drivers/mtd/nand/raw/qcom_nandc.c
+++ b/drivers/mtd/nand/raw/qcom_nandc.c
@@ -3444,7 +3444,7 @@ static int qcom_nandc_probe(struct platform_device *pdev)
err_aon_clk:
clk_disable_unprepare(nandc->core_clk);
err_core_clk:
- dma_unmap_resource(dev, res->start, resource_size(res),
+ dma_unmap_resource(dev, nandc->base_dma, resource_size(res),
DMA_BIDIRECTIONAL, 0);
return ret;
}
--
2.17.1
Defining a prctl flag as an int is a footgun because on a 64 bit machine
and with a variadic implementation of prctl (like in musl and glibc),
when used directly as a prctl argument, it can get cast to long with
garbage upper bits, which would result in unexpected behaviors.
This patch changes the constant to an unsigned long to eliminate that
possibility. This does not break UAPI.
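As an illustration of the failure mode, here is a minimal user-space
sketch (not part of the patch; the exact behaviour depends on the ABI
and on how the libc wrapper reads its variadic arguments):
~~~
/* Sketch only: with a variadic wrapper such as musl's
 * int prctl(int op, ...), every argument is re-read with
 * va_arg(ap, unsigned long). A plain-int flag constant makes the
 * caller materialize only 32 bits while 64 bits are consumed, so the
 * upper half of the value can be garbage.
 */
#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SET_MDWE
#define PR_SET_MDWE 65
#endif
#ifndef PR_MDWE_REFUSE_EXEC_GAIN
#define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)	/* the fixed definition */
#endif

int main(void)
{
	/* With (1UL << 0) the argument is already an unsigned long and
	 * matches what the wrapper's va_arg expects. */
	if (prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0UL, 0UL, 0UL))
		perror("PR_SET_MDWE");
	return 0;
}
~~~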
Fixes: b507808ebce2 ("mm: implement memory-deny-write-execute as a prctl")
Cc: stable(a)vger.kernel.org
Signed-off-by: Florent Revest <revest(a)chromium.org>
Suggested-by: Alexey Izbyshev <izbyshev(a)ispras.ru>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Acked-by: Catalin Marinas <catalin.marinas(a)arm.com>
---
include/uapi/linux/prctl.h | 2 +-
tools/include/uapi/linux/prctl.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 3c36aeade991..9a85c69782bd 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -283,7 +283,7 @@ struct prctl_mm_map {
/* Memory deny write / execute */
#define PR_SET_MDWE 65
-# define PR_MDWE_REFUSE_EXEC_GAIN 1
+# define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)
#define PR_GET_MDWE 66
diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h
index 3c36aeade991..9a85c69782bd 100644
--- a/tools/include/uapi/linux/prctl.h
+++ b/tools/include/uapi/linux/prctl.h
@@ -283,7 +283,7 @@ struct prctl_mm_map {
/* Memory deny write / execute */
#define PR_SET_MDWE 65
-# define PR_MDWE_REFUSE_EXEC_GAIN 1
+# define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)
#define PR_GET_MDWE 66
--
2.42.0.rc2.253.gd59a3bf2b4-goog
Hi All,
This series fixes a bug in arm64's implementation of set_huge_pte_at(), which
can result in an unprivileged user causing a kernel panic. The problem was
triggered when running the new uffd poison mm selftest for HUGETLB memory. This
test (and the uffd poison feature) was merged for v6.6-rc1. However, upon
inspection there are multiple other pre-existing paths that can trigger this
bug.
Ideally, I'd like to get this fix in for v6.6 if possible? And I guess it should
be backported too, given there are call sites where this can theoretically
happen that pre-date v6.6-rc1 (I've cc'ed stable(a)vger.kernel.org).
Description of Bug
------------------
arm64's huge pte implementation supports multiple huge page sizes, some of which
are implemented in the page table with contiguous mappings. So set_huge_pte_at()
needs to work out how big the logical pte is, so that it can also work out how
many physical ptes (or pmds) need to be written. It does this by grabbing the
folio out of the pte and querying its size.
However, there are cases when the pte being set is actually a swap entry. But
this also used to work fine, because for huge ptes, we only ever saw migration
entries and hwpoison entries. And both of these types of swap entries have a PFN
embedded, so the code would grab that and everything still worked out.
But over time, more calls to set_huge_pte_at() have been added that set swap
entry types that do not embed a PFN. And this causes the code to go bang. The
triggering case is for the uffd poison test, commit 99aa77215ad0 ("selftests/mm:
add uffd unit test for UFFDIO_POISON"), which sets a PTE_MARKER_POISONED swap
entry. But review shows there are other places too (PTE_MARKER_UFFD_WP).
If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise, it
will dereference a bad pointer in page_folio():
static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
{
VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));
return page_folio(pfn_to_page(swp_offset_pfn(entry)));
}
So the root cause is due to commit 18f3962953e4 ("mm: hugetlb: kill
set_huge_swap_pte_at()"), which aimed to simplify the interface to the core code
by removing set_huge_swap_pte_at() (which took a page size parameter) and
replacing it with calls to set_huge_pte_at() where the size was inferred
from the folio, as described above. While that commit didn't break anything at
the time, it did break the interface because it couldn't handle swap entries
without PFNs. And since then new callers have come along which rely on this
working.
Fix
---
The simplest fix would have been to revert the dodgy cleanup commit, but since
things have moved on, this would have required an audit of all the new
set_huge_pte_at() call sites to see if they should be converted to
set_huge_swap_pte_at(). As per the original intent of the change, it would also
leave us open to future bugs when people invariably get it wrong and call the
wrong helper.
So instead, I've converted the first parameter of set_huge_pte_at() to be a vma
rather than an mm. This means that the arm64 code can easily recover the huge
page size in all cases. It's a bigger change, due to needing to touch the arches
that implement the function, but it is entirely mechanical, so in my view, low
risk.
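For reference, a sketch of the proposed interface change (illustrative
prototypes only; the per-arch signatures are in the individual patches):
~~~
/* before: the implementation must infer the huge page size, which is
 * impossible for swap entries that carry no PFN */
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
		     pte_t *ptep, pte_t pte);

/* after: the vma lets the arch recover the size unconditionally,
 * e.g. via huge_page_size(hstate_vma(vma)) */
void set_huge_pte_at(struct vm_area_struct *vma, unsigned long addr,
		     pte_t *ptep, pte_t pte);
~~~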
I've compile-tested all touched arches; arm64, parisc, powerpc, riscv, s390 (and
additionally x86_64). I've additionally booted and run mm selftests against
arm64, where I observe the uffd poison test is fixed, and there are no other
regressions.
Patches
-------
patches 1-7: Convert core mm and arches to pass vma instead of mm
patch 8: Fixes the arm64 bug
Patches based on v6.6-rc2.
Thanks,
Ryan
Ryan Roberts (8):
parisc: hugetlb: Convert set_huge_pte_at() to take vma
powerpc: hugetlb: Convert set_huge_pte_at() to take vma
riscv: hugetlb: Convert set_huge_pte_at() to take vma
s390: hugetlb: Convert set_huge_pte_at() to take vma
sparc: hugetlb: Convert set_huge_pte_at() to take vma
mm: hugetlb: Convert set_huge_pte_at() to take vma
arm64: hugetlb: Convert set_huge_pte_at() to take vma
arm64: hugetlb: Fix set_huge_pte_at() to work with all swap entries
arch/arm64/include/asm/hugetlb.h | 2 +-
arch/arm64/mm/hugetlbpage.c | 22 ++++----------
arch/parisc/include/asm/hugetlb.h | 2 +-
arch/parisc/mm/hugetlbpage.c | 4 +--
.../include/asm/nohash/32/hugetlb-8xx.h | 3 +-
arch/powerpc/mm/book3s64/hugetlbpage.c | 2 +-
arch/powerpc/mm/book3s64/radix_hugetlbpage.c | 2 +-
arch/powerpc/mm/nohash/8xx.c | 2 +-
arch/powerpc/mm/pgtable.c | 7 ++++-
arch/riscv/include/asm/hugetlb.h | 2 +-
arch/riscv/mm/hugetlbpage.c | 3 +-
arch/s390/include/asm/hugetlb.h | 8 +++--
arch/s390/mm/hugetlbpage.c | 8 ++++-
arch/sparc/include/asm/hugetlb.h | 8 +++--
arch/sparc/mm/hugetlbpage.c | 8 ++++-
include/asm-generic/hugetlb.h | 6 ++--
include/linux/hugetlb.h | 6 ++--
mm/damon/vaddr.c | 2 +-
mm/hugetlb.c | 30 +++++++++----------
mm/migrate.c | 2 +-
mm/rmap.c | 10 +++----
mm/vmalloc.c | 5 +++-
22 files changed, 80 insertions(+), 64 deletions(-)
--
2.25.1
The patch titled
Subject: mm: make PR_MDWE_REFUSE_EXEC_GAIN an unsigned long
has been added to the -mm mm-unstable branch. Its filename is
mm-make-pr_mdwe_refuse_exec_gain-an-unsigned-long.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Florent Revest <revest(a)chromium.org>
Subject: mm: make PR_MDWE_REFUSE_EXEC_GAIN an unsigned long
Date: Mon, 28 Aug 2023 17:08:56 +0200
Defining a prctl flag as an int is a footgun because on a 64 bit machine
and with a variadic implementation of prctl (like in musl and glibc), when
used directly as a prctl argument, it can get cast to long with garbage
upper bits, which would result in unexpected behaviors.
This patch changes the constant to an unsigned long to eliminate that
possibility. This does not break UAPI.
Link: https://lkml.kernel.org/r/20230828150858.393570-5-revest@chromium.org
Fixes: b507808ebce2 ("mm: implement memory-deny-write-execute as a prctl")
Signed-off-by: Florent Revest <revest(a)chromium.org>
Suggested-by: Alexey Izbyshev <izbyshev(a)ispras.ru>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Acked-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Ayush Jain <ayush.jain3(a)amd.com>
Cc: Greg Thelen <gthelen(a)google.com>
Cc: Joey Gouly <joey.gouly(a)arm.com>
Cc: KP Singh <kpsingh(a)kernel.org>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Szabolcs Nagy <Szabolcs.Nagy(a)arm.com>
Cc: Topi Miettinen <toiwoton(a)gmail.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/uapi/linux/prctl.h | 2 +-
tools/include/uapi/linux/prctl.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--- a/include/uapi/linux/prctl.h~mm-make-pr_mdwe_refuse_exec_gain-an-unsigned-long
+++ a/include/uapi/linux/prctl.h
@@ -283,7 +283,7 @@ struct prctl_mm_map {
/* Memory deny write / execute */
#define PR_SET_MDWE 65
-# define PR_MDWE_REFUSE_EXEC_GAIN 1
+# define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)
#define PR_GET_MDWE 66
--- a/tools/include/uapi/linux/prctl.h~mm-make-pr_mdwe_refuse_exec_gain-an-unsigned-long
+++ a/tools/include/uapi/linux/prctl.h
@@ -283,7 +283,7 @@ struct prctl_mm_map {
/* Memory deny write / execute */
#define PR_SET_MDWE 65
-# define PR_MDWE_REFUSE_EXEC_GAIN 1
+# define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)
#define PR_GET_MDWE 66
_
Patches currently in -mm which might be from revest(a)chromium.org are
kselftest-vm-fix-tabs-spaces-inconsistency-in-the-mdwe-test.patch
kselftest-vm-fix-mdwes-mmap_fixed-test-case.patch
kselftest-vm-check-errnos-in-mdwe_test.patch
mm-make-pr_mdwe_refuse_exec_gain-an-unsigned-long.patch
mm-add-a-no_inherit-flag-to-the-pr_set_mdwe-prctl.patch
kselftest-vm-add-tests-for-no-inherit-memory-deny-write-execute.patch
The patch titled
Subject: maple_tree: add mas_is_active() to detect in-tree walks
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
maple_tree-add-mas_active-to-detect-in-tree-walks.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: maple_tree: add mas_is_active() to detect in-tree walks
Date: Thu, 21 Sep 2023 14:12:35 -0400
Patch series "maple_tree: Fix mas_prev() state regression".
Pedro Falcato contacted me on IRC with an mprotect regression which was
bisected back to the iterator changes for maple tree. Root cause analysis
showed that mas_prev() running off the end of the VMA space (previous from
0), followed by mas_find(), would skip the first value.
This patchset introduces maple state underflow/overflow so the sequence of
calls on the maple state will return what the user expects.
This patch (of 2):
Instead of constantly checking each possibility of the maple state,
create a fast path that will skip over checking unlikely states.
Link: https://lkml.kernel.org/r/20230921181236.509072-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230921181236.509072-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Pedro Falcato <pedro.falcato(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/maple_tree.h | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/include/linux/maple_tree.h~maple_tree-add-mas_active-to-detect-in-tree-walks
+++ a/include/linux/maple_tree.h
@@ -511,6 +511,15 @@ static inline bool mas_is_paused(const s
return mas->node == MAS_PAUSE;
}
+/* Check if the mas is pointing to a node or not */
+static inline bool mas_is_active(struct ma_state *mas)
+{
+ if ((unsigned long)mas->node >= MAPLE_RESERVED_RANGE)
+ return true;
+
+ return false;
+}
+
/**
* mas_reset() - Reset a Maple Tree operation state.
* @mas: Maple Tree operation state.
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-add-mas_active-to-detect-in-tree-walks.patch
maple_tree-add-mas_underflow-and-mas_overflow-states.patch
The patch titled
Subject: nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-potential-use-after-free-in-nilfs_gccache_submit_read_data.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Pan Bian <bianpan2016(a)163.com>
Subject: nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
Date: Thu, 21 Sep 2023 23:17:31 +0900
In nilfs_gccache_submit_read_data(), brelse(bh) is called to drop the
reference count of bh when the call to nilfs_dat_translate() fails. If
the reference count hits 0 and its owner page gets unlocked, bh may be
freed. However, bh->b_page is dereferenced to put the page after that,
which may result in a use-after-free bug. This patch moves the release
operation after unlocking and putting the page.
NOTE: The function in question is only called in GC, and in combination
with current userland tools, address translation using DAT does not occur
in that function, so the code path that causes this issue will not be
executed. However, it is possible to run that code path by intentionally
modifying the userland GC library or by calling the GC ioctl directly.
[konishi.ryusuke(a)gmail.com: NOTE added to the commit log]
Link: https://lkml.kernel.org/r/1543201709-53191-1-git-send-email-bianpan2016@163…
Link: https://lkml.kernel.org/r/20230921141731.10073-1-konishi.ryusuke@gmail.com
Fixes: a3d93f709e89 ("nilfs2: block cache for garbage collection")
Signed-off-by: Pan Bian <bianpan2016(a)163.com>
Reported-by: Ferry Meng <mengferry(a)linux.alibaba.com>
Closes: https://lkml.kernel.org/r/20230818092022.111054-1-mengferry@linux.alibaba.c…
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/gcinode.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/nilfs2/gcinode.c~nilfs2-fix-potential-use-after-free-in-nilfs_gccache_submit_read_data
+++ a/fs/nilfs2/gcinode.c
@@ -73,10 +73,8 @@ int nilfs_gccache_submit_read_data(struc
struct the_nilfs *nilfs = inode->i_sb->s_fs_info;
err = nilfs_dat_translate(nilfs->ns_dat, vbn, &pbn);
- if (unlikely(err)) { /* -EIO, -ENOMEM, -ENOENT */
- brelse(bh);
+ if (unlikely(err)) /* -EIO, -ENOMEM, -ENOENT */
goto failed;
- }
}
lock_buffer(bh);
@@ -102,6 +100,8 @@ int nilfs_gccache_submit_read_data(struc
failed:
unlock_page(bh->b_page);
put_page(bh->b_page);
+ if (unlikely(err))
+ brelse(bh);
return err;
}
_
Patches currently in -mm which might be from bianpan2016(a)163.com are
nilfs2-fix-potential-use-after-free-in-nilfs_gccache_submit_read_data.patch
Callers of sock_sendmsg(), and similarly kernel_sendmsg(), in kernel
space may observe their value of msg_name change in cases where BPF
sendmsg hooks rewrite the send address. This has been confirmed to break
NFS mounts running in UDP mode and has the potential to break other
systems.
This patch:
1) Creates a new function called __sock_sendmsg() with the same logic as the
old sock_sendmsg() function.
2) Replaces calls to sock_sendmsg() made by __sys_sendto() and
__sys_sendmsg() with __sock_sendmsg() to avoid an unnecessary copy,
as these system calls are already protected.
3) Modifies sock_sendmsg() so that it makes a copy of msg_name if
present before passing it down the stack to insulate callers from
changes to the send address.
Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/
Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jordan Rife <jrife(a)google.com>
---
v3->v4: Maintain reverse xmas tree order for variable declarations.
Remove precondition check for msg_namelen.
v2->v3: Add "Fixes" tag.
v1->v2: Split up original patch into patch series. Perform address copy
in sock_sendmsg() instead of sock->ops->sendmsg().
net/socket.c | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/net/socket.c b/net/socket.c
index c8b08b32f097e..a39ec136f5cff 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -737,6 +737,14 @@ static inline int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg)
return ret;
}
+static int __sock_sendmsg(struct socket *sock, struct msghdr *msg)
+{
+ int err = security_socket_sendmsg(sock, msg,
+ msg_data_left(msg));
+
+ return err ?: sock_sendmsg_nosec(sock, msg);
+}
+
/**
* sock_sendmsg - send a message through @sock
* @sock: socket
@@ -747,10 +755,19 @@ static inline int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg)
*/
int sock_sendmsg(struct socket *sock, struct msghdr *msg)
{
- int err = security_socket_sendmsg(sock, msg,
- msg_data_left(msg));
+ struct sockaddr_storage *save_addr = (struct sockaddr_storage *)msg->msg_name;
+ struct sockaddr_storage address;
+ int ret;
- return err ?: sock_sendmsg_nosec(sock, msg);
+ if (msg->msg_name) {
+ memcpy(&address, msg->msg_name, msg->msg_namelen);
+ msg->msg_name = &address;
+ }
+
+ ret = __sock_sendmsg(sock, msg);
+ msg->msg_name = save_addr;
+
+ return ret;
}
EXPORT_SYMBOL(sock_sendmsg);
@@ -1138,7 +1155,7 @@ static ssize_t sock_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (sock->type == SOCK_SEQPACKET)
msg.msg_flags |= MSG_EOR;
- res = sock_sendmsg(sock, &msg);
+ res = __sock_sendmsg(sock, &msg);
*from = msg.msg_iter;
return res;
}
@@ -2174,7 +2191,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
if (sock->file->f_flags & O_NONBLOCK)
flags |= MSG_DONTWAIT;
msg.msg_flags = flags;
- err = sock_sendmsg(sock, &msg);
+ err = __sock_sendmsg(sock, &msg);
out_put:
fput_light(sock->file, fput_needed);
@@ -2538,7 +2555,7 @@ static int ____sys_sendmsg(struct socket *sock, struct msghdr *msg_sys,
err = sock_sendmsg_nosec(sock, msg_sys);
goto out_freectl;
}
- err = sock_sendmsg(sock, msg_sys);
+ err = __sock_sendmsg(sock, msg_sys);
/*
* If this is sendmmsg() and sending to current destination address was
* successful, remember it.
--
2.42.0.459.ge4e396fd5e-goog
I'm announcing the release of the 5.10.196 kernel.
This release is only needed by any 5.10.y user that uses configfs; it
resolves a regression in 5.10.195 in that subsystem. Note that many
kernel subsystems use configfs for configuration, so to be safe you
probably want to upgrade if you are not sure.
The updated 5.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
fs/configfs/dir.c | 2 --
2 files changed, 1 insertion(+), 3 deletions(-)
Greg Kroah-Hartman (2):
Revert "configfs: fix a race in configfs_lookup()"
Linux 5.10.196
Commit 4dba12881f88 ("dm zoned: support arbitrary number of devices")
made the pointers to additional zoned devices be stored in a
dynamically allocated dmz->ddev array. However, this array is never freed.
Free it when cleaning up zoned device information inside
dmz_put_zoned_device(). Assigning NULL to dmz->ddev elements doesn't make
sense there as they are not supposed to be reused later and the whole dmz
target structure is being cleaned anyway.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 4dba12881f88 ("dm zoned: support arbitrary number of devices")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
drivers/md/dm-zoned-target.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index ad8e670a2f9b..e25cd9db6275 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -753,12 +753,10 @@ static void dmz_put_zoned_device(struct dm_target *ti)
struct dmz_target *dmz = ti->private;
int i;
- for (i = 0; i < dmz->nr_ddevs; i++) {
- if (dmz->ddev[i]) {
+ for (i = 0; i < dmz->nr_ddevs; i++)
+ if (dmz->ddev[i])
dm_put_device(ti, dmz->ddev[i]);
- dmz->ddev[i] = NULL;
- }
- }
+ kfree(dmz->ddev);
}
static int dmz_fixup_devices(struct dm_target *ti)
--
2.42.0
On 9/20/23 11:42, Ian Rogers wrote:
> On Wed, 20 Sept 2023 at 11:13, Florian Fainelli <f.fainelli(a)gmail.com> wrote:
>
> On 9/20/23 04:30, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 5.10.196
> release.
> > There are 83 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied,
> please
> > let me know.
> >
> > Responses should be made by Fri, 22 Sep 2023 11:28:09 +0000.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> >
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.196-r…
> > or in the git tree and branch at:
> >
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
>
> perf fails to build on ARM, ARM64 and MIPS with:
>
> fixdep: error opening depfile:
> /local/users/fainelli/buildroot/output/bmips/build/linux-custom/tools/perf/pmu-events/.pmu-events.o.d:
> No such file or directory
> make[5]: *** [pmu-events/Build:33:
> /local/users/fainelli/buildroot/output/bmips/build/linux-custom/tools/perf/pmu-events/pmu-events.o]
> Error 2
> make[4]: *** [Makefile.perf:653:
> /local/users/fainelli/buildroot/output/bmips/build/linux-custom/tools/perf/pmu-events/pmu-events-in.o]
> Error 2
> make[3]: *** [Makefile.perf:229: sub-make] Error 2
> make[2]: *** [Makefile:70: all] Error 2
> make[1]: *** [package/pkg-generic.mk:294:
> /local/users/fainelli/buildroot/output/bmips/build/linux-tools/.stamp_built]
> Error 2
> make: *** [Makefile:27: _all] Error 2
>
> this is caused by 653fc524e350b62479529140dc9abef05abbcc29 ("perf
> build:
> Update build rule for generated files"). Reverting that commit plus
> 5804de1f2324ddcfe3f0b6ad58fcfe4d344e0471 ("perf jevents: Switch
> build to
> use jevents.py") gets us going again.
>
>
> Given the perf tool is backward compatible, does doing backports make sense?
For bugfixes, certainly, but this does not appear to be one, though?
--
Florian
This is an automatically generated email to let you know that the following patch was queued:
Subject: media: qcom: camss: Fix pm_domain_on sequence in probe
Author: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Date: Wed Aug 30 16:16:06 2023 +0100
We need to make sure camss_configure_pd() happens before
camss_register_entities() as the vfe_get() path relies on the pointer
provided by camss_configure_pd().
Fix the ordering sequence in probe to ensure the pointers vfe_get() demands
are present by the time camss_register_entities() runs.
In order to facilitate backporting to stable kernels, I've moved the
camss_configure_pd() call quite early in the probe() function so that,
irrespective of the existence of the old error-handling jump labels, this
patch should still apply everywhere from -next circa Aug 2023 back to v5.13 inclusive.
Fixes: 2f6f8af67203 ("media: camss: Refactor VFE power domain toggling")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
drivers/media/platform/qcom/camss/camss.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
---
diff --git a/drivers/media/platform/qcom/camss/camss.c b/drivers/media/platform/qcom/camss/camss.c
index f11dc59135a5..75991d849b57 100644
--- a/drivers/media/platform/qcom/camss/camss.c
+++ b/drivers/media/platform/qcom/camss/camss.c
@@ -1619,6 +1619,12 @@ static int camss_probe(struct platform_device *pdev)
if (ret < 0)
goto err_cleanup;
+ ret = camss_configure_pd(camss);
+ if (ret < 0) {
+ dev_err(dev, "Failed to configure power domains: %d\n", ret);
+ goto err_cleanup;
+ }
+
ret = camss_init_subdevices(camss);
if (ret < 0)
goto err_cleanup;
@@ -1678,12 +1684,6 @@ static int camss_probe(struct platform_device *pdev)
}
}
- ret = camss_configure_pd(camss);
- if (ret < 0) {
- dev_err(dev, "Failed to configure power domains: %d\n", ret);
- return ret;
- }
-
pm_runtime_enable(dev);
return 0;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 57a943ebfcdb4a97fbb409640234bdb44bfa1953
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091622-outpost-audio-2222@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
57a943ebfcdb ("drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma")
5d945cbcd4b1 ("drm/amd/display: Create a file dedicated to planes")
60693e3a3890 ("Merge tag 'amd-drm-next-5.20-2022-07-14' of https://gitlab.freedesktop.org/agd5f/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 57a943ebfcdb4a97fbb409640234bdb44bfa1953 Mon Sep 17 00:00:00 2001
From: Melissa Wen <mwen(a)igalia.com>
Date: Thu, 31 Aug 2023 15:12:28 -0100
Subject: [PATCH] drm/amd/display: enable cursor degamma for DCN3+ DRM legacy
gamma
For DRM legacy gamma, AMD display manager applies implicit sRGB degamma
using a pre-defined sRGB transfer function. It works fine for DCN2
family where degamma ROM and custom curves go to the same color block.
But, on DCN3+, degamma is split into two blocks: degamma ROM for
pre-defined TFs and `gamma correction` for user/custom curves, and
degamma ROM settings don't apply to the cursor plane. To get DRM legacy
gamma working as expected, enable cursor degamma ROM for implicit sRGB
degamma on HW with this configuration.
Cc: stable(a)vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2803
Fixes: 96b020e2163f ("drm/amd/display: check attr flag before set cursor degamma on DCN3+")
Signed-off-by: Melissa Wen <mwen(a)igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 2198df96ed6f..cc74dd69acf2 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -1269,6 +1269,13 @@ void amdgpu_dm_plane_handle_cursor_update(struct drm_plane *plane,
attributes.rotation_angle = 0;
attributes.attribute_flags.value = 0;
+ /* Enable cursor degamma ROM on DCN3+ for implicit sRGB degamma in DRM
+ * legacy gamma setup.
+ */
+ if (crtc_state->cm_is_degamma_srgb &&
+ adev->dm.dc->caps.color.dpp.gamma_corr)
+ attributes.attribute_flags.bits.ENABLE_CURSOR_DEGAMMA = 1;
+
attributes.pitch = afb->base.pitches[0] / afb->base.format->cpp[0];
if (crtc_state->stream) {
From: Zhihao Cheng <chengzhihao1(a)huawei.com>
commit d919a1e79bac890421537cf02ae773007bf55e6b upstream.
Commit 7bc3e6e55acf06 ("proc: Use a list of inodes to flush from proc")
moved proc_flush_task() behind __exit_signal(). Then, process systemd can
take long period high cpu usage during releasing task in following
concurrent processes:
         systemd                                 ps
kernel_waitid                           stat(/proc/tgid)
  do_wait                                 filename_lookup
    wait_consider_task                      lookup_fast
      release_task
        __exit_signal
          __unhash_process
            detach_pid
              __change_pid // remove task->pid_links
                                        d_revalidate -> pid_revalidate // 0
                                        d_invalidate(/proc/tgid)
                                        shrink_dcache_parent(/proc/tgid)
                                          d_walk(/proc/tgid)
                                            spin_lock_nested(/proc/tgid/fd)
                                            // iterating opened fd
      proc_flush_pid                                 |
        d_invalidate (/proc/tgid/fd)                 |
          shrink_dcache_parent(/proc/tgid/fd)        |
            shrink_dentry_list(subdirs)              ↓
              shrink_lock_dentry(/proc/tgid/fd) --> race on dentry lock
Function d_invalidate() will remove the dentry from the hash first, but why
does proc_flush_pid() process dentry '/proc/tgid/fd' before dentry
'/proc/tgid'? That's because proc_pid_make_inode() adds proc inodes in
reverse order by invoking hlist_add_head_rcu(). But proc should not add
any inodes under '/proc/tgid' except '/proc/tgid/task/pid'; fix it by
adding an inode into 'pid->inodes' only if the inode is /proc/tgid or
/proc/tgid/task/pid.
Performance regression:
Create 200 tasks; each task opens one file 50,000 times. Kill all
tasks when opened files exceed 10,000,000 (cat /proc/sys/fs/file-nr).
Before fix:
$ time killall -wq aa
real 4m40.946s # During this period, we can see 'ps' and 'systemd'
taking high cpu usage.
After fix:
$ time killall -wq aa
real 1m20.732s # During this period, we can see 'systemd' taking
high cpu usage.
Link: https://lkml.kernel.org/r/20220713130029.4133533-1-chengzhihao1@huawei.com
Fixes: 7bc3e6e55acf06 ("proc: Use a list of inodes to flush from proc")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216054
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Suggested-by: Brian Foster <bfoster(a)redhat.com>
Reviewed-by: Brian Foster <bfoster(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Yu Kuai <yukuai3(a)huawei.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[ bp: Context adjustments ]
Signed-off-by: Suraj Jitindar Singh <surajjs(a)amazon.com>
---
fs/proc/base.c | 46 ++++++++++++++++++++++++++++++++++++++--------
1 file changed, 38 insertions(+), 8 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index a484c30bd5cf..712948e97991 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1881,7 +1881,7 @@ void proc_pid_evict_inode(struct proc_inode *ei)
put_pid(pid);
}
-struct inode *proc_pid_make_inode(struct super_block * sb,
+struct inode *proc_pid_make_inode(struct super_block *sb,
struct task_struct *task, umode_t mode)
{
struct inode * inode;
@@ -1910,11 +1910,6 @@ struct inode *proc_pid_make_inode(struct super_block * sb,
/* Let the pid remember us for quick removal */
ei->pid = pid;
- if (S_ISDIR(mode)) {
- spin_lock(&pid->lock);
- hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
- spin_unlock(&pid->lock);
- }
task_dump_owner(task, 0, &inode->i_uid, &inode->i_gid);
security_task_to_inode(task, inode);
@@ -1927,6 +1922,39 @@ struct inode *proc_pid_make_inode(struct super_block * sb,
return NULL;
}
+/*
+ * Generating an inode and adding it into @pid->inodes, so that task will
+ * invalidate inode's dentry before being released.
+ *
+ * This helper is used for creating dir-type entries under '/proc' and
+ * '/proc/<tgid>/task'. Other entries(eg. fd, stat) under '/proc/<tgid>'
+ * can be released by invalidating '/proc/<tgid>' dentry.
+ * In theory, dentries under '/proc/<tgid>/task' can also be released by
+ * invalidating '/proc/<tgid>' dentry, we reserve it to handle single
+ * thread exiting situation: Any one of threads should invalidate its
+ * '/proc/<tgid>/task/<pid>' dentry before released.
+ */
+static struct inode *proc_pid_make_base_inode(struct super_block *sb,
+ struct task_struct *task, umode_t mode)
+{
+ struct inode *inode;
+ struct proc_inode *ei;
+ struct pid *pid;
+
+ inode = proc_pid_make_inode(sb, task, mode);
+ if (!inode)
+ return NULL;
+
+ /* Let proc_flush_pid find this directory inode */
+ ei = PROC_I(inode);
+ pid = ei->pid;
+ spin_lock(&pid->lock);
+ hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
+ spin_unlock(&pid->lock);
+
+ return inode;
+}
+
int pid_getattr(const struct path *path, struct kstat *stat,
u32 request_mask, unsigned int query_flags)
{
@@ -3341,7 +3369,8 @@ static struct dentry *proc_pid_instantiate(struct dentry * dentry,
{
struct inode *inode;
- inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
+ inode = proc_pid_make_base_inode(dentry->d_sb, task,
+ S_IFDIR | S_IRUGO | S_IXUGO);
if (!inode)
return ERR_PTR(-ENOENT);
@@ -3637,7 +3666,8 @@ static struct dentry *proc_task_instantiate(struct dentry *dentry,
struct task_struct *task, const void *ptr)
{
struct inode *inode;
- inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
+ inode = proc_pid_make_base_inode(dentry->d_sb, task,
+ S_IFDIR | S_IRUGO | S_IXUGO);
if (!inode)
return ERR_PTR(-ENOENT);
--
2.34.1
Hi Greg,
I recently found a bug in the rsvp traffic classifier in the Linux kernel.
This classifier has already been retired upstream but still affects stable
releases.
The symptom of the bug is that the kernel can be tricked into accessing a
wild pointer, thus crashing the kernel.
Since it is just a crash and cannot be used for LPE, I do not want to
trouble security(a)kernel.org. And since the classifier is already
retired upstream, I cannot report it there.
Since it affects stable releases, I decided to report it here. If this
is not appropriate, I apologize in advance and wonder what would be a
good channel to report bugs that only affect stable releases and for
which no equivalent fix exists upstream.
[Root Cause]
The root cause of the bug is a slab-out-of-bounds access, but since the
offset to the original pointer is an `unsigned int` fully controlled by
users, the behaviour is usually a wild pointer access.
In `rsvp_change()`, TCA_RSVP_PINFO is passed to the kernel without any checks:
~~~
static int rsvp_change(...)
{
	......
	if (tb[TCA_RSVP_PINFO]) {
		pinfo = nla_data(tb[TCA_RSVP_PINFO]);
		f->spi = pinfo->spi;
		f->tunnelhdr = pinfo->tunnelhdr;
	}
	......
	if (pinfo) {
		s->dpi = pinfo->dpi;
		s->protocol = pinfo->protocol;
		s->tunnelid = pinfo->tunnelid;
	}
	......
}
~~~
As a result, later when the classifier actually does the classification
in `rsvp_classify`:
~~~
TC_INDIRECT_SCOPE int RSVP_CLS(struct sk_buff *skb, const struct tcf_proto *tp,
			       struct tcf_result *res)
{
	......
	*(u32 *)(xprt + s->dpi.offset) ^ s->dpi.key)
	......
}
~~~
`xprt + s->dpi.offset` becomes a wild pointer and crashes the kernel.
[Severity]
This will cause a local denial-of-service.
[Patch]
I don't know enough about this subsystem to suggest a proper patch. But
I would suggest retiring the rsvp classifier completely, just like
upstream.
[Affected Version]
I confirmed that this bug affects v5.10, v6.1, and v6.2.
[Proof-of-Concept]
A POC file is attached to this email.
[Splash]
A kernel oops splash is attached to this email.
Best,
Kyle Zeng
From: valis <sec(a)valis.email>
Commit 76e42ae831991c828cffa8c37736ebfb831ad5ec upstream.
[ Fixed small conflict as 'fnew->ifindex' assignment is not protected by
CONFIG_NET_CLS_IND on upstream since a51486266c3 ]
When fw_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Fixes: e35a8ee5993b ("net: sched: fw use RCU")
Reported-by: valis <sec(a)valis.email>
Reported-by: Bing-Jhong Billy Jheng <billy(a)starlabs.sg>
Signed-off-by: valis <sec(a)valis.email>
Signed-off-by: Jamal Hadi Salim <jhs(a)mojatatu.com>
Reviewed-by: Victor Nogueira <victor(a)mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela(a)mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan(a)starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-3-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Luiz Capitulino <luizcap(a)amazon.com>
---
net/sched/cls_fw.c | 1 -
1 file changed, 1 deletion(-)
Valis, Greg,
I noticed that 4.14 is missing this fix while we backported all three fixes
from this series to all stable kernels:
https://lore.kernel.org/all/20230729123202.72406-1-jhs@mojatatu.com
Is there a reason to have skipped 4.14 for this fix? It seems we need it.
This is only compile-tested though; it would be good to have a confirmation
from Valis that the issue is present on 4.14 before applying.
- Luiz
diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index e63f9c2e37e5..7b04b315b2bd 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -281,7 +281,6 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
return -ENOBUFS;
fnew->id = f->id;
- fnew->res = f->res;
#ifdef CONFIG_NET_CLS_IND
fnew->ifindex = f->ifindex;
#endif /* CONFIG_NET_CLS_IND */
--
2.40.1
From: Christian König <christian.koenig(a)amd.com>
The offset is just 32 bits here, so this can potentially overflow if
somebody specifies a large value. Instead, reduce the size to calculate
the last possible offset.
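To illustrate the wraparound, here is a stand-alone arithmetic sketch
(not kernel code; it assumes a 32-bit data->offset, as in the uapi
fence chunk struct):
~~~
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t offset = 0xfffffffcu;	/* user-chosen fence offset */
	uint64_t size = 4096;		/* amdgpu_bo_size(bo) */

	/* old check: the 32-bit addition wraps to 4, and 4 > 4096 is
	 * false, so the bogus offset slips through */
	bool old_rejects = (uint32_t)(offset + 8) > size;
	/* new check: no addition, no wraparound, offset is rejected */
	bool new_rejects = offset > size - 8;

	printf("old rejects: %d, new rejects: %d\n", old_rejects, new_rejects);
	return 0;
}
~~~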
The error handling path incorrectly drops the reference to the user
fence BO resulting in potential reference count underflow.
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Reviewed-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
(cherry picked from commit 35588314e963938dfdcdb792c9170108399377d6)
---
This is a backport of 35588314e963 ("drm/amdgpu: fix amdgpu_cs_p1_user_fence")
to 6.5 and older stable kernels because the original patch does not apply
cleanly as is.
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index fb78a8f47587..946d031d2520 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -127,7 +127,6 @@ static int amdgpu_cs_p1_user_fence(struct amdgpu_cs_parser *p,
struct drm_gem_object *gobj;
struct amdgpu_bo *bo;
unsigned long size;
- int r;
gobj = drm_gem_object_lookup(p->filp, data->handle);
if (gobj == NULL)
@@ -139,23 +138,14 @@ static int amdgpu_cs_p1_user_fence(struct amdgpu_cs_parser *p,
drm_gem_object_put(gobj);
size = amdgpu_bo_size(bo);
- if (size != PAGE_SIZE || (data->offset + 8) > size) {
- r = -EINVAL;
- goto error_unref;
- }
+ if (size != PAGE_SIZE || data->offset > (size - 8))
+ return -EINVAL;
- if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
- r = -EINVAL;
- goto error_unref;
- }
+ if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))
+ return -EINVAL;
*offset = data->offset;
-
return 0;
-
-error_unref:
- amdgpu_bo_unref(&bo);
- return r;
}
static int amdgpu_cs_p1_bo_handles(struct amdgpu_cs_parser *p,
--
2.41.0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092016-bronze-dolphin-b917@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
4ca8e03cf2bf ("btrfs: check for BTRFS_FS_ERROR in pending ordered assert")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 24 Aug 2023 16:59:04 -0400
Subject: [PATCH] btrfs: check for BTRFS_FS_ERROR in pending ordered assert
If we do fast tree logging we increment a counter on the current
transaction for every ordered extent we need to wait for. This means we
expect the transaction to still be there when we clear pending on the
ordered extent. However, if we happen to abort the transaction and clean
it up, there could be no running transaction, and thus we'll trip the
"ASSERT(trans)" check. This is obviously incorrect, and the code
properly deals with the case that the transaction doesn't exist. Fix
this ASSERT() to only fire if there's no trans and we don't have
BTRFS_FS_ERROR() set on the file system.
CC: stable(a)vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index b46ab348e8e5..345c449d588c 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -639,7 +639,7 @@ void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
refcount_inc(&trans->use_count);
spin_unlock(&fs_info->trans_lock);
- ASSERT(trans);
+ ASSERT(trans || BTRFS_FS_ERROR(fs_info));
if (trans) {
if (atomic_dec_and_test(&trans->pending_ordered))
wake_up(&trans->pending_wait);
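The shape of the fix is a classic "relax the assertion to admit a
known-legal state". A toy sketch, with fs_error() standing in for
BTRFS_FS_ERROR():

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool fs_error(void)
{
    return true;   /* stand-in: the fs is marked errored after abort */
}

int main(void)
{
    void *trans = NULL;   /* transaction already cleaned up by abort */

    /* The old assert(trans) would trip here even though the NULL case
     * is handled just below; the new form tolerates an aborted fs. */
    assert(trans || fs_error());

    if (trans) {
        /* wake anyone waiting on pending ordered extents */
    }
    return 0;
}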
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092057-deceiving-jukebox-5350@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
4ca8e03cf2bf ("btrfs: check for BTRFS_FS_ERROR in pending ordered assert")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 24 Aug 2023 16:59:04 -0400
Subject: [PATCH] btrfs: check for BTRFS_FS_ERROR in pending ordered assert
If we do fast tree logging we increment a counter on the current
transaction for every ordered extent we need to wait for. This means we
expect the transaction to still be there when we clear pending on the
ordered extent. However, if we happen to abort the transaction and clean
it up, there could be no running transaction, and thus we'll trip the
"ASSERT(trans)" check. This is obviously incorrect, and the code
properly deals with the case that the transaction doesn't exist. Fix
this ASSERT() to only fire if there's no trans and we don't have
BTRFS_FS_ERROR() set on the file system.
CC: stable(a)vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index b46ab348e8e5..345c449d588c 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -639,7 +639,7 @@ void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
refcount_inc(&trans->use_count);
spin_unlock(&fs_info->trans_lock);
- ASSERT(trans);
+ ASSERT(trans || BTRFS_FS_ERROR(fs_info));
if (trans) {
if (atomic_dec_and_test(&trans->pending_ordered))
wake_up(&trans->pending_wait);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x ec5fa9fcdeca69edf7dab5ca3b2e0ceb1c08fe9a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092029-banter-truth-cf72@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
ec5fa9fcdeca ("drm/amd/display: Adjust the MST resume flow")
1e5d4d8eb8c0 ("drm/amd/display: Ext displays with dock can't recognized after resume")
028c4ccfb812 ("drm/amd/display: force connector state when bpc changes during compliance")
d5a43956b73b ("drm/amd/display: move dp capability related logic to link_dp_capability")
94dfeaa46925 ("drm/amd/display: move dp phy related logic to link_dp_phy")
630168a97314 ("drm/amd/display: move dp link training logic to link_dp_training")
d144b40a4833 ("drm/amd/display: move dc_link_dpia logic to link_dp_dpia")
a28d0bac0956 ("drm/amd/display: move dpcd logic from dc_link_dpcd to link_dpcd")
a98cdd8c4856 ("drm/amd/display: refactor ddc logic from dc_link_ddc to link_ddc")
4370f72e3845 ("drm/amd/display: refactor hpd logic from dc_link to link_hpd")
0e8cf83a2b47 ("drm/amd/display: allow hpo and dio encoder switching during dp retrain test")
7462475e3a06 ("drm/amd/display: move dccg programming from link hwss hpo dp to hwss")
e85d59885409 ("drm/amd/display: use encoder type independent hwss instead of accessing enc directly")
ebf13b72020a ("drm/amd/display: Revert Scaler HCBlank issue workaround")
639f6ad6df7f ("drm/amd/display: Revert Reduce delay when sink device not able to ACK 00340h write")
e3aa827e2ab3 ("drm/amd/display: Avoid setting pixel rate divider to N/A")
180f33d27a55 ("drm/amd/display: Adjust DP 8b10b LT exit behavior")
b7ada7ee61d3 ("drm/amd/display: Populate DP2.0 output type for DML pipe")
ea192af507d9 ("drm/amd/display: Only update link settings after successful MST link train")
be9f6b222c52 ("drm/amd/display: Fix fallback issues for DP LL 1.4a tests")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ec5fa9fcdeca69edf7dab5ca3b2e0ceb1c08fe9a Mon Sep 17 00:00:00 2001
From: Wayne Lin <wayne.lin(a)amd.com>
Date: Tue, 22 Aug 2023 16:03:17 +0800
Subject: [PATCH] drm/amd/display: Adjust the MST resume flow
[Why]
Today drm_dp_mst_topology_mgr_resume() resumes the mst branch so that
it is ready to handle mst mode and then immediately does the mst
topology probing. This gives the driver a chance to fire a hotplug
event before the old state has been restored, and userspace will then
react to that hotplug event based on a wrong state.
[How]
Adjust the mst resume flow as follows:
1. Set dpcd to resume the mst branch status
2. Restore the source's old state
3. Do the mst topology probing
It is better to adjust drm_dp_mst_topology_mgr_resume() to pull the
topology probing work out into a second step of the mst resume; a
follow-up patch in drm will do that.
Reviewed-by: Chao-kai Wang <stylon.wang(a)amd.com>
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Acked-by: Stylon Wang <stylon.wang(a)amd.com>
Signed-off-by: Wayne Lin <wayne.lin(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index ca129983a08b..c6fd34bab358 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2340,14 +2340,62 @@ static int dm_late_init(void *handle)
return detect_mst_link_for_all_connectors(adev_to_drm(adev));
}
+static void resume_mst_branch_status(struct drm_dp_mst_topology_mgr *mgr)
+{
+ int ret;
+ u8 guid[16];
+ u64 tmp64;
+
+ mutex_lock(&mgr->lock);
+ if (!mgr->mst_primary)
+ goto out_fail;
+
+ if (drm_dp_read_dpcd_caps(mgr->aux, mgr->dpcd) < 0) {
+ drm_dbg_kms(mgr->dev, "dpcd read failed - undocked during suspend?\n");
+ goto out_fail;
+ }
+
+ ret = drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL,
+ DP_MST_EN |
+ DP_UP_REQ_EN |
+ DP_UPSTREAM_IS_SRC);
+ if (ret < 0) {
+ drm_dbg_kms(mgr->dev, "mst write failed - undocked during suspend?\n");
+ goto out_fail;
+ }
+
+ /* Some hubs forget their guids after they resume */
+ ret = drm_dp_dpcd_read(mgr->aux, DP_GUID, guid, 16);
+ if (ret != 16) {
+ drm_dbg_kms(mgr->dev, "dpcd read failed - undocked during suspend?\n");
+ goto out_fail;
+ }
+
+ if (memchr_inv(guid, 0, 16) == NULL) {
+ tmp64 = get_jiffies_64();
+ memcpy(&guid[0], &tmp64, sizeof(u64));
+ memcpy(&guid[8], &tmp64, sizeof(u64));
+
+ ret = drm_dp_dpcd_write(mgr->aux, DP_GUID, guid, 16);
+
+ if (ret != 16) {
+ drm_dbg_kms(mgr->dev, "check mstb guid failed - undocked during suspend?\n");
+ goto out_fail;
+ }
+ }
+
+ memcpy(mgr->mst_primary->guid, guid, 16);
+
+out_fail:
+ mutex_unlock(&mgr->lock);
+}
+
static void s3_handle_mst(struct drm_device *dev, bool suspend)
{
struct amdgpu_dm_connector *aconnector;
struct drm_connector *connector;
struct drm_connector_list_iter iter;
struct drm_dp_mst_topology_mgr *mgr;
- int ret;
- bool need_hotplug = false;
drm_connector_list_iter_begin(dev, &iter);
drm_for_each_connector_iter(connector, &iter) {
@@ -2369,18 +2417,15 @@ static void s3_handle_mst(struct drm_device *dev, bool suspend)
if (!dp_is_lttpr_present(aconnector->dc_link))
try_to_configure_aux_timeout(aconnector->dc_link->ddc, LINK_AUX_DEFAULT_TIMEOUT_PERIOD);
- ret = drm_dp_mst_topology_mgr_resume(mgr, true);
- if (ret < 0) {
- dm_helpers_dp_mst_stop_top_mgr(aconnector->dc_link->ctx,
- aconnector->dc_link);
- need_hotplug = true;
- }
+ /* TODO: move resume_mst_branch_status() into drm mst resume again
+ * once topology probing work is pulled out from mst resume into mst
+ * resume 2nd step. mst resume 2nd step should be called after old
+ * state getting restored (i.e. drm_atomic_helper_resume()).
+ */
+ resume_mst_branch_status(mgr);
}
}
drm_connector_list_iter_end(&iter);
-
- if (need_hotplug)
- drm_kms_helper_hotplug_event(dev);
}
static int amdgpu_dm_smu_write_watermarks_table(struct amdgpu_device *adev)
@@ -2774,7 +2819,8 @@ static int dm_resume(void *handle)
struct dm_atomic_state *dm_state = to_dm_atomic_state(dm->atomic_obj.state);
enum dc_connection_type new_connection_type = dc_connection_none;
struct dc_state *dc_state;
- int i, r, j;
+ int i, r, j, ret;
+ bool need_hotplug = false;
if (amdgpu_in_reset(adev)) {
dc_state = dm->cached_dc_state;
@@ -2872,7 +2918,7 @@ static int dm_resume(void *handle)
continue;
/*
- * this is the case when traversing through already created
+ * this is the case when traversing through already created end sink
* MST connectors, should be skipped
*/
if (aconnector && aconnector->mst_root)
@@ -2932,6 +2978,27 @@ static int dm_resume(void *handle)
dm->cached_state = NULL;
+ /* Do mst topology probing after resuming cached state*/
+ drm_connector_list_iter_begin(ddev, &iter);
+ drm_for_each_connector_iter(connector, &iter) {
+ aconnector = to_amdgpu_dm_connector(connector);
+ if (aconnector->dc_link->type != dc_connection_mst_branch ||
+ aconnector->mst_root)
+ continue;
+
+ ret = drm_dp_mst_topology_mgr_resume(&aconnector->mst_mgr, true);
+
+ if (ret < 0) {
+ dm_helpers_dp_mst_stop_top_mgr(aconnector->dc_link->ctx,
+ aconnector->dc_link);
+ need_hotplug = true;
+ }
+ }
+ drm_connector_list_iter_end(&iter);
+
+ if (need_hotplug)
+ drm_kms_helper_hotplug_event(ddev);
+
amdgpu_dm_irq_resume_late(adev);
amdgpu_dm_smu_write_watermarks_table(adev);
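Since the whole point of the patch is ordering, here is a toy sketch of
the three-step resume flow described above (stub names are
illustrative, not driver API):

#include <stdio.h>

static void resume_mst_branch_status_step(void)
{
    puts("1. write DPCD to bring the MST branch back up (no probing)");
}

static void restore_cached_state_step(void)
{
    puts("2. restore the cached atomic state (drm_atomic_helper_resume)");
}

static void probe_topology_step(void)
{
    puts("3. drm_dp_mst_topology_mgr_resume(): probe topology; any "
         "hotplug fired now sees the restored state");
}

int main(void)
{
    resume_mst_branch_status_step();
    restore_cached_state_step();
    probe_topology_step();
    return 0;
}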
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x e5c624f027ac74f97e97c8f36c69228ac9f1102d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092033-smother-cannabis-e03d@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
e5c624f027ac ("tracing: Have event inject files inc the trace array ref count")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e5c624f027ac74f97e97c8f36c69228ac9f1102d Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 6 Sep 2023 22:47:16 -0400
Subject: [PATCH] tracing: Have event inject files inc the trace array ref
count
The event inject files add events for a specific trace array. For an
instance, if the file is opened and the instance is deleted, reading or
writing to the file will cause a use-after-free.
Up the ref count of the trace_array when an event inject file is opened.
Link: https://lkml.kernel.org/r/20230907024804.292337868@goodmis.org
Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Zheng Yejian <zhengyejian1(a)huawei.com>
Fixes: 6c3edaf9fd6a ("tracing: Introduce trace event injection")
Tested-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_inject.c b/kernel/trace/trace_events_inject.c
index abe805d471eb..8650562bdaa9 100644
--- a/kernel/trace/trace_events_inject.c
+++ b/kernel/trace/trace_events_inject.c
@@ -328,7 +328,8 @@ event_inject_read(struct file *file, char __user *buf, size_t size,
}
const struct file_operations event_inject_fops = {
- .open = tracing_open_generic,
+ .open = tracing_open_file_tr,
.read = event_inject_read,
.write = event_inject_write,
+ .release = tracing_release_file_tr,
};
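The fix follows the usual "pin the object while a file is open"
pattern. A userspace analog (names are stand-ins, not the tracing API):

#include <stdio.h>
#include <stdlib.h>

struct tr_stub {
    int refs;
};

static struct tr_stub *tr_get(struct tr_stub *tr)
{
    tr->refs++;
    return tr;
}

static void tr_put(struct tr_stub *tr)
{
    if (--tr->refs == 0)
        free(tr);
}

int main(void)
{
    struct tr_stub *tr = malloc(sizeof(*tr));

    tr->refs = 1;                            /* the instance itself */

    struct tr_stub *file_ref = tr_get(tr);   /* open: take a reference */
    tr_put(tr);                              /* instance deleted while open */

    printf("still alive, refs=%d\n", file_ref->refs);  /* no UAF */
    tr_put(file_ref);                        /* release: drop the reference */
    return 0;
}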
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x e5c624f027ac74f97e97c8f36c69228ac9f1102d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092032-groom-mustiness-69eb@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
e5c624f027ac ("tracing: Have event inject files inc the trace array ref count")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e5c624f027ac74f97e97c8f36c69228ac9f1102d Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 6 Sep 2023 22:47:16 -0400
Subject: [PATCH] tracing: Have event inject files inc the trace array ref
count
The event inject files add events for a specific trace array. For an
instance, if the file is opened and the instance is deleted, reading or
writing to the file will cause a use-after-free.
Up the ref count of the trace_array when an event inject file is opened.
Link: https://lkml.kernel.org/r/20230907024804.292337868@goodmis.org
Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Zheng Yejian <zhengyejian1(a)huawei.com>
Fixes: 6c3edaf9fd6a ("tracing: Introduce trace event injection")
Tested-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_inject.c b/kernel/trace/trace_events_inject.c
index abe805d471eb..8650562bdaa9 100644
--- a/kernel/trace/trace_events_inject.c
+++ b/kernel/trace/trace_events_inject.c
@@ -328,7 +328,8 @@ event_inject_read(struct file *file, char __user *buf, size_t size,
}
const struct file_operations event_inject_fops = {
- .open = tracing_open_generic,
+ .open = tracing_open_file_tr,
.read = event_inject_read,
.write = event_inject_write,
+ .release = tracing_release_file_tr,
};
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x ef064187a9709393a981a56cce1e31880fd97107
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092004-excavate-unending-0257@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
ef064187a970 ("drm/amd/display: fix the white screen issue when >= 64GB DRAM")
c0fb85ae02b6 ("drm/amd/display: setup system context in dm_init")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ef064187a9709393a981a56cce1e31880fd97107 Mon Sep 17 00:00:00 2001
From: Yifan Zhang <yifan1.zhang(a)amd.com>
Date: Fri, 8 Sep 2023 16:46:39 +0800
Subject: [PATCH] drm/amd/display: fix the white screen issue when >= 64GB DRAM
Dropping bits 31:4 of the page table base is wrong: it makes the page
table base point to the wrong address if the physical address is beyond
64GB. Dropping bits 31:4 of page_table_start/end is unnecessary, since
dcn20_vmid_setup will do that. Also, while we are at it, clean up the
assignments using upper_32_bits()/lower_32_bits() and
AMDGPU_GPU_PAGE_SHIFT.
Cc: stable(a)vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2354
Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
Acked-by: Harry Wentland <harry.wentland(a)amd.com>
Reviewed-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang(a)amd.com>
Co-developed-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 88ba8b66de1f..6a0ea15936ae 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1274,11 +1274,15 @@ static void mmhub_read_system_context(struct amdgpu_device *adev, struct dc_phy_
pt_base = amdgpu_gmc_pd_addr(adev->gart.bo);
- page_table_start.high_part = (u32)(adev->gmc.gart_start >> 44) & 0xF;
- page_table_start.low_part = (u32)(adev->gmc.gart_start >> 12);
- page_table_end.high_part = (u32)(adev->gmc.gart_end >> 44) & 0xF;
- page_table_end.low_part = (u32)(adev->gmc.gart_end >> 12);
- page_table_base.high_part = upper_32_bits(pt_base) & 0xF;
+ page_table_start.high_part = upper_32_bits(adev->gmc.gart_start >>
+ AMDGPU_GPU_PAGE_SHIFT);
+ page_table_start.low_part = lower_32_bits(adev->gmc.gart_start >>
+ AMDGPU_GPU_PAGE_SHIFT);
+ page_table_end.high_part = upper_32_bits(adev->gmc.gart_end >>
+ AMDGPU_GPU_PAGE_SHIFT);
+ page_table_end.low_part = lower_32_bits(adev->gmc.gart_end >>
+ AMDGPU_GPU_PAGE_SHIFT);
+ page_table_base.high_part = upper_32_bits(pt_base);
page_table_base.low_part = lower_32_bits(pt_base);
pa_config->system_aperture.start_addr = (uint64_t)logical_addr_low << 18;
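The truncation is easy to see in isolation. A minimal userspace sketch
of why masking the high part with 0xF breaks once the base goes past
64GB:

#include <stdint.h>
#include <stdio.h>

static uint32_t upper32(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

int main(void)
{
    uint64_t pt_base = 0x1200000000ULL;   /* a base above 64GB (72GB) */

    /* the old code kept only four bits of the high word */
    printf("old high_part: 0x%x\n", upper32(pt_base) & 0xF);  /* 0x2  */
    printf("new high_part: 0x%x\n", upper32(pt_base));        /* 0x12 */
    return 0;
}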
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 5229a658f6453362fbb9da6bf96872ef25a7097e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092038-rebound-flashy-4766@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
5229a658f645 ("ext4: do not let fstrim block system suspend")
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()")
bd2eea8d0a6b ("ext4: remove the 'group' parameter of ext4_trim_extent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5229a658f6453362fbb9da6bf96872ef25a7097e Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:55 +0200
Subject: [PATCH] ext4: do not let fstrim block system suspend
Len Brown has reported that system suspend sometimes fails due to an
inability to freeze a task working in ext4_trim_fs() for one minute.
Trimming a large filesystem on a disk that slowly processes discard
requests can indeed take a long time. Since discard is just an advisory
call, it is perfectly fine to interrupt it at any time and return the
number of blocks discarded up to that moment. Do that when we detect
that the task is being frozen.
Cc: stable(a)kernel.org
Reported-by: Len Brown <lenb(a)kernel.org>
Suggested-by: Dave Chinner <david(a)fromorbit.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 09091adfde64..1e599305d85f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/nospec.h>
#include <linux/backing-dev.h>
+#include <linux/freezer.h>
#include <trace/events/ext4.h>
/*
@@ -6916,6 +6917,11 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
EXT4_CLUSTER_BITS(sb);
}
+static bool ext4_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6949,8 +6955,8 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
+ if (ext4_trim_interrupted())
+ return count;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -7072,6 +7078,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
for (group = first_group; group <= last_group; group++) {
+ if (ext4_trim_interrupted())
+ break;
grp = ext4_get_group_info(sb, group);
if (!grp)
continue;
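The pattern is: in a long-running advisory operation, poll for fatal
signals or a freeze attempt and return the progress made so far instead
of an error. A minimal userspace sketch with the two checks stubbed
out:

#include <stdbool.h>
#include <stdio.h>

static bool fatal_signal_pending_stub(void)
{
    return false;
}

static bool freezing_stub(void)
{
    return false;   /* flip to true to simulate a suspend attempt */
}

static bool trim_interrupted(void)
{
    return fatal_signal_pending_stub() || freezing_stub();
}

static long trim_range(long start, long end)
{
    long count = 0;

    for (long blk = start; blk <= end; blk++) {
        if (trim_interrupted())
            return count;   /* report progress, not -ERESTARTSYS */
        count++;            /* pretend block blk was discarded */
    }
    return count;
}

int main(void)
{
    printf("trimmed %ld blocks\n", trim_range(0, 1023));
    return 0;
}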
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 5229a658f6453362fbb9da6bf96872ef25a7097e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092037-underpaid-casing-6ccf@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
5229a658f645 ("ext4: do not let fstrim block system suspend")
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()")
bd2eea8d0a6b ("ext4: remove the 'group' parameter of ext4_trim_extent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5229a658f6453362fbb9da6bf96872ef25a7097e Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:55 +0200
Subject: [PATCH] ext4: do not let fstrim block system suspend
Len Brown has reported that system suspend sometimes fails due to an
inability to freeze a task working in ext4_trim_fs() for one minute.
Trimming a large filesystem on a disk that slowly processes discard
requests can indeed take a long time. Since discard is just an advisory
call, it is perfectly fine to interrupt it at any time and return the
number of blocks discarded up to that moment. Do that when we detect
that the task is being frozen.
Cc: stable(a)kernel.org
Reported-by: Len Brown <lenb(a)kernel.org>
Suggested-by: Dave Chinner <david(a)fromorbit.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 09091adfde64..1e599305d85f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/nospec.h>
#include <linux/backing-dev.h>
+#include <linux/freezer.h>
#include <trace/events/ext4.h>
/*
@@ -6916,6 +6917,11 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
EXT4_CLUSTER_BITS(sb);
}
+static bool ext4_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6949,8 +6955,8 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
+ if (ext4_trim_interrupted())
+ return count;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -7072,6 +7078,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
for (group = first_group; group <= last_group; group++) {
+ if (ext4_trim_interrupted())
+ break;
grp = ext4_get_group_info(sb, group);
if (!grp)
continue;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 5229a658f6453362fbb9da6bf96872ef25a7097e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092036-outline-champion-bd90@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
5229a658f645 ("ext4: do not let fstrim block system suspend")
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()")
bd2eea8d0a6b ("ext4: remove the 'group' parameter of ext4_trim_extent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5229a658f6453362fbb9da6bf96872ef25a7097e Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:55 +0200
Subject: [PATCH] ext4: do not let fstrim block system suspend
Len Brown has reported that system suspend sometimes fails due to an
inability to freeze a task working in ext4_trim_fs() for one minute.
Trimming a large filesystem on a disk that slowly processes discard
requests can indeed take a long time. Since discard is just an advisory
call, it is perfectly fine to interrupt it at any time and return the
number of blocks discarded up to that moment. Do that when we detect
that the task is being frozen.
Cc: stable(a)kernel.org
Reported-by: Len Brown <lenb(a)kernel.org>
Suggested-by: Dave Chinner <david(a)fromorbit.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 09091adfde64..1e599305d85f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/nospec.h>
#include <linux/backing-dev.h>
+#include <linux/freezer.h>
#include <trace/events/ext4.h>
/*
@@ -6916,6 +6917,11 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
EXT4_CLUSTER_BITS(sb);
}
+static bool ext4_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6949,8 +6955,8 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
+ if (ext4_trim_interrupted())
+ return count;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -7072,6 +7078,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
for (group = first_group; group <= last_group; group++) {
+ if (ext4_trim_interrupted())
+ break;
grp = ext4_get_group_info(sb, group);
if (!grp)
continue;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 5229a658f6453362fbb9da6bf96872ef25a7097e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092035-atop-usual-3d91@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
5229a658f645 ("ext4: do not let fstrim block system suspend")
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()")
bd2eea8d0a6b ("ext4: remove the 'group' parameter of ext4_trim_extent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5229a658f6453362fbb9da6bf96872ef25a7097e Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:55 +0200
Subject: [PATCH] ext4: do not let fstrim block system suspend
Len Brown has reported that system suspend sometimes fails due to an
inability to freeze a task working in ext4_trim_fs() for one minute.
Trimming a large filesystem on a disk that slowly processes discard
requests can indeed take a long time. Since discard is just an advisory
call, it is perfectly fine to interrupt it at any time and return the
number of blocks discarded up to that moment. Do that when we detect
that the task is being frozen.
Cc: stable(a)kernel.org
Reported-by: Len Brown <lenb(a)kernel.org>
Suggested-by: Dave Chinner <david(a)fromorbit.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 09091adfde64..1e599305d85f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/nospec.h>
#include <linux/backing-dev.h>
+#include <linux/freezer.h>
#include <trace/events/ext4.h>
/*
@@ -6916,6 +6917,11 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
EXT4_CLUSTER_BITS(sb);
}
+static bool ext4_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6949,8 +6955,8 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
+ if (ext4_trim_interrupted())
+ return count;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -7072,6 +7078,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
for (group = first_group; group <= last_group; group++) {
+ if (ext4_trim_interrupted())
+ break;
grp = ext4_get_group_info(sb, group);
if (!grp)
continue;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 5229a658f6453362fbb9da6bf96872ef25a7097e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092033-sash-truffle-be6d@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
5229a658f645 ("ext4: do not let fstrim block system suspend")
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5229a658f6453362fbb9da6bf96872ef25a7097e Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:55 +0200
Subject: [PATCH] ext4: do not let fstrim block system suspend
Len Brown has reported that system suspend sometimes fails due to an
inability to freeze a task working in ext4_trim_fs() for one minute.
Trimming a large filesystem on a disk that slowly processes discard
requests can indeed take a long time. Since discard is just an advisory
call, it is perfectly fine to interrupt it at any time and return the
number of blocks discarded up to that moment. Do that when we detect
that the task is being frozen.
Cc: stable(a)kernel.org
Reported-by: Len Brown <lenb(a)kernel.org>
Suggested-by: Dave Chinner <david(a)fromorbit.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 09091adfde64..1e599305d85f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/nospec.h>
#include <linux/backing-dev.h>
+#include <linux/freezer.h>
#include <trace/events/ext4.h>
/*
@@ -6916,6 +6917,11 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
EXT4_CLUSTER_BITS(sb);
}
+static bool ext4_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6949,8 +6955,8 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
+ if (ext4_trim_interrupted())
+ return count;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -7072,6 +7078,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
for (group = first_group; group <= last_group; group++) {
+ if (ext4_trim_interrupted())
+ break;
grp = ext4_get_group_info(sb, group);
if (!grp)
continue;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 5229a658f6453362fbb9da6bf96872ef25a7097e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092034-campsite-isolated-53bf@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
5229a658f645 ("ext4: do not let fstrim block system suspend")
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5229a658f6453362fbb9da6bf96872ef25a7097e Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:55 +0200
Subject: [PATCH] ext4: do not let fstrim block system suspend
Len Brown has reported that system suspend sometimes fails due to an
inability to freeze a task working in ext4_trim_fs() for one minute.
Trimming a large filesystem on a disk that slowly processes discard
requests can indeed take a long time. Since discard is just an advisory
call, it is perfectly fine to interrupt it at any time and return the
number of blocks discarded up to that moment. Do that when we detect
that the task is being frozen.
Cc: stable(a)kernel.org
Reported-by: Len Brown <lenb(a)kernel.org>
Suggested-by: Dave Chinner <david(a)fromorbit.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 09091adfde64..1e599305d85f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/nospec.h>
#include <linux/backing-dev.h>
+#include <linux/freezer.h>
#include <trace/events/ext4.h>
/*
@@ -6916,6 +6917,11 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
EXT4_CLUSTER_BITS(sb);
}
+static bool ext4_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6949,8 +6955,8 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
+ if (ext4_trim_interrupted())
+ return count;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -7072,6 +7078,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
for (group = first_group; group <= last_group; group++) {
+ if (ext4_trim_interrupted())
+ break;
grp = ext4_get_group_info(sb, group);
if (!grp)
continue;
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 5229a658f6453362fbb9da6bf96872ef25a7097e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092033-banker-expensive-aa8d@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
5229a658f645 ("ext4: do not let fstrim block system suspend")
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5229a658f6453362fbb9da6bf96872ef25a7097e Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:55 +0200
Subject: [PATCH] ext4: do not let fstrim block system suspend
Len Brown has reported that system suspend sometimes fails due to an
inability to freeze a task working in ext4_trim_fs() for one minute.
Trimming a large filesystem on a disk that slowly processes discard
requests can indeed take a long time. Since discard is just an advisory
call, it is perfectly fine to interrupt it at any time and return the
number of blocks discarded up to that moment. Do that when we detect
that the task is being frozen.
Cc: stable(a)kernel.org
Reported-by: Len Brown <lenb(a)kernel.org>
Suggested-by: Dave Chinner <david(a)fromorbit.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 09091adfde64..1e599305d85f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/nospec.h>
#include <linux/backing-dev.h>
+#include <linux/freezer.h>
#include <trace/events/ext4.h>
/*
@@ -6916,6 +6917,11 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
EXT4_CLUSTER_BITS(sb);
}
+static bool ext4_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6949,8 +6955,8 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
+ if (ext4_trim_interrupted())
+ return count;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -7072,6 +7078,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
for (group = first_group; group <= last_group; group++) {
+ if (ext4_trim_interrupted())
+ break;
grp = ext4_get_group_info(sb, group);
if (!grp)
continue;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092028-snore-semantic-95ac@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()")
bd2eea8d0a6b ("ext4: remove the 'group' parameter of ext4_trim_extent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:54 +0200
Subject: [PATCH] ext4: move setting of trimmed bit into
ext4_try_to_trim_range()
Currently we set the group's trimmed bit in ext4_trim_all_free() based
on the return value of ext4_try_to_trim_range(). However, when we want
to abort trimming because of a suspend attempt, we want to return
success from ext4_try_to_trim_range() without setting the trimmed bit.
Instead of implementing awkward propagation of this information, just
move the setting of the trimmed bit into ext4_try_to_trim_range(),
setting it when the whole group is trimmed.
Cc: stable(a)kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c91db9f57524..09091adfde64 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6906,6 +6906,16 @@ __acquires(bitlock)
return ret;
}
+static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
+ ext4_group_t grp)
+{
+ if (grp < ext4_get_groups_count(sb))
+ return EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ return (ext4_blocks_count(EXT4_SB(sb)->s_es) -
+ ext4_group_first_block_no(sb, grp) - 1) >>
+ EXT4_CLUSTER_BITS(sb);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6913,9 +6923,12 @@ __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
__releases(ext4_group_lock_ptr(sb, e4b->bd_group))
{
ext4_grpblk_t next, count, free_count;
+ bool set_trimmed = false;
void *bitmap;
bitmap = e4b->bd_bitmap;
+ if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
+ set_trimmed = true;
start = max(e4b->bd_info->bb_first_free, start);
count = 0;
free_count = 0;
@@ -6930,16 +6943,14 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
int ret = ext4_trim_extent(sb, start, next - start, e4b);
if (ret && ret != -EOPNOTSUPP)
- break;
+ return count;
count += next - start;
}
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current)) {
- count = -ERESTARTSYS;
- break;
- }
+ if (fatal_signal_pending(current))
+ return -ERESTARTSYS;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -6951,6 +6962,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
break;
}
+ if (set_trimmed)
+ EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+
return count;
}
@@ -6961,7 +6975,6 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
* @start: first group block to examine
* @max: last group block to examine
* @minblocks: minimum extent block count
- * @set_trimmed: set the trimmed flag if at least one block is trimmed
*
* ext4_trim_all_free walks through group's block bitmap searching for free
* extents. When the free extent is found, mark it as used in group buddy
@@ -6971,7 +6984,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
static ext4_grpblk_t
ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_grpblk_t start, ext4_grpblk_t max,
- ext4_grpblk_t minblocks, bool set_trimmed)
+ ext4_grpblk_t minblocks)
{
struct ext4_buddy e4b;
int ret;
@@ -6988,13 +7001,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_lock_group(sb, group);
if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
- minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
+ minblocks < EXT4_SB(sb)->s_last_trim_minblks)
ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
- if (ret >= 0 && set_trimmed)
- EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
- } else {
+ else
ret = 0;
- }
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
@@ -7027,7 +7037,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
ext4_fsblk_t first_data_blk =
le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
- bool whole_group, eof = false;
int ret = 0;
start = range->start >> sb->s_blocksize_bits;
@@ -7046,10 +7055,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
goto out;
}
- if (end >= max_blks - 1) {
+ if (end >= max_blks - 1)
end = max_blks - 1;
- eof = true;
- }
if (end <= first_data_blk)
goto out;
if (start < first_data_blk)
@@ -7063,7 +7070,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
/* end now represents the last cluster to discard in this group */
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- whole_group = true;
for (group = first_group; group <= last_group; group++) {
grp = ext4_get_group_info(sb, group);
@@ -7082,13 +7088,11 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
* change it for the last group, note that last_cluster is
* already computed earlier by ext4_get_group_no_and_offset()
*/
- if (group == last_group) {
+ if (group == last_group)
end = last_cluster;
- whole_group = eof ? true : end == EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- }
if (grp->bb_free >= minlen) {
cnt = ext4_trim_all_free(sb, group, first_cluster,
- end, minlen, whole_group);
+ end, minlen);
if (cnt < 0) {
ret = cnt;
break;
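A toy sketch of the refactor: the caller no longer passes a set_trimmed
flag; the worker decides up front whether the request spans the whole
group and marks it itself on completion (names are illustrative):

#include <stdbool.h>
#include <stdio.h>

static long try_to_trim_range(long start, long max, long last_cluster,
                              bool *trimmed)
{
    /* decide up front whether the request covers the whole group */
    bool set_trimmed = (start == 0 && max >= last_cluster);
    long count = 0;

    for (long b = start; b <= max && b <= last_cluster; b++)
        count++;                  /* pretend to discard block b */

    if (set_trimmed)
        *trimmed = true;          /* the caller no longer does this */
    return count;
}

int main(void)
{
    bool trimmed = false;
    long n = try_to_trim_range(0, 1023, 1023, &trimmed);

    printf("trimmed %ld blocks, group marked: %d\n", n, trimmed);
    return 0;
}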
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092026-imperfect-carnivore-b6de@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()")
bd2eea8d0a6b ("ext4: remove the 'group' parameter of ext4_trim_extent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:54 +0200
Subject: [PATCH] ext4: move setting of trimmed bit into
ext4_try_to_trim_range()
Currently we set the group's trimmed bit in ext4_trim_all_free() based
on the return value of ext4_try_to_trim_range(). However, when we want
to abort trimming because of a suspend attempt, we want to return
success from ext4_try_to_trim_range() without setting the trimmed bit.
Instead of implementing awkward propagation of this information, just
move the setting of the trimmed bit into ext4_try_to_trim_range(),
setting it when the whole group is trimmed.
Cc: stable(a)kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c91db9f57524..09091adfde64 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6906,6 +6906,16 @@ __acquires(bitlock)
return ret;
}
+static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
+ ext4_group_t grp)
+{
+ if (grp < ext4_get_groups_count(sb))
+ return EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ return (ext4_blocks_count(EXT4_SB(sb)->s_es) -
+ ext4_group_first_block_no(sb, grp) - 1) >>
+ EXT4_CLUSTER_BITS(sb);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6913,9 +6923,12 @@ __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
__releases(ext4_group_lock_ptr(sb, e4b->bd_group))
{
ext4_grpblk_t next, count, free_count;
+ bool set_trimmed = false;
void *bitmap;
bitmap = e4b->bd_bitmap;
+ if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
+ set_trimmed = true;
start = max(e4b->bd_info->bb_first_free, start);
count = 0;
free_count = 0;
@@ -6930,16 +6943,14 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
int ret = ext4_trim_extent(sb, start, next - start, e4b);
if (ret && ret != -EOPNOTSUPP)
- break;
+ return count;
count += next - start;
}
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current)) {
- count = -ERESTARTSYS;
- break;
- }
+ if (fatal_signal_pending(current))
+ return -ERESTARTSYS;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -6951,6 +6962,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
break;
}
+ if (set_trimmed)
+ EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+
return count;
}
@@ -6961,7 +6975,6 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
* @start: first group block to examine
* @max: last group block to examine
* @minblocks: minimum extent block count
- * @set_trimmed: set the trimmed flag if at least one block is trimmed
*
* ext4_trim_all_free walks through group's block bitmap searching for free
* extents. When the free extent is found, mark it as used in group buddy
@@ -6971,7 +6984,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
static ext4_grpblk_t
ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_grpblk_t start, ext4_grpblk_t max,
- ext4_grpblk_t minblocks, bool set_trimmed)
+ ext4_grpblk_t minblocks)
{
struct ext4_buddy e4b;
int ret;
@@ -6988,13 +7001,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_lock_group(sb, group);
if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
- minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
+ minblocks < EXT4_SB(sb)->s_last_trim_minblks)
ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
- if (ret >= 0 && set_trimmed)
- EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
- } else {
+ else
ret = 0;
- }
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
@@ -7027,7 +7037,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
ext4_fsblk_t first_data_blk =
le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
- bool whole_group, eof = false;
int ret = 0;
start = range->start >> sb->s_blocksize_bits;
@@ -7046,10 +7055,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
goto out;
}
- if (end >= max_blks - 1) {
+ if (end >= max_blks - 1)
end = max_blks - 1;
- eof = true;
- }
if (end <= first_data_blk)
goto out;
if (start < first_data_blk)
@@ -7063,7 +7070,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
/* end now represents the last cluster to discard in this group */
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- whole_group = true;
for (group = first_group; group <= last_group; group++) {
grp = ext4_get_group_info(sb, group);
@@ -7082,13 +7088,11 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
* change it for the last group, note that last_cluster is
* already computed earlier by ext4_get_group_no_and_offset()
*/
- if (group == last_group) {
+ if (group == last_group)
end = last_cluster;
- whole_group = eof ? true : end == EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- }
if (grp->bb_free >= minlen) {
cnt = ext4_trim_all_free(sb, group, first_cluster,
- end, minlen, whole_group);
+ end, minlen);
if (cnt < 0) {
ret = cnt;
break;
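As an aside for readers following the diff above: the arithmetic in the
new ext4_last_grp_cluster() helper can be checked outside the kernel.
The standalone C sketch below is purely illustrative -- the toy geometry
constants and the last_grp_cluster()/full_groups names are invented for
this example and are not ext4 APIs -- and shows how the last cluster
index of a partial tail group falls out of the total block count.

#include <stdio.h>
#include <stdint.h>

/* toy geometry: one block per cluster, 32768 clusters per group */
#define CLUSTER_BITS		0
#define CLUSTERS_PER_GROUP	32768ULL

/* last usable cluster index within @grp for a fs of @total_blocks */
static uint64_t last_grp_cluster(uint64_t total_blocks, uint64_t grp,
				 uint64_t full_groups)
{
	uint64_t group_first_block = grp * (CLUSTERS_PER_GROUP << CLUSTER_BITS);

	if (grp < full_groups)		/* a fully populated group */
		return CLUSTERS_PER_GROUP - 1;
	/* the tail group may be partial: derive its size from the fs end */
	return (total_blocks - group_first_block - 1) >> CLUSTER_BITS;
}

int main(void)
{
	/* 100000 blocks: 3 full groups of 32768 plus a 1696-block tail */
	uint64_t total = 100000;

	printf("group 1: %llu\n",
	       (unsigned long long)last_grp_cluster(total, 1, 3)); /* 32767 */
	printf("group 3: %llu\n",
	       (unsigned long long)last_grp_cluster(total, 3, 3)); /* 1695 */
	return 0;
}

The comparison added to ext4_try_to_trim_range() then reads naturally:
if the request started at cluster 0 and max reaches this last-cluster
value, the whole group is being trimmed, so the function can set the
trimmed bit itself.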
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092027-worried-darkened-ffcc@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()")
bd2eea8d0a6b ("ext4: remove the 'group' parameter of ext4_trim_extent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:54 +0200
Subject: [PATCH] ext4: move setting of trimmed bit into
ext4_try_to_trim_range()
Currently we set the group's trimmed bit in ext4_trim_all_free() based
on the return value of ext4_try_to_trim_range(). However, when we later
want to abort trimming because of a suspend attempt, we want to return
success from ext4_try_to_trim_range() but not set the trimmed bit.
Instead of implementing awkward propagation of this information, just
move the setting of the trimmed bit into ext4_try_to_trim_range(),
which sets it only when the whole group was trimmed.
Cc: stable(a)kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c91db9f57524..09091adfde64 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6906,6 +6906,16 @@ __acquires(bitlock)
return ret;
}
+static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
+ ext4_group_t grp)
+{
+ if (grp < ext4_get_groups_count(sb))
+ return EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ return (ext4_blocks_count(EXT4_SB(sb)->s_es) -
+ ext4_group_first_block_no(sb, grp) - 1) >>
+ EXT4_CLUSTER_BITS(sb);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6913,9 +6923,12 @@ __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
__releases(ext4_group_lock_ptr(sb, e4b->bd_group))
{
ext4_grpblk_t next, count, free_count;
+ bool set_trimmed = false;
void *bitmap;
bitmap = e4b->bd_bitmap;
+ if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
+ set_trimmed = true;
start = max(e4b->bd_info->bb_first_free, start);
count = 0;
free_count = 0;
@@ -6930,16 +6943,14 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
int ret = ext4_trim_extent(sb, start, next - start, e4b);
if (ret && ret != -EOPNOTSUPP)
- break;
+ return count;
count += next - start;
}
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current)) {
- count = -ERESTARTSYS;
- break;
- }
+ if (fatal_signal_pending(current))
+ return -ERESTARTSYS;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -6951,6 +6962,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
break;
}
+ if (set_trimmed)
+ EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+
return count;
}
@@ -6961,7 +6975,6 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
* @start: first group block to examine
* @max: last group block to examine
* @minblocks: minimum extent block count
- * @set_trimmed: set the trimmed flag if at least one block is trimmed
*
* ext4_trim_all_free walks through group's block bitmap searching for free
* extents. When the free extent is found, mark it as used in group buddy
@@ -6971,7 +6984,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
static ext4_grpblk_t
ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_grpblk_t start, ext4_grpblk_t max,
- ext4_grpblk_t minblocks, bool set_trimmed)
+ ext4_grpblk_t minblocks)
{
struct ext4_buddy e4b;
int ret;
@@ -6988,13 +7001,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_lock_group(sb, group);
if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
- minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
+ minblocks < EXT4_SB(sb)->s_last_trim_minblks)
ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
- if (ret >= 0 && set_trimmed)
- EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
- } else {
+ else
ret = 0;
- }
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
@@ -7027,7 +7037,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
ext4_fsblk_t first_data_blk =
le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
- bool whole_group, eof = false;
int ret = 0;
start = range->start >> sb->s_blocksize_bits;
@@ -7046,10 +7055,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
goto out;
}
- if (end >= max_blks - 1) {
+ if (end >= max_blks - 1)
end = max_blks - 1;
- eof = true;
- }
if (end <= first_data_blk)
goto out;
if (start < first_data_blk)
@@ -7063,7 +7070,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
/* end now represents the last cluster to discard in this group */
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- whole_group = true;
for (group = first_group; group <= last_group; group++) {
grp = ext4_get_group_info(sb, group);
@@ -7082,13 +7088,11 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
* change it for the last group, note that last_cluster is
* already computed earlier by ext4_get_group_no_and_offset()
*/
- if (group == last_group) {
+ if (group == last_group)
end = last_cluster;
- whole_group = eof ? true : end == EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- }
if (grp->bb_free >= minlen) {
cnt = ext4_trim_all_free(sb, group, first_cluster,
- end, minlen, whole_group);
+ end, minlen);
if (cnt < 0) {
ret = cnt;
break;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092026-charger-blah-7144@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()")
bd2eea8d0a6b ("ext4: remove the 'group' parameter of ext4_trim_extent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:54 +0200
Subject: [PATCH] ext4: move setting of trimmed bit into
ext4_try_to_trim_range()
Currently we set the group's trimmed bit in ext4_trim_all_free() based
on the return value of ext4_try_to_trim_range(). However, when we later
want to abort trimming because of a suspend attempt, we want to return
success from ext4_try_to_trim_range() but not set the trimmed bit.
Instead of implementing awkward propagation of this information, just
move the setting of the trimmed bit into ext4_try_to_trim_range(),
which sets it only when the whole group was trimmed.
Cc: stable(a)kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c91db9f57524..09091adfde64 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6906,6 +6906,16 @@ __acquires(bitlock)
return ret;
}
+static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
+ ext4_group_t grp)
+{
+ if (grp < ext4_get_groups_count(sb))
+ return EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ return (ext4_blocks_count(EXT4_SB(sb)->s_es) -
+ ext4_group_first_block_no(sb, grp) - 1) >>
+ EXT4_CLUSTER_BITS(sb);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6913,9 +6923,12 @@ __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
__releases(ext4_group_lock_ptr(sb, e4b->bd_group))
{
ext4_grpblk_t next, count, free_count;
+ bool set_trimmed = false;
void *bitmap;
bitmap = e4b->bd_bitmap;
+ if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
+ set_trimmed = true;
start = max(e4b->bd_info->bb_first_free, start);
count = 0;
free_count = 0;
@@ -6930,16 +6943,14 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
int ret = ext4_trim_extent(sb, start, next - start, e4b);
if (ret && ret != -EOPNOTSUPP)
- break;
+ return count;
count += next - start;
}
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current)) {
- count = -ERESTARTSYS;
- break;
- }
+ if (fatal_signal_pending(current))
+ return -ERESTARTSYS;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -6951,6 +6962,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
break;
}
+ if (set_trimmed)
+ EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+
return count;
}
@@ -6961,7 +6975,6 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
* @start: first group block to examine
* @max: last group block to examine
* @minblocks: minimum extent block count
- * @set_trimmed: set the trimmed flag if at least one block is trimmed
*
* ext4_trim_all_free walks through group's block bitmap searching for free
* extents. When the free extent is found, mark it as used in group buddy
@@ -6971,7 +6984,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
static ext4_grpblk_t
ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_grpblk_t start, ext4_grpblk_t max,
- ext4_grpblk_t minblocks, bool set_trimmed)
+ ext4_grpblk_t minblocks)
{
struct ext4_buddy e4b;
int ret;
@@ -6988,13 +7001,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_lock_group(sb, group);
if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
- minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
+ minblocks < EXT4_SB(sb)->s_last_trim_minblks)
ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
- if (ret >= 0 && set_trimmed)
- EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
- } else {
+ else
ret = 0;
- }
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
@@ -7027,7 +7037,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
ext4_fsblk_t first_data_blk =
le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
- bool whole_group, eof = false;
int ret = 0;
start = range->start >> sb->s_blocksize_bits;
@@ -7046,10 +7055,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
goto out;
}
- if (end >= max_blks - 1) {
+ if (end >= max_blks - 1)
end = max_blks - 1;
- eof = true;
- }
if (end <= first_data_blk)
goto out;
if (start < first_data_blk)
@@ -7063,7 +7070,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
/* end now represents the last cluster to discard in this group */
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- whole_group = true;
for (group = first_group; group <= last_group; group++) {
grp = ext4_get_group_info(sb, group);
@@ -7082,13 +7088,11 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
* change it for the last group, note that last_cluster is
* already computed earlier by ext4_get_group_no_and_offset()
*/
- if (group == last_group) {
+ if (group == last_group)
end = last_cluster;
- whole_group = eof ? true : end == EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- }
if (grp->bb_free >= minlen) {
cnt = ext4_trim_all_free(sb, group, first_cluster,
- end, minlen, whole_group);
+ end, minlen);
if (cnt < 0) {
ret = cnt;
break;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092025-handlebar-decimal-2fff@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
d63c00ea435a ("ext4: mark group as trimmed only if it was fully scanned")
2327fb2e2341 ("ext4: change s_last_trim_minblks type to unsigned long")
173b6e383d2a ("ext4: avoid trim error on fs with small groups")
afcc4e32f606 ("ext4: scope ret locally in ext4_try_to_trim_range()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:54 +0200
Subject: [PATCH] ext4: move setting of trimmed bit into
ext4_try_to_trim_range()
Currently we set the group's trimmed bit in ext4_trim_all_free() based
on the return value of ext4_try_to_trim_range(). However, when we later
want to abort trimming because of a suspend attempt, we want to return
success from ext4_try_to_trim_range() but not set the trimmed bit.
Instead of implementing awkward propagation of this information, just
move the setting of the trimmed bit into ext4_try_to_trim_range(),
which sets it only when the whole group was trimmed.
Cc: stable(a)kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c91db9f57524..09091adfde64 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6906,6 +6906,16 @@ __acquires(bitlock)
return ret;
}
+static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
+ ext4_group_t grp)
+{
+ if (grp < ext4_get_groups_count(sb))
+ return EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ return (ext4_blocks_count(EXT4_SB(sb)->s_es) -
+ ext4_group_first_block_no(sb, grp) - 1) >>
+ EXT4_CLUSTER_BITS(sb);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6913,9 +6923,12 @@ __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
__releases(ext4_group_lock_ptr(sb, e4b->bd_group))
{
ext4_grpblk_t next, count, free_count;
+ bool set_trimmed = false;
void *bitmap;
bitmap = e4b->bd_bitmap;
+ if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
+ set_trimmed = true;
start = max(e4b->bd_info->bb_first_free, start);
count = 0;
free_count = 0;
@@ -6930,16 +6943,14 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
int ret = ext4_trim_extent(sb, start, next - start, e4b);
if (ret && ret != -EOPNOTSUPP)
- break;
+ return count;
count += next - start;
}
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current)) {
- count = -ERESTARTSYS;
- break;
- }
+ if (fatal_signal_pending(current))
+ return -ERESTARTSYS;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -6951,6 +6962,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
break;
}
+ if (set_trimmed)
+ EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+
return count;
}
@@ -6961,7 +6975,6 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
* @start: first group block to examine
* @max: last group block to examine
* @minblocks: minimum extent block count
- * @set_trimmed: set the trimmed flag if at least one block is trimmed
*
* ext4_trim_all_free walks through group's block bitmap searching for free
* extents. When the free extent is found, mark it as used in group buddy
@@ -6971,7 +6984,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
static ext4_grpblk_t
ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_grpblk_t start, ext4_grpblk_t max,
- ext4_grpblk_t minblocks, bool set_trimmed)
+ ext4_grpblk_t minblocks)
{
struct ext4_buddy e4b;
int ret;
@@ -6988,13 +7001,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_lock_group(sb, group);
if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
- minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
+ minblocks < EXT4_SB(sb)->s_last_trim_minblks)
ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
- if (ret >= 0 && set_trimmed)
- EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
- } else {
+ else
ret = 0;
- }
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
@@ -7027,7 +7037,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
ext4_fsblk_t first_data_blk =
le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
- bool whole_group, eof = false;
int ret = 0;
start = range->start >> sb->s_blocksize_bits;
@@ -7046,10 +7055,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
goto out;
}
- if (end >= max_blks - 1) {
+ if (end >= max_blks - 1)
end = max_blks - 1;
- eof = true;
- }
if (end <= first_data_blk)
goto out;
if (start < first_data_blk)
@@ -7063,7 +7070,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
/* end now represents the last cluster to discard in this group */
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- whole_group = true;
for (group = first_group; group <= last_group; group++) {
grp = ext4_get_group_info(sb, group);
@@ -7082,13 +7088,11 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
* change it for the last group, note that last_cluster is
* already computed earlier by ext4_get_group_no_and_offset()
*/
- if (group == last_group) {
+ if (group == last_group)
end = last_cluster;
- whole_group = eof ? true : end == EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- }
if (grp->bb_free >= minlen) {
cnt = ext4_trim_all_free(sb, group, first_cluster,
- end, minlen, whole_group);
+ end, minlen);
if (cnt < 0) {
ret = cnt;
break;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092024-quintuple-veteran-81ec@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:54 +0200
Subject: [PATCH] ext4: move setting of trimmed bit into
ext4_try_to_trim_range()
Currently we set the group's trimmed bit in ext4_trim_all_free() based
on the return value of ext4_try_to_trim_range(). However, when we later
want to abort trimming because of a suspend attempt, we want to return
success from ext4_try_to_trim_range() but not set the trimmed bit.
Instead of implementing awkward propagation of this information, just
move the setting of the trimmed bit into ext4_try_to_trim_range(),
which sets it only when the whole group was trimmed.
Cc: stable(a)kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c91db9f57524..09091adfde64 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6906,6 +6906,16 @@ __acquires(bitlock)
return ret;
}
+static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
+ ext4_group_t grp)
+{
+ if (grp < ext4_get_groups_count(sb))
+ return EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ return (ext4_blocks_count(EXT4_SB(sb)->s_es) -
+ ext4_group_first_block_no(sb, grp) - 1) >>
+ EXT4_CLUSTER_BITS(sb);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6913,9 +6923,12 @@ __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
__releases(ext4_group_lock_ptr(sb, e4b->bd_group))
{
ext4_grpblk_t next, count, free_count;
+ bool set_trimmed = false;
void *bitmap;
bitmap = e4b->bd_bitmap;
+ if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
+ set_trimmed = true;
start = max(e4b->bd_info->bb_first_free, start);
count = 0;
free_count = 0;
@@ -6930,16 +6943,14 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
int ret = ext4_trim_extent(sb, start, next - start, e4b);
if (ret && ret != -EOPNOTSUPP)
- break;
+ return count;
count += next - start;
}
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current)) {
- count = -ERESTARTSYS;
- break;
- }
+ if (fatal_signal_pending(current))
+ return -ERESTARTSYS;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -6951,6 +6962,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
break;
}
+ if (set_trimmed)
+ EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+
return count;
}
@@ -6961,7 +6975,6 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
* @start: first group block to examine
* @max: last group block to examine
* @minblocks: minimum extent block count
- * @set_trimmed: set the trimmed flag if at least one block is trimmed
*
* ext4_trim_all_free walks through group's block bitmap searching for free
* extents. When the free extent is found, mark it as used in group buddy
@@ -6971,7 +6984,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
static ext4_grpblk_t
ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_grpblk_t start, ext4_grpblk_t max,
- ext4_grpblk_t minblocks, bool set_trimmed)
+ ext4_grpblk_t minblocks)
{
struct ext4_buddy e4b;
int ret;
@@ -6988,13 +7001,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_lock_group(sb, group);
if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
- minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
+ minblocks < EXT4_SB(sb)->s_last_trim_minblks)
ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
- if (ret >= 0 && set_trimmed)
- EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
- } else {
+ else
ret = 0;
- }
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
@@ -7027,7 +7037,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
ext4_fsblk_t first_data_blk =
le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
- bool whole_group, eof = false;
int ret = 0;
start = range->start >> sb->s_blocksize_bits;
@@ -7046,10 +7055,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
goto out;
}
- if (end >= max_blks - 1) {
+ if (end >= max_blks - 1)
end = max_blks - 1;
- eof = true;
- }
if (end <= first_data_blk)
goto out;
if (start < first_data_blk)
@@ -7063,7 +7070,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
/* end now represents the last cluster to discard in this group */
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- whole_group = true;
for (group = first_group; group <= last_group; group++) {
grp = ext4_get_group_info(sb, group);
@@ -7082,13 +7088,11 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
* change it for the last group, note that last_cluster is
* already computed earlier by ext4_get_group_no_and_offset()
*/
- if (group == last_group) {
+ if (group == last_group)
end = last_cluster;
- whole_group = eof ? true : end == EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- }
if (grp->bb_free >= minlen) {
cnt = ext4_trim_all_free(sb, group, first_cluster,
- end, minlen, whole_group);
+ end, minlen);
if (cnt < 0) {
ret = cnt;
break;
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092023-outfit-renounce-262c@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
de8bf0e5ee74 ("ext4: replace the traditional ternary conditional operator with with max()/min()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 13 Sep 2023 17:04:54 +0200
Subject: [PATCH] ext4: move setting of trimmed bit into
ext4_try_to_trim_range()
Currently we set the group's trimmed bit in ext4_trim_all_free() based
on the return value of ext4_try_to_trim_range(). However, when we later
want to abort trimming because of a suspend attempt, we want to return
success from ext4_try_to_trim_range() but not set the trimmed bit.
Instead of implementing awkward propagation of this information, just
move the setting of the trimmed bit into ext4_try_to_trim_range(),
which sets it only when the whole group was trimmed.
Cc: stable(a)kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c91db9f57524..09091adfde64 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6906,6 +6906,16 @@ __acquires(bitlock)
return ret;
}
+static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
+ ext4_group_t grp)
+{
+ if (grp < ext4_get_groups_count(sb))
+ return EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ return (ext4_blocks_count(EXT4_SB(sb)->s_es) -
+ ext4_group_first_block_no(sb, grp) - 1) >>
+ EXT4_CLUSTER_BITS(sb);
+}
+
static int ext4_try_to_trim_range(struct super_block *sb,
struct ext4_buddy *e4b, ext4_grpblk_t start,
ext4_grpblk_t max, ext4_grpblk_t minblocks)
@@ -6913,9 +6923,12 @@ __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
__releases(ext4_group_lock_ptr(sb, e4b->bd_group))
{
ext4_grpblk_t next, count, free_count;
+ bool set_trimmed = false;
void *bitmap;
bitmap = e4b->bd_bitmap;
+ if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
+ set_trimmed = true;
start = max(e4b->bd_info->bb_first_free, start);
count = 0;
free_count = 0;
@@ -6930,16 +6943,14 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
int ret = ext4_trim_extent(sb, start, next - start, e4b);
if (ret && ret != -EOPNOTSUPP)
- break;
+ return count;
count += next - start;
}
free_count += next - start;
start = next + 1;
- if (fatal_signal_pending(current)) {
- count = -ERESTARTSYS;
- break;
- }
+ if (fatal_signal_pending(current))
+ return -ERESTARTSYS;
if (need_resched()) {
ext4_unlock_group(sb, e4b->bd_group);
@@ -6951,6 +6962,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
break;
}
+ if (set_trimmed)
+ EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+
return count;
}
@@ -6961,7 +6975,6 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
* @start: first group block to examine
* @max: last group block to examine
* @minblocks: minimum extent block count
- * @set_trimmed: set the trimmed flag if at least one block is trimmed
*
* ext4_trim_all_free walks through group's block bitmap searching for free
* extents. When the free extent is found, mark it as used in group buddy
@@ -6971,7 +6984,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
static ext4_grpblk_t
ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_grpblk_t start, ext4_grpblk_t max,
- ext4_grpblk_t minblocks, bool set_trimmed)
+ ext4_grpblk_t minblocks)
{
struct ext4_buddy e4b;
int ret;
@@ -6988,13 +7001,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_lock_group(sb, group);
if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
- minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
+ minblocks < EXT4_SB(sb)->s_last_trim_minblks)
ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
- if (ret >= 0 && set_trimmed)
- EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
- } else {
+ else
ret = 0;
- }
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
@@ -7027,7 +7037,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
ext4_fsblk_t first_data_blk =
le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
- bool whole_group, eof = false;
int ret = 0;
start = range->start >> sb->s_blocksize_bits;
@@ -7046,10 +7055,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
goto out;
}
- if (end >= max_blks - 1) {
+ if (end >= max_blks - 1)
end = max_blks - 1;
- eof = true;
- }
if (end <= first_data_blk)
goto out;
if (start < first_data_blk)
@@ -7063,7 +7070,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
/* end now represents the last cluster to discard in this group */
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- whole_group = true;
for (group = first_group; group <= last_group; group++) {
grp = ext4_get_group_info(sb, group);
@@ -7082,13 +7088,11 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
* change it for the last group, note that last_cluster is
* already computed earlier by ext4_get_group_no_and_offset()
*/
- if (group == last_group) {
+ if (group == last_group)
end = last_cluster;
- whole_group = eof ? true : end == EXT4_CLUSTERS_PER_GROUP(sb) - 1;
- }
if (grp->bb_free >= minlen) {
cnt = ext4_trim_all_free(sb, group, first_cluster,
- end, minlen, whole_group);
+ end, minlen);
if (cnt < 0) {
ret = cnt;
break;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 737dd811a3dbfd7edd4ad2ba5152e93d99074f83
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092018-italics-animation-cbfc@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
737dd811a3db ("ata: libahci: clear pending interrupt status")
93c7711494f4 ("ata: ahci: Drop pointless VPRINTK() calls and convert the remaining ones")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 737dd811a3dbfd7edd4ad2ba5152e93d99074f83 Mon Sep 17 00:00:00 2001
From: Szuying Chen <chensiying21(a)gmail.com>
Date: Thu, 7 Sep 2023 16:17:10 +0800
Subject: [PATCH] ata: libahci: clear pending interrupt status
When a CRC error occurs, the HBA asserts an interrupt to indicate an
interface fatal error (PxIS.IFS). The ISR clears PxIE and PxIS, then
does error recovery. If the adapter receives another SDB FIS
with an error (PxIS.TFES) from the device before the start of the EH
recovery process, the interrupt signaling the new SDB cannot be
serviced as PxIE was already cleared. This in turn leaves the HBA
unable to issue any command during the error recovery process after
setting PxCMD.ST to 1, because PxIS.TFES is still set.
According to the AHCI 1.3.1 specification, section 6.2.2, fatal errors
notified by setting PxIS.HBFS, PxIS.HBDS, PxIS.IFS or PxIS.TFES will
cause the HBA to enter the ERR:Fatal state. In this state, the HBA
shall not issue any new commands.
To avoid this situation, introduce the function
ahci_port_clear_pending_irq() to clear pending interrupts before
executing a COMRESET. This follows the AHCI 1.3.1 specification,
section 6.2.2.2.
Signed-off-by: Szuying Chen <Chloe_Chen(a)asmedia.com.tw>
Fixes: e0bfd149973d ("[PATCH] ahci: stop engine during hard reset")
Cc: stable(a)vger.kernel.org
Reviewed-by: Niklas Cassel <niklas.cassel(a)wdc.com>
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index e2bacedf28ef..f1263364fa97 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -1256,6 +1256,26 @@ static ssize_t ahci_activity_show(struct ata_device *dev, char *buf)
return sprintf(buf, "%d\n", emp->blink_policy);
}
+static void ahci_port_clear_pending_irq(struct ata_port *ap)
+{
+ struct ahci_host_priv *hpriv = ap->host->private_data;
+ void __iomem *port_mmio = ahci_port_base(ap);
+ u32 tmp;
+
+ /* clear SError */
+ tmp = readl(port_mmio + PORT_SCR_ERR);
+ dev_dbg(ap->host->dev, "PORT_SCR_ERR 0x%x\n", tmp);
+ writel(tmp, port_mmio + PORT_SCR_ERR);
+
+ /* clear port IRQ */
+ tmp = readl(port_mmio + PORT_IRQ_STAT);
+ dev_dbg(ap->host->dev, "PORT_IRQ_STAT 0x%x\n", tmp);
+ if (tmp)
+ writel(tmp, port_mmio + PORT_IRQ_STAT);
+
+ writel(1 << ap->port_no, hpriv->mmio + HOST_IRQ_STAT);
+}
+
static void ahci_port_init(struct device *dev, struct ata_port *ap,
int port_no, void __iomem *mmio,
void __iomem *port_mmio)
@@ -1270,18 +1290,7 @@ static void ahci_port_init(struct device *dev, struct ata_port *ap,
if (rc)
dev_warn(dev, "%s (%d)\n", emsg, rc);
- /* clear SError */
- tmp = readl(port_mmio + PORT_SCR_ERR);
- dev_dbg(dev, "PORT_SCR_ERR 0x%x\n", tmp);
- writel(tmp, port_mmio + PORT_SCR_ERR);
-
- /* clear port IRQ */
- tmp = readl(port_mmio + PORT_IRQ_STAT);
- dev_dbg(dev, "PORT_IRQ_STAT 0x%x\n", tmp);
- if (tmp)
- writel(tmp, port_mmio + PORT_IRQ_STAT);
-
- writel(1 << port_no, mmio + HOST_IRQ_STAT);
+ ahci_port_clear_pending_irq(ap);
/* mark esata ports */
tmp = readl(port_mmio + PORT_CMD);
@@ -1603,6 +1612,8 @@ int ahci_do_hardreset(struct ata_link *link, unsigned int *class,
tf.status = ATA_BUSY;
ata_tf_to_fis(&tf, 0, 0, d2h_fis);
+ ahci_port_clear_pending_irq(ap);
+
rc = sata_link_hardreset(link, timing, deadline, online,
ahci_check_ready);
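The two register writes at the heart of ahci_port_clear_pending_irq()
rely on PORT_SCR_ERR and PORT_IRQ_STAT being write-1-to-clear (RW1C)
registers: software reads the latched bits and writes the same value
back to acknowledge them. Below is a minimal userspace simulation of
that pattern -- the fake_port_regs layout and helper names are invented
here; only the read-then-write-back idiom mirrors the patch.

#include <stdio.h>
#include <stdint.h>

struct fake_port_regs {
	uint32_t scr_err;	/* stands in for PORT_SCR_ERR  */
	uint32_t irq_stat;	/* stands in for PORT_IRQ_STAT */
};

/* RW1C semantics: writing 1 clears a bit, writing 0 leaves it alone */
static void w1c_write(uint32_t *reg, uint32_t val)
{
	*reg &= ~val;
}

static void clear_pending(struct fake_port_regs *p)
{
	uint32_t tmp;

	tmp = p->scr_err;		/* readl(port_mmio + PORT_SCR_ERR) */
	w1c_write(&p->scr_err, tmp);	/* ack every latched error bit */

	tmp = p->irq_stat;		/* readl(port_mmio + PORT_IRQ_STAT) */
	if (tmp)
		w1c_write(&p->irq_stat, tmp);
}

int main(void)
{
	/* pretend a taskfile error (PxIS bit 30, TFES) is still latched */
	struct fake_port_regs port = { .scr_err = 0x1, .irq_stat = 1u << 30 };

	clear_pending(&port);
	printf("irq_stat after ack: 0x%x\n", (unsigned)port.irq_stat); /* 0x0 */
	return 0;
}

Doing this immediately before sata_link_hardreset() ensures no stale
PxIS.TFES survives into the COMRESET, which is what previously left the
port stuck in the ERR:Fatal state.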
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 737dd811a3dbfd7edd4ad2ba5152e93d99074f83
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092016-cut-spearfish-b5dd@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
737dd811a3db ("ata: libahci: clear pending interrupt status")
93c7711494f4 ("ata: ahci: Drop pointless VPRINTK() calls and convert the remaining ones")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 737dd811a3dbfd7edd4ad2ba5152e93d99074f83 Mon Sep 17 00:00:00 2001
From: Szuying Chen <chensiying21(a)gmail.com>
Date: Thu, 7 Sep 2023 16:17:10 +0800
Subject: [PATCH] ata: libahci: clear pending interrupt status
When a CRC error occurs, the HBA asserts an interrupt to indicate an
interface fatal error (PxIS.IFS). The ISR clears PxIE and PxIS, then
does error recovery. If the adapter receives another SDB FIS
with an error (PxIS.TFES) from the device before the start of the EH
recovery process, the interrupt signaling the new SDB cannot be
serviced as PxIE was already cleared. This in turn leaves the HBA
unable to issue any command during the error recovery process after
setting PxCMD.ST to 1, because PxIS.TFES is still set.
According to the AHCI 1.3.1 specification, section 6.2.2, fatal errors
notified by setting PxIS.HBFS, PxIS.HBDS, PxIS.IFS or PxIS.TFES will
cause the HBA to enter the ERR:Fatal state. In this state, the HBA
shall not issue any new commands.
To avoid this situation, introduce the function
ahci_port_clear_pending_irq() to clear pending interrupts before
executing a COMRESET. This follows the AHCI 1.3.1 specification,
section 6.2.2.2.
Signed-off-by: Szuying Chen <Chloe_Chen(a)asmedia.com.tw>
Fixes: e0bfd149973d ("[PATCH] ahci: stop engine during hard reset")
Cc: stable(a)vger.kernel.org
Reviewed-by: Niklas Cassel <niklas.cassel(a)wdc.com>
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index e2bacedf28ef..f1263364fa97 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -1256,6 +1256,26 @@ static ssize_t ahci_activity_show(struct ata_device *dev, char *buf)
return sprintf(buf, "%d\n", emp->blink_policy);
}
+static void ahci_port_clear_pending_irq(struct ata_port *ap)
+{
+ struct ahci_host_priv *hpriv = ap->host->private_data;
+ void __iomem *port_mmio = ahci_port_base(ap);
+ u32 tmp;
+
+ /* clear SError */
+ tmp = readl(port_mmio + PORT_SCR_ERR);
+ dev_dbg(ap->host->dev, "PORT_SCR_ERR 0x%x\n", tmp);
+ writel(tmp, port_mmio + PORT_SCR_ERR);
+
+ /* clear port IRQ */
+ tmp = readl(port_mmio + PORT_IRQ_STAT);
+ dev_dbg(ap->host->dev, "PORT_IRQ_STAT 0x%x\n", tmp);
+ if (tmp)
+ writel(tmp, port_mmio + PORT_IRQ_STAT);
+
+ writel(1 << ap->port_no, hpriv->mmio + HOST_IRQ_STAT);
+}
+
static void ahci_port_init(struct device *dev, struct ata_port *ap,
int port_no, void __iomem *mmio,
void __iomem *port_mmio)
@@ -1270,18 +1290,7 @@ static void ahci_port_init(struct device *dev, struct ata_port *ap,
if (rc)
dev_warn(dev, "%s (%d)\n", emsg, rc);
- /* clear SError */
- tmp = readl(port_mmio + PORT_SCR_ERR);
- dev_dbg(dev, "PORT_SCR_ERR 0x%x\n", tmp);
- writel(tmp, port_mmio + PORT_SCR_ERR);
-
- /* clear port IRQ */
- tmp = readl(port_mmio + PORT_IRQ_STAT);
- dev_dbg(dev, "PORT_IRQ_STAT 0x%x\n", tmp);
- if (tmp)
- writel(tmp, port_mmio + PORT_IRQ_STAT);
-
- writel(1 << port_no, mmio + HOST_IRQ_STAT);
+ ahci_port_clear_pending_irq(ap);
/* mark esata ports */
tmp = readl(port_mmio + PORT_CMD);
@@ -1603,6 +1612,8 @@ int ahci_do_hardreset(struct ata_link *link, unsigned int *class,
tf.status = ATA_BUSY;
ata_tf_to_fis(&tf, 0, 0, d2h_fis);
+ ahci_port_clear_pending_irq(ap);
+
rc = sata_link_hardreset(link, timing, deadline, online,
ahci_check_ready);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 737dd811a3dbfd7edd4ad2ba5152e93d99074f83
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092015-hazard-genre-fff0@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
737dd811a3db ("ata: libahci: clear pending interrupt status")
93c7711494f4 ("ata: ahci: Drop pointless VPRINTK() calls and convert the remaining ones")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 737dd811a3dbfd7edd4ad2ba5152e93d99074f83 Mon Sep 17 00:00:00 2001
From: Szuying Chen <chensiying21(a)gmail.com>
Date: Thu, 7 Sep 2023 16:17:10 +0800
Subject: [PATCH] ata: libahci: clear pending interrupt status
When a CRC error occurs, the HBA asserts an interrupt to indicate an
interface fatal error (PxIS.IFS). The ISR clears PxIE and PxIS, then
does error recovery. If the adapter receives another SDB FIS
with an error (PxIS.TFES) from the device before the start of the EH
recovery process, the interrupt signaling the new SDB cannot be
serviced as PxIE was already cleared. This in turn leaves the HBA
unable to issue any command during the error recovery process after
setting PxCMD.ST to 1, because PxIS.TFES is still set.
According to the AHCI 1.3.1 specification, section 6.2.2, fatal errors
notified by setting PxIS.HBFS, PxIS.HBDS, PxIS.IFS or PxIS.TFES will
cause the HBA to enter the ERR:Fatal state. In this state, the HBA
shall not issue any new commands.
To avoid this situation, introduce the function
ahci_port_clear_pending_irq() to clear pending interrupts before
executing a COMRESET. This follows the AHCI 1.3.1 specification,
section 6.2.2.2.
Signed-off-by: Szuying Chen <Chloe_Chen(a)asmedia.com.tw>
Fixes: e0bfd149973d ("[PATCH] ahci: stop engine during hard reset")
Cc: stable(a)vger.kernel.org
Reviewed-by: Niklas Cassel <niklas.cassel(a)wdc.com>
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index e2bacedf28ef..f1263364fa97 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -1256,6 +1256,26 @@ static ssize_t ahci_activity_show(struct ata_device *dev, char *buf)
return sprintf(buf, "%d\n", emp->blink_policy);
}
+static void ahci_port_clear_pending_irq(struct ata_port *ap)
+{
+ struct ahci_host_priv *hpriv = ap->host->private_data;
+ void __iomem *port_mmio = ahci_port_base(ap);
+ u32 tmp;
+
+ /* clear SError */
+ tmp = readl(port_mmio + PORT_SCR_ERR);
+ dev_dbg(ap->host->dev, "PORT_SCR_ERR 0x%x\n", tmp);
+ writel(tmp, port_mmio + PORT_SCR_ERR);
+
+ /* clear port IRQ */
+ tmp = readl(port_mmio + PORT_IRQ_STAT);
+ dev_dbg(ap->host->dev, "PORT_IRQ_STAT 0x%x\n", tmp);
+ if (tmp)
+ writel(tmp, port_mmio + PORT_IRQ_STAT);
+
+ writel(1 << ap->port_no, hpriv->mmio + HOST_IRQ_STAT);
+}
+
static void ahci_port_init(struct device *dev, struct ata_port *ap,
int port_no, void __iomem *mmio,
void __iomem *port_mmio)
@@ -1270,18 +1290,7 @@ static void ahci_port_init(struct device *dev, struct ata_port *ap,
if (rc)
dev_warn(dev, "%s (%d)\n", emsg, rc);
- /* clear SError */
- tmp = readl(port_mmio + PORT_SCR_ERR);
- dev_dbg(dev, "PORT_SCR_ERR 0x%x\n", tmp);
- writel(tmp, port_mmio + PORT_SCR_ERR);
-
- /* clear port IRQ */
- tmp = readl(port_mmio + PORT_IRQ_STAT);
- dev_dbg(dev, "PORT_IRQ_STAT 0x%x\n", tmp);
- if (tmp)
- writel(tmp, port_mmio + PORT_IRQ_STAT);
-
- writel(1 << port_no, mmio + HOST_IRQ_STAT);
+ ahci_port_clear_pending_irq(ap);
/* mark esata ports */
tmp = readl(port_mmio + PORT_CMD);
@@ -1603,6 +1612,8 @@ int ahci_do_hardreset(struct ata_link *link, unsigned int *class,
tf.status = ATA_BUSY;
ata_tf_to_fis(&tf, 0, 0, d2h_fis);
+ ahci_port_clear_pending_irq(ap);
+
rc = sata_link_hardreset(link, timing, deadline, online,
ahci_check_ready);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 737dd811a3dbfd7edd4ad2ba5152e93d99074f83
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092014-ritzy-preaching-56d2@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
737dd811a3db ("ata: libahci: clear pending interrupt status")
93c7711494f4 ("ata: ahci: Drop pointless VPRINTK() calls and convert the remaining ones")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 737dd811a3dbfd7edd4ad2ba5152e93d99074f83 Mon Sep 17 00:00:00 2001
From: Szuying Chen <chensiying21(a)gmail.com>
Date: Thu, 7 Sep 2023 16:17:10 +0800
Subject: [PATCH] ata: libahci: clear pending interrupt status
When a CRC error occurs, the HBA asserts an interrupt to indicate an
interface fatal error (PxIS.IFS). The ISR clears PxIE and PxIS, then
does error recovery. If the adapter receives another SDB FIS
with an error (PxIS.TFES) from the device before the start of the EH
recovery process, the interrupt signaling the new SDB cannot be
serviced as PxIE was already cleared. This in turn leaves the HBA
unable to issue any command during the error recovery process after
setting PxCMD.ST to 1, because PxIS.TFES is still set.
According to the AHCI 1.3.1 specification, section 6.2.2, fatal errors
notified by setting PxIS.HBFS, PxIS.HBDS, PxIS.IFS or PxIS.TFES will
cause the HBA to enter the ERR:Fatal state. In this state, the HBA
shall not issue any new commands.
To avoid this situation, introduce the function
ahci_port_clear_pending_irq() to clear pending interrupts before
executing a COMRESET. This follows the AHCI 1.3.1 specification,
section 6.2.2.2.
Signed-off-by: Szuying Chen <Chloe_Chen(a)asmedia.com.tw>
Fixes: e0bfd149973d ("[PATCH] ahci: stop engine during hard reset")
Cc: stable(a)vger.kernel.org
Reviewed-by: Niklas Cassel <niklas.cassel(a)wdc.com>
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index e2bacedf28ef..f1263364fa97 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -1256,6 +1256,26 @@ static ssize_t ahci_activity_show(struct ata_device *dev, char *buf)
return sprintf(buf, "%d\n", emp->blink_policy);
}
+static void ahci_port_clear_pending_irq(struct ata_port *ap)
+{
+ struct ahci_host_priv *hpriv = ap->host->private_data;
+ void __iomem *port_mmio = ahci_port_base(ap);
+ u32 tmp;
+
+ /* clear SError */
+ tmp = readl(port_mmio + PORT_SCR_ERR);
+ dev_dbg(ap->host->dev, "PORT_SCR_ERR 0x%x\n", tmp);
+ writel(tmp, port_mmio + PORT_SCR_ERR);
+
+ /* clear port IRQ */
+ tmp = readl(port_mmio + PORT_IRQ_STAT);
+ dev_dbg(ap->host->dev, "PORT_IRQ_STAT 0x%x\n", tmp);
+ if (tmp)
+ writel(tmp, port_mmio + PORT_IRQ_STAT);
+
+ writel(1 << ap->port_no, hpriv->mmio + HOST_IRQ_STAT);
+}
+
static void ahci_port_init(struct device *dev, struct ata_port *ap,
int port_no, void __iomem *mmio,
void __iomem *port_mmio)
@@ -1270,18 +1290,7 @@ static void ahci_port_init(struct device *dev, struct ata_port *ap,
if (rc)
dev_warn(dev, "%s (%d)\n", emsg, rc);
- /* clear SError */
- tmp = readl(port_mmio + PORT_SCR_ERR);
- dev_dbg(dev, "PORT_SCR_ERR 0x%x\n", tmp);
- writel(tmp, port_mmio + PORT_SCR_ERR);
-
- /* clear port IRQ */
- tmp = readl(port_mmio + PORT_IRQ_STAT);
- dev_dbg(dev, "PORT_IRQ_STAT 0x%x\n", tmp);
- if (tmp)
- writel(tmp, port_mmio + PORT_IRQ_STAT);
-
- writel(1 << port_no, mmio + HOST_IRQ_STAT);
+ ahci_port_clear_pending_irq(ap);
/* mark esata ports */
tmp = readl(port_mmio + PORT_CMD);
@@ -1603,6 +1612,8 @@ int ahci_do_hardreset(struct ata_link *link, unsigned int *class,
tf.status = ATA_BUSY;
ata_tf_to_fis(&tf, 0, 0, d2h_fis);
+ ahci_port_clear_pending_irq(ap);
+
rc = sata_link_hardreset(link, timing, deadline, online,
ahci_check_ready);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 24e0e61db3cb86a66824531989f1df80e0939f26
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092059-broiling-pumice-a10f@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
24e0e61db3cb ("ata: libata: disallow dev-initiated LPM transitions to unsupported states")
7fe183c773c4 ("ata: start separating SATA specific code from libata-core.c")
a52fbcfc7b38 ("ata: move EXPORT_SYMBOL_GPL()s close to exported code")
10a663a1b151 ("ata: ahci: Add shutdown to freeze hardware resources of ahci")
95364f36701e ("ata: make qc_prep return ata_completion_errors")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24e0e61db3cb86a66824531989f1df80e0939f26 Mon Sep 17 00:00:00 2001
From: Niklas Cassel <niklas.cassel(a)wdc.com>
Date: Mon, 4 Sep 2023 22:42:56 +0200
Subject: [PATCH] ata: libata: disallow dev-initiated LPM transitions to
unsupported states
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In AHCI 1.3.1, the register description for CAP.SSC:
"When cleared to ‘0’, software must not allow the HBA to initiate
transitions to the Slumber state via aggressive link power management nor
the PxCMD.ICC field in each port, and the PxSCTL.IPM field in each port
must be programmed to disallow device initiated Slumber requests."
In AHCI 1.3.1, the register description for CAP.PSC:
"When cleared to ‘0’, software must not allow the HBA to initiate
transitions to the Partial state via aggressive link power management nor
the PxCMD.ICC field in each port, and the PxSCTL.IPM field in each port
must be programmed to disallow device initiated Partial requests."
Ensure that we always set the corresponding bits in PxSCTL.IPM, such that
a device is not allowed to initiate transitions to power states which are
unsupported by the HBA.
DevSleep is always initiated by the HBA; however, for completeness, set the
corresponding bit in PxSCTL.IPM such that aggressive link power management
cannot transition to DevSleep if DevSleep is not supported.
sata_link_scr_lpm() is used by libahci, ata_piix and libata-pmp.
However, only libahci has the ability to read the CAP/CAP2 register to see
if these features are supported. Therefore, in order not to introduce any
regressions on ata_piix or libata-pmp, create flags that indicate that the
respective feature is NOT supported. This way, the behavior for ata_piix
and libata-pmp should remain unchanged.
This change is based on a patch originally submitted by Runa Guo-oc.
Signed-off-by: Niklas Cassel <niklas.cassel(a)wdc.com>
Fixes: 1152b2617a6e ("libata: implement sata_link_scr_lpm() and make ata_dev_set_feature() global")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index abb5911c9d09..08745e7db820 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1883,6 +1883,15 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
else
dev_info(&pdev->dev, "SSS flag set, parallel bus scan disabled\n");
+ if (!(hpriv->cap & HOST_CAP_PART))
+ host->flags |= ATA_HOST_NO_PART;
+
+ if (!(hpriv->cap & HOST_CAP_SSC))
+ host->flags |= ATA_HOST_NO_SSC;
+
+ if (!(hpriv->cap2 & HOST_CAP2_SDS))
+ host->flags |= ATA_HOST_NO_DEVSLP;
+
if (pi.flags & ATA_FLAG_EM)
ahci_reset_em(host);
diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
index 5d31c08be013..a701e1538482 100644
--- a/drivers/ata/libata-sata.c
+++ b/drivers/ata/libata-sata.c
@@ -396,10 +396,23 @@ int sata_link_scr_lpm(struct ata_link *link, enum ata_lpm_policy policy,
case ATA_LPM_MED_POWER_WITH_DIPM:
case ATA_LPM_MIN_POWER_WITH_PARTIAL:
case ATA_LPM_MIN_POWER:
- if (ata_link_nr_enabled(link) > 0)
- /* no restrictions on LPM transitions */
+ if (ata_link_nr_enabled(link) > 0) {
+ /* assume no restrictions on LPM transitions */
scontrol &= ~(0x7 << 8);
- else {
+
+ /*
+ * If the controller does not support partial, slumber,
+ * or devsleep, then disallow these transitions.
+ */
+ if (link->ap->host->flags & ATA_HOST_NO_PART)
+ scontrol |= (0x1 << 8);
+
+ if (link->ap->host->flags & ATA_HOST_NO_SSC)
+ scontrol |= (0x2 << 8);
+
+ if (link->ap->host->flags & ATA_HOST_NO_DEVSLP)
+ scontrol |= (0x4 << 8);
+ } else {
/* empty port, power off */
scontrol &= ~0xf;
scontrol |= (0x1 << 2);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 52d58b13e5ee..bf4913f4d7ac 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -222,6 +222,10 @@ enum {
ATA_HOST_PARALLEL_SCAN = (1 << 2), /* Ports on this host can be scanned in parallel */
ATA_HOST_IGNORE_ATA = (1 << 3), /* Ignore ATA devices on this host. */
+ ATA_HOST_NO_PART = (1 << 4), /* Host does not support partial */
+ ATA_HOST_NO_SSC = (1 << 5), /* Host does not support slumber */
+ ATA_HOST_NO_DEVSLP = (1 << 6), /* Host does not support devslp */
+
/* bits 24:31 of host->flags are reserved for LLD specific flags */
/* various lengths of time */
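To make the bit manipulation in sata_link_scr_lpm() concrete: bits 11:8
of SControl form the IPM field, and the patch sets one bit per
disallowed device-initiated state. A small standalone sketch follows --
the apply_ipm_restrictions() wrapper and the main() scenario are
invented for illustration; the flag values and shifts mirror the diff.

#include <stdio.h>
#include <stdint.h>

#define ATA_HOST_NO_PART	(1u << 4)	/* host lacks Partial  */
#define ATA_HOST_NO_SSC		(1u << 5)	/* host lacks Slumber  */
#define ATA_HOST_NO_DEVSLP	(1u << 6)	/* host lacks DevSleep */

static uint32_t apply_ipm_restrictions(uint32_t scontrol, uint32_t host_flags)
{
	scontrol &= ~(0x7u << 8);	/* assume no restrictions first */

	if (host_flags & ATA_HOST_NO_PART)
		scontrol |= (0x1u << 8);	/* forbid Partial requests  */
	if (host_flags & ATA_HOST_NO_SSC)
		scontrol |= (0x2u << 8);	/* forbid Slumber requests  */
	if (host_flags & ATA_HOST_NO_DEVSLP)
		scontrol |= (0x4u << 8);	/* forbid DevSleep requests */

	return scontrol;
}

int main(void)
{
	/* e.g. an HBA reporting CAP.PSC = 0 and CAP2.SDS = 0 */
	uint32_t sc = apply_ipm_restrictions(0,
			ATA_HOST_NO_PART | ATA_HOST_NO_DEVSLP);

	printf("SControl IPM field: 0x%x\n",
	       (unsigned)((sc >> 8) & 0xf));	/* 0x5 */
	return 0;
}

So a controller that supports only Slumber ends up with IPM = 0x5,
blocking device-initiated Partial and DevSleep transitions while still
allowing Slumber requests.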
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092028-purveyor-limpness-f224@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
2c58c3931ede ("btrfs: remove BUG() after failure to insert delayed dir index item")
91bfe3104b8d ("btrfs: improve error message after failure to add delayed dir index item")
763748b238ef ("btrfs: reduce amount of reserved metadata for delayed item insertion")
c9d02ab4b436 ("btrfs: set delayed item type when initializing it")
3bae13e9d42e ("btrfs: do not BUG_ON() on failure to reserve metadata for delayed item")
a176affe547c ("btrfs: assert that delayed item is a dir index item when adding it")
b7ef5f3a6f37 ("btrfs: loop only once over data sizes array when inserting an item batch")
086dcbfa50d3 ("btrfs: insert items in batches when logging a directory when possible")
eb10d85ee77f ("btrfs: factor out the copying loop of dir items from log_dir_items()")
289cffcb0399 ("btrfs: remove no longer needed checks for NULL log context")
5a656c3628b2 ("btrfs: stop doing GFP_KERNEL memory allocations in the ref verify tool")
506650dcb3a7 ("btrfs: improve the batch insertion of delayed items")
bfaa324e9a80 ("btrfs: remove total_data_size variable in btrfs_batch_insert_items()")
bb385bedded3 ("btrfs: fix error handling in __btrfs_update_delayed_inode")
64d6b281ba4d ("btrfs: remove unnecessary check_parent_dirs_for_sync()")
3e6a86a193b0 ("btrfs: skip logging directories already logged when logging all parents")
ab12313a9f56 ("btrfs: avoid logging new ancestor inodes when logging new inode")
47d3db41e190 ("btrfs: fix race that makes inode logging fallback to transaction commit")
4d6221d7d831 ("btrfs: fix race that causes unnecessary logging of ancestor inodes")
5893dfb98f25 ("btrfs: refactor btrfs_drop_extents() to make it easier to extend")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 28 Aug 2023 09:06:43 +0100
Subject: [PATCH] btrfs: remove BUG() after failure to insert delayed dir index
item
Instead of calling BUG() when we fail to insert a delayed dir index item
into the delayed node's tree, we can just release all the resources we
have allocated/acquired before and return the error to the caller. This is
fine because all existing call chains undo anything they have done before
calling btrfs_insert_delayed_dir_index() or BUG_ON() (when creating pending
snapshots in the transaction commit path).
So remove the BUG() call and do proper error handling.
This relates to a syzbot report linked below, but does not fix it: it only
prevents hitting a BUG(), it does not fix the underlying issue where we
somehow attempt to use the same index number twice for different index items.
Link: https://lore.kernel.org/linux-btrfs/00000000000036e1290603e097e0@google.com/
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index bb9908cbabfe..6e779bc16340 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1426,7 +1426,29 @@ void btrfs_balance_delayed_items(struct btrfs_fs_info *fs_info)
btrfs_wq_run_delayed_node(delayed_root, fs_info, BTRFS_DELAYED_BATCH);
}
-/* Will return 0 or -ENOMEM */
+static void btrfs_release_dir_index_item_space(struct btrfs_trans_handle *trans)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ const u64 bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
+
+ if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags))
+ return;
+
+ /*
+ * Adding the new dir index item does not require touching another
+ * leaf, so we can release 1 unit of metadata that was previously
+ * reserved when starting the transaction. This applies only to
+ * the case where we had a transaction start and excludes the
+ * transaction join case (when replaying log trees).
+ */
+ trace_btrfs_space_reservation(fs_info, "transaction",
+ trans->transid, bytes, 0);
+ btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);
+ ASSERT(trans->bytes_reserved >= bytes);
+ trans->bytes_reserved -= bytes;
+}
+
+/* Will return 0, -ENOMEM or -EEXIST (index number collision, unexpected). */
int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
const char *name, int name_len,
struct btrfs_inode *dir,
@@ -1468,6 +1490,27 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
mutex_lock(&delayed_node->mutex);
+ /*
+ * First attempt to insert the delayed item. This is to make the error
+ * handling path simpler in case we fail (-EEXIST). There's no risk of
+ * any other task coming in and running the delayed item before we do
+ * the metadata space reservation below, because we are holding the
+ * delayed node's mutex and that mutex must also be locked before the
+ * node's delayed items can be run.
+ */
+ ret = __btrfs_add_delayed_item(delayed_node, delayed_item);
+ if (unlikely(ret)) {
+ btrfs_err(trans->fs_info,
+"error adding delayed dir index item, name: %.*s, index: %llu, root: %llu, dir: %llu, dir->index_cnt: %llu, delayed_node->index_cnt: %llu, error: %d",
+ name_len, name, index, btrfs_root_id(delayed_node->root),
+ delayed_node->inode_id, dir->index_cnt,
+ delayed_node->index_cnt, ret);
+ btrfs_release_delayed_item(delayed_item);
+ btrfs_release_dir_index_item_space(trans);
+ mutex_unlock(&delayed_node->mutex);
+ goto release_node;
+ }
+
if (delayed_node->index_item_leaves == 0 ||
delayed_node->curr_index_batch_size + data_len > leaf_data_size) {
delayed_node->curr_index_batch_size = data_len;
@@ -1485,37 +1528,14 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
* impossible.
*/
if (WARN_ON(ret)) {
- mutex_unlock(&delayed_node->mutex);
btrfs_release_delayed_item(delayed_item);
+ mutex_unlock(&delayed_node->mutex);
goto release_node;
}
delayed_node->index_item_leaves++;
- } else if (!test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags)) {
- const u64 bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
-
- /*
- * Adding the new dir index item does not require touching another
- * leaf, so we can release 1 unit of metadata that was previously
- * reserved when starting the transaction. This applies only to
- * the case where we had a transaction start and excludes the
- * transaction join case (when replaying log trees).
- */
- trace_btrfs_space_reservation(fs_info, "transaction",
- trans->transid, bytes, 0);
- btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);
- ASSERT(trans->bytes_reserved >= bytes);
- trans->bytes_reserved -= bytes;
- }
-
- ret = __btrfs_add_delayed_item(delayed_node, delayed_item);
- if (unlikely(ret)) {
- btrfs_err(trans->fs_info,
-"error adding delayed dir index item, name: %.*s, index: %llu, root: %llu, dir: %llu, dir->index_cnt: %llu, delayed_node->index_cnt: %llu, error: %d",
- name_len, name, index, btrfs_root_id(delayed_node->root),
- delayed_node->inode_id, dir->index_cnt,
- delayed_node->index_cnt, ret);
- BUG();
+ } else {
+ btrfs_release_dir_index_item_space(trans);
}
mutex_unlock(&delayed_node->mutex);
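Distilled, the fix moves the fallible tree insertion ahead of the
metadata-space accounting, so an -EEXIST failure can be unwound with plain
release calls instead of BUG(). A toy sketch of that ordering under a
mutex (tree_insert, release_item and release_space are hypothetical
stand-ins, not btrfs APIs):

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

struct node {
	pthread_mutex_t mutex;
	int occupied;		/* toy stand-in for the delayed item tree */
};

static int tree_insert(struct node *n)
{
	if (n->occupied)
		return -EEXIST;	/* index collision: unexpected, but handled */
	n->occupied = 1;
	return 0;
}

static void release_item(void)  { /* free the delayed item */ }
static void release_space(void) { /* return the reserved metadata unit */ }

static int insert_dir_index(struct node *n)
{
	int ret;

	pthread_mutex_lock(&n->mutex);
	/* Insert first: on failure nothing else has been consumed yet,
	 * so we can unwind cleanly and return instead of calling BUG(). */
	ret = tree_insert(n);
	if (ret) {
		release_item();
		release_space();
		pthread_mutex_unlock(&n->mutex);
		return ret;
	}
	/* ... leaf/space accounting runs only after a successful insert ... */
	pthread_mutex_unlock(&n->mutex);
	return 0;
}

int main(void)
{
	struct node n = { PTHREAD_MUTEX_INITIALIZER, 0 };

	printf("first insert:  %d\n", insert_dir_index(&n));	/* 0 */
	printf("second insert: %d\n", insert_dir_index(&n));	/* -EEXIST */
	return 0;
}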
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092027-census-monorail-5614@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
2c58c3931ede ("btrfs: remove BUG() after failure to insert delayed dir index item")
91bfe3104b8d ("btrfs: improve error message after failure to add delayed dir index item")
763748b238ef ("btrfs: reduce amount of reserved metadata for delayed item insertion")
c9d02ab4b436 ("btrfs: set delayed item type when initializing it")
3bae13e9d42e ("btrfs: do not BUG_ON() on failure to reserve metadata for delayed item")
a176affe547c ("btrfs: assert that delayed item is a dir index item when adding it")
b7ef5f3a6f37 ("btrfs: loop only once over data sizes array when inserting an item batch")
086dcbfa50d3 ("btrfs: insert items in batches when logging a directory when possible")
eb10d85ee77f ("btrfs: factor out the copying loop of dir items from log_dir_items()")
289cffcb0399 ("btrfs: remove no longer needed checks for NULL log context")
5a656c3628b2 ("btrfs: stop doing GFP_KERNEL memory allocations in the ref verify tool")
506650dcb3a7 ("btrfs: improve the batch insertion of delayed items")
bfaa324e9a80 ("btrfs: remove total_data_size variable in btrfs_batch_insert_items()")
bb385bedded3 ("btrfs: fix error handling in __btrfs_update_delayed_inode")
64d6b281ba4d ("btrfs: remove unnecessary check_parent_dirs_for_sync()")
3e6a86a193b0 ("btrfs: skip logging directories already logged when logging all parents")
ab12313a9f56 ("btrfs: avoid logging new ancestor inodes when logging new inode")
47d3db41e190 ("btrfs: fix race that makes inode logging fallback to transaction commit")
4d6221d7d831 ("btrfs: fix race that causes unnecessary logging of ancestor inodes")
5893dfb98f25 ("btrfs: refactor btrfs_drop_extents() to make it easier to extend")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
[identical to the commit message and diff quoted above for the 5.4-stable tree]
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092026-anemia-unwanted-1c2a@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
2c58c3931ede ("btrfs: remove BUG() after failure to insert delayed dir index item")
91bfe3104b8d ("btrfs: improve error message after failure to add delayed dir index item")
763748b238ef ("btrfs: reduce amount of reserved metadata for delayed item insertion")
c9d02ab4b436 ("btrfs: set delayed item type when initializing it")
3bae13e9d42e ("btrfs: do not BUG_ON() on failure to reserve metadata for delayed item")
a176affe547c ("btrfs: assert that delayed item is a dir index item when adding it")
b7ef5f3a6f37 ("btrfs: loop only once over data sizes array when inserting an item batch")
086dcbfa50d3 ("btrfs: insert items in batches when logging a directory when possible")
eb10d85ee77f ("btrfs: factor out the copying loop of dir items from log_dir_items()")
289cffcb0399 ("btrfs: remove no longer needed checks for NULL log context")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
[identical to the commit message and diff quoted above for the 5.4-stable tree]
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092025-unwell-virtual-9c97@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
2c58c3931ede ("btrfs: remove BUG() after failure to insert delayed dir index item")
91bfe3104b8d ("btrfs: improve error message after failure to add delayed dir index item")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
[identical to the commit message and diff quoted above for the 5.4-stable tree]
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092024-maritime-smelting-37a5@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
2c58c3931ede ("btrfs: remove BUG() after failure to insert delayed dir index item")
91bfe3104b8d ("btrfs: improve error message after failure to add delayed dir index item")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
[identical to the commit message and diff quoted above for the 5.4-stable tree]
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092013-slightly-hubcap-822b@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
4ca8e03cf2bf ("btrfs: check for BTRFS_FS_ERROR in pending ordered assert")
487781796d30 ("btrfs: make fast fsyncs wait only for writeback")
75b463d2b47a ("btrfs: do not commit logs and transactions during link and rename operations")
260db43cd2f5 ("btrfs: delete duplicated words + other fixes in comments")
3ef64143a796 ("btrfs: remove no longer used trans_list member of struct btrfs_ordered_extent")
cd8d39f4aeb3 ("btrfs: remove no longer used log_list member of struct btrfs_ordered_extent")
7af597433d43 ("btrfs: make full fsyncs always operate on the entire file again")
0a8068a3dd42 ("btrfs: make ranged full fsyncs more efficient")
da447009a256 ("btrfs: factor out inode items copy loop from btrfs_log_inode()")
a5eeb3d17b97 ("btrfs: add helper to get the end offset of a file extent item")
95418ed1d107 ("btrfs: fix missing file extent item for hole after ranged fsync")
3f1c64ce0438 ("btrfs: delete the ordered isize update code")
41a2ee75aab0 ("btrfs: introduce per-inode file extent tree")
236ebc20d9af ("btrfs: fix log context list corruption after rename whiteout error")
b5e4ff9d465d ("Btrfs: fix infinite loop during fsync after rename operations")
0e56315ca147 ("Btrfs: fix missing hole after hole punching and fsync when using NO_HOLES")
9c7d3a548331 ("btrfs: move extent_io_tree defs to their own header")
6f0d04f8e72e ("btrfs: separate out the extent io init function")
33ca832fefa5 ("btrfs: separate out the extent leak code")
afd7a71872f1 ("Merge tag 'for-5.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 24 Aug 2023 16:59:04 -0400
Subject: [PATCH] btrfs: check for BTRFS_FS_ERROR in pending ordered assert
If we do fast tree logging we increment a counter on the current
transaction for every ordered extent we need to wait for. This means we
expect the transaction to still be there when we clear pending on the
ordered extent. However if we happen to abort the transaction and clean
it up, there could be no running transaction, and thus we'll trip the
"ASSERT(trans)" check. This is obviously incorrect, and the code
properly deals with the case that the transaction doesn't exist. Fix
this ASSERT() to only fire if there's no trans and we don't have
BTRFS_FS_ERROR() set on the file system.
CC: stable(a)vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index b46ab348e8e5..345c449d588c 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -639,7 +639,7 @@ void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
refcount_inc(&trans->use_count);
spin_unlock(&fs_info->trans_lock);
- ASSERT(trans);
+ ASSERT(trans || BTRFS_FS_ERROR(fs_info));
if (trans) {
if (atomic_dec_and_test(&trans->pending_ordered))
wake_up(&trans->pending_wait);
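The pattern here is an assertion with one sanctioned exception: a NULL
transaction is only legal once the filesystem has already been marked as
errored by an abort. A small self-contained illustration with toy types
(the fs_info/transaction fields below are invented for the example):

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct fs_info     { bool fs_error; };
struct transaction { int pending_ordered; };

static void clear_pending(struct fs_info *fs, struct transaction *trans)
{
	/* trans may legitimately be NULL, but only after an abort has
	 * put the filesystem into an error state. */
	assert(trans || fs->fs_error);
	if (trans && --trans->pending_ordered == 0)
		printf("last ordered extent done, waking waiters\n");
}

int main(void)
{
	struct fs_info fs = { .fs_error = true };	/* aborted fs */
	struct transaction t = { .pending_ordered = 1 };

	clear_pending(&fs, &t);		/* normal path */
	clear_pending(&fs, NULL);	/* tolerated: fs_error is set */
	return 0;
}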
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092012-congress-stubborn-56b5@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
4ca8e03cf2bf ("btrfs: check for BTRFS_FS_ERROR in pending ordered assert")
487781796d30 ("btrfs: make fast fsyncs wait only for writeback")
75b463d2b47a ("btrfs: do not commit logs and transactions during link and rename operations")
260db43cd2f5 ("btrfs: delete duplicated words + other fixes in comments")
3ef64143a796 ("btrfs: remove no longer used trans_list member of struct btrfs_ordered_extent")
cd8d39f4aeb3 ("btrfs: remove no longer used log_list member of struct btrfs_ordered_extent")
7af597433d43 ("btrfs: make full fsyncs always operate on the entire file again")
0a8068a3dd42 ("btrfs: make ranged full fsyncs more efficient")
da447009a256 ("btrfs: factor out inode items copy loop from btrfs_log_inode()")
a5eeb3d17b97 ("btrfs: add helper to get the end offset of a file extent item")
95418ed1d107 ("btrfs: fix missing file extent item for hole after ranged fsync")
3f1c64ce0438 ("btrfs: delete the ordered isize update code")
41a2ee75aab0 ("btrfs: introduce per-inode file extent tree")
236ebc20d9af ("btrfs: fix log context list corruption after rename whiteout error")
b5e4ff9d465d ("Btrfs: fix infinite loop during fsync after rename operations")
0e56315ca147 ("Btrfs: fix missing hole after hole punching and fsync when using NO_HOLES")
9c7d3a548331 ("btrfs: move extent_io_tree defs to their own header")
6f0d04f8e72e ("btrfs: separate out the extent io init function")
33ca832fefa5 ("btrfs: separate out the extent leak code")
afd7a71872f1 ("Merge tag 'for-5.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
[identical to the commit message and diff quoted above for the 4.14-stable tree]
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 4ca8e03cf2bfaeef7c85939fa1ea0c749cd116ab
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092011-resume-sneezing-595f@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
4ca8e03cf2bf ("btrfs: check for BTRFS_FS_ERROR in pending ordered assert")
487781796d30 ("btrfs: make fast fsyncs wait only for writeback")
75b463d2b47a ("btrfs: do not commit logs and transactions during link and rename operations")
260db43cd2f5 ("btrfs: delete duplicated words + other fixes in comments")
3ef64143a796 ("btrfs: remove no longer used trans_list member of struct btrfs_ordered_extent")
cd8d39f4aeb3 ("btrfs: remove no longer used log_list member of struct btrfs_ordered_extent")
7af597433d43 ("btrfs: make full fsyncs always operate on the entire file again")
0a8068a3dd42 ("btrfs: make ranged full fsyncs more efficient")
da447009a256 ("btrfs: factor out inode items copy loop from btrfs_log_inode()")
a5eeb3d17b97 ("btrfs: add helper to get the end offset of a file extent item")
95418ed1d107 ("btrfs: fix missing file extent item for hole after ranged fsync")
3f1c64ce0438 ("btrfs: delete the ordered isize update code")
41a2ee75aab0 ("btrfs: introduce per-inode file extent tree")
236ebc20d9af ("btrfs: fix log context list corruption after rename whiteout error")
b5e4ff9d465d ("Btrfs: fix infinite loop during fsync after rename operations")
0e56315ca147 ("Btrfs: fix missing hole after hole punching and fsync when using NO_HOLES")
9c7d3a548331 ("btrfs: move extent_io_tree defs to their own header")
6f0d04f8e72e ("btrfs: separate out the extent io init function")
33ca832fefa5 ("btrfs: separate out the extent leak code")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
[identical to the commit message and diff quoted above for the 4.14-stable tree]
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x ee34a82e890a7babb5585daf1a6dd7d4d1cf142a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092056-manifesto-shame-79c2@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
ee34a82e890a ("btrfs: release path before inode lookup during the ino lookup ioctl")
0202e83fdab0 ("btrfs: simplify iget helpers")
c75e839414d3 ("btrfs: kill the subvol_srcu")
5c8fd99fec9d ("btrfs: make inodes hold a ref on their roots")
7ac8b88ee668 ("btrfs: backref, only collect file extent items matching backref offset")
0024652895e3 ("btrfs: rename btrfs_put_fs_root and btrfs_grab_fs_root")
bd647ce385ec ("btrfs: add a leak check for roots")
8260edba67a2 ("btrfs: make the init of static elements in fs_info separate")
ae18c37ad5a1 ("btrfs: move fs_info init work into it's own helper function")
141386e1a5d6 ("btrfs: free more things in btrfs_free_fs_info")
bc44d7c4b2b1 ("btrfs: push btrfs_grab_fs_root into btrfs_get_fs_root")
81f096edf047 ("btrfs: use btrfs_put_fs_root to free roots always")
0d4b0463011d ("btrfs: export and rename free_fs_info")
fbb0ce40d606 ("btrfs: hold a ref on the root in btrfs_check_uuid_tree_entry")
ca2037fba6af ("btrfs: hold a ref on the root in btrfs_recover_log_trees")
5119cfc36f6d ("btrfs: hold a ref on the root in create_pending_snapshot")
5168489a079a ("btrfs: hold a ref on the root in get_subvol_name_from_objectid")
6f9a3da5da9e ("btrfs: hold a ref on the root in btrfs_ioctl_send")
fd79d43b347e ("btrfs: hold a ref on the root in scrub_print_warning_inode")
0b2dee5cff74 ("btrfs: hold a ref for the root in btrfs_find_orphan_roots")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ee34a82e890a7babb5585daf1a6dd7d4d1cf142a Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Sat, 26 Aug 2023 11:28:20 +0100
Subject: [PATCH] btrfs: release path before inode lookup during the ino lookup
ioctl
During the ino lookup ioctl we can end up calling btrfs_iget() to get an
inode reference while we are holding a leaf locked on a root's btree. If btrfs_iget()
needs to lookup the inode from the root's btree, because it's not
currently loaded in memory, then it will need to lock another or the
same path in the same root btree. This may result in a deadlock and
trigger the following lockdep splat:
WARNING: possible circular locking dependency detected
6.5.0-rc7-syzkaller-00004-gf7757129e3de #0 Not tainted
------------------------------------------------------
syz-executor277/5012 is trying to acquire lock:
ffff88802df41710 (btrfs-tree-01){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
but task is already holding lock:
ffff88802df418e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (btrfs-tree-00){++++}-{3:3}:
down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
__btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
btrfs_search_slot+0x13a4/0x2f80 fs/btrfs/ctree.c:2302
btrfs_init_root_free_objectid+0x148/0x320 fs/btrfs/disk-io.c:4955
btrfs_init_fs_root fs/btrfs/disk-io.c:1128 [inline]
btrfs_get_root_ref+0x5ae/0xae0 fs/btrfs/disk-io.c:1338
btrfs_get_fs_root fs/btrfs/disk-io.c:1390 [inline]
open_ctree+0x29c8/0x3030 fs/btrfs/disk-io.c:3494
btrfs_fill_super+0x1c7/0x2f0 fs/btrfs/super.c:1154
btrfs_mount_root+0x7e0/0x910 fs/btrfs/super.c:1519
legacy_get_tree+0xef/0x190 fs/fs_context.c:611
vfs_get_tree+0x8c/0x270 fs/super.c:1519
fc_mount fs/namespace.c:1112 [inline]
vfs_kern_mount+0xbc/0x150 fs/namespace.c:1142
btrfs_mount+0x39f/0xb50 fs/btrfs/super.c:1579
legacy_get_tree+0xef/0x190 fs/fs_context.c:611
vfs_get_tree+0x8c/0x270 fs/super.c:1519
do_new_mount+0x28f/0xae0 fs/namespace.c:3335
do_mount fs/namespace.c:3675 [inline]
__do_sys_mount fs/namespace.c:3884 [inline]
__se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3861
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #0 (btrfs-tree-01){++++}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
__btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
btrfs_tree_read_lock fs/btrfs/locking.c:142 [inline]
btrfs_read_lock_root_node+0x292/0x3c0 fs/btrfs/locking.c:281
btrfs_search_slot_get_root fs/btrfs/ctree.c:1832 [inline]
btrfs_search_slot+0x4ff/0x2f80 fs/btrfs/ctree.c:2154
btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:412
btrfs_read_locked_inode fs/btrfs/inode.c:3892 [inline]
btrfs_iget_path+0x2d9/0x1520 fs/btrfs/inode.c:5716
btrfs_search_path_in_tree_user fs/btrfs/ioctl.c:1961 [inline]
btrfs_ioctl_ino_lookup_user+0x77a/0xf50 fs/btrfs/ioctl.c:2105
btrfs_ioctl+0xb0b/0xd40 fs/btrfs/ioctl.c:4683
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  rlock(btrfs-tree-00);
                               lock(btrfs-tree-01);
                               lock(btrfs-tree-00);
  rlock(btrfs-tree-01);
*** DEADLOCK ***
1 lock held by syz-executor277/5012:
#0: ffff88802df418e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
stack backtrace:
CPU: 1 PID: 5012 Comm: syz-executor277 Not tainted 6.5.0-rc7-syzkaller-00004-gf7757129e3de #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
check_noncircular+0x375/0x4a0 kernel/locking/lockdep.c:2195
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
__btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
btrfs_tree_read_lock fs/btrfs/locking.c:142 [inline]
btrfs_read_lock_root_node+0x292/0x3c0 fs/btrfs/locking.c:281
btrfs_search_slot_get_root fs/btrfs/ctree.c:1832 [inline]
btrfs_search_slot+0x4ff/0x2f80 fs/btrfs/ctree.c:2154
btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:412
btrfs_read_locked_inode fs/btrfs/inode.c:3892 [inline]
btrfs_iget_path+0x2d9/0x1520 fs/btrfs/inode.c:5716
btrfs_search_path_in_tree_user fs/btrfs/ioctl.c:1961 [inline]
btrfs_ioctl_ino_lookup_user+0x77a/0xf50 fs/btrfs/ioctl.c:2105
btrfs_ioctl+0xb0b/0xd40 fs/btrfs/ioctl.c:4683
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f0bec94ea39
Fix this simply by releasing the path before calling btrfs_iget(), as at
that point we don't need the path anymore.
Reported-by: syzbot+bf66ad948981797d2f1d(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000045fa140603c4a969@google.com/
Fixes: 23d0b79dfaed ("btrfs: Add unprivileged version of ino_lookup ioctl")
CC: stable(a)vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a895d105464b..d27b0d86b8e2 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1958,6 +1958,13 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap,
goto out_put;
}
+ /*
+ * We don't need the path anymore, so release it and
+ * avoid deadlocks and lockdep warnings in case
+ * btrfs_iget() needs to lookup the inode from its root
+ * btree and lock the same leaf.
+ */
+ btrfs_release_path(path);
temp_inode = btrfs_iget(sb, key2.objectid, root);
if (IS_ERR(temp_inode)) {
ret = PTR_ERR(temp_inode);
@@ -1978,7 +1985,6 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap,
goto out_put;
}
- btrfs_release_path(path);
key.objectid = key.offset;
key.offset = (u64)-1;
dirid = key.objectid;
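The underlying rule is the usual one: never hold a btree path (and the
leaf lock it pins) across a call that may need to lock the same tree. A
toy sketch of the fixed ordering, using a plain non-recursive mutex in
place of the leaf lock (iget and ino_lookup here are illustrative names,
not the btrfs functions):

#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t leaf_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for btrfs_iget(): may need leaf_lock internally, so the
 * caller must not hold it when calling in. */
static void iget(void)
{
	pthread_mutex_lock(&leaf_lock);
	/* ... read the inode item from the leaf ... */
	pthread_mutex_unlock(&leaf_lock);
}

static void ino_lookup(char *name, size_t len)
{
	pthread_mutex_lock(&leaf_lock);
	strncpy(name, "subvol/dir", len);	/* copy what we need out */
	pthread_mutex_unlock(&leaf_lock);	/* release BEFORE the lookup */

	/* Safe now: with a non-recursive lock, calling iget() while
	 * still holding leaf_lock would deadlock, just as re-locking
	 * the same btree leaf does in the splat above. */
	iget();
}

int main(void)
{
	char name[32];

	ino_lookup(name, sizeof(name));
	printf("resolved: %s\n", name);
	return 0;
}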
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x ee34a82e890a7babb5585daf1a6dd7d4d1cf142a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023092054-antibody-prenatal-dc3c@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
ee34a82e890a ("btrfs: release path before inode lookup during the ino lookup ioctl")
0202e83fdab0 ("btrfs: simplify iget helpers")
c75e839414d3 ("btrfs: kill the subvol_srcu")
5c8fd99fec9d ("btrfs: make inodes hold a ref on their roots")
7ac8b88ee668 ("btrfs: backref, only collect file extent items matching backref offset")
0024652895e3 ("btrfs: rename btrfs_put_fs_root and btrfs_grab_fs_root")
bd647ce385ec ("btrfs: add a leak check for roots")
8260edba67a2 ("btrfs: make the init of static elements in fs_info separate")
ae18c37ad5a1 ("btrfs: move fs_info init work into it's own helper function")
141386e1a5d6 ("btrfs: free more things in btrfs_free_fs_info")
bc44d7c4b2b1 ("btrfs: push btrfs_grab_fs_root into btrfs_get_fs_root")
81f096edf047 ("btrfs: use btrfs_put_fs_root to free roots always")
0d4b0463011d ("btrfs: export and rename free_fs_info")
fbb0ce40d606 ("btrfs: hold a ref on the root in btrfs_check_uuid_tree_entry")
ca2037fba6af ("btrfs: hold a ref on the root in btrfs_recover_log_trees")
5119cfc36f6d ("btrfs: hold a ref on the root in create_pending_snapshot")
5168489a079a ("btrfs: hold a ref on the root in get_subvol_name_from_objectid")
6f9a3da5da9e ("btrfs: hold a ref on the root in btrfs_ioctl_send")
fd79d43b347e ("btrfs: hold a ref on the root in scrub_print_warning_inode")
0b2dee5cff74 ("btrfs: hold a ref for the root in btrfs_find_orphan_roots")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ee34a82e890a7babb5585daf1a6dd7d4d1cf142a Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Sat, 26 Aug 2023 11:28:20 +0100
Subject: [PATCH] btrfs: release path before inode lookup during the ino lookup
ioctl
During the ino lookup ioctl we can end up calling btrfs_iget() to get an
inode reference while we are holding a leaf locked in a root's btree. If
btrfs_iget() needs to look up the inode from the root's btree, because it's
not currently loaded in memory, then it will need to lock another path, or
the same one, in the same root btree. This may result in a deadlock and
trigger the following lockdep splat:
WARNING: possible circular locking dependency detected
6.5.0-rc7-syzkaller-00004-gf7757129e3de #0 Not tainted
------------------------------------------------------
syz-executor277/5012 is trying to acquire lock:
ffff88802df41710 (btrfs-tree-01){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
but task is already holding lock:
ffff88802df418e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (btrfs-tree-00){++++}-{3:3}:
down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
__btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
btrfs_search_slot+0x13a4/0x2f80 fs/btrfs/ctree.c:2302
btrfs_init_root_free_objectid+0x148/0x320 fs/btrfs/disk-io.c:4955
btrfs_init_fs_root fs/btrfs/disk-io.c:1128 [inline]
btrfs_get_root_ref+0x5ae/0xae0 fs/btrfs/disk-io.c:1338
btrfs_get_fs_root fs/btrfs/disk-io.c:1390 [inline]
open_ctree+0x29c8/0x3030 fs/btrfs/disk-io.c:3494
btrfs_fill_super+0x1c7/0x2f0 fs/btrfs/super.c:1154
btrfs_mount_root+0x7e0/0x910 fs/btrfs/super.c:1519
legacy_get_tree+0xef/0x190 fs/fs_context.c:611
vfs_get_tree+0x8c/0x270 fs/super.c:1519
fc_mount fs/namespace.c:1112 [inline]
vfs_kern_mount+0xbc/0x150 fs/namespace.c:1142
btrfs_mount+0x39f/0xb50 fs/btrfs/super.c:1579
legacy_get_tree+0xef/0x190 fs/fs_context.c:611
vfs_get_tree+0x8c/0x270 fs/super.c:1519
do_new_mount+0x28f/0xae0 fs/namespace.c:3335
do_mount fs/namespace.c:3675 [inline]
__do_sys_mount fs/namespace.c:3884 [inline]
__se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3861
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #0 (btrfs-tree-01){++++}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
__btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
btrfs_tree_read_lock fs/btrfs/locking.c:142 [inline]
btrfs_read_lock_root_node+0x292/0x3c0 fs/btrfs/locking.c:281
btrfs_search_slot_get_root fs/btrfs/ctree.c:1832 [inline]
btrfs_search_slot+0x4ff/0x2f80 fs/btrfs/ctree.c:2154
btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:412
btrfs_read_locked_inode fs/btrfs/inode.c:3892 [inline]
btrfs_iget_path+0x2d9/0x1520 fs/btrfs/inode.c:5716
btrfs_search_path_in_tree_user fs/btrfs/ioctl.c:1961 [inline]
btrfs_ioctl_ino_lookup_user+0x77a/0xf50 fs/btrfs/ioctl.c:2105
btrfs_ioctl+0xb0b/0xd40 fs/btrfs/ioctl.c:4683
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock(btrfs-tree-00);
lock(btrfs-tree-01);
lock(btrfs-tree-00);
rlock(btrfs-tree-01);
*** DEADLOCK ***
1 lock held by syz-executor277/5012:
#0: ffff88802df418e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
stack backtrace:
CPU: 1 PID: 5012 Comm: syz-executor277 Not tainted 6.5.0-rc7-syzkaller-00004-gf7757129e3de #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
check_noncircular+0x375/0x4a0 kernel/locking/lockdep.c:2195
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
__btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
btrfs_tree_read_lock fs/btrfs/locking.c:142 [inline]
btrfs_read_lock_root_node+0x292/0x3c0 fs/btrfs/locking.c:281
btrfs_search_slot_get_root fs/btrfs/ctree.c:1832 [inline]
btrfs_search_slot+0x4ff/0x2f80 fs/btrfs/ctree.c:2154
btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:412
btrfs_read_locked_inode fs/btrfs/inode.c:3892 [inline]
btrfs_iget_path+0x2d9/0x1520 fs/btrfs/inode.c:5716
btrfs_search_path_in_tree_user fs/btrfs/ioctl.c:1961 [inline]
btrfs_ioctl_ino_lookup_user+0x77a/0xf50 fs/btrfs/ioctl.c:2105
btrfs_ioctl+0xb0b/0xd40 fs/btrfs/ioctl.c:4683
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f0bec94ea39
Fix this simply by releasing the path before calling btrfs_iget() as at
that point we don't need the path anymore.
Reported-by: syzbot+bf66ad948981797d2f1d(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000045fa140603c4a969@google.com/
Fixes: 23d0b79dfaed ("btrfs: Add unprivileged version of ino_lookup ioctl")
CC: stable(a)vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a895d105464b..d27b0d86b8e2 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1958,6 +1958,13 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap,
goto out_put;
}
+ /*
+ * We don't need the path anymore, so release it and
+ * avoid deadlocks and lockdep warnings in case
+ * btrfs_iget() needs to lookup the inode from its root
+ * btree and lock the same leaf.
+ */
+ btrfs_release_path(path);
temp_inode = btrfs_iget(sb, key2.objectid, root);
if (IS_ERR(temp_inode)) {
ret = PTR_ERR(temp_inode);
@@ -1978,7 +1985,6 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap,
goto out_put;
}
- btrfs_release_path(path);
key.objectid = key.offset;
key.offset = (u64)-1;
dirid = key.objectid;
Dropped in August:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/com…
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/com…
Only re-applied and build tested against v6.1.53.
If they do apply and test well against linux-5.{4,10,15}, all the better.
Florian Westphal (1):
netfilter: nf_tables: don't skip expired elements during walk
Pablo Neira Ayuso (4):
netfilter: nf_tables: GC transaction API to avoid race with control
plane
netfilter: nf_tables: adapt set backend to use GC transaction API
netfilter: nft_set_hash: mark set element as dead when deleting from
packet path
netfilter: nf_tables: remove busy mark and gc batch API
include/net/netfilter/nf_tables.h | 120 +++++-------
net/netfilter/nf_tables_api.c | 307 ++++++++++++++++++++++++------
net/netfilter/nft_set_hash.c | 85 +++++----
net/netfilter/nft_set_pipapo.c | 66 +++++--
net/netfilter/nft_set_rbtree.c | 146 ++++++++------
5 files changed, 476 insertions(+), 248 deletions(-)
--
2.42.0.459.ge4e396fd5e-goog
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: cff9b2332ab762b7e0586c793c431a8f2ea4db04
Gitweb: https://git.kernel.org/tip/cff9b2332ab762b7e0586c793c431a8f2ea4db04
Author: Liam R. Howlett <Liam.Howlett(a)oracle.com>
AuthorDate: Fri, 15 Sep 2023 13:44:44 -04:00
Committer: Peter Zijlstra <peterz(a)infradead.org>
CommitterDate: Tue, 19 Sep 2023 10:48:04 +02:00
kernel/sched: Modify initial boot task idle setup
Initial booting sets the task flag to idle (PF_IDLE) via the call path
sched_init() -> init_idle(). With the task marked idle, calling call_rcu()
in kernel/rcu/tiny.c causes TIF_NEED_RESCHED to be set. Any subsequent
cond_resched() call will then enable IRQs, potentially earlier than the
IRQ setup has completed. Recent changes have triggered exactly this
scenario, and IRQs have been enabled early.
This causes a warning later in start_kernel() as interrupts are enabled
before they are fully set up.
Fix this issue by setting the PF_IDLE flag later in the boot sequence.
Although the boot task has been marked as idle since (at least) commit
d80e4fda576d, I am not sure that doing so is wrong. The forced context
switch on the idle task was introduced in the tiny_rcu update, so I'm
going to claim this fixes 5f6130fa52ee.
Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+Zo…
---
kernel/sched/core.c | 2 +-
kernel/sched/idle.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2299a5c..802551e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9269,7 +9269,7 @@ void __init init_idle(struct task_struct *idle, int cpu)
* PF_KTHREAD should already be set at this point; regardless, make it
* look like a proper per-CPU kthread.
*/
- idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
+ idle->flags |= PF_KTHREAD | PF_NO_SETAFFINITY;
kthread_set_per_cpu(idle, cpu);
#ifdef CONFIG_SMP
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 342f58a..5007b25 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -373,6 +373,7 @@ EXPORT_SYMBOL_GPL(play_idle_precise);
void cpu_startup_entry(enum cpuhp_state state)
{
+ current->flags |= PF_IDLE;
arch_cpu_idle_prepare();
cpuhp_online_idle(state);
while (1)
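For context, here is an annotated timeline of the failure mode described
above. This is a sketch reconstructed from the changelog; the exact call
sites are assumptions, not verified code paths.

/*
 * Before the patch (simplified, assumed ordering):
 *
 * start_kernel()
 *   sched_init()
 *     init_idle()          // boot task gets PF_IDLE here
 *   ...
 *   call_rcu()             // tiny RCU sees an "idle" task and forces a
 *                          // quiescent state: TIF_NEED_RESCHED is set
 *   ...
 *   cond_resched()         // reschedules and enables IRQs, although
 *                          // IRQ setup has not completed yet
 *   ...
 *   local_irq_enable()     // WARN: interrupts were already enabled
 *
 * After the patch, PF_IDLE is set only in cpu_startup_entry(), once the
 * boot task actually enters the idle loop, so early cond_resched() calls
 * behave as they would for any normal task.
 */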
From: Marek Vasut <marex(a)denx.de>
[ Upstream commit 362fa8f6e6a05089872809f4465bab9d011d05b3 ]
This bridge seems to need the HSE packet, otherwise the image is
shifted up and corrupted at the bottom. This makes the bridge
work with Samsung DSIM on i.MX8MM and i.MX8MP.
Signed-off-by: Marek Vasut <marex(a)denx.de>
Reviewed-by: Sam Ravnborg <sam(a)ravnborg.org>
Signed-off-by: Robert Foss <rfoss(a)kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230615201902.566182-3-marex…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/bridge/tc358762.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/bridge/tc358762.c b/drivers/gpu/drm/bridge/tc358762.c
index 5641395fd310e..11445c50956e1 100644
--- a/drivers/gpu/drm/bridge/tc358762.c
+++ b/drivers/gpu/drm/bridge/tc358762.c
@@ -231,7 +231,7 @@ static int tc358762_probe(struct mipi_dsi_device *dsi)
dsi->lanes = 1;
dsi->format = MIPI_DSI_FMT_RGB888;
dsi->mode_flags = MIPI_DSI_MODE_VIDEO | MIPI_DSI_MODE_VIDEO_SYNC_PULSE |
- MIPI_DSI_MODE_LPM;
+ MIPI_DSI_MODE_LPM | MIPI_DSI_MODE_VIDEO_HSE;
ret = tc358762_parse_dt(ctx);
if (ret < 0)
--
2.40.1
The folio conversion changed the behaviour of shmem_sg_alloc_table() to
put the entire length of the last folio into the sg list, even if the sg
list should have been shorter. gen8_ggtt_insert_entries() relied on the
list being the right length and would overrun the end of the page tables.
Other functions may also have been affected.
Clamp the length of the last entry in the sg list to be the expected
length.
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Fixes: 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch")
Cc: stable(a)vger.kernel.org # 6.5.x
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/9256
Link: https://lore.kernel.org/lkml/6287208.lOV4Wx5bFT@natalenko.name/
Reported-by: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Tested-by: Oleksandr Natalenko <oleksandr(a)natalenko.name>
---
drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 8f1633c3fb93..73a4a4eb29e0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -100,6 +100,7 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
st->nents = 0;
for (i = 0; i < page_count; i++) {
struct folio *folio;
+ unsigned long nr_pages;
const unsigned int shrink[] = {
I915_SHRINK_BOUND | I915_SHRINK_UNBOUND,
0,
@@ -150,6 +151,8 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
}
} while (1);
+ nr_pages = min_t(unsigned long,
+ folio_nr_pages(folio), page_count - i);
if (!i ||
sg->length >= max_segment ||
folio_pfn(folio) != next_pfn) {
@@ -157,13 +160,13 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
sg = sg_next(sg);
st->nents++;
- sg_set_folio(sg, folio, folio_size(folio), 0);
+ sg_set_folio(sg, folio, nr_pages * PAGE_SIZE, 0);
} else {
/* XXX: could overflow? */
- sg->length += folio_size(folio);
+ sg->length += nr_pages * PAGE_SIZE;
}
- next_pfn = folio_pfn(folio) + folio_nr_pages(folio);
- i += folio_nr_pages(folio) - 1;
+ next_pfn = folio_pfn(folio) + nr_pages;
+ i += nr_pages - 1;
/* Check that the i965g/gm workaround works. */
GEM_BUG_ON(gfp & __GFP_DMA32 && next_pfn >= 0x00100000UL);
--
2.40.1
From: Kent Overstreet <kent.overstreet(a)gmail.com>
commit 3644e2d2dda78e21edd8f5415b6d7ab03f5f54f3 upstream.
If iter->count is 0 and iocb->ki_pos is page aligned, this causes
nr_pages to be 0.
Then in generic_file_buffered_read_get_pages() find_get_pages_contig()
returns 0 - because we asked for 0 pages, so we call
generic_file_buffered_read_no_cached_page() which attempts to add a page
to the page cache, which fails with -EEXIST, and then we loop. Oops...
Signed-off-by: Kent Overstreet <kent.overstreet(a)gmail.com>
Reported-by: Jens Axboe <axboe(a)kernel.dk>
Reviewed-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Suraj Jitindar Singh <surajjs(a)amazon.com>
---
mm/filemap.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/filemap.c b/mm/filemap.c
index 3a983bc1a71c..3b0d8c6dd587 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2203,6 +2203,9 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
return 0;
+ if (unlikely(!iov_iter_count(iter)))
+ return 0;
+
iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
index = *ppos >> PAGE_SHIFT;
--
2.34.1
The quilt patch titled
Subject: proc: nommu: /proc/<pid>/maps: release mmap read lock
has been removed from the -mm tree. Its filename was
proc-nommu-proc-pid-maps-release-mmap-read-lock.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ben Wolsieffer <Ben.Wolsieffer(a)hefring.com>
Subject: proc: nommu: /proc/<pid>/maps: release mmap read lock
Date: Thu, 14 Sep 2023 12:30:20 -0400
The no-MMU implementation of /proc/<pid>/maps doesn't normally release
the mmap read lock, because it uses !IS_ERR_OR_NULL(_vml) to determine
whether to release the lock. Since _vml is NULL when the end of the
mappings is reached, the lock is not released.
Reading /proc/1/maps twice doesn't cause a hang because it only
takes the read lock, which can be taken multiple times and therefore
doesn't show any problem if the lock isn't released. Instead, you need
to perform some operation that attempts to take the write lock after
reading /proc/<pid>/maps. To actually reproduce the bug, compile the
following code as 'proc_maps_bug':
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
int main(int argc, char *argv[]) {
void *buf;
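/* give the parent shell time to cat /proc/<pid>/maps and leak the read lock */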
sleep(1);
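/* without the fix, this mmap() hangs forever: the leaked read lock blocks the write lock */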
buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
puts("mmap returned");
return 0;
}
Then, run:
./proc_maps_bug & cat /proc/$!/maps; fg
Without this patch, mmap() will hang and the command will never
complete.
This code was incorrectly adapted from the MMU implementation, which at
the time released the lock in m_next() before returning the last entry.
The MMU implementation has diverged further from the no-MMU version since
then, so this patch brings their locking and error handling into sync,
fixing the bug and hopefully avoiding similar issues in the future.
Link: https://lkml.kernel.org/r/20230914163019.4050530-2-ben.wolsieffer@hefring.c…
Fixes: 47fecca15c09 ("fs/proc/task_nommu.c: don't use priv->task->mm")
Signed-off-by: Ben Wolsieffer <ben.wolsieffer(a)hefring.com>
Acked-by: Oleg Nesterov <oleg(a)redhat.com>
Cc: Giulio Benetti <giulio.benetti(a)benettiengineering.com>
Cc: Greg Ungerer <gerg(a)uclinux.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_nommu.c | 27 +++++++++++++++------------
1 file changed, 15 insertions(+), 12 deletions(-)
--- a/fs/proc/task_nommu.c~proc-nommu-proc-pid-maps-release-mmap-read-lock
+++ a/fs/proc/task_nommu.c
@@ -192,11 +192,16 @@ static void *m_start(struct seq_file *m,
return ERR_PTR(-ESRCH);
mm = priv->mm;
- if (!mm || !mmget_not_zero(mm))
+ if (!mm || !mmget_not_zero(mm)) {
+ put_task_struct(priv->task);
+ priv->task = NULL;
return NULL;
+ }
if (mmap_read_lock_killable(mm)) {
mmput(mm);
+ put_task_struct(priv->task);
+ priv->task = NULL;
return ERR_PTR(-EINTR);
}
@@ -205,23 +210,21 @@ static void *m_start(struct seq_file *m,
if (vma)
return vma;
- mmap_read_unlock(mm);
- mmput(mm);
return NULL;
}
-static void m_stop(struct seq_file *m, void *_vml)
+static void m_stop(struct seq_file *m, void *v)
{
struct proc_maps_private *priv = m->private;
+ struct mm_struct *mm = priv->mm;
- if (!IS_ERR_OR_NULL(_vml)) {
- mmap_read_unlock(priv->mm);
- mmput(priv->mm);
- }
- if (priv->task) {
- put_task_struct(priv->task);
- priv->task = NULL;
- }
+ if (!priv->task)
+ return;
+
+ mmap_read_unlock(mm);
+ mmput(mm);
+ put_task_struct(priv->task);
+ priv->task = NULL;
}
static void *m_next(struct seq_file *m, void *_p, loff_t *pos)
_
Patches currently in -mm which might be from Ben.Wolsieffer(a)hefring.com are
The quilt patch titled
Subject: mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list
has been removed from the -mm tree. Its filename was
mm-page_alloc-fix-cma-and-highatomic-landing-on-the-wrong-buddy-list.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Johannes Weiner <hannes(a)cmpxchg.org>
Subject: mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list
Date: Mon, 11 Sep 2023 14:11:08 -0400
Commit 4b23a68f9536 ("mm/page_alloc: protect PCP lists with a spinlock")
bypasses the pcplist on lock contention and returns the page directly to
the buddy list of the page's migratetype.
For pages that don't have their own pcplist, such as CMA and HIGHATOMIC,
the migratetype is temporarily updated such that the page can hitch a ride
on the MOVABLE pcplist. Their true type is later reassessed when flushing
in free_pcppages_bulk(). However, when lock contention is detected after
the type was already overridden, the bypass will then put the page on the
wrong buddy list.
Once on the MOVABLE buddy list, the page becomes eligible for fallbacks
and even stealing. In the case of HIGHATOMIC, otherwise ineligible
allocations can dip into the highatomic reserves. In the case of CMA, the
page can be lost from the CMA region permanently.
Use a separate pcpmigratetype variable for the pcplist override. Use the
original migratetype when going directly to the buddy. This fixes the bug
and should make the intentions more obvious in the code.
Originally sent here to address the HIGHATOMIC case:
https://lore.kernel.org/lkml/20230821183733.106619-4-hannes@cmpxchg.org/
Changelog updated in response to the CMA-specific bug report.
[mgorman(a)techsingularity.net: updated changelog]
Link: https://lkml.kernel.org/r/20230911181108.GA104295@cmpxchg.org
Fixes: 4b23a68f9536 ("mm/page_alloc: protect PCP lists with a spinlock")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reported-by: Joe Liu <joe.liu(a)mediatek.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-fix-cma-and-highatomic-landing-on-the-wrong-buddy-list
+++ a/mm/page_alloc.c
@@ -2400,7 +2400,7 @@ void free_unref_page(struct page *page,
struct per_cpu_pages *pcp;
struct zone *zone;
unsigned long pfn = page_to_pfn(page);
- int migratetype;
+ int migratetype, pcpmigratetype;
if (!free_unref_page_prepare(page, pfn, order))
return;
@@ -2408,24 +2408,24 @@ void free_unref_page(struct page *page,
/*
* We only track unmovable, reclaimable and movable on pcp lists.
* Place ISOLATE pages on the isolated list because they are being
- * offlined but treat HIGHATOMIC as movable pages so we can get those
- * areas back if necessary. Otherwise, we may have to free
+ * offlined but treat HIGHATOMIC and CMA as movable pages so we can
+ * get those areas back if necessary. Otherwise, we may have to free
* excessively into the page allocator
*/
- migratetype = get_pcppage_migratetype(page);
+ migratetype = pcpmigratetype = get_pcppage_migratetype(page);
if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
if (unlikely(is_migrate_isolate(migratetype))) {
free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE);
return;
}
- migratetype = MIGRATE_MOVABLE;
+ pcpmigratetype = MIGRATE_MOVABLE;
}
zone = page_zone(page);
pcp_trylock_prepare(UP_flags);
pcp = pcp_spin_trylock(zone->per_cpu_pageset);
if (pcp) {
- free_unref_page_commit(zone, pcp, page, migratetype, order);
+ free_unref_page_commit(zone, pcp, page, pcpmigratetype, order);
pcp_spin_unlock(pcp);
} else {
free_one_page(zone, page, pfn, order, migratetype, FPI_NONE);
_
Patches currently in -mm which might be from hannes(a)cmpxchg.org are
The patch titled
Subject: i915: limit the length of an sg list to the requested length
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
i915-limit-the-length-of-an-sg-list-to-the-requested-length.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Subject: i915: limit the length of an sg list to the requested length
Date: Tue, 19 Sep 2023 20:48:55 +0100
The folio conversion changed the behaviour of shmem_sg_alloc_table() to
put the entire length of the last folio into the sg list, even if the sg
list should have been shorter. gen8_ggtt_insert_entries() relied on the
list being the right length and would overrun the end of the page tables.
Other functions may also have been affected.
Clamp the length of the last entry in the sg list to be the expected
length.
Link: https://lkml.kernel.org/r/20230919194855.347582-1-willy@infradead.org
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/9256
Fixes: 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch")
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Reported-by: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Closes: https://lore.kernel.org/lkml/6287208.lOV4Wx5bFT@natalenko.name/
Tested-by: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> [6.5.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c~i915-limit-the-length-of-an-sg-list-to-the-requested-length
+++ a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -100,6 +100,7 @@ int shmem_sg_alloc_table(struct drm_i915
st->nents = 0;
for (i = 0; i < page_count; i++) {
struct folio *folio;
+ unsigned long nr_pages;
const unsigned int shrink[] = {
I915_SHRINK_BOUND | I915_SHRINK_UNBOUND,
0,
@@ -150,6 +151,8 @@ int shmem_sg_alloc_table(struct drm_i915
}
} while (1);
+ nr_pages = min_t(unsigned long,
+ folio_nr_pages(folio), page_count - i);
if (!i ||
sg->length >= max_segment ||
folio_pfn(folio) != next_pfn) {
@@ -157,13 +160,13 @@ int shmem_sg_alloc_table(struct drm_i915
sg = sg_next(sg);
st->nents++;
- sg_set_folio(sg, folio, folio_size(folio), 0);
+ sg_set_folio(sg, folio, nr_pages * PAGE_SIZE, 0);
} else {
/* XXX: could overflow? */
- sg->length += folio_size(folio);
+ sg->length += nr_pages * PAGE_SIZE;
}
- next_pfn = folio_pfn(folio) + folio_nr_pages(folio);
- i += folio_nr_pages(folio) - 1;
+ next_pfn = folio_pfn(folio) + nr_pages;
+ i += nr_pages - 1;
/* Check that the i965g/gm workaround works. */
GEM_BUG_ON(gfp & __GFP_DMA32 && next_pfn >= 0x00100000UL);
_
Patches currently in -mm which might be from willy(a)infradead.org are
i915-limit-the-length-of-an-sg-list-to-the-requested-length.patch
mm-convert-dax-lock-unlock-page-to-lock-unlock-folio.patch
buffer-pass-gfp-flags-to-folio_alloc_buffers.patch
buffer-hoist-gfp-flags-from-grow_dev_page-to-__getblk_gfp.patch
ext4-use-bdev_getblk-to-avoid-memory-reclaim-in-readahead-path.patch
buffer-use-bdev_getblk-to-avoid-memory-reclaim-in-readahead-path.patch
buffer-convert-getblk_unmovable-and-__getblk-to-use-bdev_getblk.patch
buffer-convert-sb_getblk-to-call-__getblk.patch
ext4-call-bdev_getblk-from-sb_getblk_gfp.patch
buffer-remove-__getblk_gfp.patch
hugetlb-use-a-folio-in-free_hpage_workfn.patch
hugetlb-remove-a-few-calls-to-page_folio.patch
hugetlb-convert-remove_pool_huge_page-to-remove_pool_hugetlb_folio.patch
(Re-sending email since the previous email was undeliverable due to HTML content)
Hi Team,
This is a request to backport the following fix, already merged into Linus' tree, to 6.5/scsi-fixes.
It fixes a crash caused by a null pointer dereference when a LUN reset is issued from sgreset.
With this fix, the crash no longer occurs.
I have another fix, already tested, that depends on this one; it is currently in the pipeline.
I'll send out a patch for that fix when the internal review is complete.
Please let me know if you need any more information to backport this fix.
commit 15924b0503630016dee4dbb945a8df4df659070b
Author: Karan Tilak Kumar <kartilak(a)cisco.com>
Date: Thu Aug 17 11:21:46 2023 -0700
scsi: fnic: Replace sgreset tag with max_tag_id
sgreset is issued with a SCSI command pointer. The device reset code
assumes that it was issued on a hardware queue and calls into the block
multiqueue layer. However, that assumption does not hold: there is no
hardware queue associated with the sgreset, and this leads to a crash due
to a null pointer dereference.
Fix the code to use max_tag_id as a tag that does not overlap with the
other tags issued by the mid layer.
Tested by running FC traffic for a few minutes, and by issuing sgreset on
the device in parallel. Without the fix, the crash is observed right away.
With this fix, no crash is observed.
Reviewed-by: Sesidhar Baddela <sebaddel(a)cisco.com>
Tested-by: Karan Tilak Kumar <kartilak(a)cisco.com>
Signed-off-by: Karan Tilak Kumar <kartilak(a)cisco.com>
Link: https://lore.kernel.org/r/20230817182146.229059-1-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
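Since the patch body is not quoted here, the following is a rough sketch
of the tag-selection idea after the fix. The function and field names are
assumptions drawn from the changelog above, not the verified fnic source.

/* Illustrative only: names here are assumed, not taken from fnic code. */
static u32 fnic_reset_tag(struct fnic *fnic, struct scsi_cmnd *sc,
			  bool issued_by_sgreset)
{
	if (issued_by_sgreset)
		/*
		 * sgreset commands never went through blk-mq, so they have
		 * no hardware queue or driver tag; use a tag id above
		 * everything the mid layer can hand out, so it cannot
		 * collide with a real request tag.
		 */
		return fnic->fnic_max_tag_id;

	/* normal mid-layer resets reuse the blk-mq driver tag */
	return scsi_cmd_to_rq(sc)->tag;
}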
Thanks,
Karan
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 9cc0a598b944816f2968baf2631757f22721b996
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091616-equipment-bucktooth-6ae5@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
9cc0a598b944 ("mtd: rawnand: brcmnand: Fix potential false time out warning")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9cc0a598b944816f2968baf2631757f22721b996 Mon Sep 17 00:00:00 2001
From: William Zhang <william.zhang(a)broadcom.com>
Date: Thu, 6 Jul 2023 11:29:06 -0700
Subject: [PATCH] mtd: rawnand: brcmnand: Fix potential false time out warning
If the system is busy during the command status polling function, the
driver may not get a chance to poll the status register before the timeout
expires and can return a premature status. Do a final check after the
timeout to ensure the correct status is read.
Fixes: 9d2ee0a60b8b ("mtd: nand: brcmnand: Check flash #WP pin status before nand erase/program")
Signed-off-by: William Zhang <william.zhang(a)broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230706182909.79151-3-william.zhang@broa…
diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
index 9ea96911d16b..9a373a10304d 100644
--- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
@@ -1080,6 +1080,14 @@ static int bcmnand_ctrl_poll_status(struct brcmnand_controller *ctrl,
cpu_relax();
} while (time_after(limit, jiffies));
+ /*
+ * do a final check after time out in case the CPU was busy and the driver
+ * did not get enough time to perform the polling to avoid false alarms
+ */
+ val = brcmnand_read_reg(ctrl, BRCMNAND_INTFC_STATUS);
+ if ((val & mask) == expected_val)
+ return 0;
+
dev_warn(ctrl->dev, "timeout on status poll (expected %x got %x)\n",
expected_val, val & mask);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 5d53244186c9ac58cb88d76a0958ca55b83a15cd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091632-marauding-viable-2fad@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
5d53244186c9 ("mtd: rawnand: brcmnand: Fix potential out-of-bounds access in oob write")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d53244186c9ac58cb88d76a0958ca55b83a15cd Mon Sep 17 00:00:00 2001
From: William Zhang <william.zhang(a)broadcom.com>
Date: Thu, 6 Jul 2023 11:29:08 -0700
Subject: [PATCH] mtd: rawnand: brcmnand: Fix potential out-of-bounds access in
oob write
When the oob buffer length is not a multiple of words, the oob write
function does an out-of-bounds read on the oob source buffer at the last
iteration. Fix that by always checking the length limit when reading from
the oob buffer, and by filling the remaining bytes of the last word with
0xff before writing it to the oob registers.
Fixes: 27c5b17cd1b1 ("mtd: nand: add NAND driver "library" for Broadcom STB NAND controller")
Signed-off-by: William Zhang <william.zhang(a)broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230706182909.79151-5-william.zhang@broa…
diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
index b2c6396060db..71d0ba652bee 100644
--- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
@@ -1477,19 +1477,33 @@ static int write_oob_to_regs(struct brcmnand_controller *ctrl, int i,
const u8 *oob, int sas, int sector_1k)
{
int tbytes = sas << sector_1k;
- int j;
+ int j, k = 0;
+ u32 last = 0xffffffff;
+ u8 *plast = (u8 *)&last;
/* Adjust OOB values for 1K sector size */
if (sector_1k && (i & 0x01))
tbytes = max(0, tbytes - (int)ctrl->max_oob);
tbytes = min_t(int, tbytes, ctrl->max_oob);
- for (j = 0; j < tbytes; j += 4)
+ /*
+ * tbytes may not be multiple of words. Make sure we don't read out of
+ * the boundary and stop at last word.
+ */
+ for (j = 0; (j + 3) < tbytes; j += 4)
oob_reg_write(ctrl, j,
(oob[j + 0] << 24) |
(oob[j + 1] << 16) |
(oob[j + 2] << 8) |
(oob[j + 3] << 0));
+
+	/* handle the remaining bytes */
+ while (j < tbytes)
+ plast[k++] = oob[j++];
+
+ if (tbytes & 0x3)
+ oob_reg_write(ctrl, (tbytes & ~0x3), (__force u32)cpu_to_be32(last));
+
return tbytes;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 57a943ebfcdb4a97fbb409640234bdb44bfa1953
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091617-deserve-animal-a57e@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
57a943ebfcdb ("drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma")
5d945cbcd4b1 ("drm/amd/display: Create a file dedicated to planes")
60693e3a3890 ("Merge tag 'amd-drm-next-5.20-2022-07-14' of https://gitlab.freedesktop.org/agd5f/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 57a943ebfcdb4a97fbb409640234bdb44bfa1953 Mon Sep 17 00:00:00 2001
From: Melissa Wen <mwen(a)igalia.com>
Date: Thu, 31 Aug 2023 15:12:28 -0100
Subject: [PATCH] drm/amd/display: enable cursor degamma for DCN3+ DRM legacy
gamma
For DRM legacy gamma, the AMD display manager applies an implicit sRGB
degamma using a pre-defined sRGB transfer function. It works fine for the
DCN2 family, where degamma ROM and custom curves go to the same color
block. But on DCN3+, degamma is split into two blocks: degamma ROM for
pre-defined TFs and `gamma correction` for user/custom curves, and the
degamma ROM settings don't apply to the cursor plane. To get DRM legacy
gamma working as expected, enable the cursor degamma ROM for implicit sRGB
degamma on HW with this configuration.
Cc: stable(a)vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2803
Fixes: 96b020e2163f ("drm/amd/display: check attr flag before set cursor degamma on DCN3+")
Signed-off-by: Melissa Wen <mwen(a)igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 2198df96ed6f..cc74dd69acf2 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -1269,6 +1269,13 @@ void amdgpu_dm_plane_handle_cursor_update(struct drm_plane *plane,
attributes.rotation_angle = 0;
attributes.attribute_flags.value = 0;
+ /* Enable cursor degamma ROM on DCN3+ for implicit sRGB degamma in DRM
+ * legacy gamma setup.
+ */
+ if (crtc_state->cm_is_degamma_srgb &&
+ adev->dm.dc->caps.color.dpp.gamma_corr)
+ attributes.attribute_flags.bits.ENABLE_CURSOR_DEGAMMA = 1;
+
attributes.pitch = afb->base.pitches[0] / afb->base.format->cpp[0];
if (crtc_state->stream) {