Good friend. we kindly want to know if you're capable for investment
project in
your country. i
need a serious partnership with good background, kindly reply
me to discuss details immediately. i will appreciate you to contact me
on this email address Thanks and awaiting your quick response,
Wormer?
commit 4aa3b75c74603c3374877d5fd18ad9cc3a9a62ed upstream.
The Counter (CNTR) register is 24 bits wide, but we can have an
effective 25-bit count value by setting bit 24 to the XOR of the Borrow
flag and Carry flag. The flags can be read from the FLAG register, but a
race condition exists: the Borrow flag and Carry flag are instantaneous
and could change by the time the count value is read from the CNTR
register.
Since the race condition could result in an incorrect 25-bit count
value, remove support for 25-bit count values from this driver.
Fixes: 28e5d3bb0325 ("iio: 104-quad-8: Add IIO support for the ACCES 104-QUAD-8")
Cc: <stable(a)vger.kernel.org> # 4.14.x
Signed-off-by: William Breathitt Gray <william.gray(a)linaro.org>
---
drivers/iio/counter/104-quad-8.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/drivers/iio/counter/104-quad-8.c b/drivers/iio/counter/104-quad-8.c
index 181585ae6e..bdb07694e2 100644
--- a/drivers/iio/counter/104-quad-8.c
+++ b/drivers/iio/counter/104-quad-8.c
@@ -64,9 +64,6 @@ static int quad8_read_raw(struct iio_dev *indio_dev,
{
struct quad8_iio *const priv = iio_priv(indio_dev);
const int base_offset = priv->base + 2 * chan->channel;
- unsigned int flags;
- unsigned int borrow;
- unsigned int carry;
int i;
switch (mask) {
@@ -76,12 +73,7 @@ static int quad8_read_raw(struct iio_dev *indio_dev,
return IIO_VAL_INT;
}
- flags = inb(base_offset + 1);
- borrow = flags & BIT(0);
- carry = !!(flags & BIT(1));
-
- /* Borrow XOR Carry effectively doubles count range */
- *val = (borrow ^ carry) << 24;
+ *val = 0;
/* Reset Byte Pointer; transfer Counter to Output Latch */
outb(0x11, base_offset + 1);
base-commit: df06e352f27a9f368ec6a3b077881c35d933e32c
--
2.40.0
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x f4e9e0e69468583c2c6d9d5c7bfc975e292bf188
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042436-smile-upwind-5931@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
f4e9e0e69468 ("mm/mempolicy: fix use-after-free of VMA iterator")
9760ebffbf55 ("mm: switch vma_merge(), split_vma(), and __split_vma to vma iterator")
47d9644de92c ("nommu: convert nommu to using the vma iterator")
a27a11f92fe2 ("mm/mremap: use vmi version of vma_merge()")
076f16bf7698 ("mmap: use vmi version of vma_merge()")
0c0c5bffd0a2 ("mmap: pass through vmi iterator to __split_vma()")
178e22ac2078 ("madvise: use vmi iterator for __split_vma() and vma_merge()")
f10c2abcdac4 ("mempolicy: convert to vma iterator")
37598f5a9d8b ("mlock: convert mlock to vma iterator")
2286a6914c77 ("mm: change mprotect_fixup to vma iterator")
11a9b90274f6 ("userfaultfd: use vma iterator")
f2ebfe43ba6c ("mm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()")
183654ce26a5 ("mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator")
0378c0a0e9e4 ("mm/mmap: remove preallocation from do_mas_align_munmap()")
92fed82047d7 ("mm/mmap: convert brk to use vma iterator")
baabcfc93d3b ("mm/mmap: fix typo in comment")
c5d5546ea065 ("maple_tree: remove the parameter entry of mas_preallocate")
5ab0fc155dc0 ("Sync mm-stable with mm-hotfixes-stable to pick up dependent patches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f4e9e0e69468583c2c6d9d5c7bfc975e292bf188 Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Mon, 10 Apr 2023 11:22:05 -0400
Subject: [PATCH] mm/mempolicy: fix use-after-free of VMA iterator
set_mempolicy_home_node() iterates over a list of VMAs and calls
mbind_range() on each VMA, which also iterates over the singular list of
the VMA passed in and potentially splits the VMA. Since the VMA iterator
is not passed through, set_mempolicy_home_node() may now point to a stale
node in the VMA tree. This can result in a UAF as reported by syzbot.
Avoid the stale maple tree node by passing the VMA iterator through to the
underlying call to split_vma().
mbind_range() is also overly complicated, since there are two calling
functions and one already handles iterating over the VMAs. Simplify
mbind_range() to only handle merging and splitting of the VMAs.
Align the new loop in do_mbind() and existing loop in
set_mempolicy_home_node() to use the reduced mbind_range() function. This
allows for a single location of the range calculation and avoids
constantly looking up the previous VMA (since this is a loop over the
VMAs).
Link: https://lore.kernel.org/linux-mm/000000000000c93feb05f87e24ad@google.com/
Fixes: 66850be55e8e ("mm/mempolicy: use vma iterator & maple state instead of vma linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: syzbot+a7c1ec5b1d71ceaa5186(a)syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/20230410152205.2294819-1-Liam.Howlett@oracle.com
Tested-by: syzbot+a7c1ec5b1d71ceaa5186(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a256a241fd1d..2068b594dc88 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -790,61 +790,50 @@ static int vma_replace_policy(struct vm_area_struct *vma,
return err;
}
-/* Step 2: apply policy to a range and do splits. */
-static int mbind_range(struct mm_struct *mm, unsigned long start,
- unsigned long end, struct mempolicy *new_pol)
+/* Split or merge the VMA (if required) and apply the new policy */
+static int mbind_range(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ struct vm_area_struct **prev, unsigned long start,
+ unsigned long end, struct mempolicy *new_pol)
{
- VMA_ITERATOR(vmi, mm, start);
- struct vm_area_struct *prev;
- struct vm_area_struct *vma;
- int err = 0;
+ struct vm_area_struct *merged;
+ unsigned long vmstart, vmend;
pgoff_t pgoff;
+ int err;
- prev = vma_prev(&vmi);
- vma = vma_find(&vmi, end);
- if (WARN_ON(!vma))
+ vmend = min(end, vma->vm_end);
+ if (start > vma->vm_start) {
+ *prev = vma;
+ vmstart = start;
+ } else {
+ vmstart = vma->vm_start;
+ }
+
+ if (mpol_equal(vma_policy(vma), new_pol))
return 0;
- if (start > vma->vm_start)
- prev = vma;
-
- do {
- unsigned long vmstart = max(start, vma->vm_start);
- unsigned long vmend = min(end, vma->vm_end);
-
- if (mpol_equal(vma_policy(vma), new_pol))
- goto next;
-
- pgoff = vma->vm_pgoff +
- ((vmstart - vma->vm_start) >> PAGE_SHIFT);
- prev = vma_merge(&vmi, mm, prev, vmstart, vmend, vma->vm_flags,
- vma->anon_vma, vma->vm_file, pgoff,
- new_pol, vma->vm_userfaultfd_ctx,
- anon_vma_name(vma));
- if (prev) {
- vma = prev;
- goto replace;
- }
- if (vma->vm_start != vmstart) {
- err = split_vma(&vmi, vma, vmstart, 1);
- if (err)
- goto out;
- }
- if (vma->vm_end != vmend) {
- err = split_vma(&vmi, vma, vmend, 0);
- if (err)
- goto out;
- }
-replace:
- err = vma_replace_policy(vma, new_pol);
+ pgoff = vma->vm_pgoff + ((vmstart - vma->vm_start) >> PAGE_SHIFT);
+ merged = vma_merge(vmi, vma->vm_mm, *prev, vmstart, vmend, vma->vm_flags,
+ vma->anon_vma, vma->vm_file, pgoff, new_pol,
+ vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ if (merged) {
+ *prev = merged;
+ return vma_replace_policy(merged, new_pol);
+ }
+
+ if (vma->vm_start != vmstart) {
+ err = split_vma(vmi, vma, vmstart, 1);
if (err)
- goto out;
-next:
- prev = vma;
- } for_each_vma_range(vmi, vma, end);
+ return err;
+ }
-out:
- return err;
+ if (vma->vm_end != vmend) {
+ err = split_vma(vmi, vma, vmend, 0);
+ if (err)
+ return err;
+ }
+
+ *prev = vma;
+ return vma_replace_policy(vma, new_pol);
}
/* Set the process memory policy */
@@ -1259,6 +1248,8 @@ static long do_mbind(unsigned long start, unsigned long len,
nodemask_t *nmask, unsigned long flags)
{
struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma, *prev;
+ struct vma_iterator vmi;
struct mempolicy *new;
unsigned long end;
int err;
@@ -1328,7 +1319,13 @@ static long do_mbind(unsigned long start, unsigned long len,
goto up_out;
}
- err = mbind_range(mm, start, end, new);
+ vma_iter_init(&vmi, mm, start);
+ prev = vma_prev(&vmi);
+ for_each_vma_range(vmi, vma, end) {
+ err = mbind_range(&vmi, vma, &prev, start, end, new);
+ if (err)
+ break;
+ }
if (!err) {
int nr_failed = 0;
@@ -1489,10 +1486,8 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
unsigned long, home_node, unsigned long, flags)
{
struct mm_struct *mm = current->mm;
- struct vm_area_struct *vma;
+ struct vm_area_struct *vma, *prev;
struct mempolicy *new, *old;
- unsigned long vmstart;
- unsigned long vmend;
unsigned long end;
int err = -ENOENT;
VMA_ITERATOR(vmi, mm, start);
@@ -1521,6 +1516,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
if (end == start)
return 0;
mmap_write_lock(mm);
+ prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
/*
* If any vma in the range got policy other than MPOL_BIND
@@ -1541,9 +1537,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
}
new->home_node = home_node;
- vmstart = max(start, vma->vm_start);
- vmend = min(end, vma->vm_end);
- err = mbind_range(mm, vmstart, vmend, new);
+ err = mbind_range(&vmi, vma, &prev, start, end, new);
mpol_put(new);
if (err)
break;
We used to map the dtb differently between early_pg_dir and
swapper_pg_dir which caused issues when we referenced addresses from
the early mapping with swapper_pg_dir (reserved_mem): move the dtb mapping
to the fixmap region in patch 1, which allows to simplify dtb handling in
patch 2.
base-commit-tag: v6.1.24
Alexandre Ghiti (3):
riscv: Move early dtb mapping into the fixmap region
riscv: Do not set initial_boot_params to the linear address of the dtb
riscv: No need to relocate the dtb as it lies in the fixmap region
Documentation/riscv/vm-layout.rst | 4 +-
arch/riscv/include/asm/fixmap.h | 8 +++
arch/riscv/include/asm/pgtable.h | 8 ++-
arch/riscv/kernel/setup.c | 6 +--
arch/riscv/mm/init.c | 82 ++++++++++++++-----------------
5 files changed, 53 insertions(+), 55 deletions(-)
--
2.37.2
We used to map the dtb differently between early_pg_dir and
swapper_pg_dir which caused issues when we referenced addresses from
the early mapping with swapper_pg_dir (reserved_mem): move the dtb mapping
to the fixmap region in patch 1, which allows to simplify dtb handling in
patch 2.
base-commit-tag: v5.15.108
Changes in v2:
- Fix upstream commit line position
Alexandre Ghiti (3):
riscv: Move early dtb mapping into the fixmap region
riscv: Do not set initial_boot_params to the linear address of the dtb
riscv: No need to relocate the dtb as it lies in the fixmap region
Documentation/riscv/vm-layout.rst | 2 +-
arch/riscv/include/asm/fixmap.h | 8 ++++
arch/riscv/include/asm/pgtable.h | 8 +++-
arch/riscv/kernel/setup.c | 6 +--
arch/riscv/mm/init.c | 68 ++++++++++++++++---------------
5 files changed, 52 insertions(+), 40 deletions(-)
--
2.37.2
Dev can be renamed also while up for supported device. We currently
wrongly clear the NETDEV_LED_MODE_LINKUP flag on NETDEV_CHANGENAME
event.
Fix this by rechecking if the carrier is ok on NETDEV_CHANGENAME and
correctly set the NETDEV_LED_MODE_LINKUP bit.
Fixes: 5f820ed52371 ("leds: trigger: netdev: fix handling on interface rename")
Signed-off-by: Christian Marangi <ansuelsmth(a)gmail.com>
Cc: stable(a)vger.kernel.org # v5.5+
---
drivers/leds/trigger/ledtrig-netdev.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/leds/trigger/ledtrig-netdev.c b/drivers/leds/trigger/ledtrig-netdev.c
index d5e774d83021..f4d670ec30bc 100644
--- a/drivers/leds/trigger/ledtrig-netdev.c
+++ b/drivers/leds/trigger/ledtrig-netdev.c
@@ -318,6 +318,9 @@ static int netdev_trig_notify(struct notifier_block *nb,
clear_bit(NETDEV_LED_MODE_LINKUP, &trigger_data->mode);
switch (evt) {
case NETDEV_CHANGENAME:
+ if (netif_carrier_ok(dev))
+ set_bit(NETDEV_LED_MODE_LINKUP, &trigger_data->mode);
+ fallthrough;
case NETDEV_REGISTER:
if (trigger_data->net_dev)
dev_put(trigger_data->net_dev);
--
2.39.2
We used to map the dtb differently between early_pg_dir and
swapper_pg_dir which caused issues when we referenced addresses from
the early mapping with swapper_pg_dir (reserved_mem): move the dtb mapping
to the fixmap region in patch 1, which allows to simplify dtb handling in
patch 2.
base-commit-tag: v5.15.108
Alexandre Ghiti (3):
[ Upstream commit ef69d2559fe9 ]
[ Upstream commit f1581626071c ]
[ Upstream commit 1b50f956c8fe ]
Documentation/riscv/vm-layout.rst | 2 +-
arch/riscv/include/asm/fixmap.h | 8 ++++
arch/riscv/include/asm/pgtable.h | 8 +++-
arch/riscv/kernel/setup.c | 6 +--
arch/riscv/mm/init.c | 68 ++++++++++++++++---------------
5 files changed, 52 insertions(+), 40 deletions(-)
--
2.37.2
Since commit 3e4be65eb82c ("Bluetooth: hci_qca: Add poweroff support
during hci down for wcn3990"), the setup callback which registers the
debugfs interface can be called multiple times.
This specifically leads to the following error when powering on the
controller:
debugfs: Directory 'ibs' with parent 'hci0' already present!
Add a driver flag to avoid trying to register the debugfs interface more
than once.
Fixes: 3e4be65eb82c ("Bluetooth: hci_qca: Add poweroff support during hci down for wcn3990")
Cc: stable(a)vger.kernel.org # 4.20
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/bluetooth/hci_qca.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index 38ff962662ff..db020c04b3e8 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -78,7 +78,8 @@ enum qca_flags {
QCA_HW_ERROR_EVENT,
QCA_SSR_TRIGGERED,
QCA_BT_OFF,
- QCA_ROM_FW
+ QCA_ROM_FW,
+ QCA_DEBUGFS_CREATED,
};
enum qca_capabilities {
@@ -635,6 +636,9 @@ static void qca_debugfs_init(struct hci_dev *hdev)
if (!hdev->debugfs)
return;
+ if (test_and_set_bit(QCA_DEBUGFS_CREATED, &qca->flags))
+ return;
+
ibs_dir = debugfs_create_dir("ibs", hdev->debugfs);
/* read only */
--
2.39.2
Dear,
Please grant me permission to share a very crucial discussion with
you. I am looking forward to hearing from you at your earliest
convenience.
Mrs. Mariam Kouame
Following up on:
https://lore.kernel.org/stable/20230412-mustang-machine-e9fccdb6b81c@wendy/
Here's some backports that do pull back the rename of the driver and
Kconfig symbol etc.
CC: stable(a)vger.kernel.org
CC: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
CC: Greentime Hu <greentime.hu(a)sifive.com>
CC: Zong Li <zong.li(a)sifive.com>
CC: Palmer Dabbelt <palmer(a)rivosinc.com>
CC: Sasha Levin <sashal(a)kernel.org>
Yang Yingliang (3):
soc: sifive: l2_cache: fix missing iounmap() in error path in
sifive_l2_init()
soc: sifive: l2_cache: fix missing free_irq() in error path in
sifive_l2_init()
soc: sifive: l2_cache: fix missing of_node_put() in sifive_l2_init()
drivers/soc/sifive/sifive_l2_cache.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)
--
2.39.2
Following up on:
https://lore.kernel.org/stable/20230412-mustang-machine-e9fccdb6b81c@wendy/
Here's some backports that do pull back the rename of the driver and
Kconfig symbol etc.
Changes in v2:
- use the right hash for 2/3
CC: stable(a)vger.kernel.org
CC: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
CC: Greentime Hu <greentime.hu(a)sifive.com>
CC: Zong Li <zong.li(a)sifive.com>
CC: Palmer Dabbelt <palmer(a)rivosinc.com>
CC: Sasha Levin <sashal(a)kernel.org>
Yang Yingliang (3):
soc: sifive: l2_cache: fix missing iounmap() in error path in
sifive_l2_init()
soc: sifive: l2_cache: fix missing free_irq() in error path in
sifive_l2_init()
soc: sifive: l2_cache: fix missing of_node_put() in sifive_l2_init()
drivers/soc/sifive/sifive_l2_cache.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)
--
2.39.2
Dzień dobry,
czy rozważali Państwo rozwój kwalifikacji językowych swoich pracowników?
Opracowaliśmy kursy językowe dla różnych branż, w których koncentrujemy się na podniesieniu poziomu słownictwa i jakości komunikacji wykorzystując autorską metodę, stworzoną specjalnie dla wymagającego biznesu.
Niestandardowy kurs on-line, dopasowany do profilu firmy i obszarów świadczonych usług, w szybkim czasie przyniesie efekty, które zwiększą komfort i jakość pracy, rozwijając możliwości biznesowe.
Zdalne szkolenie językowe to m.in. zajęcia z native speakerami, które w szybkim czasie nauczą pracowników rozmawiać za pomocą jasnego i zwięzłego języka Business English.
Czy mógłbym przedstawić więcej szczegółów i opowiedzieć jak działamy?
Pozdrawiam
Krzysztof Maj
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x f4e9e0e69468583c2c6d9d5c7bfc975e292bf188
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042437-profane-confidant-7987@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
f4e9e0e69468 ("mm/mempolicy: fix use-after-free of VMA iterator")
9760ebffbf55 ("mm: switch vma_merge(), split_vma(), and __split_vma to vma iterator")
47d9644de92c ("nommu: convert nommu to using the vma iterator")
a27a11f92fe2 ("mm/mremap: use vmi version of vma_merge()")
076f16bf7698 ("mmap: use vmi version of vma_merge()")
0c0c5bffd0a2 ("mmap: pass through vmi iterator to __split_vma()")
178e22ac2078 ("madvise: use vmi iterator for __split_vma() and vma_merge()")
f10c2abcdac4 ("mempolicy: convert to vma iterator")
37598f5a9d8b ("mlock: convert mlock to vma iterator")
2286a6914c77 ("mm: change mprotect_fixup to vma iterator")
11a9b90274f6 ("userfaultfd: use vma iterator")
f2ebfe43ba6c ("mm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()")
183654ce26a5 ("mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator")
0378c0a0e9e4 ("mm/mmap: remove preallocation from do_mas_align_munmap()")
92fed82047d7 ("mm/mmap: convert brk to use vma iterator")
baabcfc93d3b ("mm/mmap: fix typo in comment")
c5d5546ea065 ("maple_tree: remove the parameter entry of mas_preallocate")
5ab0fc155dc0 ("Sync mm-stable with mm-hotfixes-stable to pick up dependent patches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f4e9e0e69468583c2c6d9d5c7bfc975e292bf188 Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Mon, 10 Apr 2023 11:22:05 -0400
Subject: [PATCH] mm/mempolicy: fix use-after-free of VMA iterator
set_mempolicy_home_node() iterates over a list of VMAs and calls
mbind_range() on each VMA, which also iterates over the singular list of
the VMA passed in and potentially splits the VMA. Since the VMA iterator
is not passed through, set_mempolicy_home_node() may now point to a stale
node in the VMA tree. This can result in a UAF as reported by syzbot.
Avoid the stale maple tree node by passing the VMA iterator through to the
underlying call to split_vma().
mbind_range() is also overly complicated, since there are two calling
functions and one already handles iterating over the VMAs. Simplify
mbind_range() to only handle merging and splitting of the VMAs.
Align the new loop in do_mbind() and existing loop in
set_mempolicy_home_node() to use the reduced mbind_range() function. This
allows for a single location of the range calculation and avoids
constantly looking up the previous VMA (since this is a loop over the
VMAs).
Link: https://lore.kernel.org/linux-mm/000000000000c93feb05f87e24ad@google.com/
Fixes: 66850be55e8e ("mm/mempolicy: use vma iterator & maple state instead of vma linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: syzbot+a7c1ec5b1d71ceaa5186(a)syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/20230410152205.2294819-1-Liam.Howlett@oracle.com
Tested-by: syzbot+a7c1ec5b1d71ceaa5186(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a256a241fd1d..2068b594dc88 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -790,61 +790,50 @@ static int vma_replace_policy(struct vm_area_struct *vma,
return err;
}
-/* Step 2: apply policy to a range and do splits. */
-static int mbind_range(struct mm_struct *mm, unsigned long start,
- unsigned long end, struct mempolicy *new_pol)
+/* Split or merge the VMA (if required) and apply the new policy */
+static int mbind_range(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ struct vm_area_struct **prev, unsigned long start,
+ unsigned long end, struct mempolicy *new_pol)
{
- VMA_ITERATOR(vmi, mm, start);
- struct vm_area_struct *prev;
- struct vm_area_struct *vma;
- int err = 0;
+ struct vm_area_struct *merged;
+ unsigned long vmstart, vmend;
pgoff_t pgoff;
+ int err;
- prev = vma_prev(&vmi);
- vma = vma_find(&vmi, end);
- if (WARN_ON(!vma))
+ vmend = min(end, vma->vm_end);
+ if (start > vma->vm_start) {
+ *prev = vma;
+ vmstart = start;
+ } else {
+ vmstart = vma->vm_start;
+ }
+
+ if (mpol_equal(vma_policy(vma), new_pol))
return 0;
- if (start > vma->vm_start)
- prev = vma;
-
- do {
- unsigned long vmstart = max(start, vma->vm_start);
- unsigned long vmend = min(end, vma->vm_end);
-
- if (mpol_equal(vma_policy(vma), new_pol))
- goto next;
-
- pgoff = vma->vm_pgoff +
- ((vmstart - vma->vm_start) >> PAGE_SHIFT);
- prev = vma_merge(&vmi, mm, prev, vmstart, vmend, vma->vm_flags,
- vma->anon_vma, vma->vm_file, pgoff,
- new_pol, vma->vm_userfaultfd_ctx,
- anon_vma_name(vma));
- if (prev) {
- vma = prev;
- goto replace;
- }
- if (vma->vm_start != vmstart) {
- err = split_vma(&vmi, vma, vmstart, 1);
- if (err)
- goto out;
- }
- if (vma->vm_end != vmend) {
- err = split_vma(&vmi, vma, vmend, 0);
- if (err)
- goto out;
- }
-replace:
- err = vma_replace_policy(vma, new_pol);
+ pgoff = vma->vm_pgoff + ((vmstart - vma->vm_start) >> PAGE_SHIFT);
+ merged = vma_merge(vmi, vma->vm_mm, *prev, vmstart, vmend, vma->vm_flags,
+ vma->anon_vma, vma->vm_file, pgoff, new_pol,
+ vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ if (merged) {
+ *prev = merged;
+ return vma_replace_policy(merged, new_pol);
+ }
+
+ if (vma->vm_start != vmstart) {
+ err = split_vma(vmi, vma, vmstart, 1);
if (err)
- goto out;
-next:
- prev = vma;
- } for_each_vma_range(vmi, vma, end);
+ return err;
+ }
-out:
- return err;
+ if (vma->vm_end != vmend) {
+ err = split_vma(vmi, vma, vmend, 0);
+ if (err)
+ return err;
+ }
+
+ *prev = vma;
+ return vma_replace_policy(vma, new_pol);
}
/* Set the process memory policy */
@@ -1259,6 +1248,8 @@ static long do_mbind(unsigned long start, unsigned long len,
nodemask_t *nmask, unsigned long flags)
{
struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma, *prev;
+ struct vma_iterator vmi;
struct mempolicy *new;
unsigned long end;
int err;
@@ -1328,7 +1319,13 @@ static long do_mbind(unsigned long start, unsigned long len,
goto up_out;
}
- err = mbind_range(mm, start, end, new);
+ vma_iter_init(&vmi, mm, start);
+ prev = vma_prev(&vmi);
+ for_each_vma_range(vmi, vma, end) {
+ err = mbind_range(&vmi, vma, &prev, start, end, new);
+ if (err)
+ break;
+ }
if (!err) {
int nr_failed = 0;
@@ -1489,10 +1486,8 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
unsigned long, home_node, unsigned long, flags)
{
struct mm_struct *mm = current->mm;
- struct vm_area_struct *vma;
+ struct vm_area_struct *vma, *prev;
struct mempolicy *new, *old;
- unsigned long vmstart;
- unsigned long vmend;
unsigned long end;
int err = -ENOENT;
VMA_ITERATOR(vmi, mm, start);
@@ -1521,6 +1516,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
if (end == start)
return 0;
mmap_write_lock(mm);
+ prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
/*
* If any vma in the range got policy other than MPOL_BIND
@@ -1541,9 +1537,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
}
new->home_node = home_node;
- vmstart = max(start, vma->vm_start);
- vmend = min(end, vma->vm_end);
- err = mbind_range(mm, vmstart, vmend, new);
+ err = mbind_range(&vmi, vma, &prev, start, end, new);
mpol_put(new);
if (err)
break;
This is the start of the stable review cycle for the 5.4.241 release.
There are 92 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 20 Apr 2023 12:02:44 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.241-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.241-rc1
Darrick J. Wong <djwong(a)kernel.org>
xfs: force log and push AIL to clear pinned inodes when aborting mount
Brian Foster <bfoster(a)redhat.com>
xfs: don't reuse busy extents on extent trim
Brian Foster <bfoster(a)redhat.com>
xfs: consider shutdown in bmapbt cursor delete assert
Darrick J. Wong <djwong(a)kernel.org>
xfs: shut down the filesystem if we screw up quota reservation
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: report corruption only as a regular error
Jeffrey Mitchell <jeffrey.mitchell(a)starlab.io>
xfs: set inode size after creating symlink
Christoph Hellwig <hch(a)lst.de>
xfs: fix up non-directory creation in SGID directories
Christoph Hellwig <hch(a)lst.de>
xfs: remove the di_version field from struct icdinode
Christoph Hellwig <hch(a)lst.de>
xfs: simplify a check in xfs_ioctl_setattr_check_cowextsize
Christoph Hellwig <hch(a)lst.de>
xfs: simplify di_flags2 inheritance in xfs_ialloc
Christoph Hellwig <hch(a)lst.de>
xfs: only check the superblock version for dinode size calculation
Christoph Hellwig <hch(a)lst.de>
xfs: add a new xfs_sb_version_has_v3inode helper
Christoph Hellwig <hch(a)lst.de>
xfs: remove the kuid/kgid conversion wrappers
Christoph Hellwig <hch(a)lst.de>
xfs: remove the icdinode di_uid/di_gid members
Christoph Hellwig <hch(a)lst.de>
xfs: ensure that the inode uid/gid match values match the icdinode ones
Christoph Hellwig <hch(a)lst.de>
xfs: merge the projid fields in struct xfs_icdinode
Kaixu Xia <kaixuxia(a)tencent.com>
xfs: show the proper user quota options
Steve Clevenger <scclevenger(a)os.amperecomputing.com>
coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
George Cherian <george.cherian(a)marvell.com>
watchdog: sbsa_wdog: Make sure the timeout programming is within the limits
Gregor Herburger <gregor.herburger(a)tq-group.com>
i2c: ocores: generate stop condition after timeout in polling mode
ZhaoLong Wang <wangzhaolong1(a)huawei.com>
ubi: Fix deadlock caused by recursively holding work_sem
Lee Jones <lee.jones(a)linaro.org>
mtd: ubi: wl: Fix a couple of kernel-doc issues
Zhihao Cheng <chengzhihao1(a)huawei.com>
ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
Robbie Harwood <rharwood(a)redhat.com>
asymmetric_keys: log on fatal failures in PE/pkcs7
Robbie Harwood <rharwood(a)redhat.com>
verify_pefile: relax wrapper length check
Hans de Goede <hdegoede(a)redhat.com>
drm: panel-orientation-quirks: Add quirk for Lenovo Yoga Book X90F
Hans de Goede <hdegoede(a)redhat.com>
efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L
Alexander Stein <alexander.stein(a)ew.tq-group.com>
i2c: imx-lpi2c: clean rx/tx buffers upon new message
Grant Grundler <grundler(a)chromium.org>
power: supply: cros_usbpd: reclassify "default case!" as debug
Roman Gushchin <roman.gushchin(a)linux.dev>
net: macb: fix a memory corruption in extended buffer descriptor mode
Eric Dumazet <edumazet(a)google.com>
udp6: fix potential access to stale information
Saravanan Vajravel <saravanan.vajravel(a)broadcom.com>
RDMA/core: Fix GID entry ref leak when create_ah fails
Xin Long <lucien.xin(a)gmail.com>
sctp: fix a potential overflow in sctp_ifwdtsn_skip
Denis Plotnikov <den-plotnikov(a)yandex-team.ru>
qlcnic: check pci_reset_function result
Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
niu: Fix missing unwind goto in niu_alloc_channels()
Zheng Wang <zyytlz.wz(a)163.com>
9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition
Christophe Kerello <christophe.kerello(a)foss.st.com>
mtd: rawnand: stm32_fmc2: remove unsupported EDO mode
Arseniy Krasnov <avkrasnov(a)sberdevices.ru>
mtd: rawnand: meson: fix bitmask for length in command word
Bang Li <libang.linuxer(a)gmail.com>
mtdblock: tolerate corrected bit-flips
Christoph Hellwig <hch(a)lst.de>
btrfs: fix fast csum implementation detection
David Sterba <dsterba(a)suse.com>
btrfs: print checksum type and implementation at mount time
Min Li <lm0963hack(a)gmail.com>
Bluetooth: Fix race condition in hidp_session_thread
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards
Xu Biang <xubiang(a)hust.edu.cn>
ALSA: firewire-tascam: add missing unwind goto in snd_tscm_stream_start_duplex()
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: i2c/cs8427: fix iec958 mixer control deactivation
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: emu10k1: fix capture interrupt handler unlinking
Kornel Dulęba <korneld(a)chromium.org>
Revert "pinctrl: amd: Disable and mask interrupts on resume"
Johan Hovold <johan+linaro(a)kernel.org>
irqdomain: Fix mapping-creation race
Johan Hovold <johan+linaro(a)kernel.org>
irqdomain: Refactor __irq_domain_alloc_irqs()
Johan Hovold <johan+linaro(a)kernel.org>
irqdomain: Look for existing mapping only once
Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
Zheng Yejian <zhengyejian1(a)huawei.com>
ring-buffer: Fix race while reader and writer are on the same page
Boris Brezillon <boris.brezillon(a)collabora.com>
drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path
Pratyush Yadav <ptyadav(a)amazon.de>
net_sched: prevent NULL dereference if default qdisc setup failed
Steven Rostedt (Google) <rostedt(a)goodmis.org>
tracing: Free error logs of tracing instances
Oleksij Rempel <linux(a)rempel-privat.de>
can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
John Keeping <john(a)metanate.com>
ftrace: Mark get_lock_parent_ip() __always_inline
Kan Liang <kan.liang(a)linux.intel.com>
perf/core: Fix the same task check in perf_event_set_output
Jeremy Soller <jeremy(a)system76.com>
ALSA: hda/realtek: Add quirk for Clevo X370SNW
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix sysfs interface lifetime
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
Sherry Sun <sherry.sun(a)nxp.com>
tty: serial: fsl_lpuart: avoid checking for transfer complete when UARTCTRL_SBK is asserted in lpuart32_tx_empty
Biju Das <biju.das.jz(a)bp.renesas.com>
tty: serial: sh-sci: Fix Rx on RZ/G2L SCI
Biju Das <biju.das.jz(a)bp.renesas.com>
tty: serial: sh-sci: Fix transmit end interrupt handler
William Breathitt Gray <william.gray(a)linaro.org>
iio: dac: cio-dac: Fix max DAC write value check for 12-bit
Lars-Peter Clausen <lars(a)metafoo.de>
iio: adc: ti-ads7950: Set `can_sleep` flag for GPIO chip
Bjørn Mork <bjorn(a)mork.no>
USB: serial: option: add Quectel RM500U-CN modem
Enrico Sau <enrico.sau(a)gmail.com>
USB: serial: option: add Telit FE990 compositions
RD Babiera <rdbabiera(a)google.com>
usb: typec: altmodes/displayport: Fix configure initial pin assignment
Kees Jan Koster <kjkoster(a)kjkoster.org>
USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs
D Scott Phillips <scott(a)os.amperecomputing.com>
xhci: also avoid the XHCI_ZERO_64B_REGS quirk with a passthrough iommu
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: callback request does not use correct credential for AUTH_SYS
Jeff Layton <jlayton(a)kernel.org>
sunrpc: only free unix grouplist after RCU settles
Dhruva Gole <d-gole(a)ti.com>
gpio: davinci: Add irq chip flag to skip set wake
Ziyang Xuan <william.xuanziyang(a)huawei.com>
ipv6: Fix an uninit variable access bug in __ip6_make_skb()
Xin Long <lucien.xin(a)gmail.com>
sctp: check send stream number after wait_for_sndbuf
Jakub Kicinski <kuba(a)kernel.org>
net: don't let netpoll invoke NAPI if in xmit context
Eric Dumazet <edumazet(a)google.com>
icmp: guard against too small mtu
Felix Fietkau <nbd(a)nbd.name>
wifi: mac80211: fix invalid drv_sta_pre_rcu_remove calls for non-uploaded sta
Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
pwm: sprd: Explicitly set .polarity in .get_state()
Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
pwm: cros-ec: Explicitly set .polarity in .get_state()
Kornel Dulęba <korneld(a)chromium.org>
pinctrl: amd: Disable and mask interrupts on resume
Sachi King <nakato(a)nakato.io>
pinctrl: amd: disable and mask interrupts on probe
Linus Walleij <linus.walleij(a)linaro.org>
pinctrl: amd: Use irqchip template
Steve French <stfrench(a)microsoft.com>
smb3: fix problem with null cifs super block with previous patch
Kees Cook <keescook(a)chromium.org>
treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()
Tom Saeger <tom.saeger(a)oracle.com>
Revert "treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()"
Waiman Long <longman(a)redhat.com>
cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
Basavaraj Natikar <Basavaraj.Natikar(a)amd.com>
x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot
Jiri Kosina <jkosina(a)suse.cz>
scsi: ses: Handle enclosure with just a primary component gracefully
-------------
Diffstat:
Documentation/sound/hd-audio/models.rst | 2 +-
Makefile | 4 +-
arch/mips/lasat/picvue_proc.c | 2 +-
arch/x86/kernel/sysfb_efi.c | 8 ++
arch/x86/pci/fixup.c | 21 +++
crypto/asymmetric_keys/pkcs7_verify.c | 10 +-
crypto/asymmetric_keys/verify_pefile.c | 32 +++--
drivers/gpio/gpio-davinci.c | 2 +-
drivers/gpu/drm/drm_panel_orientation_quirks.c | 13 +-
drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 +
drivers/hwtracing/coresight/coresight-etm4x.c | 2 +-
drivers/i2c/busses/i2c-imx-lpi2c.c | 2 +
drivers/i2c/busses/i2c-ocores.c | 35 ++---
drivers/iio/adc/ti-ads7950.c | 1 +
drivers/iio/dac/cio-dac.c | 4 +-
drivers/infiniband/core/verbs.c | 2 +
drivers/mtd/mtdblock.c | 12 +-
drivers/mtd/nand/raw/meson_nand.c | 6 +-
drivers/mtd/nand/raw/stm32_fmc2_nand.c | 3 +
drivers/mtd/ubi/build.c | 21 ++-
drivers/mtd/ubi/wl.c | 5 +-
drivers/net/ethernet/cadence/macb_main.c | 4 +
drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c | 8 +-
drivers/net/ethernet/sun/niu.c | 2 +-
drivers/pinctrl/pinctrl-amd.c | 52 +++++--
drivers/power/supply/cros_usbpd-charger.c | 2 +-
drivers/pwm/pwm-cros-ec.c | 1 +
drivers/pwm/pwm-sprd.c | 1 +
drivers/scsi/ses.c | 20 ++-
drivers/tty/serial/fsl_lpuart.c | 8 +-
drivers/tty/serial/sh-sci.c | 9 +-
drivers/usb/host/xhci.c | 6 +-
drivers/usb/serial/cp210x.c | 1 +
drivers/usb/serial/option.c | 10 ++
drivers/usb/typec/altmodes/displayport.c | 6 +-
drivers/watchdog/sbsa_gwdt.c | 1 +
fs/btrfs/disk-io.c | 17 +++
fs/btrfs/super.c | 2 -
fs/cifs/cifsproto.h | 2 +-
fs/cifs/smb2ops.c | 2 +-
fs/nfsd/nfs4callback.c | 4 +-
fs/nilfs2/segment.c | 3 +-
fs/nilfs2/super.c | 2 +
fs/nilfs2/the_nilfs.c | 12 +-
fs/xfs/libxfs/xfs_attr_leaf.c | 5 +-
fs/xfs/libxfs/xfs_bmap.c | 10 +-
fs/xfs/libxfs/xfs_btree.c | 30 ++--
fs/xfs/libxfs/xfs_format.h | 33 +++--
fs/xfs/libxfs/xfs_ialloc.c | 6 +-
fs/xfs/libxfs/xfs_inode_buf.c | 54 +++----
fs/xfs/libxfs/xfs_inode_buf.h | 8 +-
fs/xfs/libxfs/xfs_inode_fork.c | 2 +-
fs/xfs/libxfs/xfs_inode_fork.h | 9 +-
fs/xfs/libxfs/xfs_log_format.h | 10 +-
fs/xfs/libxfs/xfs_trans_resv.c | 2 +-
fs/xfs/xfs_acl.c | 12 +-
fs/xfs/xfs_bmap_util.c | 16 +--
fs/xfs/xfs_buf_item.c | 2 +-
fs/xfs/xfs_dquot.c | 6 +-
fs/xfs/xfs_error.c | 2 +-
fs/xfs/xfs_extent_busy.c | 14 --
fs/xfs/xfs_icache.c | 8 +-
fs/xfs/xfs_inode.c | 61 +++-----
fs/xfs/xfs_inode.h | 21 +--
fs/xfs/xfs_inode_item.c | 20 ++-
fs/xfs/xfs_ioctl.c | 22 ++-
fs/xfs/xfs_iops.c | 11 +-
fs/xfs/xfs_itable.c | 8 +-
fs/xfs/xfs_linux.h | 32 +----
fs/xfs/xfs_log_recover.c | 6 +-
fs/xfs/xfs_mount.c | 90 ++++++------
fs/xfs/xfs_qm.c | 43 +++---
fs/xfs/xfs_qm_bhv.c | 2 +-
fs/xfs/xfs_quota.h | 4 +-
fs/xfs/xfs_super.c | 10 +-
fs/xfs/xfs_symlink.c | 7 +-
fs/xfs/xfs_trans_dquot.c | 16 ++-
include/linux/ftrace.h | 2 +-
kernel/cgroup/cpuset.c | 6 +-
kernel/events/core.c | 2 +-
kernel/irq/irqdomain.c | 182 +++++++++++++++---------
kernel/trace/ring_buffer.c | 13 +-
kernel/trace/trace.c | 1 +
mm/swapfile.c | 3 +-
net/9p/trans_xen.c | 4 +
net/bluetooth/hidp/core.c | 2 +-
net/bluetooth/l2cap_core.c | 24 +---
net/can/j1939/transport.c | 5 +-
net/core/netpoll.c | 19 ++-
net/ipv4/icmp.c | 5 +
net/ipv6/ip6_output.c | 7 +-
net/ipv6/udp.c | 8 +-
net/mac80211/sta_info.c | 3 +-
net/sched/sch_generic.c | 1 +
net/sctp/socket.c | 4 +
net/sctp/stream_interleave.c | 3 +-
net/sunrpc/svcauth_unix.c | 17 ++-
sound/firewire/tascam/tascam-stream.c | 2 +-
sound/i2c/cs8427.c | 7 +-
sound/pci/emu10k1/emupcm.c | 4 +-
sound/pci/hda/patch_realtek.c | 1 +
sound/pci/hda/patch_sigmatel.c | 10 ++
102 files changed, 731 insertions(+), 549 deletions(-)
The set value of `fast_switch_enabled` flag doesn't guarantee that
fast_switch callback is set. For some drivers such as amd_pstate, the
adjust_perf callback is used but it still sets `fast_switch_possible`
flag. This is not wrong because this flag doesn't imply fast_switch
callback is set, it implies whether the driver can guarantee that
frequency can be changed on any CPU sharing the policy and that the
change will affect all of the policy CPUs without the need to send any
IPIs or issue callbacks from the notifier chain. Therefore add an extra
NULL check before calling fast_switch in sugov_update_single_freq
function.
Ideally `sugov_update_single_freq` function should not be called with
amd_pstate. But in a corner case scenario, when aperf/mperf overflow
occurs, kernel disables frequency invariance calculation which causes
schedutil to fallback to sugov_update_single_freq which currently relies
on the fast_switch callback.
Normal flow:
sugov_update_single_perf
cpufreq_driver_adjust_perf
cpufreq_driver->adjust_perf
Error case flow:
sugov_update_single_perf
sugov_update_single_freq <-- This is chosen because the freq invariant is disabled due to aperf/mperf overflow
cpufreq_driver_fast_switch
cpufreq_driver->fast_switch <-- Here NULL pointer dereference is happening, because fast_switch is not set
Fix this NULL pointer dereference issue by doing a NULL check.
Fixes: a61dec744745 ("cpufreq: schedutil: Avoid missing updates for one-CPU policies")
Signed-off-by: Wyes Karny <wyes.karny(a)amd.com>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
drivers/cpufreq/cpufreq.c | 11 +++++++++++
include/linux/cpufreq.h | 1 +
kernel/sched/cpufreq_schedutil.c | 2 +-
3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 6d8fd3b8dcb5..364d31b55380 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2138,6 +2138,17 @@ unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
}
EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
+/**
+ * cpufreq_driver_has_fast_switch - Check "fast switch" callback.
+ *
+ * Return 'true' if the ->fast_switch callback is present for the
+ * current driver or 'false' otherwise.
+ */
+bool cpufreq_driver_has_fast_switch(void)
+{
+ return !!cpufreq_driver->fast_switch;
+}
+
/**
* cpufreq_driver_adjust_perf - Adjust CPU performance level in one go.
* @cpu: Target CPU.
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 65623233ab2f..8a9286fc718b 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -604,6 +604,7 @@ struct cpufreq_governor {
/* Pass a target to the cpufreq driver */
unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
unsigned int target_freq);
+bool cpufreq_driver_has_fast_switch(void);
void cpufreq_driver_adjust_perf(unsigned int cpu,
unsigned long min_perf,
unsigned long target_perf,
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index e3211455b203..a1c449525ac2 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -364,7 +364,7 @@ static void sugov_update_single_freq(struct update_util_data *hook, u64 time,
* concurrently on two different CPUs for the same target and it is not
* necessary to acquire the lock in the fast switch case.
*/
- if (sg_policy->policy->fast_switch_enabled) {
+ if (sg_policy->policy->fast_switch_enabled && cpufreq_driver_has_fast_switch()) {
cpufreq_driver_fast_switch(sg_policy->policy, next_f);
} else {
raw_spin_lock(&sg_policy->update_lock);
--
2.34.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 8caa81eb950cb2e9d2d6959b37d853162d197f57
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042242-unsaved-sanded-d4c1@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
8caa81eb950c ("pwm: meson: Explicitly set .polarity in .get_state()")
6c452cff79f8 ("pwm: Make .get_state() callback return an error code")
8eca6b0a647a ("Merge tag 'pwm/for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8caa81eb950cb2e9d2d6959b37d853162d197f57 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= <u.kleine-koenig(a)pengutronix.de>
Date: Wed, 22 Mar 2023 22:45:44 +0100
Subject: [PATCH] pwm: meson: Explicitly set .polarity in .get_state()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The driver only supports normal polarity. Complete the implementation of
.get_state() by setting .polarity accordingly.
This fixes a regression that was possible since commit c73a3107624d
("pwm: Handle .get_state() failures") which stopped to zero-initialize
the state passed to the .get_state() callback. This was reported at
https://forum.odroid.com/viewtopic.php?f=177&t=46360 . While this was an
unintended side effect, the real issue is the driver's callback not
setting the polarity.
There is a complicating fact, that the .apply() callback fakes support
for inversed polarity. This is not (and cannot) be matched by
.get_state(). As fixing this isn't easy, only point it out in a comment
to prevent authors of other drivers from copying that approach.
Fixes: c375bcbaabdb ("pwm: meson: Read the full hardware state in meson_pwm_get_state()")
Reported-by: Munehisa Kamata <kamatam(a)amazon.com>
Acked-by: Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
Link: https://lore.kernel.org/r/20230310191405.2606296-1-u.kleine-koenig@pengutro…
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding(a)gmail.com>
diff --git a/drivers/pwm/pwm-meson.c b/drivers/pwm/pwm-meson.c
index 16d79ca5d8f5..5cd7b90872c6 100644
--- a/drivers/pwm/pwm-meson.c
+++ b/drivers/pwm/pwm-meson.c
@@ -162,6 +162,12 @@ static int meson_pwm_calc(struct meson_pwm *meson, struct pwm_device *pwm,
duty = state->duty_cycle;
period = state->period;
+ /*
+ * Note this is wrong. The result is an output wave that isn't really
+ * inverted and so is wrongly identified by .get_state as normal.
+ * Fixing this needs some care however as some machines might rely on
+ * this.
+ */
if (state->polarity == PWM_POLARITY_INVERSED)
duty = period - duty;
@@ -358,6 +364,8 @@ static int meson_pwm_get_state(struct pwm_chip *chip, struct pwm_device *pwm,
state->duty_cycle = 0;
}
+ state->polarity = PWM_POLARITY_NORMAL;
+
return 0;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 8caa81eb950cb2e9d2d6959b37d853162d197f57
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042244-audience-anemic-4b09@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
8caa81eb950c ("pwm: meson: Explicitly set .polarity in .get_state()")
6c452cff79f8 ("pwm: Make .get_state() callback return an error code")
8eca6b0a647a ("Merge tag 'pwm/for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8caa81eb950cb2e9d2d6959b37d853162d197f57 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= <u.kleine-koenig(a)pengutronix.de>
Date: Wed, 22 Mar 2023 22:45:44 +0100
Subject: [PATCH] pwm: meson: Explicitly set .polarity in .get_state()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The driver only supports normal polarity. Complete the implementation of
.get_state() by setting .polarity accordingly.
This fixes a regression that was possible since commit c73a3107624d
("pwm: Handle .get_state() failures") which stopped to zero-initialize
the state passed to the .get_state() callback. This was reported at
https://forum.odroid.com/viewtopic.php?f=177&t=46360 . While this was an
unintended side effect, the real issue is the driver's callback not
setting the polarity.
There is a complicating fact, that the .apply() callback fakes support
for inversed polarity. This is not (and cannot) be matched by
.get_state(). As fixing this isn't easy, only point it out in a comment
to prevent authors of other drivers from copying that approach.
Fixes: c375bcbaabdb ("pwm: meson: Read the full hardware state in meson_pwm_get_state()")
Reported-by: Munehisa Kamata <kamatam(a)amazon.com>
Acked-by: Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
Link: https://lore.kernel.org/r/20230310191405.2606296-1-u.kleine-koenig@pengutro…
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding(a)gmail.com>
diff --git a/drivers/pwm/pwm-meson.c b/drivers/pwm/pwm-meson.c
index 16d79ca5d8f5..5cd7b90872c6 100644
--- a/drivers/pwm/pwm-meson.c
+++ b/drivers/pwm/pwm-meson.c
@@ -162,6 +162,12 @@ static int meson_pwm_calc(struct meson_pwm *meson, struct pwm_device *pwm,
duty = state->duty_cycle;
period = state->period;
+ /*
+ * Note this is wrong. The result is an output wave that isn't really
+ * inverted and so is wrongly identified by .get_state as normal.
+ * Fixing this needs some care however as some machines might rely on
+ * this.
+ */
if (state->polarity == PWM_POLARITY_INVERSED)
duty = period - duty;
@@ -358,6 +364,8 @@ static int meson_pwm_get_state(struct pwm_chip *chip, struct pwm_device *pwm,
state->duty_cycle = 0;
}
+ state->polarity = PWM_POLARITY_NORMAL;
+
return 0;
}
From: Gao Xiang <hsiangkao(a)redhat.com>
commit ada49d64fb3538144192181db05de17e2ffc3551 upstream.
Currently, commit e9e2eae89ddb dropped a (int) decoration from
XFS_LITINO(mp), and since sizeof() expression is also involved,
the result of XFS_LITINO(mp) is simply as the size_t type
(commonly unsigned long).
Considering the expression in xfs_attr_shortform_bytesfit():
offset = (XFS_LITINO(mp) - bytes) >> 3;
let "bytes" be (int)340, and
"XFS_LITINO(mp)" be (unsigned long)336.
on 64-bit platform, the expression is
offset = ((unsigned long)336 - (int)340) >> 3 =
(int)(0xfffffffffffffffcUL >> 3) = -1
but on 32-bit platform, the expression is
offset = ((unsigned long)336 - (int)340) >> 3 =
(int)(0xfffffffcUL >> 3) = 0x1fffffff
instead.
so offset becomes a large positive number on 32-bit platform, and
cause xfs_attr_shortform_bytesfit() returns maxforkoff rather than 0.
Therefore, one result is
"ASSERT(new_size <= XFS_IFORK_SIZE(ip, whichfork));"
assertion failure in xfs_idata_realloc(), which was also the root
cause of the original bugreport from Dennis, see:
https://bugzilla.redhat.com/show_bug.cgi?id=1894177
And it can also be manually triggered with the following commands:
$ touch a;
$ setfattr -n user.0 -v "`seq 0 80`" a;
$ setfattr -n user.1 -v "`seq 0 80`" a
on 32-bit platform.
Fix the case in xfs_attr_shortform_bytesfit() by bailing out
"XFS_LITINO(mp) < bytes" in advance suggested by Eric and a misleading
comment together with this bugfix suggested by Darrick. It seems the
other users of XFS_LITINO(mp) are not impacted.
Fixes: e9e2eae89ddb ("xfs: only check the superblock version for dinode size calculation")
Cc: <stable(a)vger.kernel.org> # 5.7+
Reported-and-tested-by: Dennis Gilmore <dgilmore(a)redhat.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Gao Xiang <hsiangkao(a)redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu(a)oracle.com>
Acked-by: Darrick J. Wong <djwong(a)kernel.org>
---
Hi Greg,
I had missed this commit when backporting fixes for 5.4.y from v5.11 &
v5.12. The commit has been acked by Darrick.
fs/xfs/libxfs/xfs_attr_leaf.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index f5b16120c64d..2b74b6e9a354 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -435,7 +435,7 @@ xfs_attr_copy_value(
*========================================================================*/
/*
- * Query whether the requested number of additional bytes of extended
+ * Query whether the total requested number of attr fork bytes of extended
* attribute space will be able to fit inline.
*
* Returns zero if not, else the di_forkoff fork offset to be used in the
@@ -455,6 +455,12 @@ xfs_attr_shortform_bytesfit(
int maxforkoff;
int offset;
+ /*
+ * Check if the new size could fit at all first:
+ */
+ if (bytes > XFS_LITINO(mp))
+ return 0;
+
/* rounded down */
offset = (XFS_LITINO(mp) - bytes) >> 3;
--
2.39.1
While using the vdpa device with vIOMMU enabled
in the guest VM, when the vdpa device bind to vfio-pci and run testpmd
then system will fail to unmap.
The test process is
Load guest VM --> attach to virtio driver--> bind to vfio-pci driver
So the mapping process is
1)batched mode map to normal MR
2)batched mode unmapped the normal MR
3)unmapped all the memory
4)mapped to iommu MR
This error happened in step 3). The iotlb was freed in step 2)
and the function vhost_vdpa_process_iotlb_msg will return fail
Which causes failure.
To fix this, we will not remove the AS while the iotlb->nmaps is 0.
This will free in the vhost_vdpa_clean
Cc: stable(a)vger.kernel.org
Fixes: aaca8373c4b1 ("vhost-vdpa: support ASID based IOTLB API")
Signed-off-by: Cindy Lu <lulu(a)redhat.com>
---
drivers/vhost/vdpa.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 7be9d9d8f01c..74c7d1f978b7 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -851,11 +851,7 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v,
if (!v->in_batch)
ops->set_map(vdpa, asid, iotlb);
}
- /* If we are in the middle of batch processing, delay the free
- * of AS until BATCH_END.
- */
- if (!v->in_batch && !iotlb->nmaps)
- vhost_vdpa_remove_as(v, asid);
+
}
static int vhost_vdpa_va_map(struct vhost_vdpa *v,
@@ -1112,8 +1108,6 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, u32 asid,
if (v->in_batch && ops->set_map)
ops->set_map(vdpa, asid, iotlb);
v->in_batch = false;
- if (!iotlb->nmaps)
- vhost_vdpa_remove_as(v, asid);
break;
default:
r = -EINVAL;
--
2.34.3
Linux remembers cpu_cachinfo::num_leaves per CPU, but x86 initializes all
CPUs from the same global "num_cache_leaves".
This is erroneous on systems like Meteor Lake, which has different
num_leaves per CPU. Delete the global "num_cache_leaves" and initialize
num_leaves accurately on each CPU.
Cc: Andreas Herrmann <aherrmann(a)suse.com>
Cc: Chen Yu <yu.c.chen(a)intel.com>
Cc: Len Brown <len.brown(a)intel.com>
Cc: Pu Wen <puwen(a)hygon.cn>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Cc: Zhang Rui <rui.zhang(a)intel.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Len Brown <len.brown(a)intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon(a)linux.intel.com>
---
After this change, all CPUs will traverse CPUID leaf 0x4 when booted for
the first time. On systems with asymmetric cache topologies this is
useless work.
Creating a list of processor models that have asymmetric cache topologies
was considered. The burden of maintaining such list would outweigh the
performance benefit of skipping this extra step.
---
Changes since v1:
* Do not make num_cache_leaves a per-CPU variable. Instead, reuse the
existing per-CPU ci_cpu_cacheinfo variable. (Dave Hansen)
---
arch/x86/kernel/cpu/cacheinfo.c | 45 ++++++++++++++++++---------------
1 file changed, 25 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 4063e8991211..45c4e9daf3f1 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -176,7 +176,16 @@ struct _cpuid4_info_regs {
struct amd_northbridge *nb;
};
-static unsigned short num_cache_leaves;
+static inline unsigned int get_num_cache_leaves(unsigned int cpu)
+{
+ return get_cpu_cacheinfo(cpu)->num_leaves;
+}
+
+static inline void
+set_num_cache_leaves(unsigned int nr_leaves, unsigned int cpu)
+{
+ get_cpu_cacheinfo(cpu)->num_leaves = nr_leaves;
+}
/* AMD doesn't have CPUID4. Emulate it here to report the same
information to the user. This makes some assumptions about the machine:
@@ -716,19 +725,21 @@ void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu)
void init_amd_cacheinfo(struct cpuinfo_x86 *c)
{
+ unsigned int cpu = c->cpu_index;
+
if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
- num_cache_leaves = find_num_cache_leaves(c);
+ set_num_cache_leaves(find_num_cache_leaves(c), cpu);
} else if (c->extended_cpuid_level >= 0x80000006) {
if (cpuid_edx(0x80000006) & 0xf000)
- num_cache_leaves = 4;
+ set_num_cache_leaves(4, cpu);
else
- num_cache_leaves = 3;
+ set_num_cache_leaves(3, cpu);
}
}
void init_hygon_cacheinfo(struct cpuinfo_x86 *c)
{
- num_cache_leaves = find_num_cache_leaves(c);
+ set_num_cache_leaves(find_num_cache_leaves(c), c->cpu_index);
}
void init_intel_cacheinfo(struct cpuinfo_x86 *c)
@@ -738,24 +749,21 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
unsigned int new_l1d = 0, new_l1i = 0; /* Cache sizes from cpuid(4) */
unsigned int new_l2 = 0, new_l3 = 0, i; /* Cache sizes from cpuid(4) */
unsigned int l2_id = 0, l3_id = 0, num_threads_sharing, index_msb;
-#ifdef CONFIG_SMP
unsigned int cpu = c->cpu_index;
-#endif
if (c->cpuid_level > 3) {
- static int is_initialized;
-
- if (is_initialized == 0) {
- /* Init num_cache_leaves from boot CPU */
- num_cache_leaves = find_num_cache_leaves(c);
- is_initialized++;
- }
+ /*
+ * There should be at least one leaf. A non-zero value means
+ * that the number of leaves has been initialized.
+ */
+ if (!get_num_cache_leaves(cpu))
+ set_num_cache_leaves(find_num_cache_leaves(c), cpu);
/*
* Whenever possible use cpuid(4), deterministic cache
* parameters cpuid leaf to find the cache details
*/
- for (i = 0; i < num_cache_leaves; i++) {
+ for (i = 0; i < get_num_cache_leaves(cpu); i++) {
struct _cpuid4_info_regs this_leaf = {};
int retval;
@@ -791,14 +799,14 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
* Don't use cpuid2 if cpuid4 is supported. For P4, we use cpuid2 for
* trace cache
*/
- if ((num_cache_leaves == 0 || c->x86 == 15) && c->cpuid_level > 1) {
+ if ((!get_num_cache_leaves(cpu) || c->x86 == 15) && c->cpuid_level > 1) {
/* supports eax=2 call */
int j, n;
unsigned int regs[4];
unsigned char *dp = (unsigned char *)regs;
int only_trace = 0;
- if (num_cache_leaves != 0 && c->x86 == 15)
+ if (get_num_cache_leaves(cpu) && c->x86 == 15)
only_trace = 1;
/* Number of times to iterate */
@@ -1000,12 +1008,9 @@ int init_cache_level(unsigned int cpu)
{
struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
- if (!num_cache_leaves)
- return -ENOENT;
if (!this_cpu_ci)
return -EINVAL;
this_cpu_ci->num_levels = 3;
- this_cpu_ci->num_leaves = num_cache_leaves;
return 0;
}
--
2.25.1
Good day,
You can access personal and business loans at an affordable interest rate of 4% with a flexible repayment period of 6 months to 20 years.
Farmers, contractors and self employed individuals are also welcome to apply. For more information about our loans, please see attached flyer.
If interested, send us your details at smartfinancesolutions(a)financier.com and we will get back to you swiftly.
Marketing Dpt
Good day,
You can access personal and business loans at an affordable interest rate of 4% with a flexible repayment period of 6 months to 20 years.
Farmers, contractors and self employed individuals are also welcome to apply. For more information about our loans, please see attached flyer.
If interested, send us your details at smartfinancesolutions(a)financier.com and we will get back to you swiftly.
Marketing Dpt
OverCurrent condition is not standardized in the UHCI spec.
Zhaoxin UHCI controllers report OverCurrent bit active off.
In order to handle OverCurrent condition correctly, the uhci-hcd
driver needs to be told to expect the active-off behavior.
Suggested-by: Alan Stern <stern(a)rowland.harvard.edu>
Cc: stable(a)vger.kernel.org
Signed-off-by: Weitao Wang <WeitaoWang-oc(a)zhaoxin.com>
---
v2->v3
- Change patch code style.
drivers/usb/host/uhci-pci.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/host/uhci-pci.c b/drivers/usb/host/uhci-pci.c
index 3592f757fe05..7bd2fddde770 100644
--- a/drivers/usb/host/uhci-pci.c
+++ b/drivers/usb/host/uhci-pci.c
@@ -119,11 +119,13 @@ static int uhci_pci_init(struct usb_hcd *hcd)
uhci->rh_numports = uhci_count_ports(hcd);
- /* Intel controllers report the OverCurrent bit active on.
- * VIA controllers report it active off, so we'll adjust the
- * bit value. (It's not standardized in the UHCI spec.)
+ /*
+ * Intel controllers report the OverCurrent bit active on. VIA
+ * and ZHAOXIN controllers report it active off, so we'll adjust
+ * the bit value. (It's not standardized in the UHCI spec.)
*/
- if (to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_VIA)
+ if (to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_VIA ||
+ to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_ZHAOXIN)
uhci->oc_low = 1;
/* HP's server management chip requires a longer port reset delay. */
--
2.32.0
This is a good example that emphasizes that the order in which patches
are queued to stable matters. More details in the revert commit.
Tested and intended for 4.14, 4.19, 5.4, 5.10.
Baokun Li (1):
ext4: fix use-after-free in ext4_xattr_set_entry
Ritesh Harjani (1):
ext4: remove duplicate definition of ext4_xattr_ibody_inline_set()
Tudor Ambarus (1):
Revert "ext4: fix use-after-free in ext4_xattr_set_entry"
fs/ext4/inline.c | 11 +++++------
fs/ext4/xattr.c | 26 +-------------------------
fs/ext4/xattr.h | 6 +++---
3 files changed, 9 insertions(+), 34 deletions(-)
--
2.40.0.634.g4ca3ef3211-goog
From: Pingfan Liu <kernelfans(a)gmail.com>
Purgatory.ro is a standalone binary that is not linked against the rest of
the kernel. Its image is copied into an array that is linked to the
kernel, and from there kexec relocates it wherever it desires.
Unlike the debug info for vmlinux, which can be used for analyzing crash
such info is useless in purgatory.ro. And discarding them can save about
200K space.
Original:
259080 kexec-purgatory.o
Stripped debug info:
29152 kexec-purgatory.o
Signed-off-by: Pingfan Liu <kernelfans(a)gmail.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers(a)google.com>
Reviewed-by: Steve Wahl <steve.wahl(a)hpe.com>
Acked-by: Dave Young <dyoung(a)redhat.com>
Link: https://lore.kernel.org/r/1596433788-3784-1-git-send-email-kernelfans@gmail…
(cherry picked from commit 52416ffcf823ee11aa19792715664ab94757f111)
Signed-off-by: Alyssa Ross <hi(a)alyssa.is>
---
arch/x86/purgatory/Makefile | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 2f15a2ac4209..2040ddb824c2 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -20,6 +20,9 @@ KBUILD_CFLAGS := -fno-strict-aliasing -Wall -Wstrict-prototypes -fno-zero-initia
KBUILD_CFLAGS += -m$(BITS)
KBUILD_CFLAGS += $(call cc-option,-fno-PIE)
+AFLAGS_REMOVE_setup-x86_$(BITS).o += -Wa,-gdwarf-2
+AFLAGS_REMOVE_entry64.o += -Wa,-gdwarf-2
+
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
--
2.37.1
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x 16c52e503043aed1e2a2ce38d9249de5936c1f6b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042213-overbid-jitters-7a29@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
16c52e503043 ("LoongArch: Make WriteCombine configurable for ioremap()")
41596803302d ("LoongArch: Make -mstrict-align configurable")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 16c52e503043aed1e2a2ce38d9249de5936c1f6b Mon Sep 17 00:00:00 2001
From: Huacai Chen <chenhuacai(a)kernel.org>
Date: Tue, 18 Apr 2023 19:38:58 +0800
Subject: [PATCH] LoongArch: Make WriteCombine configurable for ioremap()
LoongArch maintains cache coherency in hardware, but when paired with
LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
to WriteCombine) is out of the scope of cache coherency machanism for
PCIe devices (this is a PCIe protocol violation, which may be fixed in
newer chipsets).
This means WUC can only used for write-only memory regions now, so this
option is disabled by default, making WUC silently fallback to SUC for
ioremap(). You can enable this option if the kernel is ensured to run on
hardware without this bug.
Kernel parameter writecombine=on/off can be used to override the Kconfig
option.
Cc: stable(a)vger.kernel.org
Suggested-by: WANG Xuerui <kernel(a)xen0n.name>
Reviewed-by: WANG Xuerui <kernel(a)xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index 19600c50277b..6ae5f129fbca 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -128,6 +128,7 @@ parameter is applicable::
KVM Kernel Virtual Machine support is enabled.
LIBATA Libata driver is enabled
LP Printer support is enabled.
+ LOONGARCH LoongArch architecture is enabled.
LOOP Loopback device support is enabled.
M68k M68k architecture is enabled.
These options have more detailed description inside of
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6221a1d057dd..7016cb12dc4e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6933,6 +6933,12 @@
When enabled, memory and cache locality will be
impacted.
+ writecombine= [LOONGARCH] Control the MAT (Memory Access Type) of
+ ioremap_wc().
+
+ on - Enable writecombine, use WUC for ioremap_wc()
+ off - Disable writecombine, use SUC for ioremap_wc()
+
x2apic_phys [X86-64,APIC] Use x2apic physical mode instead of
default x2apic cluster mode on platforms
supporting x2apic.
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 7fd51257e0ed..3ddde336e6a5 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -447,6 +447,22 @@ config ARCH_IOREMAP
protection support. However, you can enable LoongArch DMW-based
ioremap() for better performance.
+config ARCH_WRITECOMBINE
+ bool "Enable WriteCombine (WUC) for ioremap()"
+ help
+ LoongArch maintains cache coherency in hardware, but when paired
+ with LS7A chipsets the WUC attribute (Weak-ordered UnCached, which
+ is similar to WriteCombine) is out of the scope of cache coherency
+ machanism for PCIe devices (this is a PCIe protocol violation, which
+ may be fixed in newer chipsets).
+
+ This means WUC can only used for write-only memory regions now, so
+ this option is disabled by default, making WUC silently fallback to
+ SUC for ioremap(). You can enable this option if the kernel is ensured
+ to run on hardware without this bug.
+
+ You can override this setting via writecombine=on/off boot parameter.
+
config ARCH_STRICT_ALIGN
bool "Enable -mstrict-align to prevent unaligned accesses" if EXPERT
default y
diff --git a/arch/loongarch/include/asm/io.h b/arch/loongarch/include/asm/io.h
index 402a7d9e3a53..545e2708fbf7 100644
--- a/arch/loongarch/include/asm/io.h
+++ b/arch/loongarch/include/asm/io.h
@@ -54,8 +54,10 @@ static inline void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size,
* @offset: bus address of the memory
* @size: size of the resource to map
*/
+extern pgprot_t pgprot_wc;
+
#define ioremap_wc(offset, size) \
- ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_WUC))
+ ioremap_prot((offset), (size), pgprot_val(pgprot_wc))
#define ioremap_cache(offset, size) \
ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL))
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index bae84ccf6d36..27f71f9531e1 100644
--- a/arch/loongarch/kernel/setup.c
+++ b/arch/loongarch/kernel/setup.c
@@ -160,6 +160,27 @@ static void __init smbios_parse(void)
dmi_walk(find_tokens, NULL);
}
+#ifdef CONFIG_ARCH_WRITECOMBINE
+pgprot_t pgprot_wc = PAGE_KERNEL_WUC;
+#else
+pgprot_t pgprot_wc = PAGE_KERNEL_SUC;
+#endif
+
+EXPORT_SYMBOL(pgprot_wc);
+
+static int __init setup_writecombine(char *p)
+{
+ if (!strcmp(p, "on"))
+ pgprot_wc = PAGE_KERNEL_WUC;
+ else if (!strcmp(p, "off"))
+ pgprot_wc = PAGE_KERNEL_SUC;
+ else
+ pr_warn("Unknown writecombine setting \"%s\".\n", p);
+
+ return 0;
+}
+early_param("writecombine", setup_writecombine);
+
static int usermem __initdata;
static int __init early_parse_mem(char *p)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x a25bc8486f9c01c1af6b6c5657234b2eee2c39d6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042351-feline-gas-e7b7@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
a25bc8486f9c ("KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg()")
85fbe08e4da8 ("KVM: arm64: Factor out firmware register handling from psci.c")
1ebdbeb03efe ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a25bc8486f9c01c1af6b6c5657234b2eee2c39d6 Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)linaro.org>
Date: Wed, 19 Apr 2023 13:16:13 +0300
Subject: [PATCH] KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg()
The KVM_REG_SIZE() comes from the ioctl and it can be a power of two
between 0-32768 but if it is more than sizeof(long) this will corrupt
memory.
Fixes: 99adb567632b ("KVM: arm/arm64: Add save/restore support for firmware workaround state")
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Reviewed-by: Steven Price <steven.price(a)arm.com>
Reviewed-by: Eric Auger <eric.auger(a)redhat.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/4efbab8c-640f-43b2-8ac6-6d68e08280fe@kili.mountain
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 5da884e11337..c4b4678bc4a4 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -397,6 +397,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
u64 val;
int wa_level;
+ if (KVM_REG_SIZE(reg->id) != sizeof(val))
+ return -ENOENT;
if (copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id)))
return -EFAULT;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x a25bc8486f9c01c1af6b6c5657234b2eee2c39d6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042350-coveted-collapse-44f5@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
a25bc8486f9c ("KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg()")
85fbe08e4da8 ("KVM: arm64: Factor out firmware register handling from psci.c")
1ebdbeb03efe ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a25bc8486f9c01c1af6b6c5657234b2eee2c39d6 Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)linaro.org>
Date: Wed, 19 Apr 2023 13:16:13 +0300
Subject: [PATCH] KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg()
The KVM_REG_SIZE() comes from the ioctl and it can be a power of two
between 0-32768 but if it is more than sizeof(long) this will corrupt
memory.
Fixes: 99adb567632b ("KVM: arm/arm64: Add save/restore support for firmware workaround state")
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Reviewed-by: Steven Price <steven.price(a)arm.com>
Reviewed-by: Eric Auger <eric.auger(a)redhat.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/4efbab8c-640f-43b2-8ac6-6d68e08280fe@kili.mountain
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 5da884e11337..c4b4678bc4a4 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -397,6 +397,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
u64 val;
int wa_level;
+ if (KVM_REG_SIZE(reg->id) != sizeof(val))
+ return -ENOENT;
if (copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id)))
return -EFAULT;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a25bc8486f9c01c1af6b6c5657234b2eee2c39d6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042349-cackle-parasite-ff0b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
a25bc8486f9c ("KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg()")
85fbe08e4da8 ("KVM: arm64: Factor out firmware register handling from psci.c")
1ebdbeb03efe ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a25bc8486f9c01c1af6b6c5657234b2eee2c39d6 Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)linaro.org>
Date: Wed, 19 Apr 2023 13:16:13 +0300
Subject: [PATCH] KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg()
The KVM_REG_SIZE() comes from the ioctl and it can be a power of two
between 0-32768 but if it is more than sizeof(long) this will corrupt
memory.
Fixes: 99adb567632b ("KVM: arm/arm64: Add save/restore support for firmware workaround state")
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Reviewed-by: Steven Price <steven.price(a)arm.com>
Reviewed-by: Eric Auger <eric.auger(a)redhat.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/4efbab8c-640f-43b2-8ac6-6d68e08280fe@kili.mountain
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 5da884e11337..c4b4678bc4a4 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -397,6 +397,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
u64 val;
int wa_level;
+ if (KVM_REG_SIZE(reg->id) != sizeof(val))
+ return -ENOENT;
if (copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id)))
return -EFAULT;
The data->block[0] variable comes from user and is a number between
0-255. Without proper check, the variable may be very large to cause
an out-of-bounds when performing memcpy in slimpro_i2c_blkwr.
Fix this bug by checking the value of writelen.
Fixes: f6505fbabc42 ("i2c: add SLIMpro I2C device driver on APM X-Gene platform")
Signed-off-by: Wei Chen <harperchen1110(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
Changes in v2:
- Put length check inside slimpro_i2c_blkwr
Changes in v3:
- Correct the format of patch
Changes in v4:
- CC stable email address
drivers/i2c/busses/i2c-xgene-slimpro.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/i2c/busses/i2c-xgene-slimpro.c b/drivers/i2c/busses/i2c-xgene-slimpro.c
index bc9a3e7e0c96..0f7263e2276a 100644
--- a/drivers/i2c/busses/i2c-xgene-slimpro.c
+++ b/drivers/i2c/busses/i2c-xgene-slimpro.c
@@ -308,6 +308,9 @@ static int slimpro_i2c_blkwr(struct slimpro_i2c_dev *ctx, u32 chip,
u32 msg[3];
int rc;
+ if (writelen > I2C_SMBUS_BLOCK_MAX)
+ return -EINVAL;
+
memcpy(ctx->dma_buffer, data, writelen);
paddr = dma_map_single(ctx->dev, ctx->dma_buffer, writelen,
DMA_TO_DEVICE);
--
2.25.1
OverCurrent condition is not standardized in the UHCI spec.
Zhaoxin UHCI controllers report OverCurrent bit active off.
In order to handle OverCurrent condition correctly, the uhci-hcd
driver needs to be told to expect the active-off behavior.
Suggested-by: Alan Stern <stern(a)rowland.harvard.edu>
Cc: stable(a)vger.kernel.org
Signed-off-by: Weitao Wang <WeitaoWang-oc(a)zhaoxin.com>
---
v1->v2
- Modify the description of this patch.
- Let Zhaoxin and VIA share a common oc_low flag
drivers/usb/host/uhci-pci.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/host/uhci-pci.c b/drivers/usb/host/uhci-pci.c
index 3592f757fe05..034586911bb5 100644
--- a/drivers/usb/host/uhci-pci.c
+++ b/drivers/usb/host/uhci-pci.c
@@ -119,11 +119,12 @@ static int uhci_pci_init(struct usb_hcd *hcd)
uhci->rh_numports = uhci_count_ports(hcd);
- /* Intel controllers report the OverCurrent bit active on.
- * VIA controllers report it active off, so we'll adjust the
- * bit value. (It's not standardized in the UHCI spec.)
+ /* Intel controllers report the OverCurrent bit active on. VIA
+ * and ZHAOXIN controllers report it active off, so we'll adjust
+ * the bit value. (It's not standardized in the UHCI spec.)
*/
- if (to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_VIA)
+ if (to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_VIA ||
+ to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_ZHAOXIN)
uhci->oc_low = 1;
/* HP's server management chip requires a longer port reset delay. */
--
2.32.0
This is just a resend of
https://lore.kernel.org/stable/20230318173943.3188213-1-qyousef@layalina.io/
Changes in v2:
* Fix compilation error against patch 7 due to misiplace #endif to
protect against CONFIG_SMP which doesn't contain the newly added
field to struct rq.
Commit 2ff401441711 ("sched/uclamp: Fix relationship between uclamp and
migration margin") was cherry-picked into 5.10 kernels but missed the rest of
the series.
This ports the remainder of the fixes.
Based on 5.10.172.
NOTE:
a2e90611b9f4 ("sched/fair: Remove capacity inversion detection") is not
necessary to backport because it has a dependency on e5ed0550c04c ("sched/fair:
unlink misfit task from cpu overutilized") which is nice to have but not
strictly required. It improves the search for best CPU under adverse thermal
pressure to try harder. And the new search effectively replaces the capacity
inversion detection, so it is removed afterwards.
Build tested on (cross compile when necessary; x86_64 otherwise):
1. default ubuntu config which has uclamp + smp
2. default ubuntu config without uclamp + smp
3. default ubunto config without smp (which automatically disables
uclamp)
4. reported riscv-allnoconfig, mips-randconfig, x86_64-randocnfigs
Tested on 5.10 Android GKI kernel and android device (with slight modifications
due to other conflicts on there).
Qais Yousef (10):
sched/uclamp: Make task_fits_capacity() use util_fits_cpu()
sched/uclamp: Fix fits_capacity() check in feec()
sched/uclamp: Make select_idle_capacity() use util_fits_cpu()
sched/uclamp: Make asym_fits_capacity() use util_fits_cpu()
sched/uclamp: Make cpu_overutilized() use util_fits_cpu()
sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early
exit condition
sched/fair: Detect capacity inversion
sched/fair: Consider capacity inversion in util_fits_cpu()
sched/uclamp: Fix a uninitialized variable warnings
sched/fair: Fixes for capacity inversion detection
kernel/sched/core.c | 10 +--
kernel/sched/fair.c | 183 ++++++++++++++++++++++++++++++++++---------
kernel/sched/sched.h | 70 ++++++++++++++++-
3 files changed, 217 insertions(+), 46 deletions(-)
--
2.25.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x f4e9e0e69468583c2c6d9d5c7bfc975e292bf188
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042254-platinum-service-2032@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
f4e9e0e69468 ("mm/mempolicy: fix use-after-free of VMA iterator")
9760ebffbf55 ("mm: switch vma_merge(), split_vma(), and __split_vma to vma iterator")
47d9644de92c ("nommu: convert nommu to using the vma iterator")
a27a11f92fe2 ("mm/mremap: use vmi version of vma_merge()")
076f16bf7698 ("mmap: use vmi version of vma_merge()")
0c0c5bffd0a2 ("mmap: pass through vmi iterator to __split_vma()")
178e22ac2078 ("madvise: use vmi iterator for __split_vma() and vma_merge()")
f10c2abcdac4 ("mempolicy: convert to vma iterator")
37598f5a9d8b ("mlock: convert mlock to vma iterator")
2286a6914c77 ("mm: change mprotect_fixup to vma iterator")
11a9b90274f6 ("userfaultfd: use vma iterator")
f2ebfe43ba6c ("mm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()")
183654ce26a5 ("mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator")
0378c0a0e9e4 ("mm/mmap: remove preallocation from do_mas_align_munmap()")
92fed82047d7 ("mm/mmap: convert brk to use vma iterator")
baabcfc93d3b ("mm/mmap: fix typo in comment")
c5d5546ea065 ("maple_tree: remove the parameter entry of mas_preallocate")
5ab0fc155dc0 ("Sync mm-stable with mm-hotfixes-stable to pick up dependent patches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f4e9e0e69468583c2c6d9d5c7bfc975e292bf188 Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Mon, 10 Apr 2023 11:22:05 -0400
Subject: [PATCH] mm/mempolicy: fix use-after-free of VMA iterator
set_mempolicy_home_node() iterates over a list of VMAs and calls
mbind_range() on each VMA, which also iterates over the singular list of
the VMA passed in and potentially splits the VMA. Since the VMA iterator
is not passed through, set_mempolicy_home_node() may now point to a stale
node in the VMA tree. This can result in a UAF as reported by syzbot.
Avoid the stale maple tree node by passing the VMA iterator through to the
underlying call to split_vma().
mbind_range() is also overly complicated, since there are two calling
functions and one already handles iterating over the VMAs. Simplify
mbind_range() to only handle merging and splitting of the VMAs.
Align the new loop in do_mbind() and existing loop in
set_mempolicy_home_node() to use the reduced mbind_range() function. This
allows for a single location of the range calculation and avoids
constantly looking up the previous VMA (since this is a loop over the
VMAs).
Link: https://lore.kernel.org/linux-mm/000000000000c93feb05f87e24ad@google.com/
Fixes: 66850be55e8e ("mm/mempolicy: use vma iterator & maple state instead of vma linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: syzbot+a7c1ec5b1d71ceaa5186(a)syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/20230410152205.2294819-1-Liam.Howlett@oracle.com
Tested-by: syzbot+a7c1ec5b1d71ceaa5186(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a256a241fd1d..2068b594dc88 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -790,61 +790,50 @@ static int vma_replace_policy(struct vm_area_struct *vma,
return err;
}
-/* Step 2: apply policy to a range and do splits. */
-static int mbind_range(struct mm_struct *mm, unsigned long start,
- unsigned long end, struct mempolicy *new_pol)
+/* Split or merge the VMA (if required) and apply the new policy */
+static int mbind_range(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ struct vm_area_struct **prev, unsigned long start,
+ unsigned long end, struct mempolicy *new_pol)
{
- VMA_ITERATOR(vmi, mm, start);
- struct vm_area_struct *prev;
- struct vm_area_struct *vma;
- int err = 0;
+ struct vm_area_struct *merged;
+ unsigned long vmstart, vmend;
pgoff_t pgoff;
+ int err;
- prev = vma_prev(&vmi);
- vma = vma_find(&vmi, end);
- if (WARN_ON(!vma))
+ vmend = min(end, vma->vm_end);
+ if (start > vma->vm_start) {
+ *prev = vma;
+ vmstart = start;
+ } else {
+ vmstart = vma->vm_start;
+ }
+
+ if (mpol_equal(vma_policy(vma), new_pol))
return 0;
- if (start > vma->vm_start)
- prev = vma;
-
- do {
- unsigned long vmstart = max(start, vma->vm_start);
- unsigned long vmend = min(end, vma->vm_end);
-
- if (mpol_equal(vma_policy(vma), new_pol))
- goto next;
-
- pgoff = vma->vm_pgoff +
- ((vmstart - vma->vm_start) >> PAGE_SHIFT);
- prev = vma_merge(&vmi, mm, prev, vmstart, vmend, vma->vm_flags,
- vma->anon_vma, vma->vm_file, pgoff,
- new_pol, vma->vm_userfaultfd_ctx,
- anon_vma_name(vma));
- if (prev) {
- vma = prev;
- goto replace;
- }
- if (vma->vm_start != vmstart) {
- err = split_vma(&vmi, vma, vmstart, 1);
- if (err)
- goto out;
- }
- if (vma->vm_end != vmend) {
- err = split_vma(&vmi, vma, vmend, 0);
- if (err)
- goto out;
- }
-replace:
- err = vma_replace_policy(vma, new_pol);
+ pgoff = vma->vm_pgoff + ((vmstart - vma->vm_start) >> PAGE_SHIFT);
+ merged = vma_merge(vmi, vma->vm_mm, *prev, vmstart, vmend, vma->vm_flags,
+ vma->anon_vma, vma->vm_file, pgoff, new_pol,
+ vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ if (merged) {
+ *prev = merged;
+ return vma_replace_policy(merged, new_pol);
+ }
+
+ if (vma->vm_start != vmstart) {
+ err = split_vma(vmi, vma, vmstart, 1);
if (err)
- goto out;
-next:
- prev = vma;
- } for_each_vma_range(vmi, vma, end);
+ return err;
+ }
-out:
- return err;
+ if (vma->vm_end != vmend) {
+ err = split_vma(vmi, vma, vmend, 0);
+ if (err)
+ return err;
+ }
+
+ *prev = vma;
+ return vma_replace_policy(vma, new_pol);
}
/* Set the process memory policy */
@@ -1259,6 +1248,8 @@ static long do_mbind(unsigned long start, unsigned long len,
nodemask_t *nmask, unsigned long flags)
{
struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma, *prev;
+ struct vma_iterator vmi;
struct mempolicy *new;
unsigned long end;
int err;
@@ -1328,7 +1319,13 @@ static long do_mbind(unsigned long start, unsigned long len,
goto up_out;
}
- err = mbind_range(mm, start, end, new);
+ vma_iter_init(&vmi, mm, start);
+ prev = vma_prev(&vmi);
+ for_each_vma_range(vmi, vma, end) {
+ err = mbind_range(&vmi, vma, &prev, start, end, new);
+ if (err)
+ break;
+ }
if (!err) {
int nr_failed = 0;
@@ -1489,10 +1486,8 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
unsigned long, home_node, unsigned long, flags)
{
struct mm_struct *mm = current->mm;
- struct vm_area_struct *vma;
+ struct vm_area_struct *vma, *prev;
struct mempolicy *new, *old;
- unsigned long vmstart;
- unsigned long vmend;
unsigned long end;
int err = -ENOENT;
VMA_ITERATOR(vmi, mm, start);
@@ -1521,6 +1516,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
if (end == start)
return 0;
mmap_write_lock(mm);
+ prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
/*
* If any vma in the range got policy other than MPOL_BIND
@@ -1541,9 +1537,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
}
new->home_node = home_node;
- vmstart = max(start, vma->vm_start);
- vmend = min(end, vma->vm_end);
- err = mbind_range(mm, vmstart, vmend, new);
+ err = mbind_range(&vmi, vma, &prev, start, end, new);
mpol_put(new);
if (err)
break;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 24bf08c4376be417f16ceb609188b16f461b0443
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042218-hexagon-cheese-b961@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
24bf08c4376b ("mm/userfaultfd: fix uffd-wp handling for THP migration entries")
6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
6c54dc6c7437 ("mm/rmap: use page_move_anon_rmap() when reusing a mapped PageAnon() page exclusively")
28c5209dfd5f ("mm/rmap: pass rmap flags to hugepage_add_anon_rmap()")
f1e2db12e45b ("mm/rmap: remove do_page_add_anon_rmap()")
14f9135d5470 ("mm/rmap: convert RMAP flags to a proper distinct rmap_t type")
fb3d824d1a46 ("mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap()")
b51ad4f8679e ("mm/memory: slightly simplify copy_present_pte()")
623a1ddfeb23 ("mm/hugetlb: take src_mm->write_protect_seq in copy_hugetlb_page_range()")
3bff7e3f1f16 ("mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()")
c145e0b47c77 ("mm: streamline COW logic in do_swap_page()")
84d60fdd3733 ("mm: slightly clarify KSM logic in do_swap_page()")
53a05ad9f21d ("mm: optimize do_wp_page() for exclusive pages in the swapcache")
9030fb0bb9d6 ("Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24bf08c4376be417f16ceb609188b16f461b0443 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Wed, 5 Apr 2023 18:02:35 +0200
Subject: [PATCH] mm/userfaultfd: fix uffd-wp handling for THP migration
entries
Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
fix uffd-wp handling for migration entries in
hugetlb_change_protection()") similarly applies to THP.
Setting/clearing uffd-wp on THP migration entries is not implemented
properly. Further, while removing migration PMDs considers the uffd-wp
bit, inserting migration PMDs does not consider the uffd-wp bit.
We have to set/clear independently of the migration entry type in
change_huge_pmd() and properly copy the uffd-wp bit in
set_pmd_migration_entry().
Verified using a simple reproducer that triggers migration of a THP, that
the set_pmd_migration_entry() no longer loses the uffd-wp bit.
Link: https://lkml.kernel.org/r/20230405160236.587705-2-david@redhat.com
Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 032fb0ef9cd1..e3706a2b34b2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
if (is_swap_pmd(*pmd)) {
swp_entry_t entry = pmd_to_swp_entry(*pmd);
struct page *page = pfn_swap_entry_to_page(entry);
+ pmd_t newpmd;
VM_BUG_ON(!is_pmd_migration_entry(*pmd));
if (is_writable_migration_entry(entry)) {
- pmd_t newpmd;
/*
* A protection check is difficult so
* just be safe and disable write
@@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
newpmd = pmd_swp_mksoft_dirty(newpmd);
if (pmd_swp_uffd_wp(*pmd))
newpmd = pmd_swp_mkuffd_wp(newpmd);
- set_pmd_at(mm, addr, pmd, newpmd);
+ } else {
+ newpmd = *pmd;
}
+
+ if (uffd_wp)
+ newpmd = pmd_swp_mkuffd_wp(newpmd);
+ else if (uffd_wp_resolve)
+ newpmd = pmd_swp_clear_uffd_wp(newpmd);
+ if (!pmd_same(*pmd, newpmd))
+ set_pmd_at(mm, addr, pmd, newpmd);
goto unlock;
}
#endif
@@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
pmdswp = swp_entry_to_pmd(entry);
if (pmd_soft_dirty(pmdval))
pmdswp = pmd_swp_mksoft_dirty(pmdswp);
+ if (pmd_uffd_wp(pmdval))
+ pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
page_remove_rmap(page, vma, true);
put_page(page);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 24bf08c4376be417f16ceb609188b16f461b0443
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042216-retainer-immersion-5428@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
24bf08c4376b ("mm/userfaultfd: fix uffd-wp handling for THP migration entries")
6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
6c54dc6c7437 ("mm/rmap: use page_move_anon_rmap() when reusing a mapped PageAnon() page exclusively")
28c5209dfd5f ("mm/rmap: pass rmap flags to hugepage_add_anon_rmap()")
f1e2db12e45b ("mm/rmap: remove do_page_add_anon_rmap()")
14f9135d5470 ("mm/rmap: convert RMAP flags to a proper distinct rmap_t type")
fb3d824d1a46 ("mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap()")
b51ad4f8679e ("mm/memory: slightly simplify copy_present_pte()")
623a1ddfeb23 ("mm/hugetlb: take src_mm->write_protect_seq in copy_hugetlb_page_range()")
3bff7e3f1f16 ("mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()")
c145e0b47c77 ("mm: streamline COW logic in do_swap_page()")
84d60fdd3733 ("mm: slightly clarify KSM logic in do_swap_page()")
53a05ad9f21d ("mm: optimize do_wp_page() for exclusive pages in the swapcache")
9030fb0bb9d6 ("Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24bf08c4376be417f16ceb609188b16f461b0443 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Wed, 5 Apr 2023 18:02:35 +0200
Subject: [PATCH] mm/userfaultfd: fix uffd-wp handling for THP migration
entries
Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
fix uffd-wp handling for migration entries in
hugetlb_change_protection()") similarly applies to THP.
Setting/clearing uffd-wp on THP migration entries is not implemented
properly. Further, while removing migration PMDs considers the uffd-wp
bit, inserting migration PMDs does not consider the uffd-wp bit.
We have to set/clear independently of the migration entry type in
change_huge_pmd() and properly copy the uffd-wp bit in
set_pmd_migration_entry().
Verified using a simple reproducer that triggers migration of a THP, that
the set_pmd_migration_entry() no longer loses the uffd-wp bit.
Link: https://lkml.kernel.org/r/20230405160236.587705-2-david@redhat.com
Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 032fb0ef9cd1..e3706a2b34b2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
if (is_swap_pmd(*pmd)) {
swp_entry_t entry = pmd_to_swp_entry(*pmd);
struct page *page = pfn_swap_entry_to_page(entry);
+ pmd_t newpmd;
VM_BUG_ON(!is_pmd_migration_entry(*pmd));
if (is_writable_migration_entry(entry)) {
- pmd_t newpmd;
/*
* A protection check is difficult so
* just be safe and disable write
@@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
newpmd = pmd_swp_mksoft_dirty(newpmd);
if (pmd_swp_uffd_wp(*pmd))
newpmd = pmd_swp_mkuffd_wp(newpmd);
- set_pmd_at(mm, addr, pmd, newpmd);
+ } else {
+ newpmd = *pmd;
}
+
+ if (uffd_wp)
+ newpmd = pmd_swp_mkuffd_wp(newpmd);
+ else if (uffd_wp_resolve)
+ newpmd = pmd_swp_clear_uffd_wp(newpmd);
+ if (!pmd_same(*pmd, newpmd))
+ set_pmd_at(mm, addr, pmd, newpmd);
goto unlock;
}
#endif
@@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
pmdswp = swp_entry_to_pmd(entry);
if (pmd_soft_dirty(pmdval))
pmdswp = pmd_swp_mksoft_dirty(pmdswp);
+ if (pmd_uffd_wp(pmdval))
+ pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
page_remove_rmap(page, vma, true);
put_page(page);
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x 0b5dfe12755f87ec014bb4cc1930485026167430
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042210-apache-rally-cc5e@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
0b5dfe12755f ("drm/amd/display: fix a divided-by-zero error")
13b90cf900ab ("drm/amd/display: fix PSR-SU/DSC interoperability support")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0b5dfe12755f87ec014bb4cc1930485026167430 Mon Sep 17 00:00:00 2001
From: Alex Hung <alex.hung(a)amd.com>
Date: Mon, 3 Apr 2023 17:45:41 +0800
Subject: [PATCH] drm/amd/display: fix a divided-by-zero error
[Why & How]
timing.dsc_cfg.num_slices_v can be zero and it is necessary to check
before using it.
This fixes the error "divide error: 0000 [#1] PREEMPT SMP NOPTI".
Reviewed-by: Aurabindo Pillai <Aurabindo.Pillai(a)amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo(a)amd.com>
Signed-off-by: Alex Hung <alex.hung(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/modules/power/power_helpers.c b/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
index e39b133d05af..b56f07f99d09 100644
--- a/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
+++ b/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
@@ -934,6 +934,10 @@ bool psr_su_set_dsc_slice_height(struct dc *dc, struct dc_link *link,
pic_height = stream->timing.v_addressable +
stream->timing.v_border_top + stream->timing.v_border_bottom;
+
+ if (stream->timing.dsc_cfg.num_slices_v == 0)
+ return false;
+
slice_height = pic_height / stream->timing.dsc_cfg.num_slices_v;
config->dsc_slice_height = slice_height;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 0b5dfe12755f87ec014bb4cc1930485026167430
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042209-scuff-accustom-83bb@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
0b5dfe12755f ("drm/amd/display: fix a divided-by-zero error")
13b90cf900ab ("drm/amd/display: fix PSR-SU/DSC interoperability support")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0b5dfe12755f87ec014bb4cc1930485026167430 Mon Sep 17 00:00:00 2001
From: Alex Hung <alex.hung(a)amd.com>
Date: Mon, 3 Apr 2023 17:45:41 +0800
Subject: [PATCH] drm/amd/display: fix a divided-by-zero error
[Why & How]
timing.dsc_cfg.num_slices_v can be zero and it is necessary to check
before using it.
This fixes the error "divide error: 0000 [#1] PREEMPT SMP NOPTI".
Reviewed-by: Aurabindo Pillai <Aurabindo.Pillai(a)amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo(a)amd.com>
Signed-off-by: Alex Hung <alex.hung(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/modules/power/power_helpers.c b/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
index e39b133d05af..b56f07f99d09 100644
--- a/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
+++ b/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
@@ -934,6 +934,10 @@ bool psr_su_set_dsc_slice_height(struct dc *dc, struct dc_link *link,
pic_height = stream->timing.v_addressable +
stream->timing.v_border_top + stream->timing.v_border_bottom;
+
+ if (stream->timing.dsc_cfg.num_slices_v == 0)
+ return false;
+
slice_height = pic_height / stream->timing.dsc_cfg.num_slices_v;
config->dsc_slice_height = slice_height;
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x 1e994cc0956b8dabd1b1fef315bbd722733b8aa8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042204-threefold-stingy-5bc3@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
1e994cc0956b ("drm/amd/display: limit timing for single dimm memory")
71c4ca2d3b07 ("drm/amd/display: Remove stutter only configurations")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e994cc0956b8dabd1b1fef315bbd722733b8aa8 Mon Sep 17 00:00:00 2001
From: Daniel Miess <Daniel.Miess(a)amd.com>
Date: Tue, 4 Apr 2023 14:04:11 -0400
Subject: [PATCH] drm/amd/display: limit timing for single dimm memory
[Why]
1. It could hit bandwidth limitdation under single dimm
memory when connecting 8K external monitor.
2. IsSupportedVidPn got validation failed with
2K240Hz eDP + 8K24Hz external monitor.
3. It's better to filter out such combination in
EnumVidPnCofuncModality
4. For short term, filter out in dc bandwidth validation.
[How]
Force 2K@240Hz+8K@24Hz timing validation false in dc.
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas(a)amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo(a)amd.com>
Signed-off-by: Daniel Miess <Daniel.Miess(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
index 54ed3de869d3..9ffba4c6fe55 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
@@ -1697,6 +1697,23 @@ static void dcn314_get_panel_config_defaults(struct dc_panel_config *panel_confi
*panel_config = panel_config_defaults;
}
+static bool filter_modes_for_single_channel_workaround(struct dc *dc,
+ struct dc_state *context)
+{
+ // Filter 2K@240Hz+8K@24fps above combination timing if memory only has single dimm LPDDR
+ if (dc->clk_mgr->bw_params->vram_type == 34 && dc->clk_mgr->bw_params->num_channels < 2) {
+ int total_phy_pix_clk = 0;
+
+ for (int i = 0; i < context->stream_count; i++)
+ if (context->res_ctx.pipe_ctx[i].stream)
+ total_phy_pix_clk += context->res_ctx.pipe_ctx[i].stream->phy_pix_clk;
+
+ if (total_phy_pix_clk >= (1148928+826260)) //2K@240Hz+8K@24fps
+ return true;
+ }
+ return false;
+}
+
bool dcn314_validate_bandwidth(struct dc *dc,
struct dc_state *context,
bool fast_validate)
@@ -1712,6 +1729,9 @@ bool dcn314_validate_bandwidth(struct dc *dc,
BW_VAL_TRACE_COUNT();
+ if (filter_modes_for_single_channel_workaround(dc, context))
+ goto validate_fail;
+
DC_FP_START();
// do not support self refresh only
out = dcn30_internal_validate_bw(dc, context, pipes, &pipe_cnt, &vlevel, fast_validate, false);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 1e994cc0956b8dabd1b1fef315bbd722733b8aa8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042202-confused-runt-7bf2@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
1e994cc0956b ("drm/amd/display: limit timing for single dimm memory")
71c4ca2d3b07 ("drm/amd/display: Remove stutter only configurations")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e994cc0956b8dabd1b1fef315bbd722733b8aa8 Mon Sep 17 00:00:00 2001
From: Daniel Miess <Daniel.Miess(a)amd.com>
Date: Tue, 4 Apr 2023 14:04:11 -0400
Subject: [PATCH] drm/amd/display: limit timing for single dimm memory
[Why]
1. It could hit bandwidth limitdation under single dimm
memory when connecting 8K external monitor.
2. IsSupportedVidPn got validation failed with
2K240Hz eDP + 8K24Hz external monitor.
3. It's better to filter out such combination in
EnumVidPnCofuncModality
4. For short term, filter out in dc bandwidth validation.
[How]
Force 2K@240Hz+8K@24Hz timing validation false in dc.
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas(a)amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo(a)amd.com>
Signed-off-by: Daniel Miess <Daniel.Miess(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
index 54ed3de869d3..9ffba4c6fe55 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
@@ -1697,6 +1697,23 @@ static void dcn314_get_panel_config_defaults(struct dc_panel_config *panel_confi
*panel_config = panel_config_defaults;
}
+static bool filter_modes_for_single_channel_workaround(struct dc *dc,
+ struct dc_state *context)
+{
+ // Filter 2K@240Hz+8K@24fps above combination timing if memory only has single dimm LPDDR
+ if (dc->clk_mgr->bw_params->vram_type == 34 && dc->clk_mgr->bw_params->num_channels < 2) {
+ int total_phy_pix_clk = 0;
+
+ for (int i = 0; i < context->stream_count; i++)
+ if (context->res_ctx.pipe_ctx[i].stream)
+ total_phy_pix_clk += context->res_ctx.pipe_ctx[i].stream->phy_pix_clk;
+
+ if (total_phy_pix_clk >= (1148928+826260)) //2K@240Hz+8K@24fps
+ return true;
+ }
+ return false;
+}
+
bool dcn314_validate_bandwidth(struct dc *dc,
struct dc_state *context,
bool fast_validate)
@@ -1712,6 +1729,9 @@ bool dcn314_validate_bandwidth(struct dc *dc,
BW_VAL_TRACE_COUNT();
+ if (filter_modes_for_single_channel_workaround(dc, context))
+ goto validate_fail;
+
DC_FP_START();
// do not support self refresh only
out = dcn30_internal_validate_bw(dc, context, pipes, &pipe_cnt, &vlevel, fast_validate, false);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 1ba1199ec5747f475538c0d25a32804e5ba1dfde
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042229-enchilada-cheddar-abb9@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
1ba1199ec574 ("writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs")
c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
f3b6a6df38aa ("writeback, cgroup: keep list of inodes attached to bdi_writeback")
29264d92a0f1 ("writeback, cgroup: switch to rcu_work API in inode_switch_wbs()")
8826ee4fe750 ("writeback, cgroup: increment isw_nr_in_flight before grabbing an inode")
4ade5867b4b8 ("writeback, cgroup: do not switch inodes with I_WILL_FREE flag")
e30942859030 ("Merge tag 'writeback_for_v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ba1199ec5747f475538c0d25a32804e5ba1dfde Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 10 Apr 2023 21:08:26 +0800
Subject: [PATCH] writeback, cgroup: fix null-ptr-deref write in
bdi_split_work_to_wbs
KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0xc0
print_report+0x2ba/0x340
kasan_report+0xc4/0x120
kasan_check_range+0x1b7/0x2e0
__kasan_check_write+0x24/0x40
bdi_split_work_to_wbs+0x5c5/0x7b0
sync_inodes_sb+0x195/0x630
sync_inodes_one_sb+0x3a/0x50
iterate_supers+0x106/0x1b0
ksys_sync+0x98/0x160
[...]
==================================================================
The race that causes the above issue is as follows:
cpu1 cpu2
-------------------------|-------------------------
inode_switch_wbs
INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
queue_rcu_work(isw_wq, &isw->work)
// queue_work async
inode_switch_wbs_work_fn
wb_put_many(old_wb, nr_switched)
percpu_ref_put_many
ref->data->release(ref)
cgwb_release
queue_work(cgwb_release_wq, &wb->release_work)
// queue_work async
&wb->release_work
cgwb_release_workfn
ksys_sync
iterate_supers
sync_inodes_one_sb
sync_inodes_sb
bdi_split_work_to_wbs
kmalloc(sizeof(*work), GFP_ATOMIC)
// alloc memory failed
percpu_ref_exit
ref->data = NULL
kfree(data)
wb_get(wb)
percpu_ref_get(&wb->refcnt)
percpu_ref_get_many(ref, 1)
atomic_long_add(nr, &ref->data->count)
atomic64_add(i, v)
// trigger null-ptr-deref
bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
wbs. If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.
This issue was introduced in v4.3-rc7 (see fix tag1). Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue. For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.
To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.
Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Acked-by: Tejun Heo <tj(a)kernel.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel(a)dilger.ca>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Hou Tao <houtao1(a)huawei.com>
Cc: yangerkun <yangerkun(a)huawei.com>
Cc: Zhang Yi <yi.zhang(a)huawei.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 195dc23e0d83..1db3e3c24b43 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -978,6 +978,16 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
continue;
}
+ /*
+ * If wb_tryget fails, the wb has been shutdown, skip it.
+ *
+ * Pin @wb so that it stays on @bdi->wb_list. This allows
+ * continuing iteration from @wb after dropping and
+ * regrabbing rcu read lock.
+ */
+ if (!wb_tryget(wb))
+ continue;
+
/* alloc failed, execute synchronously using on-stack fallback */
work = &fallback_work;
*work = *base_work;
@@ -986,13 +996,6 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
work->done = &fallback_work_done;
wb_queue_work(wb, work);
-
- /*
- * Pin @wb so that it stays on @bdi->wb_list. This allows
- * continuing iteration from @wb after dropping and
- * regrabbing rcu read lock.
- */
- wb_get(wb);
last_wb = wb;
rcu_read_unlock();
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index a53b9360b72e..30d2d0386fdb 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -507,6 +507,15 @@ static LIST_HEAD(offline_cgwbs);
static void cleanup_offline_cgwbs_workfn(struct work_struct *work);
static DECLARE_WORK(cleanup_offline_cgwbs_work, cleanup_offline_cgwbs_workfn);
+static void cgwb_free_rcu(struct rcu_head *rcu_head)
+{
+ struct bdi_writeback *wb = container_of(rcu_head,
+ struct bdi_writeback, rcu);
+
+ percpu_ref_exit(&wb->refcnt);
+ kfree(wb);
+}
+
static void cgwb_release_workfn(struct work_struct *work)
{
struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
@@ -529,11 +538,10 @@ static void cgwb_release_workfn(struct work_struct *work)
list_del(&wb->offline_node);
spin_unlock_irq(&cgwb_lock);
- percpu_ref_exit(&wb->refcnt);
wb_exit(wb);
bdi_put(bdi);
WARN_ON_ONCE(!list_empty(&wb->b_attached));
- kfree_rcu(wb, rcu);
+ call_rcu(&wb->rcu, cgwb_free_rcu);
}
static void cgwb_release(struct percpu_ref *refcnt)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 1ba1199ec5747f475538c0d25a32804e5ba1dfde
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042224-tricolor-shining-6cd0@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
1ba1199ec574 ("writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs")
c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
f3b6a6df38aa ("writeback, cgroup: keep list of inodes attached to bdi_writeback")
29264d92a0f1 ("writeback, cgroup: switch to rcu_work API in inode_switch_wbs()")
8826ee4fe750 ("writeback, cgroup: increment isw_nr_in_flight before grabbing an inode")
4ade5867b4b8 ("writeback, cgroup: do not switch inodes with I_WILL_FREE flag")
e30942859030 ("Merge tag 'writeback_for_v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ba1199ec5747f475538c0d25a32804e5ba1dfde Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 10 Apr 2023 21:08:26 +0800
Subject: [PATCH] writeback, cgroup: fix null-ptr-deref write in
bdi_split_work_to_wbs
KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0xc0
print_report+0x2ba/0x340
kasan_report+0xc4/0x120
kasan_check_range+0x1b7/0x2e0
__kasan_check_write+0x24/0x40
bdi_split_work_to_wbs+0x5c5/0x7b0
sync_inodes_sb+0x195/0x630
sync_inodes_one_sb+0x3a/0x50
iterate_supers+0x106/0x1b0
ksys_sync+0x98/0x160
[...]
==================================================================
The race that causes the above issue is as follows:
cpu1 cpu2
-------------------------|-------------------------
inode_switch_wbs
INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
queue_rcu_work(isw_wq, &isw->work)
// queue_work async
inode_switch_wbs_work_fn
wb_put_many(old_wb, nr_switched)
percpu_ref_put_many
ref->data->release(ref)
cgwb_release
queue_work(cgwb_release_wq, &wb->release_work)
// queue_work async
&wb->release_work
cgwb_release_workfn
ksys_sync
iterate_supers
sync_inodes_one_sb
sync_inodes_sb
bdi_split_work_to_wbs
kmalloc(sizeof(*work), GFP_ATOMIC)
// alloc memory failed
percpu_ref_exit
ref->data = NULL
kfree(data)
wb_get(wb)
percpu_ref_get(&wb->refcnt)
percpu_ref_get_many(ref, 1)
atomic_long_add(nr, &ref->data->count)
atomic64_add(i, v)
// trigger null-ptr-deref
bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
wbs. If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.
This issue was introduced in v4.3-rc7 (see fix tag1). Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue. For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.
To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.
Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Acked-by: Tejun Heo <tj(a)kernel.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel(a)dilger.ca>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Hou Tao <houtao1(a)huawei.com>
Cc: yangerkun <yangerkun(a)huawei.com>
Cc: Zhang Yi <yi.zhang(a)huawei.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 195dc23e0d83..1db3e3c24b43 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -978,6 +978,16 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
continue;
}
+ /*
+ * If wb_tryget fails, the wb has been shutdown, skip it.
+ *
+ * Pin @wb so that it stays on @bdi->wb_list. This allows
+ * continuing iteration from @wb after dropping and
+ * regrabbing rcu read lock.
+ */
+ if (!wb_tryget(wb))
+ continue;
+
/* alloc failed, execute synchronously using on-stack fallback */
work = &fallback_work;
*work = *base_work;
@@ -986,13 +996,6 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
work->done = &fallback_work_done;
wb_queue_work(wb, work);
-
- /*
- * Pin @wb so that it stays on @bdi->wb_list. This allows
- * continuing iteration from @wb after dropping and
- * regrabbing rcu read lock.
- */
- wb_get(wb);
last_wb = wb;
rcu_read_unlock();
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index a53b9360b72e..30d2d0386fdb 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -507,6 +507,15 @@ static LIST_HEAD(offline_cgwbs);
static void cleanup_offline_cgwbs_workfn(struct work_struct *work);
static DECLARE_WORK(cleanup_offline_cgwbs_work, cleanup_offline_cgwbs_workfn);
+static void cgwb_free_rcu(struct rcu_head *rcu_head)
+{
+ struct bdi_writeback *wb = container_of(rcu_head,
+ struct bdi_writeback, rcu);
+
+ percpu_ref_exit(&wb->refcnt);
+ kfree(wb);
+}
+
static void cgwb_release_workfn(struct work_struct *work)
{
struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
@@ -529,11 +538,10 @@ static void cgwb_release_workfn(struct work_struct *work)
list_del(&wb->offline_node);
spin_unlock_irq(&cgwb_lock);
- percpu_ref_exit(&wb->refcnt);
wb_exit(wb);
bdi_put(bdi);
WARN_ON_ONCE(!list_empty(&wb->b_attached));
- kfree_rcu(wb, rcu);
+ call_rcu(&wb->rcu, cgwb_free_rcu);
}
static void cgwb_release(struct percpu_ref *refcnt)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 1ba1199ec5747f475538c0d25a32804e5ba1dfde
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042221-hazing-platonic-1a57@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
1ba1199ec574 ("writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs")
c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
f3b6a6df38aa ("writeback, cgroup: keep list of inodes attached to bdi_writeback")
29264d92a0f1 ("writeback, cgroup: switch to rcu_work API in inode_switch_wbs()")
8826ee4fe750 ("writeback, cgroup: increment isw_nr_in_flight before grabbing an inode")
4ade5867b4b8 ("writeback, cgroup: do not switch inodes with I_WILL_FREE flag")
e30942859030 ("Merge tag 'writeback_for_v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ba1199ec5747f475538c0d25a32804e5ba1dfde Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 10 Apr 2023 21:08:26 +0800
Subject: [PATCH] writeback, cgroup: fix null-ptr-deref write in
bdi_split_work_to_wbs
KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0xc0
print_report+0x2ba/0x340
kasan_report+0xc4/0x120
kasan_check_range+0x1b7/0x2e0
__kasan_check_write+0x24/0x40
bdi_split_work_to_wbs+0x5c5/0x7b0
sync_inodes_sb+0x195/0x630
sync_inodes_one_sb+0x3a/0x50
iterate_supers+0x106/0x1b0
ksys_sync+0x98/0x160
[...]
==================================================================
The race that causes the above issue is as follows:
cpu1 cpu2
-------------------------|-------------------------
inode_switch_wbs
INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
queue_rcu_work(isw_wq, &isw->work)
// queue_work async
inode_switch_wbs_work_fn
wb_put_many(old_wb, nr_switched)
percpu_ref_put_many
ref->data->release(ref)
cgwb_release
queue_work(cgwb_release_wq, &wb->release_work)
// queue_work async
&wb->release_work
cgwb_release_workfn
ksys_sync
iterate_supers
sync_inodes_one_sb
sync_inodes_sb
bdi_split_work_to_wbs
kmalloc(sizeof(*work), GFP_ATOMIC)
// alloc memory failed
percpu_ref_exit
ref->data = NULL
kfree(data)
wb_get(wb)
percpu_ref_get(&wb->refcnt)
percpu_ref_get_many(ref, 1)
atomic_long_add(nr, &ref->data->count)
atomic64_add(i, v)
// trigger null-ptr-deref
bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
wbs. If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.
This issue was introduced in v4.3-rc7 (see fix tag1). Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue. For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.
To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.
Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Acked-by: Tejun Heo <tj(a)kernel.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel(a)dilger.ca>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Hou Tao <houtao1(a)huawei.com>
Cc: yangerkun <yangerkun(a)huawei.com>
Cc: Zhang Yi <yi.zhang(a)huawei.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 195dc23e0d83..1db3e3c24b43 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -978,6 +978,16 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
continue;
}
+ /*
+ * If wb_tryget fails, the wb has been shutdown, skip it.
+ *
+ * Pin @wb so that it stays on @bdi->wb_list. This allows
+ * continuing iteration from @wb after dropping and
+ * regrabbing rcu read lock.
+ */
+ if (!wb_tryget(wb))
+ continue;
+
/* alloc failed, execute synchronously using on-stack fallback */
work = &fallback_work;
*work = *base_work;
@@ -986,13 +996,6 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
work->done = &fallback_work_done;
wb_queue_work(wb, work);
-
- /*
- * Pin @wb so that it stays on @bdi->wb_list. This allows
- * continuing iteration from @wb after dropping and
- * regrabbing rcu read lock.
- */
- wb_get(wb);
last_wb = wb;
rcu_read_unlock();
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index a53b9360b72e..30d2d0386fdb 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -507,6 +507,15 @@ static LIST_HEAD(offline_cgwbs);
static void cleanup_offline_cgwbs_workfn(struct work_struct *work);
static DECLARE_WORK(cleanup_offline_cgwbs_work, cleanup_offline_cgwbs_workfn);
+static void cgwb_free_rcu(struct rcu_head *rcu_head)
+{
+ struct bdi_writeback *wb = container_of(rcu_head,
+ struct bdi_writeback, rcu);
+
+ percpu_ref_exit(&wb->refcnt);
+ kfree(wb);
+}
+
static void cgwb_release_workfn(struct work_struct *work)
{
struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
@@ -529,11 +538,10 @@ static void cgwb_release_workfn(struct work_struct *work)
list_del(&wb->offline_node);
spin_unlock_irq(&cgwb_lock);
- percpu_ref_exit(&wb->refcnt);
wb_exit(wb);
bdi_put(bdi);
WARN_ON_ONCE(!list_empty(&wb->b_attached));
- kfree_rcu(wb, rcu);
+ call_rcu(&wb->rcu, cgwb_free_rcu);
}
static void cgwb_release(struct percpu_ref *refcnt)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1ba1199ec5747f475538c0d25a32804e5ba1dfde
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042219-woof-sanitary-4833@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
1ba1199ec574 ("writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs")
c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
f3b6a6df38aa ("writeback, cgroup: keep list of inodes attached to bdi_writeback")
29264d92a0f1 ("writeback, cgroup: switch to rcu_work API in inode_switch_wbs()")
8826ee4fe750 ("writeback, cgroup: increment isw_nr_in_flight before grabbing an inode")
4ade5867b4b8 ("writeback, cgroup: do not switch inodes with I_WILL_FREE flag")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ba1199ec5747f475538c0d25a32804e5ba1dfde Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 10 Apr 2023 21:08:26 +0800
Subject: [PATCH] writeback, cgroup: fix null-ptr-deref write in
bdi_split_work_to_wbs
KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0xc0
print_report+0x2ba/0x340
kasan_report+0xc4/0x120
kasan_check_range+0x1b7/0x2e0
__kasan_check_write+0x24/0x40
bdi_split_work_to_wbs+0x5c5/0x7b0
sync_inodes_sb+0x195/0x630
sync_inodes_one_sb+0x3a/0x50
iterate_supers+0x106/0x1b0
ksys_sync+0x98/0x160
[...]
==================================================================
The race that causes the above issue is as follows:
cpu1 cpu2
-------------------------|-------------------------
inode_switch_wbs
INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
queue_rcu_work(isw_wq, &isw->work)
// queue_work async
inode_switch_wbs_work_fn
wb_put_many(old_wb, nr_switched)
percpu_ref_put_many
ref->data->release(ref)
cgwb_release
queue_work(cgwb_release_wq, &wb->release_work)
// queue_work async
&wb->release_work
cgwb_release_workfn
ksys_sync
iterate_supers
sync_inodes_one_sb
sync_inodes_sb
bdi_split_work_to_wbs
kmalloc(sizeof(*work), GFP_ATOMIC)
// alloc memory failed
percpu_ref_exit
ref->data = NULL
kfree(data)
wb_get(wb)
percpu_ref_get(&wb->refcnt)
percpu_ref_get_many(ref, 1)
atomic_long_add(nr, &ref->data->count)
atomic64_add(i, v)
// trigger null-ptr-deref
bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
wbs. If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.
This issue was introduced in v4.3-rc7 (see fix tag1). Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue. For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.
To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.
Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Acked-by: Tejun Heo <tj(a)kernel.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel(a)dilger.ca>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Hou Tao <houtao1(a)huawei.com>
Cc: yangerkun <yangerkun(a)huawei.com>
Cc: Zhang Yi <yi.zhang(a)huawei.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 195dc23e0d83..1db3e3c24b43 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -978,6 +978,16 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
continue;
}
+ /*
+ * If wb_tryget fails, the wb has been shutdown, skip it.
+ *
+ * Pin @wb so that it stays on @bdi->wb_list. This allows
+ * continuing iteration from @wb after dropping and
+ * regrabbing rcu read lock.
+ */
+ if (!wb_tryget(wb))
+ continue;
+
/* alloc failed, execute synchronously using on-stack fallback */
work = &fallback_work;
*work = *base_work;
@@ -986,13 +996,6 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
work->done = &fallback_work_done;
wb_queue_work(wb, work);
-
- /*
- * Pin @wb so that it stays on @bdi->wb_list. This allows
- * continuing iteration from @wb after dropping and
- * regrabbing rcu read lock.
- */
- wb_get(wb);
last_wb = wb;
rcu_read_unlock();
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index a53b9360b72e..30d2d0386fdb 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -507,6 +507,15 @@ static LIST_HEAD(offline_cgwbs);
static void cleanup_offline_cgwbs_workfn(struct work_struct *work);
static DECLARE_WORK(cleanup_offline_cgwbs_work, cleanup_offline_cgwbs_workfn);
+static void cgwb_free_rcu(struct rcu_head *rcu_head)
+{
+ struct bdi_writeback *wb = container_of(rcu_head,
+ struct bdi_writeback, rcu);
+
+ percpu_ref_exit(&wb->refcnt);
+ kfree(wb);
+}
+
static void cgwb_release_workfn(struct work_struct *work)
{
struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
@@ -529,11 +538,10 @@ static void cgwb_release_workfn(struct work_struct *work)
list_del(&wb->offline_node);
spin_unlock_irq(&cgwb_lock);
- percpu_ref_exit(&wb->refcnt);
wb_exit(wb);
bdi_put(bdi);
WARN_ON_ONCE(!list_empty(&wb->b_attached));
- kfree_rcu(wb, rcu);
+ call_rcu(&wb->rcu, cgwb_free_rcu);
}
static void cgwb_release(struct percpu_ref *refcnt)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 1ba1199ec5747f475538c0d25a32804e5ba1dfde
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042217-zoom-handset-cf9f@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
1ba1199ec574 ("writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ba1199ec5747f475538c0d25a32804e5ba1dfde Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 10 Apr 2023 21:08:26 +0800
Subject: [PATCH] writeback, cgroup: fix null-ptr-deref write in
bdi_split_work_to_wbs
KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0xc0
print_report+0x2ba/0x340
kasan_report+0xc4/0x120
kasan_check_range+0x1b7/0x2e0
__kasan_check_write+0x24/0x40
bdi_split_work_to_wbs+0x5c5/0x7b0
sync_inodes_sb+0x195/0x630
sync_inodes_one_sb+0x3a/0x50
iterate_supers+0x106/0x1b0
ksys_sync+0x98/0x160
[...]
==================================================================
The race that causes the above issue is as follows:
cpu1 cpu2
-------------------------|-------------------------
inode_switch_wbs
INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
queue_rcu_work(isw_wq, &isw->work)
// queue_work async
inode_switch_wbs_work_fn
wb_put_many(old_wb, nr_switched)
percpu_ref_put_many
ref->data->release(ref)
cgwb_release
queue_work(cgwb_release_wq, &wb->release_work)
// queue_work async
&wb->release_work
cgwb_release_workfn
ksys_sync
iterate_supers
sync_inodes_one_sb
sync_inodes_sb
bdi_split_work_to_wbs
kmalloc(sizeof(*work), GFP_ATOMIC)
// alloc memory failed
percpu_ref_exit
ref->data = NULL
kfree(data)
wb_get(wb)
percpu_ref_get(&wb->refcnt)
percpu_ref_get_many(ref, 1)
atomic_long_add(nr, &ref->data->count)
atomic64_add(i, v)
// trigger null-ptr-deref
bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
wbs. If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.
This issue was introduced in v4.3-rc7 (see fix tag1). Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue. For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.
To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.
Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Acked-by: Tejun Heo <tj(a)kernel.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel(a)dilger.ca>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Hou Tao <houtao1(a)huawei.com>
Cc: yangerkun <yangerkun(a)huawei.com>
Cc: Zhang Yi <yi.zhang(a)huawei.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 195dc23e0d83..1db3e3c24b43 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -978,6 +978,16 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
continue;
}
+ /*
+ * If wb_tryget fails, the wb has been shutdown, skip it.
+ *
+ * Pin @wb so that it stays on @bdi->wb_list. This allows
+ * continuing iteration from @wb after dropping and
+ * regrabbing rcu read lock.
+ */
+ if (!wb_tryget(wb))
+ continue;
+
/* alloc failed, execute synchronously using on-stack fallback */
work = &fallback_work;
*work = *base_work;
@@ -986,13 +996,6 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
work->done = &fallback_work_done;
wb_queue_work(wb, work);
-
- /*
- * Pin @wb so that it stays on @bdi->wb_list. This allows
- * continuing iteration from @wb after dropping and
- * regrabbing rcu read lock.
- */
- wb_get(wb);
last_wb = wb;
rcu_read_unlock();
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index a53b9360b72e..30d2d0386fdb 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -507,6 +507,15 @@ static LIST_HEAD(offline_cgwbs);
static void cleanup_offline_cgwbs_workfn(struct work_struct *work);
static DECLARE_WORK(cleanup_offline_cgwbs_work, cleanup_offline_cgwbs_workfn);
+static void cgwb_free_rcu(struct rcu_head *rcu_head)
+{
+ struct bdi_writeback *wb = container_of(rcu_head,
+ struct bdi_writeback, rcu);
+
+ percpu_ref_exit(&wb->refcnt);
+ kfree(wb);
+}
+
static void cgwb_release_workfn(struct work_struct *work)
{
struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
@@ -529,11 +538,10 @@ static void cgwb_release_workfn(struct work_struct *work)
list_del(&wb->offline_node);
spin_unlock_irq(&cgwb_lock);
- percpu_ref_exit(&wb->refcnt);
wb_exit(wb);
bdi_put(bdi);
WARN_ON_ONCE(!list_empty(&wb->b_attached));
- kfree_rcu(wb, rcu);
+ call_rcu(&wb->rcu, cgwb_free_rcu);
}
static void cgwb_release(struct percpu_ref *refcnt)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 659c0ce1cb9efc7f58d380ca4bb2a51ae9e30553
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042205-easel-pantry-72d9@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
659c0ce1cb9e ("kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()")
111767c1d86b ("LSM: Signal to SafeSetID when setting group IDs")
40852275a94a ("LSM: add SafeSetID module that gates setid calls")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 659c0ce1cb9efc7f58d380ca4bb2a51ae9e30553 Mon Sep 17 00:00:00 2001
From: Ondrej Mosnacek <omosnace(a)redhat.com>
Date: Fri, 17 Feb 2023 17:21:54 +0100
Subject: [PATCH] kernel/sys.c: fix and improve control flow in
__sys_setres[ug]id()
Linux Security Modules (LSMs) that implement the "capable" hook will
usually emit an access denial message to the audit log whenever they
"block" the current task from using the given capability based on their
security policy.
The occurrence of a denial is used as an indication that the given task
has attempted an operation that requires the given access permission, so
the callers of functions that perform LSM permission checks must take care
to avoid calling them too early (before it is decided if the permission is
actually needed to perform the requested operation).
The __sys_setres[ug]id() functions violate this convention by first
calling ns_capable_setid() and only then checking if the operation
requires the capability or not. It means that any caller that has the
capability granted by DAC (task's capability set) but not by MAC (LSMs)
will generate a "denied" audit record, even if is doing an operation for
which the capability is not required.
Fix this by reordering the checks such that ns_capable_setid() is checked
last and -EPERM is returned immediately if it returns false.
While there, also do two small optimizations:
* move the capability check before prepare_creds() and
* bail out early in case of a no-op.
Link: https://lkml.kernel.org/r/20230217162154.837549-1-omosnace@redhat.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace(a)redhat.com>
Cc: Eric W. Biederman <ebiederm(a)xmission.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/kernel/sys.c b/kernel/sys.c
index 495cd87d9bf4..351de7916302 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -664,6 +664,7 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
struct cred *new;
int retval;
kuid_t kruid, keuid, ksuid;
+ bool ruid_new, euid_new, suid_new;
kruid = make_kuid(ns, ruid);
keuid = make_kuid(ns, euid);
@@ -678,25 +679,29 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
if ((suid != (uid_t) -1) && !uid_valid(ksuid))
return -EINVAL;
+ old = current_cred();
+
+ /* check for no-op */
+ if ((ruid == (uid_t) -1 || uid_eq(kruid, old->uid)) &&
+ (euid == (uid_t) -1 || (uid_eq(keuid, old->euid) &&
+ uid_eq(keuid, old->fsuid))) &&
+ (suid == (uid_t) -1 || uid_eq(ksuid, old->suid)))
+ return 0;
+
+ ruid_new = ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
+ !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid);
+ euid_new = euid != (uid_t) -1 && !uid_eq(keuid, old->uid) &&
+ !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid);
+ suid_new = suid != (uid_t) -1 && !uid_eq(ksuid, old->uid) &&
+ !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid);
+ if ((ruid_new || euid_new || suid_new) &&
+ !ns_capable_setid(old->user_ns, CAP_SETUID))
+ return -EPERM;
+
new = prepare_creds();
if (!new)
return -ENOMEM;
- old = current_cred();
-
- retval = -EPERM;
- if (!ns_capable_setid(old->user_ns, CAP_SETUID)) {
- if (ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
- !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid))
- goto error;
- if (euid != (uid_t) -1 && !uid_eq(keuid, old->uid) &&
- !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid))
- goto error;
- if (suid != (uid_t) -1 && !uid_eq(ksuid, old->uid) &&
- !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid))
- goto error;
- }
-
if (ruid != (uid_t) -1) {
new->uid = kruid;
if (!uid_eq(kruid, old->uid)) {
@@ -761,6 +766,7 @@ long __sys_setresgid(gid_t rgid, gid_t egid, gid_t sgid)
struct cred *new;
int retval;
kgid_t krgid, kegid, ksgid;
+ bool rgid_new, egid_new, sgid_new;
krgid = make_kgid(ns, rgid);
kegid = make_kgid(ns, egid);
@@ -773,23 +779,28 @@ long __sys_setresgid(gid_t rgid, gid_t egid, gid_t sgid)
if ((sgid != (gid_t) -1) && !gid_valid(ksgid))
return -EINVAL;
+ old = current_cred();
+
+ /* check for no-op */
+ if ((rgid == (gid_t) -1 || gid_eq(krgid, old->gid)) &&
+ (egid == (gid_t) -1 || (gid_eq(kegid, old->egid) &&
+ gid_eq(kegid, old->fsgid))) &&
+ (sgid == (gid_t) -1 || gid_eq(ksgid, old->sgid)))
+ return 0;
+
+ rgid_new = rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) &&
+ !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid);
+ egid_new = egid != (gid_t) -1 && !gid_eq(kegid, old->gid) &&
+ !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid);
+ sgid_new = sgid != (gid_t) -1 && !gid_eq(ksgid, old->gid) &&
+ !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid);
+ if ((rgid_new || egid_new || sgid_new) &&
+ !ns_capable_setid(old->user_ns, CAP_SETGID))
+ return -EPERM;
+
new = prepare_creds();
if (!new)
return -ENOMEM;
- old = current_cred();
-
- retval = -EPERM;
- if (!ns_capable_setid(old->user_ns, CAP_SETGID)) {
- if (rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) &&
- !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid))
- goto error;
- if (egid != (gid_t) -1 && !gid_eq(kegid, old->gid) &&
- !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid))
- goto error;
- if (sgid != (gid_t) -1 && !gid_eq(ksgid, old->gid) &&
- !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid))
- goto error;
- }
if (rgid != (gid_t) -1)
new->gid = krgid;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 659c0ce1cb9efc7f58d380ca4bb2a51ae9e30553
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042203-wad-winter-024f@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
659c0ce1cb9e ("kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()")
111767c1d86b ("LSM: Signal to SafeSetID when setting group IDs")
40852275a94a ("LSM: add SafeSetID module that gates setid calls")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 659c0ce1cb9efc7f58d380ca4bb2a51ae9e30553 Mon Sep 17 00:00:00 2001
From: Ondrej Mosnacek <omosnace(a)redhat.com>
Date: Fri, 17 Feb 2023 17:21:54 +0100
Subject: [PATCH] kernel/sys.c: fix and improve control flow in
__sys_setres[ug]id()
Linux Security Modules (LSMs) that implement the "capable" hook will
usually emit an access denial message to the audit log whenever they
"block" the current task from using the given capability based on their
security policy.
The occurrence of a denial is used as an indication that the given task
has attempted an operation that requires the given access permission, so
the callers of functions that perform LSM permission checks must take care
to avoid calling them too early (before it is decided if the permission is
actually needed to perform the requested operation).
The __sys_setres[ug]id() functions violate this convention by first
calling ns_capable_setid() and only then checking if the operation
requires the capability or not. It means that any caller that has the
capability granted by DAC (task's capability set) but not by MAC (LSMs)
will generate a "denied" audit record, even if is doing an operation for
which the capability is not required.
Fix this by reordering the checks such that ns_capable_setid() is checked
last and -EPERM is returned immediately if it returns false.
While there, also do two small optimizations:
* move the capability check before prepare_creds() and
* bail out early in case of a no-op.
Link: https://lkml.kernel.org/r/20230217162154.837549-1-omosnace@redhat.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace(a)redhat.com>
Cc: Eric W. Biederman <ebiederm(a)xmission.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/kernel/sys.c b/kernel/sys.c
index 495cd87d9bf4..351de7916302 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -664,6 +664,7 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
struct cred *new;
int retval;
kuid_t kruid, keuid, ksuid;
+ bool ruid_new, euid_new, suid_new;
kruid = make_kuid(ns, ruid);
keuid = make_kuid(ns, euid);
@@ -678,25 +679,29 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
if ((suid != (uid_t) -1) && !uid_valid(ksuid))
return -EINVAL;
+ old = current_cred();
+
+ /* check for no-op */
+ if ((ruid == (uid_t) -1 || uid_eq(kruid, old->uid)) &&
+ (euid == (uid_t) -1 || (uid_eq(keuid, old->euid) &&
+ uid_eq(keuid, old->fsuid))) &&
+ (suid == (uid_t) -1 || uid_eq(ksuid, old->suid)))
+ return 0;
+
+ ruid_new = ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
+ !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid);
+ euid_new = euid != (uid_t) -1 && !uid_eq(keuid, old->uid) &&
+ !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid);
+ suid_new = suid != (uid_t) -1 && !uid_eq(ksuid, old->uid) &&
+ !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid);
+ if ((ruid_new || euid_new || suid_new) &&
+ !ns_capable_setid(old->user_ns, CAP_SETUID))
+ return -EPERM;
+
new = prepare_creds();
if (!new)
return -ENOMEM;
- old = current_cred();
-
- retval = -EPERM;
- if (!ns_capable_setid(old->user_ns, CAP_SETUID)) {
- if (ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
- !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid))
- goto error;
- if (euid != (uid_t) -1 && !uid_eq(keuid, old->uid) &&
- !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid))
- goto error;
- if (suid != (uid_t) -1 && !uid_eq(ksuid, old->uid) &&
- !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid))
- goto error;
- }
-
if (ruid != (uid_t) -1) {
new->uid = kruid;
if (!uid_eq(kruid, old->uid)) {
@@ -761,6 +766,7 @@ long __sys_setresgid(gid_t rgid, gid_t egid, gid_t sgid)
struct cred *new;
int retval;
kgid_t krgid, kegid, ksgid;
+ bool rgid_new, egid_new, sgid_new;
krgid = make_kgid(ns, rgid);
kegid = make_kgid(ns, egid);
@@ -773,23 +779,28 @@ long __sys_setresgid(gid_t rgid, gid_t egid, gid_t sgid)
if ((sgid != (gid_t) -1) && !gid_valid(ksgid))
return -EINVAL;
+ old = current_cred();
+
+ /* check for no-op */
+ if ((rgid == (gid_t) -1 || gid_eq(krgid, old->gid)) &&
+ (egid == (gid_t) -1 || (gid_eq(kegid, old->egid) &&
+ gid_eq(kegid, old->fsgid))) &&
+ (sgid == (gid_t) -1 || gid_eq(ksgid, old->sgid)))
+ return 0;
+
+ rgid_new = rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) &&
+ !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid);
+ egid_new = egid != (gid_t) -1 && !gid_eq(kegid, old->gid) &&
+ !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid);
+ sgid_new = sgid != (gid_t) -1 && !gid_eq(ksgid, old->gid) &&
+ !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid);
+ if ((rgid_new || egid_new || sgid_new) &&
+ !ns_capable_setid(old->user_ns, CAP_SETGID))
+ return -EPERM;
+
new = prepare_creds();
if (!new)
return -ENOMEM;
- old = current_cred();
-
- retval = -EPERM;
- if (!ns_capable_setid(old->user_ns, CAP_SETGID)) {
- if (rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) &&
- !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid))
- goto error;
- if (egid != (gid_t) -1 && !gid_eq(kegid, old->gid) &&
- !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid))
- goto error;
- if (sgid != (gid_t) -1 && !gid_eq(ksgid, old->gid) &&
- !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid))
- goto error;
- }
if (rgid != (gid_t) -1)
new->gid = krgid;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 659c0ce1cb9efc7f58d380ca4bb2a51ae9e30553
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042201-superior-freebase-a71f@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
659c0ce1cb9e ("kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()")
111767c1d86b ("LSM: Signal to SafeSetID when setting group IDs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 659c0ce1cb9efc7f58d380ca4bb2a51ae9e30553 Mon Sep 17 00:00:00 2001
From: Ondrej Mosnacek <omosnace(a)redhat.com>
Date: Fri, 17 Feb 2023 17:21:54 +0100
Subject: [PATCH] kernel/sys.c: fix and improve control flow in
__sys_setres[ug]id()
Linux Security Modules (LSMs) that implement the "capable" hook will
usually emit an access denial message to the audit log whenever they
"block" the current task from using the given capability based on their
security policy.
The occurrence of a denial is used as an indication that the given task
has attempted an operation that requires the given access permission, so
the callers of functions that perform LSM permission checks must take care
to avoid calling them too early (before it is decided if the permission is
actually needed to perform the requested operation).
The __sys_setres[ug]id() functions violate this convention by first
calling ns_capable_setid() and only then checking if the operation
requires the capability or not. It means that any caller that has the
capability granted by DAC (task's capability set) but not by MAC (LSMs)
will generate a "denied" audit record, even if is doing an operation for
which the capability is not required.
Fix this by reordering the checks such that ns_capable_setid() is checked
last and -EPERM is returned immediately if it returns false.
While there, also do two small optimizations:
* move the capability check before prepare_creds() and
* bail out early in case of a no-op.
Link: https://lkml.kernel.org/r/20230217162154.837549-1-omosnace@redhat.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace(a)redhat.com>
Cc: Eric W. Biederman <ebiederm(a)xmission.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/kernel/sys.c b/kernel/sys.c
index 495cd87d9bf4..351de7916302 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -664,6 +664,7 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
struct cred *new;
int retval;
kuid_t kruid, keuid, ksuid;
+ bool ruid_new, euid_new, suid_new;
kruid = make_kuid(ns, ruid);
keuid = make_kuid(ns, euid);
@@ -678,25 +679,29 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
if ((suid != (uid_t) -1) && !uid_valid(ksuid))
return -EINVAL;
+ old = current_cred();
+
+ /* check for no-op */
+ if ((ruid == (uid_t) -1 || uid_eq(kruid, old->uid)) &&
+ (euid == (uid_t) -1 || (uid_eq(keuid, old->euid) &&
+ uid_eq(keuid, old->fsuid))) &&
+ (suid == (uid_t) -1 || uid_eq(ksuid, old->suid)))
+ return 0;
+
+ ruid_new = ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
+ !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid);
+ euid_new = euid != (uid_t) -1 && !uid_eq(keuid, old->uid) &&
+ !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid);
+ suid_new = suid != (uid_t) -1 && !uid_eq(ksuid, old->uid) &&
+ !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid);
+ if ((ruid_new || euid_new || suid_new) &&
+ !ns_capable_setid(old->user_ns, CAP_SETUID))
+ return -EPERM;
+
new = prepare_creds();
if (!new)
return -ENOMEM;
- old = current_cred();
-
- retval = -EPERM;
- if (!ns_capable_setid(old->user_ns, CAP_SETUID)) {
- if (ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
- !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid))
- goto error;
- if (euid != (uid_t) -1 && !uid_eq(keuid, old->uid) &&
- !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid))
- goto error;
- if (suid != (uid_t) -1 && !uid_eq(ksuid, old->uid) &&
- !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid))
- goto error;
- }
-
if (ruid != (uid_t) -1) {
new->uid = kruid;
if (!uid_eq(kruid, old->uid)) {
@@ -761,6 +766,7 @@ long __sys_setresgid(gid_t rgid, gid_t egid, gid_t sgid)
struct cred *new;
int retval;
kgid_t krgid, kegid, ksgid;
+ bool rgid_new, egid_new, sgid_new;
krgid = make_kgid(ns, rgid);
kegid = make_kgid(ns, egid);
@@ -773,23 +779,28 @@ long __sys_setresgid(gid_t rgid, gid_t egid, gid_t sgid)
if ((sgid != (gid_t) -1) && !gid_valid(ksgid))
return -EINVAL;
+ old = current_cred();
+
+ /* check for no-op */
+ if ((rgid == (gid_t) -1 || gid_eq(krgid, old->gid)) &&
+ (egid == (gid_t) -1 || (gid_eq(kegid, old->egid) &&
+ gid_eq(kegid, old->fsgid))) &&
+ (sgid == (gid_t) -1 || gid_eq(ksgid, old->sgid)))
+ return 0;
+
+ rgid_new = rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) &&
+ !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid);
+ egid_new = egid != (gid_t) -1 && !gid_eq(kegid, old->gid) &&
+ !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid);
+ sgid_new = sgid != (gid_t) -1 && !gid_eq(ksgid, old->gid) &&
+ !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid);
+ if ((rgid_new || egid_new || sgid_new) &&
+ !ns_capable_setid(old->user_ns, CAP_SETGID))
+ return -EPERM;
+
new = prepare_creds();
if (!new)
return -ENOMEM;
- old = current_cred();
-
- retval = -EPERM;
- if (!ns_capable_setid(old->user_ns, CAP_SETGID)) {
- if (rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) &&
- !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid))
- goto error;
- if (egid != (gid_t) -1 && !gid_eq(kegid, old->gid) &&
- !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid))
- goto error;
- if (sgid != (gid_t) -1 && !gid_eq(ksgid, old->gid) &&
- !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid))
- goto error;
- }
if (rgid != (gid_t) -1)
new->gid = krgid;
From: Filipe Manana <fdmanana(a)suse.com>
commit d47704bd1c78c85831561bcf701b90dd66f811b2 upstream.
At find_delalloc_subrange(), when we need to get the next extent map, we
do a full search on the extent map tree (a red black tree). This is fine
but it's a lot more efficient to simply use rb_next(), which typically
requires iterating over less nodes of the tree and never needs to compare
the ranges of nodes with the one we are looking for.
So add a public helper to extent_map.{h,c} to get the extent map that
immediately follows another extent map, using rb_next(), and use that
helper at find_delalloc_subrange().
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
Please add this patch to the next 6.1 stable release.
It happens to fix a bug recently reported at:
https://bugzilla.redhat.com/show_bug.cgi?id=2187312
Thanks.
fs/btrfs/extent_map.c | 31 +++++++++++++++++++++++++++++-
fs/btrfs/extent_map.h | 2 ++
fs/btrfs/file.c | 44 ++++++++++++++++++++++++++-----------------
3 files changed, 59 insertions(+), 18 deletions(-)
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index b8ae02aa632e..4abbe4b35253 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -523,7 +523,7 @@ void replace_extent_mapping(struct extent_map_tree *tree,
setup_extent_mapping(tree, new, modified);
}
-static struct extent_map *next_extent_map(struct extent_map *em)
+static struct extent_map *next_extent_map(const struct extent_map *em)
{
struct rb_node *next;
@@ -533,6 +533,35 @@ static struct extent_map *next_extent_map(struct extent_map *em)
return container_of(next, struct extent_map, rb_node);
}
+/*
+ * Get the extent map that immediately follows another one.
+ *
+ * @tree: The extent map tree that the extent map belong to.
+ * Holding read or write access on the tree's lock is required.
+ * @em: An extent map from the given tree. The caller must ensure that
+ * between getting @em and between calling this function, the
+ * extent map @em is not removed from the tree - for example, by
+ * holding the tree's lock for the duration of those 2 operations.
+ *
+ * Returns the extent map that immediately follows @em, or NULL if @em is the
+ * last extent map in the tree.
+ */
+struct extent_map *btrfs_next_extent_map(const struct extent_map_tree *tree,
+ const struct extent_map *em)
+{
+ struct extent_map *next;
+
+ /* The lock must be acquired either in read mode or write mode. */
+ lockdep_assert_held(&tree->lock);
+ ASSERT(extent_map_in_tree(em));
+
+ next = next_extent_map(em);
+ if (next)
+ refcount_inc(&next->refs);
+
+ return next;
+}
+
static struct extent_map *prev_extent_map(struct extent_map *em)
{
struct rb_node *prev;
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index ad311864272a..68d3f2c9ea1d 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -87,6 +87,8 @@ static inline u64 extent_map_block_end(struct extent_map *em)
void extent_map_tree_init(struct extent_map_tree *tree);
struct extent_map *lookup_extent_mapping(struct extent_map_tree *tree,
u64 start, u64 len);
+struct extent_map *btrfs_next_extent_map(const struct extent_map_tree *tree,
+ const struct extent_map *em);
int add_extent_mapping(struct extent_map_tree *tree,
struct extent_map *em, int modified);
void remove_extent_mapping(struct extent_map_tree *tree, struct extent_map *em);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 1bda59c68360..77202addead8 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -3248,40 +3248,50 @@ static bool find_delalloc_subrange(struct btrfs_inode *inode, u64 start, u64 end
*/
read_lock(&em_tree->lock);
em = lookup_extent_mapping(em_tree, start, len);
- read_unlock(&em_tree->lock);
+ if (!em) {
+ read_unlock(&em_tree->lock);
+ return (delalloc_len > 0);
+ }
/* extent_map_end() returns a non-inclusive end offset. */
- em_end = em ? extent_map_end(em) : 0;
+ em_end = extent_map_end(em);
/*
* If we have a hole/prealloc extent map, check the next one if this one
* ends before our range's end.
*/
- if (em && (em->block_start == EXTENT_MAP_HOLE ||
- test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) && em_end < end) {
+ if ((em->block_start == EXTENT_MAP_HOLE ||
+ test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) && em_end < end) {
struct extent_map *next_em;
- read_lock(&em_tree->lock);
- next_em = lookup_extent_mapping(em_tree, em_end, len - em_end);
- read_unlock(&em_tree->lock);
-
+ next_em = btrfs_next_extent_map(em_tree, em);
free_extent_map(em);
- em_end = next_em ? extent_map_end(next_em) : 0;
+
+ /*
+ * There's no next extent map or the next one starts beyond our
+ * range, return the range found in the io tree (if any).
+ */
+ if (!next_em || next_em->start > end) {
+ read_unlock(&em_tree->lock);
+ free_extent_map(next_em);
+ return (delalloc_len > 0);
+ }
+
+ em_end = extent_map_end(next_em);
em = next_em;
}
- if (em && (em->block_start == EXTENT_MAP_HOLE ||
- test_bit(EXTENT_FLAG_PREALLOC, &em->flags))) {
- free_extent_map(em);
- em = NULL;
- }
+ read_unlock(&em_tree->lock);
/*
- * No extent map or one for a hole or prealloc extent. Use the delalloc
- * range we found in the io tree if we have one.
+ * We have a hole or prealloc extent that ends at or beyond our range's
+ * end, return the range found in the io tree (if any).
*/
- if (!em)
+ if (em->block_start == EXTENT_MAP_HOLE ||
+ test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
+ free_extent_map(em);
return (delalloc_len > 0);
+ }
/*
* We don't have any range as EXTENT_DELALLOC in the io tree, so the
--
2.34.1
From: Brian Foster <bfoster(a)redhat.com>
commit 7cd3099f4925d7c15887d1940ebd65acd66100f5 upstream.
Per-inode ioend completion batching has a log reservation deadlock
vector between preallocated append transactions and transactions
that are acquired at completion time for other purposes (i.e.,
unwritten extent conversion or COW fork remaps). For example, if the
ioend completion workqueue task executes on a batch of ioends that
are sorted such that an append ioend sits at the tail, it's possible
for the outstanding append transaction reservation to block
allocation of transactions required to process preceding ioends in
the list.
Append ioend completion is historically the common path for on-disk
inode size updates. While file extending writes may have completed
sometime earlier, the on-disk inode size is only updated after
successful writeback completion. These transactions are preallocated
serially from writeback context to mitigate concurrency and
associated log reservation pressure across completions processed by
multi-threaded workqueue tasks.
However, now that delalloc blocks unconditionally map to unwritten
extents at physical block allocation time, size updates via append
ioends are relatively rare. This means that inode size updates most
commonly occur as part of the preexisting completion time
transaction to convert unwritten extents. As a result, there is no
longer a strong need to preallocate size update transactions.
Remove the preallocation of inode size update transactions to avoid
the ioend completion processing log reservation deadlock. Instead,
continue to send all potential size extending ioends to workqueue
context for completion and allocate the transaction from that
context. This ensures that no outstanding log reservation is owned
by the ioend completion worker task when it begins to process
ioends.
Signed-off-by: Brian Foster <bfoster(a)redhat.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Darrick J. Wong <djwong(a)kernel.org>
Reported-by: Christian Theune <ct(a)flyingcircus.io>
Link: https://lore.kernel.org/linux-xfs/CAOQ4uxjj2UqA0h4Y31NbmpHksMhVrXfXjLG4Tnz3…
Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
Acked-by: Darrick J. Wong <djwong(a)kernel.org>
---
Greg,
One more fix from v5.13 that I missed from my backports.
Thanks,
Amir.
fs/xfs/xfs_aops.c | 45 +++------------------------------------------
1 file changed, 3 insertions(+), 42 deletions(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 953de843d9c3..e341d6531e68 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -39,33 +39,6 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
XFS_I(ioend->io_inode)->i_d.di_size;
}
-STATIC int
-xfs_setfilesize_trans_alloc(
- struct iomap_ioend *ioend)
-{
- struct xfs_mount *mp = XFS_I(ioend->io_inode)->i_mount;
- struct xfs_trans *tp;
- int error;
-
- error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
- if (error)
- return error;
-
- ioend->io_private = tp;
-
- /*
- * We may pass freeze protection with a transaction. So tell lockdep
- * we released it.
- */
- __sb_writers_release(ioend->io_inode->i_sb, SB_FREEZE_FS);
- /*
- * We hand off the transaction to the completion thread now, so
- * clear the flag here.
- */
- xfs_trans_clear_context(tp);
- return 0;
-}
-
/*
* Update on-disk file size now that data has been written to disk.
*/
@@ -191,12 +164,10 @@ xfs_end_ioend(
error = xfs_reflink_end_cow(ip, offset, size);
else if (ioend->io_type == IOMAP_UNWRITTEN)
error = xfs_iomap_write_unwritten(ip, offset, size, false);
- else
- ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_private);
+ if (!error && xfs_ioend_is_append(ioend))
+ error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
done:
- if (ioend->io_private)
- error = xfs_setfilesize_ioend(ioend, error);
iomap_finish_ioends(ioend, error);
memalloc_nofs_restore(nofs_flag);
}
@@ -246,7 +217,7 @@ xfs_end_io(
static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
{
- return ioend->io_private ||
+ return xfs_ioend_is_append(ioend) ||
ioend->io_type == IOMAP_UNWRITTEN ||
(ioend->io_flags & IOMAP_F_SHARED);
}
@@ -259,8 +230,6 @@ xfs_end_bio(
struct xfs_inode *ip = XFS_I(ioend->io_inode);
unsigned long flags;
- ASSERT(xfs_ioend_needs_workqueue(ioend));
-
spin_lock_irqsave(&ip->i_ioend_lock, flags);
if (list_empty(&ip->i_ioend_list))
WARN_ON_ONCE(!queue_work(ip->i_mount->m_unwritten_workqueue,
@@ -510,14 +479,6 @@ xfs_prepare_ioend(
ioend->io_offset, ioend->io_size);
}
- /* Reserve log space if we might write beyond the on-disk inode size. */
- if (!status &&
- ((ioend->io_flags & IOMAP_F_SHARED) ||
- ioend->io_type != IOMAP_UNWRITTEN) &&
- xfs_ioend_is_append(ioend) &&
- !ioend->io_private)
- status = xfs_setfilesize_trans_alloc(ioend);
-
memalloc_nofs_restore(nofs_flag);
if (xfs_ioend_needs_workqueue(ioend))
--
2.34.1
commit 542a56e8eb4467ae654eefab31ff194569db39cd upstream.
The VCN firmware loading path enables the indirect SRAM mode if it's
advertised as supported. We might have some cases of FW issues that
prevents this mode to working properly though, ending-up in a failed
probe. An example below, observed in the Steam Deck:
[...]
[drm] failed to load ucode VCN0_RAM(0x3A)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0000)
amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
[drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <vcn_v3_0> failed -110
amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init
[...]
Disabling the VCN block circumvents this, but it's a very invasive
workaround that turns off the entire feature. So, let's add a quirk
on VCN loading that checks for known problematic BIOSes on Vangogh,
so we can proactively disable the indirect SRAM mode and allow the
HW proper probe and VCN IP block to work fine.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385
Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
Fixes: 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
Cc: stable(a)vger.kernel.org
Cc: James Zhu <James.Zhu(a)amd.com>
Cc: Leo Liu <leo.liu(a)amd.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
---
Hi folks, this was build/boot tested on Deck. I've also adjusted the
context, function was reworked on 6.2.
But what a surprise was for me not see this fix already in 6.1.y, since
I've CCed stable, and the reason for that is really peculiar:
$ git log -1 --pretty="%an <%ae>: %s" 82132ecc5432
Leo Liu <leo.liu(a)amd.com>: drm/amdgpu: enable Vangogh VCN indirect sram mode
$ git describe --contains 82132ecc5432
v6.2-rc1~124^2~1^2~13
$ git log -1 --pretty="%an <%ae>: %s" 9a8cc8cabc1e
Leo Liu <leo.liu(a)amd.com>: drm/amdgpu: enable Vangogh VCN indirect sram mode
$ git describe --contains 9a8cc8cabc1e
v6.1-rc8~16^2^2
This is quite strange for me, we have 2 commit hashes pointing to the *same*
commit, and each one is present..in a different release !!?!
Since I've marked this patch as fixing 82132ecc5432 originally, 6.1.y stable
misses it, since it only contains 9a8cc8cabc1e (which is the same patch!).
Alex, do you have an idea why sometimes commits from the AMD tree appear
duplicate in mainline? Specially in different releases, this could cause
some confusion I guess.
Thanks in advance,
Guilherme
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index ce64ca1c6e66..5c1193dd7d88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -26,6 +26,7 @@
#include <linux/firmware.h>
#include <linux/module.h>
+#include <linux/dmi.h>
#include <linux/pci.h>
#include <linux/debugfs.h>
#include <drm/drm_drv.h>
@@ -84,6 +85,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
{
unsigned long bo_size;
const char *fw_name;
+ const char *bios_ver;
const struct common_firmware_header *hdr;
unsigned char fw_check;
unsigned int fw_shared_size, log_offset;
@@ -159,6 +161,21 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
(adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
adev->vcn.indirect_sram = true;
+ /*
+ * Some Steam Deck's BIOS versions are incompatible with the
+ * indirect SRAM mode, leading to amdgpu being unable to get
+ * properly probed (and even potentially crashing the kernel).
+ * Hence, check for these versions here - notice this is
+ * restricted to Vangogh (Deck's APU).
+ */
+ bios_ver = dmi_get_system_info(DMI_BIOS_VERSION);
+
+ if (bios_ver && (!strncmp("F7A0113", bios_ver, 7) ||
+ !strncmp("F7A0114", bios_ver, 7))) {
+ adev->vcn.indirect_sram = false;
+ dev_info(adev->dev,
+ "Steam Deck quirk: indirect SRAM disabled on BIOS %s\n", bios_ver);
+ }
break;
case IP_VERSION(3, 0, 16):
fw_name = FIRMWARE_DIMGREY_CAVEFISH;
--
2.40.0
From: Mel Gorman <mgorman(a)techsingularity.net>
commit 1c0908d8e441631f5b8ba433523cf39339ee2ba0 upstream.
Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench
on XFS on arm64.
kernel BUG at fs/inode.c:625!
Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
CPU: 11 PID: 6611 Comm: dbench Tainted: G E 6.0.0-rt14-rt+ #1
pc : clear_inode+0xa0/0xc0
lr : clear_inode+0x38/0xc0
Call trace:
clear_inode+0xa0/0xc0
evict+0x160/0x180
iput+0x154/0x240
do_unlinkat+0x184/0x300
__arm64_sys_unlinkat+0x48/0xc0
el0_svc_common.constprop.4+0xe4/0x2c0
do_el0_svc+0xac/0x100
el0_svc+0x78/0x200
el0t_64_sync_handler+0x9c/0xc0
el0t_64_sync+0x19c/0x1a0
It also affects 6.1-rc7-rt5 and affects a preempt-rt fork of 5.14 so this
is likely a bug that existed forever and only became visible when ARM
support was added to preempt-rt. The same problem does not occur on x86-64
and he also reported that converting sb->s_inode_wblist_lock to
raw_spinlock_t makes the problem disappear indicating that the RT spinlock
variant is the problem.
Which in turn means that RT mutexes on ARM64 and any other weakly ordered
architecture are affected by this independent of RT.
Will Deacon observed:
"I'd be more inclined to be suspicious of the slowpath tbh, as we need to
make sure that we have acquire semantics on all paths where the lock can
be taken. Looking at the rtmutex code, this really isn't obvious to me
-- for example, try_to_take_rt_mutex() appears to be able to return via
the 'takeit' label without acquire semantics and it looks like we might
be relying on the caller's subsequent _unlock_ of the wait_lock for
ordering, but that will give us release semantics which aren't correct."
Sebastian Andrzej Siewior prototyped a fix that does work based on that
comment but it was a little bit overkill and added some fences that should
not be necessary.
The lock owner is updated with an IRQ-safe raw spinlock held, but the
spin_unlock does not provide acquire semantics which are needed when
acquiring a mutex.
Adds the necessary acquire semantics for lock owner updates in the slow path
acquisition and the waiter bit logic.
It successfully completed 10 iterations of the dbench workload while the
vanilla kernel fails on the first iteration.
[ bigeasy(a)linutronix.de: Initial prototype fix ]
Fixes: 700318d1d7b38 ("locking/rtmutex: Use acquire/release semantics")
Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Reported-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.n…
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
---
Could this be please backported to 5.15 and earlier? It is already part
of the 6.X kernels. I asked about this by the end of January and I'm
kindly asking again ;)
This patch applies against v5.15. Should it not apply to earlier
versions, please let me know an I kindly provide a backport.
I received reports that this fixes "mysterious" crashes and that is how
I noticed that it is not part of the earlier kernels.
kernel/locking/rtmutex.c | 55 ++++++++++++++++++++++++++++++------
kernel/locking/rtmutex_api.c | 6 ++--
2 files changed, 49 insertions(+), 12 deletions(-)
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index ea5a701ab2408..c9b21fd30bed5 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -87,15 +87,31 @@ static inline int __ww_mutex_check_kill(struct rt_mutex *lock,
* set this bit before looking at the lock.
*/
-static __always_inline void
-rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
+static __always_inline struct task_struct *
+rt_mutex_owner_encode(struct rt_mutex_base *lock, struct task_struct *owner)
{
unsigned long val = (unsigned long)owner;
if (rt_mutex_has_waiters(lock))
val |= RT_MUTEX_HAS_WAITERS;
- WRITE_ONCE(lock->owner, (struct task_struct *)val);
+ return (struct task_struct *)val;
+}
+
+static __always_inline void
+rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
+{
+ /*
+ * lock->wait_lock is held but explicit acquire semantics are needed
+ * for a new lock owner so WRITE_ONCE is insufficient.
+ */
+ xchg_acquire(&lock->owner, rt_mutex_owner_encode(lock, owner));
+}
+
+static __always_inline void rt_mutex_clear_owner(struct rt_mutex_base *lock)
+{
+ /* lock->wait_lock is held so the unlock provides release semantics. */
+ WRITE_ONCE(lock->owner, rt_mutex_owner_encode(lock, NULL));
}
static __always_inline void clear_rt_mutex_waiters(struct rt_mutex_base *lock)
@@ -104,7 +120,8 @@ static __always_inline void clear_rt_mutex_waiters(struct rt_mutex_base *lock)
((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS);
}
-static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex_base *lock)
+static __always_inline void
+fixup_rt_mutex_waiters(struct rt_mutex_base *lock, bool acquire_lock)
{
unsigned long owner, *p = (unsigned long *) &lock->owner;
@@ -170,8 +187,21 @@ static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex_base *lock)
* still set.
*/
owner = READ_ONCE(*p);
- if (owner & RT_MUTEX_HAS_WAITERS)
- WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
+ if (owner & RT_MUTEX_HAS_WAITERS) {
+ /*
+ * See rt_mutex_set_owner() and rt_mutex_clear_owner() on
+ * why xchg_acquire() is used for updating owner for
+ * locking and WRITE_ONCE() for unlocking.
+ *
+ * WRITE_ONCE() would work for the acquire case too, but
+ * in case that the lock acquisition failed it might
+ * force other lockers into the slow path unnecessarily.
+ */
+ if (acquire_lock)
+ xchg_acquire(p, owner & ~RT_MUTEX_HAS_WAITERS);
+ else
+ WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
+ }
}
/*
@@ -206,6 +236,13 @@ static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
owner = *p;
} while (cmpxchg_relaxed(p, owner,
owner | RT_MUTEX_HAS_WAITERS) != owner);
+
+ /*
+ * The cmpxchg loop above is relaxed to avoid back-to-back ACQUIRE
+ * operations in the event of contention. Ensure the successful
+ * cmpxchg is visible.
+ */
+ smp_mb__after_atomic();
}
/*
@@ -1231,7 +1268,7 @@ static int __sched __rt_mutex_slowtrylock(struct rt_mutex_base *lock)
* try_to_take_rt_mutex() sets the lock waiters bit
* unconditionally. Clean this up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
return ret;
}
@@ -1591,7 +1628,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
* try_to_take_rt_mutex() sets the waiter bit
* unconditionally. We might have to fix that up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
return ret;
}
@@ -1701,7 +1738,7 @@ static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
* try_to_take_rt_mutex() sets the waiter bit unconditionally.
* We might have to fix that up:
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
debug_rt_mutex_free_waiter(&waiter);
}
diff --git a/kernel/locking/rtmutex_api.c b/kernel/locking/rtmutex_api.c
index 5c9299aaabae1..a461be2f873db 100644
--- a/kernel/locking/rtmutex_api.c
+++ b/kernel/locking/rtmutex_api.c
@@ -245,7 +245,7 @@ void __sched rt_mutex_init_proxy_locked(struct rt_mutex_base *lock,
void __sched rt_mutex_proxy_unlock(struct rt_mutex_base *lock)
{
debug_rt_mutex_proxy_unlock(lock);
- rt_mutex_set_owner(lock, NULL);
+ rt_mutex_clear_owner(lock);
}
/**
@@ -360,7 +360,7 @@ int __sched rt_mutex_wait_proxy_lock(struct rt_mutex_base *lock,
* try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
* have to fix that up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
raw_spin_unlock_irq(&lock->wait_lock);
return ret;
@@ -416,7 +416,7 @@ bool __sched rt_mutex_cleanup_proxy_lock(struct rt_mutex_base *lock,
* try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
* have to fix that up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, false);
raw_spin_unlock_irq(&lock->wait_lock);
--
2.39.1
In upstream commit 77e52ae35463 ("futex: Move to kernel/futex/") the
futex code from kernel/futex.c was moved into kernel/futex/core.c in
preparation of the split-up of the implementation in various files.
Point kernel-doc references to the new files as otherwise the
documentation shows errors on build:
[...]
Error: Cannot open file ./kernel/futex.c
Error: Cannot open file ./kernel/futex.c
[...]
WARNING: kernel-doc './scripts/kernel-doc -rst -enable-lineno -sphinx-version 3.4.3 -internal ./kernel/futex.c' failed with return code 2
There is no direct upstream commit for this change. It is made in
analogy to commit bc67f1c454fb ("docs: futex: Fix kernel-doc
references") applied as consequence of the restructuring of the futex
code.
Fixes: 77e52ae35463 ("futex: Move to kernel/futex/")
Signed-off-by: Salvatore Bonaccorso <carnil(a)debian.org>
---
v1->v2:
- Fix typo in description about new target file for futex.c code
- Indent block with build log output
Documentation/kernel-hacking/locking.rst | 2 +-
Documentation/translations/it_IT/kernel-hacking/locking.rst | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/kernel-hacking/locking.rst b/Documentation/kernel-hacking/locking.rst
index 6ed806e6061b..a6d89efede79 100644
--- a/Documentation/kernel-hacking/locking.rst
+++ b/Documentation/kernel-hacking/locking.rst
@@ -1358,7 +1358,7 @@ Mutex API reference
Futex API reference
===================
-.. kernel-doc:: kernel/futex.c
+.. kernel-doc:: kernel/futex/core.c
:internal:
Further reading
diff --git a/Documentation/translations/it_IT/kernel-hacking/locking.rst b/Documentation/translations/it_IT/kernel-hacking/locking.rst
index bf1acd6204ef..192ab8e28125 100644
--- a/Documentation/translations/it_IT/kernel-hacking/locking.rst
+++ b/Documentation/translations/it_IT/kernel-hacking/locking.rst
@@ -1400,7 +1400,7 @@ Riferimento per l'API dei Mutex
Riferimento per l'API dei Futex
===============================
-.. kernel-doc:: kernel/futex.c
+.. kernel-doc:: kernel/futex/core.c
:internal:
Approfondimenti
--
2.40.0
From: Huanhuan Wang <huanhuan.wang(a)corigine.com>
There are two pointers in struct xfrm_dev_offload, *dev, *real_dev.
The *dev points whether bonding interface or real interface, if
bonding IPsec offload is used, it points bonding interface; if not,
it points real interface. And *real_dev always points real interface.
So nfp should always use real_dev instead of dev.
Prior to this change the system becomes unresponsive when offloading
IPsec for a device which is a lower device to a bonding device.
Fixes: 859a497fe80c ("nfp: implement xfrm callbacks and expose ipsec offload feature to upper layer")
CC: stable(a)vger.kernel.org
Signed-off-by: Huanhuan Wang <huanhuan.wang(a)corigine.com>
Acked-by: Simon Horman <simon.horman(a)corigine.com>
Signed-off-by: Louis Peens <louis.peens(a)corigine.com>
---
drivers/net/ethernet/netronome/nfp/crypto/ipsec.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c b/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c
index c0dcce8ae437..b1f026b81dea 100644
--- a/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c
+++ b/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c
@@ -269,7 +269,7 @@ static void set_sha2_512hmac(struct nfp_ipsec_cfg_add_sa *cfg, int *trunc_len)
static int nfp_net_xfrm_add_state(struct xfrm_state *x,
struct netlink_ext_ack *extack)
{
- struct net_device *netdev = x->xso.dev;
+ struct net_device *netdev = x->xso.real_dev;
struct nfp_ipsec_cfg_mssg msg = {};
int i, key_len, trunc_len, err = 0;
struct nfp_ipsec_cfg_add_sa *cfg;
@@ -513,7 +513,7 @@ static void nfp_net_xfrm_del_state(struct xfrm_state *x)
.cmd = NFP_IPSEC_CFG_MSSG_INV_SA,
.sa_idx = x->xso.offload_handle - 1,
};
- struct net_device *netdev = x->xso.dev;
+ struct net_device *netdev = x->xso.real_dev;
struct nfp_net *nn;
int err;
--
2.34.1
In upstream commit 77e52ae35463 ("futex: Move to kernel/futex/") the
futex code from kernel/futex.c was moved into kernel/futex/core in
preparation of the split-up of the implementation in various files.
Point kernel-doc references to the new files as otherwise the
documentation shows errors on build:
[...]
Error: Cannot open file ./kernel/futex.c
Error: Cannot open file ./kernel/futex.c
[...]
WARNING: kernel-doc './scripts/kernel-doc -rst -enable-lineno -sphinx-version 3.4.3 -internal ./kernel/futex.c' failed with return code 2
There is no direct upstream commit for this change. It is made in
analogy to commit bc67f1c454fb ("docs: futex: Fix kernel-doc
references") applied as consequence of the restructuring of the futex
code.
Fixes: 77e52ae35463 ("futex: Move to kernel/futex/")
Signed-off-by: Salvatore Bonaccorso <carnil(a)debian.org>
---
Documentation/kernel-hacking/locking.rst | 2 +-
Documentation/translations/it_IT/kernel-hacking/locking.rst | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/kernel-hacking/locking.rst b/Documentation/kernel-hacking/locking.rst
index 6ed806e6061b..a6d89efede79 100644
--- a/Documentation/kernel-hacking/locking.rst
+++ b/Documentation/kernel-hacking/locking.rst
@@ -1358,7 +1358,7 @@ Mutex API reference
Futex API reference
===================
-.. kernel-doc:: kernel/futex.c
+.. kernel-doc:: kernel/futex/core.c
:internal:
Further reading
diff --git a/Documentation/translations/it_IT/kernel-hacking/locking.rst b/Documentation/translations/it_IT/kernel-hacking/locking.rst
index bf1acd6204ef..192ab8e28125 100644
--- a/Documentation/translations/it_IT/kernel-hacking/locking.rst
+++ b/Documentation/translations/it_IT/kernel-hacking/locking.rst
@@ -1400,7 +1400,7 @@ Riferimento per l'API dei Mutex
Riferimento per l'API dei Futex
===============================
-.. kernel-doc:: kernel/futex.c
+.. kernel-doc:: kernel/futex/core.c
:internal:
Approfondimenti
--
2.40.0
From: Pingfan Liu <kernelfans(a)gmail.com>
Purgatory.ro is a standalone binary that is not linked against the rest of
the kernel. Its image is copied into an array that is linked to the
kernel, and from there kexec relocates it wherever it desires.
Unlike the debug info for vmlinux, which can be used for analyzing crash
such info is useless in purgatory.ro. And discarding them can save about
200K space.
Original:
259080 kexec-purgatory.o
Stripped debug info:
29152 kexec-purgatory.o
Signed-off-by: Pingfan Liu <kernelfans(a)gmail.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers(a)google.com>
Reviewed-by: Steve Wahl <steve.wahl(a)hpe.com>
Acked-by: Dave Young <dyoung(a)redhat.com>
Link: https://lore.kernel.org/r/1596433788-3784-1-git-send-email-kernelfans@gmail…
(cherry picked from commit 52416ffcf823ee11aa19792715664ab94757f111)
[Alyssa: fixed for LLVM_IAS=1 by adding -g to AFLAGS_REMOVE_*]
Signed-off-by: Alyssa Ross <hi(a)alyssa.is>
---
arch/x86/purgatory/Makefile | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 9733d1cc791d..969d2b2eb7d7 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -27,7 +27,7 @@ KCOV_INSTRUMENT := n
# make up the standalone purgatory.ro
PURGATORY_CFLAGS_REMOVE := -mcmodel=kernel
-PURGATORY_CFLAGS := -mcmodel=large -ffreestanding -fno-zero-initialized-in-bss
+PURGATORY_CFLAGS := -mcmodel=large -ffreestanding -fno-zero-initialized-in-bss -g0
PURGATORY_CFLAGS += $(DISABLE_STACKLEAK_PLUGIN) -DDISABLE_BRANCH_PROFILING
# Default KBUILD_CFLAGS can have -pg option set when FTRACE is enabled. That
@@ -58,6 +58,9 @@ CFLAGS_sha256.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_string.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_string.o += $(PURGATORY_CFLAGS)
+AFLAGS_REMOVE_setup-x86_$(BITS).o += -g -Wa,-gdwarf-2
+AFLAGS_REMOVE_entry64.o += -g -Wa,-gdwarf-2
+
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
--
2.37.1
This is based on upstream commit
d83806c4c0cc ("purgatory: fix disabling debug info"), but adapted to
the linker flags used in 5.10.y.
Since 3a260e9844c9, the linker flags can contain -g instead of
-Wa,-gdwarf-2 (when using the LLVM assembler). As a result, in that
case, debug info was being generated for the purgatory objects, even
though the intention was that it not be.
Fixes: 3a260e9844c9 ("Makefile.debug: re-enable debug info for .S files")
Signed-off-by: Alyssa Ross <hi(a)alyssa.is>
Cc: stable(a)vger.kernel.org
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
---
arch/x86/purgatory/Makefile | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20c..ebaf329a2368 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -64,8 +64,7 @@ CFLAGS_sha256.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_string.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_string.o += $(PURGATORY_CFLAGS)
-AFLAGS_REMOVE_setup-x86_$(BITS).o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_entry64.o += -Wa,-gdwarf-2
+asflags-remove-y += -g -Wa,-gdwarf-2
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
--
2.37.1
This is based on upstream commit
d83806c4c0cc ("purgatory: fix disabling debug info"), but adapted to
the linker flags used in 5.15.y.
Since 0ee2f0567a56, the linker flags can contain -g instead of
-Wa,-gdwarf-2 (when using the LLVM assembler). As a result, in that
case, debug info was being generated for the purgatory objects, even
though the intention was that it not be.
Fixes: 0ee2f0567a56 ("Makefile.debug: re-enable debug info for .S files")
Signed-off-by: Alyssa Ross <hi(a)alyssa.is>
Cc: stable(a)vger.kernel.org
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
---
arch/x86/purgatory/Makefile | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20c..ebaf329a2368 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -64,8 +64,7 @@ CFLAGS_sha256.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_string.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_string.o += $(PURGATORY_CFLAGS)
-AFLAGS_REMOVE_setup-x86_$(BITS).o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_entry64.o += -Wa,-gdwarf-2
+asflags-remove-y += -g -Wa,-gdwarf-2
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
--
2.37.1
Since 32ef9e5054ec, -Wa,-gdwarf-2 is no longer used in KBUILD_AFLAGS.
Instead, it includes -g, the appropriate -gdwarf-* flag, and also the
-Wa versions of both of those if building with Clang and GNU as. As a
result, debug info was being generated for the purgatory objects, even
though the intention was that it not be.
Fixes: 32ef9e5054ec ("Makefile.debug: re-enable debug info for .S files")
Signed-off-by: Alyssa Ross <hi(a)alyssa.is>
Cc: stable(a)vger.kernel.org
Acked-by: Nick Desaulniers <ndesaulniers(a)google.com>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
(cherry picked from commit d83806c4c0cccc0d6d3c3581a11983a9c186a138)
---
arch/riscv/purgatory/Makefile | 4 +---
arch/x86/purgatory/Makefile | 3 +--
2 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
index dd58e1d99397..659e21862077 100644
--- a/arch/riscv/purgatory/Makefile
+++ b/arch/riscv/purgatory/Makefile
@@ -74,9 +74,7 @@ CFLAGS_string.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_ctype.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_ctype.o += $(PURGATORY_CFLAGS)
-AFLAGS_REMOVE_entry.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_memcpy.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_memset.o += -Wa,-gdwarf-2
+asflags-remove-y += $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x))
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 17f09dc26381..82fec66d46d2 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -69,8 +69,7 @@ CFLAGS_sha256.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_string.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_string.o += $(PURGATORY_CFLAGS)
-AFLAGS_REMOVE_setup-x86_$(BITS).o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_entry64.o += -Wa,-gdwarf-2
+asflags-remove-y += $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x))
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
base-commit: cdc7aff9ed012801e62eedd99e4a5573eccac4db
--
2.37.1
The quilt patch titled
Subject: ia64: fix an addr to taddr in huge_pte_offset()
has been removed from the -mm tree. Its filename was
ia64-fix-an-addr-to-taddr-in-huge_pte_offset.patch
This patch was dropped because it was merged into the mm-nonmm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: ia64: fix an addr to taddr in huge_pte_offset()
Date: Sun, 16 Apr 2023 22:17:05 -0700 (PDT)
I know nothing of ia64 htlbpage_to_page(), but guess that the p4d
line should be using taddr rather than addr, like everywhere else.
Link: https://lkml.kernel.org/r/732eae88-3beb-246-2c72-281de786740@google.com
Fixes: c03ab9e32a2c ("ia64: add support for folded p4d page tables")
Signed-off-by: Hugh Dickins <hughd(a)google.com
Acked-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Mike Rapoport (IBM) <rppt(a)kernel.org>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/ia64/mm/hugetlbpage.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/ia64/mm/hugetlbpage.c~ia64-fix-an-addr-to-taddr-in-huge_pte_offset
+++ a/arch/ia64/mm/hugetlbpage.c
@@ -58,7 +58,7 @@ huge_pte_offset (struct mm_struct *mm, u
pgd = pgd_offset(mm, taddr);
if (pgd_present(*pgd)) {
- p4d = p4d_offset(pgd, addr);
+ p4d = p4d_offset(pgd, taddr);
if (p4d_present(*p4d)) {
pud = pud_offset(p4d, taddr);
if (pud_present(*pud)) {
_
Patches currently in -mm which might be from hughd(a)google.com are
The quilt patch titled
Subject: mm/hugetlb: fix uffd-wp during fork()
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-uffd-wp-during-fork.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/hugetlb: fix uffd-wp during fork()
Date: Mon, 17 Apr 2023 15:53:12 -0400
Patch series "mm/hugetlb: More fixes around uffd-wp vs fork() / RO pins",
v2.
This patch (of 6):
There're a bunch of things that were wrong:
- Reading uffd-wp bit from a swap entry should use pte_swp_uffd_wp()
rather than huge_pte_uffd_wp().
- When copying over a pte, we should drop uffd-wp bit when
!EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)).
- When doing early CoW for private hugetlb (e.g. when the parent page was
pinned), uffd-wp bit should be properly carried over if necessary.
No bug reported probably because most people do not even care about these
corner cases, but they are still bugs and can be exposed by the recent unit
tests introduced, so fix all of them in one shot.
Link: https://lkml.kernel.org/r/20230417195317.898696-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20230417195317.898696-2-peterx@redhat.com
Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Mika Penttil�� <mpenttil(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-uffd-wp-during-fork
+++ a/mm/hugetlb.c
@@ -4953,11 +4953,15 @@ static bool is_hugetlb_entry_hwpoisoned(
static void
hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
- struct folio *new_folio)
+ struct folio *new_folio, pte_t old)
{
+ pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
+
__folio_mark_uptodate(new_folio);
hugepage_add_new_anon_rmap(new_folio, vma, addr);
- set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, &new_folio->page, 1));
+ if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
+ newpte = huge_pte_mkuffd_wp(newpte);
+ set_huge_pte_at(vma->vm_mm, addr, ptep, newpte);
hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
folio_set_hugetlb_migratable(new_folio);
}
@@ -5032,14 +5036,12 @@ again:
*/
;
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
- bool uffd_wp = huge_pte_uffd_wp(entry);
-
- if (!userfaultfd_wp(dst_vma) && uffd_wp)
+ if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);
set_huge_pte_at(dst, addr, dst_pte, entry);
} else if (unlikely(is_hugetlb_entry_migration(entry))) {
swp_entry_t swp_entry = pte_to_swp_entry(entry);
- bool uffd_wp = huge_pte_uffd_wp(entry);
+ bool uffd_wp = pte_swp_uffd_wp(entry);
if (!is_readable_migration_entry(swp_entry) && cow) {
/*
@@ -5050,10 +5052,10 @@ again:
swp_offset(swp_entry));
entry = swp_entry_to_pte(swp_entry);
if (userfaultfd_wp(src_vma) && uffd_wp)
- entry = huge_pte_mkuffd_wp(entry);
+ entry = pte_swp_mkuffd_wp(entry);
set_huge_pte_at(src, addr, src_pte, entry);
}
- if (!userfaultfd_wp(dst_vma) && uffd_wp)
+ if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);
set_huge_pte_at(dst, addr, dst_pte, entry);
} else if (unlikely(is_pte_marker(entry))) {
@@ -5118,7 +5120,8 @@ again:
/* huge_ptep of dst_pte won't change as in child */
goto again;
}
- hugetlb_install_folio(dst_vma, dst_pte, addr, new_folio);
+ hugetlb_install_folio(dst_vma, dst_pte, addr,
+ new_folio, src_pte_old);
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
continue;
@@ -5136,6 +5139,9 @@ again:
entry = huge_pte_wrprotect(entry);
}
+ if (!userfaultfd_wp(dst_vma))
+ entry = huge_pte_clear_uffd_wp(entry);
+
set_huge_pte_at(dst, addr, dst_pte, entry);
hugetlb_count_add(npages, dst);
}
_
Patches currently in -mm which might be from peterx(a)redhat.com are
On some Zhaoxin platforms, xHCI will prefetch TRB for performance
improvement. However this TRB prefetch mechanism may cross page boundary,
which may access memory not allocated by xHCI driver. In order to fix
this issue, two pages was allocated for TRB and only the first
page will be used.
Cc: stable(a)vger.kernel.org
Signed-off-by: Weitao Wang <WeitaoWang-oc(a)zhaoxin.com>
---
drivers/usb/host/xhci-mem.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index d0a9467aa5fc..d5517400d874 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -2369,8 +2369,12 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
* and our use of dma addresses in the trb_address_map radix tree needs
* TRB_SEGMENT_SIZE alignment, so we pick the greater alignment need.
*/
- xhci->segment_pool = dma_pool_create("xHCI ring segments", dev,
- TRB_SEGMENT_SIZE, TRB_SEGMENT_SIZE, xhci->page_size);
+ if (xhci->quirks & XHCI_ZHAOXIN_TRB_FETCH)
+ xhci->segment_pool = dma_pool_create("xHCI ring segments", dev,
+ TRB_SEGMENT_SIZE * 2, TRB_SEGMENT_SIZE * 2, xhci->page_size * 2);
+ else
+ xhci->segment_pool = dma_pool_create("xHCI ring segments", dev,
+ TRB_SEGMENT_SIZE, TRB_SEGMENT_SIZE, xhci->page_size);
/* See Table 46 and Note on Figure 55 */
xhci->device_pool = dma_pool_create("xHCI input/output contexts", dev,
--
2.32.0
commit 08d0cc5f34265d1a1e3031f319f594bd1970976c upstream.
This change is desired because without it, it has been observed that
re-applying aspm settings can cause the system to crash with certain pci
devices (ie. Genesys GL9755).
Tested by issuing 100 suspend/resume cycles on a symptomatic system running
5.15.107.
L1 settings looked identical before and after:
```
localhost ~ # lspci -vvv -d 0x17a0: | grep L1Sub
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+
L1SubCtl2: T_PwrOn=3100us
```
Cc: <stable(a)vger.kernel.org> # 5.15.y
OverCurrent condition is not standardized in the UHCI spec.
Zhaoxin UHCI controllers report OverCurrent bit active off.
In order to handle OverCurrent condition correctly, the uhci-hcd
driver needs to be told to expect the active-off behavior.
Suggested-by: Alan Stern <stern(a)rowland.harvard.edu>
Cc: stable(a)vger.kernel.org
Signed-off-by: Weitao Wang <WeitaoWang-oc(a)zhaoxin.com>
---
v1->v2
- Modify the description of this patch.
- Let Zhaoxin and VIA share a common oc_low flag
drivers/usb/host/uhci-pci.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/host/uhci-pci.c b/drivers/usb/host/uhci-pci.c
index 3592f757fe05..034586911bb5 100644
--- a/drivers/usb/host/uhci-pci.c
+++ b/drivers/usb/host/uhci-pci.c
@@ -119,11 +119,12 @@ static int uhci_pci_init(struct usb_hcd *hcd)
uhci->rh_numports = uhci_count_ports(hcd);
- /* Intel controllers report the OverCurrent bit active on.
- * VIA controllers report it active off, so we'll adjust the
- * bit value. (It's not standardized in the UHCI spec.)
+ /* Intel controllers report the OverCurrent bit active on. VIA
+ * and ZHAOXIN controllers report it active off, so we'll adjust
+ * the bit value. (It's not standardized in the UHCI spec.)
*/
- if (to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_VIA)
+ if (to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_VIA ||
+ to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_ZHAOXIN)
uhci->oc_low = 1;
/* HP's server management chip requires a longer port reset delay. */
--
2.32.0
Hi all,
After merging the tip tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:
/tmp/next/build/kernel/time/posix-cpu-timers.c: In function 'posix_cpu_timer_wait_running_nsleep':
/tmp/next/build/kernel/time/posix-cpu-timers.c:1310:30: error: 'timr' is a pointer; did you mean to use '->'?
1310 | spin_unlock_irq(&timr.it_lock);
| ^
| ->
/tmp/next/build/kernel/time/posix-cpu-timers.c:1312:28: error: 'timr' is a pointer; did you mean to use '->'?
1312 | spin_lock_irq(&timr.it_lock);
| ^
| ->
Caused by commit
2aaae4bf41b101f7e ("posix-cpu-timers: Implement the missing timer_wait_running callback")
The !POSIX_CPU_TIMERS_TASK_WORK case wasn't fully updated. I've used
the version of the tip tree from next-20230420 instead.
The following commit has been merged into the timers/core branch of tip:
Commit-ID: f7abf14f0001a5a47539d9f60bbdca649e43536b
Gitweb: https://git.kernel.org/tip/f7abf14f0001a5a47539d9f60bbdca649e43536b
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Mon, 17 Apr 2023 15:37:55 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Fri, 21 Apr 2023 15:34:33 +02:00
posix-cpu-timers: Implement the missing timer_wait_running callback
For some unknown reason the introduction of the timer_wait_running callback
missed to fixup posix CPU timers, which went unnoticed for almost four years.
Marco reported recently that the WARN_ON() in timer_wait_running()
triggers with a posix CPU timer test case.
Posix CPU timers have two execution models for expiring timers depending on
CONFIG_POSIX_CPU_TIMERS_TASK_WORK:
1) If not enabled, the expiry happens in hard interrupt context so
spin waiting on the remote CPU is reasonably time bound.
Implement an empty stub function for that case.
2) If enabled, the expiry happens in task work before returning to user
space or guest mode. The expired timers are marked as firing and moved
from the timer queue to a local list head with sighand lock held. Once
the timers are moved, sighand lock is dropped and the expiry happens in
fully preemptible context. That means the expiring task can be scheduled
out, migrated, interrupted etc. So spin waiting on it is more than
suboptimal.
The timer wheel has a timer_wait_running() mechanism for RT, which uses
a per CPU timer-base expiry lock which is held by the expiry code and the
task waiting for the timer function to complete blocks on that lock.
This does not work in the same way for posix CPU timers as there is no
timer base and expiry for process wide timers can run on any task
belonging to that process, but the concept of waiting on an expiry lock
can be used too in a slightly different way:
- Add a mutex to struct posix_cputimers_work. This struct is per task
and used to schedule the expiry task work from the timer interrupt.
- Add a task_struct pointer to struct cpu_timer which is used to store
a the task which runs the expiry. That's filled in when the task
moves the expired timers to the local expiry list. That's not
affecting the size of the k_itimer union as there are bigger union
members already
- Let the task take the expiry mutex around the expiry function
- Let the waiter acquire a task reference with rcu_read_lock() held and
block on the expiry mutex
This avoids spin-waiting on a task which might not even be on a CPU and
works nicely for RT too.
Fixes: ec8f954a40da ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT")
Reported-by: Marco Elver <elver(a)google.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Marco Elver <elver(a)google.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87zg764ojw.ffs@tglx
---
include/linux/posix-timers.h | 17 ++++---
kernel/time/posix-cpu-timers.c | 81 +++++++++++++++++++++++++++------
kernel/time/posix-timers.c | 4 ++-
3 files changed, 82 insertions(+), 20 deletions(-)
diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 2c6e99c..d607f51 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -4,6 +4,7 @@
#include <linux/spinlock.h>
#include <linux/list.h>
+#include <linux/mutex.h>
#include <linux/alarmtimer.h>
#include <linux/timerqueue.h>
@@ -62,16 +63,18 @@ static inline int clockid_to_fd(const clockid_t clk)
* cpu_timer - Posix CPU timer representation for k_itimer
* @node: timerqueue node to queue in the task/sig
* @head: timerqueue head on which this timer is queued
- * @task: Pointer to target task
+ * @pid: Pointer to target task PID
* @elist: List head for the expiry list
* @firing: Timer is currently firing
+ * @handling: Pointer to the task which handles expiry
*/
struct cpu_timer {
- struct timerqueue_node node;
- struct timerqueue_head *head;
- struct pid *pid;
- struct list_head elist;
- int firing;
+ struct timerqueue_node node;
+ struct timerqueue_head *head;
+ struct pid *pid;
+ struct list_head elist;
+ int firing;
+ struct task_struct __rcu *handling;
};
static inline bool cpu_timer_enqueue(struct timerqueue_head *head,
@@ -135,10 +138,12 @@ struct posix_cputimers {
/**
* posix_cputimers_work - Container for task work based posix CPU timer expiry
* @work: The task work to be scheduled
+ * @mutex: Mutex held around expiry in context of this task work
* @scheduled: @work has been scheduled already, no further processing
*/
struct posix_cputimers_work {
struct callback_head work;
+ struct mutex mutex;
unsigned int scheduled;
};
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 2f5e9b3..e9c6f9d 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -846,6 +846,8 @@ static u64 collect_timerqueue(struct timerqueue_head *head,
return expires;
ctmr->firing = 1;
+ /* See posix_cpu_timer_wait_running() */
+ rcu_assign_pointer(ctmr->handling, current);
cpu_timer_dequeue(ctmr);
list_add_tail(&ctmr->elist, firing);
}
@@ -1161,7 +1163,49 @@ static void handle_posix_cpu_timers(struct task_struct *tsk);
#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
static void posix_cpu_timers_work(struct callback_head *work)
{
+ struct posix_cputimers_work *cw = container_of(work, typeof(*cw), work);
+
+ mutex_lock(&cw->mutex);
handle_posix_cpu_timers(current);
+ mutex_unlock(&cw->mutex);
+}
+
+/*
+ * Invoked from the posix-timer core when a cancel operation failed because
+ * the timer is marked firing. The caller holds rcu_read_lock(), which
+ * protects the timer and the task which is expiring it from being freed.
+ */
+static void posix_cpu_timer_wait_running(struct k_itimer *timr)
+{
+ struct task_struct *tsk = rcu_dereference(timr->it.cpu.handling);
+
+ /* Has the handling task completed expiry already? */
+ if (!tsk)
+ return;
+
+ /* Ensure that the task cannot go away */
+ get_task_struct(tsk);
+ /* Now drop the RCU protection so the mutex can be locked */
+ rcu_read_unlock();
+ /* Wait on the expiry mutex */
+ mutex_lock(&tsk->posix_cputimers_work.mutex);
+ /* Release it immediately again. */
+ mutex_unlock(&tsk->posix_cputimers_work.mutex);
+ /* Drop the task reference. */
+ put_task_struct(tsk);
+ /* Relock RCU so the callsite is balanced */
+ rcu_read_lock();
+}
+
+static void posix_cpu_timer_wait_running_nsleep(struct k_itimer *timr)
+{
+ /* Ensure that timr->it.cpu.handling task cannot go away */
+ rcu_read_lock();
+ spin_unlock_irq(&timr->it_lock);
+ posix_cpu_timer_wait_running(timr);
+ rcu_read_unlock();
+ /* @timr is on stack and is valid */
+ spin_lock_irq(&timr->it_lock);
}
/*
@@ -1177,6 +1221,7 @@ void clear_posix_cputimers_work(struct task_struct *p)
sizeof(p->posix_cputimers_work.work));
init_task_work(&p->posix_cputimers_work.work,
posix_cpu_timers_work);
+ mutex_init(&p->posix_cputimers_work.mutex);
p->posix_cputimers_work.scheduled = false;
}
@@ -1255,6 +1300,18 @@ static inline void __run_posix_cpu_timers(struct task_struct *tsk)
lockdep_posixtimer_exit();
}
+static void posix_cpu_timer_wait_running(struct k_itimer *timr)
+{
+ cpu_relax();
+}
+
+static void posix_cpu_timer_wait_running_nsleep(struct k_itimer *timr)
+{
+ spin_unlock_irq(&timr->it_lock);
+ cpu_relax();
+ spin_lock_irq(&timr->it_lock);
+}
+
static inline bool posix_cpu_timers_work_scheduled(struct task_struct *tsk)
{
return false;
@@ -1363,6 +1420,8 @@ static void handle_posix_cpu_timers(struct task_struct *tsk)
*/
if (likely(cpu_firing >= 0))
cpu_timer_fire(timer);
+ /* See posix_cpu_timer_wait_running() */
+ rcu_assign_pointer(timer->it.cpu.handling, NULL);
spin_unlock(&timer->it_lock);
}
}
@@ -1497,23 +1556,16 @@ static int do_cpu_nanosleep(const clockid_t which_clock, int flags,
expires = cpu_timer_getexpires(&timer.it.cpu);
error = posix_cpu_timer_set(&timer, 0, &zero_it, &it);
if (!error) {
- /*
- * Timer is now unarmed, deletion can not fail.
- */
+ /* Timer is now unarmed, deletion can not fail. */
posix_cpu_timer_del(&timer);
+ } else {
+ while (error == TIMER_RETRY) {
+ posix_cpu_timer_wait_running_nsleep(&timer);
+ error = posix_cpu_timer_del(&timer);
+ }
}
- spin_unlock_irq(&timer.it_lock);
- while (error == TIMER_RETRY) {
- /*
- * We need to handle case when timer was or is in the
- * middle of firing. In other cases we already freed
- * resources.
- */
- spin_lock_irq(&timer.it_lock);
- error = posix_cpu_timer_del(&timer);
- spin_unlock_irq(&timer.it_lock);
- }
+ spin_unlock_irq(&timer.it_lock);
if ((it.it_value.tv_sec | it.it_value.tv_nsec) == 0) {
/*
@@ -1623,6 +1675,7 @@ const struct k_clock clock_posix_cpu = {
.timer_del = posix_cpu_timer_del,
.timer_get = posix_cpu_timer_get,
.timer_rearm = posix_cpu_timer_rearm,
+ .timer_wait_running = posix_cpu_timer_wait_running,
};
const struct k_clock clock_process = {
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 0c8a87a..808a247 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -846,6 +846,10 @@ static struct k_itimer *timer_wait_running(struct k_itimer *timer,
rcu_read_lock();
unlock_timer(timer, *flags);
+ /*
+ * kc->timer_wait_running() might drop RCU lock. So @timer
+ * cannot be touched anymore after the function returns!
+ */
if (!WARN_ON_ONCE(!kc->timer_wait_running))
kc->timer_wait_running(timer);
Thomas,
Here's a small collection of irqchip patches for 6.4. The only
significant thing is the RISC-V IPI rework, which spans both the
irqchip subsystem and the arch code (and is Acked by Palmer).
The rest is a bunch of errata workarounds, fixes and cleanups.
Please pull,
M.
The following changes since commit 197b6b60ae7bc51dd0814953c562833143b292aa:
Linux 6.3-rc4 (2023-03-26 14:40:20 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git tags/irqchip-6.4
for you to fetch changes up to 2ff1b0839ddd514be4752c64c1c6facf91ff3a56:
Merge branch irq/misc-6.4 into irq/irqchip-next (2023-04-21 14:05:31 +0100)
----------------------------------------------------------------
irqchip changes for 6.4
- Large RISC-V IPI rework to make way for a new interrupt
architecture
- More Loongarch fixes from Lianmin Lv, fixing issues in the so
called "dual-bridge" systems.
- Workaround for the nvidia T241 chip that gets confused in
3 and 4 socket configurations, leading to the GIC
malfunctionning in some contexts
- Drop support for non-firmware driven GIC configurarations
now that the old ARM11MP Cavium board is gone
- Workaround for the Rockchip 3588 chip that doesn't
correctly deal with the shareability attributes.
- Replace uses of of_find_property() with the more appropriate
of_property_read_bool()
- Make bcm-6345-l1 request its MMIO region
- Add suspend support to the SiFive PLIC
- Drop support for stih415, stih416 and stid127 platforms
----------------------------------------------------------------
Alain Volmat (1):
irqchip/st: Remove stih415/stih416 and stid127 platforms support
Anup Patel (7):
RISC-V: Clear SIP bit only when using SBI IPI operations
irqchip/riscv-intc: Allow drivers to directly discover INTC hwnode
RISC-V: Treat IPIs as normal Linux IRQs
RISC-V: Allow marking IPIs as suitable for remote FENCEs
RISC-V: Use IPIs for remote TLB flush when possible
RISC-V: Use IPIs for remote icache flush when possible
irqchip/riscv-intc: Add empty irq_eoi() for chained irq handlers
Jianmin Lv (5):
irqchip/loongson-eiointc: Fix returned value on parsing MADT
irqchip/loongson-eiointc: Fix incorrect use of acpi_get_vec_parent
irqchip/loongson-eiointc: Fix registration of syscore_ops
irqchip/loongson-pch-pic: Fix registration of syscore_ops
irqchip/loongson-pch-pic: Fix pch_pic_acpi_init calling
Marc Zyngier (5):
irqchip/gic: Drop support for board files
Merge branch irq/gic-6.4 into irq/irqchip-next
Merge branch irq/riscv-ipi into irq/irqchip-next
Merge branch irq/loongarch-fixes-6.4 into irq/irqchip-next
Merge branch irq/misc-6.4 into irq/irqchip-next
Mason Huo (1):
irqchip/irq-sifive-plic: Add syscore callbacks for hibernation
Rob Herring (1):
irqchip: Use of_property_read_bool() for boolean properties
Sebastian Reichel (1):
irqchip/gic-v3: Add Rockchip 3588001 erratum workaround
Shanker Donthineni (1):
irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4
Álvaro Fernández Rojas (1):
irqchip/bcm-6345-l1: Request memory region
Documentation/arm64/silicon-errata.rst | 5 +
arch/arm64/Kconfig | 10 ++
arch/riscv/Kconfig | 2 +
arch/riscv/include/asm/irq.h | 4 +
arch/riscv/include/asm/sbi.h | 9 +-
arch/riscv/include/asm/smp.h | 49 +++++++---
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/cpu-hotplug.c | 3 +-
arch/riscv/kernel/irq.c | 21 +++-
arch/riscv/kernel/sbi-ipi.c | 77 +++++++++++++++
arch/riscv/kernel/sbi.c | 100 +++----------------
arch/riscv/kernel/smp.c | 171 +++++++++++++++++----------------
arch/riscv/kernel/smpboot.c | 5 +-
arch/riscv/mm/cacheflush.c | 5 +-
arch/riscv/mm/tlbflush.c | 93 +++++++++++++++---
drivers/clocksource/timer-clint.c | 65 ++++++++++---
drivers/firmware/smccc/smccc.c | 26 +++++
drivers/firmware/smccc/soc_id.c | 28 +-----
drivers/irqchip/Kconfig | 3 +
drivers/irqchip/irq-bcm6345-l1.c | 6 +-
drivers/irqchip/irq-csky-apb-intc.c | 2 +-
drivers/irqchip/irq-gic-v2m.c | 2 +-
drivers/irqchip/irq-gic-v3-its.c | 35 +++++++
drivers/irqchip/irq-gic-v3.c | 115 +++++++++++++++++++---
drivers/irqchip/irq-gic.c | 60 +-----------
drivers/irqchip/irq-loongson-eiointc.c | 32 ++++--
drivers/irqchip/irq-loongson-pch-pic.c | 6 +-
drivers/irqchip/irq-riscv-intc.c | 71 ++++++++------
drivers/irqchip/irq-sifive-plic.c | 93 +++++++++++++++++-
drivers/irqchip/irq-st.c | 15 ---
include/linux/arm-smccc.h | 18 ++++
include/linux/irqchip/arm-gic.h | 6 --
32 files changed, 761 insertions(+), 377 deletions(-)
create mode 100644 arch/riscv/kernel/sbi-ipi.c
From: Kornel Dulęba <korneld(a)chromium.org>
Leverage gpiochip_line_is_irq to check whether a pin has an irq
associated with it. The previous check ("irq == 0") didn't make much
sense. The irq variable refers to the pinctrl irq, and has nothing do to
with an individual pin.
On some systems, during suspend/resume cycle, the firmware leaves
an interrupt enabled on a pin that is not used by the kernel.
Without this patch that caused an interrupt storm.
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217315
Signed-off-by: Kornel Dulęba <korneld(a)chromium.org>
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/pinctrl/pinctrl-amd.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/pinctrl/pinctrl-amd.c b/drivers/pinctrl/pinctrl-amd.c
index 24465010397b..675c9826b78a 100644
--- a/drivers/pinctrl/pinctrl-amd.c
+++ b/drivers/pinctrl/pinctrl-amd.c
@@ -660,21 +660,21 @@ static bool do_amd_gpio_irq_handler(int irq, void *dev_id)
* We must read the pin register again, in case the
* value was changed while executing
* generic_handle_domain_irq() above.
- * If we didn't find a mapping for the interrupt,
- * disable it in order to avoid a system hang caused
- * by an interrupt storm.
+ * If the line is not an irq, disable it in order to
+ * avoid a system hang caused by an interrupt storm.
*/
raw_spin_lock_irqsave(&gpio_dev->lock, flags);
regval = readl(regs + i);
- if (irq == 0) {
- regval &= ~BIT(INTERRUPT_ENABLE_OFF);
+ if (!gpiochip_line_is_irq(gc, irqnr + i)) {
+ regval &= ~BIT(INTERRUPT_MASK_OFF);
dev_dbg(&gpio_dev->pdev->dev,
"Disabling spurious GPIO IRQ %d\n",
irqnr + i);
+ } else {
+ ret = true;
}
writel(regval, regs + i);
raw_spin_unlock_irqrestore(&gpio_dev->lock, flags);
- ret = true;
}
}
/* did not cause wake on resume context for shared IRQ */
--
2.34.1
commit 4e5a04be88fe ("pinctrl: amd: disable and mask interrupts on probe")
had a mistake in loop iteration 63 that it would clear offset 0xFC instead
of 0x100. Offset 0xFC is actually `WAKE_INT_MASTER_REG`. This was
clearing bits 13 and 15 from the register which significantly changed the
expected handling for some platforms for GPIO0.
commit b26cd9325be4 ("pinctrl: amd: Disable and mask interrupts on resume")
actually fixed this bug, but lead to regressions on Lenovo Z13 and some
other systems. This is because there was no handling in the driver for bit
15 debounce behavior.
Quoting a public BKDG:
```
EnWinBlueBtn. Read-write. Reset: 0. 0=GPIO0 detect debounced power button;
Power button override is 4 seconds. 1=GPIO0 detect debounced power button
in S3/S5/S0i3, and detect "pressed less than 2 seconds" and "pressed 2~10
seconds" in S0; Power button override is 10 seconds
```
Cross referencing the same master register in Windows it's obvious that
Windows doesn't use debounce values in this configuration. So align the
Linux driver to do this as well. This fixes wake on lid when
WAKE_INT_MASTER_REG is properly programmed.
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217315
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/pinctrl/pinctrl-amd.c | 7 +++++++
drivers/pinctrl/pinctrl-amd.h | 1 +
2 files changed, 8 insertions(+)
diff --git a/drivers/pinctrl/pinctrl-amd.c b/drivers/pinctrl/pinctrl-amd.c
index c250110f6775..6b9ae92017d4 100644
--- a/drivers/pinctrl/pinctrl-amd.c
+++ b/drivers/pinctrl/pinctrl-amd.c
@@ -125,6 +125,12 @@ static int amd_gpio_set_debounce(struct gpio_chip *gc, unsigned offset,
struct amd_gpio *gpio_dev = gpiochip_get_data(gc);
raw_spin_lock_irqsave(&gpio_dev->lock, flags);
+
+ /* Use special handling for Pin0 debounce */
+ pin_reg = readl(gpio_dev->base + WAKE_INT_MASTER_REG);
+ if (pin_reg & INTERNAL_GPIO0_DEBOUNCE)
+ debounce = 0;
+
pin_reg = readl(gpio_dev->base + offset * 4);
if (debounce) {
@@ -219,6 +225,7 @@ static void amd_gpio_dbg_show(struct seq_file *s, struct gpio_chip *gc)
char *debounce_enable;
char *wake_cntrlz;
+ seq_printf(s, "WAKE_INT_MASTER_REG: 0x%08x\n", readl(gpio_dev->base + WAKE_INT_MASTER_REG));
for (bank = 0; bank < gpio_dev->hwbank_num; bank++) {
unsigned int time = 0;
unsigned int unit = 0;
diff --git a/drivers/pinctrl/pinctrl-amd.h b/drivers/pinctrl/pinctrl-amd.h
index 81ae8319a1f0..1cf2d06bbd8c 100644
--- a/drivers/pinctrl/pinctrl-amd.h
+++ b/drivers/pinctrl/pinctrl-amd.h
@@ -17,6 +17,7 @@
#define AMD_GPIO_PINS_BANK3 32
#define WAKE_INT_MASTER_REG 0xfc
+#define INTERNAL_GPIO0_DEBOUNCE (1 << 15)
#define EOI_MASK (1 << 29)
#define WAKE_INT_STATUS_REG0 0x2f8
--
2.34.1
Currently, on a handful of ASICs. We allow the framebuffer for a given
plane to exist in either VRAM or GTT. However, if the plane's new
framebuffer is in a different memory domain than it's previous
framebuffer, flipping between them can cause the screen to flicker. So,
to fix this, don't perform an immediate flip in the aforementioned case.
Cc: stable(a)vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2354
Reviewed-by: Roman Li <Roman.Li(a)amd.com>
Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
---
v2: make a number of clarifications to the commit message and drop
locking.
v3: use a stronger check
v4: drop mem_type
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index dfcb9815b5a8..76a776fd8437 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -7900,6 +7900,13 @@ static void amdgpu_dm_commit_cursors(struct drm_atomic_state *state)
amdgpu_dm_plane_handle_cursor_update(plane, old_plane_state);
}
+static inline uint32_t get_mem_type(struct drm_framebuffer *fb)
+{
+ struct amdgpu_bo *abo = gem_to_amdgpu_bo(fb->obj[0]);
+
+ return abo->tbo.resource ? abo->tbo.resource->mem_type : 0;
+}
+
static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
struct dc_state *dc_state,
struct drm_device *dev,
@@ -8042,11 +8049,13 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
/*
* Only allow immediate flips for fast updates that don't
- * change FB pitch, DCC state, rotation or mirroing.
+ * change memory domain, FB pitch, DCC state, rotation or
+ * mirroring.
*/
bundle->flip_addrs[planes_count].flip_immediate =
crtc->state->async_flip &&
- acrtc_state->update_type == UPDATE_TYPE_FAST;
+ acrtc_state->update_type == UPDATE_TYPE_FAST &&
+ get_mem_type(old_plane_state->fb) == get_mem_type(fb);
timestamp_ns = ktime_get_ns();
bundle->flip_addrs[planes_count].flip_timestamp_in_us = div_u64(timestamp_ns, 1000);
--
2.40.0
Until now, the page table walker counted increments to the PA and IPA
of a walk in two separate places. While the PA is incremented as soon as
a leaf PTE is installed in stage2_map_walker_try_leaf(), the IPA is
actually bumped in the generic table walker context. Critically,
__kvm_pgtable_visit() rereads the PTE after the LEAF callback returns
to work out if a table or leaf was installed, and only bumps the IPA for
a leaf PTE.
This arrangement worked fine when we handled faults behind the write lock,
as the walker had exclusive access to the stage-2 page tables. However,
commit 1577cb5823ce ("KVM: arm64: Handle stage-2 faults in parallel")
started handling all stage-2 faults behind the read lock, opening up a
race where a walker could increment the PA but not the IPA of a walk.
Nothing good ensues, as the walker starts mapping with the incorrect
IPA -> PA relationship.
For example, assume that two vCPUs took a data abort on the same IPA.
One observes that dirty logging is disabled, and the other observed that
it is enabled:
vCPU attempting PMD mapping vCPU attempting PTE mapping
====================================== =====================================
/* install PMD */
stage2_make_pte(ctx, leaf);
data->phys += granule;
/* replace PMD with a table */
stage2_try_break_pte(ctx, data->mmu);
stage2_make_pte(ctx, table);
/* table is observed */
ctx.old = READ_ONCE(*ptep);
table = kvm_pte_table(ctx.old, level);
/*
* map walk continues w/o incrementing
* IPA.
*/
__kvm_pgtable_walk(..., level + 1);
Bring an end to the whole mess by using the IPA as the single source of
truth for how far along a walk has gotten. Work out the correct PA to
map by calculating the IPA offset from the beginning of the walk and add
that to the starting physical address.
Cc: stable(a)vger.kernel.org
Fixes: 1577cb5823ce ("KVM: arm64: Handle stage-2 faults in parallel")
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
---
arch/arm64/include/asm/kvm_pgtable.h | 1 +
arch/arm64/kvm/hyp/pgtable.c | 32 ++++++++++++++++++++++++----
2 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 4cd6762bda80..dc3c072e862f 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -209,6 +209,7 @@ struct kvm_pgtable_visit_ctx {
kvm_pte_t old;
void *arg;
struct kvm_pgtable_mm_ops *mm_ops;
+ u64 start;
u64 addr;
u64 end;
u32 level;
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 3d61bd3e591d..140f82300db5 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -58,6 +58,7 @@
struct kvm_pgtable_walk_data {
struct kvm_pgtable_walker *walker;
+ u64 start;
u64 addr;
u64 end;
};
@@ -201,6 +202,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
.old = READ_ONCE(*ptep),
.arg = data->walker->arg,
.mm_ops = mm_ops,
+ .start = data->start,
.addr = data->addr,
.end = data->end,
.level = level,
@@ -293,6 +295,7 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
struct kvm_pgtable_walker *walker)
{
struct kvm_pgtable_walk_data walk_data = {
+ .start = ALIGN_DOWN(addr, PAGE_SIZE),
.addr = ALIGN_DOWN(addr, PAGE_SIZE),
.end = PAGE_ALIGN(walk_data.addr + size),
.walker = walker,
@@ -794,20 +797,43 @@ static bool stage2_pte_executable(kvm_pte_t pte)
return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
}
+static u64 stage2_map_walker_phys_addr(const struct kvm_pgtable_visit_ctx *ctx,
+ const struct stage2_map_data *data)
+{
+ u64 phys = data->phys;
+
+ /*
+ * Stage-2 walks to update ownership data are communicated to the map
+ * walker using an invalid PA. Avoid offsetting an already invalid PA,
+ * which could overflow and make the address valid again.
+ */
+ if (!kvm_phys_is_valid(phys))
+ return phys;
+
+ /*
+ * Otherwise, work out the correct PA based on how far the walk has
+ * gotten.
+ */
+ return phys + (ctx->addr - ctx->start);
+}
+
static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
struct stage2_map_data *data)
{
+ u64 phys = stage2_map_walker_phys_addr(ctx, data);
+
if (data->force_pte && (ctx->level < (KVM_PGTABLE_MAX_LEVELS - 1)))
return false;
- return kvm_block_mapping_supported(ctx, data->phys);
+ return kvm_block_mapping_supported(ctx, phys);
}
static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
struct stage2_map_data *data)
{
kvm_pte_t new;
- u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
+ u64 phys = stage2_map_walker_phys_addr(ctx, data);
+ u64 granule = kvm_granule_size(ctx->level);
struct kvm_pgtable *pgt = data->mmu->pgt;
struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
@@ -841,8 +867,6 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
stage2_make_pte(ctx, new);
- if (kvm_phys_is_valid(phys))
- data->phys += granule;
return 0;
}
--
2.40.0.634.g4ca3ef3211-goog
--
Hello Dear,
how are you today? I hope you are fine
My name is Dr. Ava Smith, I Am an English and French nationality.
I will give you pictures and more details about me as soon as I hear from you
Thanks
Ava
Do not call gadget stop until the poll for controller halt is
completed. DEVTEN is cleared as part of gadget stop, so the intention to
allow ep0 events to continue while waiting for controller halt is not
happening.
Fixes: c96683798e27 ("usb: dwc3: ep0: Don't prepare beyond Setup stage")
Cc: stable(a)vger.kernel.org
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Wesley Cheng <quic_wcheng(a)quicinc.com>
---
drivers/usb/dwc3/gadget.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 9f492c8a7d0b..dd6057bad37e 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2637,7 +2637,6 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
* bit.
*/
dwc3_stop_active_transfers(dwc);
- __dwc3_gadget_stop(dwc);
spin_unlock_irqrestore(&dwc->lock, flags);
/*
@@ -2674,7 +2673,19 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
* remaining event generated by the controller while polling for
* DSTS.DEVCTLHLT.
*/
- return dwc3_gadget_run_stop(dwc, false);
+ ret = dwc3_gadget_run_stop(dwc, false);
+
+ /*
+ * Stop the gadget after controller is halted, so that if needed, the
+ * events to update EP0 state can still occur while the run/stop
+ * routine polls for the halted state. DEVTEN is cleared as part of
+ * gadget stop.
+ */
+ spin_lock_irqsave(&dwc->lock, flags);
+ __dwc3_gadget_stop(dwc);
+ spin_unlock_irqrestore(&dwc->lock, flags);
+
+ return ret;
}
static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
Hi,
I would like to see these patches backported. They are needed so perf
can be cross compiled with gcc on v5.15, v5.10 and v5.4.
I built it with tuxmake [1] here are two example commandlines:
tuxmake --runtime podman --target-arch arm64 --toolchain gcc-12 --kconfig defconfig perf
tuxmake --runtime podman --target-arch x86_64 --toolchain gcc-12 --kconfig defconfig perf
Tried to build perf with both gcc-11 and gcc-12.
Patch 'tools perf: Fix compilation error with new binutils'
and 'tools build: Add feature test for init_disassemble_info API changes'
didn't apply cleanly, thats why I send these in a patchset.
When apply 'tools build: Add feature test for
init_disassemble_info API changes' to 5.4 it will be a minor merge
conflict, do you want me to send this patch in two separate patches one
for 5.4 and another for v5.10?
The sha for these two patches in mainline are.
cfd59ca91467 tools build: Add feature test for init_disassemble_info API changes
83aa0120487e tools perf: Fix compilation error with new binutils
The above patches solves these:
util/annotate.c: In function 'symbol__disassemble_bpf':
util/annotate.c:1729:9: error: too few arguments to function 'init_disassemble_info'
1729 | init_disassemble_info(&info, s,
| ^~~~~~~~~~~~~~~~~~~~~
Please apply these to v5.10 and v5.4
a45b3d692623 tools include: add dis-asm-compat.h to handle version differences
d08c84e01afa perf sched: Cast PTHREAD_STACK_MIN to int as it may turn into sysconf(__SC_THREAD_STACK>
The above patches solves these:
/home/anders/src/kernel/stable-5.10/tools/include/linux/kernel.h:43:24: error: comparison of distinct pointer types lacks a cast [-Werror]
43 | (void) (&_max1 == &_max2); \
| ^~
builtin-sched.c:673:34: note: in expansion of macro 'max'
673 | (size_t) max(16 * 1024, PTHREAD_STACK_MIN));
| ^~~
Please apply these to v5.15, v5.10 and v5.4
8e8bf60a6754 perf build: Fixup disabling of -Wdeprecated-declarations for the python scripting engine
4ee3c4da8b1b perf scripting python: Do not build fail on deprecation warnings
63a4354ae75c perf scripting perl: Ignore some warnings to keep building with perl headers
Build error that the above 3 patches solves are:
/usr/lib/x86_64-linux-gnu/perl/5.36/CORE/handy.h:125:23: error: cast from function call of type 'STRLEN' {aka 'long unsigned int'} to non-matching type '_Bool' [-Werror=bad-function-cast]
125 | #define cBOOL(cbool) ((bool) (cbool))
| ^
Cheers,
Anders
[1] https://tuxmake.org/
Andres Freund (2):
tools perf: Fix compilation error with new binutils
tools build: Add feature test for init_disassemble_info API changes
tools/build/Makefile.feature | 1 +
tools/build/feature/Makefile | 4 ++++
tools/build/feature/test-all.c | 4 ++++
tools/build/feature/test-disassembler-init-styled.c | 13 +++++++++++++
tools/perf/Makefile.config | 8 ++++++++
tools/perf/util/annotate.c | 7 ++++---
6 files changed, 34 insertions(+), 3 deletions(-)
create mode 100644 tools/build/feature/test-disassembler-init-styled.c
--
2.39.2