The patch titled
Subject: mm/page_table_check: fix crash on ZONE_DEVICE
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-page_table_check-fix-crash-on-zone_device.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/page_table_check: fix crash on ZONE_DEVICE
Date: Wed, 5 Jun 2024 17:21:46 -0400
Not all pages may apply to pgtable check. One example is ZONE_DEVICE
pages: they map PFNs directly, and they don't allocate page_ext at all
even if there's struct page around. One may reference
devm_memremap_pages().
When both ZONE_DEVICE and page-table-check enabled, then try to map some
dax memories, one can trigger kernel bug constantly now when the kernel
was trying to inject some pfn maps on the dax device:
kernel BUG at mm/page_table_check.c:55!
While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
fault resolutions, skip all the checks if page_ext doesn't even exist in
pgtable checker, which applies to ZONE_DEVICE but maybe more.
Link: https://lkml.kernel.org/r/20240605212146.994486-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_table_check.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
--- a/mm/page_table_check.c~mm-page_table_check-fix-crash-on-zone_device
+++ a/mm/page_table_check.c
@@ -73,6 +73,9 @@ static void page_table_check_clear(unsig
page = pfn_to_page(pfn);
page_ext = page_ext_get(page);
+ if (!page_ext)
+ return;
+
BUG_ON(PageSlab(page));
anon = PageAnon(page);
@@ -110,6 +113,9 @@ static void page_table_check_set(unsigne
page = pfn_to_page(pfn);
page_ext = page_ext_get(page);
+ if (!page_ext)
+ return;
+
BUG_ON(PageSlab(page));
anon = PageAnon(page);
@@ -140,7 +146,10 @@ void __page_table_check_zero(struct page
BUG_ON(PageSlab(page));
page_ext = page_ext_get(page);
- BUG_ON(!page_ext);
+
+ if (!page_ext)
+ return;
+
for (i = 0; i < (1ul << order); i++) {
struct page_table_check *ptc = get_page_table_check(page_ext);
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-page_table_check-fix-crash-on-zone_device.patch
mm-debug_vm_pgtable-drop-random_orvalue-trick.patch
mm-drop-leftover-comment-references-to-pxx_huge.patch
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: c625dabbf1c4a8e77e4734014f2fde7aa9071a1f
Gitweb: https://git.kernel.org/tip/c625dabbf1c4a8e77e4734014f2fde7aa9071a1f
Author: Yazen Ghannam <yazen.ghannam(a)amd.com>
AuthorDate: Mon, 03 Apr 2023 16:42:44
Committer: Borislav Petkov (AMD) <bp(a)alien8.de>
CommitterDate: Wed, 05 Jun 2024 21:23:34 +02:00
x86/amd_nb: Check for invalid SMN reads
AMD Zen-based systems use a System Management Network (SMN) that
provides access to implementation-specific registers.
SMN accesses are done indirectly through an index/data pair in PCI
config space. The PCI config access may fail and return an error code.
This would prevent the "read" value from being updated.
However, the PCI config access may succeed, but the return value may be
invalid. This is in similar fashion to PCI bad reads, i.e. return all
bits set.
Most systems will return 0 for SMN addresses that are not accessible.
This is in line with AMD convention that unavailable registers are
Read-as-Zero/Writes-Ignored.
However, some systems will return a "PCI Error Response" instead. This
value, along with an error code of 0 from the PCI config access, will
confuse callers of the amd_smn_read() function.
Check for this condition, clear the return value, and set a proper error
code.
Fixes: ddfe43cdc0da ("x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h")
Signed-off-by: Yazen Ghannam <yazen.ghannam(a)amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230403164244.471141-1-yazen.ghannam@amd.com
---
arch/x86/kernel/amd_nb.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 3cf156f..027a8c7 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -215,7 +215,14 @@ out:
int amd_smn_read(u16 node, u32 address, u32 *value)
{
- return __amd_smn_rw(node, address, value, false);
+ int err = __amd_smn_rw(node, address, value, false);
+
+ if (PCI_POSSIBLE_ERROR(*value)) {
+ err = -ENODEV;
+ *value = 0;
+ }
+
+ return err;
}
EXPORT_SYMBOL_GPL(amd_smn_read);
In our production environment, we found many hung tasks which are
blocked for more than 18 hours. Their call traces are like this:
[346278.191038] __schedule+0x2d8/0x890
[346278.191046] schedule+0x4e/0xb0
[346278.191049] perf_event_free_task+0x220/0x270
[346278.191056] ? init_wait_var_entry+0x50/0x50
[346278.191060] copy_process+0x663/0x18d0
[346278.191068] kernel_clone+0x9d/0x3d0
[346278.191072] __do_sys_clone+0x5d/0x80
[346278.191076] __x64_sys_clone+0x25/0x30
[346278.191079] do_syscall_64+0x5c/0xc0
[346278.191083] ? syscall_exit_to_user_mode+0x27/0x50
[346278.191086] ? do_syscall_64+0x69/0xc0
[346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20
[346278.191092] ? irqentry_exit+0x19/0x30
[346278.191095] ? exc_page_fault+0x89/0x160
[346278.191097] ? asm_exc_page_fault+0x8/0x30
[346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae
The task was waiting for the refcount become to 1, but from the vmcore,
we found the refcount has already been 1. It seems that the task didn't
get woken up by perf_event_release_kernel() and got stuck forever. The
below scenario may cause the problem.
Thread A Thread B
... ...
perf_event_free_task perf_event_release_kernel
...
acquire event->child_mutex
...
get_ctx
... release event->child_mutex
acquire ctx->mutex
...
perf_free_event (acquire/release event->child_mutex)
...
release ctx->mutex
wait_var_event
acquire ctx->mutex
acquire event->child_mutex
# move existing events to free_list
release event->child_mutex
release ctx->mutex
put_ctx
... ...
In this case, all events of the ctx have been freed, so we couldn't
find the ctx in free_list and Thread A will miss the wakeup. It's thus
necessary to add a wakeup after dropping the reference.
Fixes: 1cf8dfe8a661 ("perf/core: Fix race between close() and fork()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haifeng Xu <haifeng.xu(a)shopee.com>
Reviewed-by: Frederic Weisbecker <frederic(a)kernel.org>
Acked-by: Mark Rutland <mark.rutland(a)arm.com>
---
Changes since v1:
- Add the fixed tag.
- Simplify v1's patch. (Frederic)
Changes since v2:
- Use Reviewed-by tag instead of Signed-off-by tag.
Changes since v3:
- Add Acked-by tag.
- Cc stable(a)vger.kernel.org. (Mark)
---
kernel/events/core.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4f0c45ab8d7d..15c35070db6a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5340,6 +5340,7 @@ int perf_event_release_kernel(struct perf_event *event)
again:
mutex_lock(&event->child_mutex);
list_for_each_entry(child, &event->child_list, child_list) {
+ void *var = NULL;
/*
* Cannot change, child events are not migrated, see the
@@ -5380,11 +5381,23 @@ int perf_event_release_kernel(struct perf_event *event)
* this can't be the last reference.
*/
put_event(event);
+ } else {
+ var = &ctx->refcount;
}
mutex_unlock(&event->child_mutex);
mutex_unlock(&ctx->mutex);
put_ctx(ctx);
+
+ if (var) {
+ /*
+ * If perf_event_free_task() has deleted all events from the
+ * ctx while the child_mutex got released above, make sure to
+ * notify about the preceding put_ctx().
+ */
+ smp_mb(); /* pairs with wait_var_event() */
+ wake_up_var(var);
+ }
goto again;
}
mutex_unlock(&event->child_mutex);
--
2.25.1
For atomic_sub_and_test() the @i parameter is the value to subtract, not
add. Fix the kerneldoc comment accordingly.
Fixes: ad8110706f38 ("locking/atomic: scripts: generate kerneldoc comments")
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
---
include/linux/atomic/atomic-instrumented.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/atomic/atomic-instrumented.h b/include/linux/atomic/atomic-instrumented.h
index debd487fe971..12b558c05384 100644
--- a/include/linux/atomic/atomic-instrumented.h
+++ b/include/linux/atomic/atomic-instrumented.h
@@ -1349,7 +1349,7 @@ atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
/**
* atomic_sub_and_test() - atomic subtract and test if zero with full ordering
- * @i: int value to add
+ * @i: int value to subtract
* @v: pointer to atomic_t
*
* Atomically updates @v to (@v - @i) with full ordering.
--
2.45.0.rc1.225.g2a3ae87e7f-goog
From: Nilay Shroff <nilay(a)linux.ibm.com>
[ Upstream commit d3a043733f25d743f3aa617c7f82dbcb5ee2211a ]
In current native multipath design when a shared namespace is created,
we loop through each possible numa-node, calculate the NUMA distance of
that node from each nvme controller and then cache the optimal IO path
for future reference while sending IO. The issue with this design is that
we may refer to the NUMA distance table for an offline node which may not
be populated at the time and so we may inadvertently end up finding and
caching a non-optimal path for IO. Then latter when the corresponding
numa-node becomes online and hence the NUMA distance table entry for that
node is created, ideally we should re-calculate the multipath node distance
for the newly added node however that doesn't happen unless we rescan/reset
the controller. So essentially, we may keep using non-optimal IO path for a
node which is made online after namespace is created.
This patch helps fix this issue ensuring that when a shared namespace is
created, we calculate the multipath node distance for each online numa-node
instead of each possible numa-node. Then latter when a node becomes online
and we receive any IO on that newly added node, we would calculate the
multipath node distance for newly added node but this time NUMA distance
table would have been already populated for newly added node. Hence we
would be able to correctly calculate the multipath node distance and choose
the optimal path for the IO.
Signed-off-by: Nilay Shroff <nilay(a)linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/nvme/host/multipath.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 811f7b96b5517..c993548403f5c 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -436,7 +436,7 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
int node, srcu_idx;
srcu_idx = srcu_read_lock(&head->srcu);
- for_each_node(node)
+ for_each_online_node(node)
__nvme_find_path(head, node);
srcu_read_unlock(&head->srcu, srcu_idx);
}
--
2.43.0
From: Nilay Shroff <nilay(a)linux.ibm.com>
[ Upstream commit d3a043733f25d743f3aa617c7f82dbcb5ee2211a ]
In current native multipath design when a shared namespace is created,
we loop through each possible numa-node, calculate the NUMA distance of
that node from each nvme controller and then cache the optimal IO path
for future reference while sending IO. The issue with this design is that
we may refer to the NUMA distance table for an offline node which may not
be populated at the time and so we may inadvertently end up finding and
caching a non-optimal path for IO. Then latter when the corresponding
numa-node becomes online and hence the NUMA distance table entry for that
node is created, ideally we should re-calculate the multipath node distance
for the newly added node however that doesn't happen unless we rescan/reset
the controller. So essentially, we may keep using non-optimal IO path for a
node which is made online after namespace is created.
This patch helps fix this issue ensuring that when a shared namespace is
created, we calculate the multipath node distance for each online numa-node
instead of each possible numa-node. Then latter when a node becomes online
and we receive any IO on that newly added node, we would calculate the
multipath node distance for newly added node but this time NUMA distance
table would have been already populated for newly added node. Hence we
would be able to correctly calculate the multipath node distance and choose
the optimal path for the IO.
Signed-off-by: Nilay Shroff <nilay(a)linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/nvme/host/multipath.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 73eddb67f0d24..9ddaa599a9cd0 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -516,7 +516,7 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
int node, srcu_idx;
srcu_idx = srcu_read_lock(&head->srcu);
- for_each_node(node)
+ for_each_online_node(node)
__nvme_find_path(head, node);
srcu_read_unlock(&head->srcu, srcu_idx);
}
--
2.43.0
From: Nilay Shroff <nilay(a)linux.ibm.com>
[ Upstream commit d3a043733f25d743f3aa617c7f82dbcb5ee2211a ]
In current native multipath design when a shared namespace is created,
we loop through each possible numa-node, calculate the NUMA distance of
that node from each nvme controller and then cache the optimal IO path
for future reference while sending IO. The issue with this design is that
we may refer to the NUMA distance table for an offline node which may not
be populated at the time and so we may inadvertently end up finding and
caching a non-optimal path for IO. Then latter when the corresponding
numa-node becomes online and hence the NUMA distance table entry for that
node is created, ideally we should re-calculate the multipath node distance
for the newly added node however that doesn't happen unless we rescan/reset
the controller. So essentially, we may keep using non-optimal IO path for a
node which is made online after namespace is created.
This patch helps fix this issue ensuring that when a shared namespace is
created, we calculate the multipath node distance for each online numa-node
instead of each possible numa-node. Then latter when a node becomes online
and we receive any IO on that newly added node, we would calculate the
multipath node distance for newly added node but this time NUMA distance
table would have been already populated for newly added node. Hence we
would be able to correctly calculate the multipath node distance and choose
the optimal path for the IO.
Signed-off-by: Nilay Shroff <nilay(a)linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/nvme/host/multipath.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 75386d3e0f981..615fbdc09d1cc 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -594,7 +594,7 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
int node, srcu_idx;
srcu_idx = srcu_read_lock(&head->srcu);
- for_each_node(node)
+ for_each_online_node(node)
__nvme_find_path(head, node);
srcu_read_unlock(&head->srcu, srcu_idx);
}
--
2.43.0
From: Nilay Shroff <nilay(a)linux.ibm.com>
[ Upstream commit d3a043733f25d743f3aa617c7f82dbcb5ee2211a ]
In current native multipath design when a shared namespace is created,
we loop through each possible numa-node, calculate the NUMA distance of
that node from each nvme controller and then cache the optimal IO path
for future reference while sending IO. The issue with this design is that
we may refer to the NUMA distance table for an offline node which may not
be populated at the time and so we may inadvertently end up finding and
caching a non-optimal path for IO. Then latter when the corresponding
numa-node becomes online and hence the NUMA distance table entry for that
node is created, ideally we should re-calculate the multipath node distance
for the newly added node however that doesn't happen unless we rescan/reset
the controller. So essentially, we may keep using non-optimal IO path for a
node which is made online after namespace is created.
This patch helps fix this issue ensuring that when a shared namespace is
created, we calculate the multipath node distance for each online numa-node
instead of each possible numa-node. Then latter when a node becomes online
and we receive any IO on that newly added node, we would calculate the
multipath node distance for newly added node but this time NUMA distance
table would have been already populated for newly added node. Hence we
would be able to correctly calculate the multipath node distance and choose
the optimal path for the IO.
Signed-off-by: Nilay Shroff <nilay(a)linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/nvme/host/multipath.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index d16e976ae1a47..9c1e135b8df3b 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -595,7 +595,7 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
int node, srcu_idx;
srcu_idx = srcu_read_lock(&head->srcu);
- for_each_node(node)
+ for_each_online_node(node)
__nvme_find_path(head, node);
srcu_read_unlock(&head->srcu, srcu_idx);
}
--
2.43.0
From: Alex Henrie <alexhenrie24(a)gmail.com>
[ Upstream commit 3295f1b866bfbcabd625511968e8a5c541f9ab32 ]
The incompatible device in my possession has a sticker that says
"F5U002 Rev 2" and "P80453-B", and lsusb identifies it as
"050d:0002 Belkin Components IEEE-1284 Controller". There is a bug
report from 2007 from Michael Trausch who was seeing the exact same
errors that I saw in 2024 trying to use this cable.
Link: https://lore.kernel.org/all/46DE5830.9060401@trausch.us/
Signed-off-by: Alex Henrie <alexhenrie24(a)gmail.com>
Link: https://lore.kernel.org/r/20240326150723.99939-5-alexhenrie24@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/usb/misc/uss720.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/misc/uss720.c b/drivers/usb/misc/uss720.c
index 0be8efcda15d5..d972c09629397 100644
--- a/drivers/usb/misc/uss720.c
+++ b/drivers/usb/misc/uss720.c
@@ -677,7 +677,7 @@ static int uss720_probe(struct usb_interface *intf,
struct parport_uss720_private *priv;
struct parport *pp;
unsigned char reg;
- int i;
+ int ret;
dev_dbg(&intf->dev, "probe: vendor id 0x%x, device id 0x%x\n",
le16_to_cpu(usbdev->descriptor.idVendor),
@@ -688,8 +688,8 @@ static int uss720_probe(struct usb_interface *intf,
usb_put_dev(usbdev);
return -ENODEV;
}
- i = usb_set_interface(usbdev, intf->altsetting->desc.bInterfaceNumber, 2);
- dev_dbg(&intf->dev, "set interface result %d\n", i);
+ ret = usb_set_interface(usbdev, intf->altsetting->desc.bInterfaceNumber, 2);
+ dev_dbg(&intf->dev, "set interface result %d\n", ret);
interface = intf->cur_altsetting;
@@ -725,12 +725,18 @@ static int uss720_probe(struct usb_interface *intf,
set_1284_register(pp, 7, 0x00, GFP_KERNEL);
set_1284_register(pp, 6, 0x30, GFP_KERNEL); /* PS/2 mode */
set_1284_register(pp, 2, 0x0c, GFP_KERNEL);
- /* debugging */
- get_1284_register(pp, 0, ®, GFP_KERNEL);
+
+ /* The Belkin F5U002 Rev 2 P80453-B USB parallel port adapter shares the
+ * device ID 050d:0002 with some other device that works with this
+ * driver, but it itself does not. Detect and handle the bad cable
+ * here. */
+ ret = get_1284_register(pp, 0, ®, GFP_KERNEL);
dev_dbg(&intf->dev, "reg: %7ph\n", priv->reg);
+ if (ret < 0)
+ return ret;
- i = usb_find_last_int_in_endpoint(interface, &epd);
- if (!i) {
+ ret = usb_find_last_int_in_endpoint(interface, &epd);
+ if (!ret) {
dev_dbg(&intf->dev, "epaddr %d interval %d\n",
epd->bEndpointAddress, epd->bInterval);
}
--
2.43.0
From: Alex Henrie <alexhenrie24(a)gmail.com>
[ Upstream commit 3295f1b866bfbcabd625511968e8a5c541f9ab32 ]
The incompatible device in my possession has a sticker that says
"F5U002 Rev 2" and "P80453-B", and lsusb identifies it as
"050d:0002 Belkin Components IEEE-1284 Controller". There is a bug
report from 2007 from Michael Trausch who was seeing the exact same
errors that I saw in 2024 trying to use this cable.
Link: https://lore.kernel.org/all/46DE5830.9060401@trausch.us/
Signed-off-by: Alex Henrie <alexhenrie24(a)gmail.com>
Link: https://lore.kernel.org/r/20240326150723.99939-5-alexhenrie24@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/usb/misc/uss720.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/misc/uss720.c b/drivers/usb/misc/uss720.c
index 0be8efcda15d5..d972c09629397 100644
--- a/drivers/usb/misc/uss720.c
+++ b/drivers/usb/misc/uss720.c
@@ -677,7 +677,7 @@ static int uss720_probe(struct usb_interface *intf,
struct parport_uss720_private *priv;
struct parport *pp;
unsigned char reg;
- int i;
+ int ret;
dev_dbg(&intf->dev, "probe: vendor id 0x%x, device id 0x%x\n",
le16_to_cpu(usbdev->descriptor.idVendor),
@@ -688,8 +688,8 @@ static int uss720_probe(struct usb_interface *intf,
usb_put_dev(usbdev);
return -ENODEV;
}
- i = usb_set_interface(usbdev, intf->altsetting->desc.bInterfaceNumber, 2);
- dev_dbg(&intf->dev, "set interface result %d\n", i);
+ ret = usb_set_interface(usbdev, intf->altsetting->desc.bInterfaceNumber, 2);
+ dev_dbg(&intf->dev, "set interface result %d\n", ret);
interface = intf->cur_altsetting;
@@ -725,12 +725,18 @@ static int uss720_probe(struct usb_interface *intf,
set_1284_register(pp, 7, 0x00, GFP_KERNEL);
set_1284_register(pp, 6, 0x30, GFP_KERNEL); /* PS/2 mode */
set_1284_register(pp, 2, 0x0c, GFP_KERNEL);
- /* debugging */
- get_1284_register(pp, 0, ®, GFP_KERNEL);
+
+ /* The Belkin F5U002 Rev 2 P80453-B USB parallel port adapter shares the
+ * device ID 050d:0002 with some other device that works with this
+ * driver, but it itself does not. Detect and handle the bad cable
+ * here. */
+ ret = get_1284_register(pp, 0, ®, GFP_KERNEL);
dev_dbg(&intf->dev, "reg: %7ph\n", priv->reg);
+ if (ret < 0)
+ return ret;
- i = usb_find_last_int_in_endpoint(interface, &epd);
- if (!i) {
+ ret = usb_find_last_int_in_endpoint(interface, &epd);
+ if (!ret) {
dev_dbg(&intf->dev, "epaddr %d interval %d\n",
epd->bEndpointAddress, epd->bInterval);
}
--
2.43.0
From: Yunlei He <heyunlei(a)oppo.com>
[ Upstream commit ac5eecf481c29942eb9a862e758c0c8b68090c33 ]
In f2fs_remount, SB_INLINECRYPT flag will be clear and re-set.
If create new file or open file during this gap, these files
will not use inlinecrypt. Worse case, it may lead to data
corruption if wrappedkey_v0 is enable.
Thread A: Thread B:
-f2fs_remount -f2fs_file_open or f2fs_new_inode
-default_options
<- clear SB_INLINECRYPT flag
-fscrypt_select_encryption_impl
-parse_options
<- set SB_INLINECRYPT again
Signed-off-by: Yunlei He <heyunlei(a)oppo.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/f2fs/super.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 9a74d60f61dba..f73b2b9445acd 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1713,8 +1713,6 @@ static void default_options(struct f2fs_sb_info *sbi)
F2FS_OPTION(sbi).compress_ext_cnt = 0;
F2FS_OPTION(sbi).bggc_mode = BGGC_MODE_ON;
- sbi->sb->s_flags &= ~SB_INLINECRYPT;
-
set_opt(sbi, INLINE_XATTR);
set_opt(sbi, INLINE_DATA);
set_opt(sbi, INLINE_DENTRY);
--
2.43.0
From: Yunlei He <heyunlei(a)oppo.com>
[ Upstream commit ac5eecf481c29942eb9a862e758c0c8b68090c33 ]
In f2fs_remount, SB_INLINECRYPT flag will be clear and re-set.
If create new file or open file during this gap, these files
will not use inlinecrypt. Worse case, it may lead to data
corruption if wrappedkey_v0 is enable.
Thread A: Thread B:
-f2fs_remount -f2fs_file_open or f2fs_new_inode
-default_options
<- clear SB_INLINECRYPT flag
-fscrypt_select_encryption_impl
-parse_options
<- set SB_INLINECRYPT again
Signed-off-by: Yunlei He <heyunlei(a)oppo.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/f2fs/super.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index df1e5496352c2..706d7adda3b22 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2068,8 +2068,6 @@ static void default_options(struct f2fs_sb_info *sbi)
F2FS_OPTION(sbi).compress_mode = COMPR_MODE_FS;
F2FS_OPTION(sbi).bggc_mode = BGGC_MODE_ON;
- sbi->sb->s_flags &= ~SB_INLINECRYPT;
-
set_opt(sbi, INLINE_XATTR);
set_opt(sbi, INLINE_DATA);
set_opt(sbi, INLINE_DENTRY);
--
2.43.0
From: Yunlei He <heyunlei(a)oppo.com>
[ Upstream commit ac5eecf481c29942eb9a862e758c0c8b68090c33 ]
In f2fs_remount, SB_INLINECRYPT flag will be clear and re-set.
If create new file or open file during this gap, these files
will not use inlinecrypt. Worse case, it may lead to data
corruption if wrappedkey_v0 is enable.
Thread A: Thread B:
-f2fs_remount -f2fs_file_open or f2fs_new_inode
-default_options
<- clear SB_INLINECRYPT flag
-fscrypt_select_encryption_impl
-parse_options
<- set SB_INLINECRYPT again
Signed-off-by: Yunlei He <heyunlei(a)oppo.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/f2fs/super.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index c529ce5d986cc..f496622921843 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2092,8 +2092,6 @@ static void default_options(struct f2fs_sb_info *sbi)
F2FS_OPTION(sbi).bggc_mode = BGGC_MODE_ON;
F2FS_OPTION(sbi).memory_mode = MEMORY_MODE_NORMAL;
- sbi->sb->s_flags &= ~SB_INLINECRYPT;
-
set_opt(sbi, INLINE_XATTR);
set_opt(sbi, INLINE_DATA);
set_opt(sbi, INLINE_DENTRY);
--
2.43.0
When the best selected CPU is offline, work_on_cpu() will stuck forever.
This can be happen if a node is online while all its CPUs are offline
(we can use "maxcpus=1" without "nr_cpus=1" to reproduce it), Therefore,
in this case, we should call local_pci_probe() instead of work_on_cpu().
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen(a)loongson.cn>
---
v1 -> v2 Added the method to reproduce this issue
---
drivers/pci/pci-driver.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index af2996d0d17f..32a99828e6a3 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -386,7 +386,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
free_cpumask_var(wq_domain_mask);
}
- if (cpu < nr_cpu_ids)
+ if ((cpu < nr_cpu_ids) && cpu_online(cpu))
error = work_on_cpu(cpu, local_pci_probe, &ddi);
else
error = local_pci_probe(&ddi);
--
2.33.0
Attention,
I am writing from Srisawad Corp. PCL. We would like to invest in your
project, we have the capacity to fund your project and we are open to
discuss on any of these three types of investment plans. We can invest
either as a debt/loan finance, equity venture or a combination of debt/loan
and equity venture, likewise known as convertible note depending on the
investment plan most suitable for your project.
We are majorly interested in projects with high profit value. Projects that
fall within our interest can be funded completely with very considerable
and workable terms. Should you be interested please get back to me through
the below email for more information.
Email: doungchai(a)sawadscorpsocl.com
Kindly let us know what your ideas are.
Doungchai Kaewbootta
Executive Director/Managing Director
Srisawad Corp. PCL
99/392 Srisawad Building, 4th, 6th floor,
Soi Chaengwattana 10 Intersection 3 (Benjamitr)
Chaengwattana Road, Thung Song Hong Subdistrict
Lak Si District, Bangkok 10210
Use {READ,WRITE}_ONCE() to access kvm->last_boosted_vcpu to ensure the
loads and stores are atomic. In the extremely unlikely scenario the
compiler tears the stores, it's theoretically possible for KVM to attempt
to get a vCPU using an out-of-bounds index, e.g. if the write is split
into multiple 8-bit stores, and is paired with a 32-bit load on a VM with
257 vCPUs:
CPU0 CPU1
last_boosted_vcpu = 0xff;
(last_boosted_vcpu = 0x100)
last_boosted_vcpu[15:8] = 0x01;
i = (last_boosted_vcpu = 0x1ff)
last_boosted_vcpu[7:0] = 0x00;
vcpu = kvm->vcpu_array[0x1ff];
As detected by KCSAN:
BUG: KCSAN: data-race in kvm_vcpu_on_spin [kvm] / kvm_vcpu_on_spin [kvm]
write to 0xffffc90025a92344 of 4 bytes by task 4340 on cpu 16:
kvm_vcpu_on_spin (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4112) kvm
handle_pause (arch/x86/kvm/vmx/vmx.c:5929) kvm_intel
vmx_handle_exit (arch/x86/kvm/vmx/vmx.c:?
arch/x86/kvm/vmx/vmx.c:6606) kvm_intel
vcpu_run (arch/x86/kvm/x86.c:11107 arch/x86/kvm/x86.c:11211) kvm
kvm_arch_vcpu_ioctl_run (arch/x86/kvm/x86.c:?) kvm
kvm_vcpu_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:?) kvm
__se_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:904 fs/ioctl.c:890)
__x64_sys_ioctl (fs/ioctl.c:890)
x64_sys_call (arch/x86/entry/syscall_64.c:33)
do_syscall_64 (arch/x86/entry/common.c:?)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
read to 0xffffc90025a92344 of 4 bytes by task 4342 on cpu 4:
kvm_vcpu_on_spin (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4069) kvm
handle_pause (arch/x86/kvm/vmx/vmx.c:5929) kvm_intel
vmx_handle_exit (arch/x86/kvm/vmx/vmx.c:?
arch/x86/kvm/vmx/vmx.c:6606) kvm_intel
vcpu_run (arch/x86/kvm/x86.c:11107 arch/x86/kvm/x86.c:11211) kvm
kvm_arch_vcpu_ioctl_run (arch/x86/kvm/x86.c:?) kvm
kvm_vcpu_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:?) kvm
__se_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:904 fs/ioctl.c:890)
__x64_sys_ioctl (fs/ioctl.c:890)
x64_sys_call (arch/x86/entry/syscall_64.c:33)
do_syscall_64 (arch/x86/entry/common.c:?)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
value changed: 0x00000012 -> 0x00000000
Fixes: 217ece6129f2 ("KVM: use yield_to instead of sleep in kvm_vcpu_on_spin")
Cc: stable(a)vger.kernel.org
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changelog:
v2:
* Reworded the git commit as suggested by Sean
* Dropped the me->kvm->last_boosted_vcpu in favor of
kvm->last_boosted_vcpu as suggested by Sean
v1:
* https://lore.kernel.org/all/20240509090146.146153-1-leitao@debian.org/
---
virt/kvm/kvm_main.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ff0a20565f90..d9ce063c76f9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4066,12 +4066,13 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
{
struct kvm *kvm = me->kvm;
struct kvm_vcpu *vcpu;
- int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
+ int last_boosted_vcpu;
unsigned long i;
int yielded = 0;
int try = 3;
int pass;
+ last_boosted_vcpu = READ_ONCE(kvm->last_boosted_vcpu);
kvm_vcpu_set_in_spin_loop(me, true);
/*
* We boost the priority of a VCPU that is runnable but not
@@ -4109,7 +4110,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
yielded = kvm_vcpu_yield_to(vcpu);
if (yielded > 0) {
- kvm->last_boosted_vcpu = i;
+ WRITE_ONCE(kvm->last_boosted_vcpu, i);
break;
} else if (yielded < 0) {
try--;
--
2.43.0
The patch titled
Subject: nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-nilfs_empty_dir-misjudgment-and-long-loop-on-i-o-errors.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors
Date: Tue, 4 Jun 2024 22:42:55 +0900
The error handling in nilfs_empty_dir() when a directory folio/page read
fails is incorrect, as in the old ext2 implementation, and if the
folio/page cannot be read or nilfs_check_folio() fails, it will falsely
determine the directory as empty and corrupt the file system.
In addition, since nilfs_empty_dir() does not immediately return on a
failed folio/page read, but continues to loop, this can cause a long loop
with I/O if i_size of the directory's inode is also corrupted, causing the
log writer thread to wait and hang, as reported by syzbot.
Fix these issues by making nilfs_empty_dir() immediately return a false
value (0) if it fails to get a directory folio/page.
Link: https://lkml.kernel.org/r/20240604134255.7165-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+c8166c541d3971bf6c87(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c8166c541d3971bf6c87
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/dir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/nilfs2/dir.c~nilfs2-fix-nilfs_empty_dir-misjudgment-and-long-loop-on-i-o-errors
+++ a/fs/nilfs2/dir.c
@@ -607,7 +607,7 @@ int nilfs_empty_dir(struct inode *inode)
kaddr = nilfs_get_folio(inode, i, &folio);
if (IS_ERR(kaddr))
- continue;
+ return 0;
de = (struct nilfs_dir_entry *)kaddr;
kaddr += nilfs_last_byte(inode, i) - NILFS_DIR_REC_LEN(1);
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-potential-kernel-bug-due-to-lack-of-writeback-flag-waiting.patch
nilfs2-fix-nilfs_empty_dir-misjudgment-and-long-loop-on-i-o-errors.patch
Just ${Subject} since it's a fix for a potential security footgun/DOS, whereever
commit 378979e94e95 ("tcp: remove 64 KByte limit for initial tp->rcv_wnd value")
has been queued up.
Thanks!
Holger
Hi,
This seven series fix an issue reported by kernel test robot [3].
Shuah, I (as well as Kees and Sean [4]) think this should be in -next
really soon to make sure everything works fine for the v6.9 release,
which is not currently the case. I cannot test against all kselftests
though. I would prefer to let you handle this, but I guess you're not
able to do so and I'll push it on my branch without reply from you.
Even if I push it on my branch, please push it on yours too as soon as
you see this and I'll remove it from mine.
Mark, Jakub, could you please test this series?
As reported by Kernel Test Robot [1] and Sean Christopherson [2], some
tests fail since v6.9-rc1 . This is due to the use of vfork() which
introduced some side effects. Similarly, while making it more generic,
a previous commit made some Landlock file system tests flaky, and
subject to the host's file system mount configuration.
This series fixes all these side effects by replacing vfork() with
clone3() and CLONE_VFORK, which is cleaner (no arbitrary shared memory)
and makes the Kselftest framework more robust.
I tried different approaches and I found this one to be the cleaner and
less invasive for current test cases.
I successfully ran the following tests (using TEST_F and
fork/clone/clone3, and KVM_ONE_VCPU_TEST) with this series:
- kvm:fix_hypercall_test
- kvm:sync_regs_test
- kvm:userspace_msr_exit_test
- kvm:vmx_pmu_caps_test
- landlock:fs_test
- landlock:net_test
- landlock:ptrace_test
- move_mount_set_group:move_mount_set_group_test
- net/af_unix:scm_pidfd
- perf_events:remove_on_exec
- pidfd:pidfd_getfd_test
- pidfd:pidfd_setns_test
- seccomp:seccomp_bpf
- user_events:abi_test
[1] https://lore.kernel.org/oe-lkp/202403291015.1fcfa957-oliver.sang@intel.com
[2] https://lore.kernel.org/r/ZjPelW6-AbtYvslu@google.com
[3] https://lore.kernel.org/r/202405100339.vfBe0t9C-lkp@intel.com
[4] https://lore.kernel.org/r/202405061002.01D399877A@keescook
Previous versions:
v1: https://lore.kernel.org/r/20240426172252.1862930-1-mic@digikod.net
v2: https://lore.kernel.org/r/20240429130931.2394118-1-mic@digikod.net
v3: https://lore.kernel.org/r/20240429191911.2552580-1-mic@digikod.net
v4: https://lore.kernel.org/r/20240502210926.145539-1-mic@digikod.net
v5: https://lore.kernel.org/r/20240503105820.300927-1-mic@digikod.net
v6: https://lore.kernel.org/r/20240506165518.474504-1-mic@digikod.net
Regards,
Mickaël Salaün (10):
selftests/pidfd: Fix config for pidfd_setns_test
selftests/landlock: Fix FS tests when run on a private mount point
selftests/harness: Fix fixture teardown
selftests/harness: Fix interleaved scheduling leading to race
conditions
selftests/landlock: Do not allocate memory in fixture data
selftests/harness: Constify fixture variants
selftests/pidfd: Fix wrong expectation
selftests/harness: Share _metadata between forked processes
selftests/harness: Fix vfork() side effects
selftests/harness: Handle TEST_F()'s explicit exit codes
tools/testing/selftests/kselftest_harness.h | 127 +++++++++++++-----
tools/testing/selftests/landlock/fs_test.c | 83 +++++++-----
tools/testing/selftests/pidfd/config | 2 +
.../selftests/pidfd/pidfd_setns_test.c | 2 +-
4 files changed, 147 insertions(+), 67 deletions(-)
base-commit: e67572cd2204894179d89bd7b984072f19313b03
--
2.45.0
On the Qualcomm RB1 and RB2 platforms the I2C bus connected to the
LT9611UXC bridge under some circumstances can go into a state when all
transfers timeout. This causes both issues with fetching of EDID and
with updating of the bridge's firmware.
While we are debugging the issue, switch corresponding I2C bus to use
i2c-gpio driver. While using i2c-gpio no communication issues are
observed.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
---
Dmitry Baryshkov (2):
arm64: dts: qcom: qrb2210-rb1: switch I2C2 to i2c-gpio
arm64: dts: qcom: qrb4210-rb2: switch I2C2 to i2c-gpio
arch/arm64/boot/dts/qcom/qrb2210-rb1.dts | 13 ++++++++++++-
arch/arm64/boot/dts/qcom/qrb4210-rb2.dts | 13 ++++++++++++-
2 files changed, 24 insertions(+), 2 deletions(-)
---
base-commit: 0e1980c40b6edfa68b6acf926bab22448a6e40c9
change-id: 20240604-rb12-i2c2g-pio-f6035fa8e022
Best regards,
--
Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
On SC7280, in host mode, it is observed that stressing out controller
results in HC died error:
xhci-hcd.12.auto: xHCI host not responding to stop endpoint command
xhci-hcd.12.auto: xHCI host controller not responding, assume dead
xhci-hcd.12.auto: HC died; cleaning up
And at this instant only restarting the host mode fixes it. Disable
SuperSpeed instances in park mode for SC7280 to mitigate this issue.
Reported-by: Doug Anderson <dianders(a)google.com>
Cc: <stable(a)vger.kernel.org>
Fixes: bb9efa59c665 ("arm64: dts: qcom: sc7280: Add USB related nodes")
Signed-off-by: Krishna Kurapati <quic_kriskura(a)quicinc.com>
---
Removed RB/TB tag from Doug as commit text was updated.
arch/arm64/boot/dts/qcom/sc7280.dtsi | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/boot/dts/qcom/sc7280.dtsi b/arch/arm64/boot/dts/qcom/sc7280.dtsi
index 41f51d326111..e0d3eeb6f639 100644
--- a/arch/arm64/boot/dts/qcom/sc7280.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280.dtsi
@@ -4131,6 +4131,7 @@ usb_1_dwc3: usb@a600000 {
iommus = <&apps_smmu 0xe0 0x0>;
snps,dis_u2_susphy_quirk;
snps,dis_enblslpm_quirk;
+ snps,parkmode-disable-ss-quirk;
phys = <&usb_1_hsphy>, <&usb_1_qmpphy QMP_USB43DP_USB3_PHY>;
phy-names = "usb2-phy", "usb3-phy";
maximum-speed = "super-speed";
--
2.34.1
On SC7180, in host mode, it is observed that stressing out controller
results in HC died error:
xhci-hcd.12.auto: xHCI host not responding to stop endpoint command
xhci-hcd.12.auto: xHCI host controller not responding, assume dead
xhci-hcd.12.auto: HC died; cleaning up
And at this instant only restarting the host mode fixes it. Disable
SuperSpeed instances in park mode for SC7180 to mitigate this issue.
Reported-by: Doug Anderson <dianders(a)google.com>
Cc: <stable(a)vger.kernel.org>
Fixes: 0b766e7fe5a2 ("arm64: dts: qcom: sc7180: Add USB related nodes")
Signed-off-by: Krishna Kurapati <quic_kriskura(a)quicinc.com>
---
Removed RB/TB tag from Doug as commit text was updated.
arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/boot/dts/qcom/sc7180.dtsi b/arch/arm64/boot/dts/qcom/sc7180.dtsi
index 2b481e20ae38..cc93b5675d5d 100644
--- a/arch/arm64/boot/dts/qcom/sc7180.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7180.dtsi
@@ -3063,6 +3063,7 @@ usb_1_dwc3: usb@a600000 {
iommus = <&apps_smmu 0x540 0>;
snps,dis_u2_susphy_quirk;
snps,dis_enblslpm_quirk;
+ snps,parkmode-disable-ss-quirk;
phys = <&usb_1_hsphy>, <&usb_1_qmpphy QMP_USB43DP_USB3_PHY>;
phy-names = "usb2-phy", "usb3-phy";
maximum-speed = "super-speed";
--
2.34.1
I'm not seeing a test mail for v6.9.4-rc1 but it's in the stable-rc git
and I'm seeing extensive breakage on many platforms with it due to the
backporting of c0e0f139354 ("drm: Make drivers depends on DRM_DW_HDMI")
which was reverted upstream in commit 8f7f115596d3dcced ("Revert "drm:
Make drivers depends on DRM_DW_HDMI"") for a combination of the reasons
outlined in that revert and the extensive breakage that it cause in
-next.
jmicron_pmos() and sdhci_pci_probe() use pci_{read,write}_config_byte()
that return PCIBIOS_* codes. The return code is then returned as is by
jmicron_probe() and sdhci_pci_probe(). Similarly, the return code is
also returned as is from jmicron_resume(). Both probe and resume
functions should return normal errnos.
Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal
errno before returning them the fix these issues.
Fixes: 7582041ff3d4 ("mmc: sdhci-pci: fix simple_return.cocci warnings")
Fixes: 45211e215984 ("sdhci: toggle JMicron PMOS setting")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
drivers/mmc/host/sdhci-pci-core.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c
index ef89ec382bfe..23e6ba70144c 100644
--- a/drivers/mmc/host/sdhci-pci-core.c
+++ b/drivers/mmc/host/sdhci-pci-core.c
@@ -1326,7 +1326,7 @@ static int jmicron_pmos(struct sdhci_pci_chip *chip, int on)
ret = pci_read_config_byte(chip->pdev, 0xAE, &scratch);
if (ret)
- return ret;
+ goto fail;
/*
* Turn PMOS on [bit 0], set over current detection to 2.4 V
@@ -1337,7 +1337,10 @@ static int jmicron_pmos(struct sdhci_pci_chip *chip, int on)
else
scratch &= ~0x47;
- return pci_write_config_byte(chip->pdev, 0xAE, scratch);
+ ret = pci_write_config_byte(chip->pdev, 0xAE, scratch);
+
+fail:
+ return pcibios_err_to_errno(ret);
}
static int jmicron_probe(struct sdhci_pci_chip *chip)
@@ -2202,7 +2205,7 @@ static int sdhci_pci_probe(struct pci_dev *pdev,
ret = pci_read_config_byte(pdev, PCI_SLOT_INFO, &slots);
if (ret)
- return ret;
+ return pcibios_err_to_errno(ret);
slots = PCI_SLOT_INFO_SLOTS(slots) + 1;
dev_dbg(&pdev->dev, "found %d slot(s)\n", slots);
@@ -2211,7 +2214,7 @@ static int sdhci_pci_probe(struct pci_dev *pdev,
ret = pci_read_config_byte(pdev, PCI_SLOT_INFO, &first_bar);
if (ret)
- return ret;
+ return pcibios_err_to_errno(ret);
first_bar &= PCI_SLOT_INFO_FIRST_BAR_MASK;
--
2.39.2
gpu_get_node_map() uses pci_read_config_dword() that returns PCIBIOS_*
codes. The return code is then returned all the way into the module
init function amd64_edac_init() that returns it as is. The module init
functions, however, should return normal errnos.
Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal
errno before returning it from gpu_get_node_map().
For consistency, convert also the other similar cases which return
PCIBIOS_* codes even if they do not have any bugs at the moment.
Fixes: 4251566ebc1c ("EDAC/amd64: Cache and use GPU node map")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
drivers/edac/amd64_edac.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 1f3520d76861..a17f3c0cdfa6 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -81,7 +81,7 @@ int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,
amd64_warn("%s: error reading F%dx%03x.\n",
func, PCI_FUNC(pdev->devfn), offset);
- return err;
+ return pcibios_err_to_errno(err);
}
int __amd64_write_pci_cfg_dword(struct pci_dev *pdev, int offset,
@@ -94,7 +94,7 @@ int __amd64_write_pci_cfg_dword(struct pci_dev *pdev, int offset,
amd64_warn("%s: error writing to F%dx%03x.\n",
func, PCI_FUNC(pdev->devfn), offset);
- return err;
+ return pcibios_err_to_errno(err);
}
/*
@@ -1025,8 +1025,10 @@ static int gpu_get_node_map(struct amd64_pvt *pvt)
}
ret = pci_read_config_dword(pdev, REG_LOCAL_NODE_TYPE_MAP, &tmp);
- if (ret)
+ if (ret) {
+ ret = pcibios_err_to_errno(ret);
goto out;
+ }
gpu_node_map.node_count = FIELD_GET(LNTM_NODE_COUNT, tmp);
gpu_node_map.base_node_id = FIELD_GET(LNTM_BASE_NODE_ID, tmp);
--
2.39.2
Some code includes the __used macro to prevent functions and data from
being optimized out. This macro implements __attribute__((__used__)), which
operates at the compiler and IR-level, and so still allows a linker to
remove objects intended to be kept.
Compilers supporting __attribute__((__retain__)) can address this gap by
setting the flag SHF_GNU_RETAIN on the section of a function/variable,
indicating to the linker the object should be retained. This attribute is
available since gcc 11, clang 13, and binutils 2.36.
Provide a __retain macro implementing __attribute__((__retain__)), whose
first user will be the '__bpf_kfunc' tag.
Link: https://lore.kernel.org/bpf/ZlmGoT9KiYLZd91S@krava/T/
Cc: stable(a)vger.kernel.org # v6.6+
Signed-off-by: Tony Ambardar <Tony.Ambardar(a)gmail.com>
---
include/linux/compiler_attributes.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
index 32284cd26d52..1c22e1a734dc 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -326,6 +326,20 @@
*/
#define __pure __attribute__((__pure__))
+/*
+ * Optional: only supported since gcc >= 11, clang >= 13
+ *
+ * gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-re…
+ * clang: https://clang.llvm.org/docs/AttributeReference.html#retain
+ */
+#if __has_attribute(__retain__) && \
+ (defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || \
+ defined(CONFIG_LTO_CLANG))
+# define __retain __attribute__((__retain__))
+#else
+# define __retain
+#endif
+
/*
* gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-se…
* gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-se…
--
2.34.1
Hi Sasha,
A week ago, I notified the stable mailing list that this patch is not
a bug fix that should be backported, and Greg removed it from the
backport queue to the stable trees.
[1] https://lkml.kernel.org/r/CAKFNMo=kyzbvfLrTv8JhuY=e7-fkjtpL3DvcQ1r+RUPPeC4S…
On Tue, May 28, 2024 at 3:28 AM Greg KH <greg(a)kroah.com> wrote:
> > This commit fixes the sparse warning output by build "make C=1" with
> > the sparse check, but does not fix any operational bugs.
> >
> > Therefore, if fixing a harmless sparse warning does not meet the
> > requirements for backporting to stable trees (I assume it does),
> > please drop it as it is a false positive pickup. Sorry if the
> > "Fixes:" tag is confusing.
> >
> > The same goes for the same patch queued to other stable-trees.
>
> Now dropped, thanks!
>
> greg k-h
So I think this is just a case where the "Fixes" tag was mechanically
detected and mistakenly picked up again.
Could you please confirm?
Regards,
Ryusuke Konishi
On Mon, Jun 3, 2024 at 9:11 PM Sasha Levin wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> nilfs2: make superblock data array index computation sparse friendly
>
> to the 6.1-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> nilfs2-make-superblock-data-array-index-computation-.patch
> and it can be found in the queue-6.1 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 358bc3e8f5a5e2c51fc07aadb70e25fa206e764b
> Author: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
> Date: Tue Apr 30 17:00:19 2024 +0900
>
> nilfs2: make superblock data array index computation sparse friendly
>
> [ Upstream commit 91d743a9c8299de1fc1b47428d8bb4c85face00f ]
>
> Upon running sparse, "warning: dubious: x & !y" is output at an array
> index calculation within nilfs_load_super_block().
>
> The calculation is not wrong, but to eliminate the sparse warning, replace
> it with an equivalent calculation.
>
> Also, add a comment to make it easier to understand what the unintuitive
> array index calculation is doing and whether it's correct.
>
> Link: https://lkml.kernel.org/r/20240430080019.4242-3-konishi.ryusuke@gmail.com
> Fixes: e339ad31f599 ("nilfs2: introduce secondary super block")
> Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
> Cc: Bart Van Assche <bvanassche(a)acm.org>
> Cc: Jens Axboe <axboe(a)kernel.dk>
> Cc: kernel test robot <lkp(a)intel.com>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
> index 71400496ed365..3e3c1d32da180 100644
> --- a/fs/nilfs2/the_nilfs.c
> +++ b/fs/nilfs2/the_nilfs.c
> @@ -592,7 +592,7 @@ static int nilfs_load_super_block(struct the_nilfs *nilfs,
> struct nilfs_super_block **sbp = nilfs->ns_sbp;
> struct buffer_head **sbh = nilfs->ns_sbh;
> u64 sb2off, devsize = bdev_nr_bytes(nilfs->ns_bdev);
> - int valid[2], swp = 0;
> + int valid[2], swp = 0, older;
>
> if (devsize < NILFS_SEG_MIN_BLOCKS * NILFS_MIN_BLOCK_SIZE + 4096) {
> nilfs_err(sb, "device size too small");
> @@ -648,9 +648,25 @@ static int nilfs_load_super_block(struct the_nilfs *nilfs,
> if (swp)
> nilfs_swap_super_block(nilfs);
>
> + /*
> + * Calculate the array index of the older superblock data.
> + * If one has been dropped, set index 0 pointing to the remaining one,
> + * otherwise set index 1 pointing to the old one (including if both
> + * are the same).
> + *
> + * Divided case valid[0] valid[1] swp -> older
> + * -------------------------------------------------------------
> + * Both SBs are invalid 0 0 N/A (Error)
> + * SB1 is invalid 0 1 1 0
> + * SB2 is invalid 1 0 0 0
> + * SB2 is newer 1 1 1 0
> + * SB2 is older or the same 1 1 0 1
> + */
> + older = valid[1] ^ swp;
> +
> nilfs->ns_sbwcount = 0;
> nilfs->ns_sbwtime = le64_to_cpu(sbp[0]->s_wtime);
> - nilfs->ns_prot_seq = le64_to_cpu(sbp[valid[1] & !swp]->s_last_seq);
> + nilfs->ns_prot_seq = le64_to_cpu(sbp[older]->s_last_seq);
> *sbpp = sbp[0];
> return 0;
> }
Not for 6.1 and earlier kernels.
-------- Forwarded Message --------
Date: Sat, 11 May 2024 09:05:18 -0400
Subject: Re: Patch "nfc: nci: Fix kcov check in nci_rx_work()" has been added to the 6.1-stable tree
Message-ID: <Zj9tDunQd3BDcG2a@sashalap>
On Sat, May 11, 2024 at 07:53:00AM +0900, Tetsuo Handa wrote:
>On 2024/05/11 6:39, Sasha Levin wrote:
>> This is a note to let you know that I've just added the patch titled
>>
>> nfc: nci: Fix kcov check in nci_rx_work()
>>
>> to the 6.1-stable tree which can be found at:
>> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>>
>> The filename of the patch is:
>> nfc-nci-fix-kcov-check-in-nci_rx_work.patch
>> and it can be found in the queue-6.1 subdirectory.
>>
>> If you, or anyone else, feels it should not be added to the stable tree,
>> please let <stable(a)vger.kernel.org> know about it.
>>
>
>I think we should not add this patch to 6.1 and earlier kernels, for
>only 6.2 and later kernels call kcov_remote_stop() from nci_rx_work().
Dropped, thanks!
--
Thanks,
Sasha
On 2024/06/03 21:13, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> nfc: nci: Fix kcov check in nci_rx_work()
>
> to the 6.1-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> nfc-nci-fix-kcov-check-in-nci_rx_work.patch
> and it can be found in the queue-6.1 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
On Thu, 30 May 2024 21:02:36 +0200,
Sasha Levin wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> ALSA: timer: Set lower bound of start tick time
>
> to the 6.8-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> alsa-timer-set-lower-bound-of-start-tick-time.patch
> and it can be found in the queue-6.8 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Please drop this one for 6.8 and older (you posted for 6.6 too).
As already explained in another mail, this commit needs a prerequisite
use of guard().
An alternative patch has been already submitted. Take it instead:
https://lore.kernel.org/all/20240527062431.18709-1-tiwai@suse.de/
thanks,
Takashi
>
>
>
> commit d717dbdb94145bee1e9cf9eca387d973564203c4
> Author: Takashi Iwai <tiwai(a)suse.de>
> Date: Tue May 14 20:27:36 2024 +0200
>
> ALSA: timer: Set lower bound of start tick time
>
> [ Upstream commit 4a63bd179fa8d3fcc44a0d9d71d941ddd62f0c4e ]
>
> Currently ALSA timer doesn't have the lower limit of the start tick
> time, and it allows a very small size, e.g. 1 tick with 1ns resolution
> for hrtimer. Such a situation may lead to an unexpected RCU stall,
> where the callback repeatedly queuing the expire update, as reported
> by fuzzer.
>
> This patch introduces a sanity check of the timer start tick time, so
> that the system returns an error when a too small start size is set.
> As of this patch, the lower limit is hard-coded to 100us, which is
> small enough but can still work somehow.
>
> Reported-by: syzbot+43120c2af6ca2938cc38(a)syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/r/000000000000fa00a1061740ab6d@google.com
> Cc: <stable(a)vger.kernel.org>
> Link: https://lore.kernel.org/r/20240514182745.4015-1-tiwai@suse.de
> Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/sound/core/timer.c b/sound/core/timer.c
> index e6e551d4a29e0..42c4c2b029526 100644
> --- a/sound/core/timer.c
> +++ b/sound/core/timer.c
> @@ -553,6 +553,14 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
> goto unlock;
> }
>
> + /* check the actual time for the start tick;
> + * bail out as error if it's way too low (< 100us)
> + */
> + if (start) {
> + if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000)
> + return -EINVAL;
> + }
> +
> if (start)
> timeri->ticks = timeri->cticks = ticks;
> else if (!timeri->cticks)
BPF kfuncs are often not directly referenced and may be inadvertently
removed by optimization steps during kernel builds, thus the __bpf_kfunc
tag mitigates against this removal by including the __used macro. However,
this macro alone does not prevent removal during linking, and may still
yield build warnings (e.g. on mips64el):
LD vmlinux
BTFIDS vmlinux
WARN: resolve_btfids: unresolved symbol bpf_verify_pkcs7_signature
WARN: resolve_btfids: unresolved symbol bpf_lookup_user_key
WARN: resolve_btfids: unresolved symbol bpf_lookup_system_key
WARN: resolve_btfids: unresolved symbol bpf_key_put
WARN: resolve_btfids: unresolved symbol bpf_iter_task_next
WARN: resolve_btfids: unresolved symbol bpf_iter_css_task_new
WARN: resolve_btfids: unresolved symbol bpf_get_file_xattr
WARN: resolve_btfids: unresolved symbol bpf_ct_insert_entry
WARN: resolve_btfids: unresolved symbol bpf_cgroup_release
WARN: resolve_btfids: unresolved symbol bpf_cgroup_from_id
WARN: resolve_btfids: unresolved symbol bpf_cgroup_acquire
WARN: resolve_btfids: unresolved symbol bpf_arena_free_pages
NM System.map
SORTTAB vmlinux
OBJCOPY vmlinux.32
Update the __bpf_kfunc tag to better guard against linker optimization by
including the new __retain compiler macro, which fixes the warnings above.
Verify the __retain macro with readelf by checking object flags for 'R':
$ readelf -Wa kernel/trace/bpf_trace.o
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
...
[178] .text.bpf_key_put PROGBITS 00000000 6420 0050 00 AXR 0 0 8
...
Key to Flags:
...
R (retain), D (mbind), p (processor specific)
Link: https://lore.kernel.org/bpf/ZlmGoT9KiYLZd91S@krava/T/
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/r/202401211357.OCX9yllM-lkp@intel.com/
Fixes: 57e7c169cd6a ("bpf: Add __bpf_kfunc tag for marking kernel functions as kfuncs")
Cc: stable(a)vger.kernel.org # v6.6+
Signed-off-by: Tony Ambardar <Tony.Ambardar(a)gmail.com>
---
include/linux/btf.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/btf.h b/include/linux/btf.h
index f9e56fd12a9f..7c3e40c3295e 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -82,7 +82,7 @@
* as to avoid issues such as the compiler inlining or eliding either a static
* kfunc, or a global kfunc in an LTO build.
*/
-#define __bpf_kfunc __used noinline
+#define __bpf_kfunc __used __retain noinline
#define __bpf_kfunc_start_defs() \
__diag_push(); \
--
2.34.1
On Mon, 03 Jun 2024 12:52:54 +0100,
Sasha Levin <sashal(a)kernel.org> wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> KVM: arm64: nv: Add sanitising to VNCR-backed sysregs
>
> to the 6.8-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> kvm-arm64-nv-add-sanitising-to-vncr-backed-sysregs.patch
> and it can be found in the queue-6.8 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit fbb2bcdc458dd7db3860f85a06e98cc25904d20d
> Author: Marc Zyngier <maz(a)kernel.org>
> Date: Wed Feb 14 13:18:04 2024 +0000
>
> KVM: arm64: nv: Add sanitising to VNCR-backed sysregs
>
> [ Upstream commit 888f0880702293096619b300150cd7e59fcd9743 ]
>
> VNCR-backed "registers" are actually only memory. Which means that
> there is zero control over what the guest can write, and that it
> is the hypervisor's job to actually sanitise the content of the
> backing store. Yeah, this is fun.
>
> In order to preserve some form of sanity, add a repainting mechanism
> that makes use of a per-VM set of RES0/RES1 masks, one pair per VNCR
> register. These masks get applied on access to the backing store via
> __vcpu_sys_reg(), ensuring that the state that is consumed by KVM is
> correct.
>
> So far, nothing populates these masks, but stay tuned.
>
> Signed-off-by: Marc Zyngier <maz(a)kernel.org>
> Reviewed-by: Joey Gouly <joey.gouly(a)arm.com>
> Link: https://lore.kernel.org/r/20240214131827.2856277-4-maz@kernel.org
> Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
> Stable-dep-of: ce5d2448eb8f ("KVM: arm64: Destroy mpidr_data for 'late' vCPU creation")
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Please drop this. It serves no purpose on 6.8 aside from wasting
memory. If backporting ce5d2448eb8f is hard due to some conflicts,
we'll tackle it ourselves.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
The following commit has been merged into the irq/urgent branch of tip:
Commit-ID: e306a894bd511804ba9db7c00ca9cc05b55df1f2
Gitweb: https://git.kernel.org/tip/e306a894bd511804ba9db7c00ca9cc05b55df1f2
Author: Samuel Holland <samuel.holland(a)sifive.com>
AuthorDate: Wed, 29 May 2024 14:54:56 -07:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 03 Jun 2024 13:53:12 +02:00
irqchip/sifive-plic: Chain to parent IRQ after handlers are ready
Now that the PLIC uses a platform driver, the driver is probed later in the
boot process, where interrupts from peripherals might already be pending.
As a result, plic_handle_irq() may be called as early as the call to
irq_set_chained_handler() completes. But this call happens before the
per-context handler is completely set up, so there is a window where
plic_handle_irq() can see incomplete per-context state and crash.
Avoid this by delaying the call to irq_set_chained_handler() until all
handlers from all PLICs are initialized.
Fixes: 8ec99b033147 ("irqchip/sifive-plic: Convert PLIC driver into a platform driver")
Reported-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Signed-off-by: Samuel Holland <samuel.holland(a)sifive.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Geert Uytterhoeven <geert+renesas(a)glider.be>
Reviewed-by: Anup Patel <anup(a)brainfault.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240529215458.937817-1-samuel.holland@sifive.com
Closes: https://lore.kernel.org/r/CAMuHMdVYFFR7K5SbHBLY-JHhb7YpgGMS_hnRWm8H0KD-wBo+…
---
drivers/irqchip/irq-sifive-plic.c | 34 +++++++++++++++---------------
1 file changed, 17 insertions(+), 17 deletions(-)
diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index 8fb183c..9e22f7e 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -85,7 +85,7 @@ struct plic_handler {
struct plic_priv *priv;
};
static int plic_parent_irq __ro_after_init;
-static bool plic_cpuhp_setup_done __ro_after_init;
+static bool plic_global_setup_done __ro_after_init;
static DEFINE_PER_CPU(struct plic_handler, plic_handlers);
static int plic_irq_set_type(struct irq_data *d, unsigned int type);
@@ -487,10 +487,8 @@ static int plic_probe(struct platform_device *pdev)
unsigned long plic_quirks = 0;
struct plic_handler *handler;
u32 nr_irqs, parent_hwirq;
- struct irq_domain *domain;
struct plic_priv *priv;
irq_hw_number_t hwirq;
- bool cpuhp_setup;
if (is_of_node(dev->fwnode)) {
const struct of_device_id *id;
@@ -549,14 +547,6 @@ static int plic_probe(struct platform_device *pdev)
continue;
}
- /* Find parent domain and register chained handler */
- domain = irq_find_matching_fwnode(riscv_get_intc_hwnode(), DOMAIN_BUS_ANY);
- if (!plic_parent_irq && domain) {
- plic_parent_irq = irq_create_mapping(domain, RV_IRQ_EXT);
- if (plic_parent_irq)
- irq_set_chained_handler(plic_parent_irq, plic_handle_irq);
- }
-
/*
* When running in M-mode we need to ignore the S-mode handler.
* Here we assume it always comes later, but that might be a
@@ -597,25 +587,35 @@ done:
goto fail_cleanup_contexts;
/*
- * We can have multiple PLIC instances so setup cpuhp state
+ * We can have multiple PLIC instances so setup global state
* and register syscore operations only once after context
* handlers of all online CPUs are initialized.
*/
- if (!plic_cpuhp_setup_done) {
- cpuhp_setup = true;
+ if (!plic_global_setup_done) {
+ struct irq_domain *domain;
+ bool global_setup = true;
+
for_each_online_cpu(cpu) {
handler = per_cpu_ptr(&plic_handlers, cpu);
if (!handler->present) {
- cpuhp_setup = false;
+ global_setup = false;
break;
}
}
- if (cpuhp_setup) {
+
+ if (global_setup) {
+ /* Find parent domain and register chained handler */
+ domain = irq_find_matching_fwnode(riscv_get_intc_hwnode(), DOMAIN_BUS_ANY);
+ if (domain)
+ plic_parent_irq = irq_create_mapping(domain, RV_IRQ_EXT);
+ if (plic_parent_irq)
+ irq_set_chained_handler(plic_parent_irq, plic_handle_irq);
+
cpuhp_setup_state(CPUHP_AP_IRQ_SIFIVE_PLIC_STARTING,
"irqchip/sifive/plic:starting",
plic_starting_cpu, plic_dying_cpu);
register_syscore_ops(&plic_irq_syscore_ops);
- plic_cpuhp_setup_done = true;
+ plic_global_setup_done = true;
}
}
When accessing a file with more entries than ES_MAX_ENTRY_NUM, the bh-array
is allocated in __exfat_get_entry_set. The problem is that the bh-array is
allocated with GFP_KERNEL. It does not make sense. In the following cases,
a deadlock for sbi->s_lock between the two processes may occur.
CPU0 CPU1
---- ----
kswapd
balance_pgdat
lock(fs_reclaim)
exfat_iterate
lock(&sbi->s_lock)
exfat_readdir
exfat_get_uniname_from_ext_entry
exfat_get_dentry_set
__exfat_get_dentry_set
kmalloc_array
...
lock(fs_reclaim)
...
evict
exfat_evict_inode
lock(&sbi->s_lock)
To fix this, let's allocate bh-array with GFP_NOFS.
Fixes: a3ff29a95fde ("exfat: support dynamic allocate bh for exfat_entry_set_cache")
Cc: stable(a)vger.kernel.org # v6.2+
Reported-by: syzbot+412a392a2cd4a65e71db(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/000000000000fef47e0618c0327f@google.com
Signed-off-by: Sungjong Seo <sj1557.seo(a)samsung.com>
---
fs/exfat/dir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
index 84572e11cc05..7446bf09a04a 100644
--- a/fs/exfat/dir.c
+++ b/fs/exfat/dir.c
@@ -813,7 +813,7 @@ static int __exfat_get_dentry_set(struct exfat_entry_set_cache *es,
num_bh = EXFAT_B_TO_BLK_ROUND_UP(off + num_entries * DENTRY_SIZE, sb);
if (num_bh > ARRAY_SIZE(es->__bh)) {
- es->bh = kmalloc_array(num_bh, sizeof(*es->bh), GFP_KERNEL);
+ es->bh = kmalloc_array(num_bh, sizeof(*es->bh), GFP_NOFS);
if (!es->bh) {
brelse(bh);
return -ENOMEM;
--
2.25.1
Currently return status is not getting checked for get_api_version
and because of that for x86 arch we are getting below smatch error.
CC drivers/soc/xilinx/zynqmp_power.o
drivers/soc/xilinx/zynqmp_power.c: In function 'zynqmp_pm_probe':
drivers/soc/xilinx/zynqmp_power.c:295:12: warning: 'pm_api_version' is
used uninitialized [-Wuninitialized]
295 | if (pm_api_version < ZYNQMP_PM_VERSION)
| ^
CHECK drivers/soc/xilinx/zynqmp_power.c
drivers/soc/xilinx/zynqmp_power.c:295 zynqmp_pm_probe() error:
uninitialized symbol 'pm_api_version'.
So, check return status of pm_get_api_version and return error in case
of failure to avoid checking uninitialized pm_api_version variable.
Fixes: b9b3a8be28b3 ("firmware: xilinx: Remove eemi ops for get_api_version")
Signed-off-by: Jay Buddhabhatti <jay.buddhabhatti(a)amd.com>
Cc: stable(a)vger.kernel.org
---
V1: https://lore.kernel.org/lkml/20240424063118.23200-1-jay.buddhabhatti@amd.co…
V2: https://lore.kernel.org/lkml/20240509045616.22338-1-jay.buddhabhatti@amd.co…
V2->V3: Added stable tree email in cc
V1->V2: Removed AMD copyright
---
drivers/soc/xilinx/zynqmp_power.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/soc/xilinx/zynqmp_power.c b/drivers/soc/xilinx/zynqmp_power.c
index 965b1143936a..b82c01373f53 100644
--- a/drivers/soc/xilinx/zynqmp_power.c
+++ b/drivers/soc/xilinx/zynqmp_power.c
@@ -190,7 +190,9 @@ static int zynqmp_pm_probe(struct platform_device *pdev)
u32 pm_api_version;
struct mbox_client *client;
- zynqmp_pm_get_api_version(&pm_api_version);
+ ret = zynqmp_pm_get_api_version(&pm_api_version);
+ if (ret)
+ return ret;
/* Check PM API version number */
if (pm_api_version < ZYNQMP_PM_VERSION)
--
2.17.1
Correct the specified regulator-min-microvolt value for the buck DCDC_REG2
regulator, which is part of the Rockchip RK809 PMIC, in the Pine64 Quartz64
Model B board dts. According to the RK809 datasheet, version 1.01, this
regulator is capable of producing voltages as low as 0.5 V on its output,
instead of going down to 0.9 V only, which is additionally confirmed by the
regulator-min-microvolt values found in the board dts files for the other
supported boards that use the same RK809 PMIC.
This allows the DVFS to clock the GPU on the Quartz64 Model B below 700 MHz,
all the way down to 200 MHz, which saves some power and reduces the amount of
generated heat a bit, improving the thermal headroom and possibly improving
the bursty CPU and GPU performance on this board.
This also eliminates the following warnings in the kernel log:
core: _opp_supported_by_regulators: OPP minuV: 825000 maxuV: 825000, not supported by regulator
panfrost fde60000.gpu: _opp_add: OPP not supported by regulators (200000000)
core: _opp_supported_by_regulators: OPP minuV: 825000 maxuV: 825000, not supported by regulator
panfrost fde60000.gpu: _opp_add: OPP not supported by regulators (300000000)
core: _opp_supported_by_regulators: OPP minuV: 825000 maxuV: 825000, not supported by regulator
panfrost fde60000.gpu: _opp_add: OPP not supported by regulators (400000000)
core: _opp_supported_by_regulators: OPP minuV: 825000 maxuV: 825000, not supported by regulator
panfrost fde60000.gpu: _opp_add: OPP not supported by regulators (600000000)
Fixes: dcc8c66bef79 ("arm64: dts: rockchip: add Pine64 Quartz64-B device tree")
Cc: stable(a)vger.kernel.org
Reported-By: Diederik de Haas <didi.debian(a)cknow.org>
Signed-off-by: Dragan Simic <dsimic(a)manjaro.org>
---
arch/arm64/boot/dts/rockchip/rk3566-quartz64-b.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3566-quartz64-b.dts b/arch/arm64/boot/dts/rockchip/rk3566-quartz64-b.dts
index 26322a358d91..b908ce006c26 100644
--- a/arch/arm64/boot/dts/rockchip/rk3566-quartz64-b.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3566-quartz64-b.dts
@@ -289,7 +289,7 @@ vdd_gpu: DCDC_REG2 {
regulator-name = "vdd_gpu";
regulator-always-on;
regulator-boot-on;
- regulator-min-microvolt = <900000>;
+ regulator-min-microvolt = <500000>;
regulator-max-microvolt = <1350000>;
regulator-ramp-delay = <6001>;
The patch titled
Subject: mm/vmalloc: fix vbq->free breakage
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-vmalloc-fix-vbq-free-breakage.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "hailong.liu" <hailong.liu(a)oppo.com>
Subject: mm/vmalloc: fix vbq->free breakage
Date: Thu, 30 May 2024 17:31:08 +0800
The function xa_for_each() in _vm_unmap_aliases() loops through all vbs.
However, since commit 062eacf57ad9 ("mm: vmalloc: remove a global
vmap_blocks xarray") the vb from xarray may not be on the corresponding
CPU vmap_block_queue. Consequently, purge_fragmented_block() might use
the wrong vbq->lock to protect the free list, leading to vbq->free
breakage.
Link: https://lkml.kernel.org/r/20240530093108.4512-1-hailong.liu@oppo.com
Fixes: fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks")
Signed-off-by: Hailong.Liu <liuhailong(a)oppo.com>
Reported-by: Guangye Yang <guangye.yang(a)mediatek.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Gao Xiang <xiang(a)kernel.org>
Cc: Guangye Yang <guangye.yang(a)mediatek.com>
Cc: liuhailong <liuhailong(a)oppo.com>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Cc: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/mm/vmalloc.c~mm-vmalloc-fix-vbq-free-breakage
+++ a/mm/vmalloc.c
@@ -2830,10 +2830,9 @@ static void _vm_unmap_aliases(unsigned l
for_each_possible_cpu(cpu) {
struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
struct vmap_block *vb;
- unsigned long idx;
rcu_read_lock();
- xa_for_each(&vbq->vmap_blocks, idx, vb) {
+ list_for_each_entry_rcu(vb, &vbq->free, free_list) {
spin_lock(&vb->lock);
/*
_
Patches currently in -mm which might be from hailong.liu(a)oppo.com are
mm-vmalloc-fix-vbq-free-breakage.patch
From: Linus Torvalds <torvalds(a)linux-foundation.org>
commit 02b670c1f88e78f42a6c5aee155c7b26960ca054 upstream.
The syzbot-reported stack trace from hell in this discussion thread
actually has three nested page faults:
https://lore.kernel.org/r/000000000000d5f4fc0616e816d4@google.com
... and I think that's actually the important thing here:
- the first page fault is from user space, and triggers the vsyscall
emulation.
- the second page fault is from __do_sys_gettimeofday(), and that should
just have caused the exception that then sets the return value to
-EFAULT
- the third nested page fault is due to _raw_spin_unlock_irqrestore() ->
preempt_schedule() -> trace_sched_switch(), which then causes a BPF
trace program to run, which does that bpf_probe_read_compat(), which
causes that page fault under pagefault_disable().
It's quite the nasty backtrace, and there's a lot going on.
The problem is literally the vsyscall emulation, which sets
current->thread.sig_on_uaccess_err = 1;
and that causes the fixup_exception() code to send the signal *despite* the
exception being caught.
And I think that is in fact completely bogus. It's completely bogus
exactly because it sends that signal even when it *shouldn't* be sent -
like for the BPF user mode trace gathering.
In other words, I think the whole "sig_on_uaccess_err" thing is entirely
broken, because it makes any nested page-faults do all the wrong things.
Now, arguably, I don't think anybody should enable vsyscall emulation any
more, but this test case clearly does.
I think we should just make the "send SIGSEGV" be something that the
vsyscall emulation does on its own, not this broken per-thread state for
something that isn't actually per thread.
The x86 page fault code actually tried to deal with the "incorrect nesting"
by having that:
if (in_interrupt())
return;
which ignores the sig_on_uaccess_err case when it happens in interrupts,
but as shown by this example, these nested page faults do not need to be
about interrupts at all.
IOW, I think the only right thing is to remove that horrendously broken
code.
The attached patch looks like the ObviouslyCorrect(tm) thing to do.
NOTE! This broken code goes back to this commit in 2011:
4fc3490114bb ("x86-64: Set siginfo and context on vsyscall emulation faults")
... and back then the reason was to get all the siginfo details right.
Honestly, I do not for a moment believe that it's worth getting the siginfo
details right here, but part of the commit says:
This fixes issues with UML when vsyscall=emulate.
... and so my patch to remove this garbage will probably break UML in this
situation.
I do not believe that anybody should be running with vsyscall=emulate in
2024 in the first place, much less if you are doing things like UML. But
let's see if somebody screams.
Reported-and-tested-by: syzbot+83e7f982ca045ab4405c(a)syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Tested-by: Jiri Olsa <jolsa(a)kernel.org>
Acked-by: Andy Lutomirski <luto(a)kernel.org>
Link: https://lore.kernel.org/r/CAHk-=wh9D6f7HUkDgZHKmDCHUQmp+Co89GP+b8+z+G56BKey…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[gpiccoli: Backport the patch due to differences in the trees. The main changes
between 5.4.y and 5.15.y are due to renaming the fixup function, by
commit 6456a2a69ee1 ("x86/fault: Rename no_context() to kernelmode_fixup_or_oops()"),
and on processor.h thread_struct due to commit cf122cfba5b1 ("kill uaccess_try()").
Following 2 commits cause divergence in the diffs too (in the removed lines):
cd072dab453a ("x86/fault: Add a helper function to sanitize error code")
d4ffd5df9d18 ("x86/fault: Fix wrong signal when vsyscall fails with pkey").]
Signed-off-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
---
Hi folks, this was backported by AUTOSEL up to 5.15.y; I'm manually submitting
the backport to 5.4.y and 5.10.y. I've detailed a bit the changes necessary
due to other nonrelated missing patches, but these are really simple and
non-intrusive. Nevertheless, I've explicitely CCed x86 ML to be sure the
maintainers are aware of the backport, and if anybody thinks we shouldn't
do it for these (very) old releases, please respond here.
Cheers,
Guilherme
arch/x86/entry/vsyscall/vsyscall_64.c | 28 ++-------------------------
arch/x86/include/asm/processor.h | 1 -
arch/x86/mm/fault.c | 27 +-------------------------
3 files changed, 3 insertions(+), 53 deletions(-)
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index e7c596dea947..86e5a1c1055f 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -98,11 +98,6 @@ static int addr_to_vsyscall_nr(unsigned long addr)
static bool write_ok_or_segv(unsigned long ptr, size_t size)
{
- /*
- * XXX: if access_ok, get_user, and put_user handled
- * sig_on_uaccess_err, this could go away.
- */
-
if (!access_ok((void __user *)ptr, size)) {
struct thread_struct *thread = ¤t->thread;
@@ -120,10 +115,8 @@ static bool write_ok_or_segv(unsigned long ptr, size_t size)
bool emulate_vsyscall(unsigned long error_code,
struct pt_regs *regs, unsigned long address)
{
- struct task_struct *tsk;
unsigned long caller;
int vsyscall_nr, syscall_nr, tmp;
- int prev_sig_on_uaccess_err;
long ret;
unsigned long orig_dx;
@@ -172,8 +165,6 @@ bool emulate_vsyscall(unsigned long error_code,
goto sigsegv;
}
- tsk = current;
-
/*
* Check for access_ok violations and find the syscall nr.
*
@@ -233,12 +224,8 @@ bool emulate_vsyscall(unsigned long error_code,
goto do_ret; /* skip requested */
/*
- * With a real vsyscall, page faults cause SIGSEGV. We want to
- * preserve that behavior to make writing exploits harder.
+ * With a real vsyscall, page faults cause SIGSEGV.
*/
- prev_sig_on_uaccess_err = current->thread.sig_on_uaccess_err;
- current->thread.sig_on_uaccess_err = 1;
-
ret = -EFAULT;
switch (vsyscall_nr) {
case 0:
@@ -261,23 +248,12 @@ bool emulate_vsyscall(unsigned long error_code,
break;
}
- current->thread.sig_on_uaccess_err = prev_sig_on_uaccess_err;
-
check_fault:
if (ret == -EFAULT) {
/* Bad news -- userspace fed a bad pointer to a vsyscall. */
warn_bad_vsyscall(KERN_INFO, regs,
"vsyscall fault (exploit attempt?)");
-
- /*
- * If we failed to generate a signal for any reason,
- * generate one here. (This should be impossible.)
- */
- if (WARN_ON_ONCE(!sigismember(&tsk->pending.signal, SIGBUS) &&
- !sigismember(&tsk->pending.signal, SIGSEGV)))
- goto sigsegv;
-
- return true; /* Don't emulate the ret. */
+ goto sigsegv;
}
regs->ax = ret;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9cbd86cf0deb..087df5edef78 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -487,7 +487,6 @@ struct thread_struct {
mm_segment_t addr_limit;
- unsigned int sig_on_uaccess_err:1;
unsigned int uaccess_err:1; /* uaccess failed */
/* Floating point and extended processor state */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index c494c8c05824..21383ef7b506 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -743,33 +743,8 @@ no_context(struct pt_regs *regs, unsigned long error_code,
}
/* Are we prepared to handle this kernel fault? */
- if (fixup_exception(regs, X86_TRAP_PF, error_code, address)) {
- /*
- * Any interrupt that takes a fault gets the fixup. This makes
- * the below recursive fault logic only apply to a faults from
- * task context.
- */
- if (in_interrupt())
- return;
-
- /*
- * Per the above we're !in_interrupt(), aka. task context.
- *
- * In this case we need to make sure we're not recursively
- * faulting through the emulate_vsyscall() logic.
- */
- if (current->thread.sig_on_uaccess_err && signal) {
- set_signal_archinfo(address, error_code);
-
- /* XXX: hwpoison faults will set the wrong code. */
- force_sig_fault(signal, si_code, (void __user *)address);
- }
-
- /*
- * Barring that, we can do the fixup and be happy.
- */
+ if (fixup_exception(regs, X86_TRAP_PF, error_code, address))
return;
- }
#ifdef CONFIG_VMAP_STACK
/*
--
2.45.1
From: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
Use IRQ ONESHOT flag to ensure the timestamp is not updated in the
hard handler during the thread handler. And use a fixed value of 1
sample that correspond to this first timestamp.
This way we can ensure the timestamp is always corresponding to the
value used by the timestamping mechanism. Otherwise, it is possible
that between FIFO count read and FIFO processing the timestamp is
overwritten in the hard handler.
Fixes: 111e1abd0045 ("iio: imu: inv_mpu6050: use the common inv_sensors timestamp module")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
---
drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c | 4 ++--
drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c | 1 +
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
index 0dc0f22a5582..3d3b27f28c9d 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
@@ -100,8 +100,8 @@ irqreturn_t inv_mpu6050_read_fifo(int irq, void *p)
goto end_session;
/* Each FIFO data contains all sensors, so same number for FIFO and sensor data */
fifo_period = NSEC_PER_SEC / INV_MPU6050_DIVIDER_TO_FIFO_RATE(st->chip_config.divider);
- inv_sensors_timestamp_interrupt(&st->timestamp, nb, pf->timestamp);
- inv_sensors_timestamp_apply_odr(&st->timestamp, fifo_period, nb, 0);
+ inv_sensors_timestamp_interrupt(&st->timestamp, 1, pf->timestamp);
+ inv_sensors_timestamp_apply_odr(&st->timestamp, fifo_period, 1, 0);
/* clear internal data buffer for avoiding kernel data leak */
memset(data, 0, sizeof(data));
diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
index 1b603567ccc8..84273660ca2e 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
@@ -300,6 +300,7 @@ int inv_mpu6050_probe_trigger(struct iio_dev *indio_dev, int irq_type)
if (!st->trig)
return -ENOMEM;
+ irq_type |= IRQF_ONESHOT;
ret = devm_request_threaded_irq(&indio_dev->dev, st->irq,
&inv_mpu6050_interrupt_timestamp,
&inv_mpu6050_interrupt_handle,
--
2.34.1
From: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
Use IRQ_ONESHOT flag to ensure the timestamp is not updated in the
hard handler during the thread handler. And compute and use the
effective watermark value that correspond to this first timestamp.
This way we can ensure the timestamp is always corresponding to the
value used by the timestamping mechanism. Otherwise, it is possible
that between FIFO count read and FIFO processing the timestamp is
overwritten in the hard handler.
Fixes: ec74ae9fd37c ("iio: imu: inv_icm42600: add accurate timestamping")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
---
.../imu/inv_icm42600/inv_icm42600_buffer.c | 19 +++++++++++++++++--
.../imu/inv_icm42600/inv_icm42600_buffer.h | 2 ++
.../iio/imu/inv_icm42600/inv_icm42600_core.c | 1 +
3 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.c b/drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.c
index 63b85ec88c13..a8cf74c84c3c 100644
--- a/drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.c
+++ b/drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.c
@@ -222,10 +222,15 @@ int inv_icm42600_buffer_update_watermark(struct inv_icm42600_state *st)
latency_accel = period_accel * wm_accel;
/* 0 value for watermark means that the sensor is turned off */
+ if (wm_gyro == 0 && wm_accel == 0)
+ return 0;
+
if (latency_gyro == 0) {
watermark = wm_accel;
+ st->fifo.watermark.eff_accel = wm_accel;
} else if (latency_accel == 0) {
watermark = wm_gyro;
+ st->fifo.watermark.eff_gyro = wm_gyro;
} else {
/* compute the smallest latency that is a multiple of both */
if (latency_gyro <= latency_accel)
@@ -241,6 +246,13 @@ int inv_icm42600_buffer_update_watermark(struct inv_icm42600_state *st)
watermark = latency / period;
if (watermark < 1)
watermark = 1;
+ /* update effective watermark */
+ st->fifo.watermark.eff_gyro = latency / period_gyro;
+ if (st->fifo.watermark.eff_gyro < 1)
+ st->fifo.watermark.eff_gyro = 1;
+ st->fifo.watermark.eff_accel = latency / period_accel;
+ if (st->fifo.watermark.eff_accel < 1)
+ st->fifo.watermark.eff_accel = 1;
}
/* compute watermark value in bytes */
@@ -514,7 +526,7 @@ int inv_icm42600_buffer_fifo_parse(struct inv_icm42600_state *st)
/* handle gyroscope timestamp and FIFO data parsing */
if (st->fifo.nb.gyro > 0) {
ts = &gyro_st->ts;
- inv_sensors_timestamp_interrupt(ts, st->fifo.nb.gyro,
+ inv_sensors_timestamp_interrupt(ts, st->fifo.watermark.eff_gyro,
st->timestamp.gyro);
ret = inv_icm42600_gyro_parse_fifo(st->indio_gyro);
if (ret)
@@ -524,7 +536,7 @@ int inv_icm42600_buffer_fifo_parse(struct inv_icm42600_state *st)
/* handle accelerometer timestamp and FIFO data parsing */
if (st->fifo.nb.accel > 0) {
ts = &accel_st->ts;
- inv_sensors_timestamp_interrupt(ts, st->fifo.nb.accel,
+ inv_sensors_timestamp_interrupt(ts, st->fifo.watermark.eff_accel,
st->timestamp.accel);
ret = inv_icm42600_accel_parse_fifo(st->indio_accel);
if (ret)
@@ -577,6 +589,9 @@ int inv_icm42600_buffer_init(struct inv_icm42600_state *st)
unsigned int val;
int ret;
+ st->fifo.watermark.eff_gyro = 1;
+ st->fifo.watermark.eff_accel = 1;
+
/*
* Default FIFO configuration (bits 7 to 5)
* - use invalid value
diff --git a/drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.h b/drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.h
index 8b85ee333bf8..f6c85daf42b0 100644
--- a/drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.h
+++ b/drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.h
@@ -32,6 +32,8 @@ struct inv_icm42600_fifo {
struct {
unsigned int gyro;
unsigned int accel;
+ unsigned int eff_gyro;
+ unsigned int eff_accel;
} watermark;
size_t count;
struct {
diff --git a/drivers/iio/imu/inv_icm42600/inv_icm42600_core.c b/drivers/iio/imu/inv_icm42600/inv_icm42600_core.c
index 96116a68ab29..62fdae530334 100644
--- a/drivers/iio/imu/inv_icm42600/inv_icm42600_core.c
+++ b/drivers/iio/imu/inv_icm42600/inv_icm42600_core.c
@@ -537,6 +537,7 @@ static int inv_icm42600_irq_init(struct inv_icm42600_state *st, int irq,
if (ret)
return ret;
+ irq_type |= IRQF_ONESHOT;
return devm_request_threaded_irq(dev, irq, inv_icm42600_irq_timestamp,
inv_icm42600_irq_handler, irq_type,
"inv_icm42600", st);
--
2.34.1
Hi, I hope this is the right place
I think I found a security bug.
I have a faulty hard disk and sometimes the system doesn't boot
but a root console appears.
it's already the second time and I didn't think to take a photo.
I'm not talking about the control D or Root password screen!
A root console appears directly.
kernel 6.8.11-1 manjaro 64 bit
On Fri, May 31, 2024 at 4:16 AM Sasha Levin <sashal(a)kernel.org> wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> s390/vdso: Create .build-id links for unstripped vdso files
>
> to the 6.1-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> s390-vdso-create-.build-id-links-for-unstripped-vdso.patch
> and it can be found in the queue-6.1 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit e5fcc928a0c3c2fe5e6e2c66a16b7d67334fac02
> Author: Jens Remus <jremus(a)linux.ibm.com>
> Date: Mon Apr 29 17:02:53 2024 +0200
>
> s390/vdso: Create .build-id links for unstripped vdso files
>
> [ Upstream commit fc2f5f10f9bc5e58d38e9fda7dae107ac04a799f ]
>
> Citing Andy Lutomirski from commit dda1e95cee38 ("x86/vdso: Create
> .build-id links for unstripped vdso files"):
>
> "With this change, doing 'make vdso_install' and telling gdb:
>
> set debug-file-directory /lib/modules/KVER/vdso
>
> will enable vdso debugging with symbols. This is useful for
> testing, but kernel RPM builds will probably want to manually delete
> these symlinks or otherwise do something sensible when they strip
> the vdso/*.so files."
>
> Fixes: 4bff8cb54502 ("s390: convert to GENERIC_VDSO")
I doubt this Fixes tag.
I do not think this is a bug fix.
Its prerequisites are not suitable for stable either.
> Signed-off-by: Jens Remus <jremus(a)linux.ibm.com>
> Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/scripts/Makefile.vdsoinst b/scripts/Makefile.vdsoinst
> index c477d17b0aa5b..a81ca735003e4 100644
> --- a/scripts/Makefile.vdsoinst
> +++ b/scripts/Makefile.vdsoinst
> @@ -21,7 +21,7 @@ $$(dest): $$(src) FORCE
> $$(call cmd,install)
>
> # Some architectures create .build-id symlinks
> -ifneq ($(filter arm sparc x86, $(SRCARCH)),)
> +ifneq ($(filter arm s390 sparc x86, $(SRCARCH)),)
> link := $(install-dir)/.build-id/$$(shell $(READELF) -n $$(src) | sed -n 's(a)^.*Build ID: \(..\)\(.*\)@\1/\2@p').debug
>
> __default: $$(link)
--
Best Regards
Masahiro Yamada
Hi all,
I have noticed strange messages in kernel version 6.9, obviously from CPU topology
detection, which were not present in 6.8.y and earlier kernels.
This is coming from an older server machine: 2-socket Ivy Bridge Xeon E5-2697 v2 (24C/48T)
in an Asus Z9PE-D16/2L motherboard (Intel C-602A chipset); BIOS patched to the latest
available from Asus. All memory slots occupied, so 256 GB RAM in total.
From a "good boot", e.g. kernel 6.8.11, dmesg output looks like this:
[ 1.823797] smpboot: x86: Booting SMP configuration:
[ 1.823799] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11
[ 1.827514] .... node #1, CPUs: #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23
[ 0.011462] smpboot: CPU 12 Converting physical 0 to logical die 1
[ 1.875532] .... node #0, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35
[ 1.882453] .... node #1, CPUs: #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47
[ 1.887532] MDS CPU bug present and SMT on, data leak possible. See
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[ 1.933640] smp: Brought up 2 nodes, 48 CPUs
[ 1.933640] smpboot: Max logical packages: 2
[ 1.933640] smpboot: Total of 48 processors activated (259199.61 BogoMIPS)
From a "bad" boot, e.g. kernel 6.9.2, dmesg output has these messages in it:
[ 1.785937] smpboot: x86: Booting SMP configuration:
[ 1.785939] .... node #0, CPUs: #4
[ 1.786215] .... node #1, CPUs: #12 #16
[ 1.793547] MDS CPU bug present and SMT on, data leak possible. See
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[ 1.797547] .... node #0, CPUs: #1 #2 #3 #5 #6 #7 #8 #9 #10 #11
[ 1.801858] .... node #1, CPUs: #13 #14 #15 #17 #18 #19 #20 #21 #22 #23
[ 1.804687] .... node #0, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35
[ 1.810728] .... node #1, CPUs: #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47
[ 1.901547] smp: Brought up 2 nodes, 48 CPUs
[ 1.901547] smpboot: Total of 48 processors activated (259207.87 BogoMIPS)
[ 1.903803] BUG: arch topology borken
[ 1.903879] the SMT domain not a subset of the CLS domain
[ 1.903970] BUG: arch topology borken
[ 1.904040] the SMT domain not a subset of the CLS domain
[ 1.904128] BUG: arch topology borken
[ 1.904198] the SMT domain not a subset of the CLS domain
... and this "BUG" and the following line repeat 48 times which is the number of logical
CPUs this machine has. Also, there is a funny typo in the message, but that might be
intended, I guess?! Moreover I noticed, from node #1, CPU #12 detection message is
missing, so the counting maybe wrong?!
However the machine boots, and except from these strange messages, I cannot detect any
other abnormal behaviour. It is running ~15 QEMU/KVM virtual machines just fine. Because
these messages look unusual and a bit scary though, I have bisected the issue, to be able
to report it here. The first bad commit I found is this one:
22d63660c35eb751c63a709bf901a64c1726592a is the first bad commit
commit 22d63660c35eb751c63a709bf901a64c1726592a
Author: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue Feb 13 22:04:08 2024 +0100
x86/cpu: Use common topology code for Intel
Intel CPUs use either topology leaf 0xb/0x1f evaluation or the legacy
SMP/HT evaluation based on CPUID leaf 0x1/0x4.
Move it over to the consolidated topology code and remove the random
topology hacks which are sprinkled into the Intel and the common code.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Juergen Gross <jgross(a)suse.com>
Tested-by: Sohil Mehta <sohil.mehta(a)intel.com>
Tested-by: Michael Kelley <mhklinux(a)outlook.com>
Tested-by: Zhang Rui <rui.zhang(a)intel.com>
Tested-by: Wang Wendy <wendy.wang(a)intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak(a)amd.com>
Link: https://lore.kernel.org/r/20240212153624.893644349@linutronix.de
arch/x86/kernel/cpu/common.c | 65 -----------------------------------
arch/x86/kernel/cpu/cpu.h | 4 ---
arch/x86/kernel/cpu/intel.c | 25 --------------
arch/x86/kernel/cpu/topology.c | 22 ------------
arch/x86/kernel/cpu/topology_common.c | 5 ++-
5 files changed, 4 insertions(+), 117 deletions(-)
root@linus:/usr/src/linux#
I attach my bisect log, and full dmesg output from a good and from a bad kernel version.
Moreover, the last 3 bad kernels from my bisect session did not boot at all, including the
one with commit SHA1 from the first bad commit above. These kernels also had the series of
"BUG" messages scrolling through on the console, and then additionally a kernel panic,
seemingly coming from a divide exception from function init_intel_microcode:
<5>[ 5.968685] Key type dns_resolver registered
<4>[ 5.974402] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
<4>[ 5.977017] divide error: 0000 [#1] PREEMPT SMP PTI
<4>[ 5.977116] CPU: 9 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc4+ #1
<4>[ 5.977213] Hardware name: ASUSTeK COMPUTER INC. Z9PE-D16 Series/Z9PE-D16 Series,
BIOS 5601 06/11/2015
<4>[ 5.977337] RIP: 0010:init_intel_microcode+0x3c/0x80
<4>[ 5.977436] Code: ff 75 44 40 80 fe 05 76 3e 48 8b 05 b6 45 f7 ff a9 00 00 00 40 75
30 8b 05 85 46 f7 ff 0f b7 0d aa 46 f7 ff 31 d2 48 c1 e0 0a <48> f7 f1 89 05 9b f9 46 ff
48 c7 c0 c0 98 e4 a8 31 d2 31 c9 31 f6
<4>[ 5.977602] RSP: 0000:ffffb79b8008fd80 EFLAGS: 00010206
<4>[ 5.977697] RAX: 0000000001e00000 RBX: 0000000000000000 RCX: 0000000000000000
<4>[ 5.977795] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000000
<4>[ 5.977894] RBP: ffffb79b8008fdf8 R08: 0000000000000000 R09: 0000000000000000
<4>[ 5.977992] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4>[ 5.978090] R13: 000000000000019a R14: ffffb79b8008fe08 R15: ffff96ad4026cf00
<4>[ 5.978187] FS: 0000000000000000(0000) GS:ffff96cc3fa40000(0000) knlGS:0000000000000000
<4>[ 5.978308] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 5.978402] CR2: 0000000000000000 CR3: 0000000e6d236001 CR4: 00000000001706f0
<4>[ 5.978500] Call Trace:
<4>[ 5.978588] <TASK>
<4>[ 5.978675] ? show_regs+0x6d/0x80
<4>[ 5.978767] ? die+0x37/0xa0
<4>[ 5.978857] ? do_trap+0xd4/0xf0
<4>[ 5.978948] ? do_error_trap+0x71/0xb0
<4>[ 5.979040] ? init_intel_microcode+0x3c/0x80
<4>[ 5.979131] ? exc_divide_error+0x3a/0x70
<4>[ 5.979226] ? init_intel_microcode+0x3c/0x80
<4>[ 5.979317] ? asm_exc_divide_error+0x1b/0x20
<4>[ 5.979427] ? init_intel_microcode+0x3c/0x80
<4>[ 5.979520] ? microcode_init+0x196/0x260
<4>[ 5.979612] ? __pfx_microcode_init+0x10/0x10
<4>[ 5.979718] do_one_initcall+0x5e/0x340
<4>[ 5.979813] kernel_init_freeable+0x322/0x490
<4>[ 5.979906] ? __pfx_kernel_init+0x10/0x10
<4>[ 5.979998] kernel_init+0x1b/0x200
<4>[ 5.980089] ret_from_fork+0x47/0x70
<4>[ 5.980180] ? __pfx_kernel_init+0x10/0x10
<4>[ 5.980272] ret_from_fork_asm+0x1b/0x30
<4>[ 5.980364] </TASK>
<4>[ 5.980450] Modules linked in:
<4>[ 5.980544] ---[ end trace 0000000000000000 ]---
<4>[ 6.959943] RIP: 0010:init_intel_microcode+0x3c/0x80
<4>[ 6.960041] Code: ff 75 44 40 80 fe 05 76 3e 48 8b 05 b6 45 f7 ff a9 00 00 00 40 75
30 8b 05 85 46 f7 ff 0f b7 0d aa 46 f7 ff 31 d2 48 c1 e0 0a <48> f7 f1 89 05 9b f9 46 ff
48 c7 c0 c0 98 e4 a8 31 d2 31 c9 31 f6
<4>[ 6.960207] RSP: 0000:ffffb79b8008fd80 EFLAGS: 00010206
<4>[ 6.960316] RAX: 0000000001e00000 RBX: 0000000000000000 RCX: 0000000000000000
<4>[ 6.960414] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000000
<4>[ 6.960512] RBP: ffffb79b8008fdf8 R08: 0000000000000000 R09: 0000000000000000
<4>[ 6.960610] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4>[ 6.960708] R13: 000000000000019a R14: ffffb79b8008fe08 R15: ffff96ad4026cf00
<4>[ 6.960806] FS: 0000000000000000(0000) GS:ffff96cc3fa40000(0000) knlGS:0000000000000000
<4>[ 6.960927] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 6.961021] CR2: 0000000000000000 CR3: 0000000e6d236001 CR4: 00000000001706f0
<0>[ 6.961120] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
<0>[ 6.961312] Kernel Offset: 0x25c00000 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)
I also attached full dmesg log file "dmesg-erst-7373208397568540677" of this panic which I
could find in /var/lib/systemd/pstore.
Beste Grüße,
Peter Schneider
--
Climb the mountain not to plant your flag, but to embrace the challenge,
enjoy the air and behold the view. Climb it so you can see the world,
not so the world can see you. -- David McCullough Jr.
OpenPGP: 0xA3828BD796CCE11A8CADE8866E3A92C92C3FF244
Download: https://www.peters-netzplatz.de/download/pschneider1968_pub.aschttps://keys.mailvelope.com/pks/lookup?op=get&search=pschneider1968@googlem…https://keys.mailvelope.com/pks/lookup?op=get&search=pschneider1968@gmail.c…
LPM consists of HIPM (host initiated power management) and DIPM
(device initiated power management).
ata_eh_set_lpm() will only enable HIPM if both the HBA and the device
supports it.
However, DIPM will be enabled as long as the device supports it.
The HBA will later reject the device's request to enter a power state
that it does not support (Slumber/Partial/DevSleep) (DevSleep is never
initiated by the device).
For a HBA that doesn't support any LPM states, simply don't set a LPM
policy such that all the HIPM/DIPM probing/enabling will be skipped.
Not enabling HIPM or DIPM in the first place is safer than relying on
the device following the AHCI specification and respecting the NAK.
(There are comments in the code that some devices misbehave when
receiving a NAK.)
Performing this check in ahci_update_initial_lpm_policy() also has the
advantage that a HBA that doesn't support any LPM states will take the
exact same code paths as a port that is external/hot plug capable.
Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type")
Cc: stable(a)vger.kernel.org
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
---
We have not received any bug reports with this.
The devices that were quirked recently all supported both Partial and
Slumber.
This is more a defensive action, as it seems unnecessary to enable DIPM
in the first place, if the HBA doesn't support any LPM states.
drivers/ata/ahci.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 07d66d2c5f0d..214de08de642 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1735,6 +1735,12 @@ static void ahci_update_initial_lpm_policy(struct ata_port *ap)
if (ap->pflags & ATA_PFLAG_EXTERNAL)
return;
+ /* If no LPM states are supported by the HBA, do not bother with LPM */
+ if ((ap->host->flags & ATA_HOST_NO_PART) &&
+ (ap->host->flags & ATA_HOST_NO_SSC) &&
+ (ap->host->flags & ATA_HOST_NO_DEVSLP))
+ return;
+
/* user modified policy via module param */
if (mobile_lpm_policy != -1) {
policy = mobile_lpm_policy;
--
2.45.1
The quilt patch titled
Subject: mm/vmalloc: fix vbq->free breakage
has been removed from the -mm tree. Its filename was
mm-vmalloc-fix-vbq-free-breakage.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: "hailong.liu" <hailong.liu(a)oppo.com>
Subject: mm/vmalloc: fix vbq->free breakage
Date: Thu, 30 May 2024 17:31:08 +0800
The function xa_for_each() in _vm_unmap_aliases() loops through all vbs.
However, since commit 062eacf57ad9 ("mm: vmalloc: remove a global
vmap_blocks xarray") the vb from xarray may not be on the corresponding
CPU vmap_block_queue. Consequently, purge_fragmented_block() might use
the wrong vbq->lock to protect the free list, leading to vbq->free
breakage.
Link: https://lkml.kernel.org/r/20240530093108.4512-1-hailong.liu@oppo.com
Fixes: fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks")
Signed-off-by: Hailong.Liu <liuhailong(a)oppo.com>
Reported-by: Guangye Yang <guangye.yang(a)mediatek.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Gao Xiang <xiang(a)kernel.org>
Cc: Guangye Yang <guangye.yang(a)mediatek.com>
Cc: liuhailong <liuhailong(a)oppo.com>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Cc: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/mm/vmalloc.c~mm-vmalloc-fix-vbq-free-breakage
+++ a/mm/vmalloc.c
@@ -2830,10 +2830,9 @@ static void _vm_unmap_aliases(unsigned l
for_each_possible_cpu(cpu) {
struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
struct vmap_block *vb;
- unsigned long idx;
rcu_read_lock();
- xa_for_each(&vbq->vmap_blocks, idx, vb) {
+ list_for_each_entry_rcu(vb, &vbq->free, free_list) {
spin_lock(&vb->lock);
/*
_
Patches currently in -mm which might be from hailong.liu(a)oppo.com are
On SC7180, in host mode, it is observed that stressing out controller
in host mode results in HC died error and only restarting the host
mode fixes it. Disable SS instances in park mode for these targets to
avoid host controller being dead.
Reported-by: Doug Anderson <dianders(a)google.com>
Cc: <stable(a)vger.kernel.org>
Fixes: 0b766e7fe5a2 ("arm64: dts: qcom: sc7180: Add USB related nodes")
Signed-off-by: Krishna Kurapati <quic_kriskura(a)quicinc.com>
---
arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/boot/dts/qcom/sc7180.dtsi b/arch/arm64/boot/dts/qcom/sc7180.dtsi
index 2b481e20ae38..cc93b5675d5d 100644
--- a/arch/arm64/boot/dts/qcom/sc7180.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7180.dtsi
@@ -3063,6 +3063,7 @@ usb_1_dwc3: usb@a600000 {
iommus = <&apps_smmu 0x540 0>;
snps,dis_u2_susphy_quirk;
snps,dis_enblslpm_quirk;
+ snps,parkmode-disable-ss-quirk;
phys = <&usb_1_hsphy>, <&usb_1_qmpphy QMP_USB43DP_USB3_PHY>;
phy-names = "usb2-phy", "usb3-phy";
maximum-speed = "super-speed";
--
2.34.1
MSGF_LEG_MASK is laid out with INTA in bit 0, INTB in bit 1, INTC in bit
2, and INTD in bit 3. Hardware IRQ numbers start at 0, and we register
PCI_NUM_INTX irqs. So to enable INTA (aka hwirq 0) we should set bit 0.
Remove the subtraction of one.
This bug would cause legacy interrupts not to be delivered, as enabling
INTB would actually enable INTA, and enabling INTA wouldn't enable
anything at all. It is likely that this got overlooked for so long since
most PCIe hardware uses MSIs. This fixes the following UBSAN error:
UBSAN: shift-out-of-bounds in ../drivers/pci/controller/pcie-xilinx-nwl.c:389:11
shift exponent 18446744073709551615 is too large for 32-bit type 'int'
CPU: 1 PID: 61 Comm: kworker/u10:1 Not tainted 6.6.20+ #268
Hardware name: xlnx,zynqmp (DT)
Workqueue: events_unbound deferred_probe_work_func
Call trace:
dump_backtrace (arch/arm64/kernel/stacktrace.c:235)
show_stack (arch/arm64/kernel/stacktrace.c:242)
dump_stack_lvl (lib/dump_stack.c:107)
dump_stack (lib/dump_stack.c:114)
__ubsan_handle_shift_out_of_bounds (lib/ubsan.c:218 lib/ubsan.c:387)
nwl_unmask_leg_irq (drivers/pci/controller/pcie-xilinx-nwl.c:389 (discriminator 1))
irq_enable (kernel/irq/internals.h:234 kernel/irq/chip.c:170 kernel/irq/chip.c:439 kernel/irq/chip.c:432 kernel/irq/chip.c:345)
__irq_startup (kernel/irq/internals.h:239 kernel/irq/chip.c:180 kernel/irq/chip.c:250)
irq_startup (kernel/irq/chip.c:270)
__setup_irq (kernel/irq/manage.c:1800)
request_threaded_irq (kernel/irq/manage.c:2206)
pcie_pme_probe (include/linux/interrupt.h:168 drivers/pci/pcie/pme.c:348)
<snip>
Fixes: 9a181e1093af ("PCI: xilinx-nwl: Modify IRQ chip for legacy interrupts")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Sean Anderson <sean.anderson(a)linux.dev>
---
Changes in v4:
- Explain likely effects of the off-by-one error
- Trim down UBSAN backtrace
Changes in v3:
- Expand commit message
drivers/pci/controller/pcie-xilinx-nwl.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/pcie-xilinx-nwl.c b/drivers/pci/controller/pcie-xilinx-nwl.c
index 0408f4d612b5..437927e3bcca 100644
--- a/drivers/pci/controller/pcie-xilinx-nwl.c
+++ b/drivers/pci/controller/pcie-xilinx-nwl.c
@@ -371,7 +371,7 @@ static void nwl_mask_intx_irq(struct irq_data *data)
u32 mask;
u32 val;
- mask = 1 << (data->hwirq - 1);
+ mask = 1 << data->hwirq;
raw_spin_lock_irqsave(&pcie->leg_mask_lock, flags);
val = nwl_bridge_readl(pcie, MSGF_LEG_MASK);
nwl_bridge_writel(pcie, (val & (~mask)), MSGF_LEG_MASK);
@@ -385,7 +385,7 @@ static void nwl_unmask_intx_irq(struct irq_data *data)
u32 mask;
u32 val;
- mask = 1 << (data->hwirq - 1);
+ mask = 1 << data->hwirq;
raw_spin_lock_irqsave(&pcie->leg_mask_lock, flags);
val = nwl_bridge_readl(pcie, MSGF_LEG_MASK);
nwl_bridge_writel(pcie, (val | mask), MSGF_LEG_MASK);
--
2.35.1.1320.gc452695387.dirty
Recently, suspend testing on sc7180-trogdor based devices has started
to sometimes fail with messages like this:
port a88000.serial:0.0: PM: calling pm_runtime_force_suspend+0x0/0xf8 @ 28934, parent: a88000.serial:0
port a88000.serial:0.0: PM: dpm_run_callback(): pm_runtime_force_suspend+0x0/0xf8 returns -16
port a88000.serial:0.0: PM: pm_runtime_force_suspend+0x0/0xf8 returned -16 after 33 usecs
port a88000.serial:0.0: PM: failed to suspend: error -16
I could reproduce these problems by logging in via an agetty on the
debug serial port (which was _not_ used for kernel console) and
running:
cat /var/log/messages
...and then (via an SSH session) forcing a few suspend/resume cycles.
Tracing through the code and doing some printf()-based debugging shows
that the -16 (-EBUSY) comes from the recently added
serial_port_runtime_suspend().
The idea of the serial_port_runtime_suspend() function is to prevent
the port from being _runtime_ suspended if it still has bytes left to
transmit. Having bytes left to transmit isn't a reason to block
_system_ suspend, though. If a serdev device in the kernel needs to
block system suspend it should block its own suspend and it can use
serdev_device_wait_until_sent() to ensure bytes are sent.
The DEFINE_RUNTIME_DEV_PM_OPS() used by the serial_port code means
that the system suspend function will be pm_runtime_force_suspend().
In pm_runtime_force_suspend() we can see that before calling the
runtime suspend function we'll call pm_runtime_disable(). This should
be a reliable way to detect that we're called from system suspend and
that we shouldn't look for busyness.
Fixes: 43066e32227e ("serial: port: Don't suspend if the port is still busy")
Cc: stable(a)vger.kernel.org
Reviewed-by: Tony Lindgren <tony.lindgren(a)linux.intel.com>
Signed-off-by: Douglas Anderson <dianders(a)chromium.org>
---
In v1 [1] this was part of a 2-patch series. I'm now just sending this
patch on its own since the Qualcomm GENI serial driver has ended up
having a whole pile of problems that are taking a while to unravel.
It makes sense to disconnect the two efforts. The core problem fixed
by this patch and the geni problems never had any dependencies anyway.
[1] https://lore.kernel.org/r/20240523162207.1.I2395e66cf70c6e67d774c56943825c2…
Changes in v3:
- Adjust comment as per Tony Lindgren.
- Add Cc: stable.
Changes in v2:
- Fix "regulator" => "regular" in comment.
- Fix "PM Runtime" => "runtime PM" in comment.
- Commit messages says how serdev devices should ensure bytes xfered.
drivers/tty/serial/serial_port.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/tty/serial/serial_port.c b/drivers/tty/serial/serial_port.c
index 91a338d3cb34..d35f1d24156c 100644
--- a/drivers/tty/serial/serial_port.c
+++ b/drivers/tty/serial/serial_port.c
@@ -64,6 +64,13 @@ static int serial_port_runtime_suspend(struct device *dev)
if (port->flags & UPF_DEAD)
return 0;
+ /*
+ * Nothing to do on pm_runtime_force_suspend(), see
+ * DEFINE_RUNTIME_DEV_PM_OPS.
+ */
+ if (!pm_runtime_enabled(dev))
+ return 0;
+
uart_port_lock_irqsave(port, &flags);
if (!port_dev->tx_enabled) {
uart_port_unlock_irqrestore(port, flags);
--
2.45.1.288.g0e0cd299f1-goog
ich7_lpc_probe() uses pci_read_config_dword() that returns PCIBIOS_*
codes. The error handling code assumes incorrectly it's a normal errno
and checks for < 0. The return code is returned from the probe function
as is but probe functions should return normal errnos.
Remove < 0 from the check and convert PCIBIOS_* returns code using
pcibios_err_to_errno() into normal errno before returning it.
Fixes: a328e95b82c1 ("leds: LED driver for Intel NAS SS4200 series (v5)")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
drivers/leds/leds-ss4200.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/leds/leds-ss4200.c b/drivers/leds/leds-ss4200.c
index fcaa34706b6c..2ef9fc7371bd 100644
--- a/drivers/leds/leds-ss4200.c
+++ b/drivers/leds/leds-ss4200.c
@@ -356,8 +356,10 @@ static int ich7_lpc_probe(struct pci_dev *dev,
nas_gpio_pci_dev = dev;
status = pci_read_config_dword(dev, PMBASE, &g_pm_io_base);
- if (status)
+ if (status) {
+ status = pcibios_err_to_errno(status);
goto out;
+ }
g_pm_io_base &= 0x00000ff80;
status = pci_read_config_dword(dev, GPIO_CTRL, &gc);
@@ -369,8 +371,9 @@ static int ich7_lpc_probe(struct pci_dev *dev,
}
status = pci_read_config_dword(dev, GPIO_BASE, &nas_gpio_io_base);
- if (0 > status) {
+ if (status) {
dev_info(&dev->dev, "Unable to read GPIOBASE.\n");
+ status = pcibios_err_to_errno(status);
goto out;
}
dev_dbg(&dev->dev, ": GPIOBASE = 0x%08x\n", nas_gpio_io_base);
--
2.39.2
This bug was found by syzkaller.
The reproducer and the detailed warning log can be viewed here [1]
[1] https://lore.kernel.org/bpf/20240129091746.260538-1-kovalev@altlinux.org/#t
Capability cap_sys_admin is required for reproduce and on kernels with panic_on_warn
enabled it will cause the system to crash.
v2:
Added an additional patch that fixes a build error by the clang compiler.
To solve the problem, it is proposed to backport the following commits:
[PATCH v2 5.10.y 1/2] bpf: Convert BPF_DISPATCHER to use static_call() (not
[PATCH v2 5.10.y 2/2] bpf: Add explicit cast to 'void *' for
This bug was found by syzkaller.
The reproducer and the detailed warning log can be viewed here [1]
[1] https://lore.kernel.org/bpf/20240129091746.260538-1-kovalev@altlinux.org/#t
Capability cap_sys_admin is required for reproduce and on kernels with panic_on_warn
enabled it will cause the system to crash.
v2:
Added an additional patch that fixes a build error by the clang compiler.
To solve the problem, it is proposed to backport the following commits:
[PATCH v2 5.15.y 1/2] bpf: Convert BPF_DISPATCHER to use static_call() (not
[PATCH v2 5.15.y 2/2] bpf: Add explicit cast to 'void *' for
Commit 7627a0edef54 ("ata: ahci: Drop low power policy board type")
dropped the board_ahci_low_power board type, and instead enables LPM if:
-The AHCI controller reports that it supports LPM (Partial/Slumber), and
-CONFIG_SATA_MOBILE_LPM_POLICY != 0, and
-The port is not defined as external in the per port PxCMD register, and
-The port is not defined as hotplug capable in the per port PxCMD
register.
Partial and Slumber LPM states can either be initiated by HIPM or DIPM.
For HIPM (host initiated power management) to get enabled, both the AHCI
controller and the drive have to report that they support HIPM.
For DIPM (device initiated power management) to get enabled, only the
drive has to report that it supports DIPM. However, the HBA will reject
device requests to enter LPM states which the HBA does not support.
The problem is that Apacer AS340 drives do not handle low power modes
correctly. The problem was most likely not seen before because no one
had used this drive with a AHCI controller with LPM enabled.
Add a quirk so that we do not enable LPM for this drive, since we see
command timeouts if we do (even though the drive claims to support DIPM).
Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type")
Cc: stable(a)vger.kernel.org
Reported-by: Tim Teichmann <teichmanntim(a)outlook.de>
Closes: https://lore.kernel.org/linux-ide/87bk4pbve8.ffs@tglx/
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
---
On the system reporting this issue, the HBA supports SALP (HIPM) and
LPM states Partial and Slumber.
This drive only supports DIPM but not HIPM, however, that should not
matter, as a DIPM request from the device still has to be acked by the
HBA, and according to AHCI 1.3.1, section 5.3.2.11 P:Idle, if the link
layer has negotiated to low power state based on device power management
request, the HBA will jump to state PM:LowPower.
In PM:LowPower, the HBA will automatically request to wake the link
(exit from Partial/Slumber) when a new command is queued (by writing to
PxCI). Thus, there should be no need for host software to request an
explicit wakeup (by writing PxCMD.ICC to 1).
Therefore, even with only DIPM supported/enabled, we shouldn't see command
timeouts with the current code. Also, only enabling only DIPM (by
modifying the AHCI driver) with another drive (which support both DIPM
and HIPM), shows no errors. Thus, it seems like the drive is the problem.
drivers/ata/libata-core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 4f35aab81a0a..25b400f1c3de 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4155,6 +4155,9 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
ATA_HORKAGE_ZERO_AFTER_TRIM |
ATA_HORKAGE_NOLPM },
+ /* Apacer models with LPM issues */
+ { "Apacer AS340*", NULL, ATA_HORKAGE_NOLPM },
+
/* These specific Samsung models/firmware-revs do not handle LPM well */
{ "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", ATA_HORKAGE_NOLPM },
{ "SAMSUNG SSD PM830 mSATA *", "CXM13D1Q", ATA_HORKAGE_NOLPM },
--
2.45.1
Commit 7627a0edef54 ("ata: ahci: Drop low power policy board type")
dropped the board_ahci_low_power board type, and instead enables LPM if:
-The AHCI controller reports that it supports LPM (Partial/Slumber), and
-CONFIG_SATA_MOBILE_LPM_POLICY != 0, and
-The port is not defined as external in the per port PxCMD register, and
-The port is not defined as hotplug capable in the per port PxCMD
register.
Partial and Slumber LPM states can either be initiated by HIPM or DIPM.
For HIPM (host initiated power management) to get enabled, both the AHCI
controller and the drive have to report that they support HIPM.
For DIPM (device initiated power management) to get enabled, only the
drive has to report that it supports DIPM. However, the HBA will reject
device requests to enter LPM states which the HBA does not support.
The problem is that AMD Radeon S3 SSD drives do not handle low power modes
correctly. The problem was most likely not seen before because no one
had used this drive with a AHCI controller with LPM enabled.
Add a quirk so that we do not enable LPM for this drive, since we see
command timeouts if we do (even though the drive claims to support DIPM).
Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type")
Cc: stable(a)vger.kernel.org
Reported-by: Doru Iorgulescu <doru.iorgulescu1(a)gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218832
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
---
drivers/ata/libata-core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 4f35aab81a0a..6c4b69d34aa1 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4155,6 +4155,9 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
ATA_HORKAGE_ZERO_AFTER_TRIM |
ATA_HORKAGE_NOLPM },
+ /* AMD Radeon devices with broken LPM support */
+ { "R3SL240G", NULL, ATA_HORKAGE_NOLPM },
+
/* These specific Samsung models/firmware-revs do not handle LPM well */
{ "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", ATA_HORKAGE_NOLPM },
{ "SAMSUNG SSD PM830 mSATA *", "CXM13D1Q", ATA_HORKAGE_NOLPM },
--
2.45.1
Commit 7627a0edef54 ("ata: ahci: Drop low power policy board type")
dropped the board_ahci_low_power board type, and instead enables LPM if:
-The AHCI controller reports that it supports LPM (Partial/Slumber), and
-CONFIG_SATA_MOBILE_LPM_POLICY != 0, and
-The port is not defined as external in the per port PxCMD register, and
-The port is not defined as hotplug capable in the per port PxCMD
register.
Partial and Slumber LPM states can either be initiated by HIPM or DIPM.
For HIPM (host initiated power management) to get enabled, both the AHCI
controller and the drive have to report that they support HIPM.
For DIPM (device initiated power management) to get enabled, only the
drive has to report that it supports DIPM. However, the HBA will reject
device requests to enter LPM states which the HBA does not support.
The problem is that Crucial CT240BX500SSD1 drives do not handle low power
modes correctly. The problem was most likely not seen before because no
one had used this drive with a AHCI controller with LPM enabled.
Add a quirk so that we do not enable LPM for this drive, since we see
command timeouts if we do (even though the drive claims to support DIPM).
Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type")
Cc: stable(a)vger.kernel.org
Reported-by: Aarrayy <lp610mh(a)gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218832
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
---
On the system reporting this issue, the HBA supports SALP (HIPM) and
LPM states Partial and Slumber.
This drive only supports DIPM but not HIPM, however, that should not
matter, as a DIPM request from the device still has to be acked by the
HBA, and according to AHCI 1.3.1, section 5.3.2.11 P:Idle, if the link
layer has negotiated to low power state based on device power management
request, the HBA will jump to state PM:LowPower.
In PM:LowPower, the HBA will automatically request to wake the link
(exit from Partial/Slumber) when a new command is queued (by writing to
PxCI). Thus, there should be no need for host software to request an
explicit wakeup (by writing PxCMD.ICC to 1).
Therefore, even with only DIPM supported/enabled, we shouldn't see command
timeouts with the current code. Also, only enabling only DIPM (by
modifying the AHCI driver) with another drive (which support both DIPM
and HIPM), shows no errors. Thus, it seems like the drive is the problem.
drivers/ata/libata-core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 4f35aab81a0a..b0ce621fe2a1 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4136,8 +4136,9 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
{ "PIONEER BD-RW BDR-207M", NULL, ATA_HORKAGE_NOLPM },
{ "PIONEER BD-RW BDR-205", NULL, ATA_HORKAGE_NOLPM },
- /* Crucial BX100 SSD 500GB has broken LPM support */
+ /* Crucial devices with broken LPM support */
{ "CT500BX100SSD1", NULL, ATA_HORKAGE_NOLPM },
+ { "CT240BX500SSD1", NULL, ATA_HORKAGE_NOLPM },
/* 512GB MX100 with MU01 firmware has both queued TRIM and LPM issues */
{ "Crucial_CT512MX100*", "MU01", ATA_HORKAGE_NO_NCQ_TRIM |
--
2.45.1
The subordinate pointer can be null if we are out of bus number. The
function of_pci_prop_intr_map() deferences this pointer without checking
and may crashes the kernel.
This crash can be reproduced by starting a QEMU instance:
qemu-system-riscv64 -machine virt \
-kernel ../build-pci-riscv/arch/riscv/boot/Image \
-append "console=ttyS0 root=/dev/vda" \
-nographic \
-drive "file=root.img,format=raw,id=hd0" \
-device virtio-blk-device,drive=hd0 \
-device pcie-root-port,bus=pcie.0,slot=1,id=rp1,bus-reserve=0 \
-device pcie-pci-bridge,id=br1,bus=rp1
Then hot-add a bridge with
device_add pci-bridge,id=br2,bus=br1,chassis_nr=1,addr=1
Then the kernel crashes:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000028
[snip]
[<ffffffff804dac82>] of_pci_prop_intr_map+0x104/0x362
[<ffffffff804db262>] of_pci_add_properties+0x382/0x3ca
[<ffffffff804c8228>] of_pci_make_dev_node+0xb6/0x116
[<ffffffff804a6b72>] pci_bus_add_device+0xa8/0xaa
[<ffffffff804a6ba4>] pci_bus_add_devices+0x30/0x6a
[<ffffffff804d3b5c>] shpchp_configure_device+0xa0/0x112
[<ffffffff804d2b3a>] board_added+0xce/0x354
[<ffffffff804d2e9a>] shpchp_enable_slot+0xda/0x30c
[<ffffffff804d336c>] shpchp_pushbutton_thread+0x84/0xa0
NULL check this pointer first before proceeding.
Fixes: 407d1a51921e ("PCI: Create device tree node for bridge")
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: stable(a)vger.kernel.org
---
drivers/pci/of_property.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/of_property.c b/drivers/pci/of_property.c
index 5fb516807ba2..c405978a0b7e 100644
--- a/drivers/pci/of_property.c
+++ b/drivers/pci/of_property.c
@@ -199,6 +199,9 @@ static int of_pci_prop_intr_map(struct pci_dev *pdev, struct of_changeset *ocs,
int ret;
u8 pin;
+ if (!pdev->subordinate)
+ return 0;
+
pnode = pci_device_to_OF_node(pdev->bus->self);
if (!pnode)
pnode = pci_bus_to_OF_node(pdev->bus);
--
2.39.2
The subordinate pointer can be null if we are out of bus number. The
function of_pci_prop_bus_range() deferences this pointer without checking
and may crashes the kernel.
This crash can be reproduced by starting a QEMU instance:
qemu-system-riscv64 -machine virt \
-kernel ../build-pci-riscv/arch/riscv/boot/Image \
-append "console=ttyS0 root=/dev/vda" \
-nographic \
-drive "file=root.img,format=raw,id=hd0" \
-device virtio-blk-device,drive=hd0 \
-device pcie-root-port,bus=pcie.0,slot=1,id=rp1,bus-reserve=0 \
-device pcie-pci-bridge,id=br1,bus=rp1
Then hot-add a bridge with
device_add pci-bridge,id=br2,bus=br1,chassis_nr=1,addr=1
Then the kernel crashes:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000088
[snip]
[<ffffffff804db234>] of_pci_add_properties+0x34c/0x3c6
[<ffffffff804c8228>] of_pci_make_dev_node+0xb6/0x116
[<ffffffff804a6b72>] pci_bus_add_device+0xa8/0xaa
[<ffffffff804a6ba4>] pci_bus_add_devices+0x30/0x6a
[<ffffffff804d3b5c>] shpchp_configure_device+0xa0/0x112
[<ffffffff804d2b3a>] board_added+0xce/0x354
[<ffffffff804d2e9a>] shpchp_enable_slot+0xda/0x30c
[<ffffffff804d336c>] shpchp_pushbutton_thread+0x84/0xa0
NULL check this pointer first before proceeding.
Fixes: 407d1a51921e ("PCI: Create device tree node for bridge")
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: stable(a)vger.kernel.org
---
drivers/pci/of_property.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/of_property.c b/drivers/pci/of_property.c
index c2c7334152bc..5fb516807ba2 100644
--- a/drivers/pci/of_property.c
+++ b/drivers/pci/of_property.c
@@ -91,6 +91,9 @@ static int of_pci_prop_bus_range(struct pci_dev *pdev,
struct of_changeset *ocs,
struct device_node *np)
{
+ if (!pdev->subordinate)
+ return 0;
+
u32 bus_range[] = { pdev->subordinate->busn_res.start,
pdev->subordinate->busn_res.end };
--
2.39.2
pci_dev->subordinate pointer can be NULL if we run out of bus number. The
driver deferences this pointer without checking, and the kernel crashes.
This crash can be reproduced by starting a QEMU instance:
qemu-system-x86_64 -machine pc-q35-2.10 \
-kernel bzImage \
-drive "file=img,format=raw" \
-m 2048 -smp 1 -enable-kvm \
-append "console=ttyS0 root=/dev/sda debug" \
-nographic \
-device pcie-root-port,bus=pcie.0,slot=1,id=rp1 \
-device pcie-pci-bridge,id=br1,bus=rp1
Then hot-add a bridge with the QEMU command:
device_add pci-bridge,id=br2,bus=br1,chassis_nr=1,addr=1
Then the kernel crashes:
shpchp 0000:02:01.0: enabling device (0000 -> 0002)
shpchp 0000:02:01.0: enabling bus mastering
BUG: kernel NULL pointer dereference, address: 00000000000000da
[snip]
Call Trace:
<TASK>
? show_regs+0x63/0x70
? __die+0x23/0x70
? page_fault_oops+0x17a/0x480
? shpc_init+0x3fb/0x9d0
? search_module_extables+0x4e/0x80
? shpc_init+0x3fb/0x9d0
? kernelmode_fixup_or_oops+0x9b/0x120
? __bad_area_nosemaphore+0x16e/0x240
? bad_area_nosemaphore+0x11/0x20
? do_user_addr_fault+0x2a3/0x610
? exc_page_fault+0x6d/0x160
? asm_exc_page_fault+0x2b/0x30
? shpc_init+0x3fb/0x9d0
shpc_probe+0x92/0x390
NULL check this pointer first before proceeding. If there is no
secondary bus number, there is no point in initializing this hot-plug
controller, so just bails out.
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: stable(a)vger.kernel.org # all
---
This one exists since beginning of git history. So I didn't bother
with a Fixes: tag.
This patch is almost a copy-paste from pciehp
---
drivers/pci/hotplug/shpchp_core.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/pci/hotplug/shpchp_core.c b/drivers/pci/hotplug/shpchp_core.c
index 56c7795ed890..14cf9e894201 100644
--- a/drivers/pci/hotplug/shpchp_core.c
+++ b/drivers/pci/hotplug/shpchp_core.c
@@ -262,6 +262,12 @@ static int shpc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
if (acpi_get_hp_hw_control_from_firmware(pdev))
return -ENODEV;
+ if (!pdev->subordinate) {
+ /* Can happen if we run out of bus numbers during probe */
+ pci_err(pdev, "Hotplug bridge without secondary bus, ignoring\n");
+ return -ENODEV;
+ }
+
ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
if (!ctrl)
goto err_out_none;
--
2.39.2
In function pci_scan_bridge_extend(), if the variable next_busnr gets to
256, "child = pci_find_bus()" will return bus 0 (root bus). Consequently,
we have a circular PCI topology. The scan will then go in circle until the
kernel crashes due to stack overflow.
This can be reproduced with:
qemu-system-x86_64 -machine pc-q35-2.10 \
-kernel bzImage \
-m 2048 -smp 1 -enable-kvm \
-append "console=ttyS0 root=/dev/sda debug" \
-nographic \
-device pcie-root-port,bus=pcie.0,slot=1,id=rp1,bus-reserve=253 \
-device pcie-root-port,bus=pcie.0,slot=2,id=rp2,bus-reserve=0 \
-device pcie-root-port,bus=pcie.0,slot=3,id=rp3,bus-reserve=0
Check if next_busnr "overflow" and bail out if this is the case.
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: stable(a)vger.kernel.org # all
---
This bug exists since the beginning of git history. So I didn't bother
tracing beyond git to see which patch introduced this.
---
drivers/pci/probe.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 1325fbae2f28..03caae76337c 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1382,6 +1382,9 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
else
next_busnr = max + 1;
+ if (next_busnr == 256)
+ goto out;
+
/*
* Prevent assigning a bus number that already exists.
* This can happen when a bridge is hot-plugged, so in this
--
2.39.2
On Thu, May 30, 2024 at 10:05 PM Sasha Levin <sashal(a)kernel.org> wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> ovl: add helper ovl_file_modified()
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> ovl-add-helper-ovl_file_modified.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit f87db32c0cdadc7eea4a37560867da0bd0bb87e8
> Author: Amir Goldstein <amir73il(a)gmail.com>
> Date: Wed Sep 27 13:43:44 2023 +0300
>
> ovl: add helper ovl_file_modified()
>
> [ Upstream commit c002728f608183449673818076380124935e6b9b ]
>
> A simple wrapper for updating ovl inode size/mtime, to conform
> with ovl_file_accessed().
>
> Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
> Stable-dep-of: 7c98f7cb8fda ("remove call_{read,write}_iter() functions")
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
No objection to this patch, except for the fact that I think it is not
in the best
interest of the stable tree to backport 7c98f7cb8fda as is.
I suggest that you consider backporting only the parts of 7c98f7cb8fda that
open code call_{read,write}_iter() in call sites (some or all), if you
need those
as dependencies but actually leave the wrappers in the stable tree.
If the bots selected 7c98f7cb8fda to stable because of the Fixes: tag,
then I think that Fixes: tag was misleading the stable bots in this case.
Thanks,
Amir.
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index 8be4dc050d1ed..9fd88579bfbfb 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -235,6 +235,12 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
> return ret;
> }
>
> +static void ovl_file_modified(struct file *file)
> +{
> + /* Update size/mtime */
> + ovl_copyattr(file_inode(file));
> +}
> +
> static void ovl_file_accessed(struct file *file)
> {
> struct inode *inode, *upperinode;
> @@ -290,10 +296,8 @@ static void ovl_aio_cleanup_handler(struct ovl_aio_req *aio_req)
> struct kiocb *orig_iocb = aio_req->orig_iocb;
>
> if (iocb->ki_flags & IOCB_WRITE) {
> - struct inode *inode = file_inode(orig_iocb->ki_filp);
> -
> kiocb_end_write(iocb);
> - ovl_copyattr(inode);
> + ovl_file_modified(orig_iocb->ki_filp);
> }
>
> orig_iocb->ki_pos = iocb->ki_pos;
> @@ -403,7 +407,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
> ovl_iocb_to_rwf(ifl));
> file_end_write(real.file);
> /* Update size */
> - ovl_copyattr(inode);
> + ovl_file_modified(file);
> } else {
> struct ovl_aio_req *aio_req;
>
> @@ -489,7 +493,7 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
>
> file_end_write(real.file);
> /* Update size */
> - ovl_copyattr(inode);
> + ovl_file_modified(out);
> revert_creds(old_cred);
> fdput(real);
>
> @@ -570,7 +574,7 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
> revert_creds(old_cred);
>
> /* Update size */
> - ovl_copyattr(inode);
> + ovl_file_modified(file);
>
> fdput(real);
>
> @@ -654,7 +658,7 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
> revert_creds(old_cred);
>
> /* Update size */
> - ovl_copyattr(inode_out);
> + ovl_file_modified(file_out);
>
> fdput(real_in);
> fdput(real_out);
On Thu, May 30, 2024 at 10:05 PM Sasha Levin <sashal(a)kernel.org> wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> fs: move kiocb_start_write() into vfs_iocb_iter_write()
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> fs-move-kiocb_start_write-into-vfs_iocb_iter_write.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 17f38d69e7960a2b346db04750b0e4ba867c0b83
> Author: Amir Goldstein <amir73il(a)gmail.com>
> Date: Wed Nov 22 14:27:12 2023 +0200
>
> fs: move kiocb_start_write() into vfs_iocb_iter_write()
>
> [ Upstream commit 6ae654392bb516a0baa47fed1f085d84e8cad739 ]
>
> In vfs code, sb_start_write() is usually called after the permission hook
> in rw_verify_area(). vfs_iocb_iter_write() is an exception to this rule,
> where kiocb_start_write() is called by its callers.
>
> Move kiocb_start_write() from the callers into vfs_iocb_iter_write()
> after the rw_verify_area() checks, to make them "start-write-safe".
>
> The semantics of vfs_iocb_iter_write() is changed, so that the caller is
> responsible for calling kiocb_end_write() on completion only if async
> iocb was queued. The completion handlers of both callers were adapted
> to this semantic change.
>
This comment about semantics change looks like a clue from my past
self that backporting this commit as standalone is risky.
This commit was part of a pretty big shuffle in splice and ovl code.
I'd feel much more comfortable with backporting the entire ovl
series 14ab6d425e8067..5b02bfc1e7e3 and splice series
v6.7..6ae654392bb51 than just 3 individual commits in the middle.
Thanks,
Amir.
> This is needed for fanotify "pre content" events.
>
> Suggested-by: Jan Kara <jack(a)suse.cz>
> Suggested-by: Josef Bacik <josef(a)toxicpanda.com>
> Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
> Link: https://lore.kernel.org/r/20231122122715.2561213-14-amir73il@gmail.com
> Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
> Reviewed-by: Jan Kara <jack(a)suse.cz>
> Signed-off-by: Christian Brauner <brauner(a)kernel.org>
> Stable-dep-of: 7c98f7cb8fda ("remove call_{read,write}_iter() functions")
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
> index 009d23cd435b5..5857241c59181 100644
> --- a/fs/cachefiles/io.c
> +++ b/fs/cachefiles/io.c
> @@ -259,7 +259,8 @@ static void cachefiles_write_complete(struct kiocb *iocb, long ret)
>
> _enter("%ld", ret);
>
> - kiocb_end_write(iocb);
> + if (ki->was_async)
> + kiocb_end_write(iocb);
>
> if (ret < 0)
> trace_cachefiles_io_error(object, inode, ret,
> @@ -319,8 +320,6 @@ int __cachefiles_write(struct cachefiles_object *object,
> ki->iocb.ki_complete = cachefiles_write_complete;
> atomic_long_add(ki->b_writing, &cache->b_writing);
>
> - kiocb_start_write(&ki->iocb);
> -
> get_file(ki->iocb.ki_filp);
> cachefiles_grab_object(object, cachefiles_obj_get_ioreq);
>
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index 9fd88579bfbfb..a1c64c2b8e204 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -295,10 +295,8 @@ static void ovl_aio_cleanup_handler(struct ovl_aio_req *aio_req)
> struct kiocb *iocb = &aio_req->iocb;
> struct kiocb *orig_iocb = aio_req->orig_iocb;
>
> - if (iocb->ki_flags & IOCB_WRITE) {
> - kiocb_end_write(iocb);
> + if (iocb->ki_flags & IOCB_WRITE)
> ovl_file_modified(orig_iocb->ki_filp);
> - }
>
> orig_iocb->ki_pos = iocb->ki_pos;
> ovl_aio_put(aio_req);
> @@ -310,6 +308,9 @@ static void ovl_aio_rw_complete(struct kiocb *iocb, long res)
> struct ovl_aio_req, iocb);
> struct kiocb *orig_iocb = aio_req->orig_iocb;
>
> + if (iocb->ki_flags & IOCB_WRITE)
> + kiocb_end_write(iocb);
> +
> ovl_aio_cleanup_handler(aio_req);
> orig_iocb->ki_complete(orig_iocb, res);
> }
> @@ -421,7 +422,6 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
> aio_req->iocb.ki_flags = ifl;
> aio_req->iocb.ki_complete = ovl_aio_rw_complete;
> refcount_set(&aio_req->ref, 2);
> - kiocb_start_write(&aio_req->iocb);
> ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter);
> ovl_aio_put(aio_req);
> if (ret != -EIOCBQUEUED)
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 4771701c896ba..9a56949f3b8d1 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -865,6 +865,10 @@ static ssize_t do_iter_write(struct file *file, struct iov_iter *iter,
> return ret;
> }
>
> +/*
> + * Caller is responsible for calling kiocb_end_write() on completion
> + * if async iocb was queued.
> + */
> ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb,
> struct iov_iter *iter)
> {
> @@ -885,7 +889,10 @@ ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb,
> if (ret < 0)
> return ret;
>
> + kiocb_start_write(iocb);
> ret = call_write_iter(file, iocb, iter);
> + if (ret != -EIOCBQUEUED)
> + kiocb_end_write(iocb);
> if (ret > 0)
> fsnotify_modify(file);
>
On Thu, May 30, 2024 at 10:05 PM Sasha Levin <sashal(a)kernel.org> wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> splice: remove permission hook from iter_file_splice_write()
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> splice-remove-permission-hook-from-iter_file_splice_.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 9519e9d1e625d4f01b3c8a1c32042e3f5da53b0b
> Author: Amir Goldstein <amir73il(a)gmail.com>
> Date: Thu Nov 23 18:51:44 2023 +0100
>
> splice: remove permission hook from iter_file_splice_write()
>
> [ Upstream commit d53471ba6f7ae97a4e223539029528108b705af1 ]
>
> All the callers of ->splice_write(), (e.g. do_splice_direct() and
> do_splice()) already check rw_verify_area() for the entire range
> and perform all the other checks that are in vfs_write_iter().
>
Alas, that is incorrect for 6.6.y, because it depends on prior commit
ca7ab482401c ("ovl: add permission hooks outside of do_splice_direct()")
And in any case, this commit is part of a pretty hairy shuffle in splice code.
I'd feel much more comfortable with backporting the entire series
0db1d53937fa..6ae654392bb51 than just 3 individual commits in the
middle of the series.
I looked into it and ca7ab482401c does not apply cleanly to 6.6.y -
it depends on the ovl changes 14ab6d425e8067..5b02bfc1e7e3.
Not only for conflict, but also for correct locking order.
That amounts to quite a few non trivial ovl and splice patches,
so maybe you need to reconsider, but on the up side, all those ovl
and splice patches are actually very subtle bug fixes, so I cannot
say that they are not stable tree worthy.
There is also a coda commit that depends on this for conflict:
581a4d003001 coda: convert to new timestamp accessors
I did not check if it all compiles or works, only that it applies cleanly.
> Instead of creating another tiny helper for special caller, just
> open-code it.
>
> This is needed for fanotify "pre content" events.
>
> Suggested-by: Jan Kara <jack(a)suse.cz>
> Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
> Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
> Link: https://lore.kernel.org/r/20231122122715.2561213-6-amir73il@gmail.com
> Signed-off-by: Christian Brauner <brauner(a)kernel.org>
> Stable-dep-of: 7c98f7cb8fda ("remove call_{read,write}_iter() functions")
Why would you want to backport this commit?
It hinders backporting work - it does not assist it.
Any new code that open codes call_{read,write}_iter() is not affected
by the existence of the helpers in stable kernels and any old code that
does use these helpers works as well.
Thanks,
Amir.
Hello,
Kernel 6.8 which is the default kernel of Ubundu 24.04 and other
distributions derived from it is at the end of its life for you !?!
As it provides better support for the Raspberry Pi 5 graphics card, I
hoped to be able to benefit from it under Raspberry Pi OS which still
uses the latest LTS kernel available.
But if you don't make kernel 6.8 an LTS kernel, for Ubuntu and its
derivatives, I will never be able to benefit from it on my RPi5 ;-(
In addition, I am afraid that OS derived from Ubuntu 24.04 LTS will
end up in difficulties due to this situation.
Best regards.
On riscv32, it is possible for the last page in virtual address space
(0xfffff000) to be allocated. This page overlaps with PTR_ERR, so that
shouldn't happen.
There is already some code to ensure memblock won't allocate the last page.
However, buddy allocator is left unchecked.
Fix this by reserving physical memory that would be mapped at virtual
addresses greater than 0xfffff000.
Reported-by: Björn Töpel <bjorn(a)kernel.org>
Closes: https://lore.kernel.org/linux-riscv/878r1ibpdn.fsf@all.your.base.are.belong…
Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code")
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
---
arch/riscv/mm/init.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 968761843203..7c985435b3fc 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -235,18 +235,19 @@ static void __init setup_bootmem(void)
kernel_map.va_pa_offset = PAGE_OFFSET - phys_ram_base;
/*
- * memblock allocator is not aware of the fact that last 4K bytes of
- * the addressable memory can not be mapped because of IS_ERR_VALUE
- * macro. Make sure that last 4k bytes are not usable by memblock
- * if end of dram is equal to maximum addressable memory. For 64-bit
- * kernel, this problem can't happen here as the end of the virtual
- * address space is occupied by the kernel mapping then this check must
- * be done as soon as the kernel mapping base address is determined.
+ * Reserve physical address space that would be mapped to virtual
+ * addresses greater than (void *)(-PAGE_SIZE) because:
+ * - This memory would overlap with ERR_PTR
+ * - This memory belongs to high memory, which is not supported
+ *
+ * This is not applicable to 64-bit kernel, because virtual addresses
+ * after (void *)(-PAGE_SIZE) are not linearly mapped: they are
+ * occupied by kernel mapping. Also it is unrealistic for high memory
+ * to exist on 64-bit platforms.
*/
if (!IS_ENABLED(CONFIG_64BIT)) {
- max_mapped_addr = __pa(~(ulong)0);
- if (max_mapped_addr == (phys_ram_end - 1))
- memblock_set_current_limit(max_mapped_addr - 4096);
+ max_mapped_addr = __va_to_pa_nodebug(-PAGE_SIZE);
+ memblock_reserve(max_mapped_addr, (phys_addr_t)-max_mapped_addr);
}
min_low_pfn = PFN_UP(phys_ram_base);
--
2.39.2
HAVE_ARCH_HUGE_VMAP also works on XIP kernel, so remove its dependency on
!XIP_KERNEL.
This also fixes a boot problem for XIP kernel introduced by the commit in
"Fixes:". This commit used huge page mapping for vmemmap, but huge page
vmap was not enabled for XIP kernel.
Fixes: ff172d4818ad ("riscv: Use hugepage mappings for vmemmap")
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
---
This patch replaces:
https://patchwork.kernel.org/project/linux-riscv/patch/20240508173116.28661…
arch/riscv/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index b94176e25be1..0525ee2d63c7 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -106,7 +106,7 @@ config RISCV
select HAS_IOPORT if MMU
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_HUGE_VMALLOC if HAVE_ARCH_HUGE_VMAP
- select HAVE_ARCH_HUGE_VMAP if MMU && 64BIT && !XIP_KERNEL
+ select HAVE_ARCH_HUGE_VMAP if MMU && 64BIT
select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL
select HAVE_ARCH_JUMP_LABEL_RELATIVE if !XIP_KERNEL
select HAVE_ARCH_KASAN if MMU && 64BIT
--
2.39.2
Kernel VM do not have an Xe file. Include a check for Xe file in the VM
before trying to get pid from VM's Xe file when taking a devcoredump.
Fixes: b10d0c5e9df7 ("drm/xe: Add process name to devcoredump")
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: José Roberto de Souza <jose.souza(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost(a)intel.com>
---
drivers/gpu/drm/xe/xe_devcoredump.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
index 1643d44f8bc4..6f63b8e4e3b9 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.c
+++ b/drivers/gpu/drm/xe/xe_devcoredump.c
@@ -176,7 +176,7 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump,
ss->snapshot_time = ktime_get_real();
ss->boot_time = ktime_get_boottime();
- if (q->vm) {
+ if (q->vm && q->vm->xef) {
task = get_pid_task(q->vm->xef->drm->pid, PIDTYPE_PID);
if (task)
process_name = task->comm;
--
2.34.1
The patch titled
Subject: ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-fix-null-pointer-dereference-in-ocfs2_journal_dirty.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Subject: ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty()
Date: Thu, 30 May 2024 19:06:29 +0800
bdev->bd_super has been removed and commit 8887b94d9322 change the usage
from bdev->bd_super to b_assoc_map->host->i_sb. This introduces the
following NULL pointer dereference in ocfs2_journal_dirty() since
b_assoc_map is still not initialized. This can be easily reproduced by
running xfstests generic/186, which simulate no more credits.
[ 134.351592] BUG: kernel NULL pointer dereference, address: 0000000000000000
...
[ 134.355341] RIP: 0010:ocfs2_journal_dirty+0x14f/0x160 [ocfs2]
...
[ 134.365071] Call Trace:
[ 134.365312] <TASK>
[ 134.365524] ? __die_body+0x1e/0x60
[ 134.365868] ? page_fault_oops+0x13d/0x4f0
[ 134.366265] ? __pfx_bit_wait_io+0x10/0x10
[ 134.366659] ? schedule+0x27/0xb0
[ 134.366981] ? exc_page_fault+0x6a/0x140
[ 134.367356] ? asm_exc_page_fault+0x26/0x30
[ 134.367762] ? ocfs2_journal_dirty+0x14f/0x160 [ocfs2]
[ 134.368305] ? ocfs2_journal_dirty+0x13d/0x160 [ocfs2]
[ 134.368837] ocfs2_create_new_meta_bhs.isra.51+0x139/0x2e0 [ocfs2]
[ 134.369454] ocfs2_grow_tree+0x688/0x8a0 [ocfs2]
[ 134.369927] ocfs2_split_and_insert.isra.67+0x35c/0x4a0 [ocfs2]
[ 134.370521] ocfs2_split_extent+0x314/0x4d0 [ocfs2]
[ 134.371019] ocfs2_change_extent_flag+0x174/0x410 [ocfs2]
[ 134.371566] ocfs2_add_refcount_flag+0x3fa/0x630 [ocfs2]
[ 134.372117] ocfs2_reflink_remap_extent+0x21b/0x4c0 [ocfs2]
[ 134.372994] ? inode_update_timestamps+0x4a/0x120
[ 134.373692] ? __pfx_ocfs2_journal_access_di+0x10/0x10 [ocfs2]
[ 134.374545] ? __pfx_ocfs2_journal_access_di+0x10/0x10 [ocfs2]
[ 134.375393] ocfs2_reflink_remap_blocks+0xe4/0x4e0 [ocfs2]
[ 134.376197] ocfs2_remap_file_range+0x1de/0x390 [ocfs2]
[ 134.376971] ? security_file_permission+0x29/0x50
[ 134.377644] vfs_clone_file_range+0xfe/0x320
[ 134.378268] ioctl_file_clone+0x45/0xa0
[ 134.378853] do_vfs_ioctl+0x457/0x990
[ 134.379422] __x64_sys_ioctl+0x6e/0xd0
[ 134.379987] do_syscall_64+0x5d/0x170
[ 134.380550] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 134.381231] RIP: 0033:0x7fa4926397cb
[ 134.381786] Code: 73 01 c3 48 8b 0d bd 56 38 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8d 56 38 00 f7 d8 64 89 01 48
[ 134.383930] RSP: 002b:00007ffc2b39f7b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 134.384854] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fa4926397cb
[ 134.385734] RDX: 00007ffc2b39f7f0 RSI: 000000004020940d RDI: 0000000000000003
[ 134.386606] RBP: 0000000000000000 R08: 00111a82a4f015bb R09: 00007fa494221000
[ 134.387476] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 134.388342] R13: 0000000000f10000 R14: 0000558e844e2ac8 R15: 0000000000f10000
[ 134.389207] </TASK>
Fix it by only aborting transaction and journal in ocfs2_journal_dirty()
now, and leave ocfs2_abort() later when detecting an aborted handle,
e.g. start next transaction. Also log the handle details in this case.
Link: https://lkml.kernel.org/r/20240530110630.3933832-1-joseph.qi@linux.alibaba.…
Fixes: 8887b94d9322 ("ocfs2: stop using bdev->bd_super for journal error logging")
Signed-off-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org> [6.6+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/journal.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
--- a/fs/ocfs2/journal.c~ocfs2-fix-null-pointer-dereference-in-ocfs2_journal_dirty
+++ a/fs/ocfs2/journal.c
@@ -778,13 +778,15 @@ void ocfs2_journal_dirty(handle_t *handl
if (!is_handle_aborted(handle)) {
journal_t *journal = handle->h_transaction->t_journal;
- mlog(ML_ERROR, "jbd2_journal_dirty_metadata failed. "
- "Aborting transaction and journal.\n");
+ mlog(ML_ERROR, "jbd2_journal_dirty_metadata failed: "
+ "handle type %u started at line %u, credits %u/%u "
+ "errcode %d. Aborting transaction and journal.\n",
+ handle->h_type, handle->h_line_no,
+ handle->h_requested_credits,
+ jbd2_handle_buffer_credits(handle), status);
handle->h_err = status;
jbd2_journal_abort_handle(handle);
jbd2_journal_abort(journal, status);
- ocfs2_abort(bh->b_assoc_map->host->i_sb,
- "Journal already aborted.\n");
}
}
}
_
Patches currently in -mm which might be from joseph.qi(a)linux.alibaba.com are
ocfs2-fix-null-pointer-dereference-in-ocfs2_journal_dirty.patch
ocfs2-fix-null-pointer-dereference-in-ocfs2_abort_trigger.patch
Hello
There is a big performance bug (only affecting games performance) on Linux (not reproducible on Windows) with AM5 boards (at least for my MSI MAG b650 Tomahawk) if these BIOS settings are used:
CSM -> Enabled
Wi-Fi -> Disabled (or set to Bluetooth only)
This does not happen even on Win 7... (I've only tested DX12 games) and does not happen if UEFI mode is chosen. I've tested with many different BIOS versions all showing the same result.
I have tried troubleshooting with MSI with some benchmarks posted (https://forum-en.msi.com/index.php?threads/b650-tomahawk-bios-bug-disabling…) and after a week we realized this only happens on Linux (tested on Arch/Fedora/Ubuntu with 6.8 and 6.9 kernels for the first two).
Please investigate.
Thank you
The patch titled
Subject: nilfs2: fix potential kernel bug due to lack of writeback flag waiting
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-potential-kernel-bug-due-to-lack-of-writeback-flag-waiting.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix potential kernel bug due to lack of writeback flag waiting
Date: Thu, 30 May 2024 23:15:56 +0900
Destructive writes to a block device on which nilfs2 is mounted can cause
a kernel bug in the folio/page writeback start routine or writeback end
routine (__folio_start_writeback in the log below):
kernel BUG at mm/page-writeback.c:3070!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
...
RIP: 0010:__folio_start_writeback+0xbaa/0x10e0
Code: 25 ff 0f 00 00 0f 84 18 01 00 00 e8 40 ca c6 ff e9 17 f6 ff ff
e8 36 ca c6 ff 4c 89 f7 48 c7 c6 80 c0 12 84 e8 e7 b3 0f 00 90 <0f>
0b e8 1f ca c6 ff 4c 89 f7 48 c7 c6 a0 c6 12 84 e8 d0 b3 0f 00
...
Call Trace:
<TASK>
nilfs_segctor_do_construct+0x4654/0x69d0 [nilfs2]
nilfs_segctor_construct+0x181/0x6b0 [nilfs2]
nilfs_segctor_thread+0x548/0x11c0 [nilfs2]
kthread+0x2f0/0x390
ret_from_fork+0x4b/0x80
ret_from_fork_asm+0x1a/0x30
</TASK>
This is because when the log writer starts a writeback for segment summary
blocks or a super root block that use the backing device's page cache, it
does not wait for the ongoing folio/page writeback, resulting in an
inconsistent writeback state.
Fix this issue by waiting for ongoing writebacks when putting
folios/pages on the backing device into writeback state.
Link: https://lkml.kernel.org/r/20240530141556.4411-1-konishi.ryusuke@gmail.com
Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 3 +++
1 file changed, 3 insertions(+)
--- a/fs/nilfs2/segment.c~nilfs2-fix-potential-kernel-bug-due-to-lack-of-writeback-flag-waiting
+++ a/fs/nilfs2/segment.c
@@ -1652,6 +1652,7 @@ static void nilfs_segctor_prepare_write(
if (bh->b_folio != bd_folio) {
if (bd_folio) {
folio_lock(bd_folio);
+ folio_wait_writeback(bd_folio);
folio_clear_dirty_for_io(bd_folio);
folio_start_writeback(bd_folio);
folio_unlock(bd_folio);
@@ -1665,6 +1666,7 @@ static void nilfs_segctor_prepare_write(
if (bh == segbuf->sb_super_root) {
if (bh->b_folio != bd_folio) {
folio_lock(bd_folio);
+ folio_wait_writeback(bd_folio);
folio_clear_dirty_for_io(bd_folio);
folio_start_writeback(bd_folio);
folio_unlock(bd_folio);
@@ -1681,6 +1683,7 @@ static void nilfs_segctor_prepare_write(
}
if (bd_folio) {
folio_lock(bd_folio);
+ folio_wait_writeback(bd_folio);
folio_clear_dirty_for_io(bd_folio);
folio_start_writeback(bd_folio);
folio_unlock(bd_folio);
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-potential-kernel-bug-due-to-lack-of-writeback-flag-waiting.patch
On 5/24/2024 12:17 PM, David Howells wrote:
> Hi Christian,
>
> Can you pick this up, please?
>
> Thanks,
> David
> ---
> From: Marc Dionne<marc.dionne(a)auristor.com>
>
> afs: Don't cross .backup mountpoint from backup volume
>
> Don't cross a mountpoint that explicitly specifies a backup volume
> (target is <vol>.backup) when starting from a backup volume.
>
> It it not uncommon to mount a volume's backup directly in the volume
> itself. This can cause tools that are not paying attention to get
> into a loop mounting the volume onto itself as they attempt to
> traverse the tree, leading to a variety of problems.
>
> This doesn't prevent the general case of loops in a sequence of
> mountpoints, but addresses a common special case in the same way
> as other afs clients.
>
> Reported-by: Jan Henrik Sylvester<jan.henrik.sylvester(a)uni-hamburg.de>
> Link:http://lists.infradead.org/pipermail/linux-afs/2024-May/008454.html
> Reported-by: Markus Suvanto<markus.suvanto(a)gmail.com>
> Link:http://lists.infradead.org/pipermail/linux-afs/2024-February/008074.ht…
> Signed-off-by: Marc Dionne<marc.dionne(a)auristor.com>
> Signed-off-by: David Howells<dhowells(a)redhat.com>
> Reviewed-by: Jeffrey Altman<jaltman(a)auristor.com>
> cc:linux-afs@lists.infradead.org
> ---
> fs/afs/mntpt.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/fs/afs/mntpt.c b/fs/afs/mntpt.c
> index 97f50e9fd9eb..297487ee8323 100644
> --- a/fs/afs/mntpt.c
> +++ b/fs/afs/mntpt.c
> @@ -140,6 +140,11 @@ static int afs_mntpt_set_params(struct fs_context *fc, struct dentry *mntpt)
> put_page(page);
> if (ret < 0)
> return ret;
> +
> + /* Don't cross a backup volume mountpoint from a backup volume */
> + if (src_as->volume && src_as->volume->type == AFSVL_BACKVOL &&
> + ctx->type == AFSVL_BACKVOL)
> + return -ENODEV;
> }
>
> return 0;
Please add
cc: stable(a)vger.kernel.org
when it is applied to vfs-fixes.
Thank you.
Jeffrey Altman
On 5/24/2024 12:17 PM, David Howells wrote:
> Hi Christian,
>
> Can you pick this up, please?
>
> Thanks,
> David
> ---
> From: Marc Dionne<marc.dionne(a)auristor.com>
>
> afs: Don't cross .backup mountpoint from backup volume
>
> Don't cross a mountpoint that explicitly specifies a backup volume
> (target is <vol>.backup) when starting from a backup volume.
>
> It it not uncommon to mount a volume's backup directly in the volume
> itself. This can cause tools that are not paying attention to get
> into a loop mounting the volume onto itself as they attempt to
> traverse the tree, leading to a variety of problems.
>
> This doesn't prevent the general case of loops in a sequence of
> mountpoints, but addresses a common special case in the same way
> as other afs clients.
>
> Reported-by: Jan Henrik Sylvester<jan.henrik.sylvester(a)uni-hamburg.de>
> Link:http://lists.infradead.org/pipermail/linux-afs/2024-May/008454.html
> Reported-by: Markus Suvanto<markus.suvanto(a)gmail.com>
> Link:http://lists.infradead.org/pipermail/linux-afs/2024-February/008074.ht…
> Signed-off-by: Marc Dionne<marc.dionne(a)auristor.com>
> Signed-off-by: David Howells<dhowells(a)redhat.com>
> Reviewed-by: Jeffrey Altman<jaltman(a)auristor.com>
> cc:linux-afs@lists.infradead.org
> ---
> fs/afs/mntpt.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/fs/afs/mntpt.c b/fs/afs/mntpt.c
> index 97f50e9fd9eb..297487ee8323 100644
> --- a/fs/afs/mntpt.c
> +++ b/fs/afs/mntpt.c
> @@ -140,6 +140,11 @@ static int afs_mntpt_set_params(struct fs_context *fc, struct dentry *mntpt)
> put_page(page);
> if (ret < 0)
> return ret;
> +
> + /* Don't cross a backup volume mountpoint from a backup volume */
> + if (src_as->volume && src_as->volume->type == AFSVL_BACKVOL &&
> + ctx->type == AFSVL_BACKVOL)
> + return -ENODEV;
> }
>
> return 0;
Please add
cc: stable(a)vger.kernel.org
when it is applied to vfs-fixes.
Thank you.
Jeffrey Altman
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 2a38e4ca302280fdcce370ba2bee79bac16c4587
Gitweb: https://git.kernel.org/tip/2a38e4ca302280fdcce370ba2bee79bac16c4587
Author: Dave Hansen <dave.hansen(a)linux.intel.com>
AuthorDate: Fri, 17 May 2024 13:05:34 -07:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Thu, 30 May 2024 08:29:45 -07:00
x86/cpu: Provide default cache line size if not enumerated
tl;dr: CPUs with CPUID.80000008H but without CPUID.01H:EDX[CLFSH]
will end up reporting cache_line_size()==0 and bad things happen.
Fill in a default on those to avoid the problem.
Long Story:
The kernel dies a horrible death if c->x86_cache_alignment (aka.
cache_line_size() is 0. Normally, this value is populated from
c->x86_clflush_size.
Right now the code is set up to get c->x86_clflush_size from two
places. First, modern CPUs get it from CPUID. Old CPUs that don't
have leaf 0x80000008 (or CPUID at all) just get some sane defaults
from the kernel in get_cpu_address_sizes().
The vast majority of CPUs that have leaf 0x80000008 also get
->x86_clflush_size from CPUID. But there are oddballs.
Intel Quark CPUs[1] and others[2] have leaf 0x80000008 but don't set
CPUID.01H:EDX[CLFSH], so they skip over filling in ->x86_clflush_size:
cpuid(0x00000001, &tfms, &misc, &junk, &cap0);
if (cap0 & (1<<19))
c->x86_clflush_size = ((misc >> 8) & 0xff) * 8;
So they: land in get_cpu_address_sizes() and see that CPUID has level
0x80000008 and jump into the side of the if() that does not fill in
c->x86_clflush_size. That assigns a 0 to c->x86_cache_alignment, and
hilarity ensues in code like:
buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
GFP_KERNEL);
To fix this, always provide a sane value for ->x86_clflush_size.
Big thanks to Andy Shevchenko for finding and reporting this and also
providing a first pass at a fix. But his fix was only partial and only
worked on the Quark CPUs. It would not, for instance, have worked on
the QEMU config.
1. https://raw.githubusercontent.com/InstLatx64/InstLatx64/master/GenuineIntel…
2. You can also get this behavior if you use "-cpu 486,+clzero"
in QEMU.
[ dhansen: remove 'vp_bits_from_cpuid' reference in changelog
because bpetkov brutally murdered it recently. ]
Fixes: fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
Reported-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Tested-by: Jörn Heusipp <osmanx(a)heusipp.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20240516173928.3960193-1-andriy.shevchenko@linu…
Link: https://lore.kernel.org/lkml/5e31cad3-ad4d-493e-ab07-724cfbfaba44@heusipp.d…
Link: https://lore.kernel.org/all/20240517200534.8EC5F33E%40davehans-spike.ostc.i…
---
arch/x86/kernel/cpu/common.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2b170da..e31293c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1075,6 +1075,10 @@ void get_cpu_address_sizes(struct cpuinfo_x86 *c)
c->x86_virt_bits = (eax >> 8) & 0xff;
c->x86_phys_bits = eax & 0xff;
+
+ /* Provide a sane default if not enumerated: */
+ if (!c->x86_clflush_size)
+ c->x86_clflush_size = 32;
}
c->x86_cache_bits = c->x86_phys_bits;
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: b9210e56d71d9deb1ad692e405f6b2394f7baa4d
Gitweb: https://git.kernel.org/tip/b9210e56d71d9deb1ad692e405f6b2394f7baa4d
Author: Dave Hansen <dave.hansen(a)linux.intel.com>
AuthorDate: Fri, 17 May 2024 13:05:34 -07:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Thu, 30 May 2024 07:14:27 -07:00
x86/cpu: Provide default cache line size if not enumerated
tl;dr: CPUs with CPUID.80000008H but without CPUID.01H:EDX[CLFSH]
will end up reporting cache_line_size()==0 and bad things happen.
Fill in a default on those to avoid the problem.
Long Story:
The kernel dies a horrible death if c->x86_cache_alignment (aka.
cache_line_size() is 0. Normally, this value is populated from
c->x86_clflush_size.
Right now the code is set up to get c->x86_clflush_size from two
places. First, modern CPUs get it from CPUID. Old CPUs that don't
have leaf 0x80000008 (or CPUID at all) just get some sane defaults
from the kernel in get_cpu_address_sizes().
The vast majority of CPUs that have leaf 0x80000008 also get
->x86_clflush_size from CPUID. But there are oddballs.
Intel Quark CPUs[1] and others[2] have leaf 0x80000008 but don't set
CPUID.01H:EDX[CLFSH], so they skip over filling in ->x86_clflush_size:
cpuid(0x00000001, &tfms, &misc, &junk, &cap0);
if (cap0 & (1<<19))
c->x86_clflush_size = ((misc >> 8) & 0xff) * 8;
So they: land in get_cpu_address_sizes() and see that CPUID has level
0x80000008 and jump into the side of the if() that does not fill in
c->x86_clflush_size. That assigns a 0 to c->x86_cache_alignment, and
hilarity ensues in code like:
buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
GFP_KERNEL);
To fix this, always provide a sane value for ->x86_clflush_size.
Big thanks to Andy Shevchenko for finding and reporting this and also
providing a first pass at a fix. But his fix was only partial and only
worked on the Quark CPUs. It would not, for instance, have worked on
the QEMU config.
1. https://raw.githubusercontent.com/InstLatx64/InstLatx64/master/GenuineIntel…
2. You can also get this behavior if you use "-cpu 486,+clzero"
in QEMU.
[ dhansen: remove 'vp_bits_from_cpuid' reference in changelog
because bpetkov brutally murdered it recently. ]
Fixes: fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
Reported-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Tested-by: Jörn Heusipp <osmanx(a)heusipp.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20240516173928.3960193-1-andriy.shevchenko@linu…
Link: https://lore.kernel.org/lkml/5e31cad3-ad4d-493e-ab07-724cfbfaba44@heusipp.d…
Link: https://lore.kernel.org/all/20240517200534.8EC5F33E%40davehans-spike.ostc.i…
---
arch/x86/kernel/cpu/common.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2b170da..373b16b 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1070,6 +1070,10 @@ void get_cpu_address_sizes(struct cpuinfo_x86 *c)
cpu_has(c, X86_FEATURE_PSE36))
c->x86_phys_bits = 36;
}
+
+ /* Provide a sane default if not enumerated: */
+ if (!c->x86_clflush_size)
+ c->x86_clflush_size = 32;
} else {
cpuid(0x80000008, &eax, &ebx, &ecx, &edx);
From: Dave Hansen <dave.hansen(a)linux.intel.com>
tl;dr: CPUs with CPUID.80000008H but without CPUID.01H:EDX[CLFSH]
will end up reporting cache_line_size()==0 and bad things happen.
Fill in a default on those to avoid the problem.
Long Story:
The kernel dies a horrible death if c->x86_cache_alignment (aka.
cache_line_size() is 0. Normally, this value is populated from
c->x86_clflush_size.
Right now the code is set up to get c->x86_clflush_size from two
places. First, modern CPUs get it from CPUID. Old CPUs that don't
have leaf 0x80000008 (or CPUID at all) just get some sane defaults
from the kernel in get_cpu_address_sizes().
The vast majority of CPUs that have leaf 0x80000008 also get
->x86_clflush_size from CPUID. But there are oddballs.
Intel Quark CPUs[1] and others[2] have leaf 0x80000008 but don't set
CPUID.01H:EDX[CLFSH], so they skip over filling in ->x86_clflush_size:
cpuid(0x00000001, &tfms, &misc, &junk, &cap0);
if (cap0 & (1<<19))
c->x86_clflush_size = ((misc >> 8) & 0xff) * 8;
So they: land in get_cpu_address_sizes(), set vp_bits_from_cpuid=0 and
never fill in c->x86_clflush_size, assign c->x86_cache_alignment, and
hilarity ensues in code like:
buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
GFP_KERNEL);
To fix this, always provide a sane value for ->x86_clflush_size.
Big thanks to Andy Shevchenko for finding and reporting this and also
providing a first pass at a fix. But his fix was only partial and only
worked on the Quark CPUs. It would not, for instance, have worked on
the QEMU config.
1. https://raw.githubusercontent.com/InstLatx64/InstLatx64/master/GenuineIntel…
2. You can also get this behavior if you use "-cpu 486,+clzero"
in QEMU.
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reported-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Fixes: fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
Link: https://lore.kernel.org/all/20240516173928.3960193-1-andriy.shevchenko@linu…
Cc: stable(a)vger.kernel.org
---
b/arch/x86/kernel/cpu/common.c | 4 ++++
1 file changed, 4 insertions(+)
diff -puN arch/x86/kernel/cpu/common.c~default-x86_clflush_size arch/x86/kernel/cpu/common.c
--- a/arch/x86/kernel/cpu/common.c~default-x86_clflush_size 2024-05-17 12:51:25.886169008 -0700
+++ b/arch/x86/kernel/cpu/common.c 2024-05-17 13:03:09.761999885 -0700
@@ -1064,6 +1064,10 @@ void get_cpu_address_sizes(struct cpuinf
c->x86_virt_bits = (eax >> 8) & 0xff;
c->x86_phys_bits = eax & 0xff;
+
+ /* Provide a sane default if not enumerated: */
+ if (!c->x86_clflush_size)
+ c->x86_clflush_size = 32;
} else {
if (IS_ENABLED(CONFIG_X86_64)) {
c->x86_clflush_size = 64;
_
From: Bill Wendling <morbo(a)google.com>
Work for __counted_by on generic pointers in structures (not just
flexible array members) has started landing in Clang 19 (current tip of
tree). During the development of this feature, a restriction was added
to __counted_by to prevent the flexible array member's element type from
including a flexible array member itself such as:
struct foo {
int count;
char buf[];
};
struct bar {
int count;
struct foo data[] __counted_by(count);
};
because the size of data cannot be calculated with the standard array
size formula:
sizeof(struct foo) * count
This restriction was downgraded to a warning but due to CONFIG_WERROR,
it can still break the build. The application of __counted_by on the
states member of 'struct _StateArray' triggers this restriction,
resulting in:
drivers/gpu/drm/radeon/pptable.h:442:5: error: 'counted_by' should not be applied to an array with element of unknown size because 'ATOM_PPLIB_STATE_V2' (aka 'struct _ATOM_PPLIB_STATE_V2') is a struct type with a flexible array member. This will be an error in a future compiler version [-Werror,-Wbounds-safety-counted-by-elt-type-unknown-size]
442 | ATOM_PPLIB_STATE_V2 states[] __counted_by(ucNumEntries);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Remove this use of __counted_by to fix the warning/error. However,
rather than remove it altogether, leave it commented, as it may be
possible to support this in future compiler releases.
Cc: stable(a)vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/2028
Fixes: efade6fe50e7 ("drm/radeon: silence UBSAN warning (v3)")
Signed-off-by: Bill Wendling <morbo(a)google.com>
Co-developed-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
drivers/gpu/drm/radeon/pptable.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/pptable.h b/drivers/gpu/drm/radeon/pptable.h
index b7f22597ee95..969a8fb0ee9e 100644
--- a/drivers/gpu/drm/radeon/pptable.h
+++ b/drivers/gpu/drm/radeon/pptable.h
@@ -439,7 +439,7 @@ typedef struct _StateArray{
//how many states we have
UCHAR ucNumEntries;
- ATOM_PPLIB_STATE_V2 states[] __counted_by(ucNumEntries);
+ ATOM_PPLIB_STATE_V2 states[] /* __counted_by(ucNumEntries) */;
}StateArray;
---
base-commit: e64e8f7c178e5228e0b2dbb504b9dc75953a319f
change-id: 20240529-drop-counted-by-radeon-states-state-array-01936ded4c18
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
Hi all,
This series fixed some issues on bootloader - kernel
interface.
The first two fixed booting with devicetree, the last two
enhanced kernel's tolerance on different bootloader implementation.
Please review.
Thanks
Signed-off-by: Jiaxun Yang <jiaxun.yang(a)flygoat.com>
---
Changes in v3:
- Polish all individual patches with comments received offline.
- Link to v2: https://lore.kernel.org/r/20240522-loongarch-booting-fixes-v2-0-727edb96e54…
Changes in v2:
- Enhance PATCH 3-4 based on off list discussions with Huacai & co.
- Link to v1: https://lore.kernel.org/r/20240521-loongarch-booting-fixes-v1-0-659c201c037…
---
Jiaxun Yang (4):
LoongArch: Fix built-in DTB detection
LoongArch: smp: Add all CPUs enabled by fdt to NUMA node 0
LoongArch: Fix entry point in image header
LoongArch: Override higher address bits in JUMP_VIRT_ADDR
arch/loongarch/include/asm/stackframe.h | 2 +-
arch/loongarch/kernel/head.S | 2 +-
arch/loongarch/kernel/setup.c | 6 ++++--
arch/loongarch/kernel/smp.c | 5 ++++-
arch/loongarch/kernel/vmlinux.lds.S | 2 ++
drivers/firmware/efi/libstub/loongarch.c | 2 +-
6 files changed, 13 insertions(+), 6 deletions(-)
---
base-commit: 124cfbcd6d185d4f50be02d5f5afe61578916773
change-id: 20240521-loongarch-booting-fixes-366e13e7ca55
Best regards,
--
Jiaxun Yang <jiaxun.yang(a)flygoat.com>
From: Matthias Maennich <maennich(a)google.com>
Build environments might be running with different umask settings
resulting in indeterministic file modes for the files contained in
kheaders.tar.xz. The file itself is served with 444, i.e. world
readable. Archive the files explicitly with 744,a+X to improve
reproducibility across build environments.
--mode=0444 is not suitable as directories need to be executable. Also,
444 makes it hard to delete all the readonly files after extraction.
Cc: stable(a)vger.kernel.org
Cc: linux-kbuild(a)vger.kernel.org
Cc: Masahiro Yamada <masahiroy(a)kernel.org>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Signed-off-by: Matthias Maennich <maennich(a)google.com>
---
kernel/gen_kheaders.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/gen_kheaders.sh b/kernel/gen_kheaders.sh
index 6d443ea22bb7..8b6e0c2bc0df 100755
--- a/kernel/gen_kheaders.sh
+++ b/kernel/gen_kheaders.sh
@@ -84,7 +84,7 @@ find $cpio_dir -type f -print0 |
# Create archive and try to normalize metadata for reproducibility.
tar "${KBUILD_BUILD_TIMESTAMP:+--mtime=$KBUILD_BUILD_TIMESTAMP}" \
- --owner=0 --group=0 --sort=name --numeric-owner \
+ --owner=0 --group=0 --sort=name --numeric-owner --mode=u=rw,go=r,a+X \
-I $XZ -cf $tarfile -C $cpio_dir/ . > /dev/null
echo $headers_md5 > kernel/kheaders.md5
--
2.45.1.288.g0e0cd299f1-goog
Commit b8b8b4e0c052 ("ata: ahci: Add Intel Alder Lake-P AHCI controller
to low power chipsets list") added Intel Alder Lake to the ahci_pci_tbl.
Because of the way that the Intel PCS quirk was implemented, having
an explicit entry in the ahci_pci_tbl caused the Intel PCS quirk to
be applied. (The quirk was not being applied if there was no explict
entry.)
Thus, entries that were added to the ahci_pci_tbl also got the Intel
PCS quirk applied.
The quirk was cleaned up in commit 7edbb6059274 ("ahci: clean up
intel_pcs_quirk"), such that it is clear which entries that actually
applies the Intel PCS quirk.
Newer Intel AHCI controllers do not need the Intel PCS quirk,
and applying it when not needed actually breaks some platforms.
Do not apply the Intel PCS quirk for Intel Alder Lake.
This is in line with how things worked before commit b8b8b4e0c052 ("ata:
ahci: Add Intel Alder Lake-P AHCI controller to low power chipsets list"),
such that certain platforms using Intel Alder Lake will work once again.
Cc: stable(a)vger.kernel.org # 6.7
Fixes: b8b8b4e0c052 ("ata: ahci: Add Intel Alder Lake-P AHCI controller to low power chipsets list")
Signed-off-by: Jason Nader <dev(a)kayoway.com>
---
drivers/ata/ahci.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 6548f10e61d9..07d66d2c5f0d 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -429,7 +429,6 @@ static const struct pci_device_id ahci_pci_tbl[] = {
{ PCI_VDEVICE(INTEL, 0x02d7), board_ahci_pcs_quirk }, /* Comet Lake PCH RAID */
/* Elkhart Lake IDs 0x4b60 & 0x4b62 https://sata-io.org/product/8803 not tested yet */
{ PCI_VDEVICE(INTEL, 0x4b63), board_ahci_pcs_quirk }, /* Elkhart Lake AHCI */
- { PCI_VDEVICE(INTEL, 0x7ae2), board_ahci_pcs_quirk }, /* Alder Lake-P AHCI */
/* JMicron 360/1/3/5/6, match class to avoid IDE function */
{ PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
--
2.45.1
On SC7280, in host mode, it is observed that stressing out controller
in host mode results in HC died error and only restarting the host
mode fixes it. Disable SS instances in park mode for these targets to
avoid host controller being dead.
Reported-by: Doug Anderson <dianders(a)google.com>
Cc: <stable(a)vger.kernel.org>
Fixes: bb9efa59c665 ("arm64: dts: qcom: sc7280: Add USB related nodes")
Signed-off-by: Krishna Kurapati <quic_kriskura(a)quicinc.com>
---
arch/arm64/boot/dts/qcom/sc7280.dtsi | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/boot/dts/qcom/sc7280.dtsi b/arch/arm64/boot/dts/qcom/sc7280.dtsi
index 41f51d326111..e0d3eeb6f639 100644
--- a/arch/arm64/boot/dts/qcom/sc7280.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280.dtsi
@@ -4131,6 +4131,7 @@ usb_1_dwc3: usb@a600000 {
iommus = <&apps_smmu 0xe0 0x0>;
snps,dis_u2_susphy_quirk;
snps,dis_enblslpm_quirk;
+ snps,parkmode-disable-ss-quirk;
phys = <&usb_1_hsphy>, <&usb_1_qmpphy QMP_USB43DP_USB3_PHY>;
phy-names = "usb2-phy", "usb3-phy";
maximum-speed = "super-speed";
--
2.34.1
amd_gpio_init() uses pci_read_config_dword() that returns PCIBIOS_*
codes. The return code is then returned as is but amd_gpio_init() is
a module init function that should return normal errnos.
Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal
errno before returning it from amd_gpio_init().
Fixes: f942a7de047d ("gpio: add a driver for GPIO pins found on AMD-8111 south bridge chips")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
drivers/gpio/gpio-amd8111.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpio/gpio-amd8111.c b/drivers/gpio/gpio-amd8111.c
index 6f3ded619c8b..3377667a28de 100644
--- a/drivers/gpio/gpio-amd8111.c
+++ b/drivers/gpio/gpio-amd8111.c
@@ -195,8 +195,10 @@ static int __init amd_gpio_init(void)
found:
err = pci_read_config_dword(pdev, 0x58, &gp.pmbase);
- if (err)
+ if (err) {
+ err = pcibios_err_to_errno(err);
goto out;
+ }
err = -EIO;
gp.pmbase &= 0x0000FF00;
if (gp.pmbase == 0)
--
2.39.2
Add support for device with USB ID 1e71:2020.
Fan speed control reported to be working with existing userspace (hidraw)
software, so I assume it's compatible. Fan channel count is the same.
No known differences from already supported devices, at least regarding
fan speed control and initialization.
Discovered in liquidctl project:
https://github.com/liquidctl/liquidctl/pull/702
Signed-off-by: Aleksandr Mezin <mezin.alexander(a)gmail.com>
Cc: stable(a)vger.kernel.org # v6.1+
---
v2: Improved the description, changed the subject to include device id
(previous subject was "hwmon: (nzxt-smart2) add another USB ID").
drivers/hwmon/nzxt-smart2.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/hwmon/nzxt-smart2.c b/drivers/hwmon/nzxt-smart2.c
index 7aa586eb74be..df6fa72a6b59 100644
--- a/drivers/hwmon/nzxt-smart2.c
+++ b/drivers/hwmon/nzxt-smart2.c
@@ -799,6 +799,7 @@ static const struct hid_device_id nzxt_smart2_hid_id_table[] = {
{ HID_USB_DEVICE(0x1e71, 0x2010) }, /* NZXT RGB & Fan Controller */
{ HID_USB_DEVICE(0x1e71, 0x2011) }, /* NZXT RGB & Fan Controller (6 RGB) */
{ HID_USB_DEVICE(0x1e71, 0x2019) }, /* NZXT RGB & Fan Controller (6 RGB) */
+ { HID_USB_DEVICE(0x1e71, 0x2020) }, /* NZXT RGB & Fan Controller (6 RGB) */
{},
};
--
2.45.1
The _scoped() version of the fwnode_for_each_available_child_node()
follows the approach recently taken for other loops that handle child
nodes like for_each_child_of_node_scoped() or
device_for_each_child_node_scoped(), which are based on the __free()
auto cleanup handler to remove the need for fwnode_handle_put() on
early loop exits.
This new variant has been tested with the LTC2992, which currently uses
the non-scoped variant. There is one error path that does not decrement
the refcount of the child node, which can be fixed by using the new
macro. The bug was introduced in a later modification of the loop, which
shows how useful an automatic cleanup solution can be in many uses of
the non-scoped version.
In order to provide a backportable patch, the conversion in the LTC2992
driver is carried out in two steps: first the missing
fwnode_handle_put() is added, and then the code is refactored to adopt
the new, safer approach.
@Andy Shevchenko: I kept your Reviewed-by in 3/3, that now also removes
the new fwnode_handle_put() and braces added with 1/3.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Changes in v2:
- Fix the memory leak in a backportable patch and tag it for stable.
- Refactor 1/3 with 3/3 as well.
- Link to v1: https://lore.kernel.org/r/20240522-fwnode_for_each_available_child_node_sco…
---
Javier Carrasco (3):
hwmon: (ltc2992) Fix memory leak in ltc2992_parse_dt()
device property: introduce fwnode_for_each_available_child_node_scoped()
hwmon: (ltc2992) Use fwnode_for_each_available_child_node_scoped()
drivers/hwmon/ltc2992.c | 11 +++--------
include/linux/property.h | 5 +++++
2 files changed, 8 insertions(+), 8 deletions(-)
---
base-commit: 124cfbcd6d185d4f50be02d5f5afe61578916773
change-id: 20240521-fwnode_for_each_available_child_node_scoped-8f1f09d3a10c
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
If a process module does not have base config extension then the same
format applies to all of it's inputs and the process->base_config_ext is
NULL, causing NULL dereference when specifically crafted topology and
sequences used.
Fixes: 648fea128476 ("ASoC: SOF: ipc4-topology: set copier output format for process module")
Signed-off-by: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
Reviewed-by: Seppo Ingalsuo <seppo.ingalsuo(a)linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
Hi Mark,
Can you please pick this patch for 6.10 as it can fix a potential NULL
pointer dereference bug.
Thank you,
Peter
sound/soc/sof/ipc4-topology.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/sound/soc/sof/ipc4-topology.c b/sound/soc/sof/ipc4-topology.c
index beff10989324..33e8c5f7d9ca 100644
--- a/sound/soc/sof/ipc4-topology.c
+++ b/sound/soc/sof/ipc4-topology.c
@@ -217,6 +217,14 @@ sof_ipc4_get_input_pin_audio_fmt(struct snd_sof_widget *swidget, int pin_index)
}
process = swidget->private;
+
+ /*
+ * For process modules without base config extension, base module config
+ * format is used for all input pins
+ */
+ if (process->init_config != SOF_IPC4_MODULE_INIT_CONFIG_TYPE_BASE_CFG_WITH_EXT)
+ return &process->base_config.audio_fmt;
+
base_cfg_ext = process->base_config_ext;
/*
--
2.45.1
The SPMI GPIO driver assumes that the parent device is an SPMI device
and accesses random data when backcasting the parent struct device
pointer for non-SPMI devices.
Fortunately this does not seem to cause any issues currently when the
parent device is an I2C client like the PM8008, but this could change if
the structures are reorganised (e.g. using structure randomisation).
Notably the interrupt implementation is also broken for non-SPMI devices.
Also note that the two GPIO pins on PM8008 are used for interrupts and
reset so their practical use should be limited.
Drop the broken GPIO support for PM8008 for now.
Fixes: ea119e5a482a ("pinctrl: qcom-pmic-gpio: Add support for pm8008")
Cc: stable(a)vger.kernel.org # 5.13
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/pinctrl/qcom/pinctrl-spmi-gpio.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/pinctrl/qcom/pinctrl-spmi-gpio.c b/drivers/pinctrl/qcom/pinctrl-spmi-gpio.c
index f4e2c88a7c82..e61be7d05494 100644
--- a/drivers/pinctrl/qcom/pinctrl-spmi-gpio.c
+++ b/drivers/pinctrl/qcom/pinctrl-spmi-gpio.c
@@ -1206,7 +1206,6 @@ static const struct of_device_id pmic_gpio_of_match[] = {
{ .compatible = "qcom,pm7325-gpio", .data = (void *) 10 },
{ .compatible = "qcom,pm7550ba-gpio", .data = (void *) 8},
{ .compatible = "qcom,pm8005-gpio", .data = (void *) 4 },
- { .compatible = "qcom,pm8008-gpio", .data = (void *) 2 },
{ .compatible = "qcom,pm8019-gpio", .data = (void *) 6 },
/* pm8150 has 10 GPIOs with holes on 2, 5, 7 and 8 */
{ .compatible = "qcom,pm8150-gpio", .data = (void *) 10 },
--
2.43.2
commit bd11dc4fb969ec148e50cd87f88a78246dbc4d0b upstream.
SO_KEEPALIVE support has been added a while ago, as part of a series
"adding SOL_SOCKET" support. To have a full control of this keep-alive
feature, it is important to also support TCP_KEEP* socket options at the
SOL_TCP level.
Supporting them on the setsockopt() part is easy, it is just a matter of
remembering each value in the MPTCP sock structure, and calling
tcp_sock_set_keep*() helpers on each subflow. If the value is not
modified (0), calling these helpers will not do anything. For the
getsockopt() part, the corresponding value from the MPTCP sock structure
or the default one is simply returned. All of this is very similar to
other TCP_* socket options supported by MPTCP.
It looks important for kernels supporting SO_KEEPALIVE, to also support
TCP_KEEP* options as well: some apps seem to (wrongly) consider that if
the former is supported, the latter ones will be supported as well. But
also, not having this simple and isolated change is preventing MPTCP
support in some apps, and libraries like GoLang [1]. This is why this
patch is seen as a fix.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/383
Fixes: 1b3e7ede1365 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
Link: https://github.com/golang/go/issues/56539 [1]
Acked-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Mat Martineau <martineau(a)kernel.org>
Link: https://lore.kernel.org/r/20240514011335.176158-3-martineau@kernel.org
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
[ Conflicts in the same context, because commit 29b5e5ef8739 ("mptcp:
implement TCP_NOTSENT_LOWAT support") (new feature), commit
013e3179dbd2 ("mptcp: fix rcv space initialization") (not backported
because of the various conflicts, and because the race fixed by this
commit "does not produce ill effects in practice"), and commit
4f6e14bd19d6 ("mptcp: support TCP_CORK and TCP_NODELAY") are not in
this version. The adaptations done by 7f71a337b515 ("mptcp: cleanup
SOL_TCP handling") have been adapted to this case here. Also,
TCP_KEEPINTVL and TCP_KEEPCNT value had to be set without lock, the
same way it was done on TCP side prior commit 6fd70a6b4e6f ("tcp: set
TCP_KEEPINTVL locklessly") and commit 84485080cbc1 ("tcp: set
TCP_KEEPCNT locklessly"). ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
net/mptcp/protocol.h | 3 ++
net/mptcp/sockopt.c | 115 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 118 insertions(+)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 78aa6125eafb..b4ccae4f6849 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -250,6 +250,9 @@ struct mptcp_sock {
bool use_64bit_ack; /* Set when we received a 64-bit DSN */
bool csum_enabled;
spinlock_t join_list_lock;
+ int keepalive_cnt;
+ int keepalive_idle;
+ int keepalive_intvl;
struct work_struct work;
struct sk_buff *ooo_last_skb;
struct rb_root out_of_order_queue;
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index bd797d8f72ab..36d85af12e76 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -593,6 +593,60 @@ static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t
return ret;
}
+static int __tcp_sock_set_keepintvl(struct sock *sk, int val)
+{
+ if (val < 1 || val > MAX_TCP_KEEPINTVL)
+ return -EINVAL;
+
+ WRITE_ONCE(tcp_sk(sk)->keepalive_intvl, val * HZ);
+
+ return 0;
+}
+
+static int __tcp_sock_set_keepcnt(struct sock *sk, int val)
+{
+ if (val < 1 || val > MAX_TCP_KEEPCNT)
+ return -EINVAL;
+
+ /* Paired with READ_ONCE() in keepalive_probes() */
+ WRITE_ONCE(tcp_sk(sk)->keepalive_probes, val);
+
+ return 0;
+}
+
+static int mptcp_setsockopt_set_val(struct mptcp_sock *msk, int max,
+ int (*set_val)(struct sock *, int),
+ int *msk_val, sockptr_t optval,
+ unsigned int optlen)
+{
+ struct mptcp_subflow_context *subflow;
+ struct sock *sk = (struct sock *)msk;
+ int val, err;
+
+ err = mptcp_get_int_option(msk, optval, optlen, &val);
+ if (err)
+ return err;
+
+ lock_sock(sk);
+ mptcp_for_each_subflow(msk, subflow) {
+ struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+ int ret;
+
+ lock_sock(ssk);
+ ret = set_val(ssk, val);
+ err = err ? : ret;
+ release_sock(ssk);
+ }
+
+ if (!err) {
+ *msk_val = val;
+ sockopt_seq_inc(msk);
+ }
+ release_sock(sk);
+
+ return err;
+}
+
static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
sockptr_t optval, unsigned int optlen)
{
@@ -601,6 +655,21 @@ static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
return -EOPNOTSUPP;
case TCP_CONGESTION:
return mptcp_setsockopt_sol_tcp_congestion(msk, optval, optlen);
+ case TCP_KEEPIDLE:
+ return mptcp_setsockopt_set_val(msk, MAX_TCP_KEEPIDLE,
+ &tcp_sock_set_keepidle_locked,
+ &msk->keepalive_idle,
+ optval, optlen);
+ case TCP_KEEPINTVL:
+ return mptcp_setsockopt_set_val(msk, MAX_TCP_KEEPINTVL,
+ &__tcp_sock_set_keepintvl,
+ &msk->keepalive_intvl,
+ optval, optlen);
+ case TCP_KEEPCNT:
+ return mptcp_setsockopt_set_val(msk, MAX_TCP_KEEPCNT,
+ &__tcp_sock_set_keepcnt,
+ &msk->keepalive_cnt,
+ optval, optlen);
}
return -EOPNOTSUPP;
@@ -667,9 +736,40 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
return ret;
}
+static int mptcp_put_int_option(struct mptcp_sock *msk, char __user *optval,
+ int __user *optlen, int val)
+{
+ int len;
+
+ if (get_user(len, optlen))
+ return -EFAULT;
+ if (len < 0)
+ return -EINVAL;
+
+ if (len < sizeof(int) && len > 0 && val >= 0 && val <= 255) {
+ unsigned char ucval = (unsigned char)val;
+
+ len = 1;
+ if (put_user(len, optlen))
+ return -EFAULT;
+ if (copy_to_user(optval, &ucval, 1))
+ return -EFAULT;
+ } else {
+ len = min_t(unsigned int, len, sizeof(int));
+ if (put_user(len, optlen))
+ return -EFAULT;
+ if (copy_to_user(optval, &val, len))
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
char __user *optval, int __user *optlen)
{
+ struct sock *sk = (void *)msk;
+
switch (optname) {
case TCP_ULP:
case TCP_CONGESTION:
@@ -677,6 +777,18 @@ static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
case TCP_CC_INFO:
return mptcp_getsockopt_first_sf_only(msk, SOL_TCP, optname,
optval, optlen);
+ case TCP_KEEPIDLE:
+ return mptcp_put_int_option(msk, optval, optlen,
+ msk->keepalive_idle ? :
+ READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_keepalive_time) / HZ);
+ case TCP_KEEPINTVL:
+ return mptcp_put_int_option(msk, optval, optlen,
+ msk->keepalive_intvl ? :
+ READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_keepalive_intvl) / HZ);
+ case TCP_KEEPCNT:
+ return mptcp_put_int_option(msk, optval, optlen,
+ msk->keepalive_cnt ? :
+ READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_keepalive_probes));
}
return -EOPNOTSUPP;
}
@@ -746,6 +858,9 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
if (inet_csk(sk)->icsk_ca_ops != inet_csk(ssk)->icsk_ca_ops)
tcp_set_congestion_control(ssk, msk->ca_name, false, true);
+ tcp_sock_set_keepidle_locked(ssk, msk->keepalive_idle);
+ __tcp_sock_set_keepintvl(ssk, msk->keepalive_intvl);
+ __tcp_sock_set_keepcnt(ssk, msk->keepalive_cnt);
}
static void __mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
--
2.43.0
It looks like the patch "mptcp: fix full TCP keep-alive support" has
been backported up to v6.8 recently (thanks!), but not before due to
conflicts.
I had to adapt a bit the code not to backport new features, but the
modifications were simple, and isolated from the rest. Conflicts have
been described in each patch.
MPTCP sockopts tests have been executed, and no issues have been
reported.
Matthieu Baerts (NGI0) (1):
mptcp: fix full TCP keep-alive support
Paolo Abeni (2):
mptcp: avoid some duplicate code in socket option handling
mptcp: cleanup SOL_TCP handling
net/mptcp/protocol.h | 3 +
net/mptcp/sockopt.c | 144 +++++++++++++++++++++++++++++++------------
2 files changed, 108 insertions(+), 39 deletions(-)
--
2.43.0
If an NTFS file system is mounted to another system with different
PAGE_SIZE from the original system, log->page_size will change in
log_replay(), but log->page_{mask,bits} don't change correspondingly.
This will cause a panic because "u32 bytes = log->page_size - page_off"
will get a negative value in the later read_log_page().
Cc: stable(a)vger.kernel.org
Fixes: b46acd6a6a627d876898e ("fs/ntfs3: Add NTFS journal")
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
fs/ntfs3/fslog.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/ntfs3/fslog.c b/fs/ntfs3/fslog.c
index 855519713bf7..19c448093df7 100644
--- a/fs/ntfs3/fslog.c
+++ b/fs/ntfs3/fslog.c
@@ -3914,6 +3914,9 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
goto out;
}
+ log->page_mask = log->page_size - 1;
+ log->page_bits = blksize_bits(log->page_size);
+
/* If the file size has shrunk then we won't mount it. */
if (log->l_size < le64_to_cpu(ra2->l_size)) {
err = -EINVAL;
--
2.43.0
Hello,
I hope this note finds you well. I recently checked my account statement and observed a expense from your shop that I do not remember. The expense appears strange to me, and I would value your assistance in clarifying it.
Could you please offer clarification on this charge? Any information you can give would be greatly valued.
Thank you for your prompt response to this concern.
Best regards,
Linda Smith
The crypto_ahb and crypto_axi clks are hardware voteable.
This means that the halt bit isn't reliable because some
other voter in the system, e.g. TrustZone, could be keeping
the clk enabled when the kernel turns it off from clk_disable().
Make these clks use voting mode by changing the halt check to
BRANCH_HALT_VOTED and toggle the voting bit in the voting register
instead of directly controlling the branch by writing to the branch
register. This fixes stuck clk warnings seen on ipq9574 and saves
power by actually turning the clk off.
Also changes the CRYPTO_AHB_CLK_ENA & CRYPTO_AXI_CLK_ENA
offset to 0xb004 from 0x16014.
Cc: stable(a)vger.kernel.org
Fixes: f6b2bd9cb29a ("clk: qcom: gcc-ipq9574: Enable crypto clocks")
Signed-off-by: Md Sadre Alam <quic_mdalam(a)quicinc.com>
---
Change in [v3]
* Updated commit message
Change in [v2]
* Added Fixes tag and stable kernel tag
* updated commit message about offset change
drivers/clk/qcom/gcc-ipq9574.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/clk/qcom/gcc-ipq9574.c b/drivers/clk/qcom/gcc-ipq9574.c
index 0a3f846695b8..f8b9a1e93bef 100644
--- a/drivers/clk/qcom/gcc-ipq9574.c
+++ b/drivers/clk/qcom/gcc-ipq9574.c
@@ -2140,9 +2140,10 @@ static struct clk_rcg2 pcnoc_bfdcd_clk_src = {
static struct clk_branch gcc_crypto_axi_clk = {
.halt_reg = 0x16010,
+ .halt_check = BRANCH_HALT_VOTED,
.clkr = {
- .enable_reg = 0x16010,
- .enable_mask = BIT(0),
+ .enable_reg = 0xb004,
+ .enable_mask = BIT(15),
.hw.init = &(const struct clk_init_data) {
.name = "gcc_crypto_axi_clk",
.parent_hws = (const struct clk_hw *[]) {
@@ -2156,9 +2157,10 @@ static struct clk_branch gcc_crypto_axi_clk = {
static struct clk_branch gcc_crypto_ahb_clk = {
.halt_reg = 0x16014,
+ .halt_check = BRANCH_HALT_VOTED,
.clkr = {
- .enable_reg = 0x16014,
- .enable_mask = BIT(0),
+ .enable_reg = 0xb004,
+ .enable_mask = BIT(16),
.hw.init = &(const struct clk_init_data) {
.name = "gcc_crypto_ahb_clk",
.parent_hws = (const struct clk_hw *[]) {
--
2.34.1
The patch titled
Subject: mm/hugetlb: do not call vma_add_reservation upon ENOMEM
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-hugetlb-do-not-call-vma_add_reservation-upon-enomem.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Oscar Salvador <osalvador(a)suse.de>
Subject: mm/hugetlb: do not call vma_add_reservation upon ENOMEM
Date: Tue, 28 May 2024 22:53:23 +0200
sysbot reported a splat [1] on __unmap_hugepage_range(). This is because
vma_needs_reservation() can return -ENOMEM if
allocate_file_region_entries() fails to allocate the file_region struct
for the reservation.
Check for that and do not call vma_add_reservation() if that is the case,
otherwise region_abort() and region_del() will see that we do not have any
file_regions.
If we detect that vma_needs_reservation() returned -ENOMEM, we clear the
hugetlb_restore_reserve flag as if this reservation was still consumed, so
free_huge_folio() will not increment the resv count.
[1] https://lore.kernel.org/linux-mm/0000000000004096100617c58d54@google.com/T/…
Link: https://lkml.kernel.org/r/20240528205323.20439-1-osalvador@suse.de
Fixes: df7a6d1f6405 ("mm/hugetlb: restore the reservation if needed")
Signed-off-by: Oscar Salvador <osalvador(a)suse.de>
Reported-and-tested-by: syzbot+d3fe2dc5ffe9380b714b(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/0000000000004096100617c58d54@google.com/
Cc: Breno Leitao <leitao(a)debian.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-do-not-call-vma_add_reservation-upon-enomem
+++ a/mm/hugetlb.c
@@ -5768,8 +5768,20 @@ void __unmap_hugepage_range(struct mmu_g
* do_exit() will not see it, and will keep the reservation
* forever.
*/
- if (adjust_reservation && vma_needs_reservation(h, vma, address))
- vma_add_reservation(h, vma, address);
+ if (adjust_reservation) {
+ int rc = vma_needs_reservation(h, vma, address);
+
+ if (rc < 0)
+ /* Pressumably allocate_file_region_entries failed
+ * to allocate a file_region struct. Clear
+ * hugetlb_restore_reserve so that global reserve
+ * count will not be incremented by free_huge_folio.
+ * Act as if we consumed the reservation.
+ */
+ folio_clear_hugetlb_restore_reserve(page_folio(page));
+ else if (rc)
+ vma_add_reservation(h, vma, address);
+ }
tlb_remove_page_size(tlb, page, huge_page_size(h));
/*
_
Patches currently in -mm which might be from osalvador(a)suse.de are
mm-hugetlb-do-not-call-vma_add_reservation-upon-enomem.patch
mm-hugetlb-drop-node_alloc_noretry-from-alloc_fresh_hugetlb_folio.patch
The patch titled
Subject: mm/ksm: fix ksm_zero_pages accounting
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-ksm-fix-ksm_zero_pages-accounting.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Chengming Zhou <chengming.zhou(a)linux.dev>
Subject: mm/ksm: fix ksm_zero_pages accounting
Date: Tue, 28 May 2024 13:15:22 +0800
We normally ksm_zero_pages++ in ksmd when page is merged with zero page,
but ksm_zero_pages-- is done from page tables side, where there is no any
accessing protection of ksm_zero_pages.
So we can read very exceptional value of ksm_zero_pages in rare cases,
such as -1, which is very confusing to users.
Fix it by changing to use atomic_long_t, and the same case with the
mm->ksm_zero_pages.
Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-2-34bb358fdc13@linux.…
Fixes: e2942062e01d ("ksm: count all zero pages placed by KSM")
Fixes: 6080d19f0704 ("ksm: add ksm zero pages for each process")
Signed-off-by: Chengming Zhou <chengming.zhou(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Ran Xiaokai <ran.xiaokai(a)zte.com.cn>
Cc: Stefan Roesch <shr(a)devkernel.io>
Cc: xu xin <xu.xin16(a)zte.com.cn>
Cc: Yang Yang <yang.yang29(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/base.c | 2 +-
include/linux/ksm.h | 17 ++++++++++++++---
include/linux/mm_types.h | 2 +-
mm/ksm.c | 11 +++++------
4 files changed, 21 insertions(+), 11 deletions(-)
--- a/fs/proc/base.c~mm-ksm-fix-ksm_zero_pages-accounting
+++ a/fs/proc/base.c
@@ -3214,7 +3214,7 @@ static int proc_pid_ksm_stat(struct seq_
mm = get_task_mm(task);
if (mm) {
seq_printf(m, "ksm_rmap_items %lu\n", mm->ksm_rmap_items);
- seq_printf(m, "ksm_zero_pages %lu\n", mm->ksm_zero_pages);
+ seq_printf(m, "ksm_zero_pages %ld\n", mm_ksm_zero_pages(mm));
seq_printf(m, "ksm_merging_pages %lu\n", mm->ksm_merging_pages);
seq_printf(m, "ksm_process_profit %ld\n", ksm_process_profit(mm));
mmput(mm);
--- a/include/linux/ksm.h~mm-ksm-fix-ksm_zero_pages-accounting
+++ a/include/linux/ksm.h
@@ -33,16 +33,27 @@ void __ksm_exit(struct mm_struct *mm);
*/
#define is_ksm_zero_pte(pte) (is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte))
-extern unsigned long ksm_zero_pages;
+extern atomic_long_t ksm_zero_pages;
+
+static inline void ksm_map_zero_page(struct mm_struct *mm)
+{
+ atomic_long_inc(&ksm_zero_pages);
+ atomic_long_inc(&mm->ksm_zero_pages);
+}
static inline void ksm_might_unmap_zero_page(struct mm_struct *mm, pte_t pte)
{
if (is_ksm_zero_pte(pte)) {
- ksm_zero_pages--;
- mm->ksm_zero_pages--;
+ atomic_long_dec(&ksm_zero_pages);
+ atomic_long_dec(&mm->ksm_zero_pages);
}
}
+static inline long mm_ksm_zero_pages(struct mm_struct *mm)
+{
+ return atomic_long_read(&mm->ksm_zero_pages);
+}
+
static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
{
if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags))
--- a/include/linux/mm_types.h~mm-ksm-fix-ksm_zero_pages-accounting
+++ a/include/linux/mm_types.h
@@ -985,7 +985,7 @@ struct mm_struct {
* Represent how many empty pages are merged with kernel zero
* pages when enabling KSM use_zero_pages.
*/
- unsigned long ksm_zero_pages;
+ atomic_long_t ksm_zero_pages;
#endif /* CONFIG_KSM */
#ifdef CONFIG_LRU_GEN_WALKS_MMU
struct {
--- a/mm/ksm.c~mm-ksm-fix-ksm_zero_pages-accounting
+++ a/mm/ksm.c
@@ -296,7 +296,7 @@ static bool ksm_use_zero_pages __read_mo
static bool ksm_smart_scan = true;
/* The number of zero pages which is placed by KSM */
-unsigned long ksm_zero_pages;
+atomic_long_t ksm_zero_pages = ATOMIC_LONG_INIT(0);
/* The number of pages that have been skipped due to "smart scanning" */
static unsigned long ksm_pages_skipped;
@@ -1429,8 +1429,7 @@ static int replace_page(struct vm_area_s
* the dirty bit in zero page's PTE is set.
*/
newpte = pte_mkdirty(pte_mkspecial(pfn_pte(page_to_pfn(kpage), vma->vm_page_prot)));
- ksm_zero_pages++;
- mm->ksm_zero_pages++;
+ ksm_map_zero_page(mm);
/*
* We're replacing an anonymous page with a zero page, which is
* not anonymous. We need to do proper accounting otherwise we
@@ -3373,7 +3372,7 @@ static void wait_while_offlining(void)
#ifdef CONFIG_PROC_FS
long ksm_process_profit(struct mm_struct *mm)
{
- return (long)(mm->ksm_merging_pages + mm->ksm_zero_pages) * PAGE_SIZE -
+ return (long)(mm->ksm_merging_pages + mm_ksm_zero_pages(mm)) * PAGE_SIZE -
mm->ksm_rmap_items * sizeof(struct ksm_rmap_item);
}
#endif /* CONFIG_PROC_FS */
@@ -3662,7 +3661,7 @@ KSM_ATTR_RO(pages_skipped);
static ssize_t ksm_zero_pages_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
{
- return sysfs_emit(buf, "%ld\n", ksm_zero_pages);
+ return sysfs_emit(buf, "%ld\n", atomic_long_read(&ksm_zero_pages));
}
KSM_ATTR_RO(ksm_zero_pages);
@@ -3671,7 +3670,7 @@ static ssize_t general_profit_show(struc
{
long general_profit;
- general_profit = (ksm_pages_sharing + ksm_zero_pages) * PAGE_SIZE -
+ general_profit = (ksm_pages_sharing + atomic_long_read(&ksm_zero_pages)) * PAGE_SIZE -
ksm_rmap_items * sizeof(struct ksm_rmap_item);
return sysfs_emit(buf, "%ld\n", general_profit);
_
Patches currently in -mm which might be from chengming.zhou(a)linux.dev are
mm-ksm-fix-ksm_pages_scanned-accounting.patch
mm-ksm-fix-ksm_zero_pages-accounting.patch
The patch titled
Subject: mm/ksm: fix ksm_pages_scanned accounting
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-ksm-fix-ksm_pages_scanned-accounting.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Chengming Zhou <chengming.zhou(a)linux.dev>
Subject: mm/ksm: fix ksm_pages_scanned accounting
Date: Tue, 28 May 2024 13:15:21 +0800
Patch series "mm/ksm: fix some accounting problems", v3.
We encountered some abnormal ksm_pages_scanned and ksm_zero_pages during
some random tests.
1. ksm_pages_scanned unchanged even ksmd scanning has progress.
2. ksm_zero_pages maybe -1 in some rare cases.
This patch (of 2):
During testing, I found ksm_pages_scanned is unchanged although the
scan_get_next_rmap_item() did return valid rmap_item that is not NULL.
The reason is the scan_get_next_rmap_item() will return NULL after a full
scan, so ksm_do_scan() just return without accounting of the
ksm_pages_scanned.
Fix it by just putting ksm_pages_scanned accounting in that loop, and it
will be accounted more timely if that loop would last for a long time.
Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-0-34bb358fdc13@linux.…
Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-1-34bb358fdc13@linux.…
Fixes: b348b5fe2b5f ("mm/ksm: add pages scanned metric")
Signed-off-by: Chengming Zhou <chengming.zhou(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: xu xin <xu.xin16(a)zte.com.cn>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Ran Xiaokai <ran.xiaokai(a)zte.com.cn>
Cc: Stefan Roesch <shr(a)devkernel.io>
Cc: Yang Yang <yang.yang29(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/ksm.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
--- a/mm/ksm.c~mm-ksm-fix-ksm_pages_scanned-accounting
+++ a/mm/ksm.c
@@ -2753,18 +2753,16 @@ static void ksm_do_scan(unsigned int sca
{
struct ksm_rmap_item *rmap_item;
struct page *page;
- unsigned int npages = scan_npages;
- while (npages-- && likely(!freezing(current))) {
+ while (scan_npages-- && likely(!freezing(current))) {
cond_resched();
rmap_item = scan_get_next_rmap_item(&page);
if (!rmap_item)
return;
cmp_and_merge_page(page, rmap_item);
put_page(page);
+ ksm_pages_scanned++;
}
-
- ksm_pages_scanned += scan_npages - npages;
}
static int ksmd_should_run(void)
_
Patches currently in -mm which might be from chengming.zhou(a)linux.dev are
mm-ksm-fix-ksm_pages_scanned-accounting.patch
mm-ksm-fix-ksm_zero_pages-accounting.patch
sk_psock_get will return NULL if the refcount of psock has gone to 0, which
will happen when the last call of sk_psock_put is done. However,
sk_psock_drop may not have finished yet, so the close callback will still
point to sock_map_close despite psock being NULL.
This can be reproduced with a thread deleting an element from the sock map,
while the second one creates a socket, adds it to the map and closes it.
That will trigger the WARN_ON_ONCE:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 7220 at net/core/sock_map.c:1701 sock_map_close+0x2a2/0x2d0 net/core/sock_map.c:1701
Modules linked in:
CPU: 1 PID: 7220 Comm: syz-executor380 Not tainted 6.9.0-syzkaller-07726-g3c999d1ae3c7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
RIP: 0010:sock_map_close+0x2a2/0x2d0 net/core/sock_map.c:1701
Code: df e8 92 29 88 f8 48 8b 1b 48 89 d8 48 c1 e8 03 42 80 3c 20 00 74 08 48 89 df e8 79 29 88 f8 4c 8b 23 eb 89 e8 4f 15 23 f8 90 <0f> 0b 90 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d e9 13 26 3d 02
RSP: 0018:ffffc9000441fda8 EFLAGS: 00010293
RAX: ffffffff89731ae1 RBX: ffffffff94b87540 RCX: ffff888029470000
RDX: 0000000000000000 RSI: ffffffff8bcab5c0 RDI: ffffffff8c1faba0
RBP: 0000000000000000 R08: ffffffff92f9b61f R09: 1ffffffff25f36c3
R10: dffffc0000000000 R11: fffffbfff25f36c4 R12: ffffffff89731840
R13: ffff88804b587000 R14: ffff88804b587000 R15: ffffffff89731870
FS: 000055555e080380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000207d4000 CR4: 0000000000350ef0
Call Trace:
<TASK>
unix_release+0x87/0xc0 net/unix/af_unix.c:1048
__sock_release net/socket.c:659 [inline]
sock_close+0xbe/0x240 net/socket.c:1421
__fput+0x42b/0x8a0 fs/file_table.c:422
__do_sys_close fs/open.c:1556 [inline]
__se_sys_close fs/open.c:1541 [inline]
__x64_sys_close+0x7f/0x110 fs/open.c:1541
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb37d618070
Code: 00 00 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff eb d4 e8 10 2c 00 00 80 3d 31 f0 07 00 00 74 17 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
RSP: 002b:00007ffcd4a525d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fb37d618070
RDX: 0000000000000010 RSI: 00000000200001c0 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000100000000 R09: 0000000100000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Use sk_psock, which will only check that the pointer is not been set to
NULL yet, which should only happen after the callbacks are restored. If,
then, a reference can still be gotten, we may call sk_psock_stop and cancel
psock->work.
As suggested by Paolo Abeni, reorder the condition so the control flow is
less convoluted.
After that change, the reproducer does not trigger the WARN_ON_ONCE
anymore.
Suggested-by: Paolo Abeni <pabeni(a)redhat.com>
Reported-by: syzbot+07a2e4a1a57118ef7355(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=07a2e4a1a57118ef7355
Fixes: aadb2bb83ff7 ("sock_map: Fix a potential use-after-free in sock_map_close()")
Fixes: 5b4a79ba65a1 ("bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com>
---
v2: change control flow as suggested by Paolo Abeni
v1: https://lore.kernel.org/netdev/20240520214153.847619-1-cascardo@igalia.com/
---
net/core/sock_map.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 9402889840bf..c3179567a99a 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -1680,19 +1680,23 @@ void sock_map_close(struct sock *sk, long timeout)
lock_sock(sk);
rcu_read_lock();
- psock = sk_psock_get(sk);
- if (unlikely(!psock)) {
- rcu_read_unlock();
- release_sock(sk);
- saved_close = READ_ONCE(sk->sk_prot)->close;
- } else {
+ psock = sk_psock(sk);
+ if (likely(psock)) {
saved_close = psock->saved_close;
sock_map_remove_links(sk, psock);
+ psock = sk_psock_get(sk);
+ if (unlikely(!psock))
+ goto no_psock;
rcu_read_unlock();
sk_psock_stop(psock);
release_sock(sk);
cancel_delayed_work_sync(&psock->work);
sk_psock_put(sk, psock);
+ } else {
+ saved_close = READ_ONCE(sk->sk_prot)->close;
+no_psock:
+ rcu_read_unlock();
+ release_sock(sk);
}
/* Make sure we do not recurse. This is a bug.
--
2.34.1
The patch titled
Subject: kmsan: do not wipe out origin when doing partial unpoisoning
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kmsan-do-not-wipe-out-origin-when-doing-partial-unpoisoning.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Alexander Potapenko <glider(a)google.com>
Subject: kmsan: do not wipe out origin when doing partial unpoisoning
Date: Tue, 28 May 2024 12:48:06 +0200
As noticed by Brian, KMSAN should not be zeroing the origin when
unpoisoning parts of a four-byte uninitialized value, e.g.:
char a[4];
kmsan_unpoison_memory(a, 1);
This led to false negatives, as certain poisoned values could receive zero
origins, preventing those values from being reported.
To fix the problem, check that kmsan_internal_set_shadow_origin() writes
zero origins only to slots which have zero shadow.
Link: https://lkml.kernel.org/r/20240528104807.738758-1-glider@google.com
Fixes: f80be4571b19 ("kmsan: add KMSAN runtime core")
Signed-off-by: Alexander Potapenko <glider(a)google.com>
Reported-by: Brian Johannesmeyer <bjohannesmeyer(a)gmail.com>
Link: https://lore.kernel.org/lkml/20240524232804.1984355-1-bjohannesmeyer@gmail.…
Reviewed-by: Marco Elver <elver(a)google.com>
Tested-by: Brian Johannesmeyer <bjohannesmeyer(a)gmail.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmsan/core.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
--- a/mm/kmsan/core.c~kmsan-do-not-wipe-out-origin-when-doing-partial-unpoisoning
+++ a/mm/kmsan/core.c
@@ -196,8 +196,7 @@ void kmsan_internal_set_shadow_origin(vo
u32 origin, bool checked)
{
u64 address = (u64)addr;
- void *shadow_start;
- u32 *origin_start;
+ u32 *shadow_start, *origin_start;
size_t pad = 0;
KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
@@ -225,8 +224,16 @@ void kmsan_internal_set_shadow_origin(vo
origin_start =
(u32 *)kmsan_get_metadata((void *)address, KMSAN_META_ORIGIN);
- for (int i = 0; i < size / KMSAN_ORIGIN_SIZE; i++)
- origin_start[i] = origin;
+ /*
+ * If the new origin is non-zero, assume that the shadow byte is also non-zero,
+ * and unconditionally overwrite the old origin slot.
+ * If the new origin is zero, overwrite the old origin slot iff the
+ * corresponding shadow slot is zero.
+ */
+ for (int i = 0; i < size / KMSAN_ORIGIN_SIZE; i++) {
+ if (origin || !shadow_start[i])
+ origin_start[i] = origin;
+ }
}
struct page *kmsan_vmalloc_to_page_or_null(void *vaddr)
_
Patches currently in -mm which might be from glider(a)google.com are
kmsan-do-not-wipe-out-origin-when-doing-partial-unpoisoning.patch
This fixes a NULL pointer dereference bug due to a data race which
looks like this:
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 33 PID: 16573 Comm: kworker/u97:799 Not tainted 6.8.7-cm4all1-hp+ #43
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018
Workqueue: events_unbound netfs_rreq_write_to_cache_work
RIP: 0010:cachefiles_prepare_write+0x30/0xa0
Code: 57 41 56 45 89 ce 41 55 49 89 cd 41 54 49 89 d4 55 53 48 89 fb 48 83 ec 08 48 8b 47 08 48 83 7f 10 00 48 89 34 24 48 8b 68 20 <48> 8b 45 08 4c 8b 38 74 45 49 8b 7f 50 e8 4e a9 b0 ff 48 8b 73 10
RSP: 0018:ffffb4e78113bde0 EFLAGS: 00010286
RAX: ffff976126be6d10 RBX: ffff97615cdb8438 RCX: 0000000000020000
RDX: ffff97605e6c4c68 RSI: ffff97605e6c4c60 RDI: ffff97615cdb8438
RBP: 0000000000000000 R08: 0000000000278333 R09: 0000000000000001
R10: ffff97605e6c4600 R11: 0000000000000001 R12: ffff97605e6c4c68
R13: 0000000000020000 R14: 0000000000000001 R15: ffff976064fe2c00
FS: 0000000000000000(0000) GS:ffff9776dfd40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000005942c002 CR4: 00000000001706f0
Call Trace:
<TASK>
? __die+0x1f/0x70
? page_fault_oops+0x15d/0x440
? search_module_extables+0xe/0x40
? fixup_exception+0x22/0x2f0
? exc_page_fault+0x5f/0x100
? asm_exc_page_fault+0x22/0x30
? cachefiles_prepare_write+0x30/0xa0
netfs_rreq_write_to_cache_work+0x135/0x2e0
process_one_work+0x137/0x2c0
worker_thread+0x2e9/0x400
? __pfx_worker_thread+0x10/0x10
kthread+0xcc/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x30/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
Modules linked in:
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---
This happened because fscache_cookie_state_machine() was slow and was
still running while another process invoked fscache_unuse_cookie();
this led to a fscache_cookie_lru_do_one() call, setting the
FSCACHE_COOKIE_DO_LRU_DISCARD flag, which was picked up by
fscache_cookie_state_machine(), withdrawing the cookie via
cachefiles_withdraw_cookie(), clearing cookie->cache_priv.
At the same time, yet another process invoked
cachefiles_prepare_write(), which found a NULL pointer in this code
line:
struct cachefiles_object *object = cachefiles_cres_object(cres);
The next line crashes, obviously:
struct cachefiles_cache *cache = object->volume->cache;
During cachefiles_prepare_write(), the "n_accesses" counter is
non-zero (via fscache_begin_operation()). The cookie must not be
withdrawn until it drops to zero.
The counter is checked by fscache_cookie_state_machine() before
switching to FSCACHE_COOKIE_STATE_RELINQUISHING and
FSCACHE_COOKIE_STATE_WITHDRAWING (in "case
FSCACHE_COOKIE_STATE_FAILED"), but not for
FSCACHE_COOKIES_TATE_LRU_DISCARDING ("case
FSCACHE_COOKIE_STATE_ACTIVE").
This patch adds the missing check. With a non-zero access counter,
the function returns and the next fscache_end_cookie_access() call
will queue another fscache_cookie_state_machine() call to handle the
still-pending FSCACHE_COOKIE_DO_LRU_DISCARD.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Max Kellermann <max.kellermann(a)ionos.com>
---
fs/netfs/fscache_cookie.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/netfs/fscache_cookie.c b/fs/netfs/fscache_cookie.c
index bce2492186d0..d4d4b3a8b106 100644
--- a/fs/netfs/fscache_cookie.c
+++ b/fs/netfs/fscache_cookie.c
@@ -741,6 +741,10 @@ static void fscache_cookie_state_machine(struct fscache_cookie *cookie)
spin_lock(&cookie->lock);
}
if (test_bit(FSCACHE_COOKIE_DO_LRU_DISCARD, &cookie->flags)) {
+ if (atomic_read(&cookie->n_accesses) != 0)
+ /* still being accessed: postpone it */
+ break;
+
__fscache_set_cookie_state(cookie,
FSCACHE_COOKIE_STATE_LRU_DISCARDING);
wake = true;
--
2.39.2
On Mon, May 27, 2024 at 11:40 PM Chengen Du <chengen.du(a)canonical.com> wrote:
>
> Hi Willem,
>
> Thank you for your suggestions on the patch.
> However, there are some parts I am not familiar with, and I would appreciate more detailed information from your side.
Please respond with plain-text email. This message did not make it to
the list. Also no top posting.
https://docs.kernel.org/process/submitting-patches.htmlhttps://subspace.kernel.org/etiquette.html
> > > @@ -2457,7 +2465,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> > > sll->sll_halen = dev_parse_header(skb, sll->sll_addr);
> > > sll->sll_family = AF_PACKET;
> > > sll->sll_hatype = dev->type;
> > > - sll->sll_protocol = skb->protocol;
> > > + sll->sll_protocol = (skb->protocol == htons(ETH_P_8021Q)) ?
> > > + vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;
> >
> > In SOCK_RAW mode, the VLAN tag will be present, so should be returned.
>
> Based on libpcap's handling, the SLL may not be used in SOCK_RAW mode.
The kernel fills in the sockaddr_ll fields in tpacket_rcv for both
SOCK_RAW and SOCK_DGRAM. Libpcap already can use both SOCK_RAW and
SOCK_DGRAM. And constructs the sll2_header pseudo header that tcpdump
sees itself, in pcap_handle_packet_mmap.
> Do you recommend evaluating the mode and maintaining the original logic in SOCK_RAW mode,
> or should we use the same logic for both SOCK_DGRAM and SOCK_RAW modes?
I suggest keeping as is for SOCK_RAW, as returning data that starts at
a VLAN header together with skb->protocol of ETH_P_IPV6 would be just
as confusing as the inverse that we do today on SOCK_DGRAM.
> >
> > I'm concerned about returning a different value between SOCK_RAW and
> > SOCK_DGRAM. But don't immediately see a better option. And for
> > SOCK_DGRAM this approach is indistinguishable from the result on a
> > device with hardware offload, so is acceptable.
> >
> > This test for ETH_P_8021Q ignores the QinQ stacked VLAN case. When
> > fixing VLAN encap, both variants should be addressed at the same time.
> > Note that ETH_P_8021AD is included in the eth_type_vlan test you call
> > above.
>
> In patch 1, the eth_type_vlan() function is used to determine if we need to set the sll_protocol to the VLAN-encapsulated protocol, which includes both ETH_P_8021Q and ETH_P_8021AD.
> You mentioned previously that we might want the true network protocol instead of the inner VLAN tag in the QinQ case (which means 802.1ad?).
> I believe I may have misunderstood your point.
I mean that if SOCK_DGRAM strips all VLAN headers to return the data
from the start of the true network header, then skb->protocol should
return that network protocol.
With vlan stacking, your patch currently returns ETH_P_8021Q.
See the packet formats in
https://en.wikipedia.org/wiki/IEEE_802.1ad#Frame_format if you're
confused about how stacking works.
> Could you please confirm if both ETH_P_8021Q and ETH_P_8021AD should use the VLAN-encapsulated protocol when VLAN hardware offloading is unavailable?
> Or are there other aspects that this judgment does not handle correctly?
Addresses an issue where a CAN bus error during a BAM transmission could
stall the socket queue, preventing further transmissions even after the
bus error is resolved. The fix activates the next queued session after
the error recovery, allowing communication to continue.
Fixes: 9d71dd0c70099 ("can: add support of SAE J1939 protocol")
Cc: stable(a)vger.kernel.org
Reported-by: Alexander Hölzl <alexander.hoelzl(a)gmx.net>
Tested-by: Alexander Hölzl <alexander.hoelzl(a)gmx.net>
Signed-off-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
---
net/can/j1939/transport.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c
index fe3df23a25957..9805124d16763 100644
--- a/net/can/j1939/transport.c
+++ b/net/can/j1939/transport.c
@@ -1681,6 +1681,8 @@ static int j1939_xtp_rx_rts_session_active(struct j1939_session *session,
j1939_session_timers_cancel(session);
j1939_session_cancel(session, J1939_XTP_ABORT_BUSY);
+ if (session->transmission)
+ j1939_session_deactivate_activate_next(session);
return -EBUSY;
}
--
2.39.2
Just a quick note to say I've just booted 6.1.92, plus the 274 patches
in the current stable queue applied, on an x86-64 workstation and no
problems or issues to report, no unexpected errors in dmesg.
(AMD Ryzen, 3 x AMD gpu, sata, nvme, lots of USB, Gentoo, SELinux, KDE)
Eddie
In LUCID EVO PLL CAL_L_VAL and L_VAL bitfields are part of single
PLL_L_VAL register. Update for L_VAL bitfield values in PLL_L_VAL
register using regmap_write() API in __alpha_pll_trion_set_rate
callback will override LUCID EVO PLL initial configuration related
to PLL_CAL_L_VAL bit fields in PLL_L_VAL register.
Observed random PLL lock failures during PLL enable due to such
override in PLL calibration value. Use regmap_update_bits() with
L_VAL bitfield mask instead of regmap_write() API to update only
PLL_L_VAL bitfields in __alpha_pll_trion_set_rate callback.
Fixes: 260e36606a03 ("clk: qcom: clk-alpha-pll: add Lucid EVO PLL configuration interfaces")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ajit Pandey <quic_ajipan(a)quicinc.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
---
drivers/clk/qcom/clk-alpha-pll.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/clk/qcom/clk-alpha-pll.c b/drivers/clk/qcom/clk-alpha-pll.c
index d4227909d1fe..62138ed3df88 100644
--- a/drivers/clk/qcom/clk-alpha-pll.c
+++ b/drivers/clk/qcom/clk-alpha-pll.c
@@ -1665,7 +1665,7 @@ static int __alpha_pll_trion_set_rate(struct clk_hw *hw, unsigned long rate,
if (ret < 0)
return ret;
- regmap_write(pll->clkr.regmap, PLL_L_VAL(pll), l);
+ regmap_update_bits(pll->clkr.regmap, PLL_L_VAL(pll), LUCID_EVO_PLL_L_VAL_MASK, l);
regmap_write(pll->clkr.regmap, PLL_ALPHA_VAL(pll), a);
/* Latch the PLL input */
--
2.25.1
As described in commit 8f873c1ff4ca ("xhci: Blacklist using streams on the
Etron EJ168 controller"), EJ188 have the same issue as EJ168, where Streams
do not work reliable on EJ188. So apply XHCI_BROKEN_STREAMS quirk to EJ188
as well.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Kuangyi Chiang <ki.chiang65(a)gmail.com>
---
Changes in v2:
- Porting to latest release
drivers/usb/host/xhci-pci.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index b47d57d80b96..effeec5cf1fa 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -399,6 +399,7 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
if (pdev->vendor == PCI_VENDOR_ID_ETRON &&
pdev->device == PCI_DEVICE_ID_EJ188) {
xhci->quirks |= XHCI_RESET_ON_RESUME;
+ xhci->quirks |= XHCI_BROKEN_STREAMS;
}
if (pdev->vendor == PCI_VENDOR_ID_RENESAS &&
pdev->device == 0x0014) {
--
2.25.1
There are a number of error paths in the probe function that do not call
of_node_put() to decrement the np device node refcount, leading to
memory leaks if those errors occur.
In order to ease backporting, the fix has been divided into two patches:
the first one simply adds the missing calls to of_node_put(), and the
second one adds the __free() macro to the existing device nodes to
remove the need for of_node_put(), ensuring that the same bug will not
arise in the future.
The issue was found by chance while analyzing the code, and I do not
have the hardware to test it beyond compiling and static analysis tools.
Although the issue is clear and the fix too, if someone wants to
volunteer to test the series with real hardware, it would be great.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (2):
cpufreq: qcom-nvmem: fix memory leaks in probe error paths
cpufreq: qcom-nvmem: eliminate uses of of_node_put()
drivers/cpufreq/qcom-cpufreq-nvmem.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
---
base-commit: 3689b0ef08b70e4e03b82ebd37730a03a672853a
change-id: 20240523-qcom-cpufreq-nvmem_memleak-6b6821db52b1
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
As described in commit 8f873c1ff4ca ("xhci: Blacklist using streams on the
Etron EJ168 controller"), EJ188 have the same issue as EJ168, where Streams
do not work reliable on EJ188. So apply XHCI_BROKEN_STREAMS quirk to EJ188
as well.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Kuangyi Chiang <ki.chiang65(a)gmail.com>
---
Changes in v2:
- Porting to latest release
drivers/usb/host/xhci-pci.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index b47d57d80b96..effeec5cf1fa 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -399,6 +399,7 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
if (pdev->vendor == PCI_VENDOR_ID_ETRON &&
pdev->device == PCI_DEVICE_ID_EJ188) {
xhci->quirks |= XHCI_RESET_ON_RESUME;
+ xhci->quirks |= XHCI_BROKEN_STREAMS;
}
if (pdev->vendor == PCI_VENDOR_ID_RENESAS &&
pdev->device == 0x0014) {
--
2.25.1
I'm announcing the release of the 6.9.2 kernel.
All users of the 6.9 kernel series must upgrade.
The updated 6.9.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.9.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Documentation/ABI/stable/sysfs-block | 10 ++
Documentation/admin-guide/hw-vuln/core-scheduling.rst | 4
Documentation/admin-guide/mm/damon/usage.rst | 6 -
Documentation/sphinx/kernel_include.py | 1
Makefile | 2
arch/x86/include/asm/percpu.h | 6 -
block/genhd.c | 15 ++-
block/partitions/core.c | 5 -
drivers/android/binder.c | 2
drivers/android/binder_internal.h | 2
drivers/bluetooth/btusb.c | 1
drivers/cpufreq/amd-pstate.c | 22 ++++-
drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c | 7 +
drivers/media/v4l2-core/v4l2-ctrls-core.c | 18 +---
drivers/net/ethernet/micrel/ks8851_common.c | 18 ----
drivers/net/usb/ax88179_178a.c | 37 ++++++--
drivers/net/wireless/intel/iwlwifi/iwl-drv.c | 10 --
drivers/remoteproc/mtk_scp.c | 10 ++
drivers/tty/serial/kgdboc.c | 30 ++++++-
drivers/usb/dwc3/gadget.c | 4
drivers/usb/typec/tipd/core.c | 51 ++++++++----
drivers/usb/typec/tipd/tps6598x.h | 11 ++
drivers/usb/typec/ucsi/displayport.c | 4
include/linux/blkdev.h | 13 +++
include/net/bluetooth/hci.h | 9 ++
include/net/bluetooth/hci_core.h | 1
net/bluetooth/hci_conn.c | 75 ++++++++++++------
net/bluetooth/hci_event.c | 31 ++++---
net/bluetooth/iso.c | 2
net/bluetooth/l2cap_core.c | 17 ----
net/bluetooth/sco.c | 6 -
security/keys/trusted-keys/trusted_tpm2.c | 25 ++++--
sound/soc/intel/boards/Makefile | 1
sound/soc/intel/boards/sof_sdw.c | 12 +-
sound/soc/intel/boards/sof_sdw_common.h | 1
sound/soc/intel/boards/sof_sdw_rt_dmic.c | 52 ++++++++++++
36 files changed, 356 insertions(+), 165 deletions(-)
Akira Yokosawa (1):
docs: kernel_include.py: Cope with docutils 0.21
AngeloGioacchino Del Regno (1):
remoteproc: mediatek: Make sure IPI buffer fits in L2TCM
Bard Liao (1):
ASoC: Intel: sof_sdw: use generic rtd_init function for Realtek SDW DMICs
Ben Greear (1):
wifi: iwlwifi: Use request_module_nowait
Carlos Llamas (1):
binder: fix max_thread type inconsistency
Christoph Hellwig (2):
block: add a disk_has_partscan helper
block: add a partscan sysfs attribute for disks
Daniel Thompson (1):
serial: kgdboc: Fix NMI-safety problems from keyboard reset code
Greg Kroah-Hartman (1):
Linux 6.9.2
Hans Verkuil (1):
Revert "media: v4l2-ctrls: show all owned controls in log_status"
Heikki Krogerus (1):
usb: typec: ucsi: displayport: Fix potential deadlock
Jarkko Sakkinen (2):
KEYS: trusted: Fix memory leak in tpm2_key_encode()
KEYS: trusted: Do not use WARN when encode fails
Javier Carrasco (2):
usb: typec: tipd: fix event checking for tps25750
usb: typec: tipd: fix event checking for tps6598x
Jose Fernandez (1):
drm/amd/display: Fix division by zero in setup_dsc_config
Jose Ignacio Tornos Martinez (1):
net: usb: ax88179_178a: fix link status when link is set to down/up
Perry Yuan (1):
cpufreq: amd-pstate: fix the highest frequency issue which limits performance
Peter Tsao (1):
Bluetooth: btusb: Fix the patch for MT7920 the affected to MT7921
Prashanth K (1):
usb: dwc3: Wait unconditionally after issuing EndXfer command
Ronald Wahl (1):
net: ks8851: Fix another TX stall caused by wrong ISR flag handling
SeongJae Park (2):
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Sungwoo Kim (1):
Bluetooth: L2CAP: Fix div-by-zero in l2cap_le_flowctl_init()
Thomas Weißschuh (1):
admin-guide/hw-vuln/core-scheduling: fix return type of PR_SCHED_CORE_GET
Uros Bizjak (1):
x86/percpu: Use __force to cast from __percpu address space
The dynamically created mei client device (mei csi) is used as one V4L2
sub device of the whole video pipeline, and the V4L2 connection graph is
built by software node. The mei_stop() and mei_restart() will delete the
old mei csi client device and create a new mei client device, which will
cause the software node information saved in old mei csi device lost and
the whole video pipeline will be broken.
Removing mei_stop()/mei_restart() during system suspend/resume can fix
the issue above and won't impact hardware actual power saving logic.
Fixes: f6085a96c973 ("mei: vsc: Unregister interrupt handler for system suspend")
Cc: stable(a)vger.kernel.org # for 6.8+
Reported-by: Hao Yao <hao.yao(a)intel.com>
Signed-off-by: Wentong Wu <wentong.wu(a)intel.com>
Reviewed-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Tested-by: Jason Chen <jason.z.chen(a)intel.com>
Tested-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
---
Changes since v2:
- add change log which is not covered by v2, and no code change
Changes since v1:
- correct Fixes commit id in commit message, and no code change
---
drivers/misc/mei/platform-vsc.c | 39 +++++++++++++--------------------
1 file changed, 15 insertions(+), 24 deletions(-)
diff --git a/drivers/misc/mei/platform-vsc.c b/drivers/misc/mei/platform-vsc.c
index b543e6b9f3cf..1ec65d87488a 100644
--- a/drivers/misc/mei/platform-vsc.c
+++ b/drivers/misc/mei/platform-vsc.c
@@ -399,41 +399,32 @@ static void mei_vsc_remove(struct platform_device *pdev)
static int mei_vsc_suspend(struct device *dev)
{
- struct mei_device *mei_dev = dev_get_drvdata(dev);
- struct mei_vsc_hw *hw = mei_dev_to_vsc_hw(mei_dev);
+ struct mei_device *mei_dev;
+ int ret = 0;
- mei_stop(mei_dev);
+ mei_dev = dev_get_drvdata(dev);
+ if (!mei_dev)
+ return -ENODEV;
- mei_disable_interrupts(mei_dev);
+ mutex_lock(&mei_dev->device_lock);
- vsc_tp_free_irq(hw->tp);
+ if (!mei_write_is_idle(mei_dev))
+ ret = -EAGAIN;
- return 0;
+ mutex_unlock(&mei_dev->device_lock);
+
+ return ret;
}
static int mei_vsc_resume(struct device *dev)
{
- struct mei_device *mei_dev = dev_get_drvdata(dev);
- struct mei_vsc_hw *hw = mei_dev_to_vsc_hw(mei_dev);
- int ret;
-
- ret = vsc_tp_request_irq(hw->tp);
- if (ret)
- return ret;
-
- ret = mei_restart(mei_dev);
- if (ret)
- goto err_free;
+ struct mei_device *mei_dev;
- /* start timer if stopped in suspend */
- schedule_delayed_work(&mei_dev->timer_work, HZ);
+ mei_dev = dev_get_drvdata(dev);
+ if (!mei_dev)
+ return -ENODEV;
return 0;
-
-err_free:
- vsc_tp_free_irq(hw->tp);
-
- return ret;
}
static DEFINE_SIMPLE_DEV_PM_OPS(mei_vsc_pm_ops, mei_vsc_suspend, mei_vsc_resume);
--
2.34.1
On the RK3066, there is a bit that must be cleared on flush, otherwise
we do not get display output (at least for RGB).
Signed-off-by: Val Packett <val(a)packett.cool>
Cc: stable(a)vger.kernel.org
---
Hi! This was required to get display working on an old RK3066 tablet,
along with the next tiny patch in the series enabling the RGB output.
I have spent quite a lot of time banging my head against the wall debugging
that display (especially since at the same time a scaler chip is used for
LVDS encoding), but finally adding debug prints showed that RK3066_SYS_CTRL0
ended up being reset to all-zero after being written correctly upon init.
Looking at the register definitions in the vendor driver revealed that the
reason was pretty self-explanatory: "dma_stop".
---
drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 3 +++
drivers/gpu/drm/rockchip/rockchip_drm_vop.h | 1 +
drivers/gpu/drm/rockchip/rockchip_vop_reg.c | 1 +
3 files changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index a13473b2d..d4daeba74 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -1578,6 +1578,9 @@ static void vop_crtc_atomic_flush(struct drm_crtc *crtc,
spin_lock(&vop->reg_lock);
+ /* If the chip has a DMA stop bit (RK3066), it must be cleared. */
+ VOP_REG_SET(vop, common, dma_stop, 0);
+
/* Enable AFBC if there is some AFBC window, disable otherwise. */
s = to_rockchip_crtc_state(crtc->state);
VOP_AFBC_SET(vop, enable, s->enable_afbc);
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.h b/drivers/gpu/drm/rockchip/rockchip_drm_vop.h
index b33e5bdc2..0cf512cc1 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.h
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.h
@@ -122,6 +122,7 @@ struct vop_common {
struct vop_reg lut_buffer_index;
struct vop_reg gate_en;
struct vop_reg mmu_en;
+ struct vop_reg dma_stop;
struct vop_reg out_mode;
struct vop_reg standby;
};
diff --git a/drivers/gpu/drm/rockchip/rockchip_vop_reg.c b/drivers/gpu/drm/rockchip/rockchip_vop_reg.c
index b9ee02061..9bcb40a64 100644
--- a/drivers/gpu/drm/rockchip/rockchip_vop_reg.c
+++ b/drivers/gpu/drm/rockchip/rockchip_vop_reg.c
@@ -466,6 +466,7 @@ static const struct vop_output rk3066_output = {
};
static const struct vop_common rk3066_common = {
+ .dma_stop = VOP_REG(RK3066_SYS_CTRL0, 0x1, 0),
.standby = VOP_REG(RK3066_SYS_CTRL0, 0x1, 1),
.out_mode = VOP_REG(RK3066_DSP_CTRL0, 0xf, 0),
.cfg_done = VOP_REG(RK3066_REG_CFG_DONE, 0x1, 0),
--
2.45.0
On Mon, May 27, 2024 at 2:21 AM Sasha Levin wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> nilfs2: make superblock data array index computation sparse friendly
>
> to the 6.9-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> nilfs2-make-superblock-data-array-index-computation-.patch
> and it can be found in the queue-6.9 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 5017482ff3b29550015cce7f81279dc69aefd6fe
> Author: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
> Date: Tue Apr 30 17:00:19 2024 +0900
>
> nilfs2: make superblock data array index computation sparse friendly
>
> [ Upstream commit 91d743a9c8299de1fc1b47428d8bb4c85face00f ]
>
> Upon running sparse, "warning: dubious: x & !y" is output at an array
> index calculation within nilfs_load_super_block().
>
> The calculation is not wrong, but to eliminate the sparse warning, replace
> it with an equivalent calculation.
>
> Also, add a comment to make it easier to understand what the unintuitive
> array index calculation is doing and whether it's correct.
>
> Link: https://lkml.kernel.org/r/20240430080019.4242-3-konishi.ryusuke@gmail.com
> Fixes: e339ad31f599 ("nilfs2: introduce secondary super block")
> Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
> Cc: Bart Van Assche <bvanassche(a)acm.org>
> Cc: Jens Axboe <axboe(a)kernel.dk>
> Cc: kernel test robot <lkp(a)intel.com>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
> index 2ae2c1bbf6d17..adbc6e87471ab 100644
> --- a/fs/nilfs2/the_nilfs.c
> +++ b/fs/nilfs2/the_nilfs.c
> @@ -592,7 +592,7 @@ static int nilfs_load_super_block(struct the_nilfs *nilfs,
> struct nilfs_super_block **sbp = nilfs->ns_sbp;
> struct buffer_head **sbh = nilfs->ns_sbh;
> u64 sb2off, devsize = bdev_nr_bytes(nilfs->ns_bdev);
> - int valid[2], swp = 0;
> + int valid[2], swp = 0, older;
>
> if (devsize < NILFS_SEG_MIN_BLOCKS * NILFS_MIN_BLOCK_SIZE + 4096) {
> nilfs_err(sb, "device size too small");
> @@ -648,9 +648,25 @@ static int nilfs_load_super_block(struct the_nilfs *nilfs,
> if (swp)
> nilfs_swap_super_block(nilfs);
>
> + /*
> + * Calculate the array index of the older superblock data.
> + * If one has been dropped, set index 0 pointing to the remaining one,
> + * otherwise set index 1 pointing to the old one (including if both
> + * are the same).
> + *
> + * Divided case valid[0] valid[1] swp -> older
> + * -------------------------------------------------------------
> + * Both SBs are invalid 0 0 N/A (Error)
> + * SB1 is invalid 0 1 1 0
> + * SB2 is invalid 1 0 0 0
> + * SB2 is newer 1 1 1 0
> + * SB2 is older or the same 1 1 0 1
> + */
> + older = valid[1] ^ swp;
> +
> nilfs->ns_sbwcount = 0;
> nilfs->ns_sbwtime = le64_to_cpu(sbp[0]->s_wtime);
> - nilfs->ns_prot_seq = le64_to_cpu(sbp[valid[1] & !swp]->s_last_seq);
> + nilfs->ns_prot_seq = le64_to_cpu(sbp[older]->s_last_seq);
> *sbpp = sbp[0];
> return 0;
> }
This commit fixes the sparse warning output by build "make C=1" with
the sparse check, but does not fix any operational bugs.
Therefore, if fixing a harmless sparse warning does not meet the
requirements for backporting to stable trees (I assume it does),
please drop it as it is a false positive pickup. Sorry if the
"Fixes:" tag is confusing.
The same goes for the same patch queued to other stable-trees.
Thanks,
Ryusuke Konishi
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x e60b613df8b6253def41215402f72986fee3fc8d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024052700-ferry-breeder-caa8@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e60b613df8b6253def41215402f72986fee3fc8d Mon Sep 17 00:00:00 2001
From: Zheng Yejian <zhengyejian1(a)huawei.com>
Date: Fri, 10 May 2024 03:28:59 +0800
Subject: [PATCH] ftrace: Fix possible use-after-free issue in
ftrace_location()
KASAN reports a bug:
BUG: KASAN: use-after-free in ftrace_location+0x90/0x120
Read of size 8 at addr ffff888141d40010 by task insmod/424
CPU: 8 PID: 424 Comm: insmod Tainted: G W 6.9.0-rc2+
[...]
Call Trace:
<TASK>
dump_stack_lvl+0x68/0xa0
print_report+0xcf/0x610
kasan_report+0xb5/0xe0
ftrace_location+0x90/0x120
register_kprobe+0x14b/0xa40
kprobe_init+0x2d/0xff0 [kprobe_example]
do_one_initcall+0x8f/0x2d0
do_init_module+0x13a/0x3c0
load_module+0x3082/0x33d0
init_module_from_file+0xd2/0x130
__x64_sys_finit_module+0x306/0x440
do_syscall_64+0x68/0x140
entry_SYSCALL_64_after_hwframe+0x71/0x79
The root cause is that, in lookup_rec(), ftrace record of some address
is being searched in ftrace pages of some module, but those ftrace pages
at the same time is being freed in ftrace_release_mod() as the
corresponding module is being deleted:
CPU1 | CPU2
register_kprobes() { | delete_module() {
check_kprobe_address_safe() { |
arch_check_ftrace_location() { |
ftrace_location() { |
lookup_rec() // USE! | ftrace_release_mod() // Free!
To fix this issue:
1. Hold rcu lock as accessing ftrace pages in ftrace_location_range();
2. Use ftrace_location_range() instead of lookup_rec() in
ftrace_location();
3. Call synchronize_rcu() before freeing any ftrace pages both in
ftrace_process_locs()/ftrace_release_mod()/ftrace_free_mem().
Link: https://lore.kernel.org/linux-trace-kernel/20240509192859.1273558-1-zhengye…
Cc: stable(a)vger.kernel.org
Cc: <mhiramat(a)kernel.org>
Cc: <mark.rutland(a)arm.com>
Cc: <mathieu.desnoyers(a)efficios.com>
Fixes: ae6aa16fdc16 ("kprobes: introduce ftrace based optimization")
Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 5a01d72f66db..2308c0a2fd29 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1595,12 +1595,15 @@ static struct dyn_ftrace *lookup_rec(unsigned long start, unsigned long end)
unsigned long ftrace_location_range(unsigned long start, unsigned long end)
{
struct dyn_ftrace *rec;
+ unsigned long ip = 0;
+ rcu_read_lock();
rec = lookup_rec(start, end);
if (rec)
- return rec->ip;
+ ip = rec->ip;
+ rcu_read_unlock();
- return 0;
+ return ip;
}
/**
@@ -1614,25 +1617,22 @@ unsigned long ftrace_location_range(unsigned long start, unsigned long end)
*/
unsigned long ftrace_location(unsigned long ip)
{
- struct dyn_ftrace *rec;
+ unsigned long loc;
unsigned long offset;
unsigned long size;
- rec = lookup_rec(ip, ip);
- if (!rec) {
+ loc = ftrace_location_range(ip, ip);
+ if (!loc) {
if (!kallsyms_lookup_size_offset(ip, &size, &offset))
goto out;
/* map sym+0 to __fentry__ */
if (!offset)
- rec = lookup_rec(ip, ip + size - 1);
+ loc = ftrace_location_range(ip, ip + size - 1);
}
- if (rec)
- return rec->ip;
-
out:
- return 0;
+ return loc;
}
/**
@@ -6591,6 +6591,8 @@ static int ftrace_process_locs(struct module *mod,
/* We should have used all pages unless we skipped some */
if (pg_unuse) {
WARN_ON(!skipped);
+ /* Need to synchronize with ftrace_location_range() */
+ synchronize_rcu();
ftrace_free_pages(pg_unuse);
}
return ret;
@@ -6804,6 +6806,9 @@ void ftrace_release_mod(struct module *mod)
out_unlock:
mutex_unlock(&ftrace_lock);
+ /* Need to synchronize with ftrace_location_range() */
+ if (tmp_page)
+ synchronize_rcu();
for (pg = tmp_page; pg; pg = tmp_page) {
/* Needs to be called outside of ftrace_lock */
@@ -7137,6 +7142,7 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
unsigned long start = (unsigned long)(start_ptr);
unsigned long end = (unsigned long)(end_ptr);
struct ftrace_page **last_pg = &ftrace_pages_start;
+ struct ftrace_page *tmp_page = NULL;
struct ftrace_page *pg;
struct dyn_ftrace *rec;
struct dyn_ftrace key;
@@ -7178,12 +7184,8 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
ftrace_update_tot_cnt--;
if (!pg->index) {
*last_pg = pg->next;
- if (pg->records) {
- free_pages((unsigned long)pg->records, pg->order);
- ftrace_number_of_pages -= 1 << pg->order;
- }
- ftrace_number_of_groups--;
- kfree(pg);
+ pg->next = tmp_page;
+ tmp_page = pg;
pg = container_of(last_pg, struct ftrace_page, next);
if (!(*last_pg))
ftrace_pages = pg;
@@ -7200,6 +7202,11 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
clear_func_from_hashes(func);
kfree(func);
}
+ /* Need to synchronize with ftrace_location_range() */
+ if (tmp_page) {
+ synchronize_rcu();
+ ftrace_free_pages(tmp_page);
+ }
}
void __init ftrace_free_init_mem(void)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x e60b613df8b6253def41215402f72986fee3fc8d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024052759-earmark-vagrantly-05cf@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e60b613df8b6253def41215402f72986fee3fc8d Mon Sep 17 00:00:00 2001
From: Zheng Yejian <zhengyejian1(a)huawei.com>
Date: Fri, 10 May 2024 03:28:59 +0800
Subject: [PATCH] ftrace: Fix possible use-after-free issue in
ftrace_location()
KASAN reports a bug:
BUG: KASAN: use-after-free in ftrace_location+0x90/0x120
Read of size 8 at addr ffff888141d40010 by task insmod/424
CPU: 8 PID: 424 Comm: insmod Tainted: G W 6.9.0-rc2+
[...]
Call Trace:
<TASK>
dump_stack_lvl+0x68/0xa0
print_report+0xcf/0x610
kasan_report+0xb5/0xe0
ftrace_location+0x90/0x120
register_kprobe+0x14b/0xa40
kprobe_init+0x2d/0xff0 [kprobe_example]
do_one_initcall+0x8f/0x2d0
do_init_module+0x13a/0x3c0
load_module+0x3082/0x33d0
init_module_from_file+0xd2/0x130
__x64_sys_finit_module+0x306/0x440
do_syscall_64+0x68/0x140
entry_SYSCALL_64_after_hwframe+0x71/0x79
The root cause is that, in lookup_rec(), ftrace record of some address
is being searched in ftrace pages of some module, but those ftrace pages
at the same time is being freed in ftrace_release_mod() as the
corresponding module is being deleted:
CPU1 | CPU2
register_kprobes() { | delete_module() {
check_kprobe_address_safe() { |
arch_check_ftrace_location() { |
ftrace_location() { |
lookup_rec() // USE! | ftrace_release_mod() // Free!
To fix this issue:
1. Hold rcu lock as accessing ftrace pages in ftrace_location_range();
2. Use ftrace_location_range() instead of lookup_rec() in
ftrace_location();
3. Call synchronize_rcu() before freeing any ftrace pages both in
ftrace_process_locs()/ftrace_release_mod()/ftrace_free_mem().
Link: https://lore.kernel.org/linux-trace-kernel/20240509192859.1273558-1-zhengye…
Cc: stable(a)vger.kernel.org
Cc: <mhiramat(a)kernel.org>
Cc: <mark.rutland(a)arm.com>
Cc: <mathieu.desnoyers(a)efficios.com>
Fixes: ae6aa16fdc16 ("kprobes: introduce ftrace based optimization")
Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 5a01d72f66db..2308c0a2fd29 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1595,12 +1595,15 @@ static struct dyn_ftrace *lookup_rec(unsigned long start, unsigned long end)
unsigned long ftrace_location_range(unsigned long start, unsigned long end)
{
struct dyn_ftrace *rec;
+ unsigned long ip = 0;
+ rcu_read_lock();
rec = lookup_rec(start, end);
if (rec)
- return rec->ip;
+ ip = rec->ip;
+ rcu_read_unlock();
- return 0;
+ return ip;
}
/**
@@ -1614,25 +1617,22 @@ unsigned long ftrace_location_range(unsigned long start, unsigned long end)
*/
unsigned long ftrace_location(unsigned long ip)
{
- struct dyn_ftrace *rec;
+ unsigned long loc;
unsigned long offset;
unsigned long size;
- rec = lookup_rec(ip, ip);
- if (!rec) {
+ loc = ftrace_location_range(ip, ip);
+ if (!loc) {
if (!kallsyms_lookup_size_offset(ip, &size, &offset))
goto out;
/* map sym+0 to __fentry__ */
if (!offset)
- rec = lookup_rec(ip, ip + size - 1);
+ loc = ftrace_location_range(ip, ip + size - 1);
}
- if (rec)
- return rec->ip;
-
out:
- return 0;
+ return loc;
}
/**
@@ -6591,6 +6591,8 @@ static int ftrace_process_locs(struct module *mod,
/* We should have used all pages unless we skipped some */
if (pg_unuse) {
WARN_ON(!skipped);
+ /* Need to synchronize with ftrace_location_range() */
+ synchronize_rcu();
ftrace_free_pages(pg_unuse);
}
return ret;
@@ -6804,6 +6806,9 @@ void ftrace_release_mod(struct module *mod)
out_unlock:
mutex_unlock(&ftrace_lock);
+ /* Need to synchronize with ftrace_location_range() */
+ if (tmp_page)
+ synchronize_rcu();
for (pg = tmp_page; pg; pg = tmp_page) {
/* Needs to be called outside of ftrace_lock */
@@ -7137,6 +7142,7 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
unsigned long start = (unsigned long)(start_ptr);
unsigned long end = (unsigned long)(end_ptr);
struct ftrace_page **last_pg = &ftrace_pages_start;
+ struct ftrace_page *tmp_page = NULL;
struct ftrace_page *pg;
struct dyn_ftrace *rec;
struct dyn_ftrace key;
@@ -7178,12 +7184,8 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
ftrace_update_tot_cnt--;
if (!pg->index) {
*last_pg = pg->next;
- if (pg->records) {
- free_pages((unsigned long)pg->records, pg->order);
- ftrace_number_of_pages -= 1 << pg->order;
- }
- ftrace_number_of_groups--;
- kfree(pg);
+ pg->next = tmp_page;
+ tmp_page = pg;
pg = container_of(last_pg, struct ftrace_page, next);
if (!(*last_pg))
ftrace_pages = pg;
@@ -7200,6 +7202,11 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
clear_func_from_hashes(func);
kfree(func);
}
+ /* Need to synchronize with ftrace_location_range() */
+ if (tmp_page) {
+ synchronize_rcu();
+ ftrace_free_pages(tmp_page);
+ }
}
void __init ftrace_free_init_mem(void)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x e60b613df8b6253def41215402f72986fee3fc8d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024052758-tweezers-sassy-6775@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e60b613df8b6253def41215402f72986fee3fc8d Mon Sep 17 00:00:00 2001
From: Zheng Yejian <zhengyejian1(a)huawei.com>
Date: Fri, 10 May 2024 03:28:59 +0800
Subject: [PATCH] ftrace: Fix possible use-after-free issue in
ftrace_location()
KASAN reports a bug:
BUG: KASAN: use-after-free in ftrace_location+0x90/0x120
Read of size 8 at addr ffff888141d40010 by task insmod/424
CPU: 8 PID: 424 Comm: insmod Tainted: G W 6.9.0-rc2+
[...]
Call Trace:
<TASK>
dump_stack_lvl+0x68/0xa0
print_report+0xcf/0x610
kasan_report+0xb5/0xe0
ftrace_location+0x90/0x120
register_kprobe+0x14b/0xa40
kprobe_init+0x2d/0xff0 [kprobe_example]
do_one_initcall+0x8f/0x2d0
do_init_module+0x13a/0x3c0
load_module+0x3082/0x33d0
init_module_from_file+0xd2/0x130
__x64_sys_finit_module+0x306/0x440
do_syscall_64+0x68/0x140
entry_SYSCALL_64_after_hwframe+0x71/0x79
The root cause is that, in lookup_rec(), ftrace record of some address
is being searched in ftrace pages of some module, but those ftrace pages
at the same time is being freed in ftrace_release_mod() as the
corresponding module is being deleted:
CPU1 | CPU2
register_kprobes() { | delete_module() {
check_kprobe_address_safe() { |
arch_check_ftrace_location() { |
ftrace_location() { |
lookup_rec() // USE! | ftrace_release_mod() // Free!
To fix this issue:
1. Hold rcu lock as accessing ftrace pages in ftrace_location_range();
2. Use ftrace_location_range() instead of lookup_rec() in
ftrace_location();
3. Call synchronize_rcu() before freeing any ftrace pages both in
ftrace_process_locs()/ftrace_release_mod()/ftrace_free_mem().
Link: https://lore.kernel.org/linux-trace-kernel/20240509192859.1273558-1-zhengye…
Cc: stable(a)vger.kernel.org
Cc: <mhiramat(a)kernel.org>
Cc: <mark.rutland(a)arm.com>
Cc: <mathieu.desnoyers(a)efficios.com>
Fixes: ae6aa16fdc16 ("kprobes: introduce ftrace based optimization")
Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 5a01d72f66db..2308c0a2fd29 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1595,12 +1595,15 @@ static struct dyn_ftrace *lookup_rec(unsigned long start, unsigned long end)
unsigned long ftrace_location_range(unsigned long start, unsigned long end)
{
struct dyn_ftrace *rec;
+ unsigned long ip = 0;
+ rcu_read_lock();
rec = lookup_rec(start, end);
if (rec)
- return rec->ip;
+ ip = rec->ip;
+ rcu_read_unlock();
- return 0;
+ return ip;
}
/**
@@ -1614,25 +1617,22 @@ unsigned long ftrace_location_range(unsigned long start, unsigned long end)
*/
unsigned long ftrace_location(unsigned long ip)
{
- struct dyn_ftrace *rec;
+ unsigned long loc;
unsigned long offset;
unsigned long size;
- rec = lookup_rec(ip, ip);
- if (!rec) {
+ loc = ftrace_location_range(ip, ip);
+ if (!loc) {
if (!kallsyms_lookup_size_offset(ip, &size, &offset))
goto out;
/* map sym+0 to __fentry__ */
if (!offset)
- rec = lookup_rec(ip, ip + size - 1);
+ loc = ftrace_location_range(ip, ip + size - 1);
}
- if (rec)
- return rec->ip;
-
out:
- return 0;
+ return loc;
}
/**
@@ -6591,6 +6591,8 @@ static int ftrace_process_locs(struct module *mod,
/* We should have used all pages unless we skipped some */
if (pg_unuse) {
WARN_ON(!skipped);
+ /* Need to synchronize with ftrace_location_range() */
+ synchronize_rcu();
ftrace_free_pages(pg_unuse);
}
return ret;
@@ -6804,6 +6806,9 @@ void ftrace_release_mod(struct module *mod)
out_unlock:
mutex_unlock(&ftrace_lock);
+ /* Need to synchronize with ftrace_location_range() */
+ if (tmp_page)
+ synchronize_rcu();
for (pg = tmp_page; pg; pg = tmp_page) {
/* Needs to be called outside of ftrace_lock */
@@ -7137,6 +7142,7 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
unsigned long start = (unsigned long)(start_ptr);
unsigned long end = (unsigned long)(end_ptr);
struct ftrace_page **last_pg = &ftrace_pages_start;
+ struct ftrace_page *tmp_page = NULL;
struct ftrace_page *pg;
struct dyn_ftrace *rec;
struct dyn_ftrace key;
@@ -7178,12 +7184,8 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
ftrace_update_tot_cnt--;
if (!pg->index) {
*last_pg = pg->next;
- if (pg->records) {
- free_pages((unsigned long)pg->records, pg->order);
- ftrace_number_of_pages -= 1 << pg->order;
- }
- ftrace_number_of_groups--;
- kfree(pg);
+ pg->next = tmp_page;
+ tmp_page = pg;
pg = container_of(last_pg, struct ftrace_page, next);
if (!(*last_pg))
ftrace_pages = pg;
@@ -7200,6 +7202,11 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
clear_func_from_hashes(func);
kfree(func);
}
+ /* Need to synchronize with ftrace_location_range() */
+ if (tmp_page) {
+ synchronize_rcu();
+ ftrace_free_pages(tmp_page);
+ }
}
void __init ftrace_free_init_mem(void)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x e60b613df8b6253def41215402f72986fee3fc8d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024052757-fantasy-resent-77c6@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e60b613df8b6253def41215402f72986fee3fc8d Mon Sep 17 00:00:00 2001
From: Zheng Yejian <zhengyejian1(a)huawei.com>
Date: Fri, 10 May 2024 03:28:59 +0800
Subject: [PATCH] ftrace: Fix possible use-after-free issue in
ftrace_location()
KASAN reports a bug:
BUG: KASAN: use-after-free in ftrace_location+0x90/0x120
Read of size 8 at addr ffff888141d40010 by task insmod/424
CPU: 8 PID: 424 Comm: insmod Tainted: G W 6.9.0-rc2+
[...]
Call Trace:
<TASK>
dump_stack_lvl+0x68/0xa0
print_report+0xcf/0x610
kasan_report+0xb5/0xe0
ftrace_location+0x90/0x120
register_kprobe+0x14b/0xa40
kprobe_init+0x2d/0xff0 [kprobe_example]
do_one_initcall+0x8f/0x2d0
do_init_module+0x13a/0x3c0
load_module+0x3082/0x33d0
init_module_from_file+0xd2/0x130
__x64_sys_finit_module+0x306/0x440
do_syscall_64+0x68/0x140
entry_SYSCALL_64_after_hwframe+0x71/0x79
The root cause is that, in lookup_rec(), ftrace record of some address
is being searched in ftrace pages of some module, but those ftrace pages
at the same time is being freed in ftrace_release_mod() as the
corresponding module is being deleted:
CPU1 | CPU2
register_kprobes() { | delete_module() {
check_kprobe_address_safe() { |
arch_check_ftrace_location() { |
ftrace_location() { |
lookup_rec() // USE! | ftrace_release_mod() // Free!
To fix this issue:
1. Hold rcu lock as accessing ftrace pages in ftrace_location_range();
2. Use ftrace_location_range() instead of lookup_rec() in
ftrace_location();
3. Call synchronize_rcu() before freeing any ftrace pages both in
ftrace_process_locs()/ftrace_release_mod()/ftrace_free_mem().
Link: https://lore.kernel.org/linux-trace-kernel/20240509192859.1273558-1-zhengye…
Cc: stable(a)vger.kernel.org
Cc: <mhiramat(a)kernel.org>
Cc: <mark.rutland(a)arm.com>
Cc: <mathieu.desnoyers(a)efficios.com>
Fixes: ae6aa16fdc16 ("kprobes: introduce ftrace based optimization")
Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 5a01d72f66db..2308c0a2fd29 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1595,12 +1595,15 @@ static struct dyn_ftrace *lookup_rec(unsigned long start, unsigned long end)
unsigned long ftrace_location_range(unsigned long start, unsigned long end)
{
struct dyn_ftrace *rec;
+ unsigned long ip = 0;
+ rcu_read_lock();
rec = lookup_rec(start, end);
if (rec)
- return rec->ip;
+ ip = rec->ip;
+ rcu_read_unlock();
- return 0;
+ return ip;
}
/**
@@ -1614,25 +1617,22 @@ unsigned long ftrace_location_range(unsigned long start, unsigned long end)
*/
unsigned long ftrace_location(unsigned long ip)
{
- struct dyn_ftrace *rec;
+ unsigned long loc;
unsigned long offset;
unsigned long size;
- rec = lookup_rec(ip, ip);
- if (!rec) {
+ loc = ftrace_location_range(ip, ip);
+ if (!loc) {
if (!kallsyms_lookup_size_offset(ip, &size, &offset))
goto out;
/* map sym+0 to __fentry__ */
if (!offset)
- rec = lookup_rec(ip, ip + size - 1);
+ loc = ftrace_location_range(ip, ip + size - 1);
}
- if (rec)
- return rec->ip;
-
out:
- return 0;
+ return loc;
}
/**
@@ -6591,6 +6591,8 @@ static int ftrace_process_locs(struct module *mod,
/* We should have used all pages unless we skipped some */
if (pg_unuse) {
WARN_ON(!skipped);
+ /* Need to synchronize with ftrace_location_range() */
+ synchronize_rcu();
ftrace_free_pages(pg_unuse);
}
return ret;
@@ -6804,6 +6806,9 @@ void ftrace_release_mod(struct module *mod)
out_unlock:
mutex_unlock(&ftrace_lock);
+ /* Need to synchronize with ftrace_location_range() */
+ if (tmp_page)
+ synchronize_rcu();
for (pg = tmp_page; pg; pg = tmp_page) {
/* Needs to be called outside of ftrace_lock */
@@ -7137,6 +7142,7 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
unsigned long start = (unsigned long)(start_ptr);
unsigned long end = (unsigned long)(end_ptr);
struct ftrace_page **last_pg = &ftrace_pages_start;
+ struct ftrace_page *tmp_page = NULL;
struct ftrace_page *pg;
struct dyn_ftrace *rec;
struct dyn_ftrace key;
@@ -7178,12 +7184,8 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
ftrace_update_tot_cnt--;
if (!pg->index) {
*last_pg = pg->next;
- if (pg->records) {
- free_pages((unsigned long)pg->records, pg->order);
- ftrace_number_of_pages -= 1 << pg->order;
- }
- ftrace_number_of_groups--;
- kfree(pg);
+ pg->next = tmp_page;
+ tmp_page = pg;
pg = container_of(last_pg, struct ftrace_page, next);
if (!(*last_pg))
ftrace_pages = pg;
@@ -7200,6 +7202,11 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
clear_func_from_hashes(func);
kfree(func);
}
+ /* Need to synchronize with ftrace_location_range() */
+ if (tmp_page) {
+ synchronize_rcu();
+ ftrace_free_pages(tmp_page);
+ }
}
void __init ftrace_free_init_mem(void)
From: Josef Bacik <josef(a)toxicpanda.com>
[ Upstream commit 418b9687dece5bd763c09b5c27a801a7e3387be9 ]
nfsd is the only thing using this helper, and it doesn't use the private
currently. When we switch to per-network namespace stats we will need
the struct net * in order to get to the nfsd_net. Use the net as the
proc private so we can utilize this when we make the switch over.
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
---
net/sunrpc/stats.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
This pre-requisite is needed for upstream commit 4b14885411f7 ("nfsd:
make all of the nfsd stats per-network namespace").
diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
index 65fc1297c6df..383860cb1d5b 100644
--- a/net/sunrpc/stats.c
+++ b/net/sunrpc/stats.c
@@ -314,7 +314,7 @@ EXPORT_SYMBOL_GPL(rpc_proc_unregister);
struct proc_dir_entry *
svc_proc_register(struct net *net, struct svc_stat *statp, const struct proc_ops *proc_ops)
{
- return do_register(net, statp->program->pg_name, statp, proc_ops);
+ return do_register(net, statp->program->pg_name, net, proc_ops);
}
EXPORT_SYMBOL_GPL(svc_proc_register);
--
2.45.1
On present kernels, HPET fallback occurs on some 8-16 socket x86
systems due to the TSC adjust architectural MSR not being respected.
This was fixed upstream in commit 455f9075f14484f358b3c1d6845b4a438de198a7.
Please backport this fix to -stable for 6.9, 6.8, 6.6, 6.1, 5.15,
5.10, 5.4 and 4.19 branches to allow correct TSC operation on existing
distros using these kernels. The patch cleanly applies to all of these
latest -stable branches.
Many thanks,
Daniel
--
Daniel J Blueman
From: Andrey Konovalov <andreyknvl(a)gmail.com>
After commit 8fea0c8fda30 ("usb: core: hcd: Convert from tasklet to BH
workqueue"), usb_giveback_urb_bh() runs in the BH workqueue with
interrupts enabled.
Thus, the remote coverage collection section in usb_giveback_urb_bh()->
__usb_hcd_giveback_urb() might be interrupted, and the interrupt handler
might invoke __usb_hcd_giveback_urb() again.
This breaks KCOV, as it does not support nested remote coverage collection
sections within the same context (neither in task nor in softirq).
Update kcov_remote_start/stop_usb_softirq() to disable interrupts for the
duration of the coverage collection section to avoid nested sections in
the softirq context (in addition to such in the task context, which are
already handled).
Reported-by: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Closes: https://lore.kernel.org/linux-usb/0f4d1964-7397-485b-bc48-11c01e2fcbca@I-lo…
Closes: https://syzkaller.appspot.com/bug?extid=0438378d6f157baae1a2
Suggested-by: Alan Stern <stern(a)rowland.harvard.edu>
Fixes: 8fea0c8fda30 ("usb: core: hcd: Convert from tasklet to BH workqueue")
Cc: stable(a)vger.kernel.org
Acked-by: Dmitry Vyukov <dvyukov(a)google.com>
Signed-off-by: Andrey Konovalov <andreyknvl(a)gmail.com>
---
Changes v2->v3:
- Cc: stable(a)vger.kernel.org.
---
drivers/usb/core/hcd.c | 12 ++++++-----
include/linux/kcov.h | 47 ++++++++++++++++++++++++++++++++++--------
2 files changed, 45 insertions(+), 14 deletions(-)
diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index c0e005670d67..fb1aa0d4fc28 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1623,6 +1623,7 @@ static void __usb_hcd_giveback_urb(struct urb *urb)
struct usb_hcd *hcd = bus_to_hcd(urb->dev->bus);
struct usb_anchor *anchor = urb->anchor;
int status = urb->unlinked;
+ unsigned long flags;
urb->hcpriv = NULL;
if (unlikely((urb->transfer_flags & URB_SHORT_NOT_OK) &&
@@ -1640,13 +1641,14 @@ static void __usb_hcd_giveback_urb(struct urb *urb)
/* pass ownership to the completion handler */
urb->status = status;
/*
- * This function can be called in task context inside another remote
- * coverage collection section, but kcov doesn't support that kind of
- * recursion yet. Only collect coverage in softirq context for now.
+ * Only collect coverage in the softirq context and disable interrupts
+ * to avoid scenarios with nested remote coverage collection sections
+ * that KCOV does not support.
+ * See the comment next to kcov_remote_start_usb_softirq() for details.
*/
- kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum);
+ flags = kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum);
urb->complete(urb);
- kcov_remote_stop_softirq();
+ kcov_remote_stop_softirq(flags);
usb_anchor_resume_wakeups(anchor);
atomic_dec(&urb->use_count);
diff --git a/include/linux/kcov.h b/include/linux/kcov.h
index b851ba415e03..1068a7318d89 100644
--- a/include/linux/kcov.h
+++ b/include/linux/kcov.h
@@ -55,21 +55,47 @@ static inline void kcov_remote_start_usb(u64 id)
/*
* The softirq flavor of kcov_remote_*() functions is introduced as a temporary
- * work around for kcov's lack of nested remote coverage sections support in
- * task context. Adding support for nested sections is tracked in:
- * https://bugzilla.kernel.org/show_bug.cgi?id=210337
+ * workaround for KCOV's lack of nested remote coverage sections support.
+ *
+ * Adding support is tracked in https://bugzilla.kernel.org/show_bug.cgi?id=210337.
+ *
+ * kcov_remote_start_usb_softirq():
+ *
+ * 1. Only collects coverage when called in the softirq context. This allows
+ * avoiding nested remote coverage collection sections in the task context.
+ * For example, USB/IP calls usb_hcd_giveback_urb() in the task context
+ * within an existing remote coverage collection section. Thus, KCOV should
+ * not attempt to start collecting coverage within the coverage collection
+ * section in __usb_hcd_giveback_urb() in this case.
+ *
+ * 2. Disables interrupts for the duration of the coverage collection section.
+ * This allows avoiding nested remote coverage collection sections in the
+ * softirq context (a softirq might occur during the execution of a work in
+ * the BH workqueue, which runs with in_serving_softirq() > 0).
+ * For example, usb_giveback_urb_bh() runs in the BH workqueue with
+ * interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in
+ * the middle of its remote coverage collection section, and the interrupt
+ * handler might invoke __usb_hcd_giveback_urb() again.
*/
-static inline void kcov_remote_start_usb_softirq(u64 id)
+static inline unsigned long kcov_remote_start_usb_softirq(u64 id)
{
- if (in_serving_softirq())
+ unsigned long flags = 0;
+
+ if (in_serving_softirq()) {
+ local_irq_save(flags);
kcov_remote_start_usb(id);
+ }
+
+ return flags;
}
-static inline void kcov_remote_stop_softirq(void)
+static inline void kcov_remote_stop_softirq(unsigned long flags)
{
- if (in_serving_softirq())
+ if (in_serving_softirq()) {
kcov_remote_stop();
+ local_irq_restore(flags);
+ }
}
#ifdef CONFIG_64BIT
@@ -103,8 +129,11 @@ static inline u64 kcov_common_handle(void)
}
static inline void kcov_remote_start_common(u64 id) {}
static inline void kcov_remote_start_usb(u64 id) {}
-static inline void kcov_remote_start_usb_softirq(u64 id) {}
-static inline void kcov_remote_stop_softirq(void) {}
+static inline unsigned long kcov_remote_start_usb_softirq(u64 id)
+{
+ return 0;
+}
+static inline void kcov_remote_stop_softirq(unsigned long flags) {}
#endif /* CONFIG_KCOV */
#endif /* _LINUX_KCOV_H */
--
2.25.1
[CCing Greg and stable and regressions list]
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.
Please correct me if I'm wrong, but since 6.8.10 we afaics have below
regression report about a general protection fault as well as two
reports about NFSd related NULL pointer dereferences[1] that got some
attention[2]. From a quick look I saw no fixes for those queued for the
next 6.8.y release. And the series might be EOL soon.
Hmmm. That sounds not really good. Is there some easy way to fix this?
Chuck, you earlier mentioned (see quote below) that Greg pulled in three
changes as dep into 6.8.10 that might be unneeded. Might reverting all
three be the best way forward?
Ciao, Thorsten
[1]
https://lore.kernel.org/all/CAK2bqVJoT3yy2m0OmTnqH9EAKkj6O1iTk42EyyMtvvxKh6…
and https://lore.kernel.org/all/A8DQDS.ZXN0FMYZ3DIM1@gmail.com/
[2] https://x.com/spendergrsec/status/1793489498443252143
On 24.05.24 20:21, Chuck Lever III wrote:
>> On May 24, 2024, at 4:59 AM, Jaroslav Pulchart <jaroslav.pulchart(a)gooddata.com> wrote:
>>
>>>
>>>>
>>>> On Wed, May 22, 2024 at 04:36:57AM -0400, Jaroslav Pulchart wrote:
>>>>> Hello,
>>>>>
>>>>> I would like to report some issue causing a "general protection fault"
>>>>> crash (constantly) after we updated the kernel from 6.8.9 to 6.8.10.
>>>>> This is triggered when monitoring is using nfsstat on a server where
>>>>> nfsd is running.
>>>>>
>>>>> [ 3049.260633] general protection fault, probably for non-canonical
>>>>> address 0x66fb103e19e9cc89: 0000 [#1] PREEMPT SMP NOPTI
>>>>> [ 3049.261628] CPU: 22 PID: 74991 Comm: nfsstat Tainted: G
>>>>> E 6.8.10-1.gdc.el9.x86_64 #1
>>>>> [ 3049.262336] Hardware name: RDO OpenStack Compute/RHEL, BIOS
>>>>> edk2-20240214-2.el9 02/14/2024
>>>>> [ 3049.263003] RIP: 0010:_raw_spin_lock_irqsave+0x19/0x40
>>>>> [ 3049.263487] Code: cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
>>>>> 90 0f 1f 44 00 00 41 54 9c 41 5c fa 65 ff 05 a6 92 f5 42 31 c0 ba 01
>>>>> 00 00 00 <f0> 0f b1 17 75 0a 4c 89 e0 41 5c c3 cc cc cc cc 89 c6 e8 d0
>>>>> 07 00
>>>>> [ 3049.264882] RSP: 0018:ffffb1bca6b9bd00 EFLAGS: 00010046
>>>>> [ 3049.265365] RAX: 0000000000000000 RBX: 66fb103e19e9c989 RCX: 0000000000000001
>>>>> [ 3049.265953] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 66fb103e19e9cc89
>>>>> [ 3049.266542] RBP: ffffffffc15df280 R08: 0000000000000001 R09: ffffa049a1785cb8
>>>>> [ 3049.267112] R10: ffffb1bca6b9bd70 R11: ffffa04964e49000 R12: 0000000000000246
>>>>> [ 3049.267702] R13: 66fb103e19e9cc89 R14: ffffa048445590a0 R15: 0000000000000001
>>>>> [ 3049.268278] FS: 00007fa3ddf03740(0000) GS:ffffa05703d00000(0000)
>>>>> knlGS:0000000000000000
>>>>> [ 3049.268928] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 3049.269443] CR2: 00007fa3dddfca50 CR3: 0000000342d1e004 CR4: 0000000000770ef0
>>>>> [ 3049.270025] PKRU: 55555554
>>>>> [ 3049.270371] Call Trace:
>>>>> [ 3049.270723] <TASK>
>>>>> [ 3049.271035] ? die_addr+0x33/0x90
>>>>> [ 3049.271423] ? exc_general_protection+0x1ea/0x450
>>>>> [ 3049.271879] ? asm_exc_general_protection+0x22/0x30
>>>>> [ 3049.272344] ? _raw_spin_lock_irqsave+0x19/0x40
>>>>> [ 3049.272803] __percpu_counter_sum+0xd/0x70
>>>>> [ 3049.273219] nfsd_show+0x4f/0x1d0 [nfsd]
>>>>> [ 3049.273666] seq_read_iter+0x11d/0x4d0
>>>>> [ 3049.274073] ? avc_has_perm+0x42/0xc0
>>>>> [ 3049.274489] seq_read+0xfe/0x140
>>>>> [ 3049.274866] proc_reg_read+0x56/0xa0
>>>>> [ 3049.275257] vfs_read+0xa7/0x340
>>>>> [ 3049.275647] ? __do_sys_newfstat+0x57/0x60
>>>>> [ 3049.276059] ksys_read+0x5f/0xe0
>>>>> [ 3049.276439] do_syscall_64+0x5e/0x170
>>>>> [ 3049.276836] entry_SYSCALL_64_after_hwframe+0x78/0x80
>>>>> [ 3049.277296] RIP: 0033:0x7fa3ddcfd9b2
>>>>> [ 3049.277719] Code: c0 e9 b2 fe ff ff 50 48 8d 3d ea 1d 0c 00 e8 c5
>>>>> fd 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75
>>>>> 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89
>>>>> 54 24
>>>>> [ 3049.279139] RSP: 002b:00007ffd930672e8 EFLAGS: 00000246 ORIG_RAX:
>>>>> 0000000000000000
>>>>> [ 3049.279788] RAX: ffffffffffffffda RBX: 0000555ded47c2a0 RCX: 00007fa3ddcfd9b2
>>>>> [ 3049.280402] RDX: 0000000000000400 RSI: 0000555ded47c480 RDI: 0000000000000003
>>>>> [ 3049.281046] RBP: 00007fa3dddf75e0 R08: 0000000000000003 R09: 0000000000000077
>>>>> [ 3049.281673] R10: 000000000000005d R11: 0000000000000246 R12: 0000555ded47c2a0
>>>>> [ 3049.282307] R13: 0000000000000d68 R14: 00007fa3dddf69e0 R15: 0000000000000d68
>>>>> [ 3049.282928] </TASK>
>>>>> [ 3049.283310] Modules linked in: mptcp_diag(E) xsk_diag(E)
>>>>> raw_diag(E) unix_diag(E) af_packet_diag(E) netlink_diag(E) udp_diag(E)
>>>>> tcp_diag(E) inet_diag(E) tun(E) br_netfilter(E) bridge(E) stp(E)
>>>>> llc(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E)
>>>>> nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E)
>>>>> zram(E) tls(E) isofs(E) vfat(E) fat(E) intel_rapl_msr(E)
>>>>> intel_rapl_common(E) kvm_amd(E) ccp(E) kvm(E) irqbypass(E)
>>>>> virtio_net(E) i2c_i801(E) virtio_gpu(E) i2c_smbus(E) net_failover(E)
>>>>> virtio_balloon(E) failover(E) virtio_dma_buf(E) fuse(E) ext4(E)
>>>>> mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sg(E) ahci(E) libahci(E)
>>>>> crct10dif_pclmul(E) crc32_pclmul(Ea) polyval_clmulni(E)
>>>>> polyval_generic(E) libata(E) ghash_clmulni_intel(E) sha512_ssse3(E)
>>>>> virtio_blk(E) serio_raw(E) btrfs(E) xor(E) zstd_compress(E)
>>>>> raid6_pq(E) libcrc32c(E) crc32c_intel(E) dm_mirror(E)
>>>>> dm_region_hash(E) dm_log(E) dm_mod(E)
>>>>> [ 3049.283345] Unloaded tainted modules: edac_mce_amd(E):1 padlock_aes(E)
>>>>>
>>>>> Any suggestion on how to fix it is appreciated.
>>>>
>>>> Bisect between v6.8.9 and v6.8.10 would give us the exact point
>>>> where the failures were introduced.
>>>>
>>>> I see that GregKH pulled in:
>>>>
>>>> 26a0ddb04230 ("nfsd: rename NFSD_NET_* to NFSD_STATS_*")
>>>> b7b05f98f3f0 ("nfsd: expose /proc/net/sunrpc/nfsd in net namespaces")
>>>> abf5fb593c90 ("nfsd: make all of the nfsd stats per-network namespace")
>>>>
>>>> for v6.8.10 as a Stable-Dep-of: 18180a4550d0 ("NFSD: Fix nfsd4_encode_fattr4() crasher")
>>>>
>>>> Which is a little baffling, I don't see how those two change sets
>>>> are mechanically related to each other. But I suspect the culprit is
>>>> one of those three stat-related patches.
>>>>
>>>>
>>>> --
>>>> Chuck Lever
>>>
>>>
>>> Hello,
>>>
>>> I run bisecting. It was easy to reproduce, simple execution of
>>> "nfsstat" from terminal stuck the server:
>>>
>>> abf5fb593c90d3ab55d6cf1dea7bec8ee0bf3566 is the first bad commit
>>>
>>>
>>> $ git bisect bad
>>> abf5fb593c90d3ab55d6cf1dea7bec8ee0bf3566 is the first bad commit
>>> commit abf5fb593c90d3ab55d6cf1dea7bec8ee0bf3566 (HEAD)
>>> Author: Josef Bacik <josef(a)toxicpanda.com>
>>> Date: Fri Jan 26 10:39:47 2024 -0500
>>>
>>> nfsd: make all of the nfsd stats per-network namespace
>>>
>>> [ Upstream commit 4b14885411f74b2b0ce0eb2b39d0fffe54e5ca0d ]
>>>
>>> We have a global set of counters that we modify for all of the nfsd
>>> operations, but now that we're exposing these stats across all network
>>> namespaces we need to make the stats also be per-network namespace. We
>>> already have some caching stats that are per-network namespace, so move
>>> these definitions into the same counter and then adjust all the helpers
>>> and users of these stats to provide the appropriate nfsd_net struct so
>>> that the stats are maintained for the per-network namespace objects.
>>>
>>> Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
>>> Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
>>> Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
>>> Stable-dep-of: 18180a4550d0 ("NFSD: Fix nfsd4_encode_fattr4() crasher")
>>> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>>>
>>> fs/nfsd/cache.h | 2 --
>>> fs/nfsd/netns.h | 17 +++++++++++++++--
>>> fs/nfsd/nfs4proc.c | 6 +++---
>>> fs/nfsd/nfs4state.c | 3 ++-
>>> fs/nfsd/nfscache.c | 36 +++++++-----------------------------
>>> fs/nfsd/nfsctl.c | 12 +++---------
>>> fs/nfsd/nfsfh.c | 3 ++-
>>> fs/nfsd/stats.c | 26 ++++++++++++++------------
>>> fs/nfsd/stats.h | 54 +++++++++++++++++++-----------------------------------
>>> fs/nfsd/vfs.c | 6 ++++--
>>> 10 files changed, 69 insertions(+), 96 deletions(-)
>>>
>>> $ git bisect log
>>> git bisect start
>>> # status: waiting for both good and bad commits
>>> # good: [f3d61438b613b87afb63118bea6fb18c50ba7a6b] Linux 6.8.9
>>> git bisect good f3d61438b613b87afb63118bea6fb18c50ba7a6b
>>> # status: waiting for bad commit, 1 good commit known
>>> # bad: [a0c69a570e420e86c7569b8c052913213eef2b45] Linux 6.8.10
>>> git bisect bad a0c69a570e420e86c7569b8c052913213eef2b45
>>> # bad: [4aaed9dbe8acd2b6114458f0498a617283d6275b] hv_netvsc: Don't
>>> free decrypted memory
>>> git bisect bad 4aaed9dbe8acd2b6114458f0498a617283d6275b
>>> # bad: [ee190d04c2f99c8e557b00e997621c04592baed1] net: gro: add flush
>>> check in udp_gro_receive_segment
>>> git bisect bad ee190d04c2f99c8e557b00e997621c04592baed1
>>> # bad: [781e34b736014188ba9e46a71535237313dcda81] efi/unaccepted:
>>> touch soft lockup during memory accept
>>> git bisect bad 781e34b736014188ba9e46a71535237313dcda81
>>> # bad: [6a7b07689af6e4e023404bf69b1230f43b2a15bc] NFSD: Fix
>>> nfsd4_encode_fattr4() crasher
>>> git bisect bad 6a7b07689af6e4e023404bf69b1230f43b2a15bc
>>> # good: [e05194baae299f2148ab5f6bab659c6ce8d1f6d3] nfs: expose
>>> /proc/net/sunrpc/nfs in net namespaces
>>> git bisect good e05194baae299f2148ab5f6bab659c6ce8d1f6d3
>>> # good: [946ab150335d92f852288c1c6b0f0466b5d6e97f] power: supply:
>>> mt6360_charger: Fix of_match for usb-otg-vbus regulator
>>> git bisect good 946ab150335d92f852288c1c6b0f0466b5d6e97f
>>> # good: [b7b05f98f3f06fea3986b46e5c7fe2928676b02d] nfsd: expose
>>> /proc/net/sunrpc/nfsd in net namespaces
>>> git bisect good b7b05f98f3f06fea3986b46e5c7fe2928676b02d
>>> # bad: [0e8003af77879572dbc1df56860cbe2bfa8498f0] NFSD: add support
>>> for CB_GETATTR callback
>>> git bisect bad 0e8003af77879572dbc1df56860cbe2bfa8498f0
>>> # bad: [abf5fb593c90d3ab55d6cf1dea7bec8ee0bf3566] nfsd: make all of
>>> the nfsd stats per-network namespace
>>> git bisect bad abf5fb593c90d3ab55d6cf1dea7bec8ee0bf3566
>>> # first bad commit: [abf5fb593c90d3ab55d6cf1dea7bec8ee0bf3566] nfsd:
>>> make all of the nfsd stats per-network namespace
>>
>> I built a full 6.8.10 with reverted single commit
>> "abf5fb593c90d3ab55d6cf1dea7bec8ee0bf3566". The server does not get
>> stuck when calling "nfsstat".
>
> Good to know, but I don't think it's entirely safe to revert
> only that patch -- all three would have to come off.
>
> I can't seem to get nfsstat to trigger a problem on my
> server.
>
>
> --
> Chuck Lever
>
>
bcm4377_init_cfg() uses pci_{read,write}_config_dword() that return
PCIBIOS_* codes. The return codes are returned into the calling
bcm4377_probe() which directly returns the error which is of incorrect
type (a probe should return normal errnos).
Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal
errno before returning it from bcm4377_init_cfg. This conversion is the
easiest by adding a label next to return and doing the conversion there
once rather than adding pcibios_err_to_errno() into every single return
statement.
Fixes: 8a06127602de ("Bluetooth: hci_bcm4377: Add new driver for BCM4377 PCIe boards")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
drivers/bluetooth/hci_bcm4377.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/bluetooth/hci_bcm4377.c b/drivers/bluetooth/hci_bcm4377.c
index 0c2f15235b4c..b00240109dc3 100644
--- a/drivers/bluetooth/hci_bcm4377.c
+++ b/drivers/bluetooth/hci_bcm4377.c
@@ -2134,44 +2134,46 @@ static int bcm4377_init_cfg(struct bcm4377_data *bcm4377)
BCM4377_PCIECFG_BAR0_WINDOW1,
bcm4377->hw->bar0_window1);
if (ret)
- return ret;
+ goto fail;
ret = pci_write_config_dword(bcm4377->pdev,
BCM4377_PCIECFG_BAR0_WINDOW2,
bcm4377->hw->bar0_window2);
if (ret)
- return ret;
+ goto fail;
ret = pci_write_config_dword(
bcm4377->pdev, BCM4377_PCIECFG_BAR0_CORE2_WINDOW1,
BCM4377_PCIECFG_BAR0_CORE2_WINDOW1_DEFAULT);
if (ret)
- return ret;
+ goto fail;
if (bcm4377->hw->has_bar0_core2_window2) {
ret = pci_write_config_dword(bcm4377->pdev,
BCM4377_PCIECFG_BAR0_CORE2_WINDOW2,
bcm4377->hw->bar0_core2_window2);
if (ret)
- return ret;
+ goto fail;
}
ret = pci_write_config_dword(bcm4377->pdev, BCM4377_PCIECFG_BAR2_WINDOW,
BCM4377_PCIECFG_BAR2_WINDOW_DEFAULT);
if (ret)
- return ret;
+ goto fail;
ret = pci_read_config_dword(bcm4377->pdev,
BCM4377_PCIECFG_SUBSYSTEM_CTRL, &ctrl);
if (ret)
- return ret;
+ goto fail;
if (bcm4377->hw->clear_pciecfg_subsystem_ctrl_bit19)
ctrl &= ~BIT(19);
ctrl |= BIT(16);
- return pci_write_config_dword(bcm4377->pdev,
- BCM4377_PCIECFG_SUBSYSTEM_CTRL, ctrl);
+ ret = pci_write_config_dword(bcm4377->pdev,
+ BCM4377_PCIECFG_SUBSYSTEM_CTRL, ctrl);
+fail:
+ return pcibios_err_to_errno(ret);
}
static int bcm4377_probe_dmi(struct bcm4377_data *bcm4377)
--
2.39.2
From: Nathan Lynch <nathanl(a)linux.ibm.com>
[ Upstream commit ff2e185cf73df480ec69675936c4ee75a445c3e4 ]
plpar_hcall(), plpar_hcall9(), and related functions expect callers to
provide valid result buffers of certain minimum size. Currently this
is communicated only through comments in the code and the compiler has
no idea.
For example, if I write a bug like this:
long retbuf[PLPAR_HCALL_BUFSIZE]; // should be PLPAR_HCALL9_BUFSIZE
plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf, ...);
This compiles with no diagnostics emitted, but likely results in stack
corruption at runtime when plpar_hcall9() stores results past the end
of the array. (To be clear this is a contrived example and I have not
found a real instance yet.)
To make this class of error less likely, we can use explicitly-sized
array parameters instead of pointers in the declarations for the hcall
APIs. When compiled with -Warray-bounds[1], the code above now
provokes a diagnostic like this:
error: array argument is too small;
is of size 32, callee requires at least 72 [-Werror,-Warray-bounds]
60 | plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf,
| ^ ~~~~~~
[1] Enabled for LLVM builds but not GCC for now. See commit
0da6e5fd6c37 ("gcc: disable '-Warray-bounds' for gcc-13 too") and
related changes.
Signed-off-by: Nathan Lynch <nathanl(a)linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://msgid.link/20240408-pseries-hvcall-retbuf-v1-1-ebc73d7253cf@linux.i…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/powerpc/include/asm/hvcall.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index a0b17f9f1ea4e..8347f57e1c6a3 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -383,7 +383,7 @@ long plpar_hcall_norets(unsigned long opcode, ...);
* Used for all but the craziest of phyp interfaces (see plpar_hcall9)
*/
#define PLPAR_HCALL_BUFSIZE 4
-long plpar_hcall(unsigned long opcode, unsigned long *retbuf, ...);
+long plpar_hcall(unsigned long opcode, unsigned long retbuf[static PLPAR_HCALL_BUFSIZE], ...);
/**
* plpar_hcall_raw: - Make a hypervisor call without calculating hcall stats
@@ -397,7 +397,7 @@ long plpar_hcall(unsigned long opcode, unsigned long *retbuf, ...);
* plpar_hcall, but plpar_hcall_raw works in real mode and does not
* calculate hypervisor call statistics.
*/
-long plpar_hcall_raw(unsigned long opcode, unsigned long *retbuf, ...);
+long plpar_hcall_raw(unsigned long opcode, unsigned long retbuf[static PLPAR_HCALL_BUFSIZE], ...);
/**
* plpar_hcall9: - Make a pseries hypervisor call with up to 9 return arguments
@@ -408,8 +408,8 @@ long plpar_hcall_raw(unsigned long opcode, unsigned long *retbuf, ...);
* PLPAR_HCALL9_BUFSIZE to size the return argument buffer.
*/
#define PLPAR_HCALL9_BUFSIZE 9
-long plpar_hcall9(unsigned long opcode, unsigned long *retbuf, ...);
-long plpar_hcall9_raw(unsigned long opcode, unsigned long *retbuf, ...);
+long plpar_hcall9(unsigned long opcode, unsigned long retbuf[static PLPAR_HCALL9_BUFSIZE], ...);
+long plpar_hcall9_raw(unsigned long opcode, unsigned long retbuf[static PLPAR_HCALL9_BUFSIZE], ...);
struct hvcall_mpp_data {
unsigned long entitled_mem;
--
2.43.0
kthread creation may possibly fail inside race_signal_callback(). In
such a case stop the already started threads, put the already taken
references to them and return with error code.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 2989f6451084 ("dma-buf: Add selftests for dma-fence")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
v2: use kthread_stop_put() to actually put the last reference as
T.J. Mercier noticed;
link to v1: https://lore.kernel.org/lkml/20240522122326.696928-1-pchelkin@ispras.ru/
drivers/dma-buf/st-dma-fence.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/dma-buf/st-dma-fence.c b/drivers/dma-buf/st-dma-fence.c
index b7c6f7ea9e0c..6a1bfcd0cc21 100644
--- a/drivers/dma-buf/st-dma-fence.c
+++ b/drivers/dma-buf/st-dma-fence.c
@@ -540,6 +540,12 @@ static int race_signal_callback(void *arg)
t[i].before = pass;
t[i].task = kthread_run(thread_signal_callback, &t[i],
"dma-fence:%d", i);
+ if (IS_ERR(t[i].task)) {
+ ret = PTR_ERR(t[i].task);
+ while (--i >= 0)
+ kthread_stop_put(t[i].task);
+ return ret;
+ }
get_task_struct(t[i].task);
}
--
2.39.2
From: Rand Deeb <rand.sec96(a)gmail.com>
[ Upstream commit 789c17185fb0f39560496c2beab9b57ce1d0cbe7 ]
The ssb_device_uevent() function first attempts to convert the 'dev' pointer
to 'struct ssb_device *'. However, it mistakenly dereferences 'dev' before
performing the NULL check, potentially leading to a NULL pointer
dereference if 'dev' is NULL.
To fix this issue, move the NULL check before dereferencing the 'dev' pointer,
ensuring that the pointer is valid before attempting to use it.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Rand Deeb <rand.sec96(a)gmail.com>
Signed-off-by: Kalle Valo <kvalo(a)kernel.org>
Link: https://msgid.link/20240306123028.164155-1-rand.sec96@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/ssb/main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/ssb/main.c b/drivers/ssb/main.c
index b9934b9c2d708..070a99a4180cc 100644
--- a/drivers/ssb/main.c
+++ b/drivers/ssb/main.c
@@ -341,11 +341,13 @@ static int ssb_bus_match(struct device *dev, struct device_driver *drv)
static int ssb_device_uevent(const struct device *dev, struct kobj_uevent_env *env)
{
- const struct ssb_device *ssb_dev = dev_to_ssb_dev(dev);
+ const struct ssb_device *ssb_dev;
if (!dev)
return -ENODEV;
+ ssb_dev = dev_to_ssb_dev(dev);
+
return add_uevent_var(env,
"MODALIAS=ssb:v%04Xid%04Xrev%02X",
ssb_dev->id.vendor, ssb_dev->id.coreid,
--
2.43.0
From: Rand Deeb <rand.sec96(a)gmail.com>
[ Upstream commit 789c17185fb0f39560496c2beab9b57ce1d0cbe7 ]
The ssb_device_uevent() function first attempts to convert the 'dev' pointer
to 'struct ssb_device *'. However, it mistakenly dereferences 'dev' before
performing the NULL check, potentially leading to a NULL pointer
dereference if 'dev' is NULL.
To fix this issue, move the NULL check before dereferencing the 'dev' pointer,
ensuring that the pointer is valid before attempting to use it.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Rand Deeb <rand.sec96(a)gmail.com>
Signed-off-by: Kalle Valo <kvalo(a)kernel.org>
Link: https://msgid.link/20240306123028.164155-1-rand.sec96@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/ssb/main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/ssb/main.c b/drivers/ssb/main.c
index ab080cf26c9ff..0c736d51566dc 100644
--- a/drivers/ssb/main.c
+++ b/drivers/ssb/main.c
@@ -341,11 +341,13 @@ static int ssb_bus_match(struct device *dev, struct device_driver *drv)
static int ssb_device_uevent(const struct device *dev, struct kobj_uevent_env *env)
{
- const struct ssb_device *ssb_dev = dev_to_ssb_dev(dev);
+ const struct ssb_device *ssb_dev;
if (!dev)
return -ENODEV;
+ ssb_dev = dev_to_ssb_dev(dev);
+
return add_uevent_var(env,
"MODALIAS=ssb:v%04Xid%04Xrev%02X",
ssb_dev->id.vendor, ssb_dev->id.coreid,
--
2.43.0
From: "Alessandro Carminati (Red Hat)" <alessandro.carminati(a)gmail.com>
[ Upstream commit f803bcf9208a2540acb4c32bdc3616673169f490 ]
In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gm…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 334bdfeab9403..365a2c7a89bad 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -72,7 +72,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -93,6 +92,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -190,6 +199,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -201,6 +211,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.43.0
From: "Alessandro Carminati (Red Hat)" <alessandro.carminati(a)gmail.com>
[ Upstream commit f803bcf9208a2540acb4c32bdc3616673169f490 ]
In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gm…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 088fcad138c98..38c6e9f16f41e 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -71,7 +71,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -92,6 +91,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -189,6 +198,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -200,6 +210,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.43.0
From: "Alessandro Carminati (Red Hat)" <alessandro.carminati(a)gmail.com>
[ Upstream commit f803bcf9208a2540acb4c32bdc3616673169f490 ]
In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gm…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 7c76b841b17bb..21bde60c95230 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -71,7 +71,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l -p "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -92,6 +91,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -183,6 +192,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -194,6 +204,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.43.0
From: "Alessandro Carminati (Red Hat)" <alessandro.carminati(a)gmail.com>
[ Upstream commit f803bcf9208a2540acb4c32bdc3616673169f490 ]
In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gm…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 7c76b841b17bb..21bde60c95230 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -71,7 +71,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l -p "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -92,6 +91,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -183,6 +192,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -194,6 +204,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.43.0
intel_th_pci_activate() uses pci_{read,write}_config_dword() that
return PCIBIOS_* codes. The value is returned as is to the caller. The
non-errno return value is returned all the way to active_store() which
then returns the value like it is an error. PCIBIOS_* return codes,
however, are positive (0x8X) so the return value of the store function
is treated as the length consumed from the write buffer which can
confuse the userspace writer.
Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal
errno before returning it from intel_th_pci_activate().
Fixes: a0e7df335afd ("intel_th: Perform time resync on capture start")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
drivers/hwtracing/intel_th/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c
index 147d338c191e..40e2c922fbe7 100644
--- a/drivers/hwtracing/intel_th/pci.c
+++ b/drivers/hwtracing/intel_th/pci.c
@@ -46,7 +46,7 @@ static int intel_th_pci_activate(struct intel_th *th)
if (err)
dev_err(&pdev->dev, "failed to read NPKDSC register\n");
- return err;
+ return pcibios_err_to_errno(err);
}
static void intel_th_pci_deactivate(struct intel_th *th)
--
2.39.2
[Why]
After supend/resume, with topology unchanged, observe that
link_address_sent of all mstb are marked as false even the topology probing
is done without any error.
It is caused by wrongly also include "ret == 0" case as a probing failure
case.
[How]
Remove inappropriate checking conditions.
Cc: Lyude Paul <lyude(a)redhat.com>
Cc: Harry Wentland <hwentlan(a)amd.com>
Cc: Jani Nikula <jani.nikula(a)intel.com>
Cc: stable(a)vger.kernel.org
Fixes: 37dfdc55ffeb ("drm/dp_mst: Cleanup drm_dp_send_link_address() a bit")
Signed-off-by: Wayne Lin <Wayne.Lin(a)amd.com>
---
drivers/gpu/drm/display/drm_dp_mst_topology.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index 7f8e1cfbe19d..68831f4e502a 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -2929,7 +2929,7 @@ static int drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,
/* FIXME: Actually do some real error handling here */
ret = drm_dp_mst_wait_tx_reply(mstb, txmsg);
- if (ret <= 0) {
+ if (ret < 0) {
drm_err(mgr->dev, "Sending link address failed with %d\n", ret);
goto out;
}
@@ -2981,7 +2981,7 @@ static int drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,
mutex_unlock(&mgr->lock);
out:
- if (ret <= 0)
+ if (ret < 0)
mstb->link_address_sent = false;
kfree(txmsg);
return ret < 0 ? ret : changed;
--
2.37.3
Until recently the "upper layer" was MTD. But following incremental
reworks to bring spi-nand support and more recently generic ECC support,
there is now an intermediate "generic NAND" layer that also needs to get
access to some values. When using "converted" ECC engines, like the
software ones, these values are already propagated correctly. But
otherwise when using good old raw NAND controller drivers, we need to
manually set these values ourselves at the end of the "scan" operation,
once these values have been negotiated.
Without this propagation, later (generic) checks like the one warning
users that the ECC strength is not high enough might simply no longer
work.
Fixes: 8c126720fe10 ("mtd: rawnand: Use the ECC framework nand_ecc_is_strong_enough() helper")
Cc: stable(a)vger.kernel.org
Reported-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Closes: https://lore.kernel.org/all/Zhe2JtvvN1M4Ompw@pengutronix.de/
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
Hello Sascha, this is only compile tested, would you mind checking if
that fixes your setup?
Thanks, Miquèl
drivers/mtd/nand/raw/nand_base.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index d7dbbd469b89..acd137dd0957 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -6301,6 +6301,7 @@ static const struct nand_ops rawnand_ops = {
static int nand_scan_tail(struct nand_chip *chip)
{
struct mtd_info *mtd = nand_to_mtd(chip);
+ struct nand_device *base = &chip->base;
struct nand_ecc_ctrl *ecc = &chip->ecc;
int ret, i;
@@ -6445,9 +6446,13 @@ static int nand_scan_tail(struct nand_chip *chip)
if (!ecc->write_oob_raw)
ecc->write_oob_raw = ecc->write_oob;
- /* propagate ecc info to mtd_info */
+ /* Propagate ECC info to the generic NAND and MTD layers */
mtd->ecc_strength = ecc->strength;
+ if (!base->ecc.ctx.conf.strength)
+ base->ecc.ctx.conf.strength = ecc->strength;
mtd->ecc_step_size = ecc->size;
+ if (!base->ecc.ctx.conf.step_size)
+ base->ecc.ctx.conf.step_size = ecc->size;
/*
* Set the number of read / write steps for one page depending on ECC
@@ -6455,6 +6460,8 @@ static int nand_scan_tail(struct nand_chip *chip)
*/
if (!ecc->steps)
ecc->steps = mtd->writesize / ecc->size;
+ if (!base->ecc.ctx.nsteps)
+ base->ecc.ctx.nsteps = ecc->steps;
if (ecc->steps * ecc->size != mtd->writesize) {
WARN(1, "Invalid ECC parameters\n");
ret = -EINVAL;
--
2.40.1
The nand_read_data_op() operation, which only consists in DATA_IN
cycles, is sadly not supported by all controllers despite being very
basic. The core, for some time, supposed all drivers would support
it. An improvement to this situation for supporting more constrained
controller added a check to verify if the operation was supported before
attempting it by running the function with the check_only boolean set
first, and then possibly falling back to another (possibly slightly less
optimized) alternative.
An even newer addition moved that check very early and probe time, in
order to perform the check only once. The content of the operation was
not so important, as long as the controller driver would tell whether
such operation on the NAND bus would be possible or not. In practice, no
buffer was provided (no fake buffer or whatever) as it is anyway not
relevant for the "check_only" condition. Unfortunately, early in the
function, there is an if statement verifying that the input parameters
are right for normal use, making the early check always unsuccessful.
Fixes: 9f820fc0651c ("mtd: rawnand: Check the data only read pattern only once")
Cc: stable(a)vger.kernel.org
Reported-by: Alexander Dahl <ada(a)thorsis.com>
Closes: https://lore.kernel.org/linux-mtd/20240306-shaky-bunion-d28b65ea97d7@thorsi…
Reported-by: Steven Seeger <steven.seeger(a)flightsystems.net>
Closes: https://lore.kernel.org/linux-mtd/DM6PR05MB4506554457CF95191A670BDEF7062@DM…
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Reviewed-by: Alexander Dahl <ada(a)thorsis.com>
---
drivers/mtd/nand/raw/nand_base.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index acd137dd0957..248e654ecefd 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -2173,7 +2173,7 @@ EXPORT_SYMBOL_GPL(nand_reset_op);
int nand_read_data_op(struct nand_chip *chip, void *buf, unsigned int len,
bool force_8bit, bool check_only)
{
- if (!len || !buf)
+ if (!len || (!check_only && !buf))
return -EINVAL;
if (nand_has_exec_op(chip)) {
--
2.40.1
Early during NAND identification, mtd_info fields have not yet been
initialized (namely, writesize and oobsize) and thus cannot be used for
sanity checks yet. Of course if there is a misuse of
nand_change_read_column_op() so early we won't be warned, but there is
anyway no actual check to perform at this stage as we do not yet know
the NAND geometry.
So, if the fields are empty, especially mtd->writesize which is *always*
set quite rapidly after identification, let's skip the sanity checks.
nand_change_read_column_op() is subject to be used early for ONFI/JEDEC
identification in the very unlikely case of:
- bitflips appearing in the parameter page,
- the controller driver not supporting simple DATA_IN cycles.
As nand_change_read_column_op() uses nand_fill_column_cycles() the logic
explaind above also applies in this secondary helper.
Fixes: c27842e7e11f ("mtd: rawnand: onfi: Adapt the parameter page read to constraint controllers")
Fixes: daca31765e8b ("mtd: rawnand: jedec: Adapt the parameter page read to constraint controllers")
Cc: stable(a)vger.kernel.org
Reported-by: Alexander Dahl <ada(a)thorsis.com>
Closes: https://lore.kernel.org/linux-mtd/20240306-shaky-bunion-d28b65ea97d7@thorsi…
Reported-by: Steven Seeger <steven.seeger(a)flightsystems.net>
Closes: https://lore.kernel.org/linux-mtd/DM6PR05MB4506554457CF95191A670BDEF7062@DM…
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
Changes in v2:
* Dropped the double (( ))
* Fixed nand_fill_column_cycles() as well.
---
drivers/mtd/nand/raw/nand_base.c | 57 ++++++++++++++++++--------------
1 file changed, 32 insertions(+), 25 deletions(-)
diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index 248e654ecefd..53e16d39af4b 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -1093,28 +1093,32 @@ static int nand_fill_column_cycles(struct nand_chip *chip, u8 *addrs,
unsigned int offset_in_page)
{
struct mtd_info *mtd = nand_to_mtd(chip);
+ bool ident_stage = !mtd->writesize;
- /* Make sure the offset is less than the actual page size. */
- if (offset_in_page > mtd->writesize + mtd->oobsize)
- return -EINVAL;
-
- /*
- * On small page NANDs, there's a dedicated command to access the OOB
- * area, and the column address is relative to the start of the OOB
- * area, not the start of the page. Asjust the address accordingly.
- */
- if (mtd->writesize <= 512 && offset_in_page >= mtd->writesize)
- offset_in_page -= mtd->writesize;
-
- /*
- * The offset in page is expressed in bytes, if the NAND bus is 16-bit
- * wide, then it must be divided by 2.
- */
- if (chip->options & NAND_BUSWIDTH_16) {
- if (WARN_ON(offset_in_page % 2))
+ /* Bypass all checks during NAND identification */
+ if (likely(!ident_stage)) {
+ /* Make sure the offset is less than the actual page size. */
+ if (offset_in_page > mtd->writesize + mtd->oobsize)
return -EINVAL;
- offset_in_page /= 2;
+ /*
+ * On small page NANDs, there's a dedicated command to access the OOB
+ * area, and the column address is relative to the start of the OOB
+ * area, not the start of the page. Asjust the address accordingly.
+ */
+ if (mtd->writesize <= 512 && offset_in_page >= mtd->writesize)
+ offset_in_page -= mtd->writesize;
+
+ /*
+ * The offset in page is expressed in bytes, if the NAND bus is 16-bit
+ * wide, then it must be divided by 2.
+ */
+ if (chip->options & NAND_BUSWIDTH_16) {
+ if (WARN_ON(offset_in_page % 2))
+ return -EINVAL;
+
+ offset_in_page /= 2;
+ }
}
addrs[0] = offset_in_page;
@@ -1123,7 +1127,7 @@ static int nand_fill_column_cycles(struct nand_chip *chip, u8 *addrs,
* Small page NANDs use 1 cycle for the columns, while large page NANDs
* need 2
*/
- if (mtd->writesize <= 512)
+ if (!ident_stage && mtd->writesize <= 512)
return 1;
addrs[1] = offset_in_page >> 8;
@@ -1436,16 +1440,19 @@ int nand_change_read_column_op(struct nand_chip *chip,
unsigned int len, bool force_8bit)
{
struct mtd_info *mtd = nand_to_mtd(chip);
+ bool ident_stage = !mtd->writesize;
if (len && !buf)
return -EINVAL;
- if (offset_in_page + len > mtd->writesize + mtd->oobsize)
- return -EINVAL;
+ if (!ident_stage) {
+ if (offset_in_page + len > mtd->writesize + mtd->oobsize)
+ return -EINVAL;
- /* Small page NANDs do not support column change. */
- if (mtd->writesize <= 512)
- return -ENOTSUPP;
+ /* Small page NANDs do not support column change. */
+ if (mtd->writesize <= 512)
+ return -ENOTSUPP;
+ }
if (nand_has_exec_op(chip)) {
const struct nand_interface_config *conf =
--
2.40.1
.setup_interface first gets called with a "target" value of
NAND_DATA_IFACE_CHECK_ONLY, in which case an error is expected
if the controller driver does not support the timing mode (NVDDR).
Fixes: a9ecc8c814e9 ("mtd: rawnand: Choose the best timings, NV-DDR included")
Signed-off-by: Val Packett <val(a)packett.cool>
Cc: stable(a)vger.kernel.org
---
drivers/mtd/nand/raw/rockchip-nand-controller.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/mtd/nand/raw/rockchip-nand-controller.c b/drivers/mtd/nand/raw/rockchip-nand-controller.c
index 7baaef69d..555804476 100644
--- a/drivers/mtd/nand/raw/rockchip-nand-controller.c
+++ b/drivers/mtd/nand/raw/rockchip-nand-controller.c
@@ -420,13 +420,13 @@ static int rk_nfc_setup_interface(struct nand_chip *chip, int target,
u32 rate, tc2rw, trwpw, trw2c;
u32 temp;
- if (target < 0)
- return 0;
-
timings = nand_get_sdr_timings(conf);
if (IS_ERR(timings))
return -EOPNOTSUPP;
+ if (target < 0)
+ return 0;
+
if (IS_ERR(nfc->nfc_clk))
rate = clk_get_rate(nfc->ahb_clk);
else
--
2.45.0
The platform driver conversion of EINJ mistakenly used
platform_device_del() to unwind platform_device_register_full() at
module exit. This leads to a small leak of one 'struct platform_device'
instance per module load/unload cycle. Switch to
platform_device_unregister() which performs both device_del() and final
put_device().
Fixes: 5621fafaac00 ("EINJ: Migrate to a platform driver")
Cc: <stable(a)vger.kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Cc: Ben Cheatham <Benjamin.Cheatham(a)amd.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/acpi/apei/einj-core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/acpi/apei/einj-core.c b/drivers/acpi/apei/einj-core.c
index 01faca3a238a..bb9f8475ce59 100644
--- a/drivers/acpi/apei/einj-core.c
+++ b/drivers/acpi/apei/einj-core.c
@@ -903,7 +903,7 @@ static void __exit einj_exit(void)
if (einj_initialized)
platform_driver_unregister(&einj_driver);
- platform_device_del(einj_dev);
+ platform_device_unregister(einj_dev);
}
module_init(einj_init);
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: mgb4: Fix double debugfs remove
Author: Martin Tůma <martin.tuma(a)digiteqautomotive.com>
Date: Tue May 21 18:22:54 2024 +0200
Fixes an error where debugfs_remove_recursive() is called first on a parent
directory and then again on a child which causes a kernel panic.
Signed-off-by: Martin Tůma <martin.tuma(a)digiteqautomotive.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Fixes: 0ab13674a9bd ("media: pci: mgb4: Added Digiteq Automotive MGB4 driver")
Cc: <stable(a)vger.kernel.org>
[hverkuil: added Fixes/Cc tags]
drivers/media/pci/mgb4/mgb4_core.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
---
diff --git a/drivers/media/pci/mgb4/mgb4_core.c b/drivers/media/pci/mgb4/mgb4_core.c
index 60498a5abebf..ab4f07e2e560 100644
--- a/drivers/media/pci/mgb4/mgb4_core.c
+++ b/drivers/media/pci/mgb4/mgb4_core.c
@@ -642,9 +642,6 @@ static void mgb4_remove(struct pci_dev *pdev)
struct mgb4_dev *mgbdev = pci_get_drvdata(pdev);
int i;
-#ifdef CONFIG_DEBUG_FS
- debugfs_remove_recursive(mgbdev->debugfs);
-#endif
#if IS_REACHABLE(CONFIG_HWMON)
hwmon_device_unregister(mgbdev->hwmon_dev);
#endif
@@ -659,6 +656,10 @@ static void mgb4_remove(struct pci_dev *pdev)
if (mgbdev->vin[i])
mgb4_vin_free(mgbdev->vin[i]);
+#ifdef CONFIG_DEBUG_FS
+ debugfs_remove_recursive(mgbdev->debugfs);
+#endif
+
device_remove_groups(&mgbdev->pdev->dev, mgb4_pci_groups);
free_spi(mgbdev);
free_i2c(mgbdev);
When multiple streams are in use, multiple TDs might be in flight when
an endpoint is stopped. We need to issue a Set TR Dequeue Pointer for
each, to ensure everything is reset properly and the caches cleared.
Change the logic so that any N>1 TDs found active for different streams
are deferred until after the first one is processed, calling
xhci_invalidate_cancelled_tds() again from xhci_handle_cmd_set_deq() to
queue another command until we are done with all of them. Also change
the error/"should never happen" paths to ensure we at least clear any
affected TDs, even if we can't issue a command to clear the hardware
cache, and complain loudly with an xhci_warn() if this ever happens.
This problem case dates back to commit e9df17eb1408 ("USB: xhci: Correct
assumptions about number of rings per endpoint.") early on in the XHCI
driver's life, when stream support was first added. At that point, this
condition would cause TDs to not be given back at all, causing hanging
transfers - but no security bug. It was then identified but not fixed
nor made into a warning in commit 674f8438c121 ("xhci: split handling
halted endpoints into two steps"), which added a FIXME comment for the
problem case (without materially changing the behavior as far as I can
tell, though the new logic made the problem more obvious).
Then later, in commit 94f339147fc3 ("xhci: Fix failure to give back some
cached cancelled URBs."), it was acknowledged again. This commit was
unfortunately not reviewed at all, as it was authored by the maintainer
directly. Had it been, perhaps a second set of eyes would've noticed
that it does not fix the bug, but rather just makes it (much) worse.
It turns the "transfers hang" bug into a "random memory corruption" bug,
by blindly marking TDs as complete without actually clearing them at all
nor moving the dequeue pointer past them, which means they aren't actually
complete, and the xHC will try to transfer data to/from them when the
endpoint resumes, now to freed memory buffers.
This could have been a legitimate oversight, but apparently the commit
author was aware of the problem (yet still chose to submit it): It was
still mentioned as a FIXME, an xhci_dbg() was added to log the problem
condition, and the remaining issue was mentioned in the commit
description. The choice of making the log type xhci_dbg() for what is,
at this point, a completely unhandled and known broken condition is
puzzling and unfortunate, as it guarantees that no actual users would
see the log in production, thereby making it nigh undebuggable (indeed,
even if you turn on DEBUG, the message doesn't really hint at there
being a problem at all).
It took me *months* of random xHC crashes to finally find a reliable
repro and be able to do a deep dive debug session, which could all have
been avoided had this unhandled, broken condition been actually reported
with a warning, as it should have been as a bug intentionally left in
unfixed (never mind that it shouldn't have been left in at all).
> Another fix to solve clearing the caches of all stream rings with
> cancelled TDs is needed, but not as urgent.
3 years after that statement and 14 years after the original bug was
introduced, I think it's finally time to fix it. And maybe next time
let's not leave bugs unfixed (that are actually worse than the original
bug), and let's actually get people to review kernel commits please.
Fixes xHC crashes and IOMMU faults with UAS devices when handling
errors/faults. Easiest repro is to use `hdparm` to mark an early sector
(e.g. 1024) on a disk as bad, then `cat /dev/sdX > /dev/null` in a loop.
At least in the case of JMicron controllers, the read errors end up
having to cancel two TDs (for two queued requests to different streams)
and the one that didn't get cleared properly ends up faulting the xHC
entirely when it tries to access DMA pages that have since been unmapped,
referred to by the stale TDs. This normally happens quickly (after two
or three loops). After this fix, I left the `cat` in a loop running
overnight and experienced no xHC failures, with all read errors
recovered properly. Repro'd and tested on an Apple M1 Mac Mini
(dwc3 host).
On systems without an IOMMU, this bug would instead silently corrupt
freed memory, making this a security bug (even on systems with IOMMUs
this could silently corrupt memory belonging to other USB devices on the
same controller, so it's still a security bug). Given that the kernel
autoprobes partition tables, I'm pretty sure a malicious USB device
pretending to be a UAS device and reporting an error with the right
timing could deliberately trigger a UAF and write to freed memory, with
no user action.
Fixes: e9df17eb1408 ("USB: xhci: Correct assumptions about number of rings per endpoint.")
Fixes: 94f339147fc3 ("xhci: Fix failure to give back some cached cancelled URBs.")
Fixes: 674f8438c121 ("xhci: split handling halted endpoints into two steps")
Cc: stable(a)vger.kernel.org
Cc: security(a)kernel.org
Signed-off-by: Hector Martin <marcan(a)marcan.st>
---
drivers/usb/host/xhci-ring.c | 54 +++++++++++++++++++++++++++++++++++---------
drivers/usb/host/xhci.h | 1 +
2 files changed, 44 insertions(+), 11 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 575f0fd9c9f1..9c06502be098 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1034,13 +1034,27 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
break;
case TD_DIRTY: /* TD is cached, clear it */
case TD_HALTED:
+ case TD_CLEARING_CACHE_DEFERRED:
+ if (cached_td) {
+ if (cached_td->urb->stream_id != td->urb->stream_id) {
+ /* Multiple streams case, defer move dq */
+ xhci_dbg(xhci,
+ "Move dq deferred: stream %u URB %p\n",
+ td->urb->stream_id, td->urb);
+ td->cancel_status = TD_CLEARING_CACHE_DEFERRED;
+ break;
+ }
+
+ /* Should never happen, at least try to clear the TD if it does */
+ xhci_warn(xhci,
+ "Found multiple active URBs %p and %p in stream %u?\n",
+ td->urb, cached_td->urb,
+ td->urb->stream_id);
+ td_to_noop(xhci, ring, cached_td, false);
+ cached_td->cancel_status = TD_CLEARED;
+ }
+
td->cancel_status = TD_CLEARING_CACHE;
- if (cached_td)
- /* FIXME stream case, several stopped rings */
- xhci_dbg(xhci,
- "Move dq past stream %u URB %p instead of stream %u URB %p\n",
- td->urb->stream_id, td->urb,
- cached_td->urb->stream_id, cached_td->urb);
cached_td = td;
break;
}
@@ -1060,10 +1074,16 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
if (err) {
/* Failed to move past cached td, just set cached TDs to no-op */
list_for_each_entry_safe(td, tmp_td, &ep->cancelled_td_list, cancelled_td_list) {
- if (td->cancel_status != TD_CLEARING_CACHE)
+ /*
+ * Deferred TDs need to have the deq pointer set after the above command
+ * completes, so if that failed we just give up on all of them (and
+ * complain loudly since this could cause issues due to caching).
+ */
+ if (td->cancel_status != TD_CLEARING_CACHE &&
+ td->cancel_status != TD_CLEARING_CACHE_DEFERRED)
continue;
- xhci_dbg(xhci, "Failed to clear cancelled cached URB %p, mark clear anyway\n",
- td->urb);
+ xhci_warn(xhci, "Failed to clear cancelled cached URB %p, mark clear anyway\n",
+ td->urb);
td_to_noop(xhci, ring, td, false);
td->cancel_status = TD_CLEARED;
}
@@ -1350,6 +1370,7 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
struct xhci_ep_ctx *ep_ctx;
struct xhci_slot_ctx *slot_ctx;
struct xhci_td *td, *tmp_td;
+ bool deferred = false;
ep_index = TRB_TO_EP_INDEX(le32_to_cpu(trb->generic.field[3]));
stream_id = TRB_TO_STREAM_ID(le32_to_cpu(trb->generic.field[2]));
@@ -1436,6 +1457,8 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
xhci_dbg(ep->xhci, "%s: Giveback cancelled URB %p TD\n",
__func__, td->urb);
xhci_td_cleanup(ep->xhci, td, ep_ring, td->status);
+ } else if (td->cancel_status == TD_CLEARING_CACHE_DEFERRED) {
+ deferred = true;
} else {
xhci_dbg(ep->xhci, "%s: Keep cancelled URB %p TD as cancel_status is %d\n",
__func__, td->urb, td->cancel_status);
@@ -1445,8 +1468,17 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
ep->ep_state &= ~SET_DEQ_PENDING;
ep->queued_deq_seg = NULL;
ep->queued_deq_ptr = NULL;
- /* Restart any rings with pending URBs */
- ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
+
+ if (deferred) {
+ /* We have more streams to clear */
+ xhci_dbg(ep->xhci, "%s: Pending TDs to clear, continuing with invalidation\n",
+ __func__);
+ xhci_invalidate_cancelled_tds(ep);
+ } else {
+ /* Restart any rings with pending URBs */
+ xhci_dbg(ep->xhci, "%s: All TDs cleared, ring doorbell\n", __func__);
+ ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
+ }
}
static void xhci_handle_cmd_reset_ep(struct xhci_hcd *xhci, int slot_id,
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 6f4bf98a6282..aa4379bdb90c 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1276,6 +1276,7 @@ enum xhci_cancelled_td_status {
TD_DIRTY = 0,
TD_HALTED,
TD_CLEARING_CACHE,
+ TD_CLEARING_CACHE_DEFERRED,
TD_CLEARED,
};
---
base-commit: a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
change-id: 20240524-xhci-streams-124e88db52e6
Best regards,
--
Hector Martin <marcan(a)marcan.st>
Am 26.05.24 um 22:19 schrieb Sasha Levin:
> This is a note to let you know that I've just added the patch titled
>
> platform/x86: xiaomi-wmi: Fix race condition when reporting key events
>
> to the 5.4-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> platform-x86-xiaomi-wmi-fix-race-condition-when-repo.patch
> and it can be found in the queue-5.4 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
Hi,
the underlying race condition can only be triggered since
commit e2ffcda16290 ("ACPI: OSL: Allow Notify () handlers to run on all CPUs"), which
afaik was introduced with kernel 6.8.
Because of this, i do not think that we have to backport this commit to kernels before 6.8.
Thanks,
Armin Wolf
>
> commit 7217162b48f60edc29afbeff641b7de02076bb86
> Author: Armin Wolf <W_Armin(a)gmx.de>
> Date: Tue Apr 2 16:30:57 2024 +0200
>
> platform/x86: xiaomi-wmi: Fix race condition when reporting key events
>
> [ Upstream commit 290680c2da8061e410bcaec4b21584ed951479af ]
>
> Multiple WMI events can be received concurrently, so multiple instances
> of xiaomi_wmi_notify() can be active at the same time. Since the input
> device is shared between those handlers, the key input sequence can be
> disturbed.
>
> Fix this by protecting the key input sequence with a mutex.
>
> Compile-tested only.
>
> Fixes: edb73f4f0247 ("platform/x86: wmi: add Xiaomi WMI key driver")
> Signed-off-by: Armin Wolf <W_Armin(a)gmx.de>
> Link: https://lore.kernel.org/r/20240402143059.8456-2-W_Armin@gmx.de
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com>
> Reviewed-by: Hans de Goede <hdegoede(a)redhat.com>
> Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/platform/x86/xiaomi-wmi.c b/drivers/platform/x86/xiaomi-wmi.c
> index 54a2546bb93bf..be80f0bda9484 100644
> --- a/drivers/platform/x86/xiaomi-wmi.c
> +++ b/drivers/platform/x86/xiaomi-wmi.c
> @@ -2,8 +2,10 @@
> /* WMI driver for Xiaomi Laptops */
>
> #include <linux/acpi.h>
> +#include <linux/device.h>
> #include <linux/input.h>
> #include <linux/module.h>
> +#include <linux/mutex.h>
> #include <linux/wmi.h>
>
> #include <uapi/linux/input-event-codes.h>
> @@ -20,12 +22,21 @@
>
> struct xiaomi_wmi {
> struct input_dev *input_dev;
> + struct mutex key_lock; /* Protects the key event sequence */
> unsigned int key_code;
> };
>
> +static void xiaomi_mutex_destroy(void *data)
> +{
> + struct mutex *lock = data;
> +
> + mutex_destroy(lock);
> +}
> +
> static int xiaomi_wmi_probe(struct wmi_device *wdev, const void *context)
> {
> struct xiaomi_wmi *data;
> + int ret;
>
> if (wdev == NULL || context == NULL)
> return -EINVAL;
> @@ -35,6 +46,11 @@ static int xiaomi_wmi_probe(struct wmi_device *wdev, const void *context)
> return -ENOMEM;
> dev_set_drvdata(&wdev->dev, data);
>
> + mutex_init(&data->key_lock);
> + ret = devm_add_action_or_reset(&wdev->dev, xiaomi_mutex_destroy, &data->key_lock);
> + if (ret < 0)
> + return ret;
> +
> data->input_dev = devm_input_allocate_device(&wdev->dev);
> if (data->input_dev == NULL)
> return -ENOMEM;
> @@ -59,10 +75,12 @@ static void xiaomi_wmi_notify(struct wmi_device *wdev, union acpi_object *dummy)
> if (data == NULL)
> return;
>
> + mutex_lock(&data->key_lock);
> input_report_key(data->input_dev, data->key_code, 1);
> input_sync(data->input_dev);
> input_report_key(data->input_dev, data->key_code, 0);
> input_sync(data->input_dev);
> + mutex_unlock(&data->key_lock);
> }
>
> static const struct wmi_device_id xiaomi_wmi_id_table[] = {
Am 26.05.24 um 22:14 schrieb Sasha Levin:
> This is a note to let you know that I've just added the patch titled
>
> platform/x86: xiaomi-wmi: Fix race condition when reporting key events
>
> to the 5.10-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> platform-x86-xiaomi-wmi-fix-race-condition-when-repo.patch
> and it can be found in the queue-5.10 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
Hi,
the underlying race condition can only be triggered since
commit e2ffcda16290 ("ACPI: OSL: Allow Notify () handlers to run on all CPUs"), which
afaik was introduced with kernel 6.8.
Because of this, i do not think that we have to backport this commit to kernels before 6.8.
Thanks,
Armin Wolf
>
> commit 6f4e7901c3ed3c0bd3da7af5854dbb765fad2e00
> Author: Armin Wolf <W_Armin(a)gmx.de>
> Date: Tue Apr 2 16:30:57 2024 +0200
>
> platform/x86: xiaomi-wmi: Fix race condition when reporting key events
>
> [ Upstream commit 290680c2da8061e410bcaec4b21584ed951479af ]
>
> Multiple WMI events can be received concurrently, so multiple instances
> of xiaomi_wmi_notify() can be active at the same time. Since the input
> device is shared between those handlers, the key input sequence can be
> disturbed.
>
> Fix this by protecting the key input sequence with a mutex.
>
> Compile-tested only.
>
> Fixes: edb73f4f0247 ("platform/x86: wmi: add Xiaomi WMI key driver")
> Signed-off-by: Armin Wolf <W_Armin(a)gmx.de>
> Link: https://lore.kernel.org/r/20240402143059.8456-2-W_Armin@gmx.de
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com>
> Reviewed-by: Hans de Goede <hdegoede(a)redhat.com>
> Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/platform/x86/xiaomi-wmi.c b/drivers/platform/x86/xiaomi-wmi.c
> index 54a2546bb93bf..be80f0bda9484 100644
> --- a/drivers/platform/x86/xiaomi-wmi.c
> +++ b/drivers/platform/x86/xiaomi-wmi.c
> @@ -2,8 +2,10 @@
> /* WMI driver for Xiaomi Laptops */
>
> #include <linux/acpi.h>
> +#include <linux/device.h>
> #include <linux/input.h>
> #include <linux/module.h>
> +#include <linux/mutex.h>
> #include <linux/wmi.h>
>
> #include <uapi/linux/input-event-codes.h>
> @@ -20,12 +22,21 @@
>
> struct xiaomi_wmi {
> struct input_dev *input_dev;
> + struct mutex key_lock; /* Protects the key event sequence */
> unsigned int key_code;
> };
>
> +static void xiaomi_mutex_destroy(void *data)
> +{
> + struct mutex *lock = data;
> +
> + mutex_destroy(lock);
> +}
> +
> static int xiaomi_wmi_probe(struct wmi_device *wdev, const void *context)
> {
> struct xiaomi_wmi *data;
> + int ret;
>
> if (wdev == NULL || context == NULL)
> return -EINVAL;
> @@ -35,6 +46,11 @@ static int xiaomi_wmi_probe(struct wmi_device *wdev, const void *context)
> return -ENOMEM;
> dev_set_drvdata(&wdev->dev, data);
>
> + mutex_init(&data->key_lock);
> + ret = devm_add_action_or_reset(&wdev->dev, xiaomi_mutex_destroy, &data->key_lock);
> + if (ret < 0)
> + return ret;
> +
> data->input_dev = devm_input_allocate_device(&wdev->dev);
> if (data->input_dev == NULL)
> return -ENOMEM;
> @@ -59,10 +75,12 @@ static void xiaomi_wmi_notify(struct wmi_device *wdev, union acpi_object *dummy)
> if (data == NULL)
> return;
>
> + mutex_lock(&data->key_lock);
> input_report_key(data->input_dev, data->key_code, 1);
> input_sync(data->input_dev);
> input_report_key(data->input_dev, data->key_code, 0);
> input_sync(data->input_dev);
> + mutex_unlock(&data->key_lock);
> }
>
> static const struct wmi_device_id xiaomi_wmi_id_table[] = {
Am 26.05.24 um 22:07 schrieb Sasha Levin:
> This is a note to let you know that I've just added the patch titled
>
> platform/x86: xiaomi-wmi: Fix race condition when reporting key events
>
> to the 5.15-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> platform-x86-xiaomi-wmi-fix-race-condition-when-repo.patch
> and it can be found in the queue-5.15 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
Hi,
the underlying race condition can only be triggered since
commit e2ffcda16290 ("ACPI: OSL: Allow Notify () handlers to run on all CPUs"), which
afaik was introduced with kernel 6.8.
Because of this, i do not think that we have to backport this commit to kernels before 6.8.
Thanks,
Armin Wolf
>
> commit 1f436551dd453c28c23f800e7273136e526197cb
> Author: Armin Wolf <W_Armin(a)gmx.de>
> Date: Tue Apr 2 16:30:57 2024 +0200
>
> platform/x86: xiaomi-wmi: Fix race condition when reporting key events
>
> [ Upstream commit 290680c2da8061e410bcaec4b21584ed951479af ]
>
> Multiple WMI events can be received concurrently, so multiple instances
> of xiaomi_wmi_notify() can be active at the same time. Since the input
> device is shared between those handlers, the key input sequence can be
> disturbed.
>
> Fix this by protecting the key input sequence with a mutex.
>
> Compile-tested only.
>
> Fixes: edb73f4f0247 ("platform/x86: wmi: add Xiaomi WMI key driver")
> Signed-off-by: Armin Wolf <W_Armin(a)gmx.de>
> Link: https://lore.kernel.org/r/20240402143059.8456-2-W_Armin@gmx.de
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com>
> Reviewed-by: Hans de Goede <hdegoede(a)redhat.com>
> Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/platform/x86/xiaomi-wmi.c b/drivers/platform/x86/xiaomi-wmi.c
> index 54a2546bb93bf..be80f0bda9484 100644
> --- a/drivers/platform/x86/xiaomi-wmi.c
> +++ b/drivers/platform/x86/xiaomi-wmi.c
> @@ -2,8 +2,10 @@
> /* WMI driver for Xiaomi Laptops */
>
> #include <linux/acpi.h>
> +#include <linux/device.h>
> #include <linux/input.h>
> #include <linux/module.h>
> +#include <linux/mutex.h>
> #include <linux/wmi.h>
>
> #include <uapi/linux/input-event-codes.h>
> @@ -20,12 +22,21 @@
>
> struct xiaomi_wmi {
> struct input_dev *input_dev;
> + struct mutex key_lock; /* Protects the key event sequence */
> unsigned int key_code;
> };
>
> +static void xiaomi_mutex_destroy(void *data)
> +{
> + struct mutex *lock = data;
> +
> + mutex_destroy(lock);
> +}
> +
> static int xiaomi_wmi_probe(struct wmi_device *wdev, const void *context)
> {
> struct xiaomi_wmi *data;
> + int ret;
>
> if (wdev == NULL || context == NULL)
> return -EINVAL;
> @@ -35,6 +46,11 @@ static int xiaomi_wmi_probe(struct wmi_device *wdev, const void *context)
> return -ENOMEM;
> dev_set_drvdata(&wdev->dev, data);
>
> + mutex_init(&data->key_lock);
> + ret = devm_add_action_or_reset(&wdev->dev, xiaomi_mutex_destroy, &data->key_lock);
> + if (ret < 0)
> + return ret;
> +
> data->input_dev = devm_input_allocate_device(&wdev->dev);
> if (data->input_dev == NULL)
> return -ENOMEM;
> @@ -59,10 +75,12 @@ static void xiaomi_wmi_notify(struct wmi_device *wdev, union acpi_object *dummy)
> if (data == NULL)
> return;
>
> + mutex_lock(&data->key_lock);
> input_report_key(data->input_dev, data->key_code, 1);
> input_sync(data->input_dev);
> input_report_key(data->input_dev, data->key_code, 0);
> input_sync(data->input_dev);
> + mutex_unlock(&data->key_lock);
> }
>
> static const struct wmi_device_id xiaomi_wmi_id_table[] = {