When support for PCC Opregion was added, validation of field_obj
was missed.
Based on the acpi_ev_address_space_dispatch function description,
field_obj can be NULL, and also when acpi_ev_address_space_dispatch
is called in the acpi_ex_region_read() NULL is passed as field_obj.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 0acf24ad7e10 ("ACPICA: Add support for PCC Opregion special context data")
Cc: stable(a)vger.kernel.org
Signed-off-by: George Rurikov <grurikov(a)gmail.com>
---
drivers/acpi/acpica/evregion.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/acpi/acpica/evregion.c b/drivers/acpi/acpica/evregion.c
index cf53b9535f18..03e8b6f186af 100644
--- a/drivers/acpi/acpica/evregion.c
+++ b/drivers/acpi/acpica/evregion.c
@@ -164,13 +164,17 @@ acpi_ev_address_space_dispatch(union acpi_operand_object *region_obj,
}
if (region_obj->region.space_id == ACPI_ADR_SPACE_PLATFORM_COMM) {
- struct acpi_pcc_info *ctx =
- handler_desc->address_space.context;
-
- ctx->internal_buffer =
- field_obj->field.internal_pcc_buffer;
- ctx->length = (u16)region_obj->region.length;
- ctx->subspace_id = (u8)region_obj->region.address;
+ if (field_obj != NULL) {
+ struct acpi_pcc_info *ctx =
+ handler_desc->address_space.context;
+
+ ctx->internal_buffer =
+ field_obj->field.internal_pcc_buffer;
+ ctx->length = (u16)region_obj->region.length;
+ ctx->subspace_id = (u8)region_obj->region.address;
+ } else {
+ return_ACPI_STATUS(AE_ERROR);
+ }
}
if (region_obj->region.space_id ==
--
2.34.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 6575b268157f37929948a8d1f3bafb3d7c055bc1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110551-fragrance-deodorant-e7fa@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6575b268157f37929948a8d1f3bafb3d7c055bc1 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Fri, 25 Oct 2024 12:32:55 -0700
Subject: [PATCH] cxl/port: Fix CXL port initialization order when the
subsystem is built-in
When the CXL subsystem is built-in the module init order is determined
by Makefile order. That order violates expectations. The expectation is
that cxl_acpi and cxl_mem can race to attach. If cxl_acpi wins the race,
cxl_mem will find the enabled CXL root ports it needs. If cxl_acpi loses
the race it will retrigger cxl_mem to attach via cxl_bus_rescan(). That
flow only works if cxl_acpi can assume ports are enabled immediately
upon cxl_acpi_probe() return. That in turn can only happen in the
CONFIG_CXL_ACPI=y case if the cxl_port driver is registered before
cxl_acpi_probe() runs.
Fix up the order to prevent initialization failures. Ensure that
cxl_port is built-in when cxl_acpi is also built-in, arrange for
Makefile order to resolve the subsys_initcall() order of cxl_port and
cxl_acpi, and arrange for Makefile order to resolve the
device_initcall() (module_init()) order of the remaining objects.
As for what contributed to this not being found earlier, the CXL
regression environment, cxl_test, builds all CXL functionality as a
module to allow to symbol mocking and other dynamic reload tests. As a
result there is no regression coverage for the built-in case.
Reported-by: Gregory Price <gourry(a)gourry.net>
Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
Tested-by: Gregory Price <gourry(a)gourry.net>
Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
Cc: stable(a)vger.kernel.org
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron(a)huawei.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Alison Schofield <alison.schofield(a)intel.com>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Tested-by: Alejandro Lucero <alucerop(a)amd.com>
Reviewed-by: Alejandro Lucero <alucerop(a)amd.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Link: https://patch.msgid.link/172988474904.476062.7961350937442459266.stgit@dwil…
Signed-off-by: Ira Weiny <ira.weiny(a)intel.com>
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 29c192f20082..876469e23f7a 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -60,6 +60,7 @@ config CXL_ACPI
default CXL_BUS
select ACPI_TABLE_LIB
select ACPI_HMAT
+ select CXL_PORT
help
Enable support for host managed device memory (HDM) resources
published by a platform's ACPI CXL memory layout description. See
diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
index db321f48ba52..2caa90fa4bf2 100644
--- a/drivers/cxl/Makefile
+++ b/drivers/cxl/Makefile
@@ -1,13 +1,21 @@
# SPDX-License-Identifier: GPL-2.0
+
+# Order is important here for the built-in case:
+# - 'core' first for fundamental init
+# - 'port' before platform root drivers like 'acpi' so that CXL-root ports
+# are immediately enabled
+# - 'mem' and 'pmem' before endpoint drivers so that memdevs are
+# immediately enabled
+# - 'pci' last, also mirrors the hardware enumeration hierarchy
obj-y += core/
-obj-$(CONFIG_CXL_PCI) += cxl_pci.o
-obj-$(CONFIG_CXL_MEM) += cxl_mem.o
+obj-$(CONFIG_CXL_PORT) += cxl_port.o
obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
-obj-$(CONFIG_CXL_PORT) += cxl_port.o
+obj-$(CONFIG_CXL_MEM) += cxl_mem.o
+obj-$(CONFIG_CXL_PCI) += cxl_pci.o
-cxl_mem-y := mem.o
-cxl_pci-y := pci.o
+cxl_port-y := port.o
cxl_acpi-y := acpi.o
cxl_pmem-y := pmem.o security.o
-cxl_port-y := port.o
+cxl_mem-y := mem.o
+cxl_pci-y := pci.o
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 861dde65768f..9dc394295e1f 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -208,7 +208,22 @@ static struct cxl_driver cxl_port_driver = {
},
};
-module_cxl_driver(cxl_port_driver);
+static int __init cxl_port_init(void)
+{
+ return cxl_driver_register(&cxl_port_driver);
+}
+/*
+ * Be ready to immediately enable ports emitted by the platform CXL root
+ * (e.g. cxl_acpi) when CONFIG_CXL_PORT=y.
+ */
+subsys_initcall(cxl_port_init);
+
+static void __exit cxl_port_exit(void)
+{
+ cxl_driver_unregister(&cxl_port_driver);
+}
+module_exit(cxl_port_exit);
+
MODULE_DESCRIPTION("CXL: Port enumeration and services");
MODULE_LICENSE("GPL v2");
MODULE_IMPORT_NS(CXL);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 6575b268157f37929948a8d1f3bafb3d7c055bc1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110550-chamomile-zit-2c77@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6575b268157f37929948a8d1f3bafb3d7c055bc1 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Fri, 25 Oct 2024 12:32:55 -0700
Subject: [PATCH] cxl/port: Fix CXL port initialization order when the
subsystem is built-in
When the CXL subsystem is built-in the module init order is determined
by Makefile order. That order violates expectations. The expectation is
that cxl_acpi and cxl_mem can race to attach. If cxl_acpi wins the race,
cxl_mem will find the enabled CXL root ports it needs. If cxl_acpi loses
the race it will retrigger cxl_mem to attach via cxl_bus_rescan(). That
flow only works if cxl_acpi can assume ports are enabled immediately
upon cxl_acpi_probe() return. That in turn can only happen in the
CONFIG_CXL_ACPI=y case if the cxl_port driver is registered before
cxl_acpi_probe() runs.
Fix up the order to prevent initialization failures. Ensure that
cxl_port is built-in when cxl_acpi is also built-in, arrange for
Makefile order to resolve the subsys_initcall() order of cxl_port and
cxl_acpi, and arrange for Makefile order to resolve the
device_initcall() (module_init()) order of the remaining objects.
As for what contributed to this not being found earlier, the CXL
regression environment, cxl_test, builds all CXL functionality as a
module to allow to symbol mocking and other dynamic reload tests. As a
result there is no regression coverage for the built-in case.
Reported-by: Gregory Price <gourry(a)gourry.net>
Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
Tested-by: Gregory Price <gourry(a)gourry.net>
Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
Cc: stable(a)vger.kernel.org
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron(a)huawei.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Alison Schofield <alison.schofield(a)intel.com>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Tested-by: Alejandro Lucero <alucerop(a)amd.com>
Reviewed-by: Alejandro Lucero <alucerop(a)amd.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Link: https://patch.msgid.link/172988474904.476062.7961350937442459266.stgit@dwil…
Signed-off-by: Ira Weiny <ira.weiny(a)intel.com>
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 29c192f20082..876469e23f7a 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -60,6 +60,7 @@ config CXL_ACPI
default CXL_BUS
select ACPI_TABLE_LIB
select ACPI_HMAT
+ select CXL_PORT
help
Enable support for host managed device memory (HDM) resources
published by a platform's ACPI CXL memory layout description. See
diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
index db321f48ba52..2caa90fa4bf2 100644
--- a/drivers/cxl/Makefile
+++ b/drivers/cxl/Makefile
@@ -1,13 +1,21 @@
# SPDX-License-Identifier: GPL-2.0
+
+# Order is important here for the built-in case:
+# - 'core' first for fundamental init
+# - 'port' before platform root drivers like 'acpi' so that CXL-root ports
+# are immediately enabled
+# - 'mem' and 'pmem' before endpoint drivers so that memdevs are
+# immediately enabled
+# - 'pci' last, also mirrors the hardware enumeration hierarchy
obj-y += core/
-obj-$(CONFIG_CXL_PCI) += cxl_pci.o
-obj-$(CONFIG_CXL_MEM) += cxl_mem.o
+obj-$(CONFIG_CXL_PORT) += cxl_port.o
obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
-obj-$(CONFIG_CXL_PORT) += cxl_port.o
+obj-$(CONFIG_CXL_MEM) += cxl_mem.o
+obj-$(CONFIG_CXL_PCI) += cxl_pci.o
-cxl_mem-y := mem.o
-cxl_pci-y := pci.o
+cxl_port-y := port.o
cxl_acpi-y := acpi.o
cxl_pmem-y := pmem.o security.o
-cxl_port-y := port.o
+cxl_mem-y := mem.o
+cxl_pci-y := pci.o
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 861dde65768f..9dc394295e1f 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -208,7 +208,22 @@ static struct cxl_driver cxl_port_driver = {
},
};
-module_cxl_driver(cxl_port_driver);
+static int __init cxl_port_init(void)
+{
+ return cxl_driver_register(&cxl_port_driver);
+}
+/*
+ * Be ready to immediately enable ports emitted by the platform CXL root
+ * (e.g. cxl_acpi) when CONFIG_CXL_PORT=y.
+ */
+subsys_initcall(cxl_port_init);
+
+static void __exit cxl_port_exit(void)
+{
+ cxl_driver_unregister(&cxl_port_driver);
+}
+module_exit(cxl_port_exit);
+
MODULE_DESCRIPTION("CXL: Port enumeration and services");
MODULE_LICENSE("GPL v2");
MODULE_IMPORT_NS(CXL);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 101c268bd2f37e965a5468353e62d154db38838e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110539-swooned-cold-53f5@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 101c268bd2f37e965a5468353e62d154db38838e Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Tue, 22 Oct 2024 18:43:49 -0700
Subject: [PATCH] cxl/port: Fix use-after-free, permit out-of-order decoder
shutdown
In support of investigating an initialization failure report [1],
cxl_test was updated to register mock memory-devices after the mock
root-port/bus device had been registered. That led to cxl_test crashing
with a use-after-free bug with the following signature:
cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1
cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1
cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0
1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1
[..]
cxld_unregister: cxl decoder14.0:
cxl_region_decode_reset: cxl_region region3:
mock_decoder_reset: cxl_port port3: decoder3.0 reset
2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1
cxl_endpoint_decoder_release: cxl decoder14.0:
[..]
cxld_unregister: cxl decoder7.0:
3) cxl_region_decode_reset: cxl_region region3:
Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI
[..]
RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core]
[..]
Call Trace:
<TASK>
cxl_region_decode_reset+0x69/0x190 [cxl_core]
cxl_region_detach+0xe8/0x210 [cxl_core]
cxl_decoder_kill_region+0x27/0x40 [cxl_core]
cxld_unregister+0x5d/0x60 [cxl_core]
At 1) a region has been established with 2 endpoint decoders (7.0 and
14.0). Those endpoints share a common switch-decoder in the topology
(3.0). At teardown, 2), decoder14.0 is the first to be removed and hits
the "out of order reset case" in the switch decoder. The effect though
is that region3 cleanup is aborted leaving it in-tact and
referencing decoder14.0. At 3) the second attempt to teardown region3
trips over the stale decoder14.0 object which has long since been
deleted.
The fix here is to recognize that the CXL specification places no
mandate on in-order shutdown of switch-decoders, the driver enforces
in-order allocation, and hardware enforces in-order commit. So, rather
than fail and leave objects dangling, always remove them.
In support of making cxl_region_decode_reset() always succeed,
cxl_region_invalidate_memregion() failures are turned into warnings.
Crashing the kernel is ok there since system integrity is at risk if
caches cannot be managed around physical address mutation events like
CXL region destruction.
A new device_for_each_child_reverse_from() is added to cleanup
port->commit_end after all dependent decoders have been disabled. In
other words if decoders are allocated 0->1->2 and disabled 1->2->0 then
port->commit_end only decrements from 2 after 2 has been disabled, and
it decrements all the way to zero since 1 was disabled previously.
Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
Cc: stable(a)vger.kernel.org
Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Alison Schofield <alison.schofield(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Zijun Hu <quic_zijuhu(a)quicinc.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwil…
Signed-off-by: Ira Weiny <ira.weiny(a)intel.com>
diff --git a/drivers/base/core.c b/drivers/base/core.c
index a4c853411a6b..e42f1ad73078 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4037,6 +4037,41 @@ int device_for_each_child_reverse(struct device *parent, void *data,
}
EXPORT_SYMBOL_GPL(device_for_each_child_reverse);
+/**
+ * device_for_each_child_reverse_from - device child iterator in reversed order.
+ * @parent: parent struct device.
+ * @from: optional starting point in child list
+ * @fn: function to be called for each device.
+ * @data: data for the callback.
+ *
+ * Iterate over @parent's child devices, starting at @from, and call @fn
+ * for each, passing it @data. This helper is identical to
+ * device_for_each_child_reverse() when @from is NULL.
+ *
+ * @fn is checked each iteration. If it returns anything other than 0,
+ * iteration stop and that value is returned to the caller of
+ * device_for_each_child_reverse_from();
+ */
+int device_for_each_child_reverse_from(struct device *parent,
+ struct device *from, const void *data,
+ int (*fn)(struct device *, const void *))
+{
+ struct klist_iter i;
+ struct device *child;
+ int error = 0;
+
+ if (!parent->p)
+ return 0;
+
+ klist_iter_init_node(&parent->p->klist_children, &i,
+ (from ? &from->p->knode_parent : NULL));
+ while ((child = prev_device(&i)) && !error)
+ error = fn(child, data);
+ klist_iter_exit(&i);
+ return error;
+}
+EXPORT_SYMBOL_GPL(device_for_each_child_reverse_from);
+
/**
* device_find_child - device iterator for locating a particular device.
* @parent: parent struct device
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 3df10517a327..223c273c0cd1 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -712,7 +712,44 @@ static int cxl_decoder_commit(struct cxl_decoder *cxld)
return 0;
}
-static int cxl_decoder_reset(struct cxl_decoder *cxld)
+static int commit_reap(struct device *dev, const void *data)
+{
+ struct cxl_port *port = to_cxl_port(dev->parent);
+ struct cxl_decoder *cxld;
+
+ if (!is_switch_decoder(dev) && !is_endpoint_decoder(dev))
+ return 0;
+
+ cxld = to_cxl_decoder(dev);
+ if (port->commit_end == cxld->id &&
+ ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
+ port->commit_end--;
+ dev_dbg(&port->dev, "reap: %s commit_end: %d\n",
+ dev_name(&cxld->dev), port->commit_end);
+ }
+
+ return 0;
+}
+
+void cxl_port_commit_reap(struct cxl_decoder *cxld)
+{
+ struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+
+ lockdep_assert_held_write(&cxl_region_rwsem);
+
+ /*
+ * Once the highest committed decoder is disabled, free any other
+ * decoders that were pinned allocated by out-of-order release.
+ */
+ port->commit_end--;
+ dev_dbg(&port->dev, "reap: %s commit_end: %d\n", dev_name(&cxld->dev),
+ port->commit_end);
+ device_for_each_child_reverse_from(&port->dev, &cxld->dev, NULL,
+ commit_reap);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_port_commit_reap, CXL);
+
+static void cxl_decoder_reset(struct cxl_decoder *cxld)
{
struct cxl_port *port = to_cxl_port(cxld->dev.parent);
struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
@@ -721,14 +758,14 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
u32 ctrl;
if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
- return 0;
+ return;
- if (port->commit_end != id) {
+ if (port->commit_end == id)
+ cxl_port_commit_reap(cxld);
+ else
dev_dbg(&port->dev,
"%s: out of order reset, expected decoder%d.%d\n",
dev_name(&cxld->dev), port->id, port->commit_end);
- return -EBUSY;
- }
down_read(&cxl_dpa_rwsem);
ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
@@ -741,7 +778,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
writel(0, hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id));
up_read(&cxl_dpa_rwsem);
- port->commit_end--;
cxld->flags &= ~CXL_DECODER_F_ENABLE;
/* Userspace is now responsible for reconfiguring this decoder */
@@ -751,8 +787,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
cxled = to_cxl_endpoint_decoder(&cxld->dev);
cxled->state = CXL_DECODER_STATE_MANUAL;
}
-
- return 0;
}
static int cxl_setup_hdm_decoder_from_dvsec(
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e701e4b04032..3478d2058303 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -232,8 +232,8 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
"Bypassing cpu_cache_invalidate_memregion() for testing!\n");
return 0;
} else {
- dev_err(&cxlr->dev,
- "Failed to synchronize CPU cache state\n");
+ dev_WARN(&cxlr->dev,
+ "Failed to synchronize CPU cache state\n");
return -ENXIO;
}
}
@@ -242,19 +242,17 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
return 0;
}
-static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
+static void cxl_region_decode_reset(struct cxl_region *cxlr, int count)
{
struct cxl_region_params *p = &cxlr->params;
- int i, rc = 0;
+ int i;
/*
- * Before region teardown attempt to flush, and if the flush
- * fails cancel the region teardown for data consistency
- * concerns
+ * Before region teardown attempt to flush, evict any data cached for
+ * this region, or scream loudly about missing arch / platform support
+ * for CXL teardown.
*/
- rc = cxl_region_invalidate_memregion(cxlr);
- if (rc)
- return rc;
+ cxl_region_invalidate_memregion(cxlr);
for (i = count - 1; i >= 0; i--) {
struct cxl_endpoint_decoder *cxled = p->targets[i];
@@ -277,23 +275,17 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
cxl_rr = cxl_rr_load(iter, cxlr);
cxld = cxl_rr->decoder;
if (cxld->reset)
- rc = cxld->reset(cxld);
- if (rc)
- return rc;
+ cxld->reset(cxld);
set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
}
endpoint_reset:
- rc = cxled->cxld.reset(&cxled->cxld);
- if (rc)
- return rc;
+ cxled->cxld.reset(&cxled->cxld);
set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
}
/* all decoders associated with this region have been torn down */
clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
-
- return 0;
}
static int commit_decoder(struct cxl_decoder *cxld)
@@ -409,16 +401,8 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr,
* still pending.
*/
if (p->state == CXL_CONFIG_RESET_PENDING) {
- rc = cxl_region_decode_reset(cxlr, p->interleave_ways);
- /*
- * Revert to committed since there may still be active
- * decoders associated with this region, or move forward
- * to active to mark the reset successful
- */
- if (rc)
- p->state = CXL_CONFIG_COMMIT;
- else
- p->state = CXL_CONFIG_ACTIVE;
+ cxl_region_decode_reset(cxlr, p->interleave_ways);
+ p->state = CXL_CONFIG_ACTIVE;
}
}
@@ -2054,13 +2038,7 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
get_device(&cxlr->dev);
if (p->state > CXL_CONFIG_ACTIVE) {
- /*
- * TODO: tear down all impacted regions if a device is
- * removed out of order
- */
- rc = cxl_region_decode_reset(cxlr, p->interleave_ways);
- if (rc)
- goto out;
+ cxl_region_decode_reset(cxlr, p->interleave_ways);
p->state = CXL_CONFIG_ACTIVE;
}
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 0d8b810a51f0..5406e3ab3d4a 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -359,7 +359,7 @@ struct cxl_decoder {
struct cxl_region *region;
unsigned long flags;
int (*commit)(struct cxl_decoder *cxld);
- int (*reset)(struct cxl_decoder *cxld);
+ void (*reset)(struct cxl_decoder *cxld);
};
/*
@@ -730,6 +730,7 @@ static inline bool is_cxl_root(struct cxl_port *port)
int cxl_num_decoders_committed(struct cxl_port *port);
bool is_cxl_port(const struct device *dev);
struct cxl_port *to_cxl_port(const struct device *dev);
+void cxl_port_commit_reap(struct cxl_decoder *cxld);
struct pci_bus;
int devm_cxl_register_pci_bus(struct device *host, struct device *uport_dev,
struct pci_bus *bus);
diff --git a/include/linux/device.h b/include/linux/device.h
index b4bde8d22697..667cb6db9019 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1078,6 +1078,9 @@ int device_for_each_child(struct device *dev, void *data,
int (*fn)(struct device *dev, void *data));
int device_for_each_child_reverse(struct device *dev, void *data,
int (*fn)(struct device *dev, void *data));
+int device_for_each_child_reverse_from(struct device *parent,
+ struct device *from, const void *data,
+ int (*fn)(struct device *, const void *));
struct device *device_find_child(struct device *dev, void *data,
int (*match)(struct device *dev, void *data));
struct device *device_find_child_by_name(struct device *parent,
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 90d5afd52dd0..c5bbd89b3192 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -693,26 +693,22 @@ static int mock_decoder_commit(struct cxl_decoder *cxld)
return 0;
}
-static int mock_decoder_reset(struct cxl_decoder *cxld)
+static void mock_decoder_reset(struct cxl_decoder *cxld)
{
struct cxl_port *port = to_cxl_port(cxld->dev.parent);
int id = cxld->id;
if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
- return 0;
+ return;
dev_dbg(&port->dev, "%s reset\n", dev_name(&cxld->dev));
- if (port->commit_end != id) {
+ if (port->commit_end == id)
+ cxl_port_commit_reap(cxld);
+ else
dev_dbg(&port->dev,
"%s: out of order reset, expected decoder%d.%d\n",
dev_name(&cxld->dev), port->id, port->commit_end);
- return -EBUSY;
- }
-
- port->commit_end--;
cxld->flags &= ~CXL_DECODER_F_ENABLE;
-
- return 0;
}
static void default_mock_decoder(struct cxl_decoder *cxld)
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x f8c879192465d9f328cb0df07208ef077c560bb1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110503-registry-reheat-cd11@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f8c879192465d9f328cb0df07208ef077c560bb1 Mon Sep 17 00:00:00 2001
From: Bjorn Andersson <bjorn.andersson(a)oss.qualcomm.com>
Date: Wed, 23 Oct 2024 17:24:33 +0000
Subject: [PATCH] soc: qcom: pmic_glink: Handle GLINK intent allocation
rejections
Some versions of the pmic_glink firmware does not allow dynamic GLINK
intent allocations, attempting to send a message before the firmware has
allocated its receive buffers and announced these intent allocations
will fail. When this happens something like this showns up in the log:
pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
GLINK has been updated to distinguish between the cases where the remote
is going down (-ECANCELED) and the intent allocation being rejected
(-EAGAIN).
Retry the send until intent buffers becomes available, or an actual
error occur.
To avoid infinitely waiting for the firmware in the event that this
misbehaves and no intents arrive, an arbitrary 5 second timeout is
used.
This patch was developed with input from Chris Lew.
Reported-by: Johan Hovold <johan(a)kernel.org>
Closes: https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/#t
Cc: stable(a)vger.kernel.org # rpmsg: glink: Handle rejected intent request better
Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver")
Tested-by: Johan Hovold <johan+linaro(a)kernel.org>
Reviewed-by: Johan Hovold <johan+linaro(a)kernel.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson(a)oss.qualcomm.com>
Reviewed-by: Chris Lew <quic_clew(a)quicinc.com>
Link: https://lore.kernel.org/r/20241023-pmic-glink-ecancelled-v2-2-ebc268129407@…
Signed-off-by: Bjorn Andersson <andersson(a)kernel.org>
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c
index 9606222993fd..baa4ac6704a9 100644
--- a/drivers/soc/qcom/pmic_glink.c
+++ b/drivers/soc/qcom/pmic_glink.c
@@ -4,6 +4,7 @@
* Copyright (c) 2022, Linaro Ltd
*/
#include <linux/auxiliary_bus.h>
+#include <linux/delay.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>
@@ -13,6 +14,8 @@
#include <linux/soc/qcom/pmic_glink.h>
#include <linux/spinlock.h>
+#define PMIC_GLINK_SEND_TIMEOUT (5 * HZ)
+
enum {
PMIC_GLINK_CLIENT_BATT = 0,
PMIC_GLINK_CLIENT_ALTMODE,
@@ -112,13 +115,29 @@ EXPORT_SYMBOL_GPL(pmic_glink_client_register);
int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len)
{
struct pmic_glink *pg = client->pg;
+ bool timeout_reached = false;
+ unsigned long start;
int ret;
mutex_lock(&pg->state_lock);
- if (!pg->ept)
+ if (!pg->ept) {
ret = -ECONNRESET;
- else
- ret = rpmsg_send(pg->ept, data, len);
+ } else {
+ start = jiffies;
+ for (;;) {
+ ret = rpmsg_send(pg->ept, data, len);
+ if (ret != -EAGAIN)
+ break;
+
+ if (timeout_reached) {
+ ret = -ETIMEDOUT;
+ break;
+ }
+
+ usleep_range(1000, 5000);
+ timeout_reached = time_after(jiffies, start + PMIC_GLINK_SEND_TIMEOUT);
+ }
+ }
mutex_unlock(&pg->state_lock);
return ret;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x d67907154808745b0fae5874edc7b0f78d33991c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110527-maternal-devious-a112@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d67907154808745b0fae5874edc7b0f78d33991c Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Wed, 2 Oct 2024 12:01:21 +0200
Subject: [PATCH] firmware: qcom: scm: suppress download mode error
Stop spamming the logs with errors about missing mechanism for setting
the so called download (or dump) mode for users that have not requested
that feature to be enabled in the first place.
This avoids the follow error being logged on boot as well as on
shutdown when the feature it not available and download mode has not
been enabled on the kernel command line:
qcom_scm firmware:scm: No available mechanism for setting download mode
Fixes: 79cb2cb8d89b ("firmware: qcom: scm: Disable SDI and write no dump to dump mode")
Fixes: 781d32d1c970 ("firmware: qcom_scm: Clear download bit during reboot")
Cc: Mukesh Ojha <quic_mojha(a)quicinc.com>
Cc: stable(a)vger.kernel.org # 6.4
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Reviewed-by: Mukesh Ojha <quic_mojha(a)quicinc.com>
Link: https://lore.kernel.org/r/20241002100122.18809-2-johan+linaro@kernel.org
Signed-off-by: Bjorn Andersson <andersson(a)kernel.org>
diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
index 10986cb11ec0..e2ac595902ed 100644
--- a/drivers/firmware/qcom/qcom_scm.c
+++ b/drivers/firmware/qcom/qcom_scm.c
@@ -545,7 +545,7 @@ static void qcom_scm_set_download_mode(u32 dload_mode)
} else if (__qcom_scm_is_call_available(__scm->dev, QCOM_SCM_SVC_BOOT,
QCOM_SCM_BOOT_SET_DLOAD_MODE)) {
ret = __qcom_scm_set_dload_mode(__scm->dev, !!dload_mode);
- } else {
+ } else if (dload_mode) {
dev_err(__scm->dev,
"No available mechanism for setting download mode\n");
}
The patch below does not apply to the 6.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y
git checkout FETCH_HEAD
git cherry-pick -x d67907154808745b0fae5874edc7b0f78d33991c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110526-driven-small-2a5c@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d67907154808745b0fae5874edc7b0f78d33991c Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Wed, 2 Oct 2024 12:01:21 +0200
Subject: [PATCH] firmware: qcom: scm: suppress download mode error
Stop spamming the logs with errors about missing mechanism for setting
the so called download (or dump) mode for users that have not requested
that feature to be enabled in the first place.
This avoids the follow error being logged on boot as well as on
shutdown when the feature it not available and download mode has not
been enabled on the kernel command line:
qcom_scm firmware:scm: No available mechanism for setting download mode
Fixes: 79cb2cb8d89b ("firmware: qcom: scm: Disable SDI and write no dump to dump mode")
Fixes: 781d32d1c970 ("firmware: qcom_scm: Clear download bit during reboot")
Cc: Mukesh Ojha <quic_mojha(a)quicinc.com>
Cc: stable(a)vger.kernel.org # 6.4
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Reviewed-by: Mukesh Ojha <quic_mojha(a)quicinc.com>
Link: https://lore.kernel.org/r/20241002100122.18809-2-johan+linaro@kernel.org
Signed-off-by: Bjorn Andersson <andersson(a)kernel.org>
diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
index 10986cb11ec0..e2ac595902ed 100644
--- a/drivers/firmware/qcom/qcom_scm.c
+++ b/drivers/firmware/qcom/qcom_scm.c
@@ -545,7 +545,7 @@ static void qcom_scm_set_download_mode(u32 dload_mode)
} else if (__qcom_scm_is_call_available(__scm->dev, QCOM_SCM_SVC_BOOT,
QCOM_SCM_BOOT_SET_DLOAD_MODE)) {
ret = __qcom_scm_set_dload_mode(__scm->dev, !!dload_mode);
- } else {
+ } else if (dload_mode) {
dev_err(__scm->dev,
"No available mechanism for setting download mode\n");
}
An atomicity violation occurs when the validity of the variables
da7219->clk_src and da7219->mclk_rate is being assessed. Since the entire
assessment is not protected by a lock, the da7219 variable might still be
in flux during the assessment, rendering this check invalid.
To fix this issue, we recommend adding a lock before the block
if ((da7219->clk_src == clk_id) && (da7219->mclk_rate == freq)) so that
the legitimacy check for da7219->clk_src and da7219->mclk_rate is
protected by the lock, ensuring the validity of the check.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: 6d817c0e9fd7 ("ASoC: codecs: Add da7219 codec driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
sound/soc/codecs/da7219.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/sound/soc/codecs/da7219.c b/sound/soc/codecs/da7219.c
index 311ea7918b31..e2da3e317b5a 100644
--- a/sound/soc/codecs/da7219.c
+++ b/sound/soc/codecs/da7219.c
@@ -1167,17 +1167,20 @@ static int da7219_set_dai_sysclk(struct snd_soc_dai *codec_dai,
struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component);
int ret = 0;
- if ((da7219->clk_src == clk_id) && (da7219->mclk_rate == freq))
+ mutex_lock(&da7219->pll_lock);
+
+ if ((da7219->clk_src == clk_id) && (da7219->mclk_rate == freq)) {
+ mutex_unlock(&da7219->pll_lock);
return 0;
+ }
if ((freq < 2000000) || (freq > 54000000)) {
+ mutex_unlock(&da7219->pll_lock);
dev_err(codec_dai->dev, "Unsupported MCLK value %d\n",
freq);
return -EINVAL;
}
- mutex_lock(&da7219->pll_lock);
-
switch (clk_id) {
case DA7219_CLKSRC_MCLK_SQR:
snd_soc_component_update_bits(component, DA7219_PLL_CTRL,
--
2.34.1
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 9d08ec41a0645283d79a2e642205d488feaceacf
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110524-patience-rice-79e2@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d08ec41a0645283d79a2e642205d488feaceacf Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao(a)google.com>
Date: Sat, 19 Oct 2024 22:22:12 -0600
Subject: [PATCH] mm: allow set/clear page_type again
Some page flags (page->flags) were converted to page types
(page->page_types). A recent example is PG_hugetlb.
From the exclusive writer's perspective, e.g., a thread doing
__folio_set_hugetlb(), there is a difference between the page flag and
type APIs: the former allows the same non-atomic operation to be repeated
whereas the latter does not. For example, calling __folio_set_hugetlb()
twice triggers VM_BUG_ON_FOLIO(), since the second call expects the type
(PG_hugetlb) not to be set previously.
Using add_hugetlb_folio() as an example, it calls __folio_set_hugetlb() in
the following error-handling path. And when that happens, it triggers the
aforementioned VM_BUG_ON_FOLIO().
if (folio_test_hugetlb(folio)) {
rc = hugetlb_vmemmap_restore_folio(h, folio);
if (rc) {
spin_lock_irq(&hugetlb_lock);
add_hugetlb_folio(h, folio, false);
...
It is possible to make hugeTLB comply with the new requirements from the
page type API. However, a straightforward fix would be to just allow the
same page type to be set or cleared again inside the API, to avoid any
changes to its callers.
Link: https://lkml.kernel.org/r/20241020042212.296781-1-yuzhao@google.com
Fixes: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType")
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 1b3a76710487..cc839e4365c1 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -975,12 +975,16 @@ static __always_inline bool folio_test_##fname(const struct folio *folio) \
} \
static __always_inline void __folio_set_##fname(struct folio *folio) \
{ \
+ if (folio_test_##fname(folio)) \
+ return; \
VM_BUG_ON_FOLIO(data_race(folio->page.page_type) != UINT_MAX, \
folio); \
folio->page.page_type = (unsigned int)PGTY_##lname << 24; \
} \
static __always_inline void __folio_clear_##fname(struct folio *folio) \
{ \
+ if (folio->page.page_type == UINT_MAX) \
+ return; \
VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio); \
folio->page.page_type = UINT_MAX; \
}
@@ -993,11 +997,15 @@ static __always_inline int Page##uname(const struct page *page) \
} \
static __always_inline void __SetPage##uname(struct page *page) \
{ \
+ if (Page##uname(page)) \
+ return; \
VM_BUG_ON_PAGE(data_race(page->page_type) != UINT_MAX, page); \
page->page_type = (unsigned int)PGTY_##lname << 24; \
} \
static __always_inline void __ClearPage##uname(struct page *page) \
{ \
+ if (page->page_type == UINT_MAX) \
+ return; \
VM_BUG_ON_PAGE(!Page##uname(page), page); \
page->page_type = UINT_MAX; \
}
The patch below does not apply to the 6.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y
git checkout FETCH_HEAD
git cherry-pick -x 9d08ec41a0645283d79a2e642205d488feaceacf
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110524-runner-gravity-4288@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d08ec41a0645283d79a2e642205d488feaceacf Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao(a)google.com>
Date: Sat, 19 Oct 2024 22:22:12 -0600
Subject: [PATCH] mm: allow set/clear page_type again
Some page flags (page->flags) were converted to page types
(page->page_types). A recent example is PG_hugetlb.
From the exclusive writer's perspective, e.g., a thread doing
__folio_set_hugetlb(), there is a difference between the page flag and
type APIs: the former allows the same non-atomic operation to be repeated
whereas the latter does not. For example, calling __folio_set_hugetlb()
twice triggers VM_BUG_ON_FOLIO(), since the second call expects the type
(PG_hugetlb) not to be set previously.
Using add_hugetlb_folio() as an example, it calls __folio_set_hugetlb() in
the following error-handling path. And when that happens, it triggers the
aforementioned VM_BUG_ON_FOLIO().
if (folio_test_hugetlb(folio)) {
rc = hugetlb_vmemmap_restore_folio(h, folio);
if (rc) {
spin_lock_irq(&hugetlb_lock);
add_hugetlb_folio(h, folio, false);
...
It is possible to make hugeTLB comply with the new requirements from the
page type API. However, a straightforward fix would be to just allow the
same page type to be set or cleared again inside the API, to avoid any
changes to its callers.
Link: https://lkml.kernel.org/r/20241020042212.296781-1-yuzhao@google.com
Fixes: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType")
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 1b3a76710487..cc839e4365c1 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -975,12 +975,16 @@ static __always_inline bool folio_test_##fname(const struct folio *folio) \
} \
static __always_inline void __folio_set_##fname(struct folio *folio) \
{ \
+ if (folio_test_##fname(folio)) \
+ return; \
VM_BUG_ON_FOLIO(data_race(folio->page.page_type) != UINT_MAX, \
folio); \
folio->page.page_type = (unsigned int)PGTY_##lname << 24; \
} \
static __always_inline void __folio_clear_##fname(struct folio *folio) \
{ \
+ if (folio->page.page_type == UINT_MAX) \
+ return; \
VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio); \
folio->page.page_type = UINT_MAX; \
}
@@ -993,11 +997,15 @@ static __always_inline int Page##uname(const struct page *page) \
} \
static __always_inline void __SetPage##uname(struct page *page) \
{ \
+ if (Page##uname(page)) \
+ return; \
VM_BUG_ON_PAGE(data_race(page->page_type) != UINT_MAX, page); \
page->page_type = (unsigned int)PGTY_##lname << 24; \
} \
static __always_inline void __ClearPage##uname(struct page *page) \
{ \
+ if (page->page_type == UINT_MAX) \
+ return; \
VM_BUG_ON_PAGE(!Page##uname(page), page); \
page->page_type = UINT_MAX; \
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x ddd6d8e975b171ea3f63a011a75820883ff0d479
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110504-flagship-precook-4a47@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ddd6d8e975b171ea3f63a011a75820883ff0d479 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao(a)google.com>
Date: Sat, 19 Oct 2024 01:29:38 +0000
Subject: [PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL
stats
Patch series "mm: multi-gen LRU: Have secondary MMUs participate in
MM_WALK".
Today, the MM_WALK capability causes MGLRU to clear the young bit from
PMDs and PTEs during the page table walk before eviction, but MGLRU does
not call the clear_young() MMU notifier in this case. By not calling this
notifier, the MM walk takes less time/CPU, but it causes pages that are
accessed mostly through KVM / secondary MMUs to appear younger than they
should be.
We do call the clear_young() notifier today, but only when attempting to
evict the page, so we end up clearing young/accessed information less
frequently for secondary MMUs than for mm PTEs, and therefore they appear
younger and are less likely to be evicted. Therefore, memory that is
*not* being accessed mostly by KVM will be evicted *more* frequently,
worsening performance.
ChromeOS observed a tab-open latency regression when enabling MGLRU with a
setup that involved running a VM:
Tab-open latency histogram (ms)
Version p50 mean p95 p99 max
base 1315 1198 2347 3454 10319
mglru 2559 1311 7399 12060 43758
fix 1119 926 2470 4211 6947
This series replaces the final non-selftest patchs from this series[1],
which introduced a similar change (and a new MMU notifier) with KVM
optimizations. I'll send a separate series (to Sean and Paolo) for the
KVM optimizations.
This series also makes proactive reclaim with MGLRU possible for KVM
memory. I have verified that this functions correctly with the selftest
from [1], but given that that test is a KVM selftest, I'll send it with
the rest of the KVM optimizations later. Andrew, let me know if you'd
like to take the test now anyway.
[1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google…
This patch (of 2):
The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful
and become more complicated to properly compute when adding
test/clear_young() notifiers in MGLRU's mm walk.
Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com
Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: James Houghton <jthoughton(a)google.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Matlack <dmatlack(a)google.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: David Stevens <stevensd(a)google.com>
Cc: Oliver Upton <oliver.upton(a)linux.dev>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 17506e4a2835..9342e5692dab 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -458,9 +458,7 @@ struct lru_gen_folio {
enum {
MM_LEAF_TOTAL, /* total leaf entries */
- MM_LEAF_OLD, /* old leaf entries */
MM_LEAF_YOUNG, /* young leaf entries */
- MM_NONLEAF_TOTAL, /* total non-leaf entries */
MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */
MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */
NR_MM_STATS
diff --git a/mm/vmscan.c b/mm/vmscan.c
index eb4e8440c507..4f1d33e4b360 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3399,7 +3399,6 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
continue;
if (!pte_young(ptent)) {
- walk->mm_stats[MM_LEAF_OLD]++;
continue;
}
@@ -3552,7 +3551,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
walk->mm_stats[MM_LEAF_TOTAL]++;
if (!pmd_young(val)) {
- walk->mm_stats[MM_LEAF_OLD]++;
continue;
}
@@ -3564,8 +3562,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
continue;
}
- walk->mm_stats[MM_NONLEAF_TOTAL]++;
-
if (!walk->force_scan && should_clear_pmd_young()) {
if (!pmd_young(val))
continue;
@@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
for (tier = 0; tier < MAX_NR_TIERS; tier++) {
seq_printf(m, " %10d", tier);
for (type = 0; type < ANON_AND_FILE; type++) {
- const char *s = " ";
+ const char *s = "xxx";
unsigned long n[3] = {};
if (seq == max_seq) {
- s = "RT ";
+ s = "RTx";
n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]);
n[1] = READ_ONCE(lrugen->avg_total[type][tier]);
} else if (seq == min_seq[type] || NR_HIST_GENS > 1) {
@@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
seq_puts(m, " ");
for (i = 0; i < NR_MM_STATS; i++) {
- const char *s = " ";
+ const char *s = "xxxx";
unsigned long n = 0;
if (seq == max_seq && NR_HIST_GENS == 1) {
- s = "LOYNFA";
+ s = "TYFA";
n = READ_ONCE(mm_state->stats[hist][i]);
} else if (seq != max_seq && NR_HIST_GENS > 1) {
- s = "loynfa";
+ s = "tyfa";
n = READ_ONCE(mm_state->stats[hist][i]);
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x ddd6d8e975b171ea3f63a011a75820883ff0d479
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110502-removal-babied-f697@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ddd6d8e975b171ea3f63a011a75820883ff0d479 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao(a)google.com>
Date: Sat, 19 Oct 2024 01:29:38 +0000
Subject: [PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL
stats
Patch series "mm: multi-gen LRU: Have secondary MMUs participate in
MM_WALK".
Today, the MM_WALK capability causes MGLRU to clear the young bit from
PMDs and PTEs during the page table walk before eviction, but MGLRU does
not call the clear_young() MMU notifier in this case. By not calling this
notifier, the MM walk takes less time/CPU, but it causes pages that are
accessed mostly through KVM / secondary MMUs to appear younger than they
should be.
We do call the clear_young() notifier today, but only when attempting to
evict the page, so we end up clearing young/accessed information less
frequently for secondary MMUs than for mm PTEs, and therefore they appear
younger and are less likely to be evicted. Therefore, memory that is
*not* being accessed mostly by KVM will be evicted *more* frequently,
worsening performance.
ChromeOS observed a tab-open latency regression when enabling MGLRU with a
setup that involved running a VM:
Tab-open latency histogram (ms)
Version p50 mean p95 p99 max
base 1315 1198 2347 3454 10319
mglru 2559 1311 7399 12060 43758
fix 1119 926 2470 4211 6947
This series replaces the final non-selftest patchs from this series[1],
which introduced a similar change (and a new MMU notifier) with KVM
optimizations. I'll send a separate series (to Sean and Paolo) for the
KVM optimizations.
This series also makes proactive reclaim with MGLRU possible for KVM
memory. I have verified that this functions correctly with the selftest
from [1], but given that that test is a KVM selftest, I'll send it with
the rest of the KVM optimizations later. Andrew, let me know if you'd
like to take the test now anyway.
[1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google…
This patch (of 2):
The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful
and become more complicated to properly compute when adding
test/clear_young() notifiers in MGLRU's mm walk.
Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com
Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: James Houghton <jthoughton(a)google.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Matlack <dmatlack(a)google.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: David Stevens <stevensd(a)google.com>
Cc: Oliver Upton <oliver.upton(a)linux.dev>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 17506e4a2835..9342e5692dab 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -458,9 +458,7 @@ struct lru_gen_folio {
enum {
MM_LEAF_TOTAL, /* total leaf entries */
- MM_LEAF_OLD, /* old leaf entries */
MM_LEAF_YOUNG, /* young leaf entries */
- MM_NONLEAF_TOTAL, /* total non-leaf entries */
MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */
MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */
NR_MM_STATS
diff --git a/mm/vmscan.c b/mm/vmscan.c
index eb4e8440c507..4f1d33e4b360 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3399,7 +3399,6 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
continue;
if (!pte_young(ptent)) {
- walk->mm_stats[MM_LEAF_OLD]++;
continue;
}
@@ -3552,7 +3551,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
walk->mm_stats[MM_LEAF_TOTAL]++;
if (!pmd_young(val)) {
- walk->mm_stats[MM_LEAF_OLD]++;
continue;
}
@@ -3564,8 +3562,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
continue;
}
- walk->mm_stats[MM_NONLEAF_TOTAL]++;
-
if (!walk->force_scan && should_clear_pmd_young()) {
if (!pmd_young(val))
continue;
@@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
for (tier = 0; tier < MAX_NR_TIERS; tier++) {
seq_printf(m, " %10d", tier);
for (type = 0; type < ANON_AND_FILE; type++) {
- const char *s = " ";
+ const char *s = "xxx";
unsigned long n[3] = {};
if (seq == max_seq) {
- s = "RT ";
+ s = "RTx";
n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]);
n[1] = READ_ONCE(lrugen->avg_total[type][tier]);
} else if (seq == min_seq[type] || NR_HIST_GENS > 1) {
@@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
seq_puts(m, " ");
for (i = 0; i < NR_MM_STATS; i++) {
- const char *s = " ";
+ const char *s = "xxxx";
unsigned long n = 0;
if (seq == max_seq && NR_HIST_GENS == 1) {
- s = "LOYNFA";
+ s = "TYFA";
n = READ_ONCE(mm_state->stats[hist][i]);
} else if (seq != max_seq && NR_HIST_GENS > 1) {
- s = "loynfa";
+ s = "tyfa";
n = READ_ONCE(mm_state->stats[hist][i]);
}
The patch below does not apply to the 6.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y
git checkout FETCH_HEAD
git cherry-pick -x ddd6d8e975b171ea3f63a011a75820883ff0d479
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110501-scrubber-eating-d64d@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ddd6d8e975b171ea3f63a011a75820883ff0d479 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao(a)google.com>
Date: Sat, 19 Oct 2024 01:29:38 +0000
Subject: [PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL
stats
Patch series "mm: multi-gen LRU: Have secondary MMUs participate in
MM_WALK".
Today, the MM_WALK capability causes MGLRU to clear the young bit from
PMDs and PTEs during the page table walk before eviction, but MGLRU does
not call the clear_young() MMU notifier in this case. By not calling this
notifier, the MM walk takes less time/CPU, but it causes pages that are
accessed mostly through KVM / secondary MMUs to appear younger than they
should be.
We do call the clear_young() notifier today, but only when attempting to
evict the page, so we end up clearing young/accessed information less
frequently for secondary MMUs than for mm PTEs, and therefore they appear
younger and are less likely to be evicted. Therefore, memory that is
*not* being accessed mostly by KVM will be evicted *more* frequently,
worsening performance.
ChromeOS observed a tab-open latency regression when enabling MGLRU with a
setup that involved running a VM:
Tab-open latency histogram (ms)
Version p50 mean p95 p99 max
base 1315 1198 2347 3454 10319
mglru 2559 1311 7399 12060 43758
fix 1119 926 2470 4211 6947
This series replaces the final non-selftest patchs from this series[1],
which introduced a similar change (and a new MMU notifier) with KVM
optimizations. I'll send a separate series (to Sean and Paolo) for the
KVM optimizations.
This series also makes proactive reclaim with MGLRU possible for KVM
memory. I have verified that this functions correctly with the selftest
from [1], but given that that test is a KVM selftest, I'll send it with
the rest of the KVM optimizations later. Andrew, let me know if you'd
like to take the test now anyway.
[1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google…
This patch (of 2):
The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful
and become more complicated to properly compute when adding
test/clear_young() notifiers in MGLRU's mm walk.
Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com
Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: James Houghton <jthoughton(a)google.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Matlack <dmatlack(a)google.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: David Stevens <stevensd(a)google.com>
Cc: Oliver Upton <oliver.upton(a)linux.dev>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 17506e4a2835..9342e5692dab 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -458,9 +458,7 @@ struct lru_gen_folio {
enum {
MM_LEAF_TOTAL, /* total leaf entries */
- MM_LEAF_OLD, /* old leaf entries */
MM_LEAF_YOUNG, /* young leaf entries */
- MM_NONLEAF_TOTAL, /* total non-leaf entries */
MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */
MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */
NR_MM_STATS
diff --git a/mm/vmscan.c b/mm/vmscan.c
index eb4e8440c507..4f1d33e4b360 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3399,7 +3399,6 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
continue;
if (!pte_young(ptent)) {
- walk->mm_stats[MM_LEAF_OLD]++;
continue;
}
@@ -3552,7 +3551,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
walk->mm_stats[MM_LEAF_TOTAL]++;
if (!pmd_young(val)) {
- walk->mm_stats[MM_LEAF_OLD]++;
continue;
}
@@ -3564,8 +3562,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
continue;
}
- walk->mm_stats[MM_NONLEAF_TOTAL]++;
-
if (!walk->force_scan && should_clear_pmd_young()) {
if (!pmd_young(val))
continue;
@@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
for (tier = 0; tier < MAX_NR_TIERS; tier++) {
seq_printf(m, " %10d", tier);
for (type = 0; type < ANON_AND_FILE; type++) {
- const char *s = " ";
+ const char *s = "xxx";
unsigned long n[3] = {};
if (seq == max_seq) {
- s = "RT ";
+ s = "RTx";
n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]);
n[1] = READ_ONCE(lrugen->avg_total[type][tier]);
} else if (seq == min_seq[type] || NR_HIST_GENS > 1) {
@@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
seq_puts(m, " ");
for (i = 0; i < NR_MM_STATS; i++) {
- const char *s = " ";
+ const char *s = "xxxx";
unsigned long n = 0;
if (seq == max_seq && NR_HIST_GENS == 1) {
- s = "LOYNFA";
+ s = "TYFA";
n = READ_ONCE(mm_state->stats[hist][i]);
} else if (seq != max_seq && NR_HIST_GENS > 1) {
- s = "loynfa";
+ s = "tyfa";
n = READ_ONCE(mm_state->stats[hist][i]);
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 01626a18230246efdcea322aa8f067e60ffe5ccd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110506-octane-phosphate-f084@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 01626a18230246efdcea322aa8f067e60ffe5ccd Mon Sep 17 00:00:00 2001
From: Barry Song <baohua(a)kernel.org>
Date: Fri, 27 Sep 2024 09:19:36 +1200
Subject: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare
fails
Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
introduced an unconditional one-tick sleep when `swapcache_prepare()`
fails, which has led to reports of UI stuttering on latency-sensitive
Android devices. To address this, we can use a waitqueue to wake up tasks
that fail `swapcache_prepare()` sooner, instead of always sleeping for a
full tick. While tasks may occasionally be woken by an unrelated
`do_swap_page()`, this method is preferable to two scenarios: rapid
re-entry into page faults, which can cause livelocks, and multiple
millisecond sleeps, which visibly degrade user experience.
Oven's testing shows that a single waitqueue resolves the UI stuttering
issue. If a 'thundering herd' problem becomes apparent later, a waitqueue
hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit
locks can be introduced.
[v-songbaohua(a)oppo.com: wake_up only when swapcache_wq waitqueue is active]
Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com
Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
Signed-off-by: Barry Song <v-songbaohua(a)oppo.com>
Reported-by: Oven Liyang <liyangouwen1(a)oppo.com>
Tested-by: Oven Liyang <liyangouwen1(a)oppo.com>
Cc: Kairui Song <kasong(a)tencent.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Yosry Ahmed <yosryahmed(a)google.com>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory.c b/mm/memory.c
index 3ccee51adfbb..bdf77a3ec47b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4187,6 +4187,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq);
+
/*
* We enter with non-exclusive mmap_lock (to exclude vma changes,
* but allow concurrent faults), and pte mapped but not yet locked.
@@ -4199,6 +4201,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
struct folio *swapcache, *folio = NULL;
+ DECLARE_WAITQUEUE(wait, current);
struct page *page;
struct swap_info_struct *si = NULL;
rmap_t rmap_flags = RMAP_NONE;
@@ -4297,7 +4300,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
* Relax a bit to prevent rapid
* repeated page faults.
*/
+ add_wait_queue(&swapcache_wq, &wait);
schedule_timeout_uninterruptible(1);
+ remove_wait_queue(&swapcache_wq, &wait);
goto out_page;
}
need_clear_cache = true;
@@ -4604,8 +4609,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
out:
/* Clear the swap cache pin for direct swapin after PTL unlock */
- if (need_clear_cache)
+ if (need_clear_cache) {
swapcache_clear(si, entry, nr_pages);
+ if (waitqueue_active(&swapcache_wq))
+ wake_up(&swapcache_wq);
+ }
if (si)
put_swap_device(si);
return ret;
@@ -4620,8 +4628,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
folio_unlock(swapcache);
folio_put(swapcache);
}
- if (need_clear_cache)
+ if (need_clear_cache) {
swapcache_clear(si, entry, nr_pages);
+ if (waitqueue_active(&swapcache_wq))
+ wake_up(&swapcache_wq);
+ }
if (si)
put_swap_device(si);
return ret;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 01626a18230246efdcea322aa8f067e60ffe5ccd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110505-january-napped-4f1b@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 01626a18230246efdcea322aa8f067e60ffe5ccd Mon Sep 17 00:00:00 2001
From: Barry Song <baohua(a)kernel.org>
Date: Fri, 27 Sep 2024 09:19:36 +1200
Subject: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare
fails
Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
introduced an unconditional one-tick sleep when `swapcache_prepare()`
fails, which has led to reports of UI stuttering on latency-sensitive
Android devices. To address this, we can use a waitqueue to wake up tasks
that fail `swapcache_prepare()` sooner, instead of always sleeping for a
full tick. While tasks may occasionally be woken by an unrelated
`do_swap_page()`, this method is preferable to two scenarios: rapid
re-entry into page faults, which can cause livelocks, and multiple
millisecond sleeps, which visibly degrade user experience.
Oven's testing shows that a single waitqueue resolves the UI stuttering
issue. If a 'thundering herd' problem becomes apparent later, a waitqueue
hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit
locks can be introduced.
[v-songbaohua(a)oppo.com: wake_up only when swapcache_wq waitqueue is active]
Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com
Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
Signed-off-by: Barry Song <v-songbaohua(a)oppo.com>
Reported-by: Oven Liyang <liyangouwen1(a)oppo.com>
Tested-by: Oven Liyang <liyangouwen1(a)oppo.com>
Cc: Kairui Song <kasong(a)tencent.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Yosry Ahmed <yosryahmed(a)google.com>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory.c b/mm/memory.c
index 3ccee51adfbb..bdf77a3ec47b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4187,6 +4187,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq);
+
/*
* We enter with non-exclusive mmap_lock (to exclude vma changes,
* but allow concurrent faults), and pte mapped but not yet locked.
@@ -4199,6 +4201,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
struct folio *swapcache, *folio = NULL;
+ DECLARE_WAITQUEUE(wait, current);
struct page *page;
struct swap_info_struct *si = NULL;
rmap_t rmap_flags = RMAP_NONE;
@@ -4297,7 +4300,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
* Relax a bit to prevent rapid
* repeated page faults.
*/
+ add_wait_queue(&swapcache_wq, &wait);
schedule_timeout_uninterruptible(1);
+ remove_wait_queue(&swapcache_wq, &wait);
goto out_page;
}
need_clear_cache = true;
@@ -4604,8 +4609,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
out:
/* Clear the swap cache pin for direct swapin after PTL unlock */
- if (need_clear_cache)
+ if (need_clear_cache) {
swapcache_clear(si, entry, nr_pages);
+ if (waitqueue_active(&swapcache_wq))
+ wake_up(&swapcache_wq);
+ }
if (si)
put_swap_device(si);
return ret;
@@ -4620,8 +4628,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
folio_unlock(swapcache);
folio_put(swapcache);
}
- if (need_clear_cache)
+ if (need_clear_cache) {
swapcache_clear(si, entry, nr_pages);
+ if (waitqueue_active(&swapcache_wq))
+ wake_up(&swapcache_wq);
+ }
if (si)
put_swap_device(si);
return ret;
The patch below does not apply to the 6.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y
git checkout FETCH_HEAD
git cherry-pick -x 01626a18230246efdcea322aa8f067e60ffe5ccd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110504-water-overarch-bbef@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 01626a18230246efdcea322aa8f067e60ffe5ccd Mon Sep 17 00:00:00 2001
From: Barry Song <baohua(a)kernel.org>
Date: Fri, 27 Sep 2024 09:19:36 +1200
Subject: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare
fails
Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
introduced an unconditional one-tick sleep when `swapcache_prepare()`
fails, which has led to reports of UI stuttering on latency-sensitive
Android devices. To address this, we can use a waitqueue to wake up tasks
that fail `swapcache_prepare()` sooner, instead of always sleeping for a
full tick. While tasks may occasionally be woken by an unrelated
`do_swap_page()`, this method is preferable to two scenarios: rapid
re-entry into page faults, which can cause livelocks, and multiple
millisecond sleeps, which visibly degrade user experience.
Oven's testing shows that a single waitqueue resolves the UI stuttering
issue. If a 'thundering herd' problem becomes apparent later, a waitqueue
hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit
locks can be introduced.
[v-songbaohua(a)oppo.com: wake_up only when swapcache_wq waitqueue is active]
Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com
Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
Signed-off-by: Barry Song <v-songbaohua(a)oppo.com>
Reported-by: Oven Liyang <liyangouwen1(a)oppo.com>
Tested-by: Oven Liyang <liyangouwen1(a)oppo.com>
Cc: Kairui Song <kasong(a)tencent.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Yosry Ahmed <yosryahmed(a)google.com>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory.c b/mm/memory.c
index 3ccee51adfbb..bdf77a3ec47b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4187,6 +4187,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq);
+
/*
* We enter with non-exclusive mmap_lock (to exclude vma changes,
* but allow concurrent faults), and pte mapped but not yet locked.
@@ -4199,6 +4201,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
struct folio *swapcache, *folio = NULL;
+ DECLARE_WAITQUEUE(wait, current);
struct page *page;
struct swap_info_struct *si = NULL;
rmap_t rmap_flags = RMAP_NONE;
@@ -4297,7 +4300,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
* Relax a bit to prevent rapid
* repeated page faults.
*/
+ add_wait_queue(&swapcache_wq, &wait);
schedule_timeout_uninterruptible(1);
+ remove_wait_queue(&swapcache_wq, &wait);
goto out_page;
}
need_clear_cache = true;
@@ -4604,8 +4609,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
out:
/* Clear the swap cache pin for direct swapin after PTL unlock */
- if (need_clear_cache)
+ if (need_clear_cache) {
swapcache_clear(si, entry, nr_pages);
+ if (waitqueue_active(&swapcache_wq))
+ wake_up(&swapcache_wq);
+ }
if (si)
put_swap_device(si);
return ret;
@@ -4620,8 +4628,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
folio_unlock(swapcache);
folio_put(swapcache);
}
- if (need_clear_cache)
+ if (need_clear_cache) {
swapcache_clear(si, entry, nr_pages);
+ if (waitqueue_active(&swapcache_wq))
+ wake_up(&swapcache_wq);
+ }
if (si)
put_swap_device(si);
return ret;
The patch below was submitted to be applied to the 6.11-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 164f66de6bb6ef454893f193c898dc8f1da6d18b Mon Sep 17 00:00:00 2001
From: Chunyan Zhang <zhangchunyan(a)iscas.ac.cn>
Date: Tue, 8 Oct 2024 17:41:39 +0800
Subject: [PATCH] riscv: Remove duplicated GET_RM
The macro GET_RM defined twice in this file, one can be removed.
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Signed-off-by: Chunyan Zhang <zhangchunyan(a)iscas.ac.cn>
Fixes: 956d705dd279 ("riscv: Unaligned load/store handling for M_MODE")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20241008094141.549248-3-zhangchunyan@iscas.ac.cn
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/kernel/traps_misaligned.c b/arch/riscv/kernel/traps_misaligned.c
index d4fd8af7aaf5..1b9867136b61 100644
--- a/arch/riscv/kernel/traps_misaligned.c
+++ b/arch/riscv/kernel/traps_misaligned.c
@@ -136,8 +136,6 @@
#define REG_PTR(insn, pos, regs) \
(ulong *)((ulong)(regs) + REG_OFFSET(insn, pos))
-#define GET_RM(insn) (((insn) >> 12) & 7)
-
#define GET_RS1(insn, regs) (*REG_PTR(insn, SH_RS1, regs))
#define GET_RS2(insn, regs) (*REG_PTR(insn, SH_RS2, regs))
#define GET_RS1S(insn, regs) (*REG_PTR(RVC_RS1S(insn), 0, regs))
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 7245012f0f496162dd95d888ed2ceb5a35170f1a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110521-antler-uneatable-d873@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7245012f0f496162dd95d888ed2ceb5a35170f1a Mon Sep 17 00:00:00 2001
From: Johannes Berg <johannes.berg(a)intel.com>
Date: Wed, 23 Oct 2024 09:17:44 +0200
Subject: [PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction
If more than 255 colocated APs exist for the set of all
APs found during 2.4/5 GHz scanning, then the 6 GHz scan
construction will loop forever since the loop variable
has type u8, which can never reach the number found when
that's bigger than 255, and is stored in a u32 variable.
Also move it into the loops to have a smaller scope.
Using a u32 there is fine, we limit the number of APs in
the scan list and each has a limit on the number of RNR
entries due to the frame size. With a limit of 1000 scan
results, a frame size upper bound of 4096 (really it's
more like ~2300) and a TBTT entry size of at least 11,
we get an upper bound for the number of ~372k, well in
the bounds of a u32.
Cc: stable(a)vger.kernel.org
Fixes: eae94cf82d74 ("iwlwifi: mvm: add support for 6GHz")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219375
Link: https://patch.msgid.link/20241023091744.f4baed5c08a1.I8b417148bbc8c5d11c101…
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
index 3ce9150213a7..ddcbd80a49fb 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
@@ -1774,7 +1774,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm,
&cp->channel_config[ch_cnt];
u32 s_ssid_bitmap = 0, bssid_bitmap = 0, flags = 0;
- u8 j, k, n_s_ssids = 0, n_bssids = 0;
+ u8 k, n_s_ssids = 0, n_bssids = 0;
u8 max_s_ssids, max_bssids;
bool force_passive = false, found = false, allow_passive = true,
unsolicited_probe_on_chan = false, psc_no_listen = false;
@@ -1799,7 +1799,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm,
cfg->v5.iter_count = 1;
cfg->v5.iter_interval = 0;
- for (j = 0; j < params->n_6ghz_params; j++) {
+ for (u32 j = 0; j < params->n_6ghz_params; j++) {
s8 tmp_psd_20;
if (!(scan_6ghz_params[j].channel_idx == i))
@@ -1873,7 +1873,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm,
* SSID.
* TODO: improve this logic
*/
- for (j = 0; j < params->n_6ghz_params; j++) {
+ for (u32 j = 0; j < params->n_6ghz_params; j++) {
if (!(scan_6ghz_params[j].channel_idx == i))
continue;
The patch below does not apply to the 6.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y
git checkout FETCH_HEAD
git cherry-pick -x 33549fcf37ec461f398f0a41e1c9948be2e5aca4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110519-underage-paying-6d38@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 33549fcf37ec461f398f0a41e1c9948be2e5aca4 Mon Sep 17 00:00:00 2001
From: Conor Dooley <conor.dooley(a)microchip.com>
Date: Tue, 1 Oct 2024 12:28:13 +0100
Subject: [PATCH] RISC-V: disallow gcc + rust builds
During the discussion before supporting rust on riscv, it was decided
not to support gcc yet, due to differences in extension handling
compared to llvm (only the version of libclang matching the c compiler
is supported). Recently Jason Montleon reported [1] that building with
gcc caused build issues, due to unsupported arguments being passed to
libclang. After some discussion between myself and Miguel, it is better
to disable gcc + rust builds to match the original intent, and
subsequently support it when an appropriate set of extensions can be
deduced from the version of libclang.
Closes: https://lore.kernel.org/all/20240917000848.720765-2-jmontleo@redhat.com/ [1]
Link: https://lore.kernel.org/all/20240926-battering-revolt-6c6a7827413e@spud/ [2]
Fixes: 70a57b247251a ("RISC-V: enable building 64-bit kernels with rust support")
Cc: stable(a)vger.kernel.org
Reported-by: Jason Montleon <jmontleo(a)redhat.com>
Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
Acked-by: Miguel Ojeda <ojeda(a)kernel.org>
Reviewed-by: Nathan Chancellor <nathan(a)kernel.org>
Link: https://lore.kernel.org/r/20241001-playlist-deceiving-16ece2f440f5@spud
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/Documentation/rust/arch-support.rst b/Documentation/rust/arch-support.rst
index 750ff371570a..54be7ddf3e57 100644
--- a/Documentation/rust/arch-support.rst
+++ b/Documentation/rust/arch-support.rst
@@ -17,7 +17,7 @@ Architecture Level of support Constraints
============= ================ ==============================================
``arm64`` Maintained Little Endian only.
``loongarch`` Maintained \-
-``riscv`` Maintained ``riscv64`` only.
+``riscv`` Maintained ``riscv64`` and LLVM/Clang only.
``um`` Maintained \-
``x86`` Maintained ``x86_64`` only.
============= ================ ==============================================
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 62545946ecf4..f4c570538d55 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -177,7 +177,7 @@ config RISCV
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RETHOOK if !XIP_KERNEL
select HAVE_RSEQ
- select HAVE_RUST if RUSTC_SUPPORTS_RISCV
+ select HAVE_RUST if RUSTC_SUPPORTS_RISCV && CC_IS_CLANG
select HAVE_SAMPLE_FTRACE_DIRECT
select HAVE_SAMPLE_FTRACE_DIRECT_MULTI
select HAVE_STACKPROTECTOR
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110533-dexterous-semantic-aa9e@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Fri, 18 Oct 2024 04:33:10 +0900
Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked
flag
Syzbot reported that in directory operations after nilfs2 detects
filesystem corruption and degrades to read-only,
__block_write_begin_int(), which is called to prepare block writes, may
fail the BUG_ON check for accesses exceeding the folio/page size,
triggering a kernel bug.
This was found to be because the "checked" flag of a page/folio was not
cleared when it was discarded by nilfs2's own routine, which causes the
sanity check of directory entries to be skipped when the directory
page/folio is reloaded. So, fix that.
This was necessary when the use of nilfs2's own page discard routine was
applied to more than just metadata files.
Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 5436eb0424bd..10def4b55995 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio)
folio_clear_uptodate(folio);
folio_clear_mappedtodisk(folio);
+ folio_clear_checked(folio);
head = folio_buffers(folio);
if (head) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110532-veal-argue-331b@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Fri, 18 Oct 2024 04:33:10 +0900
Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked
flag
Syzbot reported that in directory operations after nilfs2 detects
filesystem corruption and degrades to read-only,
__block_write_begin_int(), which is called to prepare block writes, may
fail the BUG_ON check for accesses exceeding the folio/page size,
triggering a kernel bug.
This was found to be because the "checked" flag of a page/folio was not
cleared when it was discarded by nilfs2's own routine, which causes the
sanity check of directory entries to be skipped when the directory
page/folio is reloaded. So, fix that.
This was necessary when the use of nilfs2's own page discard routine was
applied to more than just metadata files.
Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 5436eb0424bd..10def4b55995 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio)
folio_clear_uptodate(folio);
folio_clear_mappedtodisk(folio);
+ folio_clear_checked(folio);
head = folio_buffers(folio);
if (head) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110531-wound-perkiness-ed39@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Fri, 18 Oct 2024 04:33:10 +0900
Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked
flag
Syzbot reported that in directory operations after nilfs2 detects
filesystem corruption and degrades to read-only,
__block_write_begin_int(), which is called to prepare block writes, may
fail the BUG_ON check for accesses exceeding the folio/page size,
triggering a kernel bug.
This was found to be because the "checked" flag of a page/folio was not
cleared when it was discarded by nilfs2's own routine, which causes the
sanity check of directory entries to be skipped when the directory
page/folio is reloaded. So, fix that.
This was necessary when the use of nilfs2's own page discard routine was
applied to more than just metadata files.
Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 5436eb0424bd..10def4b55995 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio)
folio_clear_uptodate(folio);
folio_clear_mappedtodisk(folio);
+ folio_clear_checked(folio);
head = folio_buffers(folio);
if (head) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110530-diligence-author-75bb@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Fri, 18 Oct 2024 04:33:10 +0900
Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked
flag
Syzbot reported that in directory operations after nilfs2 detects
filesystem corruption and degrades to read-only,
__block_write_begin_int(), which is called to prepare block writes, may
fail the BUG_ON check for accesses exceeding the folio/page size,
triggering a kernel bug.
This was found to be because the "checked" flag of a page/folio was not
cleared when it was discarded by nilfs2's own routine, which causes the
sanity check of directory entries to be skipped when the directory
page/folio is reloaded. So, fix that.
This was necessary when the use of nilfs2's own page discard routine was
applied to more than just metadata files.
Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 5436eb0424bd..10def4b55995 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio)
folio_clear_uptodate(folio);
folio_clear_mappedtodisk(folio);
+ folio_clear_checked(folio);
head = folio_buffers(folio);
if (head) {
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110530-elastic-reproduce-14c4@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Fri, 18 Oct 2024 04:33:10 +0900
Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked
flag
Syzbot reported that in directory operations after nilfs2 detects
filesystem corruption and degrades to read-only,
__block_write_begin_int(), which is called to prepare block writes, may
fail the BUG_ON check for accesses exceeding the folio/page size,
triggering a kernel bug.
This was found to be because the "checked" flag of a page/folio was not
cleared when it was discarded by nilfs2's own routine, which causes the
sanity check of directory entries to be skipped when the directory
page/folio is reloaded. So, fix that.
This was necessary when the use of nilfs2's own page discard routine was
applied to more than just metadata files.
Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 5436eb0424bd..10def4b55995 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio)
folio_clear_uptodate(folio);
folio_clear_mappedtodisk(folio);
+ folio_clear_checked(folio);
head = folio_buffers(folio);
if (head) {
The patch below does not apply to the 6.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y
git checkout FETCH_HEAD
git cherry-pick -x 05f9c67179c9a8d66dee175fb4b17f380908a26f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110504-eloquent-stalling-91df@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 05f9c67179c9a8d66dee175fb4b17f380908a26f Mon Sep 17 00:00:00 2001
From: Julien Stephan <jstephan(a)baylibre.com>
Date: Tue, 22 Oct 2024 15:22:39 +0200
Subject: [PATCH] iio: adc: ad7380: fix supplies for ad7380-4
ad7380-4 is the only device in the family that does not have an internal
reference. It uses "refin" as a required external reference.
All other devices in the family use "refio"" as an optional external
reference.
Fixes: 737413da8704 ("iio: adc: ad7380: add support for ad738x-4 4 channels variants")
Reviewed-by: Nuno Sa <nuno.sa(a)analog.com>
Reviewed-by: David Lechner <dlechner(a)baylibre.com>
Signed-off-by: Julien Stephan <jstephan(a)baylibre.com>
Link: https://patch.msgid.link/20241022-ad7380-fix-supplies-v3-4-f0cefe1b7fa6@bay…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/adc/ad7380.c b/drivers/iio/adc/ad7380.c
index b107d8e97ab3..fb728570debe 100644
--- a/drivers/iio/adc/ad7380.c
+++ b/drivers/iio/adc/ad7380.c
@@ -89,6 +89,7 @@ struct ad7380_chip_info {
bool has_mux;
const char * const *supplies;
unsigned int num_supplies;
+ bool external_ref_only;
const char * const *vcm_supplies;
unsigned int num_vcm_supplies;
const unsigned long *available_scan_masks;
@@ -431,6 +432,7 @@ static const struct ad7380_chip_info ad7380_4_chip_info = {
.num_simult_channels = 4,
.supplies = ad7380_supplies,
.num_supplies = ARRAY_SIZE(ad7380_supplies),
+ .external_ref_only = true,
.available_scan_masks = ad7380_4_channel_scan_masks,
.timing_specs = &ad7380_4_timing,
};
@@ -1047,17 +1049,31 @@ static int ad7380_probe(struct spi_device *spi)
"Failed to enable power supplies\n");
fsleep(T_POWERUP_US);
- /*
- * If there is no REFIO supply, then it means that we are using
- * the internal 2.5V reference, otherwise REFIO is reference voltage.
- */
- ret = devm_regulator_get_enable_read_voltage(&spi->dev, "refio");
- if (ret < 0 && ret != -ENODEV)
- return dev_err_probe(&spi->dev, ret,
- "Failed to get refio regulator\n");
+ if (st->chip_info->external_ref_only) {
+ ret = devm_regulator_get_enable_read_voltage(&spi->dev,
+ "refin");
+ if (ret < 0)
+ return dev_err_probe(&spi->dev, ret,
+ "Failed to get refin regulator\n");
- external_ref_en = ret != -ENODEV;
- st->vref_mv = external_ref_en ? ret / 1000 : AD7380_INTERNAL_REF_MV;
+ st->vref_mv = ret / 1000;
+
+ /* these chips don't have a register bit for this */
+ external_ref_en = false;
+ } else {
+ /*
+ * If there is no REFIO supply, then it means that we are using
+ * the internal reference, otherwise REFIO is reference voltage.
+ */
+ ret = devm_regulator_get_enable_read_voltage(&spi->dev,
+ "refio");
+ if (ret < 0 && ret != -ENODEV)
+ return dev_err_probe(&spi->dev, ret,
+ "Failed to get refio regulator\n");
+
+ external_ref_en = ret != -ENODEV;
+ st->vref_mv = external_ref_en ? ret / 1000 : AD7380_INTERNAL_REF_MV;
+ }
if (st->chip_info->num_vcm_supplies > ARRAY_SIZE(st->vcm_mv))
return dev_err_probe(&spi->dev, -EINVAL,
The commit 407d1a51921e ("PCI: Create device tree node for bridge")
creates of_node for PCI devices. The newly created of_node is attached
to an existing device. This is done setting directly pdev->dev.of_node
in the code.
Even if pdev->dev.of_node cannot be previously set, this doesn't handle
the fwnode field of the struct device. Indeed, this field needs to be
set if it hasn't already been set.
device_{add,remove}_of_node() have been introduced to handle this case.
Use them instead of the direct setting.
Fixes: 407d1a51921e ("PCI: Create device tree node for bridge")
Cc: stable(a)vger.kernel.org
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
---
drivers/pci/of.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/of.c b/drivers/pci/of.c
index dacea3fc5128..141ffbb1b3e6 100644
--- a/drivers/pci/of.c
+++ b/drivers/pci/of.c
@@ -655,8 +655,8 @@ void of_pci_remove_node(struct pci_dev *pdev)
np = pci_device_to_OF_node(pdev);
if (!np || !of_node_check_flag(np, OF_DYNAMIC))
return;
- pdev->dev.of_node = NULL;
+ device_remove_of_node(&pdev->dev);
of_changeset_revert(np->data);
of_changeset_destroy(np->data);
of_node_put(np);
@@ -713,7 +713,7 @@ void of_pci_make_dev_node(struct pci_dev *pdev)
goto out_free_node;
np->data = cset;
- pdev->dev.of_node = np;
+ device_add_of_node(&pdev->dev, np);
kfree(name);
return;
--
2.46.2
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 6bd301819f8f69331a55ae2336c8b111fc933f3d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110556-record-unscrew-f7e1@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6bd301819f8f69331a55ae2336c8b111fc933f3d Mon Sep 17 00:00:00 2001
From: Zicheng Qu <quzicheng(a)huawei.com>
Date: Tue, 22 Oct 2024 13:43:54 +0000
Subject: [PATCH] staging: iio: frequency: ad9832: fix division by zero in
ad9832_calc_freqreg()
In the ad9832_write_frequency() function, clk_get_rate() might return 0.
This can lead to a division by zero when calling ad9832_calc_freqreg().
The check if (fout > (clk_get_rate(st->mclk) / 2)) does not protect
against the case when fout is 0. The ad9832_write_frequency() function
is called from ad9832_write(), and fout is derived from a text buffer,
which can contain any value.
Link: https://lore.kernel.org/all/2024100904-CVE-2024-47663-9bdc@gregkh/
Fixes: ea707584bac1 ("Staging: IIO: DDS: AD9832 / AD9835 driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zicheng Qu <quzicheng(a)huawei.com>
Reviewed-by: Nuno Sa <nuno.sa(a)analog.com>
Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Link: https://patch.msgid.link/20241022134354.574614-1-quzicheng@huawei.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/staging/iio/frequency/ad9832.c b/drivers/staging/iio/frequency/ad9832.c
index 6c390c4eb26d..492612e8f8ba 100644
--- a/drivers/staging/iio/frequency/ad9832.c
+++ b/drivers/staging/iio/frequency/ad9832.c
@@ -129,12 +129,15 @@ static unsigned long ad9832_calc_freqreg(unsigned long mclk, unsigned long fout)
static int ad9832_write_frequency(struct ad9832_state *st,
unsigned int addr, unsigned long fout)
{
+ unsigned long clk_freq;
unsigned long regval;
- if (fout > (clk_get_rate(st->mclk) / 2))
+ clk_freq = clk_get_rate(st->mclk);
+
+ if (!clk_freq || fout > (clk_freq / 2))
return -EINVAL;
- regval = ad9832_calc_freqreg(clk_get_rate(st->mclk), fout);
+ regval = ad9832_calc_freqreg(clk_freq, fout);
st->freq_data[0] = cpu_to_be16((AD9832_CMD_FRE8BITSW << CMD_SHIFT) |
(addr << ADD_SHIFT) |
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 7245012f0f496162dd95d888ed2ceb5a35170f1a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110522-unshaken-unvisited-f359@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7245012f0f496162dd95d888ed2ceb5a35170f1a Mon Sep 17 00:00:00 2001
From: Johannes Berg <johannes.berg(a)intel.com>
Date: Wed, 23 Oct 2024 09:17:44 +0200
Subject: [PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction
If more than 255 colocated APs exist for the set of all
APs found during 2.4/5 GHz scanning, then the 6 GHz scan
construction will loop forever since the loop variable
has type u8, which can never reach the number found when
that's bigger than 255, and is stored in a u32 variable.
Also move it into the loops to have a smaller scope.
Using a u32 there is fine, we limit the number of APs in
the scan list and each has a limit on the number of RNR
entries due to the frame size. With a limit of 1000 scan
results, a frame size upper bound of 4096 (really it's
more like ~2300) and a TBTT entry size of at least 11,
we get an upper bound for the number of ~372k, well in
the bounds of a u32.
Cc: stable(a)vger.kernel.org
Fixes: eae94cf82d74 ("iwlwifi: mvm: add support for 6GHz")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219375
Link: https://patch.msgid.link/20241023091744.f4baed5c08a1.I8b417148bbc8c5d11c101…
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
index 3ce9150213a7..ddcbd80a49fb 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
@@ -1774,7 +1774,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm,
&cp->channel_config[ch_cnt];
u32 s_ssid_bitmap = 0, bssid_bitmap = 0, flags = 0;
- u8 j, k, n_s_ssids = 0, n_bssids = 0;
+ u8 k, n_s_ssids = 0, n_bssids = 0;
u8 max_s_ssids, max_bssids;
bool force_passive = false, found = false, allow_passive = true,
unsolicited_probe_on_chan = false, psc_no_listen = false;
@@ -1799,7 +1799,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm,
cfg->v5.iter_count = 1;
cfg->v5.iter_interval = 0;
- for (j = 0; j < params->n_6ghz_params; j++) {
+ for (u32 j = 0; j < params->n_6ghz_params; j++) {
s8 tmp_psd_20;
if (!(scan_6ghz_params[j].channel_idx == i))
@@ -1873,7 +1873,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm,
* SSID.
* TODO: improve this logic
*/
- for (j = 0; j < params->n_6ghz_params; j++) {
+ for (u32 j = 0; j < params->n_6ghz_params; j++) {
if (!(scan_6ghz_params[j].channel_idx == i))
continue;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 7245012f0f496162dd95d888ed2ceb5a35170f1a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110521-compile-luminous-2080@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7245012f0f496162dd95d888ed2ceb5a35170f1a Mon Sep 17 00:00:00 2001
From: Johannes Berg <johannes.berg(a)intel.com>
Date: Wed, 23 Oct 2024 09:17:44 +0200
Subject: [PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction
If more than 255 colocated APs exist for the set of all
APs found during 2.4/5 GHz scanning, then the 6 GHz scan
construction will loop forever since the loop variable
has type u8, which can never reach the number found when
that's bigger than 255, and is stored in a u32 variable.
Also move it into the loops to have a smaller scope.
Using a u32 there is fine, we limit the number of APs in
the scan list and each has a limit on the number of RNR
entries due to the frame size. With a limit of 1000 scan
results, a frame size upper bound of 4096 (really it's
more like ~2300) and a TBTT entry size of at least 11,
we get an upper bound for the number of ~372k, well in
the bounds of a u32.
Cc: stable(a)vger.kernel.org
Fixes: eae94cf82d74 ("iwlwifi: mvm: add support for 6GHz")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219375
Link: https://patch.msgid.link/20241023091744.f4baed5c08a1.I8b417148bbc8c5d11c101…
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
index 3ce9150213a7..ddcbd80a49fb 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
@@ -1774,7 +1774,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm,
&cp->channel_config[ch_cnt];
u32 s_ssid_bitmap = 0, bssid_bitmap = 0, flags = 0;
- u8 j, k, n_s_ssids = 0, n_bssids = 0;
+ u8 k, n_s_ssids = 0, n_bssids = 0;
u8 max_s_ssids, max_bssids;
bool force_passive = false, found = false, allow_passive = true,
unsolicited_probe_on_chan = false, psc_no_listen = false;
@@ -1799,7 +1799,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm,
cfg->v5.iter_count = 1;
cfg->v5.iter_interval = 0;
- for (j = 0; j < params->n_6ghz_params; j++) {
+ for (u32 j = 0; j < params->n_6ghz_params; j++) {
s8 tmp_psd_20;
if (!(scan_6ghz_params[j].channel_idx == i))
@@ -1873,7 +1873,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm,
* SSID.
* TODO: improve this logic
*/
- for (j = 0; j < params->n_6ghz_params; j++) {
+ for (u32 j = 0; j < params->n_6ghz_params; j++) {
if (!(scan_6ghz_params[j].channel_idx == i))
continue;
This patch fixes an issue in the function xenbus_dev_probe(). In the
xenbus_dev_probe() function, within the if (err) branch at line 313, the
program incorrectly returns err directly without releasing the resources
allocated by err = drv->probe(dev, id). As the return value is non-zero,
the upper layers assume the processing logic has failed. However, the probe
operation was performed earlier without a corresponding remove operation.
Since the probe actually allocates resources, failing to perform the remove
operation could lead to problems.
To fix this issue, we followed the resource release logic of the
xenbus_dev_remove() function by adding a new block fail_remove before the
fail_put block. After entering the branch if (err) at line 313, the
function will use a goto statement to jump to the fail_remove block,
ensuring that the previously acquired resources are correctly released,
thus preventing the reference count leak.
This bug was identified by an experimental static analysis tool developed
by our team. The tool specializes in analyzing reference count operations
and detecting potential issues where resources are not properly managed.
In this case, the tool flagged the missing release operation as a
potential problem, which led to the development of this patch.
Fixes: 4bac07c993d0 ("xen: add the Xenbus sysfs and virtual device hotplug driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/xen/xenbus/xenbus_probe.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 9f097f1f4a4c..6d32ffb01136 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -313,7 +313,7 @@ int xenbus_dev_probe(struct device *_dev)
if (err) {
dev_warn(&dev->dev, "watch_otherend on %s failed.\n",
dev->nodename);
- return err;
+ goto fail_remove;
}
dev->spurious_threshold = 1;
@@ -322,6 +322,12 @@ int xenbus_dev_probe(struct device *_dev)
dev->nodename);
return 0;
+fail_remove:
+ if (drv->remove) {
+ down(&dev->reclaim_sem);
+ drv->remove(dev);
+ up(&dev->reclaim_sem);
+ }
fail_put:
module_put(drv->driver.owner);
fail:
--
2.34.1
We intend that signal handlers are entered with PSTATE.{SM,ZA}={0,0}.
The logic for this in setup_return() manipulates the saved state and
live CPU state in an unsafe manner, and consequently, when a task enters
a signal handler:
* The task entering the signal handler might not have its PSTATE.{SM,ZA}
bits cleared, and other register state that is affected by changes to
PSTATE.{SM,ZA} might not be zeroed as expected.
* An unrelated task might have its PSTATE.{SM,ZA} bits cleared
unexpectedly, potentially zeroing other register state that is
affected by changes to PSTATE.{SM,ZA}.
Tasks which do not set PSTATE.{SM,ZA} (i.e. those only using plain
FPSIMD or non-streaming SVE) are not affected, as there is no
resulting change to PSTATE.{SM,ZA}.
Consider for example two tasks on one CPU:
A: Begins signal entry in kernel mode, is preempted prior to SMSTOP.
B: Using SM and/or ZA in userspace with register state current on the
CPU, is preempted.
A: Scheduled in, no register state changes made as in kernel mode.
A: Executes SMSTOP, modifying live register state.
A: Scheduled out.
B: Scheduled in, fpsimd_thread_switch() sees the register state on the
CPU is tracked as being that for task B so the state is not reloaded
prior to returning to userspace.
Task B is now running with SM and ZA incorrectly cleared.
Fix this by:
* Checking TIF_FOREIGN_FPSTATE, and only updating the saved or live
state as appropriate.
* Using {get,put}_cpu_fpsimd_context() to ensure mutual exclusion
against other code which manipulates this state. To allow their use,
the logic is moved into a new fpsimd_enter_sighandler() helper in
fpsimd.c.
This race has been observed intermittently with fp-stress, especially
with preempt disabled, commonly but not exclusively reporting "Bad SVCR: 0".
Fixes: 40a8e87bb3285 ("arm64/sme: Disable ZA and streaming mode when handling signals")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
Changes in v2:
- Commit message tweaks.
- Flush the task state when updating in memory to ensure we reload.
- Link to v1: https://lore.kernel.org/r/20241023-arm64-fp-sme-sigentry-v1-1-249ff7ec3ad0@…
---
arch/arm64/include/asm/fpsimd.h | 1 +
arch/arm64/kernel/fpsimd.c | 33 +++++++++++++++++++++++++++++++++
arch/arm64/kernel/signal.c | 19 +------------------
3 files changed, 35 insertions(+), 18 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index f2a84efc361858d4deda99faf1967cc7cac386c1..09af7cfd9f6c2cec26332caa4c254976e117b1bf 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -76,6 +76,7 @@ extern void fpsimd_load_state(struct user_fpsimd_state *state);
extern void fpsimd_thread_switch(struct task_struct *next);
extern void fpsimd_flush_thread(void);
+extern void fpsimd_enter_sighandler(void);
extern void fpsimd_signal_preserve_current_state(void);
extern void fpsimd_preserve_current_state(void);
extern void fpsimd_restore_current_state(void);
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 77006df20a75aee7c991cf116b6d06bfe953d1a4..c4149f474ce889af42bc2ce9402e7d032818c2e4 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1693,6 +1693,39 @@ void fpsimd_signal_preserve_current_state(void)
sve_to_fpsimd(current);
}
+/*
+ * Called by the signal handling code when preparing current to enter
+ * a signal handler. Currently this only needs to take care of exiting
+ * streaming mode and clearing ZA on SME systems.
+ */
+void fpsimd_enter_sighandler(void)
+{
+ if (!system_supports_sme())
+ return;
+
+ get_cpu_fpsimd_context();
+
+ if (test_thread_flag(TIF_FOREIGN_FPSTATE)) {
+ /* Exiting streaming mode zeros the FPSIMD state */
+ if (current->thread.svcr & SVCR_SM_MASK) {
+ memset(¤t->thread.uw.fpsimd_state, 0,
+ sizeof(current->thread.uw.fpsimd_state));
+ current->thread.fp_type = FP_STATE_FPSIMD;
+ }
+
+ current->thread.svcr &= ~(SVCR_ZA_MASK |
+ SVCR_SM_MASK);
+
+ /* Ensure any copies on other CPUs aren't reused */
+ fpsimd_flush_task_state(current);
+ } else {
+ /* The register state is current, just update it. */
+ sme_smstop();
+ }
+
+ put_cpu_fpsimd_context();
+}
+
/*
* Called by KVM when entering the guest.
*/
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 5619869475304776fc005fe24a385bf86bfdd253..fe07d0bd9f7978d73973f07ce38b7bdd7914abb2 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -1218,24 +1218,7 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
/* TCO (Tag Check Override) always cleared for signal handlers */
regs->pstate &= ~PSR_TCO_BIT;
- /* Signal handlers are invoked with ZA and streaming mode disabled */
- if (system_supports_sme()) {
- /*
- * If we were in streaming mode the saved register
- * state was SVE but we will exit SM and use the
- * FPSIMD register state - flush the saved FPSIMD
- * register state in case it gets loaded.
- */
- if (current->thread.svcr & SVCR_SM_MASK) {
- memset(¤t->thread.uw.fpsimd_state, 0,
- sizeof(current->thread.uw.fpsimd_state));
- current->thread.fp_type = FP_STATE_FPSIMD;
- }
-
- current->thread.svcr &= ~(SVCR_ZA_MASK |
- SVCR_SM_MASK);
- sme_smstop();
- }
+ fpsimd_enter_sighandler();
if (system_supports_poe())
write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
---
base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
change-id: 20241023-arm64-fp-sme-sigentry-a2bd7187e71b
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Following series is a backport of CVE-2024-47674 fix "mm: avoid leaving
partial pfn mappings around in error case" to 5.10.
This required an extra commit "mm: add remap_pfn_range_notrack" to make
both picks clean. The patchset shows no regression compared to 5.10.228
tag.
Christoph Hellwig (1):
mm: add remap_pfn_range_notrack
Linus Torvalds (1):
mm: avoid leaving partial pfn mappings around in error case
include/linux/mm.h | 2 ++
mm/memory.c | 70 ++++++++++++++++++++++++++++++++--------------
2 files changed, 51 insertions(+), 21 deletions(-)
--
2.46.0
A kernel test robot detected a missing error code:
qmc.c:1942 qmc_probe() warn: missing error code 'ret'
Indeed, the error returned by platform_get_irq() is checked and the
operation is aborted in case of failure but the ret error code is
not set in that case.
Set the ret error code.
Reported-by: kernel test robot <lkp(a)intel.com>
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Closes: https://lore.kernel.org/r/202411051350.KNy6ZIWA-lkp@intel.com/
Fixes: 3178d58e0b97 ("soc: fsl: cpm1: Add support for QMC")
Cc: stable(a)vger.kernel.org
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
---
drivers/soc/fsl/qe/qmc.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/soc/fsl/qe/qmc.c b/drivers/soc/fsl/qe/qmc.c
index 19cc581b06d0..b3f773e135fd 100644
--- a/drivers/soc/fsl/qe/qmc.c
+++ b/drivers/soc/fsl/qe/qmc.c
@@ -2004,8 +2004,10 @@ static int qmc_probe(struct platform_device *pdev)
/* Set the irq handler */
irq = platform_get_irq(pdev, 0);
- if (irq < 0)
+ if (irq < 0) {
+ ret = irq;
goto err_exit_xcc;
+ }
ret = devm_request_irq(qmc->dev, irq, qmc_irq_handler, 0, "qmc", qmc);
if (ret < 0)
goto err_exit_xcc;
--
2.46.2
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Changes in v4: Thanks Laurent.
- Remove refcounted cleaup to support devres.
- Link to v3: https://lore.kernel.org/r/20241105-uvc-crashrmmod-v3-1-c0959c8906d3@chromiu…
Changes in v3: Thanks Sakari.
- Rename variable to initialized.
- Other CodeStyle.
- Link to v2: https://lore.kernel.org/r/20241105-uvc-crashrmmod-v2-1-547ce6a6962e@chromiu…
Changes in v2: Thanks to Laurent.
- The main structure is not allocated with devres so there is a small
period of time where we can get an irq with the structure free. Do not
use devres for the IRQ.
- Link to v1: https://lore.kernel.org/r/20241031-uvc-crashrmmod-v1-1-059fe593b1e6@chromiu…
---
Ricardo Ribalda (2):
media: uvcvideo: Remove refcounted cleanup
media: uvcvideo: Fix crash during unbind if gpio unit is in use
drivers/media/usb/uvc/uvc_driver.c | 30 ++++++++----------------------
drivers/media/usb/uvc/uvcvideo.h | 1 -
2 files changed, 8 insertions(+), 23 deletions(-)
---
base-commit: c7ccf3683ac9746b263b0502255f5ce47f64fe0a
change-id: 20241031-uvc-crashrmmod-666de3fc9141
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
We used the wrong device for the device managed functions. We used the
usb device, when we should be using the interface device.
If we unbind the driver from the usb interface, the cleanup functions
are never called. In our case, the IRQ is never disabled.
If an IRQ is triggered, it will try to access memory sections that are
already free, causing an OOPS.
Luckily this bug has small impact, as it is only affected by devices
with gpio units and the user has to unbind the device, a disconnect will
not trigger this error.
Cc: stable(a)vger.kernel.org
Fixes: 2886477ff987 ("media: uvcvideo: Implement UVC_EXT_GPIO_UNIT")
Reviewed-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Changes in v2: Thanks to Laurent.
- The main structure is not allocated with devres so there is a small
period of time where we can get an irq with the structure free. Do not
use devres for the IRQ.
- Link to v1: https://lore.kernel.org/r/20241031-uvc-crashrmmod-v1-1-059fe593b1e6@chromiu…
---
drivers/media/usb/uvc/uvc_driver.c | 28 +++++++++++++++++++++-------
drivers/media/usb/uvc/uvcvideo.h | 1 +
2 files changed, 22 insertions(+), 7 deletions(-)
diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c
index a96f6ca0889f..af6aec27083c 100644
--- a/drivers/media/usb/uvc/uvc_driver.c
+++ b/drivers/media/usb/uvc/uvc_driver.c
@@ -1295,14 +1295,14 @@ static int uvc_gpio_parse(struct uvc_device *dev)
struct gpio_desc *gpio_privacy;
int irq;
- gpio_privacy = devm_gpiod_get_optional(&dev->udev->dev, "privacy",
+ gpio_privacy = devm_gpiod_get_optional(&dev->intf->dev, "privacy",
GPIOD_IN);
if (IS_ERR_OR_NULL(gpio_privacy))
return PTR_ERR_OR_ZERO(gpio_privacy);
irq = gpiod_to_irq(gpio_privacy);
if (irq < 0)
- return dev_err_probe(&dev->udev->dev, irq,
+ return dev_err_probe(&dev->intf->dev, irq,
"No IRQ for privacy GPIO\n");
unit = uvc_alloc_new_entity(dev, UVC_EXT_GPIO_UNIT,
@@ -1329,15 +1329,28 @@ static int uvc_gpio_parse(struct uvc_device *dev)
static int uvc_gpio_init_irq(struct uvc_device *dev)
{
struct uvc_entity *unit = dev->gpio_unit;
+ int ret;
if (!unit || unit->gpio.irq < 0)
return 0;
- return devm_request_threaded_irq(&dev->udev->dev, unit->gpio.irq, NULL,
- uvc_gpio_irq,
- IRQF_ONESHOT | IRQF_TRIGGER_FALLING |
- IRQF_TRIGGER_RISING,
- "uvc_privacy_gpio", dev);
+ ret = request_threaded_irq(unit->gpio.irq, NULL, uvc_gpio_irq,
+ IRQF_ONESHOT | IRQF_TRIGGER_FALLING |
+ IRQF_TRIGGER_RISING,
+ "uvc_privacy_gpio", dev);
+
+ if (!ret)
+ dev->gpio_unit->gpio.inited = true;
+
+ return ret;
+}
+
+static void uvc_gpio_cleanup(struct uvc_device *dev)
+{
+ if (!dev->gpio_unit || !dev->gpio_unit->gpio.inited)
+ return;
+
+ free_irq(dev->gpio_unit->gpio.irq, dev);
}
/* ------------------------------------------------------------------------
@@ -1880,6 +1893,7 @@ static void uvc_delete(struct kref *kref)
struct uvc_device *dev = container_of(kref, struct uvc_device, ref);
struct list_head *p, *n;
+ uvc_gpio_cleanup(dev);
uvc_status_cleanup(dev);
uvc_ctrl_cleanup_device(dev);
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index 07f9921d83f2..376cd670539b 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -234,6 +234,7 @@ struct uvc_entity {
u8 *bmControls;
struct gpio_desc *gpio_privacy;
int irq;
+ bool inited;
} gpio;
};
---
base-commit: c7ccf3683ac9746b263b0502255f5ce47f64fe0a
change-id: 20241031-uvc-crashrmmod-666de3fc9141
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
We used the wrong device for the device managed functions. We used the
usb device, when we should be using the interface device.
If we unbind the driver from the usb interface, the cleanup functions
are never called. In our case, the IRQ is never disabled.
If an IRQ is triggered, it will try to access memory sections that are
already free, causing an OOPS.
Luckily this bug has small impact, as it is only affected by devices
with gpio units and the user has to unbind the device, a disconnect will
not trigger this error.
Cc: stable(a)vger.kernel.org
Fixes: 2886477ff987 ("media: uvcvideo: Implement UVC_EXT_GPIO_UNIT")
Reviewed-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Changes in v3: Thanks Sakari.
- Rename variable to initialized.
- Other CodeStyle.
- Link to v2: https://lore.kernel.org/r/20241105-uvc-crashrmmod-v2-1-547ce6a6962e@chromiu…
Changes in v2: Thanks to Laurent.
- The main structure is not allocated with devres so there is a small
period of time where we can get an irq with the structure free. Do not
use devres for the IRQ.
- Link to v1: https://lore.kernel.org/r/20241031-uvc-crashrmmod-v1-1-059fe593b1e6@chromiu…
---
drivers/media/usb/uvc/uvc_driver.c | 27 ++++++++++++++++++++-------
drivers/media/usb/uvc/uvcvideo.h | 1 +
2 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c
index a96f6ca0889f..2a907e3e3f81 100644
--- a/drivers/media/usb/uvc/uvc_driver.c
+++ b/drivers/media/usb/uvc/uvc_driver.c
@@ -1295,14 +1295,14 @@ static int uvc_gpio_parse(struct uvc_device *dev)
struct gpio_desc *gpio_privacy;
int irq;
- gpio_privacy = devm_gpiod_get_optional(&dev->udev->dev, "privacy",
+ gpio_privacy = devm_gpiod_get_optional(&dev->intf->dev, "privacy",
GPIOD_IN);
if (IS_ERR_OR_NULL(gpio_privacy))
return PTR_ERR_OR_ZERO(gpio_privacy);
irq = gpiod_to_irq(gpio_privacy);
if (irq < 0)
- return dev_err_probe(&dev->udev->dev, irq,
+ return dev_err_probe(&dev->intf->dev, irq,
"No IRQ for privacy GPIO\n");
unit = uvc_alloc_new_entity(dev, UVC_EXT_GPIO_UNIT,
@@ -1329,15 +1329,27 @@ static int uvc_gpio_parse(struct uvc_device *dev)
static int uvc_gpio_init_irq(struct uvc_device *dev)
{
struct uvc_entity *unit = dev->gpio_unit;
+ int ret;
if (!unit || unit->gpio.irq < 0)
return 0;
- return devm_request_threaded_irq(&dev->udev->dev, unit->gpio.irq, NULL,
- uvc_gpio_irq,
- IRQF_ONESHOT | IRQF_TRIGGER_FALLING |
- IRQF_TRIGGER_RISING,
- "uvc_privacy_gpio", dev);
+ ret = request_threaded_irq(unit->gpio.irq, NULL, uvc_gpio_irq,
+ IRQF_ONESHOT | IRQF_TRIGGER_FALLING |
+ IRQF_TRIGGER_RISING,
+ "uvc_privacy_gpio", dev);
+
+ unit->gpio.initialized = !ret;
+
+ return ret;
+}
+
+static void uvc_gpio_cleanup(struct uvc_device *dev)
+{
+ if (!dev->gpio_unit || !dev->gpio_unit->gpio.initialized)
+ return;
+
+ free_irq(dev->gpio_unit->gpio.irq, dev);
}
/* ------------------------------------------------------------------------
@@ -1880,6 +1892,7 @@ static void uvc_delete(struct kref *kref)
struct uvc_device *dev = container_of(kref, struct uvc_device, ref);
struct list_head *p, *n;
+ uvc_gpio_cleanup(dev);
uvc_status_cleanup(dev);
uvc_ctrl_cleanup_device(dev);
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index 07f9921d83f2..965a789ed03e 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -234,6 +234,7 @@ struct uvc_entity {
u8 *bmControls;
struct gpio_desc *gpio_privacy;
int irq;
+ bool initialized;
} gpio;
};
---
base-commit: c7ccf3683ac9746b263b0502255f5ce47f64fe0a
change-id: 20241031-uvc-crashrmmod-666de3fc9141
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
The channels array in the cfg80211_scan_request has a __counted_by
attribute attached to it, which points to the n_channels variable. This
attribute is used in bounds checking, and if it is not set before the
array is filled, then the bounds sanitizer will issue a warning or a
kernel panic if CONFIG_UBSAN_TRAP is set.
This patch sets the size of allocated memory as the initial value for
n_channels. It is updated with the actual number of added elements after
the array is filled.
Fixes: aa4ec06c455d ("wifi: cfg80211: use __counted_by where appropriate")
Cc: stable(a)vger.kernel.org
Signed-off-by: Aleksei Vetrov <vvvvvv(a)google.com>
---
Changes in v2:
- Added Fixes tag and added stable to CC
- Link to v1: https://lore.kernel.org/r/20241028-nl80211_parse_sched_scan-bounds-checker-…
---
net/wireless/nl80211.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index d7d099f7118ab5d5c745905abdea85d246c2b7b2..9b1b9dc5a7eb2a864da7b0212bc6a156b7757a9d 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -9776,6 +9776,7 @@ nl80211_parse_sched_scan(struct wiphy *wiphy, struct wireless_dev *wdev,
request = kzalloc(size, GFP_KERNEL);
if (!request)
return ERR_PTR(-ENOMEM);
+ request->n_channels = n_channels;
if (n_ssids)
request->ssids = (void *)request +
---
base-commit: 81983758430957d9a5cb3333fe324fd70cf63e7e
change-id: 20241028-nl80211_parse_sched_scan-bounds-checker-fix-c5842f41b863
Best regards,
--
Aleksei Vetrov <vvvvvv(a)google.com>
on 2024/11/2 3:22, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> net: hns3: add sync command to sync io-pgtable
>
> to the 6.11-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> net-hns3-add-sync-command-to-sync-io-pgtable.patch
> and it can be found in the queue-6.11 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Hi:
This patch was reverted from netdev,
so, it also need be reverted from stable tree.
I am sorry for that.
reverted link:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=2…
Thanks,
Jijie Shao
>
>
>
> commit 0ea8c71561bc40a678c7bf15e081737e1f2d15e2
> Author: Jian Shen <shenjian15(a)huawei.com>
> Date: Fri Oct 25 17:29:31 2024 +0800
>
> net: hns3: add sync command to sync io-pgtable
>
> [ Upstream commit f2c14899caba76da93ff3fff46b4d5a8f43ce07e ]
>
> To avoid errors in pgtable prefectch, add a sync command to sync
> io-pagtable.
>
> This is a supplement for the previous patch.
> We want all the tx packet can be handled with tx bounce buffer path.
> But it depends on the remain space of the spare buffer, checked by the
> hns3_can_use_tx_bounce(). In most cases, maybe 99.99%, it returns true.
> But once it return false by no available space, the packet will be handled
> with the former path, which will map/unmap the skb buffer.
> Then the driver will face the smmu prefetch risk again.
>
> So add a sync command in this case to avoid smmu prefectch,
> just protects corner scenes.
>
> Fixes: 295ba232a8c3 ("net: hns3: add device version to replace pci revision")
> Signed-off-by: Jian Shen <shenjian15(a)huawei.com>
> Signed-off-by: Peiyang Wang <wangpeiyang1(a)huawei.com>
> Signed-off-by: Jijie Shao <shaojijie(a)huawei.com>
> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> index ac88e301f2211..8760b4e9ade6b 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> @@ -381,6 +381,24 @@ static const struct hns3_rx_ptype hns3_rx_ptype_tbl[] = {
> #define HNS3_INVALID_PTYPE \
> ARRAY_SIZE(hns3_rx_ptype_tbl)
>
> +static void hns3_dma_map_sync(struct device *dev, unsigned long iova)
> +{
> + struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> + struct iommu_iotlb_gather iotlb_gather;
> + size_t granule;
> +
> + if (!domain || !iommu_is_dma_domain(domain))
> + return;
> +
> + granule = 1 << __ffs(domain->pgsize_bitmap);
> + iova = ALIGN_DOWN(iova, granule);
> + iotlb_gather.start = iova;
> + iotlb_gather.end = iova + granule - 1;
> + iotlb_gather.pgsize = granule;
> +
> + iommu_iotlb_sync(domain, &iotlb_gather);
> +}
> +
> static irqreturn_t hns3_irq_handle(int irq, void *vector)
> {
> struct hns3_enet_tqp_vector *tqp_vector = vector;
> @@ -1728,7 +1746,9 @@ static int hns3_map_and_fill_desc(struct hns3_enet_ring *ring, void *priv,
> unsigned int type)
> {
> struct hns3_desc_cb *desc_cb = &ring->desc_cb[ring->next_to_use];
> + struct hnae3_handle *handle = ring->tqp->handle;
> struct device *dev = ring_to_dev(ring);
> + struct hnae3_ae_dev *ae_dev;
> unsigned int size;
> dma_addr_t dma;
>
> @@ -1760,6 +1780,13 @@ static int hns3_map_and_fill_desc(struct hns3_enet_ring *ring, void *priv,
> return -ENOMEM;
> }
>
> + /* Add a SYNC command to sync io-pgtale to avoid errors in pgtable
> + * prefetch
> + */
> + ae_dev = hns3_get_ae_dev(handle);
> + if (ae_dev->dev_version >= HNAE3_DEVICE_VERSION_V3)
> + hns3_dma_map_sync(dev, dma);
> +
> desc_cb->priv = priv;
> desc_cb->length = size;
> desc_cb->dma = dma;
Hi all,
I am having the exact same issue.
linux-lts-6.6.28-1 works, anything above doesnt.
When kernel above linux-lts-6.6.28-1:
- Boltctl does not show anything
- thunderbolt.host_reset=0 had no impact
- triggers following errors:
[ 50.627948] ucsi_acpi USBC000:00: unknown error 0
[ 50.627957] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-5)
Gists:
- https://gist.github.com/ricklahaye/83695df8c8273c30d2403da97a353e15 dmesg
with Linux system 6.11.4-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 17 Oct 2024
20:53:41 +0000 x86_64 GNU/Linux where thunderbolt dock does not work
- https://gist.github.com/ricklahaye/79e4040abcd368524633e86addec1833 dmesg
with Linux system 6.6.28-1-lts #1 SMP PREEMPT_DYNAMIC Wed, 17 Apr 2024
10:11:09 +0000 x86_64 GNU/Linux where thunderbolt does work
- https://gist.github.com/ricklahaye/c9a7b4a7eeba5e7900194eecf9fce454
boltctl with Linux system 6.6.28-1-lts #1 SMP PREEMPT_DYNAMIC Wed, 17 Apr
2024 10:11:09 +0000 x86_64 GNU/Linux where thunderbolt does work
Kind regards,
Rick
Ps: sorry for resend; this time with plain text format
This series introduces the camera pipeline support for the
STM32MP25 SOC. The STM32MP25 has 3 pipelines, fed from a
single camera input which can be either parallel or csi.
This series adds the basic support for the 1st pipe (dump)
which, in term of features is same as the one featured on
the STM32MP13 SOC. It focuses on introduction of the
CSI input stage for the DCMIPP, and the CSI specific new
control code for the DCMIPP.
One of the subdev of the DCMIPP, dcmipp_parallel is now
renamed as dcmipp_input since it allows to not only control
the parallel but also the csi interface.
Signed-off-by: Alain Volmat <alain.volmat(a)foss.st.com>
---
Alain Volmat (15):
media: stm32: dcmipp: correct dma_set_mask_and_coherent mask value
dt-bindings: media: add description of stm32 csi
media: stm32: csi: addition of the STM32 CSI driver
media: stm32: dcmipp: use v4l2_subdev_is_streaming
media: stm32: dcmipp: replace s_stream with enable/disable_streams
media: stm32: dcmipp: rename dcmipp_parallel into dcmipp_input
media: stm32: dcmipp: add support for csi input into dcmipp-input
media: stm32: dcmipp: add bayer 10~14 bits formats
media: stm32: dcmipp: add 1X16 RGB / YUV formats support
media: stm32: dcmipp: avoid duplicated format on enum in bytecap
media: stm32: dcmipp: fill media ctl hw_revision field
dt-bindings: media: add the stm32mp25 compatible of DCMIPP
media: stm32: dcmipp: add core support for the stm32mp25
arm64: dts: st: add csi & dcmipp node in stm32mp25
arm64: dts: st: enable imx335/csi/dcmipp pipeline on stm32mp257f-ev1
.../devicetree/bindings/media/st,stm32-dcmipp.yaml | 53 +-
.../bindings/media/st,stm32mp25-csi.yaml | 125 +++
MAINTAINERS | 8 +
arch/arm64/boot/dts/st/stm32mp251.dtsi | 23 +
arch/arm64/boot/dts/st/stm32mp257f-ev1.dts | 85 ++
drivers/media/platform/st/stm32/Kconfig | 14 +
drivers/media/platform/st/stm32/Makefile | 1 +
drivers/media/platform/st/stm32/stm32-csi.c | 1144 ++++++++++++++++++++
.../media/platform/st/stm32/stm32-dcmipp/Makefile | 2 +-
.../st/stm32/stm32-dcmipp/dcmipp-bytecap.c | 128 ++-
.../st/stm32/stm32-dcmipp/dcmipp-byteproc.c | 119 +-
.../platform/st/stm32/stm32-dcmipp/dcmipp-common.h | 4 +-
.../platform/st/stm32/stm32-dcmipp/dcmipp-core.c | 116 +-
.../platform/st/stm32/stm32-dcmipp/dcmipp-input.c | 540 +++++++++
.../st/stm32/stm32-dcmipp/dcmipp-parallel.c | 440 --------
15 files changed, 2226 insertions(+), 576 deletions(-)
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20241007-csi_dcmipp_mp25-7779601f57da
Best regards,
--
Alain Volmat <alain.volmat(a)foss.st.com>
From: Daniel Maslowski <cyrevolt(a)googlemail.com>
[ Upstream commit fb197c5d2fd24b9af3d4697d0cf778645846d6d5 ]
When alignment handling is delegated to the kernel, everything must be
word-aligned in purgatory, since the trap handler is then set to the
kexec one. Without the alignment, hitting the exception would
ultimately crash. On other occasions, the kernel's handler would take
care of exceptions.
This has been tested on a JH7110 SoC with oreboot and its SBI delegating
unaligned access exceptions and the kernel configured to handle them.
Fixes: 736e30af583fb ("RISC-V: Add purgatory")
Signed-off-by: Daniel Maslowski <cyrevolt(a)gmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Link: https://lore.kernel.org/r/20240719170437.247457-1-cyrevolt@gmail.com
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Xiangyu: bp to fix CVE: CVE-2024-43868 ,resolved minor conflicts]
Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com>
---
arch/riscv/purgatory/entry.S | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/riscv/purgatory/entry.S b/arch/riscv/purgatory/entry.S
index 0194f4554130..a4ede42bc151 100644
--- a/arch/riscv/purgatory/entry.S
+++ b/arch/riscv/purgatory/entry.S
@@ -11,6 +11,8 @@
.macro size, sym:req
.size \sym, . - \sym
.endm
+#include <asm/asm.h>
+#include <linux/linkage.h>
.text
@@ -39,6 +41,7 @@ size purgatory_start
.data
+.align LGREG
.globl riscv_kernel_entry
riscv_kernel_entry:
.quad 0
--
2.43.0
Since commit 50ea5449c563 ("can: mcp251xfd: fix ring configuration
when switching from CAN-CC to CAN-FD mode"), the current ring and
coalescing configuration is passed to can_ram_get_layout(). That fixed
the issue when switching between CAN-CC and CAN-FD mode with
configured ring (rx, tx) and/or coalescing parameters (rx-frames-irq,
tx-frames-irq).
However 50ea5449c563 ("can: mcp251xfd: fix ring configuration when
switching from CAN-CC to CAN-FD mode"), introduced a regression when
switching CAN modes with disabled coalescing configuration: Even if
the previous CAN mode has no coalescing configured, the new mode is
configured with active coalescing. This leads to delayed receiving of
CAN-FD frames.
This comes from the fact, that ethtool uses usecs = 0 and max_frames =
1 to disable coalescing, however the driver uses internally
priv->{rx,tx}_obj_num_coalesce_irq = 0 to indicate disabled
coalescing.
Fix the regression by assigning struct ethtool_coalesce
ec->{rx,tx}_max_coalesced_frames_irq = 1 if coalescing is disabled in
the driver as can_ram_get_layout() expects this.
Reported-by: https://github.com/vdh-robothania
Closes: https://github.com/raspberrypi/linux/issues/6407
Fixes: 50ea5449c563 ("can: mcp251xfd: fix ring configuration when switching from CAN-CC to CAN-FD mode")
Cc: stable(a)vger.kernel.org
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://patch.msgid.link/20241025-mcp251xfd-fix-coalesing-v1-1-9d11416de1df…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c
index e684991fa391..7209a831f0f2 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c
@@ -2,7 +2,7 @@
//
// mcp251xfd - Microchip MCP251xFD Family CAN controller driver
//
-// Copyright (c) 2019, 2020, 2021 Pengutronix,
+// Copyright (c) 2019, 2020, 2021, 2024 Pengutronix,
// Marc Kleine-Budde <kernel(a)pengutronix.de>
//
// Based on:
@@ -483,9 +483,11 @@ int mcp251xfd_ring_alloc(struct mcp251xfd_priv *priv)
};
const struct ethtool_coalesce ec = {
.rx_coalesce_usecs_irq = priv->rx_coalesce_usecs_irq,
- .rx_max_coalesced_frames_irq = priv->rx_obj_num_coalesce_irq,
+ .rx_max_coalesced_frames_irq = priv->rx_obj_num_coalesce_irq == 0 ?
+ 1 : priv->rx_obj_num_coalesce_irq,
.tx_coalesce_usecs_irq = priv->tx_coalesce_usecs_irq,
- .tx_max_coalesced_frames_irq = priv->tx_obj_num_coalesce_irq,
+ .tx_max_coalesced_frames_irq = priv->tx_obj_num_coalesce_irq == 0 ?
+ 1 : priv->tx_obj_num_coalesce_irq,
};
struct can_ram_layout layout;
--
2.45.2
From: Mark Brown <broonie(a)kernel.org>
[ Upstream commit 6098475d4cb48d821bdf453c61118c56e26294f0 ]
Currently we have a global spi_add_lock which we take when adding new
devices so that we can check that we're not trying to reuse a chip
select that's already controlled. This means that if the SPI device is
itself a SPI controller and triggers the instantiation of further SPI
devices we trigger a deadlock as we try to register and instantiate
those devices while in the process of doing so for the parent controller
and hence already holding the global spi_add_lock. Since we only care
about concurrency within a single SPI bus move the lock to be per
controller, avoiding the deadlock.
This can be easily triggered in the case of spi-mux.
Reported-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Hardik Gohil <hgohil(a)mvista.com>
---
This fix was not backported to v5.4 and v5.10
please also apply following fix on top of this
spi: fix use-after-free of the add_lock mutex
commit 6c53b45c71b4920b5e62f0ea8079a1da382b9434 upstream.
Commit 6098475d4cb4 ("spi: Fix deadlock when adding SPI controllers on
SPI buses") introduced a per-controller mutex. But mutex_unlock() of
said lock is called after the controller is already freed:
drivers/spi/spi.c | 15 +++++----------
include/linux/spi/spi.h | 3 +++
2 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 0f9410e..58f1947 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -472,12 +472,6 @@ static LIST_HEAD(spi_controller_list);
*/
static DEFINE_MUTEX(board_lock);
-/*
- * Prevents addition of devices with same chip select and
- * addition of devices below an unregistering controller.
- */
-static DEFINE_MUTEX(spi_add_lock);
-
/**
* spi_alloc_device - Allocate a new SPI device
* @ctlr: Controller to which device is connected
@@ -580,7 +574,7 @@ int spi_add_device(struct spi_device *spi)
* chipselect **BEFORE** we call setup(), else we'll trash
* its configuration. Lock against concurrent add() calls.
*/
- mutex_lock(&spi_add_lock);
+ mutex_lock(&ctlr->add_lock);
status = bus_for_each_dev(&spi_bus_type, NULL, spi, spi_dev_check);
if (status) {
@@ -624,7 +618,7 @@ int spi_add_device(struct spi_device *spi)
}
done:
- mutex_unlock(&spi_add_lock);
+ mutex_unlock(&ctlr->add_lock);
return status;
}
EXPORT_SYMBOL_GPL(spi_add_device);
@@ -2512,6 +2506,7 @@ int spi_register_controller(struct spi_controller *ctlr)
spin_lock_init(&ctlr->bus_lock_spinlock);
mutex_init(&ctlr->bus_lock_mutex);
mutex_init(&ctlr->io_mutex);
+ mutex_init(&ctlr->add_lock);
ctlr->bus_lock_flag = 0;
init_completion(&ctlr->xfer_completion);
if (!ctlr->max_dma_len)
@@ -2657,7 +2652,7 @@ void spi_unregister_controller(struct spi_controller *ctlr)
/* Prevent addition of new devices, unregister existing ones */
if (IS_ENABLED(CONFIG_SPI_DYNAMIC))
- mutex_lock(&spi_add_lock);
+ mutex_lock(&ctlr->add_lock);
device_for_each_child(&ctlr->dev, NULL, __unregister);
@@ -2688,7 +2683,7 @@ void spi_unregister_controller(struct spi_controller *ctlr)
mutex_unlock(&board_lock);
if (IS_ENABLED(CONFIG_SPI_DYNAMIC))
- mutex_unlock(&spi_add_lock);
+ mutex_unlock(&ctlr->add_lock);
}
EXPORT_SYMBOL_GPL(spi_unregister_controller);
diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index ca39b33..1b9cb90 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -483,6 +483,9 @@ struct spi_controller {
/* I/O mutex */
struct mutex io_mutex;
+ /* Used to avoid adding the same CS twice */
+ struct mutex add_lock;
+
/* lock and mutex for SPI bus locking */
spinlock_t bus_lock_spinlock;
struct mutex bus_lock_mutex;
--
2.7.4
The patch titled
Subject: mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-gup-avoid-an-unnecessary-allocation-call-for-foll_longterm-cases.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: John Hubbard <jhubbard(a)nvidia.com>
Subject: mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
Date: Mon, 4 Nov 2024 19:29:44 -0800
commit 53ba78de064b ("mm/gup: introduce
check_and_migrate_movable_folios()") created a new constraint on the
pin_user_pages*() API family: a potentially large internal allocation must
now occur, for FOLL_LONGTERM cases.
A user-visible consequence has now appeared: user space can no longer pin
more than 2GB of memory anymore on x86_64. That's because, on a 4KB
PAGE_SIZE system, when user space tries to (indirectly, via a device
driver that calls pin_user_pages()) pin 2GB, this requires an allocation
of a folio pointers array of MAX_PAGE_ORDER size, which is the limit for
kmalloc().
In addition to the directly visible effect described above, there is also
the problem of adding an unnecessary allocation. The **pages array
argument has already been allocated, and there is no need for a redundant
**folios array allocation in this case.
Fix this by avoiding the new allocation entirely. This is done by
referring to either the original page[i] within **pages, or to the
associated folio. Thanks to David Hildenbrand for suggesting this
approach and for providing the initial implementation (which I've tested
and adjusted slightly) as well.
Link: https://lkml.kernel.org/r/20241105032944.141488-2-jhubbard@nvidia.com
Fixes: 53ba78de064b ("mm/gup: introduce check_and_migrate_movable_folios()")
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: Dave Airlie <airlied(a)redhat.com>
Cc: Gerd Hoffmann <kraxel(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Dongwon Kim <dongwon.kim(a)intel.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Junxiao Chang <junxiao.chang(a)intel.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 116 +++++++++++++++++++++++++++++++++++------------------
1 file changed, 77 insertions(+), 39 deletions(-)
--- a/mm/gup.c~mm-gup-avoid-an-unnecessary-allocation-call-for-foll_longterm-cases
+++ a/mm/gup.c
@@ -2273,20 +2273,57 @@ struct page *get_dump_page(unsigned long
#endif /* CONFIG_ELF_CORE */
#ifdef CONFIG_MIGRATION
+
+/*
+ * An array of either pages or folios ("pofs"). Although it may seem tempting to
+ * avoid this complication, by simply interpreting a list of folios as a list of
+ * pages, that approach won't work in the longer term, because eventually the
+ * layouts of struct page and struct folio will become completely different.
+ * Furthermore, this pof approach avoids excessive page_folio() calls.
+ */
+struct pages_or_folios {
+ union {
+ struct page **pages;
+ struct folio **folios;
+ void **entries;
+ };
+ bool has_folios;
+ long nr_entries;
+};
+
+static struct folio *pofs_get_folio(struct pages_or_folios *pofs, long i)
+{
+ if (pofs->has_folios)
+ return pofs->folios[i];
+ return page_folio(pofs->pages[i]);
+}
+
+static void pofs_clear_entry(struct pages_or_folios *pofs, long i)
+{
+ pofs->entries[i] = NULL;
+}
+
+static void pofs_unpin(struct pages_or_folios *pofs)
+{
+ if (pofs->has_folios)
+ unpin_folios(pofs->folios, pofs->nr_entries);
+ else
+ unpin_user_pages(pofs->pages, pofs->nr_entries);
+}
+
/*
* Returns the number of collected folios. Return value is always >= 0.
*/
static unsigned long collect_longterm_unpinnable_folios(
- struct list_head *movable_folio_list,
- unsigned long nr_folios,
- struct folio **folios)
+ struct list_head *movable_folio_list,
+ struct pages_or_folios *pofs)
{
unsigned long i, collected = 0;
struct folio *prev_folio = NULL;
bool drain_allow = true;
- for (i = 0; i < nr_folios; i++) {
- struct folio *folio = folios[i];
+ for (i = 0; i < pofs->nr_entries; i++) {
+ struct folio *folio = pofs_get_folio(pofs, i);
if (folio == prev_folio)
continue;
@@ -2327,16 +2364,15 @@ static unsigned long collect_longterm_un
* Returns -EAGAIN if all folios were successfully migrated or -errno for
* failure (or partial success).
*/
-static int migrate_longterm_unpinnable_folios(
- struct list_head *movable_folio_list,
- unsigned long nr_folios,
- struct folio **folios)
+static int
+migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
+ struct pages_or_folios *pofs)
{
int ret;
unsigned long i;
- for (i = 0; i < nr_folios; i++) {
- struct folio *folio = folios[i];
+ for (i = 0; i < pofs->nr_entries; i++) {
+ struct folio *folio = pofs_get_folio(pofs, i);
if (folio_is_device_coherent(folio)) {
/*
@@ -2344,7 +2380,7 @@ static int migrate_longterm_unpinnable_f
* convert the pin on the source folio to a normal
* reference.
*/
- folios[i] = NULL;
+ pofs_clear_entry(pofs, i);
folio_get(folio);
gup_put_folio(folio, 1, FOLL_PIN);
@@ -2363,8 +2399,8 @@ static int migrate_longterm_unpinnable_f
* calling folio_isolate_lru() which takes a reference so the
* folio won't be freed if it's migrating.
*/
- unpin_folio(folios[i]);
- folios[i] = NULL;
+ unpin_folio(pofs_get_folio(pofs, i));
+ pofs_clear_entry(pofs, i);
}
if (!list_empty(movable_folio_list)) {
@@ -2387,12 +2423,26 @@ static int migrate_longterm_unpinnable_f
return -EAGAIN;
err:
- unpin_folios(folios, nr_folios);
+ pofs_unpin(pofs);
putback_movable_pages(movable_folio_list);
return ret;
}
+static long
+check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
+{
+ LIST_HEAD(movable_folio_list);
+ unsigned long collected;
+
+ collected =
+ collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
+ if (!collected)
+ return 0;
+
+ return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
+}
+
/*
* Check whether all folios are *allowed* to be pinned indefinitely (long term).
* Rather confusingly, all folios in the range are required to be pinned via
@@ -2417,16 +2467,13 @@ err:
static long check_and_migrate_movable_folios(unsigned long nr_folios,
struct folio **folios)
{
- unsigned long collected;
- LIST_HEAD(movable_folio_list);
+ struct pages_or_folios pofs = {
+ .folios = folios,
+ .has_folios = true,
+ .nr_entries = nr_folios,
+ };
- collected = collect_longterm_unpinnable_folios(&movable_folio_list,
- nr_folios, folios);
- if (!collected)
- return 0;
-
- return migrate_longterm_unpinnable_folios(&movable_folio_list,
- nr_folios, folios);
+ return check_and_migrate_movable_pages_or_folios(&pofs);
}
/*
@@ -2436,22 +2483,13 @@ static long check_and_migrate_movable_fo
static long check_and_migrate_movable_pages(unsigned long nr_pages,
struct page **pages)
{
- struct folio **folios;
- long i, ret;
-
- folios = kmalloc_array(nr_pages, sizeof(*folios), GFP_KERNEL);
- if (!folios) {
- unpin_user_pages(pages, nr_pages);
- return -ENOMEM;
- }
-
- for (i = 0; i < nr_pages; i++)
- folios[i] = page_folio(pages[i]);
+ struct pages_or_folios pofs = {
+ .pages = pages,
+ .has_folios = false,
+ .nr_entries = nr_pages,
+ };
- ret = check_and_migrate_movable_folios(nr_pages, folios);
-
- kfree(folios);
- return ret;
+ return check_and_migrate_movable_pages_or_folios(&pofs);
}
#else
static long check_and_migrate_movable_pages(unsigned long nr_pages,
_
Patches currently in -mm which might be from jhubbard(a)nvidia.com are
mm-gup-avoid-an-unnecessary-allocation-call-for-foll_longterm-cases.patch
kaslr-rename-physmem_end-and-physmem_end-to-direct_map_physmem_end.patch
The patch titled
Subject: ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-remove-entry-once-instead-of-null-ptr-dereference-in-ocfs2_xa_remove.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Andrew Kanner <andrew.kanner(a)gmail.com>
Subject: ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()
Date: Sun, 3 Nov 2024 20:38:45 +0100
Syzkaller is able to provoke null-ptr-dereference in ocfs2_xa_remove():
[ 57.319872] (a.out,1161,7):ocfs2_xa_remove:2028 ERROR: status = -12
[ 57.320420] (a.out,1161,7):ocfs2_xa_cleanup_value_truncate:1999 ERROR: Partial truncate while removing xattr overlay.upper. Leaking 1 clusters and removing the entry
[ 57.321727] BUG: kernel NULL pointer dereference, address: 0000000000000004
[...]
[ 57.325727] RIP: 0010:ocfs2_xa_block_wipe_namevalue+0x2a/0xc0
[...]
[ 57.331328] Call Trace:
[ 57.331477] <TASK>
[...]
[ 57.333511] ? do_user_addr_fault+0x3e5/0x740
[ 57.333778] ? exc_page_fault+0x70/0x170
[ 57.334016] ? asm_exc_page_fault+0x2b/0x30
[ 57.334263] ? __pfx_ocfs2_xa_block_wipe_namevalue+0x10/0x10
[ 57.334596] ? ocfs2_xa_block_wipe_namevalue+0x2a/0xc0
[ 57.334913] ocfs2_xa_remove_entry+0x23/0xc0
[ 57.335164] ocfs2_xa_set+0x704/0xcf0
[ 57.335381] ? _raw_spin_unlock+0x1a/0x40
[ 57.335620] ? ocfs2_inode_cache_unlock+0x16/0x20
[ 57.335915] ? trace_preempt_on+0x1e/0x70
[ 57.336153] ? start_this_handle+0x16c/0x500
[ 57.336410] ? preempt_count_sub+0x50/0x80
[ 57.336656] ? _raw_read_unlock+0x20/0x40
[ 57.336906] ? start_this_handle+0x16c/0x500
[ 57.337162] ocfs2_xattr_block_set+0xa6/0x1e0
[ 57.337424] __ocfs2_xattr_set_handle+0x1fd/0x5d0
[ 57.337706] ? ocfs2_start_trans+0x13d/0x290
[ 57.337971] ocfs2_xattr_set+0xb13/0xfb0
[ 57.338207] ? dput+0x46/0x1c0
[ 57.338393] ocfs2_xattr_trusted_set+0x28/0x30
[ 57.338665] ? ocfs2_xattr_trusted_set+0x28/0x30
[ 57.338948] __vfs_removexattr+0x92/0xc0
[ 57.339182] __vfs_removexattr_locked+0xd5/0x190
[ 57.339456] ? preempt_count_sub+0x50/0x80
[ 57.339705] vfs_removexattr+0x5f/0x100
[...]
Reproducer uses faultinject facility to fail ocfs2_xa_remove() ->
ocfs2_xa_value_truncate() with -ENOMEM.
In this case the comment mentions that we can return 0 if
ocfs2_xa_cleanup_value_truncate() is going to wipe the entry
anyway. But the following 'rc' check is wrong and execution flow do
'ocfs2_xa_remove_entry(loc);' twice:
* 1st: in ocfs2_xa_cleanup_value_truncate();
* 2nd: returning back to ocfs2_xa_remove() instead of going to 'out'.
Fix this by skipping the 2nd removal of the same entry and making
syzkaller repro happy.
Link: https://lkml.kernel.org/r/20241103193845.2940988-1-andrew.kanner@gmail.com
Fixes: 399ff3a748cf ("ocfs2: Handle errors while setting external xattr values.")
Signed-off-by: Andrew Kanner <andrew.kanner(a)gmail.com>
Reported-by: syzbot+386ce9e60fa1b18aac5b(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/671e13ab.050a0220.2b8c0f.01d0.GAE@google.com/T/
Tested-by: syzbot+386ce9e60fa1b18aac5b(a)syzkaller.appspotmail.com
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/xattr.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/ocfs2/xattr.c~ocfs2-remove-entry-once-instead-of-null-ptr-dereference-in-ocfs2_xa_remove
+++ a/fs/ocfs2/xattr.c
@@ -2036,8 +2036,7 @@ static int ocfs2_xa_remove(struct ocfs2_
rc = 0;
ocfs2_xa_cleanup_value_truncate(loc, "removing",
orig_clusters);
- if (rc)
- goto out;
+ goto out;
}
}
_
Patches currently in -mm which might be from andrew.kanner(a)gmail.com are
ocfs2-remove-entry-once-instead-of-null-ptr-dereference-in-ocfs2_xa_remove.patch
on 2024/11/2 3:24, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> net: hns3: add sync command to sync io-pgtable
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> net-hns3-add-sync-command-to-sync-io-pgtable.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Hi:
This patch was reverted from netdev,
so, it also need be reverted from stable tree.
I am sorry for that.
reverted link:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=2…
Thanks,
Jijie Shao
>
>
>
> commit 98cb88cce78da8a369cf780343d0280f9d3af4ca
> Author: Jian Shen <shenjian15(a)huawei.com>
> Date: Fri Oct 25 17:29:31 2024 +0800
>
> net: hns3: add sync command to sync io-pgtable
>
> [ Upstream commit f2c14899caba76da93ff3fff46b4d5a8f43ce07e ]
>
> To avoid errors in pgtable prefectch, add a sync command to sync
> io-pagtable.
>
> This is a supplement for the previous patch.
> We want all the tx packet can be handled with tx bounce buffer path.
> But it depends on the remain space of the spare buffer, checked by the
> hns3_can_use_tx_bounce(). In most cases, maybe 99.99%, it returns true.
> But once it return false by no available space, the packet will be handled
> with the former path, which will map/unmap the skb buffer.
> Then the driver will face the smmu prefetch risk again.
>
> So add a sync command in this case to avoid smmu prefectch,
> just protects corner scenes.
>
> Fixes: 295ba232a8c3 ("net: hns3: add device version to replace pci revision")
> Signed-off-by: Jian Shen <shenjian15(a)huawei.com>
> Signed-off-by: Peiyang Wang <wangpeiyang1(a)huawei.com>
> Signed-off-by: Jijie Shao <shaojijie(a)huawei.com>
> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> index 1f9bbf13214fb..bfcebf4e235ef 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> @@ -381,6 +381,24 @@ static const struct hns3_rx_ptype hns3_rx_ptype_tbl[] = {
> #define HNS3_INVALID_PTYPE \
> ARRAY_SIZE(hns3_rx_ptype_tbl)
>
> +static void hns3_dma_map_sync(struct device *dev, unsigned long iova)
> +{
> + struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> + struct iommu_iotlb_gather iotlb_gather;
> + size_t granule;
> +
> + if (!domain || !iommu_is_dma_domain(domain))
> + return;
> +
> + granule = 1 << __ffs(domain->pgsize_bitmap);
> + iova = ALIGN_DOWN(iova, granule);
> + iotlb_gather.start = iova;
> + iotlb_gather.end = iova + granule - 1;
> + iotlb_gather.pgsize = granule;
> +
> + iommu_iotlb_sync(domain, &iotlb_gather);
> +}
> +
> static irqreturn_t hns3_irq_handle(int irq, void *vector)
> {
> struct hns3_enet_tqp_vector *tqp_vector = vector;
> @@ -1728,7 +1746,9 @@ static int hns3_map_and_fill_desc(struct hns3_enet_ring *ring, void *priv,
> unsigned int type)
> {
> struct hns3_desc_cb *desc_cb = &ring->desc_cb[ring->next_to_use];
> + struct hnae3_handle *handle = ring->tqp->handle;
> struct device *dev = ring_to_dev(ring);
> + struct hnae3_ae_dev *ae_dev;
> unsigned int size;
> dma_addr_t dma;
>
> @@ -1760,6 +1780,13 @@ static int hns3_map_and_fill_desc(struct hns3_enet_ring *ring, void *priv,
> return -ENOMEM;
> }
>
> + /* Add a SYNC command to sync io-pgtale to avoid errors in pgtable
> + * prefetch
> + */
> + ae_dev = hns3_get_ae_dev(handle);
> + if (ae_dev->dev_version >= HNAE3_DEVICE_VERSION_V3)
> + hns3_dma_map_sync(dev, dma);
> +
> desc_cb->priv = priv;
> desc_cb->length = size;
> desc_cb->dma = dma;
From: Richard Zhu <hongxing.zhu(a)nxp.com>
When enable initcall_debug together with higher debug level below.
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=9
CONFIG_CONSOLE_LOGLEVEL_QUIET=9
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=7
The initialization of i.MX8MP PCIe PHY might be timeout failed randomly.
To fix this issue, adjust the sequence of the resets refer to the power
up sequence listed below.
i.MX8MP PCIe PHY power up sequence:
/---------------------------------------------
1.8v supply ---------/
/---------------------------------------------------
0.8v supply ---/
---\ /--------------------------------------------------
X REFCLK Valid
Reference Clock ---/ \--------------------------------------------------
-------------------------------------------
|
i_init_restn --------------
------------------------------------
|
i_cmn_rstn ---------------------
-------------------------------
|
o_pll_lock_done --------------------------
Logs:
imx6q-pcie 33800000.pcie: host bridge /soc@0/pcie@33800000 ranges:
imx6q-pcie 33800000.pcie: IO 0x001ff80000..0x001ff8ffff -> 0x0000000000
imx6q-pcie 33800000.pcie: MEM 0x0018000000..0x001fefffff -> 0x0018000000
probe of clk_imx8mp_audiomix.reset.0 returned 0 after 1052 usecs
probe of 30e20000.clock-controller returned 0 after 32971 usecs
phy phy-32f00000.pcie-phy.4: phy poweron failed --> -110
probe of 30e10000.dma-controller returned 0 after 10235 usecs
imx6q-pcie 33800000.pcie: waiting for PHY ready timeout!
dwhdmi-imx 32fd8000.hdmi: Detected HDMI TX controller v2.13a with HDCP (samsung_dw_hdmi_phy2)
imx6q-pcie 33800000.pcie: probe with driver imx6q-pcie failed with error -110
Fixes: dce9edff16ee ("phy: freescale: imx8m-pcie: Add i.MX8MP PCIe PHY support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Richard Zhu <hongxing.zhu(a)nxp.com>
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
v2 changes:
- Rebase to latest fixes branch of linux-phy git repo.
- Richard's environment have problem and can't sent out patch. So I help
post this fix patch.
---
drivers/phy/freescale/phy-fsl-imx8m-pcie.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/phy/freescale/phy-fsl-imx8m-pcie.c b/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
index 11fcb1867118c..e98361dcdeadf 100644
--- a/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
+++ b/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
@@ -141,11 +141,6 @@ static int imx8_pcie_phy_power_on(struct phy *phy)
IMX8MM_GPR_PCIE_REF_CLK_PLL);
usleep_range(100, 200);
- /* Do the PHY common block reset */
- regmap_update_bits(imx8_phy->iomuxc_gpr, IOMUXC_GPR14,
- IMX8MM_GPR_PCIE_CMN_RST,
- IMX8MM_GPR_PCIE_CMN_RST);
-
switch (imx8_phy->drvdata->variant) {
case IMX8MP:
reset_control_deassert(imx8_phy->perst);
@@ -156,6 +151,11 @@ static int imx8_pcie_phy_power_on(struct phy *phy)
break;
}
+ /* Do the PHY common block reset */
+ regmap_update_bits(imx8_phy->iomuxc_gpr, IOMUXC_GPR14,
+ IMX8MM_GPR_PCIE_CMN_RST,
+ IMX8MM_GPR_PCIE_CMN_RST);
+
/* Polling to check the phy is ready or not. */
ret = readl_poll_timeout(imx8_phy->base + IMX8MM_PCIE_PHY_CMN_REG075,
val, val == ANA_PLL_DONE, 10, 20000);
--
2.34.1
Some applications need to check if there is an input device on the camera
before proceeding to the next step. When there is no input device,
the application will report an error.
Create input device for all uvc devices with status endpoints.
and only when bTriggerSupport and bTriggerUsage are one are
allowed to report camera button.
Fixes: 3bc22dc66a4f ("media: uvcvideo: Only create input devs if hw supports it")
Signed-off-by: chenchangcheng <ccc194101(a)163.com>
---
drivers/media/usb/uvc/uvc_status.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c
index a78a88c710e2..177640c6a813 100644
--- a/drivers/media/usb/uvc/uvc_status.c
+++ b/drivers/media/usb/uvc/uvc_status.c
@@ -44,9 +44,6 @@ static int uvc_input_init(struct uvc_device *dev)
struct input_dev *input;
int ret;
- if (!uvc_input_has_button(dev))
- return 0;
-
input = input_allocate_device();
if (input == NULL)
return -ENOMEM;
@@ -110,10 +107,12 @@ static void uvc_event_streaming(struct uvc_device *dev,
if (len <= offsetof(struct uvc_status, streaming))
return;
- uvc_dbg(dev, STATUS, "Button (intf %u) %s len %d\n",
- status->bOriginator,
- status->streaming.button ? "pressed" : "released", len);
- uvc_input_report_key(dev, KEY_CAMERA, status->streaming.button);
+ if (uvc_input_has_button(dev)) {
+ uvc_dbg(dev, STATUS, "Button (intf %u) %s len %d\n",
+ status->bOriginator,
+ status->streaming.button ? "pressed" : "released", len);
+ uvc_input_report_key(dev, KEY_CAMERA, status->streaming.button);
+ }
} else {
uvc_dbg(dev, STATUS, "Stream %u error event %02x len %d\n",
status->bOriginator, status->bEvent, len);
--
2.25.1
If the device was already runtime suspended then during system suspend
we cannot access the device registers else it will crash.
Also we cannot access any registers after dwc3_core_exit() on some
platforms so move the dwc3_enable_susphy() call to the top.
Cc: stable(a)vger.kernel.org # v5.15+
Reported-by: William McVicker <willmcvicker(a)google.com>
Closes: https://lore.kernel.org/all/ZyVfcUuPq56R2m1Y@google.com
Fixes: 705e3ce37bcc ("usb: dwc3: core: Fix system suspend on TI AM62 platforms")
Signed-off-by: Roger Quadros <rogerq(a)kernel.org>
---
drivers/usb/dwc3/core.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 427e5660f87c..98114c2827c0 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -2342,10 +2342,18 @@ static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg)
u32 reg;
int i;
- dwc->susphy_state = (dwc3_readl(dwc->regs, DWC3_GUSB2PHYCFG(0)) &
- DWC3_GUSB2PHYCFG_SUSPHY) ||
- (dwc3_readl(dwc->regs, DWC3_GUSB3PIPECTL(0)) &
- DWC3_GUSB3PIPECTL_SUSPHY);
+ if (!pm_runtime_suspended(dwc->dev) && !PMSG_IS_AUTO(msg)) {
+ dwc->susphy_state = (dwc3_readl(dwc->regs, DWC3_GUSB2PHYCFG(0)) &
+ DWC3_GUSB2PHYCFG_SUSPHY) ||
+ (dwc3_readl(dwc->regs, DWC3_GUSB3PIPECTL(0)) &
+ DWC3_GUSB3PIPECTL_SUSPHY);
+ /*
+ * TI AM62 platform requires SUSPHY to be
+ * enabled for system suspend to work.
+ */
+ if (!dwc->susphy_state)
+ dwc3_enable_susphy(dwc, true);
+ }
switch (dwc->current_dr_role) {
case DWC3_GCTL_PRTCAP_DEVICE:
@@ -2398,15 +2406,6 @@ static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg)
break;
}
- if (!PMSG_IS_AUTO(msg)) {
- /*
- * TI AM62 platform requires SUSPHY to be
- * enabled for system suspend to work.
- */
- if (!dwc->susphy_state)
- dwc3_enable_susphy(dwc, true);
- }
-
return 0;
}
---
base-commit: 42f7652d3eb527d03665b09edac47f85fb600924
change-id: 20241102-am62-lpm-usb-fix-347dd86135c1
Best regards,
--
Roger Quadros <rogerq(a)kernel.org>
From: Pavan Kumar Linga <pavan.kumar.linga(a)intel.com>
In an event where the platform running the device control plane
is rebooted, reset is detected on the driver. It releases
all the resources and waits for the reset to complete. Once the
reset is done, it tries to build the resources back. At this
time if the device control plane is not yet started, then
the driver timeouts on the virtchnl message and retries to
establish the mailbox again.
In the retry flow, mailbox is deinitialized but the mailbox
workqueue is still alive and polling for the mailbox message.
This results in accessing the released control queue leading to
null-ptr-deref. Fix it by unrolling the work queue cancellation
and mailbox deinitialization in the reverse order which they got
initialized.
Fixes: 4930fbf419a7 ("idpf: add core init and interrupt request")
Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager")
Cc: stable(a)vger.kernel.org # 6.9+
Reviewed-by: Tarun K Singh <tarun.k.singh(a)intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga(a)intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
---
drivers/net/ethernet/intel/idpf/idpf_lib.c | 1 +
drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 1 -
2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
index c3848e10e7db..b4fbb99bfad2 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
@@ -1786,6 +1786,7 @@ static int idpf_init_hard_reset(struct idpf_adapter *adapter)
*/
err = idpf_vc_core_init(adapter);
if (err) {
+ cancel_delayed_work_sync(&adapter->mbx_task);
idpf_deinit_dflt_mbx(adapter);
goto unlock_mutex;
}
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index ce217e274506..d46c95f91b0d 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -3063,7 +3063,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
adapter->state = __IDPF_VER_CHECK;
if (adapter->vcxn_mngr)
idpf_vc_xn_shutdown(adapter->vcxn_mngr);
- idpf_deinit_dflt_mbx(adapter);
set_bit(IDPF_HR_DRV_LOAD, adapter->flags);
queue_delayed_work(adapter->vc_event_wq, &adapter->vc_event_task,
msecs_to_jiffies(task_delay));
--
2.42.0
We used the wrong device for the device managed functions. We used the
usb device, when we should be using the interface device.
If we unbind the driver from the usb interface, the cleanup functions
are never called. In our case, the IRQ is never disabled.
If an IRQ is triggered, it will try to access memory sections that are
already free, causing an OOPS.
Luckily this bug has small impact, as it is only affected by devices
with gpio units and the user has to unbind the device, a disconnect will
not trigger this error.
Cc: stable(a)vger.kernel.org
Fixes: 2886477ff987 ("media: uvcvideo: Implement UVC_EXT_GPIO_UNIT")
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
drivers/media/usb/uvc/uvc_driver.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c
index a96f6ca0889f..1100d3ed342e 100644
--- a/drivers/media/usb/uvc/uvc_driver.c
+++ b/drivers/media/usb/uvc/uvc_driver.c
@@ -1295,14 +1295,14 @@ static int uvc_gpio_parse(struct uvc_device *dev)
struct gpio_desc *gpio_privacy;
int irq;
- gpio_privacy = devm_gpiod_get_optional(&dev->udev->dev, "privacy",
+ gpio_privacy = devm_gpiod_get_optional(&dev->intf->dev, "privacy",
GPIOD_IN);
if (IS_ERR_OR_NULL(gpio_privacy))
return PTR_ERR_OR_ZERO(gpio_privacy);
irq = gpiod_to_irq(gpio_privacy);
if (irq < 0)
- return dev_err_probe(&dev->udev->dev, irq,
+ return dev_err_probe(&dev->intf->dev, irq,
"No IRQ for privacy GPIO\n");
unit = uvc_alloc_new_entity(dev, UVC_EXT_GPIO_UNIT,
@@ -1333,7 +1333,7 @@ static int uvc_gpio_init_irq(struct uvc_device *dev)
if (!unit || unit->gpio.irq < 0)
return 0;
- return devm_request_threaded_irq(&dev->udev->dev, unit->gpio.irq, NULL,
+ return devm_request_threaded_irq(&dev->intf->dev, unit->gpio.irq, NULL,
uvc_gpio_irq,
IRQF_ONESHOT | IRQF_TRIGGER_FALLING |
IRQF_TRIGGER_RISING,
---
base-commit: c7ccf3683ac9746b263b0502255f5ce47f64fe0a
change-id: 20241031-uvc-crashrmmod-666de3fc9141
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of
ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class
of signals. However now it's enforced unconditionally, even if
override_rlimit is set. This behavior change caused production issues.
For example, if the limit is reached and a process receives a SIGSEGV
signal, sigqueue_alloc fails to allocate the necessary resources for the
signal delivery, preventing the signal from being delivered with
siginfo. This prevents the process from correctly identifying the fault
address and handling the error. From the user-space perspective,
applications are unaware that the limit has been reached and that the
siginfo is effectively 'corrupted'. This can lead to unpredictable
behavior and crashes, as we observed with java applications.
Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and
skip the comparison to max there if override_rlimit is set. This
effectively restores the old behavior.
v2: refactor to make the logic simpler (Eric, Oleg, Alexey)
Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Co-developed-by: Andrei Vagin <avagin(a)google.com>
Signed-off-by: Andrei Vagin <avagin(a)google.com>
Cc: Kees Cook <kees(a)kernel.org>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Alexey Gladkov <legion(a)kernel.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
---
include/linux/user_namespace.h | 3 ++-
kernel/signal.c | 3 ++-
kernel/ucount.c | 6 ++++--
3 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 3625096d5f85..7183e5aca282 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -141,7 +141,8 @@ static inline long get_rlimit_value(struct ucounts *ucounts, enum rlimit_type ty
long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v);
bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v);
-long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type);
+long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type,
+ bool override_rlimit);
void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type);
bool is_rlimit_overlimit(struct ucounts *ucounts, enum rlimit_type type, unsigned long max);
diff --git a/kernel/signal.c b/kernel/signal.c
index 4344860ffcac..cbabb2d05e0a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -419,7 +419,8 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
*/
rcu_read_lock();
ucounts = task_ucounts(t);
- sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
+ sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING,
+ override_rlimit);
rcu_read_unlock();
if (!sigpending)
return NULL;
diff --git a/kernel/ucount.c b/kernel/ucount.c
index 16c0ea1cb432..49fcec41e5b4 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -307,7 +307,8 @@ void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type)
do_dec_rlimit_put_ucounts(ucounts, NULL, type);
}
-long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type)
+long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type,
+ bool override_rlimit)
{
/* Caller must hold a reference to ucounts */
struct ucounts *iter;
@@ -320,7 +321,8 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type)
goto unwind;
if (iter == ucounts)
ret = new;
- max = get_userns_rlimit_max(iter->ns, type);
+ if (!override_rlimit)
+ max = get_userns_rlimit_max(iter->ns, type);
/*
* Grab an extra ucount reference for the caller when
* the rlimit count was previously 0.
--
2.47.0.199.ga7371fff76-goog
The patch titled
Subject: signal: restore the override_rlimit logic
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
signal-restore-the-override_rlimit-logic.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Roman Gushchin <roman.gushchin(a)linux.dev>
Subject: signal: restore the override_rlimit logic
Date: Mon, 4 Nov 2024 19:54:19 +0000
Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of
ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class of
signals. However now it's enforced unconditionally, even if
override_rlimit is set. This behavior change caused production issues.
For example, if the limit is reached and a process receives a SIGSEGV
signal, sigqueue_alloc fails to allocate the necessary resources for the
signal delivery, preventing the signal from being delivered with siginfo.
This prevents the process from correctly identifying the fault address and
handling the error. From the user-space perspective, applications are
unaware that the limit has been reached and that the siginfo is
effectively 'corrupted'. This can lead to unpredictable behavior and
crashes, as we observed with java applications.
Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and skip
the comparison to max there if override_rlimit is set. This effectively
restores the old behavior.
Link: https://lkml.kernel.org/r/20241104195419.3962584-1-roman.gushchin@linux.dev
Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Co-developed-by: Andrei Vagin <avagin(a)google.com>
Signed-off-by: Andrei Vagin <avagin(a)google.com>
Acked-by: Oleg Nesterov <oleg(a)redhat.com>
Cc: Kees Cook <kees(a)kernel.org>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Alexey Gladkov <legion(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/user_namespace.h | 3 ++-
kernel/signal.c | 3 ++-
kernel/ucount.c | 6 ++++--
3 files changed, 8 insertions(+), 4 deletions(-)
--- a/include/linux/user_namespace.h~signal-restore-the-override_rlimit-logic
+++ a/include/linux/user_namespace.h
@@ -141,7 +141,8 @@ static inline long get_rlimit_value(stru
long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v);
bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v);
-long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type);
+long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type,
+ bool override_rlimit);
void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type);
bool is_rlimit_overlimit(struct ucounts *ucounts, enum rlimit_type type, unsigned long max);
--- a/kernel/signal.c~signal-restore-the-override_rlimit-logic
+++ a/kernel/signal.c
@@ -419,7 +419,8 @@ __sigqueue_alloc(int sig, struct task_st
*/
rcu_read_lock();
ucounts = task_ucounts(t);
- sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
+ sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING,
+ override_rlimit);
rcu_read_unlock();
if (!sigpending)
return NULL;
--- a/kernel/ucount.c~signal-restore-the-override_rlimit-logic
+++ a/kernel/ucount.c
@@ -307,7 +307,8 @@ void dec_rlimit_put_ucounts(struct ucoun
do_dec_rlimit_put_ucounts(ucounts, NULL, type);
}
-long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type)
+long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type,
+ bool override_rlimit)
{
/* Caller must hold a reference to ucounts */
struct ucounts *iter;
@@ -320,7 +321,8 @@ long inc_rlimit_get_ucounts(struct ucoun
goto dec_unwind;
if (iter == ucounts)
ret = new;
- max = get_userns_rlimit_max(iter->ns, type);
+ if (!override_rlimit)
+ max = get_userns_rlimit_max(iter->ns, type);
/*
* Grab an extra ucount reference for the caller when
* the rlimit count was previously 0.
_
Patches currently in -mm which might be from roman.gushchin(a)linux.dev are
signal-restore-the-override_rlimit-logic.patch
Commit b8e0ddd36ce9 ("can: mcp251xfd: tef: prepare to workaround
broken TEF FIFO tail index erratum") introduced
mcp251xfd_get_tef_len() to get the number of unhandled transmit events
from the Transmit Event FIFO (TEF).
As the TEF has no head pointer, the driver uses the TX FIFO's tail
pointer instead, assuming that send frames are completed. However the
check for the TEF being full was not correct. This leads to the driver
stop working if the TEF is full.
Fix the TEF full check by assuming that if, from the driver's point of
view, there are no free TX buffers in the chip and the TX FIFO is
empty, all messages must have been sent and the TEF must therefore be
full.
Reported-by: Sven Schuchmann <schuchmann(a)schleissheimer.de>
Closes: https://patch.msgid.link/FR3P281MB155216711EFF900AD9791B7ED9692@FR3P281MB15…
Fixes: b8e0ddd36ce9 ("can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum")
Tested-by: Sven Schuchmann <schuchmann(a)schleissheimer.de>
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/20241104-mcp251xfd-fix-length-calculation-v3-1-608…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c
index f732556d233a..d3ac865933fd 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c
@@ -16,9 +16,9 @@
#include "mcp251xfd.h"
-static inline bool mcp251xfd_tx_fifo_sta_full(u32 fifo_sta)
+static inline bool mcp251xfd_tx_fifo_sta_empty(u32 fifo_sta)
{
- return !(fifo_sta & MCP251XFD_REG_FIFOSTA_TFNRFNIF);
+ return fifo_sta & MCP251XFD_REG_FIFOSTA_TFERFFIF;
}
static inline int
@@ -122,7 +122,11 @@ mcp251xfd_get_tef_len(struct mcp251xfd_priv *priv, u8 *len_p)
if (err)
return err;
- if (mcp251xfd_tx_fifo_sta_full(fifo_sta)) {
+ /* If the chip says the TX-FIFO is empty, but there are no TX
+ * buffers free in the ring, we assume all have been sent.
+ */
+ if (mcp251xfd_tx_fifo_sta_empty(fifo_sta) &&
+ mcp251xfd_get_tx_free(tx_ring) == 0) {
*len_p = tx_ring->obj_num;
return 0;
}
--
2.45.2
In commit b382380c0d2d ("can: m_can: Add hrtimer to generate software
interrupt") support for IRQ-less devices was added. Instead of an
interrupt, the interrupt routine is called by a hrtimer-based polling
loop.
That patch forgot to change free_irq() to be only called for devices
with IRQs. Fix this, by calling free_irq() conditionally only if an
IRQ is available for the device (and thus has been requested
previously).
Fixes: b382380c0d2d ("can: m_can: Add hrtimer to generate software interrupt")
Reviewed-by: Simon Horman <horms(a)kernel.org>
Reviewed-by: Markus Schneider-Pargmann <msp(a)baylibre.com>
Link: https://patch.msgid.link/20240930-m_can-cleanups-v1-1-001c579cdee4@pengutro…
Cc: <stable(a)vger.kernel.org> # v6.6+
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/m_can/m_can.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index a978b960f1f1..16e9e7d7527d 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -1765,7 +1765,8 @@ static int m_can_close(struct net_device *dev)
netif_stop_queue(dev);
m_can_stop(dev);
- free_irq(dev->irq, dev);
+ if (dev->irq)
+ free_irq(dev->irq, dev);
m_can_clean(dev);
--
2.45.2
From: Thomas Mühlbacher <tmuehlbacher(a)posteo.net>
The ISA variable is only defined if X86_32 is also defined. However,
these drivers are still useful and in use on at least some modern 64-bit
x86 industrial systems as well. With the correct module parameters, they
work as long as IO port communication is possible, despite their name
having ISA in them.
Fixes: a29689e60ed3 ("net: handle HAS_IOPORT dependencies")
Signed-off-by: Thomas Mühlbacher <tmuehlbacher(a)posteo.net>
Link: https://patch.msgid.link/20240919174151.15473-2-tmuehlbacher@posteo.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/cc770/Kconfig | 2 +-
drivers/net/can/sja1000/Kconfig | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/cc770/Kconfig b/drivers/net/can/cc770/Kconfig
index 467ef19de1c1..aae25c2f849e 100644
--- a/drivers/net/can/cc770/Kconfig
+++ b/drivers/net/can/cc770/Kconfig
@@ -7,7 +7,7 @@ if CAN_CC770
config CAN_CC770_ISA
tristate "ISA Bus based legacy CC770 driver"
- depends on ISA
+ depends on HAS_IOPORT
help
This driver adds legacy support for CC770 and AN82527 chips
connected to the ISA bus using I/O port, memory mapped or
diff --git a/drivers/net/can/sja1000/Kconfig b/drivers/net/can/sja1000/Kconfig
index 01168db4c106..2f516cc6d22c 100644
--- a/drivers/net/can/sja1000/Kconfig
+++ b/drivers/net/can/sja1000/Kconfig
@@ -87,7 +87,7 @@ config CAN_PLX_PCI
config CAN_SJA1000_ISA
tristate "ISA Bus based legacy SJA1000 driver"
- depends on ISA
+ depends on HAS_IOPORT
help
This driver adds legacy support for SJA1000 chips connected to
the ISA bus using I/O port, memory mapped or indirect access.
--
2.45.2
Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of
ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class
of signals. However now it's enforced unconditionally, even if
override_rlimit is set. This behavior change caused production issues.
For example, if the limit is reached and a process receives a SIGSEGV
signal, sigqueue_alloc fails to allocate the necessary resources for the
signal delivery, preventing the signal from being delivered with
siginfo. This prevents the process from correctly identifying the fault
address and handling the error. From the user-space perspective,
applications are unaware that the limit has been reached and that the
siginfo is effectively 'corrupted'. This can lead to unpredictable
behavior and crashes, as we observed with java applications.
Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and
skip the comparison to max there if override_rlimit is set. This
effectively restores the old behavior.
Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Co-developed-by: Andrei Vagin <avagin(a)google.com>
Signed-off-by: Andrei Vagin <avagin(a)google.com>
Cc: Kees Cook <kees(a)kernel.org>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Alexey Gladkov <legion(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
---
include/linux/user_namespace.h | 3 ++-
kernel/signal.c | 3 ++-
kernel/ucount.c | 5 +++--
3 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 3625096d5f85..7183e5aca282 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -141,7 +141,8 @@ static inline long get_rlimit_value(struct ucounts *ucounts, enum rlimit_type ty
long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v);
bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v);
-long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type);
+long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type,
+ bool override_rlimit);
void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type);
bool is_rlimit_overlimit(struct ucounts *ucounts, enum rlimit_type type, unsigned long max);
diff --git a/kernel/signal.c b/kernel/signal.c
index 4344860ffcac..cbabb2d05e0a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -419,7 +419,8 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
*/
rcu_read_lock();
ucounts = task_ucounts(t);
- sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
+ sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING,
+ override_rlimit);
rcu_read_unlock();
if (!sigpending)
return NULL;
diff --git a/kernel/ucount.c b/kernel/ucount.c
index 16c0ea1cb432..046b3d57ebb4 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -307,7 +307,8 @@ void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type)
do_dec_rlimit_put_ucounts(ucounts, NULL, type);
}
-long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type)
+long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type,
+ bool override_rlimit)
{
/* Caller must hold a reference to ucounts */
struct ucounts *iter;
@@ -316,7 +317,7 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type)
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
long new = atomic_long_add_return(1, &iter->rlimit[type]);
- if (new < 0 || new > max)
+ if (new < 0 || (!override_rlimit && (new > max)))
goto unwind;
if (iter == ucounts)
ret = new;
--
2.47.0.163.g1226f6d8fa-goog
From: Rob Clark <robdclark(a)chromium.org>
commit afce71ff6daa9c0f852df0727fe32c6fb107f0fa upstream.
gem_context_register() makes the context visible to userspace, and which
point a separate thread can trigger the I915_GEM_CONTEXT_DESTROY ioctl.
So we need to ensure that nothing uses the ctx ptr after this. And we
need to ensure that adding the ctx to the xarray is the *last* thing
that gem_context_register() does with the ctx pointer.
Signed-off-by: Rob Clark <robdclark(a)chromium.org>
Fixes: eb4dedae920a ("drm/i915/gem: Delay tracking the GEM context until it is registered")
Fixes: a4c1cdd34e2c ("drm/i915/gem: Delay context creation (v3)")
Fixes: 49bd54b390c2 ("drm/i915: Track all user contexts per client")
Cc: <stable(a)vger.kernel.org> # v5.10+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com>
[tursulin: Stable and fixes tags add/tidy.]
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230103234948.1218393-1-robd…
(cherry picked from commit bed4b455cf5374e68879be56971c1da563bcd90c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Sherry: bp to fix CVE-2023-52913, ignore context conflicts due to
missing commit 49bd54b390c2 "drm/i915: Track all user contexts per
client")]
Signed-off-by: Sherry Yang <sherry.yang(a)oracle.com>
---
drivers/gpu/drm/i915/gem/i915_gem_context.c | 24 +++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 0eb4a0739fa2..0a7c4548b77f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1436,6 +1436,10 @@ void i915_gem_init__contexts(struct drm_i915_private *i915)
init_contexts(&i915->gem.contexts);
}
+/*
+ * Note that this implicitly consumes the ctx reference, by placing
+ * the ctx in the context_xa.
+ */
static void gem_context_register(struct i915_gem_context *ctx,
struct drm_i915_file_private *fpriv,
u32 id)
@@ -1449,13 +1453,13 @@ static void gem_context_register(struct i915_gem_context *ctx,
snprintf(ctx->name, sizeof(ctx->name), "%s[%d]",
current->comm, pid_nr(ctx->pid));
- /* And finally expose ourselves to userspace via the idr */
- old = xa_store(&fpriv->context_xa, id, ctx, GFP_KERNEL);
- WARN_ON(old);
-
spin_lock(&i915->gem.contexts.lock);
list_add_tail(&ctx->link, &i915->gem.contexts.list);
spin_unlock(&i915->gem.contexts.lock);
+
+ /* And finally expose ourselves to userspace via the idr */
+ old = xa_store(&fpriv->context_xa, id, ctx, GFP_KERNEL);
+ WARN_ON(old);
}
int i915_gem_context_open(struct drm_i915_private *i915,
@@ -1932,14 +1936,22 @@ finalize_create_context_locked(struct drm_i915_file_private *file_priv,
if (IS_ERR(ctx))
return ctx;
+ /*
+ * One for the xarray and one for the caller. We need to grab
+ * the reference *prior* to making the ctx visble to userspace
+ * in gem_context_register(), as at any point after that
+ * userspace can try to race us with another thread destroying
+ * the context under our feet.
+ */
+ i915_gem_context_get(ctx);
+
gem_context_register(ctx, file_priv, id);
old = xa_erase(&file_priv->proto_context_xa, id);
GEM_BUG_ON(old != pc);
proto_context_close(pc);
- /* One for the xarray and one for the caller */
- return i915_gem_context_get(ctx);
+ return ctx;
}
struct i915_gem_context *
--
2.46.0
Setting GPIO direction = high, sometimes results in GPIO value = 0.
If a GPIO is pulled high, the following construction results in the
value being 0 when the desired value is 1:
$ echo "high" > /sys/class/gpio/gpio336/direction
$ cat /sys/class/gpio/gpio336/value
0
Before the GPIO direction is changed from an input to an output,
exar_set_value() is called with value = 1, but since the GPIO is an
input when exar_set_value() is called, _regmap_update_bits() reads a 1
due to an external pull-up. regmap_set_bits() sets force_write =
false, so the value (1) is not written. When the direction is then
changed, the GPIO becomes an output with the value of 0 (the hardware
default).
regmap_write_bits() sets force_write = true, so the value is always
written by exar_set_value() and an external pull-up doesn't affect the
outcome of setting direction = high.
The same can happen when a GPIO is pulled low, but the scenario is a
little more complicated.
$ echo high > /sys/class/gpio/gpio351/direction
$ cat /sys/class/gpio/gpio351/value
1
$ echo in > /sys/class/gpio/gpio351/direction
$ cat /sys/class/gpio/gpio351/value
0
$ echo low > /sys/class/gpio/gpio351/direction
$ cat /sys/class/gpio/gpio351/value
1
Fixes: 36fb7218e878 ("gpio: exar: switch to using regmap")
Signed-off-by: Sai Kumar Cholleti <skmr537(a)gmail.com>
Signed-off-by: Matthew McClain <mmcclain(a)noprivs.com>
Cc: <stable(a)vger.kernel.org>
---
drivers/gpio/gpio-exar.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpio/gpio-exar.c b/drivers/gpio/gpio-exar.c
index 5170fe7599cd..dfc7a9ca3e62 100644
--- a/drivers/gpio/gpio-exar.c
+++ b/drivers/gpio/gpio-exar.c
@@ -99,11 +99,13 @@ static void exar_set_value(struct gpio_chip *chip, unsigned int offset,
struct exar_gpio_chip *exar_gpio = gpiochip_get_data(chip);
unsigned int addr = exar_offset_to_lvl_addr(exar_gpio, offset);
unsigned int bit = exar_offset_to_bit(exar_gpio, offset);
+ unsigned int bit_value = value ? BIT(bit) : 0;
- if (value)
- regmap_set_bits(exar_gpio->regmap, addr, BIT(bit));
- else
- regmap_clear_bits(exar_gpio->regmap, addr, BIT(bit));
+ /*
+ * regmap_write_bits forces value to be written when an external
+ * pull up/down might otherwise indicate value was already set
+ */
+ regmap_write_bits(exar_gpio->regmap, addr, BIT(bit), bit_value);
}
static int exar_direction_output(struct gpio_chip *chip, unsigned int offset,
--
2.34.1
Hi folks, here is a series with some fixes for dummy_hcd. First of all,
the reasoning behind it.
Syzkaller report [0] shows a hung task on uevent_show, and despite it was
fixed with a patch on drivers/base (a race between drivers shutdown and
uevent_show), another issue remains: a problem with Realtek emulated wifi
device [1]. While working the fix ([1]), we noticed that if it is
applied to recent kernels, all fine. But in v6.1.y and v6.6.y for example,
it didn't solve entirely the issue, and after some debugging, it was
narrowed to dummy_hcd transfer rates being waaay slower in such stable
versions.
The reason of such slowness is well-described in the first 2 patches of
this backport, but the thing is that these patches introduced subtle issues
as well, fixed in the other 2 patches. Hence, I decided to backport all of
them for the 2 latest LTS kernels.
Maybe this is not a good idea - I don't see a strong con, but who's
better to judge the benefits vs the risks than the patch authors,
reviewers, and the USB maintainer?! So, I've CCed Alan, Andrey, Greg and
Marcello here, and I thank you all in advance for reviews on this. And
my apologies for bothering you with the emails, I hope this is a simple
"OK, makes sense" or "Nah, doesn't worth it" situation =)
Cheers,
Guilherme
[0] https://syzkaller.appspot.com/bug?extid=edd9fe0d3a65b14588d5
[1] https://lore.kernel.org/r/20241101193412.1390391-1-gpiccoli@igalia.com/
Alan Stern (1):
USB: gadget: dummy-hcd: Fix "task hung" problem
Andrey Konovalov (1):
usb: gadget: dummy_hcd: execute hrtimer callback in softirq context
Marcello Sylvester Bauer (2):
usb: gadget: dummy_hcd: Switch to hrtimer transfer scheduler
usb: gadget: dummy_hcd: Set transfer interval to 1 microframe
drivers/usb/gadget/udc/dummy_hcd.c | 57 ++++++++++++++++++++----------
1 file changed, 38 insertions(+), 19 deletions(-)
--
2.46.2
Since commit 6d735722063a ("usb: dwc3: core: Prevent phy suspend during init"),
system suspend is broken on AM62 TI platforms.
Before that commit, both DWC3_GUSB3PIPECTL_SUSPHY and DWC3_GUSB2PHYCFG_SUSPHY
bits (hence forth called 2 SUSPHY bits) were being set during core
initialization and even during core re-initialization after a system
suspend/resume.
These bits are required to be set for system suspend/resume to work correctly
on AM62 platforms.
Since that commit, the 2 SUSPHY bits are not set for DEVICE/OTG mode if gadget
driver is not loaded and started.
For Host mode, the 2 SUSPHY bits are set before the first system suspend but
get cleared at system resume during core re-init and are never set again.
This patch resovles these two issues by ensuring the 2 SUSPHY bits are set
before system suspend and restored to the original state during system resume.
Cc: stable(a)vger.kernel.org # v6.9+
Fixes: 6d735722063a ("usb: dwc3: core: Prevent phy suspend during init")
Link: https://lore.kernel.org/all/1519dbe7-73b6-4afc-bfe3-23f4f75d772f@kernel.org/
Signed-off-by: Roger Quadros <rogerq(a)kernel.org>
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
---
Changes in v3:
- Fix single line comment style
- add DWC3_GUSB3PIPECTL_SUSPHY to documentation of susphy_state
- Added Acked-by tag
- Link to v2: https://lore.kernel.org/r/20241009-am62-lpm-usb-v2-1-da26c0cd2b1e@kernel.org
Changes in v2:
- Fix comment style
- Use both USB3 and USB2 SUSPHY bits to determine susphy_state during system suspend/resume.
- Restore SUSPHY bits at system resume regardless if it was set or cleared before system suspend.
- Link to v1: https://lore.kernel.org/r/20241001-am62-lpm-usb-v1-1-9916b71165f7@kernel.org
---
drivers/usb/dwc3/core.c | 19 +++++++++++++++++++
drivers/usb/dwc3/core.h | 3 +++
2 files changed, 22 insertions(+)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 9eb085f359ce..ca77f0b186c4 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -2336,6 +2336,11 @@ static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg)
u32 reg;
int i;
+ dwc->susphy_state = (dwc3_readl(dwc->regs, DWC3_GUSB2PHYCFG(0)) &
+ DWC3_GUSB2PHYCFG_SUSPHY) ||
+ (dwc3_readl(dwc->regs, DWC3_GUSB3PIPECTL(0)) &
+ DWC3_GUSB3PIPECTL_SUSPHY);
+
switch (dwc->current_dr_role) {
case DWC3_GCTL_PRTCAP_DEVICE:
if (pm_runtime_suspended(dwc->dev))
@@ -2387,6 +2392,15 @@ static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg)
break;
}
+ if (!PMSG_IS_AUTO(msg)) {
+ /*
+ * TI AM62 platform requires SUSPHY to be
+ * enabled for system suspend to work.
+ */
+ if (!dwc->susphy_state)
+ dwc3_enable_susphy(dwc, true);
+ }
+
return 0;
}
@@ -2454,6 +2468,11 @@ static int dwc3_resume_common(struct dwc3 *dwc, pm_message_t msg)
break;
}
+ if (!PMSG_IS_AUTO(msg)) {
+ /* restore SUSPHY state to that before system suspend. */
+ dwc3_enable_susphy(dwc, dwc->susphy_state);
+ }
+
return 0;
}
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index c71240e8f7c7..31de4b57ae7c 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -1150,6 +1150,8 @@ struct dwc3_scratchpad_array {
* @sys_wakeup: set if the device may do system wakeup.
* @wakeup_configured: set if the device is configured for remote wakeup.
* @suspended: set to track suspend event due to U3/L2.
+ * @susphy_state: state of DWC3_GUSB2PHYCFG_SUSPHY + DWC3_GUSB3PIPECTL_SUSPHY
+ * before PM suspend.
* @imod_interval: set the interrupt moderation interval in 250ns
* increments or 0 to disable.
* @max_cfg_eps: current max number of IN eps used across all USB configs.
@@ -1382,6 +1384,7 @@ struct dwc3 {
unsigned sys_wakeup:1;
unsigned wakeup_configured:1;
unsigned suspended:1;
+ unsigned susphy_state:1;
u16 imod_interval;
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20240923-am62-lpm-usb-f420917bd707
Best regards,
--
Roger Quadros <rogerq(a)kernel.org>
The IT6505 bridge chip has a active low reset line. Since it is a
"reset" and not an "enable" line, the GPIO should be asserted to
put it in reset and deasserted to bring it out of reset during
the power on sequence.
The polarity was inverted when the driver was first introduced, likely
because the device family that was targeted had an inverting level
shifter on the reset line.
The MT8186 Corsola devices already have the IT6505 in their device tree,
but the whole display pipeline is actually disabled and won't be enabled
until some remaining issues are sorted out. The other known user is
the MT8183 Kukui / Jacuzzi family; their device trees currently do not
have the IT6505 included.
Fix the polarity in the driver while there are no actual users.
Fixes: b5c84a9edcd4 ("drm/bridge: add it6505 driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Chen-Yu Tsai <wenst(a)chromium.org>
---
drivers/gpu/drm/bridge/ite-it6505.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/bridge/ite-it6505.c b/drivers/gpu/drm/bridge/ite-it6505.c
index 7502a5f81557..df7ecdf0f422 100644
--- a/drivers/gpu/drm/bridge/ite-it6505.c
+++ b/drivers/gpu/drm/bridge/ite-it6505.c
@@ -2618,9 +2618,9 @@ static int it6505_poweron(struct it6505 *it6505)
/* time interval between OVDD and SYSRSTN at least be 10ms */
if (pdata->gpiod_reset) {
usleep_range(10000, 20000);
- gpiod_set_value_cansleep(pdata->gpiod_reset, 0);
- usleep_range(1000, 2000);
gpiod_set_value_cansleep(pdata->gpiod_reset, 1);
+ usleep_range(1000, 2000);
+ gpiod_set_value_cansleep(pdata->gpiod_reset, 0);
usleep_range(25000, 35000);
}
@@ -2651,7 +2651,7 @@ static int it6505_poweroff(struct it6505 *it6505)
disable_irq_nosync(it6505->irq);
if (pdata->gpiod_reset)
- gpiod_set_value_cansleep(pdata->gpiod_reset, 0);
+ gpiod_set_value_cansleep(pdata->gpiod_reset, 1);
if (pdata->pwr18) {
err = regulator_disable(pdata->pwr18);
@@ -3205,7 +3205,7 @@ static int it6505_init_pdata(struct it6505 *it6505)
return PTR_ERR(pdata->ovdd);
}
- pdata->gpiod_reset = devm_gpiod_get(dev, "reset", GPIOD_OUT_LOW);
+ pdata->gpiod_reset = devm_gpiod_get(dev, "reset", GPIOD_OUT_HIGH);
if (IS_ERR(pdata->gpiod_reset)) {
dev_err(dev, "gpiod_reset gpio not found");
return PTR_ERR(pdata->gpiod_reset);
--
2.47.0.163.g1226f6d8fa-goog
This series fixes a wrong handling of the child node within the
for_each_child_of_node() by adding the missing call to of_node_put() to
make it compatible with stable kernels that don't provide the scoped
variant of the macro, which is more secure and was introduced early this
year.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (2):
drm/mediatek: Fix child node refcount handling in early exit
drm/mediatek: Switch to for_each_child_of_node_scoped()
drivers/gpu/drm/mediatek/mtk_drm_drv.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
---
base-commit: d61a00525464bfc5fe92c6ad713350988e492b88
change-id: 20241011-mtk_drm_drv_memleak-5e8b8e45ed1c
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
Move LNL scheduling WA to xe_device.h so this can be used in other
places without needing keep the same comment about removal of this WA
in the future. The WA, which flushes work or workqueues, is now wrapped
in macros and can be reused wherever needed.
Cc: Badal Nilawar <badal.nilawar(a)intel.com>
Cc: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray(a)intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
cc: <stable(a)vger.kernel.org> # v6.11+
Suggested-by: John Harrison <John.C.Harrison(a)Intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das(a)intel.com>
---
drivers/gpu/drm/xe/xe_device.h | 14 ++++++++++++++
drivers/gpu/drm/xe/xe_guc_ct.c | 11 +----------
2 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 4c3f0ebe78a9..f1fbfe916867 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -191,4 +191,18 @@ void xe_device_declare_wedged(struct xe_device *xe);
struct xe_file *xe_file_get(struct xe_file *xef);
void xe_file_put(struct xe_file *xef);
+/*
+ * Occasionally it is seen that the G2H worker starts running after a delay of more than
+ * a second even after being queued and activated by the Linux workqueue subsystem. This
+ * leads to G2H timeout error. The root cause of issue lies with scheduling latency of
+ * Lunarlake Hybrid CPU. Issue disappears if we disable Lunarlake atom cores from BIOS
+ * and this is beyond xe kmd.
+ *
+ * TODO: Drop this change once workqueue scheduling delay issue is fixed on LNL Hybrid CPU.
+ */
+#define LNL_FLUSH_WORKQUEUE(wq__) \
+ flush_workqueue(wq__)
+#define LNL_FLUSH_WORK(wrk__) \
+ flush_work(wrk__)
+
#endif
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 1b5d8fb1033a..703b44b257a7 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -1018,17 +1018,8 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ);
- /*
- * Occasionally it is seen that the G2H worker starts running after a delay of more than
- * a second even after being queued and activated by the Linux workqueue subsystem. This
- * leads to G2H timeout error. The root cause of issue lies with scheduling latency of
- * Lunarlake Hybrid CPU. Issue dissappears if we disable Lunarlake atom cores from BIOS
- * and this is beyond xe kmd.
- *
- * TODO: Drop this change once workqueue scheduling delay issue is fixed on LNL Hybrid CPU.
- */
if (!ret) {
- flush_work(&ct->g2h_worker);
+ LNL_FLUSH_WORK(&ct->g2h_worker);
if (g2h_fence.done) {
xe_gt_warn(gt, "G2H fence %u, action %04x, done\n",
g2h_fence.seqno, action[0]);
--
2.46.0
On 2024-11-01 15:21:24 [-0400], Sasha Levin wrote:
> commit 052382490ee4f0f6d783ddce02fe6f2d15e134b5
> Author: Wander Lairson Costa <wander(a)redhat.com>
> Date: Mon Oct 21 16:26:24 2024 -0700
>
> igb: Disable threaded IRQ for igb_msix_other
>
> [ Upstream commit 338c4d3902feb5be49bfda530a72c7ab860e2c9f ]
>
> During testing of SR-IOV, Red Hat QE encountered an issue where the
> ip link up command intermittently fails for the igbvf interfaces when
> using the PREEMPT_RT variant. Investigation revealed that
> e1000_write_posted_mbx returns an error due to the lack of an ACK
> from e1000_poll_for_ack.
>
> The underlying issue arises from the fact that IRQs are threaded by
> default under PREEMPT_RT. While the exact hardware details are not
> available, it appears that the IRQ handled by igb_msix_other must
> be processed before e1000_poll_for_ack times out. However,
> e1000_write_posted_mbx is called with preemption disabled, leading
> to a scenario where the IRQ is serviced only after the failure of
> e1000_write_posted_mbx.
>
> To resolve this, we set IRQF_NO_THREAD for the affected interrupt,
> ensuring that the kernel handles it immediately, thereby preventing
> the aforementioned error.
Wander, please send a revert of this patch. The ISR (E1000_ICR_TS set)
may invoke igb_msg_task(), ptp_clock_event(), igb_perout(), igb_extts()
each of which acquire sleeping locks on PREEMPT_RT. Not sure if this
improved the situation or not.
Sebastian
From: Hans de Goede <hdegoede(a)redhat.com>
[ Upstream commit d48696b915527b5bcdd207a299aec03fb037eb17 ]
On some x86 Bay Trail tablets which shipped with Android as factory OS,
the DSDT is so broken that the codec needs to be manually instantatiated
by the special x86-android-tablets.ko "fixup" driver for cases like this.
This means that the codec-dev cannot be retrieved through its ACPI fwnode,
add support to the bytcr_rt5640 machine driver for such manually
instantiated rt5640 i2c_clients.
An example of a tablet which needs this is the Vexia EDU ATLA 10 tablet,
which has been distributed to schools in the Spanish Andalucía region.
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Link: https://patch.msgid.link/20241024211615.79518-1-hdegoede@redhat.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/soc/intel/boards/bytcr_rt5640.c | 33 ++++++++++++++++++++++++---
1 file changed, 30 insertions(+), 3 deletions(-)
diff --git a/sound/soc/intel/boards/bytcr_rt5640.c b/sound/soc/intel/boards/bytcr_rt5640.c
index 3d2a0e8cad9a5..899a8435a1eb8 100644
--- a/sound/soc/intel/boards/bytcr_rt5640.c
+++ b/sound/soc/intel/boards/bytcr_rt5640.c
@@ -17,6 +17,7 @@
#include <linux/acpi.h>
#include <linux/clk.h>
#include <linux/device.h>
+#include <linux/device/bus.h>
#include <linux/dmi.h>
#include <linux/gpio/consumer.h>
#include <linux/gpio/machine.h>
@@ -32,6 +33,8 @@
#include "../atom/sst-atom-controls.h"
#include "../common/soc-intel-quirks.h"
+#define BYT_RT5640_FALLBACK_CODEC_DEV_NAME "i2c-rt5640"
+
enum {
BYT_RT5640_DMIC1_MAP,
BYT_RT5640_DMIC2_MAP,
@@ -1616,9 +1619,33 @@ static int snd_byt_rt5640_mc_probe(struct platform_device *pdev)
codec_dev = acpi_get_first_physical_node(adev);
acpi_dev_put(adev);
- if (!codec_dev)
- return -EPROBE_DEFER;
- priv->codec_dev = get_device(codec_dev);
+
+ if (codec_dev) {
+ priv->codec_dev = get_device(codec_dev);
+ } else {
+ /*
+ * Special case for Android tablets where the codec i2c_client
+ * has been manually instantiated by x86_android_tablets.ko due
+ * to a broken DSDT.
+ */
+ codec_dev = bus_find_device_by_name(&i2c_bus_type, NULL,
+ BYT_RT5640_FALLBACK_CODEC_DEV_NAME);
+ if (!codec_dev)
+ return -EPROBE_DEFER;
+
+ if (!i2c_verify_client(codec_dev)) {
+ dev_err(dev, "Error '%s' is not an i2c_client\n",
+ BYT_RT5640_FALLBACK_CODEC_DEV_NAME);
+ put_device(codec_dev);
+ }
+
+ /* fixup codec name */
+ strscpy(byt_rt5640_codec_name, BYT_RT5640_FALLBACK_CODEC_DEV_NAME,
+ sizeof(byt_rt5640_codec_name));
+
+ /* bus_find_device() returns a reference no need to get() */
+ priv->codec_dev = codec_dev;
+ }
/*
* swap SSP0 if bytcr is detected
--
2.43.0
From: Hans de Goede <hdegoede(a)redhat.com>
[ Upstream commit d48696b915527b5bcdd207a299aec03fb037eb17 ]
On some x86 Bay Trail tablets which shipped with Android as factory OS,
the DSDT is so broken that the codec needs to be manually instantatiated
by the special x86-android-tablets.ko "fixup" driver for cases like this.
This means that the codec-dev cannot be retrieved through its ACPI fwnode,
add support to the bytcr_rt5640 machine driver for such manually
instantiated rt5640 i2c_clients.
An example of a tablet which needs this is the Vexia EDU ATLA 10 tablet,
which has been distributed to schools in the Spanish Andalucía region.
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Link: https://patch.msgid.link/20241024211615.79518-1-hdegoede@redhat.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/soc/intel/boards/bytcr_rt5640.c | 33 ++++++++++++++++++++++++---
1 file changed, 30 insertions(+), 3 deletions(-)
diff --git a/sound/soc/intel/boards/bytcr_rt5640.c b/sound/soc/intel/boards/bytcr_rt5640.c
index ff879e173d51d..7a57d7abd3803 100644
--- a/sound/soc/intel/boards/bytcr_rt5640.c
+++ b/sound/soc/intel/boards/bytcr_rt5640.c
@@ -17,6 +17,7 @@
#include <linux/acpi.h>
#include <linux/clk.h>
#include <linux/device.h>
+#include <linux/device/bus.h>
#include <linux/dmi.h>
#include <linux/gpio/consumer.h>
#include <linux/gpio/machine.h>
@@ -32,6 +33,8 @@
#include "../atom/sst-atom-controls.h"
#include "../common/soc-intel-quirks.h"
+#define BYT_RT5640_FALLBACK_CODEC_DEV_NAME "i2c-rt5640"
+
enum {
BYT_RT5640_DMIC1_MAP,
BYT_RT5640_DMIC2_MAP,
@@ -1687,9 +1690,33 @@ static int snd_byt_rt5640_mc_probe(struct platform_device *pdev)
codec_dev = acpi_get_first_physical_node(adev);
acpi_dev_put(adev);
- if (!codec_dev)
- return -EPROBE_DEFER;
- priv->codec_dev = get_device(codec_dev);
+
+ if (codec_dev) {
+ priv->codec_dev = get_device(codec_dev);
+ } else {
+ /*
+ * Special case for Android tablets where the codec i2c_client
+ * has been manually instantiated by x86_android_tablets.ko due
+ * to a broken DSDT.
+ */
+ codec_dev = bus_find_device_by_name(&i2c_bus_type, NULL,
+ BYT_RT5640_FALLBACK_CODEC_DEV_NAME);
+ if (!codec_dev)
+ return -EPROBE_DEFER;
+
+ if (!i2c_verify_client(codec_dev)) {
+ dev_err(dev, "Error '%s' is not an i2c_client\n",
+ BYT_RT5640_FALLBACK_CODEC_DEV_NAME);
+ put_device(codec_dev);
+ }
+
+ /* fixup codec name */
+ strscpy(byt_rt5640_codec_name, BYT_RT5640_FALLBACK_CODEC_DEV_NAME,
+ sizeof(byt_rt5640_codec_name));
+
+ /* bus_find_device() returns a reference no need to get() */
+ priv->codec_dev = codec_dev;
+ }
/*
* swap SSP0 if bytcr is detected
--
2.43.0
From: "Gustavo A. R. Silva" <gustavoars(a)kernel.org>
[ Upstream commit 57be3d3562ca4aa62b8047bc681028cc402af8ce ]
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
So, in order to avoid ending up with a flexible-array member in the
middle of multiple other structs, we use the `__struct_group()`
helper to create a new tagged `struct ieee80211_radiotap_header_fixed`.
This structure groups together all the members of the flexible
`struct ieee80211_radiotap_header` except the flexible array.
As a result, the array is effectively separated from the rest of the
members without modifying the memory layout of the flexible structure.
We then change the type of the middle struct members currently causing
trouble from `struct ieee80211_radiotap_header` to `struct
ieee80211_radiotap_header_fixed`.
We also want to ensure that in case new members need to be added to the
flexible structure, they are always included within the newly created
tagged struct. For this, we use `static_assert()`. This ensures that the
memory layout for both the flexible structure and the new tagged struct
is the same after any changes.
This approach avoids having to implement `struct ieee80211_radiotap_header_fixed`
as a completely separate structure, thus preventing having to maintain
two independent but basically identical structures, closing the door
to potential bugs in the future.
So, with these changes, fix the following warnings:
drivers/net/wireless/ath/wil6210/txrx.c:309:50: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/intel/ipw2x00/ipw2100.c:2521:50: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/intel/ipw2x00/ipw2200.h:1146:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/intel/ipw2x00/libipw.h:595:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/marvell/libertas/radiotap.h:34:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/marvell/libertas/radiotap.h:5:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/microchip/wilc1000/mon.c:10:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/microchip/wilc1000/mon.c:15:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/virtual/mac80211_hwsim.c:758:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/virtual/mac80211_hwsim.c:767:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars(a)kernel.org>
Link: https://patch.msgid.link/ZwBMtBZKcrzwU7l4@kspp
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wireless/ath/wil6210/txrx.c | 2 +-
drivers/net/wireless/intel/ipw2x00/ipw2100.c | 2 +-
drivers/net/wireless/intel/ipw2x00/ipw2200.h | 2 +-
.../net/wireless/marvell/libertas/radiotap.h | 4 +-
drivers/net/wireless/microchip/wilc1000/mon.c | 4 +-
drivers/net/wireless/virtual/mac80211_hwsim.c | 4 +-
include/net/ieee80211_radiotap.h | 43 +++++++++++--------
7 files changed, 33 insertions(+), 28 deletions(-)
diff --git a/drivers/net/wireless/ath/wil6210/txrx.c b/drivers/net/wireless/ath/wil6210/txrx.c
index f29ac6de71399..19702b6f09c32 100644
--- a/drivers/net/wireless/ath/wil6210/txrx.c
+++ b/drivers/net/wireless/ath/wil6210/txrx.c
@@ -306,7 +306,7 @@ static void wil_rx_add_radiotap_header(struct wil6210_priv *wil,
struct sk_buff *skb)
{
struct wil6210_rtap {
- struct ieee80211_radiotap_header rthdr;
+ struct ieee80211_radiotap_header_fixed rthdr;
/* fields should be in the order of bits in rthdr.it_present */
/* flags */
u8 flags;
diff --git a/drivers/net/wireless/intel/ipw2x00/ipw2100.c b/drivers/net/wireless/intel/ipw2x00/ipw2100.c
index 0812db8936f13..9e9ff0cb724ca 100644
--- a/drivers/net/wireless/intel/ipw2x00/ipw2100.c
+++ b/drivers/net/wireless/intel/ipw2x00/ipw2100.c
@@ -2520,7 +2520,7 @@ static void isr_rx_monitor(struct ipw2100_priv *priv, int i,
* to build this manually element by element, we can write it much
* more efficiently than we can parse it. ORDER MATTERS HERE */
struct ipw_rt_hdr {
- struct ieee80211_radiotap_header rt_hdr;
+ struct ieee80211_radiotap_header_fixed rt_hdr;
s8 rt_dbmsignal; /* signal in dbM, kluged to signed */
} *ipw_rt;
diff --git a/drivers/net/wireless/intel/ipw2x00/ipw2200.h b/drivers/net/wireless/intel/ipw2x00/ipw2200.h
index 8ebf09121e173..226286cb7eb82 100644
--- a/drivers/net/wireless/intel/ipw2x00/ipw2200.h
+++ b/drivers/net/wireless/intel/ipw2x00/ipw2200.h
@@ -1143,7 +1143,7 @@ struct ipw_prom_priv {
* structure is provided regardless of any bits unset.
*/
struct ipw_rt_hdr {
- struct ieee80211_radiotap_header rt_hdr;
+ struct ieee80211_radiotap_header_fixed rt_hdr;
u64 rt_tsf; /* TSF */ /* XXX */
u8 rt_flags; /* radiotap packet flags */
u8 rt_rate; /* rate in 500kb/s */
diff --git a/drivers/net/wireless/marvell/libertas/radiotap.h b/drivers/net/wireless/marvell/libertas/radiotap.h
index 1ed5608d353ff..d543bfe739dcb 100644
--- a/drivers/net/wireless/marvell/libertas/radiotap.h
+++ b/drivers/net/wireless/marvell/libertas/radiotap.h
@@ -2,7 +2,7 @@
#include <net/ieee80211_radiotap.h>
struct tx_radiotap_hdr {
- struct ieee80211_radiotap_header hdr;
+ struct ieee80211_radiotap_header_fixed hdr;
u8 rate;
u8 txpower;
u8 rts_retries;
@@ -31,7 +31,7 @@ struct tx_radiotap_hdr {
#define IEEE80211_FC_DSTODS 0x0300
struct rx_radiotap_hdr {
- struct ieee80211_radiotap_header hdr;
+ struct ieee80211_radiotap_header_fixed hdr;
u8 flags;
u8 rate;
u8 antsignal;
diff --git a/drivers/net/wireless/microchip/wilc1000/mon.c b/drivers/net/wireless/microchip/wilc1000/mon.c
index 03b7229a0ff5a..c3d27aaec2974 100644
--- a/drivers/net/wireless/microchip/wilc1000/mon.c
+++ b/drivers/net/wireless/microchip/wilc1000/mon.c
@@ -7,12 +7,12 @@
#include "cfg80211.h"
struct wilc_wfi_radiotap_hdr {
- struct ieee80211_radiotap_header hdr;
+ struct ieee80211_radiotap_header_fixed hdr;
u8 rate;
} __packed;
struct wilc_wfi_radiotap_cb_hdr {
- struct ieee80211_radiotap_header hdr;
+ struct ieee80211_radiotap_header_fixed hdr;
u8 rate;
u8 dump;
u16 tx_flags;
diff --git a/drivers/net/wireless/virtual/mac80211_hwsim.c b/drivers/net/wireless/virtual/mac80211_hwsim.c
index 07be0adc13ec5..d86a1bd7aab08 100644
--- a/drivers/net/wireless/virtual/mac80211_hwsim.c
+++ b/drivers/net/wireless/virtual/mac80211_hwsim.c
@@ -736,7 +736,7 @@ static const struct rhashtable_params hwsim_rht_params = {
};
struct hwsim_radiotap_hdr {
- struct ieee80211_radiotap_header hdr;
+ struct ieee80211_radiotap_header_fixed hdr;
__le64 rt_tsft;
u8 rt_flags;
u8 rt_rate;
@@ -745,7 +745,7 @@ struct hwsim_radiotap_hdr {
} __packed;
struct hwsim_radiotap_ack_hdr {
- struct ieee80211_radiotap_header hdr;
+ struct ieee80211_radiotap_header_fixed hdr;
u8 rt_flags;
u8 pad;
__le16 rt_channel;
diff --git a/include/net/ieee80211_radiotap.h b/include/net/ieee80211_radiotap.h
index 2338f8d2a8b33..c6cb6f6427423 100644
--- a/include/net/ieee80211_radiotap.h
+++ b/include/net/ieee80211_radiotap.h
@@ -24,25 +24,27 @@
* struct ieee80211_radiotap_header - base radiotap header
*/
struct ieee80211_radiotap_header {
- /**
- * @it_version: radiotap version, always 0
- */
- uint8_t it_version;
-
- /**
- * @it_pad: padding (or alignment)
- */
- uint8_t it_pad;
-
- /**
- * @it_len: overall radiotap header length
- */
- __le16 it_len;
-
- /**
- * @it_present: (first) present word
- */
- __le32 it_present;
+ __struct_group(ieee80211_radiotap_header_fixed, hdr, __packed,
+ /**
+ * @it_version: radiotap version, always 0
+ */
+ uint8_t it_version;
+
+ /**
+ * @it_pad: padding (or alignment)
+ */
+ uint8_t it_pad;
+
+ /**
+ * @it_len: overall radiotap header length
+ */
+ __le16 it_len;
+
+ /**
+ * @it_present: (first) present word
+ */
+ __le32 it_present;
+ );
/**
* @it_optional: all remaining presence bitmaps
@@ -50,6 +52,9 @@ struct ieee80211_radiotap_header {
__le32 it_optional[];
} __packed;
+static_assert(offsetof(struct ieee80211_radiotap_header, it_optional) == sizeof(struct ieee80211_radiotap_header_fixed),
+ "struct member likely outside of __struct_group()");
+
/* version is always 0 */
#define PKTHDR_RADIOTAP_VERSION 0
--
2.43.0
Memory access #VEs are hard for Linux to handle in contexts like the
entry code or NMIs. But other OSes need them for functionality.
There's a static (pre-guest-boot) way for a VMM to choose one or the
other. But VMMs don't always know which OS they are booting, so they
choose to deliver those #VEs so the "other" OSes will work. That,
unfortunately has left us in the lurch and exposed to these
hard-to-handle #VEs.
The TDX module has introduced a new feature. Even if the static
configuration is set to "send nasty #VEs", the kernel can dynamically
request that they be disabled. Once they are disabled, access to private
memory that is not in the Mapped state in the Secure-EPT (SEPT) will
result in an exit to the VMM rather than injecting a #VE.
Check if the feature is available and disable SEPT #VE if possible.
If the TD is allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE
attribute is no longer reliable. It reflects the initial state of the
control for the TD, but it will not be updated if someone (e.g. bootloader)
changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to
determine if SEPT #VEs are enabled or disabled.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Fixes: 373e715e31bf ("x86/tdx: Panic on bad configs that #VE on "private" memory access")
Cc: stable(a)vger.kernel.org
Acked-by: Kai Huang <kai.huang(a)intel.com>
---
arch/x86/coco/tdx/tdx.c | 76 ++++++++++++++++++++++++-------
arch/x86/include/asm/shared/tdx.h | 10 +++-
2 files changed, 69 insertions(+), 17 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 28b321a95a5e..a27230c44cc2 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -79,7 +79,7 @@ static inline void tdcall(u64 fn, struct tdx_module_args *args)
}
/* Read TD-scoped metadata */
-static inline u64 __maybe_unused tdg_vm_rd(u64 field, u64 *value)
+static inline u64 tdg_vm_rd(u64 field, u64 *value)
{
struct tdx_module_args args = {
.rdx = field,
@@ -194,6 +194,62 @@ static void __noreturn tdx_panic(const char *msg)
__tdx_hypercall(&args);
}
+/*
+ * The kernel cannot handle #VEs when accessing normal kernel memory. Ensure
+ * that no #VE will be delivered for accesses to TD-private memory.
+ *
+ * TDX 1.0 does not allow the guest to disable SEPT #VE on its own. The VMM
+ * controls if the guest will receive such #VE with TD attribute
+ * ATTR_SEPT_VE_DISABLE.
+ *
+ * Newer TDX modules allow the guest to control if it wants to receive SEPT
+ * violation #VEs.
+ *
+ * Check if the feature is available and disable SEPT #VE if possible.
+ *
+ * If the TD is allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE
+ * attribute is no longer reliable. It reflects the initial state of the
+ * control for the TD, but it will not be updated if someone (e.g. bootloader)
+ * changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to
+ * determine if SEPT #VEs are enabled or disabled.
+ */
+static void disable_sept_ve(u64 td_attr)
+{
+ const char *msg = "TD misconfiguration: SEPT #VE has to be disabled";
+ bool debug = td_attr & ATTR_DEBUG;
+ u64 config, controls;
+
+ /* Is this TD allowed to disable SEPT #VE */
+ tdg_vm_rd(TDCS_CONFIG_FLAGS, &config);
+ if (!(config & TDCS_CONFIG_FLEXIBLE_PENDING_VE)) {
+ /* No SEPT #VE controls for the guest: check the attribute */
+ if (td_attr & ATTR_SEPT_VE_DISABLE)
+ return;
+
+ /* Relax SEPT_VE_DISABLE check for debug TD for backtraces */
+ if (debug)
+ pr_warn("%s\n", msg);
+ else
+ tdx_panic(msg);
+ return;
+ }
+
+ /* Check if SEPT #VE has been disabled before us */
+ tdg_vm_rd(TDCS_TD_CTLS, &controls);
+ if (controls & TD_CTLS_PENDING_VE_DISABLE)
+ return;
+
+ /* Keep #VEs enabled for splats in debugging environments */
+ if (debug)
+ return;
+
+ /* Disable SEPT #VEs */
+ tdg_vm_wr(TDCS_TD_CTLS, TD_CTLS_PENDING_VE_DISABLE,
+ TD_CTLS_PENDING_VE_DISABLE);
+
+ return;
+}
+
static void tdx_setup(u64 *cc_mask)
{
struct tdx_module_args args = {};
@@ -219,24 +275,12 @@ static void tdx_setup(u64 *cc_mask)
gpa_width = args.rcx & GENMASK(5, 0);
*cc_mask = BIT_ULL(gpa_width - 1);
+ td_attr = args.rdx;
+
/* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL);
- /*
- * The kernel can not handle #VE's when accessing normal kernel
- * memory. Ensure that no #VE will be delivered for accesses to
- * TD-private memory. Only VMM-shared memory (MMIO) will #VE.
- */
- td_attr = args.rdx;
- if (!(td_attr & ATTR_SEPT_VE_DISABLE)) {
- const char *msg = "TD misconfiguration: SEPT_VE_DISABLE attribute must be set.";
-
- /* Relax SEPT_VE_DISABLE check for debug TD. */
- if (td_attr & ATTR_DEBUG)
- pr_warn("%s\n", msg);
- else
- tdx_panic(msg);
- }
+ disable_sept_ve(td_attr);
}
/*
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 7e12cfa28bec..fecb2a6e864b 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -19,9 +19,17 @@
#define TDG_VM_RD 7
#define TDG_VM_WR 8
-/* TDCS fields. To be used by TDG.VM.WR and TDG.VM.RD module calls */
+/* TDX TD-Scope Metadata. To be used by TDG.VM.WR and TDG.VM.RD */
+#define TDCS_CONFIG_FLAGS 0x1110000300000016
+#define TDCS_TD_CTLS 0x1110000300000017
#define TDCS_NOTIFY_ENABLES 0x9100000000000010
+/* TDCS_CONFIG_FLAGS bits */
+#define TDCS_CONFIG_FLEXIBLE_PENDING_VE BIT_ULL(1)
+
+/* TDCS_TD_CTLS bits */
+#define TD_CTLS_PENDING_VE_DISABLE BIT_ULL(0)
+
/* TDX hypercall Leaf IDs */
#define TDVMCALL_MAP_GPA 0x10001
#define TDVMCALL_GET_QUOTE 0x10002
--
2.45.2
Rename tdx_parse_tdinfo() to tdx_setup() and move setting NOTIFY_ENABLES
there.
The function will be extended to adjust TD configuration.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com>
Reviewed-by: Kai Huang <kai.huang(a)intel.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/coco/tdx/tdx.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index c74bb9e7d7a3..28b321a95a5e 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -194,7 +194,7 @@ static void __noreturn tdx_panic(const char *msg)
__tdx_hypercall(&args);
}
-static void tdx_parse_tdinfo(u64 *cc_mask)
+static void tdx_setup(u64 *cc_mask)
{
struct tdx_module_args args = {};
unsigned int gpa_width;
@@ -219,6 +219,9 @@ static void tdx_parse_tdinfo(u64 *cc_mask)
gpa_width = args.rcx & GENMASK(5, 0);
*cc_mask = BIT_ULL(gpa_width - 1);
+ /* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
+ tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL);
+
/*
* The kernel can not handle #VE's when accessing normal kernel
* memory. Ensure that no #VE will be delivered for accesses to
@@ -969,11 +972,11 @@ void __init tdx_early_init(void)
setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
cc_vendor = CC_VENDOR_INTEL;
- tdx_parse_tdinfo(&cc_mask);
- cc_set_mask(cc_mask);
- /* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
- tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL);
+ /* Configure the TD */
+ tdx_setup(&cc_mask);
+
+ cc_set_mask(cc_mask);
/*
* All bits above GPA width are reserved and kernel treats shared bit
--
2.45.2
The TDG_VM_WR TDCALL is used to ask the TDX module to change some
TD-specific VM configuration. There is currently only one user in the
kernel of this TDCALL leaf. More will be added shortly.
Refactor to make way for more users of TDG_VM_WR who will need to modify
other TD configuration values.
Add a wrapper for the TDG_VM_RD TDCALL that requests TD-specific
metadata from the TDX module. There are currently no users for
TDG_VM_RD. Mark it as __maybe_unused until the first user appears.
This is preparation for enumeration and enabling optional TD features.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reviewed-by: Kai Huang <kai.huang(a)intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/coco/tdx/tdx.c | 32 ++++++++++++++++++++++++++-----
arch/x86/include/asm/shared/tdx.h | 1 +
2 files changed, 28 insertions(+), 5 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 327c45c5013f..c74bb9e7d7a3 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -78,6 +78,32 @@ static inline void tdcall(u64 fn, struct tdx_module_args *args)
panic("TDCALL %lld failed (Buggy TDX module!)\n", fn);
}
+/* Read TD-scoped metadata */
+static inline u64 __maybe_unused tdg_vm_rd(u64 field, u64 *value)
+{
+ struct tdx_module_args args = {
+ .rdx = field,
+ };
+ u64 ret;
+
+ ret = __tdcall_ret(TDG_VM_RD, &args);
+ *value = args.r8;
+
+ return ret;
+}
+
+/* Write TD-scoped metadata */
+static inline u64 tdg_vm_wr(u64 field, u64 value, u64 mask)
+{
+ struct tdx_module_args args = {
+ .rdx = field,
+ .r8 = value,
+ .r9 = mask,
+ };
+
+ return __tdcall(TDG_VM_WR, &args);
+}
+
/**
* tdx_mcall_get_report0() - Wrapper to get TDREPORT0 (a.k.a. TDREPORT
* subtype 0) using TDG.MR.REPORT TDCALL.
@@ -929,10 +955,6 @@ static void tdx_kexec_finish(void)
void __init tdx_early_init(void)
{
- struct tdx_module_args args = {
- .rdx = TDCS_NOTIFY_ENABLES,
- .r9 = -1ULL,
- };
u64 cc_mask;
u32 eax, sig[3];
@@ -951,7 +973,7 @@ void __init tdx_early_init(void)
cc_set_mask(cc_mask);
/* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
- tdcall(TDG_VM_WR, &args);
+ tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL);
/*
* All bits above GPA width are reserved and kernel treats shared bit
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index fdfd41511b02..7e12cfa28bec 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -16,6 +16,7 @@
#define TDG_VP_VEINFO_GET 3
#define TDG_MR_REPORT 4
#define TDG_MEM_PAGE_ACCEPT 6
+#define TDG_VM_RD 7
#define TDG_VM_WR 8
/* TDCS fields. To be used by TDG.VM.WR and TDG.VM.RD module calls */
--
2.45.2
The reset line of the IT6505 bridge chip is active low, not active high.
It was incorrectly inverted in the device tree as the implementation at
the time incorrectly inverted the polarity in its driver, due to a prior
device having an inline inverting level shifter.
Fix the polarity now while the external display pipeline is incomplete,
thereby avoiding any impact to running systems.
A matching fix for the driver should be included if this change is
backported.
Fixes: 8855d01fb81f ("arm64: dts: mediatek: Add MT8186 Krabby platform based Tentacruel / Tentacool")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Chen-Yu Tsai <wenst(a)chromium.org>
---
The matching driver change can be found at
https://lore.kernel.org/all/20241029095411.657616-1-wenst@chromium.org/
arch/arm64/boot/dts/mediatek/mt8186-corsola.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/mediatek/mt8186-corsola.dtsi b/arch/arm64/boot/dts/mediatek/mt8186-corsola.dtsi
index e3b58641f2c9..43c83620e479 100644
--- a/arch/arm64/boot/dts/mediatek/mt8186-corsola.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8186-corsola.dtsi
@@ -422,7 +422,7 @@ it6505dptx: dp-bridge@5c {
#sound-dai-cells = <0>;
ovdd-supply = <&mt6366_vsim2_reg>;
pwr18-supply = <&pp1800_dpbrdg_dx>;
- reset-gpios = <&pio 177 GPIO_ACTIVE_HIGH>;
+ reset-gpios = <&pio 177 GPIO_ACTIVE_LOW>;
ports {
#address-cells = <1>;
--
2.47.0.163.g1226f6d8fa-goog
The quilt patch titled
Subject: lib: string_helpers: fix potential snprintf() output truncation
has been removed from the -mm tree. Its filename was
lib-string_helpers-fix-potential-snprintf-output-truncation.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
Subject: lib: string_helpers: fix potential snprintf() output truncation
Date: Mon, 21 Oct 2024 11:14:17 +0200
The output of ".%03u" with the unsigned int in range [0, 4294966295] may
get truncated if the target buffer is not 12 bytes.
Link: https://lkml.kernel.org/r/20241021091417.37796-1-brgl@bgdev.pl
Fixes: 3c9f3681d0b4 ("[SCSI] lib: add generic helper to print sizes rounded to the correct SI range")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
Reviewed-by: Andy Shevchenko <andy(a)kernel.org>
Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Kees Cook <kees(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/string_helpers.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/lib/string_helpers.c~lib-string_helpers-fix-potential-snprintf-output-truncation
+++ a/lib/string_helpers.c
@@ -57,7 +57,7 @@ int string_get_size(u64 size, u64 blk_si
static const unsigned int rounding[] = { 500, 50, 5 };
int i = 0, j;
u32 remainder = 0, sf_cap;
- char tmp[8];
+ char tmp[12];
const char *unit;
tmp[0] = '\0';
_
Patches currently in -mm which might be from bartosz.golaszewski(a)linaro.org are
From: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
commit e4d2102018542e3ae5e297bc6e229303abff8a0f upstream.
Robert Gill reported below #GP in 32-bit mode when dosemu software was
executing vm86() system call:
general protection fault: 0000 [#1] PREEMPT SMP
CPU: 4 PID: 4610 Comm: dosemu.bin Not tainted 6.6.21-gentoo-x86 #1
Hardware name: Dell Inc. PowerEdge 1950/0H723K, BIOS 2.7.0 10/30/2010
EIP: restore_all_switch_stack+0xbe/0xcf
EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ff8affdc
DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046
CR0: 80050033 CR2: 00c2101c CR3: 04b6d000 CR4: 000406d0
Call Trace:
show_regs+0x70/0x78
die_addr+0x29/0x70
exc_general_protection+0x13c/0x348
exc_bounds+0x98/0x98
handle_exception+0x14d/0x14d
exc_bounds+0x98/0x98
restore_all_switch_stack+0xbe/0xcf
exc_bounds+0x98/0x98
restore_all_switch_stack+0xbe/0xcf
This only happens in 32-bit mode when VERW based mitigations like MDS/RFDS
are enabled. This is because segment registers with an arbitrary user value
can result in #GP when executing VERW. Intel SDM vol. 2C documents the
following behavior for VERW instruction:
#GP(0) - If a memory operand effective address is outside the CS, DS, ES,
FS, or GS segment limit.
CLEAR_CPU_BUFFERS macro executes VERW instruction before returning to user
space. Use %cs selector to reference VERW operand. This ensures VERW will
not #GP for an arbitrary user %ds.
[ mingo: Fixed the SOB chain. ]
Fixes: a0e2dab44d22 ("x86/entry_32: Add VERW just before userspace transition")
Reported-by: Robert Gill <rtgill82(a)gmail.com>
Reviewed-by: Andrew Cooper <andrew.cooper3(a)citrix.com>
Cc: stable(a)vger.kernel.org # 5.10+
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218707
Closes: https://lore.kernel.org/all/8c77ccfd-d561-45a1-8ed5-6b75212c7a58@leemhuis.i…
Suggested-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Suggested-by: Brian Gerst <brgerst(a)gmail.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
[xiongfeng: fix conflicts caused by the runtime patch jmp]
Signed-off-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
---
arch/x86/include/asm/nospec-branch.h | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 87e1ff064025..7978d5fe1ce6 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -199,7 +199,16 @@
*/
.macro CLEAR_CPU_BUFFERS
ALTERNATIVE "jmp .Lskip_verw_\@", "", X86_FEATURE_CLEAR_CPU_BUF
- verw _ASM_RIP(mds_verw_sel)
+#ifdef CONFIG_X86_64
+ verw mds_verw_sel(%rip)
+#else
+ /*
+ * In 32bit mode, the memory operand must be a %cs reference. The data
+ * segments may not be usable (vm86 mode), and the stack segment may not
+ * be flat (ESPFIX32).
+ */
+ verw %cs:mds_verw_sel
+#endif
.Lskip_verw_\@:
.endm
--
2.20.1
The quilt patch titled
Subject: mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
has been removed from the -mm tree. Its filename was
mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yu Zhao <yuzhao(a)google.com>
Subject: mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
Date: Sat, 19 Oct 2024 01:29:38 +0000
Patch series "mm: multi-gen LRU: Have secondary MMUs participate in
MM_WALK".
Today, the MM_WALK capability causes MGLRU to clear the young bit from
PMDs and PTEs during the page table walk before eviction, but MGLRU does
not call the clear_young() MMU notifier in this case. By not calling this
notifier, the MM walk takes less time/CPU, but it causes pages that are
accessed mostly through KVM / secondary MMUs to appear younger than they
should be.
We do call the clear_young() notifier today, but only when attempting to
evict the page, so we end up clearing young/accessed information less
frequently for secondary MMUs than for mm PTEs, and therefore they appear
younger and are less likely to be evicted. Therefore, memory that is
*not* being accessed mostly by KVM will be evicted *more* frequently,
worsening performance.
ChromeOS observed a tab-open latency regression when enabling MGLRU with a
setup that involved running a VM:
Tab-open latency histogram (ms)
Version p50 mean p95 p99 max
base 1315 1198 2347 3454 10319
mglru 2559 1311 7399 12060 43758
fix 1119 926 2470 4211 6947
This series replaces the final non-selftest patchs from this series[1],
which introduced a similar change (and a new MMU notifier) with KVM
optimizations. I'll send a separate series (to Sean and Paolo) for the
KVM optimizations.
This series also makes proactive reclaim with MGLRU possible for KVM
memory. I have verified that this functions correctly with the selftest
from [1], but given that that test is a KVM selftest, I'll send it with
the rest of the KVM optimizations later. Andrew, let me know if you'd
like to take the test now anyway.
[1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google…
This patch (of 2):
The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful
and become more complicated to properly compute when adding
test/clear_young() notifiers in MGLRU's mm walk.
Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com
Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: James Houghton <jthoughton(a)google.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Matlack <dmatlack(a)google.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: David Stevens <stevensd(a)google.com>
Cc: Oliver Upton <oliver.upton(a)linux.dev>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mmzone.h | 2 --
mm/vmscan.c | 14 +++++---------
2 files changed, 5 insertions(+), 11 deletions(-)
--- a/include/linux/mmzone.h~mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats
+++ a/include/linux/mmzone.h
@@ -458,9 +458,7 @@ struct lru_gen_folio {
enum {
MM_LEAF_TOTAL, /* total leaf entries */
- MM_LEAF_OLD, /* old leaf entries */
MM_LEAF_YOUNG, /* young leaf entries */
- MM_NONLEAF_TOTAL, /* total non-leaf entries */
MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */
MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */
NR_MM_STATS
--- a/mm/vmscan.c~mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats
+++ a/mm/vmscan.c
@@ -3399,7 +3399,6 @@ restart:
continue;
if (!pte_young(ptent)) {
- walk->mm_stats[MM_LEAF_OLD]++;
continue;
}
@@ -3552,7 +3551,6 @@ restart:
walk->mm_stats[MM_LEAF_TOTAL]++;
if (!pmd_young(val)) {
- walk->mm_stats[MM_LEAF_OLD]++;
continue;
}
@@ -3564,8 +3562,6 @@ restart:
continue;
}
- walk->mm_stats[MM_NONLEAF_TOTAL]++;
-
if (!walk->force_scan && should_clear_pmd_young()) {
if (!pmd_young(val))
continue;
@@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct
for (tier = 0; tier < MAX_NR_TIERS; tier++) {
seq_printf(m, " %10d", tier);
for (type = 0; type < ANON_AND_FILE; type++) {
- const char *s = " ";
+ const char *s = "xxx";
unsigned long n[3] = {};
if (seq == max_seq) {
- s = "RT ";
+ s = "RTx";
n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]);
n[1] = READ_ONCE(lrugen->avg_total[type][tier]);
} else if (seq == min_seq[type] || NR_HIST_GENS > 1) {
@@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct
seq_puts(m, " ");
for (i = 0; i < NR_MM_STATS; i++) {
- const char *s = " ";
+ const char *s = "xxxx";
unsigned long n = 0;
if (seq == max_seq && NR_HIST_GENS == 1) {
- s = "LOYNFA";
+ s = "TYFA";
n = READ_ONCE(mm_state->stats[hist][i]);
} else if (seq != max_seq && NR_HIST_GENS > 1) {
- s = "loynfa";
+ s = "tyfa";
n = READ_ONCE(mm_state->stats[hist][i]);
}
_
Patches currently in -mm which might be from yuzhao(a)google.com are
mm-page_alloc-keep-track-of-free-highatomic.patch
On 01/11/2024 19:29, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> dt-bindings: gpu: Convert Samsung Image Rotator to dt-schema
>
> to the 5.4-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> dt-bindings-gpu-convert-samsung-image-rotator-to-dt-.patch
> and it can be found in the queue-5.4 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 25cb1f1f53fe137aefdc5e54bb1392098c4200ed
> Author: Maciej Falkowski <m.falkowski(a)samsung.com>
> Date: Tue Sep 17 12:37:27 2019 +0200
>
> dt-bindings: gpu: Convert Samsung Image Rotator to dt-schema
>
> [ Upstream commit 6e3ffcd592060403ee2d956c9b1704775898db79 ]
>
> Convert Samsung Image Rotator to newer dt-schema format.
>
> Signed-off-by: Maciej Falkowski <m.falkowski(a)samsung.com>
> Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
> Signed-off-by: Rob Herring <robh(a)kernel.org>
> Stable-dep-of: 338c4d3902fe ("igb: Disable threaded IRQ for igb_msix_other")
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
How is a binding conversion a dep for a one line driver patch that
doesn't parse any new properties?
Hello,
I would like to propose backporting these two commits into stable:
* 663f295e3559 ("smb: client: fix parsing of device numbers")
* a9de67336a4a ("smb: client: set correct device number on nfs reparse points")
Linux SMB client without these two recent fixes swaps device major and
minor numbers, which makes basically char/block device nodes unusable.
Commit 663f295e3559 ("smb: client: fix parsing of device numbers")
should have had following Fixes line:
Fixes: 45e724022e27 ("smb: client: set correct file type from NFS reparse points")
And commit a9de67336a4a ("smb: client: set correct device number on nfs
reparse points") should have contained line:
Fixes: 102466f303ff ("smb: client: allow creating special files via reparse points")
Pali
From: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
The output of ".%03u" with the unsigned int in range [0, 4294966295] may
get truncated if the target buffer is not 12 bytes. This can't really
happen here as the 'remainder' variable cannot exceed 999 but the
compiler doesn't know it. To make it happy just increase the buffer to
where the warning goes away.
Fixes: 3c9f3681d0b4 ("[SCSI] lib: add generic helper to print sizes rounded to the correct SI range")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
Reviewed-by: Andy Shevchenko <andy(a)kernel.org>
Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Kees Cook <kees(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Changes in v2:
- improve the commit message
lib/string_helpers.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/string_helpers.c b/lib/string_helpers.c
index 4f887aa62fa0..91fa37b5c510 100644
--- a/lib/string_helpers.c
+++ b/lib/string_helpers.c
@@ -57,7 +57,7 @@ int string_get_size(u64 size, u64 blk_size, const enum string_size_units units,
static const unsigned int rounding[] = { 500, 50, 5 };
int i = 0, j;
u32 remainder = 0, sf_cap;
- char tmp[8];
+ char tmp[12];
const char *unit;
tmp[0] = '\0';
--
2.45.2
This patch series is to fix bugs for below 2 APIs:
pci_epc_destroy()
pci_epc_remove_epf()
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
---
Zijun Hu (2):
PCI: endpoint: Fix API pci_epc_destroy() releasing domain_nr ID faults
PCI: endpoint: Fix that API pci_epc_remove_epf() cleans up wrong EPC of EPF
drivers/pci/endpoint/pci-epc-core.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
---
base-commit: 11066801dd4b7c4d75fce65c812723a80c1481ae
change-id: 20241102-epc_rfc-e1d9d03d5101
Best regards,
--
Zijun Hu <quic_zijuhu(a)quicinc.com>
In an event where the platform running the device control plane
is rebooted, reset is detected on the driver. It releases
all the resources and waits for the reset to complete. Once the
reset is done, it tries to build the resources back. At this
time if the device control plane is not yet started, then
the driver timeouts on the virtchnl message and retries to
establish the mailbox again.
In the retry flow, mailbox is deinitialized but the mailbox
workqueue is still alive and polling for the mailbox message.
This results in accessing the released control queue leading to
null-ptr-deref. Fix it by unrolling the work queue cancellation
and mailbox deinitialization in the reverse order which they got
initialized.
Fixes: 4930fbf419a7 ("idpf: add core init and interrupt request")
Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager")
Cc: stable(a)vger.kernel.org # 6.9+
Reviewed-by: Tarun K Singh <tarun.k.singh(a)intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga(a)intel.com>
---
v2:
- remove changes which are not fixes for the actual issue from this patch
---
drivers/net/ethernet/intel/idpf/idpf_lib.c | 1 +
drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 1 -
2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
index c3848e10e7db..b4fbb99bfad2 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
@@ -1786,6 +1786,7 @@ static int idpf_init_hard_reset(struct idpf_adapter *adapter)
*/
err = idpf_vc_core_init(adapter);
if (err) {
+ cancel_delayed_work_sync(&adapter->mbx_task);
idpf_deinit_dflt_mbx(adapter);
goto unlock_mutex;
}
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index 3be883726b87..e7eee571d908 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -3070,7 +3070,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
adapter->state = __IDPF_VER_CHECK;
if (adapter->vcxn_mngr)
idpf_vc_xn_shutdown(adapter->vcxn_mngr);
- idpf_deinit_dflt_mbx(adapter);
set_bit(IDPF_HR_DRV_LOAD, adapter->flags);
queue_delayed_work(adapter->vc_event_wq, &adapter->vc_event_task,
msecs_to_jiffies(task_delay));
--
2.43.0
If the KVP (or VSS) daemon starts before the VMBus channel's ringbuffer is
fully initialized, we can hit the panic below:
hv_utils: Registering HyperV Utility Driver
hv_vmbus: registering driver hv_utils
...
BUG: kernel NULL pointer dereference, address: 0000000000000000
CPU: 44 UID: 0 PID: 2552 Comm: hv_kvp_daemon Tainted: G E 6.11.0-rc3+ #1
RIP: 0010:hv_pkt_iter_first+0x12/0xd0
Call Trace:
...
vmbus_recvpacket
hv_kvp_onchannelcallback
vmbus_on_event
tasklet_action_common
tasklet_action
handle_softirqs
irq_exit_rcu
sysvec_hyperv_stimer0
</IRQ>
<TASK>
asm_sysvec_hyperv_stimer0
...
kvp_register_done
hvt_op_read
vfs_read
ksys_read
__x64_sys_read
This can happen because the KVP/VSS channel callback can be invoked
even before the channel is fully opened:
1) as soon as hv_kvp_init() -> hvutil_transport_init() creates
/dev/vmbus/hv_kvp, the kvp daemon can open the device file immediately and
register itself to the driver by writing a message KVP_OP_REGISTER1 to the
file (which is handled by kvp_on_msg() ->kvp_handle_handshake()) and
reading the file for the driver's response, which is handled by
hvt_op_read(), which calls hvt->on_read(), i.e. kvp_register_done().
2) the problem with kvp_register_done() is that it can cause the
channel callback to be called even before the channel is fully opened,
and when the channel callback is starting to run, util_probe()->
vmbus_open() may have not initialized the ringbuffer yet, so the
callback can hit the panic of NULL pointer dereference.
To reproduce the panic consistently, we can add a "ssleep(10)" for KVP in
__vmbus_open(), just before the first hv_ringbuffer_init(), and then we
unload and reload the driver hv_utils, and run the daemon manually within
the 10 seconds.
Fix the panic by checking the channel state in the channel callback.
To avoid the race condition with __vmbus_open(), we disable and enable
the channel callback temporarily in __vmbus_open().
The channel callbacks of the other VMBus devices don't need to check
the channel state since they can't run before the channels are fully
initialized.
Note: we would also need to fix the fcopy driver code, but that has
been removed in commit ec314f61e4fc ("Drivers: hv: Remove fcopy driver") in
the mainline kernel since v6.10. For old 6.x LTS kernels, and the 5.x
and 4.x LTS kernels, the fcopy driver needs to be fixed when the
fix is backported to the stable kernel branches.
Fixes: e0fa3e5e7df6 ("Drivers: hv: utils: fix a race on userspace daemons registration")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
---
drivers/hv/channel.c | 11 +++++++++++
drivers/hv/hv_kvp.c | 3 +++
drivers/hv/hv_snapshot.c | 3 +++
3 files changed, 17 insertions(+)
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index fb8cd8469328..685e407a3fdf 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -657,6 +657,14 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
return -ENOMEM;
}
+ /*
+ * The channel callbacks of KVP/VSS may run before __vmbus_open()
+ * finishes (see kvp_register_done() -> ... -> kvp_poll_wrapper()), so
+ * they check newchannel->state to tell the ringbuffer has been fully
+ * initialized or not. Disable and enable the tasklet to avoid the race.
+ */
+ tasklet_disable(&newchannel->callback_event);
+
newchannel->state = CHANNEL_OPENING_STATE;
newchannel->onchannel_callback = onchannelcallback;
newchannel->channel_callback_context = context;
@@ -750,6 +758,8 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
}
newchannel->state = CHANNEL_OPENED_STATE;
+ tasklet_enable(&newchannel->callback_event);
+
kfree(open_info);
return 0;
@@ -766,6 +776,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
hv_ringbuffer_cleanup(&newchannel->inbound);
vmbus_free_requestor(&newchannel->requestor);
newchannel->state = CHANNEL_OPEN_STATE;
+ tasklet_enable(&newchannel->callback_event);
return err;
}
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index d35b60c06114..ec098067e579 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -662,6 +662,9 @@ void hv_kvp_onchannelcallback(void *context)
if (kvp_transaction.state > HVUTIL_READY)
return;
+ if (channel->state != CHANNEL_OPENED_STATE)
+ return;
+
if (vmbus_recvpacket(channel, recv_buffer, HV_HYP_PAGE_SIZE * 4, &recvlen, &requestid)) {
pr_err_ratelimited("KVP request received. Could not read into recv buf\n");
return;
diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c
index 0d2184be1691..f7924c2fc62e 100644
--- a/drivers/hv/hv_snapshot.c
+++ b/drivers/hv/hv_snapshot.c
@@ -301,6 +301,9 @@ void hv_vss_onchannelcallback(void *context)
if (vss_transaction.state > HVUTIL_READY)
return;
+ if (channel->state != CHANNEL_OPENED_STATE)
+ return;
+
if (vmbus_recvpacket(channel, recv_buffer, VSS_MAX_PKT_SIZE, &recvlen, &requestid)) {
pr_err_ratelimited("VSS request received. Could not read into recv buf\n");
return;
--
2.25.1
Setting TPM_CHIP_FLAG_SUSPENDED in the end of tpm_pm_suspend() can be racy
according, as this leaves window for tpm_hwrng_read() to be called while
the operation is in progress. The recent bug report gives also evidence of
this behaviour.
Aadress this by locking the TPM chip before checking any chip->flags both
in tpm_pm_suspend() and tpm_hwrng_read(). Move TPM_CHIP_FLAG_SUSPENDED
check inside tpm_get_random() so that it will be always checked only when
the lock is reserved.
Cc: stable(a)vger.kernel.org # v6.4+
Fixes: 99d464506255 ("tpm: Prevent hwrng from activating during resume")
Reported-by: Mike Seo <mikeseohyungjin(a)gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219383
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v3:
- Check TPM_CHIP_FLAG_SUSPENDED inside tpm_get_random() so that it is
also done under the lock (suggested by Jerry Snitselaar).
v2:
- Addressed my own remark:
https://lore.kernel.org/linux-integrity/D59JAI6RR2CD.G5E5T4ZCZ49W@kernel.or…
---
drivers/char/tpm/tpm-chip.c | 4 ----
drivers/char/tpm/tpm-interface.c | 32 ++++++++++++++++++++++----------
2 files changed, 22 insertions(+), 14 deletions(-)
diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index 1ff99a7091bb..7df7abaf3e52 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -525,10 +525,6 @@ static int tpm_hwrng_read(struct hwrng *rng, void *data, size_t max, bool wait)
{
struct tpm_chip *chip = container_of(rng, struct tpm_chip, hwrng);
- /* Give back zero bytes, as TPM chip has not yet fully resumed: */
- if (chip->flags & TPM_CHIP_FLAG_SUSPENDED)
- return 0;
-
return tpm_get_random(chip, data, max);
}
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 8134f002b121..b1daa0d7b341 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -370,6 +370,13 @@ int tpm_pm_suspend(struct device *dev)
if (!chip)
return -ENODEV;
+ rc = tpm_try_get_ops(chip);
+ if (rc) {
+ /* Can be safely set out of locks, as no action cannot race: */
+ chip->flags |= TPM_CHIP_FLAG_SUSPENDED;
+ goto out;
+ }
+
if (chip->flags & TPM_CHIP_FLAG_ALWAYS_POWERED)
goto suspended;
@@ -377,21 +384,19 @@ int tpm_pm_suspend(struct device *dev)
!pm_suspend_via_firmware())
goto suspended;
- rc = tpm_try_get_ops(chip);
- if (!rc) {
- if (chip->flags & TPM_CHIP_FLAG_TPM2) {
- tpm2_end_auth_session(chip);
- tpm2_shutdown(chip, TPM2_SU_STATE);
- } else {
- rc = tpm1_pm_suspend(chip, tpm_suspend_pcr);
- }
-
- tpm_put_ops(chip);
+ if (chip->flags & TPM_CHIP_FLAG_TPM2) {
+ tpm2_end_auth_session(chip);
+ tpm2_shutdown(chip, TPM2_SU_STATE);
+ goto suspended;
}
+ rc = tpm1_pm_suspend(chip, tpm_suspend_pcr);
+
suspended:
chip->flags |= TPM_CHIP_FLAG_SUSPENDED;
+ tpm_put_ops(chip);
+out:
if (rc)
dev_err(dev, "Ignoring error %d while suspending\n", rc);
return 0;
@@ -440,11 +445,18 @@ int tpm_get_random(struct tpm_chip *chip, u8 *out, size_t max)
if (!chip)
return -ENODEV;
+ /* Give back zero bytes, as TPM chip has not yet fully resumed: */
+ if (chip->flags & TPM_CHIP_FLAG_SUSPENDED) {
+ rc = 0;
+ goto out;
+ }
+
if (chip->flags & TPM_CHIP_FLAG_TPM2)
rc = tpm2_get_random(chip, out, max);
else
rc = tpm1_get_random(chip, out, max);
+out:
tpm_put_ops(chip);
return rc;
}
--
2.47.0
The patch titled
Subject: ucounts: fix counter leak in inc_rlimit_get_ucounts()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ucounts-fix-counter-leak-in-inc_rlimit_get_ucounts.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Andrei Vagin <avagin(a)google.com>
Subject: ucounts: fix counter leak in inc_rlimit_get_ucounts()
Date: Fri, 1 Nov 2024 19:19:40 +0000
The inc_rlimit_get_ucounts() increments the specified rlimit counter and
then checks its limit. If the value exceeds the limit, the function
returns an error without decrementing the counter.
Link: https://lkml.kernel.org/r/20241101191940.3211128-1-roman.gushchin@linux.dev
Fixes: 15bc01effefe ("ucounts: Fix signal ucount refcounting")
Signed-off-by: Andrei Vagin <avagin(a)google.com>
Co-developed-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Tested-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Acked-by: Alexey Gladkov <legion(a)kernel.org>
Cc: Kees Cook <kees(a)kernel.org>
Cc: Andrei Vagin <avagin(a)google.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Alexey Gladkov <legion(a)kernel.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/ucount.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/kernel/ucount.c~ucounts-fix-counter-leak-in-inc_rlimit_get_ucounts
+++ a/kernel/ucount.c
@@ -318,7 +318,7 @@ long inc_rlimit_get_ucounts(struct ucoun
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
long new = atomic_long_add_return(1, &iter->rlimit[type]);
if (new < 0 || (!override_rlimit && (new > max)))
- goto unwind;
+ goto dec_unwind;
if (iter == ucounts)
ret = new;
max = get_userns_rlimit_max(iter->ns, type);
@@ -335,7 +335,6 @@ long inc_rlimit_get_ucounts(struct ucoun
dec_unwind:
dec = atomic_long_sub_return(1, &iter->rlimit[type]);
WARN_ON_ONCE(dec < 0);
-unwind:
do_dec_rlimit_put_ucounts(ucounts, iter, type);
return 0;
}
_
Patches currently in -mm which might be from avagin(a)google.com are
ucounts-fix-counter-leak-in-inc_rlimit_get_ucounts.patch
From: Andrei Vagin <avagin(a)google.com>
The inc_rlimit_get_ucounts() increments the specified rlimit counter and
then checks its limit. If the value exceeds the limit, the function
returns an error without decrementing the counter.
v2: changed the goto label name [Roman]
Fixes: 15bc01effefe ("ucounts: Fix signal ucount refcounting")
Signed-off-by: Andrei Vagin <avagin(a)google.com>
Co-developed-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Tested-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Acked-by: Alexey Gladkov <legion(a)kernel.org>
Cc: Kees Cook <kees(a)kernel.org>
Cc: Andrei Vagin <avagin(a)google.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Alexey Gladkov <legion(a)kernel.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
---
kernel/ucount.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/kernel/ucount.c b/kernel/ucount.c
index 8c07714ff27d..9469102c5ac0 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -317,7 +317,7 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type)
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
long new = atomic_long_add_return(1, &iter->rlimit[type]);
if (new < 0 || new > max)
- goto unwind;
+ goto dec_unwind;
if (iter == ucounts)
ret = new;
max = get_userns_rlimit_max(iter->ns, type);
@@ -334,7 +334,6 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type)
dec_unwind:
dec = atomic_long_sub_return(1, &iter->rlimit[type]);
WARN_ON_ONCE(dec < 0);
-unwind:
do_dec_rlimit_put_ucounts(ucounts, iter, type);
return 0;
}
--
2.47.0.163.g1226f6d8fa-goog
Syzkaller reported a hung task with uevent_show() on stack trace. That
specific issue was addressed by another commit [0], but even with that
fix applied (for example, running v6.12-rc5) we face another type of hung
task that comes from the same reproducer [1]. By investigating that, we
could narrow it to the following path:
(a) Syzkaller emulates a Realtek USB WiFi adapter using raw-gadget and
dummy_hcd infrastructure.
(b) During the probe of rtl8192cu, the driver ends-up performing an efuse
read procedure (which is related to EEPROM load IIUC), and here lies the
issue: the function read_efuse() calls read_efuse_byte() many times, as
loop iterations depending on the efuse size (in our example, 512 in total).
This procedure for reading efuse bytes relies in a loop that performs an
I/O read up to *10k* times in case of failures. We measured the time of
the loop inside read_efuse_byte() alone, and in this reproducer (which
involves the dummy_hcd emulation layer), it takes 15 seconds each. As a
consequence, we have the driver stuck in its probe routine for big time,
exposing a stack trace like below if we attempt to reboot the system, for
example:
task:kworker/0:3 state:D stack:0 pid:662 tgid:662 ppid:2 flags:0x00004000
Workqueue: usb_hub_wq hub_event
Call Trace:
__schedule+0xe22/0xeb6
schedule_timeout+0xe7/0x132
__wait_for_common+0xb5/0x12e
usb_start_wait_urb+0xc5/0x1ef
? usb_alloc_urb+0x95/0xa4
usb_control_msg+0xff/0x184
_usbctrl_vendorreq_sync+0xa0/0x161
_usb_read_sync+0xb3/0xc5
read_efuse_byte+0x13c/0x146
read_efuse+0x351/0x5f0
efuse_read_all_map+0x42/0x52
rtl_efuse_shadow_map_update+0x60/0xef
rtl_get_hwinfo+0x5d/0x1c2
rtl92cu_read_eeprom_info+0x10a/0x8d5
? rtl92c_read_chip_version+0x14f/0x17e
rtl_usb_probe+0x323/0x851
usb_probe_interface+0x278/0x34b
really_probe+0x202/0x4a4
__driver_probe_device+0x166/0x1b2
driver_probe_device+0x2f/0xd8
[...]
We propose hereby to drastically reduce the attempts of doing the I/O reads
in case of failures, restricted to USB devices (given that they're inherently
slower than PCIe ones). By retrying up to 10 times (instead of 10000), we
got reponsiveness in the reproducer, while seems reasonable to believe
that there's no sane USB device implementation in the field requiring this
amount of retries at every I/O read in order to properly work. Based on
that assumption, it'd be good to have it backported to stable but maybe not
since driver implementation (the 10k number comes from day 0), perhaps up
to 6.x series makes sense.
[0] Commit 15fffc6a5624 ("driver core: Fix uevent_show() vs driver detach race").
[1] A note about that: this syzkaller report presents multiple reproducers
that differs by the type of emulated USB device. For this specific case,
check the entry from 2024/08/08 06:23 in the list of crashes; the C repro
is available at https://syzkaller.appspot.com/text?tag=ReproC&x=1521fc83980000.
Cc: stable(a)vger.kernel.org # v6.1+
Reported-by: syzbot+edd9fe0d3a65b14588d5(a)syzkaller.appspotmail.com
Signed-off-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
---
V3:
- Switch variable declaration to reverse xmas tree order
(thanks Ping-Ke Shih for the suggestion).
V2:
- Restrict the change to USB device only (thanks Ping-Ke Shih).
- Tested in 2 USB devices by Bitterblue Smith - thanks a lot!
V1: https://lore.kernel.org/lkml/20241025150226.896613-1-gpiccoli@igalia.com/
drivers/net/wireless/realtek/rtlwifi/efuse.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtlwifi/efuse.c b/drivers/net/wireless/realtek/rtlwifi/efuse.c
index 82cf5fb5175f..0ff553f650f9 100644
--- a/drivers/net/wireless/realtek/rtlwifi/efuse.c
+++ b/drivers/net/wireless/realtek/rtlwifi/efuse.c
@@ -162,9 +162,19 @@ void efuse_write_1byte(struct ieee80211_hw *hw, u16 address, u8 value)
void read_efuse_byte(struct ieee80211_hw *hw, u16 _offset, u8 *pbuf)
{
struct rtl_priv *rtlpriv = rtl_priv(hw);
+ u16 retry, max_attempts;
u32 value32;
u8 readbyte;
- u16 retry;
+
+ /*
+ * In case of USB devices, transfer speeds are limited, hence
+ * efuse I/O reads could be (way) slower. So, decrease (a lot)
+ * the read attempts in case of failures.
+ */
+ if (rtlpriv->rtlhal.interface == INTF_PCI)
+ max_attempts = 10000;
+ else
+ max_attempts = 10;
rtl_write_byte(rtlpriv, rtlpriv->cfg->maps[EFUSE_CTRL] + 1,
(_offset & 0xff));
@@ -178,7 +188,7 @@ void read_efuse_byte(struct ieee80211_hw *hw, u16 _offset, u8 *pbuf)
retry = 0;
value32 = rtl_read_dword(rtlpriv, rtlpriv->cfg->maps[EFUSE_CTRL]);
- while (!(((value32 >> 24) & 0xff) & 0x80) && (retry < 10000)) {
+ while (!(((value32 >> 24) & 0xff) & 0x80) && (retry < max_attempts)) {
value32 = rtl_read_dword(rtlpriv,
rtlpriv->cfg->maps[EFUSE_CTRL]);
retry++;
--
2.46.2
The patch titled
Subject: mm: fix __wp_page_copy_user fallback path for remote mm
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-__wp_page_copy_user-fallback-path-for-remote-mm.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Asahi Lina <lina(a)asahilina.net>
Subject: mm: fix __wp_page_copy_user fallback path for remote mm
Date: Fri, 01 Nov 2024 21:08:02 +0900
If the source page is a PFN mapping, we copy back from userspace.
However, if this fault is a remote access, we cannot use
__copy_from_user_inatomic. Instead, use access_remote_vm() in this case.
Fixes WARN and incorrect zero-filling when writing to CoW mappings in
a remote process, such as when using gdb on a binary present on a DAX
filesystem.
[ 143.683782] ------------[ cut here ]------------
[ 143.683784] WARNING: CPU: 1 PID: 350 at mm/memory.c:2904 __wp_page_copy_user+0x120/0x2bc
[ 143.683793] CPU: 1 PID: 350 Comm: gdb Not tainted 6.6.52 #1
[ 143.683794] Hardware name: linux,dummy-virt (DT)
[ 143.683795] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 143.683796] pc : __wp_page_copy_user+0x120/0x2bc
[ 143.683798] lr : __wp_page_copy_user+0x254/0x2bc
[ 143.683799] sp : ffff80008272b8b0
[ 143.683799] x29: ffff80008272b8b0 x28: 0000000000000000 x27: ffff000083bad580
[ 143.683801] x26: 0000000000000000 x25: 0000fffff7fd5000 x24: ffff000081db04c0
[ 143.683802] x23: ffff00014f24b000 x22: fffffc00053c92c0 x21: ffff000083502150
[ 143.683803] x20: 0000fffff7fd5000 x19: ffff80008272b9d0 x18: 0000000000000000
[ 143.683804] x17: ffff000081db0500 x16: ffff800080fe52a0 x15: 0000fffff7fd5000
[ 143.683804] x14: 0000000000bb1845 x13: 0000000000000080 x12: ffff80008272b880
[ 143.683805] x11: ffff000081d13600 x10: ffff000081d13608 x9 : ffff000081d1360c
[ 143.683806] x8 : ffff000083a16f00 x7 : 0000000000000010 x6 : ffff00014f24b000
[ 143.683807] x5 : ffff00014f24c000 x4 : 0000000000000000 x3 : ffff000083582000
[ 143.683807] x2 : 0000000000000f80 x1 : 0000fffff7fd5000 x0 : 0000000000001000
[ 143.683808] Call trace:
[ 143.683809] __wp_page_copy_user+0x120/0x2bc
[ 143.683810] wp_page_copy+0x98/0x5c0
[ 143.683813] do_wp_page+0x250/0x530
[ 143.683814] __handle_mm_fault+0x278/0x284
[ 143.683817] handle_mm_fault+0x64/0x1e8
[ 143.683819] faultin_page+0x5c/0x110
[ 143.683820] __get_user_pages+0xc8/0x2f4
[ 143.683821] get_user_pages_remote+0xac/0x30c
[ 143.683823] __access_remote_vm+0xb4/0x368
[ 143.683824] access_remote_vm+0x10/0x1c
[ 143.683826] mem_rw.isra.0+0xc4/0x218
[ 143.683831] mem_write+0x18/0x24
[ 143.683831] vfs_write+0xa0/0x37c
[ 143.683834] ksys_pwrite64+0x7c/0xc0
[ 143.683834] __arm64_sys_pwrite64+0x20/0x2c
[ 143.683835] invoke_syscall+0x48/0x10c
[ 143.683837] el0_svc_common.constprop.0+0x40/0xe0
[ 143.683839] do_el0_svc+0x1c/0x28
[ 143.683841] el0_svc+0x3c/0xdc
[ 143.683846] el0t_64_sync_handler+0x120/0x12c
[ 143.683848] el0t_64_sync+0x194/0x198
[ 143.683849] ---[ end trace 0000000000000000 ]---
Link: https://lkml.kernel.org/r/20241101-mm-remote-pfn-v1-1-080b609270b7@asahilin…
Fixes: 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared")
Signed-off-by: Asahi Lina <lina(a)asahilina.net>
Cc: Jia He <justin.he(a)arm.com>
Cc: Yibo Cai <Yibo.Cai(a)arm.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Asahi Lina <lina(a)asahilina.net>
Cc: Sergio Lopez Pascual <slp(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/mm/memory.c~mm-fix-__wp_page_copy_user-fallback-path-for-remote-mm
+++ a/mm/memory.c
@@ -3081,13 +3081,18 @@ static inline int __wp_page_copy_user(st
update_mmu_cache_range(vmf, vma, addr, vmf->pte, 1);
}
+ /* If the mm is a remote mm, copy in the page using access_remote_vm() */
+ if (current->mm != mm) {
+ if (access_remote_vm(mm, (unsigned long)uaddr, kaddr, PAGE_SIZE, 0) != PAGE_SIZE)
+ goto warn;
+ }
/*
* This really shouldn't fail, because the page is there
* in the page tables. But it might just be unreadable,
* in which case we just give up and fill the result with
* zeroes.
*/
- if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
+ else if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
if (vmf->pte)
goto warn;
_
Patches currently in -mm which might be from lina(a)asahilina.net are
mm-fix-__wp_page_copy_user-fallback-path-for-remote-mm.patch
The patch titled
Subject: selftests: hugetlb_dio: check for initial conditions to skip in the start
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-hugetlb_dio-check-for-initial-conditions-to-skip-in-the-start.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Subject: selftests: hugetlb_dio: check for initial conditions to skip in the start
Date: Fri, 1 Nov 2024 19:15:57 +0500
The test should be skipped if initial conditions aren't fulfilled in the
start instead of failing and outputting non-compliant TAP logs. This kind
of failure pollutes the results. The initial conditions are:
- The test should only execute if /tmp file can be allocated.
- The test should only execute if huge pages are free.
Before:
TAP version 13
1..4
Bail out! Error opening file
: Read-only file system (30)
# Planned tests != run tests (4 != 0)
# Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
After:
TAP version 13
1..0 # SKIP Unable to allocate file: Read-only file system
Link: https://lkml.kernel.org/r/20241101141557.3159432-1-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Fixes: 3a103b5315b7 ("selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()")
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Donet Tom <donettom(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/hugetlb_dio.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
--- a/tools/testing/selftests/mm/hugetlb_dio.c~selftests-hugetlb_dio-check-for-initial-conditions-to-skip-in-the-start
+++ a/tools/testing/selftests/mm/hugetlb_dio.c
@@ -44,13 +44,6 @@ void run_dio_using_hugetlb(unsigned int
if (fd < 0)
ksft_exit_fail_perror("Error opening file\n");
- /* Get the free huge pages before allocation */
- free_hpage_b = get_free_hugepages();
- if (free_hpage_b == 0) {
- close(fd);
- ksft_exit_skip("No free hugepage, exiting!\n");
- }
-
/* Allocate a hugetlb page */
orig_buffer = mmap(NULL, h_pagesize, mmap_prot, mmap_flags, -1, 0);
if (orig_buffer == MAP_FAILED) {
@@ -94,8 +87,20 @@ void run_dio_using_hugetlb(unsigned int
int main(void)
{
size_t pagesize = 0;
+ int fd;
ksft_print_header();
+
+ /* Open the file to DIO */
+ fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT, 0664);
+ if (fd < 0)
+ ksft_exit_skip("Unable to allocate file: %s\n", strerror(errno));
+ close(fd);
+
+ /* Check if huge pages are free */
+ if (!get_free_hugepages())
+ ksft_exit_skip("No free hugepage, exiting\n");
+
ksft_set_plan(4);
/* Get base page size */
_
Patches currently in -mm which might be from usama.anjum(a)collabora.com are
selftests-hugetlb_dio-check-for-initial-conditions-to-skip-in-the-start.patch
From: Kalesh Singh <kaleshsingh(a)google.com>
Commit 78ff64081949 ("vfs: Convert tracefs to use the new mount API")
converted tracefs to use the new mount APIs caused mount options
(e.g. gid=<gid>) to not take effect.
The tracefs superblock can be updated from multiple paths:
- on fs_initcall() to init_trace_printk_function_export()
- from a work queue to initialize eventfs
tracer_init_tracefs_work_func()
- fsconfig() syscall to mount or remount of tracefs
The tracefs superblock root inode gets created early on in
init_trace_printk_function_export().
With the new mount API, tracefs effectively uses get_tree_single() instead
of the old API mount_single().
Previously, mount_single() ensured that the options are always applied to
the superblock root inode:
(1) If the root inode didn't exist, call fill_super() to create it
and apply the options.
(2) If the root inode exists, call reconfigure_single() which
effectively calls tracefs_apply_options() to parse and apply
options to the subperblock's fs_info and inode and remount
eventfs (if necessary)
On the other hand, get_tree_single() effectively calls vfs_get_super()
which:
(3) If the root inode doesn't exists, calls fill_super() to create it
and apply the options.
(4) If the root inode already exists, updates the fs_context root
with the superblock's root inode.
(4) above is always the case for tracefs mounts, since the super block's
root inode will already be created by init_trace_printk_function_export().
This means that the mount options get ignored:
- Since it isn't applied to the superblock's root inode, it doesn't
get inherited by the children.
- Since eventfs is initialized from a separate work queue and
before call to mount with the options, and it doesn't get remounted
for mount.
Ensure that the mount options are applied to the super block and eventfs
is remounted to respect the mount options.
To understand this better, if fstab has the following:
tracefs /sys/kernel/tracing tracefs nosuid,nodev,noexec,gid=tracing 0 0
On boot up, permissions look like:
# ls -l /sys/kernel/tracing/trace
-rw-r----- 1 root root 0 Nov 1 08:37 /sys/kernel/tracing/trace
When it should look like:
# ls -l /sys/kernel/tracing/trace
-rw-r----- 1 root tracing 0 Nov 1 08:37 /sys/kernel/tracing/trace
Link: https://lore.kernel.org/r/536e99d3-345c-448b-adee-a21389d7ab4b@redhat.com/
Cc: Eric Sandeen <sandeen(a)redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Ali Zahraee <ahzahraee(a)gmail.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: stable(a)vger.kernel.org
Fixes: 78ff64081949 ("vfs: Convert tracefs to use the new mount API")
Link: https://lore.kernel.org/20241030171928.4168869-2-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh(a)google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
fs/tracefs/inode.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 1748dff58c3b..cfc614c638da 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -392,6 +392,9 @@ static int tracefs_reconfigure(struct fs_context *fc)
struct tracefs_fs_info *sb_opts = sb->s_fs_info;
struct tracefs_fs_info *new_opts = fc->s_fs_info;
+ if (!new_opts)
+ return 0;
+
sync_filesystem(sb);
/* structure copy of new mount options to sb */
*sb_opts = *new_opts;
@@ -478,14 +481,17 @@ static int tracefs_fill_super(struct super_block *sb, struct fs_context *fc)
sb->s_op = &tracefs_super_operations;
sb->s_d_op = &tracefs_dentry_operations;
- tracefs_apply_options(sb, false);
-
return 0;
}
static int tracefs_get_tree(struct fs_context *fc)
{
- return get_tree_single(fc, tracefs_fill_super);
+ int err = get_tree_single(fc, tracefs_fill_super);
+
+ if (err)
+ return err;
+
+ return tracefs_reconfigure(fc);
}
static void tracefs_free_fc(struct fs_context *fc)
--
2.45.2
The patch titled
Subject: signal: restore the override_rlimit logic
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
signal-restore-the-override_rlimit-logic.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Roman Gushchin <roman.gushchin(a)linux.dev>
Subject: signal: restore the override_rlimit logic
Date: Thu, 31 Oct 2024 20:04:38 +0000
Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of
ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class of
signals. However now it's enforced unconditionally, even if
override_rlimit is set. This behavior change caused production issues.
For example, if the limit is reached and a process receives a SIGSEGV
signal, sigqueue_alloc fails to allocate the necessary resources for the
signal delivery, preventing the signal from being delivered with siginfo.
This prevents the process from correctly identifying the fault address and
handling the error. From the user-space perspective, applications are
unaware that the limit has been reached and that the siginfo is
effectively 'corrupted'. This can lead to unpredictable behavior and
crashes, as we observed with java applications.
Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and skip
the comparison to max there if override_rlimit is set. This effectively
restores the old behavior.
Link: https://lkml.kernel.org/r/20241031200438.2951287-1-roman.gushchin@linux.dev
Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Co-developed-by: Andrei Vagin <avagin(a)google.com>
Signed-off-by: Andrei Vagin <avagin(a)google.com>
Cc: Kees Cook <kees(a)kernel.org>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Alexey Gladkov <legion(a)kernel.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/user_namespace.h | 3 ++-
kernel/signal.c | 3 ++-
kernel/ucount.c | 5 +++--
3 files changed, 7 insertions(+), 4 deletions(-)
--- a/include/linux/user_namespace.h~signal-restore-the-override_rlimit-logic
+++ a/include/linux/user_namespace.h
@@ -141,7 +141,8 @@ static inline long get_rlimit_value(stru
long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v);
bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v);
-long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type);
+long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type,
+ bool override_rlimit);
void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type);
bool is_rlimit_overlimit(struct ucounts *ucounts, enum rlimit_type type, unsigned long max);
--- a/kernel/signal.c~signal-restore-the-override_rlimit-logic
+++ a/kernel/signal.c
@@ -419,7 +419,8 @@ __sigqueue_alloc(int sig, struct task_st
*/
rcu_read_lock();
ucounts = task_ucounts(t);
- sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
+ sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING,
+ override_rlimit);
rcu_read_unlock();
if (!sigpending)
return NULL;
--- a/kernel/ucount.c~signal-restore-the-override_rlimit-logic
+++ a/kernel/ucount.c
@@ -307,7 +307,8 @@ void dec_rlimit_put_ucounts(struct ucoun
do_dec_rlimit_put_ucounts(ucounts, NULL, type);
}
-long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type)
+long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type,
+ bool override_rlimit)
{
/* Caller must hold a reference to ucounts */
struct ucounts *iter;
@@ -316,7 +317,7 @@ long inc_rlimit_get_ucounts(struct ucoun
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
long new = atomic_long_add_return(1, &iter->rlimit[type]);
- if (new < 0 || new > max)
+ if (new < 0 || (!override_rlimit && (new > max)))
goto unwind;
if (iter == ucounts)
ret = new;
_
Patches currently in -mm which might be from roman.gushchin(a)linux.dev are
signal-restore-the-override_rlimit-logic.patch
The GGTT looks to be stored inside stolen memory on igpu which is not
treated as normal RAM. The core kernel skips this memory range when
creating the hibernation image, therefore when coming back from
hibernation the GGTT programming is lost. This seems to cause issues
with broken resume where GuC FW fails to load:
[drm] *ERROR* GT0: load failed: status = 0x400000A0, time = 10ms, freq = 1250MHz (req 1300MHz), done = -1
[drm] *ERROR* GT0: load failed: status: Reset = 0, BootROM = 0x50, UKernel = 0x00, MIA = 0x00, Auth = 0x01
[drm] *ERROR* GT0: firmware signature verification failed
[drm] *ERROR* CRITICAL: Xe has declared device 0000:00:02.0 as wedged.
Current GGTT users are kernel internal and tracked as pinned, so it
should be possible to hook into the existing save/restore logic that we
use for dgpu, where the actual evict is skipped but on restore we
importantly restore the GGTT programming. This has been confirmed to
fix hibernation on at least ADL and MTL, though likely all igpu
platforms are affected.
This also means we have a hole in our testing, where the existing s4
tests only really test the driver hooks, and don't go as far as actually
rebooting and restoring from the hibernation image and in turn powering
down RAM (and therefore losing the contents of stolen).
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3275
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_bo.c | 36 ++++++++++++++------------------
drivers/gpu/drm/xe/xe_bo_evict.c | 6 ------
2 files changed, 16 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index d79d8ef5c7d5..0ae5c8f7bab8 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -950,7 +950,10 @@ int xe_bo_restore_pinned(struct xe_bo *bo)
if (WARN_ON(!xe_bo_is_pinned(bo)))
return -EINVAL;
- if (WARN_ON(xe_bo_is_vram(bo) || !bo->ttm.ttm))
+ if (WARN_ON(xe_bo_is_vram(bo)))
+ return -EINVAL;
+
+ if (WARN_ON(!bo->ttm.ttm && !xe_bo_is_stolen(bo)))
return -EINVAL;
if (!mem_type_is_vram(place->mem_type))
@@ -1770,6 +1773,7 @@ int xe_bo_pin_external(struct xe_bo *bo)
int xe_bo_pin(struct xe_bo *bo)
{
+ struct ttm_place *place = &(bo->placements[0]);
struct xe_device *xe = xe_bo_device(bo);
int err;
@@ -1800,7 +1804,6 @@ int xe_bo_pin(struct xe_bo *bo)
*/
if (IS_DGFX(xe) && !(IS_ENABLED(CONFIG_DRM_XE_DEBUG) &&
bo->flags & XE_BO_FLAG_INTERNAL_TEST)) {
- struct ttm_place *place = &(bo->placements[0]);
if (mem_type_is_vram(place->mem_type)) {
xe_assert(xe, place->flags & TTM_PL_FLAG_CONTIGUOUS);
@@ -1809,13 +1812,12 @@ int xe_bo_pin(struct xe_bo *bo)
vram_region_gpu_offset(bo->ttm.resource)) >> PAGE_SHIFT;
place->lpfn = place->fpfn + (bo->size >> PAGE_SHIFT);
}
+ }
- if (mem_type_is_vram(place->mem_type) ||
- bo->flags & XE_BO_FLAG_GGTT) {
- spin_lock(&xe->pinned.lock);
- list_add_tail(&bo->pinned_link, &xe->pinned.kernel_bo_present);
- spin_unlock(&xe->pinned.lock);
- }
+ if (mem_type_is_vram(place->mem_type) || bo->flags & XE_BO_FLAG_GGTT) {
+ spin_lock(&xe->pinned.lock);
+ list_add_tail(&bo->pinned_link, &xe->pinned.kernel_bo_present);
+ spin_unlock(&xe->pinned.lock);
}
ttm_bo_pin(&bo->ttm);
@@ -1863,24 +1865,18 @@ void xe_bo_unpin_external(struct xe_bo *bo)
void xe_bo_unpin(struct xe_bo *bo)
{
+ struct ttm_place *place = &(bo->placements[0]);
struct xe_device *xe = xe_bo_device(bo);
xe_assert(xe, !bo->ttm.base.import_attach);
xe_assert(xe, xe_bo_is_pinned(bo));
- if (IS_DGFX(xe) && !(IS_ENABLED(CONFIG_DRM_XE_DEBUG) &&
- bo->flags & XE_BO_FLAG_INTERNAL_TEST)) {
- struct ttm_place *place = &(bo->placements[0]);
-
- if (mem_type_is_vram(place->mem_type) ||
- bo->flags & XE_BO_FLAG_GGTT) {
- spin_lock(&xe->pinned.lock);
- xe_assert(xe, !list_empty(&bo->pinned_link));
- list_del_init(&bo->pinned_link);
- spin_unlock(&xe->pinned.lock);
- }
+ if (mem_type_is_vram(place->mem_type) || bo->flags & XE_BO_FLAG_GGTT) {
+ spin_lock(&xe->pinned.lock);
+ xe_assert(xe, !list_empty(&bo->pinned_link));
+ list_del_init(&bo->pinned_link);
+ spin_unlock(&xe->pinned.lock);
}
-
ttm_bo_unpin(&bo->ttm);
}
diff --git a/drivers/gpu/drm/xe/xe_bo_evict.c b/drivers/gpu/drm/xe/xe_bo_evict.c
index 32043e1e5a86..b01bc20eb90b 100644
--- a/drivers/gpu/drm/xe/xe_bo_evict.c
+++ b/drivers/gpu/drm/xe/xe_bo_evict.c
@@ -34,9 +34,6 @@ int xe_bo_evict_all(struct xe_device *xe)
u8 id;
int ret;
- if (!IS_DGFX(xe))
- return 0;
-
/* User memory */
for (mem_type = XE_PL_VRAM0; mem_type <= XE_PL_VRAM1; ++mem_type) {
struct ttm_resource_manager *man =
@@ -125,9 +122,6 @@ int xe_bo_restore_kernel(struct xe_device *xe)
struct xe_bo *bo;
int ret;
- if (!IS_DGFX(xe))
- return 0;
-
spin_lock(&xe->pinned.lock);
for (;;) {
bo = list_first_entry_or_null(&xe->pinned.evicted,
--
2.47.0
On Fri, Nov 1, 2024 at 6:58 AM Sasha Levin <sashal(a)kernel.org> wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> lib: alloc_tag_module_unload must wait for pending kfree_rcu calls
>
> to the 6.11-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> lib-alloc_tag_module_unload-must-wait-for-pending-kf.patch
> and it can be found in the queue-6.11 subdirectory.
Thanks Sasha! Could you please double-check that the prerequisite
patch https://lore.kernel.org/all/20241021171003.2907935-1-surenb@google.com/
was also picked up? I don't see it in the queue-6.11 directory.
Without that patch this one will cause build errors, that's why I sent
them as a patchset.
Thanks,
Suren.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 536dfe685ebd28b27ebfbc3d4b9168207b7e28a3
> Author: Florian Westphal <fw(a)strlen.de>
> Date: Mon Oct 7 22:52:24 2024 +0200
>
> lib: alloc_tag_module_unload must wait for pending kfree_rcu calls
>
> [ Upstream commit dc783ba4b9df3fb3e76e968b2cbeb9960069263c ]
>
> Ben Greear reports following splat:
> ------------[ cut here ]------------
> net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> ...
> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> codetag_unload_module+0x19b/0x2a0
> ? codetag_load_module+0x80/0x80
>
> nf_nat module exit calls kfree_rcu on those addresses, but the free
> operation is likely still pending by the time alloc_tag checks for leaks.
>
> Wait for outstanding kfree_rcu operations to complete before checking
> resolves this warning.
>
> Reproducer:
> unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> grep nf_nat /proc/allocinfo # will list 4 allocations
> rmmod nft_chain_nat
> rmmod nf_nat # will WARN.
>
> [akpm(a)linux-foundation.org: add comment]
> Link: https://lkml.kernel.org/r/20241007205236.11847-1-fw@strlen.de
> Fixes: a473573964e5 ("lib: code tagging module support")
> Signed-off-by: Florian Westphal <fw(a)strlen.de>
> Reported-by: Ben Greear <greearb(a)candelatech.com>
> Closes: https://lore.kernel.org/netdev/bdaaef9d-4364-4171-b82b-bcfc12e207eb@candela…
> Cc: Uladzislau Rezki <urezki(a)gmail.com>
> Cc: Vlastimil Babka <vbabka(a)suse.cz>
> Cc: Suren Baghdasaryan <surenb(a)google.com>
> Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
> Cc: <stable(a)vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/lib/codetag.c b/lib/codetag.c
> index afa8a2d4f3173..d1fbbb7c2ec3d 100644
> --- a/lib/codetag.c
> +++ b/lib/codetag.c
> @@ -228,6 +228,9 @@ bool codetag_unload_module(struct module *mod)
> if (!mod)
> return true;
>
> + /* await any module's kfree_rcu() operations to complete */
> + kvfree_rcu_barrier();
> +
> mutex_lock(&codetag_lock);
> list_for_each_entry(cttype, &codetag_types, link) {
> struct codetag_module *found = NULL;
From: MrRurikov <grurikov(a)gmal.com>
After having been assigned to a NULL value at rdma.c:1758, pointer 'queue'
is passed as 1st parameter in call to function
'nvmet_rdma_queue_established' at rdma.c:1773, as 1st parameter in call
to function 'nvmet_rdma_queue_disconnect' at rdma.c:1787 and as 2nd
parameter in call to function 'nvmet_rdma_queue_connect_fail' at
rdma.c:1800, where it is dereferenced.
I understand, that driver is confident that the
RDMA_CM_EVENT_CONNECT_REQUEST event will occur first and perform
initialization, but maliciously prepared hardware could send events in
violation of the protocol. Nothing guarantees that the sequence of events
will start with RDMA_CM_EVENT_CONNECT_REQUEST.
Found by Linux Verification Center (linuxtesting.org) with SVACE
Fixes: e1a2ee249b19 ("nvmet-rdma: Fix use after free in nvmet_rdma_cm_handler()")
Cc: stable(a)vger.kernel.org
Signed-off-by: George Rurikov <g.ryurikov(a)securitycode.ru>
---
drivers/nvme/target/rdma.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 1b6264fa5803..becebc95f349 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1770,8 +1770,10 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
ret = nvmet_rdma_queue_connect(cm_id, event);
break;
case RDMA_CM_EVENT_ESTABLISHED:
- nvmet_rdma_queue_established(queue);
- break;
+ if (!queue) {
+ nvmet_rdma_queue_established(queue);
+ break;
+ }
case RDMA_CM_EVENT_ADDR_CHANGE:
if (!queue) {
struct nvmet_rdma_port *port = cm_id->context;
@@ -1782,8 +1784,10 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
fallthrough;
case RDMA_CM_EVENT_DISCONNECTED:
case RDMA_CM_EVENT_TIMEWAIT_EXIT:
- nvmet_rdma_queue_disconnect(queue);
- break;
+ if (!queue) {
+ nvmet_rdma_queue_disconnect(queue);
+ break;
+ }
case RDMA_CM_EVENT_DEVICE_REMOVAL:
ret = nvmet_rdma_device_removal(cm_id, queue);
break;
@@ -1793,8 +1797,10 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
fallthrough;
case RDMA_CM_EVENT_UNREACHABLE:
case RDMA_CM_EVENT_CONNECT_ERROR:
- nvmet_rdma_queue_connect_fail(cm_id, queue);
- break;
+ if (!queue) {
+ nvmet_rdma_queue_connect_fail(cm_id, queue);
+ break;
+ }
default:
pr_err("received unrecognized RDMA CM event %d\n",
event->event);
--
2.34.1
Hey,
Would you be interested in acquiring the attendees list of Passenger traffic Expo 2024?
List contains: Names, Titles, Phone Numbers, Company Details, and more…
Interested? Let me know so that I’ll send you the pricing for the same.
Kind Regards,
Camille Batiste
Marketing Executive
If you do not wish to receive our emails, please reply with "Not Interested."
From: Randy MacLeod <Randy.MacLeod(a)windriver.com>
This is my first commit to -stable so I'm going to carefully explain what I've done.
I work on the Yocto Project and I have done some work on the Linux network
stack a long time ago so I'm not quite a complete newbie.
I took the commit found here:
https://lore.kernel.org/stable/20240527185645.658299380@linuxfoundation.org/
and backported as per my commit log:
Based on above commit but simplified since pskb_may_pull_reason()
does not exist until 6.1.
I also trimmed the original commit log of the "Tested by dropwatch" section
as well as the full stack trace since that may have changed in 5.10/5.15 and
It compiles fine for 5.10 and 5.15 but I have not tested with dropwatch since
the patch is just dropping short xmit packets for bridging.
Finally, since the patch is much simpler than the original, I've removed the
original patch author's SOB line.
Please let me know if any of this is not what y'all'd like to see.
Randy MacLeod (1):
net: bridge: xmit: make sure we have at least eth header len bytes
net/bridge/br_device.c | 5 +++++
1 file changed, 5 insertions(+)
base-commit: 5a8fa04b2a4de1d52be4a04690dcb52ac7998943
--
2.34.1
The assignment of the of_node to the aux bridge needs to mark the
of_node as reused as well, otherwise resource providers like pinctrl will
report a gpio as already requested by a different device when both pinconf
and gpios property are present.
Fix that by using the device_set_of_node_from_dev() helper instead.
Fixes: 6914968a0b52 ("drm/bridge: properly refcount DT nodes in aux bridge drivers")
Cc: stable(a)vger.kernel.org # 6.8
Cc: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Abel Vesa <abel.vesa(a)linaro.org>
---
Changes in v2:
- Re-worded commit to be more explicit of what it fixes, as Johan suggested
- Used device_set_of_node_from_dev() helper, as per Johan's suggestion
- Added Fixes tag and cc'ed stable
- Link to v1: https://lore.kernel.org/r/20241017-drm-aux-bridge-mark-of-node-reused-v1-1-…
---
drivers/gpu/drm/bridge/aux-bridge.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/bridge/aux-bridge.c b/drivers/gpu/drm/bridge/aux-bridge.c
index b29980f95379ec7af873ed6e0fb79a9abb663c7b..295e9d031e2dc86cbfd2a7350718fca181c99487 100644
--- a/drivers/gpu/drm/bridge/aux-bridge.c
+++ b/drivers/gpu/drm/bridge/aux-bridge.c
@@ -58,9 +58,10 @@ int drm_aux_bridge_register(struct device *parent)
adev->id = ret;
adev->name = "aux_bridge";
adev->dev.parent = parent;
- adev->dev.of_node = of_node_get(parent->of_node);
adev->dev.release = drm_aux_bridge_release;
+ device_set_of_node_from_dev(&adev->dev, parent);
+
ret = auxiliary_device_init(adev);
if (ret) {
ida_free(&drm_aux_bridge_ida, adev->id);
---
base-commit: d61a00525464bfc5fe92c6ad713350988e492b88
change-id: 20241017-drm-aux-bridge-mark-of-node-reused-5c2ee740ff19
Best regards,
--
Abel Vesa <abel.vesa(a)linaro.org>
The unsigned variable `size_t len` is cast to the signed type `loff_t`
when passed to the function check_add_overflow(). This function considers
the type of the destination, which is of type loff_t (signed),
potentially leading to an overflow. This issue is similar to the one
described in the link below.
Remove the cast.
Note that even if check_add_overflow() is bypassed, by setting `len` to
a value that is greater than LONG_MAX (which is considered as a negative
value after the cast), the function copy_from_user(), invoked a few lines
later, will not perform any copy and return `len` as (len > INT_MAX)
causing qat_vf_resume_write() to fail with -EFAULT.
Fixes: bb208810b1ab ("vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV VF devices")
CC: stable(a)vger.kernel.org # 6.10+
Link: https://lore.kernel.org/all/138bd2e2-ede8-4bcc-aa7b-f3d9de167a37@moroto.mou…
Reported-by: Zijie Zhao <zzjas98(a)gmail.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
Reviewed-by: Xin Zeng <xin.zeng(a)intel.com>
---
drivers/vfio/pci/qat/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/qat/main.c b/drivers/vfio/pci/qat/main.c
index e36740a282e7..1e3563fe7cab 100644
--- a/drivers/vfio/pci/qat/main.c
+++ b/drivers/vfio/pci/qat/main.c
@@ -305,7 +305,7 @@ static ssize_t qat_vf_resume_write(struct file *filp, const char __user *buf,
offs = &filp->f_pos;
if (*offs < 0 ||
- check_add_overflow((loff_t)len, *offs, &end))
+ check_add_overflow(len, *offs, &end))
return -EOVERFLOW;
if (end > mig_dev->state_size)
--
2.47.0
Device nodes accessed via of_get_compatible_child() require
of_node_put() to be called when the node is no longer required to avoid
leaving a reference to the node behind, leaking the resource.
In this case, the usage of 'tnode' is straightforward and there are no
error paths, allowing for a single of_node_put() when 'tnode' is no
longer required.
Cc: stable(a)vger.kernel.org
Fixes: 29646ee33cc3 ("counter: stm32-timer-cnt: add checks on quadrature encoder capability")
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
drivers/counter/stm32-timer-cnt.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/counter/stm32-timer-cnt.c b/drivers/counter/stm32-timer-cnt.c
index 186e73d6ccb4..0d8206adccb3 100644
--- a/drivers/counter/stm32-timer-cnt.c
+++ b/drivers/counter/stm32-timer-cnt.c
@@ -694,6 +694,7 @@ static int stm32_timer_cnt_probe_encoder(struct device *dev,
}
ret = of_property_read_u32(tnode, "reg", &idx);
+ of_node_put(tnode);
if (ret) {
dev_err(dev, "Can't get index (%d)\n", ret);
return ret;
---
base-commit: a39230ecf6b3057f5897bc4744a790070cfbe7a8
change-id: 20241027-stm32-timer-cnt-of_node_put-8c6695e7a373
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
The patch titled
Subject: mm/damon/core: avoid overflow in damon_feed_loop_next_input()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-damon-core-avoid-overflow-in-damon_feed_loop_next_input.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/core: avoid overflow in damon_feed_loop_next_input()
Date: Thu, 31 Oct 2024 09:12:03 -0700
damon_feed_loop_next_input() is inefficient and fragile to overflows.
Specifically, 'score_goal_diff_bp' calculation can overflow when 'score'
is high. The calculation is actually unnecessary at all because 'goal' is
a constant of value 10,000. Calculation of 'compensation' is again
fragile to overflow. Final calculation of return value for under-achiving
case is again fragile to overflow when the current score is
under-achieving the target.
Add two corner cases handling at the beginning of the function to make the
body easier to read, and rewrite the body of the function to avoid
overflows and the unnecessary bp value calcuation.
Link: https://lkml.kernel.org/r/20241031161203.47751-1-sj@kernel.org
Fixes: 9294a037c015 ("mm/damon/core: implement goal-oriented feedback-driven quota auto-tuning")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Closes: https://lore.kernel.org/944f3d5b-9177-48e7-8ec9-7f1331a3fea3@roeck-us.net
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Cc: <stable(a)vger.kernel.org> [6.8.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/core.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
--- a/mm/damon/core.c~mm-damon-core-avoid-overflow-in-damon_feed_loop_next_input
+++ a/mm/damon/core.c
@@ -1456,17 +1456,31 @@ static unsigned long damon_feed_loop_nex
unsigned long score)
{
const unsigned long goal = 10000;
- unsigned long score_goal_diff = max(goal, score) - min(goal, score);
- unsigned long score_goal_diff_bp = score_goal_diff * 10000 / goal;
- unsigned long compensation = last_input * score_goal_diff_bp / 10000;
/* Set minimum input as 10000 to avoid compensation be zero */
const unsigned long min_input = 10000;
+ unsigned long score_goal_diff, compensation;
+ bool over_achieving = score > goal;
- if (goal > score)
+ if (score == goal)
+ return last_input;
+ if (score >= goal * 2)
+ return min_input;
+
+ if (over_achieving)
+ score_goal_diff = score - goal;
+ else
+ score_goal_diff = goal - score;
+
+ if (last_input < ULONG_MAX / score_goal_diff)
+ compensation = last_input * score_goal_diff / goal;
+ else
+ compensation = last_input / goal * score_goal_diff;
+
+ if (over_achieving)
+ return max(last_input - compensation, min_input);
+ if (last_input < ULONG_MAX - compensation)
return last_input + compensation;
- if (last_input > compensation + min_input)
- return last_input - compensation;
- return min_input;
+ return ULONG_MAX;
}
#ifdef CONFIG_PSI
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-core-handle-zero-aggregationops_update-intervals.patch
mm-damon-core-handle-zero-schemes-apply-interval.patch
mm-damon-core-avoid-overflow-in-damon_feed_loop_next_input.patch
selftests-damon-huge_count_read_write-remove-unnecessary-debugging-message.patch
selftests-damon-_debugfs_common-hide-expected-error-message-from-test_write_result.patch
selftests-damon-debugfs_duplicate_context_creation-hide-errors-from-expected-file-write-failures.patch
mm-damon-kconfig-update-dbgfs_kunit-prompt-copy-for-sysfs_kunit.patch
mm-damon-tests-dbgfs-kunit-fix-the-header-double-inclusion-guarding-ifdef-comment.patch
The patch titled
Subject: mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Yu Zhao <yuzhao(a)google.com>
Subject: mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
Date: Sat, 19 Oct 2024 01:29:38 +0000
Patch series "mm: multi-gen LRU: Have secondary MMUs participate in
MM_WALK".
Today, the MM_WALK capability causes MGLRU to clear the young bit from
PMDs and PTEs during the page table walk before eviction, but MGLRU does
not call the clear_young() MMU notifier in this case. By not calling this
notifier, the MM walk takes less time/CPU, but it causes pages that are
accessed mostly through KVM / secondary MMUs to appear younger than they
should be.
We do call the clear_young() notifier today, but only when attempting to
evict the page, so we end up clearing young/accessed information less
frequently for secondary MMUs than for mm PTEs, and therefore they appear
younger and are less likely to be evicted. Therefore, memory that is
*not* being accessed mostly by KVM will be evicted *more* frequently,
worsening performance.
ChromeOS observed a tab-open latency regression when enabling MGLRU with a
setup that involved running a VM:
Tab-open latency histogram (ms)
Version p50 mean p95 p99 max
base 1315 1198 2347 3454 10319
mglru 2559 1311 7399 12060 43758
fix 1119 926 2470 4211 6947
This series replaces the final non-selftest patchs from this series[1],
which introduced a similar change (and a new MMU notifier) with KVM
optimizations. I'll send a separate series (to Sean and Paolo) for the
KVM optimizations.
This series also makes proactive reclaim with MGLRU possible for KVM
memory. I have verified that this functions correctly with the selftest
from [1], but given that that test is a KVM selftest, I'll send it with
the rest of the KVM optimizations later. Andrew, let me know if you'd
like to take the test now anyway.
[1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google…
This patch (of 2):
The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful
and become more complicated to properly compute when adding
test/clear_young() notifiers in MGLRU's mm walk.
Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com
Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: James Houghton <jthoughton(a)google.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Matlack <dmatlack(a)google.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: David Stevens <stevensd(a)google.com>
Cc: Oliver Upton <oliver.upton(a)linux.dev>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mmzone.h | 2 --
mm/vmscan.c | 14 +++++---------
2 files changed, 5 insertions(+), 11 deletions(-)
--- a/include/linux/mmzone.h~mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats
+++ a/include/linux/mmzone.h
@@ -458,9 +458,7 @@ struct lru_gen_folio {
enum {
MM_LEAF_TOTAL, /* total leaf entries */
- MM_LEAF_OLD, /* old leaf entries */
MM_LEAF_YOUNG, /* young leaf entries */
- MM_NONLEAF_TOTAL, /* total non-leaf entries */
MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */
MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */
NR_MM_STATS
--- a/mm/vmscan.c~mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats
+++ a/mm/vmscan.c
@@ -3399,7 +3399,6 @@ restart:
continue;
if (!pte_young(ptent)) {
- walk->mm_stats[MM_LEAF_OLD]++;
continue;
}
@@ -3552,7 +3551,6 @@ restart:
walk->mm_stats[MM_LEAF_TOTAL]++;
if (!pmd_young(val)) {
- walk->mm_stats[MM_LEAF_OLD]++;
continue;
}
@@ -3564,8 +3562,6 @@ restart:
continue;
}
- walk->mm_stats[MM_NONLEAF_TOTAL]++;
-
if (!walk->force_scan && should_clear_pmd_young()) {
if (!pmd_young(val))
continue;
@@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct
for (tier = 0; tier < MAX_NR_TIERS; tier++) {
seq_printf(m, " %10d", tier);
for (type = 0; type < ANON_AND_FILE; type++) {
- const char *s = " ";
+ const char *s = "xxx";
unsigned long n[3] = {};
if (seq == max_seq) {
- s = "RT ";
+ s = "RTx";
n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]);
n[1] = READ_ONCE(lrugen->avg_total[type][tier]);
} else if (seq == min_seq[type] || NR_HIST_GENS > 1) {
@@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct
seq_puts(m, " ");
for (i = 0; i < NR_MM_STATS; i++) {
- const char *s = " ";
+ const char *s = "xxxx";
unsigned long n = 0;
if (seq == max_seq && NR_HIST_GENS == 1) {
- s = "LOYNFA";
+ s = "TYFA";
n = READ_ONCE(mm_state->stats[hist][i]);
} else if (seq != max_seq && NR_HIST_GENS > 1) {
- s = "loynfa";
+ s = "tyfa";
n = READ_ONCE(mm_state->stats[hist][i]);
}
_
Patches currently in -mm which might be from yuzhao(a)google.com are
mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats.patch
mm-multi-gen-lru-use-pteppmdp_clear_young_notify.patch
mm-page_alloc-keep-track-of-free-highatomic.patch
The patch titled
Subject: mm/damon/core: handle zero schemes apply interval
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-damon-core-handle-zero-schemes-apply-interval.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/core: handle zero schemes apply interval
Date: Thu, 31 Oct 2024 11:37:57 -0700
DAMON's logics to determine if this is the time to apply damos schemes
assumes next_apply_sis is always set larger than current
passed_sample_intervals. And therefore assume continuously incrementing
passed_sample_intervals will make it reaches to the next_apply_sis in
future. The logic hence does apply the scheme and update next_apply_sis
only if passed_sample_intervals is same to next_apply_sis.
If Schemes apply interval is set as zero, however, next_apply_sis is set
same to current passed_sample_intervals, respectively. And
passed_sample_intervals is incremented before doing the next_apply_sis
check. Hence, next_apply_sis becomes larger than next_apply_sis, and the
logic says it is not the time to apply schemes and update next_apply_sis.
In other words, DAMON stops applying schemes until passed_sample_intervals
overflows.
Based on the documents and the common sense, a reasonable behavior for
such inputs would be applying the schemes for every sampling interval.
Handle the case by removing the assumption.
Link: https://lkml.kernel.org/r/20241031183757.49610-3-sj@kernel.org
Fixes: 42f994b71404 ("mm/damon/core: implement scheme-specific apply interval")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [6.7.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/core.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/mm/damon/core.c~mm-damon-core-handle-zero-schemes-apply-interval
+++ a/mm/damon/core.c
@@ -1412,7 +1412,7 @@ static void damon_do_apply_schemes(struc
damon_for_each_scheme(s, c) {
struct damos_quota *quota = &s->quota;
- if (c->passed_sample_intervals != s->next_apply_sis)
+ if (c->passed_sample_intervals < s->next_apply_sis)
continue;
if (!s->wmarks.activated)
@@ -1622,7 +1622,7 @@ static void kdamond_apply_schemes(struct
bool has_schemes_to_apply = false;
damon_for_each_scheme(s, c) {
- if (c->passed_sample_intervals != s->next_apply_sis)
+ if (c->passed_sample_intervals < s->next_apply_sis)
continue;
if (!s->wmarks.activated)
@@ -1642,9 +1642,9 @@ static void kdamond_apply_schemes(struct
}
damon_for_each_scheme(s, c) {
- if (c->passed_sample_intervals != s->next_apply_sis)
+ if (c->passed_sample_intervals < s->next_apply_sis)
continue;
- s->next_apply_sis +=
+ s->next_apply_sis = c->passed_sample_intervals +
(s->apply_interval_us ? s->apply_interval_us :
c->attrs.aggr_interval) / sample_interval;
}
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-core-handle-zero-aggregationops_update-intervals.patch
mm-damon-core-handle-zero-schemes-apply-interval.patch
selftests-damon-huge_count_read_write-remove-unnecessary-debugging-message.patch
selftests-damon-_debugfs_common-hide-expected-error-message-from-test_write_result.patch
selftests-damon-debugfs_duplicate_context_creation-hide-errors-from-expected-file-write-failures.patch
mm-damon-kconfig-update-dbgfs_kunit-prompt-copy-for-sysfs_kunit.patch
mm-damon-tests-dbgfs-kunit-fix-the-header-double-inclusion-guarding-ifdef-comment.patch
The patch titled
Subject: mm/damon/core: handle zero {aggregation,ops_update} intervals
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-damon-core-handle-zero-aggregationops_update-intervals.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/core: handle zero {aggregation,ops_update} intervals
Date: Thu, 31 Oct 2024 11:37:56 -0700
Patch series "mm/damon/core: fix handling of zero non-sampling intervals".
DAMON's internal intervals accounting logic is not correctly handling
non-sampling intervals of zero values for a wrong assumption. This could
cause unexpected monitoring behavior, and even result in infinite hang of
DAMON sysfs interface user threads in case of zero aggregation interval.
Fix those by updating the intervals accounting logic. For details of the
root case and solutions, please refer to commit messages of fixes.
This patch (of 2):
DAMON's logics to determine if this is the time to do aggregation and ops
update assumes next_{aggregation,ops_update}_sis are always set larger
than current passed_sample_intervals. And therefore it further assumes
continuously incrementing passed_sample_intervals every sampling interval
will make it reaches to the next_{aggregation,ops_update}_sis in future.
The logic therefore make the action and update
next_{aggregation,ops_updaste}_sis only if passed_sample_intervals is same
to the counts, respectively.
If Aggregation interval or Ops update interval are zero, however,
next_aggregation_sis or next_ops_update_sis are set same to current
passed_sample_intervals, respectively. And passed_sample_intervals is
incremented before doing the next_{aggregation,ops_update}_sis check.
Hence, passed_sample_intervals becomes larger than
next_{aggregation,ops_update}_sis, and the logic says it is not the time
to do the action and update next_{aggregation,ops_update}_sis forever,
until an overflow happens. In other words, DAMON stops doing aggregations
or ops updates effectively forever, and users cannot get monitoring
results.
Based on the documents and the common sense, a reasonable behavior for
such inputs is doing an aggregation and an ops update for every sampling
interval. Handle the case by removing the assumption.
Note that this could incur particular real issue for DAMON sysfs interface
users, in case of zero Aggregation interval. When user starts DAMON with
zero Aggregation interval and asks online DAMON parameter tuning via DAMON
sysfs interface, the request is handled by the aggregation callback.
Until the callback finishes the work, the user who requested the online
tuning just waits. Hence, the user will be stuck until the
passed_sample_intervals overflows.
Link: https://lkml.kernel.org/r/20241031183757.49610-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20241031183757.49610-2-sj@kernel.org
Fixes: 4472edf63d66 ("mm/damon/core: use number of passed access sampling as a timer")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [6.7.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/mm/damon/core.c~mm-damon-core-handle-zero-aggregationops_update-intervals
+++ a/mm/damon/core.c
@@ -2000,7 +2000,7 @@ static int kdamond_fn(void *data)
if (ctx->ops.check_accesses)
max_nr_accesses = ctx->ops.check_accesses(ctx);
- if (ctx->passed_sample_intervals == next_aggregation_sis) {
+ if (ctx->passed_sample_intervals >= next_aggregation_sis) {
kdamond_merge_regions(ctx,
max_nr_accesses / 10,
sz_limit);
@@ -2018,7 +2018,7 @@ static int kdamond_fn(void *data)
sample_interval = ctx->attrs.sample_interval ?
ctx->attrs.sample_interval : 1;
- if (ctx->passed_sample_intervals == next_aggregation_sis) {
+ if (ctx->passed_sample_intervals >= next_aggregation_sis) {
ctx->next_aggregation_sis = next_aggregation_sis +
ctx->attrs.aggr_interval / sample_interval;
@@ -2028,7 +2028,7 @@ static int kdamond_fn(void *data)
ctx->ops.reset_aggregated(ctx);
}
- if (ctx->passed_sample_intervals == next_ops_update_sis) {
+ if (ctx->passed_sample_intervals >= next_ops_update_sis) {
ctx->next_ops_update_sis = next_ops_update_sis +
ctx->attrs.ops_update_interval /
sample_interval;
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-core-handle-zero-aggregationops_update-intervals.patch
mm-damon-core-handle-zero-schemes-apply-interval.patch
selftests-damon-huge_count_read_write-remove-unnecessary-debugging-message.patch
selftests-damon-_debugfs_common-hide-expected-error-message-from-test_write_result.patch
selftests-damon-debugfs_duplicate_context_creation-hide-errors-from-expected-file-write-failures.patch
mm-damon-kconfig-update-dbgfs_kunit-prompt-copy-for-sysfs_kunit.patch
mm-damon-tests-dbgfs-kunit-fix-the-header-double-inclusion-guarding-ifdef-comment.patch
Improper use of userspace_irqchip_in_use led to syzbot hitting the
following WARN_ON() in kvm_timer_update_irq():
WARNING: CPU: 0 PID: 3281 at arch/arm64/kvm/arch_timer.c:459
kvm_timer_update_irq+0x21c/0x394
Call trace:
kvm_timer_update_irq+0x21c/0x394 arch/arm64/kvm/arch_timer.c:459
kvm_timer_vcpu_reset+0x158/0x684 arch/arm64/kvm/arch_timer.c:968
kvm_reset_vcpu+0x3b4/0x560 arch/arm64/kvm/reset.c:264
kvm_vcpu_set_target arch/arm64/kvm/arm.c:1553 [inline]
kvm_arch_vcpu_ioctl_vcpu_init arch/arm64/kvm/arm.c:1573 [inline]
kvm_arch_vcpu_ioctl+0x112c/0x1b3c arch/arm64/kvm/arm.c:1695
kvm_vcpu_ioctl+0x4ec/0xf74 virt/kvm/kvm_main.c:4658
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl fs/ioctl.c:893 [inline]
__arm64_sys_ioctl+0x108/0x184 fs/ioctl.c:893
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x78/0x1b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0xe8/0x1b0 arch/arm64/kernel/syscall.c:132
do_el0_svc+0x40/0x50 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x14c arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
The following sequence led to the scenario:
- Userspace creates a VM and a vCPU.
- The vCPU is initialized with KVM_ARM_VCPU_PMU_V3 during
KVM_ARM_VCPU_INIT.
- Without any other setup, such as vGIC or vPMU, userspace issues
KVM_RUN on the vCPU. Since the vPMU is requested, but not setup,
kvm_arm_pmu_v3_enable() fails in kvm_arch_vcpu_run_pid_change().
As a result, KVM_RUN returns after enabling the timer, but before
incrementing 'userspace_irqchip_in_use':
kvm_arch_vcpu_run_pid_change()
ret = kvm_arm_pmu_v3_enable()
if (!vcpu->arch.pmu.created)
return -EINVAL;
if (ret)
return ret;
[...]
if (!irqchip_in_kernel(kvm))
static_branch_inc(&userspace_irqchip_in_use);
- Userspace ignores the error and issues KVM_ARM_VCPU_INIT again.
Since the timer is already enabled, control moves through the
following flow, ultimately hitting the WARN_ON():
kvm_timer_vcpu_reset()
if (timer->enabled)
kvm_timer_update_irq()
if (!userspace_irqchip())
ret = kvm_vgic_inject_irq()
ret = vgic_lazy_init()
if (unlikely(!vgic_initialized(kvm)))
if (kvm->arch.vgic.vgic_model !=
KVM_DEV_TYPE_ARM_VGIC_V2)
return -EBUSY;
WARN_ON(ret);
Theoretically, since userspace_irqchip_in_use's functionality can be
simply replaced by '!irqchip_in_kernel()', get rid of the static key
to avoid the mismanagement, which also helps with the syzbot issue.
Cc: <stable(a)vger.kernel.org>
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Suggested-by: Marc Zyngier <maz(a)kernel.org>
Signed-off-by: Raghavendra Rao Ananta <rananta(a)google.com>
---
v2:
- Picked the diff shared by Marc to get rid of
'userspace_irqchip_in_use' (thanks).
- Adjusted the commit message accordingly.
v1: https://lore.kernel.org/all/20241025221220.2985227-1-rananta@google.com/
arch/arm64/include/asm/kvm_host.h | 2 --
arch/arm64/kvm/arch_timer.c | 3 +--
arch/arm64/kvm/arm.c | 18 +++---------------
3 files changed, 4 insertions(+), 19 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 329619c6fa961..9f96594a0e05d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -73,8 +73,6 @@ enum kvm_mode kvm_get_mode(void);
static inline enum kvm_mode kvm_get_mode(void) { return KVM_MODE_NONE; };
#endif
-DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
-
extern unsigned int __ro_after_init kvm_sve_max_vl;
extern unsigned int __ro_after_init kvm_host_sve_max_vl;
int __init kvm_arm_init_sve(void);
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 879982b1cc739..1215df5904185 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -206,8 +206,7 @@ void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
static inline bool userspace_irqchip(struct kvm *kvm)
{
- return static_branch_unlikely(&userspace_irqchip_in_use) &&
- unlikely(!irqchip_in_kernel(kvm));
+ return unlikely(!irqchip_in_kernel(kvm));
}
static void soft_timer_start(struct hrtimer *hrt, u64 ns)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a0d01c46e4084..63f5c05e9dec6 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -69,7 +69,6 @@ DECLARE_KVM_NVHE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
static bool vgic_present, kvm_arm_initialised;
static DEFINE_PER_CPU(unsigned char, kvm_hyp_initialized);
-DEFINE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
bool is_kvm_arm_initialised(void)
{
@@ -503,9 +502,6 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
- if (vcpu_has_run_once(vcpu) && unlikely(!irqchip_in_kernel(vcpu->kvm)))
- static_branch_dec(&userspace_irqchip_in_use);
-
kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
kvm_timer_vcpu_terminate(vcpu);
kvm_pmu_vcpu_destroy(vcpu);
@@ -848,14 +844,6 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
return ret;
}
- if (!irqchip_in_kernel(kvm)) {
- /*
- * Tell the rest of the code that there are userspace irqchip
- * VMs in the wild.
- */
- static_branch_inc(&userspace_irqchip_in_use);
- }
-
/*
* Initialize traps for protected VMs.
* NOTE: Move to run in EL2 directly, rather than via a hypercall, once
@@ -1072,7 +1060,7 @@ static bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu, int *ret)
* state gets updated in kvm_timer_update_run and
* kvm_pmu_update_run below).
*/
- if (static_branch_unlikely(&userspace_irqchip_in_use)) {
+ if (unlikely(!irqchip_in_kernel(vcpu->kvm))) {
if (kvm_timer_should_notify_user(vcpu) ||
kvm_pmu_should_notify_user(vcpu)) {
*ret = -EINTR;
@@ -1194,7 +1182,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
vcpu->mode = OUTSIDE_GUEST_MODE;
isb(); /* Ensure work in x_flush_hwstate is committed */
kvm_pmu_sync_hwstate(vcpu);
- if (static_branch_unlikely(&userspace_irqchip_in_use))
+ if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
kvm_timer_sync_user(vcpu);
kvm_vgic_sync_hwstate(vcpu);
local_irq_enable();
@@ -1240,7 +1228,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
* we don't want vtimer interrupts to race with syncing the
* timer virtual interrupt state.
*/
- if (static_branch_unlikely(&userspace_irqchip_in_use))
+ if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
kvm_timer_sync_user(vcpu);
kvm_arch_vcpu_ctxsync_fp(vcpu);
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
--
2.47.0.163.g1226f6d8fa-goog
DAMON's logics to determine if this is the time to apply damos schemes
assumes next_apply_sis is always set larger than current
passed_sample_intervals. And therefore assume continuously incrementing
passed_sample_intervals will make it reaches to the next_apply_sis in
future. The logic hence does apply the scheme and update next_apply_sis
only if passed_sample_intervals is same to next_apply_sis.
If Schemes apply interval is set as zero, however, next_apply_sis is set
same to current passed_sample_intervals, respectively. And
passed_sample_intervals is incremented before doing the next_apply_sis
check. Hence, next_apply_sis becomes larger than next_apply_sis, and
the logic says it is not the time to apply schemes and update
next_apply_sis. In other words, DAMON stops applying schemes until
passed_sample_intervals overflows.
Based on the documents and the common sense, a reasonable behavior for
such inputs would be applying the schemes for every sampling interval.
Handle the case by removing the assumption.
Fixes: 42f994b71404 ("mm/damon/core: implement scheme-specific apply interval")
Cc: <stable(a)vger.kernel.org> # 6.7.x
Signed-off-by: SeongJae Park <sj(a)kernel.org>
---
mm/damon/core.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 931526fb2d2e..511c3f61ab44 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1412,7 +1412,7 @@ static void damon_do_apply_schemes(struct damon_ctx *c,
damon_for_each_scheme(s, c) {
struct damos_quota *quota = &s->quota;
- if (c->passed_sample_intervals != s->next_apply_sis)
+ if (c->passed_sample_intervals < s->next_apply_sis)
continue;
if (!s->wmarks.activated)
@@ -1636,7 +1636,7 @@ static void kdamond_apply_schemes(struct damon_ctx *c)
bool has_schemes_to_apply = false;
damon_for_each_scheme(s, c) {
- if (c->passed_sample_intervals != s->next_apply_sis)
+ if (c->passed_sample_intervals < s->next_apply_sis)
continue;
if (!s->wmarks.activated)
@@ -1656,9 +1656,9 @@ static void kdamond_apply_schemes(struct damon_ctx *c)
}
damon_for_each_scheme(s, c) {
- if (c->passed_sample_intervals != s->next_apply_sis)
+ if (c->passed_sample_intervals < s->next_apply_sis)
continue;
- s->next_apply_sis +=
+ s->next_apply_sis = c->passed_sample_intervals +
(s->apply_interval_us ? s->apply_interval_us :
c->attrs.aggr_interval) / sample_interval;
}
--
2.39.5
DAMON's logics to determine if this is the time to do aggregation and
ops update assumes next_{aggregation,ops_update}_sis are always set
larger than current passed_sample_intervals. And therefore it further
assumes continuously incrementing passed_sample_intervals every sampling
interval will make it reaches to the next_{aggregation,ops_update}_sis
in future. The logic therefore make the action and update
next_{aggregation,ops_updaste}_sis only if passed_sample_intervals is
same to the counts, respectively.
If Aggregation interval or Ops update interval are zero, however,
next_aggregation_sis or next_ops_update_sis are set same to current
passed_sample_intervals, respectively. And passed_sample_intervals is
incremented before doing the next_{aggregation,ops_update}_sis check.
Hence, passed_sample_intervals becomes larger than
next_{aggregation,ops_update}_sis, and the logic says it is not the time
to do the action and update next_{aggregation,ops_update}_sis forever,
until an overflow happens. In other words, DAMON stops doing
aggregations or ops updates effectively forever, and users cannot get
monitoring results.
Based on the documents and the common sense, a reasonable behavior for
such inputs is doing an aggregation and an ops update for every sampling
interval. Handle the case by removing the assumption.
Note that this could incur particular real issue for DAMON sysfs
interface users, in case of zero Aggregation interval. When user starts
DAMON with zero Aggregation interval and asks online DAMON parameter
tuning via DAMON sysfs interface, the request is handled by the
aggregation callback. Until the callback finishes the work, the user
who requested the online tuning just waits. Hence, the user will be
stuck until the passed_sample_intervals overflows.
Fixes: 4472edf63d66 ("mm/damon/core: use number of passed access sampling as a timer")
Cc: <stable(a)vger.kernel.org> # 6.7.x
Signed-off-by: SeongJae Park <sj(a)kernel.org>
---
mm/damon/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 27745dcf855f..931526fb2d2e 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -2014,7 +2014,7 @@ static int kdamond_fn(void *data)
if (ctx->ops.check_accesses)
max_nr_accesses = ctx->ops.check_accesses(ctx);
- if (ctx->passed_sample_intervals == next_aggregation_sis) {
+ if (ctx->passed_sample_intervals >= next_aggregation_sis) {
kdamond_merge_regions(ctx,
max_nr_accesses / 10,
sz_limit);
@@ -2032,7 +2032,7 @@ static int kdamond_fn(void *data)
sample_interval = ctx->attrs.sample_interval ?
ctx->attrs.sample_interval : 1;
- if (ctx->passed_sample_intervals == next_aggregation_sis) {
+ if (ctx->passed_sample_intervals >= next_aggregation_sis) {
ctx->next_aggregation_sis = next_aggregation_sis +
ctx->attrs.aggr_interval / sample_interval;
@@ -2042,7 +2042,7 @@ static int kdamond_fn(void *data)
ctx->ops.reset_aggregated(ctx);
}
- if (ctx->passed_sample_intervals == next_ops_update_sis) {
+ if (ctx->passed_sample_intervals >= next_ops_update_sis) {
ctx->next_ops_update_sis = next_ops_update_sis +
ctx->attrs.ops_update_interval /
sample_interval;
--
2.39.5
I can't find any reason why it won't happen.
In SERDES_TG3_SGMII_MODE, when current_link_up == true and
current_duplex == DUPLEX_FULL, program execution will be transferred
using the goto fiber_setup_done, where the uninitialized remote_adv
variable is passed as the second parameter to the
tg3_setup_flow_control function.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 85730a631f0c ("tg3: Add SGMII phy support for 5719/5718 serdes")
Cc: stable(a)vger.kernel.org
Signed-off-by: George Rurikov <grurikov(a)gmail.com>
---
drivers/net/ethernet/broadcom/tg3.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 378815917741..b1c60851c841 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -5802,7 +5802,8 @@ static int tg3_setup_fiber_mii_phy(struct tg3 *tp, bool force_reset)
u32 current_speed = SPEED_UNKNOWN;
u8 current_duplex = DUPLEX_UNKNOWN;
bool current_link_up = false;
- u32 local_adv, remote_adv, sgsr;
+ u32 local_adv, sgsr;
+ u32 remote_adv = 0;
if ((tg3_asic_rev(tp) == ASIC_REV_5719 ||
tg3_asic_rev(tp) == ASIC_REV_5720) &&
--
2.34.1
From: Bin Liu <b-liu(a)ti.com>
Currently in omap_8250_shutdown, the dma->rx_running flag is
set to zero in omap_8250_rx_dma_flush. Next pm_runtime_get_sync
is called, which is a runtime resume call stack which can
re-set the flag. When the call omap_8250_shutdown returns, the
flag is expected to be UN-SET, but this is not the case. This
is causing issues the next time UART is re-opened and
omap_8250_rx_dma is called. Fix by moving pm_runtime_get_sync
before the omap_8250_rx_dma_flush.
cc: stable(a)vger.kernel.org
Fixes: 0e31c8d173ab ("tty: serial: 8250_omap: add custom DMA-RX callback")
Signed-off-by: Bin Liu <b-liu(a)ti.com>
[Judith: Add commit message]
Signed-off-by: Judith Mendez <jm(a)ti.com>
Reviewed-by: Kevin Hilman <khilman(a)baylibre.com>
Tested-by: Kevin Hilman <khilman(a)baylibre.com>
---
Issue seen on am335x devices so far [0].
The patch has been tested with sanity boot test on am335x EVM,
am335x-boneblack and am57xx-beagle-x15.
Changes since v1 RESEND:
- Fix email header and commit description length
- Add fixes tag, add link [0], cc stable, add kevin's reviewed-by/tested-by's
- Separate patch from patch series [1]
[0] https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1…
Link to v1 RESEND:
[1] https://lore.kernel.org/linux-omap/20241011173356.870883-1-jm@ti.com/
---
drivers/tty/serial/8250/8250_omap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
index 88b58f44e4e97..0dd68bdbfbcf7 100644
--- a/drivers/tty/serial/8250/8250_omap.c
+++ b/drivers/tty/serial/8250/8250_omap.c
@@ -776,12 +776,12 @@ static void omap_8250_shutdown(struct uart_port *port)
struct uart_8250_port *up = up_to_u8250p(port);
struct omap8250_priv *priv = port->private_data;
+ pm_runtime_get_sync(port->dev);
+
flush_work(&priv->qos_work);
if (up->dma)
omap_8250_rx_dma_flush(up);
- pm_runtime_get_sync(port->dev);
-
serial_out(up, UART_OMAP_WER, 0);
if (priv->habit & UART_HAS_EFR2)
serial_out(up, UART_OMAP_EFR2, 0x0);
--
2.47.0
This series fixes the wrong management of the 'led_node' fwnode_handle,
which is not released after it is no longer required. This affects both
the normal path of execution and the existing error paths (currently
two) in max5970_led_probe().
First, the missing callst to fwnode_handle_put() in the different code
paths are added, to make the patch available for stable kernels. Then,
the code gets updated to a more robust approach by means of the __free()
macro to automatically release the node when it goes out of scope,
removing the need for explicit calls to fwnode_handle_put().
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (2):
leds: max5970: fix unreleased fwnode_handle in probe function
leds: max5970: use cleanup facility for fwnode_handle led_node
drivers/leds/leds-max5970.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
---
base-commit: f2493655d2d3d5c6958ed996b043c821c23ae8d3
change-id: 20241019-max5970-of_node_put-939b004f57d2
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
damon_feed_loop_next_input() is inefficient and fragile to overflows.
Specifically, 'score_goal_diff_bp' calculation can overflow when 'score'
is high. The calculation is actually unnecessary at all because 'goal'
is a constant of value 10,000. Calculation of 'compensation' is again
fragile to overflow. Final calculation of return value for
under-achiving case is again fragile to overflow when the current score
is under-achieving the target.
Add two corner cases handling at the beginning of the function to make
the body easier to read, and rewrite the body of the function to avoid
overflows and the unnecessary bp value calcuation.
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Closes: https://lore.kernel.org/944f3d5b-9177-48e7-8ec9-7f1331a3fea3@roeck-us.net
Fixes: 9294a037c015 ("mm/damon/core: implement goal-oriented feedback-driven quota auto-tuning")
Cc: <stable(a)vger.kernel.org> # 6.8.x
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
---
Changes from RFC
(https://lore.kernel.org/20240905172405.46995-1-sj@kernel.org)
- Rebase on latest mm-unstable and cleanup code
mm/damon/core.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/mm/damon/core.c b/mm/damon/core.c
index a83f3b736d51..27745dcf855f 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1456,17 +1456,31 @@ static unsigned long damon_feed_loop_next_input(unsigned long last_input,
unsigned long score)
{
const unsigned long goal = 10000;
- unsigned long score_goal_diff = max(goal, score) - min(goal, score);
- unsigned long score_goal_diff_bp = score_goal_diff * 10000 / goal;
- unsigned long compensation = last_input * score_goal_diff_bp / 10000;
/* Set minimum input as 10000 to avoid compensation be zero */
const unsigned long min_input = 10000;
+ unsigned long score_goal_diff, compensation;
+ bool over_achieving = score > goal;
- if (goal > score)
+ if (score == goal)
+ return last_input;
+ if (score >= goal * 2)
+ return min_input;
+
+ if (over_achieving)
+ score_goal_diff = score - goal;
+ else
+ score_goal_diff = goal - score;
+
+ if (last_input < ULONG_MAX / score_goal_diff)
+ compensation = last_input * score_goal_diff / goal;
+ else
+ compensation = last_input / goal * score_goal_diff;
+
+ if (over_achieving)
+ return max(last_input - compensation, min_input);
+ if (last_input < ULONG_MAX - compensation)
return last_input + compensation;
- if (last_input > compensation + min_input)
- return last_input - compensation;
- return min_input;
+ return ULONG_MAX;
}
#ifdef CONFIG_PSI
--
2.39.5
Setting TPM_CHIP_FLAG_SUSPENDED in the end of tpm_pm_suspend() can be racy
according to the bug report, as this leaves window for tpm_hwrng_read() to
be called while the operation is in progress. Move setting of the flag
into the beginning.
Cc: stable(a)vger.kernel.org # v6.4+
Fixes: 99d464506255 ("tpm: Prevent hwrng from activating during resume")
Reported-by: Mike Seo <mikeseohyungjin(a)gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219383
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
drivers/char/tpm/tpm-interface.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 8134f002b121..3f96bc8b95df 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -370,6 +370,8 @@ int tpm_pm_suspend(struct device *dev)
if (!chip)
return -ENODEV;
+ chip->flags |= TPM_CHIP_FLAG_SUSPENDED;
+
if (chip->flags & TPM_CHIP_FLAG_ALWAYS_POWERED)
goto suspended;
@@ -390,8 +392,6 @@ int tpm_pm_suspend(struct device *dev)
}
suspended:
- chip->flags |= TPM_CHIP_FLAG_SUSPENDED;
-
if (rc)
dev_err(dev, "Ignoring error %d while suspending\n", rc);
return 0;
--
2.47.0
Syzkaller reported a hung task with uevent_show() on stack trace. That
specific issue was addressed by another commit [0], but even with that
fix applied (for example, running v6.12-rc5) we face another type of hung
task that comes from the same reproducer [1]. By investigating that, we
could narrow it to the following path:
(a) Syzkaller emulates a Realtek USB WiFi adapter using raw-gadget and
dummy_hcd infrastructure.
(b) During the probe of rtl8192cu, the driver ends-up performing an efuse
read procedure (which is related to EEPROM load IIUC), and here lies the
issue: the function read_efuse() calls read_efuse_byte() many times, as
loop iterations depending on the efuse size (in our example, 512 in total).
This procedure for reading efuse bytes relies in a loop that performs an
I/O read up to *10k* times in case of failures. We measured the time of
the loop inside read_efuse_byte() alone, and in this reproducer (which
involves the dummy_hcd emulation layer), it takes 15 seconds each. As a
consequence, we have the driver stuck in its probe routine for big time,
exposing a stack trace like below if we attempt to reboot the system, for
example:
task:kworker/0:3 state:D stack:0 pid:662 tgid:662 ppid:2 flags:0x00004000
Workqueue: usb_hub_wq hub_event
Call Trace:
__schedule+0xe22/0xeb6
schedule_timeout+0xe7/0x132
__wait_for_common+0xb5/0x12e
usb_start_wait_urb+0xc5/0x1ef
? usb_alloc_urb+0x95/0xa4
usb_control_msg+0xff/0x184
_usbctrl_vendorreq_sync+0xa0/0x161
_usb_read_sync+0xb3/0xc5
read_efuse_byte+0x13c/0x146
read_efuse+0x351/0x5f0
efuse_read_all_map+0x42/0x52
rtl_efuse_shadow_map_update+0x60/0xef
rtl_get_hwinfo+0x5d/0x1c2
rtl92cu_read_eeprom_info+0x10a/0x8d5
? rtl92c_read_chip_version+0x14f/0x17e
rtl_usb_probe+0x323/0x851
usb_probe_interface+0x278/0x34b
really_probe+0x202/0x4a4
__driver_probe_device+0x166/0x1b2
driver_probe_device+0x2f/0xd8
[...]
We propose hereby to drastically reduce the attempts of doing the I/O reads
in case of failures, restricted to USB devices (given that they're inherently
slower than PCIe ones). By retrying up to 10 times (instead of 10000), we
got responsiveness in the reproducer, while seems reasonable to believe
that there's no sane USB device implementation in the field requiring this
amount of retries (at every I/O read) in order to properly work. Based on
that assumption, it'd be good to have it backported to stable but maybe not
since driver implementation (the 10k number comes from day 0), perhaps up
to 6.x series makes sense.
[0] Commit 15fffc6a5624 ("driver core: Fix uevent_show() vs driver detach race").
[1] A note about that: this syzkaller report presents multiple reproducers
that differs by the type of emulated USB device. For this specific case,
check the entry from 2024/08/08 06:23 in the list of crashes; the C repro
is available at https://syzkaller.appspot.com/text?tag=ReproC&x=1521fc83980000.
Cc: stable(a)vger.kernel.org # v6.1+
Reported-by: syzbot+edd9fe0d3a65b14588d5(a)syzkaller.appspotmail.com
Tested-by: Bitterblue Smith <rtl8821cerfe2(a)gmail.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
---
V2:
- Restrict the change to USB device only (thanks Ping-Ke Shih).
- Tested in 2 USB devices by Bitterblue Smith - thanks a lot!
V1: https://lore.kernel.org/lkml/20241025150226.896613-1-gpiccoli@igalia.com/
drivers/net/wireless/realtek/rtlwifi/efuse.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtlwifi/efuse.c b/drivers/net/wireless/realtek/rtlwifi/efuse.c
index 82cf5fb5175f..f741066c06de 100644
--- a/drivers/net/wireless/realtek/rtlwifi/efuse.c
+++ b/drivers/net/wireless/realtek/rtlwifi/efuse.c
@@ -164,7 +164,17 @@ void read_efuse_byte(struct ieee80211_hw *hw, u16 _offset, u8 *pbuf)
struct rtl_priv *rtlpriv = rtl_priv(hw);
u32 value32;
u8 readbyte;
- u16 retry;
+ u16 retry, max_attempts;
+
+ /*
+ * In case of USB devices, transfer speeds are limited, hence
+ * efuse I/O reads could be (way) slower. So, decrease (a lot)
+ * the read attempts in case of failures.
+ */
+ if (rtlpriv->rtlhal.interface == INTF_PCI)
+ max_attempts = 10000;
+ else
+ max_attempts = 10;
rtl_write_byte(rtlpriv, rtlpriv->cfg->maps[EFUSE_CTRL] + 1,
(_offset & 0xff));
@@ -178,7 +188,7 @@ void read_efuse_byte(struct ieee80211_hw *hw, u16 _offset, u8 *pbuf)
retry = 0;
value32 = rtl_read_dword(rtlpriv, rtlpriv->cfg->maps[EFUSE_CTRL]);
- while (!(((value32 >> 24) & 0xff) & 0x80) && (retry < 10000)) {
+ while (!(((value32 >> 24) & 0xff) & 0x80) && (retry < max_attempts)) {
value32 = rtl_read_dword(rtlpriv,
rtlpriv->cfg->maps[EFUSE_CTRL]);
retry++;
--
2.46.2
Since commit 92a81562e695 ("leds: lp55xx: Add multicolor framework
support to lp55xx") there are two subsequent tests if the chan_nr
(reg property) is in valid range. One in the lp55xx_init_led()
function and one in the lp55xx_parse_common_child() function that
was added with the mentioned commit.
There are two issues with that.
First is in the lp55xx_parse_common_child() function where the reg
property is tested right after it is read from the device tree.
Test for the upper range is not correct though. Valid reg values are
0 to (max_channel - 1) so it should be >=.
Second issue is that in case the parsed value is out of the range
the probe just fails and no error message is shown as the code never
reaches the second test that prints and error message.
Remove the test form lp55xx_parse_common_child() function completely
and keep the one in lp55xx_init_led() function to deal with it.
Fixes: 92a81562e695 ("leds: lp55xx: Add multicolor framework support to lp55xx")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Michal Vokáč <michal.vokac(a)ysoft.com>
---
v2:
- Complete change of the approach to the problem.
In v1 I removed the test from lp55xx_init_led() but I failed to test that
solution properly. It could not work. In v2 I removed the test for chan_nr
being out of range from the lp55xx_parse_common_child() function.
- Re-worded the subject and commit message to fit the changes. It was:
"leds: lp55xx: Fix check for invalid channel number"
drivers/leds/leds-lp55xx-common.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/leds/leds-lp55xx-common.c b/drivers/leds/leds-lp55xx-common.c
index 5a2e259679cf..e71456a56ab8 100644
--- a/drivers/leds/leds-lp55xx-common.c
+++ b/drivers/leds/leds-lp55xx-common.c
@@ -1132,9 +1132,6 @@ static int lp55xx_parse_common_child(struct device_node *np,
if (ret)
return ret;
- if (*chan_nr < 0 || *chan_nr > cfg->max_channel)
- return -EINVAL;
-
return 0;
}
--
2.1.4
Move the EXYNOS_UFS_OPT_UFSPR_SECURE check inside exynos_ufs_config_smu().
This way all call sites will benefit from the check. This fixes a bug
currently in the exynos_ufs_resume() path on gs101 as it calls
exynos_ufs_config_smu() and we end up accessing registers that can only
be accessed from secure world which results in a serror.
Fixes: d11e0a318df8 ("scsi: ufs: exynos: Add support for Tensor gs101 SoC")
Signed-off-by: Peter Griffin <peter.griffin(a)linaro.org>
Reviewed-by: Tudor Ambarus <tudor.ambarus(a)linaro.org>
Cc: stable(a)vger.kernel.org
---
v3: CC stable and be more verbose in commit message (Tudor)
---
drivers/ufs/host/ufs-exynos.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/ufs/host/ufs-exynos.c b/drivers/ufs/host/ufs-exynos.c
index 33de7ff747a2..f4454e89040f 100644
--- a/drivers/ufs/host/ufs-exynos.c
+++ b/drivers/ufs/host/ufs-exynos.c
@@ -719,6 +719,9 @@ static void exynos_ufs_config_smu(struct exynos_ufs *ufs)
{
u32 reg, val;
+ if (ufs->opts & EXYNOS_UFS_OPT_UFSPR_SECURE)
+ return;
+
exynos_ufs_disable_auto_ctrl_hcc_save(ufs, &val);
/* make encryption disabled by default */
@@ -1454,8 +1457,8 @@ static int exynos_ufs_init(struct ufs_hba *hba)
if (ret)
goto out;
exynos_ufs_specify_phy_time_attr(ufs);
- if (!(ufs->opts & EXYNOS_UFS_OPT_UFSPR_SECURE))
- exynos_ufs_config_smu(ufs);
+
+ exynos_ufs_config_smu(ufs);
hba->host->dma_alignment = DATA_UNIT_SIZE - 1;
return 0;
--
2.47.0.163.g1226f6d8fa-goog
From: Ard Biesheuvel <ardb(a)kernel.org>
The TPM event log table is a Linux specific construct, where the data
produced by the GetEventLog() boot service is cached in memory, and
passed on to the OS using a EFI configuration table.
The use of EFI_LOADER_DATA here results in the region being left
unreserved in the E820 memory map constructed by the EFI stub, and this
is the memory description that is passed on to the incoming kernel by
kexec, which is therefore unaware that the region should be reserved.
Even though the utility of the TPM2 event log after a kexec is
questionable, any corruption might send the parsing code off into the
weeds and crash the kernel. So let's use EFI_ACPI_RECLAIM_MEMORY
instead, which is always treated as reserved by the E820 conversion
logic.
Cc: <stable(a)vger.kernel.org>
Reported-by: Breno Leitao <leitao(a)debian.org>
Tested-by: Usama Arif <usamaarif642(a)gmail.com>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
drivers/firmware/efi/libstub/tpm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/libstub/tpm.c b/drivers/firmware/efi/libstub/tpm.c
index df3182f2e63a..1fd6823248ab 100644
--- a/drivers/firmware/efi/libstub/tpm.c
+++ b/drivers/firmware/efi/libstub/tpm.c
@@ -96,7 +96,7 @@ static void efi_retrieve_tcg2_eventlog(int version, efi_physical_addr_t log_loca
}
/* Allocate space for the logs and copy them. */
- status = efi_bs_call(allocate_pool, EFI_LOADER_DATA,
+ status = efi_bs_call(allocate_pool, EFI_ACPI_RECLAIM_MEMORY,
sizeof(*log_tbl) + log_size, (void **)&log_tbl);
if (status != EFI_SUCCESS) {
--
2.46.0.662.g92d0881bb0-goog
The quilt patch titled
Subject: mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes
has been removed from the -mm tree. Its filename was
mm-mmap-limit-thp-aligment-of-anonymous-mappings-to-pmd-aligned-sizes.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes
Date: Thu, 24 Oct 2024 17:12:29 +0200
Since commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
boundaries") a mmap() of anonymous memory without a specific address hint
and of at least PMD_SIZE will be aligned to PMD so that it can benefit
from a THP backing page.
However this change has been shown to regress some workloads
significantly. [1] reports regressions in various spec benchmarks, with
up to 600% slowdown of the cactusBSSN benchmark on some platforms. The
benchmark seems to create many mappings of 4632kB, which would have merged
to a large THP-backed area before commit efa7df3e3bb5 and now they are
fragmented to multiple areas each aligned to PMD boundary with gaps
between. The regression then seems to be caused mainly due to the
benchmark's memory access pattern suffering from TLB or cache aliasing due
to the aligned boundaries of the individual areas.
Another known regression bisected to commit efa7df3e3bb5 is darktable [2]
[3] and early testing suggests this patch fixes the regression there as
well.
To fix the regression but still try to benefit from THP-friendly anonymous
mapping alignment, add a condition that the size of the mapping must be a
multiple of PMD size instead of at least PMD size. In case of many
odd-sized mapping like the cactusBSSN creates, those will stop being
aligned and with gaps between, and instead naturally merge again.
Link: https://lkml.kernel.org/r/20241024151228.101841-2-vbabka@suse.cz
Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Reported-by: Michael Matz <matz(a)suse.de>
Debugged-by: Gabriel Krisman Bertazi <gabriel(a)krisman.be>
Closes: https://bugzilla.suse.com/show_bug.cgi?id=1229012 [1]
Reported-by: Matthias Bodenbinder <matthias(a)bodenbinder.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219366 [2]
Closes: https://lore.kernel.org/all/2050f0d4-57b0-481d-bab8-05e8d48fed0c@leemhuis.i… [3]
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Petr Tesarik <ptesarik(a)suse.com>
Cc: Thorsten Leemhuis <regressions(a)leemhuis.info>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/mmap.c~mm-mmap-limit-thp-aligment-of-anonymous-mappings-to-pmd-aligned-sizes
+++ a/mm/mmap.c
@@ -900,7 +900,8 @@ __get_unmapped_area(struct file *file, u
if (get_area) {
addr = get_area(file, addr, len, pgoff, flags);
- } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+ } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)
+ && IS_ALIGNED(len, PMD_SIZE)) {
/* Ensures that larger anonymous mappings are THP aligned. */
addr = thp_get_unmapped_area_vmflags(file, addr, len,
pgoff, flags, vm_flags);
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
The quilt patch titled
Subject: vmscan,migrate: fix page count imbalance on node stats when demoting pages
has been removed from the -mm tree. Its filename was
vmscanmigrate-fix-double-decrement-on-node-stats-when-demoting-pages.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Gregory Price <gourry(a)gourry.net>
Subject: vmscan,migrate: fix page count imbalance on node stats when demoting pages
Date: Fri, 25 Oct 2024 10:17:24 -0400
When numa balancing is enabled with demotion, vmscan will call
migrate_pages when shrinking LRUs. migrate_pages will decrement the
the node's isolated page count, leading to an imbalanced count when
invoked from (MG)LRU code.
The result is dmesg output like such:
$ cat /proc/sys/vm/stat_refresh
[77383.088417] vmstat_refresh: nr_isolated_anon -103212
[77383.088417] vmstat_refresh: nr_isolated_file -899642
This negative value may impact compaction and reclaim throttling.
The following path produces the decrement:
shrink_folio_list
demote_folio_list
migrate_pages
migrate_pages_batch
migrate_folio_move
migrate_folio_done
mod_node_page_state(-ve) <- decrement
This path happens for SUCCESSFUL migrations, not failures. Typically
callers to migrate_pages are required to handle putback/accounting for
failures, but this is already handled in the shrink code.
When accounting for migrations, instead do not decrement the count when
the migration reason is MR_DEMOTION. As of v6.11, this demotion logic
is the only source of MR_DEMOTION.
Link: https://lkml.kernel.org/r/20241025141724.17927-1-gourry@gourry.net
Fixes: 26aa2d199d6f ("mm/migrate: demote pages during reclaim")
Signed-off-by: Gregory Price <gourry(a)gourry.net>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Reviewed-by: Davidlohr Bueso <dave(a)stgolabs.net>
Reviewed-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Reviewed-by: "Huang, Ying" <ying.huang(a)intel.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/migrate.c~vmscanmigrate-fix-double-decrement-on-node-stats-when-demoting-pages
+++ a/mm/migrate.c
@@ -1178,7 +1178,7 @@ static void migrate_folio_done(struct fo
* not accounted to NR_ISOLATED_*. They can be recognized
* as __folio_test_movable
*/
- if (likely(!__folio_test_movable(src)))
+ if (likely(!__folio_test_movable(src)) && reason != MR_DEMOTION)
mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON +
folio_is_file_lru(src), -folio_nr_pages(src));
_
Patches currently in -mm which might be from gourry(a)gourry.net are
The quilt patch titled
Subject: mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
has been removed from the -mm tree. Its filename was
mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yu Zhao <yuzhao(a)google.com>
Subject: mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
Date: Sat, 19 Oct 2024 01:29:38 +0000
Patch series "mm: multi-gen LRU: Have secondary MMUs participate in
MM_WALK".
Today, the MM_WALK capability causes MGLRU to clear the young bit from
PMDs and PTEs during the page table walk before eviction, but MGLRU does
not call the clear_young() MMU notifier in this case. By not calling this
notifier, the MM walk takes less time/CPU, but it causes pages that are
accessed mostly through KVM / secondary MMUs to appear younger than they
should be.
We do call the clear_young() notifier today, but only when attempting to
evict the page, so we end up clearing young/accessed information less
frequently for secondary MMUs than for mm PTEs, and therefore they appear
younger and are less likely to be evicted. Therefore, memory that is
*not* being accessed mostly by KVM will be evicted *more* frequently,
worsening performance.
ChromeOS observed a tab-open latency regression when enabling MGLRU with a
setup that involved running a VM:
Tab-open latency histogram (ms)
Version p50 mean p95 p99 max
base 1315 1198 2347 3454 10319
mglru 2559 1311 7399 12060 43758
fix 1119 926 2470 4211 6947
This series replaces the final non-selftest patchs from this series[1],
which introduced a similar change (and a new MMU notifier) with KVM
optimizations. I'll send a separate series (to Sean and Paolo) for the
KVM optimizations.
This series also makes proactive reclaim with MGLRU possible for KVM
memory. I have verified that this functions correctly with the selftest
from [1], but given that that test is a KVM selftest, I'll send it with
the rest of the KVM optimizations later. Andrew, let me know if you'd
like to take the test now anyway.
[1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google…
This patch (of 2):
The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful
and become more complicated to properly compute when adding
test/clear_young() notifiers in MGLRU's mm walk.
Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com
Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: James Houghton <jthoughton(a)google.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Matlack <dmatlack(a)google.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: David Stevens <stevensd(a)google.com>
Cc: Oliver Upton <oliver.upton(a)linux.dev>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mmzone.h | 2 --
mm/vmscan.c | 14 +++++---------
2 files changed, 5 insertions(+), 11 deletions(-)
--- a/include/linux/mmzone.h~mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats
+++ a/include/linux/mmzone.h
@@ -458,9 +458,7 @@ struct lru_gen_folio {
enum {
MM_LEAF_TOTAL, /* total leaf entries */
- MM_LEAF_OLD, /* old leaf entries */
MM_LEAF_YOUNG, /* young leaf entries */
- MM_NONLEAF_TOTAL, /* total non-leaf entries */
MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */
MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */
NR_MM_STATS
--- a/mm/vmscan.c~mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats
+++ a/mm/vmscan.c
@@ -3399,7 +3399,6 @@ restart:
continue;
if (!pte_young(ptent)) {
- walk->mm_stats[MM_LEAF_OLD]++;
continue;
}
@@ -3552,7 +3551,6 @@ restart:
walk->mm_stats[MM_LEAF_TOTAL]++;
if (!pmd_young(val)) {
- walk->mm_stats[MM_LEAF_OLD]++;
continue;
}
@@ -3564,8 +3562,6 @@ restart:
continue;
}
- walk->mm_stats[MM_NONLEAF_TOTAL]++;
-
if (!walk->force_scan && should_clear_pmd_young()) {
if (!pmd_young(val))
continue;
@@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct
for (tier = 0; tier < MAX_NR_TIERS; tier++) {
seq_printf(m, " %10d", tier);
for (type = 0; type < ANON_AND_FILE; type++) {
- const char *s = " ";
+ const char *s = "xxx";
unsigned long n[3] = {};
if (seq == max_seq) {
- s = "RT ";
+ s = "RTx";
n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]);
n[1] = READ_ONCE(lrugen->avg_total[type][tier]);
} else if (seq == min_seq[type] || NR_HIST_GENS > 1) {
@@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct
seq_puts(m, " ");
for (i = 0; i < NR_MM_STATS; i++) {
- const char *s = " ";
+ const char *s = "xxxx";
unsigned long n = 0;
if (seq == max_seq && NR_HIST_GENS == 1) {
- s = "LOYNFA";
+ s = "TYFA";
n = READ_ONCE(mm_state->stats[hist][i]);
} else if (seq != max_seq && NR_HIST_GENS > 1) {
- s = "loynfa";
+ s = "tyfa";
n = READ_ONCE(mm_state->stats[hist][i]);
}
_
Patches currently in -mm which might be from yuzhao(a)google.com are
mm-page_alloc-keep-track-of-free-highatomic.patch
The quilt patch titled
Subject: mm: allow set/clear page_type again
has been removed from the -mm tree. Its filename was
mm-allow-set-clear-page_type-again.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yu Zhao <yuzhao(a)google.com>
Subject: mm: allow set/clear page_type again
Date: Sat, 19 Oct 2024 22:22:12 -0600
Some page flags (page->flags) were converted to page types
(page->page_types). A recent example is PG_hugetlb.
From the exclusive writer's perspective, e.g., a thread doing
__folio_set_hugetlb(), there is a difference between the page flag and
type APIs: the former allows the same non-atomic operation to be repeated
whereas the latter does not. For example, calling __folio_set_hugetlb()
twice triggers VM_BUG_ON_FOLIO(), since the second call expects the type
(PG_hugetlb) not to be set previously.
Using add_hugetlb_folio() as an example, it calls __folio_set_hugetlb() in
the following error-handling path. And when that happens, it triggers the
aforementioned VM_BUG_ON_FOLIO().
if (folio_test_hugetlb(folio)) {
rc = hugetlb_vmemmap_restore_folio(h, folio);
if (rc) {
spin_lock_irq(&hugetlb_lock);
add_hugetlb_folio(h, folio, false);
...
It is possible to make hugeTLB comply with the new requirements from the
page type API. However, a straightforward fix would be to just allow the
same page type to be set or cleared again inside the API, to avoid any
changes to its callers.
Link: https://lkml.kernel.org/r/20241020042212.296781-1-yuzhao@google.com
Fixes: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType")
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/page-flags.h | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/include/linux/page-flags.h~mm-allow-set-clear-page_type-again
+++ a/include/linux/page-flags.h
@@ -975,12 +975,16 @@ static __always_inline bool folio_test_#
} \
static __always_inline void __folio_set_##fname(struct folio *folio) \
{ \
+ if (folio_test_##fname(folio)) \
+ return; \
VM_BUG_ON_FOLIO(data_race(folio->page.page_type) != UINT_MAX, \
folio); \
folio->page.page_type = (unsigned int)PGTY_##lname << 24; \
} \
static __always_inline void __folio_clear_##fname(struct folio *folio) \
{ \
+ if (folio->page.page_type == UINT_MAX) \
+ return; \
VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio); \
folio->page.page_type = UINT_MAX; \
}
@@ -993,11 +997,15 @@ static __always_inline int Page##uname(c
} \
static __always_inline void __SetPage##uname(struct page *page) \
{ \
+ if (Page##uname(page)) \
+ return; \
VM_BUG_ON_PAGE(data_race(page->page_type) != UINT_MAX, page); \
page->page_type = (unsigned int)PGTY_##lname << 24; \
} \
static __always_inline void __ClearPage##uname(struct page *page) \
{ \
+ if (page->page_type == UINT_MAX) \
+ return; \
VM_BUG_ON_PAGE(!Page##uname(page), page); \
page->page_type = UINT_MAX; \
}
_
Patches currently in -mm which might be from yuzhao(a)google.com are
mm-page_alloc-keep-track-of-free-highatomic.patch
The quilt patch titled
Subject: nilfs2: fix potential deadlock with newly created symlinks
has been removed from the -mm tree. Its filename was
nilfs2-fix-potential-deadlock-with-newly-created-symlinks.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix potential deadlock with newly created symlinks
Date: Sun, 20 Oct 2024 13:51:28 +0900
Syzbot reported that page_symlink(), called by nilfs_symlink(), triggers
memory reclamation involving the filesystem layer, which can result in
circular lock dependencies among the reader/writer semaphore
nilfs->ns_segctor_sem, s_writers percpu_rwsem (intwrite) and the
fs_reclaim pseudo lock.
This is because after commit 21fc61c73c39 ("don't put symlink bodies in
pagecache into highmem"), the gfp flags of the page cache for symbolic
links are overwritten to GFP_KERNEL via inode_nohighmem().
This is not a problem for symlinks read from the backing device, because
the __GFP_FS flag is dropped after inode_nohighmem() is called. However,
when a new symlink is created with nilfs_symlink(), the gfp flags remain
overwritten to GFP_KERNEL. Then, memory allocation called from
page_symlink() etc. triggers memory reclamation including the FS layer,
which may call nilfs_evict_inode() or nilfs_dirty_inode(). And these can
cause a deadlock if they are called while nilfs->ns_segctor_sem is held:
Fix this issue by dropping the __GFP_FS flag from the page cache GFP flags
of newly created symlinks in the same way that nilfs_new_inode() and
__nilfs_read_inode() do, as a workaround until we adopt nofs allocation
scope consistently or improve the locking constraints.
Link: https://lkml.kernel.org/r/20241020050003.4308-1-konishi.ryusuke@gmail.com
Fixes: 21fc61c73c39 ("don't put symlink bodies in pagecache into highmem")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+9ef37ac20608f4836256(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9ef37ac20608f4836256
Tested-by: syzbot+9ef37ac20608f4836256(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/namei.c | 3 +++
1 file changed, 3 insertions(+)
--- a/fs/nilfs2/namei.c~nilfs2-fix-potential-deadlock-with-newly-created-symlinks
+++ a/fs/nilfs2/namei.c
@@ -157,6 +157,9 @@ static int nilfs_symlink(struct mnt_idma
/* slow symlink */
inode->i_op = &nilfs_symlink_inode_operations;
inode_nohighmem(inode);
+ mapping_set_gfp_mask(inode->i_mapping,
+ mapping_gfp_constraint(inode->i_mapping,
+ ~__GFP_FS));
inode->i_mapping->a_ops = &nilfs_aops;
err = page_symlink(inode, symname, l);
if (err)
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-convert-segment-buffer-to-be-folio-based.patch
nilfs2-convert-common-metadata-file-code-to-be-folio-based.patch
nilfs2-convert-segment-usage-file-to-be-folio-based.patch
nilfs2-convert-persistent-object-allocator-to-be-folio-based.patch
nilfs2-convert-inode-file-to-be-folio-based.patch
nilfs2-convert-dat-file-to-be-folio-based.patch
nilfs2-remove-nilfs_palloc_block_get_entry.patch
nilfs2-convert-checkpoint-file-to-be-folio-based.patch
The quilt patch titled
Subject: kasan: remove vmalloc_percpu test
has been removed from the -mm tree. Its filename was
kasan-remove-vmalloc_percpu-test.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Andrey Konovalov <andreyknvl(a)gmail.com>
Subject: kasan: remove vmalloc_percpu test
Date: Tue, 22 Oct 2024 18:07:06 +0200
Commit 1a2473f0cbc0 ("kasan: improve vmalloc tests") added the
vmalloc_percpu KASAN test with the assumption that __alloc_percpu always
uses vmalloc internally, which is tagged by KASAN.
However, __alloc_percpu might allocate memory from the first per-CPU
chunk, which is not allocated via vmalloc(). As a result, the test might
fail.
Remove the test until proper KASAN annotation for the per-CPU allocated
are added; tracked in https://bugzilla.kernel.org/show_bug.cgi?id=215019.
Link: https://lkml.kernel.org/r/20241022160706.38943-1-andrey.konovalov@linux.dev
Fixes: 1a2473f0cbc0 ("kasan: improve vmalloc tests")
Signed-off-by: Andrey Konovalov <andreyknvl(a)gmail.com>
Reported-by: Samuel Holland <samuel.holland(a)sifive.com>
Link: https://lore.kernel.org/all/4a245fff-cc46-44d1-a5f9-fd2f1c3764ae@sifive.com/
Reported-by: Sabyrzhan Tasbolatov <snovitoll(a)gmail.com>
Link: https://lore.kernel.org/all/CACzwLxiWzNqPBp4C1VkaXZ2wDwvY3yZeetCi1TLGFipKW7…
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: Sabyrzhan Tasbolatov <snovitoll(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/kasan_test_c.c | 27 ---------------------------
1 file changed, 27 deletions(-)
--- a/mm/kasan/kasan_test_c.c~kasan-remove-vmalloc_percpu-test
+++ a/mm/kasan/kasan_test_c.c
@@ -1810,32 +1810,6 @@ static void vm_map_ram_tags(struct kunit
free_pages((unsigned long)p_ptr, 1);
}
-static void vmalloc_percpu(struct kunit *test)
-{
- char __percpu *ptr;
- int cpu;
-
- /*
- * This test is specifically crafted for the software tag-based mode,
- * the only tag-based mode that poisons percpu mappings.
- */
- KASAN_TEST_NEEDS_CONFIG_ON(test, CONFIG_KASAN_SW_TAGS);
-
- ptr = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
-
- for_each_possible_cpu(cpu) {
- char *c_ptr = per_cpu_ptr(ptr, cpu);
-
- KUNIT_EXPECT_GE(test, (u8)get_tag(c_ptr), (u8)KASAN_TAG_MIN);
- KUNIT_EXPECT_LT(test, (u8)get_tag(c_ptr), (u8)KASAN_TAG_KERNEL);
-
- /* Make sure that in-bounds accesses don't crash the kernel. */
- *c_ptr = 0;
- }
-
- free_percpu(ptr);
-}
-
/*
* Check that the assigned pointer tag falls within the [KASAN_TAG_MIN,
* KASAN_TAG_KERNEL) range (note: excluding the match-all tag) for tag-based
@@ -2023,7 +1997,6 @@ static struct kunit_case kasan_kunit_tes
KUNIT_CASE(vmalloc_oob),
KUNIT_CASE(vmap_tags),
KUNIT_CASE(vm_map_ram_tags),
- KUNIT_CASE(vmalloc_percpu),
KUNIT_CASE(match_all_not_assigned),
KUNIT_CASE(match_all_ptr_tag),
KUNIT_CASE(match_all_mem_tag),
_
Patches currently in -mm which might be from andreyknvl(a)gmail.com are
From: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
[ Upstream commit 03f5a999adba062456c8c818a683beb1b498983a ]
In CoCo VMs it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.
VMBus code could free decrypted pages if set_memory_encrypted()/decrypted()
fails. Leak the pages if this happens.
Signed-off-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Signed-off-by: Michael Kelley <mhklinux(a)outlook.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Link: https://lore.kernel.org/r/20240311161558.1310-2-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
Message-ID: <20240311161558.1310-2-mhklinux(a)outlook.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Xiangyu: bp to fix CVE-2024-36913, resolved minor conflicts]
Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com>
---
drivers/hv/connection.c | 66 ++++++++++++++++++++++++++---------------
1 file changed, 42 insertions(+), 24 deletions(-)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index da51b50787df..23fb0df9d350 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -243,8 +243,17 @@ int vmbus_connect(void)
ret |= set_memory_decrypted((unsigned long)
vmbus_connection.monitor_pages[1],
1);
- if (ret)
+ if (ret) {
+ /*
+ * If set_memory_decrypted() fails, the encryption state
+ * of the memory is unknown. So leak the memory instead
+ * of risking returning decrypted memory to the free list.
+ * For simplicity, always handle both pages the same.
+ */
+ vmbus_connection.monitor_pages[0] = NULL;
+ vmbus_connection.monitor_pages[1] = NULL;
goto cleanup;
+ }
/*
* Isolation VM with AMD SNP needs to access monitor page via
@@ -377,30 +386,39 @@ void vmbus_disconnect(void)
}
if (hv_is_isolation_supported()) {
- /*
- * memunmap() checks input address is ioremap address or not
- * inside. It doesn't unmap any thing in the non-SNP CVM and
- * so not check CVM type here.
- */
- memunmap(vmbus_connection.monitor_pages[0]);
- memunmap(vmbus_connection.monitor_pages[1]);
-
- set_memory_encrypted((unsigned long)
- vmbus_connection.monitor_pages_original[0],
- 1);
- set_memory_encrypted((unsigned long)
- vmbus_connection.monitor_pages_original[1],
- 1);
- }
+ if(vmbus_connection.monitor_pages[0]) {
+ /*
+ * memunmap() checks input address is ioremap address or not
+ * inside. It doesn't unmap any thing in the non-SNP CVM and
+ * so not check CVM type here.
+ */
+ memunmap(vmbus_connection.monitor_pages[0]);
+ if (!set_memory_encrypted((unsigned long)
+ vmbus_connection.monitor_pages_original[0], 1))
+ hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[0]);
+ vmbus_connection.monitor_pages_original[0] =
+ vmbus_connection.monitor_pages[0] = NULL;
+ }
+
+ if(vmbus_connection.monitor_pages[1]) {
+ memunmap(vmbus_connection.monitor_pages[1]);
+ if (!set_memory_encrypted((unsigned long)
+ vmbus_connection.monitor_pages_original[1], 1))
+ hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[1]);
+ vmbus_connection.monitor_pages_original[1] =
+ vmbus_connection.monitor_pages[1] = NULL;
+ }
+ } else {
- hv_free_hyperv_page((unsigned long)
- vmbus_connection.monitor_pages_original[0]);
- hv_free_hyperv_page((unsigned long)
- vmbus_connection.monitor_pages_original[1]);
- vmbus_connection.monitor_pages_original[0] =
- vmbus_connection.monitor_pages[0] = NULL;
- vmbus_connection.monitor_pages_original[1] =
- vmbus_connection.monitor_pages[1] = NULL;
+ hv_free_hyperv_page((unsigned long)
+ vmbus_connection.monitor_pages_original[0]);
+ hv_free_hyperv_page((unsigned long)
+ vmbus_connection.monitor_pages_original[1]);
+ vmbus_connection.monitor_pages_original[0] =
+ vmbus_connection.monitor_pages[0] = NULL;
+ vmbus_connection.monitor_pages_original[1] =
+ vmbus_connection.monitor_pages[1] = NULL;
+ }
}
/*
--
2.43.0
Setting TPM_CHIP_FLAG_SUSPENDED in the end of tpm_pm_suspend() can be racy
according to the bug report, as this leaves window for tpm_hwrng_read() to
be called while the operation is in progress.
To address this, lock the TPM chip before checking any possible flags.
This will guarantee that tpm_hwrng_read() and tpm_pm_suspend() won't
conflict with each other.
Cc: stable(a)vger.kernel.org # v6.4+
Fixes: 99d464506255 ("tpm: Prevent hwrng from activating during resume")
Reported-by: Mike Seo <mikeseohyungjin(a)gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219383
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v2:
- Addressed my own remark:
https://lore.kernel.org/linux-integrity/D59JAI6RR2CD.G5E5T4ZCZ49W@kernel.or…
---
drivers/char/tpm/tpm-interface.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 8134f002b121..e37fcf9361bc 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -370,6 +370,13 @@ int tpm_pm_suspend(struct device *dev)
if (!chip)
return -ENODEV;
+ rc = tpm_try_get_ops(chip);
+ if (rc) {
+ /* Can be safely set out of locks, as no action cannot race: */
+ chip->flags |= TPM_CHIP_FLAG_SUSPENDED;
+ goto out;
+ }
+
if (chip->flags & TPM_CHIP_FLAG_ALWAYS_POWERED)
goto suspended;
@@ -377,23 +384,22 @@ int tpm_pm_suspend(struct device *dev)
!pm_suspend_via_firmware())
goto suspended;
- rc = tpm_try_get_ops(chip);
- if (!rc) {
- if (chip->flags & TPM_CHIP_FLAG_TPM2) {
- tpm2_end_auth_session(chip);
- tpm2_shutdown(chip, TPM2_SU_STATE);
- } else {
- rc = tpm1_pm_suspend(chip, tpm_suspend_pcr);
- }
-
- tpm_put_ops(chip);
+ if (chip->flags & TPM_CHIP_FLAG_TPM2) {
+ tpm2_end_auth_session(chip);
+ tpm2_shutdown(chip, TPM2_SU_STATE);
+ goto suspended;
}
+ rc = tpm1_pm_suspend(chip, tpm_suspend_pcr);
+
suspended:
chip->flags |= TPM_CHIP_FLAG_SUSPENDED;
+ tpm_put_ops(chip);
+out:
if (rc)
dev_err(dev, "Ignoring error %d while suspending\n", rc);
+
return 0;
}
EXPORT_SYMBOL_GPL(tpm_pm_suspend);
--
2.47.0
Do not continue on tpm2_create_primary() failure in tpm2_load_null().
Cc: stable(a)vger.kernel.org # v6.10+
Fixes: eb24c9788cd9 ("tpm: disable the TPM if NULL name changes")
Reviewed-by: Stefan Berger <stefanb(a)linux.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v8:
- Fix stray character in a log message.
v7:
- No changes.
v6:
- Address Stefan's remark:
https://lore.kernel.org/linux-integrity/def4ec2d-584b-405f-9d5e-99267013c3c…
v5:
- Fix the TPM error code leak from tpm2_load_context().
v4:
- No changes.
v3:
- Update log messages. Previously the log message incorrectly stated
on load failure that integrity check had been failed, even tho the
check is done *after* the load operation.
v2:
- Refined the commit message.
- Reverted tpm2_create_primary() changes. They are not required if
tmp_null_key is used as the parameter.
---
drivers/char/tpm/tpm2-sessions.c | 44 +++++++++++++++++---------------
1 file changed, 24 insertions(+), 20 deletions(-)
diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c
index a0306126e86c..950a3e48293b 100644
--- a/drivers/char/tpm/tpm2-sessions.c
+++ b/drivers/char/tpm/tpm2-sessions.c
@@ -915,33 +915,37 @@ static int tpm2_parse_start_auth_session(struct tpm2_auth *auth,
static int tpm2_load_null(struct tpm_chip *chip, u32 *null_key)
{
- int rc;
unsigned int offset = 0; /* dummy offset for null seed context */
u8 name[SHA256_DIGEST_SIZE + 2];
+ u32 tmp_null_key;
+ int rc;
rc = tpm2_load_context(chip, chip->null_key_context, &offset,
- null_key);
- if (rc != -EINVAL)
- return rc;
+ &tmp_null_key);
+ if (rc != -EINVAL) {
+ if (!rc)
+ *null_key = tmp_null_key;
+ goto err;
+ }
- /* an integrity failure may mean the TPM has been reset */
- dev_err(&chip->dev, "NULL key integrity failure!\n");
- /* check the null name against what we know */
- tpm2_create_primary(chip, TPM2_RH_NULL, NULL, name);
- if (memcmp(name, chip->null_key_name, sizeof(name)) == 0)
- /* name unchanged, assume transient integrity failure */
- return rc;
- /*
- * Fatal TPM failure: the NULL seed has actually changed, so
- * the TPM must have been illegally reset. All in-kernel TPM
- * operations will fail because the NULL primary can't be
- * loaded to salt the sessions, but disable the TPM anyway so
- * userspace programmes can't be compromised by it.
- */
- dev_err(&chip->dev, "NULL name has changed, disabling TPM due to interference\n");
+ /* Try to re-create null key, given the integrity failure: */
+ rc = tpm2_create_primary(chip, TPM2_RH_NULL, &tmp_null_key, name);
+ if (rc)
+ goto err;
+
+ /* Return null key if the name has not been changed: */
+ if (!memcmp(name, chip->null_key_name, sizeof(name))) {
+ *null_key = tmp_null_key;
+ return 0;
+ }
+
+ /* Deduce from the name change TPM interference: */
+ dev_err(&chip->dev, "null key integrity check failed\n");
+ tpm2_flush_context(chip, tmp_null_key);
chip->flags |= TPM_CHIP_FLAG_DISABLE;
- return rc;
+err:
+ return rc ? -ENODEV : 0;
}
/**
--
2.47.0
The patch titled
Subject: mm/mlock: set the correct prev on failure
has been added to the -mm mm-unstable branch. Its filename is
mm-mlock-set-the-correct-prev-on-failure.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Wei Yang <richard.weiyang(a)gmail.com>
Subject: mm/mlock: set the correct prev on failure
Date: Sun, 27 Oct 2024 12:33:21 +0000
After commit 94d7d9233951 ("mm: abstract the vma_merge()/split_vma()
pattern for mprotect() et al."), if vma_modify_flags() return error, the
vma is set to an error code. This will lead to an invalid prev be
returned.
Generally this shouldn't matter as the caller should treat an error as
indicating state is now invalidated, however unfortunately
apply_mlockall_flags() does not check for errors and assumes that
mlock_fixup() correctly maintains prev even if an error were to occur.
This patch fixes that assumption.
[lorenzo.stoakes(a)oracle.com: provide a better fix and rephrase the log]
Link: https://lkml.kernel.org/r/20241027123321.19511-1-richard.weiyang@gmail.com
Fixes: 94d7d9233951 ("mm: abstract the vma_merge()/split_vma() pattern for mprotect() et al.")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Jann Horn <jannh(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mlock.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
--- a/mm/mlock.c~mm-mlock-set-the-correct-prev-on-failure
+++ a/mm/mlock.c
@@ -725,14 +725,17 @@ static int apply_mlockall_flags(int flag
}
for_each_vma(vmi, vma) {
+ int error;
vm_flags_t newflags;
newflags = vma->vm_flags & ~VM_LOCKED_MASK;
newflags |= to_add;
- /* Ignore errors */
- mlock_fixup(&vmi, vma, &prev, vma->vm_start, vma->vm_end,
- newflags);
+ error = mlock_fixup(&vmi, vma, &prev, vma->vm_start, vma->vm_end,
+ newflags);
+ /* Ignore errors, but prev needs fixing up. */
+ if (error)
+ prev = vma;
cond_resched();
}
out:
_
Patches currently in -mm which might be from richard.weiyang(a)gmail.com are
maple_tree-i-is-always-less-than-or-equal-to-mas_end.patch
maple_tree-goto-complete-directly-on-a-pivot-of-0.patch
maple_tree-remove-maple_big_nodeparent.patch
maple_tree-memset-maple_big_node-as-a-whole.patch
maple_tree-root-node-could-be-handled-by-p_slot-too.patch
maple_tree-clear-request_count-for-new-allocated-one.patch
maple_tree-total-is-not-changed-for-nomem_one-case.patch
maple_tree-simplify-mas_push_node.patch
maple_tree-calculate-new_end-when-needed.patch
maple_tree-remove-sanity-check-from-mas_wr_slot_store.patch
mm-vma-the-pgoff-is-correct-if-can_merge_right.patch
mm-mlock-set-the-correct-prev-on-failure.patch
This series releases the 'soc' device_node when it is no longer required
by adding the missing calls to of_node_put() to make the fix compatible
with all affected stable kernels. Then, the more robust approach via
cleanup attribute is used to simplify the handling and prevent issues if
the loop gets new execution paths.
These issues were found while analyzing the code, and the patches have
been successfully compiled, but not tested on real hardware as I don't
have access to it. Any volunteering for testing is always more than
welcome.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (2):
clk: renesas: cpg-mssr: fix 'soc' node handling in cpg_mssr_reserved_init()
clk: renesas: cpg-mssr: automate 'soc' node release in cpg_mssr_reserved_init()
drivers/clk/renesas/renesas-cpg-mssr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
---
base-commit: 86e3904dcdc7e70e3257fc1de294a1b75f3d8d04
change-id: 20241031-clk-renesas-cpg-mssr-cleanup-1933df63bc9c
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Apparently using non-posted DSB writes to update the legacy
LUT can cause CPU MMIO accesses to fail on TGL. Stop using
them for the legacy LUT updates, and instead switch to using
the double write approach (which is the other empirically
found workaround for the issue of DSB failing to correctly
update the legacy LUT).
Cc: stable(a)vger.kernel.org
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12494
Fixes: 25ea3411bd23 ("drm/i915/dsb: Use non-posted register writes for legacy LUT")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/intel_color.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_color.c b/drivers/gpu/drm/i915/display/intel_color.c
index 174753625bca..aa50ecaf368d 100644
--- a/drivers/gpu/drm/i915/display/intel_color.c
+++ b/drivers/gpu/drm/i915/display/intel_color.c
@@ -1357,19 +1357,19 @@ static void ilk_load_lut_8(const struct intel_crtc_state *crtc_state,
lut = blob->data;
/*
- * DSB fails to correctly load the legacy LUT
- * unless we either write each entry twice,
- * or use non-posted writes
+ * DSB fails to correctly load the legacy LUT unless
+ * we either write each entry twice, or use non-posted
+ * writes. However using non-posted writes can cause
+ * CPU MMIO accesses to fail on TGL, so we choose to
+ * use the double write approach.
*/
- if (crtc_state->dsb_color_vblank)
- intel_dsb_nonpost_start(crtc_state->dsb_color_vblank);
-
- for (i = 0; i < 256; i++)
+ for (i = 0; i < 256; i++) {
ilk_lut_write(crtc_state, LGC_PALETTE(pipe, i),
i9xx_lut_8(&lut[i]));
-
- if (crtc_state->dsb_color_vblank)
- intel_dsb_nonpost_end(crtc_state->dsb_color_vblank);
+ if (crtc_state->dsb_color_vblank)
+ ilk_lut_write(crtc_state, LGC_PALETTE(pipe, i),
+ i9xx_lut_8(&lut[i]));
+ }
}
static void ilk_load_lut_10(const struct intel_crtc_state *crtc_state,
--
2.45.2
When we enter a signal handler we exit streaming mode in order to ensure
that signal handlers can run normal FPSIMD code, and while we're at it we
also clear PSTATE.ZA. Currently the code in setup_return() updates both the
in memory copy of the state and the register state. Not only is this
redundant it can also lead to corruption if we are preempted.
Consider two tasks on one CPU:
A: Begins signal entry in kernel mode, is preempted prior to SMSTOP.
B: Using SM and/or ZA in userspace with register state current on the
CPU, is preempted.
A: Scheduled in, no register state changes made as in kernel mode.
A: Executes SMSTOP, modifying live register state.
A: Scheduled out.
B: Scheduled in, fpsimd_thread_switch() sees the register state on the
CPU is tracked as being that for task B so the state is not reloaded
prior to returning to userspace.
Task B is now running with SM and ZA incorrectly cleared.
Fix this by check TIF_FOREIGN_FPSTATE and only updating one of the live
register context or the in memory copy when entering a signal handler.
Since this needs to happen atomically and all code that atomically
accesses FP state is in fpsimd.c also move the code there to ensure
consistency.
This race has been observed intermittently with fp-stress, especially
with preempt disabled.
Fixes: 40a8e87bb3285 ("arm64/sme: Disable ZA and streaming mode when handling signals")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
arch/arm64/include/asm/fpsimd.h | 1 +
arch/arm64/kernel/fpsimd.c | 30 ++++++++++++++++++++++++++++++
arch/arm64/kernel/signal.c | 19 +------------------
3 files changed, 32 insertions(+), 18 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index f2a84efc361858d4deda99faf1967cc7cac386c1..09af7cfd9f6c2cec26332caa4c254976e117b1bf 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -76,6 +76,7 @@ extern void fpsimd_load_state(struct user_fpsimd_state *state);
extern void fpsimd_thread_switch(struct task_struct *next);
extern void fpsimd_flush_thread(void);
+extern void fpsimd_enter_sighandler(void);
extern void fpsimd_signal_preserve_current_state(void);
extern void fpsimd_preserve_current_state(void);
extern void fpsimd_restore_current_state(void);
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 77006df20a75aee7c991cf116b6d06bfe953d1a4..e6b086dc09f21e7f30df32ab4f6875b53c4228fd 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1693,6 +1693,36 @@ void fpsimd_signal_preserve_current_state(void)
sve_to_fpsimd(current);
}
+/*
+ * Called by the signal handling code when preparing current to enter
+ * a signal handler. Currently this only needs to take care of exiting
+ * streaming mode and clearing ZA on SME systems.
+ */
+void fpsimd_enter_sighandler(void)
+{
+ if (!system_supports_sme())
+ return;
+
+ get_cpu_fpsimd_context();
+
+ if (test_thread_flag(TIF_FOREIGN_FPSTATE)) {
+ /* Exiting streaming mode zeros the FPSIMD state */
+ if (current->thread.svcr & SVCR_SM_MASK) {
+ memset(¤t->thread.uw.fpsimd_state, 0,
+ sizeof(current->thread.uw.fpsimd_state));
+ current->thread.fp_type = FP_STATE_FPSIMD;
+ }
+
+ current->thread.svcr &= ~(SVCR_ZA_MASK |
+ SVCR_SM_MASK);
+ } else {
+ /* The register state is current, just update it. */
+ sme_smstop();
+ }
+
+ put_cpu_fpsimd_context();
+}
+
/*
* Called by KVM when entering the guest.
*/
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 5619869475304776fc005fe24a385bf86bfdd253..fe07d0bd9f7978d73973f07ce38b7bdd7914abb2 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -1218,24 +1218,7 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
/* TCO (Tag Check Override) always cleared for signal handlers */
regs->pstate &= ~PSR_TCO_BIT;
- /* Signal handlers are invoked with ZA and streaming mode disabled */
- if (system_supports_sme()) {
- /*
- * If we were in streaming mode the saved register
- * state was SVE but we will exit SM and use the
- * FPSIMD register state - flush the saved FPSIMD
- * register state in case it gets loaded.
- */
- if (current->thread.svcr & SVCR_SM_MASK) {
- memset(¤t->thread.uw.fpsimd_state, 0,
- sizeof(current->thread.uw.fpsimd_state));
- current->thread.fp_type = FP_STATE_FPSIMD;
- }
-
- current->thread.svcr &= ~(SVCR_ZA_MASK |
- SVCR_SM_MASK);
- sme_smstop();
- }
+ fpsimd_enter_sighandler();
if (system_supports_poe())
write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
---
base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
change-id: 20241023-arm64-fp-sme-sigentry-a2bd7187e71b
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This series releases the np device_node when it is no longer required by
adding the missing calls to of_node_put() to make the fix compatible
with all affected stable kernels. Then, the more robust approach via
cleanup attribute is used to simplify the handling and prevent issues if
the loop gets new execution paths.
These issues were found while analyzing the code, and the patches have
been successfully compiled, but not tested on real hardware as I don't
have access to it. Any volunteering for testing is always more than
welcome.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (2):
drivers: soc: atmel: fix device_node release in atmel_soc_device_init()
drivers: soc: atmel: use automatic cleanup for device_node in atmel_soc_device_init()
drivers/soc/atmel/soc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
---
base-commit: 86e3904dcdc7e70e3257fc1de294a1b75f3d8d04
change-id: 20241030-soc-atmel-soc-cleanup-8fcf3029bb28
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
This series fixes a missing call to of_node_put() in two steps: first
adding the call (compatible with all affected kernels), and then moving
to a more robust approach once the issue is fixed.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (2):
Bluetooth: btbcm: fix missing of_node_put() in btbcm_get_board_name()
Bluetooth: btbcm: automate node cleanup in btbcm_get_board_name()
drivers/bluetooth/btbcm.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
---
base-commit: 6fb2fa9805c501d9ade047fc511961f3273cdcb5
change-id: 20241030-bluetooth-btbcm-node-cleanup-23d21a73870c
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
The mptcp_sched_find() function must be called with the RCU read lock
held, as it accesses RCU-protected data structures. This requirement was
not properly enforced in the mptcp_init_sock() function, leading to a
RCU list traversal in a non-reader section error when
CONFIG_PROVE_RCU_LIST is enabled.
net/mptcp/sched.c:44 RCU-list traversed in non-reader section!!
Fix it by acquiring the RCU read lock before calling the
mptcp_sched_find() function. This ensures that the function is invoked
with the necessary RCU protection in place, as it accesses RCU-protected
data structures.
Additionally, the patch breaks down the mptcp_init_sched() call into
smaller parts, with the RCU read lock only covering the specific call to
mptcp_sched_find(). This helps minimize the critical section, reducing
the time during which RCU grace periods are blocked.
The mptcp_sched_list_lock is not held in this case, and it is not clear
if it is necessary.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Fixes: 1730b2b2c5a5 ("mptcp: add sched in mptcp_sock")
Cc: stable(a)vger.kernel.org
---
net/mptcp/protocol.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 6d0e201c3eb2..8ece630f80d4 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2854,6 +2854,7 @@ static void mptcp_ca_reset(struct sock *sk)
static int mptcp_init_sock(struct sock *sk)
{
struct net *net = sock_net(sk);
+ struct mptcp_sched_ops *sched;
int ret;
__mptcp_init_sock(sk);
@@ -2864,8 +2865,10 @@ static int mptcp_init_sock(struct sock *sk)
if (unlikely(!net->mib.mptcp_statistics) && !mptcp_mib_alloc(net))
return -ENOMEM;
- ret = mptcp_init_sched(mptcp_sk(sk),
- mptcp_sched_find(mptcp_get_scheduler(net)));
+ rcu_read_lock();
+ sched = mptcp_sched_find(mptcp_get_scheduler(net));
+ rcu_read_unlock();
+ ret = mptcp_init_sched(mptcp_sk(sk), sched);
if (ret)
return ret;
--
2.43.5
This patch series is to fix bugs for below APIs:
devm_phy_put()
devm_of_phy_provider_unregister()
devm_phy_destroy()
phy_get()
of_phy_get()
devm_phy_get()
devm_of_phy_get()
devm_of_phy_get_by_index()
And simplify below API:
of_phy_simple_xlate().
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
---
Changes in v3:
- Correct commit message based on Johan's suggestions for patches 1/6-3/6.
- Use goto label solution suggested by Johan for patch 1/6, also correct
commit message and remove the inline comment for it.
- Link to v2: https://lore.kernel.org/r/20241024-phy_core_fix-v2-0-fc0c63dbfcf3@quicinc.c…
Changes in v2:
- Correct title, commit message, and inline comments.
- Link to v1: https://lore.kernel.org/r/20241020-phy_core_fix-v1-0-078062f7da71@quicinc.c…
---
Zijun Hu (6):
phy: core: Fix that API devm_phy_put() fails to release the phy
phy: core: Fix that API devm_of_phy_provider_unregister() fails to unregister the phy provider
phy: core: Fix that API devm_phy_destroy() fails to destroy the phy
phy: core: Fix an OF node refcount leakage in _of_phy_get()
phy: core: Fix an OF node refcount leakage in of_phy_provider_lookup()
phy: core: Simplify API of_phy_simple_xlate() implementation
drivers/phy/phy-core.c | 43 +++++++++++++++++++++----------------------
1 file changed, 21 insertions(+), 22 deletions(-)
---
base-commit: e70d2677ef4088d59158739d72b67ac36d1b132b
change-id: 20241020-phy_core_fix-e3ad65db98f7
Best regards,
--
Zijun Hu <quic_zijuhu(a)quicinc.com>
Syzkaller reported a hung task with uevent_show() on stack trace. That
specific issue was addressed by another commit [0], but even with that
fix applied (for example, running v6.12-rc4) we face another type of hung
task that comes from the same reproducer [1]. By investigating that, we
could narrow it to the following path:
(a) Syzkaller emulates a Realtek USB WiFi adapter using raw-gadget and
dummy_hcd infrastructure.
(b) During the probe of rtl8192cu, the driver ends-up performing an efuse
read procedure (which is related to EEPROM load IIUC), and here lies the
issue: the function read_efuse() calls read_efuse_byte() many times, as
loop iterations depending on the efuse size (in our example, 512 in total).
This procedure for reading efuse bytes relies in a loop that performs an
I/O read up to *10k* times in case of failures. We measured the time of
the loop inside read_efuse_byte() alone, and in this reproducer (which
involves the dummy_hcd emulation layer), it takes 15 seconds each. As a
consequence, we have the driver stuck in its probe routine for big time,
exposing a stack trace like below if we attempt to reboot the system, for
example:
task:kworker/0:3 state:D stack:0 pid:662 tgid:662 ppid:2 flags:0x00004000
Workqueue: usb_hub_wq hub_event
Call Trace:
__schedule+0xe22/0xeb6
schedule_timeout+0xe7/0x132
__wait_for_common+0xb5/0x12e
usb_start_wait_urb+0xc5/0x1ef
? usb_alloc_urb+0x95/0xa4
usb_control_msg+0xff/0x184
_usbctrl_vendorreq_sync+0xa0/0x161
_usb_read_sync+0xb3/0xc5
read_efuse_byte+0x13c/0x146
read_efuse+0x351/0x5f0
efuse_read_all_map+0x42/0x52
rtl_efuse_shadow_map_update+0x60/0xef
rtl_get_hwinfo+0x5d/0x1c2
rtl92cu_read_eeprom_info+0x10a/0x8d5
? rtl92c_read_chip_version+0x14f/0x17e
rtl_usb_probe+0x323/0x851
usb_probe_interface+0x278/0x34b
really_probe+0x202/0x4a4
__driver_probe_device+0x166/0x1b2
driver_probe_device+0x2f/0xd8
[...]
We propose hereby to drastically reduce the attempts of doing the I/O read
in case of failures, from 10000 to 10. With that, we got reponsiveness in the
reproducer, while seems reasonable to believe that there's no sane device
implementation in the field requiring this amount of retries at every I/O
read in order to properly work. Based on that assumption it'd be good to
have it backported to stable but maybe not since driver implementation
(the 10k number comes from day 0), perhaps up to 6.x series makes sense.
[0] Commit 15fffc6a5624 ("driver core: Fix uevent_show() vs driver detach race").
[1] A note about that: this syzkaller report presents multiple reproducers
that differs by the type of emulated USB device. For this specific case,
check the entry from 2024/08/08 06:23 in the list of crashes; the C repro
is available at https://syzkaller.appspot.com/text?tag=ReproC&x=1521fc83980000.
Cc: stable(a)vger.kernel.org # v6.1+
Reported-by: syzbot+edd9fe0d3a65b14588d5(a)syzkaller.appspotmail.com
Signed-off-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
---
drivers/net/wireless/realtek/rtlwifi/efuse.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/realtek/rtlwifi/efuse.c b/drivers/net/wireless/realtek/rtlwifi/efuse.c
index 82cf5fb5175f..2f75e376c0f6 100644
--- a/drivers/net/wireless/realtek/rtlwifi/efuse.c
+++ b/drivers/net/wireless/realtek/rtlwifi/efuse.c
@@ -178,7 +178,7 @@ void read_efuse_byte(struct ieee80211_hw *hw, u16 _offset, u8 *pbuf)
retry = 0;
value32 = rtl_read_dword(rtlpriv, rtlpriv->cfg->maps[EFUSE_CTRL]);
- while (!(((value32 >> 24) & 0xff) & 0x80) && (retry < 10000)) {
+ while (!(((value32 >> 24) & 0xff) & 0x80) && (retry < 10)) {
value32 = rtl_read_dword(rtlpriv,
rtlpriv->cfg->maps[EFUSE_CTRL]);
retry++;
--
2.46.2
Sometimes the hub driver does not recognize the USB device connected
to the external USB2.0 hub when the system resumes from S4.
After the SetPortFeature(PORT_RESET) request is completed, the hub
driver calls the HCD reset_device callback, which will issue a Reset
Device command and free all structures associated with endpoints
that were disabled.
This happens when the xHCI driver issue a Reset Device command to
inform the Etron xHCI host that the USB device associated with a
device slot has been reset. Seems that the Etron xHCI host can not
perform this command correctly, affecting the USB device.
To work around this, the xHCI driver should obtain a new device slot
with reference to commit 651aaf36a7d7 ("usb: xhci: Handle USB transaction
error on address command"), which is another way to inform the Etron
xHCI host that the USB device has been reset.
Add a new XHCI_ETRON_HOST quirk flag to invoke the workaround in
xhci_discover_or_reset_device().
Fixes: 2a8f82c4ceaf ("USB: xhci: Notify the xHC when a device is reset.")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Kuangyi Chiang <ki.chiang65(a)gmail.com>
---
drivers/usb/host/xhci-pci.c | 1 +
drivers/usb/host/xhci.c | 19 +++++++++++++++++++
drivers/usb/host/xhci.h | 1 +
3 files changed, 21 insertions(+)
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 33a6d99afc10..ddc9a82cceec 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -397,6 +397,7 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
if (pdev->vendor == PCI_VENDOR_ID_ETRON &&
(pdev->device == PCI_DEVICE_ID_EJ168 ||
pdev->device == PCI_DEVICE_ID_EJ188)) {
+ xhci->quirks |= XHCI_ETRON_HOST;
xhci->quirks |= XHCI_RESET_ON_RESUME;
xhci->quirks |= XHCI_BROKEN_STREAMS;
}
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 899c0effb5d3..ef7ead6393d4 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -3692,6 +3692,8 @@ void xhci_free_device_endpoint_resources(struct xhci_hcd *xhci,
xhci->num_active_eps);
}
+static void xhci_free_dev(struct usb_hcd *hcd, struct usb_device *udev);
+
/*
* This submits a Reset Device Command, which will set the device state to 0,
* set the device address to 0, and disable all the endpoints except the default
@@ -3762,6 +3764,23 @@ static int xhci_discover_or_reset_device(struct usb_hcd *hcd,
SLOT_STATE_DISABLED)
return 0;
+ if (xhci->quirks & XHCI_ETRON_HOST) {
+ /*
+ * Obtaining a new device slot to inform the xHCI host that
+ * the USB device has been reset.
+ */
+ ret = xhci_disable_slot(xhci, udev->slot_id);
+ xhci_free_virt_device(xhci, udev->slot_id);
+ if (!ret) {
+ ret = xhci_alloc_dev(hcd, udev);
+ if (ret == 1)
+ ret = 0;
+ else
+ ret = -EINVAL;
+ }
+ return ret;
+ }
+
trace_xhci_discover_or_reset_device(slot_ctx);
xhci_dbg(xhci, "Resetting device with slot ID %u\n", slot_id);
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index f0fb696d5619..4f5b732e8944 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1624,6 +1624,7 @@ struct xhci_hcd {
#define XHCI_ZHAOXIN_HOST BIT_ULL(46)
#define XHCI_WRITE_64_HI_LO BIT_ULL(47)
#define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48)
+#define XHCI_ETRON_HOST BIT_ULL(49)
unsigned int num_active_eps;
unsigned int limit_active_eps;
--
2.25.1
This series addresses two issues in spm_cpuidle_register: a missing
device_node release in an error path, and releasing a reference to a
device after its usage.
These issues were found while analyzing the code, and the patches have
been successfully compiled and statically analyzed, but not tested on
real hardware as I don't have access to it. Any volunteering for testing
is always more than welcome.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (2):
cpuidle: qcom-spm: fix device node release in spm_cpuidle_register
cpuidle: qcom-spm: fix platform device release in spm_cpuidle_register
drivers/cpuidle/cpuidle-qcom-spm.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
---
base-commit: 6fb2fa9805c501d9ade047fc511961f3273cdcb5
change-id: 20241029-cpuidle-qcom-spm-cleanup-6f03669bcd70
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
The patch titled
Subject: mm/page_alloc: keep track of free highatomic
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-page_alloc-keep-track-of-free-highatomic.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Yu Zhao <yuzhao(a)google.com>
Subject: mm/page_alloc: keep track of free highatomic
Date: Mon, 28 Oct 2024 12:26:53 -0600
OOM kills due to vastly overestimated free highatomic reserves were
observed:
... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ...
Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ...
Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB
The second line above shows that the OOM kill was due to the following
condition:
free (1482936kB) - reserved_highatomic (1073152kB) = 409784KB < min (410416kB)
And the third line shows there were no free pages in any
MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type 'H'.
Therefore __zone_watermark_unusable_free() underestimated the usable free
memory by over 1GB, which resulted in the unnecessary OOM kill above.
The comments in __zone_watermark_unusable_free() warns about the potential
risk, i.e.,
If the caller does not have rights to reserves below the min
watermark then subtract the high-atomic reserves. This will
over-estimate the size of the atomic reserve but it avoids a search.
However, it is possible to keep track of free pages in reserved highatomic
pageblocks with a new per-zone counter nr_free_highatomic protected by the
zone lock, to avoid a search when calculating the usable free memory. And
the cost would be minimal, i.e., simple arithmetics in the highatomic
alloc/free/move paths.
Note that since nr_free_highatomic can be relatively small, using a
per-cpu counter might cause too much drift and defeat its purpose, in
addition to the extra memory overhead.
Link: https://lkml.kernel.org/r/20241028182653.3420139-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Link Lin <linkl(a)google.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [6.12+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mmzone.h | 1 +
mm/page_alloc.c | 10 +++++++---
2 files changed, 8 insertions(+), 3 deletions(-)
--- a/include/linux/mmzone.h~mm-page_alloc-keep-track-of-free-highatomic
+++ a/include/linux/mmzone.h
@@ -823,6 +823,7 @@ struct zone {
unsigned long watermark_boost;
unsigned long nr_reserved_highatomic;
+ unsigned long nr_free_highatomic;
/*
* We don't know if the memory that we're going to allocate will be
--- a/mm/page_alloc.c~mm-page_alloc-keep-track-of-free-highatomic
+++ a/mm/page_alloc.c
@@ -635,6 +635,8 @@ compaction_capture(struct capture_contro
static inline void account_freepages(struct zone *zone, int nr_pages,
int migratetype)
{
+ lockdep_assert_held(&zone->lock);
+
if (is_migrate_isolate(migratetype))
return;
@@ -642,6 +644,9 @@ static inline void account_freepages(str
if (is_migrate_cma(migratetype))
__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
+
+ if (is_migrate_highatomic(migratetype))
+ WRITE_ONCE(zone->nr_free_highatomic, zone->nr_free_highatomic + nr_pages);
}
/* Used for pages not on another list */
@@ -3081,11 +3086,10 @@ static inline long __zone_watermark_unus
/*
* If the caller does not have rights to reserves below the min
- * watermark then subtract the high-atomic reserves. This will
- * over-estimate the size of the atomic reserve but it avoids a search.
+ * watermark then subtract the free pages reserved for highatomic.
*/
if (likely(!(alloc_flags & ALLOC_RESERVES)))
- unusable_free += z->nr_reserved_highatomic;
+ unusable_free += READ_ONCE(z->nr_free_highatomic);
#ifdef CONFIG_CMA
/* If allocation can't use CMA areas don't use free CMA pages */
_
Patches currently in -mm which might be from yuzhao(a)google.com are
mm-allow-set-clear-page_type-again.patch
mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats.patch
mm-multi-gen-lru-use-pteppmdp_clear_young_notify.patch
mm-page_alloc-keep-track-of-free-highatomic.patch
From: Xiangyu Chen <xiangyu.chen(a)eng.windriver.com>
[ Upstream commit 15c2990e0f0108b9c3752d7072a97d45d4283aea ]
This commit adds null checks for the 'stream' and 'plane' variables in
the dcn30_apply_idle_power_optimizations function. These variables were
previously assumed to be null at line 922, but they were used later in
the code without checking if they were null. This could potentially lead
to a null pointer dereference, which would cause a crash.
The null checks ensure that 'stream' and 'plane' are not null before
they are used, preventing potential crashes.
Fixes the below static smatch checker:
drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn30/dcn30_hwseq.c:938 dcn30_apply_idle_power_optimizations() error: we previously assumed 'stream' could be null (see line 922)
drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn30/dcn30_hwseq.c:940 dcn30_apply_idle_power_optimizations() error: we previously assumed 'plane' could be null (see line 922)
Cc: Tom Chung <chiahsuan.chung(a)amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
Cc: Bhawanpreet Lakha <Bhawanpreet.Lakha(a)amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Cc: Roman Li <roman.li(a)amd.com>
Cc: Hersen Wu <hersenxs.wu(a)amd.com>
Cc: Alex Hung <alex.hung(a)amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
[Xiangyu: Modified file path to backport this commit]
Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com>
---
drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
index 407f7889e8fd..7a643690fdc7 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
@@ -762,6 +762,9 @@ bool dcn30_apply_idle_power_optimizations(struct dc *dc, bool enable)
stream = dc->current_state->streams[0];
plane = (stream ? dc->current_state->stream_status[0].plane_states[0] : NULL);
+ if (!stream || !plane)
+ return false;
+
if (stream && plane) {
cursor_cache_enable = stream->cursor_position.enable &&
plane->address.grph.cursor_cache_addr.quad_part;
--
2.43.0
Atomicity violation occurs when the fmc_send_cmd() function is executed
simultaneously with the modification of the fmdev->resp_skb value.
Consider a scenario where, after passing the validity check within the
function, a non-null fmdev->resp_skb variable is assigned a null value.
This results in an invalid fmdev->resp_skb variable passing the validity
check. As seen in the later part of the function, skb = fmdev->resp_skb;
when the invalid fmdev->resp_skb passes the check, a null pointer
dereference error may occur at line 478, evt_hdr = (void *)skb->data;
To address this issue, it is recommended to include the validity check of
fmdev->resp_skb within the locked section of the function. This
modification ensures that the value of fmdev->resp_skb does not change
during the validation process, thereby maintaining its validity.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: e8454ff7b9a4 ("[media] drivers:media:radio: wl128x: FM Driver Common sources")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/media/radio/wl128x/fmdrv_common.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/media/radio/wl128x/fmdrv_common.c b/drivers/media/radio/wl128x/fmdrv_common.c
index 3d36f323a8f8..4d032436691c 100644
--- a/drivers/media/radio/wl128x/fmdrv_common.c
+++ b/drivers/media/radio/wl128x/fmdrv_common.c
@@ -466,11 +466,12 @@ int fmc_send_cmd(struct fmdev *fmdev, u8 fm_op, u16 type, void *payload,
jiffies_to_msecs(FM_DRV_TX_TIMEOUT) / 1000);
return -ETIMEDOUT;
}
+ spin_lock_irqsave(&fmdev->resp_skb_lock, flags);
if (!fmdev->resp_skb) {
+ spin_unlock_irqrestore(&fmdev->resp_skb_lock, flags);
fmerr("Response SKB is missing\n");
return -EFAULT;
}
- spin_lock_irqsave(&fmdev->resp_skb_lock, flags);
skb = fmdev->resp_skb;
fmdev->resp_skb = NULL;
spin_unlock_irqrestore(&fmdev->resp_skb_lock, flags);
--
2.34.1
The atomicity violation issue is due to the invalidation of the function
port_has_data()'s check caused by concurrency. Imagine a scenario where a
port that contains data passes the validity check but is simultaneously
assigned a value with no data. This could result in an empty port passing
the validity check, potentially leading to a null pointer dereference
error later in the program, which is inconsistent.
To address this issue, we added a separate validity check for the variable
buf after its assignment. This ensures that an invalid buf does not proceed
further into the program, thereby preventing a null pointer dereference
error.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: 203baab8ba31 ("virtio: console: Introduce function to hand off data from host to readers")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
V2:
The logic of the fix has been modified.
---
drivers/char/virtio_console.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index c62b208b42f1..54fee192d93c 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -660,6 +660,10 @@ static ssize_t fill_readbuf(struct port *port, u8 __user *out_buf,
return 0;
buf = port->inbuf;
+
+ if (!buf)
+ return 0;
+
out_count = min(out_count, buf->len - buf->offset);
if (to_user) {
--
2.34.1
The atomicity violation occurs when the variables cur_delay and new_delay
are defined. Imagine a scenario where, while defining cur_delay and
new_delay, the values stored in devfreq->profile->polling_ms and the delay
variable change. After acquiring the mutex_lock and entering the critical
section, due to possible concurrent modifications, cur_delay and new_delay
may no longer represent the correct values. Subsequent usage, such as if
(cur_delay > new_delay), could cause the program to run incorrectly,
resulting in inconsistencies.
If the read of devfreq->profile->polling_ms is not protected by the
lock, the cur_delay that enters the critical section would not store
the actual old value of devfreq->profile->polling_ms, which would
affect the subsequent checks like if (!cur_delay) and if (cur_delay >
new_delay), potentially causing the driver to perform incorrect
operations.
We believe that moving the read of devfreq->profile->polling_ms inside
the lock is beneficial as it ensures that cur_delay stores the true
old value of devfreq->profile->polling_ms, ensuring the correctness of
the later checks.
To address this issue, it is recommended to acquire a lock in advance,
ensuring that devfreq->profile->polling_ms and delay are protected by the
lock when being read. This will help ensure the consistency of the program.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: 7e6fdd4bad03 ("PM / devfreq: Core updates to support devices which can idle")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
V2:
Added some descriptions to reduce misunderstandings.
Thanks to MyungJoo Ham for suggesting this improvement.
---
drivers/devfreq/devfreq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 98657d3b9435..9634739fc9cb 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -616,10 +616,10 @@ EXPORT_SYMBOL(devfreq_monitor_resume);
*/
void devfreq_update_interval(struct devfreq *devfreq, unsigned int *delay)
{
+ mutex_lock(&devfreq->lock);
unsigned int cur_delay = devfreq->profile->polling_ms;
unsigned int new_delay = *delay;
- mutex_lock(&devfreq->lock);
devfreq->profile->polling_ms = new_delay;
if (IS_SUPPORTED_FLAG(devfreq->governor->flags, IRQ_DRIVEN))
--
2.34.1
An atomicity violation occurs when the vc4_crtc_send_vblank function
executes simultaneously with modifications to crtc->state->event. Consider
a scenario where crtc->state->event is non-null, allowing it to pass the
validity check. However, at the same time, crtc->state->event might be set
to null. In this case, the validity check in vc4_crtc_send_vblank might act
on the old crtc->state->event (before locking), allowing invalid values to
pass the validity check, which could lead to a null pointer dereference.
In the drm_device structure, it is mentioned: "@event_lock: Protects
@vblank_event_list and event delivery in general." I believe that the
validity check and the subsequent null assignment operation are part
of the event delivery process, and all of these should be protected by
the event_lock. If there is no lock protection before the validity
check, it is possible for a null crtc->state->event to be passed into
the drm_crtc_send_vblank_event() function, leading to a null pointer
dereference error.
We have observed its callers and found that they are from the
drm_crtc_helper_funcs driver interface. We believe that functions
within driver interfaces can be concurrent, potentially causing a data
race on crtc->state->event.
To address this issue, it is recommended to include the validity check of
crtc->state and crtc->state->event within the locking section of the
function. This modification ensures that the values of crtc->state->event
and crtc->state do not change during the validation process, maintaining
their valid conditions.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: 68e4a69aec4d ("drm/vc4: crtc: Create vblank reporting function")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
V2:
The description of the patch has been modified to make it clearer.
Thanks to Simona Vetter for suggesting this improvement.
---
drivers/gpu/drm/vc4/vc4_crtc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
index 8b5a7e5eb146..98885f519827 100644
--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -575,10 +575,12 @@ void vc4_crtc_send_vblank(struct drm_crtc *crtc)
struct drm_device *dev = crtc->dev;
unsigned long flags;
- if (!crtc->state || !crtc->state->event)
+ spin_lock_irqsave(&dev->event_lock, flags);
+ if (!crtc->state || !crtc->state->event) {
+ spin_unlock_irqrestore(&dev->event_lock, flags);
return;
+ }
- spin_lock_irqsave(&dev->event_lock, flags);
drm_crtc_send_vblank_event(crtc, crtc->state->event);
crtc->state->event = NULL;
spin_unlock_irqrestore(&dev->event_lock, flags);
--
2.34.1
This series releases the np device_node when it is no longer required by
adding the missing calls to of_node_put() to make the fix compatible
with all affected stable kernels. Then, the more robust approach via
cleanup attribute is used to simplify the handling and prevent issues if
the loop gets new execution paths.
These issues were found while analyzing the code, and the patches have
been successfully compiled, but not tested on real hardware as I don't
have access to it. Any volunteering for testing is always more than
welcome.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (2):
cpuidle: riscv-sbi: fix device node release in early exit of for_each_possible_cpu
cpuidle: riscv-sbi: use cleanup attribute for np in for_each_possible_cpu
drivers/cpuidle/cpuidle-riscv-sbi.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
---
base-commit: 6fb2fa9805c501d9ade047fc511961f3273cdcb5
change-id: 20241029-cpuidle-riscv-sbi-cleanup-e9b3cb96e16d
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
Currently the condition ((rc != -ENOTSUPP) || (rc != -EINVAL)) is always
true because rc cannot be equal to two different values at the same time,
so it must be not equal to at least one of them. Fix the original commit
that introduced the issue.
This reverts commit 22a26cc6a51ef73dcfeb64c50513903f6b2d53d8.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
drivers/cpufreq/brcmstb-avs-cpufreq.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/brcmstb-avs-cpufreq.c b/drivers/cpufreq/brcmstb-avs-cpufreq.c
index 5d03a295a085..2fd0f6be6fa3 100644
--- a/drivers/cpufreq/brcmstb-avs-cpufreq.c
+++ b/drivers/cpufreq/brcmstb-avs-cpufreq.c
@@ -474,8 +474,8 @@ static bool brcm_avs_is_firmware_loaded(struct private_data *priv)
rc = brcm_avs_get_pmap(priv, NULL);
magic = readl(priv->base + AVS_MBOX_MAGIC);
- return (magic == AVS_FIRMWARE_MAGIC) && ((rc != -ENOTSUPP) ||
- (rc != -EINVAL));
+ return (magic == AVS_FIRMWARE_MAGIC) && (rc != -ENOTSUPP) &&
+ (rc != -EINVAL);
}
static unsigned int brcm_avs_cpufreq_get(unsigned int cpu)
--
2.39.5