From: Andrew Davis <afd(a)ti.com>
[ Upstream commit 5ab90f40121a9f6a9b368274cd92d0f435dc7cfa ]
The syscon helper device_node_to_regmap() is used to fetch a regmap
registered to a device node. It also currently creates this regmap
if the node did not already have a regmap associated with it. This
should only be used on "syscon" nodes. This driver is not such a
device and instead uses device_node_to_regmap() on its own node as
a hacky way to create a regmap for itself.
This will not work going forward and so we should create our regmap
the normal way by defining our regmap_config, fetching our memory
resource, then using the normal regmap_init_mmio() function.
Signed-off-by: Andrew Davis <afd(a)ti.com>
Tested-by: Nishanth Menon <nm(a)ti.com>
Link: https://lore.kernel.org/r/20250123182234.597665-1-afd@ti.com
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/phy/ti/phy-gmii-sel.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/phy/ti/phy-gmii-sel.c b/drivers/phy/ti/phy-gmii-sel.c
index e0ca59ae31531..ff5d5e29629fa 100644
--- a/drivers/phy/ti/phy-gmii-sel.c
+++ b/drivers/phy/ti/phy-gmii-sel.c
@@ -424,6 +424,12 @@ static int phy_gmii_sel_init_ports(struct phy_gmii_sel_priv *priv)
return 0;
}
+static const struct regmap_config phy_gmii_sel_regmap_cfg = {
+ .reg_bits = 32,
+ .val_bits = 32,
+ .reg_stride = 4,
+};
+
static int phy_gmii_sel_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
@@ -468,7 +474,14 @@ static int phy_gmii_sel_probe(struct platform_device *pdev)
priv->regmap = syscon_node_to_regmap(node->parent);
if (IS_ERR(priv->regmap)) {
- priv->regmap = device_node_to_regmap(node);
+ void __iomem *base;
+
+ base = devm_platform_ioremap_resource(pdev, 0);
+ if (IS_ERR(base))
+ return dev_err_probe(dev, PTR_ERR(base),
+ "failed to get base memory resource\n");
+
+ priv->regmap = regmap_init_mmio(dev, base, &phy_gmii_sel_regmap_cfg);
if (IS_ERR(priv->regmap))
return dev_err_probe(dev, PTR_ERR(priv->regmap),
"Failed to get syscon\n");
--
2.39.5
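As an illustration of what the regmap_config in the patch above expresses, here is a minimal userspace sketch, not kernel code. The names mmio_model, model_check, model_read and model_write are hypothetical. The one behavior borrowed from regmap is that a register offset which is not a multiple of reg_stride is rejected with -EINVAL; the bounds check is the model's own, since the patch's config sets no max_register.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit MMIO window; not a kernel type. */
struct mmio_model {
	uint32_t *base;       /* backing storage standing in for the ioremapped region */
	size_t size;          /* window size in bytes */
	unsigned int stride;  /* mirrors .reg_stride = 4 */
};

/* Return 0 if reg is a usable register offset, else a negative errno. */
static int model_check(const struct mmio_model *m, unsigned int reg)
{
	if (reg % m->stride)
		return -EINVAL;   /* unaligned offset, as regmap would reject */
	if (reg + sizeof(uint32_t) > m->size)
		return -EINVAL;   /* model-only bounds check */
	return 0;
}

/* Write one 32-bit value (val_bits = 32) at a 32-bit offset (reg_bits = 32). */
static int model_write(struct mmio_model *m, unsigned int reg, uint32_t val)
{
	int ret = model_check(m, reg);

	if (ret)
		return ret;
	m->base[reg / sizeof(uint32_t)] = val;
	return 0;
}

/* Read one 32-bit value back from the window. */
static int model_read(const struct mmio_model *m, unsigned int reg, uint32_t *val)
{
	int ret = model_check(m, reg);

	if (ret)
		return ret;
	*val = m->base[reg / sizeof(uint32_t)];
	return 0;
}
```

With a stride of 4, offsets 0, 4, 8 and so on are accepted, while an offset such as 6 fails before any memory is touched.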
From: Zhang Lixu <lixu.zhang(a)intel.com>
[ Upstream commit 4b54ae69197b9f416baa0fceadff7e89075f8454 ]
The timestamps in the Firmware log and HID sensor samples are incorrect.
They show 1970-01-01 because the current IPC driver only uses the first
8 bytes of bootup time when synchronizing time with the firmware. The
firmware converts the bootup time to UTC time, which results in the
display of 1970-01-01.
In write_ipc_from_queue(), when sending the MNG_SYNC_FW_CLOCK message,
the clock is updated according to the definition of ipc_time_update_msg.
However, in _ish_sync_fw_clock(), the message length is specified as the
size of uint64_t when building the doorbell. As a result, the firmware
only receives the first 8 bytes of struct ipc_time_update_msg.
This patch corrects the length in the doorbell to ensure the entire
ipc_time_update_msg is sent, fixing the timestamp issue.
Signed-off-by: Zhang Lixu <lixu.zhang(a)intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina(a)suse.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/hid/intel-ish-hid/ipc/ipc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/hid/intel-ish-hid/ipc/ipc.c b/drivers/hid/intel-ish-hid/ipc/ipc.c
index dd5fc60874ba1..b1a41c90c5741 100644
--- a/drivers/hid/intel-ish-hid/ipc/ipc.c
+++ b/drivers/hid/intel-ish-hid/ipc/ipc.c
@@ -577,14 +577,14 @@ static void fw_reset_work_fn(struct work_struct *unused)
static void _ish_sync_fw_clock(struct ishtp_device *dev)
{
static unsigned long prev_sync;
- uint64_t usec;
+ struct ipc_time_update_msg time = {};
if (prev_sync && time_before(jiffies, prev_sync + 20 * HZ))
return;
prev_sync = jiffies;
- usec = ktime_to_us(ktime_get_boottime());
- ipc_send_mng_msg(dev, MNG_SYNC_FW_CLOCK, &usec, sizeof(uint64_t));
+ /* The fields of time would be updated while sending message */
+ ipc_send_mng_msg(dev, MNG_SYNC_FW_CLOCK, &time, sizeof(time));
}
/**
--
2.39.5
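The truncation described in the patch above can be sketched in userspace. This is only a model under stated assumptions: struct time_msg_model is a hypothetical layout (the real struct ipc_time_update_msg lives in the driver and may differ), and deliver() is a hypothetical stand-in for a transport that honors the length announced by the sender.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout, for illustration only. */
struct time_msg_model {
	uint64_t boot_time_us;  /* first 8 bytes: time since boot */
	uint32_t format;        /* assumed format/flags word */
	uint64_t utc_time_us;   /* wall-clock time, needed for correct dates */
} __attribute__((packed));

/*
 * Copy at most len bytes of msg into a zeroed receive buffer, the way a
 * transport that trusts the announced length would. Returns the number
 * of bytes actually delivered.
 */
static size_t deliver(void *rx, size_t rx_size, const void *msg, size_t len)
{
	if (len > rx_size)
		len = rx_size;
	memset(rx, 0, rx_size);
	memcpy(rx, msg, len);
	return len;
}
```

If the sender announces sizeof(uint64_t), only boot_time_us arrives and the wall-clock field stays zero, which is consistent with the 1970-01-01 symptom described in the commit message. Announcing sizeof(struct time_msg_model) delivers every field.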
[ Upstream commit 5ac9b4e935dfc6af41eee2ddc21deb5c36507a9f ]
From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in the build ID fetching
code. Return -EFAULT whenever a secretmem file is passed to the
build_id_parse() family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai(a)intel.com>
Suggested-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
[ Linxuan: perform an equivalent direct check without folio-based changes ]
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
Some previous discussions can be found in the following links:
https://lore.kernel.org/stable/05D0A9F7DE394601+20250311100555.310788-2-che…
---
lib/buildid.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..6249bd47fb0b 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,12 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+#ifdef CONFIG_SECRETMEM
+ /* reject secretmem folios created with memfd_secret() */
+ if (vma->vm_file->f_mapping->a_ops == &secretmem_aops)
+ return -EFAULT;
+#endif
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
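The order of the checks in the hunk above can be modelled outside the kernel. The sketch below uses hypothetical, stripped-down stand-ins for the kernel types (aops_model, mapping_model, file_model) and a hypothetical helper build_id_precheck(); it only shows that the secretmem test is a pointer-identity comparison on the mapping's operations table, done before any page is looked up.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not real kernel types. */
struct aops_model { const char *name; };
struct mapping_model { const struct aops_model *a_ops; };
struct file_model { struct mapping_model *f_mapping; };

/* One shared operations table per file type, compared by address. */
static const struct aops_model secretmem_aops_model = { "secretmem" };
static const struct aops_model regular_aops_model = { "regular" };

/*
 * Mirror the early exits of the patched function:
 * no backing file -> -EINVAL, secretmem-backed file -> -EFAULT,
 * anything else is allowed to proceed (0).
 */
static int build_id_precheck(const struct file_model *file)
{
	if (!file)
		return -EINVAL;
	if (file->f_mapping->a_ops == &secretmem_aops_model)
		return -EFAULT;
	return 0;
}
```

Comparing the table address rather than any file content is what lets the check run without touching memory that the kernel cannot map.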
Fix the order of the freq-table-hz property, then convert to OPP tables
and add interconnect support for UFS for the SM6350 SoC.
Signed-off-by: Luca Weiss <luca.weiss(a)fairphone.com>
---
Luca Weiss (3):
arm64: dts: qcom: sm6350: Fix wrong order of freq-table-hz for UFS
arm64: dts: qcom: sm6350: Add OPP table support to UFSHC
arm64: dts: qcom: sm6350: Add interconnect support to UFS
arch/arm64/boot/dts/qcom/sm6350.dtsi | 49 ++++++++++++++++++++++++++++--------
1 file changed, 39 insertions(+), 10 deletions(-)
---
base-commit: eea255893718268e1ab852fb52f70c613d109b99
change-id: 20250314-sm6350-ufs-things-53c5de9fec5e
Best regards,
--
Luca Weiss <luca.weiss(a)fairphone.com>
The retain_ff bit should be updated for a GDSC while it is under SW
control and ON. The current sequence needs to be fixed because it updates
retention after the GDSC has been moved to HW control, which does not
guarantee that the GDSC is in the enabled state.
While in its FSM state, the GDSC hardware waits for an ACK, and the
timeout for that ACK is 2000us as per design requirements.
Signed-off-by: Taniya Das <quic_tdas(a)quicinc.com>
---
Taniya Das (2):
clk: qcom: gdsc: Set retain_ff before moving to HW CTRL
clk: qcom: gdsc: Update the status poll timeout for GDSC
drivers/clk/qcom/gdsc.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
---
base-commit: c674aa7c289e51659e40dda0f954886ef7f80042
change-id: 20250212-gdsc_fixes-77e8b8e27e2f
Best regards,
--
Taniya Das <quic_tdas(a)quicinc.com>
From: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
DSP expects the periods to be aligned to fragment sizes. Setting the
hw constraints on period bytes does not work correctly, as we can end
up with period sizes that are aligned to 32 bytes but not aligned to
the fragment size.
Update the constraints to use the fragment size, and also set a step of
10ms for the period size to accommodate the DSP requirement of 10ms latency.
Fixes: 9b4fe0f1cd79 ("ASoC: qdsp6: audioreach: add q6apm-dai support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Tested-by: Johan Hovold <johan+linaro(a)kernel.org>
---
sound/soc/qcom/qdsp6/q6apm-dai.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/sound/soc/qcom/qdsp6/q6apm-dai.c b/sound/soc/qcom/qdsp6/q6apm-dai.c
index 90cb24947f31..180ff24041bf 100644
--- a/sound/soc/qcom/qdsp6/q6apm-dai.c
+++ b/sound/soc/qcom/qdsp6/q6apm-dai.c
@@ -385,13 +385,14 @@ static int q6apm_dai_open(struct snd_soc_component *component,
}
}
- ret = snd_pcm_hw_constraint_step(runtime, 0, SNDRV_PCM_HW_PARAM_PERIOD_BYTES, 32);
+ /* setup 10ms latency to accommodate DSP restrictions */
+ ret = snd_pcm_hw_constraint_step(runtime, 0, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, 480);
if (ret < 0) {
dev_err(dev, "constraint for period bytes step ret = %d\n", ret);
goto err;
}
- ret = snd_pcm_hw_constraint_step(runtime, 0, SNDRV_PCM_HW_PARAM_BUFFER_BYTES, 32);
+ ret = snd_pcm_hw_constraint_step(runtime, 0, SNDRV_PCM_HW_PARAM_BUFFER_SIZE, 480);
if (ret < 0) {
dev_err(dev, "constraint for buffer bytes step ret = %d\n", ret);
goto err;
--
2.39.5
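The arithmetic behind the step of 480 in the patch above can be sketched as follows. This assumes a 48 kHz sample rate, which the patch does not state; at that rate 480 frames last 10 ms. The interval_model type and the refine_step() helper are hypothetical and only approximate how a step constraint narrows a closed range: the minimum is rounded up and the maximum rounded down to a multiple of the step.

```c
#include <errno.h>
#include <stdint.h>

/* Frames contained in a period of period_ms at rate_hz. */
static unsigned int frames_per_period(unsigned int rate_hz, unsigned int period_ms)
{
	return (unsigned int)((uint64_t)rate_hz * period_ms / 1000);
}

/* Hypothetical closed interval [min, max] of allowed period sizes in frames. */
struct interval_model {
	unsigned int min;
	unsigned int max;
};

/*
 * Narrow the interval so that both ends are multiples of step.
 * Returns 0 on success or -EINVAL if no multiple of step fits.
 */
static int refine_step(struct interval_model *i, unsigned int step)
{
	unsigned int rem = i->min % step;

	if (rem)
		i->min += step - rem;  /* round the minimum up */
	i->max -= i->max % step;       /* round the maximum down */
	return i->min > i->max ? -EINVAL : 0;
}
```

A request range of 100 to 1000 frames narrows to 480 to 960, so every period the application can pick is a whole number of 10 ms fragments under the 48 kHz assumption.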
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: bd50b1f5b6f3 ("arm64: dts: qcom: x1e80100: Add Compute Reference Device")
Cc: stable(a)vger.kernel.org # 6.8
Cc: Abel Vesa <abel.vesa(a)linaro.org>
Cc: Rajendra Nayak <quic_rjendra(a)quicinc.com>
Cc: Sibi Sankar <quic_sibis(a)quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1-crd.dtsi | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1-crd.dtsi b/arch/arm64/boot/dts/qcom/x1-crd.dtsi
index 9e587dc57532..22ebcbe54e24 100644
--- a/arch/arm64/boot/dts/qcom/x1-crd.dtsi
+++ b/arch/arm64/boot/dts/qcom/x1-crd.dtsi
@@ -610,6 +610,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -631,6 +632,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l16b_2p9: ldo16 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: af16b00578a7 ("arm64: dts: qcom: Add base X1E80100 dtsi and the QCP dts")
Cc: stable(a)vger.kernel.org # 6.8
Cc: Rajendra Nayak <quic_rjendra(a)quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-qcp.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts b/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts
index ec594628304a..8f366bf61bbd 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts
@@ -437,6 +437,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -458,6 +459,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l16b_2p9: ldo16 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: 45247fe17db2 ("arm64: dts: qcom: x1e80100: add Lenovo Thinkpad Yoga slim 7x devicetree")
Cc: stable(a)vger.kernel.org # 6.11
Cc: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts b/arch/arm64/boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts
index a3d53f2ba2c3..9d4ba9728355 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts
@@ -290,6 +290,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l14b_3p0: ldo14 {
@@ -304,8 +305,8 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
-
};
regulators-1 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: 6f18b8d4142c ("arm64: dts: qcom: x1e80100-hp-x14: dt for HP Omnibook X Laptop 14")
Cc: stable(a)vger.kernel.org # 6.14
Cc: Jens Glathe <jens.glathe(a)oldschoolsolutions.biz>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-hp-omnibook-x14.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-hp-omnibook-x14.dts b/arch/arm64/boot/dts/qcom/x1e80100-hp-omnibook-x14.dts
index cd860a246c45..ab5addb33b7a 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-hp-omnibook-x14.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-hp-omnibook-x14.dts
@@ -633,6 +633,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -654,6 +655,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l16b_2p9: ldo16 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Note that these supplies currently have no consumers described in
mainline.
Fixes: f5b788d0e8cd ("arm64: dts: qcom: Add support for X1-based Dell XPS 13 9345")
Cc: stable(a)vger.kernel.org # 6.13
Reviewed-by: Aleksandrs Vinarskis <alex.vinarskis(a)gmail.com>
Tested-by: Aleksandrs Vinarskis <alex.vinarskis(a)gmail.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts b/arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts
index 86e87f03b0ec..90f588ed7d63 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts
@@ -359,6 +359,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -380,6 +381,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l17b_2p5: ldo17 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: 7b8a31e82b87 ("arm64: dts: qcom: Add X1E001DE Snapdragon Devkit for Windows")
Cc: stable(a)vger.kernel.org # 6.14
Cc: Sibi Sankar <quic_sibis(a)quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e001de-devkit.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e001de-devkit.dts b/arch/arm64/boot/dts/qcom/x1e001de-devkit.dts
index 5e3970b26e2f..f92bda2d34f2 100644
--- a/arch/arm64/boot/dts/qcom/x1e001de-devkit.dts
+++ b/arch/arm64/boot/dts/qcom/x1e001de-devkit.dts
@@ -507,6 +507,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -528,6 +529,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l16b_2p9: ldo16 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: 7d1cbe2f4985 ("arm64: dts: qcom: Add X1E78100 ThinkPad T14s Gen 6")
Cc: stable(a)vger.kernel.org # 6.12
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e78100-lenovo-thinkpad-t14s.dtsi | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e78100-lenovo-thinkpad-t14s.dtsi b/arch/arm64/boot/dts/qcom/x1e78100-lenovo-thinkpad-t14s.dtsi
index eff0e73640bc..160c052db1ec 100644
--- a/arch/arm64/boot/dts/qcom/x1e78100-lenovo-thinkpad-t14s.dtsi
+++ b/arch/arm64/boot/dts/qcom/x1e78100-lenovo-thinkpad-t14s.dtsi
@@ -456,6 +456,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -477,6 +478,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l17b_2p5: ldo17 {
--
2.48.1
From: Haibo Chen <haibo.chen(a)nxp.com>
After a suspend/resume cycle on a down interface, it will come up as
ERROR-ACTIVE.
$ ip -details -s -s a s dev flexcan0
3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10
link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
can state STOPPED (berr-counter tx 0 rx 0) restart-ms 1000
$ sudo systemctl suspend
$ ip -details -s -s a s dev flexcan0
3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10
link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 1000
Only set the CAN state to CAN_STATE_ERROR_ACTIVE when the resume process
completes without error; otherwise keep it in CAN_STATE_SLEEPING, as set
by suspend.
Fixes: 4de349e786a3 ("can: flexcan: fix resume function")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haibo Chen <haibo.chen(a)nxp.com>
Link: https://patch.msgid.link/20250314110145.899179-1-haibo.chen@nxp.com
Reported-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
Closes: https://lore.kernel.org/all/20250314-married-polar-elephant-b15594-mkl@peng…
[mkl: add newlines]
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/flexcan/flexcan-core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/flexcan/flexcan-core.c b/drivers/net/can/flexcan/flexcan-core.c
index ac1a860986df..3a71fd235722 100644
--- a/drivers/net/can/flexcan/flexcan-core.c
+++ b/drivers/net/can/flexcan/flexcan-core.c
@@ -2266,8 +2266,9 @@ static int __maybe_unused flexcan_suspend(struct device *device)
}
netif_stop_queue(dev);
netif_device_detach(dev);
+
+ priv->can.state = CAN_STATE_SLEEPING;
}
- priv->can.state = CAN_STATE_SLEEPING;
return 0;
}
@@ -2278,7 +2279,6 @@ static int __maybe_unused flexcan_resume(struct device *device)
struct flexcan_priv *priv = netdev_priv(dev);
int err;
- priv->can.state = CAN_STATE_ERROR_ACTIVE;
if (netif_running(dev)) {
netif_device_attach(dev);
netif_start_queue(dev);
@@ -2298,6 +2298,8 @@ static int __maybe_unused flexcan_resume(struct device *device)
flexcan_chip_interrupts_enable(dev);
}
+
+ priv->can.state = CAN_STATE_ERROR_ACTIVE;
}
return 0;
--
2.47.2
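The state handling that the patch above aims for can be modelled in a few lines. This is a userspace sketch, not driver code: can_dev_model, model_suspend() and model_resume() are hypothetical names, the running flag stands in for netif_running(dev), and resume_err stands in for a failure of any hardware step during resume.

```c
#include <stdbool.h>

enum can_state_model {
	STATE_STOPPED,
	STATE_SLEEPING,
	STATE_ERROR_ACTIVE,
};

/* Hypothetical reduction of the device state relevant to the patch. */
struct can_dev_model {
	bool running;                /* stands in for netif_running(dev) */
	enum can_state_model state;
};

/* Only an interface that is up is put to sleep. */
static int model_suspend(struct can_dev_model *d)
{
	if (d->running)
		d->state = STATE_SLEEPING;
	return 0;
}

/*
 * Only an interface that is up is woken, and only after every resume
 * step succeeded; on error the state stays as suspend left it.
 */
static int model_resume(struct can_dev_model *d, int resume_err)
{
	if (d->running) {
		if (resume_err)
			return resume_err;
		d->state = STATE_ERROR_ACTIVE;
	}
	return 0;
}
```

A down interface keeps STATE_STOPPED across a suspend/resume cycle, which corresponds to the expected output of the ip command in the commit message.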
From: Haibo Chen <haibo.chen(a)nxp.com>
After a suspend/resume cycle on a down interface, it will come up as
ERROR-ACTIVE.
$ ip -details -s -s a s dev flexcan0
3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10
link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
can state STOPPED (berr-counter tx 0 rx 0) restart-ms 1000
$ sudo systemctl suspend
$ ip -details -s -s a s dev flexcan0
3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10
link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 1000
Only set the CAN state to CAN_STATE_ERROR_ACTIVE when the resume process
completes without error; otherwise keep it in CAN_STATE_SLEEPING, as set
by suspend.
Fixes: 4de349e786a3 ("can: flexcan: fix resume function")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haibo Chen <haibo.chen(a)nxp.com>
---
Changes for v3:
- only handle priv->can.state when netif_running(dev) return true in PM.
---
drivers/net/can/flexcan/flexcan-core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/flexcan/flexcan-core.c b/drivers/net/can/flexcan/flexcan-core.c
index b347a1c93536..d4d342d8f490 100644
--- a/drivers/net/can/flexcan/flexcan-core.c
+++ b/drivers/net/can/flexcan/flexcan-core.c
@@ -2299,8 +2299,8 @@ static int __maybe_unused flexcan_suspend(struct device *device)
}
netif_stop_queue(dev);
netif_device_detach(dev);
+ priv->can.state = CAN_STATE_SLEEPING;
}
- priv->can.state = CAN_STATE_SLEEPING;
return 0;
}
@@ -2311,7 +2311,6 @@ static int __maybe_unused flexcan_resume(struct device *device)
struct flexcan_priv *priv = netdev_priv(dev);
int err;
- priv->can.state = CAN_STATE_ERROR_ACTIVE;
if (netif_running(dev)) {
netif_device_attach(dev);
netif_start_queue(dev);
@@ -2331,6 +2330,7 @@ static int __maybe_unused flexcan_resume(struct device *device)
flexcan_chip_interrupts_enable(dev);
}
+ priv->can.state = CAN_STATE_ERROR_ACTIVE;
}
return 0;
--
2.34.1
This patch series addresses 2 issues:
1) Fix typo in pattern properties for R-Car V4M.
2) Fix page entries in the AFL list.
v2->v3:
* Collected tags.
* Dropped unused variables cfg and start from
rcar_canfd_configure_afl_rules().
v1->v2:
* Split fixes patches as separate series.
* Added Rb tag from Geert for binding patch.
* Added the tag Cc:stable@vger.kernel.org
Biju Das (2):
dt-bindings: can: renesas,rcar-canfd: Fix typo in pattern properties
for R-Car V4M
can: rcar_canfd: Fix page entries in the AFL list
.../bindings/net/can/renesas,rcar-canfd.yaml | 2 +-
drivers/net/can/rcar/rcar_canfd.c | 28 ++++++++-----------
2 files changed, 12 insertions(+), 18 deletions(-)
--
2.43.0
Prepare vPMC registers for user-initiated changes after first run. This
is important specifically for debugging Windows on QEMU with GDB; QEMU
tries to write back all visible registers when resuming the VM execution
with GDB, corrupting the PMU state. Windows always uses the PMU so this
can cause adverse effects on that particular OS.
This series also contains patch "KVM: arm64: PMU: Set raw values from
user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}", which reverts semantic
changes made for the mentioned registers in the past. It is necessary
to migrate the PMU state properly on Firecracker, QEMU, and crosvm.
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v4:
- Reverted changes for functions implementing ioctls in patch
"KVM: arm64: PMU: Assume PMU presence in pmu-emul.c".
- Removed kvm_pmu_vcpu_reset().
- Reordered function calls in kvm_vcpu_reload_pmu() for better style.
- Link to v3: https://lore.kernel.org/r/20250312-pmc-v3-0-0411cab5dc3d@daynix.com
Changes in v3:
- Added patch "KVM: arm64: PMU: Assume PMU presence in pmu-emul.c".
- Added an explanation of this patch series' motivation to each patch.
- Explained why userspace register writes and register reset should be
covered in patch "KVM: arm64: PMU: Reload when user modifies
registers".
- Marked patch "KVM: arm64: PMU: Set raw values from user to
PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}" for stable.
- Reordered so that patch "KVM: arm64: PMU: Set raw values from user to
PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}" would come first.
- Added patch "KVM: arm64: PMU: Call kvm_pmu_handle_pmcr() after masking
PMCNTENSET_EL0".
- Added patch "KVM: arm64: Reload PMCNTENSET_EL0".
- Link to v2: https://lore.kernel.org/r/20250307-pmc-v2-0-6c3375a5f1e4@daynix.com
Changes in v2:
- Changed to utilize KVM_REQ_RELOAD_PMU as suggested by Oliver Upton.
- Added patch "KVM: arm64: PMU: Reload when user modifies registers"
to cover more registers.
- Added patch "KVM: arm64: PMU: Set raw values from user to
PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}".
- Link to v1: https://lore.kernel.org/r/20250302-pmc-v1-1-caff989093dc@daynix.com
---
Akihiko Odaki (7):
KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs
KVM: arm64: PMU: Reload when user modifies registers
KVM: arm64: PMU: Call kvm_pmu_handle_pmcr() after masking PMCNTENSET_EL0
KVM: arm64: PMU: Reload PMCNTENSET_EL0
KVM: arm64: PMU: Reload when resetting
arch/arm64/kvm/arm.c | 8 ++++---
arch/arm64/kvm/pmu-emul.c | 60 ++++++++++++++---------------------------------
arch/arm64/kvm/reset.c | 3 ---
arch/arm64/kvm/sys_regs.c | 53 +++++++++++++++++++++++------------------
include/kvm/arm_pmu.h | 3 +--
5 files changed, 53 insertions(+), 74 deletions(-)
---
base-commit: da2f480cb24d39d480b1e235eda0dd2d01f8765b
change-id: 20250302-pmc-b90a86af945c
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
This is a note to let you know that I've just added the patch titled
iio: adc: ad7768-1: Fix conversion result sign
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 8236644f5ecb180e80ad92d691c22bc509b747bb Mon Sep 17 00:00:00 2001
From: Sergiu Cuciurean <sergiu.cuciurean(a)analog.com>
Date: Thu, 6 Mar 2025 18:00:29 -0300
Subject: iio: adc: ad7768-1: Fix conversion result sign
The ad7768-1 ADC output code is two's complement, meaning that the voltage
conversion result is a signed value. Since the value is a 24 bit one,
stored in a 32 bit variable, the sign should be extended in order to get
the correct representation.
Also the channel description has been updated to signed representation,
to match the ADC specifications.
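As an illustration only, here is a minimal userspace sketch of the sign
extension described above. The helper name sign_extend32_sketch() is
hypothetical; it is meant to behave like the kernel's sign_extend32() for a
24 bit code but is not the kernel implementation. It assumes an arithmetic
right shift for signed values, which GCC and Clang provide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's sign_extend32():
 * treat bit 'index' of 'value' as the sign bit and propagate it upwards.
 */
int32_t sign_extend32_sketch(uint32_t value, int index)
{
	int shift = 31 - index;

	/* Move the sign bit up to bit 31, then shift back arithmetically. */
	return (int32_t)(value << shift) >> shift;
}

/* Self-check with 24 bit two's complement codes (realbits = 24). */
void sign_extend32_sketch_check(void)
{
	assert(sign_extend32_sketch(0x000001, 23) == 1);
	assert(sign_extend32_sketch(0x7FFFFF, 23) == 8388607);
	assert(sign_extend32_sketch(0xFFFFFF, 23) == -1);
	assert(sign_extend32_sketch(0x800000, 23) == -8388608);
}
```

Without the extension, a raw code of 0xFFFFFF stored in a 32 bit variable
would read as 16777215 instead of -1.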
Fixes: a5f8c7da3dbe ("iio: adc: Add AD7768-1 ADC basic support")
Reviewed-by: David Lechner <dlechner(a)baylibre.com>
Reviewed-by: Marcelo Schmitt <marcelo.schmitt(a)analog.com>
Signed-off-by: Sergiu Cuciurean <sergiu.cuciurean(a)analog.com>
Signed-off-by: Jonathan Santos <Jonathan.Santos(a)analog.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://patch.msgid.link/505994d3b71c2aa38ba714d909a68e021f12124c.174126812…
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/ad7768-1.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/adc/ad7768-1.c b/drivers/iio/adc/ad7768-1.c
index ea829c51e80b..09e7cccfd51c 100644
--- a/drivers/iio/adc/ad7768-1.c
+++ b/drivers/iio/adc/ad7768-1.c
@@ -142,7 +142,7 @@ static const struct iio_chan_spec ad7768_channels[] = {
.channel = 0,
.scan_index = 0,
.scan_type = {
- .sign = 'u',
+ .sign = 's',
.realbits = 24,
.storagebits = 32,
.shift = 8,
@@ -373,7 +373,7 @@ static int ad7768_read_raw(struct iio_dev *indio_dev,
iio_device_release_direct(indio_dev);
if (ret < 0)
return ret;
- *val = ret;
+ *val = sign_extend32(ret, chan->scan_type.realbits - 1);
return IIO_VAL_INT;
--
2.48.1
This is a note to let you know that I've just added the patch titled
iio: adc: ad7768-1: Fix conversion result sign
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From 8236644f5ecb180e80ad92d691c22bc509b747bb Mon Sep 17 00:00:00 2001
From: Sergiu Cuciurean <sergiu.cuciurean(a)analog.com>
Date: Thu, 6 Mar 2025 18:00:29 -0300
Subject: iio: adc: ad7768-1: Fix conversion result sign
The ad7768-1 ADC output code is two's complement, meaning that the voltage
conversion result is a signed value. Since the value is a 24 bit one,
stored in a 32 bit variable, the sign should be extended in order to get
the correct representation.
Also the channel description has been updated to signed representation,
to match the ADC specifications.
Fixes: a5f8c7da3dbe ("iio: adc: Add AD7768-1 ADC basic support")
Reviewed-by: David Lechner <dlechner(a)baylibre.com>
Reviewed-by: Marcelo Schmitt <marcelo.schmitt(a)analog.com>
Signed-off-by: Sergiu Cuciurean <sergiu.cuciurean(a)analog.com>
Signed-off-by: Jonathan Santos <Jonathan.Santos(a)analog.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://patch.msgid.link/505994d3b71c2aa38ba714d909a68e021f12124c.174126812…
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/ad7768-1.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/adc/ad7768-1.c b/drivers/iio/adc/ad7768-1.c
index ea829c51e80b..09e7cccfd51c 100644
--- a/drivers/iio/adc/ad7768-1.c
+++ b/drivers/iio/adc/ad7768-1.c
@@ -142,7 +142,7 @@ static const struct iio_chan_spec ad7768_channels[] = {
.channel = 0,
.scan_index = 0,
.scan_type = {
- .sign = 'u',
+ .sign = 's',
.realbits = 24,
.storagebits = 32,
.shift = 8,
@@ -373,7 +373,7 @@ static int ad7768_read_raw(struct iio_dev *indio_dev,
iio_device_release_direct(indio_dev);
if (ret < 0)
return ret;
- *val = ret;
+ *val = sign_extend32(ret, chan->scan_type.realbits - 1);
return IIO_VAL_INT;
--
2.48.1
Currently on book3s-hv, the capability KVM_CAP_SPAPR_TCE_VFIO is only
available for KVM Guests running on PowerNV and not for the KVM guests
running on pSeries hypervisors. This prevents a pSeries L2 guest from
leveraging the in-kernel acceleration for the H_PUT_TCE_INDIRECT and
H_STUFF_TCE hcalls, which results in slow startup times for large memory
guests.
Support for VFIO on pSeries was restored in commit f431a8cde7f1
("powerpc/iommu: Reimplement the iommu_table_group_ops for pSeries"),
making it possible to re-enable this capability on pSeries hosts.
This change enables KVM_CAP_SPAPR_TCE_VFIO for nested PAPR guests on
pSeries, while maintaining the existing behavior on PowerNV. Booting an
L2 guest with 128GB of memory shows an average 11% improvement in
startup time.
Fixes: f431a8cde7f1 ("powerpc/iommu: Reimplement the iommu_table_group_ops for pSeries")
Cc: stable(a)vger.kernel.org
Reviewed-by: Vaibhav Jain <vaibhav(a)linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Signed-off-by: Amit Machhiwal <amachhiw(a)linux.ibm.com>
---
Changes since v2:
* Updated the patch description
* v2: https://lore.kernel.org/all/20250129094033.2265211-1-amachhiw@linux.ibm.com/
Changes since v1:
* Addressed review comments from Ritesh
* v1: https://lore.kernel.org/all/20250109132053.158436-1-amachhiw@linux.ibm.com/
arch/powerpc/kvm/powerpc.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index ce1d91eed231..a7138eb18d59 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -543,26 +543,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = !hv_enabled;
break;
#ifdef CONFIG_KVM_MPIC
case KVM_CAP_IRQ_MPIC:
r = 1;
break;
#endif
#ifdef CONFIG_PPC_BOOK3S_64
case KVM_CAP_SPAPR_TCE:
+ fallthrough;
case KVM_CAP_SPAPR_TCE_64:
- r = 1;
- break;
case KVM_CAP_SPAPR_TCE_VFIO:
- r = !!cpu_has_feature(CPU_FTR_HVMODE);
- break;
case KVM_CAP_PPC_RTAS:
case KVM_CAP_PPC_FIXUP_HCALL:
case KVM_CAP_PPC_ENABLE_HCALL:
#ifdef CONFIG_KVM_XICS
case KVM_CAP_IRQ_XICS:
#endif
case KVM_CAP_PPC_GET_CPU_CHAR:
r = 1;
break;
#ifdef CONFIG_KVM_XIVE
base-commit: 6537cfb395f352782918d8ee7b7f10ba2cc3cbf2
--
2.48.1
From: Thomas Weißschuh <linux(a)weissschuh.net>
commit fd53aa40e65f518453115b6f56183b0c201db26b upstream.
The ioctl and sysfs handlers unconditionally call the ->enable callback.
Not all drivers implement that callback, leading to NULL dereferences.
Example of affected drivers: ptp_s390.c, ptp_vclock.c and ptp_mock.c.
Instead use a dummy callback if no better was specified by the driver.
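The default-callback approach can be sketched in plain userspace C as follows.
This is only an illustration: struct clock_ops_sketch and both function names
are hypothetical stand-ins, not the PTP core's types or API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a driver-provided ops structure. */
struct clock_ops_sketch {
	int (*enable)(struct clock_ops_sketch *ops, int on);
};

/* Fallback used when the driver did not provide its own callback. */
int default_enable_sketch(struct clock_ops_sketch *ops, int on)
{
	(void)ops;
	(void)on;
	return -EOPNOTSUPP;
}

/* Registration fills the gap once, so later callers need no NULL check. */
void clock_register_sketch(struct clock_ops_sketch *ops)
{
	if (!ops->enable)
		ops->enable = default_enable_sketch;
}

/* Self-check: a driver without ->enable gets the fallback. */
void clock_register_sketch_check(void)
{
	struct clock_ops_sketch ops = { .enable = NULL };

	clock_register_sketch(&ops);
	assert(ops.enable == default_enable_sketch);
	assert(ops.enable(&ops, 1) == -EOPNOTSUPP);
}
```

Installing the fallback at registration time keeps every call site unchanged,
which is presumably why it is preferable to adding NULL checks in each handler.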
Fixes: d94ba80ebbea ("ptp: Added a brand new class driver for ptp clocks.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
Acked-by: Richard Cochran <richardcochran(a)gmail.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski(a)linux.intel.com>
Link: https://patch.msgid.link/20250123-ptp-enable-v1-1-b015834d3a47@weissschuh.n…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Conflict due to
42704b26b0f1 ("ptp: Add cycles support for virtual clocks")
not in the tree]
Signed-off-by: Abdelkareem Abdelsaamad <kareemem(a)amazon.com>
---
drivers/ptp/ptp_clock.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 4d775cd8ee3c..c895e26b1f17 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -188,6 +188,11 @@ static void ptp_clock_release(struct device *dev)
kfree(ptp);
}
+static int ptp_enable(struct ptp_clock_info *ptp, struct ptp_clock_request *request, int on)
+{
+ return -EOPNOTSUPP;
+}
+
static void ptp_aux_kworker(struct kthread_work *work)
{
struct ptp_clock *ptp = container_of(work, struct ptp_clock,
@@ -233,6 +238,9 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
mutex_init(&ptp->pincfg_mux);
init_waitqueue_head(&ptp->tsev_wq);
+ if (!ptp->info->enable)
+ ptp->info->enable = ptp_enable;
+
if (ptp->info->do_aux_work) {
kthread_init_delayed_work(&ptp->aux_work, ptp_aux_kworker);
ptp->kworker = kthread_create_worker(0, "ptp%d", ptp->index);
--
2.47.1
This patch series addresses a shutdown issue reported in [1].
This problem has been fixed in kernel 6.12 and later. Kernel 6.6 is the
last kernel these upstream patches should go to, because the Realtek
RTL8852BE chip, supported by the kernel since v6.2, is the only chip known
to have this problem.
[1] https://github.com/lwfinger/rtw89/issues/372
Zenm Chen (2):
wifi: rtw89: pci: add pre_deinit to be called after probe complete
wifi: rtw89: pci: disable PCIE wake bit when PCIE deinit
drivers/net/wireless/realtek/rtw89/core.c | 2 ++
drivers/net/wireless/realtek/rtw89/core.h | 6 ++++++
drivers/net/wireless/realtek/rtw89/pci.c | 10 ++++++++++
3 files changed, 18 insertions(+)
--
2.48.1
This series addresses GPU reset issues reported in [1], where running a
long compute job would trigger repeated GPU resets, leading to a UI
freeze.
Patches #1 and #2 prevent the same faulty job from being resubmitted in a
loop, mitigating the first cause of the issue.
However, the issue isn't entirely solved. Even with only a single GPU
reset, the UI still freezes on the Raspberry Pi 5, indicating a GPU hang.
Patches #3 to #6 address this by properly configuring the V3D_SMS
registers, which are required for power management and resets in V3D 7.1.
Patch #7 updates the DT maintainership, replacing Emma with the current
v3d driver maintainer.
[1] https://github.com/raspberrypi/linux/issues/6660
Best Regards,
- Maíra
---
v1 -> v2:
- [1/6, 2/6, 5/6] Add Iago's R-b (Iago Toral)
- [3/6] Use V3D_GEN_* macros consistently throughout the driver (Phil Elwell)
- [3/6] Don't add Iago's R-b in 3/6 due to changes in the patch
- [4/6] Add per-compatible restrictions to enforce per‐SoC register rules (Conor Dooley)
- [6/6] Add Emma's A-b, collected through IRC (Emma Anholt)
- [6/6] Add Rob's A-b (Rob Herring)
- Link to v1: https://lore.kernel.org/r/20250226-v3d-gpu-reset-fixes-v1-0-83a969fdd9c1@ig…
v2 -> v3:
- [3/7] Add Iago's R-b (Iago Toral)
- [4/7, 5/7] Separate the patches to ease the reviewing process -> Now,
PATCH 4/7 only adds the per-compatible rules and PATCH 5/7 adds the
SMS registers
- [4/7] `allOf` goes above `additionalProperties` (Krzysztof Kozlowski)
- [4/7, 5/7] Sync `reg` and `reg-names` items (Krzysztof Kozlowski)
- Link to v2: https://lore.kernel.org/r/20250308-v3d-gpu-reset-fixes-v2-0-2939c30f0cc4@ig…
v3 -> v4:
- [4/7] BCM2712 has an external reset controller, therefore the "bridge"
register is not needed (Krzysztof Kozlowski)
- [4/7] Remove the word "required" from the reg descriptions (Rob Herring)
- [5/7] Improve commit message (Rob Herring)
- Link to v3: https://lore.kernel.org/r/20250311-v3d-gpu-reset-fixes-v3-0-64f7a4247ec0@ig…
---
Maíra Canal (7):
drm/v3d: Don't run jobs that have errors flagged in its fence
drm/v3d: Set job pointer to NULL when the job's fence has an error
drm/v3d: Associate a V3D tech revision to all supported devices
dt-bindings: gpu: v3d: Add per-compatible register restrictions
dt-bindings: gpu: v3d: Add SMS register to BCM2712 compatible
drm/v3d: Use V3D_SMS registers for power on/off and reset on V3D 7.x
dt-bindings: gpu: Add V3D driver maintainer as DT maintainer
.../devicetree/bindings/gpu/brcm,bcm-v3d.yaml | 77 ++++++++++---
drivers/gpu/drm/v3d/v3d_debugfs.c | 126 ++++++++++-----------
drivers/gpu/drm/v3d/v3d_drv.c | 62 +++++++++-
drivers/gpu/drm/v3d/v3d_drv.h | 22 +++-
drivers/gpu/drm/v3d/v3d_gem.c | 27 ++++-
drivers/gpu/drm/v3d/v3d_irq.c | 6 +-
drivers/gpu/drm/v3d/v3d_perfmon.c | 4 +-
drivers/gpu/drm/v3d/v3d_regs.h | 26 +++++
drivers/gpu/drm/v3d/v3d_sched.c | 29 ++++-
9 files changed, 279 insertions(+), 100 deletions(-)
---
base-commit: 10646ddac2917b31c985ceff0e4982c42a9c924b
change-id: 20250224-v3d-gpu-reset-fixes-2d21fc70711d
Smatch noticed that inode_getblk() can return 1 on successful mapping of
a block instead of expected 0 after commit b405c1e58b73 ("udf: refactor
udf_next_aext() to handle error"). This could confuse some of the
callers and lead to strange failures (although the one reported by
Smatch in udf_mkdir() is impossible to trigger in practice). Fix the
return value of inode_getblk().
Link: https://lore.kernel.org/all/cb514af7-bbe0-435b-934f-dd1d7a16d2cd@stanley.mo…
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Fixes: b405c1e58b73 ("udf: refactor udf_next_aext() to handle error")
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/udf/inode.c | 1 +
1 file changed, 1 insertion(+)
I plan to merge this patch through my tree.
diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index 70c907fe8af9..4386dd845e40 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -810,6 +810,7 @@ static int inode_getblk(struct inode *inode, struct udf_map_rq *map)
}
map->oflags = UDF_BLK_MAPPED;
map->pblk = udf_get_lb_pblock(inode->i_sb, &eloc, offset);
+ ret = 0;
goto out_free;
}
--
2.43.0
There are two variables that indicate the interrupt type to be used
in the next test execution, "irq_type" as global and test->irq_type.
The global is referenced from pci_endpoint_test_get_irq() to preserve
the current type for ioctl(PCITEST_GET_IRQTYPE).
The type set in this function isn't reflected in the global "irq_type",
so ioctl(PCITEST_GET_IRQTYPE) returns the previous type.
As a result, the wrong type is displayed in old "pcitest" as follows:
# pcitest -i 0
SET IRQ TYPE TO LEGACY: OKAY
# pcitest -I
GET IRQ TYPE: MSI
And new "pcitest" in kselftest results in an error as follows:
# RUN pci_ep_basic.LEGACY_IRQ_TEST ...
# pci_endpoint_test.c:104:LEGACY_IRQ_TEST:Expected 0 (0) == ret (1)
# pci_endpoint_test.c:104:LEGACY_IRQ_TEST:Can't get Legacy IRQ type
Fix this issue by propagating the current type to the global "irq_type".
Cc: stable(a)vger.kernel.org
Fixes: b2ba9225e031 ("misc: pci_endpoint_test: Avoid using module parameter to determine irqtype")
Reviewed-by: Niklas Cassel <cassel(a)kernel.org>
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
---
drivers/misc/pci_endpoint_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index acf3d8dab131..896392c428de 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -833,6 +833,7 @@ static int pci_endpoint_test_set_irq(struct pci_endpoint_test *test,
return ret;
}
+ irq_type = test->irq_type;
return 0;
}
--
2.25.1
From: Sven Eckelmann <sven(a)narfation.org>
The receive processing of OGMv1 and OGMv2 packets was limited not only by
the number of bytes in the received packet but also by the node's maximum
aggregation packet size limit. But this limit is relevant for TX and not
for RX. It must not be enforced by batadv_(i)v_ogm_aggr_packet to avoid
loss of information in case of a different limit for sender and receiver.
This has a minor side effect for B.A.T.M.A.N. IV because the
batadv_iv_ogm_aggr_packet is also used for the preprocessing for the TX.
But since the aggregation code itself will not allow more than
BATADV_MAX_AGGREGATION_BYTES bytes, this check was never triggering (in
this context) prior to removing it.
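A simplified sketch of the receive-side check after this change, for
illustration only. The function name and the explicit header_len and tvlv_len
parameters are hypothetical; the real helpers read these values from the OGM
packet itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified form of the receive-side check: 'header_len'
 * stands for the fixed OGM header size and 'tvlv_len' for the already
 * byte-swapped TVLV length field.
 */
bool ogm_fits_sketch(int buff_pos, int packet_len, int header_len,
		     int tvlv_len)
{
	int next_buff_pos = buff_pos + header_len;

	/* the fixed header must fit before tvlv_len can be trusted */
	if (next_buff_pos > packet_len)
		return false;

	next_buff_pos += tvlv_len;

	/* bounded only by the bytes actually received, not by a TX limit */
	return next_buff_pos <= packet_len;
}

/* Self-check: a large aggregate is accepted as long as it was received. */
void ogm_fits_sketch_check(void)
{
	assert(ogm_fits_sketch(0, 1000, 24, 900));
	assert(!ogm_fits_sketch(0, 100, 24, 900));
	assert(!ogm_fits_sketch(90, 100, 24, 0));
}
```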
Cc: stable(a)vger.kernel.org
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Fixes: 9323158ef9f4 ("batman-adv: OGMv2 - implement originators logic")
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Signed-off-by: Simon Wunderlich <sw(a)simonwunderlich.de>
---
net/batman-adv/bat_iv_ogm.c | 3 +--
net/batman-adv/bat_v_ogm.c | 3 +--
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index 07ae5dd1f150..b12645949ae5 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -325,8 +325,7 @@ batadv_iv_ogm_aggr_packet(int buff_pos, int packet_len,
/* check if there is enough space for the optional TVLV */
next_buff_pos += ntohs(ogm_packet->tvlv_len);
- return (next_buff_pos <= packet_len) &&
- (next_buff_pos <= BATADV_MAX_AGGREGATION_BYTES);
+ return next_buff_pos <= packet_len;
}
/* send a batman ogm to a given interface */
diff --git a/net/batman-adv/bat_v_ogm.c b/net/batman-adv/bat_v_ogm.c
index e503ee0d896b..8f89ffe6020c 100644
--- a/net/batman-adv/bat_v_ogm.c
+++ b/net/batman-adv/bat_v_ogm.c
@@ -839,8 +839,7 @@ batadv_v_ogm_aggr_packet(int buff_pos, int packet_len,
/* check if there is enough space for the optional TVLV */
next_buff_pos += ntohs(ogm2_packet->tvlv_len);
- return (next_buff_pos <= packet_len) &&
- (next_buff_pos <= BATADV_MAX_AGGREGATION_BYTES);
+ return next_buff_pos <= packet_len;
}
/**
--
2.39.5
From: Remi Pommarel <repk(a)triplefau.lt>
Since commit 4436df478860 ("batman-adv: Add flex array to struct
batadv_tvlv_tt_data"), the introduction of batadv_tvlv_tt_data's flex
array member in batadv_tt_tvlv_ogm_handler_v1() puts tt_changes at an
invalid offset. Those TT changes are supposed to be filled from the end
of the batadv_tvlv_tt_data structure (including the vlan_data flexible
array), but only the flex array size is taken into account, completely
missing the size of the fixed part of the structure itself.
Fix the tt_change offset computation by using struct_size() instead of
flex_array_size() so both flex array member and its container structure
sizes are taken into account.
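The difference between the two size computations can be sketched in userspace
C as below. The structure layouts and function names are hypothetical and
simplified; the real batman-adv structures differ, and the kernel's
struct_size() and flex_array_size() macros additionally guard against
arithmetic overflow, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layouts; the real batman-adv structures differ. */
struct vlan_sketch {
	uint32_t crc;
	uint16_t vid;
	uint16_t reserved;
};

struct tt_data_sketch {
	uint8_t flags;
	uint8_t ttvn;
	uint16_t num_vlan;
	struct vlan_sketch vlan_data[];	/* flexible array member */
};

/* Bytes of the flexible array alone (the flex_array_size() view). */
size_t flex_bytes_sketch(size_t num_vlan)
{
	return num_vlan * sizeof(struct vlan_sketch);
}

/* Fixed part plus flexible array (the struct_size() view). */
size_t total_bytes_sketch(size_t num_vlan)
{
	return sizeof(struct tt_data_sketch) + flex_bytes_sketch(num_vlan);
}

/* Self-check: the two views differ by exactly the fixed part. */
void total_bytes_sketch_check(void)
{
	assert(total_bytes_sketch(0) == sizeof(struct tt_data_sketch));
	assert(total_bytes_sketch(2) - flex_bytes_sketch(2) ==
	       sizeof(struct tt_data_sketch));
}
```

An offset computed from the start of the structure therefore needs the total
size; using only the array size lands the pointer short by the fixed part.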
Cc: stable(a)vger.kernel.org
Fixes: 4436df478860 ("batman-adv: Add flex array to struct batadv_tvlv_tt_data")
Signed-off-by: Remi Pommarel <repk(a)triplefau.lt>
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Signed-off-by: Simon Wunderlich <sw(a)simonwunderlich.de>
---
net/batman-adv/translation-table.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index 760d51fdbdf6..7d5de4cbb814 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -3959,23 +3959,21 @@ static void batadv_tt_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv,
struct batadv_tvlv_tt_change *tt_change;
struct batadv_tvlv_tt_data *tt_data;
u16 num_entries, num_vlan;
- size_t flex_size;
+ size_t tt_data_sz;
if (tvlv_value_len < sizeof(*tt_data))
return;
tt_data = tvlv_value;
- tvlv_value_len -= sizeof(*tt_data);
-
num_vlan = ntohs(tt_data->num_vlan);
- flex_size = flex_array_size(tt_data, vlan_data, num_vlan);
- if (tvlv_value_len < flex_size)
+ tt_data_sz = struct_size(tt_data, vlan_data, num_vlan);
+ if (tvlv_value_len < tt_data_sz)
return;
tt_change = (struct batadv_tvlv_tt_change *)((void *)tt_data
- + flex_size);
- tvlv_value_len -= flex_size;
+ + tt_data_sz);
+ tvlv_value_len -= tt_data_sz;
num_entries = batadv_tt_entries(tvlv_value_len);
--
2.39.5
From: Sven Eckelmann <sven(a)narfation.org>
The ELP worker needs to calculate new metric values for all neighbors
"reachable" over an interface. Some of the used metric sources require
locks which might need to sleep. This sleep is incompatible with the RCU
list iterator used for the recorded neighbors. The initial approach to work
around this problem was to queue another work item per neighbor and then
run this in a new context.
Even though this solved the RCU vs might_sleep() conflict, it has a major
problem: Nothing was stopping the work item in case it is not needed
anymore - for example because one of the related interfaces was removed or
the batman-adv module was unloaded - resulting in potential invalid memory
accesses.
Directly canceling the metric worker also has various problems:
* cancel_work_sync for a to-be-deactivated interface is called with
rtnl_lock held. But the code in the ELP metric worker also tries to use
rtnl_lock() - which will never return in this case. This also means that
cancel_work_sync would never return because it is waiting for the worker
to finish.
* iterating over the neighbor list for the to-be-deactivated interface is
currently done using the RCU specific methods. Which means that it is
possible to miss items when iterating over it without the associated
spinlock - a behaviour which is acceptable for a periodic metric check
but not for a cleanup routine (which must "stop" all still running
workers)
The better approach is to get rid of the per interface neighbor metric
worker and handle everything in the interface worker. The original problems
are solved by:
* creating a list of neighbors which require new metric information inside
the RCU protected context, gathering the metric according to the new list
outside the RCU protected context
* only use rtnl_trylock inside metric gathering code to avoid a deadlock
when the cancel_delayed_work_sync is called in the interface removal code
(which is called with the rtnl_lock held)
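The first point, collecting work inside the protected section and processing
it afterwards, can be sketched in userspace C roughly as follows. All names
are hypothetical, no real locking or RCU is involved, and a plain singly
linked list stands in for the kernel's list_head.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: 'neigh_sketch' for a neighbor and
 * 'queue_entry_sketch' for the temporary per-neighbor queue entry.
 */
struct neigh_sketch {
	int needs_update;
	int updated;
};

struct queue_entry_sketch {
	struct neigh_sketch *neigh;
	struct queue_entry_sketch *next;
};

/* Phase 1 (would run inside the non-sleeping section): only record work. */
struct queue_entry_sketch *collect_sketch(struct neigh_sketch *neighs,
					  size_t n)
{
	struct queue_entry_sketch *head = NULL;

	for (size_t i = 0; i < n; i++) {
		struct queue_entry_sketch *e;

		if (!neighs[i].needs_update)
			continue;
		e = malloc(sizeof(*e));
		if (!e)
			continue;	/* skip this neighbor on allocation failure */
		e->neigh = &neighs[i];
		e->next = head;
		head = e;
	}
	return head;
}

/* Phase 2 (would run after leaving that section): slow work, then free. */
size_t process_sketch(struct queue_entry_sketch *head)
{
	size_t done = 0;

	while (head) {
		struct queue_entry_sketch *next = head->next;

		head->neigh->updated = 1;	/* placeholder for the metric update */
		free(head);
		head = next;
		done++;
	}
	return done;
}

/* Self-check: only neighbors flagged in phase 1 are touched in phase 2. */
void collect_then_process_check(void)
{
	struct neigh_sketch n[3] = { {1, 0}, {0, 0}, {1, 0} };

	assert(process_sketch(collect_sketch(n, 3)) == 2);
	assert(n[0].updated && !n[1].updated && n[2].updated);
}
```

Because the queue lives on the worker's own stack and is drained before the
worker returns, cancelling the single interface worker is enough to stop all
metric updates.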
Cc: stable(a)vger.kernel.org
Fixes: c833484e5f38 ("batman-adv: ELP - compute the metric based on the estimated throughput")
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Signed-off-by: Simon Wunderlich <sw(a)simonwunderlich.de>
---
net/batman-adv/bat_v.c | 2 --
net/batman-adv/bat_v_elp.c | 71 ++++++++++++++++++++++++++------------
net/batman-adv/bat_v_elp.h | 2 --
net/batman-adv/types.h | 3 --
4 files changed, 48 insertions(+), 30 deletions(-)
diff --git a/net/batman-adv/bat_v.c b/net/batman-adv/bat_v.c
index ac11f1f08db0..d35479c465e2 100644
--- a/net/batman-adv/bat_v.c
+++ b/net/batman-adv/bat_v.c
@@ -113,8 +113,6 @@ static void
batadv_v_hardif_neigh_init(struct batadv_hardif_neigh_node *hardif_neigh)
{
ewma_throughput_init(&hardif_neigh->bat_v.throughput);
- INIT_WORK(&hardif_neigh->bat_v.metric_work,
- batadv_v_elp_throughput_metric_update);
}
/**
diff --git a/net/batman-adv/bat_v_elp.c b/net/batman-adv/bat_v_elp.c
index 65e52de52bcd..b065578b4436 100644
--- a/net/batman-adv/bat_v_elp.c
+++ b/net/batman-adv/bat_v_elp.c
@@ -18,6 +18,7 @@
#include <linux/if_ether.h>
#include <linux/jiffies.h>
#include <linux/kref.h>
+#include <linux/list.h>
#include <linux/minmax.h>
#include <linux/netdevice.h>
#include <linux/nl80211.h>
@@ -26,6 +27,7 @@
#include <linux/rcupdate.h>
#include <linux/rtnetlink.h>
#include <linux/skbuff.h>
+#include <linux/slab.h>
#include <linux/stddef.h>
#include <linux/string.h>
#include <linux/types.h>
@@ -41,6 +43,18 @@
#include "routing.h"
#include "send.h"
+/**
+ * struct batadv_v_metric_queue_entry - list of hardif neighbors which require
+ * and metric update
+ */
+struct batadv_v_metric_queue_entry {
+ /** @hardif_neigh: hardif neighbor scheduled for metric update */
+ struct batadv_hardif_neigh_node *hardif_neigh;
+
+ /** @list: list node for metric_queue */
+ struct list_head list;
+};
+
/**
* batadv_v_elp_start_timer() - restart timer for ELP periodic work
* @hard_iface: the interface for which the timer has to be reset
@@ -137,10 +151,17 @@ static bool batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh,
goto default_throughput;
}
+ /* only use rtnl_trylock because the elp worker will be cancelled while
+ * the rntl_lock is held. the cancel_delayed_work_sync() would otherwise
+ * wait forever when the elp work_item was started and it is then also
+ * trying to rtnl_lock
+ */
+ if (!rtnl_trylock())
+ return false;
+
/* if not a wifi interface, check if this device provides data via
* ethtool (e.g. an Ethernet adapter)
*/
- rtnl_lock();
ret = __ethtool_get_link_ksettings(hard_iface->net_dev, &link_settings);
rtnl_unlock();
if (ret == 0) {
@@ -175,31 +196,19 @@ static bool batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh,
/**
* batadv_v_elp_throughput_metric_update() - worker updating the throughput
* metric of a single hop neighbour
- * @work: the work queue item
+ * @neigh: the neighbour to probe
*/
-void batadv_v_elp_throughput_metric_update(struct work_struct *work)
+static void
+batadv_v_elp_throughput_metric_update(struct batadv_hardif_neigh_node *neigh)
{
- struct batadv_hardif_neigh_node_bat_v *neigh_bat_v;
- struct batadv_hardif_neigh_node *neigh;
u32 throughput;
bool valid;
- neigh_bat_v = container_of(work, struct batadv_hardif_neigh_node_bat_v,
- metric_work);
- neigh = container_of(neigh_bat_v, struct batadv_hardif_neigh_node,
- bat_v);
-
valid = batadv_v_elp_get_throughput(neigh, &throughput);
if (!valid)
- goto put_neigh;
+ return;
ewma_throughput_add(&neigh->bat_v.throughput, throughput);
-
-put_neigh:
- /* decrement refcounter to balance increment performed before scheduling
- * this task
- */
- batadv_hardif_neigh_put(neigh);
}
/**
@@ -273,14 +282,16 @@ batadv_v_elp_wifi_neigh_probe(struct batadv_hardif_neigh_node *neigh)
*/
static void batadv_v_elp_periodic_work(struct work_struct *work)
{
+ struct batadv_v_metric_queue_entry *metric_entry;
+ struct batadv_v_metric_queue_entry *metric_safe;
struct batadv_hardif_neigh_node *hardif_neigh;
struct batadv_hard_iface *hard_iface;
struct batadv_hard_iface_bat_v *bat_v;
struct batadv_elp_packet *elp_packet;
+ struct list_head metric_queue;
struct batadv_priv *bat_priv;
struct sk_buff *skb;
u32 elp_interval;
- bool ret;
bat_v = container_of(work, struct batadv_hard_iface_bat_v, elp_wq.work);
hard_iface = container_of(bat_v, struct batadv_hard_iface, bat_v);
@@ -316,6 +327,8 @@ static void batadv_v_elp_periodic_work(struct work_struct *work)
atomic_inc(&hard_iface->bat_v.elp_seqno);
+ INIT_LIST_HEAD(&metric_queue);
+
/* The throughput metric is updated on each sent packet. This way, if a
* node is dead and no longer sends packets, batman-adv is still able to
* react timely to its death.
@@ -340,16 +353,28 @@ static void batadv_v_elp_periodic_work(struct work_struct *work)
/* Reading the estimated throughput from cfg80211 is a task that
* may sleep and that is not allowed in an rcu protected
- * context. Therefore schedule a task for that.
+ * context. Therefore add it to metric_queue and process it
+ * outside rcu protected context.
*/
- ret = queue_work(batadv_event_workqueue,
- &hardif_neigh->bat_v.metric_work);
-
- if (!ret)
+ metric_entry = kzalloc(sizeof(*metric_entry), GFP_ATOMIC);
+ if (!metric_entry) {
batadv_hardif_neigh_put(hardif_neigh);
+ continue;
+ }
+
+ metric_entry->hardif_neigh = hardif_neigh;
+ list_add(&metric_entry->list, &metric_queue);
}
rcu_read_unlock();
+ list_for_each_entry_safe(metric_entry, metric_safe, &metric_queue, list) {
+ batadv_v_elp_throughput_metric_update(metric_entry->hardif_neigh);
+
+ batadv_hardif_neigh_put(metric_entry->hardif_neigh);
+ list_del(&metric_entry->list);
+ kfree(metric_entry);
+ }
+
restart_timer:
batadv_v_elp_start_timer(hard_iface);
out:
diff --git a/net/batman-adv/bat_v_elp.h b/net/batman-adv/bat_v_elp.h
index 9e2740195fa2..c9cb0a307100 100644
--- a/net/batman-adv/bat_v_elp.h
+++ b/net/batman-adv/bat_v_elp.h
@@ -10,7 +10,6 @@
#include "main.h"
#include <linux/skbuff.h>
-#include <linux/workqueue.h>
int batadv_v_elp_iface_enable(struct batadv_hard_iface *hard_iface);
void batadv_v_elp_iface_disable(struct batadv_hard_iface *hard_iface);
@@ -19,6 +18,5 @@ void batadv_v_elp_iface_activate(struct batadv_hard_iface *primary_iface,
void batadv_v_elp_primary_iface_set(struct batadv_hard_iface *primary_iface);
int batadv_v_elp_packet_recv(struct sk_buff *skb,
struct batadv_hard_iface *if_incoming);
-void batadv_v_elp_throughput_metric_update(struct work_struct *work);
#endif /* _NET_BATMAN_ADV_BAT_V_ELP_H_ */
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index 04f6398b3a40..85a50096f5b2 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -596,9 +596,6 @@ struct batadv_hardif_neigh_node_bat_v {
* neighbor
*/
unsigned long last_unicast_tx;
-
- /** @metric_work: work queue callback item for metric update */
- struct work_struct metric_work;
};
/**
--
2.39.5
From: Andy Strohman <andrew(a)andrewstrohman.com>
Reference counting is used to ensure that
batadv_hardif_neigh_node and batadv_hard_iface
are not freed before/during
batadv_v_elp_throughput_metric_update work is
finished.
But there isn't a guarantee that the hard if will
remain associated with a soft interface up until
the work is finished.
This fixes a crash triggered by reboot that looks
like this:
Call trace:
batadv_v_mesh_free+0xd0/0x4dc [batman_adv]
batadv_v_elp_throughput_metric_update+0x1c/0xa4
process_one_work+0x178/0x398
worker_thread+0x2e8/0x4d0
kthread+0xd8/0xdc
ret_from_fork+0x10/0x20
(the batadv_v_mesh_free call is misleading,
and does not actually happen)
I was able to make the issue happen more reliably
by changing hardif_neigh->bat_v.metric_work work
to be delayed work. This allowed me to track down
and confirm the fix.
Cc: stable(a)vger.kernel.org
Fixes: c833484e5f38 ("batman-adv: ELP - compute the metric based on the estimated throughput")
Signed-off-by: Andy Strohman <andrew(a)andrewstrohman.com>
[sven(a)narfation.org: prevent entering batadv_v_elp_get_throughput without
soft_iface]
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Signed-off-by: Simon Wunderlich <sw(a)simonwunderlich.de>
---
net/batman-adv/bat_v_elp.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/net/batman-adv/bat_v_elp.c b/net/batman-adv/bat_v_elp.c
index 1d704574e6bf..fbf499bcc671 100644
--- a/net/batman-adv/bat_v_elp.c
+++ b/net/batman-adv/bat_v_elp.c
@@ -66,12 +66,19 @@ static void batadv_v_elp_start_timer(struct batadv_hard_iface *hard_iface)
static u32 batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh)
{
struct batadv_hard_iface *hard_iface = neigh->if_incoming;
+ struct net_device *soft_iface = hard_iface->soft_iface;
struct ethtool_link_ksettings link_settings;
struct net_device *real_netdev;
struct station_info sinfo;
u32 throughput;
int ret;
+ /* don't query throughput when no longer associated with any
+ * batman-adv interface
+ */
+ if (!soft_iface)
+ return BATADV_THROUGHPUT_DEFAULT_VALUE;
+
/* if the user specified a customised value for this interface, then
* return it directly
*/
@@ -141,7 +148,7 @@ static u32 batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh)
default_throughput:
if (!(hard_iface->bat_v.flags & BATADV_WARNING_DEFAULT)) {
- batadv_info(hard_iface->soft_iface,
+ batadv_info(soft_iface,
"WiFi driver or ethtool info does not provide information about link speeds on interface %s, therefore defaulting to hardcoded throughput values of %u.%1u Mbps. Consider overriding the throughput manually or checking your driver.\n",
hard_iface->net_dev->name,
BATADV_THROUGHPUT_DEFAULT_VALUE / 10,
--
2.39.5
Backport of a similar change from commit 5ac9b4e935df ("lib/buildid:
Handle memfd_secret() files in build_id_parse()") to address an issue
where accessing secret memfd contents through build_id_parse() would
trigger faults.
Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
This repro will cause BUG: unable to handle kernel paging request in
build_id_parse in 5.15/6.1/6.6.
Some other discussions can be found in [1].
[1] https://lore.kernel.org/bpf/20241104175256.2327164-1-jolsa@kernel.org/T/#u
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
lib/buildid.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..9db35305f257 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,12 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+#ifdef CONFIG_SECRETMEM
+ /* reject secretmem folios created with memfd_secret() */
+ if (vma->vm_file->f_mapping->a_ops == &secretmem_aops)
+ return -EFAULT;
+#endif
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 927e926d72d9155fde3264459fe9bfd7b5e40d28
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030948-playhouse-strongman-c9c3@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 927e926d72d9155fde3264459fe9bfd7b5e40d28 Mon Sep 17 00:00:00 2001
From: Suren Baghdasaryan <surenb(a)google.com>
Date: Wed, 26 Feb 2025 10:55:09 -0800
Subject: [PATCH] userfaultfd: fix PTE unmapping stack-allocated PTE copies
Current implementation of move_pages_pte() copies source and destination
PTEs in order to detect concurrent changes to PTEs involved in the move.
However these copies are also used to unmap the PTEs, which will fail if
CONFIG_HIGHPTE is enabled because the copies are allocated on the stack.
Fix this by using the actual PTEs which were kmap()ed.
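The difference between the stack copy and the mapped pointer can be
illustrated with a small userspace analogy. This is only a sketch: the
fake_* names and the fake_pte_t type are hypothetical stand-ins, not
kernel API, and the "mapping" is reduced to remembering one address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a kmap()-style mapping; not kernel API. */
typedef unsigned long fake_pte_t;

static fake_pte_t slot;     /* the entry that gets "mapped" */
static fake_pte_t *mapped;  /* address handed out by fake_map(), or NULL */

static fake_pte_t *fake_map(void)
{
	mapped = &slot;
	return mapped;
}

/* Unmapping only succeeds with the address that fake_map() returned. */
static bool fake_unmap(const fake_pte_t *ptep)
{
	if (ptep == NULL || ptep != mapped)
		return false;
	mapped = NULL;
	return true;
}

/* Returns true when the copy is rejected and the mapped pointer is accepted. */
static bool demo(void)
{
	fake_pte_t *ptep = fake_map();
	assert(ptep != NULL);

	fake_pte_t orig = *ptep;  /* stack copy, kept only to detect changes */

	bool wrong = fake_unmap(&orig);  /* address of the copy: rejected */
	bool right = fake_unmap(ptep);   /* the mapped pointer: accepted */

	return !wrong && right;
}
```

The copy is still useful for comparing values; it just must not be
passed to anything that identifies the mapping by address.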
Link: https://lkml.kernel.org/r/20250226185510.2732648-3-surenb@google.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Barry Song <v-songbaohua(a)oppo.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index f5c6b3454f76..d06453fa8aba 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1290,8 +1290,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
spin_unlock(src_ptl);
if (!locked) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
/* now we can block and wait */
folio_lock(src_folio);
@@ -1307,8 +1307,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
/* at this point we have src_folio locked */
if (folio_test_large(src_folio)) {
/* split_folio() can block */
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
err = split_folio(src_folio);
if (err)
@@ -1333,8 +1333,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
goto out;
}
if (!anon_vma_trylock_write(src_anon_vma)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
/* now we can block and wait */
anon_vma_lock_write(src_anon_vma);
@@ -1352,8 +1352,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
entry = pte_to_swp_entry(orig_src_pte);
if (non_swap_entry(entry)) {
if (is_migration_entry(entry)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
migration_entry_wait(mm, src_pmd, src_addr);
err = -EAGAIN;
@@ -1396,8 +1396,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
src_folio = folio;
src_folio_pte = orig_src_pte;
if (!folio_trylock(src_folio)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
put_swap_device(si);
si = NULL;
The patch below does not apply to the 6.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.13.y
git checkout FETCH_HEAD
git cherry-pick -x 927e926d72d9155fde3264459fe9bfd7b5e40d28
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030947-disloyal-bust-0d23@gregkh' --subject-prefix 'PATCH 6.13.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 927e926d72d9155fde3264459fe9bfd7b5e40d28 Mon Sep 17 00:00:00 2001
From: Suren Baghdasaryan <surenb(a)google.com>
Date: Wed, 26 Feb 2025 10:55:09 -0800
Subject: [PATCH] userfaultfd: fix PTE unmapping stack-allocated PTE copies
Current implementation of move_pages_pte() copies source and destination
PTEs in order to detect concurrent changes to PTEs involved in the move.
However these copies are also used to unmap the PTEs, which will fail if
CONFIG_HIGHPTE is enabled because the copies are allocated on the stack.
Fix this by using the actual PTEs which were kmap()ed.
Link: https://lkml.kernel.org/r/20250226185510.2732648-3-surenb@google.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Barry Song <v-songbaohua(a)oppo.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index f5c6b3454f76..d06453fa8aba 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1290,8 +1290,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
spin_unlock(src_ptl);
if (!locked) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
/* now we can block and wait */
folio_lock(src_folio);
@@ -1307,8 +1307,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
/* at this point we have src_folio locked */
if (folio_test_large(src_folio)) {
/* split_folio() can block */
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
err = split_folio(src_folio);
if (err)
@@ -1333,8 +1333,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
goto out;
}
if (!anon_vma_trylock_write(src_anon_vma)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
/* now we can block and wait */
anon_vma_lock_write(src_anon_vma);
@@ -1352,8 +1352,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
entry = pte_to_swp_entry(orig_src_pte);
if (non_swap_entry(entry)) {
if (is_migration_entry(entry)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
migration_entry_wait(mm, src_pmd, src_addr);
err = -EAGAIN;
@@ -1396,8 +1396,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
src_folio = folio;
src_folio_pte = orig_src_pte;
if (!folio_trylock(src_folio)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
put_swap_device(si);
si = NULL;
On Thu, Mar 13, 2025 at 03:20:34PM +0100, Christian König wrote:
> Am 11.03.25 um 20:01 schrieb Qasim Ijaz:
> > In the ttm_bo_unreserve_bulk() test function, resv is allocated
> > using kunit_kzalloc(), but the subsequent assertion mistakenly
> > verifies the ttm_dev pointer instead of checking the resv pointer.
> > This mistake means that if allocation for resv fails, the error will
> > go undetected, resv will be NULL and a call to dma_resv_init(resv)
>
> The description here is correct, but the subject line is a bit misleading.
>
> Please use something like this instead "drm/ttm/tests: incorrect assert in ttm_bo_unreserve_bulk()".
>
> > will dereference a NULL pointer.
>
> That's irrelevant; an allocation failure will result in a NULL pointer deref anyway. This is just a unit test.
>
> >
> > Fix the assertion to properly verify the resv pointer.
> >
> > Fixes: 588c4c8d58c4 ("drm/ttm/tests: Fix a warning in ttm_bo_unreserve_bulk")
> > Cc: stable(a)vger.kernel.org
>
> Please drop those tags. This is just a unit test, not relevant for stability and therefore shouldn't be backported.
>
> Regards,
> Christian.
>
Thank you for the feedback Christian, I will resend a new patch with the
changes you described.
Thanks,
Qasim.
> > Signed-off-by: Qasim Ijaz <qasdev00(a)gmail.com>
> > ---
> > drivers/gpu/drm/ttm/tests/ttm_bo_test.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_test.c b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
> > index f8f20d2f6174..e08e5a138420 100644
> > --- a/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
> > +++ b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
> > @@ -340,7 +340,7 @@ static void ttm_bo_unreserve_bulk(struct kunit *test)
> > KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
> >
> > resv = kunit_kzalloc(test, sizeof(*resv), GFP_KERNEL);
> > - KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
> > + KUNIT_ASSERT_NOT_NULL(test, resv);
> >
> > err = ttm_device_kunit_init(priv, ttm_dev, false, false);
> > KUNIT_ASSERT_EQ(test, err, 0);
>
Commit 5afd032961e8 "perf cs-etm: Don't flush when packet_queue fills
up" uses i as a loop counter in cs_etm__process_queues(). It was
backported to the 5.4 and 5.10 stable branches, but the i variable
doesn't exist there as it was only added in 5.15.
Declare i with the expected type.
Fixes: 1ed167325c32 ("perf cs-etm: Don't flush when packet_queue fills up")
Fixes: 26db806fa23e ("perf cs-etm: Don't flush when packet_queue fills up")
Signed-off-by: Ben Hutchings <benh(a)debian.org>
---
tools/perf/util/cs-etm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index e3fa32b83367..2055d582a8a4 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -2171,7 +2171,7 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
static int cs_etm__process_queues(struct cs_etm_auxtrace *etm)
{
int ret = 0;
- unsigned int cs_queue_nr, queue_nr;
+ unsigned int cs_queue_nr, queue_nr, i;
u8 trace_chan_id;
u64 timestamp;
struct auxtrace_queue *queue;
In gfx_v12_0_cp_gfx_load_me_microcode_rs64(), gfx_v12_0_pfp_fini() is
incorrectly used to free the 'me' field of 'gfx'; gfx_v12_0_pfp_fini()
only releases the 'pfp' field of 'gfx'. The 'me' field should be
released with gfx_v12_0_me_fini().
Fixes: 52cb80c12e8a ("drm/amdgpu: Add gfx v12_0 ip block support (v6)")
Cc: stable(a)vger.kernel.org # 6.11+
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index da327ab48a57..02bc2eddf0c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -2413,7 +2413,7 @@ static int gfx_v12_0_cp_gfx_load_me_microcode_rs64(struct amdgpu_device *adev)
(void **)&adev->gfx.me.me_fw_data_ptr);
if (r) {
dev_err(adev->dev, "(%d) failed to create me data bo\n", r);
- gfx_v12_0_pfp_fini(adev);
+ gfx_v12_0_me_fini(adev);
return r;
}
--
2.42.0.windows.2
In the ttm_bo_unreserve_bulk() test function, resv is allocated
using kunit_kzalloc(), but the subsequent assertion mistakenly
verifies the ttm_dev pointer instead of checking the resv pointer.
This mistake means that if allocation for resv fails, the error will
go undetected, resv will be NULL and a call to dma_resv_init(resv)
will dereference a NULL pointer.
Fix the assertion to properly verify the resv pointer.
Fixes: 588c4c8d58c4 ("drm/ttm/tests: Fix a warning in ttm_bo_unreserve_bulk")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qasim Ijaz <qasdev00(a)gmail.com>
---
drivers/gpu/drm/ttm/tests/ttm_bo_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_test.c b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
index f8f20d2f6174..e08e5a138420 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
@@ -340,7 +340,7 @@ static void ttm_bo_unreserve_bulk(struct kunit *test)
KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
resv = kunit_kzalloc(test, sizeof(*resv), GFP_KERNEL);
- KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
+ KUNIT_ASSERT_NOT_NULL(test, resv);
err = ttm_device_kunit_init(priv, ttm_dev, false, false);
KUNIT_ASSERT_EQ(test, err, 0);
--
2.39.5
The PWM allows configuring the PWM resolution from 8-bit PWM values
up to 15-bit values for the Hi-Res PWMs, and either 6-bit or 9-bit
values for the normal PWMs. The current implementation loops through
all possible resolutions (PWM sizes) for the PWM subtype, on top of
the already existing process of determining the prediv, exponent and
refclk.
The first and second issues are related to capping the computed PWM
value.
The third issue is that it uses the wrong maximum possible PWM
value for determining the best matched period.
Fix all of them.
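As a rough illustration of what "capping the computed PWM value" means,
the sketch below derives the largest value an n-bit resolution can hold
and clamps a computed value to it. The function names are hypothetical
and are not taken from leds-qcom-lpg.c; only the arithmetic is the point.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value representable at a given PWM resolution (in bits). */
static uint32_t pwm_max_value(unsigned int resolution_bits)
{
	assert(resolution_bits > 0 && resolution_bits < 32);
	return (UINT32_C(1) << resolution_bits) - 1;
}

/* Clamp a computed PWM value to what the chosen resolution can hold. */
static uint32_t pwm_cap(uint32_t value, unsigned int resolution_bits)
{
	uint32_t max = pwm_max_value(resolution_bits);

	return value > max ? max : value;
}
```

With these helpers a 9-bit PWM tops out at 511 and a 15-bit Hi-Res PWM
at 32767, so capping against the wrong resolution either truncates the
usable range or lets out-of-range values through.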
Signed-off-by: Abel Vesa <abel.vesa(a)linaro.org>
---
Changes in v4:
- Rebased on next-20250305
- Re-worded the commit messages for the first two patches to include an
example that should better explain what the issue that is being fixed
is. As per Uwe's request.
- Link to v3: https://lore.kernel.org/r/20250303-leds-qcom-lpg-fix-max-pwm-on-hi-res-v3-0…
Changes in v3:
- Added a new patch that fixes the normal PWMs, since they now support
6-bit resolution as well. Added it as first patch.
- Re-worded the second patch. Included Bjorn's suggestion and R-b tag.
- Link to v2: https://lore.kernel.org/r/20250226-leds-qcom-lpg-fix-max-pwm-on-hi-res-v2-0…
Changes in v2:
- Re-worded the commit to drop the details that are not important
w.r.t. what the patch is fixing.
- Added another patch which fixes the resolution used for determining
best matched period and PWM config.
- Link to v1: https://lore.kernel.org/r/20250220-leds-qcom-lpg-fix-max-pwm-on-hi-res-v1-1…
---
Abel Vesa (3):
leds: rgb: leds-qcom-lpg: Fix pwm resolution max for normal PWMs
leds: rgb: leds-qcom-lpg: Fix pwm resolution max for Hi-Res PWMs
leds: rgb: leds-qcom-lpg: Fix calculation of best period Hi-Res PWMs
drivers/leds/rgb/leds-qcom-lpg.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
---
base-commit: 7ec162622e66a4ff886f8f28712ea1b13069e1aa
change-id: 20250220-leds-qcom-lpg-fix-max-pwm-on-hi-res-067e8782a79b
Best regards,
--
Abel Vesa <abel.vesa(a)linaro.org>
The patch below does not apply to the 6.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.13.y
git checkout FETCH_HEAD
git cherry-pick -x 3e385c0d6ce88ac9916dcf84267bd5855d830748
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030957-magnetism-lustily-55d9@gregkh' --subject-prefix 'PATCH 6.13.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3e385c0d6ce88ac9916dcf84267bd5855d830748 Mon Sep 17 00:00:00 2001
From: Alexey Kardashevskiy <aik(a)amd.com>
Date: Fri, 7 Mar 2025 12:37:00 +1100
Subject: [PATCH] virt: sev-guest: Move SNP Guest Request data pages handling
under snp_cmd_mutex
Compared to the SNP Guest Request, the "Extended" version adds data pages for
receiving certificates. If not enough pages are provided, the HV can report to
the VM how many are needed so the VM can reallocate and repeat.
Commit
ae596615d93d ("virt: sev-guest: Reduce the scope of SNP command mutex")
moved handling of the allocated/desired pages number out of scope of said
mutex and created a possibility for a race (multiple instances trying to
trigger Extended request in a VM) as there is just one instance of
snp_msg_desc per /dev/sev-guest and no locking other than snp_cmd_mutex.
Fix the issue by moving the data blob/size and the GHCB input struct
(snp_req_data) into snp_guest_req which is allocated on stack now and accessed
by the GHCB caller under that mutex.
Stop allocating SEV_FW_BLOB_MAX_SIZE in snp_msg_alloc() as only one of four
callers needs it. Free the received blob in get_ext_report() right after it is
copied to the userspace. Possible future users of snp_send_guest_request() are
likely to have different ideas about the buffer size anyways.
Fixes: ae596615d93d ("virt: sev-guest: Reduce the scope of SNP command mutex")
Signed-off-by: Alexey Kardashevskiy <aik(a)amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj(a)amd.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20250307013700.437505-3-aik@amd.com
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index 82492efc5d94..96c7bc698e6b 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -2853,19 +2853,8 @@ struct snp_msg_desc *snp_msg_alloc(void)
if (!mdesc->response)
goto e_free_request;
- mdesc->certs_data = alloc_shared_pages(SEV_FW_BLOB_MAX_SIZE);
- if (!mdesc->certs_data)
- goto e_free_response;
-
- /* initial the input address for guest request */
- mdesc->input.req_gpa = __pa(mdesc->request);
- mdesc->input.resp_gpa = __pa(mdesc->response);
- mdesc->input.data_gpa = __pa(mdesc->certs_data);
-
return mdesc;
-e_free_response:
- free_shared_pages(mdesc->response, sizeof(struct snp_guest_msg));
e_free_request:
free_shared_pages(mdesc->request, sizeof(struct snp_guest_msg));
e_unmap:
@@ -2885,7 +2874,6 @@ void snp_msg_free(struct snp_msg_desc *mdesc)
kfree(mdesc->ctx);
free_shared_pages(mdesc->response, sizeof(struct snp_guest_msg));
free_shared_pages(mdesc->request, sizeof(struct snp_guest_msg));
- free_shared_pages(mdesc->certs_data, SEV_FW_BLOB_MAX_SIZE);
iounmap((__force void __iomem *)mdesc->secrets);
memset(mdesc, 0, sizeof(*mdesc));
@@ -3054,7 +3042,7 @@ static int __handle_guest_request(struct snp_msg_desc *mdesc, struct snp_guest_r
* sequence number must be incremented or the VMPCK must be deleted to
* prevent reuse of the IV.
*/
- rc = snp_issue_guest_request(req, &mdesc->input, rio);
+ rc = snp_issue_guest_request(req, &req->input, rio);
switch (rc) {
case -ENOSPC:
/*
@@ -3064,7 +3052,7 @@ static int __handle_guest_request(struct snp_msg_desc *mdesc, struct snp_guest_r
* order to increment the sequence number and thus avoid
* IV reuse.
*/
- override_npages = mdesc->input.data_npages;
+ override_npages = req->input.data_npages;
req->exit_code = SVM_VMGEXIT_GUEST_REQUEST;
/*
@@ -3120,7 +3108,7 @@ static int __handle_guest_request(struct snp_msg_desc *mdesc, struct snp_guest_r
}
if (override_npages)
- mdesc->input.data_npages = override_npages;
+ req->input.data_npages = override_npages;
return rc;
}
@@ -3158,6 +3146,11 @@ int snp_send_guest_request(struct snp_msg_desc *mdesc, struct snp_guest_req *req
*/
memcpy(mdesc->request, &mdesc->secret_request, sizeof(mdesc->secret_request));
+ /* Initialize the input address for guest request */
+ req->input.req_gpa = __pa(mdesc->request);
+ req->input.resp_gpa = __pa(mdesc->response);
+ req->input.data_gpa = req->certs_data ? __pa(req->certs_data) : 0;
+
rc = __handle_guest_request(mdesc, req, rio);
if (rc) {
if (rc == -EIO &&
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 1581246491b5..ba7999f66abe 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -203,6 +203,9 @@ struct snp_guest_req {
unsigned int vmpck_id;
u8 msg_version;
u8 msg_type;
+
+ struct snp_req_data input;
+ void *certs_data;
};
/*
@@ -263,9 +266,6 @@ struct snp_msg_desc {
struct snp_guest_msg secret_request, secret_response;
struct snp_secrets_page *secrets;
- struct snp_req_data input;
-
- void *certs_data;
struct aesgcm_ctx *ctx;
diff --git a/drivers/virt/coco/sev-guest/sev-guest.c b/drivers/virt/coco/sev-guest/sev-guest.c
index 23ac177472be..70fbc9a3e703 100644
--- a/drivers/virt/coco/sev-guest/sev-guest.c
+++ b/drivers/virt/coco/sev-guest/sev-guest.c
@@ -176,6 +176,7 @@ static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_guest_reques
struct snp_guest_req req = {};
int ret, npages = 0, resp_len;
sockptr_t certs_address;
+ struct page *page;
if (sockptr_is_null(io->req_data) || sockptr_is_null(io->resp_data))
return -EINVAL;
@@ -209,8 +210,20 @@ static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_guest_reques
* the host. If host does not supply any certs in it, then copy
* zeros to indicate that certificate data was not provided.
*/
- memset(mdesc->certs_data, 0, report_req->certs_len);
npages = report_req->certs_len >> PAGE_SHIFT;
+ page = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO,
+ get_order(report_req->certs_len));
+ if (!page)
+ return -ENOMEM;
+
+ req.certs_data = page_address(page);
+ ret = set_memory_decrypted((unsigned long)req.certs_data, npages);
+ if (ret) {
+ pr_err("failed to mark page shared, ret=%d\n", ret);
+ __free_pages(page, get_order(report_req->certs_len));
+ return -EFAULT;
+ }
+
cmd:
/*
* The intermediate response buffer is used while decrypting the
@@ -219,10 +232,12 @@ static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_guest_reques
*/
resp_len = sizeof(report_resp->data) + mdesc->ctx->authsize;
report_resp = kzalloc(resp_len, GFP_KERNEL_ACCOUNT);
- if (!report_resp)
- return -ENOMEM;
+ if (!report_resp) {
+ ret = -ENOMEM;
+ goto e_free_data;
+ }
- mdesc->input.data_npages = npages;
+ req.input.data_npages = npages;
req.msg_version = arg->msg_version;
req.msg_type = SNP_MSG_REPORT_REQ;
@@ -237,7 +252,7 @@ static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_guest_reques
/* If certs length is invalid then copy the returned length */
if (arg->vmm_error == SNP_GUEST_VMM_ERR_INVALID_LEN) {
- report_req->certs_len = mdesc->input.data_npages << PAGE_SHIFT;
+ report_req->certs_len = req.input.data_npages << PAGE_SHIFT;
if (copy_to_sockptr(io->req_data, report_req, sizeof(*report_req)))
ret = -EFAULT;
@@ -246,7 +261,7 @@ static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_guest_reques
if (ret)
goto e_free;
- if (npages && copy_to_sockptr(certs_address, mdesc->certs_data, report_req->certs_len)) {
+ if (npages && copy_to_sockptr(certs_address, req.certs_data, report_req->certs_len)) {
ret = -EFAULT;
goto e_free;
}
@@ -256,6 +271,13 @@ static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_guest_reques
e_free:
kfree(report_resp);
+e_free_data:
+ if (npages) {
+ if (set_memory_encrypted((unsigned long)req.certs_data, npages))
+ WARN_ONCE(ret, "failed to restore encryption mask (leak it)\n");
+ else
+ __free_pages(page, get_order(report_req->certs_len));
+ }
return ret;
}
From: Sean Christopherson <seanjc(a)google.com>
commit a8de7f100bb5989d9c3627d3a223ee1c863f3b69 upstream.
Advertise support for Hyper-V's SEND_IPI and SEND_IPI_EX hypercalls if and
only if the local APIC is emulated/virtualized by KVM, and explicitly reject
said hypercalls if the local APIC is emulated in userspace, i.e. don't rely
on userspace to opt-in to KVM_CAP_HYPERV_ENFORCE_CPUID.
Rejecting SEND_IPI and SEND_IPI_EX fixes a NULL-pointer dereference if
Hyper-V enlightenments are exposed to the guest without an in-kernel local
APIC:
dump_stack+0xbe/0xfd
__kasan_report.cold+0x34/0x84
kasan_report+0x3a/0x50
__apic_accept_irq+0x3a/0x5c0
kvm_hv_send_ipi.isra.0+0x34e/0x820
kvm_hv_hypercall+0x8d9/0x9d0
kvm_emulate_hypercall+0x506/0x7e0
__vmx_handle_exit+0x283/0xb60
vmx_handle_exit+0x1d/0xd0
vcpu_enter_guest+0x16b0/0x24c0
vcpu_run+0xc0/0x550
kvm_arch_vcpu_ioctl_run+0x170/0x6d0
kvm_vcpu_ioctl+0x413/0xb20
__se_sys_ioctl+0x111/0x160
do_syscall_64+0x30/0x40
entry_SYSCALL_64_after_hwframe+0x67/0xd1
Note, checking the sending vCPU is sufficient, as the per-VM irqchip_mode
can't be modified after vCPUs are created, i.e. if one vCPU has an
in-kernel local APIC, then all vCPUs have an in-kernel local APIC.
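The shape of the guard can be sketched in plain C. Everything below is a
hypothetical userspace stand-in (the fake_* types and status values are
assumptions for illustration, not KVM's definitions); it only shows why
checking the sender before touching any APIC state avoids the NULL
dereference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real KVM structures look different. */
struct fake_apic {
	unsigned int pending;
};

struct fake_vcpu {
	struct fake_apic *apic;  /* NULL when the APIC is emulated in userspace */
};

enum fake_status {
	FAKE_STATUS_SUCCESS = 0,
	FAKE_STATUS_INVALID_INPUT = 1,
};

static enum fake_status fake_send_ipi(struct fake_vcpu *sender,
				      struct fake_vcpu *target,
				      unsigned int vector)
{
	assert(sender != NULL && target != NULL);

	/* Reject up front: without an in-kernel APIC there is nothing to deliver to. */
	if (!sender->apic)
		return FAKE_STATUS_INVALID_INPUT;

	/*
	 * Assumption mirrored from the note above: the mode is per-VM, so a
	 * sender with an in-kernel APIC implies the target has one as well.
	 */
	target->apic->pending = vector;
	return FAKE_STATUS_SUCCESS;
}
```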
Reported-by: Dongjie Zou <zoudongjie(a)huawei.com>
Fixes: 214ff83d4473 ("KVM: x86: hyperv: implement PV IPI send hypercalls")
Fixes: 2bc39970e932 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID")
Cc: stable(a)vger.kernel.org
Reviewed-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Link: https://lore.kernel.org/r/20250118003454.2619573-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Conflict due to
72167a9d7da2 ("KVM: x86: hyper-v: Stop shadowing global 'current_vcpu'
variable")
not in the tree]
Signed-off-by: Abdelkareem Abdelsaamad <kareemem(a)amazon.com>
---
arch/x86/kvm/hyperv.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 20eb8f55e1f1..e097faf12c82 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1618,6 +1618,9 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *current_vcpu, u64 ingpa, u64 outgpa,
u32 vector;
bool all_cpus;
+ if (!lapic_in_kernel(current_vcpu))
+ return HV_STATUS_INVALID_HYPERCALL_INPUT;
+
if (!ex) {
if (!fast) {
if (unlikely(kvm_read_guest(kvm, ingpa, &send_ipi,
@@ -2060,7 +2063,8 @@ int kvm_vcpu_ioctl_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->eax |= HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
ent->eax |= HV_X64_APIC_ACCESS_RECOMMENDED;
ent->eax |= HV_X64_RELAXED_TIMING_RECOMMENDED;
- ent->eax |= HV_X64_CLUSTER_IPI_RECOMMENDED;
+ if (!vcpu || lapic_in_kernel(vcpu))
+ ent->eax |= HV_X64_CLUSTER_IPI_RECOMMENDED;
ent->eax |= HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED;
if (evmcs_ver)
ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
--
2.47.1
From: Baoquan He <bhe(a)redhat.com>
commit b3e34a47f98974d0844444c5121aaff123004e57 upstream.
This is reported by kmemleak detector:
unreferenced object 0xffffc900002a9000 (size 4096):
comm "kexec", pid 14950, jiffies 4295110793 (age 373.951s)
hex dump (first 32 bytes):
7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 .ELF............
04 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00 ..>.............
backtrace:
[<0000000016a8ef9f>] __vmalloc_node_range+0x101/0x170
[<000000002b66b6c0>] __vmalloc_node+0xb4/0x160
[<00000000ad40107d>] crash_prepare_elf64_headers+0x8e/0xcd0
[<0000000019afff23>] crash_load_segments+0x260/0x470
[<0000000019ebe95c>] bzImage64_load+0x814/0xad0
[<0000000093e16b05>] arch_kexec_kernel_image_load+0x1be/0x2a0
[<000000009ef2fc88>] kimage_file_alloc_init+0x2ec/0x5a0
[<0000000038f5a97a>] __do_sys_kexec_file_load+0x28d/0x530
[<0000000087c19992>] do_syscall_64+0x3b/0x90
[<0000000066e063a4>] entry_SYSCALL_64_after_hwframe+0x44/0xae
In crash_prepare_elf64_headers(), a buffer is allocated via vmalloc() to
store elf headers. However, it is not freed back to the system when the
kdump kernel is reloaded or unloaded, which causes a memory leak. Fix it
by introducing x86 specific function arch_kimage_file_post_load_cleanup(),
and freeing the buffer there.
And also remove the incorrect elf header buffer freeing code. Before
calling arch specific kexec_file loading function, the image instance has
been initialized. So 'image->elf_headers' must be NULL. It doesn't make
sense to free the elf header buffer there.
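The allocate-on-load, free-in-cleanup pattern described above can be
sketched in userspace C. The struct and function names here are
hypothetical, and calloc()/free() stand in for vmalloc()/vfree(); this
is an illustration of the ownership rule, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the image state; not the kernel struct. */
struct fake_image {
	void *elf_headers;
	size_t elf_headers_sz;
};

/* Allocate the header buffer on load. */
static int fake_load(struct fake_image *image, size_t sz)
{
	assert(image != NULL);

	image->elf_headers = calloc(1, sz);
	if (!image->elf_headers)
		return -1;
	image->elf_headers_sz = sz;
	return 0;
}

/* Cleanup hook: release the buffer and reset the fields for a later reload. */
static void fake_post_load_cleanup(struct fake_image *image)
{
	free(image->elf_headers);  /* free(NULL) is a no-op, so a second call is safe */
	image->elf_headers = NULL;
	image->elf_headers_sz = 0;
}
```

Resetting the pointer and size in the same place that frees the buffer
keeps a repeated cleanup harmless and gives the buffer a single owner.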
Three different people have reported three bugs about the memory leak on
x86_64 inside Redhat.
Link: https://lkml.kernel.org/r/20220223113225.63106-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Acked-by: Dave Young <dyoung(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Conflict due to
179350f00e06 ("x86: Use ELF fields defined in 'struct kimage'")
not in the tree]
Signed-off-by: Abdelkareem Abdelsaamad <kareemem(a)amazon.com>
---
arch/x86/kernel/machine_kexec_64.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index a29a44a98e5b..19f6aafd595a 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -402,9 +402,6 @@ void machine_kexec(struct kimage *image)
#ifdef CONFIG_KEXEC_FILE
void *arch_kexec_kernel_image_load(struct kimage *image)
{
- vfree(image->arch.elf_headers);
- image->arch.elf_headers = NULL;
-
if (!image->fops || !image->fops->load)
return ERR_PTR(-ENOEXEC);
@@ -540,6 +537,15 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
(int)ELF64_R_TYPE(rel[i].r_info), value);
return -ENOEXEC;
}
+
+int arch_kimage_file_post_load_cleanup(struct kimage *image)
+{
+ vfree(image->arch.elf_headers);
+ image->arch.elf_headers = NULL;
+ image->arch.elf_headers_sz = 0;
+
+ return kexec_image_post_load_cleanup_default(image);
+}
#endif /* CONFIG_KEXEC_FILE */
static int
--
2.47.1
This series introduces a new metadata format for UVC cameras and adds a
couple of improvements to the UVC metadata handling.
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Changes in v3:
- Fix doc syntax errors.
- Link to v2: https://lore.kernel.org/r/20250306-uvc-metadata-v2-0-7e939857cad5@chromium.…
Changes in v2:
- Add metadata invalid fix
- Move doc note to a separate patch
- Introduce V4L2_META_FMT_UVC_CUSTOM (thanks HdG!).
- Link to v1: https://lore.kernel.org/r/20250226-uvc-metadata-v1-1-6cd6fe5ec2cb@chromium.…
---
Ricardo Ribalda (3):
media: uvcvideo: Do not mark valid metadata as invalid
media: Documentation: Add note about UVCH length field
media: uvcvideo: Introduce V4L2_META_FMT_UVC_CUSTOM
.../userspace-api/media/v4l/meta-formats.rst | 1 +
.../userspace-api/media/v4l/metafmt-uvc-custom.rst | 31 +++++++++++++++++
.../userspace-api/media/v4l/metafmt-uvc.rst | 4 ++-
MAINTAINERS | 1 +
drivers/media/usb/uvc/uvc_metadata.c | 40 ++++++++++++++++++----
drivers/media/usb/uvc/uvc_video.c | 12 +++----
drivers/media/v4l2-core/v4l2-ioctl.c | 1 +
include/uapi/linux/videodev2.h | 1 +
8 files changed, 78 insertions(+), 13 deletions(-)
---
base-commit: 36cef585e2a31e4ddf33a004b0584a7a572246de
change-id: 20250226-uvc-metadata-2e7e445966de
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
Due to asynchronous driver probing, the dummy regulator may not have
been probed yet when it is first accessed.
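The deferral described above can be sketched in user space. This is only a model under stated assumptions: struct rdev_model, dummy_rdev_model and resolve_dummy_supply_model() are hypothetical stand-ins for the regulator core, and MODEL_EPROBE_DEFER copies the kernel-internal value 517 because user-space errno.h does not define EPROBE_DEFER.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel-internal errno value; user-space <errno.h> does not define it. */
#define MODEL_EPROBE_DEFER 517

/* Hypothetical stand-in for struct regulator_dev. */
struct rdev_model {
	int refcount;
};

/* Stays NULL until the modeled dummy regulator probe has run. */
static struct rdev_model *dummy_rdev_model;

/* Check for NULL before taking a reference, as the patched lookup does. */
static int resolve_dummy_supply_model(struct rdev_model **out)
{
	struct rdev_model *r = dummy_rdev_model;

	assert(out != NULL);
	if (!r)
		return -MODEL_EPROBE_DEFER;	/* caller is expected to retry later */

	r->refcount++;				/* models get_device(&r->dev) */
	*out = r;
	return 0;
}
```

Calling resolve_dummy_supply_model() before dummy_rdev_model is set returns -517 and leaves the output pointer untouched; once the pointer is set, the same call succeeds and takes one reference.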
Cc: stable(a)vger.kernel.org
Signed-off-by: Christian Eggers <ceggers(a)arri.de>
---
v2:
- return -EPROBE_DEFER rather than using BUG_ON()
v3:
- move dev_warn() below returning -EPROBE_DEFER
drivers/regulator/core.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 4ddf0efead68..4d0f13899e6b 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -2069,6 +2069,10 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
if (have_full_constraints()) {
r = dummy_regulator_rdev;
+ if (!r) {
+ ret = -EPROBE_DEFER;
+ goto out;
+ }
get_device(&r->dev);
} else {
dev_err(dev, "Failed to resolve %s-supply for %s\n",
@@ -2086,6 +2090,10 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
goto out;
}
r = dummy_regulator_rdev;
+ if (!r) {
+ ret = -EPROBE_DEFER;
+ goto out;
+ }
get_device(&r->dev);
}
@@ -2211,8 +2219,10 @@ struct regulator *_regulator_get_common(struct regulator_dev *rdev, struct devic
* enabled, even if it isn't hooked up, and just
* provide a dummy.
*/
- dev_warn(dev, "supply %s not found, using dummy regulator\n", id);
rdev = dummy_regulator_rdev;
+ if (!rdev)
+ return ERR_PTR(-EPROBE_DEFER);
+ dev_warn(dev, "supply %s not found, using dummy regulator\n", id);
get_device(&rdev->dev);
break;
--
2.44.1
The vfio_ap_mdev_request function in drivers/s390/crypto/vfio_ap_ops.c
accesses fields of an ap_matrix_mdev object without ensuring that the
object is accessed by only one thread at a time. This patch adds the lock
necessary to secure access to the ap_matrix_mdev object.
Fixes: 2e3d8d71e285 ("s390/vfio-ap: wire in the vfio_device_ops request callback")
Signed-off-by: Anthony Krowiak <akrowiak(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
---
drivers/s390/crypto/vfio_ap_ops.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index a52c2690933f..a2784d3357d9 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -2045,6 +2045,7 @@ static void vfio_ap_mdev_request(struct vfio_device *vdev, unsigned int count)
struct ap_matrix_mdev *matrix_mdev;
matrix_mdev = container_of(vdev, struct ap_matrix_mdev, vdev);
+ mutex_lock(&matrix_dev->mdevs_lock);
if (matrix_mdev->req_trigger) {
if (!(count % 10))
@@ -2057,6 +2058,8 @@ static void vfio_ap_mdev_request(struct vfio_device *vdev, unsigned int count)
dev_notice(dev,
"No device request registered, blocked until released by user\n");
}
+
+ mutex_unlock(&matrix_dev->mdevs_lock);
}
static int vfio_ap_mdev_get_device_info(unsigned long arg)
--
2.47.1
Hi Greg, Sasha,
Please consider this series for 6.12.y. It should apply cleanly on top
of v6.12.18.
These are the patches to backport the `alloc` series for Rust, which
will be useful for Rust Android Binder and others. It also means that,
with this applied, we will not rely on the standard library `alloc` (and
the unstable `cfg` option we used) anymore in any stable kernel that
supports several Rust versions, so e.g. upstream Rust could consider
removing that `cfg` if they needed.
The entire series of cherry-picks apply almost cleanly (only 2 trivial
conflicts) -- to achieve that, I included the `#[expect]` support, which
will make future backports that use that feature easier anyway. That
series also enabled some Clippy warnings. We could reduce the series,
but the end result is warning-free and Clippy is opt-in anyway.
Out-of-tree code could, of course, see some warnings if they use it.
I also included a bunch of Clippy warnings cleanups for the DRM QR Code
to have this series clean up to Rust 1.85.0 (the latest stable), but
I could send them separately if needed.
Finally, I included the "Custom FFI" series backport, which in turn
solves the arm64 + Rust 1.85.0 + `CONFIG_RUST_FW_LOADER_ABSTRACTIONS=y`
issue. It will also make future patches easier to backport, since we
will have the same `ffi::` types.
I tested that the entire series builds between every commit for x86_64
`LLVM=1` with the latest stable and minimum supported Rust compiler
versions. I also ran my usual stable kernel tests on the end result;
that is, boot-tested in QEMU for several architectures etc. In v6.12.18
for loongarch there is an unrelated error that was not there in v6.12.17
when I did a previous test run -- reported separately.
Things could still break, so extra tests on the next -rc from users in
Cc here would be welcome -- thanks!
Cheers,
Miguel
Asahi Lina (1):
rust: alloc: Fix `ArrayLayout` allocations
Benno Lossin (1):
rust: alloc: introduce `ArrayLayout`
Danilo Krummrich (28):
rust: alloc: add `Allocator` trait
rust: alloc: separate `aligned_size` from `krealloc_aligned`
rust: alloc: rename `KernelAllocator` to `Kmalloc`
rust: alloc: implement `ReallocFunc`
rust: alloc: make `allocator` module public
rust: alloc: implement `Allocator` for `Kmalloc`
rust: alloc: add module `allocator_test`
rust: alloc: implement `Vmalloc` allocator
rust: alloc: implement `KVmalloc` allocator
rust: alloc: add __GFP_NOWARN to `Flags`
rust: alloc: implement kernel `Box`
rust: treewide: switch to our kernel `Box` type
rust: alloc: remove extension of std's `Box`
rust: alloc: add `Box` to prelude
rust: alloc: implement kernel `Vec` type
rust: alloc: implement `IntoIterator` for `Vec`
rust: alloc: implement `collect` for `IntoIter`
rust: treewide: switch to the kernel `Vec` type
rust: alloc: remove `VecExt` extension
rust: alloc: add `Vec` to prelude
rust: error: use `core::alloc::LayoutError`
rust: error: check for config `test` in `Error::name`
rust: alloc: implement `contains` for `Flags`
rust: alloc: implement `Cmalloc` in module allocator_test
rust: str: test: replace `alloc::format`
rust: alloc: update module comment of alloc.rs
kbuild: rust: remove the `alloc` crate and `GlobalAlloc`
MAINTAINERS: add entry for the Rust `alloc` module
Ethan D. Twardy (1):
rust: kbuild: expand rusttest target for macros
Filipe Xavier (2):
rust: error: make conversion functions public
rust: error: optimize error type to use nonzero
Gary Guo (3):
rust: fix size_t in bindgen prototypes of C builtins
rust: map `__kernel_size_t` and friends also to usize/isize
rust: use custom FFI integer types
Miguel Ojeda (17):
rust: workqueue: remove unneeded ``#[allow(clippy::new_ret_no_self)]`
rust: sort global Rust flags
rust: types: avoid repetition in `{As,From}Bytes` impls
rust: enable `clippy::undocumented_unsafe_blocks` lint
rust: enable `clippy::unnecessary_safety_comment` lint
rust: enable `clippy::unnecessary_safety_doc` lint
rust: enable `clippy::ignored_unit_patterns` lint
rust: enable `rustdoc::unescaped_backticks` lint
rust: init: remove unneeded `#[allow(clippy::disallowed_names)]`
rust: sync: remove unneeded
`#[allow(clippy::non_send_fields_in_send_ty)]`
rust: introduce `.clippy.toml`
rust: replace `clippy::dbg_macro` with `disallowed_macros`
rust: provide proper code documentation titles
rust: enable Clippy's `check-private-items`
Documentation: rust: add coding guidelines on lints
rust: start using the `#[expect(...)]` attribute
Documentation: rust: discuss `#[expect(...)]` in the guidelines
Thomas Böhler (7):
drm/panic: avoid reimplementing Iterator::find
drm/panic: remove unnecessary borrow in alignment_pattern
drm/panic: prefer eliding lifetimes
drm/panic: remove redundant field when assigning value
drm/panic: correctly indent continuation of line in list item
drm/panic: allow verbose boolean for clarity
drm/panic: allow verbose version check
.clippy.toml | 9 +
.gitignore | 1 +
Documentation/rust/coding-guidelines.rst | 148 ++++
MAINTAINERS | 8 +
Makefile | 15 +-
drivers/block/rnull.rs | 4 +-
drivers/gpu/drm/drm_panic_qr.rs | 23 +-
mm/kasan/kasan_test_rust.rs | 3 +-
rust/Makefile | 92 +--
rust/bindgen_parameters | 5 +
rust/bindings/bindings_helper.h | 1 +
rust/bindings/lib.rs | 6 +
rust/exports.c | 1 -
rust/ffi.rs | 13 +
rust/helpers/helpers.c | 1 +
rust/helpers/slab.c | 6 +
rust/helpers/vmalloc.c | 9 +
rust/kernel/alloc.rs | 150 +++-
rust/kernel/alloc/allocator.rs | 208 ++++--
rust/kernel/alloc/allocator_test.rs | 95 +++
rust/kernel/alloc/box_ext.rs | 89 ---
rust/kernel/alloc/kbox.rs | 456 +++++++++++
rust/kernel/alloc/kvec.rs | 913 +++++++++++++++++++++++
rust/kernel/alloc/layout.rs | 91 +++
rust/kernel/alloc/vec_ext.rs | 185 -----
rust/kernel/block/mq/operations.rs | 18 +-
rust/kernel/block/mq/raw_writer.rs | 2 +-
rust/kernel/block/mq/tag_set.rs | 2 +-
rust/kernel/error.rs | 79 +-
rust/kernel/init.rs | 127 ++--
rust/kernel/init/__internal.rs | 13 +-
rust/kernel/init/macros.rs | 18 +-
rust/kernel/ioctl.rs | 2 +-
rust/kernel/lib.rs | 5 +-
rust/kernel/list.rs | 1 +
rust/kernel/list/arc_field.rs | 2 +-
rust/kernel/net/phy.rs | 16 +-
rust/kernel/prelude.rs | 5 +-
rust/kernel/print.rs | 5 +-
rust/kernel/rbtree.rs | 49 +-
rust/kernel/std_vendor.rs | 12 +-
rust/kernel/str.rs | 46 +-
rust/kernel/sync/arc.rs | 25 +-
rust/kernel/sync/arc/std_vendor.rs | 2 +
rust/kernel/sync/condvar.rs | 7 +-
rust/kernel/sync/lock.rs | 8 +-
rust/kernel/sync/lock/mutex.rs | 4 +-
rust/kernel/sync/lock/spinlock.rs | 4 +-
rust/kernel/sync/locked_by.rs | 2 +-
rust/kernel/task.rs | 8 +-
rust/kernel/time.rs | 4 +-
rust/kernel/types.rs | 140 ++--
rust/kernel/uaccess.rs | 23 +-
rust/kernel/workqueue.rs | 29 +-
rust/macros/lib.rs | 14 +-
rust/macros/module.rs | 8 +-
rust/uapi/lib.rs | 6 +
samples/rust/rust_minimal.rs | 4 +-
samples/rust/rust_print.rs | 1 +
scripts/Makefile.build | 4 +-
scripts/generate_rust_analyzer.py | 11 +-
61 files changed, 2482 insertions(+), 756 deletions(-)
create mode 100644 .clippy.toml
create mode 100644 rust/ffi.rs
create mode 100644 rust/helpers/vmalloc.c
create mode 100644 rust/kernel/alloc/allocator_test.rs
delete mode 100644 rust/kernel/alloc/box_ext.rs
create mode 100644 rust/kernel/alloc/kbox.rs
create mode 100644 rust/kernel/alloc/kvec.rs
create mode 100644 rust/kernel/alloc/layout.rs
delete mode 100644 rust/kernel/alloc/vec_ext.rs
--
2.48.1
Sometimes I get a NULL pointer dereference at boot time in kobject_get()
with the following call stack:
anatop_regulator_probe()
devm_regulator_register()
regulator_register()
regulator_resolve_supply()
kobject_get()
By placing some extra BUG_ON() statements I could verify that this is
raised because probing of the 'dummy' regulator driver is not completed
('dummy_regulator_rdev' is still NULL).
In the JTAG debugger I can see that dummy_regulator_probe() and
anatop_regulator_probe() can be run by different kernel threads
(kworker/u4:*). I haven't further investigated whether this can be
changed or if there are other possibilities to force synchronization
between these two probe routines. On the other hand I don't expect much
boot time penalty by probing the 'dummy' regulator synchronously.
Cc: stable(a)vger.kernel.org
Fixes: 259b93b21a9f ("regulator: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in 4.14")
Signed-off-by: Christian Eggers <ceggers(a)arri.de>
---
v2:
- no changes
v3:
- no changes
drivers/regulator/dummy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/regulator/dummy.c b/drivers/regulator/dummy.c
index 5b9b9e4e762d..9f59889129ab 100644
--- a/drivers/regulator/dummy.c
+++ b/drivers/regulator/dummy.c
@@ -60,7 +60,7 @@ static struct platform_driver dummy_regulator_driver = {
.probe = dummy_regulator_probe,
.driver = {
.name = "reg-dummy",
- .probe_type = PROBE_PREFER_ASYNCHRONOUS,
+ .probe_type = PROBE_FORCE_SYNCHRONOUS,
},
};
--
2.44.1
If device_register() fails, we should call put_device() to drop the
reference count for cleanup. Otherwise, memory could leak.
As the comment for device_register() says, 'NOTE: _Never_ directly free
@dev after calling this function, even if it returned an error! Always
use put_device() to give up the reference initialized in this function
instead.'
Found by code review.
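The rule quoted above can be illustrated with a small user-space model. Everything here is a hypothetical sketch (struct dev_model and the dev_model_*() helpers are not kernel APIs); it only assumes that registration initializes one reference even when it fails, and that dropping the last reference runs the release step.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct device with a release flag. */
struct dev_model {
	int refcount;
	int released;
};

/* Drop one reference; the last put "releases" the object. */
static void dev_model_put(struct dev_model *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		d->released = 1;
}

/* Models device_register(): the reference is initialized even on failure. */
static int dev_model_register(struct dev_model *d, int simulate_failure)
{
	d->refcount = 1;
	d->released = 0;
	return simulate_failure ? -ENOMEM : 0;
}

/* The pattern the patch applies: put the reference on the error path. */
static int register_with_cleanup(struct dev_model *d, int simulate_failure)
{
	int ret = dev_model_register(d, simulate_failure);

	if (ret)
		dev_model_put(d);
	return ret;
}
```

In this model a failed register_with_cleanup() leaves refcount at 0 with released set, while a failed dev_model_register() on its own leaves refcount at 1 and the object never released, which is the leak the patch describes.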
Cc: stable(a)vger.kernel.org
Fixes: baa057e29b58 ("media: v4l2-dev: use pr_foo() for printing messages")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/media/v4l2-core/v4l2-dev.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/media/v4l2-core/v4l2-dev.c b/drivers/media/v4l2-core/v4l2-dev.c
index 5bcaeeba4d09..1619614e96bf 100644
--- a/drivers/media/v4l2-core/v4l2-dev.c
+++ b/drivers/media/v4l2-core/v4l2-dev.c
@@ -1060,6 +1060,7 @@ int __video_register_device(struct video_device *vdev,
if (ret < 0) {
mutex_unlock(&videodev_lock);
pr_err("%s: device_register failed\n", __func__);
+ put_device(&vdev->dev);
goto cleanup;
}
/* Register the release callback that will be called when the last
--
2.25.1
If device_add() fails, we should call put_device() to drop the
reference count for cleanup. Otherwise, memory could leak.
As the comment for device_add() says, 'if device_add() succeeds, you should
call device_del() when you want to get rid of it. If device_add() has
not succeeded, use only put_device() to drop the reference count'.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 8434aa8b6fe5 ("[SCSI] iscsi: break up session creation into two stages")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/scsi/scsi_transport_iscsi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 9c347c64c315..74333e182612 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -2114,6 +2114,7 @@ int iscsi_add_session(struct iscsi_cls_session *session, unsigned int target_id)
release_dev:
device_del(&session->dev);
release_ida:
+ put_device(&session->dev);
if (session->ida_used)
ida_free(&iscsi_sess_ida, session->target_id);
destroy_wq:
--
2.25.1
If device_add() fails, we should call put_device() to drop the
reference count for cleanup. Otherwise, memory could leak.
As the comment for device_add() says, 'if device_add() succeeds, you should
call device_del() when you want to get rid of it. If device_add() has
not succeeded, use only put_device() to drop the reference count'.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 54506918059a ("Bluetooth: Move SMP initialization after HCI init")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
net/bluetooth/hci_core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index e7ec12437c8b..c03fd16d3c46 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -2641,6 +2641,7 @@ int hci_register_dev(struct hci_dev *hdev)
return id;
err_wqueue:
+ put_device(&hdev->dev);
debugfs_remove_recursive(hdev->debugfs);
destroy_workqueue(hdev->workqueue);
destroy_workqueue(hdev->req_workqueue);
--
2.25.1
If device_add() fails, we should call put_device() to drop the
reference count for cleanup. Otherwise, memory could leak.
As the comment for device_add() says, 'if device_add() succeeds, you should
call device_del() when you want to get rid of it. If device_add() has
not succeeded, use only put_device() to drop the reference count'.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 0cd587735205 ("Input: preallocate memory to hold event values")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/input/input.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/input/input.c b/drivers/input/input.c
index c9e3ac64bcd0..2e70f346dadc 100644
--- a/drivers/input/input.c
+++ b/drivers/input/input.c
@@ -2424,6 +2424,7 @@ int input_register_device(struct input_dev *dev)
err_device_del:
device_del(&dev->dev);
err_devres_free:
+ put_device(&dev->dev);
devres_free(devres);
return error;
}
--
2.25.1
If device_add() fails, we should call put_device() to drop the
reference count for cleanup. Otherwise, memory could leak.
As the comment for device_add() says, 'if device_add() succeeds, you should
call device_del() when you want to get rid of it. If device_add() has
not succeeded, use only put_device() to drop the reference count'.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 8ed633b9baf9 ("Revert "net-sysfs: Fix memory leak in netdev_register_kobject"")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
net/core/net-sysfs.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 07cb99b114bd..f443eacc9237 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -2169,6 +2169,7 @@ int netdev_register_kobject(struct net_device *ndev)
error = device_add(dev);
if (error)
+ put_device(dev);
return error;
error = register_queue_kobjects(ndev);
--
2.25.1
If device_register() fails, we should call put_device() to drop the
reference count for cleanup. Otherwise, memory could leak.
As the comment for device_register() says, 'NOTE: _Never_ directly free
@dev after calling this function, even if it returned an error! Always
use put_device() to give up the reference initialized in this function
instead.'
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: a3d4d6435b56 ("[POWERPC] ps3: add ps3 platform system bus support")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
arch/powerpc/platforms/ps3/system-bus.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/powerpc/platforms/ps3/system-bus.c b/arch/powerpc/platforms/ps3/system-bus.c
index afbaabf182d0..c477d0ee523a 100644
--- a/arch/powerpc/platforms/ps3/system-bus.c
+++ b/arch/powerpc/platforms/ps3/system-bus.c
@@ -769,6 +769,9 @@ int ps3_system_bus_device_register(struct ps3_system_bus_device *dev)
pr_debug("%s:%d add %s\n", __func__, __LINE__, dev_name(&dev->core));
result = device_register(&dev->core);
+ if (result)
+ put_device(&dev->core);
+
return result;
}
--
2.25.1
The patch below does not apply to the 6.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.13.y
git checkout FETCH_HEAD
git cherry-pick -x c50f8e6053b0503375c2975bf47f182445aebb4c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030926-tapestry-rabid-7392@gregkh' --subject-prefix 'PATCH 6.13.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c50f8e6053b0503375c2975bf47f182445aebb4c Mon Sep 17 00:00:00 2001
From: Barry Song <baohua(a)kernel.org>
Date: Wed, 26 Feb 2025 13:14:00 +1300
Subject: [PATCH] mm: fix kernel BUG when userfaultfd_move encounters swapcache
userfaultfd_move() checks whether the PTE entry is present or a
swap entry.
- If the PTE entry is present, move_present_pte() handles folio
migration by setting:
src_folio->index = linear_page_index(dst_vma, dst_addr);
- If the PTE entry is a swap entry, move_swap_pte() simply copies
the PTE to the new dst_addr.
This approach is incorrect because, even if the PTE is a swap entry,
it can still reference a folio that remains in the swap cache.
This creates a race window between steps 2 and 4.
1. add_to_swap: The folio is added to the swapcache.
2. try_to_unmap: PTEs are converted to swap entries.
3. pageout: The folio is written back.
4. Swapcache is cleared.
If userfaultfd_move() occurs in the window between steps 2 and 4,
after the swap PTE has been moved to the destination, accessing the
destination triggers do_swap_page(), which may locate the folio in
the swapcache. However, since the folio's index has not been updated
to match the destination VMA, do_swap_page() will detect a mismatch.
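The index mismatch can be made concrete with a little arithmetic. The sketch below is a user-space model, assuming 4 KiB pages and the usual non-hugetlb formula of page offset within the VMA plus vm_pgoff; struct vma_model, MODEL_PAGE_SHIFT and both helpers are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the VMA fields the calculation needs. */
struct vma_model {
	uint64_t vm_start;
	uint64_t vm_end;
	uint64_t vm_pgoff;
};

/* Page offset of addr inside the VMA, plus the VMA's starting offset. */
static uint64_t linear_page_index_model(const struct vma_model *vma,
					uint64_t addr)
{
	assert(addr >= vma->vm_start && addr < vma->vm_end);
	return ((addr - vma->vm_start) >> MODEL_PAGE_SHIFT) + vma->vm_pgoff;
}

/* The check that fails when a folio keeps its source index after a move. */
static int folio_index_matches(uint64_t folio_index,
			       const struct vma_model *vma, uint64_t addr)
{
	return folio_index == linear_page_index_model(vma, addr);
}
```

With a source VMA at 0x10000 (vm_pgoff 0x10) and a destination VMA at 0x40000 (vm_pgoff 0x40), a folio mapped at 0x12000 has index 0x12, whereas the destination address 0x41000 expects 0x41. The comparison only holds again after the index is recomputed for the destination.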
This can result in two critical issues depending on the system
configuration.
If KSM is disabled, both small and large folios can trigger a BUG
during the add_rmap operation due to:
page_pgoff(folio, page) != linear_page_index(vma, address)
[ 13.336953] page: refcount:6 mapcount:1 mapping:00000000f43db19c index:0xffffaf150 pfn:0x4667c
[ 13.337520] head: order:2 mapcount:1 entire_mapcount:0 nr_pages_mapped:1 pincount:0
[ 13.337716] memcg:ffff00000405f000
[ 13.337849] anon flags: 0x3fffc0000020459(locked|uptodate|dirty|owner_priv_1|head|swapbacked|node=0|zone=0|lastcpupid=0xffff)
[ 13.338630] raw: 03fffc0000020459 ffff80008507b538 ffff80008507b538 ffff000006260361
[ 13.338831] raw: 0000000ffffaf150 0000000000004000 0000000600000000 ffff00000405f000
[ 13.339031] head: 03fffc0000020459 ffff80008507b538 ffff80008507b538 ffff000006260361
[ 13.339204] head: 0000000ffffaf150 0000000000004000 0000000600000000 ffff00000405f000
[ 13.339375] head: 03fffc0000000202 fffffdffc0199f01 ffffffff00000000 0000000000000001
[ 13.339546] head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
[ 13.339736] page dumped because: VM_BUG_ON_PAGE(page_pgoff(folio, page) != linear_page_index(vma, address))
[ 13.340190] ------------[ cut here ]------------
[ 13.340316] kernel BUG at mm/rmap.c:1380!
[ 13.340683] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[ 13.340969] Modules linked in:
[ 13.341257] CPU: 1 UID: 0 PID: 107 Comm: a.out Not tainted 6.14.0-rc3-gcf42737e247a-dirty #299
[ 13.341470] Hardware name: linux,dummy-virt (DT)
[ 13.341671] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 13.341815] pc : __page_check_anon_rmap+0xa0/0xb0
[ 13.341920] lr : __page_check_anon_rmap+0xa0/0xb0
[ 13.342018] sp : ffff80008752bb20
[ 13.342093] x29: ffff80008752bb20 x28: fffffdffc0199f00 x27: 0000000000000001
[ 13.342404] x26: 0000000000000000 x25: 0000000000000001 x24: 0000000000000001
[ 13.342575] x23: 0000ffffaf0d0000 x22: 0000ffffaf0d0000 x21: fffffdffc0199f00
[ 13.342731] x20: fffffdffc0199f00 x19: ffff000006210700 x18: 00000000ffffffff
[ 13.342881] x17: 6c203d2120296567 x16: 6170202c6f696c6f x15: 662866666f67705f
[ 13.343033] x14: 6567617028454741 x13: 2929737365726464 x12: ffff800083728ab0
[ 13.343183] x11: ffff800082996bf8 x10: 0000000000000fd7 x9 : ffff80008011bc40
[ 13.343351] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000829eebf8
[ 13.343498] x5 : c0000000fffff000 x4 : 0000000000000000 x3 : 0000000000000000
[ 13.343645] x2 : 0000000000000000 x1 : ffff0000062db980 x0 : 000000000000005f
[ 13.343876] Call trace:
[ 13.344045] __page_check_anon_rmap+0xa0/0xb0 (P)
[ 13.344234] folio_add_anon_rmap_ptes+0x22c/0x320
[ 13.344333] do_swap_page+0x1060/0x1400
[ 13.344417] __handle_mm_fault+0x61c/0xbc8
[ 13.344504] handle_mm_fault+0xd8/0x2e8
[ 13.344586] do_page_fault+0x20c/0x770
[ 13.344673] do_translation_fault+0xb4/0xf0
[ 13.344759] do_mem_abort+0x48/0xa0
[ 13.344842] el0_da+0x58/0x130
[ 13.344914] el0t_64_sync_handler+0xc4/0x138
[ 13.345002] el0t_64_sync+0x1ac/0x1b0
[ 13.345208] Code: aa1503e0 f000f801 910f6021 97ff5779 (d4210000)
[ 13.345504] ---[ end trace 0000000000000000 ]---
[ 13.345715] note: a.out[107] exited with irqs disabled
[ 13.345954] note: a.out[107] exited with preempt_count 2
If KSM is enabled, Peter Xu also discovered that do_swap_page() may
trigger an unexpected CoW operation for small folios because
ksm_might_need_to_copy() allocates a new folio when the folio index
does not match linear_page_index(vma, addr).
This patch also checks the swapcache when handling swap entries. If a
match is found in the swapcache, it processes it similarly to a present
PTE.
However, there are some differences. For example, the folio is no longer
exclusive because folio_try_share_anon_rmap_pte() is performed during
unmapping.
Furthermore, in the case of swapcache, the folio has already been
unmapped, eliminating the risk of concurrent rmap walks and removing the
need to acquire src_folio's anon_vma or lock.
Note that for large folios, in the swapcache handling path, we directly
return -EBUSY since split_folio() will return -EBUSY regardless of whether
the folio is under writeback or unmapped. This is not an urgent issue,
so a follow-up patch may address it separately.
[v-songbaohua(a)oppo.com: minor cleanup according to Peter Xu]
Link: https://lkml.kernel.org/r/20250226024411.47092-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20250226001400.9129-1-21cnbao@gmail.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Barry Song <v-songbaohua(a)oppo.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Brian Geffon <bgeffon(a)google.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport (IBM) <rppt(a)kernel.org>
Cc: Nicolas Geoffray <ngeoffray(a)google.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: ZhangPeng <zhangpeng362(a)huawei.com>
Cc: Tangquan Zheng <zhengtangquan(a)oppo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index af3dfc3633db..c45b672e10d1 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -18,6 +18,7 @@
#include <asm/tlbflush.h>
#include <asm/tlb.h>
#include "internal.h"
+#include "swap.h"
static __always_inline
bool validate_dst_vma(struct vm_area_struct *dst_vma, unsigned long dst_end)
@@ -1076,16 +1077,14 @@ static int move_present_pte(struct mm_struct *mm,
return err;
}
-static int move_swap_pte(struct mm_struct *mm,
+static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
unsigned long dst_addr, unsigned long src_addr,
pte_t *dst_pte, pte_t *src_pte,
pte_t orig_dst_pte, pte_t orig_src_pte,
pmd_t *dst_pmd, pmd_t dst_pmdval,
- spinlock_t *dst_ptl, spinlock_t *src_ptl)
+ spinlock_t *dst_ptl, spinlock_t *src_ptl,
+ struct folio *src_folio)
{
- if (!pte_swp_exclusive(orig_src_pte))
- return -EBUSY;
-
double_pt_lock(dst_ptl, src_ptl);
if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte,
@@ -1094,6 +1093,16 @@ static int move_swap_pte(struct mm_struct *mm,
return -EAGAIN;
}
+ /*
+ * The src_folio resides in the swapcache, requiring an update to its
+ * index and mapping to align with the dst_vma, where a swap-in may
+ * occur and hit the swapcache after moving the PTE.
+ */
+ if (src_folio) {
+ folio_move_anon_rmap(src_folio, dst_vma);
+ src_folio->index = linear_page_index(dst_vma, dst_addr);
+ }
+
orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
set_pte_at(mm, dst_addr, dst_pte, orig_src_pte);
double_pt_unlock(dst_ptl, src_ptl);
@@ -1141,6 +1150,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
__u64 mode)
{
swp_entry_t entry;
+ struct swap_info_struct *si = NULL;
pte_t orig_src_pte, orig_dst_pte;
pte_t src_folio_pte;
spinlock_t *src_ptl, *dst_ptl;
@@ -1322,6 +1332,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
orig_dst_pte, orig_src_pte, dst_pmd,
dst_pmdval, dst_ptl, src_ptl, src_folio);
} else {
+ struct folio *folio = NULL;
+
entry = pte_to_swp_entry(orig_src_pte);
if (non_swap_entry(entry)) {
if (is_migration_entry(entry)) {
@@ -1335,9 +1347,53 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
goto out;
}
- err = move_swap_pte(mm, dst_addr, src_addr, dst_pte, src_pte,
- orig_dst_pte, orig_src_pte, dst_pmd,
- dst_pmdval, dst_ptl, src_ptl);
+ if (!pte_swp_exclusive(orig_src_pte)) {
+ err = -EBUSY;
+ goto out;
+ }
+
+ si = get_swap_device(entry);
+ if (unlikely(!si)) {
+ err = -EAGAIN;
+ goto out;
+ }
+ /*
+ * Verify the existence of the swapcache. If present, the folio's
+ * index and mapping must be updated even when the PTE is a swap
+ * entry. The anon_vma lock is not taken during this process since
+ * the folio has already been unmapped, and the swap entry is
+ * exclusive, preventing rmap walks.
+ *
+ * For large folios, return -EBUSY immediately, as split_folio()
+ * also returns -EBUSY when attempting to split unmapped large
+ * folios in the swapcache. This issue needs to be resolved
+ * separately to allow proper handling.
+ */
+ if (!src_folio)
+ folio = filemap_get_folio(swap_address_space(entry),
+ swap_cache_index(entry));
+ if (!IS_ERR_OR_NULL(folio)) {
+ if (folio_test_large(folio)) {
+ err = -EBUSY;
+ folio_put(folio);
+ goto out;
+ }
+ src_folio = folio;
+ src_folio_pte = orig_src_pte;
+ if (!folio_trylock(src_folio)) {
+ pte_unmap(&orig_src_pte);
+ pte_unmap(&orig_dst_pte);
+ src_pte = dst_pte = NULL;
+ put_swap_device(si);
+ si = NULL;
+ /* now we can block and wait */
+ folio_lock(src_folio);
+ goto retry;
+ }
+ }
+ err = move_swap_pte(mm, dst_vma, dst_addr, src_addr, dst_pte, src_pte,
+ orig_dst_pte, orig_src_pte, dst_pmd, dst_pmdval,
+ dst_ptl, src_ptl, src_folio);
}
out:
@@ -1354,6 +1410,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
if (src_pte)
pte_unmap(src_pte);
mmu_notifier_invalidate_range_end(&range);
+ if (si)
+ put_swap_device(si);
return err;
}
commit ed4fb6d7ef68111bb539283561953e5c6e9a6e38 upstream.
The timerslack_ns setting is used to specify how much the hardware
timers should be delayed, to potentially dispatch multiple timers in a
single interrupt. This is a performance optimization. Timers of
realtime tasks (having a realtime scheduling policy) should not be
delayed.
This logic was inconsistently applied to the hrtimers, leading to delays
of realtime tasks which used timed waits for events (e.g. condition
variables). Due to the downstream override of the slack for rt tasks,
the procfs reported incorrect (non-zero) timerslack_ns values.
This is changed by setting the timer_slack_ns task attribute to 0 for
all tasks with a rt policy. By that, downstream users do not need to
specially handle rt tasks (w.r.t. the slack), and the procfs entry
shows the correct value of "0". Setting non-zero slack values (either
via procfs or PR_SET_TIMERSLACK) on tasks with a rt policy is ignored,
as stated in "man 2 PR_SET_TIMERSLACK":
Timer slack is not applied to threads that are scheduled under a
real-time scheduling policy (see sched_setscheduler(2)).
The special handling of timerslack on rt tasks in downstream users
is removed as well.
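The resulting policy can be summarized in a small user-space model. This is a sketch under stated assumptions: struct task_model and both helpers are hypothetical and only mirror the rules described above (zero slack under an rt policy, restore the default when leaving it, ignore attempts to set slack while rt).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the task fields involved. */
struct task_model {
	int is_realtime;
	uint64_t timer_slack_ns;
	uint64_t default_timer_slack_ns;
};

/* Models the scheduler-policy change: rt tasks carry no timer slack. */
static void set_policy_model(struct task_model *p, int realtime)
{
	assert(p->default_timer_slack_ns > 0);
	p->is_realtime = realtime;
	if (realtime)
		p->timer_slack_ns = 0;
	else if (p->timer_slack_ns == 0)
		p->timer_slack_ns = p->default_timer_slack_ns;	/* restore */
}

/* Models a slack update request: ignored for rt tasks, 0 means default. */
static void set_timerslack_model(struct task_model *p, uint64_t slack_ns)
{
	if (p->is_realtime)
		return;
	p->timer_slack_ns = slack_ns ? slack_ns : p->default_timer_slack_ns;
}
```

Because the slack is stored as 0 for rt tasks, a consumer in this model can read timer_slack_ns directly without a separate rt check.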
Signed-off-by: Felix Moessbauer <felix.moessbauer(a)siemens.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Link: https://lore.kernel.org/all/20240814121032.368444-2-felix.moessbauer@siemen…
---
fs/proc/base.c | 9 +++++----
fs/select.c | 11 ++++-------
kernel/sched/core.c | 8 ++++++++
kernel/sys.c | 2 ++
kernel/time/hrtimer.c | 18 +++---------------
5 files changed, 22 insertions(+), 26 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index ecc45389ea793..82e4a8805bae6 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2633,10 +2633,11 @@ static ssize_t timerslack_ns_write(struct file *file, const char __user *buf,
}
task_lock(p);
- if (slack_ns == 0)
- p->timer_slack_ns = p->default_timer_slack_ns;
- else
- p->timer_slack_ns = slack_ns;
+ if (task_is_realtime(p))
+ slack_ns = 0;
+ else if (slack_ns == 0)
+ slack_ns = p->default_timer_slack_ns;
+ p->timer_slack_ns = slack_ns;
task_unlock(p);
out:
diff --git a/fs/select.c b/fs/select.c
index 3f730b8581f65..e66b6189845ea 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -77,19 +77,16 @@ u64 select_estimate_accuracy(struct timespec64 *tv)
{
u64 ret;
struct timespec64 now;
+ u64 slack = current->timer_slack_ns;
- /*
- * Realtime tasks get a slack of 0 for obvious reasons.
- */
-
- if (rt_task(current))
+ if (slack == 0)
return 0;
ktime_get_ts64(&now);
now = timespec64_sub(*tv, now);
ret = __estimate_accuracy(&now);
- if (ret < current->timer_slack_ns)
- return current->timer_slack_ns;
+ if (ret < slack)
+ return slack;
return ret;
}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0a483fd9f5de5..9be8a509b5f3f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7380,6 +7380,14 @@ static void __setscheduler_params(struct task_struct *p,
else if (fair_policy(policy))
p->static_prio = NICE_TO_PRIO(attr->sched_nice);
+ /* rt-policy tasks do not have a timerslack */
+ if (task_is_realtime(p)) {
+ p->timer_slack_ns = 0;
+ } else if (p->timer_slack_ns == 0) {
+ /* when switching back to non-rt policy, restore timerslack */
+ p->timer_slack_ns = p->default_timer_slack_ns;
+ }
+
/*
* __sched_setscheduler() ensures attr->sched_priority == 0 when
* !rt_policy. Always setting this ensures that things like
diff --git a/kernel/sys.c b/kernel/sys.c
index d06eda1387b69..06a9a87a8d3e0 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2477,6 +2477,8 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
error = current->timer_slack_ns;
break;
case PR_SET_TIMERSLACK:
+ if (task_is_realtime(current))
+ break;
if (arg2 <= 0)
current->timer_slack_ns =
current->default_timer_slack_ns;
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 8db65e2db14c7..f6d799646dd9c 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -2090,14 +2090,9 @@ long hrtimer_nanosleep(ktime_t rqtp, const enum hrtimer_mode mode,
struct restart_block *restart;
struct hrtimer_sleeper t;
int ret = 0;
- u64 slack;
-
- slack = current->timer_slack_ns;
- if (dl_task(current) || rt_task(current))
- slack = 0;
hrtimer_init_sleeper_on_stack(&t, clockid, mode);
- hrtimer_set_expires_range_ns(&t.timer, rqtp, slack);
+ hrtimer_set_expires_range_ns(&t.timer, rqtp, current->timer_slack_ns);
ret = do_nanosleep(&t, mode);
if (ret != -ERESTART_RESTARTBLOCK)
goto out;
@@ -2278,7 +2273,7 @@ void __init hrtimers_init(void)
/**
* schedule_hrtimeout_range_clock - sleep until timeout
* @expires: timeout value (ktime_t)
- * @delta: slack in expires timeout (ktime_t) for SCHED_OTHER tasks
+ * @delta: slack in expires timeout (ktime_t)
* @mode: timer mode
* @clock_id: timer clock to be used
*/
@@ -2305,13 +2300,6 @@ schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta,
return -EINTR;
}
- /*
- * Override any slack passed by the user if under
- * rt contraints.
- */
- if (rt_task(current))
- delta = 0;
-
hrtimer_init_sleeper_on_stack(&t, clock_id, mode);
hrtimer_set_expires_range_ns(&t.timer, *expires, delta);
hrtimer_sleeper_start_expires(&t, mode);
@@ -2331,7 +2319,7 @@ EXPORT_SYMBOL_GPL(schedule_hrtimeout_range_clock);
/**
* schedule_hrtimeout_range - sleep until timeout
* @expires: timeout value (ktime_t)
- * @delta: slack in expires timeout (ktime_t) for SCHED_OTHER tasks
+ * @delta: slack in expires timeout (ktime_t)
* @mode: timer mode
*
* Make the current task sleep until the given expiry time has
--
2.47.2
Patches "sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy" and
"sctp: sysctl: auth_enable: avoid using current->nsproxy" have been
mixed up when backported to 5.4. The `member` argument passed to
`container_of` has been swapped in both proc_sctp_do_auth() and
proc_sctp_do_hmac_alg(). For instance, accessing
/proc/sys/net/sctp/cookie_hmac_alg can now cause a kernel oops.
Fix this by reverting the wrong backports and re-applying them correctly.
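To illustrate why a swapped `member` argument is dangerous, here is a minimal
userspace sketch. The struct `demo_sctp`, its fields and the helper
`demo_base_from_member()` are hypothetical stand-ins and are not taken from
net/sctp/sysctl.c; the helper only reproduces the pointer arithmetic that
container_of() performs, using offsetof().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-netns SCTP state (not the kernel layout). */
struct demo_sctp {
	int other;
	int auth_enable;
	char *cookie_hmac_alg;
};

/*
 * Same arithmetic as container_of(): subtract the member's offset from the
 * member's address to recover the address of the enclosing structure.
 * Done on uintptr_t so that a deliberately wrong offset is still well defined.
 */
static uintptr_t demo_base_from_member(const void *field, size_t member_offset)
{
	assert(field != NULL);
	return (uintptr_t)field - member_offset;
}
```

With the correct member offset the result is the address of the structure.
With the offset of the other member, the result points somewhere before or
after it, which is the kind of bogus pointer that can end in an oops.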
Magali Lemes (2):
Revert "sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy"
Revert "sctp: sysctl: auth_enable: avoid using current->nsproxy"
Matthieu Baerts (NGI0) (2):
sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy
sctp: sysctl: auth_enable: avoid using current->nsproxy
net/sctp/sysctl.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--
2.48.1
Backport of a similar change from commit 5ac9b4e935df ("lib/buildid:
Handle memfd_secret() files in build_id_parse()") to address an issue
where accessing secret memfd contents through build_id_parse() would
trigger faults.
Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
This repro will cause BUG: unable to handle kernel paging request in
build_id_parse in 5.15/6.1/6.6.
Some other discussions can be found in [1].
[1] https://lore.kernel.org/bpf/20241104175256.2327164-1-jolsa@kernel.org/T/#u
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
lib/buildid.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..9db35305f257 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,12 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+#ifdef CONFIG_SECRETMEM
+ /* reject secretmem folios created with memfd_secret() */
+ if (vma->vm_file->f_mapping->a_ops == &secretmem_aops)
+ return -EFAULT;
+#endif
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
commit ed4fb6d7ef68111bb539283561953e5c6e9a6e38 upstream.
The timerslack_ns setting is used to specify how much the hardware
timers should be delayed, to potentially dispatch multiple timers in a
single interrupt. This is a performance optimization. Timers of
realtime tasks (having a realtime scheduling policy) should not be
delayed.
This logic was inconsistently applied to the hrtimers, leading to delays
of realtime tasks which used timed waits for events (e.g. condition
variables). Due to the downstream override of the slack for rt tasks,
the procfs reported incorrect (non-zero) timerslack_ns values.
This is changed by setting the timer_slack_ns task attribute to 0 for
all tasks with a rt policy. By that, downstream users do not need to
specially handle rt tasks (w.r.t. the slack), and the procfs entry
shows the correct value of "0". Setting non-zero slack values (either
via procfs or PR_SET_TIMERSLACK) on tasks with a rt policy is ignored,
as stated in "man 2 PR_SET_TIMERSLACK":
Timer slack is not applied to threads that are scheduled under a
real-time scheduling policy (see sched_setscheduler(2)).
The special handling of timerslack on rt tasks in downstream users
is removed as well.
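The resulting behaviour can be sketched as a small userspace model. This is
only an illustration: `struct demo_task`, `demo_set_timerslack()` and
`demo_set_policy()` are hypothetical names, the `realtime` flag stands in for
task_is_realtime(), and the setter follows the procfs write path of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced model of the task fields the patch touches. */
struct demo_task {
	bool realtime;                   /* stands in for task_is_realtime(p) */
	uint64_t timer_slack_ns;
	uint64_t default_timer_slack_ns;
};

/* Models the procfs timerslack_ns write path after the patch. */
static void demo_set_timerslack(struct demo_task *p, uint64_t slack_ns)
{
	assert(p != NULL);
	if (p->realtime)
		slack_ns = 0;                           /* rt tasks never get slack */
	else if (slack_ns == 0)
		slack_ns = p->default_timer_slack_ns;   /* 0 means "reset to default" */
	p->timer_slack_ns = slack_ns;
}

/* Models the hook added to __setscheduler_params(). */
static void demo_set_policy(struct demo_task *p, bool realtime)
{
	assert(p != NULL);
	p->realtime = realtime;
	if (p->realtime)
		p->timer_slack_ns = 0;
	else if (p->timer_slack_ns == 0)
		p->timer_slack_ns = p->default_timer_slack_ns;
}
```

In this model, a custom slack that was set before switching to a rt policy is
not remembered: switching back to a non-rt policy restores the default value.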
Signed-off-by: Felix Moessbauer <felix.moessbauer(a)siemens.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Link: https://lore.kernel.org/all/20240814121032.368444-2-felix.moessbauer@siemen…
---
fs/proc/base.c | 9 +++++----
fs/select.c | 11 ++++-------
kernel/sched/core.c | 8 ++++++++
kernel/sys.c | 2 ++
kernel/time/hrtimer.c | 18 +++---------------
5 files changed, 22 insertions(+), 26 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 699f085d4de7d..91fe20b7657c0 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2633,10 +2633,11 @@ static ssize_t timerslack_ns_write(struct file *file, const char __user *buf,
}
task_lock(p);
- if (slack_ns == 0)
- p->timer_slack_ns = p->default_timer_slack_ns;
- else
- p->timer_slack_ns = slack_ns;
+ if (task_is_realtime(p))
+ slack_ns = 0;
+ else if (slack_ns == 0)
+ slack_ns = p->default_timer_slack_ns;
+ p->timer_slack_ns = slack_ns;
task_unlock(p);
out:
diff --git a/fs/select.c b/fs/select.c
index 3f730b8581f65..e66b6189845ea 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -77,19 +77,16 @@ u64 select_estimate_accuracy(struct timespec64 *tv)
{
u64 ret;
struct timespec64 now;
+ u64 slack = current->timer_slack_ns;
- /*
- * Realtime tasks get a slack of 0 for obvious reasons.
- */
-
- if (rt_task(current))
+ if (slack == 0)
return 0;
ktime_get_ts64(&now);
now = timespec64_sub(*tv, now);
ret = __estimate_accuracy(&now);
- if (ret < current->timer_slack_ns)
- return current->timer_slack_ns;
+ if (ret < slack)
+ return slack;
return ret;
}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 784a4f8409453..3d6dc03e4a7d3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7530,6 +7530,14 @@ static void __setscheduler_params(struct task_struct *p,
else if (fair_policy(policy))
p->static_prio = NICE_TO_PRIO(attr->sched_nice);
+ /* rt-policy tasks do not have a timerslack */
+ if (task_is_realtime(p)) {
+ p->timer_slack_ns = 0;
+ } else if (p->timer_slack_ns == 0) {
+ /* when switching back to non-rt policy, restore timerslack */
+ p->timer_slack_ns = p->default_timer_slack_ns;
+ }
+
/*
* __sched_setscheduler() ensures attr->sched_priority == 0 when
* !rt_policy. Always setting this ensures that things like
diff --git a/kernel/sys.c b/kernel/sys.c
index 44b5759903332..355de0b65c235 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2535,6 +2535,8 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
error = current->timer_slack_ns;
break;
case PR_SET_TIMERSLACK:
+ if (task_is_realtime(current))
+ break;
if (arg2 <= 0)
current->timer_slack_ns =
current->default_timer_slack_ns;
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index e99b1305e1a5f..5db6912b8f6e1 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -2093,14 +2093,9 @@ long hrtimer_nanosleep(ktime_t rqtp, const enum hrtimer_mode mode,
struct restart_block *restart;
struct hrtimer_sleeper t;
int ret = 0;
- u64 slack;
-
- slack = current->timer_slack_ns;
- if (rt_task(current))
- slack = 0;
hrtimer_init_sleeper_on_stack(&t, clockid, mode);
- hrtimer_set_expires_range_ns(&t.timer, rqtp, slack);
+ hrtimer_set_expires_range_ns(&t.timer, rqtp, current->timer_slack_ns);
ret = do_nanosleep(&t, mode);
if (ret != -ERESTART_RESTARTBLOCK)
goto out;
@@ -2281,7 +2276,7 @@ void __init hrtimers_init(void)
/**
* schedule_hrtimeout_range_clock - sleep until timeout
* @expires: timeout value (ktime_t)
- * @delta: slack in expires timeout (ktime_t) for SCHED_OTHER tasks
+ * @delta: slack in expires timeout (ktime_t)
* @mode: timer mode
* @clock_id: timer clock to be used
*/
@@ -2308,13 +2303,6 @@ schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta,
return -EINTR;
}
- /*
- * Override any slack passed by the user if under
- * rt contraints.
- */
- if (rt_task(current))
- delta = 0;
-
hrtimer_init_sleeper_on_stack(&t, clock_id, mode);
hrtimer_set_expires_range_ns(&t.timer, *expires, delta);
hrtimer_sleeper_start_expires(&t, mode);
@@ -2334,7 +2322,7 @@ EXPORT_SYMBOL_GPL(schedule_hrtimeout_range_clock);
/**
* schedule_hrtimeout_range - sleep until timeout
* @expires: timeout value (ktime_t)
- * @delta: slack in expires timeout (ktime_t) for SCHED_OTHER tasks
+ * @delta: slack in expires timeout (ktime_t)
* @mode: timer mode
*
* Make the current task sleep until the given expiry time has
--
2.47.2
Currently, load_microcode_amd() iterates over all NUMA nodes, retrieves
their CPU masks and unconditionally accesses per-CPU data for the first
CPU of each mask.
According to Documentation/admin-guide/mm/numaperf.rst: "Some memory may
share the same node as a CPU, and others are provided as memory only
nodes." Therefore, some node CPU masks may be empty and wouldn't have a
"first CPU".
On a machine with far memory (and therefore CPU-less NUMA nodes):
- cpumask_of_node(nid) is 0
- cpumask_first(0) is CONFIG_NR_CPUS
- cpu_data(CONFIG_NR_CPUS) accesses the cpu_info per-CPU array at an
index that is 1 out of bounds
This does not have any security implications since flashing microcode is
a privileged operation but I believe this has reliability implications
by potentially corrupting memory while flashing a microcode update.
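The failure mode described above can be modelled in userspace. This is a
sketch under stated assumptions: `DEMO_NR_CPUS`, `demo_cpumask_first()` and
`demo_visit_nodes_with_cpus()` are hypothetical names, a node's CPU mask is
reduced to a single byte, and an empty mask yields `DEMO_NR_CPUS` to mirror
the description of cpumask_first() given in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_CPUS 8	/* hypothetical; stands in for CONFIG_NR_CPUS */

/* Index of the lowest set bit, or DEMO_NR_CPUS if the mask is empty. */
static unsigned int demo_cpumask_first(uint8_t mask)
{
	for (unsigned int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return DEMO_NR_CPUS;	/* one past the last valid per-CPU index */
}

/*
 * Record the first CPU of every node that has CPUs and skip memory-only
 * nodes, so that no out-of-range index is ever produced.
 * Returns the number of nodes visited.
 */
static size_t demo_visit_nodes_with_cpus(const uint8_t *node_masks,
					 size_t nr_nodes,
					 unsigned int *first_cpus)
{
	size_t visited = 0;

	assert(node_masks != NULL && first_cpus != NULL);
	for (size_t nid = 0; nid < nr_nodes; nid++) {
		if (node_masks[nid] == 0)	/* CPU-less node: no "first CPU" */
			continue;
		first_cpus[visited++] = demo_cpumask_first(node_masks[nid]);
	}
	return visited;
}
```

Without the emptiness check, the value returned for a memory-only node would
be used as an array index that is one element past the end.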
When booting with CONFIG_UBSAN_BOUNDS=y on an AMD machine that flashes a
microcode update, I get the following splat:
UBSAN: array-index-out-of-bounds in arch/x86/kernel/cpu/microcode/amd.c:X:Y
index 512 is out of range for type 'unsigned long[512]'
[...]
Call Trace:
dump_stack+0xdb/0x143
__ubsan_handle_out_of_bounds+0xf5/0x120
load_microcode_amd+0x58f/0x6b0
request_microcode_amd+0x17c/0x250
reload_store+0x174/0x2b0
kernfs_fop_write_iter+0x227/0x2d0
vfs_write+0x322/0x510
ksys_write+0xb5/0x160
do_syscall_64+0x6b/0xa0
entry_SYSCALL_64_after_hwframe+0x67/0xd1
This changes the iteration to only loop on NUMA nodes which have CPUs
before attempting to update their microcodes.
Fixes: 7ff6edf4fef3 ("x86/microcode/AMD: Fix mixed steppings support")
Signed-off-by: Florent Revest <revest(a)chromium.org>
Cc: stable(a)vger.kernel.org
---
arch/x86/kernel/cpu/microcode/amd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index 95ac1c6a84fbe..d1e74bfe130f8 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -1068,7 +1068,7 @@ static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t siz
if (ret != UCODE_OK)
return ret;
- for_each_node(nid) {
+ for_each_node_with_cpus(nid) {
cpu = cpumask_first(cpumask_of_node(nid));
c = &cpu_data(cpu);
--
2.49.0.rc0.332.g42c0ae87b1-goog
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x c133ec0e5717868c9967fa3df92a55e537b1aead
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030901-banshee-unwomanly-f19e@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c133ec0e5717868c9967fa3df92a55e537b1aead Mon Sep 17 00:00:00 2001
From: Michal Pecio <michal.pecio(a)gmail.com>
Date: Tue, 25 Feb 2025 11:59:27 +0200
Subject: [PATCH] usb: xhci: Enable the TRB overfetch quirk on VIA VL805
Raspberry Pi is a major user of those chips and they discovered a bug -
when the end of a transfer ring segment is reached, up to four TRBs can
be prefetched from the next page even if the segment ends with link TRB
and on page boundary (the chip claims to support standard 4KB pages).
It also appears that if the prefetched TRBs belong to a different ring
whose doorbell is later rung, they may be used without refreshing from
system RAM and the endpoint will stay idle if their cycle bit is stale.
Other users complain about IOMMU faults on x86 systems, unsurprisingly.
Deal with it by using existing quirk which allocates a dummy page after
each transfer ring segment. This was seen to resolve both problems. RPi
came up with a more efficient solution, shortening each segment by four
TRBs, but it complicated the driver and they ditched it for this quirk.
Also rename the quirk and add VL805 device ID macro.
Signed-off-by: Michal Pecio <michal.pecio(a)gmail.com>
Link: https://github.com/raspberrypi/linux/issues/4685
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215906
CC: stable(a)vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Link: https://lore.kernel.org/r/20250225095927.2512358-2-mathias.nyman@linux.inte…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 92703efda1f7..fdf0c1008225 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -2437,7 +2437,8 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
* and our use of dma addresses in the trb_address_map radix tree needs
* TRB_SEGMENT_SIZE alignment, so we pick the greater alignment need.
*/
- if (xhci->quirks & XHCI_ZHAOXIN_TRB_FETCH)
+ if (xhci->quirks & XHCI_TRB_OVERFETCH)
+ /* Buggy HC prefetches beyond segment bounds - allocate dummy space at the end */
xhci->segment_pool = dma_pool_create("xHCI ring segments", dev,
TRB_SEGMENT_SIZE * 2, TRB_SEGMENT_SIZE * 2, xhci->page_size * 2);
else
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index ad0ff356f6fa..54460d11f7ee 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -38,6 +38,8 @@
#define PCI_DEVICE_ID_ETRON_EJ168 0x7023
#define PCI_DEVICE_ID_ETRON_EJ188 0x7052
+#define PCI_DEVICE_ID_VIA_VL805 0x3483
+
#define PCI_DEVICE_ID_INTEL_LYNXPOINT_XHCI 0x8c31
#define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI 0x9c31
#define PCI_DEVICE_ID_INTEL_WILDCATPOINT_LP_XHCI 0x9cb1
@@ -418,8 +420,10 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
pdev->device == 0x3432)
xhci->quirks |= XHCI_BROKEN_STREAMS;
- if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == 0x3483)
+ if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == PCI_DEVICE_ID_VIA_VL805) {
xhci->quirks |= XHCI_LPM_SUPPORT;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
+ }
if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
pdev->device == PCI_DEVICE_ID_ASMEDIA_1042_XHCI) {
@@ -467,11 +471,11 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
if (pdev->device == 0x9202) {
xhci->quirks |= XHCI_RESET_ON_RESUME;
- xhci->quirks |= XHCI_ZHAOXIN_TRB_FETCH;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
}
if (pdev->device == 0x9203)
- xhci->quirks |= XHCI_ZHAOXIN_TRB_FETCH;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
}
if (pdev->vendor == PCI_VENDOR_ID_CDNS &&
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 8c164340a2c3..779b01dee068 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1632,7 +1632,7 @@ struct xhci_hcd {
#define XHCI_EP_CTX_BROKEN_DCS BIT_ULL(42)
#define XHCI_SUSPEND_RESUME_CLKS BIT_ULL(43)
#define XHCI_RESET_TO_DEFAULT BIT_ULL(44)
-#define XHCI_ZHAOXIN_TRB_FETCH BIT_ULL(45)
+#define XHCI_TRB_OVERFETCH BIT_ULL(45)
#define XHCI_ZHAOXIN_HOST BIT_ULL(46)
#define XHCI_WRITE_64_HI_LO BIT_ULL(47)
#define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48)
From: Paulo Alcantara <pc(a)manguebit.com>
commit d3da25c5ac84430f89875ca7485a3828150a7e0a upstream.
Skip sessions that are being torn down (status == SES_EXITING) to
avoid UAF.
Cc: stable(a)vger.kernel.org
Signed-off-by: Paulo Alcantara (Red Hat) <pc(a)manguebit.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
[ cifs_debug.c was moved from fs/cifs to fs/smb/client since
38c8a9a52082 ("smb: move client and server files to common directory fs/smb").
cifs_ses_exiting() was introduced to cifs_debug.c since
ca545b7f0823 ("smb: client: fix potential UAF in cifs_debug_files_proc_show()"),
and it checks SES_EXITING instead of CifsExiting since
dd3cd8709ed5 ("cifs: use new enum for ses_status").
The ses_lock in cifs_ses_exiting() was introduced in commit d7d7a66aacd6
("cifs: avoid use of global locks for high contention data"); on 5.15/5.10,
a global lock takes care of ses->status instead.
So use "if (ses->status == CifsExiting)" instead of "if (cifs_ses_exiting(ses))" ]
Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Backport of commit d3da25c5ac84430f89875ca7485a3828150a7e0a to 5.15.
Verified that the code compiles on Linux 5.15.
---
fs/cifs/cifs_debug.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c
index e7501533c2ec..ce02cc71e117 100644
--- a/fs/cifs/cifs_debug.c
+++ b/fs/cifs/cifs_debug.c
@@ -528,6 +528,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
list_for_each(tmp2, &server->smb_ses_list) {
ses = list_entry(tmp2, struct cifs_ses,
smb_ses_list);
+ if (ses->status == CifsExiting)
+ continue;
list_for_each(tmp3, &ses->tcon_list) {
tcon = list_entry(tmp3,
struct cifs_tcon,
--
2.25.1
From: Ard Biesheuvel <ardb(a)kernel.org>
This is the stable backport of commit
c00b413a96261fae ("x86/boot: Sanitize boot params before parsing command line")
to the v6.6 and v6.1 trees.
Patch #2 can be applied to both v6.1 and v6.6. Patch #1 is a
prerequisite that is already present in v6.1 but was not backported to
v6.6 yet for reasons that are unclear to me, and so it needs to be
applied to v6.6 first.
Ard Biesheuvel (2):
x86/boot: Rename conflicting 'boot_params' pointer to
'boot_params_ptr'
x86/boot: Sanitize boot params before parsing command line
arch/x86/boot/compressed/acpi.c | 14 +++++------
arch/x86/boot/compressed/cmdline.c | 4 +--
arch/x86/boot/compressed/ident_map_64.c | 7 +++---
arch/x86/boot/compressed/kaslr.c | 26 ++++++++++----------
arch/x86/boot/compressed/mem.c | 6 ++---
arch/x86/boot/compressed/misc.c | 26 ++++++++++----------
arch/x86/boot/compressed/misc.h | 1 -
arch/x86/boot/compressed/pgtable_64.c | 11 +++++----
arch/x86/boot/compressed/sev.c | 2 +-
arch/x86/include/asm/boot.h | 2 ++
drivers/firmware/efi/libstub/x86-stub.c | 2 +-
drivers/firmware/efi/libstub/x86-stub.h | 2 --
12 files changed, 52 insertions(+), 51 deletions(-)
--
2.49.0.rc0.332.g42c0ae87b1-goog
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x c50f8e6053b0503375c2975bf47f182445aebb4c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030928-bonanza-unscrew-4f1c@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c50f8e6053b0503375c2975bf47f182445aebb4c Mon Sep 17 00:00:00 2001
From: Barry Song <baohua(a)kernel.org>
Date: Wed, 26 Feb 2025 13:14:00 +1300
Subject: [PATCH] mm: fix kernel BUG when userfaultfd_move encounters swapcache
userfaultfd_move() checks whether the PTE entry is present or a
swap entry.
- If the PTE entry is present, move_present_pte() handles folio
migration by setting:
src_folio->index = linear_page_index(dst_vma, dst_addr);
- If the PTE entry is a swap entry, move_swap_pte() simply copies
the PTE to the new dst_addr.
This approach is incorrect because, even if the PTE is a swap entry,
it can still reference a folio that remains in the swap cache.
This creates a race window between steps 2 and 4.
1. add_to_swap: The folio is added to the swapcache.
2. try_to_unmap: PTEs are converted to swap entries.
3. pageout: The folio is written back.
4. Swapcache is cleared.
If userfaultfd_move() occurs in the window between steps 2 and 4,
after the swap PTE has been moved to the destination, accessing the
destination triggers do_swap_page(), which may locate the folio in
the swapcache. However, since the folio's index has not been updated
to match the destination VMA, do_swap_page() will detect a mismatch.
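The mismatch can be made concrete with a small userspace model. It is only a
sketch: `struct demo_vma`, `demo_linear_page_index()` and
`demo_index_matches()` are hypothetical names, 4 KiB pages are assumed, and
the index formula is a simplified form of what linear_page_index() computes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Hypothetical, reduced VMA: only the fields needed for the index. */
struct demo_vma {
	unsigned long vm_start;
	unsigned long vm_pgoff;
};

/* Simplified linear_page_index(): page offset in the VMA plus vm_pgoff. */
static unsigned long demo_linear_page_index(const struct demo_vma *vma,
					    unsigned long addr)
{
	assert(vma != NULL && addr >= vma->vm_start);
	return ((addr - vma->vm_start) >> DEMO_PAGE_SHIFT) + vma->vm_pgoff;
}

/* The consistency condition that the rmap code checks on swap-in. */
static bool demo_index_matches(unsigned long folio_index,
			       const struct demo_vma *vma,
			       unsigned long addr)
{
	return folio_index == demo_linear_page_index(vma, addr);
}
```

A folio index computed for the source VMA fails this check at the destination
address. Recomputing the index against the destination VMA makes the check
hold again.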
This can result in two critical issues depending on the system
configuration.
If KSM is disabled, both small and large folios can trigger a BUG
during the add_rmap operation due to:
page_pgoff(folio, page) != linear_page_index(vma, address)
[ 13.336953] page: refcount:6 mapcount:1 mapping:00000000f43db19c index:0xffffaf150 pfn:0x4667c
[ 13.337520] head: order:2 mapcount:1 entire_mapcount:0 nr_pages_mapped:1 pincount:0
[ 13.337716] memcg:ffff00000405f000
[ 13.337849] anon flags: 0x3fffc0000020459(locked|uptodate|dirty|owner_priv_1|head|swapbacked|node=0|zone=0|lastcpupid=0xffff)
[ 13.338630] raw: 03fffc0000020459 ffff80008507b538 ffff80008507b538 ffff000006260361
[ 13.338831] raw: 0000000ffffaf150 0000000000004000 0000000600000000 ffff00000405f000
[ 13.339031] head: 03fffc0000020459 ffff80008507b538 ffff80008507b538 ffff000006260361
[ 13.339204] head: 0000000ffffaf150 0000000000004000 0000000600000000 ffff00000405f000
[ 13.339375] head: 03fffc0000000202 fffffdffc0199f01 ffffffff00000000 0000000000000001
[ 13.339546] head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
[ 13.339736] page dumped because: VM_BUG_ON_PAGE(page_pgoff(folio, page) != linear_page_index(vma, address))
[ 13.340190] ------------[ cut here ]------------
[ 13.340316] kernel BUG at mm/rmap.c:1380!
[ 13.340683] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[ 13.340969] Modules linked in:
[ 13.341257] CPU: 1 UID: 0 PID: 107 Comm: a.out Not tainted 6.14.0-rc3-gcf42737e247a-dirty #299
[ 13.341470] Hardware name: linux,dummy-virt (DT)
[ 13.341671] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 13.341815] pc : __page_check_anon_rmap+0xa0/0xb0
[ 13.341920] lr : __page_check_anon_rmap+0xa0/0xb0
[ 13.342018] sp : ffff80008752bb20
[ 13.342093] x29: ffff80008752bb20 x28: fffffdffc0199f00 x27: 0000000000000001
[ 13.342404] x26: 0000000000000000 x25: 0000000000000001 x24: 0000000000000001
[ 13.342575] x23: 0000ffffaf0d0000 x22: 0000ffffaf0d0000 x21: fffffdffc0199f00
[ 13.342731] x20: fffffdffc0199f00 x19: ffff000006210700 x18: 00000000ffffffff
[ 13.342881] x17: 6c203d2120296567 x16: 6170202c6f696c6f x15: 662866666f67705f
[ 13.343033] x14: 6567617028454741 x13: 2929737365726464 x12: ffff800083728ab0
[ 13.343183] x11: ffff800082996bf8 x10: 0000000000000fd7 x9 : ffff80008011bc40
[ 13.343351] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000829eebf8
[ 13.343498] x5 : c0000000fffff000 x4 : 0000000000000000 x3 : 0000000000000000
[ 13.343645] x2 : 0000000000000000 x1 : ffff0000062db980 x0 : 000000000000005f
[ 13.343876] Call trace:
[ 13.344045] __page_check_anon_rmap+0xa0/0xb0 (P)
[ 13.344234] folio_add_anon_rmap_ptes+0x22c/0x320
[ 13.344333] do_swap_page+0x1060/0x1400
[ 13.344417] __handle_mm_fault+0x61c/0xbc8
[ 13.344504] handle_mm_fault+0xd8/0x2e8
[ 13.344586] do_page_fault+0x20c/0x770
[ 13.344673] do_translation_fault+0xb4/0xf0
[ 13.344759] do_mem_abort+0x48/0xa0
[ 13.344842] el0_da+0x58/0x130
[ 13.344914] el0t_64_sync_handler+0xc4/0x138
[ 13.345002] el0t_64_sync+0x1ac/0x1b0
[ 13.345208] Code: aa1503e0 f000f801 910f6021 97ff5779 (d4210000)
[ 13.345504] ---[ end trace 0000000000000000 ]---
[ 13.345715] note: a.out[107] exited with irqs disabled
[ 13.345954] note: a.out[107] exited with preempt_count 2
If KSM is enabled, Peter Xu also discovered that do_swap_page() may
trigger an unexpected CoW operation for small folios because
ksm_might_need_to_copy() allocates a new folio when the folio index
does not match linear_page_index(vma, addr).
This patch also checks the swapcache when handling swap entries. If a
match is found in the swapcache, it processes it similarly to a present
PTE.
However, there are some differences. For example, the folio is no longer
exclusive because folio_try_share_anon_rmap_pte() is performed during
unmapping.
Furthermore, in the case of swapcache, the folio has already been
unmapped, eliminating the risk of concurrent rmap walks and removing the
need to acquire src_folio's anon_vma or lock.
Note that for large folios, in the swapcache handling path, we directly
return -EBUSY since split_folio() will return -EBUSY regardless if
the folio is under writeback or unmapped. This is not an urgent issue,
so a follow-up patch may address it separately.
[v-songbaohua(a)oppo.com: minor cleanup according to Peter Xu]
Link: https://lkml.kernel.org/r/20250226024411.47092-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20250226001400.9129-1-21cnbao@gmail.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Barry Song <v-songbaohua(a)oppo.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Brian Geffon <bgeffon(a)google.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport (IBM) <rppt(a)kernel.org>
Cc: Nicolas Geoffray <ngeoffray(a)google.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: ZhangPeng <zhangpeng362(a)huawei.com>
Cc: Tangquan Zheng <zhengtangquan(a)oppo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index af3dfc3633db..c45b672e10d1 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -18,6 +18,7 @@
#include <asm/tlbflush.h>
#include <asm/tlb.h>
#include "internal.h"
+#include "swap.h"
static __always_inline
bool validate_dst_vma(struct vm_area_struct *dst_vma, unsigned long dst_end)
@@ -1076,16 +1077,14 @@ static int move_present_pte(struct mm_struct *mm,
return err;
}
-static int move_swap_pte(struct mm_struct *mm,
+static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
unsigned long dst_addr, unsigned long src_addr,
pte_t *dst_pte, pte_t *src_pte,
pte_t orig_dst_pte, pte_t orig_src_pte,
pmd_t *dst_pmd, pmd_t dst_pmdval,
- spinlock_t *dst_ptl, spinlock_t *src_ptl)
+ spinlock_t *dst_ptl, spinlock_t *src_ptl,
+ struct folio *src_folio)
{
- if (!pte_swp_exclusive(orig_src_pte))
- return -EBUSY;
-
double_pt_lock(dst_ptl, src_ptl);
if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte,
@@ -1094,6 +1093,16 @@ static int move_swap_pte(struct mm_struct *mm,
return -EAGAIN;
}
+ /*
+ * The src_folio resides in the swapcache, requiring an update to its
+ * index and mapping to align with the dst_vma, where a swap-in may
+ * occur and hit the swapcache after moving the PTE.
+ */
+ if (src_folio) {
+ folio_move_anon_rmap(src_folio, dst_vma);
+ src_folio->index = linear_page_index(dst_vma, dst_addr);
+ }
+
orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
set_pte_at(mm, dst_addr, dst_pte, orig_src_pte);
double_pt_unlock(dst_ptl, src_ptl);
@@ -1141,6 +1150,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
__u64 mode)
{
swp_entry_t entry;
+ struct swap_info_struct *si = NULL;
pte_t orig_src_pte, orig_dst_pte;
pte_t src_folio_pte;
spinlock_t *src_ptl, *dst_ptl;
@@ -1322,6 +1332,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
orig_dst_pte, orig_src_pte, dst_pmd,
dst_pmdval, dst_ptl, src_ptl, src_folio);
} else {
+ struct folio *folio = NULL;
+
entry = pte_to_swp_entry(orig_src_pte);
if (non_swap_entry(entry)) {
if (is_migration_entry(entry)) {
@@ -1335,9 +1347,53 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
goto out;
}
- err = move_swap_pte(mm, dst_addr, src_addr, dst_pte, src_pte,
- orig_dst_pte, orig_src_pte, dst_pmd,
- dst_pmdval, dst_ptl, src_ptl);
+ if (!pte_swp_exclusive(orig_src_pte)) {
+ err = -EBUSY;
+ goto out;
+ }
+
+ si = get_swap_device(entry);
+ if (unlikely(!si)) {
+ err = -EAGAIN;
+ goto out;
+ }
+ /*
+ * Verify the existence of the swapcache. If present, the folio's
+ * index and mapping must be updated even when the PTE is a swap
+ * entry. The anon_vma lock is not taken during this process since
+ * the folio has already been unmapped, and the swap entry is
+ * exclusive, preventing rmap walks.
+ *
+ * For large folios, return -EBUSY immediately, as split_folio()
+ * also returns -EBUSY when attempting to split unmapped large
+ * folios in the swapcache. This issue needs to be resolved
+ * separately to allow proper handling.
+ */
+ if (!src_folio)
+ folio = filemap_get_folio(swap_address_space(entry),
+ swap_cache_index(entry));
+ if (!IS_ERR_OR_NULL(folio)) {
+ if (folio_test_large(folio)) {
+ err = -EBUSY;
+ folio_put(folio);
+ goto out;
+ }
+ src_folio = folio;
+ src_folio_pte = orig_src_pte;
+ if (!folio_trylock(src_folio)) {
+ pte_unmap(&orig_src_pte);
+ pte_unmap(&orig_dst_pte);
+ src_pte = dst_pte = NULL;
+ put_swap_device(si);
+ si = NULL;
+ /* now we can block and wait */
+ folio_lock(src_folio);
+ goto retry;
+ }
+ }
+ err = move_swap_pte(mm, dst_vma, dst_addr, src_addr, dst_pte, src_pte,
+ orig_dst_pte, orig_src_pte, dst_pmd, dst_pmdval,
+ dst_ptl, src_ptl, src_folio);
}
out:
@@ -1354,6 +1410,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
if (src_pte)
pte_unmap(src_pte);
mmu_notifier_invalidate_range_end(&range);
+ if (si)
+ put_swap_device(si);
return err;
}
Backport of a similar change from commit 5ac9b4e935df ("lib/buildid:
Handle memfd_secret() files in build_id_parse()") to address an issue
where accessing secret memfd contents through build_id_parse() would
trigger faults.
Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
This repro will cause BUG: unable to handle kernel paging request in
build_id_parse in 5.15/6.1/6.6.
Some other discussions can be found in [1].
[1] https://lore.kernel.org/bpf/20241104175256.2327164-1-jolsa@kernel.org/T/#u
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
lib/buildid.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..9db35305f257 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,12 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+#ifdef CONFIG_SECRETMEM
+ /* reject secretmem folios created with memfd_secret() */
+ if (vma->vm_file->f_mapping->a_ops == &secretmem_aops)
+ return -EFAULT;
+#endif
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
Hi all,
Christian sent a fix [1] for ARM_VECTORS with
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION that exposed a deficiency in ld.lld
with regards to KEEP() within an OVERLAY description. I have fixed that
in ld.lld [2] and added a patch before Christian's to disallow
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION when KEEP() cannot be used within
OVERLAY to keep everything working for all linkers.
[1]: https://lore.kernel.org/20250221125520.14035-1-ceggers@arri.de/
[2]: https://github.com/llvm/llvm-project/commit/381599f1fe973afad3094e55ec99b16…
---
Christian Eggers (1):
ARM: add KEEP() keyword to ARM_VECTORS
Nathan Chancellor (1):
ARM: Require linker to support KEEP within OVERLAY for DCE
arch/arm/Kconfig | 2 +-
arch/arm/include/asm/vmlinux.lds.h | 12 +++++++++---
init/Kconfig | 5 +++++
3 files changed, 15 insertions(+), 4 deletions(-)
---
base-commit: 80e54e84911a923c40d7bee33a34c1b4be148d7a
change-id: 20250311-arm-fix-vectors-with-linker-dce-83475b0b8f5b
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
Sometimes I get a NULL pointer dereference at boot time in kobject_get()
with the following call stack:
anatop_regulator_probe()
devm_regulator_register()
regulator_register()
regulator_resolve_supply()
kobject_get()
By placing some extra BUG_ON() statements I could verify that this is
raised because probing of the 'dummy' regulator driver is not completed
('dummy_regulator_rdev' is still NULL).
In the JTAG debugger I can see that dummy_regulator_probe() and
anatop_regulator_probe() can be run by different kernel threads
(kworker/u4:*). I haven't further investigated whether this can be
changed or if there are other possibilities to force synchronization
between these two probe routines. On the other hand I don't expect much
boot time penalty by probing the 'dummy' regulator synchronously.
Cc: stable(a)vger.kernel.org
Fixes: 259b93b21a9f ("regulator: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in 4.14")
Signed-off-by: Christian Eggers <ceggers(a)arri.de>
---
v2:
- no changes
drivers/regulator/dummy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/regulator/dummy.c b/drivers/regulator/dummy.c
index 5b9b9e4e762d..9f59889129ab 100644
--- a/drivers/regulator/dummy.c
+++ b/drivers/regulator/dummy.c
@@ -60,7 +60,7 @@ static struct platform_driver dummy_regulator_driver = {
.probe = dummy_regulator_probe,
.driver = {
.name = "reg-dummy",
- .probe_type = PROBE_PREFER_ASYNCHRONOUS,
+ .probe_type = PROBE_FORCE_SYNCHRONOUS,
},
};
--
2.44.1
From: Ming Yen Hsieh <mingyen.hsieh(a)mediatek.com>
CSA is currently not supported on mt7925, so register CSA only for the
mt7921 series.
Cc: stable(a)vger.kernel.org
Fixes: 8aa2f59260eb ("wifi: mt76: mt7921: introduce CSA support")
Signed-off-by: Ming Yen Hsieh <mingyen.hsieh(a)mediatek.com>
---
drivers/net/wireless/mediatek/mt76/mt792x_core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/mediatek/mt76/mt792x_core.c b/drivers/net/wireless/mediatek/mt76/mt792x_core.c
index 8799627f6292..0f7806f6338d 100644
--- a/drivers/net/wireless/mediatek/mt76/mt792x_core.c
+++ b/drivers/net/wireless/mediatek/mt76/mt792x_core.c
@@ -665,7 +665,8 @@ int mt792x_init_wiphy(struct ieee80211_hw *hw)
ieee80211_hw_set(hw, SUPPORTS_DYNAMIC_PS);
ieee80211_hw_set(hw, SUPPORTS_VHT_EXT_NSS_BW);
ieee80211_hw_set(hw, CONNECTION_MONITOR);
- ieee80211_hw_set(hw, CHANCTX_STA_CSA);
+ if (is_mt7921(&dev->mt76))
+ ieee80211_hw_set(hw, CHANCTX_STA_CSA);
if (dev->pm.enable)
ieee80211_hw_set(hw, CONNECTION_MONITOR);
--
2.45.2
The patch titled
Subject: mm/contig_alloc: fix alloc_contig_range when __GFP_COMP and order < MAX_ORDER
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-contig_alloc-fix-alloc_contig_range-when-__gfp_comp-and-order-max_order.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jinjiang Tu <tujinjiang(a)huawei.com>
Subject: mm/contig_alloc: fix alloc_contig_range when __GFP_COMP and order < MAX_ORDER
Date: Wed, 12 Mar 2025 16:47:05 +0800
When calling alloc_contig_range() with __GFP_COMP on a requested pfn range
whose order is pageblock_order, which is less than MAX_ORDER, I triggered
the following WARNING:
PFN range: requested [2150105088, 2150105600), allocated [2150105088, 2150106112)
WARNING: CPU: 3 PID: 580 at mm/page_alloc.c:6877 alloc_contig_range+0x280/0x340
alloc_contig_range() marks the pageblocks of the requested pfn range as
isolated and migrates the pages in them that are in use; those pages are
then freed to the MIGRATE_ISOLATE freelist.
Suppose two alloc_contig_range() calls run at the same time and the
requested pfn ranges are [0x80280000, 0x80280200) and [0x80280200,
0x80280400) respectively. Suppose both memory ranges are in use; then
alloc_contig_range() will migrate and free these pages to the
MIGRATE_ISOLATE freelist. __free_one_page() will merge the MIGRATE_ISOLATE
buddies into a larger buddy, resulting in a MAX_ORDER buddy. Finally,
find_large_buddy() in alloc_contig_range() returns a MAX_ORDER buddy, which
results in the WARNING.
To fix it, call free_contig_range() to free the excess pfn range.
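The arithmetic behind the WARNING can be checked with a small userspace
sketch. This is only an illustration, not kernel code: is_pow2(),
range_order() and excess_pages() are hypothetical helpers that mimic
is_power_of_2(), ilog2() and the head/tail computation around
free_contig_range().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's is_power_of_2(). */
static bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* floor(log2(nr_pages)); mimics ilog2() for a nonzero page count. */
static unsigned int range_order(uint64_t nr_pages)
{
	unsigned int order = 0;

	assert(nr_pages != 0);
	while (nr_pages >>= 1)
		order++;
	return order;
}

/* Pages isolated beyond the requested range: head part plus tail part. */
static uint64_t excess_pages(uint64_t start, uint64_t end,
			     uint64_t outer_start, uint64_t outer_end)
{
	assert(outer_start <= start && end <= outer_end);
	return (start - outer_start) + (outer_end - end);
}
```

With the values from the WARNING, the requested range [2150105088,
2150105600) is 512 pages (order 9), the allocated range [2150105088,
2150106112) is 1024 pages (order 10), and the 512 tail pages are the
excess that has to be handed back.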
Link: https://lkml.kernel.org/r/20250312084705.2938220-1-tujinjiang@huawei.com
Fixes: e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
Signed-off-by: Jinjiang Tu <tujinjiang(a)huawei.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
--- a/mm/page_alloc.c~mm-contig_alloc-fix-alloc_contig_range-when-__gfp_comp-and-order-max_order
+++ a/mm/page_alloc.c
@@ -6528,7 +6528,8 @@ int alloc_contig_range_noprof(unsigned l
goto done;
}
- if (!(gfp_mask & __GFP_COMP)) {
+ if (!(gfp_mask & __GFP_COMP) ||
+ (is_power_of_2(end - start) && ilog2(end - start) < MAX_PAGE_ORDER)) {
split_free_pages(cc.freepages, gfp_mask);
/* Free head and tail (if any) */
@@ -6536,7 +6537,15 @@ int alloc_contig_range_noprof(unsigned l
free_contig_range(outer_start, start - outer_start);
if (end != outer_end)
free_contig_range(end, outer_end - end);
- } else if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
+
+ outer_start = start;
+ outer_end = end;
+
+ if (!(gfp_mask & __GFP_COMP))
+ goto done;
+ }
+
+ if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
struct page *head = pfn_to_page(start);
int order = ilog2(end - start);
_
Patches currently in -mm which might be from tujinjiang(a)huawei.com are
mm-hugetlb-fix-surplus-pages-in-dissolve_free_huge_page.patch
mm-contig_alloc-fix-alloc_contig_range-when-__gfp_comp-and-order-max_order.patch
mm-hugetlb-fix-set_max_huge_pages-when-there-are-surplus-pages.patch
The patch titled
Subject: mm/hwpoison: do not send SIGBUS to processes with recovered clean pages
has been added to the -mm mm-unstable branch. Its filename is
mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Shuai Xue <xueshuai(a)linux.alibaba.com>
Subject: mm/hwpoison: do not send SIGBUS to processes with recovered clean pages
Date: Wed, 12 Mar 2025 19:28:51 +0800
When an uncorrected memory error is consumed there is a race between the
CMCI from the memory controller reporting an uncorrected error with a UCNA
signature, and the core reporting an SRAR signature machine check when
the data is about to be consumed.
- Background: why *UN*corrected errors tied to *C*MCI in Intel platform [1]
Prior to Icelake memory controllers reported patrol scrub events that
detected a previously unseen uncorrected error in memory by signaling a
broadcast machine check with an SRAO (Software Recoverable Action
Optional) signature in the machine check bank. This was overkill because
it's not an urgent problem that no core is on the verge of consuming that
bad data. It's also found that multi SRAO UCE may cause nested MCE
interrupts and finally become an IERR.
Hence, Intel downgrades the machine check bank signature of patrol scrub
from SRAO to UCNA (Uncorrected, No Action required), and signal changed to
#CMCI. Just to add to the confusion, Linux does take an action (in
uc_decode_notifier()) to try to offline the page despite the UC*NA*
signature name.
- Background: why #CMCI and #MCE race when poison is consuming in Intel platform [1]
Having decided that CMCI/UCNA is the best action for patrol scrub errors,
the memory controller uses it for reads too. But the memory controller is
executing asynchronously from the core, and can't tell the difference
between a "real" read and a speculative read. So it will do CMCI/UCNA if
an error is found in any read.
Thus:
1) Core is clever and thinks address A is needed soon, issues a speculative read.
2) Core finds it is going to use address A soon after sending the read request
3) The CMCI from the memory controller is in a race with MCE from the core
that will soon try to retire the load from address A.
Quite often (because speculation has got better) the CMCI from the memory
controller is delivered before the core is committed to the instruction
reading address A, so the interrupt is taken, and Linux offlines the page
(marking it as poison).
- Why user process is killed for instr case
Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported
"not recovered"") tries to fix the noisy message "Memory error not recovered"
and skips duplicate SIGBUSs due to the race. But it also introduced a bug
in which kill_accessing_process() returns -EHWPOISON for the instr case; as
a result, kill_me_maybe() sends a SIGBUS to the user process.
If the CMCI wins that race, the page is marked poisoned when
uc_decode_notifier() calls memory_failure(). For dirty pages,
memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag,
converting the PTE to a hwpoison entry. As a result,
kill_accessing_process():
- calls walk_page_range(), which returns 1 regardless of whether
try_to_unmap() succeeds or fails,
- calls kill_proc() to make sure a SIGBUS is sent,
- returns -EHWPOISON to indicate that a SIGBUS has already been sent to the
process and kill_me_maybe() doesn't have to send it again.
However, for clean pages, the TTU_HWPOISON flag is cleared, leaving the
PTE unchanged and not converted to a hwpoison entry. As a result,
kill_accessing_process() returns -EFAULT for clean pages, causing
kill_me_maybe() to send a SIGBUS.
Console log looks like this:
Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects
Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered
Memory failure: 0x827ca68: already hardware poisoned
mce: Memory error not recovered
To fix it, return 0 for "corrupted page was clean", preventing an
unnecessary SIGBUS to user process.
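The change in return value can be modelled in userspace as follows. This
is a simplified sketch under stated assumptions: ret_before_fix(),
ret_after_fix() and caller_sends_sigbus() are hypothetical names, and the
caller model only encodes what is described above (0 and -EHWPOISON do not
trigger another SIGBUS, other errors do).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* generic Linux value, in case libc lacks it */
#endif

/* walk_ret models the walk_page_range() result: 1 if a hwpoison PTE was hit. */
static int ret_before_fix(int walk_ret)
{
	return walk_ret > 0 ? -EHWPOISON : -EFAULT;
}

/* After the fix, "no hwpoison PTE found" (clean page dropped) is not an error. */
static int ret_after_fix(int walk_ret)
{
	return walk_ret > 0 ? -EHWPOISON : 0;
}

/* Simplified caller model: should the caller send a SIGBUS itself? */
static bool caller_sends_sigbus(int ret)
{
	assert(ret <= 0);	/* kernel-style return: 0 or negative errno */
	return ret != 0 && ret != -EHWPOISON;
}
```

For a clean page (walk_ret == 0), the old mapping yields -EFAULT and the
caller model sends a SIGBUS, while the new mapping yields 0 and no signal
is sent. The dirty-page case (walk_ret == 1) is unchanged.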
[1] https://lore.kernel.org/lkml/20250217063335.22257-1-xueshuai@linux.alibaba.…
Link: https://lkml.kernel.org/r/20250312112852.82415-3-xueshuai@linux.alibaba.com
Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"")
Signed-off-by: Shuai Xue <xueshuai(a)linux.alibaba.com>
Tested-by: Tony Luck <tony.luck(a)intel.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jane Chu <jane.chu(a)oracle.com>
Cc: Jarkko Sakkinen <jarkko(a)kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Cc: Josh Poimboeuf <jpoimboe(a)kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ruidong Tian <tianruidong(a)linux.alibaba.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam(a)amd.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
--- a/mm/memory-failure.c~mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages
+++ a/mm/memory-failure.c
@@ -881,12 +881,17 @@ static int kill_accessing_process(struct
mmap_read_lock(p->mm);
ret = walk_page_range(p->mm, 0, TASK_SIZE, &hwpoison_walk_ops,
(void *)&priv);
+ /*
+ * ret = 1 when CMCI wins, regardless of whether try_to_unmap()
+ * succeeds or fails, then kill the process with SIGBUS.
+ * ret = 0 when poison page is a clean page and it's dropped, no
+ * SIGBUS is needed.
+ */
if (ret == 1 && priv.tk.addr)
kill_proc(&priv.tk, pfn, flags);
- else
- ret = 0;
mmap_read_unlock(p->mm);
- return ret > 0 ? -EHWPOISON : -EFAULT;
+
+ return ret > 0 ? -EHWPOISON : 0;
}
/*
_
Patches currently in -mm which might be from xueshuai(a)linux.alibaba.com are
x86-mce-use-is_copy_from_user-to-determine-copy-from-user-context.patch
mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages.patch
mm-memory-failure-enhance-comments-for-return-value-of-memory_failure.patch
The patch titled
Subject: x86/mce: use is_copy_from_user() to determine copy-from-user context
has been added to the -mm mm-unstable branch. Its filename is
x86-mce-use-is_copy_from_user-to-determine-copy-from-user-context.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Shuai Xue <xueshuai(a)linux.alibaba.com>
Subject: x86/mce: use is_copy_from_user() to determine copy-from-user context
Date: Wed, 12 Mar 2025 19:28:50 +0800
Patch series "mm/hwpoison: Fix regressions in memory failure handling",
v4.
## 1. What am I trying to do:
This patchset resolves two critical regressions related to memory failure
handling that have appeared in the upstream kernel since version 5.17, as
compared to 5.10 LTS.
- copyin case: poison found in user page while kernel copying from user space
- instr case: poison found while instruction fetching in user space
## 2. What is the expected outcome and why
- For copyin case:
The kernel can recover from poison found while it is doing get_user() or
copy_from_user() if those places get an error return and the kernel returns
-EFAULT to the process instead of crashing. More specifically, the MCE
handler checks the fixup handler type to decide whether an in-kernel #MC can
be recovered. When EX_TYPE_UACCESS is found, the PC jumps to the recovery
code specified in _ASM_EXTABLE_FAULT() and -EFAULT is returned to user space.
- For instr case:
If poison is found while fetching an instruction in user space, full
recovery is possible. The user process takes a #PF, and Linux allocates a
new page and fills it by reading from storage.
## 3. What actually happens and why
- For copyin case: kernel panic since v5.17
Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new
extable fixup type, EX_TYPE_EFAULT_REG, and later patches updated the
extable fixup type for copy-from-user operations, changing it from
EX_TYPE_UACCESS to EX_TYPE_EFAULT_REG. This breaks the previous
EX_TYPE_UACCESS handling when poison is found in get_user() or
copy_from_user().
- For instr case: user process is killed by a SIGBUS signal due to #CMCI
and #MCE race
When an uncorrected memory error is consumed there is a race between the
CMCI from the memory controller reporting an uncorrected error with a UCNA
signature, and the core reporting an SRAR signature machine check when
the data is about to be consumed.
### Background: why *UN*corrected errors tied to *C*MCI in Intel platform [1]
Prior to Icelake memory controllers reported patrol scrub events that
detected a previously unseen uncorrected error in memory by signaling a
broadcast machine check with an SRAO (Software Recoverable Action
Optional) signature in the machine check bank. This was overkill because
it's not an urgent problem that no core is on the verge of consuming that
bad data. It's also found that multi SRAO UCE may cause nested MCE
interrupts and finally become an IERR.
Hence, Intel downgrades the machine check bank signature of patrol scrub
from SRAO to UCNA (Uncorrected, No Action required), and signal changed to
#CMCI. Just to add to the confusion, Linux does take an action (in
uc_decode_notifier()) to try to offline the page despite the UC*NA*
signature name.
### Background: why #CMCI and #MCE race when poison is consuming in
Intel platform [1]
Having decided that CMCI/UCNA is the best action for patrol scrub errors,
the memory controller uses it for reads too. But the memory controller is
executing asynchronously from the core, and can't tell the difference
between a "real" read and a speculative read. So it will do CMCI/UCNA if
an error is found in any read.
Thus:
1) Core is clever and thinks address A is needed soon, issues a
speculative read.
2) Core finds it is going to use address A soon after sending the read
request
3) The CMCI from the memory controller is in a race with MCE from the
core that will soon try to retire the load from address A.
Quite often (because speculation has got better) the CMCI from the memory
controller is delivered before the core is committed to the instruction
reading address A, so the interrupt is taken, and Linux offlines the page
(marking it as poison).
## Why user process is killed for instr case
Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported
"not recovered"") tries to fix the noisy message "Memory error not recovered"
and skips duplicate SIGBUSs due to the race. But it also introduced a bug
in which kill_accessing_process() returns -EHWPOISON for the instr case; as
a result, kill_me_maybe() sends a SIGBUS to the user process.
# 4. The fix, in my opinion, should be:
- For copyin case:
The key point is whether the error context is in a read from user memory.
We do not care about the ex-type if we know it's a MOV reading from
userspace.
is_copy_from_user() returns true when both of the following checks are
true:
- the current instruction is copy
- source address is user memory
If copy_user is true, we set
m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
Then do_machine_check() will try fixup_exception() first.
- For instr case: let kill_accessing_process() return 0 to prevent a SIGBUS.
- For patch 3:
The return value of memory_failure() turned out to be quite important while
discussing the instr case regression with Tony and Miaohe for patch 2, so
add a comment about the return value.
This patch (of 3):
Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new
extable fixup type, EX_TYPE_EFAULT_REG, and later patches updated the
extable fixup type for copy-from-user operations, changing it from
EX_TYPE_UACCESS to EX_TYPE_EFAULT_REG. Consequently, the error context
for copy-from-user operations no longer functions as an in-kernel
recovery context, resulting in kernel panics with the message:
"Machine check: Data load in unrecoverable area of kernel."
To address this, it is crucial to identify whether an error context
involves a read operation from user memory. is_copy_from_user() checks
that:
- the current instruction is a copy, and
- the source address is user memory.
When both conditions are met, is_copy_from_user() returns true,
confirming that it is indeed a direct copy from user memory. This check
is essential for correctly handling the context of errors in these
operations without relying on the extable fixup types that previously
allowed for in-kernel recovery.
So, use is_copy_from_user() directly to determine whether the context is a
copy from user memory.
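The effect on the decision logic can be sketched in userspace. This is a
simplified model, not the kernel's error_context(): the enum values, the
KFLAG_* bits and the two function names are hypothetical stand-ins for the
EX_TYPE_*, MCE_IN_KERNEL_* and IN_KERNEL* names used in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's fixup types and contexts. */
enum fixup_type { FIXUP_OTHER, FIXUP_UACCESS, FIXUP_EFAULT_REG, FIXUP_MCE_SAFE };
enum context { CTX_IN_KERNEL, CTX_IN_KERNEL_RECOV };

#define KFLAG_COPYIN	0x1u	/* models MCE_IN_KERNEL_COPYIN */
#define KFLAG_RECOV	0x2u	/* models MCE_IN_KERNEL_RECOV */

/* Old logic: a user copy is only recognised via the UACCESS fixup type. */
static enum context error_context_old(enum fixup_type type, bool copy_user,
				      unsigned int *kflags)
{
	assert(kflags != NULL);
	switch (type) {
	case FIXUP_UACCESS:
		if (!copy_user)
			return CTX_IN_KERNEL;
		*kflags |= KFLAG_COPYIN;
		/* fall through */
	case FIXUP_MCE_SAFE:
		*kflags |= KFLAG_RECOV;
		return CTX_IN_KERNEL_RECOV;
	default:
		return CTX_IN_KERNEL;
	}
}

/* New logic: the copy-from-user check comes first, whatever the fixup type. */
static enum context error_context_new(enum fixup_type type, bool copy_user,
				      unsigned int *kflags)
{
	assert(kflags != NULL);
	if (copy_user) {
		*kflags |= KFLAG_COPYIN | KFLAG_RECOV;
		return CTX_IN_KERNEL_RECOV;
	}
	switch (type) {
	case FIXUP_MCE_SAFE:
		*kflags |= KFLAG_RECOV;
		return CTX_IN_KERNEL_RECOV;
	default:
		return CTX_IN_KERNEL;
	}
}
```

In this model, a user copy whose fixup type is FIXUP_EFAULT_REG is
classified as unrecoverable by the old logic and as recoverable, with both
flags set, by the new logic.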
Link: https://lkml.kernel.org/r/20250312112852.82415-1-xueshuai@linux.alibaba.com
Link: https://lkml.kernel.org/r/20250312112852.82415-2-xueshuai@linux.alibaba.com
Fixes: 4c132d1d844a ("x86/futex: Remove .fixup usage")
Signed-off-by: Shuai Xue <xueshuai(a)linux.alibaba.com>
Suggested-by: Peter Zijlstra <peterz(a)infradead.org>
Acked-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Tested-by: Tony Luck <tony.luck(a)intel.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Josh Poimboeuf <jpoimboe(a)kernel.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Ruidong Tian <tianruidong(a)linux.alibaba.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam(a)amd.com>
Cc: Jane Chu <jane.chu(a)oracle.com>
Cc: Jarkko Sakkinen <jarkko(a)kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kernel/cpu/mce/severity.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
--- a/arch/x86/kernel/cpu/mce/severity.c~x86-mce-use-is_copy_from_user-to-determine-copy-from-user-context
+++ a/arch/x86/kernel/cpu/mce/severity.c
@@ -300,13 +300,12 @@ static noinstr int error_context(struct
copy_user = is_copy_from_user(regs);
instrumentation_end();
- switch (fixup_type) {
- case EX_TYPE_UACCESS:
- if (!copy_user)
- return IN_KERNEL;
- m->kflags |= MCE_IN_KERNEL_COPYIN;
- fallthrough;
+ if (copy_user) {
+ m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
+ return IN_KERNEL_RECOV;
+ }
+ switch (fixup_type) {
case EX_TYPE_FAULT_MCE_SAFE:
case EX_TYPE_DEFAULT_MCE_SAFE:
m->kflags |= MCE_IN_KERNEL_RECOV;
_
Patches currently in -mm which might be from xueshuai(a)linux.alibaba.com are
x86-mce-use-is_copy_from_user-to-determine-copy-from-user-context.patch
mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages.patch
mm-memory-failure-enhance-comments-for-return-value-of-memory_failure.patch
The patch titled
Subject: mm: add missing release barrier on PGDAT_RECLAIM_LOCKED unlock
has been added to the -mm mm-unstable branch. Its filename is
mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Subject: mm: add missing release barrier on PGDAT_RECLAIM_LOCKED unlock
Date: Wed, 12 Mar 2025 10:10:13 -0400
The PGDAT_RECLAIM_LOCKED bit is used to provide mutual exclusion of node
reclaim for struct pglist_data using a single bit.
It is "locked" with a test_and_set_bit (similarly to a try lock) which
provides full ordering with respect to loads and stores done within
__node_reclaim().
It is "unlocked" with clear_bit(), which does not provide any ordering
with respect to loads and stores done before clearing the bit.
The lack of clear_bit() memory ordering with respect to stores within
__node_reclaim() can cause a subsequent CPU to fail to observe stores from
a prior node reclaim. This is not an issue in practice on TSO (e.g.
x86), but it is an issue on weakly-ordered architectures (e.g. arm64).
Fix this by using clear_bit_unlock rather than clear_bit to clear
PGDAT_RECLAIM_LOCKED with a release memory ordering semantic.
This provides stronger memory ordering (release rather than relaxed).
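The locking pattern can be illustrated with a userspace analogue built on
C11 atomics. This is a sketch only: it uses atomic_flag in place of the
kernel's bit operations, and reclaim_trylock(), reclaim_unlock() and
do_reclaim() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag reclaim_locked = ATOMIC_FLAG_INIT;
static int reclaim_state;	/* plain data protected by the flag */
static bool in_reclaim;		/* plain data, used to check exclusion */

/* Try-lock: a sequentially consistent test-and-set, true if we got it. */
static bool reclaim_trylock(void)
{
	return !atomic_flag_test_and_set(&reclaim_locked);
}

/*
 * Unlock with release ordering, so that stores made while holding the
 * flag are visible to whoever takes it next. A relaxed clear would not
 * give that guarantee on weakly-ordered hardware.
 */
static void reclaim_unlock(void)
{
	atomic_flag_clear_explicit(&reclaim_locked, memory_order_release);
}

/* Returns false if someone else holds the flag, true after doing work. */
static bool do_reclaim(void)
{
	if (!reclaim_trylock())
		return false;
	assert(!in_reclaim);	/* nobody else may be inside */
	in_reclaim = true;
	reclaim_state++;
	in_reclaim = false;
	reclaim_unlock();
	return true;
}
```

Here the release store in reclaim_unlock() plays the role that
clear_bit_unlock() plays in the patch, and the test-and-set plays the role
of test_and_set_bit().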
Link: https://lkml.kernel.org/r/20250312141014.129725-1-mathieu.desnoyers@efficio…
Fixes: d773ed6b856a ("mm: test and set zone reclaim lock before starting reclaim")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea(a)gmail.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Jade Alglave <j.alglave(a)ucl.ac.uk>
Cc: Luc Maranget <luc.maranget(a)inria.fr>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/vmscan.c~mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock
+++ a/mm/vmscan.c
@@ -7581,7 +7581,7 @@ int node_reclaim(struct pglist_data *pgd
return NODE_RECLAIM_NOSCAN;
ret = __node_reclaim(pgdat, gfp_mask, order);
- clear_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
+ clear_bit_unlock(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
if (ret)
count_vm_event(PGSCAN_ZONE_RECLAIM_SUCCESS);
_
Patches currently in -mm which might be from mathieu.desnoyers(a)efficios.com are
mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock.patch
mm-lock-pgdat_reclaim_locked-with-acquire-memory-ordering.patch
The patch titled
Subject: mm/userfaultfd: Fix release hang over concurrent GUP
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-userfaultfd-fix-release-hang-over-concurrent-gup.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/userfaultfd: Fix release hang over concurrent GUP
Date: Wed, 12 Mar 2025 10:51:31 -0400
This patch should fix a possible userfaultfd release() hang during
concurrent GUP.
This problem was initially reported by Dimitris Siakavaras in July 2023
[1] in a firecracker use case. Firecracker has a separate process
handling page faults remotely, and when the process releases the
userfaultfd it can race with a concurrent GUP from KVM trying to fault in
a guest page during the secondary MMU page fault process.
A similar problem was reported recently again by Jinjiang Tu in March 2025
[2], even though the race happened this time with a mlockall() operation,
which does GUP in a similar fashion.
In 2017, commit 656710a60e36 ("userfaultfd: non-cooperative: closing the
uffd without triggering SIGBUS") tried to fix this issue. AFAIU, that
fixes the fault paths well but may not yet work for GUP. In GUP, the
issue is that NOPAGE is treated almost the same as "page fault resolved"
in faultin_page(); GUP then follows the page again, sees the page still
missing, and keeps looping in a live lock situation as reported.
This change makes core mm return RETRY instead of NOPAGE for both the GUP
and fault paths, proactively releasing the mmap read lock. This should
guarantee that the other thread doing the release makes progress on taking
the write lock, and avoids the live lock even for GUP.
While at it, rearrange the comments to make sure they are up to date.
[1] https://lore.kernel.org/r/79375b71-db2e-3e66-346b-254c90d915e2@cslab.ece.nt…
[2] https://lore.kernel.org/r/20250307072133.3522652-1-tujinjiang@huawei.com
Link: https://lkml.kernel.org/r/20250312145131.1143062-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Rapoport (IBM) <rppt(a)kernel.org>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Jinjiang Tu <tujinjiang(a)huawei.com>
Cc: Dimitris Siakavaras <jimsiak(a)cslab.ece.ntua.gr>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 51 ++++++++++++++++++++++-----------------------
1 file changed, 25 insertions(+), 26 deletions(-)
--- a/fs/userfaultfd.c~mm-userfaultfd-fix-release-hang-over-concurrent-gup
+++ a/fs/userfaultfd.c
@@ -396,32 +396,6 @@ vm_fault_t handle_userfault(struct vm_fa
goto out;
/*
- * If it's already released don't get it. This avoids to loop
- * in __get_user_pages if userfaultfd_release waits on the
- * caller of handle_userfault to release the mmap_lock.
- */
- if (unlikely(READ_ONCE(ctx->released))) {
- /*
- * Don't return VM_FAULT_SIGBUS in this case, so a non
- * cooperative manager can close the uffd after the
- * last UFFDIO_COPY, without risking to trigger an
- * involuntary SIGBUS if the process was starting the
- * userfaultfd while the userfaultfd was still armed
- * (but after the last UFFDIO_COPY). If the uffd
- * wasn't already closed when the userfault reached
- * this point, that would normally be solved by
- * userfaultfd_must_wait returning 'false'.
- *
- * If we were to return VM_FAULT_SIGBUS here, the non
- * cooperative manager would be instead forced to
- * always call UFFDIO_UNREGISTER before it can safely
- * close the uffd.
- */
- ret = VM_FAULT_NOPAGE;
- goto out;
- }
-
- /*
* Check that we can return VM_FAULT_RETRY.
*
* NOTE: it should become possible to return VM_FAULT_RETRY
@@ -457,6 +431,31 @@ vm_fault_t handle_userfault(struct vm_fa
if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
goto out;
+ if (unlikely(READ_ONCE(ctx->released))) {
+ /*
+ * If a concurrent release is detected, do not return
+ * VM_FAULT_SIGBUS or VM_FAULT_NOPAGE, but instead always
+ * return VM_FAULT_RETRY with lock released proactively.
+ *
+ * If we were to return VM_FAULT_SIGBUS here, the non
+ * cooperative manager would be instead forced to
+ * always call UFFDIO_UNREGISTER before it can safely
+ * close the uffd, to avoid involuntary SIGBUS triggered.
+ *
+ * If we were to return VM_FAULT_NOPAGE, it would work for
+ * the fault path, in which the lock will be released
+ * later. However for GUP, faultin_page() does nothing
+ * special on NOPAGE, so GUP would spin retrying without
+ * releasing the mmap read lock, causing possible livelock.
+ *
+ * Here only VM_FAULT_RETRY would make sure the mmap lock
+ * be released immediately, so that the thread concurrently
+ * releasing the userfault would always make progress.
+ */
+ release_fault_lock(vmf);
+ goto out;
+ }
+
/* take the reference before dropping the mmap_lock */
userfaultfd_ctx_get(ctx);
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-userfaultfd-fix-release-hang-over-concurrent-gup.patch
During the I3C driver probing sequence for a particular device instance,
the master code tries to access the ibi variable of dev while adding an
IBI to the queue, before that variable has been initialized. This causes
"Unable to handle kernel read from unreadable memory" and results in a
kernel panic.
Below is the sequence in which this issue happens.
1. During the boot sequence, an IBI is received at the host from the
slave device before the IBI has been requested. Usually the slave driver
requests the IBI by calling i3c_device_request_ibi() during probe.
2. The master code accesses the IBI variable for that device instance
before the slave driver has initialized it, so it reads an arbitrary
address and causes a kernel panic.
3. The slave driver invokes i3c_device_request_ibi(), where
dev->ibi = ibi; is assigned as part of the call to
i3c_dev_request_ibi_locked().
4. When the slave device sends an IBI before that assignment, the master
code accesses the variable before it is initialized, and this race
condition results in the kernel panic.
Fixes: dd3c52846d595 ("i3c: master: svc: Add Silvaco I3C master driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Manjunatha Venkatesh <manjunatha.venkatesh(a)nxp.com>
---
Changes since v2:
- Description updated as per the review feedback
drivers/i3c/master/svc-i3c-master.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/i3c/master/svc-i3c-master.c b/drivers/i3c/master/svc-i3c-master.c
index d6057d8c7dec..98c4d2e5cd8d 100644
--- a/drivers/i3c/master/svc-i3c-master.c
+++ b/drivers/i3c/master/svc-i3c-master.c
@@ -534,8 +534,11 @@ static void svc_i3c_master_ibi_work(struct work_struct *work)
switch (ibitype) {
case SVC_I3C_MSTATUS_IBITYPE_IBI:
if (dev) {
- i3c_master_queue_ibi(dev, master->ibi.tbq_slot);
- master->ibi.tbq_slot = NULL;
+ data = i3c_dev_get_master_data(dev);
+ if (master->ibi.slots[data->ibi]) {
+ i3c_master_queue_ibi(dev, master->ibi.tbq_slot);
+ master->ibi.tbq_slot = NULL;
+ }
}
svc_i3c_master_emit_stop(master);
break;
--
2.46.1
The second parameter of memblock_set_node() is a size, not an end address.
Because the loop iterates from lower to higher addresses, the node ids
end up correct, but some of them are wrong in the intermediate steps.
Pass size instead of end.
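The effect of passing an end address where a size is expected can be
illustrated with a small userspace model. This is only a sketch:
toy_set_node(), toy_count_nid(), toy_reset() and the page table are
hypothetical stand-ins, not memblock code. The only property borrowed from
the patch is that the second argument is a size.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_PAGES 16

/* Hypothetical per-page node-id table standing in for memblock state. */
static int toy_page_nid[TOY_NR_PAGES];

/* Mark every page as "no node assigned". */
static void toy_reset(void)
{
	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		toy_page_nid[i] = -1;
}

/*
 * Like memblock_set_node(), this takes a base and a SIZE. Passing an end
 * address as the second argument tags pages beyond the intended range.
 */
static void toy_set_node(size_t base, size_t size, int nid)
{
	assert(base <= TOY_NR_PAGES);
	for (size_t i = base; i < base + size && i < TOY_NR_PAGES; i++)
		toy_page_nid[i] = nid;
}

/* Count the pages currently tagged with nid. */
static size_t toy_count_nid(int nid)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		if (toy_page_nid[i] == nid)
			n++;
	return n;
}
```

For a region covering pages [4, 8), toy_set_node(4, 8, 1) tags eight pages
(4 through 11), whereas toy_set_node(4, 8 - 4, 1) tags the intended four.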
Fixes: 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
CC: Mike Rapoport <rppt(a)kernel.org>
CC: Yajun Deng <yajun.deng(a)linux.dev>
CC: <stable(a)vger.kernel.org>
---
mm/memblock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index 64ae678cd1d1..85442f1b7f14 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2192,7 +2192,7 @@ static void __init memmap_init_reserved_pages(void)
if (memblock_is_nomap(region))
reserve_bootmem_region(start, end, nid);
- memblock_set_node(start, end, &memblock.reserved, nid);
+ memblock_set_node(start, region->size, &memblock.reserved, nid);
}
/*
--
2.34.1
Hi folks,
This series fixes support for correctly saving and restoring fltcon0
and fltcon1 registers on gs101 for non-alive banks where the fltcon
register offset is not at a fixed offset (unlike previous SoCs).
This is done by adding an eint_fltcon_offset and providing GS101
specific pin macros that take an additional parameter (similar to
how exynosautov920 handles its eint_con_offset).
Additionally the SoC specific suspend and resume callbacks are
refactored so that each SoC variant has its own callback containing
the peculiarities for that SoC.
Finally, support for filter selection on alive banks is added; this is
currently only enabled for gs101. The code path can be exercised using
`echo mem > /sys/power/state`
regards,
Peter
To: Krzysztof Kozlowski <krzk(a)kernel.org>
To: Sylwester Nawrocki <s.nawrocki(a)samsung.com>
To: Alim Akhtar <alim.akhtar(a)samsung.com>
To: Linus Walleij <linus.walleij(a)linaro.org>
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: linux-samsung-soc(a)vger.kernel.org
Cc: linux-gpio(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: andre.draszik(a)linaro.org
Cc: tudor.ambarus(a)linaro.org
Cc: willmcvicker(a)google.com
Cc: semen.protsenko(a)linaro.org
Cc: kernel-team(a)android.com
Cc: jaewon02.kim(a)samsung.com
Signed-off-by: Peter Griffin <peter.griffin(a)linaro.org>
---
Changes in v4:
- save->eint_fltcon1 is an argument to pr_debug(), not readl() change alignment accordingly (Andre)
- Link to v3: https://lore.kernel.org/r/20250306-pinctrl-fltcon-suspend-v3-0-f9ab4ff6a24e…
Changes in v3:
- Ensure EXYNOS_FLTCON_DIGITAL bit is cleared (Andre)
- Make it obvious that exynos_eint_set_filter() is conditional on bank type (Andre)
- Make it obvious exynos_set_wakeup() is conditional on bank type (Andre)
- Align style where the '+' is placed first (Andre)
- Remove unnecessary braces (Andre)
- Link to v2: https://lore.kernel.org/r/20250301-pinctrl-fltcon-suspend-v2-0-a7eef9bb443b…
Changes in v2:
- Remove eint_flt_selectable bool as it can be deduced from EINT_TYPE_WKUP (Peter)
- Move filter config register comment to header file (Andre)
- Rename EXYNOS_FLTCON_DELAY to EXYNOS_FLTCON_ANALOG (Andre)
- Remove misleading old comment (Andre)
- Refactor exynos_eint_update_flt_reg() into a loop (Andre)
- Split refactor of suspend/resume callbacks & gs101 parts into separate patches (Andre)
- Link to v1: https://lore.kernel.org/r/20250120-pinctrl-fltcon-suspend-v1-0-e77900b2a854…
---
Peter Griffin (4):
pinctrl: samsung: add support for eint_fltcon_offset
pinctrl: samsung: add dedicated SoC eint suspend/resume callbacks
pinctrl: samsung: add gs101 specific eint suspend/resume callbacks
pinctrl: samsung: Add filter selection support for alive bank on gs101
drivers/pinctrl/samsung/pinctrl-exynos-arm64.c | 150 ++++++-------
drivers/pinctrl/samsung/pinctrl-exynos.c | 294 +++++++++++++++----------
drivers/pinctrl/samsung/pinctrl-exynos.h | 50 ++++-
drivers/pinctrl/samsung/pinctrl-samsung.c | 12 +-
drivers/pinctrl/samsung/pinctrl-samsung.h | 12 +-
5 files changed, 318 insertions(+), 200 deletions(-)
---
base-commit: 0761652a3b3b607787aebc386d412b1d0ae8008c
change-id: 20250120-pinctrl-fltcon-suspend-2333a137c4d4
Best regards,
--
Peter Griffin <peter.griffin(a)linaro.org>
Prepare vPMC registers for user-initiated changes after first run. This
is important specifically for debugging Windows on QEMU with GDB; QEMU
tries to write back all visible registers when resuming the VM execution
with GDB, corrupting the PMU state. Windows always uses the PMU so this
can cause adverse effects on that particular OS.
This series also contains patch "KVM: arm64: PMU: Set raw values from
user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}", which reverts semantic
changes made for the mentioned registers in the past. It is necessary
to migrate the PMU state properly on Firecracker, QEMU, and crosvm.
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v3:
- Added patch "KVM: arm64: PMU: Assume PMU presence in pmu-emul.c".
- Added an explanation of this patch series' motivation to each patch.
- Explained why userspace register writes and register reset should be
covered in patch "KVM: arm64: PMU: Reload when user modifies
registers".
- Marked patch "KVM: arm64: PMU: Set raw values from user to
PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}" for stable.
- Reordered so that patch "KVM: arm64: PMU: Set raw values from user to
PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}" would come first.
- Added patch "KVM: arm64: PMU: Call kvm_pmu_handle_pmcr() after masking
PMCNTENSET_EL0".
- Added patch "KVM: arm64: Reload PMCNTENSET_EL0".
- Link to v2: https://lore.kernel.org/r/20250307-pmc-v2-0-6c3375a5f1e4@daynix.com
Changes in v2:
- Changed to utilize KVM_REQ_RELOAD_PMU as suggested by Oliver Upton.
- Added patch "KVM: arm64: PMU: Reload when user modifies registers"
to cover more registers.
- Added patch "KVM: arm64: PMU: Set raw values from user to
PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}".
- Link to v1: https://lore.kernel.org/r/20250302-pmc-v1-1-caff989093dc@daynix.com
---
Akihiko Odaki (6):
KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs
KVM: arm64: PMU: Reload when user modifies registers
KVM: arm64: PMU: Call kvm_pmu_handle_pmcr() after masking PMCNTENSET_EL0
KVM: arm64: Reload PMCNTENSET_EL0
arch/arm64/kvm/arm.c | 8 ++++---
arch/arm64/kvm/guest.c | 12 +++++++++++
arch/arm64/kvm/pmu-emul.c | 54 ++++++++++++++++-------------------------------
arch/arm64/kvm/sys_regs.c | 53 ++++++++++++++++++++++++++--------------------
include/kvm/arm_pmu.h | 1 +
5 files changed, 66 insertions(+), 62 deletions(-)
---
base-commit: da2f480cb24d39d480b1e235eda0dd2d01f8765b
change-id: 20250302-pmc-b90a86af945c
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
From: Amit Sunil Dhamne <amitsd(a)google.com>
A subtle error was introduced while manually fixing a merge conflict in
tcpm.c for commit 85c4efbe6088 ("Merge v6.12-rc6 into usb-next"). As a
result of this error, the next state is unconditionally set to
SNK_WAIT_CAPABILITIES_TIMEOUT while handling SNK_WAIT_CAPABILITIES state
in run_state_machine(...).
Fix this by setting the new state of the TCPM state machine to
`upcoming_state` (which is set to different values based on conditions).
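The intended control flow can be sketched outside the kernel as "choose the
next state in the branches, then commit it once". Everything below is a
hypothetical reduction: the toy_* names, the three-value enum and the
TOY_HARD_RESET stand-in for the result of hard_reset_state() are
assumptions for illustration, and the delay argument of tcpm_set_state()
is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of states; not the real enum tcpm_state. */
enum toy_state {
	TOY_SNK_SOFT_RESET,
	TOY_SNK_WAIT_CAPABILITIES_TIMEOUT,
	TOY_HARD_RESET,	/* stand-in for whatever hard_reset_state() returns */
};

struct toy_port {
	bool vbus_never_low;
	bool self_powered;
	enum toy_state state;
};

/* Record the chosen state; the real tcpm_set_state() also takes a delay. */
static void toy_set_state(struct toy_port *port, enum toy_state state)
{
	assert(port);
	port->state = state;
}

/* Pick the next state in the branches, then commit it exactly once. */
static void toy_handle_wait_capabilities(struct toy_port *port)
{
	enum toy_state upcoming_state;

	if (port->vbus_never_low) {
		port->vbus_never_low = false;
		upcoming_state = TOY_SNK_SOFT_RESET;
	} else if (!port->self_powered) {
		upcoming_state = TOY_SNK_WAIT_CAPABILITIES_TIMEOUT;
	} else {
		upcoming_state = TOY_HARD_RESET;
	}

	toy_set_state(port, upcoming_state);
}
```

Committing the state in a single place is what keeps a stray call from
overriding the value chosen in the branches.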
Cc: stable(a)vger.kernel.org
Fixes: 85c4efbe60888 ("Merge v6.12-rc6 into usb-next")
Signed-off-by: Amit Sunil Dhamne <amitsd(a)google.com>
Reviewed-by: Badhri Jagan Sridharan <badhri(a)google.com>
---
drivers/usb/typec/tcpm/tcpm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index 47be450d2be352698e9dee2e283664cd4db8081b..758933d4ac9e4e55d45940b068f3c416e7e51ee8 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -5117,16 +5117,16 @@ static void run_state_machine(struct tcpm_port *port)
*/
if (port->vbus_never_low) {
port->vbus_never_low = false;
- tcpm_set_state(port, SNK_SOFT_RESET,
- port->timings.sink_wait_cap_time);
+ upcoming_state = SNK_SOFT_RESET;
} else {
if (!port->self_powered)
upcoming_state = SNK_WAIT_CAPABILITIES_TIMEOUT;
else
upcoming_state = hard_reset_state(port);
- tcpm_set_state(port, SNK_WAIT_CAPABILITIES_TIMEOUT,
- port->timings.sink_wait_cap_time);
}
+
+ tcpm_set_state(port, upcoming_state,
+ port->timings.sink_wait_cap_time);
break;
case SNK_WAIT_CAPABILITIES_TIMEOUT:
/*
---
base-commit: 5c8c229261f14159b54b9a32f12e5fa89d88b905
change-id: 20250310-fix-snk-wait-timeout-v6-14-rc6-7b4d9fb9bc99
Best regards,
--
Amit Sunil Dhamne <amitsd(a)google.com>
This series addresses GPU reset issues reported in [1], where running a
long compute job would trigger repeated GPU resets, leading to a UI
freeze.
Patches #1 and #2 prevent the same faulty job from being resubmitted in a
loop, mitigating the first cause of the issue.
However, the issue isn't entirely solved. Even with only a single GPU
reset, the UI still freezes on the Raspberry Pi 5, indicating a GPU hang.
Patches #3 to #6 address this by properly configuring the V3D_SMS
registers, which are required for power management and resets in V3D 7.1.
Patch #7 updates the DT maintainership, replacing Emma with the current
v3d driver maintainer.
[1] https://github.com/raspberrypi/linux/issues/6660
Best Regards,
- Maíra
---
v1 -> v2:
- [1/6, 2/6, 5/6] Add Iago's R-b (Iago Toral)
- [3/6] Use V3D_GEN_* macros consistently throughout the driver (Phil Elwell)
- [3/6] Don't add Iago's R-b in 3/6 due to changes in the patch
- [4/6] Add per-compatible restrictions to enforce per‐SoC register rules (Conor Dooley)
- [6/6] Add Emma's A-b, collected through IRC (Emma Anholt)
- [6/6] Add Rob's A-b (Rob Herring)
- Link to v1: https://lore.kernel.org/r/20250226-v3d-gpu-reset-fixes-v1-0-83a969fdd9c1@ig…
v2 -> v3:
- [3/7] Add Iago's R-b (Iago Toral)
- [4/7, 5/7] Separate the patches to ease the reviewing process -> Now,
PATCH 4/7 only adds the per-compatible rules and PATCH 5/7 adds the
SMS registers
- [4/7] `allOf` goes above `additionalProperties` (Krzysztof Kozlowski)
- [4/7, 5/7] Sync `reg` and `reg-names` items (Krzysztof Kozlowski)
- Link to v2: https://lore.kernel.org/r/20250308-v3d-gpu-reset-fixes-v2-0-2939c30f0cc4@ig…
---
Maíra Canal (7):
drm/v3d: Don't run jobs that have errors flagged in its fence
drm/v3d: Set job pointer to NULL when the job's fence has an error
drm/v3d: Associate a V3D tech revision to all supported devices
dt-bindings: gpu: v3d: Add per-compatible register restrictions
dt-bindings: gpu: v3d: Add SMS register to BCM2712 compatible
drm/v3d: Use V3D_SMS registers for power on/off and reset on V3D 7.x
dt-bindings: gpu: Add V3D driver maintainer as DT maintainer
.../devicetree/bindings/gpu/brcm,bcm-v3d.yaml | 77 +++++++++++--
drivers/gpu/drm/v3d/v3d_debugfs.c | 126 ++++++++++-----------
drivers/gpu/drm/v3d/v3d_drv.c | 62 +++++++++-
drivers/gpu/drm/v3d/v3d_drv.h | 22 +++-
drivers/gpu/drm/v3d/v3d_gem.c | 27 ++++-
drivers/gpu/drm/v3d/v3d_irq.c | 6 +-
drivers/gpu/drm/v3d/v3d_perfmon.c | 4 +-
drivers/gpu/drm/v3d/v3d_regs.h | 26 +++++
drivers/gpu/drm/v3d/v3d_sched.c | 29 ++++-
9 files changed, 281 insertions(+), 98 deletions(-)
---
base-commit: 9e75b6ef407fee5d4ed8021cd7ddd9d6a8f7b0e8
change-id: 20250224-v3d-gpu-reset-fixes-2d21fc70711d
From: Ard Biesheuvel <ardb(a)kernel.org>
GCC and Clang both implement stack protector support based on Thread
Local Storage (TLS) variables, and this is used in the kernel to
implement per-task stack cookies, by copying a task's stack cookie into
a per-CPU variable every time it is scheduled in.
Both now also implement -mstack-protector-guard-symbol=, which permits
the TLS variable to be specified directly. This is useful because it
will allow us to move away from using a fixed offset of 40 bytes into
the per-CPU area on x86_64, which requires a lot of special handling in
the per-CPU code and the runtime relocation code.
However, while GCC is rather lax in its implementation of this command
line option, Clang actually requires that the provided symbol name
refers to a TLS variable (i.e., one declared with __thread), although it
also permits the variable to be undeclared entirely, in which case it
will use an implicit declaration of the right type.
The upshot of this is that Clang will emit the correct references to the
stack cookie variable in most cases, e.g.,
10d: 64 a1 00 00 00 00 mov %fs:0x0,%eax
10f: R_386_32 __stack_chk_guard
However, if a non-TLS definition of the symbol in question is visible in
the same compilation unit (which amounts to the whole of vmlinux if LTO
is enabled), it will drop the per-CPU prefix and emit a load from a
bogus address.
Work around this by using a symbol name that never occurs in C code, and
emit it as an alias in the linker script.
Fixes: 3fb0fdb3bbe7 ("x86/stackprotector/32: Make the canary into a regular percpu variable")
Cc: <stable(a)vger.kernel.org>
Cc: Fangrui Song <i(a)maskray.me>
Cc: Uros Bizjak <ubizjak(a)gmail.com>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1854
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Brian Gerst <brgerst(a)gmail.com>
---
arch/x86/Makefile | 5 +++--
arch/x86/entry/entry.S | 16 ++++++++++++++++
arch/x86/include/asm/asm-prototypes.h | 3 +++
arch/x86/kernel/cpu/common.c | 2 ++
arch/x86/kernel/vmlinux.lds.S | 3 +++
5 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index cd75e78a06c1..5b773b34768d 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -142,9 +142,10 @@ ifeq ($(CONFIG_X86_32),y)
ifeq ($(CONFIG_STACKPROTECTOR),y)
ifeq ($(CONFIG_SMP),y)
- KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard
+ KBUILD_CFLAGS += -mstack-protector-guard-reg=fs \
+ -mstack-protector-guard-symbol=__ref_stack_chk_guard
else
- KBUILD_CFLAGS += -mstack-protector-guard=global
+ KBUILD_CFLAGS += -mstack-protector-guard=global
endif
endif
else
diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index 324686bca368..b7ea3e8e9ecc 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -51,3 +51,19 @@ EXPORT_SYMBOL_GPL(mds_verw_sel);
.popsection
THUNK warn_thunk_thunk, __warn_thunk
+
+#ifndef CONFIG_X86_64
+/*
+ * Clang's implementation of TLS stack cookies requires the variable in
+ * question to be a TLS variable. If the variable happens to be defined as an
+ * ordinary variable with external linkage in the same compilation unit (which
+ * amounts to the whole of vmlinux with LTO enabled), Clang will drop the
+ * segment register prefix from the references, resulting in broken code. Work
+ * around this by avoiding the symbol used in -mstack-protector-guard-symbol=
+ * entirely in the C code, and use an alias emitted by the linker script
+ * instead.
+ */
+#ifdef CONFIG_STACKPROTECTOR
+EXPORT_SYMBOL(__ref_stack_chk_guard);
+#endif
+#endif
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 25466c4d2134..3674006e3974 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -20,3 +20,6 @@
extern void cmpxchg8b_emu(void);
#endif
+#if defined(__GENKSYMS__) && defined(CONFIG_STACKPROTECTOR)
+extern unsigned long __ref_stack_chk_guard;
+#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8f41ab219cf1..9d42bd15e06c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2091,8 +2091,10 @@ void syscall_init(void)
#ifdef CONFIG_STACKPROTECTOR
DEFINE_PER_CPU(unsigned long, __stack_chk_guard);
+#ifndef CONFIG_SMP
EXPORT_PER_CPU_SYMBOL(__stack_chk_guard);
#endif
+#endif
#endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 410546bacc0f..d61c3584f3e6 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -468,6 +468,9 @@ SECTIONS
. = ASSERT((_end - LOAD_OFFSET <= KERNEL_IMAGE_SIZE),
"kernel image bigger than KERNEL_IMAGE_SIZE");
+/* needed for Clang - see arch/x86/entry/entry.S */
+PROVIDE(__ref_stack_chk_guard = __stack_chk_guard);
+
#ifdef CONFIG_X86_64
/*
* Per-cpu symbols which need to be offset from __per_cpu_load
--
2.47.0
From: Mark Brown <broonie(a)kernel.org>
commit 1601033da2dd2052e0489137f7788a46a8fcd82f upstream.
The controls allow inputs to be specified as negative, but our manipulation
of them into register fields needs to be done on unsigned variables, so the
checks for negative numbers weren't taking effect properly. Do the checks
for negative values on the variable in the ABI struct rather than on our
local unsigned copy.
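A small userspace sketch of why the check has to happen on the signed
source: once a negative value has been copied into an unsigned variable it
is a large non-negative number, so a "less than zero" test on the copy can
never fire. The toy_* helpers and the single max bound are hypothetical
and much simpler than the real snd_soc_put_volsw() logic.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* What an unsigned local copy holds after assignment from the signed input. */
static unsigned int toy_unsigned_copy(long user_value)
{
	return (unsigned int)user_value;
}

/*
 * Hypothetical validator following the patched order: reject negative input
 * on the signed source first, then range-check the unsigned copy.
 */
static int toy_put_value(long user_value, unsigned int max, unsigned int *out)
{
	unsigned int val;

	assert(out);
	if (user_value < 0)
		return -EINVAL;

	val = toy_unsigned_copy(user_value);
	if (val > max)
		return -EINVAL;

	*out = val;
	return 0;
}
```

With these helpers, toy_unsigned_copy(-1) yields UINT_MAX, while
toy_put_value(-1, 100, &out) rejects the input before any conversion
takes place.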
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Link: https://lore.kernel.org/r/20220128192443.3504823-1-broonie@kernel.org
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Dmitriy Privalov <d.privalov(a)omp.ru>
---
sound/soc/soc-ops.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/sound/soc/soc-ops.c b/sound/soc/soc-ops.c
index a83cd8d8a9633..a1087dfee532d 100644
--- a/sound/soc/soc-ops.c
+++ b/sound/soc/soc-ops.c
@@ -316,26 +316,26 @@ int snd_soc_put_volsw(struct snd_kcontrol *kcontrol,
if (sign_bit)
mask = BIT(sign_bit + 1) - 1;
+ if (ucontrol->value.integer.value[0] < 0)
+ return -EINVAL;
val = ucontrol->value.integer.value[0];
if (mc->platform_max && ((int)val + min) > mc->platform_max)
return -EINVAL;
if (val > max - min)
return -EINVAL;
- if (val < 0)
- return -EINVAL;
val = (val + min) & mask;
if (invert)
val = max - val;
val_mask = mask << shift;
val = val << shift;
if (snd_soc_volsw_is_stereo(mc)) {
+ if (ucontrol->value.integer.value[1] < 0)
+ return -EINVAL;
val2 = ucontrol->value.integer.value[1];
if (mc->platform_max && ((int)val2 + min) > mc->platform_max)
return -EINVAL;
if (val2 > max - min)
return -EINVAL;
- if (val2 < 0)
- return -EINVAL;
val2 = (val2 + min) & mask;
if (invert)
val2 = max - val2;
@@ -429,13 +429,13 @@ int snd_soc_put_volsw_sx(struct snd_kcontrol *kcontrol,
int err = 0;
unsigned int val, val_mask, val2 = 0;
+ if (ucontrol->value.integer.value[0] < 0)
+ return -EINVAL;
val = ucontrol->value.integer.value[0];
if (mc->platform_max && val > mc->platform_max)
return -EINVAL;
if (val > max)
return -EINVAL;
- if (val < 0)
- return -EINVAL;
val_mask = mask << shift;
val = (val + min) & mask;
val = val << shift;
--
2.34.1
The array for the iomapping cookie addresses has a length of
PCI_STD_NUM_BARS. This constant, however, only describes standard BARs,
while PCI can allow for additional, special BARs.
The total number of PCI resources is described by constant
PCI_NUM_RESOURCES, which is also used in, e.g., pci_select_bars().
Thus, the devres array has so far been too small.
Change the length of the devres array to PCI_NUM_RESOURCES.
Cc: <stable(a)vger.kernel.org> # v6.11+
Fixes: bbaff68bf4a4 ("PCI: Add managed partial-BAR request and map infrastructure")
Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
---
drivers/pci/devres.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 3431a7df3e0d..728ed0c7f70a 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -40,7 +40,7 @@
* Legacy struct storing addresses to whole mapped BARs.
*/
struct pcim_iomap_devres {
- void __iomem *table[PCI_STD_NUM_BARS];
+ void __iomem *table[PCI_NUM_RESOURCES];
};
/* Used to restore the old INTx state on driver detach. */
--
2.48.1
From: Mario Limonciello <mario.limonciello(a)amd.com>
[WHY]
DMUB locking is important to make sure that registers aren't accessed
while in PSR. Previously it was enabled but caused a deadlock in
situations with multiple eDP panels.
[HOW]
Detect whether multiple eDP panels are in use to decide whether to use
the lock. Order the checks so that PSR-SU and replay are tested first,
which avoids having to look up the number of eDP panels for those
configurations.
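The ordering of the checks can be sketched as follows. This is a
simplified model under stated assumptions: struct toy_link, the
toy_psr_version enum and the edp_count field are hypothetical. In the
driver the panel count is obtained through dc_get_edp_links() rather than
read from a field.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced view of the link state consulted by the check. */
enum toy_psr_version { TOY_PSR_NONE, TOY_PSR_1, TOY_PSR_SU_1 };

struct toy_link {
	enum toy_psr_version psr_version;
	bool replay_enabled;
	int edp_count;	/* number of eDP panels on the system */
};

static bool toy_should_use_lock(const struct toy_link *link)
{
	assert(link);

	/* Cheap checks first: these never need the eDP panel count. */
	if (link->psr_version == TOY_PSR_SU_1)
		return true;
	if (link->replay_enabled)
		return true;

	/* PSR1 takes the lock only when a single eDP panel is present. */
	if (link->psr_version == TOY_PSR_1)
		return link->edp_count == 1;

	return false;
}
```

A PSR1 link on a system with two eDP panels therefore skips the lock,
which is the configuration that previously deadlocked.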
Fixes: 06fbedfaf1a9 ("Revert "drm/amd/display: Use HW lock mgr for PSR1"")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3965
Cc: stable(a)vger.kernel.org
Reviewed-by: ChiaHsuan Chung <chiahsuan.chung(a)amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Alex Hung <alex.hung(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/dce/dmub_hw_lock_mgr.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/dce/dmub_hw_lock_mgr.c b/drivers/gpu/drm/amd/display/dc/dce/dmub_hw_lock_mgr.c
index bf636b28e3e1..6e2fce329d73 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dmub_hw_lock_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dmub_hw_lock_mgr.c
@@ -69,5 +69,16 @@ bool should_use_dmub_lock(struct dc_link *link)
if (link->replay_settings.replay_feature_enabled)
return true;
+ /* only use HW lock for PSR1 on single eDP */
+ if (link->psr_settings.psr_version == DC_PSR_VERSION_1) {
+ struct dc_link *edp_links[MAX_NUM_EDP];
+ int edp_num;
+
+ dc_get_edp_links(link->dc, edp_links, &edp_num);
+
+ if (edp_num == 1)
+ return true;
+ }
+
return false;
}
--
2.43.0
The patch titled
Subject: mm/page_alloc: fix memory accept before watermarks gets initialized
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-page_alloc-fix-memory-accept-before-watermarks-gets-initialized.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/page_alloc: fix memory accept before watermarks gets initialized
Date: Mon, 10 Mar 2025 10:28:55 +0200
Watermarks are initialized during the postcore initcall. Until then, all
watermarks are set to zero. This causes cond_accept_memory() to
incorrectly skip memory acceptance because a watermark of 0 is always met.
This can lead to a premature OOM on boot.
To ensure progress, accept one MAX_ORDER page if the watermark is zero.
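The decision described above can be modelled in userspace roughly as
follows. This is a sketch, not the kernel code: struct toy_zone, the
toy_* helpers and the chunk size of 1024 pages are assumptions, and the
model ignores the unusable-free and order handling of the real
cond_accept_memory().

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ORDER_NR_PAGES 1024L	/* hypothetical chunk size */

struct toy_zone {
	long promo_wmark;	/* 0 until watermarks are initialized */
	long free_pages;	/* accepted, usable pages */
	long unaccepted;	/* pages still waiting to be accepted */
};

/* Accept one chunk of unaccepted memory; false if nothing is left. */
static bool toy_accept_one(struct toy_zone *zone)
{
	long n;

	if (zone->unaccepted <= 0)
		return false;
	n = zone->unaccepted < TOY_MAX_ORDER_NR_PAGES ?
		zone->unaccepted : TOY_MAX_ORDER_NR_PAGES;
	zone->unaccepted -= n;
	zone->free_pages += n;
	return true;
}

static bool toy_cond_accept(struct toy_zone *zone)
{
	long to_accept, wmark;
	bool ret = false;

	assert(zone);
	if (zone->unaccepted <= 0)
		return false;

	wmark = zone->promo_wmark;

	/* Watermarks not initialized yet: accept one chunk to make progress. */
	if (!wmark)
		return toy_accept_one(zone);

	/* Otherwise accept chunks until the free pages reach the watermark. */
	to_accept = wmark - zone->free_pages;
	while (to_accept > 0) {
		if (!toy_accept_one(zone))
			break;
		ret = true;
		to_accept -= TOY_MAX_ORDER_NR_PAGES;
	}
	return ret;
}
```

Without the zero-watermark branch, a zone with promo_wmark == 0 would
compute a non-positive to_accept and never accept anything, which matches
the premature OOM described above.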
Link: https://lkml.kernel.org/r/20250310082855.2587122-1-kirill.shutemov@linux.in…
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Tested-by: Farrah Chen <farrah.chen(a)intel.com>
Reported-by: Farrah Chen <farrah.chen(a)intel.com>
Cc: Ashish Kalra <ashish.kalra(a)amd.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe(a)intel.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: "Mike Rapoport (IBM)" <rppt(a)kernel.org>
Cc: Thomas Lendacky <thomas.lendacky(a)amd.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-fix-memory-accept-before-watermarks-gets-initialized
+++ a/mm/page_alloc.c
@@ -7004,7 +7004,7 @@ static inline bool has_unaccepted_memory
static bool cond_accept_memory(struct zone *zone, unsigned int order)
{
- long to_accept;
+ long to_accept, wmark;
bool ret = false;
if (!has_unaccepted_memory())
@@ -7013,8 +7013,18 @@ static bool cond_accept_memory(struct zo
if (list_empty(&zone->unaccepted_pages))
return false;
+ wmark = promo_wmark_pages(zone);
+
+ /*
+ * Watermarks have not been initialized yet.
+ *
+ * Accepting one MAX_ORDER page to ensure progress.
+ */
+ if (!wmark)
+ return try_to_accept_memory_one(zone);
+
/* How much to accept to get to promo watermark? */
- to_accept = promo_wmark_pages(zone) -
+ to_accept = wmark -
(zone_page_state(zone, NR_FREE_PAGES) -
__zone_watermark_unusable_free(zone, order, 0) -
zone_page_state(zone, NR_UNACCEPTED));
_
Patches currently in -mm which might be from kirill.shutemov(a)linux.intel.com are
mm-page_alloc-fix-memory-accept-before-watermarks-gets-initialized.patch
In ThinPro, we use the convention <upstream_ver>+hp<patchlevel> for
the kernel package. This does not have a dash in the name or version.
This is built by editing ".version" before a build, and setting
EXTRAVERSION="+hp" and KDEB_PKGVERSION make variables:
echo 68 > .version
make -j<n> EXTRAVERSION="+hp" bindeb-pkg KDEB_PKGVERSION=6.6.6+hp69
.deb name: linux-image-6.6.6+hp_6.6.6+hp69_amd64.deb
Since commit 7d4f07d5cb71 ("kbuild: deb-pkg: squash
scripts/package/deb-build-option to debian/rules"), this no longer
works. The deb build logic changed, even though the commit message
implies that the logic should be unmodified.
Before, KBUILD_BUILD_VERSION was not set if the KDEB_PKGVERSION did
not contain a dash. After the change KBUILD_BUILD_VERSION is always
set to KDEB_PKGVERSION. Since this determines UTS_VERSION, the uname
output looks off:
(now) uname -a: version 6.6.6+hp ... #6.6.6+hp69
(expected) uname -a: version 6.6.6+hp ... #69
Update the debian/rules logic to restore the original behavior.
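The intended rule can be sketched in C. This is only an illustration: the real logic is written in make syntax in scripts/package/debian/rules, and the helper names deb_revision() and should_set_build_version() are hypothetical. The revision is the last non-empty dash-separated word of the package version, and KBUILD_BUILD_VERSION is passed only when that word differs from the full version, that is, when the version contains a dash.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring $(lastword $(subst -, ,VERSION)):
 * returns a pointer to the last non-empty dash-separated word of
 * version and stores its length in *len (0 if there is no word).
 */
static const char *deb_revision(const char *version, size_t *len)
{
	size_t end = strlen(version);
	size_t start;

	/* Skip trailing dashes; make drops the empty words they produce. */
	while (end > 0 && version[end - 1] == '-')
		end--;

	/* Walk back to the character after the previous dash, if any. */
	start = end;
	while (start > 0 && version[start - 1] != '-')
		start--;

	*len = end - start;
	return version + start;
}

/*
 * KBUILD_BUILD_VERSION should be passed only when the revision differs
 * from the full package version, i.e. when the version has a dash.
 */
static int should_set_build_version(const char *version)
{
	size_t len;

	deb_revision(version, &len);
	return len != strlen(version);
}
```

With KDEB_PKGVERSION=6.6.6+hp69 there is no dash, so the build version is left alone and the value from .version is used; with a version such as 6.6.6-69 the revision 69 would be passed.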
Cc: <stable(a)vger.kernel.org> # v6.12+
Fixes: 7d4f07d5cb71 ("kbuild: deb-pkg: squash scripts/package/deb-build-option to debian/rules")
Signed-off-by: Alexandru Gagniuc <alexandru.gagniuc(a)hp.com>
---
scripts/package/debian/rules | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/scripts/package/debian/rules b/scripts/package/debian/rules
index ca07243bd5cd..bbc214f2e6bd 100755
--- a/scripts/package/debian/rules
+++ b/scripts/package/debian/rules
@@ -21,9 +21,13 @@ ifeq ($(origin KBUILD_VERBOSE),undefined)
endif
endif
-revision = $(lastword $(subst -, ,$(shell dpkg-parsechangelog -S Version)))
+debian_revision = $(shell dpkg-parsechangelog -S Version)
+revision = $(lastword $(subst -, ,$(debian_revision)))
CROSS_COMPILE ?= $(filter-out $(DEB_BUILD_GNU_TYPE)-, $(DEB_HOST_GNU_TYPE)-)
-make-opts = ARCH=$(ARCH) KERNELRELEASE=$(KERNELRELEASE) KBUILD_BUILD_VERSION=$(revision) $(addprefix CROSS_COMPILE=,$(CROSS_COMPILE))
+make-opts = ARCH=$(ARCH) KERNELRELEASE=$(KERNELRELEASE) $(addprefix CROSS_COMPILE=,$(CROSS_COMPILE))
+ifneq ($(revision), $(debian_revision))
+ make-opts+=KBUILD_BUILD_VERSION=$(revision)
+endif
binary-targets := $(addprefix binary-, image image-dbg headers libc-dev)
--
2.48.1
Currently on cpu hotplug teardown, only the memcg stock is drained, but we
need to drain the obj stock as well; otherwise we will miss the stats
accumulated on the target cpu as well as the nr_bytes cached. The stats
include MEMCG_KMEM, NR_SLAB_RECLAIMABLE_B & NR_SLAB_UNRECLAIMABLE_B. In
addition, we are leaking a reference to the struct obj_cgroup object.
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
---
mm/memcontrol.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4de6acb9b8ec..59dcaf6a3519 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1921,9 +1921,18 @@ void drain_all_stock(struct mem_cgroup *root_memcg)
static int memcg_hotplug_cpu_dead(unsigned int cpu)
{
struct memcg_stock_pcp *stock;
+ struct obj_cgroup *old;
+ unsigned long flags;
stock = &per_cpu(memcg_stock, cpu);
+
+ /* drain_obj_stock requires stock_lock */
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
+ old = drain_obj_stock(stock);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+
drain_stock(stock);
+ obj_cgroup_put(old);
return 0;
}
--
2.47.1
The .rodata.(cst|str)* sections are often resized during the final
linking and since these sections do not cover actual symbols there is
no need to include them in the modules.builtin.ranges data.
When these sections were included in processing and resizing occurred,
modules were reported with ranges that extended beyond their true end,
causing subsequent symbols (in address order) to be associated with
the wrong module.
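As a rough illustration of which section names are skipped, the same pattern used in the awk script can be checked from C with POSIX regular expressions. The function name is_resizable_rodata() is hypothetical; only the pattern itself comes from the patch.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if the section name matches ^\.rodata\.(cst|str)[0-9],
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
static int is_resizable_rodata(const char *section)
{
	regex_t re;
	int rc;

	if (regcomp(&re, "^\\.rodata\\.(cst|str)[0-9]",
		    REG_EXTENDED | REG_NOSUB))
		return -1;

	rc = regexec(&re, section, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

Names such as .rodata.cst16 or .rodata.str1.1 match and would be ignored, while plain .rodata or .text do not match and are still processed.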
Fixes: 5f5e7344322f ("kbuild: generate offset range data for builtin modules")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kris Van Hees <kris.van.hees(a)oracle.com>
Reviewed-by: Jack Vogel <jack.vogel(a)oracle.com>
---
scripts/generate_builtin_ranges.awk | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/scripts/generate_builtin_ranges.awk b/scripts/generate_builtin_ranges.awk
index b9ec761b3bef..d4bd5c2b998c 100755
--- a/scripts/generate_builtin_ranges.awk
+++ b/scripts/generate_builtin_ranges.awk
@@ -282,6 +282,11 @@ ARGIND == 2 && !anchor && NF == 2 && $1 ~ /^0x/ && $2 !~ /^0x/ {
# section.
#
ARGIND == 2 && sect && NF == 4 && /^ [^ \*]/ && !($1 in sect_addend) {
+ # There are a few sections with constant data (without symbols) that
+ # can get resized during linking, so it is best to ignore them.
+ if ($1 ~ /^\.rodata\.(cst|str)[0-9]/)
+ next;
+
if (!($1 in sect_base)) {
sect_base[$1] = base;
--
2.45.2
After cdev_alloc() succeeds and cdev_add() fails, call cdev_del() to
remove unit->cdev from the system properly.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 8cb5d216ab33 ("char: xillybus: Move class-related functions to new xillybus_class.c")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v2:
- modified the patch as suggested to avoid a UAF.
---
drivers/char/xillybus/xillybus_class.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/char/xillybus/xillybus_class.c b/drivers/char/xillybus/xillybus_class.c
index c92a628e389e..356af6551b0d 100644
--- a/drivers/char/xillybus/xillybus_class.c
+++ b/drivers/char/xillybus/xillybus_class.c
@@ -104,8 +104,7 @@ int xillybus_init_chrdev(struct device *dev,
if (rc) {
dev_err(dev, "Failed to add cdev.\n");
/* kobject_put() is normally done by cdev_del() */
- kobject_put(&unit->cdev->kobj);
- goto unregister_chrdev;
+ goto err_cdev;
}
for (i = 0; i < num_nodes; i++) {
@@ -157,6 +156,7 @@ int xillybus_init_chrdev(struct device *dev,
device_destroy(&xillybus_class, MKDEV(unit->major,
i + unit->lowest_minor));
+err_cdev:
cdev_del(unit->cdev);
unregister_chrdev:
--
2.25.1
The quilt patch titled
Subject: dma: kmsan: export kmsan_handle_dma() for modules
has been removed from the -mm tree. Its filename was
dma-kmsan-export-kmsan_handle_dma-for-modules.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Subject: dma: kmsan: export kmsan_handle_dma() for modules
Date: Tue, 18 Feb 2025 10:14:11 +0100
kmsan_handle_dma() is used by virtio_ring, which can be built as a
module. kmsan_handle_dma() needs to be exported; otherwise building
virtio_ring as a module fails.
Export kmsan_handle_dma for modules.
Link: https://lkml.kernel.org/r/20250218091411.MMS3wBN9@linutronix.de
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202502150634.qjxwSeJR-lkp@intel.com/
Fixes: 7ade4f10779c ("dma: kmsan: unpoison DMA mappings")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmsan/hooks.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/kmsan/hooks.c~dma-kmsan-export-kmsan_handle_dma-for-modules
+++ a/mm/kmsan/hooks.c
@@ -357,6 +357,7 @@ void kmsan_handle_dma(struct page *page,
size -= to_go;
}
}
+EXPORT_SYMBOL_GPL(kmsan_handle_dma);
void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
enum dma_data_direction dir)
_
Patches currently in -mm which might be from bigeasy(a)linutronix.de are
rcu-provide-a-static-initializer-for-hlist_nulls_head.patch
ucount-replace-get_ucounts_or_wrap-with-atomic_inc_not_zero.patch
ucount-use-rcu-for-ucounts-lookups.patch
ucount-use-rcuref_t-for-reference-counting.patch
A Partial Region Controller can be connected to one or more
Freeze Bridges. Each Freeze Bridge has an illegal_request
bit represented in the freeze_illegal_request register.
Thus, instead of only setting the illegal_request bit of the
first Freeze Bridge to clear it, we need to ensure the
set-to-clear action is applied to whichever Freeze Bridge
has reported an illegal request.
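A toy model of the register makes the difference visible. It assumes, as the "set to clear" wording suggests, that each bit is write-1-to-clear; struct w1c_reg and the helpers below are hypothetical stand-ins for the MMIO register and readl()/writel().

```c
#include <stdint.h>

/* Hypothetical model of a write-1-to-clear (W1C) status register. */
struct w1c_reg {
	uint32_t bits;
};

static uint32_t w1c_read(const struct w1c_reg *reg)
{
	return reg->bits;
}

/* Every bit set in val clears the matching status bit. */
static void w1c_write(struct w1c_reg *reg, uint32_t val)
{
	reg->bits &= ~val;
}

/* Write back what was read so that every pending bit is acknowledged. */
static uint32_t ack_illegal_requests(struct w1c_reg *reg)
{
	uint32_t illegal = w1c_read(reg);

	if (illegal)
		w1c_write(reg, illegal);

	/* Re-read: a non-zero value means some bit could not be cleared. */
	return w1c_read(reg);
}
```

If bridges 1 and 2 flag an illegal request (value 0x6), writing the constant 1 leaves the register at 0x6, whereas writing back the value that was read clears both bits.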
Fixes: ca24a648f535 ("fpga: add altera freeze bridge support")
Signed-off-by: Chiau Ee Chew <chiau.ee.chew(a)intel.com>
Signed-off-by: Tanmay Kathpalia <tanmay.kathpalia(a)altera.com>
---
drivers/fpga/altera-freeze-bridge.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/fpga/altera-freeze-bridge.c b/drivers/fpga/altera-freeze-bridge.c
index 594693ff786e..23e8b2b54355 100644
--- a/drivers/fpga/altera-freeze-bridge.c
+++ b/drivers/fpga/altera-freeze-bridge.c
@@ -52,7 +52,7 @@ static int altera_freeze_br_req_ack(struct altera_freeze_br_data *priv,
if (illegal) {
dev_err(dev, "illegal request detected 0x%x", illegal);
- writel(1, csr_illegal_req_addr);
+ writel(illegal, csr_illegal_req_addr);
illegal = readl(csr_illegal_req_addr);
if (illegal)
--
2.19.0
Add qlcnic_sriov_free_vlans() in qlcnic_sriov_alloc_vlans() if
any sriov_vlans fails to be allocated.
Add qlcnic_sriov_free_vlans() to free the memory allocated by
qlcnic_sriov_alloc_vlans() if "sriov->allowed_vlans" fails to
be allocated.
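The unwinding pattern can be sketched in userspace C. The names below (struct vf, alloc_vlans(), free_vlans(), and the fail_at test hook) are hypothetical and only mimic the shape of the driver code: when one per-VF allocation fails, everything allocated so far is released before returning the error.

```c
#include <stdlib.h>

struct vf {
	unsigned short *vlans;
};

/* Free every per-VF array; free(NULL) is a no-op, so partial state is fine. */
static void free_vlans(struct vf *vfs, size_t num_vfs)
{
	for (size_t i = 0; i < num_vfs; i++) {
		free(vfs[i].vlans);
		vfs[i].vlans = NULL;
	}
}

/*
 * fail_at is a test hook: the allocation for that index is forced to
 * fail. Pass num_vfs or larger for no forced failure. The vfs array
 * must be zero-initialized and num_vlans must be non-zero.
 */
static int alloc_vlans(struct vf *vfs, size_t num_vfs, size_t num_vlans,
		       size_t fail_at)
{
	for (size_t i = 0; i < num_vfs; i++) {
		vfs[i].vlans = (i == fail_at) ?
			NULL : calloc(num_vlans, sizeof(*vfs[i].vlans));
		if (!vfs[i].vlans) {
			/* Undo the allocations made for earlier entries. */
			free_vlans(vfs, num_vfs);
			return -1;
		}
	}
	return 0;
}
```

Without the free_vlans() call in the error branch, the arrays allocated for indexes below the failing one would stay allocated with no owner left to release them.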
Fixes: 91b7282b613d ("qlcnic: Support VLAN id config.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com>
---
Changes in v3:
- Handle allocation errors in qlcnic_sriov_alloc_vlans()
- Modify the patch title and description.
There's one more thing I'm confused about: I'm not sure if the fixes-tag
is correct, because I noticed that the two modifications correspond to
different commits. Should I split them into two separate patch submissions? Thanks, Paolo!
Changes in v2:
- Add qlcnic_sriov_free_vlans() if qlcnic_sriov_alloc_vlans() fails.
- Modify the patch description.
vf_info was allocated by kcalloc, so no more checks are needed because
kfree(NULL) is safe. Thanks, Paolo!
---
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c
index f9dd50152b1e..28d24d59efb8 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c
@@ -454,8 +454,10 @@ static int qlcnic_sriov_set_guest_vlan_mode(struct qlcnic_adapter *adapter,
num_vlans = sriov->num_allowed_vlans;
sriov->allowed_vlans = kcalloc(num_vlans, sizeof(u16), GFP_KERNEL);
- if (!sriov->allowed_vlans)
+ if (!sriov->allowed_vlans) {
+ qlcnic_sriov_free_vlans(adapter);
return -ENOMEM;
+ }
vlans = (u16 *)&cmd->rsp.arg[3];
for (i = 0; i < num_vlans; i++)
@@ -2167,8 +2169,10 @@ int qlcnic_sriov_alloc_vlans(struct qlcnic_adapter *adapter)
vf = &sriov->vf_info[i];
vf->sriov_vlans = kcalloc(sriov->num_allowed_vlans,
sizeof(*vf->sriov_vlans), GFP_KERNEL);
- if (!vf->sriov_vlans)
+ if (!vf->sriov_vlans) {
+ qlcnic_sriov_free_vlans(adapter);
return -ENOMEM;
+ }
}
return 0;
--
2.25.1
Hello,
New build issue found on stable-rc/linux-5.10.y:
---
in vmlinux (Makefile:1212) [logspec:kbuild,kbuild.other]
---
- dashboard: https://d.kernelci.org/issue/maestro:d5c2be698989c7de46471109aae8df0339b713…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: a0e8dfa03993fda7b4d4b696c50f69726522abba
Log excerpt:
=====================================================
.lds
In file included from ./include/linux/kernel.h:15,
net/ipv6/udp.c: In function ‘udp_v6_send_skb’:
./include/linux/minmax.h:20:35: warning: comparison of distinct
pointer types lacks a cast
./include/linux/minmax.h:26:18: note: in expansion of macro ‘__typecheck’
./include/linux/minmax.h:36:31: note: in expansion of macro ‘__safe_cmp’
./include/linux/minmax.h:45:25: note: in expansion of macro ‘__careful_cmp’
net/ipv6/udp.c:1213:28: note: in expansion of macro ‘min’
In file included from ./include/linux/uaccess.h:7,
net/ipv4/udp.c: In function ‘udp_send_skb’:
./include/linux/minmax.h:20:35: warning: comparison of distinct
pointer types lacks a cast
./include/linux/minmax.h:26:18: note: in expansion of macro ‘__typecheck’
./include/linux/minmax.h:36:31: note: in expansion of macro ‘__safe_cmp’
./include/linux/minmax.h:45:25: note: in expansion of macro ‘__careful_cmp’
net/ipv4/udp.c:926:28: note: in expansion of macro ‘min’
FAILED unresolved symbol filp_close
=====================================================
# Builds where the incident occurred:
## cros://chromeos-5.10/x86_64/chromeos-amd-stoneyridge.flavour.config+lab-setup+x86-board+CONFIG_MODULE_COMPRESS=n+CONFIG_MODULE_COMPRESS_NONE=y
on (x86_64):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67ceffea18018371957ebdc0
#kernelci issue maestro:d5c2be698989c7de46471109aae8df0339b713c1
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
---8<---
Changes in v2:
- Added explicit comment about the quirk, as requested by Mani.
- Made commit message more clear, as requested by Bjorn.
---8<---
On our Marvell OCTEON CN96XX board, we observed the following panic on
the latest kernel:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000080
CPU: 22 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6 #20
Hardware name: Marvell OcteonTX CN96XX board (DT)
pc : of_pci_add_properties+0x278/0x4c8
Call trace:
of_pci_add_properties+0x278/0x4c8 (P)
of_pci_make_dev_node+0xe0/0x158
pci_bus_add_device+0x158/0x228
pci_bus_add_devices+0x40/0x98
pci_host_probe+0x94/0x118
pci_host_common_probe+0x130/0x1b0
platform_probe+0x70/0xf0
The dmesg logs indicated that the PCI bridge was scanning with an invalid bus range:
pci-host-generic 878020000000.pci: PCI host bridge to bus 0002:00
pci_bus 0002:00: root bus resource [bus 00-ff]
pci 0002:00:00.0: scanning [bus f9-f9] behind bridge, pass 0
pci 0002:00:01.0: scanning [bus fa-fa] behind bridge, pass 0
pci 0002:00:02.0: scanning [bus fb-fb] behind bridge, pass 0
pci 0002:00:03.0: scanning [bus fc-fc] behind bridge, pass 0
pci 0002:00:04.0: scanning [bus fd-fd] behind bridge, pass 0
pci 0002:00:05.0: scanning [bus fe-fe] behind bridge, pass 0
pci 0002:00:06.0: scanning [bus ff-ff] behind bridge, pass 0
pci 0002:00:07.0: scanning [bus 00-00] behind bridge, pass 0
pci 0002:00:07.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0002:00:08.0: scanning [bus 01-01] behind bridge, pass 0
pci 0002:00:09.0: scanning [bus 02-02] behind bridge, pass 0
pci 0002:00:0a.0: scanning [bus 03-03] behind bridge, pass 0
pci 0002:00:0b.0: scanning [bus 04-04] behind bridge, pass 0
pci 0002:00:0c.0: scanning [bus 05-05] behind bridge, pass 0
pci 0002:00:0d.0: scanning [bus 06-06] behind bridge, pass 0
pci 0002:00:0e.0: scanning [bus 07-07] behind bridge, pass 0
pci 0002:00:0f.0: scanning [bus 08-08] behind bridge, pass 0
This regression was introduced by commit 7246a4520b4b ("PCI: Use
preserve_config in place of pci_flags"). On our board, the 0002:00:07.0
bridge is misconfigured by the bootloader. Both its secondary and
subordinate bus numbers are initialized to 0, while its fixed secondary
bus number is set to 8. However, bus number 8 is also assigned to another
bridge (0002:00:0f.0). Although this is a bootloader issue, before the
change in commit 7246a4520b4b, the PCI_REASSIGN_ALL_BUS flag was set
by default when PCI_PROBE_ONLY was not enabled, ensuring that all the
bus numbers for these bridges were reassigned, avoiding any conflicts.
After the change introduced in commit 7246a4520b4b, the bus numbers
assigned by the bootloader are reused by all other bridges, except
the misconfigured 0002:00:07.0 bridge. The kernel attempts to reconfigure
0002:00:07.0 by reusing the fixed secondary bus number 8 assigned by
the bootloader. However, since a pci_bus has already been allocated for
bus 8 due to the probe of 0002:00:0f.0, no new pci_bus is allocated for
0002:00:07.0. This results in a PCI bridge device without a pci_bus
attached (pdev->subordinate == NULL). Consequently, accessing
pdev->subordinate in of_pci_prop_bus_range() leads to a NULL pointer
dereference.
To summarize, we need to set the PCI_REASSIGN_ALL_BUS flag when
PCI_PROBE_ONLY is not enabled in order to work around issues like the
one described above.
Cc: stable(a)vger.kernel.org
Fixes: 7246a4520b4b ("PCI: Use preserve_config in place of pci_flags")
Signed-off-by: Bo Sun <Bo.Sun.CN(a)windriver.com>
---
drivers/pci/quirks.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 82b21e34c545..cec58c7479e1 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6181,6 +6181,23 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1536, rom_bar_overlap_defect);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1537, rom_bar_overlap_defect);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1538, rom_bar_overlap_defect);
+/*
+ * Quirk for Marvell CN96XX/CN10XXX boards:
+ *
+ * Adds PCI_REASSIGN_ALL_BUS unless PCI_PROBE_ONLY is set, forcing bus number
+ * reassignment to avoid conflicts caused by bootloader misconfigured PCI bridges.
+ *
+ * This resolves a regression introduced by commit 7246a4520b4b ("PCI: Use
+ * preserve_config in place of pci_flags"), which removed this behavior.
+ */
+static void quirk_marvell_cn96xx_cn10xxx_reassign_all_busnr(struct pci_dev *dev)
+{
+ if (!pci_has_flag(PCI_PROBE_ONLY))
+ pci_add_flags(PCI_REASSIGN_ALL_BUS);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_CAVIUM, 0xa002,
+ quirk_marvell_cn96xx_cn10xxx_reassign_all_busnr);
+
#ifdef CONFIG_PCIEASPM
/*
* Several Intel DG2 graphics devices advertise that they can only tolerate
--
2.48.1
On the arm64 platform with 4K base page config, SECTION_SIZE_BITS is set
to 27, making one section 128M. The struct page array that vmemmap
points to for one section is then 2M.
Commit c1cc1552616d ("arm64: MMU initialisation") optimized the
vmemmap to populate at the PMD section level, which was suitable
initially since the hot plug granule was always one section (128M). However,
commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
introduced a 2M (SUBSECTION_SIZE) hot plug granule, which disrupted the
existing arm64 assumptions.
The first problem is that if start or end is not aligned to a section
boundary, such as when a subsection is hot added, populating the entire
section is wasteful.
The next problem is that if we hotplug something that spans part of a
128 MiB section (subsections, let's call it memblock1), then hotplug
something that spans another part of the same 128 MiB section
(subsections, let's call it memblock2), and subsequently unplug memblock1,
vmemmap_free() will clear the entire PMD entry, which also backs
memblock2, even though memblock2 is still active.
Assuming hotplug/unplug sizes are guaranteed to be symmetric, apply a
fix similar to x86-64: populate at page level if start/end is not aligned
to a section boundary.
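The sizes involved can be worked through in a few lines of C. This is a sketch under stated assumptions: sizeof(struct page) is taken to be 64 bytes (a common value, but configuration dependent), the macro and function names are illustrative, and the CONFIG_ARM64_4K_PAGES check from the real code is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTION_SIZE_BITS	27	/* arm64 with 4K pages: 128 MiB sections */
#define PAGE_SHIFT		12	/* 4 KiB base pages */
#define STRUCT_PAGE_SIZE	64	/* assumption: typical sizeof(struct page) */

/* 2^(27 - 12) = 32768 pages per section. */
#define PAGES_PER_SECTION	(UINT64_C(1) << (SECTION_SIZE_BITS - PAGE_SHIFT))

/* 32768 * 64 bytes = 2 MiB of vmemmap per section, i.e. one PMD mapping. */
#define SECTION_VMEMMAP_SIZE	(PAGES_PER_SECTION * STRUCT_PAGE_SIZE)

/*
 * Mirrors the new condition: a vmemmap range smaller than what one full
 * section needs is populated with base pages instead of a PMD mapping.
 */
static bool use_basepages(uint64_t start, uint64_t end)
{
	return end - start < SECTION_VMEMMAP_SIZE;
}
```

A 2M subsection covers 512 pages, so its vmemmap range is only 512 * 64 = 32 KiB and takes the base page path, while a full 128M section still gets the 2M huge mapping.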
Cc: <stable(a)vger.kernel.org> # v5.4+
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Acked-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
---
arch/arm64/mm/mmu.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b4df5bc5b1b8..1dfe1a8efdbe 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1177,8 +1177,11 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
struct vmem_altmap *altmap)
{
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
+ /* [start, end] should be within one section */
+ WARN_ON_ONCE(end - start > PAGES_PER_SECTION * sizeof(struct page));
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+ if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
+ (end - start < PAGES_PER_SECTION * sizeof(struct page)))
return vmemmap_populate_basepages(start, end, node, altmap);
else
return vmemmap_populate_hugepages(start, end, node, altmap);
--
2.25.1
Some users are reporting that ov08x40_identify_module() fails
to identify the chip, reading 0x00 as the value of OV08X40_REG_CHIP_ID.
Intel's out-of-tree IPU6 drivers include some ov08x40 changes,
including adding support for the reset GPIO for older kernels, and
Intel's patch for this uses 5 ms. Extend the sleep to 5 ms following
Intel's example; this fixes the ov08x40_identify_module() problem.
Link: https://github.com/intel/ipu6-drivers/blob/c09e2198d801e1eb701984d294837312…
Fixes: df1ae2251a50 ("media: ov08x40: Add OF probe support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/media/i2c/ov08x40.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/media/i2c/ov08x40.c b/drivers/media/i2c/ov08x40.c
index cf0e41fc3071..54575eea3c49 100644
--- a/drivers/media/i2c/ov08x40.c
+++ b/drivers/media/i2c/ov08x40.c
@@ -1341,7 +1341,7 @@ static int ov08x40_power_on(struct device *dev)
}
gpiod_set_value_cansleep(ov08x->reset_gpio, 0);
- usleep_range(1500, 1800);
+ usleep_range(5000, 5500);
return 0;
--
2.48.1
On Tue, Mar 11, 2025 at 06:54:00AM +0000, Cameron Williams wrote:
> Cc'ing stable
>
> Cc: stable(a)vger.kernel.org
>
<formletter>
This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.
</formletter>
The patch titled
Subject: memcg: drain obj stock on cpu hotplug teardown
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
memcg-drain-obj-stock-on-cpu-hotplug-teardown.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Shakeel Butt <shakeel.butt(a)linux.dev>
Subject: memcg: drain obj stock on cpu hotplug teardown
Date: Mon, 10 Mar 2025 16:09:34 -0700
Currently on cpu hotplug teardown, only the memcg stock is drained, but we
need to drain the obj stock as well; otherwise we will miss the stats
accumulated on the target cpu as well as the nr_bytes cached. The stats
include MEMCG_KMEM, NR_SLAB_RECLAIMABLE_B & NR_SLAB_UNRECLAIMABLE_B. In
addition, we are leaking a reference to the struct obj_cgroup object.
Link: https://lkml.kernel.org/r/20250310230934.2913113-1-shakeel.butt@linux.dev
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc:
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/mm/memcontrol.c~memcg-drain-obj-stock-on-cpu-hotplug-teardown
+++ a/mm/memcontrol.c
@@ -1921,9 +1921,18 @@ void drain_all_stock(struct mem_cgroup *
static int memcg_hotplug_cpu_dead(unsigned int cpu)
{
struct memcg_stock_pcp *stock;
+ struct obj_cgroup *old;
+ unsigned long flags;
stock = &per_cpu(memcg_stock, cpu);
+
+ /* drain_obj_stock requires stock_lock */
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
+ old = drain_obj_stock(stock);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+
drain_stock(stock);
+ obj_cgroup_put(old);
return 0;
}
_
Patches currently in -mm which might be from shakeel.butt(a)linux.dev are
memcg-drain-obj-stock-on-cpu-hotplug-teardown.patch
memcg-add-hierarchical-effective-limits-for-v2.patch
memcg-dont-call-propagate_protected_usage-for-v1.patch
page_counter-track-failcnt-only-for-legacy-cgroups.patch
page_counter-reduce-struct-page_counter-size.patch
memcg-bypass-root-memcg-check-for-skmem-charging.patch
The patch titled
Subject: mm/huge_memory: drop beyond-EOF folios with the right number of refs.
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-huge_memory-drop-beyond-eof-folios-with-the-right-number-of-refs.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Zi Yan <ziy(a)nvidia.com>
Subject: mm/huge_memory: drop beyond-EOF folios with the right number of refs.
Date: Mon, 10 Mar 2025 11:57:27 -0400
When an after-split folio is large and needs to be dropped due to EOF,
folio_put_refs(folio, folio_nr_pages(folio)) should be used to drop all
page cache refs. Otherwise, the folio will not be freed, causing a memory
leak.
This leak would happen when a truncate is performed on a filesystem with
blocksize > page_size, where the blocksize makes folios split to order > 0
ones, so that the truncated folios are never freed.
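The reference counting can be pictured with a toy model. It assumes, as the description above implies, that the page cache holds one reference per page of a large folio and that no other references remain; struct toy_folio and toy_put_refs() are hypothetical and not kernel APIs.

```c
/* Hypothetical stand-in for a folio whose refs are all page cache refs. */
struct toy_folio {
	long refcount;
	long nr_pages;
};

/* Drop refs references; returns 1 when the count hits zero (folio freed). */
static int toy_put_refs(struct toy_folio *f, long refs)
{
	f->refcount -= refs;
	return f->refcount == 0;
}
```

For an order-2 folio (4 pages, 4 page cache refs), dropping a single reference leaves 3 behind and the folio is never freed; dropping nr_pages references brings the count to zero.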
Link: https://lkml.kernel.org/r/20250310155727.472846-1-ziy@nvidia.com
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Reported-by: Hugh Dickins <hughd(a)google.com>
Closes: https://lore.kernel.org/all/fcbadb7f-dd3e-21df-f9a7-2853b53183c4@google.com/
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Pankaj Raghav <p.raghav(a)samsung.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Yang Shi <yang(a)os.amperecomputing.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/huge_memory.c~mm-huge_memory-drop-beyond-eof-folios-with-the-right-number-of-refs
+++ a/mm/huge_memory.c
@@ -3304,7 +3304,7 @@ static void __split_huge_page(struct pag
folio_account_cleaned(tail,
inode_to_wb(folio->mapping->host));
__filemap_remove_folio(tail, NULL);
- folio_put(tail);
+ folio_put_refs(tail, folio_nr_pages(tail));
} else if (!folio_test_anon(folio)) {
__xa_store(&folio->mapping->i_pages, tail->index,
tail, 0);
_
Patches currently in -mm which might be from ziy(a)nvidia.com are
mm-migrate-fix-shmem-xarray-update-during-migration.patch
mm-huge_memory-drop-beyond-eof-folios-with-the-right-number-of-refs.patch
selftests-mm-make-file-backed-thp-split-work-by-writing-pmd-size-data.patch
mm-huge_memory-allow-split-shmem-large-folio-to-any-lower-order.patch
selftests-mm-test-splitting-file-backed-thp-to-any-lower-order.patch
xarray-add-xas_try_split-to-split-a-multi-index-entry.patch
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split.patch
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split-fix.patch
mm-huge_memory-move-folio-split-common-code-to-__folio_split.patch
mm-huge_memory-add-buddy-allocator-like-non-uniform-folio_split.patch
mm-huge_memory-remove-the-old-unused-__split_huge_page.patch
mm-huge_memory-add-folio_split-to-debugfs-testing-interface.patch
mm-truncate-use-folio_split-in-truncate-operation.patch
selftests-mm-add-tests-for-folio_split-buddy-allocator-like-split.patch
mm-filemap-use-xas_try_split-in-__filemap_add_folio.patch
mm-shmem-use-xas_try_split-in-shmem_split_large_entry.patch
The patch titled
Subject: mm/mremap: correctly handle partial mremap() of VMA starting at 0
has been added to the -mm mm-unstable branch. Its filename is
mm-mremap-correctly-handle-partial-mremap-of-vma-starting-at-0.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Subject: mm/mremap: correctly handle partial mremap() of VMA starting at 0
Date: Mon, 10 Mar 2025 20:50:34 +0000
Patch series "refactor mremap and fix bug", v3.
The existing mremap() logic has grown organically over a very long period
of time, resulting in code that is, in many parts, very difficult to follow
and full of subtleties and sources of confusion.
In addition, it is difficult to thread state through the operation
correctly, as function arguments have expanded, some parameters are
expected to be temporarily altered during the operation, others are
intended to remain static and some can be overridden.
This series completely refactors the mremap implementation, sensibly
separating functions, adding comments to explain the more subtle aspects
of the implementation and making use of small structs to thread state
through everything.
The reason for doing so is to lay the groundwork for planned future
changes to the mremap logic, changes which require the ability to easily
pass around state.
Additionally, it would be unhelpful to add yet more logic to code that is
already difficult to follow without first refactoring it like this.
The first patch in this series additionally fixes a bug when a VMA with
start address zero is partially remapped.
Tested on real hardware under heavy workload and all self tests are
passing.
This patch (of 3):
Consider the case of a partial mremap() (that results in a VMA split) of
an accountable VMA (i.e. which has the VM_ACCOUNT flag set) whose start
address is zero, with the MREMAP_MAYMOVE flag specified and a scenario
where a move does in fact occur:
       addr  end
        |     |
        v     v
    |-------------|
    |     vma     |
    |-------------|
    0
This move is effected by unmapping the range [addr, end). In order to
prevent an incorrect decrement of accounted memory which has already been
determined, the mremap() code in move_vma() clears VM_ACCOUNT from the VMA
prior to doing so, before reestablishing it in each of the VMAs
post-split:
       addr  end
        |     |
        v     v
    |---|     |---|
    | A |     | B |
    |---|     |---|
Commit 6b73cff239e5 ("mm: change munmap splitting order and move_vma()")
changed this logic so as to determine whether there is a need to do so
by establishing account_start and account_end and, in the instance where
such an operation is required, assigning them to vma->vm_start and
vma->vm_end.
Later the code checks if the operation is required for 'A' referenced
above thusly:
if (account_start) {
...
}
However, if the VMA described above has vma->vm_start == 0, which is now
assigned to account_start, this branch will not be executed.
As a result, the VMA 'A' above will remain stripped of its VM_ACCOUNT
flag, incorrectly.
The fix is to simply convert these variables to booleans and set them as
required.
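The failure mode can be reproduced outside the kernel. The following is a minimal sketch, assuming a hypothetical struct toy_vma and helper names that do not exist in mm/mremap.c; it only contrasts using the address as the flag with using a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two VMA fields involved; not the kernel struct. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * Old scheme: the start address doubles as the "needs re-accounting" flag,
 * so a VMA that starts at address 0 looks the same as "nothing to do".
 */
bool head_needs_account_old(const struct toy_vma *vma, unsigned long old_addr)
{
	unsigned long account_start = 0;

	if (vma->vm_start < old_addr)
		account_start = vma->vm_start;

	return account_start != 0;
}

/* New scheme: the decision itself is stored, independent of the address. */
bool head_needs_account_new(const struct toy_vma *vma, unsigned long old_addr)
{
	bool account_start = false;

	if (vma->vm_start < old_addr)
		account_start = true;

	return account_start;
}

/* Self-check for a VMA starting at 0 that is remapped from 0x1000 onwards. */
void check_account_start(void)
{
	struct toy_vma vma = { .vm_start = 0, .vm_end = 0x3000 };

	assert(!head_needs_account_old(&vma, 0x1000));	/* bug: head 'A' skipped */
	assert(head_needs_account_new(&vma, 0x1000));	/* fix: head 'A' handled */
}
```

For any VMA with a non-zero start address the two helpers agree, which is presumably why the problem went unnoticed.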
Link: https://lkml.kernel.org/r/cover.1741639347.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/dc55cb6db25d97c3d9e460de4986a323fa959676.17416393…
Fixes: 6b73cff239e5 ("mm: change munmap splitting order and move_vma()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reviewed-by: Harry Yoo <harry.yoo(a)oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mremap.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
--- a/mm/mremap.c~mm-mremap-correctly-handle-partial-mremap-of-vma-starting-at-0
+++ a/mm/mremap.c
@@ -705,8 +705,8 @@ static unsigned long move_vma(struct vm_
unsigned long vm_flags = vma->vm_flags;
unsigned long new_pgoff;
unsigned long moved_len;
- unsigned long account_start = 0;
- unsigned long account_end = 0;
+ bool account_start = false;
+ bool account_end = false;
unsigned long hiwater_vm;
int err = 0;
bool need_rmap_locks;
@@ -790,9 +790,9 @@ static unsigned long move_vma(struct vm_
if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) {
vm_flags_clear(vma, VM_ACCOUNT);
if (vma->vm_start < old_addr)
- account_start = vma->vm_start;
+ account_start = true;
if (vma->vm_end > old_addr + old_len)
- account_end = vma->vm_end;
+ account_end = true;
}
/*
@@ -832,7 +832,7 @@ static unsigned long move_vma(struct vm_
/* OOM: unable to split vma, just get accounts right */
if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP))
vm_acct_memory(old_len >> PAGE_SHIFT);
- account_start = account_end = 0;
+ account_start = account_end = false;
}
if (vm_flags & VM_LOCKED) {
_
Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are
mm-simplify-vma-merge-structure-and-expand-comments.patch
mm-further-refactor-commit_merge.patch
mm-eliminate-adj_start-parameter-from-commit_merge.patch
mm-make-vmg-target-consistent-and-further-simplify-commit_merge.patch
mm-completely-abstract-unnecessary-adj_start-calculation.patch
mm-madvise-split-out-mmap-locking-operations-for-madvise-fix.patch
mm-use-read-write_once-for-vma-vm_flags-on-migrate-mprotect.patch
mm-refactor-rmap_walk_file-to-separate-out-traversal-logic.patch
mm-provide-mapping_wrprotect_range-function.patch
fb_defio-do-not-use-deprecated-page-mapping-index-fields.patch
fb_defio-do-not-use-deprecated-page-mapping-index-fields-fix.patch
mm-allow-guard-regions-in-file-backed-and-read-only-mappings.patch
selftests-mm-rename-guard-pages-to-guard-regions.patch
selftests-mm-rename-guard-pages-to-guard-regions-fix.patch
tools-selftests-expand-all-guard-region-tests-to-file-backed.patch
tools-selftests-add-file-shmem-backed-mapping-guard-region-tests.patch
fs-proc-task_mmu-add-guard-region-bit-to-pagemap.patch
tools-selftests-add-guard-region-test-for-proc-pid-pagemap.patch
tools-selftests-add-guard-region-test-for-proc-pid-pagemap-fix.patch
mm-mremap-correctly-handle-partial-mremap-of-vma-starting-at-0.patch
mm-mremap-refactor-mremap-system-call-implementation.patch
mm-mremap-introduce-and-use-vma_remap_struct-threaded-state.patch
mm-mremap-initial-refactor-of-move_vma.patch
mm-mremap-complete-refactor-of-move_vma.patch
mm-mremap-refactor-move_page_tables-abstracting-state.patch
mm-mremap-thread-state-through-move-page-table-operation.patch
The handling of the MST Connection Status Notify message is skipped if
the probing of the topology is still pending. Acquiring the
drm_dp_mst_topology_mgr::probe_lock for this in
drm_dp_mst_handle_up_req() is problematic: the task/work this function
is called from is also responsible for handling MST down-request replies
(in drm_dp_mst_handle_down_rep()). Thus drm_dp_mst_link_probe_work(),
already holding probe_lock, could be blocked waiting for an MST
down-request reply while drm_dp_mst_handle_up_req() is waiting for
probe_lock while processing a CSN message. This leads to the probe
work's down-request message timing out.
A scenario similar to the above leading to a down-request timeout is
handling a CSN message in drm_dp_mst_handle_conn_stat(), holding the
probe_lock and sending down-request messages while a second CSN message
sent by the sink subsequently is handled by drm_dp_mst_handle_up_req().
Fix the above by moving the logic to skip the CSN handling to
drm_dp_mst_process_up_req(). This function is called from a work
(separate from the task/work handling new up/down messages), already
holding probe_lock. This solves the above timeout issue, since handling
of down-request replies won't be blocked by probe_lock.
Fixes: ddf983488c3e ("drm/dp_mst: Skip CSN if topology probing is not done yet")
Cc: Wayne Lin <Wayne.Lin(a)amd.com>
Cc: Lyude Paul <lyude(a)redhat.com>
Cc: stable(a)vger.kernel.org # v6.6+
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
drivers/gpu/drm/display/drm_dp_mst_topology.c | 40 +++++++++++--------
1 file changed, 24 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index 8b68bb3fbffb0..3a1f1ffc7b552 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -4036,6 +4036,22 @@ static int drm_dp_mst_handle_down_rep(struct drm_dp_mst_topology_mgr *mgr)
return 0;
}
+static bool primary_mstb_probing_is_done(struct drm_dp_mst_topology_mgr *mgr)
+{
+ bool probing_done = false;
+
+ mutex_lock(&mgr->lock);
+
+ if (mgr->mst_primary && drm_dp_mst_topology_try_get_mstb(mgr->mst_primary)) {
+ probing_done = mgr->mst_primary->link_address_sent;
+ drm_dp_mst_topology_put_mstb(mgr->mst_primary);
+ }
+
+ mutex_unlock(&mgr->lock);
+
+ return probing_done;
+}
+
static inline bool
drm_dp_mst_process_up_req(struct drm_dp_mst_topology_mgr *mgr,
struct drm_dp_pending_up_req *up_req)
@@ -4066,8 +4082,12 @@ drm_dp_mst_process_up_req(struct drm_dp_mst_topology_mgr *mgr,
/* TODO: Add missing handler for DP_RESOURCE_STATUS_NOTIFY events */
if (msg->req_type == DP_CONNECTION_STATUS_NOTIFY) {
- dowork = drm_dp_mst_handle_conn_stat(mstb, &msg->u.conn_stat);
- hotplug = true;
+ if (!primary_mstb_probing_is_done(mgr)) {
+ drm_dbg_kms(mgr->dev, "Got CSN before finish topology probing. Skip it.\n");
+ } else {
+ dowork = drm_dp_mst_handle_conn_stat(mstb, &msg->u.conn_stat);
+ hotplug = true;
+ }
}
drm_dp_mst_topology_put_mstb(mstb);
@@ -4149,10 +4169,11 @@ static int drm_dp_mst_handle_up_req(struct drm_dp_mst_topology_mgr *mgr)
drm_dp_send_up_ack_reply(mgr, mst_primary, up_req->msg.req_type,
false);
+ drm_dp_mst_topology_put_mstb(mst_primary);
+
if (up_req->msg.req_type == DP_CONNECTION_STATUS_NOTIFY) {
const struct drm_dp_connection_status_notify *conn_stat =
&up_req->msg.u.conn_stat;
- bool handle_csn;
drm_dbg_kms(mgr->dev, "Got CSN: pn: %d ldps:%d ddps: %d mcs: %d ip: %d pdt: %d\n",
conn_stat->port_number,
@@ -4161,16 +4182,6 @@ static int drm_dp_mst_handle_up_req(struct drm_dp_mst_topology_mgr *mgr)
conn_stat->message_capability_status,
conn_stat->input_port,
conn_stat->peer_device_type);
-
- mutex_lock(&mgr->probe_lock);
- handle_csn = mst_primary->link_address_sent;
- mutex_unlock(&mgr->probe_lock);
-
- if (!handle_csn) {
- drm_dbg_kms(mgr->dev, "Got CSN before finish topology probing. Skip it.");
- kfree(up_req);
- goto out_put_primary;
- }
} else if (up_req->msg.req_type == DP_RESOURCE_STATUS_NOTIFY) {
const struct drm_dp_resource_status_notify *res_stat =
&up_req->msg.u.resource_stat;
@@ -4185,9 +4196,6 @@ static int drm_dp_mst_handle_up_req(struct drm_dp_mst_topology_mgr *mgr)
list_add_tail(&up_req->next, &mgr->up_req_list);
mutex_unlock(&mgr->up_req_lock);
queue_work(system_long_wq, &mgr->up_req_work);
-
-out_put_primary:
- drm_dp_mst_topology_put_mstb(mst_primary);
out_clear_reply:
reset_msg_rx_state(&mgr->up_req_recv);
return ret;
--
2.44.2
Dear stable team,
I noticed that ceeeb99cd821 ("dmaengine: mxs: rename custom flag") got backported, but the additional fix 269e31aecdd0 ("spi-mxs: Fix chipselect glitch") hasn't.
I think this was caused by the lack of a Cc to stable. Without the latter patch, SPI causes glitches on the MXS platform.
Please backport it from 5.4 to 6.6.
Thanks
Stefan
Sometimes I get a NULL pointer dereference at boot time in kobject_get()
with the following call stack:
anatop_regulator_probe()
devm_regulator_register()
regulator_register()
regulator_resolve_supply()
kobject_get()
By placing some extra BUG_ON() statements I could verify that this is
raised because probing of the 'dummy' regulator driver is not completed
('dummy_regulator_rdev' is still NULL).
In the JTAG debugger I can see that dummy_regulator_probe() and
anatop_regulator_probe() can be run by different kernel threads
(kworker/u4:*). I haven't further investigated whether this can be
changed or if there are other possibilities to force synchronization
between these two probe routines. On the other hand I don't expect much
boot time penalty by probing the 'dummy' regulator synchronously.
Cc: stable(a)vger.kernel.org
Fixes: 259b93b21a9f ("regulator: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in 4.14")
Signed-off-by: Christian Eggers <ceggers(a)arri.de>
---
drivers/regulator/dummy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/regulator/dummy.c b/drivers/regulator/dummy.c
index 5b9b9e4e762d..9f59889129ab 100644
--- a/drivers/regulator/dummy.c
+++ b/drivers/regulator/dummy.c
@@ -60,7 +60,7 @@ static struct platform_driver dummy_regulator_driver = {
.probe = dummy_regulator_probe,
.driver = {
.name = "reg-dummy",
- .probe_type = PROBE_PREFER_ASYNCHRONOUS,
+ .probe_type = PROBE_FORCE_SYNCHRONOUS,
},
};
--
2.43.0
Upon encountering errors during the HSIC pinctrl handling section the
regulator should be disabled.
After the above change it is possible to jump to the
"disable_hsic_regulator" label without the CPU latency QoS request having
been added beforehand. This would result in cpu_latency_qos_remove_request()
yielding a WARNING.
So rearrange the error handling path to follow the reverse order of
different probing phases.
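The reverse-order unwinding described above can be sketched in isolation. This is a toy model, assuming a hypothetical struct toy_state and a fail_at parameter for fault injection; none of these names come from ci_hdrc_imx.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical resources standing in for the regulator and the QoS request. */
struct toy_state {
	bool regulator_on;
	bool qos_added;
	int bogus_removals;	/* removals of a request that was never added */
};

void toy_qos_remove(struct toy_state *s)
{
	if (!s->qos_added)
		s->bogus_removals++;	/* the real helper would WARN here */
	s->qos_added = false;
}

/* fail_at: 0 = success, 1 = fail after phase 1, 2 = fail after phase 2. */
int toy_probe(struct toy_state *s, int fail_at)
{
	assert(s);

	s->regulator_on = true;		/* phase 1: enable regulator */
	if (fail_at == 1)
		goto disable_regulator;

	s->qos_added = true;		/* phase 2: add QoS request */
	if (fail_at == 2)
		goto qos_remove_request;

	return 0;

	/* Labels undo the phases in reverse order and fall through. */
qos_remove_request:
	toy_qos_remove(s);
disable_regulator:
	s->regulator_on = false;
	return -1;
}
```

Because each label undoes only what was set up before the corresponding jump, a failure in phase 1 never reaches toy_qos_remove() in this model.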
Found by Linux Verification Center (linuxtesting.org).
Fixes: 4d6141288c33 ("usb: chipidea: imx: pinctrl for HSIC is optional")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
drivers/usb/chipidea/ci_hdrc_imx.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index 619779eef333..3f11ae071c7f 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -407,13 +407,13 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
"pinctrl_hsic_idle lookup failed, err=%ld\n",
PTR_ERR(pinctrl_hsic_idle));
ret = PTR_ERR(pinctrl_hsic_idle);
- goto err_put;
+ goto disable_hsic_regulator;
}
ret = pinctrl_select_state(data->pinctrl, pinctrl_hsic_idle);
if (ret) {
dev_err(dev, "hsic_idle select failed, err=%d\n", ret);
- goto err_put;
+ goto disable_hsic_regulator;
}
data->pinctrl_hsic_active = pinctrl_lookup_state(data->pinctrl,
@@ -423,7 +423,7 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
"pinctrl_hsic_active lookup failed, err=%ld\n",
PTR_ERR(data->pinctrl_hsic_active));
ret = PTR_ERR(data->pinctrl_hsic_active);
- goto err_put;
+ goto disable_hsic_regulator;
}
}
@@ -432,11 +432,11 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
ret = imx_get_clks(dev);
if (ret)
- goto disable_hsic_regulator;
+ goto qos_remove_request;
ret = imx_prepare_enable_clks(dev);
if (ret)
- goto disable_hsic_regulator;
+ goto qos_remove_request;
ret = clk_prepare_enable(data->clk_wakeup);
if (ret)
@@ -526,12 +526,13 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
clk_disable_unprepare(data->clk_wakeup);
err_wakeup_clk:
imx_disable_unprepare_clks(dev);
+qos_remove_request:
+ if (pdata.flags & CI_HDRC_PMQOS)
+ cpu_latency_qos_remove_request(&data->pm_qos_req);
disable_hsic_regulator:
if (data->hsic_pad_regulator)
/* don't overwrite original ret (cf. EPROBE_DEFER) */
regulator_disable(data->hsic_pad_regulator);
- if (pdata.flags & CI_HDRC_PMQOS)
- cpu_latency_qos_remove_request(&data->pm_qos_req);
data->ci_pdev = NULL;
err_put:
if (data->usbmisc_data)
--
2.48.1
On Sun, Mar 09, 2025 at 03:45:57PM -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> drm/i915: Plumb 'dsb' all way to the plane hooks
>
> to the 6.12-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> drm-i915-plumb-dsb-all-way-to-the-plane-hooks.patch
> and it can be found in the queue-6.12 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit f03e7cca22f4bb50cae98840f91fcf1e6d780a54
> Author: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
> Date: Mon Sep 30 20:04:13 2024 +0300
>
> drm/i915: Plumb 'dsb' all way to the plane hooks
>
> [ Upstream commit 01389846f7d61d262cc92d42ad4d1a25730e3eff ]
It would help if you actually mentioned *why* you need to backport this?
--
Ville Syrjälä
Intel
Hello,
New build issue found on stable-rc/linux-6.6.y:
---
‘RISCV_ISA_EXT_XLINUXENVCFG’ undeclared (first use in this function);
did you mean ‘RISCV_ISA_EXT_ZIFENCEI’? in arch/riscv/kernel/suspend.o
(arch/riscv/kernel/suspend.c) [logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/issue/maestro:f277022d07efdd2a5858eb44b3c3dab79cca84…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: b49d45c66a5e8cc1c82591049bfc0d04daa1e77c
Log excerpt:
=====================================================
arch/riscv/kernel/suspend.c:14:66: error: ‘RISCV_ISA_EXT_XLINUXENVCFG’
undeclared (first use in this function); did you mean
‘RISCV_ISA_EXT_ZIFENCEI’?
14 | if
(riscv_cpu_has_extension_unlikely(smp_processor_id(),
RISCV_ISA_EXT_XLINUXENVCFG))
|
^~~~~~~~~~~~~~~~~~~~~~~~~~
|
RISCV_ISA_EXT_ZIFENCEI
arch/riscv/kernel/suspend.c:14:66: note: each undeclared identifier is
reported only once for each function it appears in
CC fs/proc/cpuinfo.o
arch/riscv/kernel/suspend.c: In function ‘suspend_restore_csrs’:
arch/riscv/kernel/suspend.c:37:66: error: ‘RISCV_ISA_EXT_XLINUXENVCFG’
undeclared (first use in this function); did you mean
‘RISCV_ISA_EXT_ZIFENCEI’?
37 | if
(riscv_cpu_has_extension_unlikely(smp_processor_id(),
RISCV_ISA_EXT_XLINUXENVCFG))
|
^~~~~~~~~~~~~~~~~~~~~~~~~~
|
RISCV_ISA_EXT_ZIFENCEI
=====================================================
# Builds where the incident occurred:
## defconfig on (riscv):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67cf00ee18018371957ec83e
#kernelci issue maestro:f277022d07efdd2a5858eb44b3c3dab79cca847e
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
When an after-split folio is large and needs to be dropped due to EOF,
folio_put_refs(folio, folio_nr_pages(folio)) should be used to drop
all page cache refs. Otherwise, the folio will not be freed, causing
a memory leak.
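A simplified model of the reference counting involved, assuming (as the text above implies) that the page cache holds one reference per page of a folio. The struct toy_folio type and the toy_ helpers are hypothetical and ignore every other reference a real folio may carry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy folio: only the fields needed for the illustration. */
struct toy_folio {
	int refcount;
	int nr_pages;
	bool freed;
};

/* Build an order-N folio holding one page cache reference per page. */
struct toy_folio toy_folio_in_page_cache(unsigned int order)
{
	struct toy_folio f = { .nr_pages = 1 << order, .freed = false };

	f.refcount = f.nr_pages;
	return f;
}

/* Drop refs references; the folio is freed only when the count hits zero. */
void toy_folio_put_refs(struct toy_folio *f, int refs)
{
	assert(refs > 0 && refs <= f->refcount);
	f->refcount -= refs;
	if (f->refcount == 0)
		f->freed = true;
}
```

In this model, dropping a single reference on an order-2 folio leaves three references behind and the folio is never freed, while dropping nr_pages references frees it. For an order-0 folio the two calls are equivalent.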
This leak would happen on a filesystem with blocksize > page_size when
a truncate is performed: the blocksize makes folios split into ones of
order > 0, and the truncated folios are then never freed.
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Reported-by: Hugh Dickins <hughd(a)google.com>
Closes: https://lore.kernel.org/all/fcbadb7f-dd3e-21df-f9a7-2853b53183c4@google.com/
Cc: stable(a)vger.kernel.org
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
---
mm/huge_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3d3ebdc002d5..373781b21e5c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3304,7 +3304,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
folio_account_cleaned(tail,
inode_to_wb(folio->mapping->host));
__filemap_remove_folio(tail, NULL);
- folio_put(tail);
+ folio_put_refs(tail, folio_nr_pages(tail));
} else if (!folio_test_anon(folio)) {
__xa_store(&folio->mapping->i_pages, tail->index,
tail, 0);
--
2.47.2
The macb ethernet driver (Raspberry Pi 5) delivers interrupts only to
the first core, quickly saturating it at higher packet rates.
Introducing software interrupt coalescing dramatically alleviates this
limitation; the one-line fix is upstream as commit
d57f7b45945ac0517ff8ea50655f00db6e8d637c.
Please backport this fix to 6.6 -stable to bring this benefit to more
Raspberry Pis; it applies cleanly on this branch.
Many thanks,
Daniel
--
Daniel J Blueman
usbmisc is an optional device property so it is totally valid for the
corresponding data->usbmisc_data to have a NULL value.
Check that before dereferencing the pointer.
Found by Linux Verification Center (linuxtesting.org) with Svace static
analysis tool.
Fixes: 74adad500346 ("usb: chipidea: ci_hdrc_imx: decrement device's refcount in .remove() and in the error path of .probe()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
drivers/usb/chipidea/ci_hdrc_imx.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index 1a7fc638213e..619779eef333 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -534,7 +534,8 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
cpu_latency_qos_remove_request(&data->pm_qos_req);
data->ci_pdev = NULL;
err_put:
- put_device(data->usbmisc_data->dev);
+ if (data->usbmisc_data)
+ put_device(data->usbmisc_data->dev);
return ret;
}
@@ -559,7 +560,8 @@ static void ci_hdrc_imx_remove(struct platform_device *pdev)
if (data->hsic_pad_regulator)
regulator_disable(data->hsic_pad_regulator);
}
- put_device(data->usbmisc_data->dev);
+ if (data->usbmisc_data)
+ put_device(data->usbmisc_data->dev);
}
static void ci_hdrc_imx_shutdown(struct platform_device *pdev)
--
2.48.1
Hello,
New build issue found on stable-rc/linux-5.15.y:
---
implicit declaration of function ‘acpi_get_cache_info’; did you mean
‘acpi_get_system_info’? [-Werror=implicit-function-declaration] in
arch/riscv/kernel/cacheinfo.o (arch/riscv/kernel/cacheinfo.c)
[logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/issue/maestro:c4d70565f303a7d7450fbf5add7ca4cc80a961…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 2ae395ef666caf57984ff9d2ad7bca6be851f719
Log excerpt:
=====================================================
arch/riscv/kernel/cacheinfo.c:127:23: error: implicit declaration of
function ‘acpi_get_cache_info’; did you mean ‘acpi_get_system_info’?
[-Werror=implicit-function-declaration]
127 | ret = acpi_get_cache_info(cpu, &fw_levels,
&split_levels);
| ^~~~~~~~~~~~~~~~~~~
| acpi_get_system_info
cc1: some warnings being treated as errors
CC arch/riscv/kernel/patch.o
CC fs/proc/generic.o
=====================================================
# Builds where the incident occurred:
## defconfig on (riscv):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67ced73618018371957dfa8e
## nommu_k210_defconfig on (riscv):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67ced73a18018371957dfa91
#kernelci issue maestro:c4d70565f303a7d7450fbf5add7ca4cc80a96112
Reported-by: kernelci.org bot <bot(a)kernelci.org>
Hello,
New build issue found on stable-rc/linux-5.4.y:
---
implicit declaration of function ‘acpi_get_cache_info’; did you mean
‘acpi_get_system_info’? [-Werror=implicit-function-declaration] in
arch/riscv/kernel/cacheinfo.o (arch/riscv/kernel/cacheinfo.c)
[logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/issue/maestro:0f2670909ac3275cc312c3c604f3ed03443fee…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 2f9225fb6ea4ba2ad94f50f0e24bad9c353b8649
Log excerpt:
=====================================================
arch/riscv/kernel/cacheinfo.c:118:23: error: implicit declaration of
function ‘acpi_get_cache_info’; did you mean ‘acpi_get_system_info’?
[-Werror=implicit-function-declaration]
118 | ret = acpi_get_cache_info(cpu, &fw_levels,
&split_levels);
| ^~~~~~~~~~~~~~~~~~~
| acpi_get_system_info
arch/riscv/kernel/cacheinfo.c:140:13: error: implicit declaration of
function ‘of_property_present’; did you mean
‘fwnode_property_present’? [-Werror=implicit-function-declaration]
140 | if (of_property_present(np, "cache-size"))
| ^~~~~~~~~~~~~~~~~~~
| fwnode_property_present
CC arch/riscv/kernel/module-sections.o
CC arch/riscv/kernel/perf_regs.o
cc1: some warnings being treated as errors
=====================================================
# Builds where the incident occurred:
## defconfig on (riscv):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67ced63718018371957df9ae
#kernelci issue maestro:0f2670909ac3275cc312c3c604f3ed03443feecc
Reported-by: kernelci.org bot <bot(a)kernelci.org>
From: Stefan Eichenberger <stefan.eichenberger(a)toradex.com>
Ensure the PHY reset and perst are asserted during power-off to
guarantee the PHY is in a reset state upon repeated power-on calls. This
resolves an issue where the PHY may not properly initialize during
subsequent power-on cycles. Power-on will deassert the reset at the
appropriate time after tuning the PHY parameters.
During suspend/resume cycles, we observed that the PHY PLL failed to
lock during resume when the CPU temperature increased from 65C to 75C.
The observed errors were:
phy phy-32f00000.pcie-phy.3: phy poweron failed --> -110
imx6q-pcie 33800000.pcie: waiting for PHY ready timeout!
imx6q-pcie 33800000.pcie: PM: dpm_run_callback(): genpd_resume_noirq+0x0/0x80 returns -110
imx6q-pcie 33800000.pcie: PM: failed to resume noirq: error -110
This resulted in a complete CPU freeze, which is resolved by ensuring
the PHY is in reset during power-on, thus preventing PHY PLL failures.
Cc: stable(a)vger.kernel.org
Fixes: 1aa97b002258 ("phy: freescale: pcie: Initialize the imx8 pcie standalone phy driver")
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
Acked-by: Richard Zhu <hongxing.zhu(a)nxp.com>
Signed-off-by: Stefan Eichenberger <stefan.eichenberger(a)toradex.com>
---
drivers/phy/freescale/phy-fsl-imx8m-pcie.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/phy/freescale/phy-fsl-imx8m-pcie.c b/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
index 5b505e34ca364..7355d9921b646 100644
--- a/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
+++ b/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
@@ -156,6 +156,16 @@ static int imx8_pcie_phy_power_on(struct phy *phy)
return ret;
}
+static int imx8_pcie_phy_power_off(struct phy *phy)
+{
+ struct imx8_pcie_phy *imx8_phy = phy_get_drvdata(phy);
+
+ reset_control_assert(imx8_phy->reset);
+ reset_control_assert(imx8_phy->perst);
+
+ return 0;
+}
+
static int imx8_pcie_phy_init(struct phy *phy)
{
struct imx8_pcie_phy *imx8_phy = phy_get_drvdata(phy);
@@ -176,6 +186,7 @@ static const struct phy_ops imx8_pcie_phy_ops = {
.init = imx8_pcie_phy_init,
.exit = imx8_pcie_phy_exit,
.power_on = imx8_pcie_phy_power_on,
+ .power_off = imx8_pcie_phy_power_off,
.owner = THIS_MODULE,
};
--
2.45.2
This is a note to let you know that I've just added the patch titled
iio: dac: ad3552r: clear reset status flag
to the 6.1-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iio-dac-ad3552r-clear-reset-status-flag.patch
and it can be found in the queue-6.1 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From e17b9f20da7d2bc1f48878ab2230523b2512d965 Mon Sep 17 00:00:00 2001
From: Angelo Dureghello <adureghello(a)baylibre.com>
Date: Sat, 25 Jan 2025 17:24:32 +0100
Subject: iio: dac: ad3552r: clear reset status flag
From: Angelo Dureghello <adureghello(a)baylibre.com>
commit e17b9f20da7d2bc1f48878ab2230523b2512d965 upstream.
Clear reset status flag, to keep error status register clean after reset
(ad3552r manual, rev B table 38).
The reset error flag was left at 1, so when dumping registers for
debugging, the "Error Status Register" read as dirty (0x01). It is
important to clear this bit so that any reset event during normal
operation can be detected.
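Why a stale flag hides later resets can be shown with a small register model. This sketch assumes the status bit is write-1-to-clear, which is an inference from the patch writing the mask to the register and is not confirmed here; the toy_ names and the 0x01 bit value are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit, chosen to match the 0x01 value quoted above. */
#define TOY_MASK_RESET_STATUS 0x01u

/* Hypothetical device model holding only the error status register. */
struct toy_dac {
	uint8_t err_status;
};

/* A reset event latches the reset status bit. */
void toy_reset_event(struct toy_dac *dac)
{
	assert(dac);
	dac->err_status |= TOY_MASK_RESET_STATUS;
}

/* Assumed write-1-to-clear: each bit set in val is cleared in the register. */
void toy_write_err_status(struct toy_dac *dac, uint8_t val)
{
	assert(dac);
	dac->err_status &= (uint8_t)~val;
}

/* Nonzero if a reset happened since the flag was last cleared. */
int toy_reset_detected(const struct toy_dac *dac)
{
	return (dac->err_status & TOY_MASK_RESET_STATUS) != 0;
}
```

If the flag is never cleared after the driver's own reset, toy_reset_detected() stays true forever and an unexpected reset later on cannot be told apart from the initial one.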
Fixes: 8f2b54824b28 ("drivers:iio:dac: Add AD3552R driver support")
Signed-off-by: Angelo Dureghello <adureghello(a)baylibre.com>
Link: https://patch.msgid.link/20250125-wip-bl-ad3552r-clear-reset-v2-1-aa3a27f3f…
Cc: <Stable@vger..kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/iio/dac/ad3552r.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/drivers/iio/dac/ad3552r.c
+++ b/drivers/iio/dac/ad3552r.c
@@ -703,6 +703,12 @@ static int ad3552r_reset(struct ad3552r_
return ret;
}
+ /* Clear reset error flag, see ad3552r manual, rev B table 38. */
+ ret = ad3552r_write_reg(dac, AD3552R_REG_ADDR_ERR_STATUS,
+ AD3552R_MASK_RESET_STATUS);
+ if (ret)
+ return ret;
+
return ad3552r_update_reg_field(dac,
addr_mask_map[AD3552R_ADDR_ASCENSION][0],
addr_mask_map[AD3552R_ADDR_ASCENSION][1],
Patches currently in stable-queue which might be from adureghello(a)baylibre.com are
queue-6.1/iio-dac-ad3552r-clear-reset-status-flag.patch
This is a note to let you know that I've just added the patch titled
iio: dac: ad3552r: clear reset status flag
to the 6.12-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iio-dac-ad3552r-clear-reset-status-flag.patch
and it can be found in the queue-6.12 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From e17b9f20da7d2bc1f48878ab2230523b2512d965 Mon Sep 17 00:00:00 2001
From: Angelo Dureghello <adureghello(a)baylibre.com>
Date: Sat, 25 Jan 2025 17:24:32 +0100
Subject: iio: dac: ad3552r: clear reset status flag
From: Angelo Dureghello <adureghello(a)baylibre.com>
commit e17b9f20da7d2bc1f48878ab2230523b2512d965 upstream.
Clear reset status flag, to keep error status register clean after reset
(ad3552r manual, rev B table 38).
The reset error flag was left at 1, so when dumping registers for
debugging, the "Error Status Register" read as dirty (0x01). It is
important to clear this bit so that any reset event during normal
operation can be detected.
Fixes: 8f2b54824b28 ("drivers:iio:dac: Add AD3552R driver support")
Signed-off-by: Angelo Dureghello <adureghello(a)baylibre.com>
Link: https://patch.msgid.link/20250125-wip-bl-ad3552r-clear-reset-v2-1-aa3a27f3f…
Cc: <Stable@vger..kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/iio/dac/ad3552r.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/drivers/iio/dac/ad3552r.c
+++ b/drivers/iio/dac/ad3552r.c
@@ -714,6 +714,12 @@ static int ad3552r_reset(struct ad3552r_
return ret;
}
+ /* Clear reset error flag, see ad3552r manual, rev B table 38. */
+ ret = ad3552r_write_reg(dac, AD3552R_REG_ADDR_ERR_STATUS,
+ AD3552R_MASK_RESET_STATUS);
+ if (ret)
+ return ret;
+
return ad3552r_update_reg_field(dac,
addr_mask_map[AD3552R_ADDR_ASCENSION][0],
addr_mask_map[AD3552R_ADDR_ASCENSION][1],
Patches currently in stable-queue which might be from adureghello(a)baylibre.com are
queue-6.12/iio-dac-ad3552r-clear-reset-status-flag.patch
This is a note to let you know that I've just added the patch titled
iio: dac: ad3552r: clear reset status flag
to the 6.13-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iio-dac-ad3552r-clear-reset-status-flag.patch
and it can be found in the queue-6.13 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From e17b9f20da7d2bc1f48878ab2230523b2512d965 Mon Sep 17 00:00:00 2001
From: Angelo Dureghello <adureghello(a)baylibre.com>
Date: Sat, 25 Jan 2025 17:24:32 +0100
Subject: iio: dac: ad3552r: clear reset status flag
From: Angelo Dureghello <adureghello(a)baylibre.com>
commit e17b9f20da7d2bc1f48878ab2230523b2512d965 upstream.
Clear reset status flag, to keep error status register clean after reset
(ad3552r manual, rev B table 38).
The reset error flag was left at 1, so when dumping registers for
debugging, the "Error Status Register" read as dirty (0x01). It is
important to clear this bit so that any reset event during normal
operation can be detected.
Fixes: 8f2b54824b28 ("drivers:iio:dac: Add AD3552R driver support")
Signed-off-by: Angelo Dureghello <adureghello(a)baylibre.com>
Link: https://patch.msgid.link/20250125-wip-bl-ad3552r-clear-reset-v2-1-aa3a27f3f…
Cc: <Stable@vger..kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/iio/dac/ad3552r.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/drivers/iio/dac/ad3552r.c
+++ b/drivers/iio/dac/ad3552r.c
@@ -410,6 +410,12 @@ static int ad3552r_reset(struct ad3552r_
return ret;
}
+ /* Clear reset error flag, see ad3552r manual, rev B table 38. */
+ ret = ad3552r_write_reg(dac, AD3552R_REG_ADDR_ERR_STATUS,
+ AD3552R_MASK_RESET_STATUS);
+ if (ret)
+ return ret;
+
return ad3552r_update_reg_field(dac,
AD3552R_REG_ADDR_INTERFACE_CONFIG_A,
AD3552R_MASK_ADDR_ASCENSION,
Patches currently in stable-queue which might be from adureghello(a)baylibre.com are
queue-6.13/iio-dac-ad3552r-clear-reset-status-flag.patch
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 91d44c1afc61a2fec37a9c7a3485368309391e0b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031035-dangle-briskness-0e29@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 91d44c1afc61a2fec37a9c7a3485368309391e0b Mon Sep 17 00:00:00 2001
From: Qiu-ji Chen <chenqiuji666(a)gmail.com>
Date: Sat, 18 Jan 2025 15:08:33 +0800
Subject: [PATCH] cdx: Fix possible UAF error in driver_override_show()
Fix a possible UAF problem in driver_override_show() in drivers/cdx/cdx.c.
This function driver_override_show() is part of DEVICE_ATTR_RW, which
includes both driver_override_show() and driver_override_store().
These functions can be executed concurrently in sysfs.
The driver_override_store() function uses driver_set_override() to
update the driver_override value, and driver_set_override() internally
locks the device (device_lock(dev)). If driver_override_show() reads
cdx_dev->driver_override without locking, it could potentially access
a freed pointer if driver_override_store() frees the string
concurrently. This could lead to printing a kernel address, which is a
security risk since DEVICE_ATTR can be read by all users.
Additionally, a similar pattern is used in drivers/amba/bus.c, as well
as many other bus drivers, where device_lock() is taken in the show
function, and it has been working without issues.
This potential bug was detected by our experimental static analysis
tool, which analyzes locking APIs and paired functions to identify
data races and atomicity violations.
Fixes: 1f86a00c1159 ("bus/fsl-mc: add support for 'driver_override' in the mc-bus")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
Link: https://lore.kernel.org/r/20250118070833.27201-1-chenqiuji666@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c
index c573ed2ee71a..7811aa734053 100644
--- a/drivers/cdx/cdx.c
+++ b/drivers/cdx/cdx.c
@@ -473,8 +473,12 @@ static ssize_t driver_override_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct cdx_device *cdx_dev = to_cdx_device(dev);
+ ssize_t len;
- return sysfs_emit(buf, "%s\n", cdx_dev->driver_override);
+ device_lock(dev);
+ len = sysfs_emit(buf, "%s\n", cdx_dev->driver_override);
+ device_unlock(dev);
+ return len;
}
static DEVICE_ATTR_RW(driver_override);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 91d44c1afc61a2fec37a9c7a3485368309391e0b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031035-unmoving-oak-e2a9@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 91d44c1afc61a2fec37a9c7a3485368309391e0b Mon Sep 17 00:00:00 2001
From: Qiu-ji Chen <chenqiuji666(a)gmail.com>
Date: Sat, 18 Jan 2025 15:08:33 +0800
Subject: [PATCH] cdx: Fix possible UAF error in driver_override_show()
Fixed a possible UAF problem in driver_override_show() in drivers/cdx/cdx.c.
This function driver_override_show() is part of DEVICE_ATTR_RW, which
includes both driver_override_show() and driver_override_store().
These functions can be executed concurrently in sysfs.
The driver_override_store() function uses driver_set_override() to
update the driver_override value, and driver_set_override() internally
locks the device (device_lock(dev)). If driver_override_show() reads
cdx_dev->driver_override without locking, it could potentially access
a freed pointer if driver_override_store() frees the string
concurrently. This could lead to printing a kernel address, which is a
security risk since DEVICE_ATTR can be read by all users.
Additionally, a similar pattern is used in drivers/amba/bus.c, as well
as many other bus drivers, where device_lock() is taken in the show
function, and it has been working without issues.
This potential bug was detected by our experimental static analysis
tool, which analyzes locking APIs and paired functions to identify
data races and atomicity violations.
Fixes: 1f86a00c1159 ("bus/fsl-mc: add support for 'driver_override' in the mc-bus")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
Link: https://lore.kernel.org/r/20250118070833.27201-1-chenqiuji666@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c
index c573ed2ee71a..7811aa734053 100644
--- a/drivers/cdx/cdx.c
+++ b/drivers/cdx/cdx.c
@@ -473,8 +473,12 @@ static ssize_t driver_override_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct cdx_device *cdx_dev = to_cdx_device(dev);
+ ssize_t len;
- return sysfs_emit(buf, "%s\n", cdx_dev->driver_override);
+ device_lock(dev);
+ len = sysfs_emit(buf, "%s\n", cdx_dev->driver_override);
+ device_unlock(dev);
+ return len;
}
static DEVICE_ATTR_RW(driver_override);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 91d44c1afc61a2fec37a9c7a3485368309391e0b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031034-faction-uphold-6310@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 91d44c1afc61a2fec37a9c7a3485368309391e0b Mon Sep 17 00:00:00 2001
From: Qiu-ji Chen <chenqiuji666(a)gmail.com>
Date: Sat, 18 Jan 2025 15:08:33 +0800
Subject: [PATCH] cdx: Fix possible UAF error in driver_override_show()
Fixed a possible UAF problem in driver_override_show() in drivers/cdx/cdx.c.
This function driver_override_show() is part of DEVICE_ATTR_RW, which
includes both driver_override_show() and driver_override_store().
These functions can be executed concurrently in sysfs.
The driver_override_store() function uses driver_set_override() to
update the driver_override value, and driver_set_override() internally
locks the device (device_lock(dev)). If driver_override_show() reads
cdx_dev->driver_override without locking, it could potentially access
a freed pointer if driver_override_store() frees the string
concurrently. This could lead to printing a kernel address, which is a
security risk since DEVICE_ATTR can be read by all users.
Additionally, a similar pattern is used in drivers/amba/bus.c, as well
as many other bus drivers, where device_lock() is taken in the show
function, and it has been working without issues.
This potential bug was detected by our experimental static analysis
tool, which analyzes locking APIs and paired functions to identify
data races and atomicity violations.
Fixes: 1f86a00c1159 ("bus/fsl-mc: add support for 'driver_override' in the mc-bus")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
Link: https://lore.kernel.org/r/20250118070833.27201-1-chenqiuji666@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c
index c573ed2ee71a..7811aa734053 100644
--- a/drivers/cdx/cdx.c
+++ b/drivers/cdx/cdx.c
@@ -473,8 +473,12 @@ static ssize_t driver_override_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct cdx_device *cdx_dev = to_cdx_device(dev);
+ ssize_t len;
- return sysfs_emit(buf, "%s\n", cdx_dev->driver_override);
+ device_lock(dev);
+ len = sysfs_emit(buf, "%s\n", cdx_dev->driver_override);
+ device_unlock(dev);
+ return len;
}
static DEVICE_ATTR_RW(driver_override);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 189ecdb3e112da703ac0699f4ec76aa78122f911
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031003-unstitch-arbitrate-baa1@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 189ecdb3e112da703ac0699f4ec76aa78122f911 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 27 Feb 2025 14:24:10 -0800
Subject: [PATCH] KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
Snapshot the host's DEBUGCTL after disabling IRQs, as perf can toggle
debugctl bits from IRQ context, e.g. when enabling/disabling events via
smp_call_function_single(). Taking the snapshot (long) before IRQs are
disabled could result in KVM effectively clobbering DEBUGCTL due to using
a stale snapshot.
Cc: stable(a)vger.kernel.org
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria(a)amd.com>
Link: https://lore.kernel.org/r/20250227222411.3490595-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5c6fd0edc41f..12d5f47c1bbe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4968,7 +4968,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
/* Save host pkru register if supported */
vcpu->arch.host_pkru = read_pkru();
- vcpu->arch.host_debugctl = get_debugctlmsr();
/* Apply any externally detected TSC adjustments (due to suspend) */
if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
@@ -10969,6 +10968,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(0, 7);
}
+ vcpu->arch.host_debugctl = get_debugctlmsr();
+
guest_timing_enter_irqoff();
for (;;) {
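
The ordering problem described in the commit message can be modelled in plain C, under heavy simplification. In the sketch below, fake_debugctl stands in for the MSR, fake_irq_sets_bit() stands in for perf toggling a bit from IRQ context, and the "zero, then restore if the snapshot is non-zero" step mirrors the VMX restore logic quoted elsewhere in this thread. All names are hypothetical and nothing here is KVM code; it only shows why a snapshot taken before the last possible IRQ can lose an update.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the host's DEBUGCTL MSR. */
static uint64_t fake_debugctl;

/* Stand-in for perf setting a DEBUGCTL bit from IRQ context. */
static void fake_irq_sets_bit(uint64_t bit)
{
	fake_debugctl |= bit;
}

/* Model of guest entry/exit: the register is zeroed, then restored. */
static void fake_enter_exit(uint64_t snapshot)
{
	fake_debugctl = 0;		/* value is lost across the guest run */
	if (snapshot)
		fake_debugctl = snapshot;	/* restore from the snapshot */
}

/* Snapshot taken early, while an IRQ can still arrive afterwards. */
static uint64_t run_with_early_snapshot(uint64_t initial, uint64_t bit)
{
	uint64_t snapshot;

	fake_debugctl = initial;
	snapshot = fake_debugctl;	/* IRQs still enabled here */
	fake_irq_sets_bit(bit);		/* IRQ fires after the snapshot */
	fake_enter_exit(snapshot);	/* stale snapshot is written back */
	return fake_debugctl;
}

/* Snapshot taken late, after the point where IRQs are disabled. */
static uint64_t run_with_late_snapshot(uint64_t initial, uint64_t bit)
{
	uint64_t snapshot;

	fake_debugctl = initial;
	fake_irq_sets_bit(bit);		/* last IRQ before IRQs go off */
	snapshot = fake_debugctl;	/* IRQs disabled: value is stable */
	fake_enter_exit(snapshot);
	return fake_debugctl;
}
```

With an initial value of 0 and an IRQ that sets bit 0, the early-snapshot path ends with the bit cleared, while the late-snapshot path preserves it.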
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 189ecdb3e112da703ac0699f4ec76aa78122f911
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031002-campsite-railroad-4d13@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 189ecdb3e112da703ac0699f4ec76aa78122f911 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 27 Feb 2025 14:24:10 -0800
Subject: [PATCH] KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
Snapshot the host's DEBUGCTL after disabling IRQs, as perf can toggle
debugctl bits from IRQ context, e.g. when enabling/disabling events via
smp_call_function_single(). Taking the snapshot (long) before IRQs are
disabled could result in KVM effectively clobbering DEBUGCTL due to using
a stale snapshot.
Cc: stable(a)vger.kernel.org
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria(a)amd.com>
Link: https://lore.kernel.org/r/20250227222411.3490595-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5c6fd0edc41f..12d5f47c1bbe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4968,7 +4968,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
/* Save host pkru register if supported */
vcpu->arch.host_pkru = read_pkru();
- vcpu->arch.host_debugctl = get_debugctlmsr();
/* Apply any externally detected TSC adjustments (due to suspend) */
if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
@@ -10969,6 +10968,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(0, 7);
}
+ vcpu->arch.host_debugctl = get_debugctlmsr();
+
guest_timing_enter_irqoff();
for (;;) {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x fb71c795935652fa20eaf9517ca9547f5af99a76
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031034-twister-stash-ba87@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb71c795935652fa20eaf9517ca9547f5af99a76 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 27 Feb 2025 14:24:08 -0800
Subject: [PATCH] KVM: x86: Snapshot the host's DEBUGCTL in common x86
Move KVM's snapshot of DEBUGCTL to kvm_vcpu_arch and take the snapshot in
common x86, so that SVM can also use the snapshot.
Opportunistically change the field to a u64. While bits 63:32 are reserved
on AMD, not mentioned at all in Intel's SDM, and managed as an "unsigned
long" by the kernel, DEBUGCTL is an MSR and therefore a 64-bit value.
Reviewed-by: Xiaoyao Li <xiaoyao.li(a)intel.com>
Cc: stable(a)vger.kernel.org
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria(a)amd.com>
Link: https://lore.kernel.org/r/20250227222411.3490595-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0b7af5902ff7..32ae3aa50c7e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -780,6 +780,7 @@ struct kvm_vcpu_arch {
u32 pkru;
u32 hflags;
u64 efer;
+ u64 host_debugctl;
u64 apic_base;
struct kvm_lapic *apic; /* kernel irqchip context */
bool load_eoi_exitmap_pending;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6c56d5235f0f..3b92f893b239 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1514,16 +1514,12 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
*/
void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
- struct vcpu_vmx *vmx = to_vmx(vcpu);
-
if (vcpu->scheduled_out && !kvm_pause_in_guest(vcpu->kvm))
shrink_ple_window(vcpu);
vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
vmx_vcpu_pi_load(vcpu, cpu);
-
- vmx->host_debugctlmsr = get_debugctlmsr();
}
void vmx_vcpu_put(struct kvm_vcpu *vcpu)
@@ -7458,8 +7454,8 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
}
/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
- if (vmx->host_debugctlmsr)
- update_debugctlmsr(vmx->host_debugctlmsr);
+ if (vcpu->arch.host_debugctl)
+ update_debugctlmsr(vcpu->arch.host_debugctl);
#ifndef CONFIG_X86_64
/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 8b111ce1087c..951e44dc9d0e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -340,8 +340,6 @@ struct vcpu_vmx {
/* apic deadline value in host tsc */
u64 hv_deadline_tsc;
- unsigned long host_debugctlmsr;
-
/*
* Only bits masked by msr_ia32_feature_control_valid_bits can be set in
* msr_ia32_feature_control. FEAT_CTL_LOCKED is always included
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 02159c967d29..5c6fd0edc41f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4968,6 +4968,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
/* Save host pkru register if supported */
vcpu->arch.host_pkru = read_pkru();
+ vcpu->arch.host_debugctl = get_debugctlmsr();
/* Apply any externally detected TSC adjustments (due to suspend) */
if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
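
The commit message's remark about widening the field to u64 can be shown with a small standalone sketch. It assumes a build where "unsigned long" is 32 bits wide, modelled here with uint32_t so the result does not depend on the host; the struct and function names are hypothetical. Whether any of bits 63:32 of DEBUGCTL are ever set on real hardware is not something this example claims.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot slot as wide as a 32-bit "unsigned long". */
struct snap_narrow {
	uint32_t host_debugctl;
};

/* Hypothetical snapshot slot matching the 64-bit width of an MSR. */
struct snap_wide {
	uint64_t host_debugctl;
};

/* Store a 64-bit MSR value in the narrow slot and read it back. */
static uint64_t roundtrip_narrow(uint64_t msr)
{
	struct snap_narrow s = { .host_debugctl = (uint32_t)msr };

	return s.host_debugctl;	/* bits 63:32 were dropped on store */
}

/* Store a 64-bit MSR value in the wide slot and read it back. */
static uint64_t roundtrip_wide(uint64_t msr)
{
	struct snap_wide s = { .host_debugctl = msr };

	return s.host_debugctl;	/* full value survives */
}
```

A value with a bit set above bit 31 comes back unchanged only from the 64-bit field.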
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x fb71c795935652fa20eaf9517ca9547f5af99a76
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031034-latitude-stinking-09c1@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb71c795935652fa20eaf9517ca9547f5af99a76 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 27 Feb 2025 14:24:08 -0800
Subject: [PATCH] KVM: x86: Snapshot the host's DEBUGCTL in common x86
Move KVM's snapshot of DEBUGCTL to kvm_vcpu_arch and take the snapshot in
common x86, so that SVM can also use the snapshot.
Opportunistically change the field to a u64. While bits 63:32 are reserved
on AMD, not mentioned at all in Intel's SDM, and managed as an "unsigned
long" by the kernel, DEBUGCTL is an MSR and therefore a 64-bit value.
Reviewed-by: Xiaoyao Li <xiaoyao.li(a)intel.com>
Cc: stable(a)vger.kernel.org
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria(a)amd.com>
Link: https://lore.kernel.org/r/20250227222411.3490595-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0b7af5902ff7..32ae3aa50c7e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -780,6 +780,7 @@ struct kvm_vcpu_arch {
u32 pkru;
u32 hflags;
u64 efer;
+ u64 host_debugctl;
u64 apic_base;
struct kvm_lapic *apic; /* kernel irqchip context */
bool load_eoi_exitmap_pending;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6c56d5235f0f..3b92f893b239 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1514,16 +1514,12 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
*/
void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
- struct vcpu_vmx *vmx = to_vmx(vcpu);
-
if (vcpu->scheduled_out && !kvm_pause_in_guest(vcpu->kvm))
shrink_ple_window(vcpu);
vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
vmx_vcpu_pi_load(vcpu, cpu);
-
- vmx->host_debugctlmsr = get_debugctlmsr();
}
void vmx_vcpu_put(struct kvm_vcpu *vcpu)
@@ -7458,8 +7454,8 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
}
/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
- if (vmx->host_debugctlmsr)
- update_debugctlmsr(vmx->host_debugctlmsr);
+ if (vcpu->arch.host_debugctl)
+ update_debugctlmsr(vcpu->arch.host_debugctl);
#ifndef CONFIG_X86_64
/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 8b111ce1087c..951e44dc9d0e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -340,8 +340,6 @@ struct vcpu_vmx {
/* apic deadline value in host tsc */
u64 hv_deadline_tsc;
- unsigned long host_debugctlmsr;
-
/*
* Only bits masked by msr_ia32_feature_control_valid_bits can be set in
* msr_ia32_feature_control. FEAT_CTL_LOCKED is always included
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 02159c967d29..5c6fd0edc41f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4968,6 +4968,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
/* Save host pkru register if supported */
vcpu->arch.host_pkru = read_pkru();
+ vcpu->arch.host_debugctl = get_debugctlmsr();
/* Apply any externally detected TSC adjustments (due to suspend) */
if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x be45bc4eff33d9a7dae84a2150f242a91a617402
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031025-hurry-muster-0e93@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be45bc4eff33d9a7dae84a2150f242a91a617402 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 24 Feb 2025 08:54:41 -0800
Subject: [PATCH] KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the
STI shadow
Enable/disable local IRQs, i.e. set/clear RFLAGS.IF, in the common
svm_vcpu_enter_exit() just after/before guest_state_{enter,exit}_irqoff()
so that VMRUN is not executed in an STI shadow. AMD CPUs have a quirk
(some would say "bug"), where the STI shadow bleeds into the guest's
intr_state field if a #VMEXIT occurs during injection of an event, i.e. if
the VMRUN doesn't complete before the subsequent #VMEXIT.
The spurious "interrupts masked" state is relatively benign, as it only
occurs during event injection and is transient. Because KVM is already
injecting an event, the guest can't be in HLT, and if KVM is querying IRQ
blocking for injection, then KVM would need to force an immediate exit
anyways since injecting multiple events is impossible.
However, because KVM copies int_state verbatim from vmcb02 to vmcb12, the
spurious STI shadow is visible to L1 when running a nested VM, which can
trip sanity checks, e.g. in VMware's VMM.
Hoist the STI+CLI all the way to C code, as the aforementioned calls to
guest_state_{enter,exit}_irqoff() already inform lockdep that IRQs are
enabled/disabled, and taking a fault on VMRUN with RFLAGS.IF=1 is already
possible. I.e. if there's kernel code that is confused by running with
RFLAGS.IF=1, then it's already a problem. In practice, since GIF=0 also
blocks NMIs, the only change in exposure to non-KVM code (relative to
surrounding VMRUN with STI+CLI) is exception handling code, and except for
the kvm_rebooting=1 case, all exceptions in the core VM-Enter/VM-Exit path
are fatal.
Use the "raw" variants to enable/disable IRQs to avoid tracing in the
"no instrumentation" code; the guest state helpers also take care of
tracing IRQ state.
Opportunistically document why KVM needs to do STI in the first place.
Reported-by: Doug Covelli <doug.covelli(a)broadcom.com>
Closes: https://lore.kernel.org/all/CADH9ctBs1YPmE4aCfGPNBwA10cA8RuAk2gO7542DjMZgs4…
Fixes: f14eec0a3203 ("KVM: SVM: move more vmentry code to assembly")
Cc: stable(a)vger.kernel.org
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Link: https://lore.kernel.org/r/20250224165442.2338294-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a713c803a3a3..0d299f3f921e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4189,6 +4189,18 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
guest_state_enter_irqoff();
+ /*
+ * Set RFLAGS.IF prior to VMRUN, as the host's RFLAGS.IF at the time of
+ * VMRUN controls whether or not physical IRQs are masked (KVM always
+ * runs with V_INTR_MASKING_MASK). Toggle RFLAGS.IF here to avoid the
+ * temptation to do STI+VMRUN+CLI, as AMD CPUs bleed the STI shadow
+ * into guest state if delivery of an event during VMRUN triggers a
+ * #VMEXIT, and the guest_state transitions already tell lockdep that
+ * IRQs are being enabled/disabled. Note! GIF=0 for the entirety of
+ * this path, so IRQs aren't actually unmasked while running host code.
+ */
+ raw_local_irq_enable();
+
amd_clear_divider();
if (sev_es_guest(vcpu->kvm))
@@ -4197,6 +4209,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
else
__svm_vcpu_run(svm, spec_ctrl_intercepted);
+ raw_local_irq_disable();
+
guest_state_exit_irqoff();
}
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 2ed80aea3bb1..0c61153b275f 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -170,12 +170,8 @@ SYM_FUNC_START(__svm_vcpu_run)
mov VCPU_RDI(%_ASM_DI), %_ASM_DI
/* Enter guest mode */
- sti
-
3: vmrun %_ASM_AX
4:
- cli
-
/* Pop @svm to RAX while it's the only available register. */
pop %_ASM_AX
@@ -340,12 +336,8 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
mov KVM_VMCB_pa(%rax), %rax
/* Enter guest mode */
- sti
-
1: vmrun %rax
-
-2: cli
-
+2:
/* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x be45bc4eff33d9a7dae84a2150f242a91a617402
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031024-bootleg-parkway-393c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be45bc4eff33d9a7dae84a2150f242a91a617402 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 24 Feb 2025 08:54:41 -0800
Subject: [PATCH] KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the
STI shadow
Enable/disable local IRQs, i.e. set/clear RFLAGS.IF, in the common
svm_vcpu_enter_exit() just after/before guest_state_{enter,exit}_irqoff()
so that VMRUN is not executed in an STI shadow. AMD CPUs have a quirk
(some would say "bug"), where the STI shadow bleeds into the guest's
intr_state field if a #VMEXIT occurs during injection of an event, i.e. if
the VMRUN doesn't complete before the subsequent #VMEXIT.
The spurious "interrupts masked" state is relatively benign, as it only
occurs during event injection and is transient. Because KVM is already
injecting an event, the guest can't be in HLT, and if KVM is querying IRQ
blocking for injection, then KVM would need to force an immediate exit
anyways since injecting multiple events is impossible.
However, because KVM copies int_state verbatim from vmcb02 to vmcb12, the
spurious STI shadow is visible to L1 when running a nested VM, which can
trip sanity checks, e.g. in VMware's VMM.
Hoist the STI+CLI all the way to C code, as the aforementioned calls to
guest_state_{enter,exit}_irqoff() already inform lockdep that IRQs are
enabled/disabled, and taking a fault on VMRUN with RFLAGS.IF=1 is already
possible. I.e. if there's kernel code that is confused by running with
RFLAGS.IF=1, then it's already a problem. In practice, since GIF=0 also
blocks NMIs, the only change in exposure to non-KVM code (relative to
surrounding VMRUN with STI+CLI) is exception handling code, and except for
the kvm_rebooting=1 case, all exceptions in the core VM-Enter/VM-Exit path
are fatal.
Use the "raw" variants to enable/disable IRQs to avoid tracing in the
"no instrumentation" code; the guest state helpers also take care of
tracing IRQ state.
Opportunistically document why KVM needs to do STI in the first place.
Reported-by: Doug Covelli <doug.covelli(a)broadcom.com>
Closes: https://lore.kernel.org/all/CADH9ctBs1YPmE4aCfGPNBwA10cA8RuAk2gO7542DjMZgs4…
Fixes: f14eec0a3203 ("KVM: SVM: move more vmentry code to assembly")
Cc: stable(a)vger.kernel.org
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Link: https://lore.kernel.org/r/20250224165442.2338294-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a713c803a3a3..0d299f3f921e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4189,6 +4189,18 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
guest_state_enter_irqoff();
+ /*
+ * Set RFLAGS.IF prior to VMRUN, as the host's RFLAGS.IF at the time of
+ * VMRUN controls whether or not physical IRQs are masked (KVM always
+ * runs with V_INTR_MASKING_MASK). Toggle RFLAGS.IF here to avoid the
+ * temptation to do STI+VMRUN+CLI, as AMD CPUs bleed the STI shadow
+ * into guest state if delivery of an event during VMRUN triggers a
+ * #VMEXIT, and the guest_state transitions already tell lockdep that
+ * IRQs are being enabled/disabled. Note! GIF=0 for the entirety of
+ * this path, so IRQs aren't actually unmasked while running host code.
+ */
+ raw_local_irq_enable();
+
amd_clear_divider();
if (sev_es_guest(vcpu->kvm))
@@ -4197,6 +4209,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
else
__svm_vcpu_run(svm, spec_ctrl_intercepted);
+ raw_local_irq_disable();
+
guest_state_exit_irqoff();
}
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 2ed80aea3bb1..0c61153b275f 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -170,12 +170,8 @@ SYM_FUNC_START(__svm_vcpu_run)
mov VCPU_RDI(%_ASM_DI), %_ASM_DI
/* Enter guest mode */
- sti
-
3: vmrun %_ASM_AX
4:
- cli
-
/* Pop @svm to RAX while it's the only available register. */
pop %_ASM_AX
@@ -340,12 +336,8 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
mov KVM_VMCB_pa(%rax), %rax
/* Enter guest mode */
- sti
-
1: vmrun %rax
-
-2: cli
-
+2:
/* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x be45bc4eff33d9a7dae84a2150f242a91a617402
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031023-dodge-ungodly-172a@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be45bc4eff33d9a7dae84a2150f242a91a617402 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 24 Feb 2025 08:54:41 -0800
Subject: [PATCH] KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the
STI shadow
Enable/disable local IRQs, i.e. set/clear RFLAGS.IF, in the common
svm_vcpu_enter_exit() just after/before guest_state_{enter,exit}_irqoff()
so that VMRUN is not executed in an STI shadow. AMD CPUs have a quirk
(some would say "bug"), where the STI shadow bleeds into the guest's
intr_state field if a #VMEXIT occurs during injection of an event, i.e. if
the VMRUN doesn't complete before the subsequent #VMEXIT.
The spurious "interrupts masked" state is relatively benign, as it only
occurs during event injection and is transient. Because KVM is already
injecting an event, the guest can't be in HLT, and if KVM is querying IRQ
blocking for injection, then KVM would need to force an immediate exit
anyways since injecting multiple events is impossible.
However, because KVM copies int_state verbatim from vmcb02 to vmcb12, the
spurious STI shadow is visible to L1 when running a nested VM, which can
trip sanity checks, e.g. in VMware's VMM.
Hoist the STI+CLI all the way to C code, as the aforementioned calls to
guest_state_{enter,exit}_irqoff() already inform lockdep that IRQs are
enabled/disabled, and taking a fault on VMRUN with RFLAGS.IF=1 is already
possible. I.e. if there's kernel code that is confused by running with
RFLAGS.IF=1, then it's already a problem. In practice, since GIF=0 also
blocks NMIs, the only change in exposure to non-KVM code (relative to
surrounding VMRUN with STI+CLI) is exception handling code, and except for
the kvm_rebooting=1 case, all exceptions in the core VM-Enter/VM-Exit path
are fatal.
Use the "raw" variants to enable/disable IRQs to avoid tracing in the
"no instrumentation" code; the guest state helpers also take care of
tracing IRQ state.
Opportunistically document why KVM needs to do STI in the first place.
Reported-by: Doug Covelli <doug.covelli(a)broadcom.com>
Closes: https://lore.kernel.org/all/CADH9ctBs1YPmE4aCfGPNBwA10cA8RuAk2gO7542DjMZgs4…
Fixes: f14eec0a3203 ("KVM: SVM: move more vmentry code to assembly")
Cc: stable(a)vger.kernel.org
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Link: https://lore.kernel.org/r/20250224165442.2338294-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a713c803a3a3..0d299f3f921e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4189,6 +4189,18 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
guest_state_enter_irqoff();
+ /*
+ * Set RFLAGS.IF prior to VMRUN, as the host's RFLAGS.IF at the time of
+ * VMRUN controls whether or not physical IRQs are masked (KVM always
+ * runs with V_INTR_MASKING_MASK). Toggle RFLAGS.IF here to avoid the
+ * temptation to do STI+VMRUN+CLI, as AMD CPUs bleed the STI shadow
+ * into guest state if delivery of an event during VMRUN triggers a
+ * #VMEXIT, and the guest_state transitions already tell lockdep that
+ * IRQs are being enabled/disabled. Note! GIF=0 for the entirety of
+ * this path, so IRQs aren't actually unmasked while running host code.
+ */
+ raw_local_irq_enable();
+
amd_clear_divider();
if (sev_es_guest(vcpu->kvm))
@@ -4197,6 +4209,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
else
__svm_vcpu_run(svm, spec_ctrl_intercepted);
+ raw_local_irq_disable();
+
guest_state_exit_irqoff();
}
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 2ed80aea3bb1..0c61153b275f 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -170,12 +170,8 @@ SYM_FUNC_START(__svm_vcpu_run)
mov VCPU_RDI(%_ASM_DI), %_ASM_DI
/* Enter guest mode */
- sti
-
3: vmrun %_ASM_AX
4:
- cli
-
/* Pop @svm to RAX while it's the only available register. */
pop %_ASM_AX
@@ -340,12 +336,8 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
mov KVM_VMCB_pa(%rax), %rax
/* Enter guest mode */
- sti
-
1: vmrun %rax
-
-2: cli
-
+2:
/* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT
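The reordering in the patch above can be illustrated with a small user-space toy model. This is only a sketch, not KVM code: the `toy_*` names, `old_sequence()` and `new_sequence()` are hypothetical, and the model assumes nothing more than the architectural rule that an STI which changes IF from 0 to 1 keeps interrupts inhibited for the one instruction that follows it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of RFLAGS.IF and the one-instruction STI shadow (illustration only). */
struct toy_cpu {
	bool if_flag;    /* models RFLAGS.IF */
	bool sti_shadow; /* set only for the instruction right after an IF 0->1 STI */
};

/* STI: set IF; a shadow exists only if IF was clear beforehand. */
static void toy_sti(struct toy_cpu *cpu)
{
	cpu->sti_shadow = !cpu->if_flag;
	cpu->if_flag = true;
}

/* CLI: clear IF; no shadow remains afterwards. */
static void toy_cli(struct toy_cpu *cpu)
{
	cpu->if_flag = false;
	cpu->sti_shadow = false;
}

/* Any ordinary instruction retires and thereby ends the shadow. */
static void toy_other_insn(struct toy_cpu *cpu)
{
	cpu->sti_shadow = false;
}

/* VMRUN: report whether it executed inside the STI shadow. */
static bool toy_vmrun(struct toy_cpu *cpu)
{
	bool in_shadow = cpu->sti_shadow;

	assert(cpu->if_flag); /* both sequences below enter the guest with IF=1 */
	cpu->sti_shadow = false;
	return in_shadow;
}

/* Old assembly sequence: sti; vmrun; cli */
static bool old_sequence(void)
{
	struct toy_cpu cpu = { false, false };
	bool in_shadow;

	toy_sti(&cpu);
	in_shadow = toy_vmrun(&cpu);
	toy_cli(&cpu);
	return in_shadow;
}

/* New sequence: IF is set in C, other instructions run, then vmrun. */
static bool new_sequence(void)
{
	struct toy_cpu cpu = { false, false };
	bool in_shadow;

	toy_sti(&cpu);        /* stands in for raw_local_irq_enable() */
	toy_other_insn(&cpu); /* stands in for the code between the C call and VMRUN */
	in_shadow = toy_vmrun(&cpu);
	toy_cli(&cpu);        /* stands in for raw_local_irq_disable() */
	return in_shadow;
}
```

In this model `old_sequence()` reports that VMRUN ran in the shadow and `new_sequence()` reports that it did not. That is the property the patch relies on to keep the shadow out of the guest's interrupt state.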
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x be45bc4eff33d9a7dae84a2150f242a91a617402
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031022-debunk-winner-e8fe@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be45bc4eff33d9a7dae84a2150f242a91a617402 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 24 Feb 2025 08:54:41 -0800
Subject: [PATCH] KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the
STI shadow
Enable/disable local IRQs, i.e. set/clear RFLAGS.IF, in the common
svm_vcpu_enter_exit() just after/before guest_state_{enter,exit}_irqoff()
so that VMRUN is not executed in an STI shadow. AMD CPUs have a quirk
(some would say "bug"), where the STI shadow bleeds into the guest's
intr_state field if a #VMEXIT occurs during injection of an event, i.e. if
the VMRUN doesn't complete before the subsequent #VMEXIT.
The spurious "interrupts masked" state is relatively benign, as it only
occurs during event injection and is transient. Because KVM is already
injecting an event, the guest can't be in HLT, and if KVM is querying IRQ
blocking for injection, then KVM would need to force an immediate exit
anyways since injecting multiple events is impossible.
However, because KVM copies int_state verbatim from vmcb02 to vmcb12, the
spurious STI shadow is visible to L1 when running a nested VM, which can
trip sanity checks, e.g. in VMware's VMM.
Hoist the STI+CLI all the way to C code, as the aforementioned calls to
guest_state_{enter,exit}_irqoff() already inform lockdep that IRQs are
enabled/disabled, and taking a fault on VMRUN with RFLAGS.IF=1 is already
possible. I.e. if there's kernel code that is confused by running with
RFLAGS.IF=1, then it's already a problem. In practice, since GIF=0 also
blocks NMIs, the only change in exposure to non-KVM code (relative to
surrounding VMRUN with STI+CLI) is exception handling code, and except for
the kvm_rebooting=1 case, all exceptions in the core VM-Enter/VM-Exit path
are fatal.
Use the "raw" variants to enable/disable IRQs to avoid tracing in the
"no instrumentation" code; the guest state helpers also take care of
tracing IRQ state.
Opportunistically document why KVM needs to do STI in the first place.
Reported-by: Doug Covelli <doug.covelli(a)broadcom.com>
Closes: https://lore.kernel.org/all/CADH9ctBs1YPmE4aCfGPNBwA10cA8RuAk2gO7542DjMZgs4…
Fixes: f14eec0a3203 ("KVM: SVM: move more vmentry code to assembly")
Cc: stable(a)vger.kernel.org
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Link: https://lore.kernel.org/r/20250224165442.2338294-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a713c803a3a3..0d299f3f921e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4189,6 +4189,18 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
guest_state_enter_irqoff();
+ /*
+ * Set RFLAGS.IF prior to VMRUN, as the host's RFLAGS.IF at the time of
+ * VMRUN controls whether or not physical IRQs are masked (KVM always
+ * runs with V_INTR_MASKING_MASK). Toggle RFLAGS.IF here to avoid the
+ * temptation to do STI+VMRUN+CLI, as AMD CPUs bleed the STI shadow
+ * into guest state if delivery of an event during VMRUN triggers a
+ * #VMEXIT, and the guest_state transitions already tell lockdep that
+ * IRQs are being enabled/disabled. Note! GIF=0 for the entirety of
+ * this path, so IRQs aren't actually unmasked while running host code.
+ */
+ raw_local_irq_enable();
+
amd_clear_divider();
if (sev_es_guest(vcpu->kvm))
@@ -4197,6 +4209,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
else
__svm_vcpu_run(svm, spec_ctrl_intercepted);
+ raw_local_irq_disable();
+
guest_state_exit_irqoff();
}
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 2ed80aea3bb1..0c61153b275f 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -170,12 +170,8 @@ SYM_FUNC_START(__svm_vcpu_run)
mov VCPU_RDI(%_ASM_DI), %_ASM_DI
/* Enter guest mode */
- sti
-
3: vmrun %_ASM_AX
4:
- cli
-
/* Pop @svm to RAX while it's the only available register. */
pop %_ASM_AX
@@ -340,12 +336,8 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
mov KVM_VMCB_pa(%rax), %rax
/* Enter guest mode */
- sti
-
1: vmrun %rax
-
-2: cli
-
+2:
/* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT
This patch series addresses two issues:
1) Fix typo in pattern properties for R-Car V4M.
2) Fix page entries in the AFL list.
v1->v2:
* Split fixes patches as separate series.
* Added Rb tag from Geert for binding patch.
* Added the tag Cc:stable@vger.kernel.org
Biju Das (2):
dt-bindings: can: renesas,rcar-canfd: Fix typo in pattern properties
for R-Car V4M
can: rcar_canfd: Fix page entries in the AFL list
.../bindings/net/can/renesas,rcar-canfd.yaml | 2 +-
drivers/net/can/rcar/rcar_canfd.c | 17 ++++++++++-------
2 files changed, 11 insertions(+), 8 deletions(-)
--
2.43.0
This small series adds support for non-coherent video capture buffers
on Rockchip ISP V1. Patch 1 fixes cache management for dmabufs
allocated by the dma-contig allocator. Patch 2 allows non-coherent
allocations on the rkisp1 capture queue. Some timing measurements are
provided in the commit message of patch 2.
Signed-off-by: Mikhail Rudenko <mike.rudenko(a)gmail.com>
---
Changes in v4:
- rebase to media/next
- use `direction` instead of `buf->dma_dir` in dma_sync_sgtable_*
- Link to v3: https://lore.kernel.org/r/20250128-b4-rkisp-noncoherent-v3-0-baf39c997d2a@g…
Changes in v3:
- ignore skip_cache_sync_* flags in vb2_dc_dmabuf_ops_{begin,end}_cpu_access
- invalidate/flush kernel mappings as appropriate if they exist
- use dma_sync_sgtable_* instead of dma_sync_sg_*
- Link to v2: https://lore.kernel.org/r/20250115-b4-rkisp-noncoherent-v2-0-0853e1a24012@g…
Changes in v2:
- Fix vb2_dc_dmabuf_ops_{begin,end}_cpu_access() for non-coherent buffers.
- Add cache management timing information to patch 2 commit message.
- Link to v1: https://lore.kernel.org/r/20250102-b4-rkisp-noncoherent-v1-1-bba164f7132c@g…
---
Mikhail Rudenko (2):
media: videobuf2: Fix dmabuf cache sync/flush in dma-contig
media: rkisp1: Allow non-coherent video capture buffers
.../media/common/videobuf2/videobuf2-dma-contig.c | 22 ++++++++++++++++++++++
.../platform/rockchip/rkisp1/rkisp1-capture.c | 1 +
2 files changed, 23 insertions(+)
---
base-commit: b2c4bf0c102084e77ed1b12090d77a76469a6814
change-id: 20241231-b4-rkisp-noncoherent-ad6e7c7a68ba
Best regards,
--
Mikhail Rudenko <mike.rudenko(a)gmail.com>
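For readers following the cache-management part of the series above: user space normally brackets CPU access to a mapped dmabuf with DMA_BUF_IOCTL_SYNC, which the kernel forwards to the exporter's begin/end CPU access callbacks. The sketch below shows that bracket. The helper name `dmabuf_cpu_access()` is hypothetical and not part of the series; how the dmabuf fd is obtained (for example via VIDIOC_EXPBUF) is left out.

```c
#include <assert.h> /* only needed for the usage line in the note below */
#include <errno.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/*
 * Hypothetical helper: start or end a CPU access window on a dmabuf fd.
 * Returns 0 on success or a negative errno value on failure.
 */
static int dmabuf_cpu_access(int fd, bool begin, bool write)
{
	struct dma_buf_sync sync = {
		.flags = (begin ? DMA_BUF_SYNC_START : DMA_BUF_SYNC_END) |
			 (write ? DMA_BUF_SYNC_RW : DMA_BUF_SYNC_READ),
	};
	int ret;

	/* Retry if the ioctl is interrupted before it completes. */
	do {
		ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret == 0 ? 0 : -errno;
}
```

A reader would call `dmabuf_cpu_access(fd, true, false)` before reading the mapping and `dmabuf_cpu_access(fd, false, false)` afterwards. With an invalid descriptor the helper simply reports the error, e.g. `assert(dmabuf_cpu_access(-1, true, false) == -EBADF);`.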
Note that this was a real fix, but the fix only matters if commit
aaec5a95d596 ("pipe_read: don't wake up the writer if the pipe is
still full") is in the tree.
Now, the bug was pre-existing, and *maybe* it could be hit without
that commit aaec5a95d596, but nobody has ever reported it, so it's
very very unlikely.
Also, this fix then had some fall-out, and while I think you've queued
all the fallout fixes too, I think it might be a good idea to wait for
more reports from the development tree before considering these for
stable.
Put another way: this fix caused some pain. It might not be worth
back-porting to stable at all, and if it is, it might be worth waiting
to see that there's no other fallout.
Linus
On Sun, 9 Mar 2025 at 09:52, Sasha Levin <sashal(a)kernel.org> wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex
From: Saurabh Sengar <ssengar(a)linux.microsoft.com>
On an x86 system under test with 1780 CPUs, topology_span_sane() takes
around 8 seconds cumulatively across all iterations. It is an expensive
operation that sanity-checks the non-NUMA topology masks.
CPU topology does not change very frequently, so make this check
optional for systems where the topology is trusted and faster bootup is
needed.
Restrict the check to the sched_verbose kernel cmdline option so that
systems that do not need it can avoid this penalty.
Cc: stable(a)vger.kernel.org
Fixes: ccf74128d66c ("sched/topology: Assert non-NUMA topology masks don't (partially) overlap")
Signed-off-by: Saurabh Sengar <ssengar(a)linux.microsoft.com>
Co-developed-by: Naman Jain <namjain(a)linux.microsoft.com>
Signed-off-by: Naman Jain <namjain(a)linux.microsoft.com>
Tested-by: K Prateek Nayak <kprateek.nayak(a)amd.com>
---
Changes since v3:
https://lore.kernel.org/all/20250203114738.3109-1-namjain@linux.microsoft.c…
- Minor typo correction in comment
- Added Tested-by tag from Prateek for x86
Changes since v2:
https://lore.kernel.org/all/1731922777-7121-1-git-send-email-ssengar@linux.…
- Use sched_debug() instead of using sched_debug_verbose
variable directly (addressing Prateek's comment)
Changes since v1:
https://lore.kernel.org/all/1729619853-2597-1-git-send-email-ssengar@linux.…
- Use kernel cmdline param instead of compile time flag.
Adding a link to the other patch which is under review.
https://lore.kernel.org/lkml/20241031200431.182443-1-steve.wahl@hpe.com/
The above patch tries to optimize the topology sanity check, whereas this
patch makes it optional. We believe both patches can coexist, as even
with optimization, there will still be some performance overhead for
this check.
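The gating approach described in this patch can be sketched in user space as follows. This is a minimal illustration, not the kernel code: `debug_enabled` stands in for sched_debug(), and `expensive_span_check()`, `expensive_checks_run` and `span_sane()` are hypothetical names introduced only for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool debug_enabled;       /* stand-in for sched_debug() */
static int expensive_checks_run; /* counts how often the costly path runs */

/* Stand-in for the costly mask comparison; always reports "sane" here. */
static bool expensive_span_check(int cpu)
{
	(void)cpu;
	expensive_checks_run++;
	return true;
}

/* Skip the costly check unless debugging is enabled; note the skip once. */
static bool span_sane(int cpu)
{
	static bool noted;

	assert(cpu >= 0);

	if (!debug_enabled) {
		if (!noted) {
			noted = true;
			fprintf(stderr, "%s: skipping span sanity check\n", __func__);
		}
		return true;
	}

	return expensive_span_check(cpu);
}
```

The trade-off is the same as in the patch: with the flag off, the function trusts the topology and returns immediately, so a genuinely broken topology is only reported when the debug option is set.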
---
kernel/sched/topology.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index c49aea8c1025..666f0a18cc6c 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2359,6 +2359,13 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
{
int i = cpu + 1;
+ /* Skip the topology sanity check for non-debug, as it is a time-consuming operation */
+ if (!sched_debug()) {
+ pr_info_once("%s: Skipping topology span sanity check. Use `sched_verbose` boot parameter to enable it.\n",
+ __func__);
+ return true;
+ }
+
/* NUMA levels are allowed to overlap */
if (tl->flags & SDTL_OVERLAP)
return true;
--
2.34.1