The existing memstick core patch: commit 62c59a8786e6 ("memstick: Skip
allocating card when removing host") sets host->removing in
memstick_remove_host(),but still exists a critical time window where
memstick_check can run after host->eject is set but before removing is set.
In the rtsx_usb_ms driver, the problematic sequence is:
rtsx_usb_ms_drv_remove: memstick_check:
host->eject = true
cancel_work_sync(handle_req) if(!host->removing)
... memstick_alloc_card()
memstick_set_rw_addr()
memstick_new_req()
rtsx_usb_ms_request()
if(!host->eject)
skip schedule_work
wait_for_completion()
memstick_remove_host: [blocks indefinitely]
host->removing = true
flush_workqueue()
[block]
1. rtsx_usb_ms_drv_remove sets host->eject = true
2. cancel_work_sync(&host->handle_req) runs
3. memstick_check work may be executed here <-- danger window
4. memstick_remove_host sets removing = 1
During this window (step 3), memstick_check calls memstick_alloc_card,
which may indefinitely waiting for mrq_complete completion that will
never occur because rtsx_usb_ms_request sees eject=true and skips
scheduling work, memstick_set_rw_addr waits forever for completion.
This causes a deadlock when memstick_remove_host tries to flush_workqueue,
waiting for memstick_check to complete, while memstick_check is blocked
waiting for mrq_complete completion.
Fix this by setting removing=true at the start of rtsx_usb_ms_drv_remove,
before any work cancellation. This ensures memstick_check will see the
removing flag immediately and exit early, avoiding the deadlock.
Fixes: 62c59a8786e6 ("memstick: Skip allocating card when removing host")
Signed-off-by: Jiayi Li <lijiayi(a)kylinos.cn>
Cc: stable(a)vger.kernel.org
---
v1 -> v2:
Added Cc: stable(a)vger.kernel.org
---
drivers/memstick/core/memstick.c | 1 -
drivers/memstick/host/rtsx_usb_ms.c | 1 +
2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/memstick/core/memstick.c b/drivers/memstick/core/memstick.c
index 043b9ec756ff..95e65f4958f2 100644
--- a/drivers/memstick/core/memstick.c
+++ b/drivers/memstick/core/memstick.c
@@ -555,7 +555,6 @@ EXPORT_SYMBOL(memstick_add_host);
*/
void memstick_remove_host(struct memstick_host *host)
{
- host->removing = 1;
flush_workqueue(workqueue);
mutex_lock(&host->lock);
if (host->card)
diff --git a/drivers/memstick/host/rtsx_usb_ms.c b/drivers/memstick/host/rtsx_usb_ms.c
index 3878136227e4..5b5e9354fb2e 100644
--- a/drivers/memstick/host/rtsx_usb_ms.c
+++ b/drivers/memstick/host/rtsx_usb_ms.c
@@ -812,6 +812,7 @@ static void rtsx_usb_ms_drv_remove(struct platform_device *pdev)
int err;
host->eject = true;
+ msh->removing = true;
cancel_work_sync(&host->handle_req);
cancel_delayed_work_sync(&host->poll_card);
--
2.47.1
From: Su Hui <suhui(a)nfschina.com>
[ Upstream commit 7919407eca2ef562fa6c98c41cfdf6f6cdd69d92 ]
When encounters some errors like these:
xhci_hcd 0000:4a:00.2: xHCI dying or halted, can't queue_command
xhci_hcd 0000:4a:00.2: FIXME: allocate a command ring segment
usb usb5-port6: couldn't allocate usb_device
It's hard to know whether xhc_state is dying or halted. So it's better
to print xhc_state's value which can help locate the resaon of the bug.
Signed-off-by: Su Hui <suhui(a)nfschina.com>
Link: https://lore.kernel.org/r/20250725060117.1773770-1-suhui@nfschina.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is suitable for backporting to stable kernel trees for the
following reasons:
1. **Enhanced Debugging for Real-World Issues**: The commit improves
debugging of USB xHCI host controller failures by printing the actual
`xhc_state` value when `queue_command` fails. The commit message
shows real error messages users encounter ("xHCI dying or halted,
can't queue_command"), demonstrating this is a real-world debugging
problem.
2. **Minimal and Safe Change**: The change is extremely small and safe -
it only modifies a debug print statement from:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
```
to:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state:
0x%x\n", xhci->xhc_state);
```
3. **No Functional Changes**: This is a pure diagnostic improvement. It
doesn't change any logic, control flow, or data structures. It only
adds the state value (0x%x format) to an existing debug message.
4. **Important for Troubleshooting**: The xHCI driver is critical for
USB functionality, and when it fails with "dying or halted" states,
knowing the exact state helps diagnose whether:
- `XHCI_STATE_DYING` (0x1) - controller is dying
- `XHCI_STATE_HALTED` (0x2) - controller is halted
- Both states (0x3) - controller has both flags set
This distinction is valuable for debugging hardware issues, driver
bugs, or system problems.
5. **Zero Risk of Regression**: Adding a parameter to a debug print
statement has no risk of introducing regressions. The worst case is
the debug message prints the state value.
6. **Follows Stable Rules**: This meets stable kernel criteria as it:
- Fixes a real debugging limitation
- Is obviously correct
- Has been tested (signed-off and accepted by Greg KH)
- Is small (single line change)
- Doesn't add new features, just improves existing diagnostics
The commit helps system administrators and developers diagnose USB
issues more effectively by providing the actual state value rather than
just saying "dying or halted", making it a valuable debugging
enhancement for stable kernels.
drivers/usb/host/xhci-ring.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 08b016864fc0..71b17a00d3ed 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -4076,7 +4076,8 @@ static int queue_command(struct xhci_hcd *xhci, struct xhci_command *cmd,
if ((xhci->xhc_state & XHCI_STATE_DYING) ||
(xhci->xhc_state & XHCI_STATE_HALTED)) {
- xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
+ xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state: 0x%x\n",
+ xhci->xhc_state);
return -ESHUTDOWN;
}
--
2.39.5
From: Su Hui <suhui(a)nfschina.com>
[ Upstream commit 7919407eca2ef562fa6c98c41cfdf6f6cdd69d92 ]
When encounters some errors like these:
xhci_hcd 0000:4a:00.2: xHCI dying or halted, can't queue_command
xhci_hcd 0000:4a:00.2: FIXME: allocate a command ring segment
usb usb5-port6: couldn't allocate usb_device
It's hard to know whether xhc_state is dying or halted. So it's better
to print xhc_state's value which can help locate the resaon of the bug.
Signed-off-by: Su Hui <suhui(a)nfschina.com>
Link: https://lore.kernel.org/r/20250725060117.1773770-1-suhui@nfschina.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is suitable for backporting to stable kernel trees for the
following reasons:
1. **Enhanced Debugging for Real-World Issues**: The commit improves
debugging of USB xHCI host controller failures by printing the actual
`xhc_state` value when `queue_command` fails. The commit message
shows real error messages users encounter ("xHCI dying or halted,
can't queue_command"), demonstrating this is a real-world debugging
problem.
2. **Minimal and Safe Change**: The change is extremely small and safe -
it only modifies a debug print statement from:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
```
to:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state:
0x%x\n", xhci->xhc_state);
```
3. **No Functional Changes**: This is a pure diagnostic improvement. It
doesn't change any logic, control flow, or data structures. It only
adds the state value (0x%x format) to an existing debug message.
4. **Important for Troubleshooting**: The xHCI driver is critical for
USB functionality, and when it fails with "dying or halted" states,
knowing the exact state helps diagnose whether:
- `XHCI_STATE_DYING` (0x1) - controller is dying
- `XHCI_STATE_HALTED` (0x2) - controller is halted
- Both states (0x3) - controller has both flags set
This distinction is valuable for debugging hardware issues, driver
bugs, or system problems.
5. **Zero Risk of Regression**: Adding a parameter to a debug print
statement has no risk of introducing regressions. The worst case is
the debug message prints the state value.
6. **Follows Stable Rules**: This meets stable kernel criteria as it:
- Fixes a real debugging limitation
- Is obviously correct
- Has been tested (signed-off and accepted by Greg KH)
- Is small (single line change)
- Doesn't add new features, just improves existing diagnostics
The commit helps system administrators and developers diagnose USB
issues more effectively by providing the actual state value rather than
just saying "dying or halted", making it a valuable debugging
enhancement for stable kernels.
drivers/usb/host/xhci-ring.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 954cd962e113..c026e7cc0af1 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -4183,7 +4183,8 @@ static int queue_command(struct xhci_hcd *xhci, struct xhci_command *cmd,
if ((xhci->xhc_state & XHCI_STATE_DYING) ||
(xhci->xhc_state & XHCI_STATE_HALTED)) {
- xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
+ xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state: 0x%x\n",
+ xhci->xhc_state);
return -ESHUTDOWN;
}
--
2.39.5
From: Su Hui <suhui(a)nfschina.com>
[ Upstream commit 7919407eca2ef562fa6c98c41cfdf6f6cdd69d92 ]
When encounters some errors like these:
xhci_hcd 0000:4a:00.2: xHCI dying or halted, can't queue_command
xhci_hcd 0000:4a:00.2: FIXME: allocate a command ring segment
usb usb5-port6: couldn't allocate usb_device
It's hard to know whether xhc_state is dying or halted. So it's better
to print xhc_state's value which can help locate the resaon of the bug.
Signed-off-by: Su Hui <suhui(a)nfschina.com>
Link: https://lore.kernel.org/r/20250725060117.1773770-1-suhui@nfschina.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is suitable for backporting to stable kernel trees for the
following reasons:
1. **Enhanced Debugging for Real-World Issues**: The commit improves
debugging of USB xHCI host controller failures by printing the actual
`xhc_state` value when `queue_command` fails. The commit message
shows real error messages users encounter ("xHCI dying or halted,
can't queue_command"), demonstrating this is a real-world debugging
problem.
2. **Minimal and Safe Change**: The change is extremely small and safe -
it only modifies a debug print statement from:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
```
to:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state:
0x%x\n", xhci->xhc_state);
```
3. **No Functional Changes**: This is a pure diagnostic improvement. It
doesn't change any logic, control flow, or data structures. It only
adds the state value (0x%x format) to an existing debug message.
4. **Important for Troubleshooting**: The xHCI driver is critical for
USB functionality, and when it fails with "dying or halted" states,
knowing the exact state helps diagnose whether:
- `XHCI_STATE_DYING` (0x1) - controller is dying
- `XHCI_STATE_HALTED` (0x2) - controller is halted
- Both states (0x3) - controller has both flags set
This distinction is valuable for debugging hardware issues, driver
bugs, or system problems.
5. **Zero Risk of Regression**: Adding a parameter to a debug print
statement has no risk of introducing regressions. The worst case is
the debug message prints the state value.
6. **Follows Stable Rules**: This meets stable kernel criteria as it:
- Fixes a real debugging limitation
- Is obviously correct
- Has been tested (signed-off and accepted by Greg KH)
- Is small (single line change)
- Doesn't add new features, just improves existing diagnostics
The commit helps system administrators and developers diagnose USB
issues more effectively by providing the actual state value rather than
just saying "dying or halted", making it a valuable debugging
enhancement for stable kernels.
drivers/usb/host/xhci-ring.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index cd94b0a4e021..626f02605192 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -4486,7 +4486,8 @@ static int queue_command(struct xhci_hcd *xhci, struct xhci_command *cmd,
if ((xhci->xhc_state & XHCI_STATE_DYING) ||
(xhci->xhc_state & XHCI_STATE_HALTED)) {
- xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
+ xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state: 0x%x\n",
+ xhci->xhc_state);
return -ESHUTDOWN;
}
--
2.39.5
From: Su Hui <suhui(a)nfschina.com>
[ Upstream commit 7919407eca2ef562fa6c98c41cfdf6f6cdd69d92 ]
When encounters some errors like these:
xhci_hcd 0000:4a:00.2: xHCI dying or halted, can't queue_command
xhci_hcd 0000:4a:00.2: FIXME: allocate a command ring segment
usb usb5-port6: couldn't allocate usb_device
It's hard to know whether xhc_state is dying or halted. So it's better
to print xhc_state's value which can help locate the resaon of the bug.
Signed-off-by: Su Hui <suhui(a)nfschina.com>
Link: https://lore.kernel.org/r/20250725060117.1773770-1-suhui@nfschina.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is suitable for backporting to stable kernel trees for the
following reasons:
1. **Enhanced Debugging for Real-World Issues**: The commit improves
debugging of USB xHCI host controller failures by printing the actual
`xhc_state` value when `queue_command` fails. The commit message
shows real error messages users encounter ("xHCI dying or halted,
can't queue_command"), demonstrating this is a real-world debugging
problem.
2. **Minimal and Safe Change**: The change is extremely small and safe -
it only modifies a debug print statement from:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
```
to:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state:
0x%x\n", xhci->xhc_state);
```
3. **No Functional Changes**: This is a pure diagnostic improvement. It
doesn't change any logic, control flow, or data structures. It only
adds the state value (0x%x format) to an existing debug message.
4. **Important for Troubleshooting**: The xHCI driver is critical for
USB functionality, and when it fails with "dying or halted" states,
knowing the exact state helps diagnose whether:
- `XHCI_STATE_DYING` (0x1) - controller is dying
- `XHCI_STATE_HALTED` (0x2) - controller is halted
- Both states (0x3) - controller has both flags set
This distinction is valuable for debugging hardware issues, driver
bugs, or system problems.
5. **Zero Risk of Regression**: Adding a parameter to a debug print
statement has no risk of introducing regressions. The worst case is
the debug message prints the state value.
6. **Follows Stable Rules**: This meets stable kernel criteria as it:
- Fixes a real debugging limitation
- Is obviously correct
- Has been tested (signed-off and accepted by Greg KH)
- Is small (single line change)
- Doesn't add new features, just improves existing diagnostics
The commit helps system administrators and developers diagnose USB
issues more effectively by providing the actual state value rather than
just saying "dying or halted", making it a valuable debugging
enhancement for stable kernels.
drivers/usb/host/xhci-ring.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 2ff8787f753c..19978f02bb9e 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -4378,7 +4378,8 @@ static int queue_command(struct xhci_hcd *xhci, struct xhci_command *cmd,
if ((xhci->xhc_state & XHCI_STATE_DYING) ||
(xhci->xhc_state & XHCI_STATE_HALTED)) {
- xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
+ xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state: 0x%x\n",
+ xhci->xhc_state);
return -ESHUTDOWN;
}
--
2.39.5
During the integration of the RTL8239 POE chip + its frontend MCU, it was
noticed that multi-byte operations were basically broken in the current
driver.
Tests using SMBus Block Writes showed that the data (after the Wr + Ack
marker) was mixed up on the wire. At first glance, it looked like an
endianness problem. But for transfers were the number of count + data bytes
was not divisible by 4, the last bytes were not looking like an endianness
problem because they were were in the wrong order but not for example 0 -
which would be the case for an endianness problem with 32 bit registers. At
the end, it turned out to be a the way how i2c_write tried to add the bytes
to the send registers.
Each 32 bit register was used similar to a shift register - shifting the
various bytes up the register while the next one is added to the least
significant byte. But the I2C controller expects the first byte of the
tranmission in the least significant byte of the first register. And the
last byte (assuming it is a 16 byte transfer) in the most significant byte
of the fourth register.
While doing these tests, it was also observed that the count byte was
missing from the SMBus Block Writes. The driver just removed them from the
data->block (from the I2C subsystem). But the I2C controller DOES NOT
automatically add this byte - for example by using the configured
transmission length.
The RTL8239 MCU is not actually an SMBus compliant device. Instead, it
expects I2C Block Reads + I2C Block Writes. But according to the already
identified bugs in the driver, it was clear that the I2C controller can
simply be modified to not send the count byte for I2C_SMBUS_I2C_BLOCK_DATA.
The receive part, just needs to write the content of the receive buffer to
the correct position in data->block.
While the on-wire formwat was now correct, reads were still not possible
against the MCU (for the RTL8239 POE chip). It was always timing out
because the 2ms were not enough for sending the read request and then
receiving the 12 byte answer.
These changes were originally submitted to OpenWrt. But there are plans to
migrate OpenWrt to the upstream Linux driver. As result, the pull request
was stopped and the changes were redone against this driver.
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
---
Harshal Gohel (2):
i2c: rtl9300: Fix multi-byte I2C write
i2c: rtl9300: Implement I2C block read and write
Sven Eckelmann (2):
i2c: rtl9300: Increase timeout for transfer polling
i2c: rtl9300: Add missing count byte for SMBus Block Write
drivers/i2c/busses/i2c-rtl9300.c | 43 +++++++++++++++++++++++++++++++++-------
1 file changed, 36 insertions(+), 7 deletions(-)
---
base-commit: b9ddaa95fd283bce7041550ddbbe7e764c477110
change-id: 20250802-i2c-rtl9300-multi-byte-edaa1fb0872c
Best regards,
--
Sven Eckelmann <sven(a)narfation.org>
The gpio-mlxbf2 driver interfaces with four GPIO controllers,
device instances 0-3. There are two IRQ resources shared between
the four controllers, and they are found in the ACPI table for
instances 0 and 3. The driver should not use platform_get_irq(),
otherwise this error is logged when probing instances 1 and 2:
mlxbf2_gpio MLNXBF22:01: error -ENXIO: IRQ index 0 not found
Fixes: 2b725265cb08 ("gpio: mlxbf2: Introduce IRQ support")
Cc: stable(a)vger.kernel.org
Signed-off-by: David Thompson <davthompson(a)nvidia.com>
Reviewed-by: Shravan Kumar Ramani <shravankr(a)nvidia.com>
---
v4: updated logic to simply use platform_get_irq_optional()
v3: added version history
v2: added tag "Cc: stable(a)vger.kernel.org"
drivers/gpio/gpio-mlxbf2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpio/gpio-mlxbf2.c b/drivers/gpio/gpio-mlxbf2.c
index 6f3dda6b635f..390f2e74a9d8 100644
--- a/drivers/gpio/gpio-mlxbf2.c
+++ b/drivers/gpio/gpio-mlxbf2.c
@@ -397,7 +397,7 @@ mlxbf2_gpio_probe(struct platform_device *pdev)
gc->ngpio = npins;
gc->owner = THIS_MODULE;
- irq = platform_get_irq(pdev, 0);
+ irq = platform_get_irq_optional(pdev, 0);
if (irq >= 0) {
girq = &gs->gc.irq;
gpio_irq_chip_set_chip(girq, &mlxbf2_gpio_irq_chip);
--
2.43.2
Make sure to drop the reference to the secure monitor device taken by
of_find_device_by_node() when looking up its driver data on behalf of
other drivers (e.g. during probe).
Note that holding a reference to the platform device does not prevent
its driver data from going away so there is no point in keeping the
reference after the helper returns.
Fixes: 8cde3c2153e8 ("firmware: meson_sm: Rework driver as a proper platform driver")
Cc: stable(a)vger.kernel.org # 5.5
Cc: Carlo Caione <ccaione(a)baylibre.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/firmware/meson/meson_sm.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/meson/meson_sm.c b/drivers/firmware/meson/meson_sm.c
index f25a9746249b..3ab67aaa9e5d 100644
--- a/drivers/firmware/meson/meson_sm.c
+++ b/drivers/firmware/meson/meson_sm.c
@@ -232,11 +232,16 @@ EXPORT_SYMBOL(meson_sm_call_write);
struct meson_sm_firmware *meson_sm_get(struct device_node *sm_node)
{
struct platform_device *pdev = of_find_device_by_node(sm_node);
+ struct meson_sm_firmware *fw;
if (!pdev)
return NULL;
- return platform_get_drvdata(pdev);
+ fw = platform_get_drvdata(pdev);
+
+ put_device(&pdev->dev);
+
+ return fw;
}
EXPORT_SYMBOL_GPL(meson_sm_get);
--
2.49.1
The Qualcomm SM6375 processor is a 7nm process SoC for the mid-range market with the following features:
CPU: Eight-core design, including high-performance Kryo 670 core and efficient Kryo 265 core, optimized performance and energy efficiency.
GPU: Equipped with Adreno 642L GPU, supporting high-quality graphics and gaming experience.
AI Engine: Integrated Qualcomm AI engine to enhance intelligent features such as voice recognition and image processing.
Connectivity: Supports modern wireless standards such as 5G, Wi-Fi 6 and Bluetooth 5.2.
Multimedia: Supports 4K video encoding and decoding
Mainly used in mid-to-high-end smartphones, tablets and some IoT devices, suitable for users who need to balance cost performance and performance.
.# Part Number Manufacturer Date Code Quantity Unit Price Lead Time Condition (PCS) USD/Each one 1 SM-6375-1-PSP837-TR-00-0-AB QUALCOMM 2023+ 12000pcs US$18.00/pcs 7days New & original - stock 2 PM-6375-0-FOWNSP144-TR-01-0;TR-01-1 QUALCOMM 2023+ 12000pcs US$1.00/pcs 3 PMR-735A-0-WLNSP48-TR-05-0,TR-05-1 QUALCOMM 2023+ 12000pcs US$0.85/pcs 4 PMK-8003-0-FOWPSP36-TR-01-0 QUALCOMM 2023+ 12000pcs US$0.24/pcs 5 SDR-735-0-PSP219B-TR-01-0;TR-01-1 QUALCOMM 2023+ 12000pcs US$2.50/pcs 6 WCD-9370-0-WLPSP55-TR-01-0;TR-01-4 QUALCOMM 2023+ 12000pcs US$0.50/pcs 7 WCN-3988-0-82BWLPSP-TR-00-0 QUALCOMM 2023+ 12000pcs US$3.50/pcs 8 QET-6105-0-WLNSP24B-TR-00-1 QUALCOMM 2023+ 12000pcs US$1.20/pcs 9 QET4101-0-12WLNSP-TR-00-0 QUALCOMM 2022+ 12000pcs US$0.21/pcs
These materials are sold as a set for $28/usd, and are guaranteed to be authentic.
If you need other Qualcomm materials, please feel free to contact me
Stay in tune with product evolutions—tap . Keep Receiving Notices
Feel like taking a break? Select Configure Your Mailing.
Increase the External ROM access timeouts to prevent failures during
programming of External SPI EEPROM chips. The current timeouts are
too short for some SPI EEPROMs used with uPD720201 controllers.
The current timeout for Chip Erase in renesas_rom_erase() is 100 ms ,
the current timeout for Sector Erase issued by the controller before
Page Program in renesas_fw_download_image() is also 100 ms. Neither
timeout is sufficient for e.g. the Macronix MX25L5121E or MX25V5126F.
MX25L5121E reference manual [1] page 35 section "ERASE AND PROGRAMMING
PERFORMANCE" and page 23 section "Table 8. AC CHARACTERISTICS (Temperature
= 0°C to 70°C for Commercial grade, VCC = 2.7V ~ 3.6V)" row "tCE" indicate
that the maximum time required for Chip Erase opcode to complete is 2 s,
and for Sector Erase it is 300 ms .
MX25V5126F reference manual [2] page 47 section "13. ERASE AND PROGRAMMING
PERFORMANCE (2.3V - 3.6V)" and page 42 section "Table 8. AC CHARACTERISTICS
(Temperature = -40°C to 85°C for Industrial grade, VCC = 2.3V - 3.6V)" row
"tCE" indicate that the maximum time required for Chip Erase opcode to
complete is 3.2 s, and for Sector Erase it is 400 ms .
Update the timeouts such, that Chip Erase timeout is set to 5 seconds,
and Sector Erase timeout is set to 500 ms. Such lengthy timeouts ought
to be sufficient for majority of SPI EEPROM chips.
[1] https://www.macronix.com/Lists/Datasheet/Attachments/8634/MX25L5121E,%203V,…
[2] https://www.macronix.com/Lists/Datasheet/Attachments/8750/MX25V5126F,%202.5…
Fixes: 2478be82de44 ("usb: renesas-xhci: Add ROM loader for uPD720201")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Marek Vasut <marek.vasut+renesas(a)mailbox.org>
---
Cc: Geert Uytterhoeven <geert+renesas(a)glider.be>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Mathias Nyman <mathias.nyman(a)intel.com>
Cc: Vinod Koul <vkoul(a)kernel.org>
Cc: linux-renesas-soc(a)vger.kernel.org
Cc: linux-usb(a)vger.kernel.org
---
V2: Move Cc: stable above ---
---
drivers/usb/host/xhci-pci-renesas.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/host/xhci-pci-renesas.c b/drivers/usb/host/xhci-pci-renesas.c
index 620f8f0febb8..86df80399c9f 100644
--- a/drivers/usb/host/xhci-pci-renesas.c
+++ b/drivers/usb/host/xhci-pci-renesas.c
@@ -47,8 +47,9 @@
#define RENESAS_ROM_ERASE_MAGIC 0x5A65726F
#define RENESAS_ROM_WRITE_MAGIC 0x53524F4D
-#define RENESAS_RETRY 10000
-#define RENESAS_DELAY 10
+#define RENESAS_RETRY 50000 /* 50000 * RENESAS_DELAY ~= 500ms */
+#define RENESAS_CHIP_ERASE_RETRY 500000 /* 500000 * RENESAS_DELAY ~= 5s */
+#define RENESAS_DELAY 10
#define RENESAS_FW_NAME "renesas_usb_fw.mem"
@@ -407,7 +408,7 @@ static void renesas_rom_erase(struct pci_dev *pdev)
/* sleep a bit while ROM is erased */
msleep(20);
- for (i = 0; i < RENESAS_RETRY; i++) {
+ for (i = 0; i < RENESAS_CHIP_ERASE_RETRY; i++) {
retval = pci_read_config_byte(pdev, RENESAS_ROM_STATUS,
&status);
status &= RENESAS_ROM_STATUS_ERASE;
--
2.47.2
Increase the External ROM access timeouts to prevent failures during
programming of External SPI EEPROM chips. The current timeouts are
too short for some SPI EEPROMs used with uPD720201 controllers.
The current timeout for Chip Erase in renesas_rom_erase() is 100 ms ,
the current timeout for Sector Erase issued by the controller before
Page Program in renesas_fw_download_image() is also 100 ms. Neither
timeout is sufficient for e.g. the Macronix MX25L5121E or MX25V5126F.
MX25L5121E reference manual [1] page 35 section "ERASE AND PROGRAMMING
PERFORMANCE" and page 23 section "Table 8. AC CHARACTERISTICS (Temperature
= 0°C to 70°C for Commercial grade, VCC = 2.7V ~ 3.6V)" row "tCE" indicate
that the maximum time required for Chip Erase opcode to complete is 2 s,
and for Sector Erase it is 300 ms .
MX25V5126F reference manual [2] page 47 section "13. ERASE AND PROGRAMMING
PERFORMANCE (2.3V - 3.6V)" and page 42 section "Table 8. AC CHARACTERISTICS
(Temperature = -40°C to 85°C for Industrial grade, VCC = 2.3V - 3.6V)" row
"tCE" indicate that the maximum time required for Chip Erase opcode to
complete is 3.2 s, and for Sector Erase it is 400 ms .
Update the timeouts such, that Chip Erase timeout is set to 5 seconds,
and Sector Erase timeout is set to 500 ms. Such lengthy timeouts ought
to be sufficient for majority of SPI EEPROM chips.
[1] https://www.macronix.com/Lists/Datasheet/Attachments/8634/MX25L5121E,%203V,…
[2] https://www.macronix.com/Lists/Datasheet/Attachments/8750/MX25V5126F,%202.5…
Fixes: 2478be82de44 ("usb: renesas-xhci: Add ROM loader for uPD720201")
Signed-off-by: Marek Vasut <marek.vasut+renesas(a)mailbox.org>
---
Cc: Geert Uytterhoeven <geert+renesas(a)glider.be>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Mathias Nyman <mathias.nyman(a)intel.com>
Cc: Vinod Koul <vkoul(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Cc: linux-renesas-soc(a)vger.kernel.org
Cc: linux-usb(a)vger.kernel.org
---
drivers/usb/host/xhci-pci-renesas.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/host/xhci-pci-renesas.c b/drivers/usb/host/xhci-pci-renesas.c
index 620f8f0febb8..86df80399c9f 100644
--- a/drivers/usb/host/xhci-pci-renesas.c
+++ b/drivers/usb/host/xhci-pci-renesas.c
@@ -47,8 +47,9 @@
#define RENESAS_ROM_ERASE_MAGIC 0x5A65726F
#define RENESAS_ROM_WRITE_MAGIC 0x53524F4D
-#define RENESAS_RETRY 10000
-#define RENESAS_DELAY 10
+#define RENESAS_RETRY 50000 /* 50000 * RENESAS_DELAY ~= 500ms */
+#define RENESAS_CHIP_ERASE_RETRY 500000 /* 500000 * RENESAS_DELAY ~= 5s */
+#define RENESAS_DELAY 10
#define RENESAS_FW_NAME "renesas_usb_fw.mem"
@@ -407,7 +408,7 @@ static void renesas_rom_erase(struct pci_dev *pdev)
/* sleep a bit while ROM is erased */
msleep(20);
- for (i = 0; i < RENESAS_RETRY; i++) {
+ for (i = 0; i < RENESAS_CHIP_ERASE_RETRY; i++) {
retval = pci_read_config_byte(pdev, RENESAS_ROM_STATUS,
&status);
status &= RENESAS_ROM_STATUS_ERASE;
--
2.47.2
The patch that broke text mode VGA-console scrolling is this one:
"vgacon: Add check for vc_origin address range in vgacon_scroll()"
commit 864f9963ec6b4b76d104d595ba28110b87158003 upstream.
How to preproduce:
(1) boot a kernel that is configured to use text mode VGA-console
(2) type commands: ls -l /usr/bin | less -S
(3) scroll up/down with cursor-down/up keys
Above mentioned patch seems to have landed in upstream and all
kernel.org stable trees with zero testing. Even minimal testing
would have shown that it breaks text mode VGA-console scrolling.
Greg, Sasha, Linus,
Please consider reverting that buggy patch from all affected trees.
--
Jari Ruusu 4096R/8132F189 12D6 4C3A DCDA 0AA4 27BD ACDF F073 3C80 8132 F189
The quilt patch titled
Subject: mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped
has been removed from the -mm tree. Its filename was
mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped
Date: Mon, 28 Jul 2025 10:53:55 -0700
By inducing delays in the right places, Jann Horn created a reproducer for
a hard to hit UAF issue that became possible after VMAs were allowed to be
recycled by adding SLAB_TYPESAFE_BY_RCU to their cache.
Race description is borrowed from Jann's discovery report:
lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under
rcu_read_lock(). At that point, the VMA may be concurrently freed, and it
can be recycled by another process. vma_start_read() then increments the
vma->vm_refcnt (if it is in an acceptable range), and if this succeeds,
vma_start_read() can return a recycled VMA.
In this scenario where the VMA has been recycled, lock_vma_under_rcu()
will then detect the mismatching ->vm_mm pointer and drop the VMA through
vma_end_read(), which calls vma_refcount_put(). vma_refcount_put() drops
the refcount and then calls rcuwait_wake_up() using a copy of vma->vm_mm.
This is wrong: It implicitly assumes that the caller is keeping the VMA's
mm alive, but in this scenario the caller has no relation to the VMA's mm,
so the rcuwait_wake_up() can cause UAF.
The diagram depicting the race:
T1 T2 T3
== == ==
lock_vma_under_rcu
mas_walk
<VMA gets removed from mm>
mmap
<the same VMA is reallocated>
vma_start_read
__refcount_inc_not_zero_limited_acquire
munmap
__vma_enter_locked
refcount_add_not_zero
vma_end_read
vma_refcount_put
__refcount_dec_and_test
rcuwait_wait_event
<finish operation>
rcuwait_wake_up [UAF]
Note that rcuwait_wait_event() in T3 does not block because refcount was
already dropped by T1. At this point T3 can exit and free the mm causing
UAF in T1.
To avoid this we move vma->vm_mm verification into vma_start_read() and
grab vma->vm_mm to stabilize it before vma_refcount_put() operation.
[surenb(a)google.com: v3]
Link: https://lkml.kernel.org/r/20250729145709.2731370-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250728175355.2282375-1-surenb@google.com
Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+…
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mmap_lock.h | 30 ++++++++++++++++++++++++++++++
mm/mmap_lock.c | 10 +++-------
2 files changed, 33 insertions(+), 7 deletions(-)
--- a/include/linux/mmap_lock.h~mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped
+++ a/include/linux/mmap_lock.h
@@ -12,6 +12,7 @@ extern int rcuwait_wake_up(struct rcuwai
#include <linux/tracepoint-defs.h>
#include <linux/types.h>
#include <linux/cleanup.h>
+#include <linux/sched/mm.h>
#define MMAP_LOCK_INITIALIZER(name) \
.mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock),
@@ -154,6 +155,10 @@ static inline void vma_refcount_put(stru
* reused and attached to a different mm before we lock it.
* Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
* detached.
+ *
+ * WARNING! The vma passed to this function cannot be used if the function
+ * fails to lock it because in certain cases RCU lock is dropped and then
+ * reacquired. Once RCU lock is dropped the vma can be concurently freed.
*/
static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
struct vm_area_struct *vma)
@@ -183,6 +188,31 @@ static inline struct vm_area_struct *vma
}
rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
+
+ /*
+ * If vma got attached to another mm from under us, that mm is not
+ * stable and can be freed in the narrow window after vma->vm_refcnt
+ * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
+ * releasing vma->vm_refcnt.
+ */
+ if (unlikely(vma->vm_mm != mm)) {
+ /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
+ struct mm_struct *other_mm = vma->vm_mm;
+
+ /*
+ * __mmdrop() is a heavy operation and we don't need RCU
+ * protection here. Release RCU lock during these operations.
+ * We reinstate the RCU read lock as the caller expects it to
+ * be held when this function returns even on error.
+ */
+ rcu_read_unlock();
+ mmgrab(other_mm);
+ vma_refcount_put(vma);
+ mmdrop(other_mm);
+ rcu_read_lock();
+ return NULL;
+ }
+
/*
* Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
* False unlocked result is impossible because we modify and check
--- a/mm/mmap_lock.c~mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped
+++ a/mm/mmap_lock.c
@@ -164,8 +164,7 @@ retry:
*/
/* Check if the vma we locked is the right one. */
- if (unlikely(vma->vm_mm != mm ||
- address < vma->vm_start || address >= vma->vm_end))
+ if (unlikely(address < vma->vm_start || address >= vma->vm_end))
goto inval_end_read;
rcu_read_unlock();
@@ -236,11 +235,8 @@ retry:
goto fallback;
}
- /*
- * Verify the vma we locked belongs to the same address space and it's
- * not behind of the last search position.
- */
- if (unlikely(vma->vm_mm != mm || from_addr >= vma->vm_end))
+ /* Verify the vma is not behind the last search position. */
+ if (unlikely(from_addr >= vma->vm_end))
goto fallback_unlock;
/*
_
Patches currently in -mm which might be from surenb(a)google.com are
The quilt patch titled
Subject: mm/shmem, swap: improve cached mTHP handling and fix potential hang
has been removed from the -mm tree. Its filename was
mm-shmem-swap-improve-cached-mthp-handling-and-fix-potential-hang.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Kairui Song <kasong(a)tencent.com>
Subject: mm/shmem, swap: improve cached mTHP handling and fix potential hang
Date: Mon, 28 Jul 2025 15:52:59 +0800
The current swap-in code assumes that, when a swap entry in shmem mapping
is order 0, its cached folios (if present) must be order 0 too, which
turns out not always correct.
The problem is shmem_split_large_entry is called before verifying the
folio will eventually be swapped in, one possible race is:
CPU1 CPU2
shmem_swapin_folio
/* swap in of order > 0 swap entry S1 */
folio = swap_cache_get_folio
/* folio = NULL */
order = xa_get_order
/* order > 0 */
folio = shmem_swap_alloc_folio
/* mTHP alloc failure, folio = NULL */
<... Interrupted ...>
shmem_swapin_folio
/* S1 is swapped in */
shmem_writeout
/* S1 is swapped out, folio cached */
shmem_split_large_entry(..., S1)
/* S1 is split, but the folio covering it has order > 0 now */
Now any following swapin of S1 will hang: `xa_get_order` returns 0, and
folio lookup will return a folio with order > 0. The
`xa_get_order(&mapping->i_pages, index) != folio_order(folio)` will always
return false causing swap-in to return -EEXIST.
And this looks fragile. So fix this up by allowing seeing a larger folio
in swap cache, and check the whole shmem mapping range covered by the
swapin have the right swap value upon inserting the folio. And drop the
redundant tree walks before the insertion.
This will actually improve performance, as it avoids two redundant Xarray
tree walks in the hot path, and the only side effect is that in the
failure path, shmem may redundantly reallocate a few folios causing
temporary slight memory pressure.
And worth noting, it may seems the order and value check before inserting
might help reducing the lock contention, which is not true. The swap
cache layer ensures raced swapin will either see a swap cache folio or
failed to do a swapin (we have SWAP_HAS_CACHE bit even if swap cache is
bypassed), so holding the folio lock and checking the folio flag is
already good enough for avoiding the lock contention. The chance that a
folio passes the swap entry value check but the shmem mapping slot has
changed should be very low.
Link: https://lkml.kernel.org/r/20250728075306.12704-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20250728075306.12704-2-ryncsn@gmail.com
Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
Signed-off-by: Kairui Song <kasong(a)tencent.com>
Reviewed-by: Kemeng Shi <shikemeng(a)huaweicloud.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Nhat Pham <nphamcs(a)gmail.com>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shmem.c | 39 ++++++++++++++++++++++++++++++---------
1 file changed, 30 insertions(+), 9 deletions(-)
--- a/mm/shmem.c~mm-shmem-swap-improve-cached-mthp-handling-and-fix-potential-hang
+++ a/mm/shmem.c
@@ -891,7 +891,9 @@ static int shmem_add_to_page_cache(struc
pgoff_t index, void *expected, gfp_t gfp)
{
XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
- long nr = folio_nr_pages(folio);
+ unsigned long nr = folio_nr_pages(folio);
+ swp_entry_t iter, swap;
+ void *entry;
VM_BUG_ON_FOLIO(index != round_down(index, nr), folio);
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
@@ -903,14 +905,25 @@ static int shmem_add_to_page_cache(struc
gfp &= GFP_RECLAIM_MASK;
folio_throttle_swaprate(folio, gfp);
+ swap = radix_to_swp_entry(expected);
do {
+ iter = swap;
xas_lock_irq(&xas);
- if (expected != xas_find_conflict(&xas)) {
- xas_set_err(&xas, -EEXIST);
- goto unlock;
+ xas_for_each_conflict(&xas, entry) {
+ /*
+ * The range must either be empty, or filled with
+ * expected swap entries. Shmem swap entries are never
+ * partially freed without split of both entry and
+ * folio, so there shouldn't be any holes.
+ */
+ if (!expected || entry != swp_to_radix_entry(iter)) {
+ xas_set_err(&xas, -EEXIST);
+ goto unlock;
+ }
+ iter.val += 1 << xas_get_order(&xas);
}
- if (expected && xas_find_conflict(&xas)) {
+ if (expected && iter.val - nr != swap.val) {
xas_set_err(&xas, -EEXIST);
goto unlock;
}
@@ -2359,7 +2372,7 @@ static int shmem_swapin_folio(struct ino
error = -ENOMEM;
goto failed;
}
- } else if (order != folio_order(folio)) {
+ } else if (order > folio_order(folio)) {
/*
* Swap readahead may swap in order 0 folios into swapcache
* asynchronously, while the shmem mapping can still stores
@@ -2384,15 +2397,23 @@ static int shmem_swapin_folio(struct ino
swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
}
+ } else if (order < folio_order(folio)) {
+ swap.val = round_down(swap.val, 1 << folio_order(folio));
+ index = round_down(index, 1 << folio_order(folio));
}
alloced:
- /* We have to do this with folio locked to prevent races */
+ /*
+ * We have to do this with the folio locked to prevent races.
+ * The shmem_confirm_swap below only checks if the first swap
+ * entry matches the folio, that's enough to ensure the folio
+ * is not used outside of shmem, as shmem swap entries
+ * and swap cache folios are never partially freed.
+ */
folio_lock(folio);
if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
- folio->swap.val != swap.val ||
!shmem_confirm_swap(mapping, index, swap) ||
- xa_get_order(&mapping->i_pages, index) != folio_order(folio)) {
+ folio->swap.val != swap.val) {
error = -EEXIST;
goto unlock;
}
_
Patches currently in -mm which might be from kasong(a)tencent.com are
The quilt patch titled
Subject: mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3
has been removed from the -mm tree. Its filename was
mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3.patch
This patch was dropped because it was folded into mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped.patch
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3
Date: Tue, 29 Jul 2025 07:57:09 -0700
- Addressed Lorenzo's nits, per Lorenzo Stoakes
- Added a warning comment for vma_start_read()
- Added Reviewed-by and Acked-by, per Vlastimil Babka and Lorenzo Stoakes
Link: https://lkml.kernel.org/r/20250729145709.2731370-1-surenb@google.com
Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU")
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+…
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mmap_lock.h | 7 +++++++
mm/mmap_lock.c | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)
--- a/include/linux/mmap_lock.h~mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3
+++ a/include/linux/mmap_lock.h
@@ -155,6 +155,10 @@ static inline void vma_refcount_put(stru
* reused and attached to a different mm before we lock it.
* Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
* detached.
+ *
+ * WARNING! The vma passed to this function cannot be used if the function
+ * fails to lock it because in certain cases RCU lock is dropped and then
+ * reacquired. Once RCU lock is dropped the vma can be concurently freed.
*/
static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
struct vm_area_struct *vma)
@@ -194,9 +198,12 @@ static inline struct vm_area_struct *vma
if (unlikely(vma->vm_mm != mm)) {
/* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
struct mm_struct *other_mm = vma->vm_mm;
+
/*
* __mmdrop() is a heavy operation and we don't need RCU
* protection here. Release RCU lock during these operations.
+ * We reinstate the RCU read lock as the caller expects it to
+ * be held when this function returns even on error.
*/
rcu_read_unlock();
mmgrab(other_mm);
--- a/mm/mmap_lock.c~mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3
+++ a/mm/mmap_lock.c
@@ -235,7 +235,7 @@ retry:
goto fallback;
}
- /* Verify the vma is not behind of the last search position. */
+ /* Verify the vma is not behind the last search position. */
if (unlikely(from_addr >= vma->vm_end))
goto fallback_unlock;
_
Patches currently in -mm which might be from surenb(a)google.com are
mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped.patch
Hello maintainers,
This series addresses a defect observed on certain hardware platforms using Linux kernel 6.1.147 with the i915 driver. The issue concerns hot plug detection (HPD) logic,
leading to unreliable or missed detection events on affected hardware. This is happening on some specific devices.
### Background
Issue:
On Simatic IPC227E, we observed unreliable or missing hot plug detection events, while on Simatic IPC227G (otherwise similar platform), expected hot plug behavior was maintained.
Affected kernel:
This patch series is intended for the Linux 6.1.y stable tree only (tested on 6.1.147)
Most of the tests were conducted on 6.1.147 (manual/standalone kernel build, CIP/Isar context).
Root cause analysis:
I do not have access to hardware signal traces or scope data to conclusively prove the root cause at electrical level. My understanding is based on observed driver behavior and logs.
Therefore my assumption as to the real cause is that on IPC227G, HPD IRQ storms are apparently not occurring, so the standard HPD IRQ-based detection works as expected. On IPC227E,
frequent HPD interrupts trigger the i915 driver’s storm detection logic, causing it to switch to polling mode. Therefore polling does not resume correctly, leading to the hotplug
issue this series addresses. Device IPC227E's behavior triggers this kernel edge case, likely due to slight variations in signal integrity, electrical margins, or internal component timing.
Device IPC227G, functions as expected, possibly due to cleaner electrical signaling or more optimal timing characteristics, thus avoiding the triggering condition.
Conclusion:
This points to a hardware-software interaction where kernel code assumes nicer signaling or margins than IPC227E is able to provide, exposing logic gaps not visible on more robust hardware.
### Patches
Patches 1-4:
- Partial backports of upstream commits; only the relevant logic or fixes are applied, with other code omitted due to downstream divergence.
- Applied minimal merging without exhaustive backport of all intermediate upstream changes.
Patch 5:
- Contains cherry-picked logic plus context/compatibility amendments as needed. Ensures that the driver builds.
- Together these fixes greatly improve reliability of hotplug detection on both devices, with no regression detected in our setups.
Thank you for your review,
Nicusor Huhulea
This patch series contains the following changes:
Dmitry Baryshkov (2):
drm/probe_helper: extract two helper functions
drm/probe-helper: enable and disable HPD on connectors
Imre Deak (2):
drm/i915: Fix HPD polling, reenabling the output poll work as needed
drm: Add an HPD poll helper to reschedule the poll work
Nicusor Huhulea (1):
drm/i915: fixes for i915 Hot Plug Detection and build/runtime issues
drivers/gpu/drm/drm_probe_helper.c | 127 ++++++++++++++-----
drivers/gpu/drm/i915/display/intel_hotplug.c | 4 +-
include/drm/drm_modeset_helper_vtables.h | 22 ++++
include/drm/drm_probe_helper.h | 1 +
4 files changed, 122 insertions(+), 32 deletions(-)
--
2.39.2
The patch titled
Subject: mm: fix possible deadlock in console_trylock_spinning
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-possible-deadlock-in-console_trylock_spinning.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Gu Bowen <gubowen5(a)huawei.com>
Subject: mm: fix possible deadlock in console_trylock_spinning
Date: Wed, 30 Jul 2025 17:49:14 +0800
kmemleak_scan_thread() invokes scan_block() which may invoke a nomal
printk() to print warning message. This can cause a deadlock in the
scenario reported below:
CPU0 CPU1
---- ----
lock(kmemleak_lock);
lock(&port->lock);
lock(kmemleak_lock);
lock(console_owner);
To solve this problem, switch to printk_safe mode before printing warning
message, this will redirect all printk()-s to a special per-CPU buffer,
which will be flushed later from a safe context (irq work), and this
deadlock problem can be avoided.
Our syztester report the following lockdep error:
======================================================
WARNING: possible circular locking dependency detected
5.10.0-22221-gca646a51dd00 #16 Not tainted
------------------------------------------------------
kmemleak/182 is trying to acquire lock:
ffffffffaf9e9020 (console_owner){-...}-{0:0}, at: console_trylock_spinning+0xda/0x1d0 kernel/printk/printk.c:1900
but task is already holding lock:
ffffffffb007cf58 (kmemleak_lock){-.-.}-{2:2}, at: scan_block+0x3d/0x220 mm/kmemleak.c:1310
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (kmemleak_lock){-.-.}-{2:2}:
validate_chain+0x5df/0xac0 kernel/locking/lockdep.c:3729
__lock_acquire+0x514/0x940 kernel/locking/lockdep.c:4958
lock_acquire+0x15a/0x3a0 kernel/locking/lockdep.c:5569
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3b/0x60 kernel/locking/spinlock.c:164
create_object.isra.0+0x36/0x80 mm/kmemleak.c:691
kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
slab_post_alloc_hook mm/slab.h:518 [inline]
slab_alloc_node mm/slub.c:2987 [inline]
slab_alloc mm/slub.c:2995 [inline]
__kmalloc+0x637/0xb60 mm/slub.c:4100
kmalloc include/linux/slab.h:620 [inline]
tty_buffer_alloc+0x127/0x140 drivers/tty/tty_buffer.c:176
__tty_buffer_request_room+0x9b/0x110 drivers/tty/tty_buffer.c:276
tty_insert_flip_string_fixed_flag+0x60/0x130 drivers/tty/tty_buffer.c:321
tty_insert_flip_string include/linux/tty_flip.h:36 [inline]
tty_insert_flip_string_and_push_buffer+0x3a/0xb0 drivers/tty/tty_buffer.c:578
process_output_block+0xc2/0x2e0 drivers/tty/n_tty.c:592
n_tty_write+0x298/0x540 drivers/tty/n_tty.c:2433
do_tty_write drivers/tty/tty_io.c:1041 [inline]
file_tty_write.constprop.0+0x29b/0x4b0 drivers/tty/tty_io.c:1147
redirected_tty_write+0x51/0x90 drivers/tty/tty_io.c:1176
call_write_iter include/linux/fs.h:2117 [inline]
do_iter_readv_writev+0x274/0x350 fs/read_write.c:741
do_iter_write+0xbb/0x1f0 fs/read_write.c:867
vfs_writev+0xfa/0x380 fs/read_write.c:940
do_writev+0xd6/0x1d0 fs/read_write.c:983
do_syscall_64+0x2b/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x6c/0xd6
-> #2 (&port->lock){-.-.}-{2:2}:
validate_chain+0x5df/0xac0 kernel/locking/lockdep.c:3729
__lock_acquire+0x514/0x940 kernel/locking/lockdep.c:4958
lock_acquire+0x15a/0x3a0 kernel/locking/lockdep.c:5569
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3b/0x60 kernel/locking/spinlock.c:164
tty_port_tty_get+0x1f/0xa0 drivers/tty/tty_port.c:289
tty_port_default_wakeup+0xb/0x30 drivers/tty/tty_port.c:48
serial8250_tx_chars+0x259/0x430 drivers/tty/serial/8250/8250_port.c:1906
__start_tx drivers/tty/serial/8250/8250_port.c:1598 [inline]
serial8250_start_tx+0x304/0x320 drivers/tty/serial/8250/8250_port.c:1720
uart_write+0x1a1/0x2e0 drivers/tty/serial/serial_core.c:635
do_output_char+0x2c0/0x370 drivers/tty/n_tty.c:444
process_output drivers/tty/n_tty.c:511 [inline]
n_tty_write+0x269/0x540 drivers/tty/n_tty.c:2445
do_tty_write drivers/tty/tty_io.c:1041 [inline]
file_tty_write.constprop.0+0x29b/0x4b0 drivers/tty/tty_io.c:1147
call_write_iter include/linux/fs.h:2117 [inline]
do_iter_readv_writev+0x274/0x350 fs/read_write.c:741
do_iter_write+0xbb/0x1f0 fs/read_write.c:867
vfs_writev+0xfa/0x380 fs/read_write.c:940
do_writev+0xd6/0x1d0 fs/read_write.c:983
do_syscall_64+0x2b/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x6c/0xd6
-> #1 (&port_lock_key){-.-.}-{2:2}:
validate_chain+0x5df/0xac0 kernel/locking/lockdep.c:3729
__lock_acquire+0x514/0x940 kernel/locking/lockdep.c:4958
lock_acquire+0x15a/0x3a0 kernel/locking/lockdep.c:5569
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3b/0x60 kernel/locking/spinlock.c:164
serial8250_console_write+0x292/0x320 drivers/tty/serial/8250/8250_port.c:3458
call_console_drivers.constprop.0+0x185/0x240 kernel/printk/printk.c:1988
console_unlock+0x2b4/0x640 kernel/printk/printk.c:2648
register_console.part.0+0x2a1/0x390 kernel/printk/printk.c:3024
univ8250_console_init+0x24/0x2b drivers/tty/serial/8250/8250_core.c:724
console_init+0x188/0x24b kernel/printk/printk.c:3134
start_kernel+0x2b0/0x41e init/main.c:1072
secondary_startup_64_no_verify+0xc3/0xcb
-> #0 (console_owner){-...}-{0:0}:
check_prev_add+0xfa/0x1380 kernel/locking/lockdep.c:2988
check_prevs_add+0x1d8/0x3c0 kernel/locking/lockdep.c:3113
validate_chain+0x5df/0xac0 kernel/locking/lockdep.c:3729
__lock_acquire+0x514/0x940 kernel/locking/lockdep.c:4958
lock_acquire+0x15a/0x3a0 kernel/locking/lockdep.c:5569
console_trylock_spinning+0x10d/0x1d0 kernel/printk/printk.c:1921
vprintk_emit+0x1a5/0x270 kernel/printk/printk.c:2134
printk+0xb2/0xe7 kernel/printk/printk.c:2183
lookup_object.cold+0xf/0x24 mm/kmemleak.c:405
scan_block+0x1fa/0x220 mm/kmemleak.c:1357
scan_object+0xdd/0x140 mm/kmemleak.c:1415
scan_gray_list+0x8f/0x1c0 mm/kmemleak.c:1453
kmemleak_scan+0x649/0xf30 mm/kmemleak.c:1608
kmemleak_scan_thread+0x94/0xb6 mm/kmemleak.c:1721
kthread+0x1c4/0x210 kernel/kthread.c:328
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:299
other info that might help us debug this:
Chain exists of:
console_owner --> &port->lock --> kmemleak_lock
Link: https://lkml.kernel.org/r/20250730094914.566582-1-gubowen5@huawei.com
Signed-off-by: Gu Bowen <gubowen5(a)huawei.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Lu Jialin <lujialin4(a)huawei.com>
Cc: Waiman Long <longman(a)redhat.com>
Cc: Breno Leitao <leitao(a)debian.org>
Cc: <stable(a)vger.kernel.org> [5.10+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmemleak.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/kmemleak.c~mm-fix-possible-deadlock-in-console_trylock_spinning
+++ a/mm/kmemleak.c
@@ -437,9 +437,11 @@ static struct kmemleak_object *__lookup_
else if (untagged_objp == untagged_ptr || alias)
return object;
else {
+ __printk_safe_enter();
kmemleak_warn("Found object by alias at 0x%08lx\n",
ptr);
dump_object_info(object);
+ __printk_safe_exit();
break;
}
}
_
Patches currently in -mm which might be from gubowen5(a)huawei.com are
mm-fix-possible-deadlock-in-console_trylock_spinning.patch
The quilt patch titled
Subject: mm: shmem: fix the shmem large folio allocation for the i915 driver
has been removed from the -mm tree. Its filename was
mm-shmem-fix-the-shmem-large-folio-allocation-for-the-i915-driver.patch
This patch was dropped because an alternative patch was or shall be merged
------------------------------------------------------
From: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Subject: mm: shmem: fix the shmem large folio allocation for the i915 driver
Date: Mon, 28 Jul 2025 16:03:53 +0800
After commit acd7ccb284b8 ("mm: shmem: add large folio support for
tmpfs"), we extend the 'huge=' option to allow any sized large folios for
tmpfs, which means tmpfs will allow getting a highest order hint based on
the size of write() and fallocate() paths, and then will try each
allowable large order.
However, when the i915 driver allocates shmem memory, it doesn't provide
hint information about the size of the large folio to be allocated,
resulting in the inability to allocate PMD-sized shmem, which in turn
affects GPU performance.
To fix this issue, add the 'end' information for shmem_read_folio_gfp() to
help allocate PMD-sized large folios. Additionally, use the maximum
allocation chunk (via mapping_max_folio_size()) to determine the size of
the large folios to allocate in the i915 driver.
Patryk added:
: In my tests, the performance drop ranges from a few percent up to 13%
: in Unigine Superposition under heavy memory usage on the CPU Core Ultra
: 155H with the Xe 128 EU GPU. Other users have reported performance
: impact up to 30% on certain workloads. Please find more in the
: regressions reports:
: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14645
: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13845
:
: I believe the change should be backported to all active kernel branches
: after version 6.12.
Link: https://lkml.kernel.org/r/0d734549d5ed073c80b11601da3abdd5223e1889.17536898…
Fixes: acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs")
Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Reported-by: Patryk Kowalczyk <patryk(a)kowalczyk.ws>
Reported-by: Ville Syrj��l�� <ville.syrjala(a)linux.intel.com>
Tested-by: Patryk Kowalczyk <patryk(a)kowalczyk.ws>
Cc: Christan K��nig <christian.koenig(a)amd.com>
Cc: Dave Airlie <airlied(a)gmail.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Huang Ray <Ray.Huang(a)amd.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Jonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Mathew Brost <matthew.brost(a)intel.com>
Cc: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Thomas Zimemrmann <tzimmermann(a)suse.de>
Cc: Tvrtko Ursulin <tursulin(a)ursulin.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/gpu/drm/drm_gem.c | 2 +-
drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 7 ++++++-
drivers/gpu/drm/ttm/ttm_backup.c | 2 +-
include/linux/shmem_fs.h | 4 ++--
mm/shmem.c | 7 ++++---
5 files changed, 14 insertions(+), 8 deletions(-)
--- a/drivers/gpu/drm/drm_gem.c~mm-shmem-fix-the-shmem-large-folio-allocation-for-the-i915-driver
+++ a/drivers/gpu/drm/drm_gem.c
@@ -627,7 +627,7 @@ struct page **drm_gem_get_pages(struct d
i = 0;
while (i < npages) {
long nr;
- folio = shmem_read_folio_gfp(mapping, i,
+ folio = shmem_read_folio_gfp(mapping, i, 0,
mapping_gfp_mask(mapping));
if (IS_ERR(folio))
goto fail;
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c~mm-shmem-fix-the-shmem-large-folio-allocation-for-the-i915-driver
+++ a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -69,6 +69,7 @@ int shmem_sg_alloc_table(struct drm_i915
struct scatterlist *sg;
unsigned long next_pfn = 0; /* suppress gcc warning */
gfp_t noreclaim;
+ size_t chunk;
int ret;
if (overflows_type(size / PAGE_SIZE, page_count))
@@ -94,6 +95,7 @@ int shmem_sg_alloc_table(struct drm_i915
mapping_set_unevictable(mapping);
noreclaim = mapping_gfp_constraint(mapping, ~__GFP_RECLAIM);
noreclaim |= __GFP_NORETRY | __GFP_NOWARN;
+ chunk = mapping_max_folio_size(mapping);
sg = st->sgl;
st->nents = 0;
@@ -105,10 +107,13 @@ int shmem_sg_alloc_table(struct drm_i915
0,
}, *s = shrink;
gfp_t gfp = noreclaim;
+ loff_t bytes = (page_count - i) << PAGE_SHIFT;
+ loff_t pos = i << PAGE_SHIFT;
+ bytes = min_t(loff_t, chunk, bytes);
do {
cond_resched();
- folio = shmem_read_folio_gfp(mapping, i, gfp);
+ folio = shmem_read_folio_gfp(mapping, i, pos + bytes, gfp);
if (!IS_ERR(folio))
break;
--- a/drivers/gpu/drm/ttm/ttm_backup.c~mm-shmem-fix-the-shmem-large-folio-allocation-for-the-i915-driver
+++ a/drivers/gpu/drm/ttm/ttm_backup.c
@@ -100,7 +100,7 @@ ttm_backup_backup_page(struct file *back
struct folio *to_folio;
int ret;
- to_folio = shmem_read_folio_gfp(mapping, idx, alloc_gfp);
+ to_folio = shmem_read_folio_gfp(mapping, idx, 0, alloc_gfp);
if (IS_ERR(to_folio))
return PTR_ERR(to_folio);
--- a/include/linux/shmem_fs.h~mm-shmem-fix-the-shmem-large-folio-allocation-for-the-i915-driver
+++ a/include/linux/shmem_fs.h
@@ -153,12 +153,12 @@ enum sgp_type {
int shmem_get_folio(struct inode *inode, pgoff_t index, loff_t write_end,
struct folio **foliop, enum sgp_type sgp);
struct folio *shmem_read_folio_gfp(struct address_space *mapping,
- pgoff_t index, gfp_t gfp);
+ pgoff_t index, loff_t end, gfp_t gfp);
static inline struct folio *shmem_read_folio(struct address_space *mapping,
pgoff_t index)
{
- return shmem_read_folio_gfp(mapping, index, mapping_gfp_mask(mapping));
+ return shmem_read_folio_gfp(mapping, index, 0, mapping_gfp_mask(mapping));
}
static inline struct page *shmem_read_mapping_page(
--- a/mm/shmem.c~mm-shmem-fix-the-shmem-large-folio-allocation-for-the-i915-driver
+++ a/mm/shmem.c
@@ -5930,6 +5930,7 @@ int shmem_zero_setup(struct vm_area_stru
* shmem_read_folio_gfp - read into page cache, using specified page allocation flags.
* @mapping: the folio's address_space
* @index: the folio index
+ * @end: end of a read if allocating a new folio
* @gfp: the page allocator flags to use if allocating
*
* This behaves as a tmpfs "read_cache_page_gfp(mapping, index, gfp)",
@@ -5942,14 +5943,14 @@ int shmem_zero_setup(struct vm_area_stru
* with the mapping_gfp_mask(), to avoid OOMing the machine unnecessarily.
*/
struct folio *shmem_read_folio_gfp(struct address_space *mapping,
- pgoff_t index, gfp_t gfp)
+ pgoff_t index, loff_t end, gfp_t gfp)
{
#ifdef CONFIG_SHMEM
struct inode *inode = mapping->host;
struct folio *folio;
int error;
- error = shmem_get_folio_gfp(inode, index, 0, &folio, SGP_CACHE,
+ error = shmem_get_folio_gfp(inode, index, end, &folio, SGP_CACHE,
gfp, NULL, NULL);
if (error)
return ERR_PTR(error);
@@ -5968,7 +5969,7 @@ EXPORT_SYMBOL_GPL(shmem_read_folio_gfp);
struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
pgoff_t index, gfp_t gfp)
{
- struct folio *folio = shmem_read_folio_gfp(mapping, index, gfp);
+ struct folio *folio = shmem_read_folio_gfp(mapping, index, 0, gfp);
struct page *page;
if (IS_ERR(folio))
_
Patches currently in -mm which might be from baolin.wang(a)linux.alibaba.com are
The following changes since commit 347e9f5043c89695b01e66b3ed111755afcf1911:
Linux 6.16-rc6 (2025-07-13 14:25:58 -0700)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to 6693731487a8145a9b039bc983d77edc47693855:
vsock/virtio: Allocate nonlinear SKBs for handling large transmit buffers (2025-08-01 09:11:09 -0400)
Changes from v1:
drop commits that I put in there by mistake. Sorry!
----------------------------------------------------------------
virtio, vhost: features, fixes
vhost can now support legacy threading
if enabled in Kconfig
vsock memory allocation strategies for
large buffers have been improved,
reducing pressure on kmalloc
vhost now supports the in-order feature
guest bits missed the merge window
fixes, cleanups all over the place
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
----------------------------------------------------------------
Alok Tiwari (4):
virtio: Fix typo in register_virtio_device() doc comment
vhost-scsi: Fix typos and formatting in comments and logs
vhost: Fix typos
vhost-scsi: Fix check for inline_sg_cnt exceeding preallocated limit
Anders Roxell (1):
vdpa: Fix IDR memory leak in VDUSE module exit
Cindy Lu (1):
vhost: Reintroduce kthread API and add mode selection
Dr. David Alan Gilbert (2):
vhost: vringh: Remove unused iotlb functions
vhost: vringh: Remove unused functions
Dragos Tatulea (2):
vdpa/mlx5: Fix needs_teardown flag calculation
vdpa/mlx5: Fix release of uninitialized resources on error path
Gerd Hoffmann (1):
drm/virtio: implement virtio_gpu_shutdown
Jason Wang (3):
vhost: fail early when __vhost_add_used() fails
vhost: basic in order support
vhost_net: basic in_order support
Michael S. Tsirkin (2):
virtio: fix comments, readability
virtio: document ENOSPC
Mike Christie (1):
vhost-scsi: Fix log flooding with target does not exist errors
Pei Xiao (1):
vhost: Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...))
Viresh Kumar (2):
virtio-mmio: Remove virtqueue list from mmio device
virtio-vdpa: Remove virtqueue list
WangYuli (1):
virtio: virtio_dma_buf: fix missing parameter documentation
Will Deacon (9):
vhost/vsock: Avoid allocating arbitrarily-sized SKBs
vsock/virtio: Validate length in packet header before skb_put()
vsock/virtio: Move length check to callers of virtio_vsock_skb_rx_put()
vsock/virtio: Resize receive buffers so that each SKB fits in a 4K page
vsock/virtio: Rename virtio_vsock_alloc_skb()
vsock/virtio: Move SKB allocation lower-bound check to callers
vhost/vsock: Allocate nonlinear SKBs for handling large receive buffers
vsock/virtio: Rename virtio_vsock_skb_rx_put()
vsock/virtio: Allocate nonlinear SKBs for handling large transmit buffers
drivers/gpu/drm/virtio/virtgpu_drv.c | 8 +-
drivers/vdpa/mlx5/core/mr.c | 3 +
drivers/vdpa/mlx5/net/mlx5_vnet.c | 12 +-
drivers/vdpa/vdpa_user/vduse_dev.c | 1 +
drivers/vhost/Kconfig | 18 ++
drivers/vhost/net.c | 88 +++++---
drivers/vhost/scsi.c | 24 +-
drivers/vhost/vhost.c | 377 ++++++++++++++++++++++++++++----
drivers/vhost/vhost.h | 30 ++-
drivers/vhost/vringh.c | 118 ----------
drivers/vhost/vsock.c | 15 +-
drivers/virtio/virtio.c | 7 +-
drivers/virtio/virtio_dma_buf.c | 2 +
drivers/virtio/virtio_mmio.c | 52 +----
drivers/virtio/virtio_ring.c | 4 +
drivers/virtio/virtio_vdpa.c | 44 +---
include/linux/virtio.h | 2 +-
include/linux/virtio_vsock.h | 46 +++-
include/linux/vringh.h | 12 -
include/uapi/linux/vhost.h | 29 +++
kernel/vhost_task.c | 2 +-
net/vmw_vsock/virtio_transport.c | 20 +-
net/vmw_vsock/virtio_transport_common.c | 3 +-
23 files changed, 575 insertions(+), 342 deletions(-)
The patch titled
Subject: mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-kmemleak-avoid-deadlock-by-moving-pr_warn-outside-kmemleak_lock.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Breno Leitao <leitao(a)debian.org>
Subject: mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock
Date: Thu, 31 Jul 2025 02:57:18 -0700
When netpoll is enabled, calling pr_warn_once() while holding
kmemleak_lock in mem_pool_alloc() can cause a deadlock due to lock
inversion with the netconsole subsystem. This occurs because
pr_warn_once() may trigger netpoll, which eventually leads to
__alloc_skb() and back into kmemleak code, attempting to reacquire
kmemleak_lock.
This is the path for the deadlock.
mem_pool_alloc()
-> raw_spin_lock_irqsave(&kmemleak_lock, flags);
-> pr_warn_once()
-> netconsole subsystem
-> netpoll
-> __alloc_skb
-> __create_object
-> raw_spin_lock_irqsave(&kmemleak_lock, flags);
Fix this by setting a flag and issuing the pr_warn_once() after
kmemleak_lock is released.
Link: https://lkml.kernel.org/r/20250731-kmemleak_lock-v1-1-728fd470198f@debian.o…
Fixes: c5665868183fec ("mm: kmemleak: use the memory pool for early allocations")
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Reported-by: Jakub Kicinski <kuba(a)kernel.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmemleak.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/kmemleak.c~mm-kmemleak-avoid-deadlock-by-moving-pr_warn-outside-kmemleak_lock
+++ a/mm/kmemleak.c
@@ -470,6 +470,7 @@ static struct kmemleak_object *mem_pool_
{
unsigned long flags;
struct kmemleak_object *object;
+ bool warn = false;
/* try the slab allocator first */
if (object_cache) {
@@ -488,8 +489,10 @@ static struct kmemleak_object *mem_pool_
else if (mem_pool_free_count)
object = &mem_pool[--mem_pool_free_count];
else
- pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n");
+ warn = true;
raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
+ if (warn)
+ pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n");
return object;
}
_
Patches currently in -mm which might be from leitao(a)debian.org are
mm-kmemleak-avoid-deadlock-by-moving-pr_warn-outside-kmemleak_lock.patch
The patch titled
Subject: mm/userfaultfd: fix kmap_local LIFO ordering for CONFIG_HIGHPTE
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-userfaultfd-fix-kmap_local-lifo-ordering-for-config_highpte.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Sasha Levin <sashal(a)kernel.org>
Subject: mm/userfaultfd: fix kmap_local LIFO ordering for CONFIG_HIGHPTE
Date: Thu, 31 Jul 2025 10:44:31 -0400
With CONFIG_HIGHPTE on 32-bit ARM, move_pages_pte() maps PTE pages using
kmap_local_page(), which requires unmapping in Last-In-First-Out order.
The current code maps dst_pte first, then src_pte, but unmaps them in the
same order (dst_pte, src_pte), violating the LIFO requirement. This
causes the warning in kunmap_local_indexed():
WARNING: CPU: 0 PID: 604 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
addr \!= __fix_to_virt(FIX_KMAP_BEGIN + idx)
Fix this by reversing the unmap order to respect LIFO ordering.
This issue follows the same pattern as similar fixes:
- commit eca6828403b8 ("crypto: skcipher - fix mismatch between mapping and unmapping order")
- commit 8cf57c6df818 ("nilfs2: eliminate staggered calls to kunmap in nilfs_rename")
Both of which addressed the same fundamental requirement that kmap_local
operations must follow LIFO ordering.
Link: https://lkml.kernel.org/r/20250731144431.773923-1-sashal@kernel.org
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/mm/userfaultfd.c~mm-userfaultfd-fix-kmap_local-lifo-ordering-for-config_highpte
+++ a/mm/userfaultfd.c
@@ -1453,10 +1453,15 @@ out:
folio_unlock(src_folio);
folio_put(src_folio);
}
- if (dst_pte)
- pte_unmap(dst_pte);
+ /*
+ * Unmap in reverse order (LIFO) to maintain proper kmap_local
+ * index ordering when CONFIG_HIGHPTE is enabled. We mapped dst_pte
+ * first, then src_pte, so we must unmap src_pte first, then dst_pte.
+ */
if (src_pte)
pte_unmap(src_pte);
+ if (dst_pte)
+ pte_unmap(dst_pte);
mmu_notifier_invalidate_range_end(&range);
if (si)
put_swap_device(si);
_
Patches currently in -mm which might be from sashal(a)kernel.org are
mm-userfaultfd-fix-kmap_local-lifo-ordering-for-config_highpte.patch
The patch titled
Subject: mm/debug_vm_pgtable: clear page table entries at destroy_args()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-debug_vm_pgtable-clear-page-table-entries-at-destroy_args.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Herton R. Krzesinski" <herton(a)redhat.com>
Subject: mm/debug_vm_pgtable: clear page table entries at destroy_args()
Date: Thu, 31 Jul 2025 18:40:51 -0300
The mm/debug_vm_pagetable test allocates manually page table entries for
the tests it runs, using also its manually allocated mm_struct. That in
itself is ok, but when it exits, at destroy_args() it fails to clear those
entries with the *_clear functions.
The problem is that leaves stale entries. If another process allocates an
mm_struct with a pgd at the same address, it may end up running into the
stale entry. This is happening in practice on a debug kernel with
CONFIG_DEBUG_VM_PGTABLE=y, for example this is the output with some extra
debugging I added (it prints a warning trace if pgtables_bytes goes
negative, in addition to the warning at check_mm() function):
[ 2.539353] debug_vm_pgtable: [get_random_vaddr ]: random_vaddr is 0x7ea247140000
[ 2.539366] kmem_cache info
[ 2.539374] kmem_cachep 0x000000002ce82385 - freelist 0x0000000000000000 - offset 0x508
[ 2.539447] debug_vm_pgtable: [init_args ]: args->mm is 0x000000002267cc9e
(...)
[ 2.552800] WARNING: CPU: 5 PID: 116 at include/linux/mm.h:2841 free_pud_range+0x8bc/0x8d0
[ 2.552816] Modules linked in:
[ 2.552843] CPU: 5 UID: 0 PID: 116 Comm: modprobe Not tainted 6.12.0-105.debug_vm2.el10.ppc64le+debug #1 VOLUNTARY
[ 2.552859] Hardware name: IBM,9009-41A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW910.00 (VL910_062) hv:phyp pSeries
[ 2.552872] NIP: c0000000007eef3c LR: c0000000007eef30 CTR: c0000000003d8c90
[ 2.552885] REGS: c0000000622e73b0 TRAP: 0700 Not tainted (6.12.0-105.debug_vm2.el10.ppc64le+debug)
[ 2.552899] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002822 XER: 0000000a
[ 2.552954] CFAR: c0000000008f03f0 IRQMASK: 0
[ 2.552954] GPR00: c0000000007eef30 c0000000622e7650 c000000002b1ac00 0000000000000001
[ 2.552954] GPR04: 0000000000000008 0000000000000000 c0000000007eef30 ffffffffffffffff
[ 2.552954] GPR08: 00000000ffff00f5 0000000000000001 0000000000000048 0000000000004000
[ 2.552954] GPR12: 00000003fa440000 c000000017ffa300 c0000000051d9f80 ffffffffffffffdb
[ 2.552954] GPR16: 0000000000000000 0000000000000008 000000000000000a 60000000000000e0
[ 2.552954] GPR20: 4080000000000000 c0000000113af038 00007fffcf130000 0000700000000000
[ 2.552954] GPR24: c000000062a6a000 0000000000000001 8000000062a68000 0000000000000001
[ 2.552954] GPR28: 000000000000000a c000000062ebc600 0000000000002000 c000000062ebc760
[ 2.553170] NIP [c0000000007eef3c] free_pud_range+0x8bc/0x8d0
[ 2.553185] LR [c0000000007eef30] free_pud_range+0x8b0/0x8d0
[ 2.553199] Call Trace:
[ 2.553207] [c0000000622e7650] [c0000000007eef30] free_pud_range+0x8b0/0x8d0 (unreliable)
[ 2.553229] [c0000000622e7750] [c0000000007f40b4] free_pgd_range+0x284/0x3b0
[ 2.553248] [c0000000622e7800] [c0000000007f4630] free_pgtables+0x450/0x570
[ 2.553274] [c0000000622e78e0] [c0000000008161c0] exit_mmap+0x250/0x650
[ 2.553292] [c0000000622e7a30] [c0000000001b95b8] __mmput+0x98/0x290
[ 2.558344] [c0000000622e7a80] [c0000000001d1018] exit_mm+0x118/0x1b0
[ 2.558361] [c0000000622e7ac0] [c0000000001d141c] do_exit+0x2ec/0x870
[ 2.558376] [c0000000622e7b60] [c0000000001d1ca8] do_group_exit+0x88/0x150
[ 2.558391] [c0000000622e7bb0] [c0000000001d1db8] sys_exit_group+0x48/0x50
[ 2.558407] [c0000000622e7be0] [c00000000003d810] system_call_exception+0x1e0/0x4c0
[ 2.558423] [c0000000622e7e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
(...)
[ 2.558892] ---[ end trace 0000000000000000 ]---
[ 2.559022] BUG: Bad rss-counter state mm:000000002267cc9e type:MM_ANONPAGES val:1
[ 2.559037] BUG: non-zero pgtables_bytes on freeing mm: -6144
Here the modprobe process ended up with an allocated mm_struct from the
mm_struct slab that was used before by the debug_vm_pgtable test. That is
not a problem, since the mm_struct is initialized again etc., however, if
it ends up using the same pgd table, it bumps into the old stale entry
when clearing/freeing the page table entries, so it tries to free an entry
already gone (that one which was allocated by the debug_vm_pgtable test),
which also explains the negative pgtables_bytes since it's accounting for
not allocated entries in the current process.
As far as I looked pgd_{alloc,free} etc. does not clear entries, and
clearing of the entries is explicitly done in the free_pgtables->
free_pgd_range->free_p4d_range->free_pud_range->free_pmd_range->
free_pte_range path. However, the debug_vm_pgtable test does not call
free_pgtables, since it allocates mm_struct and entries manually for its
test and eg. not goes through page faults. So it also should clear
manually the entries before exit at destroy_args().
This problem was noticed on a reboot X number of times test being done on
a powerpc host, with a debug kernel with CONFIG_DEBUG_VM_PGTABLE enabled.
Depends on the system, but on a 100 times reboot loop the problem could
manifest once or twice, if a process ends up getting the right mm->pgd
entry with the stale entries used by mm/debug_vm_pagetable. After using
this patch, I couldn't reproduce/experience the problems anymore. I was
able to reproduce the problem as well on latest upstream kernel (6.16).
I also modified destroy_args() to use mmput() instead of mmdrop(), there
is no reason to hold mm_users reference and not release the mm_struct
entirely, and in the output above with my debugging prints I already had
patched it to use mmput, it did not fix the problem, but helped in the
debugging as well.
Link: https://lkml.kernel.org/r/20250731214051.4115182-1-herton@redhat.com
Fixes: 3c9b84f044a9e ("mm/debug_vm_pgtable: introduce struct pgtable_debug_args")
Signed-off-by: Herton R. Krzesinski <herton(a)redhat.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Gavin Shan <gshan(a)redhat.com>
Cc: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/debug_vm_pgtable.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/mm/debug_vm_pgtable.c~mm-debug_vm_pgtable-clear-page-table-entries-at-destroy_args
+++ a/mm/debug_vm_pgtable.c
@@ -1041,29 +1041,34 @@ static void __init destroy_args(struct p
/* Free page table entries */
if (args->start_ptep) {
+ pmd_clear(args->pmdp);
pte_free(args->mm, args->start_ptep);
mm_dec_nr_ptes(args->mm);
}
if (args->start_pmdp) {
+ pud_clear(args->pudp);
pmd_free(args->mm, args->start_pmdp);
mm_dec_nr_pmds(args->mm);
}
if (args->start_pudp) {
+ p4d_clear(args->p4dp);
pud_free(args->mm, args->start_pudp);
mm_dec_nr_puds(args->mm);
}
- if (args->start_p4dp)
+ if (args->start_p4dp) {
+ pgd_clear(args->pgdp);
p4d_free(args->mm, args->start_p4dp);
+ }
/* Free vma and mm struct */
if (args->vma)
vm_area_free(args->vma);
if (args->mm)
- mmdrop(args->mm);
+ mmput(args->mm);
}
static struct page * __init
_
Patches currently in -mm which might be from herton(a)redhat.com are
mm-debug_vm_pgtable-clear-page-table-entries-at-destroy_args.patch
To prevent timing attacks, HMAC value comparison needs to be constant
time. Replace the memcmp() with the correct function, crypto_memneq().
Fixes: 1085b8276bb4 ("tpm: Add the rest of the session HMAC API")
Cc: stable(a)vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
---
drivers/char/tpm/Kconfig | 1 +
drivers/char/tpm/tpm2-sessions.c | 6 +++---
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig
index dddd702b2454a..f9d8a4e966867 100644
--- a/drivers/char/tpm/Kconfig
+++ b/drivers/char/tpm/Kconfig
@@ -31,10 +31,11 @@ config TCG_TPM2_HMAC
bool "Use HMAC and encrypted transactions on the TPM bus"
default X86_64
select CRYPTO_ECDH
select CRYPTO_LIB_AESCFB
select CRYPTO_LIB_SHA256
+ select CRYPTO_LIB_UTILS
help
Setting this causes us to deploy a scheme which uses request
and response HMACs in addition to encryption for
communicating with the TPM to prevent or detect bus snooping
and interposer attacks (see tpm-security.rst). Saying Y
diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c
index bdb119453dfbe..5fbd62ee50903 100644
--- a/drivers/char/tpm/tpm2-sessions.c
+++ b/drivers/char/tpm/tpm2-sessions.c
@@ -69,10 +69,11 @@
#include <linux/unaligned.h>
#include <crypto/kpp.h>
#include <crypto/ecdh.h>
#include <crypto/hash.h>
#include <crypto/hmac.h>
+#include <crypto/utils.h>
/* maximum number of names the TPM must remember for authorization */
#define AUTH_MAX_NAMES 3
#define AES_KEY_BYTES AES_KEYSIZE_128
@@ -827,16 +828,15 @@ int tpm_buf_check_hmac_response(struct tpm_chip *chip, struct tpm_buf *buf,
sha256_update(&sctx, auth->our_nonce, sizeof(auth->our_nonce));
sha256_update(&sctx, &auth->attrs, 1);
/* we're done with the rphash, so put our idea of the hmac there */
tpm2_hmac_final(&sctx, auth->session_key, sizeof(auth->session_key)
+ auth->passphrase_len, rphash);
- if (memcmp(rphash, &buf->data[offset_s], SHA256_DIGEST_SIZE) == 0) {
- rc = 0;
- } else {
+ if (crypto_memneq(rphash, &buf->data[offset_s], SHA256_DIGEST_SIZE)) {
dev_err(&chip->dev, "TPM: HMAC check failed\n");
goto out;
}
+ rc = 0;
/* now do response decryption */
if (auth->attrs & TPM2_SA_ENCRYPT) {
/* need key and IV */
tpm2_KDFa(auth->session_key, SHA256_DIGEST_SIZE
--
2.50.1
The patch titled
Subject: kunit: kasan_test: disable fortify string checker on kasan_strings() test
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kunit-kasan_test-disable-fortify-string-checker-on-kasan_strings-test.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Yeoreum Yun <yeoreum.yun(a)arm.com>
Subject: kunit: kasan_test: disable fortify string checker on kasan_strings() test
Date: Fri, 1 Aug 2025 13:02:36 +0100
Similar to commit 09c6304e38e4 ("kasan: test: fix compatibility with
FORTIFY_SOURCE") the kernel is panicing in kasan_string().
This is due to the `src` and `ptr` not being hidden from the optimizer
which would disable the runtime fortify string checker.
Call trace:
__fortify_panic+0x10/0x20 (P)
kasan_strings+0x980/0x9b0
kunit_try_run_case+0x68/0x190
kunit_generic_run_threadfn_adapter+0x34/0x68
kthread+0x1c4/0x228
ret_from_fork+0x10/0x20
Code: d503233f a9bf7bfd 910003fd 9424b243 (d4210000)
---[ end trace 0000000000000000 ]---
note: kunit_try_catch[128] exited with irqs disabled
note: kunit_try_catch[128] exited with preempt_count 1
# kasan_strings: try faulted: last
** replaying previous printk message **
# kasan_strings: try faulted: last line seen mm/kasan/kasan_test_c.c:1600
# kasan_strings: internal error occurred preventing test case from running: -4
Link: https://lkml.kernel.org/r/20250801120236.2962642-1-yeoreum.yun@arm.com
Fixes: 73228c7ecc5e ("KASAN: port KASAN Tests to KUnit")
Signed-off-by: Yeoreum Yun <yeoreum.yun(a)arm.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/kasan_test_c.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/kasan/kasan_test_c.c~kunit-kasan_test-disable-fortify-string-checker-on-kasan_strings-test
+++ a/mm/kasan/kasan_test_c.c
@@ -1578,9 +1578,11 @@ static void kasan_strings(struct kunit *
ptr = kmalloc(size, GFP_KERNEL | __GFP_ZERO);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
+ OPTIMIZER_HIDE_VAR(ptr);
src = kmalloc(KASAN_GRANULE_SIZE, GFP_KERNEL | __GFP_ZERO);
strscpy(src, "f0cacc1a0000000", KASAN_GRANULE_SIZE);
+ OPTIMIZER_HIDE_VAR(src);
/*
* Make sure that strscpy() does not trigger KASAN if it overreads into
_
Patches currently in -mm which might be from yeoreum.yun(a)arm.com are
kunit-kasan_test-disable-fortify-string-checker-on-kasan_strings-test.patch
The commit in the Fixes tag breaks my laptop (found by git bisect).
My home RJ45 LAN cable cannot connect after that commit.
The call to netif_carrier_on() should be done when netif_carrier_ok()
is false. Not when it's true. Because calling netif_carrier_on() when
__LINK_STATE_NOCARRIER is not set actually does nothing.
Cc: Armando Budianto <sprite(a)gnuweeb.org>
Cc: stable(a)vger.kernel.org
Closes: https://lore.kernel.org/netdev/0752dee6-43d6-4e1f-81d2-4248142cccd2@gnuweeb…
Fixes: 0d9cfc9b8cb1 ("net: usbnet: Avoid potential RCU stall on LINK_CHANGE event")
Signed-off-by: Ammar Faizi <ammarfaizi2(a)gnuweeb.org>
---
drivers/net/usb/usbnet.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index bc1d8631ffe0..1eb98eeb64f9 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1107,31 +1107,31 @@ static const struct ethtool_ops usbnet_ethtool_ops = {
};
/*-------------------------------------------------------------------------*/
static void __handle_link_change(struct usbnet *dev)
{
if (!test_bit(EVENT_DEV_OPEN, &dev->flags))
return;
if (!netif_carrier_ok(dev->net)) {
+ if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
+ netif_carrier_on(dev->net);
+
/* kill URBs for reading packets to save bus bandwidth */
unlink_urbs(dev, &dev->rxq);
/*
* tx_timeout will unlink URBs for sending packets and
* tx queue is stopped by netcore after link becomes off
*/
} else {
- if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
- netif_carrier_on(dev->net);
-
/* submitting URBs for reading packets */
tasklet_schedule(&dev->bh);
}
/* hard_mtu or rx_urb_size may change during link change */
usbnet_update_max_qlen(dev);
clear_bit(EVENT_LINK_CHANGE, &dev->flags);
}
--
Ammar Faizi
The Gemalto Cinterion PLS83-W modem (cdc_ether) is emitting confusing link
up and down events when the WWAN interface is activated on the modem-side.
Interrupt URBs will in consecutive polls grab:
* Link Connected
* Link Disconnected
* Link Connected
Where the last Connected is then a stable link state.
When the system is under load this may cause the unlink_urbs() work in
__handle_link_change() to not complete before the next usbnet_link_change()
call turns the carrier on again, allowing rx_submit() to queue new SKBs.
In that event the URB queue is filled faster than it can drain, ending up
in a RCU stall:
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-.... } 33108 jiffies s: 201 root: 0x1/.
rcu: blocking rcu_node structures (internal RCU debug):
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
Call trace:
arch_local_irq_enable+0x4/0x8
local_bh_enable+0x18/0x20
__netdev_alloc_skb+0x18c/0x1cc
rx_submit+0x68/0x1f8 [usbnet]
rx_alloc_submit+0x4c/0x74 [usbnet]
usbnet_bh+0x1d8/0x218 [usbnet]
usbnet_bh_tasklet+0x10/0x18 [usbnet]
tasklet_action_common+0xa8/0x110
tasklet_action+0x2c/0x34
handle_softirqs+0x2cc/0x3a0
__do_softirq+0x10/0x18
____do_softirq+0xc/0x14
call_on_irq_stack+0x24/0x34
do_softirq_own_stack+0x18/0x20
__irq_exit_rcu+0xa8/0xb8
irq_exit_rcu+0xc/0x30
el1_interrupt+0x34/0x48
el1h_64_irq_handler+0x14/0x1c
el1h_64_irq+0x68/0x6c
_raw_spin_unlock_irqrestore+0x38/0x48
xhci_urb_dequeue+0x1ac/0x45c [xhci_hcd]
unlink1+0xd4/0xdc [usbcore]
usb_hcd_unlink_urb+0x70/0xb0 [usbcore]
usb_unlink_urb+0x24/0x44 [usbcore]
unlink_urbs.constprop.0.isra.0+0x64/0xa8 [usbnet]
__handle_link_change+0x34/0x70 [usbnet]
usbnet_deferred_kevent+0x1c0/0x320 [usbnet]
process_scheduled_works+0x2d0/0x48c
worker_thread+0x150/0x1dc
kthread+0xd8/0xe8
ret_from_fork+0x10/0x20
Get around the problem by delaying the carrier on to the scheduled work.
This needs a new flag to keep track of the necessary action.
The carrier ok check cannot be removed as it remains required for the
LINK_RESET event flow.
Fixes: 4b49f58fff00 ("usbnet: handle link change")
Cc: stable(a)vger.kernel.org
Signed-off-by: John Ernberg <john.ernberg(a)actia.se>
---
I've been testing this quite aggressively over a night, and seems equally
stable to my first approach. I'm a little bit concerned that the bit stuff
can now race (although much smaller) in the opposite direction, that a
carrier off can occur between test_and_clear_bit() and the carrier on
action in the handler. Leaving the carrier on when it shouldn't be.
v2:
- target tree in patch description.
- Drop Ming Lei from address list as their address bounces.
- Rework solution based on feedback by Jakub (let me know if you want a
Suggested-by tag, if we're keeping this direction)
v1: https://lore.kernel.org/netdev/20250710085028.1070922-1-john.ernberg@actia.…
Tested on 6.12.20 and forward ported. Stack trace from 6.12.20.
---
drivers/net/usb/usbnet.c | 11 ++++++++---
include/linux/usb/usbnet.h | 1 +
2 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index c04e715a4c2a..bc1d8631ffe0 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1122,6 +1122,9 @@ static void __handle_link_change(struct usbnet *dev)
* tx queue is stopped by netcore after link becomes off
*/
} else {
+ if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
+ netif_carrier_on(dev->net);
+
/* submitting URBs for reading packets */
tasklet_schedule(&dev->bh);
}
@@ -2009,10 +2012,12 @@ EXPORT_SYMBOL(usbnet_manage_power);
void usbnet_link_change(struct usbnet *dev, bool link, bool need_reset)
{
/* update link after link is reseted */
- if (link && !need_reset)
- netif_carrier_on(dev->net);
- else
+ if (link && !need_reset) {
+ set_bit(EVENT_LINK_CARRIER_ON, &dev->flags);
+ } else {
+ clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags);
netif_carrier_off(dev->net);
+ }
if (need_reset && link)
usbnet_defer_kevent(dev, EVENT_LINK_RESET);
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index 0b9f1e598e3a..4bc6bb01a0eb 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -76,6 +76,7 @@ struct usbnet {
# define EVENT_LINK_CHANGE 11
# define EVENT_SET_RX_MODE 12
# define EVENT_NO_IP_ALIGN 13
+# define EVENT_LINK_CARRIER_ON 14
/* This one is special, as it indicates that the device is going away
* there are cyclic dependencies between tasklet, timer and bh
* that must be broken
--
2.49.0
Hi,
The issue that I originally fixed with
commit 8ff4fb276e23 ("pinctrl: amd: Clear GPIO debounce for suspend")
I have found out will be a problem on more platforms due to more designs
switching to power button controlled this way.
I think it would be a good idea to take it back to stable kernels.
Thanks!
Hi Greg,
The below two patches are needed on linux-5.15.y and linux-6.1.y, please
help to add them to the stable tree.
b7a62611fab7 usb: chipidea: add USB PHY event
87ed257acb09 usb: phy: mxs: disconnect line when USB charger is attached
They are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git branch usb-testing
Thanks,
Xu Yang
+ stable
+ regressions
New subject
Great news.
Greg, Sasha,
Can you please pull in these 3 commits specifically to 6.6.y to fix a
regression that was reported by Morgan in 6.6.y:
commit 12753d71e8c5 ("ACPI: CPPC: Add helper to get the highest
performance value")
commit ed429c686b79 ("cpufreq: amd-pstate: Enable amd-pstate preferred
core support")
commit 3d291fe47fe1 ("cpufreq: amd-pstate: fix the highest frequency
issue which limits performance")
Further details are below.
Thanks!
On 9/5/2024 16:09, Jones, Morgan wrote:
> Mario,
>
> Confirmed. Thank you for the help! Slightly different refs on my end:
>
> Remotes:
>
> next https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (fetch)
> next https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (push)
> origin git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git (fetch)
> origin git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git (push)
> superm1 https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/ (fetch)
> superm1 https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/ (push)
> torvalds git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (fetch)
> torvalds git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (push)
>
> Patches:
>
> git format-patch 12753d71e8c5^..12753d71e8c5
> git format-patch f3a052391822b772b4e27f2594526cf1eb103cab^..f3a052391822b772b4e27f2594526cf1eb103cab
> git format-patch bf202e654bfa57fb8cf9d93d4c6855890b70b9c4^..bf202e654bfa57fb8cf9d93d4c6855890b70b9c4
>
> Results:
>
> Linux redact 6.6.48 #1-NixOS SMP PREEMPT_DYNAMIC Tue Jan 1 00:00:00 UTC 1980 x86_64 GNU/Linux
>
> analyzing CPU 56:
> driver: amd-pstate-epp
> CPUs which run at the same hardware frequency: 56
> CPUs which need to have their frequency coordinated by software: 56
> maximum transition latency: Cannot determine or is not supported.
> hardware limits: 400 MHz - 3.35 GHz
> available cpufreq governors: performance powersave
> current policy: frequency should be within 400 MHz and 3.35 GHz.
> The governor "performance" may decide which speed to use
> within this range.
> current CPU frequency: Unable to call hardware
> current CPU frequency: 2.09 GHz (asserted by call to kernel)
> boost state support:
> Supported: yes
> Active: yes
> AMD PSTATE Highest Performance: 255. Maximum Frequency: 3.35 GHz.
> AMD PSTATE Nominal Performance: 152. Nominal Frequency: 2.00 GHz.
> AMD PSTATE Lowest Non-linear Performance: 115. Lowest Non-linear Frequency: 1.51 GHz.
> AMD PSTATE Lowest Performance: 31. Lowest Frequency: 400 MHz.
>
> And our builds are back to being fast with `amd_pstate=active amd_prefcore=enable amd_pstate.shared_mem=1`.
>
> Morgan
>
> -----Original Message-----
> From: Mario Limonciello <mario.limonciello(a)amd.com>
> Sent: Thursday, September 5, 2024 8:12 AM
> To: Jones, Morgan <Morgan.Jones(a)viasat.com>
> Cc: linux-pm(a)vger.kernel.org; linux-kernel(a)vger.kernel.org; David Arcari <darcari(a)redhat.com>; Dhananjay Ugwekar <Dhananjay.Ugwekar(a)amd.com>; rafael(a)kernel.org; viresh.kumar(a)linaro.org; gautham.shenoy(a)amd.com; perry.yuan(a)amd.com; skhan(a)linuxfoundation.org; li.meng(a)amd.com; ray.huang(a)amd.com
> Subject: Re: [EXTERNAL] Re: [PATCH v2 2/2] cpufreq/amd-pstate: Fix the scaling_max_freq setting on shared memory CPPC systems
>
> Hi Morgan,
>
> Please apply these 3 commits:
>
> commit 12753d71e8c5 ("ACPI: CPPC: Add helper to get the highest performance value") commit ed429c686b79 ("cpufreq: amd-pstate: Enable amd-pstate preferred core support") commit 3d291fe47fe1 ("cpufreq: amd-pstate: fix the highest frequency issue which limits performance")
>
> The first two should help your system, the third will prevent introducing a regression on a different one.
>
> Assuming that works we should ask @stable to pull all 3 in to fix this regression.
>
> Thanks,
>
> On 9/4/2024 08:57, Mario Limonciello wrote:
>> Morgan,
>>
>> I was referring specfiically to the version that landed in Linus' tree:
>> https://urldefense.us/v3/__https://git.kernel.org/torvalds/c/8164f7433
>> 264__;!!C5Asm8uRnZQmlRln!aIZEDEbIUKD7OrxN0b0KjoqKYDL2yMkwk4EK7x_oSnyHQ
>> 6MEq7yt6JHjd0TD9DgEYEWDcF58OKL8c7G11bT3dSqL8eM$
>>
>> But yeah it's effectively the same thing. In any case, it's not the
>> solution.
>>
>> We had some internal discussion and suspect this is due to missing
>> prefcore patches in 6.6 as that feature landed in 6.9. We'll try to
>> reproduce this on a Rome system and come back with our findings and
>> suggestions what to do.
>>
>> Thanks,
>>
>
From: Nirmal Patel <nirmal.patel(a)linux.intel.com>
[ Upstream commit 886e67100b904cb1b106ed1dfa8a60696aff519a ]
During the boot process all the PCI devices are assigned default PCI-MSI
IRQ domain including VMD endpoint devices. If interrupt-remapping is
enabled by IOMMU, the PCI devices except VMD get new INTEL-IR-MSI IRQ
domain. And VMD is supposed to create and assign a separate VMD-MSI IRQ
domain for its child devices in order to support MSI-X remapping
capabilities.
Now when MSI-X remapping in VMD is disabled in order to improve
performance, VMD skips VMD-MSI IRQ domain assignment process to its
child devices. Thus the devices behind VMD get default PCI-MSI IRQ
domain instead of INTEL-IR-MSI IRQ domain when VMD creates root bus and
configures child devices.
As a result host OS fails to boot and DMAR errors were observed when
interrupt remapping was enabled on Intel Icelake CPUs. For instance:
DMAR: DRHD: handling fault status reg 2
DMAR: [INTR-REMAP] Request device [0xe2:0x00.0] fault index 0xa00 [fault reason 0x25] Blocked a compatibility format interrupt request
To fix this issue, dev_msi_info struct in dev struct maintains correct
value of IRQ domain. VMD will use this information to assign proper IRQ
domain to its child devices when it doesn't create a separate IRQ domain.
Link: https://lore.kernel.org/r/20220511095707.25403-2-nirmal.patel@linux.intel.c…
Signed-off-by: Nirmal Patel <nirmal.patel(a)linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
(cherry picked from commit 886e67100b904cb1b106ed1dfa8a60696aff519a)
[ This patch has already been backported to the Ubuntu 5.15 kernel
and fixes boot issues on Intel platforms with VMD rev 04,
confirmed on version 5.15.189. ]
Signed-off-by: Artur Piechocki <artur.piechocki(a)open-e.com>
---
drivers/pci/controller/vmd.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 846590706a38..d0d18e9c6968 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -799,6 +799,9 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
vmd_attach_resources(vmd);
if (vmd->irq_domain)
dev_set_msi_domain(&vmd->bus->dev, vmd->irq_domain);
+ else
+ dev_set_msi_domain(&vmd->bus->dev,
+ dev_get_msi_domain(&vmd->dev->dev));
WARN(sysfs_create_link(&vmd->dev->dev.kobj, &vmd->bus->dev.kobj,
"domain"), "Can't create symlink to domain\n");
--
2.50.1
When working on an issue in the MPTCP selftests due to a recent
backport, I noticed it was due to a missing backports. A few years ago,
we were not properly monitoring the failed backports, and we missed a
few patches:
- 857898eb4b28 ("selftests: mptcp: add missing join check")
- 0c1f78a49af7 ("mptcp: fix error mibs accounting")
- 31bf11de146c ("mptcp: introduce MAPPING_BAD_CSUM")
- fd37c2ecb21f ("selftests: mptcp: Initialize variables to quiet gcc 12 warnings")
- c886d70286bf ("mptcp: do not queue data on closed subflows")
An extra patch has been added to ease the other backports:
- b8e0def397d7 ("mptcp: drop unused sk in mptcp_push_release")
Geliang Tang (1):
mptcp: drop unused sk in mptcp_push_release
Mat Martineau (1):
selftests: mptcp: Initialize variables to quiet gcc 12 warnings
Matthieu Baerts (1):
selftests: mptcp: add missing join check
Paolo Abeni (3):
mptcp: fix error mibs accounting
mptcp: introduce MAPPING_BAD_CSUM
mptcp: do not queue data on closed subflows
net/mptcp/options.c | 1 +
net/mptcp/protocol.c | 17 ++++++++++------
net/mptcp/protocol.h | 11 ++++++----
net/mptcp/subflow.c | 20 +++++++++----------
.../selftests/net/mptcp/mptcp_connect.c | 2 +-
.../testing/selftests/net/mptcp/mptcp_join.sh | 1 +
6 files changed, 30 insertions(+), 22 deletions(-)
--
2.50.0
From: Willem de Bruijn <willemb(a)google.com>
When packet_set_ring() releases po->bind_lock, another thread can
run packet_notifier() and process an NETDEV_UP event.
This race and the fix are both similar to that of commit 15fe076edea7
("net/packet: fix a race in packet_bind() and packet_notifier()").
There too the packet_notifier NETDEV_UP event managed to run while a
po->bind_lock critical section had to be temporarily released. And
the fix was similarly to temporarily set po->num to zero to keep
the socket unhooked until the lock is retaken.
The po->bind_lock in packet_set_ring and packet_notifier precede the
introduction of git history.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Quang Le <quanglex97(a)gmail.com>
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
net/packet/af_packet.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index bc438d0d96a7..a7017d7f0927 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4573,10 +4573,10 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
spin_lock(&po->bind_lock);
was_running = packet_sock_flag(po, PACKET_SOCK_RUNNING);
num = po->num;
- if (was_running) {
- WRITE_ONCE(po->num, 0);
+ WRITE_ONCE(po->num, 0);
+ if (was_running)
__unregister_prot_hook(sk, false);
- }
+
spin_unlock(&po->bind_lock);
synchronize_net();
@@ -4608,10 +4608,10 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
mutex_unlock(&po->pg_vec_lock);
spin_lock(&po->bind_lock);
- if (was_running) {
- WRITE_ONCE(po->num, num);
+ WRITE_ONCE(po->num, num);
+ if (was_running)
register_prot_hook(sk);
- }
+
spin_unlock(&po->bind_lock);
if (pg_vec && (po->tp_version > TPACKET_V2)) {
/* Because we don't support block-based V3 on tx-ring */
--
2.50.1.565.gc32cd1483b-goog
The logic to synthesize constant_tsc for Pentium 4s (Family 15) is
wrong. Since INTEL_P4_PRESCOTT is numerically greater than
INTEL_P4_WILLAMETTE, the logic always results in false and never sets
X86_FEATURE_CONSTANT_TSC for any Pentium 4 model.
The error was introduced while replacing the x86_model check with a VFM
one. The original check was as follows:
if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
(c->x86 == 0x6 && c->x86_model >= 0x0e))
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
Fix the logic to cover all Pentium 4 models from Prescott (model 3) to
Cedarmill (model 6) which is the last model released in Family 15.
Fixes: fadb6f569b10 ("x86/cpu/intel: Limit the non-architectural constant_tsc model checks")
Cc: <stable(a)vger.kernel.org> # v6.15
Signed-off-by: Suchit Karunakaran <suchitkarunakaran(a)gmail.com>
---
Changes since v2:
- Improve commit message
Changes since v1:
- Fix incorrect logic
arch/x86/kernel/cpu/intel.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 076eaa41b8c8..6f5bd5dbc249 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -262,7 +262,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
if (c->x86_power & (1 << 8)) {
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
- } else if ((c->x86_vfm >= INTEL_P4_PRESCOTT && c->x86_vfm <= INTEL_P4_WILLAMETTE) ||
+ } else if ((c->x86_vfm >= INTEL_P4_PRESCOTT && c->x86_vfm <= INTEL_P4_CEDARMILL) ||
(c->x86_vfm >= INTEL_CORE_YONAH && c->x86_vfm <= INTEL_IVYBRIDGE)) {
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
}
--
2.50.1
While working on a BeaglePlay with the UART controller and making it use
DMA, I figured the DMA controller completion helper was clearly
misbehaving.
On one side, with slow devices like a UART controller, the helper uses
an unusual polling delay of 1s.
On the other side, the helper can also perform 0us sleeps which means
the driver does not sleep at all in some situations and keeps the CPU
busy waiting.
Finally, while I was digging into these issues, it felt like the helper
was a bit too complex and could be simplified, which is what I did in
patch 3. I initially worked on v6.7.x which did not had the spinlocks, I
hope I didn't got them wrong.
I also tested this with v6.17-rc7.
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
Miquel Raynal (3):
dmaengine: ti: k3-udma: Reduce completion polling delay
dmaengine: ti: k3-udma: Ensure a minimum polling delay
dmaengine: ti: k3-udma: Simplify the completion worker
drivers/dma/ti/k3-udma.c | 85 ++++++++++++++++++++----------------------------
1 file changed, 36 insertions(+), 49 deletions(-)
---
base-commit: b59220b9fa2c3da4295d71913146cd64b869fcf9
change-id: 20250731-ti-dma-timeout-1c64868b5baf
Best regards,
--
Miquel Raynal <miquel.raynal(a)bootlin.com>
Hi,
Hope today's treating you well!
I wanted to know if you would be interested in purchasing the attendees list of NRF 2025 Retail's Big Show ?
Attendees count: 25,000 Leads
Contact Information: Company Name, Web URL, Contact Name, Title, Direct Email, Phone Number, Mailing Address, Industry, Employee Size, Annual Sales.
If this ignites your interest please let me know, I'd be glad to offer you the pricing options, for your assessment.
I'm grateful for your quick reply. Can't wait to hear from you.
Best
Alyssa Campbell
Demand Generation Manager
B2B Connect, Inc.
Please respond with a 'REMOVE' if you don't wish to receive further emails
From: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
When trying to set MCR[2], XON1 is incorrectly accessed instead. And when
writing to the TCR register to configure flow control levels, we are
incorrectly writing to the MSR register. The default value of $00 is then
used for TCR, which means that selectable trigger levels in FCR are used
in place of TCR.
TCR/TLR access requires EFR[4] (enable enhanced functions) and MCR[2]
to be set. EFR[4] is already set in probe().
MCR access requires LCR[7] to be zero.
Since LCR is set to $BF when trying to set MCR[2], XON1 is incorrectly
accessed instead because MCR shares the same address space as XON1.
Since MCR[2] is unmodified and still zero, when writing to TCR we are in
fact writing to MSR because TCR/TLR registers share the same address space
as MSR/SPR.
Fix by first removing useless reconfiguration of EFR[4] (enable enhanced
functions), as it is already enabled in sc16is7xx_probe() since commit
43c51bb573aa ("sc16is7xx: make sure device is in suspend once probed").
Now LCR is $00, which means that MCR access is enabled.
Also remove regcache_cache_bypass() calls since we no longer access the
enhanced registers set, and TCR is already declared as volatile (in fact
by declaring MSR as volatile, which shares the same address).
Finally disable access to TCR/TLR registers after modifying them by
clearing MCR[2].
Note: the comment about "... and internal clock div" is wrong and can be
ignored/removed as access to internal clock div registers (DLL/DLH)
is permitted only when LCR[7] is logic 1, not when enhanced features
is enabled. And DLL/DLH access is not needed in sc16is7xx_startup().
Fixes: dfeae619d781 ("serial: sc16is7xx")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
---
drivers/tty/serial/sc16is7xx.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index 5ea8aadb6e69c..9056cb82456f1 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -1177,17 +1177,6 @@ static int sc16is7xx_startup(struct uart_port *port)
sc16is7xx_port_write(port, SC16IS7XX_FCR_REG,
SC16IS7XX_FCR_FIFO_BIT);
- /* Enable EFR */
- sc16is7xx_port_write(port, SC16IS7XX_LCR_REG,
- SC16IS7XX_LCR_CONF_MODE_B);
-
- regcache_cache_bypass(one->regmap, true);
-
- /* Enable write access to enhanced features and internal clock div */
- sc16is7xx_port_update(port, SC16IS7XX_EFR_REG,
- SC16IS7XX_EFR_ENABLE_BIT,
- SC16IS7XX_EFR_ENABLE_BIT);
-
/* Enable TCR/TLR */
sc16is7xx_port_update(port, SC16IS7XX_MCR_REG,
SC16IS7XX_MCR_TCRTLR_BIT,
@@ -1199,7 +1188,8 @@ static int sc16is7xx_startup(struct uart_port *port)
SC16IS7XX_TCR_RX_RESUME(24) |
SC16IS7XX_TCR_RX_HALT(48));
- regcache_cache_bypass(one->regmap, false);
+ /* Disable TCR/TLR access */
+ sc16is7xx_port_update(port, SC16IS7XX_MCR_REG, SC16IS7XX_MCR_TCRTLR_BIT, 0);
/* Now, initialize the UART */
sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, SC16IS7XX_LCR_WORD_LEN_8);
base-commit: 260f6f4fda93c8485c8037865c941b42b9cba5d2
--
2.39.5
From: Nirmal Patel <nirmal.patel(a)linux.intel.com>
During the boot process all the PCI devices are assigned default PCI-MSI
IRQ domain including VMD endpoint devices. If interrupt-remapping is
enabled by IOMMU, the PCI devices except VMD get new INTEL-IR-MSI IRQ
domain. And VMD is supposed to create and assign a separate VMD-MSI IRQ
domain for its child devices in order to support MSI-X remapping
capabilities.
Now when MSI-X remapping in VMD is disabled in order to improve
performance, VMD skips VMD-MSI IRQ domain assignment process to its
child devices. Thus the devices behind VMD get default PCI-MSI IRQ
domain instead of INTEL-IR-MSI IRQ domain when VMD creates root bus and
configures child devices.
As a result host OS fails to boot and DMAR errors were observed when
interrupt remapping was enabled on Intel Icelake CPUs. For instance:
DMAR: DRHD: handling fault status reg 2
DMAR: [INTR-REMAP] Request device [0xe2:0x00.0] fault index 0xa00 [fault reason 0x25] Blocked a compatibility format interrupt request
To fix this issue, dev_msi_info struct in dev struct maintains correct
value of IRQ domain. VMD will use this information to assign proper IRQ
domain to its child devices when it doesn't create a separate IRQ domain.
Link: https://lore.kernel.org/r/20220511095707.25403-2-nirmal.patel@linux.intel.c…
Cc: stable(a)vger.kernel.org
Signed-off-by: Nirmal Patel <nirmal.patel(a)linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Signed-off-by: Artur Piechocki <artur.piechocki(a)open-e.com>
---
drivers/pci/controller/vmd.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index eb05cceab964..5015adc04d19 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -853,6 +853,9 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
vmd_attach_resources(vmd);
if (vmd->irq_domain)
dev_set_msi_domain(&vmd->bus->dev, vmd->irq_domain);
+ else
+ dev_set_msi_domain(&vmd->bus->dev,
+ dev_get_msi_domain(&vmd->dev->dev));
vmd_acpi_begin();
--
2.50.1
> This is a note to let you know that I've just added the patch titled
>
> Revert "cgroup_freezer: cgroup_freezing: Check if not frozen"
>
> to the 6.15-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> revert-cgroup_freezer-cgroup_freezing-check-if-not-f.patch
> and it can be found in the queue-6.15 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it.
>
The patch ("sched,freezer: Remove unnecessary warning in __thaw_task") should also be merged to
prevent triggering another warning in __thaw_task().
Best regards,
Ridong
>
>
> commit ee09694849a570cbc32015b5329b7c2f3f778748
> Author: Chen Ridong <chenridong(a)huawei.com>
> Date: Thu Jul 17 08:55:50 2025 +0000
>
> Revert "cgroup_freezer: cgroup_freezing: Check if not frozen"
>
> [ Upstream commit 14a67b42cb6f3ab66f41603c062c5056d32ea7dd ]
>
> This reverts commit cff5f49d433fcd0063c8be7dd08fa5bf190c6c37.
>
> Commit cff5f49d433f ("cgroup_freezer: cgroup_freezing: Check if not
> frozen") modified the cgroup_freezing() logic to verify that the FROZEN
> flag is not set, affecting the return value of the freezing() function,
> in order to address a warning in __thaw_task.
>
> A race condition exists that may allow tasks to escape being frozen. The
> following scenario demonstrates this issue:
>
> CPU 0 (get_signal path) CPU 1 (freezer.state reader)
> try_to_freeze read freezer.state
> __refrigerator freezer_read
> update_if_frozen
> WRITE_ONCE(current->__state, TASK_FROZEN);
> ...
> /* Task is now marked frozen */
> /* frozen(task) == true */
> /* Assuming other tasks are frozen */
> freezer->state |= CGROUP_FROZEN;
> /* freezing(current) returns false */
> /* because cgroup is frozen (not freezing) */
> break out
> __set_current_state(TASK_RUNNING);
> /* Bug: Task resumes running when it should remain frozen */
>
> The existing !frozen(p) check in __thaw_task makes the
> WARN_ON_ONCE(freezing(p)) warning redundant. Removing this warning enables
> reverting the commit cff5f49d433f ("cgroup_freezer: cgroup_freezing: Check
> if not frozen") to resolve the issue.
>
> The warning has been removed in the previous patch. This patch revert the
> commit cff5f49d433f ("cgroup_freezer: cgroup_freezing: Check if not
> frozen") to complete the fix.
>
> Fixes: cff5f49d433f ("cgroup_freezer: cgroup_freezing: Check if not frozen")
> Reported-by: Zhong Jiawei<zhongjiawei1(a)huawei.com>
> Signed-off-by: Chen Ridong <chenridong(a)huawei.com>
> Signed-off-by: Tejun Heo <tj(a)kernel.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/kernel/cgroup/legacy_freezer.c b/kernel/cgroup/legacy_freezer.c index 507b8f19a262e..dd9417425d929 100644
> --- a/kernel/cgroup/legacy_freezer.c
> +++ b/kernel/cgroup/legacy_freezer.c
> @@ -66,15 +66,9 @@ static struct freezer *parent_freezer(struct freezer *freezer) bool cgroup_freezing(struct task_struct *task) {
> bool ret;
> - unsigned int state;
>
> rcu_read_lock();
> - /* Check if the cgroup is still FREEZING, but not FROZEN. The extra
> - * !FROZEN check is required, because the FREEZING bit is not cleared
> - * when the state FROZEN is reached.
> - */
> - state = task_freezer(task)->state;
> - ret = (state & CGROUP_FREEZING) && !(state & CGROUP_FROZEN);
> + ret = task_freezer(task)->state & CGROUP_FREEZING;
> rcu_read_unlock();
>
> return ret;
>
The driver stores a reference to the `usb_device` structure (`udev`)
in its private data (`data->udev`), which can persist beyond the
immediate context of the `bfusb_probe()` function.
Without proper reference count management, this can lead to two issues:
1. A `use-after-free` scenario if `udev` is accessed after its main
reference count drops to zero (e.g., if the device is disconnected
and the `data` structure is still active).
2. A `memory leak` if `udev`'s reference count is not properly
decremented during driver disconnect, preventing the `usb_device`
object from being freed.
To correctly manage the `udev` lifetime, explicitly increment its
reference count with `usb_get_dev(udev)` when storing it in the
driver's private data. Correspondingly, decrement the reference count
with `usb_put_dev(data->udev)` in the `bfusb_disconnect()` callback.
This ensures `udev` remains valid while referenced by the driver's
private data and is properly released when no longer needed.
Signed-off-by: Salah Triki <salah.triki(a)gmail.com>
Fixes: 9c724357f432d ("[Bluetooth] Code cleanup of the drivers source code")
Cc: stable(a)vger.kernel.org
Cc: Marcel Holtmann <marcel(a)holtmann.org>
Cc: Luiz Augusto von Dentz <luiz.dentz(a)gmail.com>
Cc: linux-bluetooth(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
---
Changes in v3:
- Add tag Cc
Changes in v2:
- Add tags Fixes and Cc
drivers/bluetooth/bfusb.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/bluetooth/bfusb.c b/drivers/bluetooth/bfusb.c
index 8df310983bf6..f966bd8361b0 100644
--- a/drivers/bluetooth/bfusb.c
+++ b/drivers/bluetooth/bfusb.c
@@ -622,7 +622,7 @@ static int bfusb_probe(struct usb_interface *intf, const struct usb_device_id *i
if (!data)
return -ENOMEM;
- data->udev = udev;
+ data->udev = usb_get_dev(udev);
data->bulk_in_ep = bulk_in_ep->desc.bEndpointAddress;
data->bulk_out_ep = bulk_out_ep->desc.bEndpointAddress;
data->bulk_pkt_size = le16_to_cpu(bulk_out_ep->desc.wMaxPacketSize);
@@ -701,6 +701,8 @@ static void bfusb_disconnect(struct usb_interface *intf)
usb_set_intfdata(intf, NULL);
+ usb_put_dev(data->udev);
+
bfusb_close(hdev);
hci_unregister_dev(hdev);
--
2.43.0
Commit d279c80e0bac ("iomap: inline iomap_dio_bio_opflags()") has broken
the logic in iomap_dio_bio_iter() in a way that when the device does
support FUA (or has no writeback cache) and the direct IO happens to
freshly allocated or unwritten extents, we will *not* issue fsync after
completing direct IO O_SYNC / O_DSYNC write because the
IOMAP_DIO_WRITE_THROUGH flag stays mistakenly set. Fix the problem by
clearing IOMAP_DIO_WRITE_THROUGH whenever we do not perform FUA write as
it was originally intended.
CC: John Garry <john.g.garry(a)oracle.com>
CC: "Ritesh Harjani (IBM)" <ritesh.list(a)gmail.com>
Fixes: d279c80e0bac ("iomap: inline iomap_dio_bio_opflags()")
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/iomap/direct-io.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
BTW, I've spotted this because some performance tests got suspiciously fast
on recent kernels :) Sadly no easy improvement to cherry-pick for me to fix
customer issue I'm chasing...
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 6f25d4cfea9f..b84f6af2eb4c 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -363,14 +363,14 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
if (iomap->flags & IOMAP_F_SHARED)
dio->flags |= IOMAP_DIO_COW;
- if (iomap->flags & IOMAP_F_NEW) {
+ if (iomap->flags & IOMAP_F_NEW)
need_zeroout = true;
- } else if (iomap->type == IOMAP_MAPPED) {
- if (iomap_dio_can_use_fua(iomap, dio))
- bio_opf |= REQ_FUA;
- else
- dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
- }
+ else if (iomap->type == IOMAP_MAPPED &&
+ iomap_dio_can_use_fua(iomap, dio))
+ bio_opf |= REQ_FUA;
+
+ if (!(bio_opf & REQ_FUA))
+ dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
/*
* We can only do deferred completion for pure overwrites that
--
2.43.0
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 37848a456fc38c191aedfe41f662cc24db8c23d9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072839-wildly-gala-e85f@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 37848a456fc38c191aedfe41f662cc24db8c23d9 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Tue, 15 Jul 2025 20:43:28 +0200
Subject: [PATCH] selftests: mptcp: connect: also cover alt modes
The "mmap" and "sendfile" alternate modes for mptcp_connect.sh/.c are
available from the beginning, but only tested when mptcp_connect.sh is
manually launched with "-m mmap" or "-m sendfile", not via the
kselftests helpers.
The MPTCP CI was manually running "mptcp_connect.sh -m mmap", but not
"-m sendfile". Plus other CIs, especially the ones validating the stable
releases, were not validating these alternate modes.
To make sure these modes are validated by these CIs, add two new test
programs executing mptcp_connect.sh with the alternate modes.
Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp")
Cc: stable(a)vger.kernel.org
Reviewed-by: Geliang Tang <geliang(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20250715-net-mptcp-sft-connect-alt-v2-1-8230ddd824…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile
index e47788bfa671..c6b030babba8 100644
--- a/tools/testing/selftests/net/mptcp/Makefile
+++ b/tools/testing/selftests/net/mptcp/Makefile
@@ -4,7 +4,8 @@ top_srcdir = ../../../../..
CFLAGS += -Wall -Wl,--no-as-needed -O2 -g -I$(top_srcdir)/usr/include $(KHDR_INCLUDES)
-TEST_PROGS := mptcp_connect.sh pm_netlink.sh mptcp_join.sh diag.sh \
+TEST_PROGS := mptcp_connect.sh mptcp_connect_mmap.sh mptcp_connect_sendfile.sh \
+ pm_netlink.sh mptcp_join.sh diag.sh \
simult_flows.sh mptcp_sockopt.sh userspace_pm.sh
TEST_GEN_FILES = mptcp_connect pm_nl_ctl mptcp_sockopt mptcp_inq mptcp_diag
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect_mmap.sh b/tools/testing/selftests/net/mptcp/mptcp_connect_mmap.sh
new file mode 100755
index 000000000000..5dd30f9394af
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect_mmap.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+MPTCP_LIB_KSFT_TEST="$(basename "${0}" .sh)" \
+ "$(dirname "${0}")/mptcp_connect.sh" -m mmap "${@}"
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect_sendfile.sh b/tools/testing/selftests/net/mptcp/mptcp_connect_sendfile.sh
new file mode 100755
index 000000000000..1d16fb1cc9bb
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect_sendfile.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+MPTCP_LIB_KSFT_TEST="$(basename "${0}" .sh)" \
+ "$(dirname "${0}")/mptcp_connect.sh" -m sendfile "${@}"
According to documentation, the eDP PHY on x1e80100 has another clock
called refclk. Rework the driver to allow different number of clocks
based on match data and add this refclk to the x1e80100. Fix the
dt-bindings schema and add the clock to the DT node as well.
Signed-off-by: Abel Vesa <abel.vesa(a)linaro.org>
---
Abel Vesa (3):
dt-bindings: phy: qcom-edp: Add missing clock for X Elite
phy: qcom: edp: Add missing refclk for X1E80100
arm64: dts: qcom: Add missing TCSR refclk to the eDP PHY
.../devicetree/bindings/phy/qcom,edp-phy.yaml | 23 +++++++++++-
arch/arm64/boot/dts/qcom/x1e80100.dtsi | 6 ++-
drivers/phy/qualcomm/phy-qcom-edp.c | 43 ++++++++++++++++++----
3 files changed, 62 insertions(+), 10 deletions(-)
---
base-commit: 79fb37f39b77bbf9a56304e9af843cd93a7a1916
change-id: 20250730-phy-qcom-edp-add-missing-refclk-5ab82828f8e7
Best regards,
--
Abel Vesa <abel.vesa(a)linaro.org>
On Thu, Jul 24, 2025 at 4:00 PM Al Viro <viro(a)zeniv.linux.org.uk> wrote:
>
> On Thu, Jul 24, 2025 at 01:02:48PM -0700, Andrei Vagin wrote:
> > Hi Al and Christian,
> >
> > The commit 12f147ddd6de ("do_change_type(): refuse to operate on
> > unmounted/not ours mounts") introduced an ABI backward compatibility
> > break. CRIU depends on the previous behavior, and users are now
> > reporting criu restore failures following the kernel update. This change
> > has been propagated to stable kernels. Is this check strictly required?
>
> Yes.
>
> > Would it be possible to check only if the current process has
> > CAP_SYS_ADMIN within the mount user namespace?
>
> Not enough, both in terms of permissions *and* in terms of "thou
> shalt not bugger the kernel data structures - nobody's priveleged
> enough for that".
Al,
I am still thinking in terms of "Thou shalt not break userspace"...
Seriously though, this original behavior has been in the kernel for 20
years, and it hasn't triggered any corruptions in all that time. I
understand this change might be necessary in its current form, and
that some collateral damage could be unavoidable. But if that's the
case, I'd expect a detailed explanation of why it had to be so and why
userspace breakage is unavoidable.
The original change was merged two decades ago. We need to
consider that some applications might rely on that behavior. I'm not
questioning the security aspect - that must be addressed. But for
anything else, we need to minimize the impact on user applications that
don't violate security.
We can consider a cleaner fix for the upstream kernel, but when we are
talking about stable kernels, the user-space backward compatibility
aspect should be even more critical.
Thanks,
Andrei
From: Victor Shih <victor.shih(a)genesyslogic.com.tw>
Due to a flaw in the hardware design, the GL9763e replay timer frequently
times out when ASPM is enabled. As a result, the warning messages will
often appear in the system log when the system accesses the GL9763e
PCI config. Therefore, the replay timer timeout must be masked.
Signed-off-by: Victor Shih <victor.shih(a)genesyslogic.com.tw>
Fixes: 1ae1d2d6e555 ("mmc: sdhci-pci-gli: Add Genesys Logic GL9763E support")
Cc: stable(a)vger.kernel.org
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
---
drivers/mmc/host/sdhci-pci-gli.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/mmc/host/sdhci-pci-gli.c b/drivers/mmc/host/sdhci-pci-gli.c
index 436f0460222f..3a1de477e9af 100644
--- a/drivers/mmc/host/sdhci-pci-gli.c
+++ b/drivers/mmc/host/sdhci-pci-gli.c
@@ -1782,6 +1782,9 @@ static void gl9763e_hw_setting(struct sdhci_pci_slot *slot)
value |= FIELD_PREP(GLI_9763E_HS400_RXDLY, GLI_9763E_HS400_RXDLY_5);
pci_write_config_dword(pdev, PCIE_GLI_9763E_CLKRXDLY, value);
+ /* mask the replay timer timeout of AER */
+ sdhci_gli_mask_replay_timer_timeout(pdev);
+
pci_read_config_dword(pdev, PCIE_GLI_9763E_VHS, &value);
value &= ~GLI_9763E_VHS_REV;
value |= FIELD_PREP(GLI_9763E_VHS_REV, GLI_9763E_VHS_REV_R);
--
2.43.0
`dma_free_coherent()` must only be called if the corresponding
`dma_alloc_coherent()` call has succeeded. Calling it when the allocation
fails leads to undefined behavior.
Add a check to ensure that the memory is only freed when the allocation
was successful.
Signed-off-by: Salah Triki <salah.triki(a)gmail.com>
Fixes: 71bcada88b0f3 ("edac: altera: Add Altera SDRAM EDAC support")
Cc: Markus Elfring <Markus.Elfring(a)web.de>
Cc: Dinh Nguyen <dinguyen(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Tony Luck <tony.luck(a)intel.com>
Cc: James Morse <james.morse(a)arm.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Robert Richter <rric(a)kernel.org>
Cc: linux-edac(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
---
drivers/edac/altera_edac.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c
index cae52c654a15..7685a8550d4b 100644
--- a/drivers/edac/altera_edac.c
+++ b/drivers/edac/altera_edac.c
@@ -128,7 +128,6 @@ static ssize_t altr_sdr_mc_err_inject_write(struct file *file,
ptemp = dma_alloc_coherent(mci->pdev, 16, &dma_handle, GFP_KERNEL);
if (!ptemp) {
- dma_free_coherent(mci->pdev, 16, ptemp, dma_handle);
edac_printk(KERN_ERR, EDAC_MC,
"Inject: Buffer Allocation error\n");
return -ENOMEM;
--
2.43.0
netlink_attachskb() checks for the socket's read memory allocation
constraints. Firstly, it has:
rmem < READ_ONCE(sk->sk_rcvbuf)
to check if the just increased rmem value fits into the socket's receive
buffer. If not, it proceeds and tries to wait for the memory under:
rmem + skb->truesize > READ_ONCE(sk->sk_rcvbuf)
The checks don't cover the case when skb->truesize + sk->sk_rmem_alloc is
equal to sk->sk_rcvbuf. Thus the function neither successfully accepts
these conditions, nor manages to reschedule the task - and is called in
retry loop for indefinite time which is caught as:
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 0-....: (25999 ticks this GP) idle=ef2/1/0x4000000000000000 softirq=262269/262269 fqs=6212
(t=26000 jiffies g=230833 q=259957)
NMI backtrace for cpu 0
CPU: 0 PID: 22 Comm: kauditd Not tainted 5.10.240 #68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc42 04/01/2014
Call Trace:
<IRQ>
dump_stack lib/dump_stack.c:120
nmi_cpu_backtrace.cold lib/nmi_backtrace.c:105
nmi_trigger_cpumask_backtrace lib/nmi_backtrace.c:62
rcu_dump_cpu_stacks kernel/rcu/tree_stall.h:335
rcu_sched_clock_irq.cold kernel/rcu/tree.c:2590
update_process_times kernel/time/timer.c:1953
tick_sched_handle kernel/time/tick-sched.c:227
tick_sched_timer kernel/time/tick-sched.c:1399
__hrtimer_run_queues kernel/time/hrtimer.c:1652
hrtimer_interrupt kernel/time/hrtimer.c:1717
__sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1113
asm_call_irq_on_stack arch/x86/entry/entry_64.S:808
</IRQ>
netlink_attachskb net/netlink/af_netlink.c:1234
netlink_unicast net/netlink/af_netlink.c:1349
kauditd_send_queue kernel/audit.c:776
kauditd_thread kernel/audit.c:897
kthread kernel/kthread.c:328
ret_from_fork arch/x86/entry/entry_64.S:304
Restore the original behavior of the check which commit in Fixes
accidentally missed when restructuring the code.
Found by Linux Verification Center (linuxtesting.org).
Fixes: ae8f160e7eb2 ("netlink: Fix wraparounds of sk->sk_rmem_alloc.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
Similar rmem and sk->sk_rcvbuf comparing pattern in
netlink_broadcast_deliver() accepts these values being equal, while
the netlink_dump() case does not - but it goes all the way down to
f9c2288837ba ("netlink: implement memory mapped recvmsg()") and looks
like an irrelevant issue without any real consequences. Though might be
cleaned up if needed.
Updated sk->sk_rmem_alloc vs sk->sk_rcvbuf checks throughout the kernel
diverse in treating the corner case of them being equal.
net/netlink/af_netlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 6332a0e06596..0fc3f045fb65 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1218,7 +1218,7 @@ int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
nlk = nlk_sk(sk);
rmem = atomic_add_return(skb->truesize, &sk->sk_rmem_alloc);
- if ((rmem == skb->truesize || rmem < READ_ONCE(sk->sk_rcvbuf)) &&
+ if ((rmem == skb->truesize || rmem <= READ_ONCE(sk->sk_rcvbuf)) &&
!test_bit(NETLINK_S_CONGESTED, &nlk->state)) {
netlink_skb_set_owner_r(skb, sk);
return 0;
--
2.50.1
The `xa_store()` function can fail due to memory allocation issues or other
internal errors. Currently, the return value of `xa_store()` is not
checked, which can lead to a memory leak if it fails to store `numa_info`.
This patch checks the return value of `xa_store()`. If an error is
detected, the allocated `numa_info` is freed, and NULL is returned to
indicate the failure, preventing a memory leak and ensuring proper error
handling.
Signed-off-by: Salah Triki <salah.triki(a)gmail.com>
Fixes: 1cc823011a23f ("drm/amdgpu: Store additional numa node information")
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Christian König <christian.koenig(a)amd.com>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Simona Vetter <simona(a)ffwll.ch>
Cc: amd-gfx(a)lists.freedesktop.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linux-kernel(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
---
Changes in v2:
- Improve description
- Add tags Fixes and Cc
drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
index f5466c592d94..b4a3e4d3e957 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -876,7 +876,7 @@ static inline uint64_t amdgpu_acpi_get_numa_size(int nid)
static struct amdgpu_numa_info *amdgpu_acpi_get_numa_info(uint32_t pxm)
{
- struct amdgpu_numa_info *numa_info;
+ struct amdgpu_numa_info *numa_info, *old;
int nid;
numa_info = xa_load(&numa_info_xa, pxm);
@@ -898,7 +898,11 @@ static struct amdgpu_numa_info *amdgpu_acpi_get_numa_info(uint32_t pxm)
} else {
numa_info->size = amdgpu_acpi_get_numa_size(nid);
}
- xa_store(&numa_info_xa, numa_info->pxm, numa_info, GFP_KERNEL);
+ old = xa_store(&numa_info_xa, numa_info->pxm, numa_info, GFP_KERNEL);
+ if (xa_is_err(old)) {
+ kfree(numa_info);
+ return NULL;
+ }
}
return numa_info;
--
2.43.0
From: Mario Limonciello <mario.limonciello(a)amd.com>
This reverts commit 66abb996999de0d440a02583a6e70c2c24deab45.
This broke custom brightness curves but it wasn't obvious because
of other related changes. Custom brightness curves are always
from a 0-255 input signal. The correct fix was to fix the default
value which was done by [1].
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4412
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/amd-gfx/0f094c4b-d2a3-42cd-824c-dc2858a5618d@kernel…
Reviewed-by: Alex Hung <alex.hung(a)amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Roman Li <roman.li(a)amd.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 16347ca2396a..31ea57edeb45 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4800,16 +4800,16 @@ static int get_brightness_range(const struct amdgpu_dm_backlight_caps *caps,
return 1;
}
-/* Rescale from [min..max] to [0..MAX_BACKLIGHT_LEVEL] */
+/* Rescale from [min..max] to [0..AMDGPU_MAX_BL_LEVEL] */
static inline u32 scale_input_to_fw(int min, int max, u64 input)
{
- return DIV_ROUND_CLOSEST_ULL(input * MAX_BACKLIGHT_LEVEL, max - min);
+ return DIV_ROUND_CLOSEST_ULL(input * AMDGPU_MAX_BL_LEVEL, max - min);
}
-/* Rescale from [0..MAX_BACKLIGHT_LEVEL] to [min..max] */
+/* Rescale from [0..AMDGPU_MAX_BL_LEVEL] to [min..max] */
static inline u32 scale_fw_to_input(int min, int max, u64 input)
{
- return min + DIV_ROUND_CLOSEST_ULL(input * (max - min), MAX_BACKLIGHT_LEVEL);
+ return min + DIV_ROUND_CLOSEST_ULL(input * (max - min), AMDGPU_MAX_BL_LEVEL);
}
static void convert_custom_brightness(const struct amdgpu_dm_backlight_caps *caps,
--
2.34.1
Due to updates to the Device Tree (migrating to onboard USB hub nodes
instead of (badly) hacking things with a gpio regulator that doesn't
actually work properly), we now need to enable the onboard USB hub
driver in U-Boot.
This anticipates upcoming breakage when 6.16 DT will be merged into
U-Boot's dts/upstream.
The series can be applied as is before v6.16 DT is merged or only the
defconfig changes after 6.16 DT has been merged.
The last two patches are simply to avoid probing devices that aren't
actually routed on RK3399 Puma, which is nice to have but doesn't fix
anything.
Note that this depends on the following series:
https://lore.kernel.org/u-boot/20250722-usb_onboard_hub_cypress_hx3-v4-0-91…
Signed-off-by: Quentin Schulz <quentin.schulz(a)cherry.de>
---
Lukasz Czechowski (2):
dt-bindings: usb: cypress,hx3: Add support for all variants
arm64: dts: rockchip: fix internal USB hub instability on RK3399 Puma
Quentin Schulz (4):
configs: puma-rk3399: enable onboard USB hub support
dt-bindings: usb: usb-device: relax compatible pattern to a contains
arm64: dts: rockchip: disable unrouted USB controllers and PHY on RK3399 Puma
arm64: dts: rockchip: disable unrouted USB controllers and PHY on RK3399 Puma with Haikou
configs/puma-rk3399_defconfig | 1 +
dts/upstream/Bindings/usb/cypress,hx3.yaml | 19 +++++++--
dts/upstream/Bindings/usb/usb-device.yaml | 3 +-
.../src/arm64/rockchip/rk3399-puma-haikou.dts | 8 ----
dts/upstream/src/arm64/rockchip/rk3399-puma.dtsi | 48 +++++++++++-----------
5 files changed, 43 insertions(+), 36 deletions(-)
---
base-commit: 5a8dd2e0c848135b5c96af291aa96e79acc923ec
change-id: 20250730-puma-usb-cypress-2d2957024424
prerequisite-change-id: 20250425-usb_onboard_hub_cypress_hx3-2831983f1ede:v4
prerequisite-patch-id: 515a13b22600d40716e9c36d16b084086ce7d474
prerequisite-patch-id: 9f8a11bd6c66e976c51dc0f0bc3292183f9403f3
prerequisite-patch-id: 45ef5b9422333db5fcd23d95b6570b320635c49b
prerequisite-patch-id: 19092f1c3db746292401b8513807439c87ea9589
prerequisite-patch-id: fced7578d40069c5fe83d97aa42476015cf9cbda
Best regards,
--
Quentin Schulz <quentin.schulz(a)cherry.de>
This is the start of the stable review cycle for the 6.6.100 release.
There are 111 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 24 Jul 2025 13:43:10 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.100-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.6.100-rc1
Michael C. Pratt <mcpratt(a)pm.me>
nvmem: layouts: u-boot-env: remove crc32 endianness conversion
Johan Hovold <johan+linaro(a)kernel.org>
i2c: omap: fix deprecated of_property_read_bool() use
Shung-Hsi Yu <shung-hsi.yu(a)suse.com>
Revert "selftests/bpf: dummy_st_ops should reject 0 for non-nullable params"
Shung-Hsi Yu <shung-hsi.yu(a)suse.com>
Revert "selftests/bpf: adjust dummy_st_ops_success to detect additional error"
Arun Raghavan <arun(a)asymptotic.io>
ASoC: fsl_sai: Force a software reset when starting in consumer mode
Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
regulator: pwm-regulator: Manage boot-on with disabled PWM channels
Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
regulator: pwm-regulator: Calculate the output voltage for disabled PWMs
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
i2c: omap: Handle omap_i2c_init() errors in omap_i2c_probe()
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
i2c: omap: Fix an error handling path in omap_i2c_probe()
Jayesh Choudhary <j-choudhary(a)ti.com>
i2c: omap: Add support for setting mux
Krishna Kurapati <krishna.kurapati(a)oss.qualcomm.com>
usb: dwc3: qcom: Don't leave BCR asserted
Mathias Nyman <mathias.nyman(a)linux.intel.com>
usb: hub: Don't try to recover devices lost during warm reset.
Mathias Nyman <mathias.nyman(a)linux.intel.com>
usb: hub: Fix flushing of delayed work used for post resume purposes
Mathias Nyman <mathias.nyman(a)linux.intel.com>
usb: hub: Fix flushing and scheduling of delayed work that tunes runtime pm
Mathias Nyman <mathias.nyman(a)linux.intel.com>
usb: hub: fix detection of high tier USB3 devices behind suspended hubs
Mark Brown <broonie(a)kernel.org>
arm64: Filter out SME hwcaps when FEAT_SME isn't implemented
Al Viro <viro(a)zeniv.linux.org.uk>
clone_private_mnt(): make sure that caller has CAP_SYS_ADMIN in the right userns
Eric Dumazet <edumazet(a)google.com>
ipv6: make addrconf_wq single threaded
Aruna Ramakrishna <aruna.ramakrishna(a)oracle.com>
sched: Change nr_uninterruptible type to unsigned long
Chen Ridong <chenridong(a)huawei.com>
Revert "cgroup_freezer: cgroup_freezing: Check if not frozen"
David Howells <dhowells(a)redhat.com>
rxrpc: Fix transmission of an abort in response to an abort
David Howells <dhowells(a)redhat.com>
rxrpc: Fix recv-recv race of completed call
William Liu <will(a)willsroot.io>
net/sched: Return NULL when htb_lookup_leaf encounters an empty rbtree
Joseph Huang <Joseph.Huang(a)garmin.com>
net: bridge: Do not offload IGMP/MLD messages
Dong Chenchen <dongchenchen2(a)huawei.com>
net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime
Jakub Kicinski <kuba(a)kernel.org>
tls: always refresh the queue when reading sock
Li Tian <litian(a)redhat.com>
hv_netvsc: Set VF priv_flags to IFF_NO_ADDRCONF before open to prevent IPv6 addrconf
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: L2CAP: Fix attempting to adjust outgoing MTU
Florian Westphal <fw(a)strlen.de>
netfilter: nf_conntrack: fix crash due to removal of uninitialised entry
Yue Haibing <yuehaibing(a)huawei.com>
ipv6: mcast: Delay put pmc->idev in mld_del_delrec()
Christoph Paasch <cpaasch(a)openai.com>
net/mlx5: Correctly set gso_size when LRO is used
Zijun Hu <zijun.hu(a)oss.qualcomm.com>
Bluetooth: btusb: QCA: Fix downloading wrong NVM for WCN6855 GF variant without board ID
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: SMP: Fix using HCI_ERROR_REMOTE_USER_TERM on timeout
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: SMP: If an unallowed command is received consider it a failure
Alessandro Gasbarroni <alex.gasbarroni(a)gmail.com>
Bluetooth: hci_sync: fix connectable extended advertising when using static random address
Kuniyuki Iwashima <kuniyu(a)google.com>
Bluetooth: Fix null-ptr-deref in l2cap_sock_resume_cb()
Oliver Neukum <oneukum(a)suse.com>
usb: net: sierra: check for no status endpoint
Dave Ertman <david.m.ertman(a)intel.com>
ice: add NULL check in eswitch lag check
Marius Zachmann <mail(a)mariuszachmann.de>
hwmon: (corsair-cpro) Validate the size of the received input buffer
Paolo Abeni <pabeni(a)redhat.com>
selftests: net: increase inter-packet timeout in udpgro.sh
Johannes Berg <johannes.berg(a)intel.com>
wifi: cfg80211: remove scan request n_channels counted_by
Yu Kuai <yukuai3(a)huawei.com>
nvme: fix misaccounting of nvme-mpath inflight I/O
Sean Anderson <sean.anderson(a)linux.dev>
net: phy: Don't register LEDs for genphy
Zheng Qixing <zhengqixing(a)huawei.com>
nvme: fix inconsistent RCU list manipulation in nvme_ns_add_to_ctrl_list()
Wang Zhaolong <wangzhaolong(a)huaweicloud.com>
smb: client: fix use-after-free in cifs_oplock_break
Kuniyuki Iwashima <kuniyu(a)google.com>
rpl: Fix use-after-free in rpl_do_srh_inline().
Xiang Mei <xmei5(a)asu.edu>
net/sched: sch_qfq: Fix race condition on qfq_aggregate
Ming Lei <ming.lei(a)redhat.com>
block: fix kobject leak in blk_unregister_queue
Alok Tiwari <alok.a.tiwari(a)oracle.com>
net: emaclite: Fix missing pointer increment in aligned_read()
Zizhi Wo <wozizhi(a)huawei.com>
cachefiles: Fix the incorrect return value in __cachefiles_write()
Paul Chaignon <paul.chaignon(a)gmail.com>
bpf: Reject %p% format string in bprintf-like helpers
Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
soundwire: amd: fix for clearing command status register
Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
soundwire: amd: fix for handling slave alerts after link is down
Ian Abbott <abbotti(a)mev.co.uk>
comedi: Fix initialization of data for instructions that write to subdevice
Ian Abbott <abbotti(a)mev.co.uk>
comedi: Fix use of uninitialized data in insn_rw_emulate_bits()
Ian Abbott <abbotti(a)mev.co.uk>
comedi: Fix some signed shift left operations
Ian Abbott <abbotti(a)mev.co.uk>
comedi: Fail COMEDI_INSNLIST ioctl if n_insns is too large
Ian Abbott <abbotti(a)mev.co.uk>
comedi: das6402: Fix bit shift out of bounds
Ian Abbott <abbotti(a)mev.co.uk>
comedi: das16m1: Fix bit shift out of bounds
Ian Abbott <abbotti(a)mev.co.uk>
comedi: aio_iiro_16: Fix bit shift out of bounds
Ian Abbott <abbotti(a)mev.co.uk>
comedi: pcl812: Fix bit shift out of bounds
Chen Ni <nichen(a)iscas.ac.cn>
iio: adc: stm32-adc: Fix race in installing chained IRQ handler
Fabio Estevam <festevam(a)denx.de>
iio: adc: max1363: Reorder mode_list[] entries
Fabio Estevam <festevam(a)denx.de>
iio: adc: max1363: Fix MAX1363_4X_CHANS/MAX1363_8X_CHANS[]
Sean Nyekjaer <sean(a)geanix.com>
iio: accel: fxls8962af: Fix use after free in fxls8962af_fifo_flush
Andrew Jeffery <andrew(a)codeconstruct.com.au>
soc: aspeed: lpc-snoop: Don't disable channels that aren't enabled
Andrew Jeffery <andrew(a)codeconstruct.com.au>
soc: aspeed: lpc-snoop: Cleanup resources in stack-order
Wang Zhaolong <wangzhaolong(a)huaweicloud.com>
smb: client: fix use-after-free in crypt_message when using async crypto
Ilya Leoshkevich <iii(a)linux.ibm.com>
s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL again
Maulik Shah <maulik.shah(a)oss.qualcomm.com>
pmdomain: governor: Consider CPU latency tolerance from pm_domain_cpu_gov
Jiawen Wu <jiawenwu(a)trustnetic.com>
net: libwx: properly reset Rx ring descriptor
Jiawen Wu <jiawenwu(a)trustnetic.com>
net: libwx: fix the using of Rx buffer DMA
Jiawen Wu <jiawenwu(a)trustnetic.com>
net: libwx: remove duplicate page_pool_put_full_page()
Judith Mendez <jm(a)ti.com>
mmc: sdhci_am654: Workaround for Errata i2312
Edson Juliano Drosdeck <edson.drosdeck(a)gmail.com>
mmc: sdhci-pci: Quirk for broken command queuing on Intel GLK-based Positivo models
Thomas Fourier <fourier.thomas(a)gmail.com>
mmc: bcm2835: Fix dma_unmap_sg() nents value
Nathan Chancellor <nathan(a)kernel.org>
memstick: core: Zero initialize id_reg in h_memstick_read_dev_id()
Jan Kara <jack(a)suse.cz>
isofs: Verify inode mode when loading from disk
Dan Carpenter <dan.carpenter(a)linaro.org>
dmaengine: nbpfaxi: Fix memory corruption in probe()
Yun Lu <luyun(a)kylinos.cn>
af_packet: fix soft lockup issue caused by tpacket_snd()
Yun Lu <luyun(a)kylinos.cn>
af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd()
Jakob Unterwurzacher <jakob.unterwurzacher(a)cherry.de>
arm64: dts: rockchip: use cs-gpios for spi1 on ringneck
Francesco Dolcini <francesco.dolcini(a)toradex.com>
arm64: dts: freescale: imx8mm-verdin: Keep LDO5 always on
Tim Harvey <tharvey(a)gateworks.com>
arm64: dts: imx8mp-venice-gw74xx: fix TPM SPI frequency
Maor Gottlieb <maorg(a)nvidia.com>
net/mlx5: Update the list of the PCI supported devices
Nathan Chancellor <nathan(a)kernel.org>
phonet/pep: Move call to pn_skb_get_dst_sockaddr() earlier in pep_sock_accept()
Pavel Begunkov <asml.silence(a)gmail.com>
io_uring/poll: fix POLLERR handling
Takashi Iwai <tiwai(a)suse.de>
ALSA: hda/realtek: Add quirk for ASUS ROG Strix G712LWS
Eeli Haapalainen <eeli.haapalainen(a)protonmail.com>
drm/amdgpu/gfx8: reset compute ring wptr on the GPU on resume
Tomas Glozar <tglozar(a)redhat.com>
tracing/osnoise: Fix crash in timerlat_dump_stack()
Steven Rostedt <rostedt(a)goodmis.org>
tracing: Add down_write(trace_event_sem) when adding trace event
Nathan Chancellor <nathan(a)kernel.org>
tracing/probes: Avoid using params uninitialized in parse_btf_arg()
Benjamin Tissoires <bentiss(a)kernel.org>
HID: core: do not bypass hid_hw_raw_request
Benjamin Tissoires <bentiss(a)kernel.org>
HID: core: ensure __hid_request reserves the report ID as the first byte
Benjamin Tissoires <bentiss(a)kernel.org>
HID: core: ensure the allocated report buffer can contain the reserved report ID
Sheng Yong <shengyong1(a)xiaomi.com>
dm-bufio: fix sched in atomic context
Cheng Ming Lin <chengminglin(a)mxic.com.tw>
spi: Add check for 8-bit transfer with 8 IO mode support
Thomas Fourier <fourier.thomas(a)gmail.com>
pch_uart: Fix dma_sync_sg_for_device() nents value
Nilton Perim Neto <niltonperimneto(a)gmail.com>
Input: xpad - set correct controller type for Acer NGR200
Steffen Bätz <steffen(a)innosonix.de>
nvmem: imx-ocotp: fix MAC address byte length
Alok Tiwari <alok.a.tiwari(a)oracle.com>
thunderbolt: Fix bit masking in tb_dp_port_set_hops()
Mario Limonciello <mario.limonciello(a)amd.com>
thunderbolt: Fix wake on connect at runtime
Clément Le Goffic <clement.legoffic(a)foss.st.com>
i2c: stm32: fix the device used for the DMA map
Xinyu Liu <1171169449(a)qq.com>
usb: gadget: configfs: Fix OOB read on empty string write
Drew Hamilton <drew.hamilton(a)zetier.com>
usb: musb: fix gadget state on disconnect
Ryan Mann (NDI) <rmann(a)ndigital.com>
USB: serial: ftdi_sio: add support for NDI EMGUIDE GEMINI
Slark Xiao <slark_xiao(a)163.com>
USB: serial: option: add Foxconn T99W640
Fabio Porcedda <fabio.porcedda(a)gmail.com>
USB: serial: option: add Telit Cinterion FE910C04 (ECM) composition
Haotien Hsu <haotienh(a)nvidia.com>
phy: tegra: xusb: Disable periodic tracking on Tegra234
Wayne Chang <waynec(a)nvidia.com>
phy: tegra: xusb: Decouple CYA_TRK_CODE_UPDATE_ON_IDLE from trk_hw_mode
Wayne Chang <waynec(a)nvidia.com>
phy: tegra: xusb: Fix unbalanced regulator disable in UTMI PHY mode
-------------
Diffstat:
Makefile | 4 +-
arch/arm64/boot/dts/freescale/imx8mm-verdin.dtsi | 1 +
.../boot/dts/freescale/imx8mp-venice-gw74xx.dts | 2 +-
arch/arm64/boot/dts/rockchip/px30-ringneck.dtsi | 23 +++++++
arch/arm64/kernel/cpufeature.c | 35 ++++++----
arch/s390/net/bpf_jit_comp.c | 10 ++-
block/blk-sysfs.c | 1 +
drivers/base/power/domain_governor.c | 18 ++++-
drivers/bluetooth/btusb.c | 78 ++++++++++++----------
drivers/comedi/comedi_fops.c | 30 ++++++++-
drivers/comedi/drivers.c | 17 +++--
drivers/comedi/drivers/aio_iiro_16.c | 3 +-
drivers/comedi/drivers/das16m1.c | 3 +-
drivers/comedi/drivers/das6402.c | 3 +-
drivers/comedi/drivers/pcl812.c | 3 +-
drivers/dma/nbpfaxi.c | 11 ++-
drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 1 +
drivers/hid/hid-core.c | 21 ++++--
drivers/hwmon/corsair-cpro.c | 5 ++
drivers/i2c/busses/Kconfig | 1 +
drivers/i2c/busses/i2c-omap.c | 30 ++++++++-
drivers/i2c/busses/i2c-stm32.c | 8 +--
drivers/i2c/busses/i2c-stm32f7.c | 4 +-
drivers/iio/accel/fxls8962af-core.c | 2 +
drivers/iio/adc/max1363.c | 43 ++++++------
drivers/iio/adc/stm32-adc-core.c | 7 +-
drivers/input/joystick/xpad.c | 2 +-
drivers/md/dm-bufio.c | 6 +-
drivers/memstick/core/memstick.c | 2 +-
drivers/mmc/host/bcm2835.c | 3 +-
drivers/mmc/host/sdhci-pci-core.c | 3 +-
drivers/mmc/host/sdhci_am654.c | 9 ++-
drivers/net/ethernet/intel/ice/ice_lag.c | 3 +-
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 12 ++--
drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 +
drivers/net/ethernet/wangxun/libwx/wx_hw.c | 7 +-
drivers/net/ethernet/wangxun/libwx/wx_lib.c | 20 ++----
drivers/net/ethernet/wangxun/libwx/wx_type.h | 2 -
drivers/net/ethernet/xilinx/xilinx_emaclite.c | 2 +-
drivers/net/hyperv/netvsc_drv.c | 5 +-
drivers/net/phy/phy_device.c | 6 +-
drivers/net/usb/sierra_net.c | 4 ++
drivers/nvme/host/core.c | 6 +-
drivers/nvmem/imx-ocotp-ele.c | 5 +-
drivers/nvmem/imx-ocotp.c | 5 +-
drivers/nvmem/u-boot-env.c | 6 +-
drivers/phy/tegra/xusb-tegra186.c | 77 ++++++++++++---------
drivers/phy/tegra/xusb.h | 1 +
drivers/regulator/pwm-regulator.c | 40 +++++++++++
drivers/soc/aspeed/aspeed-lpc-snoop.c | 13 +++-
drivers/soundwire/amd_manager.c | 4 +-
drivers/spi/spi.c | 14 ++--
drivers/thunderbolt/switch.c | 10 +--
drivers/thunderbolt/tb.h | 2 +-
drivers/thunderbolt/usb4.c | 12 ++--
drivers/tty/serial/pch_uart.c | 2 +-
drivers/usb/core/hub.c | 36 +++++++++-
drivers/usb/core/hub.h | 1 +
drivers/usb/dwc3/dwc3-qcom.c | 8 +--
drivers/usb/gadget/configfs.c | 4 ++
drivers/usb/musb/musb_gadget.c | 2 +
drivers/usb/serial/ftdi_sio.c | 2 +
drivers/usb/serial/ftdi_sio_ids.h | 3 +
drivers/usb/serial/option.c | 5 ++
fs/cachefiles/io.c | 2 -
fs/cachefiles/ondemand.c | 4 +-
fs/isofs/inode.c | 9 ++-
fs/namespace.c | 5 ++
fs/smb/client/file.c | 10 ++-
fs/smb/client/smb2ops.c | 7 +-
include/net/cfg80211.h | 2 +-
include/net/netfilter/nf_conntrack.h | 15 ++++-
include/trace/events/rxrpc.h | 3 +
io_uring/net.c | 12 ++--
io_uring/poll.c | 2 -
kernel/bpf/helpers.c | 11 ++-
kernel/cgroup/legacy_freezer.c | 8 +--
kernel/sched/loadavg.c | 2 +-
kernel/sched/sched.h | 2 +-
kernel/trace/trace_events.c | 5 ++
kernel/trace/trace_osnoise.c | 2 +-
kernel/trace/trace_probe.c | 2 +-
net/8021q/vlan.c | 42 +++++++++---
net/8021q/vlan.h | 1 +
net/bluetooth/hci_sync.c | 4 +-
net/bluetooth/l2cap_core.c | 26 ++++++--
net/bluetooth/l2cap_sock.c | 3 +
net/bluetooth/smp.c | 21 +++++-
net/bluetooth/smp.h | 1 +
net/bridge/br_switchdev.c | 3 +
net/ipv6/addrconf.c | 3 +-
net/ipv6/mcast.c | 2 +-
net/ipv6/rpl_iptunnel.c | 8 +--
net/netfilter/nf_conntrack_core.c | 26 ++++++--
net/packet/af_packet.c | 27 ++++----
net/phonet/pep.c | 2 +-
net/rxrpc/call_accept.c | 1 +
net/rxrpc/output.c | 3 +
net/rxrpc/recvmsg.c | 19 +++++-
net/sched/sch_htb.c | 4 +-
net/sched/sch_qfq.c | 30 ++++++---
net/tls/tls_strp.c | 3 +-
sound/pci/hda/patch_realtek.c | 1 +
sound/soc/fsl/fsl_sai.c | 14 ++--
.../selftests/bpf/prog_tests/dummy_st_ops.c | 27 --------
.../selftests/bpf/progs/dummy_st_ops_success.c | 13 +---
tools/testing/selftests/net/udpgro.sh | 8 +--
107 files changed, 753 insertions(+), 351 deletions(-)
Hello,
Until kernel version 6.7, a write-sealed memfd could not be mapped as
shared and read-only. This was clearly a bug, and was not inline with
the description of F_SEAL_WRITE in the man page for fcntl()[1].
Lorenzo's series [2] fixed that issue and was merged in kernel version
6.7, but was not backported to older kernels. So, this issue is still
present on kernels 5.4, 5.10, 5.15, 6.1, and 6.6.
This series consists of backports of two of Lorenzo's series [2] and
[3].
Note: for [2], I dropped the last patch in that series, since it
wouldn't make sense to apply it due to [4] being part of this tree. In
lieu of that, I backported [3] to ultimately allow write-sealed memfds
to be mapped as read-only.
[1] https://man7.org/linux/man-pages/man2/fcntl.2.html
[2] https://lore.kernel.org/all/913628168ce6cce77df7d13a63970bae06a526e0.169711…
[3] https://lkml.kernel.org/r/99fc35d2c62bd2e05571cf60d9f8b843c56069e0.17328047…
[4] https://lore.kernel.org/all/6e0becb36d2f5472053ac5d544c0edfe9b899e25.173022…
Lorenzo Stoakes (4):
mm: drop the assumption that VM_SHARED always implies writable
mm: update memfd seal write check to include F_SEAL_WRITE
mm: reinstate ability to map write-sealed memfd mappings read-only
selftests/memfd: add test for mapping write-sealed memfd read-only
fs/hugetlbfs/inode.c | 2 +-
include/linux/fs.h | 4 +-
include/linux/memfd.h | 14 ++++
include/linux/mm.h | 80 +++++++++++++++-------
kernel/fork.c | 2 +-
mm/filemap.c | 2 +-
mm/madvise.c | 2 +-
mm/memfd.c | 2 +-
mm/mmap.c | 10 ++-
mm/shmem.c | 2 +-
tools/testing/selftests/memfd/memfd_test.c | 43 ++++++++++++
11 files changed, 129 insertions(+), 34 deletions(-)
--
2.50.1.552.g942d659e1b-goog
From: Liu Shixin <liushixin2(a)huawei.com>
commit f1897f2f08b28ae59476d8b73374b08f856973af upstream.
syzkaller reported such a BUG_ON():
------------[ cut here ]------------
kernel BUG at mm/khugepaged.c:1835!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G W 6.13.0-rc6 #22
Tainted: [W]=WARN
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : collapse_file+0xa44/0x1400
lr : collapse_file+0x88/0x1400
sp : ffff80008afe3a60
...
Call trace:
collapse_file+0xa44/0x1400 (P)
hpage_collapse_scan_file+0x278/0x400
madvise_collapse+0x1bc/0x678
madvise_vma_behavior+0x32c/0x448
madvise_walk_vmas.constprop.0+0xbc/0x140
do_madvise.part.0+0xdc/0x2c8
__arm64_sys_madvise+0x68/0x88
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0x128
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x190/0x198
This indicates that the pgoff is unaligned. After analysis, I confirm the
vma is mapped to /dev/zero. Such a vma certainly has vm_file, but it is
set to anonymous by mmap_zero(). So even if it's mmapped by 2m-unaligned,
it can pass the check in thp_vma_allowable_order() as it is an
anonymous-mmap, but then be collapsed as a file-mmap.
It seems the problem has existed for a long time, but actually, since we
have khugepaged_max_ptes_none check before, we will skip collapse it as it
is /dev/zero and so has no present page. But commit d8ea7cc8547c limit
the check for only khugepaged, so the BUG_ON() can be triggered by
madvise_collapse().
Add vma_is_anonymous() check to make such vma be processed by
hpage_collapse_scan_pmd().
Link: https://lkml.kernel.org/r/20250111034511.2223353-1-liushixin2@huawei.com
Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mattew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[acsjakub: backport, clean apply]
Signed-off-by: Jakub Acs <acsjakub(a)amazon.de>
Cc: linux-mm(a)kvack.org
---
v1 -> v2: fix missing sign-off
Ran into the crash with syzkaller, backporting this patch works - the
reproducer no longer crashes.
Please let me know if there was a reason not to backport.
mm/khugepaged.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b538c3d48386..abd5764e4864 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2404,7 +2404,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma,
khugepaged_scan.address);
@@ -2750,7 +2750,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
nodes_clear(cc->alloc_nmask);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma, addr);
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
Hello,
Until kernel version 6.7, a write-sealed memfd could not be mapped as
shared and read-only. This was clearly a bug, and was not inline with
the description of F_SEAL_WRITE in the man page for fcntl()[1].
Lorenzo's series [2] fixed that issue and was merged in kernel version
6.7, but was not backported to older kernels. So, this issue is still
present on kernels 5.4, 5.10, 5.15, 6.1, and 6.6.
This series consists of backports of two of Lorenzo's series [2] and
[3].
Note: for [2], I dropped the last patch in that series, since it
wouldn't make sense to apply it due to [4] being part of this tree. In
lieu of that, I backported [3] to ultimately allow write-sealed memfds
to be mapped as read-only.
[1] https://man7.org/linux/man-pages/man2/fcntl.2.html
[2] https://lore.kernel.org/all/913628168ce6cce77df7d13a63970bae06a526e0.169711…
[3] https://lkml.kernel.org/r/99fc35d2c62bd2e05571cf60d9f8b843c56069e0.17328047…
[4] https://lore.kernel.org/all/6e0becb36d2f5472053ac5d544c0edfe9b899e25.173022…
Lorenzo Stoakes (4):
mm: drop the assumption that VM_SHARED always implies writable
mm: update memfd seal write check to include F_SEAL_WRITE
mm: reinstate ability to map write-sealed memfd mappings read-only
selftests/memfd: add test for mapping write-sealed memfd read-only
fs/hugetlbfs/inode.c | 2 +-
include/linux/fs.h | 4 +-
include/linux/memfd.h | 14 ++++
include/linux/mm.h | 82 +++++++++++++++-------
kernel/fork.c | 2 +-
mm/filemap.c | 2 +-
mm/madvise.c | 2 +-
mm/memfd.c | 2 +-
mm/mmap.c | 12 ++--
mm/shmem.c | 2 +-
tools/testing/selftests/memfd/memfd_test.c | 43 ++++++++++++
11 files changed, 131 insertions(+), 36 deletions(-)
--
2.50.1.552.g942d659e1b-goog
Hello,
Until kernel version 6.7, a write-sealed memfd could not be mapped as
shared and read-only. This was clearly a bug, and was not inline with
the description of F_SEAL_WRITE in the man page for fcntl()[1].
Lorenzo's series [2] fixed that issue and was merged in kernel version
6.7, but was not backported to older kernels. So, this issue is still
present on kernels 5.4, 5.10, 5.15, 6.1, and 6.6.
This series consists of backports of two of Lorenzo's series [2] and
[3].
Note: for [2], I dropped the last patch in that series, since it
wouldn't make sense to apply it due to [4] being part of this tree. In
lieu of that, I backported [3] to ultimately allow write-sealed memfds
to be mapped as read-only.
[1] https://man7.org/linux/man-pages/man2/fcntl.2.html
[2] https://lore.kernel.org/all/913628168ce6cce77df7d13a63970bae06a526e0.169711…
[3] https://lkml.kernel.org/r/99fc35d2c62bd2e05571cf60d9f8b843c56069e0.17328047…
[4] https://lore.kernel.org/all/6e0becb36d2f5472053ac5d544c0edfe9b899e25.173022…
Lorenzo Stoakes (4):
mm: drop the assumption that VM_SHARED always implies writable
mm: update memfd seal write check to include F_SEAL_WRITE
mm: reinstate ability to map write-sealed memfd mappings read-only
selftests/memfd: add test for mapping write-sealed memfd read-only
fs/hugetlbfs/inode.c | 2 +-
include/linux/fs.h | 4 +-
include/linux/memfd.h | 14 ++++
include/linux/mm.h | 82 +++++++++++++++-------
kernel/fork.c | 2 +-
mm/filemap.c | 2 +-
mm/madvise.c | 2 +-
mm/memfd.c | 2 +-
mm/mmap.c | 12 ++--
mm/shmem.c | 2 +-
tools/testing/selftests/memfd/memfd_test.c | 43 ++++++++++++
11 files changed, 131 insertions(+), 36 deletions(-)
--
2.50.1.552.g942d659e1b-goog
From: Liu Shixin <liushixin2(a)huawei.com>
commit f1897f2f08b28ae59476d8b73374b08f856973af upstream.
syzkaller reported such a BUG_ON():
------------[ cut here ]------------
kernel BUG at mm/khugepaged.c:1835!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G W 6.13.0-rc6 #22
Tainted: [W]=WARN
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : collapse_file+0xa44/0x1400
lr : collapse_file+0x88/0x1400
sp : ffff80008afe3a60
...
Call trace:
collapse_file+0xa44/0x1400 (P)
hpage_collapse_scan_file+0x278/0x400
madvise_collapse+0x1bc/0x678
madvise_vma_behavior+0x32c/0x448
madvise_walk_vmas.constprop.0+0xbc/0x140
do_madvise.part.0+0xdc/0x2c8
__arm64_sys_madvise+0x68/0x88
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0x128
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x190/0x198
This indicates that the pgoff is unaligned. After analysis, I confirm the
vma is mapped to /dev/zero. Such a vma certainly has vm_file, but it is
set to anonymous by mmap_zero(). So even if it's mmapped by 2m-unaligned,
it can pass the check in thp_vma_allowable_order() as it is an
anonymous-mmap, but then be collapsed as a file-mmap.
It seems the problem has existed for a long time, but actually, since we
have khugepaged_max_ptes_none check before, we will skip collapse it as it
is /dev/zero and so has no present page. But commit d8ea7cc8547c limit
the check for only khugepaged, so the BUG_ON() can be triggered by
madvise_collapse().
Add vma_is_anonymous() check to make such vma be processed by
hpage_collapse_scan_pmd().
Link: https://lkml.kernel.org/r/20250111034511.2223353-1-liushixin2@huawei.com
Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mattew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[acsjakub: backport, clean apply]
Signed-off-by: Jakub Acs <acsjakub(a)amazon.de>
Cc: linux-mm(a)kvack.org
---
v1 -> v2: fix missing sign-off
Ran into the crash with syzkaller, backporting this patch works - the
reproducer no longer crashes.
Please let me know if there was a reason not to backport.
mm/khugepaged.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b538c3d48386..abd5764e4864 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2404,7 +2404,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma,
khugepaged_scan.address);
@@ -2750,7 +2750,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
nodes_clear(cc->alloc_nmask);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma, addr);
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 37848a456fc38c191aedfe41f662cc24db8c23d9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072839-machinist-coherence-ab5f@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 37848a456fc38c191aedfe41f662cc24db8c23d9 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Tue, 15 Jul 2025 20:43:28 +0200
Subject: [PATCH] selftests: mptcp: connect: also cover alt modes
The "mmap" and "sendfile" alternate modes for mptcp_connect.sh/.c are
available from the beginning, but only tested when mptcp_connect.sh is
manually launched with "-m mmap" or "-m sendfile", not via the
kselftests helpers.
The MPTCP CI was manually running "mptcp_connect.sh -m mmap", but not
"-m sendfile". Plus other CIs, especially the ones validating the stable
releases, were not validating these alternate modes.
To make sure these modes are validated by these CIs, add two new test
programs executing mptcp_connect.sh with the alternate modes.
Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp")
Cc: stable(a)vger.kernel.org
Reviewed-by: Geliang Tang <geliang(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20250715-net-mptcp-sft-connect-alt-v2-1-8230ddd824…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile
index e47788bfa671..c6b030babba8 100644
--- a/tools/testing/selftests/net/mptcp/Makefile
+++ b/tools/testing/selftests/net/mptcp/Makefile
@@ -4,7 +4,8 @@ top_srcdir = ../../../../..
CFLAGS += -Wall -Wl,--no-as-needed -O2 -g -I$(top_srcdir)/usr/include $(KHDR_INCLUDES)
-TEST_PROGS := mptcp_connect.sh pm_netlink.sh mptcp_join.sh diag.sh \
+TEST_PROGS := mptcp_connect.sh mptcp_connect_mmap.sh mptcp_connect_sendfile.sh \
+ pm_netlink.sh mptcp_join.sh diag.sh \
simult_flows.sh mptcp_sockopt.sh userspace_pm.sh
TEST_GEN_FILES = mptcp_connect pm_nl_ctl mptcp_sockopt mptcp_inq mptcp_diag
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect_mmap.sh b/tools/testing/selftests/net/mptcp/mptcp_connect_mmap.sh
new file mode 100755
index 000000000000..5dd30f9394af
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect_mmap.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+MPTCP_LIB_KSFT_TEST="$(basename "${0}" .sh)" \
+ "$(dirname "${0}")/mptcp_connect.sh" -m mmap "${@}"
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect_sendfile.sh b/tools/testing/selftests/net/mptcp/mptcp_connect_sendfile.sh
new file mode 100755
index 000000000000..1d16fb1cc9bb
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect_sendfile.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+MPTCP_LIB_KSFT_TEST="$(basename "${0}" .sh)" \
+ "$(dirname "${0}")/mptcp_connect.sh" -m sendfile "${@}"
From: Victor Shih <victor.shih(a)genesyslogic.com.tw>
Due to a flaw in the hardware design, the GL9763e replay timer frequently
times out when ASPM is enabled. As a result, the warning messages will
often appear in the system log when the system accesses the GL9763e
PCI config. Therefore, the replay timer timeout must be masked.
Signed-off-by: Victor Shih <victor.shih(a)genesyslogic.com.tw>
Fixes: 1ae1d2d6e555 ("mmc: sdhci-pci-gli: Add Genesys Logic GL9763E support")
Cc: stable(a)vger.kernel.org
---
drivers/mmc/host/sdhci-pci-gli.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/mmc/host/sdhci-pci-gli.c b/drivers/mmc/host/sdhci-pci-gli.c
index 436f0460222f..3a1de477e9af 100644
--- a/drivers/mmc/host/sdhci-pci-gli.c
+++ b/drivers/mmc/host/sdhci-pci-gli.c
@@ -1782,6 +1782,9 @@ static void gl9763e_hw_setting(struct sdhci_pci_slot *slot)
value |= FIELD_PREP(GLI_9763E_HS400_RXDLY, GLI_9763E_HS400_RXDLY_5);
pci_write_config_dword(pdev, PCIE_GLI_9763E_CLKRXDLY, value);
+ /* mask the replay timer timeout of AER */
+ sdhci_gli_mask_replay_timer_timeout(pdev);
+
pci_read_config_dword(pdev, PCIE_GLI_9763E_VHS, &value);
value &= ~GLI_9763E_VHS_REV;
value |= FIELD_PREP(GLI_9763E_VHS_REV, GLI_9763E_VHS_REV_R);
--
2.43.0
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 6aae87fe7f180cd93a74466cdb6cf2aa9bb28798
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072104-bacteria-resend-dcff@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6aae87fe7f180cd93a74466cdb6cf2aa9bb28798 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Cl=C3=A9ment=20Le=20Goffic?= <clement.legoffic(a)foss.st.com>
Date: Fri, 4 Jul 2025 10:39:15 +0200
Subject: [PATCH] i2c: stm32f7: unmap DMA mapped buffer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Before each I2C transfer using DMA, the I2C buffer is DMA'pped to make
sure the memory buffer is DMA'able. This is handle in the function
`stm32_i2c_prep_dma_xfer()`.
If the transfer fails for any reason the I2C buffer must be unmap.
Use the dma_callback to factorize the code and fix this issue.
Note that the `stm32f7_i2c_dma_callback()` is now called in case of DMA
transfer success and error and that the `complete()` on the dma_complete
completion structure is done inconditionnally in case of transfer
success or error as well as the `dmaengine_terminate_async()`.
This is allowed as a `complete()` in case transfer error has no effect
as well as a `dmaengine_terminate_async()` on a transfer success.
Also fix the unneeded cast and remove not more needed variables.
Fixes: 7ecc8cfde553 ("i2c: i2c-stm32f7: Add DMA support")
Signed-off-by: Clément Le Goffic <clement.legoffic(a)foss.st.com>
Cc: <stable(a)vger.kernel.org> # v4.18+
Acked-by: Alain Volmat <alain.volmat(a)foss.st.com>
Signed-off-by: Andi Shyti <andi.shyti(a)kernel.org>
Link: https://lore.kernel.org/r/20250704-i2c-upstream-v4-2-84a095a2c728@foss.st.c…
diff --git a/drivers/i2c/busses/i2c-stm32f7.c b/drivers/i2c/busses/i2c-stm32f7.c
index 817d081460c2..73a7b8894c0d 100644
--- a/drivers/i2c/busses/i2c-stm32f7.c
+++ b/drivers/i2c/busses/i2c-stm32f7.c
@@ -739,10 +739,11 @@ static void stm32f7_i2c_disable_dma_req(struct stm32f7_i2c_dev *i2c_dev)
static void stm32f7_i2c_dma_callback(void *arg)
{
- struct stm32f7_i2c_dev *i2c_dev = (struct stm32f7_i2c_dev *)arg;
+ struct stm32f7_i2c_dev *i2c_dev = arg;
struct stm32_i2c_dma *dma = i2c_dev->dma;
stm32f7_i2c_disable_dma_req(i2c_dev);
+ dmaengine_terminate_async(dma->chan_using);
dma_unmap_single(i2c_dev->dev, dma->dma_buf, dma->dma_len,
dma->dma_data_dir);
complete(&dma->dma_complete);
@@ -1510,7 +1511,6 @@ static irqreturn_t stm32f7_i2c_handle_isr_errs(struct stm32f7_i2c_dev *i2c_dev,
u16 addr = f7_msg->addr;
void __iomem *base = i2c_dev->base;
struct device *dev = i2c_dev->dev;
- struct stm32_i2c_dma *dma = i2c_dev->dma;
/* Bus error */
if (status & STM32F7_I2C_ISR_BERR) {
@@ -1551,10 +1551,8 @@ static irqreturn_t stm32f7_i2c_handle_isr_errs(struct stm32f7_i2c_dev *i2c_dev,
}
/* Disable dma */
- if (i2c_dev->use_dma) {
- stm32f7_i2c_disable_dma_req(i2c_dev);
- dmaengine_terminate_async(dma->chan_using);
- }
+ if (i2c_dev->use_dma)
+ stm32f7_i2c_dma_callback(i2c_dev);
i2c_dev->master_mode = false;
complete(&i2c_dev->complete);
@@ -1600,7 +1598,6 @@ static irqreturn_t stm32f7_i2c_isr_event_thread(int irq, void *data)
{
struct stm32f7_i2c_dev *i2c_dev = data;
struct stm32f7_i2c_msg *f7_msg = &i2c_dev->f7_msg;
- struct stm32_i2c_dma *dma = i2c_dev->dma;
void __iomem *base = i2c_dev->base;
u32 status, mask;
int ret;
@@ -1619,10 +1616,8 @@ static irqreturn_t stm32f7_i2c_isr_event_thread(int irq, void *data)
dev_dbg(i2c_dev->dev, "<%s>: Receive NACK (addr %x)\n",
__func__, f7_msg->addr);
writel_relaxed(STM32F7_I2C_ICR_NACKCF, base + STM32F7_I2C_ICR);
- if (i2c_dev->use_dma) {
- stm32f7_i2c_disable_dma_req(i2c_dev);
- dmaengine_terminate_async(dma->chan_using);
- }
+ if (i2c_dev->use_dma)
+ stm32f7_i2c_dma_callback(i2c_dev);
f7_msg->result = -ENXIO;
}
@@ -1640,8 +1635,7 @@ static irqreturn_t stm32f7_i2c_isr_event_thread(int irq, void *data)
ret = wait_for_completion_timeout(&i2c_dev->dma->dma_complete, HZ);
if (!ret) {
dev_dbg(i2c_dev->dev, "<%s>: Timed out\n", __func__);
- stm32f7_i2c_disable_dma_req(i2c_dev);
- dmaengine_terminate_async(dma->chan_using);
+ stm32f7_i2c_dma_callback(i2c_dev);
f7_msg->result = -ETIMEDOUT;
}
}
From: Nianyao Tang <tangnianyao(a)huawei.com>
[ upstream commit e8cde32f111f7f5681a7bad3ec747e9e697569a9 ]
Enable ECBHB bits in ID_AA64MMFR1 register as per ARM DDI 0487K.a
specification.
When guest OS read ID_AA64MMFR1_EL1, kvm emulate this reg using
ftr_id_aa64mmfr1 and always return ID_AA64MMFR1_EL1.ECBHB=0 to guest.
It results in guest syscall jump to tramp ventry, which is not needed
in implementation with ID_AA64MMFR1_EL1.ECBHB=1.
Let's make the guest syscall process the same as the host.
This fixes performance regressions introduced by commit a53b3599d9bf
("arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected()
lists") for guests running on neoverse v2 hardware, which supports
ECBHB.
Signed-off-by: Nianyao Tang <tangnianyao(a)huawei.com>
Link: https://lore.kernel.org/r/20240611122049.2758600-1-tangnianyao@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
Signed-off-by: Patrick Roy <roypat(a)amazon.co.uk>
---
arch/arm64/kernel/cpufeature.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 840cc48b5147..5d2322eeee47 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -343,6 +343,7 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = {
};
static const struct arm64_ftr_bits ftr_id_aa64mmfr1[] = {
+ ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_EL1_ECBHB_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_EL1_TIDCP1_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_EL1_AFP_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_EL1_ETS_SHIFT, 4, 0),
--
2.50.1
Hi Sasha,
Thank you for maintaining the stable versions with Greg!
If I remember well, you run some scripts on your side to maintain the
queue/* branches in the linux-stable-rc Git tree [1], is that correct?
These branches have not been updated for a bit more than 3 weeks. Is it
normal?
Personally, I find them useful. But if it is just me, I can work without
them.
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/…
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
From: Liu Shixin <liushixin2(a)huawei.com>
commit f1897f2f08b28ae59476d8b73374b08f856973af upstream.
syzkaller reported such a BUG_ON():
------------[ cut here ]------------
kernel BUG at mm/khugepaged.c:1835!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G W 6.13.0-rc6 #22
Tainted: [W]=WARN
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : collapse_file+0xa44/0x1400
lr : collapse_file+0x88/0x1400
sp : ffff80008afe3a60
...
Call trace:
collapse_file+0xa44/0x1400 (P)
hpage_collapse_scan_file+0x278/0x400
madvise_collapse+0x1bc/0x678
madvise_vma_behavior+0x32c/0x448
madvise_walk_vmas.constprop.0+0xbc/0x140
do_madvise.part.0+0xdc/0x2c8
__arm64_sys_madvise+0x68/0x88
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0x128
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x190/0x198
This indicates that the pgoff is unaligned. After analysis, I confirm the
vma is mapped to /dev/zero. Such a vma certainly has vm_file, but it is
set to anonymous by mmap_zero(). So even if it's mmapped by 2m-unaligned,
it can pass the check in thp_vma_allowable_order() as it is an
anonymous-mmap, but then be collapsed as a file-mmap.
It seems the problem has existed for a long time, but actually, since we
have khugepaged_max_ptes_none check before, we will skip collapse it as it
is /dev/zero and so has no present page. But commit d8ea7cc8547c limit
the check for only khugepaged, so the BUG_ON() can be triggered by
madvise_collapse().
Add vma_is_anonymous() check to make such vma be processed by
hpage_collapse_scan_pmd().
Link: https://lkml.kernel.org/r/20250111034511.2223353-1-liushixin2@huawei.com
Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mattew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[acsjakub: backport, clean apply]
Cc: Jakub Acs <acsjakub(a)amazon.de>
Cc: linux-mm(a)kvack.org
---
Ran into the crash with syzkaller, backporting this patch works - the
reproducer no longer crashes.
Please let me know if there was a reason not to backport.
mm/khugepaged.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b538c3d48386..abd5764e4864 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2404,7 +2404,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma,
khugepaged_scan.address);
@@ -2750,7 +2750,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
nodes_clear(cc->alloc_nmask);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma, addr);
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
From: Liu Shixin <liushixin2(a)huawei.com>
commit f1897f2f08b28ae59476d8b73374b08f856973af upstream.
syzkaller reported such a BUG_ON():
------------[ cut here ]------------
kernel BUG at mm/khugepaged.c:1835!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G W 6.13.0-rc6 #22
Tainted: [W]=WARN
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : collapse_file+0xa44/0x1400
lr : collapse_file+0x88/0x1400
sp : ffff80008afe3a60
...
Call trace:
collapse_file+0xa44/0x1400 (P)
hpage_collapse_scan_file+0x278/0x400
madvise_collapse+0x1bc/0x678
madvise_vma_behavior+0x32c/0x448
madvise_walk_vmas.constprop.0+0xbc/0x140
do_madvise.part.0+0xdc/0x2c8
__arm64_sys_madvise+0x68/0x88
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0x128
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x190/0x198
This indicates that the pgoff is unaligned. After analysis, I confirm the
vma is mapped to /dev/zero. Such a vma certainly has vm_file, but it is
set to anonymous by mmap_zero(). So even if it's mmapped by 2m-unaligned,
it can pass the check in thp_vma_allowable_order() as it is an
anonymous-mmap, but then be collapsed as a file-mmap.
It seems the problem has existed for a long time, but actually, since we
have khugepaged_max_ptes_none check before, we will skip collapse it as it
is /dev/zero and so has no present page. But commit d8ea7cc8547c limit
the check for only khugepaged, so the BUG_ON() can be triggered by
madvise_collapse().
Add vma_is_anonymous() check to make such vma be processed by
hpage_collapse_scan_pmd().
Link: https://lkml.kernel.org/r/20250111034511.2223353-1-liushixin2@huawei.com
Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mattew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[acsjakub: backport, clean apply]
Signed-off-by: Jakub Acs <acsjakub(a)amazon.de>
Cc: linux-mm(a)kvack.org
---
v1 -> v2: fix missing sign-off
Ran into the crash with syzkaller, backporting this patch works - the
reproducer no longer crashes.
Please let me know if there was a reason not to backport.
mm/khugepaged.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b538c3d48386..abd5764e4864 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2404,7 +2404,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma,
khugepaged_scan.address);
@@ -2750,7 +2750,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
nodes_clear(cc->alloc_nmask);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma, addr);
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
A new warning in clang [1] points out a few places in s5p_mfc_cmd_v6.c
where an uninitialized variable is passed as a const pointer:
drivers/media/platform/samsung/s5p-mfc/s5p_mfc_cmd_v6.c:45:7: error: variable 'h2r_args' is uninitialized when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
45 | &h2r_args);
| ^~~~~~~~
drivers/media/platform/samsung/s5p-mfc/s5p_mfc_cmd_v6.c:133:7: error: variable 'h2r_args' is uninitialized when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
133 | &h2r_args);
| ^~~~~~~~
drivers/media/platform/samsung/s5p-mfc/s5p_mfc_cmd_v6.c:148:7: error: variable 'h2r_args' is uninitialized when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
148 | &h2r_args);
| ^~~~~~~~
The args parameter in s5p_mfc_cmd_host2risc_v6() is never actually used,
so just pass NULL to it in the places where h2r_args is currently
passed, clearing up the warning and not changing the functionality of
the code.
Cc: stable(a)vger.kernel.org
Fixes: f96f3cfa0bb8 ("[media] s5p-mfc: Update MFC v4l2 driver to support MFC6.x")
Link: https://github.com/llvm/llvm-project/commit/00dacf8c22f065cb52efb14cd091d44… [1]
Closes: https://github.com/ClangBuiltLinux/linux/issues/2103
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
From what I can tell, it seems like ->cmd_host2risc() is only ever
called from v6 code, which always passes NULL? It seems like it should
be possible to just drop .cmd_host2risc on the v5 side, then update
.cmd_host2risc to only take two parameters? If so, I can send a follow
up as a clean up, so that this can go back relatively conflict free.
---
.../platform/samsung/s5p-mfc/s5p_mfc_cmd_v6.c | 22 +++++-----------------
1 file changed, 5 insertions(+), 17 deletions(-)
diff --git a/drivers/media/platform/samsung/s5p-mfc/s5p_mfc_cmd_v6.c b/drivers/media/platform/samsung/s5p-mfc/s5p_mfc_cmd_v6.c
index 47bc3014b5d8..735471c50dbb 100644
--- a/drivers/media/platform/samsung/s5p-mfc/s5p_mfc_cmd_v6.c
+++ b/drivers/media/platform/samsung/s5p-mfc/s5p_mfc_cmd_v6.c
@@ -31,7 +31,6 @@ static int s5p_mfc_cmd_host2risc_v6(struct s5p_mfc_dev *dev, int cmd,
static int s5p_mfc_sys_init_cmd_v6(struct s5p_mfc_dev *dev)
{
- struct s5p_mfc_cmd_args h2r_args;
const struct s5p_mfc_buf_size_v6 *buf_size = dev->variant->buf_size->priv;
int ret;
@@ -41,33 +40,23 @@ static int s5p_mfc_sys_init_cmd_v6(struct s5p_mfc_dev *dev)
mfc_write(dev, dev->ctx_buf.dma, S5P_FIMV_CONTEXT_MEM_ADDR_V6);
mfc_write(dev, buf_size->dev_ctx, S5P_FIMV_CONTEXT_MEM_SIZE_V6);
- return s5p_mfc_cmd_host2risc_v6(dev, S5P_FIMV_H2R_CMD_SYS_INIT_V6,
- &h2r_args);
+ return s5p_mfc_cmd_host2risc_v6(dev, S5P_FIMV_H2R_CMD_SYS_INIT_V6, NULL);
}
static int s5p_mfc_sleep_cmd_v6(struct s5p_mfc_dev *dev)
{
- struct s5p_mfc_cmd_args h2r_args;
-
- memset(&h2r_args, 0, sizeof(struct s5p_mfc_cmd_args));
- return s5p_mfc_cmd_host2risc_v6(dev, S5P_FIMV_H2R_CMD_SLEEP_V6,
- &h2r_args);
+ return s5p_mfc_cmd_host2risc_v6(dev, S5P_FIMV_H2R_CMD_SLEEP_V6, NULL);
}
static int s5p_mfc_wakeup_cmd_v6(struct s5p_mfc_dev *dev)
{
- struct s5p_mfc_cmd_args h2r_args;
-
- memset(&h2r_args, 0, sizeof(struct s5p_mfc_cmd_args));
- return s5p_mfc_cmd_host2risc_v6(dev, S5P_FIMV_H2R_CMD_WAKEUP_V6,
- &h2r_args);
+ return s5p_mfc_cmd_host2risc_v6(dev, S5P_FIMV_H2R_CMD_WAKEUP_V6, NULL);
}
/* Open a new instance and get its number */
static int s5p_mfc_open_inst_cmd_v6(struct s5p_mfc_ctx *ctx)
{
struct s5p_mfc_dev *dev = ctx->dev;
- struct s5p_mfc_cmd_args h2r_args;
int codec_type;
mfc_debug(2, "Requested codec mode: %d\n", ctx->codec_mode);
@@ -130,14 +119,13 @@ static int s5p_mfc_open_inst_cmd_v6(struct s5p_mfc_ctx *ctx)
mfc_write(dev, 0, S5P_FIMV_D_CRC_CTRL_V6); /* no crc */
return s5p_mfc_cmd_host2risc_v6(dev, S5P_FIMV_H2R_CMD_OPEN_INSTANCE_V6,
- &h2r_args);
+ NULL);
}
/* Close instance */
static int s5p_mfc_close_inst_cmd_v6(struct s5p_mfc_ctx *ctx)
{
struct s5p_mfc_dev *dev = ctx->dev;
- struct s5p_mfc_cmd_args h2r_args;
int ret = 0;
dev->curr_ctx = ctx->num;
@@ -145,7 +133,7 @@ static int s5p_mfc_close_inst_cmd_v6(struct s5p_mfc_ctx *ctx)
mfc_write(dev, ctx->inst_no, S5P_FIMV_INSTANCE_ID_V6);
ret = s5p_mfc_cmd_host2risc_v6(dev,
S5P_FIMV_H2R_CMD_CLOSE_INSTANCE_V6,
- &h2r_args);
+ NULL);
} else {
ret = -EINVAL;
}
---
base-commit: 347e9f5043c89695b01e66b3ed111755afcf1911
change-id: 20250715-media-s5p-mfc-fix-uninit-const-pointer-cbf944ae4b4b
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
From: Edip Hazuri <edip(a)medip.dev>
The mute led on this laptop is using ALC245 but requires a quirk to work
This patch enables the existing quirk for the device.
Tested on Victus 16-S0063NT Laptop. The LED behaviour works
as intended.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Edip Hazuri <edip(a)medip.dev>
---
sound/hda/codecs/realtek/alc269.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/sound/hda/codecs/realtek/alc269.c b/sound/hda/codecs/realtek/alc269.c
index 05019fa732..77322ff8a6 100644
--- a/sound/hda/codecs/realtek/alc269.c
+++ b/sound/hda/codecs/realtek/alc269.c
@@ -6528,6 +6528,7 @@ static const struct hda_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x103c, 0x8bbe, "HP Victus 16-r0xxx (MB 8BBE)", ALC245_FIXUP_HP_MUTE_LED_COEFBIT),
SND_PCI_QUIRK(0x103c, 0x8bc8, "HP Victus 15-fa1xxx", ALC245_FIXUP_HP_MUTE_LED_COEFBIT),
SND_PCI_QUIRK(0x103c, 0x8bcd, "HP Omen 16-xd0xxx", ALC245_FIXUP_HP_MUTE_LED_V1_COEFBIT),
+ SND_PCI_QUIRK(0x103c, 0x8bd4, "HP Victus 16-s0xxx (MB 8BD4)", ALC245_FIXUP_HP_MUTE_LED_COEFBIT),
SND_PCI_QUIRK(0x103c, 0x8bdd, "HP Envy 17", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x103c, 0x8bde, "HP Envy 17", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x103c, 0x8bdf, "HP Envy 15", ALC287_FIXUP_CS35L41_I2C_2),
--
2.50.1
Hello,
Until kernel version 6.7, a write-sealed memfd could not be mapped as
shared and read-only. This was clearly a bug, and was not inline with
the description of F_SEAL_WRITE in the man page for fcntl()[1].
Lorenzo's series [2] fixed that issue and was merged in kernel version
6.7, but was not backported to older kernels. So, this issue is still
present on kernels 5.4, 5.10, 5.15, 6.1, and 6.6.
This series backports Lorenzo's series to the 5.4 kernel.
[1] https://man7.org/linux/man-pages/man2/fcntl.2.html
[2] https://lore.kernel.org/all/913628168ce6cce77df7d13a63970bae06a526e0.169711…
Lorenzo Stoakes (3):
mm: drop the assumption that VM_SHARED always implies writable
mm: update memfd seal write check to include F_SEAL_WRITE
mm: perform the mapping_map_writable() check after call_mmap()
fs/hugetlbfs/inode.c | 2 +-
include/linux/fs.h | 4 ++--
include/linux/mm.h | 26 +++++++++++++++++++-------
kernel/fork.c | 2 +-
mm/filemap.c | 2 +-
mm/madvise.c | 2 +-
mm/mmap.c | 26 ++++++++++++++++----------
mm/shmem.c | 2 +-
8 files changed, 42 insertions(+), 24 deletions(-)
--
2.50.1.552.g942d659e1b-goog
The process of adding an I2C adapter can invoke I2C accesses on that new
adapter (see i2c_detect()).
Ensure we have set the adapter's driver data to avoid null pointer
dereferences in the xfer functions during the adapter add.
This has been noted in the past and the same fix proposed but not
completed. See:
https://lore.kernel.org/lkml/ef597e73-ed71-168e-52af-0d19b03734ac@vigem.de/
Signed-off-by: Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
Signed-off-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Sumanth Gavini <sumanth.gavini(a)yahoo.com>
---
drivers/hid/hid-mcp2221.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hid/hid-mcp2221.c b/drivers/hid/hid-mcp2221.c
index de52e9f7bb8c..9973545c1c4b 100644
--- a/drivers/hid/hid-mcp2221.c
+++ b/drivers/hid/hid-mcp2221.c
@@ -873,12 +873,12 @@ static int mcp2221_probe(struct hid_device *hdev,
"MCP2221 usb-i2c bridge on hidraw%d",
((struct hidraw *)hdev->hidraw)->minor);
+ i2c_set_adapdata(&mcp->adapter, mcp);
ret = i2c_add_adapter(&mcp->adapter);
if (ret) {
hid_err(hdev, "can't add usb-i2c adapter: %d\n", ret);
goto err_i2c;
}
- i2c_set_adapdata(&mcp->adapter, mcp);
/* Setup GPIO chip */
mcp->gc = devm_kzalloc(&hdev->dev, sizeof(*mcp->gc), GFP_KERNEL);
--
2.43.0
The userprogs infrastructure does not expect clang being used with GNU ld
and in that case uses /usr/bin/ld for linking, not the configured $(LD).
This fallback is problematic as it will break when cross-compiling.
Mixing clang and GNU ld is used for example when building for SPARC64,
as ld.lld is not sufficient; see Documentation/kbuild/llvm.rst.
Relax the check around --ld-path so it gets used for all linkers.
Fixes: dfc1b168a8c4 ("kbuild: userprogs: use correct lld when linking through clang")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Nathan, you originally proposed the check for $(CONFIG_LD_IS_LLD) [0],
could you take a look at this?
[0] https://lore.kernel.org/all/20250213175437.GA2756218@ax162/
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index c09766beb7eff4780574682b8ea44475fc0a5188..e300c6546c845c300edb5f0033719963c7da8f9b 100644
--- a/Makefile
+++ b/Makefile
@@ -1134,7 +1134,7 @@ KBUILD_USERCFLAGS += $(filter -m32 -m64 --target=%, $(KBUILD_CPPFLAGS) $(KBUILD
KBUILD_USERLDFLAGS += $(filter -m32 -m64 --target=%, $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS))
# userspace programs are linked via the compiler, use the correct linker
-ifeq ($(CONFIG_CC_IS_CLANG)$(CONFIG_LD_IS_LLD),yy)
+ifneq ($(CONFIG_CC_IS_CLANG),)
KBUILD_USERLDFLAGS += --ld-path=$(LD)
endif
---
base-commit: 6832a9317eee280117cd695fa885b2b7a7a38daf
change-id: 20250723-userprogs-clang-gnu-ld-7a1c16fc852d
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
This patchset fixes the xe driver probe fail with -ETIMEDOUT issue in
linux 6.12.35 and later version. The failure is caused by commit
d42b44736ea2 ("drm/xe/gt: Update handling of xe_force_wake_get return"),
which incorrectly handles the return value of xe_force_wake_get as
"refcounted domain mask" (as introduced in 6.13), rather than status
code (as used in 6.12).
In 6.12 stable kernel, xe_force_wake_get still returns a status code.
The update incorrectly treats the return value as a mask, causing the
return value of 0 to be misinterpreted as an error. As a result, the
driver probe fails with -ETIMEDOUT in xe_pci_probe -> xe_device_probe
-> xe_gt_init_hwconfig -> xe_force_wake_get.
[ 1254.323172] xe 0000:00:02.0: [drm] Found ALDERLAKE_P (device ID 46a6) display version 13.00 stepping D0
[ 1254.323175] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] ALDERLAKE_P 46a6:000c dgfx:0 gfx:Xe_LP (12.00) media:Xe_M (12.00) display:yes dma_m_s:39 tc:1 gscfi:0 cscfi:0
[ 1254.323275] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] Stepping = (G:C0, M:C0, B:**)
[ 1254.323328] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] SR-IOV support: no (mode: none)
[ 1254.323379] xe 0000:00:02.0: [drm:intel_pch_type [xe]] Found Alder Lake PCH
[ 1254.323475] xe 0000:00:02.0: probe with driver xe failed with error -110
Similar return handling issue cause by API mismatch are also found in:
Commit 95a75ed2b005 ("drm/xe/tests/mocs: Update xe_force_wake_get() return handling")
Commit 9ffd6ec2de08 ("drm/xe/devcoredump: Update handling of xe_force_wake_get return")
This patchset fixes them by reverting them all.
Additionally, commit deb05f8431f3 ("drm/xe/forcewake: Add a helper
xe_force_wake_ref_has_domain()") is also reverted as it is not needed in
6.12.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5373
Cc: Badal Nilawar <badal.nilawar(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray(a)intel.com>
Cc: Nirmoy Das <nirmoy.das(a)intel.com>
Tomita Moeko (4):
Revert "drm/xe/gt: Update handling of xe_force_wake_get return"
Revert "drm/xe/tests/mocs: Update xe_force_wake_get() return handling"
Revert "drm/xe/devcoredump: Update handling of xe_force_wake_get
return"
Revert "drm/xe/forcewake: Add a helper xe_force_wake_ref_has_domain()"
drivers/gpu/drm/xe/tests/xe_mocs.c | 21 +++---
drivers/gpu/drm/xe/xe_devcoredump.c | 14 ++--
drivers/gpu/drm/xe/xe_force_wake.h | 16 -----
drivers/gpu/drm/xe/xe_gt.c | 105 +++++++++++++---------------
4 files changed, 63 insertions(+), 93 deletions(-)
--
2.47.2
Greg recently reported 3 patches that could not be applied without
conflicts in v6.1:
- f8a1d9b18c5e ("mptcp: make fallback action and fallback decision atomic")
- def5b7b2643e ("mptcp: plug races between subflow fail and subflow creation")
- da9b2fc7b73d ("mptcp: reset fallback status gracefully at disconnect() time")
Conflicts have been resolved, and documented in each patch.
Paolo Abeni (3):
mptcp: make fallback action and fallback decision atomic
mptcp: plug races between subflow fail and subflow creation
mptcp: reset fallback status gracefully at disconnect() time
net/mptcp/options.c | 3 ++-
net/mptcp/pm.c | 8 ++++++-
net/mptcp/protocol.c | 55 ++++++++++++++++++++++++++++++++++++--------
net/mptcp/protocol.h | 27 +++++++++++++++++-----
net/mptcp/subflow.c | 30 +++++++++++++++---------
5 files changed, 95 insertions(+), 28 deletions(-)
--
2.50.0
The patch titled
Subject: mm: memory-tiering: fix PGPROMOTE_CANDIDATE counting
has been added to the -mm mm-new branch. Its filename is
mm-memory-tiering-fix-pgpromote_candidate-counting.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ruan Shiyang <ruansy.fnst(a)fujitsu.com>
Subject: mm: memory-tiering: fix PGPROMOTE_CANDIDATE counting
Date: Tue, 29 Jul 2025 11:51:01 +0800
Goto-san reported confusing pgpromote statistics where the
pgpromote_success count significantly exceeded pgpromote_candidate.
On a system with three nodes (nodes 0-1: DRAM 4GB, node 2: NVDIMM 4GB):
# Enable demotion only
echo 1 > /sys/kernel/mm/numa/demotion_enabled
numactl -m 0-1 memhog -r200 3500M >/dev/null &
pid=$!
sleep 2
numactl memhog -r100 2500M >/dev/null &
sleep 10
kill -9 $pid # terminate the 1st memhog
# Enable promotion
echo 2 > /proc/sys/kernel/numa_balancing
After a few seconds, we observeed `pgpromote_candidate < pgpromote_success`
$ grep -e pgpromote /proc/vmstat
pgpromote_success 2579
pgpromote_candidate 0
In this scenario, after terminating the first memhog, the conditions for
pgdat_free_space_enough() are quickly met, and triggers promotion.
However, these migrated pages are only counted for in PGPROMOTE_SUCCESS,
not in PGPROMOTE_CANDIDATE.
To solve these confusing statistics, introduce PGPROMOTE_CANDIDATE_NRL to
count the missed promotion pages. And also, not counting these pages into
PGPROMOTE_CANDIDATE is to avoid changing the existing algorithm or
performance of the promotion rate limit.
Link: https://lkml.kernel.org/r/20250729035101.1601407-1-ruansy.fnst@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com>
Signed-off-by: Ruan Shiyang <ruansy.fnst(a)fujitsu.com>
Reported-by: Yasunori Gotou (Fujitsu) <y-goto(a)fujitsu.com>
Suggested-by: Huang Ying <ying.huang(a)linux.alibaba.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Valentin Schneider <vschneid(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mmzone.h | 16 +++++++++++++++-
kernel/sched/fair.c | 5 +++--
mm/vmstat.c | 1 +
3 files changed, 19 insertions(+), 3 deletions(-)
--- a/include/linux/mmzone.h~mm-memory-tiering-fix-pgpromote_candidate-counting
+++ a/include/linux/mmzone.h
@@ -234,7 +234,21 @@ enum node_stat_item {
#endif
#ifdef CONFIG_NUMA_BALANCING
PGPROMOTE_SUCCESS, /* promote successfully */
- PGPROMOTE_CANDIDATE, /* candidate pages to promote */
+ /**
+ * Candidate pages for promotion based on hint fault latency. This
+ * counter is used to control the promotion rate and adjust the hot
+ * threshold.
+ */
+ PGPROMOTE_CANDIDATE,
+ /**
+ * Not rate-limited (NRL) candidate pages for those can be promoted
+ * without considering hot threshold because of enough free pages in
+ * fast-tier node. These promotions bypass the regular hotness checks
+ * and do NOT influence the promotion rate-limiter or
+ * threshold-adjustment logic.
+ * This is for statistics/monitoring purposes.
+ */
+ PGPROMOTE_CANDIDATE_NRL,
#endif
/* PGDEMOTE_*: pages demoted */
PGDEMOTE_KSWAPD,
--- a/kernel/sched/fair.c~mm-memory-tiering-fix-pgpromote_candidate-counting
+++ a/kernel/sched/fair.c
@@ -1940,11 +1940,13 @@ bool should_numa_migrate_memory(struct t
struct pglist_data *pgdat;
unsigned long rate_limit;
unsigned int latency, th, def_th;
+ long nr = folio_nr_pages(folio);
pgdat = NODE_DATA(dst_nid);
if (pgdat_free_space_enough(pgdat)) {
/* workload changed, reset hot threshold */
pgdat->nbp_threshold = 0;
+ mod_node_page_state(pgdat, PGPROMOTE_CANDIDATE_NRL, nr);
return true;
}
@@ -1958,8 +1960,7 @@ bool should_numa_migrate_memory(struct t
if (latency >= th)
return false;
- return !numa_promotion_rate_limit(pgdat, rate_limit,
- folio_nr_pages(folio));
+ return !numa_promotion_rate_limit(pgdat, rate_limit, nr);
}
this_cpupid = cpu_pid_to_cpupid(dst_cpu, current->pid);
--- a/mm/vmstat.c~mm-memory-tiering-fix-pgpromote_candidate-counting
+++ a/mm/vmstat.c
@@ -1280,6 +1280,7 @@ const char * const vmstat_text[] = {
#ifdef CONFIG_NUMA_BALANCING
[I(PGPROMOTE_SUCCESS)] = "pgpromote_success",
[I(PGPROMOTE_CANDIDATE)] = "pgpromote_candidate",
+ [I(PGPROMOTE_CANDIDATE_NRL)] = "pgpromote_candidate_nrl",
#endif
[I(PGDEMOTE_KSWAPD)] = "pgdemote_kswapd",
[I(PGDEMOTE_DIRECT)] = "pgdemote_direct",
_
Patches currently in -mm which might be from ruansy.fnst(a)fujitsu.com are
mm-memory-tiering-fix-pgpromote_candidate-counting.patch
The patch titled
Subject: mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3
has been added to the -mm mm-unstable branch. Its filename is
mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3
Date: Tue, 29 Jul 2025 07:57:09 -0700
- Addressed Lorenzo's nits, per Lorenzo Stoakes
- Added a warning comment for vma_start_read()
- Added Reviewed-by and Acked-by, per Vlastimil Babka and Lorenzo Stoakes
Link: https://lkml.kernel.org/r/20250729145709.2731370-1-surenb@google.com
Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU")
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+…
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mmap_lock.h | 7 +++++++
mm/mmap_lock.c | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)
--- a/include/linux/mmap_lock.h~mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3
+++ a/include/linux/mmap_lock.h
@@ -155,6 +155,10 @@ static inline void vma_refcount_put(stru
* reused and attached to a different mm before we lock it.
* Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
* detached.
+ *
+ * WARNING! The vma passed to this function cannot be used if the function
+ * fails to lock it because in certain cases RCU lock is dropped and then
+ * reacquired. Once RCU lock is dropped the vma can be concurently freed.
*/
static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
struct vm_area_struct *vma)
@@ -194,9 +198,12 @@ static inline struct vm_area_struct *vma
if (unlikely(vma->vm_mm != mm)) {
/* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
struct mm_struct *other_mm = vma->vm_mm;
+
/*
* __mmdrop() is a heavy operation and we don't need RCU
* protection here. Release RCU lock during these operations.
+ * We reinstate the RCU read lock as the caller expects it to
+ * be held when this function returns even on error.
*/
rcu_read_unlock();
mmgrab(other_mm);
--- a/mm/mmap_lock.c~mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3
+++ a/mm/mmap_lock.c
@@ -235,7 +235,7 @@ retry:
goto fallback;
}
- /* Verify the vma is not behind of the last search position. */
+ /* Verify the vma is not behind the last search position. */
if (unlikely(from_addr >= vma->vm_end))
goto fallback_unlock;
_
Patches currently in -mm which might be from surenb(a)google.com are
mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped.patch
mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3.patch
From: Filipe Manana <fdmanana(a)suse.com>
commit f9baa501b4fd6962257853d46ddffbc21f27e344 upstream.
There are a few exceptional cases where cloning an inline extent needs to
copy the inline extent data into a page of the destination inode.
When this happens, we end up starting a transaction while having a dirty
page for the destination inode and while having the range locked in the
destination's inode iotree too. Because when reserving metadata space
for a transaction we may need to flush existing delalloc in case there is
not enough free space, we have a mechanism in place to prevent a deadlock,
which was introduced in commit 3d45f221ce627d ("btrfs: fix deadlock when
cloning inline extent and low on free metadata space").
However when using qgroups, a transaction also reserves metadata qgroup
space, which can also result in flushing delalloc in case there is not
enough available space at the moment. When this happens we deadlock, since
flushing delalloc requires locking the file range in the inode's iotree
and the range was already locked at the very beginning of the clone
operation, before attempting to start the transaction.
When this issue happens, stack traces like the following are reported:
[72747.556262] task:kworker/u81:9 state:D stack: 0 pid: 225 ppid: 2 flags:0x00004000
[72747.556268] Workqueue: writeback wb_workfn (flush-btrfs-1142)
[72747.556271] Call Trace:
[72747.556273] __schedule+0x296/0x760
[72747.556277] schedule+0x3c/0xa0
[72747.556279] io_schedule+0x12/0x40
[72747.556284] __lock_page+0x13c/0x280
[72747.556287] ? generic_file_readonly_mmap+0x70/0x70
[72747.556325] extent_write_cache_pages+0x22a/0x440 [btrfs]
[72747.556331] ? __set_page_dirty_nobuffers+0xe7/0x160
[72747.556358] ? set_extent_buffer_dirty+0x5e/0x80 [btrfs]
[72747.556362] ? update_group_capacity+0x25/0x210
[72747.556366] ? cpumask_next_and+0x1a/0x20
[72747.556391] extent_writepages+0x44/0xa0 [btrfs]
[72747.556394] do_writepages+0x41/0xd0
[72747.556398] __writeback_single_inode+0x39/0x2a0
[72747.556403] writeback_sb_inodes+0x1ea/0x440
[72747.556407] __writeback_inodes_wb+0x5f/0xc0
[72747.556410] wb_writeback+0x235/0x2b0
[72747.556414] ? get_nr_inodes+0x35/0x50
[72747.556417] wb_workfn+0x354/0x490
[72747.556420] ? newidle_balance+0x2c5/0x3e0
[72747.556424] process_one_work+0x1aa/0x340
[72747.556426] worker_thread+0x30/0x390
[72747.556429] ? create_worker+0x1a0/0x1a0
[72747.556432] kthread+0x116/0x130
[72747.556435] ? kthread_park+0x80/0x80
[72747.556438] ret_from_fork+0x1f/0x30
[72747.566958] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs]
[72747.566961] Call Trace:
[72747.566964] __schedule+0x296/0x760
[72747.566968] ? finish_wait+0x80/0x80
[72747.566970] schedule+0x3c/0xa0
[72747.566995] wait_extent_bit.constprop.68+0x13b/0x1c0 [btrfs]
[72747.566999] ? finish_wait+0x80/0x80
[72747.567024] lock_extent_bits+0x37/0x90 [btrfs]
[72747.567047] btrfs_invalidatepage+0x299/0x2c0 [btrfs]
[72747.567051] ? find_get_pages_range_tag+0x2cd/0x380
[72747.567076] __extent_writepage+0x203/0x320 [btrfs]
[72747.567102] extent_write_cache_pages+0x2bb/0x440 [btrfs]
[72747.567106] ? update_load_avg+0x7e/0x5f0
[72747.567109] ? enqueue_entity+0xf4/0x6f0
[72747.567134] extent_writepages+0x44/0xa0 [btrfs]
[72747.567137] ? enqueue_task_fair+0x93/0x6f0
[72747.567140] do_writepages+0x41/0xd0
[72747.567144] __filemap_fdatawrite_range+0xc7/0x100
[72747.567167] btrfs_run_delalloc_work+0x17/0x40 [btrfs]
[72747.567195] btrfs_work_helper+0xc2/0x300 [btrfs]
[72747.567200] process_one_work+0x1aa/0x340
[72747.567202] worker_thread+0x30/0x390
[72747.567205] ? create_worker+0x1a0/0x1a0
[72747.567208] kthread+0x116/0x130
[72747.567211] ? kthread_park+0x80/0x80
[72747.567214] ret_from_fork+0x1f/0x30
[72747.569686] task:fsstress state:D stack: 0 pid:841421 ppid:841417 flags:0x00000000
[72747.569689] Call Trace:
[72747.569691] __schedule+0x296/0x760
[72747.569694] schedule+0x3c/0xa0
[72747.569721] try_flush_qgroup+0x95/0x140 [btrfs]
[72747.569725] ? finish_wait+0x80/0x80
[72747.569753] btrfs_qgroup_reserve_data+0x34/0x50 [btrfs]
[72747.569781] btrfs_check_data_free_space+0x5f/0xa0 [btrfs]
[72747.569804] btrfs_buffered_write+0x1f7/0x7f0 [btrfs]
[72747.569810] ? path_lookupat.isra.48+0x97/0x140
[72747.569833] btrfs_file_write_iter+0x81/0x410 [btrfs]
[72747.569836] ? __kmalloc+0x16a/0x2c0
[72747.569839] do_iter_readv_writev+0x160/0x1c0
[72747.569843] do_iter_write+0x80/0x1b0
[72747.569847] vfs_writev+0x84/0x140
[72747.569869] ? btrfs_file_llseek+0x38/0x270 [btrfs]
[72747.569873] do_writev+0x65/0x100
[72747.569876] do_syscall_64+0x33/0x40
[72747.569879] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[72747.569899] task:fsstress state:D stack: 0 pid:841424 ppid:841417 flags:0x00004000
[72747.569903] Call Trace:
[72747.569906] __schedule+0x296/0x760
[72747.569909] schedule+0x3c/0xa0
[72747.569936] try_flush_qgroup+0x95/0x140 [btrfs]
[72747.569940] ? finish_wait+0x80/0x80
[72747.569967] __btrfs_qgroup_reserve_meta+0x36/0x50 [btrfs]
[72747.569989] start_transaction+0x279/0x580 [btrfs]
[72747.570014] clone_copy_inline_extent+0x332/0x490 [btrfs]
[72747.570041] btrfs_clone+0x5b7/0x7a0 [btrfs]
[72747.570068] ? lock_extent_bits+0x64/0x90 [btrfs]
[72747.570095] btrfs_clone_files+0xfc/0x150 [btrfs]
[72747.570122] btrfs_remap_file_range+0x3d8/0x4a0 [btrfs]
[72747.570126] do_clone_file_range+0xed/0x200
[72747.570131] vfs_clone_file_range+0x37/0x110
[72747.570134] ioctl_file_clone+0x7d/0xb0
[72747.570137] do_vfs_ioctl+0x138/0x630
[72747.570140] __x64_sys_ioctl+0x62/0xc0
[72747.570143] do_syscall_64+0x33/0x40
[72747.570146] entry_SYSCALL_64_after_hwframe+0x44/0xa9
So fix this by skipping the flush of delalloc for an inode that is
flagged with BTRFS_INODE_NO_DELALLOC_FLUSH, meaning it is currently under
such a special case of cloning an inline extent, when flushing delalloc
during qgroup metadata reservation.
The special cases for cloning inline extents were added in kernel 5.7 by
by commit 05a5a7621ce66c ("Btrfs: implement full reflink support for
inline extents"), while having qgroup metadata space reservation flushing
delalloc when low on space was added in kernel 5.9 by commit
c53e9653605dbf ("btrfs: qgroup: try to flush qgroup space when we get
-EDQUOT"). So use a "Fixes:" tag for the later commit to ease stable
kernel backports.
Reported-by: Wang Yugui <wangyugui(a)e16-tech.com>
Link: https://lore.kernel.org/linux-btrfs/20210421083137.31E3.409509F4@e16-tech.c…
Fixes: c53e9653605dbf ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
CC: stable(a)vger.kernel.org # 5.9+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Shivani: Modified to apply on 5.10.y, Passed false to
btrfs_start_delalloc_flush() in fs/btrfs/transaction.c file to
maintain the default behaviour]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
fs/btrfs/ctree.h | 2 +-
fs/btrfs/inode.c | 4 ++--
fs/btrfs/ioctl.c | 2 +-
fs/btrfs/qgroup.c | 2 +-
fs/btrfs/send.c | 4 ++--
fs/btrfs/transaction.c | 2 +-
6 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7ad3091db571..d9d6a57acafe 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3013,7 +3013,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
struct inode *inode, u64 new_size,
u32 min_type);
-int btrfs_start_delalloc_snapshot(struct btrfs_root *root);
+int btrfs_start_delalloc_snapshot(struct btrfs_root *root, bool in_reclaim_context);
int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, u64 nr,
bool in_reclaim_context);
int btrfs_set_extent_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8d7ca8a21525..99aad39fad13 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9566,7 +9566,7 @@ static int start_delalloc_inodes(struct btrfs_root *root,
return ret;
}
-int btrfs_start_delalloc_snapshot(struct btrfs_root *root)
+int btrfs_start_delalloc_snapshot(struct btrfs_root *root, bool in_reclaim_context)
{
struct writeback_control wbc = {
.nr_to_write = LONG_MAX,
@@ -9579,7 +9579,7 @@ int btrfs_start_delalloc_snapshot(struct btrfs_root *root)
if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
return -EROFS;
- return start_delalloc_inodes(root, &wbc, true, false);
+ return start_delalloc_inodes(root, &wbc, true, in_reclaim_context);
}
int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, u64 nr,
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 24c4d059cfab..9d5dfcec22de 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1030,7 +1030,7 @@ static noinline int btrfs_mksnapshot(const struct path *parent,
*/
btrfs_drew_read_lock(&root->snapshot_lock);
- ret = btrfs_start_delalloc_snapshot(root);
+ ret = btrfs_start_delalloc_snapshot(root, false);
if (ret)
goto out;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 95a39d535a82..bc1feb97698c 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3704,7 +3704,7 @@ static int try_flush_qgroup(struct btrfs_root *root)
return 0;
}
- ret = btrfs_start_delalloc_snapshot(root);
+ ret = btrfs_start_delalloc_snapshot(root, true);
if (ret < 0)
goto out;
btrfs_wait_ordered_extents(root, U64_MAX, 0, (u64)-1);
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 3e7bb24eb227..d86b4d13cae4 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -7207,7 +7207,7 @@ static int flush_delalloc_roots(struct send_ctx *sctx)
int i;
if (root) {
- ret = btrfs_start_delalloc_snapshot(root);
+ ret = btrfs_start_delalloc_snapshot(root, false);
if (ret)
return ret;
btrfs_wait_ordered_extents(root, U64_MAX, 0, U64_MAX);
@@ -7215,7 +7215,7 @@ static int flush_delalloc_roots(struct send_ctx *sctx)
for (i = 0; i < sctx->clone_roots_cnt; i++) {
root = sctx->clone_roots[i].root;
- ret = btrfs_start_delalloc_snapshot(root);
+ ret = btrfs_start_delalloc_snapshot(root, false);
if (ret)
return ret;
btrfs_wait_ordered_extents(root, U64_MAX, 0, U64_MAX);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 21a5a963c70e..424b1dd3fe27 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2045,7 +2045,7 @@ static inline int btrfs_start_delalloc_flush(struct btrfs_trans_handle *trans)
list_for_each_entry(pending, head, list) {
int ret;
- ret = btrfs_start_delalloc_snapshot(pending->root);
+ ret = btrfs_start_delalloc_snapshot(pending->root, false);
if (ret)
return ret;
}
--
2.40.4
From: Sean Christopherson <seanjc(a)google.com>
[ Upstream commit 17bcd714426386fda741a4bccd96a2870179344b ]
Free vCPUs before freeing any VM state, as both SVM and VMX may access
VM state when "freeing" a vCPU that is currently "in" L2, i.e. that needs
to be kicked out of nested guest mode.
Commit 6fcee03df6a1 ("KVM: x86: avoid loading a vCPU after .vm_destroy was
called") partially fixed the issue, but for unknown reasons only moved the
MMU unloading before VM destruction. Complete the change, and free all
vCPU state prior to destroying VM state, as nVMX accesses even more state
than nSVM.
In addition to the AVIC, KVM can hit a use-after-free on MSR filters:
kvm_msr_allowed+0x4c/0xd0
__kvm_set_msr+0x12d/0x1e0
kvm_set_msr+0x19/0x40
load_vmcs12_host_state+0x2d8/0x6e0 [kvm_intel]
nested_vmx_vmexit+0x715/0xbd0 [kvm_intel]
nested_vmx_free_vcpu+0x33/0x50 [kvm_intel]
vmx_free_vcpu+0x54/0xc0 [kvm_intel]
kvm_arch_vcpu_destroy+0x28/0xf0
kvm_vcpu_destroy+0x12/0x50
kvm_arch_destroy_vm+0x12c/0x1c0
kvm_put_kvm+0x263/0x3c0
kvm_vm_release+0x21/0x30
and an upcoming fix to process injectable interrupts on nested VM-Exit
will access the PIC:
BUG: kernel NULL pointer dereference, address: 0000000000000090
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
CPU: 23 UID: 1000 PID: 2658 Comm: kvm-nx-lpage-re
RIP: 0010:kvm_cpu_has_extint+0x2f/0x60 [kvm]
Call Trace:
<TASK>
kvm_cpu_has_injectable_intr+0xe/0x60 [kvm]
nested_vmx_vmexit+0x2d7/0xdf0 [kvm_intel]
nested_vmx_free_vcpu+0x40/0x50 [kvm_intel]
vmx_vcpu_free+0x2d/0x80 [kvm_intel]
kvm_arch_vcpu_destroy+0x2d/0x130 [kvm]
kvm_destroy_vcpus+0x8a/0x100 [kvm]
kvm_arch_destroy_vm+0xa7/0x1d0 [kvm]
kvm_destroy_vm+0x172/0x300 [kvm]
kvm_vcpu_release+0x31/0x50 [kvm]
Inarguably, both nSVM and nVMX need to be fixed, but punt on those
cleanups for the moment. Conceptually, vCPUs should be freed before VM
state. Assets like the I/O APIC and PIC _must_ be allocated before vCPUs
are created, so it stands to reason that they must be freed _after_ vCPUs
are destroyed.
Reported-by: Aaron Lewis <aaronlewis(a)google.com>
Closes: https://lore.kernel.org/all/20240703175618.2304869-2-aaronlewis@google.com
Cc: Jim Mattson <jmattson(a)google.com>
Cc: Yan Zhao <yan.y.zhao(a)intel.com>
Cc: Rick P Edgecombe <rick.p.edgecombe(a)intel.com>
Cc: Kai Huang <kai.huang(a)intel.com>
Cc: Isaku Yamahata <isaku.yamahata(a)intel.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Kevin Cheng <chengkev(a)google.com>
---
arch/x86/kvm/x86.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f378d479fea3f..7f91b11e6f0ec 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12888,11 +12888,11 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
mutex_unlock(&kvm->slots_lock);
}
kvm_unload_vcpu_mmus(kvm);
+ kvm_destroy_vcpus(kvm);
kvm_x86_call(vm_destroy)(kvm);
kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
kvm_pic_destroy(kvm);
kvm_ioapic_destroy(kvm);
- kvm_destroy_vcpus(kvm);
kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
kvm_mmu_uninit_vm(kvm);
--
2.50.1.470.g6ba607880d-goog
From: Liu Shixin <liushixin2(a)huawei.com>
commit f1897f2f08b28ae59476d8b73374b08f856973af upstream.
syzkaller reported such a BUG_ON():
------------[ cut here ]------------
kernel BUG at mm/khugepaged.c:1835!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G W 6.13.0-rc6 #22
Tainted: [W]=WARN
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : collapse_file+0xa44/0x1400
lr : collapse_file+0x88/0x1400
sp : ffff80008afe3a60
...
Call trace:
collapse_file+0xa44/0x1400 (P)
hpage_collapse_scan_file+0x278/0x400
madvise_collapse+0x1bc/0x678
madvise_vma_behavior+0x32c/0x448
madvise_walk_vmas.constprop.0+0xbc/0x140
do_madvise.part.0+0xdc/0x2c8
__arm64_sys_madvise+0x68/0x88
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0x128
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x190/0x198
This indicates that the pgoff is unaligned. After analysis, I confirm the
vma is mapped to /dev/zero. Such a vma certainly has vm_file, but it is
set to anonymous by mmap_zero(). So even if it's mmapped by 2m-unaligned,
it can pass the check in thp_vma_allowable_order() as it is an
anonymous-mmap, but then be collapsed as a file-mmap.
It seems the problem has existed for a long time, but actually, since we
have khugepaged_max_ptes_none check before, we will skip collapse it as it
is /dev/zero and so has no present page. But commit d8ea7cc8547c limit
the check for only khugepaged, so the BUG_ON() can be triggered by
madvise_collapse().
Add vma_is_anonymous() check to make such vma be processed by
hpage_collapse_scan_pmd().
Link: https://lkml.kernel.org/r/20250111034511.2223353-1-liushixin2@huawei.com
Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mattew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[acsjakub: backport, clean apply]
Cc: Jakub Acs <acsjakub(a)amazon.de>
Cc: linux-mm(a)kvack.org
---
Ran into the crash with syzkaller, backporting this patch works - the
reproducer no longer crashes.
Please let me know if there was a reason not to backport.
mm/khugepaged.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b538c3d48386..abd5764e4864 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2404,7 +2404,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma,
khugepaged_scan.address);
@@ -2750,7 +2750,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
nodes_clear(cc->alloc_nmask);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma, addr);
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
From: Liu Shixin <liushixin2(a)huawei.com>
commit f1897f2f08b28ae59476d8b73374b08f856973af upstream.
syzkaller reported such a BUG_ON():
------------[ cut here ]------------
kernel BUG at mm/khugepaged.c:1835!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G W 6.13.0-rc6 #22
Tainted: [W]=WARN
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : collapse_file+0xa44/0x1400
lr : collapse_file+0x88/0x1400
sp : ffff80008afe3a60
...
Call trace:
collapse_file+0xa44/0x1400 (P)
hpage_collapse_scan_file+0x278/0x400
madvise_collapse+0x1bc/0x678
madvise_vma_behavior+0x32c/0x448
madvise_walk_vmas.constprop.0+0xbc/0x140
do_madvise.part.0+0xdc/0x2c8
__arm64_sys_madvise+0x68/0x88
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0x128
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x190/0x198
This indicates that the pgoff is unaligned. After analysis, I confirm the
vma is mapped to /dev/zero. Such a vma certainly has vm_file, but it is
set to anonymous by mmap_zero(). So even if it's mmapped by 2m-unaligned,
it can pass the check in thp_vma_allowable_order() as it is an
anonymous-mmap, but then be collapsed as a file-mmap.
It seems the problem has existed for a long time, but actually, since we
have khugepaged_max_ptes_none check before, we will skip collapse it as it
is /dev/zero and so has no present page. But commit d8ea7cc8547c limit
the check for only khugepaged, so the BUG_ON() can be triggered by
madvise_collapse().
Add vma_is_anonymous() check to make such vma be processed by
hpage_collapse_scan_pmd().
Link: https://lkml.kernel.org/r/20250111034511.2223353-1-liushixin2@huawei.com
Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mattew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[acsjakub: backport, clean apply]
Cc: Jakub Acs <acsjakub(a)amazon.de>
Cc: linux-mm(a)kvack.org
---
Ran into the crash with syzkaller, backporting this patch works - the
reproducer no longer crashes.
Please let me know if there was a reason not to backport.
mm/khugepaged.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b538c3d48386..abd5764e4864 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2404,7 +2404,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma,
khugepaged_scan.address);
@@ -2750,7 +2750,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
nodes_clear(cc->alloc_nmask);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma, addr);
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
From: Ming Lei <ming.lei(a)redhat.com>
[ Upstream commit a647a524a46736786c95cdb553a070322ca096e3 ]
rq_qos framework is only applied on request based driver, so:
1) rq_qos_done_bio() needn't to be called for bio based driver
2) rq_qos_done_bio() needn't to be called for bio which isn't tracked,
such as bios ended from error handling code.
Especially in bio_endio():
1) request queue is referred via bio->bi_bdev->bd_disk->queue, which
may be gone since request queue refcount may not be held in above two
cases
2) q->rq_qos may be freed in blk_cleanup_queue() when calling into
__rq_qos_done_bio()
Fix the potential kernel panic by not calling rq_qos_ops->done_bio if
the bio isn't tracked. This way is safe because both ioc_rqos_done_bio()
and blkcg_iolatency_done_bio() are nop if the bio isn't tracked.
Reported-by: Yu Kuai <yukuai3(a)huawei.com>
Cc: tj(a)kernel.org
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Acked-by: Tejun Heo <tj(a)kernel.org>
Link: https://lore.kernel.org/r/20210924110704.1541818-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Shivani: Modified to apply on 5.10.y]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
block/bio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/bio.c b/block/bio.c
index 88a09c31095f..7851f54edc76 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1430,7 +1430,7 @@ void bio_endio(struct bio *bio)
if (!bio_integrity_endio(bio))
return;
- if (bio->bi_disk)
+ if (bio->bi_disk && bio_flagged(bio, BIO_TRACKED))
rq_qos_done_bio(bio->bi_disk->queue, bio);
/*
--
2.40.4
From: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
commit ac2140449184a26eac99585b7f69814bd3ba8f2d upstream.
This commit addresses a potential null pointer dereference issue in the
`dcn32_acquire_idle_pipe_for_head_pipe_in_layer` function. The issue
could occur when `head_pipe` is null.
The fix adds a check to ensure `head_pipe` is not null before asserting
it. If `head_pipe` is null, the function returns NULL to prevent a
potential null pointer dereference.
Reported by smatch:
drivers/gpu/drm/amd/amdgpu/../display/dc/resource/dcn32/dcn32_resource.c:2690 dcn32_acquire_idle_pipe_for_head_pipe_in_layer() error: we previously assumed 'head_pipe' could be null (see line 2681)
Cc: Tom Chung <chiahsuan.chung(a)amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Cc: Roman Li <roman.li(a)amd.com>
Cc: Alex Hung <alex.hung(a)amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
[ Daniil: dcn32 was moved from drivers/gpu/drm/amd/display/dc to
drivers/gpu/drm/amd/display/dc/resource since commit
8b8eed05a1c6 ("drm/amd/display: Refactor resource into component directory").
The path is changed accordingly to apply the patch on 6.1.y. and 6.6.y ]
Signed-off-by: Daniil Dulov <d.dulov(a)aladdin.ru>
---
Backport fix for CVE-2024-49918
drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
index 1b1534ffee9f..591c3166a468 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
@@ -2563,8 +2563,10 @@ struct pipe_ctx *dcn32_acquire_idle_pipe_for_head_pipe_in_layer(
struct resource_context *old_ctx = &stream->ctx->dc->current_state->res_ctx;
int head_index;
- if (!head_pipe)
+ if (!head_pipe) {
ASSERT(0);
+ return NULL;
+ }
/*
* Modified from dcn20_acquire_idle_pipe_for_layer
--
2.34.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 4ff12d82dac119b4b99b5a78b5af3bf2474c0a36
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072801-game-appendix-1d20@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4ff12d82dac119b4b99b5a78b5af3bf2474c0a36 Mon Sep 17 00:00:00 2001
From: Haoxiang Li <haoxiang_li2024(a)163.com>
Date: Thu, 3 Jul 2025 17:52:32 +0800
Subject: [PATCH] ice: Fix a null pointer dereference in
ice_copy_and_init_pkg()
Add check for the return value of devm_kmemdup()
to prevent potential null pointer dereference.
Fixes: c76488109616 ("ice: Implement Dynamic Device Personalization (DDP) download")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski(a)linux.intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov(a)intel.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Tested-by: Rinitha S <sx.rinitha(a)intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c b/drivers/net/ethernet/intel/ice/ice_ddp.c
index 59323c019544..351824dc3c62 100644
--- a/drivers/net/ethernet/intel/ice/ice_ddp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ddp.c
@@ -2301,6 +2301,8 @@ enum ice_ddp_state ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf,
return ICE_DDP_PKG_ERR;
buf_copy = devm_kmemdup(ice_hw_to_dev(hw), buf, len, GFP_KERNEL);
+ if (!buf_copy)
+ return ICE_DDP_PKG_ERR;
state = ice_init_pkg(hw, buf_copy, len);
if (!ice_is_init_pkg_successful(state)) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 4ff12d82dac119b4b99b5a78b5af3bf2474c0a36
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072801-groove-marauding-e4a9@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4ff12d82dac119b4b99b5a78b5af3bf2474c0a36 Mon Sep 17 00:00:00 2001
From: Haoxiang Li <haoxiang_li2024(a)163.com>
Date: Thu, 3 Jul 2025 17:52:32 +0800
Subject: [PATCH] ice: Fix a null pointer dereference in
ice_copy_and_init_pkg()
Add check for the return value of devm_kmemdup()
to prevent potential null pointer dereference.
Fixes: c76488109616 ("ice: Implement Dynamic Device Personalization (DDP) download")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski(a)linux.intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov(a)intel.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Tested-by: Rinitha S <sx.rinitha(a)intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c b/drivers/net/ethernet/intel/ice/ice_ddp.c
index 59323c019544..351824dc3c62 100644
--- a/drivers/net/ethernet/intel/ice/ice_ddp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ddp.c
@@ -2301,6 +2301,8 @@ enum ice_ddp_state ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf,
return ICE_DDP_PKG_ERR;
buf_copy = devm_kmemdup(ice_hw_to_dev(hw), buf, len, GFP_KERNEL);
+ if (!buf_copy)
+ return ICE_DDP_PKG_ERR;
state = ice_init_pkg(hw, buf_copy, len);
if (!ice_is_init_pkg_successful(state)) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 4ff12d82dac119b4b99b5a78b5af3bf2474c0a36
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072858-pentagon-gumball-b400@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4ff12d82dac119b4b99b5a78b5af3bf2474c0a36 Mon Sep 17 00:00:00 2001
From: Haoxiang Li <haoxiang_li2024(a)163.com>
Date: Thu, 3 Jul 2025 17:52:32 +0800
Subject: [PATCH] ice: Fix a null pointer dereference in
ice_copy_and_init_pkg()
Add check for the return value of devm_kmemdup()
to prevent potential null pointer dereference.
Fixes: c76488109616 ("ice: Implement Dynamic Device Personalization (DDP) download")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski(a)linux.intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov(a)intel.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Tested-by: Rinitha S <sx.rinitha(a)intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c b/drivers/net/ethernet/intel/ice/ice_ddp.c
index 59323c019544..351824dc3c62 100644
--- a/drivers/net/ethernet/intel/ice/ice_ddp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ddp.c
@@ -2301,6 +2301,8 @@ enum ice_ddp_state ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf,
return ICE_DDP_PKG_ERR;
buf_copy = devm_kmemdup(ice_hw_to_dev(hw), buf, len, GFP_KERNEL);
+ if (!buf_copy)
+ return ICE_DDP_PKG_ERR;
state = ice_init_pkg(hw, buf_copy, len);
if (!ice_is_init_pkg_successful(state)) {
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 9e0c433d0c05fde284025264b89eaa4ad59f0a3e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072852-brewing-moonrise-04ac@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9e0c433d0c05fde284025264b89eaa4ad59f0a3e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com>
Date: Thu, 10 Jul 2025 23:17:12 +0300
Subject: [PATCH] drm/i915/dp: Fix 2.7 Gbps DP_LINK_BW value on g4x
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
On g4x we currently use the 96MHz non-SSC refclk, which can't actually
generate an exact 2.7 Gbps link rate. In practice we end up with 2.688
Gbps which seems to be close enough to actually work, but link training
is currently failing due to miscalculating the DP_LINK_BW value (we
calcualte it directly from port_clock which reflects the actual PLL
outpout frequency).
Ideas how to fix this:
- nudge port_clock back up to 270000 during PLL computation/readout
- track port_clock and the nominal link rate separately so they might
differ a bit
- switch to the 100MHz refclk, but that one should be SSC so perhaps
not something we want
While we ponder about a better solution apply some band aid to the
immediate issue of miscalculated DP_LINK_BW value. With this
I can again use 2.7 Gbps link rate on g4x.
Cc: stable(a)vger.kernel.org
Fixes: 665a7b04092c ("drm/i915: Feed the DPLL output freq back into crtc_state")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250710201718.25310-2-ville.…
Reviewed-by: Imre Deak <imre.deak(a)intel.com>
(cherry picked from commit a8b874694db5cae7baaf522756f87acd956e6e66)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index 640c43bf62d4..724de7ed3c04 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -1604,6 +1604,12 @@ int intel_dp_rate_select(struct intel_dp *intel_dp, int rate)
void intel_dp_compute_rate(struct intel_dp *intel_dp, int port_clock,
u8 *link_bw, u8 *rate_select)
{
+ struct intel_display *display = to_intel_display(intel_dp);
+
+ /* FIXME g4x can't generate an exact 2.7GHz with the 96MHz non-SSC refclk */
+ if (display->platform.g4x && port_clock == 268800)
+ port_clock = 270000;
+
/* eDP 1.4 rate select method. */
if (intel_dp->use_rate_select) {
*link_bw = 0;
Revert kvec msg iterator before trying to process a TLS alert
when possible.
In nvmet_tcp_try_recv_data(), it's assumed that no msg control
message buffer is set prior to sock_recvmsg(). If no control
message structure is setup, kTLS layer will read and process
TLS data record types. As soon as it encounters a TLS control
message, it would return an error. At that point, we setup a kvec
backed control buffer and read in the control message such as
a TLS alert. Msg can advance the kvec pointer as a part of the
copy process thus we need to revert the iterator before calling
into the tls_alert_recv.
Fixes: a1c5dd8355b1 ("nvmet-tcp: control messages for recvmsg()")
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Olga Kornievskaia <okorniev(a)redhat.com>
---
drivers/nvme/target/tcp.c | 48 +++++++++++++++++++++++++++++++--------
1 file changed, 38 insertions(+), 10 deletions(-)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 688033b88d38..cf3336ddc9a3 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1161,6 +1161,7 @@ static int nvmet_tcp_try_recv_pdu(struct nvmet_tcp_queue *queue)
if (unlikely(len < 0))
return len;
if (queue->tls_pskid) {
+ iov_iter_revert(&msg.msg_iter, len);
ret = nvmet_tcp_tls_record_ok(queue, &msg, cbuf);
if (ret < 0)
return ret;
@@ -1218,18 +1219,47 @@ static int nvmet_tcp_try_recv_data(struct nvmet_tcp_queue *queue)
{
struct nvmet_tcp_cmd *cmd = queue->cmd;
int len, ret;
+ union {
+ struct cmsghdr cmsg;
+ u8 buf[CMSG_SPACE(sizeof(u8))];
+ } u;
+ u8 alert[2];
+ struct kvec alert_kvec = {
+ .iov_base = alert,
+ .iov_len = sizeof(alert),
+ };
+ struct msghdr msg = {
+ .msg_control = &u,
+ .msg_controllen = sizeof(u),
+ };
while (msg_data_left(&cmd->recv_msg)) {
+ /* assumed that cmg->recv_msg's control buffer is not setup
+ */
+ WARN_ON_ONCE(cmd->recv_msg.msg_controllen > 0);
+
len = sock_recvmsg(cmd->queue->sock, &cmd->recv_msg,
cmd->recv_msg.msg_flags);
+ if (cmd->recv_msg.msg_flags & MSG_CTRUNC) {
+ cmd->recv_msg.msg_flags &= ~(MSG_CTRUNC | MSG_EOF);
+ if (len == 0 || len == -EIO) {
+ iov_iter_kvec(&msg.msg_iter, ITER_DEST, &alert_kvec,
+ 1, alert_kvec.iov_len);
+ len = sock_recvmsg(cmd->queue->sock, &msg,
+ MSG_DONTWAIT);
+ if (len > 0 &&
+ tls_get_record_type(cmd->queue->sock->sk,
+ &u.cmsg) == TLS_RECORD_TYPE_ALERT) {
+ iov_iter_revert(&msg.msg_iter, len);
+ ret = nvmet_tcp_tls_record_ok(cmd->queue,
+ &msg, u.buf);
+ if (ret < 0)
+ return ret;
+ }
+ }
+ }
if (len <= 0)
return len;
- if (queue->tls_pskid) {
- ret = nvmet_tcp_tls_record_ok(cmd->queue,
- &cmd->recv_msg, cmd->recv_cbuf);
- if (ret < 0)
- return ret;
- }
cmd->pdu_recv += len;
cmd->rbytes_done += len;
@@ -1267,6 +1297,7 @@ static int nvmet_tcp_try_recv_ddgst(struct nvmet_tcp_queue *queue)
if (unlikely(len < 0))
return len;
if (queue->tls_pskid) {
+ iov_iter_revert(&msg.msg_iter, len);
ret = nvmet_tcp_tls_record_ok(queue, &msg, cbuf);
if (ret < 0)
return ret;
@@ -1453,10 +1484,6 @@ static int nvmet_tcp_alloc_cmd(struct nvmet_tcp_queue *queue,
if (!c->r2t_pdu)
goto out_free_data;
- if (queue->state == NVMET_TCP_Q_TLS_HANDSHAKE) {
- c->recv_msg.msg_control = c->recv_cbuf;
- c->recv_msg.msg_controllen = sizeof(c->recv_cbuf);
- }
c->recv_msg.msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
list_add_tail(&c->entry, &queue->free_list);
@@ -1736,6 +1763,7 @@ static int nvmet_tcp_try_peek_pdu(struct nvmet_tcp_queue *queue)
return len;
}
+ iov_iter_revert(&msg.msg_iter, len);
ret = nvmet_tcp_tls_record_ok(queue, &msg, cbuf);
if (ret < 0)
return ret;
--
2.47.1
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 9e0c433d0c05fde284025264b89eaa4ad59f0a3e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072822-armrest-dominoes-7934@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9e0c433d0c05fde284025264b89eaa4ad59f0a3e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com>
Date: Thu, 10 Jul 2025 23:17:12 +0300
Subject: [PATCH] drm/i915/dp: Fix 2.7 Gbps DP_LINK_BW value on g4x
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
On g4x we currently use the 96MHz non-SSC refclk, which can't actually
generate an exact 2.7 Gbps link rate. In practice we end up with 2.688
Gbps which seems to be close enough to actually work, but link training
is currently failing due to miscalculating the DP_LINK_BW value (we
calcualte it directly from port_clock which reflects the actual PLL
outpout frequency).
Ideas how to fix this:
- nudge port_clock back up to 270000 during PLL computation/readout
- track port_clock and the nominal link rate separately so they might
differ a bit
- switch to the 100MHz refclk, but that one should be SSC so perhaps
not something we want
While we ponder about a better solution apply some band aid to the
immediate issue of miscalculated DP_LINK_BW value. With this
I can again use 2.7 Gbps link rate on g4x.
Cc: stable(a)vger.kernel.org
Fixes: 665a7b04092c ("drm/i915: Feed the DPLL output freq back into crtc_state")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250710201718.25310-2-ville.…
Reviewed-by: Imre Deak <imre.deak(a)intel.com>
(cherry picked from commit a8b874694db5cae7baaf522756f87acd956e6e66)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index 640c43bf62d4..724de7ed3c04 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -1604,6 +1604,12 @@ int intel_dp_rate_select(struct intel_dp *intel_dp, int rate)
void intel_dp_compute_rate(struct intel_dp *intel_dp, int port_clock,
u8 *link_bw, u8 *rate_select)
{
+ struct intel_display *display = to_intel_display(intel_dp);
+
+ /* FIXME g4x can't generate an exact 2.7GHz with the 96MHz non-SSC refclk */
+ if (display->platform.g4x && port_clock == 268800)
+ port_clock = 270000;
+
/* eDP 1.4 rate select method. */
if (intel_dp->use_rate_select) {
*link_bw = 0;
Scott Mayhew discovered a security exploit in NFS over TLS in
tls_alert_recv() due to its assumption it can read data from
the msg iterator's kvec..
kTLS implementation splits TLS non-data record payload between
the control message buffer (which includes the type such as TLS
aler or TLS cipher change) and the rest of the payload (say TLS
alert's level/description) which goes into the msg payload buffer.
This patch proposes to rework how control messages are setup and
used by sock_recvmsg().
If no control message structure is setup, kTLS layer will read and
process TLS data record types. As soon as it encounters a TLS control
message, it would return an error. At that point, NFS can setup a
kvec backed msg buffer and read in the control message such as a
TLS alert. Msg iterator can advance the kvec pointer as a part of
the copy process thus we need to revert the iterator before calling
into the tls_alert_recv.
Reported-by: Scott Mayhew <smayhew(a)redhat.com>
Fixes: 5e052dda121e ("SUNRPC: Recognize control messages in server-side TCP socket code")
Suggested-by: Trond Myklebust <trondmy(a)hammerspace.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Olga Kornievskaia <okorniev(a)redhat.com>
---
net/sunrpc/svcsock.c | 43 +++++++++++++++++++++++++++++++++++--------
1 file changed, 35 insertions(+), 8 deletions(-)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 46c156b121db..e2c5e0e626f9 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -257,20 +257,47 @@ svc_tcp_sock_process_cmsg(struct socket *sock, struct msghdr *msg,
}
static int
-svc_tcp_sock_recv_cmsg(struct svc_sock *svsk, struct msghdr *msg)
+svc_tcp_sock_recv_cmsg(struct socket *sock, unsigned int *msg_flags)
{
union {
struct cmsghdr cmsg;
u8 buf[CMSG_SPACE(sizeof(u8))];
} u;
- struct socket *sock = svsk->sk_sock;
+ u8 alert[2];
+ struct kvec alert_kvec = {
+ .iov_base = alert,
+ .iov_len = sizeof(alert),
+ };
+ struct msghdr msg = {
+ .msg_flags = *msg_flags,
+ .msg_control = &u,
+ .msg_controllen = sizeof(u),
+ };
+ int ret;
+
+ iov_iter_kvec(&msg.msg_iter, ITER_DEST, &alert_kvec, 1,
+ alert_kvec.iov_len);
+ ret = sock_recvmsg(sock, &msg, MSG_DONTWAIT);
+ if (ret > 0 &&
+ tls_get_record_type(sock->sk, &u.cmsg) == TLS_RECORD_TYPE_ALERT) {
+ iov_iter_revert(&msg.msg_iter, ret);
+ ret = svc_tcp_sock_process_cmsg(sock, &msg, &u.cmsg, -EAGAIN);
+ }
+ return ret;
+}
+
+static int
+svc_tcp_sock_recvmsg(struct svc_sock *svsk, struct msghdr *msg)
+{
int ret;
+ struct socket *sock = svsk->sk_sock;
- msg->msg_control = &u;
- msg->msg_controllen = sizeof(u);
ret = sock_recvmsg(sock, msg, MSG_DONTWAIT);
- if (unlikely(msg->msg_controllen != sizeof(u)))
- ret = svc_tcp_sock_process_cmsg(sock, msg, &u.cmsg, ret);
+ if (msg->msg_flags & MSG_CTRUNC) {
+ msg->msg_flags &= ~(MSG_CTRUNC | MSG_EOR);
+ if (ret == 0 || ret == -EIO)
+ ret = svc_tcp_sock_recv_cmsg(sock, &msg->msg_flags);
+ }
return ret;
}
@@ -321,7 +348,7 @@ static ssize_t svc_tcp_read_msg(struct svc_rqst *rqstp, size_t buflen,
iov_iter_advance(&msg.msg_iter, seek);
buflen -= seek;
}
- len = svc_tcp_sock_recv_cmsg(svsk, &msg);
+ len = svc_tcp_sock_recvmsg(svsk, &msg);
if (len > 0)
svc_flush_bvec(bvec, len, seek);
@@ -1018,7 +1045,7 @@ static ssize_t svc_tcp_read_marker(struct svc_sock *svsk,
iov.iov_base = ((char *)&svsk->sk_marker) + svsk->sk_tcplen;
iov.iov_len = want;
iov_iter_kvec(&msg.msg_iter, ITER_DEST, &iov, 1, want);
- len = svc_tcp_sock_recv_cmsg(svsk, &msg);
+ len = svc_tcp_sock_recvmsg(svsk, &msg);
if (len < 0)
return len;
svsk->sk_tcplen += len;
--
2.47.1
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 9e0c433d0c05fde284025264b89eaa4ad59f0a3e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072832-cannabis-shout-55f6@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9e0c433d0c05fde284025264b89eaa4ad59f0a3e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com>
Date: Thu, 10 Jul 2025 23:17:12 +0300
Subject: [PATCH] drm/i915/dp: Fix 2.7 Gbps DP_LINK_BW value on g4x
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
On g4x we currently use the 96MHz non-SSC refclk, which can't actually
generate an exact 2.7 Gbps link rate. In practice we end up with 2.688
Gbps which seems to be close enough to actually work, but link training
is currently failing due to miscalculating the DP_LINK_BW value (we
calcualte it directly from port_clock which reflects the actual PLL
outpout frequency).
Ideas how to fix this:
- nudge port_clock back up to 270000 during PLL computation/readout
- track port_clock and the nominal link rate separately so they might
differ a bit
- switch to the 100MHz refclk, but that one should be SSC so perhaps
not something we want
While we ponder about a better solution apply some band aid to the
immediate issue of miscalculated DP_LINK_BW value. With this
I can again use 2.7 Gbps link rate on g4x.
Cc: stable(a)vger.kernel.org
Fixes: 665a7b04092c ("drm/i915: Feed the DPLL output freq back into crtc_state")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250710201718.25310-2-ville.…
Reviewed-by: Imre Deak <imre.deak(a)intel.com>
(cherry picked from commit a8b874694db5cae7baaf522756f87acd956e6e66)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index 640c43bf62d4..724de7ed3c04 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -1604,6 +1604,12 @@ int intel_dp_rate_select(struct intel_dp *intel_dp, int rate)
void intel_dp_compute_rate(struct intel_dp *intel_dp, int port_clock,
u8 *link_bw, u8 *rate_select)
{
+ struct intel_display *display = to_intel_display(intel_dp);
+
+ /* FIXME g4x can't generate an exact 2.7GHz with the 96MHz non-SSC refclk */
+ if (display->platform.g4x && port_clock == 268800)
+ port_clock = 270000;
+
/* eDP 1.4 rate select method. */
if (intel_dp->use_rate_select) {
*link_bw = 0;
Hello,
The following is the original thread, where a bug was reported to the
linux-wireless and ath10k mailing lists. The specific bug has been
detailed clearly here.
https://lore.kernel.org/linux-wireless/690B1DB2-C9DC-4FAD-8063-4CED659B1701…
There is also a Bugzilla report by me, which was opened later:
https://bugzilla.kernel.org/show_bug.cgi?id=220264
As stated, it is highly encouraged to check out all the logs,
especially the line of IRQ #16 in /proc/interrupts.
Here is where all the logs are:
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180
(these logs are taken from an Arch liveboot)
On my daily driver, I found these on my IRQ #16:
16: 173210 0 0 0 IR-IO-APIC
16-fasteoi i2c_designware.0, idma64.0, i801_smbus
The fixes stated on the Reddit post for this Wi-Fi card didn't quite
work. (But git-cloning the firmware files did give me some more time
to have stable internet)
This time, I had to go for the GRUB kernel parameters.
Right now, I'm using "irqpoll" to curb the errors caused.
"intel_iommu=off" did not work, and the Wi-Fi was constantly crashing
even then. Did not try out "pci=noaer" this time.
If it's of any concern, there is a very weird error in Chromium-based
browsers which has only happened after I started using irqpoll. When I
Google something, the background of the individual result boxes shows
as pure black, while the surrounding space is the usual
greyish-blackish, like we see in Dark Mode. Here is a picture of the
exact thing I'm experiencing: https://files.catbox.moe/mjew6g.png
If you notice anything in my logs/bug reports, please let me know.
(Because it seems like Wi-Fi errors are just a red herring, there are
some ACPI or PCIe-related errors in the computers of this model - just
a naive speculation, though.)
Thanking you,
Bandhan Pramanik
Hi!
So... I'm afraid subject is pretty accurate. I assume there's actual
human being called "Sasha Levin" somewhere, but I interact with him
via email, and while some interactions may be by human, some are
written by LLM but not clearly marked as such.
And that's not okay -- because LLMs lie, have no ethics, and no
memory, so there's no point arguing with them. Its just wasting
everyone's time. People are not very thrilled by 'Markus Elfring' on
the lists, as he seems to ignore feedback, but at least that's actual
human, not a damn LLM that interacts as human but then ignores
everything.
Do we need bot rules on the list?
Oh, and if you find my email offensive, feel free to ask LLM to change
the tone.
Best regards,
Pavel
--
I don't work for Nazis and criminals, and neither should you.
Boycott Putin, Trump, and Musk!
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 15f77764e90a713ee3916ca424757688e4f565b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072855-spongy-yodel-58e6@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 15f77764e90a713ee3916ca424757688e4f565b9 Mon Sep 17 00:00:00 2001
From: "Lin.Cao" <lincao12(a)amd.com>
Date: Thu, 17 Jul 2025 16:44:53 +0800
Subject: [PATCH] drm/sched: Remove optimization that causes hang when killing
dependent jobs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When application A submits jobs and application B submits a job with a
dependency on A's fence, the normal flow wakes up the scheduler after
processing each job. However, the optimization in
drm_sched_entity_add_dependency_cb() uses a callback that only clears
dependencies without waking up the scheduler.
When application A is killed before its jobs can run, the callback gets
triggered but only clears the dependency without waking up the scheduler,
causing the scheduler to enter sleep state and application B to hang.
Remove the optimization by deleting drm_sched_entity_clear_dep() and its
usage, ensuring the scheduler is always woken up when dependencies are
cleared.
Fixes: 777dbd458c89 ("drm/amdgpu: drop a dummy wakeup scheduler")
Cc: stable(a)vger.kernel.org # v4.6+
Signed-off-by: Lin.Cao <lincao12(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
Link: https://lore.kernel.org/r/20250717084453.921097-1-lincao12@amd.com
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index e671aa241720..ac678de7fe5e 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -355,17 +355,6 @@ void drm_sched_entity_destroy(struct drm_sched_entity *entity)
}
EXPORT_SYMBOL(drm_sched_entity_destroy);
-/* drm_sched_entity_clear_dep - callback to clear the entities dependency */
-static void drm_sched_entity_clear_dep(struct dma_fence *f,
- struct dma_fence_cb *cb)
-{
- struct drm_sched_entity *entity =
- container_of(cb, struct drm_sched_entity, cb);
-
- entity->dependency = NULL;
- dma_fence_put(f);
-}
-
/*
* drm_sched_entity_wakeup - callback to clear the entity's dependency and
* wake up the scheduler
@@ -376,7 +365,8 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
struct drm_sched_entity *entity =
container_of(cb, struct drm_sched_entity, cb);
- drm_sched_entity_clear_dep(f, cb);
+ entity->dependency = NULL;
+ dma_fence_put(f);
drm_sched_wakeup(entity->rq->sched);
}
@@ -429,13 +419,6 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
fence = dma_fence_get(&s_fence->scheduled);
dma_fence_put(entity->dependency);
entity->dependency = fence;
- if (!dma_fence_add_callback(fence, &entity->cb,
- drm_sched_entity_clear_dep))
- return true;
-
- /* Ignore it when it is already scheduled */
- dma_fence_put(fence);
- return false;
}
if (!dma_fence_add_callback(entity->dependency, &entity->cb,
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x d42e6c20de6192f8e4ab4cf10be8c694ef27e8cb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072811-pension-lyrics-6070@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d42e6c20de6192f8e4ab4cf10be8c694ef27e8cb Mon Sep 17 00:00:00 2001
From: Ada Couprie Diaz <ada.coupriediaz(a)arm.com>
Date: Fri, 18 Jul 2025 15:28:14 +0100
Subject: [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(),
call_on_irq_stack()
`cpu_switch_to()` and `call_on_irq_stack()` manipulate SP to change
to different stacks along with the Shadow Call Stack if it is enabled.
Those two stack changes cannot be done atomically and both functions
can be interrupted by SErrors or Debug Exceptions which, though unlikely,
is very much broken : if interrupted, we can end up with mismatched stacks
and Shadow Call Stack leading to clobbered stacks.
In `cpu_switch_to()`, it can happen when SP_EL0 points to the new task,
but x18 stills points to the old task's SCS. When the interrupt handler
tries to save the task's SCS pointer, it will save the old task
SCS pointer (x18) into the new task struct (pointed to by SP_EL0),
clobbering it.
In `call_on_irq_stack()`, it can happen when switching from the task stack
to the IRQ stack and when switching back. In both cases, we can be
interrupted when the SCS pointer points to the IRQ SCS, but SP points to
the task stack. The nested interrupt handler pushes its return addresses
on the IRQ SCS. It then detects that SP points to the task stack,
calls `call_on_irq_stack()` and clobbers the task SCS pointer with
the IRQ SCS pointer, which it will also use !
This leads to tasks returning to addresses on the wrong SCS,
or even on the IRQ SCS, triggering kernel panics via CONFIG_VMAP_STACK
or FPAC if enabled.
This is possible on a default config, but unlikely.
However, when enabling CONFIG_ARM64_PSEUDO_NMI, DAIF is unmasked and
instead the GIC is responsible for filtering what interrupts the CPU
should receive based on priority.
Given the goal of emulating NMIs, pseudo-NMIs can be received by the CPU
even in `cpu_switch_to()` and `call_on_irq_stack()`, possibly *very*
frequently depending on the system configuration and workload, leading
to unpredictable kernel panics.
Completely mask DAIF in `cpu_switch_to()` and restore it when returning.
Do the same in `call_on_irq_stack()`, but restore and mask around
the branch.
Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency
of behaviour between all configurations.
Introduce and use an assembly macro for saving and masking DAIF,
as the existing one saves but only masks IF.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ada Couprie Diaz <ada.coupriediaz(a)arm.com>
Reported-by: Cristian Prundeanu <cpru(a)amazon.com>
Fixes: 59b37fe52f49 ("arm64: Stash shadow stack pointer in the task struct on interrupt")
Tested-by: Cristian Prundeanu <cpru(a)amazon.com>
Acked-by: Will Deacon <will(a)kernel.org>
Link: https://lore.kernel.org/r/20250718142814.133329-1-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index ad63457a05c5..c56c21bb1eec 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -41,6 +41,11 @@
/*
* Save/restore interrupts.
*/
+ .macro save_and_disable_daif, flags
+ mrs \flags, daif
+ msr daifset, #0xf
+ .endm
+
.macro save_and_disable_irq, flags
mrs \flags, daif
msr daifset, #3
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 5ae2a34b50bd..30dcb719685b 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -825,6 +825,7 @@ SYM_CODE_END(__bp_harden_el1_vectors)
*
*/
SYM_FUNC_START(cpu_switch_to)
+ save_and_disable_daif x11
mov x10, #THREAD_CPU_CONTEXT
add x8, x0, x10
mov x9, sp
@@ -848,6 +849,7 @@ SYM_FUNC_START(cpu_switch_to)
ptrauth_keys_install_kernel x1, x8, x9, x10
scs_save x0
scs_load_current
+ restore_irq x11
ret
SYM_FUNC_END(cpu_switch_to)
NOKPROBE(cpu_switch_to)
@@ -874,6 +876,7 @@ NOKPROBE(ret_from_fork)
* Calls func(regs) using this CPU's irq stack and shadow irq stack.
*/
SYM_FUNC_START(call_on_irq_stack)
+ save_and_disable_daif x9
#ifdef CONFIG_SHADOW_CALL_STACK
get_current_task x16
scs_save x16
@@ -888,8 +891,10 @@ SYM_FUNC_START(call_on_irq_stack)
/* Move to the new stack and call the function there */
add sp, x16, #IRQ_STACK_SIZE
+ restore_irq x9
blr x1
+ save_and_disable_daif x9
/*
* Restore the SP from the FP, and restore the FP and LR from the frame
* record.
@@ -897,6 +902,7 @@ SYM_FUNC_START(call_on_irq_stack)
mov sp, x29
ldp x29, x30, [sp], #16
scs_load_current
+ restore_irq x9
ret
SYM_FUNC_END(call_on_irq_stack)
NOKPROBE(call_on_irq_stack)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x d42e6c20de6192f8e4ab4cf10be8c694ef27e8cb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072817-deeply-galleria-d758@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d42e6c20de6192f8e4ab4cf10be8c694ef27e8cb Mon Sep 17 00:00:00 2001
From: Ada Couprie Diaz <ada.coupriediaz(a)arm.com>
Date: Fri, 18 Jul 2025 15:28:14 +0100
Subject: [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(),
call_on_irq_stack()
`cpu_switch_to()` and `call_on_irq_stack()` manipulate SP to change
to different stacks along with the Shadow Call Stack if it is enabled.
Those two stack changes cannot be done atomically and both functions
can be interrupted by SErrors or Debug Exceptions which, though unlikely,
is very much broken : if interrupted, we can end up with mismatched stacks
and Shadow Call Stack leading to clobbered stacks.
In `cpu_switch_to()`, it can happen when SP_EL0 points to the new task,
but x18 stills points to the old task's SCS. When the interrupt handler
tries to save the task's SCS pointer, it will save the old task
SCS pointer (x18) into the new task struct (pointed to by SP_EL0),
clobbering it.
In `call_on_irq_stack()`, it can happen when switching from the task stack
to the IRQ stack and when switching back. In both cases, we can be
interrupted when the SCS pointer points to the IRQ SCS, but SP points to
the task stack. The nested interrupt handler pushes its return addresses
on the IRQ SCS. It then detects that SP points to the task stack,
calls `call_on_irq_stack()` and clobbers the task SCS pointer with
the IRQ SCS pointer, which it will also use !
This leads to tasks returning to addresses on the wrong SCS,
or even on the IRQ SCS, triggering kernel panics via CONFIG_VMAP_STACK
or FPAC if enabled.
This is possible on a default config, but unlikely.
However, when enabling CONFIG_ARM64_PSEUDO_NMI, DAIF is unmasked and
instead the GIC is responsible for filtering what interrupts the CPU
should receive based on priority.
Given the goal of emulating NMIs, pseudo-NMIs can be received by the CPU
even in `cpu_switch_to()` and `call_on_irq_stack()`, possibly *very*
frequently depending on the system configuration and workload, leading
to unpredictable kernel panics.
Completely mask DAIF in `cpu_switch_to()` and restore it when returning.
Do the same in `call_on_irq_stack()`, but restore and mask around
the branch.
Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency
of behaviour between all configurations.
Introduce and use an assembly macro for saving and masking DAIF,
as the existing one saves but only masks IF.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ada Couprie Diaz <ada.coupriediaz(a)arm.com>
Reported-by: Cristian Prundeanu <cpru(a)amazon.com>
Fixes: 59b37fe52f49 ("arm64: Stash shadow stack pointer in the task struct on interrupt")
Tested-by: Cristian Prundeanu <cpru(a)amazon.com>
Acked-by: Will Deacon <will(a)kernel.org>
Link: https://lore.kernel.org/r/20250718142814.133329-1-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index ad63457a05c5..c56c21bb1eec 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -41,6 +41,11 @@
/*
* Save/restore interrupts.
*/
+ .macro save_and_disable_daif, flags
+ mrs \flags, daif
+ msr daifset, #0xf
+ .endm
+
.macro save_and_disable_irq, flags
mrs \flags, daif
msr daifset, #3
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 5ae2a34b50bd..30dcb719685b 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -825,6 +825,7 @@ SYM_CODE_END(__bp_harden_el1_vectors)
*
*/
SYM_FUNC_START(cpu_switch_to)
+ save_and_disable_daif x11
mov x10, #THREAD_CPU_CONTEXT
add x8, x0, x10
mov x9, sp
@@ -848,6 +849,7 @@ SYM_FUNC_START(cpu_switch_to)
ptrauth_keys_install_kernel x1, x8, x9, x10
scs_save x0
scs_load_current
+ restore_irq x11
ret
SYM_FUNC_END(cpu_switch_to)
NOKPROBE(cpu_switch_to)
@@ -874,6 +876,7 @@ NOKPROBE(ret_from_fork)
* Calls func(regs) using this CPU's irq stack and shadow irq stack.
*/
SYM_FUNC_START(call_on_irq_stack)
+ save_and_disable_daif x9
#ifdef CONFIG_SHADOW_CALL_STACK
get_current_task x16
scs_save x16
@@ -888,8 +891,10 @@ SYM_FUNC_START(call_on_irq_stack)
/* Move to the new stack and call the function there */
add sp, x16, #IRQ_STACK_SIZE
+ restore_irq x9
blr x1
+ save_and_disable_daif x9
/*
* Restore the SP from the FP, and restore the FP and LR from the frame
* record.
@@ -897,6 +902,7 @@ SYM_FUNC_START(call_on_irq_stack)
mov sp, x29
ldp x29, x30, [sp], #16
scs_load_current
+ restore_irq x9
ret
SYM_FUNC_END(call_on_irq_stack)
NOKPROBE(call_on_irq_stack)
Revert kvec msg iterator before trying to process a TLS alert
when possible.
In nvmet_tcp_try_recv_data(), it's assumed that no msg control
message buffer is set prior to sock_recvmsg(). If no control
message structure is setup, kTLS layer will read and process
TLS data record types. As soon as it encounters a TLS control
message, it would return an error. At that point, we setup a kvec
backed control buffer and read in the control message such as
a TLS alert. Msg can advance the kvec pointer as a part of the
copy process thus we need to revert the iterator before calling
into the tls_alert_recv.
Fixes: a1c5dd8355b1 ("nvmet-tcp: control messages for recvmsg()")
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Olga Kornievskaia <okorniev(a)redhat.com>
---
drivers/nvme/target/tcp.c | 48 +++++++++++++++++++++++++++++++--------
1 file changed, 38 insertions(+), 10 deletions(-)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 688033b88d38..cf3336ddc9a3 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1161,6 +1161,7 @@ static int nvmet_tcp_try_recv_pdu(struct nvmet_tcp_queue *queue)
if (unlikely(len < 0))
return len;
if (queue->tls_pskid) {
+ iov_iter_revert(&msg.msg_iter, len);
ret = nvmet_tcp_tls_record_ok(queue, &msg, cbuf);
if (ret < 0)
return ret;
@@ -1218,18 +1219,47 @@ static int nvmet_tcp_try_recv_data(struct nvmet_tcp_queue *queue)
{
struct nvmet_tcp_cmd *cmd = queue->cmd;
int len, ret;
+ union {
+ struct cmsghdr cmsg;
+ u8 buf[CMSG_SPACE(sizeof(u8))];
+ } u;
+ u8 alert[2];
+ struct kvec alert_kvec = {
+ .iov_base = alert,
+ .iov_len = sizeof(alert),
+ };
+ struct msghdr msg = {
+ .msg_control = &u,
+ .msg_controllen = sizeof(u),
+ };
while (msg_data_left(&cmd->recv_msg)) {
+ /* assumed that cmg->recv_msg's control buffer is not setup
+ */
+ WARN_ON_ONCE(cmd->recv_msg.msg_controllen > 0);
+
len = sock_recvmsg(cmd->queue->sock, &cmd->recv_msg,
cmd->recv_msg.msg_flags);
+ if (cmd->recv_msg.msg_flags & MSG_CTRUNC) {
+ cmd->recv_msg.msg_flags &= ~(MSG_CTRUNC | MSG_EOF);
+ if (len == 0 || len == -EIO) {
+ iov_iter_kvec(&msg.msg_iter, ITER_DEST, &alert_kvec,
+ 1, alert_kvec.iov_len);
+ len = sock_recvmsg(cmd->queue->sock, &msg,
+ MSG_DONTWAIT);
+ if (len > 0 &&
+ tls_get_record_type(cmd->queue->sock->sk,
+ &u.cmsg) == TLS_RECORD_TYPE_ALERT) {
+ iov_iter_revert(&msg.msg_iter, len);
+ ret = nvmet_tcp_tls_record_ok(cmd->queue,
+ &msg, u.buf);
+ if (ret < 0)
+ return ret;
+ }
+ }
+ }
if (len <= 0)
return len;
- if (queue->tls_pskid) {
- ret = nvmet_tcp_tls_record_ok(cmd->queue,
- &cmd->recv_msg, cmd->recv_cbuf);
- if (ret < 0)
- return ret;
- }
cmd->pdu_recv += len;
cmd->rbytes_done += len;
@@ -1267,6 +1297,7 @@ static int nvmet_tcp_try_recv_ddgst(struct nvmet_tcp_queue *queue)
if (unlikely(len < 0))
return len;
if (queue->tls_pskid) {
+ iov_iter_revert(&msg.msg_iter, len);
ret = nvmet_tcp_tls_record_ok(queue, &msg, cbuf);
if (ret < 0)
return ret;
@@ -1453,10 +1484,6 @@ static int nvmet_tcp_alloc_cmd(struct nvmet_tcp_queue *queue,
if (!c->r2t_pdu)
goto out_free_data;
- if (queue->state == NVMET_TCP_Q_TLS_HANDSHAKE) {
- c->recv_msg.msg_control = c->recv_cbuf;
- c->recv_msg.msg_controllen = sizeof(c->recv_cbuf);
- }
c->recv_msg.msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
list_add_tail(&c->entry, &queue->free_list);
@@ -1736,6 +1763,7 @@ static int nvmet_tcp_try_peek_pdu(struct nvmet_tcp_queue *queue)
return len;
}
+ iov_iter_revert(&msg.msg_iter, len);
ret = nvmet_tcp_tls_record_ok(queue, &msg, cbuf);
if (ret < 0)
return ret;
--
2.47.1
A security exploit was discovered in NFS over TLS in tls_alert_recv
due to its assumption that there is valid data in the msghdr's
iterator's kvec.
Instead, this patch proposes the rework how control messages are
setup and used by sock_recvmsg().
If no control message structure is setup, kTLS layer will read and
process TLS data record types. As soon as it encounters a TLS control
message, it would return an error. At that point, NFS can setup a kvec
backed control buffer and read in the control message such as a TLS
alert. Msg iterator can advance the kvec pointer as a part of the copy
process thus we need to revert the iterator before calling into the
tls_alert_recv.
Fixes: dea034b963c8 ("SUNRPC: Capture CMSG metadata on client-side receive")
Suggested-by: Trond Myklebust <trondmy(a)hammerspace.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Olga Kornievskaia <okorniev(a)redhat.com>
---
net/sunrpc/xprtsock.c | 40 ++++++++++++++++++++++++++++++----------
1 file changed, 30 insertions(+), 10 deletions(-)
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 04ff66758fc3..c5f7bbf5775f 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -358,7 +358,7 @@ xs_alloc_sparse_pages(struct xdr_buf *buf, size_t want, gfp_t gfp)
static int
xs_sock_process_cmsg(struct socket *sock, struct msghdr *msg,
- struct cmsghdr *cmsg, int ret)
+ unsigned int *msg_flags, struct cmsghdr *cmsg, int ret)
{
u8 content_type = tls_get_record_type(sock->sk, cmsg);
u8 level, description;
@@ -371,7 +371,7 @@ xs_sock_process_cmsg(struct socket *sock, struct msghdr *msg,
* record, even though there might be more frames
* waiting to be decrypted.
*/
- msg->msg_flags &= ~MSG_EOR;
+ *msg_flags &= ~MSG_EOR;
break;
case TLS_RECORD_TYPE_ALERT:
tls_alert_recv(sock->sk, msg, &level, &description);
@@ -386,19 +386,33 @@ xs_sock_process_cmsg(struct socket *sock, struct msghdr *msg,
}
static int
-xs_sock_recv_cmsg(struct socket *sock, struct msghdr *msg, int flags)
+xs_sock_recv_cmsg(struct socket *sock, unsigned int *msg_flags, int flags)
{
union {
struct cmsghdr cmsg;
u8 buf[CMSG_SPACE(sizeof(u8))];
} u;
+ u8 alert[2];
+ struct kvec alert_kvec = {
+ .iov_base = alert,
+ .iov_len = sizeof(alert),
+ };
+ struct msghdr msg = {
+ .msg_flags = *msg_flags,
+ .msg_control = &u,
+ .msg_controllen = sizeof(u),
+ };
int ret;
- msg->msg_control = &u;
- msg->msg_controllen = sizeof(u);
- ret = sock_recvmsg(sock, msg, flags);
- if (msg->msg_controllen != sizeof(u))
- ret = xs_sock_process_cmsg(sock, msg, &u.cmsg, ret);
+ iov_iter_kvec(&msg.msg_iter, ITER_DEST, &alert_kvec, 1,
+ alert_kvec.iov_len);
+ ret = sock_recvmsg(sock, &msg, flags);
+ if (ret > 0 &&
+ tls_get_record_type(sock->sk, &u.cmsg) == TLS_RECORD_TYPE_ALERT) {
+ iov_iter_revert(&msg.msg_iter, ret);
+ ret = xs_sock_process_cmsg(sock, &msg, msg_flags, &u.cmsg,
+ -EAGAIN);
+ }
return ret;
}
@@ -408,7 +422,13 @@ xs_sock_recvmsg(struct socket *sock, struct msghdr *msg, int flags, size_t seek)
ssize_t ret;
if (seek != 0)
iov_iter_advance(&msg->msg_iter, seek);
- ret = xs_sock_recv_cmsg(sock, msg, flags);
+ ret = sock_recvmsg(sock, msg, flags);
+ /* Handle TLS inband control message lazily */
+ if (msg->msg_flags & MSG_CTRUNC) {
+ msg->msg_flags &= ~(MSG_CTRUNC | MSG_EOR);
+ if (ret == 0 || ret == -EIO)
+ ret = xs_sock_recv_cmsg(sock, &msg->msg_flags, flags);
+ }
return ret > 0 ? ret + seek : ret;
}
@@ -434,7 +454,7 @@ xs_read_discard(struct socket *sock, struct msghdr *msg, int flags,
size_t count)
{
iov_iter_discard(&msg->msg_iter, ITER_DEST, count);
- return xs_sock_recv_cmsg(sock, msg, flags);
+ return xs_sock_recvmsg(sock, msg, flags, 0);
}
#if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
--
2.47.1
Scott Mayhew discovered a security exploit in NFS over TLS in
tls_alert_recv() due to its assumption it can read data from
the msg iterator's kvec..
kTLS implementation splits TLS non-data record payload between
the control message buffer (which includes the type such as TLS
aler or TLS cipher change) and the rest of the payload (say TLS
alert's level/description) which goes into the msg payload buffer.
This patch proposes to rework how control messages are setup and
used by sock_recvmsg().
If no control message structure is setup, kTLS layer will read and
process TLS data record types. As soon as it encounters a TLS control
message, it would return an error. At that point, NFS can setup a
kvec backed msg buffer and read in the control message such as a
TLS alert. Msg iterator can advance the kvec pointer as a part of
the copy process thus we need to revert the iterator before calling
into the tls_alert_recv.
Reported-by: Scott Mayhew <smayhew(a)redhat.com>
Fixes: 5e052dda121e ("SUNRPC: Recognize control messages in server-side TCP socket code")
Suggested-by: Trond Myklebust <trondmy(a)hammerspace.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Olga Kornievskaia <okorniev(a)redhat.com>
---
net/sunrpc/svcsock.c | 43 +++++++++++++++++++++++++++++++++++--------
1 file changed, 35 insertions(+), 8 deletions(-)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 46c156b121db..e2c5e0e626f9 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -257,20 +257,47 @@ svc_tcp_sock_process_cmsg(struct socket *sock, struct msghdr *msg,
}
static int
-svc_tcp_sock_recv_cmsg(struct svc_sock *svsk, struct msghdr *msg)
+svc_tcp_sock_recv_cmsg(struct socket *sock, unsigned int *msg_flags)
{
union {
struct cmsghdr cmsg;
u8 buf[CMSG_SPACE(sizeof(u8))];
} u;
- struct socket *sock = svsk->sk_sock;
+ u8 alert[2];
+ struct kvec alert_kvec = {
+ .iov_base = alert,
+ .iov_len = sizeof(alert),
+ };
+ struct msghdr msg = {
+ .msg_flags = *msg_flags,
+ .msg_control = &u,
+ .msg_controllen = sizeof(u),
+ };
+ int ret;
+
+ iov_iter_kvec(&msg.msg_iter, ITER_DEST, &alert_kvec, 1,
+ alert_kvec.iov_len);
+ ret = sock_recvmsg(sock, &msg, MSG_DONTWAIT);
+ if (ret > 0 &&
+ tls_get_record_type(sock->sk, &u.cmsg) == TLS_RECORD_TYPE_ALERT) {
+ iov_iter_revert(&msg.msg_iter, ret);
+ ret = svc_tcp_sock_process_cmsg(sock, &msg, &u.cmsg, -EAGAIN);
+ }
+ return ret;
+}
+
+static int
+svc_tcp_sock_recvmsg(struct svc_sock *svsk, struct msghdr *msg)
+{
int ret;
+ struct socket *sock = svsk->sk_sock;
- msg->msg_control = &u;
- msg->msg_controllen = sizeof(u);
ret = sock_recvmsg(sock, msg, MSG_DONTWAIT);
- if (unlikely(msg->msg_controllen != sizeof(u)))
- ret = svc_tcp_sock_process_cmsg(sock, msg, &u.cmsg, ret);
+ if (msg->msg_flags & MSG_CTRUNC) {
+ msg->msg_flags &= ~(MSG_CTRUNC | MSG_EOR);
+ if (ret == 0 || ret == -EIO)
+ ret = svc_tcp_sock_recv_cmsg(sock, &msg->msg_flags);
+ }
return ret;
}
@@ -321,7 +348,7 @@ static ssize_t svc_tcp_read_msg(struct svc_rqst *rqstp, size_t buflen,
iov_iter_advance(&msg.msg_iter, seek);
buflen -= seek;
}
- len = svc_tcp_sock_recv_cmsg(svsk, &msg);
+ len = svc_tcp_sock_recvmsg(svsk, &msg);
if (len > 0)
svc_flush_bvec(bvec, len, seek);
@@ -1018,7 +1045,7 @@ static ssize_t svc_tcp_read_marker(struct svc_sock *svsk,
iov.iov_base = ((char *)&svsk->sk_marker) + svsk->sk_tcplen;
iov.iov_len = want;
iov_iter_kvec(&msg.msg_iter, ITER_DEST, &iov, 1, want);
- len = svc_tcp_sock_recv_cmsg(svsk, &msg);
+ len = svc_tcp_sock_recvmsg(svsk, &msg);
if (len < 0)
return len;
svsk->sk_tcplen += len;
--
2.47.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 15f77764e90a713ee3916ca424757688e4f565b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072854-humbling-collector-aca6@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 15f77764e90a713ee3916ca424757688e4f565b9 Mon Sep 17 00:00:00 2001
From: "Lin.Cao" <lincao12(a)amd.com>
Date: Thu, 17 Jul 2025 16:44:53 +0800
Subject: [PATCH] drm/sched: Remove optimization that causes hang when killing
dependent jobs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When application A submits jobs and application B submits a job with a
dependency on A's fence, the normal flow wakes up the scheduler after
processing each job. However, the optimization in
drm_sched_entity_add_dependency_cb() uses a callback that only clears
dependencies without waking up the scheduler.
When application A is killed before its jobs can run, the callback gets
triggered but only clears the dependency without waking up the scheduler,
causing the scheduler to enter sleep state and application B to hang.
Remove the optimization by deleting drm_sched_entity_clear_dep() and its
usage, ensuring the scheduler is always woken up when dependencies are
cleared.
Fixes: 777dbd458c89 ("drm/amdgpu: drop a dummy wakeup scheduler")
Cc: stable(a)vger.kernel.org # v4.6+
Signed-off-by: Lin.Cao <lincao12(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
Link: https://lore.kernel.org/r/20250717084453.921097-1-lincao12@amd.com
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index e671aa241720..ac678de7fe5e 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -355,17 +355,6 @@ void drm_sched_entity_destroy(struct drm_sched_entity *entity)
}
EXPORT_SYMBOL(drm_sched_entity_destroy);
-/* drm_sched_entity_clear_dep - callback to clear the entities dependency */
-static void drm_sched_entity_clear_dep(struct dma_fence *f,
- struct dma_fence_cb *cb)
-{
- struct drm_sched_entity *entity =
- container_of(cb, struct drm_sched_entity, cb);
-
- entity->dependency = NULL;
- dma_fence_put(f);
-}
-
/*
* drm_sched_entity_wakeup - callback to clear the entity's dependency and
* wake up the scheduler
@@ -376,7 +365,8 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
struct drm_sched_entity *entity =
container_of(cb, struct drm_sched_entity, cb);
- drm_sched_entity_clear_dep(f, cb);
+ entity->dependency = NULL;
+ dma_fence_put(f);
drm_sched_wakeup(entity->rq->sched);
}
@@ -429,13 +419,6 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
fence = dma_fence_get(&s_fence->scheduled);
dma_fence_put(entity->dependency);
entity->dependency = fence;
- if (!dma_fence_add_callback(fence, &entity->cb,
- drm_sched_entity_clear_dep))
- return true;
-
- /* Ignore it when it is already scheduled */
- dma_fence_put(fence);
- return false;
}
if (!dma_fence_add_callback(entity->dependency, &entity->cb,
A security exploit was discovered in NFS over TLS in tls_alert_recv
due to its assumption that there is valid data in the msghdr's
iterator's kvec.
Instead, this patch proposes the rework how control messages are
setup and used by sock_recvmsg().
If no control message structure is setup, kTLS layer will read and
process TLS data record types. As soon as it encounters a TLS control
message, it would return an error. At that point, NFS can setup a kvec
backed control buffer and read in the control message such as a TLS
alert. Msg iterator can advance the kvec pointer as a part of the copy
process thus we need to revert the iterator before calling into the
tls_alert_recv.
Fixes: dea034b963c8 ("SUNRPC: Capture CMSG metadata on client-side receive")
Suggested-by: Trond Myklebust <trondmy(a)hammerspace.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Olga Kornievskaia <okorniev(a)redhat.com>
---
net/sunrpc/xprtsock.c | 40 ++++++++++++++++++++++++++++++----------
1 file changed, 30 insertions(+), 10 deletions(-)
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 04ff66758fc3..c5f7bbf5775f 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -358,7 +358,7 @@ xs_alloc_sparse_pages(struct xdr_buf *buf, size_t want, gfp_t gfp)
static int
xs_sock_process_cmsg(struct socket *sock, struct msghdr *msg,
- struct cmsghdr *cmsg, int ret)
+ unsigned int *msg_flags, struct cmsghdr *cmsg, int ret)
{
u8 content_type = tls_get_record_type(sock->sk, cmsg);
u8 level, description;
@@ -371,7 +371,7 @@ xs_sock_process_cmsg(struct socket *sock, struct msghdr *msg,
* record, even though there might be more frames
* waiting to be decrypted.
*/
- msg->msg_flags &= ~MSG_EOR;
+ *msg_flags &= ~MSG_EOR;
break;
case TLS_RECORD_TYPE_ALERT:
tls_alert_recv(sock->sk, msg, &level, &description);
@@ -386,19 +386,33 @@ xs_sock_process_cmsg(struct socket *sock, struct msghdr *msg,
}
static int
-xs_sock_recv_cmsg(struct socket *sock, struct msghdr *msg, int flags)
+xs_sock_recv_cmsg(struct socket *sock, unsigned int *msg_flags, int flags)
{
union {
struct cmsghdr cmsg;
u8 buf[CMSG_SPACE(sizeof(u8))];
} u;
+ u8 alert[2];
+ struct kvec alert_kvec = {
+ .iov_base = alert,
+ .iov_len = sizeof(alert),
+ };
+ struct msghdr msg = {
+ .msg_flags = *msg_flags,
+ .msg_control = &u,
+ .msg_controllen = sizeof(u),
+ };
int ret;
- msg->msg_control = &u;
- msg->msg_controllen = sizeof(u);
- ret = sock_recvmsg(sock, msg, flags);
- if (msg->msg_controllen != sizeof(u))
- ret = xs_sock_process_cmsg(sock, msg, &u.cmsg, ret);
+ iov_iter_kvec(&msg.msg_iter, ITER_DEST, &alert_kvec, 1,
+ alert_kvec.iov_len);
+ ret = sock_recvmsg(sock, &msg, flags);
+ if (ret > 0 &&
+ tls_get_record_type(sock->sk, &u.cmsg) == TLS_RECORD_TYPE_ALERT) {
+ iov_iter_revert(&msg.msg_iter, ret);
+ ret = xs_sock_process_cmsg(sock, &msg, msg_flags, &u.cmsg,
+ -EAGAIN);
+ }
return ret;
}
@@ -408,7 +422,13 @@ xs_sock_recvmsg(struct socket *sock, struct msghdr *msg, int flags, size_t seek)
ssize_t ret;
if (seek != 0)
iov_iter_advance(&msg->msg_iter, seek);
- ret = xs_sock_recv_cmsg(sock, msg, flags);
+ ret = sock_recvmsg(sock, msg, flags);
+ /* Handle TLS inband control message lazily */
+ if (msg->msg_flags & MSG_CTRUNC) {
+ msg->msg_flags &= ~(MSG_CTRUNC | MSG_EOR);
+ if (ret == 0 || ret == -EIO)
+ ret = xs_sock_recv_cmsg(sock, &msg->msg_flags, flags);
+ }
return ret > 0 ? ret + seek : ret;
}
@@ -434,7 +454,7 @@ xs_read_discard(struct socket *sock, struct msghdr *msg, int flags,
size_t count)
{
iov_iter_discard(&msg->msg_iter, ITER_DEST, count);
- return xs_sock_recv_cmsg(sock, msg, flags);
+ return xs_sock_recvmsg(sock, msg, flags, 0);
}
#if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
--
2.47.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 15f77764e90a713ee3916ca424757688e4f565b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072854-backlog-freeness-4f50@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 15f77764e90a713ee3916ca424757688e4f565b9 Mon Sep 17 00:00:00 2001
From: "Lin.Cao" <lincao12(a)amd.com>
Date: Thu, 17 Jul 2025 16:44:53 +0800
Subject: [PATCH] drm/sched: Remove optimization that causes hang when killing
dependent jobs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When application A submits jobs and application B submits a job with a
dependency on A's fence, the normal flow wakes up the scheduler after
processing each job. However, the optimization in
drm_sched_entity_add_dependency_cb() uses a callback that only clears
dependencies without waking up the scheduler.
When application A is killed before its jobs can run, the callback gets
triggered but only clears the dependency without waking up the scheduler,
causing the scheduler to enter sleep state and application B to hang.
Remove the optimization by deleting drm_sched_entity_clear_dep() and its
usage, ensuring the scheduler is always woken up when dependencies are
cleared.
Fixes: 777dbd458c89 ("drm/amdgpu: drop a dummy wakeup scheduler")
Cc: stable(a)vger.kernel.org # v4.6+
Signed-off-by: Lin.Cao <lincao12(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
Link: https://lore.kernel.org/r/20250717084453.921097-1-lincao12@amd.com
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index e671aa241720..ac678de7fe5e 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -355,17 +355,6 @@ void drm_sched_entity_destroy(struct drm_sched_entity *entity)
}
EXPORT_SYMBOL(drm_sched_entity_destroy);
-/* drm_sched_entity_clear_dep - callback to clear the entities dependency */
-static void drm_sched_entity_clear_dep(struct dma_fence *f,
- struct dma_fence_cb *cb)
-{
- struct drm_sched_entity *entity =
- container_of(cb, struct drm_sched_entity, cb);
-
- entity->dependency = NULL;
- dma_fence_put(f);
-}
-
/*
* drm_sched_entity_wakeup - callback to clear the entity's dependency and
* wake up the scheduler
@@ -376,7 +365,8 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
struct drm_sched_entity *entity =
container_of(cb, struct drm_sched_entity, cb);
- drm_sched_entity_clear_dep(f, cb);
+ entity->dependency = NULL;
+ dma_fence_put(f);
drm_sched_wakeup(entity->rq->sched);
}
@@ -429,13 +419,6 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
fence = dma_fence_get(&s_fence->scheduled);
dma_fence_put(entity->dependency);
entity->dependency = fence;
- if (!dma_fence_add_callback(fence, &entity->cb,
- drm_sched_entity_clear_dep))
- return true;
-
- /* Ignore it when it is already scheduled */
- dma_fence_put(fence);
- return false;
}
if (!dma_fence_add_callback(entity->dependency, &entity->cb,
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 15f77764e90a713ee3916ca424757688e4f565b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072853-startup-unwomanly-7e1c@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 15f77764e90a713ee3916ca424757688e4f565b9 Mon Sep 17 00:00:00 2001
From: "Lin.Cao" <lincao12(a)amd.com>
Date: Thu, 17 Jul 2025 16:44:53 +0800
Subject: [PATCH] drm/sched: Remove optimization that causes hang when killing
dependent jobs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When application A submits jobs and application B submits a job with a
dependency on A's fence, the normal flow wakes up the scheduler after
processing each job. However, the optimization in
drm_sched_entity_add_dependency_cb() uses a callback that only clears
dependencies without waking up the scheduler.
When application A is killed before its jobs can run, the callback gets
triggered but only clears the dependency without waking up the scheduler,
causing the scheduler to enter sleep state and application B to hang.
Remove the optimization by deleting drm_sched_entity_clear_dep() and its
usage, ensuring the scheduler is always woken up when dependencies are
cleared.
Fixes: 777dbd458c89 ("drm/amdgpu: drop a dummy wakeup scheduler")
Cc: stable(a)vger.kernel.org # v4.6+
Signed-off-by: Lin.Cao <lincao12(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
Link: https://lore.kernel.org/r/20250717084453.921097-1-lincao12@amd.com
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index e671aa241720..ac678de7fe5e 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -355,17 +355,6 @@ void drm_sched_entity_destroy(struct drm_sched_entity *entity)
}
EXPORT_SYMBOL(drm_sched_entity_destroy);
-/* drm_sched_entity_clear_dep - callback to clear the entities dependency */
-static void drm_sched_entity_clear_dep(struct dma_fence *f,
- struct dma_fence_cb *cb)
-{
- struct drm_sched_entity *entity =
- container_of(cb, struct drm_sched_entity, cb);
-
- entity->dependency = NULL;
- dma_fence_put(f);
-}
-
/*
* drm_sched_entity_wakeup - callback to clear the entity's dependency and
* wake up the scheduler
@@ -376,7 +365,8 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
struct drm_sched_entity *entity =
container_of(cb, struct drm_sched_entity, cb);
- drm_sched_entity_clear_dep(f, cb);
+ entity->dependency = NULL;
+ dma_fence_put(f);
drm_sched_wakeup(entity->rq->sched);
}
@@ -429,13 +419,6 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
fence = dma_fence_get(&s_fence->scheduled);
dma_fence_put(entity->dependency);
entity->dependency = fence;
- if (!dma_fence_add_callback(fence, &entity->cb,
- drm_sched_entity_clear_dep))
- return true;
-
- /* Ignore it when it is already scheduled */
- dma_fence_put(fence);
- return false;
}
if (!dma_fence_add_callback(entity->dependency, &entity->cb,
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 15f77764e90a713ee3916ca424757688e4f565b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072852-shrivel-theorize-77f8@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 15f77764e90a713ee3916ca424757688e4f565b9 Mon Sep 17 00:00:00 2001
From: "Lin.Cao" <lincao12(a)amd.com>
Date: Thu, 17 Jul 2025 16:44:53 +0800
Subject: [PATCH] drm/sched: Remove optimization that causes hang when killing
dependent jobs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When application A submits jobs and application B submits a job with a
dependency on A's fence, the normal flow wakes up the scheduler after
processing each job. However, the optimization in
drm_sched_entity_add_dependency_cb() uses a callback that only clears
dependencies without waking up the scheduler.
When application A is killed before its jobs can run, the callback gets
triggered but only clears the dependency without waking up the scheduler,
causing the scheduler to enter sleep state and application B to hang.
Remove the optimization by deleting drm_sched_entity_clear_dep() and its
usage, ensuring the scheduler is always woken up when dependencies are
cleared.
Fixes: 777dbd458c89 ("drm/amdgpu: drop a dummy wakeup scheduler")
Cc: stable(a)vger.kernel.org # v4.6+
Signed-off-by: Lin.Cao <lincao12(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
Link: https://lore.kernel.org/r/20250717084453.921097-1-lincao12@amd.com
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index e671aa241720..ac678de7fe5e 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -355,17 +355,6 @@ void drm_sched_entity_destroy(struct drm_sched_entity *entity)
}
EXPORT_SYMBOL(drm_sched_entity_destroy);
-/* drm_sched_entity_clear_dep - callback to clear the entities dependency */
-static void drm_sched_entity_clear_dep(struct dma_fence *f,
- struct dma_fence_cb *cb)
-{
- struct drm_sched_entity *entity =
- container_of(cb, struct drm_sched_entity, cb);
-
- entity->dependency = NULL;
- dma_fence_put(f);
-}
-
/*
* drm_sched_entity_wakeup - callback to clear the entity's dependency and
* wake up the scheduler
@@ -376,7 +365,8 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
struct drm_sched_entity *entity =
container_of(cb, struct drm_sched_entity, cb);
- drm_sched_entity_clear_dep(f, cb);
+ entity->dependency = NULL;
+ dma_fence_put(f);
drm_sched_wakeup(entity->rq->sched);
}
@@ -429,13 +419,6 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
fence = dma_fence_get(&s_fence->scheduled);
dma_fence_put(entity->dependency);
entity->dependency = fence;
- if (!dma_fence_add_callback(fence, &entity->cb,
- drm_sched_entity_clear_dep))
- return true;
-
- /* Ignore it when it is already scheduled */
- dma_fence_put(fence);
- return false;
}
if (!dma_fence_add_callback(entity->dependency, &entity->cb,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 694d6b99923eb05a8fd188be44e26077d19f0e21
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072811-ethanol-arbitrary-a664@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 694d6b99923eb05a8fd188be44e26077d19f0e21 Mon Sep 17 00:00:00 2001
From: Harry Yoo <harry.yoo(a)oracle.com>
Date: Fri, 4 Jul 2025 19:30:53 +0900
Subject: [PATCH] mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=n
Commit 48b4800a1c6a ("zsmalloc: page migration support") added support for
migrating zsmalloc pages using the movable_operations migration framework.
However, the commit did not take into account that zsmalloc supports
migration only when CONFIG_COMPACTION is enabled. Tracing shows that
zsmalloc was still passing the __GFP_MOVABLE flag even when compaction is
not supported.
This can result in unmovable pages being allocated from movable page
blocks (even without stealing page blocks), ZONE_MOVABLE and CMA area.
Possible user visible effects:
- Some ZONE_MOVABLE memory can be not actually movable
- CMA allocation can fail because of this
- Increased memory fragmentation due to ignoring the page mobility
grouping feature
I'm not really sure who uses kernels without compaction support, though :(
To fix this, clear the __GFP_MOVABLE flag when
!IS_ENABLED(CONFIG_COMPACTION).
Link: https://lkml.kernel.org/r/20250704103053.6913-1-harry.yoo@oracle.com
Fixes: 48b4800a1c6a ("zsmalloc: page migration support")
Signed-off-by: Harry Yoo <harry.yoo(a)oracle.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 999b513c7fdf..f3e2215f95eb 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1043,6 +1043,9 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
if (!zspage)
return NULL;
+ if (!IS_ENABLED(CONFIG_COMPACTION))
+ gfp &= ~__GFP_MOVABLE;
+
zspage->magic = ZSPAGE_MAGIC;
zspage->pool = pool;
zspage->class = class->index;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 694d6b99923eb05a8fd188be44e26077d19f0e21
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072810-footgear-grumpily-4fd5@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 694d6b99923eb05a8fd188be44e26077d19f0e21 Mon Sep 17 00:00:00 2001
From: Harry Yoo <harry.yoo(a)oracle.com>
Date: Fri, 4 Jul 2025 19:30:53 +0900
Subject: [PATCH] mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=n
Commit 48b4800a1c6a ("zsmalloc: page migration support") added support for
migrating zsmalloc pages using the movable_operations migration framework.
However, the commit did not take into account that zsmalloc supports
migration only when CONFIG_COMPACTION is enabled. Tracing shows that
zsmalloc was still passing the __GFP_MOVABLE flag even when compaction is
not supported.
This can result in unmovable pages being allocated from movable page
blocks (even without stealing page blocks), ZONE_MOVABLE and CMA area.
Possible user visible effects:
- Some ZONE_MOVABLE memory can be not actually movable
- CMA allocation can fail because of this
- Increased memory fragmentation due to ignoring the page mobility
grouping feature
I'm not really sure who uses kernels without compaction support, though :(
To fix this, clear the __GFP_MOVABLE flag when
!IS_ENABLED(CONFIG_COMPACTION).
Link: https://lkml.kernel.org/r/20250704103053.6913-1-harry.yoo@oracle.com
Fixes: 48b4800a1c6a ("zsmalloc: page migration support")
Signed-off-by: Harry Yoo <harry.yoo(a)oracle.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 999b513c7fdf..f3e2215f95eb 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1043,6 +1043,9 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
if (!zspage)
return NULL;
+ if (!IS_ENABLED(CONFIG_COMPACTION))
+ gfp &= ~__GFP_MOVABLE;
+
zspage->magic = ZSPAGE_MAGIC;
zspage->pool = pool;
zspage->class = class->index;
By inducing delays in the right places, Jann Horn created a reproducer
for a hard to hit UAF issue that became possible after VMAs were allowed
to be recycled by adding SLAB_TYPESAFE_BY_RCU to their cache.
Race description is borrowed from Jann's discovery report:
lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under
rcu_read_lock(). At that point, the VMA may be concurrently freed, and
it can be recycled by another process. vma_start_read() then
increments the vma->vm_refcnt (if it is in an acceptable range), and
if this succeeds, vma_start_read() can return a recycled VMA.
In this scenario where the VMA has been recycled, lock_vma_under_rcu()
will then detect the mismatching ->vm_mm pointer and drop the VMA
through vma_end_read(), which calls vma_refcount_put().
vma_refcount_put() drops the refcount and then calls rcuwait_wake_up()
using a copy of vma->vm_mm. This is wrong: It implicitly assumes that
the caller is keeping the VMA's mm alive, but in this scenario the caller
has no relation to the VMA's mm, so the rcuwait_wake_up() can cause UAF.
The diagram depicting the race:
T1 T2 T3
== == ==
lock_vma_under_rcu
mas_walk
<VMA gets removed from mm>
mmap
<the same VMA is reallocated>
vma_start_read
__refcount_inc_not_zero_limited_acquire
munmap
__vma_enter_locked
refcount_add_not_zero
vma_end_read
vma_refcount_put
__refcount_dec_and_test
rcuwait_wait_event
<finish operation>
rcuwait_wake_up [UAF]
Note that rcuwait_wait_event() in T3 does not block because refcount
was already dropped by T1. At this point T3 can exit and free the mm
causing UAF in T1.
To avoid this we move vma->vm_mm verification into vma_start_read() and
grab vma->vm_mm to stabilize it before vma_refcount_put() operation.
Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU")
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+…
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
---
Changes since v2 [1]
- Addressed Lorenzo's nits, per Lorenzo Stoakes
- Added a warning comment for vma_start_read()
- Added Reviewed-by and Acked-by, per Vlastimil Babka and Lorenzo Stoakes
Notes:
- Applies cleanly over mm-unstable after reverting previous version of
this patch [1].
- Should be applied to 6.15 and 6.16 but these branches do not
have lock_next_vma() function, so the change in lock_next_vma() should be
skipped when applying to those branches.
[1] https://lore.kernel.org/all/20250728175355.2282375-1-surenb@google.com/
include/linux/mmap_lock.h | 30 ++++++++++++++++++++++++++++++
mm/mmap_lock.c | 10 +++-------
2 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 1f4f44951abe..11a078de9150 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -12,6 +12,7 @@ extern int rcuwait_wake_up(struct rcuwait *w);
#include <linux/tracepoint-defs.h>
#include <linux/types.h>
#include <linux/cleanup.h>
+#include <linux/sched/mm.h>
#define MMAP_LOCK_INITIALIZER(name) \
.mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock),
@@ -154,6 +155,10 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
* reused and attached to a different mm before we lock it.
* Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
* detached.
+ *
+ * WARNING! The vma passed to this function cannot be used if the function
+ * fails to lock it because in certain cases RCU lock is dropped and then
+ * reacquired. Once RCU lock is dropped the vma can be concurently freed.
*/
static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
struct vm_area_struct *vma)
@@ -183,6 +188,31 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
}
rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
+
+ /*
+ * If vma got attached to another mm from under us, that mm is not
+ * stable and can be freed in the narrow window after vma->vm_refcnt
+ * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
+ * releasing vma->vm_refcnt.
+ */
+ if (unlikely(vma->vm_mm != mm)) {
+ /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
+ struct mm_struct *other_mm = vma->vm_mm;
+
+ /*
+ * __mmdrop() is a heavy operation and we don't need RCU
+ * protection here. Release RCU lock during these operations.
+ * We reinstate the RCU read lock as the caller expects it to
+ * be held when this function returns even on error.
+ */
+ rcu_read_unlock();
+ mmgrab(other_mm);
+ vma_refcount_put(vma);
+ mmdrop(other_mm);
+ rcu_read_lock();
+ return NULL;
+ }
+
/*
* Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
* False unlocked result is impossible because we modify and check
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 729fb7d0dd59..b006cec8e6fe 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -164,8 +164,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
*/
/* Check if the vma we locked is the right one. */
- if (unlikely(vma->vm_mm != mm ||
- address < vma->vm_start || address >= vma->vm_end))
+ if (unlikely(address < vma->vm_start || address >= vma->vm_end))
goto inval_end_read;
rcu_read_unlock();
@@ -236,11 +235,8 @@ struct vm_area_struct *lock_next_vma(struct mm_struct *mm,
goto fallback;
}
- /*
- * Verify the vma we locked belongs to the same address space and it's
- * not behind of the last search position.
- */
- if (unlikely(vma->vm_mm != mm || from_addr >= vma->vm_end))
+ /* Verify the vma is not behind the last search position. */
+ if (unlikely(from_addr >= vma->vm_end))
goto fallback_unlock;
/*
--
2.50.1.487.gc89ff58d15-goog
On Wed, Jul 23, 2025 at 01:47:58PM +0000, Zhivich, Michael wrote:
>
> On 7/22/25, 12:56, "Greg Kroah-Hartman" <gregkh(a)linuxfoundation.org> wrote:
>
> !-------------------------------------------------------------------|
> This Message Is From an External Sender
> This message came from outside your organization.
> |-------------------------------------------------------------------!
>
> On Tue, Jul 22, 2025 at 04:22:54PM +0200, Borislav Petkov wrote:
> > On Tue, Jul 22, 2025 at 08:28:44AM -0400, Michael Zhivich wrote:
> > > For kernels compiled with CONFIG_INIT_STACK_NONE=y, the value of __reserved
> > > field in zen_patch_rev union on the stack may be garbage. If so, it will
> > > prevent correct microcode check when consulting p.ucode_rev, resulting in
> > > incorrect mitigation selection.
> >
> > "This is a stable-only fix." so that the AI is happy. :-P
> >
> > > Cc: <stable(a)vger.kernel.org>
> > > Signed-off-by: Michael Zhivich <mzhivich(a)akamai.com>
> >
> > Acked-by: Borislav Petkov (AMD) <bp(a)alien8.de>
> >
> > > Fixes: 7a0395f6607a5 ("x86/bugs: Add a Transient Scheduler Attacks mitigation")
> >
> > That commit in Fixes: is the 6.12 stable one.
> >
> > The 6.6 one is:
> >
> > Fixes: 90293047df18 ("x86/bugs: Add a Transient Scheduler Attacks mitigation")
> >
> > The 6.1 is:
> >
> > Fixes: d12145e8454f ("x86/bugs: Add a Transient Scheduler Attacks mitigation")
> >
> > The 5.15 one:
> >
> > Fixes: f2b75f1368af ("x86/bugs: Add a Transient Scheduler Attacks mitigation")
> >
> > and the 5.10 one is
> >
> > Fixes: 78192f511f40 ("x86/bugs: Add a Transient Scheduler Attacks mitigation")
> >
> > and since all stable kernels above have INIT_STACK_NONE, that same
> > one-liner should be applied to all of them.
> >
> > Greg, I'm thinking this one-liner should apply to all of the above with
> > some fuzz. Can you simply add it to each stable version with a different
> > Fixes: tag each?
> >
> > Or do you prefer separate submissions?
>
> Ideally, separate submissions, otherwise I have to do this all by hand
> :(
>
> thanks
>
> greg k-h
>
> Apologies for the barrage of e-mails; I managed to mess up the subject line on a couple, so I’ve resent them with correct subject lines.
> There’s now a submission per stable branch with appropriate patch and fixes tags.
Ok, I think I got them all figured out, thanks!
greg k-h
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x fdf0f60a2bb02ba581d9e71d583e69dd0714a521
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072855-ligament-reliable-75fa@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fdf0f60a2bb02ba581d9e71d583e69dd0714a521 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Tue, 15 Jul 2025 20:43:29 +0200
Subject: [PATCH] selftests: mptcp: connect: also cover checksum
The checksum mode has been added a while ago, but it is only validated
when manually launching mptcp_connect.sh with "-C".
The different CIs were then not validating these MPTCP Connect tests
with checksum enabled. To make sure they do, add a new test program
executing mptcp_connect.sh with the checksum mode.
Fixes: 94d66ba1d8e4 ("selftests: mptcp: enable checksum in mptcp_connect.sh")
Cc: stable(a)vger.kernel.org
Reviewed-by: Geliang Tang <geliang(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20250715-net-mptcp-sft-connect-alt-v2-2-8230ddd824…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile
index c6b030babba8..4c7e51336ab2 100644
--- a/tools/testing/selftests/net/mptcp/Makefile
+++ b/tools/testing/selftests/net/mptcp/Makefile
@@ -5,7 +5,7 @@ top_srcdir = ../../../../..
CFLAGS += -Wall -Wl,--no-as-needed -O2 -g -I$(top_srcdir)/usr/include $(KHDR_INCLUDES)
TEST_PROGS := mptcp_connect.sh mptcp_connect_mmap.sh mptcp_connect_sendfile.sh \
- pm_netlink.sh mptcp_join.sh diag.sh \
+ mptcp_connect_checksum.sh pm_netlink.sh mptcp_join.sh diag.sh \
simult_flows.sh mptcp_sockopt.sh userspace_pm.sh
TEST_GEN_FILES = mptcp_connect pm_nl_ctl mptcp_sockopt mptcp_inq mptcp_diag
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect_checksum.sh b/tools/testing/selftests/net/mptcp/mptcp_connect_checksum.sh
new file mode 100755
index 000000000000..ce93ec2f107f
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect_checksum.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+MPTCP_LIB_KSFT_TEST="$(basename "${0}" .sh)" \
+ "$(dirname "${0}")/mptcp_connect.sh" -C "${@}"
By inducing delays in the right places, Jann Horn created a reproducer
for a hard to hit UAF issue that became possible after VMAs were allowed
to be recycled by adding SLAB_TYPESAFE_BY_RCU to their cache.
Race description is borrowed from Jann's discovery report:
lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under
rcu_read_lock(). At that point, the VMA may be concurrently freed, and
it can be recycled by another process. vma_start_read() then
increments the vma->vm_refcnt (if it is in an acceptable range), and
if this succeeds, vma_start_read() can return a recycled VMA.
In this scenario where the VMA has been recycled, lock_vma_under_rcu()
will then detect the mismatching ->vm_mm pointer and drop the VMA
through vma_end_read(), which calls vma_refcount_put().
vma_refcount_put() drops the refcount and then calls rcuwait_wake_up()
using a copy of vma->vm_mm. This is wrong: It implicitly assumes that
the caller is keeping the VMA's mm alive, but in this scenario the caller
has no relation to the VMA's mm, so the rcuwait_wake_up() can cause UAF.
The diagram depicting the race:
T1 T2 T3
== == ==
lock_vma_under_rcu
mas_walk
<VMA gets removed from mm>
mmap
<the same VMA is reallocated>
vma_start_read
__refcount_inc_not_zero_limited_acquire
munmap
__vma_enter_locked
refcount_add_not_zero
vma_end_read
vma_refcount_put
__refcount_dec_and_test
rcuwait_wait_event
<finish operation>
rcuwait_wake_up [UAF]
Note that rcuwait_wait_event() in T3 does not block because refcount
was already dropped by T1. At this point T3 can exit and free the mm
causing UAF in T1.
To avoid this we move vma->vm_mm verification into vma_start_read() and
grab vma->vm_mm to stabilize it before vma_refcount_put() operation.
Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU")
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+…
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
---
Changes since v1 [1]
- Made a copy of vma->mm before using it in vma_start_read(),
per Vlastimil Babka
Notes:
- Applies cleanly over mm-unstable.
- Should be applied to 6.15 and 6.16 but these branches do not
have lock_next_vma() function, so the change in lock_next_vma() should be
skipped when applying to those branches.
[1] https://lore.kernel.org/all/20250728170950.2216966-1-surenb@google.com/
include/linux/mmap_lock.h | 23 +++++++++++++++++++++++
mm/mmap_lock.c | 10 +++-------
2 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 1f4f44951abe..da34afa2f8ef 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -12,6 +12,7 @@ extern int rcuwait_wake_up(struct rcuwait *w);
#include <linux/tracepoint-defs.h>
#include <linux/types.h>
#include <linux/cleanup.h>
+#include <linux/sched/mm.h>
#define MMAP_LOCK_INITIALIZER(name) \
.mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock),
@@ -183,6 +184,28 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
}
rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
+
+ /*
+ * If vma got attached to another mm from under us, that mm is not
+ * stable and can be freed in the narrow window after vma->vm_refcnt
+ * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
+ * releasing vma->vm_refcnt.
+ */
+ if (unlikely(vma->vm_mm != mm)) {
+ /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
+ struct mm_struct *other_mm = vma->vm_mm;
+ /*
+ * __mmdrop() is a heavy operation and we don't need RCU
+ * protection here. Release RCU lock during these operations.
+ */
+ rcu_read_unlock();
+ mmgrab(other_mm);
+ vma_refcount_put(vma);
+ mmdrop(other_mm);
+ rcu_read_lock();
+ return NULL;
+ }
+
/*
* Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
* False unlocked result is impossible because we modify and check
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 729fb7d0dd59..aa3bc42ecde0 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -164,8 +164,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
*/
/* Check if the vma we locked is the right one. */
- if (unlikely(vma->vm_mm != mm ||
- address < vma->vm_start || address >= vma->vm_end))
+ if (unlikely(address < vma->vm_start || address >= vma->vm_end))
goto inval_end_read;
rcu_read_unlock();
@@ -236,11 +235,8 @@ struct vm_area_struct *lock_next_vma(struct mm_struct *mm,
goto fallback;
}
- /*
- * Verify the vma we locked belongs to the same address space and it's
- * not behind of the last search position.
- */
- if (unlikely(vma->vm_mm != mm || from_addr >= vma->vm_end))
+ /* Verify the vma is not behind of the last search position. */
+ if (unlikely(from_addr >= vma->vm_end))
goto fallback_unlock;
/*
base-commit: c617a4dd7102e691fa0fb2bc4f6b369e37d7f509
--
2.50.1.487.gc89ff58d15-goog
The userprogs infrastructure does not expect clang being used with GNU ld
and in that case uses /usr/bin/ld for linking, not the configured $(LD).
This fallback is problematic as it will break when cross-compiling.
Mixing clang and GNU ld is used for example when building for SPARC64,
as ld.lld is not sufficient; see Documentation/kbuild/llvm.rst.
Relax the check around --ld-path so it gets used for all linkers.
Fixes: dfc1b168a8c4 ("kbuild: userprogs: use correct lld when linking through clang")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Reviewed-by: Nathan Chancellor <nathan(a)kernel.org>
---
Changes in v2:
- Use plain ifdef instead of ifneq
- Rebase onto v6.16
- Pick up review tag from Nathan
- Link to v1: https://lore.kernel.org/r/20250724-userprogs-clang-gnu-ld-v1-1-3d3d071e53a7…
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 478f2004602da03c9039044e3288f24a13833cc7..7d24245d118c0e903119263b871d2e5c2759f48a 100644
--- a/Makefile
+++ b/Makefile
@@ -1134,7 +1134,7 @@ KBUILD_USERCFLAGS += $(filter -m32 -m64 --target=%, $(KBUILD_CPPFLAGS) $(KBUILD
KBUILD_USERLDFLAGS += $(filter -m32 -m64 --target=%, $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS))
# userspace programs are linked via the compiler, use the correct linker
-ifeq ($(CONFIG_CC_IS_CLANG)$(CONFIG_LD_IS_LLD),yy)
+ifdef CONFIG_CC_IS_CLANG
KBUILD_USERLDFLAGS += --ld-path=$(LD)
endif
---
base-commit: 038d61fd642278bab63ee8ef722c50d10ab01e8f
change-id: 20250723-userprogs-clang-gnu-ld-7a1c16fc852d
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
UFS performance for < v4 controllers has reduced quite a bit in 6.16.
This series addresses this regression and brings numbers more or less
back to the previous level.
See patch 2 for some benchmark (fio) results.
Signed-off-by: André Draszik <andre.draszik(a)linaro.org>
---
Changes in v2:
- patch 1: new patch as suggested by Bart during v1 review
- patch 2: update commit message and some inline & kerneldoc comments
- patch 2: add missing jiffies.h include
- Link to v1: https://lore.kernel.org/r/20250724-ufshcd-hardirq-v1-1-6398a52f8f02@linaro.…
---
André Draszik (2):
scsi: ufs: core: complete polled requests also from interrupt context
scsi: ufs: core: move some irq handling back to hardirq (with time limit)
drivers/ufs/core/ufshcd.c | 211 +++++++++++++++++++++++++++++++++-------------
1 file changed, 152 insertions(+), 59 deletions(-)
---
base-commit: 58ba80c4740212c29a1cf9b48f588e60a7612209
change-id: 20250723-ufshcd-hardirq-c7326f642e56
Best regards,
--
André Draszik <andre.draszik(a)linaro.org>
Syzbot reports a KASAN issue as below:
BUG: KASAN: use-after-free in __create_pipe include/linux/usb.h:1945 [inline]
BUG: KASAN: use-after-free in send_packet+0xa2d/0xbc0 drivers/media/rc/imon.c:627
Read of size 4 at addr ffff8880256fb000 by task syz-executor314/4465
CPU: 2 PID: 4465 Comm: syz-executor314 Not tainted 6.0.0-rc1-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x6e9 mm/kasan/report.c:433
kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
__create_pipe include/linux/usb.h:1945 [inline]
send_packet+0xa2d/0xbc0 drivers/media/rc/imon.c:627
vfd_write+0x2d9/0x550 drivers/media/rc/imon.c:991
vfs_write+0x2d7/0xdd0 fs/read_write.c:576
ksys_write+0x127/0x250 fs/read_write.c:631
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The iMON driver improperly releases the usb_device reference in
imon_disconnect without coordinating with active users of the
device.
Specifically, the fields usbdev_intf0 and usbdev_intf1 are not
protected by the users counter (ictx->users). During probe,
imon_init_intf0 or imon_init_intf1 increments the usb_device
reference count depending on the interface. However, during
disconnect, usb_put_dev is called unconditionally, regardless of
actual usage.
As a result, if vfd_write or other operations are still in
progress after disconnect, this can lead to a use-after-free of
the usb_device pointer.
Thread 1 vfd_write Thread 2 imon_disconnect
...
if
usb_put_dev(ictx->usbdev_intf0)
else
usb_put_dev(ictx->usbdev_intf1)
...
while
send_packet
if
pipe = usb_sndintpipe(
ictx->usbdev_intf0) UAF
else
pipe = usb_sndctrlpipe(
ictx->usbdev_intf0, 0) UAF
Guard access to usbdev_intf0 and usbdev_intf1 after disconnect by
checking ictx->disconnected in all writer paths. Add early return
with -ENODEV in send_packet(), vfd_write(), lcd_write() and
display_open() if the device is no longer present.
Set and read ictx->disconnected under ictx->lock to ensure memory
synchronization. Acquire the lock in imon_disconnect() before setting
the flag to synchronize with any ongoing operations.
Ensure writers exit early and safely after disconnect before the USB
core proceeds with cleanup.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Reported-by: syzbot+f1a69784f6efe748c3bf(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f1a69784f6efe748c3bf
Fixes: 21677cfc562a ("V4L/DVB: ir-core: add imon driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Larshin Sergey <Sergey.Larshin(a)kaspersky.com>
---
drivers/media/rc/imon.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/drivers/media/rc/imon.c b/drivers/media/rc/imon.c
index f5221b018808..cf3e6e43c0c7 100644
--- a/drivers/media/rc/imon.c
+++ b/drivers/media/rc/imon.c
@@ -536,7 +536,9 @@ static int display_open(struct inode *inode, struct file *file)
mutex_lock(&ictx->lock);
- if (!ictx->display_supported) {
+ if (ictx->disconnected) {
+ retval = -ENODEV;
+ } else if (!ictx->display_supported) {
pr_err("display not supported by device\n");
retval = -ENODEV;
} else if (ictx->display_isopen) {
@@ -598,6 +600,9 @@ static int send_packet(struct imon_context *ictx)
int retval = 0;
struct usb_ctrlrequest *control_req = NULL;
+ if (ictx->disconnected)
+ return -ENODEV;
+
/* Check if we need to use control or interrupt urb */
if (!ictx->tx_control) {
pipe = usb_sndintpipe(ictx->usbdev_intf0,
@@ -949,12 +954,14 @@ static ssize_t vfd_write(struct file *file, const char __user *buf,
static const unsigned char vfd_packet6[] = {
0x01, 0x00, 0x00, 0x00, 0x00, 0xFF, 0xFF };
- if (ictx->disconnected)
- return -ENODEV;
-
if (mutex_lock_interruptible(&ictx->lock))
return -ERESTARTSYS;
+ if (ictx->disconnected) {
+ retval = -ENODEV;
+ goto exit;
+ }
+
if (!ictx->dev_present_intf0) {
pr_err_ratelimited("no iMON device present\n");
retval = -ENODEV;
@@ -1029,11 +1036,13 @@ static ssize_t lcd_write(struct file *file, const char __user *buf,
int retval = 0;
struct imon_context *ictx = file->private_data;
- if (ictx->disconnected)
- return -ENODEV;
-
mutex_lock(&ictx->lock);
+ if (ictx->disconnected) {
+ retval = -ENODEV;
+ goto exit;
+ }
+
if (!ictx->display_supported) {
pr_err_ratelimited("no iMON display present\n");
retval = -ENODEV;
@@ -2499,7 +2508,11 @@ static void imon_disconnect(struct usb_interface *interface)
int ifnum;
ictx = usb_get_intfdata(interface);
+
+ mutex_lock(&ictx->lock);
ictx->disconnected = true;
+ mutex_unlock(&ictx->lock);
+
dev = ictx->dev;
ifnum = interface->cur_altsetting->desc.bInterfaceNumber;
--
2.39.5
Introduce and use {pgd,p4d}_populate_kernel() in core MM code when
populating PGD and P4D entries for the kernel address space.
These helpers ensure proper synchronization of page tables when
updating the kernel portion of top-level page tables.
Until now, the kernel has relied on each architecture to handle
synchronization of top-level page tables in an ad-hoc manner.
For example, see commit 9b861528a801 ("x86-64, mem: Update all PGDs for
direct mapping and vmemmap mapping changes").
However, this approach has proven fragile for following reasons:
1) It is easy to forget to perform the necessary page table
synchronization when introducing new changes.
For instance, commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory
savings for compound devmaps") overlooked the need to synchronize
page tables for the vmemmap area.
2) It is also easy to overlook that the vmemmap and direct mapping areas
must not be accessed before explicit page table synchronization.
For example, commit 8d400913c231 ("x86/vmemmap: handle unpopulated
sub-pmd ranges")) caused crashes by accessing the vmemmap area
before calling sync_global_pgds().
To address this, as suggested by Dave Hansen, introduce _kernel() variants
of the page table population helpers, which invoke architecture-specific
hooks to properly synchronize page tables.
They reuse existing infrastructure for vmalloc and ioremap.
Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK,
and the actual synchronization is performed by arch_sync_kernel_mappings().
This change currently targets only x86_64, so only PGD and P4D level
helpers are introduced. In theory, PUD and PMD level helpers can be added
later if needed by other architectures.
Currently this is a no-op, since no architecture sets
PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK.
Cc: stable(a)vger.kernel.org
Suggested-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Harry Yoo <harry.yoo(a)oracle.com>
---
include/asm-generic/pgalloc.h | 16 ++++++++++++++++
include/linux/pgtable.h | 4 ++--
mm/kasan/init.c | 10 +++++-----
mm/percpu.c | 4 ++--
mm/sparse-vmemmap.c | 4 ++--
5 files changed, 27 insertions(+), 11 deletions(-)
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 3c8ec3bfea44..fc0ab8eed5a6 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -4,6 +4,8 @@
#ifdef CONFIG_MMU
+#include <linux/pgtable.h>
+
#define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
#define GFP_PGTABLE_USER (GFP_PGTABLE_KERNEL | __GFP_ACCOUNT)
@@ -296,6 +298,20 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
}
#endif
+#define pgd_populate_kernel(addr, pgd, p4d) \
+do { \
+ pgd_populate(&init_mm, pgd, p4d); \
+ if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED) \
+ arch_sync_kernel_mappings(addr, addr); \
+} while (0)
+
+#define p4d_populate_kernel(addr, p4d, pud) \
+do { \
+ p4d_populate(&init_mm, p4d, pud); \
+ if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_P4D_MODIFIED) \
+ arch_sync_kernel_mappings(addr, addr); \
+} while (0)
+
#endif /* CONFIG_MMU */
#endif /* __ASM_GENERIC_PGALLOC_H */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e564f338c758..2e24514ab6d0 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1332,8 +1332,8 @@ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma,
/*
* Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
- * and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
- * needs to be called.
+ * and let generic vmalloc, ioremap and page table update code know when
+ * arch_sync_kernel_mappings() needs to be called.
*/
#ifndef ARCH_PAGE_TABLE_SYNC_MASK
#define ARCH_PAGE_TABLE_SYNC_MASK 0
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index ced6b29fcf76..43de820ee282 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -191,7 +191,7 @@ static int __ref zero_p4d_populate(pgd_t *pgd, unsigned long addr,
pud_t *pud;
pmd_t *pmd;
- p4d_populate(&init_mm, p4d,
+ p4d_populate_kernel(addr, p4d,
lm_alias(kasan_early_shadow_pud));
pud = pud_offset(p4d, addr);
pud_populate(&init_mm, pud,
@@ -212,7 +212,7 @@ static int __ref zero_p4d_populate(pgd_t *pgd, unsigned long addr,
} else {
p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
pud_init(p);
- p4d_populate(&init_mm, p4d, p);
+ p4d_populate_kernel(addr, p4d, p);
}
}
zero_pud_populate(p4d, addr, next);
@@ -251,10 +251,10 @@ int __ref kasan_populate_early_shadow(const void *shadow_start,
* puds,pmds, so pgd_populate(), pud_populate()
* is noops.
*/
- pgd_populate(&init_mm, pgd,
+ pgd_populate_kernel(addr, pgd,
lm_alias(kasan_early_shadow_p4d));
p4d = p4d_offset(pgd, addr);
- p4d_populate(&init_mm, p4d,
+ p4d_populate_kernel(addr, p4d,
lm_alias(kasan_early_shadow_pud));
pud = pud_offset(p4d, addr);
pud_populate(&init_mm, pud,
@@ -273,7 +273,7 @@ int __ref kasan_populate_early_shadow(const void *shadow_start,
if (!p)
return -ENOMEM;
} else {
- pgd_populate(&init_mm, pgd,
+ pgd_populate_kernel(addr, pgd,
early_alloc(PAGE_SIZE, NUMA_NO_NODE));
}
}
diff --git a/mm/percpu.c b/mm/percpu.c
index b35494c8ede2..1615dc3b3af5 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -3134,13 +3134,13 @@ void __init __weak pcpu_populate_pte(unsigned long addr)
if (pgd_none(*pgd)) {
p4d = memblock_alloc_or_panic(P4D_TABLE_SIZE, P4D_TABLE_SIZE);
- pgd_populate(&init_mm, pgd, p4d);
+ pgd_populate_kernel(addr, pgd, p4d);
}
p4d = p4d_offset(pgd, addr);
if (p4d_none(*p4d)) {
pud = memblock_alloc_or_panic(PUD_TABLE_SIZE, PUD_TABLE_SIZE);
- p4d_populate(&init_mm, p4d, pud);
+ p4d_populate_kernel(addr, p4d, pud);
}
pud = pud_offset(p4d, addr);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index fd2ab5118e13..e275310ac708 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -229,7 +229,7 @@ p4d_t * __meminit vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node)
if (!p)
return NULL;
pud_init(p);
- p4d_populate(&init_mm, p4d, p);
+ p4d_populate_kernel(addr, p4d, p);
}
return p4d;
}
@@ -241,7 +241,7 @@ pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
if (!p)
return NULL;
- pgd_populate(&init_mm, pgd, p);
+ pgd_populate_kernel(addr, pgd, p);
}
return pgd;
}
--
2.43.0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x ee9f3a81ab08dfe0538dbd1746f81fd4d5147fdc
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072840-quickstep-spiny-0e80@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ee9f3a81ab08dfe0538dbd1746f81fd4d5147fdc Mon Sep 17 00:00:00 2001
From: Ma Ke <make24(a)iscas.ac.cn>
Date: Thu, 17 Jul 2025 10:23:08 +0800
Subject: [PATCH] dpaa2-eth: Fix device reference count leak in MAC endpoint
handling
The fsl_mc_get_endpoint() function uses device_find_child() for
localization, which implicitly calls get_device() to increment the
device's reference count before returning the pointer. However, the
caller dpaa2_eth_connect_mac() fails to properly release this
reference in multiple scenarios. We should call put_device() to
decrement reference count properly.
As comment of device_find_child() says, 'NOTE: you will need to drop
the reference with put_device() after use'.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 719479230893 ("dpaa2-eth: add MAC/PHY support through phylink")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
Tested-by: Ioana Ciornei <ioana.ciornei(a)nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei(a)nxp.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://patch.msgid.link/20250717022309.3339976-2-make24@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index b82f121cadad..0f4efd505332 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -4666,12 +4666,19 @@ static int dpaa2_eth_connect_mac(struct dpaa2_eth_priv *priv)
return PTR_ERR(dpmac_dev);
}
- if (IS_ERR(dpmac_dev) || dpmac_dev->dev.type != &fsl_mc_bus_dpmac_type)
+ if (IS_ERR(dpmac_dev))
return 0;
+ if (dpmac_dev->dev.type != &fsl_mc_bus_dpmac_type) {
+ err = 0;
+ goto out_put_device;
+ }
+
mac = kzalloc(sizeof(struct dpaa2_mac), GFP_KERNEL);
- if (!mac)
- return -ENOMEM;
+ if (!mac) {
+ err = -ENOMEM;
+ goto out_put_device;
+ }
mac->mc_dev = dpmac_dev;
mac->mc_io = priv->mc_io;
@@ -4705,6 +4712,8 @@ static int dpaa2_eth_connect_mac(struct dpaa2_eth_priv *priv)
dpaa2_mac_close(mac);
err_free_mac:
kfree(mac);
+out_put_device:
+ put_device(&dpmac_dev->dev);
return err;
}
This patch series aims at adding support for Exynos7870's DECON in the
Exynos7 DECON driver. It introduces a driver data struct so that support
for DECON on other SoCs can be added to it in the future.
It also fixes a few bugs in the driver, such as functions receiving bad
pointers.
Tested on Samsung Galaxy J7 Prime (samsung-on7xelte), Samsung Galaxy A2
Core (samsung-a2corelte), and Samsung Galaxy J6 (samsung-j6lte).
Signed-off-by: Kaustabh Chakraborty <kauschluss(a)disroot.org>
---
Changes in v4:
- Drop applied patch [v2 3/3].
- Correct documentation of port dt property.
- Add documentation of memory-region.
- Remove redundant ctx->suspended completely.
- Link to v3: https://lore.kernel.org/r/20250627-exynosdrm-decon-v3-0-5b456f88cfea@disroo…
Changes in v3:
- Add a new commit documenting iommus and ports dt properties.
- Link to v2: https://lore.kernel.org/r/20250612-exynosdrm-decon-v2-0-d6c1d21c8057@disroo…
Changes in v2:
- Add a new commit to prevent an occasional panic under circumstances.
- Rewrite and redo [v1 2/6] to be a more sensible commit.
- Link to v1: https://lore.kernel.org/r/20240919-exynosdrm-decon-v1-0-6c5861c1cb04@disroo…
---
Kaustabh Chakraborty (2):
dt-bindings: display: samsung,exynos7-decon: document iommus, memory-region, and ports
drm/exynos: exynos7_drm_decon: remove ctx->suspended
.../display/samsung/samsung,exynos7-decon.yaml | 21 +++++++++++++
drivers/gpu/drm/exynos/exynos7_drm_decon.c | 36 ----------------------
2 files changed, 21 insertions(+), 36 deletions(-)
---
base-commit: 26ffb3d6f02cd0935fb9fa3db897767beee1cb2a
change-id: 20240917-exynosdrm-decon-4c228dd1d2bf
Best regards,
--
Kaustabh Chakraborty <kauschluss(a)disroot.org>
The patch titled
Subject: mm/shmem, swap: improve cached mTHP handling and fix potential hang
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-shmem-swap-improve-cached-mthp-handling-and-fix-potential-hang.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Kairui Song <kasong(a)tencent.com>
Subject: mm/shmem, swap: improve cached mTHP handling and fix potential hang
Date: Mon, 28 Jul 2025 15:52:59 +0800
The current swap-in code assumes that, when a swap entry in shmem mapping
is order 0, its cached folios (if present) must be order 0 too, which
turns out not always correct.
The problem is shmem_split_large_entry is called before verifying the
folio will eventually be swapped in, one possible race is:
CPU1 CPU2
shmem_swapin_folio
/* swap in of order > 0 swap entry S1 */
folio = swap_cache_get_folio
/* folio = NULL */
order = xa_get_order
/* order > 0 */
folio = shmem_swap_alloc_folio
/* mTHP alloc failure, folio = NULL */
<... Interrupted ...>
shmem_swapin_folio
/* S1 is swapped in */
shmem_writeout
/* S1 is swapped out, folio cached */
shmem_split_large_entry(..., S1)
/* S1 is split, but the folio covering it has order > 0 now */
Now any following swapin of S1 will hang: `xa_get_order` returns 0, and
folio lookup will return a folio with order > 0. The
`xa_get_order(&mapping->i_pages, index) != folio_order(folio)` will always
return false causing swap-in to return -EEXIST.
And this looks fragile. So fix this up by allowing seeing a larger folio
in swap cache, and check the whole shmem mapping range covered by the
swapin have the right swap value upon inserting the folio. And drop the
redundant tree walks before the insertion.
This will actually improve performance, as it avoids two redundant Xarray
tree walks in the hot path, and the only side effect is that in the
failure path, shmem may redundantly reallocate a few folios causing
temporary slight memory pressure.
And worth noting, it may seems the order and value check before inserting
might help reducing the lock contention, which is not true. The swap
cache layer ensures raced swapin will either see a swap cache folio or
failed to do a swapin (we have SWAP_HAS_CACHE bit even if swap cache is
bypassed), so holding the folio lock and checking the folio flag is
already good enough for avoiding the lock contention. The chance that a
folio passes the swap entry value check but the shmem mapping slot has
changed should be very low.
Link: https://lkml.kernel.org/r/20250728075306.12704-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20250728075306.12704-2-ryncsn@gmail.com
Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
Signed-off-by: Kairui Song <kasong(a)tencent.com>
Reviewed-by: Kemeng Shi <shikemeng(a)huaweicloud.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Nhat Pham <nphamcs(a)gmail.com>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shmem.c | 39 ++++++++++++++++++++++++++++++---------
1 file changed, 30 insertions(+), 9 deletions(-)
--- a/mm/shmem.c~mm-shmem-swap-improve-cached-mthp-handling-and-fix-potential-hang
+++ a/mm/shmem.c
@@ -884,7 +884,9 @@ static int shmem_add_to_page_cache(struc
pgoff_t index, void *expected, gfp_t gfp)
{
XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
- long nr = folio_nr_pages(folio);
+ unsigned long nr = folio_nr_pages(folio);
+ swp_entry_t iter, swap;
+ void *entry;
VM_BUG_ON_FOLIO(index != round_down(index, nr), folio);
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
@@ -896,14 +898,25 @@ static int shmem_add_to_page_cache(struc
gfp &= GFP_RECLAIM_MASK;
folio_throttle_swaprate(folio, gfp);
+ swap = radix_to_swp_entry(expected);
do {
+ iter = swap;
xas_lock_irq(&xas);
- if (expected != xas_find_conflict(&xas)) {
- xas_set_err(&xas, -EEXIST);
- goto unlock;
+ xas_for_each_conflict(&xas, entry) {
+ /*
+ * The range must either be empty, or filled with
+ * expected swap entries. Shmem swap entries are never
+ * partially freed without split of both entry and
+ * folio, so there shouldn't be any holes.
+ */
+ if (!expected || entry != swp_to_radix_entry(iter)) {
+ xas_set_err(&xas, -EEXIST);
+ goto unlock;
+ }
+ iter.val += 1 << xas_get_order(&xas);
}
- if (expected && xas_find_conflict(&xas)) {
+ if (expected && iter.val - nr != swap.val) {
xas_set_err(&xas, -EEXIST);
goto unlock;
}
@@ -2327,7 +2340,7 @@ static int shmem_swapin_folio(struct ino
error = -ENOMEM;
goto failed;
}
- } else if (order != folio_order(folio)) {
+ } else if (order > folio_order(folio)) {
/*
* Swap readahead may swap in order 0 folios into swapcache
* asynchronously, while the shmem mapping can still stores
@@ -2352,15 +2365,23 @@ static int shmem_swapin_folio(struct ino
swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
}
+ } else if (order < folio_order(folio)) {
+ swap.val = round_down(swap.val, 1 << folio_order(folio));
+ index = round_down(index, 1 << folio_order(folio));
}
alloced:
- /* We have to do this with folio locked to prevent races */
+ /*
+ * We have to do this with the folio locked to prevent races.
+ * The shmem_confirm_swap below only checks if the first swap
+ * entry matches the folio, that's enough to ensure the folio
+ * is not used outside of shmem, as shmem swap entries
+ * and swap cache folios are never partially freed.
+ */
folio_lock(folio);
if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
- folio->swap.val != swap.val ||
!shmem_confirm_swap(mapping, index, swap) ||
- xa_get_order(&mapping->i_pages, index) != folio_order(folio)) {
+ folio->swap.val != swap.val) {
error = -EEXIST;
goto unlock;
}
_
Patches currently in -mm which might be from kasong(a)tencent.com are
mm-shmem-swap-improve-cached-mthp-handling-and-fix-potential-hang.patch
mm-shmem-swap-avoid-redundant-xarray-lookup-during-swapin.patch
mm-shmem-swap-tidy-up-thp-swapin-checks.patch
mm-shmem-swap-tidy-up-swap-entry-splitting.patch
mm-shmem-swap-never-use-swap-cache-and-readahead-for-swp_synchronous_io.patch
mm-shmem-swap-simplify-swapin-path-and-result-handling.patch
mm-shmem-swap-rework-swap-entry-and-index-calculation-for-large-swapin.patch
mm-shmem-swap-fix-major-fault-counting.patch
The patch titled
Subject: mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped
has been added to the -mm mm-unstable branch. Its filename is
mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped
Date: Mon, 28 Jul 2025 10:53:55 -0700
By inducing delays in the right places, Jann Horn created a reproducer for
a hard to hit UAF issue that became possible after VMAs were allowed to be
recycled by adding SLAB_TYPESAFE_BY_RCU to their cache.
Race description is borrowed from Jann's discovery report:
lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under
rcu_read_lock(). At that point, the VMA may be concurrently freed, and it
can be recycled by another process. vma_start_read() then increments the
vma->vm_refcnt (if it is in an acceptable range), and if this succeeds,
vma_start_read() can return a recycled VMA.
In this scenario where the VMA has been recycled, lock_vma_under_rcu()
will then detect the mismatching ->vm_mm pointer and drop the VMA through
vma_end_read(), which calls vma_refcount_put(). vma_refcount_put() drops
the refcount and then calls rcuwait_wake_up() using a copy of vma->vm_mm.
This is wrong: It implicitly assumes that the caller is keeping the VMA's
mm alive, but in this scenario the caller has no relation to the VMA's mm,
so the rcuwait_wake_up() can cause UAF.
The diagram depicting the race:
T1 T2 T3
== == ==
lock_vma_under_rcu
mas_walk
<VMA gets removed from mm>
mmap
<the same VMA is reallocated>
vma_start_read
__refcount_inc_not_zero_limited_acquire
munmap
__vma_enter_locked
refcount_add_not_zero
vma_end_read
vma_refcount_put
__refcount_dec_and_test
rcuwait_wait_event
<finish operation>
rcuwait_wake_up [UAF]
Note that rcuwait_wait_event() in T3 does not block because refcount was
already dropped by T1. At this point T3 can exit and free the mm causing
UAF in T1. To avoid this we move vma->vm_mm verification into
vma_start_read() and grab vma->vm_mm to stabilize it before
vma_refcount_put() operation.
Link: https://lkml.kernel.org/r/20250728175355.2282375-1-surenb@google.com
Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+…
Cc: Jann Horn <jannh(a)google.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mmap_lock.h | 23 +++++++++++++++++++++++
mm/mmap_lock.c | 10 +++-------
2 files changed, 26 insertions(+), 7 deletions(-)
--- a/include/linux/mmap_lock.h~mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped
+++ a/include/linux/mmap_lock.h
@@ -12,6 +12,7 @@ extern int rcuwait_wake_up(struct rcuwai
#include <linux/tracepoint-defs.h>
#include <linux/types.h>
#include <linux/cleanup.h>
+#include <linux/sched/mm.h>
#define MMAP_LOCK_INITIALIZER(name) \
.mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock),
@@ -183,6 +184,28 @@ static inline struct vm_area_struct *vma
}
rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
+
+ /*
+ * If vma got attached to another mm from under us, that mm is not
+ * stable and can be freed in the narrow window after vma->vm_refcnt
+ * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
+ * releasing vma->vm_refcnt.
+ */
+ if (unlikely(vma->vm_mm != mm)) {
+ /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
+ struct mm_struct *other_mm = vma->vm_mm;
+ /*
+ * __mmdrop() is a heavy operation and we don't need RCU
+ * protection here. Release RCU lock during these operations.
+ */
+ rcu_read_unlock();
+ mmgrab(other_mm);
+ vma_refcount_put(vma);
+ mmdrop(other_mm);
+ rcu_read_lock();
+ return NULL;
+ }
+
/*
* Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
* False unlocked result is impossible because we modify and check
--- a/mm/mmap_lock.c~mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped
+++ a/mm/mmap_lock.c
@@ -164,8 +164,7 @@ retry:
*/
/* Check if the vma we locked is the right one. */
- if (unlikely(vma->vm_mm != mm ||
- address < vma->vm_start || address >= vma->vm_end))
+ if (unlikely(address < vma->vm_start || address >= vma->vm_end))
goto inval_end_read;
rcu_read_unlock();
@@ -236,11 +235,8 @@ retry:
goto fallback;
}
- /*
- * Verify the vma we locked belongs to the same address space and it's
- * not behind of the last search position.
- */
- if (unlikely(vma->vm_mm != mm || from_addr >= vma->vm_end))
+ /* Verify the vma is not behind of the last search position. */
+ if (unlikely(from_addr >= vma->vm_end))
goto fallback_unlock;
/*
_
Patches currently in -mm which might be from surenb(a)google.com are
mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped.patch
The patch titled
Subject: kasan/test: fix protection against compiler elision
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kasan-test-fix-protection-against-compiler-elision.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: kasan/test: fix protection against compiler elision
Date: Mon, 28 Jul 2025 22:11:54 +0200
The kunit test is using assignments to
"static volatile void *kasan_ptr_result" to prevent elision of memory
loads, but that's not working:
In this variable definition, the "volatile" applies to the "void", not to
the pointer.
To make "volatile" apply to the pointer as intended, it must follow
after the "*".
This makes the kasan_memchr test pass again on my system. The
kasan_strings test is still failing because all the definitions of
load_unaligned_zeropad() are lacking explicit instrumentation hooks and
ASAN does not instrument asm() memory operands.
Link: https://lkml.kernel.org/r/20250728-kasan-kunit-fix-volatile-v1-1-e7157c9af8…
Fixes: 5f1c8108e7ad ("mm:kasan: fix sparse warnings: Should it be static?")
Signed-off-by: Jann Horn <jannh(a)google.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Nihar Chaithanya <niharchaithanya(a)gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/kasan_test_c.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/kasan/kasan_test_c.c~kasan-test-fix-protection-against-compiler-elision
+++ a/mm/kasan/kasan_test_c.c
@@ -47,7 +47,7 @@ static struct {
* Some tests use these global variables to store return values from function
* calls that could otherwise be eliminated by the compiler as dead code.
*/
-static volatile void *kasan_ptr_result;
+static void *volatile kasan_ptr_result;
static volatile int kasan_int_result;
/* Probe for console output: obtains test_status lines of interest. */
_
Patches currently in -mm which might be from jannh(a)google.com are
kasan-test-fix-protection-against-compiler-elision.patch
kasan-skip-quarantine-if-object-is-still-accessible-under-rcu.patch
mm-rmap-add-anon_vma-lifetime-debug-check.patch
The patch titled
Subject: mm/kmemleak: avoid soft lockup in __kmemleak_do_cleanup()
has been added to the -mm mm-nonmm-unstable branch. Its filename is
mm-kmemleak-avoid-soft-lockup-in-__kmemleak_do_cleanup.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Waiman Long <longman(a)redhat.com>
Subject: mm/kmemleak: avoid soft lockup in __kmemleak_do_cleanup()
Date: Mon, 28 Jul 2025 15:02:48 -0400
A soft lockup warning was observed on a relative small system x86-64
system with 16 GB of memory when running a debug kernel with kmemleak
enabled.
watchdog: BUG: soft lockup - CPU#8 stuck for 33s! [kworker/8:1:134]
The test system was running a workload with hot unplug happening in
parallel. Then kemleak decided to disable itself due to its inability to
allocate more kmemleak objects. The debug kernel has its
CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE set to 40,000.
The soft lockup happened in kmemleak_do_cleanup() when the existing
kmemleak objects were being removed and deleted one-by-one in a loop via a
workqueue. In this particular case, there are at least 40,000 objects
that need to be processed and given the slowness of a debug kernel and the
fact that a raw_spinlock has to be acquired and released in
__delete_object(), it could take a while to properly handle all these
objects.
As kmemleak has been disabled in this case, the object removal and
deletion process can be further optimized as locking isn't really needed.
However, it is probably not worth the effort to optimize for such an edge
case that should rarely happen. So the simple solution is to call
cond_resched() at periodic interval in the iteration loop to avoid soft
lockup.
Link: https://lkml.kernel.org/r/20250728190248.605750-1-longman@redhat.com
Signed-off-by: Waiman Long <longman(a)redhat.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmemleak.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/mm/kmemleak.c~mm-kmemleak-avoid-soft-lockup-in-__kmemleak_do_cleanup
+++ a/mm/kmemleak.c
@@ -2181,6 +2181,7 @@ static const struct file_operations kmem
static void __kmemleak_do_cleanup(void)
{
struct kmemleak_object *object, *tmp;
+ unsigned int cnt = 0;
/*
* Kmemleak has already been disabled, no need for RCU list traversal
@@ -2189,6 +2190,10 @@ static void __kmemleak_do_cleanup(void)
list_for_each_entry_safe(object, tmp, &object_list, object_list) {
__remove_object(object);
__delete_object(object);
+
+ /* Call cond_resched() once per 64 iterations to avoid soft lockup */
+ if (!(++cnt & 0x3f))
+ cond_resched();
}
}
_
Patches currently in -mm which might be from longman(a)redhat.com are
mm-kmemleak-avoid-soft-lockup-in-__kmemleak_do_cleanup.patch
By inducing delays in the right places, Jann Horn created a reproducer
for a hard to hit UAF issue that became possible after VMAs were allowed
to be recycled by adding SLAB_TYPESAFE_BY_RCU to their cache.
Race description is borrowed from Jann's discovery report:
lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under
rcu_read_lock(). At that point, the VMA may be concurrently freed, and
it can be recycled by another process. vma_start_read() then
increments the vma->vm_refcnt (if it is in an acceptable range), and
if this succeeds, vma_start_read() can return a recycled VMA.
In this scenario where the VMA has been recycled, lock_vma_under_rcu()
will then detect the mismatching ->vm_mm pointer and drop the VMA
through vma_end_read(), which calls vma_refcount_put().
vma_refcount_put() drops the refcount and then calls rcuwait_wake_up()
using a copy of vma->vm_mm. This is wrong: It implicitly assumes that
the caller is keeping the VMA's mm alive, but in this scenario the caller
has no relation to the VMA's mm, so the rcuwait_wake_up() can cause UAF.
The diagram depicting the race:
T1 T2 T3
== == ==
lock_vma_under_rcu
mas_walk
<VMA gets removed from mm>
mmap
<the same VMA is reallocated>
vma_start_read
__refcount_inc_not_zero_limited_acquire
munmap
__vma_enter_locked
refcount_add_not_zero
vma_end_read
vma_refcount_put
__refcount_dec_and_test
rcuwait_wait_event
<finish operation>
rcuwait_wake_up [UAF]
Note that rcuwait_wait_event() in T3 does not block because refcount
was already dropped by T1. At this point T3 can exit and free the mm
causing UAF in T1.
To avoid this we move vma->vm_mm verification into vma_start_read() and
grab vma->vm_mm to stabilize it before vma_refcount_put() operation.
Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU")
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+…
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
---
- Applies cleanly over mm-unstable.
- Should be applied to 6.15 and 6.16 but these branches do not
have lock_next_vma() function, so the change in lock_next_vma() should be
skipped when applying to those branches.
include/linux/mmap_lock.h | 21 +++++++++++++++++++++
mm/mmap_lock.c | 10 +++-------
2 files changed, 24 insertions(+), 7 deletions(-)
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 1f4f44951abe..4ee4ab835c41 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -12,6 +12,7 @@ extern int rcuwait_wake_up(struct rcuwait *w);
#include <linux/tracepoint-defs.h>
#include <linux/types.h>
#include <linux/cleanup.h>
+#include <linux/sched/mm.h>
#define MMAP_LOCK_INITIALIZER(name) \
.mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock),
@@ -183,6 +184,26 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
}
rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
+
+ /*
+ * If vma got attached to another mm from under us, that mm is not
+ * stable and can be freed in the narrow window after vma->vm_refcnt
+ * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
+ * releasing vma->vm_refcnt.
+ */
+ if (unlikely(vma->vm_mm != mm)) {
+ /*
+ * __mmdrop() is a heavy operation and we don't need RCU
+ * protection here. Release RCU lock during these operations.
+ */
+ rcu_read_unlock();
+ mmgrab(vma->vm_mm);
+ vma_refcount_put(vma);
+ mmdrop(vma->vm_mm);
+ rcu_read_lock();
+ return NULL;
+ }
+
/*
* Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
* False unlocked result is impossible because we modify and check
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 729fb7d0dd59..aa3bc42ecde0 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -164,8 +164,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
*/
/* Check if the vma we locked is the right one. */
- if (unlikely(vma->vm_mm != mm ||
- address < vma->vm_start || address >= vma->vm_end))
+ if (unlikely(address < vma->vm_start || address >= vma->vm_end))
goto inval_end_read;
rcu_read_unlock();
@@ -236,11 +235,8 @@ struct vm_area_struct *lock_next_vma(struct mm_struct *mm,
goto fallback;
}
- /*
- * Verify the vma we locked belongs to the same address space and it's
- * not behind of the last search position.
- */
- if (unlikely(vma->vm_mm != mm || from_addr >= vma->vm_end))
+ /* Verify the vma is not behind of the last search position. */
+ if (unlikely(from_addr >= vma->vm_end))
goto fallback_unlock;
/*
base-commit: c617a4dd7102e691fa0fb2bc4f6b369e37d7f509
--
2.50.1.487.gc89ff58d15-goog
Hi Greg, Sasha,
Please consider commit f0915acd1fc6 ("rust: give Clippy the minimum
supported Rust version") for 6.12.y. It should cherry-pick
It will clean a lint starting with Rust 1.90.0 (expected 2025-09-18)
-- more details below [1]. Moreover, it also aligns us with 6.15.y and
mainline in terms of Clippy behavior, which also helps.
It is pretty safe, since it is just a config option for Clippy. Even
if a bug were to occur that somehow broke it only in stable, normal
kernel builds do not use Clippy to begin with.
Thanks!
Cheers,
Miguel
[1]
Starting with Rust 1.90.0 (expected 2025-09-18), Clippy detects manual
implementations of `is_multiple_of`, of which we have one in the QR
code panic code:
warning: manual implementation of `.is_multiple_of()`
--> drivers/gpu/drm/drm_panic_qr.rs:886:20
|
886 | if (x ^ y) % 2 == 0 && !self.is_reserved(x, y) {
| ^^^^^^^^^^^^^^^^ help: replace with: `(x
^ y).is_multiple_of(2)`
|
= help: for further information visit
https://rust-lang.github.io/rust-clippy/master/index.html#manual_is_multipl…
= note: `-W clippy::manual-is-multiple-of` implied by `-W clippy::all`
= help: to override `-W clippy::all` add
`#[allow(clippy::manual_is_multiple_of)]`
While in general `is_multiple_of` does not have the same behavior as
`b == 0` [2], in this case the suggestion is fine.
However, we cannot use `is_multiple_of` yet because it was introduced
(unstably) in Rust 1.81.0, and we still support Rust 1.78.0.
Normally, we would conditionally allow it (which is what I was going
to do in a patch for mainline), but it turns out we don't trigger it
in mainline nor 6.15.y because of the `msrv` config option which makes
Clippy avoid the lint.
Thus, instead of a custom patch, I decided to align Clippy here by
backporting the config option instead.
Link: https://github.com/rust-lang/rust-clippy/issues/15335 [2]
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 87c4e1459e80bf65066f864c762ef4dc932fad4b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072856-crisply-wannabe-7c72@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 87c4e1459e80bf65066f864c762ef4dc932fad4b Mon Sep 17 00:00:00 2001
From: Nathan Chancellor <nathan(a)kernel.org>
Date: Fri, 20 Jun 2025 19:08:09 +0100
Subject: [PATCH] ARM: 9448/1: Use an absolute path to unified.h in
KBUILD_AFLAGS
After commit d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper
flags and language target"), which updated as-instr to use the
'assembler-with-cpp' language option, the Kbuild version of as-instr
always fails internally for arch/arm with
<command-line>: fatal error: asm/unified.h: No such file or directory
compilation terminated.
because '-include' flags are now taken into account by the compiler
driver and as-instr does not have '$(LINUXINCLUDE)', so unified.h is not
found.
This went unnoticed at the time of the Kbuild change because the last
use of as-instr in Kbuild that arch/arm could reach was removed in 5.7
by commit 541ad0150ca4 ("arm: Remove 32bit KVM host support") but a
stable backport of the Kbuild change to before that point exposed this
potential issue if one were to be reintroduced.
Follow the general pattern of '-include' paths throughout the tree and
make unified.h absolute using '$(srctree)' to ensure KBUILD_AFLAGS can
be used independently.
Closes: https://lore.kernel.org/CACo-S-1qbCX4WAVFA63dWfHtrRHZBTyyr2js8Lx=Az03XHTTHg…
Cc: stable(a)vger.kernel.org
Fixes: d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper flags and language target")
Reported-by: KernelCI bot <bot(a)kernelci.org>
Reviewed-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 4808d3ed98e4..e31e95ffd33f 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -149,7 +149,7 @@ endif
# Need -Uarm for gcc < 3.x
KBUILD_CPPFLAGS +=$(cpp-y)
KBUILD_CFLAGS +=$(CFLAGS_ABI) $(CFLAGS_ISA) $(arch-y) $(tune-y) $(call cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,)) -msoft-float -Uarm
-KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) -Wa,$(arch-y) $(tune-y) -include asm/unified.h -msoft-float
+KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) -Wa,$(arch-y) $(tune-y) -include $(srctree)/arch/arm/include/asm/unified.h -msoft-float
KBUILD_RUSTFLAGS += --target=arm-unknown-linux-gnueabi
CHECKFLAGS += -D__arm__
[ Upstream commit 08ae4b20f5e82101d77326ecab9089e110f224cc ]
The handling of the `COMEDI_INSNLIST` ioctl allocates a kernel buffer to
hold the array of `struct comedi_insn`, getting the length from the
`n_insns` member of the `struct comedi_insnlist` supplied by the user.
The allocation will fail with a WARNING and a stack dump if it is too
large.
Avoid that by failing with an `-EINVAL` error if the supplied `n_insns`
value is unreasonable.
Define the limit on the `n_insns` value in the `MAX_INSNS` macro. Set
this to the same value as `MAX_SAMPLES` (65536), which is the maximum
allowed sum of the values of the member `n` in the array of `struct
comedi_insn`, and sensible comedi instructions will have an `n` of at
least 1.
Reported-by: syzbot+d6995b62e5ac7d79557a(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d6995b62e5ac7d79557a
Fixes: ed9eccbe8970 ("Staging: add comedi core")
Tested-by: Ian Abbott <abbotti(a)mev.co.uk>
Cc: stable(a)vger.kernel.org # 5.13+
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
Link: https://lore.kernel.org/r/20250704120405.83028-1-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[ Reworked for before commit bac42fb21259 ("comedi: get rid of compat_alloc_user_space() mess in COMEDI_CMD{,TEST} compat") ]
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
---
drivers/staging/comedi/comedi_compat32.c | 3 +++
drivers/staging/comedi/comedi_fops.c | 13 +++++++++++++
2 files changed, 16 insertions(+)
diff --git a/drivers/staging/comedi/comedi_compat32.c b/drivers/staging/comedi/comedi_compat32.c
index 36a3564ba1fb..2f444e2b92c2 100644
--- a/drivers/staging/comedi/comedi_compat32.c
+++ b/drivers/staging/comedi/comedi_compat32.c
@@ -360,6 +360,9 @@ static int compat_insnlist(struct file *file, unsigned long arg)
if (err)
return -EFAULT;
+ if (n_insns > 65536) /* See MAX_INSNS in comedi_fops.c */
+ return -EINVAL;
+
/* Allocate user memory to copy insnlist and insns into. */
s = compat_alloc_user_space(offsetof(struct combined_insnlist,
insn[n_insns]));
diff --git a/drivers/staging/comedi/comedi_fops.c b/drivers/staging/comedi/comedi_fops.c
index 8b2337f8303d..413863bc929b 100644
--- a/drivers/staging/comedi/comedi_fops.c
+++ b/drivers/staging/comedi/comedi_fops.c
@@ -1502,6 +1502,16 @@ static int parse_insn(struct comedi_device *dev, struct comedi_insn *insn,
return ret;
}
+#define MAX_INSNS 65536
+static int check_insnlist_len(struct comedi_device *dev, unsigned int n_insns)
+{
+ if (n_insns > MAX_INSNS) {
+ dev_dbg(dev->class_dev, "insnlist length too large\n");
+ return -EINVAL;
+ }
+ return 0;
+}
+
/*
* COMEDI_INSNLIST ioctl
* synchronous instruction list
@@ -1534,6 +1544,9 @@ static int do_insnlist_ioctl(struct comedi_device *dev,
if (copy_from_user(&insnlist, arg, sizeof(insnlist)))
return -EFAULT;
+ ret = check_insnlist_len(dev, insnlist32.n_insns);
+ if (ret)
+ return ret;
insns = kcalloc(insnlist.n_insns, sizeof(*insns), GFP_KERNEL);
if (!insns) {
ret = -ENOMEM;
--
2.47.2
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 87c4e1459e80bf65066f864c762ef4dc932fad4b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072855-footbath-stooge-3311@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 87c4e1459e80bf65066f864c762ef4dc932fad4b Mon Sep 17 00:00:00 2001
From: Nathan Chancellor <nathan(a)kernel.org>
Date: Fri, 20 Jun 2025 19:08:09 +0100
Subject: [PATCH] ARM: 9448/1: Use an absolute path to unified.h in
KBUILD_AFLAGS
After commit d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper
flags and language target"), which updated as-instr to use the
'assembler-with-cpp' language option, the Kbuild version of as-instr
always fails internally for arch/arm with
<command-line>: fatal error: asm/unified.h: No such file or directory
compilation terminated.
because '-include' flags are now taken into account by the compiler
driver and as-instr does not have '$(LINUXINCLUDE)', so unified.h is not
found.
This went unnoticed at the time of the Kbuild change because the last
use of as-instr in Kbuild that arch/arm could reach was removed in 5.7
by commit 541ad0150ca4 ("arm: Remove 32bit KVM host support") but a
stable backport of the Kbuild change to before that point exposed this
potential issue if one were to be reintroduced.
Follow the general pattern of '-include' paths throughout the tree and
make unified.h absolute using '$(srctree)' to ensure KBUILD_AFLAGS can
be used independently.
Closes: https://lore.kernel.org/CACo-S-1qbCX4WAVFA63dWfHtrRHZBTyyr2js8Lx=Az03XHTTHg…
Cc: stable(a)vger.kernel.org
Fixes: d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper flags and language target")
Reported-by: KernelCI bot <bot(a)kernelci.org>
Reviewed-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 4808d3ed98e4..e31e95ffd33f 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -149,7 +149,7 @@ endif
# Need -Uarm for gcc < 3.x
KBUILD_CPPFLAGS +=$(cpp-y)
KBUILD_CFLAGS +=$(CFLAGS_ABI) $(CFLAGS_ISA) $(arch-y) $(tune-y) $(call cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,)) -msoft-float -Uarm
-KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) -Wa,$(arch-y) $(tune-y) -include asm/unified.h -msoft-float
+KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) -Wa,$(arch-y) $(tune-y) -include $(srctree)/arch/arm/include/asm/unified.h -msoft-float
KBUILD_RUSTFLAGS += --target=arm-unknown-linux-gnueabi
CHECKFLAGS += -D__arm__
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 87c4e1459e80bf65066f864c762ef4dc932fad4b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072854-backstab-skeletal-e45e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 87c4e1459e80bf65066f864c762ef4dc932fad4b Mon Sep 17 00:00:00 2001
From: Nathan Chancellor <nathan(a)kernel.org>
Date: Fri, 20 Jun 2025 19:08:09 +0100
Subject: [PATCH] ARM: 9448/1: Use an absolute path to unified.h in
KBUILD_AFLAGS
After commit d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper
flags and language target"), which updated as-instr to use the
'assembler-with-cpp' language option, the Kbuild version of as-instr
always fails internally for arch/arm with
<command-line>: fatal error: asm/unified.h: No such file or directory
compilation terminated.
because '-include' flags are now taken into account by the compiler
driver and as-instr does not have '$(LINUXINCLUDE)', so unified.h is not
found.
This went unnoticed at the time of the Kbuild change because the last
use of as-instr in Kbuild that arch/arm could reach was removed in 5.7
by commit 541ad0150ca4 ("arm: Remove 32bit KVM host support") but a
stable backport of the Kbuild change to before that point exposed this
potential issue if one were to be reintroduced.
Follow the general pattern of '-include' paths throughout the tree and
make unified.h absolute using '$(srctree)' to ensure KBUILD_AFLAGS can
be used independently.
Closes: https://lore.kernel.org/CACo-S-1qbCX4WAVFA63dWfHtrRHZBTyyr2js8Lx=Az03XHTTHg…
Cc: stable(a)vger.kernel.org
Fixes: d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper flags and language target")
Reported-by: KernelCI bot <bot(a)kernelci.org>
Reviewed-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 4808d3ed98e4..e31e95ffd33f 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -149,7 +149,7 @@ endif
# Need -Uarm for gcc < 3.x
KBUILD_CPPFLAGS +=$(cpp-y)
KBUILD_CFLAGS +=$(CFLAGS_ABI) $(CFLAGS_ISA) $(arch-y) $(tune-y) $(call cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,)) -msoft-float -Uarm
-KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) -Wa,$(arch-y) $(tune-y) -include asm/unified.h -msoft-float
+KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) -Wa,$(arch-y) $(tune-y) -include $(srctree)/arch/arm/include/asm/unified.h -msoft-float
KBUILD_RUSTFLAGS += --target=arm-unknown-linux-gnueabi
CHECKFLAGS += -D__arm__
Commit 3c7ac40d7322 ("scsi: ufs: core: Delegate the interrupt service
routine to a threaded IRQ handler") introduced a massive performance
drop for various work loads on UFSHC versions < 4 due to the extra
latency introduced by moving all of the IRQ handling into a threaded
handler. See below for a summary.
To resolve this performance drop, move IRQ handling back into hardirq
context, but apply a time limit which, once expired, will cause the
remainder of the work to be deferred to the threaded handler.
Above commit is trying to avoid unduly delay of other subsystem
interrupts while the UFS events are being handled. By limiting the
amount of time spent in hardirq context, we can still ensure that.
The time limit itself was chosen because I have generally seen
interrupt handling to have been completed within 20 usecs, with the
occasional spikes of a couple 100 usecs.
This commits brings UFS performance roughly back to original
performance, and should still avoid other subsystem's starvation thanks
to dealing with these spikes.
fio results on Pixel 6:
read / 1 job original after this commit
min IOPS 4,653.60 2,704.40 3,902.80
max IOPS 6,151.80 4,847.60 6,103.40
avg IOPS 5,488.82 4,226.61 5,314.89
cpu % usr 1.85 1.72 1.97
cpu % sys 32.46 28.88 33.29
bw MB/s 21.46 16.50 20.76
read / 8 jobs original after this commit
min IOPS 18,207.80 11,323.00 17,911.80
max IOPS 25,535.80 14,477.40 24,373.60
avg IOPS 22,529.93 13,325.59 21,868.85
cpu % usr 1.70 1.41 1.67
cpu % sys 27.89 21.85 27.23
bw MB/s 88.10 52.10 84.48
write / 1 job original after this commit
min IOPS 6,524.20 3,136.00 5,988.40
max IOPS 7,303.60 5,144.40 7,232.40
avg IOPS 7,169.80 4,608.29 7,014.66
cpu % usr 2.29 2.34 2.23
cpu % sys 41.91 39.34 42.48
bw MB/s 28.02 18.00 27.42
write / 8 jobs original after this commit
min IOPS 12,685.40 13,783.00 12,622.40
max IOPS 30,814.20 22,122.00 29,636.00
avg IOPS 21,539.04 18,552.63 21,134.65
cpu % usr 2.08 1.61 2.07
cpu % sys 30.86 23.88 30.64
bw MB/s 84.18 72.54 82.62
Closes: https://lore.kernel.org/all/1e06161bf49a3a88c4ea2e7a406815be56114c4f.camel@…
Fixes: 3c7ac40d7322 ("scsi: ufs: core: Delegate the interrupt service routine to a threaded IRQ handler")
Cc: stable(a)vger.kernel.org
Signed-off-by: André Draszik <andre.draszik(a)linaro.org>
---
drivers/ufs/core/ufshcd.c | 192 ++++++++++++++++++++++++++++++++++++----------
1 file changed, 153 insertions(+), 39 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 13f7e0469141619cfc5e180aa730171ff01b8fb1..a117c6a30680f4ece113a7602b61f9f09bd4fda5 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -111,6 +111,9 @@ enum {
/* bMaxNumOfRTT is equal to two after device manufacturing */
#define DEFAULT_MAX_NUM_RTT 2
+/* Time limit in usecs for hardirq context */
+#define HARDIRQ_TIMELIMIT 20
+
/* UFSHC 4.0 compliant HC support this mode. */
static bool use_mcq_mode = true;
@@ -5603,14 +5606,31 @@ void ufshcd_compl_one_cqe(struct ufs_hba *hba, int task_tag,
* __ufshcd_transfer_req_compl - handle SCSI and query command completion
* @hba: per adapter instance
* @completed_reqs: bitmask that indicates which requests to complete
+ * @time_limit: maximum amount of jiffies to spend executing command completion
+ *
+ * This completes the individual requests as per @completed_reqs with an
+ * optional time limit. If a time limit is given and it expired before all
+ * requests were handled, the return value will indicate which requests have not
+ * been handled.
+ *
+ * Return: Bitmask that indicates which requests have not been completed due to
+ *time limit expiry.
*/
-static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
- unsigned long completed_reqs)
+static unsigned long __ufshcd_transfer_req_compl(struct ufs_hba *hba,
+ unsigned long completed_reqs,
+ unsigned long time_limit)
{
int tag;
- for_each_set_bit(tag, &completed_reqs, hba->nutrs)
+ for_each_set_bit(tag, &completed_reqs, hba->nutrs) {
ufshcd_compl_one_cqe(hba, tag, NULL);
+ __clear_bit(tag, &completed_reqs);
+ if (time_limit && time_after_eq(jiffies, time_limit))
+ break;
+ }
+
+ /* any bits still set represent unhandled requests due to timeout */
+ return completed_reqs;
}
/* Any value that is not an existing queue number is fine for this constant. */
@@ -5633,16 +5653,29 @@ static void ufshcd_clear_polled(struct ufs_hba *hba,
}
}
-/*
- * Return: > 0 if one or more commands have been completed or 0 if no
- * requests have been completed.
+/**
+ * ufshcd_poll_impl - handle SCSI and query command completion helper
+ * @shost: Scsi_Host instance
+ * @queue_num: The h/w queue number, or UFSHCD_POLL_FROM_INTERRUPT_CONTEXT when
+ * invoked from the interrupt handler
+ * @time_limit: maximum amount of jiffies to spend executing command completion
+ * @__pending: Pointer to store any still pending requests in case of time limit
+ * expiry
+ *
+ * This handles completed commands with an optional time limit. If a time limit
+ * is given and it expires, @__pending will be set to the requests that could
+ * not be completed in time and are still pending.
+ *
+ * Return: true if one or more commands have been completed, false otherwise.
*/
-static int ufshcd_poll(struct Scsi_Host *shost, unsigned int queue_num)
+static int ufshcd_poll_impl(struct Scsi_Host *shost, unsigned int queue_num,
+ unsigned long time_limit, unsigned long *__pending)
{
struct ufs_hba *hba = shost_priv(shost);
unsigned long completed_reqs, flags;
u32 tr_doorbell;
struct ufs_hw_queue *hwq;
+ unsigned long pending = 0;
if (hba->mcq_enabled) {
hwq = &hba->uhq[queue_num];
@@ -5656,19 +5689,39 @@ static int ufshcd_poll(struct Scsi_Host *shost, unsigned int queue_num)
WARN_ONCE(completed_reqs & ~hba->outstanding_reqs,
"completed: %#lx; outstanding: %#lx\n", completed_reqs,
hba->outstanding_reqs);
- if (queue_num == UFSHCD_POLL_FROM_INTERRUPT_CONTEXT) {
- /* Do not complete polled requests from interrupt context. */
+ if (time_limit) {
+ /* Do not complete polled requests from hardirq context. */
ufshcd_clear_polled(hba, &completed_reqs);
}
+
+ if (completed_reqs)
+ pending = __ufshcd_transfer_req_compl(hba, completed_reqs,
+ time_limit);
+
+ completed_reqs &= ~pending;
hba->outstanding_reqs &= ~completed_reqs;
+
spin_unlock_irqrestore(&hba->outstanding_lock, flags);
- if (completed_reqs)
- __ufshcd_transfer_req_compl(hba, completed_reqs);
+ if (__pending)
+ *__pending = pending;
return completed_reqs != 0;
}
+/*
+ * ufshcd_poll - SCSI interface of blk_poll to poll for IO completions
+ * @shost: Scsi_Host instance
+ * @queue_num: The h/w queue number
+ *
+ * Return: > 0 if one or more commands have been completed or 0 if no
+ * requests have been completed.
+ */
+static int ufshcd_poll(struct Scsi_Host *shost, unsigned int queue_num)
+{
+ return ufshcd_poll_impl(shost, queue_num, 0, NULL);
+}
+
/**
* ufshcd_mcq_compl_pending_transfer - MCQ mode function. It is
* invoked from the error handler context or ufshcd_host_reset_and_restore()
@@ -5722,13 +5775,19 @@ static void ufshcd_mcq_compl_pending_transfer(struct ufs_hba *hba,
/**
* ufshcd_transfer_req_compl - handle SCSI and query command completion
* @hba: per adapter instance
+ * @time_limit: maximum amount of jiffies to spend executing command completion
*
* Return:
- * IRQ_HANDLED - If interrupt is valid
- * IRQ_NONE - If invalid interrupt
+ * IRQ_HANDLED - If interrupt is valid
+ * IRQ_WAKE_THREAD - If further interrupt processing should be delegated to the
+ * thread
+ * IRQ_NONE - If invalid interrupt
*/
-static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
+static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba,
+ unsigned long time_limit)
{
+ unsigned long pending;
+
/* Resetting interrupt aggregation counters first and reading the
* DOOR_BELL afterward allows us to handle all the completed requests.
* In order to prevent other interrupts starvation the DB is read once
@@ -5744,12 +5803,19 @@ static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
return IRQ_HANDLED;
/*
- * Ignore the ufshcd_poll() return value and return IRQ_HANDLED since we
- * do not want polling to trigger spurious interrupt complaints.
+ * Ignore the ufshcd_poll() return value and return IRQ_HANDLED or
+ * IRQ_WAKE_THREAD since we do not want polling to trigger spurious
+ * interrupt complaints.
*/
- ufshcd_poll(hba->host, UFSHCD_POLL_FROM_INTERRUPT_CONTEXT);
+ ufshcd_poll_impl(hba->host, UFSHCD_POLL_FROM_INTERRUPT_CONTEXT,
+ time_limit, &pending);
- return IRQ_HANDLED;
+ /*
+ * If a time limit was set, some requests completions might not have
+ * been handled yet and will need to be dealt with in the threaded
+ * interrupt handler.
+ */
+ return pending ? IRQ_WAKE_THREAD : IRQ_HANDLED;
}
int __ufshcd_write_ee_control(struct ufs_hba *hba, u32 ee_ctrl_mask)
@@ -6310,7 +6376,7 @@ static void ufshcd_complete_requests(struct ufs_hba *hba, bool force_compl)
if (hba->mcq_enabled)
ufshcd_mcq_compl_pending_transfer(hba, force_compl);
else
- ufshcd_transfer_req_compl(hba);
+ ufshcd_transfer_req_compl(hba, 0);
ufshcd_tmc_handler(hba);
}
@@ -7012,12 +7078,16 @@ static irqreturn_t ufshcd_handle_mcq_cq_events(struct ufs_hba *hba)
* ufshcd_sl_intr - Interrupt service routine
* @hba: per adapter instance
* @intr_status: contains interrupts generated by the controller
+ * @time_limit: maximum amount of jiffies to spend executing command completion
*
* Return:
- * IRQ_HANDLED - If interrupt is valid
- * IRQ_NONE - If invalid interrupt
+ * IRQ_HANDLED - If interrupt is valid
+ * IRQ_WAKE_THREAD - If further interrupt processing should be delegated to the
+ * thread
+ * IRQ_NONE - If invalid interrupt
*/
-static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status)
+static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status,
+ unsigned long time_limit)
{
irqreturn_t retval = IRQ_NONE;
@@ -7031,7 +7101,7 @@ static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status)
retval |= ufshcd_tmc_handler(hba);
if (intr_status & UTP_TRANSFER_REQ_COMPL)
- retval |= ufshcd_transfer_req_compl(hba);
+ retval |= ufshcd_transfer_req_compl(hba, time_limit);
if (intr_status & MCQ_CQ_EVENT_STATUS)
retval |= ufshcd_handle_mcq_cq_events(hba);
@@ -7040,15 +7110,25 @@ static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status)
}
/**
- * ufshcd_threaded_intr - Threaded interrupt service routine
+ * ufshcd_intr_helper - hardirq and threaded interrupt service routine
* @irq: irq number
* @__hba: pointer to adapter instance
+ * @time_limit: maximum amount of jiffies to spend executing
+ *
+ * Interrupts are initially served from hardirq context with a time limit, but
+ * if there is more work to be done than can be completed before the limit
+ * expires, remaining work is delegated to the IRQ thread. This helper does the
+ * bulk of the work in either case - if @time_limit is set, it is being run from
+ * hardirq context, otherwise from the threaded interrupt handler.
*
* Return:
- * IRQ_HANDLED - If interrupt is valid
- * IRQ_NONE - If invalid interrupt
+ * IRQ_HANDLED - If interrupt was fully handled
+ * IRQ_WAKE_THREAD - If further interrupt processing should be delegated to the
+ * thread
+ * IRQ_NONE - If invalid interrupt
*/
-static irqreturn_t ufshcd_threaded_intr(int irq, void *__hba)
+static irqreturn_t ufshcd_intr_helper(int irq, void *__hba,
+ unsigned long time_limit)
{
u32 last_intr_status, intr_status, enabled_intr_status = 0;
irqreturn_t retval = IRQ_NONE;
@@ -7062,15 +7142,22 @@ static irqreturn_t ufshcd_threaded_intr(int irq, void *__hba)
* if the reqs get finished 1 by 1 after the interrupt status is
* read, make sure we handle them by checking the interrupt status
* again in a loop until we process all of the reqs before returning.
+ * This done until the time limit is exceeded, at which point further
+ * processing is delegated to the threaded handler.
*/
- while (intr_status && retries--) {
+ while (intr_status && !(retval & IRQ_WAKE_THREAD) && retries--) {
enabled_intr_status =
intr_status & ufshcd_readl(hba, REG_INTERRUPT_ENABLE);
ufshcd_writel(hba, intr_status, REG_INTERRUPT_STATUS);
if (enabled_intr_status)
- retval |= ufshcd_sl_intr(hba, enabled_intr_status);
+ retval |= ufshcd_sl_intr(hba, enabled_intr_status,
+ time_limit);
intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
+
+ if (intr_status && time_limit && time_after_eq(jiffies,
+ time_limit))
+ retval |= IRQ_WAKE_THREAD;
}
if (enabled_intr_status && retval == IRQ_NONE &&
@@ -7087,6 +7174,20 @@ static irqreturn_t ufshcd_threaded_intr(int irq, void *__hba)
return retval;
}
+/**
+ * ufshcd_threaded_intr - Threaded interrupt service routine
+ * @irq: irq number
+ * @__hba: pointer to adapter instance
+ *
+ * Return:
+ * IRQ_HANDLED - If interrupt was fully handled
+ * IRQ_NONE - If invalid interrupt
+ */
+static irqreturn_t ufshcd_threaded_intr(int irq, void *__hba)
+{
+ return ufshcd_intr_helper(irq, __hba, 0);
+}
+
/**
* ufshcd_intr - Main interrupt service routine
* @irq: irq number
@@ -7094,20 +7195,33 @@ static irqreturn_t ufshcd_threaded_intr(int irq, void *__hba)
*
* Return:
* IRQ_HANDLED - If interrupt is valid
- * IRQ_WAKE_THREAD - If handling is moved to threaded handled
+ * IRQ_WAKE_THREAD - If handling is moved to threaded handler
* IRQ_NONE - If invalid interrupt
*/
static irqreturn_t ufshcd_intr(int irq, void *__hba)
{
struct ufs_hba *hba = __hba;
+ unsigned long time_limit = jiffies +
+ usecs_to_jiffies(HARDIRQ_TIMELIMIT);
- /* Move interrupt handling to thread when MCQ & ESI are not enabled */
- if (!hba->mcq_enabled || !hba->mcq_esi_enabled)
- return IRQ_WAKE_THREAD;
-
- /* Directly handle interrupts since MCQ ESI handlers does the hard job */
- return ufshcd_sl_intr(hba, ufshcd_readl(hba, REG_INTERRUPT_STATUS) &
- ufshcd_readl(hba, REG_INTERRUPT_ENABLE));
+ /*
+ * Directly handle interrupts when MCQ & ESI are enabled since MCQ
+ * ESI handlers do the hard job.
+ */
+ if (hba->mcq_enabled && hba->mcq_esi_enabled)
+ return ufshcd_sl_intr(hba,
+ ufshcd_readl(hba, REG_INTERRUPT_STATUS) &
+ ufshcd_readl(hba, REG_INTERRUPT_ENABLE),
+ 0);
+
+ /* Otherwise handle interrupt in thread */
+ if (!time_limit)
+ /*
+ * To deal with jiffies wrapping, we just add one so that other
+ * code can reliably detect if a time limit was requested.
+ */
+ time_limit++;
+ return ufshcd_intr_helper(irq, __hba, time_limit);
}
static int ufshcd_clear_tm_cmd(struct ufs_hba *hba, int tag)
@@ -7540,7 +7654,7 @@ static int ufshcd_eh_device_reset_handler(struct scsi_cmnd *cmd)
__func__, pos);
}
}
- __ufshcd_transfer_req_compl(hba, pending_reqs & ~not_cleared_mask);
+ __ufshcd_transfer_req_compl(hba, pending_reqs & ~not_cleared_mask, 0);
out:
hba->req_abort_count = 0;
@@ -7696,7 +7810,7 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
dev_err(hba->dev,
"%s: cmd was completed, but without a notifying intr, tag = %d",
__func__, tag);
- __ufshcd_transfer_req_compl(hba, 1UL << tag);
+ __ufshcd_transfer_req_compl(hba, 1UL << tag, 0);
goto release;
}
---
base-commit: 58ba80c4740212c29a1cf9b48f588e60a7612209
change-id: 20250723-ufshcd-hardirq-c7326f642e56
Best regards,
--
André Draszik <andre.draszik(a)linaro.org>
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 87c4e1459e80bf65066f864c762ef4dc932fad4b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072854-unpledged-repaint-0eac@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 87c4e1459e80bf65066f864c762ef4dc932fad4b Mon Sep 17 00:00:00 2001
From: Nathan Chancellor <nathan(a)kernel.org>
Date: Fri, 20 Jun 2025 19:08:09 +0100
Subject: [PATCH] ARM: 9448/1: Use an absolute path to unified.h in
KBUILD_AFLAGS
After commit d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper
flags and language target"), which updated as-instr to use the
'assembler-with-cpp' language option, the Kbuild version of as-instr
always fails internally for arch/arm with
<command-line>: fatal error: asm/unified.h: No such file or directory
compilation terminated.
because '-include' flags are now taken into account by the compiler
driver and as-instr does not have '$(LINUXINCLUDE)', so unified.h is not
found.
This went unnoticed at the time of the Kbuild change because the last
use of as-instr in Kbuild that arch/arm could reach was removed in 5.7
by commit 541ad0150ca4 ("arm: Remove 32bit KVM host support") but a
stable backport of the Kbuild change to before that point exposed this
potential issue if one were to be reintroduced.
Follow the general pattern of '-include' paths throughout the tree and
make unified.h absolute using '$(srctree)' to ensure KBUILD_AFLAGS can
be used independently.
Closes: https://lore.kernel.org/CACo-S-1qbCX4WAVFA63dWfHtrRHZBTyyr2js8Lx=Az03XHTTHg…
Cc: stable(a)vger.kernel.org
Fixes: d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper flags and language target")
Reported-by: KernelCI bot <bot(a)kernelci.org>
Reviewed-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 4808d3ed98e4..e31e95ffd33f 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -149,7 +149,7 @@ endif
# Need -Uarm for gcc < 3.x
KBUILD_CPPFLAGS +=$(cpp-y)
KBUILD_CFLAGS +=$(CFLAGS_ABI) $(CFLAGS_ISA) $(arch-y) $(tune-y) $(call cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,)) -msoft-float -Uarm
-KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) -Wa,$(arch-y) $(tune-y) -include asm/unified.h -msoft-float
+KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) -Wa,$(arch-y) $(tune-y) -include $(srctree)/arch/arm/include/asm/unified.h -msoft-float
KBUILD_RUSTFLAGS += --target=arm-unknown-linux-gnueabi
CHECKFLAGS += -D__arm__
[ Upstream commit 46d8c744136ce2454aa4c35c138cc06817f92b8e ]
Some Comedi subdevice instruction handlers are known to access
instruction data elements beyond the first `insn->n` elements in some
cases. The `do_insn_ioctl()` and `do_insnlist_ioctl()` functions
allocate at least `MIN_SAMPLES` (16) data elements to deal with this,
but they do not initialize all of that. For Comedi instruction codes
that write to the subdevice, the first `insn->n` data elements are
copied from user-space, but the remaining elements are left
uninitialized. That could be a problem if the subdevice instruction
handler reads the uninitialized data. Ensure that the first
`MIN_SAMPLES` elements are initialized before calling these instruction
handlers, filling the uncopied elements with 0. For
`do_insnlist_ioctl()`, the same data buffer elements are used for
handling a list of instructions, so ensure the first `MIN_SAMPLES`
elements are initialized for each instruction that writes to the
subdevice.
Fixes: ed9eccbe8970 ("Staging: add comedi core")
Cc: stable(a)vger.kernel.org # 5.13+
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
Link: https://lore.kernel.org/r/20250707161439.88385-1-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[ Reworked for before commit bac42fb21259 ("comedi: get rid of compat_alloc_user_space() mess in COMEDI_CMD{,TEST} compat") ]
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
---
drivers/staging/comedi/comedi_fops.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/staging/comedi/comedi_fops.c b/drivers/staging/comedi/comedi_fops.c
index 413863bc929b..e4dfb5e7843d 100644
--- a/drivers/staging/comedi/comedi_fops.c
+++ b/drivers/staging/comedi/comedi_fops.c
@@ -1544,7 +1544,7 @@ static int do_insnlist_ioctl(struct comedi_device *dev,
if (copy_from_user(&insnlist, arg, sizeof(insnlist)))
return -EFAULT;
- ret = check_insnlist_len(dev, insnlist32.n_insns);
+ ret = check_insnlist_len(dev, insnlist.n_insns);
if (ret)
return ret;
insns = kcalloc(insnlist.n_insns, sizeof(*insns), GFP_KERNEL);
@@ -1580,21 +1580,27 @@ static int do_insnlist_ioctl(struct comedi_device *dev,
}
for (i = 0; i < insnlist.n_insns; ++i) {
+ unsigned int n = insns[i].n;
+
if (insns[i].insn & INSN_MASK_WRITE) {
if (copy_from_user(data, insns[i].data,
- insns[i].n * sizeof(unsigned int))) {
+ n * sizeof(unsigned int))) {
dev_dbg(dev->class_dev,
"copy_from_user failed\n");
ret = -EFAULT;
goto error;
}
+ if (n < MIN_SAMPLES) {
+ memset(&data[n], 0, (MIN_SAMPLES - n) *
+ sizeof(unsigned int));
+ }
}
ret = parse_insn(dev, insns + i, data, file);
if (ret < 0)
goto error;
if (insns[i].insn & INSN_MASK_READ) {
if (copy_to_user(insns[i].data, data,
- insns[i].n * sizeof(unsigned int))) {
+ n * sizeof(unsigned int))) {
dev_dbg(dev->class_dev,
"copy_to_user failed\n");
ret = -EFAULT;
@@ -1661,6 +1667,10 @@ static int do_insn_ioctl(struct comedi_device *dev,
ret = -EFAULT;
goto error;
}
+ if (insn.n < MIN_SAMPLES) {
+ memset(&data[insn.n], 0,
+ (MIN_SAMPLES - insn.n) * sizeof(unsigned int));
+ }
}
ret = parse_insn(dev, &insn, data, file);
if (ret < 0)
--
2.47.2
Greg recently reported 3 patches that could not be applied without
conflicts in v6.6:
- f8a1d9b18c5e ("mptcp: make fallback action and fallback decision atomic")
- def5b7b2643e ("mptcp: plug races between subflow fail and subflow creation")
- da9b2fc7b73d ("mptcp: reset fallback status gracefully at disconnect() time")
Conflicts, if any, have been resolved, and documented in each patch.
Paolo Abeni (3):
mptcp: make fallback action and fallback decision atomic
mptcp: plug races between subflow fail and subflow creation
mptcp: reset fallback status gracefully at disconnect() time
net/mptcp/options.c | 3 ++-
net/mptcp/pm.c | 8 +++++-
net/mptcp/protocol.c | 58 +++++++++++++++++++++++++++++++++++++-------
net/mptcp/protocol.h | 27 ++++++++++++++++-----
net/mptcp/subflow.c | 30 ++++++++++++++---------
5 files changed, 98 insertions(+), 28 deletions(-)
--
2.50.0
From: Yang Yingliang <yangyingliang(a)huawei.com>
[ Upstream commit 4225fea1cb28370086e17e82c0f69bec2779dca0 ]
I got memory leak as follows when doing fault injection test:
unreferenced object 0xffff88800906c618 (size 8):
comm "i2c-idt82p33931", pid 4421, jiffies 4294948083 (age 13.188s)
hex dump (first 8 bytes):
70 74 70 30 00 00 00 00 ptp0....
backtrace:
[<00000000312ed458>] __kmalloc_track_caller+0x19f/0x3a0
[<0000000079f6e2ff>] kvasprintf+0xb5/0x150
[<0000000026aae54f>] kvasprintf_const+0x60/0x190
[<00000000f323a5f7>] kobject_set_name_vargs+0x56/0x150
[<000000004e35abdd>] dev_set_name+0xc0/0x100
[<00000000f20cfe25>] ptp_clock_register+0x9f4/0xd30 [ptp]
[<000000008bb9f0de>] idt82p33_probe.cold+0x8b6/0x1561 [ptp_idt82p33]
When posix_clock_register() returns an error, the name allocated
in dev_set_name() will be leaked, the put_device() should be used
to give up the device reference, then the name will be freed in
kobject_cleanup() and other memory will be freed in ptp_clock_release().
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Fixes: a33121e5487b ("ptp: fix the race between the release of ptp_clock and cdev")
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Shivani: Modified to apply on 5.10.y, Removed
kfree(ptp->vclock_index) in the ptach, since vclock_index is
introduced in later versions]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
drivers/ptp/ptp_clock.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index c895e26b1f17..869023f0987e 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -283,15 +283,20 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
/* Create a posix clock and link it to the device. */
err = posix_clock_register(&ptp->clock, &ptp->dev);
if (err) {
+ if (ptp->pps_source)
+ pps_unregister_source(ptp->pps_source);
+
+ if (ptp->kworker)
+ kthread_destroy_worker(ptp->kworker);
+
+ put_device(&ptp->dev);
+
pr_err("failed to create posix clock\n");
- goto no_clock;
+ return ERR_PTR(err);
}
return ptp;
-no_clock:
- if (ptp->pps_source)
- pps_unregister_source(ptp->pps_source);
no_pps:
ptp_cleanup_pin_groups(ptp);
no_pin_groups:
--
2.40.4
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 87c4e1459e80bf65066f864c762ef4dc932fad4b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072848-unwitting-glandular-25db@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 87c4e1459e80bf65066f864c762ef4dc932fad4b Mon Sep 17 00:00:00 2001
From: Nathan Chancellor <nathan(a)kernel.org>
Date: Fri, 20 Jun 2025 19:08:09 +0100
Subject: [PATCH] ARM: 9448/1: Use an absolute path to unified.h in
KBUILD_AFLAGS
After commit d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper
flags and language target"), which updated as-instr to use the
'assembler-with-cpp' language option, the Kbuild version of as-instr
always fails internally for arch/arm with
<command-line>: fatal error: asm/unified.h: No such file or directory
compilation terminated.
because '-include' flags are now taken into account by the compiler
driver and as-instr does not have '$(LINUXINCLUDE)', so unified.h is not
found.
This went unnoticed at the time of the Kbuild change because the last
use of as-instr in Kbuild that arch/arm could reach was removed in 5.7
by commit 541ad0150ca4 ("arm: Remove 32bit KVM host support") but a
stable backport of the Kbuild change to before that point exposed this
potential issue if one were to be reintroduced.
Follow the general pattern of '-include' paths throughout the tree and
make unified.h absolute using '$(srctree)' to ensure KBUILD_AFLAGS can
be used independently.
Closes: https://lore.kernel.org/CACo-S-1qbCX4WAVFA63dWfHtrRHZBTyyr2js8Lx=Az03XHTTHg…
Cc: stable(a)vger.kernel.org
Fixes: d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper flags and language target")
Reported-by: KernelCI bot <bot(a)kernelci.org>
Reviewed-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 4808d3ed98e4..e31e95ffd33f 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -149,7 +149,7 @@ endif
# Need -Uarm for gcc < 3.x
KBUILD_CPPFLAGS +=$(cpp-y)
KBUILD_CFLAGS +=$(CFLAGS_ABI) $(CFLAGS_ISA) $(arch-y) $(tune-y) $(call cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,)) -msoft-float -Uarm
-KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) -Wa,$(arch-y) $(tune-y) -include asm/unified.h -msoft-float
+KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) -Wa,$(arch-y) $(tune-y) -include $(srctree)/arch/arm/include/asm/unified.h -msoft-float
KBUILD_RUSTFLAGS += --target=arm-unknown-linux-gnueabi
CHECKFLAGS += -D__arm__
The patch below does not apply to the 6.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.15.y
git checkout FETCH_HEAD
git cherry-pick -x 6d496e9569983a0d7a05be6661126d0702cf94f7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072830-reveler-drop-down-d571@gregkh' --subject-prefix 'PATCH 6.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6d496e9569983a0d7a05be6661126d0702cf94f7 Mon Sep 17 00:00:00 2001
From: Thomas Zimmermann <tzimmermann(a)suse.de>
Date: Tue, 15 Jul 2025 17:58:16 +0200
Subject: [PATCH] Revert "drm/gem-shmem: Use dma_buf from GEM object instance"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This reverts commit 1a148af06000e545e714fe3210af3d77ff903c11.
The dma_buf field in struct drm_gem_object is not stable over the
object instance's lifetime. The field becomes NULL when user space
releases the final GEM handle on the buffer object. This resulted
in a NULL-pointer deref.
Workarounds in commit 5307dce878d4 ("drm/gem: Acquire references on
GEM handles for framebuffers") and commit f6bfc9afc751 ("drm/framebuffer:
Acquire internal references on GEM handles") only solved the problem
partially. They especially don't work for buffer objects without a DRM
framebuffer associated.
Hence, this revert to going back to using .import_attach->dmabuf.
v3:
- cc stable
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Reviewed-by: Simona Vetter <simona.vetter(a)ffwll.ch>
Acked-by: Christian König <christian.koenig(a)amd.com>
Acked-by: Zack Rusin <zack.rusin(a)broadcom.com>
Cc: <stable(a)vger.kernel.org> # v6.15+
Link: https://lore.kernel.org/r/20250715155934.150656-7-tzimmermann@suse.de
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index aa43265f4f4f..a5dbee6974ab 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -349,7 +349,7 @@ int drm_gem_shmem_vmap_locked(struct drm_gem_shmem_object *shmem,
int ret = 0;
if (drm_gem_is_imported(obj)) {
- ret = dma_buf_vmap(obj->dma_buf, map);
+ ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
} else {
pgprot_t prot = PAGE_KERNEL;
@@ -409,7 +409,7 @@ void drm_gem_shmem_vunmap_locked(struct drm_gem_shmem_object *shmem,
struct drm_gem_object *obj = &shmem->base;
if (drm_gem_is_imported(obj)) {
- dma_buf_vunmap(obj->dma_buf, map);
+ dma_buf_vunmap(obj->import_attach->dmabuf, map);
} else {
dma_resv_assert_held(shmem->base.resv);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 4ff12d82dac119b4b99b5a78b5af3bf2474c0a36
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072801-earflap-sleet-7b13@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4ff12d82dac119b4b99b5a78b5af3bf2474c0a36 Mon Sep 17 00:00:00 2001
From: Haoxiang Li <haoxiang_li2024(a)163.com>
Date: Thu, 3 Jul 2025 17:52:32 +0800
Subject: [PATCH] ice: Fix a null pointer dereference in
ice_copy_and_init_pkg()
Add check for the return value of devm_kmemdup()
to prevent potential null pointer dereference.
Fixes: c76488109616 ("ice: Implement Dynamic Device Personalization (DDP) download")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski(a)linux.intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov(a)intel.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Tested-by: Rinitha S <sx.rinitha(a)intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c b/drivers/net/ethernet/intel/ice/ice_ddp.c
index 59323c019544..351824dc3c62 100644
--- a/drivers/net/ethernet/intel/ice/ice_ddp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ddp.c
@@ -2301,6 +2301,8 @@ enum ice_ddp_state ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf,
return ICE_DDP_PKG_ERR;
buf_copy = devm_kmemdup(ice_hw_to_dev(hw), buf, len, GFP_KERNEL);
+ if (!buf_copy)
+ return ICE_DDP_PKG_ERR;
state = ice_init_pkg(hw, buf_copy, len);
if (!ice_is_init_pkg_successful(state)) {
From: Victor Shih <victor.shih(a)genesyslogic.com.tw>
Due to a flaw in the hardware design, the GL9763e replay timer frequently
times out when ASPM is enabled. As a result, the warning messages will
often appear in the system log when the system accesses the GL9763e
PCI config. Therefore, the replay timer timeout must be masked.
Also rename the gli_set_gl9763e() to gl9763e_hw_setting() for consistency.
Signed-off-by: Victor Shih <victor.shih(a)genesyslogic.com.tw>
Cc: stable(a)vger.kernel.org
---
drivers/mmc/host/sdhci-pci-gli.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/mmc/host/sdhci-pci-gli.c b/drivers/mmc/host/sdhci-pci-gli.c
index 98ee3191b02f..7165dde9b6b8 100644
--- a/drivers/mmc/host/sdhci-pci-gli.c
+++ b/drivers/mmc/host/sdhci-pci-gli.c
@@ -1753,7 +1753,7 @@ static int gl9763e_add_host(struct sdhci_pci_slot *slot)
return ret;
}
-static void gli_set_gl9763e(struct sdhci_pci_slot *slot)
+static void gl9763e_hw_setting(struct sdhci_pci_slot *slot)
{
struct pci_dev *pdev = slot->chip->pdev;
u32 value;
@@ -1782,6 +1782,9 @@ static void gli_set_gl9763e(struct sdhci_pci_slot *slot)
value |= FIELD_PREP(GLI_9763E_HS400_RXDLY, GLI_9763E_HS400_RXDLY_5);
pci_write_config_dword(pdev, PCIE_GLI_9763E_CLKRXDLY, value);
+ /* mask the replay timer timeout of AER */
+ sdhci_gli_mask_replay_timer_timeout(pdev);
+
pci_read_config_dword(pdev, PCIE_GLI_9763E_VHS, &value);
value &= ~GLI_9763E_VHS_REV;
value |= FIELD_PREP(GLI_9763E_VHS_REV, GLI_9763E_VHS_REV_R);
@@ -1925,7 +1928,7 @@ static int gli_probe_slot_gl9763e(struct sdhci_pci_slot *slot)
gli_pcie_enable_msi(slot);
host->mmc_host_ops.hs400_enhanced_strobe =
gl9763e_hs400_enhanced_strobe;
- gli_set_gl9763e(slot);
+ gl9763e_hw_setting(slot);
sdhci_enable_v4_mode(host);
return 0;
--
2.43.0
From: Mingcong Bai <jeffbai(a)aosc.io>
As this component hooks into userspace API, it should be assumed that it
will play well with non-4KiB/64KiB pages.
Use `PAGE_SIZE' as the final reference for page alignment instead.
Cc: stable(a)vger.kernel.org
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Fixes: 801989b08aff ("drm/xe/uapi: Make constant comments visible in kernel doc")
Tested-by: Mingcong Bai <jeffbai(a)aosc.io>
Tested-by: Wenbin Fang <fangwenbin(a)vip.qq.com>
Tested-by: Haien Liang <27873200(a)qq.com>
Tested-by: Jianfeng Liu <liujianfeng1994(a)gmail.com>
Tested-by: Shirong Liu <lsr1024(a)qq.com>
Tested-by: Haofeng Wu <s2600cw2(a)126.com>
Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3…
Link: https://t.me/c/1109254909/768552
Co-developed-by: Shang Yatsen <429839446(a)qq.com>
Signed-off-by: Shang Yatsen <429839446(a)qq.com>
Signed-off-by: Mingcong Bai <jeffbai(a)aosc.io>
---
drivers/gpu/drm/xe/xe_query.c | 2 +-
include/uapi/drm/xe_drm.h | 7 +++++--
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c
index 44d44bbc71dc..f695d5d0610d 100644
--- a/drivers/gpu/drm/xe/xe_query.c
+++ b/drivers/gpu/drm/xe/xe_query.c
@@ -347,7 +347,7 @@ static int query_config(struct xe_device *xe, struct drm_xe_device_query *query)
config->info[DRM_XE_QUERY_CONFIG_FLAGS] |=
DRM_XE_QUERY_CONFIG_FLAG_HAS_LOW_LATENCY;
config->info[DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT] =
- xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K ? SZ_64K : SZ_4K;
+ xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K ? SZ_64K : PAGE_SIZE;
config->info[DRM_XE_QUERY_CONFIG_VA_BITS] = xe->info.va_bits;
config->info[DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY] =
xe_exec_queue_device_get_max_priority(xe);
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index e2426413488f..5ba76b9369ba 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -397,8 +397,11 @@ struct drm_xe_query_mem_regions {
* has low latency hint support
* - %DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR - Flag is set if the
* device has CPU address mirroring support
- * - %DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT - Minimal memory alignment
- * required by this device, typically SZ_4K or SZ_64K
+ * - %DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT - Minimal memory alignment required
+ * by this device and the CPU. The minimum page size for the device is
+ * usually SZ_4K or SZ_64K, while for the CPU, it is PAGE_SIZE. This value
+ * is calculated by max(min_gpu_page_size, PAGE_SIZE). This alignment is
+ * enforced on buffer object allocations and VM binds.
* - %DRM_XE_QUERY_CONFIG_VA_BITS - Maximum bits of a virtual address
* - %DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY - Value of the highest
* available exec queue priority
--
2.47.2
Hi
The IPQ50xx chip integrates only a 2x2 2.4GHz Wi-Fi module. Through the high-performance NSS core and PCIe expansion and integration
the application range of the IPQ50xx chip ranges from Wi-Fi mesh node to Enterprise AP. OEM manufacturers can think in a unified way when
doing circuit design and PCB layout, and the same packaging can really save designers a lot of trouble.
.# Part Number Manufacturer Date Code Quantity Unit Price Lead Time Condition (PCS) USD/Each one 1 IPQ-5018-0-MRQFN232-TR-01-0 QUALCOMM 2022+ 30000pcs US$3.50/pcs 7days New & original - stock 2 QCN-6102-0-DRQFN116-TR-01-1 QUALCOMM 2022+ 30000pcs US$2.50/pcs 3 QCN-9024-0-MSP234-TR-01-0 QUALCOMM 2022+ 5000pcs US$3.50/pcs 4 QCN-6112-0-DRQFN116-TR-01-0 QUALCOMM 2023+ 15000pcs US$2.70/pcs 5 QCA-8337-AL3C-R QUALCOMM 2023+ 20000pcs US$1.30/pcs
The above are our company's current inventory, all of which are genuine and original packaging.
If you need anything, please feel free to contact me, thank you
Best Regards
Maintain your product-savvy edge with . Stay Updated on News
If you prefer to exit, choose Review Communication Options.
Memory hotunplug is done under the hotplug lock and ptdump walk is done
under the init_mm.mmap_lock. Therefore, ptdump and hotunplug can run
simultaneously without any synchronization. During hotunplug,
free_empty_tables() is ultimately called to free up the pagetables.
The following race can happen, where x denotes the level of the pagetable:
CPU1 CPU2
free_empty_pxd_table
ptdump_walk_pgd()
Get p(x+1)d table from pxd entry
pxd_clear
free_hotplug_pgtable_page(p(x+1)dp)
Still using the p(x+1)d table
which leads to a user-after-free.
To solve this, we need to synchronize ptdump_walk_pgd() with
free_hotplug_pgtable_page() in such a way that ptdump never takes a
reference on a freed pagetable.
Since this race is very unlikely to happen in practice, we do not want to
penalize other code paths taking the init_mm mmap_lock. Therefore, we use
static keys. ptdump will enable the static key - upon observing that,
the free_empty_pxd_table() functions will get patched in with an
mmap_read_lock/unlock sequence. A code comment explains in detail, how
a combination of acquire semantics of static_branch_enable() and the
barriers in __flush_tlb_kernel_pgtable() ensures that ptdump will never
get a hold on the address of a freed pagetable - either ptdump will block
the table freeing path due to write locking the mmap_lock, or, the nullity
of the pxd entry will be observed by ptdump, therefore having no access to
the isolated p(x+1)d pagetable.
This bug was found by code inspection, as a result of working on [1].
1. https://lore.kernel.org/all/20250723161827.15802-1-dev.jain@arm.com/
Cc: <stable(a)vger.kernel.org>
Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
Rebased on Linux 6.16.
arch/arm64/include/asm/ptdump.h | 2 ++
arch/arm64/mm/mmu.c | 61 +++++++++++++++++++++++++++++++++
arch/arm64/mm/ptdump.c | 11 ++++--
3 files changed, 72 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/ptdump.h b/arch/arm64/include/asm/ptdump.h
index fded5358641f..4760168cbd6e 100644
--- a/arch/arm64/include/asm/ptdump.h
+++ b/arch/arm64/include/asm/ptdump.h
@@ -7,6 +7,8 @@
#include <linux/ptdump.h>
+DECLARE_STATIC_KEY_FALSE(arm64_ptdump_key);
+
#ifdef CONFIG_PTDUMP
#include <linux/mm_types.h>
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 00ab1d648db6..d2feef270880 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -46,6 +46,8 @@
#define NO_CONT_MAPPINGS BIT(1)
#define NO_EXEC_MAPPINGS BIT(2) /* assumes FEAT_HPDS is not used */
+DEFINE_STATIC_KEY_FALSE(arm64_ptdump_key);
+
enum pgtable_type {
TABLE_PTE,
TABLE_PMD,
@@ -1002,6 +1004,61 @@ static void unmap_hotplug_range(unsigned long addr, unsigned long end,
} while (addr = next, addr < end);
}
+/*
+ * Our objective is to prevent ptdump from reading a pagetable which has
+ * been freed. Assume that ptdump_walk_pgd() (call this thread T1)
+ * executes completely on CPU1 and free_hotplug_pgtable_page() (call this
+ * thread T2) executes completely on CPU2. Let the region sandwiched by the
+ * mmap_write_lock/unlock in T1 be called CS (the critical section).
+ *
+ * Claim: The CS of T1 will never operate on a freed pagetable.
+ *
+ * Proof:
+ *
+ * Case 1: The static branch is visible to T2.
+ *
+ * Case 1 (a): T1 acquires the lock before T2 can.
+ * T2 will block until T1 drops the lock, so free_hotplug_pgtable_page() will
+ * only be executed after T1 exits CS.
+ *
+ * Case 1 (b): T2 acquires the lock before T1 can.
+ * The acquire semantics of mmap_read_lock() ensure that an empty pagetable
+ * entry (via pxd_clear()) is visible to T1 before T1 can enter CS, therefore
+ * it is impossible for the CS to get hold of the isolated level + 1 pagetable.
+ *
+ * Case 2: The static branch is not visible to T2.
+ *
+ * Since static_branch_enable() and mmap_write_lock() (via smp_mb()) have
+ * acquire semantics, it implies that the static branch will be visible to
+ * all CPUs before T1 can enter CS. The static branch not being visible to
+ * T2 therefore implies that T1 has not yet entered CS .... (i)
+ *
+ * The sequence of barriers via __flush_tlb_kernel_pgtable() in T2
+ * implies that if the invisibility of the static branch has been
+ * observed by T2 (i.e static_branch_unlikely() is observed as false),
+ * then all CPUs will have observed an empty pagetable entry via
+ * pxd_clear() ... (ii)
+ *
+ * Combining (i) and (ii), we conclude that T1 observes an empty pagetable
+ * entry before entering CS => it is impossible for the CS to get hold of
+ * the isolated level + 1 pagetable. Q.E.D
+ *
+ * We have proven that the claim is true on the assumption that
+ * there is no context switch for T1 and T2. Note that the reasoning
+ * of the proof uses barriers operating on the inner shareable domain,
+ * which means that they will affect all CPUs, and also a context switch
+ * will insert extra barriers into the code paths => the claim will
+ * stand true even if we drop the assumption.
+ */
+static void synchronize_with_ptdump(void)
+{
+ if (!static_branch_unlikely(&arm64_ptdump_key))
+ return;
+
+ mmap_read_lock(&init_mm);
+ mmap_read_unlock(&init_mm);
+}
+
static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
unsigned long end, unsigned long floor,
unsigned long ceiling)
@@ -1036,6 +1093,7 @@ static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
pmd_clear(pmdp);
__flush_tlb_kernel_pgtable(start);
+ synchronize_with_ptdump();
free_hotplug_pgtable_page(virt_to_page(ptep));
}
@@ -1076,6 +1134,7 @@ static void free_empty_pmd_table(pud_t *pudp, unsigned long addr,
pud_clear(pudp);
__flush_tlb_kernel_pgtable(start);
+ synchronize_with_ptdump();
free_hotplug_pgtable_page(virt_to_page(pmdp));
}
@@ -1116,6 +1175,7 @@ static void free_empty_pud_table(p4d_t *p4dp, unsigned long addr,
p4d_clear(p4dp);
__flush_tlb_kernel_pgtable(start);
+ synchronize_with_ptdump();
free_hotplug_pgtable_page(virt_to_page(pudp));
}
@@ -1156,6 +1216,7 @@ static void free_empty_p4d_table(pgd_t *pgdp, unsigned long addr,
pgd_clear(pgdp);
__flush_tlb_kernel_pgtable(start);
+ synchronize_with_ptdump();
free_hotplug_pgtable_page(virt_to_page(p4dp));
}
diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index 421a5de806c6..d543c9f8ffa8 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -283,6 +283,13 @@ void note_page_flush(struct ptdump_state *pt_st)
note_page(pt_st, 0, -1, pte_val(pte_zero));
}
+static void arm64_ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm)
+{
+ static_branch_enable(&arm64_ptdump_key);
+ ptdump_walk_pgd(st, mm, NULL);
+ static_branch_disable(&arm64_ptdump_key);
+}
+
void ptdump_walk(struct seq_file *s, struct ptdump_info *info)
{
unsigned long end = ~0UL;
@@ -311,7 +318,7 @@ void ptdump_walk(struct seq_file *s, struct ptdump_info *info)
}
};
- ptdump_walk_pgd(&st.ptdump, info->mm, NULL);
+ arm64_ptdump_walk_pgd(&st.ptdump, info->mm);
}
static void __init ptdump_initialize(void)
@@ -353,7 +360,7 @@ bool ptdump_check_wx(void)
}
};
- ptdump_walk_pgd(&st.ptdump, &init_mm, NULL);
+ arm64_ptdump_walk_pgd(&st.ptdump, &init_mm);
if (st.wx_pages || st.uxn_pages) {
pr_warn("Checked W+X mappings: FAILED, %lu W+X pages found, %lu non-UXN pages found\n",
--
2.30.2
Ever since commit c2ff29e99a76 ("siw: Inline do_tcp_sendpages()"),
we have been doing this:
static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
size_t size)
[...]
/* Calculate the number of bytes we need to push, for this page
* specifically */
size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
/* If we can't splice it, then copy it in, as normal */
if (!sendpage_ok(page[i]))
msg.msg_flags &= ~MSG_SPLICE_PAGES;
/* Set the bvec pointing to the page, with len $bytes */
bvec_set_page(&bvec, page[i], bytes, offset);
/* Set the iter to $size, aka the size of the whole sendpages (!!!) */
iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
try_page_again:
lock_sock(sk);
/* Sendmsg with $size size (!!!) */
rv = tcp_sendmsg_locked(sk, &msg, size);
This means we've been sending oversized iov_iters and tcp_sendmsg calls
for a while. This has a been a benign bug because sendpage_ok() always
returned true. With the recent slab allocator changes being slowly
introduced into next (that disallow sendpage on large kmalloc
allocations), we have recently hit out-of-bounds crashes, due to slight
differences in iov_iter behavior between the MSG_SPLICE_PAGES and
"regular" copy paths:
(MSG_SPLICE_PAGES)
skb_splice_from_iter
iov_iter_extract_pages
iov_iter_extract_bvec_pages
uses i->nr_segs to correctly stop in its tracks before OoB'ing everywhere
skb_splice_from_iter gets a "short" read
(!MSG_SPLICE_PAGES)
skb_copy_to_page_nocache copy=iov_iter_count
[...]
copy_from_iter
/* this doesn't help */
if (unlikely(iter->count < len))
len = iter->count;
iterate_bvec
... and we run off the bvecs
Fix this by properly setting the iov_iter's byte count, plus sending the
correct byte count to tcp_sendmsg_locked.
Cc: stable(a)vger.kernel.org
Fixes: c2ff29e99a76 ("siw: Inline do_tcp_sendpages()")
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202507220801.50a7210-lkp@intel.com
Signed-off-by: Pedro Falcato <pfalcato(a)suse.de>
---
drivers/infiniband/sw/siw/siw_qp_tx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index 3a08f57d2211..9576a2b766c4 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -340,11 +340,11 @@ static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
if (!sendpage_ok(page[i]))
msg.msg_flags &= ~MSG_SPLICE_PAGES;
bvec_set_page(&bvec, page[i], bytes, offset);
- iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);
try_page_again:
lock_sock(sk);
- rv = tcp_sendmsg_locked(sk, &msg, size);
+ rv = tcp_sendmsg_locked(sk, &msg, bytes);
release_sock(sk);
if (rv > 0) {
--
2.50.1
From: Edip Hazuri <edip(a)medip.dev>
The mute led on this laptop is using ALC245 but requires a quirk to work
This patch enables the existing quirk for the device.
Tested on Victus 16-r1xxx Laptop. The LED behaviour works
as intended.
v2:
- adapt the HD-audio code changes and rebase on for-next branch of tiwai/sound.git
- link to v1: https://lore.kernel.org/linux-sound/20250724210756.61453-2-edip@medip.dev/
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Edip Hazuri <edip(a)medip.dev>
---
sound/hda/codecs/realtek/alc269.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/sound/hda/codecs/realtek/alc269.c b/sound/hda/codecs/realtek/alc269.c
index 05019fa73..33ef08d25 100644
--- a/sound/hda/codecs/realtek/alc269.c
+++ b/sound/hda/codecs/realtek/alc269.c
@@ -6580,6 +6580,7 @@ static const struct hda_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x103c, 0x8c91, "HP EliteBook 660", ALC236_FIXUP_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8c96, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
SND_PCI_QUIRK(0x103c, 0x8c97, "HP ZBook", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
+ SND_PCI_QUIRK(0x103c, 0x8c99, "HP Victus 16-r1xxx (MB 8C99)", ALC245_FIXUP_HP_MUTE_LED_COEFBIT),
SND_PCI_QUIRK(0x103c, 0x8c9c, "HP Victus 16-s1xxx (MB 8C9C)", ALC245_FIXUP_HP_MUTE_LED_COEFBIT),
SND_PCI_QUIRK(0x103c, 0x8ca1, "HP ZBook Power", ALC236_FIXUP_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8ca2, "HP ZBook Power", ALC236_FIXUP_HP_GPIO_LED),
--
2.50.1
Since commit 3d1f08b032dc ("mtd: spinand: Use the external ECC engine
logic") the spinand_write_page() function ignores the errors returned
by spinand_wait(). Change the code to propagate those up to the stack
as it was done before the offending change.
Cc: stable(a)vger.kernel.org
Fixes: 3d1f08b032dc ("mtd: spinand: Use the external ECC engine logic")
Signed-off-by: Gabor Juhos <j4g8y7(a)gmail.com>
---
drivers/mtd/nand/spi/core.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/spi/core.c b/drivers/mtd/nand/spi/core.c
index 7099db7a62be61f563380b724ac849057a834211..8cce63aef1b5ad7cda2f6ab28d29565aa979498f 100644
--- a/drivers/mtd/nand/spi/core.c
+++ b/drivers/mtd/nand/spi/core.c
@@ -688,7 +688,10 @@ int spinand_write_page(struct spinand_device *spinand,
SPINAND_WRITE_INITIAL_DELAY_US,
SPINAND_WRITE_POLL_DELAY_US,
&status);
- if (!ret && (status & STATUS_PROG_FAILED))
+ if (ret)
+ return ret;
+
+ if (status & STATUS_PROG_FAILED)
return -EIO;
return nand_ecc_finish_io_req(nand, (struct nand_page_io_req *)req);
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250708-spinand-propagate-wait-timeout-a0220ea7c322
Best regards,
--
Gabor Juhos <j4g8y7(a)gmail.com>
It seems like what was intended is to test if the dma_map of the
previous line failed but the wrong dma address was passed.
Fixes: f88fc122cc34 ("mtd: nand: Cleanup/rework the atmel_nand driver")
Signed-off-by: Thomas Fourier <fourier.thomas(a)gmail.com>
---
v1 -> v2:
- Add stable(a)vger.kernel.org
- Fix subject prefix
drivers/mtd/nand/raw/atmel/nand-controller.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/atmel/nand-controller.c b/drivers/mtd/nand/raw/atmel/nand-controller.c
index dedcca87defc..84ab4a83cbd6 100644
--- a/drivers/mtd/nand/raw/atmel/nand-controller.c
+++ b/drivers/mtd/nand/raw/atmel/nand-controller.c
@@ -373,7 +373,7 @@ static int atmel_nand_dma_transfer(struct atmel_nand_controller *nc,
dma_cookie_t cookie;
buf_dma = dma_map_single(nc->dev, buf, len, dir);
- if (dma_mapping_error(nc->dev, dev_dma)) {
+ if (dma_mapping_error(nc->dev, buf_dma)) {
dev_err(nc->dev,
"Failed to prepare a buffer for DMA access\n");
goto err;
--
2.43.0
Before commit df6d7277e552 ("i2c: core: Do not dereference fwnode in struct
device"), i2c_unregister_device() only called fwnode_handle_put() on
of_node-s in the form of calling of_node_put(client->dev.of_node).
But after this commit the i2c_client's fwnode now unconditionally gets
fwnode_handle_put() on it.
When the i2c_client has no primary (ACPI / OF) fwnode but it does have
a software fwnode, the software-node will be the primary node and
fwnode_handle_put() will put() it.
But for the software fwnode device_remove_software_node() will also put()
it leading to a double free:
[ 82.665598] ------------[ cut here ]------------
[ 82.665609] refcount_t: underflow; use-after-free.
[ 82.665808] WARNING: CPU: 3 PID: 1502 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x11
...
[ 82.666830] RIP: 0010:refcount_warn_saturate+0xba/0x110
...
[ 82.666962] <TASK>
[ 82.666971] i2c_unregister_device+0x60/0x90
Fix this by not calling fwnode_handle_put() when the primary fwnode is
a software-node.
Fixes: df6d7277e552 ("i2c: core: Do not dereference fwnode in struct device")
Cc: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hansg(a)kernel.org>
---
drivers/i2c/i2c-core-base.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/i2c/i2c-core-base.c b/drivers/i2c/i2c-core-base.c
index 2ad2b1838f0f..0849aa44952d 100644
--- a/drivers/i2c/i2c-core-base.c
+++ b/drivers/i2c/i2c-core-base.c
@@ -1066,7 +1066,13 @@ void i2c_unregister_device(struct i2c_client *client)
of_node_clear_flag(to_of_node(fwnode), OF_POPULATED);
else if (is_acpi_device_node(fwnode))
acpi_device_clear_enumerated(to_acpi_device_node(fwnode));
- fwnode_handle_put(fwnode);
+
+ /*
+ * If the primary fwnode is a software node it is free-ed by
+ * device_remove_software_node() below, avoid double-free.
+ */
+ if (!is_software_node(fwnode))
+ fwnode_handle_put(fwnode);
device_remove_software_node(&client->dev);
device_unregister(&client->dev);
--
2.49.0
From: Kairui Song <kasong(a)tencent.com>
The current swap-in code assumes that, when a swap entry in shmem mapping
is order 0, its cached folios (if present) must be order 0 too, which
turns out not always correct.
The problem is shmem_split_large_entry is called before verifying the
folio will eventually be swapped in, one possible race is:
CPU1 CPU2
shmem_swapin_folio
/* swap in of order > 0 swap entry S1 */
folio = swap_cache_get_folio
/* folio = NULL */
order = xa_get_order
/* order > 0 */
folio = shmem_swap_alloc_folio
/* mTHP alloc failure, folio = NULL */
<... Interrupted ...>
shmem_swapin_folio
/* S1 is swapped in */
shmem_writeout
/* S1 is swapped out, folio cached */
shmem_split_large_entry(..., S1)
/* S1 is split, but the folio covering it has order > 0 now */
Now any following swapin of S1 will hang: `xa_get_order` returns 0, and
folio lookup will return a folio with order > 0. The
`xa_get_order(&mapping->i_pages, index) != folio_order(folio)` will always
return false causing swap-in to return -EEXIST.
And this looks fragile. So fix this up by allowing seeing a larger folio
in swap cache, and check the whole shmem mapping range covered by the
swapin have the right swap value upon inserting the folio. And drop the
redundant tree walks before the insertion.
This will actually improve performance, as it avoids two redundant Xarray
tree walks in the hot path, and the only side effect is that in the
failure path, shmem may redundantly reallocate a few folios causing
temporary slight memory pressure.
And worth noting, it may seems the order and value check before inserting
might help reducing the lock contention, which is not true. The swap
cache layer ensures raced swapin will either see a swap cache folio or
failed to do a swapin (we have SWAP_HAS_CACHE bit even if swap cache is
bypassed), so holding the folio lock and checking the folio flag is
already good enough for avoiding the lock contention. The chance that a
folio passes the swap entry value check but the shmem mapping slot has
changed should be very low.
Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
Signed-off-by: Kairui Song <kasong(a)tencent.com>
Reviewed-by: Kemeng Shi <shikemeng(a)huaweicloud.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: <stable(a)vger.kernel.org>
---
mm/shmem.c | 39 ++++++++++++++++++++++++++++++---------
1 file changed, 30 insertions(+), 9 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 7570a24e0ae4..1d0fd266c29b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -891,7 +891,9 @@ static int shmem_add_to_page_cache(struct folio *folio,
pgoff_t index, void *expected, gfp_t gfp)
{
XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
- long nr = folio_nr_pages(folio);
+ unsigned long nr = folio_nr_pages(folio);
+ swp_entry_t iter, swap;
+ void *entry;
VM_BUG_ON_FOLIO(index != round_down(index, nr), folio);
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
@@ -903,14 +905,25 @@ static int shmem_add_to_page_cache(struct folio *folio,
gfp &= GFP_RECLAIM_MASK;
folio_throttle_swaprate(folio, gfp);
+ swap = radix_to_swp_entry(expected);
do {
+ iter = swap;
xas_lock_irq(&xas);
- if (expected != xas_find_conflict(&xas)) {
- xas_set_err(&xas, -EEXIST);
- goto unlock;
+ xas_for_each_conflict(&xas, entry) {
+ /*
+ * The range must either be empty, or filled with
+ * expected swap entries. Shmem swap entries are never
+ * partially freed without split of both entry and
+ * folio, so there shouldn't be any holes.
+ */
+ if (!expected || entry != swp_to_radix_entry(iter)) {
+ xas_set_err(&xas, -EEXIST);
+ goto unlock;
+ }
+ iter.val += 1 << xas_get_order(&xas);
}
- if (expected && xas_find_conflict(&xas)) {
+ if (expected && iter.val - nr != swap.val) {
xas_set_err(&xas, -EEXIST);
goto unlock;
}
@@ -2359,7 +2372,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
error = -ENOMEM;
goto failed;
}
- } else if (order != folio_order(folio)) {
+ } else if (order > folio_order(folio)) {
/*
* Swap readahead may swap in order 0 folios into swapcache
* asynchronously, while the shmem mapping can still stores
@@ -2384,15 +2397,23 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
}
+ } else if (order < folio_order(folio)) {
+ swap.val = round_down(swap.val, 1 << folio_order(folio));
+ index = round_down(index, 1 << folio_order(folio));
}
alloced:
- /* We have to do this with folio locked to prevent races */
+ /*
+ * We have to do this with the folio locked to prevent races.
+ * The shmem_confirm_swap below only checks if the first swap
+ * entry matches the folio, that's enough to ensure the folio
+ * is not used outside of shmem, as shmem swap entries
+ * and swap cache folios are never partially freed.
+ */
folio_lock(folio);
if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
- folio->swap.val != swap.val ||
!shmem_confirm_swap(mapping, index, swap) ||
- xa_get_order(&mapping->i_pages, index) != folio_order(folio)) {
+ folio->swap.val != swap.val) {
error = -EEXIST;
goto unlock;
}
--
2.50.1