From: Gabriel Totev gabriel.totev@zetier.com
[ Upstream commit c5bf96d20fd787e4909b755de4705d52f3458836 ]
When using AppArmor profiles inside an unprivileged container, the link operation observes an unshifted ouid. (tested with LXD and Incus)
For example, root inside container and uid 1000000 outside, with `owner /root/link l,` profile entry for ln:
/root$ touch chain && ln chain link ==> dmesg apparmor="DENIED" operation="link" class="file" namespace="root//lxd-feet_<var-snap-lxd-common-lxd>" profile="linkit" name="/root/link" pid=1655 comm="ln" requested_mask="l" denied_mask="l" fsuid=1000000 ouid=0 [<== should be 1000000] target="/root/chain"
Fix by mapping inode uid of old_dentry in aa_path_link() rather than using it directly, similarly to how it's mapped in __file_path_perm() later in the file.
Signed-off-by: Gabriel Totev gabriel.totev@zetier.com Signed-off-by: John Johansen john.johansen@canonical.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Clear Bug Fix**: This fixes a real bug where AppArmor incorrectly reports the unshifted uid when mediating hard link operations inside user namespaces/containers. The commit message provides a concrete example showing ouid=0 instead of the expected ouid=1000000 in container logs.
2. **Security Impact**: This is a security-relevant bug that causes AppArmor policy enforcement to behave incorrectly in containerized environments. The owner-based AppArmor rules (like `owner /root/link l,`) won't work correctly because the uid comparison is done with the wrong (unshifted) value.
3. **Minimal and Contained Fix**: The change is small and surgical - it only modifies the aa_path_link() function to properly map the inode uid through the mount's idmapping, following the exact same pattern already used in __file_path_perm(): - Uses `i_uid_into_vfsuid(mnt_idmap(target.mnt), inode)` to get the vfsuid - Converts it back with `vfsuid_into_kuid(vfsuid)` for the path_cond structure
4. **No New Features or Architecture Changes**: This is purely a bug fix that makes aa_path_link() consistent with how __file_path_perm() already handles uid mapping. No new functionality is added.
5. **Container/User Namespace Compatibility**: With the widespread use of containers (LXD, Incus, Docker with userns), this bug affects many production systems. The fix ensures AppArmor policies work correctly in these environments.
6. **Low Risk**: The change follows an established pattern in the codebase (from __file_path_perm) and only affects the specific case of hard link permission checks in user namespaces. The risk of regression is minimal.
7. **Clear Testing**: The commit message indicates this was tested with LXD and Incus containers, showing the issue is reproducible and the fix validated.
The code change is straightforward - replacing direct access to `d_backing_inode(old_dentry)->i_uid` with proper idmapping-aware conversion that respects user namespace uid shifts. This makes aa_path_link() consistent with other AppArmor file operations that already handle idmapped mounts correctly.
security/apparmor/file.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/security/apparmor/file.c b/security/apparmor/file.c index d52a5b14dad4..62bc46e03758 100644 --- a/security/apparmor/file.c +++ b/security/apparmor/file.c @@ -423,9 +423,11 @@ int aa_path_link(const struct cred *subj_cred, { struct path link = { .mnt = new_dir->mnt, .dentry = new_dentry }; struct path target = { .mnt = new_dir->mnt, .dentry = old_dentry }; + struct inode *inode = d_backing_inode(old_dentry); + vfsuid_t vfsuid = i_uid_into_vfsuid(mnt_idmap(target.mnt), inode); struct path_cond cond = { - d_backing_inode(old_dentry)->i_uid, - d_backing_inode(old_dentry)->i_mode + .uid = vfsuid_into_kuid(vfsuid), + .mode = inode->i_mode, }; char *buffer = NULL, *buffer2 = NULL; struct aa_profile *profile;
From: Purva Yeshi purvayeshi550@gmail.com
[ Upstream commit 487767bff572d46f7c37ad846c4078f6d6c9cc55 ]
Fix Smatch-detected error: drivers/md/dm-zoned-target.c:1073 dmz_iterate_devices() error: uninitialized symbol 'r'.
Smatch detects a possible use of the uninitialized variable 'r' in dmz_iterate_devices() because if dmz->nr_ddevs is zero, the loop is skipped and 'r' is returned without being set, leading to undefined behavior.
Initialize 'r' to 0 before the loop. This ensures that if there are no devices to iterate over, the function still returns a defined value.
Signed-off-by: Purva Yeshi purvayeshi550@gmail.com Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees. Here's my extensive analysis:
## Analysis of the Bug
The commit fixes a genuine uninitialized variable bug in `dmz_iterate_devices()` function in drivers/md/dm-zoned-target.c:1073. The bug occurs when:
1. The variable `r` is declared but not initialized at line 1065 2. If `dmz->nr_ddevs` is 0, the for loop at line 1067 is never entered 3. The function returns `r` at line 1073, which contains garbage data
This is a clear programming error that can lead to undefined behavior. The function could return any arbitrary value when there are no devices to iterate over.
## Why This Should Be Backported
1. **Real Bug Fix**: This fixes an actual bug that can cause unpredictable behavior. An uninitialized return value can lead to: - Incorrect error handling by callers - Potential system instability - Unpredictable behavior in device mapper operations
2. **Small and Contained Fix**: The fix is minimal - just initializing `r` to 0. This follows the pattern used in other similar iterate_devices implementations like `dm-cache-target.c:3438` where `int r = 0;` is properly initialized.
3. **No Side Effects**: The change has no architectural impact or side effects. It simply ensures the function returns a defined value (0) when there are no devices to iterate, which is the expected behavior.
4. **Affects Users**: While the condition (nr_ddevs == 0) might be rare, it's still a possible configuration that could affect users of dm- zoned devices. The undefined behavior could manifest in various ways depending on what's in memory.
5. **Static Analysis Finding**: This was caught by Smatch static analysis, indicating it's a real code quality issue that should be fixed.
6. **Follows Stable Rules**: The fix is: - Obviously correct - Fixes a real bug (uninitialized variable use) - Minimal risk of regression - Small change (1 line) - No new features or architectural changes
## Technical Context
Looking at the code structure, `nr_ddevs` is set from `argc` in line 858, which must be at least 1 based on the check at line 835. However, there could be error paths or future code changes that might result in this function being called with nr_ddevs == 0, making this defensive programming important.
The fix aligns with how other device mapper targets handle their iterate_devices functions, where the return value is properly initialized before use.
drivers/md/dm-zoned-target.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c index 5da3db06da10..9da329078ea4 100644 --- a/drivers/md/dm-zoned-target.c +++ b/drivers/md/dm-zoned-target.c @@ -1062,7 +1062,7 @@ static int dmz_iterate_devices(struct dm_target *ti, struct dmz_target *dmz = ti->private; unsigned int zone_nr_sectors = dmz_zone_nr_sectors(dmz->metadata); sector_t capacity; - int i, r; + int i, r = 0;
for (i = 0; i < dmz->nr_ddevs; i++) { capacity = dmz->dev[i].capacity & ~(zone_nr_sectors - 1);
From: Wolfram Sang wsa+renesas@sang-engineering.com
[ Upstream commit 447270cdb41b1c8c3621bb14b93a6749f942556e ]
'I3C_BCR_HDR_CAP' is still spec v1.0 and has been renamed to 'advanced capabilities' in v1.1 onwards. The ST pressure sensor LPS22DF does not have HDR, but has the 'advanced cap' bit set. The core still wants to get additional information using the CCC 'GETHDRCAP' (or GETCAPS in v1.1 onwards). Not all controllers support this CCC and will notify the upper layers about it. For instantiating the device, we can ignore this unsupported CCC as standard communication will work. Without this patch, the device will not be instantiated at all.
Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Reviewed-by: Frank Li Frank.Li@nxp.com Link: https://lore.kernel.org/r/20250704204524.6124-1-wsa+renesas@sang-engineering... Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
Based on my analysis of the commit and the surrounding code, here's my assessment:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Fixes a real bug that prevents device instantiation**: The commit message clearly states that "Without this patch, the device will not be instantiated at all." This is a concrete functionality issue where legitimate I3C devices (like the ST pressure sensor LPS22DF) cannot be used with controllers that don't support the GETHDRCAP CCC command.
2. **Small and contained fix**: The change is minimal - only 2 lines changed: - Line 1441: Changed from `if (ret)` to `if (ret && ret != -ENOTSUPP)` - This allows the function to continue when GETHDRCAP returns -ENOTSUPP instead of failing
3. **Clear regression prevention**: The fix prevents device initialization failure for hardware combinations that should work. The I3C spec evolved from v1.0 to v1.1, and this handles backward compatibility gracefully.
4. **No architectural changes**: This is a simple error handling adjustment that doesn't change any core functionality or introduce new features.
5. **Fixes interoperability issue**: The commit addresses a real-world hardware compatibility problem between certain I3C controllers and devices, which is exactly the type of bug that stable kernels should fix.
6. **Low risk of regression**: The change only affects the error path when GETHDRCAP fails with -ENOTSUPP. It doesn't change behavior for successful cases or other error conditions.
7. **Follows stable rules**: - It's obviously correct (treats unsupported CCC as non-fatal) - It's tiny (2-line change) - Fixes one specific issue - Fixes a real bug that prevents hardware from working
The commit allows I3C device initialization to proceed even when the controller doesn't support the GETHDRCAP command, which is reasonable since HDR capabilities are optional and standard I3C communication will still work without them.
drivers/i3c/master.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/i3c/master.c b/drivers/i3c/master.c index fd81871609d9..e53c69d24873 100644 --- a/drivers/i3c/master.c +++ b/drivers/i3c/master.c @@ -1439,7 +1439,7 @@ static int i3c_master_retrieve_dev_info(struct i3c_dev_desc *dev)
if (dev->info.bcr & I3C_BCR_HDR_CAP) { ret = i3c_master_gethdrcap_locked(master, &dev->info); - if (ret) + if (ret && ret != -ENOTSUPP) return ret; }
From: Mikulas Patocka mpatocka@redhat.com
[ Upstream commit 6e11952a6abc4641dc8ae63f01b318b31b44e8db ]
If dm_register_path_selector, don't print the "version X loaded" message.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is a clear candidate for backporting to stable kernel trees. Here's my extensive analysis:
## Bug Analysis
The commit fixes a logic error in the error handling of four device- mapper path selector modules. Looking at the code changes:
1. **Original buggy behavior**: When `dm_register_path_selector()` fails (returns r < 0), the code would: - Print "register failed" error message - **Continue to print "version X loaded" message** - Return the error code
2. **Fixed behavior**: When registration fails: - Print "register failed" error message - **Return immediately with the error code** - Skip the misleading "loaded" message
## Why This Should Be Backported
1. **Clear Bug Fix**: This is a straightforward bug fix that corrects misleading log messages. When module registration fails, it incorrectly reports success with "version X loaded".
2. **Small and Contained**: The fix is minimal - just adding an early return statement in four similar init functions: - `dm_hst_init()` in dm-ps-historical-service-time.c - `dm_ql_init()` in dm-ps-queue-length.c - `dm_rr_init()` in dm-ps-round-robin.c - `dm_st_init()` in dm-ps-service-time.c
3. **No Side Effects**: The change only affects error path behavior and logging. It doesn't introduce new functionality or change any successful operation paths.
4. **User-Visible Issue**: This fixes a user-visible problem where system logs would incorrectly indicate a module loaded successfully when it actually failed.
5. **Consistent Pattern**: All four files have the identical bug and identical fix, showing this is a systematic issue that should be addressed.
6. **Low Risk**: The change is trivial - adding an early return on error. There's virtually no risk of regression since: - It only changes behavior when registration already failed - The function already returns the error code at the end - This just makes it return earlier without printing the misleading message
7. **Affects Core Subsystem**: Device-mapper is a critical storage subsystem, and accurate error reporting is important for system administrators debugging storage issues.
The commit perfectly fits stable kernel criteria: it's a small, obvious fix for a real bug that affects users (through misleading log messages), with minimal risk of introducing new problems.
drivers/md/dm-ps-historical-service-time.c | 4 +++- drivers/md/dm-ps-queue-length.c | 4 +++- drivers/md/dm-ps-round-robin.c | 4 +++- drivers/md/dm-ps-service-time.c | 4 +++- 4 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/md/dm-ps-historical-service-time.c b/drivers/md/dm-ps-historical-service-time.c index b49e10d76d03..2c8626a83de4 100644 --- a/drivers/md/dm-ps-historical-service-time.c +++ b/drivers/md/dm-ps-historical-service-time.c @@ -541,8 +541,10 @@ static int __init dm_hst_init(void) { int r = dm_register_path_selector(&hst_ps);
- if (r < 0) + if (r < 0) { DMERR("register failed %d", r); + return r; + }
DMINFO("version " HST_VERSION " loaded");
diff --git a/drivers/md/dm-ps-queue-length.c b/drivers/md/dm-ps-queue-length.c index e305f05ad1e5..eb543e6431e0 100644 --- a/drivers/md/dm-ps-queue-length.c +++ b/drivers/md/dm-ps-queue-length.c @@ -260,8 +260,10 @@ static int __init dm_ql_init(void) { int r = dm_register_path_selector(&ql_ps);
- if (r < 0) + if (r < 0) { DMERR("register failed %d", r); + return r; + }
DMINFO("version " QL_VERSION " loaded");
diff --git a/drivers/md/dm-ps-round-robin.c b/drivers/md/dm-ps-round-robin.c index d1745b123dc1..66a15ac0c22c 100644 --- a/drivers/md/dm-ps-round-robin.c +++ b/drivers/md/dm-ps-round-robin.c @@ -220,8 +220,10 @@ static int __init dm_rr_init(void) { int r = dm_register_path_selector(&rr_ps);
- if (r < 0) + if (r < 0) { DMERR("register failed %d", r); + return r; + }
DMINFO("version " RR_VERSION " loaded");
diff --git a/drivers/md/dm-ps-service-time.c b/drivers/md/dm-ps-service-time.c index 969d31c40272..f8c43aecdb27 100644 --- a/drivers/md/dm-ps-service-time.c +++ b/drivers/md/dm-ps-service-time.c @@ -341,8 +341,10 @@ static int __init dm_st_init(void) { int r = dm_register_path_selector(&st_ps);
- if (r < 0) + if (r < 0) { DMERR("register failed %d", r); + return r; + }
DMINFO("version " ST_VERSION " loaded");
From: Meagan Lloyd meaganlloyd@linux.microsoft.com
[ Upstream commit 523923cfd5d622b8f4ba893fdaf29fa6adeb8c3e ]
In using CONFIG_RTC_HCTOSYS, rtc_hctosys() will sync the RTC time to the kernel time as long as rtc_read_time() succeeds. In some power loss situations, our supercapacitor-backed DS1342 RTC comes up with either an unpredictable future time or the default 01/01/00 from the datasheet. The oscillator stop flag (OSF) is set in these scenarios due to the power loss and can be used to determine the validity of the RTC data.
This change expands the oscillator stop flag (OSF) handling that has already been implemented for some chips to the ds1341 chip (DS1341 and DS1342 share a datasheet). This handling manages the validity of the RTC data in .read_time and .set_time based on the OSF.
Signed-off-by: Meagan Lloyd meaganlloyd@linux.microsoft.com Reviewed-by: Tyler Hicks code@tyhicks.com Acked-by: Rodolfo Giometti giometti@enneenne.com Link: https://lore.kernel.org/r/1749665656-30108-3-git-send-email-meaganlloyd@linu... Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Bug Fix**: This commit fixes a real bug where the DS1341/DS1342 RTC can report invalid time data after power loss without properly checking the oscillator stop flag (OSF). The commit message clearly describes the problem: "In some power loss situations, our supercapacitor-backed DS1342 RTC comes up with either an unpredictable future time or the default 01/01/00."
2. **Data Corruption Prevention**: Without this fix, systems using DS1341/DS1342 RTCs can have incorrect system time after power loss, which could lead to data corruption, incorrect timestamps, or system malfunction. This is particularly critical for systems using CONFIG_RTC_HCTOSYS.
3. **Minimal Risk**: The change is very small and contained - it only adds OSF handling for the ds1341 chip type in two switch statements (ds1307_get_time:282-287 and ds1307_set_time:380-383). The pattern exactly follows the existing OSF handling already implemented for ds_1338, ds_1340, and ds_1388 chips.
4. **Established Pattern**: The commit follows an established pattern in the driver. Looking at the code, OSF handling is already implemented for: - ds_1338: Uses DS1338_BIT_OSF in DS1307_REG_CONTROL - ds_1340: Uses DS1340_BIT_OSF in DS1340_REG_FLAG - ds_1388: Uses DS1388_BIT_OSF in DS1388_REG_FLAG - And now ds_1341: Uses DS1337_BIT_OSF in DS1337_REG_STATUS
5. **No New Features**: This is purely a bug fix - it doesn't add any new functionality, just ensures existing functionality (reading valid time) works correctly after power loss.
6. **Previous Similar Fixes**: The git history shows similar OSF fixes have been made before, such as commit f471b05f76e4 ("rtc: ds1307: Clear OSF flag on DS1388 when setting time"), indicating this is a known class of issues that needs fixing.
7. **Hardware-Specific Fix**: This only affects systems with DS1341/DS1342 RTCs, so there's no risk to systems using other RTC chips. The change is guarded by the chip type check.
The commit meets all the criteria for stable backporting: it fixes a real bug that affects users, the fix is small and self-contained, it doesn't introduce new features or architectural changes, and follows established patterns in the codebase.
drivers/rtc/rtc-ds1307.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/drivers/rtc/rtc-ds1307.c b/drivers/rtc/rtc-ds1307.c index 5efbe69bf5ca..7a60e5ca2b8a 100644 --- a/drivers/rtc/rtc-ds1307.c +++ b/drivers/rtc/rtc-ds1307.c @@ -279,6 +279,13 @@ static int ds1307_get_time(struct device *dev, struct rtc_time *t) if (tmp & DS1340_BIT_OSF) return -EINVAL; break; + case ds_1341: + ret = regmap_read(ds1307->regmap, DS1337_REG_STATUS, &tmp); + if (ret) + return ret; + if (tmp & DS1337_BIT_OSF) + return -EINVAL; + break; case ds_1388: ret = regmap_read(ds1307->regmap, DS1388_REG_FLAG, &tmp); if (ret) @@ -377,6 +384,10 @@ static int ds1307_set_time(struct device *dev, struct rtc_time *t) regmap_update_bits(ds1307->regmap, DS1340_REG_FLAG, DS1340_BIT_OSF, 0); break; + case ds_1341: + regmap_update_bits(ds1307->regmap, DS1337_REG_STATUS, + DS1337_BIT_OSF, 0); + break; case ds_1388: regmap_update_bits(ds1307->regmap, DS1388_REG_FLAG, DS1388_BIT_OSF, 0);
On 8/8/2025 8:30 AM, Sasha Levin wrote:
From: Meagan Lloyd meaganlloyd@linux.microsoft.com
[ Upstream commit 523923cfd5d622b8f4ba893fdaf29fa6adeb8c3e ]
Hi Sasha,
Without the first patch of the series, the OSF bit will get cleared in probe, so for it to meaningfully handle an invalid RTC, we'll need to take the first patch too.
The upstream commit is 48458654659c9c2e149c211d86637f1592470da5
https://lore.kernel.org/r/1749665656-30108-2-git-send-email-meaganlloyd@linu...
It removes the clear of the OSF status flag in probe in the shared ds_1337, ds_1339, ds_1341, ds_3231 switch case block.
Thank you,
Meagan
In using CONFIG_RTC_HCTOSYS, rtc_hctosys() will sync the RTC time to the kernel time as long as rtc_read_time() succeeds. In some power loss situations, our supercapacitor-backed DS1342 RTC comes up with either an unpredictable future time or the default 01/01/00 from the datasheet. The oscillator stop flag (OSF) is set in these scenarios due to the power loss and can be used to determine the validity of the RTC data.
This change expands the oscillator stop flag (OSF) handling that has already been implemented for some chips to the ds1341 chip (DS1341 and DS1342 share a datasheet). This handling manages the validity of the RTC data in .read_time and .set_time based on the OSF.
Signed-off-by: Meagan Lloyd meaganlloyd@linux.microsoft.com Reviewed-by: Tyler Hicks code@tyhicks.com Acked-by: Rodolfo Giometti giometti@enneenne.com Link: https://lore.kernel.org/r/1749665656-30108-3-git-send-email-meaganlloyd@linu... Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
**Bug Fix**: This commit fixes a real bug where the DS1341/DS1342 RTC can report invalid time data after power loss without properly checking the oscillator stop flag (OSF). The commit message clearly describes the problem: "In some power loss situations, our supercapacitor-backed DS1342 RTC comes up with either an unpredictable future time or the default 01/01/00."
**Data Corruption Prevention**: Without this fix, systems using DS1341/DS1342 RTCs can have incorrect system time after power loss, which could lead to data corruption, incorrect timestamps, or system malfunction. This is particularly critical for systems using CONFIG_RTC_HCTOSYS.
**Minimal Risk**: The change is very small and contained - it only adds OSF handling for the ds1341 chip type in two switch statements (ds1307_get_time:282-287 and ds1307_set_time:380-383). The pattern exactly follows the existing OSF handling already implemented for ds_1338, ds_1340, and ds_1388 chips.
**Established Pattern**: The commit follows an established pattern in the driver. Looking at the code, OSF handling is already implemented for:
- ds_1338: Uses DS1338_BIT_OSF in DS1307_REG_CONTROL
- ds_1340: Uses DS1340_BIT_OSF in DS1340_REG_FLAG
- ds_1388: Uses DS1388_BIT_OSF in DS1388_REG_FLAG
- And now ds_1341: Uses DS1337_BIT_OSF in DS1337_REG_STATUS
**No New Features**: This is purely a bug fix - it doesn't add any new functionality, just ensures existing functionality (reading valid time) works correctly after power loss.
**Previous Similar Fixes**: The git history shows similar OSF fixes have been made before, such as commit f471b05f76e4 ("rtc: ds1307: Clear OSF flag on DS1388 when setting time"), indicating this is a known class of issues that needs fixing.
**Hardware-Specific Fix**: This only affects systems with DS1341/DS1342 RTCs, so there's no risk to systems using other RTC chips. The change is guarded by the chip type check.
The commit meets all the criteria for stable backporting: it fixes a real bug that affects users, the fix is small and self-contained, it doesn't introduce new features or architectural changes, and follows established patterns in the codebase.
drivers/rtc/rtc-ds1307.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/drivers/rtc/rtc-ds1307.c b/drivers/rtc/rtc-ds1307.c index 5efbe69bf5ca..7a60e5ca2b8a 100644 --- a/drivers/rtc/rtc-ds1307.c +++ b/drivers/rtc/rtc-ds1307.c @@ -279,6 +279,13 @@ static int ds1307_get_time(struct device *dev, struct rtc_time *t) if (tmp & DS1340_BIT_OSF) return -EINVAL; break;
- case ds_1341:
ret = regmap_read(ds1307->regmap, DS1337_REG_STATUS, &tmp);
if (ret)
return ret;
if (tmp & DS1337_BIT_OSF)
return -EINVAL;
case ds_1388: ret = regmap_read(ds1307->regmap, DS1388_REG_FLAG, &tmp); if (ret)break;
@@ -377,6 +384,10 @@ static int ds1307_set_time(struct device *dev, struct rtc_time *t) regmap_update_bits(ds1307->regmap, DS1340_REG_FLAG, DS1340_BIT_OSF, 0); break;
- case ds_1341:
regmap_update_bits(ds1307->regmap, DS1337_REG_STATUS,
DS1337_BIT_OSF, 0);
case ds_1388: regmap_update_bits(ds1307->regmap, DS1388_REG_FLAG, DS1388_BIT_OSF, 0);break;
On Mon, Aug 11, 2025 at 09:46:07AM -0700, Meagan Lloyd wrote:
On 8/8/2025 8:30 AM, Sasha Levin wrote:
From: Meagan Lloyd meaganlloyd@linux.microsoft.com
[ Upstream commit 523923cfd5d622b8f4ba893fdaf29fa6adeb8c3e ]
Hi Sasha,
Without the first patch of the series, the OSF bit will get cleared in probe, so for it to meaningfully handle an invalid RTC, we'll need to take the first patch too.
The upstream commit is 48458654659c9c2e149c211d86637f1592470da5
https://lore.kernel.org/r/1749665656-30108-2-git-send-email-meaganlloyd@linu...
It removes the clear of the OSF status flag in probe in the shared ds_1337, ds_1339, ds_1341, ds_3231 switch case block.
I'll take it too, thanks!
From: Wolfram Sang wsa+renesas@sang-engineering.com
[ Upstream commit 3b661ca549b9e5bb11d0bc97ada6110aac3282d2 ]
LKP found a random config which failed to build because IO accessors were not defined:
In file included from drivers/i3c/master.c:21: drivers/i3c/internals.h: In function 'i3c_writel_fifo':
drivers/i3c/internals.h:35:9: error: implicit declaration of function 'writesl' [-Werror=implicit-function-declaration]
Add the proper header to where the IO accessors are used.
Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202507150208.BZDzzJ5E-lkp@intel.com/ Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Reviewed-by: Frank Li Frank.Li@nxp.com Link: https://lore.kernel.org/r/20250717120046.9022-2-wsa+renesas@sang-engineering... Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Fixes a real build failure**: The commit fixes an actual compilation error where `writesl` function is used without including the proper header (`<linux/io.h>`). This is a concrete build regression that prevents kernel compilation in certain configurations.
2. **Minimal and safe change**: The fix is extremely simple - it only adds a single `#include <linux/io.h>` line to the internals.h header file. This is a one-line change with virtually no risk of introducing regressions.
3. **Detected by kernel test robot**: The issue was found by the Linux Kernel Test Robot (LKP), indicating it affects real-world build configurations that are tested regularly. The error message shows: ``` drivers/i3c/internals.h:35:9: error: implicit declaration of function 'writesl' ```
4. **Affects core I3C infrastructure**: The internals.h header is included by core I3C files (master.c and device.c), so a build failure here can prevent the entire I3C subsystem from compiling.
5. **Recent regression**: Looking at the git history, the `i3c_writel_fifo()` and `i3c_readl_fifo()` functions were recently added in commit 733b439375b4, and are already being used by I3C controller drivers (dw-i3c-master.c and cdns-i3c-master.c as shown in commits 6e055b1fb2fc and c20d3fa70491). This missing include is a regression that breaks builds after these recent changes.
6. **Clear stable backport criteria**: This meets the stable kernel rules perfectly: - It fixes a real bug (build failure) - It's obviously correct (missing include for used functions) - It's a minimal change (1 line) - No new features or architectural changes - Zero risk of functional regression
The commit is a textbook example of what should be backported to stable trees - a simple, obvious fix for a build regression with no side effects.
drivers/i3c/internals.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/i3c/internals.h b/drivers/i3c/internals.h index 433f6088b7ce..ce04aa4f269e 100644 --- a/drivers/i3c/internals.h +++ b/drivers/i3c/internals.h @@ -9,6 +9,7 @@ #define I3C_INTERNALS_H
#include <linux/i3c/master.h> +#include <linux/io.h>
void i3c_bus_normaluse_lock(struct i3c_bus *bus); void i3c_bus_normaluse_unlock(struct i3c_bus *bus);
From: Jorge Marques jorge.marques@analog.com
[ Upstream commit 290ce8b2d0745e45a3155268184523a8c75996f1 ]
Set ret to -EINVAL if i3c_i2c_notifier_call() receives an invalid action, resolving uninitialized warning.
Signed-off-by: Jorge Marques jorge.marques@analog.com Reviewed-by: Frank Li Frank.Li@nxp.com Link: https://lore.kernel.org/r/20250622-i3c-master-ret-uninitialized-v1-1-aabb562... Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**YES**
This commit should be backported to stable kernel trees. Here's my extensive analysis:
## Bug Analysis
The commit fixes a serious uninitialized variable bug in `i3c_i2c_notifier_call()`. Looking at the code:
1. **The Bug**: The function declares `int ret` at line 2449 but doesn't initialize it. The switch statement only handles two cases (`BUS_NOTIFY_ADD_DEVICE` and `BUS_NOTIFY_DEL_DEVICE`), leaving `ret` uninitialized for any other action values.
2. **When It Triggers**: Since this notifier is registered with `bus_register_notifier(&i2c_bus_type, &i2cdev_notifier)`, it receives ALL bus notification events for I2C devices, including: - `BUS_NOTIFY_REMOVED_DEVICE` - `BUS_NOTIFY_BIND_DRIVER` - `BUS_NOTIFY_BOUND_DRIVER` - `BUS_NOTIFY_UNBIND_DRIVER` - `BUS_NOTIFY_UNBOUND_DRIVER` - `BUS_NOTIFY_DRIVER_NOT_BOUND`
3. **Impact**: When any of these unhandled actions occur, the function returns an uninitialized stack value, which could: - Accidentally return `NOTIFY_STOP` or `NOTIFY_BAD`, halting the notification chain - Cause unpredictable behavior in the device/driver binding process - Lead to intermittent, hard-to-debug failures
## Backport Criteria Analysis
1. **Fixes a real bug**: ✓ Yes - fixes an uninitialized variable that causes undefined behavior 2. **Small and contained**: ✓ Yes - adds only 2 lines (default case returning -EINVAL) 3. **No side effects**: ✓ Correct - the fix properly handles unexpected actions by returning an error 4. **No architectural changes**: ✓ Correct - simple bug fix, no design changes 5. **Critical subsystem**: ✓ Yes - affects I3C/I2C device management and driver binding 6. **Follows stable rules**: ✓ Yes - important bugfix with minimal risk
## Additional Context
- The bug was introduced in commit 72a4501b5d08 ("i3c: support dynamically added i2c devices") in January 2022 - The I3C subsystem has had other uninitialized variable fixes (e.g., commit 6cbf8b38dfe3) - This is a classic compiler warning fix that prevents real runtime issues - The fix is conservative, returning -EINVAL for unexpected actions rather than silently ignoring them
This is exactly the type of bug that stable kernels should fix: a real issue with unpredictable runtime consequences, fixed with a minimal, safe change.
drivers/i3c/master.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/i3c/master.c b/drivers/i3c/master.c index e53c69d24873..dfa0bad991cf 100644 --- a/drivers/i3c/master.c +++ b/drivers/i3c/master.c @@ -2467,6 +2467,8 @@ static int i3c_i2c_notifier_call(struct notifier_block *nb, unsigned long action case BUS_NOTIFY_DEL_DEVICE: ret = i3c_master_i2c_detach(adap, client); break; + default: + ret = -EINVAL; } i3c_bus_maintenance_unlock(&master->bus);
From: Timothy Pearson tpearson@raptorengineering.com
[ Upstream commit 80f9fc2362797538ebd4fd70a1dfa838cc2c2cdb ]
The Microsemi Switchtec PM8533 PFX 48xG3 [11f8:8533] PCIe switch system was observed to incorrectly assert the Presence Detect Set bit in its capabilities when tested on a Raptor Computing Systems Blackbird system, resulting in the hot insert path never attempting a rescan of the bus and any downstream devices not being re-detected.
Work around this by additionally checking whether the PCIe data link is active or not when performing presence detection on downstream switches' ports, similar to the pciehp_hpc.c driver.
Signed-off-by: Shawn Anastasio sanastasio@raptorengineering.com Signed-off-by: Timothy Pearson tpearson@raptorengineering.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Madhavan Srinivasan maddy@linux.ibm.com Link: https://patch.msgid.link/505981576.1359853.1752615415117.JavaMail.zimbra@rap... Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is suitable for backporting to stable kernel trees based on the following analysis:
## Commit Analysis
### 1. Bug Fix Nature The commit fixes a real hardware bug where the Microsemi Switchtec PM8533 PFX 48xG3 PCIe switch incorrectly reports presence detection. This causes downstream devices to not be re-detected after hot insertion, which is a significant functional issue affecting users.
### 2. Small and Contained Fix The patch adds only 33 lines of code: - Adds a new helper function `pcie_check_link_active()` (11 lines) - Modifies `pnv_php_get_adapter_state()` with a targeted workaround (13 lines) - The changes are confined to a single file: `drivers/pci/hotplug/pnv_php.c`
### 3. Clear and Minimal Side Effects The workaround is: - Only activated for specific hardware (downstream PCIe ports) - Only when presence detection returns EMPTY - Uses an additional check (link active status) as a fallback - Similar approach already proven in pciehp_hpc.c driver
### 4. Hardware Quirk Fix This is explicitly a hardware quirk workaround for broken hardware (Microsemi Switchtec PM8533), which is one of the accepted categories for stable backports according to stable kernel rules.
### 5. No Architectural Changes The commit: - Does not introduce new features - Does not change any APIs or interfaces - Does not modify core kernel infrastructure - Simply adds a fallback detection mechanism
### 6. Tested Solution The commit message indicates this was "observed" and tested on a Raptor Computing Systems Blackbird system, suggesting real-world validation.
## Stable Tree Compliance
According to Documentation/process/stable-kernel-rules.rst, this patch meets the criteria: - Fixes a real bug that bothers people (hotplug detection failure) - Is a hardware quirk fix (explicitly allowed) - Is under 100 lines with context - Is obviously correct (uses established pattern from pciehp_hpc.c) - Has clear user impact (devices not being detected after hot insertion)
The workaround pattern of checking link active status when presence detection fails is a well-established approach in PCIe hotplug drivers, making this a low-risk backport candidate.
drivers/pci/hotplug/pnv_php.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c index 573a41869c15..aaa8e0a2a893 100644 --- a/drivers/pci/hotplug/pnv_php.c +++ b/drivers/pci/hotplug/pnv_php.c @@ -391,6 +391,20 @@ static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state) return 0; }
+static int pcie_check_link_active(struct pci_dev *pdev) +{ + u16 lnk_status; + int ret; + + ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status); + if (ret == PCIBIOS_DEVICE_NOT_FOUND || PCI_POSSIBLE_ERROR(lnk_status)) + return -ENODEV; + + ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA); + + return ret; +} + static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state) { struct pnv_php_slot *php_slot = to_pnv_php_slot(slot); @@ -403,6 +417,19 @@ static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state) */ ret = pnv_pci_get_presence_state(php_slot->id, &presence); if (ret >= 0) { + if (pci_pcie_type(php_slot->pdev) == PCI_EXP_TYPE_DOWNSTREAM && + presence == OPAL_PCI_SLOT_EMPTY) { + /* + * Similar to pciehp_hpc, check whether the Link Active + * bit is set to account for broken downstream bridges + * that don't properly assert Presence Detect State, as + * was observed on the Microsemi Switchtec PM8533 PFX + * [11f8:8533]. + */ + if (pcie_check_link_active(php_slot->pdev) > 0) + presence = OPAL_PCI_SLOT_PRESENT; + } + *state = presence; ret = 0; } else {
From: Petr Pavlu petr.pavlu@suse.com
[ Upstream commit a6323bd4e611567913e23df5b58f2d4e4da06789 ]
Passing a module name longer than MODULE_NAME_LEN to the delete_module syscall results in its silent truncation. This really isn't much of a problem in practice, but it could theoretically lead to the removal of an incorrect module. It is more sensible to return ENAMETOOLONG or ENOENT in such a case.
Update the syscall to return ENOENT, as documented in the delete_module(2) man page to mean "No module by that name exists." This is appropriate because a module with a name longer than MODULE_NAME_LEN cannot be loaded in the first place.
Signed-off-by: Petr Pavlu petr.pavlu@suse.com Reviewed-by: Daniel Gomez da.gomez@samsung.com Link: https://lore.kernel.org/r/20250630143535.267745-2-petr.pavlu@suse.com Signed-off-by: Daniel Gomez da.gomez@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
Based on my analysis, here's my assessment:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Security Fix for Silent Truncation**: The commit fixes a security- relevant issue where module names longer than MODULE_NAME_LEN (56 bytes on 64-bit systems) are silently truncated in the delete_module() syscall. This could theoretically lead to unintentionally removing the wrong module if two modules share the same truncated prefix.
2. **Minimal and Contained Change**: The fix is very small and localized - it only modifies the delete_module syscall implementation in kernel/module/main.c. The changes are: - Changes from `strncpy_from_user(name, name_user, MODULE_NAME_LEN-1)` to `strncpy_from_user(name, name_user, MODULE_NAME_LEN)` - Adds proper length checking: `if (len == 0 || len == MODULE_NAME_LEN) return -ENOENT;` - Returns the correct error code (ENOENT) for oversized names
3. **Follows Stable Tree Rules**: - Fixes a bug that could affect users (incorrect module removal) - Very low risk of regression - the change only affects error handling for invalid input - No new features or architectural changes - Improves consistency with documented behavior (delete_module(2) man page)
4. **Prevents Inconsistent Behavior**: The commit message correctly points out that modules with names longer than MODULE_NAME_LEN cannot be loaded in the first place, so returning ENOENT for such names in delete_module makes the behavior consistent across module operations.
5. **Clear Bug Fix**: The old code would accept a 57+ character module name, truncate it to 56 characters, and potentially remove a different module. The new code properly rejects such names with ENOENT, preventing this dangerous silent truncation.
The change is defensive programming that prevents a potential security issue without introducing any backward compatibility concerns, making it an ideal candidate for stable backporting.
kernel/module/main.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/kernel/module/main.c b/kernel/module/main.c index c2c08007029d..cbd637627eb4 100644 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@ -751,14 +751,16 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user, struct module *mod; char name[MODULE_NAME_LEN]; char buf[MODULE_FLAGS_BUF_SIZE]; - int ret, forced = 0; + int ret, len, forced = 0;
if (!capable(CAP_SYS_MODULE) || modules_disabled) return -EPERM;
- if (strncpy_from_user(name, name_user, MODULE_NAME_LEN-1) < 0) - return -EFAULT; - name[MODULE_NAME_LEN-1] = '\0'; + len = strncpy_from_user(name, name_user, MODULE_NAME_LEN); + if (len == 0 || len == MODULE_NAME_LEN) + return -ENOENT; + if (len < 0) + return len;
audit_log_kern_module(name);
From: Mateusz Guzik mjguzik@gmail.com
[ Upstream commit 67e370aa7f968f6a4f3573ed61a77b36d1b26475 ]
This follows the established practice and fixes a build failure for me: security/apparmor/file.c: In function ‘__file_sock_perm’: security/apparmor/file.c:544:24: error: unused variable ‘sock’ [-Werror=unused-variable] 544 | struct socket *sock = (struct socket *) file->private_data; | ^~~~
Signed-off-by: Mateusz Guzik mjguzik@gmail.com Signed-off-by: John Johansen john.johansen@canonical.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **It fixes a real build failure**: The commit message clearly states it fixes a build failure with `-Werror=unused-variable`. When `CONFIG_SECURITY_APPARMOR_DEBUG_ASSERTS` is disabled, the `AA_BUG_FMT` macro becomes just `no_printk(fmt, ##args)`, which doesn't evaluate the condition `X`. This causes the `sock` variable in `__file_sock_perm()` to be unused, triggering a compiler warning that becomes an error with `-Werror`.
2. **The fix is minimal and contained**: The change only modifies the `AA_BUG_FMT` macro definition when debug is disabled, adding `BUILD_BUG_ON_INVALID(X)` which forces the compiler to evaluate the condition without generating any runtime code. This is a well- established pattern in the kernel (as seen in `VM_BUG_ON` and similar macros in `include/linux/mmdebug.h`).
3. **No functional changes or side effects**: The fix doesn't change any runtime behavior. `BUILD_BUG_ON_INVALID(e)` is defined as `((void)(sizeof((__force long)(e))))` which only evaluates the expression at compile time to ensure it's valid, without generating any code.
4. **Follows established kernel patterns**: The commit message states "This follows the established practice," and indeed, examining other kernel debug macros like `VM_BUG_ON` shows they use the exact same pattern - using `BUILD_BUG_ON_INVALID` when debug is disabled to ensure the condition is still evaluated by the compiler.
5. **Low risk**: This is a compile-time only change that prevents build failures. It cannot introduce runtime regressions since it generates no runtime code.
6. **Affects a security subsystem**: AppArmor is a security module, and build failures in security code can prevent users from building kernels with their desired security configuration.
The commit is a classic example of a safe, minimal fix that resolves a real problem without introducing new risks, making it an ideal candidate for stable backporting.
security/apparmor/include/lib.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/security/apparmor/include/lib.h b/security/apparmor/include/lib.h index f11a0db7f51d..e83f45e936a7 100644 --- a/security/apparmor/include/lib.h +++ b/security/apparmor/include/lib.h @@ -48,7 +48,11 @@ extern struct aa_dfa *stacksplitdfa; #define AA_BUG_FMT(X, fmt, args...) \ WARN((X), "AppArmor WARN %s: (" #X "): " fmt, __func__, ##args) #else -#define AA_BUG_FMT(X, fmt, args...) no_printk(fmt, ##args) +#define AA_BUG_FMT(X, fmt, args...) \ + do { \ + BUILD_BUG_ON_INVALID(X); \ + no_printk(fmt, ##args); \ + } while (0) #endif
#define AA_ERROR(fmt, args...) \
From: Timothy Pearson tpearson@raptorengineering.com
[ Upstream commit 1010b4c012b0d78dfb9d3132b49aa2ef024a07a7 ]
Multiple race conditions existed between the PCIe hotplug driver and the EEH driver, leading to a variety of kernel oopses of the same general nature:
<pcie device unplug> <eeh driver trigger> <hotplug removal trigger> <pcie tree reconfiguration> <eeh recovery next step> <oops in EEH driver bus iteration loop>
A second class of oops is also seen when the underlying bus disappears during device recovery.
Refactor the EEH module to be PCI rescan and remove safe. Also clean up a few minor formatting / readability issues.
Signed-off-by: Timothy Pearson tpearson@raptorengineering.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Madhavan Srinivasan maddy@linux.ibm.com Link: https://patch.msgid.link/1334208367.1359861.1752615503144.JavaMail.zimbra@ra... Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**YES**
This commit should be backported to stable kernel trees for the following reasons:
## 1. **Critical Bug Fix for Race Conditions** The commit explicitly fixes "Multiple race conditions" between the PCIe hotplug driver and the EEH (Enhanced Error Handling) driver that lead to kernel oopses. These are serious kernel crashes that can cause system instability or complete system failure.
## 2. **Clear Bug Symptoms Described** The commit message describes two classes of kernel oopses: - Race conditions during concurrent PCIe device unplug and EEH recovery operations - Crashes when the underlying bus disappears during device recovery
These are real-world stability issues that affect system reliability.
## 3. **Focused Fix with Clear Scope** The changes are focused on making the EEH driver "PCI rescan and remove safe" by properly coordinating locking around critical sections. The key changes include:
- Moving `pci_lock_rescan_remove()` calls to protect larger critical sections in `eeh_handle_normal_event()` and `eeh_handle_special_event()` - Removing scattered lock/unlock pairs from individual functions like `eeh_pe_report_edev()` and `eeh_reset_device()` - Adding proper NULL checks (e.g., checking if `edev->pdev` exists before dereferencing in `eeh_bridge_check_link()`) - Adding `*result = PCI_ERS_RESULT_DISCONNECT` when no device is found
## 4. **Minimal Risk of Regression** The changes are: - Confined to the PowerPC EEH subsystem - Primarily involve moving existing locking to broader scopes - Add defensive NULL checks - Don't introduce new features or change architecture
## 5. **Follows Stable Kernel Rules** This fix meets the stable kernel criteria: - Fixes a real bug (kernel oopses) - Has been tested (signed off by multiple maintainers) - Is self-contained to a specific subsystem - Doesn't add new features - Has clear bug symptoms that users experience
## 6. **Critical for PowerPC Server Systems** EEH is crucial for PowerPC server systems' reliability, especially in environments with PCIe hotplug operations. The race conditions can lead to unexpected downtime in production systems.
The refactoring to consolidate the locking at higher levels (in the main event handlers) rather than scattered throughout the code makes the locking strategy clearer and reduces the chance of deadlocks or missed locking, which is a good defensive programming practice for stable kernels.
arch/powerpc/kernel/eeh_driver.c | 48 +++++++++++++++++++++----------- arch/powerpc/kernel/eeh_pe.c | 10 ++++--- 2 files changed, 38 insertions(+), 20 deletions(-)
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 7efe04c68f0f..dd50de91c438 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -257,13 +257,12 @@ static void eeh_pe_report_edev(struct eeh_dev *edev, eeh_report_fn fn, struct pci_driver *driver; enum pci_ers_result new_result;
- pci_lock_rescan_remove(); pdev = edev->pdev; if (pdev) get_device(&pdev->dev); - pci_unlock_rescan_remove(); if (!pdev) { eeh_edev_info(edev, "no device"); + *result = PCI_ERS_RESULT_DISCONNECT; return; } device_lock(&pdev->dev); @@ -304,8 +303,9 @@ static void eeh_pe_report(const char *name, struct eeh_pe *root, struct eeh_dev *edev, *tmp;
pr_info("EEH: Beginning: '%s'\n", name); - eeh_for_each_pe(root, pe) eeh_pe_for_each_dev(pe, edev, tmp) - eeh_pe_report_edev(edev, fn, result); + eeh_for_each_pe(root, pe) + eeh_pe_for_each_dev(pe, edev, tmp) + eeh_pe_report_edev(edev, fn, result); if (result) pr_info("EEH: Finished:'%s' with aggregate recovery state:'%s'\n", name, pci_ers_result_name(*result)); @@ -383,6 +383,8 @@ static void eeh_dev_restore_state(struct eeh_dev *edev, void *userdata) if (!edev) return;
+ pci_lock_rescan_remove(); + /* * The content in the config space isn't saved because * the blocked config space on some adapters. We have @@ -393,14 +395,19 @@ static void eeh_dev_restore_state(struct eeh_dev *edev, void *userdata) if (list_is_last(&edev->entry, &edev->pe->edevs)) eeh_pe_restore_bars(edev->pe);
+ pci_unlock_rescan_remove(); return; }
pdev = eeh_dev_to_pci_dev(edev); - if (!pdev) + if (!pdev) { + pci_unlock_rescan_remove(); return; + }
pci_restore_state(pdev); + + pci_unlock_rescan_remove(); }
/** @@ -647,9 +654,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, if (any_passed || driver_eeh_aware || (pe->type & EEH_PE_VF)) { eeh_pe_dev_traverse(pe, eeh_rmv_device, rmv_data); } else { - pci_lock_rescan_remove(); pci_hp_remove_devices(bus); - pci_unlock_rescan_remove(); }
/* @@ -665,8 +670,6 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, if (rc) return rc;
- pci_lock_rescan_remove(); - /* Restore PE */ eeh_ops->configure_bridge(pe); eeh_pe_restore_bars(pe); @@ -674,7 +677,6 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, /* Clear frozen state */ rc = eeh_clear_pe_frozen_state(pe, false); if (rc) { - pci_unlock_rescan_remove(); return rc; }
@@ -709,7 +711,6 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, pe->tstamp = tstamp; pe->freeze_count = cnt;
- pci_unlock_rescan_remove(); return 0; }
@@ -843,10 +844,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe) {LIST_HEAD_INIT(rmv_data.removed_vf_list), 0}; int devices = 0;
+ pci_lock_rescan_remove(); + bus = eeh_pe_bus_get(pe); if (!bus) { pr_err("%s: Cannot find PCI bus for PHB#%x-PE#%x\n", __func__, pe->phb->global_number, pe->addr); + pci_unlock_rescan_remove(); return; }
@@ -1094,10 +1098,15 @@ void eeh_handle_normal_event(struct eeh_pe *pe) eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true); eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
- pci_lock_rescan_remove(); - pci_hp_remove_devices(bus); - pci_unlock_rescan_remove(); + bus = eeh_pe_bus_get(pe); + if (bus) + pci_hp_remove_devices(bus); + else + pr_err("%s: PCI bus for PHB#%x-PE#%x disappeared\n", + __func__, pe->phb->global_number, pe->addr); + /* The passed PE should no longer be used */ + pci_unlock_rescan_remove(); return; }
@@ -1114,6 +1123,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe) eeh_clear_slot_attention(edev->pdev);
eeh_pe_state_clear(pe, EEH_PE_RECOVERING, true); + + pci_unlock_rescan_remove(); }
/** @@ -1132,6 +1143,7 @@ void eeh_handle_special_event(void) unsigned long flags; int rc;
+ pci_lock_rescan_remove();
do { rc = eeh_ops->next_error(&pe); @@ -1171,10 +1183,12 @@ void eeh_handle_special_event(void)
break; case EEH_NEXT_ERR_NONE: + pci_unlock_rescan_remove(); return; default: pr_warn("%s: Invalid value %d from next_error()\n", __func__, rc); + pci_unlock_rescan_remove(); return; }
@@ -1186,7 +1200,9 @@ void eeh_handle_special_event(void) if (rc == EEH_NEXT_ERR_FROZEN_PE || rc == EEH_NEXT_ERR_FENCED_PHB) { eeh_pe_state_mark(pe, EEH_PE_RECOVERING); + pci_unlock_rescan_remove(); eeh_handle_normal_event(pe); + pci_lock_rescan_remove(); } else { eeh_for_each_pe(pe, tmp_pe) eeh_pe_for_each_dev(tmp_pe, edev, tmp_edev) @@ -1199,7 +1215,6 @@ void eeh_handle_special_event(void) eeh_report_failure, NULL); eeh_set_channel_state(pe, pci_channel_io_perm_failure);
- pci_lock_rescan_remove(); list_for_each_entry(hose, &hose_list, list_node) { phb_pe = eeh_phb_pe_get(hose); if (!phb_pe || @@ -1218,7 +1233,6 @@ void eeh_handle_special_event(void) } pci_hp_remove_devices(bus); } - pci_unlock_rescan_remove(); }
/* @@ -1228,4 +1242,6 @@ void eeh_handle_special_event(void) if (rc == EEH_NEXT_ERR_DEAD_IOC) break; } while (rc != EEH_NEXT_ERR_NONE); + + pci_unlock_rescan_remove(); } diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c index d283d281d28e..e740101fadf3 100644 --- a/arch/powerpc/kernel/eeh_pe.c +++ b/arch/powerpc/kernel/eeh_pe.c @@ -671,10 +671,12 @@ static void eeh_bridge_check_link(struct eeh_dev *edev) eeh_ops->write_config(edev, cap + PCI_EXP_LNKCTL, 2, val);
/* Check link */ - if (!edev->pdev->link_active_reporting) { - eeh_edev_dbg(edev, "No link reporting capability\n"); - msleep(1000); - return; + if (edev->pdev) { + if (!edev->pdev->link_active_reporting) { + eeh_edev_dbg(edev, "No link reporting capability\n"); + msleep(1000); + return; + } }
/* Wait the link is up until timeout (5s) */
From: John Johansen john.johansen@canonical.com
[ Upstream commit a9eb185be84e998aa9a99c7760534ccc06216705 ]
x_table_lookup currently does stacking during label_parse() if the target specifies a stack but its only caller ensures that it will never be used with stacking.
Refactor to slightly simplify the code in x_to_label(), this also fixes a long standing problem where x_to_labels check on stacking is only on the first element to the table option list, instead of the element that is found and used.
Signed-off-by: John Johansen john.johansen@canonical.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
## Bug Fix Analysis
### 1. **Critical Bug Fix** This fixes a **long-standing bug** in AppArmor's domain transition handling where stacking directives would fail when they weren't the first entry in the transition table. The commit message explicitly states this fixes a "long standing problem" where the stacking check was only performed on the first element rather than the element actually found and used.
### 2. **Security Impact** - **Policy Bypass Risk**: The bug could cause incorrect security profiles to be applied during domain transitions, potentially allowing unauthorized access - **Inconsistent Enforcement**: Applications might not receive the correct stacked security context, leading to security policy violations - **Profile Selection Errors**: Wrong profiles could be selected when stacking is involved, compromising the intended security boundaries
### 3. **Small and Contained Fix** The changes are: - Limited to two functions in a single file (`security/apparmor/domain.c`) - Focused on fixing the logic flow without introducing new features - Clear refactoring that separates concerns properly
### 4. **Low Regression Risk** - The fix simplifies the code by properly separating lookup from stacking logic - Changes from `aa_label_parse()` to `aa_label_merge()` for stacking is the correct semantic operation - Proper resource management with `aa_put_label(stack)` prevents memory leaks - The refactored loop in `x_table_lookup()` is cleaner and easier to understand
### 5. **Real-World Impact** This bug would manifest in production systems where: - AppArmor profiles use stacking features - The stacking directive appears after other entries in the transition table - Domain transitions are critical for security isolation
### 6. **Clear Problem and Solution** The commit clearly identifies: - **Problem**: Stacking check only on first element, not the found element - **Solution**: Proper separation of lookup and stacking logic, checking the actual found entry
### 7. **Maintainer Authorship** The commit is authored by John Johansen, the AppArmor maintainer, indicating authoritative understanding of the subsystem and the fix's importance.
This is exactly the type of commit that stable trees should include: a clear bug fix for a security-relevant subsystem that could cause policy enforcement failures in production systems, with minimal code changes and low regression risk.
security/apparmor/domain.c | 52 +++++++++++++++++++++----------------- 1 file changed, 29 insertions(+), 23 deletions(-)
diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c index 5939bd9a9b9b..08ca9057f82b 100644 --- a/security/apparmor/domain.c +++ b/security/apparmor/domain.c @@ -508,6 +508,7 @@ static const char *next_name(int xtype, const char *name) * @name: returns: name tested to find label (NOT NULL) * * Returns: refcounted label, or NULL on failure (MAYBE NULL) + * @name will always be set with the last name tried */ struct aa_label *x_table_lookup(struct aa_profile *profile, u32 xindex, const char **name) @@ -517,6 +518,7 @@ struct aa_label *x_table_lookup(struct aa_profile *profile, u32 xindex, struct aa_label *label = NULL; u32 xtype = xindex & AA_X_TYPE_MASK; int index = xindex & AA_X_INDEX_MASK; + const char *next;
AA_BUG(!name);
@@ -524,25 +526,27 @@ struct aa_label *x_table_lookup(struct aa_profile *profile, u32 xindex, /* TODO: move lookup parsing to unpack time so this is a straight * index into the resultant label */ - for (*name = rules->file->trans.table[index]; !label && *name; - *name = next_name(xtype, *name)) { + for (next = rules->file->trans.table[index]; next; + next = next_name(xtype, next)) { + const char *lookup = (*next == '&') ? next + 1 : next; + *name = next; if (xindex & AA_X_CHILD) { - struct aa_profile *new_profile; - /* release by caller */ - new_profile = aa_find_child(profile, *name); - if (new_profile) - label = &new_profile->label; + /* TODO: switich to parse to get stack of child */ + struct aa_profile *new = aa_find_child(profile, lookup); + + if (new) + /* release by caller */ + return &new->label; continue; } - label = aa_label_parse(&profile->label, *name, GFP_KERNEL, + label = aa_label_parse(&profile->label, lookup, GFP_KERNEL, true, false); - if (IS_ERR(label)) - label = NULL; + if (!IS_ERR_OR_NULL(label)) + /* release by caller */ + return label; }
- /* released by caller */ - - return label; + return NULL; }
/** @@ -567,9 +571,9 @@ static struct aa_label *x_to_label(struct aa_profile *profile, struct aa_ruleset *rules = list_first_entry(&profile->rules, typeof(*rules), list); struct aa_label *new = NULL; + struct aa_label *stack = NULL; struct aa_ns *ns = profile->ns; u32 xtype = xindex & AA_X_TYPE_MASK; - const char *stack = NULL;
switch (xtype) { case AA_X_NONE: @@ -578,13 +582,14 @@ static struct aa_label *x_to_label(struct aa_profile *profile, break; case AA_X_TABLE: /* TODO: fix when perm mapping done at unload */ - stack = rules->file->trans.table[xindex & AA_X_INDEX_MASK]; - if (*stack != '&') { - /* released by caller */ - new = x_table_lookup(profile, xindex, lookupname); - stack = NULL; + /* released by caller + * if null for both stack and direct want to try fallback + */ + new = x_table_lookup(profile, xindex, lookupname); + if (!new || **lookupname != '&') break; - } + stack = new; + new = NULL; fallthrough; /* to X_NAME */ case AA_X_NAME: if (xindex & AA_X_CHILD) @@ -599,6 +604,7 @@ static struct aa_label *x_to_label(struct aa_profile *profile, break; }
+ /* fallback transition check */ if (!new) { if (xindex & AA_X_INHERIT) { /* (p|c|n)ix - don't change profile but do @@ -617,12 +623,12 @@ static struct aa_label *x_to_label(struct aa_profile *profile, /* base the stack on post domain transition */ struct aa_label *base = new;
- new = aa_label_parse(base, stack, GFP_KERNEL, true, false); - if (IS_ERR(new)) - new = NULL; + new = aa_label_merge(base, stack, GFP_KERNEL); + /* null on error */ aa_put_label(base); }
+ aa_put_label(stack); /* released by caller */ return new; }
From: Benjamin Marzinski bmarzins@redhat.com
[ Upstream commit 8ca719b81987be690f197e82fdb030580c0a07f3 ]
Due to the semantics of iterate_devices(), the current code allows a request-based dm table as long as it includes one request-stackable device. It is supposed to only allow tables where there are no non-request-stackable devices.
Signed-off-by: Benjamin Marzinski bmarzins@redhat.com Reviewed-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
Based on my analysis of the commit and the code context, here's my assessment:
**Backport Status: YES**
This commit fixes a clear logic bug in the device mapper table validation code that could lead to incorrect device stacking configurations. Here's the detailed analysis:
**1. The Bug:** The original code had inverted logic when checking if devices are request-stackable: - The function `device_is_rq_stackable()` returned `true` (non-zero) for stackable devices - But `iterate_devices()` stops iteration on non-zero returns and returns that non-zero value - The check `!ti->type->iterate_devices(ti, device_is_rq_stackable, NULL)` would pass if ANY device returned true (stackable), not if ALL devices were stackable
**2. The Fix:** - Renamed function to `device_is_not_rq_stackable()` to clarify its purpose - Inverted the return values: returns `true` for non-stackable devices - Changed the logic check to `ti->type->iterate_devices(ti, device_is_not_rq_stackable, NULL)` without negation - Now correctly fails if ANY device is not stackable
**3. Why This Should Be Backported:**
a) **Fixes a real bug**: The current code allows invalid device stacking configurations where request-based DM tables could include non-request- stackable devices, potentially leading to: - Data corruption - System crashes - I/O failures
b) **Small and contained fix**: The change is minimal (4 lines changed) and only affects the validation logic without changing any core functionality
c) **Clear semantics**: The fix makes the code's intent clearer by renaming the function to match what it actually checks
d) **No architectural changes**: This is purely a bug fix that corrects validation logic
e) **Critical subsystem**: Device mapper is a critical component used in production environments for LVM, encryption, and other storage features
f) **Follows stable rules**: This is exactly the type of fix stable kernels want - important bug fixes with minimal risk
The commit message clearly explains the bug: "the current code allows a request-based dm table as long as it includes one request-stackable device. It is supposed to only allow tables where there are no non- request-stackable devices." This could lead to serious issues in production systems using device mapper.
drivers/md/dm-table.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index 24a857ff6d0b..79ba4bacd0f9 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -899,17 +899,17 @@ static bool dm_table_supports_dax(struct dm_table *t, return true; }
-static int device_is_rq_stackable(struct dm_target *ti, struct dm_dev *dev, - sector_t start, sector_t len, void *data) +static int device_is_not_rq_stackable(struct dm_target *ti, struct dm_dev *dev, + sector_t start, sector_t len, void *data) { struct block_device *bdev = dev->bdev; struct request_queue *q = bdev_get_queue(bdev);
/* request-based cannot stack on partitions! */ if (bdev_is_partition(bdev)) - return false; + return true;
- return queue_is_mq(q); + return !queue_is_mq(q); }
static int dm_table_determine_type(struct dm_table *t) @@ -1005,7 +1005,7 @@ static int dm_table_determine_type(struct dm_table *t)
/* Non-request-stackable devices can't be used for request-based dm */ if (!ti->type->iterate_devices || - !ti->type->iterate_devices(ti, device_is_rq_stackable, NULL)) { + ti->type->iterate_devices(ti, device_is_not_rq_stackable, NULL)) { DMERR("table load rejected: including non-request-stackable devices"); return -EINVAL; }
From: Timothy Pearson tpearson@raptorengineering.com
[ Upstream commit 4668619092554e1b95c9a5ac2941ca47ba6d548a ]
When the root of a nested PCIe bridge configuration is unplugged, the pnv_php driver leaked the allocated IRQ resources for the child bridges' hotplug event notifications, resulting in a panic.
Fix this by walking all child buses and deallocating all its IRQ resources before calling pci_hp_remove_devices().
Also modify the lifetime of the workqueue at struct pnv_php_slot::wq so that it is only destroyed in pnv_php_free_slot(), instead of pnv_php_disable_irq(). This is required since pnv_php_disable_irq() will now be called by workers triggered by hot unplug interrupts, so the workqueue needs to stay allocated.
The abridged kernel panic that occurs without this patch is as follows:
WARNING: CPU: 0 PID: 687 at kernel/irq/msi.c:292 msi_device_data_release+0x6c/0x9c CPU: 0 UID: 0 PID: 687 Comm: bash Not tainted 6.14.0-rc5+ #2 Call Trace: msi_device_data_release+0x34/0x9c (unreliable) release_nodes+0x64/0x13c devres_release_all+0xc0/0x140 device_del+0x2d4/0x46c pci_destroy_dev+0x5c/0x194 pci_hp_remove_devices+0x90/0x128 pci_hp_remove_devices+0x44/0x128 pnv_php_disable_slot+0x54/0xd4 power_write_file+0xf8/0x18c pci_slot_attr_store+0x40/0x5c sysfs_kf_write+0x64/0x78 kernfs_fop_write_iter+0x1b0/0x290 vfs_write+0x3bc/0x50c ksys_write+0x84/0x140 system_call_exception+0x124/0x230 system_call_vectored_common+0x15c/0x2ec
Signed-off-by: Shawn Anastasio sanastasio@raptorengineering.com Signed-off-by: Timothy Pearson tpearson@raptorengineering.com [bhelgaas: tidy comments] Signed-off-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Madhavan Srinivasan maddy@linux.ibm.com Link: https://patch.msgid.link/2013845045.1359852.1752615367790.JavaMail.zimbra@ra... Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
Based on my analysis of the commit, here is my assessment:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Fixes a Real Bug with Kernel Panic**: The commit explicitly fixes a kernel panic (WARNING and crash) that occurs when unplugging nested PCIe bridge configurations. The panic trace shows it happens in `msi_device_data_release()` during hotplug operations, which is a serious stability issue.
2. **Clear Resource Leak Fix**: The commit addresses a specific IRQ resource leak where child bridge IRQ resources were not being properly cleaned up during hot unplug operations. This is a real bug that affects system stability and resource management.
3. **Small and Contained Fix**: The changes are well-contained within the PowerNV PHP (PCI Hotplug) driver. The modifications include: - Adding a new `disable_msi` parameter to `pnv_php_disable_irq()` - Moving workqueue destruction to `pnv_php_free_slot()` - Adding functions to recursively disable IRQs for child slots - Re-enabling IRQ on slot enable for surprise hotplug support
4. **Meets Stable Criteria**: - Fixes a real bug (kernel panic/crash) - Is obviously correct - properly cleans up resources - Has clear testing evidence (fixes the described panic) - Size is reasonable (under 100 lines of actual changes) - Doesn't introduce new features or architectural changes
5. **Critical for PowerNV Platform Users**: This affects IBM PowerNV platforms using PCIe hotplug functionality, where nested bridge configurations can cause system crashes without this fix.
6. **Well-Documented Problem**: The commit message provides a clear description of the issue, including the kernel panic trace, making it easy to verify the problem and solution.
The fix follows proper cleanup patterns by ensuring IRQ resources are freed before removing devices, and properly manages the lifetime of the workqueue to avoid use-after-free scenarios. This is exactly the type of bug fix that stable kernels should include to maintain system stability.
drivers/pci/hotplug/pnv_php.c | 96 ++++++++++++++++++++++++++++------- 1 file changed, 77 insertions(+), 19 deletions(-)
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c index aaa8e0a2a893..5476c9e7760d 100644 --- a/drivers/pci/hotplug/pnv_php.c +++ b/drivers/pci/hotplug/pnv_php.c @@ -3,6 +3,7 @@ * PCI Hotplug Driver for PowerPC PowerNV platform. * * Copyright Gavin Shan, IBM Corporation 2016. + * Copyright (C) 2025 Raptor Engineering, LLC */
#include <linux/bitfield.h> @@ -36,8 +37,10 @@ static void pnv_php_register(struct device_node *dn); static void pnv_php_unregister_one(struct device_node *dn); static void pnv_php_unregister(struct device_node *dn);
+static void pnv_php_enable_irq(struct pnv_php_slot *php_slot); + static void pnv_php_disable_irq(struct pnv_php_slot *php_slot, - bool disable_device) + bool disable_device, bool disable_msi) { struct pci_dev *pdev = php_slot->pdev; u16 ctrl; @@ -53,19 +56,15 @@ static void pnv_php_disable_irq(struct pnv_php_slot *php_slot, php_slot->irq = 0; }
- if (php_slot->wq) { - destroy_workqueue(php_slot->wq); - php_slot->wq = NULL; - } - - if (disable_device) { + if (disable_device || disable_msi) { if (pdev->msix_enabled) pci_disable_msix(pdev); else if (pdev->msi_enabled) pci_disable_msi(pdev); + }
+ if (disable_device) pci_disable_device(pdev); - } }
static void pnv_php_free_slot(struct kref *kref) @@ -74,7 +73,8 @@ static void pnv_php_free_slot(struct kref *kref) struct pnv_php_slot, kref);
WARN_ON(!list_empty(&php_slot->children)); - pnv_php_disable_irq(php_slot, false); + pnv_php_disable_irq(php_slot, false, false); + destroy_workqueue(php_slot->wq); kfree(php_slot->name); kfree(php_slot); } @@ -588,8 +588,58 @@ static int pnv_php_reset_slot(struct hotplug_slot *slot, bool probe) static int pnv_php_enable_slot(struct hotplug_slot *slot) { struct pnv_php_slot *php_slot = to_pnv_php_slot(slot); + u32 prop32; + int ret; + + ret = pnv_php_enable(php_slot, true); + if (ret) + return ret; + + /* (Re-)enable interrupt if the slot supports surprise hotplug */ + ret = of_property_read_u32(php_slot->dn, "ibm,slot-surprise-pluggable", + &prop32); + if (!ret && prop32) + pnv_php_enable_irq(php_slot);
- return pnv_php_enable(php_slot, true); + return 0; +} + +/* + * Disable any hotplug interrupts for all slots on the provided bus, as well as + * all downstream slots in preparation for a hot unplug. + */ +static int pnv_php_disable_all_irqs(struct pci_bus *bus) +{ + struct pci_bus *child_bus; + struct pci_slot *slot; + + /* First go down child buses */ + list_for_each_entry(child_bus, &bus->children, node) + pnv_php_disable_all_irqs(child_bus); + + /* Disable IRQs for all pnv_php slots on this bus */ + list_for_each_entry(slot, &bus->slots, list) { + struct pnv_php_slot *php_slot = to_pnv_php_slot(slot->hotplug); + + pnv_php_disable_irq(php_slot, false, true); + } + + return 0; +} + +/* + * Disable any hotplug interrupts for all downstream slots on the provided + * bus in preparation for a hot unplug. + */ +static int pnv_php_disable_all_downstream_irqs(struct pci_bus *bus) +{ + struct pci_bus *child_bus; + + /* Go down child buses, recursively deactivating their IRQs */ + list_for_each_entry(child_bus, &bus->children, node) + pnv_php_disable_all_irqs(child_bus); + + return 0; }
static int pnv_php_disable_slot(struct hotplug_slot *slot) @@ -606,6 +656,13 @@ static int pnv_php_disable_slot(struct hotplug_slot *slot) php_slot->state != PNV_PHP_STATE_REGISTERED) return 0;
+ /* + * Free all IRQ resources from all child slots before remove. + * Note that we do not disable the root slot IRQ here as that + * would also deactivate the slot hot (re)plug interrupt! + */ + pnv_php_disable_all_downstream_irqs(php_slot->bus); + /* Remove all devices behind the slot */ pci_lock_rescan_remove(); pci_hp_remove_devices(php_slot->bus); @@ -674,6 +731,15 @@ static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn) return NULL; }
+ /* Allocate workqueue for this slot's interrupt handling */ + php_slot->wq = alloc_workqueue("pciehp-%s", 0, 0, php_slot->name); + if (!php_slot->wq) { + SLOT_WARN(php_slot, "Cannot alloc workqueue\n"); + kfree(php_slot->name); + kfree(php_slot); + return NULL; + } + if (dn->child && PCI_DN(dn->child)) php_slot->slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn); else @@ -870,14 +936,6 @@ static void pnv_php_init_irq(struct pnv_php_slot *php_slot, int irq) u16 sts, ctrl; int ret;
- /* Allocate workqueue */ - php_slot->wq = alloc_workqueue("pciehp-%s", 0, 0, php_slot->name); - if (!php_slot->wq) { - SLOT_WARN(php_slot, "Cannot alloc workqueue\n"); - pnv_php_disable_irq(php_slot, true); - return; - } - /* Check PDC (Presence Detection Change) is broken or not */ ret = of_property_read_u32(php_slot->dn, "ibm,slot-broken-pdc", &broken_pdc); @@ -896,7 +954,7 @@ static void pnv_php_init_irq(struct pnv_php_slot *php_slot, int irq) ret = request_irq(irq, pnv_php_interrupt, IRQF_SHARED, php_slot->name, php_slot); if (ret) { - pnv_php_disable_irq(php_slot, true); + pnv_php_disable_irq(php_slot, true, true); SLOT_WARN(php_slot, "Error %d enabling IRQ %d\n", ret, irq); return; }
While the autoselector has the right idea, in that this is critical functionality that currently induces panics on older stable kernels, this entire patch series should be backported, not just these two isolated patches.
The correct series for backport would be:
PCI: pnv_php: Fix surprise plug detection and recovery powerpc/eeh: Make EEH driver device hotplug safe powerpc/eeh: Export eeh_unfreeze_pe() PCI: pnv_php: Work around switches with broken presence detection PCI: pnv_php: Clean up allocated IRQs on unplug
Backport is especially critical for Debian, so that we don't ship a broken kernel with the soon to be released Trixie version.
Thanks!
----- Original Message -----
From: "Sasha Levin" sashal@kernel.org To: patches@lists.linux.dev, stable@vger.kernel.org Cc: "Timothy Pearson" tpearson@raptorengineering.com, "Shawn Anastasio" sanastasio@raptorengineering.com, "Bjorn Helgaas" bhelgaas@google.com, "Madhavan Srinivasan" maddy@linux.ibm.com, "Sasha Levin" sashal@kernel.org, "Michael Ellerman" mpe@ellerman.id.au, "linuxppc-dev" linuxppc-dev@lists.ozlabs.org Sent: Friday, August 8, 2025 10:30:54 AM Subject: [PATCH AUTOSEL 6.16-5.10] PCI: pnv_php: Clean up allocated IRQs on unplug
From: Timothy Pearson tpearson@raptorengineering.com
[ Upstream commit 4668619092554e1b95c9a5ac2941ca47ba6d548a ]
When the root of a nested PCIe bridge configuration is unplugged, the pnv_php driver leaked the allocated IRQ resources for the child bridges' hotplug event notifications, resulting in a panic.
Fix this by walking all child buses and deallocating all its IRQ resources before calling pci_hp_remove_devices().
Also modify the lifetime of the workqueue at struct pnv_php_slot::wq so that it is only destroyed in pnv_php_free_slot(), instead of pnv_php_disable_irq(). This is required since pnv_php_disable_irq() will now be called by workers triggered by hot unplug interrupts, so the workqueue needs to stay allocated.
The abridged kernel panic that occurs without this patch is as follows:
WARNING: CPU: 0 PID: 687 at kernel/irq/msi.c:292 msi_device_data_release+0x6c/0x9c CPU: 0 UID: 0 PID: 687 Comm: bash Not tainted 6.14.0-rc5+ #2 Call Trace: msi_device_data_release+0x34/0x9c (unreliable) release_nodes+0x64/0x13c devres_release_all+0xc0/0x140 device_del+0x2d4/0x46c pci_destroy_dev+0x5c/0x194 pci_hp_remove_devices+0x90/0x128 pci_hp_remove_devices+0x44/0x128 pnv_php_disable_slot+0x54/0xd4 power_write_file+0xf8/0x18c pci_slot_attr_store+0x40/0x5c sysfs_kf_write+0x64/0x78 kernfs_fop_write_iter+0x1b0/0x290 vfs_write+0x3bc/0x50c ksys_write+0x84/0x140 system_call_exception+0x124/0x230 system_call_vectored_common+0x15c/0x2ec
Signed-off-by: Shawn Anastasio sanastasio@raptorengineering.com Signed-off-by: Timothy Pearson tpearson@raptorengineering.com [bhelgaas: tidy comments] Signed-off-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Madhavan Srinivasan maddy@linux.ibm.com Link: https://patch.msgid.link/2013845045.1359852.1752615367790.JavaMail.zimbra@ra... Signed-off-by: Sasha Levin sashal@kernel.org
LLM Generated explanations, may be completely bogus:
Based on my analysis of the commit, here is my assessment:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
- **Fixes a Real Bug with Kernel Panic**: The commit explicitly fixes a
kernel panic (WARNING and crash) that occurs when unplugging nested PCIe bridge configurations. The panic trace shows it happens in `msi_device_data_release()` during hotplug operations, which is a serious stability issue.
- **Clear Resource Leak Fix**: The commit addresses a specific IRQ
resource leak where child bridge IRQ resources were not being properly cleaned up during hot unplug operations. This is a real bug that affects system stability and resource management.
- **Small and Contained Fix**: The changes are well-contained within
the PowerNV PHP (PCI Hotplug) driver. The modifications include:
- Adding a new `disable_msi` parameter to `pnv_php_disable_irq()`
- Moving workqueue destruction to `pnv_php_free_slot()`
- Adding functions to recursively disable IRQs for child slots
- Re-enabling IRQ on slot enable for surprise hotplug support
- **Meets Stable Criteria**:
- Fixes a real bug (kernel panic/crash)
- Is obviously correct - properly cleans up resources
- Has clear testing evidence (fixes the described panic)
- Size is reasonable (under 100 lines of actual changes)
- Doesn't introduce new features or architectural changes
- **Critical for PowerNV Platform Users**: This affects IBM PowerNV
platforms using PCIe hotplug functionality, where nested bridge configurations can cause system crashes without this fix.
- **Well-Documented Problem**: The commit message provides a clear
description of the issue, including the kernel panic trace, making it easy to verify the problem and solution.
The fix follows proper cleanup patterns by ensuring IRQ resources are freed before removing devices, and properly manages the lifetime of the workqueue to avoid use-after-free scenarios. This is exactly the type of bug fix that stable kernels should include to maintain system stability.
drivers/pci/hotplug/pnv_php.c | 96 ++++++++++++++++++++++++++++------- 1 file changed, 77 insertions(+), 19 deletions(-)
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c index aaa8e0a2a893..5476c9e7760d 100644 --- a/drivers/pci/hotplug/pnv_php.c +++ b/drivers/pci/hotplug/pnv_php.c @@ -3,6 +3,7 @@
- PCI Hotplug Driver for PowerPC PowerNV platform.
- Copyright Gavin Shan, IBM Corporation 2016.
- Copyright (C) 2025 Raptor Engineering, LLC
*/
#include <linux/bitfield.h> @@ -36,8 +37,10 @@ static void pnv_php_register(struct device_node *dn); static void pnv_php_unregister_one(struct device_node *dn); static void pnv_php_unregister(struct device_node *dn);
+static void pnv_php_enable_irq(struct pnv_php_slot *php_slot);
static void pnv_php_disable_irq(struct pnv_php_slot *php_slot,
bool disable_device)
bool disable_device, bool disable_msi)
{ struct pci_dev *pdev = php_slot->pdev; u16 ctrl; @@ -53,19 +56,15 @@ static void pnv_php_disable_irq(struct pnv_php_slot *php_slot, php_slot->irq = 0; }
- if (php_slot->wq) {
destroy_workqueue(php_slot->wq);
php_slot->wq = NULL;
- }
- if (disable_device) {
if (disable_device || disable_msi) { if (pdev->msix_enabled) pci_disable_msix(pdev); else if (pdev->msi_enabled) pci_disable_msi(pdev);
}
if (disable_device) pci_disable_device(pdev);
- }
}
static void pnv_php_free_slot(struct kref *kref) @@ -74,7 +73,8 @@ static void pnv_php_free_slot(struct kref *kref) struct pnv_php_slot, kref);
WARN_ON(!list_empty(&php_slot->children));
- pnv_php_disable_irq(php_slot, false);
- pnv_php_disable_irq(php_slot, false, false);
- destroy_workqueue(php_slot->wq); kfree(php_slot->name); kfree(php_slot);
} @@ -588,8 +588,58 @@ static int pnv_php_reset_slot(struct hotplug_slot *slot, bool probe) static int pnv_php_enable_slot(struct hotplug_slot *slot) { struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
- u32 prop32;
- int ret;
- ret = pnv_php_enable(php_slot, true);
- if (ret)
return ret;
- /* (Re-)enable interrupt if the slot supports surprise hotplug */
- ret = of_property_read_u32(php_slot->dn, "ibm,slot-surprise-pluggable",
&prop32);
- if (!ret && prop32)
pnv_php_enable_irq(php_slot);
- return pnv_php_enable(php_slot, true);
- return 0;
+}
+/*
- Disable any hotplug interrupts for all slots on the provided bus, as well as
- all downstream slots in preparation for a hot unplug.
- */
+static int pnv_php_disable_all_irqs(struct pci_bus *bus) +{
- struct pci_bus *child_bus;
- struct pci_slot *slot;
- /* First go down child buses */
- list_for_each_entry(child_bus, &bus->children, node)
pnv_php_disable_all_irqs(child_bus);
- /* Disable IRQs for all pnv_php slots on this bus */
- list_for_each_entry(slot, &bus->slots, list) {
struct pnv_php_slot *php_slot = to_pnv_php_slot(slot->hotplug);
pnv_php_disable_irq(php_slot, false, true);
- }
- return 0;
+}
+/*
- Disable any hotplug interrupts for all downstream slots on the provided
- bus in preparation for a hot unplug.
- */
+static int pnv_php_disable_all_downstream_irqs(struct pci_bus *bus) +{
- struct pci_bus *child_bus;
- /* Go down child buses, recursively deactivating their IRQs */
- list_for_each_entry(child_bus, &bus->children, node)
pnv_php_disable_all_irqs(child_bus);
- return 0;
}
static int pnv_php_disable_slot(struct hotplug_slot *slot) @@ -606,6 +656,13 @@ static int pnv_php_disable_slot(struct hotplug_slot *slot) php_slot->state != PNV_PHP_STATE_REGISTERED) return 0;
- /*
* Free all IRQ resources from all child slots before remove.
* Note that we do not disable the root slot IRQ here as that
* would also deactivate the slot hot (re)plug interrupt!
*/
- pnv_php_disable_all_downstream_irqs(php_slot->bus);
- /* Remove all devices behind the slot */ pci_lock_rescan_remove(); pci_hp_remove_devices(php_slot->bus);
@@ -674,6 +731,15 @@ static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn) return NULL; }
- /* Allocate workqueue for this slot's interrupt handling */
- php_slot->wq = alloc_workqueue("pciehp-%s", 0, 0, php_slot->name);
- if (!php_slot->wq) {
SLOT_WARN(php_slot, "Cannot alloc workqueue\n");
kfree(php_slot->name);
kfree(php_slot);
return NULL;
- }
- if (dn->child && PCI_DN(dn->child)) php_slot->slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn); else
@@ -870,14 +936,6 @@ static void pnv_php_init_irq(struct pnv_php_slot *php_slot, int irq) u16 sts, ctrl; int ret;
- /* Allocate workqueue */
- php_slot->wq = alloc_workqueue("pciehp-%s", 0, 0, php_slot->name);
- if (!php_slot->wq) {
SLOT_WARN(php_slot, "Cannot alloc workqueue\n");
pnv_php_disable_irq(php_slot, true);
return;
- }
- /* Check PDC (Presence Detection Change) is broken or not */ ret = of_property_read_u32(php_slot->dn, "ibm,slot-broken-pdc", &broken_pdc);
@@ -896,7 +954,7 @@ static void pnv_php_init_irq(struct pnv_php_slot *php_slot, int irq) ret = request_irq(irq, pnv_php_interrupt, IRQF_SHARED, php_slot->name, php_slot); if (ret) {
pnv_php_disable_irq(php_slot, true);
SLOT_WARN(php_slot, "Error %d enabling IRQ %d\n", ret, irq); return; }pnv_php_disable_irq(php_slot, true, true);
-- 2.39.5
On Fri, Aug 08, 2025 at 10:59:00AM -0500, Timothy Pearson wrote:
While the autoselector has the right idea, in that this is critical functionality that currently induces panics on older stable kernels, this entire patch series should be backported, not just these two isolated patches.
The correct series for backport would be:
PCI: pnv_php: Fix surprise plug detection and recovery powerpc/eeh: Make EEH driver device hotplug safe powerpc/eeh: Export eeh_unfreeze_pe() PCI: pnv_php: Work around switches with broken presence detection PCI: pnv_php: Clean up allocated IRQs on unplug
Backport is especially critical for Debian, so that we don't ship a broken kernel with the soon to be released Trixie version.
Okay, I'll just queue these up directly. Thanks!
linux-stable-mirror@lists.linaro.org