From: Guo Xuenan <guoxuenan(a)huawei.com>
[ Upstream commit 575689fc0ffa6c4bb4e72fd18e31a6525a6124e0 ]
xfs log io error will trigger xlog shut down, and end_io worker call
xlog_state_shutdown_callbacks to unpin and release the buf log item.
The race condition is that when there are some thread doing transaction
commit and happened not to be intercepted by xlog_is_shutdown, then,
these log item will be insert into CIL, when unpin and release these
buf log item, UAF will occur. BTW, add delay before `xlog_cil_commit`
can increase recurrence probability.
The following call graph actually encountered this bad situation.
fsstress io end worker kworker/0:1H-216
xlog_ioend_work
->xlog_force_shutdown
->xlog_state_shutdown_callbacks
->xlog_cil_process_committed
->xlog_cil_committed
->xfs_trans_committed_bulk
->xfs_trans_apply_sb_deltas ->li_ops->iop_unpin(lip, 1);
->xfs_trans_getsb
->_xfs_trans_bjoin
->xfs_buf_item_init
->if (bip) { return 0;} //relog
->xlog_cil_commit
->xlog_cil_insert_items //insert into CIL
->xfs_buf_ioend_fail(bp);
->xfs_buf_ioend
->xfs_buf_item_done
->xfs_buf_item_relse
->xfs_buf_item_free
when cil push worker gather percpu cil and insert super block buf log item
into ctx->log_items then uaf occurs.
==================================================================
BUG: KASAN: use-after-free in xlog_cil_push_work+0x1c8f/0x22f0
Write of size 8 at addr ffff88801800f3f0 by task kworker/u4:4/105
CPU: 0 PID: 105 Comm: kworker/u4:4 Tainted: G W
6.1.0-rc1-00001-g274115149b42 #136
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Workqueue: xfs-cil/sda xlog_cil_push_work
Call Trace:
<TASK>
dump_stack_lvl+0x4d/0x66
print_report+0x171/0x4a6
kasan_report+0xb3/0x130
xlog_cil_push_work+0x1c8f/0x22f0
process_one_work+0x6f9/0xf70
worker_thread+0x578/0xf30
kthread+0x28c/0x330
ret_from_fork+0x1f/0x30
</TASK>
Allocated by task 2145:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
__kasan_slab_alloc+0x54/0x60
kmem_cache_alloc+0x14a/0x510
xfs_buf_item_init+0x160/0x6d0
_xfs_trans_bjoin+0x7f/0x2e0
xfs_trans_getsb+0xb6/0x3f0
xfs_trans_apply_sb_deltas+0x1f/0x8c0
__xfs_trans_commit+0xa25/0xe10
xfs_symlink+0xe23/0x1660
xfs_vn_symlink+0x157/0x280
vfs_symlink+0x491/0x790
do_symlinkat+0x128/0x220
__x64_sys_symlink+0x7a/0x90
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 216:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_save_free_info+0x2a/0x40
__kasan_slab_free+0x105/0x1a0
kmem_cache_free+0xb6/0x460
xfs_buf_ioend+0x1e9/0x11f0
xfs_buf_item_unpin+0x3d6/0x840
xfs_trans_committed_bulk+0x4c2/0x7c0
xlog_cil_committed+0xab6/0xfb0
xlog_cil_process_committed+0x117/0x1e0
xlog_state_shutdown_callbacks+0x208/0x440
xlog_force_shutdown+0x1b3/0x3a0
xlog_ioend_work+0xef/0x1d0
process_one_work+0x6f9/0xf70
worker_thread+0x578/0xf30
kthread+0x28c/0x330
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff88801800f388
which belongs to the cache xfs_buf_item of size 272
The buggy address is located 104 bytes inside of
272-byte region [ffff88801800f388, ffff88801800f498)
The buggy address belongs to the physical page:
page:ffffea0000600380 refcount:1 mapcount:0 mapping:0000000000000000
index:0xffff88801800f208 pfn:0x1800e
head:ffffea0000600380 order:1 compound_mapcount:0 compound_pincount:0
flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
raw: 001fffff80010200 ffffea0000699788 ffff88801319db50 ffff88800fb50640
raw: ffff88801800f208 000000000015000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88801800f280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88801800f300: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88801800f380: fc fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88801800f400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88801800f480: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Disabling lock debugging due to kernel taint
[ Backport to 5.15: context cleanly applied with no semantic changes.
Build-tested. ]
Signed-off-by: Guo Xuenan <guoxuenan(a)huawei.com>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Pranav Tyagi <pranav.tyagi03(a)gmail.com>
---
fs/xfs/xfs_buf_item.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index b1ab100c09e1..ffe318eb897f 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -1017,6 +1017,8 @@ xfs_buf_item_relse(
trace_xfs_buf_item_relse(bp, _RET_IP_);
ASSERT(!test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags));
+ if (atomic_read(&bip->bli_refcount))
+ return;
bp->b_log_item = NULL;
xfs_buf_rele(bp);
xfs_buf_item_free(bip);
--
2.49.0
Hi,
I'm seeing a regression on an HP ZBook using the e1000e driver
(chipset PCI ID: [8086:57a0]) -- the system can't get an IP address
after hot-plugging an Ethernet cable. In this case, the Ethernet cable
was unplugged at boot. The network interface eno1 was present but
stuck in the DHCP process. Using tcpdump, only TX packets were visible
and never got any RX -- indicating a possible packet loss or
link-layer issue.
This is on the vanilla Linux 6.16-rc4 (commit
62f224733431dbd564c4fe800d4b67a0cf92ed10).
Bisect says it's this commit:
commit efaaf344bc2917cbfa5997633bc18a05d3aed27f
Author: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Date: Thu Mar 13 16:05:56 2025 +0200
e1000e: change k1 configuration on MTP and later platforms
Starting from Meteor Lake, the Kumeran interface between the integrated
MAC and the I219 PHY works at a different frequency. This causes sporadic
MDI errors when accessing the PHY, and in rare circumstances could lead
to packet corruption.
To overcome this, introduce minor changes to the Kumeran idle
state (K1) parameters during device initialization. Hardware reset
reverts this configuration, therefore it needs to be applied in a few
places.
Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake")
Signed-off-by: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Tested-by: Avigail Dahan <avigailx.dahan(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
drivers/net/ethernet/intel/e1000e/defines.h | 3 +++
drivers/net/ethernet/intel/e1000e/ich8lan.c | 80
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
drivers/net/ethernet/intel/e1000e/ich8lan.h | 4 ++++
3 files changed, 82 insertions(+), 5 deletions(-)
Reverting this patch resolves the issue.
Based on the symptoms and the bisect result, this issue might be
similar to https://lore.kernel.org/intel-wired-lan/20250626153544.1853d106@onyx.my.dom…
Affected machine is:
HP ZBook X G1i 16 inch Mobile Workstation PC, BIOS 01.02.03 05/27/2025
(see end of message for dmesg from boot)
CPU model name:
Intel(R) Core(TM) Ultra 7 265H (Arrow Lake)
ethtool output:
driver: e1000e
version: 6.16.0-061600rc4-generic
firmware-version: 0.1-4
expansion-rom-version:
bus-info: 0000:00:1f.6
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
lspci output:
0:1f.6 Ethernet controller [0200]: Intel Corporation Device [8086:57a0]
DeviceName: Onboard Ethernet
Subsystem: Hewlett-Packard Company Device [103c:8e1d]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin D routed to IRQ 162
IOMMU group: 17
Region 0: Memory at 92280000 (32-bit, non-prefetchable) [size=128K]
Capabilities: [c8] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee00798 Data: 0000
Kernel driver in use: e1000e
Kernel modules: e1000e
The relevant dmesg:
<<<cable disconnected>>>
[ 0.927394] e1000e: Intel(R) PRO/1000 Network Driver
[ 0.927398] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ 0.927933] e1000e 0000:00:1f.6: enabling device (0000 -> 0002)
[ 0.928249] e1000e 0000:00:1f.6: Interrupt Throttling Rate
(ints/sec) set to dynamic conservative mode
[ 1.155716] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized):
registered PHC clock
[ 1.220694] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width
x1) 24:fb:e3:bf:28:c6
[ 1.220721] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
[ 1.220903] e1000e 0000:00:1f.6 eth0: MAC: 16, PHY: 12, PBA No: FFFFFF-0FF
[ 1.222632] e1000e 0000:00:1f.6 eno1: renamed from eth0
<<<cable connected>>>
[ 153.932626] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Half
Duplex, Flow Control: None
[ 153.934527] e1000e 0000:00:1f.6 eno1: NIC Link is Down
[ 157.622238] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: None
No error message seen after hot-plugging the Ethernet cable.
--
Best Regards,
En-Wei.
The iotlb_sync_map iommu ops allows drivers to perform necessary cache
flushes when new mappings are established. For the Intel iommu driver,
this callback specifically serves two purposes:
- To flush caches when a second-stage page table is attached to a device
whose iommu is operating in caching mode (CAP_REG.CM==1).
- To explicitly flush internal write buffers to ensure updates to memory-
resident remapping structures are visible to hardware (CAP_REG.RWBF==1).
However, in scenarios where neither caching mode nor the RWBF flag is
active, the cache_tag_flush_range_np() helper, which is called in the
iotlb_sync_map path, effectively becomes a no-op.
Despite being a no-op, cache_tag_flush_range_np() involves iterating
through all cache tags of the iommu's attached to the domain, protected
by a spinlock. This unnecessary execution path introduces overhead,
leading to a measurable I/O performance regression. On systems with NVMes
under the same bridge, performance was observed to drop from approximately
~6150 MiB/s down to ~4985 MiB/s.
Introduce a flag in the dmar_domain structure. This flag will only be set
when iotlb_sync_map is required (i.e., when CM or RWBF is set). The
cache_tag_flush_range_np() is called only for domains where this flag is
set.
Reported-by: Ioanna Alifieraki <ioanna-maria.alifieraki(a)canonical.com>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2115738
Link: https://lore.kernel.org/r/20250701171154.52435-1-ioanna-maria.alifieraki@ca…
Fixes: 129dab6e1286 ("iommu/vt-d: Use cache_tag_flush_range_np() in iotlb_sync_map")
Cc: stable(a)vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
---
drivers/iommu/intel/iommu.c | 19 ++++++++++++++++++-
drivers/iommu/intel/iommu.h | 3 +++
2 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7aa3932251b2..f60201ee4be0 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1796,6 +1796,18 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
(pgd_t *)pgd, flags, old);
}
+static bool domain_need_iotlb_sync_map(struct dmar_domain *domain,
+ struct intel_iommu *iommu)
+{
+ if (cap_caching_mode(iommu->cap) && !domain->use_first_level)
+ return true;
+
+ if (rwbf_quirk || cap_rwbf(iommu->cap))
+ return true;
+
+ return false;
+}
+
static int dmar_domain_attach_device(struct dmar_domain *domain,
struct device *dev)
{
@@ -1833,6 +1845,8 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
if (ret)
goto out_block_translation;
+ domain->iotlb_sync_map |= domain_need_iotlb_sync_map(domain, iommu);
+
return 0;
out_block_translation:
@@ -3945,7 +3959,10 @@ static bool risky_device(struct pci_dev *pdev)
static int intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
unsigned long iova, size_t size)
{
- cache_tag_flush_range_np(to_dmar_domain(domain), iova, iova + size - 1);
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+
+ if (dmar_domain->iotlb_sync_map)
+ cache_tag_flush_range_np(dmar_domain, iova, iova + size - 1);
return 0;
}
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 3ddbcc603de2..7ab2c34a5ecc 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -614,6 +614,9 @@ struct dmar_domain {
u8 has_mappings:1; /* Has mappings configured through
* iommu_map() interface.
*/
+ u8 iotlb_sync_map:1; /* Need to flush IOTLB cache or write
+ * buffer when creating mappings.
+ */
spinlock_t lock; /* Protect device tracking lists */
struct list_head devices; /* all devices' list */
--
2.43.0
The patch titled
Subject: mm/damon: fix divide by zero in damon_get_intervals_score()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-damon-fix-divide-by-zero-in-damon_get_intervals_score.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Honggyu Kim <honggyu.kim(a)sk.com>
Subject: mm/damon: fix divide by zero in damon_get_intervals_score()
Date: Wed, 2 Jul 2025 09:02:04 +0900
The current implementation allows having zero size regions with no special
reasons, but damon_get_intervals_score() gets crashed by divide by zero
when the region size is zero.
[ 29.403950] Oops: divide error: 0000 [#1] SMP NOPTI
This patch fixes the bug, but does not disallow zero size regions to keep
the backward compatibility since disallowing zero size regions might be a
breaking change for some users.
In addition, the same crash can happen when intervals_goal.access_bp is
zero so this should be fixed in stable trees as well.
Link: https://lkml.kernel.org/r/20250702000205.1921-5-honggyu.kim@sk.com
Fixes: f04b0fedbe71 ("mm/damon/core: implement intervals auto-tuning")
Signed-off-by: Honggyu Kim <honggyu.kim(a)sk.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/core.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/damon/core.c~mm-damon-fix-divide-by-zero-in-damon_get_intervals_score
+++ a/mm/damon/core.c
@@ -1449,6 +1449,7 @@ static unsigned long damon_get_intervals
}
}
target_access_events = max_access_events * goal_bp / 10000;
+ target_access_events = target_access_events ? : 1;
return access_events * 10000 / target_access_events;
}
_
Patches currently in -mm which might be from honggyu.kim(a)sk.com are
samples-damon-fix-damon-sample-prcl-for-start-failure.patch
samples-damon-fix-damon-sample-wsse-for-start-failure.patch
samples-damon-fix-damon-sample-mtier-for-start-failure.patch
mm-damon-fix-divide-by-zero-in-damon_get_intervals_score.patch
The patch titled
Subject: samples/damon: fix damon sample mtier for start failure
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
samples-damon-fix-damon-sample-mtier-for-start-failure.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Honggyu Kim <honggyu.kim(a)sk.com>
Subject: samples/damon: fix damon sample mtier for start failure
Date: Wed, 2 Jul 2025 09:02:03 +0900
The damon_sample_mtier_start() can fail so we must reset the "enable"
parameter to "false" again for proper rollback.
In such cases, setting Y to "enable" then N triggers the similar crash
with mtier because damon sample start failed but the "enable" stays as Y.
Link: https://lkml.kernel.org/r/20250702000205.1921-4-honggyu.kim@sk.com
Fixes: 82a08bde3cf7 ("samples/damon: implement a DAMON module for memory tiering")
Signed-off-by: Honggyu Kim <honggyu.kim(a)sk.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
samples/damon/mtier.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/samples/damon/mtier.c~samples-damon-fix-damon-sample-mtier-for-start-failure
+++ a/samples/damon/mtier.c
@@ -164,8 +164,12 @@ static int damon_sample_mtier_enable_sto
if (enable == enabled)
return 0;
- if (enable)
- return damon_sample_mtier_start();
+ if (enable) {
+ err = damon_sample_mtier_start();
+ if (err)
+ enable = false;
+ return err;
+ }
damon_sample_mtier_stop();
return 0;
}
_
Patches currently in -mm which might be from honggyu.kim(a)sk.com are
samples-damon-fix-damon-sample-prcl-for-start-failure.patch
samples-damon-fix-damon-sample-wsse-for-start-failure.patch
samples-damon-fix-damon-sample-mtier-for-start-failure.patch
mm-damon-fix-divide-by-zero-in-damon_get_intervals_score.patch
The patch titled
Subject: samples/damon: fix damon sample wsse for start failure
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
samples-damon-fix-damon-sample-wsse-for-start-failure.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Honggyu Kim <honggyu.kim(a)sk.com>
Subject: samples/damon: fix damon sample wsse for start failure
Date: Wed, 2 Jul 2025 09:02:02 +0900
The damon_sample_wsse_start() can fail so we must reset the "enable"
parameter to "false" again for proper rollback.
In such cases, setting Y to "enable" then N triggers the similar crash
with wsse because damon sample start failed but the "enable" stays as Y.
Link: https://lkml.kernel.org/r/20250702000205.1921-3-honggyu.kim@sk.com
Fixes: b757c6cfc696 ("samples/damon/wsse: start and stop DAMON as the user requests")
Signed-off-by: Honggyu Kim <honggyu.kim(a)sk.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
samples/damon/wsse.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/samples/damon/wsse.c~samples-damon-fix-damon-sample-wsse-for-start-failure
+++ a/samples/damon/wsse.c
@@ -102,8 +102,12 @@ static int damon_sample_wsse_enable_stor
if (enable == enabled)
return 0;
- if (enable)
- return damon_sample_wsse_start();
+ if (enable) {
+ err = damon_sample_wsse_start();
+ if (err)
+ enable = false;
+ return err;
+ }
damon_sample_wsse_stop();
return 0;
}
_
Patches currently in -mm which might be from honggyu.kim(a)sk.com are
samples-damon-fix-damon-sample-prcl-for-start-failure.patch
samples-damon-fix-damon-sample-wsse-for-start-failure.patch
samples-damon-fix-damon-sample-mtier-for-start-failure.patch
mm-damon-fix-divide-by-zero-in-damon_get_intervals_score.patch
The patch titled
Subject: samples/damon: fix damon sample prcl for start failure
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
samples-damon-fix-damon-sample-prcl-for-start-failure.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Honggyu Kim <honggyu.kim(a)sk.com>
Subject: samples/damon: fix damon sample prcl for start failure
Date: Wed, 2 Jul 2025 09:02:01 +0900
Patch series "mm/damon: fix divide by zero and its samples", v3.
This series includes fixes against damon and its samples to make it safer
when damon sample starting fails.
It includes the following changes.
- fix unexpected divide by zero crash for zero size regions
- fix bugs for damon samples in case of start failures
This patch (of 4):
The damon_sample_prcl_start() can fail so we must reset the "enable"
parameter to "false" again for proper rollback.
In such cases, setting Y to "enable" then N triggers the following crash
because damon sample start failed but the "enable" stays as Y.
[ 2441.419649] damon_sample_prcl: start
[ 2454.146817] damon_sample_prcl: stop
[ 2454.146862] ------------[ cut here ]------------
[ 2454.146865] kernel BUG at mm/slub.c:546!
[ 2454.148183] Oops: invalid opcode: 0000 [#1] SMP NOPTI
...
[ 2454.167555] Call Trace:
[ 2454.167822] <TASK>
[ 2454.168061] damon_destroy_ctx+0x78/0x140
[ 2454.168454] damon_sample_prcl_enable_store+0x8d/0xd0
[ 2454.168932] param_attr_store+0xa1/0x120
[ 2454.169315] module_attr_store+0x20/0x50
[ 2454.169695] sysfs_kf_write+0x72/0x90
[ 2454.170065] kernfs_fop_write_iter+0x150/0x1e0
[ 2454.170491] vfs_write+0x315/0x440
[ 2454.170833] ksys_write+0x69/0xf0
[ 2454.171162] __x64_sys_write+0x19/0x30
[ 2454.171525] x64_sys_call+0x18b2/0x2700
[ 2454.171900] do_syscall_64+0x7f/0x680
[ 2454.172258] ? exit_to_user_mode_loop+0xf6/0x180
[ 2454.172694] ? clear_bhb_loop+0x30/0x80
[ 2454.173067] ? clear_bhb_loop+0x30/0x80
[ 2454.173439] entry_SYSCALL_64_after_hwframe+0x76/0x7e
Link: https://lkml.kernel.org/r/20250702000205.1921-1-honggyu.kim@sk.com
Link: https://lkml.kernel.org/r/20250702000205.1921-2-honggyu.kim@sk.com
Fixes: 2aca254620a8 ("samples/damon: introduce a skeleton of a smaple DAMON module for proactive reclamation")
Signed-off-by: Honggyu Kim <honggyu.kim(a)sk.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
samples/damon/prcl.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/samples/damon/prcl.c~samples-damon-fix-damon-sample-prcl-for-start-failure
+++ a/samples/damon/prcl.c
@@ -122,8 +122,12 @@ static int damon_sample_prcl_enable_stor
if (enable == enabled)
return 0;
- if (enable)
- return damon_sample_prcl_start();
+ if (enable) {
+ err = damon_sample_prcl_start();
+ if (err)
+ enable = false;
+ return err;
+ }
damon_sample_prcl_stop();
return 0;
}
_
Patches currently in -mm which might be from honggyu.kim(a)sk.com are
samples-damon-fix-damon-sample-prcl-for-start-failure.patch
samples-damon-fix-damon-sample-wsse-for-start-failure.patch
samples-damon-fix-damon-sample-mtier-for-start-failure.patch
mm-damon-fix-divide-by-zero-in-damon_get_intervals_score.patch