Setting vm.dirtytime_expire_seconds to 0 causes wakeup_dirtytime_writeback() to reschedule itself with a delay of 0, creating an infinite busy loop that spins kworker at 100% CPU.
This series:
- Patch 1: Fixes the bug by handling interval=0 as "disable writeback"
  (consistent with dirty_writeback_centisecs behavior)
- Patch 2: Documents that setting the value to 0 disables writeback
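For illustration only, the difference between the buggy and fixed rescheduling logic can be modeled in user space. This is a sketch, not kernel code: simulate_runs(), the HZ value of 100, and the max_runs safety cap are assumptions made up for the model, and "time" is just a tick counter.

```c
#include <assert.h>
#include <stdbool.h>

#define HZ 100 /* assumed tick rate, used only by this model */

/*
 * Hypothetical model of a self-rearming delayed work item.
 * Returns how many times the work would run within budget_ticks.
 * max_runs caps the loop so the unguarded zero-interval case terminates.
 */
static unsigned long simulate_runs(unsigned int interval_secs,
				   unsigned long budget_ticks,
				   bool guarded,
				   unsigned long max_runs)
{
	unsigned long now = 0, runs = 0;

	assert(max_runs > 0);

	while (now <= budget_ticks && runs < max_runs) {
		runs++; /* the work item executes */

		/* Fixed behavior: a zero interval means "do not re-arm". */
		if (guarded && interval_secs == 0)
			break;

		/* Re-arm after the delay; with interval 0 time never advances. */
		now += (unsigned long)interval_secs * HZ;
	}
	return runs;
}
```

In this model, an unguarded zero interval runs until the cap is hit because the clock never moves forward, while the guarded variant runs once and stops. Non-zero intervals behave the same in both variants.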
Tested by booting kernels in QEMU with virtme-ng:
- Buggy kernel: kworker CPU spikes to ~73% when interval set to 0
- Fixed kernel: CPU remains normal, writeback correctly disabled
- Re-enabling (0 -> non-zero): writeback resumes correctly
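A minimal sketch of how the interval could be toggled from a test program, assuming the sysctl is exposed at /proc/sys/vm/dirtytime_expire_seconds and the caller has permission to write it. The helper names write_sysctl_int() and read_sysctl_int() are hypothetical and were not part of the testing described above.

```c
#include <stdio.h>

/*
 * Write an integer to a sysctl-style file.
 * Returns 0 on success, -1 on failure. For procfs files a rejected
 * value may only be reported when the stream is flushed, so the
 * fclose() result is checked as well.
 */
static int write_sysctl_int(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read an integer back from the same kind of file. */
static int read_sysctl_int(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling write_sysctl_int("/proc/sys/vm/dirtytime_expire_seconds", 0) and later restoring a non-zero value would exercise the disable and re-enable paths while kworker CPU usage is observed separately.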
v2:
- Added Reviewed-by from Jan Kara (no code changes)

Laveesh Bansal (2):
  writeback: fix 100% CPU usage when dirtytime_expire_interval is 0
  docs: clarify that dirtytime_expire_seconds=0 disables writeback

 Documentation/admin-guide/sysctl/vm.rst |  2 ++
 fs/fs-writeback.c                       | 14 ++++++++++----
 2 files changed, 12 insertions(+), 4 deletions(-)

-- 
2.43.0
When vm.dirtytime_expire_seconds is set to 0, wakeup_dirtytime_writeback() schedules delayed work with a delay of 0, causing immediate execution. The function then reschedules itself with 0 delay again, creating an infinite busy loop that causes 100% kworker CPU usage.
Fix by:
- Only scheduling delayed work in wakeup_dirtytime_writeback() when
  dirtytime_expire_interval is non-zero
- Cancelling the delayed work in dirtytime_interval_handler() when
  the interval is set to 0
- Adding a guard in start_dirtytime_writeback() for defensive coding
Tested by booting kernel in QEMU with virtme-ng:
- Before fix: kworker CPU spikes to ~73%
- After fix: CPU remains at normal levels
- Setting interval back to non-zero correctly resumes writeback
Fixes: a2f4870697a5 ("fs: make sure the timestamps for lazytime inodes eventually get written")
Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220227
Signed-off-by: Laveesh Bansal <laveeshb@laveeshbansal.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6800886c4d10..cd21c74cd0e5 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2492,7 +2492,8 @@ static void wakeup_dirtytime_writeback(struct work_struct *w)
 				wb_wakeup(wb);
 	}
 	rcu_read_unlock();
-	schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
+	if (dirtytime_expire_interval)
+		schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
 }
 
 static int dirtytime_interval_handler(const struct ctl_table *table, int write,
@@ -2501,8 +2502,12 @@ static int dirtytime_interval_handler(const struct ctl_table *table, int write,
 	int ret;
 
 	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
-	if (ret == 0 && write)
-		mod_delayed_work(system_percpu_wq, &dirtytime_work, 0);
+	if (ret == 0 && write) {
+		if (dirtytime_expire_interval)
+			mod_delayed_work(system_percpu_wq, &dirtytime_work, 0);
+		else
+			cancel_delayed_work_sync(&dirtytime_work);
+	}
 	return ret;
 }
 
@@ -2519,7 +2524,8 @@ static const struct ctl_table vm_fs_writeback_table[] = {
 
 static int __init start_dirtytime_writeback(void)
 {
-	schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
+	if (dirtytime_expire_interval)
+		schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
 	register_sysctl_init("vm", vm_fs_writeback_table);
 	return 0;
 }
Document that setting vm.dirtytime_expire_seconds to zero disables periodic dirtytime writeback, matching the behavior of the related dirty_writeback_centisecs sysctl which already documents this.
Signed-off-by: Laveesh Bansal <laveeshb@laveeshbansal.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 Documentation/admin-guide/sysctl/vm.rst | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 4d71211fdad8..e2fdbc521033 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -231,6 +231,8 @@ eventually gets pushed out to disk. This tunable is used to define when dirty
 inode is old enough to be eligible for writeback by the kernel flusher threads.
 And, it is also used as the interval to wakeup dirtytime_writeback thread.
 
+Setting this to zero disables periodic dirtytime writeback.
+
 dirty_writeback_centisecs
 =========================
 