The function ceph_process_folio_batch() sets folio_batch entries to NULL, which is an illegal state. Before folio_batch_release() crashes due to this API violation, the function ceph_shift_unused_folios_left() is supposed to remove those NULLs from the array.
However, since commit ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method"), this shifting doesn't happen anymore because the "for" loop got moved to ceph_process_folio_batch(), and now the `i` variable that remains in ceph_writepages_start() doesn't get incremented anymore, making the shifting effectively unreachable much of the time.
Later, commit 1551ec61dc55 ("ceph: introduce ceph_submit_write() method") added more preconditions for doing the shift, replacing the `i` check (with something that is still just as broken):
- if ceph_process_folio_batch() fails, shifting never happens
- if ceph_move_dirty_page_in_page_array() was never called (because ceph_process_folio_batch() has returned early for any of various reasons), shifting never happens
- if `processed_in_fbatch` is zero (because ceph_process_folio_batch() has returned early for one of the reasons mentioned above or because ceph_move_dirty_page_in_page_array() has failed), shifting never happens
Since those two commits, any problem in ceph_process_folio_batch() could crash the kernel, e.g. this way:
BUG: kernel NULL pointer dereference, address: 0000000000000034
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: Oops: 0002 [#1] SMP NOPTI
CPU: 172 UID: 0 PID: 2342707 Comm: kworker/u778:8 Not tainted 6.15.10-cm4all1-es #714 NONE
Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.10 12/08/2023
Workqueue: writeback wb_workfn (flush-ceph-1)
RIP: 0010:folios_put_refs+0x85/0x140
Code: 83 c5 01 39 e8 7e 76 48 63 c5 49 8b 5c c4 08 b8 01 00 00 00 4d 85 ed 74 05 41 8b 44 ad 00 48 8b 15 b0 >
RSP: 0018:ffffb880af8db778 EFLAGS: 00010207
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000003
RDX: ffffe377cc3b0000 RSI: 0000000000000000 RDI: ffffb880af8db8c0
RBP: 0000000000000000 R08: 000000000000007d R09: 000000000102b86f
R10: 0000000000000001 R11: 00000000000000ac R12: ffffb880af8db8c0
R13: 0000000000000000 R14: 0000000000000000 R15: ffff9bd262c97000
FS:  0000000000000000(0000) GS:ffff9c8efc303000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000034 CR3: 0000000160958004 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ceph_writepages_start+0xeb9/0x1410
The crash can be reproduced easily by changing the ceph_check_page_before_write() return value to `-E2BIG`.
(Interestingly, the crash happens only if `huge_zero_folio` has already been allocated; without `huge_zero_folio`, is_huge_zero_folio(NULL) returns true and folios_put_refs() skips NULL entries instead of dereferencing them. That makes reproducing the bug somewhat unreliable. See https://lore.kernel.org/20250826231626.218675-1-max.kellermann@ionos.com for a discussion of this detail.)
My suggestion is to move the ceph_shift_unused_folios_left() call to right after ceph_process_folio_batch() to ensure it always gets called to fix up the illegal folio_batch state.
Fixes: ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method")
Link: https://lore.kernel.org/ceph-devel/aK4v548CId5GIKG1@swift.blarg.de/
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
 fs/ceph/addr.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 8b202d789e93..8bc66b45dade 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1687,6 +1687,7 @@ static int ceph_writepages_start(struct address_space *mapping,
 
 process_folio_batch:
 	rc = ceph_process_folio_batch(mapping, wbc, &ceph_wbc);
+	ceph_shift_unused_folios_left(&ceph_wbc.fbatch);
 	if (rc)
 		goto release_folios;
 
@@ -1695,8 +1696,6 @@ static int ceph_writepages_start(struct address_space *mapping,
 		goto release_folios;
 
 	if (ceph_wbc.processed_in_fbatch) {
-		ceph_shift_unused_folios_left(&ceph_wbc.fbatch);
-
 		if (folio_batch_count(&ceph_wbc.fbatch) == 0 &&
 		    ceph_wbc.locked_pages < ceph_wbc.max_pages) {
 			doutc(cl, "reached end fbatch, trying for more\n");
On Wed, 2025-08-27 at 20:17 +0200, Max Kellermann wrote:
[...]
Let us try to reproduce the issue and test the patch.
Thanks, Slava.
On Wed, 2025-08-27 at 20:17 +0200, Max Kellermann wrote:
[...]

The crash can be reproduced easily by changing the ceph_check_page_before_write() return value to `-E2BIG`.
I cannot reproduce the crash/issue. If ceph_check_page_before_write() returns `-E2BIG`, then nothing happens. There is no crash, and no write operations can be processed by the file system driver anymore. So it doesn't look like a recipe to reproduce the issue. I cannot confirm that the patch fixes the issue without a clear way to reproduce it.
Could you please provide a clearer explanation of the issue reproduction path?
Thanks, Slava.
On Thu, Aug 28, 2025 at 8:55 PM Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
[...]
I cannot reproduce the crash/issue. If ceph_check_page_before_write() returns `-E2BIG`, then nothing happens. There is no crash, and no write operations can be processed by the file system driver anymore. So it doesn't look like a recipe to reproduce the issue. I cannot confirm that the patch fixes the issue without a clear way to reproduce it.
Could you please provide a clearer explanation of the issue reproduction path?
Hi Slava,
Was this bit taken into account?
(Interestingly, the crash happens only if `huge_zero_folio` has already been allocated; without `huge_zero_folio`, is_huge_zero_folio(NULL) returns true and folios_put_refs() skips NULL entries instead of dereferencing them. That makes reproducing the bug somewhat unreliable. See https://lore.kernel.org/20250826231626.218675-1-max.kellermann@ionos.com for a discussion of this detail.)
Thanks,
Ilya
On Thu, 2025-08-28 at 21:05 +0200, Ilya Dryomov wrote:
[...]
Was this bit taken into account?
(Interestingly, the crash happens only if `huge_zero_folio` has already been allocated; without `huge_zero_folio`, is_huge_zero_folio(NULL) returns true and folios_put_refs() skips NULL entries instead of dereferencing them. That makes reproducing the bug somewhat unreliable. See https://lore.kernel.org/20250826231626.218675-1-max.kellermann@ionos.com for a discussion of this detail.)
Hi Ilya,
And which practical steps do you see to reproduce it? :)
Thanks, Slava.
On Thu, Aug 28, 2025 at 9:08 PM Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
And which practical steps do you see to reproduce it? :)
Apply the patch in the link. Did you read that thread/patch?
On Thu, 2025-08-28 at 23:37 +0200, Max Kellermann wrote:
On Thu, Aug 28, 2025 at 9:08 PM Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
And which practical steps do you see to reproduce it? :)
Apply the patch in the link. Did you read that thread/patch?
By applying the patch [1], enabling CONFIG_DEBUG_VM, and returning -E2BIG from ceph_check_page_before_write(), I was able to reproduce this warning:
[  123.147833] ------------[ cut here ]------------
[  123.147861] WARNING: CPU: 5 PID: 72 at ./include/linux/huge_mm.h:482 folios_put_refs+0x4c2/0x600
[  123.147900] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel kvm irqbypass joydev polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse i2c_piix4 vga16fb pata_acpi bochs vgastate i2c_smbus serio_raw floppy qemu_fw_cfg mac_hid sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
[  123.147988] CPU: 5 UID: 0 PID: 72 Comm: kworker/u32:2 Not tainted 6.17.0-rc4+ #9 PREEMPT(voluntary)
[  123.147995] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[  123.148002] Workqueue: writeback wb_workfn (flush-ceph-1)
[  123.148021] RIP: 0010:folios_put_refs+0x4c2/0x600
[  123.148031] Code: cc c6 db 05 0f 85 19 01 00 00 48 81 c4 b8 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc <0f> 0b e9 1e fe ff ff e8 c2 fe 24 00 e9 da fb ff ff 4c 89 ef e8 b5
[  123.148035] RSP: 0018:ffff888101c6f228 EFLAGS: 00010246
[  123.148051] RAX: ffffed102038dea4 RBX: 0000000000000000 RCX: 0000000000000000
[  123.148057] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff888101c6f520
[  123.148060] RBP: ffff888101c6f308 R08: 0000000000000000 R09: 0000000000000000
[  123.148063] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  123.148066] R13: ffff888101c6f520 R14: 0000000000000000 R15: dffffc0000000000
[  123.148069] FS:  0000000000000000(0000) GS:ffff88824a034000(0000) knlGS:0000000000000000
[  123.148072] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  123.148075] CR2: 0000700798000020 CR3: 0000000111b6a005 CR4: 0000000000772ef0
[  123.148082] PKRU: 55555554
[  123.148085] Call Trace:
[  123.148088]  <TASK>
[  123.148093]  ? __pfx_folios_put_refs+0x10/0x10
[  123.148099]  ? __pfx_filemap_get_folios_tag+0x10/0x10
[  123.148110]  __folio_batch_release+0x52/0xe0
[  123.148115]  ceph_writepages_start+0x277a/0x45f0
[  123.148129]  ? update_load_avg+0x1bd/0x1fe0
[  123.148145]  ? dequeue_entity+0x3e5/0x1450
[  123.148151]  ? ata_sff_qc_issue+0x443/0xa90
[  123.148175]  ? kvm_sched_clock_read+0x11/0x20
[  123.148198]  ? sched_clock_noinstr+0x9/0x10
[  123.148203]  ? sched_clock+0x10/0x30
[  123.148216]  ? __pfx_ceph_writepages_start+0x10/0x10
[  123.148221]  ? psi_group_change+0x3fa/0x8a0
[  123.148233]  ? __pfx_sched_clock_cpu+0x10/0x10
[  123.148238]  ? set_next_entity+0x325/0xb40
[  123.148245]  ? ncsi_channel_monitor.cold+0x36d/0x553
[  123.148269]  ? __kasan_check_write+0x14/0x30
[  123.148283]  ? _raw_spin_lock+0x82/0xf0
[  123.148293]  ? __pfx__raw_spin_lock+0x10/0x10
[  123.148298]  do_writepages+0x1e1/0x540
[  123.148303]  ? do_writepages+0x1e1/0x540
[  123.148308]  __writeback_single_inode+0xa7/0x940
[  123.148312]  ? _raw_spin_unlock+0xe/0x40
[  123.148315]  ? wbc_attach_and_unlock_inode+0x440/0x610
[  123.148325]  ? __pfx_call_function_single_prep_ipi+0x10/0x10
[  123.148336]  writeback_sb_inodes+0x563/0xe40
[  123.148341]  ? __pfx_writeback_sb_inodes+0x10/0x10
[  123.148348]  ? __pfx_move_expired_inodes+0x10/0x10
[  123.148360]  __writeback_inodes_wb+0xbe/0x210
[  123.148364]  wb_writeback+0x4e4/0x6f0
[  123.148368]  ? __pfx_wb_writeback+0x10/0x10
[  123.148416]  ? get_nr_dirty_inodes+0xdc/0x1e0
[  123.148426]  wb_workfn+0x5a9/0xb30
[  123.148430]  ? __pfx_wb_workfn+0x10/0x10
[  123.148433]  ? __pfx___schedule+0x10/0x10
[  123.148438]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  123.148442]  process_one_work+0x611/0xe20
[  123.148448]  ? __kasan_check_write+0x14/0x30
[  123.148452]  worker_thread+0x7e3/0x1580
[  123.148456]  ? __pfx_worker_thread+0x10/0x10
[  123.148458]  kthread+0x381/0x7a0
[  123.148463]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  123.148466]  ? __pfx_kthread+0x10/0x10
[  123.148468]  ? __kasan_check_write+0x14/0x30
[  123.148471]  ? recalc_sigpending+0x160/0x220
[  123.148478]  ? _raw_spin_unlock_irq+0xe/0x50
[  123.148481]  ? calculate_sigpending+0x78/0xb0
[  123.148484]  ? __pfx_kthread+0x10/0x10
[  123.148487]  ret_from_fork+0x285/0x350
[  123.148490]  ? __pfx_kthread+0x10/0x10
[  123.148493]  ret_from_fork_asm+0x1a/0x30
[  123.148499]  </TASK>
[  123.148501] ---[ end trace 0000000000000000 ]---
The warning is eliminated by applying the suggested fix. The patch has been tested with xfstests, and no regressions or issues have been detected.
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Thanks, Slava.
[1] https://lore.kernel.org/all/20250826231626.218675-1-max.kellermann@ionos.com...
On Thu, Sep 4, 2025 at 11:43 PM Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
By applying the patch [1], enabling CONFIG_DEBUG_VM, and returning -E2BIG from ceph_check_page_before_write(), I was able to reproduce this warning:
Thanks, I'm glad you could verify the bug and my fix. In case this wasn't clear: you saw just a warning, but this is usually a kernel crash due to NULL pointer dereference. If you only got a warning but no crash, it means your test VM does not use transparent huge pages (no huge_zero_folio allocated yet). In a real workload, the kernel would have crashed.
On Fri, 2025-09-05 at 05:41 +0200, Max Kellermann wrote:
On Thu, Sep 4, 2025 at 11:43 PM Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
By applying the patch [1], enabling CONFIG_DEBUG_VM, and returning -E2BIG from ceph_check_page_before_write(), I was able to reproduce this warning:
Thanks, I'm glad you could verify the bug and my fix. In case this wasn't clear: you saw just a warning, but this is usually a kernel crash due to NULL pointer dereference. If you only got a warning but no crash, it means your test VM does not use transparent huge pages (no huge_zero_folio allocated yet). In a real workload, the kernel would have crashed.
I would like to reproduce the crash, but you've shared only these steps, and it looks like they are not the complete recipe, so something is missing. If you could share a more precise explanation of the steps, that would be great.
Thanks, Slava.
On Fri, Sep 5, 2025 at 7:11 PM Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
On Fri, 2025-09-05 at 05:41 +0200, Max Kellermann wrote:
Thanks, I'm glad you could verify the bug and my fix. In case this wasn't clear: you saw just a warning, but this is usually a kernel crash due to NULL pointer dereference. If you only got a warning but no crash, it means your test VM does not use transparent huge pages (no huge_zero_folio allocated yet). In a real workload, the kernel would have crashed.
I would like to reproduce the crash, but you've shared only these steps, and it looks like they are not the complete recipe, so something is missing. If you could share a more precise explanation of the steps, that would be great.
The email you just cited explains the circumstances that are necessary for the crash to occur.
Let me repeat it for you: you have to ensure that huge_zero_folio gets allocated (or else the code that dereferences the NULL pointer and crashes gets skipped).
Got it now?