The patch titled Subject: iov_iter: iterate_folioq: fix handling of offset >= folio size has been added to the -mm mm-hotfixes-unstable branch. Its filename is iov_iter-iterate_folioq-fix-handling-of-offset-=-folio-size.patch
This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches...
This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days
------------------------------------------------------ From: Dominique Martinet asmadeus@codewreck.org Subject: iov_iter: iterate_folioq: fix handling of offset >= folio size Date: Mon, 11 Aug 2025 16:39:05 +0900
So we've had this regression in 9p for.. almost a year, which is way too long, but there was no "easy" reproducer until yesterday (thank you again!!)
It turned out to be a bug with iov_iter on folios, iov_iter_get_pages_alloc2() would advance the iov_iter correctly up to the end edge of a folio and the later copy_to_iter() fails on the iterate_folioq() bug.
Happy to consider alternative ways of fixing this, now there's a reproducer it's all much clearer; for the bug to be visible we basically need to make and IO with non-contiguous folios in the iov_iter which is not obvious to test with synthetic VMs, with size that triggers a zero-copy read followed by a non-zero-copy read.
It's apparently possible to get an iov forwarded all the way up to the end of the current page we're looking at, e.g.
(gdb) p *iter $24 = {iter_type = 4 '\004', nofault = false, data_source = false, iov_offset = 4096, {__ubuf_iovec = { iov_base = 0xffff88800f5bc000, iov_len = 655}, {{__iov = 0xffff88800f5bc000, kvec = 0xffff88800f5bc000, bvec = 0xffff88800f5bc000, folioq = 0xffff88800f5bc000, xarray = 0xffff88800f5bc000, ubuf = 0xffff88800f5bc000}, count = 655}}, {nr_segs = 2, folioq_slot = 2 '\002', xarray_start = 2}}
Where iov_offset is 4k with 4k-sized folios
This should have been because we're only in the 2nd slot and there's another one after this, but iterate_folioq should not try to map a folio that skips the whole size, and more importantly part here does not end up zero (because 'PAGE_SIZE - skip % PAGE_SIZE' ends up PAGE_SIZE and not zero..), so skip forward to the "advance to next folio" code.
Link: https://lkml.kernel.org/r/20250811-iot_iter_folio-v1-0-d9c223adf93c@codewrec... Link: https://lkml.kernel.org/r/20250811-iot_iter_folio-v1-1-d9c223adf93c@codewrec... Link: https://lkml.kernel.org/r/D4LHHUNLG79Y.12PI0X6BEHRHW@mbosch.me/ Fixes: db0aa2e9566f ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios") Signed-off-by: Dominique Martinet asmadeus@codewreck.org Reported-by: Maximilian Bosch maximilian@mbosch.me Reported-by: Ryan Lahfa ryan@lahfa.xyz Reported-by: Christian Theune ct@flyingcircus.io Reported-by: Arnout Engelen arnout@bzzt.net Cc: Al Viro viro@zeniv.linux.org.uk Cc: Christian Brauner brauner@kernel.org Cc: David Howells dhowells@redhat.com Cc: Ryan Lahfa ryan@lahfa.xyz Cc: stable@vger.kernel.org [v6.12+] Signed-off-by: Andrew Morton akpm@linux-foundation.org ---
include/linux/iov_iter.h | 3 +++ 1 file changed, 3 insertions(+)
--- a/include/linux/iov_iter.h~iov_iter-iterate_folioq-fix-handling-of-offset-=-folio-size +++ a/include/linux/iov_iter.h @@ -168,6 +168,8 @@ size_t iterate_folioq(struct iov_iter *i break;
fsize = folioq_folio_size(folioq, slot); + if (skip >= fsize) + goto next; base = kmap_local_folio(folio, skip); part = umin(len, PAGE_SIZE - skip % PAGE_SIZE); remain = step(base, progress, part, priv, priv2); @@ -177,6 +179,7 @@ size_t iterate_folioq(struct iov_iter *i progress += consumed; skip += consumed; if (skip >= fsize) { +next: skip = 0; slot++; if (slot == folioq_nr_slots(folioq) && folioq->next) { _
Patches currently in -mm which might be from asmadeus@codewreck.org are
iov_iter-iterate_folioq-fix-handling-of-offset-=-folio-size.patch iov_iter-iov_folioq_get_pages-dont-leave-empty-slot-behind.patch
Thinking about it again, I think this:
@@ -168,6 +168,8 @@ size_t iterate_folioq(struct iov_iter *i break; fsize = folioq_folio_size(folioq, slot);
if (skip >= fsize)
base = kmap_local_folio(folio, skip); part = umin(len, PAGE_SIZE - skip % PAGE_SIZE); remain = step(base, progress, part, priv, priv2);goto next;
@@ -177,6 +179,7 @@ size_t iterate_folioq(struct iov_iter *i progress += consumed; skip += consumed; if (skip >= fsize) { +next: skip = 0; slot++; if (slot == folioq_nr_slots(folioq) && folioq->next) {
Would be much better done as:
@@ -168,6 +168,8 @@ size_t iterate_folioq(struct iov_iter *i break; fsize = folioq_folio_size(folioq, slot);
base = kmap_local_folio(folio, skip); part = umin(len, PAGE_SIZE - skip % PAGE_SIZE); remain = step(base, progress, part, priv, priv2);if (skip < fsize) {
@@ -177,6 +179,7 @@ size_t iterate_folioq(struct iov_iter *i progress += consumed; skip += consumed;
if (skip >= fsize) { skip = 0; slot++; if (slot == folioq_nr_slots(folioq) && folioq->next) {}
With the stuff inside the braces suitably indented. The compiler should be able to optimise away the extra comparison.
David
David Howells wrote on Tue, Aug 12, 2025 at 10:38:07AM +0100:
@@ -168,6 +168,8 @@ size_t iterate_folioq(struct iov_iter *i break; fsize = folioq_folio_size(folioq, slot);
base = kmap_local_folio(folio, skip); part = umin(len, PAGE_SIZE - skip % PAGE_SIZE); remain = step(base, progress, part, priv, priv2);if (skip < fsize) {
@@ -177,6 +179,7 @@ size_t iterate_folioq(struct iov_iter *i progress += consumed; skip += consumed;
if (skip >= fsize) { skip = 0; slot++; if (slot == folioq_nr_slots(folioq) && folioq->next) {}
With the stuff inside the braces suitably indented. The compiler should be able to optimise away the extra comparison.
skip is modified in the first if so I don't see how the compiler could optimize it. I just checked and at least iov_iter.o is slightly bigger with a second if:
(a.o = goto, b.o = if) 06:17:52 asmadeus@thor 0 ~/code/linux/bb$ size a.o b.o text data bss dec hex filename 26923 1104 0 28027 6d7b a.o 27019 1104 0 28123 6ddb b.o
but honestly I'm happy to focus on readability here -- if you think two if are easier to read I'll be happy to send a v3
Dominique Martinet asmadeus@codewreck.org wrote:
skip is modified in the first if so I don't see how the compiler could optimize it.
I mean that if you have:
if (skip < fsize) { ... } if (skip >= fsize) { ... }
then the compiler should be able to take the second check as true in the case where the first fails.
26923 1104 0 28027 6d7b a.o 27019 1104 0 28123 6ddb b.o
That's a surprisingly large change.
but honestly I'm happy to focus on readability here -- if you think two if are easier to read I'll be happy to send a v3
I must admit I dislike goto jumping *in* to a braced section. It's not even allowed in C++ and Java, IIRC.
David
David Howells wrote on Tue, Aug 12, 2025 at 10:38:31PM +0100:
26923 1104 0 28027 6d7b a.o 27019 1104 0 28123 6ddb b.o
That's a surprisingly large change.
Right, because the function is inlined multiple times in iov_iter.o it turned out rather big... Here's what it looks like from busybox's bloat-o-meter, which gives a better picture:
function old new delta iov_iter_zero 1706 1738 +32 copy_page_to_iter_nofault 2491 2512 +21 _copy_mc_to_iter 1604 1624 +20 copy_folio_from_iter_atomic 2620 2633 +13 _copy_from_iter_nocache 1706 1719 +13 _copy_from_iter_flushcache 1409 1422 +13 _copy_to_iter 1885 1891 +6 _copy_from_iter 1874 1880 +6 iov_iter_extract_pages 2182 2166 -16 ------------------------------------------------------------------------------ (add/remove: 0/0 grow/shrink: 8/1 up/down: 124/-16) Total: 108 bytes
So the compiler obviously doesn't optimize the if (at least with whatever default flags I'm using on gcc 15.2.1), but the impact is only that big because it's copied so many times.
Anyway, as said before I'm happy to prioritize readability here, so sending a v3 with just this changed.
linux-stable-mirror@lists.linaro.org