So we've had this regression in 9p for.. almost a year, which is way too long, but there was no "easy" reproducer until yesterday (thank you again!!)
It turned out to be a bug with iov_iter on folios, iov_iter_get_pages_alloc2() would advance the iov_iter correctly up to the end edge of a folio and the later copy_to_iter() fails on the iterate_folioq() bug.
Happy to consider alternative ways of fixing this, now there's a reproducer it's all much clearer; for the bug to be visible we basically need to make and IO with non-contiguous folios in the iov_iter which is not obvious to test with synthetic VMs, with size that triggers a zero-copy read followed by a non-zero-copy read.
Signed-off-by: Dominique Martinet asmadeus@codewreck.org --- Changes in v3: - convert 'goto next' to a big "if there is valid data in current folio" Future optimizations can remove it again after making sure this (iov iter advanced to end of folio) can never happen. - Link to v2: https://lore.kernel.org/r/20250812-iot_iter_folio-v2-0-f99423309478@codewrec...
Changes in v2: - Fixed 'remain' being used uninitialized in iterate_folioq when going through the goto - s/forwarded/advanced in commit message - Link to v1: https://lore.kernel.org/r/20250811-iot_iter_folio-v1-0-d9c223adf93c@codewrec...
--- Dominique Martinet (2): iov_iter: iterate_folioq: fix handling of offset >= folio size iov_iter: iov_folioq_get_pages: don't leave empty slot behind
include/linux/iov_iter.h | 20 +++++++++++--------- lib/iov_iter.c | 6 +++--- 2 files changed, 14 insertions(+), 12 deletions(-) --- base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585 change-id: 20250811-iot_iter_folio-1b7849f88fed
Best regards,
From: Dominique Martinet asmadeus@codewreck.org
It's apparently possible to get an iov advanced all the way up to the end of the current page we're looking at, e.g.
(gdb) p *iter $24 = {iter_type = 4 '\004', nofault = false, data_source = false, iov_offset = 4096, {__ubuf_iovec = { iov_base = 0xffff88800f5bc000, iov_len = 655}, {{__iov = 0xffff88800f5bc000, kvec = 0xffff88800f5bc000, bvec = 0xffff88800f5bc000, folioq = 0xffff88800f5bc000, xarray = 0xffff88800f5bc000, ubuf = 0xffff88800f5bc000}, count = 655}}, {nr_segs = 2, folioq_slot = 2 '\002', xarray_start = 2}}
Where iov_offset is 4k with 4k-sized folios
This should have been fine because we're only in the 2nd slot and there's another one after this, but iterate_folioq should not try to map a folio that skips the whole size, and more importantly part here does not end up zero (because 'PAGE_SIZE - skip % PAGE_SIZE' ends up PAGE_SIZE and not zero..), so skip forward to the "advance to next folio" code
Reported-by: Maximilian Bosch maximilian@mbosch.me Reported-by: Ryan Lahfa ryan@lahfa.xyz Reported-by: Christian Theune ct@flyingcircus.io Reported-by: Arnout Engelen arnout@bzzt.net Link: https://lkml.kernel.org/r/D4LHHUNLG79Y.12PI0X6BEHRHW@mbosch.me/ Fixes: db0aa2e9566f ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios") Cc: stable@vger.kernel.org # v6.12+ Acked-by: David Howells dhowells@redhat.com Signed-off-by: Dominique Martinet asmadeus@codewreck.org --- include/linux/iov_iter.h | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/include/linux/iov_iter.h b/include/linux/iov_iter.h index c4aa58032faf874ee5b29bd37f9e23c479741bef..f9a17fbbd3980b1fec5f9a7fde812aebef6728d5 100644 --- a/include/linux/iov_iter.h +++ b/include/linux/iov_iter.h @@ -160,7 +160,7 @@ size_t iterate_folioq(struct iov_iter *iter, size_t len, void *priv, void *priv2
do { struct folio *folio = folioq_folio(folioq, slot); - size_t part, remain, consumed; + size_t part, remain = 0, consumed; size_t fsize; void *base;
@@ -168,14 +168,16 @@ size_t iterate_folioq(struct iov_iter *iter, size_t len, void *priv, void *priv2 break;
fsize = folioq_folio_size(folioq, slot); - base = kmap_local_folio(folio, skip); - part = umin(len, PAGE_SIZE - skip % PAGE_SIZE); - remain = step(base, progress, part, priv, priv2); - kunmap_local(base); - consumed = part - remain; - len -= consumed; - progress += consumed; - skip += consumed; + if (skip < fsize) { + base = kmap_local_folio(folio, skip); + part = umin(len, PAGE_SIZE - skip % PAGE_SIZE); + remain = step(base, progress, part, priv, priv2); + kunmap_local(base); + consumed = part - remain; + len -= consumed; + progress += consumed; + skip += consumed; + } if (skip >= fsize) { skip = 0; slot++;
On Wed, 13 Aug 2025 15:04:54 +0900, Dominique Martinet wrote:
So we've had this regression in 9p for.. almost a year, which is way too long, but there was no "easy" reproducer until yesterday (thank you again!!)
It turned out to be a bug with iov_iter on folios, iov_iter_get_pages_alloc2() would advance the iov_iter correctly up to the end edge of a folio and the later copy_to_iter() fails on the iterate_folioq() bug.
[...]
Applied to the vfs.fixes branch of the vfs/vfs.git tree. Patches in the vfs.fixes branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase, trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git branch: vfs.fixes
[1/2] iov_iter: iterate_folioq: fix handling of offset >= folio size https://git.kernel.org/vfs/vfs/c/546a40359fd2 [2/2] iov_iter: iov_folioq_get_pages: don't leave empty slot behind https://git.kernel.org/vfs/vfs/c/334430b2d585
Hi Christian,
Christian Brauner wrote on Fri, Aug 15, 2025 at 04:16:33PM +0200:
Applied to the vfs.fixes branch of the vfs/vfs.git tree. Patches in the vfs.fixes branch should appear in linux-next soon.
I'm not sure which is better as this is halfway between vfs and mm, but Andrew seems to have picked this up in the mm tree: - patch1 to mm-hotfixes-unstable https://lkml.kernel.org/r/20250813231639.B7E61C4CEEB@smtp.kernel.org - patch2 to mm-nonmm-unstable https://lkml.kernel.org/r/20250813231515.7AADEC4CEF5@smtp.kernel.org
(and both patches are in -next right now, so I expect you'll get pinged by Stephen in a bit...)
Thanks,
linux-stable-mirror@lists.linaro.org