Ever since commit c2ff29e99a76 ("siw: Inline do_tcp_sendpages()"), we have been doing this:
static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
                             size_t size)
[...]
        /*
         * Calculate the number of bytes we need to push, for this page
         * specifically
         */
        size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
        /* If we can't splice it, then copy it in, as normal */
        if (!sendpage_ok(page[i]))
                msg.msg_flags &= ~MSG_SPLICE_PAGES;
        /* Set the bvec pointing to the page, with len $bytes */
        bvec_set_page(&bvec, page[i], bytes, offset);
        /* Set the iter to $size, aka the size of the whole sendpages (!!!) */
        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
try_page_again:
        lock_sock(sk);
        /* Sendmsg with $size size (!!!) */
        rv = tcp_sendmsg_locked(sk, &msg, size);
This means we've been building oversized iov_iters and passing oversized byte counts to tcp_sendmsg for a while. This has been a benign bug, because sendpage_ok() always returned true. With the recent slab allocator changes slowly being introduced into -next (which disallow sendpage on large kmalloc allocations), we have started hitting out-of-bounds crashes, due to slight differences in iov_iter behavior between the MSG_SPLICE_PAGES and "regular" copy paths:
(MSG_SPLICE_PAGES)
skb_splice_from_iter
  iov_iter_extract_pages
    iov_iter_extract_bvec_pages
      uses i->nr_segs to correctly stop in its tracks before OoB'ing everywhere
skb_splice_from_iter gets a "short" read
(!MSG_SPLICE_PAGES)
skb_copy_to_page_nocache copy=iov_iter_count
[...]
  copy_from_iter
    /* this doesn't help */
    if (unlikely(iter->count < len))
      len = iter->count;
    iterate_bvec
      ... and we run off the bvecs
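To make the numbers concrete: assuming 4 KiB pages, a two-page send (size = 8192, offset = 0) builds this on its first iteration (simplified):

        size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);   /* = 4096 */

        bvec_set_page(&bvec, page[0], 4096, 0);                   /* one 4 KiB segment */
        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, 8192); /* claims 8 KiB */
        rv = tcp_sendmsg_locked(sk, &msg, 8192);

i.e. the iterator advertises 8192 bytes backed by a single 4096-byte bvec; the "len = iter->count" clamp in the copy path cannot save us, because the count itself is the oversized value.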
Fix this by properly setting the iov_iter's byte count, and by passing the correct byte count to tcp_sendmsg_locked.
Cc: stable@vger.kernel.org
Fixes: c2ff29e99a76 ("siw: Inline do_tcp_sendpages()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202507220801.50a7210-lkp@intel.com
Signed-off-by: Pedro Falcato <pfalcato@suse.de>
---
 drivers/infiniband/sw/siw/siw_qp_tx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index 3a08f57d2211..9576a2b766c4 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -340,11 +340,11 @@ static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
 		if (!sendpage_ok(page[i]))
 			msg.msg_flags &= ~MSG_SPLICE_PAGES;
 		bvec_set_page(&bvec, page[i], bytes, offset);
-		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);
 
 try_page_again:
 		lock_sock(sk);
-		rv = tcp_sendmsg_locked(sk, &msg, size);
+		rv = tcp_sendmsg_locked(sk, &msg, bytes);
 		release_sock(sk);
 
 		if (rv > 0) {
-----Original Message-----
From: Pedro Falcato <pfalcato@suse.de>
Sent: Wednesday, 23 July 2025 12:41
To: Jason Gunthorpe <jgg@ziepe.ca>; Bernard Metzler <BMT@zurich.ibm.com>; Leon Romanovsky <leon@kernel.org>; Vlastimil Babka <vbabka@suse.cz>
Cc: Jakub Kicinski <kuba@kernel.org>; David Howells <dhowells@redhat.com>; Tom Talpey <tom@talpey.com>; linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org; Pedro Falcato <pfalcato@suse.de>; stable@vger.kernel.org; kernel test robot <oliver.sang@intel.com>
Subject: [EXTERNAL] [PATCH] RDMA/siw: Fix the sendmsg byte count in siw_tcp_sendpages
> Ever since commit c2ff29e99a76 ("siw: Inline do_tcp_sendpages()"), we
> have been doing this:
>
> static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
>                              size_t size)
> [...]
>         /*
>          * Calculate the number of bytes we need to push, for this page
>          * specifically
>          */
>         size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
>         /* If we can't splice it, then copy it in, as normal */
>         if (!sendpage_ok(page[i]))
>                 msg.msg_flags &= ~MSG_SPLICE_PAGES;
>         /* Set the bvec pointing to the page, with len $bytes */
>         bvec_set_page(&bvec, page[i], bytes, offset);
>         /* Set the iter to $size, aka the size of the whole sendpages (!!!) */
>         iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
> try_page_again:
>         lock_sock(sk);
>         /* Sendmsg with $size size (!!!) */
>         rv = tcp_sendmsg_locked(sk, &msg, size);
>
> This means we've been building oversized iov_iters and passing oversized
> byte counts to tcp_sendmsg for a while. This has been a benign bug,
> because sendpage_ok() always returned true. With the recent slab
> allocator changes slowly being introduced into -next (which disallow
> sendpage on large kmalloc allocations), we have started hitting
> out-of-bounds crashes, due to slight differences in iov_iter behavior
> between the MSG_SPLICE_PAGES and "regular" copy paths:
>
> (MSG_SPLICE_PAGES)
> skb_splice_from_iter
>   iov_iter_extract_pages
>     iov_iter_extract_bvec_pages
>       uses i->nr_segs to correctly stop in its tracks before OoB'ing everywhere
> skb_splice_from_iter gets a "short" read
>
> (!MSG_SPLICE_PAGES)
> skb_copy_to_page_nocache copy=iov_iter_count
> [...]
>   copy_from_iter
>     /* this doesn't help */
>     if (unlikely(iter->count < len))
>       len = iter->count;
>     iterate_bvec
>       ... and we run off the bvecs
>
> Fix this by properly setting the iov_iter's byte count, and by passing
> the correct byte count to tcp_sendmsg_locked.
>
> Cc: stable@vger.kernel.org
> Fixes: c2ff29e99a76 ("siw: Inline do_tcp_sendpages()")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202507220801.50a7210-lkp@intel.com
> Signed-off-by: Pedro Falcato <pfalcato@suse.de>
> ---
> drivers/infiniband/sw/siw/siw_qp_tx.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
> index 3a08f57d2211..9576a2b766c4 100644
> --- a/drivers/infiniband/sw/siw/siw_qp_tx.c
> +++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
> @@ -340,11 +340,11 @@ static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
>  		if (!sendpage_ok(page[i]))
>  			msg.msg_flags &= ~MSG_SPLICE_PAGES;
>  		bvec_set_page(&bvec, page[i], bytes, offset);
> -		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
> +		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);
> 
>  try_page_again:
>  		lock_sock(sk);
> -		rv = tcp_sendmsg_locked(sk, &msg, size);
> +		rv = tcp_sendmsg_locked(sk, &msg, bytes);
>  		release_sock(sk);
Pedro, many thanks for catching this! I completely missed it during my too-sloppy review of that patch. It's a serious bug which must be fixed asap. BUT, looking closer, I do not see the offset being taken into account when retrying the current segment. So resend attempts seem to send old data that is already out. Shouldn't the try_page_again: label be above bvec_set_page()??
Thanks! Bernard.
> 
>  		if (rv > 0) {
> --
> 2.50.1
On Wed, Jul 23, 2025 at 02:52:12PM +0000, Bernard Metzler wrote:
> -----Original Message-----
> From: Pedro Falcato <pfalcato@suse.de>
> Sent: Wednesday, 23 July 2025 12:41
> To: Jason Gunthorpe <jgg@ziepe.ca>; Bernard Metzler <BMT@zurich.ibm.com>; Leon Romanovsky <leon@kernel.org>; Vlastimil Babka <vbabka@suse.cz>
> Cc: Jakub Kicinski <kuba@kernel.org>; David Howells <dhowells@redhat.com>; Tom Talpey <tom@talpey.com>; linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org; Pedro Falcato <pfalcato@suse.de>; stable@vger.kernel.org; kernel test robot <oliver.sang@intel.com>
[snip]
> > drivers/infiniband/sw/siw/siw_qp_tx.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
> > index 3a08f57d2211..9576a2b766c4 100644
> > --- a/drivers/infiniband/sw/siw/siw_qp_tx.c
> > +++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
> > @@ -340,11 +340,11 @@ static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
> >  		if (!sendpage_ok(page[i]))
> >  			msg.msg_flags &= ~MSG_SPLICE_PAGES;
> >  		bvec_set_page(&bvec, page[i], bytes, offset);
> > -		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
> > +		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);
> > 
> >  try_page_again:
> >  		lock_sock(sk);
> > -		rv = tcp_sendmsg_locked(sk, &msg, size);
> > +		rv = tcp_sendmsg_locked(sk, &msg, bytes);
> >  		release_sock(sk);
> Pedro, many thanks for catching this! I completely missed it during my
> too-sloppy review of that patch. It's a serious bug which must be fixed
> asap. BUT, looking closer, I do not see the offset being taken into
> account when retrying the current segment. So resend attempts seem to
> send old data that is already out. Shouldn't the try_page_again: label
> be above bvec_set_page()??
This was raised off-list by Vlastimil - I think it's harmless to bump (but not use) the offset here, because by reusing the iov_iter we progressively consume the data (it keeps its own size and offset tracking internally). So the only thing we need to track is the size we pass to tcp_sendmsg_locked[1].
If desired (and if my logic is correct!) I can send a v2 deleting that bit.
[1] Assuming tcp_sendmsg_locked guarantees it will never consume something out of the iov_iter without reporting it as bytes copied, which from a code reading it seems like it won't...
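In other words, the model I'm relying on is roughly this (a minimal sketch of my mental model, not the actual driver code):

        /* set up once: a single bvec segment of 'bytes' bytes */
        bvec_set_page(&bvec, page[i], bytes, offset);
        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);

        while (bytes > 0) {
                rv = tcp_sendmsg_locked(sk, &msg, bytes);
                if (rv <= 0)
                        break;
                /* msg.msg_iter has already been advanced by rv internally,
                 * so only the remaining byte count needs tracking here */
                bytes -= rv;
        }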
-----Original Message-----
From: Pedro Falcato <pfalcato@suse.de>
Sent: Wednesday, 23 July 2025 17:49
To: Bernard Metzler <BMT@zurich.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>; Leon Romanovsky <leon@kernel.org>; Vlastimil Babka <vbabka@suse.cz>; Jakub Kicinski <kuba@kernel.org>; David Howells <dhowells@redhat.com>; Tom Talpey <tom@talpey.com>; linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org; stable@vger.kernel.org; kernel test robot <oliver.sang@intel.com>
Subject: [EXTERNAL] Re: [PATCH] RDMA/siw: Fix the sendmsg byte count in siw_tcp_sendpages
> On Wed, Jul 23, 2025 at 02:52:12PM +0000, Bernard Metzler wrote:
> > -----Original Message-----
> > From: Pedro Falcato <pfalcato@suse.de>
> > Sent: Wednesday, 23 July 2025 12:41
> > To: Jason Gunthorpe <jgg@ziepe.ca>; Bernard Metzler <BMT@zurich.ibm.com>; Leon Romanovsky <leon@kernel.org>; Vlastimil Babka <vbabka@suse.cz>
> > Cc: Jakub Kicinski <kuba@kernel.org>; David Howells <dhowells@redhat.com>; Tom Talpey <tom@talpey.com>; linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org; Pedro Falcato <pfalcato@suse.de>; stable@vger.kernel.org; kernel test robot <oliver.sang@intel.com>
> [snip]
> > > drivers/infiniband/sw/siw/siw_qp_tx.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
> > > index 3a08f57d2211..9576a2b766c4 100644
> > > --- a/drivers/infiniband/sw/siw/siw_qp_tx.c
> > > +++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
> > > @@ -340,11 +340,11 @@ static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
> > >  		if (!sendpage_ok(page[i]))
> > >  			msg.msg_flags &= ~MSG_SPLICE_PAGES;
> > >  		bvec_set_page(&bvec, page[i], bytes, offset);
> > > -		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
> > > +		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);
> > > 
> > >  try_page_again:
> > >  		lock_sock(sk);
> > > -		rv = tcp_sendmsg_locked(sk, &msg, size);
> > > +		rv = tcp_sendmsg_locked(sk, &msg, bytes);
> > >  		release_sock(sk);
> > Pedro, many thanks for catching this! I completely missed it during my
> > too-sloppy review of that patch. It's a serious bug which must be fixed
> > asap. BUT, looking closer, I do not see the offset being taken into
> > account when retrying the current segment. So resend attempts seem to
> > send old data that is already out. Shouldn't the try_page_again: label
> > be above bvec_set_page()??
> This was raised off-list by Vlastimil - I think it's harmless to bump
> (but not use) the offset here, because by reusing the iov_iter we
> progressively consume the data (it keeps its own size and offset
> tracking internally). So the only thing we need to track is the size we
> pass to tcp_sendmsg_locked[1].
Ah okay, I didn't know that. Are we sure? I am currently travelling and have only limited possibilities to try things out. I just looked up other use cases and found one in net/tls/tls_main.c#L197. There the loop looks very similar, but it works as I was suggesting (taking the offset into account and re-initializing a new bvec in case of a partial send).
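From memory, the shape of that loop is something like this (paraphrased, not the verbatim tls code; pg/offset/size stand in for the sg fields used there):

retry:
        bvec_set_page(&bvec, pg, size, offset);
        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);

        ret = tcp_sendmsg_locked(sk, &msg, size);
        if (ret > 0 && ret < size) {
                offset += ret;          /* skip the bytes already out */
                size -= ret;
                goto retry;             /* rebuild bvec and iter from scratch */
        }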
> If desired (and if my logic is correct!) I can send a v2 deleting that bit.
So yes, if that's all safe, please. We shall not have dead code.
Thanks! Bernard.
> [1] Assuming tcp_sendmsg_locked guarantees it will never consume
> something out of the iov_iter without reporting it as bytes copied,
> which from a code reading it seems like it won't...
> --
> Pedro
On Wed, Jul 23, 2025 at 04:49:30PM +0000, Bernard Metzler wrote:
> -----Original Message-----
> From: Pedro Falcato <pfalcato@suse.de>
> Sent: Wednesday, 23 July 2025 17:49
> To: Bernard Metzler <BMT@zurich.ibm.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>; Leon Romanovsky <leon@kernel.org>; Vlastimil Babka <vbabka@suse.cz>; Jakub Kicinski <kuba@kernel.org>; David Howells <dhowells@redhat.com>; Tom Talpey <tom@talpey.com>; linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org; stable@vger.kernel.org; kernel test robot <oliver.sang@intel.com>
> Subject: [EXTERNAL] Re: [PATCH] RDMA/siw: Fix the sendmsg byte count in siw_tcp_sendpages
> > On Wed, Jul 23, 2025 at 02:52:12PM +0000, Bernard Metzler wrote:
> > > -----Original Message-----
> > > From: Pedro Falcato <pfalcato@suse.de>
> > > Sent: Wednesday, 23 July 2025 12:41
> > > To: Jason Gunthorpe <jgg@ziepe.ca>; Bernard Metzler <BMT@zurich.ibm.com>; Leon Romanovsky <leon@kernel.org>; Vlastimil Babka <vbabka@suse.cz>
> > > Cc: Jakub Kicinski <kuba@kernel.org>; David Howells <dhowells@redhat.com>; Tom Talpey <tom@talpey.com>; linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org; Pedro Falcato <pfalcato@suse.de>; stable@vger.kernel.org; kernel test robot <oliver.sang@intel.com>
> > [snip]
> > > > drivers/infiniband/sw/siw/siw_qp_tx.c | 4 ++--
> > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
> > > > index 3a08f57d2211..9576a2b766c4 100644
> > > > --- a/drivers/infiniband/sw/siw/siw_qp_tx.c
> > > > +++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
> > > > @@ -340,11 +340,11 @@ static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
> > > >  		if (!sendpage_ok(page[i]))
> > > >  			msg.msg_flags &= ~MSG_SPLICE_PAGES;
> > > >  		bvec_set_page(&bvec, page[i], bytes, offset);
> > > > -		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
> > > > +		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);
> > > > 
> > > >  try_page_again:
> > > >  		lock_sock(sk);
> > > > -		rv = tcp_sendmsg_locked(sk, &msg, size);
> > > > +		rv = tcp_sendmsg_locked(sk, &msg, bytes);
> > > >  		release_sock(sk);
> > > Pedro, many thanks for catching this! I completely missed it during my
> > > too-sloppy review of that patch. It's a serious bug which must be fixed
> > > asap. BUT, looking closer, I do not see the offset being taken into
> > > account when retrying the current segment. So resend attempts seem to
> > > send old data that is already out. Shouldn't the try_page_again: label
> > > be above bvec_set_page()??
> > This was raised off-list by Vlastimil - I think it's harmless to bump
> > (but not use) the offset here, because by reusing the iov_iter we
> > progressively consume the data (it keeps its own size and offset
> > tracking internally). So the only thing we need to track is the size we
> > pass to tcp_sendmsg_locked[1].
Hi,
Sorry for the delay.
> Ah okay, I didn't know that. Are we sure? I am currently travelling and
> have only limited possibilities to try things out. I just looked up other
I'm not 100% sure, and if some more authoritative voice (David, or Jakub for the net side) could confirm my analysis, it would be great.
> use cases and found one in net/tls/tls_main.c#L197. There the loop looks
> very similar, but it works as I was suggesting (taking the offset into
> account and re-initializing a new bvec in case of a partial send).
> > If desired (and if my logic is correct!) I can send a v2 deleting that bit.
> So yes, if that's all safe, please. We shall not have dead code.
Understood. I'll send a v2 resetting the bvec and iov_iter if we get no further feedback in the meanwhile.
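Roughly what I have in mind for v2 (untested sketch): move the label above the bvec/iter setup, so every retry rebuilds both from the current offset:

try_page_again:
        bvec_set_page(&bvec, page[i], bytes, offset);
        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);

        lock_sock(sk);
        rv = tcp_sendmsg_locked(sk, &msg, bytes);
        release_sock(sk);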
> Thanks!
> Bernard.
> > [1] Assuming tcp_sendmsg_locked guarantees it will never consume
> > something out of the iov_iter without reporting it as bytes copied,
> > which from a code reading it seems like it won't...
--
Pedro
-----Original Message-----
From: Pedro Falcato <pfalcato@suse.de>
Sent: Friday, 25 July 2025 18:10
To: Bernard Metzler <BMT@zurich.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>; Leon Romanovsky <leon@kernel.org>; Vlastimil Babka <vbabka@suse.cz>; Jakub Kicinski <kuba@kernel.org>; David Howells <dhowells@redhat.com>; Tom Talpey <tom@talpey.com>; linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org; stable@vger.kernel.org; kernel test robot <oliver.sang@intel.com>
Subject: [EXTERNAL] Re: [PATCH] RDMA/siw: Fix the sendmsg byte count in siw_tcp_sendpages
> On Wed, Jul 23, 2025 at 04:49:30PM +0000, Bernard Metzler wrote:
> > -----Original Message-----
> > From: Pedro Falcato <pfalcato@suse.de>
> > Sent: Wednesday, 23 July 2025 17:49
> > To: Bernard Metzler <BMT@zurich.ibm.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>; Leon Romanovsky <leon@kernel.org>; Vlastimil Babka <vbabka@suse.cz>; Jakub Kicinski <kuba@kernel.org>; David Howells <dhowells@redhat.com>; Tom Talpey <tom@talpey.com>; linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org; stable@vger.kernel.org; kernel test robot <oliver.sang@intel.com>
> > Subject: [EXTERNAL] Re: [PATCH] RDMA/siw: Fix the sendmsg byte count in siw_tcp_sendpages
> > > On Wed, Jul 23, 2025 at 02:52:12PM +0000, Bernard Metzler wrote:
> > > > -----Original Message-----
> > > > From: Pedro Falcato <pfalcato@suse.de>
> > > > Sent: Wednesday, 23 July 2025 12:41
> > > > To: Jason Gunthorpe <jgg@ziepe.ca>; Bernard Metzler <BMT@zurich.ibm.com>; Leon Romanovsky <leon@kernel.org>; Vlastimil Babka <vbabka@suse.cz>
> > > > Cc: Jakub Kicinski <kuba@kernel.org>; David Howells <dhowells@redhat.com>; Tom Talpey <tom@talpey.com>; linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org; Pedro Falcato <pfalcato@suse.de>; stable@vger.kernel.org; kernel test robot <oliver.sang@intel.com>
> > > [snip]
> > > > > drivers/infiniband/sw/siw/siw_qp_tx.c | 4 ++--
> > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
> > > > > index 3a08f57d2211..9576a2b766c4 100644
> > > > > --- a/drivers/infiniband/sw/siw/siw_qp_tx.c
> > > > > +++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
> > > > > @@ -340,11 +340,11 @@ static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
> > > > >  		if (!sendpage_ok(page[i]))
> > > > >  			msg.msg_flags &= ~MSG_SPLICE_PAGES;
> > > > >  		bvec_set_page(&bvec, page[i], bytes, offset);
> > > > > -		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
> > > > > +		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);
> > > > > 
> > > > >  try_page_again:
> > > > >  		lock_sock(sk);
> > > > > -		rv = tcp_sendmsg_locked(sk, &msg, size);
> > > > > +		rv = tcp_sendmsg_locked(sk, &msg, bytes);
> > > > >  		release_sock(sk);
> > > > Pedro, many thanks for catching this! I completely missed it during my
> > > > too-sloppy review of that patch. It's a serious bug which must be fixed
> > > > asap. BUT, looking closer, I do not see the offset being taken into
> > > > account when retrying the current segment. So resend attempts seem to
> > > > send old data that is already out. Shouldn't the try_page_again: label
> > > > be above bvec_set_page()??
> > > This was raised off-list by Vlastimil - I think it's harmless to bump
> > > (but not use) the offset here, because by reusing the iov_iter we
> > > progressively consume the data (it keeps its own size and offset
> > > tracking internally). So the only thing we need to track is the size we
> > > pass to tcp_sendmsg_locked[1].
> Hi,
> Sorry for the delay.
> > Ah okay, I didn't know that. Are we sure? I am currently travelling and
> > have only limited possibilities to try things out. I just looked up other
> I'm not 100% sure, and if some more authoritative voice (David, or Jakub
> for the net side) could confirm my analysis, it would be great.
> > use cases and found one in net/tls/tls_main.c#L197. There the loop looks
> > very similar, but it works as I was suggesting (taking the offset into
> > account and re-initializing a new bvec in case of a partial send).
> > > If desired (and if my logic is correct!) I can send a v2 deleting that bit.
> > So yes, if that's all safe, please. We shall not have dead code.
> Understood. I'll send a v2 resetting the bvec and iov_iter if we get no
> further feedback in the meanwhile.
Let's see if we get input on it. Even if it is all safe, I think we should put in a short comment explaining why we do not re-initialize the bvec.
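Something short along these lines would do (the wording is just a suggestion):

        /*
         * tcp_sendmsg_locked() advances msg.msg_iter by however many
         * bytes it consumed, so do not rebuild the bvec or the iter
         * when retrying a partially sent segment.
         */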
Please include my new maintainer email address in further iterations, and stop sending to the old one. My IBM address goes away at the end of the month. Thank you!
Bernard.
> > Thanks!
> > Bernard.
> > > [1] Assuming tcp_sendmsg_locked guarantees it will never consume
> > > something out of the iov_iter without reporting it as bytes copied,
> > > which from a code reading it seems like it won't...
> > --
> > Pedro
> --
> Pedro