From: Barry Song <v-songbaohua(a)oppo.com>
Since commit 2282679fb20b ("mm: submit multipage write for SWP_FS_OPS
swap-space"), we can plug multiple pages then unplug them all together.
That means iov_iter_count(iter) could be way bigger than PAGE_SIZE, it
actually equals the size of iov_iter_npages(iter, INT_MAX).
Note this issue has nothing to do with large folios as we don't support
THP_SWPOUT to non-block devices.
Fixes: 2282679fb20b ("mm: submit multipage write for SWP_FS_OPS swap-space")
Reported-by: Christoph Hellwig <hch(a)lst.de>
Closes: https://lore.kernel.org/linux-mm/20240614100329.1203579-1-hch@lst.de/
Cc: NeilBrown <neilb(a)suse.de>
Cc: Anna Schumaker <anna(a)kernel.org>
Cc: Steve French <sfrench(a)samba.org>
Cc: Trond Myklebust <trondmy(a)kernel.org>
Cc: Chuanhua Han <hanchuanhua(a)oppo.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Jeff Layton <jlayton(a)kernel.org>
Cc: Paulo Alcantara <pc(a)manguebit.com>
Cc: Ronnie Sahlberg <ronniesahlberg(a)gmail.com>
Cc: Shyam Prasad N <sprasad(a)microsoft.com>
Cc: Tom Talpey <tom(a)talpey.com>
Cc: Bharath SM <bharathsm(a)microsoft.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Barry Song <v-songbaohua(a)oppo.com>
---
-v2:
* drop the assertion instead of fixing the assertion.
per the comments of Willy, Christoph in nfs thread.
fs/smb/client/file.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 9d5c2440abfc..1e269e0bc75b 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -3200,8 +3200,6 @@ static int cifs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
{
ssize_t ret;
- WARN_ON_ONCE(iov_iter_count(iter) != PAGE_SIZE);
-
if (iov_iter_rw(iter) == READ)
ret = netfs_unbuffered_read_iter_locked(iocb, iter);
else
--
2.34.1
Some applications were reporting ETIMEDOUT errors on apparently
good looking flows, according to packet dumps.
We were able to root cause the issue to an accidental setting
of tp->retrans_stamp in the following scenario:
- client sends TFO SYN with data.
- server has TFO disabled, ACKs only SYN but not payload.
- client receives SYNACK covering only SYN.
- tcp_ack() eats SYN and sets tp->retrans_stamp to 0.
- tcp_rcv_fastopen_synack() calls tcp_xmit_retransmit_queue()
to retransmit TFO payload w/o SYN, sets tp->retrans_stamp to "now",
but we are not in any loss recovery state.
- TFO payload is ACKed.
- we are not in any loss recovery state, and don't see any dupacks,
so we don't get to any code path that clears tp->retrans_stamp.
- tp->retrans_stamp stays non-zero for the lifetime of the connection.
- after first RTO, tcp_clamp_rto_to_user_timeout() clamps second RTO
to 1 jiffy due to bogus tp->retrans_stamp.
- on clamped RTO with non-zero icsk_retransmits, retransmits_timed_out()
sets start_ts from tp->retrans_stamp from TFO payload retransmit
hours/days ago, and computes bogus long elapsed time for loss recovery,
and suffers ETIMEDOUT early.
Fixes: a7abf3cd76e1 ("tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()")
CC: stable(a)vger.kernel.org
Co-developed-by: Neal Cardwell <ncardwell(a)google.com>
Signed-off-by: Neal Cardwell <ncardwell(a)google.com>
Co-developed-by: Yuchung Cheng <ycheng(a)google.com>
Signed-off-by: Yuchung Cheng <ycheng(a)google.com>
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
---
net/ipv4/tcp_input.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9c04a9c8be9dfaa0ec2437b3748284e57588b216..01d208e0eef31fd87c7faaf5a3d10b8f52e99ee0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6296,6 +6296,7 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack,
skb_rbtree_walk_from(data)
tcp_mark_skb_lost(sk, data);
tcp_xmit_retransmit_queue(sk);
+ tp->retrans_stamp = 0;
NET_INC_STATS(sock_net(sk),
LINUX_MIB_TCPFASTOPENACTIVEFAIL);
return true;
--
2.45.2.627.g7a2c4fd464-goog
From: Vineeth Pillai <viremana(a)linux.microsoft.com>
commit b46b4a8a57c377b72a98c7930a9f6969d2d4784e
There could be instances where a system stall prevents the timesync
packets to be consumed. And this might lead to more than one packet
pending in the ring buffer. Current code empties one packet per callback
and it might be a stale one. So drain all the packets from ring buffer
on each callback.
Signed-off-by: Vineeth Pillai <viremana(a)linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Link: https://lore.kernel.org/r/20200821152849.99517-1-viremana@linux.microsoft.c…
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
The upstream commit uses HV_HYP_PAGE_SIZE, which is not defined in 4.19.y.
Fixed this manually for 4.19.y by using PAGE_SIZE instead.
If there are multiple messages in the host-to-guest ringbuffer of the TimeSync
device, 4.19.y only handles 1 message, and later the host puts new messages
into the ringbuffer without signaling the guest because the ringbuffer is not
empty, causing a "hung" ringbuffer. Backported the mainline fix for this issue.
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
---
drivers/hv/hv_util.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
index 2003314dcfbe..4a131efe54ef 100644
--- a/drivers/hv/hv_util.c
+++ b/drivers/hv/hv_util.c
@@ -294,10 +294,23 @@ static void timesync_onchannelcallback(void *context)
struct ictimesync_ref_data *refdata;
u8 *time_txf_buf = util_timesynch.recv_buffer;
- vmbus_recvpacket(channel, time_txf_buf,
- PAGE_SIZE, &recvlen, &requestid);
+ /*
+ * Drain the ring buffer and use the last packet to update
+ * host_ts
+ */
+ while (1) {
+ int ret = vmbus_recvpacket(channel, time_txf_buf,
+ PAGE_SIZE, &recvlen,
+ &requestid);
+ if (ret) {
+ pr_warn_once("TimeSync IC pkt recv failed (Err: %d)\n",
+ ret);
+ break;
+ }
+
+ if (!recvlen)
+ break;
- if (recvlen > 0) {
icmsghdrp = (struct icmsg_hdr *)&time_txf_buf[
sizeof(struct vmbuspipe_hdr)];
--
2.25.1
From: Barry Song <v-songbaohua(a)oppo.com>
Since commit 2282679fb20b ("mm: submit multipage write for SWP_FS_OPS
swap-space"), we can plug multiple pages then unplug them all together.
That means iov_iter_count(iter) could be way bigger than PAGE_SIZE, it
actually equals the size of iov_iter_npages(iter, INT_MAX).
Note this issue has nothing to do with large folios as we don't support
THP_SWPOUT to non-block devices.
Fixes: 2282679fb20b ("mm: submit multipage write for SWP_FS_OPS swap-space")
Reported-by: Christoph Hellwig <hch(a)lst.de>
Closes: https://lore.kernel.org/linux-mm/20240617053201.GA16852@lst.de/
Cc: NeilBrown <neilb(a)suse.de>
Cc: Anna Schumaker <anna(a)kernel.org>
Cc: Steve French <sfrench(a)samba.org>
Cc: Trond Myklebust <trondmy(a)kernel.org>
Cc: Chuanhua Han <hanchuanhua(a)oppo.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Jeff Layton <jlayton(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Barry Song <v-songbaohua(a)oppo.com>
---
fs/nfs/direct.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index bb2f583eb28b..a1bfa86f467a 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -141,7 +141,7 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
{
ssize_t ret;
- VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE);
+ VM_WARN_ON(iov_iter_count(iter) != iov_iter_npages(iter, INT_MAX) * PAGE_SIZE);
if (iov_iter_rw(iter) == READ)
ret = nfs_file_direct_read(iocb, iter, true);
--
2.34.1
From: Barry Song <v-songbaohua(a)oppo.com>
Since commit 2282679fb20b ("mm: submit multipage write for SWP_FS_OPS
swap-space"), we can plug multiple pages then unplug them all together.
That means iov_iter_count(iter) could be way bigger than PAGE_SIZE, it
actually equals the size of iov_iter_npages(iter, INT_MAX).
Note this issue has nothing to do with large folios as we don't support
THP_SWPOUT to non-block devices.
Fixes: 2282679fb20b ("mm: submit multipage write for SWP_FS_OPS swap-space")
Reported-by: Christoph Hellwig <hch(a)lst.de>
Closes: https://lore.kernel.org/linux-mm/20240614100329.1203579-1-hch@lst.de/
Cc: NeilBrown <neilb(a)suse.de>
Cc: Anna Schumaker <anna(a)kernel.org>
Cc: Steve French <sfrench(a)samba.org>
Cc: Trond Myklebust <trondmy(a)kernel.org>
Cc: Chuanhua Han <hanchuanhua(a)oppo.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Jeff Layton <jlayton(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Barry Song <v-songbaohua(a)oppo.com>
---
fs/smb/client/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 9d5c2440abfc..2f11f138c57d 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -3200,7 +3200,7 @@ static int cifs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
{
ssize_t ret;
- WARN_ON_ONCE(iov_iter_count(iter) != PAGE_SIZE);
+ WARN_ON_ONCE(iov_iter_count(iter) != iov_iter_npages(iter, INT_MAX) * PAGE_SIZE);
if (iov_iter_rw(iter) == READ)
ret = netfs_unbuffered_read_iter_locked(iocb, iter);
--
2.34.1
The patch titled
Subject: selftests/mm:fix test_prctl_fork_exec return failure
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-fix-test_prctl_fork_exec-return-failure.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: aigourensheng <shechenglong001(a)gmail.com>
Subject: selftests/mm:fix test_prctl_fork_exec return failure
Date: Mon, 17 Jun 2024 01:29:34 -0400
After calling fork() in test_prctl_fork_exec(), the global variable
ksm_full_scans_fd is initialized to 0 in the child process upon entering
the main function of ./ksm_functional_tests.
In the function call chain test_child_ksm() -> __mmap_and_merge_range ->
ksm_merge-> ksm_get_full_scans, start_scans = ksm_get_full_scans() will
return an error. Therefore, the value of ksm_full_scans_fd needs to be
initialized before calling test_child_ksm in the child process.
Link: https://lkml.kernel.org/r/20240617052934.5834-1-shechenglong001@gmail.com
Signed-off-by: aigourensheng <shechenglong001(a)gmail.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/ksm_functional_tests.c | 38 ++++++------
1 file changed, 22 insertions(+), 16 deletions(-)
--- a/tools/testing/selftests/mm/ksm_functional_tests.c~selftests-mm-fix-test_prctl_fork_exec-return-failure
+++ a/tools/testing/selftests/mm/ksm_functional_tests.c
@@ -656,12 +656,33 @@ unmap:
munmap(map, size);
}
+static void init_global_file_handles(void)
+{
+ mem_fd = open("/proc/self/mem", O_RDWR);
+ if (mem_fd < 0)
+ ksft_exit_fail_msg("opening /proc/self/mem failed\n");
+ ksm_fd = open("/sys/kernel/mm/ksm/run", O_RDWR);
+ if (ksm_fd < 0)
+ ksft_exit_skip("open(\"/sys/kernel/mm/ksm/run\") failed\n");
+ ksm_full_scans_fd = open("/sys/kernel/mm/ksm/full_scans", O_RDONLY);
+ if (ksm_full_scans_fd < 0)
+ ksft_exit_skip("open(\"/sys/kernel/mm/ksm/full_scans\") failed\n");
+ pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
+ if (pagemap_fd < 0)
+ ksft_exit_skip("open(\"/proc/self/pagemap\") failed\n");
+ proc_self_ksm_stat_fd = open("/proc/self/ksm_stat", O_RDONLY);
+ proc_self_ksm_merging_pages_fd = open("/proc/self/ksm_merging_pages",
+ O_RDONLY);
+ ksm_use_zero_pages_fd = open("/sys/kernel/mm/ksm/use_zero_pages", O_RDWR);
+}
+
int main(int argc, char **argv)
{
unsigned int tests = 8;
int err;
if (argc > 1 && !strcmp(argv[1], FORK_EXEC_CHILD_PRG_NAME)) {
+ init_global_file_handles();
exit(test_child_ksm());
}
@@ -674,22 +695,7 @@ int main(int argc, char **argv)
pagesize = getpagesize();
- mem_fd = open("/proc/self/mem", O_RDWR);
- if (mem_fd < 0)
- ksft_exit_fail_msg("opening /proc/self/mem failed\n");
- ksm_fd = open("/sys/kernel/mm/ksm/run", O_RDWR);
- if (ksm_fd < 0)
- ksft_exit_skip("open(\"/sys/kernel/mm/ksm/run\") failed\n");
- ksm_full_scans_fd = open("/sys/kernel/mm/ksm/full_scans", O_RDONLY);
- if (ksm_full_scans_fd < 0)
- ksft_exit_skip("open(\"/sys/kernel/mm/ksm/full_scans\") failed\n");
- pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
- if (pagemap_fd < 0)
- ksft_exit_skip("open(\"/proc/self/pagemap\") failed\n");
- proc_self_ksm_stat_fd = open("/proc/self/ksm_stat", O_RDONLY);
- proc_self_ksm_merging_pages_fd = open("/proc/self/ksm_merging_pages",
- O_RDONLY);
- ksm_use_zero_pages_fd = open("/sys/kernel/mm/ksm/use_zero_pages", O_RDWR);
+ init_global_file_handles();
test_unmerge();
test_unmerge_zero_pages();
_
Patches currently in -mm which might be from shechenglong001(a)gmail.com are
selftests-mm-fix-test_prctl_fork_exec-return-failure.patch
The patch titled
Subject: ocfs2: fix DIO failure due to insufficient transaction credits
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-fix-dio-failure-due-to-insufficient-transaction-credits.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jan Kara <jack(a)suse.cz>
Subject: ocfs2: fix DIO failure due to insufficient transaction credits
Date: Fri, 14 Jun 2024 16:52:43 +0200
The code in ocfs2_dio_end_io_write() estimates number of necessary
transaction credits using ocfs2_calc_extend_credits(). This however does
not take into account that the IO could be arbitrarily large and can
contain arbitrary number of extents.
Extent tree manipulations do often extend the current transaction but not
in all of the cases. For example if we have only single block extents in
the tree, ocfs2_mark_extent_written() will end up calling
ocfs2_replace_extent_rec() all the time and we will never extend the
current transaction and eventually exhaust all the transaction credits if
the IO contains many single block extents. Once that happens a
WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in
jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to
this error. This was actually triggered by one of our customers on a
heavily fragmented OCFS2 filesystem.
To fix the issue make sure the transaction always has enough credits for
one extent insert before each call of ocfs2_mark_extent_written().
Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz
Fixes: c15471f79506 ("ocfs2: fix sparse file & data ordering issue in direct io")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao(a)suse.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/aops.c | 5 +++++
fs/ocfs2/journal.c | 17 +++++++++++++++++
fs/ocfs2/journal.h | 2 ++
fs/ocfs2/ocfs2_trace.h | 2 ++
4 files changed, 26 insertions(+)
--- a/fs/ocfs2/aops.c~ocfs2-fix-dio-failure-due-to-insufficient-transaction-credits
+++ a/fs/ocfs2/aops.c
@@ -2366,6 +2366,11 @@ static int ocfs2_dio_end_io_write(struct
}
list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
+ ret = ocfs2_assure_trans_credits(handle, credits);
+ if (ret < 0) {
+ mlog_errno(ret);
+ break;
+ }
ret = ocfs2_mark_extent_written(inode, &et, handle,
ue->ue_cpos, 1,
ue->ue_phys,
--- a/fs/ocfs2/journal.c~ocfs2-fix-dio-failure-due-to-insufficient-transaction-credits
+++ a/fs/ocfs2/journal.c
@@ -446,6 +446,23 @@ bail:
}
/*
+ * Make sure handle has at least 'nblocks' credits available. If it does not
+ * have that many credits available, we will try to extend the handle to have
+ * enough credits. If that fails, we will restart transaction to have enough
+ * credits. Similar notes regarding data consistency and locking implications
+ * as for ocfs2_extend_trans() apply here.
+ */
+int ocfs2_assure_trans_credits(handle_t *handle, int nblocks)
+{
+ int old_nblks = jbd2_handle_buffer_credits(handle);
+
+ trace_ocfs2_assure_trans_credits(old_nblks);
+ if (old_nblks >= nblocks)
+ return 0;
+ return ocfs2_extend_trans(handle, nblocks - old_nblks);
+}
+
+/*
* If we have fewer than thresh credits, extend by OCFS2_MAX_TRANS_DATA.
* If that fails, restart the transaction & regain write access for the
* buffer head which is used for metadata modifications.
--- a/fs/ocfs2/journal.h~ocfs2-fix-dio-failure-due-to-insufficient-transaction-credits
+++ a/fs/ocfs2/journal.h
@@ -243,6 +243,8 @@ handle_t *ocfs2_start_trans(struct
int ocfs2_commit_trans(struct ocfs2_super *osb,
handle_t *handle);
int ocfs2_extend_trans(handle_t *handle, int nblocks);
+int ocfs2_assure_trans_credits(handle_t *handle,
+ int nblocks);
int ocfs2_allocate_extend_trans(handle_t *handle,
int thresh);
--- a/fs/ocfs2/ocfs2_trace.h~ocfs2-fix-dio-failure-due-to-insufficient-transaction-credits
+++ a/fs/ocfs2/ocfs2_trace.h
@@ -2577,6 +2577,8 @@ DEFINE_OCFS2_ULL_UINT_EVENT(ocfs2_commit
DEFINE_OCFS2_INT_INT_EVENT(ocfs2_extend_trans);
+DEFINE_OCFS2_INT_EVENT(ocfs2_assure_trans_credits);
+
DEFINE_OCFS2_INT_EVENT(ocfs2_extend_trans_restart);
DEFINE_OCFS2_INT_INT_EVENT(ocfs2_allocate_extend_trans);
_
Patches currently in -mm which might be from jack(a)suse.cz are
ocfs2-fix-dio-failure-due-to-insufficient-transaction-credits.patch