From: Shyam Prasad N sprasad@microsoft.com
Our current approach to select a channel for sending requests is this: 1. iterate all channels to find the min and max queue depth 2. if min and max are not the same, pick the channel with min depth 3. if min and max are same, round robin, as all channels are equally loaded
The problem with this approach is that there's a lag between selecting a channel and sending the request (that increases the queue depth on the channel). While these numbers will eventually catch up, there could be a skew in the channel usage, depending on the application's I/O parallelism and the server's speed of handling requests.
With sufficient parallelism, this lag can artificially increase the queue depth, thereby impacting the performance negatively.
This change will change the step 1 above to start the iteration from the last selected channel. This is to reduce the skew in channel usage even in the presence of this lag.
Fixes: ea90708d3cf3 ("cifs: use the least loaded channel for sending requests") Cc: stable@vger.kernel.org Signed-off-by: Shyam Prasad N sprasad@microsoft.com --- fs/smb/client/transport.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c index 266af17aa7d9..191783f553ce 100644 --- a/fs/smb/client/transport.c +++ b/fs/smb/client/transport.c @@ -1018,14 +1018,16 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses) uint index = 0; unsigned int min_in_flight = UINT_MAX, max_in_flight = 0; struct TCP_Server_Info *server = NULL; - int i; + int i, start, cur;
if (!ses) return NULL;
spin_lock(&ses->chan_lock); + start = atomic_inc_return(&ses->chan_seq); for (i = 0; i < ses->chan_count; i++) { - server = ses->chans[i].server; + cur = (start + i) % ses->chan_count; + server = ses->chans[cur].server; if (!server || server->terminate) continue;
@@ -1042,17 +1044,15 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses) */ if (server->in_flight < min_in_flight) { min_in_flight = server->in_flight; - index = i; + index = cur; } if (server->in_flight > max_in_flight) max_in_flight = server->in_flight; }
/* if all channels are equally loaded, fall back to round-robin */ - if (min_in_flight == max_in_flight) { - index = (uint)atomic_inc_return(&ses->chan_seq); - index %= ses->chan_count; - } + if (min_in_flight == max_in_flight) + index = (uint)start % ses->chan_count;
server = ses->chans[index].server; spin_unlock(&ses->chan_lock);
From: Shyam Prasad N sprasad@microsoft.com
cifs_reconnect can be called with a flag to mark the session as needing reconnect too. When this is done, we expect the connections of all channels to be reconnected too, which is not happening today.
Without doing this, we have seen bad things happen when primary and secondary channels are connected to different servers (in case of cloud services like Azure Files SMB).
This change would force all connections to reconnect as well, not just the sessions and tcons.
Cc: stable@vger.kernel.org Signed-off-by: Shyam Prasad N sprasad@microsoft.com --- fs/smb/client/connect.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index 6bf04d9a5491..ca1cb01c6ef8 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -377,6 +377,13 @@ static int __cifs_reconnect(struct TCP_Server_Info *server, if (!cifs_tcp_ses_needs_reconnect(server, 1)) return 0;
+ /* + * if smb session has been marked for reconnect, also reconnect all + * connections. This way, the other connections do not end up bad. + */ + if (mark_smb_session) + cifs_signal_cifsd_for_reconnect(server, mark_smb_session); + cifs_mark_tcp_ses_conns_for_reconnect(server, mark_smb_session);
cifs_abort_connection(server);
From: Shyam Prasad N sprasad@microsoft.com
When the server interface info changes (more common in clustered servers like Azure Files), the per-channel iface gets updated. However, this did not update the corresponding dstaddr. As a result these channels will still connect (or try connecting) to older addresses.
Fixes: b54034a73baf ("cifs: during reconnect, update interface if necessary") Cc: stable@vger.kernel.org Signed-off-by: Shyam Prasad N sprasad@microsoft.com --- fs/smb/client/sess.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/fs/smb/client/sess.c b/fs/smb/client/sess.c index b3fa9ee26912..8add3ba14e9f 100644 --- a/fs/smb/client/sess.c +++ b/fs/smb/client/sess.c @@ -445,6 +445,10 @@ cifs_chan_update_iface(struct cifs_ses *ses, struct TCP_Server_Info *server)
ses->chans[chan_index].iface = iface; spin_unlock(&ses->chan_lock); + + spin_lock(&server->srv_lock); + memcpy(&server->dstaddr, &iface->sockaddr, sizeof(server->dstaddr)); + spin_unlock(&server->srv_lock); }
static int
Merged all six of these multichannel fixes into cifs-2.6.git for-next pending additional testing and any review comments but these three in particular looked most obvious/safe:
b4f60a053a25 cifs: dns resolution is needed only for primary channel c1846893991f cifs: update dstaddr whenever channel iface is updated 1f396b9bfe39 cifs: reset connections for all channels when reconnect requested
I want to look more carefully especially at these three: 2f2c5d38fb9d (HEAD -> for-next, origin/for-next) cifs: do not disable interface polling on failure 4394c936623d cifs: serialize other channels when query server interfaces is pending bf75ad3631c7 cifs: deal with the channel loading lag while picking channels
On Mon, Jun 2, 2025 at 12:09 PM nspmangalore@gmail.com wrote:
From: Shyam Prasad N sprasad@microsoft.com
Our current approach to select a channel for sending requests is this:
- iterate all channels to find the min and max queue depth
- if min and max are not the same, pick the channel with min depth
- if min and max are same, round robin, as all channels are equally loaded
The problem with this approach is that there's a lag between selecting a channel and sending the request (that increases the queue depth on the channel). While these numbers will eventually catch up, there could be a skew in the channel usage, depending on the application's I/O parallelism and the server's speed of handling requests.
With sufficient parallelism, this lag can artificially increase the queue depth, thereby impacting the performance negatively.
This change will change the step 1 above to start the iteration from the last selected channel. This is to reduce the skew in channel usage even in the presence of this lag.
Fixes: ea90708d3cf3 ("cifs: use the least loaded channel for sending requests") Cc: stable@vger.kernel.org Signed-off-by: Shyam Prasad N sprasad@microsoft.com
fs/smb/client/transport.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c index 266af17aa7d9..191783f553ce 100644 --- a/fs/smb/client/transport.c +++ b/fs/smb/client/transport.c @@ -1018,14 +1018,16 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses) uint index = 0; unsigned int min_in_flight = UINT_MAX, max_in_flight = 0; struct TCP_Server_Info *server = NULL;
int i;
int i, start, cur; if (!ses) return NULL; spin_lock(&ses->chan_lock);
start = atomic_inc_return(&ses->chan_seq); for (i = 0; i < ses->chan_count; i++) {
server = ses->chans[i].server;
cur = (start + i) % ses->chan_count;
server = ses->chans[cur].server; if (!server || server->terminate) continue;
@@ -1042,17 +1044,15 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses) */ if (server->in_flight < min_in_flight) { min_in_flight = server->in_flight;
index = i;
index = cur; } if (server->in_flight > max_in_flight) max_in_flight = server->in_flight; } /* if all channels are equally loaded, fall back to round-robin */
if (min_in_flight == max_in_flight) {
index = (uint)atomic_inc_return(&ses->chan_seq);
index %= ses->chan_count;
}
if (min_in_flight == max_in_flight)
index = (uint)start % ses->chan_count; server = ses->chans[index].server; spin_unlock(&ses->chan_lock);
-- 2.43.0
linux-stable-mirror@lists.linaro.org