The host may send multiple negotiation packets (due to timeout) before the KVP user-mode daemon is connected. We need to defer processing those packets until the daemon is negotiated and connected. It's okay for the guest to respond to all negotiation packets.
In addition, the host may send multiple staged KVP requests as soon as negotiation is done. We need to properly process those packets using one tasklet for exclusive access to the ring buffer.
This patch is based on the work of Nick Meier Nick.Meier@microsoft.com.
Signed-off-by: Long Li longli@microsoft.com
Signed-off-by: K. Y. Srinivasan kys@microsoft.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
The above is the original changelog of a3ade8cc474d ("HV: properly delay KVP packets when negotiation is in progress").
Here I re-worked the original patch because the mainline version doesn't work on the linux-4.4.y branch, where channel->callback_event doesn't exist yet. In mainline, channel->callback_event was added by 631e63a9f346 ("vmbus: change to per channel tasklet"). We don't want to backport that commit to v4.4, as it requires extra supporting changes and fixes that are unnecessary for the KVP bug we're trying to resolve.
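For readers who want the control flow in isolation, here is a minimal user-space sketch of the behavior described in the changelog: packets stay queued while the daemon is not ready, the handshake timer is armed only once, and queued negotiation packets are drained in a single callback run once the daemon is ready. It is a simulation under stated assumptions, not the driver code. struct sim_channel, ring_push(), ring_pop(), sim_kvp_callback() and all counters are hypothetical names; the real callback reads the ring with vmbus_recvpacket() and arms the timer with schedule_delayed_work().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for driver state; not the kernel's definitions. */
enum daemon_state { DAEMON_NOT_READY, DAEMON_READY };
enum nego_state { NEGO_NOT_STARTED, NEGO_IN_PROGRESS, NEGO_FINISHED };
enum pkt_type { PKT_NEGOTIATE, PKT_KVP_REQUEST };

#define RING_SLOTS 8

struct sim_channel {
	enum pkt_type ring[RING_SLOTS];	/* simulated inbound ring buffer */
	size_t head;			/* index of the oldest queued packet */
	size_t count;			/* number of queued packets */
	enum daemon_state daemon;
	enum nego_state nego;
	bool busy;			/* a KVP request is with the daemon */
	int handshake_scheduled;	/* times the handshake timer was armed */
	int nego_replies;		/* negotiation packets answered */
	int kvp_handled;		/* KVP requests handed to the daemon */
};

/* Queue a packet as the host would; returns false if the ring is full. */
static bool ring_push(struct sim_channel *ch, enum pkt_type pkt)
{
	if (ch->count == RING_SLOTS)
		return false;
	ch->ring[(ch->head + ch->count) % RING_SLOTS] = pkt;
	ch->count++;
	return true;
}

/* Read the oldest packet; returns false if the ring is empty. */
static bool ring_pop(struct sim_channel *ch, enum pkt_type *pkt)
{
	if (ch->count == 0)
		return false;
	*pkt = ch->ring[ch->head];
	ch->head = (ch->head + 1) % RING_SLOTS;
	ch->count--;
	return true;
}

/* Simulated channel callback, following the shape of the patched logic. */
static void sim_kvp_callback(struct sim_channel *ch)
{
	enum pkt_type pkt;

	if (ch->daemon != DAEMON_READY) {
		/* Leave packets queued; arm the handshake timer only once. */
		if (ch->nego == NEGO_NOT_STARTED) {
			ch->nego = NEGO_IN_PROGRESS;
			ch->handshake_scheduled++;
		}
		return;
	}
	if (ch->busy)
		return;

	/* Loop form of "goto recheck": keep reading after each negotiation. */
	while (ring_pop(ch, &pkt)) {
		if (pkt == PKT_NEGOTIATE) {
			ch->nego_replies++;
			ch->nego = NEGO_FINISHED;
			continue;
		}
		/* Hand one KVP request to the daemon and stop reading. */
		ch->kvp_handled++;
		ch->busy = true;
		return;
	}
}
```

In this model, two negotiation packets that arrive before the daemon is ready arm the timer once and are both answered on the first callback run after the daemon connects. The sketch assumes one outstanding KVP request at a time, so the loop stops at the first request it finds.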
NOTE: before this patch is used, the three other related patches should be cherry-picked from the mainline first:
2d0c3b5 ("Drivers: hv: utils: Invoke the poll function after handshake")
b9830d1 ("Drivers: hv: util: Pass the channel information during the init call")
4dbfc2e ("Drivers: hv: kvp: fix IP Failover")
And, actually, it would be better if we could cherry-pick more fixes from the mainline first (the 3 patches above are also included in this 27-patch list):
01 b003596 Drivers: hv: utils: use memdup_user in hvt_op_write
02 2d0c3b5 Drivers: hv: utils: Invoke the poll function after handshake
03 1f75338 Drivers: hv: utils: fix memory leak on on_msg() failure
04 a72f3a4 Drivers: hv: utils: rename outmsg_lock
05 a150256 Drivers: hv: utils: introduce HVUTIL_TRANSPORT_DESTROY mode
06 9420098 Drivers: hv: utils: fix crash when device is removed from host side
07 77b744a Drivers: hv: utils: fix hvt_op_poll() return value on transport destroy
08 b9830d1 Drivers: hv: util: Pass the channel information during the init call
09 e66853b Drivers: hv: utils: Remove util transport handler from list if registration fails
10 4dbfc2e Drivers: hv: kvp: fix IP Failover
11 e0fa3e5 Drivers: hv: utils: fix a race on userspace daemons registration
12 497af84 Drivers: hv: utils: Continue to poll VSS channel after handling requests.
13 db886e4 Drivers: hv: utils: Check VSS daemon is listening before a hot backup
14 abeda47 Drivers: hv: utils: Rename version definitions to reflect protocol version.
15 2e338f7 Drivers: hv: utils: Use TimeSync samples to adjust the clock after boot.
16 8e1d260 Drivers: hv: utils: Support TimeSync version 4.0 protocol samples.
17 3ba1eb1 Drivers: hv: hv_util: Avoid dynamic allocation in time synch
18 3da0401b Drivers: hv: utils: Fix the mapping between host version and protocol to use
19 23d2cc0 Drivers: hv: vss: Improve log messages.
20 b357fd3 Drivers: hv: vss: Operation timeouts should match host expectation
21 1724462 hv_util: switch to using timespec64
22 a165645 Drivers: hv: vmbus: Use all supported IC versions to negotiate
23 1274a69 Drivers: hv: Log the negotiated IC versions.
24 bb6a4db Drivers: hv: util: Fix a typo
25 e9c18ae Drivers: hv: util: move waiting for release to hv_utils_transport itself
26 bdc1dd4 vmbus: fix spelling errors
27 ddce54b Drivers: hv: kvp: Use MAX_ADAPTER_ID_SIZE for translating adapter id
This is to say, we're requesting a backport of 4 patches or 28 patches. If 28 patches seem too many, we hope at least the 4 patches can be backported.
The patches can be applied cleanly to the latest v4.4 branch (currently it's v4.4.160).
The background of this backport request: Wang Jian recently reported some KVP issues (https://github.com/LIS/lis-next/issues/593). For example, the /var/lib/hyperv/.kvp_pool_* files cannot be updated, and if hv_kvp_daemon doesn't start in time, the host may be unable to query the VM's IP address via KVP.
Wang Jian tested the 4 patches and the 28 patches, and the issues can be fixed by the patches.
Reported-by: Wang Jian jianjian.wang1@gmail.com
Tested-by: Wang Jian jianjian.wang1@gmail.com
Signed-off-by: Dexuan Cui decui@microsoft.com
---
 drivers/hv/hv_kvp.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index f3d3d75ac913e..e4fbc17bbe190 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -627,21 +627,22 @@ void hv_kvp_onchannelcallback(void *context)
 		     NEGO_IN_PROGRESS,
 		     NEGO_FINISHED} host_negotiatied = NEGO_NOT_STARTED;
 
-	if (host_negotiatied == NEGO_NOT_STARTED &&
-	    kvp_transaction.state < HVUTIL_READY) {
+	if (kvp_transaction.state < HVUTIL_READY) {
 		/*
 		 * If userspace daemon is not connected and host is asking
 		 * us to negotiate we need to delay to not lose messages.
 		 * This is important for Failover IP setting.
 		 */
-		host_negotiatied = NEGO_IN_PROGRESS;
-		schedule_delayed_work(&kvp_host_handshake_work,
+		if (host_negotiatied == NEGO_NOT_STARTED) {
+			host_negotiatied = NEGO_IN_PROGRESS;
+			schedule_delayed_work(&kvp_host_handshake_work,
 				      HV_UTIL_NEGO_TIMEOUT * HZ);
+		}
 		return;
 	}
 	if (kvp_transaction.state > HVUTIL_READY)
 		return;
-
+recheck:
 	vmbus_recvpacket(channel, recv_buffer, PAGE_SIZE * 4, &recvlen,
 			 &requestid);
 
@@ -704,6 +705,8 @@ void hv_kvp_onchannelcallback(void *context)
 				       VM_PKT_DATA_INBAND, 0);
 
 		host_negotiatied = NEGO_FINISHED;
+
+		goto recheck;
 	}
 
 }
On Fri, Oct 12, 2018 at 02:52:46AM +0000, Dexuan Cui wrote:
> This is to say, we're requesting a backport of 4 patches or 28 patches. If 28 patches seem too many, we hope at least the 4 patches can be backported.
28 seems odd, there's lots of things in there that you do not need.
So 4 is good, can you send all 4 as a patch series, properly backported and tested with this patch as the last one?
> The patches can be applied cleanly to the latest v4.4 branch (currently it's v4.4.160).
But, I really want to know why people are still trying to use the 4.4 kernel right now for a "general purpose" system. They should be using 4.9 at the very least by now, 4.4 is not a good idea at all. Why can you not just move your users to 4.9 instead of a newer 4.4 kernel? It should be the exact same, right?
thanks,
greg k-h
> From: 'gregkh@linuxfoundation.org' gregkh@linuxfoundation.org
> Sent: Tuesday, October 16, 2018 06:55
>
> > ... This is to say, we're requesting a backport of 4 patches or 28 patches. If 28 patches seem too many, we hope at least the 4 patches can be backported.
>
> 28 seems odd, there's lots of things in there that you do not need.
Yes, some of the 28 patches are completely unnecessary for a "stable" kernel, but some are fixes for other known issues. Backporting only the minimal set of patches doesn't work because of merge conflicts, so I generated the 28-patch list, which can be applied cleanly in order.
> So 4 is good, can you send all 4 as a patch series, properly backported and tested with this patch as the last one?
I'm OK with only backporting the 4 patches for this particular issue reported by Wang Jian. Maybe we can backport more fixes in future if people report new KVP issues against the 4.4 kernel.
So I'm going to send all the 4 patches as a patch series. Wang Jian has tested them.
> But, I really want to know why people are still trying to use the 4.4 kernel right now for a "general purpose" system. They should be using 4.9 at the very least by now, 4.4 is not a good idea at all. Why can you not just move your users to 4.9 instead of a newer 4.4 kernel? It should be the exact same, right?
>
> greg k-h
We definitely encourage users to use newer kernels like 4.9 and 4.1x, but it looks like some users have to use their customized 4.4 kernels for reasons I don't know (believe it or not, besides Wang Jian, I have made the same private backport twice for two companies since July).
And Ubuntu 16.04.5 LTS (http://releases.ubuntu.com/16.04/), which is based on v4.4, also has the same KVP bug:
http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/tree/Makefile?h=Ubuntu...
And I did receive a bug report from an Ubuntu user last week.
Ubuntu 16.04 will reach End-of-Life in April 2021, which is still 2.5 years from now. So I hope that after the 4 patches are merged into the upstream 4.4.y branch, the Ubuntu guys will notice them and pick them up.
Thanks,
-- Dexuan
> But, I really want to know why people are still trying to use the 4.4 kernel right now for a "general purpose" system. They should be using 4.9 at the very least by now, 4.4 is not a good idea at all. Why can you not just move your users to 4.9 instead of a newer 4.4 kernel? It should be the exact same, right?
>
> greg k-h
Sorry about this. You may not believe it, but we are only now upgrading from the 3.2 kernel to 4.4, and I can do nothing about that. But certainly, we are not a completely "general purpose" Linux.
linux-stable-mirror@lists.linaro.org