After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during boot because scsi_add_host_with_dma() sets shost->cmd_per_lun to a negative number: 'max_outstanding_req_per_channel' is 352, 'max_sub_channels' is (416 - 1) / 4 = 103, so in storvsc_probe(), scsi_driver.can_queue = 352 * (103 + 1) * (100 - 10) / 100 = 32947, which is bigger than SHRT_MAX (i.e. 32767).
Fix the hang issue by capping scsi_driver.can_queue.
Add the below Fixed tag though ea2f0f77538c itself is good.
Fixes: ea2f0f77538c ("scsi: core: Cap scsi_host cmd_per_lun at can_queue") Cc: stable@vger.kernel.org Signed-off-by: Dexuan Cui decui@microsoft.com --- drivers/scsi/storvsc_drv.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index ebbbc1299c62..ba374908aec2 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -1976,6 +1976,16 @@ static int storvsc_probe(struct hv_device *device, (max_sub_channels + 1) * (100 - ring_avail_percent_lowater) / 100;
+ /* + * v5.14 (see commit ea2f0f77538c) implicitly requires that + * scsi_driver.can_queue should not exceed SHRT_MAX, otherwise + * scsi_add_host_with_dma() sets shost->cmd_per_lun to a negative + * number (note: the type of the "cmd_per_lun" field is "short"), and + * the system may hang during early boot. + */ + if (scsi_driver.can_queue > SHRT_MAX) + scsi_driver.can_queue = SHRT_MAX; + host = scsi_host_alloc(&scsi_driver, sizeof(struct hv_host_device)); if (!host)
On Wed, Oct 06, 2021 at 12:03:45AM -0700, Dexuan Cui wrote:
After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during boot because scsi_add_host_with_dma() sets shost->cmd_per_lun to a negative number: 'max_outstanding_req_per_channel' is 352, 'max_sub_channels' is (416 - 1) / 4 = 103, so in storvsc_probe(), scsi_driver.can_queue = 352 * (103 + 1) * (100 - 10) / 100 = 32947, which is bigger than SHRT_MAX (i.e. 32767).
Fix the hang issue by capping scsi_driver.can_queue.
Add the below Fixed tag though ea2f0f77538c itself is good.
Fixes: ea2f0f77538c ("scsi: core: Cap scsi_host cmd_per_lun at can_queue") Cc: stable@vger.kernel.org Signed-off-by: Dexuan Cui decui@microsoft.com
drivers/scsi/storvsc_drv.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index ebbbc1299c62..ba374908aec2 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -1976,6 +1976,16 @@ static int storvsc_probe(struct hv_device *device, (max_sub_channels + 1) * (100 - ring_avail_percent_lowater) / 100;
- /*
* v5.14 (see commit ea2f0f77538c) implicitly requires that
No need to put a version number in a comment, they do not track well when people backport changes all over the place in other kernel trees.
If you want to refer to a commit, please do so in the documented way.
For this case, that would be: ea2f0f77538c ("scsi: core: Cap scsi_host cmd_per_lun at can_queue")
thanks,
greg k-h
On 06/10/2021 08:03, Dexuan Cui wrote:
After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during boot because scsi_add_host_with_dma() sets shost->cmd_per_lun to a negative number: 'max_outstanding_req_per_channel' is 352, 'max_sub_channels' is (416 - 1) / 4 = 103, so in storvsc_probe(), scsi_driver.can_queue = 352 * (103 + 1) * (100 - 10) / 100 = 32947, which is bigger than SHRT_MAX (i.e. 32767).
Out of curiosity, are these values realistic? You're capping can_queue just because of a data size issue, so, if these values are realistic, seems a weak reason.
Fix the hang issue by capping scsi_driver.can_queue.
Add the below Fixed tag though ea2f0f77538c itself is good.
Fixes: ea2f0f77538c ("scsi: core: Cap scsi_host cmd_per_lun at can_queue") Cc: stable@vger.kernel.org Signed-off-by: Dexuan Cui decui@microsoft.com
drivers/scsi/storvsc_drv.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index ebbbc1299c62..ba374908aec2 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -1976,6 +1976,16 @@ static int storvsc_probe(struct hv_device *device, (max_sub_channels + 1) * (100 - ring_avail_percent_lowater) / 100;
- /*
* v5.14 (see commit ea2f0f77538c) implicitly requires that
* scsi_driver.can_queue should not exceed SHRT_MAX, otherwise
* scsi_add_host_with_dma() sets shost->cmd_per_lun to a negative
* number (note: the type of the "cmd_per_lun" field is "short"), and
* the system may hang during early boot.
*/
The different data sizes for cmd_per_lun and can_queue are problematic here.
I'd be more inclined to set cmd_per_lun to the same data size as can_queue. We did discuss this when ea2f0f77538c was upstreamed (actually it was the other way around - setting can_queue to 16b).
Thanks, John
- if (scsi_driver.can_queue > SHRT_MAX)
scsi_driver.can_queue = SHRT_MAX;
- host = scsi_host_alloc(&scsi_driver, sizeof(struct hv_host_device)); if (!host)
From: John Garry john.garry@huawei.com Sent: Wednesday, October 6, 2021 1:17 AM
On 06/10/2021 08:03, Dexuan Cui wrote:
After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during boot because scsi_add_host_with_dma() sets shost->cmd_per_lun to a negative number: 'max_outstanding_req_per_channel' is 352, 'max_sub_channels' is (416 - 1) / 4 = 103, so in storvsc_probe(), scsi_driver.can_queue = 352 * (103 + 1) * (100 - 10) / 100 = 32947, which is bigger than SHRT_MAX (i.e. 32767).
Out of curiosity, are these values realistic? You're capping can_queue just because of a data size issue, so, if these values are realistic, seems a weak reason.
The calculated value of can_queue is not realistic. The blk-mq layer caps the number of tags at 10240, so the excessively large value calculated here didn't definitively break anything, though it can be poor from a performance tuning standpoint. The algorithm used here is fairly broken, particularly in VMs with large CPU counts. I have an effort underway to fix it, but its part of a bigger set of changes to also do a better job on the perf tuning aspects.
Fix the hang issue by capping scsi_driver.can_queue.
Add the below Fixed tag though ea2f0f77538c itself is good.
Fixes: ea2f0f77538c ("scsi: core: Cap scsi_host cmd_per_lun at can_queue") Cc: stable@vger.kernel.org Signed-off-by: Dexuan Cui decui@microsoft.com
drivers/scsi/storvsc_drv.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index ebbbc1299c62..ba374908aec2 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -1976,6 +1976,16 @@ static int storvsc_probe(struct hv_device *device, (max_sub_channels + 1) * (100 - ring_avail_percent_lowater) / 100;
- /*
* v5.14 (see commit ea2f0f77538c) implicitly requires that
* scsi_driver.can_queue should not exceed SHRT_MAX, otherwise
* scsi_add_host_with_dma() sets shost->cmd_per_lun to a negative
* number (note: the type of the "cmd_per_lun" field is "short"), and
* the system may hang during early boot.
*/
The different data sizes for cmd_per_lun and can_queue are problematic here.
I'd be more inclined to set cmd_per_lun to the same data size as can_queue. We did discuss this when ea2f0f77538c was upstreamed (actually it was the other way around - setting can_queue to 16b).
I can see that making can_queue be 16 bits would make sense. And it also seems that both cmd_per_lun and can_queue should be unsigned, though I don't the implications of making such a change.
But in today's world where cmd_per_lun is "short" and can_queue is "int", ea2f0f77538c seems incorrect to me. The comparison should be done as "int", not "short", in order to prevent the truncation problem with can_queue that Dexuan's patch is trying to address. The result will always fit in back into the "short" cmd_per_lun since it is calculating a "min" function.
Thanks, John
- if (scsi_driver.can_queue > SHRT_MAX)
scsi_driver.can_queue = SHRT_MAX;
This fix works, but is a more of a temporary hack until I can finish a larger overhaul of the algorithm. But for now, I think the better fix is for ea2f0f77538c to do the comparison as "int" instead of "short".
Michael
host = scsi_host_alloc(&scsi_driver, sizeof(struct hv_host_device)); if (!host)
On 06/10/2021 16:01, Michael Kelley wrote:
From: John Garry john.garry@huawei.com Sent: Wednesday, October 6, 2021 1:17 AM
On 06/10/2021 08:03, Dexuan Cui wrote:
After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during boot because scsi_add_host_with_dma() sets shost->cmd_per_lun to a negative number: 'max_outstanding_req_per_channel' is 352, 'max_sub_channels' is (416 - 1) / 4 = 103, so in storvsc_probe(), scsi_driver.can_queue = 352 * (103 + 1) * (100 - 10) / 100 = 32947, which is bigger than SHRT_MAX (i.e. 32767).
Out of curiosity, are these values realistic? You're capping can_queue just because of a data size issue, so, if these values are realistic, seems a weak reason.
The calculated value of can_queue is not realistic. The blk-mq layer caps the number of tags at 10240,
nit: 1024, I think
so the excessively large value calculated here didn't definitively break anything, though it can be poor from a performance tuning standpoint. The algorithm used here is fairly broken, particularly in VMs with large CPU counts. I have an effort underway to fix it, but its part of a bigger set of changes to also do a better job on the perf tuning aspects.
Fix the hang issue by capping scsi_driver.can_queue.
Add the below Fixed tag though ea2f0f77538c itself is good.
Fixes: ea2f0f77538c ("scsi: core: Cap scsi_host cmd_per_lun at can_queue") Cc: stable@vger.kernel.org Signed-off-by: Dexuan Cui decui@microsoft.com
drivers/scsi/storvsc_drv.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index ebbbc1299c62..ba374908aec2 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -1976,6 +1976,16 @@ static int storvsc_probe(struct hv_device *device, (max_sub_channels + 1) * (100 - ring_avail_percent_lowater) / 100;
- /*
* v5.14 (see commit ea2f0f77538c) implicitly requires that
* scsi_driver.can_queue should not exceed SHRT_MAX, otherwise
* scsi_add_host_with_dma() sets shost->cmd_per_lun to a negative
* number (note: the type of the "cmd_per_lun" field is "short"), and
* the system may hang during early boot.
*/
The different data sizes for cmd_per_lun and can_queue are problematic here.
I'd be more inclined to set cmd_per_lun to the same data size as can_queue. We did discuss this when ea2f0f77538c was upstreamed (actually it was the other way around - setting can_queue to 16b).
I can see that making can_queue be 16 bits would make sense. And it also seems that both cmd_per_lun and can_queue should be unsigned, though I don't the implications of making such a change.
But in today's world where cmd_per_lun is "short" and can_queue is "int", ea2f0f77538c seems incorrect to me. The comparison should be done as "int", not "short", in order to prevent the truncation problem with can_queue that Dexuan's patch is trying to address.
Yeah, right. I think can_queue values > short_max was considered outside the realms of what is realistic then, hence my sloppy programming.
The result will always fit in back into the "short" cmd_per_lun since it is calculating a "min" function.
Thanks, John
- if (scsi_driver.can_queue > SHRT_MAX)
scsi_driver.can_queue = SHRT_MAX;
This fix works, but is a more of a temporary hack until I can finish a larger overhaul of the algorithm.
But for now, I think the better fix is for ea2f0f77538c to do the comparison as "int" instead of "short".
That seems better to me. But Let's wait for other possible opinion.
Thanks, John
From: John Garry john.garry@huawei.com Sent: Wednesday, October 6, 2021 9:03 AM
On 06/10/2021 16:01, Michael Kelley wrote:
From: John Garry john.garry@huawei.com Sent: Wednesday, October 6, 2021 1:17 AM
On 06/10/2021 08:03, Dexuan Cui wrote:
After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during boot because scsi_add_host_with_dma() sets shost->cmd_per_lun to a negative number: 'max_outstanding_req_per_channel' is 352, 'max_sub_channels' is (416 - 1) / 4 = 103, so in storvsc_probe(), scsi_driver.can_queue = 352 * (103 + 1) * (100 - 10) / 100 = 32947, which is bigger than SHRT_MAX (i.e. 32767).
Out of curiosity, are these values realistic? You're capping can_queue just because of a data size issue, so, if these values are realistic, seems a weak reason.
The calculated value of can_queue is not realistic. The blk-mq layer caps the number of tags at 10240,
nit: 1024, I think
I was thinking about BLK_MQ_MAX_DEPTH (#define'd as 10240), which is used to limit the tag set size in blk_mq_alloc_tag_set(). When running on large VMs on Hyper-V, we can see the "blk-mq: reduced tag depth to 10240" message. :-(
Michael
On 06/10/2021 17:08, Michael Kelley wrote:
The calculated value of can_queue is not realistic. The blk-mq layer caps the number of tags at 10240,
nit: 1024, I think
I was thinking about BLK_MQ_MAX_DEPTH (#define'd as 10240), which is used to limit the tag set size in blk_mq_alloc_tag_set(). When running on large VMs on Hyper-V, we can see the "blk-mq: reduced tag depth to 10240" message.:-(
Ah, right. The other related capping is the sdev queue depth, which is now capped at max(1024, can_queue), see scsi_device_max_queue_depth().
Thanks, John
From: John Garry john.garry@huawei.com Sent: Wednesday, October 6, 2021 9:03 AM
...
- if (scsi_driver.can_queue > SHRT_MAX)
scsi_driver.can_queue = SHRT_MAX;
This fix works, but is a more of a temporary hack until I can finish a larger overhaul of the algorithm.
But for now, I think the better fix is for ea2f0f77538c to do the comparison as "int" instead of "short".
That seems better to me. But Let's wait for other possible opinion.
Thanks, John
It looks like shost->cmd_per_lun has been "short" since day 1. I don't know whether it should be changed to unsigned int.
Thanks for the thoughts! I'll post a v2 like the below in 24 hours.
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index 3f6f14f0cafb..24b72ee4246f 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -220,7 +220,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, goto fail; }
- shost->cmd_per_lun = min_t(short, shost->cmd_per_lun, + /* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */ + shost->cmd_per_lun = min_t(int, shost->cmd_per_lun, shost->can_queue);
error = scsi_init_sense_cache(shost)
linux-stable-mirror@lists.linaro.org