Hi Ming,
Thanks for taking the time to look at this. It's really appreciated.
On 01/27/2018 11:44 PM, Ming Lei wrote:
The best way to fix this is to ensure the timeout path has completed before cancelling the previously outstanding requests. (Just ignore the case where nvme_timeout will invoke nvme_dev_disable; that has to be fixed another way.)
Maybe your approach looks a bit cleaner and simplifies the implementation, but it doesn't seem necessary.
So could you explain a bit what exact issue you are worried about? A deadlock, or something else?
There is indeed a potential issue, but the window is very narrow. Please refer to https://urldefense.proofpoint.com/v2/url?u=https-3A__lkml.org_lkml_2018_1_19...
OK, here is the description from the above link:
Yes, once controller disabling completes, any started requests will be handled and cannot expire. But before that _boundary_, an nvme_timeout context could be running in parallel with nvme_dev_disable. If the timeout path grabs a request, nvme_dev_disable cannot get at it and cancel it.
So even after nvme_dev_disable completes, there could still be a request held in an nvme_timeout context.
The worst case is:
nvme_timeout                          nvme_reset_work
  if (ctrl->state == RESETTING)         nvme_dev_disable
    nvme_dev_disable                    initializing procedure
i.e. the nvme_dev_disable in the timeout path runs in parallel with the re-init procedure in nvme_reset_work.
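To make the window concrete: the cancel loop at the end of nvme_dev_disable() cannot reclaim a request that nvme_timeout() has already taken. Roughly like this (a hand-simplified excerpt from drivers/nvme/host/pci.c and core.c, written from memory, so details may differ by kernel version):

/* End of nvme_dev_disable(), simplified: cancel whatever is still started. */
        blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
        blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);

/* nvme_cancel_request() (core.c), simplified: */
void nvme_cancel_request(struct request *req, void *data, bool reserved)
{
        nvme_req(req)->status = NVME_SC_ABORT_REQ;
        /*
         * blk_mq_complete_request() skips a request that the timeout handler
         * has already marked complete, so a request sitting in nvme_timeout()
         * survives this loop.
         */
        blk_mq_complete_request(req);
}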
OK, that is the issue: the nvme_dev_disable() in nvme_timeout() may disable queues again and cause a hang in nvme_reset_work().
Yes. :)
Looks like Keith's suggestion of introducing nvme_sync_queues() should be enough to kill the race, but it seems nvme_sync_queues() should be called unconditionally at the entry of nvme_reset_work().
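If I follow, something roughly like this is what is being suggested (an untested sketch only; the shape of nvme_sync_queues() and the locking around ctrl->namespaces are assumptions on my side, the point being that blk_sync_queue() is what waits for any timeout work still running on a queue):

/* Sketch, core.c: wait for every in-flight timeout handler of this controller. */
void nvme_sync_queues(struct nvme_ctrl *ctrl)
{
        struct nvme_ns *ns;

        /* blk_sync_queue() cancels the timer and flushes q->timeout_work. */
        mutex_lock(&ctrl->namespaces_mutex);
        list_for_each_entry(ns, &ctrl->namespaces, list)
                blk_sync_queue(ns->queue);
        mutex_unlock(&ctrl->namespaces_mutex);

        if (ctrl->admin_q)
                blk_sync_queue(ctrl->admin_q);
}

/* Sketch, pci.c: call it unconditionally at the entry of nvme_reset_work(). */
static void nvme_reset_work(struct work_struct *work)
{
        struct nvme_dev *dev = container_of(work, struct nvme_dev, ctrl.reset_work);

        /*
         * Make sure no nvme_timeout() is still running (and possibly about
         * to call nvme_dev_disable()) before we tear down and re-init.
         */
        nvme_sync_queues(&dev->ctrl);

        if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
                nvme_dev_disable(dev, false);

        /* ... rest of the existing reset/re-init sequence unchanged ... */
}

The idea being that once nvme_sync_queues() returns, no nvme_timeout() can still be running behind the reset path.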
Thanks for your suggestion.
Jianchao