On 05/10/2018 02:14 PM, Keith Busch wrote:
On Thu, May 10, 2018 at 01:56:56PM -0500, Alex G. wrote:
@@ -2681,8 +2681,15 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
 	dev_info(dev->ctrl.device, "restart after slot reset\n");
 	pci_restore_state(pdev);
-	nvme_reset_ctrl(&dev->ctrl);
-	return PCI_ERS_RESULT_RECOVERED;
+	nvme_reset_ctrl_sync(&dev->ctrl);
This does wonders when nvme_reset_ctrl_sync() returns in a timely manner. I was also able to get the nvme drive into a state where nvme_reset_ctrl_sync() does not return. Then we end up stuck holding the device lock in report_slot_reset, which, as you may imagine, is not a great thing.
It never returns? That shouldn't happen. There are cases where it may take a very long time, depending on what the controller reports in CAP.TO. The only other case where it may stall is if the controller never responds to the initialization admin commands, but that should only delay it by 60 seconds under default parameters.
Took 28 minutes before I gave up and rebooted the machine. Maybe I should have waited 30. Even 60 seconds seems like a terribly long time to wait in AER. Simple stuff like block IO and 'nvme list' hang in kernel space this entire time. I can raise a separate issue once I find a reliable way to repro.
Alex