On 05/10/2018 02:20 PM, Alex G. wrote:
On 05/10/2018 02:14 PM, Keith Busch wrote:
On Thu, May 10, 2018 at 01:56:56PM -0500, Alex G. wrote:
@@ -2681,8 +2681,15 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev) dev_info(dev->ctrl.device, "restart after slot reset\n"); pci_restore_state(pdev);
- nvme_reset_ctrl(&dev->ctrl);
- return PCI_ERS_RESULT_RECOVERED;
- nvme_reset_ctrl_sync(&dev->ctrl);
This does wonders when nvme_reset_ctrl_sync() returns in a timely manner. I was also able to get the nvme drive in a state where nvme_reset_ctrl_sync() does not return. Then we end up with the device lock in report_slot_reset, which, as you may imagine, is not a great thing.
It never returns? That shouldn't happen. There are cases where it may take a very long time, depending on what the controller reports in CAP.TO. The only other case it may stall is if the controller never responds to the initialization admin commands, but that should delay by 60 seconds under default parameters.
Took 28 minutes before I gave up and rebooted the machine. Maybe I should have waited 30. Even 60 seconds seems like a terribly long time to wait in AER. Simple stuff like block IO and 'nvme list' hangs in kernel space this entire time. I can raise a separate issue once I find a reliable way to repro.
I've been playing some more with this. With recovery from a malformed TLP resulting from a bad MPS value in the switch upstream port we are likely to not return in a timely manner (a few minutes to infinity). This happens with less than 100% consistency. I am in a state of disbelief, since this makes little sense to me. Log excerpt below. Do you think it's a separate issue, or related?
Alex
[ 16.828656] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready [ 1605.101288] megaraid_sas 0000:86:00.0: invalid short VPD tag 00 at offset 1 [ 1621.514702] pcieport 0000:ae:00.0: AER: Multiple Uncorrected (Fatal) error received: id=b020 [ 1621.696135] pcieport 0000:b0:04.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=b020(Receiver ID) [ 1621.707429] pcieport 0000:b0:04.0: device [10b5:9733] error status/mask=00440000/01a10000 [ 1621.715780] pcieport 0000:b0:04.0: [18] Malformed TLP (First) [ 1621.722568] pcieport 0000:b0:04.0: [22] Uncorrectable Internal Error [ 1621.729192] pcieport 0000:b0:04.0: TLP Header: 60000040 b10000ff 00000004 4f4bb000 [ 1621.736942] pcieport 0000:b0:04.0: broadcast error_detected message [ 1621.736945] nvme 0000:b1:00.0: HACK: report_error_detected: Preparing to lock [ 1621.736946] nvme 0000:b1:00.0: HACK: report_error_detected: locked and ready [ 1621.736948] nvme nvme2: frozen state error detected, reset controller [ 1625.649049] INFO: NMI handler (ghes_notify_nmi) took too long to run: 175.406 msecs [ 1634.244302] nvme 0000:b1:00.0: HACK: report_error_detected: Unlocked and DONE [ 1635.290798] pcieport 0000:b0:04.0: downstream link has been reset [ 1635.290804] pcieport 0000:b0:04.0: broadcast slot_reset message [ 1635.290811] nvme 0000:b1:00.0: HACK: report_slot_reset: Preparing to lock [ 1635.290815] nvme 0000:b1:00.0: HACK: report_slot_reset: locked and ready [ 1635.290823] nvme nvme2: restart after slot reset