On Wed, May 19, 2021 at 08:01:29AM +0000, Anatoli N.Chechelnickiy wrote:
> Hi
>
> I'm sorry to report about regression in 5.10.37
>
> with
>
> CONFIG_IOMMU_IOVA=y
> CONFIG_IOMMU_API=y
> CONFIG_IOMMU_SUPPORT=y
> CONFIG_OF_IOMMU=y
> CONFIG_INTEL_IOMMU=y
> CONFIG_INTEL_IOMMU_SVM=y
> CONFIG_INTEL_IOMMU_DEFAULT_ON=y
> CONFIG_INTEL_IOMMU_FLOPPY_WA=y
>
> and iommu=on in grub.cfg
>
>
> All my dell r340 and all lenovo servers won't boot any more. Just black
> screen at once.
>
>
> with intel_iommu=on iommu=pt DELL R240 and Lenovo SR350 can boot
>
> Older Lenovo cannot boot even with "intel_iommu=on iommu=pt" only with
> iommu=off in grub
>
> With iommu=off in grub all servers are booting well
>
> With 5.4.36 do not have this problem
>
>
> tested last 4 days with 10+ different servers(
Should be fixed with 5.10.38. If not, please let us know.
thanks,
greg k-h
The __nvmf_check_ready() routine used to bounce all filesystem io if
the controller state isn't LIVE. However, a later patch changed the
logic so that the rejection ends up being based on the Q live check.
The fc transport has a slightly different sequence from rdma and tcp
for shutting down queues/marking them non-live. FC marks its queue
non-live after aborting all ios and waiting for their termination,
leaving a rather large window for filesystem io to continue to hit the
transport. Unfortunately this resulted in filesystem io or applications
seeing I/O errors.
Change the fc transport to mark the queues non-live at the first
sign of teardown for the association (when i/o is initially terminated).
Fixes: 73a5379937ec ("nvme-fabrics: allow to queue requests for live queues")
Cc: <stable(a)vger.kernel.org> # v5.8+
Signed-off-by: James Smart <jsmart2021(a)gmail.com>
---
stable trees for 5.8 and 5.9 will require a slightly modified patch
---
drivers/nvme/host/fc.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index d9ab9e7871d0..256e87721a01 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2461,6 +2461,18 @@ nvme_fc_terminate_exchange(struct request *req, void *data, bool reserved)
 static void
 __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
 {
+	int q;
+
+	/*
+	 * if aborting io, the queues are no longer good, mark them
+	 * all as not live.
+	 */
+	if (ctrl->ctrl.queue_count > 1) {
+		for (q = 1; q < ctrl->ctrl.queue_count; q++)
+			clear_bit(NVME_FC_Q_LIVE, &ctrl->queues[q].flags);
+	}
+	clear_bit(NVME_FC_Q_LIVE, &ctrl->queues[0].flags);
+
 	/*
 	 * If io queues are present, stop them and terminate all outstanding
 	 * ios on them. As FC allocates FC exchange for each io, the
--
2.26.2
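
For readers who want to see the ordering in isolation, here is a minimal
user-space sketch of the idea in the patch above: clear the live flag on
every queue as the first step of teardown, so that a queue-live check
rejects new io from that point on. It is only a model under stated
assumptions. All names (model_ctrl, model_queue_accepts_io,
model_begin_teardown, MODEL_Q_LIVE) are hypothetical, plain integer flags
stand in for the kernel's atomic bit operations, and nothing here is the
driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; not the kernel's data structures. */
#define MODEL_Q_LIVE     0x1u
#define MODEL_MAX_QUEUES 8

struct model_queue {
	unsigned int flags;
};

struct model_ctrl {
	size_t queue_count;	/* queue 0 plays the role of the admin queue */
	struct model_queue queues[MODEL_MAX_QUEUES];
};

/* Bring every queue up, as a connected association would. */
static void model_mark_all_live(struct model_ctrl *ctrl, size_t queue_count)
{
	size_t q;

	assert(queue_count <= MODEL_MAX_QUEUES);
	ctrl->queue_count = queue_count;
	for (q = 0; q < queue_count; q++)
		ctrl->queues[q].flags |= MODEL_Q_LIVE;
}

/* Stand-in for the queue-live check: accept io only on a live queue. */
static bool model_queue_accepts_io(const struct model_ctrl *ctrl, size_t q)
{
	return q < ctrl->queue_count &&
	       (ctrl->queues[q].flags & MODEL_Q_LIVE) != 0;
}

/*
 * First step of teardown in this model: mark every queue not live
 * before any io is aborted, so no new io is accepted while the
 * outstanding io is being terminated.
 */
static void model_begin_teardown(struct model_ctrl *ctrl)
{
	size_t q;

	for (q = 0; q < ctrl->queue_count; q++)
		ctrl->queues[q].flags &= ~MODEL_Q_LIVE;
}
```

In this model, a caller would invoke model_begin_teardown() before
aborting outstanding io; any model_queue_accepts_io() call made after
that returns false, which is the window the patch is meant to close.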