When processing a batch of requests, nvme_queue_rq() can fail to ring
the NVMe queue doorbell if the last request in the batch fails because
the controller is not ready. As a result, previously queued requests
will time out because the device never learns that the commands exist.
This failure can cause an NVMe controller reset to time out if another
App was using adminq while the reset was taking place.

Consider this case:
- App is hammering adminq with NVME_ADMIN_IDENTIFY commands
- Controller reset triggered by "echo 1 > /sys/.../nvme0/reset_controller"

nvme_reset_ctrl() changes the controller state to NVME_CTRL_RESETTING.
From that point on, all requests from App are forced to fail because
the controller is no longer ready. More importantly, these requests
never make it to adminq; they are short-circuited in nvme_queue_rq().
Unlike App requests, requests issued by the reset code path are allowed
through adminq in order to carry out the reset. The problem occurs when
blk-mq mixes requests from the reset code path and from App in one
batch, in particular when the last request in the batch happens to be
from App.

In this case the last request has bd->last set to true, telling the
driver to ring the doorbell after queuing it. However, since the
controller is not ready, this App request is completed without going
through adminq, and nvme_queue_rq() misses the opportunity to ring the
adminq doorbell, leaving the earlier queued requests unknown to the
device.

Fix this by calling nvme_commit_rqs() when such a request was the last
one in the batch, and by wiring up ->commit_rqs for the admin queue.

Fixes: d4060d2be1132 ("nvme-pci: fix controller reset hang when racing with nvme_timeout")
Cc: stable@vger.kernel.org
Reported-by: Eric Badger <ebadger@purestorage.com>
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reviewed-by: Eric Badger <ebadger@purestorage.com>
---
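Note (not part of the commit): the batching scenario from the commit
message can be sketched as a small userspace model. This is only an
illustration under simplifying assumptions, not driver code. All sim_*
names are hypothetical, the "fixed" flag merely selects between the old
and the patched behaviour, and the model ignores locking and the
BLK_STS_OK check that the real patch performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a submission queue (not struct nvme_queue). */
struct sim_queue {
	unsigned int sq_tail;      /* entries written by the host */
	unsigned int last_sq_tail; /* tail value last written to the doorbell */
};

/* Hypothetical request; from_app marks requests that fail during reset. */
struct sim_req {
	bool from_app;
};

/* Ring the doorbell: make every queued entry visible to the device. */
static void sim_ring_doorbell(struct sim_queue *q)
{
	q->last_sq_tail = q->sq_tail;
}

/*
 * Model of queue_rq while the controller is resetting. "last" plays the
 * role of bd->last; "fixed" selects the patched behaviour.
 */
static void sim_queue_rq(struct sim_queue *q, const struct sim_req *rq,
			 bool last, bool fixed)
{
	if (rq->from_app) {
		/* Completed with an error without ever touching the queue. */
		if (fixed && last)
			sim_ring_doorbell(q);
		return;
	}

	q->sq_tail++;	/* reset-path request is placed on the queue */
	if (last)
		sim_ring_doorbell(q);
}

/* Dispatch one batch; return how many queued commands the device never saw. */
static unsigned int sim_dispatch_batch(const struct sim_req *batch, size_t n,
				       bool fixed)
{
	struct sim_queue q = { 0, 0 };

	assert(batch != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		sim_queue_rq(&q, &batch[i], i == n - 1, fixed);

	return q.sq_tail - q.last_sq_tail;
}
```

In this model, a batch of two reset-path requests followed by one App
request leaves two commands unannounced with fixed == false and none
with fixed == true.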
 drivers/nvme/host/pci.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 98864b853eef..f6b1ae593e8e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -946,8 +946,12 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (unlikely(!test_bit(NVMEQ_ENABLED, &nvmeq->flags)))
 		return BLK_STS_IOERR;

-	if (unlikely(!nvme_check_ready(&dev->ctrl, req, true)))
-		return nvme_fail_nonready_command(&dev->ctrl, req);
+	if (unlikely(!nvme_check_ready(&dev->ctrl, req, true))) {
+		ret = nvme_fail_nonready_command(&dev->ctrl, req);
+		if (ret == BLK_STS_OK && bd->last)
+			nvme_commit_rqs(hctx);
+		return ret;
+	}

 	ret = nvme_prep_rq(dev, req);
 	if (unlikely(ret))
@@ -1724,6 +1728,7 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid, bool polled)
 static const struct blk_mq_ops nvme_mq_admin_ops = {
 	.queue_rq	= nvme_queue_rq,
 	.complete	= nvme_pci_complete_rq,
+	.commit_rqs	= nvme_commit_rqs,
 	.init_hctx	= nvme_admin_init_hctx,
 	.init_request	= nvme_pci_init_request,
 	.timeout	= nvme_timeout,
--
2.25.1

This fixes broken atomic checks which cause a race between the
release timer and the processing of HID input.

I noticed that contacts were sometimes sticking, even with the "sticky
fingers" quirk enabled. This fixes that problem.

Cc: stable@vger.kernel.org
Fixes: 9609827458c3 ("HID: multitouch: optimize the sticky fingers timer")
Signed-off-by: Andri Yngvason <andri@yngvason.is>
---
V1 -> V2: Clarified where the race is and added Fixes tag as suggested
by Greg KH
V2 -> V3: Fix formatting of "Fixes" tag
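
Note (not part of the commit): the flag is used as a try-lock shared by
the input path and the release timer. A rough userspace analogue with
C11 atomics is sketched below, assuming that the _lock/_unlock bitops
provide acquire and release ordering respectively, as described in the
kernel's atomic bitops documentation. All sim_* names and
SIM_FLAG_RUNNING are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for MT_IO_FLAGS_RUNNING. */
#define SIM_FLAG_RUNNING (1UL << 0)

/*
 * Set the bit and return its previous value, with acquire ordering:
 * accesses in the critical section cannot be reordered before it.
 */
static bool sim_test_and_set_bit_lock(atomic_ulong *flags, unsigned long mask)
{
	unsigned long old;

	old = atomic_fetch_or_explicit(flags, mask, memory_order_acquire);
	return (old & mask) != 0;
}

/*
 * Clear the bit with release ordering: accesses in the critical section
 * are visible before another thread can observe the bit as clear.
 */
static void sim_clear_bit_unlock(atomic_ulong *flags, unsigned long mask)
{
	atomic_fetch_and_explicit(flags, ~mask, memory_order_release);
}

/*
 * Shape shared by the report and timer paths in this sketch: back off
 * if the other path holds the flag, otherwise update the shared state.
 * Returns true if the critical section ran.
 */
static bool sim_try_run(atomic_ulong *flags, int *shared_state)
{
	if (sim_test_and_set_bit_lock(flags, SIM_FLAG_RUNNING))
		return false;	/* the other path is running, abort */

	(*shared_state)++;	/* critical section */

	sim_clear_bit_unlock(flags, SIM_FLAG_RUNNING);
	return true;
}
```

The point of the pairing is that a plain clear of the flag carries no
ordering, so updates made while holding it could become visible only
after the flag already reads as clear.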

 drivers/hid/hid-multitouch.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/hid/hid-multitouch.c b/drivers/hid/hid-multitouch.c
index 2e72922e36f5..91a4d3fc30e0 100644
--- a/drivers/hid/hid-multitouch.c
+++ b/drivers/hid/hid-multitouch.c
@@ -1186,7 +1186,7 @@ static void mt_touch_report(struct hid_device *hid,
 	int contact_count = -1;

 	/* sticky fingers release in progress, abort */
-	if (test_and_set_bit(MT_IO_FLAGS_RUNNING, &td->mt_io_flags))
+	if (test_and_set_bit_lock(MT_IO_FLAGS_RUNNING, &td->mt_io_flags))
 		return;

 	scantime = *app->scantime;
@@ -1267,7 +1267,7 @@ static void mt_touch_report(struct hid_device *hid,
 			del_timer(&td->release_timer);
 	}

-	clear_bit(MT_IO_FLAGS_RUNNING, &td->mt_io_flags);
+	clear_bit_unlock(MT_IO_FLAGS_RUNNING, &td->mt_io_flags);
 }

 static int mt_touch_input_configured(struct hid_device *hdev,
@@ -1699,11 +1699,11 @@ static void mt_expired_timeout(struct timer_list *t)
 	 * An input report came in just before we release the sticky fingers,
 	 * it will take care of the sticky fingers.
 	 */
-	if (test_and_set_bit(MT_IO_FLAGS_RUNNING, &td->mt_io_flags))
+	if (test_and_set_bit_lock(MT_IO_FLAGS_RUNNING, &td->mt_io_flags))
 		return;
 	if (test_bit(MT_IO_FLAGS_PENDING_SLOTS, &td->mt_io_flags))
 		mt_release_contacts(hdev);
-	clear_bit(MT_IO_FLAGS_RUNNING, &td->mt_io_flags);
+	clear_bit_unlock(MT_IO_FLAGS_RUNNING, &td->mt_io_flags);
 }

 static int mt_probe(struct hid_device *hdev, const struct hid_device_id *id)
--
2.37.2