Once the command ring doorbell is rung the xHC controller will parse all
command TRBs on the command ring that have the cycle bit set properly.
If the driver just started writing the next command TRB to the ring when
hardware finished the previous TRB, then HW might fetch an incomplete TRB
as long as its cycle bit set correctly.
A command TRB is 16 bytes (128 bits) long.
Driver writes the command TRB in four 32 bit chunks, with the chunk
containing the cycle bit last. This does however not guarantee that
chunks actually get written in that order.
This was detected in stress testing when canceling URBs with several
connected USB devices.
Two consecutive "Set TR Dequeue pointer" commands got queued right
after each other, and the second one was only partially written when
the controller parsed it, causing the dequeue pointer to be set
to bogus values. This was seen as error messages:
"Mismatch between completed Set TR Deq Ptr command & xHCI internal state"
Solution is to add a write memory barrier before writing the cycle bit.
Cc: <stable(a)vger.kernel.org>
Tested-by: Ross Zwisler <zwisler(a)google.com>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci-ring.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 5677b81c0915..cf0c93a90200 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2931,6 +2931,8 @@ static void queue_trb(struct xhci_hcd *xhci, struct xhci_ring *ring,
trb->field[0] = cpu_to_le32(field1);
trb->field[1] = cpu_to_le32(field2);
trb->field[2] = cpu_to_le32(field3);
+ /* make sure TRB is fully written before giving it to the controller */
+ wmb();
trb->field[3] = cpu_to_le32(field4);
trace_xhci_queue_trb(ring, trb);
--
2.25.1
Issuing a 'reboot' command in i.MX5 leads to the following flow:
[ 24.557742] [<c0769b78>] (msm_atomic_commit_tail) from [<c06db0b4>]
(commit_tail+0xa4/0x1b0)
[ 24.566349] [<c06db0b4>] (commit_tail) from [<c06dbed0>]
(drm_atomic_helper_commit+0x154/0x188)
[ 24.575193] [<c06dbed0>] (drm_atomic_helper_commit) from
[<c06db604>] (drm_atomic_helper_disable_all+0x154/0x1c0)
[ 24.585599] [<c06db604>] (drm_atomic_helper_disable_all) from
[<c06db704>] (drm_atomic_helper_shutdown+0x94/0x12c)
[ 24.596094] [<c06db704>] (drm_atomic_helper_shutdown) from
In the i.MX5 case, priv->kms is not populated (as i.MX5 does not use any
of the Qualcomm display controllers), causing a NULL pointer
dereference in msm_atomic_commit_tail():
[ 24.268964] 8<--- cut here ---
[ 24.274602] Unable to handle kernel NULL pointer dereference at
virtual address 00000000
[ 24.283434] pgd = (ptrval)
[ 24.286387] [00000000] *pgd=ca212831
[ 24.290788] Internal error: Oops: 17 [#1] SMP ARM
[ 24.295609] Modules linked in:
[ 24.298777] CPU: 0 PID: 197 Comm: init Not tainted 5.11.0-rc2-next-20210111 #333
[ 24.306276] Hardware name: Freescale i.MX53 (Device Tree Support)
[ 24.312442] PC is at msm_atomic_commit_tail+0x54/0xb9c
[ 24.317743] LR is at commit_tail+0xa4/0x1b0
Fix the problem by calling drm_atomic_helper_shutdown() conditionally.
Cc: <stable(a)vger.kernel.org>
Fixes: 9d5cbf5fe46e ("drm/msm: add shutdown support for display platform_driver")
Suggested-by: Rob Clark <robdclark(a)gmail.com>
Signed-off-by: Fabio Estevam <festevam(a)gmail.com>
---
Changes since v1:
- Explain in the commit log that the problem happens after a 'reboot' command.
drivers/gpu/drm/msm/msm_drv.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 108c405e03dd..c082b72b9e3b 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -1311,7 +1311,8 @@ static void msm_pdev_shutdown(struct platform_device *pdev)
{
struct drm_device *drm = platform_get_drvdata(pdev);
- drm_atomic_helper_shutdown(drm);
+ if (get_mdp_ver(pdev))
+ drm_atomic_helper_shutdown(drm);
}
static const struct of_device_id dt_match[] = {
--
2.25.1
This patch ensures that when `nvme_map_data()` fails to map the
addresses in a scatter/gather list:
* The addresses are not incorrectly unmapped. The underlying
scatter/gather code unmaps the addresses after detecting a failure.
Thus, unmapping them again in the driver is a bug.
* The DMA pool allocations are not deallocated when they were never
allocated.
The bug that motivated this patch was the following sequence, which
occurred within the NVMe driver, with the kernel flag `swiotlb=force`.
* NVMe driver calls dma_direct_map_sg()
* dma_direct_map_sg() fails part way through the scatter gather/list
* dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
succeeded.
* NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
double unmap, which is a bug.
Before this patch, I observed intermittent application- and VM-level
failures when running a benchmark, fio, in an AMD SEV guest. This patch
resolves the failures.
Tested-by: Marc Orr <marcorr(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Orr <marcorr(a)google.com>
---
drivers/nvme/host/pci.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9b1fc8633cfe..8b504ed08321 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -543,11 +543,14 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
WARN_ON_ONCE(!iod->nents);
- if (is_pci_p2pdma_page(sg_page(iod->sg)))
- pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
- rq_dma_dir(req));
- else
- dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
+ if (!dma_mapping_error(dev->dev, iod->first_dma)) {
+ if (is_pci_p2pdma_page(sg_page(iod->sg)))
+ pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
+ rq_dma_dir(req));
+ else
+ dma_unmap_sg(dev->dev, iod->sg, iod->nents,
+ rq_dma_dir(req));
+ }
if (iod->npages == 0)
@@ -836,8 +839,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
else
nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
rq_dma_dir(req), DMA_ATTR_NO_WARN);
- if (!nr_mapped)
+ if (!nr_mapped) {
+ iod->first_dma = DMA_MAPPING_ERROR;
+ iod->npages = -1;
goto out;
+ }
iod->use_sgl = nvme_pci_use_sgls(dev, req);
if (iod->use_sgl)
--
2.30.0.284.gd98b1dd5eaa7-goog
Issuing a 'reboot' command in i.MX5 leads to the following flow:
[ 24.557742] [<c0769b78>] (msm_atomic_commit_tail) from [<c06db0b4>]
(commit_tail+0xa4/0x1b0)
[ 24.566349] [<c06db0b4>] (commit_tail) from [<c06dbed0>]
(drm_atomic_helper_commit+0x154/0x188)
[ 24.575193] [<c06dbed0>] (drm_atomic_helper_commit) from
[<c06db604>] (drm_atomic_helper_disable_all+0x154/0x1c0)
[ 24.585599] [<c06db604>] (drm_atomic_helper_disable_all) from
[<c06db704>] (drm_atomic_helper_shutdown+0x94/0x12c)
[ 24.596094] [<c06db704>] (drm_atomic_helper_shutdown) from
In the i.MX5 case, priv->kms is not populated (as i.MX5 does not use any
of the Qualcomm display controllers), causing a NULL pointer
dereference in msm_atomic_commit_tail():
[ 24.268964] 8<--- cut here ---
[ 24.274602] Unable to handle kernel NULL pointer dereference at
virtual address 00000000
[ 24.283434] pgd = (ptrval)
[ 24.286387] [00000000] *pgd=ca212831
[ 24.290788] Internal error: Oops: 17 [#1] SMP ARM
[ 24.295609] Modules linked in:
[ 24.298777] CPU: 0 PID: 197 Comm: init Not tainted 5.11.0-rc2-next-20210111 #333
[ 24.306276] Hardware name: Freescale i.MX53 (Device Tree Support)
[ 24.312442] PC is at msm_atomic_commit_tail+0x54/0xb9c
[ 24.317743] LR is at commit_tail+0xa4/0x1b0
Fix the problem by calling drm_atomic_helper_shutdown() conditionally.
Cc: <stable(a)vger.kernel.org>
Fixes: 9d5cbf5fe46e ("drm/msm: add shutdown support for display platform_driver")
Suggested-by: Rob Clark <robdclark(a)gmail.com>
Signed-off-by: Fabio Estevam <festevam(a)gmail.com>
---
Changes since v1:
- Explain in the commit log that the problem happens after a 'reboot' command.
drivers/gpu/drm/msm/msm_drv.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 108c405e03dd..c082b72b9e3b 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -1311,7 +1311,8 @@ static void msm_pdev_shutdown(struct platform_device *pdev)
{
struct drm_device *drm = platform_get_drvdata(pdev);
- drm_atomic_helper_shutdown(drm);
+ if (get_mdp_ver(pdev))
+ drm_atomic_helper_shutdown(drm);
}
static const struct of_device_id dt_match[] = {
--
2.25.1