There is a real deadlock as well as a sleeping-in-atomic bug here if
the bo put happens to drop the last reference, since bo destruction
wants to grab the same spinlock as well as sleeping locks. Fix that by
dropping the reference with xe_bo_put_deferred() and moving the final
commit outside of the lock. Simply dropping the lock around the put is
tricky, since the bo can then go out of scope and remove itself from
the list, making it difficult to navigate to the next list entry.
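As an aside, the general shape of the deferred-put pattern is sketched
below with hypothetical "obj"/"client" types (illustrative only, not the
xe implementation): the last reference is detected while the spinlock is
held, but the actual release work only runs after the lock is dropped.

#include <linux/kref.h>
#include <linux/list.h>
#include <linux/llist.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct obj {
        struct kref ref;
        struct list_head link;   /* on client->list, protected by client->lock */
        struct llist_node freed; /* queued here once the last ref is dropped */
};

struct client {
        spinlock_t lock;
        struct list_head list;
};

/* kref release callback: intentionally empty, freeing is deferred. */
static void obj_release(struct kref *ref)
{
}

/* Drop a ref; if it was the last one, park the object for a later commit. */
static void obj_put_deferred(struct obj *o, struct llist_head *deferred)
{
        if (kref_put(&o->ref, obj_release))
                llist_add(&o->freed, deferred);
}

/* Runs outside the spinlock: destruction wants that same lock (and may sleep). */
static void obj_put_commit(struct client *c, struct llist_head *deferred)
{
        struct obj *o, *next;

        llist_for_each_entry_safe(o, next, llist_del_all(deferred), freed) {
                spin_lock(&c->lock);
                list_del(&o->link);
                spin_unlock(&c->lock);
                kfree(o);
        }
}

static void client_walk(struct client *c)
{
        LLIST_HEAD(deferred);
        struct obj *o;

        spin_lock(&c->lock);
        list_for_each_entry(o, &c->list, link) {
                if (!kref_get_unless_zero(&o->ref))
                        continue; /* object is already on its way out */
                /* ... account/print the object ... */
                obj_put_deferred(o, &deferred);
        }
        spin_unlock(&c->lock);

        obj_put_commit(c, &deferred); /* final frees happen in sleepable context */
}

In the xe case the real release needs client->bos_lock (to unlink from
the bos_list) plus sleeping locks, which is why the last put must never
happen inside the spin_lock()ed walk.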
Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats printing")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2727
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray(a)intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay(a)intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_drm_client.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
index e64f4b645e2e..badfa045ead8 100644
--- a/drivers/gpu/drm/xe/xe_drm_client.c
+++ b/drivers/gpu/drm/xe/xe_drm_client.c
@@ -196,6 +196,7 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
struct xe_drm_client *client;
struct drm_gem_object *obj;
struct xe_bo *bo;
+ LLIST_HEAD(deferred);
unsigned int id;
u32 mem_type;
@@ -215,11 +216,14 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
list_for_each_entry(bo, &client->bos_list, client_link) {
if (!kref_get_unless_zero(&bo->ttm.base.refcount))
continue;
+
bo_meminfo(bo, stats);
- xe_bo_put(bo);
+ xe_bo_put_deferred(bo, &deferred);
}
spin_unlock(&client->bos_lock);
+ xe_bo_put_commit(&deferred);
+
for (mem_type = XE_PL_SYSTEM; mem_type < TTM_NUM_MEM_TYPES; ++mem_type) {
if (!xe_mem_type_to_name[mem_type])
continue;
--
2.46.0
This reverts commit ab8d66d132bc8f1992d3eb6cab8d32dda6733c84 because it
breaks codecs using non-continuous masks in source and sink ports. The
reverted commit missed the point that port numbers are not used as
indices into the port property arrays.
The SoundWire core and existing codecs expect the arrays passed as
prop.src_dpn_prop and prop.sink_dpn_prop to be densely packed. The port
masks (prop.source_ports and prop.sink_ports) may still be
non-continuous, but that's unrelated.
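To make the expected layout concrete, here is a hedged sketch with
made-up example values (the port numbers and helper name are purely
illustrative, not from a real codec driver): the port mask may have
holes, but dpn_prop[] stays dense, so the lookup walks indices
0..hweight32(mask)-1 and matches on dpn_prop[i].num.

#include <linux/bitops.h>
#include <linux/bits.h>
#include <linux/types.h>
#include <linux/soundwire/sdw.h>

/* Hypothetical codec: source ports 1, 3 and 5, i.e. a mask with holes... */
static const u32 example_source_ports = BIT(1) | BIT(3) | BIT(5);

/* ...but the property array is dense, one entry per supported port. */
static const struct sdw_dpn_prop example_src_dpn_prop[] = {
        { .num = 1 },
        { .num = 3 },
        { .num = 5 },
};

/* Dense lookup, mirroring what sdw_get_slave_dpn_prop() does after the revert. */
static const struct sdw_dpn_prop *example_find_dpn_prop(unsigned int port_num)
{
        u8 num_ports = hweight32(example_source_ports); /* == 3 */
        int i;

        for (i = 0; i < num_ports; i++)
                if (example_src_dpn_prop[i].num == port_num)
                        return &example_src_dpn_prop[i];

        return NULL; /* port not described by this codec */
}

The reverted commit instead iterated with for_each_set_bit() and used the
bit position to index the dense array, which points at the wrong entry
(or past the end of the array) as soon as the mask has holes.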
Reported-by: Bard Liao <yung-chuan.liao(a)linux.intel.com>
Closes: https://lore.kernel.org/all/b6c75eee-761d-44c8-8413-2a5b34ee2f98@linux.inte…
Fixes: ab8d66d132bc ("soundwire: stream: fix programming slave ports for non-continous port maps")
Acked-by: Bard Liao <yung-chuan.liao(a)linux.intel.com>
Reviewed-by: Charles Keepax <ckeepax(a)opensource.cirrus.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
---
Resending with Ack/Rb tags and missing Cc-stable.
---
drivers/soundwire/stream.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/soundwire/stream.c b/drivers/soundwire/stream.c
index f275143d7b18..7aa4900dcf31 100644
--- a/drivers/soundwire/stream.c
+++ b/drivers/soundwire/stream.c
@@ -1291,18 +1291,18 @@ struct sdw_dpn_prop *sdw_get_slave_dpn_prop(struct sdw_slave *slave,
unsigned int port_num)
{
struct sdw_dpn_prop *dpn_prop;
- unsigned long mask;
+ u8 num_ports;
int i;
if (direction == SDW_DATA_DIR_TX) {
- mask = slave->prop.source_ports;
+ num_ports = hweight32(slave->prop.source_ports);
dpn_prop = slave->prop.src_dpn_prop;
} else {
- mask = slave->prop.sink_ports;
+ num_ports = hweight32(slave->prop.sink_ports);
dpn_prop = slave->prop.sink_dpn_prop;
}
- for_each_set_bit(i, &mask, 32) {
+ for (i = 0; i < num_ports; i++) {
if (dpn_prop[i].num == port_num)
return &dpn_prop[i];
}
--
2.43.0
Hi,
the issue has so far been handled in https://lore.kernel.org/regressions/ZsyMzW-4ee_U8NoX@eldamar.lan/T/#m390d6e… and https://bugzilla.kernel.org/show_bug.cgi?id=219129
I haven’t seen any communication about whether a backport for 5.15 is already in progress, so I thought I’d follow up here.
Today we rolled out 5.15.165 and immediately ran into the same issue. There’s at least one other person asking for a 5.15 backport in Bugzilla.
Hugs,
Christian
--
Christian Theune - A97C62CE - 0179 7808366
@theuni - christian(a)theune.cc
From: Peter Wang <peter.wang(a)mediatek.com>
ufshcd_abort_all() forcibly aborts all on-going commands, and the host
controller automatically fills in the OCS field of the corresponding
responses with OCS_ABORTED, depending on the working mode:
MCQ mode: the command is aborted using SQ cleanup, and the host
controller posts a Completion Queue entry with OCS = ABORTED.
SDB mode: the command is aborted using UTRLCLR, and the Task Management
response indicates that a Transfer Request was aborted.
In both cases, set a flag so that SCSI requeues the command after
receiving the response with OCS_ABORTED.
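As a rough illustration of the intended completion-path behaviour (a
hypothetical helper, not code from this patch), the OCS_ABORTED
translation becomes:

#include <linux/types.h>
#include <scsi/scsi.h>

/* Hypothetical helper: pick the SCSI host byte for an OCS_ABORTED response. */
static inline u32 ocs_aborted_host_byte(bool abort_initiated_by_eh)
{
        /*
         * Aborts forced by the error handler are deliberate, so ask the
         * SCSI midlayer to requeue the command; any other abort is
         * reported as a real abort.
         */
        return (abort_initiated_by_eh ? DID_REQUEUE : DID_ABORT) << 16;
}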
Fixes: ab248643d3d6 ("scsi: ufs: core: Add error handling for MCQ mode")
Cc: stable(a)vger.kernel.org
Signed-off-by: Peter Wang <peter.wang(a)mediatek.com>
---
drivers/ufs/core/ufshcd.c | 35 ++++++++++++++++++++---------------
include/ufs/ufshcd.h | 3 +++
2 files changed, 23 insertions(+), 15 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index a6f818cdef0e..8380e8861d83 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -3006,6 +3006,7 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
ufshcd_prepare_lrbp_crypto(scsi_cmd_to_rq(cmd), lrbp);
lrbp->req_abort_skip = false;
+ lrbp->abort_initiated_by_eh = false;
ufshcd_comp_scsi_upiu(hba, lrbp);
@@ -5404,7 +5405,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp,
}
break;
case OCS_ABORTED:
- result |= DID_ABORT << 16;
+ if (lrbp->abort_initiated_by_eh)
+ result |= DID_REQUEUE << 16;
+ else
+ result |= DID_ABORT << 16;
break;
case OCS_INVALID_COMMAND_STATUS:
result |= DID_REQUEUE << 16;
@@ -6471,26 +6475,12 @@ static bool ufshcd_abort_one(struct request *rq, void *priv)
struct scsi_device *sdev = cmd->device;
struct Scsi_Host *shost = sdev->host;
struct ufs_hba *hba = shost_priv(shost);
- struct ufshcd_lrb *lrbp = &hba->lrb[tag];
- struct ufs_hw_queue *hwq;
- unsigned long flags;
*ret = ufshcd_try_to_abort_task(hba, tag);
dev_err(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag,
hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1,
*ret ? "failed" : "succeeded");
- /* Release cmd in MCQ mode if abort succeeds */
- if (hba->mcq_enabled && (*ret == 0)) {
- hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd));
- if (!hwq)
- return 0;
- spin_lock_irqsave(&hwq->cq_lock, flags);
- if (ufshcd_cmd_inflight(lrbp->cmd))
- ufshcd_release_scsi_cmd(hba, lrbp);
- spin_unlock_irqrestore(&hwq->cq_lock, flags);
- }
-
return *ret == 0;
}
@@ -7561,6 +7551,21 @@ int ufshcd_try_to_abort_task(struct ufs_hba *hba, int tag)
goto out;
}
+ /*
+ * When the host software receives "FUNCTION COMPLETE", set a flag to
+ * requeue the command once its response arrives with OCS_ABORTED.
+ * SDB mode: UTRLCLR returns a Task Management response meaning a
+ * Transfer Request was aborted.
+ * MCQ mode: the host posts a CQ entry with OCS_ABORTED after SQ cleanup.
+ *
+ * This flag is needed because the error handler (ufshcd_abort_all)
+ * forcibly aborts all commands and the host controller automatically
+ * fills the OCS field of the corresponding responses with OCS_ABORTED;
+ * upon receiving such a response, the command needs to be requeued.
+ */
+ if (!err)
+ lrbp->abort_initiated_by_eh = true;
+
err = ufshcd_clear_cmd(hba, tag);
if (err)
dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err %d\n",
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index 0fd2aebac728..07b5e60e6c44 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -173,6 +173,8 @@ struct ufs_pm_lvl_states {
* @crypto_key_slot: the key slot to use for inline crypto (-1 if none)
* @data_unit_num: the data unit number for the first block for inline crypto
* @req_abort_skip: skip request abort task flag
+ * @abort_initiated_by_eh: set when the abort was forcibly initiated by the
+ * error handler (ufshcd_err_handler); such commands are requeued
*/
struct ufshcd_lrb {
struct utp_transfer_req_desc *utr_descriptor_ptr;
@@ -202,6 +204,7 @@ struct ufshcd_lrb {
#endif
bool req_abort_skip;
+ bool abort_initiated_by_eh;
};
/**
--
2.45.2
From: Jason Andryuk <jason.andryuk(a)amd.com>
Probing xen-fbfront faults in video_is_primary_device(). The passed-in
struct device is NULL because xen-fbfront never assigns it and the
fb_info is kzalloc()-ed. Assign fb_info->device to avoid this.
This was exposed by the conversion of fb_is_primary_device() to
video_is_primary_device(), which dropped the NULL check for the struct
device.
Fixes: f178e96de7f0 ("arch: Remove struct fb_info from video helpers")
Reported-by: Arthur Borsboom <arthurborsboom(a)gmail.com>
Closes: https://lore.kernel.org/xen-devel/CALUcmUncX=LkXWeiSiTKsDY-cOe8QksWhFvcCneO…
Tested-by: Arthur Borsboom <arthurborsboom(a)gmail.com>
CC: stable(a)vger.kernel.org
Signed-off-by: Jason Andryuk <jason.andryuk(a)amd.com>
---
The other option would be to re-instate the NULL check in
video_is_primary_device()
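For reference, that alternative would look roughly like the sketch below,
modelled on the x86 implementation (the early return is the hypothetical
addition; treat the rest as an approximation rather than a quote of the
current code):

#include <linux/device.h>
#include <linux/pci.h>
#include <linux/vgaarb.h>

/* Sketch only: guard the helper itself against a missing struct device. */
bool video_is_primary_device(struct device *dev)
{
        struct pci_dev *pdev;

        if (!dev)              /* e.g. fb_info->device left unset by a driver */
                return false;

        if (!dev_is_pci(dev))
                return false;

        pdev = to_pci_dev(dev);

        return pdev == vga_default_device();
}

The patch below takes the simpler route of fixing the one driver that
passes a NULL device.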
---
drivers/video/fbdev/xen-fbfront.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/video/fbdev/xen-fbfront.c b/drivers/video/fbdev/xen-fbfront.c
index 66d4628a96ae..c90f48ebb15e 100644
--- a/drivers/video/fbdev/xen-fbfront.c
+++ b/drivers/video/fbdev/xen-fbfront.c
@@ -407,6 +407,7 @@ static int xenfb_probe(struct xenbus_device *dev,
/* complete the abuse: */
fb_info->pseudo_palette = fb_info->par;
fb_info->par = info;
+ fb_info->device = &dev->dev;
fb_info->screen_buffer = info->fb;
--
2.43.0
From: Willem de Bruijn <willemb(a)google.com>
The referenced commit drops bad input, but has false positives.
Tighten the check to avoid these.
The check detects illegal checksum offload requests, which produce a
csum_start/csum_offset beyond the end of the packet after segmentation.
But it is based on two incorrect assumptions:
1. virtio_net_hdr_to_skb with VIRTIO_NET_HDR_GSO_TCP[46] implies GSO.
True in callers that inject into the tx path, such as tap.
But false in callers that inject into rx, like virtio-net.
Here, the flags indicate GRO, and CHECKSUM_UNNECESSARY or
CHECKSUM_NONE without VIRTIO_NET_HDR_F_NEEDS_CSUM is normal.
2. TSO requires checksum offload, i.e., ip_summed == CHECKSUM_PARTIAL.
False, as tcp[46]_gso_segment will fix up csum_start and offset for
all other ip_summed by calling __tcp_v4_send_check.
Because of 2, we can limit the scope of the fix to virtio_net_hdrs that
do try to set these fields, but with a bogus value.
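Condensed into a standalone predicate (hypothetical helper name; the real
change is the one-hunk diff below), the tightened rule is: only validate
csum_offset when checksum offload was actually requested.

#include <linux/skbuff.h>
#include <linux/tcp.h>

/*
 * Hypothetical restatement of the tightened check: a bogus csum_offset
 * only matters when the sender requested checksum offload, i.e. when
 * ip_summed == CHECKSUM_PARTIAL.
 */
static inline bool tcp_gso_csum_offset_ok(const struct sk_buff *skb)
{
        return skb->ip_summed != CHECKSUM_PARTIAL ||
               skb->csum_offset == offsetof(struct tcphdr, check);
}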
Link: https://lore.kernel.org/netdev/20240909094527.GA3048202@port70.net/
Fixes: 89add40066f9 ("net: drop bad gso csum_start and offset in virtio_net_hdr")
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Cc: <stable(a)vger.kernel.org>
---
Verified that the syzbot repro is still caught.
An equivalent alternative would be to move the check for csum_offset
to where the csum_start check is in segmentation:
- if (unlikely(skb_checksum_start(skb) != skb_transport_header(skb)))
+ if (unlikely(skb_checksum_start(skb) != skb_transport_header(skb) ||
+ skb->csum_offset != offsetof(struct tcphdr, check)))
Cleaner, but a messier stable backport.
We'll need an equivalent patch to this for VIRTIO_NET_HDR_GSO_UDP_L4,
but that csum_offset test was added in a different commit, so it needs
a different Fixes tag.
---
include/linux/virtio_net.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 6c395a2600e8d..276ca543ef44d 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -173,7 +173,8 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
break;
case SKB_GSO_TCPV4:
case SKB_GSO_TCPV6:
- if (skb->csum_offset != offsetof(struct tcphdr, check))
+ if (skb->ip_summed == CHECKSUM_PARTIAL &&
+ skb->csum_offset != offsetof(struct tcphdr, check))
return -EINVAL;
break;
}
--
2.46.0.598.g6f2099f65c-goog
Supposing the first scenario with a virtio_blk driver.
CPU0                                                CPU1

blk_mq_try_issue_directly()
  __blk_mq_issue_directly()
    q->mq_ops->queue_rq()
      virtio_queue_rq()
        blk_mq_stop_hw_queue()
                                                    virtblk_done()
  blk_mq_request_bypass_insert()                      blk_mq_start_stopped_hw_queues()
  /* Add IO request to dispatch list */ 1) store        blk_mq_start_stopped_hw_queue()
                                                          clear_bit(BLK_MQ_S_STOPPED)       3) store
  blk_mq_run_hw_queue()                                   blk_mq_run_hw_queue()
    if (!blk_mq_hctx_has_pending())                         if (!blk_mq_hctx_has_pending()) 4) load
      return                                                  return
    blk_mq_sched_dispatch_requests()                        blk_mq_sched_dispatch_requests()
      if (blk_mq_hctx_stopped())        2) load               if (blk_mq_hctx_stopped())
        return                                                  return
      __blk_mq_sched_dispatch_requests()                      __blk_mq_sched_dispatch_requests()
Supposing another scenario.
CPU0                                                      CPU1

blk_mq_requeue_work()
  /* Add IO request to dispatch list */ 1) store          virtblk_done()
  blk_mq_run_hw_queues()/blk_mq_delay_run_hw_queues()       blk_mq_start_stopped_hw_queues()
    if (blk_mq_hctx_stopped())          2) load               blk_mq_start_stopped_hw_queue()
      continue                                                  clear_bit(BLK_MQ_S_STOPPED)       3) store
    blk_mq_run_hw_queue()/blk_mq_delay_run_hw_queue()           blk_mq_run_hw_queue()
                                                                  if (!blk_mq_hctx_has_pending()) 4) load
                                                                    return
                                                                  blk_mq_sched_dispatch_requests()
Both scenarios are similar: a full memory barrier has to be inserted
between 1) and 2), as well as between 3) and 4), so that either CPU0
sees that BLK_MQ_S_STOPPED has been cleared or CPU1 sees the request on
the dispatch list. Otherwise neither CPU reruns the hardware queue and
the request is starved.
The easy way to fix it is to add the required full memory barrier to the
blk_mq_hctx_stopped() helper. To avoid penalizing the fast path (the
hardware queue is not stopped most of the time), the barrier is only
inserted into the slow path; only the slow path needs to care about a
missed dispatch of the request to the low-level device driver.
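Viewed in isolation, the pairing is the classic store-buffering pattern.
The sketch below is illustrative only (hypothetical struct and helpers,
not blk-mq code): each side stores first, issues a full barrier, then
loads what the other side stores, so at least one of the two loads is
guaranteed to observe the other CPU's store.

#include <linux/bitops.h>
#include <linux/compiler.h>
#include <linux/types.h>

/* Illustrative stand-ins for the hctx state bit and the dispatch list. */
struct hwq {
        unsigned long state; /* bit 0 plays the role of BLK_MQ_S_STOPPED */
        bool work_pending;   /* stands in for a non-empty dispatch list  */
};

/* Submitter: publish the work, then check whether the queue is stopped. */
static bool submit_side(struct hwq *q)
{
        WRITE_ONCE(q->work_pending, true);  /* 1) store */
        smp_mb();                           /* pairs with the completer */
        return !test_bit(0, &q->state);     /* 2) load  */
}

/* Completer: restart the queue, then check whether work is pending. */
static bool complete_side(struct hwq *q)
{
        clear_bit(0, &q->state);            /* 3) store */
        smp_mb__after_atomic();             /* pairs with the submitter */
        return READ_ONCE(q->work_pending);  /* 4) load  */
}

Without the barriers both loads may miss the other CPU's store and the
request sits on the dispatch list with nobody willing to run the queue;
with them, at least one side sees the other's store and dispatches.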
Fixes: 320ae51feed5c ("blk-mq: new multi-queue block IO queueing mechanism")
Cc: stable(a)vger.kernel.org
Cc: Muchun Song <muchun.song(a)linux.dev>
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
---
block/blk-mq.c | 6 ++++++
block/blk-mq.h | 13 +++++++++++++
2 files changed, 19 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ac39f2a346a52..48a6a437fba5e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2413,6 +2413,12 @@ void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
return;
clear_bit(BLK_MQ_S_STOPPED, &hctx->state);
+ /*
+ * Pairs with the smp_mb() in blk_mq_hctx_stopped() to order the
+ * clearing of BLK_MQ_S_STOPPED above and the checking of dispatch
+ * list in the subsequent routine.
+ */
+ smp_mb__after_atomic();
blk_mq_run_hw_queue(hctx, async);
}
EXPORT_SYMBOL_GPL(blk_mq_start_stopped_hw_queue);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 260beea8e332c..f36f3bff70d86 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -228,6 +228,19 @@ static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data
static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
{
+ /* Fast path: hardware queue is not stopped most of the time. */
+ if (likely(!test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
+ return false;
+
+ /*
+ * This barrier is used to order adding of dispatch list before and
+ * the test of BLK_MQ_S_STOPPED below. Pairs with the memory barrier
+ * in blk_mq_start_stopped_hw_queue() so that dispatch code could
+ * either see BLK_MQ_S_STOPPED is cleared or dispatch list is not
+ * empty to avoid missing dispatching requests.
+ */
+ smp_mb();
+
return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
}
--
2.20.1
Supposing the following scenario with a virtio_blk driver.
CPU0                                CPU1                            CPU2

blk_mq_try_issue_directly()
  __blk_mq_issue_directly()
    q->mq_ops->queue_rq()
      virtio_queue_rq()
        blk_mq_stop_hw_queue()
                                    blk_mq_try_issue_directly()     virtblk_done()
                                      if (blk_mq_hctx_stopped())
  blk_mq_request_bypass_insert()                                      blk_mq_start_stopped_hw_queue()
  blk_mq_run_hw_queue()                                                 blk_mq_run_hw_queue()
                                      blk_mq_insert_request()
                                      return  // Who is responsible for dispatching this IO request?
After CPU0 has marked the queue as stopped, CPU1 sees that the queue is
stopped. But before CPU1 puts its request on the dispatch list, CPU2
receives the completion interrupt of an earlier request, so it runs the
hardware queue and marks the queue as non-stopped again. Meanwhile, CPU0
also runs the same hardware queue. After both CPU0 and CPU2 have
completed blk_mq_run_hw_queue(), CPU1 finally puts its request on the
dispatch list and returns, so nobody dispatches that request. Fix it by
running the hardware queue explicitly after the insert.
blk_mq_request_issue_directly() has to handle a similar situation; fix
it as well.
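Condensed, the fixed stopped/quiesced branch behaves like the sketch
below (a paraphrase written as if inside block/blk-mq.c, not a literal
excerpt of the hunks that follow):

/* Paraphrase only: insert the request, then kick the hardware queue. */
static void issue_directly_when_stopped(struct blk_mq_hw_ctx *hctx,
                                        struct request *rq)
{
        /*
         * If the queue is still stopped, the run returns early without
         * dispatching and is cheap; if another CPU has just restarted
         * the queue, the run guarantees the freshly inserted request is
         * picked up instead of being stranded.
         */
        blk_mq_insert_request(rq, 0);
        blk_mq_run_hw_queue(hctx, false); /* the call this patch adds */
}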
Fixes: d964f04a8fde8 ("blk-mq: fix direct issue")
Cc: stable(a)vger.kernel.org
Cc: Muchun Song <muchun.song(a)linux.dev>
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
---
block/blk-mq.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e3c3c0c21b553..b2d0f22de0c7f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2619,6 +2619,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(rq->q)) {
blk_mq_insert_request(rq, 0);
+ blk_mq_run_hw_queue(hctx, false);
return;
}
@@ -2649,6 +2650,7 @@ static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(rq->q)) {
blk_mq_insert_request(rq, 0);
+ blk_mq_run_hw_queue(hctx, false);
return BLK_STS_OK;
}
--
2.20.1