The Mesa issue referenced below pointed out a possible deadlock:
[ 1231.611031] Possible interrupt unsafe locking scenario:
[ 1231.611033] CPU0 CPU1
[ 1231.611034] ---- ----
[ 1231.611035] lock(&xa->xa_lock#17);
[ 1231.611038] local_irq_disable();
[ 1231.611039] lock(&fence->lock);
[ 1231.611041] lock(&xa->xa_lock#17);
[ 1231.611044] <Interrupt>
[ 1231.611045] lock(&fence->lock);
[ 1231.611047]
*** DEADLOCK ***
In this example, CPU0 would be any function accessing job->dependencies
through the xa_* functions that doesn't disable interrupts (eg:
drm_sched_job_add_dependency, drm_sched_entity_kill_jobs_cb).
CPU1 is executing drm_sched_entity_kill_jobs_cb as a fence signalling
callback so in an interrupt context. It will deadlock when trying to
grab the xa_lock which is already held by CPU0.
Replacing all xa_* usage by their xa_*_irq counterparts would fix
this issue, but Christian pointed out another issue: dma_fence_signal
takes fence.lock and so does dma_fence_add_callback.
dma_fence_signal() // locks f1.lock
-> drm_sched_entity_kill_jobs_cb()
-> foreach dependencies
-> dma_fence_add_callback() // locks f2.lock
This will deadlock if f1 and f2 share the same spinlock.
To fix both issues, the code iterating on dependencies and re-arming them
is moved out to drm_sched_entity_kill_jobs_work.
v2: reworded commit message (Philipp)
v3: added Fixes tag (Philipp)
Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini")
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13908
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
Suggested-by: Christian König <christian.koenig(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>
---
drivers/gpu/drm/scheduler/sched_entity.c | 34 +++++++++++++-----------
1 file changed, 19 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index c8e949f4a568..fe174a4857be 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -173,26 +173,15 @@ int drm_sched_entity_error(struct drm_sched_entity *entity)
}
EXPORT_SYMBOL(drm_sched_entity_error);
+static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
+ struct dma_fence_cb *cb);
+
static void drm_sched_entity_kill_jobs_work(struct work_struct *wrk)
{
struct drm_sched_job *job = container_of(wrk, typeof(*job), work);
-
- drm_sched_fence_scheduled(job->s_fence, NULL);
- drm_sched_fence_finished(job->s_fence, -ESRCH);
- WARN_ON(job->s_fence->parent);
- job->sched->ops->free_job(job);
-}
-
-/* Signal the scheduler finished fence when the entity in question is killed. */
-static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
- struct dma_fence_cb *cb)
-{
- struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
- finish_cb);
+ struct dma_fence *f;
unsigned long index;
- dma_fence_put(f);
-
/* Wait for all dependencies to avoid data corruptions */
xa_for_each(&job->dependencies, index, f) {
struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
@@ -220,6 +209,21 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
dma_fence_put(f);
}
+ drm_sched_fence_scheduled(job->s_fence, NULL);
+ drm_sched_fence_finished(job->s_fence, -ESRCH);
+ WARN_ON(job->s_fence->parent);
+ job->sched->ops->free_job(job);
+}
+
+/* Signal the scheduler finished fence when the entity in question is killed. */
+static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
+ struct dma_fence_cb *cb)
+{
+ struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
+ finish_cb);
+
+ dma_fence_put(f);
+
INIT_WORK(&job->work, drm_sched_entity_kill_jobs_work);
schedule_work(&job->work);
}
--
2.43.0
This series adds AF_XDP zero coppy support to icssg driver.
Tests were performed on AM64x-EVM with xdpsock application [1].
A clear improvement is seen Transmit (txonly) and receive (rxdrop)
for 64 byte packets. 1500 byte test seems to be limited by line
rate (1G link) so no improvement seen there in packet rate
Having some issue with l2fwd as the benchmarking numbers show 0
for 64 byte packets after forwading first batch packets and I am
currently looking into it.
AF_XDP performance using 64 byte packets in Kpps.
Benchmark: XDP-SKB XDP-Native XDP-Native(ZeroCopy)
rxdrop 259 462 645
txonly 350 354 760
l2fwd 178 240 0
AF_XDP performance using 1500 byte packets in Kpps.
Benchmark: XDP-SKB XDP-Native XDP-Native(ZeroCopy)
rxdrop 82 82 82
txonly 81 82 82
l2fwd 81 82 82
[1]: https://github.com/xdp-project/bpf-examples/tree/master/AF_XDP-example
v3: https://lore.kernel.org/all/20251014105613.2808674-1-m-malladi@ti.com/
v4-v3:
- Rebased to the latest tip
Meghana Malladi (6):
net: ti: icssg-prueth: Add functions to create and destroy Rx/Tx
queues
net: ti: icssg-prueth: Add XSK pool helpers
net: ti: icssg-prueth: Add AF_XDP zero copy for TX
net: ti: icssg-prueth: Make emac_run_xdp function independent of page
net: ti: icssg-prueth: Add AF_XDP zero copy for RX
net: ti: icssg-prueth: Enable zero copy in XDP features
drivers/net/ethernet/ti/icssg/icssg_common.c | 471 ++++++++++++++++---
drivers/net/ethernet/ti/icssg/icssg_prueth.c | 394 +++++++++++++---
drivers/net/ethernet/ti/icssg/icssg_prueth.h | 25 +-
3 files changed, 741 insertions(+), 149 deletions(-)
base-commit: d550d63d0082268a31e93a10c64cbc2476b98b24
--
2.43.0