On Fri, Sep 13, 2024 at 01:23:01PM -0700, Rob Clark wrote:
From: Rob Clark <robdclark@chromium.org>
Fixes a race condition reported here: https://github.com/AsahiLinux/linux/issues/309#issuecomment-2238968609
The whole premise of lockless access to a single-producer-single-consumer queue is that there is just a single producer and single consumer. That means we can't call drm_sched_can_queue() (which is about queueing more work to the hw, not to the spsc queue) from anywhere other than the consumer (wq).
This call in the producer is just an optimization to avoid scheduling the consuming worker if it cannot yet queue more work to the hw. It is safe to drop this optimization to avoid the race condition.
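
The ownership rule described above can be sketched in a small user-space toy model. This is not the drm/sched code: every name below (toy_sched, toy_push, toy_can_queue, toy_worker) is hypothetical, the ring uses C11 atomics rather than the kernel's spsc_queue, and modelling the "can queue" check as a peek at the job at the head of the ring is an assumption made for illustration. The point is only that the producer pushes and wakes the worker unconditionally, while the check touches consumer-owned state and therefore runs only in the worker.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names are hypothetical, not kernel APIs. */
#define RING_SIZE 8

struct toy_sched {
    int ring[RING_SIZE];  /* each job is just its credit cost */
    atomic_size_t head;   /* advanced only by the consumer */
    atomic_size_t tail;   /* advanced only by the producer */
    atomic_int wakeups;   /* times the worker was asked to run */
    int hw_credits;       /* consumer-private: free credits on the "hw" */
    int submitted;        /* consumer-private: jobs handed to the "hw" */
};

static void toy_init(struct toy_sched *s, int hw_credits)
{
    atomic_init(&s->head, 0);
    atomic_init(&s->tail, 0);
    atomic_init(&s->wakeups, 0);
    s->hw_credits = hw_credits;
    s->submitted = 0;
}

/* Producer side: push, then wake the worker unconditionally.
 * It never inspects hw_credits or the job at the head of the ring. */
static bool toy_push(struct toy_sched *s, int cost)
{
    size_t tail = atomic_load_explicit(&s->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&s->head, memory_order_acquire);

    if (tail - head == RING_SIZE)
        return false; /* ring full */
    s->ring[tail % RING_SIZE] = cost;
    atomic_store_explicit(&s->tail, tail + 1, memory_order_release);
    atomic_fetch_add(&s->wakeups, 1); /* stand-in for scheduling the worker */
    return true;
}

/* Consumer-only check: peeks at the head job, which is stable only
 * because the caller is the one context that advances head. */
static bool toy_can_queue(struct toy_sched *s)
{
    size_t head = atomic_load_explicit(&s->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&s->tail, memory_order_acquire);

    assert(head != tail); /* caller must have seen a non-empty ring */
    return s->ring[head % RING_SIZE] <= s->hw_credits;
}

/* Worker body; returns how many jobs were submitted in this run. */
static int toy_worker(struct toy_sched *s)
{
    int n = 0;

    for (;;) {
        size_t head = atomic_load_explicit(&s->head, memory_order_relaxed);
        size_t tail = atomic_load_explicit(&s->tail, memory_order_acquire);

        if (head == tail || !toy_can_queue(s))
            break;
        s->hw_credits -= s->ring[head % RING_SIZE]; /* "submit" the job */
        s->submitted++;
        atomic_store_explicit(&s->head, head + 1, memory_order_release);
        n++;
    }
    return n;
}
```

In this model a wakeup that arrives while the hw is full costs only a worker run that submits nothing, which is the optimization the patch gives up.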
Suggested-by: Asahi Lina <lina@asahilina.net>
Fixes: a78422e9dff3 ("drm/sched: implement dynamic job-flow control")
Closes: https://github.com/AsahiLinux/linux/issues/309
Cc: stable@vger.kernel.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
 drivers/gpu/drm/scheduler/sched_entity.c | 4 ++--
 drivers/gpu/drm/scheduler/sched_main.c   | 7 ++-----
 include/drm/gpu_scheduler.h              | 2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)
Tested for several hours with CONFIG_PREEMPT=y and KASAN, using a workload similar to the one in the GitHub issue, without any reports or oopses.
Feel free to add

Tested-by: Janne Grunau <j@jannau.net>

thanks,
Janne