On 11/25/25 08:55, Philipp Stanner wrote:
On Thu, 2025-11-20 at 15:41 +0100, Christian König wrote:
Add a define implementations can use as a reasonable maximum signaling timeout. Document that implementations should taint the kernel if config options push the timeout beyond this value.
Tainting the kernel is important so that bug reports can show that end users might be running a problematic configuration.
Signed-off-by: Christian König <christian.koenig@amd.com>
 include/linux/dma-fence.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 64639e104110..b31dfa501c84 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -28,6 +28,20 @@ struct dma_fence_ops;
 struct dma_fence_cb;
 struct seq_file;
 
+/**
+ * DMA_FENCE_MAX_REASONABLE_TIMEOUT - max reasonable signaling timeout
+ *
+ * The dma_fence object has a deep inter dependency with core memory
+ * management, for a detailed explanation see section DMA Fences under
+ * Documentation/driver-api/dma-buf.rst.
+ *
+ * Because of this all dma_fence implementations must guarantee that each fence
+ * completes in a finite time. This define here now gives a reasonable value for
+ * the timeout to use. It is possible to use a longer timeout in an
+ * implementation but that should taint the kernel.
+ */
+#define DMA_FENCE_MAX_REASONABLE_TIMEOUT	(2*HZ)
+
HZ can change depending on the config. Is that really a good choice? I could see racy situations arising in some configs vs others
2*HZ is always two seconds expressed in jiffies; I can use msecs_to_jiffies(2000) to make that more obvious.
The GPU scheduler has a very similar define, MAX_WAIT_SCHED_ENTITY_Q_EMPTY which is currently just 1 second.
The real question is what is the maximum amount of time we can wait for the HW before we should trigger a timeout?
Some AMD internal team is pushing for 10 seconds, but that also means that, for example, we wait 10 seconds for the OOM killer to do something. That sounds way too long.
Regards, Christian.
P.