Re: [Linaro-mm-sig] [PATCH 1/2] dma-buf.rst: Document why indefinite fences are a bad idea

22 Jul 2020

      On Wed, Jul 22, 2020 at 3:12 PM Thomas Hellström (Intel)
thomas_os@shipmail.org wrote:
...
On 2020-07-22 14:41, Daniel Vetter wrote:
...
Ah I think I misunderstood which options you want to compare here. I'm
not sure how much pain fixing up "dma-fence as memory fence" really
is. That's kinda why I want a lot more testing on my annotation
patches, to figure that out. Not much feedback aside from amdgpu and
intel, and those two drivers pretty much need to sort out their memory
fence issues anyway (because of userptr and stuff like that).
The only other issues outside of these two drivers I'm aware of:

various scheduler drivers doing allocations in the drm/scheduler

critical section. Since all arm-soc drivers have a mildly shoddy
memory model of "we just pin everything" they don't really have to
deal with this. So we might just declare arm as a platform broken and
not taint the dma-fence critical sections with fs_reclaim. Otoh we
need to fix this for drm/scheduler anyway, I think best option would
be to have a mempool for hw fences in the scheduler itself, and at
that point fixing the other drivers shouldn't be too onerous.

vmwgfx doing a dma_resv in the atomic commit tail. Entirely

orthogonal to the entire memory fence discussion.
With vmwgfx there is another issue that is hit when the gpu signals an
error. At that point the batch might be restarted with a new meta
command buffer that needs to be allocated out of a dma pool. in the
fence critical section. That's probably a bit nasty to fix, but not
impossible.
Yeah reset is fun. From what I've seen this isn't any worse than the
hw allocation issue for drm/scheduler drivers, they just allocate
another hw fence with all that drags along. So the same mempool should
be sufficient.
The really nasty thing around reset is display interactions, because
you just can't take drm_modeset_lock. amdgpu fixed that now (at least
the modeset_lock side, not yet the memory allocations that brings
along). i915 has the same problem for gen2/3 (so really old stuff),
and we've solved that by breaking&restarting all i915 fence waits, but
that predates multi-gpu and wont work for shared fences ofc. But it's
so old and predates all multi-gpu laptops that I think wontfix is the
right take.
Other drm/scheduler drivers don't have that problem since they're all
render-only, so no display driver interaction.
...
...
I'm pretty sure there's more bugs, I just haven't heard from them yet.
Also due to the opt-in nature of dma-fence we can limit the scope of
what we fix fairly naturally, just don't put them where no one cares
:-) Of course that also hides general locking issues in dma_fence
signalling code, but well *shrug*.
Hmm, yes. Another potential big problem would be drivers that want to
use gpu page faults in the dma-fence critical sections with the
batch-based programming model.
Yeah that's a massive can of worms. But luckily there's no such driver
merged in upstream, so hopefully we can think about all the
constraints and how to best annotate&enforce this before we land any
code and have big regrets.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Re: [Linaro-mm-sig] [PATCH 1/2] dma-buf.rst: Document why indefinite fences are a bad idea