On 15.03.2018 at 10:20, Daniel Vetter wrote:
On Tue, Mar 13, 2018 at 06:20:07PM +0100, Christian König wrote:
[SNIP]
Take a look at the DOT graphs for atomic I did a while ago. I think we could hold a formidable competition for who's doing the worst diagrams :-)
Thanks, going to give that a try.
[SNIP]
amdgpu: Expects that you never hold any of the heavyweight locks while waiting for a fence (since GPU resets will need them).
i915: Happily blocks on fences while holding all kinds of locks, and expects GPU reset to be able to recover even in this case.
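To make the amdgpu rule concrete, here is a minimal debug-check sketch in C. It assumes a hypothetical per-thread counter of "heavyweight" locks and a hypothetical `checked_fence_wait()` helper; none of these names exist in amdgpu or the dma_fence API, and the real locking and waiting are elided. The check refuses to block while any such lock is held, because a GPU reset that needs that lock could otherwise never signal the fence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-thread count of "heavyweight" locks held
 * (e.g. VM page-table reservation locks); the real locking is elided. */
static _Thread_local int heavy_locks_held;

static void heavy_lock(void)
{
    heavy_locks_held++;
}

static void heavy_unlock(void)
{
    assert(heavy_locks_held > 0); /* catch unbalanced unlocks */
    heavy_locks_held--;
}

/* Hypothetical fence wait enforcing the amdgpu-style rule: waiting while
 * a heavyweight lock is held could stall a GPU reset that needs that lock,
 * so report the violation and return false instead of blocking. */
static bool checked_fence_wait(void)
{
    if (heavy_locks_held > 0) {
        fprintf(stderr, "refusing fence wait: %d heavyweight lock(s) held\n",
                heavy_locks_held);
        return false;
    }
    /* ... the actual wait for the fence to signal would go here ... */
    return true;
}
```

Usage: `checked_fence_wait()` succeeds with no locks held, fails between `heavy_lock()` and `heavy_unlock()`, and succeeds again afterwards. The i915 behaviour described above is exactly the case this check would flag.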
In this case I can reassure you: the locks amdgpu needs to grab during GPU reset are the reservation locks of the VM page tables. I strongly doubt that i915 will ever hold those.
We could still run into problems if thread A holds lock 1 and tries to take lock 2, while i915 holds lock 2 and our reset path needs lock 1.
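Once the fence is counted, the scenario above is a three-way cycle, assuming i915 is blocked on an amdgpu fence while holding lock 2 and the amdgpu reset path must take lock 1 before that fence can signal. A minimal sketch in C of the lockdep/cross-release idea: treat the fence as a pseudo-lock "held" by whoever will signal it, record "while holding X, we block on Y" edges, and reject any edge that closes a cycle. The node names and functions are hypothetical; this is a toy graph, not the kernel's lockdep API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy dependency graph, not the kernel's lockdep. A fence is modelled as
 * a lock that is "released" by whoever signals it (the cross-release idea). */
enum { LOCK1, LOCK2, FENCE, NODES };

static const char *const node_name[NODES] = { "lock 1", "lock 2", "fence" };

/* dep[a][b] == true: something blocks on b while holding a. */
static bool dep[NODES][NODES];

/* Depth-first search: is there a recorded chain from 'from' to 'to'? */
static bool reaches(int from, int to, bool seen[NODES])
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int n = 0; n < NODES; n++)
        if (dep[from][n] && !seen[n] && reaches(n, to, seen))
            return true;
    return false;
}

/* Record "while holding 'held', we block on 'wanted'".
 * Returns false (and records nothing) if the new edge closes a cycle. */
static bool add_dependency(int held, int wanted)
{
    bool seen[NODES] = { false };

    assert(held >= 0 && held < NODES && wanted >= 0 && wanted < NODES);
    if (reaches(wanted, held, seen)) {
        printf("possible deadlock: %s -> %s closes a cycle\n",
               node_name[held], node_name[wanted]);
        return false;
    }
    dep[held][wanted] = true;
    return true;
}
```

Usage, following the scenario: thread A records `add_dependency(LOCK1, LOCK2)`; the reset path, which effectively holds the unsignalled fence, records `add_dependency(FENCE, LOCK1)`; when i915 then waits on the fence while holding lock 2, `add_dependency(LOCK2, FENCE)` returns false because lock 2 → fence → lock 1 → lock 2 is a cycle. No single pair of locks is inverted here, which is why plain lock-order checking misses it without the fence annotation.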
[SNIP]
Yes, except for fallback paths and boot-up self-tests we simply never wait for fences while holding locks.
That's not what I meant by "are you sure". Did you enable the cross-release stuff (after patching the bunch of leftover core kernel issues still present), annotate dma_fence with the cross-release stuff, run a bunch of multi-driver (amdgpu vs i915) dma-buf sharing tests and weep?
OK, what exactly do you mean by cross-release checking?
I didn't do the full thing yet, but just within i915 we've found tons of small deadlocks we never really considered, thanks to cross-release, and that wasn't even including the dma_fence annotation. Luckily nothing that needed a full-on driver redesign.
I guess I need to ping core kernel maintainers about cross-release again. I'd much prefer if we could validate ->invalidate_mapping and the locking/fence dependency issues using that, instead of me having to read and understand all the drivers.
[SNIP]
I fear that with the ->invalidate_mapping callback (which inverts the control flow between importer and exporter) and tying dma_fences into all this it will be a _lot_ worse. And I'm definitely too stupid to understand all the dependency chains without the aid of lockdep and a full test suite (we have a bunch of amdgpu/i915 dma-buf tests in igt btw).
Yes, that is also something I worry about.
Regards, Christian.