On Fri, Jun 19, 2020 at 10:13:35AM +0100, Chris Wilson wrote:
Quoting Daniel Vetter (2020-06-19 09:51:59)
On Fri, Jun 19, 2020 at 10:25 AM Chris Wilson chris@chris-wilson.co.uk wrote:
Forcing a generic primitive to always be part of the same global map is horrible.
And no concrete example or reason for why that's not possible. Because frankly it's not horrible, this is what upstream is all about: Shared concepts, shared contracts, shared code.
The proposed patches might very well encode the wrong contract, that's all up for discussion. But fundamentally questioning that we need one is missing what upstream is all about.
Then I have not clearly communicated, as my opinion is not that validation is worthless, but that the implementation is enshrining a global property on a low level primitive that prevents it from being used elsewhere. And I want to replace completion [chains] with fences, and bio with fences, and closures with fences, and what other equivalencies there are in the kernel. The fence is as central a locking construct as struct completion and deserves to be a foundational primitive provided by kernel/ used throughout all drivers for discrete problem domains.
This is narrowing dma_fence whereby adding struct lockdep_map *dma_fence::wait_map and annotating linkage, allows you to continue to specify that all dma_fence used for a particular purpose must follow common rules, without restricting the primitive for uses outside of this scope.
Somewhere else in this thread I had discussions with Jason Gunthorpe about this topic. It might maybe change somewhat depending upon exact rules, but his take is very much "I don't want dma_fence in rdma". Or pretty close to that at least.
Similar discussions with habanalabs, they're using dma_fence internally without any of the uapi. Discussion there has also now concluded that it's best if they remove them, and simply switch over to a wait_queue or completion like every other driver does.
The next round of the patches already have a paragraph to at least somewhat limit how non-gpu drivers use dma_fence. And I guess actual consensus might be pointing even more strongly at dma_fence being solely something for gpus and closely related subsystem (maybe media) for syncing dma-buf access.
So dma_fence as general replacement for completion chains I think just wont happen.
What might make sense is if e.g. the lockdep annotations could be reused, at least in design, for wait_queue or completion or anything else really. I do think that has a fair chance compared to the automagic cross-release annotations approach, which relied way too heavily on guessing where barriers are. My experience from just a bit of playing around with these patches here and discussing them with other driver maintainers is that accurately deciding where critical sections start and end is a job for humans only. And if you get it wrong, you will have a false positive.
And you're indeed correct that if we'd do annotations for completions and wait queues, then that would need to have a class per semantically equivalent user, like we have lockdep classes for mutexes, not just one overall.
But dma_fence otoh is something very specific, which comes with very specific rules attached - it's not a generic wait_queue at all. Originally it did start out as one even, but it is a very specialized wait_queue.
So there's imo two cases:
- Your completion is entirely orthogonal of dma_fences, and can never ever block a dma_fence. Don't use dma_fence for this, and no problem. It's just another wait_queue somewhere.
- Your completion can eventually, maybe through lots of convolutions and depdencies, block a dma_fence. In that case full dma_fence rules apply, and the only thing you can do with a custom annotation is make the rules even stricter. E.g. if a sub-timeline in the scheduler isn't allowed to take certain scheduler locks. But the userspace visible/published fence do take them, maybe as part of command submission or retirement. Entirely hypotethical, no idea any driver actually needs this.
Cheers, Daniel