On 2020-07-22 00:45, Dave Airlie wrote:
On Tue, 21 Jul 2020 at 18:47, Thomas Hellström (Intel) <thomas_os@shipmail.org> wrote:
On 7/21/20 9:45 AM, Christian König wrote:
Am 21.07.20 um 09:41 schrieb Daniel Vetter:
On Mon, Jul 20, 2020 at 01:15:17PM +0200, Thomas Hellström (Intel) wrote:
Hi,
On 7/9/20 2:33 PM, Daniel Vetter wrote:
Comes up every few years, gets somewhat tedious to discuss, let's write this down once and for all.
What I'm not sure about is whether the text should be more explicit in flat out mandating the amdkfd eviction fences for long running compute workloads or workloads where userspace fencing is allowed.
Although (in my humble opinion) it might be possible to completely untangle kernel-introduced fences for resource management from dma-fences used for completion and dependency tracking, and so lift a lot of restrictions on the dma-fences, including the prohibition of infinite ones, I think it makes sense for this text to describe the current state.
Yeah I think a future patch needs to type up how we want to make that happen (for some cross driver consistency) and what needs to be considered. Some of the necessary parts are already there (with like the preemption fences amdkfd has as an example), but I think some clear docs on what's required from both hw, drivers and userspace would be really good.
I'm currently writing that up, but probably still need a few days for this.
Great! I put down some (very) initial thoughts a couple of weeks ago building on eviction fences for various hardware complexity levels here:
https://gitlab.freedesktop.org/thomash/docs/-/blob/master/Untangling%20dma-f...
We are seeing HW that has recoverable GPU page faults, but only for compute tasks, and a scheduler without hardware semaphores for graphics.
So a single driver may have to expose both models to userspace, which also introduces the problem of how to interoperate between the two models on one card.
Dave.
Hmm, yes. To begin with, it's important to note that this is not a replacement for new programming models or APIs. This is something that takes place internally in drivers to mitigate many of the restrictions that are currently imposed on dma-fence and documented in this and previous series. It's basically the driver-private narrow completions Jason suggested in the lockdep patch discussions, implemented the same way as eviction fences.
The memory fence API would be local to helpers and middle-layers like TTM, and the corresponding drivers. The only cross-driver-like visibility would be that the dma-buf move_notify() callback would not be allowed to wait on dma-fences or something that depends on a dma-fence.
So with that in mind, I don't foresee engines with different capabilities on the same card being a problem.
/Thomas