On Tue, Jul 21, 2020 at 7:46 PM Thomas Hellström (Intel) thomas_os@shipmail.org wrote:
On 2020-07-21 15:59, Christian König wrote:
Am 21.07.20 um 12:47 schrieb Thomas Hellström (Intel):
...
Yes, we can't do magic. As soon as an indefinite batch makes it to such hardware we've lost. But since we can break out while the batch is stuck in the scheduler waiting, what I believe we *can* do with this approach is avoid deadlocks due to locally unknown dependencies, which has some bearing on this documentation patch. It would also allow memory allocation in dma-fence (not memory-fence) critical sections, like gpu fault and error handlers, without resorting to memory pools.
Avoiding deadlocks is only the tip of the iceberg here.
When you allow the kernel to depend on user space to proceed with some operation there are a lot more things which need consideration.
E.g. what happens when a userspace process which has submitted stuff to the kernel is killed? Are the prepared commands sent to the hardware or aborted as well? What do we do with other processes waiting for that stuff?
How do we do resource accounting? When processes need to block while submitting stuff to the hardware which is not ready, we have a process we can punish for blocking resources. But how is kernel memory used for a submission accounted? How do we avoid denial of service attacks here, where somebody eats up all memory by doing submissions which can't finish?
Hmm. Are these problems really unique to user-space controlled dependencies? Couldn't you hit the same or similar problems with misbehaving shaders blocking timeline progress?
We just kill them, which we can because stuff needs to complete in a timely fashion, and without any further intervention - all prerequisite dependencies must be and are known by the kernel.
But the long/endless running compute stuff with userspace sync points and everything free-wheeling, including stuff like "hey I'll submit this batch but the memory isn't even all allocated yet, so I'm just going to hang it on this semaphore until that's done", is entirely different. There, just shooting the batch kills the programming model, and arbitrarily holding up a batch for another one to first get its memory also breaks it, because userspace might have issued them with dependencies in the other order.
So with that execution model you don't run batches, but just an entire context. It's up to userspace what it does with that, and like with cpu threads, just running a busy loop doing nothing is a perfectly legit workload (from the kernel's pov at least). Nothing in the kernel ever waits on such a context to do anything; if the kernel needs something you just preempt (or if it's memory and you have gpu page fault handling, rip out the page). Accounting is all done on a specific gpu context too. And we probably need a somewhat consistent approach to how we handle these gpu context things (definitely needed for cgroups and all that). -Daniel