That's also why I'm not positive on the "no hw preemption, only scheduler" case: you still have a dma_fence for the batch itself, which means userspace-controlled synchronization and other forms of indefinite batches are still not allowed. So it doesn't get us any closer to enabling the compute use cases people want.
What compute use case are you talking about? I'm only aware of the wait-before-signal case from Vulkan, the page fault case, and the KFD preemption fence case.
Slight aside, but it does appear that Intel's Level Zero API exposes some of the same problems as Vulkan.
They have fences: "A fence cannot be shared across processes."
They have events (userspace fences) like Vulkan but specify: "Signaled from the host, and waited upon from within a device’s command list."
"There are no protections against events causing deadlocks, such as circular waits scenarios.
These problems are left to the application to avoid."
https://spec.oneapi.com/level-zero/latest/core/PROG.html#synchronization-pri...
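To make the "circular waits" wording concrete, here is a rough toy sketch of the situation the spec leaves to the application. It is not Level Zero code: the function name, the dict layout and the event names are all made up for illustration, and it assumes each command list waits on at most one event and signals one event.

```python
# Toy model only, NOT the Level Zero API. All names here are hypothetical.
# Each command list waits on one event before running and signals another
# event once it has run.
def find_circular_wait(command_lists):
    """command_lists maps name -> (waits_on_event, signals_event).

    Returns the command-list names that form a circular wait, or None.
    """
    # Which command list signals each event.
    signaller = {sig: name for name, (_, sig) in command_lists.items()}
    for start in command_lists:
        path = []
        current = start
        while current is not None and current not in path:
            path.append(current)
            waits_on = command_lists[current][0]
            # An event nobody in the map signals is assumed to be signalled
            # from the host, so the walk ends there.
            current = signaller.get(waits_on)
        if current is not None:
            # We came back to a command list already on the path: a cycle.
            return path[path.index(current):]
    return None


# A waits on the event B signals, and B waits on the event A signals.
deadlocked = {"A": ("ev_b", "ev_a"), "B": ("ev_a", "ev_b")}
# A waits on an event signalled from the host, so the chain can make progress.
fine = {"A": ("ev_host", "ev_a"), "B": ("ev_a", "ev_b")}
```

In the first case neither command list can ever start, and nothing in the API stops you from submitting it. With a dma_fence attached to work like that, the fence would never signal.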
Dave.