On 5/19/26 19:08, Xaver Hugl wrote:
The part where we get this independent of attached hardware is quite important for us though, since we can't just ignore explicit sync once the device we previously imported the syncobj into is disconnected.
Can you elaborate more on this?
In Wayland, the client is allowed to attach dmabuf and syncobj independently, they don't have to be from the same device (and the compositor wouldn't be able to verify the opposite anyways). The compositor will usually import both into the same drm device, but especially with compositors that render on multiple devices, that's not necessarily the case either.
If for example we had a system with one internal GPU and one external GPU, the client renders on the internal GPU and the compositor uses the external one. Now when the user yanks the USB C cable, afaiu
Well I would say the other way around is a pretty common use case.
In other words, the compositor uses the internal GPU for composing and displaying the picture, and the client uses the external GPU for fast rendering.
- the buffers from the client stay valid
Buffers from the hot-unplugged GPU don't stay valid. Accessing CPU mappings either results in a SIGBUS or is redirected to a dummy page.
DMA operations to hot-unplugged buffers from other GPUs (or, more generally, other devices) are waited on before the underlying resource is removed (e.g. system memory or PCIe address space or whatever is backing that).
But no new DMA operations are usually permitted to start.
- the syncobj stays valid on the client side
- the syncobj becomes invalid on the compositor side
Nope, that's not correct. The syncobj itself stays valid even if you completely hot-unplug the device.
It can just be that the fences inside the syncobj are terminated with an error.
"invalid" there means either
- the acquire point of the client is marked as signaled before rendering on the client side is completed
- the acquire point of the client is never signaled. Since the compositor waits for the acquire point, the Wayland surface is stuck forever
Both of those would be a *massive* violation of documented kernel rules for hot-plugging which could lead to random data corruption and/or deadlocks.
If you see any HW driver showing behavior like that please open up a bug report and ping the relevant maintainers immediately.
When a hot-unplug happens, all operations of the device should return an -ENODEV error, even when exposed to other devices/applications through syncobj or syncfile.
One problem is that only syncfile allows for querying such error codes at the moment. We have patches pending to add that to syncobj as well, but we lack a compositor that supports it as a userspace client.
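As a rough sketch of what querying that error looks like on the syncfile side today: the helper name sync_file_status() is made up for illustration, while SYNC_IOC_FILE_INFO and struct sync_file_info come from <linux/sync_file.h>. With num_fences left at 0 the kernel only fills in the aggregated status.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>

/*
 * Hypothetical helper: query the aggregated fence status of a sync_file.
 *
 * Returns 0 if the query itself worked, or a negative errno if the ioctl
 * failed (in which case *status is left untouched).
 *
 * On success *status is:
 *    0  fence still active
 *    1  fence signaled without error
 *   <0  fence signaled with an error, e.g. -ENODEV after a hot-unplug
 */
static int sync_file_status(int sync_file_fd, int *status)
{
    struct sync_file_info info;

    assert(status != NULL);

    /* flags and pad must be zero; num_fences == 0 asks for the status only */
    memset(&info, 0, sizeof(info));

    if (ioctl(sync_file_fd, SYNC_IOC_FILE_INFO, &info) < 0)
        return -errno;

    *status = info.status;
    return 0;
}
```

A compositor could call this once the fence has signaled and treat a negative status as "the producer went away" instead of sampling the buffer.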
Afaik the latter is currently the case. The former wouldn't be much better though, not when it's preventable.
This is admittedly an edge case, but GPU hotunplug is something we try to support as well as possible in Plasma, and all the edge cases cause a lot of problems in combination and are a lot of headaches to handle (or really work around) in the compositor.
Well exactly that design is used in the Tesla 3 infotainment system for example.
So GPU hotplug is actually a pretty common use case.
Another edge case is when the client asks the compositor to import the syncobj, which can fail when a hotunplug is in progress, and ends up disconnecting the client through no fault of either client or compositor.
Well, the question here is which device is gone: the one the compositor is using or the one the client is using?
If the client device is hot removed, the compositor should be perfectly capable of importing the syncobj.
If the compositor device is gone then you don't have a device to display anything any more, so generating the next frame doesn't seem to make sense either.
What could be the case is that you want the compositor to be kept alive even when the display device is gone, switching over to vkms or whatever so that a VNC session or other remote desktop still works.
- It removes the need to translate between syncobj fds and handles.
That's a pretty big no-go as well. The differentiation between FDs and handles is completely intentional.
Could you expand on why it's needed? For compositors, the handle is just an intermediary thing when translating between file descriptors.
Well, what we could do is add an IOCTL to directly attach a syncobj file descriptor to an eventfd.
That would be nice.
Take a look at drm_syncobj_file_fops and how drm_syncobj_add_eventfd() is used. Adding that functionality shouldn't be more than a typing exercise.
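For reference, the userspace path that such an IOCTL would shorten looks roughly like the sketch below. The helper name watch_syncobj_point() and its signature are made up for illustration; the ioctls and structs are the existing ones from the kernel's <drm/drm.h>. DRM_IOCTL_SYNCOBJ_EVENTFD only exists in newer kernel headers, hence the #ifdef. The handle is handed back to the caller, on the assumption that it should stay alive until the eventfd has fired or is no longer needed.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/drm.h>

/*
 * Hypothetical helper: make event_fd readable once timeline point `point`
 * of the syncobj behind syncobj_fd signals.
 *
 * Returns 0 on success and stores the intermediate handle in *handle_out;
 * the caller later releases it with DRM_IOCTL_SYNCOBJ_DESTROY.
 * Returns a negative errno on failure (*handle_out is left untouched).
 */
static int watch_syncobj_point(int drm_fd, int syncobj_fd, uint64_t point,
                               int event_fd, uint32_t *handle_out)
{
    struct drm_syncobj_handle import;
    int ret = 0;

    /* Step 1: fd -> handle, only so that the next ioctl has something to name */
    memset(&import, 0, sizeof(import));
    import.fd = syncobj_fd;
    if (ioctl(drm_fd, DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE, &import) < 0)
        return -errno;

#ifdef DRM_IOCTL_SYNCOBJ_EVENTFD
    /* Step 2: attach the eventfd to the timeline point via the handle */
    struct drm_syncobj_eventfd ev;

    memset(&ev, 0, sizeof(ev));
    ev.handle = import.handle;
    ev.point = point;
    ev.fd = event_fd;
    if (ioctl(drm_fd, DRM_IOCTL_SYNCOBJ_EVENTFD, &ev) < 0)
        ret = -errno;
#else
    /* Headers too old to know about the eventfd ioctl */
    (void)point;
    (void)event_fd;
    ret = -ENOSYS;
#endif

    if (ret < 0) {
        /* Drop the intermediate handle again on failure */
        struct drm_syncobj_destroy destroy;

        memset(&destroy, 0, sizeof(destroy));
        destroy.handle = import.handle;
        ioctl(drm_fd, DRM_IOCTL_SYNCOBJ_DESTROY, &destroy);
        return ret;
    }

    *handle_out = import.handle;
    return 0;
}
```

Step 1 is the part that can fail while the compositor's device is going away, and it is the part a direct fd-to-eventfd IOCTL would make unnecessary.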
Do I see it right that this would already solve most problems in the compositor side?
Regards, Christian.
- Xaver