Hi
On Wed, Mar 23, 2016 at 12:56 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
On Wed, Mar 23, 2016 at 12:30:42PM +0100, David Herrmann wrote:
My question was rather about why we do this. Semantics for EINTR are well defined, and with SA_RESTART (default on Linux) user-space can ignore it. However, looping on EAGAIN is very uncommon, and it is not at all clear why it is needed.
Returning an error to user-space makes sense if user-space has a reason to react to it. I fail to see how EAGAIN on a cache-flush/sync operation helps user-space at all. As someone without insight into the driver implementation, it is hard to tell why. Any hints?
The reason we return EAGAIN is to work around a deadlock we face when blocking on the GPU while holding the struct_mutex (inside the client's process), but the GPU is dead. As our locking is very, very coarse, we cannot restart the GPU without acquiring the struct_mutex being held by the client, so we wake the client up and tell them the resource they are waiting on (the flush of the object from the GPU into the CPU domain) is temporarily unavailable. If they try to immediately wait upon the ioctl again, they are blocked waiting for the reset to occur before they may complete their flush. There are a few other possible deadlocks that are also avoided with EAGAIN (again, the issue is more or less the lack of fine-grained locking).
...so you hijacked EAGAIN for all DRM ioctls just for a driver workaround? EAGAIN is universally used to signal the caller about a blocking resource. It is very much linked to O_NONBLOCK. Why not use EBUSY, ECANCELED, ECOMM, EDEADLOCK, EIO, EL3RST, ...?
Anyhow, I guess that ship has sailed. But just mentioning EAGAIN in a kernel-doc is way too vague for user-space to figure out they should loop on it.
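For reference, a rough sketch of the loop user-space apparently has to run, which I believe is what libdrm's drmIoctl() does as well. The helper name ioctl_retry() is made up for illustration and is not an existing API; treat this as my understanding of the expected behavior rather than a documented contract.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper (name is illustrative only): issue an ioctl and
 * retry for as long as it fails with EINTR or EAGAIN. Any other error
 * is returned to the caller with errno left untouched.
 */
static int ioctl_retry(int fd, unsigned long request, void *arg)
{
	int ret;

	do {
		ret = ioctl(fd, request, arg);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}
```

Note that this retries unconditionally, so if the driver keeps returning EAGAIN the caller spins; nothing in the kernel-doc tells user-space whether that is the intended usage.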
Thanks
David