On Wed, Mar 23, 2016 at 12:30:42PM +0100, David Herrmann wrote:
> My question was rather about why we do this. Semantics for EINTR are well-defined, and with SA_RESTART (the default on Linux) user-space can ignore it. However, looping on EAGAIN is very uncommon, and it is not at all clear why it is needed.
> Returning an error to user-space makes sense if user-space has a reason to react to it. I fail to see how EAGAIN on a cache-flush/sync operation helps user-space at all. As someone without insight into the driver implementation, it is hard to tell why. Any hints?
The reason we return EAGAIN is to work around a deadlock we hit when the client blocks waiting on a dead GPU while holding the struct_mutex (inside the client's process). Because our locking is very, very coarse, we cannot restart the GPU without acquiring the struct_mutex held by the client. So instead we wake the client up and tell it that the resource it is waiting on (the flush of the object from the GPU into the CPU domain) is temporarily unavailable. If the client immediately retries the ioctl, it blocks until the reset has occurred, and only then can it complete its flush.

There are a few other possible deadlocks that are also avoided with EAGAIN (again, the underlying issue is more or less the lack of fine-grained locking).

-Chris
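In practice, this means user-space should treat EAGAIN from these ioctls the same way it treats EINTR: just reissue the call. The following is a minimal sketch of such a retry wrapper. It assumes a plain file descriptor for the device, and `ioctl_retry` is a hypothetical helper name. For comparison, libdrm's `drmIoctl()` runs a loop of the same shape.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical helper: reissue an ioctl while it fails with EINTR
 * (interrupted by a signal) or EAGAIN (e.g. the driver dropped its
 * lock so a GPU reset could proceed). Success, or any other error,
 * is returned to the caller unchanged, with errno left as set. */
static int ioctl_retry(int fd, unsigned long request, void *arg)
{
    int ret;

    do {
        ret = ioctl(fd, request, arg);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}
```

As described above, the retried call blocks until the reset has completed. The loop therefore waits out the reset instead of busy-spinning, while errors such as EBADF or EINVAL still reach the caller immediately.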