On 21/06/2021 14:57, Alyssa Rosenzweig wrote:
Jobs can be in-flight when the file descriptor is closed (either because the process did not terminate properly, or because it didn't wait for all GPU jobs to be finished), and apparently panfrost_job_close() does not cancel already running jobs. Let's refcount the MMU context object so its lifetime is no longer bound to the FD lifetime and running jobs can finish properly without generating spurious page faults.
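For readers following along, here is a minimal userspace sketch of the refcounting idea described above. It is not the patch itself: struct mmu_ctx, struct job, live_contexts and every function name below are hypothetical, and a plain int counter stands in for the kernel's struct kref (kref_init/kref_get/kref_put), so it is only valid single-threaded.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-FD MMU context (not the real panfrost struct). */
struct mmu_ctx {
	int refcount;	/* plain int: this sketch assumes a single thread */
	int as_id;	/* illustrative payload, e.g. an address space id */
};

/* Number of contexts allocated and not yet freed; for illustration only. */
static int live_contexts;

/* Create a context holding one reference, owned by the opener (the FD). */
static struct mmu_ctx *mmu_ctx_create(int as_id)
{
	struct mmu_ctx *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->refcount = 1;
	ctx->as_id = as_id;
	live_contexts++;
	return ctx;
}

/* Take an extra reference. */
static struct mmu_ctx *mmu_ctx_get(struct mmu_ctx *ctx)
{
	ctx->refcount++;
	return ctx;
}

/* Drop a reference; the context is freed only when the last one goes away. */
static void mmu_ctx_put(struct mmu_ctx *ctx)
{
	if (--ctx->refcount == 0) {
		live_contexts--;
		free(ctx);
	}
}

/* Hypothetical job: it pins the context it runs in. */
struct job {
	struct mmu_ctx *mmu;
};

/* Submitting a job takes a reference on the context. */
static void job_submit(struct job *job, struct mmu_ctx *ctx)
{
	job->mmu = mmu_ctx_get(ctx);
}

/* Job completion drops the job's reference. */
static void job_done(struct job *job)
{
	mmu_ctx_put(job->mmu);
	job->mmu = NULL;
}
```

In this sketch, closing the FD amounts to calling mmu_ctx_put() on the opener's reference. If a job submitted through job_submit() is still in flight at that point, the context stays allocated until job_done() drops the last reference, which is the property the commit message relies on to avoid spurious page faults.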
Remind me - why can't we hard stop in-flight jobs when the fd is closed? I've seen cases where kill -9'ing a badly behaved process doesn't end the fault storm, or unfreeze the desktop.
Hard-stopping the in-flight jobs would also make sense. But unless we want close() to actually block, there will be a period between issuing the hard-stop and all jobs in the context actually having completed.
But equally to be fair I've been cherry-picking this patch myself for quite some time, so we should just merge it and improve from there. So you can have my:
Reviewed-by: Steven Price <steven.price@arm.com>