On 1/21/21 4:55 AM, Jarkko Sakkinen wrote:
The root cause is fully known already and audited.
An mm_list entry is kept up until the process exits *or* when VFS close happens, and sgx_release() executes and removes it.
It's entirely possible that MMU notifier callback registered by a process happens while sgx_release() is executing, which causes list_del_rcu() to happen, unnoticed by sgx_release().
Just to be clear, are you talking about sgx_mmu_notifier_release()?
According to comments, mmu_notifier->release() can be called "before all pages are freed." That means it can be called before VMAs are torn down, which is where the filp->release() is called.
There is also nothing to *stop* a VMA from being torn down or an fd closed at the point that mmu_notifier->release() is called.
fd = open("/dev/sgx") // filp count==1 mmap(fd) // filp count==2 close(fd) // filp count==1 munmap() // filp count==0, sgx_encl_release() cleanup_srcu_struct(&encl->srcu); kfree(encl);
Meanwhile, another thread is doing:
exit() ... exit_mmap() ... mmu_notifier_release()
Which is touching 'encl'.
Shouldn't a kref be held on 'encl' to ensure it isn't referenced after it is freed?
If that empties the list, cleanup_srcu_struct() is executed unsynchronized in the middle a unfinished grace period.
- /*
* Each sgx_mmu_notifier_release() starts a grace period. Therefore, an
* additional sync is required here.
*/
Except that sgx_mmu_notifier_release() doesn't do any srcu locking. How does sgx_mmu_notifier_release() start a new grace period?