On Wed, Jan 20, 2021 at 05:19:44PM -0800, Dave Hansen wrote:
> On 1/20/21 4:29 PM, Jarkko Sakkinen wrote:
> > On Wed, Jan 20, 2021 at 09:35:28AM -0800, Sean Christopherson wrote:
> > > On Fri, Jan 15, 2021, jarkko@kernel.org wrote:
> > > > From: Jarkko Sakkinen <jarkko@kernel.org>
> > > >
> > > > The most trivial example of a race condition can be demonstrated
> > > > with this sequence, where mm_list contains just one entry:
> > > >
> > > > CPU A                    CPU B
> > > > sgx_release()
> > > >                          sgx_mmu_notifier_release()
> > > >                          list_del_rcu()
> > > > sgx_encl_release()
> > > >                          synchronize_srcu()
> > > > cleanup_srcu_struct()
> > > >
> > > > To fix this, call synchronize_srcu() before checking whether
> > > > mm_list is empty in sgx_release().
> > > Why haven't you included the splat that Haitao provided?  That
> > > would go a long way to helping answer Boris' question about
> > > exactly what is broken...
> >
> > I've lost the klog.
> Haitao said he thought it was this:
>
> void cleanup_srcu_struct(struct srcu_struct *ssp)
> {
> ...
> 	if (WARN_ON(srcu_readers_active(ssp)))
> 		return; /* Just leak it! */
>
> Which would indicate that an 'encl' kref reached 0 while some other
> thread was inside a
>
> 	idx = srcu_read_lock(&encl->srcu);
> 	...
> 	srcu_read_unlock(&encl->srcu, idx);
>
> critical section.  A quick audit didn't turn up any obvious suspects,
> though.
>
> If that *is* it, it might be nice to try to catch the culprit at
> srcu_read_{un}lock() time.  If there's ever a 0 refcount at those
> sites, it would be nice to spew a warning:
>
> 	idx = srcu_read_lock(&encl->srcu);
> 	WARN_ON(!kref_read(&encl->refcount));
> 	...
> 	WARN_ON(!kref_read(&encl->refcount));
> 	srcu_read_unlock(&encl->srcu, idx);
>
> at each site.
The root cause is fully known already and audited.

An mm_list entry is kept until the process exits, *or* until VFS close
happens and sgx_release() executes and removes it.

It's entirely possible that the MMU notifier callback registered by a
process fires while sgx_release() is executing, which causes a
list_del_rcu() unnoticed by sgx_release(). If that empties the list,
cleanup_srcu_struct() is executed unsynchronized, in the middle of an
unfinished grace period.
/Jarkko