On Wed, Jan 20, 2021 at 05:19:44PM -0800, Dave Hansen wrote:
> On 1/20/21 4:29 PM, Jarkko Sakkinen wrote:
> > On Wed, Jan 20, 2021 at 09:35:28AM -0800, Sean Christopherson wrote:
> > > On Fri, Jan 15, 2021, jarkko@kernel.org wrote:
> > > > From: Jarkko Sakkinen <jarkko@kernel.org>
> > > >
> > > > The most trivial example of a race condition can be demonstrated
> > > > with this sequence, where mm_list contains just one entry:
> > > >
> > > > CPU A                    CPU B
> > > > sgx_release()
> > > >                          sgx_mmu_notifier_release()
> > > >                          list_del_rcu()
> > > > sgx_encl_release()
> > > >                          synchronize_srcu()
> > > > cleanup_srcu_struct()
> > > >
> > > > To fix this, call synchronize_srcu() before checking whether
> > > > mm_list is empty in sgx_release().
> > > Why haven't you included the splat that Haitao provided?  That
> > > would go a long way to helping answer Boris' question about
> > > exactly what is broken...
> >
> > I've lost the klog.
> Haitao said he thought it was this:
>
> void cleanup_srcu_struct(struct srcu_struct *ssp)
> {
> ...
> 	if (WARN_ON(srcu_readers_active(ssp)))
> 		return; /* Just leak it! */
>
> Which would indicate that an 'encl' kref reached 0 while some other
> thread was inside a
>
> 	idx = srcu_read_lock(&encl->srcu);
> 	...
> 	srcu_read_unlock(&encl->srcu, idx);
>
> critical section.  A quick audit didn't turn up any obvious suspects,
> though.
>
> If that *is* it, it might be nice to try to catch the culprit at
> srcu_read_{un}lock() time.  If there's ever a 0 refcount at those
> sites, it would be nice to spew a warning:
>
> 	idx = srcu_read_lock(&encl->srcu);
> 	WARN_ON(!kref_read(&encl->refcount));
> 	...
> 	WARN_ON(!kref_read(&encl->refcount));
> 	srcu_read_unlock(&encl->srcu, idx);
>
> at each site.
The root cause is fully known already and audited.

An mm_list entry is kept until the process exits, *or* until VFS close
happens and sgx_release() executes and removes it.

It's entirely possible that the MMU notifier callback registered by a
process fires while sgx_release() is executing, which causes a
list_del_rcu() unnoticed by sgx_release(). If that empties the list,
cleanup_srcu_struct() is executed unsynchronized, in the middle of an
unfinished grace period.
/Jarkko