On Sat, 2021-01-30 at 21:20 +0200, Jarkko Sakkinen wrote:
On Thu, 2021-01-28 at 08:33 -0800, Dave Hansen wrote:
On 1/28/21 4:58 AM, Jarkko Sakkinen wrote:
The most trivial example of a race condition can be demonstrated by this sequence where mm_list contains just one entry:
    CPU A                   CPU B
    -> sgx_release()
                            -> sgx_mmu_notifier_release()
                            -> list_del_rcu()
                            <- list_del_rcu()
    -> kref_put()
    -> sgx_encl_release()
                            -> synchronize_srcu()
    -> cleanup_srcu_struct()
This is missing some key details including a clear, unambiguous, problem statement. To me, the patch should concentrate on the SRCU warning since that's where we started. Here's the detail that needs to be added about the issue and the locking in general in this path:
sgx_release() also does this:
mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
which does another synchronize_srcu() on the mmu_notifier's srcu_struct. *But*, it only does this if its own list_del_rcu() is successful. It does all of this before the kref_put().
In other words, sgx_release() can *only* get to this buggy path if sgx_mmu_notifier_release() races with sgx_release and does a list_del_rcu() first.
The key to this patch is that sgx_mmu_notifier_release() will now take an 'encl' reference in that case, which prevents kref_put() from calling sgx_encl_release(), which cleans up and frees 'encl'.
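The effect of that extra reference can be sketched with a small single-threaded userspace model. This is not the kernel code: every `model_*` name is hypothetical, the refcount is a plain int standing in for the kref, and the two halves of the notifier path are split only so the interleaving from the race can be replayed by hand.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct sgx_encl. */
struct model_encl {
    int refcount;     /* models encl->refcount (a kref in the kernel) */
    bool mm_on_list;  /* models a single entry on encl->mm_list */
    bool released;    /* set when the final put runs the release step */
};

static void model_encl_get(struct model_encl *e)
{
    e->refcount++;
}

static void model_encl_put(struct model_encl *e)
{
    assert(e->refcount > 0);
    if (--e->refcount == 0)
        e->released = true;  /* models sgx_encl_release(): cleanup and free */
}

/* Models list_del_rcu() under the spinlock: only one caller can win. */
static bool model_list_del(struct model_encl *e)
{
    if (!e->mm_on_list)
        return false;
    e->mm_on_list = false;
    return true;
}

/* Notifier path, first half: win the removal, then pin the enclave. */
static bool model_notifier_release_begin(struct model_encl *e)
{
    if (!model_list_del(e))
        return false;
    model_encl_get(e);
    return true;
}

/* Notifier path, second half: the SRCU sync would run here, then unpin. */
static void model_notifier_release_end(struct model_encl *e)
{
    model_encl_put(e);
}

/* File release path: drops the file's reference after its own list walk. */
static void model_file_release(struct model_encl *e)
{
    if (model_list_del(e)) {
        /* The real path would sync and unregister the notifier here. */
    }
    model_encl_put(e);
}
```

Replaying the race (notifier begin, then file release, then notifier end) leaves the enclave alive until the notifier path drops its pin, which is the ordering the patch relies on.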
I was actually also hoping to see some better comments about the new refcount, and the locking in general. There are *TWO* srcu_structs in play, a spinlock and a refcount. It took me several days with Sean and your help to identify the actual path and get a proper fix (versions 1-4 did *not* fix the race).
This was really good input, thank you. It made me realize something, but now I need a sanity check.
I think that this bug fix is not a legit one *either* :-)
An example scenario would be one where all removals "side-channel" through the notifier callback. Then mmu_notifier_unregister() gets called exactly zero times. No MMU notifier SRCU sync would then be happening.
NOTE: There's a bunch of other examples, I'm just giving one.
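That scenario can be illustrated with a tiny single-threaded userspace model that only counts how often the unregister step would run. It is a sketch under assumptions, not the driver code: the `demo_*` names are hypothetical, and the list is reduced to a counter of remaining entries.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the enclave's mm_list bookkeeping. */
struct demo_encl {
    int nr_mm;             /* entries left on the modelled mm_list */
    int unregister_calls;  /* times the modelled mmu_notifier_unregister() ran */
};

/* Models list_del_rcu() under the spinlock: fails once the list is empty. */
static bool demo_list_del(struct demo_encl *e)
{
    if (e->nr_mm == 0)
        return false;
    e->nr_mm--;
    return true;
}

/* Notifier callback: removes one entry and never unregisters. */
static void demo_notifier_release(struct demo_encl *e)
{
    (void)demo_list_del(e);
}

/* File release: unregisters only for entries it removed itself. */
static void demo_file_release(struct demo_encl *e)
{
    while (demo_list_del(e))
        e->unregister_calls++;
}
```

If the notifier callback drains every entry before the file release runs, the counter stays at zero, so nothing in the release path waits on the MMU notifier's SRCU.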
How I think this should be actually fixed is:
1. Whenever an MMU notifier is *registered*, kref_get() should be called for
   the enclave reference count.
2. *BOTH* sgx_release() and sgx_mmu_notifier_release() should decrease the
   refcount when they process an entry.

I.e. the fix that I sent does kref_get() in the wrong location. Please sanity check my conclusion.
Also, the use-after-free is *fixed* in sgx_mmu_notifier_release() but does not *occur* in sgx_mmu_notifier_release(). The subject here is a bit misleading in that regard.
Right, this is a valid point. It's incorrect. So what if I just change the short summary by substituting sgx_mmu_notifier_release() with sgx_release()?
I.e. refcount should be increased in sgx_encl_mm_add(). That way the whole thing should be somewhat stable.
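A minimal single-threaded userspace sketch of that scheme, to make the proposed accounting concrete. It is only a model: the `sketch_*` names are hypothetical, a plain int stands in for the kref, and the mm list is reduced to a counter, so it says nothing about the SRCU or spinlock ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct sgx_encl. */
struct sketch_encl {
    int refcount;   /* models encl->refcount */
    int nr_mm;      /* entries on the modelled mm_list */
    bool released;  /* set when the last reference is dropped */
};

static void sketch_encl_put(struct sketch_encl *e)
{
    assert(e->refcount > 0);
    if (--e->refcount == 0)
        e->released = true;  /* models the final release of the enclave */
}

/* Models sgx_encl_mm_add(): registering a notifier takes a reference. */
static void sketch_mm_add(struct sketch_encl *e)
{
    e->refcount++;
    e->nr_mm++;
}

/* Shared step: whichever path removes an entry drops that entry's reference. */
static bool sketch_process_entry(struct sketch_encl *e)
{
    if (e->nr_mm == 0)
        return false;
    e->nr_mm--;
    sketch_encl_put(e);
    return true;
}

/* Models the notifier callback handling one entry. */
static void sketch_notifier_release(struct sketch_encl *e)
{
    (void)sketch_process_entry(e);
}

/* Models the file release: drain what is left, then drop the file's reference. */
static void sketch_file_release(struct sketch_encl *e)
{
    while (sketch_process_entry(e))
        ;
    sketch_encl_put(e);
}
```

In this model the enclave is released only after every registered entry has been processed by one path or the other and the file reference is gone, regardless of which path handled which entry.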
/Jarkko