[PATCH v4] x86/sgx: Fix the call order of synchronize_srcu() in sgx_release()

List overview All Threads
Download

newer

older

stable/linux-5.4.y baseline: 176...

stable-rc/queue/4.14 baseline: 133...

jarkko＠kernel.org

15 Jan 2021 15 Jan '21

1:46 a.m.

From: Jarkko Sakkinen jarkko@kernel.org

The most trivial example of a race condition can be demonstrated with this example where mm_list contains just one entry:

CPU A CPU B sgx_release() sgx_mmu_notifier_release() list_del_rcu() sgx_encl_release() synchronize_srcu() cleanup_srcu_struct()

To fix this, call synchronize_srcu() before checking whether mm_list is empty in sgx_release().

Cc: stable@vger.kernel.org Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Suggested-by: Sean Christopherson seanjc@google.com Suggested-by: Haitao Huang haitao.huang@linux.intel.com Signed-off-by: Jarkko Sakkinen jarkko@kernel.org --- v4: - Rewrite the commit message. - Just change the call order. *_expedited() is out of scope for this bug fix. v3: Fine-tuned tags, and added missing change log for v2. v2: Switch to synchronize_srcu_expedited(). arch/x86/kernel/cpu/sgx/driver.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c index f2eac41bb4ff..53056345f5f8 100644 --- a/arch/x86/kernel/cpu/sgx/driver.c +++ b/arch/x86/kernel/cpu/sgx/driver.c @@ -65,11 +65,16 @@ static int sgx_release(struct inode *inode, struct file *file)

spin_unlock(&encl->mm_lock);

+ /* + * The call is need even if the list empty, because sgx_encl_mmu_notifier_release() + * could have initiated a new grace period. + */ + synchronize_srcu(&encl->srcu); + /* The enclave is no longer mapped by any mm. */ if (!encl_mm) break;

- synchronize_srcu(&encl->srcu); mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm); kfree(encl_mm); }

-- 2.29.2

Show replies by date

Borislav Petkov

15 Jan 15 Jan

7:18 a.m.

On Fri, Jan 15, 2021 at 03:46:38AM +0200, jarkko@kernel.org wrote:

...

From: Jarkko Sakkinen jarkko@kernel.org

The most trivial example of a race condition can be demonstrated with this example where mm_list contains just one entry:

CPU A CPU B sgx_release() sgx_mmu_notifier_release() list_del_rcu() sgx_encl_release() synchronize_srcu() cleanup_srcu_struct()

To fix this, call synchronize_srcu() before checking whether mm_list is empty in sgx_release().

To fix what?

Lemme explain one more time: a commit message for a race condition fix needs to explain in *detail* what the race condition is. And that explanation needs to be understandable months and years from now.

You have the function call order above, now you have to explain what can happen. Just how you did here:

https://lkml.kernel.org/r/X/zoarV7gd/LNo4A@kernel.org

...

Cc: stable@vger.kernel.org Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Suggested-by: Sean Christopherson seanjc@google.com Suggested-by: Haitao Huang haitao.huang@linux.intel.com Signed-off-by: Jarkko Sakkinen jarkko@kernel.org

v4:

Rewrite the commit message.

Just change the call order. *_expedited() is out of scope for this bug fix.

v3: Fine-tuned tags, and added missing change log for v2. v2: Switch to synchronize_srcu_expedited(). arch/x86/kernel/cpu/sgx/driver.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c index f2eac41bb4ff..53056345f5f8 100644 --- a/arch/x86/kernel/cpu/sgx/driver.c +++ b/arch/x86/kernel/cpu/sgx/driver.c @@ -65,11 +65,16 @@ static int sgx_release(struct inode *inode, struct file *file) spin_unlock(&encl->mm_lock);
/*
 * The call is need even if the list empty, because sgx_encl_mmu_notifier_release()

"The call is need"?

$ git grep sgx_encl_mmu_notifier_release $

WTF?

Please be more careful.

-- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette

Jarkko Sakkinen

16 Jan 16 Jan

5:12 a.m.

On Fri, Jan 15, 2021 at 08:18:09AM +0100, Borislav Petkov wrote:

...

On Fri, Jan 15, 2021 at 03:46:38AM +0200, jarkko@kernel.org wrote:

...
From: Jarkko Sakkinen jarkko@kernel.org

The most trivial example of a race condition can be demonstrated with this example where mm_list contains just one entry:

CPU A CPU B sgx_release() sgx_mmu_notifier_release() list_del_rcu() sgx_encl_release() synchronize_srcu() cleanup_srcu_struct()

To fix this, call synchronize_srcu() before checking whether mm_list is empty in sgx_release().

To fix what?

Lemme explain one more time: a commit message for a race condition fix needs to explain in *detail* what the race condition is. And that explanation needs to be understandable months and years from now.

You have the function call order above, now you have to explain what can happen. Just how you did here:

https://lkml.kernel.org/r/X/zoarV7gd/LNo4A@kernel.org

OK, I could recall the race that from but that must be partly because I've been proactively working on it, i.e. getting your point.

So let's say I add this after the sequence:

"The sequence demonstrates a scenario where CPU B starts a new grace period, which goes unnoticed by CPU A in sgx_release(), because it did not remove the final entry from the enclave's mm list."

Would this be sufficient or not?

...

...
Cc: stable@vger.kernel.org Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Suggested-by: Sean Christopherson seanjc@google.com Suggested-by: Haitao Huang haitao.huang@linux.intel.com Signed-off-by: Jarkko Sakkinen jarkko@kernel.org

v4:

Rewrite the commit message.

Just change the call order. *_expedited() is out of scope for this bug fix.

v3: Fine-tuned tags, and added missing change log for v2. v2: Switch to synchronize_srcu_expedited(). arch/x86/kernel/cpu/sgx/driver.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c index f2eac41bb4ff..53056345f5f8 100644 --- a/arch/x86/kernel/cpu/sgx/driver.c +++ b/arch/x86/kernel/cpu/sgx/driver.c @@ -65,11 +65,16 @@ static int sgx_release(struct inode *inode, struct file *file) spin_unlock(&encl->mm_lock);
/*
 * The call is need even if the list empty, because sgx_encl_mmu_notifier_release()
"The call is need"?

$ git grep sgx_encl_mmu_notifier_release $

WTF?

Please be more careful.

Note taken.

...

-- Regards/Gruss, Boris.

https://people.kernel.org/tglx/notes-about-netiquette

/Jarkko

Borislav Petkov

18 Jan 18 Jan

6:57 p.m.

On Sat, Jan 16, 2021 at 07:12:54AM +0200, Jarkko Sakkinen wrote:

...

...
https://lkml.kernel.org/r/X/zoarV7gd/LNo4A@kernel.org

OK, I could recall the race that from but that must be partly because I've been proactively working on it, i.e. getting your point.

So let's say I add this after the sequence:

"The sequence demonstrates a scenario where CPU B starts a new grace period, which goes unnoticed by CPU A in sgx_release(), because it did not remove the final entry from the enclave's mm list."

Would this be sufficient or not?

Not sure.

That link above says:

"Now, let's imagine that there is exactly one entry in the encl->mm_list. and sgx_release() execution gets scheduled right after returning from synchronize_srcu().

With some bad luck, some process comes and removes that last entry befoe sgx_release() acquires mm_lock."

So, the last entry gets removed by some other process before sgx_release() acquires mm_lock. When it does acquire that lock, the test

if (list_empty(&encl->mm_list))

will be true because "some other process" has removed that last entry.

So why do you need the synchronize_srcu() call when this process sees an empty mm_list already?

Thx.

-- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette

Jarkko Sakkinen

20 Jan 20 Jan

2:43 p.m.

On Mon, Jan 18, 2021 at 07:57:12PM +0100, Borislav Petkov wrote:

...

On Sat, Jan 16, 2021 at 07:12:54AM +0200, Jarkko Sakkinen wrote:

...
...
https://lkml.kernel.org/r/X/zoarV7gd/LNo4A@kernel.org

OK, I could recall the race that from but that must be partly because I've been proactively working on it, i.e. getting your point.

So let's say I add this after the sequence:

"The sequence demonstrates a scenario where CPU B starts a new grace period, which goes unnoticed by CPU A in sgx_release(), because it did not remove the final entry from the enclave's mm list."

Would this be sufficient or not?

Not sure.

That link above says:

"Now, let's imagine that there is exactly one entry in the encl->mm_list. and sgx_release() execution gets scheduled right after returning from synchronize_srcu().

With some bad luck, some process comes and removes that last entry befoe sgx_release() acquires mm_lock."

So, the last entry gets removed by some other process before sgx_release() acquires mm_lock. When it does acquire that lock, the test

if (list_empty(&encl->mm_list))

will be true because "some other process" has removed that last entry.

So why do you need the synchronize_srcu() call when this process sees an empty mm_list already?

Thx.

The other process aka some process using the enclave calls list_del_rcu() (and synchronize_srcu()), which starts a new grace period. If we don't do it, then the cleanup_srcu() will race with that grace period.

...

-- Regards/Gruss, Boris.

https://people.kernel.org/tglx/notes-about-netiquette

/Jarkko

Dave Hansen

5:34 p.m.

On 1/20/21 6:43 AM, Jarkko Sakkinen wrote:

...

...
So why do you need the synchronize_srcu() call when this process sees an empty mm_list already?

Thx.

The other process aka some process using the enclave calls list_del_rcu() (and synchronize_srcu()), which starts a new grace period. If we don't do it, then the cleanup_srcu() will race with that grace period.

To me, this is only a partial explanation.

That goal of synchronize_srcu() is to wait for the completion of a *previous* grace period: one that might have observed the old state of the list.

Could you explain the *actual* effects of the misplaced synchronize_srcu()? If the race _occurs_, what is the side-effect?

Jarkko Sakkinen

21 Jan 21 Jan

12:26 a.m.

On Wed, Jan 20, 2021 at 09:34:10AM -0800, Dave Hansen wrote:

...

On 1/20/21 6:43 AM, Jarkko Sakkinen wrote:

...
...
So why do you need the synchronize_srcu() call when this process sees an empty mm_list already?

Thx.

The other process aka some process using the enclave calls list_del_rcu() (and synchronize_srcu()), which starts a new grace period. If we don't do it, then the cleanup_srcu() will race with that grace period.

To me, this is only a partial explanation.

That goal of synchronize_srcu() is to wait for the completion of a *previous* grace period: one that might have observed the old state of the list.

Could you explain the *actual* effects of the misplaced synchronize_srcu()? If the race _occurs_, what is the side-effect?

As I haven't been able to reproduce this regression myself, I need to take steps back and try to reproduce the it with Graphene.

WARN_ON()'s trigger inside cleanup_srcu_struct(), which causes a memory leak since free_percpu() gets never called. If I recall correctly, it was srcu_readers_active() but unfortunately I don't have a log available.

Perhaps Haitao could provide us one.

/Jarkko

Haitao Huang

22 Jan 22 Jan

6:20 p.m.

On Wed, 20 Jan 2021 18:26:56 -0600, Jarkko Sakkinen jarkko@kernel.org wrote:

...

On Wed, Jan 20, 2021 at 09:34:10AM -0800, Dave Hansen wrote:

...
On 1/20/21 6:43 AM, Jarkko Sakkinen wrote:

...
...
So why do you need the synchronize_srcu() call when this process

sees an

...
...
empty mm_list already?

Thx.

The other process aka some process using the enclave calls

list_del_rcu()

...
(and synchronize_srcu()), which starts a new grace period. If we don't do it, then the cleanup_srcu() will race with that grace period.

To me, this is only a partial explanation.

That goal of synchronize_srcu() is to wait for the completion of a *previous* grace period: one that might have observed the old state of the list.

Could you explain the *actual* effects of the misplaced synchronize_srcu()? If the race _occurs_, what is the side-effect?

As I haven't been able to reproduce this regression myself, I need to take steps back and try to reproduce the it with Graphene.

WARN_ON()'s trigger inside cleanup_srcu_struct(), which causes a memory leak since free_percpu() gets never called. If I recall correctly, it was srcu_readers_active() but unfortunately I don't have a log available.

Perhaps Haitao could provide us one.

/Jarkko

Sorry I lost those logs too as our email server automatically clean up old emails. I have been re-running the tests but have not been able to reproduce the same issue.

Haitao

Sean Christopherson

20 Jan 20 Jan

5:35 p.m.

On Fri, Jan 15, 2021, jarkko@kernel.org wrote:

...

From: Jarkko Sakkinen jarkko@kernel.org

The most trivial example of a race condition can be demonstrated with this example where mm_list contains just one entry:

CPU A CPU B sgx_release() sgx_mmu_notifier_release() list_del_rcu() sgx_encl_release() synchronize_srcu() cleanup_srcu_struct()

To fix this, call synchronize_srcu() before checking whether mm_list is empty in sgx_release().

Why haven't you included the splat that Haitao provided? That would go a long way to helping answer Boris' question about exactly what is broken...

...

Cc: stable@vger.kernel.org Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Suggested-by: Sean Christopherson seanjc@google.com Suggested-by: Haitao Huang haitao.huang@linux.intel.com Signed-off-by: Jarkko Sakkinen jarkko@kernel.org

v4:

Rewrite the commit message.

Just change the call order. *_expedited() is out of scope for this bug fix.

v3: Fine-tuned tags, and added missing change log for v2. v2: Switch to synchronize_srcu_expedited(). arch/x86/kernel/cpu/sgx/driver.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c index f2eac41bb4ff..53056345f5f8 100644 --- a/arch/x86/kernel/cpu/sgx/driver.c +++ b/arch/x86/kernel/cpu/sgx/driver.c @@ -65,11 +65,16 @@ static int sgx_release(struct inode *inode, struct file *file) spin_unlock(&encl->mm_lock);
/*
 * The call is need even if the list empty, because sgx_encl_mmu_notifier_release()
 * could have initiated a new grace period.
 */
synchronize_srcu(&encl->srcu);

I don't think this completely fixes the issue as sgx_release() isn't guaranteed to trigger cleanup_srcu_struct(), e.g. the reclaimer can also have a reference to the enclave.

...

/* The enclave is no longer mapped by any mm. */ if (!encl_mm) break;
synchronize_srcu(&encl->srcu);
mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm); kfree(encl_mm); }
-- 2.29.2

Jarkko Sakkinen

21 Jan 21 Jan

12:29 a.m.

On Wed, Jan 20, 2021 at 09:35:28AM -0800, Sean Christopherson wrote:

...

On Fri, Jan 15, 2021, jarkko@kernel.org wrote:

...
From: Jarkko Sakkinen jarkko@kernel.org

The most trivial example of a race condition can be demonstrated with this example where mm_list contains just one entry:

CPU A CPU B sgx_release() sgx_mmu_notifier_release() list_del_rcu() sgx_encl_release() synchronize_srcu() cleanup_srcu_struct()

To fix this, call synchronize_srcu() before checking whether mm_list is empty in sgx_release().

Why haven't you included the splat that Haitao provided? That would go a long way to helping answer Boris' question about exactly what is broken...

I've lost the klog.

/Jarkko

Dave Hansen

1:19 a.m.

On 1/20/21 4:29 PM, Jarkko Sakkinen wrote:

...

On Wed, Jan 20, 2021 at 09:35:28AM -0800, Sean Christopherson wrote:

...
On Fri, Jan 15, 2021, jarkko@kernel.org wrote:

...
From: Jarkko Sakkinen jarkko@kernel.org

The most trivial example of a race condition can be demonstrated with this example where mm_list contains just one entry:

CPU A CPU B sgx_release() sgx_mmu_notifier_release() list_del_rcu() sgx_encl_release() synchronize_srcu() cleanup_srcu_struct()

To fix this, call synchronize_srcu() before checking whether mm_list is empty in sgx_release().

Why haven't you included the splat that Haitao provided? That would go a long way to helping answer Boris' question about exactly what is broken...

I've lost the klog.

Haitao said he thought it was this:

...

void cleanup_srcu_struct(struct srcu_struct *ssp) {

...

    if (WARN_ON(srcu_readers_active(ssp)))
            return; /* Just leak it! */

Which would indicate that an 'encl' kref reached 0 while some other thread was inside a

idx = srcu_read_lock(&encl->srcu); ... srcu_read_unlock(&encl->srcu, idx);

critical section. A quick audit didn't turn up any obvious suspects, though.

If that *is* it, it might be nice to try to catch the culprit at srcu_read_{un}lock() time. If there's ever a 0 refcount at those sites, it would be nice to spew a warning:

idx = srcu_read_lock(&encl->srcu); WARN_ON(!atomic_read(&encl->refcount->refcount); ... WARN_ON(!atomic_read(&encl->refcount->refcount); srcu_read_unlock(&encl->srcu, idx);

at each site.

Jarkko Sakkinen

12:55 p.m.

On Wed, Jan 20, 2021 at 05:19:44PM -0800, Dave Hansen wrote:

...

On 1/20/21 4:29 PM, Jarkko Sakkinen wrote:

...
On Wed, Jan 20, 2021 at 09:35:28AM -0800, Sean Christopherson wrote:

...
On Fri, Jan 15, 2021, jarkko@kernel.org wrote:

...
From: Jarkko Sakkinen jarkko@kernel.org

The most trivial example of a race condition can be demonstrated with this example where mm_list contains just one entry:

CPU A CPU B sgx_release() sgx_mmu_notifier_release() list_del_rcu() sgx_encl_release() synchronize_srcu() cleanup_srcu_struct()

To fix this, call synchronize_srcu() before checking whether mm_list is empty in sgx_release().

Why haven't you included the splat that Haitao provided? That would go a long way to helping answer Boris' question about exactly what is broken...

I've lost the klog.

Haitao said he thought it was this:

...
void cleanup_srcu_struct(struct srcu_struct *ssp) {

...

...
    if (WARN_ON(srcu_readers_active(ssp)))
            return; /* Just leak it! */
Which would indicate that an 'encl' kref reached 0 while some other thread was inside a
    idx = srcu_read_lock(&encl->srcu);
... srcu_read_unlock(&encl->srcu, idx);

critical section. A quick audit didn't turn up any obvious suspects, though.

If that *is* it, it might be nice to try to catch the culprit at srcu_read_{un}lock() time. If there's ever a 0 refcount at those sites, it would be nice to spew a warning:
    idx = srcu_read_lock(&encl->srcu);
WARN_ON(!atomic_read(&encl->refcount->refcount); ... WARN_ON(!atomic_read(&encl->refcount->refcount); srcu_read_unlock(&encl->srcu, idx);

at each site.

The root cause is fully known already and audited.

An mm_list entry is kept up until the process exits *or* when VFS close happens, and sgx_release() executes and removes it.

It's entirely possible that MMU notifier callback registered by a process happens while sgx_release() is executing, which causes list_del_rcu() to happen, unnoticed by sgx_release().

If that empties the list, cleanup_srcu_struct() is executed unsynchronized in the middle a unfinished grace period.

/Jarkko

Dave Hansen

6:19 p.m.

On 1/21/21 4:55 AM, Jarkko Sakkinen wrote:

...

The root cause is fully known already and audited.

An mm_list entry is kept up until the process exits *or* when VFS close happens, and sgx_release() executes and removes it.

It's entirely possible that MMU notifier callback registered by a process happens while sgx_release() is executing, which causes list_del_rcu() to happen, unnoticed by sgx_release().

Just to be clear, are you talking about sgx_mmu_notifier_release()?

According to comments, mmu_notifier->release() can be called "before all pages are freed." That means it can be called before VMAs are torn down, which is where the filp->release() is called.

There is also nothing to *stop* a VMA from being torn down or an fd closed at the point that mmu_notifier->release() is called.

fd = open("/dev/sgx") // filp count==1 mmap(fd) // filp count==2 close(fd) // filp count==1 munmap() // filp count==0, sgx_encl_release() cleanup_srcu_struct(&encl->srcu); kfree(encl);

Meanwhile, another thread is doing:

exit() ... exit_mmap() ... mmu_notifier_release()

Which is touching 'encl'.

Shouldn't a kref be held on 'encl' to ensure it isn't referenced after it is freed?

...

If that empties the list, cleanup_srcu_struct() is executed unsynchronized in the middle a unfinished grace period.

...

* Each sgx_mmu_notifier_release() starts a grace period. Therefore, an

```
* additional sync is required here.
```
```
*/
```

Except that sgx_mmu_notifier_release() doesn't do any srcu locking. How does sgx_mmu_notifier_release() start a new grace period?

Dave Hansen

22 Jan 22 Jan

4:56 p.m.

On 1/20/21 9:35 AM, Sean Christopherson wrote:

...

Why haven't you included the splat that Haitao provided? That would go a long way to helping answer Boris' question about exactly what is broken...

The bad news is that the original splat seems to be lost.

The good news is that this is hard to reproduce and *might* not occur on what got merged in mainline.

We're going to circle back around and make sure we have a clean reproduction before we try to fix this again.

Jarkko Sakkinen

23 Jan 23 Jan

8:58 a.m.

On Fri, Jan 22, 2021 at 08:56:44AM -0800, Dave Hansen wrote:

...

On 1/20/21 9:35 AM, Sean Christopherson wrote:

...
Why haven't you included the splat that Haitao provided? That would go a long way to helping answer Boris' question about exactly what is broken...

The bad news is that the original splat seems to be lost.

The good news is that this is hard to reproduce and *might* not occur on what got merged in mainline.

We're going to circle back around and make sure we have a clean reproduction before we try to fix this again.

Yeah, this a good opportunity to level up the QA.

/Jarkko

Dave Hansen

25 Jan 25 Jan

3:49 p.m.

Haitao managed to create another splat over the weekend. It was, indeed:

...

WARNING: CPU: 3 PID: 7620 at kernel/rcu/srcutree.c:374 cleanup_srcu_struct+0xed/0x100

which is:

...

    if (WARN_ON(!srcu_get_delay(ssp)))
            return; /* Just leak it! */

That check means that there is an outstanding "expedited" grace period. The fact that it's expedited is not important. This:

https://lwn.net/Articles/202847/

describes the reasoning behind the warning:

If the struct srcu_struct is dynamically allocated, then cleanup_srcu_struct() must be called before it is freed ... the caller must take care to ensure that all SRCU read-side critical sections have completed (and that no more will commence) before calling cleanup_srcu_struct().

synchronize_srcu() will (obviously) wait for the grace period to complete. Calling it will shut up the warning for sure, most of the time.

The required sequence of events is in here:

...

https://lore.kernel.org/lkml/1492472726-3841-4-git-send-email-paulmck@linux....

I suspect that the mmu notifier's synchronize_srcu() is run in parallel very close to when cleanup_srcu_struct() is called. This violates the "prevent any further calls to synchronize_srcu" rule.

So, while I suspect that adding a synchronize_srcu() is *part* of the correct solution, I'm still not convinced that the sgx_mmu_notifier_release() code is correct.

Jarkko Sakkinen

27 Jan 27 Jan

5:31 p.m.

On Mon, Jan 25, 2021 at 07:49:04AM -0800, Dave Hansen wrote:

...

Haitao managed to create another splat over the weekend. It was, indeed:

...
WARNING: CPU: 3 PID: 7620 at kernel/rcu/srcutree.c:374 cleanup_srcu_struct+0xed/0x100

which is:

...
    if (WARN_ON(!srcu_get_delay(ssp)))
            return; /* Just leak it! */
That check means that there is an outstanding "expedited" grace period. The fact that it's expedited is not important. This:

https://lwn.net/Articles/202847/

describes the reasoning behind the warning:

If the struct srcu_struct is dynamically allocated, then cleanup_srcu_struct() must be called before it is freed ... the caller must take care to ensure that all SRCU read-side critical sections have completed (and that no more will commence) before calling cleanup_srcu_struct().

synchronize_srcu() will (obviously) wait for the grace period to complete. Calling it will shut up the warning for sure, most of the time.

The required sequence of events is in here:

...
https://lore.kernel.org/lkml/1492472726-3841-4-git-send-email-paulmck@linux....

I'll add "Link:" for this to the final fix. It's a good reference.

...

I suspect that the mmu notifier's synchronize_srcu() is run in parallel very close to when cleanup_srcu_struct() is called. This violates the "prevent any further calls to synchronize_srcu" rule.

So, while I suspect that adding a synchronize_srcu() is *part* of the correct solution, I'm still not convinced that the sgx_mmu_notifier_release() code is correct.

What Sean suggested in private discussion, i.e. using kref_get() in the MMU notifier AFAIK should be enough to do the necessary serialization.

/Jarkko

1669

days inactive

1681

days old

linux-stable-mirror@lists.linaro.org

16 comments

participants

tags (0)

participants (6)

Borislav Petkov
Dave Hansen
Haitao Huang
Jarkko Sakkinen
jarkko＠kernel.org
Sean Christopherson