Hi all,
We'd like to have the following commit backported to the 4.9 branch to fix an issue we are seeing.
35a2897c2a306cca344ca5c0b43416707018f434 sched/wait: Remove the lockless swait_active() check in swake_up*()
In the 4.9 branch, we hit an issue in RCU where the NOCB follower list is not reclaimed, which eventually causes an OOM.
In discussion with Paul, we figured out that the problem is a missed wakeup, resulting from the lack of a proper memory barrier between setting the wakeup condition and calling swake_up().
nocb_leader_wait()
{
        *tail = rdp->nocb_gp_head;
        smp_mb__after_atomic(); /* Store *tail before wakeup. */
        if (rdp != my_rdp && tail == &rdp->nocb_follower_head) {
                swake_up(&rdp->nocb_wq);
Note that smp_mb__after_atomic() is only a compiler barrier on x86, so the store to *tail can still be reordered after the lockless swait_active() load inside swake_up(). Originally I was going to change the barrier to smp_mb(), but then I found that master has the above-mentioned patch, which solves the same class of problem by removing the lockless check inside swake_up().
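For reference, here is a minimal user-space sketch of the ordering that the smp_mb() variant would rely on. It is not kernel code: cond_set, waiter_queued and count_missed_wakeups() are hypothetical stand-ins, with C11 atomic_thread_fence(memory_order_seq_cst) playing the role of smp_mb(). The waker publishes the condition and then checks for a waiter; the waiter queues itself and then checks the condition. With a full fence on both sides, at least one of them should observe the other's store, so the "both saw nothing" outcome that corresponds to a missed wakeup should not occur.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the kernel state, for illustration only. */
static atomic_int cond_set;       /* models "*tail = rdp->nocb_gp_head"     */
static atomic_int waiter_queued;  /* models the waiter being on the wait queue */
static int waker_saw_waiter;      /* result of the waker's lockless check   */
static int waiter_saw_cond;       /* result of the waiter's condition check */

static void *waker(void *arg)
{
    (void)arg;
    atomic_store_explicit(&cond_set, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* role of smp_mb() */
    waker_saw_waiter =
        atomic_load_explicit(&waiter_queued, memory_order_relaxed);
    return NULL;
}

static void *waiter(void *arg)
{
    (void)arg;
    atomic_store_explicit(&waiter_queued, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* full barrier on the wait side */
    waiter_saw_cond =
        atomic_load_explicit(&cond_set, memory_order_relaxed);
    return NULL;
}

/*
 * Run the two sides concurrently and count iterations in which the waker
 * saw no waiter AND the waiter saw no condition (a missed wakeup).
 * Returns -1 if a thread could not be created.
 */
int count_missed_wakeups(int iterations)
{
    int missed = 0;

    for (int i = 0; i < iterations; i++) {
        pthread_t a, b;

        atomic_store(&cond_set, 0);
        atomic_store(&waiter_queued, 0);

        if (pthread_create(&a, NULL, waker, NULL))
            return -1;
        if (pthread_create(&b, NULL, waiter, NULL)) {
            pthread_join(a, NULL);
            return -1;
        }
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        if (!waker_saw_waiter && !waiter_saw_cond)
            missed++;
    }
    return missed;
}
```

Replacing the fence in waker() with a compiler-only barrier would model the current 4.9 behaviour on x86, where the store may still sit in the store buffer when the load executes. The upstream patch avoids depending on this pairing at all by always taking the queue lock in swake_up().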
So I'm wondering whether we can backport this patch to the 4.9 branch to solve this issue, and possibly other potential missed-wakeup issues as well.
Thanks, David
On Thu, Aug 02, 2018 at 07:08:41PM +0000, David Chen wrote:
Now applied, thanks.
greg k-h
linux-stable-mirror@lists.linaro.org