On Mon 11-03-19 20:17:01, Laurent Dufour wrote:
Commit 95402b382901 ("cpu-hotplug: replace per-subsystem mutexes with get_online_cpus()") removed the CPU_LOCK_ACQUIRE operation, which was used to grab the cache_chain_mutex lock that protected cache_reap() against CPU hotplug operations.
Later, commit 18004c5d4084 ("mm, sl[aou]b: Use a common mutex definition") renamed cache_chain_mutex to slab_mutex, but this did not restore the missing protection of cache_reap() against CPU hotplug operations.
Here we stop the per-CPU worker while holding slab_mutex, to ensure that cache_reap() is not running behind our back and will not be triggered again for this CPU.
This patch fixes the race, which leads to SLAB data corruption when CPU hotplug operations are triggered. We hit it while doing partition migration on PowerVM, which leads to CPU reconfiguration through the CPU hotplug mechanism.
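The locking idea described in the changelog can be sketched in userspace. This is only an illustration under stated assumptions, not the kernel code: it assumes the reaper takes the mutex with a non-blocking trylock and gives up when the mutex is busy, and every demo_* name below is hypothetical, with a pthread mutex standing in for slab_mutex.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's slab_mutex. */
static pthread_mutex_t demo_slab_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Counts how many times the reaper actually did work. */
static int demo_reap_count;

/*
 * Stand-in for a periodic reaper: it only works if it can take the
 * mutex without blocking, and gives up otherwise (assumed behaviour).
 */
static bool demo_cache_reap(void)
{
	if (pthread_mutex_trylock(&demo_slab_mutex) != 0)
		return false;	/* mutex is held: skip this round */

	demo_reap_count++;	/* placeholder for the real reaping work */
	pthread_mutex_unlock(&demo_slab_mutex);
	return true;
}

/*
 * Stand-in for the offline path: hold the mutex across the teardown
 * and report whether a reaper invoked in that window managed to run.
 */
static bool demo_offline_cpu(void)
{
	bool reaped;

	pthread_mutex_lock(&demo_slab_mutex);
	reaped = demo_cache_reap();	/* simulates a reaper firing now */
	pthread_mutex_unlock(&demo_slab_mutex);
	return reaped;
}
```

In this sketch demo_cache_reap() does its work when nothing holds the mutex, while a call made from inside demo_offline_cpu() returns without touching the counter.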
What is the actual race? slab_offline_cpu() calls cancel_delayed_work_sync(), which removes a pending work item and, if the item is already running, waits for it to finish. So why do we need an additional lock?
This fix covers kernels containing commit 6731d4f12315 ("slab: Convert to hotplug state machine"), i.e. 4.9.1; earlier kernels need a slightly different patch.
Cc: stable@vger.kernel.org
Cc: Christoph Lameter cl@linux.com
Cc: Pekka Enberg penberg@kernel.org
Cc: David Rientjes rientjes@google.com
Cc: Joonsoo Kim iamjoonsoo.kim@lge.com
Cc: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Laurent Dufour ldufour@linux.ibm.com
 mm/slab.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/mm/slab.c b/mm/slab.c
index 28652e4218e0..ba499d90f27f 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1103,6 +1103,7 @@ static int slab_online_cpu(unsigned int cpu)
 static int slab_offline_cpu(unsigned int cpu)
 {
+	mutex_lock(&slab_mutex);
 	/*
 	 * Shutdown cache reaper. Note that the slab_mutex is held so
 	 * that if cache_reap() is invoked it cannot do anything
@@ -1112,6 +1113,7 @@ static int slab_offline_cpu(unsigned int cpu)
 	cancel_delayed_work_sync(&per_cpu(slab_reap_work, cpu));
 	/* Now the cache_reaper is guaranteed to be not running. */
 	per_cpu(slab_reap_work, cpu).work.func = NULL;
+	mutex_unlock(&slab_mutex);
 	return 0;
 }
--
2.21.0