On Wed, Oct 23, 2019 at 3:11 PM Christian Brauner <christian.brauner@ubuntu.com> wrote:
On Wed, Oct 23, 2019 at 02:39:55PM +0200, Dmitry Vyukov wrote:
On Wed, Oct 23, 2019 at 2:16 PM Andrea Parri <parri.andrea@gmail.com> wrote:
On Mon, Oct 21, 2019 at 01:33:27PM +0200, Christian Brauner wrote:
When assigning and testing taskstats in taskstats_exit() there's a race when writing and reading sig->stats when a thread-group with more than one thread exits:
cpu0:
thread catches fatal signal and whole thread-group gets taken down
 do_exit()
 do_group_exit()
 taskstats_exit()
 taskstats_tgid_alloc()
The task reads sig->stats without holding sighand lock.

cpu1:
task calls exit_group()
 do_exit()
 do_group_exit()
 taskstats_exit()
 taskstats_tgid_alloc()
The task takes sighand lock and assigns new stats to sig->stats.
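For readers following along, the pattern behind this race (lockless check, allocate, recheck under a lock, publish) might be sketched in userspace C11 roughly as below. This is not the kernel code: struct group, struct stats, group_init() and group_stats_get() are hypothetical names, an atomic_flag spinlock stands in for the sighand lock, and C11 acquire/release atomics stand in for the kernel primitives under discussion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct taskstats. */
struct stats {
    long counter;
};

/* Hypothetical stand-in for the shared per-thread-group state. */
struct group {
    atomic_flag lock;               /* plays the role of the sighand lock */
    _Atomic(struct stats *) stats;  /* plays the role of sig->stats */
};

static void group_init(struct group *g)
{
    atomic_flag_clear(&g->lock);
    atomic_init(&g->stats, NULL);
}

static void group_lock(struct group *g)
{
    while (atomic_flag_test_and_set_explicit(&g->lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void group_unlock(struct group *g)
{
    atomic_flag_clear_explicit(&g->lock, memory_order_release);
}

/* Return the group's stats, allocating them on first use. */
static struct stats *group_stats_get(struct group *g)
{
    struct stats *stats, *stats_new;

    assert(g != NULL);

    /* Lockless fast path: this load can run concurrently with the store below. */
    stats = atomic_load_explicit(&g->stats, memory_order_acquire);
    if (stats)
        return stats;

    /* Zeroed allocation outside the lock; a failed allocation is tolerated. */
    stats_new = calloc(1, sizeof(*stats_new));

    group_lock(g);
    /* Recheck under the lock: another thread may have published first. */
    stats = atomic_load_explicit(&g->stats, memory_order_relaxed);
    if (!stats) {
        /* Publish only after the object has been fully initialized. */
        atomic_store_explicit(&g->stats, stats_new, memory_order_release);
        stats = stats_new;
        stats_new = NULL;
    }
    group_unlock(g);

    free(stats_new); /* free(NULL) is a no-op when our allocation was used */
    return stats;
}
```

Calling group_stats_get() twice on the same initialized group should return the same pointer; the second call takes the lockless fast path.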
The first approach used smp_load_acquire() and smp_store_release(). However, after having discussed this it seems that the data dependency for kmem_cache_alloc() would be fixed by WRITE_ONCE(). Furthermore, the smp_load_acquire() would only manage to order the stats check before the thread_group_empty() check. So it seems just using READ_ONCE() and WRITE_ONCE() will do the job and I wanted to bring this up for discussion at least.
Mmh, the RELEASE was intended to order the memory initialization in kmem_cache_zalloc() with the later ->stats pointer assignment; AFAICT, there is no data dependency between such memory accesses.
I agree. This needs smp_store_release. The latest version that I looked at contained: smp_store_release(&sig->stats, stats_new);
This is what really makes me wonder. Can the compiler really re-order the kmem_cache_zalloc() call with the assignment?
Yes. Not sure about compiler, but hardware definitely can. And generally one does not care if it's compiler or hardware.
If that's really the case then shouldn't all allocation functions have compiler barriers in them? This then seems like a very generic problem.
No. One puts memory barriers into synchronization primitives. This equally affects memset's, memcpy's and in fact all normal stores. Adding a memory barrier to every normal store is not the solution to this. The memory barrier is done before publication of the memory. And we already have smp_store_release for this. So if one doesn't publish objects with a plain store (which breaks all possible rules anyways) and uses a proper primitive, there is no problem.
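The point about doing the barrier at publication time can be illustrated with a small userspace sketch. It assumes C11 atomics as a rough analog of smp_store_release() and smp_load_acquire(); the names shared_stats, stats_publish() and stats_read_counter() are made up for illustration and do not appear in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical object that gets initialized and then published. */
struct stats {
    int version;
    long counter;
};

/* Shared pointer through which the object is published. */
static _Atomic(struct stats *) shared_stats;

/* Writer: initialize with plain stores, then publish with a release store. */
static struct stats *stats_publish(void)
{
    struct stats *s = calloc(1, sizeof(*s));

    if (!s)
        return NULL;

    s->version = 1;   /* plain stores, no barrier on each of them */
    s->counter = 100;

    /*
     * The release store orders all of the initialization above before
     * the pointer becomes visible to other threads.
     */
    atomic_store_explicit(&shared_stats, s, memory_order_release);
    return s;
}

/* Reader: returns -1 if nothing has been published yet. */
static long stats_read_counter(void)
{
    struct stats *s = atomic_load_explicit(&shared_stats, memory_order_acquire);

    if (!s)
        return -1;

    /* If the pointer is visible, so is the initialization. */
    assert(s->version == 1);
    return s->counter;
}
```

The allocation and the field stores stay ordinary; only the single store that makes the object reachable carries the ordering, which is the division of labor described above.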
Correspondingly, the ACQUIRE was intended to order the ->stats pointer load with later, _independent_ dereferences of the same pointer; the latter are, e.g., in taskstats_exit() (but not thread_group_empty()).
How can these later loads be completely independent of the pointer value? They need to obtain the pointer value from somewhere. And this can only be done by loading it. And if a thread loads a pointer and then dereferences that pointer, that's a data/address dependency and we assume this is now covered by READ_ONCE. Or can these later loads of the pointer also race with the store? If so, I think they also need to use READ_ONCE (rather than turn this earlier pointer load into acquire).
To clarify, later loads as in taskstats_exit() and thread_group_empty(), not the later load in the double-checked locking case.
Using READ_ONCE() in the alloc, taskstats_exit(), and thread_group_empty() case.
Christian
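As a reader-side illustration of the "load the pointer once, then dereference that value" argument made in this thread, here is a minimal sketch. The READ_ONCE() macro below is a simplified userspace approximation (volatile cast, GNU __typeof__ extension), not the kernel's actual definition, and struct group, struct stats and group_counter() are hypothetical. Note that plain C gives no guarantee about dependency ordering across threads; the kernel relies on its own memory model for that, and the writer side still has to publish the object properly.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified approximation of the kernel macro; needs GCC or Clang. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-ins for struct taskstats and the shared state. */
struct stats {
    long counter;
};

struct group {
    struct stats *stats;
};

/* Returns -1 if no stats have been published for this group. */
static long group_counter(struct group *g)
{
    struct stats *stats;

    assert(g != NULL);

    /* Load the shared pointer exactly once into a local. */
    stats = READ_ONCE(g->stats);
    if (!stats)
        return -1;

    /*
     * Dereference the local copy, not g->stats again: this load's
     * address depends on the value that was just read.
     */
    return stats->counter;
}
```

Re-reading g->stats for the dereference would be a second, separate load that could observe a different value than the one that was checked.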