On Fri, Aug 18, 2023 at 3:19 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
On Fri, Aug 18, 2023 at 11:44:45AM -0700, Yosry Ahmed wrote:
On Fri, Aug 18, 2023 at 11:35 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
On Fri, Aug 18, 2023 at 10:45:56AM -0700, Yosry Ahmed wrote:
On Fri, Aug 18, 2023 at 10:35 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
On Fri, Aug 18, 2023 at 07:56:37AM -0700, Yosry Ahmed wrote:
If so, it seems possible for this race to happen:
cpu #1                                cpu#2
css_put()
/* css_free_rwork_fn is queued */
                                      rcu_read_lock()
                                      mem_cgroup_from_id()
mem_cgroup_id_remove()
                                      /* access memcg */
I don't quite see how that'd be possible. IDR uses rcu_assign_pointer() during deletion, which inserts the necessary barriering. My understanding is that this should always be safe:
rcu_read_lock()             (writer serialization, in this case ref count == 0)
foo = idr_find(x)           idr_remove(x)
if (foo)                    kfree_rcu(foo)
        LOAD(foo->bar)
rcu_read_unlock()
How does a barrier inside IDR removal protect against the memcg being freed here though?
If css_put() is executed out-of-order before mem_cgroup_id_remove(), the memcg can be freed even before mem_cgroup_id_remove() is called, right?
css_put() can start earlier, but the RCU callback that frees the memcg cannot be reordered ahead of the rcu_assign_pointer() in idr_remove().
That is what RCU and its access primitives guarantee: once the pointer has been "unpublished", the memory is freed only after all concurrent RCU-protected accesses to the object have finished.
I am not sure I understand; this is the scenario I mean:
cpu#1                  cpu#2                  cpu#3
css_put()
/* schedule free */
                                              rcu_read_lock()
                       idr_remove()
                                              mem_cgroup_from_id()
/* free memcg */
                                              /* use memcg */
idr_remove() cannot be reordered to after the free is scheduled. Think about it: this is the common RCU-freeing pattern:
rcu_assign_pointer(p, NULL);
call_rcu(rh, free_pointee);
on the write side, and:
rcu_read_lock();
pointee = rcu_dereference(p);
if (pointee)
        do_stuff(pointee);
rcu_read_unlock();
on the read side.
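To make that concrete, here is the same pattern spelled out with a self-contained object (a minimal kernel-style sketch; struct foo and the function names are made up for illustration, and writer serialization is assumed as above):

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
        struct rcu_head rcu;
        int bar;
};

static struct foo __rcu *p;

/* write side: unpublish the pointer, then free after a grace period */
static void foo_unpublish(struct foo *foo)
{
        /* callers serialize against each other, e.g. via refcount == 0 */
        rcu_assign_pointer(p, NULL);
        kfree_rcu(foo, rcu);    /* frees only after current readers exit */
}

/* read side: a reader that sees the pointer holds the object alive */
static int foo_read_bar(void)
{
        struct foo *foo;
        int bar = -1;

        rcu_read_lock();
        foo = rcu_dereference(p);
        if (foo)
                bar = foo->bar; /* safe until rcu_read_unlock() */
        rcu_read_unlock();

        return bar;
}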
In our case, the rcu_assign_pointer() is in idr_remove(). And the rcu_dereference() is in mem_cgroup_from_id() -> idr_find() -> radix_tree_lookup() -> radix_tree_descend().
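(For anyone following along, the relevant memcg pieces look roughly like this; a simplified sketch of mm/memcontrol.c from memory, not verbatim:)

static DEFINE_IDR(mem_cgroup_idr);

static void mem_cgroup_id_remove(struct mem_cgroup *memcg)
{
        if (memcg->id.id > 0) {
                /* unpublishes the slot via rcu_assign_pointer() internally */
                idr_remove(&mem_cgroup_idr, memcg->id.id);
                memcg->id.id = 0;
        }
}

struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
{
        WARN_ON_ONCE(!rcu_read_lock_held());
        /* idr_find() rcu_dereference()s the radix tree slots */
        return idr_find(&mem_cgroup_idr, id);
}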
So if we find the memcg in the idr under rcu lock, the cgroup rcu work is guaranteed to not run until the lock is dropped. If we don't find it, it may or may not have already run.
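Which is also why a caller that wants to use the memcg beyond the RCU section has to pair the lookup with a reference attempt before dropping the lock, along these lines (an illustrative sketch, not a quote of any particular call site):

struct mem_cgroup *memcg;

rcu_read_lock();
memcg = mem_cgroup_from_id(id);
/* the lookup alone only pins the memcg for this RCU section */
if (memcg && !css_tryget(&memcg->css))
        memcg = NULL;
rcu_read_unlock();

if (memcg) {
        /* use memcg; it is now pinned by the css reference */
        css_put(&memcg->css);
}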
Yeah I missed the implicit barrier there, thanks for bearing with me. I think Shakeel might have found the actual problem here (see his response).