On Wed, 3 Jul 2024 09:56:45 -0700 Mina Almasry wrote:
Is this really sufficient in terms of locking? @binding is not RCU-protected and neither is the reader guaranteed to be in an RCU critical section. Actually the "reader" tries to take a ref and use this struct so it's not even a pure reader.
Let's add a lock or use one of the existing locks
Can we just use rtnl_lock() for this synchronization? It seems it's already locked everywhere that access mp_params.mp_priv, so the WRITE/READ_ONCE are actually superfluous. Both the dmabuf bind/unbind already lock rtnl_lock, and the only other place that access mp_params.mp_priv is page_pool_init(). I think it's reasonable to assume rtnl_lock is also held during page_pool_init, no? AFAICT it would be very weird for some code path to be reconfiguring the driver page_pools without holding rtnl_lock?
What I wanna do here is delete the incorrect comment, remove the READ/WRITE_ONCE, and maybe add a DEBUG_NET_WARN_ON(!rtnl_is_locked()) in mp_dmabuf_devmem_init() but probably that is too defensive.
The only concern I have is driver error recovery paths. They may be async and may happen outside of rtnl_lock. Same situation we have with the queue <> NAPI <> IRQ mapping helpers. queue <> NAPI <> IRQ helpers require rtnl_lock today, and Intel recently had a number of fixes because that complicates their error recovery paths.
But I guess any locking here will take non-trivial amount of analysis. Let's go with rtnl_lock, that's fine.