On Wed, 3 Jun 2026 21:43:05 -0300 Daniel Almeida dwlsalmeida@gmail.com wrote:
On 3 Jun 2026, at 14:14, Boris Brezillon boris.brezillon@collabora.com wrote:
On Wed, 3 Jun 2026 13:41:02 -0300 Daniel Almeida dwlsalmeida@gmail.com wrote:
- /// Called when the fence is signaled.
- ///
- /// This is called from the fence signaling path, which may be in interrupt
- /// context or with locks held, which is why `self` is only borrowed, so that
- /// it cannot drop. Implementations must not sleep or perform
- /// long-running operations.
- ///
- /// An implementation likely wants to inform itself (e.g., through a work item)
- /// within this callback that the associated [`FenceCbRegistration`] can now be
- /// dropped.
- fn called(&mut self);
This is a central point. We ideally would want this to consume self, because we may want to move things out of the callback.
This one comes from me. The rationale is that ::called() runs in atomic context, and releasing the resources attached to the callback data might require acquiring sleeping locks. Sometimes you don't even notice immediately, because said resources are refcounted and the lock is only acquired when you happen to be the last owner. Yes, those cases can be caught at runtime if the C side is properly annotated with might_sleep(), but that's not always the case.
If we defer the drop of the data until the FenceCb is dropped/recycled, we're at least not constrained by this "runs in atomic context" thing.
This design does not solve it, because one can quite trivially get around this restriction using Option<T> as I said. If your point is “don’t run any drop() here”, then &mut self doesn’t do it.
My bad, I thought you were talking about some Option<T> in FenceCbRegistration<T> (there was one at some point, but it's gone now), but you're talking about having an Option<X> inside the T. Yes, there's indeed nothing preventing a drop of X in that path, and it's just as bad as passing the fence back by value to the callback in that case.
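To make the point above concrete, here is a minimal userspace sketch, not kernel code. The `FenceCb` trait below is a local stand-in for the one quoted in the patch, and `Resource`, `MyCallback` and `dropped_inside_called()` are hypothetical names used only for illustration. It shows that `&mut self` keeps the callback object itself alive, yet an `Option` field still lets a destructor run inside `called()`.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Local stand-in for the trait in the quoted patch (assumption, not the kernel API).
trait FenceCb {
    fn called(&mut self);
}

/// Hypothetical resource; imagine its destructor taking a sleeping lock.
struct Resource {
    dropped: Rc<Cell<bool>>,
}

impl Drop for Resource {
    fn drop(&mut self) {
        // Record that the destructor ran so the caller can observe when.
        self.dropped.set(true);
    }
}

struct MyCallback {
    res: Option<Resource>,
}

impl FenceCb for MyCallback {
    fn called(&mut self) {
        // `self` is only borrowed, but `take()` moves the inner value out,
        // and that value is dropped right here, in the signaling path.
        drop(self.res.take());
    }
}

/// Returns true if the resource was dropped while `called()` ran,
/// that is, before the callback object itself was dropped.
fn dropped_inside_called() -> bool {
    let dropped = Rc::new(Cell::new(false));
    let mut cb = MyCallback {
        res: Some(Resource { dropped: dropped.clone() }),
    };
    cb.called();
    let observed = dropped.get(); // `cb` is still alive at this point
    drop(cb);
    observed
}

fn main() {
    assert!(dropped_inside_called());
}
```

In other words, the borrow only protects the outer type; anything behind an `Option` (or any other field that can be replaced in place) can still be dropped from the callback.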
Consider a fence design where signal() consumes self. Now consider this:
    impl FenceCb for MyCallback {
        fn called(&mut self) {
            // Can't move the fence out, so we have to put an Option<T> just to be
            // able to move.
            if let Some(f) = self.some_fence.take() {
                f.signal();
            }
        }
    }

This used to be the case when our version of the job queue used the "proxy fence" design:
    // Callback on the hw fence
    impl FenceCb for MyCallback {
        fn called(&mut self) {
            if let Some(f) = self.submit_fence.take() {
                f.signal();
            }

I'm pretty sure lockdep won't like it anyway, because this is nested locking of the same lock class. For such proxies, we'll need to teach lockdep about the nesting, as was recently done for dma_fence_array & co. But I'm digressing.
Yeah, but this is more about resource transfer in general, not this pattern specifically.
I agree that this has issues, and yes, lockdep complained back then :)
The thing is, there are so many things that could go wrong because of the context this callback is called in. Nested locking is one of them, the fact we can't sleep is another. And with Rust it's even worse, because of the implicit drops that will happen when you take ownership of resources (taking sleeping locks to remove resources from a dataset, for instance).
So, by passing self by value to ::called(), you're basically telling users "hey, BTW, don't forget to defer the drop to some workqueue if you think it's not atomic-safe". And how can users know that the thing they're about to drop can be dropped in atomic context? They basically have to audit the ::drop() of all the resources they embed in their type implementing FenceCb. Not only that, but they also have to design the thing so the deferral of this ::drop() doesn't allocate, because, obviously, allocating in atomic context is tricky/fallible. AFAIK, none of this can be spotted at compile time (I remember Gary/Danilo mentioning that we could teach klint about some of these rules). This would leave us with runtime checks like might_sleep(), but most of the C putters (xxx_put(object)) don't have might_sleep() in the path where the decref doesn't lead to a refcnt=0 situation.
TL;DR: Call this PTSD if you want, but these are the sort of bugs I struggled with on the C side, and I can predict that the exact same will happen in Rust drivers if we expose the FenceCb as it is designed here and we don't have a way to check the soundness of the FenceCb implementations at compile time.
The other option (the one I've been advocating for from the start) is to not let drivers implement FenceCb (make it private), but instead have a bunch of implementations that we know are safe. Here's a list of implementations that I think would unblock most of the driver use cases:
- wake up a thread
- complete a completion object
- schedule a WorkItem
- schedule a kthread_worker (once we get a proper rust abstraction for that)
It doesn't mean we can't have optimized FenceCb implementations that do a lot more in the called() path instead of deferring to a workqueue/thread, but at least those would have to be implemented in dma_fence.rs, and the dma_fence.rs maintainers can then carefully audit the code as part of the review process, which we know is not really the case when changes touch driver code only.
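As a rough illustration of the "private FenceCb plus vetted implementations" idea, here is a userspace sketch using the sealed-trait pattern. Everything in it is an assumption made for the example: the `fence` module stands in for dma_fence.rs, `Completion` is a toy built on an `AtomicBool` rather than the kernel completion, and `CompleteOnSignal`, `signal()` and `signal_and_check()` are hypothetical names.

```rust
use std::sync::Arc;

mod fence {
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Arc;

    mod sealed {
        // Nameable only inside `fence`, so outside code cannot implement it.
        pub trait Sealed {}
    }

    /// Visible to users, but only implementable inside this module.
    pub trait FenceCb: sealed::Sealed {
        fn called(&mut self);
    }

    /// Toy stand-in for a completion object (assumption, not the kernel type).
    #[derive(Default)]
    pub struct Completion {
        done: AtomicBool,
    }

    impl Completion {
        pub fn complete(&self) {
            self.done.store(true, Ordering::Release);
        }

        pub fn is_done(&self) -> bool {
            self.done.load(Ordering::Acquire)
        }
    }

    /// One of the vetted callbacks: complete a completion object.
    pub struct CompleteOnSignal(Arc<Completion>);

    impl CompleteOnSignal {
        pub fn new(c: Arc<Completion>) -> Self {
            Self(c)
        }
    }

    impl sealed::Sealed for CompleteOnSignal {}

    impl FenceCb for CompleteOnSignal {
        fn called(&mut self) {
            // No drop, no allocation, no sleeping: just flip the flag.
            self.0.complete();
        }
    }

    /// Stand-in for the fence signaling path invoking a registered callback.
    pub fn signal(cb: &mut dyn FenceCb) {
        cb.called();
    }
}

/// Registers the vetted callback, signals, and reports whether it completed.
fn signal_and_check() -> bool {
    let done = Arc::new(fence::Completion::default());
    let mut cb = fence::CompleteOnSignal::new(done.clone());
    fence::signal(&mut cb);
    done.is_done()
}

fn main() {
    assert!(signal_and_check());
}
```

A type defined outside `fence` cannot implement `fence::FenceCb`, because it has no way to name `sealed::Sealed`. Any new callback flavour therefore has to be added inside the module, where it goes through that module's review.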
FWIW, I think the FenceProxy design you were describing falls into this "must be carefully audited" bucket, and should be implemented in dma_fence.rs.
}
Although this is not the case anymore, since we phased out this design given Christian's recent work. Still, we should ideally not require Option<T> here in general just to make resource transfer possible.
I see. OTOH, don't we need to make this inner data movable if we want to cancel the FenceCb before the fence is signaled anyway? And that's most certainly a case we have in the teardown path.
Can you expand a bit on what you mean here?
Never mind, I was confusing two different iterations of the code here. I thought the Option<T> you were mentioning was in FenceCbRegistration<T>, with some explicit ::cancel() function that would return Option<T> so the user can get its resources back when it cancels the registration, and also know whether the callback was called or not. But this is all gone now, and all we can do is drop the registration, which will automatically drop the inner T.
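For readers who did not follow the earlier iteration, the following is one possible reading of the cancel-based API described above, written as a userspace sketch. It is not taken from any posted patch: the by-value `called(self)`, the `fire()` helper standing in for the signaling path, and the `Notify` type are all assumptions made for illustration.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Assumed by-value callback trait for this sketch only.
trait FenceCb {
    fn called(self);
}

/// Hypothetical registration that owns the callback data until it fires.
struct FenceCbRegistration<T: FenceCb> {
    data: Option<T>,
}

impl<T: FenceCb> FenceCbRegistration<T> {
    fn new(data: T) -> Self {
        Self { data: Some(data) }
    }

    /// Stand-in for the fence signaling path: consumes the data if present.
    fn fire(&mut self) {
        if let Some(cb) = self.data.take() {
            cb.called();
        }
    }

    /// Returns the data if the callback has not run yet, `None` otherwise.
    fn cancel(mut self) -> Option<T> {
        self.data.take()
    }
}

/// Hypothetical callback payload that counts how often it was invoked.
struct Notify(Rc<Cell<u32>>);

impl FenceCb for Notify {
    fn called(self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Cancels before signaling: the caller gets its data back, nothing ran.
fn cancel_before_signal() -> (bool, u32) {
    let hits = Rc::new(Cell::new(0));
    let reg = FenceCbRegistration::new(Notify(hits.clone()));
    let got_back = reg.cancel().is_some();
    (got_back, hits.get())
}

/// Cancels after signaling: the callback ran once, nothing is returned.
fn cancel_after_signal() -> (bool, u32) {
    let hits = Rc::new(Cell::new(0));
    let mut reg = FenceCbRegistration::new(Notify(hits.clone()));
    reg.fire();
    let got_back = reg.cancel().is_some();
    (got_back, hits.get())
}

fn main() {
    assert_eq!(cancel_before_signal(), (true, 0));
    assert_eq!(cancel_after_signal(), (false, 1));
}
```

With the current code there is no such return path: dropping the registration drops the inner T, whether or not the callback ran.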