Re: [Linaro-mm-sig] [RFC] Add DMA_RESV_USAGE flags

21 May 2021


      On Thu, May 20, 2021 at 9:15 PM Daniel Vetter daniel@ffwll.ch wrote:
...
On Thu, May 20, 2021 at 9:04 PM Jason Ekstrand jason@jlekstrand.net wrote:
...
On Thu, May 20, 2021 at 12:23 PM Jason Ekstrand jason@jlekstrand.net wrote:
...
On Thu, May 20, 2021 at 5:50 AM Christian König
ckoenig.leichtzumerken@gmail.com wrote:
...
Am 20.05.21 um 09:55 schrieb Daniel Vetter:
...
On Wed, May 19, 2021 at 5:48 PM Michel Dänzer michel@daenzer.net wrote:
...
On 2021-05-19 5:21 p.m., Jason Ekstrand wrote:
> On Wed, May 19, 2021 at 5:52 AM Michel Dänzer michel@daenzer.net wrote:
>> On 2021-05-19 12:06 a.m., Jason Ekstrand wrote:
>>> On Tue, May 18, 2021 at 4:17 PM Daniel Vetter daniel@ffwll.ch wrote:
>>>> On Tue, May 18, 2021 at 7:40 PM Christian König
>>>> ckoenig.leichtzumerken@gmail.com wrote:
>>>>> Am 18.05.21 um 18:48 schrieb Daniel Vetter:
>>>>>> On Tue, May 18, 2021 at 2:49 PM Christian König
>>>>>> ckoenig.leichtzumerken@gmail.com wrote:
>>>>>>
>>>>>>> And as long as we are all inside amdgpu we also don't have any oversync,
>>>>>>> the issue only happens when we share dma-bufs with i915 (radeon and
>>>>>>> AFAIK nouveau does the right thing as well).
>>>>>> Yeah because then you can't use the amdgpu dma_resv model anymore and
>>>>>> have to use the one atomic helpers use. Which is also the one that
>>>>>> e.g. Jason is threathening to bake in as uapi with his dma_buf ioctl,
>>>>>> so as soon as that lands and someone starts using it, something has to
>>>>>> adapt _anytime_ you have a dma-buf hanging around. Not just when it's
>>>>>> shared with another device.
>>>>> Yeah, and that is exactly the reason why I will NAK this uAPI change.
I just re-sent my dma-buf sync_file import/export series.  Assuming we
can sort out what implicit sync looks like on the inside of dma-buf,
would that alleviate some of your uAPI fears?  The idea would be that
radeonsi and RADV would use amdgpu explicit sync primitives for
everything and then, at the very end, fetch a sync_file and stuff it
in the dma-buf's implicit sync container.  No nasty new uAPI for you.
We still get implicit sync.  Everyone wins?
You still need the implicit fencing opt-out, which currently amdgpu
lacks completely.
I think one of my big questions is to what extent we need another
opt-out in amdgpu (I mean I certainly want the submission level
opt-out but I don't think it is blocking)? Currently amdgpu has two
opt outs at buffer creation time:
1) AMDGPU_GEM_CREATE_EXPLICIT_SYNC
2) AMDGPU_GEM_CREATE_VM_ALWAYS_VALID
Together implicit sync is mostly disabled for Vulkan outside of the
WSI, allowing multiple engines simultaneously if we just use shared
fences for these (or for the latter not really any fences at the BO
level, but they are not shareable). On GL I think radeonsi now
disabled essentially all SDMA usage meaning it is pretty much
restricted to one engine at a time. Leaves video, where I'm not sure.
Maybe reference frames? On the other hand, video often involves a
process/drm-file boundary in which case the shared fences already
worked as exclusive and hence prohibit cross-engine usage.
...
But I also thought through the security implications of the patch set
(including the exclusive injection patch 4), and I think even with
current amdgpu that's perfectly fine. Not very useful since the fences
you get out aren't reflecting status accurately, but that's not a
correctness/security issue. You'll simply hit stalls when you don't
expect, because the kernel is allowed to throw random other exclusive
fences in whenever it feels like.
...
Of course, that still leaves the question of what read and write
fences are, what they mean, and where they go in the dma_resv.  But
I'm trying to separate problems here.
Yeah I'll dump my patch set for clarifying status quo tomorrow for that.
-Daniel
...
--Jason
...
...
...
...
>>>>> This doesn't works for amdgpu at all for the reasons outlined above.
>>>> Uh that's really not how uapi works. "my driver is right, everyone
>>>> else is wrong" is not how cross driver contracts are defined. If that
>>>> means a perf impact until you've fixed your rules, that's on you.
>>>>
>>>> Also you're a few years too late with nacking this, it's already uapi
>>>> in the form of the dma-buf poll() support.
>>> ^^  My fancy new ioctl doesn't expose anything that isn't already
>>> there.  It just lets you take a snap-shot of a wait instead of doing
>>> an active wait which might end up with more fences added depending on
>>> interrupts and retries.  The dma-buf poll waits on all fences for
>>> POLLOUT and only the exclusive fence for POLLIN.  It's already uAPI.
>> Note that the dma-buf poll support could be useful to Wayland compositors for the same purpose as Jason's new ioctl (only using client buffers which have finished drawing for an output frame, to avoid missing a refresh cycle due to client drawing), *if* it didn't work differently with amdgpu.
>>
>> Am I understanding correctly that Jason's new ioctl would also work differently with amdgpu as things stand currently? If so, that would be a real bummer and might hinder adoption of the ioctl by Wayland compositors.
> My new ioctl has identical semantics to poll().  It just lets you take
> a snapshot in time to wait on later instead of waiting on whatever
> happens to be set right now.  IMO, having identical semantics to
> poll() isn't something we want to change.
Agreed.
I'd argue then that making amdgpu poll semantics match those of other drivers is a pre-requisite for the new ioctl, otherwise it seems unlikely that the ioctl will be widely adopted.
This seems backwards, because that means useful improvements in all
other drivers are stalled until amdgpu is fixed.
Well there is nothing to fix in amdgpu, what we need to is to come up
with an DMA-buf implicit syncing model which works for everyone.
I've pointed this problem out at FOSDEM roughly 6 years ago, before
DMA-buf was even merged upstream and way before amdgpu even existed. And
the response was yeah, maybe we need to look at this as well.
Over the years I've mentioned now at least 5 times that this isn't going
to work in some situations and came up with different approaches how to
fix it.
And you still have the nerves to tell me that this isn't a problem and
we should fix amdgpu instead? Sorry, but I'm really running out of ideas
how to explain why this isn't working for everybody.
I'm trying really hard to not fuel a flame war here but I tend to lean
Daniel's direction on this.  Stepping back from the individual needs
of amdgpu and looking at things from the PoV of Linux as a whole, AMD
being a special snowflake here is bad.  I think we have two problems:
amdgpu doesn't play by the established rules, and the rules don't work
well for amdgpu.  We need to solve BOTH problems.  Does that mean we
need to smash something into amdgpu to force it into the dma-buf model
today?  Maybe not; stuff's working well enough, I guess.  But we can't
just rewrite all the rules and break everyone else either.
...
That amdgpu wants to be special is true, but it is a fundamental problem
that we have designed the implicit sync in DMA-buf only around the needs
of DRM drivers at that time instead of going a step back and saying hey
what would be an approach which works for everyone.
How else was it supposed to be designed?  Based on the needs of
non-existent future drivers?  That's just not fair.  We (Intel) are
being burned by various aspects of dma-buf these days too.  It does no
good to blame past developers or our past selves for not knowing the
future.  It sucks but it's what we have.  And, to move forward, we
need to fix it.  Let's do that.
My concern with the flags approach as I'm beginning to digest it is
that it's a bit too much of an attempt to rewrite history for my
liking.  What do I mean by that?  I mean that any solution we come up
with needs ensure that legacy drivers and modern drivers can play
nicely together.  Either that or we need to modernize all the users of
dma-buf implicit sync.  I really don't like the "as long as AMD+Intel
works, we're good" approach.
...
You just need to apply my example from FOSDEM with ring buffers in a
single BO to the DMA-buf implicit sync model and immediately see how it
falls apart.
...
I think we need agreement on what the rules are, reasonable plan to
get there, and then that should be enough to unblock work in the wider
community. Holding the community at large hostage because one driver
is different is really not great.
Well forcing a drivers into a synchronization model not ideal for their
hardware isn't great either.
As I said above, we're feeling the pain too.
--Jason
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Re: [Linaro-mm-sig] [RFC] Add DMA_RESV_USAGE flags