On Wed, 23 Nov 2022 at 10:39, Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 23.11.22 um 10:30 schrieb Daniel Vetter:
On Wed, 23 Nov 2022 at 10:06, Christian König christian.koenig@amd.com wrote:
Am 22.11.22 um 20:50 schrieb Daniel Vetter:
On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe jgg@ziepe.ca wrote:
On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:
You nuke all the ptes. Drivers that move have slightly more than a bare struct file, they also have a struct address_space so that invalidate_mapping_range() works.
Okay, this is one of the ways that this can be made to work correctly, as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this was the DAX mistake)
Hence this patch, to enforce that no dma-buf exporter gets this wrong. Which some did, and then blamed bug reporters for the resulting splats :-) One of the things we've reverted was the ttm huge pte support, since that doesn't have the pmd_special flag (yet) and so would let gup_fast through.
The problem is not only gup, a lot of people seem to assume that when you are able to grab a reference to a page that the ptes pointing to that page can't change any more. And that's obviously incorrect.
I witnessed tons of discussions about that already. Some customers even modified our code assuming that and then wondered why the heck they ran into data corruption.
It's gotten so bad that I've even proposed intentionally mangling the page reference count on TTM allocated pages: https://patchwork.kernel.org/project/dri-devel/patch/20220927143529.135689-1...
Yeah maybe something like this could be applied after we land this patch here.
I think both should land ASAP. We don't have any other way than to clearly point out incorrect approaches if we want to prevent the resulting data corruption.
Well maybe should have the same check in gem mmap code to make sure no driver
Reads like the sentence is somehow cut of?
Yeah, just wanted to say that we need to make sure in as many places as possible that VM_PFNMAP is set for bo mmaps.
I think it would be better that instead of having special flags in the ptes and vmas that you can't follow them to a page structure we would add something to the page indicating that you can't grab a reference to it. But this might break some use cases as well.
Afaik the problem with that is that there's no free page bits left for these debug checks. Plus the pte+vma flags are the flags to make this clear already.
Maybe a GFP flag to set the page reference count to zero or something like this?
Hm yeah that might work. I'm not sure what it will all break though? And we'd need to make sure that underflowing the page refcount dies in a backtrace. -Daniel