On Mon, Apr 19, 2021 at 12:23:02PM +0100, Matthew Wilcox wrote:
On Mon, Apr 19, 2021 at 11:42:18AM +0300, Mike Rapoport wrote:
The perf profile of the test indicated that the regression is caused by page_is_secretmem() called from gup_pte_range() (inlined by gup_pgd_range):
Uhh ... you're calling it in the wrong place!
	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
	page = pte_page(pte);

	if (page_is_secretmem(page))
		goto pte_unmap;

	head = try_grab_compound_head(page, 1, flags);
	if (!head)
		goto pte_unmap;
So you're calling page_is_secretmem() on a struct page without having a refcount on it. That is definitely not allowed. secretmem seems to be full of these kinds of races; I know this isn't the first one I've seen in it. I don't think this patchset is ready for this merge window.
There were races in the older version that cached large pages, and those were fixed at the time, but that is irrelevant anyway because all of that code was dropped in the latest respins.
I don't think the fix for the race in gup_pte_range() is significant enough to justify waiting 3 more months.
With that fixed, you'll have a head page that you can use for testing, which means you don't need to test PageCompound() (because you know the page isn't PageTail), you can just test PageHead().
I can't say I follow you here. page_is_secretmem() is intended to be a generic test, so I don't see how it will get PageHead() if it is called from other places.
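
For readers following the ordering argument, here is a minimal userspace sketch of the reordering discussed for gup_pte_range(): take the reference first, test for secretmem on the page that is now held, and drop the reference if the test hits. It is not kernel code and is not taken from any posted patch. struct mock_page and every mock_* helper are hypothetical stand-ins, and the assumption that the grab fails on a page whose refcount is already zero is a simplification of the real helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not the kernel layout. */
struct mock_page {
	int refcount;	/* 0 models a page that may be freed or reused */
	bool secretmem;	/* stand-in for the state page_is_secretmem() inspects */
};

/* Stand-in for try_grab_compound_head(): refuses a page with no references. */
static struct mock_page *mock_try_grab(struct mock_page *page)
{
	if (page->refcount == 0)
		return NULL;
	page->refcount++;
	return page;
}

/* Stand-in for dropping the reference taken by mock_try_grab(). */
static void mock_put(struct mock_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* Stand-in for page_is_secretmem(); only call it while holding a reference. */
static bool mock_page_is_secretmem(const struct mock_page *page)
{
	return page->secretmem;
}

/*
 * Models one iteration of the loop with the check moved after the grab.
 * Returns true if the page ends up pinned, false if it was skipped.
 */
static bool mock_gup_one(struct mock_page *page)
{
	struct mock_page *head = mock_try_grab(page);

	if (!head)
		return false;

	/* The page is now held, so inspecting it cannot race with a free. */
	if (mock_page_is_secretmem(head)) {
		mock_put(head);
		return false;
	}

	return true;
}
```

In this model an ordinary page comes back pinned with its refcount raised by one, a secretmem page is refused with its refcount restored, and a page that has already dropped to zero references is never inspected at all.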