On Wed, Jan 13, 2021 at 3:05 PM Pavel Tatashin pasha.tatashin@soleen.com wrote:
Oh, that existing logic is wrong too :( Another bug.
I do not think there is a bug.
You can't skip pages in the pages[] array under the assumption they are contiguous. ie the i+=step is wrong.
If pages[i] is part of a compound page, the other parts of this page must be sequential in this array for this compound page
That is true only if the PMD points to the page. If the PTE points to a tail page then there is no requirement that other PTEs are contiguous with the compount page.
At this point we have no idea if the GUP logic got this compound page as a head page in a PMD or as a tail page from a PTE, so we can't assume a contiguous run of addresses.
I see, I will fix this bug in an upstream as a separate patch in my series, and keep the fix when my fixes are applied.
Look at split_huge_pmd() - it doesn't break up the compound page it just converts the PMD to a PTE array and scatters the tail pages to the PTE.
Hi Jason,
I've been thinking about this some more. Again, I am not sure this is a bug. I understand split_huge_pmd() may split the PMD size page into PTEs and leave the compound page intact. However, in order for pages[] to have non sequential addresses in compound page, those PTEs must also be migrated after split_huge_pmd(), however when we migrate them we will either migrate the whole compound page or do split_huge_page_to_list() which will in turn do ClearPageCompound(). Please let me know if I am missing something.
Thank you, Pasha
Got it, unfortunately the fix will deoptimize the code by having to check every page if it is part of a previous compound page or not.
I understand Matt is pushing on this idea more by having compound pages in the page cache, but still mapping tail pages when required.
This is actually standard migration procedure, elsewhere in the kernel we migrate pages in exactly the same fashion: isolate and later migrate. The isolation works for LRU only pages.
But do other places cause a userspace visible random failure when LRU isolation fails?
Makes sense, I will remove maximum retries for isolation, and retry indefinitely, the same as it is done during memory hot-remove. So, we will fail only when migration fails.
I don't like it at all, what is the user supposed to do?
Jason