On 23.09.25 13:46, Harry Yoo wrote:
On Tue, Sep 23, 2025 at 11:00:57AM +0200, David Hildenbrand wrote:
On 22.09.25 01:27, Harry Yoo wrote:
Hi. This is supposed to be a patch, but I think it's worth discussing how it should be backported to -stable, so I've labeled it as [DISCUSSION].
The bug described below was unintentionally fixed in v6.5 and not backported to -stable. So technically I would need to use "Option 3" [A],
What is option 3?
Citing Option 3 from [A]:
Ah, I see.
Send the patch, after verifying that it follows the above rules, to stable@vger.kernel.org and mention the kernel versions you wish it to be applied to. When doing so, you must note the upstream commit ID in the changelog of your submission with a separate line above the commit text,
like this: commit <sha1> upstream.
Or alternatively: [ Upstream commit <sha1> ]
If the submitted patch deviates from the original upstream patch (for example because it had to be adjusted for the older API), this must be very clearly documented and justified in the patch description.
Just to clarify: it's fine to do a backport of a commit even though it was not tagged as a fix.
Thanks for looking into it, David!
Ok, I was worried that the original patch's description will confuse people because 1) we don't allow pte_map_offset_lock() to fail in older kernels, which the original patch relies on, and 2) the patch does not mention the race condition (because it fixed the race 'accidentaly' :D).
I'll backport the original patch but make it clear that:
while the original patch did not mention the race condition, the patch fixes a it, and add link to this discussion.
we can't remove 1) pmd_trans_unstable() check in change_pte_range(), and 2) "bad" pmd check in change_pmd_range() because we don't allow pte_offset_map_lock() to fail().
pmd_read_atomic() is used instead of pmdp_get_lockless() beucase it does not exist in older kernels.
Right, and backporting all the prerequisites might not be feasible/desirable.
[...]
goto next;
This is all because we are trying to be smart and walking page tables without the page table lock held. This is just absolutely nasty.
commit 175ad4f1e7a2 ("mm: mprotect: use pmd_trans_unstable instead of taking the pmd_lock") did this :(
Right. I can understand why we would not want to grab the lock when there is a leaf page table. But everything else is just asking for trouble (as we saw :) ).
What about the following check:
if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
Couldn't we have a similar race there when we are concurrently migrating?
An excellent point! I agree that there could be a similar race, but with something other than "bad pmd" error.
Right, instead we'd go into change_pte_range() where we check pmd_trans_unstable().
After that, I'm not sure ... maybe we'll just retry or we'll accidentally try treating it as a PTE table.
Looks like pmd_trans_unstable()->pud_none_or_trans_huge_or_dev_or_clear_bad() would return "0" in case we hit migration entry? :/
It'd be more robust to do something like:
That's also what I had in mind. But all this lockless stuff makes me a bit nervous :)