On 2 May 2024, at 3:33, Ryan Roberts wrote:
On 02/05/2024 02:27, Zi Yan wrote:
On 1 May 2024, at 10:33, Ryan Roberts wrote:
__split_huge_pmd_locked() can be called for a present THP, devmap or (non-present) migration entry. It calls pmdp_invalidate() unconditionally on the pmdp and only determines if it is present or not based on the returned old pmd. This is a problem for the migration entry case because pmd_mkinvalid(), called by pmdp_invalidate() must only be called for a present pmd.
On arm64 at least, pmd_mkinvalid() will mark the pmd such that any future call to pmd_present() will return true. And therefore any lockless pgtable walker could see the migration entry pmd in this state and start interpretting the fields as if it were present, leading to BadThings (TM). GUP-fast appears to be one such lockless pgtable walker.
x86 does not suffer the above problem, but instead pmd_mkinvalid() will corrupt the offset field of the swap entry within the swap pte. See link below for discussion of that problem.
Fix all of this by only calling pmdp_invalidate() for a present pmd. And for good measure let's add a warning to all implementations of pmdp_invalidate[_ad](). I've manually reviewed all other pmdp_invalidate[_ad]() call sites and believe all others to be conformant.
This is a theoretical bug found during code review. I don't have any test case to trigger it in practice.
Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/0dd7827a-6334-439a-8fd0-43c98e6af22b@arm.com/ Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path") Signed-off-by: Ryan Roberts ryan.roberts@arm.com
Right v3; this goes back to the original approach in v1 to fix core-mm rather than push the fix into arm64, since we discovered that x86 can't handle pmd_mkinvalid() being called for non-present pmds either.
I'm pulling in more arch maintainers because this version adds some warnings in arch code to help spot incorrect usage.
Although Catalin had already accepted v2 (fixing arm64) [2] into for-next/fixes, he's agreed to either remove or revert it.
Changes since v1 [1]
- Improve pmdp_mkinvalid() docs to make it clear it can only be called for present pmd (per JohnH, Zi Yan)
- Added warnings to arch overrides of pmdp_invalidate[_ad]() (per Zi Yan)
- Moved comment next to new location of pmpd_invalidate() (per Zi Yan)
[1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.c... [2] https://lore.kernel.org/all/20240430133138.732088-1-ryan.roberts@arm.com/
Thanks, Ryan
Documentation/mm/arch_pgtable_helpers.rst | 6 ++- arch/powerpc/mm/book3s64/pgtable.c | 1 + arch/s390/include/asm/pgtable.h | 4 +- arch/sparc/mm/tlb.c | 1 + arch/x86/mm/pgtable.c | 2 + mm/huge_memory.c | 49 ++++++++++++----------- mm/pgtable-generic.c | 2 + 7 files changed, 39 insertions(+), 26 deletions(-)
The changes in Documentation/mm and mm/* look good to me. Thanks. Reviewed-by: Zi Yan ziy@nvidia.com
Thanks!
I wonder if making Documentation/mm and mm/* changes in a separate patch would be better, since you will not need acks from arch maintainers and get the patch in quicker.
Yeah maybe - I considered that, but then decided I'm literally just adding a debug warning to the arch code so it shouldn't be too controversial. Perhaps wait a few days for acks and if nothing turns up then I'll re-post with it split?
Sure, sounds good to me. Thanks.
-- Best Regards, Yan, Zi