The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 625110b5e9dae9074d8a7e67dd07f821a053eed7 Mon Sep 17 00:00:00 2001
From: Thomas Hellstrom thellstrom@vmware.com Date: Sat, 30 Nov 2019 17:51:32 -0800 Subject: [PATCH] mm/memory.c: fix a huge pud insertion race during faulting
A huge pud page can theoretically be faulted in racing with pmd_alloc() in __handle_mm_fault(). That will lead to pmd_alloc() returning an invalid pmd pointer.
Fix this by adding a pud_trans_unstable() function similar to pmd_trans_unstable() and check whether the pud is really stable before using the pmd pointer.
Race: Thread 1: Thread 2: Comment create_huge_pud() Fallback - not taken. create_huge_pud() Taken. pmd_alloc() Returns an invalid pointer.
This will result in user-visible huge page data corruption.
Note that this was caught during a code audit rather than a real experienced problem. It looks to me like the only implementation that currently creates huge pud pagetable entries is dev_dax_huge_fault() which doesn't appear to care much about private (COW) mappings or write-tracking which is, I believe, a prerequisite for create_huge_pud() falling back on thread 1, but not in thread 2.
Link: http://lkml.kernel.org/r/20191115115808.21181-2-thomas_os@shipmail.org Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages") Signed-off-by: Thomas Hellstrom thellstrom@vmware.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Arnd Bergmann arnd@arndb.de Cc: Matthew Wilcox willy@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 3127f9028f54..798ea36a0549 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -938,6 +938,31 @@ static inline int pud_trans_huge(pud_t pud) } #endif
+/* See pmd_none_or_trans_huge_or_clear_bad for discussion. */ +static inline int pud_none_or_trans_huge_or_dev_or_clear_bad(pud_t *pud) +{ + pud_t pudval = READ_ONCE(*pud); + + if (pud_none(pudval) || pud_trans_huge(pudval) || pud_devmap(pudval)) + return 1; + if (unlikely(pud_bad(pudval))) { + pud_clear_bad(pud); + return 1; + } + return 0; +} + +/* See pmd_trans_unstable for discussion. */ +static inline int pud_trans_unstable(pud_t *pud) +{ +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ + defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) + return pud_none_or_trans_huge_or_dev_or_clear_bad(pud); +#else + return 0; +#endif +} + #ifndef pmd_read_atomic static inline pmd_t pmd_read_atomic(pmd_t *pmdp) { diff --git a/mm/memory.c b/mm/memory.c index 62b5cce653f6..c3902201989f 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4010,6 +4010,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, vmf.pud = pud_alloc(mm, p4d, address); if (!vmf.pud) return VM_FAULT_OOM; +retry_pud: if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) { ret = create_huge_pud(&vmf); if (!(ret & VM_FAULT_FALLBACK)) @@ -4036,6 +4037,11 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, vmf.pmd = pmd_alloc(mm, vmf.pud, address); if (!vmf.pmd) return VM_FAULT_OOM; + + /* Huge pud page fault raced with pmd_alloc? */ + if (pud_trans_unstable(vmf.pud)) + goto retry_pud; + if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) { ret = create_huge_pmd(&vmf); if (!(ret & VM_FAULT_FALLBACK))
On Mon, Dec 16, 2019 at 01:00:44PM +0100, gregkh@linuxfoundation.org wrote:
The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 625110b5e9dae9074d8a7e67dd07f821a053eed7 Mon Sep 17 00:00:00 2001 From: Thomas Hellstrom thellstrom@vmware.com Date: Sat, 30 Nov 2019 17:51:32 -0800 Subject: [PATCH] mm/memory.c: fix a huge pud insertion race during faulting
A huge pud page can theoretically be faulted in racing with pmd_alloc() in __handle_mm_fault(). That will lead to pmd_alloc() returning an invalid pmd pointer.
Fix this by adding a pud_trans_unstable() function similar to pmd_trans_unstable() and check whether the pud is really stable before using the pmd pointer.
Race: Thread 1: Thread 2: Comment create_huge_pud() Fallback - not taken. create_huge_pud() Taken. pmd_alloc() Returns an invalid pointer.
This will result in user-visible huge page data corruption.
Note that this was caught during a code audit rather than a real experienced problem. It looks to me like the only implementation that currently creates huge pud pagetable entries is dev_dax_huge_fault() which doesn't appear to care much about private (COW) mappings or write-tracking which is, I believe, a prerequisite for create_huge_pud() falling back on thread 1, but not in thread 2.
Link: http://lkml.kernel.org/r/20191115115808.21181-2-thomas_os@shipmail.org Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages") Signed-off-by: Thomas Hellstrom thellstrom@vmware.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Arnd Bergmann arnd@arndb.de Cc: Matthew Wilcox willy@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
This one doesn't apply cleanly because 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma") has changed what transparent_hugepage_enabled() does.
The "right" backport here would be to simply change from calling __transparent_hugepage_enabled() to calling transparent_hugepage_enabled() as we don't have 7635d9cbe832 in older kernels, but I worry that if we do end up backporting some part of that logic change later it will diverge us from upstream and will cause for subtle issues that are difficult to debug.
So unless Michal / Andrew yell at me for this, I'm going to take in 7635d9cbe832 even though it's clearly a new feature just to make 625110b5e9da and future patches apply cleanly, and avoid future issues.
Hi!
On 12/16/19 4:57 PM, Sasha Levin wrote:
On Mon, Dec 16, 2019 at 01:00:44PM +0100, gregkh@linuxfoundation.org wrote:
The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 625110b5e9dae9074d8a7e67dd07f821a053eed7 Mon Sep 17 00:00:00 2001 From: Thomas Hellstrom thellstrom@vmware.com Date: Sat, 30 Nov 2019 17:51:32 -0800 Subject: [PATCH] mm/memory.c: fix a huge pud insertion race during faulting
A huge pud page can theoretically be faulted in racing with pmd_alloc() in __handle_mm_fault(). That will lead to pmd_alloc() returning an invalid pmd pointer.
Fix this by adding a pud_trans_unstable() function similar to pmd_trans_unstable() and check whether the pud is really stable before using the pmd pointer.
Race: Thread 1: Thread 2: Comment create_huge_pud() Fallback - not taken. create_huge_pud() Taken. pmd_alloc() Returns an invalid pointer.
This will result in user-visible huge page data corruption.
Note that this was caught during a code audit rather than a real experienced problem. It looks to me like the only implementation that currently creates huge pud pagetable entries is dev_dax_huge_fault() which doesn't appear to care much about private (COW) mappings or write-tracking which is, I believe, a prerequisite for create_huge_pud() falling back on thread 1, but not in thread 2.
Link: https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.kernel... Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages") Signed-off-by: Thomas Hellstrom thellstrom@vmware.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Arnd Bergmann arnd@arndb.de Cc: Matthew Wilcox willy@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
This one doesn't apply cleanly because 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma") has changed what transparent_hugepage_enabled() does.
The "right" backport here would be to simply change from calling __transparent_hugepage_enabled() to calling transparent_hugepage_enabled() as we don't have 7635d9cbe832 in older kernels, but I worry that if we do end up backporting some part of that logic change later it will diverge us from upstream and will cause for subtle issues that are difficult to debug.
So unless Michal / Andrew yell at me for this, I'm going to take in 7635d9cbe832 even though it's clearly a new feature just to make 625110b5e9da and future patches apply cleanly, and avoid future issues.
Isn't this a change just in the patch context?
In any case, please see previous mails regarding additional testing of this patch.
Thanks,
Thomas
linux-stable-mirror@lists.linaro.org