On Mon, Mar 30, 2020 at 09:44:08PM -0700, Sean Christopherson wrote:
On Mon, Mar 30, 2020 at 08:35:29PM -0700, Mike Kravetz wrote:
On 3/28/20 3:10 PM, akpm@linux-foundation.org wrote:
The patch titled Subject: mm/hugetlb: fix a addressing exception caused by huge_pte_offset has been added to the -mm tree. Its filename is mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
This patch should soon appear at http://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-fix-a-addressing-excepti... and later at http://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-fix-a-addressing-excepti...
Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated there every 3-4 working days
From: Longpeng longpeng2@huawei.com Subject: mm/hugetlb: fix a addressing exception caused by huge_pte_offset
This patch is what caused the BUG reported on i386 non-PAE kernel here:
https://lore.kernel.org/linux-mm/CA+G9fYsJgZhhWLMzUxu_ZQ+THdCcJmFbHQ2ETA_YPP...
As a clue, when building in this environment I get:
CC mm/hugetlb.o mm/hugetlb.c: In function ‘huge_pte_offset’: cc1: warning: function may return address of local variable [-Wreturn-local-addr] mm/hugetlb.c:5361:14: note: declared here pud_t *pud, pud_entry; ^~~~~~~~~ cc1: warning: function may return address of local variable [-Wreturn-local-addr] mm/hugetlb.c:5361:14: note: declared here cc1: warning: function may return address of local variable [-Wreturn-local-addr] mm/hugetlb.c:5360:14: note: declared here p4d_t *p4d, p4d_entry; ^~~~~~~~~
Yes, this is certainly very bad.
Non-PAE uses ModeB / PSE paging, which only has 2-level page tables. The non-existent levels get folded in and pmd_offset/pud_offset() return the passed in pointer instead of accessing a table, e.g.:
static inline pmd_t * pmd_offset(pud_t * pud, unsigned long address) { return (pmd_t *)pud; }
The bug probably only manifests with PSE paging because it can have huge pages in the top-level table, i.e. is the only mode that can get a false positive.
This is arguably a bug in pmd_huge/pud_hug(), seems like they should unconditionally return false if the relevant level doesn't exist.
The issue is that to get the READ_ONCE semantic for a lockless flow this hackily defeats the de-reference inside the pXX_offset by passing in a pointer to a stack variable. This is fine unless you actually care about the *address* of the result of pXX_offset, which huge_pte_offset() does.
I can't think of an easy fix here.
Andrew, I think this patch has to be dropped :(
Longpeng can fix the direct bug he saw by not changing the pXX_offset(), but this extra de-reference will remain some theortical/rare bug according to the memory model.
Maybe we need to change pXX_offset to take in the pointer and the de'refd value?
Jason