From: Mike Rapoport rppt@linux.ibm.com
commit 34b1999da935a33be6239226bfa6cd4f704c5c88 upstream.
Jiri Olsa reported a fault when running:
# cat /proc/kallsyms | grep ksys_read ffffffff8136d580 T ksys_read # objdump -d --start-address=0xffffffff8136d580 --stop-address=0xffffffff8136d590 /proc/kcore
/proc/kcore: file format elf64-x86-64
Segmentation fault
general protection fault, probably for non-canonical address 0xf887ffcbff000: 0000 [#1] SMP PTI CPU: 12 PID: 1079 Comm: objdump Not tainted 5.14.0-rc5qemu+ #508 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014 RIP: 0010:kern_addr_valid Call Trace: read_kcore ? rcu_read_lock_sched_held ? rcu_read_lock_sched_held ? rcu_read_lock_sched_held ? trace_hardirqs_on ? rcu_read_lock_sched_held ? lock_acquire ? lock_acquire ? rcu_read_lock_sched_held ? lock_acquire ? rcu_read_lock_sched_held ? rcu_read_lock_sched_held ? rcu_read_lock_sched_held ? lock_release ? _raw_spin_unlock ? __handle_mm_fault ? rcu_read_lock_sched_held ? lock_acquire ? rcu_read_lock_sched_held ? lock_release proc_reg_read ? vfs_read vfs_read ksys_read do_syscall_64 entry_SYSCALL_64_after_hwframe
The fault happens because kern_addr_valid() dereferences existent but not present PMD in the high kernel mappings.
Such PMDs are created when free_kernel_image_pages() frees regions larger than 2Mb. In this case, a part of the freed memory is mapped with PMDs and the set_memory_np_noalias() -> ... -> __change_page_attr() sequence will mark the PMD as not present rather than wipe it completely.
Have kern_addr_valid() check whether higher level page table entries are present before trying to dereference them to fix this issue and to avoid similar issues in the future.
Stable backporting note: ------------------------
Note that the stable marking is for all active stable branches because there could be cases where pagetable entries exist but are not valid - see 9a14aefc1d28 ("x86: cpa, fix lookup_address"), for example. So make sure to be on the safe side here and use pXY_present() accessors rather than pXY_none() which could #GP when accessing pages in the direct map.
Also see:
c40a56a7818c ("x86/mm/init: Remove freed kernel image areas from alias mapping")
for more info.
Reported-by: Jiri Olsa jolsa@redhat.com Signed-off-by: Mike Rapoport rppt@linux.ibm.com Signed-off-by: Borislav Petkov bp@suse.de Reviewed-by: David Hildenbrand david@redhat.com Acked-by: Dave Hansen dave.hansen@intel.com Tested-by: Jiri Olsa jolsa@redhat.com Cc: stable@vger.kernel.org # 4.4+ Link: https://lkml.kernel.org/r/20210819132717.19358-1-rppt@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/mm/init_64.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1182,21 +1182,21 @@ int kern_addr_valid(unsigned long addr) return 0;
pud = pud_offset(pgd, addr); - if (pud_none(*pud)) + if (!pud_present(*pud)) return 0;
if (pud_large(*pud)) return pfn_valid(pud_pfn(*pud));
pmd = pmd_offset(pud, addr); - if (pmd_none(*pmd)) + if (!pmd_present(*pmd)) return 0;
if (pmd_large(*pmd)) return pfn_valid(pmd_pfn(*pmd));
pte = pte_offset_kernel(pmd, addr); - if (pte_none(*pte)) + if (!pte_present(*pte)) return 0;
return pfn_valid(pte_pfn(*pte));