This is the start of the stable review cycle for the 4.14.9 release. There are 159 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Dec 24 08:45:36 UTC 2017. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.9-rc1.gz
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.9-rc1
Peter Hutterer peter.hutterer@who-t.net platform/x86: asus-wireless: send an EV_SYN/SYN_REPORT between state changes
Daniel Lezcano daniel.lezcano@linaro.org thermal/drivers/hisi: Fix multiple alarm interrupts firing
Daniel Lezcano daniel.lezcano@linaro.org thermal/drivers/hisi: Simplify the temperature/step computation
Daniel Lezcano daniel.lezcano@linaro.org thermal/drivers/hisi: Fix kernel panic on alarm interrupt
Daniel Lezcano daniel.lezcano@linaro.org thermal/drivers/hisi: Fix missing interrupt enablement
Niranjana Vishwanathapura niranjana.vishwanathapura@intel.com IB/opa_vnic: Properly return the total MACs in UC MAC list
Scott Franco safranco@intel.com IB/opa_vnic: Properly clear Mac Table Digest
Eric Anholt eric@anholt.net drm/vc4: Avoid using vrefresh==0 mode in DSI htotal math.
Nicholas Piggin npiggin@gmail.com cpuidle: fix broadcast control when broadcast can not be entered
Alexandre Belloni alexandre.belloni@free-electrons.com rtc: set the alarm to the next expiring timer
Hoang Tran tranviethoang.vn@gmail.com tcp: fix under-evaluated ssthresh in TCP Vegas
Chen-Yu Tsai wens@csie.org clk: sunxi-ng: sun6i: Rename HDMI DDC clock to avoid name collision
Arvind Yadav arvind.yadav.cs@gmail.com staging: greybus: light: Release memory obtained by kasprintf
Wei Hu(Xavier) xavier.huwei@huawei.com RDMA/hns: Avoid NULL pointer exception
Mike Manning mmanning@brocade.com net: ipv6: send NS for DAD when link operationally up
Mick Tarsel mjtarsel@linux.vnet.ibm.com ibmvnic: Set state UP
Jacob Keller jacob.e.keller@intel.com fm10k: ensure we process SM mbx when processing VF mbx
Marek Szyprowski m.szyprowski@samsung.com ARM: exynos_defconfig: Enable UAS support for Odroid HC1 board
Alex Williamson alex.williamson@redhat.com vfio/pci: Virtualize Maximum Payload Size
Alan Brady alan.brady@intel.com i40e: fix client notify of VF reset
Dick Kennedy dick.kennedy@broadcom.com scsi: lpfc: Fix warning messages when NVME_TARGET_FC not defined
Dick Kennedy dick.kennedy@broadcom.com scsi: lpfc: PLOGI failures during NPIV testing
Dick Kennedy dick.kennedy@broadcom.com scsi: lpfc: Fix secure firmware updates
Jacob Keller jacob.e.keller@intel.com fm10k: fix mis-ordered parameters in declaration for .ndo_set_vf_bw
Nicolas Dechesne nicolas.dechesne@linaro.org ASoC: codecs: msm8916-wcd-analog: fix module autoload
Marcelo Ricardo Leitner marcelo.leitner@gmail.com sctp: silence warns on sctp_stream_init allocations
Nicholas Piggin npiggin@gmail.com powerpc/watchdog: Do not trigger SMP crash from touch_nmi_watchdog
Nicholas Piggin npiggin@gmail.com powerpc/xmon: Avoid tripping SMP hardlockup watchdog
Ed Blake ed.blake@sondrel.com ASoC: img-parallel-out: Add pm_runtime_get/put to set_fmt callback
Jean-François TĂȘtu jean-francois.tetu@savoirfairelinux.com ASoC: codecs: msm8916-wcd-analog: fix micbias level
Tom Zanussi tom.zanussi@linux.intel.com tracing: Exclude 'generic fields' from histograms
Gabriele Paoloni gabriele.paoloni@huawei.com PCI/AER: Report non-fatal errors only to the affected endpoint
Jacob Keller jacob.e.keller@intel.com i40e/i40evf: spread CPU affinity hints across online CPUs only
Hans de Goede hdegoede@redhat.com Bluetooth: hci_bcm: Fix setting of irq trigger type
Hans de Goede hdegoede@redhat.com Bluetooth: hci_uart_set_flow_control: Fix NULL deref when using serdev
Andrew Jeffery andrew@aj.id.au leds: pca955x: Don't invert requested value in pca955x_gpio_set_value()
Wei Wang weiwan@google.com ipv6: grab rt->rt6i_ref before allocating pcpu rt
William Tu u9012063@gmail.com ip_gre: check packet length and mtu correctly in erspan tx
Guoqing Jiang gqjiang@suse.com md: always set THREAD_WAKEUP and wake up wqueue if thread existed
Luca Miccio lucmiccio@gmail.com block,bfq: Disable writeback throttling
Colin Ian King colin.king@canonical.com IB/rxe: check for allocation failure on elem
Emil Tantilov emil.s.tantilov@intel.com ixgbe: fix use of uninitialized padding
Lorenzo Bianconi lorenzo.bianconi83@gmail.com iio: st_sensors: add register mask for status register
Lihong Yang lihong.yang@intel.com i40e: use the safe hash table iterator when deleting mac filters
Christophe JAILLET christophe.jaillet@wanadoo.fr igb: check memory allocation failure
Fabio Estevam fabio.estevam@nxp.com PM / OPP: Move error message to debug level
Stuart Hayes stuart.w.hayes@gmail.com PCI: Create SR-IOV virtfn/physfn links before attaching driver
Sreekanth Reddy sreekanth.reddy@broadcom.com scsi: mpt3sas: Fix IO error occurs on pulling out a drive from RAID1 volume created on two SATA drive
Varun Prakash varun@chelsio.com scsi: cxgb4i: fix Tx skb leak
David Daney david.daney@cavium.com PCI: Avoid bus reset if bridge itself is broken
Dan Murphy dmurphy@ti.com net: phy: at803x: Change error to EINVAL for invalid MAC
Shakeel Butt shakeelb@google.com kvm, mm: account kvm related kmem slabs to kmemcg
Russell King rmk+kernel@armlinux.org.uk rtc: pl031: make interrupt optional
Christophe Jaillet christophe.jaillet@wanadoo.fr crypto: lrw - Fix an error handling path in 'create()'
Christian Lamparter chunkeey@gmail.com crypto: crypto4xx - increase context and scatter ring buffer elements
Chen-Yu Tsai wens@csie.org clk: sunxi-ng: sun5i: Fix bit offset of audio PLL post-divider
Chen-Yu Tsai wens@csie.org clk: sunxi-ng: nm: Check if requested rate is supported by fractional clock
Shashank Sharma shashank.sharma@intel.com drm: Add retries for lspcon mode detection
Derek Basehore dbasehore@chromium.org backlight: pwm_bl: Fix overflow condition
Jens Wiklander jens.wiklander@linaro.org optee: fix invalid of_node_put() in optee_driver_init()
Thomas Gleixner tglx@linutronix.de x86/cpufeatures: Make CPU bugs sticky
Thomas Gleixner tglx@linutronix.de x86/paravirt: Provide a way to check for hypervisors
Thomas Gleixner tglx@linutronix.de x86/paravirt: Dont patch flush_tlb_single
Andy Lutomirski luto@kernel.org x86/entry/64: Make cpu_entry_area.tss read-only
Andy Lutomirski luto@kernel.org x86/entry: Clean up the SYSENTER_stack code
Andy Lutomirski luto@kernel.org x86/entry/64: Remove the SYSENTER stack canary
Andy Lutomirski luto@kernel.org x86/entry/64: Move the IST stacks into struct cpu_entry_area
Andy Lutomirski luto@kernel.org x86/entry/64: Create a per-CPU SYSCALL entry trampoline
Andy Lutomirski luto@kernel.org x86/entry/64: Return to userspace from the trampoline stack
Andy Lutomirski luto@kernel.org x86/entry/64: Use a per-CPU trampoline stack for IDT entries
Andy Lutomirski luto@kernel.org x86/espfix/64: Stop assuming that pt_regs is on the entry stack
Andy Lutomirski luto@kernel.org x86/entry/64: Separate cpu_current_top_of_stack from TSS.sp0
Andy Lutomirski luto@kernel.org x86/entry: Remap the TSS into the CPU entry area
Andy Lutomirski luto@kernel.org x86/entry: Move SYSENTER_stack to the beginning of struct tss_struct
Andy Lutomirski luto@kernel.org x86/dumpstack: Handle stack overflow on all stacks
Andy Lutomirski luto@kernel.org x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss
Andy Lutomirski luto@kernel.org x86/kasan/64: Teach KASAN about the cpu_entry_area
Andy Lutomirski luto@kernel.org x86/mm/fixmap: Generalize the GDT fixmap mechanism, introduce struct cpu_entry_area
Andy Lutomirski luto@kernel.org x86/entry/gdt: Put per-CPU GDT remaps in ascending order
Andy Lutomirski luto@kernel.org x86/dumpstack: Add get_stack_info() support for the SYSENTER stack
Andy Lutomirski luto@kernel.org x86/entry/64: Allocate and enable the SYSENTER stack
Andy Lutomirski luto@kernel.org x86/irq/64: Print the offending IP in the stack overflow warning
Andy Lutomirski luto@kernel.org x86/irq: Remove an old outdated comment about context tracking races
Josh Poimboeuf jpoimboe@redhat.com x86/unwinder: Handle stack overflows more gracefully
Andy Lutomirski luto@kernel.org x86/unwinder/orc: Dont bail on stack overflow
Boris Ostrovsky boris.ostrovsky@oracle.com x86/entry/64/paravirt: Use paravirt-safe macro to access eflags
Andrey Ryabinin aryabinin@virtuozzo.com x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow
Will Deacon will.deacon@arm.com locking/barriers: Convert users of lockless_dereference() to READ_ONCE()
Will Deacon will.deacon@arm.com locking/barriers: Add implicit smp_read_barrier_depends() to READ_ONCE()
Daniel Borkmann daniel@iogearbox.net bpf: fix build issues on um due to mising bpf_perf_event.h
Andi Kleen ak@linux.intel.com perf/x86: Enable free running PEBS for REGS_USER/INTR
Rudolf Marek r.marek@assembler.cz x86: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD
Ricardo Neri ricardo.neri-calderon@linux.intel.com x86/cpufeature: Add User-Mode Instruction Prevention definitions
Ingo Molnar mingo@kernel.org drivers/misc/intel/pti: Rename the header file to free up the namespace
Juergen Gross jgross@suse.com x86/virt: Add enum for hypervisors to replace x86_hyper
Juergen Gross jgross@suse.com x86/virt, x86/platform: Merge 'struct x86_hyper' into 'struct x86_platform' and 'struct x86_init'
James Morse james.morse@arm.com ACPI / APEI: Replace ioremap_page_range() with fixmap
Andy Lutomirski luto@kernel.org selftests/x86/ldt_gdt: Run most existing LDT test cases against the GDT as well
Andy Lutomirski luto@kernel.org selftests/x86/ldt_gdt: Add infrastructure to test set_thread_area()
Ingo Molnar mingo@kernel.org x86/cpufeatures: Fix various details in the feature definitions
Ingo Molnar mingo@kernel.org x86/cpufeatures: Re-tabulate the X86_FEATURE definitions
Borislav Petkov bp@suse.de x86/mm: Define _PAGE_TABLE using _KERNPG_TABLE
Thomas Gleixner tglx@linutronix.de bitops: Revert cbe96375025e ("bitops: Add clear/set_bit32() to linux/bitops.h")
Thomas Gleixner tglx@linutronix.de x86/cpuid: Replace set/clear_bit32()
Borislav Petkov bp@suse.de x86/entry/64: Shorten TEST instructions
Andy Lutomirski luto@kernel.org x86/traps: Use a new on_thread_stack() helper to clean up an assertion
Andy Lutomirski luto@kernel.org x86/entry/64: Remove thread_struct::sp0
Andy Lutomirski luto@kernel.org x86/entry/32: Fix cpu_current_top_of_stack initialization at boot
Andy Lutomirski luto@kernel.org x86/entry/64: Remove all remaining direct thread_struct::sp0 reads
Andy Lutomirski luto@kernel.org x86/entry/64: Stop initializing TSS.sp0 at boot
Andy Lutomirski luto@kernel.org x86/xen/64, x86/entry/64: Clean up SP code in cpu_initialize_context()
Andy Lutomirski luto@kernel.org x86/entry: Add task_top_of_stack() to find the top of a task's stack
Andy Lutomirski luto@kernel.org x86/entry/64: Pass SP0 directly to load_sp0()
Andy Lutomirski luto@kernel.org x86/entry/32: Pull the MSR_IA32_SYSENTER_CS update code out of native_load_sp0()
Andy Lutomirski luto@kernel.org x86/entry/64: De-Xen-ify our NMI code
Juergen Gross jgross@suse.com xen, x86/entry/64: Add xen NMI trap entry
Andy Lutomirski luto@kernel.org x86/entry/64: Remove the RESTORE_..._REGS infrastructure
Andy Lutomirski luto@kernel.org x86/entry/64: Use POP instead of MOV to restore regs on NMI return
Andy Lutomirski luto@kernel.org x86/entry/64: Merge the fast and slow SYSRET paths
Andy Lutomirski luto@kernel.org x86/entry/64: Use pop instead of movq in syscall_return_via_sysret
Andy Lutomirski luto@kernel.org x86/entry/64: Shrink paranoid_exit_restore and make labels local
Andy Lutomirski luto@kernel.org x86/entry/64: Simplify reg restore code in the standard IRET paths
Andy Lutomirski luto@kernel.org x86/entry/64: Move SWAPGS into the common IRET-to-usermode path
Andy Lutomirski luto@kernel.org x86/entry/64: Split the IRET-to-user and IRET-to-kernel paths
Andy Lutomirski luto@kernel.org x86/entry/64: Remove the restore_c_regs_and_iret label
Ricardo Neri ricardo.neri-calderon@linux.intel.com ptrace,x86: Make user_64bit_mode() available to 32-bit builds
Ricardo Neri ricardo.neri-calderon@linux.intel.com x86/boot: Relocate definition of the initial state of CR0
Ricardo Neri ricardo.neri-calderon@linux.intel.com x86/mm: Relocate page fault error codes to traps.h
Gayatri Kammela gayatri.kammela@intel.com x86/cpufeatures: Enable new SSE/AVX/AVX512 CPU features
Baoquan He bhe@redhat.com x86/mm/64: Rename the register_page_bootmem_memmap() 'size' parameter to 'nr_pages'
Masahiro Yamada yamada.masahiro@socionext.com x86/build: Beautify build log of syscall headers
Josh Poimboeuf jpoimboe@redhat.com x86/asm: Don't use the confusing '.ifeq' directive
Dongjiu Geng gengdongjiu@huawei.com ACPI / APEI: remove the unused dead-code for SEA/NMI notification type
Kirill A. Shutemov kirill.shutemov@linux.intel.com x86/xen: Drop 5-level paging support code from the XEN_PV code
Kirill A. Shutemov kirill.shutemov@linux.intel.com x86/xen: Provide pre-built page tables only for CONFIG_XEN_PV=y and CONFIG_XEN_PVH=y
Andrey Ryabinin aryabinin@virtuozzo.com x86/kasan: Use the same shadow offset for 4- and 5-level paging
Kirill A. Shutemov kirill.shutemov@linux.intel.com mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y
Thomas Gleixner tglx@linutronix.de x86/cpuid: Prevent out of bound access in do_clear_cpu_cap()
Kamalesh Babulal kamalesh@linux.vnet.ibm.com objtool: Print top level commands on incorrect usage
Kees Cook keescook@chromium.org x86/platform/UV: Convert timers to use timer_setup()
Andi Kleen ak@linux.intel.com x86/fpu: Remove the explicit clearing of XSAVE dependent features
Andi Kleen ak@linux.intel.com x86/fpu: Make XSAVE check the base CPUID features before enabling
Andi Kleen ak@linux.intel.com x86/fpu: Parse clearcpuid= as early XSAVE argument
Andi Kleen ak@linux.intel.com x86/cpuid: Add generic table for CPUID dependencies
Andi Kleen ak@linux.intel.com bitops: Add clear/set_bit32() to linux/bitops.h
Josh Poimboeuf jpoimboe@redhat.com x86/unwind: Make CONFIG_UNWINDER_ORC=y the default in kconfig for 64-bit
Josh Poimboeuf jpoimboe@redhat.com x86/unwind: Rename unwinder config options to 'CONFIG_UNWINDER_*'
Steven Rostedt (VMware) rostedt@goodmis.org x86/fpu/debug: Remove unused 'x86_fpu_state' and 'x86_fpu_deactivate_state' tracepoints
Ingo Molnar mingo@kernel.org x86/unwinder: Make CONFIG_UNWINDER_ORC=y the default in the 64-bit defconfig
Jan Beulich JBeulich@suse.com ACPI / APEI: adjust a local variable type in ghes_ioremap_pfn_irq()
Josh Poimboeuf jpoimboe@redhat.com x86/head: Add unwind hint annotations
Josh Poimboeuf jpoimboe@redhat.com x86/xen: Add unwind hint annotations
Josh Poimboeuf jpoimboe@redhat.com x86/xen: Fix xen head ELF annotations
Josh Poimboeuf jpoimboe@redhat.com x86/boot: Annotate verify_cpu() as a callable function
Josh Poimboeuf jpoimboe@redhat.com x86/head: Fix head ELF function annotations
Josh Poimboeuf jpoimboe@redhat.com x86/head: Remove unused 'bad_address' code
Josh Poimboeuf jpoimboe@redhat.com x86/head: Remove confusing comment
Josh Poimboeuf jpoimboe@redhat.com objtool: Don't report end of section error after an empty unwind hint
Uros Bizjak ubizjak@gmail.com x86/asm: Remove unnecessary \n\t in front of CC_SET() from asm templates
-------------
Diffstat:
 Documentation/x86/orc-unwinder.txt | 2 +-
 Documentation/x86/x86_64/mm.txt | 2 +-
 Makefile | 8 +-
 arch/arm/configs/exynos_defconfig | 2 +-
 arch/arm64/include/asm/fixmap.h | 7 +
 arch/powerpc/kernel/watchdog.c | 7 +-
 arch/powerpc/xmon/xmon.c | 17 +-
 arch/um/include/asm/Kbuild | 1 +
 arch/x86/Kconfig | 5 +-
 arch/x86/Kconfig.debug | 39 +-
 arch/x86/configs/tiny.config | 4 +-
 arch/x86/configs/x86_64_defconfig | 1 +
 arch/x86/entry/calling.h | 69 +--
 arch/x86/entry/entry_32.S | 6 +-
 arch/x86/entry/entry_64.S | 322 +++++++++---
 arch/x86/entry/entry_64_compat.S | 10 +-
 arch/x86/entry/syscalls/Makefile | 4 +-
 arch/x86/events/core.c | 2 +-
 arch/x86/events/intel/core.c | 4 +
 arch/x86/events/perf_event.h | 24 +-
 arch/x86/hyperv/hv_init.c | 2 +-
 arch/x86/include/asm/archrandom.h | 8 +-
 arch/x86/include/asm/bitops.h | 10 +-
 arch/x86/include/asm/compat.h | 1 +
 arch/x86/include/asm/cpufeature.h | 11 +-
 arch/x86/include/asm/cpufeatures.h | 538 +++++++++++----------
 arch/x86/include/asm/desc.h | 11 +-
 arch/x86/include/asm/fixmap.h | 74 ++-
 arch/x86/include/asm/hypervisor.h | 53 +-
 arch/x86/include/asm/irqflags.h | 3 +
 arch/x86/include/asm/kdebug.h | 1 +
 arch/x86/include/asm/mmu_context.h | 4 +-
 arch/x86/include/asm/module.h | 2 +-
 arch/x86/include/asm/paravirt.h | 14 +-
 arch/x86/include/asm/paravirt_types.h | 2 +-
 arch/x86/include/asm/percpu.h | 2 +-
 arch/x86/include/asm/pgtable_types.h | 3 +-
 arch/x86/include/asm/processor.h | 109 +++--
 arch/x86/include/asm/ptrace.h | 6 +-
 arch/x86/include/asm/rmwcc.h | 2 +-
 arch/x86/include/asm/stacktrace.h | 3 +
 arch/x86/include/asm/switch_to.h | 26 +
 arch/x86/include/asm/thread_info.h | 2 +-
 arch/x86/include/asm/trace/fpu.h | 10 -
 arch/x86/include/asm/traps.h | 21 +-
 arch/x86/include/asm/unwind.h | 15 +-
 arch/x86/include/asm/x86_init.h | 24 +
 arch/x86/include/uapi/asm/processor-flags.h | 3 +
 arch/x86/kernel/Makefile | 10 +-
 arch/x86/kernel/apic/apic.c | 2 +-
 arch/x86/kernel/apic/x2apic_uv_x.c | 5 +-
 arch/x86/kernel/asm-offsets.c | 6 +
 arch/x86/kernel/asm-offsets_32.c | 9 +-
 arch/x86/kernel/asm-offsets_64.c | 4 +
 arch/x86/kernel/cpu/Makefile | 1 +
 arch/x86/kernel/cpu/amd.c | 7 +-
 arch/x86/kernel/cpu/common.c | 195 +++++---
 arch/x86/kernel/cpu/cpuid-deps.c | 121 +++++
 arch/x86/kernel/cpu/hypervisor.c | 64 +--
 arch/x86/kernel/cpu/mshyperv.c | 6 +-
 arch/x86/kernel/cpu/vmware.c | 8 +-
 arch/x86/kernel/doublefault.c | 36 +-
 arch/x86/kernel/dumpstack.c | 74 ++-
 arch/x86/kernel/dumpstack_32.c | 6 +
 arch/x86/kernel/dumpstack_64.c | 6 +
 arch/x86/kernel/fpu/init.c | 11 +
 arch/x86/kernel/fpu/xstate.c | 43 +-
 arch/x86/kernel/head_32.S | 5 +-
 arch/x86/kernel/head_64.S | 45 +-
 arch/x86/kernel/ioport.c | 2 +-
 arch/x86/kernel/irq.c | 12 -
 arch/x86/kernel/irq_64.c | 4 +-
 arch/x86/kernel/kvm.c | 6 +-
 arch/x86/kernel/ldt.c | 2 +-
 arch/x86/kernel/paravirt_patch_64.c | 2 -
 arch/x86/kernel/process.c | 27 +-
 arch/x86/kernel/process_32.c | 8 +-
 arch/x86/kernel/process_64.c | 19 +-
 arch/x86/kernel/smpboot.c | 3 +-
 arch/x86/kernel/traps.c | 72 +--
 arch/x86/kernel/unwind_orc.c | 88 ++--
 arch/x86/kernel/verify_cpu.S | 3 +-
 arch/x86/kernel/vm86_32.c | 20 +-
 arch/x86/kernel/vmlinux.lds.S | 9 +
 arch/x86/kernel/x86_init.c | 9 +
 arch/x86/kvm/mmu.c | 4 +-
 arch/x86/kvm/vmx.c | 2 +-
 arch/x86/lib/delay.c | 4 +-
 arch/x86/mm/fault.c | 88 ++--
 arch/x86/mm/init.c | 2 +-
 arch/x86/mm/init_64.c | 10 +-
 arch/x86/mm/kasan_init_64.c | 262 ++++++++--
 arch/x86/power/cpu.c | 16 +-
 arch/x86/xen/enlighten_hvm.c | 12 +-
 arch/x86/xen/enlighten_pv.c | 15 +-
 arch/x86/xen/mmu_pv.c | 161 +++---
 arch/x86/xen/smp_pv.c | 17 +-
 arch/x86/xen/xen-asm_64.S | 2 +-
 arch/x86/xen/xen-head.S | 11 +-
 block/bfq-iosched.c | 3 +-
 block/blk-wbt.c | 2 +-
 crypto/lrw.c | 6 +-
 drivers/acpi/apei/ghes.c | 78 +--
 drivers/base/power/opp/core.c | 2 +-
 drivers/bluetooth/hci_bcm.c | 23 +-
 drivers/bluetooth/hci_ldisc.c | 7 +
 drivers/clk/sunxi-ng/ccu-sun5i.c | 4 +-
 drivers/clk/sunxi-ng/ccu-sun6i-a31.c | 2 +-
 drivers/clk/sunxi-ng/ccu_nm.c | 3 +
 drivers/cpuidle/cpuidle.c | 1 +
 drivers/crypto/amcc/crypto4xx_core.h | 10 +-
 drivers/gpu/drm/drm_dp_dual_mode_helper.c | 16 +-
 drivers/gpu/drm/vc4/vc4_dsi.c | 3 +-
 drivers/hv/vmbus_drv.c | 2 +-
 drivers/iio/accel/st_accel_core.c | 35 +-
 drivers/iio/common/st_sensors/st_sensors_core.c | 2 +-
 drivers/iio/common/st_sensors/st_sensors_trigger.c | 16 +-
 drivers/iio/gyro/st_gyro_core.c | 15 +-
 drivers/iio/magnetometer/st_magn_core.c | 10 +-
 drivers/iio/pressure/st_pressure_core.c | 15 +-
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 5 +
 drivers/infiniband/sw/rxe/rxe_pool.c | 2 +
 drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c | 1 +
 .../infiniband/ulp/opa_vnic/opa_vnic_vema_iface.c | 8 +-
 drivers/input/mouse/vmmouse.c | 10 +-
 drivers/leds/leds-pca955x.c | 17 +-
 drivers/md/dm-mpath.c | 20 +-
 drivers/md/md.c | 4 +-
 drivers/misc/pti.c | 2 +-
 drivers/misc/vmw_balloon.c | 2 +-
 drivers/net/ethernet/ibm/ibmvnic.c | 2 +
 drivers/net/ethernet/intel/fm10k/fm10k.h | 4 +-
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 12 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c | 16 +-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 7 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 9 +-
 drivers/net/ethernet/intel/igb/igb_main.c | 2 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 4 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 2 +
 drivers/net/phy/at803x.c | 2 +-
 drivers/pci/iov.c | 3 +-
 drivers/pci/pci.c | 4 +
 drivers/pci/pcie/aer/aerdrv_core.c | 9 +-
 drivers/platform/x86/asus-wireless.c | 1 +
 drivers/rtc/interface.c | 2 +-
 drivers/rtc/rtc-pl031.c | 14 +-
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 1 +
 drivers/scsi/lpfc/lpfc_hbadisc.c | 3 +-
 drivers/scsi/lpfc/lpfc_hw4.h | 2 +-
 drivers/scsi/lpfc/lpfc_nvmet.c | 2 +
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 5 +
 drivers/staging/greybus/light.c | 2 +
 drivers/tee/optee/core.c | 1 -
 drivers/thermal/hisi_thermal.c | 74 ++-
 drivers/vfio/pci/vfio_pci_config.c | 6 +-
 drivers/video/backlight/pwm_bl.c | 7 +-
 fs/dcache.c | 4 +-
 fs/overlayfs/ovl_entry.h | 2 +-
 fs/overlayfs/readdir.c | 2 +-
 include/asm-generic/vmlinux.lds.h | 2 +-
 include/linux/compiler.h | 1 +
 include/linux/hypervisor.h | 8 +-
 include/linux/iio/common/st_sensors.h | 7 +-
 include/linux/{pti.h => intel-pti.h} | 6 +-
 include/linux/mm.h | 2 +-
 include/linux/mmzone.h | 6 +-
 include/linux/rculist.h | 4 +-
 include/linux/rcupdate.h | 4 +-
 kernel/events/core.c | 4 +-
 kernel/seccomp.c | 2 +-
 kernel/task_work.c | 2 +-
 kernel/trace/trace_events_hist.c | 4 +-
 lib/Kconfig.debug | 2 +-
 mm/page_alloc.c | 10 +
 mm/slab.h | 2 +-
 mm/sparse.c | 17 +-
 net/ipv4/ip_gre.c | 8 +-
 net/ipv4/tcp_vegas.c | 2 +-
 net/ipv6/addrconf.c | 12 +-
 net/ipv6/route.c | 58 +--
 net/sctp/stream.c | 8 +-
 scripts/Makefile.build | 2 +-
 sound/soc/codecs/msm8916-wcd-analog.c | 9 +-
 sound/soc/img/img-parallel-out.c | 2 +
 tools/objtool/check.c | 7 +-
 tools/objtool/objtool.c | 6 +-
 tools/testing/selftests/x86/ldt_gdt.c | 61 ++-
 virt/kvm/kvm_main.c | 2 +-
 188 files changed, 2414 insertions(+), 1428 deletions(-)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Uros Bizjak ubizjak@gmail.com
commit 3c52b5c64326d9dcfee4e10611c53ec1b1b20675 upstream.
There is no need for \n\t in front of CC_SET(), as the macro already includes these two.
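For reference, CC_SET() and CC_OUT() come from arch/x86/include/asm/asm.h and look roughly like this (a sketch, the exact form depends on whether the compiler supports flag outputs):

	#ifdef __GCC_ASM_FLAG_OUTPUTS__
	# define CC_SET(c) "\n\t/* output condition code " #c "*/\n"
	# define CC_OUT(c) "=@cc" #c
	#else
	# define CC_SET(c) "\n\tset" #c " %[_cc_" #c "]\n"
	# define CC_OUT(c) [_cc_ ## c] "=qm"
	#endif

Either way the expansion starts with "\n\t", which is what makes the extra "\n\t" in the asm templates below redundant.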
Signed-off-by: Uros Bizjak ubizjak@gmail.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Link: http://lkml.kernel.org/r/20170906151808.5634-1-ubizjak@gmail.com
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/archrandom.h |    8 ++++----
 arch/x86/include/asm/bitops.h     |   10 +++++-----
 arch/x86/include/asm/percpu.h     |    2 +-
 arch/x86/include/asm/rmwcc.h      |    2 +-
 4 files changed, 11 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/archrandom.h
+++ b/arch/x86/include/asm/archrandom.h
@@ -45,7 +45,7 @@ static inline bool rdrand_long(unsigned
 	bool ok;
 	unsigned int retry = RDRAND_RETRY_LOOPS;
 	do {
-		asm volatile(RDRAND_LONG "\n\t"
+		asm volatile(RDRAND_LONG
 			     CC_SET(c)
 			     : CC_OUT(c) (ok), "=a" (*v));
 		if (ok)
@@ -59,7 +59,7 @@ static inline bool rdrand_int(unsigned i
 	bool ok;
 	unsigned int retry = RDRAND_RETRY_LOOPS;
 	do {
-		asm volatile(RDRAND_INT "\n\t"
+		asm volatile(RDRAND_INT
 			     CC_SET(c)
 			     : CC_OUT(c) (ok), "=a" (*v));
 		if (ok)
@@ -71,7 +71,7 @@ static inline bool rdrand_int(unsigned i
 static inline bool rdseed_long(unsigned long *v)
 {
 	bool ok;
-	asm volatile(RDSEED_LONG "\n\t"
+	asm volatile(RDSEED_LONG
 		     CC_SET(c)
 		     : CC_OUT(c) (ok), "=a" (*v));
 	return ok;
@@ -80,7 +80,7 @@ static inline bool rdseed_long(unsigned
 static inline bool rdseed_int(unsigned int *v)
 {
 	bool ok;
-	asm volatile(RDSEED_INT "\n\t"
+	asm volatile(RDSEED_INT
 		     CC_SET(c)
 		     : CC_OUT(c) (ok), "=a" (*v));
 	return ok;
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -143,7 +143,7 @@ static __always_inline void __clear_bit(
 static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr)
 {
 	bool negative;
-	asm volatile(LOCK_PREFIX "andb %2,%1\n\t"
+	asm volatile(LOCK_PREFIX "andb %2,%1"
 		CC_SET(s)
 		: CC_OUT(s) (negative), ADDR
 		: "ir" ((char) ~(1 << nr)) : "memory");
@@ -246,7 +246,7 @@ static __always_inline bool __test_and_s
 {
 	bool oldbit;

-	asm("bts %2,%1\n\t"
+	asm("bts %2,%1"
	    CC_SET(c)
	    : CC_OUT(c) (oldbit), ADDR
	    : "Ir" (nr));
@@ -286,7 +286,7 @@ static __always_inline bool __test_and_c
 {
 	bool oldbit;

-	asm volatile("btr %2,%1\n\t"
+	asm volatile("btr %2,%1"
	    CC_SET(c)
	    : CC_OUT(c) (oldbit), ADDR
	    : "Ir" (nr));
@@ -298,7 +298,7 @@ static __always_inline bool __test_and_c
 {
 	bool oldbit;

-	asm volatile("btc %2,%1\n\t"
+	asm volatile("btc %2,%1"
	    CC_SET(c)
	    : CC_OUT(c) (oldbit), ADDR
	    : "Ir" (nr) : "memory");
@@ -329,7 +329,7 @@ static __always_inline bool variable_tes
 {
 	bool oldbit;

-	asm volatile("bt %2,%1\n\t"
+	asm volatile("bt %2,%1"
	    CC_SET(c)
	    : CC_OUT(c) (oldbit)
	    : "m" (*(unsigned long *)addr), "Ir" (nr));
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -526,7 +526,7 @@ static inline bool x86_this_cpu_variable
 {
 	bool oldbit;

-	asm volatile("bt "__percpu_arg(2)",%1\n\t"
+	asm volatile("bt "__percpu_arg(2)",%1"
	    CC_SET(c)
	    : CC_OUT(c) (oldbit)
	    : "m" (*(unsigned long __percpu *)addr), "Ir" (nr));
--- a/arch/x86/include/asm/rmwcc.h
+++ b/arch/x86/include/asm/rmwcc.h
@@ -29,7 +29,7 @@ cc_label:						\
 #define __GEN_RMWcc(fullop, var, cc, clobbers, ...)	\
 do {							\
	bool c;						\
-	asm volatile (fullop ";" CC_SET(cc)		\
+	asm volatile (fullop CC_SET(cc)			\
		      : [counter] "+m" (var), CC_OUT(cc) (c)	\
		      : __VA_ARGS__ : clobbers);	\
	return c;					\
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@redhat.com
commit 00d96180dc38ef872ac471c2d3e14b067cbd895d upstream.
If asm code specifies an UNWIND_HINT_EMPTY hint, don't warn if the section ends unexpectedly. This can happen with the xen-head.S code because the hypercall_page is "text" but it's all zeros.
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com
Cc: Andy Lutomirski luto@kernel.org
Cc: Boris Ostrovsky boris.ostrovsky@oracle.com
Cc: Jiri Slaby jslaby@suse.cz
Cc: Juergen Gross jgross@suse.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Link: http://lkml.kernel.org/r/ddafe199dd8797e40e3c2777373347eba1d65572.1505764066...
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 tools/objtool/check.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1757,11 +1757,14 @@ static int validate_branch(struct objtoo
 		if (insn->dead_end)
 			return 0;

-		insn = next_insn;
-		if (!insn) {
+		if (!next_insn) {
+			if (state.cfa.base == CFI_UNDEFINED)
+				return 0;
 			WARN("%s: unexpected end of section", sec->name);
 			return 1;
 		}
+
+		insn = next_insn;
 	}

 	return 0;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@redhat.com
commit 17270717e80de33a884ad328fea5f407d87f6d6a upstream.
This comment is actively wrong and confusing. It refers to the registers' stack offsets after the pt_regs has been constructed on the stack, but this code is *before* that.
At this point the stack just has the standard iret frame, for which no comment should be needed.
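For context, the standard 64-bit iret frame pushed by the hardware (plus the optional error code) can be sketched in C as follows; the struct and field names here are ours, for illustration only:

	/* Hardware exception frame on x86-64, lowest address (top of stack)
	 * first.  Illustrative sketch, not a kernel structure. */
	struct iret_frame_sketch {
		/* unsigned long error_code;  -- only pushed for some vectors */
		unsigned long ip;	/* RIP */
		unsigned long cs;	/* CS, padded to 8 bytes */
		unsigned long flags;	/* RFLAGS */
		unsigned long sp;	/* RSP */
		unsigned long ss;	/* SS, padded to 8 bytes */
	};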
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com
Cc: Andy Lutomirski luto@kernel.org
Cc: Boris Ostrovsky boris.ostrovsky@oracle.com
Cc: Jiri Slaby jslaby@suse.cz
Cc: Juergen Gross jgross@suse.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Link: http://lkml.kernel.org/r/a3c267b770fc56c9b86df9c11c552848248aace2.1505764066...
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kernel/head_64.S |    4 ----
 1 file changed, 4 deletions(-)

--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -271,10 +271,6 @@ bad_address:

 	__INIT
 ENTRY(early_idt_handler_array)
-	# 104(%rsp) %rflags
-	#  96(%rsp) %cs
-	#  88(%rsp) %rip
-	#  80(%rsp) error code
 	i = 0
 	.rept NUM_EXCEPTION_VECTORS
 	.ifeq (EXCEPTION_ERRCODE_MASK >> i) & 1
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@redhat.com
commit a8b88e84d124bc92c4808e72b8b8c0e0bb538630 upstream.
It's no longer possible for this code to be executed, so remove it.
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com
Cc: Andy Lutomirski luto@kernel.org
Cc: Boris Ostrovsky boris.ostrovsky@oracle.com
Cc: Jiri Slaby jslaby@suse.cz
Cc: Juergen Gross jgross@suse.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Link: http://lkml.kernel.org/r/32a46fe92d2083700599b36872b26e7dfd7b7965.1505764066...
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kernel/head_64.S |    3 ---
 1 file changed, 3 deletions(-)

--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -266,9 +266,6 @@ ENDPROC(start_cpu0)
 	.quad  init_thread_union + THREAD_SIZE - SIZEOF_PTREGS
 	__FINITDATA

-bad_address:
-	jmp bad_address
-
 	__INIT
 ENTRY(early_idt_handler_array)
 	i = 0
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@redhat.com
commit 015a2ea5478680fc5216d56b7ff306f2a74efaf9 upstream.
These functions aren't callable C-type functions, so don't annotate them as such.
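The difference is encoded in the generic linkage macros; a rough sketch of include/linux/linkage.h, for context:

	#define ENTRY(name) \
		.globl name; \
		ALIGN; \
		name:

	#define END(name) \
		.size name, .-name

	#define ENDPROC(name) \
		.type name, @function; \
		END(name)

END() only records the symbol's size, while ENDPROC() additionally types it as a callable function (STT_FUNC), which is what tools like objtool key off.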
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com
Cc: Andy Lutomirski luto@kernel.org
Cc: Boris Ostrovsky boris.ostrovsky@oracle.com
Cc: Jiri Slaby jslaby@suse.cz
Cc: Juergen Gross jgross@suse.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Link: http://lkml.kernel.org/r/36eb182738c28514f8bf95e403d89b6413a88883.1505764066...
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kernel/head_64.S |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -235,7 +235,7 @@ ENTRY(secondary_startup_64)
 	pushq	%rax		# target address in negative space
 	lretq
 .Lafter_lret:
-ENDPROC(secondary_startup_64)
+END(secondary_startup_64)

 #include "verify_cpu.S"

@@ -278,7 +278,7 @@ ENTRY(early_idt_handler_array)
 	i = i + 1
 	.fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
 	.endr
-ENDPROC(early_idt_handler_array)
+END(early_idt_handler_array)

 early_idt_handler_common:
 	/*
@@ -321,7 +321,7 @@ early_idt_handler_common:
 20:
 	decl early_recursion_flag(%rip)
 	jmp restore_regs_and_iret
-ENDPROC(early_idt_handler_common)
+END(early_idt_handler_common)

 	__INITDATA
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@redhat.com
commit e93db75a0054b23a874a12c63376753544f3fe9e upstream.
verify_cpu() is a callable function. Annotate it as such.
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com
Cc: Andy Lutomirski luto@kernel.org
Cc: Boris Ostrovsky boris.ostrovsky@oracle.com
Cc: Jiri Slaby jslaby@suse.cz
Cc: Juergen Gross jgross@suse.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Link: http://lkml.kernel.org/r/293024b8a080832075312f38c07ccc970fc70292.1505764066...
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kernel/verify_cpu.S |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/verify_cpu.S
+++ b/arch/x86/kernel/verify_cpu.S
@@ -33,7 +33,7 @@
 #include <asm/cpufeatures.h>
 #include <asm/msr-index.h>

-verify_cpu:
+ENTRY(verify_cpu)
 	pushf				# Save caller passed flags
 	push	$0			# Kill any dangerous flags
 	popf
@@ -139,3 +139,4 @@ verify_cpu:
 	popf				# Restore caller passed flags
 	xorl	%eax, %eax
 	ret
+ENDPROC(verify_cpu)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@redhat.com
commit 2582d3df95c76d3b686453baf90b64d57e87d1e8 upstream.
Mark the ends of the startup_xen and hypercall_page code sections.
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com
Cc: Andy Lutomirski luto@kernel.org
Cc: Boris Ostrovsky boris.ostrovsky@oracle.com
Cc: Jiri Slaby jslaby@suse.cz
Cc: Juergen Gross jgross@suse.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Link: http://lkml.kernel.org/r/3a80a394d30af43d9cefa1a29628c45ed8420c97.1505764066...
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/xen/xen-head.S |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -34,7 +34,7 @@ ENTRY(startup_xen)
 	mov $init_thread_union+THREAD_SIZE, %_ASM_SP

 	jmp xen_start_kernel
-
+END(startup_xen)
 	__FINIT
 #endif

@@ -48,7 +48,7 @@ ENTRY(hypercall_page)
 	.type xen_hypercall_##n, @function; .size xen_hypercall_##n, 32
 #include <asm/xen-hypercalls.h>
 #undef HYPERCALL
-
+END(hypercall_page)
 .popsection

 ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS,       .asciz "linux")
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@redhat.com
commit abbe1cac6214d81d2f4e149aba64a8760703144e upstream.
Add unwind hint annotations to the xen head code so the ORC unwinder can read head_64.o.
hypercall_page needs empty annotations at 32-byte intervals to match the 'xen_hypercall_*' ELF functions at those locations.
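In other words, each hypercall occupies its own 32-byte slot in the page and each slot is a separate ELF function, so each needs its own hint. Illustrative arithmetic (macro name ours, mirroring the HYPERCALL() definition in the diff below):

	/* Hypothetical helper: address of hypercall nr's 32-byte slot. */
	#define XEN_HYPERCALL_SLOT(nr)	(hypercall_page + (nr) * 32)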
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com
Cc: Andy Lutomirski luto@kernel.org
Cc: Boris Ostrovsky boris.ostrovsky@oracle.com
Cc: Jiri Slaby jslaby@suse.cz
Cc: Juergen Gross jgross@suse.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Link: http://lkml.kernel.org/r/70ed2eb516fe9266be766d953f93c2571bca88cc.1505764066...
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/xen/xen-head.S |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -10,6 +10,7 @@
 #include <asm/boot.h>
 #include <asm/asm.h>
 #include <asm/page_types.h>
+#include <asm/unwind_hints.h>

 #include <xen/interface/elfnote.h>
 #include <xen/interface/features.h>
@@ -20,6 +21,7 @@
 #ifdef CONFIG_XEN_PV
 	__INIT
 ENTRY(startup_xen)
+	UNWIND_HINT_EMPTY
 	cld

 	/* Clear .bss */
@@ -41,7 +43,10 @@ END(startup_xen)
 .pushsection .text
 	.balign PAGE_SIZE
 ENTRY(hypercall_page)
-	.skip PAGE_SIZE
+	.rept (PAGE_SIZE / 32)
+		UNWIND_HINT_EMPTY
+		.skip 32
+	.endr

 #define HYPERCALL(n) \
 	.equ xen_hypercall_##n, hypercall_page + __HYPERVISOR_##n * 32; \
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@redhat.com
commit 2704fbb672d0d9a19414907fda7949283dcef6a1 upstream.
Jiri Slaby reported an ORC issue when unwinding from an idle task. The stack was:
 ffffffff811083c2 do_idle+0x142/0x1e0
 ffffffff8110861d cpu_startup_entry+0x5d/0x60
 ffffffff82715f58 start_kernel+0x3ff/0x407
 ffffffff827153e8 x86_64_start_kernel+0x14e/0x15d
 ffffffff810001bf secondary_startup_64+0x9f/0xa0
The ORC unwinder errored out at secondary_startup_64 because the head code isn't annotated yet so there wasn't a corresponding ORC entry.
Fix that and any other head-related unwinding issues by adding unwind hints to the head code.
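For background, each ORC entry tells the unwinder how to find the previous frame for a given instruction address; a rough sketch of struct orc_entry from arch/x86/include/asm/orc_types.h:

	struct orc_entry {
		s16		sp_offset;	/* previous SP relative to sp_reg */
		s16		bp_offset;
		unsigned	sp_reg:4;	/* base register for sp_offset */
		unsigned	bp_reg:4;
		unsigned	type:2;		/* regs, iret regs, ... */
	} __packed;

Code built without annotations simply has no entries, which is why unwinding through the unannotated head code errored out.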
Reported-by: Jiri Slaby jslaby@suse.cz
Tested-by: Jiri Slaby jslaby@suse.cz
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com
Cc: Andy Lutomirski luto@kernel.org
Cc: Boris Ostrovsky boris.ostrovsky@oracle.com
Cc: Juergen Gross jgross@suse.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Link: http://lkml.kernel.org/r/78ef000a2f68f545d6eef44ee912edceaad82ccf.1505764066...
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kernel/Makefile  |    1 -
 arch/x86/kernel/head_64.S |   14 ++++++++++++--
 2 files changed, 12 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -27,7 +27,6 @@ KASAN_SANITIZE_dumpstack.o				:= n
 KASAN_SANITIZE_dumpstack_$(BITS).o			:= n
 KASAN_SANITIZE_stacktrace.o				:= n

-OBJECT_FILES_NON_STANDARD_head_$(BITS).o		:= y
 OBJECT_FILES_NON_STANDARD_relocate_kernel_$(BITS).o	:= y
 OBJECT_FILES_NON_STANDARD_ftrace_$(BITS).o		:= y
 OBJECT_FILES_NON_STANDARD_test_nx.o			:= y
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -50,6 +50,7 @@ L3_START_KERNEL = pud_index(__START_KERN
 	.code64
 	.globl startup_64
 startup_64:
+	UNWIND_HINT_EMPTY
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
 	 * and someone has loaded an identity mapped page table
@@ -89,6 +90,7 @@ startup_64:
 	addq	$(early_top_pgt - __START_KERNEL_map), %rax
 	jmp 1f
 ENTRY(secondary_startup_64)
+	UNWIND_HINT_EMPTY
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
 	 * and someone has loaded a mapped page table.
@@ -133,6 +135,7 @@ ENTRY(secondary_startup_64)
 	movq	$1f, %rax
 	jmp	*%rax
 1:
+	UNWIND_HINT_EMPTY

 	/* Check if nx is implemented */
 	movl	$0x80000001, %eax
@@ -247,6 +250,7 @@ END(secondary_startup_64)
  */
 ENTRY(start_cpu0)
 	movq	initial_stack(%rip), %rsp
+	UNWIND_HINT_EMPTY
 	jmp	.Ljump_to_C_code
 ENDPROC(start_cpu0)
 #endif
@@ -271,13 +275,18 @@ ENTRY(early_idt_handler_array)
 	i = 0
 	.rept NUM_EXCEPTION_VECTORS
 	.ifeq (EXCEPTION_ERRCODE_MASK >> i) & 1
-	pushq $0	# Dummy error code, to make stack frame uniform
+		UNWIND_HINT_IRET_REGS
+		pushq $0	# Dummy error code, to make stack frame uniform
+	.else
+		UNWIND_HINT_IRET_REGS offset=8
 	.endif
 	pushq $i		# 72(%rsp) Vector number
 	jmp early_idt_handler_common
+	UNWIND_HINT_IRET_REGS
 	i = i + 1
 	.fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
 	.endr
+	UNWIND_HINT_IRET_REGS offset=16
 END(early_idt_handler_array)

 early_idt_handler_common:
@@ -306,6 +315,7 @@ early_idt_handler_common:
 	pushq %r13				/* pt_regs->r13 */
 	pushq %r14				/* pt_regs->r14 */
 	pushq %r15				/* pt_regs->r15 */
+	UNWIND_HINT_REGS

 	cmpq $14,%rsi		/* Page fault? */
 	jnz 10f
@@ -428,7 +438,7 @@ ENTRY(phys_base)
 EXPORT_SYMBOL(phys_base)

 #include "../../x86/xen/xen-head.S"
-
+
 	__PAGE_ALIGNED_BSS
 NEXT_PAGE(empty_zero_page)
 	.skip PAGE_SIZE
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Beulich JBeulich@suse.com
commit 095f613c6b386a1704b73a549e9ba66c1d5381ae upstream.
Match up with what 7edda0886b ("acpi: apei: handle SEA notification type for ARMv8") did for ghes_ioremap_pfn_nmi().
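The type matters where physical addresses are wider than 'unsigned long' (e.g. 32-bit kernels with LPAE); a minimal standalone illustration, not taken from the patch itself:

	#include <stdint.h>

	/* With 32-bit 'long' and more than 4 GiB of physical address space,
	 * deriving a physical address from a PFN must not pass through
	 * 'unsigned long', or the high bits are silently dropped. */
	static uint64_t pfn_to_paddr(uint64_t pfn)
	{
		unsigned long bad = (unsigned long)(pfn << 12);	/* truncates */
		(void)bad;
		return pfn << 12;	/* phys_addr_t-like: keeps full width */
	}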
Signed-off-by: Jan Beulich jbeulich@suse.com
Reviewed-by: Borislav Petkov bp@suse.de
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/acpi/apei/ghes.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -174,7 +174,8 @@ static void __iomem *ghes_ioremap_pfn_nm

 static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
 {
-	unsigned long vaddr, paddr;
+	unsigned long vaddr;
+	phys_addr_t paddr;
 	pgprot_t prot;

 	vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ingo Molnar mingo@kernel.org
commit 1e4078f0bba46ad61b69548abe6a6faf63b89380 upstream.
Increase testing coverage by turning on the primary x86 unwinder for the 64-bit defconfig.
Cc: Josh Poimboeuf jpoimboe@redhat.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/configs/x86_64_defconfig |    1 +
 1 file changed, 1 insertion(+)

--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -299,6 +299,7 @@ CONFIG_DEBUG_STACKOVERFLOW=y
 # CONFIG_DEBUG_RODATA_TEST is not set
 CONFIG_DEBUG_BOOT_PARAMS=y
 CONFIG_OPTIMIZE_INLINING=y
+CONFIG_ORC_UNWINDER=y
 CONFIG_SECURITY=y
 CONFIG_SECURITY_NETWORK=y
 CONFIG_SECURITY_SELINUX=y
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit 127a1bea40f7f2a36bc7207ea4d51bb6b4e936fa upstream.
Commit:
d1898b733619 ("x86/fpu: Add tracepoints to dump FPU state at key points")
... added the 'x86_fpu_state' and 'x86_fpu_deactivate_state' trace points, but never used them. Today they are still not used. As they take up and waste memory, remove them.
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Cc: Dave Hansen dave.hansen@linux.intel.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Link: http://lkml.kernel.org/r/20171012180619.670b68b6@gandalf.local.home
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/trace/fpu.h |   10 ----------
 1 file changed, 10 deletions(-)

--- a/arch/x86/include/asm/trace/fpu.h
+++ b/arch/x86/include/asm/trace/fpu.h
@@ -34,11 +34,6 @@ DECLARE_EVENT_CLASS(x86_fpu,
 	)
 );

-DEFINE_EVENT(x86_fpu, x86_fpu_state,
-	TP_PROTO(struct fpu *fpu),
-	TP_ARGS(fpu)
-);
-
 DEFINE_EVENT(x86_fpu, x86_fpu_before_save,
 	TP_PROTO(struct fpu *fpu),
 	TP_ARGS(fpu)
@@ -73,11 +68,6 @@ DEFINE_EVENT(x86_fpu, x86_fpu_activate_s
 	TP_PROTO(struct fpu *fpu),
 	TP_ARGS(fpu)
 );
-
-DEFINE_EVENT(x86_fpu, x86_fpu_deactivate_state,
-	TP_PROTO(struct fpu *fpu),
-	TP_ARGS(fpu)
-);

 DEFINE_EVENT(x86_fpu, x86_fpu_init_state,
 	TP_PROTO(struct fpu *fpu),
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@redhat.com
commit 11af847446ed0d131cf24d16a7ef3d5ea7a49554 upstream.
Rename the unwinder config options from:
CONFIG_ORC_UNWINDER CONFIG_FRAME_POINTER_UNWINDER CONFIG_GUESS_UNWINDER
to:
CONFIG_UNWINDER_ORC CONFIG_UNWINDER_FRAME_POINTER CONFIG_UNWINDER_GUESS
... in order to give them a more logical config namespace.
Suggested-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Link: http://lkml.kernel.org/r/73972fc7e2762e91912c6b9584582703d6f1b8cc.1507924831...
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/x86/orc-unwinder.txt |    2 +-
 Makefile                           |    4 ++--
 arch/x86/Kconfig                   |    2 +-
 arch/x86/Kconfig.debug             |   10 +++++-----
 arch/x86/configs/tiny.config       |    4 ++--
 arch/x86/configs/x86_64_defconfig  |    2 +-
 arch/x86/include/asm/module.h      |    2 +-
 arch/x86/include/asm/unwind.h      |    8 ++++----
 arch/x86/kernel/Makefile           |    6 +++---
 include/asm-generic/vmlinux.lds.h  |    2 +-
 lib/Kconfig.debug                  |    2 +-
 scripts/Makefile.build             |    2 +-
 12 files changed, 23 insertions(+), 23 deletions(-)

--- a/Documentation/x86/orc-unwinder.txt
+++ b/Documentation/x86/orc-unwinder.txt
@@ -4,7 +4,7 @@ ORC unwinder
 Overview
 --------

-The kernel CONFIG_ORC_UNWINDER option enables the ORC unwinder, which is
+The kernel CONFIG_UNWINDER_ORC option enables the ORC unwinder, which is
 similar in concept to a DWARF unwinder.  The difference is that the
 format of the ORC data is much simpler than DWARF, which in turn allows
 the ORC unwinder to be much simpler and faster.
--- a/Makefile
+++ b/Makefile
@@ -935,8 +935,8 @@ ifdef CONFIG_STACK_VALIDATION
   ifeq ($(has_libelf),1)
     objtool_target := tools/objtool FORCE
   else
-    ifdef CONFIG_ORC_UNWINDER
-      $(error "Cannot generate ORC metadata for CONFIG_ORC_UNWINDER=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel")
+    ifdef CONFIG_UNWINDER_ORC
+      $(error "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel")
     else
       $(warning "Cannot use CONFIG_STACK_VALIDATION=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel")
     endif
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -171,7 +171,7 @@ config X86
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE
 	select HAVE_REGS_AND_STACK_ACCESS_API
-	select HAVE_RELIABLE_STACKTRACE		if X86_64 && FRAME_POINTER_UNWINDER && STACK_VALIDATION
+	select HAVE_RELIABLE_STACKTRACE		if X86_64 && UNWINDER_FRAME_POINTER && STACK_VALIDATION
 	select HAVE_STACK_VALIDATION		if X86_64
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UNSTABLE_SCHED_CLOCK
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -359,13 +359,13 @@ config PUNIT_ATOM_DEBUG

 choice
 	prompt "Choose kernel unwinder"
-	default FRAME_POINTER_UNWINDER
+	default UNWINDER_FRAME_POINTER
 	---help---
 	  This determines which method will be used for unwinding kernel stack
 	  traces for panics, oopses, bugs, warnings, perf, /proc/<pid>/stack,
 	  livepatch, lockdep, and more.

-config FRAME_POINTER_UNWINDER
+config UNWINDER_FRAME_POINTER
 	bool "Frame pointer unwinder"
 	select FRAME_POINTER
 	---help---
@@ -380,7 +380,7 @@ config FRAME_POINTER_UNWINDER
 	  consistency model, as this is currently the only way to get a
 	  reliable stack trace (CONFIG_HAVE_RELIABLE_STACKTRACE).

-config ORC_UNWINDER
+config UNWINDER_ORC
 	bool "ORC unwinder"
 	depends on X86_64
 	select STACK_VALIDATION
@@ -396,7 +396,7 @@ config ORC_UNWINDER
 	  Enabling this option will increase the kernel's runtime memory usage
 	  by roughly 2-4MB, depending on your kernel config.

-config GUESS_UNWINDER
+config UNWINDER_GUESS
 	bool "Guess unwinder"
 	depends on EXPERT
 	---help---
@@ -411,7 +411,7 @@ config GUESS_UNWINDER
 endchoice

 config FRAME_POINTER
-	depends on !ORC_UNWINDER && !GUESS_UNWINDER
+	depends on !UNWINDER_ORC && !UNWINDER_GUESS
 	bool

 endmenu
--- a/arch/x86/configs/tiny.config
+++ b/arch/x86/configs/tiny.config
@@ -1,5 +1,5 @@
 CONFIG_NOHIGHMEM=y
 # CONFIG_HIGHMEM4G is not set
 # CONFIG_HIGHMEM64G is not set
-CONFIG_GUESS_UNWINDER=y
-# CONFIG_FRAME_POINTER_UNWINDER is not set
+CONFIG_UNWINDER_GUESS=y
+# CONFIG_UNWINDER_FRAME_POINTER is not set
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -299,7 +299,7 @@ CONFIG_DEBUG_STACKOVERFLOW=y
 # CONFIG_DEBUG_RODATA_TEST is not set
 CONFIG_DEBUG_BOOT_PARAMS=y
 CONFIG_OPTIMIZE_INLINING=y
-CONFIG_ORC_UNWINDER=y
+CONFIG_UNWINDER_ORC=y
 CONFIG_SECURITY=y
 CONFIG_SECURITY_NETWORK=y
 CONFIG_SECURITY_SELINUX=y
--- a/arch/x86/include/asm/module.h
+++ b/arch/x86/include/asm/module.h
@@ -6,7 +6,7 @@
 #include <asm/orc_types.h>

 struct mod_arch_specific {
-#ifdef CONFIG_ORC_UNWINDER
+#ifdef CONFIG_UNWINDER_ORC
 	unsigned int num_orcs;
 	int *orc_unwind_ip;
 	struct orc_entry *orc_unwind;
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -13,11 +13,11 @@ struct unwind_state {
 	struct task_struct *task;
 	int graph_idx;
 	bool error;
-#if defined(CONFIG_ORC_UNWINDER)
+#if defined(CONFIG_UNWINDER_ORC)
 	bool signal, full_regs;
 	unsigned long sp, bp, ip;
 	struct pt_regs *regs;
-#elif defined(CONFIG_FRAME_POINTER_UNWINDER)
+#elif defined(CONFIG_UNWINDER_FRAME_POINTER)
 	bool got_irq;
 	unsigned long *bp, *orig_sp, ip;
 	struct pt_regs *regs;
@@ -51,7 +51,7 @@ void unwind_start(struct unwind_state *s
 	__unwind_start(state, task, regs, first_frame);
 }

-#if defined(CONFIG_ORC_UNWINDER) || defined(CONFIG_FRAME_POINTER_UNWINDER)
+#if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER)
 static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
 {
 	if (unwind_done(state))
@@ -66,7 +66,7 @@ static inline struct pt_regs *unwind_get
 }
 #endif

-#ifdef CONFIG_ORC_UNWINDER
+#ifdef CONFIG_UNWINDER_ORC
 void unwind_init(void);
 void unwind_module_init(struct module *mod, void *orc_ip, size_t orc_ip_size,
 			void *orc, size_t orc_size);
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -127,9 +127,9 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)		+= itmt.o

-obj-$(CONFIG_ORC_UNWINDER)		+= unwind_orc.o
-obj-$(CONFIG_FRAME_POINTER_UNWINDER)	+= unwind_frame.o
-obj-$(CONFIG_GUESS_UNWINDER)		+= unwind_guess.o
+obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
+obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
+obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o

 ###
 # 64 bit specific files
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -688,7 +688,7 @@
 #define BUG_TABLE
 #endif

-#ifdef CONFIG_ORC_UNWINDER
+#ifdef CONFIG_UNWINDER_ORC
 #define ORC_UNWIND_TABLE						\
	. = ALIGN(4);							\
	.orc_unwind_ip : AT(ADDR(.orc_unwind_ip) - LOAD_OFFSET) {	\
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -376,7 +376,7 @@ config STACK_VALIDATION
 	  that runtime stack traces are more reliable.

 	  This is also a prerequisite for generation of ORC unwind data, which
-	  is needed for CONFIG_ORC_UNWINDER.
+	  is needed for CONFIG_UNWINDER_ORC.

 	  For more information, see
 	  tools/objtool/Documentation/stack-validation.txt.
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -259,7 +259,7 @@ ifneq ($(SKIP_STACK_VALIDATION),1)

 __objtool_obj := $(objtree)/tools/objtool/objtool

-objtool_args = $(if $(CONFIG_ORC_UNWINDER),orc generate,check)
+objtool_args = $(if $(CONFIG_UNWINDER_ORC),orc generate,check)

 ifndef CONFIG_FRAME_POINTER
 objtool_args += --no-fp
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@redhat.com
commit fc72ae40e30327aa24eb88a24b9c7058f938bd36 upstream.
The ORC unwinder has been stable in testing so far. Give it much wider testing by making it the default in kconfig for x86_64. It's not yet supported for 32-bit, so leave frame pointers as the default there.
Suggested-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Link: http://lkml.kernel.org/r/9b1237bbe7244ed9cdf8db2dcb1253e37e1c341e.1507924831...
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/Kconfig.debug |   33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -359,27 +359,13 @@ config PUNIT_ATOM_DEBUG

 choice
 	prompt "Choose kernel unwinder"
-	default UNWINDER_FRAME_POINTER
+	default UNWINDER_ORC if X86_64
+	default UNWINDER_FRAME_POINTER if X86_32
 	---help---
 	  This determines which method will be used for unwinding kernel stack
 	  traces for panics, oopses, bugs, warnings, perf, /proc/<pid>/stack,
 	  livepatch, lockdep, and more.

-config UNWINDER_FRAME_POINTER
-	bool "Frame pointer unwinder"
-	select FRAME_POINTER
-	---help---
-	  This option enables the frame pointer unwinder for unwinding kernel
-	  stack traces.
-
-	  The unwinder itself is fast and it uses less RAM than the ORC
-	  unwinder, but the kernel text size will grow by ~3% and the kernel's
-	  overall performance will degrade by roughly 5-10%.
-
-	  This option is recommended if you want to use the livepatch
-	  consistency model, as this is currently the only way to get a
-	  reliable stack trace (CONFIG_HAVE_RELIABLE_STACKTRACE).
-
 config UNWINDER_ORC
 	bool "ORC unwinder"
 	depends on X86_64
@@ -396,6 +382,21 @@ config UNWINDER_ORC
 	  Enabling this option will increase the kernel's runtime memory usage
 	  by roughly 2-4MB, depending on your kernel config.

+config UNWINDER_FRAME_POINTER
+	bool "Frame pointer unwinder"
+	select FRAME_POINTER
+	---help---
+	  This option enables the frame pointer unwinder for unwinding kernel
+	  stack traces.
+
+	  The unwinder itself is fast and it uses less RAM than the ORC
+	  unwinder, but the kernel text size will grow by ~3% and the kernel's
+	  overall performance will degrade by roughly 5-10%.
+
+	  This option is recommended if you want to use the livepatch
+	  consistency model, as this is currently the only way to get a
+	  reliable stack trace (CONFIG_HAVE_RELIABLE_STACKTRACE).
+
 config UNWINDER_GUESS
 	bool "Guess unwinder"
 	depends on EXPERT
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andi Kleen ak@linux.intel.com
commit cbe96375025e14fc76f9ed42ee5225120d7210f8 upstream.
Add two simple wrappers around set_bit/clear_bit() that accept the common case of an u32 array. This avoids writing casts in all callers.
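Usage then looks like this (hypothetical caller, for illustration only):

	/* Hypothetical illustration of the new helpers (not from the patch): */
	static void example(void)
	{
		u32 caps[4] = { 0 };

		set_bit(5, (unsigned long *)caps);	/* old style: cast at every call site */
		set_bit32(5, caps);			/* new helper: no cast needed */
		clear_bit32(5, caps);
	}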
Signed-off-by: Andi Kleen ak@linux.intel.com
Reviewed-by: Thomas Gleixner tglx@linutronix.de
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Link: http://lkml.kernel.org/r/20171013215645.23166-2-andi@firstfloor.org
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/bitops.h |   26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -228,6 +228,32 @@ static inline unsigned long __ffs64(u64
 	return __ffs((unsigned long)word);
 }

+/*
+ * clear_bit32 - Clear a bit in memory for u32 array
+ * @nr: Bit to clear
+ * @addr: u32 * address of bitmap
+ *
+ * Same as clear_bit, but avoids needing casts for u32 arrays.
+ */
+
+static __always_inline void clear_bit32(long nr, volatile u32 *addr)
+{
+	clear_bit(nr, (volatile unsigned long *)addr);
+}
+
+/*
+ * set_bit32 - Set a bit in memory for u32 array
+ * @nr: Bit to clear
+ * @addr: u32 * address of bitmap
+ *
+ * Same as set_bit, but avoids needing casts for u32 arrays.
+ */
+
+static __always_inline void set_bit32(long nr, volatile u32 *addr)
+{
+	set_bit(nr, (volatile unsigned long *)addr);
+}
+
 #ifdef __KERNEL__

 #ifndef set_mask_bits
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andi Kleen ak@linux.intel.com
commit 0b00de857a648dafe7020878c7a27cf776f5edf4 upstream.
Some CPUID features depend on other features. Currently it's possible to clear dependent features, but not the base features they rely on, which can cause various interesting problems.
This patch implements a generic table to describe dependencies between CPUID features, to be used by all code that clears CPUID.
Some subsystems (like XSAVE) had their own implementation of this, but it's better to do it all in a single place for everyone.
Then clear_cpu_cap and setup_clear_cpu_cap always look up this table and clear all dependencies too.
This is intended to be a practical table: only for features that make sense to clear. If someone for example clears FPU, or other features that are essentially part of the required base feature set, not much is going to work. Handling that is right now out of scope. We're only handling features which can be usefully cleared.
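Conceptually, clearing a feature is a fixed-point walk over the table: keep rescanning until no newly disabled feature pulls in another dependent one. A simplified sketch (names ours, not the kernel's exact code):

	static void clear_with_deps(unsigned long *disabled, unsigned int feature)
	{
		const struct cpuid_dep *d;
		bool changed;

		__set_bit(feature, disabled);

		/* Loop until we reach a stable state. */
		do {
			changed = false;
			for (d = cpuid_deps; d->feature; d++) {
				if (test_bit(d->depends, disabled) &&
				    !test_bit(d->feature, disabled)) {
					__set_bit(d->feature, disabled);
					changed = true;
				}
			}
		} while (changed);
	}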
Signed-off-by: Andi Kleen ak@linux.intel.com
Reviewed-by: Thomas Gleixner tglx@linutronix.de
Cc: Jonathan McDowell noodles@earth.li
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Link: http://lkml.kernel.org/r/20171013215645.23166-3-andi@firstfloor.org
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/cpufeature.h  |    9 +-
 arch/x86/include/asm/cpufeatures.h |    5 +
 arch/x86/kernel/cpu/Makefile       |    1 +
 arch/x86/kernel/cpu/cpuid-deps.c   |  113 +++++++++++++++++++++++++++++++++++++
 4 files changed, 123 insertions(+), 5 deletions(-)

--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -126,11 +126,10 @@ extern const char * const x86_bug_flags[
 #define boot_cpu_has(bit)	cpu_has(&boot_cpu_data, bit)

 #define set_cpu_cap(c, bit)	set_bit(bit, (unsigned long *)((c)->x86_capability))
-#define clear_cpu_cap(c, bit)	clear_bit(bit, (unsigned long *)((c)->x86_capability))
-#define setup_clear_cpu_cap(bit) do { \
-	clear_cpu_cap(&boot_cpu_data, bit);	\
-	set_bit(bit, (unsigned long *)cpu_caps_cleared); \
-} while (0)
+
+extern void setup_clear_cpu_cap(unsigned int bit);
+extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
+
 #define setup_force_cpu_cap(bit) do { \
 	set_cpu_cap(&boot_cpu_data, bit);	\
 	set_bit(bit, (unsigned long *)cpu_caps_set);	\
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -22,6 +22,11 @@
  * this feature bit is not displayed in /proc/cpuinfo at all.
  */

+/*
+ * When adding new features here that depend on other features,
+ * please update the table in kernel/cpu/cpuid-deps.c
+ */
+
 /* Intel-defined CPU features, CPUID level 0x00000001 (edx), word 0 */
 #define X86_FEATURE_FPU		( 0*32+ 0) /* Onboard FPU */
 #define X86_FEATURE_VME		( 0*32+ 1) /* Virtual Mode Extensions */
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -23,6 +23,7 @@ obj-y			+= rdrand.o
 obj-y			+= match.o
 obj-y			+= bugs.o
 obj-$(CONFIG_CPU_FREQ)	+= aperfmperf.o
+obj-y			+= cpuid-deps.o

 obj-$(CONFIG_PROC_FS)	+= proc.o
 obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
--- /dev/null
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -0,0 +1,113 @@
+/* Declare dependencies between CPUIDs */
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <asm/cpufeature.h>
+
+struct cpuid_dep {
+	unsigned int feature;
+	unsigned int depends;
+};
+
+/*
+ * Table of CPUID features that depend on others.
+ *
+ * This only includes dependencies that can be usefully disabled, not
+ * features part of the base set (like FPU).
+ *
+ * Note this all is not __init / __initdata because it can be
+ * called from cpu hotplug. It shouldn't do anything in this case,
+ * but it's difficult to tell that to the init reference checker.
+ */
+const static struct cpuid_dep cpuid_deps[] = {
+	{ X86_FEATURE_XSAVEOPT,		X86_FEATURE_XSAVE  },
+	{ X86_FEATURE_XSAVEC,		X86_FEATURE_XSAVE  },
+	{ X86_FEATURE_XSAVES,		X86_FEATURE_XSAVE  },
+	{ X86_FEATURE_AVX,		X86_FEATURE_XSAVE  },
+	{ X86_FEATURE_PKU,		X86_FEATURE_XSAVE  },
+	{ X86_FEATURE_MPX,		X86_FEATURE_XSAVE  },
+	{ X86_FEATURE_XGETBV1,		X86_FEATURE_XSAVE  },
+	{ X86_FEATURE_FXSR_OPT,		X86_FEATURE_FXSR   },
+	{ X86_FEATURE_XMM,		X86_FEATURE_FXSR   },
+	{ X86_FEATURE_XMM2,		X86_FEATURE_XMM    },
+	{ X86_FEATURE_XMM3,		X86_FEATURE_XMM2   },
+	{ X86_FEATURE_XMM4_1,		X86_FEATURE_XMM2   },
+	{ X86_FEATURE_XMM4_2,		X86_FEATURE_XMM2   },
+	{ X86_FEATURE_XMM3,		X86_FEATURE_XMM2   },
+	{ X86_FEATURE_PCLMULQDQ,	X86_FEATURE_XMM2   },
+	{ X86_FEATURE_SSSE3,		X86_FEATURE_XMM2,  },
+	{ X86_FEATURE_F16C,		X86_FEATURE_XMM2,  },
+	{ X86_FEATURE_AES,		X86_FEATURE_XMM2   },
+	{ X86_FEATURE_SHA_NI,		X86_FEATURE_XMM2   },
+	{ X86_FEATURE_FMA,		X86_FEATURE_AVX    },
+	{ X86_FEATURE_AVX2,		X86_FEATURE_AVX,   },
+	{ X86_FEATURE_AVX512F,		X86_FEATURE_AVX,   },
+	{ X86_FEATURE_AVX512IFMA,	X86_FEATURE_AVX512F },
+	{ X86_FEATURE_AVX512PF,		X86_FEATURE_AVX512F },
+	{ X86_FEATURE_AVX512ER,		X86_FEATURE_AVX512F },
+	{ X86_FEATURE_AVX512CD,		X86_FEATURE_AVX512F },
+	{ X86_FEATURE_AVX512DQ,		X86_FEATURE_AVX512F },
+	{ X86_FEATURE_AVX512BW,		X86_FEATURE_AVX512F },
+	{ X86_FEATURE_AVX512VL,		X86_FEATURE_AVX512F },
+	{ X86_FEATURE_AVX512VBMI,	X86_FEATURE_AVX512F },
+	{ X86_FEATURE_AVX512_4VNNIW,	X86_FEATURE_AVX512F },
+	{ X86_FEATURE_AVX512_4FMAPS,	X86_FEATURE_AVX512F },
+	{ X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F },
+	{}
+};
+
+static inline void __clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit)
+{
+	clear_bit32(bit, c->x86_capability);
+}
+
+static inline void __setup_clear_cpu_cap(unsigned int bit)
+{
+	clear_cpu_cap(&boot_cpu_data, bit);
+	set_bit32(bit, cpu_caps_cleared);
+}
+
+static inline void clear_feature(struct cpuinfo_x86 *c, unsigned int feature)
+{
+	if (!c)
+		__setup_clear_cpu_cap(feature);
+	else
+		__clear_cpu_cap(c, feature);
+}
+
+static void do_clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature)
+{
+	bool changed;
+	DECLARE_BITMAP(disable, NCAPINTS * sizeof(u32) * 8);
+	const struct cpuid_dep *d;
+
+	clear_feature(c, feature);
+
+	/* Collect all features to disable, handling dependencies */
+	memset(disable, 0, sizeof(disable));
+	__set_bit(feature, disable);
+
+	/* Loop until we get a stable state.
*/ + do { + changed = false; + for (d = cpuid_deps; d->feature; d++) { + if (!test_bit(d->depends, disable)) + continue; + if (__test_and_set_bit(d->feature, disable)) + continue; + + changed = true; + clear_feature(c, d->feature); + } + } while (changed); +} + +void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature) +{ + do_clear_cpu_cap(c, feature); +} + +void setup_clear_cpu_cap(unsigned int feature) +{ + do_clear_cpu_cap(NULL, feature); +}
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andi Kleen ak@linux.intel.com
commit 0c2a3913d6f50503f7c59d83a6219e39508cc898 upstream.
With a follow-on patch we want to make clearcpuid affect the XSAVE configuration. But XSAVE is currently initialized before arguments are parsed. Move the clearcpuid= parsing into the special early XSAVE argument parsing code.
Since clearcpuid= contains an '=' we need to keep the old __setup around as a dummy; otherwise it would end up as an environment variable in init's environment.
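For reference, a sketch of where the bit number handed to clearcpuid= comes from (the concrete value assumes the 4.14 layout of cpufeatures.h; check your tree):

    /*
     * clearcpuid= takes the raw capability bit number, word * 32 + bit.
     * X86_FEATURE_XSAVE is defined as (4*32 + 26) = 154, so booting
     * with "clearcpuid=154" clears XSAVE -- and, with the dependency
     * table from earlier in this series, everything depending on it.
     */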
Signed-off-by: Andi Kleen ak@linux.intel.com Reviewed-by: Thomas Gleixner tglx@linutronix.de Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Link: http://lkml.kernel.org/r/20171013215645.23166-4-andi@firstfloor.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/common.c | 16 +++++++--------- arch/x86/kernel/fpu/init.c | 11 +++++++++++ 2 files changed, 18 insertions(+), 9 deletions(-)
--- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1301,18 +1301,16 @@ void print_cpu_info(struct cpuinfo_x86 * pr_cont(")\n"); }
-static __init int setup_disablecpuid(char *arg) +/* + * clearcpuid= was already parsed in fpu__init_parse_early_param. + * But we need to keep a dummy __setup around otherwise it would + * show up as an environment variable for init. + */ +static __init int setup_clearcpuid(char *arg) { - int bit; - - if (get_option(&arg, &bit) && bit >= 0 && bit < NCAPINTS * 32) - setup_clear_cpu_cap(bit); - else - return 0; - return 1; } -__setup("clearcpuid=", setup_disablecpuid); +__setup("clearcpuid=", setup_clearcpuid);
#ifdef CONFIG_X86_64 DEFINE_PER_CPU_FIRST(union irq_stack_union, --- a/arch/x86/kernel/fpu/init.c +++ b/arch/x86/kernel/fpu/init.c @@ -249,6 +249,10 @@ static void __init fpu__init_system_ctx_ */ static void __init fpu__init_parse_early_param(void) { + char arg[32]; + char *argptr = arg; + int bit; + if (cmdline_find_option_bool(boot_command_line, "no387")) setup_clear_cpu_cap(X86_FEATURE_FPU);
@@ -266,6 +270,13 @@ static void __init fpu__init_parse_early
if (cmdline_find_option_bool(boot_command_line, "noxsaves")) setup_clear_cpu_cap(X86_FEATURE_XSAVES); + + if (cmdline_find_option(boot_command_line, "clearcpuid", arg, + sizeof(arg)) && + get_option(&argptr, &bit) && + bit >= 0 && + bit < NCAPINTS * 32) + setup_clear_cpu_cap(bit); }
/*
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andi Kleen ak@linux.intel.com
commit ccb18db2ab9d923df07e7495123fe5fb02329713 upstream.
Before enabling XSAVE, not only check the XSAVE specific CPUID bits, but also the base CPUID features of the respective XSAVE feature. This allows disabling individual XSAVE states using the existing clearcpuid= option, which can be useful for performance testing and debugging, and also in general avoids inconsistencies.
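To make the mapping concrete, the indices below are xfeature numbers and the right-hand side is the xsave_cpuid_features[] table added by this patch:

    xfeature 0 (FP)                 -> X86_FEATURE_FPU
    xfeature 1 (SSE)                -> X86_FEATURE_XMM
    xfeature 2 (YMM)                -> X86_FEATURE_AVX
    xfeature 3, 4 (BNDREGS/BNDCSR)  -> X86_FEATURE_MPX
    xfeature 5-7 (AVX-512 state)    -> X86_FEATURE_AVX512F
    xfeature 8 (PT)                 -> X86_FEATURE_INTEL_PT
    xfeature 9 (PKRU)               -> X86_FEATURE_PKU

If, say, X86_FEATURE_AVX was cleared via clearcpuid=, the new loop drops bit 2 from xfeatures_mask, so the AVX state is no longer XSAVE-managed either.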
Signed-off-by: Andi Kleen ak@linux.intel.com Reviewed-by: Thomas Gleixner tglx@linutronix.de Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Link: http://lkml.kernel.org/r/20171013215645.23166-5-andi@firstfloor.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/fpu/xstate.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+)
--- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -15,6 +15,7 @@ #include <asm/fpu/xstate.h>
#include <asm/tlbflush.h> +#include <asm/cpufeature.h>
/* * Although we spell it out in here, the Processor Trace @@ -36,6 +37,19 @@ static const char *xfeature_names[] = "unknown xstate feature" , };
+static short xsave_cpuid_features[] __initdata = { + X86_FEATURE_FPU, + X86_FEATURE_XMM, + X86_FEATURE_AVX, + X86_FEATURE_MPX, + X86_FEATURE_MPX, + X86_FEATURE_AVX512F, + X86_FEATURE_AVX512F, + X86_FEATURE_AVX512F, + X86_FEATURE_INTEL_PT, + X86_FEATURE_PKU, +}; + /* * Mask of xstate features supported by the CPU and the kernel: */ @@ -726,6 +740,7 @@ void __init fpu__init_system_xstate(void unsigned int eax, ebx, ecx, edx; static int on_boot_cpu __initdata = 1; int err; + int i;
WARN_ON_FPU(!on_boot_cpu); on_boot_cpu = 0; @@ -759,6 +774,14 @@ void __init fpu__init_system_xstate(void goto out_disable; }
+ /* + * Clear XSAVE features that are disabled in the normal CPUID. + */ + for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) { + if (!boot_cpu_has(xsave_cpuid_features[i])) + xfeatures_mask &= ~BIT(i); + } + xfeatures_mask &= fpu__get_supported_xfeatures_mask();
/* Enable xstate instructions to be able to continue with initialization: */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andi Kleen ak@linux.intel.com
commit 73e3a7d2a7c3be29a5a22b85026f6cfa5664267f upstream.
Clearing a CPU feature with setup_clear_cpu_cap() clears all features which depend on it. Expressing feature dependencies in one place is easier to maintain than keeping functions like fpu__xstate_clear_all_cpu_caps() up to date.
The features which depend on XSAVE have their dependency expressed in the dependency table, so it's sufficient to clear X86_FEATURE_XSAVE.
Remove the explicit clearing of XSAVE dependent features.
Signed-off-by: Andi Kleen ak@linux.intel.com Reviewed-by: Thomas Gleixner tglx@linutronix.de Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Link: http://lkml.kernel.org/r/20171013215645.23166-6-andi@firstfloor.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/fpu/xstate.c | 20 -------------------- 1 file changed, 20 deletions(-)
--- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -73,26 +73,6 @@ unsigned int fpu_user_xstate_size; void fpu__xstate_clear_all_cpu_caps(void) { setup_clear_cpu_cap(X86_FEATURE_XSAVE); - setup_clear_cpu_cap(X86_FEATURE_XSAVEOPT); - setup_clear_cpu_cap(X86_FEATURE_XSAVEC); - setup_clear_cpu_cap(X86_FEATURE_XSAVES); - setup_clear_cpu_cap(X86_FEATURE_AVX); - setup_clear_cpu_cap(X86_FEATURE_AVX2); - setup_clear_cpu_cap(X86_FEATURE_AVX512F); - setup_clear_cpu_cap(X86_FEATURE_AVX512IFMA); - setup_clear_cpu_cap(X86_FEATURE_AVX512PF); - setup_clear_cpu_cap(X86_FEATURE_AVX512ER); - setup_clear_cpu_cap(X86_FEATURE_AVX512CD); - setup_clear_cpu_cap(X86_FEATURE_AVX512DQ); - setup_clear_cpu_cap(X86_FEATURE_AVX512BW); - setup_clear_cpu_cap(X86_FEATURE_AVX512VL); - setup_clear_cpu_cap(X86_FEATURE_MPX); - setup_clear_cpu_cap(X86_FEATURE_XGETBV1); - setup_clear_cpu_cap(X86_FEATURE_AVX512VBMI); - setup_clear_cpu_cap(X86_FEATURE_PKU); - setup_clear_cpu_cap(X86_FEATURE_AVX512_4VNNIW); - setup_clear_cpu_cap(X86_FEATURE_AVX512_4FMAPS); - setup_clear_cpu_cap(X86_FEATURE_AVX512_VPOPCNTDQ); }
/*
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
commit 376f3bcebdc999cc737d9052109cc33b573b3a8b upstream.
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly.
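For readers unfamiliar with the new API, a minimal sketch of the conversion pattern (struct foo and its fields are illustrative, not from this driver):

    struct foo {
            struct timer_list timer;
            unsigned long beats;
    };

    static void foo_timeout(struct timer_list *t)
    {
            /* from_timer() is a container_of() wrapper that recovers
             * the structure embedding the timer_list */
            struct foo *f = from_timer(f, t, timer);

            f->beats++;
            mod_timer(&f->timer, jiffies + HZ);
    }

    /* instead of setup_timer(&f->timer, foo_timeout, (unsigned long)f): */
    timer_setup(&f->timer, foo_timeout, 0);

In the hunk below the callback does not even need from_timer(), since the per-cpu uv_scir_info already leads to all the state it touches.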
Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Dimitri Sivanich sivanich@hpe.com Cc: Russ Anderson rja@hpe.com Cc: Mike Travis mike.travis@hpe.com Link: https://lkml.kernel.org/r/20171016232231.GA100493@beast Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/apic/x2apic_uv_x.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/apic/x2apic_uv_x.c +++ b/arch/x86/kernel/apic/x2apic_uv_x.c @@ -920,9 +920,8 @@ static __init void uv_rtc_init(void) /* * percpu heartbeat timer */ -static void uv_heartbeat(unsigned long ignored) +static void uv_heartbeat(struct timer_list *timer) { - struct timer_list *timer = &uv_scir_info->timer; unsigned char bits = uv_scir_info->state;
/* Flip heartbeat bit: */ @@ -947,7 +946,7 @@ static int uv_heartbeat_enable(unsigned struct timer_list *timer = &uv_cpu_scir_info(cpu)->timer;
uv_set_cpu_scir_bits(cpu, SCIR_CPU_HEARTBEAT|SCIR_CPU_ACTIVITY); - setup_pinned_timer(timer, uv_heartbeat, cpu); + timer_setup(timer, uv_heartbeat, TIMER_PINNED); timer->expires = jiffies + SCIR_CPU_HB_INTERVAL; add_timer_on(timer, cpu); uv_cpu_scir_info(cpu)->enabled = 1;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kamalesh Babulal kamalesh@linux.vnet.ibm.com
commit 6a93bb7e4a7d6670677d5b0eb980936eb9cc5d2e upstream.
Print the top-level objtool commands, along with the error, on incorrect command-line usage. The objtool command-line parser exits with code 129 on incorrect usage; convert the cmd_usage() exit code as well, to maintain consistency across objtool.
After the patch:
$ ./objtool -j
Unknown option: -j
usage: objtool COMMAND [ARGS]
Commands:
   check   Perform stack metadata validation on an object file
   orc     Generate in-place ORC unwind tables for an object file

$ echo $?
129
Signed-off-by: Kamalesh Babulal kamalesh@linux.vnet.ibm.com Acked-by: Josh Poimboeuf jpoimboe@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/1507992474-16142-1-git-send-email-kamalesh@linux.vn... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/objtool/objtool.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/tools/objtool/objtool.c +++ b/tools/objtool/objtool.c @@ -70,7 +70,7 @@ static void cmd_usage(void)
printf("\n");
- exit(1); + exit(129); }
static void handle_options(int *argc, const char ***argv) @@ -86,9 +86,7 @@ static void handle_options(int *argc, co break; } else { fprintf(stderr, "Unknown option: %s\n", cmd); - fprintf(stderr, "\n Usage: %s\n", - objtool_usage_string); - exit(1); + cmd_usage(); }
(*argv)++;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 57b8b1a1856adaa849d02d547411a553a531022b upstream.
do_clear_cpu_cap() allocates a bitmap to keep track of disabled feature dependencies. That bitmap is sized NCAPINTS * 32 bits. The possible 'features' which can be handed in are larger than this, because after the capability words the bug 'feature' bits occupy another 32 bits. Not really obvious...
So clearing any of the misfeature bits, as 32-bit does for the F00F bug, accesses that bitmap out of bounds, thereby corrupting the stack.
Size the bitmap properly and add a sanity check to catch accidental out-of-bounds access.
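The arithmetic, for illustration (assuming the 4.14 values NCAPINTS = 18 and NBUGINTS = 1):

    /*
     * old: DECLARE_BITMAP(disable, NCAPINTS * 32)              -> 576 bits
     * new: DECLARE_BITMAP(disable, (NCAPINTS + NBUGINTS) * 32) -> 608 bits
     *
     * X86_BUG_F00F is bit NCAPINTS * 32 + 0 = 576, i.e. the first
     * bit past the old bitmap -- clearing it wrote beyond the
     * on-stack allocation.
     */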
Fixes: 0b00de857a64 ("x86/cpuid: Add generic table for CPUID dependencies") Reported-by: kernel test robot xiaolong.ye@intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Andi Kleen ak@linux.intel.com Cc: Borislav Petkov bp@alien8.de Link: https://lkml.kernel.org/r/20171018022023.GA12058@yexl-desktop Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/cpuid-deps.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/cpu/cpuid-deps.c +++ b/arch/x86/kernel/cpu/cpuid-deps.c @@ -75,11 +75,17 @@ static inline void clear_feature(struct __clear_cpu_cap(c, feature); }
+/* Take the capabilities and the BUG bits into account */ +#define MAX_FEATURE_BITS ((NCAPINTS + NBUGINTS) * sizeof(u32) * 8) + static void do_clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature) { - bool changed; - DECLARE_BITMAP(disable, NCAPINTS * sizeof(u32) * 8); + DECLARE_BITMAP(disable, MAX_FEATURE_BITS); const struct cpuid_dep *d; + bool changed; + + if (WARN_ON(feature >= MAX_FEATURE_BITS)) + return;
clear_feature(c, feature);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kirill A. Shutemov kirill.shutemov@linux.intel.com
commit 83e3c48729d9ebb7af5a31a504f3fd6aff0348c4 upstream.
Size of the mem_section[] array depends on the size of the physical address space.
In preparation for boot-time switching between paging modes on x86-64 we need to make the allocation of mem_section[] dynamic, because otherwise we waste a lot of RAM: with CONFIG_NODES_SHIFT=10, mem_section[] size is 32kB for 4-level paging and 2MB for 5-level paging mode.
The patch allocates the array on the first call to sparse_memory_present_with_active_regions().
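Roughly where those numbers come from (a sketch assuming x86-64 defaults: SECTION_SIZE_BITS = 27, 4 KiB pages, and a 32-byte struct mem_section):

    /*
     * NR_MEM_SECTIONS   = 1 << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
     *                   = 1 << 19  (4-level paging, 46 bits)
     *                   = 1 << 25  (5-level paging, 52 bits)
     * SECTIONS_PER_ROOT = PAGE_SIZE / sizeof(struct mem_section) = 128
     * NR_SECTION_ROOTS  = NR_MEM_SECTIONS / SECTIONS_PER_ROOT
     * mem_section[]     = NR_SECTION_ROOTS * sizeof(void *)
     *                   = 4096 * 8 bytes   = 32kB  (4-level)
     *                   = 262144 * 8 bytes = 2MB   (5-level)
     */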
Signed-off-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Andy Lutomirski luto@amacapital.net Cc: Borislav Petkov bp@suse.de Cc: Cyrill Gorcunov gorcunov@openvz.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170929140821.37654-2-kirill.shutemov@linux.intel.... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/mmzone.h | 6 +++++- mm/page_alloc.c | 10 ++++++++++ mm/sparse.c | 17 +++++++++++------ 3 files changed, 26 insertions(+), 7 deletions(-)
--- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1152,13 +1152,17 @@ struct mem_section { #define SECTION_ROOT_MASK (SECTIONS_PER_ROOT - 1)
#ifdef CONFIG_SPARSEMEM_EXTREME -extern struct mem_section *mem_section[NR_SECTION_ROOTS]; +extern struct mem_section **mem_section; #else extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]; #endif
static inline struct mem_section *__nr_to_section(unsigned long nr) { +#ifdef CONFIG_SPARSEMEM_EXTREME + if (!mem_section) + return NULL; +#endif if (!mem_section[SECTION_NR_TO_ROOT(nr)]) return NULL; return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK]; --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5651,6 +5651,16 @@ void __init sparse_memory_present_with_a unsigned long start_pfn, end_pfn; int i, this_nid;
+#ifdef CONFIG_SPARSEMEM_EXTREME + if (!mem_section) { + unsigned long size, align; + + size = sizeof(struct mem_section) * NR_SECTION_ROOTS; + align = 1 << (INTERNODE_CACHE_SHIFT); + mem_section = memblock_virt_alloc(size, align); + } +#endif + for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, &this_nid) memory_present(this_nid, start_pfn, end_pfn); } --- a/mm/sparse.c +++ b/mm/sparse.c @@ -23,8 +23,7 @@ * 1) mem_section - memory sections, mem_map's for valid memory */ #ifdef CONFIG_SPARSEMEM_EXTREME -struct mem_section *mem_section[NR_SECTION_ROOTS] - ____cacheline_internodealigned_in_smp; +struct mem_section **mem_section; #else struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT] ____cacheline_internodealigned_in_smp; @@ -101,7 +100,7 @@ static inline int sparse_index_init(unsi int __section_nr(struct mem_section* ms) { unsigned long root_nr; - struct mem_section* root; + struct mem_section *root = NULL;
for (root_nr = 0; root_nr < NR_SECTION_ROOTS; root_nr++) { root = __nr_to_section(root_nr * SECTIONS_PER_ROOT); @@ -112,7 +111,7 @@ int __section_nr(struct mem_section* ms) break; }
- VM_BUG_ON(root_nr == NR_SECTION_ROOTS); + VM_BUG_ON(!root);
return (root_nr * SECTIONS_PER_ROOT) + (ms - root); } @@ -330,11 +329,17 @@ again: static void __init check_usemap_section_nr(int nid, unsigned long *usemap) { unsigned long usemap_snr, pgdat_snr; - static unsigned long old_usemap_snr = NR_MEM_SECTIONS; - static unsigned long old_pgdat_snr = NR_MEM_SECTIONS; + static unsigned long old_usemap_snr; + static unsigned long old_pgdat_snr; struct pglist_data *pgdat = NODE_DATA(nid); int usemap_nid;
+ /* First call */ + if (!old_usemap_snr) { + old_usemap_snr = NR_MEM_SECTIONS; + old_pgdat_snr = NR_MEM_SECTIONS; + } + usemap_snr = pfn_to_section_nr(__pa(usemap) >> PAGE_SHIFT); pgdat_snr = pfn_to_section_nr(__pa(pgdat) >> PAGE_SHIFT); if (usemap_snr == pgdat_snr)
On Fri, Dec 22, 2017 at 09:45:08AM +0100, Greg Kroah-Hartman wrote:
4.14-stable review patch. If anyone has any objections, please let me know.
From: Kirill A. Shutemov kirill.shutemov@linux.intel.com
commit 83e3c48729d9ebb7af5a31a504f3fd6aff0348c4 upstream.
This patch causes a boot failure on arm64.
Please drop this patch, or pick up the fix in:
commit 629a359bdb0e0652a8227b4ff3125431995fec6e
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Tue Nov 7 11:33:37 2017 +0300

    mm/sparsemem: Fix ARM64 boot crash when CONFIG_SPARSEMEM_EXTREME=y
See https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1527427.html
On 22 December 2017 at 19:48, Dan Rue dan.rue@linaro.org wrote:
On Fri, Dec 22, 2017 at 09:45:08AM +0100, Greg Kroah-Hartman wrote:
4.14-stable review patch. If anyone has any objections, please let me know.
This patch causes a boot failure on arm64.
Please drop this patch, or pick up the fix in:
commit 629a359bdb0e0652a8227b4ff3125431995fec6e
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Tue Nov 7 11:33:37 2017 +0300

    mm/sparsemem: Fix ARM64 boot crash when CONFIG_SPARSEMEM_EXTREME=y
See https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1527427.html
+1. Boot failed on arm64 without 629a359b ("mm/sparsemem: Fix ARM64 boot crash when CONFIG_SPARSEMEM_EXTREME=y").
Boot Error log:
--------------------
[ 0.000000] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 0.000000] Mem abort info:
[ 0.000000] Exception class = DABT (current EL), IL = 32 bits
[ 0.000000] SET = 0, FnV = 0
[ 0.000000] EA = 0, S1PTW = 0
[ 0.000000] Data abort info:
[ 0.000000] ISV = 0, ISS = 0x00000004
[ 0.000000] CM = 0, WnR = 0
[ 0.000000] [0000000000000000] user address but active_mm is swapper
[ 0.000000] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.9-rc1 #1
[ 0.000000] Hardware name: ARM Juno development board (r2) (DT)
[ 0.000000] task: ffff0000091d9380 task.stack: ffff0000091c0000
[ 0.000000] PC is at memory_present+0x64/0xf4
[ 0.000000] LR is at memory_present+0x38/0xf4
[ 0.000000] pc : [<ffff0000090a1f54>] lr : [<ffff0000090a1f28>] pstate: 800000c5
[ 0.000000] sp : ffff0000091c3e80
More information, https://pastebin.com/KambxUwb
- Naresh
On Fri, Dec 22, 2017 at 08:22:09PM +0530, Naresh Kamboju wrote:
On 22 December 2017 at 19:48, Dan Rue dan.rue@linaro.org wrote:
On Fri, Dec 22, 2017 at 09:45:08AM +0100, Greg Kroah-Hartman wrote:
4.14-stable review patch. If anyone has any objections, please let me know.
This patch causes a boot failure on arm64.
Please drop this patch, or pick up the fix in:
commit 629a359bdb0e0652a8227b4ff3125431995fec6e
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Tue Nov 7 11:33:37 2017 +0300

    mm/sparsemem: Fix ARM64 boot crash when CONFIG_SPARSEMEM_EXTREME=y
See https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1527427.html
+1. Boot failed on arm64 without 629a359b ("mm/sparsemem: Fix ARM64 boot crash when CONFIG_SPARSEMEM_EXTREME=y").
-rc2 is out with the fix, hopefully that survives longer :)
thanks,
greg k-h
On Fri, Dec 22, 2017 at 08:18:10AM -0600, Dan Rue wrote:
On Fri, Dec 22, 2017 at 09:45:08AM +0100, Greg Kroah-Hartman wrote:
4.14-stable review patch. If anyone has any objections, please let me know.
This patch causes a boot failure on arm64.
Please drop this patch, or pick up the fix in:
commit 629a359bdb0e0652a8227b4ff3125431995fec6e
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Tue Nov 7 11:33:37 2017 +0300

    mm/sparsemem: Fix ARM64 boot crash when CONFIG_SPARSEMEM_EXTREME=y
See https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1527427.html
Now added, thanks.
greg k-h
On Fri, 2017-12-22 at 09:45 +0100, Greg Kroah-Hartman wrote:
4.14-stable review patch. If anyone has any objections, please let me know.
FYI, this broke kdump, or rather the makedumpfile part thereof. Forward looking wreckage is par for the kdump course, but...
On Sun, Jan 07, 2018 at 06:14:22AM +0100, Mike Galbraith wrote:
On Fri, 2017-12-22 at 09:45 +0100, Greg Kroah-Hartman wrote:
4.14-stable review patch. If anyone has any objections, please let me know.
FYI, this broke kdump, or rather the makedumpfile part thereof. Forward looking wreckage is par for the kdump course, but...
Is it also broken in Linus's tree with this patch? Or is there an add-on patch that I should apply to 4.14 to resolve this issue there?
thanks,
greg k-h
On Sun, 2018-01-07 at 10:11 +0100, Greg Kroah-Hartman wrote:
On Sun, Jan 07, 2018 at 06:14:22AM +0100, Mike Galbraith wrote:
On Fri, 2017-12-22 at 09:45 +0100, Greg Kroah-Hartman wrote:
4.14-stable review patch. If anyone has any objections, please let me know.
FYI, this broke kdump, or rather the makedumpfile part thereof. Forward looking wreckage is par for the kdump course, but...
Is it also broken in Linus's tree with this patch? Or is there an add-on patch that I should apply to 4.14 to resolve this issue there?
Yeah, it's belly up. By its very nature, it's gonna get dinged up regularly. I only mentioned it because it's not expected that stuff gets dinged up retroactively.
-Mike
On Sun 07-01-18 10:11:15, Greg KH wrote:
On Sun, Jan 07, 2018 at 06:14:22AM +0100, Mike Galbraith wrote:
On Fri, 2017-12-22 at 09:45 +0100, Greg Kroah-Hartman wrote:
4.14-stable review patch. If anyone has any objections, please let me know.
FYI, this broke kdump, or rather the makedumpfile part thereof. Forward looking wreckage is par for the kdump course, but...
Is it also broken in Linus's tree with this patch? Or is there an add-on patch that I should apply to 4.14 to resolve this issue there?
This one http://lkml.kernel.org/r/1513932498-20350-1-git-send-email-bhe@redhat.com I guess.
On Sun, Jan 07, 2018 at 11:18:47AM +0100, Michal Hocko wrote:
On Sun 07-01-18 10:11:15, Greg KH wrote:
On Sun, Jan 07, 2018 at 06:14:22AM +0100, Mike Galbraith wrote:
On Fri, 2017-12-22 at 09:45 +0100, Greg Kroah-Hartman wrote:
4.14-stable review patch. If anyone has any objections, please let me know.
FYI, this broke kdump, or rather the makedumpfile part thereof. Forward looking wreckage is par for the kdump course, but...
Is it also broken in Linus's tree with this patch? Or is there an add-on patch that I should apply to 4.14 to resolve this issue there?
This one http://lkml.kernel.org/r/1513932498-20350-1-git-send-email-bhe@redhat.com I guess.
Good, that patch is queued up for the next 4.14-stable release in a few days.
thanks,
greg k-h
On Sun, 2018-01-07 at 11:18 +0100, Michal Hocko wrote:
On Sun 07-01-18 10:11:15, Greg KH wrote:
On Sun, Jan 07, 2018 at 06:14:22AM +0100, Mike Galbraith wrote:
On Fri, 2017-12-22 at 09:45 +0100, Greg Kroah-Hartman wrote:
4.14-stable review patch. If anyone has any objections, please let me know.
FYI, this broke kdump, or rather the makedumpfile part thereof. Forward looking wreckage is par for the kdump course, but...
Is it also broken in Linus's tree with this patch? Or is there an add-on patch that I should apply to 4.14 to resolve this issue there?
This one http://lkml.kernel.org/r/1513932498-20350-1-git-send-email-bhe@redhat.com I guess.
That won't unbreak kdump, else master wouldn't be broken. I don't care deeply, or know if anyone else does, I'm just reporting it because I met it and chased it down.
-Mike
On Sun 07-01-18 13:44:02, Mike Galbraith wrote:
That won't unbreak kdump, else master wouldn't be broken. I don't care deeply, or know if anyone else does, I'm just reporting it because I met it and chased it down.
OK, I didn't notice that d09cfbbfa0f7 ("mm/sparse.c: wrong allocation for mem_section") made it in after rc6. I am still wondering why 83e3c48729 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") made it into the stable tree in the first place.
On Sun, Jan 07, 2018 at 02:23:09PM +0100, Michal Hocko wrote:
OK, I didn't notice that d09cfbbfa0f7 ("mm/sparse.c: wrong allocation for mem_section") made it in after rc6. I am still wondering why 83e3c48729 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") made it into the stable tree in the first place.
It was part of the prep for the KPTI code from what I can tell. If you think it should be reverted, just let me know and I'll be glad to do so.
thanks,
greg k-h
On Mon, 2018-01-08 at 08:53 +0100, Greg Kroah-Hartman wrote:
It was part of the prep for the KPTI code from what I can tell. If you think it should be reverted, just let me know and I'll be glad to do so.
No preference here. I have to patch master regardless if I want kdump to work while I patiently wait for userspace to get fixed up (either that or use time I don't have to go fix it up myself).
-Mike
On Mon, Jan 08, 2018 at 09:15:33AM +0100, Mike Galbraith wrote:
No preference here. I have to patch master regardless if I want kdump to work while I patiently wait for userspace to get fixed up (either that or use time I don't have to go fix it up myself).
I'll stay "bug compatible" for the time being. If you do fix this up, can you add a cc: stable tag in your patch so I can pick it up when it gets merged?
thanks,
greg k-h
On Mon, 2018-01-08 at 09:33 +0100, Greg Kroah-Hartman wrote:
I'll stay "bug compatible" for the time being. If you do fix this up, can you add a cc: stable tag in your patch so I can pick it up when it gets merged?
Userspace (makedumpfile) will have to adapt, not the kernel. Meanwhile I carry reverts, making kernels, kdump and myself all happy campers.
-Mike
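The adaptation makedumpfile needs is visible from the mem_section hunk itself; a sketch of the symbol-layout change (not actual makedumpfile code):

    /*
     * Before: the mem_section symbol IS the root array, so a dump
     * tool could read the root pointers directly at the symbol's
     * address in the vmcore.
     */
    struct mem_section *mem_section[NR_SECTION_ROOTS];

    /*
     * After: the symbol is only a pointer to a runtime-allocated
     * root array, so one extra dereference is needed -- old
     * userspace misreads the pointer value as the first root entry.
     */
    struct mem_section **mem_section;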
On Mon 08-01-18 08:53:08, Greg KH wrote:
It was part of the prep for the KPTI code from what I can tell.
I do not see a direct relation, to be honest. It is more related to 5-level page tables but I might be missing some subtle relation.
If you think it should be reverted, just let me know and I'll be glad to do so.
This seems to be affecting Linus's tree as well, so it needs to get resolved. I would suggest reverting in stable in the meantime. If you really need it in the stable tree then you can pull it back later with all the follow-up fixes.
On Mon, Jan 08, 2018 at 09:47:23AM +0100, Michal Hocko wrote:
It was part of the prep for the KPTI code.
I do not see a direct relation, to be honest. It is more related to 5-level page tables but I might be missing some subtle relation.
If you think it should be reverted, just let me know and I'll be glad to do so.
This seems to be affecting Linus's tree as well, so it needs to get resolved. I would suggest reverting in stable in the meantime. If you really need it in the stable tree then you can pull it back later with all the follow-up fixes.
Ok, I've now reverted it, thanks.
greg k-h
On Mon, Jan 08, 2018 at 10:10:44AM +0100, Greg Kroah-Hartman wrote:
Ok, I've now reverted it, thanks.
Nope, it breaks the build when reverted; I'm dropping that revert now.
It's as if the x86 maintainers actually knew what they were doing in asking for this to be backported :)
thanks,
greg k-h
hi Kirill,
As Mike reported below, your 5-level paging related upstream commit 83e3c48729d9 and all its follow-up fixes:
83e3c48729d9: mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y
629a359bdb0e: mm/sparsemem: Fix ARM64 boot crash when CONFIG_SPARSEMEM_EXTREME=y
d09cfbbfa0f7: mm/sparse.c: wrong allocation for mem_section
... still breaks kexec - and that now regresses -stable as well.
Given that 5-level paging now syntactically depends on having this commit, if we fully revert this then we'll have to disable 5-level paging as well.
Thanks,
Ingo
* Mike Galbraith efault@gmx.de wrote:
On Fri, 2017-12-22 at 09:45 +0100, Greg Kroah-Hartman wrote:
4.14-stable review patch. If anyone has any objections, please let me know.
FYI, this broke kdump, or rather the makedumpfile part thereof. Forward looking wreckage is par for the kdump course, but...
From: Kirill A. Shutemov kirill.shutemov@linux.intel.com
commit 83e3c48729d9ebb7af5a31a504f3fd6aff0348c4 upstream.
Size of the mem_section[] array depends on the size of the physical address space.
In preparation for boot-time switching between paging modes on x86-64 we need to make the allocation of mem_section[] dynamic, because otherwise we waste a lot of RAM: with CONFIG_NODE_SHIFT=10, mem_section[] size is 32kB for 4-level paging and 2MB for 5-level paging mode.
The patch allocates the array on the first call to sparse_memory_present_with_active_regions().
Signed-off-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Andy Lutomirski luto@amacapital.net Cc: Borislav Petkov bp@suse.de Cc: Cyrill Gorcunov gorcunov@openvz.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170929140821.37654-2-kirill.shutemov@linux.intel.... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
 include/linux/mmzone.h |  6 +++++-
 mm/page_alloc.c        | 10 ++++++++++
 mm/sparse.c            | 17 +++++++++++------
 3 files changed, 26 insertions(+), 7 deletions(-)
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1152,13 +1152,17 @@ struct mem_section {
 #define SECTION_ROOT_MASK	(SECTIONS_PER_ROOT - 1)
 
 #ifdef CONFIG_SPARSEMEM_EXTREME
-extern struct mem_section *mem_section[NR_SECTION_ROOTS];
+extern struct mem_section **mem_section;
 #else
 extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];
 #endif
 
 static inline struct mem_section *__nr_to_section(unsigned long nr)
 {
+#ifdef CONFIG_SPARSEMEM_EXTREME
+	if (!mem_section)
+		return NULL;
+#endif
 	if (!mem_section[SECTION_NR_TO_ROOT(nr)])
 		return NULL;
 	return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5651,6 +5651,16 @@ void __init sparse_memory_present_with_a
 	unsigned long start_pfn, end_pfn;
 	int i, this_nid;
 
+#ifdef CONFIG_SPARSEMEM_EXTREME
+	if (!mem_section) {
+		unsigned long size, align;
+
+		size = sizeof(struct mem_section) * NR_SECTION_ROOTS;
+		align = 1 << (INTERNODE_CACHE_SHIFT);
+		mem_section = memblock_virt_alloc(size, align);
+	}
+#endif
+
 	for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, &this_nid)
 		memory_present(this_nid, start_pfn, end_pfn);
 }
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -23,8 +23,7 @@
  *	1) mem_section	- memory sections, mem_map's for valid memory
  */
 #ifdef CONFIG_SPARSEMEM_EXTREME
-struct mem_section *mem_section[NR_SECTION_ROOTS]
-	____cacheline_internodealigned_in_smp;
+struct mem_section **mem_section;
 #else
 struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]
 	____cacheline_internodealigned_in_smp;
@@ -101,7 +100,7 @@ static inline int sparse_index_init(unsi
 int __section_nr(struct mem_section* ms)
 {
 	unsigned long root_nr;
-	struct mem_section* root;
+	struct mem_section *root = NULL;
 
 	for (root_nr = 0; root_nr < NR_SECTION_ROOTS; root_nr++) {
 		root = __nr_to_section(root_nr * SECTIONS_PER_ROOT);
@@ -112,7 +111,7 @@ int __section_nr(struct mem_section* ms)
 			break;
 	}
 
-	VM_BUG_ON(root_nr == NR_SECTION_ROOTS);
+	VM_BUG_ON(!root);
 
 	return (root_nr * SECTIONS_PER_ROOT) + (ms - root);
 }
@@ -330,11 +329,17 @@ again:
 static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
 {
 	unsigned long usemap_snr, pgdat_snr;
-	static unsigned long old_usemap_snr = NR_MEM_SECTIONS;
-	static unsigned long old_pgdat_snr = NR_MEM_SECTIONS;
+	static unsigned long old_usemap_snr;
+	static unsigned long old_pgdat_snr;
 	struct pglist_data *pgdat = NODE_DATA(nid);
 	int usemap_nid;
 
+	/* First call */
+	if (!old_usemap_snr) {
+		old_usemap_snr = NR_MEM_SECTIONS;
+		old_pgdat_snr = NR_MEM_SECTIONS;
+	}
+
 	usemap_snr = pfn_to_section_nr(__pa(usemap) >> PAGE_SHIFT);
 	pgdat_snr = pfn_to_section_nr(__pa(pgdat) >> PAGE_SHIFT);
 	if (usemap_snr == pgdat_snr)
On Mon, Jan 08, 2018 at 04:04:44PM +0000, Ingo Molnar wrote:
hi Kirill,
As Mike reported below, your 5-level paging related upstream commit 83e3c48729d9 and all its follow-up fixes:
83e3c48729d9: mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y
629a359bdb0e: mm/sparsemem: Fix ARM64 boot crash when CONFIG_SPARSEMEM_EXTREME=y
d09cfbbfa0f7: mm/sparse.c: wrong allocation for mem_section
... still breaks kexec - and that now regresses -stable as well.
Given that 5-level paging now syntactically depends on having this commit, if we fully revert this then we'll have to disable 5-level paging as well.
Urghh.. Sorry about this.
I'm on vacation right now. Give me a day to sort this out.
On Mon, Jan 08, 2018 at 08:46:53PM +0300, Kirill A. Shutemov wrote:
On Mon, Jan 08, 2018 at 04:04:44PM +0000, Ingo Molnar wrote:
hi Kirill,
As Mike reported below, your 5-level paging related upstream commit 83e3c48729d9 and all its follow-up fixes:
83e3c48729d9: mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y
629a359bdb0e: mm/sparsemem: Fix ARM64 boot crash when CONFIG_SPARSEMEM_EXTREME=y
d09cfbbfa0f7: mm/sparse.c: wrong allocation for mem_section
... still breaks kexec - and that now regresses -stable as well.
Given that 5-level paging now syntactically depends on having this commit, if we fully revert this then we'll have to disable 5-level paging as well.
This *should* help.
Mike, could you test this? (On top of the rest of the fixes.)
Sorry for the mess.
From 100fd567754f1457be94732046aefca204c842d2 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Date: Tue, 9 Jan 2018 02:55:47 +0300 Subject: [PATCH] kdump: Write a correct address of mem_section into vmcoreinfo
Depending on configuration mem_section can now be an array or a pointer to an array allocated dynamically. In most cases, we can continue to refer to it as 'mem_section' regardless of what it is.
But there's one exception: '&mem_section' means "address of the array" if mem_section is an array, but if mem_section is a pointer, it would mean "address of the pointer".
We've stepped onto this in kdump code. VMCOREINFO_SYMBOL(mem_section) writes down address of pointer into vmcoreinfo, not array as we wanted.
Let's introduce VMCOREINFO_ARRAY() that would handle the situation correctly for both cases.
Signed-off-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com
Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
---
 include/linux/crash_core.h | 2 ++
 kernel/crash_core.c        | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 06097ef30449..83ae04950269 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -42,6 +42,8 @@ phys_addr_t paddr_vmcoreinfo_note(void);
 	vmcoreinfo_append_str("PAGESIZE=%ld\n", value)
 #define VMCOREINFO_SYMBOL(name) \
 	vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)&name)
+#define VMCOREINFO_ARRAY(name) \
+	vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)name)
 #define VMCOREINFO_SIZE(name) \
 	vmcoreinfo_append_str("SIZE(%s)=%lu\n", #name, \
 			      (unsigned long)sizeof(name))
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index b3663896278e..d4122a837477 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -410,7 +410,7 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_SYMBOL(contig_page_data);
 #endif
 #ifdef CONFIG_SPARSEMEM
-	VMCOREINFO_SYMBOL(mem_section);
+	VMCOREINFO_ARRAY(mem_section);
 	VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS);
 	VMCOREINFO_STRUCT_SIZE(mem_section);
 	VMCOREINFO_OFFSET(mem_section, section_mem_map);
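The array-vs-pointer distinction the commit message describes can be seen in a minimal user-space C sketch; the struct and variable names below are illustrative stand-ins, not the kernel's definitions. For an array, 'name' and '&name' yield the same address; for a pointer, '&name' is the address of the pointer variable itself:

	#include <stdio.h>

	struct mem_section { unsigned long section_mem_map; };

	static struct mem_section *array_version[16];	/* old layout: static array  */
	static struct mem_section **pointer_version;	/* new layout: pointer, array
							 * allocated at runtime       */
	int main(void)
	{
		/* Array: both expressions name the table itself. */
		printf("array:   name=%p &name=%p\n",
		       (void *)array_version, (void *)&array_version);
		/* Pointer: &name is where the pointer lives, not the table. */
		printf("pointer: name=%p &name=%p\n",
		       (void *)pointer_version, (void *)&pointer_version);
		return 0;
	}

Run it and the first pair of addresses match while the second pair differ, which is exactly why VMCOREINFO_SYMBOL(mem_section) recorded the wrong thing once mem_section became a pointer.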
On 01/09/18 at 03:13am, Kirill A. Shutemov wrote:
On Mon, Jan 08, 2018 at 08:46:53PM +0300, Kirill A. Shutemov wrote:
On Mon, Jan 08, 2018 at 04:04:44PM +0000, Ingo Molnar wrote:
[...]
This *should* help.
Mike, could you test this? (On top of the rest of the fixes.)
Sorry for the mess.
Subject: [PATCH] kdump: Write a correct address of mem_section into vmcoreinfo
[...]
+#define VMCOREINFO_ARRAY(name) \
+	vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)name)
Thanks for the patch, I have a similar patch but the makedumpfile maintainer is looking at a userspace fix instead.
As for the macro name, VMCOREINFO_SYMBOL_ARRAY sounds better.
Thanks Dave
On 01/09/18 at 09:09am, Dave Young wrote:
On 01/09/18 at 03:13am, Kirill A. Shutemov wrote:
On Mon, Jan 08, 2018 at 08:46:53PM +0300, Kirill A. Shutemov wrote:
On Mon, Jan 08, 2018 at 04:04:44PM +0000, Ingo Molnar wrote:
[...]
Thanks for the patch, I have a similar patch but the makedumpfile maintainer is looking at a userspace fix instead.
Seems we should add lkml to CC next time so that people can watch it.
As for the macro name, VMCOREINFO_SYMBOL_ARRAY sounds better.
I still think using vmcoreinfo_append_str is better, unless we replace all array variables with the newly added macro.
vmcoreinfo_append_str("SYMBOL(mem_section)=%lx\n", (unsigned long)mem_section);
On 01/09/18 at 01:41pm, Baoquan He wrote:
On 01/09/18 at 09:09am, Dave Young wrote:
On 01/09/18 at 03:13am, Kirill A. Shutemov wrote:
[...]
Thanks for the patch, I have a similar patch but the makedumpfile maintainer is looking at a userspace fix instead.
Seems we should add lkml to CC next time so that people can watch it.
Yes, agreed.
As for the macro name, VMCOREINFO_SYMBOL_ARRAY sounds better.
I still think using vmcoreinfo_append_str is better, unless we replace all array variables with the newly added macro.
vmcoreinfo_append_str("SYMBOL(mem_section)=%lx\n", (unsigned long)mem_section);
I have no strong opinion; either change all array uses, or just introduce the macro and start using it from now on if we have similar array symbols.
Thanks Dave
On Tue, Jan 09, 2018 at 03:24:40PM +0800, Dave Young wrote:
On 01/09/18 at 01:41pm, Baoquan He wrote:
On 01/09/18 at 09:09am, Dave Young wrote:
As for the macro name, VMCOREINFO_SYMBOL_ARRAY sounds better.
Yep, that's better.
I still think using vmcoreinfo_append_str is better, unless we replace all array variables with the newly added macro.
vmcoreinfo_append_str("SYMBOL(mem_section)=%lx\n", (unsigned long)mem_section);
I have no strong opinion; either change all array uses, or just introduce the macro and start using it from now on if we have similar array symbols.
Do you need some action on my side or will you folks take care of this?
On Tue, Jan 09, 2018 at 12:05:52PM +0300, Kirill A. Shutemov wrote:
On Tue, Jan 09, 2018 at 03:24:40PM +0800, Dave Young wrote:
On 01/09/18 at 01:41pm, Baoquan He wrote:
On 01/09/18 at 09:09am, Dave Young wrote:
As for the macro name, VMCOREINFO_SYMBOL_ARRAY sounds better.
Yep, that's better.
I still think using vmcoreinfo_append_str is better, unless we replace all array variables with the newly added macro.
vmcoreinfo_append_str("SYMBOL(mem_section)=%lx\n", (unsigned long)mem_section);
I have no strong opinion; either change all array uses, or just introduce the macro and start using it from now on if we have similar array symbols.
Do you need some action on my side or will you folks take care of this?
I think Baoquan was suggesting updating all array users in the current code; if you can check every VMCOREINFO_SYMBOL and update all the arrays, he will be happy. But if it can not be done easily, I'm fine with the VMCOREINFO_SYMBOL_ARRAY change only for now; we kdump people can do it later as well.
Thanks Dave
On Wed, Jan 10, 2018 at 03:08:04AM +0000, Dave Young wrote:
On Tue, Jan 09, 2018 at 12:05:52PM +0300, Kirill A. Shutemov wrote:
On Tue, Jan 09, 2018 at 03:24:40PM +0800, Dave Young wrote:
On 01/09/18 at 01:41pm, Baoquan He wrote:
On 01/09/18 at 09:09am, Dave Young wrote:
[...]
I think Baoquan was suggesting updating all array users in the current code; if you can check every VMCOREINFO_SYMBOL and update all the arrays, he will be happy. But if it can not be done easily, I'm fine with the VMCOREINFO_SYMBOL_ARRAY change only for now; we kdump people can do it later as well.
It seems it's the only array we have there. swapper_pg_dir is a potential candidate, but it's 'unsigned long' on arm.
Below is the patch with the corrected macro name.
Please consider applying.
From 70f3a84b97f2de98d1364f7b10b7a42a1d8e9968 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Date: Tue, 9 Jan 2018 02:55:47 +0300 Subject: [PATCH] kdump: Write a correct address of mem_section into vmcoreinfo
Depending on configuration mem_section can now be an array or a pointer to an array allocated dynamically. In most cases, we can continue to refer to it as 'mem_section' regardless of what it is.
But there's one exception: '&mem_section' means "address of the array" if mem_section is an array, but if mem_section is a pointer, it would mean "address of the pointer".
We've stepped onto this in kdump code. VMCOREINFO_SYMBOL(mem_section) writes down address of pointer into vmcoreinfo, not array as we wanted.
Let's introduce VMCOREINFO_SYMBOL_ARRAY() that would handle the situation correctly for both cases.
Signed-off-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com
Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
---
 include/linux/crash_core.h | 2 ++
 kernel/crash_core.c        | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 06097ef30449..b511f6d24b42 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -42,6 +42,8 @@ phys_addr_t paddr_vmcoreinfo_note(void);
 	vmcoreinfo_append_str("PAGESIZE=%ld\n", value)
 #define VMCOREINFO_SYMBOL(name) \
 	vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)&name)
+#define VMCOREINFO_SYMBOL_ARRAY(name) \
+	vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)name)
 #define VMCOREINFO_SIZE(name) \
 	vmcoreinfo_append_str("SIZE(%s)=%lu\n", #name, \
 			      (unsigned long)sizeof(name))
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index b3663896278e..4f63597c824d 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -410,7 +410,7 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_SYMBOL(contig_page_data);
 #endif
 #ifdef CONFIG_SPARSEMEM
-	VMCOREINFO_SYMBOL(mem_section);
+	VMCOREINFO_SYMBOL_ARRAY(mem_section);
 	VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS);
 	VMCOREINFO_STRUCT_SIZE(mem_section);
 	VMCOREINFO_OFFSET(mem_section, section_mem_map);
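For reference, a consumer of the vmcoreinfo note conceptually parses the SYMBOL() line the way this minimal user-space sketch does; read_symbol and the sample value here are illustrative stand-ins, not makedumpfile's actual code:

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	/* Look up "SYMBOL(name)=<hex>" in a vmcoreinfo text blob. */
	static unsigned long read_symbol(const char *vmcoreinfo, const char *name)
	{
		char key[64];
		const char *p;

		snprintf(key, sizeof(key), "SYMBOL(%s)=", name);
		p = strstr(vmcoreinfo, key);
		if (!p)
			return 0;	/* stands in for a NOT_FOUND value */
		return strtoul(p + strlen(key), NULL, 16);
	}

	int main(void)
	{
		/* Sample line; with the fix this is the array's address. */
		const char *info = "SYMBOL(mem_section)=ffff88007ff24000\n";

		printf("mem_section @ %lx\n", read_symbol(info, "mem_section"));
		return 0;
	}

Since the consumer dereferences whatever address it reads here as the root table, recording the pointer's own address (the pre-fix behavior) sends it off into garbage.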
On 01/10/18 at 02:16pm, Kirill A. Shutemov wrote:
On Wed, Jan 10, 2018 at 03:08:04AM +0000, Dave Young wrote:
On Tue, Jan 09, 2018 at 12:05:52PM +0300, Kirill A. Shutemov wrote:
On Tue, Jan 09, 2018 at 03:24:40PM +0800, Dave Young wrote:
[...]
Ack it, thanks.
Acked-by: Baoquan He bhe@redhat.com
On 01/10/18 at 02:16pm, Kirill A. Shutemov wrote:
On Wed, Jan 10, 2018 at 03:08:04AM +0000, Dave Young wrote:
On Tue, Jan 09, 2018 at 12:05:52PM +0300, Kirill A. Shutemov wrote:
On Tue, Jan 09, 2018 at 03:24:40PM +0800, Dave Young wrote:
On 01/09/18 at 01:41pm, Baoquan He wrote:
On 01/09/18 at 09:09am, Dave Young wrote:
[...]
Acked-by: Dave Young dyoung@redhat.com
If the stable kernel took the mem_section commits, then this should also be cc'ed to stable. Andrew, can you help get this into 4.15?
Thanks Dave
On Fri, Jan 12, 2018 at 08:55:49AM +0800, Dave Young wrote:
On 01/10/18 at 02:16pm, Kirill A. Shutemov wrote:
On Wed, Jan 10, 2018 at 03:08:04AM +0000, Dave Young wrote:
On Tue, Jan 09, 2018 at 12:05:52PM +0300, Kirill A. Shutemov wrote:
On Tue, Jan 09, 2018 at 03:24:40PM +0800, Dave Young wrote:
On 01/09/18 at 01:41pm, Baoquan He wrote:
On 01/09/18 at 09:09am, Dave Young wrote:
[...]
Hm, this fix means that the vmlinux symbol table and vmcoreinfo have different values for mem_section. That seems... odd. I had to patch makedumpfile to fix the case of an explicit vmlinux being passed on the command line (which I realized I don't need to do, but it should still work):
From 542a11a8f28b0f0a989abc3adff89da22f44c719 Mon Sep 17 00:00:00 2001
Message-Id: 542a11a8f28b0f0a989abc3adff89da22f44c719.1515995400.git.osandov@fb.com
From: Omar Sandoval osandov@fb.com
Date: Sun, 14 Jan 2018 17:10:30 -0800
Subject: [PATCH] Fix SPARSEMEM_EXTREME support on Linux v4.15 when passing vmlinux
Since kernel commit 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y"), mem_section is a dynamically allocated array of pointers to mem_section instead of a static one (i.e., struct mem_section ** instead of struct mem_section * []). This adds an extra layer of indirection that breaks makedumpfile, which will end up with a bunch of bogus mem_maps.
Since kernel commit a0b1280368d1 ("kdump: write correct address of mem_section into vmcoreinfo"), the mem_section symbol in vmcoreinfo contains the address of the actual struct mem_section * array instead of the address of the pointer in .bss, which gets rid of the extra indirection. However, makedumpfile still uses the debugging symbol from the vmlinux image. Fix this by allowing symbols from the vmcore to override symbols from the vmlinux image. As the comment in initial() says, "vmcoreinfo in /proc/vmcore is more reliable than -x/-i option".
Signed-off-by: Omar Sandoval osandov@fb.com
---
 makedumpfile.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/makedumpfile.h b/makedumpfile.h
index 57cf4d9..d68c798 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -274,8 +274,10 @@ do { \
 } while (0)
 #define READ_SYMBOL(str_symbol, symbol) \
 do { \
-	if (SYMBOL(symbol) == NOT_FOUND_SYMBOL) { \
-		SYMBOL(symbol) = read_vmcoreinfo_symbol(STR_SYMBOL(str_symbol)); \
+	unsigned long _tmp_symbol; \
+	_tmp_symbol = read_vmcoreinfo_symbol(STR_SYMBOL(str_symbol)); \
+	if (_tmp_symbol != NOT_FOUND_SYMBOL) { \
+		SYMBOL(symbol) = _tmp_symbol; \
 		if (SYMBOL(symbol) == INVALID_SYMBOL_DATA) \
 			return FALSE; \
 	} \
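The behavioral change boils down to a precedence swap. A hedged sketch of the resolution order after this patch, with illustrative function and parameter names rather than makedumpfile's own:

	/*
	 * Sketch: with the patch, a symbol found in the vmcore's vmcoreinfo
	 * wins over the value taken from vmlinux debug symbols; the vmlinux
	 * value is only a fallback when vmcoreinfo has nothing.
	 */
	unsigned long resolve_symbol(unsigned long from_vmlinux,
				     unsigned long from_vmcoreinfo,
				     unsigned long not_found)
	{
		if (from_vmcoreinfo != not_found)
			return from_vmcoreinfo;	/* /proc/vmcore is more reliable */
		return from_vmlinux;		/* fall back to -x vmlinux */
	}

Before the patch the order was reversed: the vmlinux symbol, if present, blocked the vmcoreinfo lookup entirely, which is why passing -x vmlinux reintroduced the stale address.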
Hm, this fix means that the vmlinux symbol table and vmcoreinfo have different values for mem_section. That seems... odd. I had to patch makedumpfile to fix the case of an explicit vmlinux being passed on the command line (which I realized I don't need to do, but it should still work):
Looks good to me, I'll merge this into makedumpfile-1.6.4.
Thanks, Atsushi Kumagai
On Tue, 2018-01-09 at 03:13 +0300, Kirill A. Shutemov wrote:
Mike, could you test this? (On top of the rest of the fixes.)
homer:..crash/2018-01-09-04:25 # ll
total 1863604
-rw------- 1 root root      66255 Jan 9 04:25 dmesg.txt
-rw-r--r-- 1 root root        182 Jan 9 04:25 README.txt
-rw-r--r-- 1 root root    2818240 Jan 9 04:25 System.map-4.15.0.gb2cd1df-master
-rw------- 1 root root 1832914928 Jan 9 04:25 vmcore
-rw-r--r-- 1 root root   72514993 Jan 9 04:25 vmlinux-4.15.0.gb2cd1df-master.gz
Yup, all better.
Sorry for the mess.
(why, developers not installing shiny new bugs is a whole lot worse:)
Hi All,
I hit a makedumpfile failure in the upstream kernel which contains this patch. Did I miss something else?
On a Fedora 27 host:
[douly@localhost code]$ ./makedumpfile -d 31 --message-level 31 -x vmlinux_4.15+ vmcore_4.15+_from_cp_command vmcore_4.15+
sadump: does not have partition header
sadump: read dump device as unknown format
sadump: unknown format
LOAD (0)
  phys_start : 1000000
  phys_end   : 2a86000
  virt_start : ffffffff81000000
  virt_end   : ffffffff82a86000
LOAD (1)
  phys_start : 1000
  phys_end   : 9fc00
  virt_start : ffff880000001000
  virt_end   : ffff88000009fc00
LOAD (2)
  phys_start : 100000
  phys_end   : 13000000
  virt_start : ffff880000100000
  virt_end   : ffff880013000000
LOAD (3)
  phys_start : 33000000
  phys_end   : 7ffd7000
  virt_start : ffff880033000000
  virt_end   : ffff88007ffd7000
Linux kdump
page_size    : 4096
max_mapnr : 7ffd7
Buffer size for the cyclic mode: 131061
The kernel version is not supported.
The makedumpfile operation may be incomplete.
num of NODEs : 1
Memory type : SPARSEMEM_EX
mem_map (0)  mem_map : ffff88007ff26000  pfn_start : 0      pfn_end : 8000
mem_map (1)  mem_map : 0                 pfn_start : 8000   pfn_end : 10000
mem_map (2)  mem_map : 0                 pfn_start : 10000  pfn_end : 18000
mem_map (3)  mem_map : 0                 pfn_start : 18000  pfn_end : 20000
mem_map (4)  mem_map : 0                 pfn_start : 20000  pfn_end : 28000
mem_map (5)  mem_map : 0                 pfn_start : 28000  pfn_end : 30000
mem_map (6)  mem_map : 0                 pfn_start : 30000  pfn_end : 38000
mem_map (7)  mem_map : 0                 pfn_start : 38000  pfn_end : 40000
mem_map (8)  mem_map : 0                 pfn_start : 40000  pfn_end : 48000
mem_map (9)  mem_map : 0                 pfn_start : 48000  pfn_end : 50000
mem_map (10) mem_map : 0                 pfn_start : 50000  pfn_end : 58000
mem_map (11) mem_map : 0                 pfn_start : 58000  pfn_end : 60000
mem_map (12) mem_map : 0                 pfn_start : 60000  pfn_end : 68000
mem_map (13) mem_map : 0                 pfn_start : 68000  pfn_end : 70000
mem_map (14) mem_map : 0                 pfn_start : 70000  pfn_end : 78000
mem_map (15) mem_map : 0                 pfn_start : 78000  pfn_end : 7ffd7
mmap() is available on the kernel.
Checking for memory holes : [100.0 %] |
STEP [Checking for memory holes ] : 0.000060 seconds
__vtop4_x86_64: Can't get a valid pte.
readmem: Can't convert a virtual address(ffff88007ffd7000) to physical address.
readmem: type_addr: 0, addr:ffff88007ffd7000, size:32768
__exclude_unnecessary_pages: Can't read the buffer of struct page.
create_2nd_bitmap: Can't exclude unnecessary pages.
Checking for memory holes : [100.0 %] \
STEP [Checking for memory holes ] : 0.000010 seconds
Checking for memory holes : [100.0 %] -
STEP [Checking for memory holes ] : 0.000004 seconds
__vtop4_x86_64: Can't get a valid pte.
readmem: Can't convert a virtual address(ffff88007ffd7000) to physical address.
readmem: type_addr: 0, addr:ffff88007ffd7000, size:32768
__exclude_unnecessary_pages: Can't read the buffer of struct page.
create_2nd_bitmap: Can't exclude unnecessary pages.
Thanks,
dou

At 01/09/2018 11:44 AM, Mike Galbraith wrote:
On Tue, 2018-01-09 at 03:13 +0300, Kirill A. Shutemov wrote:
Mike, could you test this? (On top of the rest of the fixes.)
[...]
Yup, all better.
Sorry for the mess.
(why, developers not installing shiny new bugs is a whole lot worse:)
On Wed, Feb 07, 2018 at 05:25:05PM +0800, Dou Liyang wrote:
Hi All,
I hit a makedumpfile failure in the upstream kernel which contains this patch. Did I miss something else?
None I'm aware of.
Is there a reason to suspect that the issue is related to the bug this patch fixed?
On Wed, 2018-02-07 at 13:41 +0300, Kirill A. Shutemov wrote:
On Wed, Feb 07, 2018 at 05:25:05PM +0800, Dou Liyang wrote:
Hi All,
I hit a makedumpfile failure in the upstream kernel which contains this patch. Did I miss something else?
None I'm aware of.
Is there a reason to suspect that the issue is related to the bug this patch fixed?
Still works fine for me with .today. Box is only 16GB desktop box though.
-Mike
Hi Kirill, Mike,
At 02/07/2018 06:45 PM, Mike Galbraith wrote:
On Wed, 2018-02-07 at 13:41 +0300, Kirill A. Shutemov wrote:
On Wed, Feb 07, 2018 at 05:25:05PM +0800, Dou Liyang wrote:
Hi All,
I hit a makedumpfile failure in the upstream kernel which contains this patch. Did I miss something else?
None I'm aware of.
Is there a reason to suspect that the issue is related to the bug this patch fixed?
I did a comparative test at my colleague Indoh's suggestion.

Reverting your two commits:

commit 83e3c48729d9ebb7af5a31a504f3fd6aff0348c4
Author: Kirill A. Shutemov kirill.shutemov@linux.intel.com
Date:   Fri Sep 29 17:08:16 2017 +0300

commit 629a359bdb0e0652a8227b4ff3125431995fec6e
Author: Kirill A. Shutemov kirill.shutemov@linux.intel.com
Date:   Tue Nov 7 11:33:37 2017 +0300

...and keeping everything else unchanged, makedumpfile works well.
Still works fine for me with .today. Box is only 16GB desktop box though.
Btw, in the upstream kernel which contains this patch, I did two tests:
1) use the makedumpfile as core_collector in /etc/kdump.conf, then trigger the process of kdump by echo 1 >/proc/sysrq-trigger, the makedumpfile works well and I can get the vmcore file.
......It is OK
2) use cp as core_collector, do the same operation to get the vmcore file. then use makedumpfile to do like above:
[douly@localhost code]$ ./makedumpfile -d 31 --message-level 31 -x vmlinux_4.15+ vmcore_4.15+_from_cp_command vmcore_4.15+
......It causes makedumpfile to fail.
Thanks, dou.
On 02/07/18 at 08:00pm, Dou Liyang wrote:
[...]
- use cp as core_collector, do the same operation to get the vmcore file.
then use makedumpfile to do like above:
[douly@localhost code]$ ./makedumpfile -d 31 --message-level 31 -x
vmlinux_4.15+ vmcore_4.15+_from_cp_command vmcore_4.15+
Oh, then please ignore my previous comment. Adding '-D' can give more debugging messages.
Hi Baoquan,
At 02/07/2018 08:08 PM, Baoquan He wrote:
[...]
Oh, then please ignore my previous comment. Adding '-D' can give more debugging messages.
I added '-D'; just like before, there are no additional debugging messages (full output below).

BTW, I used crash to analyze the vmcore file created by the 'cp' command:

./crash ../makedumpfile/code/vmcore_4.15+_from_cp_command ../makedumpfile/code/vmlinux_4.15+

crash works well. It's so interesting.
Thanks, dou.
The debugging messages with '-D':
[douly@localhost code]$ ./makedumpfile -D -d 31 --message-level 31 -x vmlinux_4.15+ vmcore_4.15+_from_cp_command vmcore_4.15+
sadump: does not have partition header
sadump: read dump device as unknown format
sadump: unknown format
LOAD (0)
  phys_start : 1000000
  phys_end   : 2a86000
  virt_start : ffffffff81000000
  virt_end   : ffffffff82a86000
LOAD (1)
  phys_start : 1000
  phys_end   : 9fc00
  virt_start : ffff880000001000
  virt_end   : ffff88000009fc00
LOAD (2)
  phys_start : 100000
  phys_end   : 13000000
  virt_start : ffff880000100000
  virt_end   : ffff880013000000
LOAD (3)
  phys_start : 33000000
  phys_end   : 7ffd7000
  virt_start : ffff880033000000
  virt_end   : ffff88007ffd7000
Linux kdump
page_size    : 4096
max_mapnr : 7ffd7
Buffer size for the cyclic mode: 131061
The kernel version is not supported.
The makedumpfile operation may be incomplete.
num of NODEs : 1
Memory type : SPARSEMEM_EX
mem_map (0)  mem_map : ffff88007ff26000  pfn_start : 0      pfn_end : 8000
mem_map (1)  mem_map : 0                 pfn_start : 8000   pfn_end : 10000
mem_map (2)  mem_map : 0                 pfn_start : 10000  pfn_end : 18000
mem_map (3)  mem_map : 0                 pfn_start : 18000  pfn_end : 20000
mem_map (4)  mem_map : 0                 pfn_start : 20000  pfn_end : 28000
mem_map (5)  mem_map : 0                 pfn_start : 28000  pfn_end : 30000
mem_map (6)  mem_map : 0                 pfn_start : 30000  pfn_end : 38000
mem_map (7)  mem_map : 0                 pfn_start : 38000  pfn_end : 40000
mem_map (8)  mem_map : 0                 pfn_start : 40000  pfn_end : 48000
mem_map (9)  mem_map : 0                 pfn_start : 48000  pfn_end : 50000
mem_map (10) mem_map : 0                 pfn_start : 50000  pfn_end : 58000
mem_map (11) mem_map : 0                 pfn_start : 58000  pfn_end : 60000
mem_map (12) mem_map : 0                 pfn_start : 60000  pfn_end : 68000
mem_map (13) mem_map : 0                 pfn_start : 68000  pfn_end : 70000
mem_map (14) mem_map : 0                 pfn_start : 70000  pfn_end : 78000
mem_map (15) mem_map : 0                 pfn_start : 78000  pfn_end : 7ffd7
mmap() is available on the kernel.
Checking for memory holes : [100.0 %] |
STEP [Checking for memory holes ] : 0.000014 seconds
__vtop4_x86_64: Can't get a valid pte.
readmem: Can't convert a virtual address(ffff88007ffd7000) to physical address.
readmem: type_addr: 0, addr:ffff88007ffd7000, size:32768
__exclude_unnecessary_pages: Can't read the buffer of struct page.
create_2nd_bitmap: Can't exclude unnecessary pages.
Checking for memory holes : [100.0 %] \
STEP [Checking for memory holes ] : 0.000006 seconds
Checking for memory holes : [100.0 %] -
STEP [Checking for memory holes ] : 0.000004 seconds
__vtop4_x86_64: Can't get a valid pte.
readmem: Can't convert a virtual address(ffff88007ffd7000) to physical address.
readmem: type_addr: 0, addr:ffff88007ffd7000, size:32768
__exclude_unnecessary_pages: Can't read the buffer of struct page.
create_2nd_bitmap: Can't exclude unnecessary pages.
makedumpfile Failed.
......It causes makedumpfile failed.
Thanks, dou.
-Mike
On 02/07/18 at 08:17pm, Dou Liyang wrote:
[...]
I added '-D', Just like before, no more debugging message:
BTW, I use crash to analyze the vmcore file created by 'cp' command.
./crash ../makedumpfile/code/vmcore_4.15+_from_cp_command ../makedumpfile/code/vmlinux_4.15+
the crash works well, It's so interesting.
Thanks, dou.
The debugging message with '-D':
And what's the debug printing when the crash is triggered by sysrq?
At 02/07/2018 08:27 PM, Baoquan He wrote:
[...]
And what's the debug printing when the crash is triggered by sysrq?
kdump: dump target is /dev/vda2
kdump: saving to /sysroot//var/crash/127.0.0.1-2018-02-07-07:31:56/
[    2.751352] EXT4-fs (vda2): re-mounted. Opts: data=ordered
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
sadump: does not have partition header
sadump: read dump device as unknown format
sadump: unknown format
LOAD (0)  phys_start : 1000000   phys_end : 2a86000   virt_start : ffffffff81000000  virt_end : ffffffff82a86000
LOAD (1)  phys_start : 1000      phys_end : 9fc00     virt_start : ffff880000001000  virt_end : ffff88000009fc00
LOAD (2)  phys_start : 100000    phys_end : 13000000  virt_start : ffff880000100000  virt_end : ffff880013000000
LOAD (3)  phys_start : 33000000  phys_end : 7ffd7000  virt_start : ffff880033000000  virt_end : ffff88007ffd7000
Linux kdump
page_size : 4096
max_mapnr : 7ffd7
Buffer size for the cyclic mode: 131061
num of NODEs : 1
Memory type : SPARSEMEM_EX
mem_map (0)   mem_map : ffffea0000000000  pfn_start : 0      pfn_end : 8000
mem_map (1)   mem_map : ffffea0000200000  pfn_start : 8000   pfn_end : 10000
mem_map (2)   mem_map : ffffea0000400000  pfn_start : 10000  pfn_end : 18000
mem_map (3)   mem_map : ffffea0000600000  pfn_start : 18000  pfn_end : 20000
mem_map (4)   mem_map : ffffea0000800000  pfn_start : 20000  pfn_end : 28000
mem_map (5)   mem_map : ffffea0000a00000  pfn_start : 28000  pfn_end : 30000
mem_map (6)   mem_map : ffffea0000c00000  pfn_start : 30000  pfn_end : 38000
mem_map (7)   mem_map : ffffea0000e00000  pfn_start : 38000  pfn_end : 40000
mem_map (8)   mem_map : ffffea0001000000  pfn_start : 40000  pfn_end : 48000
mem_map (9)   mem_map : ffffea0001200000  pfn_start : 48000  pfn_end : 50000
mem_map (10)  mem_map : ffffea0001400000  pfn_start : 50000  pfn_end : 58000
mem_map (11)  mem_map : ffffea0001600000  pfn_start : 58000  pfn_end : 60000
mem_map (12)  mem_map : ffffea0001800000  pfn_start : 60000  pfn_end : 68000
mem_map (13)  mem_map : ffffea0001a00000  pfn_start : 68000  pfn_end : 70000
mem_map (14)  mem_map : ffffea0001c00000  pfn_start : 70000  pfn_end : 78000
mem_map (15)  mem_map : ffffea0001e00000  pfn_start : 78000  pfn_end : 7ffd7
mmap() is available on the kernel.
Copying data : [100.0 %] -  eta: 0s
Writing erase info...
offset_eraseinfo: 9567fb0, size_eraseinfo: 0
kdump: saving vmcore complete
Thanks, dou
On 02/07/18 at 08:34pm, Dou Liyang wrote:
[...]
I did a comparison test at my colleague Indoh's suggestion.
OK, I may have found the reason. KASLR is enabled, right? You can try to disable KASLR and run the test again, because phys_base and kaslr_offset are taken from vmlinux, and those are generated at compile time. Just a guess.
Hi Baoquan,
At 02/07/2018 08:45 PM, Baoquan He wrote:
[...]
OK, I may have found the reason. KASLR is enabled, right? You can try to
I add 'nokaslr' to disable the KASLR feature.
# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.15.0+ root=UUID=10f10326-c923-4098-86aa-afed5c54ee0b ro crashkernel=512M rhgb console=tty0 console=ttyS0 nokaslr LANG=en_US.UTF-8
disable KASLR and run the test again, because phys_base and kaslr_offset are taken from vmlinux, and those are generated at compile time. Just a guess.
Oh, I will recompile the kernel with KASLR disabled in .config.
Thanks, dou.
On 02/08/18 at 09:14am, Dou Liyang wrote:
[...]
OK, I may have found the reason. KASLR is enabled, right? You can try to
I add 'nokaslr' to disable the KASLR feature.
~~~added??
# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.15.0+ root=UUID=10f10326-c923-4098-86aa-afed5c54ee0b ro crashkernel=512M rhgb console=tty0 console=ttyS0 nokaslr LANG=en_US.UTF-8
disable KASLR and run the test again, because phys_base and kaslr_offset are taken from vmlinux, and those are generated at compile time. Just a guess.
Oh, I will recompile the kernel with KASLR disabled in .config.
Then it's not what I guessed. We need to debug makedumpfile, since using vmlinux is a separate code path that few people usually exercise.
Hi Baoquan,
At 02/08/2018 09:23 AM, Baoquan He wrote:
[...]
OK, I may have found the reason. KASLR is enabled, right? You can try to
I add 'nokaslr' to disable the KASLR feature.
~~~added??
oops! yes, KASLR had already been disabled by this option when I tested.
Oh, I will recompile the kernel with KASLR disabled in .config.
Then it's not what I guessed. We need to debug makedumpfile, since using vmlinux is a separate code path that few people usually exercise.
Understood, I will try to look into it.
Thanks, dou
On 02/07/18 at 05:25pm, Dou Liyang wrote:
Hi All,
I hit a makedumpfile failure in the upstream kernel which contains this patch. Did I miss something else?
readmem: Can't convert a virtual address(ffff88007ffd7000) to physical
This should not be related to this patch; otherwise your code couldn't get to that step. From the message, ffff88007ffd7000 is the end of the last mem region, which looks like a code bug. You are testing 5-level support in makedumpfile, right?
The patches I posted to decrease the memory cost of mem map allocation have a code bug; Fengguang's test robot sent me a mail, and I have updated the patches and am trying to write a good patch log. You might also need to check the 5-level patches you posted to makedumpfile upstream.
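To illustrate the suspicion in isolation, here is a hypothetical user-space C sketch (not makedumpfile's actual code): ffff88007ffd7000 is the exclusive end of the last LOAD segment, so any range check that accepts the end address itself will try to translate an address that no segment actually contains.

#include <stdio.h>

/* From the log: the last LOAD segment; virt_end is exclusive. */
struct seg { unsigned long virt_start, virt_end; };
static struct seg last = { 0xffff880033000000UL, 0xffff88007ffd7000UL };

/* Buggy check: treats the exclusive end as a mappable address. */
static int contains_buggy(unsigned long a)
{
	return a >= last.virt_start && a <= last.virt_end;
}

/* Correct check: the end address itself lies outside the segment. */
static int contains_fixed(unsigned long a)
{
	return a >= last.virt_start && a < last.virt_end;
}

int main(void)
{
	unsigned long a = 0xffff88007ffd7000UL; /* the failing address */
	printf("buggy: %d, fixed: %d\n", contains_buggy(a), contains_fixed(a));
	return 0;
}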
On a Fedora 27 host:
[douly@localhost code]$ ./makedumpfile -d 31 --message-level 31 -x vmlinux_4.15+ vmcore_4.15+_from_cp_command vmcore_4.15+
sadump: does not have partition header
sadump: read dump device as unknown format
sadump: unknown format
LOAD (0)  phys_start : 1000000   phys_end : 2a86000   virt_start : ffffffff81000000  virt_end : ffffffff82a86000
LOAD (1)  phys_start : 1000      phys_end : 9fc00     virt_start : ffff880000001000  virt_end : ffff88000009fc00
LOAD (2)  phys_start : 100000    phys_end : 13000000  virt_start : ffff880000100000  virt_end : ffff880013000000
LOAD (3)  phys_start : 33000000  phys_end : 7ffd7000  virt_start : ffff880033000000  virt_end : ffff88007ffd7000
Linux kdump
page_size : 4096
max_mapnr : 7ffd7
Buffer size for the cyclic mode: 131061
The kernel version is not supported. The makedumpfile operation may be incomplete.
num of NODEs : 1
Memory type : SPARSEMEM_EX
mem_map (0)   mem_map : ffff88007ff26000  pfn_start : 0      pfn_end : 8000
mem_map (1)   mem_map : 0                 pfn_start : 8000   pfn_end : 10000
mem_map (2)   mem_map : 0                 pfn_start : 10000  pfn_end : 18000
mem_map (3)   mem_map : 0                 pfn_start : 18000  pfn_end : 20000
mem_map (4)   mem_map : 0                 pfn_start : 20000  pfn_end : 28000
mem_map (5)   mem_map : 0                 pfn_start : 28000  pfn_end : 30000
mem_map (6)   mem_map : 0                 pfn_start : 30000  pfn_end : 38000
mem_map (7)   mem_map : 0                 pfn_start : 38000  pfn_end : 40000
mem_map (8)   mem_map : 0                 pfn_start : 40000  pfn_end : 48000
mem_map (9)   mem_map : 0                 pfn_start : 48000  pfn_end : 50000
mem_map (10)  mem_map : 0                 pfn_start : 50000  pfn_end : 58000
mem_map (11)  mem_map : 0                 pfn_start : 58000  pfn_end : 60000
mem_map (12)  mem_map : 0                 pfn_start : 60000  pfn_end : 68000
mem_map (13)  mem_map : 0                 pfn_start : 68000  pfn_end : 70000
mem_map (14)  mem_map : 0                 pfn_start : 70000  pfn_end : 78000
mem_map (15)  mem_map : 0                 pfn_start : 78000  pfn_end : 7ffd7
mmap() is available on the kernel.
Checking for memory holes : [100.0 %] |  STEP [Checking for memory holes ] : 0.000060 seconds
__vtop4_x86_64: Can't get a valid pte.
readmem: Can't convert a virtual address(ffff88007ffd7000) to physical address.
readmem: type_addr: 0, addr:ffff88007ffd7000, size:32768
__exclude_unnecessary_pages: Can't read the buffer of struct page.
create_2nd_bitmap: Can't exclude unnecessary pages.
Checking for memory holes : [100.0 %] \  STEP [Checking for memory holes ] : 0.000010 seconds
Checking for memory holes : [100.0 %] -  STEP [Checking for memory holes ] : 0.000004 seconds
__vtop4_x86_64: Can't get a valid pte.
readmem: Can't convert a virtual address(ffff88007ffd7000) to physical address.
readmem: type_addr: 0, addr:ffff88007ffd7000, size:32768
__exclude_unnecessary_pages: Can't read the buffer of struct page.
create_2nd_bitmap: Can't exclude unnecessary pages.
Thanks, dou
At 01/09/2018 11:44 AM, Mike Galbraith wrote:
On Tue, 2018-01-09 at 03:13 +0300, Kirill A. Shutemov wrote:
Mike, could you test this? (On top of the rest of the fixes.)
homer:..crash/2018-01-09-04:25 # ll
total 1863604
-rw------- 1 root root      66255 Jan  9 04:25 dmesg.txt
-rw-r--r-- 1 root root        182 Jan  9 04:25 README.txt
-rw-r--r-- 1 root root    2818240 Jan  9 04:25 System.map-4.15.0.gb2cd1df-master
-rw------- 1 root root 1832914928 Jan  9 04:25 vmcore
-rw-r--r-- 1 root root   72514993 Jan  9 04:25 vmlinux-4.15.0.gb2cd1df-master.gz
Yup, all better.
Sorry for the mess.
(why, developers not installing shiny new bugs is a whole lot worse:)
From 100fd567754f1457be94732046aefca204c842d2 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com
Date: Tue, 9 Jan 2018 02:55:47 +0300
Subject: [PATCH] kdump: Write a correct address of mem_section into vmcoreinfo
Depending on configuration mem_section can now be an array or a pointer to an array allocated dynamically. In most cases, we can continue to refer to it as 'mem_section' regardless of what it is.
But there's one exception: '&mem_section' means "address of the array" if mem_section is an array, but if mem_section is a pointer, it would mean "address of the pointer".
We've stepped onto this in kdump code. VMCOREINFO_SYMBOL(mem_section) writes down address of pointer into vmcoreinfo, not array as we wanted.
Let's introduce VMCOREINFO_ARRAY() that would handle the situation correctly for both cases.
Signed-off-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com
Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
 include/linux/crash_core.h | 2 ++
 kernel/crash_core.c        | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 06097ef30449..83ae04950269 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -42,6 +42,8 @@ phys_addr_t paddr_vmcoreinfo_note(void);
 	vmcoreinfo_append_str("PAGESIZE=%ld\n", value)
 #define VMCOREINFO_SYMBOL(name) \
 	vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)&name)
+#define VMCOREINFO_ARRAY(name) \
+	vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)name)
 #define VMCOREINFO_SIZE(name) \
 	vmcoreinfo_append_str("SIZE(%s)=%lu\n", #name, \
 			      (unsigned long)sizeof(name))
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index b3663896278e..d4122a837477 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -410,7 +410,7 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_SYMBOL(contig_page_data);
 #endif
 #ifdef CONFIG_SPARSEMEM
-	VMCOREINFO_SYMBOL(mem_section);
+	VMCOREINFO_ARRAY(mem_section);
 	VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS);
 	VMCOREINFO_STRUCT_SIZE(mem_section);
 	VMCOREINFO_OFFSET(mem_section, section_mem_map);
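The array-vs-pointer distinction the changelog describes can be seen in a minimal user-space C sketch (illustrative only, not part of the patch): for an array, '&name' and 'name' yield the same address, while for a pointer '&name' is the address of the pointer variable itself.

#include <stdio.h>

/* Toy stand-ins for the two possible shapes of mem_section. */
struct mem_section { unsigned long section_mem_map; };

static struct mem_section static_sections[4]; /* static case: an array     */
static struct mem_section *dynamic_sections;  /* dynamic case: a pointer   */

int main(void)
{
	/* Array: &name and name refer to the same address. */
	printf("array:   &name=%p  name=%p\n",
	       (void *)&static_sections, (void *)static_sections);

	/*
	 * Pointer: &name is where the pointer variable lives, not the
	 * array it points to. VMCOREINFO_SYMBOL() records &name, which
	 * is why the dynamically allocated case needs VMCOREINFO_ARRAY().
	 */
	printf("pointer: &name=%p  name=%p\n",
	       (void *)&dynamic_sections, (void *)dynamic_sections);
	return 0;
}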
Hi Kirill,
I set up qemu 2.9.0 to test 5-level paging with kexec/kdump, but both kexec and kdump reset to BIOS immediately after triggering. I saw your patch adding 5-level paging support for kexec. I wonder whether your test succeeded in jumping into the kexec/kdump kernel, and what else I need to make it work. By the way, I just tested the latest upstream kernel.
commit 7f6890418 x86/kexec: Add 5-level paging support
[ ~]$ qemu-system-x86_64 --version
QEMU emulator version 2.9.0(qemu-2.9.0-1.fc26.1)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
Thanks Baoquan
On 01/09/18 at 03:13am, Kirill A. Shutemov wrote:
On Mon, Jan 08, 2018 at 08:46:53PM +0300, Kirill A. Shutemov wrote:
On Mon, Jan 08, 2018 at 04:04:44PM +0000, Ingo Molnar wrote:
hi Kirill,
As Mike reported it below, your 5-level paging related upstream commit 83e3c48729d9 and all its followup fixes:
83e3c48729d9: mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y
629a359bdb0e: mm/sparsemem: Fix ARM64 boot crash when CONFIG_SPARSEMEM_EXTREME=y
d09cfbbfa0f7: mm/sparse.c: wrong allocation for mem_section
... still breaks kexec - and that now regresses -stable as well.
Given that 5-level paging now syntactically depends on having this commit, if we fully revert this then we'll have to disable 5-level paging as well.
This *should* help.
Mike, could you test this? (On top of the rest of the fixes.)
Sorry for the mess.
[...]
-- Kirill A. Shutemov
On Wed, Jan 17, 2018 at 01:24:54PM +0800, Baoquan He wrote:
Hi Kirill,
[...]
Sorry for the delay.
I didn't test it in 5-level paging mode :-/
The patch below helps in my case. Could you test it?
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 307d3bac5f04..65a98cf2307d 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -126,8 +126,12 @@ identity_mapped:
 	/*
 	 * Set cr4 to a known state:
 	 *  - physical address extension enabled
+	 *  - 5-level paging, if enabled
 	 */
 	movl	$X86_CR4_PAE, %eax
+#ifdef CONFIG_X86_5LEVEL
+	orl	$X86_CR4_LA57, %eax
+#endif
 	movq	%rax, %cr4

 	jmp	1f
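As a side note on what the two bits mean, here is a small self-contained C sketch of the value the fixed trampoline loads into CR4 (illustrative; in the x86 architecture PAE is CR4 bit 5 and LA57 is CR4 bit 12, so without the orl the trampoline would silently drop back to 4-level paging):

#include <stdint.h>
#include <stdio.h>

#define X86_CR4_PAE  (1UL << 5)   /* physical address extension */
#define X86_CR4_LA57 (1UL << 12)  /* 57-bit linear addresses (5-level paging) */

int main(void)
{
	/* The known CR4 state the trampoline sets up. */
	uint64_t cr4 = X86_CR4_PAE;
#ifdef CONFIG_X86_5LEVEL
	cr4 |= X86_CR4_LA57;       /* keep 5-level paging enabled */
#endif
	printf("cr4 = %#llx\n", (unsigned long long)cr4);
	return 0;
}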
On 01/25/18 at 06:50pm, Kirill A. Shutemov wrote:
[...]
The patch below helps in my case. Could you test it?
Thanks, Kirill.
It doesn't seem to work. I have some confusion about the process; I will send you a private mail.
Thanks Baoquan
[...]
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrey Ryabinin aryabinin@virtuozzo.com
commit 12a8cc7fcf54a8575f094be1e99032ec38aa045c upstream.
We are going to support boot-time switching between 4- and 5-level paging. For KASAN it means we cannot have different KASAN_SHADOW_OFFSET for different paging modes: the constant is passed to gcc to generate code and cannot be changed at runtime.
This patch changes KASAN code to use 0xdffffc0000000000 as shadow offset for both 4- and 5-level paging.
For 5-level paging it means that shadow memory region is not aligned to PGD boundary anymore and we have to handle unaligned parts of the region properly.
In addition, we have to exclude paravirt code from KASAN instrumentation as we now use set_pgd() before KASAN is fully ready.
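For context, the shadow offset enters through the generic KASAN address-to-shadow mapping; a minimal C sketch of that computation (illustrative, assuming the usual KASAN scale of 8 bytes of memory per shadow byte, with the offset this patch settles on):

#include <stdio.h>

/* The single shadow offset used for both 4- and 5-level paging. */
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000UL
#define KASAN_SHADOW_SCALE_SHIFT 3   /* one shadow byte covers 8 bytes */

static unsigned long kasan_mem_to_shadow(unsigned long addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

int main(void)
{
	/* Start of the direct mapping on 4-level x86-64, as an example. */
	unsigned long addr = 0xffff880000000000UL;
	printf("shadow(%#lx) = %#lx\n", addr, kasan_mem_to_shadow(addr));
	return 0;
}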
[kirill.shutemov@linux.intel.com: cleanup, changelog message]
Signed-off-by: Andrey Ryabinin aryabinin@virtuozzo.com
Signed-off-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com
Cc: Andrew Morton akpm@linux-foundation.org
Cc: Andy Lutomirski luto@amacapital.net
Cc: Borislav Petkov bp@suse.de
Cc: Cyrill Gorcunov gorcunov@openvz.org
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170929140821.37654-4-kirill.shutemov@linux.intel....
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/x86/x86_64/mm.txt |   2
 arch/x86/Kconfig                |   1
 arch/x86/kernel/Makefile        |   3 -
 arch/x86/mm/kasan_init_64.c     | 101 +++++++++++++++++++++++++++++++---------
 4 files changed, 83 insertions(+), 24 deletions(-)
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -34,7 +34,7 @@
 ff92000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space
 ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole
 ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB)
 ... unused hole ...
-ffd8000000000000 - fff7ffffffffffff (=53 bits) kasan shadow memory (8PB)
+ffdf000000000000 - fffffc0000000000 (=53 bits) kasan shadow memory (8PB)
 ... unused hole ...
 ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
 ... unused hole ...
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -303,7 +303,6 @@ config ARCH_SUPPORTS_DEBUG_PAGEALLOC
 config KASAN_SHADOW_OFFSET
 	hex
 	depends on KASAN
-	default 0xdff8000000000000 if X86_5LEVEL
 	default 0xdffffc0000000000

 config HAVE_INTEL_TXT
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -25,7 +25,8 @@ endif
 KASAN_SANITIZE_head$(BITS).o		:= n
 KASAN_SANITIZE_dumpstack.o		:= n
 KASAN_SANITIZE_dumpstack_$(BITS).o	:= n
-KASAN_SANITIZE_stacktrace.o		:= n
+KASAN_SANITIZE_stacktrace.o		:= n
+KASAN_SANITIZE_paravirt.o		:= n

 OBJECT_FILES_NON_STANDARD_relocate_kernel_$(BITS).o	:= y
 OBJECT_FILES_NON_STANDARD_ftrace_$(BITS).o		:= y
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -16,6 +16,8 @@

 extern struct range pfn_mapped[E820_MAX_ENTRIES];

+static p4d_t tmp_p4d_table[PTRS_PER_P4D] __initdata __aligned(PAGE_SIZE);
+
 static int __init map_range(struct range *range)
 {
 	unsigned long start;
@@ -31,8 +33,10 @@ static void __init clear_pgds(unsigned l
 			unsigned long end)
 {
 	pgd_t *pgd;
+	/* See comment in kasan_init() */
+	unsigned long pgd_end = end & PGDIR_MASK;

-	for (; start < end; start += PGDIR_SIZE) {
+	for (; start < pgd_end; start += PGDIR_SIZE) {
 		pgd = pgd_offset_k(start);
 		/*
 		 * With folded p4d, pgd_clear() is nop, use p4d_clear()
@@ -43,29 +47,61 @@ static void __init clear_pgds(unsigned l
 		else
 			pgd_clear(pgd);
 	}
+
+	pgd = pgd_offset_k(start);
+	for (; start < end; start += P4D_SIZE)
+		p4d_clear(p4d_offset(pgd, start));
+}
+
+static inline p4d_t *early_p4d_offset(pgd_t *pgd, unsigned long addr)
+{
+	unsigned long p4d;
+
+	if (!IS_ENABLED(CONFIG_X86_5LEVEL))
+		return (p4d_t *)pgd;
+
+	p4d = __pa_nodebug(pgd_val(*pgd)) & PTE_PFN_MASK;
+	p4d += __START_KERNEL_map - phys_base;
+	return (p4d_t *)p4d + p4d_index(addr);
+}
+
+static void __init kasan_early_p4d_populate(pgd_t *pgd,
+		unsigned long addr,
+		unsigned long end)
+{
+	pgd_t pgd_entry;
+	p4d_t *p4d, p4d_entry;
+	unsigned long next;
+
+	if (pgd_none(*pgd)) {
+		pgd_entry = __pgd(_KERNPG_TABLE | __pa_nodebug(kasan_zero_p4d));
+		set_pgd(pgd, pgd_entry);
+	}
+
+	p4d = early_p4d_offset(pgd, addr);
+	do {
+		next = p4d_addr_end(addr, end);
+
+		if (!p4d_none(*p4d))
+			continue;
+
+		p4d_entry = __p4d(_KERNPG_TABLE | __pa_nodebug(kasan_zero_pud));
+		set_p4d(p4d, p4d_entry);
+	} while (p4d++, addr = next, addr != end && p4d_none(*p4d));
 }

 static void __init kasan_map_early_shadow(pgd_t *pgd)
 {
-	int i;
-	unsigned long start = KASAN_SHADOW_START;
+	/* See comment in kasan_init() */
+	unsigned long addr = KASAN_SHADOW_START & PGDIR_MASK;
 	unsigned long end = KASAN_SHADOW_END;
+	unsigned long next;

-	for (i = pgd_index(start); start < end; i++) {
-		switch (CONFIG_PGTABLE_LEVELS) {
-		case 4:
-			pgd[i] = __pgd(__pa_nodebug(kasan_zero_pud) |
-					_KERNPG_TABLE);
-			break;
-		case 5:
-			pgd[i] = __pgd(__pa_nodebug(kasan_zero_p4d) |
-					_KERNPG_TABLE);
-			break;
-		default:
-			BUILD_BUG();
-		}
-		start += PGDIR_SIZE;
-	}
+	pgd += pgd_index(addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		kasan_early_p4d_populate(pgd, addr, next);
+	} while (pgd++, addr = next, addr != end);
 }

 #ifdef CONFIG_KASAN_INLINE
@@ -102,7 +138,7 @@ void __init kasan_early_init(void)
 	for (i = 0; i < PTRS_PER_PUD; i++)
 		kasan_zero_pud[i] = __pud(pud_val);

-	for (i = 0; CONFIG_PGTABLE_LEVELS >= 5 && i < PTRS_PER_P4D; i++)
+	for (i = 0; IS_ENABLED(CONFIG_X86_5LEVEL) && i < PTRS_PER_P4D; i++)
 		kasan_zero_p4d[i] = __p4d(p4d_val);

 	kasan_map_early_shadow(early_top_pgt);
@@ -118,12 +154,35 @@ void __init kasan_init(void)
 #endif

 	memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt));
+
+	/*
+	 * We use the same shadow offset for 4- and 5-level paging to
+	 * facilitate boot-time switching between paging modes.
+	 * As result in 5-level paging mode KASAN_SHADOW_START and
+	 * KASAN_SHADOW_END are not aligned to PGD boundary.
+	 *
+	 * KASAN_SHADOW_START doesn't share PGD with anything else.
+	 * We claim whole PGD entry to make things easier.
+	 *
+	 * KASAN_SHADOW_END lands in the last PGD entry and it collides with
+	 * bunch of things like kernel code, modules, EFI mapping, etc.
+	 * We need to take extra steps to not overwrite them.
+	 */
+	if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
+		void *ptr;
+
+		ptr = (void *)pgd_page_vaddr(*pgd_offset_k(KASAN_SHADOW_END));
+		memcpy(tmp_p4d_table, (void *)ptr, sizeof(tmp_p4d_table));
+		set_pgd(&early_top_pgt[pgd_index(KASAN_SHADOW_END)],
+				__pgd(__pa(tmp_p4d_table) | _KERNPG_TABLE));
+	}
+
 	load_cr3(early_top_pgt);
 	__flush_tlb_all();
- clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END); + clear_pgds(KASAN_SHADOW_START & PGDIR_MASK, KASAN_SHADOW_END);
- kasan_populate_zero_shadow((void *)KASAN_SHADOW_START, + kasan_populate_zero_shadow((void *)(KASAN_SHADOW_START & PGDIR_MASK), kasan_mem_to_shadow((void *)PAGE_OFFSET));
for (i = 0; i < E820_MAX_ENTRIES; i++) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kirill A. Shutemov kirill.shutemov@linux.intel.com
commit 4375c29985f155d7eb2346615d84e62d1b673682 upstream.
Looks like we only need pre-built page tables in the CONFIG_XEN_PV=y and CONFIG_XEN_PVH=y cases.
Let's not provide them for other configurations.
Signed-off-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Reviewed-by: Juergen Gross jgross@suse.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Andy Lutomirski luto@amacapital.net Cc: Borislav Petkov bp@suse.de Cc: Cyrill Gorcunov gorcunov@openvz.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170929140821.37654-5-kirill.shutemov@linux.intel.... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kernel/head_64.S |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -38,11 +38,12 @@
  *
  */

-#define p4d_index(x)    (((x) >> P4D_SHIFT) & (PTRS_PER_P4D-1))
 #define pud_index(x)    (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))

+#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH)
 PGD_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE)
 PGD_START_KERNEL = pgd_index(__START_KERNEL_map)
+#endif
 L3_START_KERNEL = pud_index(__START_KERNEL_map)

     .text
@@ -365,10 +366,7 @@ NEXT_PAGE(early_dynamic_pgts)

     .data

-#ifndef CONFIG_XEN
-NEXT_PAGE(init_top_pgt)
-    .fill   512,8,0
-#else
+#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH)
 NEXT_PAGE(init_top_pgt)
     .quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
     .org    init_top_pgt + PGD_PAGE_OFFSET*8, 0
@@ -385,6 +383,9 @@ NEXT_PAGE(level2_ident_pgt)
  * Don't set NX because code runs from these pages.
  */
     PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
+#else
+NEXT_PAGE(init_top_pgt)
+    .fill   512,8,0
 #endif

 #ifdef CONFIG_X86_5LEVEL
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kirill A. Shutemov kirill.shutemov@linux.intel.com
commit 773dd2fca581b0a80e5a33332cc8ee67e5a79cba upstream.
It was decided 5-level paging is not going to be supported in XEN_PV.
Let's drop the dead code from the XEN_PV code.
Tested-by: Juergen Gross jgross@suse.com Signed-off-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Reviewed-by: Juergen Gross jgross@suse.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Andy Lutomirski luto@amacapital.net Cc: Borislav Petkov bp@suse.de Cc: Cyrill Gorcunov gorcunov@openvz.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170929140821.37654-6-kirill.shutemov@linux.intel.... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/xen/mmu_pv.c |  159 ++++++++++++++++++--------------------------------
 1 file changed, 60 insertions(+), 99 deletions(-)

--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -449,7 +449,7 @@ __visible pmd_t xen_make_pmd(pmdval_t pm
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_make_pmd);

-#if CONFIG_PGTABLE_LEVELS == 4
+#ifdef CONFIG_X86_64
 __visible pudval_t xen_pud_val(pud_t pud)
 {
     return pte_mfn_to_pfn(pud.pud);
@@ -538,7 +538,7 @@ static void xen_set_p4d(p4d_t *ptr, p4d_

     xen_mc_issue(PARAVIRT_LAZY_MMU);
 }
-#endif /* CONFIG_PGTABLE_LEVELS == 4 */
+#endif /* CONFIG_X86_64 */

 static int xen_pmd_walk(struct mm_struct *mm, pmd_t *pmd,
         int (*func)(struct mm_struct *mm, struct page *, enum pt_level),
@@ -580,21 +580,17 @@ static int xen_p4d_walk(struct mm_struct
         int (*func)(struct mm_struct *mm, struct page *, enum pt_level),
         bool last, unsigned long limit)
 {
-    int i, nr, flush = 0;
+    int flush = 0;
+    pud_t *pud;

-    nr = last ? p4d_index(limit) + 1 : PTRS_PER_P4D;
-    for (i = 0; i < nr; i++) {
-        pud_t *pud;

-        if (p4d_none(p4d[i]))
-            continue;
+    if (p4d_none(*p4d))
+        return flush;

-        pud = pud_offset(&p4d[i], 0);
-        if (PTRS_PER_PUD > 1)
-            flush |= (*func)(mm, virt_to_page(pud), PT_PUD);
-        flush |= xen_pud_walk(mm, pud, func,
-                last && i == nr - 1, limit);
-    }
+    pud = pud_offset(p4d, 0);
+    if (PTRS_PER_PUD > 1)
+        flush |= (*func)(mm, virt_to_page(pud), PT_PUD);
+    flush |= xen_pud_walk(mm, pud, func, last, limit);
     return flush;
 }

@@ -644,8 +640,6 @@ static int __xen_pgd_walk(struct mm_stru
             continue;

         p4d = p4d_offset(&pgd[i], 0);
-        if (PTRS_PER_P4D > 1)
-            flush |= (*func)(mm, virt_to_page(p4d), PT_P4D);
         flush |= xen_p4d_walk(mm, p4d, func, i == nr - 1, limit);
     }

@@ -1176,22 +1170,14 @@ static void __init xen_cleanmfnmap(unsig
 {
     pgd_t *pgd;
     p4d_t *p4d;
-    unsigned int i;
     bool unpin;

     unpin = (vaddr == 2 * PGDIR_SIZE);
     vaddr &= PMD_MASK;
     pgd = pgd_offset_k(vaddr);
     p4d = p4d_offset(pgd, 0);
-    for (i = 0; i < PTRS_PER_P4D; i++) {
-        if (p4d_none(p4d[i]))
-            continue;
-        xen_cleanmfnmap_p4d(p4d + i, unpin);
-    }
-    if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
-        set_pgd(pgd, __pgd(0));
-        xen_cleanmfnmap_free_pgtbl(p4d, unpin);
-    }
+    if (!p4d_none(*p4d))
+        xen_cleanmfnmap_p4d(p4d, unpin);
 }

 static void __init xen_pagetable_p2m_free(void)
@@ -1692,7 +1678,7 @@ static void xen_release_pmd(unsigned lon
     xen_release_ptpage(pfn, PT_PMD);
 }

-#if CONFIG_PGTABLE_LEVELS >= 4
+#ifdef CONFIG_X86_64
 static void xen_alloc_pud(struct mm_struct *mm, unsigned long pfn)
 {
     xen_alloc_ptpage(mm, pfn, PT_PUD);
@@ -2029,13 +2015,12 @@ static phys_addr_t __init xen_early_virt
  */
 void __init xen_relocate_p2m(void)
 {
-    phys_addr_t size, new_area, pt_phys, pmd_phys, pud_phys, p4d_phys;
+    phys_addr_t size, new_area, pt_phys, pmd_phys, pud_phys;
     unsigned long p2m_pfn, p2m_pfn_end, n_frames, pfn, pfn_end;
-    int n_pte, n_pt, n_pmd, n_pud, n_p4d, idx_pte, idx_pt, idx_pmd, idx_pud, idx_p4d;
+    int n_pte, n_pt, n_pmd, n_pud, idx_pte, idx_pt, idx_pmd, idx_pud;
     pte_t *pt;
     pmd_t *pmd;
     pud_t *pud;
-    p4d_t *p4d = NULL;
     pgd_t *pgd;
     unsigned long *new_p2m;
     int save_pud;
@@ -2045,11 +2030,7 @@ void __init xen_relocate_p2m(void)
     n_pt = roundup(size, PMD_SIZE) >> PMD_SHIFT;
     n_pmd = roundup(size, PUD_SIZE) >> PUD_SHIFT;
     n_pud = roundup(size, P4D_SIZE) >> P4D_SHIFT;
-    if (PTRS_PER_P4D > 1)
-        n_p4d = roundup(size, PGDIR_SIZE) >> PGDIR_SHIFT;
-    else
-        n_p4d = 0;
-    n_frames = n_pte + n_pt + n_pmd + n_pud + n_p4d;
+    n_frames = n_pte + n_pt + n_pmd + n_pud;

     new_area = xen_find_free_area(PFN_PHYS(n_frames));
     if (!new_area) {
@@ -2065,76 +2046,56 @@ void __init xen_relocate_p2m(void)
      * To avoid any possible virtual address collision, just use
      * 2 * PUD_SIZE for the new area.
      */
-    p4d_phys = new_area;
-    pud_phys = p4d_phys + PFN_PHYS(n_p4d);
+    pud_phys = new_area;
     pmd_phys = pud_phys + PFN_PHYS(n_pud);
     pt_phys = pmd_phys + PFN_PHYS(n_pmd);
     p2m_pfn = PFN_DOWN(pt_phys) + n_pt;

     pgd = __va(read_cr3_pa());
     new_p2m = (unsigned long *)(2 * PGDIR_SIZE);
-    idx_p4d = 0;
     save_pud = n_pud;
-    do {
-        if (n_p4d > 0) {
-            p4d = early_memremap(p4d_phys, PAGE_SIZE);
-            clear_page(p4d);
-            n_pud = min(save_pud, PTRS_PER_P4D);
-        }
-        for (idx_pud = 0; idx_pud < n_pud; idx_pud++) {
-            pud = early_memremap(pud_phys, PAGE_SIZE);
-            clear_page(pud);
-            for (idx_pmd = 0; idx_pmd < min(n_pmd, PTRS_PER_PUD);
-                    idx_pmd++) {
-                pmd = early_memremap(pmd_phys, PAGE_SIZE);
-                clear_page(pmd);
-                for (idx_pt = 0; idx_pt < min(n_pt, PTRS_PER_PMD);
-                        idx_pt++) {
-                    pt = early_memremap(pt_phys, PAGE_SIZE);
-                    clear_page(pt);
-                    for (idx_pte = 0;
-                            idx_pte < min(n_pte, PTRS_PER_PTE);
-                            idx_pte++) {
-                        set_pte(pt + idx_pte,
-                                pfn_pte(p2m_pfn, PAGE_KERNEL));
-                        p2m_pfn++;
-                    }
-                    n_pte -= PTRS_PER_PTE;
-                    early_memunmap(pt, PAGE_SIZE);
-                    make_lowmem_page_readonly(__va(pt_phys));
-                    pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE,
-                            PFN_DOWN(pt_phys));
-                    set_pmd(pmd + idx_pt,
-                            __pmd(_PAGE_TABLE | pt_phys));
-                    pt_phys += PAGE_SIZE;
+    for (idx_pud = 0; idx_pud < n_pud; idx_pud++) {
+        pud = early_memremap(pud_phys, PAGE_SIZE);
+        clear_page(pud);
+        for (idx_pmd = 0; idx_pmd < min(n_pmd, PTRS_PER_PUD);
+                idx_pmd++) {
+            pmd = early_memremap(pmd_phys, PAGE_SIZE);
+            clear_page(pmd);
+            for (idx_pt = 0; idx_pt < min(n_pt, PTRS_PER_PMD);
+                    idx_pt++) {
+                pt = early_memremap(pt_phys, PAGE_SIZE);
+                clear_page(pt);
+                for (idx_pte = 0;
+                        idx_pte < min(n_pte, PTRS_PER_PTE);
+                        idx_pte++) {
+                    set_pte(pt + idx_pte,
+                            pfn_pte(p2m_pfn, PAGE_KERNEL));
+                    p2m_pfn++;
                 }
-                n_pt -= PTRS_PER_PMD;
-                early_memunmap(pmd, PAGE_SIZE);
-                make_lowmem_page_readonly(__va(pmd_phys));
-                pin_pagetable_pfn(MMUEXT_PIN_L2_TABLE,
-                        PFN_DOWN(pmd_phys));
-                set_pud(pud + idx_pmd, __pud(_PAGE_TABLE | pmd_phys));
-                pmd_phys += PAGE_SIZE;
+                n_pte -= PTRS_PER_PTE;
+                early_memunmap(pt, PAGE_SIZE);
+                make_lowmem_page_readonly(__va(pt_phys));
+                pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE,
+                        PFN_DOWN(pt_phys));
+                set_pmd(pmd + idx_pt,
+                        __pmd(_PAGE_TABLE | pt_phys));
+                pt_phys += PAGE_SIZE;
             }
-            n_pmd -= PTRS_PER_PUD;
-            early_memunmap(pud, PAGE_SIZE);
-            make_lowmem_page_readonly(__va(pud_phys));
-            pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(pud_phys));
-            if (n_p4d > 0)
-                set_p4d(p4d + idx_pud, __p4d(_PAGE_TABLE | pud_phys));
-            else
-                set_pgd(pgd + 2 + idx_pud, __pgd(_PAGE_TABLE | pud_phys));
-            pud_phys += PAGE_SIZE;
-        }
-        if (n_p4d > 0) {
-            save_pud -= PTRS_PER_P4D;
-            early_memunmap(p4d, PAGE_SIZE);
-            make_lowmem_page_readonly(__va(p4d_phys));
-            pin_pagetable_pfn(MMUEXT_PIN_L4_TABLE, PFN_DOWN(p4d_phys));
-            set_pgd(pgd + 2 + idx_p4d, __pgd(_PAGE_TABLE | p4d_phys));
-            p4d_phys += PAGE_SIZE;
+            n_pt -= PTRS_PER_PMD;
+            early_memunmap(pmd, PAGE_SIZE);
+            make_lowmem_page_readonly(__va(pmd_phys));
+            pin_pagetable_pfn(MMUEXT_PIN_L2_TABLE,
+                    PFN_DOWN(pmd_phys));
+            set_pud(pud + idx_pmd, __pud(_PAGE_TABLE | pmd_phys));
+            pmd_phys += PAGE_SIZE;
         }
-    } while (++idx_p4d < n_p4d);
+        n_pmd -= PTRS_PER_PUD;
+        early_memunmap(pud, PAGE_SIZE);
+        make_lowmem_page_readonly(__va(pud_phys));
+        pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(pud_phys));
+        set_pgd(pgd + 2 + idx_pud, __pgd(_PAGE_TABLE | pud_phys));
+        pud_phys += PAGE_SIZE;
+    }

     /* Now copy the old p2m info to the new area. */
     memcpy(new_p2m, xen_p2m_addr, size);
@@ -2361,7 +2322,7 @@ static void __init xen_post_allocator_in
     pv_mmu_ops.set_pte = xen_set_pte;
     pv_mmu_ops.set_pmd = xen_set_pmd;
     pv_mmu_ops.set_pud = xen_set_pud;
-#if CONFIG_PGTABLE_LEVELS >= 4
+#ifdef CONFIG_X86_64
     pv_mmu_ops.set_p4d = xen_set_p4d;
 #endif

@@ -2371,7 +2332,7 @@ static void __init xen_post_allocator_in
     pv_mmu_ops.alloc_pmd = xen_alloc_pmd;
     pv_mmu_ops.release_pte = xen_release_pte;
     pv_mmu_ops.release_pmd = xen_release_pmd;
-#if CONFIG_PGTABLE_LEVELS >= 4
+#ifdef CONFIG_X86_64
     pv_mmu_ops.alloc_pud = xen_alloc_pud;
     pv_mmu_ops.release_pud = xen_release_pud;
 #endif
@@ -2435,14 +2396,14 @@ static const struct pv_mmu_ops xen_mmu_o
     .make_pmd = PV_CALLEE_SAVE(xen_make_pmd),
     .pmd_val = PV_CALLEE_SAVE(xen_pmd_val),

-#if CONFIG_PGTABLE_LEVELS >= 4
+#ifdef CONFIG_X86_64
     .pud_val = PV_CALLEE_SAVE(xen_pud_val),
     .make_pud = PV_CALLEE_SAVE(xen_make_pud),
     .set_p4d = xen_set_p4d_hyper,

     .alloc_pud = xen_alloc_pmd_init,
     .release_pud = xen_release_pmd_init,
-#endif /* CONFIG_PGTABLE_LEVELS == 4 */
+#endif /* CONFIG_X86_64 */

     .activate_mm = xen_activate_mm,
     .dup_mmap = xen_dup_mmap,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dongjiu Geng gengdongjiu@huawei.com
commit c49870e89f4d2c21c76ebe90568246bb0f3572b7 upstream.
For the SEA notification, the two functions ghes_sea_add() and ghes_sea_remove() are only called when CONFIG_ACPI_APEI_SEA is defined. If it is not, ghes_probe() returns an error and does not continue, so ghes_sea_remove() never has a chance to be called either. Hence, remove the unnecessary handling for the case where CONFIG_ACPI_APEI_SEA is not defined.
For the NMI notification, the same reasoning applies, so remove its unused dead code as well.
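The change boils down to the usual config-stub idiom: since probe already fails when the notification type is compiled out, the stubs are unreachable and can be empty. A generic sketch of the idiom (all names hypothetical, not the GHES functions themselves):

    struct ctx;     /* opaque context, stands in for struct ghes */

    #ifdef CONFIG_FEATURE_X
    void feature_x_add(struct ctx *c);      /* real implementations elsewhere */
    void feature_x_remove(struct ctx *c);
    #else
    /* Unreachable when the feature is compiled out: probe errors out first,
     * so no-op stubs suffice and no error message or BUG() is needed. */
    static inline void feature_x_add(struct ctx *c) { }
    static inline void feature_x_remove(struct ctx *c) { }
    #endif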
Signed-off-by: Dongjiu Geng gengdongjiu@huawei.com Tested-by: Tyler Baicar tbaicar@codeaurora.org Reviewed-by: Borislav Petkov bp@suse.de Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/acpi/apei/ghes.c |   33 +++++----------------------------
 1 file changed, 5 insertions(+), 28 deletions(-)

--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -852,17 +852,8 @@ static void ghes_sea_remove(struct ghes
     synchronize_rcu();
 }
 #else /* CONFIG_ACPI_APEI_SEA */
-static inline void ghes_sea_add(struct ghes *ghes)
-{
-    pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
-           ghes->generic->header.source_id);
-}
-
-static inline void ghes_sea_remove(struct ghes *ghes)
-{
-    pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
-           ghes->generic->header.source_id);
-}
+static inline void ghes_sea_add(struct ghes *ghes) { }
+static inline void ghes_sea_remove(struct ghes *ghes) { }
 #endif /* CONFIG_ACPI_APEI_SEA */

 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
@@ -1064,23 +1055,9 @@ static void ghes_nmi_init_cxt(void)
     init_irq_work(&ghes_proc_irq_work, ghes_proc_in_irq);
 }
 #else /* CONFIG_HAVE_ACPI_APEI_NMI */
-static inline void ghes_nmi_add(struct ghes *ghes)
-{
-    pr_err(GHES_PFX "ID: %d, trying to add NMI notification which is not supported!\n",
-           ghes->generic->header.source_id);
-    BUG();
-}
-
-static inline void ghes_nmi_remove(struct ghes *ghes)
-{
-    pr_err(GHES_PFX "ID: %d, trying to remove NMI notification which is not supported!\n",
-           ghes->generic->header.source_id);
-    BUG();
-}
-
-static inline void ghes_nmi_init_cxt(void)
-{
-}
+static inline void ghes_nmi_add(struct ghes *ghes) { }
+static inline void ghes_nmi_remove(struct ghes *ghes) { }
+static inline void ghes_nmi_init_cxt(void) { }
 #endif /* CONFIG_HAVE_ACPI_APEI_NMI */

 static int ghes_probe(struct platform_device *ghes_dev)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@redhat.com
commit 82c62fa0c49aa305104013cee4468772799bb391 upstream.
I find the '.ifeq <expression>' directive to be confusing. Reading it quickly seems to suggest its opposite meaning, or that it's missing an argument.
Improve readability by replacing all of its x86 uses with '.if <expression> == 0'.
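The two spellings are equivalent in GAS; a minimal side-by-side sketch (macro name hypothetical — a real macro would keep only one of the two forms):

    .macro demo_entry has_error_code
        /* old spelling: the block is assembled when the expression is 0 */
        .ifeq \has_error_code
        pushq   $-1
        .endif

        /* new spelling: the same condition, stated explicitly */
        .if \has_error_code == 0
        pushq   $-1
        .endif
    .endm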
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrei Vagin avagin@virtuozzo.com Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/757da028e802c7e98d23fbab8d234b1063e161cf.1508516398... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/entry/entry_64.S |    2 +-
 arch/x86/kernel/head_32.S |    2 +-
 arch/x86/kernel/head_64.S |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -818,7 +818,7 @@ ENTRY(\sym)

     ASM_CLAC

-    .ifeq \has_error_code
+    .if \has_error_code == 0
     pushq   $-1             /* ORIG_RAX: no syscall to restart */
     .endif

--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -402,7 +402,7 @@ ENTRY(early_idt_handler_array)
     # 24(%rsp)      error code
     i = 0
     .rept NUM_EXCEPTION_VECTORS
-    .ifeq (EXCEPTION_ERRCODE_MASK >> i) & 1
+    .if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0
     pushl $0        # Dummy error code, to make stack frame uniform
     .endif
     pushl $i        # 20(%esp) Vector number
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -275,7 +275,7 @@ ENDPROC(start_cpu0)
 ENTRY(early_idt_handler_array)
     i = 0
     .rept NUM_EXCEPTION_VECTORS
-    .ifeq (EXCEPTION_ERRCODE_MASK >> i) & 1
+    .if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0
         UNWIND_HINT_IRET_REGS
         pushq $0    # Dummy error code, to make stack frame uniform
     .else
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masahiro Yamada yamada.masahiro@socionext.com
commit af8e947079a7dab0480b5d6db6b093fd04b86fc9 upstream.
This makes the build log look nicer.
Before:

  SYSTBL  arch/x86/entry/syscalls/../../include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/entry/syscalls/../../include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/entry/syscalls/../../include/generated/asm/unistd_64_x32.h
  SYSTBL  arch/x86/entry/syscalls/../../include/generated/asm/syscalls_64.h
  SYSHDR  arch/x86/entry/syscalls/../../include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/entry/syscalls/../../include/generated/uapi/asm/unistd_64.h
  SYSHDR  arch/x86/entry/syscalls/../../include/generated/uapi/asm/unistd_x32.h

After:

  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_64_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_64.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_64.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_x32.h
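The ../.. noise comes purely from the paths being spelled relative to $(obj); a sketch of the two definitions side by side ($(obj) here is arch/x86/entry/syscalls, and SRCARCH expands to x86):

    # old: relative to this Makefile's directory, so the literal ../..
    # ends up verbatim in the build log
    out := $(obj)/../../include/generated/asm

    # new: the same directory, named from the tree root instead
    out := arch/$(SRCARCH)/include/generated/asm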
Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com Acked-by: Thomas Gleixner tglx@linutronix.de Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: "H. Peter Anvin" hpa@zytor.com Cc: linux-kbuild@vger.kernel.org Link: http://lkml.kernel.org/r/1509077470-2735-1-git-send-email-yamada.masahiro@so... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/entry/syscalls/Makefile |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/entry/syscalls/Makefile
+++ b/arch/x86/entry/syscalls/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
-out := $(obj)/../../include/generated/asm
-uapi := $(obj)/../../include/generated/uapi/asm
+out := arch/$(SRCARCH)/include/generated/asm
+uapi := arch/$(SRCARCH)/include/generated/uapi/asm

 # Create output directory if not already present
 _dummy := $(shell [ -d '$(out)' ] || mkdir -p '$(out)') \
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baoquan He bhe@redhat.com
commit 15670bfe19905b1dcbb63137f40d718b59d84479 upstream.
register_page_bootmem_memmap()'s third parameter, 'size', is named in a somewhat misleading fashion - rename it to 'nr_pages', which makes its units much clearer.
Meanwhile rename the existing local variable 'nr_pages' to 'nr_pmd_pages', a more expressive name, to avoid conflict with new function parameter 'nr_pages'.
(Also clean up the unnecessary parentheses in which get_order() is called.)
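The ambiguity the rename removes is easiest to see at a call site; a standalone sketch (the function body and values here are hypothetical stand-ins, only the prototype shape mirrors the patch):

    #include <stdio.h>

    struct page;

    /* Mirror of the new prototype: the last argument counts pages, not
     * bytes - which the old name 'size' left readers guessing about. */
    static void register_page_bootmem_memmap(unsigned long section_nr,
                                             struct page *map,
                                             unsigned long nr_pages)
    {
        printf("section %lu: %lu pages\n", section_nr, nr_pages);
    }

    int main(void)
    {
        register_page_bootmem_memmap(0, NULL, 512); /* 512 pages, unambiguous */
        return 0;
    }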
Signed-off-by: Baoquan He bhe@redhat.com Acked-by: Thomas Gleixner tglx@linutronix.de Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: akpm@linux-foundation.org Link: http://lkml.kernel.org/r/1509154238-23250-1-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/mm/init_64.c |   10 +++++-----
 include/linux/mm.h    |    2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1426,16 +1426,16 @@ int __meminit vmemmap_populate(unsigned

 #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HAVE_BOOTMEM_INFO_NODE)
 void register_page_bootmem_memmap(unsigned long section_nr,
-        struct page *start_page, unsigned long size)
+        struct page *start_page, unsigned long nr_pages)
 {
     unsigned long addr = (unsigned long)start_page;
-    unsigned long end = (unsigned long)(start_page + size);
+    unsigned long end = (unsigned long)(start_page + nr_pages);
     unsigned long next;
     pgd_t *pgd;
     p4d_t *p4d;
     pud_t *pud;
     pmd_t *pmd;
-    unsigned int nr_pages;
+    unsigned int nr_pmd_pages;
     struct page *page;

     for (; addr < end; addr = next) {
@@ -1482,9 +1482,9 @@ void register_page_bootmem_memmap(unsign
         if (pmd_none(*pmd))
             continue;

-        nr_pages = 1 << (get_order(PMD_SIZE));
+        nr_pmd_pages = 1 << get_order(PMD_SIZE);
         page = pmd_page(*pmd);
-        while (nr_pages--)
+        while (nr_pmd_pages--)
             get_page_bootmem(section_nr, page++, SECTION_INFO);
     }
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2510,7 +2510,7 @@ void vmemmap_populate_print_last(void);
 void vmemmap_free(unsigned long start, unsigned long end);
 #endif
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
-                  unsigned long size);
+                  unsigned long nr_pages);

 enum mf_flags {
     MF_COUNT_INCREASED = 1 << 0,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gayatri Kammela gayatri.kammela@intel.com
commit c128dbfa0f879f8ce7b79054037889b0b2240728 upstream.
Add a few new SSE/AVX/AVX512 instruction groups/features for enumeration in /proc/cpuinfo: AVX512_VBMI2, GFNI, VAES, VPCLMULQDQ, AVX512_VNNI, AVX512_BITALG.
CPUID.(EAX=7,ECX=0):ECX[bit  6] AVX512_VBMI2
CPUID.(EAX=7,ECX=0):ECX[bit  8] GFNI
CPUID.(EAX=7,ECX=0):ECX[bit  9] VAES
CPUID.(EAX=7,ECX=0):ECX[bit 10] VPCLMULQDQ
CPUID.(EAX=7,ECX=0):ECX[bit 11] AVX512_VNNI
CPUID.(EAX=7,ECX=0):ECX[bit 12] AVX512_BITALG
Detailed information of CPUID bits for these features can be found in the Intel Architecture Instruction Set Extensions and Future Features Programming Interface document (refer to Table 1-1. and Table 1-2.). A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=197239
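The same leaf can be queried directly from userspace as a sanity check; a small sketch, assuming GCC's cpuid.h and its __get_cpuid_count() helper are available:

    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID.(EAX=7,ECX=0): structured extended feature flags leaf */
        if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
            return 1;

        printf("AVX512_VBMI2:  %u\n", (ecx >> 6) & 1);
        printf("GFNI:          %u\n", (ecx >> 8) & 1);
        printf("VAES:          %u\n", (ecx >> 9) & 1);
        printf("VPCLMULQDQ:    %u\n", (ecx >> 10) & 1);
        printf("AVX512_VNNI:   %u\n", (ecx >> 11) & 1);
        printf("AVX512_BITALG: %u\n", (ecx >> 12) & 1);
        return 0;
    }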
Signed-off-by: Gayatri Kammela gayatri.kammela@intel.com Acked-by: Thomas Gleixner tglx@linutronix.de Cc: Andi Kleen andi.kleen@intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi Shankar ravi.v.shankar@intel.com Cc: Ricardo Neri ricardo.neri@intel.com Cc: Yang Zhong yang.zhong@intel.com Cc: bp@alien8.de Link: http://lkml.kernel.org/r/1509412829-23380-1-git-send-email-gayatri.kammela@i... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/cpufeatures.h |    6 ++++++
 arch/x86/kernel/cpu/cpuid-deps.c   |    6 ++++++
 2 files changed, 12 insertions(+)

--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -300,6 +300,12 @@
 #define X86_FEATURE_AVX512VBMI      (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
 #define X86_FEATURE_PKU             (16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE           (16*32+ 4) /* OS Protection Keys Enable */
+#define X86_FEATURE_AVX512_VBMI2    (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
+#define X86_FEATURE_GFNI            (16*32+ 8) /* Galois Field New Instructions */
+#define X86_FEATURE_VAES            (16*32+ 9) /* Vector AES */
+#define X86_FEATURE_VPCLMULQDQ      (16*32+10) /* Carry-Less Multiplication Double Quadword */
+#define X86_FEATURE_AVX512_VNNI     (16*32+11) /* Vector Neural Network Instructions */
+#define X86_FEATURE_AVX512_BITALG   (16*32+12) /* Support for VPOPCNT[B,W] and VPSHUF-BITQMB */
 #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */
 #define X86_FEATURE_LA57            (16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID           (16*32+22) /* RDPID instruction */
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -50,6 +50,12 @@ const static struct cpuid_dep cpuid_deps
     { X86_FEATURE_AVX512BW,         X86_FEATURE_AVX512F   },
     { X86_FEATURE_AVX512VL,         X86_FEATURE_AVX512F   },
     { X86_FEATURE_AVX512VBMI,       X86_FEATURE_AVX512F   },
+    { X86_FEATURE_AVX512_VBMI2,     X86_FEATURE_AVX512VL  },
+    { X86_FEATURE_GFNI,             X86_FEATURE_AVX512VL  },
+    { X86_FEATURE_VAES,             X86_FEATURE_AVX512VL  },
+    { X86_FEATURE_VPCLMULQDQ,       X86_FEATURE_AVX512VL  },
+    { X86_FEATURE_AVX512_VNNI,      X86_FEATURE_AVX512VL  },
+    { X86_FEATURE_AVX512_BITALG,    X86_FEATURE_AVX512VL  },
     { X86_FEATURE_AVX512_4VNNIW,    X86_FEATURE_AVX512F   },
     { X86_FEATURE_AVX512_4FMAPS,    X86_FEATURE_AVX512F   },
     { X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F   },
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ricardo Neri ricardo.neri-calderon@linux.intel.com
commit 1067f030994c69ca1fba8c607437c8895dcf8509 upstream.
Up to this point, only fault.c used the definitions of the page fault error codes. Thus, it made sense to keep them within that file. Other portions of code might be interested in those definitions too. For instance, the User-Mode Instruction Prevention emulation code will use such definitions to emulate a page fault when it is unable to successfully copy the results of the emulated instructions to user space.
While relocating the error code enumeration, the prefix X86_ is used to make it consistent with the rest of the definitions in traps.h. Of course, code using the enumeration had to be updated as well. No functional changes were performed.
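As a quick reference for the relocated names, a userspace sketch that decodes an error code with the same bit values (the constants are copied from the enum below; the decoder itself is illustrative only):

    #include <stdio.h>

    /* Bit values copied from the enum now living in asm/traps.h. */
    #define X86_PF_PROT  (1 << 0)
    #define X86_PF_WRITE (1 << 1)
    #define X86_PF_USER  (1 << 2)
    #define X86_PF_RSVD  (1 << 3)
    #define X86_PF_INSTR (1 << 4)
    #define X86_PF_PK    (1 << 5)

    static void describe_fault(unsigned long error_code)
    {
        printf("%s, %s access, %s mode%s%s%s\n",
               error_code & X86_PF_PROT  ? "protection fault" : "page not present",
               error_code & X86_PF_WRITE ? "write" : "read",
               error_code & X86_PF_USER  ? "user" : "kernel",
               error_code & X86_PF_INSTR ? ", instruction fetch" : "",
               error_code & X86_PF_RSVD  ? ", reserved bit set" : "",
               error_code & X86_PF_PK    ? ", blocked by protection key" : "");
    }

    int main(void)
    {
        describe_fault(X86_PF_USER | X86_PF_WRITE); /* user write, not present */
        return 0;
    }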
Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Borislav Petkov bp@suse.de Reviewed-by: Andy Lutomirski luto@kernel.org Cc: "Michael S. Tsirkin" mst@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Dave Hansen dave.hansen@linux.intel.com Cc: ricardo.neri@intel.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Huang Rui ray.huang@amd.com Cc: Shuah Khan shuah@kernel.org Cc: Jonathan Corbet corbet@lwn.net Cc: Jiri Slaby jslaby@suse.cz Cc: "Ravi V. Shankar" ravi.v.shankar@intel.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Brian Gerst brgerst@gmail.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Chen Yucong slaoub@gmail.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Paolo Bonzini pbonzini@redhat.com Cc: Andrew Morton akpm@linux-foundation.org Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Link: https://lkml.kernel.org/r/1509135945-13762-2-git-send-email-ricardo.neri-cal... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/traps.h |   18 ++++++++
 arch/x86/mm/fault.c          |   88 ++++++++++++++++---------------------------
 2 files changed, 52 insertions(+), 54 deletions(-)

--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -145,4 +145,22 @@ enum {
     X86_TRAP_IRET = 32,     /* 32, IRET Exception */
 };

+/*
+ * Page fault error code bits:
+ *
+ *   bit 0 ==   0: no page found       1: protection fault
+ *   bit 1 ==   0: read access         1: write access
+ *   bit 2 ==   0: kernel-mode access  1: user-mode access
+ *   bit 3 ==   1: use of reserved bit detected
+ *   bit 4 ==   1: fault was an instruction fetch
+ *   bit 5 ==   1: protection keys block access
+ */
+enum x86_pf_error_code {
+    X86_PF_PROT     =   1 << 0,
+    X86_PF_WRITE    =   1 << 1,
+    X86_PF_USER     =   1 << 2,
+    X86_PF_RSVD     =   1 << 3,
+    X86_PF_INSTR    =   1 << 4,
+    X86_PF_PK       =   1 << 5,
+};
 #endif /* _ASM_X86_TRAPS_H */
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -30,26 +30,6 @@
 #include <asm/trace/exceptions.h>

 /*
- * Page fault error code bits:
- *
- *   bit 0 ==   0: no page found       1: protection fault
- *   bit 1 ==   0: read access         1: write access
- *   bit 2 ==   0: kernel-mode access  1: user-mode access
- *   bit 3 ==   1: use of reserved bit detected
- *   bit 4 ==   1: fault was an instruction fetch
- *   bit 5 ==   1: protection keys block access
- */
-enum x86_pf_error_code {
-
-    PF_PROT     =   1 << 0,
-    PF_WRITE    =   1 << 1,
-    PF_USER     =   1 << 2,
-    PF_RSVD     =   1 << 3,
-    PF_INSTR    =   1 << 4,
-    PF_PK       =   1 << 5,
-};
-
-/*
  * Returns 0 if mmiotrace is disabled, or if the fault is not
  * handled by mmiotrace:
  */
@@ -150,7 +130,7 @@ is_prefetch(struct pt_regs *regs, unsign
      * If it was a exec (instruction fetch) fault on NX page, then
      * do not ignore the fault:
      */
-    if (error_code & PF_INSTR)
+    if (error_code & X86_PF_INSTR)
         return 0;

     instr = (void *)convert_ip_to_linear(current, regs);
@@ -180,7 +160,7 @@ is_prefetch(struct pt_regs *regs, unsign
  * siginfo so userspace can discover which protection key was set
  * on the PTE.
  *
- * If we get here, we know that the hardware signaled a PF_PK
+ * If we get here, we know that the hardware signaled a X86_PF_PK
  * fault and that there was a VMA once we got in the fault
  * handler.  It does *not* guarantee that the VMA we find here
  * was the one that we faulted on.
@@ -205,7 +185,7 @@ static void fill_sig_info_pkey(int si_co
     /*
      * force_sig_info_fault() is called from a number of
      * contexts, some of which have a VMA and some of which
-     * do not.  The PF_PK handing happens after we have a
+     * do not.  The X86_PF_PK handing happens after we have a
      * valid VMA, so we should never reach this without a
      * valid VMA.
      */
@@ -698,7 +678,7 @@ show_fault_oops(struct pt_regs *regs, un
     if (!oops_may_print())
         return;

-    if (error_code & PF_INSTR) {
+    if (error_code & X86_PF_INSTR) {
         unsigned int level;
         pgd_t *pgd;
         pte_t *pte;
@@ -780,7 +760,7 @@ no_context(struct pt_regs *regs, unsigne
      */
     if (current->thread.sig_on_uaccess_err && signal) {
         tsk->thread.trap_nr = X86_TRAP_PF;
-        tsk->thread.error_code = error_code | PF_USER;
+        tsk->thread.error_code = error_code | X86_PF_USER;
         tsk->thread.cr2 = address;

         /* XXX: hwpoison faults will set the wrong code. */
@@ -898,7 +878,7 @@ __bad_area_nosemaphore(struct pt_regs *r
     struct task_struct *tsk = current;

     /* User mode accesses just cause a SIGSEGV */
-    if (error_code & PF_USER) {
+    if (error_code & X86_PF_USER) {
         /*
          * It's possible to have interrupts off here:
          */
@@ -919,7 +899,7 @@ __bad_area_nosemaphore(struct pt_regs *r
          * Instruction fetch faults in the vsyscall page might need
          * emulation.
          */
-        if (unlikely((error_code & PF_INSTR) &&
+        if (unlikely((error_code & X86_PF_INSTR) &&
                  ((address & ~0xfff) == VSYSCALL_ADDR))) {
             if (emulate_vsyscall(regs, address))
                 return;
@@ -932,7 +912,7 @@ __bad_area_nosemaphore(struct pt_regs *r
          * are always protection faults.
          */
         if (address >= TASK_SIZE_MAX)
-            error_code |= PF_PROT;
+            error_code |= X86_PF_PROT;

         if (likely(show_unhandled_signals))
             show_signal_msg(regs, error_code, address, tsk);
@@ -993,11 +973,11 @@ static inline bool bad_area_access_from_

     if (!boot_cpu_has(X86_FEATURE_OSPKE))
         return false;
-    if (error_code & PF_PK)
+    if (error_code & X86_PF_PK)
         return true;
     /* this checks permission keys on the VMA: */
-    if (!arch_vma_access_permitted(vma, (error_code & PF_WRITE),
-                (error_code & PF_INSTR), foreign))
+    if (!arch_vma_access_permitted(vma, (error_code & X86_PF_WRITE),
+                (error_code & X86_PF_INSTR), foreign))
         return true;
     return false;
 }
@@ -1025,7 +1005,7 @@ do_sigbus(struct pt_regs *regs, unsigned
     int code = BUS_ADRERR;

     /* Kernel mode? Handle exceptions or die: */
-    if (!(error_code & PF_USER)) {
+    if (!(error_code & X86_PF_USER)) {
         no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
         return;
     }
@@ -1053,14 +1033,14 @@ static noinline void
 mm_fault_error(struct pt_regs *regs, unsigned long error_code,
            unsigned long address, u32 *pkey, unsigned int fault)
 {
-    if (fatal_signal_pending(current) && !(error_code & PF_USER)) {
+    if (fatal_signal_pending(current) && !(error_code & X86_PF_USER)) {
         no_context(regs, error_code, address, 0, 0);
         return;
     }

     if (fault & VM_FAULT_OOM) {
         /* Kernel mode? Handle exceptions or die: */
-        if (!(error_code & PF_USER)) {
+        if (!(error_code & X86_PF_USER)) {
             no_context(regs, error_code, address,
                    SIGSEGV, SEGV_MAPERR);
             return;
@@ -1085,16 +1065,16 @@ mm_fault_error(struct pt_regs *regs, uns

 static int spurious_fault_check(unsigned long error_code, pte_t *pte)
 {
-    if ((error_code & PF_WRITE) && !pte_write(*pte))
+    if ((error_code & X86_PF_WRITE) && !pte_write(*pte))
         return 0;

-    if ((error_code & PF_INSTR) && !pte_exec(*pte))
+    if ((error_code & X86_PF_INSTR) && !pte_exec(*pte))
         return 0;
     /*
      * Note: We do not do lazy flushing on protection key
-     * changes, so no spurious fault will ever set PF_PK.
+     * changes, so no spurious fault will ever set X86_PF_PK.
      */
-    if ((error_code & PF_PK))
+    if ((error_code & X86_PF_PK))
         return 1;

     return 1;
@@ -1140,8 +1120,8 @@ spurious_fault(unsigned long error_code,
      * change, so user accesses are not expected to cause spurious
      * faults.
      */
-    if (error_code != (PF_WRITE | PF_PROT)
-        && error_code != (PF_INSTR | PF_PROT))
+    if (error_code != (X86_PF_WRITE | X86_PF_PROT) &&
+        error_code != (X86_PF_INSTR | X86_PF_PROT))
         return 0;

     pgd = init_mm.pgd + pgd_index(address);
@@ -1201,19 +1181,19 @@ access_error(unsigned long error_code, s
      * always an unconditional error and can never result in
      * a follow-up action to resolve the fault, like a COW.
      */
-    if (error_code & PF_PK)
+    if (error_code & X86_PF_PK)
         return 1;

     /*
      * Make sure to check the VMA so that we do not perform
-     * faults just to hit a PF_PK as soon as we fill in a
+     * faults just to hit a X86_PF_PK as soon as we fill in a
      * page.
      */
-    if (!arch_vma_access_permitted(vma, (error_code & PF_WRITE),
-                (error_code & PF_INSTR), foreign))
+    if (!arch_vma_access_permitted(vma, (error_code & X86_PF_WRITE),
+                (error_code & X86_PF_INSTR), foreign))
         return 1;

-    if (error_code & PF_WRITE) {
+    if (error_code & X86_PF_WRITE) {
         /* write, present and write, not present: */
         if (unlikely(!(vma->vm_flags & VM_WRITE)))
             return 1;
@@ -1221,7 +1201,7 @@ access_error(unsigned long error_code, s
     }

     /* read, present: */
-    if (unlikely(error_code & PF_PROT))
+    if (unlikely(error_code & X86_PF_PROT))
         return 1;

     /* read, not present: */
@@ -1244,7 +1224,7 @@ static inline bool smap_violation(int er
     if (!static_cpu_has(X86_FEATURE_SMAP))
         return false;

-    if (error_code & PF_USER)
+    if (error_code & X86_PF_USER)
         return false;

     if (!user_mode(regs) && (regs->flags & X86_EFLAGS_AC))
@@ -1297,7 +1277,7 @@ __do_page_fault(struct pt_regs *regs, un
      * protection error (error_code & 9) == 0.
      */
     if (unlikely(fault_in_kernel_space(address))) {
-        if (!(error_code & (PF_RSVD | PF_USER | PF_PROT))) {
+        if (!(error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
             if (vmalloc_fault(address) >= 0)
                 return;

@@ -1325,7 +1305,7 @@ __do_page_fault(struct pt_regs *regs, un
     if (unlikely(kprobes_fault(regs)))
         return;

-    if (unlikely(error_code & PF_RSVD))
+    if (unlikely(error_code & X86_PF_RSVD))
         pgtable_bad(regs, error_code, address);

     if (unlikely(smap_violation(error_code, regs))) {
@@ -1351,7 +1331,7 @@ __do_page_fault(struct pt_regs *regs, un
      */
     if (user_mode(regs)) {
         local_irq_enable();
-        error_code |= PF_USER;
+        error_code |= X86_PF_USER;
         flags |= FAULT_FLAG_USER;
     } else {
         if (regs->flags & X86_EFLAGS_IF)
@@ -1360,9 +1340,9 @@ __do_page_fault(struct pt_regs *regs, un

     perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);

-    if (error_code & PF_WRITE)
+    if (error_code & X86_PF_WRITE)
         flags |= FAULT_FLAG_WRITE;
-    if (error_code & PF_INSTR)
+    if (error_code & X86_PF_INSTR)
         flags |= FAULT_FLAG_INSTRUCTION;

     /*
@@ -1382,7 +1362,7 @@ __do_page_fault(struct pt_regs *regs, un
      * space check, thus avoiding the deadlock:
      */
     if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
-        if ((error_code & PF_USER) == 0 &&
+        if (!(error_code & X86_PF_USER) &&
             !search_exception_tables(regs->ip)) {
             bad_area_nosemaphore(regs, error_code, address, NULL);
             return;
@@ -1409,7 +1389,7 @@ retry:
         bad_area(regs, error_code, address);
         return;
     }
-    if (error_code & PF_USER) {
+    if (error_code & X86_PF_USER) {
         /*
          * Accessing the stack below %sp is always a bug.
          * The large cushion allows instructions like enter
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ricardo Neri ricardo.neri-calderon@linux.intel.com
commit b0ce5b8c95c83a7b98c679b117e3d6ae6f97154b upstream.
Both head_32.S and head_64.S utilize the same value to initialize the control register CR0. Also, other parts of the kernel might want to access this initial definition (e.g., emulation code for User-Mode Instruction Prevention uses this state to provide a sane dummy value for CR0 when emulating the smsw instruction). Thus, relocate this definition to a header file from which it can be conveniently accessed.
Suggested-by: Borislav Petkov bp@alien8.de Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Borislav Petkov bp@suse.de Reviewed-by: Andy Lutomirski luto@kernel.org Cc: "Michael S. Tsirkin" mst@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Dave Hansen dave.hansen@linux.intel.com Cc: ricardo.neri@intel.com Cc: linux-mm@kvack.org Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Huang Rui ray.huang@amd.com Cc: Shuah Khan shuah@kernel.org Cc: linux-arch@vger.kernel.org Cc: Jonathan Corbet corbet@lwn.net Cc: Jiri Slaby jslaby@suse.cz Cc: "Ravi V. Shankar" ravi.v.shankar@intel.com Cc: Denys Vlasenko dvlasenk@redhat.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Brian Gerst brgerst@gmail.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Chen Yucong slaoub@gmail.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Dave Hansen dave.hansen@intel.com Cc: Andy Lutomirski luto@amacapital.net Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Paolo Bonzini pbonzini@redhat.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Linus Torvalds torvalds@linux-foundation.org Link: https://lkml.kernel.org/r/1509135945-13762-3-git-send-email-ricardo.neri-cal... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/uapi/asm/processor-flags.h |    3 +++
 arch/x86/kernel/head_32.S                   |    3 ---
 arch/x86/kernel/head_64.S                   |    3 ---
 3 files changed, 3 insertions(+), 6 deletions(-)

--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -152,5 +152,8 @@
 #define CX86_ARR_BASE   0xc4
 #define CX86_RCR_BASE   0xdc

+#define CR0_STATE   (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \
+             X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \
+             X86_CR0_PG)

 #endif /* _UAPI_ASM_X86_PROCESSOR_FLAGS_H */
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -212,9 +212,6 @@ ENTRY(startup_32_smp)
 #endif

 .Ldefault_entry:
-#define CR0_STATE   (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \
-             X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \
-             X86_CR0_PG)
     movl $(CR0_STATE & ~X86_CR0_PG),%eax
     movl %eax,%cr0

--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -154,9 +154,6 @@ ENTRY(secondary_startup_64)
 1:  wrmsr               /* Make changes effective */

     /* Setup cr0 */
-#define CR0_STATE   (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \
-             X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \
-             X86_CR0_PG)
     movl    $CR0_STATE, %eax
     /* Make changes effective */
     movq    %rax, %cr0
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ricardo Neri ricardo.neri-calderon@linux.intel.com
commit e27c310af5c05cf876d9cad006928076c27f54d4 upstream.
In its current form, user_64bit_mode() can only be used when CONFIG_X86_64 is selected. This implies that code built with CONFIG_X86_64=n cannot use it. If a piece of code needs to be built for both CONFIG_X86_64=y and CONFIG_X86_64=n and wants to use this function, it needs to wrap it in an #ifdef/#endif, potentially in multiple places.
This can be easily avoided with a single #ifdef/#endif pair within user_64bit_mode() itself.
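Reduced to the bare idiom, the change pushes the single #ifdef into the helper so every caller can stay unconditional. A standalone sketch (the simplified CS check and selector value are assumptions — the real function also handles the paravirt user CS; compile with -DCONFIG_X86_64 to exercise the 64-bit branch):

    #include <stdbool.h>

    struct pt_regs { unsigned long cs; };

    #define __USER_CS 0x33  /* 64-bit user code segment, assumed value */

    static inline bool user_64bit_mode(const struct pt_regs *regs)
    {
    #ifdef CONFIG_X86_64
        return regs->cs == __USER_CS;   /* simplified: no paravirt case */
    #else
        (void)regs;
        return false;   /* no 64-bit user mode on 32-bit kernels */
    #endif
    }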
Suggested-by: Borislav Petkov bp@suse.de Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Borislav Petkov bp@suse.de Cc: "Michael S. Tsirkin" mst@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Dave Hansen dave.hansen@linux.intel.com Cc: ricardo.neri@intel.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Huang Rui ray.huang@amd.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Shuah Khan shuah@kernel.org Cc: Kees Cook keescook@chromium.org Cc: Jonathan Corbet corbet@lwn.net Cc: Jiri Slaby jslaby@suse.cz Cc: Dmitry Vyukov dvyukov@google.com Cc: "Ravi V. Shankar" ravi.v.shankar@intel.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Brian Gerst brgerst@gmail.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Andy Lutomirski luto@kernel.org Cc: Colin Ian King colin.king@canonical.com Cc: Chen Yucong slaoub@gmail.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Paolo Bonzini pbonzini@redhat.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Thomas Garnier thgarnie@google.com Link: https://lkml.kernel.org/r/1509135945-13762-4-git-send-email-ricardo.neri-cal... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/ptrace.h |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -136,9 +136,9 @@ static inline int v8086_mode(struct pt_r
 #endif
 }

-#ifdef CONFIG_X86_64
 static inline bool user_64bit_mode(struct pt_regs *regs)
 {
+#ifdef CONFIG_X86_64
 #ifndef CONFIG_PARAVIRT
     /*
      * On non-paravirt systems, this is the only long mode CPL 3
@@ -149,8 +149,12 @@ static inline bool user_64bit_mode(struc
     /* Headers are too twisted for this to go in paravirt.h. */
     return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs;
 #endif
+#else /* !CONFIG_X86_64 */
+    return false;
+#endif
 }

+#ifdef CONFIG_X86_64
 #define current_user_stack_pointer()    current_pt_regs()->sp
 #define compat_user_stack_pointer()     current_pt_regs()->sp
 #endif
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit 9da78ba6b47b46428cfdfc0851511ab29c869798 upstream.
The only user was the 64-bit opportunistic SYSRET failure path, and that path didn't really need it. This change makes the opportunistic SYSRET code a bit more straightforward and gets rid of the label.
Signed-off-by: Andy Lutomirski luto@kernel.org Reviewed-by: Borislav Petkov bp@suse.de Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/be3006a7ad3326e3458cf1cc55d416252cbe1986.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/entry/entry_64.S |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -246,7 +246,6 @@ entry_SYSCALL64_slow_path:
     call    do_syscall_64       /* returns with IRQs disabled */

 return_from_SYSCALL_64:
-    RESTORE_EXTRA_REGS
     TRACE_IRQS_IRETQ        /* we're about to change IF */

     /*
@@ -315,6 +314,7 @@ return_from_SYSCALL_64:
      */
 syscall_return_via_sysret:
     /* rcx and r11 are already restored (see code above) */
+    RESTORE_EXTRA_REGS
     RESTORE_C_REGS_EXCEPT_RCX_R11
     movq    RSP(%rsp), %rsp
     UNWIND_HINT_EMPTY
@@ -322,7 +322,7 @@ syscall_return_via_sysret:

 opportunistic_sysret_failed:
     SWAPGS
-    jmp restore_c_regs_and_iret
+    jmp restore_regs_and_iret
 END(entry_SYSCALL_64)

 ENTRY(stub_ptregs_64)
@@ -639,7 +639,6 @@ retint_kernel:
      */
 GLOBAL(restore_regs_and_iret)
     RESTORE_EXTRA_REGS
-restore_c_regs_and_iret:
     RESTORE_C_REGS
     REMOVE_PT_GPREGS_FROM_STACK 8
     INTERRUPT_RETURN
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit 26c4ef9c49d8a0341f6d97ce2cfdd55d1236ed29 upstream.
These code paths will diverge soon.
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/dccf8c7b3750199b4b30383c812d4e2931811509.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/entry/entry_64.S        |   34 +++++++++++++++++++++++++---------
 arch/x86/entry/entry_64_compat.S |    2 +-
 arch/x86/kernel/head_64.S        |    2 +-
 3 files changed, 27 insertions(+), 11 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -322,7 +322,7 @@ syscall_return_via_sysret:

 opportunistic_sysret_failed:
     SWAPGS
-    jmp restore_regs_and_iret
+    jmp restore_regs_and_return_to_usermode
 END(entry_SYSCALL_64)

 ENTRY(stub_ptregs_64)
@@ -424,7 +424,7 @@ ENTRY(ret_from_fork)
     call    syscall_return_slowpath /* returns with IRQs disabled */
     TRACE_IRQS_ON           /* user mode is traced as IRQS on */
     SWAPGS
-    jmp restore_regs_and_iret
+    jmp restore_regs_and_return_to_usermode

 1:
     /* kernel thread */
@@ -613,7 +613,20 @@ GLOBAL(retint_user)
     call    prepare_exit_to_usermode
     TRACE_IRQS_IRETQ
     SWAPGS
-    jmp restore_regs_and_iret
+
+GLOBAL(restore_regs_and_return_to_usermode)
+#ifdef CONFIG_DEBUG_ENTRY
+    /* Assert that pt_regs indicates user mode. */
+    testl   $3, CS(%rsp)
+    jnz 1f
+    ud2
+1:
+#endif
+    RESTORE_EXTRA_REGS
+    RESTORE_C_REGS
+    REMOVE_PT_GPREGS_FROM_STACK 8
+    INTERRUPT_RETURN
+

 /* Returning to kernel space */
 retint_kernel:
@@ -633,11 +646,14 @@ retint_kernel:
      */
     TRACE_IRQS_IRETQ

-/*
- * At this label, code paths which return to kernel and to user,
- * which come from interrupts/exception and from syscalls, merge.
- */
-GLOBAL(restore_regs_and_iret)
+GLOBAL(restore_regs_and_return_to_kernel)
+#ifdef CONFIG_DEBUG_ENTRY
+    /* Assert that pt_regs indicates kernel mode. */
+    testl   $3, CS(%rsp)
+    jz  1f
+    ud2
+1:
+#endif
     RESTORE_EXTRA_REGS
     RESTORE_C_REGS
     REMOVE_PT_GPREGS_FROM_STACK 8
@@ -1328,7 +1344,7 @@ ENTRY(nmi)
      * work, because we don't want to enable interrupts.
      */
     SWAPGS
-    jmp restore_regs_and_iret
+    jmp restore_regs_and_return_to_usermode

 .Lnmi_from_kernel:
     /*
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -338,7 +338,7 @@ ENTRY(entry_INT80_compat)
     /* Go back to user mode. */
     TRACE_IRQS_ON
     SWAPGS
-    jmp restore_regs_and_iret
+    jmp restore_regs_and_return_to_usermode
 END(entry_INT80_compat)

 ENTRY(stub32_clone)
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -328,7 +328,7 @@ early_idt_handler_common:

 20:
     decl early_recursion_flag(%rip)
-    jmp restore_regs_and_iret
+    jmp restore_regs_and_return_to_kernel
 END(early_idt_handler_common)

 __INITDATA
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit 8a055d7f411d41755ce30db5bb65b154777c4b78 upstream.
All of the code paths that ended up doing IRET to usermode did SWAPGS immediately beforehand. Move the SWAPGS into the common code.
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/27fd6f45b7cd640de38fb9066fd0349bcd11f8e1.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/entry/entry_64.S        |   32 ++++++++++++++------------------
 arch/x86/entry/entry_64_compat.S |    3 +--
 2 files changed, 15 insertions(+), 20 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -250,12 +250,14 @@ return_from_SYSCALL_64:

     /*
      * Try to use SYSRET instead of IRET if we're returning to
-     * a completely clean 64-bit userspace context.
+     * a completely clean 64-bit userspace context. If we're not,
+     * go to the slow exit path.
      */
     movq    RCX(%rsp), %rcx
     movq    RIP(%rsp), %r11
-    cmpq    %rcx, %r11          /* RCX == RIP */
-    jne opportunistic_sysret_failed
+
+    cmpq    %rcx, %r11  /* SYSRET requires RCX == RIP */
+    jne swapgs_restore_regs_and_return_to_usermode

     /*
      * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
@@ -273,14 +275,14 @@ return_from_SYSCALL_64:

     /* If this changed %rcx, it was not canonical */
     cmpq    %rcx, %r11
-    jne opportunistic_sysret_failed
+    jne swapgs_restore_regs_and_return_to_usermode

     cmpq    $__USER_CS, CS(%rsp)        /* CS must match SYSRET */
-    jne opportunistic_sysret_failed
+    jne swapgs_restore_regs_and_return_to_usermode

     movq    R11(%rsp), %r11
     cmpq    %r11, EFLAGS(%rsp)      /* R11 == RFLAGS */
-    jne opportunistic_sysret_failed
+    jne swapgs_restore_regs_and_return_to_usermode

     /*
      * SYSCALL clears RF when it saves RFLAGS in R11 and SYSRET cannot
@@ -301,12 +303,12 @@ return_from_SYSCALL_64:
      * would never get past 'stuck_here'.
      */
     testq   $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
-    jnz opportunistic_sysret_failed
+    jnz swapgs_restore_regs_and_return_to_usermode

     /* nothing to check for RSP */

     cmpq    $__USER_DS, SS(%rsp)        /* SS must match SYSRET */
-    jne opportunistic_sysret_failed
+    jne swapgs_restore_regs_and_return_to_usermode

     /*
      * We win! This label is here just for ease of understanding
@@ -319,10 +321,6 @@ syscall_return_via_sysret:
     movq    RSP(%rsp), %rsp
     UNWIND_HINT_EMPTY
     USERGS_SYSRET64
-
-opportunistic_sysret_failed:
-    SWAPGS
-    jmp restore_regs_and_return_to_usermode
 END(entry_SYSCALL_64)

 ENTRY(stub_ptregs_64)
@@ -423,8 +421,7 @@ ENTRY(ret_from_fork)
     movq    %rsp, %rdi
     call    syscall_return_slowpath /* returns with IRQs disabled */
     TRACE_IRQS_ON           /* user mode is traced as IRQS on */
-    SWAPGS
-    jmp restore_regs_and_return_to_usermode
+    jmp swapgs_restore_regs_and_return_to_usermode

 1:
     /* kernel thread */
@@ -612,9 +609,8 @@ GLOBAL(retint_user)
     mov %rsp,%rdi
     call    prepare_exit_to_usermode
     TRACE_IRQS_IRETQ
-    SWAPGS

-GLOBAL(restore_regs_and_return_to_usermode)
+GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 #ifdef CONFIG_DEBUG_ENTRY
     /* Assert that pt_regs indicates user mode. */
     testl   $3, CS(%rsp)
@@ -622,6 +618,7 @@ GLOBAL(restore_regs_and_return_to_usermo
     ud2
 1:
 #endif
+    SWAPGS
     RESTORE_EXTRA_REGS
     RESTORE_C_REGS
     REMOVE_PT_GPREGS_FROM_STACK 8
@@ -1343,8 +1340,7 @@ ENTRY(nmi)
      * Return back to user mode. We must *not* do the normal exit
      * work, because we don't want to enable interrupts.
      */
-    SWAPGS
-    jmp restore_regs_and_return_to_usermode
+    jmp swapgs_restore_regs_and_return_to_usermode

 .Lnmi_from_kernel:
     /*
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -337,8 +337,7 @@ ENTRY(entry_INT80_compat)

     /* Go back to user mode. */
     TRACE_IRQS_ON
-    SWAPGS
-    jmp restore_regs_and_return_to_usermode
+    jmp swapgs_restore_regs_and_return_to_usermode
 END(entry_INT80_compat)

 ENTRY(stub32_clone)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit e872045bfd9c465a8555bab4b8567d56a4d2d3bb upstream.
The old code restored all the registers with movq instead of pop.
In theory, this was done because some CPUs have higher movq throughput, but any gain there would be tiny and is almost certainly outweighed by the higher text size.
This saves 96 bytes of text.
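The savings come purely from instruction encoding; a rough sketch of the per-register difference (byte counts are the usual x86-64 encodings for the r8-r15 cases, shown here as an illustrative fragment rather than the actual macro bodies):

    /* movq-based restore: ~5 bytes per register (REX prefix, opcode,
     * ModRM, SIB, disp8), plus an explicit pointer adjustment afterwards */
    movq    6*8(%rsp), %r11
    movq    5*8(%rsp), %r10
    addq    $15*8, %rsp

    /* pop-based restore: 2 bytes per extended register (REX.B + opcode),
     * 1 byte for the legacy ones; %rsp advances as a side effect */
    popq    %r11
    popq    %r10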
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/ad82520a207ccd851b04ba613f4f752b33ac05f7.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/entry/calling.h  |   21 +++++++++++++++++++++
 arch/x86/entry/entry_64.S |   12 ++++++------
 2 files changed, 27 insertions(+), 6 deletions(-)

--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -152,6 +152,27 @@ For 32-bit we have the following convent
     UNWIND_HINT_REGS offset=\offset extra=0
     .endm

+    .macro POP_EXTRA_REGS
+    popq %r15
+    popq %r14
+    popq %r13
+    popq %r12
+    popq %rbp
+    popq %rbx
+    .endm
+
+    .macro POP_C_REGS
+    popq %r11
+    popq %r10
+    popq %r9
+    popq %r8
+    popq %rax
+    popq %rcx
+    popq %rdx
+    popq %rsi
+    popq %rdi
+    .endm
+
     .macro RESTORE_C_REGS_HELPER rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1
     .if \rstor_r11
     movq 6*8(%rsp), %r11
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -619,9 +619,9 @@ GLOBAL(swapgs_restore_regs_and_return_to
 1:
 #endif
     SWAPGS
-    RESTORE_EXTRA_REGS
-    RESTORE_C_REGS
-    REMOVE_PT_GPREGS_FROM_STACK 8
+    POP_EXTRA_REGS
+    POP_C_REGS
+    addq    $8, %rsp    /* skip regs->orig_ax */
     INTERRUPT_RETURN

@@ -651,9 +651,9 @@ GLOBAL(restore_regs_and_return_to_kernel
     ud2
 1:
 #endif
-    RESTORE_EXTRA_REGS
-    RESTORE_C_REGS
-    REMOVE_PT_GPREGS_FROM_STACK 8
+    POP_EXTRA_REGS
+    POP_C_REGS
+    addq    $8, %rsp    /* skip regs->orig_ax */
     INTERRUPT_RETURN

 ENTRY(native_iret)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit e53178328c9b96fbdbc719e78c93b5687ee007c3 upstream.
paranoid_exit_restore was a copy of restore_regs_and_return_to_kernel. Merge them and make the paranoid_exit internal labels local.
Keeping .Lparanoid_exit makes the code a bit shorter because it allows a 2-byte jnz instead of a 5-byte jnz.
Saves 96 bytes of text.
( This is still a bit suboptimal in a non-CONFIG_TRACE_IRQFLAGS kernel, but fixing that would make the code rather messy. )
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/510d66a1895cda9473c84b1086f0bb974f22de6a.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/entry/entry_64.S | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-)
--- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1124,17 +1124,14 @@ ENTRY(paranoid_exit) DISABLE_INTERRUPTS(CLBR_ANY) TRACE_IRQS_OFF_DEBUG testl %ebx, %ebx /* swapgs needed? */ - jnz paranoid_exit_no_swapgs + jnz .Lparanoid_exit_no_swapgs TRACE_IRQS_IRETQ SWAPGS_UNSAFE_STACK - jmp paranoid_exit_restore -paranoid_exit_no_swapgs: + jmp .Lparanoid_exit_restore +.Lparanoid_exit_no_swapgs: TRACE_IRQS_IRETQ_DEBUG -paranoid_exit_restore: - RESTORE_EXTRA_REGS - RESTORE_C_REGS - REMOVE_PT_GPREGS_FROM_STACK 8 - INTERRUPT_RETURN +.Lparanoid_exit_restore: + jmp restore_regs_and_return_to_kernel END(paranoid_exit)
/*
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit 4fbb39108f972437c44e5ffa781b56635d496826 upstream.
Saves 64 bytes.
Signed-off-by: Andy Lutomirski luto@kernel.org Reviewed-by: Borislav Petkov bp@suse.de Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/6609b7f74ab31c36604ad746e019ea8495aec76c.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/entry/entry_64.S | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
--- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -316,10 +316,18 @@ return_from_SYSCALL_64: */ syscall_return_via_sysret: /* rcx and r11 are already restored (see code above) */ - RESTORE_EXTRA_REGS - RESTORE_C_REGS_EXCEPT_RCX_R11 - movq RSP(%rsp), %rsp UNWIND_HINT_EMPTY + POP_EXTRA_REGS + popq %rsi /* skip r11 */ + popq %r10 + popq %r9 + popq %r8 + popq %rax + popq %rsi /* skip rcx */ + popq %rdx + popq %rsi + popq %rdi + movq RSP-ORIG_RAX(%rsp), %rsp USERGS_SYSRET64 END(entry_SYSCALL_64)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit a512210643da8082cb44181dba8b18e752bd68f0 upstream.
They did almost the same thing. Remove a bunch of pointless instructions (mostly hidden in macros) and reduce cognitive load by merging them.
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/1204e20233fcab9130a1ba80b3b1879b5db3fc1f.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/entry/entry_64.S | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -221,10 +221,9 @@ entry_SYSCALL_64_fastpath: TRACE_IRQS_ON /* user mode is traced as IRQs on */ movq RIP(%rsp), %rcx movq EFLAGS(%rsp), %r11 - RESTORE_C_REGS_EXCEPT_RCX_R11 - movq RSP(%rsp), %rsp + addq $6*8, %rsp /* skip extra regs -- they were preserved */ UNWIND_HINT_EMPTY - USERGS_SYSRET64 + jmp .Lpop_c_regs_except_rcx_r11_and_sysret
1: /* @@ -318,6 +317,7 @@ syscall_return_via_sysret: /* rcx and r11 are already restored (see code above) */ UNWIND_HINT_EMPTY POP_EXTRA_REGS +.Lpop_c_regs_except_rcx_r11_and_sysret: popq %rsi /* skip r11 */ popq %r10 popq %r9
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit 471ee4832209e986029b9fabdaad57b1eecb856b upstream.
This gets rid of the last user of the old RESTORE_..._REGS infrastructure.
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/652a260f17a160789bc6a41d997f98249b73e2ab.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/entry/entry_64.S | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1560,11 +1560,14 @@ end_repeat_nmi: nmi_swapgs: SWAPGS_UNSAFE_STACK nmi_restore: - RESTORE_EXTRA_REGS - RESTORE_C_REGS + POP_EXTRA_REGS + POP_C_REGS
- /* Point RSP at the "iret" frame. */ - REMOVE_PT_GPREGS_FROM_STACK 6*8 + /* + * Skip orig_ax and the "outermost" frame to point RSP at the "iret" + * frame. + */ + addq $6*8, %rsp
/* * Clear "NMI executing". Set DF first so that we can easily
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit c39858de696f0cc160a544455e8403d663d577e9 upstream.
All users of RESTORE_EXTRA_REGS, RESTORE_C_REGS and such, and REMOVE_PT_GPREGS_FROM_STACK are gone. Delete the macros.
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/c32672f6e47c561893316d48e06c7656b1039a36.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/entry/calling.h | 52 ----------------------------------------------- 1 file changed, 52 deletions(-)
--- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -142,16 +142,6 @@ For 32-bit we have the following convent UNWIND_HINT_REGS offset=\offset .endm
- .macro RESTORE_EXTRA_REGS offset=0 - movq 0*8+\offset(%rsp), %r15 - movq 1*8+\offset(%rsp), %r14 - movq 2*8+\offset(%rsp), %r13 - movq 3*8+\offset(%rsp), %r12 - movq 4*8+\offset(%rsp), %rbp - movq 5*8+\offset(%rsp), %rbx - UNWIND_HINT_REGS offset=\offset extra=0 - .endm - .macro POP_EXTRA_REGS popq %r15 popq %r14 @@ -173,48 +163,6 @@ For 32-bit we have the following convent popq %rdi .endm
- .macro RESTORE_C_REGS_HELPER rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1 - .if \rstor_r11 - movq 6*8(%rsp), %r11 - .endif - .if \rstor_r8910 - movq 7*8(%rsp), %r10 - movq 8*8(%rsp), %r9 - movq 9*8(%rsp), %r8 - .endif - .if \rstor_rax - movq 10*8(%rsp), %rax - .endif - .if \rstor_rcx - movq 11*8(%rsp), %rcx - .endif - .if \rstor_rdx - movq 12*8(%rsp), %rdx - .endif - movq 13*8(%rsp), %rsi - movq 14*8(%rsp), %rdi - UNWIND_HINT_IRET_REGS offset=16*8 - .endm - .macro RESTORE_C_REGS - RESTORE_C_REGS_HELPER 1,1,1,1,1 - .endm - .macro RESTORE_C_REGS_EXCEPT_RAX - RESTORE_C_REGS_HELPER 0,1,1,1,1 - .endm - .macro RESTORE_C_REGS_EXCEPT_RCX - RESTORE_C_REGS_HELPER 1,0,1,1,1 - .endm - .macro RESTORE_C_REGS_EXCEPT_R11 - RESTORE_C_REGS_HELPER 1,1,0,1,1 - .endm - .macro RESTORE_C_REGS_EXCEPT_RCX_R11 - RESTORE_C_REGS_HELPER 1,0,0,1,1 - .endm - - .macro REMOVE_PT_GPREGS_FROM_STACK addskip=0 - subq $-(15*8+\addskip), %rsp - .endm - .macro icebp .byte 0xf1 .endm
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juergen Gross jgross@suse.com
commit 43e4111086a70c78bedb6ad990bee97f17b27a6e upstream.
Instead of trying to execute any NMI via the bare metal's NMI trap handler, use a Xen-specific one for PV domains, like we do for e.g. debug traps. As the NMI is handled via the normal kernel stack in a PV domain, this is the correct thing to do.
This will enable us to get rid of the very fragile and questionable dependencies between the bare metal NMI handler and Xen assumptions believed to be broken anyway.
Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/5baf5c0528d58402441550c5770b98e7961e7680.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/entry/entry_64.S | 2 +- arch/x86/include/asm/traps.h | 2 +- arch/x86/xen/enlighten_pv.c | 2 +- arch/x86/xen/xen-asm_64.S | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-)
--- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1079,6 +1079,7 @@ idtentry int3 do_int3 has_error_code idtentry stack_segment do_stack_segment has_error_code=1
#ifdef CONFIG_XEN +idtentry xennmi do_nmi has_error_code=0 idtentry xendebug do_debug has_error_code=0 idtentry xenint3 do_int3 has_error_code=0 #endif @@ -1241,7 +1242,6 @@ ENTRY(error_exit) END(error_exit)
/* Runs on exception stack */ -/* XXX: broken on Xen PV */ ENTRY(nmi) UNWIND_HINT_IRET_REGS /* --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -38,9 +38,9 @@ asmlinkage void simd_coprocessor_error(v
#if defined(CONFIG_X86_64) && defined(CONFIG_XEN_PV) asmlinkage void xen_divide_error(void); +asmlinkage void xen_xennmi(void); asmlinkage void xen_xendebug(void); asmlinkage void xen_xenint3(void); -asmlinkage void xen_nmi(void); asmlinkage void xen_overflow(void); asmlinkage void xen_bounds(void); asmlinkage void xen_invalid_op(void); --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -601,7 +601,7 @@ static struct trap_array_entry trap_arra #ifdef CONFIG_X86_MCE { machine_check, xen_machine_check, true }, #endif - { nmi, xen_nmi, true }, + { nmi, xen_xennmi, true }, { overflow, xen_overflow, false }, #ifdef CONFIG_IA32_EMULATION { entry_INT80_compat, xen_entry_INT80_compat, false }, --- a/arch/x86/xen/xen-asm_64.S +++ b/arch/x86/xen/xen-asm_64.S @@ -30,7 +30,7 @@ xen_pv_trap debug xen_pv_trap xendebug xen_pv_trap int3 xen_pv_trap xenint3 -xen_pv_trap nmi +xen_pv_trap xennmi xen_pv_trap overflow xen_pv_trap bounds xen_pv_trap invalid_op
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit 929bacec21478a72c78e4f29f98fb799bd00105a upstream.
Xen PV is fundamentally incompatible with our fancy NMI code: it doesn't use IST at all, and Xen entries clobber two stack slots below the hardware frame.
Drop Xen PV support from our NMI code entirely.
Signed-off-by: Andy Lutomirski luto@kernel.org Reviewed-by: Borislav Petkov bp@suse.de Acked-by: Juergen Gross jgross@suse.com Cc: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/bfbe711b5ae03f672f8848999a8eb2711efc7f98.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/entry/entry_64.S | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-)
--- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1241,9 +1241,13 @@ ENTRY(error_exit) jmp retint_user END(error_exit)
-/* Runs on exception stack */ +/* + * Runs on exception stack. Xen PV does not go through this path at all, + * so we can use real assembly here. + */ ENTRY(nmi) UNWIND_HINT_IRET_REGS + /* * We allow breakpoints in NMIs. If a breakpoint occurs, then * the iretq it performs will take us out of NMI context. @@ -1301,7 +1305,7 @@ ENTRY(nmi) * stacks lest we corrupt the "NMI executing" variable. */
- SWAPGS_UNSAFE_STACK + swapgs cld movq %rsp, %rdx movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp @@ -1466,7 +1470,7 @@ nested_nmi_out: popq %rdx
/* We are returning to kernel mode, so this cannot result in a fault. */ - INTERRUPT_RETURN + iretq
first_nmi: /* Restore rdx. */ @@ -1497,7 +1501,7 @@ first_nmi: pushfq /* RFLAGS */ pushq $__KERNEL_CS /* CS */ pushq $1f /* RIP */ - INTERRUPT_RETURN /* continues at repeat_nmi below */ + iretq /* continues at repeat_nmi below */ UNWIND_HINT_IRET_REGS 1: #endif @@ -1572,20 +1576,22 @@ nmi_restore: /* * Clear "NMI executing". Set DF first so that we can easily * distinguish the remaining code between here and IRET from - * the SYSCALL entry and exit paths. On a native kernel, we - * could just inspect RIP, but, on paravirt kernels, - * INTERRUPT_RETURN can translate into a jump into a - * hypercall page. + * the SYSCALL entry and exit paths. + * + * We arguably should just inspect RIP instead, but I (Andy) wrote + * this code when I had the misapprehension that Xen PV supported + * NMIs, and Xen PV would break that approach. */ std movq $0, 5*8(%rsp) /* clear "NMI executing" */
/* - * INTERRUPT_RETURN reads the "iret" frame and exits the NMI - * stack in a single instruction. We are returning to kernel - * mode, so this cannot result in a fault. + * iretq reads the "iret" frame and exits the NMI stack in a + * single instruction. We are returning to kernel mode, so this + * cannot result in a fault. Similarly, we don't need to worry + * about espfix64 on the way back to kernel mode. */ - INTERRUPT_RETURN + iretq END(nmi)
ENTRY(ignore_sysret)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit bd7dc5a6afac719d8ce4092391eef2c7e83c2a75 upstream.
This causes the MSR_IA32_SYSENTER_CS write to move out of the paravirt callback. This shouldn't affect Xen PV: Xen already ignores MSR_IA32_SYSENTER_ESP writes. In any event, Xen doesn't support vm86() in a useful way.
Note to any potential backporters: This patch won't break lguest, as lguest didn't have any SYSENTER support at all.
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/75cf09fe03ae778532d0ca6c65aa58e66bc2f90c.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/processor.h | 7 ------- arch/x86/include/asm/switch_to.h | 12 ++++++++++++ arch/x86/kernel/process_32.c | 4 +++- arch/x86/kernel/process_64.c | 2 +- arch/x86/kernel/vm86_32.c | 6 +++++- 5 files changed, 21 insertions(+), 10 deletions(-)
--- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -521,13 +521,6 @@ static inline void native_load_sp0(struct tss_struct *tss, struct thread_struct *thread) { tss->x86_tss.sp0 = thread->sp0; -#ifdef CONFIG_X86_32 - /* Only happens when SEP is enabled, no need to test "SEP"arately: */ - if (unlikely(tss->x86_tss.ss1 != thread->sysenter_cs)) { - tss->x86_tss.ss1 = thread->sysenter_cs; - wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0); - } -#endif }
static inline void native_swapgs(void) --- a/arch/x86/include/asm/switch_to.h +++ b/arch/x86/include/asm/switch_to.h @@ -73,4 +73,16 @@ do { \ ((last) = __switch_to_asm((prev), (next))); \ } while (0)
+#ifdef CONFIG_X86_32 +static inline void refresh_sysenter_cs(struct thread_struct *thread) +{ + /* Only happens when SEP is enabled, no need to test "SEP"arately: */ + if (unlikely(this_cpu_read(cpu_tss.x86_tss.ss1) == thread->sysenter_cs)) + return; + + this_cpu_write(cpu_tss.x86_tss.ss1, thread->sysenter_cs); + wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0); +} +#endif + #endif /* _ASM_X86_SWITCH_TO_H */ --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -284,9 +284,11 @@ __switch_to(struct task_struct *prev_p,
/* * Reload esp0 and cpu_current_top_of_stack. This changes - * current_thread_info(). + * current_thread_info(). Refresh the SYSENTER configuration in + * case prev or next is vm86. */ load_sp0(tss, next); + refresh_sysenter_cs(next); this_cpu_write(cpu_current_top_of_stack, (unsigned long)task_stack_page(next_p) + THREAD_SIZE); --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -464,7 +464,7 @@ __switch_to(struct task_struct *prev_p, */ this_cpu_write(current_task, next_p);
- /* Reload esp0 and ss1. This changes current_thread_info(). */ + /* Reload sp0. */ load_sp0(tss, next);
/* --- a/arch/x86/kernel/vm86_32.c +++ b/arch/x86/kernel/vm86_32.c @@ -55,6 +55,7 @@ #include <asm/irq.h> #include <asm/traps.h> #include <asm/vm86.h> +#include <asm/switch_to.h>
/* * Known problems: @@ -150,6 +151,7 @@ void save_v86_state(struct kernel_vm86_r tsk->thread.sp0 = vm86->saved_sp0; tsk->thread.sysenter_cs = __KERNEL_CS; load_sp0(tss, &tsk->thread); + refresh_sysenter_cs(&tsk->thread); vm86->saved_sp0 = 0; put_cpu();
@@ -369,8 +371,10 @@ static long do_sys_vm86(struct vm86plus_ /* make room for real-mode segments */ tsk->thread.sp0 += 16;
- if (static_cpu_has(X86_FEATURE_SEP)) + if (static_cpu_has(X86_FEATURE_SEP)) { tsk->thread.sysenter_cs = 0; + refresh_sysenter_cs(&tsk->thread); + }
load_sp0(tss, &tsk->thread); put_cpu();
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit da51da189a24bb9b7e2d5a123be096e51a4695a5 upstream.
load_sp0() had an odd signature:
void load_sp0(struct tss_struct *tss, struct thread_struct *thread);
Simplify it to:
void load_sp0(unsigned long sp0);
Also simplify a few get_cpu()/put_cpu() sequences to preempt_disable()/preempt_enable().
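For context: get_cpu() is preempt_disable() followed by smp_processor_id(), and put_cpu() is preempt_enable(), so the two pairs are interchangeable whenever the returned CPU number is unused. A minimal kernel-style sketch of the conversion (the helper name is illustrative, not from the patch):

#include <linux/preempt.h>

/* Hypothetical example, not part of the patch. */
static void touch_nomigrate_state(void)
{
        /*
         * Was: cpu = get_cpu(); ... put_cpu();
         * The CPU number was never used, so plain preemption control
         * is sufficient:
         */
        preempt_disable();
        /* ... update state that must not migrate between CPUs ... */
        preempt_enable();
}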
Signed-off-by: Andy Lutomirski luto@kernel.org Reviewed-by: Borislav Petkov bp@suse.de Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/2655d8b42ed940aa384fe18ee1129bbbcf730a08.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/paravirt.h | 5 ++--- arch/x86/include/asm/paravirt_types.h | 2 +- arch/x86/include/asm/processor.h | 9 ++++----- arch/x86/kernel/cpu/common.c | 4 ++-- arch/x86/kernel/process_32.c | 2 +- arch/x86/kernel/process_64.c | 2 +- arch/x86/kernel/vm86_32.c | 14 ++++++-------- arch/x86/xen/enlighten_pv.c | 7 +++---- 8 files changed, 20 insertions(+), 25 deletions(-)
--- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -16,10 +16,9 @@ #include <linux/cpumask.h> #include <asm/frame.h>
-static inline void load_sp0(struct tss_struct *tss, - struct thread_struct *thread) +static inline void load_sp0(unsigned long sp0) { - PVOP_VCALL2(pv_cpu_ops.load_sp0, tss, thread); + PVOP_VCALL1(pv_cpu_ops.load_sp0, sp0); }
/* The paravirtualized CPUID instruction. */ --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -134,7 +134,7 @@ struct pv_cpu_ops { void (*alloc_ldt)(struct desc_struct *ldt, unsigned entries); void (*free_ldt)(struct desc_struct *ldt, unsigned entries);
- void (*load_sp0)(struct tss_struct *tss, struct thread_struct *t); + void (*load_sp0)(unsigned long sp0);
void (*set_iopl_mask)(unsigned mask);
--- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -518,9 +518,9 @@ static inline void native_set_iopl_mask( }
static inline void -native_load_sp0(struct tss_struct *tss, struct thread_struct *thread) +native_load_sp0(unsigned long sp0) { - tss->x86_tss.sp0 = thread->sp0; + this_cpu_write(cpu_tss.x86_tss.sp0, sp0); }
static inline void native_swapgs(void) @@ -545,10 +545,9 @@ static inline unsigned long current_top_ #else #define __cpuid native_cpuid
-static inline void load_sp0(struct tss_struct *tss, - struct thread_struct *thread) +static inline void load_sp0(unsigned long sp0) { - native_load_sp0(tss, thread); + native_load_sp0(sp0); }
#define set_iopl_mask native_set_iopl_mask --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1570,7 +1570,7 @@ void cpu_init(void) initialize_tlbstate_and_flush(); enter_lazy_tlb(&init_mm, me);
- load_sp0(t, ¤t->thread); + load_sp0(current->thread.sp0); set_tss_desc(cpu, t); load_TR_desc(); load_mm_ldt(&init_mm); @@ -1625,7 +1625,7 @@ void cpu_init(void) initialize_tlbstate_and_flush(); enter_lazy_tlb(&init_mm, curr);
- load_sp0(t, thread); + load_sp0(thread->sp0); set_tss_desc(cpu, t); load_TR_desc(); load_mm_ldt(&init_mm); --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -287,7 +287,7 @@ __switch_to(struct task_struct *prev_p, * current_thread_info(). Refresh the SYSENTER configuration in * case prev or next is vm86. */ - load_sp0(tss, next); + load_sp0(next->sp0); refresh_sysenter_cs(next); this_cpu_write(cpu_current_top_of_stack, (unsigned long)task_stack_page(next_p) + --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -465,7 +465,7 @@ __switch_to(struct task_struct *prev_p, this_cpu_write(current_task, next_p);
/* Reload sp0. */ - load_sp0(tss, next); + load_sp0(next->sp0);
/* * Now maybe reload the debug registers and handle I/O bitmaps --- a/arch/x86/kernel/vm86_32.c +++ b/arch/x86/kernel/vm86_32.c @@ -95,7 +95,6 @@
void save_v86_state(struct kernel_vm86_regs *regs, int retval) { - struct tss_struct *tss; struct task_struct *tsk = current; struct vm86plus_struct __user *user; struct vm86 *vm86 = current->thread.vm86; @@ -147,13 +146,13 @@ void save_v86_state(struct kernel_vm86_r do_exit(SIGSEGV); }
- tss = &per_cpu(cpu_tss, get_cpu()); + preempt_disable(); tsk->thread.sp0 = vm86->saved_sp0; tsk->thread.sysenter_cs = __KERNEL_CS; - load_sp0(tss, &tsk->thread); + load_sp0(tsk->thread.sp0); refresh_sysenter_cs(&tsk->thread); vm86->saved_sp0 = 0; - put_cpu(); + preempt_enable();
memcpy(&regs->pt, &vm86->regs32, sizeof(struct pt_regs));
@@ -239,7 +238,6 @@ SYSCALL_DEFINE2(vm86, unsigned long, cmd
static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus) { - struct tss_struct *tss; struct task_struct *tsk = current; struct vm86 *vm86 = tsk->thread.vm86; struct kernel_vm86_regs vm86regs; @@ -367,8 +365,8 @@ static long do_sys_vm86(struct vm86plus_ vm86->saved_sp0 = tsk->thread.sp0; lazy_save_gs(vm86->regs32.gs);
- tss = &per_cpu(cpu_tss, get_cpu()); /* make room for real-mode segments */ + preempt_disable(); tsk->thread.sp0 += 16;
if (static_cpu_has(X86_FEATURE_SEP)) { @@ -376,8 +374,8 @@ static long do_sys_vm86(struct vm86plus_ refresh_sysenter_cs(&tsk->thread); }
- load_sp0(tss, &tsk->thread); - put_cpu(); + load_sp0(tsk->thread.sp0); + preempt_enable();
if (vm86->flags & VM86_SCREEN_BITMAP) mark_screen_rdonly(tsk->mm); --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -811,15 +811,14 @@ static void __init xen_write_gdt_entry_b } }
-static void xen_load_sp0(struct tss_struct *tss, - struct thread_struct *thread) +static void xen_load_sp0(unsigned long sp0) { struct multicall_space mcs;
mcs = xen_mc_entry(0); - MULTI_stack_switch(mcs.mc, __KERNEL_DS, thread->sp0); + MULTI_stack_switch(mcs.mc, __KERNEL_DS, sp0); xen_mc_issue(PARAVIRT_LAZY_CPU); - tss->x86_tss.sp0 = thread->sp0; + this_cpu_write(cpu_tss.x86_tss.sp0, sp0); }
void xen_set_iopl_mask(unsigned mask)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit 3500130b84a3cdc5b6796eba1daf178944935efe upstream.
This will let us get rid of a few places that hardcode accesses to thread.sp0.
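As a reading aid (a kernel-style sketch, not part of the patch): the saved pt_regs frame sits at the very top of a task's kernel stack (just below a small padding on 32-bit), so pointing one element past it yields the stack top:

#include <linux/sched.h>

/* Hypothetical function form of the macro added below. */
static unsigned long task_top_of_stack_fn(struct task_struct *task)
{
        struct pt_regs *regs = task_pt_regs(task);

        return (unsigned long)(regs + 1);       /* one past the frame == the top */
}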
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/b49b3f95a8ff858c40c9b0f5b32be0355324327d.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/processor.h | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -796,6 +796,8 @@ static inline void spin_lock_prefetch(co #define TOP_OF_INIT_STACK ((unsigned long)&init_stack + sizeof(init_stack) - \ TOP_OF_KERNEL_STACK_PADDING)
+#define task_top_of_stack(task) ((unsigned long)(task_pt_regs(task) + 1)) + #ifdef CONFIG_X86_32 /* * User space process size: 3GB (default).
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit f16b3da1dc936c0f8121741d0a1731bf242f2f56 upstream.
I'm removing thread_struct::sp0, and Xen's usage of it is slightly dubious and unnecessary. Use appropriate helpers instead.
While we're at it, reorder the code slightly to make it more obvious what's going on.
Signed-off-by: Andy Lutomirski luto@kernel.org Reviewed-by: Juergen Gross jgross@suse.com Cc: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Juergen Gross jgross@suse.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/d5b9a3da2b47c68325bd2bbe8f82d9554dee0d0f.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/xen/smp_pv.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-)
--- a/arch/x86/xen/smp_pv.c +++ b/arch/x86/xen/smp_pv.c @@ -14,6 +14,7 @@ * single-threaded. */ #include <linux/sched.h> +#include <linux/sched/task_stack.h> #include <linux/err.h> #include <linux/slab.h> #include <linux/smp.h> @@ -294,12 +295,19 @@ cpu_initialize_context(unsigned int cpu, #endif memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
+ /* + * Bring up the CPU in cpu_bringup_and_idle() with the stack + * pointing just below where pt_regs would be if it were a normal + * kernel entry. + */ ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle; ctxt->flags = VGCF_IN_KERNEL; ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */ ctxt->user_regs.ds = __USER_DS; ctxt->user_regs.es = __USER_DS; ctxt->user_regs.ss = __KERNEL_DS; + ctxt->user_regs.cs = __KERNEL_CS; + ctxt->user_regs.esp = (unsigned long)task_pt_regs(idle);
xen_copy_trap_info(ctxt->trap_ctxt);
@@ -314,8 +322,13 @@ cpu_initialize_context(unsigned int cpu, ctxt->gdt_frames[0] = gdt_mfn; ctxt->gdt_ents = GDT_ENTRIES;
+ /* + * Set SS:SP that Xen will use when entering guest kernel mode + * from guest user mode. Subsequent calls to load_sp0() can + * change this value. + */ ctxt->kernel_ss = __KERNEL_DS; - ctxt->kernel_sp = idle->thread.sp0; + ctxt->kernel_sp = task_top_of_stack(idle);
#ifdef CONFIG_X86_32 ctxt->event_callback_cs = __KERNEL_CS; @@ -327,10 +340,8 @@ cpu_initialize_context(unsigned int cpu, (unsigned long)xen_hypervisor_callback; ctxt->failsafe_callback_eip = (unsigned long)xen_failsafe_callback; - ctxt->user_regs.cs = __KERNEL_CS; per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
- ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs); ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_gfn(swapper_pg_dir)); if (HYPERVISOR_vcpu_op(VCPUOP_initialise, xen_vcpu_nr(cpu), ctxt)) BUG();
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit 20bb83443ea79087b5e5f8dab4e9d80bb9bf7acb upstream.
In my quest to get rid of thread_struct::sp0, I want to clean up or remove all of its readers. Two of them are in cpu_init() (32-bit and 64-bit), and they aren't needed. This is because we never enter userspace at all on the threads that CPUs are initialized in.
Poison the initial TSS.sp0 and stop initializing it on CPU init.
The comment text mostly comes from Dave Hansen. Thanks!
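For illustration, a user-space sketch (not part of the patch) of what the poison evaluates to: on x86_64 it is 0x8000000000000001, a noncanonical and misaligned address, so any stray use of it as a stack pointer faults immediately rather than silently appearing to work:

#include <stdio.h>

int main(void)
{
        /* BITS_PER_LONG assumed to be 64 here. */
        unsigned long poison = (1UL << (64 - 1)) + 1;

        printf("%#lx\n", poison);       /* 0x8000000000000001 */
        return 0;
}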
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/ee4a00540ad28c6cff475fbcc7769a4460acc861.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/common.c | 13 ++++++++++--- arch/x86/kernel/process.c | 8 +++++++- 2 files changed, 17 insertions(+), 4 deletions(-)
--- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1570,9 +1570,13 @@ void cpu_init(void) initialize_tlbstate_and_flush(); enter_lazy_tlb(&init_mm, me);
- load_sp0(current->thread.sp0); + /* + * Initialize the TSS. Don't bother initializing sp0, as the initial + * task never enters user mode. + */ set_tss_desc(cpu, t); load_TR_desc(); + load_mm_ldt(&init_mm);
clear_all_debug_regs(); @@ -1594,7 +1598,6 @@ void cpu_init(void) int cpu = smp_processor_id(); struct task_struct *curr = current; struct tss_struct *t = &per_cpu(cpu_tss, cpu); - struct thread_struct *thread = &curr->thread;
wait_for_master_cpu(cpu);
@@ -1625,9 +1628,13 @@ void cpu_init(void) initialize_tlbstate_and_flush(); enter_lazy_tlb(&init_mm, curr);
- load_sp0(thread->sp0); + /* + * Initialize the TSS. Don't bother initializing sp0, as the initial + * task never enters user mode. + */ set_tss_desc(cpu, t); load_TR_desc(); + load_mm_ldt(&init_mm);
t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap); --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -49,7 +49,13 @@ */ __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = { .x86_tss = { - .sp0 = TOP_OF_INIT_STACK, + /* + * .sp0 is only used when entering ring 0 from a lower + * privilege level. Since the init task never runs anything + * but ring 0 code, there is no need for a valid value here. + * Poison it. + */ + .sp0 = (1UL << (BITS_PER_LONG-1)) + 1, #ifdef CONFIG_X86_32 .ss0 = __KERNEL_DS, .ss1 = __KERNEL_CS,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit 46f5a10a721ce8dce8cc8fe55279b49e1c6b3288 upstream.
The only remaining readers are in context switch code or vm86(), and they all just want to update TSS.sp0 to match the current task. Replace them all with a new helper, update_sp0().
Signed-off-by: Andy Lutomirski luto@kernel.org Reviewed-by: Borislav Petkov bp@suse.de Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/2d231687f4ff288c9d9e98d7861b7df374246ac3.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/switch_to.h | 6 ++++++ arch/x86/kernel/process_32.c | 2 +- arch/x86/kernel/process_64.c | 2 +- arch/x86/kernel/vm86_32.c | 4 ++-- 4 files changed, 10 insertions(+), 4 deletions(-)
--- a/arch/x86/include/asm/switch_to.h +++ b/arch/x86/include/asm/switch_to.h @@ -85,4 +85,10 @@ static inline void refresh_sysenter_cs(s } #endif
+/* This is used when switching tasks or entering/exiting vm86 mode. */ +static inline void update_sp0(struct task_struct *task) +{ + load_sp0(task->thread.sp0); +} + #endif /* _ASM_X86_SWITCH_TO_H */ --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -287,7 +287,7 @@ __switch_to(struct task_struct *prev_p, * current_thread_info(). Refresh the SYSENTER configuration in * case prev or next is vm86. */ - load_sp0(next->sp0); + update_sp0(next_p); refresh_sysenter_cs(next); this_cpu_write(cpu_current_top_of_stack, (unsigned long)task_stack_page(next_p) + --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -465,7 +465,7 @@ __switch_to(struct task_struct *prev_p, this_cpu_write(current_task, next_p);
/* Reload sp0. */ - load_sp0(next->sp0); + update_sp0(next_p);
/* * Now maybe reload the debug registers and handle I/O bitmaps --- a/arch/x86/kernel/vm86_32.c +++ b/arch/x86/kernel/vm86_32.c @@ -149,7 +149,7 @@ void save_v86_state(struct kernel_vm86_r preempt_disable(); tsk->thread.sp0 = vm86->saved_sp0; tsk->thread.sysenter_cs = __KERNEL_CS; - load_sp0(tsk->thread.sp0); + update_sp0(tsk); refresh_sysenter_cs(&tsk->thread); vm86->saved_sp0 = 0; preempt_enable(); @@ -374,7 +374,7 @@ static long do_sys_vm86(struct vm86plus_ refresh_sysenter_cs(&tsk->thread); }
- load_sp0(tsk->thread.sp0); + update_sp0(tsk); preempt_enable();
if (vm86->flags & VM86_SCREEN_BITMAP)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit cd493a6deb8b78eca280d05f7fa73fd69403ae29 upstream.
cpu_current_top_of_stack's initialization forgot about TOP_OF_KERNEL_STACK_PADDING. This bug didn't matter because the idle threads never enter user mode.
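A user-space sketch of the off-by-padding difference (values illustrative; the real ones come from THREAD_SIZE and TOP_OF_KERNEL_STACK_PADDING, which is nonzero only on 32-bit, where this code runs):

#include <stdio.h>

#define THREAD_SIZE                     (8UL * 1024)
#define TOP_OF_KERNEL_STACK_PADDING     8UL     /* nonzero on 32-bit */

int main(void)
{
        unsigned long stack_page = 0x100000;    /* task_stack_page(idle) */

        unsigned long old = stack_page + THREAD_SIZE;
        unsigned long new = stack_page + THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING;

        /* the old expression overshot the real top by the padding */
        printf("old=%#lx new=%#lx\n", old, new);
        return 0;
}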
Signed-off-by: Andy Lutomirski luto@kernel.org Reviewed-by: Borislav Petkov bp@suse.de Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/e5e370a7e6e4fddd1c4e4cf619765d96bb874b21.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/smpboot.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -962,8 +962,7 @@ void common_cpu_up(unsigned int cpu, str #ifdef CONFIG_X86_32 /* Stack for startup_32 can be just as for start_secondary onwards */ irq_ctx_init(cpu); - per_cpu(cpu_current_top_of_stack, cpu) = - (unsigned long)task_stack_page(idle) + THREAD_SIZE; + per_cpu(cpu_current_top_of_stack, cpu) = task_top_of_stack(idle); #else initial_gs = per_cpu_offset(cpu); #endif
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit d375cf1530595e33961a8844192cddab913650e3 upstream.
On x86_64, we can easily calculate sp0 when needed instead of storing it in thread_struct.
On x86_32, a similar cleanup would be possible, but it would require cleaning up the vm86 code first, and that can wait for a later cleanup series.
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/719cd9c66c548c4350d98a90f050aee8b17f8919.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/compat.h | 1 + arch/x86/include/asm/processor.h | 28 +++++++++------------------- arch/x86/include/asm/switch_to.h | 6 ++++++ arch/x86/kernel/process_64.c | 1 - 4 files changed, 16 insertions(+), 20 deletions(-)
--- a/arch/x86/include/asm/compat.h +++ b/arch/x86/include/asm/compat.h @@ -7,6 +7,7 @@ */ #include <linux/types.h> #include <linux/sched.h> +#include <linux/sched/task_stack.h> #include <asm/processor.h> #include <asm/user32.h> #include <asm/unistd.h> --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -431,7 +431,9 @@ typedef struct { struct thread_struct { /* Cached TLS descriptors: */ struct desc_struct tls_array[GDT_ENTRY_TLS_ENTRIES]; +#ifdef CONFIG_X86_32 unsigned long sp0; +#endif unsigned long sp; #ifdef CONFIG_X86_32 unsigned long sysenter_cs; @@ -798,6 +800,13 @@ static inline void spin_lock_prefetch(co
#define task_top_of_stack(task) ((unsigned long)(task_pt_regs(task) + 1))
+#define task_pt_regs(task) \ +({ \ + unsigned long __ptr = (unsigned long)task_stack_page(task); \ + __ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING; \ + ((struct pt_regs *)__ptr) - 1; \ +}) + #ifdef CONFIG_X86_32 /* * User space process size: 3GB (default). @@ -817,23 +826,6 @@ static inline void spin_lock_prefetch(co .addr_limit = KERNEL_DS, \ }
-/* - * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack. - * This is necessary to guarantee that the entire "struct pt_regs" - * is accessible even if the CPU haven't stored the SS/ESP registers - * on the stack (interrupt gate does not save these registers - * when switching to the same priv ring). - * Therefore beware: accessing the ss/esp fields of the - * "struct pt_regs" is possible, but they may contain the - * completely wrong values. - */ -#define task_pt_regs(task) \ -({ \ - unsigned long __ptr = (unsigned long)task_stack_page(task); \ - __ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING; \ - ((struct pt_regs *)__ptr) - 1; \ -}) - #define KSTK_ESP(task) (task_pt_regs(task)->sp)
#else @@ -867,11 +859,9 @@ static inline void spin_lock_prefetch(co #define STACK_TOP_MAX TASK_SIZE_MAX
#define INIT_THREAD { \ - .sp0 = TOP_OF_INIT_STACK, \ .addr_limit = KERNEL_DS, \ }
-#define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1) extern unsigned long KSTK_ESP(struct task_struct *task);
#endif /* CONFIG_X86_64 */ --- a/arch/x86/include/asm/switch_to.h +++ b/arch/x86/include/asm/switch_to.h @@ -2,6 +2,8 @@ #ifndef _ASM_X86_SWITCH_TO_H #define _ASM_X86_SWITCH_TO_H
+#include <linux/sched/task_stack.h> + struct task_struct; /* one of the stranger aspects of C forward declarations */
struct task_struct *__switch_to_asm(struct task_struct *prev, @@ -88,7 +90,11 @@ static inline void refresh_sysenter_cs(s /* This is used when switching tasks or entering/exiting vm86 mode. */ static inline void update_sp0(struct task_struct *task) { +#ifdef CONFIG_X86_32 load_sp0(task->thread.sp0); +#else + load_sp0(task_top_of_stack(task)); +#endif }
#endif /* _ASM_X86_SWITCH_TO_H */ --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -274,7 +274,6 @@ int copy_thread_tls(unsigned long clone_ struct inactive_task_frame *frame; struct task_struct *me = current;
- p->thread.sp0 = (unsigned long)task_stack_page(p) + THREAD_SIZE; childregs = task_pt_regs(p); fork_frame = container_of(childregs, struct fork_frame, regs); frame = &fork_frame->frame;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit 3383642c2f9d4f5b4fa37436db4a109a1a10018c upstream.
Let's keep the stack-related logic together rather than open-coding a comparison in an assertion in the traps code.
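The helper needs only a single unsigned comparison to check both bounds: when the stack pointer lies above the top, the unsigned subtraction wraps to a huge value and the test fails. A user-space sketch of the trick (THREAD_SIZE value illustrative):

#include <stdbool.h>
#include <stdio.h>

#define THREAD_SIZE (16UL * 1024)

/* Same shape as on_thread_stack() in the hunk below. */
static bool within_stack(unsigned long top, unsigned long sp)
{
        return (top - sp) < THREAD_SIZE;        /* unsigned: wraps if sp > top */
}

int main(void)
{
        unsigned long top = 0x104000;

        printf("%d\n", within_stack(top, top - 8));               /* 1: on the stack */
        printf("%d\n", within_stack(top, top + 8));               /* 0: above the top */
        printf("%d\n", within_stack(top, top - THREAD_SIZE - 8)); /* 0: below it */
        return 0;
}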
Signed-off-by: Andy Lutomirski luto@kernel.org Reviewed-by: Borislav Petkov bp@suse.de Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/856b15bee1f55017b8f79d3758b0d51c48a08cf8.1509609304... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/processor.h | 6 ++++++ arch/x86/kernel/traps.c | 3 +-- 2 files changed, 7 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -542,6 +542,12 @@ static inline unsigned long current_top_ #endif }
+static inline bool on_thread_stack(void) +{ + return (unsigned long)(current_top_of_stack() - + current_stack_pointer) < THREAD_SIZE; +} + #ifdef CONFIG_PARAVIRT #include <asm/paravirt.h> #else --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -141,8 +141,7 @@ void ist_begin_non_atomic(struct pt_regs * will catch asm bugs and any attempt to use ist_preempt_enable * from double_fault. */ - BUG_ON((unsigned long)(current_top_of_stack() - - current_stack_pointer) >= THREAD_SIZE); + BUG_ON(!on_thread_stack());
preempt_enable_no_resched(); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov bp@suse.de
commit 1e4c4f610f774df6088d7c065b2dd4d22adba698 upstream.
Convert TESTL to TESTB and save 3 bytes per callsite: the immediate operand shrinks from four bytes to one.
No functionality change.
Signed-off-by: Borislav Petkov bp@suse.de Cc: Andy Lutomirski luto@kernel.org Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/20171102120926.4srwerqrr7g72e2k@pd.tnic Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/entry/entry_64.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -621,7 +621,7 @@ GLOBAL(retint_user) GLOBAL(swapgs_restore_regs_and_return_to_usermode) #ifdef CONFIG_DEBUG_ENTRY /* Assert that pt_regs indicates user mode. */ - testl $3, CS(%rsp) + testb $3, CS(%rsp) jnz 1f ud2 1: @@ -654,7 +654,7 @@ retint_kernel: GLOBAL(restore_regs_and_return_to_kernel) #ifdef CONFIG_DEBUG_ENTRY /* Assert that pt_regs indicates kernel mode. */ - testl $3, CS(%rsp) + testb $3, CS(%rsp) jz 1f ud2 1:
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 06dd688ddda5819025e014b79aea9af6ab475fa2 upstream.
Peter pointed out that the set/clear_bit32() variants are broken in various aspects.
Replace them with open coded set/clear_bit() and type cast cpu_info::x86_capability as it's done in all other places throughout x86.
Fixes: 0b00de857a64 ("x86/cpuid: Add generic table for CPUID dependencies") Reported-by: Peter Zijlstra peterz@infradead.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Andi Kleen ak@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/cpuid-deps.c | 26 +++++++++++--------------- 1 file changed, 11 insertions(+), 15 deletions(-)
--- a/arch/x86/kernel/cpu/cpuid-deps.c +++ b/arch/x86/kernel/cpu/cpuid-deps.c @@ -62,23 +62,19 @@ const static struct cpuid_dep cpuid_deps {} };
-static inline void __clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit) -{ - clear_bit32(bit, c->x86_capability); -} - -static inline void __setup_clear_cpu_cap(unsigned int bit) -{ - clear_cpu_cap(&boot_cpu_data, bit); - set_bit32(bit, cpu_caps_cleared); -} - static inline void clear_feature(struct cpuinfo_x86 *c, unsigned int feature) { - if (!c) - __setup_clear_cpu_cap(feature); - else - __clear_cpu_cap(c, feature); + /* + * Note: This could use the non atomic __*_bit() variants, but the + * rest of the cpufeature code uses atomics as well, so keep it for + * consistency. Cleanup all of it separately. + */ + if (!c) { + clear_cpu_cap(&boot_cpu_data, feature); + set_bit(feature, (unsigned long *)cpu_caps_cleared); + } else { + clear_bit(feature, (unsigned long *)c->x86_capability); + } }
/* Take the capabilities and the BUG bits into account */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 1943dc07b45e347c52c1bfdd4a37e04a86e399aa upstream.
These ops are not endian-safe and may break on architectures which have alignment requirements.
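A user-space sketch of the endianness hazard (illustrative only; the aliasing cast is for demonstration): set_bit() numbers bits within unsigned longs, so casting a u32 array to unsigned long * touches the wrong 32-bit half on 64-bit big-endian machines:

#include <stdio.h>

int main(void)
{
        /* force the alignment a real unsigned long access needs */
        _Alignas(unsigned long) unsigned int arr[2] = { 0, 0 };
        unsigned long *word = (unsigned long *)arr;

        *word |= 1UL << 0;      /* roughly what set_bit(0, ...) does */

        /* 64-bit little-endian prints arr[0]=1 arr[1]=0;
         * 64-bit big-endian prints    arr[0]=0 arr[1]=1 */
        printf("arr[0]=%u arr[1]=%u\n", arr[0], arr[1]);
        return 0;
}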
Reverts: cbe96375025e ("bitops: Add clear/set_bit32() to linux/bitops.h") Reported-by: Peter Zijlstra peterz@infradead.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Andi Kleen ak@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/bitops.h | 26 -------------------------- 1 file changed, 26 deletions(-)
--- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -228,32 +228,6 @@ static inline unsigned long __ffs64(u64 return __ffs((unsigned long)word); }
-/* - * clear_bit32 - Clear a bit in memory for u32 array - * @nr: Bit to clear - * @addr: u32 * address of bitmap - * - * Same as clear_bit, but avoids needing casts for u32 arrays. - */ - -static __always_inline void clear_bit32(long nr, volatile u32 *addr) -{ - clear_bit(nr, (volatile unsigned long *)addr); -} - -/* - * set_bit32 - Set a bit in memory for u32 array - * @nr: Bit to clear - * @addr: u32 * address of bitmap - * - * Same as set_bit, but avoids needing casts for u32 arrays. - */ - -static __always_inline void set_bit32(long nr, volatile u32 *addr) -{ - set_bit(nr, (volatile unsigned long *)addr); -} - #ifdef __KERNEL__
#ifndef set_mask_bits
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov bp@suse.de
commit c7da092a1f243bfd1bfb4124f538e69e941882da upstream.
... so that the difference is obvious.
No functionality change.
Signed-off-by: Borislav Petkov bp@suse.de Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/20171103102028.20284-1-bp@alien8.de Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/pgtable_types.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -200,10 +200,9 @@ enum page_cache_mode {
#define _PAGE_ENC (_AT(pteval_t, sme_me_mask))
-#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \ - _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_ENC) #define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \ _PAGE_DIRTY | _PAGE_ENC) +#define _PAGE_TABLE (_KERNPG_TABLE | _PAGE_USER)
#define __PAGE_KERNEL_ENC (__PAGE_KERNEL | _PAGE_ENC) #define __PAGE_KERNEL_ENC_WP (__PAGE_KERNEL_WP | _PAGE_ENC)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ingo Molnar mingo@kernel.org
commit acbc845ffefd9fb70466182cd8555a26189462b2 upstream.
Over the years asm/cpufeatures.h has become somewhat of a mess: the original tabulation style was too narrow, while x86 feature names also kept growing in length, creating frequent field width overflows.
Re-tabulate it to make it wider and easier to read/modify. Also harmonize the tabulation of the other defines in this file to match it.
Cc: Andrew Morton akpm@linux-foundation.org Cc: Andy Lutomirski luto@amacapital.net Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Brian Gerst brgerst@gmail.com Cc: Denys Vlasenko dvlasenk@redhat.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/20171031121723.28524-3-mingo@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/cpufeatures.h | 512 ++++++++++++++++++------------------- 1 file changed, 256 insertions(+), 256 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -13,8 +13,8 @@ /* * Defines x86 CPU feature bits */ -#define NCAPINTS 18 /* N 32-bit words worth of info */ -#define NBUGINTS 1 /* N 32-bit bug flags */ +#define NCAPINTS 18 /* N 32-bit words worth of info */ +#define NBUGINTS 1 /* N 32-bit bug flags */
/* * Note: If the comment begins with a quoted string, that string is used @@ -28,163 +28,163 @@ */
/* Intel-defined CPU features, CPUID level 0x00000001 (edx), word 0 */ -#define X86_FEATURE_FPU ( 0*32+ 0) /* Onboard FPU */ -#define X86_FEATURE_VME ( 0*32+ 1) /* Virtual Mode Extensions */ -#define X86_FEATURE_DE ( 0*32+ 2) /* Debugging Extensions */ -#define X86_FEATURE_PSE ( 0*32+ 3) /* Page Size Extensions */ -#define X86_FEATURE_TSC ( 0*32+ 4) /* Time Stamp Counter */ -#define X86_FEATURE_MSR ( 0*32+ 5) /* Model-Specific Registers */ -#define X86_FEATURE_PAE ( 0*32+ 6) /* Physical Address Extensions */ -#define X86_FEATURE_MCE ( 0*32+ 7) /* Machine Check Exception */ -#define X86_FEATURE_CX8 ( 0*32+ 8) /* CMPXCHG8 instruction */ -#define X86_FEATURE_APIC ( 0*32+ 9) /* Onboard APIC */ -#define X86_FEATURE_SEP ( 0*32+11) /* SYSENTER/SYSEXIT */ -#define X86_FEATURE_MTRR ( 0*32+12) /* Memory Type Range Registers */ -#define X86_FEATURE_PGE ( 0*32+13) /* Page Global Enable */ -#define X86_FEATURE_MCA ( 0*32+14) /* Machine Check Architecture */ -#define X86_FEATURE_CMOV ( 0*32+15) /* CMOV instructions */ +#define X86_FEATURE_FPU ( 0*32+ 0) /* Onboard FPU */ +#define X86_FEATURE_VME ( 0*32+ 1) /* Virtual Mode Extensions */ +#define X86_FEATURE_DE ( 0*32+ 2) /* Debugging Extensions */ +#define X86_FEATURE_PSE ( 0*32+ 3) /* Page Size Extensions */ +#define X86_FEATURE_TSC ( 0*32+ 4) /* Time Stamp Counter */ +#define X86_FEATURE_MSR ( 0*32+ 5) /* Model-Specific Registers */ +#define X86_FEATURE_PAE ( 0*32+ 6) /* Physical Address Extensions */ +#define X86_FEATURE_MCE ( 0*32+ 7) /* Machine Check Exception */ +#define X86_FEATURE_CX8 ( 0*32+ 8) /* CMPXCHG8 instruction */ +#define X86_FEATURE_APIC ( 0*32+ 9) /* Onboard APIC */ +#define X86_FEATURE_SEP ( 0*32+11) /* SYSENTER/SYSEXIT */ +#define X86_FEATURE_MTRR ( 0*32+12) /* Memory Type Range Registers */ +#define X86_FEATURE_PGE ( 0*32+13) /* Page Global Enable */ +#define X86_FEATURE_MCA ( 0*32+14) /* Machine Check Architecture */ +#define X86_FEATURE_CMOV ( 0*32+15) /* CMOV instructions */ /* (plus FCMOVcc, FCOMI with FPU) */ -#define X86_FEATURE_PAT ( 0*32+16) /* Page Attribute Table */ -#define X86_FEATURE_PSE36 ( 0*32+17) /* 36-bit PSEs */ -#define X86_FEATURE_PN ( 0*32+18) /* Processor serial number */ -#define X86_FEATURE_CLFLUSH ( 0*32+19) /* CLFLUSH instruction */ -#define X86_FEATURE_DS ( 0*32+21) /* "dts" Debug Store */ -#define X86_FEATURE_ACPI ( 0*32+22) /* ACPI via MSR */ -#define X86_FEATURE_MMX ( 0*32+23) /* Multimedia Extensions */ -#define X86_FEATURE_FXSR ( 0*32+24) /* FXSAVE/FXRSTOR, CR4.OSFXSR */ -#define X86_FEATURE_XMM ( 0*32+25) /* "sse" */ -#define X86_FEATURE_XMM2 ( 0*32+26) /* "sse2" */ -#define X86_FEATURE_SELFSNOOP ( 0*32+27) /* "ss" CPU self snoop */ -#define X86_FEATURE_HT ( 0*32+28) /* Hyper-Threading */ -#define X86_FEATURE_ACC ( 0*32+29) /* "tm" Automatic clock control */ -#define X86_FEATURE_IA64 ( 0*32+30) /* IA-64 processor */ -#define X86_FEATURE_PBE ( 0*32+31) /* Pending Break Enable */ +#define X86_FEATURE_PAT ( 0*32+16) /* Page Attribute Table */ +#define X86_FEATURE_PSE36 ( 0*32+17) /* 36-bit PSEs */ +#define X86_FEATURE_PN ( 0*32+18) /* Processor serial number */ +#define X86_FEATURE_CLFLUSH ( 0*32+19) /* CLFLUSH instruction */ +#define X86_FEATURE_DS ( 0*32+21) /* "dts" Debug Store */ +#define X86_FEATURE_ACPI ( 0*32+22) /* ACPI via MSR */ +#define X86_FEATURE_MMX ( 0*32+23) /* Multimedia Extensions */ +#define X86_FEATURE_FXSR ( 0*32+24) /* FXSAVE/FXRSTOR, CR4.OSFXSR */ +#define X86_FEATURE_XMM ( 0*32+25) /* "sse" */ +#define X86_FEATURE_XMM2 ( 0*32+26) /* "sse2" */ +#define X86_FEATURE_SELFSNOOP ( 0*32+27) /* "ss" CPU self snoop */ +#define X86_FEATURE_HT ( 0*32+28) /* Hyper-Threading */ +#define X86_FEATURE_ACC ( 0*32+29) /* "tm" Automatic clock control */ +#define X86_FEATURE_IA64 ( 0*32+30) /* IA-64 processor */ +#define X86_FEATURE_PBE ( 0*32+31) /* Pending Break Enable */
/* AMD-defined CPU features, CPUID level 0x80000001, word 1 */ /* Don't duplicate feature flags which are redundant with Intel! */ -#define X86_FEATURE_SYSCALL ( 1*32+11) /* SYSCALL/SYSRET */ -#define X86_FEATURE_MP ( 1*32+19) /* MP Capable. */ -#define X86_FEATURE_NX ( 1*32+20) /* Execute Disable */ -#define X86_FEATURE_MMXEXT ( 1*32+22) /* AMD MMX extensions */ -#define X86_FEATURE_FXSR_OPT ( 1*32+25) /* FXSAVE/FXRSTOR optimizations */ -#define X86_FEATURE_GBPAGES ( 1*32+26) /* "pdpe1gb" GB pages */ -#define X86_FEATURE_RDTSCP ( 1*32+27) /* RDTSCP */ -#define X86_FEATURE_LM ( 1*32+29) /* Long Mode (x86-64) */ -#define X86_FEATURE_3DNOWEXT ( 1*32+30) /* AMD 3DNow! extensions */ -#define X86_FEATURE_3DNOW ( 1*32+31) /* 3DNow! */ +#define X86_FEATURE_SYSCALL ( 1*32+11) /* SYSCALL/SYSRET */ +#define X86_FEATURE_MP ( 1*32+19) /* MP Capable. */ +#define X86_FEATURE_NX ( 1*32+20) /* Execute Disable */ +#define X86_FEATURE_MMXEXT ( 1*32+22) /* AMD MMX extensions */ +#define X86_FEATURE_FXSR_OPT ( 1*32+25) /* FXSAVE/FXRSTOR optimizations */ +#define X86_FEATURE_GBPAGES ( 1*32+26) /* "pdpe1gb" GB pages */ +#define X86_FEATURE_RDTSCP ( 1*32+27) /* RDTSCP */ +#define X86_FEATURE_LM ( 1*32+29) /* Long Mode (x86-64) */ +#define X86_FEATURE_3DNOWEXT ( 1*32+30) /* AMD 3DNow! extensions */ +#define X86_FEATURE_3DNOW ( 1*32+31) /* 3DNow! */
/* Transmeta-defined CPU features, CPUID level 0x80860001, word 2 */ -#define X86_FEATURE_RECOVERY ( 2*32+ 0) /* CPU in recovery mode */ -#define X86_FEATURE_LONGRUN ( 2*32+ 1) /* Longrun power control */ -#define X86_FEATURE_LRTI ( 2*32+ 3) /* LongRun table interface */ +#define X86_FEATURE_RECOVERY ( 2*32+ 0) /* CPU in recovery mode */ +#define X86_FEATURE_LONGRUN ( 2*32+ 1) /* Longrun power control */ +#define X86_FEATURE_LRTI ( 2*32+ 3) /* LongRun table interface */
/* Other features, Linux-defined mapping, word 3 */ /* This range is used for feature bits which conflict or are synthesized */ -#define X86_FEATURE_CXMMX ( 3*32+ 0) /* Cyrix MMX extensions */ -#define X86_FEATURE_K6_MTRR ( 3*32+ 1) /* AMD K6 nonstandard MTRRs */ -#define X86_FEATURE_CYRIX_ARR ( 3*32+ 2) /* Cyrix ARRs (= MTRRs) */ -#define X86_FEATURE_CENTAUR_MCR ( 3*32+ 3) /* Centaur MCRs (= MTRRs) */ +#define X86_FEATURE_CXMMX ( 3*32+ 0) /* Cyrix MMX extensions */ +#define X86_FEATURE_K6_MTRR ( 3*32+ 1) /* AMD K6 nonstandard MTRRs */ +#define X86_FEATURE_CYRIX_ARR ( 3*32+ 2) /* Cyrix ARRs (= MTRRs) */ +#define X86_FEATURE_CENTAUR_MCR ( 3*32+ 3) /* Centaur MCRs (= MTRRs) */ /* cpu types for specific tunings: */ -#define X86_FEATURE_K8 ( 3*32+ 4) /* "" Opteron, Athlon64 */ -#define X86_FEATURE_K7 ( 3*32+ 5) /* "" Athlon */ -#define X86_FEATURE_P3 ( 3*32+ 6) /* "" P3 */ -#define X86_FEATURE_P4 ( 3*32+ 7) /* "" P4 */ -#define X86_FEATURE_CONSTANT_TSC ( 3*32+ 8) /* TSC ticks at a constant rate */ -#define X86_FEATURE_UP ( 3*32+ 9) /* smp kernel running on up */ -#define X86_FEATURE_ART ( 3*32+10) /* Platform has always running timer (ART) */ -#define X86_FEATURE_ARCH_PERFMON ( 3*32+11) /* Intel Architectural PerfMon */ -#define X86_FEATURE_PEBS ( 3*32+12) /* Precise-Event Based Sampling */ -#define X86_FEATURE_BTS ( 3*32+13) /* Branch Trace Store */ -#define X86_FEATURE_SYSCALL32 ( 3*32+14) /* "" syscall in ia32 userspace */ -#define X86_FEATURE_SYSENTER32 ( 3*32+15) /* "" sysenter in ia32 userspace */ -#define X86_FEATURE_REP_GOOD ( 3*32+16) /* rep microcode works well */ -#define X86_FEATURE_MFENCE_RDTSC ( 3*32+17) /* "" Mfence synchronizes RDTSC */ -#define X86_FEATURE_LFENCE_RDTSC ( 3*32+18) /* "" Lfence synchronizes RDTSC */ -#define X86_FEATURE_ACC_POWER ( 3*32+19) /* AMD Accumulated Power Mechanism */ -#define X86_FEATURE_NOPL ( 3*32+20) /* The NOPL (0F 1F) instructions */ -#define X86_FEATURE_ALWAYS ( 3*32+21) /* "" Always-present feature */ -#define X86_FEATURE_XTOPOLOGY ( 3*32+22) /* cpu topology enum extensions */ -#define X86_FEATURE_TSC_RELIABLE ( 3*32+23) /* TSC is known to be reliable */ -#define X86_FEATURE_NONSTOP_TSC ( 3*32+24) /* TSC does not stop in C states */ -#define X86_FEATURE_CPUID ( 3*32+25) /* CPU has CPUID instruction itself */ -#define X86_FEATURE_EXTD_APICID ( 3*32+26) /* has extended APICID (8 bits) */ -#define X86_FEATURE_AMD_DCM ( 3*32+27) /* multi-node processor */ -#define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */ -#define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */ -#define X86_FEATURE_TSC_KNOWN_FREQ ( 3*32+31) /* TSC has known frequency */ +#define X86_FEATURE_K8 ( 3*32+ 4) /* "" Opteron, Athlon64 */ +#define X86_FEATURE_K7 ( 3*32+ 5) /* "" Athlon */ +#define X86_FEATURE_P3 ( 3*32+ 6) /* "" P3 */ +#define X86_FEATURE_P4 ( 3*32+ 7) /* "" P4 */ +#define X86_FEATURE_CONSTANT_TSC ( 3*32+ 8) /* TSC ticks at a constant rate */ +#define X86_FEATURE_UP ( 3*32+ 9) /* smp kernel running on up */ +#define X86_FEATURE_ART ( 3*32+10) /* Platform has always running timer (ART) */ +#define X86_FEATURE_ARCH_PERFMON ( 3*32+11) /* Intel Architectural PerfMon */ +#define X86_FEATURE_PEBS ( 3*32+12) /* Precise-Event Based Sampling */ +#define X86_FEATURE_BTS ( 3*32+13) /* Branch Trace Store */ +#define X86_FEATURE_SYSCALL32 ( 3*32+14) /* "" syscall in ia32 userspace */ +#define X86_FEATURE_SYSENTER32 ( 3*32+15) /* "" sysenter in ia32 userspace */ +#define X86_FEATURE_REP_GOOD ( 3*32+16) /* rep microcode works well */ +#define X86_FEATURE_MFENCE_RDTSC ( 3*32+17) /* "" Mfence synchronizes RDTSC */ +#define X86_FEATURE_LFENCE_RDTSC ( 3*32+18) /* "" Lfence synchronizes RDTSC */ +#define X86_FEATURE_ACC_POWER ( 3*32+19) /* AMD Accumulated Power Mechanism */ +#define X86_FEATURE_NOPL ( 3*32+20) /* The NOPL (0F 1F) instructions */ +#define X86_FEATURE_ALWAYS ( 3*32+21) /* "" Always-present feature */ +#define X86_FEATURE_XTOPOLOGY ( 3*32+22) /* cpu topology enum extensions */ +#define X86_FEATURE_TSC_RELIABLE ( 3*32+23) /* TSC is known to be reliable */ +#define X86_FEATURE_NONSTOP_TSC ( 3*32+24) /* TSC does not stop in C states */ +#define X86_FEATURE_CPUID ( 3*32+25) /* CPU has CPUID instruction itself */ +#define X86_FEATURE_EXTD_APICID ( 3*32+26) /* has extended APICID (8 bits) */ +#define X86_FEATURE_AMD_DCM ( 3*32+27) /* multi-node processor */ +#define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */ +#define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */ +#define X86_FEATURE_TSC_KNOWN_FREQ ( 3*32+31) /* TSC has known frequency */
/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */ -#define X86_FEATURE_XMM3 ( 4*32+ 0) /* "pni" SSE-3 */ -#define X86_FEATURE_PCLMULQDQ ( 4*32+ 1) /* PCLMULQDQ instruction */ -#define X86_FEATURE_DTES64 ( 4*32+ 2) /* 64-bit Debug Store */ -#define X86_FEATURE_MWAIT ( 4*32+ 3) /* "monitor" Monitor/Mwait support */ -#define X86_FEATURE_DSCPL ( 4*32+ 4) /* "ds_cpl" CPL Qual. Debug Store */ -#define X86_FEATURE_VMX ( 4*32+ 5) /* Hardware virtualization */ -#define X86_FEATURE_SMX ( 4*32+ 6) /* Safer mode */ -#define X86_FEATURE_EST ( 4*32+ 7) /* Enhanced SpeedStep */ -#define X86_FEATURE_TM2 ( 4*32+ 8) /* Thermal Monitor 2 */ -#define X86_FEATURE_SSSE3 ( 4*32+ 9) /* Supplemental SSE-3 */ -#define X86_FEATURE_CID ( 4*32+10) /* Context ID */ -#define X86_FEATURE_SDBG ( 4*32+11) /* Silicon Debug */ -#define X86_FEATURE_FMA ( 4*32+12) /* Fused multiply-add */ -#define X86_FEATURE_CX16 ( 4*32+13) /* CMPXCHG16B */ -#define X86_FEATURE_XTPR ( 4*32+14) /* Send Task Priority Messages */ -#define X86_FEATURE_PDCM ( 4*32+15) /* Performance Capabilities */ -#define X86_FEATURE_PCID ( 4*32+17) /* Process Context Identifiers */ -#define X86_FEATURE_DCA ( 4*32+18) /* Direct Cache Access */ -#define X86_FEATURE_XMM4_1 ( 4*32+19) /* "sse4_1" SSE-4.1 */ -#define X86_FEATURE_XMM4_2 ( 4*32+20) /* "sse4_2" SSE-4.2 */ -#define X86_FEATURE_X2APIC ( 4*32+21) /* x2APIC */ -#define X86_FEATURE_MOVBE ( 4*32+22) /* MOVBE instruction */ -#define X86_FEATURE_POPCNT ( 4*32+23) /* POPCNT instruction */ +#define X86_FEATURE_XMM3 ( 4*32+ 0) /* "pni" SSE-3 */ +#define X86_FEATURE_PCLMULQDQ ( 4*32+ 1) /* PCLMULQDQ instruction */ +#define X86_FEATURE_DTES64 ( 4*32+ 2) /* 64-bit Debug Store */ +#define X86_FEATURE_MWAIT ( 4*32+ 3) /* "monitor" Monitor/Mwait support */ +#define X86_FEATURE_DSCPL ( 4*32+ 4) /* "ds_cpl" CPL Qual. Debug Store */
+#define X86_FEATURE_VMX ( 4*32+ 5) /* Hardware virtualization */ +#define X86_FEATURE_SMX ( 4*32+ 6) /* Safer mode */ +#define X86_FEATURE_EST ( 4*32+ 7) /* Enhanced SpeedStep */ +#define X86_FEATURE_TM2 ( 4*32+ 8) /* Thermal Monitor 2 */ +#define X86_FEATURE_SSSE3 ( 4*32+ 9) /* Supplemental SSE-3 */ +#define X86_FEATURE_CID ( 4*32+10) /* Context ID */ +#define X86_FEATURE_SDBG ( 4*32+11) /* Silicon Debug */ +#define X86_FEATURE_FMA ( 4*32+12) /* Fused multiply-add */ +#define X86_FEATURE_CX16 ( 4*32+13) /* CMPXCHG16B */ +#define X86_FEATURE_XTPR ( 4*32+14) /* Send Task Priority Messages */ +#define X86_FEATURE_PDCM ( 4*32+15) /* Performance Capabilities */ +#define X86_FEATURE_PCID ( 4*32+17) /* Process Context Identifiers */ +#define X86_FEATURE_DCA ( 4*32+18) /* Direct Cache Access */ +#define X86_FEATURE_XMM4_1 ( 4*32+19) /* "sse4_1" SSE-4.1 */ +#define X86_FEATURE_XMM4_2 ( 4*32+20) /* "sse4_2" SSE-4.2 */ +#define X86_FEATURE_X2APIC ( 4*32+21) /* x2APIC */ +#define X86_FEATURE_MOVBE ( 4*32+22) /* MOVBE instruction */ +#define X86_FEATURE_POPCNT ( 4*32+23) /* POPCNT instruction */ #define X86_FEATURE_TSC_DEADLINE_TIMER ( 4*32+24) /* Tsc deadline timer */ -#define X86_FEATURE_AES ( 4*32+25) /* AES instructions */ -#define X86_FEATURE_XSAVE ( 4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV */ -#define X86_FEATURE_OSXSAVE ( 4*32+27) /* "" XSAVE enabled in the OS */ -#define X86_FEATURE_AVX ( 4*32+28) /* Advanced Vector Extensions */ -#define X86_FEATURE_F16C ( 4*32+29) /* 16-bit fp conversions */ -#define X86_FEATURE_RDRAND ( 4*32+30) /* The RDRAND instruction */ -#define X86_FEATURE_HYPERVISOR ( 4*32+31) /* Running on a hypervisor */ +#define X86_FEATURE_AES ( 4*32+25) /* AES instructions */ +#define X86_FEATURE_XSAVE ( 4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV */ +#define X86_FEATURE_OSXSAVE ( 4*32+27) /* "" XSAVE enabled in the OS */ +#define X86_FEATURE_AVX ( 4*32+28) /* Advanced Vector Extensions */ +#define X86_FEATURE_F16C ( 4*32+29) /* 16-bit fp conversions */ +#define X86_FEATURE_RDRAND ( 4*32+30) /* The RDRAND instruction */ +#define X86_FEATURE_HYPERVISOR ( 4*32+31) /* Running on a hypervisor */
/* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */ -#define X86_FEATURE_XSTORE ( 5*32+ 2) /* "rng" RNG present (xstore) */ -#define X86_FEATURE_XSTORE_EN ( 5*32+ 3) /* "rng_en" RNG enabled */ -#define X86_FEATURE_XCRYPT ( 5*32+ 6) /* "ace" on-CPU crypto (xcrypt) */ -#define X86_FEATURE_XCRYPT_EN ( 5*32+ 7) /* "ace_en" on-CPU crypto enabled */ -#define X86_FEATURE_ACE2 ( 5*32+ 8) /* Advanced Cryptography Engine v2 */ -#define X86_FEATURE_ACE2_EN ( 5*32+ 9) /* ACE v2 enabled */ -#define X86_FEATURE_PHE ( 5*32+10) /* PadLock Hash Engine */ -#define X86_FEATURE_PHE_EN ( 5*32+11) /* PHE enabled */ -#define X86_FEATURE_PMM ( 5*32+12) /* PadLock Montgomery Multiplier */ -#define X86_FEATURE_PMM_EN ( 5*32+13) /* PMM enabled */ +#define X86_FEATURE_XSTORE ( 5*32+ 2) /* "rng" RNG present (xstore) */ +#define X86_FEATURE_XSTORE_EN ( 5*32+ 3) /* "rng_en" RNG enabled */ +#define X86_FEATURE_XCRYPT ( 5*32+ 6) /* "ace" on-CPU crypto (xcrypt) */ +#define X86_FEATURE_XCRYPT_EN ( 5*32+ 7) /* "ace_en" on-CPU crypto enabled */ +#define X86_FEATURE_ACE2 ( 5*32+ 8) /* Advanced Cryptography Engine v2 */ +#define X86_FEATURE_ACE2_EN ( 5*32+ 9) /* ACE v2 enabled */ +#define X86_FEATURE_PHE ( 5*32+10) /* PadLock Hash Engine */ +#define X86_FEATURE_PHE_EN ( 5*32+11) /* PHE enabled */ +#define X86_FEATURE_PMM ( 5*32+12) /* PadLock Montgomery Multiplier */ +#define X86_FEATURE_PMM_EN ( 5*32+13) /* PMM enabled */
/* More extended AMD flags: CPUID level 0x80000001, ecx, word 6 */ -#define X86_FEATURE_LAHF_LM ( 6*32+ 0) /* LAHF/SAHF in long mode */ -#define X86_FEATURE_CMP_LEGACY ( 6*32+ 1) /* If yes HyperThreading not valid */ -#define X86_FEATURE_SVM ( 6*32+ 2) /* Secure virtual machine */ -#define X86_FEATURE_EXTAPIC ( 6*32+ 3) /* Extended APIC space */ -#define X86_FEATURE_CR8_LEGACY ( 6*32+ 4) /* CR8 in 32-bit mode */ -#define X86_FEATURE_ABM ( 6*32+ 5) /* Advanced bit manipulation */ -#define X86_FEATURE_SSE4A ( 6*32+ 6) /* SSE-4A */ -#define X86_FEATURE_MISALIGNSSE ( 6*32+ 7) /* Misaligned SSE mode */ -#define X86_FEATURE_3DNOWPREFETCH ( 6*32+ 8) /* 3DNow prefetch instructions */ -#define X86_FEATURE_OSVW ( 6*32+ 9) /* OS Visible Workaround */ -#define X86_FEATURE_IBS ( 6*32+10) /* Instruction Based Sampling */ -#define X86_FEATURE_XOP ( 6*32+11) /* extended AVX instructions */ -#define X86_FEATURE_SKINIT ( 6*32+12) /* SKINIT/STGI instructions */ -#define X86_FEATURE_WDT ( 6*32+13) /* Watchdog timer */ -#define X86_FEATURE_LWP ( 6*32+15) /* Light Weight Profiling */ -#define X86_FEATURE_FMA4 ( 6*32+16) /* 4 operands MAC instructions */ -#define X86_FEATURE_TCE ( 6*32+17) /* translation cache extension */ -#define X86_FEATURE_NODEID_MSR ( 6*32+19) /* NodeId MSR */ -#define X86_FEATURE_TBM ( 6*32+21) /* trailing bit manipulations */ -#define X86_FEATURE_TOPOEXT ( 6*32+22) /* topology extensions CPUID leafs */ -#define X86_FEATURE_PERFCTR_CORE ( 6*32+23) /* core performance counter extensions */ -#define X86_FEATURE_PERFCTR_NB ( 6*32+24) /* NB performance counter extensions */ -#define X86_FEATURE_BPEXT (6*32+26) /* data breakpoint extension */ -#define X86_FEATURE_PTSC ( 6*32+27) /* performance time-stamp counter */ -#define X86_FEATURE_PERFCTR_LLC ( 6*32+28) /* Last Level Cache performance counter extensions */ -#define X86_FEATURE_MWAITX ( 6*32+29) /* MWAIT extension (MONITORX/MWAITX) */ +#define X86_FEATURE_LAHF_LM ( 6*32+ 0) /* LAHF/SAHF in long mode */ +#define X86_FEATURE_CMP_LEGACY ( 6*32+ 1) /* If yes HyperThreading not valid */ +#define X86_FEATURE_SVM ( 6*32+ 2) /* Secure virtual machine */ +#define X86_FEATURE_EXTAPIC ( 6*32+ 3) /* Extended APIC space */ +#define X86_FEATURE_CR8_LEGACY ( 6*32+ 4) /* CR8 in 32-bit mode */ +#define X86_FEATURE_ABM ( 6*32+ 5) /* Advanced bit manipulation */ +#define X86_FEATURE_SSE4A ( 6*32+ 6) /* SSE-4A */ +#define X86_FEATURE_MISALIGNSSE ( 6*32+ 7) /* Misaligned SSE mode */ +#define X86_FEATURE_3DNOWPREFETCH ( 6*32+ 8) /* 3DNow prefetch instructions */ +#define X86_FEATURE_OSVW ( 6*32+ 9) /* OS Visible Workaround */ +#define X86_FEATURE_IBS ( 6*32+10) /* Instruction Based Sampling */ +#define X86_FEATURE_XOP ( 6*32+11) /* extended AVX instructions */ +#define X86_FEATURE_SKINIT ( 6*32+12) /* SKINIT/STGI instructions */ +#define X86_FEATURE_WDT ( 6*32+13) /* Watchdog timer */ +#define X86_FEATURE_LWP ( 6*32+15) /* Light Weight Profiling */ +#define X86_FEATURE_FMA4 ( 6*32+16) /* 4 operands MAC instructions */ +#define X86_FEATURE_TCE ( 6*32+17) /* translation cache extension */ +#define X86_FEATURE_NODEID_MSR ( 6*32+19) /* NodeId MSR */ +#define X86_FEATURE_TBM ( 6*32+21) /* trailing bit manipulations */ +#define X86_FEATURE_TOPOEXT ( 6*32+22) /* topology extensions CPUID leafs */ +#define X86_FEATURE_PERFCTR_CORE ( 6*32+23) /* core performance counter extensions */ +#define X86_FEATURE_PERFCTR_NB ( 6*32+24) /* NB performance counter extensions */ +#define X86_FEATURE_BPEXT (6*32+26) /* data breakpoint extension */ +#define X86_FEATURE_PTSC ( 6*32+27) /* performance time-stamp counter */
+#define X86_FEATURE_PERFCTR_LLC ( 6*32+28) /* Last Level Cache performance counter extensions */ +#define X86_FEATURE_MWAITX ( 6*32+29) /* MWAIT extension (MONITORX/MWAITX) */
/* * Auxiliary flags: Linux defined - For features scattered in various @@ -192,152 +192,152 @@ * * Reuse free bits when adding new feature flags! */ -#define X86_FEATURE_RING3MWAIT ( 7*32+ 0) /* Ring 3 MONITOR/MWAIT */ -#define X86_FEATURE_CPUID_FAULT ( 7*32+ 1) /* Intel CPUID faulting */ -#define X86_FEATURE_CPB ( 7*32+ 2) /* AMD Core Performance Boost */ -#define X86_FEATURE_EPB ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */ -#define X86_FEATURE_CAT_L3 ( 7*32+ 4) /* Cache Allocation Technology L3 */ -#define X86_FEATURE_CAT_L2 ( 7*32+ 5) /* Cache Allocation Technology L2 */ -#define X86_FEATURE_CDP_L3 ( 7*32+ 6) /* Code and Data Prioritization L3 */ - -#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */ -#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */ -#define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */ - -#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */ -#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */ -#define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */ -#define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */ +#define X86_FEATURE_RING3MWAIT ( 7*32+ 0) /* Ring 3 MONITOR/MWAIT */ +#define X86_FEATURE_CPUID_FAULT ( 7*32+ 1) /* Intel CPUID faulting */ +#define X86_FEATURE_CPB ( 7*32+ 2) /* AMD Core Performance Boost */ +#define X86_FEATURE_EPB ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */ +#define X86_FEATURE_CAT_L3 ( 7*32+ 4) /* Cache Allocation Technology L3 */ +#define X86_FEATURE_CAT_L2 ( 7*32+ 5) /* Cache Allocation Technology L2 */ +#define X86_FEATURE_CDP_L3 ( 7*32+ 6) /* Code and Data Prioritization L3 */ + +#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */ +#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */ +#define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */ + +#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */ +#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */ +#define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */ +#define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */
-#define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */ +#define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */
/* Virtualization flags: Linux defined, word 8 */ -#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */ -#define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */ -#define X86_FEATURE_FLEXPRIORITY ( 8*32+ 2) /* Intel FlexPriority */ -#define X86_FEATURE_EPT ( 8*32+ 3) /* Intel Extended Page Table */ -#define X86_FEATURE_VPID ( 8*32+ 4) /* Intel Virtual Processor ID */ +#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */ +#define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */ +#define X86_FEATURE_FLEXPRIORITY ( 8*32+ 2) /* Intel FlexPriority */ +#define X86_FEATURE_EPT ( 8*32+ 3) /* Intel Extended Page Table */ +#define X86_FEATURE_VPID ( 8*32+ 4) /* Intel Virtual Processor ID */
-#define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer vmmcall to vmcall */ -#define X86_FEATURE_XENPV ( 8*32+16) /* "" Xen paravirtual guest */ +#define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer vmmcall to vmcall */ +#define X86_FEATURE_XENPV ( 8*32+16) /* "" Xen paravirtual guest */
/* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */ -#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/ -#define X86_FEATURE_TSC_ADJUST ( 9*32+ 1) /* TSC adjustment MSR 0x3b */ -#define X86_FEATURE_BMI1 ( 9*32+ 3) /* 1st group bit manipulation extensions */ -#define X86_FEATURE_HLE ( 9*32+ 4) /* Hardware Lock Elision */ -#define X86_FEATURE_AVX2 ( 9*32+ 5) /* AVX2 instructions */ -#define X86_FEATURE_SMEP ( 9*32+ 7) /* Supervisor Mode Execution Protection */ -#define X86_FEATURE_BMI2 ( 9*32+ 8) /* 2nd group bit manipulation extensions */ -#define X86_FEATURE_ERMS ( 9*32+ 9) /* Enhanced REP MOVSB/STOSB */ -#define X86_FEATURE_INVPCID ( 9*32+10) /* Invalidate Processor Context ID */ -#define X86_FEATURE_RTM ( 9*32+11) /* Restricted Transactional Memory */ -#define X86_FEATURE_CQM ( 9*32+12) /* Cache QoS Monitoring */ -#define X86_FEATURE_MPX ( 9*32+14) /* Memory Protection Extension */ -#define X86_FEATURE_RDT_A ( 9*32+15) /* Resource Director Technology Allocation */ -#define X86_FEATURE_AVX512F ( 9*32+16) /* AVX-512 Foundation */ -#define X86_FEATURE_AVX512DQ ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */ -#define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */ -#define X86_FEATURE_ADX ( 9*32+19) /* The ADCX and ADOX instructions */ -#define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */ -#define X86_FEATURE_AVX512IFMA ( 9*32+21) /* AVX-512 Integer Fused Multiply-Add instructions */ -#define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */ -#define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */ -#define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */ -#define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */ -#define X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */ -#define X86_FEATURE_SHA_NI ( 9*32+29) /* SHA1/SHA256 Instruction Extensions */ -#define X86_FEATURE_AVX512BW ( 9*32+30) /* AVX-512 BW (Byte/Word granular) Instructions */ -#define X86_FEATURE_AVX512VL ( 9*32+31) /* AVX-512 VL (128/256 Vector Length) Extensions */ +#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/ +#define X86_FEATURE_TSC_ADJUST ( 9*32+ 1) /* TSC adjustment MSR 0x3b */ +#define X86_FEATURE_BMI1 ( 9*32+ 3) /* 1st group bit manipulation extensions */ +#define X86_FEATURE_HLE ( 9*32+ 4) /* Hardware Lock Elision */ +#define X86_FEATURE_AVX2 ( 9*32+ 5) /* AVX2 instructions */ +#define X86_FEATURE_SMEP ( 9*32+ 7) /* Supervisor Mode Execution Protection */ +#define X86_FEATURE_BMI2 ( 9*32+ 8) /* 2nd group bit manipulation extensions */ +#define X86_FEATURE_ERMS ( 9*32+ 9) /* Enhanced REP MOVSB/STOSB */ +#define X86_FEATURE_INVPCID ( 9*32+10) /* Invalidate Processor Context ID */ +#define X86_FEATURE_RTM ( 9*32+11) /* Restricted Transactional Memory */ +#define X86_FEATURE_CQM ( 9*32+12) /* Cache QoS Monitoring */ +#define X86_FEATURE_MPX ( 9*32+14) /* Memory Protection Extension */ +#define X86_FEATURE_RDT_A ( 9*32+15) /* Resource Director Technology Allocation */ +#define X86_FEATURE_AVX512F ( 9*32+16) /* AVX-512 Foundation */ +#define X86_FEATURE_AVX512DQ ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */ +#define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */ +#define X86_FEATURE_ADX ( 9*32+19) /* The ADCX and ADOX instructions */ +#define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */ +#define X86_FEATURE_AVX512IFMA ( 9*32+21) /* AVX-512 Integer Fused Multiply-Add instructions */ 
+#define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */ +#define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */ +#define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */ +#define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */ +#define X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */ +#define X86_FEATURE_SHA_NI ( 9*32+29) /* SHA1/SHA256 Instruction Extensions */ +#define X86_FEATURE_AVX512BW ( 9*32+30) /* AVX-512 BW (Byte/Word granular) Instructions */ +#define X86_FEATURE_AVX512VL ( 9*32+31) /* AVX-512 VL (128/256 Vector Length) Extensions */
/* Extended state features, CPUID level 0x0000000d:1 (eax), word 10 */ -#define X86_FEATURE_XSAVEOPT (10*32+ 0) /* XSAVEOPT */ -#define X86_FEATURE_XSAVEC (10*32+ 1) /* XSAVEC */ -#define X86_FEATURE_XGETBV1 (10*32+ 2) /* XGETBV with ECX = 1 */ -#define X86_FEATURE_XSAVES (10*32+ 3) /* XSAVES/XRSTORS */ +#define X86_FEATURE_XSAVEOPT (10*32+ 0) /* XSAVEOPT */ +#define X86_FEATURE_XSAVEC (10*32+ 1) /* XSAVEC */ +#define X86_FEATURE_XGETBV1 (10*32+ 2) /* XGETBV with ECX = 1 */ +#define X86_FEATURE_XSAVES (10*32+ 3) /* XSAVES/XRSTORS */
/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:0 (edx), word 11 */ -#define X86_FEATURE_CQM_LLC (11*32+ 1) /* LLC QoS if 1 */ +#define X86_FEATURE_CQM_LLC (11*32+ 1) /* LLC QoS if 1 */
/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (edx), word 12 */ -#define X86_FEATURE_CQM_OCCUP_LLC (12*32+ 0) /* LLC occupancy monitoring if 1 */ -#define X86_FEATURE_CQM_MBM_TOTAL (12*32+ 1) /* LLC Total MBM monitoring */ -#define X86_FEATURE_CQM_MBM_LOCAL (12*32+ 2) /* LLC Local MBM monitoring */ +#define X86_FEATURE_CQM_OCCUP_LLC (12*32+ 0) /* LLC occupancy monitoring if 1 */ +#define X86_FEATURE_CQM_MBM_TOTAL (12*32+ 1) /* LLC Total MBM monitoring */ +#define X86_FEATURE_CQM_MBM_LOCAL (12*32+ 2) /* LLC Local MBM monitoring */
/* AMD-defined CPU features, CPUID level 0x80000008 (ebx), word 13 */ -#define X86_FEATURE_CLZERO (13*32+0) /* CLZERO instruction */ -#define X86_FEATURE_IRPERF (13*32+1) /* Instructions Retired Count */ +#define X86_FEATURE_CLZERO (13*32+0) /* CLZERO instruction */ +#define X86_FEATURE_IRPERF (13*32+1) /* Instructions Retired Count */
/* Thermal and Power Management Leaf, CPUID level 0x00000006 (eax), word 14 */ -#define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */ -#define X86_FEATURE_IDA (14*32+ 1) /* Intel Dynamic Acceleration */ -#define X86_FEATURE_ARAT (14*32+ 2) /* Always Running APIC Timer */ -#define X86_FEATURE_PLN (14*32+ 4) /* Intel Power Limit Notification */ -#define X86_FEATURE_PTS (14*32+ 6) /* Intel Package Thermal Status */ -#define X86_FEATURE_HWP (14*32+ 7) /* Intel Hardware P-states */ -#define X86_FEATURE_HWP_NOTIFY (14*32+ 8) /* HWP Notification */ -#define X86_FEATURE_HWP_ACT_WINDOW (14*32+ 9) /* HWP Activity Window */ -#define X86_FEATURE_HWP_EPP (14*32+10) /* HWP Energy Perf. Preference */ -#define X86_FEATURE_HWP_PKG_REQ (14*32+11) /* HWP Package Level Request */ +#define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */ +#define X86_FEATURE_IDA (14*32+ 1) /* Intel Dynamic Acceleration */ +#define X86_FEATURE_ARAT (14*32+ 2) /* Always Running APIC Timer */ +#define X86_FEATURE_PLN (14*32+ 4) /* Intel Power Limit Notification */ +#define X86_FEATURE_PTS (14*32+ 6) /* Intel Package Thermal Status */ +#define X86_FEATURE_HWP (14*32+ 7) /* Intel Hardware P-states */ +#define X86_FEATURE_HWP_NOTIFY (14*32+ 8) /* HWP Notification */ +#define X86_FEATURE_HWP_ACT_WINDOW (14*32+ 9) /* HWP Activity Window */ +#define X86_FEATURE_HWP_EPP (14*32+10) /* HWP Energy Perf. Preference */ +#define X86_FEATURE_HWP_PKG_REQ (14*32+11) /* HWP Package Level Request */
/* AMD SVM Feature Identification, CPUID level 0x8000000a (edx), word 15 */ -#define X86_FEATURE_NPT (15*32+ 0) /* Nested Page Table support */ -#define X86_FEATURE_LBRV (15*32+ 1) /* LBR Virtualization support */ -#define X86_FEATURE_SVML (15*32+ 2) /* "svm_lock" SVM locking MSR */ -#define X86_FEATURE_NRIPS (15*32+ 3) /* "nrip_save" SVM next_rip save */ -#define X86_FEATURE_TSCRATEMSR (15*32+ 4) /* "tsc_scale" TSC scaling support */ -#define X86_FEATURE_VMCBCLEAN (15*32+ 5) /* "vmcb_clean" VMCB clean bits support */ -#define X86_FEATURE_FLUSHBYASID (15*32+ 6) /* flush-by-ASID support */ -#define X86_FEATURE_DECODEASSISTS (15*32+ 7) /* Decode Assists support */ -#define X86_FEATURE_PAUSEFILTER (15*32+10) /* filtered pause intercept */ -#define X86_FEATURE_PFTHRESHOLD (15*32+12) /* pause filter threshold */ -#define X86_FEATURE_AVIC (15*32+13) /* Virtual Interrupt Controller */ -#define X86_FEATURE_V_VMSAVE_VMLOAD (15*32+15) /* Virtual VMSAVE VMLOAD */ -#define X86_FEATURE_VGIF (15*32+16) /* Virtual GIF */ +#define X86_FEATURE_NPT (15*32+ 0) /* Nested Page Table support */ +#define X86_FEATURE_LBRV (15*32+ 1) /* LBR Virtualization support */ +#define X86_FEATURE_SVML (15*32+ 2) /* "svm_lock" SVM locking MSR */ +#define X86_FEATURE_NRIPS (15*32+ 3) /* "nrip_save" SVM next_rip save */ +#define X86_FEATURE_TSCRATEMSR (15*32+ 4) /* "tsc_scale" TSC scaling support */ +#define X86_FEATURE_VMCBCLEAN (15*32+ 5) /* "vmcb_clean" VMCB clean bits support */ +#define X86_FEATURE_FLUSHBYASID (15*32+ 6) /* flush-by-ASID support */ +#define X86_FEATURE_DECODEASSISTS (15*32+ 7) /* Decode Assists support */ +#define X86_FEATURE_PAUSEFILTER (15*32+10) /* filtered pause intercept */ +#define X86_FEATURE_PFTHRESHOLD (15*32+12) /* pause filter threshold */ +#define X86_FEATURE_AVIC (15*32+13) /* Virtual Interrupt Controller */ +#define X86_FEATURE_V_VMSAVE_VMLOAD (15*32+15) /* Virtual VMSAVE VMLOAD */ +#define X86_FEATURE_VGIF (15*32+16) /* Virtual GIF */
/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */ -#define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/ -#define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */ -#define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ -#define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */ -#define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */ -#define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */ -#define X86_FEATURE_VPCLMULQDQ (16*32+ 10) /* Carry-Less Multiplication Double Quadword */ -#define X86_FEATURE_AVX512_VNNI (16*32+ 11) /* Vector Neural Network Instructions */ -#define X86_FEATURE_AVX512_BITALG (16*32+12) /* Support for VPOPCNT[B,W] and VPSHUF-BITQMB */ -#define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */ -#define X86_FEATURE_LA57 (16*32+16) /* 5-level page tables */ -#define X86_FEATURE_RDPID (16*32+22) /* RDPID instruction */ +#define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/ +#define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */ +#define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ +#define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */ +#define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */ +#define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */ +#define X86_FEATURE_VPCLMULQDQ (16*32+ 10) /* Carry-Less Multiplication Double Quadword */ +#define X86_FEATURE_AVX512_VNNI (16*32+ 11) /* Vector Neural Network Instructions */ +#define X86_FEATURE_AVX512_BITALG (16*32+12) /* Support for VPOPCNT[B,W] and VPSHUF-BITQMB */ +#define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */ +#define X86_FEATURE_LA57 (16*32+16) /* 5-level page tables */ +#define X86_FEATURE_RDPID (16*32+22) /* RDPID instruction */
/* AMD-defined CPU features, CPUID level 0x80000007 (ebx), word 17 */ -#define X86_FEATURE_OVERFLOW_RECOV (17*32+0) /* MCA overflow recovery support */ -#define X86_FEATURE_SUCCOR (17*32+1) /* Uncorrectable error containment and recovery */ -#define X86_FEATURE_SMCA (17*32+3) /* Scalable MCA */ +#define X86_FEATURE_OVERFLOW_RECOV (17*32+0) /* MCA overflow recovery support */ +#define X86_FEATURE_SUCCOR (17*32+1) /* Uncorrectable error containment and recovery */ +#define X86_FEATURE_SMCA (17*32+3) /* Scalable MCA */
/* * BUG word(s) */ -#define X86_BUG(x) (NCAPINTS*32 + (x)) +#define X86_BUG(x) (NCAPINTS*32 + (x))
-#define X86_BUG_F00F X86_BUG(0) /* Intel F00F */ -#define X86_BUG_FDIV X86_BUG(1) /* FPU FDIV */ -#define X86_BUG_COMA X86_BUG(2) /* Cyrix 6x86 coma */ -#define X86_BUG_AMD_TLB_MMATCH X86_BUG(3) /* "tlb_mmatch" AMD Erratum 383 */ -#define X86_BUG_AMD_APIC_C1E X86_BUG(4) /* "apic_c1e" AMD Erratum 400 */ -#define X86_BUG_11AP X86_BUG(5) /* Bad local APIC aka 11AP */ -#define X86_BUG_FXSAVE_LEAK X86_BUG(6) /* FXSAVE leaks FOP/FIP/FOP */ -#define X86_BUG_CLFLUSH_MONITOR X86_BUG(7) /* AAI65, CLFLUSH required before MONITOR */ -#define X86_BUG_SYSRET_SS_ATTRS X86_BUG(8) /* SYSRET doesn't fix up SS attrs */ +#define X86_BUG_F00F X86_BUG(0) /* Intel F00F */ +#define X86_BUG_FDIV X86_BUG(1) /* FPU FDIV */ +#define X86_BUG_COMA X86_BUG(2) /* Cyrix 6x86 coma */ +#define X86_BUG_AMD_TLB_MMATCH X86_BUG(3) /* "tlb_mmatch" AMD Erratum 383 */ +#define X86_BUG_AMD_APIC_C1E X86_BUG(4) /* "apic_c1e" AMD Erratum 400 */ +#define X86_BUG_11AP X86_BUG(5) /* Bad local APIC aka 11AP */ +#define X86_BUG_FXSAVE_LEAK X86_BUG(6) /* FXSAVE leaks FOP/FIP/FOP */ +#define X86_BUG_CLFLUSH_MONITOR X86_BUG(7) /* AAI65, CLFLUSH required before MONITOR */ +#define X86_BUG_SYSRET_SS_ATTRS X86_BUG(8) /* SYSRET doesn't fix up SS attrs */ #ifdef CONFIG_X86_32 /* * 64-bit kernels don't use X86_BUG_ESPFIX. Make the define conditional * to avoid confusion. */ -#define X86_BUG_ESPFIX X86_BUG(9) /* "" IRET to 16-bit SS corrupts ESP/RSP high bits */ +#define X86_BUG_ESPFIX X86_BUG(9) /* "" IRET to 16-bit SS corrupts ESP/RSP high bits */ #endif -#define X86_BUG_NULL_SEG X86_BUG(10) /* Nulling a selector preserves the base */ -#define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */ -#define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */ -#define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */ +#define X86_BUG_NULL_SEG X86_BUG(10) /* Nulling a selector preserves the base */ +#define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */ +#define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */ +#define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */ #endif /* _ASM_X86_CPUFEATURES_H */
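An aside on the encoding used throughout this header: each X86_FEATURE_* value packs a cpufeature word index and a bit position as (word*32 + bit), and the X86_BUG() bits simply start after the last capability word. A minimal stand-alone sketch of the decoding (illustrative only, not part of the patch; assumes NCAPINTS == 18, matching words 0-17 in this header):

#include <stdio.h>

#define NCAPINTS            18              /* number of capability words (0-17) here */
#define X86_FEATURE_AVX512F (9*32 + 16)     /* word 9, bit 16, as defined above */
#define X86_BUG(x)          (NCAPINTS*32 + (x))

int main(void)
{
	unsigned int f = X86_FEATURE_AVX512F;

	/* Recover the cpufeature word and the bit within that word. */
	printf("word %u, bit %u\n", f / 32, f % 32);

	/* BUG bits live past the capability words: X86_BUG(0) == 576. */
	printf("X86_BUG(0) = %u\n", (unsigned int)X86_BUG(0));
	return 0;
}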
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ingo Molnar mingo@kernel.org
commit f3a624e901c633593156f7b00ca743a6204a29bc upstream.
Kept this commit separate from the re-tabulation changes, to make the changes easier to review:
- add better explanation for entries with no explanation
- fix/enhance the text of some of the entries
- fix the vertical alignment of some of the feature number definitions
- fix inconsistent capitalization
- ... and lots of other small details
i.e. make it all more of a coherent unit, instead of a patchwork of years of additions.
Cc: Andrew Morton akpm@linux-foundation.org
Cc: Andy Lutomirski luto@amacapital.net
Cc: Andy Lutomirski luto@kernel.org
Cc: Borislav Petkov bp@alien8.de
Cc: Brian Gerst brgerst@gmail.com
Cc: Denys Vlasenko dvlasenk@redhat.com
Cc: Josh Poimboeuf jpoimboe@redhat.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Link: http://lkml.kernel.org/r/20171031121723.28524-4-mingo@kernel.org
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/cpufeatures.h | 149 ++++++++++++------------
 1 file changed, 74 insertions(+), 75 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -20,14 +20,12 @@ * Note: If the comment begins with a quoted string, that string is used * in /proc/cpuinfo instead of the macro name. If the string is "", * this feature bit is not displayed in /proc/cpuinfo at all. - */ - -/* + * * When adding new features here that depend on other features, - * please update the table in kernel/cpu/cpuid-deps.c + * please update the table in kernel/cpu/cpuid-deps.c as well. */
-/* Intel-defined CPU features, CPUID level 0x00000001 (edx), word 0 */ +/* Intel-defined CPU features, CPUID level 0x00000001 (EDX), word 0 */ #define X86_FEATURE_FPU ( 0*32+ 0) /* Onboard FPU */ #define X86_FEATURE_VME ( 0*32+ 1) /* Virtual Mode Extensions */ #define X86_FEATURE_DE ( 0*32+ 2) /* Debugging Extensions */ @@ -42,8 +40,7 @@ #define X86_FEATURE_MTRR ( 0*32+12) /* Memory Type Range Registers */ #define X86_FEATURE_PGE ( 0*32+13) /* Page Global Enable */ #define X86_FEATURE_MCA ( 0*32+14) /* Machine Check Architecture */ -#define X86_FEATURE_CMOV ( 0*32+15) /* CMOV instructions */ - /* (plus FCMOVcc, FCOMI with FPU) */ +#define X86_FEATURE_CMOV ( 0*32+15) /* CMOV instructions (plus FCMOVcc, FCOMI with FPU) */ #define X86_FEATURE_PAT ( 0*32+16) /* Page Attribute Table */ #define X86_FEATURE_PSE36 ( 0*32+17) /* 36-bit PSEs */ #define X86_FEATURE_PN ( 0*32+18) /* Processor serial number */ @@ -63,15 +60,15 @@ /* AMD-defined CPU features, CPUID level 0x80000001, word 1 */ /* Don't duplicate feature flags which are redundant with Intel! */ #define X86_FEATURE_SYSCALL ( 1*32+11) /* SYSCALL/SYSRET */ -#define X86_FEATURE_MP ( 1*32+19) /* MP Capable. */ +#define X86_FEATURE_MP ( 1*32+19) /* MP Capable */ #define X86_FEATURE_NX ( 1*32+20) /* Execute Disable */ #define X86_FEATURE_MMXEXT ( 1*32+22) /* AMD MMX extensions */ #define X86_FEATURE_FXSR_OPT ( 1*32+25) /* FXSAVE/FXRSTOR optimizations */ #define X86_FEATURE_GBPAGES ( 1*32+26) /* "pdpe1gb" GB pages */ #define X86_FEATURE_RDTSCP ( 1*32+27) /* RDTSCP */ -#define X86_FEATURE_LM ( 1*32+29) /* Long Mode (x86-64) */ -#define X86_FEATURE_3DNOWEXT ( 1*32+30) /* AMD 3DNow! extensions */ -#define X86_FEATURE_3DNOW ( 1*32+31) /* 3DNow! */ +#define X86_FEATURE_LM ( 1*32+29) /* Long Mode (x86-64, 64-bit support) */ +#define X86_FEATURE_3DNOWEXT ( 1*32+30) /* AMD 3DNow extensions */ +#define X86_FEATURE_3DNOW ( 1*32+31) /* 3DNow */
/* Transmeta-defined CPU features, CPUID level 0x80860001, word 2 */ #define X86_FEATURE_RECOVERY ( 2*32+ 0) /* CPU in recovery mode */ @@ -84,66 +81,67 @@ #define X86_FEATURE_K6_MTRR ( 3*32+ 1) /* AMD K6 nonstandard MTRRs */ #define X86_FEATURE_CYRIX_ARR ( 3*32+ 2) /* Cyrix ARRs (= MTRRs) */ #define X86_FEATURE_CENTAUR_MCR ( 3*32+ 3) /* Centaur MCRs (= MTRRs) */ -/* cpu types for specific tunings: */ + +/* CPU types for specific tunings: */ #define X86_FEATURE_K8 ( 3*32+ 4) /* "" Opteron, Athlon64 */ #define X86_FEATURE_K7 ( 3*32+ 5) /* "" Athlon */ #define X86_FEATURE_P3 ( 3*32+ 6) /* "" P3 */ #define X86_FEATURE_P4 ( 3*32+ 7) /* "" P4 */ #define X86_FEATURE_CONSTANT_TSC ( 3*32+ 8) /* TSC ticks at a constant rate */ -#define X86_FEATURE_UP ( 3*32+ 9) /* smp kernel running on up */ -#define X86_FEATURE_ART ( 3*32+10) /* Platform has always running timer (ART) */ +#define X86_FEATURE_UP ( 3*32+ 9) /* SMP kernel running on UP */ +#define X86_FEATURE_ART ( 3*32+10) /* Always running timer (ART) */ #define X86_FEATURE_ARCH_PERFMON ( 3*32+11) /* Intel Architectural PerfMon */ #define X86_FEATURE_PEBS ( 3*32+12) /* Precise-Event Based Sampling */ #define X86_FEATURE_BTS ( 3*32+13) /* Branch Trace Store */ -#define X86_FEATURE_SYSCALL32 ( 3*32+14) /* "" syscall in ia32 userspace */ -#define X86_FEATURE_SYSENTER32 ( 3*32+15) /* "" sysenter in ia32 userspace */ -#define X86_FEATURE_REP_GOOD ( 3*32+16) /* rep microcode works well */ -#define X86_FEATURE_MFENCE_RDTSC ( 3*32+17) /* "" Mfence synchronizes RDTSC */ -#define X86_FEATURE_LFENCE_RDTSC ( 3*32+18) /* "" Lfence synchronizes RDTSC */ +#define X86_FEATURE_SYSCALL32 ( 3*32+14) /* "" syscall in IA32 userspace */ +#define X86_FEATURE_SYSENTER32 ( 3*32+15) /* "" sysenter in IA32 userspace */ +#define X86_FEATURE_REP_GOOD ( 3*32+16) /* REP microcode works well */ +#define X86_FEATURE_MFENCE_RDTSC ( 3*32+17) /* "" MFENCE synchronizes RDTSC */ +#define X86_FEATURE_LFENCE_RDTSC ( 3*32+18) /* "" LFENCE synchronizes RDTSC */ #define X86_FEATURE_ACC_POWER ( 3*32+19) /* AMD Accumulated Power Mechanism */ #define X86_FEATURE_NOPL ( 3*32+20) /* The NOPL (0F 1F) instructions */ #define X86_FEATURE_ALWAYS ( 3*32+21) /* "" Always-present feature */ -#define X86_FEATURE_XTOPOLOGY ( 3*32+22) /* cpu topology enum extensions */ +#define X86_FEATURE_XTOPOLOGY ( 3*32+22) /* CPU topology enum extensions */ #define X86_FEATURE_TSC_RELIABLE ( 3*32+23) /* TSC is known to be reliable */ #define X86_FEATURE_NONSTOP_TSC ( 3*32+24) /* TSC does not stop in C states */ #define X86_FEATURE_CPUID ( 3*32+25) /* CPU has CPUID instruction itself */ -#define X86_FEATURE_EXTD_APICID ( 3*32+26) /* has extended APICID (8 bits) */ -#define X86_FEATURE_AMD_DCM ( 3*32+27) /* multi-node processor */ -#define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */ +#define X86_FEATURE_EXTD_APICID ( 3*32+26) /* Extended APICID (8 bits) */ +#define X86_FEATURE_AMD_DCM ( 3*32+27) /* AMD multi-node processor */ +#define X86_FEATURE_APERFMPERF ( 3*32+28) /* P-State hardware coordination feedback capability (APERF/MPERF MSRs) */ #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */ #define X86_FEATURE_TSC_KNOWN_FREQ ( 3*32+31) /* TSC has known frequency */
-/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */ +/* Intel-defined CPU features, CPUID level 0x00000001 (ECX), word 4 */ #define X86_FEATURE_XMM3 ( 4*32+ 0) /* "pni" SSE-3 */ #define X86_FEATURE_PCLMULQDQ ( 4*32+ 1) /* PCLMULQDQ instruction */ #define X86_FEATURE_DTES64 ( 4*32+ 2) /* 64-bit Debug Store */ -#define X86_FEATURE_MWAIT ( 4*32+ 3) /* "monitor" Monitor/Mwait support */ -#define X86_FEATURE_DSCPL ( 4*32+ 4) /* "ds_cpl" CPL Qual. Debug Store */ +#define X86_FEATURE_MWAIT ( 4*32+ 3) /* "monitor" MONITOR/MWAIT support */ +#define X86_FEATURE_DSCPL ( 4*32+ 4) /* "ds_cpl" CPL-qualified (filtered) Debug Store */ #define X86_FEATURE_VMX ( 4*32+ 5) /* Hardware virtualization */ -#define X86_FEATURE_SMX ( 4*32+ 6) /* Safer mode */ +#define X86_FEATURE_SMX ( 4*32+ 6) /* Safer Mode eXtensions */ #define X86_FEATURE_EST ( 4*32+ 7) /* Enhanced SpeedStep */ #define X86_FEATURE_TM2 ( 4*32+ 8) /* Thermal Monitor 2 */ #define X86_FEATURE_SSSE3 ( 4*32+ 9) /* Supplemental SSE-3 */ #define X86_FEATURE_CID ( 4*32+10) /* Context ID */ #define X86_FEATURE_SDBG ( 4*32+11) /* Silicon Debug */ #define X86_FEATURE_FMA ( 4*32+12) /* Fused multiply-add */ -#define X86_FEATURE_CX16 ( 4*32+13) /* CMPXCHG16B */ +#define X86_FEATURE_CX16 ( 4*32+13) /* CMPXCHG16B instruction */ #define X86_FEATURE_XTPR ( 4*32+14) /* Send Task Priority Messages */ -#define X86_FEATURE_PDCM ( 4*32+15) /* Performance Capabilities */ +#define X86_FEATURE_PDCM ( 4*32+15) /* Perf/Debug Capabilities MSR */ #define X86_FEATURE_PCID ( 4*32+17) /* Process Context Identifiers */ #define X86_FEATURE_DCA ( 4*32+18) /* Direct Cache Access */ #define X86_FEATURE_XMM4_1 ( 4*32+19) /* "sse4_1" SSE-4.1 */ #define X86_FEATURE_XMM4_2 ( 4*32+20) /* "sse4_2" SSE-4.2 */ -#define X86_FEATURE_X2APIC ( 4*32+21) /* x2APIC */ +#define X86_FEATURE_X2APIC ( 4*32+21) /* X2APIC */ #define X86_FEATURE_MOVBE ( 4*32+22) /* MOVBE instruction */ #define X86_FEATURE_POPCNT ( 4*32+23) /* POPCNT instruction */ -#define X86_FEATURE_TSC_DEADLINE_TIMER ( 4*32+24) /* Tsc deadline timer */ +#define X86_FEATURE_TSC_DEADLINE_TIMER ( 4*32+24) /* TSC deadline timer */ #define X86_FEATURE_AES ( 4*32+25) /* AES instructions */ -#define X86_FEATURE_XSAVE ( 4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV */ -#define X86_FEATURE_OSXSAVE ( 4*32+27) /* "" XSAVE enabled in the OS */ +#define X86_FEATURE_XSAVE ( 4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV instructions */ +#define X86_FEATURE_OSXSAVE ( 4*32+27) /* "" XSAVE instruction enabled in the OS */ #define X86_FEATURE_AVX ( 4*32+28) /* Advanced Vector Extensions */ -#define X86_FEATURE_F16C ( 4*32+29) /* 16-bit fp conversions */ -#define X86_FEATURE_RDRAND ( 4*32+30) /* The RDRAND instruction */ +#define X86_FEATURE_F16C ( 4*32+29) /* 16-bit FP conversions */ +#define X86_FEATURE_RDRAND ( 4*32+30) /* RDRAND instruction */ #define X86_FEATURE_HYPERVISOR ( 4*32+31) /* Running on a hypervisor */
/* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */ @@ -158,10 +156,10 @@ #define X86_FEATURE_PMM ( 5*32+12) /* PadLock Montgomery Multiplier */ #define X86_FEATURE_PMM_EN ( 5*32+13) /* PMM enabled */
-/* More extended AMD flags: CPUID level 0x80000001, ecx, word 6 */ +/* More extended AMD flags: CPUID level 0x80000001, ECX, word 6 */ #define X86_FEATURE_LAHF_LM ( 6*32+ 0) /* LAHF/SAHF in long mode */ #define X86_FEATURE_CMP_LEGACY ( 6*32+ 1) /* If yes HyperThreading not valid */ -#define X86_FEATURE_SVM ( 6*32+ 2) /* Secure virtual machine */ +#define X86_FEATURE_SVM ( 6*32+ 2) /* Secure Virtual Machine */ #define X86_FEATURE_EXTAPIC ( 6*32+ 3) /* Extended APIC space */ #define X86_FEATURE_CR8_LEGACY ( 6*32+ 4) /* CR8 in 32-bit mode */ #define X86_FEATURE_ABM ( 6*32+ 5) /* Advanced bit manipulation */ @@ -175,16 +173,16 @@ #define X86_FEATURE_WDT ( 6*32+13) /* Watchdog timer */ #define X86_FEATURE_LWP ( 6*32+15) /* Light Weight Profiling */ #define X86_FEATURE_FMA4 ( 6*32+16) /* 4 operands MAC instructions */ -#define X86_FEATURE_TCE ( 6*32+17) /* translation cache extension */ +#define X86_FEATURE_TCE ( 6*32+17) /* Translation Cache Extension */ #define X86_FEATURE_NODEID_MSR ( 6*32+19) /* NodeId MSR */ -#define X86_FEATURE_TBM ( 6*32+21) /* trailing bit manipulations */ -#define X86_FEATURE_TOPOEXT ( 6*32+22) /* topology extensions CPUID leafs */ -#define X86_FEATURE_PERFCTR_CORE ( 6*32+23) /* core performance counter extensions */ +#define X86_FEATURE_TBM ( 6*32+21) /* Trailing Bit Manipulations */ +#define X86_FEATURE_TOPOEXT ( 6*32+22) /* Topology extensions CPUID leafs */ +#define X86_FEATURE_PERFCTR_CORE ( 6*32+23) /* Core performance counter extensions */ #define X86_FEATURE_PERFCTR_NB ( 6*32+24) /* NB performance counter extensions */ -#define X86_FEATURE_BPEXT (6*32+26) /* data breakpoint extension */ -#define X86_FEATURE_PTSC ( 6*32+27) /* performance time-stamp counter */ +#define X86_FEATURE_BPEXT ( 6*32+26) /* Data breakpoint extension */ +#define X86_FEATURE_PTSC ( 6*32+27) /* Performance time-stamp counter */ #define X86_FEATURE_PERFCTR_LLC ( 6*32+28) /* Last Level Cache performance counter extensions */ -#define X86_FEATURE_MWAITX ( 6*32+29) /* MWAIT extension (MONITORX/MWAITX) */ +#define X86_FEATURE_MWAITX ( 6*32+29) /* MWAIT extension (MONITORX/MWAITX instructions) */
/* * Auxiliary flags: Linux defined - For features scattered in various @@ -192,7 +190,7 @@ * * Reuse free bits when adding new feature flags! */ -#define X86_FEATURE_RING3MWAIT ( 7*32+ 0) /* Ring 3 MONITOR/MWAIT */ +#define X86_FEATURE_RING3MWAIT ( 7*32+ 0) /* Ring 3 MONITOR/MWAIT instructions */ #define X86_FEATURE_CPUID_FAULT ( 7*32+ 1) /* Intel CPUID faulting */ #define X86_FEATURE_CPB ( 7*32+ 2) /* AMD Core Performance Boost */ #define X86_FEATURE_EPB ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */ @@ -206,8 +204,8 @@
#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */ #define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */ -#define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */ -#define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */ +#define X86_FEATURE_AVX512_4VNNIW ( 7*32+16) /* AVX-512 Neural Network Instructions */ +#define X86_FEATURE_AVX512_4FMAPS ( 7*32+17) /* AVX-512 Multiply Accumulation Single precision */
#define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */
@@ -218,19 +216,19 @@ #define X86_FEATURE_EPT ( 8*32+ 3) /* Intel Extended Page Table */ #define X86_FEATURE_VPID ( 8*32+ 4) /* Intel Virtual Processor ID */
-#define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer vmmcall to vmcall */ +#define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer VMMCALL to VMCALL */ #define X86_FEATURE_XENPV ( 8*32+16) /* "" Xen paravirtual guest */
-/* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */ -#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/ -#define X86_FEATURE_TSC_ADJUST ( 9*32+ 1) /* TSC adjustment MSR 0x3b */ +/* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */ +#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/ +#define X86_FEATURE_TSC_ADJUST ( 9*32+ 1) /* TSC adjustment MSR 0x3B */ #define X86_FEATURE_BMI1 ( 9*32+ 3) /* 1st group bit manipulation extensions */ #define X86_FEATURE_HLE ( 9*32+ 4) /* Hardware Lock Elision */ #define X86_FEATURE_AVX2 ( 9*32+ 5) /* AVX2 instructions */ #define X86_FEATURE_SMEP ( 9*32+ 7) /* Supervisor Mode Execution Protection */ #define X86_FEATURE_BMI2 ( 9*32+ 8) /* 2nd group bit manipulation extensions */ -#define X86_FEATURE_ERMS ( 9*32+ 9) /* Enhanced REP MOVSB/STOSB */ +#define X86_FEATURE_ERMS ( 9*32+ 9) /* Enhanced REP MOVSB/STOSB instructions */ #define X86_FEATURE_INVPCID ( 9*32+10) /* Invalidate Processor Context ID */ #define X86_FEATURE_RTM ( 9*32+11) /* Restricted Transactional Memory */ #define X86_FEATURE_CQM ( 9*32+12) /* Cache QoS Monitoring */ @@ -238,8 +236,8 @@ #define X86_FEATURE_RDT_A ( 9*32+15) /* Resource Director Technology Allocation */ #define X86_FEATURE_AVX512F ( 9*32+16) /* AVX-512 Foundation */ #define X86_FEATURE_AVX512DQ ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */ -#define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */ -#define X86_FEATURE_ADX ( 9*32+19) /* The ADCX and ADOX instructions */ +#define X86_FEATURE_RDSEED ( 9*32+18) /* RDSEED instruction */ +#define X86_FEATURE_ADX ( 9*32+19) /* ADCX and ADOX instructions */ #define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */ #define X86_FEATURE_AVX512IFMA ( 9*32+21) /* AVX-512 Integer Fused Multiply-Add instructions */ #define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */ @@ -251,25 +249,25 @@ #define X86_FEATURE_AVX512BW ( 9*32+30) /* AVX-512 BW (Byte/Word granular) Instructions */ #define X86_FEATURE_AVX512VL ( 9*32+31) /* AVX-512 VL (128/256 Vector Length) Extensions */
-/* Extended state features, CPUID level 0x0000000d:1 (eax), word 10 */ -#define X86_FEATURE_XSAVEOPT (10*32+ 0) /* XSAVEOPT */ -#define X86_FEATURE_XSAVEC (10*32+ 1) /* XSAVEC */ -#define X86_FEATURE_XGETBV1 (10*32+ 2) /* XGETBV with ECX = 1 */ -#define X86_FEATURE_XSAVES (10*32+ 3) /* XSAVES/XRSTORS */ +/* Extended state features, CPUID level 0x0000000d:1 (EAX), word 10 */ +#define X86_FEATURE_XSAVEOPT (10*32+ 0) /* XSAVEOPT instruction */ +#define X86_FEATURE_XSAVEC (10*32+ 1) /* XSAVEC instruction */ +#define X86_FEATURE_XGETBV1 (10*32+ 2) /* XGETBV with ECX = 1 instruction */ +#define X86_FEATURE_XSAVES (10*32+ 3) /* XSAVES/XRSTORS instructions */
-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:0 (edx), word 11 */ +/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:0 (EDX), word 11 */ #define X86_FEATURE_CQM_LLC (11*32+ 1) /* LLC QoS if 1 */
-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (edx), word 12 */ -#define X86_FEATURE_CQM_OCCUP_LLC (12*32+ 0) /* LLC occupancy monitoring if 1 */ +/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (EDX), word 12 */ +#define X86_FEATURE_CQM_OCCUP_LLC (12*32+ 0) /* LLC occupancy monitoring */ #define X86_FEATURE_CQM_MBM_TOTAL (12*32+ 1) /* LLC Total MBM monitoring */ #define X86_FEATURE_CQM_MBM_LOCAL (12*32+ 2) /* LLC Local MBM monitoring */
-/* AMD-defined CPU features, CPUID level 0x80000008 (ebx), word 13 */ -#define X86_FEATURE_CLZERO (13*32+0) /* CLZERO instruction */ -#define X86_FEATURE_IRPERF (13*32+1) /* Instructions Retired Count */ +/* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */ +#define X86_FEATURE_CLZERO (13*32+ 0) /* CLZERO instruction */ +#define X86_FEATURE_IRPERF (13*32+ 1) /* Instructions Retired Count */
-/* Thermal and Power Management Leaf, CPUID level 0x00000006 (eax), word 14 */ +/* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */ #define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */ #define X86_FEATURE_IDA (14*32+ 1) /* Intel Dynamic Acceleration */ #define X86_FEATURE_ARAT (14*32+ 2) /* Always Running APIC Timer */ @@ -281,7 +279,7 @@ #define X86_FEATURE_HWP_EPP (14*32+10) /* HWP Energy Perf. Preference */ #define X86_FEATURE_HWP_PKG_REQ (14*32+11) /* HWP Package Level Request */
-/* AMD SVM Feature Identification, CPUID level 0x8000000a (edx), word 15 */ +/* AMD SVM Feature Identification, CPUID level 0x8000000a (EDX), word 15 */ #define X86_FEATURE_NPT (15*32+ 0) /* Nested Page Table support */ #define X86_FEATURE_LBRV (15*32+ 1) /* LBR Virtualization support */ #define X86_FEATURE_SVML (15*32+ 2) /* "svm_lock" SVM locking MSR */ @@ -296,24 +294,24 @@ #define X86_FEATURE_V_VMSAVE_VMLOAD (15*32+15) /* Virtual VMSAVE VMLOAD */ #define X86_FEATURE_VGIF (15*32+16) /* Virtual GIF */
-/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */ +/* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */ #define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/ #define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */ #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ #define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */ #define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */ #define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */ -#define X86_FEATURE_VPCLMULQDQ (16*32+ 10) /* Carry-Less Multiplication Double Quadword */ -#define X86_FEATURE_AVX512_VNNI (16*32+ 11) /* Vector Neural Network Instructions */ -#define X86_FEATURE_AVX512_BITALG (16*32+12) /* Support for VPOPCNT[B,W] and VPSHUF-BITQMB */ +#define X86_FEATURE_VPCLMULQDQ (16*32+10) /* Carry-Less Multiplication Double Quadword */ +#define X86_FEATURE_AVX512_VNNI (16*32+11) /* Vector Neural Network Instructions */ +#define X86_FEATURE_AVX512_BITALG (16*32+12) /* Support for VPOPCNT[B,W] and VPSHUF-BITQMB instructions */ #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */ #define X86_FEATURE_LA57 (16*32+16) /* 5-level page tables */ #define X86_FEATURE_RDPID (16*32+22) /* RDPID instruction */
-/* AMD-defined CPU features, CPUID level 0x80000007 (ebx), word 17 */ -#define X86_FEATURE_OVERFLOW_RECOV (17*32+0) /* MCA overflow recovery support */ -#define X86_FEATURE_SUCCOR (17*32+1) /* Uncorrectable error containment and recovery */ -#define X86_FEATURE_SMCA (17*32+3) /* Scalable MCA */ +/* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */ +#define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */ +#define X86_FEATURE_SUCCOR (17*32+ 1) /* Uncorrectable error containment and recovery */ +#define X86_FEATURE_SMCA (17*32+ 3) /* Scalable MCA */
/* * BUG word(s) @@ -340,4 +338,5 @@ #define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */ #define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */ #define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */ + #endif /* _ASM_X86_CPUFEATURES_H */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit d744dcad39094c9187075e274d1cdef79c57c8b5 upstream.
Much of the test design could apply to set_thread_area() (i.e. GDT), not just modify_ldt(). Add set_thread_area() to the install_valid_mode() helper.
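For reference, a minimal sketch of the two system-call paths the reworked helper multiplexes (error handling and the GDT slot discovery are omitted; gdt_entry_num stands in for a free TLS slot the test finds earlier):

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/ldt.h>		/* struct user_desc */

/* Install 'desc' either into the LDT (modify_ldt) or into a GDT TLS slot
 * (set_thread_area); the GDT path is only meaningful in a 32-bit build. */
static long install(struct user_desc *desc, int use_ldt, int gdt_entry_num)
{
	if (use_ldt)
		return syscall(SYS_modify_ldt, 0x11, desc, sizeof(*desc));

	desc->entry_number = gdt_entry_num;
	return syscall(SYS_set_thread_area, desc);
}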
Signed-off-by: Andy Lutomirski luto@kernel.org
Cc: Borislav Petkov bpetkov@suse.de
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Link: http://lkml.kernel.org/r/02c23f8fba5547007f741dc24c3926e5284ede02.1509794321...
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 tools/testing/selftests/x86/ldt_gdt.c | 53 +++++++++++++++++++++++-----------
 1 file changed, 37 insertions(+), 16 deletions(-)
--- a/tools/testing/selftests/x86/ldt_gdt.c +++ b/tools/testing/selftests/x86/ldt_gdt.c @@ -137,30 +137,51 @@ static void check_valid_segment(uint16_t } }
-static bool install_valid_mode(const struct user_desc *desc, uint32_t ar, - bool oldmode) +static bool install_valid_mode(const struct user_desc *d, uint32_t ar, + bool oldmode, bool ldt) { - int ret = syscall(SYS_modify_ldt, oldmode ? 1 : 0x11, - desc, sizeof(*desc)); - if (ret < -1) - errno = -ret; + struct user_desc desc = *d; + int ret; + + if (!ldt) { +#ifndef __i386__ + /* No point testing set_thread_area in a 64-bit build */ + return false; +#endif + if (!gdt_entry_num) + return false; + desc.entry_number = gdt_entry_num; + + ret = syscall(SYS_set_thread_area, &desc); + } else { + ret = syscall(SYS_modify_ldt, oldmode ? 1 : 0x11, + &desc, sizeof(desc)); + + if (ret < -1) + errno = -ret; + + if (ret != 0 && errno == ENOSYS) { + printf("[OK]\tmodify_ldt returned -ENOSYS\n"); + return false; + } + } + if (ret == 0) { - uint32_t limit = desc->limit; - if (desc->limit_in_pages) + uint32_t limit = desc.limit; + if (desc.limit_in_pages) limit = (limit << 12) + 4095; - check_valid_segment(desc->entry_number, 1, ar, limit, true); + check_valid_segment(desc.entry_number, ldt, ar, limit, true); return true; - } else if (errno == ENOSYS) { - printf("[OK]\tmodify_ldt returned -ENOSYS\n"); - return false; } else { - if (desc->seg_32bit) { - printf("[FAIL]\tUnexpected modify_ldt failure %d\n", + if (desc.seg_32bit) { + printf("[FAIL]\tUnexpected %s failure %d\n", + ldt ? "modify_ldt" : "set_thread_area", errno); nerrs++; return false; } else { - printf("[OK]\tmodify_ldt rejected 16 bit segment\n"); + printf("[OK]\t%s rejected 16 bit segment\n", + ldt ? "modify_ldt" : "set_thread_area"); return false; } } @@ -168,7 +189,7 @@ static bool install_valid_mode(const str
static bool install_valid(const struct user_desc *desc, uint32_t ar) { - return install_valid_mode(desc, ar, false); + return install_valid_mode(desc, ar, false, true); }
static void install_invalid(const struct user_desc *desc, bool oldmode)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit adedf2893c192dd09b1cc2f2dcfdd7cad99ec49d upstream.
Now that the main test infrastructure supports the GDT, run tests that will pass the kernel's GDT permission tests against the GDT.
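As the hunk below shows, only descriptors the kernel would accept in the GDT are retried there. A descriptor satisfying that gate looks roughly like this (field values are illustrative, not mandated by the patch):

struct user_desc gdt_ok = {
	.entry_number	 = 0,	/* overwritten with a real TLS slot */
	.base_addr	 = 0,
	.limit		 = 10,
	.seg_32bit	 = 1,	/* required by the retry condition */
	.contents	 = 0,	/* 0 or 1 (data/stack); code segments are not retried */
	.seg_not_present = 0,	/* must be present */
};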
Signed-off-by: Andy Lutomirski luto@kernel.org
Cc: Borislav Petkov bpetkov@suse.de
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Link: http://lkml.kernel.org/r/686a1eda63414da38fcecc2412db8dba1ae40581.1509794321...
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 tools/testing/selftests/x86/ldt_gdt.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/tools/testing/selftests/x86/ldt_gdt.c +++ b/tools/testing/selftests/x86/ldt_gdt.c @@ -189,7 +189,15 @@ static bool install_valid_mode(const str
static bool install_valid(const struct user_desc *desc, uint32_t ar) { - return install_valid_mode(desc, ar, false, true); + bool ret = install_valid_mode(desc, ar, false, true); + + if (desc->contents <= 1 && desc->seg_32bit && + !desc->seg_not_present) { + /* Should work in the GDT, too. */ + install_valid_mode(desc, ar, false, false); + } + + return ret; }
static void install_invalid(const struct user_desc *desc, bool oldmode)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Morse james.morse@arm.com
commit 4f89fa286f6729312e227e7c2d764e8e7b9d340e upstream.
Replace ghes_io{re,un}map_pfn_{nmi,irq}()'s use of ioremap_page_range() with __set_fixmap(), as ioremap_page_range() may sleep to allocate a new level of page-table, even if it's passed an existing final address to use in the mapping.
The GHES driver can only be enabled for architectures that select HAVE_ACPI_APEI: Add fixmap entries to both x86 and arm64.
clear_fixmap() does the TLB invalidation in __set_fixmap() for arm64 and __set_pte_vaddr() for x86. In each case it's the same as the respective arch_apei_flush_tlb_one().
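Condensed from the hunks below, the NMI-side access pattern after this change looks like the following (a simplified sketch, not literal driver code):

/* One fixmap slot per context, serialized by the existing spinlock;
 * __set_fixmap() only rewrites an already-allocated PTE, so it cannot sleep. */
static void ghes_read_pfn_nmi(void *buffer, u64 pfn, size_t len)
{
	void __iomem *vaddr;

	raw_spin_lock(&ghes_ioremap_lock_nmi);
	vaddr = ghes_ioremap_pfn_nmi(pfn);	/* __set_fixmap(FIX_APEI_GHES_NMI, ...) */
	memcpy_fromio(buffer, vaddr, len);
	ghes_iounmap_nmi();			/* clear_fixmap() also flushes the TLB entry */
	raw_spin_unlock(&ghes_ioremap_lock_nmi);
}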
Reported-by: Fengguang Wu fengguang.wu@intel.com
Suggested-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: James Morse james.morse@arm.com
Reviewed-by: Borislav Petkov bp@suse.de
Tested-by: Tyler Baicar tbaicar@codeaurora.org
Tested-by: Toshi Kani toshi.kani@hpe.com
[ For the arm64 bits: ]
Acked-by: Will Deacon will.deacon@arm.com
[ For the x86 bits: ]
Acked-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Cc: All applicable stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/include/asm/fixmap.h | 7 ++++++
 arch/x86/include/asm/fixmap.h | 6 +++++
 drivers/acpi/apei/ghes.c | 44 ++++++++++++----------------------
 3 files changed, 27 insertions(+), 30 deletions(-)
--- a/arch/arm64/include/asm/fixmap.h +++ b/arch/arm64/include/asm/fixmap.h @@ -51,6 +51,13 @@ enum fixed_addresses {
FIX_EARLYCON_MEM_BASE, FIX_TEXT_POKE0, + +#ifdef CONFIG_ACPI_APEI_GHES + /* Used for GHES mapping from assorted contexts */ + FIX_APEI_GHES_IRQ, + FIX_APEI_GHES_NMI, +#endif /* CONFIG_ACPI_APEI_GHES */ + __end_of_permanent_fixed_addresses,
/* --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -104,6 +104,12 @@ enum fixed_addresses { FIX_GDT_REMAP_BEGIN, FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,
+#ifdef CONFIG_ACPI_APEI_GHES + /* Used for GHES mapping from assorted contexts */ + FIX_APEI_GHES_IRQ, + FIX_APEI_GHES_NMI, +#endif + __end_of_permanent_fixed_addresses,
/* --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -51,6 +51,7 @@ #include <acpi/actbl1.h> #include <acpi/ghes.h> #include <acpi/apei.h> +#include <asm/fixmap.h> #include <asm/tlbflush.h> #include <ras/ras_event.h>
@@ -112,7 +113,7 @@ static DEFINE_MUTEX(ghes_list_mutex); * Because the memory area used to transfer hardware error information * from BIOS to Linux can be determined only in NMI, IRQ or timer * handler, but general ioremap can not be used in atomic context, so - * a special version of atomic ioremap is implemented for that. + * the fixmap is used instead. */
/* @@ -126,8 +127,8 @@ static DEFINE_MUTEX(ghes_list_mutex); /* virtual memory area for atomic ioremap */ static struct vm_struct *ghes_ioremap_area; /* - * These 2 spinlock is used to prevent atomic ioremap virtual memory - * area from being mapped simultaneously. + * These 2 spinlocks are used to prevent the fixmap entries from being used + * simultaneously. */ static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi); static DEFINE_SPINLOCK(ghes_ioremap_lock_irq); @@ -159,53 +160,36 @@ static void ghes_ioremap_exit(void)
static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn) { - unsigned long vaddr; phys_addr_t paddr; pgprot_t prot;
- vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr); - paddr = pfn << PAGE_SHIFT; prot = arch_apei_get_mem_attribute(paddr); - ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot); + __set_fixmap(FIX_APEI_GHES_NMI, paddr, prot);
- return (void __iomem *)vaddr; + return (void __iomem *) fix_to_virt(FIX_APEI_GHES_NMI); }
static void __iomem *ghes_ioremap_pfn_irq(u64 pfn) { - unsigned long vaddr; phys_addr_t paddr; pgprot_t prot;
- vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr); - paddr = pfn << PAGE_SHIFT; prot = arch_apei_get_mem_attribute(paddr); + __set_fixmap(FIX_APEI_GHES_IRQ, paddr, prot);
- ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot); - - return (void __iomem *)vaddr; + return (void __iomem *) fix_to_virt(FIX_APEI_GHES_IRQ); }
-static void ghes_iounmap_nmi(void __iomem *vaddr_ptr) +static void ghes_iounmap_nmi(void) { - unsigned long vaddr = (unsigned long __force)vaddr_ptr; - void *base = ghes_ioremap_area->addr; - - BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_NMI_PAGE(base)); - unmap_kernel_range_noflush(vaddr, PAGE_SIZE); - arch_apei_flush_tlb_one(vaddr); + clear_fixmap(FIX_APEI_GHES_NMI); }
-static void ghes_iounmap_irq(void __iomem *vaddr_ptr) +static void ghes_iounmap_irq(void) { - unsigned long vaddr = (unsigned long __force)vaddr_ptr; - void *base = ghes_ioremap_area->addr; - - BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_IRQ_PAGE(base)); - unmap_kernel_range_noflush(vaddr, PAGE_SIZE); - arch_apei_flush_tlb_one(vaddr); + clear_fixmap(FIX_APEI_GHES_IRQ); }
static int ghes_estatus_pool_init(void) @@ -361,10 +345,10 @@ static void ghes_copy_tofrom_phys(void * paddr += trunk; buffer += trunk; if (in_nmi) { - ghes_iounmap_nmi(vaddr); + ghes_iounmap_nmi(); raw_spin_unlock(&ghes_ioremap_lock_nmi); } else { - ghes_iounmap_irq(vaddr); + ghes_iounmap_irq(); spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags); } }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juergen Gross jgross@suse.com
commit f72e38e8ec8869ac0ba5a75d7d2f897d98a1454e upstream.
Instead of x86_hyper being either NULL on bare metal or a pointer to a struct hypervisor_x86 when the kernel is running as a guest, merge the struct into x86_platform and x86_init.
This removes the need for wrappers that make it hard to find out what is being called. With dummy functions added for all callbacks, testing for a NULL function pointer can be removed, too.
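As a standalone illustration of the dummy-callback pattern (hypothetical names, not the kernel code itself): presetting every slot with a no-op lets call sites invoke callbacks unconditionally.

  #include <stdbool.h>
  #include <stdio.h>

  static void init_noop(void) { }
  static bool bool_noop(void) { return false; }

  struct hyper_init {
          void (*init_platform)(void);
          bool (*x2apic_available)(void);
  };

  /* every slot is preset with a dummy, so no call site needs a NULL check */
  static struct hyper_init hyper = {
          .init_platform    = init_noop,
          .x2apic_available = bool_noop,
  };

  int main(void)
  {
          hyper.init_platform();          /* safe even on bare metal */
          printf("x2apic available: %d\n", hyper.x2apic_available());
          return 0;
  }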
Suggested-by: Ingo Molnar mingo@kernel.org Signed-off-by: Juergen Gross jgross@suse.com Acked-by: Thomas Gleixner tglx@linutronix.de Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: akataria@vmware.com Cc: boris.ostrovsky@oracle.com Cc: devel@linuxdriverproject.org Cc: haiyangz@microsoft.com Cc: kvm@vger.kernel.org Cc: kys@microsoft.com Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: rusty@rustcorp.com.au Cc: sthemmin@microsoft.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20171109132739.23465-2-jgross@suse.com Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/hypervisor.h | 25 +++------------- arch/x86/include/asm/x86_init.h | 24 ++++++++++++++++ arch/x86/kernel/apic/apic.c | 2 - arch/x86/kernel/cpu/hypervisor.c | 56 ++++++++++++++++++-------------------- arch/x86/kernel/cpu/mshyperv.c | 2 - arch/x86/kernel/cpu/vmware.c | 4 +- arch/x86/kernel/kvm.c | 2 - arch/x86/kernel/x86_init.c | 9 ++++++ arch/x86/mm/init.c | 2 - arch/x86/xen/enlighten_hvm.c | 8 ++--- arch/x86/xen/enlighten_pv.c | 2 - include/linux/hypervisor.h | 8 ++++- 12 files changed, 82 insertions(+), 62 deletions(-)
--- a/arch/x86/include/asm/hypervisor.h +++ b/arch/x86/include/asm/hypervisor.h @@ -23,6 +23,7 @@ #ifdef CONFIG_HYPERVISOR_GUEST
#include <asm/kvm_para.h> +#include <asm/x86_init.h> #include <asm/xen/hypervisor.h>
/* @@ -35,17 +36,11 @@ struct hypervisor_x86 { /* Detection routine */ uint32_t (*detect)(void);
- /* Platform setup (run once per boot) */ - void (*init_platform)(void); + /* init time callbacks */ + struct x86_hyper_init init;
- /* X2APIC detection (run once per boot) */ - bool (*x2apic_available)(void); - - /* pin current vcpu to specified physical cpu (run rarely) */ - void (*pin_vcpu)(int); - - /* called during init_mem_mapping() to setup early mappings. */ - void (*init_mem_mapping)(void); + /* runtime callbacks */ + struct x86_hyper_runtime runtime; };
extern const struct hypervisor_x86 *x86_hyper; @@ -58,17 +53,7 @@ extern const struct hypervisor_x86 x86_h extern const struct hypervisor_x86 x86_hyper_kvm;
extern void init_hypervisor_platform(void); -extern bool hypervisor_x2apic_available(void); -extern void hypervisor_pin_vcpu(int cpu); - -static inline void hypervisor_init_mem_mapping(void) -{ - if (x86_hyper && x86_hyper->init_mem_mapping) - x86_hyper->init_mem_mapping(); -} #else static inline void init_hypervisor_platform(void) { } -static inline bool hypervisor_x2apic_available(void) { return false; } -static inline void hypervisor_init_mem_mapping(void) { } #endif /* CONFIG_HYPERVISOR_GUEST */ #endif /* _ASM_X86_HYPERVISOR_H */ --- a/arch/x86/include/asm/x86_init.h +++ b/arch/x86/include/asm/x86_init.h @@ -115,6 +115,18 @@ struct x86_init_pci { };
/** + * struct x86_hyper_init - x86 hypervisor init functions + * @init_platform: platform setup + * @x2apic_available: X2APIC detection + * @init_mem_mapping: setup early mappings during init_mem_mapping() + */ +struct x86_hyper_init { + void (*init_platform)(void); + bool (*x2apic_available)(void); + void (*init_mem_mapping)(void); +}; + +/** * struct x86_init_ops - functions for platform specific setup * */ @@ -127,6 +139,7 @@ struct x86_init_ops { struct x86_init_timers timers; struct x86_init_iommu iommu; struct x86_init_pci pci; + struct x86_hyper_init hyper; };
/** @@ -200,6 +213,15 @@ struct x86_legacy_features { };
/** + * struct x86_hyper_runtime - x86 hypervisor specific runtime callbacks + * + * @pin_vcpu: pin current vcpu to specified physical cpu (run rarely) + */ +struct x86_hyper_runtime { + void (*pin_vcpu)(int cpu); +}; + +/** * struct x86_platform_ops - platform specific runtime functions * @calibrate_cpu: calibrate CPU * @calibrate_tsc: calibrate TSC, if different from CPU @@ -218,6 +240,7 @@ struct x86_legacy_features { * possible in x86_early_init_platform_quirks() by * only using the current x86_hardware_subarch * semantics. + * @hyper: x86 hypervisor specific runtime callbacks */ struct x86_platform_ops { unsigned long (*calibrate_cpu)(void); @@ -233,6 +256,7 @@ struct x86_platform_ops { void (*apic_post_init)(void); struct x86_legacy_features legacy; void (*set_legacy_features)(void); + struct x86_hyper_runtime hyper; };
struct pci_dev; --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -1645,7 +1645,7 @@ static __init void try_to_enable_x2apic( * under KVM */ if (max_physical_apicid > 255 || - !hypervisor_x2apic_available()) { + !x86_init.hyper.x2apic_available()) { pr_info("x2apic: IRQ remapping doesn't support X2APIC mode\n"); x2apic_disable(); return; --- a/arch/x86/kernel/cpu/hypervisor.c +++ b/arch/x86/kernel/cpu/hypervisor.c @@ -44,51 +44,49 @@ static const __initconst struct hypervis const struct hypervisor_x86 *x86_hyper; EXPORT_SYMBOL(x86_hyper);
-static inline void __init +static inline const struct hypervisor_x86 * __init detect_hypervisor_vendor(void) { - const struct hypervisor_x86 *h, * const *p; + const struct hypervisor_x86 *h = NULL, * const *p; uint32_t pri, max_pri = 0;
for (p = hypervisors; p < hypervisors + ARRAY_SIZE(hypervisors); p++) { - h = *p; - pri = h->detect(); - if (pri != 0 && pri > max_pri) { + pri = (*p)->detect(); + if (pri > max_pri) { max_pri = pri; - x86_hyper = h; + h = *p; } }
- if (max_pri) - pr_info("Hypervisor detected: %s\n", x86_hyper->name); -} - -void __init init_hypervisor_platform(void) -{ - - detect_hypervisor_vendor(); + if (h) + pr_info("Hypervisor detected: %s\n", h->name);
- if (!x86_hyper) - return; - - if (x86_hyper->init_platform) - x86_hyper->init_platform(); + return h; }
-bool __init hypervisor_x2apic_available(void) +static void __init copy_array(const void *src, void *target, unsigned int size) { - return x86_hyper && - x86_hyper->x2apic_available && - x86_hyper->x2apic_available(); + unsigned int i, n = size / sizeof(void *); + const void * const *from = (const void * const *)src; + const void **to = (const void **)target; + + for (i = 0; i < n; i++) + if (from[i]) + to[i] = from[i]; }
-void hypervisor_pin_vcpu(int cpu) +void __init init_hypervisor_platform(void) { - if (!x86_hyper) + const struct hypervisor_x86 *h; + + h = detect_hypervisor_vendor(); + + if (!h) return;
- if (x86_hyper->pin_vcpu) - x86_hyper->pin_vcpu(cpu); - else - WARN_ONCE(1, "vcpu pinning requested but not supported!\n"); + copy_array(&h->init, &x86_init.hyper, sizeof(h->init)); + copy_array(&h->runtime, &x86_platform.hyper, sizeof(h->runtime)); + + x86_hyper = h; + x86_init.hyper.init_platform(); } --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -257,6 +257,6 @@ static void __init ms_hyperv_init_platfo const __refconst struct hypervisor_x86 x86_hyper_ms_hyperv = { .name = "Microsoft Hyper-V", .detect = ms_hyperv_platform, - .init_platform = ms_hyperv_init_platform, + .init.init_platform = ms_hyperv_init_platform, }; EXPORT_SYMBOL(x86_hyper_ms_hyperv); --- a/arch/x86/kernel/cpu/vmware.c +++ b/arch/x86/kernel/cpu/vmware.c @@ -208,7 +208,7 @@ static bool __init vmware_legacy_x2apic_ const __refconst struct hypervisor_x86 x86_hyper_vmware = { .name = "VMware", .detect = vmware_platform, - .init_platform = vmware_platform_setup, - .x2apic_available = vmware_legacy_x2apic_available, + .init.init_platform = vmware_platform_setup, + .init.x2apic_available = vmware_legacy_x2apic_available, }; EXPORT_SYMBOL(x86_hyper_vmware); --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -547,7 +547,7 @@ static uint32_t __init kvm_detect(void) const struct hypervisor_x86 x86_hyper_kvm __refconst = { .name = "KVM", .detect = kvm_detect, - .x2apic_available = kvm_para_available, + .init.x2apic_available = kvm_para_available, }; EXPORT_SYMBOL_GPL(x86_hyper_kvm);
--- a/arch/x86/kernel/x86_init.c +++ b/arch/x86/kernel/x86_init.c @@ -28,6 +28,8 @@ void x86_init_noop(void) { } void __init x86_init_uint_noop(unsigned int unused) { } int __init iommu_init_noop(void) { return 0; } void iommu_shutdown_noop(void) { } +bool __init bool_x86_init_noop(void) { return false; } +void x86_op_int_noop(int cpu) { }
/* * The platform setup functions are preset with the default functions @@ -81,6 +83,12 @@ struct x86_init_ops x86_init __initdata .init_irq = x86_default_pci_init_irq, .fixup_irqs = x86_default_pci_fixup_irqs, }, + + .hyper = { + .init_platform = x86_init_noop, + .x2apic_available = bool_x86_init_noop, + .init_mem_mapping = x86_init_noop, + }, };
struct x86_cpuinit_ops x86_cpuinit = { @@ -101,6 +109,7 @@ struct x86_platform_ops x86_platform __r .get_nmi_reason = default_get_nmi_reason, .save_sched_clock_state = tsc_save_sched_clock_state, .restore_sched_clock_state = tsc_restore_sched_clock_state, + .hyper.pin_vcpu = x86_op_int_noop, };
EXPORT_SYMBOL_GPL(x86_platform); --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -671,7 +671,7 @@ void __init init_mem_mapping(void) load_cr3(swapper_pg_dir); __flush_tlb_all();
- hypervisor_init_mem_mapping(); + x86_init.hyper.init_mem_mapping();
early_memtest(0, max_pfn_mapped << PAGE_SHIFT); } --- a/arch/x86/xen/enlighten_hvm.c +++ b/arch/x86/xen/enlighten_hvm.c @@ -229,9 +229,9 @@ static uint32_t __init xen_platform_hvm( const struct hypervisor_x86 x86_hyper_xen_hvm = { .name = "Xen HVM", .detect = xen_platform_hvm, - .init_platform = xen_hvm_guest_init, - .pin_vcpu = xen_pin_vcpu, - .x2apic_available = xen_x2apic_para_available, - .init_mem_mapping = xen_hvm_init_mem_mapping, + .init.init_platform = xen_hvm_guest_init, + .init.x2apic_available = xen_x2apic_para_available, + .init.init_mem_mapping = xen_hvm_init_mem_mapping, + .runtime.pin_vcpu = xen_pin_vcpu, }; EXPORT_SYMBOL(x86_hyper_xen_hvm); --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1462,6 +1462,6 @@ static uint32_t __init xen_platform_pv(v const struct hypervisor_x86 x86_hyper_xen_pv = { .name = "Xen PV", .detect = xen_platform_pv, - .pin_vcpu = xen_pin_vcpu, + .runtime.pin_vcpu = xen_pin_vcpu, }; EXPORT_SYMBOL(x86_hyper_xen_pv); --- a/include/linux/hypervisor.h +++ b/include/linux/hypervisor.h @@ -7,8 +7,12 @@ * Juergen Gross jgross@suse.com */
-#ifdef CONFIG_HYPERVISOR_GUEST -#include <asm/hypervisor.h> +#ifdef CONFIG_X86 +#include <asm/x86_init.h> +static inline void hypervisor_pin_vcpu(int cpu) +{ + x86_platform.hyper.pin_vcpu(cpu); +} #else static inline void hypervisor_pin_vcpu(int cpu) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juergen Gross jgross@suse.com
commit 03b2a320b19f1424e9ac9c21696be9c60b6d0d93 upstream.
The x86_hyper pointer is only used for checking whether a virtual device supports the hypervisor the system is running on.
Use an enum for that purpose instead and drop the x86_hyper pointer.
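The effect on driver checks, as a minimal sketch (enum abbreviated; the full list is in the diff below):

  enum x86_hypervisor_type {
          X86_HYPER_NATIVE = 0,
          X86_HYPER_VMWARE,
          X86_HYPER_MS_HYPERV,
          /* ... */
  };

  extern enum x86_hypervisor_type x86_hyper_type;   /* set once at boot */

  /* a compare against an exported struct's address becomes an enum compare */
  static bool running_on_vmware(void)
  {
          return x86_hyper_type == X86_HYPER_VMWARE;
  }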
Signed-off-by: Juergen Gross jgross@suse.com Acked-by: Thomas Gleixner tglx@linutronix.de Acked-by: Xavier Deguillard xdeguillard@vmware.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: akataria@vmware.com Cc: arnd@arndb.de Cc: boris.ostrovsky@oracle.com Cc: devel@linuxdriverproject.org Cc: dmitry.torokhov@gmail.com Cc: gregkh@linuxfoundation.org Cc: haiyangz@microsoft.com Cc: kvm@vger.kernel.org Cc: kys@microsoft.com Cc: linux-graphics-maintainer@vmware.com Cc: linux-input@vger.kernel.org Cc: moltmann@vmware.com Cc: pbonzini@redhat.com Cc: pv-drivers@vmware.com Cc: rkrcmar@redhat.com Cc: sthemmin@microsoft.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20171109132739.23465-3-jgross@suse.com Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/hyperv/hv_init.c | 2 +- arch/x86/include/asm/hypervisor.h | 23 ++++++++++++++--------- arch/x86/kernel/cpu/hypervisor.c | 12 +++++++++--- arch/x86/kernel/cpu/mshyperv.c | 4 ++-- arch/x86/kernel/cpu/vmware.c | 4 ++-- arch/x86/kernel/kvm.c | 4 ++-- arch/x86/xen/enlighten_hvm.c | 4 ++-- arch/x86/xen/enlighten_pv.c | 4 ++-- drivers/hv/vmbus_drv.c | 2 +- drivers/input/mouse/vmmouse.c | 10 ++++------ drivers/misc/vmw_balloon.c | 2 +- 11 files changed, 40 insertions(+), 31 deletions(-)
--- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -113,7 +113,7 @@ void hyperv_init(void) u64 guest_id; union hv_x64_msr_hypercall_contents hypercall_msr;
- if (x86_hyper != &x86_hyper_ms_hyperv) + if (x86_hyper_type != X86_HYPER_MS_HYPERV) return;
/* Allocate percpu VP index */ --- a/arch/x86/include/asm/hypervisor.h +++ b/arch/x86/include/asm/hypervisor.h @@ -29,6 +29,16 @@ /* * x86 hypervisor information */ + +enum x86_hypervisor_type { + X86_HYPER_NATIVE = 0, + X86_HYPER_VMWARE, + X86_HYPER_MS_HYPERV, + X86_HYPER_XEN_PV, + X86_HYPER_XEN_HVM, + X86_HYPER_KVM, +}; + struct hypervisor_x86 { /* Hypervisor name */ const char *name; @@ -36,6 +46,9 @@ struct hypervisor_x86 { /* Detection routine */ uint32_t (*detect)(void);
+ /* Hypervisor type */ + enum x86_hypervisor_type type; + /* init time callbacks */ struct x86_hyper_init init;
@@ -43,15 +56,7 @@ struct hypervisor_x86 { struct x86_hyper_runtime runtime; };
-extern const struct hypervisor_x86 *x86_hyper; - -/* Recognized hypervisors */ -extern const struct hypervisor_x86 x86_hyper_vmware; -extern const struct hypervisor_x86 x86_hyper_ms_hyperv; -extern const struct hypervisor_x86 x86_hyper_xen_pv; -extern const struct hypervisor_x86 x86_hyper_xen_hvm; -extern const struct hypervisor_x86 x86_hyper_kvm; - +extern enum x86_hypervisor_type x86_hyper_type; extern void init_hypervisor_platform(void); #else static inline void init_hypervisor_platform(void) { } --- a/arch/x86/kernel/cpu/hypervisor.c +++ b/arch/x86/kernel/cpu/hypervisor.c @@ -26,6 +26,12 @@ #include <asm/processor.h> #include <asm/hypervisor.h>
+extern const struct hypervisor_x86 x86_hyper_vmware; +extern const struct hypervisor_x86 x86_hyper_ms_hyperv; +extern const struct hypervisor_x86 x86_hyper_xen_pv; +extern const struct hypervisor_x86 x86_hyper_xen_hvm; +extern const struct hypervisor_x86 x86_hyper_kvm; + static const __initconst struct hypervisor_x86 * const hypervisors[] = { #ifdef CONFIG_XEN_PV @@ -41,8 +47,8 @@ static const __initconst struct hypervis #endif };
-const struct hypervisor_x86 *x86_hyper; -EXPORT_SYMBOL(x86_hyper); +enum x86_hypervisor_type x86_hyper_type; +EXPORT_SYMBOL(x86_hyper_type);
static inline const struct hypervisor_x86 * __init detect_hypervisor_vendor(void) @@ -87,6 +93,6 @@ void __init init_hypervisor_platform(voi copy_array(&h->init, &x86_init.hyper, sizeof(h->init)); copy_array(&h->runtime, &x86_platform.hyper, sizeof(h->runtime));
- x86_hyper = h; + x86_hyper_type = h->type; x86_init.hyper.init_platform(); } --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -254,9 +254,9 @@ static void __init ms_hyperv_init_platfo #endif }
-const __refconst struct hypervisor_x86 x86_hyper_ms_hyperv = { +const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = { .name = "Microsoft Hyper-V", .detect = ms_hyperv_platform, + .type = X86_HYPER_MS_HYPERV, .init.init_platform = ms_hyperv_init_platform, }; -EXPORT_SYMBOL(x86_hyper_ms_hyperv); --- a/arch/x86/kernel/cpu/vmware.c +++ b/arch/x86/kernel/cpu/vmware.c @@ -205,10 +205,10 @@ static bool __init vmware_legacy_x2apic_ (eax & (1 << VMWARE_PORT_CMD_LEGACY_X2APIC)) != 0; }
-const __refconst struct hypervisor_x86 x86_hyper_vmware = { +const __initconst struct hypervisor_x86 x86_hyper_vmware = { .name = "VMware", .detect = vmware_platform, + .type = X86_HYPER_VMWARE, .init.init_platform = vmware_platform_setup, .init.x2apic_available = vmware_legacy_x2apic_available, }; -EXPORT_SYMBOL(x86_hyper_vmware); --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -544,12 +544,12 @@ static uint32_t __init kvm_detect(void) return kvm_cpuid_base(); }
-const struct hypervisor_x86 x86_hyper_kvm __refconst = { +const __initconst struct hypervisor_x86 x86_hyper_kvm = { .name = "KVM", .detect = kvm_detect, + .type = X86_HYPER_KVM, .init.x2apic_available = kvm_para_available, }; -EXPORT_SYMBOL_GPL(x86_hyper_kvm);
static __init int activate_jump_labels(void) { --- a/arch/x86/xen/enlighten_hvm.c +++ b/arch/x86/xen/enlighten_hvm.c @@ -226,12 +226,12 @@ static uint32_t __init xen_platform_hvm( return xen_cpuid_base(); }
-const struct hypervisor_x86 x86_hyper_xen_hvm = { +const __initconst struct hypervisor_x86 x86_hyper_xen_hvm = { .name = "Xen HVM", .detect = xen_platform_hvm, + .type = X86_HYPER_XEN_HVM, .init.init_platform = xen_hvm_guest_init, .init.x2apic_available = xen_x2apic_para_available, .init.init_mem_mapping = xen_hvm_init_mem_mapping, .runtime.pin_vcpu = xen_pin_vcpu, }; -EXPORT_SYMBOL(x86_hyper_xen_hvm); --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1459,9 +1459,9 @@ static uint32_t __init xen_platform_pv(v return 0; }
-const struct hypervisor_x86 x86_hyper_xen_pv = { +const __initconst struct hypervisor_x86 x86_hyper_xen_pv = { .name = "Xen PV", .detect = xen_platform_pv, + .type = X86_HYPER_XEN_PV, .runtime.pin_vcpu = xen_pin_vcpu, }; -EXPORT_SYMBOL(x86_hyper_xen_pv); --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -1534,7 +1534,7 @@ static int __init hv_acpi_init(void) { int ret, t;
- if (x86_hyper != &x86_hyper_ms_hyperv) + if (x86_hyper_type != X86_HYPER_MS_HYPERV) return -ENODEV;
init_completion(&probe_event); --- a/drivers/input/mouse/vmmouse.c +++ b/drivers/input/mouse/vmmouse.c @@ -316,11 +316,9 @@ static int vmmouse_enable(struct psmouse /* * Array of supported hypervisors. */ -static const struct hypervisor_x86 *vmmouse_supported_hypervisors[] = { - &x86_hyper_vmware, -#ifdef CONFIG_KVM_GUEST - &x86_hyper_kvm, -#endif +static enum x86_hypervisor_type vmmouse_supported_hypervisors[] = { + X86_HYPER_VMWARE, + X86_HYPER_KVM, };
/** @@ -331,7 +329,7 @@ static bool vmmouse_check_hypervisor(voi int i;
for (i = 0; i < ARRAY_SIZE(vmmouse_supported_hypervisors); i++) - if (vmmouse_supported_hypervisors[i] == x86_hyper) + if (vmmouse_supported_hypervisors[i] == x86_hyper_type) return true;
return false; --- a/drivers/misc/vmw_balloon.c +++ b/drivers/misc/vmw_balloon.c @@ -1271,7 +1271,7 @@ static int __init vmballoon_init(void) * Check if we are running on VMware's hypervisor and bail out * if we are not. */ - if (x86_hyper != &x86_hyper_vmware) + if (x86_hyper_type != X86_HYPER_VMWARE) return -ENODEV;
for (is_2m_pages = 0; is_2m_pages < VMW_BALLOON_NUM_PAGE_SIZES;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ingo Molnar mingo@kernel.org
commit 1784f9144b143a1e8b19fe94083b040aa559182b upstream.
We'd like to use the 'PTI' acronym for 'Page Table Isolation' - free up the namespace by renaming the <linux/pti.h> driver header to <linux/intel-pti.h>.
(Also standardize the header guard name while at it.)
Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: J Freyensee james_p_freyensee@linux.intel.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/misc/pti.c | 2 +- include/linux/intel-pti.h | 43 +++++++++++++++++++++++++++++++++++++++++++ include/linux/pti.h | 43 ------------------------------------------- 3 files changed, 44 insertions(+), 44 deletions(-)
--- a/drivers/misc/pti.c +++ b/drivers/misc/pti.c @@ -32,7 +32,7 @@ #include <linux/pci.h> #include <linux/mutex.h> #include <linux/miscdevice.h> -#include <linux/pti.h> +#include <linux/intel-pti.h> #include <linux/slab.h> #include <linux/uaccess.h>
--- /dev/null +++ b/include/linux/intel-pti.h @@ -0,0 +1,43 @@ +/* + * Copyright (C) Intel 2011 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + * + * The PTI (Parallel Trace Interface) driver directs trace data routed from + * various parts in the system out through the Intel Penwell PTI port and + * out of the mobile device for analysis with a debugging tool + * (Lauterbach, Fido). This is part of a solution for the MIPI P1149.7, + * compact JTAG, standard. + * + * This header file will allow other parts of the OS to use the + * interface to write out it's contents for debugging a mobile system. + */ + +#ifndef LINUX_INTEL_PTI_H_ +#define LINUX_INTEL_PTI_H_ + +/* offset for last dword of any PTI message. Part of MIPI P1149.7 */ +#define PTI_LASTDWORD_DTS 0x30 + +/* basic structure used as a write address to the PTI HW */ +struct pti_masterchannel { + u8 master; + u8 channel; +}; + +/* the following functions are defined in misc/pti.c */ +void pti_writedata(struct pti_masterchannel *mc, u8 *buf, int count); +struct pti_masterchannel *pti_request_masterchannel(u8 type, + const char *thread_name); +void pti_release_masterchannel(struct pti_masterchannel *mc); + +#endif /* LINUX_INTEL_PTI_H_ */ --- a/include/linux/pti.h +++ /dev/null @@ -1,43 +0,0 @@ -/* - * Copyright (C) Intel 2011 - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - * - * The PTI (Parallel Trace Interface) driver directs trace data routed from - * various parts in the system out through the Intel Penwell PTI port and - * out of the mobile device for analysis with a debugging tool - * (Lauterbach, Fido). This is part of a solution for the MIPI P1149.7, - * compact JTAG, standard. - * - * This header file will allow other parts of the OS to use the - * interface to write out it's contents for debugging a mobile system. - */ - -#ifndef PTI_H_ -#define PTI_H_ - -/* offset for last dword of any PTI message. Part of MIPI P1149.7 */ -#define PTI_LASTDWORD_DTS 0x30 - -/* basic structure used as a write address to the PTI HW */ -struct pti_masterchannel { - u8 master; - u8 channel; -}; - -/* the following functions are defined in misc/pti.c */ -void pti_writedata(struct pti_masterchannel *mc, u8 *buf, int count); -struct pti_masterchannel *pti_request_masterchannel(u8 type, - const char *thread_name); -void pti_release_masterchannel(struct pti_masterchannel *mc); - -#endif /*PTI_H_*/
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ricardo Neri ricardo.neri-calderon@linux.intel.com
commit a8b4db562e7283a1520f9e9730297ecaab7622ea upstream.
[ Note, this is a Git cherry-pick of the following commit: (limited to the cpufeatures.h file)
3522c2a6a4f3 ("x86/cpufeature: Add User-Mode Instruction Prevention definitions")
... for easier x86 PTI code testing and back-porting. ]
User-Mode Instruction Prevention is a security feature present in new Intel processors. When enabled, it prevents the execution of a subset of instructions in user mode (CPL > 0); attempting to execute such instructions causes a general protection exception.
The subset of instructions comprises:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register
This feature is also added to the list of disabled-features to allow a cleaner handling of build-time configuration.
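For illustration only (not part of the patch), a minimal user-space program exercising one of the listed instructions; with UMIP enabled, the SGDT below would raise #GP and kill the process instead of printing:

  #include <stdint.h>
  #include <stdio.h>

  struct __attribute__((packed)) desc_ptr {
          uint16_t limit;
          uint64_t base;
  };

  int main(void)
  {
          struct desc_ptr gdt;

          /* SGDT from CPL 3: #GP when UMIP is enabled */
          asm volatile("sgdt %0" : "=m" (gdt));
          printf("GDT base %#llx, limit %#x\n",
                 (unsigned long long)gdt.base, gdt.limit);
          return 0;
  }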
Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com Reviewed-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Borislav Petkov bp@suse.de Cc: Andrew Morton akpm@linux-foundation.org Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Brian Gerst brgerst@gmail.com Cc: Chen Yucong slaoub@gmail.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Denys Vlasenko dvlasenk@redhat.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: H. Peter Anvin hpa@zytor.com Cc: Huang Rui ray.huang@amd.com Cc: Jiri Slaby jslaby@suse.cz Cc: Jonathan Corbet corbet@lwn.net Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Michael S. Tsirkin mst@redhat.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ravi V. Shankar ravi.v.shankar@intel.com Cc: Shuah Khan shuah@kernel.org Cc: Tony Luck tony.luck@intel.com Cc: Vlastimil Babka vbabka@suse.cz Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1509935277-22138-7-git-send-email-ricardo.neri-cald... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/cpufeatures.h | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -296,6 +296,7 @@
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */
 #define X86_FEATURE_AVX512VBMI	(16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
+#define X86_FEATURE_UMIP	(16*32+ 2) /* User Mode Instruction Protection */
 #define X86_FEATURE_PKU		(16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE	(16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rudolf Marek r.marek@assembler.cz
commit f2dbad36c55e5d3a91dccbde6e8cae345fe5632f upstream.
[ Note, this is a Git cherry-pick of the following commit:
2b67799bdf25 ("x86: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD")
... for easier x86 PTI code testing and back-porting. ]
The latest AMD AMD64 Architecture Programmer's Manual adds a CPUID feature XSaveErPtr (CPUID_Fn80000008_EBX[2]).
If this feature is set, the FXSAVE, XSAVE, FXSAVEOPT, XSAVEC and XSAVES / FXRSTOR, XRSTOR and XRSTORS instructions always save/restore the FPU error pointers, thus making the X86_BUG_FXSAVE_LEAK workaround obsolete on such CPUs.
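The bit can also be inspected from user space; a small sketch using GCC's cpuid.h (not part of the patch):

  #include <cpuid.h>
  #include <stdio.h>

  int main(void)
  {
          unsigned int eax, ebx, ecx, edx;

          /* XSaveErPtr is CPUID_Fn80000008_EBX[2], per the manual */
          if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
                  return 1;
          printf("XSaveErPtr: %s\n", (ebx & (1u << 2)) ? "yes" : "no");
          return 0;
  }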
Signed-Off-By: Rudolf Marek r.marek@assembler.cz Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Borislav Petkov bp@suse.de Tested-by: Borislav Petkov bp@suse.de Cc: Andy Lutomirski luto@amacapital.net Link: https://lkml.kernel.org/r/bdcebe90-62c5-1f05-083c-eba7f08b2540@assembler.cz Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/amd.c | 7 +++++-- 2 files changed, 6 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -266,6 +266,7 @@
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO	(13*32+ 0) /* CLZERO instruction */
 #define X86_FEATURE_IRPERF	(13*32+ 1) /* Instructions Retired Count */
+#define X86_FEATURE_XSAVEERPTR	(13*32+ 2) /* Always save/restore FP error pointers */
 
 /* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */
 #define X86_FEATURE_DTHERM	(14*32+ 0) /* Digital Thermal Sensor */
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -804,8 +804,11 @@ static void init_amd(struct cpuinfo_x86
 	case 0x17: init_amd_zn(c); break;
 	}
 
-	/* Enable workaround for FXSAVE leak */
-	if (c->x86 >= 6)
+	/*
+	 * Enable workaround for FXSAVE leak on CPUs
+	 * without a XSaveErPtr feature
+	 */
+	if ((c->x86 >= 6) && (!cpu_has(c, X86_FEATURE_XSAVEERPTR)))
 		set_cpu_bug(c, X86_BUG_FXSAVE_LEAK);
 
 	cpu_detect_cache_sizes(c);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andi Kleen ak@linux.intel.com
commit 2fe1bc1f501d55e5925b4035bcd85781adc76c63 upstream.
[ Note, this is a Git cherry-pick of the following commit:
a47ba4d77e12 ("perf/x86: Enable free running PEBS for REGS_USER/INTR")
... for easier x86 PTI code testing and back-porting. ]
Currently free running PEBS is disabled when user or interrupt registers are requested. Most of the registers are actually available in the PEBS record and can be supported.
So we just need to check for the supported registers and then allow it: all registers are supported except for the segment registers.
For user registers this only works when the counter is limited to ring 3, so this also needs to be checked.
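A sketch of an event configuration that qualifies for free-running PEBS under these rules; the attribute setup below is illustrative, not taken from the patch:

  #include <linux/perf_event.h>
  #include <asm/perf_regs.h>
  #include <string.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static int open_cycles_with_user_regs(void)
  {
          struct perf_event_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.size = sizeof(attr);
          attr.type = PERF_TYPE_HARDWARE;
          attr.config = PERF_COUNT_HW_CPU_CYCLES;
          attr.sample_period = 100003;
          attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_REGS_USER;
          /* only registers present in the PEBS record (no segment registers) */
          attr.sample_regs_user = (1ULL << PERF_REG_X86_IP) |
                                  (1ULL << PERF_REG_X86_SP);
          attr.exclude_kernel = 1;        /* ring 3 only, as required for REGS_USER */

          return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
  }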
Signed-off-by: Andi Kleen ak@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/20170831214630.21892-1-andi@firstfloor.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/intel/core.c | 4 ++++ arch/x86/events/perf_event.h | 24 +++++++++++++++++++++++- 2 files changed, 27 insertions(+), 1 deletion(-)
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2958,6 +2958,10 @@ static unsigned long intel_pmu_free_runn
 
 	if (event->attr.use_clockid)
 		flags &= ~PERF_SAMPLE_TIME;
+	if (!event->attr.exclude_kernel)
+		flags &= ~PERF_SAMPLE_REGS_USER;
+	if (event->attr.sample_regs_user & ~PEBS_REGS)
+		flags &= ~(PERF_SAMPLE_REGS_USER | PERF_SAMPLE_REGS_INTR);
 	return flags;
 }
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -85,13 +85,15 @@ struct amd_nb {
  * Flags PEBS can handle without an PMI.
  *
  * TID can only be handled by flushing at context switch.
+ * REGS_USER can be handled for events limited to ring 3.
  *
  */
 #define PEBS_FREERUNNING_FLAGS \
 	(PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_ADDR | \
 	PERF_SAMPLE_ID | PERF_SAMPLE_CPU | PERF_SAMPLE_STREAM_ID | \
 	PERF_SAMPLE_DATA_SRC | PERF_SAMPLE_IDENTIFIER | \
-	PERF_SAMPLE_TRANSACTION | PERF_SAMPLE_PHYS_ADDR)
+	PERF_SAMPLE_TRANSACTION | PERF_SAMPLE_PHYS_ADDR | \
+	PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)
 
 /*
  * A debug store configuration.
@@ -110,6 +112,26 @@ struct debug_store {
 	u64	pebs_event_reset[MAX_PEBS_EVENTS];
 };
 
+#define PEBS_REGS \
+	(PERF_REG_X86_AX | \
+	 PERF_REG_X86_BX | \
+	 PERF_REG_X86_CX | \
+	 PERF_REG_X86_DX | \
+	 PERF_REG_X86_DI | \
+	 PERF_REG_X86_SI | \
+	 PERF_REG_X86_SP | \
+	 PERF_REG_X86_BP | \
+	 PERF_REG_X86_IP | \
+	 PERF_REG_X86_FLAGS | \
+	 PERF_REG_X86_R8 | \
+	 PERF_REG_X86_R9 | \
+	 PERF_REG_X86_R10 | \
+	 PERF_REG_X86_R11 | \
+	 PERF_REG_X86_R12 | \
+	 PERF_REG_X86_R13 | \
+	 PERF_REG_X86_R14 | \
+	 PERF_REG_X86_R15)
+
 /*
  * Per register state.
  */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
commit ab95477e7cb35557ecfc837687007b646bab9a9f upstream.
[ Note, this is a Git cherry-pick of the following commit:
a23f06f06dbe ("bpf: fix build issues on um due to mising bpf_perf_event.h")
... for easier x86 PTI code testing and back-porting. ]
Since c895f6f703ad ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type") um (uml) won't build on i386 or x86_64:
[...]
  CC      init/main.o
In file included from ../include/linux/perf_event.h:18:0,
                 from ../include/linux/trace_events.h:10,
                 from ../include/trace/syscall.h:7,
                 from ../include/linux/syscalls.h:82,
                 from ../init/main.c:20:
../include/uapi/linux/bpf_perf_event.h:11:32: fatal error: asm/bpf_perf_event.h: No such file or directory
 #include <asm/bpf_perf_event.h>
[...]
Let's add the missing bpf_perf_event.h to the um arch as well. This seems to be the only one still missing.
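For background on the mechanism (not part of the patch): a generic-y entry makes Kbuild emit a one-line wrapper header under include/generated/asm/, so the arch transparently picks up the asm-generic version. The generated file looks roughly like:

  /* arch/um/include/generated/asm/bpf_perf_event.h (generated by Kbuild) */
  #include <asm-generic/bpf_perf_event.h>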
Fixes: c895f6f703ad ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type") Reported-by: Randy Dunlap rdunlap@infradead.org Suggested-by: Richard Weinberger richard@sigma-star.at Signed-off-by: Daniel Borkmann daniel@iogearbox.net Tested-by: Randy Dunlap rdunlap@infradead.org Cc: Hendrik Brueckner brueckner@linux.vnet.ibm.com Cc: Richard Weinberger richard@sigma-star.at Acked-by: Alexei Starovoitov ast@kernel.org Acked-by: Richard Weinberger richard@nod.at Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/um/include/asm/Kbuild | 1 + 1 file changed, 1 insertion(+)
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -1,4 +1,5 @@
 generic-y += barrier.h
+generic-y += bpf_perf_event.h
 generic-y += bug.h
 generic-y += clkdev.h
 generic-y += current.h
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Will Deacon will.deacon@arm.com
commit c2bc66082e1048c7573d72e62f597bdc5ce13fea upstream.
[ Note, this is a Git cherry-pick of the following commit:
76ebbe78f739 ("locking/barriers: Add implicit smp_read_barrier_depends() to READ_ONCE()")
... for easier x86 PTI code testing and back-porting. ]
In preparation for the removal of lockless_dereference(), which is the same as READ_ONCE() on all architectures other than Alpha, add an implicit smp_read_barrier_depends() to READ_ONCE() so that it can be used to head dependency chains on all architectures.
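A sketch of the publish/consume pattern this enables (hypothetical kernel-style code, not from the patch):

  struct foo { int data; };
  static struct foo *gp;

  /* publisher */
  static void publish(struct foo *p)
  {
          p->data = 42;
          smp_store_release(&gp, p);      /* pairs with the READ_ONCE() below */
  }

  /*
   * consumer: READ_ONCE() now heads the dependency chain, so the
   * dependent load of q->data is ordered even on Alpha
   */
  static int consume(void)
  {
          struct foo *q = READ_ONCE(gp);

          return q ? q->data : -1;
  }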
Signed-off-by: Will Deacon will.deacon@arm.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Paul E. McKenney paulmck@linux.vnet.ibm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/1508840570-22169-3-git-send-email-will.deacon@arm.c... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/compiler.h | 1 + 1 file changed, 1 insertion(+)
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -341,6 +341,7 @@ static __always_inline void __write_once
 		__read_once_size(&(x), __u.__c, sizeof(x));		\
 	else								\
 		__read_once_size_nocheck(&(x), __u.__c, sizeof(x));	\
+	smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
 	__u.__val;							\
 })
 #define READ_ONCE(x) __READ_ONCE(x, 1)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Will Deacon will.deacon@arm.com
commit 3382290ed2d5e275429cef510ab21889d3ccd164 upstream.
[ Note, this is a Git cherry-pick of the following commit:
506458efaf15 ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()")
... for easier x86 PTI code testing and back-porting. ]
READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it can be used instead of lockless_dereference() without any change in semantics.
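Each conversion is therefore mechanical, as in this sketch mirroring the load_mm_ldt() hunk below:

  /* before */
  ldt = lockless_dereference(mm->context.ldt);

  /* after: identical ordering guarantees, one primitive fewer */
  ldt = READ_ONCE(mm->context.ldt);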
Signed-off-by: Will Deacon will.deacon@arm.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Paul E. McKenney paulmck@linux.vnet.ibm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.c... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/core.c | 2 +- arch/x86/include/asm/mmu_context.h | 4 ++-- arch/x86/kernel/ldt.c | 2 +- drivers/md/dm-mpath.c | 20 ++++++++++---------- fs/dcache.c | 4 ++-- fs/overlayfs/ovl_entry.h | 2 +- fs/overlayfs/readdir.c | 2 +- include/linux/rculist.h | 4 ++-- include/linux/rcupdate.h | 4 ++-- kernel/events/core.c | 4 ++-- kernel/seccomp.c | 2 +- kernel/task_work.c | 2 +- mm/slab.h | 2 +- 13 files changed, 27 insertions(+), 27 deletions(-)
--- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -2371,7 +2371,7 @@ static unsigned long get_segment_base(un struct ldt_struct *ldt;
/* IRQs are off, so this synchronizes with smp_store_release */ - ldt = lockless_dereference(current->active_mm->context.ldt); + ldt = READ_ONCE(current->active_mm->context.ldt); if (!ldt || idx >= ldt->nr_entries) return 0;
--- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -73,8 +73,8 @@ static inline void load_mm_ldt(struct mm #ifdef CONFIG_MODIFY_LDT_SYSCALL struct ldt_struct *ldt;
- /* lockless_dereference synchronizes with smp_store_release */ - ldt = lockless_dereference(mm->context.ldt); + /* READ_ONCE synchronizes with smp_store_release */ + ldt = READ_ONCE(mm->context.ldt);
/* * Any change to mm->context.ldt is followed by an IPI to all --- a/arch/x86/kernel/ldt.c +++ b/arch/x86/kernel/ldt.c @@ -103,7 +103,7 @@ static void finalize_ldt_struct(struct l static void install_ldt(struct mm_struct *current_mm, struct ldt_struct *ldt) { - /* Synchronizes with lockless_dereference in load_mm_ldt. */ + /* Synchronizes with READ_ONCE in load_mm_ldt. */ smp_store_release(¤t_mm->context.ldt, ldt);
/* Activate the LDT for all CPUs using current_mm. */ --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -366,7 +366,7 @@ static struct pgpath *choose_path_in_pg(
pgpath = path_to_pgpath(path);
- if (unlikely(lockless_dereference(m->current_pg) != pg)) { + if (unlikely(READ_ONCE(m->current_pg) != pg)) { /* Only update current_pgpath if pg changed */ spin_lock_irqsave(&m->lock, flags); m->current_pgpath = pgpath; @@ -390,7 +390,7 @@ static struct pgpath *choose_pgpath(stru }
/* Were we instructed to switch PG? */ - if (lockless_dereference(m->next_pg)) { + if (READ_ONCE(m->next_pg)) { spin_lock_irqsave(&m->lock, flags); pg = m->next_pg; if (!pg) { @@ -406,7 +406,7 @@ static struct pgpath *choose_pgpath(stru
/* Don't change PG until it has no remaining paths */ check_current_pg: - pg = lockless_dereference(m->current_pg); + pg = READ_ONCE(m->current_pg); if (pg) { pgpath = choose_path_in_pg(m, pg, nr_bytes); if (!IS_ERR_OR_NULL(pgpath)) @@ -473,7 +473,7 @@ static int multipath_clone_and_map(struc struct request *clone;
/* Do we need to select a new pgpath? */ - pgpath = lockless_dereference(m->current_pgpath); + pgpath = READ_ONCE(m->current_pgpath); if (!pgpath || !test_bit(MPATHF_QUEUE_IO, &m->flags)) pgpath = choose_pgpath(m, nr_bytes);
@@ -533,7 +533,7 @@ static int __multipath_map_bio(struct mu bool queue_io;
/* Do we need to select a new pgpath? */ - pgpath = lockless_dereference(m->current_pgpath); + pgpath = READ_ONCE(m->current_pgpath); queue_io = test_bit(MPATHF_QUEUE_IO, &m->flags); if (!pgpath || !queue_io) pgpath = choose_pgpath(m, nr_bytes); @@ -1802,7 +1802,7 @@ static int multipath_prepare_ioctl(struc struct pgpath *current_pgpath; int r;
- current_pgpath = lockless_dereference(m->current_pgpath); + current_pgpath = READ_ONCE(m->current_pgpath); if (!current_pgpath) current_pgpath = choose_pgpath(m, 0);
@@ -1824,7 +1824,7 @@ static int multipath_prepare_ioctl(struc }
if (r == -ENOTCONN) { - if (!lockless_dereference(m->current_pg)) { + if (!READ_ONCE(m->current_pg)) { /* Path status changed, redo selection */ (void) choose_pgpath(m, 0); } @@ -1893,9 +1893,9 @@ static int multipath_busy(struct dm_targ return (m->queue_mode != DM_TYPE_MQ_REQUEST_BASED);
/* Guess which priority_group will be used at next mapping time */ - pg = lockless_dereference(m->current_pg); - next_pg = lockless_dereference(m->next_pg); - if (unlikely(!lockless_dereference(m->current_pgpath) && next_pg)) + pg = READ_ONCE(m->current_pg); + next_pg = READ_ONCE(m->next_pg); + if (unlikely(!READ_ONCE(m->current_pgpath) && next_pg)) pg = next_pg;
if (!pg) { --- a/fs/dcache.c +++ b/fs/dcache.c @@ -231,7 +231,7 @@ static inline int dentry_cmp(const struc { /* * Be careful about RCU walk racing with rename: - * use 'lockless_dereference' to fetch the name pointer. + * use 'READ_ONCE' to fetch the name pointer. * * NOTE! Even if a rename will mean that the length * was not loaded atomically, we don't care. The @@ -245,7 +245,7 @@ static inline int dentry_cmp(const struc * early because the data cannot match (there can * be no NUL in the ct/tcount data) */ - const unsigned char *cs = lockless_dereference(dentry->d_name.name); + const unsigned char *cs = READ_ONCE(dentry->d_name.name);
return dentry_string_cmp(cs, ct, tcount); } --- a/fs/overlayfs/ovl_entry.h +++ b/fs/overlayfs/ovl_entry.h @@ -77,5 +77,5 @@ static inline struct ovl_inode *OVL_I(st
static inline struct dentry *ovl_upperdentry_dereference(struct ovl_inode *oi) { - return lockless_dereference(oi->__upperdentry); + return READ_ONCE(oi->__upperdentry); } --- a/fs/overlayfs/readdir.c +++ b/fs/overlayfs/readdir.c @@ -757,7 +757,7 @@ static int ovl_dir_fsync(struct file *fi if (!od->is_upper && OVL_TYPE_UPPER(ovl_path_type(dentry))) { struct inode *inode = file_inode(file);
- realfile = lockless_dereference(od->upperfile); + realfile = READ_ONCE(od->upperfile); if (!realfile) { struct path upperpath;
--- a/include/linux/rculist.h +++ b/include/linux/rculist.h @@ -275,7 +275,7 @@ static inline void list_splice_tail_init * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock(). */ #define list_entry_rcu(ptr, type, member) \ - container_of(lockless_dereference(ptr), type, member) + container_of(READ_ONCE(ptr), type, member)
/* * Where are list_empty_rcu() and list_first_entry_rcu()? @@ -368,7 +368,7 @@ static inline void list_splice_tail_init * example is when items are added to the list, but never deleted. */ #define list_entry_lockless(ptr, type, member) \ - container_of((typeof(ptr))lockless_dereference(ptr), type, member) + container_of((typeof(ptr))READ_ONCE(ptr), type, member)
/** * list_for_each_entry_lockless - iterate over rcu list of given type --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -346,7 +346,7 @@ static inline void rcu_preempt_sleep_che #define __rcu_dereference_check(p, c, space) \ ({ \ /* Dependency order vs. p above. */ \ - typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \ + typeof(*p) *________p1 = (typeof(*p) *__force)READ_ONCE(p); \ RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \ rcu_dereference_sparse(p, space); \ ((typeof(*p) __force __kernel *)(________p1)); \ @@ -360,7 +360,7 @@ static inline void rcu_preempt_sleep_che #define rcu_dereference_raw(p) \ ({ \ /* Dependency order vs. p above. */ \ - typeof(p) ________p1 = lockless_dereference(p); \ + typeof(p) ________p1 = READ_ONCE(p); \ ((typeof(*p) __force __kernel *)(________p1)); \ })
--- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -4233,7 +4233,7 @@ static void perf_remove_from_owner(struc * indeed free this event, otherwise we need to serialize on * owner->perf_event_mutex. */ - owner = lockless_dereference(event->owner); + owner = READ_ONCE(event->owner); if (owner) { /* * Since delayed_put_task_struct() also drops the last @@ -4330,7 +4330,7 @@ again: * Cannot change, child events are not migrated, see the * comment with perf_event_ctx_lock_nested(). */ - ctx = lockless_dereference(child->ctx); + ctx = READ_ONCE(child->ctx); /* * Since child_mutex nests inside ctx::mutex, we must jump * through hoops. We start by grabbing a reference on the ctx. --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -190,7 +190,7 @@ static u32 seccomp_run_filters(const str u32 ret = SECCOMP_RET_ALLOW; /* Make sure cross-thread synced filter points somewhere sane. */ struct seccomp_filter *f = - lockless_dereference(current->seccomp.filter); + READ_ONCE(current->seccomp.filter);
/* Ensure unexpected behavior doesn't result in failing open. */ if (unlikely(WARN_ON(f == NULL))) --- a/kernel/task_work.c +++ b/kernel/task_work.c @@ -68,7 +68,7 @@ task_work_cancel(struct task_struct *tas * we raced with task_work_run(), *pprev == NULL/exited. */ raw_spin_lock_irqsave(&task->pi_lock, flags); - while ((work = lockless_dereference(*pprev))) { + while ((work = READ_ONCE(*pprev))) { if (work->func != func) pprev = &work->next; else if (cmpxchg(pprev, work, work->next) == work) --- a/mm/slab.h +++ b/mm/slab.h @@ -259,7 +259,7 @@ cache_from_memcg_idx(struct kmem_cache * * memcg_caches issues a write barrier to match this (see * memcg_create_kmem_cache()). */ - cachep = lockless_dereference(arr->entries[idx]); + cachep = READ_ONCE(arr->entries[idx]); rcu_read_unlock();
return cachep;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrey Ryabinin aryabinin@virtuozzo.com
commit 2aeb07365bcd489620f71390a7d2031cd4dfb83e upstream.
[ Note, this is a Git cherry-pick of the following commit:
d17a1d97dc20: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")
... for easier x86 PTI code testing and back-porting. ]
The KASAN shadow is currently mapped using vmemmap_populate() since that provides a semi-convenient way to map pages into init_top_pgt. However, since that no longer zeroes the mapped pages, it is not suitable for KASAN, which requires zeroed shadow memory.
Add kasan_populate_shadow() interface and use it instead of vmemmap_populate(). Besides, this allows us to take advantage of gigantic pages and use them to populate the shadow, which should save us some memory wasted on page tables and reduce TLB pressure.
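For context (not part of this patch), the shadow translation that these mappings back: KASAN maps each 8 bytes of kernel address space to one shadow byte, roughly as defined in include/linux/kasan.h:

  static inline void *kasan_mem_to_shadow(const void *addr)
  {
          return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
                  + KASAN_SHADOW_OFFSET;
  }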
Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com Signed-off-by: Andrey Ryabinin aryabinin@virtuozzo.com Signed-off-by: Pavel Tatashin pasha.tatashin@oracle.com Cc: Andy Lutomirski luto@kernel.org Cc: Steven Sistare steven.sistare@oracle.com Cc: Daniel Jordan daniel.m.jordan@oracle.com Cc: Bob Picco bob.picco@oracle.com Cc: Michal Hocko mhocko@suse.com Cc: Alexander Potapenko glider@google.com Cc: Ard Biesheuvel ard.biesheuvel@linaro.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Christian Borntraeger borntraeger@de.ibm.com Cc: David S. Miller davem@davemloft.net Cc: Dmitry Vyukov dvyukov@google.com Cc: Heiko Carstens heiko.carstens@de.ibm.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Mark Rutland mark.rutland@arm.com Cc: Matthew Wilcox willy@infradead.org Cc: Mel Gorman mgorman@techsingularity.net Cc: Michal Hocko mhocko@kernel.org Cc: Sam Ravnborg sam@ravnborg.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Will Deacon will.deacon@arm.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/Kconfig | 2 arch/x86/mm/kasan_init_64.c | 143 +++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 137 insertions(+), 8 deletions(-)
--- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -108,7 +108,7 @@ config X86 select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE select HAVE_ARCH_JUMP_LABEL - select HAVE_ARCH_KASAN if X86_64 && SPARSEMEM_VMEMMAP + select HAVE_ARCH_KASAN if X86_64 select HAVE_ARCH_KGDB select HAVE_ARCH_KMEMCHECK select HAVE_ARCH_MMAP_RND_BITS if MMU --- a/arch/x86/mm/kasan_init_64.c +++ b/arch/x86/mm/kasan_init_64.c @@ -4,12 +4,14 @@ #include <linux/bootmem.h> #include <linux/kasan.h> #include <linux/kdebug.h> +#include <linux/memblock.h> #include <linux/mm.h> #include <linux/sched.h> #include <linux/sched/task.h> #include <linux/vmalloc.h>
#include <asm/e820/types.h> +#include <asm/pgalloc.h> #include <asm/tlbflush.h> #include <asm/sections.h> #include <asm/pgtable.h> @@ -18,7 +20,134 @@ extern struct range pfn_mapped[E820_MAX_
static p4d_t tmp_p4d_table[PTRS_PER_P4D] __initdata __aligned(PAGE_SIZE);
-static int __init map_range(struct range *range) +static __init void *early_alloc(size_t size, int nid) +{ + return memblock_virt_alloc_try_nid_nopanic(size, size, + __pa(MAX_DMA_ADDRESS), BOOTMEM_ALLOC_ACCESSIBLE, nid); +} + +static void __init kasan_populate_pmd(pmd_t *pmd, unsigned long addr, + unsigned long end, int nid) +{ + pte_t *pte; + + if (pmd_none(*pmd)) { + void *p; + + if (boot_cpu_has(X86_FEATURE_PSE) && + ((end - addr) == PMD_SIZE) && + IS_ALIGNED(addr, PMD_SIZE)) { + p = early_alloc(PMD_SIZE, nid); + if (p && pmd_set_huge(pmd, __pa(p), PAGE_KERNEL)) + return; + else if (p) + memblock_free(__pa(p), PMD_SIZE); + } + + p = early_alloc(PAGE_SIZE, nid); + pmd_populate_kernel(&init_mm, pmd, p); + } + + pte = pte_offset_kernel(pmd, addr); + do { + pte_t entry; + void *p; + + if (!pte_none(*pte)) + continue; + + p = early_alloc(PAGE_SIZE, nid); + entry = pfn_pte(PFN_DOWN(__pa(p)), PAGE_KERNEL); + set_pte_at(&init_mm, addr, pte, entry); + } while (pte++, addr += PAGE_SIZE, addr != end); +} + +static void __init kasan_populate_pud(pud_t *pud, unsigned long addr, + unsigned long end, int nid) +{ + pmd_t *pmd; + unsigned long next; + + if (pud_none(*pud)) { + void *p; + + if (boot_cpu_has(X86_FEATURE_GBPAGES) && + ((end - addr) == PUD_SIZE) && + IS_ALIGNED(addr, PUD_SIZE)) { + p = early_alloc(PUD_SIZE, nid); + if (p && pud_set_huge(pud, __pa(p), PAGE_KERNEL)) + return; + else if (p) + memblock_free(__pa(p), PUD_SIZE); + } + + p = early_alloc(PAGE_SIZE, nid); + pud_populate(&init_mm, pud, p); + } + + pmd = pmd_offset(pud, addr); + do { + next = pmd_addr_end(addr, end); + if (!pmd_large(*pmd)) + kasan_populate_pmd(pmd, addr, next, nid); + } while (pmd++, addr = next, addr != end); +} + +static void __init kasan_populate_p4d(p4d_t *p4d, unsigned long addr, + unsigned long end, int nid) +{ + pud_t *pud; + unsigned long next; + + if (p4d_none(*p4d)) { + void *p = early_alloc(PAGE_SIZE, nid); + + p4d_populate(&init_mm, p4d, p); + } + + pud = pud_offset(p4d, addr); + do { + next = pud_addr_end(addr, end); + if (!pud_large(*pud)) + kasan_populate_pud(pud, addr, next, nid); + } while (pud++, addr = next, addr != end); +} + +static void __init kasan_populate_pgd(pgd_t *pgd, unsigned long addr, + unsigned long end, int nid) +{ + void *p; + p4d_t *p4d; + unsigned long next; + + if (pgd_none(*pgd)) { + p = early_alloc(PAGE_SIZE, nid); + pgd_populate(&init_mm, pgd, p); + } + + p4d = p4d_offset(pgd, addr); + do { + next = p4d_addr_end(addr, end); + kasan_populate_p4d(p4d, addr, next, nid); + } while (p4d++, addr = next, addr != end); +} + +static void __init kasan_populate_shadow(unsigned long addr, unsigned long end, + int nid) +{ + pgd_t *pgd; + unsigned long next; + + addr = addr & PAGE_MASK; + end = round_up(end, PAGE_SIZE); + pgd = pgd_offset_k(addr); + do { + next = pgd_addr_end(addr, end); + kasan_populate_pgd(pgd, addr, next, nid); + } while (pgd++, addr = next, addr != end); +} + +static void __init map_range(struct range *range) { unsigned long start; unsigned long end; @@ -26,7 +155,7 @@ static int __init map_range(struct range start = (unsigned long)kasan_mem_to_shadow(pfn_to_kaddr(range->start)); end = (unsigned long)kasan_mem_to_shadow(pfn_to_kaddr(range->end));
- return vmemmap_populate(start, end, NUMA_NO_NODE); + kasan_populate_shadow(start, end, early_pfn_to_nid(range->start)); }
static void __init clear_pgds(unsigned long start, @@ -189,16 +318,16 @@ void __init kasan_init(void) if (pfn_mapped[i].end == 0) break;
- if (map_range(&pfn_mapped[i])) - panic("kasan: unable to allocate shadow!"); + map_range(&pfn_mapped[i]); } + kasan_populate_zero_shadow( kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM), kasan_mem_to_shadow((void *)__START_KERNEL_map));
- vmemmap_populate((unsigned long)kasan_mem_to_shadow(_stext), - (unsigned long)kasan_mem_to_shadow(_end), - NUMA_NO_NODE); + kasan_populate_shadow((unsigned long)kasan_mem_to_shadow(_stext), + (unsigned long)kasan_mem_to_shadow(_end), + early_pfn_to_nid(__pa(_stext)));
kasan_populate_zero_shadow(kasan_mem_to_shadow((void *)MODULES_END), (void *)KASAN_SHADOW_END);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Ostrovsky boris.ostrovsky@oracle.com
commit e17f8234538d1ff708673f287a42457c4dee720d upstream.
Commit 1d3e53e8624a ("x86/entry/64: Refactor IRQ stacks and make them NMI-safe") added the DEBUG_ENTRY_ASSERT_IRQS_OFF macro, which accesses eflags using the 'pushfq' instruction when testing for the IF bit. On PV Xen guests, looking at the IF flag directly will always see it set, resulting in 'ud2'.
Introduce a SAVE_FLAGS() macro that will use the appropriate save_fl pv op when running paravirtualized.
Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Juergen Gross jgross@suse.com Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: David Laight David.Laight@aculab.com Cc: Denys Vlasenko dvlasenk@redhat.com Cc: Eduardo Valentin eduval@amazon.com Cc: Greg KH gregkh@linuxfoundation.org Cc: H. Peter Anvin hpa@zytor.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Rik van Riel riel@redhat.com Cc: Will Deacon will.deacon@arm.com Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20171204150604.899457242@linutronix.de Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/entry/entry_64.S | 7 ++++--- arch/x86/include/asm/irqflags.h | 3 +++ arch/x86/include/asm/paravirt.h | 9 +++++++++ arch/x86/kernel/asm-offsets_64.c | 3 +++ 4 files changed, 19 insertions(+), 3 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -462,12 +462,13 @@ END(irq_entries_start)
 
 .macro DEBUG_ENTRY_ASSERT_IRQS_OFF
 #ifdef CONFIG_DEBUG_ENTRY
-	pushfq
-	testl	$X86_EFLAGS_IF, (%rsp)
+	pushq	%rax
+	SAVE_FLAGS(CLBR_RAX)
+	testl	$X86_EFLAGS_IF, %eax
 	jz	.Lokay_\@
 	ud2
 .Lokay_\@:
-	addq	$8, %rsp
+	popq	%rax
 #endif
 .endm
--- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -142,6 +142,9 @@ static inline notrace unsigned long arch swapgs; \ sysretl
+#ifdef CONFIG_DEBUG_ENTRY +#define SAVE_FLAGS(x) pushfq; popq %rax +#endif #else #define INTERRUPT_RETURN iret #define ENABLE_INTERRUPTS_SYSEXIT sti; sysexit --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -927,6 +927,15 @@ extern void default_banner(void); PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_usergs_sysret64), \ CLBR_NONE, \ jmp PARA_INDIRECT(pv_cpu_ops+PV_CPU_usergs_sysret64)) + +#ifdef CONFIG_DEBUG_ENTRY +#define SAVE_FLAGS(clobbers) \ + PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_save_fl), clobbers, \ + PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE); \ + call PARA_INDIRECT(pv_irq_ops+PV_IRQ_save_fl); \ + PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);) +#endif + #endif /* CONFIG_X86_32 */
#endif /* __ASSEMBLY__ */ --- a/arch/x86/kernel/asm-offsets_64.c +++ b/arch/x86/kernel/asm-offsets_64.c @@ -23,6 +23,9 @@ int main(void) #ifdef CONFIG_PARAVIRT OFFSET(PV_CPU_usergs_sysret64, pv_cpu_ops, usergs_sysret64); OFFSET(PV_CPU_swapgs, pv_cpu_ops, swapgs); +#ifdef CONFIG_DEBUG_ENTRY + OFFSET(PV_IRQ_save_fl, pv_irq_ops, save_fl); +#endif BLANK(); #endif
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit d3a09104018cf2ad5973dfa8a9c138ef9f5015a3 upstream.
If the stack overflows into a guard page, the ORC unwinder should work well: by construction, there can't be any meaningful data in the guard page because no writes to the guard page will have succeeded.
But there is a bug that prevents unwinding from working correctly: if the starting register state has RSP pointing into a stack guard page, the ORC unwinder bails out immediately.
Instead of bailing out immediately, check whether the next page up is a valid stack page, and if so, analyze that. As a result, the ORC unwinder will start the unwind.
Tested by intentionally overflowing the task stack. The result is an accurate call trace instead of a trace consisting purely of '?' entries.
There are a few other bugs that are triggered if t