This is the start of the stable review cycle for the 4.4.180 release. There are 266 patches in this series; all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri 17 May 2019 09:04:49 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.180-rc1...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.4.180-rc1
Laurentiu Tudor laurentiu.tudor@nxp.com powerpc/booke64: set RI in default MSR
Dan Carpenter dan.carpenter@oracle.com drivers/virt/fsl_hypervisor.c: prevent integer overflow in ioctl
Dan Carpenter dan.carpenter@oracle.com drivers/virt/fsl_hypervisor.c: dereferencing error pointers in ioctl
Jarod Wilson jarod@redhat.com bonding: fix arp_validate toggling in active-backup mode
David Ahern dsahern@gmail.com ipv4: Fix raw socket lookup for local traffic
Stephen Suryaputra ssuryaextr@gmail.com vrf: sit mtu should not be updated when vrf netdev is the link
Hangbin Liu liuhangbin@gmail.com vlan: disable SIOCSHWTSTAMP in container
YueHaibing yuehaibing@huawei.com packet: Fix error path in packet_init
Christophe Leroy christophe.leroy@c-s.fr net: ucc_geth - fix Oops when changing number of buffers in the ring
Tobin C. Harding tobin@kernel.org bridge: Fix error path for kobject_init_and_add()
Breno Leitao leitao@debian.org powerpc/64s: Include cpu header
Johan Hovold johan@kernel.org USB: serial: fix unthrottle races
Oliver Neukum oneukum@suse.com USB: serial: use variable for status
Ben Hutchings ben@decadent.org.uk x86/bugs: Change L1TF mitigation string to match upstream
Josh Poimboeuf jpoimboe@redhat.com x86/speculation/mds: Fix documentation typo
Tyler Hicks tyhicks@canonical.com Documentation: Correct the possible MDS sysfs values
speck for Pawan Gupta speck@linutronix.de x86/mds: Add MDSUM variant to the MDS documentation
Josh Poimboeuf jpoimboe@redhat.com x86/speculation/mds: Add 'mitigations=' support for MDS
Josh Poimboeuf jpoimboe@redhat.com x86/speculation: Support 'mitigations=' cmdline option
Josh Poimboeuf jpoimboe@redhat.com cpu/speculation: Add 'mitigations=' cmdline option
Konrad Rzeszutek Wilk konrad.wilk@oracle.com x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off
Boris Ostrovsky boris.ostrovsky@oracle.com x86/speculation/mds: Fix comment
Josh Poimboeuf jpoimboe@redhat.com x86/speculation/mds: Add SMT warning message
Josh Poimboeuf jpoimboe@redhat.com x86/speculation: Move arch_smt_update() call to after mitigation decisions
Andi Kleen ak@linux.intel.com x86/cpu/bugs: Use __initconst for 'const' init data
Thomas Gleixner tglx@linutronix.de Documentation: Add MDS vulnerability documentation
Thomas Gleixner tglx@linutronix.de Documentation: Move L1TF to separate directory
Thomas Gleixner tglx@linutronix.de x86/speculation/mds: Add mitigation mode VMWERV
Thomas Gleixner tglx@linutronix.de x86/speculation/mds: Add sysfs reporting for MDS
Ben Hutchings ben@decadent.org.uk x86/speculation/l1tf: Document l1tf in sysfs
Thomas Gleixner tglx@linutronix.de x86/speculation/mds: Add mitigation control for MDS
Thomas Gleixner tglx@linutronix.de x86/speculation/mds: Conditionally clear CPU buffers on idle entry
Thomas Gleixner tglx@linutronix.de x86/speculation/mds: Clear CPU buffers on exit to user
Thomas Gleixner tglx@linutronix.de x86/speculation/mds: Add mds_clear_cpu_buffers()
Andi Kleen ak@linux.intel.com x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests
Thomas Gleixner tglx@linutronix.de x86/speculation/mds: Add BUG_MSBDS_ONLY
Andi Kleen ak@linux.intel.com x86/speculation/mds: Add basic bug infrastructure for MDS
Thomas Gleixner tglx@linutronix.de x86/speculation: Consolidate CPU whitelists
Thomas Gleixner tglx@linutronix.de x86/msr-index: Cleanup bit defines
Eduardo Habkost ehabkost@redhat.com kvm: x86: Report STIBP on GET_SUPPORTED_CPUID
Thomas Gleixner tglx@linutronix.de x86/speculation: Provide IBPB always command line options
Thomas Gleixner tglx@linutronix.de x86/speculation: Add seccomp Spectre v2 user space protection mode
Thomas Gleixner tglx@linutronix.de x86/speculation: Enable prctl mode for spectre_v2_user
Thomas Gleixner tglx@linutronix.de x86/speculation: Add prctl() control for indirect branch speculation
Thomas Gleixner tglx@linutronix.de x86/speculation: Prevent stale SPEC_CTRL msr content
Thomas Gleixner tglx@linutronix.de x86/speculation: Prepare arch_smt_update() for PRCTL mode
Thomas Gleixner tglx@linutronix.de x86/speculation: Split out TIF update
Thomas Gleixner tglx@linutronix.de x86/speculation: Prepare for conditional IBPB in switch_mm()
Thomas Gleixner tglx@linutronix.de x86/speculation: Avoid __switch_to_xtra() calls
Thomas Gleixner tglx@linutronix.de x86/process: Consolidate and simplify switch_to_xtra() code
Tim Chen tim.c.chen@linux.intel.com x86/speculation: Prepare for per task indirect branch speculation control
Thomas Gleixner tglx@linutronix.de x86/speculation: Add command line control for indirect branch speculation
Thomas Gleixner tglx@linutronix.de x86/speculation: Unify conditional spectre v2 print functions
Thomas Gleixner tglx@linutronix.de x86/speculataion: Mark command line parser data __initdata
Thomas Gleixner tglx@linutronix.de x86/speculation: Mark string arrays const correctly
Thomas Gleixner tglx@linutronix.de x86/speculation: Reorder the spec_v2 code
Thomas Gleixner tglx@linutronix.de x86/speculation: Rework SMT state change
Ben Hutchings ben@decadent.org.uk sched: Add sched_smt_active()
Thomas Gleixner tglx@linutronix.de x86/Kconfig: Select SCHED_SMT if SMP enabled
Tim Chen tim.c.chen@linux.intel.com x86/speculation: Reorganize speculation control MSRs update
Thomas Gleixner tglx@linutronix.de x86/speculation: Rename SSBD update functions
Tim Chen tim.c.chen@linux.intel.com x86/speculation: Disable STIBP when enhanced IBRS is in use
Tim Chen tim.c.chen@linux.intel.com x86/speculation: Move STIPB/IBPB string conditionals out of cpu_show_common()
Tim Chen tim.c.chen@linux.intel.com x86/speculation: Remove unnecessary ret variable in cpu_show_common()
Tim Chen tim.c.chen@linux.intel.com x86/speculation: Clean up spectre_v2_parse_cmdline()
Tim Chen tim.c.chen@linux.intel.com x86/speculation: Update the TIF_SSBD comment
Jiri Kosina jkosina@suse.cz x86/speculation: Propagate information about RSB filling mitigation to sysfs
Jiri Kosina jkosina@suse.cz x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
Jiri Kosina jkosina@suse.cz x86/speculation: Apply IBPB more strictly to avoid cross-process data leak
Nadav Amit namit@vmware.com x86/mm: Use WRITE_ONCE() when setting PTEs
Thomas Gleixner tglx@xxxxxxxxxxxxx KVM: x86: SVM: Call x86_spec_ctrl_set_guest/host() with interrupts disabled
Peter Zijlstra peterz@infradead.org x86/cpu: Sanitize FAM6_ATOM naming
Filippo Sironi sironi@amazon.de x86/microcode: Update the new microcode revision unconditionally
Prarit Bhargava prarit@redhat.com x86/microcode: Make sure boot_cpu_data.microcode is up-to-date
Jiang Biao jiang.biao2@zte.com.cn x86/speculation: Remove SPECTRE_V2_IBRS in enum spectre_v2_mitigation
Tom Lendacky thomas.lendacky@amd.com x86/bugs: Fix the AMD SSBD usage of the SPEC_CTRL MSR
Will Deacon will.deacon@arm.com locking/atomics, asm-generic: Move some macros from <linux/bitops.h> to a new <linux/bits.h> file
Konrad Rzeszutek Wilk konrad.wilk@oracle.com x86/bugs: Switch the selection of mitigation from CPU vendor to CPU features
Konrad Rzeszutek Wilk konrad.wilk@oracle.com x86/bugs: Add AMD's SPEC_CTRL MSR usage
Konrad Rzeszutek Wilk konrad.wilk@oracle.com x86/bugs: Add AMD's variant of SSB_NO
Dominik Brodowski linux@dominikbrodowski.net x86/speculation: Simplify the CPU bug detection logic
Sai Praneeth sai.praneeth.prakhya@intel.com x86/speculation: Support Enhanced IBRS on future CPUs
Ben Hutchings ben@decadent.org.uk x86/cpufeatures: Hide AMD-specific speculation flags
Tony Luck tony.luck@intel.com x86/MCE: Save microcode revision in machine check records
Ashok Raj ashok.raj@intel.com x86/microcode/intel: Check microcode revision before updating sibling threads
Matthias Kaehlcke mka@chromium.org bitops: avoid integer overflow in GENMASK(_ULL)
Nicolas Dichtel nicolas.dichtel@6wind.com x86: stop exporting msr-index.h to userland
Borislav Petkov bp@suse.de x86/microcode/intel: Add a helper which gives the microcode revision
Tony Luck tony.luck@intel.com locking/static_keys: Provide DECLARE and well as DEFINE macros
Nigel Croxon ncroxon@redhat.com Don't jump to compute_result state from check_result state
Alistair Strachan astrachan@google.com x86/vdso: Pass --eh-frame-hdr to the linker
Wei Yongjun weiyongjun1@huawei.com cw1200: fix missing unlock on error in cw1200_hw_scan()
Lucas Stach l.stach@pengutronix.de gpu: ipu-v3: dp: fix CSC handling
Po-Hsu Lin po-hsu.lin@canonical.com selftests/net: correct the return value for run_netsocktests
Arnd Bergmann arnd@arndb.de s390: ctcm: fix ctcm_new_device error return code
Julian Anastasov ja@ssi.bg ipvs: do not schedule icmp errors from tunnels
Dan Williams dan.j.williams@intel.com init: initialize jump labels before command line option parsing
Rikard Falkeborn rikard.falkeborn@gmail.com tools lib traceevent: Fix missing equality check for strcmp
Vitaly Kuznetsov vkuznets@redhat.com KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing
Martin Schwidefsky schwidefsky@de.ibm.com s390/3270: fix lockdep false positive on view->lock
Peter Oberparleiter oberpar@linux.ibm.com s390/dasd: Fix capacity calculation for large volumes
Aditya Pakki pakki001@umn.edu libnvdimm/btt: Fix a kmemdup failure check
Dmitry Torokhov dmitry.torokhov@gmail.com HID: input: add mapping for keyboard Brightness Up/Down/Toggle keys
Dmitry Torokhov dmitry.torokhov@gmail.com HID: input: add mapping for Expose/Overview key
Sven Van Asbroeck thesven73@gmail.com iio: adc: xilinx: fix potential use-after-free on remove
Gustavo A. R. Silva gustavo@embeddedor.com platform/x86: sony-laptop: Fix unintentional fall-through
Michal Hocko mhocko@suse.com mm, vmstat: make quiet_vmstat lighter
Francesco Ruggeri fruggeri@arista.com netfilter: compat: initialize all fields in xt_init
Ben Hutchings ben@decadent.org.uk timer/debug: Change /proc/timer_stats from 0644 to 0600
Ross Zwisler zwisler@chromium.org ASoC: Intel: avoid Oops if DMA setup fails
WANG Cong xiyou.wangcong@gmail.com ipv6: fix a potential deadlock in do_ipv6_setsockopt()
Oliver Neukum oneukum@suse.com UAS: fix alignment of scatter/gather segments
Marcel Holtmann marcel@holtmann.org Bluetooth: Align minimum encryption key size for LE and BR/EDR connections
Young Xiao YangX92@hotmail.com Bluetooth: hidp: fix buffer overflow
Andrew Vasquez andrewv@marvell.com scsi: qla2xxx: Fix incorrect region-size setting in optrom SYSFS routines
Thinh Nguyen Thinh.Nguyen@synopsys.com usb: dwc3: Fix default lpm_nyet_threshold value
Prasad Sodagudi psodagud@codeaurora.org genirq: Prevent use-after-free and work list corruption
Joerg Roedel jroedel@suse.de iommu/amd: Set exclusion range correctly
Varun Prakash varun@chelsio.com scsi: csiostor: fix missing data copy in csio_scsi_err_handler()
Stephane Eranian eranian@google.com perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS
Annaliese McDermond nh6z@nh6z.net ASoC: tlv320aic32x4: Fix Common Pins
Daniel Mack daniel@zonque.org ASoC: cs4270: Set auto-increment bit for register writes
Rander Wang rander.wang@linux.intel.com ASoC:soc-pcm:fix a codec fixup issue in TDM case
Jason Yan yanaijie@huawei.com scsi: libsas: fix a race condition when smp task timeout
Jacopo Mondi jacopo+renesas@jmondi.org media: v4l2: i2c: ov7670: Fix PLL bypass register values
Tony Luck tony.luck@intel.com x86/mce: Improve error message when kernel cannot recover, p2
Ondrej Mosnacek omosnace@redhat.com selinux: never allow relabeling on context mounts
Anson Huang anson.huang@nxp.com Input: snvs_pwrkey - initialize necessary driver data before enabling IRQ
Jeremy Fertic jeremyfertic@gmail.com staging: iio: adt7316: fix the dac write calculation
Jeremy Fertic jeremyfertic@gmail.com staging: iio: adt7316: fix the dac read calculation
Jeremy Fertic jeremyfertic@gmail.com staging: iio: adt7316: allow adt751x to use internal vref for all dacs
Malte Leip malte@leip.net usb: usbip: fix isoc packet num validation in get_pipe
Arnd Bergmann arnd@arndb.de ARM: iop: don't use using 64-bit DMA masks
Arnd Bergmann arnd@arndb.de ARM: orion: don't use using 64-bit DMA masks
Guenter Roeck linux@roeck-us.net xsysace: Fix error handling in ace_setup
Mike Kravetz mike.kravetz@oracle.com hugetlbfs: fix memory leak for resv_map
Yonglong Liu liuyonglong@huawei.com net: hns: Fix WARNING when remove HNS driver with SMMU enabled
Yonglong Liu liuyonglong@huawei.com net: hns: Use NAPI_POLL_WEIGHT for hns driver
Michael Kelley mikelley@microsoft.com scsi: storvsc: Fix calculation of sub-channel count
Louis Taylor louis@kragniz.eu vfio/pci: use correct format characters
Alexandre Belloni alexandre.belloni@bootlin.com rtc: da9063: set uie_unsupported when relevant
Al Viro viro@zeniv.linux.org.uk debugfs: fix use-after-free on symlink traversal
Al Viro viro@zeniv.linux.org.uk jffs2: fix use-after-free on symlink traversal
Konstantin Khorenko khorenko@virtuozzo.com bonding: show full hw address in sysfs for slave entries
Arvind Sankar niveditas98@gmail.com igb: Fix WARN_ONCE on runtime suspend
Geert Uytterhoeven geert+renesas@glider.be rtc: sh: Fix invalid alarm warning for non-enabled alarm
He, Bo bo.he@intel.com HID: debug: fix race condition with between rdesc_show() and device removal
Alan Stern stern@rowland.harvard.edu USB: core: Fix bug caused by duplicate interface PM usage counter
Alan Stern stern@rowland.harvard.edu USB: core: Fix unterminated string returned by usb_string()
Alan Stern stern@rowland.harvard.edu USB: w1 ds2490: Fix bug caused by improper use of altsetting array
Alan Stern stern@rowland.harvard.edu USB: yurex: Fix protection fault after device removal
Willem de Bruijn willemb@google.com packet: validate msg_namelen in send directly
Michael Chan michael.chan@broadcom.com bnxt_en: Improve multicast address setup logic.
Willem de Bruijn willemb@google.com ipv6: invert flowlabel sharing check in process and user mode
Eric Dumazet edumazet@google.com ipv6/flowlabel: wait rcu grace period before put_pid()
Shmulik Ladkani shmulik@metanetworks.com ipv4: ip_do_fragment: Preserve skb_iif during fragmentation
Greg Kroah-Hartman gregkh@linuxfoundation.org ALSA: line6: use dynamic buffers
Alex Williamson alex.williamson@redhat.com vfio/type1: Limit DMA mappings per container
Changbin Du changbin.du@gmail.com kconfig/[mn]conf: handle backspace (^H) key
raymond pang raymondpangxd@gmail.com libata: fix using DMA buffers on stack
Steffen Maier maier@linux.ibm.com scsi: zfcp: reduce flood of fcrscn1 trace records on multi-element RSCN
Al Viro viro@zeniv.linux.org.uk ceph: fix use-after-free on symlink traversal
Mukesh Ojha mojha@codeaurora.org usb: u132-hcd: fix resource leak
Kangjie Lu kjlu@umn.edu scsi: qla4xxx: fix a potential NULL pointer dereference
Wen Yang wen.yang99@zte.com.cn net: ethernet: ti: fix possible object reference leak
Wen Yang wen.yang99@zte.com.cn net: ibm: fix possible object reference leak
Wen Yang wen.yang99@zte.com.cn net: xilinx: fix possible object reference leak
Lukas Wunner lukas@wunner.de net: ks8851: Set initial carrier state to down
Lukas Wunner lukas@wunner.de net: ks8851: Delay requesting IRQ until opened
Lukas Wunner lukas@wunner.de net: ks8851: Reassert reset pin if chip ID check fails
Lukas Wunner lukas@wunner.de net: ks8851: Dequeue RX packets explicitly
Marco Felsch m.felsch@pengutronix.de ARM: dts: pfla02: increase phy reset duration
Guido Kiener guido@kiener-muenchen.de usb: gadget: net2272: Fix net2272_dequeue()
Guido Kiener guido@kiener-muenchen.de usb: gadget: net2280: Fix net2280_dequeue()
Guido Kiener guido@kiener-muenchen.de usb: gadget: net2280: Fix overrun of OUT messages
Mao Wenan maowenan@huawei.com sc16is7xx: missing unregister/delete driver on error in sc16is7xx_init()
Xin Long lucien.xin@gmail.com netfilter: bridge: set skb transport_header before entering NF_INET_PRE_ROUTING
Aditya Pakki pakki001@umn.edu qlcnic: Avoid potential NULL pointer dereference
Gustavo A. R. Silva garsilva@embeddedor.com usbnet: ipheth: fix potential null pointer dereference in ipheth_carrier_set
Alexander Kappner agk@godking.net usbnet: ipheth: prevent TX queue timeouts when device not ready
Diana Craciun diana.craciun@nxp.com Documentation: Add nospectre_v1 parameter
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Add FSL_PPC_BOOK3E as supported arch for nospectre_v2 boot arg
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Fixed warning: orphan section `__btb_flush_fixup'
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Sanitize the syscall table for NXP PowerPC 32 bit platforms
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Flush the branch predictor at each kernel entry (32 bit)
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Emulate SPRN_BUCSR register
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Flush branch predictor when entering KVM
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Enable runtime patching if nospectre_v2 boot arg is used
ZhangXiaoxu zhangxiaoxu5@huawei.com ipv4: set the tcp_min_rtt_wlen range from 0 to one day
Vinod Koul vkoul@kernel.org net: stmmac: move stmmac_check_ether_addr() to driver probe
Hangbin Liu liuhangbin@gmail.com team: fix possible recursive locking when add slaves
Eric Dumazet edumazet@google.com ipv4: add sanity checks in ipv4_link_failure()
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "block/loop: Use global lock for ioctl() operation."
Daniel Borkmann daniel@iogearbox.net bpf: reject wrong sized filters earlier
Xin Long lucien.xin@gmail.com tipc: check link name with right length in tipc_nl_compat_link_set
Xin Long lucien.xin@gmail.com tipc: check bearer name with right length in tipc_nl_compat_bearer_enable
Florian Westphal fw@strlen.de netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
YueHaibing yuehaibing@huawei.com fs/proc/proc_sysctl.c: Fix a NULL pointer dereference
Alexander Shishkin alexander.shishkin@linux.intel.com intel_th: gth: Fix an off-by-one in output unassigning
Linus Torvalds torvalds@linux-foundation.org slip: make slhc_free() silently accept an error pointer
Xin Long lucien.xin@gmail.com tipc: handle the err returned from cmd header function
Christophe Leroy christophe.leroy@c-s.fr powerpc/fsl: Fix the flush of branch predictor.
Michael Ellerman mpe@ellerman.id.au powerpc/security: Fix spectre_v2 reporting
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Update Spectre v2 reporting
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Add nospectre_v2 command line argument
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Fix spectre_v2 mitigations reporting
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Add macro to flush the branch predictor
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Add infrastructure to fixup branch predictor flush
Michael Neuling mikey@neuling.org powerpc: Avoid code patching freed init sections
Michael Ellerman mpe@ellerman.id.au powerpc/powernv: Query firmware for count cache flush settings
Michael Ellerman mpe@ellerman.id.au powerpc/pseries: Query hypervisor for count cache flush settings
Michael Ellerman mpe@ellerman.id.au powerpc/64s: Add support for software count cache flush
Michael Ellerman mpe@ellerman.id.au powerpc/64s: Add new security feature flags for count cache flush
Michael Ellerman mpe@ellerman.id.au powerpc/asm: Add a patch_site macro & helpers for patching instructions
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Add barrier_nospec implementation for NXP PowerPC Book3E
Diana Craciun diana.craciun@nxp.com powerpc/64: Make meltdown reporting Book3S 64 specific
Michael Ellerman mpe@ellerman.id.au powerpc/64: Call setup_barrier_nospec() from setup_arch()
Michael Ellerman mpe@ellerman.id.au powerpc/64: Add CONFIG_PPC_BARRIER_NOSPEC
Diana Craciun diana.craciun@nxp.com powerpc/64: Make stf barrier PPC_BOOK3S_64 specific.
Diana Craciun diana.craciun@nxp.com powerpc/64: Disable the speculation barrier from the command line
Michael Ellerman mpe@ellerman.id.au powerpc64s: Show ori31 availability in spectre_v1 sysfs file not v2
Michal Suchanek msuchanek@suse.de powerpc/64s: Enhance the information in cpu_show_spectre_v1()
Michael Ellerman mpe@ellerman.id.au powerpc: Use barrier_nospec in copy_from_user()
Michael Ellerman mpe@ellerman.id.au powerpc/64: Use barrier_nospec in syscall entry
Michal Suchanek msuchanek@suse.de powerpc/64s: Enable barrier_nospec based on firmware settings
Michal Suchanek msuchanek@suse.de powerpc/64s: Patch barrier_nospec in modules
Michal Suchanek msuchanek@suse.de powerpc/64s: Add support for ori barrier_nospec patching
Michal Suchanek msuchanek@suse.de powerpc/64s: Add barrier_nospec
Nicholas Piggin npiggin@gmail.com powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit
Michael Ellerman mpe@ellerman.id.au powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com powerpc/pseries: Restore default security feature flags on setup
Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com powerpc: Move default security feature flags
Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com powerpc/pseries: Fix clearing of security feature flags
Michael Ellerman mpe@ellerman.id.au powerpc/64s: Wire up cpu_show_spectre_v2()
Michael Ellerman mpe@ellerman.id.au powerpc/64s: Wire up cpu_show_spectre_v1()
Michael Ellerman mpe@ellerman.id.au powerpc/pseries: Use the security flags in pseries_setup_rfi_flush()
Michael Ellerman mpe@ellerman.id.au powerpc/powernv: Use the security flags in pnv_setup_rfi_flush()
Michael Ellerman mpe@ellerman.id.au powerpc/64s: Enhance the information in cpu_show_meltdown()
Michael Ellerman mpe@ellerman.id.au powerpc/64s: Move cpu_show_meltdown()
Michael Ellerman mpe@ellerman.id.au powerpc/powernv: Set or clear security feature flags
Michael Ellerman mpe@ellerman.id.au powerpc/pseries: Set or clear security feature flags
Michael Ellerman mpe@ellerman.id.au powerpc: Add security feature flags for Spectre/Meltdown
Michael Ellerman mpe@ellerman.id.au powerpc/rfi-flush: Call setup_rfi_flush() after LPM migration
Michael Ellerman mpe@ellerman.id.au powerpc/pseries: Add new H_GET_CPU_CHARACTERISTICS flags
Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com powerpc/rfi-flush: Differentiate enabled and patched flush types
Michael Ellerman mpe@ellerman.id.au powerpc/rfi-flush: Always enable fallback flush on pseries
Michael Ellerman mpe@ellerman.id.au powerpc/rfi-flush: Make it possible to call setup_rfi_flush() again
Michael Ellerman mpe@ellerman.id.au powerpc/rfi-flush: Move the logic to avoid a redo into the debugfs code
Michael Ellerman mpe@ellerman.id.au powerpc/powernv: Support firmware disable of RFI flush
Michael Ellerman mpe@ellerman.id.au powerpc/pseries: Support firmware disable of RFI flush
Nicholas Piggin npiggin@gmail.com powerpc/64s: Improve RFI L1-D cache flush fallback
Michael Ellerman mpe@ellerman.id.au powerpc/xmon: Add RFI flush related fields to paca dump
Kai-Heng Feng kai.heng.feng@canonical.com USB: Consolidate LPM checks to avoid enabling LPM twice
Kai-Heng Feng kai.heng.feng@canonical.com USB: Add new USB LPM helpers
NeilBrown neilb@suse.com sunrpc: don't mark uninitialised items as VALID.
Trond Myklebust trondmy@gmail.com nfsd: Don't release the callback slot unless it was actually held
Yan, Zheng zyan@redhat.com ceph: fix ci->i_head_snapc leak
Jeff Layton jlayton@kernel.org ceph: ensure d_name stability in ceph_dentry_hash()
Xie XiuQi xiexiuqi@huawei.com sched/numa: Fix a possible divide-by-zero
Peter Zijlstra peterz@infradead.org trace: Fix preempt_enable_no_resched() abuse
Aurelien Jarno aurelien@aurel32.net MIPS: scall64-o32: Fix indirect syscall number load
Frank Sorenson sorenson@redhat.com cifs: do not attempt cifs operation on smb2+ rename error
Paolo Bonzini pbonzini@redhat.com KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number
Masahiro Yamada yamada.masahiro@socionext.com kbuild: simplify ld-option implementation
-------------
Diffstat:
Documentation/ABI/testing/sysfs-devices-system-cpu | 2 +
Documentation/hw-vuln/mds.rst | 305 ++++++++++
Documentation/kernel-parameters.txt | 110 +++-
Documentation/networking/ip-sysctl.txt | 1 +
Documentation/spec_ctrl.txt | 9 +
Documentation/usb/power-management.txt | 14 +-
Documentation/x86/mds.rst | 225 +++++++
Makefile | 4 +-
arch/arm/boot/dts/imx6qdl-phytec-pfla02.dtsi | 1 +
arch/arm/mach-iop13xx/setup.c | 8 +-
arch/arm/mach-iop13xx/tpmi.c | 10 +-
arch/arm/plat-iop/adma.c | 6 +-
arch/arm/plat-orion/common.c | 4 +-
arch/mips/kernel/scall64-o32.S | 2 +-
arch/powerpc/Kconfig | 7 +-
arch/powerpc/include/asm/asm-prototypes.h | 21 +
arch/powerpc/include/asm/barrier.h | 21 +
arch/powerpc/include/asm/code-patching-asm.h | 18 +
arch/powerpc/include/asm/code-patching.h | 2 +
arch/powerpc/include/asm/exception-64s.h | 35 ++
arch/powerpc/include/asm/feature-fixups.h | 40 ++
arch/powerpc/include/asm/hvcall.h | 5 +
arch/powerpc/include/asm/paca.h | 3 +-
arch/powerpc/include/asm/ppc-opcode.h | 1 +
arch/powerpc/include/asm/ppc_asm.h | 11 +
arch/powerpc/include/asm/reg_booke.h | 2 +-
arch/powerpc/include/asm/security_features.h | 92 +++
arch/powerpc/include/asm/setup.h | 23 +-
arch/powerpc/include/asm/uaccess.h | 18 +-
arch/powerpc/kernel/Makefile | 1 +
arch/powerpc/kernel/asm-offsets.c | 3 +-
arch/powerpc/kernel/entry_32.S | 10 +
arch/powerpc/kernel/entry_64.S | 69 +++
arch/powerpc/kernel/exceptions-64e.S | 27 +-
arch/powerpc/kernel/exceptions-64s.S | 98 ++--
arch/powerpc/kernel/head_booke.h | 12 +
arch/powerpc/kernel/head_fsl_booke.S | 15 +
arch/powerpc/kernel/module.c | 10 +-
arch/powerpc/kernel/security.c | 434 ++++++++++++++
arch/powerpc/kernel/setup_32.c | 3 +
arch/powerpc/kernel/setup_64.c | 51 +-
arch/powerpc/kernel/vmlinux.lds.S | 33 +-
arch/powerpc/kvm/bookehv_interrupts.S | 4 +
arch/powerpc/kvm/e500_emulate.c | 7 +
arch/powerpc/lib/code-patching.c | 29 +
arch/powerpc/lib/feature-fixups.c | 218 ++++++-
arch/powerpc/mm/mem.c | 2 +
arch/powerpc/mm/tlb_low_64e.S | 7 +
arch/powerpc/platforms/powernv/setup.c | 99 +++-
arch/powerpc/platforms/pseries/mobility.c | 3 +
arch/powerpc/platforms/pseries/pseries.h | 2 +
arch/powerpc/platforms/pseries/setup.c | 88 ++-
arch/powerpc/xmon/xmon.c | 2 +
arch/x86/Kconfig | 8 +-
arch/x86/entry/common.c | 3 +
arch/x86/entry/vdso/Makefile | 3 +-
arch/x86/include/asm/cpufeatures.h | 12 +-
arch/x86/include/asm/intel-family.h | 30 +-
arch/x86/include/asm/irqflags.h | 5 +
arch/x86/include/asm/microcode_intel.h | 15 +
arch/x86/include/asm/msr-index.h | 30 +-
arch/x86/include/asm/mwait.h | 7 +
arch/x86/include/asm/nospec-branch.h | 66 ++-
arch/x86/include/asm/pgtable_64.h | 16 +-
arch/x86/include/asm/processor.h | 7 +
arch/x86/include/asm/spec-ctrl.h | 20 +-
arch/x86/include/asm/switch_to.h | 3 -
arch/x86/include/asm/thread_info.h | 20 +-
arch/x86/include/asm/tlbflush.h | 8 +-
arch/x86/include/uapi/asm/Kbuild | 1 -
arch/x86/include/uapi/asm/mce.h | 4 +
arch/x86/kernel/cpu/bugs.c | 643 +++++++++++++++++----
arch/x86/kernel/cpu/common.c | 140 +++--
arch/x86/kernel/cpu/intel.c | 11 +-
arch/x86/kernel/cpu/mcheck/mce-severity.c | 5 +
arch/x86/kernel/cpu/mcheck/mce.c | 4 +-
arch/x86/kernel/cpu/microcode/amd.c | 22 +-
arch/x86/kernel/cpu/microcode/intel.c | 64 +-
arch/x86/kernel/cpu/perf_event_intel.c | 2 +-
arch/x86/kernel/nmi.c | 4 +
arch/x86/kernel/process.c | 101 +++-
arch/x86/kernel/process.h | 39 ++
arch/x86/kernel/process_32.c | 9 +-
arch/x86/kernel/process_64.c | 9 +-
arch/x86/kernel/traps.c | 8 +
arch/x86/kvm/cpuid.c | 13 +-
arch/x86/kvm/cpuid.h | 2 +-
arch/x86/kvm/svm.c | 10 +-
arch/x86/kvm/trace.h | 4 +-
arch/x86/kvm/x86.c | 4 +
arch/x86/mm/kaiser.c | 4 +-
arch/x86/mm/pgtable.c | 6 +-
arch/x86/mm/tlb.c | 114 +++-
drivers/ata/libata-zpodd.c | 34 +-
drivers/base/cpu.c | 8 +
drivers/block/loop.c | 42 +-
drivers/block/loop.h | 1 +
drivers/block/xsysace.c | 2 +
drivers/gpu/ipu-v3/ipu-dp.c | 12 +-
drivers/hid/hid-debug.c | 5 +
drivers/hid/hid-input.c | 6 +
drivers/hwtracing/intel_th/gth.c | 2 +-
drivers/iio/adc/xilinx-xadc-core.c | 2 +-
drivers/input/keyboard/snvs_pwrkey.c | 6 +-
drivers/iommu/amd_iommu_init.c | 2 +-
drivers/md/raid5.c | 19 +-
drivers/media/i2c/ov7670.c | 16 +-
drivers/net/bonding/bond_options.c | 7 -
drivers/net/bonding/bond_sysfs_slave.c | 4 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 +-
drivers/net/ethernet/freescale/ucc_geth_ethtool.c | 8 +-
drivers/net/ethernet/hisilicon/hns/hnae.c | 4 +-
drivers/net/ethernet/hisilicon/hns/hns_enet.c | 7 +-
drivers/net/ethernet/ibm/ehea/ehea_main.c | 1 +
drivers/net/ethernet/intel/igb/e1000_defines.h | 2 +
drivers/net/ethernet/intel/igb/igb_main.c | 57 +-
drivers/net/ethernet/micrel/ks8851.c | 36 +-
.../net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c | 2 +
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 +-
drivers/net/ethernet/ti/netcp_ethss.c | 8 +-
drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 2 +
drivers/net/slip/slhc.c | 2 +-
drivers/net/team/team.c | 6 +
drivers/net/usb/ipheth.c | 33 +-
drivers/net/wireless/cw1200/scan.c | 5 +-
drivers/nvdimm/btt_devs.c | 18 +-
drivers/platform/x86/sony-laptop.c | 8 +-
drivers/rtc/rtc-da9063.c | 7 +
drivers/rtc/rtc-sh.c | 2 +-
drivers/s390/block/dasd_eckd.c | 6 +-
drivers/s390/char/con3270.c | 2 +-
drivers/s390/char/fs3270.c | 3 +-
drivers/s390/char/raw3270.c | 3 +-
drivers/s390/char/raw3270.h | 4 +-
drivers/s390/char/tty3270.c | 3 +-
drivers/s390/net/ctcm_main.c | 1 +
drivers/s390/scsi/zfcp_fc.c | 21 +-
drivers/scsi/csiostor/csio_scsi.c | 5 +-
drivers/scsi/libsas/sas_expander.c | 9 +-
drivers/scsi/qla2xxx/qla_attr.c | 4 +-
drivers/scsi/qla4xxx/ql4_os.c | 2 +
drivers/scsi/storvsc_drv.c | 13 +-
drivers/staging/iio/addac/adt7316.c | 22 +-
drivers/tty/serial/sc16is7xx.c | 12 +-
drivers/usb/core/driver.c | 36 +-
drivers/usb/core/hub.c | 16 +-
drivers/usb/core/message.c | 7 +-
drivers/usb/core/sysfs.c | 5 +-
drivers/usb/core/usb.h | 10 +-
drivers/usb/dwc3/core.c | 2 +-
drivers/usb/gadget/udc/net2272.c | 1 +
drivers/usb/gadget/udc/net2280.c | 8 +-
drivers/usb/host/u132-hcd.c | 3 +
drivers/usb/misc/yurex.c | 1 +
drivers/usb/serial/generic.c | 57 +-
drivers/usb/storage/realtek_cr.c | 13 +-
drivers/usb/storage/uas.c | 38 +-
drivers/usb/usbip/stub_rx.c | 18 +-
drivers/usb/usbip/usbip_common.h | 7 +
drivers/vfio/pci/vfio_pci.c | 4 +-
drivers/vfio/vfio_iommu_type1.c | 14 +
drivers/virt/fsl_hypervisor.c | 29 +-
drivers/w1/masters/ds2490.c | 6 +-
fs/ceph/dir.c | 6 +-
fs/ceph/inode.c | 2 +-
fs/ceph/mds_client.c | 9 +
fs/ceph/snap.c | 7 +-
fs/cifs/inode.c | 4 +
fs/debugfs/inode.c | 13 +-
fs/hugetlbfs/inode.c | 20 +-
fs/jffs2/readinode.c | 5 -
fs/jffs2/super.c | 5 +-
fs/nfs/super.c | 3 +-
fs/nfsd/nfs4callback.c | 8 +-
fs/nfsd/state.h | 1 +
fs/proc/proc_sysctl.c | 6 +-
include/linux/bitops.h | 21 +-
include/linux/bits.h | 26 +
include/linux/cpu.h | 19 +
include/linux/jump_label.h | 6 +
include/linux/ptrace.h | 21 +-
include/linux/sched.h | 9 +
include/linux/sched/smt.h | 20 +
include/linux/usb.h | 2 -
include/net/addrconf.h | 1 +
include/net/bluetooth/hci_core.h | 3 +
include/uapi/linux/prctl.h | 1 +
init/main.c | 4 +-
kernel/cpu.c | 23 +-
kernel/irq/manage.c | 4 +-
kernel/ptrace.c | 10 +
kernel/sched/core.c | 24 +
kernel/sched/fair.c | 4 +
kernel/sched/sched.h | 1 +
kernel/time/timer_stats.c | 2 +-
kernel/trace/ring_buffer.c | 2 +-
mm/vmstat.c | 68 ++-
net/8021q/vlan_dev.c | 4 +-
net/bluetooth/hci_conn.c | 8 +
net/bluetooth/hidp/sock.c | 1 +
net/bridge/br_if.c | 13 +-
net/bridge/br_netfilter_hooks.c | 1 +
net/bridge/br_netfilter_ipv6.c | 2 +
net/bridge/netfilter/ebtables.c | 3 +-
net/core/filter.c | 23 +-
net/ipv4/ip_output.c | 1 +
net/ipv4/raw.c | 4 +-
net/ipv4/route.c | 32 +-
net/ipv4/sysctl_net_ipv4.c | 5 +-
net/ipv6/ip6_flowlabel.c | 22 +-
net/ipv6/ipv6_sockglue.c | 3 +-
net/ipv6/mcast.c | 17 +-
net/ipv6/sit.c | 2 +-
net/netfilter/ipvs/ip_vs_core.c | 2 +-
net/netfilter/x_tables.c | 2 +-
net/packet/af_packet.c | 48 +-
net/sunrpc/cache.c | 3 +
net/tipc/netlink_compat.c | 24 +-
scripts/Kbuild.include | 4 +-
scripts/kconfig/lxdialog/inputbox.c | 3 +-
scripts/kconfig/nconf.c | 2 +-
scripts/kconfig/nconf.gui.c | 3 +-
security/selinux/hooks.c | 40 +-
sound/soc/codecs/cs4270.c | 1 +
sound/soc/codecs/tlv320aic32x4.c | 2 +
sound/soc/intel/common/sst-dsp.c | 8 +-
sound/soc/soc-pcm.c | 7 +-
sound/usb/line6/driver.c | 60 +-
sound/usb/line6/toneport.c | 24 +-
tools/lib/traceevent/event-parse.c | 2 +-
tools/power/x86/turbostat/Makefile | 2 +-
tools/testing/selftests/net/run_netsocktests | 2 +-
232 files changed, 4217 insertions(+), 1000 deletions(-)
From: Masahiro Yamada yamada.masahiro@socionext.com
commit 0294e6f4a0006856e1f36b8cd8fa088d9e499e98 upstream.
Currently, linker options are tested by the coordination of $(CC) and $(LD) because $(LD) needs some object to link.
As commit 86a9df597cdd ("kbuild: fix linker feature test macros when cross compiling with Clang") addressed, we need to make sure $(CC) and $(LD) agree on the underlying architecture of the passed object.
This can get complex when we combine tools from different groups. For example, we can use clang for $(CC), but we still need to rely on the GCC toolchain for $(LD).
So I was searching for a way to test linker options standalone. A trick I found is to use '-v': it not only prints the version string, but also tests whether the given option is recognized.
If a given option is supported,
  $ aarch64-linux-gnu-ld -v --fix-cortex-a53-843419
  GNU ld (Linaro_Binutils-2017.11) 2.28.2.20170706
  $ echo $?
  0
If unsupported,
  $ aarch64-linux-gnu-ld -v --fix-cortex-a53-843419
  GNU ld (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 2.23.1
  aarch64-linux-gnu-ld: unrecognized option '--fix-cortex-a53-843419'
  aarch64-linux-gnu-ld: use the --help option for usage information
  $ echo $?
  1
Gold works likewise.
  $ aarch64-linux-gnu-ld.gold -v --fix-cortex-a53-843419
  GNU gold (Linaro_Binutils-2017.11 2.28.2.20170706) 1.14
  $ echo $?
  0
  $ aarch64-linux-gnu-ld.gold -v --fix-cortex-a53-999999
  GNU gold (Linaro_Binutils-2017.11 2.28.2.20170706) 1.14
  aarch64-linux-gnu-ld.gold: --fix-cortex-a53-999999: unknown option
  aarch64-linux-gnu-ld.gold: use the --help option for usage information
  $ echo $?
  1
LLD too.
  $ ld.lld -v --gc-sections
  LLD 7.0.0 (http://llvm.org/git/lld.git 4a0e4190e74cea19f8a8dc625ccaebdf8b5d1585) (compatible with GNU linkers)
  $ echo $?
  0
  $ ld.lld -v --fix-cortex-a53-843419
  LLD 7.0.0 (http://llvm.org/git/lld.git 4a0e4190e74cea19f8a8dc625ccaebdf8b5d1585) (compatible with GNU linkers)
  $ echo $?
  0
  $ ld.lld -v --fix-cortex-a53-999999
  ld.lld: error: unknown argument: --fix-cortex-a53-999999
  LLD 7.0.0 (http://llvm.org/git/lld.git 4a0e4190e74cea19f8a8dc625ccaebdf8b5d1585) (compatible with GNU linkers)
  $ echo $?
  1
Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com
Tested-by: Nick Desaulniers ndesaulniers@google.com
[nc: try-run-cached was added later, just use try-run, which is the current mainline state]
Signed-off-by: Nathan Chancellor natechancellor@gmail.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 scripts/Kbuild.include | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -156,9 +156,7 @@ cc-ldoption = $(call try-run,\
 
 # ld-option
 # Usage: LDFLAGS += $(call ld-option, -X)
-ld-option = $(call try-run,\
-	$(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) -x c /dev/null -c -o "$$TMPO"; \
-	$(LD) $(LDFLAGS) $(1) "$$TMPO" -o "$$TMP",$(1),$(2))
+ld-option = $(call try-run, $(LD) $(LDFLAGS) $(1) -v,$(1),$(2))
 
 # ar-option
 # Usage: KBUILD_ARFLAGS := $(call ar-option,D)
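The new ld-option boils down to "run the linker with the option plus -v and keep the option only if the exit status is 0". The C program below is a rough user-space model of that decision, not part of Kbuild: ld_option() is a hypothetical helper, LDFLAGS is left out, and the option is pasted into a shell command unquoted, so it is only meant for well-behaved option strings.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical model of:
 *   ld-option = $(call try-run, $(LD) $(LDFLAGS) $(1) -v,$(1),$(2))
 * Runs "<ld> <option> -v" and returns <option> if the command exits
 * with status 0, otherwise <fallback>.
 */
static const char *ld_option(const char *ld, const char *option,
			     const char *fallback)
{
	char cmd[512];
	int n = snprintf(cmd, sizeof(cmd), "%s %s -v >/dev/null 2>&1",
			 ld, option);

	/* Treat a command that does not fit the buffer as unsupported. */
	if (n < 0 || (size_t)n >= sizeof(cmd))
		return fallback;

	/* system() returns 0 only when the shell command exited with 0. */
	return system(cmd) == 0 ? option : fallback;
}
```

With a real toolchain one would call, for example, ld_option("aarch64-linux-gnu-ld", "--fix-cortex-a53-843419", ""); the transcripts above show which linkers would return the option and which the fallback.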
commit 78e546c824fa8f96d323b7edd6f5cad5b74af057 upstream
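An injected exception with a vector greater than 31, or with NMI_VECTOR, cannot be returned by KVM_GET_VCPU_EVENTS, so it is okay for KVM_SET_VCPU_EVENTS to reject it with EINVAL. Accepting it causes a WARN from exception_type():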
  WARNING: CPU: 3 PID: 16732 at arch/x86/kvm/x86.c:345 exception_type+0x49/0x50 [kvm]()
  CPU: 3 PID: 16732 Comm: a.out Tainted: G W 4.4.6-300.fc23.x86_64 #1
  Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
   0000000000000286 000000006308a48b ffff8800bec7fcf8 ffffffff813b542e
   0000000000000000 ffffffffa0966496 ffff8800bec7fd30 ffffffff810a40f2
   ffff8800552a8000 0000000000000000 00000000002c267c 0000000000000001
  Call Trace:
   [<ffffffff813b542e>] dump_stack+0x63/0x85
   [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0
   [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20
   [<ffffffffa0924809>] exception_type+0x49/0x50 [kvm]
   [<ffffffffa0934622>] kvm_arch_vcpu_ioctl_run+0x10a2/0x14e0 [kvm]
   [<ffffffffa091c04d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
   [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480
   [<ffffffff812414a9>] SyS_ioctl+0x79/0x90
   [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71
  ---[ end trace b1a0391266848f50 ]---
Testcase (beautified/reduced from syzkaller output):
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <string.h>
  #include <stdint.h>
  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  long r[31];

  int main()
  {
          memset(r, -1, sizeof(r));
          r[2] = open("/dev/kvm", O_RDONLY);
          r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
          r[7] = ioctl(r[3], KVM_CREATE_VCPU, 0);

          struct kvm_vcpu_events ve = {
                  .exception.injected = 1,
                  .exception.nr = 0xd4
          };
          r[27] = ioctl(r[7], KVM_SET_VCPU_EVENTS, &ve);
          r[30] = ioctl(r[7], KVM_RUN, 0);
          return 0;
  }
Reported-by: Dmitry Vyukov dvyukov@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Radim Krčmář rkrcmar@redhat.com Signed-off-by: Zubin Mithra zsm@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/x86.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2972,6 +2972,10 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_e
 			      | KVM_VCPUEVENT_VALID_SMM))
 		return -EINVAL;
 
+	if (events->exception.injected &&
+	    (events->exception.nr > 31 || events->exception.nr == NMI_VECTOR))
+		return -EINVAL;
+
 	/* INITs are latched while in SMM */
 	if (events->flags & KVM_VCPUEVENT_VALID_SMM &&
 	    (events->smi.smm || events->smi.pending) &&
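The new check is a pure predicate on two fields of struct kvm_vcpu_events, so it can be exercised outside the kernel. The sketch below is illustrative only: validate_exception() is a hypothetical helper, and NMI_VECTOR is defined locally with the x86 NMI vector number (2) rather than taken from kernel headers.

```c
#include <errno.h>
#include <stdbool.h>

#define NMI_VECTOR 0x02	/* x86 NMI vector; defined here for the sketch */

/*
 * Hypothetical stand-in for the check added to
 * kvm_vcpu_ioctl_x86_set_vcpu_events(): an injected exception must use
 * an architectural exception vector (0..31) and must not be the NMI
 * vector, which struct kvm_vcpu_events reports through its nmi fields.
 */
static int validate_exception(bool injected, unsigned int nr)
{
	if (injected && (nr > 31 || nr == NMI_VECTOR))
		return -EINVAL;
	return 0;
}
```

The syzkaller testcase above injects vector 0xd4, which this predicate rejects; a non-injected event is accepted regardless of nr, matching the patch.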
From: Frank Sorenson sorenson@redhat.com
commit 652727bbe1b17993636346716ae5867627793647 upstream.
A path-based rename returning EBUSY will incorrectly try opening the file with a cifs (NT Create AndX) operation on an smb2+ mount, which causes the server to force a session close.
If the mount is smb2+, skip the fallback.
Signed-off-by: Frank Sorenson sorenson@redhat.com Signed-off-by: Steve French stfrench@microsoft.com CC: Stable stable@vger.kernel.org Reviewed-by: Ronnie Sahlberg lsahlber@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/inode.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -1669,6 +1669,10 @@ cifs_do_rename(const unsigned int xid, s
 	if (rc == 0 || rc != -EBUSY)
 		goto do_rename_exit;
 
+	/* Don't fall back to using SMB on SMB 2+ mount */
+	if (server->vals->protocol_id != 0)
+		goto do_rename_exit;
+
 	/* open-file renames don't work across directories */
 	if (to_dentry->d_parent != from_dentry->d_parent)
 		goto do_rename_exit;
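After this patch, the open-file rename fallback in cifs_do_rename() is gated by three conditions. The sketch below collects them into one hypothetical predicate for illustration; it assumes, as the patched check implies, that protocol_id is 0 only for SMB1 (CIFS) mounts, and 0x0202 in the usage is just an example of a non-zero SMB 2+ dialect value.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical summary of when cifs_do_rename() may still try the
 * open-file (NT Create AndX) rename after a path-based rename failed.
 */
static bool may_try_open_file_rename(int rc, unsigned int protocol_id,
				     bool same_parent)
{
	if (rc == 0 || rc != -EBUSY)
		return false;	/* succeeded, or failed for another reason */
	if (protocol_id != 0)
		return false;	/* SMB 2+ mount: SMB1 fallback not allowed */
	return same_parent;	/* open-file renames don't cross directories */
}
```

For example, may_try_open_file_rename(-EBUSY, 0x0202, true) is false, which is exactly the case that previously made the server force a session close.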
From: Aurelien Jarno aurelien@aurel32.net
commit 79b4a9cf0e2ea8203ce777c8d5cfa86c71eae86e upstream.
Commit 4c21b8fd8f14 (MIPS: seccomp: Handle indirect system calls (o32)) added indirect syscall detection for O32 processes running on MIPS64, but it did not work correctly for big endian kernels/processes. The reason is that the syscall number is loaded from ARG1 using the lw instruction while ARG1 is saved as a 64-bit value, so on big endian the upper half (zero) is loaded instead of the syscall number.
Fix the code by using the ld instruction instead. When running 32-bit processes on a 64-bit CPU, the values are properly sign-extended, so this ensures the value passed to syscall_trace_enter is correct.
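The endianness effect can be reproduced with a byte array standing in for the 64-bit PT_R4 save slot. This is a user-space model, not kernel code: the helpers are hypothetical, the slot is stored big-endian explicitly, and 4020 is used as the example syscall number (getpid on o32, where numbers start at 4000).

```c
#include <stdint.h>

/* Store a 64-bit value most-significant byte first (big endian). */
static void store_be64(unsigned char slot[8], uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		slot[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/* What "lw" does at the slot's address: read the first 4 bytes. */
static uint32_t load_be32(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* What "ld" does: read all 8 bytes. */
static uint64_t load_be64(const unsigned char *p)
{
	return (uint64_t)load_be32(p) << 32 | load_be32(p + 4);
}
```

Storing 4020 and reading it back with load_be32() yields 0, while load_be64() yields 4020, which is the difference between the old lw and the new ld.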
Recent systemd versions with seccomp enabled whitelist the getpid syscall for their internal processes (e.g. systemd-journald), but call it through syscall(SYS_getpid). This fix therefore allows O32 big endian systems with a 64-bit kernel to run recent systemd versions.
Signed-off-by: Aurelien Jarno aurelien@aurel32.net Cc: stable@vger.kernel.org # v3.15+ Reviewed-by: Philippe Mathieu-Daudé f4bug@amsat.org Signed-off-by: Paul Burton paul.burton@mips.com Cc: Ralf Baechle ralf@linux-mips.org Cc: James Hogan jhogan@kernel.org Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/mips/kernel/scall64-o32.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -126,7 +126,7 @@ trace_a_syscall:
 	subu	t1, v0, __NR_O32_Linux
 	move	a1, v0
 	bnez	t1, 1f /* __NR_syscall at offset 0 */
-	lw	a1, PT_R4(sp) /* Arg1 for __NR_syscall case */
+	ld	a1, PT_R4(sp) /* Arg1 for __NR_syscall case */
 	.set	pop
 
 1:	jal	syscall_trace_enter
From: Peter Zijlstra peterz@infradead.org
commit d6097c9e4454adf1f8f2c9547c2fa6060d55d952 upstream.
Unless the very next line is schedule(), or implies it, one must not use preempt_enable_no_resched(). It can cause a preemption to go missing and thereby cause arbitrary delays, breaking the PREEMPT=y invariant.
Link: http://lkml.kernel.org/r/20190423200318.GY14281@hirez.programming.kicks-ass....
Cc: Waiman Long longman@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Ingo Molnar mingo@redhat.com Cc: Will Deacon will.deacon@arm.com Cc: Thomas Gleixner tglx@linutronix.de Cc: the arch/x86 maintainers x86@kernel.org Cc: Davidlohr Bueso dave@stgolabs.net Cc: Tim Chen tim.c.chen@linux.intel.com Cc: huang ying huang.ying.caritas@gmail.com Cc: Roman Gushchin guro@fb.com Cc: Alexei Starovoitov ast@kernel.org Cc: Daniel Borkmann daniel@iogearbox.net Cc: stable@vger.kernel.org Fixes: 2c2d7329d8af ("tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()") Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/trace/ring_buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -701,7 +701,7 @@ u64 ring_buffer_time_stamp(struct ring_b
 
 	preempt_disable_notrace();
 	time = rb_time_stamp(buffer);
-	preempt_enable_no_resched_notrace();
+	preempt_enable_notrace();
 
 	return time;
 }
From: Xie XiuQi xiexiuqi@huawei.com
commit a860fa7b96e1a1c974556327aa1aee852d434c21 upstream.
sched_clock_cpu() may not be consistent between CPUs. If a task migrates to another CPU, then se.exec_start is set to that CPU's rq_clock_task() by update_stats_curr_start(). Specifically, the new value might be before the old value due to clock skew.
So then if in numa_get_avg_runtime() the expression:
'now - p->last_task_numa_placement'
ends up as -1, then the divisor '*period + 1' in task_numa_placement() is 0 and things go bang. Similar to update_curr(), check whether time goes backwards to avoid this.
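The wrap-around is easy to check in user space. In the sketch below, numa_period() is a hypothetical stand-in for the period computation in numa_get_avg_runtime(), with uint64_t/int64_t in place of the kernel's u64/s64 and without unlikely().

```c
#include <stdint.h>

/*
 * "now" and "last" are unsigned clock values that may be skewed
 * between CPUs. If now < last, now - last wraps to a huge value; for
 * now == last - 1 it is UINT64_MAX, and period + 1 then wraps to 0.
 */
static uint64_t numa_period(uint64_t now, uint64_t last)
{
	uint64_t period = now - last;

	/* Avoid time going backwards, prevent potential divide error. */
	if ((int64_t)period < 0)
		period = 0;
	return period;
}
```

With the clamp, numa_period(100, 101) returns 0, so the divisor period + 1 is 1 instead of 0.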
[ peterz: Wrote new changelog. ] [ mingo: Tweaked the code comment. ]
Signed-off-by: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: cj.chengjian@huawei.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20190425080016.GX11158@hirez.programming.kicks-ass.... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/sched/fair.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1722,6 +1722,10 @@ static u64 numa_get_avg_runtime(struct t
 	if (p->last_task_numa_placement) {
 		delta = runtime - p->last_sum_exec_runtime;
 		*period = now - p->last_task_numa_placement;
+
+		/* Avoid time going backwards, prevent potential divide error: */
+		if (unlikely((s64)*period < 0))
+			*period = 0;
 	} else {
 		delta = p->se.avg.load_sum / p->se.load.weight;
 		*period = LOAD_AVG_MAX;
From: Jeff Layton jlayton@kernel.org
commit 76a495d666e5043ffc315695f8241f5e94a98849 upstream.
Take the d_lock here to ensure that d_name doesn't change.
Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton jlayton@kernel.org Reviewed-by: "Yan, Zheng" zyan@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ceph/dir.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1288,6 +1288,7 @@ void ceph_dentry_lru_del(struct dentry *
 unsigned ceph_dentry_hash(struct inode *dir, struct dentry *dn)
 {
 	struct ceph_inode_info *dci = ceph_inode(dir);
+	unsigned hash;
 
 	switch (dci->i_dir_layout.dl_dir_hash) {
 	case 0:	/* for backward compat */
@@ -1295,8 +1296,11 @@ unsigned ceph_dentry_hash(struct inode *
 		return dn->d_name.hash;
 
 	default:
-		return ceph_str_hash(dci->i_dir_layout.dl_dir_hash,
+		spin_lock(&dn->d_lock);
+		hash = ceph_str_hash(dci->i_dir_layout.dl_dir_hash,
 				     dn->d_name.name, dn->d_name.len);
+		spin_unlock(&dn->d_lock);
+		return hash;
 	}
 }
From: Yan, Zheng zyan@redhat.com
commit 37659182bff1eeaaeadcfc8f853c6d2b6dbc3f47 upstream.
We missed two places where i_wrbuffer_ref_head, i_wr_ref, i_dirty_caps and i_flushing_caps may change. When they are all zero, we should free i_head_snapc.
Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/38224 Reported-and-tested-by: Luis Henriques lhenriques@suse.com Signed-off-by: "Yan, Zheng" zyan@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ceph/mds_client.c | 9 +++++++++ fs/ceph/snap.c | 7 ++++++- 2 files changed, 15 insertions(+), 1 deletion(-)
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1198,6 +1198,15 @@ static int remove_session_caps_cb(struct
 			list_add(&ci->i_prealloc_cap_flush->list, &to_remove);
 			ci->i_prealloc_cap_flush = NULL;
 		}
+
+		if (drop &&
+		    ci->i_wrbuffer_ref_head == 0 &&
+		    ci->i_wr_ref == 0 &&
+		    ci->i_dirty_caps == 0 &&
+		    ci->i_flushing_caps == 0) {
+			ceph_put_snap_context(ci->i_head_snapc);
+			ci->i_head_snapc = NULL;
+		}
 	}
 	spin_unlock(&ci->i_ceph_lock);
 	while (!list_empty(&to_remove)) {
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -567,7 +567,12 @@ void ceph_queue_cap_snap(struct ceph_ino
 	capsnap = NULL;
 
 update_snapc:
-	if (ci->i_head_snapc) {
+	if (ci->i_wrbuffer_ref_head == 0 &&
+	    ci->i_wr_ref == 0 &&
+	    ci->i_dirty_caps == 0 &&
+	    ci->i_flushing_caps == 0) {
+		ci->i_head_snapc = NULL;
+	} else {
 		ci->i_head_snapc = ceph_get_snap_context(new_snapc);
 		dout(" new snapc is %p\n", new_snapc);
 	}
From: Trond Myklebust trondmy@gmail.com
commit e6abc8caa6deb14be2a206253f7e1c5e37e9515b upstream.
If there are multiple callbacks queued, waiting for the callback slot when the callback gets shut down, then they all currently end up acting as if they hold the slot, and call nfsd4_cb_sequence_done() resulting in interesting side-effects.
In addition, the 'retry_nowait' path in nfsd4_cb_sequence_done() causes a loop back to nfsd4_cb_prepare() without first freeing the slot, which causes a deadlock when nfsd41_cb_get_slot() gets called a second time.
This patch therefore adds a boolean to track whether or not the callback did pick up the slot, so that it can do the right thing in these 2 cases.
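The effect of the new cb_holds_slot flag can be shown with a simplified, single-threaded model. Everything here is hypothetical: struct client and struct callback stand in for nfs4_client and nfsd4_callback, get_slot() plays the role of nfsd41_cb_get_slot() (minus the RPC wait queue), and cb_release_slot() condenses the slot-freeing tail of nfsd4_cb_sequence_done().

```c
#include <stdbool.h>

struct client { bool slot_busy; };	/* models cl_cb_slot_busy */
struct callback { bool holds_slot; };	/* models cb_holds_slot */

/* Take the slot only if it is free (like test_and_set_bit()). */
static bool get_slot(struct client *clp)
{
	if (clp->slot_busy)
		return false;
	clp->slot_busy = true;
	return true;
}

/* Returns true if the callback may start its RPC. */
static bool cb_prepare(struct client *clp, struct callback *cb)
{
	/* A retry by the current holder must not wait for its own slot. */
	if (!cb->holds_slot && !get_slot(clp))
		return false;	/* the real code sleeps on cl_cb_waitq */
	cb->holds_slot = true;
	return true;
}

/* Only the callback that actually holds the slot may free it. */
static void cb_release_slot(struct client *clp, struct callback *cb)
{
	if (!cb->holds_slot)
		return;
	cb->holds_slot = false;
	clp->slot_busy = false;
}
```

Without the flag, the second cb_prepare() call for the same callback would find the slot busy and wait forever (the retry_nowait deadlock), and a waiter that never got the slot could clear it on shutdown.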
Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/nfsd/nfs4callback.c | 8 +++++++- fs/nfsd/state.h | 1 + 2 files changed, 8 insertions(+), 1 deletion(-)
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -874,8 +874,9 @@ static void nfsd4_cb_prepare(struct rpc_
 	cb->cb_seq_status = 1;
 	cb->cb_status = 0;
 	if (minorversion) {
-		if (!nfsd41_cb_get_slot(clp, task))
+		if (!cb->cb_holds_slot && !nfsd41_cb_get_slot(clp, task))
 			return;
+		cb->cb_holds_slot = true;
 	}
 	rpc_call_start(task);
 }
@@ -902,6 +903,9 @@ static bool nfsd4_cb_sequence_done(struc
 		return true;
 	}
 
+	if (!cb->cb_holds_slot)
+		goto need_restart;
+
 	switch (cb->cb_seq_status) {
 	case 0:
 		/*
@@ -939,6 +943,7 @@ static bool nfsd4_cb_sequence_done(struc
 			cb->cb_seq_status);
 	}
 
+	cb->cb_holds_slot = false;
 	clear_bit(0, &clp->cl_cb_slot_busy);
 	rpc_wake_up_next(&clp->cl_cb_waitq);
 	dprintk("%s: freed slot, new seqid=%d\n", __func__,
@@ -1146,6 +1151,7 @@ void nfsd4_init_cb(struct nfsd4_callback
 	cb->cb_seq_status = 1;
 	cb->cb_status = 0;
 	cb->cb_need_restart = false;
+	cb->cb_holds_slot = false;
 }
 
 void nfsd4_run_cb(struct nfsd4_callback *cb)
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -70,6 +70,7 @@ struct nfsd4_callback {
 	int cb_seq_status;
 	int cb_status;
 	bool cb_need_restart;
+	bool cb_holds_slot;
 };
 
 struct nfsd4_callback_ops {
From: NeilBrown neilb@suse.com
commit d58431eacb226222430940134d97bfd72f292fcd upstream.
A recent commit added a call to cache_fresh_locked() when an expired item was found. The call sets the CACHE_VALID flag, so it is important that the item actually is valid. There are two ways it could be valid: 1/ If ->update has been called to fill in relevant content 2/ if CACHE_NEGATIVE is set, to say that content doesn't exist.
An expired item that is waiting for an update will be neither. Setting CACHE_VALID will mean that a subsequent call to cache_put() will be likely to dereference uninitialised pointers.
So we must make sure the item is valid, and we already have code to do that in try_to_negate_entry(). This takes the hash lock and so cannot be used directly, so take out the two lines that we need and use them.
Now cache_fresh_locked() is certain to be called only on a valid item.
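The state logic can be modelled with two flag bits. This sketch is not the sunrpc code: struct entry and the ENTRY_* masks are hypothetical (the kernel uses bit numbers with test_bit()/set_bit()), entry_is_valid() follows the checks in cache_is_valid() without its memory barrier, and expire_entry() condenses the patched lookup path plus the VALID bit that cache_fresh_locked() sets.

```c
#include <errno.h>

#define ENTRY_VALID    0x1u	/* hypothetical mask for CACHE_VALID */
#define ENTRY_NEGATIVE 0x2u	/* hypothetical mask for CACHE_NEGATIVE */

struct entry {
	unsigned int flags;
};

static int entry_is_valid(const struct entry *e)
{
	if (!(e->flags & ENTRY_VALID))
		return -EAGAIN;		/* still waiting for ->update */
	if (e->flags & ENTRY_NEGATIVE)
		return -ENOENT;		/* valid: content doesn't exist */
	return 0;			/* valid: content filled in */
}

/* Expire an entry so that marking it valid never exposes junk. */
static void expire_entry(struct entry *e)
{
	if (entry_is_valid(e) == -EAGAIN)
		e->flags |= ENTRY_NEGATIVE;
	e->flags |= ENTRY_VALID;
}
```

An entry that was still pending ends up valid but negative, so a later cache_put() has no uninitialised content to dereference; an entry that already had content keeps it.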
Cc: stable@kernel.org # 2.6.35 Fixes: 4ecd55ea0742 ("sunrpc: fix cache_head leak due to queued request") Signed-off-by: NeilBrown neilb@suse.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/sunrpc/cache.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -54,6 +54,7 @@ static void cache_init(struct cache_head
 	h->last_refresh = now;
 }
 
+static inline int cache_is_valid(struct cache_head *h);
 static void cache_fresh_locked(struct cache_head *head, time_t expiry,
 				struct cache_detail *detail);
 static void cache_fresh_unlocked(struct cache_head *head,
@@ -100,6 +101,8 @@ struct cache_head *sunrpc_cache_lookup(s
 			if (cache_is_expired(detail, tmp)) {
 				hlist_del_init(&tmp->cache_list);
 				detail->entries --;
+				if (cache_is_valid(tmp) == -EAGAIN)
+					set_bit(CACHE_NEGATIVE, &tmp->flags);
 				cache_fresh_locked(tmp, 0, detail);
 				freeme = tmp;
 				break;
From: Kai-Heng Feng kai.heng.feng@canonical.com
commit 7529b2574a7aaf902f1f8159fbc2a7caa74be559 upstream.
Use new helpers to make enabling and disabling LPM clearer.
This is in preparation for a subsequent patch.
Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com Cc: stable stable@vger.kernel.org # after much soaking Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/core/driver.c | 12 +++++++++++- drivers/usb/core/hub.c | 12 ++++++------ drivers/usb/core/message.c | 2 +- drivers/usb/core/sysfs.c | 5 ++++- drivers/usb/core/usb.h | 10 ++++++++-- 5 files changed, 30 insertions(+), 11 deletions(-)
--- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -1888,7 +1888,7 @@ int usb_runtime_idle(struct device *dev) return -EBUSY; }
-int usb_set_usb2_hardware_lpm(struct usb_device *udev, int enable) +static int usb_set_usb2_hardware_lpm(struct usb_device *udev, int enable) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); int ret = -EPERM; @@ -1905,6 +1905,16 @@ int usb_set_usb2_hardware_lpm(struct usb return ret; }
+int usb_enable_usb2_hardware_lpm(struct usb_device *udev) +{ + return usb_set_usb2_hardware_lpm(udev, 1); +} + +int usb_disable_usb2_hardware_lpm(struct usb_device *udev) +{ + return usb_set_usb2_hardware_lpm(udev, 0); +} + #endif /* CONFIG_PM */
struct bus_type usb_bus_type = { --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -3117,7 +3117,7 @@ int usb_port_suspend(struct usb_device *
/* disable USB2 hardware LPM */ if (udev->usb2_hw_lpm_enabled == 1) - usb_set_usb2_hardware_lpm(udev, 0); + usb_disable_usb2_hardware_lpm(udev);
if (usb_disable_ltm(udev)) { dev_err(&udev->dev, "Failed to disable LTM before suspend\n."); @@ -3164,7 +3164,7 @@ int usb_port_suspend(struct usb_device * err_ltm: /* Try to enable USB2 hardware LPM again */ if (udev->usb2_hw_lpm_capable == 1) - usb_set_usb2_hardware_lpm(udev, 1); + usb_enable_usb2_hardware_lpm(udev);
if (udev->do_remote_wakeup) (void) usb_disable_remote_wakeup(udev); @@ -3444,7 +3444,7 @@ int usb_port_resume(struct usb_device *u } else { /* Try to enable USB2 hardware LPM */ if (udev->usb2_hw_lpm_capable == 1) - usb_set_usb2_hardware_lpm(udev, 1); + usb_enable_usb2_hardware_lpm(udev);
/* Try to enable USB3 LTM and LPM */ usb_enable_ltm(udev); @@ -4270,7 +4270,7 @@ static void hub_set_initial_usb2_lpm_pol if ((udev->bos->ext_cap->bmAttributes & cpu_to_le32(USB_BESL_SUPPORT)) || connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) { udev->usb2_hw_lpm_allowed = 1; - usb_set_usb2_hardware_lpm(udev, 1); + usb_enable_usb2_hardware_lpm(udev); } }
@@ -5416,7 +5416,7 @@ static int usb_reset_and_verify_device(s * It will be re-enabled by the enumeration process. */ if (udev->usb2_hw_lpm_enabled == 1) - usb_set_usb2_hardware_lpm(udev, 0); + usb_disable_usb2_hardware_lpm(udev);
/* Disable LPM and LTM while we reset the device and reinstall the alt * settings. Device-initiated LPM settings, and system exit latency @@ -5526,7 +5526,7 @@ static int usb_reset_and_verify_device(s
done: /* Now that the alt settings are re-installed, enable LTM and LPM. */ - usb_set_usb2_hardware_lpm(udev, 1); + usb_enable_usb2_hardware_lpm(udev); usb_unlocked_enable_lpm(udev); usb_enable_ltm(udev); usb_release_bos_descriptor(udev); --- a/drivers/usb/core/message.c +++ b/drivers/usb/core/message.c @@ -1185,7 +1185,7 @@ void usb_disable_device(struct usb_devic }
if (dev->usb2_hw_lpm_enabled == 1) - usb_set_usb2_hardware_lpm(dev, 0); + usb_disable_usb2_hardware_lpm(dev); usb_unlocked_disable_lpm(dev); usb_disable_ltm(dev);
--- a/drivers/usb/core/sysfs.c +++ b/drivers/usb/core/sysfs.c @@ -472,7 +472,10 @@ static ssize_t usb2_hardware_lpm_store(s
if (!ret) { udev->usb2_hw_lpm_allowed = value; - ret = usb_set_usb2_hardware_lpm(udev, value); + if (value) + ret = usb_enable_usb2_hardware_lpm(udev); + else + ret = usb_disable_usb2_hardware_lpm(udev); }
usb_unlock_device(udev); --- a/drivers/usb/core/usb.h +++ b/drivers/usb/core/usb.h @@ -84,7 +84,8 @@ extern int usb_remote_wakeup(struct usb_ extern int usb_runtime_suspend(struct device *dev); extern int usb_runtime_resume(struct device *dev); extern int usb_runtime_idle(struct device *dev); -extern int usb_set_usb2_hardware_lpm(struct usb_device *udev, int enable); +extern int usb_enable_usb2_hardware_lpm(struct usb_device *udev); +extern int usb_disable_usb2_hardware_lpm(struct usb_device *udev);
#else
@@ -104,7 +105,12 @@ static inline int usb_autoresume_device( return 0; }
-static inline int usb_set_usb2_hardware_lpm(struct usb_device *udev, int enable) +static inline int usb_enable_usb2_hardware_lpm(struct usb_device *udev) +{ + return 0; +} + +static inline int usb_disable_usb2_hardware_lpm(struct usb_device *udev) { return 0; }
From: Kai-Heng Feng kai.heng.feng@canonical.com
commit d7a6c0ce8d26412903c7981503bad9e1cc7c45d2 upstream.
USB Bluetooth controller QCA ROME (0cf3:e007) sometimes stops working after S3: [ 165.110742] Bluetooth: hci0: using NVM file: qca/nvm_usb_00000302.bin [ 168.432065] Bluetooth: hci0: Failed to send body at 4 of 1953 (-110)
After some experiments, I found that disabling LPM can work around the issue.
On some platforms, the USB power is cut during S3, so the driver uses reset-resume to resume the device. During port resume, LPM gets enabled twice, by usb_reset_and_verify_device() and usb_port_resume().
Consolidate all checks into new LPM helpers to make sure LPM only gets enabled once.
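The consolidated helpers make enable and disable idempotent, so calling enable from both usb_reset_and_verify_device() and usb_port_resume() reaches the host controller only once. The model below is a hypothetical sketch: struct lpm_dev mirrors the usb2_hw_lpm_capable/allowed/enabled bits, and set_hw_lpm() is a stub for usb_set_usb2_hardware_lpm() that always succeeds and counts calls.

```c
#include <stdbool.h>

struct lpm_dev {
	bool capable, allowed, enabled;	/* usb2_hw_lpm_* bits */
	int hw_calls;			/* calls into the stubbed HCD */
};

/* Stub: the real function calls hcd->driver->set_usb2_hw_lpm(). */
static int set_hw_lpm(struct lpm_dev *d, bool enable)
{
	d->hw_calls++;
	d->enabled = enable;
	return 0;
}

static int enable_lpm(struct lpm_dev *d)
{
	if (!d->capable || !d->allowed || d->enabled)
		return 0;	/* nothing to do */
	return set_hw_lpm(d, true);
}

static int disable_lpm(struct lpm_dev *d)
{
	if (!d->enabled)
		return 0;	/* already off */
	return set_hw_lpm(d, false);
}
```

Two back-to-back enable_lpm() calls on a capable, allowed device result in a single hardware call, and a device whose LPM is not allowed is never touched.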
Fixes: de68bab4fa96 ("usb: Don't enable USB 2.0 Link PM by default.")
Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com
Cc: stable stable@vger.kernel.org # after much soaking
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/core/driver.c | 11 ++++++++--- drivers/usb/core/hub.c | 12 ++++-------- drivers/usb/core/message.c | 3 +-- 3 files changed, 13 insertions(+), 13 deletions(-)
--- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -1893,9 +1893,6 @@ static int usb_set_usb2_hardware_lpm(str struct usb_hcd *hcd = bus_to_hcd(udev->bus); int ret = -EPERM;
- if (enable && !udev->usb2_hw_lpm_allowed) - return 0; - if (hcd->driver->set_usb2_hw_lpm) { ret = hcd->driver->set_usb2_hw_lpm(hcd, udev, enable); if (!ret) @@ -1907,11 +1904,19 @@ static int usb_set_usb2_hardware_lpm(str
int usb_enable_usb2_hardware_lpm(struct usb_device *udev) { + if (!udev->usb2_hw_lpm_capable || + !udev->usb2_hw_lpm_allowed || + udev->usb2_hw_lpm_enabled) + return 0; + return usb_set_usb2_hardware_lpm(udev, 1); }
int usb_disable_usb2_hardware_lpm(struct usb_device *udev) { + if (!udev->usb2_hw_lpm_enabled) + return 0; + return usb_set_usb2_hardware_lpm(udev, 0); }
--- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -3116,8 +3116,7 @@ int usb_port_suspend(struct usb_device * }
/* disable USB2 hardware LPM */ - if (udev->usb2_hw_lpm_enabled == 1) - usb_disable_usb2_hardware_lpm(udev); + usb_disable_usb2_hardware_lpm(udev);
if (usb_disable_ltm(udev)) { dev_err(&udev->dev, "Failed to disable LTM before suspend\n."); @@ -3163,8 +3162,7 @@ int usb_port_suspend(struct usb_device * usb_enable_ltm(udev); err_ltm: /* Try to enable USB2 hardware LPM again */ - if (udev->usb2_hw_lpm_capable == 1) - usb_enable_usb2_hardware_lpm(udev); + usb_enable_usb2_hardware_lpm(udev);
if (udev->do_remote_wakeup) (void) usb_disable_remote_wakeup(udev); @@ -3443,8 +3441,7 @@ int usb_port_resume(struct usb_device *u hub_port_logical_disconnect(hub, port1); } else { /* Try to enable USB2 hardware LPM */ - if (udev->usb2_hw_lpm_capable == 1) - usb_enable_usb2_hardware_lpm(udev); + usb_enable_usb2_hardware_lpm(udev);
/* Try to enable USB3 LTM and LPM */ usb_enable_ltm(udev); @@ -5415,8 +5412,7 @@ static int usb_reset_and_verify_device(s /* Disable USB2 hardware LPM. * It will be re-enabled by the enumeration process. */ - if (udev->usb2_hw_lpm_enabled == 1) - usb_disable_usb2_hardware_lpm(udev); + usb_disable_usb2_hardware_lpm(udev);
/* Disable LPM and LTM while we reset the device and reinstall the alt * settings. Device-initiated LPM settings, and system exit latency --- a/drivers/usb/core/message.c +++ b/drivers/usb/core/message.c @@ -1184,8 +1184,7 @@ void usb_disable_device(struct usb_devic dev->actconfig->interface[i] = NULL; }
- if (dev->usb2_hw_lpm_enabled == 1) - usb_disable_usb2_hardware_lpm(dev); + usb_disable_usb2_hardware_lpm(dev); usb_unlocked_disable_lpm(dev); usb_disable_ltm(dev);
From: Michael Ellerman mpe@ellerman.id.au
commit 274920a3ecd5f43af0cc380bc0a9ee73a52b9f8a upstream.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/xmon/xmon.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2144,6 +2144,10 @@ static void dump_one_paca(int cpu)
 	DUMP(p, slb_cache_ptr, "x");
 	for (i = 0; i < SLB_CACHE_ENTRIES; i++)
 		printf(" slb_cache[%d]: = 0x%016lx\n", i, p->slb_cache[i]);
+
+	DUMP(p, rfi_flush_fallback_area, "px");
+	DUMP(p, l1d_flush_congruence, "llx");
+	DUMP(p, l1d_flush_sets, "llx");
 #endif
 	DUMP(p, dscr_default, "llx");
 #ifdef CONFIG_PPC_BOOK3E
From: Nicholas Piggin npiggin@gmail.com
commit bdcb1aefc5b3f7d0f1dc8b02673602bca2ff7a4b upstream.
The fallback RFI flush is used when firmware does not provide a way to flush the cache. It's a "displacement flush" that evicts useful data by displacing it with an uninteresting buffer.
The flush has to take care to work with implementation-specific cache replacement policies, so the recipe has been in flux. The initial slow but conservative approach is to touch all lines of a congruence class, with dependencies between each load. It has since been determined that a linear pattern of loads without dependencies is sufficient, and is significantly faster.
Measuring the speed of a null syscall with RFI fallback flush enabled gives the relative improvement:
  P8 - 1.83x
  P9 - 1.75x
The flush also becomes simpler and more adaptable to different cache geometries.
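The new loop bounds follow directly from the assembly: each pass of the unrolled loop issues eight loads, one per 128-byte (0x80) line at a staggered offset of 8 bytes more per load, then advances r10 by 0x80*8 = 1024 bytes, so "srdi r11,r11,(7 + 3)" yields the number of passes. The helpers below are a hypothetical arithmetic model of that loop, not kernel code; the 32 KiB and 64 KiB sizes in the usage are example values only.

```c
#include <stdint.h>

/* Passes of the unrolled loop: l1d_flush_size >> (7 + 3). */
static uint64_t flush_iterations(uint64_t l1d_flush_size)
{
	return l1d_flush_size >> (7 + 3);
}

/* Offset of load i (0..7) from r10 within one pass: (0x80 + 8) * i. */
static uint64_t load_offset(unsigned int i)
{
	return (uint64_t)(0x80 + 8) * i;
}
```

For a 32 KiB flush size this gives 32 passes covering 32 * 1024 bytes, and the last load of each pass sits 952 bytes past r10, still inside the 1024-byte block that pass covers.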
Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/paca.h | 3 - arch/powerpc/kernel/asm-offsets.c | 3 - arch/powerpc/kernel/exceptions-64s.S | 76 ++++++++++++++++------------------- arch/powerpc/kernel/setup_64.c | 13 ----- arch/powerpc/xmon/xmon.c | 2 5 files changed, 39 insertions(+), 58 deletions(-)
--- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -199,8 +199,7 @@ struct paca_struct { */ u64 exrfi[13] __aligned(0x80); void *rfi_flush_fallback_area; - u64 l1d_flush_congruence; - u64 l1d_flush_sets; + u64 l1d_flush_size; #endif };
--- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -245,8 +245,7 @@ int main(void) DEFINE(PACA_IN_MCE, offsetof(struct paca_struct, in_mce)); DEFINE(PACA_RFI_FLUSH_FALLBACK_AREA, offsetof(struct paca_struct, rfi_flush_fallback_area)); DEFINE(PACA_EXRFI, offsetof(struct paca_struct, exrfi)); - DEFINE(PACA_L1D_FLUSH_CONGRUENCE, offsetof(struct paca_struct, l1d_flush_congruence)); - DEFINE(PACA_L1D_FLUSH_SETS, offsetof(struct paca_struct, l1d_flush_sets)); + DEFINE(PACA_L1D_FLUSH_SIZE, offsetof(struct paca_struct, l1d_flush_size)); #endif DEFINE(PACAHWCPUID, offsetof(struct paca_struct, hw_cpu_id)); DEFINE(PACAKEXECSTATE, offsetof(struct paca_struct, kexec_state)); --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1571,39 +1571,37 @@ rfi_flush_fallback: std r9,PACA_EXRFI+EX_R9(r13) std r10,PACA_EXRFI+EX_R10(r13) std r11,PACA_EXRFI+EX_R11(r13) - std r12,PACA_EXRFI+EX_R12(r13) - std r8,PACA_EXRFI+EX_R13(r13) mfctr r9 ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13) - ld r11,PACA_L1D_FLUSH_SETS(r13) - ld r12,PACA_L1D_FLUSH_CONGRUENCE(r13) - /* - * The load adresses are at staggered offsets within cachelines, - * which suits some pipelines better (on others it should not - * hurt). - */ - addi r12,r12,8 + ld r11,PACA_L1D_FLUSH_SIZE(r13) + srdi r11,r11,(7 + 3) /* 128 byte lines, unrolled 8x */ mtctr r11 DCBT_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */
/* order ld/st prior to dcbt stop all streams with flushing */ sync -1: li r8,0 - .rept 8 /* 8-way set associative */ - ldx r11,r10,r8 - add r8,r8,r12 - xor r11,r11,r11 // Ensure r11 is 0 even if fallback area is not - add r8,r8,r11 // Add 0, this creates a dependency on the ldx - .endr - addi r10,r10,128 /* 128 byte cache line */ + + /* + * The load adresses are at staggered offsets within cachelines, + * which suits some pipelines better (on others it should not + * hurt). + */ +1: + ld r11,(0x80 + 8)*0(r10) + ld r11,(0x80 + 8)*1(r10) + ld r11,(0x80 + 8)*2(r10) + ld r11,(0x80 + 8)*3(r10) + ld r11,(0x80 + 8)*4(r10) + ld r11,(0x80 + 8)*5(r10) + ld r11,(0x80 + 8)*6(r10) + ld r11,(0x80 + 8)*7(r10) + addi r10,r10,0x80*8 bdnz 1b
mtctr r9 ld r9,PACA_EXRFI+EX_R9(r13) ld r10,PACA_EXRFI+EX_R10(r13) ld r11,PACA_EXRFI+EX_R11(r13) - ld r12,PACA_EXRFI+EX_R12(r13) - ld r8,PACA_EXRFI+EX_R13(r13) GET_SCRATCH0(r13); rfid
@@ -1614,39 +1612,37 @@ hrfi_flush_fallback: std r9,PACA_EXRFI+EX_R9(r13) std r10,PACA_EXRFI+EX_R10(r13) std r11,PACA_EXRFI+EX_R11(r13) - std r12,PACA_EXRFI+EX_R12(r13) - std r8,PACA_EXRFI+EX_R13(r13) mfctr r9 ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13) - ld r11,PACA_L1D_FLUSH_SETS(r13) - ld r12,PACA_L1D_FLUSH_CONGRUENCE(r13) - /* - * The load adresses are at staggered offsets within cachelines, - * which suits some pipelines better (on others it should not - * hurt). - */ - addi r12,r12,8 + ld r11,PACA_L1D_FLUSH_SIZE(r13) + srdi r11,r11,(7 + 3) /* 128 byte lines, unrolled 8x */ mtctr r11 DCBT_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */
/* order ld/st prior to dcbt stop all streams with flushing */ sync -1: li r8,0 - .rept 8 /* 8-way set associative */ - ldx r11,r10,r8 - add r8,r8,r12 - xor r11,r11,r11 // Ensure r11 is 0 even if fallback area is not - add r8,r8,r11 // Add 0, this creates a dependency on the ldx - .endr - addi r10,r10,128 /* 128 byte cache line */ + + /* + * The load adresses are at staggered offsets within cachelines, + * which suits some pipelines better (on others it should not + * hurt). + */ +1: + ld r11,(0x80 + 8)*0(r10) + ld r11,(0x80 + 8)*1(r10) + ld r11,(0x80 + 8)*2(r10) + ld r11,(0x80 + 8)*3(r10) + ld r11,(0x80 + 8)*4(r10) + ld r11,(0x80 + 8)*5(r10) + ld r11,(0x80 + 8)*6(r10) + ld r11,(0x80 + 8)*7(r10) + addi r10,r10,0x80*8 bdnz 1b
mtctr r9 ld r9,PACA_EXRFI+EX_R9(r13) ld r10,PACA_EXRFI+EX_R10(r13) ld r11,PACA_EXRFI+EX_R11(r13) - ld r12,PACA_EXRFI+EX_R12(r13) - ld r8,PACA_EXRFI+EX_R13(r13) GET_SCRATCH0(r13); hrfid
--- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -902,19 +902,8 @@ static void init_fallback_flush(void) memset(l1d_flush_fallback_area, 0, l1d_size * 2);
for_each_possible_cpu(cpu) { - /* - * The fallback flush is currently coded for 8-way - * associativity. Different associativity is possible, but it - * will be treated as 8-way and may not evict the lines as - * effectively. - * - * 128 byte lines are mandatory. - */ - u64 c = l1d_size / 8; - paca[cpu].rfi_flush_fallback_area = l1d_flush_fallback_area; - paca[cpu].l1d_flush_congruence = c; - paca[cpu].l1d_flush_sets = c / 128; + paca[cpu].l1d_flush_size = l1d_size; } }
--- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -2146,8 +2146,6 @@ static void dump_one_paca(int cpu) printf(" slb_cache[%d]: = 0x%016lx\n", i, p->slb_cache[i]);
DUMP(p, rfi_flush_fallback_area, "px"); - DUMP(p, l1d_flush_congruence, "llx"); - DUMP(p, l1d_flush_sets, "llx"); #endif DUMP(p, dscr_default, "llx"); #ifdef CONFIG_PPC_BOOK3E
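The reworked fallback flush drops the 8-way set/congruence walk. Instead it derives a loop count from l1d_flush_size: the `srdi` by (7 + 3) yields one iteration per eight 128-byte lines. Each iteration issues eight loads at staggered offsets (0x80 + 8) * k, then advances r10 by 0x80 * 8.

Below is a userspace C model of the addresses that loop reads. It assumes 128-byte lines (as the assembly does) and a hypothetical 64 KiB L1D. flush_iterations(), load_offset() and lines_touched() are illustrative helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define LINE_SIZE 0x80UL	/* 128-byte cache lines, as the assembly assumes */
#define UNROLL    8UL		/* eight ld instructions per loop iteration */

/* Mirrors "srdi r11,r11,(7 + 3)": number of unrolled loop iterations. */
static unsigned long flush_iterations(unsigned long l1d_size)
{
	return l1d_size >> (7 + 3);
}

/* Byte offset into the fallback area read by load k of iteration i. */
static unsigned long load_offset(unsigned long i, unsigned long k)
{
	return i * LINE_SIZE * UNROLL + (LINE_SIZE + 8) * k;
}

/*
 * Counts the distinct cache lines the modelled loop reads and reports the
 * end of the furthest 8-byte load. The area is 2 * l1d_size bytes, matching
 * the memset() in init_fallback_flush().
 */
static unsigned long lines_touched(unsigned long l1d_size, unsigned long *max_end)
{
	unsigned long nlines = (2 * l1d_size) / LINE_SIZE;
	bool *seen = calloc(nlines, sizeof(*seen));
	unsigned long i, k, count = 0;

	*max_end = 0;
	if (!seen)
		return 0;

	for (i = 0; i < flush_iterations(l1d_size); i++) {
		for (k = 0; k < UNROLL; k++) {
			unsigned long off = load_offset(i, k);
			unsigned long line = off / LINE_SIZE;

			if (!seen[line]) {
				seen[line] = true;
				count++;
			}
			if (off + 8 > *max_end)
				*max_end = off + 8;
		}
	}
	free(seen);
	return count;
}
```

Under those assumptions the loop runs 64 times and reads each of the 512 lines exactly once. The extra 8-byte stagger never spills into the next line, because 7 * 8 = 56 is less than 128.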
From: Michael Ellerman mpe@ellerman.id.au
commit 582605a429e20ae68fd0b041b2e840af296edd08 upstream.
Some versions of firmware will have a setting that can be configured to disable the RFI flush; add support for it.
Fixes: 8989d56878a7 ("powerpc/pseries: Query hypervisor for RFI flush settings")
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/platforms/pseries/setup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -522,7 +522,8 @@ static void pseries_setup_rfi_flush(void
 		if (types == L1D_FLUSH_NONE)
 			types = L1D_FLUSH_FALLBACK;
 
-		if (!(result.behaviour & H_CPU_BEHAV_L1D_FLUSH_PR))
+		if ((!(result.behaviour & H_CPU_BEHAV_L1D_FLUSH_PR)) ||
+		    (!(result.behaviour & H_CPU_BEHAV_FAVOUR_SECURITY)))
 			enable = false;
 	} else {
 		/* Default to fallback if case hcall is not available */
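The pseries change makes the flush depend on two firmware behaviour bits rather than one. Here is a minimal truth-table sketch. The bit values are taken from the hvcall.h context shown later in this series, and rfi_flush_wanted() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values as in asm/hvcall.h. */
#define H_CPU_BEHAV_FAVOUR_SECURITY	(1ull << 63)	/* IBM bit 0 */
#define H_CPU_BEHAV_L1D_FLUSH_PR	(1ull << 62)	/* IBM bit 1 */

/*
 * Models the patched condition: the RFI flush stays enabled only if
 * firmware reports both "flush on MSR[PR] 0->1" and "favour security".
 */
static bool rfi_flush_wanted(unsigned long long behaviour)
{
	if (!(behaviour & H_CPU_BEHAV_L1D_FLUSH_PR) ||
	    !(behaviour & H_CPU_BEHAV_FAVOUR_SECURITY))
		return false;

	return true;
}
```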
From: Michael Ellerman mpe@ellerman.id.au
commit eb0a2d2620ae431c543963c8c7f08f597366fc60 upstream.
Some versions of firmware will have a setting that can be configured to disable the RFI flush; add support for it.
Fixes: 6e032b350cd1 ("powerpc/powernv: Check device-tree for RFI flush settings")
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/platforms/powernv/setup.c | 4 ++++
 1 file changed, 4 insertions(+)
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -79,6 +79,10 @@ static void pnv_setup_rfi_flush(void)
 		if (np && of_property_read_bool(np, "disabled"))
 			enable--;
 
+		np = of_get_child_by_name(fw_features, "speculation-policy-favor-security");
+		if (np && of_property_read_bool(np, "disabled"))
+			enable = 0;
+
 		of_node_put(np);
 		of_node_put(fw_features);
 	}
From: Michael Ellerman mpe@ellerman.id.au
commit 1e2a9fc7496955faacbbed49461d611b704a7505 upstream.
rfi_flush_enable() includes a check to see if we're already enabled (or disabled), and in that case does nothing.
But that means calling setup_rfi_flush() a second time doesn't actually work, which is a bit confusing.
Move that check into the debugfs code, where it really belongs.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/kernel/setup_64.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -873,9 +873,6 @@ static void do_nothing(void *unused)
 
 void rfi_flush_enable(bool enable)
 {
-	if (rfi_flush == enable)
-		return;
-
 	if (enable) {
 		do_rfi_flush_fixups(enabled_flush_types);
 		on_each_cpu(do_nothing, NULL, 1);
@@ -929,13 +926,19 @@ void __init setup_rfi_flush(enum l1d_flu
 #ifdef CONFIG_DEBUG_FS
 static int rfi_flush_set(void *data, u64 val)
 {
+	bool enable;
+
 	if (val == 1)
-		rfi_flush_enable(true);
+		enable = true;
 	else if (val == 0)
-		rfi_flush_enable(false);
+		enable = false;
 	else
 		return -EINVAL;
 
+	/* Only do anything if we're changing state */
+	if (enable != rfi_flush)
+		rfi_flush_enable(enable);
+
 	return 0;
 }
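Moving the no-op check out of rfi_flush_enable() means direct callers, such as a second setup_rfi_flush(), always take effect. The debugfs write still skips redundant work. The userspace sketch below shows that split. Its rfi_flush_enable() is a hypothetical stand-in that only counts calls; the real function patches code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel state. */
static bool rfi_flush;		/* current state, as in setup_64.c */
static int enable_calls;	/* how often the "expensive" path ran */

/* Stand-in for rfi_flush_enable(): unconditional, as after the patch. */
static void rfi_flush_enable(bool enable)
{
	enable_calls++;		/* the kernel patches instructions here */
	rfi_flush = enable;
}

/* Mirrors the patched rfi_flush_set(): the "no change" check lives here. */
static int rfi_flush_set(void *data, unsigned long long val)
{
	bool enable;

	(void)data;
	if (val == 1)
		enable = true;
	else if (val == 0)
		enable = false;
	else
		return -EINVAL;

	/* Only do anything if we're changing state */
	if (enable != rfi_flush)
		rfi_flush_enable(enable);

	return 0;
}
```

Writing the same value twice through debugfs runs the patching path once. A direct rfi_flush_enable() call, as a re-run of setup_rfi_flush() would make, always runs it.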
From: Michael Ellerman mpe@ellerman.id.au
commit abf110f3e1cea40f5ea15e85f5d67c39c14568a7 upstream.
For PowerVM migration we want to be able to call setup_rfi_flush() again after we've migrated the partition.
To support that we need to check that we're not trying to allocate the fallback flush area after memblock has gone away (i.e., boot-time only).
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/include/asm/setup.h | 2 +-
 arch/powerpc/kernel/setup_64.c | 6 +++++-
 2 files changed, 6 insertions(+), 2 deletions(-)
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -36,7 +36,7 @@ enum l1d_flush_type {
 	L1D_FLUSH_MTTRIG	= 0x8,
 };
 
-void __init setup_rfi_flush(enum l1d_flush_type, bool enable);
+void setup_rfi_flush(enum l1d_flush_type, bool enable);
 void do_rfi_flush_fixups(enum l1d_flush_type types);
 
 #endif /* !__ASSEMBLY__ */
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -887,6 +887,10 @@ static void init_fallback_flush(void)
 	u64 l1d_size, limit;
 	int cpu;
 
+	/* Only allocate the fallback flush area once (at boot time). */
+	if (l1d_flush_fallback_area)
+		return;
+
 	l1d_size = ppc64_caches.dsize;
 	limit = min(safe_stack_limit(), ppc64_rma_size);
 
@@ -904,7 +908,7 @@ static void init_fallback_flush(void)
 	}
 }
 
-void __init setup_rfi_flush(enum l1d_flush_type types, bool enable)
+void setup_rfi_flush(enum l1d_flush_type types, bool enable)
 {
 	if (types & L1D_FLUSH_FALLBACK) {
 		pr_info("rfi-flush: Using fallback displacement flush\n");
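The new guard turns init_fallback_flush() into an allocate-once routine, so setup_rfi_flush() can be re-run after migration. Below is a userspace model of the pattern. Here malloc() stands in for the boot-time memblock allocation, and the allocations counter is illustrative only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static void *l1d_flush_fallback_area;
static int allocations;		/* counts simulated boot-time allocations */

/* Userspace model of the guarded init_fallback_flush(). */
static void init_fallback_flush(size_t l1d_size)
{
	/* Only allocate the fallback flush area once (at boot time). */
	if (l1d_flush_fallback_area)
		return;

	/* malloc() stands in for the memblock allocator used at boot. */
	l1d_flush_fallback_area = malloc(l1d_size * 2);
	if (!l1d_flush_fallback_area)
		return;

	allocations++;
	memset(l1d_flush_fallback_area, 0, l1d_size * 2);
}
```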
From: Michael Ellerman mpe@ellerman.id.au
commit 84749a58b6e382f109abf1e734bc4dd43c2c25bb upstream.
This ensures the fallback flush area is always allocated on pseries. If an LPAR is migrated from a patched to an unpatched system, the fallback flush can then be enabled on the target system.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/platforms/pseries/setup.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)
--- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -508,26 +508,18 @@ static void pseries_setup_rfi_flush(void
/* Enable by default */ enable = true; + types = L1D_FLUSH_FALLBACK;
rc = plpar_get_cpu_characteristics(&result); if (rc == H_SUCCESS) { - types = L1D_FLUSH_NONE; - if (result.character & H_CPU_CHAR_L1D_FLUSH_TRIG2) types |= L1D_FLUSH_MTTRIG; if (result.character & H_CPU_CHAR_L1D_FLUSH_ORI30) types |= L1D_FLUSH_ORI;
- /* Use fallback if nothing set in hcall */ - if (types == L1D_FLUSH_NONE) - types = L1D_FLUSH_FALLBACK; - if ((!(result.behaviour & H_CPU_BEHAV_L1D_FLUSH_PR)) || (!(result.behaviour & H_CPU_BEHAV_FAVOUR_SECURITY))) enable = false; - } else { - /* Default to fallback if case hcall is not available */ - types = L1D_FLUSH_FALLBACK; }
setup_rfi_flush(types, enable);
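After this change pseries always includes the fallback type, then ORs in any faster flush the hypervisor advertises. The sketch below models that computation. The enum values come from upstream asm/setup.h; only L1D_FLUSH_MTTRIG = 0x8 is visible in this section, so the others are assumed. pseries_flush_types() is a hypothetical helper.

```c
#include <assert.h>

/* Values assumed from upstream asm/setup.h. */
enum l1d_flush_type {
	L1D_FLUSH_NONE		= 0x1,
	L1D_FLUSH_FALLBACK	= 0x2,
	L1D_FLUSH_ORI		= 0x4,
	L1D_FLUSH_MTTRIG	= 0x8,
};

#define H_CPU_CHAR_L1D_FLUSH_ORI30	(1ull << 61)	/* IBM bit 2 */
#define H_CPU_CHAR_L1D_FLUSH_TRIG2	(1ull << 60)	/* IBM bit 3 */

/*
 * FALLBACK is always present, so its area gets allocated; faster types
 * are added only when the hcall succeeded and reports them.
 */
static unsigned int pseries_flush_types(int hcall_ok, unsigned long long character)
{
	unsigned int types = L1D_FLUSH_FALLBACK;

	if (hcall_ok) {
		if (character & H_CPU_CHAR_L1D_FLUSH_TRIG2)
			types |= L1D_FLUSH_MTTRIG;
		if (character & H_CPU_CHAR_L1D_FLUSH_ORI30)
			types |= L1D_FLUSH_ORI;
	}
	return types;
}
```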
From: Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com
commit 0063d61ccfc011f379a31acaeba6de7c926fed2c upstream.
Currently the rfi-flush messages print 'Using <type> flush' for every type in enabled_flush_types, but that is not necessarily accurate. The fallback flush is now always enabled on pseries, yet the fixup function overwrites its nop/branch slot with another flush type when one is available.
So, replace the 'Using <type> flush' messages with '<type> flush is available'.
Also, print the patched flush types in the fixup function, so users can tell what is (or is not) being used: for example, the slower fallback flush, or no flush at all if flushing is disabled via the debugfs switch.
Suggested-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/kernel/setup_64.c | 6 +++---
 arch/powerpc/lib/feature-fixups.c | 9 ++++++++-
 2 files changed, 11 insertions(+), 4 deletions(-)
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -911,15 +911,15 @@ static void init_fallback_flush(void)
 void setup_rfi_flush(enum l1d_flush_type types, bool enable)
 {
 	if (types & L1D_FLUSH_FALLBACK) {
-		pr_info("rfi-flush: Using fallback displacement flush\n");
+		pr_info("rfi-flush: fallback displacement flush available\n");
 		init_fallback_flush();
 	}
 
 	if (types & L1D_FLUSH_ORI)
-		pr_info("rfi-flush: Using ori type flush\n");
+		pr_info("rfi-flush: ori type flush available\n");
 
 	if (types & L1D_FLUSH_MTTRIG)
-		pr_info("rfi-flush: Using mttrig type flush\n");
+		pr_info("rfi-flush: mttrig type flush available\n");
 
 	enabled_flush_types = types;
 
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -151,7 +151,14 @@ void do_rfi_flush_fixups(enum l1d_flush_
 		patch_instruction(dest + 2, instrs[2]);
 	}
 
-	printk(KERN_DEBUG "rfi-flush: patched %d locations\n", i);
+	printk(KERN_DEBUG "rfi-flush: patched %d locations (%s flush)\n", i,
+		(types == L1D_FLUSH_NONE) ? "no" :
+		(types == L1D_FLUSH_FALLBACK) ? "fallback displacement" :
+		(types & L1D_FLUSH_ORI) ? (types & L1D_FLUSH_MTTRIG)
+					? "ori+mttrig type"
+					: "ori type" :
+		(types & L1D_FLUSH_MTTRIG) ? "mttrig type"
+					: "unknown");
 }
 #endif /* CONFIG_PPC_BOOK3S_64 */
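The nested conditional in the new printk is easy to misread. Below, the same decision chain is written as plain if-statements. The enum values are assumed from upstream asm/setup.h (only L1D_FLUSH_MTTRIG = 0x8 appears in this section), and flush_desc() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Values assumed from upstream asm/setup.h. */
enum l1d_flush_type {
	L1D_FLUSH_NONE		= 0x1,
	L1D_FLUSH_FALLBACK	= 0x2,
	L1D_FLUSH_ORI		= 0x4,
	L1D_FLUSH_MTTRIG	= 0x8,
};

/* Same order of tests as the ternary chain in do_rfi_flush_fixups(). */
static const char *flush_desc(unsigned int types)
{
	if (types == L1D_FLUSH_NONE)
		return "no";
	if (types == L1D_FLUSH_FALLBACK)
		return "fallback displacement";
	if (types & L1D_FLUSH_ORI)
		return (types & L1D_FLUSH_MTTRIG) ? "ori+mttrig type" : "ori type";
	if (types & L1D_FLUSH_MTTRIG)
		return "mttrig type";
	return "unknown";
}
```

Pseries now always sets L1D_FLUSH_FALLBACK, so the fallback case is matched by equality rather than a bit test. As a result, a mask of FALLBACK | ORI is reported as "ori type", which is the flush that actually gets patched in.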
From: Michael Ellerman mpe@ellerman.id.au
commit c4bc36628d7f8b664657d8bd6ad1c44c177880b7 upstream.
Add some additional values which have been defined for the H_GET_CPU_CHARACTERISTICS hypercall.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/include/asm/hvcall.h | 3 +++
 1 file changed, 3 insertions(+)
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -292,6 +292,9 @@
 #define H_CPU_CHAR_L1D_FLUSH_ORI30	(1ull << 61) // IBM bit 2
 #define H_CPU_CHAR_L1D_FLUSH_TRIG2	(1ull << 60) // IBM bit 3
 #define H_CPU_CHAR_L1D_THREAD_PRIV	(1ull << 59) // IBM bit 4
+#define H_CPU_CHAR_BRANCH_HINTS_HONORED	(1ull << 58) // IBM bit 5
+#define H_CPU_CHAR_THREAD_RECONFIG_CTRL	(1ull << 57) // IBM bit 6
+#define H_CPU_CHAR_COUNT_CACHE_DISABLED	(1ull << 56) // IBM bit 7
 
 #define H_CPU_BEHAV_FAVOUR_SECURITY	(1ull << 63) // IBM bit 0
 #define H_CPU_BEHAV_L1D_FLUSH_PR	(1ull << 62) // IBM bit 1
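The "// IBM bit N" comments use IBM (MSB-0) bit numbering, in which bit 0 is the most significant bit of the 64-bit word. Here is a small sketch of the conversion. ibm_bit() is a hypothetical helper, not a kernel macro.

```c
#include <assert.h>

/* The three definitions added above. */
#define H_CPU_CHAR_BRANCH_HINTS_HONORED	(1ull << 58)	/* IBM bit 5 */
#define H_CPU_CHAR_THREAD_RECONFIG_CTRL	(1ull << 57)	/* IBM bit 6 */
#define H_CPU_CHAR_COUNT_CACHE_DISABLED	(1ull << 56)	/* IBM bit 7 */

/* IBM bit n of a 64-bit big-endian word is conventional bit 63 - n. */
static unsigned long long ibm_bit(unsigned int n)
{
	return 1ull << (63 - n);
}
```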
From: Michael Ellerman mpe@ellerman.id.au
commit 921bc6cf807ceb2ab8005319cf39f33494d6b100 upstream.
We might have migrated to a machine that uses a different flush type, or doesn't need flushing at all.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/platforms/pseries/mobility.c | 3 +++
 arch/powerpc/platforms/pseries/pseries.h | 2 ++
 arch/powerpc/platforms/pseries/setup.c | 2 +-
 3 files changed, 6 insertions(+), 1 deletion(-)
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -314,6 +314,9 @@ void post_mobility_fixup(void)
 		printk(KERN_ERR "Post-mobility device tree update "
 			"failed: %d\n", rc);
 
+	/* Possibly switch to a new RFI flush type */
+	pseries_setup_rfi_flush();
+
 	return;
 }
 
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -81,4 +81,6 @@ extern struct pci_controller_ops pseries
 
 unsigned long pseries_memory_block_size(void);
 
+void pseries_setup_rfi_flush(void);
+
 #endif /* _PSERIES_PSERIES_H */
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -499,7 +499,7 @@ static void __init find_and_init_phbs(vo
 	of_pci_check_probe_only();
 }
 
-static void pseries_setup_rfi_flush(void)
+void pseries_setup_rfi_flush(void)
 {
 	struct h_cpu_char_result result;
 	enum l1d_flush_type types;
From: Michael Ellerman mpe@ellerman.id.au
commit 9a868f634349e62922c226834aa23e3d1329ae7f upstream.
This commit adds security feature flags to reflect the settings we receive from firmware regarding Spectre/Meltdown mitigations.
The feature names reflect the names we are given by firmware on bare metal machines. See the hostboot source for details.
Arguably these could be firmware features, but that would require them to be read early in boot so they're available prior to asm feature patching, and we don't actually want to use them for patching. We may also want to update them dynamically in future, which would be incompatible with the way firmware features work (at the moment at least). So for now just make them separate flags.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/include/asm/security_features.h | 65 +++++++++++++++++++++++++++
 arch/powerpc/kernel/Makefile | 2
 arch/powerpc/kernel/security.c | 15 ++++++
 3 files changed, 81 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/security_features.h
 create mode 100644 arch/powerpc/kernel/security.c
--- /dev/null +++ b/arch/powerpc/include/asm/security_features.h @@ -0,0 +1,65 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +/* + * Security related feature bit definitions. + * + * Copyright 2018, Michael Ellerman, IBM Corporation. + */ + +#ifndef _ASM_POWERPC_SECURITY_FEATURES_H +#define _ASM_POWERPC_SECURITY_FEATURES_H + + +extern unsigned long powerpc_security_features; + +static inline void security_ftr_set(unsigned long feature) +{ + powerpc_security_features |= feature; +} + +static inline void security_ftr_clear(unsigned long feature) +{ + powerpc_security_features &= ~feature; +} + +static inline bool security_ftr_enabled(unsigned long feature) +{ + return !!(powerpc_security_features & feature); +} + + +// Features indicating support for Spectre/Meltdown mitigations + +// The L1-D cache can be flushed with ori r30,r30,0 +#define SEC_FTR_L1D_FLUSH_ORI30 0x0000000000000001ull + +// The L1-D cache can be flushed with mtspr 882,r0 (aka SPRN_TRIG2) +#define SEC_FTR_L1D_FLUSH_TRIG2 0x0000000000000002ull + +// ori r31,r31,0 acts as a speculation barrier +#define SEC_FTR_SPEC_BAR_ORI31 0x0000000000000004ull + +// Speculation past bctr is disabled +#define SEC_FTR_BCCTRL_SERIALISED 0x0000000000000008ull + +// Entries in L1-D are private to a SMT thread +#define SEC_FTR_L1D_THREAD_PRIV 0x0000000000000010ull + +// Indirect branch prediction cache disabled +#define SEC_FTR_COUNT_CACHE_DISABLED 0x0000000000000020ull + + +// Features indicating need for Spectre/Meltdown mitigations + +// The L1-D cache should be flushed on MSR[HV] 1->0 transition (hypervisor to guest) +#define SEC_FTR_L1D_FLUSH_HV 0x0000000000000040ull + +// The L1-D cache should be flushed on MSR[PR] 0->1 transition (kernel to userspace) +#define SEC_FTR_L1D_FLUSH_PR 0x0000000000000080ull + +// A speculation barrier should be used for bounds checks (Spectre variant 1) +#define SEC_FTR_BNDS_CHK_SPEC_BAR 0x0000000000000100ull + +// Firmware configuration indicates user favours security over performance 
+#define SEC_FTR_FAVOUR_SECURITY 0x0000000000000200ull + +#endif /* _ASM_POWERPC_SECURITY_FEATURES_H */ --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -40,7 +40,7 @@ obj-$(CONFIG_PPC64) += setup_64.o sys_p obj-$(CONFIG_VDSO32) += vdso32/ obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_ppc970.o cpu_setup_pa6t.o -obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_power.o +obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_power.o security.o obj-$(CONFIG_PPC_BOOK3S_64) += mce.o mce_power.o obj64-$(CONFIG_RELOCATABLE) += reloc_64.o obj-$(CONFIG_PPC_BOOK3E_64) += exceptions-64e.o idle_book3e.o --- /dev/null +++ b/arch/powerpc/kernel/security.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0+ +// +// Security related flags and so on. +// +// Copyright 2018, Michael Ellerman, IBM Corporation. + +#include <linux/kernel.h> +#include <asm/security_features.h> + + +unsigned long powerpc_security_features __read_mostly = \ + SEC_FTR_L1D_FLUSH_HV | \ + SEC_FTR_L1D_FLUSH_PR | \ + SEC_FTR_BNDS_CHK_SPEC_BAR | \ + SEC_FTR_FAVOUR_SECURITY;
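The new header is a global bitmask with three inline accessors, and security.c starts it with pessimistic defaults. Below is a userspace copy of the pattern, using a subset of the SEC_FTR_* values from the header.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of asm/security_features.h. */
#define SEC_FTR_L1D_FLUSH_ORI30		0x0000000000000001ull
#define SEC_FTR_L1D_FLUSH_HV		0x0000000000000040ull
#define SEC_FTR_L1D_FLUSH_PR		0x0000000000000080ull
#define SEC_FTR_BNDS_CHK_SPEC_BAR	0x0000000000000100ull
#define SEC_FTR_FAVOUR_SECURITY		0x0000000000000200ull

/* Pessimistic defaults, as in security.c: assume every mitigation is needed. */
static unsigned long powerpc_security_features =
	SEC_FTR_L1D_FLUSH_HV | SEC_FTR_L1D_FLUSH_PR |
	SEC_FTR_BNDS_CHK_SPEC_BAR | SEC_FTR_FAVOUR_SECURITY;

static inline void security_ftr_set(unsigned long feature)
{
	powerpc_security_features |= feature;
}

static inline void security_ftr_clear(unsigned long feature)
{
	powerpc_security_features &= ~feature;
}

static inline bool security_ftr_enabled(unsigned long feature)
{
	return !!(powerpc_security_features & feature);
}
```

Capability flags start clear and are set when firmware reports them. Requirement flags start set and are cleared only when firmware says they are unnecessary, so missing information errs toward mitigating.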
From: Michael Ellerman mpe@ellerman.id.au
commit f636c14790ead6cc22cf62279b1f8d7e11a67116 upstream.
Now that we have feature flags for security related things, set or clear them based on what we receive from the hypercall.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/platforms/pseries/setup.c | 43 +++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
--- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -67,6 +67,7 @@ #include <asm/eeh.h> #include <asm/reg.h> #include <asm/plpar_wrappers.h> +#include <asm/security_features.h>
#include "pseries.h"
@@ -499,6 +500,40 @@ static void __init find_and_init_phbs(vo of_pci_check_probe_only(); }
+static void init_cpu_char_feature_flags(struct h_cpu_char_result *result) +{ + if (result->character & H_CPU_CHAR_SPEC_BAR_ORI31) + security_ftr_set(SEC_FTR_SPEC_BAR_ORI31); + + if (result->character & H_CPU_CHAR_BCCTRL_SERIALISED) + security_ftr_set(SEC_FTR_BCCTRL_SERIALISED); + + if (result->character & H_CPU_CHAR_L1D_FLUSH_ORI30) + security_ftr_set(SEC_FTR_L1D_FLUSH_ORI30); + + if (result->character & H_CPU_CHAR_L1D_FLUSH_TRIG2) + security_ftr_set(SEC_FTR_L1D_FLUSH_TRIG2); + + if (result->character & H_CPU_CHAR_L1D_THREAD_PRIV) + security_ftr_set(SEC_FTR_L1D_THREAD_PRIV); + + if (result->character & H_CPU_CHAR_COUNT_CACHE_DISABLED) + security_ftr_set(SEC_FTR_COUNT_CACHE_DISABLED); + + /* + * The features below are enabled by default, so we instead look to see + * if firmware has *disabled* them, and clear them if so. + */ + if (!(result->character & H_CPU_BEHAV_FAVOUR_SECURITY)) + security_ftr_clear(SEC_FTR_FAVOUR_SECURITY); + + if (!(result->character & H_CPU_BEHAV_L1D_FLUSH_PR)) + security_ftr_clear(SEC_FTR_L1D_FLUSH_PR); + + if (!(result->character & H_CPU_BEHAV_BNDS_CHK_SPEC_BAR)) + security_ftr_clear(SEC_FTR_BNDS_CHK_SPEC_BAR); +} + void pseries_setup_rfi_flush(void) { struct h_cpu_char_result result; @@ -512,6 +547,8 @@ void pseries_setup_rfi_flush(void)
rc = plpar_get_cpu_characteristics(&result); if (rc == H_SUCCESS) { + init_cpu_char_feature_flags(&result); + if (result.character & H_CPU_CHAR_L1D_FLUSH_TRIG2) types |= L1D_FLUSH_MTTRIG; if (result.character & H_CPU_CHAR_L1D_FLUSH_ORI30) @@ -522,6 +559,12 @@ void pseries_setup_rfi_flush(void) enable = false; }
+ /* + * We're the guest so this doesn't apply to us, clear it to simplify + * handling of it elsewhere. + */ + security_ftr_clear(SEC_FTR_L1D_FLUSH_HV); + setup_rfi_flush(types, enable); }
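init_cpu_char_feature_flags() follows two rules:

- "Can do X" bits are set only when the hypervisor reports them.
- "Needs X" bits default on and are cleared only when firmware says so.

Below is a condensed userspace sketch. It reads the H_CPU_BEHAV_* bits from the behaviour word, the field their name points to; the hunk above reads them from character, which a later patch in this series corrects. struct cpu_char_result and feature_flags() are hypothetical, and only a subset of the flags is modelled.

```c
#include <assert.h>

/* Subset of asm/hvcall.h bits shown in this series. */
#define H_CPU_CHAR_L1D_FLUSH_ORI30	(1ull << 61)
#define H_CPU_CHAR_L1D_FLUSH_TRIG2	(1ull << 60)
#define H_CPU_CHAR_L1D_THREAD_PRIV	(1ull << 59)
#define H_CPU_CHAR_COUNT_CACHE_DISABLED	(1ull << 56)
#define H_CPU_BEHAV_FAVOUR_SECURITY	(1ull << 63)
#define H_CPU_BEHAV_L1D_FLUSH_PR	(1ull << 62)

/* Subset of asm/security_features.h. */
#define SEC_FTR_L1D_FLUSH_ORI30		0x0000000000000001ull
#define SEC_FTR_L1D_FLUSH_TRIG2		0x0000000000000002ull
#define SEC_FTR_L1D_THREAD_PRIV		0x0000000000000010ull
#define SEC_FTR_COUNT_CACHE_DISABLED	0x0000000000000020ull
#define SEC_FTR_L1D_FLUSH_HV		0x0000000000000040ull
#define SEC_FTR_L1D_FLUSH_PR		0x0000000000000080ull
#define SEC_FTR_FAVOUR_SECURITY		0x0000000000000200ull

/* Hypothetical mirror of struct h_cpu_char_result. */
struct cpu_char_result {
	unsigned long long character;
	unsigned long long behaviour;
};

/* Returns the feature word a pseries guest would end up with. */
static unsigned long long feature_flags(const struct cpu_char_result *r)
{
	/* Pessimistic defaults (subset). */
	unsigned long long f = SEC_FTR_L1D_FLUSH_HV | SEC_FTR_L1D_FLUSH_PR |
			       SEC_FTR_FAVOUR_SECURITY;

	/* Capability bits: set only when the hypervisor advertises them. */
	if (r->character & H_CPU_CHAR_L1D_FLUSH_ORI30)
		f |= SEC_FTR_L1D_FLUSH_ORI30;
	if (r->character & H_CPU_CHAR_L1D_FLUSH_TRIG2)
		f |= SEC_FTR_L1D_FLUSH_TRIG2;
	if (r->character & H_CPU_CHAR_L1D_THREAD_PRIV)
		f |= SEC_FTR_L1D_THREAD_PRIV;
	if (r->character & H_CPU_CHAR_COUNT_CACHE_DISABLED)
		f |= SEC_FTR_COUNT_CACHE_DISABLED;

	/* Requirement bits: default on, cleared only if firmware says so. */
	if (!(r->behaviour & H_CPU_BEHAV_FAVOUR_SECURITY))
		f &= ~SEC_FTR_FAVOUR_SECURITY;
	if (!(r->behaviour & H_CPU_BEHAV_L1D_FLUSH_PR))
		f &= ~SEC_FTR_L1D_FLUSH_PR;

	/* We're the guest, so the HV-to-guest flush never applies. */
	f &= ~SEC_FTR_L1D_FLUSH_HV;

	return f;
}
```

Take the QEMU values quoted at the end of this section: character 0xf000000000000000 and behaviour 0. With those, only the two flush-capability flags survive.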
From: Michael Ellerman mpe@ellerman.id.au
commit 77addf6e95c8689e478d607176b399a6242a777e upstream.
Now that we have feature flags for security related things, set or clear them based on what we see in the device tree provided by firmware.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/platforms/powernv/setup.c | 56 +++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)
--- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -37,9 +37,63 @@ #include <asm/smp.h> #include <asm/tm.h> #include <asm/setup.h> +#include <asm/security_features.h>
#include "powernv.h"
+ +static bool fw_feature_is(const char *state, const char *name, + struct device_node *fw_features) +{ + struct device_node *np; + bool rc = false; + + np = of_get_child_by_name(fw_features, name); + if (np) { + rc = of_property_read_bool(np, state); + of_node_put(np); + } + + return rc; +} + +static void init_fw_feat_flags(struct device_node *np) +{ + if (fw_feature_is("enabled", "inst-spec-barrier-ori31,31,0", np)) + security_ftr_set(SEC_FTR_SPEC_BAR_ORI31); + + if (fw_feature_is("enabled", "fw-bcctrl-serialized", np)) + security_ftr_set(SEC_FTR_BCCTRL_SERIALISED); + + if (fw_feature_is("enabled", "inst-spec-barrier-ori31,31,0", np)) + security_ftr_set(SEC_FTR_L1D_FLUSH_ORI30); + + if (fw_feature_is("enabled", "inst-l1d-flush-trig2", np)) + security_ftr_set(SEC_FTR_L1D_FLUSH_TRIG2); + + if (fw_feature_is("enabled", "fw-l1d-thread-split", np)) + security_ftr_set(SEC_FTR_L1D_THREAD_PRIV); + + if (fw_feature_is("enabled", "fw-count-cache-disabled", np)) + security_ftr_set(SEC_FTR_COUNT_CACHE_DISABLED); + + /* + * The features below are enabled by default, so we instead look to see + * if firmware has *disabled* them, and clear them if so. + */ + if (fw_feature_is("disabled", "speculation-policy-favor-security", np)) + security_ftr_clear(SEC_FTR_FAVOUR_SECURITY); + + if (fw_feature_is("disabled", "needs-l1d-flush-msr-pr-0-to-1", np)) + security_ftr_clear(SEC_FTR_L1D_FLUSH_PR); + + if (fw_feature_is("disabled", "needs-l1d-flush-msr-hv-1-to-0", np)) + security_ftr_clear(SEC_FTR_L1D_FLUSH_HV); + + if (fw_feature_is("disabled", "needs-spec-barrier-for-bound-checks", np)) + security_ftr_clear(SEC_FTR_BNDS_CHK_SPEC_BAR); +} + static void pnv_setup_rfi_flush(void) { struct device_node *np, *fw_features; @@ -55,6 +109,8 @@ static void pnv_setup_rfi_flush(void) of_node_put(np);
if (fw_features) { + init_fw_feat_flags(fw_features); + np = of_get_child_by_name(fw_features, "inst-l1d-flush-trig2"); if (np && of_property_read_bool(np, "enabled")) type = L1D_FLUSH_MTTRIG;
From: Michael Ellerman mpe@ellerman.id.au
commit 8ad33041563a10b34988800c682ada14b2612533 upstream.
This landed in setup_64.c for no good reason other than we had nowhere else to put it. Now that we have a security-related file, that is a better place for it, so move it.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/kernel/security.c | 11 +++++++++++
 arch/powerpc/kernel/setup_64.c | 8 --------
 2 files changed, 11 insertions(+), 8 deletions(-)
--- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -5,6 +5,8 @@ // Copyright 2018, Michael Ellerman, IBM Corporation.
#include <linux/kernel.h> +#include <linux/device.h> + #include <asm/security_features.h>
@@ -13,3 +15,12 @@ unsigned long powerpc_security_features SEC_FTR_L1D_FLUSH_PR | \ SEC_FTR_BNDS_CHK_SPEC_BAR | \ SEC_FTR_FAVOUR_SECURITY; + + +ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) +{ + if (rfi_flush) + return sprintf(buf, "Mitigation: RFI Flush\n"); + + return sprintf(buf, "Vulnerable\n"); +} --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -961,12 +961,4 @@ static __init int rfi_flush_debugfs_init } device_initcall(rfi_flush_debugfs_init); #endif - -ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) -{ - if (rfi_flush) - return sprintf(buf, "Mitigation: RFI Flush\n"); - - return sprintf(buf, "Vulnerable\n"); -} #endif /* CONFIG_PPC_BOOK3S_64 */
From: Michael Ellerman mpe@ellerman.id.au
commit ff348355e9c72493947be337bb4fae4fc1a41eba upstream.
Now that we have the security feature flags we can make the information displayed in the "meltdown" file more informative.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/include/asm/security_features.h | 1
 arch/powerpc/kernel/security.c | 30 +++++++++++++++++++++++++--
 2 files changed, 29 insertions(+), 2 deletions(-)
--- a/arch/powerpc/include/asm/security_features.h +++ b/arch/powerpc/include/asm/security_features.h @@ -10,6 +10,7 @@
extern unsigned long powerpc_security_features; +extern bool rfi_flush;
static inline void security_ftr_set(unsigned long feature) { --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -6,6 +6,7 @@
#include <linux/kernel.h> #include <linux/device.h> +#include <linux/seq_buf.h>
#include <asm/security_features.h>
@@ -19,8 +20,33 @@ unsigned long powerpc_security_features
ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) { - if (rfi_flush) - return sprintf(buf, "Mitigation: RFI Flush\n"); + bool thread_priv; + + thread_priv = security_ftr_enabled(SEC_FTR_L1D_THREAD_PRIV); + + if (rfi_flush || thread_priv) { + struct seq_buf s; + seq_buf_init(&s, buf, PAGE_SIZE - 1); + + seq_buf_printf(&s, "Mitigation: "); + + if (rfi_flush) + seq_buf_printf(&s, "RFI Flush"); + + if (rfi_flush && thread_priv) + seq_buf_printf(&s, ", "); + + if (thread_priv) + seq_buf_printf(&s, "L1D private per thread"); + + seq_buf_printf(&s, "\n"); + + return s.len; + } + + if (!security_ftr_enabled(SEC_FTR_L1D_FLUSH_HV) && + !security_ftr_enabled(SEC_FTR_L1D_FLUSH_PR)) + return sprintf(buf, "Not affected\n");
return sprintf(buf, "Vulnerable\n"); }
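The reworked meltdown report is a small decision tree. Below is a userspace model in which snprintf() takes the place of seq_buf. show_meltdown() and its boolean parameters are hypothetical stand-ins for rfi_flush and the security feature checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Models cpu_show_meltdown(): thread_priv stands for SEC_FTR_L1D_THREAD_PRIV,
 * needs_hv / needs_pr for SEC_FTR_L1D_FLUSH_HV / SEC_FTR_L1D_FLUSH_PR.
 */
static int show_meltdown(char *buf, size_t size, bool rfi_flush,
			 bool thread_priv, bool needs_hv, bool needs_pr)
{
	if (rfi_flush || thread_priv)
		return snprintf(buf, size, "Mitigation: %s%s%s\n",
				rfi_flush ? "RFI Flush" : "",
				(rfi_flush && thread_priv) ? ", " : "",
				thread_priv ? "L1D private per thread" : "");

	if (!needs_hv && !needs_pr)
		return snprintf(buf, size, "Not affected\n");

	return snprintf(buf, size, "Vulnerable\n");
}
```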
From: Michael Ellerman mpe@ellerman.id.au
commit 37c0bdd00d3ae83369ab60a6712c28e11e6458d5 upstream.
Now that we have the security flags we can significantly simplify the code in pnv_setup_rfi_flush(), because we can use the flags instead of checking device tree properties and because the security flags have pessimistic defaults.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/platforms/powernv/setup.c | 41 ++++++++-------------------------
 1 file changed, 10 insertions(+), 31 deletions(-)
--- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -65,7 +65,7 @@ static void init_fw_feat_flags(struct de if (fw_feature_is("enabled", "fw-bcctrl-serialized", np)) security_ftr_set(SEC_FTR_BCCTRL_SERIALISED);
- if (fw_feature_is("enabled", "inst-spec-barrier-ori31,31,0", np)) + if (fw_feature_is("enabled", "inst-l1d-flush-ori30,30,0", np)) security_ftr_set(SEC_FTR_L1D_FLUSH_ORI30);
if (fw_feature_is("enabled", "inst-l1d-flush-trig2", np)) @@ -98,11 +98,10 @@ static void pnv_setup_rfi_flush(void) { struct device_node *np, *fw_features; enum l1d_flush_type type; - int enable; + bool enable;
/* Default to fallback in case fw-features are not available */ type = L1D_FLUSH_FALLBACK; - enable = 1;
np = of_find_node_by_name(NULL, "ibm,opal"); fw_features = of_get_child_by_name(np, "fw-features"); @@ -110,40 +109,20 @@ static void pnv_setup_rfi_flush(void)
if (fw_features) { init_fw_feat_flags(fw_features); + of_node_put(fw_features);
- np = of_get_child_by_name(fw_features, "inst-l1d-flush-trig2"); - if (np && of_property_read_bool(np, "enabled")) + if (security_ftr_enabled(SEC_FTR_L1D_FLUSH_TRIG2)) type = L1D_FLUSH_MTTRIG;
- of_node_put(np); - - np = of_get_child_by_name(fw_features, "inst-l1d-flush-ori30,30,0"); - if (np && of_property_read_bool(np, "enabled")) + if (security_ftr_enabled(SEC_FTR_L1D_FLUSH_ORI30)) type = L1D_FLUSH_ORI; - - of_node_put(np); - - /* Enable unless firmware says NOT to */ - enable = 2; - np = of_get_child_by_name(fw_features, "needs-l1d-flush-msr-hv-1-to-0"); - if (np && of_property_read_bool(np, "disabled")) - enable--; - - of_node_put(np); - - np = of_get_child_by_name(fw_features, "needs-l1d-flush-msr-pr-0-to-1"); - if (np && of_property_read_bool(np, "disabled")) - enable--; - - np = of_get_child_by_name(fw_features, "speculation-policy-favor-security"); - if (np && of_property_read_bool(np, "disabled")) - enable = 0; - - of_node_put(np); - of_node_put(fw_features); }
- setup_rfi_flush(type, enable > 0); + enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && \ + (security_ftr_enabled(SEC_FTR_L1D_FLUSH_PR) || \ + security_ftr_enabled(SEC_FTR_L1D_FLUSH_HV)); + + setup_rfi_flush(type, enable); }
static void __init pnv_setup_arch(void)
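The simplified powernv code should make the same decision as the counter it replaces. That counter starts at 2, decrements for each "needs flush" property that firmware marks disabled, and is forced to 0 if favour-security is disabled. The sketch below checks that both forms agree on all eight combinations of those three properties. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Old pnv_setup_rfi_flush() logic, driven by "disabled" properties. */
static bool old_enable(bool hv_disabled, bool pr_disabled, bool favour_disabled)
{
	int enable = 2;

	if (hv_disabled)
		enable--;
	if (pr_disabled)
		enable--;
	if (favour_disabled)
		enable = 0;

	return enable > 0;
}

/* New logic: the same decision expressed with security feature flags. */
static bool new_enable(bool hv_disabled, bool pr_disabled, bool favour_disabled)
{
	bool favour = !favour_disabled;
	bool flush_pr = !pr_disabled;
	bool flush_hv = !hv_disabled;

	return favour && (flush_pr || flush_hv);
}

/* Returns true when both versions agree on every input combination. */
static bool versions_agree(void)
{
	for (int m = 0; m < 8; m++) {
		bool hv = m & 1, pr = m & 2, fav = m & 4;

		if (old_enable(hv, pr, fav) != new_enable(hv, pr, fav))
			return false;
	}
	return true;
}
```

Both forms reduce to "favour security, and at least one flush still needed". When fw-features is absent, the old code kept enable = 1 and the new code relies on the pessimistic defaults, which also yields true.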
From: Michael Ellerman mpe@ellerman.id.au
commit 2e4a16161fcd324b1f9bf6cb6856529f7eaf0689 upstream.
Now that we have the security flags we can simplify the code in pseries_setup_rfi_flush() because the security flags have pessimistic defaults.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/platforms/pseries/setup.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)
--- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -541,30 +541,27 @@ void pseries_setup_rfi_flush(void) bool enable; long rc;
- /* Enable by default */ - enable = true; - types = L1D_FLUSH_FALLBACK; - rc = plpar_get_cpu_characteristics(&result); - if (rc == H_SUCCESS) { + if (rc == H_SUCCESS) init_cpu_char_feature_flags(&result);
- if (result.character & H_CPU_CHAR_L1D_FLUSH_TRIG2) - types |= L1D_FLUSH_MTTRIG; - if (result.character & H_CPU_CHAR_L1D_FLUSH_ORI30) - types |= L1D_FLUSH_ORI; - - if ((!(result.behaviour & H_CPU_BEHAV_L1D_FLUSH_PR)) || - (!(result.behaviour & H_CPU_BEHAV_FAVOUR_SECURITY))) - enable = false; - } - /* * We're the guest so this doesn't apply to us, clear it to simplify * handling of it elsewhere. */ security_ftr_clear(SEC_FTR_L1D_FLUSH_HV);
+ types = L1D_FLUSH_FALLBACK; + + if (security_ftr_enabled(SEC_FTR_L1D_FLUSH_TRIG2)) + types |= L1D_FLUSH_MTTRIG; + + if (security_ftr_enabled(SEC_FTR_L1D_FLUSH_ORI30)) + types |= L1D_FLUSH_ORI; + + enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && \ + security_ftr_enabled(SEC_FTR_L1D_FLUSH_PR); + setup_rfi_flush(types, enable); }
From: Michael Ellerman mpe@ellerman.id.au
commit 56986016cb8cd9050e601831fe89f332b4e3c46e upstream.
Add a definition for cpu_show_spectre_v1() to override the generic version. Currently this just prints "Not affected" or "Vulnerable" based on the firmware flag.
Although the kernel does have array_index_nospec() in a few places, we haven't yet audited all the powerpc code to see where it's necessary, so for now we don't list that as a mitigation.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/kernel/security.c | 8 ++++++++
 1 file changed, 8 insertions(+)
--- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -50,3 +50,11 @@ ssize_t cpu_show_meltdown(struct device
return sprintf(buf, "Vulnerable\n"); } + +ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, char *buf) +{ + if (!security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR)) + return sprintf(buf, "Not affected\n"); + + return sprintf(buf, "Vulnerable\n"); +}
From: Michael Ellerman mpe@ellerman.id.au
commit d6fbe1c55c55c6937cbea3531af7da84ab7473c3 upstream.
Add a definition for cpu_show_spectre_v2() to override the generic version. This has several permutations; although some may not occur in practice, we cater for any combination.
The most verbose is:
Mitigation: Indirect branch serialisation (kernel only), Indirect branch cache disabled, ori31 speculation barrier enabled
We don't treat the ori31 speculation barrier as a mitigation on its own, because it has to be *used* by code in order to be a mitigation and we don't know if userspace is doing that. So if that's all we see we say:
Vulnerable, ori31 speculation barrier enabled
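The combinations above can be reproduced outside the kernel to check the wording. This hedged userspace sketch mirrors the branch structure of cpu_show_spectre_v2(). It writes with snprintf() into a plain buffer instead of using the kernel's seq_buf. spectre_v2_status() and append() are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append s to the NUL-terminated string in buf, truncating to fit len bytes. */
static void append(char *buf, size_t len, const char *s)
{
    size_t used = strlen(buf);

    if (used + 1 < len)
        snprintf(buf + used, len - used, "%s", s);
}

/*
 * bcs: indirect branch serialisation, ccd: count cache disabled,
 * ori: ori31 speculation barrier enabled.
 * Same branch order as cpu_show_spectre_v2(); returns buf.
 */
static const char *spectre_v2_status(char *buf, size_t len,
                                     bool bcs, bool ccd, bool ori)
{
    assert(len > 0);
    buf[0] = '\0';

    if (bcs || ccd) {
        append(buf, len, "Mitigation: ");
        if (bcs)
            append(buf, len, "Indirect branch serialisation (kernel only)");
        if (bcs && ccd)
            append(buf, len, ", ");
        if (ccd)
            append(buf, len, "Indirect branch cache disabled");
    } else {
        append(buf, len, "Vulnerable");
    }

    /* The ori31 barrier is reported, but never counted as a mitigation. */
    if (ori)
        append(buf, len, ", ori31 speculation barrier enabled");

    append(buf, len, "\n");
    return buf;
}
```

Calling it with all three flags true yields the most verbose line quoted above. Calling it with only ori true yields "Vulnerable, ori31 speculation barrier enabled".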
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/security.c | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+)
--- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -58,3 +58,36 @@ ssize_t cpu_show_spectre_v1(struct devic
return sprintf(buf, "Vulnerable\n"); } + +ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, char *buf) +{ + bool bcs, ccd, ori; + struct seq_buf s; + + seq_buf_init(&s, buf, PAGE_SIZE - 1); + + bcs = security_ftr_enabled(SEC_FTR_BCCTRL_SERIALISED); + ccd = security_ftr_enabled(SEC_FTR_COUNT_CACHE_DISABLED); + ori = security_ftr_enabled(SEC_FTR_SPEC_BAR_ORI31); + + if (bcs || ccd) { + seq_buf_printf(&s, "Mitigation: "); + + if (bcs) + seq_buf_printf(&s, "Indirect branch serialisation (kernel only)"); + + if (bcs && ccd) + seq_buf_printf(&s, ", "); + + if (ccd) + seq_buf_printf(&s, "Indirect branch cache disabled"); + } else + seq_buf_printf(&s, "Vulnerable"); + + if (ori) + seq_buf_printf(&s, ", ori31 speculation barrier enabled"); + + seq_buf_printf(&s, "\n"); + + return s.len; +}
From: Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com
commit 0f9bdfe3c77091e8704d2e510eb7c2c2c6cde524 upstream.
The H_CPU_BEHAV_* flags should be checked for in the 'behaviour' field of 'struct h_cpu_char_result' -- 'character' is for H_CPU_CHAR_* flags.
Found by playing around with QEMU's implementation of the hypercall:
H_CPU_CHAR=0xf000000000000000 H_CPU_BEHAV=0x0000000000000000
This clears H_CPU_BEHAV_FAVOUR_SECURITY and H_CPU_BEHAV_L1D_FLUSH_PR, so pseries_setup_rfi_flush() disables 'rfi_flush'; it also clears the H_CPU_CHAR_L1D_THREAD_PRIV flag. So there is no RFI flush mitigation at all for cpu_show_meltdown() to report, but currently it reports one:
Original kernel:
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: RFI Flush
Patched kernel:
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Not affected
H_CPU_CHAR=0x0000000000000000 H_CPU_BEHAV=0xf000000000000000
This sets H_CPU_BEHAV_BNDS_CHK_SPEC_BAR, so cpu_show_spectre_v1() should report vulnerable, but currently it doesn't:
Original kernel:
# cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
Not affected
Patched kernel:
# cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
Vulnerable
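Replaying the two QEMU settings against the old and new checks shows why reading the wrong field flips both results. This sketch rests on stated assumptions:

- The H_CPU_BEHAV_* values assume IBM bit numbering (bit 0 is the most significant bit), as in the kernel's hvcall.h.
- The SEC_FTR_* values other than SEC_FTR_FAVOUR_SECURITY are illustrative.
- struct cpu_char_result, flags_before_fix() and flags_after_fix() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit word. */
#define PPC_BIT(n) (1ULL << (63 - (n)))

/* Assumed to mirror hvcall.h; check the real header before relying on them. */
#define H_CPU_BEHAV_FAVOUR_SECURITY   PPC_BIT(0)
#define H_CPU_BEHAV_L1D_FLUSH_PR      PPC_BIT(1)
#define H_CPU_BEHAV_BNDS_CHK_SPEC_BAR PPC_BIT(2)

/* Illustrative feature bits; only FAVOUR_SECURITY's value appears in the patch. */
#define SEC_FTR_L1D_FLUSH_PR      0x80ULL
#define SEC_FTR_BNDS_CHK_SPEC_BAR 0x100ULL
#define SEC_FTR_FAVOUR_SECURITY   0x200ULL

struct cpu_char_result {          /* hypothetical stand-in for h_cpu_char_result */
    uint64_t character;
    uint64_t behaviour;
};

/* Features start enabled; clear each one whose firmware bit is unset in word. */
static uint64_t clear_if_unset(uint64_t features, uint64_t word)
{
    if (!(word & H_CPU_BEHAV_FAVOUR_SECURITY))
        features &= ~SEC_FTR_FAVOUR_SECURITY;
    if (!(word & H_CPU_BEHAV_L1D_FLUSH_PR))
        features &= ~SEC_FTR_L1D_FLUSH_PR;
    if (!(word & H_CPU_BEHAV_BNDS_CHK_SPEC_BAR))
        features &= ~SEC_FTR_BNDS_CHK_SPEC_BAR;
    return features;
}

/* Before the fix: H_CPU_BEHAV_* bits were tested against ->character. */
static uint64_t flags_before_fix(uint64_t f, const struct cpu_char_result *r)
{
    return clear_if_unset(f, r->character);
}

/* After the fix: they are tested against ->behaviour. */
static uint64_t flags_after_fix(uint64_t f, const struct cpu_char_result *r)
{
    return clear_if_unset(f, r->behaviour);
}
```

With H_CPU_CHAR=0xf000000000000000 and H_CPU_BEHAV=0, the old check keeps every feature set, which is why "Mitigation: RFI Flush" was shown. The fixed check clears them all.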
Brown-paper-bag-by: Michael Ellerman mpe@ellerman.id.au Fixes: f636c14790ea ("powerpc/pseries: Set or clear security feature flags") Signed-off-by: Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/platforms/pseries/setup.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -524,13 +524,13 @@ static void init_cpu_char_feature_flags( * The features below are enabled by default, so we instead look to see * if firmware has *disabled* them, and clear them if so. */ - if (!(result->character & H_CPU_BEHAV_FAVOUR_SECURITY)) + if (!(result->behaviour & H_CPU_BEHAV_FAVOUR_SECURITY)) security_ftr_clear(SEC_FTR_FAVOUR_SECURITY);
- if (!(result->character & H_CPU_BEHAV_L1D_FLUSH_PR)) + if (!(result->behaviour & H_CPU_BEHAV_L1D_FLUSH_PR)) security_ftr_clear(SEC_FTR_L1D_FLUSH_PR);
- if (!(result->character & H_CPU_BEHAV_BNDS_CHK_SPEC_BAR)) + if (!(result->behaviour & H_CPU_BEHAV_BNDS_CHK_SPEC_BAR)) security_ftr_clear(SEC_FTR_BNDS_CHK_SPEC_BAR); }
From: Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com
commit e7347a86830f38dc3e40c8f7e28c04412b12a2e7 upstream.
This moves the definition of the default security feature flags (i.e., enabled by default) closer to the security feature flags.
This can be used to restore current flags to the default flags.
Signed-off-by: Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/security_features.h | 8 ++++++++ arch/powerpc/kernel/security.c | 7 +------ 2 files changed, 9 insertions(+), 6 deletions(-)
--- a/arch/powerpc/include/asm/security_features.h +++ b/arch/powerpc/include/asm/security_features.h @@ -63,4 +63,12 @@ static inline bool security_ftr_enabled( // Firmware configuration indicates user favours security over performance #define SEC_FTR_FAVOUR_SECURITY 0x0000000000000200ull
+ +// Features enabled by default +#define SEC_FTR_DEFAULT \ + (SEC_FTR_L1D_FLUSH_HV | \ + SEC_FTR_L1D_FLUSH_PR | \ + SEC_FTR_BNDS_CHK_SPEC_BAR | \ + SEC_FTR_FAVOUR_SECURITY) + #endif /* _ASM_POWERPC_SECURITY_FEATURES_H */ --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -11,12 +11,7 @@ #include <asm/security_features.h>
-unsigned long powerpc_security_features __read_mostly = \ - SEC_FTR_L1D_FLUSH_HV | \ - SEC_FTR_L1D_FLUSH_PR | \ - SEC_FTR_BNDS_CHK_SPEC_BAR | \ - SEC_FTR_FAVOUR_SECURITY; - +unsigned long powerpc_security_features __read_mostly = SEC_FTR_DEFAULT;
ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) {
From: Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com
commit 6232774f1599028a15418179d17f7df47ede770a upstream.
After migration the security feature flags might have changed (e.g., a destination system with unpatched firmware), but some flags are not set/cleared again in init_cpu_char_feature_flags() because it assumes the security flags to be the defaults.
Additionally, if the H_GET_CPU_CHARACTERISTICS hypercall fails, init_cpu_char_feature_flags() does not run again, which might leave the system in an insecure or sub-optimal configuration.
So, just restore the security feature flags to the defaults assumed by init_cpu_char_feature_flags() so that it can set/clear them correctly, and to ensure safe settings are in place in case the hypercall fails.
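A toy model shows the effect of resetting before re-reading firmware. This sketch assumes a simplified "clear what firmware disabled" rule, and the FTR_* bits and setup_features() are hypothetical. Without the reset, a flag cleared on the source system is never restored on the destination. A failed hypercall then leaves whatever stale state was there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, enabled by default (like SEC_FTR_DEFAULT). */
#define FTR_L1D_FLUSH_PR    0x1ULL
#define FTR_FAVOUR_SECURITY 0x2ULL
#define FTR_DEFAULT (FTR_L1D_FLUSH_PR | FTR_FAVOUR_SECURITY)

static uint64_t features = FTR_DEFAULT;

/*
 * fw_ok:       whether the (simulated) hypercall succeeded
 * fw_disabled: bits firmware reports as disabled
 * reset_first: restore defaults first, as this patch does
 */
static void setup_features(bool fw_ok, uint64_t fw_disabled, bool reset_first)
{
    if (reset_first)
        features = FTR_DEFAULT;
    if (fw_ok)
        features &= ~fw_disabled;   /* only ever clears bits */
}
```

Because the update rule can only clear bits, the reset is the only path by which a previously cleared default comes back.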
Fixes: f636c14790ea ("powerpc/pseries: Set or clear security feature flags") Depends-on: 19887d6a28e2 ("powerpc: Move default security feature flags") Signed-off-by: Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/platforms/pseries/setup.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -502,6 +502,10 @@ static void __init find_and_init_phbs(vo
static void init_cpu_char_feature_flags(struct h_cpu_char_result *result) { + /* + * The features below are disabled by default, so we instead look to see + * if firmware has *enabled* them, and set them if so. + */ if (result->character & H_CPU_CHAR_SPEC_BAR_ORI31) security_ftr_set(SEC_FTR_SPEC_BAR_ORI31);
@@ -541,6 +545,13 @@ void pseries_setup_rfi_flush(void) bool enable; long rc;
+ /* + * Set features to the defaults assumed by init_cpu_char_feature_flags() + * so it can set/clear again any features that might have changed after + * migration, and in case the hypercall fails and it is not even called. + */ + powerpc_security_features = SEC_FTR_DEFAULT; + rc = plpar_get_cpu_characteristics(&result); if (rc == H_SUCCESS) init_cpu_char_feature_flags(&result);
From: Michael Ellerman mpe@ellerman.id.au
commit 501a78cbc17c329fabf8e9750a1e9ab810c88a0e upstream.
The recent LPM changes to setup_rfi_flush() are causing some section mismatch warnings because we removed the __init annotation on setup_rfi_flush():
  The function setup_rfi_flush() references
  the function __init ppc64_bolted_size().
  the function __init memblock_alloc_base().
The references are actually in init_fallback_flush(), but that is inlined into setup_rfi_flush().
These references are safe because:
 - only pseries calls setup_rfi_flush() at runtime
 - pseries always passes L1D_FLUSH_FALLBACK at boot
 - so the fallback flush area will always be allocated
 - so the check in init_fallback_flush() will always return early:

     /* Only allocate the fallback flush area once (at boot time). */
     if (l1d_flush_fallback_area)
         return;

 - and therefore we won't actually call the freed init routines.
We should rework the code to make it safer by default rather than relying on the above, but for now as a quick-fix just add a __ref annotation to squash the warning.
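The argument rests on the early-return guard in init_fallback_flush(). Below is a minimal userspace sketch of that pattern, with a counter standing in for the __init-only allocator. fallback_area, boot_only_alloc() and init_fallback_area() are hypothetical names. Note that nothing here is enforced by the language. The kernel's build-time section mismatch check only sees that the reference exists, which is why __ref is needed to silence it.

```c
#include <assert.h>
#include <stdlib.h>

static void *fallback_area;       /* stands in for l1d_flush_fallback_area */
static int boot_allocator_calls;  /* how often the "init-only" allocator ran */

/* Stands in for an allocator that is only valid during boot. */
static void *boot_only_alloc(size_t size)
{
    boot_allocator_calls++;
    return calloc(1, size);
}

static void init_fallback_area(size_t size)
{
    /* Only allocate the fallback area once (at boot time). */
    if (fallback_area)
        return;
    fallback_area = boot_only_alloc(size);
}
```

After the first ("boot") call succeeds, later calls return before reaching boot_only_alloc().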
Fixes: abf110f3e1ce ("powerpc/rfi-flush: Make it possible to call setup_rfi_flush() again") Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/setup_64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -882,7 +882,7 @@ void rfi_flush_enable(bool enable) rfi_flush = enable; }
-static void init_fallback_flush(void) +static void __ref init_fallback_flush(void) { u64 l1d_size, limit; int cpu;
From: Nicholas Piggin npiggin@gmail.com
commit a048a07d7f4535baa4cbad6bc024f175317ab938 upstream.
On some CPUs we can prevent a vulnerability related to store-to-load forwarding by preventing store forwarding between privilege domains, by inserting a barrier in kernel entry and exit paths.
This is known to be the case on at least Power7, Power8 and Power9 powerpc CPUs.
Barriers must generally be inserted before the first load after moving to a higher privilege level, and after the last store before moving to a lower privilege level. Both HV and PR privilege transitions must be protected.
Barriers are added as patch sections, with all kernel/hypervisor entry points patched, and the exit points to lower privilege levels patched similarly to the RFI flush patching.
Firmware advertisement is not implemented yet, so CPU flush types are hard coded.
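With the flush types hard coded, the choice of barrier depends on the CPU architecture level and on the instruction sequence patched into each entry slot. This sketch copies the instruction encodings and the selection order from setup_stf_barrier() and do_stf_entry_barrier_fixups() in this patch into userspace. It only builds the instruction words and does not patch anything. default_stf_type() and stf_entry_instrs() are hypothetical names. The two int flags stand in for CPU_FTR_ARCH_207S and CPU_FTR_ARCH_206.

```c
#include <assert.h>
#include <stdint.h>

/* Bit flags, as in enum stf_barrier_type from this patch. */
enum stf_barrier_type {
    STF_BARRIER_NONE     = 0x1,
    STF_BARRIER_FALLBACK = 0x2,
    STF_BARRIER_EIEIO    = 0x4,
    STF_BARRIER_SYNC_ORI = 0x8,
};

#define PPC_NOP 0x60000000u

/* Same preference order as setup_stf_barrier(). */
static enum stf_barrier_type default_stf_type(int arch_207s, int arch_206)
{
    if (arch_207s)
        return STF_BARRIER_SYNC_ORI;
    if (arch_206)
        return STF_BARRIER_FALLBACK;
    return STF_BARRIER_NONE;
}

/*
 * Fill the three-instruction entry slot the way
 * do_stf_entry_barrier_fixups() fills instrs[]. For FALLBACK, slot 1
 * is a placeholder that the kernel overwrites with a branch-and-link
 * to stf_barrier_fallback.
 */
static void stf_entry_instrs(enum stf_barrier_type types, uint32_t instrs[3])
{
    int i = 0;

    instrs[0] = instrs[1] = instrs[2] = PPC_NOP;

    if (types & STF_BARRIER_FALLBACK) {
        instrs[i++] = 0x7d4802a6; /* mflr r10 */
        instrs[i++] = PPC_NOP;    /* branch patched separately */
        instrs[i++] = 0x7d4803a6; /* mtlr r10 */
    } else if (types & STF_BARRIER_EIEIO) {
        instrs[i++] = 0x7e0006ac; /* eieio + bit 6 hint */
    } else if (types & STF_BARRIER_SYNC_ORI) {
        instrs[i++] = 0x7c0004ac; /* hwsync */
        instrs[i++] = 0xe94d0000; /* ld r10,0(r13) */
        instrs[i++] = 0x63ff0000; /* ori 31,31,0 speculation barrier */
    }
}
```

STF_BARRIER_NONE leaves all three slots as nops, which is also what stf_barrier_enable(false) patches in.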
Thanks to Michal Suchánek for bug fixes and review.
Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Mauricio Faria de Oliveira mauricfo@linux.vnet.ibm.com Signed-off-by: Michael Neuling mikey@neuling.org Signed-off-by: Michal Suchánek msuchanek@suse.de [mpe: 4.4 doesn't have EXC_REAL_OOL_MASKABLE, so do it manually] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/exception-64s.h | 35 ++++++ arch/powerpc/include/asm/feature-fixups.h | 19 +++ arch/powerpc/include/asm/security_features.h | 11 ++ arch/powerpc/kernel/exceptions-64s.S | 22 +++- arch/powerpc/kernel/security.c | 148 +++++++++++++++++++++++++++ arch/powerpc/kernel/vmlinux.lds.S | 14 ++ arch/powerpc/lib/feature-fixups.c | 116 ++++++++++++++++++++- arch/powerpc/platforms/powernv/setup.c | 1 arch/powerpc/platforms/pseries/setup.c | 1 9 files changed, 365 insertions(+), 2 deletions(-)
--- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -50,6 +50,27 @@ #define EX_PPR 88 /* SMT thread status register (priority) */ #define EX_CTR 96
+#define STF_ENTRY_BARRIER_SLOT \ + STF_ENTRY_BARRIER_FIXUP_SECTION; \ + nop; \ + nop; \ + nop + +#define STF_EXIT_BARRIER_SLOT \ + STF_EXIT_BARRIER_FIXUP_SECTION; \ + nop; \ + nop; \ + nop; \ + nop; \ + nop; \ + nop + +/* + * r10 must be free to use, r13 must be paca + */ +#define INTERRUPT_TO_KERNEL \ + STF_ENTRY_BARRIER_SLOT + /* * Macros for annotating the expected destination of (h)rfid * @@ -66,16 +87,19 @@ rfid
#define RFI_TO_USER \ + STF_EXIT_BARRIER_SLOT; \ RFI_FLUSH_SLOT; \ rfid; \ b rfi_flush_fallback
#define RFI_TO_USER_OR_KERNEL \ + STF_EXIT_BARRIER_SLOT; \ RFI_FLUSH_SLOT; \ rfid; \ b rfi_flush_fallback
#define RFI_TO_GUEST \ + STF_EXIT_BARRIER_SLOT; \ RFI_FLUSH_SLOT; \ rfid; \ b rfi_flush_fallback @@ -84,21 +108,25 @@ hrfid
#define HRFI_TO_USER \ + STF_EXIT_BARRIER_SLOT; \ RFI_FLUSH_SLOT; \ hrfid; \ b hrfi_flush_fallback
#define HRFI_TO_USER_OR_KERNEL \ + STF_EXIT_BARRIER_SLOT; \ RFI_FLUSH_SLOT; \ hrfid; \ b hrfi_flush_fallback
#define HRFI_TO_GUEST \ + STF_EXIT_BARRIER_SLOT; \ RFI_FLUSH_SLOT; \ hrfid; \ b hrfi_flush_fallback
#define HRFI_TO_UNKNOWN \ + STF_EXIT_BARRIER_SLOT; \ RFI_FLUSH_SLOT; \ hrfid; \ b hrfi_flush_fallback @@ -226,6 +254,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) #define __EXCEPTION_PROLOG_1(area, extra, vec) \ OPT_SAVE_REG_TO_PACA(area+EX_PPR, r9, CPU_FTR_HAS_PPR); \ OPT_SAVE_REG_TO_PACA(area+EX_CFAR, r10, CPU_FTR_CFAR); \ + INTERRUPT_TO_KERNEL; \ SAVE_CTR(r10, area); \ mfcr r9; \ extra(vec); \ @@ -512,6 +541,12 @@ label##_relon_hv: \ #define _MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra) \ __MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra)
+#define MASKABLE_EXCEPTION_OOL(vec, label) \ + .globl label##_ool; \ +label##_ool: \ + EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_PR, vec); \ + EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_STD); + #define MASKABLE_EXCEPTION_PSERIES(loc, vec, label) \ . = loc; \ .globl label##_pSeries; \ --- a/arch/powerpc/include/asm/feature-fixups.h +++ b/arch/powerpc/include/asm/feature-fixups.h @@ -184,6 +184,22 @@ label##3: \ FTR_ENTRY_OFFSET label##1b-label##3b; \ .popsection;
+#define STF_ENTRY_BARRIER_FIXUP_SECTION \ +953: \ + .pushsection __stf_entry_barrier_fixup,"a"; \ + .align 2; \ +954: \ + FTR_ENTRY_OFFSET 953b-954b; \ + .popsection; + +#define STF_EXIT_BARRIER_FIXUP_SECTION \ +955: \ + .pushsection __stf_exit_barrier_fixup,"a"; \ + .align 2; \ +956: \ + FTR_ENTRY_OFFSET 955b-956b; \ + .popsection; + #define RFI_FLUSH_FIXUP_SECTION \ 951: \ .pushsection __rfi_flush_fixup,"a"; \ @@ -195,6 +211,9 @@ label##3: \
#ifndef __ASSEMBLY__
+extern long stf_barrier_fallback; +extern long __start___stf_entry_barrier_fixup, __stop___stf_entry_barrier_fixup; +extern long __start___stf_exit_barrier_fixup, __stop___stf_exit_barrier_fixup; extern long __start___rfi_flush_fixup, __stop___rfi_flush_fixup;
#endif --- a/arch/powerpc/include/asm/security_features.h +++ b/arch/powerpc/include/asm/security_features.h @@ -12,6 +12,17 @@ extern unsigned long powerpc_security_features; extern bool rfi_flush;
+/* These are bit flags */ +enum stf_barrier_type { + STF_BARRIER_NONE = 0x1, + STF_BARRIER_FALLBACK = 0x2, + STF_BARRIER_EIEIO = 0x4, + STF_BARRIER_SYNC_ORI = 0x8, +}; + +void setup_stf_barrier(void); +void do_stf_barrier_fixups(enum stf_barrier_type types); + static inline void security_ftr_set(unsigned long feature) { powerpc_security_features |= feature; --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -36,6 +36,7 @@ BEGIN_FTR_SECTION \ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) \ mr r9,r13 ; \ GET_PACA(r13) ; \ + INTERRUPT_TO_KERNEL ; \ mfspr r11,SPRN_SRR0 ; \ 0:
@@ -292,7 +293,9 @@ hardware_interrupt_hv: . = 0x900 .globl decrementer_pSeries decrementer_pSeries: - _MASKABLE_EXCEPTION_PSERIES(0x900, decrementer, EXC_STD, SOFTEN_TEST_PR) + SET_SCRATCH0(r13) + EXCEPTION_PROLOG_0(PACA_EXGEN) + b decrementer_ool
STD_EXCEPTION_HV(0x980, 0x982, hdecrementer)
@@ -319,6 +322,7 @@ system_call_pSeries: OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR); HMT_MEDIUM; std r10,PACA_EXGEN+EX_R10(r13) + INTERRUPT_TO_KERNEL OPT_SAVE_REG_TO_PACA(PACA_EXGEN+EX_PPR, r9, CPU_FTR_HAS_PPR); mfcr r9 KVMTEST(0xc00) @@ -607,6 +611,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
.align 7 /* moved from 0xe00 */ + MASKABLE_EXCEPTION_OOL(0x900, decrementer) STD_EXCEPTION_HV_OOL(0xe02, h_data_storage) KVM_HANDLER_SKIP(PACA_EXGEN, EXC_HV, 0xe02) STD_EXCEPTION_HV_OOL(0xe22, h_instr_storage) @@ -1564,6 +1569,21 @@ power4_fixup_nap: blr #endif
+ .balign 16 + .globl stf_barrier_fallback +stf_barrier_fallback: + std r9,PACA_EXRFI+EX_R9(r13) + std r10,PACA_EXRFI+EX_R10(r13) + sync + ld r9,PACA_EXRFI+EX_R9(r13) + ld r10,PACA_EXRFI+EX_R10(r13) + ori 31,31,0 + .rept 14 + b 1f +1: + .endr + blr + .globl rfi_flush_fallback rfi_flush_fallback: SET_SCRATCH0(r13); --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -5,9 +5,11 @@ // Copyright 2018, Michael Ellerman, IBM Corporation.
#include <linux/kernel.h> +#include <linux/debugfs.h> #include <linux/device.h> #include <linux/seq_buf.h>
+#include <asm/debug.h> #include <asm/security_features.h>
@@ -86,3 +88,149 @@ ssize_t cpu_show_spectre_v2(struct devic
return s.len; } + +/* + * Store-forwarding barrier support. + */ + +static enum stf_barrier_type stf_enabled_flush_types; +static bool no_stf_barrier; +bool stf_barrier; + +static int __init handle_no_stf_barrier(char *p) +{ + pr_info("stf-barrier: disabled on command line."); + no_stf_barrier = true; + return 0; +} + +early_param("no_stf_barrier", handle_no_stf_barrier); + +/* This is the generic flag used by other architectures */ +static int __init handle_ssbd(char *p) +{ + if (!p || strncmp(p, "auto", 5) == 0 || strncmp(p, "on", 2) == 0 ) { + /* Until firmware tells us, we have the barrier with auto */ + return 0; + } else if (strncmp(p, "off", 3) == 0) { + handle_no_stf_barrier(NULL); + return 0; + } else + return 1; + + return 0; +} +early_param("spec_store_bypass_disable", handle_ssbd); + +/* This is the generic flag used by other architectures */ +static int __init handle_no_ssbd(char *p) +{ + handle_no_stf_barrier(NULL); + return 0; +} +early_param("nospec_store_bypass_disable", handle_no_ssbd); + +static void stf_barrier_enable(bool enable) +{ + if (enable) + do_stf_barrier_fixups(stf_enabled_flush_types); + else + do_stf_barrier_fixups(STF_BARRIER_NONE); + + stf_barrier = enable; +} + +void setup_stf_barrier(void) +{ + enum stf_barrier_type type; + bool enable, hv; + + hv = cpu_has_feature(CPU_FTR_HVMODE); + + /* Default to fallback in case fw-features are not available */ + if (cpu_has_feature(CPU_FTR_ARCH_207S)) + type = STF_BARRIER_SYNC_ORI; + else if (cpu_has_feature(CPU_FTR_ARCH_206)) + type = STF_BARRIER_FALLBACK; + else + type = STF_BARRIER_NONE; + + enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && + (security_ftr_enabled(SEC_FTR_L1D_FLUSH_PR) || + (security_ftr_enabled(SEC_FTR_L1D_FLUSH_HV) && hv)); + + if (type == STF_BARRIER_FALLBACK) { + pr_info("stf-barrier: fallback barrier available\n"); + } else if (type == STF_BARRIER_SYNC_ORI) { + pr_info("stf-barrier: hwsync barrier available\n"); + } else if (type == STF_BARRIER_EIEIO) { + 
pr_info("stf-barrier: eieio barrier available\n"); + } + + stf_enabled_flush_types = type; + + if (!no_stf_barrier) + stf_barrier_enable(enable); +} + +ssize_t cpu_show_spec_store_bypass(struct device *dev, struct device_attribute *attr, char *buf) +{ + if (stf_barrier && stf_enabled_flush_types != STF_BARRIER_NONE) { + const char *type; + switch (stf_enabled_flush_types) { + case STF_BARRIER_EIEIO: + type = "eieio"; + break; + case STF_BARRIER_SYNC_ORI: + type = "hwsync"; + break; + case STF_BARRIER_FALLBACK: + type = "fallback"; + break; + default: + type = "unknown"; + } + return sprintf(buf, "Mitigation: Kernel entry/exit barrier (%s)\n", type); + } + + if (!security_ftr_enabled(SEC_FTR_L1D_FLUSH_HV) && + !security_ftr_enabled(SEC_FTR_L1D_FLUSH_PR)) + return sprintf(buf, "Not affected\n"); + + return sprintf(buf, "Vulnerable\n"); +} + +#ifdef CONFIG_DEBUG_FS +static int stf_barrier_set(void *data, u64 val) +{ + bool enable; + + if (val == 1) + enable = true; + else if (val == 0) + enable = false; + else + return -EINVAL; + + /* Only do anything if we're changing state */ + if (enable != stf_barrier) + stf_barrier_enable(enable); + + return 0; +} + +static int stf_barrier_get(void *data, u64 *val) +{ + *val = stf_barrier ? 1 : 0; + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(fops_stf_barrier, stf_barrier_get, stf_barrier_set, "%llu\n"); + +static __init int stf_barrier_debugfs_init(void) +{ + debugfs_create_file("stf_barrier", 0600, powerpc_debugfs_root, NULL, &fops_stf_barrier); + return 0; +} +device_initcall(stf_barrier_debugfs_init); +#endif /* CONFIG_DEBUG_FS */ --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -74,6 +74,20 @@ SECTIONS
#ifdef CONFIG_PPC64 . = ALIGN(8); + __stf_entry_barrier_fixup : AT(ADDR(__stf_entry_barrier_fixup) - LOAD_OFFSET) { + __start___stf_entry_barrier_fixup = .; + *(__stf_entry_barrier_fixup) + __stop___stf_entry_barrier_fixup = .; + } + + . = ALIGN(8); + __stf_exit_barrier_fixup : AT(ADDR(__stf_exit_barrier_fixup) - LOAD_OFFSET) { + __start___stf_exit_barrier_fixup = .; + *(__stf_exit_barrier_fixup) + __stop___stf_exit_barrier_fixup = .; + } + + . = ALIGN(8); __rfi_flush_fixup : AT(ADDR(__rfi_flush_fixup) - LOAD_OFFSET) { __start___rfi_flush_fixup = .; *(__rfi_flush_fixup) --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -21,7 +21,7 @@ #include <asm/page.h> #include <asm/sections.h> #include <asm/setup.h> - +#include <asm/security_features.h>
struct fixup_entry { unsigned long mask; @@ -115,6 +115,120 @@ void do_feature_fixups(unsigned long val }
#ifdef CONFIG_PPC_BOOK3S_64 +void do_stf_entry_barrier_fixups(enum stf_barrier_type types) +{ + unsigned int instrs[3], *dest; + long *start, *end; + int i; + + start = PTRRELOC(&__start___stf_entry_barrier_fixup), + end = PTRRELOC(&__stop___stf_entry_barrier_fixup); + + instrs[0] = 0x60000000; /* nop */ + instrs[1] = 0x60000000; /* nop */ + instrs[2] = 0x60000000; /* nop */ + + i = 0; + if (types & STF_BARRIER_FALLBACK) { + instrs[i++] = 0x7d4802a6; /* mflr r10 */ + instrs[i++] = 0x60000000; /* branch patched below */ + instrs[i++] = 0x7d4803a6; /* mtlr r10 */ + } else if (types & STF_BARRIER_EIEIO) { + instrs[i++] = 0x7e0006ac; /* eieio + bit 6 hint */ + } else if (types & STF_BARRIER_SYNC_ORI) { + instrs[i++] = 0x7c0004ac; /* hwsync */ + instrs[i++] = 0xe94d0000; /* ld r10,0(r13) */ + instrs[i++] = 0x63ff0000; /* ori 31,31,0 speculation barrier */ + } + + for (i = 0; start < end; start++, i++) { + dest = (void *)start + *start; + + pr_devel("patching dest %lx\n", (unsigned long)dest); + + patch_instruction(dest, instrs[0]); + + if (types & STF_BARRIER_FALLBACK) + patch_branch(dest + 1, (unsigned long)&stf_barrier_fallback, + BRANCH_SET_LINK); + else + patch_instruction(dest + 1, instrs[1]); + + patch_instruction(dest + 2, instrs[2]); + } + + printk(KERN_DEBUG "stf-barrier: patched %d entry locations (%s barrier)\n", i, + (types == STF_BARRIER_NONE) ? "no" : + (types == STF_BARRIER_FALLBACK) ? "fallback" : + (types == STF_BARRIER_EIEIO) ? "eieio" : + (types == (STF_BARRIER_SYNC_ORI)) ? 
"hwsync" + : "unknown"); +} + +void do_stf_exit_barrier_fixups(enum stf_barrier_type types) +{ + unsigned int instrs[6], *dest; + long *start, *end; + int i; + + start = PTRRELOC(&__start___stf_exit_barrier_fixup), + end = PTRRELOC(&__stop___stf_exit_barrier_fixup); + + instrs[0] = 0x60000000; /* nop */ + instrs[1] = 0x60000000; /* nop */ + instrs[2] = 0x60000000; /* nop */ + instrs[3] = 0x60000000; /* nop */ + instrs[4] = 0x60000000; /* nop */ + instrs[5] = 0x60000000; /* nop */ + + i = 0; + if (types & STF_BARRIER_FALLBACK || types & STF_BARRIER_SYNC_ORI) { + if (cpu_has_feature(CPU_FTR_HVMODE)) { + instrs[i++] = 0x7db14ba6; /* mtspr 0x131, r13 (HSPRG1) */ + instrs[i++] = 0x7db04aa6; /* mfspr r13, 0x130 (HSPRG0) */ + } else { + instrs[i++] = 0x7db243a6; /* mtsprg 2,r13 */ + instrs[i++] = 0x7db142a6; /* mfsprg r13,1 */ + } + instrs[i++] = 0x7c0004ac; /* hwsync */ + instrs[i++] = 0xe9ad0000; /* ld r13,0(r13) */ + instrs[i++] = 0x63ff0000; /* ori 31,31,0 speculation barrier */ + if (cpu_has_feature(CPU_FTR_HVMODE)) { + instrs[i++] = 0x7db14aa6; /* mfspr r13, 0x131 (HSPRG1) */ + } else { + instrs[i++] = 0x7db242a6; /* mfsprg r13,2 */ + } + } else if (types & STF_BARRIER_EIEIO) { + instrs[i++] = 0x7e0006ac; /* eieio + bit 6 hint */ + } + + for (i = 0; start < end; start++, i++) { + dest = (void *)start + *start; + + pr_devel("patching dest %lx\n", (unsigned long)dest); + + patch_instruction(dest, instrs[0]); + patch_instruction(dest + 1, instrs[1]); + patch_instruction(dest + 2, instrs[2]); + patch_instruction(dest + 3, instrs[3]); + patch_instruction(dest + 4, instrs[4]); + patch_instruction(dest + 5, instrs[5]); + } + printk(KERN_DEBUG "stf-barrier: patched %d exit locations (%s barrier)\n", i, + (types == STF_BARRIER_NONE) ? "no" : + (types == STF_BARRIER_FALLBACK) ? "fallback" : + (types == STF_BARRIER_EIEIO) ? "eieio" : + (types == (STF_BARRIER_SYNC_ORI)) ? 
"hwsync" + : "unknown"); +} + + +void do_stf_barrier_fixups(enum stf_barrier_type types) +{ + do_stf_entry_barrier_fixups(types); + do_stf_exit_barrier_fixups(types); +} + void do_rfi_flush_fixups(enum l1d_flush_type types) { unsigned int instrs[3], *dest; --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -130,6 +130,7 @@ static void __init pnv_setup_arch(void) set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
pnv_setup_rfi_flush(); + setup_stf_barrier();
/* Initialize SMP */ pnv_smp_init(); --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -593,6 +593,7 @@ static void __init pSeries_setup_arch(vo fwnmi_init();
pseries_setup_rfi_flush(); + setup_stf_barrier();
/* By default, only probe PCI (can be overridden by rtas_pci) */ pci_add_flags(PCI_PROBE_ONLY);
From: Michal Suchanek msuchanek@suse.de
commit a6b3964ad71a61bb7c61d80a60bea7d42187b2eb upstream.
A no-op form of ori (or immediate of 0 into r31 and the result stored in r31) has been re-tasked as a speculation barrier. The instruction only acts as a barrier on newer machines with appropriate firmware support. On older CPUs it remains a harmless no-op.
Implement barrier_nospec using this instruction.
mpe: The semantics of the instruction are believed to be that it prevents execution of subsequent instructions until preceding branches have been fully resolved and are no longer executing speculatively. There is no further documentation available at this time.
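A typical use places the barrier between a bounds check and the dependent load. This hedged userspace sketch assumes GCC or Clang inline assembly. On 64-bit PowerPC it emits ori 31,31,0. On other architectures the fallback is only a compiler barrier and gives no speculation protection. table_lookup() is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace approximation of barrier_nospec(). On 64-bit PowerPC this
 * emits the ori 31,31,0 form; elsewhere it is only a compiler barrier.
 */
#if defined(__powerpc64__)
#define barrier_nospec() __asm__ __volatile__("ori 31,31,0" ::: "memory")
#else
#define barrier_nospec() __asm__ __volatile__("" ::: "memory")
#endif

/* Bounds-checked load with the barrier between the check and the access. */
static int table_lookup(const int *table, size_t size, size_t idx)
{
    if (idx >= size)
        return -1;
    barrier_nospec();   /* on supporting CPUs/firmware: wait for the branch to resolve */
    return table[idx];
}
```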
Signed-off-by: Michal Suchanek msuchanek@suse.de Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/barrier.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
--- a/arch/powerpc/include/asm/barrier.h +++ b/arch/powerpc/include/asm/barrier.h @@ -92,4 +92,19 @@ do { \ #define smp_mb__after_atomic() smp_mb() #define smp_mb__before_spinlock() smp_mb()
+#ifdef CONFIG_PPC_BOOK3S_64 +/* + * Prevent execution of subsequent instructions until preceding branches have + * been fully resolved and are no longer executing speculatively. + */ +#define barrier_nospec_asm ori 31,31,0 + +// This also acts as a compiler barrier due to the memory clobber. +#define barrier_nospec() asm (stringify_in_c(barrier_nospec_asm) ::: "memory") + +#else /* !CONFIG_PPC_BOOK3S_64 */ +#define barrier_nospec_asm +#define barrier_nospec() +#endif + #endif /* _ASM_POWERPC_BARRIER_H */
From: Michal Suchanek msuchanek@suse.de
commit 2eea7f067f495e33b8b116b35b5988ab2b8aec55 upstream.
Based on the RFI patching. This is required to be able to disable the speculation barrier.
Only one barrier type is supported, and it does nothing when the firmware does not enable it. Also, re-patching modules is not supported. So the only meaningful thing that can be done is patching out the speculation barrier at boot when the user says it is not wanted.
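Each fixup entry records the location of a barrier slot as an offset relative to the entry itself. Patching walks the table and rewrites each slot with either a nop or ori 31,31,0. This userspace sketch imitates that loop on an ordinary array. It assumes an LP64 system where long can hold a pointer difference. record_fixup() and patch_nospec_sites() are hypothetical, and the kernel uses patch_instruction() rather than a plain store.

```c
#include <assert.h>
#include <stdint.h>

#define PPC_NOP   0x60000000u
#define PPC_ORI31 0x63ff0000u   /* ori 31,31,0 */

/* Store the offset from the entry to the site, like FTR_ENTRY_OFFSET 953b-954b. */
static void record_fixup(long *entry, uint32_t *site)
{
    *entry = (long)((intptr_t)site - (intptr_t)entry);
}

/* Mirrors the loop in do_barrier_nospec_fixups(); returns sites patched. */
static int patch_nospec_sites(int enable, long *start, long *end)
{
    uint32_t instr = enable ? PPC_ORI31 : PPC_NOP;
    int i;

    for (i = 0; start < end; start++, i++) {
        uint32_t *dest = (uint32_t *)((intptr_t)start + *start);
        *dest = instr;   /* kernel: patch_instruction(dest, instr) */
    }
    return i;
}
```

Self-relative offsets keep the table valid wherever the image is loaded. The kernel additionally wraps the section bounds in PTRRELOC() before patching.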
Signed-off-by: Michal Suchanek msuchanek@suse.de Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/barrier.h | 2 +- arch/powerpc/include/asm/feature-fixups.h | 9 +++++++++ arch/powerpc/include/asm/setup.h | 1 + arch/powerpc/kernel/security.c | 9 +++++++++ arch/powerpc/kernel/vmlinux.lds.S | 7 +++++++ arch/powerpc/lib/feature-fixups.c | 27 +++++++++++++++++++++++++++ 6 files changed, 54 insertions(+), 1 deletion(-)
--- a/arch/powerpc/include/asm/barrier.h +++ b/arch/powerpc/include/asm/barrier.h @@ -97,7 +97,7 @@ do { \ * Prevent execution of subsequent instructions until preceding branches have * been fully resolved and are no longer executing speculatively. */ -#define barrier_nospec_asm ori 31,31,0 +#define barrier_nospec_asm NOSPEC_BARRIER_FIXUP_SECTION; nop
// This also acts as a compiler barrier due to the memory clobber. #define barrier_nospec() asm (stringify_in_c(barrier_nospec_asm) ::: "memory") --- a/arch/powerpc/include/asm/feature-fixups.h +++ b/arch/powerpc/include/asm/feature-fixups.h @@ -208,6 +208,14 @@ label##3: \ FTR_ENTRY_OFFSET 951b-952b; \ .popsection;
+#define NOSPEC_BARRIER_FIXUP_SECTION \ +953: \ + .pushsection __barrier_nospec_fixup,"a"; \ + .align 2; \ +954: \ + FTR_ENTRY_OFFSET 953b-954b; \ + .popsection; +
#ifndef __ASSEMBLY__
@@ -215,6 +223,7 @@ extern long stf_barrier_fallback; extern long __start___stf_entry_barrier_fixup, __stop___stf_entry_barrier_fixup; extern long __start___stf_exit_barrier_fixup, __stop___stf_exit_barrier_fixup; extern long __start___rfi_flush_fixup, __stop___rfi_flush_fixup; +extern long __start___barrier_nospec_fixup, __stop___barrier_nospec_fixup;
#endif
--- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -38,6 +38,7 @@ enum l1d_flush_type {
void setup_rfi_flush(enum l1d_flush_type, bool enable); void do_rfi_flush_fixups(enum l1d_flush_type types); +void do_barrier_nospec_fixups(bool enable);
#endif /* !__ASSEMBLY__ */
--- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -11,10 +11,19 @@
#include <asm/debug.h> #include <asm/security_features.h> +#include <asm/setup.h>
unsigned long powerpc_security_features __read_mostly = SEC_FTR_DEFAULT;
+static bool barrier_nospec_enabled; + +static void enable_barrier_nospec(bool enable) +{ + barrier_nospec_enabled = enable; + do_barrier_nospec_fixups(enable); +} + ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) { bool thread_priv; --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -93,6 +93,13 @@ SECTIONS *(__rfi_flush_fixup) __stop___rfi_flush_fixup = .; } + + . = ALIGN(8); + __spec_barrier_fixup : AT(ADDR(__spec_barrier_fixup) - LOAD_OFFSET) { + __start___barrier_nospec_fixup = .; + *(__barrier_nospec_fixup) + __stop___barrier_nospec_fixup = .; + } #endif
EXCEPTION_TABLE(0) --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -274,6 +274,33 @@ void do_rfi_flush_fixups(enum l1d_flush_ (types & L1D_FLUSH_MTTRIG) ? "mttrig type" : "unknown"); } + +void do_barrier_nospec_fixups(bool enable) +{ + unsigned int instr, *dest; + long *start, *end; + int i; + + start = PTRRELOC(&__start___barrier_nospec_fixup), + end = PTRRELOC(&__stop___barrier_nospec_fixup); + + instr = 0x60000000; /* nop */ + + if (enable) { + pr_info("barrier-nospec: using ORI speculation barrier\n"); + instr = 0x63ff0000; /* ori 31,31,0 speculation barrier */ + } + + for (i = 0; start < end; start++, i++) { + dest = (void *)start + *start; + + pr_devel("patching dest %lx\n", (unsigned long)dest); + patch_instruction(dest, instr); + } + + printk(KERN_DEBUG "barrier-nospec: patched %d locations\n", i); +} + #endif /* CONFIG_PPC_BOOK3S_64 */
void do_lwsync_fixups(unsigned long value, void *fixup_start, void *fixup_end)
From: Michal Suchanek msuchanek@suse.de
commit 815069ca57c142eb71d27439bc27f41a433a67b3 upstream.
Note that, unlike RFI, which is patched only in the kernel, the nospec state reflects the settings at the time the module was loaded.
Iterating all modules and re-patching every time the settings change is not implemented.
Based on lwsync patching.
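The consequence of load-time patching can be modelled in a few lines. In this sketch, struct demo_module and demo_module_load() are hypothetical. A module loaded while the barrier is enabled keeps ori 31,31,0 even after the global setting is turned off, because nothing revisits modules that are already loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PPC_NOP   0x60000000u
#define PPC_ORI31 0x63ff0000u   /* ori 31,31,0 */

static bool barrier_nospec_enabled;

struct demo_module {            /* hypothetical: one barrier_nospec slot */
    uint32_t site;
};

/* Load-time patching with the current setting, as module_finalize() does. */
static void demo_module_load(struct demo_module *m)
{
    m->site = barrier_nospec_enabled ? PPC_ORI31 : PPC_NOP;
}
```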
Signed-off-by: Michal Suchanek msuchanek@suse.de Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/setup.h | 7 +++++++ arch/powerpc/kernel/module.c | 6 ++++++ arch/powerpc/kernel/security.c | 2 +- arch/powerpc/lib/feature-fixups.c | 16 +++++++++++++--- 4 files changed, 27 insertions(+), 4 deletions(-)
--- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -39,6 +39,13 @@ enum l1d_flush_type { void setup_rfi_flush(enum l1d_flush_type, bool enable); void do_rfi_flush_fixups(enum l1d_flush_type types); void do_barrier_nospec_fixups(bool enable); +extern bool barrier_nospec_enabled; + +#ifdef CONFIG_PPC_BOOK3S_64 +void do_barrier_nospec_fixups_range(bool enable, void *start, void *end); +#else +static inline void do_barrier_nospec_fixups_range(bool enable, void *start, void *end) { }; +#endif
#endif /* !__ASSEMBLY__ */
--- a/arch/powerpc/kernel/module.c +++ b/arch/powerpc/kernel/module.c @@ -67,6 +67,12 @@ int module_finalize(const Elf_Ehdr *hdr, do_feature_fixups(powerpc_firmware_features, (void *)sect->sh_addr, (void *)sect->sh_addr + sect->sh_size); + + sect = find_section(hdr, sechdrs, "__spec_barrier_fixup"); + if (sect != NULL) + do_barrier_nospec_fixups_range(barrier_nospec_enabled, + (void *)sect->sh_addr, + (void *)sect->sh_addr + sect->sh_size); #endif
sect = find_section(hdr, sechdrs, "__lwsync_fixup"); --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -16,7 +16,7 @@
unsigned long powerpc_security_features __read_mostly = SEC_FTR_DEFAULT;
-static bool barrier_nospec_enabled; +bool barrier_nospec_enabled;
static void enable_barrier_nospec(bool enable) { --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -275,14 +275,14 @@ void do_rfi_flush_fixups(enum l1d_flush_ : "unknown"); }
-void do_barrier_nospec_fixups(bool enable) +void do_barrier_nospec_fixups_range(bool enable, void *fixup_start, void *fixup_end) { unsigned int instr, *dest; long *start, *end; int i;
- start = PTRRELOC(&__start___barrier_nospec_fixup), - end = PTRRELOC(&__stop___barrier_nospec_fixup); + start = fixup_start; + end = fixup_end;
instr = 0x60000000; /* nop */
@@ -301,6 +301,16 @@ void do_barrier_nospec_fixups(bool enabl printk(KERN_DEBUG "barrier-nospec: patched %d locations\n", i); }
+void do_barrier_nospec_fixups(bool enable) +{ + void *start, *end; + + start = PTRRELOC(&__start___barrier_nospec_fixup), + end = PTRRELOC(&__stop___barrier_nospec_fixup); + + do_barrier_nospec_fixups_range(enable, start, end); +} + #endif /* CONFIG_PPC_BOOK3S_64 */
void do_lwsync_fixups(unsigned long value, void *fixup_start, void *fixup_end)
From: Michal Suchanek msuchanek@suse.de
commit cb3d6759a93c6d0aea1c10deb6d00e111c29c19c upstream.
Check what firmware told us and enable/disable the barrier_nospec as appropriate.
We err on the side of enabling the barrier, as it's a no-op on older systems; see the comment for more detail.
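The decision in setup_barrier_nospec() reduces to a conjunction of two feature flags. SEC_FTR_SPEC_BAR_ORI31 is deliberately left out of that test. The sketch below models this in user space. The value of SEC_FTR_FAVOUR_SECURITY comes from this series, but the SEC_FTR_BNDS_CHK_SPEC_BAR value is an assumption for illustration. Both flags start out set, matching the comment that they are enabled by default.

```c
#include <assert.h>
#include <stdbool.h>

#define SEC_FTR_BNDS_CHK_SPEC_BAR 0x0000000000000100ull /* assumed value */
#define SEC_FTR_FAVOUR_SECURITY   0x0000000000000200ull

/* Both flags default to on, so a non-functional hcall leaves the barrier enabled. */
static unsigned long long features =
	SEC_FTR_FAVOUR_SECURITY | SEC_FTR_BNDS_CHK_SPEC_BAR;

static bool ftr_enabled(unsigned long long f)
{
	return !!(features & f);
}

/* Same condition as setup_barrier_nospec(); ORI31 support is not checked. */
static bool want_barrier_nospec(void)
{
	return ftr_enabled(SEC_FTR_FAVOUR_SECURITY) &&
	       ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR);
}
```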
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/setup.h | 1 arch/powerpc/kernel/security.c | 59 +++++++++++++++++++++++++++++++++ arch/powerpc/platforms/powernv/setup.c | 1 arch/powerpc/platforms/pseries/setup.c | 1 4 files changed, 62 insertions(+)
--- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -38,6 +38,7 @@ enum l1d_flush_type {
void setup_rfi_flush(enum l1d_flush_type, bool enable); void do_rfi_flush_fixups(enum l1d_flush_type types); +void setup_barrier_nospec(void); void do_barrier_nospec_fixups(bool enable); extern bool barrier_nospec_enabled;
--- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -24,6 +24,65 @@ static void enable_barrier_nospec(bool e do_barrier_nospec_fixups(enable); }
+void setup_barrier_nospec(void) +{ + bool enable; + + /* + * It would make sense to check SEC_FTR_SPEC_BAR_ORI31 below as well. + * But there's a good reason not to. The two flags we check below are + * both are enabled by default in the kernel, so if the hcall is not + * functional they will be enabled. + * On a system where the host firmware has been updated (so the ori + * functions as a barrier), but on which the hypervisor (KVM/Qemu) has + * not been updated, we would like to enable the barrier. Dropping the + * check for SEC_FTR_SPEC_BAR_ORI31 achieves that. The only downside is + * we potentially enable the barrier on systems where the host firmware + * is not updated, but that's harmless as it's a no-op. + */ + enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && + security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR); + + enable_barrier_nospec(enable); +} + +#ifdef CONFIG_DEBUG_FS +static int barrier_nospec_set(void *data, u64 val) +{ + switch (val) { + case 0: + case 1: + break; + default: + return -EINVAL; + } + + if (!!val == !!barrier_nospec_enabled) + return 0; + + enable_barrier_nospec(!!val); + + return 0; +} + +static int barrier_nospec_get(void *data, u64 *val) +{ + *val = barrier_nospec_enabled ? 1 : 0; + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(fops_barrier_nospec, + barrier_nospec_get, barrier_nospec_set, "%llu\n"); + +static __init int barrier_nospec_debugfs_init(void) +{ + debugfs_create_file("barrier_nospec", 0600, powerpc_debugfs_root, NULL, + &fops_barrier_nospec); + return 0; +} +device_initcall(barrier_nospec_debugfs_init); +#endif /* CONFIG_DEBUG_FS */ + ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) { bool thread_priv; --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -123,6 +123,7 @@ static void pnv_setup_rfi_flush(void) security_ftr_enabled(SEC_FTR_L1D_FLUSH_HV));
setup_rfi_flush(type, enable); + setup_barrier_nospec(); }
static void __init pnv_setup_arch(void) --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -574,6 +574,7 @@ void pseries_setup_rfi_flush(void) security_ftr_enabled(SEC_FTR_L1D_FLUSH_PR);
setup_rfi_flush(types, enable); + setup_barrier_nospec(); }
static void __init pSeries_setup_arch(void)
From: Michael Ellerman mpe@ellerman.id.au
commit 51973a815c6b46d7b23b68d6af371ad1c9d503ca upstream.
Our syscall entry is done in assembly, so patch in an explicit barrier_nospec.
Based on a patch by Michal Suchanek.
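The same ordering can be expressed in C. The syscall number is bounds-checked first, and only after that check may the handler be loaded from the table. The sketch below is a user-space analogue, not the kernel's entry code. The handlers and table are hypothetical. SPEC_BARRIER_PLACEHOLDER is only a compiler barrier and does not stop speculation; on Book3S-64 the kernel patches "ori 31,31,0" into that spot via barrier_nospec_asm.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Compiler barrier only: NOT a speculation barrier (placeholder for ori 31,31,0). */
#define SPEC_BARRIER_PLACEHOLDER() __asm__ __volatile__("" ::: "memory")

typedef long (*syscall_fn)(long);

static long sys_zero(long a) { (void)a; return 0; }   /* hypothetical handler */
static long sys_double(long a) { return 2 * a; }      /* hypothetical handler */

static const syscall_fn table[] = { sys_zero, sys_double };
#define NR_SYSCALLS (sizeof(table) / sizeof(table[0]))

static long dispatch(unsigned long nr, long arg)
{
	if (nr >= NR_SYSCALLS)
		return -ENOSYS;           /* like the branch to .Lsyscall_enosys */
	SPEC_BARRIER_PLACEHOLDER();       /* table load must wait for the check */
	return table[nr](arg);
}
```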
Signed-off-by: Michal Suchanek msuchanek@suse.de Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/entry_64.S | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -36,6 +36,7 @@ #include <asm/hw_irq.h> #include <asm/context_tracking.h> #include <asm/tm.h> +#include <asm/barrier.h> #ifdef CONFIG_PPC_BOOK3S #include <asm/exception-64s.h> #else @@ -177,6 +178,15 @@ system_call: /* label this so stack tr clrldi r8,r8,32 15: slwi r0,r0,4 + + barrier_nospec_asm + /* + * Prevent the load of the handler below (based on the user-passed + * system call number) being speculatively executed until the test + * against NR_syscalls and branch to .Lsyscall_enosys above has + * committed. + */ + ldx r12,r11,r0 /* Fetch system call handler [ptr] */ mtctr r12 bctrl /* Call handler */
From: Michael Ellerman mpe@ellerman.id.au
commit ddf35cf3764b5a182b178105f57515b42e2634f8 upstream.
Based on the x86 commit doing the same.
See commit 304ec1b05031 ("x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec") and b3bbfb3fb5d2 ("x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec") for more detail.
In all cases we are ordering the load from the potentially user-controlled pointer vs a previous branch based on an access_ok() check or similar.
Based on a patch from Michal Suchanek.
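The copy_from_user() hunk shows the general shape. After access_ok() succeeds, place a barrier before the load. On failure, zero the destination and report every byte as not copied. The sketch below models that shape in user space. A small array stands in for the user address range, and "user addresses" are plain offsets into it. fake_access_ok() and fake_copy_from_user() are hypothetical names, and the barrier macro is a compiler-only placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Compiler barrier only: stands in for barrier_nospec(). */
#define SPEC_BARRIER_PLACEHOLDER() __asm__ __volatile__("" ::: "memory")

static char user_mem[16] = "hello, world";  /* hypothetical "user" range */

static bool fake_access_ok(size_t uaddr, size_t n)
{
	return n <= sizeof(user_mem) && uaddr <= sizeof(user_mem) - n;
}

/* Returns the number of bytes NOT copied, like copy_from_user(). */
static size_t fake_copy_from_user(void *to, size_t uaddr, size_t n)
{
	if (fake_access_ok(uaddr, n)) {
		SPEC_BARRIER_PLACEHOLDER();  /* load ordered after the check */
		memcpy(to, user_mem + uaddr, n);
		return 0;
	}
	memset(to, 0, n);                    /* never leave stale data in 'to' */
	return n;
}
```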
Signed-off-by: Michal Suchanek msuchanek@suse.de Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/uaccess.h | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
--- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -269,6 +269,7 @@ do { \ __chk_user_ptr(ptr); \ if (!is_kernel_addr((unsigned long)__gu_addr)) \ might_fault(); \ + barrier_nospec(); \ __get_user_size(__gu_val, __gu_addr, (size), __gu_err); \ (x) = (__typeof__(*(ptr)))__gu_val; \ __gu_err; \ @@ -283,6 +284,7 @@ do { \ __chk_user_ptr(ptr); \ if (!is_kernel_addr((unsigned long)__gu_addr)) \ might_fault(); \ + barrier_nospec(); \ __get_user_size(__gu_val, __gu_addr, (size), __gu_err); \ (x) = (__force __typeof__(*(ptr)))__gu_val; \ __gu_err; \ @@ -295,8 +297,10 @@ do { \ unsigned long __gu_val = 0; \ __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ might_fault(); \ - if (access_ok(VERIFY_READ, __gu_addr, (size))) \ + if (access_ok(VERIFY_READ, __gu_addr, (size))) { \ + barrier_nospec(); \ __get_user_size(__gu_val, __gu_addr, (size), __gu_err); \ + } \ (x) = (__force __typeof__(*(ptr)))__gu_val; \ __gu_err; \ }) @@ -307,6 +311,7 @@ do { \ unsigned long __gu_val; \ __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ __chk_user_ptr(ptr); \ + barrier_nospec(); \ __get_user_size(__gu_val, __gu_addr, (size), __gu_err); \ (x) = (__force __typeof__(*(ptr)))__gu_val; \ __gu_err; \ @@ -323,8 +328,10 @@ extern unsigned long __copy_tofrom_user( static inline unsigned long copy_from_user(void *to, const void __user *from, unsigned long n) { - if (likely(access_ok(VERIFY_READ, from, n))) + if (likely(access_ok(VERIFY_READ, from, n))) { + barrier_nospec(); return __copy_tofrom_user((__force void __user *)to, from, n); + } memset(to, 0, n); return n; } @@ -359,21 +366,27 @@ static inline unsigned long __copy_from_
switch (n) { case 1: + barrier_nospec(); __get_user_size(*(u8 *)to, from, 1, ret); break; case 2: + barrier_nospec(); __get_user_size(*(u16 *)to, from, 2, ret); break; case 4: + barrier_nospec(); __get_user_size(*(u32 *)to, from, 4, ret); break; case 8: + barrier_nospec(); __get_user_size(*(u64 *)to, from, 8, ret); break; } if (ret == 0) return 0; } + + barrier_nospec(); return __copy_tofrom_user((__force void __user *)to, from, n); }
@@ -400,6 +413,7 @@ static inline unsigned long __copy_to_us if (ret == 0) return 0; } + return __copy_tofrom_user(to, (__force const void __user *)from, n); }
From: Michal Suchanek msuchanek@suse.de
commit a377514519b9a20fa1ea9adddbb4129573129cef upstream.
We now have barrier_nospec as a mitigation, so print it in cpu_show_spectre_v1() when enabled.
Signed-off-by: Michal Suchanek msuchanek@suse.de Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/security.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -121,6 +121,9 @@ ssize_t cpu_show_spectre_v1(struct devic if (!security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR)) return sprintf(buf, "Not affected\n");
+ if (barrier_nospec_enabled) + return sprintf(buf, "Mitigation: __user pointer sanitization\n"); + return sprintf(buf, "Vulnerable\n"); }
From: Michael Ellerman mpe@ellerman.id.au
commit 6d44acae1937b81cf8115ada8958e04f601f3f2e upstream.
When I added the spectre_v2 information in sysfs, I included the availability of the ori31 speculation barrier.
Although the ori31 barrier can be used to mitigate v2, it's primarily intended as a spectre v1 mitigation. Spectre v2 is mitigated by hardware changes.
So rework the sysfs files to show the ori31 information in the spectre_v1 file, rather than v2.
Currently we display, e.g.:

  $ grep . spectre_v*
  spectre_v1:Mitigation: __user pointer sanitization
  spectre_v2:Mitigation: Indirect branch cache disabled, ori31 speculation barrier enabled

After:

  $ grep . spectre_v*
  spectre_v1:Mitigation: __user pointer sanitization, ori31 speculation barrier enabled
  spectre_v2:Mitigation: Indirect branch cache disabled
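The reworked cpu_show_spectre_v1() builds its string from three inputs. The sketch below reproduces that logic in user space. snprintf() stands in for the kernel's seq_buf_printf(), and plain booleans replace the security_ftr_enabled() calls and barrier_nospec_enabled. The function signature is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static int show_spectre_v1(char *buf, size_t size, bool bnds_chk_spec_bar,
			   bool nospec_enabled, bool ori31)
{
	if (!bnds_chk_spec_bar)
		return snprintf(buf, size, "Not affected\n");

	/* Mitigation state first, then the optional ori31 suffix. */
	return snprintf(buf, size, "%s%s\n",
			nospec_enabled ? "Mitigation: __user pointer sanitization"
				       : "Vulnerable",
			ori31 ? ", ori31 speculation barrier enabled" : "");
}
```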
Fixes: d6fbe1c55c55 ("powerpc/64s: Wire up cpu_show_spectre_v2()") Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/security.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-)
--- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -118,25 +118,35 @@ ssize_t cpu_show_meltdown(struct device
ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, char *buf) { - if (!security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR)) - return sprintf(buf, "Not affected\n"); + struct seq_buf s; + + seq_buf_init(&s, buf, PAGE_SIZE - 1); + + if (security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR)) { + if (barrier_nospec_enabled) + seq_buf_printf(&s, "Mitigation: __user pointer sanitization"); + else + seq_buf_printf(&s, "Vulnerable");
- if (barrier_nospec_enabled) - return sprintf(buf, "Mitigation: __user pointer sanitization\n"); + if (security_ftr_enabled(SEC_FTR_SPEC_BAR_ORI31)) + seq_buf_printf(&s, ", ori31 speculation barrier enabled");
- return sprintf(buf, "Vulnerable\n"); + seq_buf_printf(&s, "\n"); + } else + seq_buf_printf(&s, "Not affected\n"); + + return s.len; }
ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, char *buf) { - bool bcs, ccd, ori; struct seq_buf s; + bool bcs, ccd;
seq_buf_init(&s, buf, PAGE_SIZE - 1);
bcs = security_ftr_enabled(SEC_FTR_BCCTRL_SERIALISED); ccd = security_ftr_enabled(SEC_FTR_COUNT_CACHE_DISABLED); - ori = security_ftr_enabled(SEC_FTR_SPEC_BAR_ORI31);
if (bcs || ccd) { seq_buf_printf(&s, "Mitigation: "); @@ -152,9 +162,6 @@ ssize_t cpu_show_spectre_v2(struct devic } else seq_buf_printf(&s, "Vulnerable");
- if (ori) - seq_buf_printf(&s, ", ori31 speculation barrier enabled"); - seq_buf_printf(&s, "\n");
return s.len;
From: Diana Craciun diana.craciun@nxp.com
commit cf175dc315f90185128fb061dc05b6fbb211aa2f upstream.
The speculation barrier can be disabled from the command line with the parameter: "nospectre_v1".
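In the kernel, matching the "nospectre_v1" token is handled by the early_param() machinery. The handler only sets no_nospec, and setup_barrier_nospec() then skips enable_barrier_nospec(). The sketch below models that flow in user space. parse_cmdline() is a hypothetical stand-in for early_param(), and firmware_wants_barrier replaces the feature-flag test.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool no_nospec;
static bool barrier_nospec_enabled;

/* Hypothetical stand-in for early_param(): exact match on space-separated tokens. */
static void parse_cmdline(const char *cmdline)
{
	const char *p = cmdline;
	size_t len = strlen("nospectre_v1");

	while (*p) {
		size_t tok = strcspn(p, " ");
		if (tok == len && strncmp(p, "nospectre_v1", len) == 0)
			no_nospec = true;
		p += tok;
		p += strspn(p, " ");
	}
}

static void setup_barrier_nospec(bool firmware_wants_barrier)
{
	if (!no_nospec)  /* command line overrides firmware */
		barrier_nospec_enabled = firmware_wants_barrier;
}
```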
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/security.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -17,6 +17,7 @@ unsigned long powerpc_security_features __read_mostly = SEC_FTR_DEFAULT;
bool barrier_nospec_enabled; +static bool no_nospec;
static void enable_barrier_nospec(bool enable) { @@ -43,9 +44,18 @@ void setup_barrier_nospec(void) enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR);
- enable_barrier_nospec(enable); + if (!no_nospec) + enable_barrier_nospec(enable); }
+static int __init handle_nospectre_v1(char *p) +{ + no_nospec = true; + + return 0; +} +early_param("nospectre_v1", handle_nospectre_v1); + #ifdef CONFIG_DEBUG_FS static int barrier_nospec_set(void *data, u64 val) {
From: Diana Craciun diana.craciun@nxp.com
commit 6453b532f2c8856a80381e6b9a1f5ea2f12294df upstream.
NXP Book3E platforms are not vulnerable to speculative store bypass, so make the mitigations PPC_BOOK3S_64 specific.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/security.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -177,6 +177,7 @@ ssize_t cpu_show_spectre_v2(struct devic return s.len; }
+#ifdef CONFIG_PPC_BOOK3S_64 /* * Store-forwarding barrier support. */ @@ -322,3 +323,4 @@ static __init int stf_barrier_debugfs_in } device_initcall(stf_barrier_debugfs_init); #endif /* CONFIG_DEBUG_FS */ +#endif /* CONFIG_PPC_BOOK3S_64 */
From: Michael Ellerman mpe@ellerman.id.au
commit 179ab1cbf883575c3a585bcfc0f2160f1d22a149 upstream.
Add a config symbol to encode which platforms support the barrier_nospec speculation barrier. Currently this is just Book3S 64 but we will add Book3E in a future patch.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/Kconfig | 7 ++++++- arch/powerpc/include/asm/barrier.h | 6 +++--- arch/powerpc/include/asm/setup.h | 2 +- arch/powerpc/kernel/Makefile | 3 ++- arch/powerpc/kernel/module.c | 4 +++- arch/powerpc/kernel/vmlinux.lds.S | 4 +++- arch/powerpc/lib/feature-fixups.c | 6 ++++-- 7 files changed, 22 insertions(+), 10 deletions(-)
--- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -136,7 +136,7 @@ config PPC select GENERIC_SMP_IDLE_THREAD select GENERIC_CMOS_UPDATE select GENERIC_TIME_VSYSCALL_OLD - select GENERIC_CPU_VULNERABILITIES if PPC_BOOK3S_64 + select GENERIC_CPU_VULNERABILITIES if PPC_BARRIER_NOSPEC select GENERIC_CLOCKEVENTS select GENERIC_CLOCKEVENTS_BROADCAST if SMP select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST @@ -162,6 +162,11 @@ config PPC select ARCH_HAS_DMA_SET_COHERENT_MASK select HAVE_ARCH_SECCOMP_FILTER
+config PPC_BARRIER_NOSPEC + bool + default y + depends on PPC_BOOK3S_64 + config GENERIC_CSUM def_bool CPU_LITTLE_ENDIAN
--- a/arch/powerpc/include/asm/barrier.h +++ b/arch/powerpc/include/asm/barrier.h @@ -92,7 +92,7 @@ do { \ #define smp_mb__after_atomic() smp_mb() #define smp_mb__before_spinlock() smp_mb()
-#ifdef CONFIG_PPC_BOOK3S_64 +#ifdef CONFIG_PPC_BARRIER_NOSPEC /* * Prevent execution of subsequent instructions until preceding branches have * been fully resolved and are no longer executing speculatively. @@ -102,9 +102,9 @@ do { \ // This also acts as a compiler barrier due to the memory clobber. #define barrier_nospec() asm (stringify_in_c(barrier_nospec_asm) ::: "memory")
-#else /* !CONFIG_PPC_BOOK3S_64 */ +#else /* !CONFIG_PPC_BARRIER_NOSPEC */ #define barrier_nospec_asm #define barrier_nospec() -#endif +#endif /* CONFIG_PPC_BARRIER_NOSPEC */
#endif /* _ASM_POWERPC_BARRIER_H */ --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -42,7 +42,7 @@ void setup_barrier_nospec(void); void do_barrier_nospec_fixups(bool enable); extern bool barrier_nospec_enabled;
-#ifdef CONFIG_PPC_BOOK3S_64 +#ifdef CONFIG_PPC_BARRIER_NOSPEC void do_barrier_nospec_fixups_range(bool enable, void *start, void *end); #else static inline void do_barrier_nospec_fixups_range(bool enable, void *start, void *end) { }; --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -40,10 +40,11 @@ obj-$(CONFIG_PPC64) += setup_64.o sys_p obj-$(CONFIG_VDSO32) += vdso32/ obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_ppc970.o cpu_setup_pa6t.o -obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_power.o security.o +obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_power.o obj-$(CONFIG_PPC_BOOK3S_64) += mce.o mce_power.o obj64-$(CONFIG_RELOCATABLE) += reloc_64.o obj-$(CONFIG_PPC_BOOK3E_64) += exceptions-64e.o idle_book3e.o +obj-$(CONFIG_PPC_BARRIER_NOSPEC) += security.o obj-$(CONFIG_PPC64) += vdso64/ obj-$(CONFIG_ALTIVEC) += vecemu.o obj-$(CONFIG_PPC_970_NAP) += idle_power4.o --- a/arch/powerpc/kernel/module.c +++ b/arch/powerpc/kernel/module.c @@ -67,13 +67,15 @@ int module_finalize(const Elf_Ehdr *hdr, do_feature_fixups(powerpc_firmware_features, (void *)sect->sh_addr, (void *)sect->sh_addr + sect->sh_size); +#endif /* CONFIG_PPC64 */
+#ifdef CONFIG_PPC_BARRIER_NOSPEC sect = find_section(hdr, sechdrs, "__spec_barrier_fixup"); if (sect != NULL) do_barrier_nospec_fixups_range(barrier_nospec_enabled, (void *)sect->sh_addr, (void *)sect->sh_addr + sect->sh_size); -#endif +#endif /* CONFIG_PPC_BARRIER_NOSPEC */
sect = find_section(hdr, sechdrs, "__lwsync_fixup"); if (sect != NULL) --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -93,14 +93,16 @@ SECTIONS *(__rfi_flush_fixup) __stop___rfi_flush_fixup = .; } +#endif /* CONFIG_PPC64 */
+#ifdef CONFIG_PPC_BARRIER_NOSPEC . = ALIGN(8); __spec_barrier_fixup : AT(ADDR(__spec_barrier_fixup) - LOAD_OFFSET) { __start___barrier_nospec_fixup = .; *(__barrier_nospec_fixup) __stop___barrier_nospec_fixup = .; } -#endif +#endif /* CONFIG_PPC_BARRIER_NOSPEC */
EXCEPTION_TABLE(0)
--- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -301,6 +301,9 @@ void do_barrier_nospec_fixups_range(bool printk(KERN_DEBUG "barrier-nospec: patched %d locations\n", i); }
+#endif /* CONFIG_PPC_BOOK3S_64 */ + +#ifdef CONFIG_PPC_BARRIER_NOSPEC void do_barrier_nospec_fixups(bool enable) { void *start, *end; @@ -310,8 +313,7 @@ void do_barrier_nospec_fixups(bool enabl
do_barrier_nospec_fixups_range(enable, start, end); } - -#endif /* CONFIG_PPC_BOOK3S_64 */ +#endif /* CONFIG_PPC_BARRIER_NOSPEC */
void do_lwsync_fixups(unsigned long value, void *fixup_start, void *fixup_end) {
From: Michael Ellerman mpe@ellerman.id.au
commit af375eefbfb27cbb5b831984e66d724a40d26b5c upstream.
Currently we require platform code to call setup_barrier_nospec(). But if we add an empty definition for the !CONFIG_PPC_BARRIER_NOSPEC case, then we can call it in setup_arch().
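The empty static inline stub is a common kernel idiom. Generic code calls the hook unconditionally, and the compiler drops the call when the feature is configured out. Below is a minimal standalone sketch. CONFIG_DEMO_BARRIER_NOSPEC is a hypothetical macro standing in for CONFIG_PPC_BARRIER_NOSPEC, and the calls counter exists only to show whether the real body ran.

```c
#include <assert.h>

static int calls; /* counts how often the "real" implementation ran */

#ifdef CONFIG_DEMO_BARRIER_NOSPEC
static void setup_barrier_nospec(void) { calls++; }
#else
static inline void setup_barrier_nospec(void) { }
#endif

/* Generic code needs no #ifdef around the call. */
static void setup_arch(void)
{
	setup_barrier_nospec();
}
```

If this file is compiled without -DCONFIG_DEMO_BARRIER_NOSPEC, setup_arch() does nothing. With the macro defined, it runs the real body.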
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/setup.h | 4 ++++ arch/powerpc/kernel/setup_32.c | 2 ++ arch/powerpc/kernel/setup_64.c | 2 ++ arch/powerpc/platforms/powernv/setup.c | 1 - arch/powerpc/platforms/pseries/setup.c | 1 - 5 files changed, 8 insertions(+), 2 deletions(-)
--- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -38,7 +38,11 @@ enum l1d_flush_type {
void setup_rfi_flush(enum l1d_flush_type, bool enable); void do_rfi_flush_fixups(enum l1d_flush_type types); +#ifdef CONFIG_PPC_BARRIER_NOSPEC void setup_barrier_nospec(void); +#else +static inline void setup_barrier_nospec(void) { }; +#endif void do_barrier_nospec_fixups(bool enable); extern bool barrier_nospec_enabled;
--- a/arch/powerpc/kernel/setup_32.c +++ b/arch/powerpc/kernel/setup_32.c @@ -322,6 +322,8 @@ void __init setup_arch(char **cmdline_p) ppc_md.setup_arch(); if ( ppc_md.progress ) ppc_md.progress("arch: exit", 0x3eab);
+ setup_barrier_nospec(); + paging_init();
/* Initialize the MMU context management stuff */ --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -736,6 +736,8 @@ void __init setup_arch(char **cmdline_p) if (ppc_md.setup_arch) ppc_md.setup_arch();
+ setup_barrier_nospec(); + paging_init();
/* Initialize the MMU context management stuff */ --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -123,7 +123,6 @@ static void pnv_setup_rfi_flush(void) security_ftr_enabled(SEC_FTR_L1D_FLUSH_HV));
setup_rfi_flush(type, enable); - setup_barrier_nospec(); }
static void __init pnv_setup_arch(void) --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -574,7 +574,6 @@ void pseries_setup_rfi_flush(void) security_ftr_enabled(SEC_FTR_L1D_FLUSH_PR);
setup_rfi_flush(types, enable); - setup_barrier_nospec(); }
static void __init pSeries_setup_arch(void)
From: Diana Craciun diana.craciun@nxp.com
commit 406d2b6ae3420f5bb2b3db6986dc6f0b6dbb637b upstream.
In a subsequent patch we will enable building security.c for Book3E. However the NXP platforms are not vulnerable to Meltdown, so make the Meltdown vulnerability reporting PPC_BOOK3S_64 specific.
Signed-off-by: Diana Craciun diana.craciun@nxp.com [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/security.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -93,6 +93,7 @@ static __init int barrier_nospec_debugfs device_initcall(barrier_nospec_debugfs_init); #endif /* CONFIG_DEBUG_FS */
+#ifdef CONFIG_PPC_BOOK3S_64 ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) { bool thread_priv; @@ -125,6 +126,7 @@ ssize_t cpu_show_meltdown(struct device
return sprintf(buf, "Vulnerable\n"); } +#endif
ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, char *buf) {
From: Diana Craciun diana.craciun@nxp.com
commit ebcd1bfc33c7a90df941df68a6e5d4018c022fba upstream.
Implement barrier_nospec as an isync; sync instruction sequence. The implementation uses the infrastructure built for Book3S 64.
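On FSL Book3E, NOSPEC_BARRIER_SLOT reserves two words ("nop; nop"), so each fixup rewrites both words. In the patch, dest is an unsigned int *, which means dest + 1 addresses the next 4-byte instruction word. The sketch below models this in user space. The opcode values for SYNC and ISYNC come from the ppc-opcode.h hunk, and 0x60000000 is nop. patch_nospec_slot() is a hypothetical name, and plain stores replace patch_instruction().

```c
#include <assert.h>
#include <stdint.h>

#define PPC_INST_NOP   0x60000000u
#define PPC_INST_SYNC  0x7c0004acu
#define PPC_INST_ISYNC 0x4c00012cu

/* Rewrite one two-word slot: "isync; sync" when enabled, "nop; nop" otherwise. */
static void patch_nospec_slot(uint32_t *dest, int enable)
{
	dest[0] = enable ? PPC_INST_ISYNC : PPC_INST_NOP;
	dest[1] = enable ? PPC_INST_SYNC : PPC_INST_NOP; /* dest + 1 = next word */
}
```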
Signed-off-by: Diana Craciun diana.craciun@nxp.com [mpe: Add PPC_INST_ISYNC for backport] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/Kconfig | 2 +- arch/powerpc/include/asm/barrier.h | 8 +++++++- arch/powerpc/include/asm/ppc-opcode.h | 1 + arch/powerpc/lib/feature-fixups.c | 31 +++++++++++++++++++++++++++++++ 4 files changed, 40 insertions(+), 2 deletions(-)
--- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -165,7 +165,7 @@ config PPC config PPC_BARRIER_NOSPEC bool default y - depends on PPC_BOOK3S_64 + depends on PPC_BOOK3S_64 || PPC_FSL_BOOK3E
config GENERIC_CSUM def_bool CPU_LITTLE_ENDIAN --- a/arch/powerpc/include/asm/barrier.h +++ b/arch/powerpc/include/asm/barrier.h @@ -92,12 +92,18 @@ do { \ #define smp_mb__after_atomic() smp_mb() #define smp_mb__before_spinlock() smp_mb()
+#ifdef CONFIG_PPC_BOOK3S_64 +#define NOSPEC_BARRIER_SLOT nop +#elif defined(CONFIG_PPC_FSL_BOOK3E) +#define NOSPEC_BARRIER_SLOT nop; nop +#endif + #ifdef CONFIG_PPC_BARRIER_NOSPEC /* * Prevent execution of subsequent instructions until preceding branches have * been fully resolved and are no longer executing speculatively. */ -#define barrier_nospec_asm NOSPEC_BARRIER_FIXUP_SECTION; nop +#define barrier_nospec_asm NOSPEC_BARRIER_FIXUP_SECTION; NOSPEC_BARRIER_SLOT
// This also acts as a compiler barrier due to the memory clobber. #define barrier_nospec() asm (stringify_in_c(barrier_nospec_asm) ::: "memory") --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -147,6 +147,7 @@ #define PPC_INST_LWSYNC 0x7c2004ac #define PPC_INST_SYNC 0x7c0004ac #define PPC_INST_SYNC_MASK 0xfc0007fe +#define PPC_INST_ISYNC 0x4c00012c #define PPC_INST_LXVD2X 0x7c000698 #define PPC_INST_MCRXR 0x7c000400 #define PPC_INST_MCRXR_MASK 0xfc0007fe --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -315,6 +315,37 @@ void do_barrier_nospec_fixups(bool enabl } #endif /* CONFIG_PPC_BARRIER_NOSPEC */
+#ifdef CONFIG_PPC_FSL_BOOK3E +void do_barrier_nospec_fixups_range(bool enable, void *fixup_start, void *fixup_end) +{ + unsigned int instr[2], *dest; + long *start, *end; + int i; + + start = fixup_start; + end = fixup_end; + + instr[0] = PPC_INST_NOP; + instr[1] = PPC_INST_NOP; + + if (enable) { + pr_info("barrier-nospec: using isync; sync as speculation barrier\n"); + instr[0] = PPC_INST_ISYNC; + instr[1] = PPC_INST_SYNC; + } + + for (i = 0; start < end; start++, i++) { + dest = (void *)start + *start; + + pr_devel("patching dest %lx\n", (unsigned long)dest); + patch_instruction(dest, instr[0]); + patch_instruction(dest + 1, instr[1]); + } + + printk(KERN_DEBUG "barrier-nospec: patched %d locations\n", i); +} +#endif /* CONFIG_PPC_FSL_BOOK3E */ + void do_lwsync_fixups(unsigned long value, void *fixup_start, void *fixup_end) { long *start, *end;
From: Michael Ellerman mpe@ellerman.id.au
commit 06d0bbc6d0f56dacac3a79900e9a9a0d5972d818 upstream.
Add a macro and some helper C functions for patching single asm instructions.
The gas macro means we can do something like:
  1:  nop
      patch_site 1b, patch__foo
This is less visually distracting than defining a GLOBAL symbol at 1, and it also doesn't pollute the symbol table, which can confuse e.g. perf.
These are obviously similar to our existing feature sections, but they are not automatically patched based on CPU/MMU features; rather, they are designed to be patched manually by C code at some arbitrary point.
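patch_site stores ".4byte \label - ." in a 32-bit word. patch_instruction_site() recovers the target by adding that signed offset to the address of the word itself. The sketch below reproduces the arithmetic in user space. The instruction words and the site word are placed in one struct so that their distance fits in an s32. demo_image, record_site() and the other names are hypothetical, and a plain store replaces patch_instruction().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical image: the site word lives after the code it points back into. */
struct demo_image {
	uint32_t text[4];
	int32_t site;
};

/* Equivalent of ".4byte \label - ." emitted by the patch_site macro. */
static void record_site(int32_t *site, uint32_t *label)
{
	*site = (int32_t)((char *)label - (char *)site);
}

/* Same arithmetic as patch_instruction_site(): site address plus stored offset. */
static uint32_t *site_to_addr(int32_t *site)
{
	return (uint32_t *)((uintptr_t)site + (intptr_t)*site);
}

static void patch_instruction_site_demo(int32_t *site, uint32_t instr)
{
	*site_to_addr(site) = instr;
}
```

Here the offset is negative, because text[] precedes site. That case works because the signed value wraps correctly when it is added as an unsigned address, just as the kernel's (unsigned long)site + *site does.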
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/code-patching-asm.h | 18 ++++++++++++++++++ arch/powerpc/include/asm/code-patching.h | 2 ++ arch/powerpc/lib/code-patching.c | 16 ++++++++++++++++ 3 files changed, 36 insertions(+) create mode 100644 arch/powerpc/include/asm/code-patching-asm.h
--- /dev/null +++ b/arch/powerpc/include/asm/code-patching-asm.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +/* + * Copyright 2018, Michael Ellerman, IBM Corporation. + */ +#ifndef _ASM_POWERPC_CODE_PATCHING_ASM_H +#define _ASM_POWERPC_CODE_PATCHING_ASM_H + +/* Define a "site" that can be patched */ +.macro patch_site label name + .pushsection ".rodata" + .balign 4 + .global \name +\name: + .4byte \label - . + .popsection +.endm + +#endif /* _ASM_POWERPC_CODE_PATCHING_ASM_H */ --- a/arch/powerpc/include/asm/code-patching.h +++ b/arch/powerpc/include/asm/code-patching.h @@ -28,6 +28,8 @@ unsigned int create_cond_branch(const un unsigned long target, int flags); int patch_branch(unsigned int *addr, unsigned long target, int flags); int patch_instruction(unsigned int *addr, unsigned int instr); +int patch_instruction_site(s32 *addr, unsigned int instr); +int patch_branch_site(s32 *site, unsigned long target, int flags);
int instr_is_relative_branch(unsigned int instr); int instr_is_branch_to_addr(const unsigned int *instr, unsigned long addr); --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -32,6 +32,22 @@ int patch_branch(unsigned int *addr, uns return patch_instruction(addr, create_branch(addr, target, flags)); }
+int patch_branch_site(s32 *site, unsigned long target, int flags) +{ + unsigned int *addr; + + addr = (unsigned int *)((unsigned long)site + *site); + return patch_instruction(addr, create_branch(addr, target, flags)); +} + +int patch_instruction_site(s32 *site, unsigned int instr) +{ + unsigned int *addr; + + addr = (unsigned int *)((unsigned long)site + *site); + return patch_instruction(addr, instr); +} + unsigned int create_branch(const unsigned int *addr, unsigned long target, int flags) {
From: Michael Ellerman mpe@ellerman.id.au
commit dc8c6cce9a26a51fc19961accb978217a3ba8c75 upstream.
Add security feature flags to indicate the need for software to flush the count cache on context switch, and for the presence of a hardware assisted count cache flush.
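The two new flags take free bits (0x400 and 0x800) in the same 64-bit mask as the existing ones. The sketch below is a simplified user-space model of setting, clearing and testing them. The flag values come from the hunk that follows. The helper names are hypothetical, and a flag counts as enabled whenever its bit is set.

```c
#include <assert.h>
#include <stdbool.h>

#define SEC_FTR_COUNT_CACHE_DISABLED 0x0000000000000020ull
#define SEC_FTR_FAVOUR_SECURITY      0x0000000000000200ull
#define SEC_FTR_FLUSH_COUNT_CACHE    0x0000000000000400ull
#define SEC_FTR_BCCTR_FLUSH_ASSIST   0x0000000000000800ull

static unsigned long long security_features;

static void ftr_set(unsigned long long f)   { security_features |= f; }
static void ftr_clear(unsigned long long f) { security_features &= ~f; }

static bool ftr_enabled(unsigned long long f)
{
	return !!(security_features & f);
}
```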
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/security_features.h | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/arch/powerpc/include/asm/security_features.h +++ b/arch/powerpc/include/asm/security_features.h @@ -59,6 +59,9 @@ static inline bool security_ftr_enabled( // Indirect branch prediction cache disabled #define SEC_FTR_COUNT_CACHE_DISABLED 0x0000000000000020ull
+// bcctr 2,0,0 triggers a hardware assisted count cache flush +#define SEC_FTR_BCCTR_FLUSH_ASSIST 0x0000000000000800ull +
// Features indicating need for Spectre/Meltdown mitigations
@@ -74,6 +77,9 @@ static inline bool security_ftr_enabled( // Firmware configuration indicates user favours security over performance #define SEC_FTR_FAVOUR_SECURITY 0x0000000000000200ull
+// Software required to flush count cache on context switch +#define SEC_FTR_FLUSH_COUNT_CACHE 0x0000000000000400ull +
// Features enabled by default #define SEC_FTR_DEFAULT \
From: Michael Ellerman mpe@ellerman.id.au
commit ee13cb249fabdff8b90aaff61add347749280087 upstream.
Some CPU revisions support a mode where the count cache needs to be flushed by software on context switch. Additionally some revisions may have a hardware accelerated flush, in which case the software flush sequence can be shortened.
If we detect the appropriate flag from firmware we patch a branch into _switch() which takes us to a count cache flush sequence.
That sequence in turn may be patched to return early if we detect that the CPU supports accelerating the flush sequence in hardware.
Add debugfs support for reporting the state of the flush, as well as for disabling it at runtime.
And modify the spectre_v2 sysfs file to report the state of the software flush.
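The selection described above can be sketched as a pure function. If no flush is needed, the call site in _switch() stays a nop. Otherwise it branches to flush_count_cache, and with hardware assist the return site becomes an early return after the first "bcctr 2,0,0". The enum values match the security.c hunk. The patch_plan struct, the function name and its boolean inputs are hypothetical. The real code acts on the two patch sites through patch_branch_site() and patch_instruction_site().

```c
#include <assert.h>
#include <stdbool.h>

enum count_cache_flush_type {
	COUNT_CACHE_FLUSH_NONE = 0x1,
	COUNT_CACHE_FLUSH_SW   = 0x2,
	COUNT_CACHE_FLUSH_HW   = 0x4,
};

/* Hypothetical record of what would be patched at each site. */
struct patch_plan {
	bool call_site_branches; /* patch__call_flush_count_cache: branch vs nop */
	bool return_early;       /* patch__flush_count_cache_return: blr vs nop */
};

static enum count_cache_flush_type
select_count_cache_flush(bool enable, bool need_flush, bool hw_assist,
			 struct patch_plan *plan)
{
	plan->call_site_branches = false;
	plan->return_early = false;

	if (!enable || !need_flush)
		return COUNT_CACHE_FLUSH_NONE;  /* leave a nop in _switch() */

	plan->call_site_branches = true;        /* branch to flush_count_cache */
	if (!hw_assist)
		return COUNT_CACHE_FLUSH_SW;    /* run the full software sequence */

	plan->return_early = true;              /* stop after the first bcctr 2,0,0 */
	return COUNT_CACHE_FLUSH_HW;
}
```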
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/asm-prototypes.h | 21 +++++ arch/powerpc/include/asm/security_features.h | 1 arch/powerpc/kernel/entry_64.S | 54 ++++++++++++++ arch/powerpc/kernel/security.c | 98 +++++++++++++++++++++++++-- 4 files changed, 169 insertions(+), 5 deletions(-) create mode 100644 arch/powerpc/include/asm/asm-prototypes.h
--- /dev/null +++ b/arch/powerpc/include/asm/asm-prototypes.h @@ -0,0 +1,21 @@ +#ifndef _ASM_POWERPC_ASM_PROTOTYPES_H +#define _ASM_POWERPC_ASM_PROTOTYPES_H +/* + * This file is for prototypes of C functions that are only called + * from asm, and any associated variables. + * + * Copyright 2016, Daniel Axtens, IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + */ + +/* Patch sites */ +extern s32 patch__call_flush_count_cache; +extern s32 patch__flush_count_cache_return; + +extern long flush_count_cache; + +#endif /* _ASM_POWERPC_ASM_PROTOTYPES_H */ --- a/arch/powerpc/include/asm/security_features.h +++ b/arch/powerpc/include/asm/security_features.h @@ -22,6 +22,7 @@ enum stf_barrier_type {
void setup_stf_barrier(void); void do_stf_barrier_fixups(enum stf_barrier_type types); +void setup_count_cache_flush(void);
static inline void security_ftr_set(unsigned long feature) { --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -25,6 +25,7 @@ #include <asm/page.h> #include <asm/mmu.h> #include <asm/thread_info.h> +#include <asm/code-patching-asm.h> #include <asm/ppc_asm.h> #include <asm/asm-offsets.h> #include <asm/cputable.h> @@ -450,6 +451,57 @@ _GLOBAL(ret_from_kernel_thread) li r3,0 b .Lsyscall_exit
+#ifdef CONFIG_PPC_BOOK3S_64 + +#define FLUSH_COUNT_CACHE \ +1: nop; \ + patch_site 1b, patch__call_flush_count_cache + + +#define BCCTR_FLUSH .long 0x4c400420 + +.macro nops number + .rept \number + nop + .endr +.endm + +.balign 32 +.global flush_count_cache +flush_count_cache: + /* Save LR into r9 */ + mflr r9 + + .rept 64 + bl .+4 + .endr + b 1f + nops 6 + + .balign 32 + /* Restore LR */ +1: mtlr r9 + li r9,0x7fff + mtctr r9 + + BCCTR_FLUSH + +2: nop + patch_site 2b patch__flush_count_cache_return + + nops 3 + + .rept 278 + .balign 32 + BCCTR_FLUSH + nops 7 + .endr + + blr +#else +#define FLUSH_COUNT_CACHE +#endif /* CONFIG_PPC_BOOK3S_64 */ + /* * This routine switches between two different tasks. The process * state of one is saved on its kernel stack. Then the state @@ -513,6 +565,8 @@ BEGIN_FTR_SECTION END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) #endif
+ FLUSH_COUNT_CACHE + #ifdef CONFIG_SMP /* We need a sync somewhere here to make sure that if the * previous task gets rescheduled on another CPU, it sees all --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -10,12 +10,21 @@ #include <linux/seq_buf.h>
#include <asm/debug.h> +#include <asm/asm-prototypes.h> +#include <asm/code-patching.h> #include <asm/security_features.h> #include <asm/setup.h>
unsigned long powerpc_security_features __read_mostly = SEC_FTR_DEFAULT;
+enum count_cache_flush_type { + COUNT_CACHE_FLUSH_NONE = 0x1, + COUNT_CACHE_FLUSH_SW = 0x2, + COUNT_CACHE_FLUSH_HW = 0x4, +}; +static enum count_cache_flush_type count_cache_flush_type; + bool barrier_nospec_enabled; static bool no_nospec;
@@ -160,17 +169,29 @@ ssize_t cpu_show_spectre_v2(struct devic bcs = security_ftr_enabled(SEC_FTR_BCCTRL_SERIALISED); ccd = security_ftr_enabled(SEC_FTR_COUNT_CACHE_DISABLED);
- if (bcs || ccd) { + if (bcs || ccd || count_cache_flush_type != COUNT_CACHE_FLUSH_NONE) { + bool comma = false; seq_buf_printf(&s, "Mitigation: ");
- if (bcs) + if (bcs) { seq_buf_printf(&s, "Indirect branch serialisation (kernel only)"); + comma = true; + } + + if (ccd) { + if (comma) + seq_buf_printf(&s, ", "); + seq_buf_printf(&s, "Indirect branch cache disabled"); + comma = true; + }
- if (bcs && ccd) + if (comma) seq_buf_printf(&s, ", ");
- if (ccd) - seq_buf_printf(&s, "Indirect branch cache disabled"); + seq_buf_printf(&s, "Software count cache flush"); + + if (count_cache_flush_type == COUNT_CACHE_FLUSH_HW) + seq_buf_printf(&s, "(hardware accelerated)"); } else seq_buf_printf(&s, "Vulnerable");
@@ -325,4 +346,71 @@ static __init int stf_barrier_debugfs_in } device_initcall(stf_barrier_debugfs_init); #endif /* CONFIG_DEBUG_FS */ + +static void toggle_count_cache_flush(bool enable) +{ + if (!enable || !security_ftr_enabled(SEC_FTR_FLUSH_COUNT_CACHE)) { + patch_instruction_site(&patch__call_flush_count_cache, PPC_INST_NOP); + count_cache_flush_type = COUNT_CACHE_FLUSH_NONE; + pr_info("count-cache-flush: software flush disabled.\n"); + return; + } + + patch_branch_site(&patch__call_flush_count_cache, + (u64)&flush_count_cache, BRANCH_SET_LINK); + + if (!security_ftr_enabled(SEC_FTR_BCCTR_FLUSH_ASSIST)) { + count_cache_flush_type = COUNT_CACHE_FLUSH_SW; + pr_info("count-cache-flush: full software flush sequence enabled.\n"); + return; + } + + patch_instruction_site(&patch__flush_count_cache_return, PPC_INST_BLR); + count_cache_flush_type = COUNT_CACHE_FLUSH_HW; + pr_info("count-cache-flush: hardware assisted flush sequence enabled\n"); +} + +void setup_count_cache_flush(void) +{ + toggle_count_cache_flush(true); +} + +#ifdef CONFIG_DEBUG_FS +static int count_cache_flush_set(void *data, u64 val) +{ + bool enable; + + if (val == 1) + enable = true; + else if (val == 0) + enable = false; + else + return -EINVAL; + + toggle_count_cache_flush(enable); + + return 0; +} + +static int count_cache_flush_get(void *data, u64 *val) +{ + if (count_cache_flush_type == COUNT_CACHE_FLUSH_NONE) + *val = 0; + else + *val = 1; + + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(fops_count_cache_flush, count_cache_flush_get, + count_cache_flush_set, "%llu\n"); + +static __init int count_cache_flush_debugfs_init(void) +{ + debugfs_create_file("count_cache_flush", 0600, powerpc_debugfs_root, + NULL, &fops_count_cache_flush); + return 0; +} +device_initcall(count_cache_flush_debugfs_init); +#endif /* CONFIG_DEBUG_FS */ #endif /* CONFIG_PPC_BOOK3S_64 */
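Here is a minimal userspace sketch of the three-way selection in toggle_count_cache_flush(). It patches no instructions. It only returns the enum value the kernel would record. The function name select_count_cache_flush() and its boolean parameters are hypothetical stand-ins for the SEC_FTR_FLUSH_COUNT_CACHE and SEC_FTR_BCCTR_FLUSH_ASSIST feature checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the enum added to security.c above. */
enum count_cache_flush_type {
	COUNT_CACHE_FLUSH_NONE = 0x1,
	COUNT_CACHE_FLUSH_SW   = 0x2,
	COUNT_CACHE_FLUSH_HW   = 0x4,
};

/*
 * Hypothetical model of toggle_count_cache_flush().
 * flush_needed stands in for SEC_FTR_FLUSH_COUNT_CACHE,
 * hw_assist stands in for SEC_FTR_BCCTR_FLUSH_ASSIST.
 */
static enum count_cache_flush_type
select_count_cache_flush(bool enable, bool flush_needed, bool hw_assist)
{
	/* Disabled (e.g. via debugfs) or not needed: the call site in
	 * _switch() is patched back to a nop. */
	if (!enable || !flush_needed)
		return COUNT_CACHE_FLUSH_NONE;

	/* Branch to flush_count_cache is patched in; without the assist
	 * the full software sequence runs. */
	if (!hw_assist)
		return COUNT_CACHE_FLUSH_SW;

	/* With the assist, the nop after the first BCCTR_FLUSH is
	 * patched to blr, so the sequence returns early. */
	return COUNT_CACHE_FLUSH_HW;
}
```

If debugfs is mounted at /sys/kernel/debug, the file created above should appear as /sys/kernel/debug/powerpc/count_cache_flush. Writing 0 corresponds to the enable == false case. Writing 1 re-runs the selection with enable == true, so the firmware-provided feature flags still decide between the SW and HW variants.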
From: Michael Ellerman mpe@ellerman.id.au
commit ba72dc171954b782a79d25e0f4b3ed91090c3b1e upstream.
Use the existing hypercall to determine the appropriate settings for the count cache flush, and then call the generic powerpc code to set it up based on the security feature flags.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/hvcall.h | 2 ++ arch/powerpc/platforms/pseries/setup.c | 7 +++++++ 2 files changed, 9 insertions(+)
--- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -295,10 +295,12 @@ #define H_CPU_CHAR_BRANCH_HINTS_HONORED (1ull << 58) // IBM bit 5 #define H_CPU_CHAR_THREAD_RECONFIG_CTRL (1ull << 57) // IBM bit 6 #define H_CPU_CHAR_COUNT_CACHE_DISABLED (1ull << 56) // IBM bit 7 +#define H_CPU_CHAR_BCCTR_FLUSH_ASSIST (1ull << 54) // IBM bit 9
#define H_CPU_BEHAV_FAVOUR_SECURITY (1ull << 63) // IBM bit 0 #define H_CPU_BEHAV_L1D_FLUSH_PR (1ull << 62) // IBM bit 1 #define H_CPU_BEHAV_BNDS_CHK_SPEC_BAR (1ull << 61) // IBM bit 2 +#define H_CPU_BEHAV_FLUSH_COUNT_CACHE (1ull << 58) // IBM bit 5
#ifndef __ASSEMBLY__ #include <linux/types.h> --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -524,6 +524,12 @@ static void init_cpu_char_feature_flags( if (result->character & H_CPU_CHAR_COUNT_CACHE_DISABLED) security_ftr_set(SEC_FTR_COUNT_CACHE_DISABLED);
+ if (result->character & H_CPU_CHAR_BCCTR_FLUSH_ASSIST) + security_ftr_set(SEC_FTR_BCCTR_FLUSH_ASSIST); + + if (result->behaviour & H_CPU_BEHAV_FLUSH_COUNT_CACHE) + security_ftr_set(SEC_FTR_FLUSH_COUNT_CACHE); + /* * The features below are enabled by default, so we instead look to see * if firmware has *disabled* them, and clear them if so. @@ -574,6 +580,7 @@ void pseries_setup_rfi_flush(void) security_ftr_enabled(SEC_FTR_L1D_FLUSH_PR);
setup_rfi_flush(types, enable); + setup_count_cache_flush(); }
static void __init pSeries_setup_arch(void)
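The comments in hvcall.h number each flag using IBM (MSB-0) bit numbering, where IBM bit 0 is the most significant bit of the 64-bit word. The sketch below checks that conversion for the two new flags and mirrors the two new tests in init_cpu_char_feature_flags(). It is a hypothetical userspace helper, not kernel code, and the FTR_* values are made-up stand-ins for the SEC_FTR_* flags.

```c
#include <assert.h>
#include <stdint.h>

/* IBM (MSB-0) numbering: IBM bit n is 1ull << (63 - n). Hypothetical helper. */
static inline uint64_t ibm_bit(unsigned int n)
{
	return 1ull << (63 - n);
}

#define H_CPU_CHAR_BCCTR_FLUSH_ASSIST (1ull << 54) /* IBM bit 9 */
#define H_CPU_BEHAV_FLUSH_COUNT_CACHE (1ull << 58) /* IBM bit 5 */

/* Hypothetical stand-ins for SEC_FTR_BCCTR_FLUSH_ASSIST / SEC_FTR_FLUSH_COUNT_CACHE. */
#define FTR_BCCTR_FLUSH_ASSIST (1u << 0)
#define FTR_FLUSH_COUNT_CACHE  (1u << 1)

/* Sketch of the two checks added to init_cpu_char_feature_flags(). */
static unsigned int map_count_cache_bits(uint64_t character, uint64_t behaviour)
{
	unsigned int ftr = 0;

	if (character & H_CPU_CHAR_BCCTR_FLUSH_ASSIST)
		ftr |= FTR_BCCTR_FLUSH_ASSIST;
	if (behaviour & H_CPU_BEHAV_FLUSH_COUNT_CACHE)
		ftr |= FTR_FLUSH_COUNT_CACHE;
	return ftr;
}
```

The "character" and "behaviour" words are separate, so the same bit position means different things in each. Bit 58 in the character word is H_CPU_CHAR_BRANCH_HINTS_HONORED, not the count cache flush request.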
From: Michael Ellerman mpe@ellerman.id.au
commit 99d54754d3d5f896a8f616b0b6520662bc99d66b upstream.
Look for fw-features properties to determine the appropriate settings for the count cache flush, and then call the generic powerpc code to set it up based on the security feature flags.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/platforms/powernv/setup.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -77,6 +77,12 @@ static void init_fw_feat_flags(struct de if (fw_feature_is("enabled", "fw-count-cache-disabled", np)) security_ftr_set(SEC_FTR_COUNT_CACHE_DISABLED);
+ if (fw_feature_is("enabled", "fw-count-cache-flush-bcctr2,0,0", np)) + security_ftr_set(SEC_FTR_BCCTR_FLUSH_ASSIST); + + if (fw_feature_is("enabled", "needs-count-cache-flush-on-context-switch", np)) + security_ftr_set(SEC_FTR_FLUSH_COUNT_CACHE); + /* * The features below are enabled by default, so we instead look to see * if firmware has *disabled* them, and clear them if so. @@ -123,6 +129,7 @@ static void pnv_setup_rfi_flush(void) security_ftr_enabled(SEC_FTR_L1D_FLUSH_HV));
setup_rfi_flush(type, enable); + setup_count_cache_flush(); }
static void __init pnv_setup_arch(void)
From: Michael Neuling mikey@neuling.org
commit 51c3c62b58b357e8d35e4cc32f7b4ec907426fe3 upstream.
This stops us from doing code patching in init sections after they've been freed.
In this chain: kvm_guest_init() -> kvm_use_magic_page() -> fault_in_pages_readable() -> __get_user() -> __get_user_nocheck() -> barrier_nospec();
We have a code patching location at barrier_nospec() and kvm_guest_init() is an init function. This whole chain gets inlined, so when we free the init section (hence kvm_guest_init()), this code goes away and hence should no longer be patched.
We saw this as userspace memory corruption when using a memory checker while doing partition migration testing on PowerVM (migration starts the code patching afterwards, via /sys/kernel/mobility/migration). In theory, it could also happen when using /sys/kernel/debug/powerpc/barrier_nospec.
Signed-off-by: Michael Neuling mikey@neuling.org Reviewed-by: Nicholas Piggin npiggin@gmail.com Reviewed-by: Christophe Leroy christophe.leroy@c-s.fr Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/setup.h | 1 + arch/powerpc/lib/code-patching.c | 13 +++++++++++++ arch/powerpc/mm/mem.c | 2 ++ 3 files changed, 16 insertions(+)
--- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -8,6 +8,7 @@ extern void ppc_printk_progress(char *s,
extern unsigned int rtas_data; extern unsigned long long memory_limit; +extern bool init_mem_is_free; extern unsigned long klimit; extern void *zalloc_maybe_bootmem(size_t size, gfp_t mask);
--- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -14,12 +14,25 @@ #include <asm/page.h> #include <asm/code-patching.h> #include <asm/uaccess.h> +#include <asm/setup.h> +#include <asm/sections.h>
+static inline bool is_init(unsigned int *addr) +{ + return addr >= (unsigned int *)__init_begin && addr < (unsigned int *)__init_end; +} + int patch_instruction(unsigned int *addr, unsigned int instr) { int err;
+ /* Make sure we aren't patching a freed init section */ + if (init_mem_is_free && is_init(addr)) { + pr_debug("Skipping init section patching addr: 0x%px\n", addr); + return 0; + } + __put_user_size(instr, addr, 4, err); if (err) return err; --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -62,6 +62,7 @@ #endif
unsigned long long memory_limit; +bool init_mem_is_free;
#ifdef CONFIG_HIGHMEM pte_t *kmap_pte; @@ -381,6 +382,7 @@ void __init mem_init(void) void free_initmem(void) { ppc_md.progress = ppc_printk_progress; + init_mem_is_free = true; free_initmem_default(POISON_FREE_INITMEM); }
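The guard added to patch_instruction() can be modelled in userspace as follows. text[] stands in for kernel text, and its upper half plays the role of __init_begin..__init_end. The poison pattern copies the 0xcc byte of POISON_FREE_INITMEM. Every name here is a hypothetical stand-in, and the sketch only illustrates the ordering problem the patch fixes.

```c
#include <assert.h>
#include <stdbool.h>

#define PPC_INST_NOP 0x60000000u

/* Toy "kernel text"; [INIT_BEGIN, INIT_END) models the init section. */
static unsigned int text[16];
#define INIT_BEGIN (&text[8])
#define INIT_END   (&text[16])

static bool init_mem_is_free;

static bool is_init(unsigned int *addr)
{
	return addr >= INIT_BEGIN && addr < INIT_END;
}

/* Like the patched version: silently skip freed init text. */
static int patch_instruction(unsigned int *addr, unsigned int instr)
{
	if (init_mem_is_free && is_init(addr))
		return 0;	/* this memory may now belong to someone else */
	*addr = instr;
	return 0;
}

/* Models free_initmem(): set the flag, then poison the init range. */
static void free_initmem(void)
{
	init_mem_is_free = true;
	for (unsigned int *p = INIT_BEGIN; p < INIT_END; p++)
		*p = 0xccccccccu;	/* POISON_FREE_INITMEM byte pattern */
}
```

Without the check, a later barrier_nospec re-patch would write into memory that had already been handed back to the allocator. That matches the userspace corruption described above.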
From: Diana Craciun diana.craciun@nxp.com
commit 76a5eaa38b15dda92cd6964248c39b5a6f3a4e9d upstream.
In order to protect against speculation attacks (Spectre variant 2) on NXP PowerPC platforms, the branch predictor should be flushed when the privilege level is changed. This patch adds the infrastructure to fix up, at runtime, the code sections that perform the branch predictor flush, depending on a boot parameter that is added later in a separate patch.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/feature-fixups.h | 12 ++++++++++++ arch/powerpc/include/asm/setup.h | 2 ++ arch/powerpc/kernel/vmlinux.lds.S | 8 ++++++++ arch/powerpc/lib/feature-fixups.c | 23 +++++++++++++++++++++++ 4 files changed, 45 insertions(+)
--- a/arch/powerpc/include/asm/feature-fixups.h +++ b/arch/powerpc/include/asm/feature-fixups.h @@ -216,6 +216,17 @@ label##3: \ FTR_ENTRY_OFFSET 953b-954b; \ .popsection;
+#define START_BTB_FLUSH_SECTION \ +955: \ + +#define END_BTB_FLUSH_SECTION \ +956: \ + .pushsection __btb_flush_fixup,"a"; \ + .align 2; \ +957: \ + FTR_ENTRY_OFFSET 955b-957b; \ + FTR_ENTRY_OFFSET 956b-957b; \ + .popsection;
#ifndef __ASSEMBLY__
@@ -224,6 +235,7 @@ extern long __start___stf_entry_barrier_ extern long __start___stf_exit_barrier_fixup, __stop___stf_exit_barrier_fixup; extern long __start___rfi_flush_fixup, __stop___rfi_flush_fixup; extern long __start___barrier_nospec_fixup, __stop___barrier_nospec_fixup; +extern long __start__btb_flush_fixup, __stop__btb_flush_fixup;
#endif
--- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -53,6 +53,8 @@ void do_barrier_nospec_fixups_range(bool static inline void do_barrier_nospec_fixups_range(bool enable, void *start, void *end) { }; #endif
+void do_btb_flush_fixups(void); + #endif /* !__ASSEMBLY__ */
#endif /* _ASM_POWERPC_SETUP_H */ --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -104,6 +104,14 @@ SECTIONS } #endif /* CONFIG_PPC_BARRIER_NOSPEC */
+#ifdef CONFIG_PPC_FSL_BOOK3E + . = ALIGN(8); + __spec_btb_flush_fixup : AT(ADDR(__spec_btb_flush_fixup) - LOAD_OFFSET) { + __start__btb_flush_fixup = .; + *(__btb_flush_fixup) + __stop__btb_flush_fixup = .; + } +#endif EXCEPTION_TABLE(0)
NOTES :kernel :notes --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -344,6 +344,29 @@ void do_barrier_nospec_fixups_range(bool
printk(KERN_DEBUG "barrier-nospec: patched %d locations\n", i); } + +static void patch_btb_flush_section(long *curr) +{ + unsigned int *start, *end; + + start = (void *)curr + *curr; + end = (void *)curr + *(curr + 1); + for (; start < end; start++) { + pr_devel("patching dest %lx\n", (unsigned long)start); + patch_instruction(start, PPC_INST_NOP); + } +} + +void do_btb_flush_fixups(void) +{ + long *start, *end; + + start = PTRRELOC(&__start__btb_flush_fixup); + end = PTRRELOC(&__stop__btb_flush_fixup); + + for (; start < end; start += 2) + patch_btb_flush_section(start); +} #endif /* CONFIG_PPC_FSL_BOOK3E */
void do_lwsync_fixups(unsigned long value, void *fixup_start, void *fixup_end)
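Each __btb_flush_fixup entry emitted by END_BTB_FLUSH_SECTION holds two offsets: the start (label 955) and the end (label 956) of a flush sequence. Both are relative to the entry itself (label 957). patch_btb_flush_section() turns them back into addresses and overwrites the range with nops. The userspace sketch below uses a toy struct image that holds both the "text" and a single table entry, so that all offsets stay within one object. The struct and add_fixup() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PPC_INST_NOP 0x60000000u

/* Toy image: some "instructions" followed by one fixup entry. */
struct image {
	unsigned int text[8];
	long fixup[2];		/* {start - entry, end - entry} */
};

/* Same arithmetic as patch_btb_flush_section() above. */
static void patch_btb_flush_section(long *curr)
{
	unsigned int *start = (unsigned int *)((char *)curr + curr[0]);
	unsigned int *end   = (unsigned int *)((char *)curr + curr[1]);

	for (; start < end; start++)
		*start = PPC_INST_NOP;
}

/* Hypothetical helper: record text[first..last) as self-relative offsets. */
static void add_fixup(struct image *img, size_t first, size_t last)
{
	char *entry = (char *)img->fixup;

	img->fixup[0] = (char *)&img->text[first] - entry;
	img->fixup[1] = (char *)&img->text[last] - entry;
}
```

do_btb_flush_fixups() walks the real table two longs at a time (start += 2), with one pair per flushed section. Self-relative offsets keep the table valid wherever the kernel is loaded. This is presumably also why the table bounds are read through PTRRELOC().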
From: Diana Craciun diana.craciun@nxp.com
commit 1cbf8990d79ff69da8ad09e8a3df014e1494462b upstream.
The BUCSR register can be used to invalidate the entries in the branch prediction mechanisms.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/ppc_asm.h | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/arch/powerpc/include/asm/ppc_asm.h +++ b/arch/powerpc/include/asm/ppc_asm.h @@ -821,4 +821,15 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,C .long 0x2400004c /* rfid */ #endif /* !CONFIG_PPC_BOOK3E */ #endif /* __ASSEMBLY__ */ + +#ifdef CONFIG_PPC_FSL_BOOK3E +#define BTB_FLUSH(reg) \ + lis reg,BUCSR_INIT@h; \ + ori reg,reg,BUCSR_INIT@l; \ + mtspr SPRN_BUCSR,reg; \ + isync; +#else +#define BTB_FLUSH(reg) +#endif /* CONFIG_PPC_FSL_BOOK3E */ + #endif /* _ASM_POWERPC_PPC_ASM_H */
From: Diana Craciun diana.craciun@nxp.com
commit 7d8bad99ba5a22892f0cad6881289fdc3875a930 upstream.
Currently for CONFIG_PPC_FSL_BOOK3E the spectre_v2 file is incorrect:
$ cat /sys/devices/system/cpu/vulnerabilities/spectre_v2 "Mitigation: Software count cache flush"
Which is wrong. Fix it to report vulnerable for now.
Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/security.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -23,7 +23,7 @@ enum count_cache_flush_type { COUNT_CACHE_FLUSH_SW = 0x2, COUNT_CACHE_FLUSH_HW = 0x4, }; -static enum count_cache_flush_type count_cache_flush_type; +static enum count_cache_flush_type count_cache_flush_type = COUNT_CACHE_FLUSH_NONE;
bool barrier_nospec_enabled; static bool no_nospec;
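The bug comes down to C's rule that objects with static storage start out zero-initialised. COUNT_CACHE_FLUSH_NONE is 0x1, not 0. Platforms that never call setup_count_cache_flush() never update the variable. Those are the ones other than pseries and powernv, such as CONFIG_PPC_FSL_BOOK3E. For them it stays 0, and the "!= COUNT_CACHE_FLUSH_NONE" test in cpu_show_spectre_v2() is true. The variable names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum count_cache_flush_type {
	COUNT_CACHE_FLUSH_NONE = 0x1,
	COUNT_CACHE_FLUSH_SW   = 0x2,
	COUNT_CACHE_FLUSH_HW   = 0x4,
};

/* Before the fix: implicitly zero, which matches no enumerator. */
static enum count_cache_flush_type before_fix;
/* After the fix: explicitly starts in the NONE state. */
static enum count_cache_flush_type after_fix = COUNT_CACHE_FLUSH_NONE;

/* The condition cpu_show_spectre_v2() used to claim a software flush. */
static bool reports_sw_flush(enum count_cache_flush_type t)
{
	return t != COUNT_CACHE_FLUSH_NONE;
}
```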
From: Diana Craciun diana.craciun@nxp.com
commit f633a8ad636efb5d4bba1a047d4a0f1ef719aa06 upstream.
When the command line argument is present, the Spectre variant 2 mitigations are disabled.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/setup.h | 5 +++++ arch/powerpc/kernel/security.c | 21 +++++++++++++++++++++ 2 files changed, 26 insertions(+)
--- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -53,6 +53,11 @@ void do_barrier_nospec_fixups_range(bool static inline void do_barrier_nospec_fixups_range(bool enable, void *start, void *end) { }; #endif
+#ifdef CONFIG_PPC_FSL_BOOK3E +void setup_spectre_v2(void); +#else +static inline void setup_spectre_v2(void) {}; +#endif void do_btb_flush_fixups(void);
#endif /* !__ASSEMBLY__ */ --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -27,6 +27,10 @@ static enum count_cache_flush_type count
bool barrier_nospec_enabled; static bool no_nospec; +static bool btb_flush_enabled; +#ifdef CONFIG_PPC_FSL_BOOK3E +static bool no_spectrev2; +#endif
static void enable_barrier_nospec(bool enable) { @@ -102,6 +106,23 @@ static __init int barrier_nospec_debugfs device_initcall(barrier_nospec_debugfs_init); #endif /* CONFIG_DEBUG_FS */
+#ifdef CONFIG_PPC_FSL_BOOK3E +static int __init handle_nospectre_v2(char *p) +{ + no_spectrev2 = true; + + return 0; +} +early_param("nospectre_v2", handle_nospectre_v2); +void setup_spectre_v2(void) +{ + if (no_spectrev2) + do_btb_flush_fixups(); + else + btb_flush_enabled = true; +} +#endif /* CONFIG_PPC_FSL_BOOK3E */ + #ifdef CONFIG_PPC_BOOK3S_64 ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) {
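setup_spectre_v2() has only two outcomes. If nospectre_v2 was given, every flush section is nopped out via do_btb_flush_fixups(). Otherwise btb_flush_enabled is set so that sysfs can report the mitigation. The sketch below models both outcomes in userspace. In the kernel, early_param() does the command-line matching. cmdline_has_flag() is a hypothetical whole-word matcher used here only for illustration, and struct spectre_v2_state is likewise made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical: true if param appears as a whole space-separated word. */
static bool cmdline_has_flag(const char *cmdline, const char *param)
{
	size_t len = strlen(param);
	const char *p = cmdline;

	while ((p = strstr(p, param)) != NULL) {
		bool starts = (p == cmdline || p[-1] == ' ');
		bool ends = (p[len] == '\0' || p[len] == ' ');

		if (starts && ends)
			return true;
		p += len;	/* keep searching after this partial match */
	}
	return false;
}

struct spectre_v2_state {
	bool btb_flush_enabled;		/* reported via sysfs */
	bool flush_sections_nopped;	/* do_btb_flush_fixups() ran */
};

/* Models setup_spectre_v2(): exactly one of the two flags ends up set. */
static struct spectre_v2_state setup_spectre_v2(const char *cmdline)
{
	struct spectre_v2_state s = { false, false };

	if (cmdline_has_flag(cmdline, "nospectre_v2"))
		s.flush_sections_nopped = true;
	else
		s.btb_flush_enabled = true;
	return s;
}
```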
From: Diana Craciun diana.craciun@nxp.com
commit 10c5e83afd4a3f01712d97d3bb1ae34d5b74a185 upstream.
In order to protect against speculation attacks on indirect branches, the branch predictor is flushed at kernel entry to protect against the following situations:
- a userspace process attacking another userspace process
- a userspace process attacking the kernel
Basically, when the privilege level changes (i.e. the kernel is entered), the branch predictor state is flushed.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/entry_64.S | 5 +++++ arch/powerpc/kernel/exceptions-64e.S | 26 +++++++++++++++++++++++++- arch/powerpc/mm/tlb_low_64e.S | 7 +++++++ 3 files changed, 37 insertions(+), 1 deletion(-)
--- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -77,6 +77,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM) std r0,GPR0(r1) std r10,GPR1(r1) beq 2f /* if from kernel mode */ +#ifdef CONFIG_PPC_FSL_BOOK3E +START_BTB_FLUSH_SECTION + BTB_FLUSH(r10) +END_BTB_FLUSH_SECTION +#endif ACCOUNT_CPU_USER_ENTRY(r10, r11) 2: std r2,GPR2(r1) std r3,GPR3(r1) --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -295,7 +295,8 @@ ret_from_mc_except: andi. r10,r11,MSR_PR; /* save stack pointer */ \ beq 1f; /* branch around if supervisor */ \ ld r1,PACAKSAVE(r13); /* get kernel stack coming from usr */\ -1: cmpdi cr1,r1,0; /* check if SP makes sense */ \ +1: type##_BTB_FLUSH \ + cmpdi cr1,r1,0; /* check if SP makes sense */ \ bge- cr1,exc_##n##_bad_stack;/* bad stack (TODO: out of line) */ \ mfspr r10,SPRN_##type##_SRR0; /* read SRR0 before touching stack */
@@ -327,6 +328,29 @@ ret_from_mc_except: #define SPRN_MC_SRR0 SPRN_MCSRR0 #define SPRN_MC_SRR1 SPRN_MCSRR1
+#ifdef CONFIG_PPC_FSL_BOOK3E +#define GEN_BTB_FLUSH \ + START_BTB_FLUSH_SECTION \ + beq 1f; \ + BTB_FLUSH(r10) \ + 1: \ + END_BTB_FLUSH_SECTION + +#define CRIT_BTB_FLUSH \ + START_BTB_FLUSH_SECTION \ + BTB_FLUSH(r10) \ + END_BTB_FLUSH_SECTION + +#define DBG_BTB_FLUSH CRIT_BTB_FLUSH +#define MC_BTB_FLUSH CRIT_BTB_FLUSH +#define GDBELL_BTB_FLUSH GEN_BTB_FLUSH +#else +#define GEN_BTB_FLUSH +#define CRIT_BTB_FLUSH +#define DBG_BTB_FLUSH +#define GDBELL_BTB_FLUSH +#endif + #define NORMAL_EXCEPTION_PROLOG(n, intnum, addition) \ EXCEPTION_PROLOG(n, intnum, GEN, addition##_GEN(n))
--- a/arch/powerpc/mm/tlb_low_64e.S +++ b/arch/powerpc/mm/tlb_low_64e.S @@ -69,6 +69,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) std r15,EX_TLB_R15(r12) std r10,EX_TLB_CR(r12) #ifdef CONFIG_PPC_FSL_BOOK3E +START_BTB_FLUSH_SECTION + mfspr r11, SPRN_SRR1 + andi. r10,r11,MSR_PR + beq 1f + BTB_FLUSH(r10) +1: +END_BTB_FLUSH_SECTION std r7,EX_TLB_R7(r12) #endif TLB_MISS_PROLOG_STATS
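In GEN_BTB_FLUSH, the TLB miss path and the system call entry, the flush is conditional. It only runs when SRR1 shows that the interrupted context was in problem state (MSR_PR set), that is, when entering from userspace. CRIT_BTB_FLUSH, which is also used for DBG and MC, flushes unconditionally. Below is a C rendering of that decision. MSR_PR is taken as bit 14, matching MSR_PR_LG in asm/reg.h. The function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR[PR], "problem state" (user mode): bit 14 as in asm/reg.h. */
#define MSR_PR (1ull << 14)

/* andi. r10,r11,MSR_PR ; beq 1f ; BTB_FLUSH(r10) ; 1: */
static bool needs_btb_flush(uint64_t srr1)
{
	return (srr1 & MSR_PR) != 0;	/* flush only when coming from user */
}

enum prolog_kind { PROLOG_GEN, PROLOG_CRIT };	/* hypothetical */

static bool prolog_flushes(enum prolog_kind kind, uint64_t srr1)
{
	if (kind == PROLOG_CRIT)
		return true;	/* CRIT/DBG/MC: no beq, always flush */
	return needs_btb_flush(srr1);
}
```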
From: Diana Craciun diana.craciun@nxp.com
commit dfa88658fb0583abb92e062c7a9cd5a5b94f2a46 upstream.
Report branch predictor state flush as a mitigation for Spectre variant 2.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/security.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -213,8 +213,11 @@ ssize_t cpu_show_spectre_v2(struct devic
if (count_cache_flush_type == COUNT_CACHE_FLUSH_HW) seq_buf_printf(&s, "(hardware accelerated)"); - } else + } else if (btb_flush_enabled) { + seq_buf_printf(&s, "Mitigation: Branch predictor state flush"); + } else { seq_buf_printf(&s, "Vulnerable"); + }
seq_buf_printf(&s, "\n");
From: Michael Ellerman mpe@ellerman.id.au
commit 92edf8df0ff2ae86cc632eeca0e651fd8431d40d upstream.
When I updated the spectre_v2 reporting to handle the software count cache flush, I got the logic wrong when there's no software count cache flush enabled at all.
The result is that on systems with the software count cache flush disabled we print:
Mitigation: Indirect branch cache disabled, Software count cache flush
Which correctly indicates that the count cache is disabled, but incorrectly says the software count cache flush is enabled.
The root of the problem is that we are trying to handle all combinations of options. But we know now that we only expect to see the software count cache flush enabled if the other options are false.
So split the two cases, which simplifies the logic and fixes the bug. We were also missing a space before "(hardware accelerated)".
The result is we see one of:
Mitigation: Indirect branch serialisation (kernel only) Mitigation: Indirect branch cache disabled Mitigation: Software count cache flush Mitigation: Software count cache flush (hardware accelerated)
Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Michael Ellerman mpe@ellerman.id.au Reviewed-by: Michael Neuling mikey@neuling.org Reviewed-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/security.c | 23 ++++++++--------------- 1 file changed, 8 insertions(+), 15 deletions(-)
--- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -190,29 +190,22 @@ ssize_t cpu_show_spectre_v2(struct devic bcs = security_ftr_enabled(SEC_FTR_BCCTRL_SERIALISED); ccd = security_ftr_enabled(SEC_FTR_COUNT_CACHE_DISABLED);
- if (bcs || ccd || count_cache_flush_type != COUNT_CACHE_FLUSH_NONE) { - bool comma = false; + if (bcs || ccd) { seq_buf_printf(&s, "Mitigation: ");
- if (bcs) { + if (bcs) seq_buf_printf(&s, "Indirect branch serialisation (kernel only)"); - comma = true; - }
- if (ccd) { - if (comma) - seq_buf_printf(&s, ", "); - seq_buf_printf(&s, "Indirect branch cache disabled"); - comma = true; - } - - if (comma) + if (bcs && ccd) seq_buf_printf(&s, ", ");
- seq_buf_printf(&s, "Software count cache flush"); + if (ccd) + seq_buf_printf(&s, "Indirect branch cache disabled"); + } else if (count_cache_flush_type != COUNT_CACHE_FLUSH_NONE) { + seq_buf_printf(&s, "Mitigation: Software count cache flush");
if (count_cache_flush_type == COUNT_CACHE_FLUSH_HW) - seq_buf_printf(&s, "(hardware accelerated)"); + seq_buf_printf(&s, " (hardware accelerated)"); } else if (btb_flush_enabled) { seq_buf_printf(&s, "Mitigation: Branch predictor state flush"); } else {
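With both fixes applied, cpu_show_spectre_v2() follows a strict if/else-if order. The userspace sketch below reproduces that order. A small snprintf-based append() stands in for seq_buf_printf(), and the function names are hypothetical. With only "cache disabled" set, the output no longer mentions the software flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum count_cache_flush_type {
	COUNT_CACHE_FLUSH_NONE = 0x1,
	COUNT_CACHE_FLUSH_SW   = 0x2,
	COUNT_CACHE_FLUSH_HW   = 0x4,
};

/* Bounded append, standing in for seq_buf_printf(). */
static void append(char *buf, size_t size, const char *s)
{
	size_t used = strlen(buf);

	if (used + 1 < size)
		snprintf(buf + used, size - used, "%s", s);
}

/* Mirrors the fixed branch order of cpu_show_spectre_v2(). */
static void show_spectre_v2(char *buf, size_t size, bool bcs, bool ccd,
			    enum count_cache_flush_type flush, bool btb)
{
	buf[0] = '\0';
	if (bcs || ccd) {
		append(buf, size, "Mitigation: ");
		if (bcs)
			append(buf, size, "Indirect branch serialisation (kernel only)");
		if (bcs && ccd)
			append(buf, size, ", ");
		if (ccd)
			append(buf, size, "Indirect branch cache disabled");
	} else if (flush != COUNT_CACHE_FLUSH_NONE) {
		append(buf, size, "Mitigation: Software count cache flush");
		if (flush == COUNT_CACHE_FLUSH_HW)
			append(buf, size, " (hardware accelerated)");
	} else if (btb) {
		append(buf, size, "Mitigation: Branch predictor state flush");
	} else {
		append(buf, size, "Vulnerable");
	}
	append(buf, size, "\n");
}
```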
From: Christophe Leroy christophe.leroy@c-s.fr
commit 27da80719ef132cf8c80eb406d5aeb37dddf78cc upstream.
The commit identified below adds the MC_BTB_FLUSH macro only when CONFIG_PPC_FSL_BOOK3E is defined. This results in the following build error on some configs (seen several times with kisskb randconfig_defconfig):
arch/powerpc/kernel/exceptions-64e.S:576: Error: Unrecognized opcode: `mc_btb_flush' make[3]: *** [scripts/Makefile.build:367: arch/powerpc/kernel/exceptions-64e.o] Error 1 make[2]: *** [scripts/Makefile.build:492: arch/powerpc/kernel] Error 2 make[1]: *** [Makefile:1043: arch/powerpc] Error 2 make: *** [Makefile:152: sub-make] Error 2
This patch adds a blank definition of MC_BTB_FLUSH for other cases.
Fixes: 10c5e83afd4a ("powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)") Cc: Diana Craciun diana.craciun@nxp.com Signed-off-by: Christophe Leroy christophe.leroy@c-s.fr Reviewed-by: Daniel Axtens dja@axtens.net Reviewed-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/exceptions-64e.S | 1 + 1 file changed, 1 insertion(+)
--- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -348,6 +348,7 @@ ret_from_mc_except: #define GEN_BTB_FLUSH #define CRIT_BTB_FLUSH #define DBG_BTB_FLUSH +#define MC_BTB_FLUSH #define GDBELL_BTB_FLUSH #endif
From: Xin Long lucien.xin@gmail.com
commit 2ac695d1d602ce00b12170242f58c3d3a8e36d04 upstream.
Syzbot found a crash:
BUG: KMSAN: uninit-value in tipc_nl_compat_name_table_dump+0x54f/0xcd0 net/tipc/netlink_compat.c:872 Call Trace: tipc_nl_compat_name_table_dump+0x54f/0xcd0 net/tipc/netlink_compat.c:872 __tipc_nl_compat_dumpit+0x59e/0xda0 net/tipc/netlink_compat.c:215 tipc_nl_compat_dumpit+0x63a/0x820 net/tipc/netlink_compat.c:280 tipc_nl_compat_handle net/tipc/netlink_compat.c:1226 [inline] tipc_nl_compat_recv+0x1b5f/0x2750 net/tipc/netlink_compat.c:1265 genl_family_rcv_msg net/netlink/genetlink.c:601 [inline] genl_rcv_msg+0x185f/0x1a60 net/netlink/genetlink.c:626 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477 genl_rcv+0x63/0x80 net/netlink/genetlink.c:637 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg net/socket.c:632 [inline]
Uninit was created at: __alloc_skb+0x309/0xa20 net/core/skbuff.c:208 alloc_skb include/linux/skbuff.h:1012 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline] netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg net/socket.c:632 [inline]
It was supposed to be fixed in commit 974cb0e3e7c9 ("tipc: fix uninit-value in tipc_nl_compat_name_table_dump") by checking TLV_GET_DATA_LEN(msg->req) in cmd->header()/tipc_nl_compat_name_table_dump_header(), which is called ahead of tipc_nl_compat_name_table_dump().
However, tipc_nl_compat_dumpit() doesn't handle the error returned from the cmd header function. This means that even when the check added in that fix fails, tipc_nl_compat_name_table_dump() is still called, and the issue is triggered again.
So this patch handles the error returned from the cmd header function in tipc_nl_compat_dumpit().
Reported-by: syzbot+3ce8520484b0d4e260a5@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/tipc/netlink_compat.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/net/tipc/netlink_compat.c +++ b/net/tipc/netlink_compat.c @@ -262,8 +262,14 @@ static int tipc_nl_compat_dumpit(struct if (msg->rep_type) tipc_tlv_init(msg->rep, msg->rep_type);
- if (cmd->header) - (*cmd->header)(msg); + if (cmd->header) { + err = (*cmd->header)(msg); + if (err) { + kfree_skb(msg->rep); + msg->rep = NULL; + return err; + } + }
arg = nlmsg_new(0, GFP_KERNEL); if (!arg) {
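The fixed control flow can be sketched in userspace. The header callback now acts as a gate: on error the reply buffer is released, the pointer is cleared and the dump callback never runs. struct compat_msg, struct compat_cmd and the length check are hypothetical simplifications of the tipc structures and of the TLV_GET_DATA_LEN() test, and malloc()/free() stand in for the skb calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MIN_QUERY_LEN 4		/* hypothetical minimum request length */

struct compat_msg {
	void *rep;		/* models msg->rep (an skb) */
	int req_len;		/* models TLV_GET_DATA_LEN(msg->req) */
};

struct compat_cmd {
	int (*header)(struct compat_msg *msg);
	int (*dump)(struct compat_msg *msg);
};

static int dump_calls;

/* Rejects short requests, like the name-table header check. */
static int name_table_header(struct compat_msg *msg)
{
	return msg->req_len < MIN_QUERY_LEN ? -EINVAL : 0;
}

static int name_table_dump(struct compat_msg *msg)
{
	(void)msg;
	dump_calls++;		/* the real dump would read the request here */
	return 0;
}

/* Sketch of the fixed flow in tipc_nl_compat_dumpit(). */
static int compat_dumpit(const struct compat_cmd *cmd, struct compat_msg *msg)
{
	msg->rep = malloc(64);
	if (!msg->rep)
		return -ENOMEM;

	if (cmd->header) {
		int err = cmd->header(msg);

		if (err) {
			free(msg->rep);	/* kfree_skb(msg->rep) */
			msg->rep = NULL;
			return err;	/* dump callback is never reached */
		}
	}
	return cmd->dump(msg);
}
```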
From: Linus Torvalds torvalds@linux-foundation.org
commit baf76f0c58aec435a3a864075b8f6d8ee5d1f17e upstream.
This way, slhc_free() accepts what slhc_init() returns, whether that is an error or not.
In particular, the pattern in sl_alloc_bufs() is
slcomp = slhc_init(16, 16); ... slhc_free(slcomp);
for the error handling path, and rather than complicate that code, just make it ok to always free what was returned by the init function.
That's what the code used to do before commit 4ab42d78e37a ("ppp, slip: Validate VJ compression slot parameters completely") when slhc_init() just returned NULL for the error case, with no actual indication of the details of the error.
Reported-by: syzbot+45474c076a4927533d2e@syzkaller.appspotmail.com Fixes: 4ab42d78e37a ("ppp, slip: Validate VJ compression slot parameters completely") Acked-by: Ben Hutchings ben@decadent.org.uk Cc: David Miller davem@davemloft.net Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/slip/slhc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/slip/slhc.c +++ b/drivers/net/slip/slhc.c @@ -153,7 +153,7 @@ out_fail: void slhc_free(struct slcompress *comp) { - if ( comp == NULLSLCOMPR ) + if ( IS_ERR_OR_NULL(comp) ) return;
if ( comp->tstate != NULLSLSTATE )
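The pattern here is that a destructor accepts every value its constructor can return: a valid pointer, NULL, or an error pointer. The sketch below re-creates ERR_PTR() and IS_ERR_OR_NULL() in userspace, following the <linux/err.h> convention that error pointers occupy the top MAX_ERRNO (4095) addresses. These are not the real kernel headers. The slhc_*_model() functions and the slot-range check are hypothetical simplifications.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the <linux/err.h> helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct slcompress { int *tstate; };

static int frees;

/* Like slhc_init(): returns an object, NULL, or an error pointer. */
static struct slcompress *slhc_init_model(int slots)
{
	if (slots <= 0 || slots > 255)
		return ERR_PTR(-EINVAL);	/* hypothetical validation */

	struct slcompress *comp = calloc(1, sizeof(*comp));
	if (!comp)
		return NULL;
	comp->tstate = calloc((size_t)slots, sizeof(int));
	if (!comp->tstate) {
		free(comp);
		return NULL;
	}
	return comp;
}

/* Fixed slhc_free(): safe on whatever slhc_init() returned. */
static void slhc_free_model(struct slcompress *comp)
{
	if (IS_ERR_OR_NULL(comp))
		return;
	free(comp->tstate);
	free(comp);
	frees++;
}
```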
From: Alexander Shishkin alexander.shishkin@linux.intel.com
commit 91d3f8a629849968dc91d6ce54f2d46abf4feb7f upstream.
Commit 9ed3f22223c3 ("intel_th: Don't reference unassigned outputs") fixes a NULL dereference for all masters except the last one ("256+"), which keeps the stale pointer after the output driver had been unassigned.
Fix the off-by-one.
Signed-off-by: Alexander Shishkin alexander.shishkin@linux.intel.com Fixes: 9ed3f22223c3 ("intel_th: Don't reference unassigned outputs") Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hwtracing/intel_th/gth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/hwtracing/intel_th/gth.c +++ b/drivers/hwtracing/intel_th/gth.c @@ -597,7 +597,7 @@ static void intel_th_gth_unassign(struct othdev->output.port = -1; othdev->output.active = false; gth->output[port].output = NULL; - for (master = 0; master < TH_CONFIGURABLE_MASTERS; master++) + for (master = 0; master <= TH_CONFIGURABLE_MASTERS; master++) if (gth->master[master] == port) gth->master[master] = -1; spin_unlock(&gth->gth_lock);
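The off-by-one is easiest to see with a tiny array. The sketch assumes, consistent with the "256+" description above, that the master table has TH_CONFIGURABLE_MASTERS + 1 slots. The last slot is a catch-all for all masters from TH_CONFIGURABLE_MASTERS upwards. The small constant and unassign_port() are hypothetical.

```c
#include <assert.h>

/* Hypothetical small value; the driver uses a larger constant. */
#define TH_CONFIGURABLE_MASTERS 4

/* Slots 0..N-1 are individual masters; slot N is the "N+" catch-all. */
static int master[TH_CONFIGURABLE_MASTERS + 1];

static void unassign_port(int port)
{
	/* "<=" so the catch-all slot is cleared as well. */
	for (int m = 0; m <= TH_CONFIGURABLE_MASTERS; m++)
		if (master[m] == port)
			master[m] = -1;
}
```

With "<" instead of "<=", master[TH_CONFIGURABLE_MASTERS] would keep pointing at the unassigned port. That is the stale reference the patch removes.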
From: YueHaibing yuehaibing@huawei.com
commit 89189557b47b35683a27c80ee78aef18248eefb4 upstream.
Syzkaller reported this:
sysctl could not get directory: /net//bridge -12 kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 7027 Comm: syz-executor.0 Tainted: G C 5.1.0-rc3+ #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:__write_once_size include/linux/compiler.h:220 [inline] RIP: 0010:__rb_change_child include/linux/rbtree_augmented.h:144 [inline] RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:186 [inline] RIP: 0010:rb_erase+0x5f4/0x19f0 lib/rbtree.c:459 Code: 00 0f 85 60 13 00 00 48 89 1a 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 75 0c 00 00 4d 85 ed 4c 89 2e 74 ce 4c 89 ea 48 RSP: 0018:ffff8881bb507778 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: ffff8881f224b5b8 RCX: ffffffff818f3f6a RDX: 000000000000000a RSI: 0000000000000050 RDI: ffff8881f224b568 RBP: 0000000000000000 R08: ffffed10376a0ef4 R09: ffffed10376a0ef4 R10: 0000000000000001 R11: ffffed10376a0ef4 R12: ffff8881f224b558 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f3e7ce13700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd60fbe9398 CR3: 00000001cb55c001 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: erase_entry fs/proc/proc_sysctl.c:178 [inline] erase_header+0xe3/0x160 fs/proc/proc_sysctl.c:207 start_unregistering fs/proc/proc_sysctl.c:331 [inline] drop_sysctl_table+0x558/0x880 fs/proc/proc_sysctl.c:1631 get_subdir fs/proc/proc_sysctl.c:1022 [inline] __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335 br_netfilter_init+0x68/0x1000 [br_netfilter] do_one_initcall+0xbc/0x47d init/main.c:901 do_init_module+0x1b5/0x547 kernel/module.c:3456 
load_module+0x6405/0x8c10 kernel/module.c:3804 __do_sys_finit_module+0x162/0x190 kernel/module.c:3898 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Modules linked in: br_netfilter(+) backlight comedi(C) hid_sensor_hub max3100 ti_ads8688 udc_core fddi snd_mona leds_gpio rc_streamzap mtd pata_netcell nf_log_common rc_winfast udp_tunnel snd_usbmidi_lib snd_usb_toneport snd_usb_line6 snd_rawmidi snd_seq_device snd_hwdep videobuf2_v4l2 videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops rc_gadmei_rm008z 8250_of smm665 hid_tmff hid_saitek hwmon_vid rc_ati_tv_wonder_hd_600 rc_core pata_pdc202xx_old dn_rtmsg as3722 ad714x_i2c ad714x snd_soc_cs4265 hid_kensington panel_ilitek_ili9322 drm drm_panel_orientation_quirks ipack cdc_phonet usbcore phonet hid_jabra hid extcon_arizona can_dev industrialio_triggered_buffer kfifo_buf industrialio adm1031 i2c_mux_ltc4306 i2c_mux ipmi_msghandler mlxsw_core snd_soc_cs35l34 snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer ac97_bus snd_compress snd soundcore gpio_da9055 uio ecdh_generic mdio_thunder of_mdio fixed_phy libphy mdio_cavium iptable_security iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ide_pci_generic piix aes_x86_64 crypto_simd cryptd ide_core glue_helper input_leds psmouse intel_agp intel_gtt serio_raw ata_generic i2c_piix4 agpgart pata_acpi parport_pc parport floppy rtc_cmos sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: br_netfilter] Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace 68741688d5fbfe85 ]---
Commit 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links") forgot to handle the start_unregistering() case: when header->parent is NULL, it still calls erase_header(), and, as seen in the syzkaller call trace above, accessing &header->parent->root then triggers a NULL pointer dereference.
As that commit explained, there is also no need to call start_unregistering() if header->parent is NULL.
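A minimal user-space sketch of the guarded control flow, assuming a heavily simplified header that has only a parent pointer and two counters. `struct hdr`, `drop_table()` and the `erased` flag are hypothetical stand-ins, not the real `struct ctl_table_header` API, and `put_links()` is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for struct ctl_table_header. */
struct hdr {
	struct hdr *parent;   /* NULL if registration failed before linking */
	int nreg;             /* registration count */
	int count;            /* reference count */
	bool erased;          /* set once "unregistered" from the parent tree */
};

/* Stand-in for start_unregistering(): it dereferences h->parent, so it
 * must never run when parent is NULL. */
static void start_unregistering(struct hdr *h)
{
	assert(h->parent != NULL); /* the real code would oops here */
	h->erased = true;
}

/* Mirrors the fixed flow of drop_sysctl_table(): start_unregistering()
 * (and, in the real code, put_links()) only runs when there is a parent.
 * Returns true if the header was freed. */
static bool drop_table(struct hdr *h)
{
	if (--h->nreg)
		return false;
	if (h->parent)
		start_unregistering(h);
	if (!--h->count) {
		free(h);
		return true;
	}
	return false;
}
```

With the guard in place, dropping an orphaned header (parent == NULL) simply releases it instead of walking into the parent's tree.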
Link: http://lkml.kernel.org/r/20190409153622.28112-1-yuehaibing@huawei.com Fixes: 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links") Fixes: 0e47c99d7fe25 ("sysctl: Replace root_list with links between sysctl_table_sets") Signed-off-by: YueHaibing yuehaibing@huawei.com Reported-by: Hulk Robot hulkci@huawei.com Reviewed-by: Kees Cook keescook@chromium.org Cc: Luis Chamberlain mcgrof@kernel.org Cc: Alexey Dobriyan adobriyan@gmail.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: "Eric W. Biederman" ebiederm@xmission.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/proc/proc_sysctl.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -1550,9 +1550,11 @@ static void drop_sysctl_table(struct ctl if (--header->nreg) return;
- if (parent) + if (parent) { put_links(header); - start_unregistering(header); + start_unregistering(header); + } + if (!--header->count) kfree_rcu(header, rcu);
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit 7c2bd9a39845bfb6d72ddb55ce737650271f6f96 upstream.
syzbot is reporting an uninitialized value at rpc_sockaddr2uaddr() [1]. This is because syzbot sets AF_INET6 in "struct sockaddr_in"->sin_family (which is embedded in the user-visible "struct nfs_mount_data" structure) even though nfs23_validate_mount_data() cannot pass sizeof(struct sockaddr_in6) bytes of an AF_INET6 address to rpc_sockaddr2uaddr().
Since "struct nfs_mount_data" is user-visible, we can't change it to use "struct sockaddr_storage". Therefore, assuming that everybody uses the AF_INET family when passing an address via "struct nfs_mount_data"->addr, reject the address if its sin_family is not AF_INET.
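The shape of the new check can be illustrated in user space. This is a sketch only: `legacy_mount_addr_ok()` is a hypothetical helper, and the INADDR_ANY test is assumed to match what nfs_verify_server_address() does for AF_INET.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical helper modelled on the fixed check in
 * nfs23_validate_mount_data(): the legacy mount structure only has room
 * for a struct sockaddr_in, so anything claiming another family is
 * rejected before the address is used. */
static bool legacy_mount_addr_ok(const struct sockaddr_in *in)
{
	/* An AF_INET6 tag would make later code read
	 * sizeof(struct sockaddr_in6) bytes from a sockaddr_in. */
	if (in->sin_family != AF_INET)
		return false;
	/* Assumed equivalent of nfs_verify_server_address() for AF_INET:
	 * the wildcard address is not a usable server address. */
	return in->sin_addr.s_addr != htonl(INADDR_ANY);
}
```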
[1] https://syzkaller.appspot.com/bug?id=599993614e7cbbf66bc2656a919ab2a95fb5d75...
Reported-by: syzbot syzbot+047a11c361b872896a4f@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/nfs/super.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -2020,7 +2020,8 @@ static int nfs23_validate_mount_data(voi memcpy(sap, &data->addr, sizeof(data->addr)); args->nfs_server.addrlen = sizeof(data->addr); args->nfs_server.port = ntohs(data->addr.sin_port); - if (!nfs_verify_server_address(sap)) + if (sap->sa_family != AF_INET || + !nfs_verify_server_address(sap)) goto out_no_address;
if (!(data->flags & NFS_MOUNT_TCP))
From: Florian Westphal fw@strlen.de
commit 7caa56f006e9d712b44f27b32520c66420d5cbc6 upstream.
Hitting this WARN_ON means userspace gave us a ruleset where there is some other data after the ebtables target but before the beginning of the next rule. Since userspace can trigger it, just return -EINVAL without warning.
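A simplified sketch of the rule walk, assuming each rule is a sequence of match/watcher/target blobs with known sizes. `enum mwt_type`, `struct mwt` and `walk_rule()` are hypothetical and do not reproduce the real compat parsing in ebt_size_mwt().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the blobs that follow an ebtables entry. */
enum mwt_type { MWT_MATCH, MWT_WATCHER, MWT_TARGET };

struct mwt {
	enum mwt_type type;
	size_t size;   /* bytes this blob occupies in the rule */
};

/* Walk one rule's blobs. A target must be the last item, so bytes still
 * left after it mean a malformed, userspace-supplied ruleset: return
 * -EINVAL quietly instead of triggering a kernel warning. */
static int walk_rule(const struct mwt *items, size_t n, size_t rule_size)
{
	size_t size_left = rule_size;

	for (size_t i = 0; i < n; i++) {
		if (items[i].size > size_left)
			return -EINVAL;          /* blob overruns the rule */
		size_left -= items[i].size;
		if (items[i].type == MWT_TARGET && size_left)
			return -EINVAL;          /* data after the target */
	}
	return 0;
}
```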
Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support") Reported-by: syzbot+659574e7bcc7f7eb4df7@syzkaller.appspotmail.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/bridge/netfilter/ebtables.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -2046,7 +2046,8 @@ static int ebt_size_mwt(struct compat_eb if (match_kern) match_kern->match_size = ret;
- if (WARN_ON(type == EBT_COMPAT_TARGET && size_left)) + /* rule should have no remaining data after target */ + if (type == EBT_COMPAT_TARGET && size_left) return -EINVAL;
match32 = (struct compat_ebt_entry_mwt *) buf;
From: Xin Long lucien.xin@gmail.com
commit 6f07e5f06c8712acc423485f657799fc8e11e56c upstream.
Syzbot reported the following crash:
BUG: KMSAN: uninit-value in memchr+0xce/0x110 lib/string.c:961 memchr+0xce/0x110 lib/string.c:961 string_is_valid net/tipc/netlink_compat.c:176 [inline] tipc_nl_compat_bearer_enable+0x2c4/0x910 net/tipc/netlink_compat.c:401 __tipc_nl_compat_doit net/tipc/netlink_compat.c:321 [inline] tipc_nl_compat_doit+0x3aa/0xaf0 net/tipc/netlink_compat.c:354 tipc_nl_compat_handle net/tipc/netlink_compat.c:1162 [inline] tipc_nl_compat_recv+0x1ae7/0x2750 net/tipc/netlink_compat.c:1265 genl_family_rcv_msg net/netlink/genetlink.c:601 [inline] genl_rcv_msg+0x185f/0x1a60 net/netlink/genetlink.c:626 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477 genl_rcv+0x63/0x80 net/netlink/genetlink.c:637 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg net/socket.c:632 [inline]
Uninit was created at: __alloc_skb+0x309/0xa20 net/core/skbuff.c:208 alloc_skb include/linux/skbuff.h:1012 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline] netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg net/socket.c:632 [inline]
It was triggered when the bearer name size was < TIPC_MAX_BEARER_NAME: the name was checked against the wrong length, TLV_GET_DATA_LEN(msg->req), which also includes the priority and disc_domain fields.
This patch fixes it by checking against the correct length: 'TLV_GET_DATA_LEN(msg->req) - offsetof(struct tipc_bearer_config, name)'.
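The length arithmetic can be sketched in user space. `struct bearer_config` is a hypothetical mirror of `struct tipc_bearer_config` (two 32-bit fields followed by the name), and the value 32 is assumed to match TIPC_MAX_BEARER_NAME; `string_is_valid()` follows the memchr()-based helper in netlink_compat.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_BEARER_NAME 32 /* assumed to match TIPC_MAX_BEARER_NAME */

/* Hypothetical mirror of struct tipc_bearer_config. */
struct bearer_config {
	uint32_t priority;
	uint32_t disc_domain;
	char name[SKETCH_MAX_BEARER_NAME];
};

/* The name is valid only if a NUL terminator appears within len bytes. */
static int string_is_valid(const char *s, int len)
{
	return memchr(s, '\0', len) ? 1 : 0;
}

/* data_len is the TLV payload length supplied by userspace. Only the
 * bytes after the fixed fields belong to the name. */
static int check_bearer_name(const struct bearer_config *b, int data_len)
{
	int len = data_len - (int)offsetof(struct bearer_config, name);

	if (len <= 0)
		return -EINVAL;       /* payload too short to hold any name */
	if (len > SKETCH_MAX_BEARER_NAME)
		len = SKETCH_MAX_BEARER_NAME;
	if (!string_is_valid(b->name, len))
		return -EINVAL;
	return 0;
}
```

The old code used the full data_len, so memchr() could scan 8 bytes past the end of a short payload; subtracting the offset of `name` keeps the scan within what userspace actually sent.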
Reported-by: syzbot+8b707430713eb46e1e45@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/tipc/netlink_compat.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/net/tipc/netlink_compat.c +++ b/net/tipc/netlink_compat.c @@ -388,7 +388,12 @@ static int tipc_nl_compat_bearer_enable( if (!bearer) return -EMSGSIZE;
- len = min_t(int, TLV_GET_DATA_LEN(msg->req), TIPC_MAX_BEARER_NAME); + len = TLV_GET_DATA_LEN(msg->req); + len -= offsetof(struct tipc_bearer_config, name); + if (len <= 0) + return -EINVAL; + + len = min_t(int, len, TIPC_MAX_BEARER_NAME); if (!string_is_valid(b->name, len)) return -EINVAL;
From: Xin Long lucien.xin@gmail.com
commit 8c63bf9ab4be8b83bd8c34aacfd2f1d2c8901c8a upstream.
An issue similar to the one fixed by patch "tipc: check bearer name with right length in tipc_nl_compat_bearer_enable" was also found by syzbot in tipc_nl_compat_link_set().
The length to check with should be 'TLV_GET_DATA_LEN(msg->req) - offsetof(struct tipc_link_config, name)'.
Reported-by: syzbot+de00a87b8644a582ae79@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/tipc/netlink_compat.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/net/tipc/netlink_compat.c +++ b/net/tipc/netlink_compat.c @@ -738,7 +738,12 @@ static int tipc_nl_compat_link_set(struc
lc = (struct tipc_link_config *)TLV_DATA(msg->req);
- len = min_t(int, TLV_GET_DATA_LEN(msg->req), TIPC_MAX_LINK_NAME); + len = TLV_GET_DATA_LEN(msg->req); + len -= offsetof(struct tipc_link_config, name); + if (len <= 0) + return -EINVAL; + + len = min_t(int, len, TIPC_MAX_LINK_NAME); if (!string_is_valid(lc->name, len)) return -EINVAL;
From: Daniel Borkmann daniel@iogearbox.net
commit f7bd9e36ee4a4ce38e1cddd7effe6c0d9943285b upstream.
Add a bpf_check_basics_ok() and reject filters of invalid size much earlier, so we don't do any useless work such as invoking bpf_prog_alloc(). Currently, rejection happens only in bpf_check_classic(), which is unnecessarily late; such filters should be rejected at the earliest point. While at it, also clean up one bpf_prog_size() call to make it consistent with the remaining invocations.
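A user-space sketch of the ordering the patch establishes: cheap sanity checks first, allocation only afterwards. `struct sketch_sock_filter` mirrors the classic BPF instruction layout, the limit of 4096 is assumed to equal BPF_MAXINSNS, and `prog_create()` with its allocation counter is a hypothetical stand-in for bpf_prog_create().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BPF_MAXINSNS 4096 /* assumed equal to BPF_MAXINSNS */

/* Mirror of the classic BPF instruction (struct sock_filter). */
struct sketch_sock_filter {
	uint16_t code;
	uint8_t jt;
	uint8_t jf;
	uint32_t k;
};

/* Same checks as bpf_check_basics_ok(). */
static bool check_basics_ok(const struct sketch_sock_filter *filter,
			    unsigned int flen)
{
	if (filter == NULL)
		return false;
	if (flen == 0 || flen > SKETCH_BPF_MAXINSNS)
		return false;
	return true;
}

static unsigned int allocations; /* counts simulated bpf_prog_alloc() calls */

/* Hypothetical create path: invalid sizes never reach the allocator. */
static int prog_create(const struct sketch_sock_filter *f, unsigned int len)
{
	if (!check_basics_ok(f, len))
		return -EINVAL;
	allocations++;
	return 0;
}
```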
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Alexei Starovoitov ast@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Zubin Mithra zsm@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/filter.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-)
--- a/net/core/filter.c +++ b/net/core/filter.c @@ -742,6 +742,17 @@ static bool chk_code_allowed(u16 code_to return codes[code_to_probe]; }
+static bool bpf_check_basics_ok(const struct sock_filter *filter, + unsigned int flen) +{ + if (filter == NULL) + return false; + if (flen == 0 || flen > BPF_MAXINSNS) + return false; + + return true; +} + /** * bpf_check_classic - verify socket filter code * @filter: filter to verify @@ -762,9 +773,6 @@ static int bpf_check_classic(const struc bool anc_found; int pc;
- if (flen == 0 || flen > BPF_MAXINSNS) - return -EINVAL; - /* Check the filter code now */ for (pc = 0; pc < flen; pc++) { const struct sock_filter *ftest = &filter[pc]; @@ -1057,7 +1065,7 @@ int bpf_prog_create(struct bpf_prog **pf struct bpf_prog *fp;
/* Make sure new filter is there and in the right amounts. */ - if (fprog->filter == NULL) + if (!bpf_check_basics_ok(fprog->filter, fprog->len)) return -EINVAL;
fp = bpf_prog_alloc(bpf_prog_size(fprog->len), 0); @@ -1104,7 +1112,7 @@ int bpf_prog_create_from_user(struct bpf int err;
/* Make sure new filter is there and in the right amounts. */ - if (fprog->filter == NULL) + if (!bpf_check_basics_ok(fprog->filter, fprog->len)) return -EINVAL;
fp = bpf_prog_alloc(bpf_prog_size(fprog->len), 0); @@ -1184,7 +1192,6 @@ int __sk_attach_filter(struct sock_fprog bool locked) { unsigned int fsize = bpf_classic_proglen(fprog); - unsigned int bpf_fsize = bpf_prog_size(fprog->len); struct bpf_prog *prog; int err;
@@ -1192,10 +1199,10 @@ int __sk_attach_filter(struct sock_fprog return -EPERM;
/* Make sure new filter is there and in the right amounts. */ - if (fprog->filter == NULL) + if (!bpf_check_basics_ok(fprog->filter, fprog->len)) return -EINVAL;
- prog = bpf_prog_alloc(bpf_fsize, 0); + prog = bpf_prog_alloc(bpf_prog_size(fprog->len), 0); if (!prog) return -ENOMEM;
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit b3f3107fbd928fed6e4fecbe3da2ed5f43216439 which is commit 310ca162d779efee8a2dc3731439680f3e9c1e86 upstream.
Jan Kara has reported seeing problems with this patch applied, as has Salvatore Bonaccorso, so let's drop it for now.
Reported-by: Salvatore Bonaccorso carnil@debian.org Reported-by: Jan Kara jack@suse.cz Cc: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Cc: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/loop.c | 42 +++++++++++++++++++++--------------------- drivers/block/loop.h | 1 + 2 files changed, 22 insertions(+), 21 deletions(-)
--- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -82,7 +82,6 @@
static DEFINE_IDR(loop_index_idr); static DEFINE_MUTEX(loop_index_mutex); -static DEFINE_MUTEX(loop_ctl_mutex);
static int max_part; static int part_shift; @@ -1045,7 +1044,7 @@ static int loop_clr_fd(struct loop_devic */ if (atomic_read(&lo->lo_refcnt) > 1) { lo->lo_flags |= LO_FLAGS_AUTOCLEAR; - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); return 0; }
@@ -1094,12 +1093,12 @@ static int loop_clr_fd(struct loop_devic if (!part_shift) lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN; loop_unprepare_queue(lo); - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); /* - * Need not hold loop_ctl_mutex to fput backing file. - * Calling fput holding loop_ctl_mutex triggers a circular + * Need not hold lo_ctl_mutex to fput backing file. + * Calling fput holding lo_ctl_mutex triggers a circular * lock dependency possibility warning as fput can take - * bd_mutex which is usually taken before loop_ctl_mutex. + * bd_mutex which is usually taken before lo_ctl_mutex. */ fput(filp); return 0; @@ -1362,7 +1361,7 @@ static int lo_ioctl(struct block_device struct loop_device *lo = bdev->bd_disk->private_data; int err;
- mutex_lock_nested(&loop_ctl_mutex, 1); + mutex_lock_nested(&lo->lo_ctl_mutex, 1); switch (cmd) { case LOOP_SET_FD: err = loop_set_fd(lo, mode, bdev, arg); @@ -1371,7 +1370,7 @@ static int lo_ioctl(struct block_device err = loop_change_fd(lo, bdev, arg); break; case LOOP_CLR_FD: - /* loop_clr_fd would have unlocked loop_ctl_mutex on success */ + /* loop_clr_fd would have unlocked lo_ctl_mutex on success */ err = loop_clr_fd(lo); if (!err) goto out_unlocked; @@ -1407,7 +1406,7 @@ static int lo_ioctl(struct block_device default: err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL; } - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex);
out_unlocked: return err; @@ -1540,16 +1539,16 @@ static int lo_compat_ioctl(struct block_
switch(cmd) { case LOOP_SET_STATUS: - mutex_lock(&loop_ctl_mutex); + mutex_lock(&lo->lo_ctl_mutex); err = loop_set_status_compat( lo, (const struct compat_loop_info __user *) arg); - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); break; case LOOP_GET_STATUS: - mutex_lock(&loop_ctl_mutex); + mutex_lock(&lo->lo_ctl_mutex); err = loop_get_status_compat( lo, (struct compat_loop_info __user *) arg); - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); break; case LOOP_SET_CAPACITY: case LOOP_CLR_FD: @@ -1593,7 +1592,7 @@ static void __lo_release(struct loop_dev if (atomic_dec_return(&lo->lo_refcnt)) return;
- mutex_lock(&loop_ctl_mutex); + mutex_lock(&lo->lo_ctl_mutex); if (lo->lo_flags & LO_FLAGS_AUTOCLEAR) { /* * In autoclear mode, stop the loop thread @@ -1610,7 +1609,7 @@ static void __lo_release(struct loop_dev loop_flush(lo); }
- mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); }
static void lo_release(struct gendisk *disk, fmode_t mode) @@ -1656,10 +1655,10 @@ static int unregister_transfer_cb(int id struct loop_device *lo = ptr; struct loop_func_table *xfer = data;
- mutex_lock(&loop_ctl_mutex); + mutex_lock(&lo->lo_ctl_mutex); if (lo->lo_encryption == xfer) loop_release_xfer(lo); - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); return 0; }
@@ -1821,6 +1820,7 @@ static int loop_add(struct loop_device * if (!part_shift) disk->flags |= GENHD_FL_NO_PART_SCAN; disk->flags |= GENHD_FL_EXT_DEVT; + mutex_init(&lo->lo_ctl_mutex); atomic_set(&lo->lo_refcnt, 0); lo->lo_number = i; spin_lock_init(&lo->lo_lock); @@ -1933,19 +1933,19 @@ static long loop_control_ioctl(struct fi ret = loop_lookup(&lo, parm); if (ret < 0) break; - mutex_lock(&loop_ctl_mutex); + mutex_lock(&lo->lo_ctl_mutex); if (lo->lo_state != Lo_unbound) { ret = -EBUSY; - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); break; } if (atomic_read(&lo->lo_refcnt) > 0) { ret = -EBUSY; - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); break; } lo->lo_disk->private_data = NULL; - mutex_unlock(&loop_ctl_mutex); + mutex_unlock(&lo->lo_ctl_mutex); idr_remove(&loop_index_idr, lo->lo_number); loop_remove(lo); break; --- a/drivers/block/loop.h +++ b/drivers/block/loop.h @@ -55,6 +55,7 @@ struct loop_device {
spinlock_t lo_lock; int lo_state; + struct mutex lo_ctl_mutex; struct kthread_worker worker; struct task_struct *worker_task; bool use_dio;
From: Eric Dumazet edumazet@google.com
[ Upstream commit 20ff83f10f113c88d0bb74589389b05250994c16 ]
Before calling __ip_options_compile(), we need to ensure the network header is an IPv4 one, and that it has already been pulled into skb->head.
RAW sockets going through a tunnel can end up calling ipv4_link_failure() with total garbage in the skb, or with arbitrary lengths.
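The header checks added to ipv4_send_dest_unreach() can be sketched on a flat buffer instead of an sk_buff. `ipv4_options_len()` is hypothetical; its `avail` argument plays the role of pskb_network_may_pull(), i.e. how many header bytes are really present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the IP options length (0..40), or -1 if the header cannot be
 * trusted and no ICMP error should be built from it. */
static int ipv4_options_len(const uint8_t *hdr, size_t avail)
{
	unsigned int version, ihl;

	if (avail < 20)              /* sizeof(struct iphdr) */
		return -1;
	version = hdr[0] >> 4;
	ihl = hdr[0] & 0x0f;         /* header length in 32-bit words */
	if (version != 4 || ihl < 5)
		return -1;
	if (avail < ihl * 4)         /* options not fully present */
		return -1;
	return (int)(ihl * 4 - 20);  /* opt.optlen in the patch */
}
```

Only when this yields a non-zero length does the patched code go on to call __ip_options_compile().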
syzbot report :
BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:355 [inline] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123 Write of size 69 at addr ffff888096abf068 by task syz-executor.4/9204
CPU: 0 PID: 9204 Comm: syz-executor.4 Not tainted 5.1.0-rc5+ #77 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 check_memory_region_inline mm/kasan/generic.c:185 [inline] check_memory_region+0x123/0x190 mm/kasan/generic.c:191 memcpy+0x38/0x50 mm/kasan/common.c:133 memcpy include/linux/string.h:355 [inline] __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123 __icmp_send+0x725/0x1400 net/ipv4/icmp.c:695 ipv4_link_failure+0x29f/0x550 net/ipv4/route.c:1204 dst_link_failure include/net/dst.h:427 [inline] vti6_xmit net/ipv6/ip6_vti.c:514 [inline] vti6_tnl_xmit+0x10d4/0x1c0c net/ipv6/ip6_vti.c:553 __netdev_start_xmit include/linux/netdevice.h:4414 [inline] netdev_start_xmit include/linux/netdevice.h:4423 [inline] xmit_one net/core/dev.c:3292 [inline] dev_hard_start_xmit+0x1b2/0x980 net/core/dev.c:3308 __dev_queue_xmit+0x271d/0x3060 net/core/dev.c:3878 dev_queue_xmit+0x18/0x20 net/core/dev.c:3911 neigh_direct_output+0x16/0x20 net/core/neighbour.c:1527 neigh_output include/net/neighbour.h:508 [inline] ip_finish_output2+0x949/0x1740 net/ipv4/ip_output.c:229 ip_finish_output+0x73c/0xd50 net/ipv4/ip_output.c:317 NF_HOOK_COND include/linux/netfilter.h:278 [inline] ip_output+0x21f/0x670 net/ipv4/ip_output.c:405 dst_output include/net/dst.h:444 [inline] NF_HOOK include/linux/netfilter.h:289 [inline] raw_send_hdrinc net/ipv4/raw.c:432 [inline] raw_sendmsg+0x1d2b/0x2f20 net/ipv4/raw.c:663 inet_sendmsg+0x147/0x5d0 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0xdd/0x130 net/socket.c:661 sock_write_iter+0x27c/0x3e0 net/socket.c:988 call_write_iter include/linux/fs.h:1866 [inline] new_sync_write+0x4c7/0x760 fs/read_write.c:474 __vfs_write+0xe4/0x110 fs/read_write.c:487 vfs_write+0x20c/0x580 
fs/read_write.c:549 ksys_write+0x14f/0x2d0 fs/read_write.c:599 __do_sys_write fs/read_write.c:611 [inline] __se_sys_write fs/read_write.c:608 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:608 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x458c29 Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f293b44bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458c29 RDX: 0000000000000014 RSI: 00000000200002c0 RDI: 0000000000000003 RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f293b44c6d4 R13: 00000000004c8623 R14: 00000000004ded68 R15: 00000000ffffffff
The buggy address belongs to the page: page:ffffea00025aafc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x1fffc0000000000() raw: 01fffc0000000000 0000000000000000 ffffffff025a0101 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address:
 ffff888096abef80: 00 00 00 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 f2
 ffff888096abf000: f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888096abf080: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
                         ^
 ffff888096abf100: 00 00 00 00 f1 f1 f1 f1 00 00 f3 f3 00 00 00 00
 ffff888096abf180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Stephen Suryaputra ssuryaextr@gmail.com Acked-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/route.c | 34 ++++++++++++++++++++++++---------- 1 file changed, 24 insertions(+), 10 deletions(-)
--- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1162,25 +1162,39 @@ static struct dst_entry *ipv4_dst_check( return dst; }
-static void ipv4_link_failure(struct sk_buff *skb) +static void ipv4_send_dest_unreach(struct sk_buff *skb) { struct ip_options opt; - struct rtable *rt; int res;
/* Recompile ip options since IPCB may not be valid anymore. + * Also check we have a reasonable ipv4 header. */ - memset(&opt, 0, sizeof(opt)); - opt.optlen = ip_hdr(skb)->ihl*4 - sizeof(struct iphdr); - - rcu_read_lock(); - res = __ip_options_compile(dev_net(skb->dev), &opt, skb, NULL); - rcu_read_unlock(); - - if (res) + if (!pskb_network_may_pull(skb, sizeof(struct iphdr)) || + ip_hdr(skb)->version != 4 || ip_hdr(skb)->ihl < 5) return;
+ memset(&opt, 0, sizeof(opt)); + if (ip_hdr(skb)->ihl > 5) { + if (!pskb_network_may_pull(skb, ip_hdr(skb)->ihl * 4)) + return; + opt.optlen = ip_hdr(skb)->ihl * 4 - sizeof(struct iphdr); + + rcu_read_lock(); + res = __ip_options_compile(dev_net(skb->dev), &opt, skb, NULL); + rcu_read_unlock(); + + if (res) + return; + } __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0, &opt); +} + +static void ipv4_link_failure(struct sk_buff *skb) +{ + struct rtable *rt; + + ipv4_send_dest_unreach(skb);
rt = skb_rtable(skb); if (rt)
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 925b0c841e066b488cc3a60272472b2c56300704 ]
If we add a bond device which is already the master of the team interface, we will hold the team->lock in team_add_slave() first and then request the lock in team_set_mac_address() again. The functions are called like:
  - team_add_slave()
   - team_port_add()
    - team_port_enter()
     - team_modeop_port_enter()
      - __set_port_dev_addr()
       - dev_set_mac_address()
        - bond_set_mac_address()
         - dev_set_mac_address()
          - team_set_mac_address
team_upper_dev_link() does check the upper devices, but it is called too late. Fix it by adding a check before processing the slave.
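A toy model of the new early check, assuming a simplified device graph in which each device lists the devices stacked on top of it. `struct dev`, `has_upper()` and `team_port_add_sketch()` are hypothetical; `has_upper()` is meant to behave like netdev_has_upper_dev() by considering every device above, not only direct masters.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_UPPERS 4

/* Hypothetical device node: uppers are the devices stacked on top. */
struct dev {
	const char *name;
	struct dev *upper[MAX_UPPERS];
	int nupper;
};

/* Is "upper" anywhere above "dev" in the stack? */
static bool has_upper(const struct dev *dev, const struct dev *upper)
{
	for (int i = 0; i < dev->nupper; i++) {
		if (dev->upper[i] == upper || has_upper(dev->upper[i], upper))
			return true;
	}
	return false;
}

/* Refuse to enslave a device that already sits above the team, before
 * any work (and any re-entrant locking) happens on its behalf. */
static int team_port_add_sketch(struct dev *team, struct dev *port)
{
	if (has_upper(team, port))
		return -EBUSY;
	if (port->nupper >= MAX_UPPERS)
		return -ENOSPC;
	port->upper[port->nupper++] = team;
	return 0;
}
```

In the scenario from the commit message, bond0 is already the master of team0, so trying to add bond0 as a team port is rejected with -EBUSY.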
v2: Do not split the string in netdev_err()
Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device") Acked-by: Jiri Pirko jiri@mellanox.com Signed-off-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/team/team.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/net/team/team.c +++ b/drivers/net/team/team.c @@ -1136,6 +1136,12 @@ static int team_port_add(struct team *te return -EINVAL; }
+ if (netdev_has_upper_dev(dev, port_dev)) { + netdev_err(dev, "Device %s is already an upper device of the team interface\n", + portname); + return -EBUSY; + } + if (port_dev->features & NETIF_F_VLAN_CHALLENGED && vlan_uses_dev(dev)) { netdev_err(dev, "Device %s is VLAN challenged and team device has VLAN set up\n",
From: Vinod Koul vkoul@kernel.org
[ Upstream commit b561af36b1841088552464cdc3f6371d92f17710 ]
stmmac_check_ether_addr() checks the MAC address and assigns one in the driver's open(). In many cases, when we create a slave netdevice, the dev addr is inherited from the master, but the master's dev addr may be NULL at that time, so move this call to driver probe so that the address is always valid.
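A user-space sketch of the probe-time address check. `valid_ether_addr()` applies the same rule as the kernel's is_valid_ether_addr() (not multicast, not all zeroes); `check_ether_addr_sketch()` is hypothetical and skips the step where the real driver first tries the address stored in the MAC hardware before falling back to a random one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_ETH_ALEN 6

/* Valid unicast address: multicast bit clear and not all zeroes. */
static bool valid_ether_addr(const uint8_t *a)
{
	bool zero = !(a[0] | a[1] | a[2] | a[3] | a[4] | a[5]);

	return !(a[0] & 0x01) && !zero;
}

/* Keep a valid address; otherwise generate a random, locally
 * administered unicast one (as eth_hw_addr_random() does). */
static void check_ether_addr_sketch(uint8_t *addr)
{
	if (valid_ether_addr(addr))
		return;
	for (int i = 0; i < SKETCH_ETH_ALEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);
	addr[0] &= 0xfe;  /* clear the multicast bit */
	addr[0] |= 0x02;  /* set the locally administered bit */
}
```

Doing this at probe time means any slave device that copies the master's address later sees a valid one.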
Signed-off-by: Xiaofei Shen xiaofeis@codeaurora.org Tested-by: Xiaofei Shen xiaofeis@codeaurora.org Signed-off-by: Sneh Shah snehshah@codeaurora.org Signed-off-by: Vinod Koul vkoul@kernel.org Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -1792,8 +1792,6 @@ static int stmmac_open(struct net_device struct stmmac_priv *priv = netdev_priv(dev); int ret;
- stmmac_check_ether_addr(priv); - if (priv->pcs != STMMAC_PCS_RGMII && priv->pcs != STMMAC_PCS_TBI && priv->pcs != STMMAC_PCS_RTBI) { ret = stmmac_init_phy(dev); @@ -2929,6 +2927,8 @@ int stmmac_dvr_probe(struct device *devi if (ret) goto error_hw_init;
+ stmmac_check_ether_addr(priv); + ndev->netdev_ops = &stmmac_netdev_ops;
ndev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
From: ZhangXiaoxu zhangxiaoxu5@huawei.com
[ Upstream commit 19fad20d15a6494f47f85d869f00b11343ee5c78 ]
There is a UBSAN report as below: UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56 signed integer overflow: 2147483647 * 1000 cannot be represented in type 'int' CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e #1 Call Trace: <IRQ> dump_stack+0x8c/0xba ubsan_epilogue+0x11/0x60 handle_overflow+0x12d/0x170 ? ttwu_do_wakeup+0x21/0x320 __ubsan_handle_mul_overflow+0x12/0x20 tcp_ack_update_rtt+0x76c/0x780 tcp_clean_rtx_queue+0x499/0x14d0 tcp_ack+0x69e/0x1240 ? __wake_up_sync_key+0x2c/0x50 ? update_group_capacity+0x50/0x680 tcp_rcv_established+0x4e2/0xe10 tcp_v4_do_rcv+0x22b/0x420 tcp_v4_rcv+0xfe8/0x1190 ip_protocol_deliver_rcu+0x36/0x180 ip_local_deliver+0x15b/0x1a0 ip_rcv+0xac/0xd0 __netif_receive_skb_one_core+0x7f/0xb0 __netif_receive_skb+0x33/0xc0 netif_receive_skb_internal+0x84/0x1c0 napi_gro_receive+0x2a0/0x300 receive_buf+0x3d4/0x2350 ? detach_buf_split+0x159/0x390 virtnet_poll+0x198/0x840 ? reweight_entity+0x243/0x4b0 net_rx_action+0x25c/0x770 __do_softirq+0x19b/0x66d irq_exit+0x1eb/0x230 do_IRQ+0x7a/0x150 common_interrupt+0xf/0xf </IRQ>
It can be reproduced by: echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen
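The arithmetic behind the report: UBSAN shows 2147483647 * 1000 being computed in `int`, which overflows, while the new maximum of one day gives 86400 * 1000 = 86,400,000, well within range. The sketch below assumes proc_dointvec_minmax() semantics (out-of-range writes are rejected, not clamped); `set_min_rtt_wlen()` and `scaled_fits_int()` are hypothetical helpers.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define ONE_DAY_SECS (24 * 3600)   /* 86400, the new extra2 bound */

/* Range-checked write, as with .extra1 = &zero, .extra2 = &one_day_secs. */
static int set_min_rtt_wlen(int *wlen, int val)
{
	if (val < 0 || val > ONE_DAY_SECS)
		return -EINVAL;   /* write rejected, old value kept */
	*wlen = val;
	return 0;
}

/* Would wlen * hz fit in an int? Computed in 64 bits so the check
 * itself cannot overflow. */
static bool scaled_fits_int(int wlen, int hz)
{
	int64_t ticks = (int64_t)wlen * hz;

	return ticks <= INT_MAX;
}
```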
Fixes: f672258391b42 ("tcp: track min RTT using windowed min-filter") Signed-off-by: ZhangXiaoxu zhangxiaoxu5@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/networking/ip-sysctl.txt | 1 + net/ipv4/sysctl_net_ipv4.c | 5 ++++- 2 files changed, 5 insertions(+), 1 deletion(-)
--- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -387,6 +387,7 @@ tcp_min_rtt_wlen - INTEGER minimum RTT when it is moved to a longer path (e.g., due to traffic engineering). A longer window makes the filter more resistant to RTT inflations such as transient congestion. The unit is seconds. + Possible values: 0 - 86400 (1 day) Default: 300
tcp_moderate_rcvbuf - BOOLEAN --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -42,6 +42,7 @@ static int tcp_syn_retries_min = 1; static int tcp_syn_retries_max = MAX_TCP_SYNCNT; static int ip_ping_group_range_min[] = { 0, 0 }; static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX }; +static int one_day_secs = 24 * 3600;
/* Update system visible IP port range */ static void set_local_port_range(struct net *net, int range[2]) @@ -597,7 +598,9 @@ static struct ctl_table ipv4_table[] = { .data = &sysctl_tcp_min_rtt_wlen, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &one_day_secs }, { .procname = "tcp_low_latency",
From: Diana Craciun diana.craciun@nxp.com
commit 3bc8ea8603ae4c1e09aca8de229ad38b8091fcb3 upstream.
If the user chooses not to use the mitigations, replace the code sequence with nops.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/setup_32.c | 1 + arch/powerpc/kernel/setup_64.c | 1 + 2 files changed, 2 insertions(+)
--- a/arch/powerpc/kernel/setup_32.c +++ b/arch/powerpc/kernel/setup_32.c @@ -323,6 +323,7 @@ void __init setup_arch(char **cmdline_p) if ( ppc_md.progress ) ppc_md.progress("arch: exit", 0x3eab);
setup_barrier_nospec(); + setup_spectre_v2();
paging_init();
--- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -737,6 +737,7 @@ void __init setup_arch(char **cmdline_p) ppc_md.setup_arch();
setup_barrier_nospec(); + setup_spectre_v2();
paging_init();
From: Diana Craciun diana.craciun@nxp.com
commit e7aa61f47b23afbec41031bc47ca8d6cb6516abc upstream.
Switching from the guest to the host is another place where speculative accesses can be exploited. Flush the branch predictor when entering KVM.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kvm/bookehv_interrupts.S | 4 ++++ 1 file changed, 4 insertions(+)
--- a/arch/powerpc/kvm/bookehv_interrupts.S +++ b/arch/powerpc/kvm/bookehv_interrupts.S @@ -75,6 +75,10 @@ PPC_LL r1, VCPU_HOST_STACK(r4) PPC_LL r2, HOST_R2(r1)
+START_BTB_FLUSH_SECTION + BTB_FLUSH(r10) +END_BTB_FLUSH_SECTION + mfspr r10, SPRN_PID lwz r8, VCPU_HOST_PID(r4) PPC_LL r11, VCPU_SHARED(r4)
From: Diana Craciun diana.craciun@nxp.com
commit 98518c4d8728656db349f875fcbbc7c126d4c973 upstream.
In order to flush the branch predictor, the guest kernel writes to the BUCSR register, which is hypervisor privileged. However, the branch predictor is already flushed at each KVM entry, so there is nothing left to do: just return to the guest as soon as possible.
Signed-off-by: Diana Craciun diana.craciun@nxp.com [mpe: Tweak comment formatting] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kvm/e500_emulate.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/arch/powerpc/kvm/e500_emulate.c +++ b/arch/powerpc/kvm/e500_emulate.c @@ -277,6 +277,13 @@ int kvmppc_core_emulate_mtspr_e500(struc vcpu->arch.pwrmgtcr0 = spr_val; break;
+ case SPRN_BUCSR: + /* + * If we are here, it means that we have already flushed the + * branch predictor, so just return to guest. + */ + break; + /* extra exceptions */ #ifdef CONFIG_SPE_POSSIBLE case SPRN_IVOR32:
From: Diana Craciun diana.craciun@nxp.com
commit 7fef436295bf6c05effe682c8797dfcb0deb112a upstream.
In order to protect against speculation attacks on indirect branches, the branch predictor is flushed at kernel entry to protect against the following situations:

- userspace process attacking another userspace process
- userspace process attacking the kernel

Basically, when the privilege level changes (i.e. the kernel is entered), the branch predictor state is flushed.
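The entry-time decision can be modeled in plain C. This is a hypothetical userspace sketch, not the kernel's assembly: it assumes `MSR_PR` is the 0x4000 bit of the saved MSR (SRR1), and `btb_flush()` is a stand-in counter for the real `BTB_FLUSH` macro. It shows only that the flush happens when the saved state says the exception came from user mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: MSR[PR] (problem state, i.e. user mode) bit. */
#define MSR_PR 0x4000u

/* Hypothetical stand-in for the BTB_FLUSH assembly macro. */
static unsigned int btb_flush_count;

static void btb_flush(void)
{
	btb_flush_count++;
}

/*
 * Model of the exception prolog: the saved MSR (SRR1) records the
 * privilege level we came from. Only an entry from user space crosses
 * a privilege boundary, so only then is the branch predictor flushed.
 * Returns true if the entry came from user mode.
 */
static bool exception_entry(uint32_t srr1)
{
	bool from_user = (srr1 & MSR_PR) != 0;

	if (from_user)
		btb_flush();
	return from_user;
}
```

Kernel-to-kernel exceptions skip the flush, which keeps the cost limited to real privilege transitions.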
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/head_booke.h | 6 ++++++ arch/powerpc/kernel/head_fsl_booke.S | 15 +++++++++++++++ 2 files changed, 21 insertions(+)
--- a/arch/powerpc/kernel/head_booke.h +++ b/arch/powerpc/kernel/head_booke.h @@ -42,6 +42,9 @@ andi. r11, r11, MSR_PR; /* check whether user or kernel */\ mr r11, r1; \ beq 1f; \ +START_BTB_FLUSH_SECTION \ + BTB_FLUSH(r11) \ +END_BTB_FLUSH_SECTION \ /* if from user, start at top of this thread's kernel stack */ \ lwz r11, THREAD_INFO-THREAD(r10); \ ALLOC_STACK_FRAME(r11, THREAD_SIZE); \ @@ -127,6 +130,9 @@ stw r9,_CCR(r8); /* save CR on stack */\ mfspr r11,exc_level_srr1; /* check whether user or kernel */\ DO_KVM BOOKE_INTERRUPT_##intno exc_level_srr1; \ +START_BTB_FLUSH_SECTION \ + BTB_FLUSH(r10) \ +END_BTB_FLUSH_SECTION \ andi. r11,r11,MSR_PR; \ mfspr r11,SPRN_SPRG_THREAD; /* if from user, start at top of */\ lwz r11,THREAD_INFO-THREAD(r11); /* this thread's kernel stack */\ --- a/arch/powerpc/kernel/head_fsl_booke.S +++ b/arch/powerpc/kernel/head_fsl_booke.S @@ -451,6 +451,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) mfcr r13 stw r13, THREAD_NORMSAVE(3)(r10) DO_KVM BOOKE_INTERRUPT_DTLB_MISS SPRN_SRR1 +START_BTB_FLUSH_SECTION + mfspr r11, SPRN_SRR1 + andi. r10,r11,MSR_PR + beq 1f + BTB_FLUSH(r10) +1: +END_BTB_FLUSH_SECTION mfspr r10, SPRN_DEAR /* Get faulting address */
/* If we are faulting a kernel address, we have to use the @@ -545,6 +552,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) mfcr r13 stw r13, THREAD_NORMSAVE(3)(r10) DO_KVM BOOKE_INTERRUPT_ITLB_MISS SPRN_SRR1 +START_BTB_FLUSH_SECTION + mfspr r11, SPRN_SRR1 + andi. r10,r11,MSR_PR + beq 1f + BTB_FLUSH(r10) +1: +END_BTB_FLUSH_SECTION + mfspr r10, SPRN_SRR0 /* Get faulting address */
/* If we are faulting a kernel address, we have to use the
From: Diana Craciun diana.craciun@nxp.com
commit c28218d4abbf4f2035495334d8bfcba64bda4787 upstream.
Use barrier_nospec to sanitize the syscall table.
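The patch below places a speculation barrier (`barrier_nospec_asm`) between the bounds check and the table load. A portable way to illustrate the same goal in C is index masking, using the same formula as the kernel's generic `array_index_mask_nospec()`; note that this is a different technique from the barrier the patch uses. The table size and handlers below are hypothetical, and architecturally the result is identical with or without the mask: it only matters if the CPU mispredicts the bounds check.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define NR_SYSCALLS 3UL /* hypothetical table size */
#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * Same formula as the kernel's generic array_index_mask_nospec():
 * all ones when index < size, zero otherwise, computed without a branch.
 * Relies on arithmetic right shift of a negative long (GCC/Clang).
 */
static unsigned long array_index_mask_nospec(unsigned long index,
					     unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
}

typedef long (*syscall_fn)(void);

/* Hypothetical handlers. */
static long sys_zero(void) { return 100; }
static long sys_one(void) { return 101; }
static long sys_two(void) { return 102; }

static const syscall_fn sys_call_table[NR_SYSCALLS] = {
	sys_zero, sys_one, sys_two,
};

static long dispatch_syscall(unsigned long nr)
{
	if (nr >= NR_SYSCALLS)
		return -ENOSYS;
	/*
	 * If the branch above is mispredicted, the masked index becomes 0,
	 * so a speculative table load stays inside the table.
	 */
	nr &= array_index_mask_nospec(nr, NR_SYSCALLS);
	return sys_call_table[nr]();
}
```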
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/entry_32.S | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -33,6 +33,7 @@ #include <asm/unistd.h> #include <asm/ftrace.h> #include <asm/ptrace.h> +#include <asm/barrier.h>
/* * MSR_KERNEL is > 0x10000 on 4xx/Book-E since it include MSR_CE. @@ -340,6 +341,15 @@ syscall_dotrace_cont: ori r10,r10,sys_call_table@l slwi r0,r0,2 bge- 66f + + barrier_nospec_asm + /* + * Prevent the load of the handler below (based on the user-passed + * system call number) being speculatively executed until the test + * against NR_syscalls and branch to .66f above has + * committed. + */ + lwzx r10,r10,r0 /* Fetch system call handler [ptr] */ mtlr r10 addi r9,r1,STACK_FRAME_OVERHEAD
From: Diana Craciun diana.craciun@nxp.com
commit 039daac5526932ec731e4499613018d263af8b3e upstream.
Fixed the following build warning: powerpc-linux-gnu-ld: warning: orphan section `__btb_flush_fixup' from `arch/powerpc/kernel/head_44x.o' being placed in section `__btb_flush_fixup'.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/head_booke.h | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
--- a/arch/powerpc/kernel/head_booke.h +++ b/arch/powerpc/kernel/head_booke.h @@ -31,6 +31,16 @@ */ #define THREAD_NORMSAVE(offset) (THREAD_NORMSAVES + (offset * 4))
+#ifdef CONFIG_PPC_FSL_BOOK3E +#define BOOKE_CLEAR_BTB(reg) \ +START_BTB_FLUSH_SECTION \ + BTB_FLUSH(reg) \ +END_BTB_FLUSH_SECTION +#else +#define BOOKE_CLEAR_BTB(reg) +#endif + + #define NORMAL_EXCEPTION_PROLOG(intno) \ mtspr SPRN_SPRG_WSCRATCH0, r10; /* save one register */ \ mfspr r10, SPRN_SPRG_THREAD; \ @@ -42,9 +52,7 @@ andi. r11, r11, MSR_PR; /* check whether user or kernel */\ mr r11, r1; \ beq 1f; \ -START_BTB_FLUSH_SECTION \ - BTB_FLUSH(r11) \ -END_BTB_FLUSH_SECTION \ + BOOKE_CLEAR_BTB(r11) \ /* if from user, start at top of this thread's kernel stack */ \ lwz r11, THREAD_INFO-THREAD(r10); \ ALLOC_STACK_FRAME(r11, THREAD_SIZE); \ @@ -130,9 +138,7 @@ END_BTB_FLUSH_SECTION \ stw r9,_CCR(r8); /* save CR on stack */\ mfspr r11,exc_level_srr1; /* check whether user or kernel */\ DO_KVM BOOKE_INTERRUPT_##intno exc_level_srr1; \ -START_BTB_FLUSH_SECTION \ - BTB_FLUSH(r10) \ -END_BTB_FLUSH_SECTION \ + BOOKE_CLEAR_BTB(r10) \ andi. r11,r11,MSR_PR; \ mfspr r11,SPRN_SPRG_THREAD; /* if from user, start at top of */\ lwz r11,THREAD_INFO-THREAD(r11); /* this thread's kernel stack */\
From: Diana Craciun diana.craciun@nxp.com
commit e59f5bd759b7dee57593c5b6c0441609bda5d530 upstream.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- Documentation/kernel-parameters.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2450,7 +2450,7 @@ bytes respectively. Such letter suffixes
nohugeiomap [KNL,x86] Disable kernel huge I/O mappings.
- nospectre_v2 [X86] Disable all mitigations for the Spectre variant 2 + nospectre_v2 [X86,PPC_FSL_BOOK3E] Disable all mitigations for the Spectre variant 2 (indirect branch prediction) vulnerability. System may allow data leaks with this option, which is equivalent to spectre_v2=off.
From: Diana Craciun diana.craciun@nxp.com
commit 26cb1f36c43ee6e89d2a9f48a5a7500d5248f836 upstream.
Currently only supported on powerpc.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- Documentation/kernel-parameters.txt | 4 ++++ 1 file changed, 4 insertions(+)
--- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2450,6 +2450,10 @@ bytes respectively. Such letter suffixes
nohugeiomap [KNL,x86] Disable kernel huge I/O mappings.
+ nospectre_v1 [PPC] Disable mitigations for Spectre Variant 1 (bounds + check bypass). With this option data leaks are possible + in the system. + nospectre_v2 [X86,PPC_FSL_BOOK3E] Disable all mitigations for the Spectre variant 2 (indirect branch prediction) vulnerability. System may allow data leaks with this option, which is equivalent
From: Alexander Kappner agk@godking.net
commit bb1b40c7cb863f0800a6410c7dcb86cf3f28d3b1 upstream.
iOS devices require the host to be "trusted" before servicing network packets. Establishing trust requires the user to confirm a dialog on the iOS device. Until trust is established, the iOS device silently discards network packets from the host. Currently, the ipheth driver does not detect whether an iOS device has established trust with the host, and immediately sets up the transmit queues.
This causes the following problems:
- Kernel taint due to WARN() in netdev watchdog.
- Dmesg spam ("TX timeout").
- Disruption of user space networking activity (dhcpd, etc...) when new interface comes up but cannot be used.
- Unnecessary host and device wakeups and USB traffic
Example dmesg output:
[ 1101.319778] NETDEV WATCHDOG: eth1 (ipheth): transmit queue 0 timed out
[ 1101.319817] ------------[ cut here ]------------
[ 1101.319828] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x20f/0x220
[ 1101.319831] Modules linked in: ipheth usbmon nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) iwlmvm mac80211 iwlwifi btusb btrtl btbcm btintel qmi_wwan bluetooth cfg80211 ecdh_generic thinkpad_acpi rfkill [last unloaded: ipheth]
[ 1101.319861] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P O 4.13.12.1 #1
[ 1101.319864] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS N1EET62W (1.35 ) 11/10/2016
[ 1101.319867] task: ffffffff81e11500 task.stack: ffffffff81e00000
[ 1101.319873] RIP: 0010:dev_watchdog+0x20f/0x220
[ 1101.319876] RSP: 0018:ffff8810a3c03e98 EFLAGS: 00010292
[ 1101.319880] RAX: 000000000000003a RBX: 0000000000000000 RCX: 0000000000000000
[ 1101.319883] RDX: ffff8810a3c15c48 RSI: ffffffff81ccbfc2 RDI: 00000000ffffffff
[ 1101.319886] RBP: ffff880c04ebc41c R08: 0000000000000000 R09: 0000000000000379
[ 1101.319889] R10: 00000100696589d0 R11: 0000000000000378 R12: ffff880c04ebc000
[ 1101.319892] R13: 0000000000000000 R14: 0000000000000001 R15: ffff880c2865fc80
[ 1101.319896] FS: 0000000000000000(0000) GS:ffff8810a3c00000(0000) knlGS:0000000000000000
[ 1101.319899] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1101.319902] CR2: 00007f3ff24ac000 CR3: 0000000001e0a000 CR4: 00000000003406f0
[ 1101.319905] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1101.319908] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1101.319910] Call Trace:
[ 1101.319914] <IRQ>
[ 1101.319921] ? dev_graft_qdisc+0x70/0x70
[ 1101.319928] ? dev_graft_qdisc+0x70/0x70
[ 1101.319934] ? call_timer_fn+0x2e/0x170
[ 1101.319939] ? dev_graft_qdisc+0x70/0x70
[ 1101.319944] ? run_timer_softirq+0x1ea/0x440
[ 1101.319951] ? timerqueue_add+0x54/0x80
[ 1101.319956] ? enqueue_hrtimer+0x38/0xa0
[ 1101.319963] ? __do_softirq+0xed/0x2e7
[ 1101.319970] ? irq_exit+0xb4/0xc0
[ 1101.319976] ? smp_apic_timer_interrupt+0x39/0x50
[ 1101.319981] ? apic_timer_interrupt+0x8c/0xa0
[ 1101.319983] </IRQ>
[ 1101.319992] ? cpuidle_enter_state+0xfa/0x2a0
[ 1101.319999] ? do_idle+0x1a3/0x1f0
[ 1101.320004] ? cpu_startup_entry+0x5f/0x70
[ 1101.320011] ? start_kernel+0x444/0x44c
[ 1101.320017] ? early_idt_handler_array+0x120/0x120
[ 1101.320023] ? x86_64_start_kernel+0x145/0x154
[ 1101.320028] ? secondary_startup_64+0x9f/0x9f
[ 1101.320033] Code: 20 04 00 00 eb 9f 4c 89 e7 c6 05 59 44 71 00 01 e8 a7 df fd ff 89 d9 4c 89 e6 48 c7 c7 70 b7 cd 81 48 89 c2 31 c0 e8 97 64 90 ff <0f> ff eb bf 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
[ 1101.320103] ---[ end trace 0cc4d251e2b57080 ]---
[ 1101.320110] ipheth 1-5:4.2: ipheth_tx_timeout: TX timeout
The last message "TX timeout" is repeated every 5 seconds until trust is established or the device is disconnected, filling up dmesg.
The proposed patch eliminates the problem by keeping the TX queue and carrier disabled after connection until the first packet is received from the iOS device. This is tracked by the confirmed_pairing variable in the device structure. Only after at least one packet has been received from the iOS device are the transmit queue and carrier brought up, during the periodic device poll in ipheth_carrier_set. Because the iOS device always sends a packet immediately once trust is established, this should not delay the interface becoming usable. To prevent failed URBs in ipheth_sndbulk_callback from perpetually re-enabling the queue if it was disabled, a new check ensures that only successful transfers re-enable the queue, whereas failed transfers only trigger an immediate poll.
This has the added benefit of removing the periodic control requests to the iOS device until trust has been established and thus should reduce wakeup events on both the host and the iOS device.
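The gating described above can be sketched as a small userspace state machine. This is a hypothetical model, not driver code: `struct ipheth_model` and the `model_*` functions are illustrative names, boolean flags stand in for `netif_carrier_on/off()`, `netif_wake_queue()`/`netif_stop_queue()` and `tx_urb->status == -EINPROGRESS`, and a counter stands in for `schedule_delayed_work(..., 0)`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the relevant ipheth_device state. */
struct ipheth_model {
	bool confirmed_pairing; /* set once any packet arrives from the device */
	bool carrier;           /* models netif_carrier_on/off */
	bool queue_running;     /* models netif_wake_queue/netif_stop_queue */
	bool tx_in_flight;      /* models tx_urb->status == -EINPROGRESS */
	int polls_scheduled;    /* models schedule_delayed_work(..., 0) */
};

static void model_probe(struct ipheth_model *d)
{
	/* Carrier down and TX stopped until the device proves it trusts us. */
	d->confirmed_pairing = false;
	d->carrier = false;
	d->queue_running = false;
	d->tx_in_flight = false;
	d->polls_scheduled = 0;
}

static void model_rx(struct ipheth_model *d)
{
	d->confirmed_pairing = true; /* any received packet proves trust */
}

static void model_tx_done(struct ipheth_model *d, int status)
{
	d->tx_in_flight = false;
	if (status == 0)
		d->queue_running = true;  /* only success re-enables TX */
	else
		d->polls_scheduled++;     /* errors trigger an immediate poll */
}

/* Periodic poll; device_says_on models ctrl_buf[0] == IPHETH_CARRIER_ON. */
static void model_carrier_set(struct ipheth_model *d, bool device_says_on)
{
	if (!d->confirmed_pairing)
		return; /* no control request until trust is established */
	if (device_says_on) {
		d->carrier = true;
		if (!d->tx_in_flight)
			d->queue_running = true;
	} else {
		d->carrier = false;
		d->queue_running = false;
	}
}
```

In this model, polling before the first received packet changes nothing, which mirrors why the watchdog no longer fires on an untrusted device.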
Signed-off-by: Alexander Kappner agk@godking.net Signed-off-by: David S. Miller davem@davemloft.net [groeck: Fixed context conflict seen because 45611c61dd50 was applied first] Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/ipheth.c | 30 +++++++++++++++++++++--------- 1 file changed, 21 insertions(+), 9 deletions(-)
--- a/drivers/net/usb/ipheth.c +++ b/drivers/net/usb/ipheth.c @@ -148,6 +148,7 @@ struct ipheth_device { u8 bulk_in; u8 bulk_out; struct delayed_work carrier_work; + bool confirmed_pairing; };
static int ipheth_rx_submit(struct ipheth_device *dev, gfp_t mem_flags); @@ -259,7 +260,7 @@ static void ipheth_rcvbulk_callback(stru
dev->net->stats.rx_packets++; dev->net->stats.rx_bytes += len; - + dev->confirmed_pairing = true; netif_rx(skb); ipheth_rx_submit(dev, GFP_ATOMIC); } @@ -280,14 +281,21 @@ static void ipheth_sndbulk_callback(stru dev_err(&dev->intf->dev, "%s: urb status: %d\n", __func__, status);
- netif_wake_queue(dev->net); + if (status == 0) + netif_wake_queue(dev->net); + else + // on URB error, trigger immediate poll + schedule_delayed_work(&dev->carrier_work, 0); }
static int ipheth_carrier_set(struct ipheth_device *dev) { struct usb_device *udev = dev->udev; int retval; - + if (!dev) + return 0; + if (!dev->confirmed_pairing) + return 0; retval = usb_control_msg(udev, usb_rcvctrlpipe(udev, IPHETH_CTRL_ENDP), IPHETH_CMD_CARRIER_CHECK, /* request */ @@ -302,11 +310,14 @@ static int ipheth_carrier_set(struct iph return retval; }
- if (dev->ctrl_buf[0] == IPHETH_CARRIER_ON) + if (dev->ctrl_buf[0] == IPHETH_CARRIER_ON) { netif_carrier_on(dev->net); - else + if (dev->tx_urb->status != -EINPROGRESS) + netif_wake_queue(dev->net); + } else { netif_carrier_off(dev->net); - + netif_stop_queue(dev->net); + } return 0; }
@@ -386,7 +397,6 @@ static int ipheth_open(struct net_device return retval;
schedule_delayed_work(&dev->carrier_work, IPHETH_CARRIER_CHECK_TIMEOUT); - netif_start_queue(net); return retval; }
@@ -489,7 +499,7 @@ static int ipheth_probe(struct usb_inter dev->udev = udev; dev->net = netdev; dev->intf = intf; - + dev->confirmed_pairing = false; /* Set up endpoints */ hintf = usb_altnum_to_altsetting(intf, IPHETH_ALT_INTFNUM); if (hintf == NULL) { @@ -540,7 +550,9 @@ static int ipheth_probe(struct usb_inter retval = -EIO; goto err_register_netdev; } - + // carrier down and transmit queues stopped until packet from device + netif_carrier_off(netdev); + netif_tx_stop_all_queues(netdev); dev_info(&intf->dev, "Apple iPhone USB Ethernet device attached\n"); return 0;
From: Gustavo A. R. Silva garsilva@embeddedor.com
commit 61c59355e0154a938b28710dfa6c1d8be2ddcefa upstream.
_dev_ is being dereferenced before it is null checked, hence there is a potential null pointer dereference.
Fix this by moving the pointer dereference after _dev_ has been null checked.
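The ordering rule behind this fix fits in a few lines. This is a hypothetical sketch with illustrative types (`struct dev_model`, `struct usb_dev_model`), not the driver's structures; it only shows that the pointer is tested before any member is read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct usb_device and struct ipheth_device. */
struct usb_dev_model { int id; };
struct dev_model {
	struct usb_dev_model *udev;
	int confirmed_pairing;
};

/* Same shape as the fixed ipheth_carrier_set() prologue. */
static int carrier_set_model(const struct dev_model *dev)
{
	const struct usb_dev_model *udev;

	if (!dev)                 /* check the pointer first ... */
		return 0;
	if (!dev->confirmed_pairing)
		return 0;
	udev = dev->udev;         /* ... and only then dereference it */
	return udev ? udev->id : -1;
}
```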
Addresses-Coverity-ID: 1462020 Fixes: bb1b40c7cb86 ("usbnet: ipheth: prevent TX queue timeouts when device not ready") Signed-off-by: Gustavo A. R. Silva garsilva@embeddedor.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/usb/ipheth.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/net/usb/ipheth.c +++ b/drivers/net/usb/ipheth.c @@ -290,12 +290,15 @@ static void ipheth_sndbulk_callback(stru
static int ipheth_carrier_set(struct ipheth_device *dev) { - struct usb_device *udev = dev->udev; + struct usb_device *udev; int retval; + if (!dev) return 0; if (!dev->confirmed_pairing) return 0; + + udev = dev->udev; retval = usb_control_msg(udev, usb_rcvctrlpipe(udev, IPHETH_CTRL_ENDP), IPHETH_CMD_CARRIER_CHECK, /* request */
[ Upstream commit 5bf7295fe34a5251b1d241b9736af4697b590670 ]
netdev_alloc_skb() can fail and return a NULL pointer, which is then dereferenced without a check. This patch checks the return value and breaks out of the loopback test loop on failure.
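A userspace sketch of the pattern, assuming `malloc()` as a stand-in for `netdev_alloc_skb()`: the allocator is passed in so that failure can be simulated, and `NUM_PKTS`, `PKT_SIZE` and `limited_alloc()` are hypothetical. On a NULL return, the loop stops instead of writing into the missing buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NUM_PKTS 16 /* hypothetical stand-in for QLCNIC_NUM_ILB_PKT */
#define PKT_SIZE 64 /* hypothetical stand-in for QLCNIC_ILB_PKT_SIZE */

/* Hypothetical allocator that fails once its budget is used up. */
static int alloc_budget;

static void *limited_alloc(size_t n)
{
	if (alloc_budget <= 0)
		return NULL;
	alloc_budget--;
	return malloc(n);
}

/* Returns the number of test packets that were built. */
static int send_loopback_pkts(void *(*alloc)(size_t))
{
	int sent = 0;

	for (int i = 0; i < NUM_PKTS; i++) {
		unsigned char *buf = alloc(PKT_SIZE);

		if (!buf)
			break; /* the fix: never touch a NULL buffer */
		memset(buf, 0xff, PKT_SIZE); /* "build" the packet */
		free(buf);
		sent++;
	}
	return sent;
}
```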
Signed-off-by: Aditya Pakki pakki001@umn.edu Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c index 0a2318cad34d..63ebc491057b 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c @@ -1038,6 +1038,8 @@ int qlcnic_do_lb_test(struct qlcnic_adapter *adapter, u8 mode)
for (i = 0; i < QLCNIC_NUM_ILB_PKT; i++) { skb = netdev_alloc_skb(adapter->netdev, QLCNIC_ILB_PKT_SIZE); + if (!skb) + break; qlcnic_create_loopback_buff(skb->data, adapter->mac_addr); skb_put(skb, QLCNIC_ILB_PKT_SIZE); adapter->ahw->diag_cnt = 0;
[ Upstream commit e166e4fdaced850bee3d5ee12a5740258fb30587 ]
Since commit 21d1196a35f5 ("ipv4: set transport header earlier"), skb->transport_header has always been set before entering INET netfilter. This patch sets skb->transport_header for the bridge before entering INET netfilter via bridge-nf-call-iptables.
It also fixes an issue where sctp_error() could not compute the correct checksum because skb->transport_header was unset.
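The offset arithmetic used by the patch can be checked in isolation. This sketch works on a raw byte buffer with offsets measured from the start of the buffer, which is an assumption for illustration (the kernel stores these as offsets from `skb->head`). For IPv4, IHL is the low nibble of the first header byte in 32-bit words. For IPv6, the fixed header is 40 bytes and, as in the patch, extension headers are not walked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN 40 /* sizeof(struct ipv6hdr) */

/* IPv4: transport header starts IHL * 4 bytes after the network header. */
static size_t ipv4_transport_offset(const uint8_t *buf, size_t network_off)
{
	unsigned int ihl = buf[network_off] & 0x0f;

	return network_off + ihl * 4;
}

/* IPv6: fixed-size header; extension headers are ignored here. */
static size_t ipv6_transport_offset(size_t network_off)
{
	return network_off + IPV6_HDR_LEN;
}
```

For example, with a 14-byte Ethernet header in front, an IPv4 header without options (first byte 0x45) puts the transport header at offset 34.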
Fixes: e6d8b64b34aa ("net: sctp: fix and consolidate SCTP checksumming code") Reported-by: Li Shuang shuali@redhat.com Suggested-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Xin Long lucien.xin@gmail.com Acked-by: Neil Horman nhorman@tuxdriver.com Acked-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- net/bridge/br_netfilter_hooks.c | 1 + net/bridge/br_netfilter_ipv6.c | 2 ++ 2 files changed, 3 insertions(+)
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c index 93b5525bcccf..2ae0451fd634 100644 --- a/net/bridge/br_netfilter_hooks.c +++ b/net/bridge/br_netfilter_hooks.c @@ -507,6 +507,7 @@ static unsigned int br_nf_pre_routing(void *priv, nf_bridge->ipv4_daddr = ip_hdr(skb)->daddr;
skb->protocol = htons(ETH_P_IP); + skb->transport_header = skb->network_header + ip_hdr(skb)->ihl * 4;
NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, state->net, state->sk, skb, skb->dev, NULL, diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c index 69dfd212e50d..f94c83f5cc37 100644 --- a/net/bridge/br_netfilter_ipv6.c +++ b/net/bridge/br_netfilter_ipv6.c @@ -237,6 +237,8 @@ unsigned int br_nf_pre_routing_ipv6(void *priv, nf_bridge->ipv6_daddr = ipv6_hdr(skb)->daddr;
skb->protocol = htons(ETH_P_IPV6); + skb->transport_header = skb->network_header + sizeof(struct ipv6hdr); + NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING, state->net, state->sk, skb, skb->dev, NULL, br_nf_pre_routing_finish_ipv6);
[ Upstream commit ac0cdb3d990108df795b676cd0d0e65ac34b2273 ]
Add the missing uart_unregister_driver() and i2c_del_driver() calls before returning from sc16is7xx_init() in the error handling path.
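The fix follows the usual goto-unwind idiom: each failure jumps to a label that undoes only the steps already completed, in reverse order. The sketch below is a hypothetical userspace model; the boolean flags and `fail_*` switches stand in for the UART, I2C and SPI registrations, and the `#ifdef CONFIG_SERIAL_SC16IS7XX_*` guards of the real driver are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical registration state and failure injection. */
static bool uart_registered, i2c_registered, spi_registered;
static int fail_i2c, fail_spi;

static int register_uart(void)
{
	uart_registered = true;
	return 0;
}

static void unregister_uart(void)
{
	uart_registered = false;
}

static int add_i2c(void)
{
	if (fail_i2c)
		return -1;
	i2c_registered = true;
	return 0;
}

static void del_i2c(void)
{
	i2c_registered = false;
}

static int add_spi(void)
{
	if (fail_spi)
		return -1;
	spi_registered = true;
	return 0;
}

static int driver_init(void)
{
	int ret = register_uart();

	if (ret < 0)
		return ret;
	ret = add_i2c();
	if (ret < 0)
		goto err_i2c;
	ret = add_spi();
	if (ret < 0)
		goto err_spi;
	return 0;

err_spi:
	del_i2c();         /* undo step 2 */
err_i2c:
	unregister_uart(); /* undo step 1 */
	return ret;
}
```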
Signed-off-by: Mao Wenan maowenan@huawei.com Reviewed-by: Vladimir Zapolskiy vz@mleia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- drivers/tty/serial/sc16is7xx.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c index 17a22073d226..032f3c13b8c4 100644 --- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -1448,7 +1448,7 @@ static int __init sc16is7xx_init(void) ret = i2c_add_driver(&sc16is7xx_i2c_uart_driver); if (ret < 0) { pr_err("failed to init sc16is7xx i2c --> %d\n", ret); - return ret; + goto err_i2c; } #endif
@@ -1456,10 +1456,18 @@ static int __init sc16is7xx_init(void) ret = spi_register_driver(&sc16is7xx_spi_uart_driver); if (ret < 0) { pr_err("failed to init sc16is7xx spi --> %d\n", ret); - return ret; + goto err_spi; } #endif return ret; + +err_spi: +#ifdef CONFIG_SERIAL_SC16IS7XX_I2C + i2c_del_driver(&sc16is7xx_i2c_uart_driver); +#endif +err_i2c: + uart_unregister_driver(&sc16is7xx_uart); + return ret; } module_init(sc16is7xx_init);
[ Upstream commit 9d6a54c1430647355a5e23434881b2ca3d192b48 ]
The OUT endpoint normally blocks (NAK) subsequent packets when a short packet was received and returns an incomplete queue entry to the gadget driver. Thereby the gadget driver can detect a short packet when reading queue entries with a length that is not equal to a multiple of packet size.
The start_queue() function enables receiving OUT packets regardless of the content of the OUT FIFO. This results in a race: With the current code, it's possible that the "!ep->is_in && (readl(&ep->regs->ep_stat) & BIT(NAK_OUT_PACKETS))" test in start_dma() will fail, then a short packet will be received, and then start_queue() will call stop_out_naking(). That's what we don't want (OUT naking gets turned off while there is data in the FIFO) because then the next driver request might receive a mixture of old and new packets.
With the patch, this race can't occur because the FIFO's state is tested after we know that OUT naking is already turned on, and OUT naking is stopped only when both of the conditions are met. This ensures that all received data is delivered to the gadget driver, which can detect a short packet now before new packets are appended to the last short packet.
Acked-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Guido Kiener guido.kiener@rohde-schwarz.com Signed-off-by: Felipe Balbi felipe.balbi@linux.intel.com Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- drivers/usb/gadget/udc/net2280.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/usb/gadget/udc/net2280.c b/drivers/usb/gadget/udc/net2280.c index 8efeadf30b4d..fc94a09e2a5a 100644 --- a/drivers/usb/gadget/udc/net2280.c +++ b/drivers/usb/gadget/udc/net2280.c @@ -870,9 +870,6 @@ static void start_queue(struct net2280_ep *ep, u32 dmactl, u32 td_dma) (void) readl(&ep->dev->pci->pcimstctl);
writel(BIT(DMA_START), &dma->dmastat); - - if (!ep->is_in) - stop_out_naking(ep); }
static void start_dma(struct net2280_ep *ep, struct net2280_request *req) @@ -911,6 +908,7 @@ static void start_dma(struct net2280_ep *ep, struct net2280_request *req) writel(BIT(DMA_START), &dma->dmastat); return; } + stop_out_naking(ep); }
tmp = dmactl_default;
[ Upstream commit f1d3fba17cd4eeea20397f1324b7b9c69a6a935c ]
When a request must be dequeued with net2280_dequeue(), e.g. due to a device clear action, and the same request has already been completed by scan_dma_completions(), net2280_dequeue() does not find the request in its search loop and returns -EINVAL without restoring ep->stopped. As a result, the endpoint remains blocked and no longer receives any data. This fix restores the status and logs the mismatch as a debug message instead of an error.
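The invariant the fix restores is "save state, then restore it on every return path". A hypothetical sketch with illustrative types (`struct ep_model`, `struct req_model`), not the gadget driver's structures; locking and request completion are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical endpoint with a singly linked queue of requests. */
struct req_model {
	struct req_model *next;
};

struct ep_model {
	bool stopped;
	struct req_model *queue;
};

static int dequeue_model(struct ep_model *ep, struct req_model *target)
{
	bool stopped = ep->stopped; /* save the caller-visible state */
	struct req_model **pp;

	ep->stopped = true;         /* quiesce the endpoint while searching */
	for (pp = &ep->queue; *pp; pp = &(*pp)->next) {
		if (*pp == target) {
			*pp = target->next; /* unlink the request */
			ep->stopped = stopped;
			return 0;
		}
	}
	/* Not found (e.g. already completed): restore state anyway. */
	ep->stopped = stopped;
	return -EINVAL;
}
```

The same reasoning applies to the net2272 fix that follows.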
Acked-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Guido Kiener guido.kiener@rohde-schwarz.com Signed-off-by: Felipe Balbi felipe.balbi@linux.intel.com Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- drivers/usb/gadget/udc/net2280.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/gadget/udc/net2280.c b/drivers/usb/gadget/udc/net2280.c index fc94a09e2a5a..3a8d056a5d16 100644 --- a/drivers/usb/gadget/udc/net2280.c +++ b/drivers/usb/gadget/udc/net2280.c @@ -1270,9 +1270,9 @@ static int net2280_dequeue(struct usb_ep *_ep, struct usb_request *_req) break; } if (&req->req != _req) { + ep->stopped = stopped; spin_unlock_irqrestore(&ep->dev->lock, flags); - dev_err(&ep->dev->pdev->dev, "%s: Request mismatch\n", - __func__); + ep_dbg(ep->dev, "%s: Request mismatch\n", __func__); return -EINVAL; }
[ Upstream commit 091dacc3cc10979ab0422f0a9f7fcc27eee97e69 ]
Restore the status of ep->stopped in function net2272_dequeue().
When the given request is not found in the endpoint queue, the function returns -EINVAL without restoring the state of ep->stopped. As a result, the endpoint remains blocked and no longer transfers any data.
This fix is only compile-tested, since we do not have the corresponding hardware. An analogous fix was tested in the sibling driver. See "usb: gadget: net2280: Fix net2280_dequeue()".
Acked-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Guido Kiener guido.kiener@rohde-schwarz.com Signed-off-by: Felipe Balbi felipe.balbi@linux.intel.com Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- drivers/usb/gadget/udc/net2272.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/usb/gadget/udc/net2272.c b/drivers/usb/gadget/udc/net2272.c index 3b6e34fc032b..553922c3be85 100644 --- a/drivers/usb/gadget/udc/net2272.c +++ b/drivers/usb/gadget/udc/net2272.c @@ -962,6 +962,7 @@ net2272_dequeue(struct usb_ep *_ep, struct usb_request *_req) break; } if (&req->req != _req) { + ep->stopped = stopped; spin_unlock_irqrestore(&ep->dev->lock, flags); return -EINVAL; }
[ Upstream commit 032f85c9360fb1a08385c584c2c4ed114b33c260 ]
Increase the reset duration to ensure correct phy functionality. The reset duration is taken from barebox commit 52fdd510de ("ARM: dts: pfla02: use long enough reset for ethernet phy"):
Use a longer reset time for ethernet phy Micrel KSZ9031RNX. Otherwise a small percentage of modules have 'transmission timeouts' errors like
barebox@Phytec phyFLEX-i.MX6 Quad Carrier-Board:/ ifup eth0
warning: No MAC address set. Using random address 7e:94:4d:02:f8:f3
eth0: 1000Mbps full duplex link detected
eth0: transmission timeout
T eth0: transmission timeout
T eth0: transmission timeout
T eth0: transmission timeout
T eth0: transmission timeout
Cc: Stefan Christ s.christ@phytec.de Cc: Christian Hemp c.hemp@phytec.de Signed-off-by: Marco Felsch m.felsch@pengutronix.de Fixes: 3180f956668e ("ARM: dts: Phytec imx6q pfla02 and pbab01 support") Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- arch/arm/boot/dts/imx6qdl-phytec-pfla02.dtsi | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/arm/boot/dts/imx6qdl-phytec-pfla02.dtsi b/arch/arm/boot/dts/imx6qdl-phytec-pfla02.dtsi index d6d98d426384..cae04e806036 100644 --- a/arch/arm/boot/dts/imx6qdl-phytec-pfla02.dtsi +++ b/arch/arm/boot/dts/imx6qdl-phytec-pfla02.dtsi @@ -90,6 +90,7 @@ pinctrl-names = "default"; pinctrl-0 = <&pinctrl_enet>; phy-mode = "rgmii"; + phy-reset-duration = <10>; /* in msecs */ phy-reset-gpios = <&gpio3 23 GPIO_ACTIVE_LOW>; phy-supply = <&vdd_eth_io_reg>; status = "disabled";
[ Upstream commit 536d3680fd2dab5c39857d62a3e084198fc74ff9 ]
The ks8851 driver lets the chip auto-dequeue received packets once they have been read in full. It achieves that by setting the ADRFE flag in the RXQCR register ("Auto-Dequeue RXQ Frame Enable").
However if allocation of a packet's socket buffer or retrieval of the packet over the SPI bus fails, the packet will not have been read in full and is not auto-dequeued. Such partial retrieval of a packet confuses the chip's RX queue management: On the next RX interrupt, the first packet read from the queue will be the one left there previously and this one can be retrieved without issues. But for any newly received packets, the frame header status and byte count registers (RXFHSR and RXFHBCR) contain bogus values, preventing their retrieval.
The chip allows explicitly dequeueing a packet from the RX queue by setting the RRXEF flag in the RXQCR register ("Release RX Error Frame"). This could be used to dequeue the packet in case of an error, but if that error is a failed SPI transfer, it is unknown if the packet was transferred in full and was auto-dequeued or if it was only transferred in part and requires an explicit dequeue. The safest approach is thus to always dequeue packets explicitly and forgo auto-dequeueing.
Without this change, I've witnessed packet retrieval break completely when an SPI DMA transfer fails, requiring a chip reset. Explicit dequeueing magically fixes this and makes packet retrieval absolutely robust for me.
The chip's documentation suggests auto-dequeuing and uses the RRXEF flag only to dequeue error frames which the driver doesn't want to retrieve. But that seems to be a fair-weather approach.
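The "always dequeue explicitly" policy can be illustrated with a toy RX queue. This is a hypothetical model, not the KS8851 register interface: `struct rxq_model` is a ring of frame IDs, and `read_ok` stands in for whether the SPI read of a frame succeeded. Because every frame is released after the read attempt, whether or not it worked, a failed read cannot leave a half-consumed frame at the head of the queue.

```c
#include <assert.h>
#include <stdbool.h>

#define RXQ_DEPTH 8

/* Hypothetical model of the chip's RX queue (not the real register map). */
struct rxq_model {
	int frames[RXQ_DEPTH];
	int head;
	int count;
};

static void rxq_push(struct rxq_model *q, int frame)
{
	assert(q->count < RXQ_DEPTH);
	q->frames[(q->head + q->count) % RXQ_DEPTH] = frame;
	q->count++;
}

/* Example read hook: the SPI transfer of frame 2 fails. */
static bool read_fails_on_2(int frame)
{
	return frame != 2;
}

/* Returns the number of frames successfully delivered to the stack. */
static int rx_pkts_explicit_dequeue(struct rxq_model *q,
				    bool (*read_ok)(int frame))
{
	int delivered = 0;

	while (q->count > 0) {
		int frame = q->frames[q->head];

		if (read_ok(frame))
			delivered++;
		/* Release the frame (RRXEF-style) whether or not the read worked. */
		q->head = (q->head + 1) % RXQ_DEPTH;
		q->count--;
	}
	return delivered;
}
```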
Signed-off-by: Lukas Wunner lukas@wunner.de Cc: Frank Pavlic f.pavlic@kunbus.de Cc: Ben Dooks ben.dooks@codethink.co.uk Cc: Tristram Ha Tristram.Ha@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- drivers/net/ethernet/micrel/ks8851.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c index 1edc973df4c4..247a3377b951 100644 --- a/drivers/net/ethernet/micrel/ks8851.c +++ b/drivers/net/ethernet/micrel/ks8851.c @@ -547,9 +547,8 @@ static void ks8851_rx_pkts(struct ks8851_net *ks) /* set dma read address */ ks8851_wrreg16(ks, KS_RXFDPR, RXFDPR_RXFPAI | 0x00);
- /* start the packet dma process, and set auto-dequeue rx */ - ks8851_wrreg16(ks, KS_RXQCR, - ks->rc_rxqcr | RXQCR_SDA | RXQCR_ADRFE); + /* start DMA access */ + ks8851_wrreg16(ks, KS_RXQCR, ks->rc_rxqcr | RXQCR_SDA);
if (rxlen > 4) { unsigned int rxalign; @@ -580,7 +579,8 @@ static void ks8851_rx_pkts(struct ks8851_net *ks) } }
- ks8851_wrreg16(ks, KS_RXQCR, ks->rc_rxqcr); + /* end DMA access and dequeue packet */ + ks8851_wrreg16(ks, KS_RXQCR, ks->rc_rxqcr | RXQCR_RRXEF); } }
[ Upstream commit 761cfa979a0c177d6c2d93ef5585cd79ae49a7d5 ]
Commit 73fdeb82e963 ("net: ks8851: Add optional vdd_io regulator and reset gpio") amended the ks8851 driver to briefly assert the chip's reset pin on probe. It also amended the probe routine's error path to reassert the reset pin if a subsequent initialization step fails.
However the commit misplaced reassertion of the reset pin in the error path such that it is not performed if the check of the Chip ID and Enable Register (CIDER) fails. The error path is therefore slightly asymmetrical to the probe routine's body. Fix it.
Signed-off-by: Lukas Wunner lukas@wunner.de Cc: Frank Pavlic f.pavlic@kunbus.de Cc: Stephen Boyd sboyd@codeaurora.org Cc: Nishanth Menon nm@ti.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- drivers/net/ethernet/micrel/ks8851.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c index 247a3377b951..a8c5641ff955 100644 --- a/drivers/net/ethernet/micrel/ks8851.c +++ b/drivers/net/ethernet/micrel/ks8851.c @@ -1567,9 +1567,9 @@ static int ks8851_probe(struct spi_device *spi) free_irq(ndev->irq, ks);
err_irq: +err_id: if (gpio_is_valid(gpio)) gpio_set_value(gpio, 0); -err_id: regulator_disable(ks->vdd_reg); err_reg: regulator_disable(ks->vdd_io);
[ Upstream commit d268f31552794abf5b6aa5af31021643411f25f5 ]
The ks8851 driver currently requests the IRQ before registering the net_device. Because the net_device name is used as the IRQ name and is still "eth%d" when the IRQ is requested, it's impossible to tell IRQs apart if multiple ks8851 chips are present. Most other drivers delay requesting the IRQ until the net_device is opened. Do the same.
The driver doesn't enable interrupts on the chip before opening the net_device and disables them when closing it, so there doesn't seem to be a need to request the IRQ already on probe.
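A hypothetical sketch of why the timing matters: the IRQ label is copied from the device name at request time, so requesting it in open (after registration has replaced the `"eth%d"` template) yields a distinguishable name. `struct netdev_model` and the `*_model` functions are illustrative; `stop_model()` mirrors the `free_irq()` moved into `ks8851_net_stop()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a net_device and its IRQ bookkeeping. */
struct netdev_model {
	char name[16];      /* "eth%d" template until registration */
	char irq_label[16]; /* name the IRQ was requested with */
	bool irq_held;
};

static void register_netdev_model(struct netdev_model *d, int index)
{
	snprintf(d->name, sizeof(d->name), "eth%d", index);
}

static int open_model(struct netdev_model *d)
{
	/* Requested after registration, so the label is the final name. */
	snprintf(d->irq_label, sizeof(d->irq_label), "%s", d->name);
	d->irq_held = true;
	return 0;
}

static int stop_model(struct netdev_model *d)
{
	/* Mirrors the request in open, like free_irq() in ndo_stop. */
	d->irq_held = false;
	return 0;
}
```

Requesting in probe instead would copy the literal template `"eth%d"` for every chip, which is the ambiguity the patch removes.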
Signed-off-by: Lukas Wunner lukas@wunner.de Cc: Frank Pavlic f.pavlic@kunbus.de Cc: Ben Dooks ben.dooks@codethink.co.uk Cc: Tristram Ha Tristram.Ha@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- drivers/net/ethernet/micrel/ks8851.c | 24 +++++++++++------------- 1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c index a8c5641ff955..ff6cab4f6343 100644 --- a/drivers/net/ethernet/micrel/ks8851.c +++ b/drivers/net/ethernet/micrel/ks8851.c @@ -797,6 +797,15 @@ static void ks8851_tx_work(struct work_struct *work) static int ks8851_net_open(struct net_device *dev) { struct ks8851_net *ks = netdev_priv(dev); + int ret; + + ret = request_threaded_irq(dev->irq, NULL, ks8851_irq, + IRQF_TRIGGER_LOW | IRQF_ONESHOT, + dev->name, ks); + if (ret < 0) { + netdev_err(dev, "failed to get irq\n"); + return ret; + }
/* lock the card, even if we may not actually be doing anything * else at the moment */ @@ -911,6 +920,8 @@ static int ks8851_net_stop(struct net_device *dev) dev_kfree_skb(txb); }
+ free_irq(dev->irq, ks); + return 0; }
@@ -1542,14 +1553,6 @@ static int ks8851_probe(struct spi_device *spi) ks8851_read_selftest(ks); ks8851_init_mac(ks);
- ret = request_threaded_irq(spi->irq, NULL, ks8851_irq, - IRQF_TRIGGER_LOW | IRQF_ONESHOT, - ndev->name, ks); - if (ret < 0) { - dev_err(&spi->dev, "failed to get irq\n"); - goto err_irq; - } - ret = register_netdev(ndev); if (ret) { dev_err(&spi->dev, "failed to register network device\n"); @@ -1562,11 +1565,7 @@ static int ks8851_probe(struct spi_device *spi)
return 0;
- err_netdev: - free_irq(ndev->irq, ks); - -err_irq: err_id: if (gpio_is_valid(gpio)) gpio_set_value(gpio, 0); @@ -1587,7 +1586,6 @@ static int ks8851_remove(struct spi_device *spi) dev_info(&spi->dev, "remove\n");
unregister_netdev(priv->netdev); - free_irq(spi->irq, priv); if (gpio_is_valid(priv->gpio)) gpio_set_value(priv->gpio, 0); regulator_disable(priv->vdd_reg);
[ Upstream commit 9624bafa5f6418b9ca5b3f66d1f6a6a2e8bf6d4c ]
The ks8851 chip's initial carrier state is down. A Link Change Interrupt is signaled once interrupts are enabled if the carrier is up.
The ks8851 driver has it backwards by assuming that the initial carrier state is up. The state is therefore misrepresented if the interface is opened with no cable attached. Fix it.
The Link Change interrupt is sometimes not signaled unless the P1MBSR register (which contains the Link Status bit) is read on ->ndo_open(). This might be a hardware erratum. Read the register by calling mii_check_link(), which has the desirable side effect of setting the carrier state to down if the cable was detached while the interface was closed.
Signed-off-by: Lukas Wunner lukas@wunner.de Cc: Frank Pavlic f.pavlic@kunbus.de Cc: Ben Dooks ben.dooks@codethink.co.uk Cc: Tristram Ha Tristram.Ha@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- drivers/net/ethernet/micrel/ks8851.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c index ff6cab4f6343..7377dca6eb57 100644 --- a/drivers/net/ethernet/micrel/ks8851.c +++ b/drivers/net/ethernet/micrel/ks8851.c @@ -870,6 +870,7 @@ static int ks8851_net_open(struct net_device *dev) netif_dbg(ks, ifup, ks->netdev, "network device up\n");
mutex_unlock(&ks->lock); + mii_check_link(&ks->mii); return 0; }
@@ -1527,6 +1528,7 @@ static int ks8851_probe(struct spi_device *spi)
spi_set_drvdata(spi, ks);
+ netif_carrier_off(ks->netdev); ndev->if_port = IF_PORT_100BASET; ndev->netdev_ops = &ks8851_netdev_ops; ndev->irq = spi->irq;
[ Upstream commit fa3a419d2f674b431d38748cb58fb7da17ee8949 ]
The call to of_parse_phandle returns a node pointer with refcount incremented thus it must be explicitly decremented after the last usage.
Detected by coccinelle with the following warnings: ./drivers/net/ethernet/xilinx/xilinx_axienet_main.c:1624:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 1569, but without a corresponding object release within this function.
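The rule behind this and the next two fixes is that a lookup returning a counted reference must be balanced by a put on every exit path. The sketch below shows this in userspace C under stated assumptions. `struct sketch_node`, `sketch_node_get()`, `sketch_node_put()` and `sketch_probe()` are hypothetical stand-ins for `of_parse_phandle()`/`of_node_put()` and a probe routine. The upstream patch instead adds an `of_node_put()` before each `goto`. This sketch routes all exits through one label, an alternative structure with the same effect. In the sketch, the node is not needed after probe returns.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical refcounted node standing in for a device-tree node. */
struct sketch_node {
	int refcount;
};

/* Like of_parse_phandle(): the caller receives a counted reference. */
static struct sketch_node *sketch_node_get(struct sketch_node *np)
{
	np->refcount++;
	return np;
}

/* Like of_node_put(): drop the reference taken above. */
static void sketch_node_put(struct sketch_node *np)
{
	np->refcount--;
}

/* fail_step selects which later step fails (0 = none). */
static int sketch_probe(struct sketch_node *dt, int fail_step)
{
	struct sketch_node *np = sketch_node_get(dt);
	int ret = 0;

	if (fail_step == 1) {
		ret = -EINVAL; /* e.g. "unable to get DMA resource" */
		goto out_put;
	}
	if (fail_step == 2) {
		ret = -ENOMEM; /* e.g. "could not map DMA regs" */
		goto out_put;
	}
	/* ... use np while it is still referenced ... */
out_put:
	sketch_node_put(np); /* every exit balances the get */
	return ret;
}
```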
Signed-off-by: Wen Yang wen.yang99@zte.com.cn Cc: Anirudha Sarangi anirudh@xilinx.com Cc: John Linn John.Linn@xilinx.com Cc: "David S. Miller" davem@davemloft.net Cc: Michal Simek michal.simek@xilinx.com Cc: netdev@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c index 4684644703cc..58ba579793f8 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c @@ -1595,12 +1595,14 @@ static int axienet_probe(struct platform_device *pdev) ret = of_address_to_resource(np, 0, &dmares); if (ret) { dev_err(&pdev->dev, "unable to get DMA resource\n"); + of_node_put(np); goto free_netdev; } lp->dma_regs = devm_ioremap_resource(&pdev->dev, &dmares); if (IS_ERR(lp->dma_regs)) { dev_err(&pdev->dev, "could not map DMA regs\n"); ret = PTR_ERR(lp->dma_regs); + of_node_put(np); goto free_netdev; } lp->rx_irq = irq_of_parse_and_map(np, 1);
[ Upstream commit be693df3cf9dd113ff1d2c0d8150199efdba37f6 ]
The call to ehea_get_eth_dn returns a node pointer with refcount incremented thus it must be explicitly decremented after the last usage.
Detected by coccinelle with the following warnings: ./drivers/net/ethernet/ibm/ehea/ehea_main.c:3163:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 3154, but without a corresponding object release within this function.
Signed-off-by: Wen Yang wen.yang99@zte.com.cn Cc: Douglas Miller dougmill@linux.ibm.com Cc: "David S. Miller" davem@davemloft.net Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- drivers/net/ethernet/ibm/ehea/ehea_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c b/drivers/net/ethernet/ibm/ehea/ehea_main.c index 2a0dc127df3f..1a56de06b014 100644 --- a/drivers/net/ethernet/ibm/ehea/ehea_main.c +++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c @@ -3183,6 +3183,7 @@ static ssize_t ehea_probe_port(struct device *dev,
if (ehea_add_adapter_mr(adapter)) { pr_err("creating MR failed\n"); + of_node_put(eth_dn); return -EIO; }
[ Upstream commit 75eac7b5f68b0a0671e795ac636457ee27cc11d8 ]
The call to of_get_child_by_name returns a node pointer with refcount incremented thus it must be explicitly decremented after the last usage.
Detected by coccinelle with the following warnings: ./drivers/net/ethernet/ti/netcp_ethss.c:3661:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 3654, but without a corresponding object release within this function. ./drivers/net/ethernet/ti/netcp_ethss.c:3665:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 3654, but without a corresponding object release within this function.
Signed-off-by: Wen Yang wen.yang99@zte.com.cn Cc: Wingman Kwok w-kwok2@ti.com Cc: Murali Karicheri m-karicheri2@ti.com Cc: "David S. Miller" davem@davemloft.net Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- drivers/net/ethernet/ti/netcp_ethss.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c index 4e70e7586a09..a5732edc8437 100644 --- a/drivers/net/ethernet/ti/netcp_ethss.c +++ b/drivers/net/ethernet/ti/netcp_ethss.c @@ -3122,12 +3122,16 @@ static int gbe_probe(struct netcp_device *netcp_device, struct device *dev,
ret = netcp_txpipe_init(&gbe_dev->tx_pipe, netcp_device, gbe_dev->dma_chan_name, gbe_dev->tx_queue_id); - if (ret) + if (ret) { + of_node_put(interfaces); return ret; + }
ret = netcp_txpipe_open(&gbe_dev->tx_pipe); - if (ret) + if (ret) { + of_node_put(interfaces); return ret; + }
/* Create network interfaces */ INIT_LIST_HEAD(&gbe_dev->gbe_intf_head);
[ Upstream commit fba1bdd2a9a93f3e2181ec1936a3c2f6b37e7ed6 ]
If iscsi_lookup_endpoint() fails, return -EINVAL to avoid a NULL pointer dereference.
Signed-off-by: Kangjie Lu kjlu@umn.edu Acked-by: Manish Rangankar mrangankar@marvell.com Reviewed-by: Mukesh Ojha mojha@codeaurora.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- drivers/scsi/qla4xxx/ql4_os.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c index f9f899ec9427..c158967b59d7 100644 --- a/drivers/scsi/qla4xxx/ql4_os.c +++ b/drivers/scsi/qla4xxx/ql4_os.c @@ -3207,6 +3207,8 @@ static int qla4xxx_conn_bind(struct iscsi_cls_session *cls_session, if (iscsi_conn_bind(cls_session, cls_conn, is_leading)) return -EINVAL; ep = iscsi_lookup_endpoint(transport_fd); + if (!ep) + return -EINVAL; conn = cls_conn->dd_data; qla_conn = conn->dd_data; qla_conn->qla_ep = ep->dd_data;
[ Upstream commit f276e002793cdb820862e8ea8f76769d56bba575 ]
If platform_driver_register() fails, clean up the allocated workqueue gracefully.
Signed-off-by: Mukesh Ojha mojha@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- drivers/usb/host/u132-hcd.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/usb/host/u132-hcd.c b/drivers/usb/host/u132-hcd.c index d5434e7a3b2e..86f9944f337d 100644 --- a/drivers/usb/host/u132-hcd.c +++ b/drivers/usb/host/u132-hcd.c @@ -3214,6 +3214,9 @@ static int __init u132_hcd_init(void) printk(KERN_INFO "driver %s\n", hcd_name); workqueue = create_singlethread_workqueue("u132"); retval = platform_driver_register(&u132_platform_driver); + if (retval) + destroy_workqueue(workqueue); + return retval; }
[ Upstream commit daf5cc27eed99afdea8d96e71b89ba41f5406ef6 ]
Free the symlink body after the same RCU delay we have for freeing the struct inode itself, so that traversal during RCU pathwalk doesn't step into freed memory.
Signed-off-by: Al Viro viro@zeniv.linux.org.uk Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- fs/ceph/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 9f0d99094cc1..a663b676d566 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -474,6 +474,7 @@ static void ceph_i_callback(struct rcu_head *head) struct inode *inode = container_of(head, struct inode, i_rcu); struct ceph_inode_info *ci = ceph_inode(inode);
+ kfree(ci->i_symlink); kmem_cache_free(ceph_inode_cachep, ci); }
@@ -505,7 +506,6 @@ void ceph_destroy_inode(struct inode *inode) ceph_put_snap_realm(mdsc, realm); }
- kfree(ci->i_symlink); while ((n = rb_first(&ci->i_fragtree)) != NULL) { frag = rb_entry(n, struct ceph_inode_frag, node); rb_erase(n, &ci->i_fragtree);
[ Upstream commit c8206579175c34a2546de8a74262456278a7795a ]
If an incoming ELS of type RSCN contains more than one element, zfcp suboptimally causes repeated erp trigger NOP trace records for each previously failed port. These could be ports that went away. It loops over each RSCN element, and for each of those in an inner loop over all zfcp_ports.
The trigger to recover failed ports should be just the reception of some RSCN, no matter how many elements it has. So we can loop over failed ports separately, and only then loop over each RSCN element to handle the non-failed ports.
The call chain was:
zfcp_fc_incoming_rscn
  for (i = 1; i < no_entries; i++)
    _zfcp_fc_incoming_rscn
      list_for_each_entry(port, &adapter->port_list, list)
        if (masked port->d_id match)
          zfcp_fc_test_link
        if (!port->d_id)
          zfcp_erp_port_reopen "fcrscn1"   <===
In order to reduce the "flooding" of the REC trace area in such cases, we factor out the handling of failed ports so that it happens outside the entries loop:
zfcp_fc_incoming_rscn
  if (no_entries > 1)                                      <===
    list_for_each_entry(port, &adapter->port_list, list)   <===
      if (!port->d_id)
        zfcp_erp_port_reopen "fcrscn1"                     <===
  for (i = 1; i < no_entries; i++)
    _zfcp_fc_incoming_rscn
      list_for_each_entry(port, &adapter->port_list, list)
        if (masked port->d_id match)
          zfcp_fc_test_link
Abbreviated example trace records before this code change:
Tag      : fcrscn1
WWPN     : 0x500507630310d327
ERP want : 0x02
ERP need : 0x02

Tag      : fcrscn1
WWPN     : 0x500507630310d327
ERP want : 0x02
ERP need : 0x00 NOP => superfluous trace record
The last trace entry repeats if there are more than 2 RSCN elements.
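The effect of the restructuring on the number of "fcrscn1" triggers can be counted with a small model. In this hypothetical sketch, `struct sketch_port` keeps only `d_id`, and a zero `d_id` marks a failed port, as in the patch. Each counted call stands for one `zfcp_erp_port_reopen()` trigger. Locking and the per-element range matching are omitted.

```c
#include <assert.h>

/* Hypothetical port: d_id == 0 marks a previously failed port. */
struct sketch_port {
	unsigned int d_id;
};

/* Old structure: failed ports are re-triggered once per RSCN element. */
static int reopen_calls_old(const struct sketch_port *ports, int nports,
			    int no_entries)
{
	int calls = 0;

	for (int i = 1; i < no_entries; i++)     /* skip the RSCN head */
		for (int p = 0; p < nports; p++)
			if (!ports[p].d_id)
				calls++;         /* "fcrscn1" trigger */
	return calls;
}

/* New structure: failed ports are handled once per received RSCN. */
static int reopen_calls_new(const struct sketch_port *ports, int nports,
			    int no_entries)
{
	int calls = 0;

	if (no_entries > 1)                      /* at least one element */
		for (int p = 0; p < nports; p++)
			if (!ports[p].d_id)
				calls++;
	return calls;
}
```

With two failed ports and an RSCN of four entries (head plus three elements), the old loop triggers 6 times and the new one 2 times.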
Signed-off-by: Steffen Maier maier@linux.ibm.com Reviewed-by: Benjamin Block bblock@linux.ibm.com Reviewed-by: Jens Remus jremus@linux.ibm.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- drivers/s390/scsi/zfcp_fc.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/s390/scsi/zfcp_fc.c b/drivers/s390/scsi/zfcp_fc.c index 237688af179b..f7630cf581cd 100644 --- a/drivers/s390/scsi/zfcp_fc.c +++ b/drivers/s390/scsi/zfcp_fc.c @@ -238,10 +238,6 @@ static void _zfcp_fc_incoming_rscn(struct zfcp_fsf_req *fsf_req, u32 range, list_for_each_entry(port, &adapter->port_list, list) { if ((port->d_id & range) == (ntoh24(page->rscn_fid) & range)) zfcp_fc_test_link(port); - if (!port->d_id) - zfcp_erp_port_reopen(port, - ZFCP_STATUS_COMMON_ERP_FAILED, - "fcrscn1"); } read_unlock_irqrestore(&adapter->port_list_lock, flags); } @@ -249,6 +245,7 @@ static void _zfcp_fc_incoming_rscn(struct zfcp_fsf_req *fsf_req, u32 range, static void zfcp_fc_incoming_rscn(struct zfcp_fsf_req *fsf_req) { struct fsf_status_read_buffer *status_buffer = (void *)fsf_req->data; + struct zfcp_adapter *adapter = fsf_req->adapter; struct fc_els_rscn *head; struct fc_els_rscn_page *page; u16 i; @@ -261,6 +258,22 @@ static void zfcp_fc_incoming_rscn(struct zfcp_fsf_req *fsf_req) /* see FC-FS */ no_entries = head->rscn_plen / sizeof(struct fc_els_rscn_page);
+ if (no_entries > 1) { + /* handle failed ports */ + unsigned long flags; + struct zfcp_port *port; + + read_lock_irqsave(&adapter->port_list_lock, flags); + list_for_each_entry(port, &adapter->port_list, list) { + if (port->d_id) + continue; + zfcp_erp_port_reopen(port, + ZFCP_STATUS_COMMON_ERP_FAILED, + "fcrscn1"); + } + read_unlock_irqrestore(&adapter->port_list_lock, flags); + } + for (i = 1; i < no_entries; i++) { /* skip head and start with 1st element */ page++;
[ Upstream commit dd08a8d9a66de4b54575c294a92630299f7e0fe7 ]
When CONFIG_VMAP_STACK=y, __pa() returns an incorrect physical address for a stack virtual address, so stack DMA buffers must be avoided.
Signed-off-by: raymond pang raymondpangxd@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- drivers/ata/libata-zpodd.c | 34 ++++++++++++++++++++++++---------- 1 file changed, 24 insertions(+), 10 deletions(-)
diff --git a/drivers/ata/libata-zpodd.c b/drivers/ata/libata-zpodd.c index 0ad96c647541..7017a81d53cf 100644 --- a/drivers/ata/libata-zpodd.c +++ b/drivers/ata/libata-zpodd.c @@ -51,38 +51,52 @@ static int eject_tray(struct ata_device *dev) /* Per the spec, only slot type and drawer type ODD can be supported */ static enum odd_mech_type zpodd_get_mech_type(struct ata_device *dev) { - char buf[16]; + char *buf; unsigned int ret; - struct rm_feature_desc *desc = (void *)(buf + 8); + struct rm_feature_desc *desc; struct ata_taskfile tf; static const char cdb[] = { GPCMD_GET_CONFIGURATION, 2, /* only 1 feature descriptor requested */ 0, 3, /* 3, removable medium feature */ 0, 0, 0,/* reserved */ - 0, sizeof(buf), + 0, 16, 0, 0, 0, };
+ buf = kzalloc(16, GFP_KERNEL); + if (!buf) + return ODD_MECH_TYPE_UNSUPPORTED; + desc = (void *)(buf + 8); + ata_tf_init(dev, &tf); tf.flags = ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE; tf.command = ATA_CMD_PACKET; tf.protocol = ATAPI_PROT_PIO; - tf.lbam = sizeof(buf); + tf.lbam = 16;
ret = ata_exec_internal(dev, &tf, cdb, DMA_FROM_DEVICE, - buf, sizeof(buf), 0); - if (ret) + buf, 16, 0); + if (ret) { + kfree(buf); return ODD_MECH_TYPE_UNSUPPORTED; + }
- if (be16_to_cpu(desc->feature_code) != 3) + if (be16_to_cpu(desc->feature_code) != 3) { + kfree(buf); return ODD_MECH_TYPE_UNSUPPORTED; + }
- if (desc->mech_type == 0 && desc->load == 0 && desc->eject == 1) + if (desc->mech_type == 0 && desc->load == 0 && desc->eject == 1) { + kfree(buf); return ODD_MECH_TYPE_SLOT; - else if (desc->mech_type == 1 && desc->load == 0 && desc->eject == 1) + } else if (desc->mech_type == 1 && desc->load == 0 && + desc->eject == 1) { + kfree(buf); return ODD_MECH_TYPE_DRAWER; - else + } else { + kfree(buf); return ODD_MECH_TYPE_UNSUPPORTED; + } }
/* Test if ODD is zero power ready by sense code */
[ Upstream commit 9c38f1f044080392603c497ecca4d7d09876ff99 ]
Backspace is not working on some terminal emulators which do not send the key code defined by terminfo. Terminals either send '^H' (8) or '^?' (127). But currently only '^?' is handled. Let's also handle '^H' for those terminals.
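The key handling described above comes down to treating three key codes as backspace. The sketch below shows this as a standalone helper. `is_backspace()` is hypothetical: the patch adds `case 8:` labels inline instead. `SKETCH_KEY_BACKSPACE` is a stand-in for ncurses' `KEY_BACKSPACE`, defined locally so the sketch builds without `<curses.h>`.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for ncurses' KEY_BACKSPACE; real code gets it from <curses.h>. */
#define SKETCH_KEY_BACKSPACE 0407

/* True for every key code the patched kconfig front ends treat as backspace. */
static bool is_backspace(int key)
{
	switch (key) {
	case SKETCH_KEY_BACKSPACE: /* code defined by terminfo */
	case 8:                    /* ^H */
	case 127:                  /* ^? */
		return true;
	default:
		return false;
	}
}
```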
Signed-off-by: Changbin Du changbin.du@gmail.com Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- scripts/kconfig/lxdialog/inputbox.c | 3 ++- scripts/kconfig/nconf.c | 2 +- scripts/kconfig/nconf.gui.c | 3 ++- 3 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/scripts/kconfig/lxdialog/inputbox.c b/scripts/kconfig/lxdialog/inputbox.c index d58de1dc5360..510049a7bd1d 100644 --- a/scripts/kconfig/lxdialog/inputbox.c +++ b/scripts/kconfig/lxdialog/inputbox.c @@ -126,7 +126,8 @@ int dialog_inputbox(const char *title, const char *prompt, int height, int width case KEY_DOWN: break; case KEY_BACKSPACE: - case 127: + case 8: /* ^H */ + case 127: /* ^? */ if (pos) { wattrset(dialog, dlg.inputbox.atr); if (input_x == 0) { diff --git a/scripts/kconfig/nconf.c b/scripts/kconfig/nconf.c index d42d534a66cd..f7049e288e93 100644 --- a/scripts/kconfig/nconf.c +++ b/scripts/kconfig/nconf.c @@ -1046,7 +1046,7 @@ static int do_match(int key, struct match_state *state, int *ans) state->match_direction = FIND_NEXT_MATCH_UP; *ans = get_mext_match(state->pattern, state->match_direction); - } else if (key == KEY_BACKSPACE || key == 127) { + } else if (key == KEY_BACKSPACE || key == 8 || key == 127) { state->pattern[strlen(state->pattern)-1] = '\0'; adj_match_dir(&state->match_direction); } else diff --git a/scripts/kconfig/nconf.gui.c b/scripts/kconfig/nconf.gui.c index 4b2f44c20caf..9a65035cf787 100644 --- a/scripts/kconfig/nconf.gui.c +++ b/scripts/kconfig/nconf.gui.c @@ -439,7 +439,8 @@ int dialog_inputbox(WINDOW *main_window, case KEY_F(F_EXIT): case KEY_F(F_BACK): break; - case 127: + case 8: /* ^H */ + case 127: /* ^? */ case KEY_BACKSPACE: if (cursor_position > 0) { memmove(&result[cursor_position-1],
From: Alex Williamson alex.williamson@redhat.com
commit 492855939bdb59c6f947b0b5b44af9ad82b7e38c upstream.
Memory backed DMA mappings are accounted against a user's locked memory limit, including multiple mappings of the same memory. This accounting bounds the number of such mappings that a user can create. However, DMA mappings that are not backed by memory, such as DMA mappings of device MMIO via mmaps, do not make use of page pinning and therefore do not count against the user's locked memory limit. These mappings still consume memory, but the memory is not well associated with the process for the purpose of OOM-killing a task.
To add bounding on this use case, we introduce a limit to the total number of concurrent DMA mappings that a user is allowed to create. This limit is exposed as a tunable module option where the default value of 64K is expected to be well in excess of any reasonable use case (a large virtual machine configuration would typically only make use of tens of concurrent mappings).
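The accounting in the diff below amounts to a per-container counter. It starts at `dma_entry_limit`, is decremented on each successful map, and is incremented when a mapping is removed. The following is a minimal single-threaded sketch of that scheme. `struct sketch_iommu` and the `sketch_*` functions are hypothetical, and the sketch omits the `iommu->lock` mutex that the real code holds around these updates.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical container; only the dma_avail accounting is modelled. */
struct sketch_iommu {
	unsigned int dma_avail; /* mappings still allowed */
};

static void sketch_open(struct sketch_iommu *iommu, unsigned int limit)
{
	iommu->dma_avail = limit;      /* iommu->dma_avail = dma_entry_limit */
}

static int sketch_map(struct sketch_iommu *iommu)
{
	if (!iommu->dma_avail)
		return -ENOSPC;        /* limit reached: refuse the mapping */
	iommu->dma_avail--;            /* charge one entry */
	return 0;
}

static void sketch_unmap(struct sketch_iommu *iommu)
{
	iommu->dma_avail++;            /* removal returns the entry */
}
```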
This fixes CVE-2019-3882.
Reviewed-by: Eric Auger eric.auger@redhat.com Tested-by: Eric Auger eric.auger@redhat.com Reviewed-by: Peter Xu peterx@redhat.com Reviewed-by: Cornelia Huck cohuck@redhat.com Signed-off-by: Alex Williamson alex.williamson@redhat.com [groeck: Adjust for missing upstream commit] Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/vfio/vfio_iommu_type1.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
--- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -53,10 +53,16 @@ module_param_named(disable_hugepages, MODULE_PARM_DESC(disable_hugepages, "Disable VFIO IOMMU support for IOMMU hugepages.");
+static unsigned int dma_entry_limit __read_mostly = U16_MAX; +module_param_named(dma_entry_limit, dma_entry_limit, uint, 0644); +MODULE_PARM_DESC(dma_entry_limit, + "Maximum number of user DMA mappings per container (65535)."); + struct vfio_iommu { struct list_head domain_list; struct mutex lock; struct rb_root dma_list; + unsigned int dma_avail; bool v2; bool nesting; }; @@ -382,6 +388,7 @@ static void vfio_remove_dma(struct vfio_ vfio_unmap_unpin(iommu, dma); vfio_unlink_dma(iommu, dma); kfree(dma); + iommu->dma_avail++; }
static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu) @@ -582,12 +589,18 @@ static int vfio_dma_do_map(struct vfio_i return -EEXIST; }
+ if (!iommu->dma_avail) { + mutex_unlock(&iommu->lock); + return -ENOSPC; + } + dma = kzalloc(sizeof(*dma), GFP_KERNEL); if (!dma) { mutex_unlock(&iommu->lock); return -ENOMEM; }
+ iommu->dma_avail--; dma->iova = iova; dma->vaddr = vaddr; dma->prot = prot; @@ -903,6 +916,7 @@ static void *vfio_iommu_type1_open(unsig
INIT_LIST_HEAD(&iommu->domain_list); iommu->dma_list = RB_ROOT; + iommu->dma_avail = dma_entry_limit; mutex_init(&iommu->lock);
return iommu;
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit e5c812e84f0dece3400d5caf42522287e6ef139f upstream.
The line6 driver uses a lot of USB buffers off of the stack, which is not allowed on many systems, causing the driver to crash on some of them. Fix this up by dynamically allocating the buffers with kmalloc() which allows for proper DMA-able memory.
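The pattern the patch applies to `line6_read_data()` can be summarized as three steps. First, allocate the one-byte transfer buffer on the heap rather than the stack. Second, poll until the byte is no longer 0xff. Third, leave through a single `exit:` label that always frees the buffer. Below is a userspace sketch under stated assumptions. `malloc()` stands in for `kmalloc()`. `sketch_xfer_fn` is a hypothetical hook replacing `usb_control_msg()`. `sketch_scripted_xfer()` is a hypothetical scripted transfer used only for the usage example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical transfer hook: fills one byte, returns bytes or -errno. */
typedef int (*sketch_xfer_fn)(unsigned char *buf, void *ctx);

/* Poll for a length byte using a heap buffer and a single cleanup exit. */
static int sketch_read_len(sketch_xfer_fn xfer, void *ctx, int max_retries,
			   unsigned char expected)
{
	unsigned char *len;
	int count;
	int ret;

	len = malloc(sizeof(*len));    /* kmalloc(..., GFP_KERNEL) in the driver */
	if (!len)
		return -ENOMEM;

	*len = 0xff;                   /* 0xff means "length not ready yet" */
	for (count = 0; count < max_retries; count++) {
		ret = xfer(len, ctx);
		if (ret < 0)
			goto exit;     /* transfer error: still free the buffer */
		if (*len != 0xff)
			break;
	}

	ret = -EIO;
	if (*len == 0xff || *len != expected)
		goto exit;             /* retries exhausted or length mismatch */
	ret = 0;
exit:
	free(len);
	return ret;
}

/* Scripted transfer for the usage example: replays bytes from an array. */
struct sketch_script {
	const unsigned char *bytes;
	int pos;
};

static int sketch_scripted_xfer(unsigned char *buf, void *ctx)
{
	struct sketch_script *s = ctx;

	*buf = s->bytes[s->pos++];
	return 1;
}
```

Replaying the bytes `0xff, 0xff, 0x04` with `expected == 4` succeeds on the third poll. The same script with only two retries returns `-EIO`.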
Reported-by: Christo Gouws gouws.christo@gmail.com Reported-by: Alan Stern stern@rowland.harvard.edu Tested-by: Christo Gouws gouws.christo@gmail.com Cc: stable stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/usb/line6/driver.c | 60 ++++++++++++++++++++++++++------------------- sound/usb/line6/toneport.c | 24 +++++++++++++----- 2 files changed, 53 insertions(+), 31 deletions(-)
--- a/sound/usb/line6/driver.c +++ b/sound/usb/line6/driver.c @@ -307,12 +307,16 @@ int line6_read_data(struct usb_line6 *li { struct usb_device *usbdev = line6->usbdev; int ret; - unsigned char len; + unsigned char *len; unsigned count;
if (address > 0xffff || datalen > 0xff) return -EINVAL;
+ len = kmalloc(sizeof(*len), GFP_KERNEL); + if (!len) + return -ENOMEM; + /* query the serial number: */ ret = usb_control_msg(usbdev, usb_sndctrlpipe(usbdev, 0), 0x67, USB_TYPE_VENDOR | USB_RECIP_DEVICE | USB_DIR_OUT, @@ -321,7 +325,7 @@ int line6_read_data(struct usb_line6 *li
if (ret < 0) { dev_err(line6->ifcdev, "read request failed (error %d)\n", ret); - return ret; + goto exit; }
/* Wait for data length. We'll get 0xff until length arrives. */ @@ -331,28 +335,29 @@ int line6_read_data(struct usb_line6 *li ret = usb_control_msg(usbdev, usb_rcvctrlpipe(usbdev, 0), 0x67, USB_TYPE_VENDOR | USB_RECIP_DEVICE | USB_DIR_IN, - 0x0012, 0x0000, &len, 1, + 0x0012, 0x0000, len, 1, LINE6_TIMEOUT * HZ); if (ret < 0) { dev_err(line6->ifcdev, "receive length failed (error %d)\n", ret); - return ret; + goto exit; }
- if (len != 0xff) + if (*len != 0xff) break; }
- if (len == 0xff) { + ret = -EIO; + if (*len == 0xff) { dev_err(line6->ifcdev, "read failed after %d retries\n", count); - return -EIO; - } else if (len != datalen) { + goto exit; + } else if (*len != datalen) { /* should be equal or something went wrong */ dev_err(line6->ifcdev, "length mismatch (expected %d, got %d)\n", - (int)datalen, (int)len); - return -EIO; + (int)datalen, (int)*len); + goto exit; }
/* receive the result: */ @@ -361,12 +366,12 @@ int line6_read_data(struct usb_line6 *li 0x0013, 0x0000, data, datalen, LINE6_TIMEOUT * HZ);
- if (ret < 0) { + if (ret < 0) dev_err(line6->ifcdev, "read failed (error %d)\n", ret); - return ret; - }
- return 0; +exit: + kfree(len); + return ret; } EXPORT_SYMBOL_GPL(line6_read_data);
@@ -378,12 +383,16 @@ int line6_write_data(struct usb_line6 *l { struct usb_device *usbdev = line6->usbdev; int ret; - unsigned char status; + unsigned char *status; int count;
if (address > 0xffff || datalen > 0xffff) return -EINVAL;
+ status = kmalloc(sizeof(*status), GFP_KERNEL); + if (!status) + return -ENOMEM; + ret = usb_control_msg(usbdev, usb_sndctrlpipe(usbdev, 0), 0x67, USB_TYPE_VENDOR | USB_RECIP_DEVICE | USB_DIR_OUT, 0x0022, address, data, datalen, @@ -392,7 +401,7 @@ int line6_write_data(struct usb_line6 *l if (ret < 0) { dev_err(line6->ifcdev, "write request failed (error %d)\n", ret); - return ret; + goto exit; }
for (count = 0; count < LINE6_READ_WRITE_MAX_RETRIES; count++) { @@ -403,28 +412,29 @@ int line6_write_data(struct usb_line6 *l USB_TYPE_VENDOR | USB_RECIP_DEVICE | USB_DIR_IN, 0x0012, 0x0000, - &status, 1, LINE6_TIMEOUT * HZ); + status, 1, LINE6_TIMEOUT * HZ);
if (ret < 0) { dev_err(line6->ifcdev, "receiving status failed (error %d)\n", ret); - return ret; + goto exit; }
- if (status != 0xff) + if (*status != 0xff) break; }
- if (status == 0xff) { + if (*status == 0xff) { dev_err(line6->ifcdev, "write failed after %d retries\n", count); - return -EIO; - } else if (status != 0) { + ret = -EIO; + } else if (*status != 0) { dev_err(line6->ifcdev, "write failed (error %d)\n", ret); - return -EIO; + ret = -EIO; } - - return 0; +exit: + kfree(status); + return ret; } EXPORT_SYMBOL_GPL(line6_write_data);
--- a/sound/usb/line6/toneport.c +++ b/sound/usb/line6/toneport.c @@ -365,15 +365,20 @@ static bool toneport_has_source_select(s /* Setup Toneport device. */ -static void toneport_setup(struct usb_line6_toneport *toneport) +static int toneport_setup(struct usb_line6_toneport *toneport) { - int ticks; + int *ticks; struct usb_line6 *line6 = &toneport->line6; struct usb_device *usbdev = line6->usbdev;
+ ticks = kmalloc(sizeof(*ticks), GFP_KERNEL); + if (!ticks) + return -ENOMEM; + /* sync time on device with host: */ - ticks = (int)get_seconds(); - line6_write_data(line6, 0x80c6, &ticks, 4); + *ticks = (int)get_seconds(); + line6_write_data(line6, 0x80c6, ticks, 4); + kfree(ticks);
/* enable device: */ toneport_send_cmd(usbdev, 0x0301, 0x0000); @@ -388,6 +393,7 @@ static void toneport_setup(struct usb_li toneport_update_led(toneport);
mod_timer(&toneport->timer, jiffies + TONEPORT_PCM_DELAY * HZ); + return 0; }
/* @@ -451,7 +457,9 @@ static int toneport_init(struct usb_line return err; }
- toneport_setup(toneport); + err = toneport_setup(toneport); + if (err) + return err;
/* register audio system: */ return snd_card_register(line6->card); @@ -463,7 +471,11 @@ static int toneport_init(struct usb_line */ static int toneport_reset_resume(struct usb_interface *interface) { - toneport_setup(usb_get_intfdata(interface)); + int err; + + err = toneport_setup(usb_get_intfdata(interface)); + if (err) + return err; return line6_resume(interface); } #endif
From: Shmulik Ladkani shmulik@metanetworks.com
[ Upstream commit d2f0c961148f65bc73eda72b9fa3a4e80973cb49 ]
Previously, during fragmentation after forwarding, skb->skb_iif isn't preserved, i.e. 'ip_copy_metadata' does not copy skb_iif from given 'from' skb.
As a result, ip_do_fragment creates fragments with zero skb_iif, leading to inconsistent behavior.
Assume, for example, an eBPF program attached at tc egress (post forwarding) that examines __sk_buff->ingress_ifindex:
 - the correct iif is observed if the forwarding path does not involve fragmentation/refragmentation
 - a bogus iif is observed if the forwarding path involves fragmentation/refragmentation
Fix by preserving skb_iif in 'ip_copy_metadata'.
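The fix is one more field copied when each fragment inherits metadata from the original packet. The sketch below models this with a hypothetical `struct sketch_skb` that keeps only a few of the fields `ip_copy_metadata()` copies. It is not the kernel's `struct sk_buff`, and dst and device handling are omitted.

```c
#include <assert.h>

/* Hypothetical subset of per-packet metadata. */
struct sketch_skb {
	unsigned char pkt_type;
	unsigned int priority;
	unsigned short protocol;
	int skb_iif;            /* ingress interface index */
};

/* Copy metadata from the original packet onto a newly built fragment. */
static void sketch_copy_metadata(struct sketch_skb *to,
				 const struct sketch_skb *from)
{
	to->pkt_type = from->pkt_type;
	to->priority = from->priority;
	to->protocol = from->protocol;
	to->skb_iif = from->skb_iif; /* the field the patch adds */
}
```

Without the last assignment, a zero-initialized fragment would report `skb_iif == 0`. That zero is the bogus ingress index an egress eBPF program would see.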
Signed-off-by: Shmulik Ladkani shmulik.ladkani@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/ip_output.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -475,6 +475,7 @@ static void ip_copy_metadata(struct sk_b to->pkt_type = from->pkt_type; to->priority = from->priority; to->protocol = from->protocol; + to->skb_iif = from->skb_iif; skb_dst_drop(to); skb_dst_copy(to, from); to->dev = from->dev;
From: Eric Dumazet edumazet@google.com
[ Upstream commit 6c0afef5fb0c27758f4d52b2210c61b6bd8b4470 ]
syzbot was able to catch a use-after-free read in pid_nr_ns() [1]
ip6fl_seq_show() seems to use RCU protection, dereferencing fl->owner.pid, but fl_free() releases fl->owner.pid before the RCU grace period has started.
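The fix moves every release that a reader might race with (the `put_pid()`, `kfree(fl->opt)` and the flowlabel itself) into a callback that runs only after the grace period. The sketch below is a single-threaded model of that ordering and is not real RCU. `sketch_call_rcu()` merely queues callbacks until a hypothetical `sketch_grace_period_end()` runs them. `owners_released` counts the `put_pid()`-like release. `struct sketch_flowlabel` is a stand-in for `struct ip6_flowlabel`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Queued callback, modelled on struct rcu_head (hypothetical). */
struct sketch_rcu_head {
	struct sketch_rcu_head *next;
	void (*func)(struct sketch_rcu_head *head);
};

static struct sketch_rcu_head *pending;
static int owners_released; /* counts put_pid()-like releases */

static void sketch_call_rcu(struct sketch_rcu_head *head,
			    void (*func)(struct sketch_rcu_head *))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Models the end of a grace period: run every deferred callback. */
static void sketch_grace_period_end(void)
{
	while (pending) {
		struct sketch_rcu_head *head = pending;

		pending = head->next;
		head->func(head);
	}
}

struct sketch_flowlabel {
	int owner_pid;               /* stands in for fl->owner.pid */
	struct sketch_rcu_head rcu;
};

/* Everything a reader may dereference is released here, after the delay. */
static void sketch_fl_free_rcu(struct sketch_rcu_head *head)
{
	struct sketch_flowlabel *fl = (struct sketch_flowlabel *)
		((char *)head - offsetof(struct sketch_flowlabel, rcu));

	owners_released++;           /* put_pid(fl->owner.pid) */
	free(fl);
}

static void sketch_fl_free(struct sketch_flowlabel *fl)
{
	if (fl)
		sketch_call_rcu(&fl->rcu, sketch_fl_free_rcu);
}
```

Between `sketch_fl_free()` and `sketch_grace_period_end()`, a reader can still safely read `owner_pid`. In the buggy version, the owner was released immediately.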
[1]
BUG: KASAN: use-after-free in pid_nr_ns+0x128/0x140 kernel/pid.c:407 Read of size 4 at addr ffff888094012a04 by task syz-executor.0/18087
CPU: 0 PID: 18087 Comm: syz-executor.0 Not tainted 5.1.0-rc6+ #89 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131 pid_nr_ns+0x128/0x140 kernel/pid.c:407 ip6fl_seq_show+0x2f8/0x4f0 net/ipv6/ip6_flowlabel.c:794 seq_read+0xad3/0x1130 fs/seq_file.c:268 proc_reg_read+0x1fe/0x2c0 fs/proc/inode.c:227 do_loop_readv_writev fs/read_write.c:701 [inline] do_loop_readv_writev fs/read_write.c:688 [inline] do_iter_read+0x4a9/0x660 fs/read_write.c:922 vfs_readv+0xf0/0x160 fs/read_write.c:984 kernel_readv fs/splice.c:358 [inline] default_file_splice_read+0x475/0x890 fs/splice.c:413 do_splice_to+0x12a/0x190 fs/splice.c:876 splice_direct_to_actor+0x2d2/0x970 fs/splice.c:953 do_splice_direct+0x1da/0x2a0 fs/splice.c:1062 do_sendfile+0x597/0xd00 fs/read_write.c:1443 __do_sys_sendfile64 fs/read_write.c:1498 [inline] __se_sys_sendfile64 fs/read_write.c:1490 [inline] __x64_sys_sendfile64+0x15a/0x220 fs/read_write.c:1490 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x458da9 Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f300d24bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000458da9 RDX: 00000000200000c0 RSI: 0000000000000008 RDI: 0000000000000007 RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 000000000000005a R11: 0000000000000246 R12: 00007f300d24c6d4 R13: 00000000004c5fa3 R14: 00000000004da748 R15: 00000000ffffffff
Allocated by task 17543: save_stack+0x45/0xd0 mm/kasan/common.c:75 set_track mm/kasan/common.c:87 [inline] __kasan_kmalloc mm/kasan/common.c:497 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:505 slab_post_alloc_hook mm/slab.h:437 [inline] slab_alloc mm/slab.c:3393 [inline] kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3555 alloc_pid+0x55/0x8f0 kernel/pid.c:168 copy_process.part.0+0x3b08/0x7980 kernel/fork.c:1932 copy_process kernel/fork.c:1709 [inline] _do_fork+0x257/0xfd0 kernel/fork.c:2226 __do_sys_clone kernel/fork.c:2333 [inline] __se_sys_clone kernel/fork.c:2327 [inline] __x64_sys_clone+0xbf/0x150 kernel/fork.c:2327 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 7789: save_stack+0x45/0xd0 mm/kasan/common.c:75 set_track mm/kasan/common.c:87 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459 kasan_slab_free+0xe/0x10 mm/kasan/common.c:467 __cache_free mm/slab.c:3499 [inline] kmem_cache_free+0x86/0x260 mm/slab.c:3765 put_pid.part.0+0x111/0x150 kernel/pid.c:111 put_pid+0x20/0x30 kernel/pid.c:105 fl_free+0xbe/0xe0 net/ipv6/ip6_flowlabel.c:102 ip6_fl_gc+0x295/0x3e0 net/ipv6/ip6_flowlabel.c:152 call_timer_fn+0x190/0x720 kernel/time/timer.c:1325 expire_timers kernel/time/timer.c:1362 [inline] __run_timers kernel/time/timer.c:1681 [inline] __run_timers kernel/time/timer.c:1649 [inline] run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694 __do_softirq+0x266/0x95a kernel/softirq.c:293
The buggy address belongs to the object at ffff888094012a00
 which belongs to the cache pid_2 of size 88
The buggy address is located 4 bytes inside of
 88-byte region [ffff888094012a00, ffff888094012a58)
The buggy address belongs to the page:
page:ffffea0002500480 count:1 mapcount:0 mapping:ffff88809a483080 index:0xffff888094012980
flags: 0x1fffc0000000200(slab)
raw: 01fffc0000000200 ffffea00018a3508 ffffea0002524a88 ffff88809a483080
raw: ffff888094012980 ffff888094012000 000000010000001b 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
 ffff888094012900: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
 ffff888094012980: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
 ffff888094012a00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
                   ^
 ffff888094012a80: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
 ffff888094012b00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
Fixes: 4f82f45730c6 ("net ip6 flowlabel: Make owner a union of struct pid * and kuid_t") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Eric W. Biederman ebiederm@xmission.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_flowlabel.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
--- a/net/ipv6/ip6_flowlabel.c +++ b/net/ipv6/ip6_flowlabel.c @@ -94,15 +94,21 @@ static struct ip6_flowlabel *fl_lookup(s return fl; }
+static void fl_free_rcu(struct rcu_head *head) +{ + struct ip6_flowlabel *fl = container_of(head, struct ip6_flowlabel, rcu); + + if (fl->share == IPV6_FL_S_PROCESS) + put_pid(fl->owner.pid); + kfree(fl->opt); + kfree(fl); +} +
static void fl_free(struct ip6_flowlabel *fl) { - if (fl) { - if (fl->share == IPV6_FL_S_PROCESS) - put_pid(fl->owner.pid); - kfree(fl->opt); - kfree_rcu(fl, rcu); - } + if (fl) + call_rcu(&fl->rcu, fl_free_rcu); }
static void fl_release(struct ip6_flowlabel *fl)
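The ordering change in fl_free() can be modelled outside the kernel. In this sketch a hand-rolled callback queue stands in for call_rcu() and run_grace_period() stands in for the end of an RCU grace period; every name here is hypothetical and none of it is a kernel API. What it shows is that the pid reference is now dropped inside the deferred callback, so it is still held while a reader such as ip6fl_seq_show() might be dereferencing fl->owner.pid.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rcu_head plus a pending-callback list. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *);
};

static struct cb_head *pending;

/* Models call_rcu(): queue the callback instead of running it now. */
static void fake_call_rcu(struct cb_head *head, void (*func)(struct cb_head *))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Models the end of a grace period: every queued callback runs. */
static void run_grace_period(void)
{
	while (pending) {
		struct cb_head *h = pending;

		pending = h->next;
		h->func(h);
	}
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct fake_pid { int refcount; };

struct flowlabel {
	struct fake_pid *owner;
	struct cb_head rcu;
};

static void put_fake_pid(struct fake_pid *p)
{
	p->refcount--;
}

/* Like fl_free_rcu(): everything the label owns is released here. */
static void fl_free_cb(struct cb_head *head)
{
	struct flowlabel *fl = container_of(head, struct flowlabel, rcu);

	put_fake_pid(fl->owner);
	free(fl);
}

/* Like the patched fl_free(): only queue, never release directly. */
static void fl_free(struct flowlabel *fl)
{
	if (fl)
		fake_call_rcu(&fl->rcu, fl_free_cb);
}
```

Before the patch the put_pid() happened synchronously and only the kfree() was deferred, which is exactly the window the KASAN report caught.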
From: Willem de Bruijn willemb@google.com
[ Upstream commit 95c169251bf734aa555a1e8043e4d88ec97a04ec ]
A request for a flowlabel in process or user exclusive mode must fail if the caller pid or uid does not match. Invert the test.
Previously, the test was unsafe with respect to PID recycling, but it did test for inequality: fl1->owner != fl->owner
Fixes: 4f82f45730c68 ("net ip6 flowlabel: Make owner a union of struct pid* and kuid_t") Signed-off-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_flowlabel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -639,9 +639,9 @@ recheck:
 		if (fl1->share == IPV6_FL_S_EXCL ||
 		    fl1->share != fl->share ||
 		    ((fl1->share == IPV6_FL_S_PROCESS) &&
-		     (fl1->owner.pid == fl->owner.pid)) ||
+		     (fl1->owner.pid != fl->owner.pid)) ||
 		    ((fl1->share == IPV6_FL_S_USER) &&
-		     uid_eq(fl1->owner.uid, fl->owner.uid)))
+		     !uid_eq(fl1->owner.uid, fl->owner.uid)))
 			goto release;
err = -ENOMEM;
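The corrected condition reads more easily as a standalone predicate. This is a sketch only: the enum and struct below are hypothetical simplifications of IPV6_FL_S_* and the owner union, with the pid compared by a plain integer instead of by struct pid pointer identity. The function returns true when sharing must be refused, matching the "goto release" branch after the fix.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the IPV6_FL_S_* share modes. */
enum fl_share { FL_S_NONE, FL_S_EXCL, FL_S_PROCESS, FL_S_USER, FL_S_ANY };

struct fl_req {
	enum fl_share share;
	int pid;	/* models owner.pid (the kernel compares pointers) */
	int uid;	/* models owner.uid */
};

/* True when the existing label 'cur' must not be shared with 'req'. */
static bool fl_share_denied(const struct fl_req *cur, const struct fl_req *req)
{
	return cur->share == FL_S_EXCL ||
	       cur->share != req->share ||
	       (cur->share == FL_S_PROCESS && cur->pid != req->pid) ||
	       (cur->share == FL_S_USER && cur->uid != req->uid);
}
```

With the pre-patch test, a request from the owning process would have been rejected and one from a different process accepted, the opposite of what exclusive modes promise.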
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit b4e30e8e7ea1d1e35ffd64ca46f7d9a7f227b4bf ]
The driver builds a list of multicast addresses and sends it to the firmware when the driver's ndo_set_rx_mode() is called. In rare cases, the firmware can fail this call if internal resources to add multicast addresses are exhausted. In that case, we should try the call again by setting the ALL_MCAST flag, which is more likely to succeed.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -4957,8 +4957,15 @@ static int bnxt_cfg_rx_mode(struct bnxt
skip_uc: rc = bnxt_hwrm_cfa_l2_set_rx_mask(bp, 0); + if (rc && vnic->mc_list_count) { + netdev_info(bp->dev, "Failed setting MC filters rc: %d, turning on ALL_MCAST mode\n", + rc); + vnic->rx_mask |= CFA_L2_SET_RX_MASK_REQ_MASK_ALL_MCAST; + vnic->mc_list_count = 0; + rc = bnxt_hwrm_cfa_l2_set_rx_mask(bp, 0); + } if (rc) - netdev_err(bp->dev, "HWRM cfa l2 rx mask failure rc: %x\n", + netdev_err(bp->dev, "HWRM cfa l2 rx mask failure rc: %d\n", rc);
return rc;
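The retry logic in bnxt_cfg_rx_mode() can be sketched with a toy firmware model. Everything below is hypothetical: the "firmware" simply rejects filter lists larger than an assumed slot count, and MASK_ALL_MCAST stands in for CFA_L2_SET_RX_MASK_REQ_MASK_ALL_MCAST. The shape of the fallback (set the flag, clear the list, call again once) follows the patch.

```c
#include <errno.h>

#define MASK_ALL_MCAST 0x1u	/* stand-in for the ALL_MCAST mask bit */

struct vnic_model {
	unsigned int rx_mask;
	int mc_list_count;
};

static int fw_slots = 4;	/* assumed firmware filter capacity */

/* Toy firmware: a too-long list fails unless ALL_MCAST is requested. */
static int fw_set_rx_mask(const struct vnic_model *v)
{
	if (!(v->rx_mask & MASK_ALL_MCAST) && v->mc_list_count > fw_slots)
		return -ENOSPC;
	return 0;
}

static int cfg_rx_mode(struct vnic_model *v)
{
	int rc = fw_set_rx_mask(v);

	if (rc && v->mc_list_count) {
		/* Fall back to accepting all multicast instead of a list. */
		v->rx_mask |= MASK_ALL_MCAST;
		v->mc_list_count = 0;
		rc = fw_set_rx_mask(v);
	}
	return rc;
}
```

The trade-off is that the NIC now passes every multicast frame up to the stack, which costs some filtering precision but keeps multicast reception working.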
From: Willem de Bruijn willemb@google.com
[ Upstream commit 486efdc8f6ce802b27e15921d2353cc740c55451 ]
Packet sockets in datagram mode take a destination address. Verify its length before passing to dev_hard_header.
Prior to 2.6.14-rc3, the send code ignored sll_halen. This is established behavior. Directly compare msg_namelen to dev->addr_len.
Change v1->v2: initialize addr in all paths
Fixes: 6b8d95f1795c4 ("packet: validate address length if non-zero") Suggested-by: David Laight David.Laight@aculab.com Signed-off-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2490,8 +2490,8 @@ static int tpacket_snd(struct packet_soc void *ph; DECLARE_SOCKADDR(struct sockaddr_ll *, saddr, msg->msg_name); bool need_wait = !(msg->msg_flags & MSG_DONTWAIT); + unsigned char *addr = NULL; int tp_len, size_max; - unsigned char *addr; int len_sum = 0; int status = TP_STATUS_AVAILABLE; int hlen, tlen; @@ -2511,10 +2511,13 @@ static int tpacket_snd(struct packet_soc sll_addr))) goto out; proto = saddr->sll_protocol; - addr = saddr->sll_halen ? saddr->sll_addr : NULL; dev = dev_get_by_index(sock_net(&po->sk), saddr->sll_ifindex); - if (addr && dev && saddr->sll_halen < dev->addr_len) - goto out_put; + if (po->sk.sk_socket->type == SOCK_DGRAM) { + if (dev && msg->msg_namelen < dev->addr_len + + offsetof(struct sockaddr_ll, sll_addr)) + goto out_put; + addr = saddr->sll_addr; + } }
err = -ENXIO; @@ -2652,7 +2655,7 @@ static int packet_snd(struct socket *soc struct sk_buff *skb; struct net_device *dev; __be16 proto; - unsigned char *addr; + unsigned char *addr = NULL; int err, reserve = 0; struct sockcm_cookie sockc; struct virtio_net_hdr vnet_hdr = { 0 }; @@ -2672,7 +2675,6 @@ static int packet_snd(struct socket *soc if (likely(saddr == NULL)) { dev = packet_cached_dev_get(po); proto = po->num; - addr = NULL; } else { err = -EINVAL; if (msg->msg_namelen < sizeof(struct sockaddr_ll)) @@ -2680,10 +2682,13 @@ static int packet_snd(struct socket *soc if (msg->msg_namelen < (saddr->sll_halen + offsetof(struct sockaddr_ll, sll_addr))) goto out; proto = saddr->sll_protocol; - addr = saddr->sll_halen ? saddr->sll_addr : NULL; dev = dev_get_by_index(sock_net(sk), saddr->sll_ifindex); - if (addr && dev && saddr->sll_halen < dev->addr_len) - goto out_unlock; + if (sock->type == SOCK_DGRAM) { + if (dev && msg->msg_namelen < dev->addr_len + + offsetof(struct sockaddr_ll, sll_addr)) + goto out_unlock; + addr = saddr->sll_addr; + } }
err = -ENXIO;
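The new SOCK_DGRAM check boils down to one comparison: msg_namelen must cover sll_addr up to the device's address length, whatever sll_halen claims. A userspace sketch using the libc definition of struct sockaddr_ll; dgram_addr_ok() is a hypothetical helper name, not a kernel function.

```c
#include <netpacket/packet.h>
#include <stdbool.h>
#include <stddef.h>

/* True when the caller supplied every byte of sll_addr that
 * dev_hard_header() will read for a device with dev_addr_len. */
static bool dgram_addr_ok(size_t msg_namelen, unsigned int dev_addr_len)
{
	return msg_namelen >= dev_addr_len +
			      offsetof(struct sockaddr_ll, sll_addr);
}
```

For an Ethernet device (addr_len 6) sll_addr starts at offset 12, so at least 18 bytes of sockaddr are required. A caller passing 14 bytes would previously have let dev_hard_header() read 4 bytes beyond what was copied in.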
From: Alan Stern stern@rowland.harvard.edu
commit ef61eb43ada6c1d6b94668f0f514e4c268093ff3 upstream.
The syzkaller USB fuzzer found a general-protection-fault bug in the yurex driver. The fault occurs when a device has been unplugged; the driver's interrupt-URB handler logs an error message referring to the device by name, after the device has been unregistered and its name deallocated.
This problem is caused by the fact that the interrupt URB isn't cancelled until the driver's private data structure is released, which can happen long after the device is gone. The cure is to make sure that the interrupt URB is killed before yurex_disconnect() returns; this is exactly the sort of thing that usb_poison_urb() was meant for.
Signed-off-by: Alan Stern stern@rowland.harvard.edu Reported-and-tested-by: syzbot+2eb9121678bdb36e6d57@syzkaller.appspotmail.com CC: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/misc/yurex.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/usb/misc/yurex.c
+++ b/drivers/usb/misc/yurex.c
@@ -332,6 +332,7 @@ static void yurex_disconnect(struct usb_
 	usb_deregister_dev(interface, &yurex_class);
 
 	/* prevent more I/O from starting */
+	usb_poison_urb(dev->urb);
 	mutex_lock(&dev->io_mutex);
 	dev->interface = NULL;
 	mutex_unlock(&dev->io_mutex);
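Why usb_poison_urb() rather than a one-shot kill: an interrupt URB normally resubmits itself from its completion handler, so a plain cancel can race with that resubmission. The model below is hypothetical (no real USB calls); a poisoned flag makes every later submit fail, which is the guarantee the real function provides. In current kernels usb_submit_urb() on a poisoned URB returns -EPERM; the model just returns -1.

```c
#include <stdbool.h>

struct urb_model {
	bool active;
	bool poisoned;
	int completions;
};

/* Models usb_submit_urb(): refused once the URB is poisoned. */
static int urb_submit(struct urb_model *u)
{
	if (u->poisoned)
		return -1;
	u->active = true;
	return 0;
}

/* Models an interrupt completion handler that resubmits. */
static void urb_complete(struct urb_model *u)
{
	u->active = false;
	u->completions++;
	urb_submit(u);
}

/* Models usb_poison_urb(): mark first, then the in-flight URB is gone. */
static void urb_poison(struct urb_model *u)
{
	u->poisoned = true;
	u->active = false;	/* the real call waits for it to be killed */
}
```

Once disconnect has poisoned the URB, no completion can run later and log an error against the already-freed device name.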
From: Alan Stern stern@rowland.harvard.edu
commit c114944d7d67f24e71562fcfc18d550ab787e4d4 upstream.
The syzkaller USB fuzzer spotted a slab-out-of-bounds bug in the ds2490 driver. This bug is caused by improper use of the altsetting array in the usb_interface structure (the array's entries are not always stored in numerical order), combined with a naive assumption that all interfaces probed by the driver will have the expected number of altsettings.
The bug can be fixed by replacing references to the possibly non-existent intf->altsetting[alt] entry with the guaranteed-to-exist intf->cur_altsetting entry.
Signed-off-by: Alan Stern stern@rowland.harvard.edu Reported-and-tested-by: syzbot+d65f673b847a1a96cdba@syzkaller.appspotmail.com CC: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/w1/masters/ds2490.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/w1/masters/ds2490.c +++ b/drivers/w1/masters/ds2490.c @@ -1039,15 +1039,15 @@ static int ds_probe(struct usb_interface /* alternative 3, 1ms interrupt (greatly speeds search), 64 byte bulk */ alt = 3; err = usb_set_interface(dev->udev, - intf->altsetting[alt].desc.bInterfaceNumber, alt); + intf->cur_altsetting->desc.bInterfaceNumber, alt); if (err) { dev_err(&dev->udev->dev, "Failed to set alternative setting %d " "for %d interface: err=%d.\n", alt, - intf->altsetting[alt].desc.bInterfaceNumber, err); + intf->cur_altsetting->desc.bInterfaceNumber, err); goto err_out_clear; }
- iface_desc = &intf->altsetting[alt]; + iface_desc = intf->cur_altsetting; if (iface_desc->desc.bNumEndpoints != NUM_EP-1) { pr_info("Num endpoints=%d. It is not DS9490R.\n", iface_desc->desc.bNumEndpoints);
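The underlying mistake is treating the altsetting array index as the alternate-setting number. The kernel offers usb_altnum_to_altsetting() for a by-number lookup; the patch sidesteps the question by using cur_altsetting, which after a successful usb_set_interface() refers to the selected setting. The sketch below uses a hypothetical trimmed-down descriptor to show a by-number search.

```c
#include <stddef.h>

/* Hypothetical descriptor carrying only the fields needed here. */
struct alt_model {
	unsigned char bAlternateSetting;
	unsigned char bNumEndpoints;
};

/* Search by number: entries need not be in numerical order, and
 * there may be fewer than alt + 1 of them. */
static const struct alt_model *find_alt(const struct alt_model *arr,
					size_t num, unsigned char alt)
{
	for (size_t i = 0; i < num; i++)
		if (arr[i].bAlternateSetting == alt)
			return &arr[i];
	return NULL;
}
```

With a two-entry array, the old altsetting[3] access would read past the end of the array. That is the slab-out-of-bounds syzkaller reported.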
From: Alan Stern stern@rowland.harvard.edu
commit c01c348ecdc66085e44912c97368809612231520 upstream.
Some drivers (such as the vub300 MMC driver) expect usb_string() to return a properly NUL-terminated string, even when an error occurs. (In fact, vub300's probe routine doesn't bother to check the return code from usb_string().) When the driver goes on to use an unterminated string, it leads to kernel errors such as stack-out-of-bounds, as found by the syzkaller USB fuzzer.
An out-of-range string index argument is not at all unlikely, given that some devices don't provide string descriptors and therefore list 0 as the value for their string indexes. This patch makes usb_string() return a properly terminated empty string along with the -EINVAL error code when an out-of-range index is encountered.
And since a USB string index is a single-byte value, indexes >= 256 are just as invalid as values of 0 or below.
Signed-off-by: Alan Stern stern@rowland.harvard.edu Reported-by: syzbot+b75b85111c10b8d680f1@syzkaller.appspotmail.com CC: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/core/message.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/usb/core/message.c
+++ b/drivers/usb/core/message.c
@@ -820,9 +820,11 @@ int usb_string(struct usb_device *dev, i
 
 	if (dev->state == USB_STATE_SUSPENDED)
 		return -EHOSTUNREACH;
-	if (size <= 0 || !buf || !index)
+	if (size <= 0 || !buf)
 		return -EINVAL;
 	buf[0] = 0;
+	if (index <= 0 || index >= 256)
+		return -EINVAL;
 	tbuf = kmalloc(256, GFP_NOIO);
 	if (!tbuf)
 		return -ENOMEM;
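The essential ordering is that the buffer is terminated before the index is range-checked, so even the -EINVAL path leaves a valid empty C string for callers that ignore the return code. A userspace sketch of that contract; fetch_descriptor() is a hypothetical stand-in that pretends every valid index reads back "DEV".

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for the real string-descriptor read. */
static int fetch_descriptor(int index, char *buf, int size)
{
	(void)index;
	strncpy(buf, "DEV", size - 1);
	buf[size - 1] = '\0';
	return (int)strlen(buf);
}

static int string_model(int index, char *buf, int size)
{
	if (size <= 0 || !buf)
		return -EINVAL;
	buf[0] = '\0';			/* terminate before any later failure */
	if (index <= 0 || index >= 256)	/* string index is a single byte */
		return -EINVAL;
	return fetch_descriptor(index, buf, size);
}
```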
From: Alan Stern stern@rowland.harvard.edu
commit c2b71462d294cf517a0bc6e4fd6424d7cee5596f upstream.
The syzkaller fuzzer reported a bug in the USB hub driver which turned out to be caused by a negative runtime-PM usage counter. This allowed a hub to be runtime suspended at a time when the driver did not expect it. The symptom is a WARNING issued because the hub's status URB is submitted while it is already active:
URB 0000000031fb463e submitted while active
WARNING: CPU: 0 PID: 2917 at drivers/usb/core/urb.c:363
The negative runtime-PM usage count was caused by an unfortunate design decision made when runtime PM was first implemented for USB. At that time, USB class drivers were allowed to unbind from their interfaces without balancing the usage counter (i.e., leaving it with a positive count). The core code would take care of setting the counter back to 0 before allowing another driver to bind to the interface.
Later on when runtime PM was implemented for the entire kernel, the opposite decision was made: Drivers were required to balance their runtime-PM get and put calls. In order to maintain backward compatibility, however, the USB subsystem adapted to the new implementation by keeping an independent usage counter for each interface and using it to automatically adjust the normal usage counter back to 0 whenever a driver was unbound.
This approach involves duplicating information, but what is worse, it doesn't work properly in cases where a USB class driver delays decrementing the usage counter until after the driver's disconnect() routine has returned and the counter has been adjusted back to 0. Doing so would cause the usage counter to become negative. There's even a warning about this in the USB power management documentation!
As it happens, this is exactly what the hub driver does. The kick_hub_wq() routine increments the runtime-PM usage counter, and the corresponding decrement is carried out by hub_event() in the context of the hub_wq work-queue thread. This work routine may sometimes run after the driver has been unbound from its interface, and when it does it causes the usage counter to go negative.
It is not possible for hub_disconnect() to wait for a pending hub_event() call to finish, because hub_disconnect() is called with the device lock held and hub_event() acquires that lock. The only feasible fix is to reverse the original design decision: remove the duplicate interface-specific usage counter and require USB drivers to balance their runtime PM gets and puts. As far as I know, all existing drivers currently do this.
Signed-off-by: Alan Stern stern@rowland.harvard.edu Reported-and-tested-by: syzbot+7634edaea4d0b341c625@syzkaller.appspotmail.com CC: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- Documentation/usb/power-management.txt | 14 +++++++++----- drivers/usb/core/driver.c | 13 ------------- drivers/usb/storage/realtek_cr.c | 13 +++++-------- include/linux/usb.h | 2 -- 4 files changed, 14 insertions(+), 28 deletions(-)
--- a/Documentation/usb/power-management.txt +++ b/Documentation/usb/power-management.txt @@ -365,11 +365,15 @@ autosuspend the interface's device. Whe then the interface is considered to be idle, and the kernel may autosuspend the device.
-Drivers need not be concerned about balancing changes to the usage -counter; the USB core will undo any remaining "get"s when a driver -is unbound from its interface. As a corollary, drivers must not call -any of the usb_autopm_* functions after their disconnect() routine has -returned. +Drivers must be careful to balance their overall changes to the usage +counter. Unbalanced "get"s will remain in effect when a driver is +unbound from its interface, preventing the device from going into +runtime suspend should the interface be bound to a driver again. On +the other hand, drivers are allowed to achieve this balance by calling +the ``usb_autopm_*`` functions even after their ``disconnect`` routine +has returned -- say from within a work-queue routine -- provided they +retain an active reference to the interface (via ``usb_get_intf`` and +``usb_put_intf``).
Drivers using the async routines are responsible for their own synchronization and mutual exclusion. --- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -470,11 +470,6 @@ static int usb_unbind_interface(struct d pm_runtime_disable(dev); pm_runtime_set_suspended(dev);
- /* Undo any residual pm_autopm_get_interface_* calls */ - for (r = atomic_read(&intf->pm_usage_cnt); r > 0; --r) - usb_autopm_put_interface_no_suspend(intf); - atomic_set(&intf->pm_usage_cnt, 0); - if (!error) usb_autosuspend_device(udev);
@@ -1625,7 +1620,6 @@ void usb_autopm_put_interface(struct usb int status;
usb_mark_last_busy(udev); - atomic_dec(&intf->pm_usage_cnt); status = pm_runtime_put_sync(&intf->dev); dev_vdbg(&intf->dev, "%s: cnt %d -> %d\n", __func__, atomic_read(&intf->dev.power.usage_count), @@ -1654,7 +1648,6 @@ void usb_autopm_put_interface_async(stru int status;
usb_mark_last_busy(udev); - atomic_dec(&intf->pm_usage_cnt); status = pm_runtime_put(&intf->dev); dev_vdbg(&intf->dev, "%s: cnt %d -> %d\n", __func__, atomic_read(&intf->dev.power.usage_count), @@ -1676,7 +1669,6 @@ void usb_autopm_put_interface_no_suspend struct usb_device *udev = interface_to_usbdev(intf);
usb_mark_last_busy(udev); - atomic_dec(&intf->pm_usage_cnt); pm_runtime_put_noidle(&intf->dev); } EXPORT_SYMBOL_GPL(usb_autopm_put_interface_no_suspend); @@ -1707,8 +1699,6 @@ int usb_autopm_get_interface(struct usb_ status = pm_runtime_get_sync(&intf->dev); if (status < 0) pm_runtime_put_sync(&intf->dev); - else - atomic_inc(&intf->pm_usage_cnt); dev_vdbg(&intf->dev, "%s: cnt %d -> %d\n", __func__, atomic_read(&intf->dev.power.usage_count), status); @@ -1742,8 +1732,6 @@ int usb_autopm_get_interface_async(struc status = pm_runtime_get(&intf->dev); if (status < 0 && status != -EINPROGRESS) pm_runtime_put_noidle(&intf->dev); - else - atomic_inc(&intf->pm_usage_cnt); dev_vdbg(&intf->dev, "%s: cnt %d -> %d\n", __func__, atomic_read(&intf->dev.power.usage_count), status); @@ -1767,7 +1755,6 @@ void usb_autopm_get_interface_no_resume( struct usb_device *udev = interface_to_usbdev(intf);
usb_mark_last_busy(udev); - atomic_inc(&intf->pm_usage_cnt); pm_runtime_get_noresume(&intf->dev); } EXPORT_SYMBOL_GPL(usb_autopm_get_interface_no_resume); --- a/drivers/usb/storage/realtek_cr.c +++ b/drivers/usb/storage/realtek_cr.c @@ -772,18 +772,16 @@ static void rts51x_suspend_timer_fn(unsi break; case RTS51X_STAT_IDLE: case RTS51X_STAT_SS: - usb_stor_dbg(us, "RTS51X_STAT_SS, intf->pm_usage_cnt:%d, power.usage:%d\n", - atomic_read(&us->pusb_intf->pm_usage_cnt), + usb_stor_dbg(us, "RTS51X_STAT_SS, power.usage:%d\n", atomic_read(&us->pusb_intf->dev.power.usage_count));
- if (atomic_read(&us->pusb_intf->pm_usage_cnt) > 0) { + if (atomic_read(&us->pusb_intf->dev.power.usage_count) > 0) { usb_stor_dbg(us, "Ready to enter SS state\n"); rts51x_set_stat(chip, RTS51X_STAT_SS); /* ignore mass storage interface's children */ pm_suspend_ignore_children(&us->pusb_intf->dev, true); usb_autopm_put_interface_async(us->pusb_intf); - usb_stor_dbg(us, "RTS51X_STAT_SS 01, intf->pm_usage_cnt:%d, power.usage:%d\n", - atomic_read(&us->pusb_intf->pm_usage_cnt), + usb_stor_dbg(us, "RTS51X_STAT_SS 01, power.usage:%d\n", atomic_read(&us->pusb_intf->dev.power.usage_count)); } break; @@ -816,11 +814,10 @@ static void rts51x_invoke_transport(stru int ret;
if (working_scsi(srb)) { - usb_stor_dbg(us, "working scsi, intf->pm_usage_cnt:%d, power.usage:%d\n", - atomic_read(&us->pusb_intf->pm_usage_cnt), + usb_stor_dbg(us, "working scsi, power.usage:%d\n", atomic_read(&us->pusb_intf->dev.power.usage_count));
- if (atomic_read(&us->pusb_intf->pm_usage_cnt) <= 0) { + if (atomic_read(&us->pusb_intf->dev.power.usage_count) <= 0) { ret = usb_autopm_get_interface(us->pusb_intf); usb_stor_dbg(us, "working scsi, ret=%d\n", ret); } --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -127,7 +127,6 @@ enum usb_interface_condition { * @dev: driver model's view of this device * @usb_dev: if an interface is bound to the USB major, this will point * to the sysfs representation for that device. - * @pm_usage_cnt: PM usage counter for this interface * @reset_ws: Used for scheduling resets from atomic context. * @resetting_device: USB core reset the device, so use alt setting 0 as * current; needs bandwidth alloc after reset. @@ -184,7 +183,6 @@ struct usb_interface {
struct device dev; /* interface specific device info */ struct device *usb_dev; - atomic_t pm_usage_cnt; /* usage counter for autosuspend */ struct work_struct reset_ws; /* for resets in atomic context */ }; #define to_usb_interface(d) container_of(d, struct usb_interface, dev)
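The hub_wq scenario described above can be reduced to arithmetic on two counters. This is a hypothetical model, not the USB core: usage_count stands for dev.power.usage_count and intf_cnt for the removed intf->pm_usage_cnt. The sequence is one get (kick_hub_wq), then unbind, then a late put (hub_event running after disconnect).

```c
/* Hypothetical model of the counters before and after the patch. */
struct pm_model {
	int usage_count;	/* dev.power.usage_count */
	int intf_cnt;		/* removed intf->pm_usage_cnt */
};

/* Old scheme: every get/put touched both counters. */
static void get_old(struct pm_model *m) { m->usage_count++; m->intf_cnt++; }
static void put_old(struct pm_model *m) { m->usage_count--; m->intf_cnt--; }

/* Old unbind: undo residual gets recorded in intf_cnt. */
static void unbind_old(struct pm_model *m)
{
	for (int r = m->intf_cnt; r > 0; --r)
		m->usage_count--;
	m->intf_cnt = 0;
}

/* New scheme: one counter, and drivers must balance it themselves. */
static void get_new(struct pm_model *m) { m->usage_count++; }
static void put_new(struct pm_model *m) { m->usage_count--; }
static void unbind_new(struct pm_model *m) { (void)m; /* nothing undone */ }
```

Under the old scheme the late put drives the count to -1, which is what let the hub autosuspend unexpectedly; under the new scheme it returns to 0.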
[ Upstream commit cef0d4948cb0a02db37ebfdc320e127c77ab1637 ]
There is a race condition that could happen if hid_debug_rdesc_show() is running while hdev is in the process of going away (device removal, system suspend, etc.), which could result in a NULL pointer dereference:
BUG: unable to handle kernel paging request at 0000000783316040
CPU: 1 PID: 1512 Comm: getevent Tainted: G U O 4.19.20-quilt-2e5dc0ac-00029-gc455a447dd55 #1
RIP: 0010:hid_dump_device+0x9b/0x160
Call Trace:
 hid_debug_rdesc_show+0x72/0x1d0
 seq_read+0xe0/0x410
 full_proxy_read+0x5f/0x90
 __vfs_read+0x3a/0x170
 vfs_read+0xa0/0x150
 ksys_read+0x58/0xc0
 __x64_sys_read+0x1a/0x20
 do_syscall_64+0x55/0x110
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
Grab driver_input_lock to make sure the input device exists throughout the whole process of dumping the rdesc.
[jkosina@suse.cz: update changelog a bit] Signed-off-by: he, bo bo.he@intel.com Signed-off-by: "Zhang, Jun" jun.zhang@intel.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-debug.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/hid/hid-debug.c b/drivers/hid/hid-debug.c index d7179dd3c9ef..3cafa1d28fed 100644 --- a/drivers/hid/hid-debug.c +++ b/drivers/hid/hid-debug.c @@ -1058,10 +1058,15 @@ static int hid_debug_rdesc_show(struct seq_file *f, void *p) seq_printf(f, "\n\n");
/* dump parsed data and input mappings */ + if (down_interruptible(&hdev->driver_input_lock)) + return 0; + hid_dump_device(hdev, f); seq_printf(f, "\n"); hid_dump_input_mapping(hdev, f);
+ up(&hdev->driver_input_lock); + return 0; }
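The fix is the usual pattern of holding one lock across the whole read, with teardown taking the same lock before it invalidates state. The sketch below is a userspace analogue under stated assumptions: a pthread mutex replaces the driver_input_lock semaphore, a string replaces the input device, and it is assumed (as the patch relies on) that the removal path holds the same lock. If acquiring the lock fails, the dump gives up and returns 0, mirroring the down_interruptible() branch.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct hid_model {
	pthread_mutex_t driver_input_lock;
	const char *input_name;	/* NULL once the input device is gone */
};

/* Reader: hold the lock for the entire dump. */
static int dump_model(struct hid_model *h, char *out, size_t len)
{
	if (pthread_mutex_lock(&h->driver_input_lock) != 0)
		return 0;	/* give up quietly, as the patch does */
	snprintf(out, len, "%s", h->input_name ? h->input_name : "(none)");
	pthread_mutex_unlock(&h->driver_input_lock);
	return 0;
}

/* Teardown: invalidate state only while holding the same lock. */
static void teardown_model(struct hid_model *h)
{
	pthread_mutex_lock(&h->driver_input_lock);
	h->input_name = NULL;
	pthread_mutex_unlock(&h->driver_input_lock);
}
```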
[ Upstream commit 15d82d22498784966df8e4696174a16b02cc1052 ]
When no alarm has been programmed on RSK-RZA1, an error message is printed during boot:
rtc rtc0: invalid alarm value: 2019-03-14T255:255:255
sh_rtc_read_alarm_value() returns 0xff when querying a hardware alarm field that is not enabled. __rtc_read_alarm() validates the received alarm values, and fills in missing fields when needed. While 0xff works for the year, month, and day fields, where it is treated as out of range and corrected, this is not the case for the hour, minute, and second fields, where -1 is expected for missing fields.
Fix this by returning -1 instead, as this value is handled fine for all fields.
Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-sh.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/rtc/rtc-sh.c b/drivers/rtc/rtc-sh.c
index 2b81dd4baf17..104c854d6a8a 100644
--- a/drivers/rtc/rtc-sh.c
+++ b/drivers/rtc/rtc-sh.c
@@ -455,7 +455,7 @@ static int sh_rtc_set_time(struct device *dev, struct rtc_time *tm)
 static inline int sh_rtc_read_alarm_value(struct sh_rtc *rtc, int reg_off)
 {
 	unsigned int byte;
-	int value = 0xff;	/* return 0xff for ignored values */
+	int value = -1;		/* return -1 for ignored values */
 
 	byte = readb(rtc->regbase + reg_off);
 	if (byte & AR_ENB) {
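A small model of why -1 is the better sentinel. The AR_ENB value (0x80) and the BCD decoding of enabled fields are assumptions for this sketch, since the hunk above is cut off after the enable-bit test. time_field_valid() is a hypothetical check that follows the commit message: -1 means "not set", anything else must be in range, so 0xff is rejected.

```c
#include <stdbool.h>

#define AR_ENB 0x80	/* alarm-enable bit; value assumed for this sketch */

static int bcd2bin_model(unsigned char v)
{
	return (v >> 4) * 10 + (v & 0x0f);
}

/* Model of the fixed helper: -1 for a field whose enable bit is clear. */
static int read_alarm_field(unsigned char reg)
{
	int value = -1;

	if (reg & AR_ENB)
		value = bcd2bin_model(reg & ~AR_ENB);
	return value;
}

/* Hour/minute/second rule as described above. */
static bool time_field_valid(int v, int max)
{
	return v == -1 || (v >= 0 && v <= max);
}
```

With the old 0xff sentinel, a disabled seconds field would fail this check, producing the "255:255:255" message quoted above.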
[ Upstream commit dabb8338be533c18f50255cf39ff4f66d4dabdbe ]
The runtime_suspend device callbacks are not supposed to save configuration state or change the power state. Commit fb29f76cc566 ("igb: Fix an issue that PME is not enabled during runtime suspend") changed the driver to not save configuration state during runtime suspend, however the driver callback still put the device into a low-power state. This causes a warning in the PCI PM core and results in pci_pm_runtime_suspend not calling pci_save_state or pci_finish_runtime_suspend.
Fix this by not changing the power state either, leaving that to the PCI PM core, and make the same change for the suspend callback as well.
Also move a couple of defines into the appropriate header file instead of inline in the .c file.
Fixes: fb29f76cc566 ("igb: Fix an issue that PME is not enabled during runtime suspend") Signed-off-by: Arvind Sankar niveditas98@gmail.com Reviewed-by: Kai-Heng Feng kai.heng.feng@canonical.com Tested-by: Aaron Brown aaron.f.brown@intel.com Signed-off-by: Jeff Kirsher jeffrey.t.kirsher@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/intel/igb/e1000_defines.h | 2 + drivers/net/ethernet/intel/igb/igb_main.c | 57 +++---------------- 2 files changed, 10 insertions(+), 49 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h index b1915043bc0c..7b9fb71137da 100644 --- a/drivers/net/ethernet/intel/igb/e1000_defines.h +++ b/drivers/net/ethernet/intel/igb/e1000_defines.h @@ -193,6 +193,8 @@ /* enable link status from external LINK_0 and LINK_1 pins */ #define E1000_CTRL_SWDPIN0 0x00040000 /* SWDPIN 0 value */ #define E1000_CTRL_SWDPIN1 0x00080000 /* SWDPIN 1 value */ +#define E1000_CTRL_ADVD3WUC 0x00100000 /* D3 WUC */ +#define E1000_CTRL_EN_PHY_PWR_MGMT 0x00200000 /* PHY PM enable */ #define E1000_CTRL_SDP0_DIR 0x00400000 /* SDP0 Data direction */ #define E1000_CTRL_SDP1_DIR 0x00800000 /* SDP1 Data direction */ #define E1000_CTRL_RST 0x04000000 /* Global reset */ diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index c1796aa2dde5..70ed5e5c3514 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -7325,9 +7325,7 @@ static int __igb_shutdown(struct pci_dev *pdev, bool *enable_wake, struct e1000_hw *hw = &adapter->hw; u32 ctrl, rctl, status; u32 wufc = runtime ? E1000_WUFC_LNKC : adapter->wol; -#ifdef CONFIG_PM - int retval = 0; -#endif + bool wake;
rtnl_lock(); netif_device_detach(netdev); @@ -7338,14 +7336,6 @@ static int __igb_shutdown(struct pci_dev *pdev, bool *enable_wake, igb_clear_interrupt_scheme(adapter); rtnl_unlock();
-#ifdef CONFIG_PM - if (!runtime) { - retval = pci_save_state(pdev); - if (retval) - return retval; - } -#endif - status = rd32(E1000_STATUS); if (status & E1000_STATUS_LU) wufc &= ~E1000_WUFC_LNKC; @@ -7362,10 +7352,6 @@ static int __igb_shutdown(struct pci_dev *pdev, bool *enable_wake, }
ctrl = rd32(E1000_CTRL); - /* advertise wake from D3Cold */ - #define E1000_CTRL_ADVD3WUC 0x00100000 - /* phy power management enable */ - #define E1000_CTRL_EN_PHY_PWR_MGMT 0x00200000 ctrl |= E1000_CTRL_ADVD3WUC; wr32(E1000_CTRL, ctrl);
@@ -7379,12 +7365,15 @@ static int __igb_shutdown(struct pci_dev *pdev, bool *enable_wake, wr32(E1000_WUFC, 0); }
- *enable_wake = wufc || adapter->en_mng_pt; - if (!*enable_wake) + wake = wufc || adapter->en_mng_pt; + if (!wake) igb_power_down_link(adapter); else igb_power_up_link(adapter);
+ if (enable_wake) + *enable_wake = wake; + /* Release control of h/w to f/w. If f/w is AMT enabled, this * would have already happened in close and is redundant. */ @@ -7399,22 +7388,7 @@ static int __igb_shutdown(struct pci_dev *pdev, bool *enable_wake, #ifdef CONFIG_PM_SLEEP static int igb_suspend(struct device *dev) { - int retval; - bool wake; - struct pci_dev *pdev = to_pci_dev(dev); - - retval = __igb_shutdown(pdev, &wake, 0); - if (retval) - return retval; - - if (wake) { - pci_prepare_to_sleep(pdev); - } else { - pci_wake_from_d3(pdev, false); - pci_set_power_state(pdev, PCI_D3hot); - } - - return 0; + return __igb_shutdown(to_pci_dev(dev), NULL, 0); } #endif /* CONFIG_PM_SLEEP */
@@ -7483,22 +7457,7 @@ static int igb_runtime_idle(struct device *dev)
static int igb_runtime_suspend(struct device *dev) { - struct pci_dev *pdev = to_pci_dev(dev); - int retval; - bool wake; - - retval = __igb_shutdown(pdev, &wake, 1); - if (retval) - return retval; - - if (wake) { - pci_prepare_to_sleep(pdev); - } else { - pci_wake_from_d3(pdev, false); - pci_set_power_state(pdev, PCI_D3hot); - } - - return 0; + return __igb_shutdown(to_pci_dev(dev), NULL, 1); }
static int igb_runtime_resume(struct device *dev)
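After the patch __igb_shutdown() treats enable_wake as an optional output. The callers that used to act on it (suspend and runtime suspend) now pass NULL, because the PCI PM core picks the power state. A hypothetical reduction of that pattern, keeping only the wake decision:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of __igb_shutdown(): decide locally,
 * report the decision only if the caller asked for it. */
static int shutdown_model(unsigned int wufc, bool mng_passthrough,
			  bool *enable_wake)
{
	bool wake = wufc || mng_passthrough;

	/* ... power the link down or up depending on 'wake' ... */

	if (enable_wake)
		*enable_wake = wake;
	return 0;
}
```

Keeping the local variable means the link power decision still happens even when no caller wants the result.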
[ Upstream commit 18bebc6dd3281955240062655a4df35eef2c46b3 ]
Bonding assumes an Ethernet hardware address for its slaves, but the address can be longer than 6 bytes, for example on an InfiniBand interface.
# cat /sys/devices/<skipped>/net/ib0/address 80:00:02:08:fe:80:00:00:00:00:00:00:7c:fe:90:03:00:be:5d:e1
# cat /sys/devices/<skipped>/net/ib0/bonding_slave/perm_hwaddr 80:00:02:08:fe:80
So print the full hwaddr in sysfs "bonding_slave/perm_hwaddr" as well.
Signed-off-by: Konstantin Khorenko khorenko@virtuozzo.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_sysfs_slave.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_sysfs_slave.c b/drivers/net/bonding/bond_sysfs_slave.c
index 7d16c51e6913..641a532b67cb 100644
--- a/drivers/net/bonding/bond_sysfs_slave.c
+++ b/drivers/net/bonding/bond_sysfs_slave.c
@@ -55,7 +55,9 @@ static SLAVE_ATTR_RO(link_failure_count);
 
 static ssize_t perm_hwaddr_show(struct slave *slave, char *buf)
 {
-	return sprintf(buf, "%pM\n", slave->perm_hwaddr);
+	return sprintf(buf, "%*phC\n",
+		       slave->dev->addr_len,
+		       slave->perm_hwaddr);
 }
 static SLAVE_ATTR_RO(perm_hwaddr);
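"%pM" always prints exactly 6 bytes, while "%*phC" takes the length from the argument list and prints that many bytes as colon-separated hex. Since printk extensions are kernel-only, here is a userspace stand-in (format_hwaddr() is a hypothetical helper) that reproduces the output for the InfiniBand address shown above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Print len bytes as colon-separated lowercase hex, like "%*phC". */
static void format_hwaddr(char *out, size_t outlen,
			  const unsigned char *addr, int len)
{
	size_t pos = 0;

	if (outlen == 0)
		return;
	out[0] = '\0';
	/* each byte needs at most 3 chars (":xx") plus the final NUL */
	for (int i = 0; i < len && pos + 3 < outlen; i++)
		pos += snprintf(out + pos, outlen - pos,
				i ? ":%02x" : "%02x", (unsigned int)addr[i]);
}
```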
[ Upstream commit 4fdcfab5b5537c21891e22e65996d4d0dd8ab4ca ]
Free the symlink body after the same RCU delay we have for freeing the struct inode itself, so that traversal during RCU pathwalk wouldn't step into freed memory.
Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jffs2/readinode.c | 5 ----- fs/jffs2/super.c | 5 ++++- 2 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/fs/jffs2/readinode.c b/fs/jffs2/readinode.c index bfebbf13698c..5b52ea41b84f 100644 --- a/fs/jffs2/readinode.c +++ b/fs/jffs2/readinode.c @@ -1414,11 +1414,6 @@ void jffs2_do_clear_inode(struct jffs2_sb_info *c, struct jffs2_inode_info *f)
jffs2_kill_fragtree(&f->fragtree, deleted?c:NULL);
- if (f->target) { - kfree(f->target); - f->target = NULL; - } - fds = f->dents; while(fds) { fd = fds; diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c index 023e7f32ee1b..9fc297df8c75 100644 --- a/fs/jffs2/super.c +++ b/fs/jffs2/super.c @@ -47,7 +47,10 @@ static struct inode *jffs2_alloc_inode(struct super_block *sb) static void jffs2_i_callback(struct rcu_head *head) { struct inode *inode = container_of(head, struct inode, i_rcu); - kmem_cache_free(jffs2_inode_cachep, JFFS2_INODE_INFO(inode)); + struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode); + + kfree(f->target); + kmem_cache_free(jffs2_inode_cachep, f); }
static void jffs2_destroy_inode(struct inode *inode)
[ Upstream commit 93b919da64c15b90953f96a536e5e61df896ca57 ]
The symlink body shouldn't be freed without an RCU delay. Switch debugfs to ->destroy_inode() and use call_rcu(); free both the inode and the symlink body in the callback. This is similar to the solution for bpf, only here it's even more obvious that ->evict_inode() can be dropped.
Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/debugfs/inode.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 22fe11baef2b..3530e1c3ff56 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -164,19 +164,24 @@ static int debugfs_show_options(struct seq_file *m, struct dentry *root)
 	return 0;
 }
 
-static void debugfs_evict_inode(struct inode *inode)
+static void debugfs_i_callback(struct rcu_head *head)
 {
-	truncate_inode_pages_final(&inode->i_data);
-	clear_inode(inode);
+	struct inode *inode = container_of(head, struct inode, i_rcu);
 	if (S_ISLNK(inode->i_mode))
 		kfree(inode->i_link);
+	free_inode_nonrcu(inode);
+}
+
+static void debugfs_destroy_inode(struct inode *inode)
+{
+	call_rcu(&inode->i_rcu, debugfs_i_callback);
 }
 
 static const struct super_operations debugfs_super_operations = {
 	.statfs = simple_statfs,
 	.remount_fs = debugfs_remount,
 	.show_options = debugfs_show_options,
-	.evict_inode = debugfs_evict_inode,
+	.destroy_inode = debugfs_destroy_inode,
 };
 
 static struct vfsmount *debugfs_automount(struct path *path)
[ Upstream commit 882c5e552ffd06856de42261460f46e18319d259 ]
The DA9063AD doesn't support alarms on arbitrary seconds; its alarm granularity is one minute. Set uie_unsupported in that case.
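The sketch below models the decision the patch adds to probe. If the RTC data block does not start at the seconds register, per-second update interrupts cannot be emulated with the alarm. The enum values are illustrative stand-ins for the driver's identifiers. `earliest_alarm()` is a hypothetical helper that only shows what minute granularity means for an alarm request.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins: the RTC data block starts at either the
 * seconds or the minutes register. */
enum sketch_rtc_data_start {
	RTC_SEC,
	RTC_MIN,
};

/* Mirrors the probe-time check: without a seconds alarm register,
 * once-per-second update interrupts (UIE) cannot be emulated. */
static bool uie_unsupported(enum sketch_rtc_data_start data_start)
{
	return data_start != RTC_SEC;
}

/* Illustration only: with minute granularity, the earliest alarm the
 * hardware could express for a request at `secs` is the next whole
 * minute at or after it. */
static long earliest_alarm(long secs, enum sketch_rtc_data_start data_start)
{
	assert(secs >= 0);
	if (data_start == RTC_SEC)
		return secs;
	return ((secs + 59) / 60) * 60;
}
```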
Reported-by: Wolfram Sang wsa+renesas@sang-engineering.com
Reported-by: Geert Uytterhoeven geert+renesas@glider.be
Reviewed-by: Wolfram Sang wsa+renesas@sang-engineering.com
Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com
Acked-by: Steve Twiss stwiss.opensource@diasemi.com
Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/rtc/rtc-da9063.c | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/drivers/rtc/rtc-da9063.c b/drivers/rtc/rtc-da9063.c
index d6c853bbfa9f..e93beecd5010 100644
--- a/drivers/rtc/rtc-da9063.c
+++ b/drivers/rtc/rtc-da9063.c
@@ -491,6 +491,13 @@ static int da9063_rtc_probe(struct platform_device *pdev)
 	da9063_data_to_tm(data, &rtc->alarm_time, rtc);
 	rtc->rtc_sync = false;
 
+	/*
+	 * TODO: some models have alarms on a minute boundary but still support
+	 * real hardware interrupts. Add this once the core supports it.
+	 */
+	if (config->rtc_data_start != RTC_SEC)
+		rtc->rtc_dev->uie_unsupported = 1;
+
 	irq_alarm = platform_get_irq_byname(pdev, "ALARM");
 	ret = devm_request_threaded_irq(&pdev->dev, irq_alarm, NULL,
 					da9063_alarm_event,
[ Upstream commit 426b046b748d1f47e096e05bdcc6fb4172791307 ]
When compiling with -Wformat, clang emits the following warnings:
drivers/vfio/pci/vfio_pci.c:1601:5: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat]
    vendor, device, subvendor, subdevice,
    ^~~~~~
drivers/vfio/pci/vfio_pci.c:1601:13: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat]
    vendor, device, subvendor, subdevice,
            ^~~~~~
drivers/vfio/pci/vfio_pci.c:1601:21: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat]
    vendor, device, subvendor, subdevice,
                    ^~~~~~~~~
drivers/vfio/pci/vfio_pci.c:1601:32: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat]
    vendor, device, subvendor, subdevice,
                               ^~~~~~~~~
drivers/vfio/pci/vfio_pci.c:1605:5: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat]
    vendor, device, subvendor, subdevice,
    ^~~~~~
drivers/vfio/pci/vfio_pci.c:1605:13: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat]
    vendor, device, subvendor, subdevice,
            ^~~~~~
drivers/vfio/pci/vfio_pci.c:1605:21: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat]
    vendor, device, subvendor, subdevice,
                    ^~~~~~~~~
drivers/vfio/pci/vfio_pci.c:1605:32: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat]
    vendor, device, subvendor, subdevice,
                               ^~~~~~~~~

The types of these arguments are unconditionally defined, so this patch updates the format characters to the correct ones for unsigned ints.
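A small userspace sketch of why the specifier matters. It formats an ID tuple with the fixed `%04x` specifiers, then again with `%hx` and explicit `(unsigned short)` casts. The casts show the narrowing that `%hx` applies. `SKETCH_PCI_ANY_ID` is assumed to equal the kernel's PCI_ANY_ID (~0) once stored in an unsigned int, which is the default for the subvendor and subdevice fields. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed equal to PCI_ANY_ID (~0) stored in an unsigned int. */
#define SKETCH_PCI_ANY_ID (~0u)

/* Fixed format: %04x matches the unsigned int arguments. */
static int format_dynid(char *buf, size_t len, unsigned int vendor,
			unsigned int device, unsigned int subvendor,
			unsigned int subdevice)
{
	return snprintf(buf, len, "[%04x:%04x[%04x:%04x]]",
			vendor, device, subvendor, subdevice);
}

/* For comparison: %hx prints the value converted to unsigned short;
 * the explicit casts make that narrowing visible and well defined. */
static int format_dynid_short(char *buf, size_t len, unsigned int vendor,
			      unsigned int device, unsigned int subvendor,
			      unsigned int subdevice)
{
	return snprintf(buf, len, "[%04hx:%04hx[%04hx:%04hx]]",
			(unsigned short)vendor, (unsigned short)device,
			(unsigned short)subvendor, (unsigned short)subdevice);
}
```

With the default wildcard sub-IDs, the fixed format prints `ffffffff`, while the short form prints only `ffff`.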
Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Louis Taylor louis@kragniz.eu
Reviewed-by: Nick Desaulniers ndesaulniers@google.com
Signed-off-by: Alex Williamson alex.williamson@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/vfio/pci/vfio_pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index b31b84f56e8f..47b229fa5e8e 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1191,11 +1191,11 @@ static void __init vfio_pci_fill_ids(void)
 		rc = pci_add_dynid(&vfio_pci_driver, vendor, device,
 				   subvendor, subdevice, class, class_mask, 0);
 		if (rc)
-			pr_warn("failed to add dynamic id [%04hx:%04hx[%04hx:%04hx]] class %#08x/%08x (%d)\n",
+			pr_warn("failed to add dynamic id [%04x:%04x[%04x:%04x]] class %#08x/%08x (%d)\n",
 				vendor, device, subvendor, subdevice,
 				class, class_mask, rc);
 		else
-			pr_info("add [%04hx:%04hx[%04hx:%04hx]] class %#08x/%08x\n",
+			pr_info("add [%04x:%04x[%04x:%04x]] class %#08x/%08x\n",
 				vendor, device, subvendor, subdevice,
 				class, class_mask);
 	}
[ Upstream commit 382e06d11e075a40b4094b6ef809f8d4bcc7ab2a ]
When the number of sub-channels offered by Hyper-V is >= the number of CPUs in the VM, calculate the correct number of sub-channels. The current code produces one too many.
This scenario arises only when the number of CPUs is artificially restricted (for example, with maxcpus=<n> on the kernel boot line), because Hyper-V normally offers a sub-channel count < number of CPUs. While the current code doesn't break, the extra sub-channel is unbalanced across the CPUs (for example, a total of 5 channels on a VM with 4 CPUs).
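The arithmetic can be sketched as two small functions comparing the sub-channel count before and after the patch. `num_cpus` stands in for num_online_cpus() and `max_chns` for the number of sub-channels Hyper-V offers. Total channels are the sub-channels plus the one primary channel.

```c
#include <assert.h>

/* Before the patch: cap sub-channels at the CPU count. */
static int old_num_sc(int num_cpus, int max_chns)
{
	return (max_chns > num_cpus) ? num_cpus : max_chns;
}

/* After the patch: the primary channel already occupies one CPU, so at
 * most num_cpus - 1 sub-channels are useful. */
static int new_num_sc(int num_cpus, int max_chns)
{
	int limit = num_cpus - 1;

	assert(num_cpus >= 1);
	return (limit < max_chns) ? limit : max_chns;
}
```

On a 4-CPU VM offered 4 sub-channels, the old formula yields 4 + 1 = 5 channels and the new one yields 3 + 1 = 4. With one CPU the result is 0, which is why the patch returns early when `num_sc` is zero.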
Signed-off-by: Michael Kelley mikelley@microsoft.com
Reviewed-by: Vitaly Kuznetsov vkuznets@redhat.com
Reviewed-by: Long Li longli@microsoft.com
Signed-off-by: Martin K. Petersen martin.petersen@oracle.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/scsi/storvsc_drv.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 44b7a69d022a..45cd4cf93af3 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -613,13 +613,22 @@ static void handle_sc_creation(struct vmbus_channel *new_sc)
 static void handle_multichannel_storage(struct hv_device *device, int max_chns)
 {
 	struct storvsc_device *stor_device;
-	int num_cpus = num_online_cpus();
 	int num_sc;
 	struct storvsc_cmd_request *request;
 	struct vstor_packet *vstor_packet;
 	int ret, t;
 
-	num_sc = ((max_chns > num_cpus) ? num_cpus : max_chns);
+	/*
+	 * If the number of CPUs is artificially restricted, such as
+	 * with maxcpus=1 on the kernel boot line, Hyper-V could offer
+	 * sub-channels >= the number of CPUs. These sub-channels
+	 * should not be created. The primary channel is already created
+	 * and assigned to one CPU, so check against # CPUs - 1.
+	 */
+	num_sc = min((int)(num_online_cpus() - 1), max_chns);
+	if (!num_sc)
+		return;
+
 	stor_device = get_out_stor_device(device);
 	if (!stor_device)
 		return;
[ Upstream commit acb1ce15a61154aa501891d67ebf79bc9ea26818 ]
When the HNS driver is loaded, it always prints an error: "netif_napi_add() called with weight 256"
This is because the kernel checks the NAPI polling weights requested by drivers and it prints an error message if a driver requests a weight bigger than 64.
So use NAPI_POLL_WEIGHT to fix it.
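The kernel's check can be modelled in userspace roughly as follows. Weights above NAPI_POLL_WEIGHT (64) are still accepted, but an error is printed. `napi_weight_warns()` is a hypothetical name used only for this sketch; the message text follows the one quoted above.

```c
#include <assert.h>
#include <stdio.h>

#define NAPI_POLL_WEIGHT 64 /* same value as in <linux/netdevice.h> */

/* Model of the sanity check in netif_napi_add(): returns 1 if the
 * error message would be printed for this weight. */
static int napi_weight_warns(int weight)
{
	if (weight > NAPI_POLL_WEIGHT) {
		fprintf(stderr, "netif_napi_add() called with weight %d\n",
			weight);
		return 1;
	}
	return 0;
}
```

Only the old tx weight (NIC_TX_CLEAN_MAX_NUM, 256) triggers the message; the old rx weight of 64 is already at the limit. The patch switches both to NAPI_POLL_WEIGHT for consistency.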
Signed-off-by: Yonglong Liu liuyonglong@huawei.com
Signed-off-by: Peng Li lipeng321@huawei.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 2fa54b0b0679..6d649e7b45a9 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -28,9 +28,6 @@
 
 #define SERVICE_TIMER_HZ (1 * HZ)
 
-#define NIC_TX_CLEAN_MAX_NUM 256
-#define NIC_RX_CLEAN_MAX_NUM 64
-
 #define RCB_IRQ_NOT_INITED 0
 #define RCB_IRQ_INITED 1
 
@@ -1408,7 +1405,7 @@ static int hns_nic_init_ring_data(struct hns_nic_priv *priv)
 		rd->fini_process = hns_nic_tx_fini_pro;
 
 		netif_napi_add(priv->netdev, &rd->napi,
-			       hns_nic_common_poll, NIC_TX_CLEAN_MAX_NUM);
+			       hns_nic_common_poll, NAPI_POLL_WEIGHT);
 		rd->ring->irq_init_flag = RCB_IRQ_NOT_INITED;
 	}
 	for (i = h->q_num; i < h->q_num * 2; i++) {
@@ -1420,7 +1417,7 @@ static int hns_nic_init_ring_data(struct hns_nic_priv *priv)
 		rd->fini_process = hns_nic_rx_fini_pro;
 
 		netif_napi_add(priv->netdev, &rd->napi,
-			       hns_nic_common_poll, NIC_RX_CLEAN_MAX_NUM);
+			       hns_nic_common_poll, NAPI_POLL_WEIGHT);
 		rd->ring->irq_init_flag = RCB_IRQ_NOT_INITED;
 	}
[ Upstream commit 8601a99d7c0256b7a7fdd1ab14cf6c1f1dfcadc6 ]
When the SMMU is enabled, removing the HNS driver causes a WARNING:
[ 141.924177] WARNING: CPU: 36 PID: 2708 at drivers/iommu/dma-iommu.c:443 __iommu_dma_unmap+0xc0/0xc8
[ 141.954673] Modules linked in: hns_enet_drv(-)
[ 141.963615] CPU: 36 PID: 2708 Comm: rmmod Tainted: G W 5.0.0-rc1-28723-gb729c57de95c-dirty #32
[ 141.983593] Hardware name: Huawei D05/D05, BIOS Hisilicon D05 UEFI Nemo 1.8 RC0 08/31/2017
[ 142.000244] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 142.009886] pc : __iommu_dma_unmap+0xc0/0xc8
[ 142.018476] lr : __iommu_dma_unmap+0xc0/0xc8
[ 142.027066] sp : ffff000013533b90
[ 142.033728] x29: ffff000013533b90 x28: ffff8013e6983600
[ 142.044420] x27: 0000000000000000 x26: 0000000000000000
[ 142.055113] x25: 0000000056000000 x24: 0000000000000015
[ 142.065806] x23: 0000000000000028 x22: ffff8013e66eee68
[ 142.076499] x21: ffff8013db919800 x20: 0000ffffefbff000
[ 142.087192] x19: 0000000000001000 x18: 0000000000000007
[ 142.097885] x17: 000000000000000e x16: 0000000000000001
[ 142.108578] x15: 0000000000000019 x14: 363139343a70616d
[ 142.119270] x13: 6e75656761705f67 x12: 0000000000000000
[ 142.129963] x11: 00000000ffffffff x10: 0000000000000006
[ 142.140656] x9 : 1346c1aa88093500 x8 : ffff0000114de4e0
[ 142.151349] x7 : 6662666578303d72 x6 : ffff0000105ffec8
[ 142.162042] x5 : 0000000000000000 x4 : 0000000000000000
[ 142.172734] x3 : 00000000ffffffff x2 : ffff0000114de500
[ 142.183427] x1 : 0000000000000000 x0 : 0000000000000035
[ 142.194120] Call trace:
[ 142.199030] __iommu_dma_unmap+0xc0/0xc8
[ 142.206920] iommu_dma_unmap_page+0x20/0x28
[ 142.215335] __iommu_unmap_page+0x40/0x60
[ 142.223399] hnae_unmap_buffer+0x110/0x134
[ 142.231639] hnae_free_desc+0x6c/0x10c
[ 142.239177] hnae_fini_ring+0x14/0x34
[ 142.246540] hnae_fini_queue+0x2c/0x40
[ 142.254080] hnae_put_handle+0x38/0xcc
[ 142.261619] hns_nic_dev_remove+0x54/0xfc [hns_enet_drv]
[ 142.272312] platform_drv_remove+0x24/0x64
[ 142.280552] device_release_driver_internal+0x17c/0x20c
[ 142.291070] driver_detach+0x4c/0x90
[ 142.298259] bus_remove_driver+0x5c/0xd8
[ 142.306148] driver_unregister+0x2c/0x54
[ 142.314037] platform_driver_unregister+0x10/0x18
[ 142.323505] hns_nic_dev_driver_exit+0x14/0xf0c [hns_enet_drv]
[ 142.335248] __arm64_sys_delete_module+0x214/0x25c
[ 142.344891] el0_svc_common+0xb0/0x10c
[ 142.352430] el0_svc_handler+0x24/0x80
[ 142.359968] el0_svc+0x8/0x7c0
[ 142.366104] ---[ end trace 60ad1cd58e63c407 ]---
Tx ring buffers are mapped at transmit time and unmapped when the transmit completes. So hnae_init_ring() does not map the tx ring buffers, but hnae_fini_ring() performs an unmap operation on them even though they were already unmapped when the transmit completed, which causes this WARNING.

hnae_alloc_buffers() is called in hnae_init_ring(), so hnae_free_buffers() should be called in hnae_fini_ring(), not in hnae_free_desc().

In hnae_fini_ring(), add an is_rx_ring() check, as in hnae_init_ring(). When the ring is a tx ring, add a piece of code to ensure that the tx ring is unmapped.
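The ownership rule can be illustrated with a hypothetical counter model (none of these names exist in the driver). Each ring counts currently mapped buffers and records any unmap of a buffer that was not mapped, which corresponds to the IOMMU WARNING above. Rx rings are mapped at init, while tx buffers are mapped and unmapped per transmit. Only the "after" teardown skips tx rings.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of DMA mapping ownership in an hnae ring. */
struct ring_model {
	bool is_rx;
	int mapped;     /* buffers currently mapped */
	int bad_unmaps; /* unmaps of buffers that were not mapped */
};

static void unmap_buffers(struct ring_model *r, int n)
{
	if (r->mapped < n) {
		r->bad_unmaps += n - r->mapped;
		r->mapped = 0;
	} else {
		r->mapped -= n;
	}
}

/* As in hnae_init_ring(): only rx rings get buffers mapped up front. */
static void init_ring(struct ring_model *r, bool is_rx, int desc_num)
{
	r->is_rx = is_rx;
	r->mapped = is_rx ? desc_num : 0;
	r->bad_unmaps = 0;
}

/* Tx buffers are mapped on transmit and unmapped on completion. */
static void xmit_and_complete(struct ring_model *r, int n)
{
	r->mapped += n;
	unmap_buffers(r, n);
}

/* Before the fix: buffers released for every ring type. */
static void fini_ring_old(struct ring_model *r, int desc_num)
{
	unmap_buffers(r, desc_num);
}

/* After the fix: mirrors the is_rx_ring() check in hnae_fini_ring(). */
static void fini_ring_new(struct ring_model *r, int desc_num)
{
	if (r->is_rx)
		unmap_buffers(r, desc_num);
}
```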
Signed-off-by: Yonglong Liu liuyonglong@huawei.com
Signed-off-by: Peng Li lipeng321@huawei.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/hisilicon/hns/hnae.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.c b/drivers/net/ethernet/hisilicon/hns/hnae.c
index b3645297477e..3ce41efe8a94 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.c
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.c
@@ -144,7 +144,6 @@ static int hnae_alloc_buffers(struct hnae_ring *ring)
 /* free desc along with its attached buffer */
 static void hnae_free_desc(struct hnae_ring *ring)
 {
-	hnae_free_buffers(ring);
 	dma_unmap_single(ring_to_dev(ring), ring->desc_dma_addr,
 			 ring->desc_num * sizeof(ring->desc[0]),
 			 ring_to_dma_dir(ring));
@@ -177,6 +176,9 @@ static int hnae_alloc_desc(struct hnae_ring *ring)
 /* fini ring, also free the buffer for the ring */
 static void hnae_fini_ring(struct hnae_ring *ring)
 {
+	if (is_rx_ring(ring))
+		hnae_free_buffers(ring);
+
 	hnae_free_desc(ring);
 	kfree(ring->desc_cb);
 	ring->desc_cb = NULL;
[ Upstream commit 58b6e5e8f1addd44583d61b0a03c0f5519527e35 ]
When mknod is used to create a block special file in hugetlbfs, it will allocate an inode and kmalloc a 'struct resv_map' via resv_map_alloc(). inode->i_mapping->private_data will point to the newly allocated resv_map. However, when the device special file is opened, bd_acquire() will set inode->i_mapping to bd_inode->i_mapping. Thus the pointer to the allocated resv_map is lost and the structure is leaked.
Programs to reproduce:
	mount -t hugetlbfs nodev hugetlbfs
	mknod hugetlbfs/dev b 0 0
	exec 30<> hugetlbfs/dev
	umount hugetlbfs/
resv_map structures are only needed for inodes which can have associated page allocations. To fix the leak, only allocate resv_map for those inodes which could possibly be associated with page allocations.
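The condition the patch adds to hugetlbfs_get_inode() can be checked in userspace with the POSIX file-type macros. `needs_resv_map()` is a hypothetical helper name for this sketch. `_XOPEN_SOURCE` is defined so that the S_IF* constants used in the tests are declared.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Mirrors the new check: only regular files and symlinks can have
 * associated page allocations, so only they get a resv_map. */
static bool needs_resv_map(mode_t mode)
{
	return S_ISREG(mode) || S_ISLNK(mode);
}
```

A block special file such as the one created by `mknod hugetlbfs/dev b 0 0` no longer gets a resv_map, so there is nothing to leak when bd_acquire() replaces i_mapping.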
Link: http://lkml.kernel.org/r/20190401213101.16476-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz mike.kravetz@oracle.com
Reviewed-by: Andrew Morton akpm@linux-foundation.org
Reported-by: Yufen Yu yuyufen@huawei.com
Suggested-by: Yufen Yu yuyufen@huawei.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/hugetlbfs/inode.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index cefae2350da5..27c4e2ac39a9 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -745,11 +745,17 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 					umode_t mode, dev_t dev)
 {
 	struct inode *inode;
-	struct resv_map *resv_map;
+	struct resv_map *resv_map = NULL;
 
-	resv_map = resv_map_alloc();
-	if (!resv_map)
-		return NULL;
+	/*
+	 * Reserve maps are only needed for inodes that can have associated
+	 * page allocations.
+	 */
+	if (S_ISREG(mode) || S_ISLNK(mode)) {
+		resv_map = resv_map_alloc();
+		if (!resv_map)
+			return NULL;
+	}
 
 	inode = new_inode(sb);
 	if (inode) {
@@ -790,8 +796,10 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 			break;
 		}
 		lockdep_annotate_inode_mutex_key(inode);
-	} else
-		kref_put(&resv_map->refs, resv_map_release);
+	} else {
+		if (resv_map)
+			kref_put(&resv_map->refs, resv_map_release);
+	}
 
 	return inode;
 }
[ Upstream commit 47b16820c490149c2923e8474048f2c6e7557cab ]
If xace hardware reports a bad version number, the error handling code in ace_setup() calls put_disk(), followed by queue cleanup. However, since the disk data structure has the queue pointer set, put_disk() also cleans and releases the queue. This results in blk_cleanup_queue() accessing an already released data structure, which in turn may result in a crash such as the following.
[ 10.681671] BUG: Kernel NULL pointer dereference at 0x00000040
[ 10.681826] Faulting instruction address: 0xc0431480
[ 10.682072] Oops: Kernel access of bad area, sig: 11 [#1]
[ 10.682251] BE PAGE_SIZE=4K PREEMPT Xilinx Virtex440
[ 10.682387] Modules linked in:
[ 10.682528] CPU: 0 PID: 1 Comm: swapper Tainted: G W 5.0.0-rc6-next-20190218+ #2
[ 10.682733] NIP: c0431480 LR: c043147c CTR: c0422ad8
[ 10.682863] REGS: cf82fbe0 TRAP: 0300 Tainted: G W (5.0.0-rc6-next-20190218+)
[ 10.683065] MSR: 00029000 <CE,EE,ME> CR: 22000222 XER: 00000000
[ 10.683236] DEAR: 00000040 ESR: 00000000
[ 10.683236] GPR00: c043147c cf82fc90 cf82ccc0 00000000 00000000 00000000 00000002 00000000
[ 10.683236] GPR08: 00000000 00000000 c04310bc 00000000 22000222 00000000 c0002c54 00000000
[ 10.683236] GPR16: 00000000 00000001 c09aa39c c09021b0 c09021dc 00000007 c0a68c08 00000000
[ 10.683236] GPR24: 00000001 ced6d400 ced6dcf0 c0815d9c 00000000 00000000 00000000 cedf0800
[ 10.684331] NIP [c0431480] blk_mq_run_hw_queue+0x28/0x114
[ 10.684473] LR [c043147c] blk_mq_run_hw_queue+0x24/0x114
[ 10.684602] Call Trace:
[ 10.684671] [cf82fc90] [c043147c] blk_mq_run_hw_queue+0x24/0x114 (unreliable)
[ 10.684854] [cf82fcc0] [c04315bc] blk_mq_run_hw_queues+0x50/0x7c
[ 10.685002] [cf82fce0] [c0422b24] blk_set_queue_dying+0x30/0x68
[ 10.685154] [cf82fcf0] [c0423ec0] blk_cleanup_queue+0x34/0x14c
[ 10.685306] [cf82fd10] [c054d73c] ace_probe+0x3dc/0x508
[ 10.685445] [cf82fd50] [c052d740] platform_drv_probe+0x4c/0xb8
[ 10.685592] [cf82fd70] [c052abb0] really_probe+0x20c/0x32c
[ 10.685728] [cf82fda0] [c052ae58] driver_probe_device+0x68/0x464
[ 10.685877] [cf82fdc0] [c052b500] device_driver_attach+0xb4/0xe4
[ 10.686024] [cf82fde0] [c052b5dc] __driver_attach+0xac/0xfc
[ 10.686161] [cf82fe00] [c0528428] bus_for_each_dev+0x80/0xc0
[ 10.686314] [cf82fe30] [c0529b3c] bus_add_driver+0x144/0x234
[ 10.686457] [cf82fe50] [c052c46c] driver_register+0x88/0x15c
[ 10.686610] [cf82fe60] [c09de288] ace_init+0x4c/0xac
[ 10.686742] [cf82fe80] [c0002730] do_one_initcall+0xac/0x330
[ 10.686888] [cf82fee0] [c09aafd0] kernel_init_freeable+0x34c/0x478
[ 10.687043] [cf82ff30] [c0002c6c] kernel_init+0x18/0x114
[ 10.687188] [cf82ff40] [c000f2f0] ret_from_kernel_thread+0x14/0x1c
[ 10.687349] Instruction dump:
[ 10.687435] 3863ffd4 4bfffd70 9421ffd0 7c0802a6 93c10028 7c9e2378 93e1002c 38810008
[ 10.687637] 7c7f1b78 90010034 4bfffc25 813f008c <81290040> 75290100 4182002c 80810008
[ 10.688056] ---[ end trace 13c9ff51d41b9d40 ]---
Fix the problem by setting the disk queue pointer to NULL before calling put_disk(). A more comprehensive fix might be to rearrange the code to check the hardware version before initializing data structures, but I don't know if this would have undesirable side effects, and it would increase the complexity of backporting the fix to older kernels.
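The double cleanup can be sketched with a hypothetical model in which "putting" the disk also cleans up whatever queue the disk still points at, as described for put_disk() above. The error path then cleans up the queue again. A counter replaces the real use-after-free so the sketch stays well defined. `fixed` selects the patch's approach of detaching the queue first.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the request queue and gendisk. */
struct fake_queue {
	int cleanups; /* times this queue has been cleaned up */
};

struct fake_disk {
	struct fake_queue *queue;
};

/* Stands in for blk_cleanup_queue(). */
static void fake_cleanup_queue(struct fake_queue *q)
{
	q->cleanups++;
}

/* Stands in for put_disk(): also tears down an attached queue. */
static void fake_put_disk(struct fake_disk *gd)
{
	if (gd->queue)
		fake_cleanup_queue(gd->queue);
	free(gd);
}

/* Error path of a setup routine, with or without the fix. */
static void error_path(struct fake_disk *gd, struct fake_queue *q, int fixed)
{
	if (fixed)
		gd->queue = NULL; /* prevent double queue cleanup */
	fake_put_disk(gd);
	fake_cleanup_queue(q);
}
```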
Fixes: 74489a91dd43a ("Add support for Xilinx SystemACE CompactFlash interface")
Acked-by: Michal Simek michal.simek@xilinx.com
Signed-off-by: Guenter Roeck linux@roeck-us.net
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/block/xsysace.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/block/xsysace.c b/drivers/block/xsysace.c
index c4328d9d9981..f838119d12b2 100644
--- a/drivers/block/xsysace.c
+++ b/drivers/block/xsysace.c
@@ -1062,6 +1062,8 @@ static int ace_setup(struct ace_device *ace)
 	return 0;
 
 err_read:
+	/* prevent double queue cleanup */
+	ace->gd->queue = NULL;
 	put_disk(ace->gd);
 err_alloc_disk:
 	blk_cleanup_queue(ace->queue);
[ Upstream commit cd92d74d67c811dc22544430b9ac3029f5bd64c5 ]
clang warns about statically defined DMA masks from the DMA_BIT_MASK macro with length 64:
arch/arm/plat-orion/common.c:625:29: error: shift count >= width of type [-Werror,-Wshift-count-overflow]
       .coherent_dma_mask = DMA_BIT_MASK(64),
                            ^~~~~~~~~~~~~~~~
include/linux/dma-mapping.h:141:54: note: expanded from macro 'DMA_BIT_MASK'
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
The ones in orion shouldn't really be 64 bit masks, so changing them to what the driver can support avoids the warning.
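To see why clang complains, consider the macro as quoted in the warning. For `n == 64` the ternary selects `~0ULL`, but the other arm still contains `1ULL << 64` as a constant expression. Clang diagnoses that arm even though it is never evaluated. The sketch shows the macro's values for valid widths. It also shows a hypothetical function form that only shifts when the shift is valid. That alternative is for illustration only; the patch instead lowers these masks to 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* The macro as quoted in the warning above. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))

/* Hypothetical function form: the shift is only performed for n < 64,
 * so no out-of-range shift appears in a constant expression. */
static uint64_t dma_bit_mask(unsigned int n)
{
	return n >= 64 ? ~0ULL : (1ULL << n) - 1;
}
```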
Signed-off-by: Arnd Bergmann arnd@arndb.de
Signed-off-by: Olof Johansson olof@lixom.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/arm/plat-orion/common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm/plat-orion/common.c b/arch/arm/plat-orion/common.c
index 8861c367d061..51c3737ddba7 100644
--- a/arch/arm/plat-orion/common.c
+++ b/arch/arm/plat-orion/common.c
@@ -645,7 +645,7 @@ static struct platform_device orion_xor0_shared = {
 	.resource = orion_xor0_shared_resources,
 	.dev = {
 		.dma_mask = &orion_xor_dmamask,
-		.coherent_dma_mask = DMA_BIT_MASK(64),
+		.coherent_dma_mask = DMA_BIT_MASK(32),
 		.platform_data = &orion_xor0_pdata,
 	},
 };
@@ -706,7 +706,7 @@ static struct platform_device orion_xor1_shared = {
 	.resource = orion_xor1_shared_resources,
 	.dev = {
 		.dma_mask = &orion_xor_dmamask,
-		.coherent_dma_mask = DMA_BIT_MASK(64),
+		.coherent_dma_mask = DMA_BIT_MASK(32),
 		.platform_data = &orion_xor1_pdata,
 	},
 };
[ Upstream commit 2125801ccce19249708ca3245d48998e70569ab8 ]
clang warns about statically defined DMA masks from the DMA_BIT_MASK macro with length 64:
arch/arm/mach-iop13xx/setup.c:303:35: error: shift count >= width of type [-Werror,-Wshift-count-overflow]
static u64 iop13xx_adma_dmamask = DMA_BIT_MASK(64);
                                  ^~~~~~~~~~~~~~~~
include/linux/dma-mapping.h:141:54: note: expanded from macro 'DMA_BIT_MASK'
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
                                                     ^ ~~~
The ones in iop shouldn't really be 64 bit masks, so changing them to what the driver can support avoids the warning.
Signed-off-by: Arnd Bergmann arnd@arndb.de
Signed-off-by: Olof Johansson olof@lixom.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/arm/mach-iop13xx/setup.c |  8 ++++----
 arch/arm/mach-iop13xx/tpmi.c  | 10 +++++-----
 arch/arm/plat-iop/adma.c      |  6 +++---
 3 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/arm/mach-iop13xx/setup.c b/arch/arm/mach-iop13xx/setup.c
index 53c316f7301e..fe4932fda01d 100644
--- a/arch/arm/mach-iop13xx/setup.c
+++ b/arch/arm/mach-iop13xx/setup.c
@@ -300,7 +300,7 @@ static struct resource iop13xx_adma_2_resources[] = {
 	}
 };
 
-static u64 iop13xx_adma_dmamask = DMA_BIT_MASK(64);
+static u64 iop13xx_adma_dmamask = DMA_BIT_MASK(32);
 static struct iop_adma_platform_data iop13xx_adma_0_data = {
 	.hw_id = 0,
 	.pool_size = PAGE_SIZE,
@@ -324,7 +324,7 @@ static struct platform_device iop13xx_adma_0_channel = {
 	.resource = iop13xx_adma_0_resources,
 	.dev = {
 		.dma_mask = &iop13xx_adma_dmamask,
-		.coherent_dma_mask = DMA_BIT_MASK(64),
+		.coherent_dma_mask = DMA_BIT_MASK(32),
 		.platform_data = (void *) &iop13xx_adma_0_data,
 	},
 };
@@ -336,7 +336,7 @@ static struct platform_device iop13xx_adma_1_channel = {
 	.resource = iop13xx_adma_1_resources,
 	.dev = {
 		.dma_mask = &iop13xx_adma_dmamask,
-		.coherent_dma_mask = DMA_BIT_MASK(64),
+		.coherent_dma_mask = DMA_BIT_MASK(32),
 		.platform_data = (void *) &iop13xx_adma_1_data,
 	},
 };
@@ -348,7 +348,7 @@ static struct platform_device iop13xx_adma_2_channel = {
 	.resource = iop13xx_adma_2_resources,
 	.dev = {
 		.dma_mask = &iop13xx_adma_dmamask,
-		.coherent_dma_mask = DMA_BIT_MASK(64),
+		.coherent_dma_mask = DMA_BIT_MASK(32),
 		.platform_data = (void *) &iop13xx_adma_2_data,
 	},
 };
diff --git a/arch/arm/mach-iop13xx/tpmi.c b/arch/arm/mach-iop13xx/tpmi.c
index db511ec2b1df..116feb6b261e 100644
--- a/arch/arm/mach-iop13xx/tpmi.c
+++ b/arch/arm/mach-iop13xx/tpmi.c
@@ -152,7 +152,7 @@ static struct resource iop13xx_tpmi_3_resources[] = {
 	}
 };
 
-u64 iop13xx_tpmi_mask = DMA_BIT_MASK(64);
+u64 iop13xx_tpmi_mask = DMA_BIT_MASK(32);
 static struct platform_device iop13xx_tpmi_0_device = {
 	.name = "iop-tpmi",
 	.id = 0,
@@ -160,7 +160,7 @@ static struct platform_device iop13xx_tpmi_0_device = {
 	.resource = iop13xx_tpmi_0_resources,
 	.dev = {
 		.dma_mask = &iop13xx_tpmi_mask,
-		.coherent_dma_mask = DMA_BIT_MASK(64),
+		.coherent_dma_mask = DMA_BIT_MASK(32),
 	},
 };
 
@@ -171,7 +171,7 @@ static struct platform_device iop13xx_tpmi_1_device = {
 	.resource = iop13xx_tpmi_1_resources,
 	.dev = {
 		.dma_mask = &iop13xx_tpmi_mask,
-		.coherent_dma_mask = DMA_BIT_MASK(64),
+		.coherent_dma_mask = DMA_BIT_MASK(32),
 	},
 };
 
@@ -182,7 +182,7 @@ static struct platform_device iop13xx_tpmi_2_device = {
 	.resource = iop13xx_tpmi_2_resources,
 	.dev = {
 		.dma_mask = &iop13xx_tpmi_mask,
-		.coherent_dma_mask = DMA_BIT_MASK(64),
+		.coherent_dma_mask = DMA_BIT_MASK(32),
 	},
 };
 
@@ -193,7 +193,7 @@ static struct platform_device iop13xx_tpmi_3_device = {
 	.resource = iop13xx_tpmi_3_resources,
 	.dev = {
 		.dma_mask = &iop13xx_tpmi_mask,
-		.coherent_dma_mask = DMA_BIT_MASK(64),
+		.coherent_dma_mask = DMA_BIT_MASK(32),
 	},
 };
 
diff --git a/arch/arm/plat-iop/adma.c b/arch/arm/plat-iop/adma.c
index a4d1f8de3b5b..d9612221e484 100644
--- a/arch/arm/plat-iop/adma.c
+++ b/arch/arm/plat-iop/adma.c
@@ -143,7 +143,7 @@ struct platform_device iop3xx_dma_0_channel = {
 	.resource = iop3xx_dma_0_resources,
 	.dev = {
 		.dma_mask = &iop3xx_adma_dmamask,
-		.coherent_dma_mask = DMA_BIT_MASK(64),
+		.coherent_dma_mask = DMA_BIT_MASK(32),
 		.platform_data = (void *) &iop3xx_dma_0_data,
 	},
 };
@@ -155,7 +155,7 @@ struct platform_device iop3xx_dma_1_channel = {
 	.resource = iop3xx_dma_1_resources,
 	.dev = {
 		.dma_mask = &iop3xx_adma_dmamask,
-		.coherent_dma_mask = DMA_BIT_MASK(64),
+		.coherent_dma_mask = DMA_BIT_MASK(32),
 		.platform_data = (void *) &iop3xx_dma_1_data,
 	},
 };
@@ -167,7 +167,7 @@ struct platform_device iop3xx_aau_channel = {
 	.resource = iop3xx_aau_resources,
 	.dev = {
 		.dma_mask = &iop3xx_adma_dmamask,
-		.coherent_dma_mask = DMA_BIT_MASK(64),
+		.coherent_dma_mask = DMA_BIT_MASK(32),
 		.platform_data = (void *) &iop3xx_aau_data,
 	},
 };
commit c409ca3be3c6ff3a1eeb303b191184e80d412862 upstream.
Backport of the upstream commit, which fixed c6688ef9f297. c6688ef9f297 was backported as commit b6f826ba10dc with changes, because the unavailable function usb_endpoint_maxp_mult had to be replaced. The upstream commit removed the call to this function, so the backport is straightforward.
Original commit message:
Change the validation of number_of_packets in get_pipe to compare the number of packets to a fixed maximum number of packets allowed, set to be 1024. This number was chosen due to it being used by other drivers as well, for example drivers/usb/host/uhci-q.c
Background/reason: The get_pipe function in stub_rx.c validates the number of packets in isochronous mode and aborts with an error if that number is too large, in order to prevent malicious input from possibly triggering large memory allocations. This was previously done by checking whether pdu->u.cmd_submit.number_of_packets is bigger than the number of packets that would be needed for pdu->u.cmd_submit.transfer_buffer_length bytes if all except possibly the last packet had maximum length, given by usb_endpoint_maxp(epd) * usb_endpoint_maxp_mult(epd). This leads to an error if URBs with packets shorter than the maximum possible length are submitted, which is allowed according to Documentation/driver-api/usb/URB.rst and occurs for example with the snd-usb-audio driver.
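The difference between the two checks can be sketched as follows, simplified to a fixed maximum packet size. Take an isochronous URB of 8 packets of 125 bytes (1000 bytes total) on an endpoint with a 192-byte maximum. The old check allows at most ceil(1000 / 192) = 6 packets and wrongly rejects it; the fixed bound of 1024 accepts it. The helper names are hypothetical. The 1024 limit is the value the patch introduces.

```c
#include <assert.h>
#include <stdbool.h>

#define USBIP_MAX_ISO_PACKETS 1024 /* value introduced by the patch */

/* Old check (simplified): no more packets than needed to carry the
 * whole buffer if every packet had maximum length. */
static bool old_num_packets_ok(int number_of_packets,
			       unsigned int transfer_buffer_length,
			       unsigned int maxp)
{
	unsigned int packets = (transfer_buffer_length + maxp - 1) / maxp;

	return number_of_packets >= 0 &&
	       (unsigned int)number_of_packets <= packets;
}

/* New check: a fixed upper bound, independent of packet sizes. */
static bool new_num_packets_ok(int number_of_packets)
{
	return number_of_packets >= 0 &&
	       number_of_packets <= USBIP_MAX_ISO_PACKETS;
}
```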
Fixes: b6f826ba10dc ("usbip: fix stub_rx: harden CMD_SUBMIT path to handle malicious input")
Signed-off-by: Malte Leip malte@leip.net
Cc: stable stable@vger.kernel.org # 4.4.x
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/usb/usbip/stub_rx.c      | 18 +++---------------
 drivers/usb/usbip/usbip_common.h |  7 +++++++
 2 files changed, 10 insertions(+), 15 deletions(-)
diff --git a/drivers/usb/usbip/stub_rx.c b/drivers/usb/usbip/stub_rx.c
index 56cacb68040c..808e3a317954 100644
--- a/drivers/usb/usbip/stub_rx.c
+++ b/drivers/usb/usbip/stub_rx.c
@@ -380,22 +380,10 @@ static int get_pipe(struct stub_device *sdev, struct usbip_header *pdu)
 	}
 
 	if (usb_endpoint_xfer_isoc(epd)) {
-		/* validate packet size and number of packets */
-		unsigned int maxp, packets, bytes;
-
-#define USB_EP_MAXP_MULT_SHIFT 11
-#define USB_EP_MAXP_MULT_MASK (3 << USB_EP_MAXP_MULT_SHIFT)
-#define USB_EP_MAXP_MULT(m) \
-	(((m) & USB_EP_MAXP_MULT_MASK) >> USB_EP_MAXP_MULT_SHIFT)
-
-		maxp = usb_endpoint_maxp(epd);
-		maxp *= (USB_EP_MAXP_MULT(
-				__le16_to_cpu(epd->wMaxPacketSize)) + 1);
-		bytes = pdu->u.cmd_submit.transfer_buffer_length;
-		packets = DIV_ROUND_UP(bytes, maxp);
-
+		/* validate number of packets */
 		if (pdu->u.cmd_submit.number_of_packets < 0 ||
-		    pdu->u.cmd_submit.number_of_packets > packets) {
+		    pdu->u.cmd_submit.number_of_packets >
+		    USBIP_MAX_ISO_PACKETS) {
 			dev_err(&sdev->udev->dev,
 				"CMD_SUBMIT: isoc invalid num packets %d\n",
 				pdu->u.cmd_submit.number_of_packets);
diff --git a/drivers/usb/usbip/usbip_common.h b/drivers/usb/usbip/usbip_common.h
index 0fc5ace57c0e..af903aa4ad90 100644
--- a/drivers/usb/usbip/usbip_common.h
+++ b/drivers/usb/usbip/usbip_common.h
@@ -134,6 +134,13 @@ extern struct device_attribute dev_attr_usbip_debug;
 #define USBIP_DIR_OUT 0x00
 #define USBIP_DIR_IN 0x01
 
+/*
+ * Arbitrary limit for the maximum number of isochronous packets in an URB,
+ * compare for example the uhci_submit_isochronous function in
+ * drivers/usb/host/uhci-q.c
+ */
+#define USBIP_MAX_ISO_PACKETS 1024
+
 /**
  * struct usbip_header_basic - data pertinent to every request
  * @command: the usbip request type
From: Jeremy Fertic jeremyfertic@gmail.com
commit 10bfe7cc1739c22f0aa296b39e53f61e9e3f4d99 upstream.
With adt7516/7/9, internal vref is available for dacs a and b, dacs c and d, or all dacs. The driver doesn't currently support internal vref for all dacs. Change the else if to an if so both bits are checked rather than just one or the other.
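A sketch of the logic before and after the change, with assumed bit values (0x10 for DACs A/B, 0x20 for DACs C/D; check adt7316.c for the real ones). Bit 0 of the user input selects DACs A/B and bit 1 selects DACs C/D. With `else if`, input 3 (all DACs) only ever sets the A/B bit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit values for illustration only. */
#define SKETCH_DAC_AB_IN_VREF 0x10
#define SKETCH_DAC_CD_IN_VREF 0x20
#define SKETCH_DAC_IN_VREF_MASK (SKETCH_DAC_AB_IN_VREF | SKETCH_DAC_CD_IN_VREF)

static uint8_t vref_config_old(uint8_t ldac_config, uint8_t data)
{
	ldac_config &= (uint8_t)~SKETCH_DAC_IN_VREF_MASK;
	if (data & 0x1)
		ldac_config |= SKETCH_DAC_AB_IN_VREF;
	else if (data & 0x2) /* skipped whenever bit 0 is set */
		ldac_config |= SKETCH_DAC_CD_IN_VREF;
	return ldac_config;
}

static uint8_t vref_config_new(uint8_t ldac_config, uint8_t data)
{
	ldac_config &= (uint8_t)~SKETCH_DAC_IN_VREF_MASK;
	if (data & 0x1)
		ldac_config |= SKETCH_DAC_AB_IN_VREF;
	if (data & 0x2) /* checked independently of bit 0 */
		ldac_config |= SKETCH_DAC_CD_IN_VREF;
	return ldac_config;
}
```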
Signed-off-by: Jeremy Fertic jeremyfertic@gmail.com
Fixes: 35f6b6b86ede ("staging: iio: new ADT7316/7/8 and ADT7516/7/9 driver")
Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/staging/iio/addac/adt7316.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/staging/iio/addac/adt7316.c
+++ b/drivers/staging/iio/addac/adt7316.c
@@ -1092,7 +1092,7 @@ static ssize_t adt7316_store_DAC_interna
 		ldac_config = chip->ldac_config & (~ADT7516_DAC_IN_VREF_MASK);
 		if (data & 0x1)
 			ldac_config |= ADT7516_DAC_AB_IN_VREF;
-		else if (data & 0x2)
+		if (data & 0x2)
 			ldac_config |= ADT7516_DAC_CD_IN_VREF;
 	} else {
 		ret = kstrtou8(buf, 16, &data);
From: Jeremy Fertic jeremyfertic@gmail.com
commit 45130fb030aec26ac28b4bb23344901df3ec3b7f upstream.
The calculation of the current dac value is using the wrong bits of the dac lsb register. Create two macros to shift the lsb register value into lsb position, depending on whether the dac is 10 or 12 bit. Initialize data to 0 so, with an 8 bit dac, the msb register value can be bitwise ORed with data.
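The decoding can be sketched in userspace using the shift values from the patch. It assumes `offset = dac_bits - 8`, which is how this sketch models the driver's offset. The LSB register holds the low bits left-aligned, so they must be shifted down rather than masked. For a 12-bit code 0xABC stored as MSB 0xAB and LSB 0xC0, the old formula loses the low nibble and yields 0xAB0.

```c
#include <assert.h>
#include <stdint.h>

#define ADT7316_DA_10_BIT_LSB_SHIFT 6
#define ADT7316_DA_12_BIT_LSB_SHIFT 4

/* Old calculation: masks the low bits of lsb, which are the wrong bits. */
static uint16_t dac_decode_old(unsigned int dac_bits, uint8_t msb, uint8_t lsb)
{
	unsigned int offset = dac_bits - 8; /* assumption, see lead-in */

	return (uint16_t)((msb << offset) + (lsb & ((1 << offset) - 1)));
}

/* Fixed calculation, following the patched adt7316_show_DAC(). */
static uint16_t dac_decode(unsigned int dac_bits, uint8_t msb, uint8_t lsb)
{
	unsigned int offset = dac_bits - 8; /* assumption, see lead-in */
	uint16_t data = 0; /* 8-bit DACs use the MSB register only */

	if (dac_bits == 12)
		data = lsb >> ADT7316_DA_12_BIT_LSB_SHIFT;
	else if (dac_bits == 10)
		data = lsb >> ADT7316_DA_10_BIT_LSB_SHIFT;
	data |= (uint16_t)(msb << offset);
	return data;
}
```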
Fixes: 35f6b6b86ede ("staging: iio: new ADT7316/7/8 and ADT7516/7/9 driver")
Signed-off-by: Jeremy Fertic jeremyfertic@gmail.com
Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/staging/iio/addac/adt7316.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/drivers/staging/iio/addac/adt7316.c
+++ b/drivers/staging/iio/addac/adt7316.c
@@ -47,6 +47,8 @@
 #define ADT7516_MSB_AIN3 0xA
 #define ADT7516_MSB_AIN4 0xB
 #define ADT7316_DA_DATA_BASE 0x10
+#define ADT7316_DA_10_BIT_LSB_SHIFT 6
+#define ADT7316_DA_12_BIT_LSB_SHIFT 4
 #define ADT7316_DA_MSB_DATA_REGS 4
 #define ADT7316_LSB_DAC_A 0x10
 #define ADT7316_MSB_DAC_A 0x11
@@ -1414,7 +1416,7 @@ static IIO_DEVICE_ATTR(ex_analog_temp_of
 static ssize_t adt7316_show_DAC(struct adt7316_chip_info *chip,
 		int channel, char *buf)
 {
-	u16 data;
+	u16 data = 0;
 	u8 msb, lsb, offset;
 	int ret;
 
@@ -1439,7 +1441,11 @@ static ssize_t adt7316_show_DAC(struct a
 	if (ret)
 		return -EIO;
 
-	data = (msb << offset) + (lsb & ((1 << offset) - 1));
+	if (chip->dac_bits == 12)
+		data = lsb >> ADT7316_DA_12_BIT_LSB_SHIFT;
+	else if (chip->dac_bits == 10)
+		data = lsb >> ADT7316_DA_10_BIT_LSB_SHIFT;
+	data |= msb << offset;
 
 	return sprintf(buf, "%d\n", data);
 }
From: Jeremy Fertic jeremyfertic@gmail.com
commit 78accaea117c1ae878774974fab91ac4a0b0e2b0 upstream.
The lsb calculation is not masking the correct bits from the user input. Subtract 1 from (1 << offset) to correctly set up the mask to be applied to user input.
The lsb register stores its value starting at the bit 7 position. adt7316_store_DAC() currently assumes the value is at the other end of the register. Shift the lsb value before storing it in a new variable lsb_reg, and write this variable to the lsb register.
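The write side is the inverse of the decoding. The sketch masks the low `offset` bits with `(1 << offset) - 1`, then shifts them up so they start at bit 7. It assumes `offset = dac_bits - 8` and `msb = data >> offset`; both are assumptions about the surrounding driver code, not shown in the diff. `struct dac_regs` and `dac_encode()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define ADT7316_DA_10_BIT_LSB_SHIFT 6
#define ADT7316_DA_12_BIT_LSB_SHIFT 4

struct dac_regs {
	uint8_t lsb_reg; /* value written to the LSB register */
	uint8_t msb;     /* value written to the MSB register */
};

/* Splits a 10- or 12-bit DAC code into register values, following the
 * patched adt7316_store_DAC(). */
static struct dac_regs dac_encode(unsigned int dac_bits, uint16_t data)
{
	unsigned int offset;
	uint8_t lsb;
	struct dac_regs r;

	assert(dac_bits == 10 || dac_bits == 12);
	offset = dac_bits - 8;
	lsb = (uint8_t)(data & ((1u << offset) - 1)); /* low `offset` bits */

	if (dac_bits == 12)
		r.lsb_reg = (uint8_t)(lsb << ADT7316_DA_12_BIT_LSB_SHIFT);
	else
		r.lsb_reg = (uint8_t)(lsb << ADT7316_DA_10_BIT_LSB_SHIFT);
	r.msb = (uint8_t)(data >> offset);
	return r;
}
```

Encoding 0xABC for a 12-bit DAC gives LSB register 0xC0 and MSB 0xAB, which the fixed read path decodes back to 0xABC.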
Fixes: 35f6b6b86ede ("staging: iio: new ADT7316/7/8 and ADT7516/7/9 driver")
Signed-off-by: Jeremy Fertic jeremyfertic@gmail.com
Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/staging/iio/addac/adt7316.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/drivers/staging/iio/addac/adt7316.c
+++ b/drivers/staging/iio/addac/adt7316.c
@@ -1453,7 +1453,7 @@ static ssize_t adt7316_show_DAC(struct a
 static ssize_t adt7316_store_DAC(struct adt7316_chip_info *chip,
 		int channel, const char *buf, size_t len)
 {
-	u8 msb, lsb, offset;
+	u8 msb, lsb, lsb_reg, offset;
 	u16 data;
 	int ret;
 
@@ -1471,9 +1471,13 @@ static ssize_t adt7316_store_DAC(struct
 		return -EINVAL;
 
 	if (chip->dac_bits > 8) {
-		lsb = data & (1 << offset);
+		lsb = data & ((1 << offset) - 1);
+		if (chip->dac_bits == 12)
+			lsb_reg = lsb << ADT7316_DA_12_BIT_LSB_SHIFT;
+		else
+			lsb_reg = lsb << ADT7316_DA_10_BIT_LSB_SHIFT;
 		ret = chip->bus.write(chip->bus.client,
-			ADT7316_DA_DATA_BASE + channel * 2, lsb);
+			ADT7316_DA_DATA_BASE + channel * 2, lsb_reg);
 		if (ret)
 			return -EIO;
 	}
From: Anson Huang anson.huang@nxp.com
commit bf2a7ca39fd3ab47ef71c621a7ee69d1813b1f97 upstream.
The SNVS IRQ is requested before the necessary driver data is initialized. If an IRQ is pending during the driver probe phase, a kernel NULL pointer panic will occur in the IRQ handler. To avoid this scenario, initialize the necessary driver data before enabling the IRQ. This patch is inspired by NXP's internal kernel tree.
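The ordering problem can be reproduced in a small single-threaded model. Everything below is hypothetical and not a kernel API: fake_request_irq() runs the handler immediately when an interrupt is already pending, which is the window the patch closes by initialising pdata->input before requesting the IRQ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's private data and input device. */
struct input_dev { int events; };
struct pwrkey_data { struct input_dev *input; };

static int pwrkey_irq_handler(struct pwrkey_data *pdata)
{
	if (pdata->input == NULL)
		return -1;	/* in the kernel this is a NULL dereference */
	pdata->input->events++;
	return 0;
}

/* Model of requesting an IRQ: a pending interrupt fires before we return. */
static int fake_request_irq(struct pwrkey_data *pdata, int irq_pending)
{
	if (irq_pending)
		return pwrkey_irq_handler(pdata);
	return 0;
}

/* init_first = 1 is the fixed order, init_first = 0 the original one. */
static int probe(struct pwrkey_data *pdata, struct input_dev *input,
		 int irq_pending, int init_first)
{
	int ret;

	if (init_first)
		pdata->input = input;
	ret = fake_request_irq(pdata, irq_pending);
	if (!init_first)
		pdata->input = input;	/* too late for a pending IRQ */
	return ret;
}
```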
Fixes: d3dc6e232215 ("input: keyboard: imx: add snvs power key driver")
Signed-off-by: Anson Huang Anson.Huang@nxp.com
Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/input/keyboard/snvs_pwrkey.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/input/keyboard/snvs_pwrkey.c
+++ b/drivers/input/keyboard/snvs_pwrkey.c
@@ -156,6 +156,9 @@ static int imx_snvs_pwrkey_probe(struct
 		return error;
 	}
 
+	pdata->input = input;
+	platform_set_drvdata(pdev, pdata);
+
 	error = devm_request_irq(&pdev->dev, pdata->irq,
 			       imx_snvs_pwrkey_interrupt,
 			       0, pdev->name, pdev);
@@ -172,9 +175,6 @@ static int imx_snvs_pwrkey_probe(struct
 		return error;
 	}
 
-	pdata->input = input;
-	platform_set_drvdata(pdev, pdata);
-
 	device_init_wakeup(&pdev->dev, pdata->wakeup);
 
 	return 0;
From: Ondrej Mosnacek omosnace@redhat.com
commit a83d6ddaebe541570291205cb538e35ad4ff94f9 upstream.
In the SECURITY_FS_USE_MNTPOINT case we never want to allow relabeling files/directories, so we should never set the SBLABEL_MNT flag. The 'special handling' in selinux_is_sblabel_mnt() is only intended for when the behavior is set to SECURITY_FS_USE_GENFS.
While there, make the logic in selinux_is_sblabel_mnt() more explicit and add a BUILD_BUG_ON() to make sure that introducing a new SECURITY_FS_USE_* forces a review of the logic.
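The shape of the new check can be sketched in plain C11, with _Static_assert standing in for BUILD_BUG_ON(). The enum below is an illustrative stand-in for the SECURITY_FS_USE_* constants (its numeric values are chosen for the sketch), and the helpers are simplified: they take the filesystem type name directly instead of a super_block.

```c
#include <assert.h>
#include <string.h>

/* Illustrative stand-ins for SECURITY_FS_USE_*; values chosen for the sketch. */
enum fs_use {
	FS_USE_XATTR = 1, FS_USE_TRANS, FS_USE_TASK, FS_USE_GENFS,
	FS_USE_NONE, FS_USE_MNTPOINT, FS_USE_NATIVE,
	FS_USE_MAX = FS_USE_NATIVE
};

/* C11 analogue of BUILD_BUG_ON(): adding a behaviour breaks the build here. */
_Static_assert(FS_USE_MAX == 7, "review is_sblabel_mnt() for new FS_USE_* values");

static int is_genfs_special(const char *fstype)
{
	return !strcmp(fstype, "sysfs") || !strcmp(fstype, "pstore") ||
	       !strcmp(fstype, "debugfs") || !strcmp(fstype, "rootfs");
}

static int is_sblabel_mnt(enum fs_use behavior, const char *fstype)
{
	switch (behavior) {
	case FS_USE_XATTR:
	case FS_USE_TRANS:
	case FS_USE_TASK:
	case FS_USE_NATIVE:
		return 1;
	case FS_USE_GENFS:
		/* the name-based special case applies to genfs only */
		return is_genfs_special(fstype);
	case FS_USE_MNTPOINT:	/* context mounts: never allow relabeling */
	case FS_USE_NONE:
	default:
		return 0;
	}
}
```

For example, is_sblabel_mnt(FS_USE_MNTPOINT, "debugfs") now returns 0, whereas the old || chain returned 1 for any mount whose type name matched, regardless of its behaviour.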
Fixes: d5f3a5f6e7e7 ("selinux: add security in-core xattr support for pstore and debugfs")
Signed-off-by: Ondrej Mosnacek omosnace@redhat.com
Reviewed-by: Stephen Smalley sds@tycho.nsa.gov
Signed-off-by: Paul Moore paul@paul-moore.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 security/selinux/hooks.c | 40 +++++++++++++++++++++++++++++++---------
 1 file changed, 31 insertions(+), 9 deletions(-)

--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -396,21 +396,43 @@ static int may_context_mount_inode_relab
 	return rc;
 }
 
-static int selinux_is_sblabel_mnt(struct super_block *sb)
+static int selinux_is_genfs_special_handling(struct super_block *sb)
 {
-	struct superblock_security_struct *sbsec = sb->s_security;
-
-	return sbsec->behavior == SECURITY_FS_USE_XATTR ||
-		sbsec->behavior == SECURITY_FS_USE_TRANS ||
-		sbsec->behavior == SECURITY_FS_USE_TASK ||
-		sbsec->behavior == SECURITY_FS_USE_NATIVE ||
-		/* Special handling. Genfs but also in-core setxattr handler */
-		!strcmp(sb->s_type->name, "sysfs") ||
+	/* Special handling. Genfs but also in-core setxattr handler */
+	return !strcmp(sb->s_type->name, "sysfs") ||
 		!strcmp(sb->s_type->name, "pstore") ||
 		!strcmp(sb->s_type->name, "debugfs") ||
 		!strcmp(sb->s_type->name, "rootfs");
 }
 
+static int selinux_is_sblabel_mnt(struct super_block *sb)
+{
+	struct superblock_security_struct *sbsec = sb->s_security;
+
+	/*
+	 * IMPORTANT: Double-check logic in this function when adding a new
+	 * SECURITY_FS_USE_* definition!
+	 */
+	BUILD_BUG_ON(SECURITY_FS_USE_MAX != 7);
+
+	switch (sbsec->behavior) {
+	case SECURITY_FS_USE_XATTR:
+	case SECURITY_FS_USE_TRANS:
+	case SECURITY_FS_USE_TASK:
+	case SECURITY_FS_USE_NATIVE:
+		return 1;
+
+	case SECURITY_FS_USE_GENFS:
+		return selinux_is_genfs_special_handling(sb);
+
+	/* Never allow relabeling on context mounts */
+	case SECURITY_FS_USE_MNTPOINT:
+	case SECURITY_FS_USE_NONE:
+	default:
+		return 0;
+	}
+}
+
 static int sb_finish_set_opts(struct super_block *sb)
 {
 	struct superblock_security_struct *sbsec = sb->s_security;
From: Tony Luck tony.luck@intel.com
commit 41f035a86b5b72a4f947c38e94239d20d595352a upstream.
In
c7d606f560e4 ("x86/mce: Improve error message when kernel cannot recover")
a case was added for a machine check caused by a DATA access to poison memory from the kernel. A case should have been added also for an uncorrectable error during an instruction fetch in the kernel.
Add that extra case so the error message now reads:
mce: [Hardware Error]: Machine check: Instruction fetch error in kernel
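Conceptually, the severity table is scanned in order and the first entry whose mask/result pair and context both match wins. The sketch below models that lookup with a made-up status bit layout (the ST_* values are not the real MCi_STATUS encoding) and a placeholder message for the user-mode entry; only the kernel-mode message is taken from the patch. It shows why a kernel-mode instruction fetch error needs its own KERNEL entry rather than matching the USER one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Made-up status bits for this sketch; NOT the real MCi_STATUS layout. */
#define ST_OVER   (1u << 0)
#define ST_UC_SAR (1u << 1)
#define ST_ADDR   (1u << 2)
#define ST_INSTR  (1u << 3)	/* stands in for the MCACOD_INSTR code */
#define ST_MCACOD ST_INSTR	/* stands in for the MCACOD field mask */

enum ctx { CTX_ANY, CTX_USER, CTX_KERNEL };

struct severity {
	uint32_t mask, result;	/* matches when (status & mask) == result */
	enum ctx context;	/* CTX_ANY matches either context */
	const char *msg;
};

static const struct severity table[] = {
	{ ST_OVER | ST_UC_SAR | ST_ADDR | ST_MCACOD,
	  ST_UC_SAR | ST_ADDR | ST_INSTR, CTX_USER,
	  "Instruction fetch error (user, placeholder)" },
	/* The entry added by the patch: same bit match, kernel context. */
	{ ST_OVER | ST_UC_SAR | ST_ADDR | ST_MCACOD,
	  ST_UC_SAR | ST_ADDR | ST_INSTR, CTX_KERNEL,
	  "Instruction fetch error in kernel" },
	{ ST_UC_SAR, ST_UC_SAR, CTX_ANY, "Action required: unknown MCACOD" },
};

/* Scan in order; the first entry whose bits and context match wins. */
static const char *grade(uint32_t status, enum ctx ctx)
{
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		const struct severity *s = &table[i];

		if ((status & s->mask) != s->result)
			continue;
		if (s->context != CTX_ANY && s->context != ctx)
			continue;
		return s->msg;
	}
	return "no match";
}
```

In this model, removing the second table entry would make grade(ST_UC_SAR | ST_ADDR | ST_INSTR, CTX_KERNEL) fall through to the generic "Action required: unknown MCACOD" entry.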
Fixes: c7d606f560e4 ("x86/mce: Improve error message when kernel cannot recover")
Signed-off-by: Tony Luck tony.luck@intel.com
Signed-off-by: Borislav Petkov bp@suse.de
Cc: "H. Peter Anvin" hpa@zytor.com
Cc: Ingo Molnar mingo@redhat.com
Cc: Pu Wen puwen@hygon.cn
Cc: Thomas Gleixner tglx@linutronix.de
Cc: x86-ml x86@kernel.org
Link: https://lkml.kernel.org/r/20190225205940.15226-1-tony.luck@intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 5 +++++
 1 file changed, 5 insertions(+)

--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -132,6 +132,11 @@ static struct severity {
 		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR),
 		USER
 		),
+	MCESEV(
+		PANIC, "Instruction fetch error in kernel",
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR),
+		KERNEL
+		),
 #endif
 	MCESEV(
 		PANIC, "Action required: unknown MCACOD",
From: Jacopo Mondi jacopo+renesas@jmondi.org
commit 61da76beef1e4f0b6ba7be4f8d0cf0dac7ce1f55 upstream.
The following commits:
commit f6dd927f34d6 ("[media] media: ov7670: calculate framerate properly for ov7675")
commit 04ee6d92047e ("[media] media: ov7670: add possibility to bypass pll for ov7675")
introduced the ability to bypass the PLL multiplier and use the input clock (xvclk) as the pixel clock output frequency for the ov7675 sensor.
The PLL is bypassed using register DBLV[7:6], according to the ov7670 and ov7675 sensor manuals. The macros used to set the DBLV register in the driver appear to be wrong, as their values do not match what is reported in the datasheets.
Fix this by changing the DBLV_* macros to use bits [7:6] and setting bits [3:0] to their default reserved value 0x0a (according to the datasheets).
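The corrected macro values follow directly from that layout. As a quick check, the sketch below composes DBLV from a two-bit PLL selector in bits [7:6] and the reserved default 0x0a in bits [3:0]; dblv_value() and the enum are hypothetical helpers, with the selector-to-multiplier mapping read off the new DBLV_* values in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DBLV_RESERVED_DEFAULT	0x0a	/* bits [3:0], per the datasheets */
#define DBLV_PLL_SHIFT		6	/* PLL control lives in bits [7:6] */

/* Selector values inferred from the new DBLV_* macros in the patch. */
enum dblv_pll { PLL_BYPASS = 0, PLL_X4 = 1, PLL_X6 = 2, PLL_X8 = 3 };

/* Hypothetical helper: build the DBLV register value for a PLL mode. */
static uint8_t dblv_value(enum dblv_pll mode)
{
	return (uint8_t)((mode << DBLV_PLL_SHIFT) | DBLV_RESERVED_DEFAULT);
}
```

The four modes yield 0x0a, 0x4a, 0x8a and 0xca, matching the new macros. The old values (0x00, 0x01, 0x10, 0x11) appear to be the binary selector patterns written as hex digits in the low bits.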
While at it, remove a write to the DBLV register in "ov7675_set_framerate()" that overwrote the previous write to the same register: the earlier write takes the "info->pll_bypass" flag into account, whereas the removed one set the PLL multiplier to 4x unconditionally.
Also, since "info->pll_bypass" is only used in the set/get_framerate() functions, which only ov7675 uses, there is no need to check the device id at probe time to make sure "info->pll_bypass" is false when using ov7670.
Fixes: f6dd927f34d6 ("[media] media: ov7670: calculate framerate properly for ov7675")
Signed-off-by: Jacopo Mondi jacopo+renesas@jmondi.org
Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com
Signed-off-by: Mauro Carvalho Chehab mchehab+samsung@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/media/i2c/ov7670.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

--- a/drivers/media/i2c/ov7670.c
+++ b/drivers/media/i2c/ov7670.c
@@ -155,10 +155,10 @@ MODULE_PARM_DESC(debug, "Debug level (0-
 #define REG_GFIX	0x69	/* Fix gain control */
 
 #define REG_DBLV	0x6b	/* PLL control an debugging */
-#define   DBLV_BYPASS	  0x00	  /* Bypass PLL */
-#define   DBLV_X4	  0x01	  /* clock x4 */
-#define   DBLV_X6	  0x10	  /* clock x6 */
-#define   DBLV_X8	  0x11	  /* clock x8 */
+#define   DBLV_BYPASS	  0x0a	  /* Bypass PLL */
+#define   DBLV_X4	  0x4a	  /* clock x4 */
+#define   DBLV_X6	  0x8a	  /* clock x6 */
+#define   DBLV_X8	  0xca	  /* clock x8 */
 
 #define REG_REG76	0x76	/* OV's name */
 #define   R76_BLKPCOR	  0x80	  /* Black pixel correction enable */
@@ -833,7 +833,7 @@ static int ov7675_set_framerate(struct v
 	if (ret < 0)
 		return ret;
 
-	return ov7670_write(sd, REG_DBLV, DBLV_X4);
+	return 0;
 }
 
 static void ov7670_get_framerate_legacy(struct v4l2_subdev *sd,
@@ -1578,11 +1578,7 @@ static int ov7670_probe(struct i2c_clien
 		if (config->clock_speed)
 			info->clock_speed = config->clock_speed;
 
-		/*
-		 * It should be allowed for ov7670 too when it is migrated to
-		 * the new frame rate formula.
-		 */
-		if (config->pll_bypass && id->driver_data != MODEL_OV7670)
+		if (config->pll_bypass)
 			info->pll_bypass = true;
 
 		if (config->pclk_hb_disable)
From: Jason Yan yanaijie@huawei.com
commit b90cd6f2b905905fb42671009dc0e27c310a16ae upstream.
When the LLDD is processing a completed SAS task in interrupt context and sets the task state to SAS_TASK_STATE_DONE, the SMP timeout timer can fire at the same time. smp_task_timedout() then completes the task whether or not SAS_TASK_STATE_DONE is set, so the SAS task may be freed before the LLDD finishes its interrupt processing, resulting in a use-after-free.
Fix this by calling complete() only when SAS_TASK_STATE_DONE is not set, and remove the check of the return value of del_timer(). Once the LLDD sets DONE, it must call task->done(), which calls smp_task_done()->complete(), so the task is completed and freed correctly.
Reported-by: chenxiang chenxiang66@hisilicon.com
Signed-off-by: Jason Yan yanaijie@huawei.com
CC: John Garry john.garry@huawei.com
CC: Johannes Thumshirn jthumshirn@suse.de
CC: Ewan Milne emilne@redhat.com
CC: Christoph Hellwig hch@lst.de
CC: Tomas Henzl thenzl@redhat.com
CC: Dan Williams dan.j.williams@intel.com
CC: Hannes Reinecke hare@suse.com
Reviewed-by: Hannes Reinecke hare@suse.com
Reviewed-by: John Garry john.garry@huawei.com
Reviewed-by: Johannes Thumshirn jthumshirn@suse.de
Signed-off-by: Martin K. Petersen martin.petersen@oracle.com
Cc: Guenter Roeck linux@roeck-us.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/scsi/libsas/sas_expander.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -47,17 +47,16 @@ static void smp_task_timedout(unsigned l
 	unsigned long flags;
 
 	spin_lock_irqsave(&task->task_state_lock, flags);
-	if (!(task->task_state_flags & SAS_TASK_STATE_DONE))
+	if (!(task->task_state_flags & SAS_TASK_STATE_DONE)) {
 		task->task_state_flags |= SAS_TASK_STATE_ABORTED;
+		complete(&task->slow_task->completion);
+	}
 	spin_unlock_irqrestore(&task->task_state_lock, flags);
-
-	complete(&task->slow_task->completion);
 }
 
 static void smp_task_done(struct sas_task *task)
 {
-	if (!del_timer(&task->slow_task->timer))
-		return;
+	del_timer(&task->slow_task->timer);
 	complete(&task->slow_task->completion);
 }
[ Upstream commit 570f18b6a8d1f0e60e8caf30e66161b6438dcc91 ]
On HDaudio platforms, if playback is started when capture is working, there is no audible output.
This can be root-caused to the use of the rx|tx_mask to store an HDaudio stream tag.
If capture is started before playback, rx_mask is non-zero on HDaudio platforms. The channel count for playback, which shares the same codec DAI with capture, is then first set by soc_pcm_codec_params_fixup() based on tx_mask and finally overwritten by the same function based on rx_mask.
According to the author of tx|rx_mask, tx_mask is for playback and rx_mask is for capture. The stream direction is checked at every other reference to tx|rx_mask in ASoC, so the missing check here is an error. This patch checks the stream direction before applying tx|rx_mask in the fixup.
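A reduced model of the fixed logic is shown below. It assumes, as our reading of soc_pcm_codec_params_fixup(), that the fixup pins the channel count to the number of bits set in the slot mask; the enum, struct and function names are simplified stand-ins, and __builtin_popcount() (GCC/Clang) plays the role of hweight_long().

```c
#include <assert.h>

/* Simplified stand-ins for SNDRV_PCM_STREAM_* and the codec DAI masks. */
enum stream_dir { STREAM_PLAYBACK, STREAM_CAPTURE };

struct dai_masks { unsigned int tx_mask, rx_mask; };

/* Channel count after the TDM fixup; unchanged if no mask applies. */
static int fixup_channels(enum stream_dir dir, const struct dai_masks *dai,
			  int channels)
{
	/* Only the mask matching the stream direction is consulted. */
	if (dir == STREAM_PLAYBACK && dai->tx_mask)
		channels = __builtin_popcount(dai->tx_mask);
	if (dir == STREAM_CAPTURE && dai->rx_mask)
		channels = __builtin_popcount(dai->rx_mask);
	return channels;
}
```

With tx_mask = 0x3 and rx_mask = 0xf, playback gets 2 channels and capture gets 4; without the direction checks, playback would have ended up with 4 because the rx_mask fixup ran last.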
This issue affects not only HDaudio+ASoC but also I2S codecs whenever the channel count derived from rx_mask differs from the one derived from tx_mask. It is rarely reproduced because most in-kernel drivers either set the same channel count in tx_mask and rx_mask or leave rx_mask at zero.
Tested on all platforms using stream_tag & HDaudio, and on Intel I2S platforms.
Signed-off-by: Rander Wang rander.wang@linux.intel.com
Acked-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 sound/soc/soc-pcm.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
index f99eb8f442829..1c0d44c86c018 100644
--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -882,10 +882,13 @@ static int soc_pcm_hw_params(struct snd_pcm_substream *substream,
 		codec_params = *params;
 
 		/* fixup params based on TDM slot masks */
-		if (codec_dai->tx_mask)
+		if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK &&
+		    codec_dai->tx_mask)
 			soc_pcm_codec_params_fixup(&codec_params,
 						   codec_dai->tx_mask);
-		if (codec_dai->rx_mask)
+
+		if (substream->stream == SNDRV_PCM_STREAM_CAPTURE &&
+		    codec_dai->rx_mask)
 			soc_pcm_codec_params_fixup(&codec_params,
 						   codec_dai->rx_mask);
[ Upstream commit f0f2338a9cfaf71db895fa989ea7234e8a9b471d ]
The CS4270 does not increment the register address on consecutive writes by default. During normal operation this doesn't matter, as all register accesses are done individually. At resume time after suspend, however, the regcache code gathers the biggest possible block of registers to sync and sends them in one go.
To fix this, set the INCR bit in all cases.
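On the wire, regmap's write_flag_mask is ORed into the register address byte of each write. The sketch below builds the I2C payload for a block write to show where the INCR bit lands; build_block_write() is hypothetical, and the value 0x80 for the INCR bit is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define I2C_INCR 0x80	/* assumed value of CS4270_I2C_INCR for this sketch */

/* Hypothetical helper: lay out a block write of n values starting at 'reg'.
 * The flag mask is ORed into the register byte, as write_flag_mask does. */
static size_t build_block_write(uint8_t reg, const uint8_t *vals, size_t n,
				uint8_t write_flag_mask, uint8_t *out)
{
	out[0] = reg | write_flag_mask;
	for (size_t i = 0; i < n; i++)
		out[1 + i] = vals[i];
	return n + 1;
}
```

With the flag set, a three-register sync starting at register 0x02 goes out as 0x82 followed by the three values, so the codec advances its address pointer after each byte instead of writing every value to register 0x02.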
Signed-off-by: Daniel Mack daniel@zonque.org
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 sound/soc/codecs/cs4270.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/sound/soc/codecs/cs4270.c b/sound/soc/codecs/cs4270.c
index 3670086b9227c..f273533c66535 100644
--- a/sound/soc/codecs/cs4270.c
+++ b/sound/soc/codecs/cs4270.c
@@ -641,6 +641,7 @@ static const struct regmap_config cs4270_regmap = {
 	.reg_defaults =		cs4270_reg_defaults,
 	.num_reg_defaults =	ARRAY_SIZE(cs4270_reg_defaults),
 	.cache_type =		REGCACHE_RBTREE,
+	.write_flag_mask =	CS4270_I2C_INCR,
 
 	.readable_reg =		cs4270_reg_is_readable,
 	.volatile_reg =		cs4270_reg_is_volatile,
[ Upstream commit c63adb28f6d913310430f14c69f0a2ea55eed0cc ]
The common pins were mistakenly not added to the DAPM graph. Adding these pins will allow valid graphs to be created.
Signed-off-by: Annaliese McDermond nh6z@nh6z.net
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 sound/soc/codecs/tlv320aic32x4.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/sound/soc/codecs/tlv320aic32x4.c b/sound/soc/codecs/tlv320aic32x4.c
index f2d3191961e14..714bd0e3fc71e 100644
--- a/sound/soc/codecs/tlv320aic32x4.c
+++ b/sound/soc/codecs/tlv320aic32x4.c
@@ -234,6 +234,8 @@ static const struct snd_soc_dapm_widget aic32x4_dapm_widgets[] = {
 	SND_SOC_DAPM_INPUT("IN2_R"),
 	SND_SOC_DAPM_INPUT("IN3_L"),
 	SND_SOC_DAPM_INPUT("IN3_R"),
+	SND_SOC_DAPM_INPUT("CM_L"),
+	SND_SOC_DAPM_INPUT("CM_R"),
 };
 
 static const struct snd_soc_dapm_route aic32x4_dapm_routes[] = {
[ Upstream commit 583feb08e7f7ac9d533b446882eb3a54737a6dbb ]
When an event is programmed with attr.wakeup_events=N (N>0), it means the caller is interested in getting a user level notification after N samples have been recorded in the kernel sampling buffer.
With precise events on Intel processors, the kernel uses PEBS. To minimize sampling overhead, the kernel checks whether the event configuration is compatible with multi-entry PEBS mode. If so, the kernel is notified only when the buffer has reached its threshold. Otherwise, PEBS operates in single-entry mode and the kernel is notified for each PEBS sample.
The problem is that the current implementation looks at the frequency mode and the event sample_type but ignores the wakeup_events field. Thus, it may not be possible to receive a notification after each precise event.
This patch fixes this problem by disabling multi-entry PEBS if wakeup_events is non-zero.
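The patched condition can be read as a simple predicate. In the sketch below, struct attr_subset is a hypothetical subset of struct perf_event_attr, and the function mirrors the outer test that gates PERF_X86_EVENT_AUTO_RELOAD and, nested inside it, the check for free-running (multi-entry) PEBS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of perf_event_attr fields relevant to the check. */
struct attr_subset {
	unsigned int precise_ip;
	bool freq;
	unsigned int wakeup_events;
};

/* Mirrors the patched test: auto-reload, and therefore multi-entry PEBS,
 * is only considered when neither frequency mode nor a wakeup_events
 * threshold has been requested. */
static bool may_use_auto_reload(const struct attr_subset *a)
{
	return a->precise_ip && !(a->freq || a->wakeup_events);
}
```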
Signed-off-by: Stephane Eranian eranian@google.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Reviewed-by: Andi Kleen ak@linux.intel.com
Cc: Alexander Shishkin alexander.shishkin@linux.intel.com
Cc: Arnaldo Carvalho de Melo acme@redhat.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Vince Weaver vincent.weaver@maine.edu
Cc: kan.liang@intel.com
Link: https://lkml.kernel.org/r/20190306195048.189514-1-eranian@google.com
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/x86/kernel/cpu/perf_event_intel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 7b79c80ce029a..325ed90511cff 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2513,7 +2513,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
 		return ret;
 
 	if (event->attr.precise_ip) {
-		if (!event->attr.freq) {
+		if (!(event->attr.freq || event->attr.wakeup_events)) {
 			event->hw.flags |= PERF_X86_EVENT_AUTO_RELOAD;
 			if (!(event->attr.sample_type &
 			      ~intel_pmu_free_running_flags(event)))
[ Upstream commit 5c2442fd78998af60e13aba506d103f7f43f8701 ]
If the SCSI command's sglist is not suitable for DDP, the csiostor driver uses preallocated buffers for DDP. Because of this, the data must be copied from the DDP buffer to the SCSI command's sglist before calling ->scsi_done().
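The copy itself is an ordinary contiguous-buffer-to-scatterlist loop. The following is a generic sketch of that operation, not the body of csio_scsi_copy_to_sgl() (which this excerpt does not show); struct sg_elem and copy_to_sgl() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scatter-gather element: a buffer and its length. */
struct sg_elem { unsigned char *buf; size_t len; };

/* Copy 'len' bytes from a contiguous bounce buffer into an SG list.
 * Returns the number of bytes copied (short if the list is too small). */
static size_t copy_to_sgl(const unsigned char *src, size_t len,
			  struct sg_elem *sgl, size_t nents)
{
	size_t done = 0;

	for (size_t i = 0; i < nents && done < len; i++) {
		size_t chunk = sgl[i].len;

		if (chunk > len - done)
			chunk = len - done;	/* last piece may be partial */
		memcpy(sgl[i].buf, src + done, chunk);
		done += chunk;
	}
	return done;
}
```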
Signed-off-by: Varun Prakash varun@chelsio.com
Signed-off-by: Martin K. Petersen martin.petersen@oracle.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/scsi/csiostor/csio_scsi.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/csiostor/csio_scsi.c b/drivers/scsi/csiostor/csio_scsi.c
index c2a6f9f294271..ddbdaade654d6 100644
--- a/drivers/scsi/csiostor/csio_scsi.c
+++ b/drivers/scsi/csiostor/csio_scsi.c
@@ -1713,8 +1713,11 @@ csio_scsi_err_handler(struct csio_hw *hw, struct csio_ioreq *req)
 	}
 
 out:
-	if (req->nsge > 0)
+	if (req->nsge > 0) {
 		scsi_dma_unmap(cmnd);
+		if (req->dcopy && (host_status == DID_OK))
+			host_status = csio_scsi_copy_to_sgl(hw, req);
+	}
 
 	cmnd->result = (((host_status) << 16) | scsi_status);
 	cmnd->scsi_done(cmnd);
[ Upstream commit 3c677d206210f53a4be972211066c0f1cd47fe12 ]
The exclusion range limit register needs to contain the base address of the last page that is part of the range, as bits 0-11 of this register are treated as 0xfff by the hardware for comparisons.
So correctly set the exclusion range in the hardware to the last page which is _in_ the range.
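A worked example makes the off-by-one visible. The sketch assumes 4 KiB pages and models the hardware rule from the commit message, that bits 0-11 of the limit are treated as 0xfff; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K	0x1000ULL
#define PAGE_MASK_4K	(~(PAGE_SIZE_4K - 1))

/* Limit register value as computed before the patch. */
static uint64_t limit_old(uint64_t start, uint64_t length)
{
	return (start + length) & PAGE_MASK_4K;
}

/* Limit register value as computed after the patch. */
static uint64_t limit_new(uint64_t start, uint64_t length)
{
	return (start + length - 1) & PAGE_MASK_4K;
}

/* Effective last excluded byte: the hardware treats bits 0-11 as 0xfff. */
static uint64_t hw_last_byte(uint64_t limit)
{
	return limit | 0xfff;
}
```

For a two-page range starting at 0x100000, the old formula programs 0x102000, which the hardware extends to 0x102fff and so also excludes the page after the range; the new formula programs 0x101000, whose effective limit of 0x101fff is exactly the last byte of the range.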
Fixes: b2026aa2dce44 ('x86, AMD IOMMU: add functions for programming IOMMU MMIO space')
Signed-off-by: Joerg Roedel jroedel@suse.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/iommu/amd_iommu_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 94f1bf772ec93..db85cc5791dce 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -295,7 +295,7 @@ static void iommu_write_l2(struct amd_iommu *iommu, u8 address, u32 val)
 static void iommu_set_exclusion_range(struct amd_iommu *iommu)
 {
 	u64 start = iommu->exclusion_start & PAGE_MASK;
-	u64 limit = (start + iommu->exclusion_length) & PAGE_MASK;
+	u64 limit = (start + iommu->exclusion_length - 1) & PAGE_MASK;
 	u64 entry;
 
 	if (!iommu->exclusion_start)
[ Upstream commit 59c39840f5abf4a71e1810a8da71aaccd6c17d26 ]
When irq_set_affinity_notifier() replaces the notifier, the reference count on the old notifier is dropped, which causes it to be freed. But nothing ensures that the old notifier is no longer queued in the work list. If it is still queued, this results in a use-after-free and possibly in work list corruption.
Ensure that the work is canceled before the reference is dropped.
Signed-off-by: Prasad Sodagudi psodagud@codeaurora.org
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Cc: marc.zyngier@arm.com
Link: https://lkml.kernel.org/r/1553439424-6529-1-git-send-email-psodagud@codeauro...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/irq/manage.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 83cea913983c5..92c7eb1aeded9 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -319,8 +319,10 @@ irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
 	desc->affinity_notify = notify;
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
 
-	if (old_notify)
+	if (old_notify) {
+		cancel_work_sync(&old_notify->work);
 		kref_put(&old_notify->kref, old_notify->release);
+	}
 
 	return 0;
 }
From: Thinh Nguyen Thinh.Nguyen@synopsys.com
commit 8d791929b2fbdf7734c1596d808e55cb457f4562 upstream.
The max possible value for DCTL.LPM_NYET_THRES is 15 and not 255. Change the default value to 15.
Cc: stable@vger.kernel.org
Fixes: 80caf7d21adc ("usb: dwc3: add lpm erratum support")
Signed-off-by: Thinh Nguyen thinhn@synopsys.com
Signed-off-by: Felipe Balbi felipe.balbi@linux.intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/dwc3/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -867,7 +867,7 @@ static int dwc3_probe(struct platform_de
 	dwc->regs_size	= resource_size(res);
 
 	/* default to highest possible threshold */
-	lpm_nyet_threshold = 0xff;
+	lpm_nyet_threshold = 0xf;
 
 	/* default to -3.5dB de-emphasis */
 	tx_de_emphasis = 1;
From: Andrew Vasquez andrewv@marvell.com
commit 5cbdae10bf11f96e30b4d14de7b08c8b490e903c upstream.
Commit e6f77540c067 ("scsi: qla2xxx: Fix an integer overflow in sysfs code") incorrectly set 'optrom_region_size' to 'start+size', which can overflow option-rom boundaries when 'start' is non-zero. Continue setting optrom_region_size to the proper adjusted value of 'size'.
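The intended bookkeeping is that the buffer covers only the requested region, while the bounds check is done separately. A minimal sketch, with a hypothetical helper and an overflow-safe form of the check:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: validate the region [start, start + size) inside an
 * option ROM of 'optrom_size' bytes and return the buffer size to allocate,
 * or 0 if the region does not fit. */
static uint32_t region_buffer_size(uint32_t start, uint32_t size,
				   uint32_t optrom_size)
{
	/* Written as a subtraction so that start + size cannot wrap. */
	if (start > optrom_size || size > optrom_size - start)
		return 0;
	return size;	/* the buffer covers only the region itself */
}
```

For example, with a 0x100000-byte option ROM, start = 0x80000 and size = 0x80000, the old code would have used 0x100000 as the region size, so a transfer beginning at start would have run 0x80000 bytes past the end of the ROM.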
Fixes: e6f77540c067 ("scsi: qla2xxx: Fix an integer overflow in sysfs code")
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Vasquez andrewv@marvell.com
Signed-off-by: Himanshu Madhani hmadhani@marvell.com
Signed-off-by: Martin K. Petersen martin.petersen@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/scsi/qla2xxx/qla_attr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/scsi/qla2xxx/qla_attr.c
+++ b/drivers/scsi/qla2xxx/qla_attr.c
@@ -431,7 +431,7 @@ qla2x00_sysfs_write_optrom_ctl(struct fi
 		}
 
 		ha->optrom_region_start = start;
-		ha->optrom_region_size = start + size;
+		ha->optrom_region_size = size;
 
 		ha->optrom_state = QLA_SREADING;
 		ha->optrom_buffer = vmalloc(ha->optrom_region_size);
@@ -504,7 +504,7 @@ qla2x00_sysfs_write_optrom_ctl(struct fi
 		}
 
 		ha->optrom_region_start = start;
-		ha->optrom_region_size = start + size;
+		ha->optrom_region_size = size;
 
 		ha->optrom_state = QLA_SWRITING;
 		ha->optrom_buffer = vmalloc(ha->optrom_region_size);
From: Young Xiao YangX92@hotmail.com
commit a1616a5ac99ede5d605047a9012481ce7ff18b16 upstream.
Struct ca is copied from userspace, and its "name" field is not checked for NUL termination. This allows local users to obtain potentially sensitive information from kernel stack memory via a HIDPCONNADD command.
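The fix follows a common pattern for fixed-size strings copied from untrusted memory: terminate the last byte unconditionally before any string function sees the field. A user-space sketch, with a hypothetical struct conn_req standing in for the real ioctl structure:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical request structure with a fixed-size name, like ca.name. */
struct conn_req { char name[8]; int flags; };

/* Copy an untrusted request, then force the name to be NUL-terminated so
 * later string operations cannot read past the end of the field. */
static void accept_req(struct conn_req *dst, const void *untrusted)
{
	memcpy(dst, untrusted, sizeof(*dst));
	dst->name[sizeof(dst->name) - 1] = 0;
}
```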
This vulnerability is similar to CVE-2011-1079.
Signed-off-by: Young Xiao YangX92@hotmail.com
Signed-off-by: Marcel Holtmann marcel@holtmann.org
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/bluetooth/hidp/sock.c | 1 +
 1 file changed, 1 insertion(+)

--- a/net/bluetooth/hidp/sock.c
+++ b/net/bluetooth/hidp/sock.c
@@ -76,6 +76,7 @@ static int hidp_sock_ioctl(struct socket
 			sockfd_put(csock);
 			return err;
 		}
+		ca.name[sizeof(ca.name)-1] = 0;
 
 		err = hidp_connection_add(&ca, csock, isock);
 		if (!err && copy_to_user(argp, &ca, sizeof(ca)))
From: Marcel Holtmann marcel@holtmann.org
commit d5bb334a8e171b262e48f378bd2096c0ea458265 upstream.
The minimum encryption key size for LE connections is 56 bits. To align BR/EDR with LE, enforce a minimum encryption key size of 56 bits for BR/EDR connections as well.
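Note that the commit message talks in bits while HCI_MIN_ENC_KEY_SIZE is expressed in bytes: 7 bytes is 56 bits. A tiny sketch of the added comparison (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Same value as HCI_MIN_ENC_KEY_SIZE in the patch; the unit is bytes. */
#define MIN_ENC_KEY_SIZE 7

/* Hypothetical helper modelling only the new key-size comparison in
 * hci_conn_check_link_mode(); enc_key_size is in bytes. */
static bool enc_key_size_ok(unsigned int enc_key_size)
{
	return enc_key_size >= MIN_ENC_KEY_SIZE;
}
```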
Signed-off-by: Marcel Holtmann marcel@holtmann.org
Signed-off-by: Johan Hedberg johan.hedberg@intel.com
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/net/bluetooth/hci_core.h | 3 +++
 net/bluetooth/hci_conn.c         | 8 ++++++++
 2 files changed, 11 insertions(+)

--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -174,6 +174,9 @@ struct adv_info {
 
 #define HCI_MAX_SHORT_NAME_LENGTH	10
 
+/* Min encryption key size to match with SMP */
+#define HCI_MIN_ENC_KEY_SIZE		7
+
 /* Default LE RPA expiry time, 15 minutes */
 #define HCI_DEFAULT_RPA_TIMEOUT		(15 * 60)
 
--- a/net/bluetooth/hci_conn.c
+++ b/net/bluetooth/hci_conn.c
@@ -1177,6 +1177,14 @@ int hci_conn_check_link_mode(struct hci_
 	    !test_bit(HCI_CONN_ENCRYPT, &conn->flags))
 		return 0;
 
+	/* The minimum encryption key size needs to be enforced by the
+	 * host stack before establishing any L2CAP connections. The
+	 * specification in theory allows a minimum of 1, but to align
+	 * BR/EDR and LE transports, a minimum of 7 is chosen.
+	 */
+	if (conn->enc_key_size < HCI_MIN_ENC_KEY_SIZE)
+		return 0;
+
 	return 1;
 }
From: Oliver Neukum oneukum@suse.com
commit 3ae62a42090f1ed48e2313ed256a1182a85fb575 upstream.
This is the UAS version of
747668dbc061b3e62bc1982767a3a1f9815fcf0e usb-storage: Set virt_boundary_mask to avoid SG overflows
We are less likely to be vulnerable than usb-storage, as UAS is unlikely to run over a controller without native SG support, but the issue exists. It has existed since the driver was introduced.
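For readers unfamiliar with blk_queue_virt_boundary(): as we understand the block layer's rule, a request may not contain a "gap", meaning every segment except the last must end on the boundary and every segment except the first must start on one. The checker below is a hypothetical user-space model of that rule only, using offsets within a page and a mask of maxp - 1 as the patch passes (512 - 1 in the usage).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical segment: offset within its page and length, in bytes. */
struct seg { size_t offset, len; };

/* Returns true if no gap appears between consecutive segments for the
 * given boundary mask (e.g. maxp - 1). */
static bool respects_virt_boundary(const struct seg *s, size_t n, size_t mask)
{
	for (size_t i = 1; i < n; i++) {
		/* previous segment must end on the boundary */
		if ((s[i - 1].offset + s[i - 1].len) & mask)
			return false;
		/* next segment must start on the boundary */
		if (s[i].offset & mask)
			return false;
	}
	return true;
}
```

With a mask of 511, a 1024-byte segment followed by a 100-byte one passes, while a 700-byte segment followed by anything fails because it does not end on a 512-byte boundary.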
Fixes: 115bb1ffa54c ("USB: Add UAS driver")
Signed-off-by: Oliver Neukum oneukum@suse.com
Cc: stable stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/storage/uas.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -772,23 +772,33 @@ static int uas_slave_alloc(struct scsi_d
 {
 	struct uas_dev_info *devinfo =
 		(struct uas_dev_info *)sdev->host->hostdata;
+	int maxp;
 
 	sdev->hostdata = devinfo;
 
-	/* USB has unusual DMA-alignment requirements: Although the
-	 * starting address of each scatter-gather element doesn't matter,
-	 * the length of each element except the last must be divisible
-	 * by the Bulk maxpacket value.  There's currently no way to
-	 * express this by block-layer constraints, so we'll cop out
-	 * and simply require addresses to be aligned at 512-byte
-	 * boundaries.  This is okay since most block I/O involves
-	 * hardware sectors that are multiples of 512 bytes in length,
-	 * and since host controllers up through USB 2.0 have maxpacket
-	 * values no larger than 512.
-	 *
-	 * But it doesn't suffice for Wireless USB, where Bulk maxpacket
-	 * values can be as large as 2048.  To make that work properly
-	 * will require changes to the block layer.
+	/*
+	 * We have two requirements here. We must satisfy the requirements
+	 * of the physical HC and the demands of the protocol, as we
+	 * definitely want no additional memory allocation in this path
+	 * ruling out using bounce buffers.
+	 *
+	 * For a transmission on USB to continue we must never send
+	 * a package that is smaller than maxpacket. Hence the length of each
+	 * scatterlist element except the last must be divisible by the
+	 * Bulk maxpacket value.
+	 * If the HC does not ensure that through SG,
+	 * the upper layer must do that. We must assume nothing
+	 * about the capabilities off the HC, so we use the most
+	 * pessimistic requirement.
+	 */
+
+	maxp = usb_maxpacket(devinfo->udev, devinfo->data_in_pipe, 0);
+	blk_queue_virt_boundary(sdev->request_queue, maxp - 1);
+
+	/*
+	 * The protocol has no requirements on alignment in the strict sense.
+	 * Controllers may or may not have alignment restrictions.
+	 * As this is not exported, we use an extremely conservative guess.
 	 */
 	blk_queue_update_dma_alignment(sdev->request_queue, (512 - 1));
From: WANG Cong xiyou.wangcong@gmail.com
commit 8651be8f14a12d24f203f283601d9b0418c389ff upstream.
Baozeng reported this deadlock case:
       CPU0                    CPU1
       ----                    ----
  lock([ 165.136033] sk_lock-AF_INET6);
                               lock([ 165.136033] rtnl_mutex);
                               lock([ 165.136033] sk_lock-AF_INET6);
  lock([ 165.136033] rtnl_mutex);
Similar to commit 87e9f0315952 ("ipv4: fix a potential deadlock in mcast getsockopt() path"), this happens because we still have a case, ipv6_sock_mc_close(), where we acquire sk_lock before rtnl_lock. Close this deadlock with a similar solution: always acquire the rtnl lock first.
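The split the patch introduces, a "__" helper that asserts the lock is held plus a wrapper that takes it, is a common kernel idiom. The single-threaded sketch below models it with a boolean in place of the RTNL mutex; all names are hypothetical, and mc_close_locked() plays the role of __ipv6_sock_mc_close().

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded model: 'rtnl_held' stands in for the RTNL mutex. */
static bool rtnl_held;
static int mc_entries = 3;	/* hypothetical multicast list length */

static void rtnl_lock_model(void)
{
	assert(!rtnl_held);	/* re-locking would self-deadlock */
	rtnl_held = true;
}

static void rtnl_unlock_model(void)
{
	assert(rtnl_held);
	rtnl_held = false;
}

/* Caller must already hold the lock (cf. ASSERT_RTNL()). */
static void mc_close_locked(void)
{
	assert(rtnl_held);
	mc_entries = 0;
}

/* Wrapper for callers that do not hold the lock. */
static void mc_close(void)
{
	if (mc_entries == 0)
		return;		/* cheap early exit, as in the patch */
	rtnl_lock_model();
	mc_close_locked();
	rtnl_unlock_model();
}

/* Model of the IPV6_ADDRFORM path: the lock is already taken up front,
 * so it must use the locked variant. */
static void addrform_path(void)
{
	rtnl_lock_model();
	mc_close_locked();
	rtnl_unlock_model();
}
```

Calling mc_close() from addrform_path() would trip the assert(!rtnl_held) in rtnl_lock_model(), which is this model's version of the self-deadlock a real mutex would produce.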
Fixes: baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
Reported-by: Baozeng Ding sploving1@gmail.com
Tested-by: Baozeng Ding sploving1@gmail.com
Cc: Marcelo Ricardo Leitner marcelo.leitner@gmail.com
Signed-off-by: Cong Wang xiyou.wangcong@gmail.com
Reviewed-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Cc: Zubin Mithra zsm@chromium.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/net/addrconf.h   |  1 +
 net/ipv6/ipv6_sockglue.c |  3 ++-
 net/ipv6/mcast.c         | 17 ++++++++++++-----
 3 files changed, 15 insertions(+), 6 deletions(-)

--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -162,6 +162,7 @@ int ipv6_sock_mc_join(struct sock *sk, i
 		      const struct in6_addr *addr);
 int ipv6_sock_mc_drop(struct sock *sk, int ifindex,
 		      const struct in6_addr *addr);
+void __ipv6_sock_mc_close(struct sock *sk);
 void ipv6_sock_mc_close(struct sock *sk);
 bool inet6_mc_check(struct sock *sk, const struct in6_addr *mc_addr,
 		    const struct in6_addr *src_addr);
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -121,6 +121,7 @@ struct ipv6_txoptions *ipv6_update_optio
 static bool setsockopt_needs_rtnl(int optname)
 {
 	switch (optname) {
+	case IPV6_ADDRFORM:
 	case IPV6_ADD_MEMBERSHIP:
 	case IPV6_DROP_MEMBERSHIP:
 	case IPV6_JOIN_ANYCAST:
@@ -199,7 +200,7 @@ static int do_ipv6_setsockopt(struct soc
 			}
 
 			fl6_free_socklist(sk);
-			ipv6_sock_mc_close(sk);
+			__ipv6_sock_mc_close(sk);
 
 			/*
 			 * Sock is moving from IPv6 to IPv4 (sk_prot), so
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -276,16 +276,14 @@ static struct inet6_dev *ip6_mc_find_dev
 	return idev;
 }
 
-void ipv6_sock_mc_close(struct sock *sk)
+void __ipv6_sock_mc_close(struct sock *sk)
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct ipv6_mc_socklist *mc_lst;
 	struct net *net = sock_net(sk);
 
-	if (!rcu_access_pointer(np->ipv6_mc_list))
-		return;
+	ASSERT_RTNL();
 
-	rtnl_lock();
 	while ((mc_lst = rtnl_dereference(np->ipv6_mc_list)) != NULL) {
 		struct net_device *dev;
 
@@ -303,8 +301,17 @@ void ipv6_sock_mc_close(struct sock *sk)
 
 		atomic_sub(sizeof(*mc_lst), &sk->sk_omem_alloc);
 		kfree_rcu(mc_lst, rcu);
-
 	}
+}
+
+void ipv6_sock_mc_close(struct sock *sk)
+{
+	struct ipv6_pinfo *np = inet6_sk(sk);
+
+	if (!rcu_access_pointer(np->ipv6_mc_list))
+		return;
+	rtnl_lock();
+	__ipv6_sock_mc_close(sk);
 	rtnl_unlock();
 }
From: Ross Zwisler zwisler@chromium.org
commit 0efa3334d65b7f421ba12382dfa58f6ff5bf83c4 upstream.
Currently in sst_dsp_new(), if we get an error return from sst_dma_new(), we just print an error message and then still complete the function successfully. This means that we try to run without sst->dma properly set up, which results in a NULL pointer dereference when sst->dma is later used. This was happening for me in sst_dsp_dma_get_channel():
	struct sst_dma *dma = dsp->dma;
	...
	dma->ch = dma_request_channel(mask, dma_chan_filter, dsp);
This resulted in:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: sst_dsp_dma_get_channel+0x4f/0x125 [snd_soc_sst_firmware]
Fix this by adding proper error handling for the case where we fail to set up DMA.
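The fix uses the usual kernel goto-unwind ladder, where each label undoes only the steps that already succeeded, in reverse order. A user-space sketch of the same shape, with hypothetical resource helpers standing in for the IRQ and DMA setup:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resources, recorded in globals so cleanup can be checked. */
static int irq_requested, dma_ready;

static int request_irq_model(void) { irq_requested = 1; return 0; }
static void free_irq_model(void) { irq_requested = 0; }
static int dma_new_model(int fail)
{
	if (fail)
		return -1;
	dma_ready = 1;
	return 0;
}

/* Same shape as the fixed sst_dsp_new(): each failure jumps to the label
 * that releases everything acquired before it, in reverse order. */
static int *dsp_new(int dma_fails)
{
	int *dsp = malloc(sizeof(*dsp));

	if (!dsp)
		return NULL;

	if (request_irq_model())
		goto irq_err;

	if (dma_new_model(dma_fails))
		goto dma_err;	/* previously: warn and carry on */

	return dsp;

dma_err:
	free_irq_model();
irq_err:
	free(dsp);
	return NULL;
}
```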
This change only affects Haswell and Broadwell systems. Baytrail systems explicitly opt out of DMA by setting sst->pdata->resindex_dma_base to -1.
Signed-off-by: Ross Zwisler zwisler@google.com
Cc: stable@vger.kernel.org
Acked-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/soc/intel/common/sst-dsp.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/sound/soc/intel/common/sst-dsp.c +++ b/sound/soc/intel/common/sst-dsp.c @@ -463,11 +463,15 @@ struct sst_dsp *sst_dsp_new(struct devic goto irq_err;
err = sst_dma_new(sst); - if (err) - dev_warn(dev, "sst_dma_new failed %d\n", err); + if (err) { + dev_err(dev, "sst_dma_new failed %d\n", err); + goto dma_err; + }
return sst;
+dma_err: + free_irq(sst->irq, sst); irq_err: if (sst->ops->free) sst->ops->free(sst);
From: Ben Hutchings ben@decadent.org.uk
The timer_stats facility should filter and translate PIDs if opened from a non-initial PID namespace, to avoid leaking information about the wider system. It should also not show kernel virtual addresses. Unfortunately it has now been removed upstream (as redundant) instead of being fixed.
For stable, fix the leak by restricting access to root only. A similar change was already made for the /proc/timer_list file.
Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/time/timer_stats.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/time/timer_stats.c +++ b/kernel/time/timer_stats.c @@ -417,7 +417,7 @@ static int __init init_tstats_procfs(voi { struct proc_dir_entry *pe;
- pe = proc_create("timer_stats", 0644, NULL, &tstats_fops); + pe = proc_create("timer_stats", 0600, NULL, &tstats_fops); if (!pe) return -ENOMEM; return 0;
commit 8d29d16d21342a0c86405d46de0c4ac5daf1760f upstream
If a non-zero value happens to be in xt[NFPROTO_BRIDGE].cur at init time, the following panic can be caused by running
% ebtables -t broute -F BROUTING
from a 32-bit user level on a 64-bit kernel. This patch replaces kmalloc_array with kcalloc when allocating xt.
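kcalloc() differs from the original allocation in two ways: it zero-fills the memory, so fields such as xt[NFPROTO_BRIDGE].cur start at 0, and it checks the count * size multiplication for overflow. A userspace sketch of the same idea with calloc(); struct xt_af_demo and NPROTO_DEMO are hypothetical stand-ins for struct xt_af and NFPROTO_NUMPROTO:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xt_af: only the field that matters here. */
struct xt_af_demo {
	unsigned int cur;	/* must start at 0, like xt[af].cur */
};

#define NPROTO_DEMO 13	/* illustrative count only */

static struct xt_af_demo *alloc_xt(void)
{
	/* calloc zero-fills and returns NULL if n * size would overflow. */
	return calloc(NPROTO_DEMO, sizeof(struct xt_af_demo));
}
```

With plain malloc() the cur fields would contain whatever the allocator handed back, which is exactly the stale-value scenario described above.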
[ 474.680846] BUG: unable to handle kernel paging request at 0000000009600920
[ 474.687869] PGD 2037006067 P4D 2037006067 PUD 2038938067 PMD 0
[ 474.693838] Oops: 0000 [#1] SMP
[ 474.697055] CPU: 9 PID: 4662 Comm: ebtables Kdump: loaded Not tainted 4.19.17-11302235.AroraKernelnext.fc18.x86_64 #1
[ 474.707721] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.0 06/28/2013
[ 474.714313] RIP: 0010:xt_compat_calc_jump+0x2f/0x63 [x_tables]
[ 474.720201] Code: 40 0f b6 ff 55 31 c0 48 6b ff 70 48 03 3d dc 45 00 00 48 89 e5 8b 4f 6c 4c 8b 47 60 ff c9 39 c8 7f 2f 8d 14 08 d1 fa 48 63 fa <41> 39 34 f8 4c 8d 0c fd 00 00 00 00 73 05 8d 42 01 eb e1 76 05 8d
[ 474.739023] RSP: 0018:ffffc9000943fc58 EFLAGS: 00010207
[ 474.744296] RAX: 0000000000000000 RBX: ffffc90006465000 RCX: 0000000002580249
[ 474.751485] RDX: 00000000012c0124 RSI: fffffffff7be17e9 RDI: 00000000012c0124
[ 474.758670] RBP: ffffc9000943fc58 R08: 0000000000000000 R09: ffffffff8117cf8f
[ 474.765855] R10: ffffc90006477000 R11: 0000000000000000 R12: 0000000000000001
[ 474.773048] R13: 0000000000000000 R14: ffffc9000943fcb8 R15: ffffc9000943fcb8
[ 474.780234] FS: 0000000000000000(0000) GS:ffff88a03f840000(0063) knlGS:00000000f7ac7700
[ 474.788612] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 474.794632] CR2: 0000000009600920 CR3: 0000002037422006 CR4: 00000000000606e0
[ 474.802052] Call Trace:
[ 474.804789] compat_do_replace+0x1fb/0x2a3 [ebtables]
[ 474.810105] compat_do_ebt_set_ctl+0x69/0xe6 [ebtables]
[ 474.815605] ? try_module_get+0x37/0x42
[ 474.819716] compat_nf_setsockopt+0x4f/0x6d
[ 474.824172] compat_ip_setsockopt+0x7e/0x8c
[ 474.828641] compat_raw_setsockopt+0x16/0x3a
[ 474.833220] compat_sock_common_setsockopt+0x1d/0x24
[ 474.838458] __compat_sys_setsockopt+0x17e/0x1b1
[ 474.843343] ? __check_object_size+0x76/0x19a
[ 474.847960] __ia32_compat_sys_socketcall+0x1cb/0x25b
[ 474.853276] do_fast_syscall_32+0xaf/0xf6
[ 474.857548] entry_SYSENTER_compat+0x6b/0x7a
Signed-off-by: Francesco Ruggeri fruggeri@arista.com Acked-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Zubin Mithra zsm@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/x_tables.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index b6e72af152379..cdafbd38a456b 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -1699,7 +1699,7 @@ static int __init xt_init(void) seqcount_init(&per_cpu(xt_recseq, i)); }
- xt = kmalloc(sizeof(struct xt_af) * NFPROTO_NUMPROTO, GFP_KERNEL); + xt = kcalloc(NFPROTO_NUMPROTO, sizeof(struct xt_af), GFP_KERNEL); if (!xt) return -ENOMEM;
From: Michal Hocko mhocko@suse.com
commit f01f17d3705bb6081c9e5728078f64067982be36 upstream.
Mike has reported a considerable overhead of refresh_cpu_vm_stats from the idle entry during pipe test:
12.89% [kernel] [k] refresh_cpu_vm_stats.isra.12
 4.75% [kernel] [k] __schedule
 4.70% [kernel] [k] mutex_unlock
 3.14% [kernel] [k] __switch_to
This is caused by commit 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle") which has placed quiet_vmstat into cpu_idle_loop. The main reason here seems to be that the idle entry has to get over all zones and perform atomic operations for each vmstat entry even though there might be no per cpu diffs. This is a pointless overhead for _each_ idle entry.
Make sure that quiet_vmstat is as light as possible.
First of all, it doesn't make any sense to do any local sync if the current cpu is already set in cpu_stat_off, because vmstat_update puts itself there only if there is nothing to do.
Then we can check need_update which should be a cheap way to check for potential per-cpu diffs and only then do refresh_cpu_vm_stats.
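The resulting fast path can be sketched in userspace as two cheap early returns in front of the expensive refresh. This is a single-threaded model under stated assumptions: a 64-bit word stands in for the cpu_stat_off cpumask, a bool array stands in for need_update(), the system_state check is omitted, and the kernel's cpumask_test_and_set_cpu() is atomic whereas this helper is not:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-word stand-in for the cpu_stat_off cpumask. */
static unsigned long long cpu_stat_off;
/* Hypothetical per-cpu "diffs pending" flags standing in for need_update(). */
static bool pending_diffs[64];
static int refresh_calls;

/* Non-atomic model of cpumask_test_and_set_cpu(). */
static bool test_and_set_cpu(int cpu, unsigned long long *mask)
{
	unsigned long long bit = 1ULL << cpu;
	bool was_set = (*mask & bit) != 0;

	*mask |= bit;
	return was_set;
}

/* Stand-in for refresh_cpu_vm_stats(): the expensive per-zone walk. */
static void refresh_cpu_stats(int cpu)
{
	pending_diffs[cpu] = false;
	refresh_calls++;
}

/* Mirrors the order of checks in the new quiet_vmstat(). */
static void quiet_stats(int cpu)
{
	/* Already in the hands of the shepherd: nothing to do. */
	if (test_and_set_cpu(cpu, &cpu_stat_off))
		return;
	/* Cheap check for pending diffs before the expensive refresh. */
	if (!pending_diffs[cpu])
		return;
	refresh_cpu_stats(cpu);
}
```

Repeated idle entries on a CPU that is already marked return after a single bit test, which is the overhead reduction the commit is after.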
The original patch also did cancel_delayed_work which we are not doing here. There are two reasons for that. Firstly cancel_delayed_work from idle context will blow up on RT kernels (reported by Mike):
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7
Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
Call Trace:
 dump_stack+0x49/0x67
 ___might_sleep+0xf5/0x180
 rt_spin_lock+0x20/0x50
 try_to_grab_pending+0x69/0x240
 cancel_delayed_work+0x26/0xe0
 quiet_vmstat+0x75/0xa0
 cpu_idle_loop+0x38/0x3e0
 cpu_startup_entry+0x13/0x20
 start_secondary+0x114/0x140
And secondly, even on !RT kernels it might add some non-trivial overhead which is not necessary. Even if the vmstat worker wakes up and preempts idle, it will most likely be a single-shot noop because the stats were already synced, and so it would end up on cpu_stat_off anyway. We just need to teach both vmstat_shepherd and vmstat_update to stop scheduling the worker if there is nothing to do.
[mgalbraith@suse.de: cancel pending work of the cpu_stat_off CPU] Signed-off-by: Michal Hocko mhocko@suse.com Reported-by: Mike Galbraith umgwanakikbuti@gmail.com Acked-by: Christoph Lameter cl@linux.com Signed-off-by: Mike Galbraith mgalbraith@suse.de Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Daniel Wagner wagi@monom.org
--- mm/vmstat.c | 68 ++++++++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 46 insertions(+), 22 deletions(-)
--- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1395,10 +1395,15 @@ static void vmstat_update(struct work_st * Counters were updated so we expect more updates * to occur in the future. Keep on running the * update worker thread. + * If we were marked on cpu_stat_off clear the flag + * so that vmstat_shepherd doesn't schedule us again. */ - queue_delayed_work_on(smp_processor_id(), vmstat_wq, - this_cpu_ptr(&vmstat_work), - round_jiffies_relative(sysctl_stat_interval)); + if (!cpumask_test_and_clear_cpu(smp_processor_id(), + cpu_stat_off)) { + queue_delayed_work_on(smp_processor_id(), vmstat_wq, + this_cpu_ptr(&vmstat_work), + round_jiffies_relative(sysctl_stat_interval)); + } } else { /* * We did not update any counters so the app may be in @@ -1426,18 +1431,6 @@ static void vmstat_update(struct work_st * until the diffs stay at zero. The function is used by NOHZ and can only be * invoked when tick processing is not active. */ -void quiet_vmstat(void) -{ - if (system_state != SYSTEM_RUNNING) - return; - - do { - if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) - cancel_delayed_work(this_cpu_ptr(&vmstat_work)); - - } while (refresh_cpu_vm_stats(false)); -} - /* * Check if the diffs for a certain cpu indicate that * an update is needed. @@ -1461,6 +1454,30 @@ static bool need_update(int cpu) return false; }
+void quiet_vmstat(void) +{ + if (system_state != SYSTEM_RUNNING) + return; + + /* + * If we are already in hands of the shepherd then there + * is nothing for us to do here. + */ + if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) + return; + + if (!need_update(smp_processor_id())) + return; + + /* + * Just refresh counters and do not care about the pending delayed + * vmstat_update. It doesn't fire that often to matter and canceling + * it would be too expensive from this path. + * vmstat_shepherd will take care about that for us. + */ + refresh_cpu_vm_stats(false); +} +
/* * Shepherd worker thread that checks the @@ -1478,18 +1495,25 @@ static void vmstat_shepherd(struct work_
get_online_cpus(); /* Check processors whose vmstat worker threads have been disabled */ - for_each_cpu(cpu, cpu_stat_off) - if (need_update(cpu) && - cpumask_test_and_clear_cpu(cpu, cpu_stat_off)) - - queue_delayed_work_on(cpu, vmstat_wq, - &per_cpu(vmstat_work, cpu), 0); + for_each_cpu(cpu, cpu_stat_off) { + struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
+ if (need_update(cpu)) { + if (cpumask_test_and_clear_cpu(cpu, cpu_stat_off)) + queue_delayed_work_on(cpu, vmstat_wq, dw, 0); + } else { + /* + * Cancel the work if quiet_vmstat has put this + * cpu on cpu_stat_off because the work item might + * be still scheduled + */ + cancel_delayed_work(dw); + } + } put_online_cpus();
schedule_delayed_work(&shepherd, round_jiffies_relative(sysctl_stat_interval)); - }
static void __init start_shepherd_timer(void)
From: Gustavo A. R. Silva gustavo@embeddedor.com
commit 1cbd7a64959d33e7a2a1fa2bf36a62b350a9fcbd upstream.
It seems that the default case should return AE_CTRL_TERMINATE, instead of falling through to case ACPI_RESOURCE_TYPE_END_TAG and returning AE_OK; otherwise the line of code at the end of the function is unreachable and makes no sense:
return AE_CTRL_TERMINATE;
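In the original layout the default label had no return of its own and sat directly above the END_TAG case, so unknown resource types fell through to "return AE_OK". A reduced sketch of the fixed control flow, using hypothetical enum names rather than the real ACPICA constants:

```c
#include <assert.h>

/* Hypothetical status codes standing in for AE_OK / AE_CTRL_TERMINATE. */
enum status { ST_OK, ST_TERMINATE };
enum res_type { RES_IRQ, RES_IO, RES_END_TAG, RES_OTHER };

static enum status read_resource(enum res_type type)
{
	switch (type) {
	case RES_IRQ:
	case RES_IO:
		return ST_OK;
	case RES_END_TAG:
		return ST_OK;
	default:
		/* Unknown resource: stop the walk instead of falling through. */
		return ST_TERMINATE;
	}
}
```

Every case now returns explicitly, so there is no trailing statement left after the switch to become unreachable.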
This fix is based on the following thread of discussion:
https://lore.kernel.org/patchwork/patch/959782/
Fixes: 33a04454527e ("sony-laptop: Add SNY6001 device handling (sonypi reimplementation)") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/platform/x86/sony-laptop.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/platform/x86/sony-laptop.c +++ b/drivers/platform/x86/sony-laptop.c @@ -4394,14 +4394,16 @@ sony_pic_read_possible_resource(struct a } return AE_OK; } + + case ACPI_RESOURCE_TYPE_END_TAG: + return AE_OK; + default: dprintk("Resource %d isn't an IRQ nor an IO port\n", resource->type); + return AE_CTRL_TERMINATE;
- case ACPI_RESOURCE_TYPE_END_TAG: - return AE_OK; } - return AE_CTRL_TERMINATE; }
static int sony_pic_possible_resources(struct acpi_device *device)
[ Upstream commit 62039b6aef63380ba7a37c113bbaeee8a55c5342 ]
When cancel_delayed_work() returns, the delayed work may still be running. This means that the core could potentially free the private structure (struct xadc) while the delayed work is still using it. This is a potential use-after-free.
Fix by calling cancel_delayed_work_sync(), which waits for any residual work to finish before returning.
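The distinction can be modelled in userspace with POSIX threads. This is only an analogy, not how the workqueue code is implemented: setting a stop flag corresponds to cancel_delayed_work(), which returns while the work may still be running, and additionally joining the thread corresponds to cancel_delayed_work_sync(). Only after the join is it safe to free state the worker touches. struct worker and the helper names are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical worker state standing in for struct xadc and its delayed work. */
struct worker {
	pthread_t thread;
	atomic_bool stop;
	atomic_bool running;
};

static void *work_fn(void *arg)
{
	struct worker *w = arg;

	atomic_store(&w->running, true);
	while (!atomic_load(&w->stop))
		sched_yield();	/* placeholder for work that touches *w */
	atomic_store(&w->running, false);
	return NULL;
}

/* Like cancel_delayed_work(): requests cancellation but does not wait. */
void cancel_async(struct worker *w)
{
	atomic_store(&w->stop, true);
}

/* Like cancel_delayed_work_sync(): requests cancellation and waits. */
void cancel_sync(struct worker *w)
{
	atomic_store(&w->stop, true);
	pthread_join(w->thread, NULL);
}
```

After cancel_async() returns, work_fn() may still be inside its loop and dereferencing *w; after cancel_sync() returns, it has finished.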
Signed-off-by: Sven Van Asbroeck TheSven73@gmail.com Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iio/adc/xilinx-xadc-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/adc/xilinx-xadc-core.c b/drivers/iio/adc/xilinx-xadc-core.c index 475c5a74f2d1f..6398e86a272b8 100644 --- a/drivers/iio/adc/xilinx-xadc-core.c +++ b/drivers/iio/adc/xilinx-xadc-core.c @@ -1299,7 +1299,7 @@ static int xadc_remove(struct platform_device *pdev) } free_irq(irq, indio_dev); clk_disable_unprepare(xadc->clk); - cancel_delayed_work(&xadc->zynq_unmask_work); + cancel_delayed_work_sync(&xadc->zynq_unmask_work); kfree(xadc->data); kfree(indio_dev->channels);
[ Upstream commit 96dd86871e1fffbc39e4fa61c9c75ec54ee9af0f ]
According to HUTRR77, usage 0x29f from the consumer page is reserved for the desktop application to present all of the running user's application windows. Linux defines KEY_SCALE to request Compiz Scale (Expose) mode, so let's add the mapping.
Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-input.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/hid/hid-input.c b/drivers/hid/hid-input.c index 8d74e691ac90f..01b41ff430564 100644 --- a/drivers/hid/hid-input.c +++ b/drivers/hid/hid-input.c @@ -913,6 +913,8 @@ static void hidinput_configure_usage(struct hid_input *hidinput, struct hid_fiel case 0x2cb: map_key_clear(KEY_KBDINPUTASSIST_ACCEPT); break; case 0x2cc: map_key_clear(KEY_KBDINPUTASSIST_CANCEL); break;
+ case 0x29f: map_key_clear(KEY_SCALE); break; + default: map_key_clear(KEY_UNKNOWN); } break;
[ Upstream commit 7975a1d6a7afeb3eb61c971a153d24dd8fa032f3 ]
According to HUTRR73, usages 0x79, 0x7a and 0x7c from the consumer page correspond to the keyboard backlight Brightness Up/Down/Toggle keys, so let's add the mappings.
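The kernel implements these mappings as cases in the switch inside hidinput_configure_usage(). An equivalent table-driven lookup of just the usages added by these two patches, with key names as strings purely for illustration (lookup_key() is hypothetical), might look like:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Consumer-page usages and the input key names the two patches map them to. */
struct usage_map {
	unsigned int usage;
	const char *key;
};

static const struct usage_map consumer_map[] = {
	{ 0x079, "KEY_KBDILLUMUP" },
	{ 0x07a, "KEY_KBDILLUMDOWN" },
	{ 0x07c, "KEY_KBDILLUMTOGGLE" },
	{ 0x29f, "KEY_SCALE" },
};

/* Returns the mapped key name, or "KEY_UNKNOWN" like the switch's default case. */
static const char *lookup_key(unsigned int usage)
{
	for (size_t i = 0; i < sizeof(consumer_map) / sizeof(consumer_map[0]); i++)
		if (consumer_map[i].usage == usage)
			return consumer_map[i].key;
	return "KEY_UNKNOWN";
}
```

Note that 0x07b is not mapped by either patch, so it still falls back to KEY_UNKNOWN.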
Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-input.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/hid/hid-input.c b/drivers/hid/hid-input.c index 01b41ff430564..ee3c66c020438 100644 --- a/drivers/hid/hid-input.c +++ b/drivers/hid/hid-input.c @@ -783,6 +783,10 @@ static void hidinput_configure_usage(struct hid_input *hidinput, struct hid_fiel case 0x074: map_key_clear(KEY_BRIGHTNESS_MAX); break; case 0x075: map_key_clear(KEY_BRIGHTNESS_AUTO); break;
+ case 0x079: map_key_clear(KEY_KBDILLUMUP); break; + case 0x07a: map_key_clear(KEY_KBDILLUMDOWN); break; + case 0x07c: map_key_clear(KEY_KBDILLUMTOGGLE); break; + case 0x082: map_key_clear(KEY_VIDEO_NEXT); break; case 0x083: map_key_clear(KEY_LAST); break; case 0x084: map_key_clear(KEY_ENTER); break;
[ Upstream commit 486fa92df4707b5df58d6508728bdb9321a59766 ]
If kmemdup() fails, release the resources acquired so far and return NULL, avoiding a NULL pointer dereference.
Signed-off-by: Aditya Pakki pakki001@umn.edu Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvdimm/btt_devs.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c index cb477518dd0e4..4c129450495da 100644 --- a/drivers/nvdimm/btt_devs.c +++ b/drivers/nvdimm/btt_devs.c @@ -170,14 +170,15 @@ static struct device *__nd_btt_create(struct nd_region *nd_region, return NULL;
nd_btt->id = ida_simple_get(&nd_region->btt_ida, 0, 0, GFP_KERNEL); - if (nd_btt->id < 0) { - kfree(nd_btt); - return NULL; - } + if (nd_btt->id < 0) + goto out_nd_btt;
nd_btt->lbasize = lbasize; - if (uuid) + if (uuid) { uuid = kmemdup(uuid, 16, GFP_KERNEL); + if (!uuid) + goto out_put_id; + } nd_btt->uuid = uuid; dev = &nd_btt->dev; dev_set_name(dev, "btt%d.%d", nd_region->id, nd_btt->id); @@ -192,6 +193,13 @@ static struct device *__nd_btt_create(struct nd_region *nd_region, return NULL; } return dev; + +out_put_id: + ida_simple_remove(&nd_region->btt_ida, nd_btt->id); + +out_nd_btt: + kfree(nd_btt); + return NULL; }
struct device *nd_btt_create(struct nd_region *nd_region)
[ Upstream commit 2cc9637ce825f3a9f51f8f78af7474e9e85bfa5f ]
The DASD driver incorrectly limits the maximum number of blocks of ECKD DASD volumes to 32 bit numbers. Volumes with a capacity greater than 2^32-1 blocks are incorrectly recognized as smaller volumes.
This results in the following volume capacity limits depending on the formatted block size:
BLKSIZE  MAX_GB   MAX_CYL
    512    2047   5843492
   1024    4095   8676701
   2048    8191  13634816
   4096   16383  23860929
The same problem occurs when a volume with more than 17895697 cylinders is accessed in raw-track-access mode.
Fix this problem by adding an explicit type cast when calculating the maximum number of blocks.
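The arithmetic behind the table: assuming 3390 geometry with 15 tracks per cylinder and 12 blocks per track at a 4 KB block size (assumed values, consistent with the 4096-byte row), 23860929 cylinders give 23860929 * 15 * 12 = 4294967220 blocks, just under 2^32, and one more cylinder no longer fits in 32 bits. The kernel casts to unsigned long (64-bit on s390x); this sketch uses uint32_t/uint16_t/uint64_t as stand-ins for the driver's field types:

```c
#include <assert.h>
#include <stdint.h>

/* Old calculation: the product is computed in 32-bit unsigned arithmetic and wraps. */
static uint64_t blocks_32(uint32_t cyl, uint16_t trk_per_cyl, uint32_t blk_per_trk)
{
	return cyl * trk_per_cyl * blk_per_trk;
}

/* Fixed calculation: widen the first operand so the whole product is 64-bit. */
static uint64_t blocks_64(uint32_t cyl, uint16_t trk_per_cyl, uint32_t blk_per_trk)
{
	return (uint64_t)cyl * trk_per_cyl * blk_per_trk;
}
```

Assigning the result to a 64-bit variable is not enough on its own: the cast has to apply before the multiplication, which is why the patch places it on the first operand.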
Signed-off-by: Peter Oberparleiter oberpar@linux.ibm.com Reviewed-by: Stefan Haberland sth@linux.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/s390/block/dasd_eckd.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/s390/block/dasd_eckd.c b/drivers/s390/block/dasd_eckd.c index 80a43074c2f9a..c530610f61ac9 100644 --- a/drivers/s390/block/dasd_eckd.c +++ b/drivers/s390/block/dasd_eckd.c @@ -2066,14 +2066,14 @@ static int dasd_eckd_end_analysis(struct dasd_block *block) blk_per_trk = recs_per_track(&private->rdc_data, 0, block->bp_block);
raw: - block->blocks = (private->real_cyl * + block->blocks = ((unsigned long) private->real_cyl * private->rdc_data.trk_per_cyl * blk_per_trk);
dev_info(&device->cdev->dev, - "DASD with %d KB/block, %d KB total size, %d KB/track, " + "DASD with %u KB/block, %lu KB total size, %u KB/track, " "%s\n", (block->bp_block >> 10), - ((private->real_cyl * + (((unsigned long) private->real_cyl * private->rdc_data.trk_per_cyl * blk_per_trk * (block->bp_block >> 9)) >> 1), ((blk_per_trk * block->bp_block) >> 10),
[ Upstream commit 5712f3301a12c0c3de9cc423484496b0464f2faf ]
The spinlock in the raw3270_view structure is used by con3270, tty3270 and fs3270 in different ways. For con3270 the lock can be acquired in irq context, for tty3270 and fs3270 the highest context is bh.
Lockdep sees the view->lock as a single class and if the 3270 driver is used for the console the following message is generated:
WARNING: inconsistent lock state
5.1.0-rc3-05157-g5c168033979d #12 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
swapper/0/1 [HC0[0]:SC1[1]:HE1:SE0] takes:
(____ptrval____) (&(&view->lock)->rlock){?.-.}, at: tty3270_update+0x7c/0x330
Introduce a lockdep subclass for the view lock to distinguish bh from irq locks.
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com
Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/s390/char/con3270.c | 2 +- drivers/s390/char/fs3270.c | 3 ++- drivers/s390/char/raw3270.c | 3 ++- drivers/s390/char/raw3270.h | 4 +++- drivers/s390/char/tty3270.c | 3 ++- 5 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/s390/char/con3270.c b/drivers/s390/char/con3270.c index bae98521c808c..3e5a7912044fa 100644 --- a/drivers/s390/char/con3270.c +++ b/drivers/s390/char/con3270.c @@ -627,7 +627,7 @@ con3270_init(void) (void (*)(unsigned long)) con3270_read_tasklet, (unsigned long) condev->read);
- raw3270_add_view(&condev->view, &con3270_fn, 1); + raw3270_add_view(&condev->view, &con3270_fn, 1, RAW3270_VIEW_LOCK_IRQ);
INIT_LIST_HEAD(&condev->freemem); for (i = 0; i < CON3270_STRING_PAGES; i++) { diff --git a/drivers/s390/char/fs3270.c b/drivers/s390/char/fs3270.c index 71e9747380149..f0c86bcbe3161 100644 --- a/drivers/s390/char/fs3270.c +++ b/drivers/s390/char/fs3270.c @@ -463,7 +463,8 @@ fs3270_open(struct inode *inode, struct file *filp)
init_waitqueue_head(&fp->wait); fp->fs_pid = get_pid(task_pid(current)); - rc = raw3270_add_view(&fp->view, &fs3270_fn, minor); + rc = raw3270_add_view(&fp->view, &fs3270_fn, minor, + RAW3270_VIEW_LOCK_BH); if (rc) { fs3270_free_view(&fp->view); goto out; diff --git a/drivers/s390/char/raw3270.c b/drivers/s390/char/raw3270.c index 220acb4cbee52..9c350e6d75bf7 100644 --- a/drivers/s390/char/raw3270.c +++ b/drivers/s390/char/raw3270.c @@ -956,7 +956,7 @@ raw3270_deactivate_view(struct raw3270_view *view) * Add view to device with minor "minor". */ int -raw3270_add_view(struct raw3270_view *view, struct raw3270_fn *fn, int minor) +raw3270_add_view(struct raw3270_view *view, struct raw3270_fn *fn, int minor, int subclass) { unsigned long flags; struct raw3270 *rp; @@ -978,6 +978,7 @@ raw3270_add_view(struct raw3270_view *view, struct raw3270_fn *fn, int minor) view->cols = rp->cols; view->ascebc = rp->ascebc; spin_lock_init(&view->lock); + lockdep_set_subclass(&view->lock, subclass); list_add(&view->list, &rp->view_list); rc = 0; spin_unlock_irqrestore(get_ccwdev_lock(rp->cdev), flags); diff --git a/drivers/s390/char/raw3270.h b/drivers/s390/char/raw3270.h index e1e41c2861fbb..5ae54317857a0 100644 --- a/drivers/s390/char/raw3270.h +++ b/drivers/s390/char/raw3270.h @@ -155,6 +155,8 @@ struct raw3270_fn { struct raw3270_view { struct list_head list; spinlock_t lock; +#define RAW3270_VIEW_LOCK_IRQ 0 +#define RAW3270_VIEW_LOCK_BH 1 atomic_t ref_count; struct raw3270 *dev; struct raw3270_fn *fn; @@ -163,7 +165,7 @@ struct raw3270_view { unsigned char *ascebc; /* ascii -> ebcdic table */ };
-int raw3270_add_view(struct raw3270_view *, struct raw3270_fn *, int); +int raw3270_add_view(struct raw3270_view *, struct raw3270_fn *, int, int); int raw3270_activate_view(struct raw3270_view *); void raw3270_del_view(struct raw3270_view *); void raw3270_deactivate_view(struct raw3270_view *); diff --git a/drivers/s390/char/tty3270.c b/drivers/s390/char/tty3270.c index e96fc7fd94984..ab95d24b991b4 100644 --- a/drivers/s390/char/tty3270.c +++ b/drivers/s390/char/tty3270.c @@ -937,7 +937,8 @@ static int tty3270_install(struct tty_driver *driver, struct tty_struct *tty) return PTR_ERR(tp);
rc = raw3270_add_view(&tp->view, &tty3270_fn, - tty->index + RAW3270_FIRSTMINOR); + tty->index + RAW3270_FIRSTMINOR, + RAW3270_VIEW_LOCK_BH); if (rc) { tty3270_free_view(tp); return rc;
[ Upstream commit 7a223e06b1a411cef6c4cd7a9b9a33c8d225b10e ]
In the __apic_accept_irq() interface, trig_mode is an int, and on some code paths it is actually set to values that do not fit in a u8:
kvm_apic_set_irq() extracts it from 'struct kvm_lapic_irq' where trig_mode is u16. This is done on purpose as e.g. kvm_set_msi_irq() sets it to (1 << 15) & e->msi.data
kvm_apic_local_deliver sets it to reg & (1 << 15).
Fix the immediate issue by making 'tm' into u16. We may also want to adjust __apic_accept_irq() interface and use proper sizes for vector, level, trig_mode but this is not urgent.
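Why a __u8 field loses the information: the trigger mode can arrive as the raw bit 1 << 15 (0x8000), and converting that to an 8-bit type keeps only the low 8 bits, so the tracepoint would always record 0. A minimal demonstration with standard fixed-width types (as_u8()/as_u16() are illustrative helpers):

```c
#include <assert.h>
#include <stdint.h>

/* Trigger-mode bit as produced by e.g. reg & (1 << 15). */
#define TRIG_BIT (1u << 15)

/* Narrowing conversions equivalent to storing into __u8 / __u16 fields. */
static uint8_t as_u8(uint32_t v)
{
	return (uint8_t)v;
}

static uint16_t as_u16(uint32_t v)
{
	return (uint16_t)v;
}
```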
Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/trace.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h index ab9ae67a80e44..0ec94c6b47576 100644 --- a/arch/x86/kvm/trace.h +++ b/arch/x86/kvm/trace.h @@ -434,13 +434,13 @@ TRACE_EVENT(kvm_apic_ipi, );
TRACE_EVENT(kvm_apic_accept_irq, - TP_PROTO(__u32 apicid, __u16 dm, __u8 tm, __u8 vec), + TP_PROTO(__u32 apicid, __u16 dm, __u16 tm, __u8 vec), TP_ARGS(apicid, dm, tm, vec),
TP_STRUCT__entry( __field( __u32, apicid ) __field( __u16, dm ) - __field( __u8, tm ) + __field( __u16, tm ) __field( __u8, vec ) ),
[ Upstream commit f32c2877bcb068a718bb70094cd59ccc29d4d082 ]
There was a missing comparison with 0 when checking if type is "s64" or "u64". Therefore, the body of the if-statement was entered if "type" was "u64" or not "s64", which made the first strcmp() redundant since if type is "u64", it's not "s64".
If type is "s64", the body of the if-statement is not entered but since the remainder of the function consists of if-statements which will not be entered if type is "s64", we will just return "val", which is correct, albeit at the cost of a few more calls to strcmp(), i.e., it will behave just as if the if-statement was entered.
If type is neither "s64" nor "u64", the body of the if-statement will be entered incorrectly and "val" returned. This means that any type that is checked after "s64" and "u64" is handled the same way as "s64" and "u64", i.e., the limiting of "val" to fit in, for example, "s8" is never reached.
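The three cases described above can be checked directly. strcmp() returns 0 on a match, so a bare strcmp(type, "s64") is true for every type except "s64". A sketch comparing the buggy and fixed conditions (the helper names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Buggy condition: the second strcmp() lacks "== 0", so its sense is inverted. */
static bool is_64bit_buggy(const char *type)
{
	return strcmp(type, "u64") == 0 || strcmp(type, "s64");
}

/* Fixed condition from the patch. */
static bool is_64bit_fixed(const char *type)
{
	return strcmp(type, "u64") == 0 || strcmp(type, "s64") == 0;
}
```

The buggy form accepts "s8" and rejects "s64", matching the behaviour analysed in the commit message.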
This was introduced in the kernel tree when the sources were copied from trace-cmd in commit f7d82350e597 ("tools/events: Add files to create libtraceevent.a"), and in the trace-cmd repo in 1cdbae6035cei ("Implement typecasting in parser") when the function was introduced, i.e., it has always behaved the wrong way.
Detected by cppcheck.
Signed-off-by: Rikard Falkeborn rikard.falkeborn@gmail.com Reviewed-by: Steven Rostedt (VMware) rostedt@goodmis.org Cc: Tzvetomir Stoyanov tstoyanov@vmware.com Fixes: f7d82350e597 ("tools/events: Add files to create libtraceevent.a") Link: http://lkml.kernel.org/r/20190409091529.2686-1-rikard.falkeborn@gmail.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/lib/traceevent/event-parse.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c index 743746a3c50d7..df3c73e9dea49 100644 --- a/tools/lib/traceevent/event-parse.c +++ b/tools/lib/traceevent/event-parse.c @@ -2201,7 +2201,7 @@ eval_type_str(unsigned long long val, const char *type, int pointer) return val & 0xffffffff;
if (strcmp(type, "u64") == 0 || - strcmp(type, "s64")) + strcmp(type, "s64") == 0) return val;
if (strcmp(type, "s8") == 0)
[ Upstream commit 6041186a32585fc7a1d0f6cfe2f138b05fdc3c82 ]
When a module option, or core kernel argument, toggles a static-key it requires jump labels to be initialized early. While x86, PowerPC, and ARM64 arrange for jump_label_init() to be called before parse_args(), ARM does not.
Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1 console=ttyAMA0,115200 page_alloc.shuffle=1
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303 page_alloc_shuffle+0x12c/0x1ac
static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used before call to jump_label_init()
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc4-next-20190410-00003-g3367c36ce744 #1
Hardware name: ARM Integrator/CP (Device Tree)
[<c0011c68>] (unwind_backtrace) from [<c000ec48>] (show_stack+0x10/0x18)
[<c000ec48>] (show_stack) from [<c07e9710>] (dump_stack+0x18/0x24)
[<c07e9710>] (dump_stack) from [<c001bb1c>] (__warn+0xe0/0x108)
[<c001bb1c>] (__warn) from [<c001bb88>] (warn_slowpath_fmt+0x44/0x6c)
[<c001bb88>] (warn_slowpath_fmt) from [<c0b0c4a8>] (page_alloc_shuffle+0x12c/0x1ac)
[<c0b0c4a8>] (page_alloc_shuffle) from [<c0b0c550>] (shuffle_store+0x28/0x48)
[<c0b0c550>] (shuffle_store) from [<c003e6a0>] (parse_args+0x1f4/0x350)
[<c003e6a0>] (parse_args) from [<c0ac3c00>] (start_kernel+0x1c0/0x488)
Move the fallback call to jump_label_init() to occur before parse_args().
The redundant calls to jump_label_init() in other archs are left intact in case they have static key toggling use cases that are even earlier than option parsing.
Link: http://lkml.kernel.org/r/155544804466.1032396.13418949511615676665.stgit@dwi... Signed-off-by: Dan Williams dan.j.williams@intel.com Reported-by: Guenter Roeck groeck@google.com Reviewed-by: Kees Cook keescook@chromium.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Mike Rapoport rppt@linux.ibm.com Cc: Russell King rmk@armlinux.org.uk Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- init/main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/init/main.c b/init/main.c index 49926d95442f8..e88c8cdef6a7c 100644 --- a/init/main.c +++ b/init/main.c @@ -538,6 +538,8 @@ asmlinkage __visible void __init start_kernel(void) page_alloc_init();
pr_notice("Kernel command line: %s\n", boot_command_line); + /* parameters may set static keys */ + jump_label_init(); parse_early_param(); after_dashes = parse_args("Booting kernel", static_command_line, __start___param, @@ -547,8 +549,6 @@ asmlinkage __visible void __init start_kernel(void) parse_args("Setting init args", after_dashes, NULL, 0, -1, -1, NULL, set_init_arg);
- jump_label_init(); - /* * These use large bootmem allocations and must precede * kmem_cache_init()
[ Upstream commit 0261ea1bd1eb0da5c0792a9119b8655cf33c80a3 ]
We can receive ICMP errors from a client or from a tunneling real server. While the former can be scheduled to a real server, the latter should not be: such errors are decapsulated only when an existing connection is found.
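Put as a predicate, the patched check "if (ipip || !sysctl_schedule_icmp(ipvs)) return NF_ACCEPT;" means that, when no connection exists, scheduling is attempted only if the error did not arrive through a tunnel and the schedule_icmp sysctl is enabled. A small sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical reduction of the check in ip_vs_in_icmp() for the case
 * where no existing connection was found.
 */
static bool may_schedule_icmp(bool from_tunnel, bool schedule_icmp_sysctl)
{
	/* Tunnelled errors are never scheduled; client errors follow the sysctl. */
	return !from_tunnel && schedule_icmp_sysctl;
}
```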
Fixes: 6044eeffafbe ("ipvs: attempt to schedule icmp packets") Signed-off-by: Julian Anastasov ja@ssi.bg Signed-off-by: Simon Horman horms@verge.net.au Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/ipvs/ip_vs_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c index ac212542a2178..c4509a10ce52f 100644 --- a/net/netfilter/ipvs/ip_vs_core.c +++ b/net/netfilter/ipvs/ip_vs_core.c @@ -1484,7 +1484,7 @@ ip_vs_in_icmp(struct netns_ipvs *ipvs, struct sk_buff *skb, int *related, if (!cp) { int v;
- if (!sysctl_schedule_icmp(ipvs)) + if (ipip || !sysctl_schedule_icmp(ipvs)) return NF_ACCEPT;
if (!ip_vs_try_to_schedule(ipvs, AF_INET, skb, pd, &v, &cp, &ciph))
[ Upstream commit 27b141fc234a3670d21bd742c35d7205d03cbb3a ]
clang points out that the return code from this function is undefined for one of the error paths:
../drivers/s390/net/ctcm_main.c:1595:7: warning: variable 'result' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
                if (priv->channel[direction] == NULL) {
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/s390/net/ctcm_main.c:1638:9: note: uninitialized use occurs here
        return result;
               ^~~~~~
../drivers/s390/net/ctcm_main.c:1595:3: note: remove the 'if' if its condition is always false
                if (priv->channel[direction] == NULL) {
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/s390/net/ctcm_main.c:1539:12: note: initialize the variable 'result' to silence this warning
        int result;
                  ^
Make it return -ENODEV here, as in the related failure cases. gcc has a known bug in underreporting some of these warnings when it has already eliminated the assignment of the return code based on some earlier optimization step.
Reviewed-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Julian Wiedmann jwi@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/s390/net/ctcm_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/s390/net/ctcm_main.c b/drivers/s390/net/ctcm_main.c index 05c37d6d4afef..a31821d946775 100644 --- a/drivers/s390/net/ctcm_main.c +++ b/drivers/s390/net/ctcm_main.c @@ -1595,6 +1595,7 @@ static int ctcm_new_device(struct ccwgroup_device *cgdev) if (priv->channel[direction] == NULL) { if (direction == CTCM_WRITE) channel_free(priv->channel[CTCM_READ]); + result = -ENODEV; goto out_dev; } priv->channel[direction]->netdev = dev;
[ Upstream commit 30c04d796b693e22405c38e9b78e9a364e4c77e6 ]
The run_netsocktests test will be marked as passed regardless of the actual result of ./socket:
selftests: net: run_netsocktests
========================================
--------------------
running socket test
--------------------
[FAIL]
ok 1..6 selftests: net: run_netsocktests [PASS]
This happens because the test script itself executed successfully. Fix this by exiting with status 1 when the test fails.
Signed-off-by: Po-Hsu Lin po-hsu.lin@canonical.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/run_netsocktests | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/run_netsocktests b/tools/testing/selftests/net/run_netsocktests
index 16058bbea7a85..c195b44786627 100755
--- a/tools/testing/selftests/net/run_netsocktests
+++ b/tools/testing/selftests/net/run_netsocktests
@@ -6,7 +6,7 @@ echo "--------------------"
 ./socket
 if [ $? -ne 0 ]; then
 	echo "[FAIL]"
+	exit 1
 else
 	echo "[PASS]"
 fi
-
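The same rule applies to any test runner: printing a verdict is not enough, the runner's own exit status has to carry the failure. Here is a minimal C sketch of that idea. It assumes a forked child calling `_exit()` stands in for `./socket`, and the function name `run_and_propagate()` is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run a child that exits with 'child_status' and return the runner's
 * own exit code: 1 if the child failed (or could not be run), else 0.
 * Returning 0 after a failed child would reproduce the bug fixed above.
 */
static int run_and_propagate(int child_status)
{
	int wstatus;
	pid_t pid = fork();

	if (pid < 0)
		return 1;		/* could not even start the test */
	if (pid == 0)
		_exit(child_status);	/* stand-in for ./socket */
	if (waitpid(pid, &wstatus, 0) < 0)
		return 1;
	if (!WIFEXITED(wstatus) || WEXITSTATUS(wstatus) != 0)
		return 1;		/* the equivalent of "exit 1" */
	return 0;
}
```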
[ Upstream commit d4fad0a426c6e26f48c9a7cdd21a7fe9c198d645 ]
Initialize the flow input colorspaces to unknown and reset to that value when the channel gets disabled. This avoids the state getting mixed up with a previous mode.
Also keep the CSC settings for the background flow intact when disabling the foreground flow.
Root-caused-by: Jonathan Marek jonathan@marek.ca
Signed-off-by: Lucas Stach l.stach@pengutronix.de
Signed-off-by: Philipp Zabel p.zabel@pengutronix.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/ipu-v3/ipu-dp.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
--- a/drivers/gpu/ipu-v3/ipu-dp.c
+++ b/drivers/gpu/ipu-v3/ipu-dp.c
@@ -195,7 +195,8 @@ int ipu_dp_setup_channel(struct ipu_dp *
 		ipu_dp_csc_init(flow, flow->foreground.in_cs, flow->out_cs,
 				DP_COM_CONF_CSC_DEF_BOTH);
 	} else {
-		if (flow->foreground.in_cs == flow->out_cs)
+		if (flow->foreground.in_cs == IPUV3_COLORSPACE_UNKNOWN ||
+		    flow->foreground.in_cs == flow->out_cs)
 			/*
 			 * foreground identical to output, apply color
 			 * conversion on background
@@ -261,6 +262,8 @@ void ipu_dp_disable_channel(struct ipu_d
 	struct ipu_dp_priv *priv = flow->priv;
 	u32 reg, csc;
 
+	dp->in_cs = IPUV3_COLORSPACE_UNKNOWN;
+
 	if (!dp->foreground)
 		return;
 
@@ -268,8 +271,9 @@ void ipu_dp_disable_channel(struct ipu_d
 
 	reg = readl(flow->base + DP_COM_CONF);
 	csc = reg & DP_COM_CONF_CSC_DEF_MASK;
-	if (csc == DP_COM_CONF_CSC_DEF_FG)
-		reg &= ~DP_COM_CONF_CSC_DEF_MASK;
+	reg &= ~DP_COM_CONF_CSC_DEF_MASK;
+	if (csc == DP_COM_CONF_CSC_DEF_BOTH || csc == DP_COM_CONF_CSC_DEF_BG)
+		reg |= DP_COM_CONF_CSC_DEF_BG;
 
 	reg &= ~DP_COM_CONF_FG_EN;
 	writel(reg, flow->base + DP_COM_CONF);
@@ -350,6 +354,8 @@ int ipu_dp_init(struct ipu_soc *ipu, str
 	mutex_init(&priv->mutex);
 
 	for (i = 0; i < IPUV3_NUM_FLOWS; i++) {
+		priv->flow[i].background.in_cs = IPUV3_COLORSPACE_UNKNOWN;
+		priv->flow[i].foreground.in_cs = IPUV3_COLORSPACE_UNKNOWN;
 		priv->flow[i].foreground.foreground = true;
 		priv->flow[i].base = priv->base + ipu_dp_flow_base[i];
 		priv->flow[i].priv = priv;
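The register change in `ipu_dp_disable_channel()` is a read-modify-write of a two-bit field. The new code clears the CSC selection unconditionally and then restores background-only conversion if the background was being converted. Below is a user-space sketch of that bit logic. The field position and encodings (`CSC_DEF_*`) are hypothetical values chosen for illustration, not the IPU's real register layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field encoding, chosen only for this sketch. */
#define CSC_DEF_SHIFT	8
#define CSC_DEF_MASK	(3u << CSC_DEF_SHIFT)
#define CSC_DEF_NONE	(0u << CSC_DEF_SHIFT)
#define CSC_DEF_BOTH	(1u << CSC_DEF_SHIFT)
#define CSC_DEF_BG	(2u << CSC_DEF_SHIFT)
#define CSC_DEF_FG	(3u << CSC_DEF_SHIFT)

/*
 * Mirror of the patched logic: clear the CSC field, then keep
 * conversion on the background if it was applied there (alone or
 * together with the foreground). Bits outside the field are preserved.
 */
static uint32_t csc_after_fg_disable(uint32_t reg)
{
	uint32_t csc = reg & CSC_DEF_MASK;

	reg &= ~CSC_DEF_MASK;
	if (csc == CSC_DEF_BOTH || csc == CSC_DEF_BG)
		reg |= CSC_DEF_BG;
	return reg;
}
```

With the old code, a `BOTH` setting was left untouched when the foreground was disabled. In this model it becomes `BG`, which is the "keep the CSC settings for the background flow intact" behavior the commit message describes.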
From: Wei Yongjun weiyongjun1@huawei.com
commit 51c8d24101c79ffce3e79137e2cee5dfeb956dd7 upstream.
Add the missing unlock before return from function cw1200_hw_scan() in the error handling case.
Fixes: 4f68ef64cd7f ("cw1200: Fix concurrency use-after-free bugs in cw1200_hw_scan()")
Signed-off-by: Wei Yongjun weiyongjun1@huawei.com
Acked-by: Jia-Ju Bai baijiaju1990@gmail.com
Signed-off-by: Kalle Valo kvalo@codeaurora.org
[iwamatsu: Change the patching file from drivers/net/wireless/st/cw1200/scan.c to drivers/net/wireless/cw1200/scan.c]
Signed-off-by: Nobuhiro Iwamatsu nobuhiro1.iwamatsu@toshiba.co.jp
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/wireless/cw1200/scan.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/net/wireless/cw1200/scan.c
+++ b/drivers/net/wireless/cw1200/scan.c
@@ -84,8 +84,11 @@ int cw1200_hw_scan(struct ieee80211_hw *
 
 	frame.skb = ieee80211_probereq_get(hw, priv->vif->addr, NULL, 0,
 		req->ie_len);
-	if (!frame.skb)
+	if (!frame.skb) {
+		mutex_unlock(&priv->conf_mutex);
+		up(&priv->scan.lock);
 		return -ENOMEM;
+	}
 
 	if (req->ie_len)
 		memcpy(skb_put(frame.skb, req->ie_len), req->ie, req->ie_len);
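Every lock held on entry must be released on every early return. Below is a minimal user-space sketch of the fixed error path. It assumes, as the added `mutex_unlock()` and `up()` calls imply, that both the scan semaphore and the configuration mutex are held at that point. POSIX `sem_t` and `pthread_mutex_t` stand in for the kernel primitives, and `struct scan_priv` and `hw_scan_sketch()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical stand-in for the driver's private state. */
struct scan_priv {
	pthread_mutex_t conf_mutex;
	sem_t scan_lock;
};

/*
 * 'alloc_ok' simulates whether the probe-request allocation succeeded.
 * On failure, both locks are dropped before returning, mirroring
 * mutex_unlock() and up() in the patch.
 */
static int hw_scan_sketch(struct scan_priv *p, int alloc_ok)
{
	sem_wait(&p->scan_lock);
	pthread_mutex_lock(&p->conf_mutex);

	if (!alloc_ok) {
		pthread_mutex_unlock(&p->conf_mutex);
		sem_post(&p->scan_lock);
		return -ENOMEM;
	}

	/* ... the real scan work would happen here ... */
	pthread_mutex_unlock(&p->conf_mutex);
	sem_post(&p->scan_lock);
	return 0;
}
```

Without the unlock calls on the failure branch, the next scan attempt in this model would block forever in `sem_wait()`.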
From: Alistair Strachan astrachan@google.com
commit cd01544a268ad8ee5b1dfe42c4393f1095f86879 upstream.
Commit
379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
accidentally broke unwinding from userspace, because ld would strip the .eh_frame sections when linking.
Originally, the compiler would implicitly add --eh-frame-hdr when invoking the linker, but when this Makefile was converted from invoking ld via the compiler, to invoking it directly (like vmlinux does), the flag was missed. (The EH_FRAME section is important for the VDSO shared libraries, but not for vmlinux.)
Fix the problem by explicitly specifying --eh-frame-hdr, which restores parity with the old method.
See relevant bug reports for additional info:
https://bugzilla.kernel.org/show_bug.cgi?id=201741
https://bugzilla.redhat.com/show_bug.cgi?id=1659295
Fixes: 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
Reported-by: Florian Weimer fweimer@redhat.com
Reported-by: Carlos O'Donell carlos@redhat.com
Reported-by: "H. J. Lu" hjl.tools@gmail.com
Signed-off-by: Alistair Strachan astrachan@google.com
Signed-off-by: Borislav Petkov bp@suse.de
Tested-by: Laura Abbott labbott@redhat.com
Cc: Andy Lutomirski luto@kernel.org
Cc: Carlos O'Donell carlos@redhat.com
Cc: "H. Peter Anvin" hpa@zytor.com
Cc: Ingo Molnar mingo@redhat.com
Cc: Joel Fernandes joel@joelfernandes.org
Cc: kernel-team@android.com
Cc: Laura Abbott labbott@redhat.com
Cc: stable stable@vger.kernel.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: X86 ML x86@kernel.org
Link: https://lkml.kernel.org/r/20181214223637.35954-1-astrachan@google.com
Signed-off-by: Nobuhiro Iwamatsu nobuhiro1.iwamatsu@toshiba.co.jp
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/entry/vdso/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -159,7 +159,8 @@ quiet_cmd_vdso = VDSO $@
 		sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
 
 VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
-	$(call ld-option, --build-id) -Bsymbolic
+	$(call ld-option, --build-id) $(call ld-option, --eh-frame-hdr) \
+	-Bsymbolic
 GCOV_PROFILE := n
 
 #
From: Nigel Croxon ncroxon@redhat.com
commit 4f4fd7c5798bbdd5a03a60f6269cf1177fbd11ef upstream.
Changing state from check_state_check_result to check_state_compute_result not only is unsafe but also doesn't appear to serve a valid purpose. A raid6 check should only be pushing out extra writes if doing repair and a mismatch occurs. The stripe dev management will already try and do repair writes for failing sectors.
This patch makes the raid6 check_state_check_result handling work more like raid5's. If there are somehow too many failures for a check, just quit the check operation for the stripe. When any checks pass, don't try to use check_state_compute_result for a purpose it isn't needed for and is unsafe for. Just mark the stripe as in sync for passing its parity checks and let the stripe dev read/write code and the bad blocks list do their job handling I/O errors.
Repro steps from Xiao:
These are the steps to reproduce this problem:
1. redefined OPT_MEDIUM_ERR_ADDR to 12000 in scsi_debug.c
2. insmod scsi_debug.ko dev_size_mb=11000 max_luns=1 num_tgts=1
3. mdadm --create /dev/md127 --level=6 --raid-devices=5 /dev/sde1 /dev/sde2 /dev/sde3 /dev/sde5 /dev/sde6
   sde is the disk created by scsi_debug
4. echo "2" >/sys/module/scsi_debug/parameters/opts
5. raid-check
It panics:
[ 4854.730899] md: data-check of RAID array md127
[ 4854.857455] sd 5:0:0:0: [sdr] tag#80 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 4854.859246] sd 5:0:0:0: [sdr] tag#80 Sense Key : Medium Error [current]
[ 4854.860694] sd 5:0:0:0: [sdr] tag#80 Add. Sense: Unrecovered read error
[ 4854.862207] sd 5:0:0:0: [sdr] tag#80 CDB: Read(10) 28 00 00 00 2d 88 00 04 00 00
[ 4854.864196] print_req_error: critical medium error, dev sdr, sector 11656 flags 0
[ 4854.867409] sd 5:0:0:0: [sdr] tag#100 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 4854.869469] sd 5:0:0:0: [sdr] tag#100 Sense Key : Medium Error [current]
[ 4854.871206] sd 5:0:0:0: [sdr] tag#100 Add. Sense: Unrecovered read error
[ 4854.872858] sd 5:0:0:0: [sdr] tag#100 CDB: Read(10) 28 00 00 00 2e e0 00 00 08 00
[ 4854.874587] print_req_error: critical medium error, dev sdr, sector 12000 flags 4000
[ 4854.876456] sd 5:0:0:0: [sdr] tag#101 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 4854.878552] sd 5:0:0:0: [sdr] tag#101 Sense Key : Medium Error [current]
[ 4854.880278] sd 5:0:0:0: [sdr] tag#101 Add. Sense: Unrecovered read error
[ 4854.881846] sd 5:0:0:0: [sdr] tag#101 CDB: Read(10) 28 00 00 00 2e e8 00 00 08 00
[ 4854.883691] print_req_error: critical medium error, dev sdr, sector 12008 flags 4000
[ 4854.893927] sd 5:0:0:0: [sdr] tag#166 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 4854.896002] sd 5:0:0:0: [sdr] tag#166 Sense Key : Medium Error [current]
[ 4854.897561] sd 5:0:0:0: [sdr] tag#166 Add. Sense: Unrecovered read error
[ 4854.899110] sd 5:0:0:0: [sdr] tag#166 CDB: Read(10) 28 00 00 00 2e e0 00 00 10 00
[ 4854.900989] print_req_error: critical medium error, dev sdr, sector 12000 flags 0
[ 4854.902757] md/raid:md127: read error NOT corrected!! (sector 9952 on sdr1).
[ 4854.904375] md/raid:md127: read error NOT corrected!! (sector 9960 on sdr1).
[ 4854.906201] ------------[ cut here ]------------
[ 4854.907341] kernel BUG at drivers/md/raid5.c:4190!
raid5.c:4190 above is this BUG_ON:
handle_parity_checks6()
...
	BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */
Cc: stable@vger.kernel.org # v3.16+
OriginalAuthor: David Jeffery djeffery@redhat.com
Cc: Xiao Ni xni@redhat.com
Tested-by: David Jeffery djeffery@redhat.com
Signed-off-by: David Jeffy djeffery@redhat.com
Signed-off-by: Nigel Croxon ncroxon@redhat.com
Signed-off-by: Song Liu songliubraving@fb.com
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/md/raid5.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3897,26 +3897,15 @@ static void handle_parity_checks6(struct
 	case check_state_check_result:
 		sh->check_state = check_state_idle;
 
+		if (s->failed > 1)
+			break;
 		/* handle a successful check operation, if parity is correct
 		 * we are done. Otherwise update the mismatch count and repair
 		 * parity if !MD_RECOVERY_CHECK
 		 */
 		if (sh->ops.zero_sum_result == 0) {
-			/* both parities are correct */
-			if (!s->failed)
-				set_bit(STRIPE_INSYNC, &sh->state);
-			else {
-				/* in contrast to the raid5 case we can validate
-				 * parity, but still have a failure to write
-				 * back
-				 */
-				sh->check_state = check_state_compute_result;
-				/* Returning at this point means that we may go
-				 * off and bring p and/or q uptodate again so
-				 * we make sure to check zero_sum_result again
-				 * to verify if p or q need writeback
-				 */
-			}
+			/* Any parity checked was correct */
+			set_bit(STRIPE_INSYNC, &sh->state);
 		} else {
 			atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches);
 			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
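After the patch, the `check_state_check_result` branch reduces to a three-way decision. The sketch below models only that control flow. The enum names and `check_result_sketch()` are hypothetical, and all stripe, bitmap and repair handling is left out.

```c
#include <assert.h>

/* Hypothetical outcomes for one stripe after a RAID-6 parity check. */
enum check_outcome { CHECK_QUIT, CHECK_INSYNC, CHECK_MISMATCH };

/*
 * failed > 1            -> too many failures, abandon this stripe's check
 * zero_sum_result == 0  -> every parity that was checked is correct,
 *                          so mark the stripe in sync (no compute step)
 * otherwise             -> count a mismatch (and repair unless checking)
 */
static enum check_outcome check_result_sketch(int failed, int zero_sum_result)
{
	if (failed > 1)
		return CHECK_QUIT;
	if (zero_sum_result == 0)
		return CHECK_INSYNC;
	return CHECK_MISMATCH;
}
```

The case that changed is `failed == 1` with a clean check. The old code moved the stripe to `check_state_compute_result` there. The patched code marks it in sync.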
From: Tony Luck tony.luck@intel.com
commit b8fb03785d4de097507d0cf45873525e0ac4d2b2 upstream.
We will need to provide declarations of static keys in header files. Provide DECLARE_STATIC_KEY_{TRUE,FALSE} macros.
Signed-off-by: Tony Luck tony.luck@intel.com
Acked-by: Borislav Petkov bp@suse.de
Cc: Peter Zijlstra peterz@infradead.org
Cc: Dan Williams dan.j.williams@intel.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/816881cf85bd3cf13385d212882618f38a3b5d33.1472754711...
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/jump_label.h | 6 ++++++
 1 file changed, 6 insertions(+)
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -267,9 +267,15 @@ struct static_key_false {
 #define DEFINE_STATIC_KEY_TRUE(name) \
 	struct static_key_true name = STATIC_KEY_TRUE_INIT
 
+#define DECLARE_STATIC_KEY_TRUE(name) \
+	extern struct static_key_true name
+
 #define DEFINE_STATIC_KEY_FALSE(name) \
 	struct static_key_false name = STATIC_KEY_FALSE_INIT
 
+#define DECLARE_STATIC_KEY_FALSE(name) \
+	extern struct static_key_false name
+
 extern bool ____wrong_branch_error(void);
 
 #define static_key_enabled(x) \
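The new macros split a static key into an `extern` declaration for headers and a single definition in one `.c` file. The sketch below shows that split in user space. `struct static_key_false` and `STATIC_KEY_FALSE_INIT` here are simplified stand-ins (a plain `bool`), not the kernel's jump-label machinery, and `my_feature_key` is a hypothetical key name.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified user-space stand-in for the kernel type. */
struct static_key_false {
	bool enabled;
};
#define STATIC_KEY_FALSE_INIT { .enabled = false }

/* Header-safe declaration, usable from any number of files ... */
#define DECLARE_STATIC_KEY_FALSE(name) \
	extern struct static_key_false name
/* ... and the single definition that lives in exactly one .c file. */
#define DEFINE_STATIC_KEY_FALSE(name) \
	struct static_key_false name = STATIC_KEY_FALSE_INIT

/* What a shared header would contain: */
DECLARE_STATIC_KEY_FALSE(my_feature_key);

/* What exactly one .c file would contain: */
DEFINE_STATIC_KEY_FALSE(my_feature_key);

static bool my_feature_enabled(void)
{
	return my_feature_key.enabled;
}
```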
From: Borislav Petkov bp@suse.de
commit 4167709bbf826512a52ebd6aafda2be104adaec9 upstream.
Since on Intel we're required to do CPUID(1) first, before reading the microcode revision MSR, let's add a special helper which does the required steps so that we don't forget to do them next time, when we want to read the microcode revision.
Signed-off-by: Borislav Petkov bp@suse.de
Link: http://lkml.kernel.org/r/20170109114147.5082-4-bp@alien8.de
Signed-off-by: Thomas Gleixner tglx@linutronix.de
[bwh: Backported to 4.4:
 - Don't touch prev_rev variable in apply_microcode()
 - Keep using sync_core(), which always includes the necessary CPUID
 - Adjust context]
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/microcode_intel.h | 15 ++++++++++++
 arch/x86/kernel/cpu/intel.c | 11 ++-------
 arch/x86/kernel/cpu/microcode/intel.c | 39 +++++++++------------------------
 3 files changed, 29 insertions(+), 36 deletions(-)
--- a/arch/x86/include/asm/microcode_intel.h
+++ b/arch/x86/include/asm/microcode_intel.h
@@ -53,6 +53,21 @@ struct extended_sigtable {
 
 #define exttable_size(et) ((et)->count * EXT_SIGNATURE_SIZE + EXT_HEADER_SIZE)
 
+static inline u32 intel_get_microcode_revision(void)
+{
+	u32 rev, dummy;
+
+	native_wrmsrl(MSR_IA32_UCODE_REV, 0);
+
+	/* As documented in the SDM: Do a CPUID 1 here */
+	sync_core();
+
+	/* get the current revision from MSR 0x8B */
+	native_rdmsr(MSR_IA32_UCODE_REV, dummy, rev);
+
+	return rev;
+}
+
 extern int has_newer_microcode(void *mc, unsigned int csig, int cpf, int rev);
 extern int microcode_sanity_check(void *mc, int print_err);
 extern int find_matching_signature(void *mc, unsigned int csig, int cpf);
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -14,6 +14,7 @@
 #include <asm/bugs.h>
 #include <asm/cpu.h>
 #include <asm/intel-family.h>
+#include <asm/microcode_intel.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -102,14 +103,8 @@ static void early_init_intel(struct cpui
 	    (c->x86 == 0x6 && c->x86_model >= 0x0e))
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
 
-	if (c->x86 >= 6 && !cpu_has(c, X86_FEATURE_IA64)) {
-		unsigned lower_word;
-
-		wrmsr(MSR_IA32_UCODE_REV, 0, 0);
-		/* Required by the SDM */
-		sync_core();
-		rdmsr(MSR_IA32_UCODE_REV, lower_word, c->microcode);
-	}
+	if (c->x86 >= 6 && !cpu_has(c, X86_FEATURE_IA64))
+		c->microcode = intel_get_microcode_revision();
 
 	/* Now if any of them are set, check the blacklist and clear the lot */
 	if ((cpu_has(c, X86_FEATURE_SPEC_CTRL) ||
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -376,15 +376,8 @@ static int collect_cpu_info_early(struct
 		native_rdmsr(MSR_IA32_PLATFORM_ID, val[0], val[1]);
 		csig.pf = 1 << ((val[1] >> 18) & 7);
 	}
-	native_wrmsr(MSR_IA32_UCODE_REV, 0, 0);
 
-	/* As documented in the SDM: Do a CPUID 1 here */
-	sync_core();
-
-	/* get the current revision from MSR 0x8B */
-	native_rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
-
-	csig.rev = val[1];
+	csig.rev = intel_get_microcode_revision();
 
 	uci->cpu_sig = csig;
 	uci->valid = 1;
@@ -654,7 +647,7 @@ static inline void print_ucode(struct uc
 static int apply_microcode_early(struct ucode_cpu_info *uci, bool early)
 {
 	struct microcode_intel *mc_intel;
-	unsigned int val[2];
+	u32 rev;
 
 	mc_intel = uci->mc;
 	if (mc_intel == NULL)
@@ -664,21 +657,16 @@ static int apply_microcode_early(struct
 	native_wrmsr(MSR_IA32_UCODE_WRITE,
 	      (unsigned long) mc_intel->bits,
 	      (unsigned long) mc_intel->bits >> 16 >> 16);
-	native_wrmsr(MSR_IA32_UCODE_REV, 0, 0);
 
-	/* As documented in the SDM: Do a CPUID 1 here */
-	sync_core();
-
-	/* get the current revision from MSR 0x8B */
-	native_rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
-	if (val[1] != mc_intel->hdr.rev)
+	rev = intel_get_microcode_revision();
+	if (rev != mc_intel->hdr.rev)
 		return -1;
 
 #ifdef CONFIG_X86_64
 	/* Flush global tlb. This is precaution. */
 	flush_tlb_early();
 #endif
-	uci->cpu_sig.rev = val[1];
+	uci->cpu_sig.rev = rev;
 
 	if (early)
 		print_ucode(uci);
@@ -852,7 +840,7 @@ static int apply_microcode_intel(int cpu
 {
 	struct microcode_intel *mc_intel;
 	struct ucode_cpu_info *uci;
-	unsigned int val[2];
+	u32 rev;
 	int cpu_num = raw_smp_processor_id();
 	struct cpuinfo_x86 *c = &cpu_data(cpu_num);
 
@@ -877,27 +865,22 @@ static int apply_microcode_intel(int cpu
 	wrmsr(MSR_IA32_UCODE_WRITE,
 	      (unsigned long) mc_intel->bits,
 	      (unsigned long) mc_intel->bits >> 16 >> 16);
-	wrmsr(MSR_IA32_UCODE_REV, 0, 0);
-
-	/* As documented in the SDM: Do a CPUID 1 here */
-	sync_core();
 
-	/* get the current revision from MSR 0x8B */
-	rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
+	rev = intel_get_microcode_revision();
 
-	if (val[1] != mc_intel->hdr.rev) {
+	if (rev != mc_intel->hdr.rev) {
 		pr_err("CPU%d update to revision 0x%x failed\n",
 		       cpu_num, mc_intel->hdr.rev);
 		return -1;
 	}
 	pr_info("CPU%d updated to revision 0x%x, date = %04x-%02x-%02x\n",
-		cpu_num, val[1],
+		cpu_num, rev,
 		mc_intel->hdr.date & 0xffff,
 		mc_intel->hdr.date >> 24,
 		(mc_intel->hdr.date >> 16) & 0xff);
 
-	uci->cpu_sig.rev = val[1];
-	c->microcode = val[1];
+	uci->cpu_sig.rev = rev;
+	c->microcode = rev;
 
 	return 0;
 }
From: Nicolas Dichtel nicolas.dichtel@6wind.com
commit 25dc1d6cc3082aab293e5dad47623b550f7ddd2a upstream.
Even if this file was not in an uapi directory, it was exported because it was listed in the Kbuild file.
Fixes: b72e7464e4cf ("x86/uapi: Do not export <asm/msr-index.h> as part of the user API headers")
Suggested-by: Borislav Petkov bp@alien8.de
Signed-off-by: Nicolas Dichtel nicolas.dichtel@6wind.com
Acked-by: Ingo Molnar mingo@kernel.org
Acked-by: Thomas Gleixner tglx@linutronix.de
Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/uapi/asm/Kbuild | 1 -
 1 file changed, 1 deletion(-)
--- a/arch/x86/include/uapi/asm/Kbuild
+++ b/arch/x86/include/uapi/asm/Kbuild
@@ -27,7 +27,6 @@ header-y += ldt.h
 header-y += mce.h
 header-y += mman.h
 header-y += msgbuf.h
-header-y += msr-index.h
 header-y += msr.h
 header-y += mtrr.h
 header-y += param.h
Hi!
> commit 25dc1d6cc3082aab293e5dad47623b550f7ddd2a upstream.
> Even if this file was not in an uapi directory, it was exported because it was listed in the Kbuild file.
While good idea for mainline, I don't think this belongs to stable.
Dropping it should not result in problems.
Pavel
On Fri, 2019-05-17 at 10:06 +0200, Pavel Machek wrote:
> Hi!
> > commit 25dc1d6cc3082aab293e5dad47623b550f7ddd2a upstream.
> > Even if this file was not in an uapi directory, it was exported because it was listed in the Kbuild file.
> While good idea for mainline, I don't think this belongs to stable.
> Dropping it should not result in problems.
If we apply "x86/msr-index: Cleanup bit defines" and not this, then "make headers_install" stops working.
Ben.
From: Matthias Kaehlcke mka@chromium.org
commit c32ee3d9abd284b4fcaacc250b101f93829c7bae upstream.
GENMASK(_ULL) performs a left shift of ~0UL(L), which technically results in an integer overflow. clang raises a warning if the overflow occurs in a preprocessor expression. Clear the low-order bits through a subtraction instead of the left shift to avoid the overflow.
(akpm: no change in .text size in my testing)
Link: http://lkml.kernel.org/r/20170803212020.24939-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke mka@chromium.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/bitops.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -19,10 +19,11 @@
  * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
  */
 #define GENMASK(h, l) \
-	(((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
+	(((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
 
 #define GENMASK_ULL(h, l) \
-	(((~0ULL) << (l)) & (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))
+	(((~0ULL) - (1ULL << (l)) + 1) & \
+	 (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))
 
 extern unsigned int __sw_hweight8(unsigned int w);
 extern unsigned int __sw_hweight16(unsigned int w);
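The new formula builds the low-bit clear differently. `~0ULL - (1ULL << l) + 1` equals `~0ULL - ((1ULL << l) - 1)`, which clears exactly bits `0..l-1`, the same result the old left shift gave. The right-shifted term then keeps bits `0..h`. Below is a user-space copy of the patched `GENMASK_ULL()`, assuming a 64-bit `unsigned long long` (`BITS_PER_LONG_LONG` is defined locally for the sketch). It is also evaluated inside `#if`, the preprocessor context the clang warning was about.

```c
#include <assert.h>

#define BITS_PER_LONG_LONG 64	/* assumption: 64-bit unsigned long long */

/* Patched form: bits l..h (inclusive) set, with no left shift of ~0ULL. */
#define GENMASK_ULL(h, l) \
	(((~0ULL) - (1ULL << (l)) + 1) & \
	 (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))

/* Preprocessor use: the value documented in the bitops.h comment. */
#if GENMASK_ULL(39, 21) != 0x000000ffffe00000ULL
#error "GENMASK_ULL(39, 21) does not match the documented value"
#endif
```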
From: Ashok Raj ashok.raj@intel.com
commit c182d2b7d0ca48e0d6ff16f7d883161238c447ed upstream.
After updating microcode on one of the threads of a core, the other thread sibling automatically gets the update since the microcode resources on a hyperthreaded core are shared between the two threads.
Check the microcode revision on the CPU before performing a microcode update and thus save us the WRMSR 0x79 because it is a particularly expensive operation.
[ Borislav: Massage changelog and coding style. ]
Signed-off-by: Ashok Raj ashok.raj@intel.com
Signed-off-by: Borislav Petkov bp@suse.de
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Tested-by: Tom Lendacky thomas.lendacky@amd.com
Tested-by: Ashok Raj ashok.raj@intel.com
Cc: Arjan Van De Ven arjan.van.de.ven@intel.com
Link: http://lkml.kernel.org/r/1519352533-15992-2-git-send-email-ashok.raj@intel.c...
Link: https://lkml.kernel.org/r/20180228102846.13447-3-bp@alien8.de
[bwh: Backported to 4.4:
 - s/mc->/mc_intel->/
 - Return 0 in this case
 - Adjust context]
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kernel/cpu/microcode/intel.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -653,6 +653,17 @@ static int apply_microcode_early(struct
 	if (mc_intel == NULL)
 		return 0;
 
+	/*
+	 * Save us the MSR write below - which is a particular expensive
+	 * operation - when the other hyperthread has updated the microcode
+	 * already.
+	 */
+	rev = intel_get_microcode_revision();
+	if (rev >= mc_intel->hdr.rev) {
+		uci->cpu_sig.rev = rev;
+		return 0;
+	}
+
 	/* write microcode via MSR 0x79 */
 	native_wrmsr(MSR_IA32_UCODE_WRITE,
 	      (unsigned long) mc_intel->bits,
@@ -861,6 +872,18 @@ static int apply_microcode_intel(int cpu
 	if (get_matching_mc(mc_intel, cpu) == 0)
 		return 0;
 
+	/*
+	 * Save us the MSR write below - which is a particular expensive
+	 * operation - when the other hyperthread has updated the microcode
+	 * already.
+	 */
+	rev = intel_get_microcode_revision();
+	if (rev >= mc_intel->hdr.rev) {
+		uci->cpu_sig.rev = rev;
+		c->microcode = rev;
+		return 0;
+	}
+
 	/* write microcode via MSR 0x79 */
 	wrmsr(MSR_IA32_UCODE_WRITE,
 	      (unsigned long) mc_intel->bits,
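The early-out is a compare-before-write. Read the running revision first, and perform the expensive update only if the candidate patch is strictly newer. Otherwise, just record the revision the sibling thread already loaded. Below is a minimal sketch of that decision. `struct cpu_ucode` and `maybe_apply()` are hypothetical. The `running_rev` argument stands in for `intel_get_microcode_revision()`, and the `writes` counter stands in for the WRMSR 0x79 path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU record; the kernel keeps this in ucode_cpu_info. */
struct cpu_ucode {
	uint32_t rev;
};

/*
 * Returns 1 if an update was attempted, 0 if it was skipped because
 * the running revision is already at least as new as the patch.
 */
static int maybe_apply(struct cpu_ucode *cpu, uint32_t running_rev,
		       uint32_t patch_rev, int *writes)
{
	if (running_rev >= patch_rev) {
		cpu->rev = running_rev;	/* sibling thread already updated us */
		return 0;
	}
	(*writes)++;		/* the expensive MSR write would go here */
	cpu->rev = patch_rev;	/* sketch assumes the update succeeded */
	return 1;
}
```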
From: Tony Luck tony.luck@intel.com
commit fa94d0c6e0f3431523f5701084d799c77c7d4a4f upstream.
Updating microcode used to be relatively rare. Now that it has become more common we should save the microcode version in a machine check record to make sure that those people looking at the error have this important information bundled with the rest of the logged information.
[ Borislav: Simplify a bit. ]
Signed-off-by: Tony Luck tony.luck@intel.com
Signed-off-by: Borislav Petkov bp@suse.de
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Cc: Yazen Ghannam yazen.ghannam@amd.com
Cc: linux-edac linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/20180301233449.24311-1-tony.luck@intel.com
[bwh: Backported to 4.4:
 - Also add earlier fields to struct mce, to match upstream UAPI
 - Adjust context]
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/uapi/asm/mce.h | 4 ++++
 arch/x86/kernel/cpu/mcheck/mce.c | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)
--- a/arch/x86/include/uapi/asm/mce.h
+++ b/arch/x86/include/uapi/asm/mce.h
@@ -26,6 +26,10 @@ struct mce {
 	__u32 socketid;	/* CPU socket ID */
 	__u32 apicid;	/* CPU initial apic ID */
 	__u64 mcgcap;	/* MCGCAP MSR: machine check capabilities of CPU */
+	__u64 synd;	/* MCA_SYND MSR: only valid on SMCA systems */
+	__u64 ipid;	/* MCA_IPID MSR: only valid on SMCA systems */
+	__u64 ppin;	/* Protected Processor Inventory Number */
+	__u32 microcode;/* Microcode revision */
 };
 
 #define MCE_GET_RECORD_LEN _IOR('M', 1, int)
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -138,6 +138,8 @@ void mce_setup(struct mce *m)
 	m->socketid = cpu_data(m->extcpu).phys_proc_id;
 	m->apicid = cpu_data(m->extcpu).initial_apicid;
 	rdmsrl(MSR_IA32_MCG_CAP, m->mcgcap);
+
+	m->microcode = boot_cpu_data.microcode;
 }
 
 DEFINE_PER_CPU(struct mce, injectm);
@@ -258,7 +260,7 @@ static void print_mce(struct mce *m)
 	 */
 	pr_emerg(HW_ERR "PROCESSOR %u:%x TIME %llu SOCKET %u APIC %x microcode %x\n",
 		m->cpuvendor, m->cpuid, m->time, m->socketid, m->apicid,
-		cpu_data(m->extcpu).microcode);
+		m->microcode);
 
 	/*
 	 * Print out human-readable details about the MCE error,
From: Ben Hutchings ben@decadent.org.uk
Hide the AMD_{IBRS,IBPB,STIBP} flags from /proc/cpuinfo. This was done upstream as part of commit e7c587da1252 "x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP". That commit has already been backported, but this part was omitted.
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/cpufeatures.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -265,9 +265,9 @@
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (ebx), word 13 */
 #define X86_FEATURE_CLZERO (13*32+0) /* CLZERO instruction */
-#define X86_FEATURE_AMD_IBPB (13*32+12) /* Indirect Branch Prediction Barrier */
-#define X86_FEATURE_AMD_IBRS (13*32+14) /* Indirect Branch Restricted Speculation */
-#define X86_FEATURE_AMD_STIBP (13*32+15) /* Single Thread Indirect Branch Predictors */
+#define X86_FEATURE_AMD_IBPB (13*32+12) /* "" Indirect Branch Prediction Barrier */
+#define X86_FEATURE_AMD_IBRS (13*32+14) /* "" Indirect Branch Restricted Speculation */
+#define X86_FEATURE_AMD_STIBP (13*32+15) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */
 
 /* Thermal and Power Management Leaf, CPUID level 0x00000006 (eax), word 14 */
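In cpufeatures.h, a comment that begins with `""` gives the feature an empty name. As we understand the 4.4 build, the flag-name table is generated from these comments and has no entry for such features, so /proc/cpuinfo skips them. The sketch below models only that last step. `cap_flags[]` and `format_flags()` are hypothetical, and a `NULL` entry stands for a `""` feature.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical flag-name table: a NULL entry models a feature whose
 * cpufeatures.h comment begins with "", i.e. one that has no name.
 */
static const char *const cap_flags[] = {
	"clzero",	/* visible */
	NULL,		/* hidden, like AMD_IBPB after this patch */
	"virt_ssbd",	/* visible */
};

/* Build a cpuinfo-style flags string, skipping unnamed features. */
static void format_flags(char *buf, size_t len)
{
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(cap_flags) / sizeof(cap_flags[0]); i++) {
		if (!cap_flags[i])
			continue;
		strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, cap_flags[i], len - strlen(buf) - 1);
	}
}
```

The bit stays set internally, so `boot_cpu_has()`-style checks still see it. Only the user-visible name disappears.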
From: Sai Praneeth sai.praneeth.prakhya@intel.com
commit 706d51681d636a0c4a5ef53395ec3b803e45ed4d upstream.
Future Intel processors will support "Enhanced IBRS" which is an "always on" mode i.e. IBRS bit in SPEC_CTRL MSR is enabled once and never disabled.
From the specification [1]:
"With enhanced IBRS, the predicted targets of indirect branches executed cannot be controlled by software that was executed in a less privileged predictor mode or on another logical processor. As a result, software operating on a processor with enhanced IBRS need not use WRMSR to set IA32_SPEC_CTRL.IBRS after every transition to a more privileged predictor mode. Software can isolate predictor modes effectively simply by setting the bit once. Software need not disable enhanced IBRS prior to entering a sleep state such as MWAIT or HLT."
If Enhanced IBRS is supported by the processor then use it as the preferred spectre v2 mitigation mechanism instead of Retpoline. Intel's Retpoline white paper [2] states:
"Retpoline is known to be an effective branch target injection (Spectre variant 2) mitigation on Intel processors belonging to family 6 (enumerated by the CPUID instruction) that do not have support for enhanced IBRS. On processors that support enhanced IBRS, it should be used for mitigation instead of retpoline."
The reason why Enhanced IBRS is the recommended mitigation on processors which support it is that these processors also support CET which provides a defense against ROP attacks. Retpoline is very similar to ROP techniques and might trigger false positives in the CET defense.
If Enhanced IBRS is selected as the mitigation technique for spectre v2, the IBRS bit in SPEC_CTRL MSR is set once at boot time and never cleared. Kernel also has to make sure that IBRS bit remains set after VMEXIT because the guest might have cleared the bit. This is already covered by the existing x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() speculation control functions.
Enhanced IBRS still requires IBPB for full mitigation.
[1] Speculative-Execution-Side-Channel-Mitigations.pdf [2] Retpoline-A-Branch-Target-Injection-Mitigation.pdf Both documents are available at: https://bugzilla.kernel.org/show_bug.cgi?id=199511
Originally-by: David Woodhouse dwmw@amazon.co.uk
Signed-off-by: Sai Praneeth Prakhya sai.praneeth.prakhya@intel.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Cc: Tim C Chen tim.c.chen@intel.com
Cc: Dave Hansen dave.hansen@intel.com
Cc: Ravi Shankar ravi.v.shankar@intel.com
Link: https://lkml.kernel.org/r/1533148945-24095-1-git-send-email-sai.praneeth.pra...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
[bwh: Backported to 4.4:
 - Use the next bit from feature word 7
 - Adjust context]
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/include/asm/nospec-branch.h | 1 +
 arch/x86/kernel/cpu/bugs.c | 20 ++++++++++++++++++--
 arch/x86/kernel/cpu/common.c | 3 +++
 4 files changed, 23 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -214,6 +214,7 @@ #define X86_FEATURE_STIBP ( 7*32+27) /* Single Thread Indirect Branch Predictors */ #define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */ #define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */ +#define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */
/* Virtualization flags: Linux defined, word 8 */ #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */ --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -170,6 +170,7 @@ enum spectre_v2_mitigation { SPECTRE_V2_RETPOLINE_GENERIC, SPECTRE_V2_RETPOLINE_AMD, SPECTRE_V2_IBRS, + SPECTRE_V2_IBRS_ENHANCED, };
/* The Speculative Store Bypass disable variants */ --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -132,6 +132,7 @@ static const char *spectre_v2_strings[] [SPECTRE_V2_RETPOLINE_MINIMAL_AMD] = "Vulnerable: Minimal AMD ASM retpoline", [SPECTRE_V2_RETPOLINE_GENERIC] = "Mitigation: Full generic retpoline", [SPECTRE_V2_RETPOLINE_AMD] = "Mitigation: Full AMD retpoline", + [SPECTRE_V2_IBRS_ENHANCED] = "Mitigation: Enhanced IBRS", };
#undef pr_fmt @@ -332,6 +333,13 @@ static void __init spectre_v2_select_mit
case SPECTRE_V2_CMD_FORCE: case SPECTRE_V2_CMD_AUTO: + if (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED)) { + mode = SPECTRE_V2_IBRS_ENHANCED; + /* Force it so VMEXIT will restore correctly */ + x86_spec_ctrl_base |= SPEC_CTRL_IBRS; + wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); + goto specv2_set_mode; + } if (IS_ENABLED(CONFIG_RETPOLINE)) goto retpoline_auto; break; @@ -369,6 +377,7 @@ retpoline_auto: setup_force_cpu_cap(X86_FEATURE_RETPOLINE); }
+specv2_set_mode: spectre_v2_enabled = mode; pr_info("%s\n", spectre_v2_strings[mode]);
@@ -391,9 +400,16 @@ retpoline_auto:
/* * Retpoline means the kernel is safe because it has no indirect - * branches. But firmware isn't, so use IBRS to protect that. + * branches. Enhanced IBRS protects firmware too, so, enable restricted + * speculation around firmware calls only when Enhanced IBRS isn't + * supported. + * + * Use "mode" to check Enhanced IBRS instead of boot_cpu_has(), because + * the user might select retpoline on the kernel command line and if + * the CPU supports Enhanced IBRS, kernel might un-intentionally not + * enable IBRS around firmware calls. */ - if (boot_cpu_has(X86_FEATURE_IBRS)) { + if (boot_cpu_has(X86_FEATURE_IBRS) && mode != SPECTRE_V2_IBRS_ENHANCED) { setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW); pr_info("Enabling Restricted Speculation for firmware calls\n"); } --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -915,6 +915,9 @@ static void __init cpu_set_bug_bits(stru setup_force_cpu_bug(X86_BUG_SPECTRE_V1); setup_force_cpu_bug(X86_BUG_SPECTRE_V2);
+ if (ia32_cap & ARCH_CAP_IBRS_ALL) + setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED); + if (x86_match_cpu(cpu_no_meltdown)) return;
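The selection logic in the hunks above (prefer Enhanced IBRS when IA32_ARCH_CAPABILITIES advertises IBRS_ALL, otherwise fall back to retpoline, and only use IBRS around firmware calls when the selected mode is not Enhanced IBRS) can be modelled in a small userspace sketch. The enum, struct and function names are hypothetical stand-ins, not kernel APIs. The one hardware fact assumed is that ARCH_CAP_IBRS_ALL is bit 1 of IA32_ARCH_CAPABILITIES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's spectre_v2 mitigation modes. */
enum v2_mode { V2_NONE, V2_RETPOLINE, V2_IBRS_ENHANCED };

/* Assumed: IA32_ARCH_CAPABILITIES bit 1 (ARCH_CAP_IBRS_ALL). */
#define SKETCH_ARCH_CAP_IBRS_ALL (1ULL << 1)

struct caps {
	bool ibrs;           /* models X86_FEATURE_IBRS */
	uint64_t arch_caps;  /* value read from IA32_ARCH_CAPABILITIES */
	bool retpoline_cfg;  /* models CONFIG_RETPOLINE */
};

/* force_retpoline models "spectre_v2=retpoline" on the command line. */
static enum v2_mode select_v2_mode(const struct caps *c, bool force_retpoline)
{
	if (!force_retpoline && (c->arch_caps & SKETCH_ARCH_CAP_IBRS_ALL))
		return V2_IBRS_ENHANCED;
	return c->retpoline_cfg ? V2_RETPOLINE : V2_NONE;
}

/* Check the selected mode, not the CPU feature: a CPU with Enhanced IBRS
 * running in retpoline mode still needs IBRS around firmware calls. */
static bool use_ibrs_for_firmware(const struct caps *c, enum v2_mode mode)
{
	return c->ibrs && mode != V2_IBRS_ENHANCED;
}
```

This is why the patch compares "mode" rather than calling boot_cpu_has() again: the second case in the test below would otherwise leave firmware unprotected.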
From: Dominik Brodowski linux@dominikbrodowski.net
commit 8ecc4979b1bd9c94168e6fc92960033b7a951336 upstream.
Only CPUs which speculate can speculate. Therefore, it seems prudent to test for cpu_no_speculation first and only then determine whether a specific speculating CPU is susceptible to store bypass speculation. This is underlined by the fact that all CPUs currently listed in cpu_no_speculation were present in cpu_no_spec_store_bypass as well.
Signed-off-by: Dominik Brodowski linux@dominikbrodowski.net Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: bp@suse.de Cc: konrad.wilk@oracle.com Link: https://lkml.kernel.org/r/20180522090539.GA24668@light.dominikbrodowski.net Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/common.c | 22 +++++++--------------- 1 file changed, 7 insertions(+), 15 deletions(-)
--- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -859,12 +859,8 @@ static const __initconst struct x86_cpu_ {} };
+/* Only list CPUs which speculate but are non susceptible to SSB */ static const __initconst struct x86_cpu_id cpu_no_spec_store_bypass[] = { - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_PINEVIEW }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_LINCROFT }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_PENWELL }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CLOVERVIEW }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CEDARVIEW }, { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT1 }, { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT }, { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT2 }, @@ -872,14 +868,10 @@ static const __initconst struct x86_cpu_ { X86_VENDOR_INTEL, 6, INTEL_FAM6_CORE_YONAH }, { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNL }, { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNM }, - { X86_VENDOR_CENTAUR, 5, }, - { X86_VENDOR_INTEL, 5, }, - { X86_VENDOR_NSC, 5, }, { X86_VENDOR_AMD, 0x12, }, { X86_VENDOR_AMD, 0x11, }, { X86_VENDOR_AMD, 0x10, }, { X86_VENDOR_AMD, 0xf, }, - { X86_VENDOR_ANY, 4, }, {} };
@@ -902,6 +894,12 @@ static void __init cpu_set_bug_bits(stru { u64 ia32_cap = 0;
+ if (x86_match_cpu(cpu_no_speculation)) + return; + + setup_force_cpu_bug(X86_BUG_SPECTRE_V1); + setup_force_cpu_bug(X86_BUG_SPECTRE_V2); + if (cpu_has(c, X86_FEATURE_ARCH_CAPABILITIES)) rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);
@@ -909,12 +907,6 @@ static void __init cpu_set_bug_bits(stru !(ia32_cap & ARCH_CAP_SSB_NO)) setup_force_cpu_bug(X86_BUG_SPEC_STORE_BYPASS);
- if (x86_match_cpu(cpu_no_speculation)) - return; - - setup_force_cpu_bug(X86_BUG_SPECTRE_V1); - setup_force_cpu_bug(X86_BUG_SPECTRE_V2); - if (ia32_cap & ARCH_CAP_IBRS_ALL) setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
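The reordering above can be summarised as a pure function: a CPU on the no-speculation list gets no bug bits at all, and only speculating CPUs are then checked for store bypass. This is a minimal sketch; the bug-bit macros and the function are hypothetical, and ARCH_CAP_SSB_NO is assumed to be bit 4 of IA32_ARCH_CAPABILITIES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bug bits standing in for X86_BUG_* flags. */
#define BUG_SPECTRE_V1 (1u << 0)
#define BUG_SPECTRE_V2 (1u << 1)
#define BUG_SSB        (1u << 2)

/* Assumed: IA32_ARCH_CAPABILITIES bit 4 (ARCH_CAP_SSB_NO). */
#define SKETCH_ARCH_CAP_SSB_NO (1ULL << 4)

static unsigned int bug_bits(bool no_speculation, bool in_no_ssb_list,
			     uint64_t arch_caps)
{
	unsigned int bugs = 0;

	/* Non-speculating CPUs cannot be affected by any of these bugs. */
	if (no_speculation)
		return 0;

	bugs |= BUG_SPECTRE_V1 | BUG_SPECTRE_V2;

	/* Only speculating CPUs need the store bypass check. */
	if (!in_no_ssb_list && !(arch_caps & SKETCH_ARCH_CAP_SSB_NO))
		bugs |= BUG_SSB;

	return bugs;
}
```

With this ordering, cpu_no_spec_store_bypass only has to list CPUs that speculate but are not affected, which is why the patch drops the old in-order entries from it.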
From: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
commit 24809860012e0130fbafe536709e08a22b3e959e upstream.
The AMD document outlining the SSBD handling, 124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf, states that when CPUID 8000_0008.EBX[26] is set, the speculative store bypass disable is no longer needed.
A copy of this document is available at: https://bugzilla.kernel.org/show_bug.cgi?id=199889
Signed-off-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Janakarajan Natarajan Janakarajan.Natarajan@amd.com Cc: kvm@vger.kernel.org Cc: andrew.cooper3@citrix.com Cc: Andy Lutomirski luto@kernel.org Cc: "H. Peter Anvin" hpa@zytor.com Cc: Borislav Petkov bp@suse.de Cc: David Woodhouse dwmw@amazon.co.uk Link: https://lkml.kernel.org/r/20180601145921.9500-2-konrad.wilk@oracle.com [bwh: Backported to 4.4: adjust context, indentation] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/common.c | 3 ++- arch/x86/kvm/cpuid.c | 2 +- 3 files changed, 4 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -270,6 +270,7 @@ #define X86_FEATURE_AMD_IBRS (13*32+14) /* "" Indirect Branch Restricted Speculation */ #define X86_FEATURE_AMD_STIBP (13*32+15) /* "" Single Thread Indirect Branch Predictors */ #define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */ +#define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
/* Thermal and Power Management Leaf, CPUID level 0x00000006 (eax), word 14 */ #define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */ --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -904,7 +904,8 @@ static void __init cpu_set_bug_bits(stru rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);
if (!x86_match_cpu(cpu_no_spec_store_bypass) && - !(ia32_cap & ARCH_CAP_SSB_NO)) + !(ia32_cap & ARCH_CAP_SSB_NO) && + !cpu_has(c, X86_FEATURE_AMD_SSB_NO)) setup_force_cpu_bug(X86_BUG_SPEC_STORE_BYPASS);
if (ia32_cap & ARCH_CAP_IBRS_ALL) --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -343,7 +343,7 @@ static inline int __do_cpuid_ent(struct
/* cpuid 0x80000008.ebx */ const u32 kvm_cpuid_8000_0008_ebx_x86_features = - F(AMD_IBPB) | F(AMD_IBRS) | F(VIRT_SSBD); + F(AMD_IBPB) | F(AMD_IBRS) | F(VIRT_SSBD) | F(AMD_SSB_NO);
/* cpuid 0xC0000001.edx */ const u32 kvm_supported_word5_x86_features =
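The extra condition added to cpu_set_bug_bits() can be sketched as a decode of a raw CPUID 0x80000008 EBX value. This is illustrative only: the helper names are hypothetical, and the bit positions (24, 25, 26) are taken from the cpufeatures.h hunks in this series rather than read from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID Fn8000_0008 EBX bit positions, per the cpufeatures.h hunks. */
#define EBX_AMD_SSBD   24
#define EBX_VIRT_SSBD  25
#define EBX_AMD_SSB_NO 26

static bool ebx_has(uint32_t ebx, unsigned int bit)
{
	return (ebx >> bit) & 1u;
}

/* The SSB bug is set only if no "not vulnerable" indication exists:
 * the model list, Intel's ARCH_CAP_SSB_NO, or AMD's EBX[26]. */
static bool ssb_affected(bool in_no_ssb_list, bool intel_ssb_no,
			 uint32_t amd_ebx)
{
	return !in_no_ssb_list && !intel_ssb_no &&
	       !ebx_has(amd_ebx, EBX_AMD_SSB_NO);
}
```

The KVM hunk simply passes the same bit through to guests, so a guest on fixed hardware can reach the same conclusion.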
From: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
commit 6ac2f49edb1ef5446089c7c660017732886d62d6 upstream.
The AMD document outlining the SSBD handling 124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf mentions that if CPUID 8000_0008.EBX[24] is set we should be using the SPEC_CTRL MSR (0x48) over the VIRT SPEC_CTRL MSR (0xC001_011f) for speculative store bypass disable.
This in effect means we should clear the X86_FEATURE_VIRT_SSBD flag so that we would prefer the SPEC_CTRL MSR.
See the document titled: 124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf
A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=199889
Signed-off-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Janakarajan Natarajan Janakarajan.Natarajan@amd.com Cc: kvm@vger.kernel.org Cc: KarimAllah Ahmed karahmed@amazon.de Cc: andrew.cooper3@citrix.com Cc: Joerg Roedel joro@8bytes.org Cc: Radim Krčmář rkrcmar@redhat.com Cc: Andy Lutomirski luto@kernel.org Cc: "H. Peter Anvin" hpa@zytor.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Borislav Petkov bp@suse.de Cc: David Woodhouse dwmw@amazon.co.uk Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20180601145921.9500-3-konrad.wilk@oracle.com [bwh: Backported to 4.4: - Update feature test in guest_cpuid_has_spec_ctrl() instead of svm_{get,set}_msr() - Adjust context, indentation] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/bugs.c | 12 +++++++----- arch/x86/kernel/cpu/common.c | 6 ++++++ arch/x86/kvm/cpuid.c | 10 ++++++++-- arch/x86/kvm/cpuid.h | 2 +- arch/x86/kvm/svm.c | 2 +- 6 files changed, 24 insertions(+), 9 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -269,6 +269,7 @@ #define X86_FEATURE_AMD_IBPB (13*32+12) /* "" Indirect Branch Prediction Barrier */ #define X86_FEATURE_AMD_IBRS (13*32+14) /* "" Indirect Branch Restricted Speculation */ #define X86_FEATURE_AMD_STIBP (13*32+15) /* "" Single Thread Indirect Branch Predictors */ +#define X86_FEATURE_AMD_SSBD (13*32+24) /* "" Speculative Store Bypass Disable */ #define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */ #define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -523,18 +523,20 @@ static enum ssb_mitigation __init __ssb_ if (mode == SPEC_STORE_BYPASS_DISABLE) { setup_force_cpu_cap(X86_FEATURE_SPEC_STORE_BYPASS_DISABLE); /* - * Intel uses the SPEC CTRL MSR Bit(2) for this, while AMD uses - * a completely different MSR and bit dependent on family. + * Intel uses the SPEC CTRL MSR Bit(2) for this, while AMD may + * use a completely different MSR and bit dependent on family. */ switch (boot_cpu_data.x86_vendor) { case X86_VENDOR_INTEL: + case X86_VENDOR_AMD: + if (!static_cpu_has(X86_FEATURE_MSR_SPEC_CTRL)) { + x86_amd_ssb_disable(); + break; + } x86_spec_ctrl_base |= SPEC_CTRL_SSBD; x86_spec_ctrl_mask |= SPEC_CTRL_SSBD; wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); break; - case X86_VENDOR_AMD: - x86_amd_ssb_disable(); - break; } }
--- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -709,6 +709,12 @@ static void init_speculation_control(str set_cpu_cap(c, X86_FEATURE_STIBP); set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL); } + + if (cpu_has(c, X86_FEATURE_AMD_SSBD)) { + set_cpu_cap(c, X86_FEATURE_SSBD); + set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL); + clear_cpu_cap(c, X86_FEATURE_VIRT_SSBD); + } }
void get_cpu_cap(struct cpuinfo_x86 *c) --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -343,7 +343,8 @@ static inline int __do_cpuid_ent(struct
/* cpuid 0x80000008.ebx */ const u32 kvm_cpuid_8000_0008_ebx_x86_features = - F(AMD_IBPB) | F(AMD_IBRS) | F(VIRT_SSBD) | F(AMD_SSB_NO); + F(AMD_IBPB) | F(AMD_IBRS) | F(AMD_SSBD) | F(VIRT_SSBD) | + F(AMD_SSB_NO);
/* cpuid 0xC0000001.edx */ const u32 kvm_supported_word5_x86_features = @@ -607,7 +608,12 @@ static inline int __do_cpuid_ent(struct entry->ebx |= F(VIRT_SSBD); entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); - if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD)) + /* + * The preference is to use SPEC CTRL MSR instead of the + * VIRT_SPEC MSR. + */ + if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) && + !boot_cpu_has(X86_FEATURE_AMD_SSBD)) entry->ebx |= F(VIRT_SSBD); break; } --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -175,7 +175,7 @@ static inline bool guest_cpuid_has_spec_ struct kvm_cpuid_entry2 *best;
best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0); - if (best && (best->ebx & bit(X86_FEATURE_AMD_IBRS))) + if (best && (best->ebx & (bit(X86_FEATURE_AMD_IBRS | bit(X86_FEATURE_AMD_SSBD))))) return true; best = kvm_find_cpuid_entry(vcpu, 7, 0); return best && (best->edx & (bit(X86_FEATURE_SPEC_CTRL) | bit(X86_FEATURE_SPEC_CTRL_SSBD))); --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3197,7 +3197,7 @@ static int svm_set_msr(struct kvm_vcpu * return 1;
/* The STIBP bit doesn't fault even if it's not advertised */ - if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) + if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP | SPEC_CTRL_SSBD)) return 1;
svm->spec_ctrl = data;
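The preference described in this patch (use the architectural SPEC_CTRL MSR 0x48 when CPUID 8000_0008.EBX[24] is set, and only fall back to VIRT_SPEC_CTRL 0xC001_011F otherwise) can be expressed as a small selector. This is a sketch under those stated assumptions; the function and macro names are hypothetical, and a return of 0 stands for "use a family-specific method".

```c
#include <assert.h>
#include <stdint.h>

#define MSR_SPEC_CTRL_SKETCH      0x48u
#define MSR_VIRT_SPEC_CTRL_SKETCH 0xC001011Fu

/* Pick the MSR used for SSBD from a CPUID 0x80000008 EBX value. */
static uint32_t ssbd_msr(uint32_t ebx)
{
	if ((ebx >> 24) & 1u)   /* AMD_SSBD: architectural SPEC_CTRL wins */
		return MSR_SPEC_CTRL_SKETCH;
	if ((ebx >> 25) & 1u)   /* VIRT_SSBD: hypervisor-provided MSR */
		return MSR_VIRT_SPEC_CTRL_SKETCH;
	return 0;               /* neither: family-specific fallback */
}
```

Checking EBX[24] first mirrors the patch clearing X86_FEATURE_VIRT_SSBD whenever X86_FEATURE_AMD_SSBD is present.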
From: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
commit 108fab4b5c8f12064ef86e02cb0459992affb30f upstream.
Both AMD and Intel can have SPEC_CTRL_MSR for SSBD.
However, AMD also has two other ways of doing it, neither of which uses the SPEC_CTRL MSR.
Signed-off-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Kees Cook keescook@chromium.org Cc: kvm@vger.kernel.org Cc: KarimAllah Ahmed karahmed@amazon.de Cc: andrew.cooper3@citrix.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Borislav Petkov bp@suse.de Cc: David Woodhouse dwmw@amazon.co.uk Link: https://lkml.kernel.org/r/20180601145921.9500-4-konrad.wilk@oracle.com Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -526,17 +526,12 @@ static enum ssb_mitigation __init __ssb_ * Intel uses the SPEC CTRL MSR Bit(2) for this, while AMD may * use a completely different MSR and bit dependent on family. */ - switch (boot_cpu_data.x86_vendor) { - case X86_VENDOR_INTEL: - case X86_VENDOR_AMD: - if (!static_cpu_has(X86_FEATURE_MSR_SPEC_CTRL)) { - x86_amd_ssb_disable(); - break; - } + if (!static_cpu_has(X86_FEATURE_MSR_SPEC_CTRL)) + x86_amd_ssb_disable(); + else { x86_spec_ctrl_base |= SPEC_CTRL_SSBD; x86_spec_ctrl_mask |= SPEC_CTRL_SSBD; wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); - break; } }
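After this change the SSBD enable path no longer switches on vendor: it keys only on whether the SPEC_CTRL MSR is available. A rough model is below. The function is hypothetical; SPEC_CTRL_SSBD is assumed to be bit 2, as the comment in the hunk says, and the *use_amd flag stands in for calling x86_amd_ssb_disable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SPEC_CTRL_SSBD (1ULL << 2)   /* SPEC_CTRL bit 2 */

/* Returns the SPEC_CTRL base value to write. Sets *use_amd when the
 * vendor-specific (non SPEC_CTRL) path would be taken instead. */
static uint64_t enable_ssbd(bool msr_spec_ctrl, uint64_t base, bool *use_amd)
{
	*use_amd = !msr_spec_ctrl;
	return msr_spec_ctrl ? (base | SKETCH_SPEC_CTRL_SSBD) : base;
}
```

Note that "has the SPEC_CTRL MSR" turns out to be too coarse a test on AMD; a later patch in this series narrows it to the SSBD-specific features.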
From: Will Deacon will.deacon@arm.com
commit 8bd9cb51daac89337295b6f037b0486911e1b408 upstream.
In preparation for implementing the asm-generic atomic bitops in terms of atomic_long_*(), we need to prevent <asm/atomic.h> implementations from pulling in <linux/bitops.h>. A common reason for this include is for the BITS_PER_BYTE definition, so move this and some other BIT() and masking macros into a new header file, <linux/bits.h>.
Signed-off-by: Will Deacon will.deacon@arm.com Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: linux-arm-kernel@lists.infradead.org Cc: yamada.masahiro@socionext.com Link: https://lore.kernel.org/lkml/1529412794-17720-4-git-send-email-will.deacon@a... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/bitops.h | 22 +--------------------- include/linux/bits.h | 26 ++++++++++++++++++++++++++ 2 files changed, 27 insertions(+), 21 deletions(-) create mode 100644 include/linux/bits.h
--- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -1,29 +1,9 @@ #ifndef _LINUX_BITOPS_H #define _LINUX_BITOPS_H #include <asm/types.h> +#include <linux/bits.h>
-#ifdef __KERNEL__ -#define BIT(nr) (1UL << (nr)) -#define BIT_ULL(nr) (1ULL << (nr)) -#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG)) -#define BIT_WORD(nr) ((nr) / BITS_PER_LONG) -#define BIT_ULL_MASK(nr) (1ULL << ((nr) % BITS_PER_LONG_LONG)) -#define BIT_ULL_WORD(nr) ((nr) / BITS_PER_LONG_LONG) -#define BITS_PER_BYTE 8 #define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long)) -#endif - -/* - * Create a contiguous bitmask starting at bit position @l and ending at - * position @h. For example - * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000. - */ -#define GENMASK(h, l) \ - (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h)))) - -#define GENMASK_ULL(h, l) \ - (((~0ULL) - (1ULL << (l)) + 1) & \ - (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))
extern unsigned int __sw_hweight8(unsigned int w); extern unsigned int __sw_hweight16(unsigned int w); --- /dev/null +++ b/include/linux/bits.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __LINUX_BITS_H +#define __LINUX_BITS_H +#include <asm/bitsperlong.h> + +#define BIT(nr) (1UL << (nr)) +#define BIT_ULL(nr) (1ULL << (nr)) +#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG)) +#define BIT_WORD(nr) ((nr) / BITS_PER_LONG) +#define BIT_ULL_MASK(nr) (1ULL << ((nr) % BITS_PER_LONG_LONG)) +#define BIT_ULL_WORD(nr) ((nr) / BITS_PER_LONG_LONG) +#define BITS_PER_BYTE 8 + +/* + * Create a contiguous bitmask starting at bit position @l and ending at + * position @h. For example + * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000. + */ +#define GENMASK(h, l) \ + (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h)))) + +#define GENMASK_ULL(h, l) \ + (((~0ULL) - (1ULL << (l)) + 1) & \ + (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h)))) + +#endif /* __LINUX_BITS_H */
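The macros moved into <linux/bits.h> can be exercised in userspace to check the example in the header comment. In this sketch BITS_PER_LONG and BITS_PER_LONG_LONG are derived from sizeof as an assumption, since the kernel gets them from <asm/bitsperlong.h>; the macro bodies themselves are copied from the hunk.

```c
#include <assert.h>
#include <limits.h>

/* Assumed userspace substitutes for <asm/bitsperlong.h>. */
#define BITS_PER_LONG      (CHAR_BIT * (int)sizeof(long))
#define BITS_PER_LONG_LONG (CHAR_BIT * (int)sizeof(long long))

#define BIT_ULL(nr) (1ULL << (nr))

/* Bits l..h set: clear bits below l, then clear bits above h. */
#define GENMASK(h, l) \
	(((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

#define GENMASK_ULL(h, l) \
	(((~0ULL) - (1ULL << (l)) + 1) & \
	 (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))
```

For GENMASK_ULL(39, 21), the left term keeps bits 21 to 63 and the right term keeps bits 0 to 39, so the AND yields bits 21 to 39, that is 0x000000ffffe00000.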
From: Tom Lendacky thomas.lendacky@amd.com
commit 612bc3b3d4be749f73a513a17d9b3ee1330d3487 upstream.
On AMD, the presence of the MSR_SPEC_CTRL feature does not imply that the SSBD mitigation support should use the SPEC_CTRL MSR. Other features could have caused the MSR_SPEC_CTRL feature to be set, while a different SSBD mitigation option is in place.
Update the SSBD support to check for the actual SSBD features that will use the SPEC_CTRL MSR.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Cc: Borislav Petkov bpetkov@suse.de Cc: David Woodhouse dwmw@amazon.co.uk Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Fixes: 6ac2f49edb1e ("x86/bugs: Add AMD's SPEC_CTRL MSR usage") Link: http://lkml.kernel.org/r/20180702213602.29202.33151.stgit@tlendack-t1.amdoff... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -157,7 +157,8 @@ x86_virt_spec_ctrl(u64 guest_spec_ctrl, guestval |= guest_spec_ctrl & x86_spec_ctrl_mask;
/* SSBD controlled in MSR_SPEC_CTRL */ - if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD)) + if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) || + static_cpu_has(X86_FEATURE_AMD_SSBD)) hostval |= ssbd_tif_to_spec_ctrl(ti->flags);
if (hostval != guestval) { @@ -526,9 +527,10 @@ static enum ssb_mitigation __init __ssb_ * Intel uses the SPEC CTRL MSR Bit(2) for this, while AMD may * use a completely different MSR and bit dependent on family. */ - if (!static_cpu_has(X86_FEATURE_MSR_SPEC_CTRL)) + if (!static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) && + !static_cpu_has(X86_FEATURE_AMD_SSBD)) { x86_amd_ssb_disable(); - else { + } else { x86_spec_ctrl_base |= SPEC_CTRL_SSBD; x86_spec_ctrl_mask |= SPEC_CTRL_SSBD; wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
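The difference between the old and the fixed predicate is easiest to see side by side. In this hypothetical model, a CPU that has the SPEC_CTRL MSR only because of IBRS/STIBP support (no SSBD feature) would wrongly be treated as doing SSBD through SPEC_CTRL under the old check. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

struct ssbd_feats {
	bool msr_spec_ctrl;  /* X86_FEATURE_MSR_SPEC_CTRL: MSR exists */
	bool spec_ctrl_ssbd; /* X86_FEATURE_SPEC_CTRL_SSBD (Intel) */
	bool amd_ssbd;       /* X86_FEATURE_AMD_SSBD */
};

/* Old check: any SPEC_CTRL MSR implied SSBD lives in it. */
static bool ssbd_in_spec_ctrl_old(const struct ssbd_feats *f)
{
	return f->msr_spec_ctrl;
}

/* Fixed check: only the SSBD-specific features count. */
static bool ssbd_in_spec_ctrl_new(const struct ssbd_feats *f)
{
	return f->spec_ctrl_ssbd || f->amd_ssbd;
}
```

When the fixed check is false, the code falls back to x86_amd_ssb_disable(), which can use the VIRT_SPEC_CTRL or LS_CFG method instead.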
From: Jiang Biao jiang.biao2@zte.com.cn
commit d9f4426c73002957be5dd39936f44a09498f7560 upstream.
SPECTRE_V2_IBRS in enum spectre_v2_mitigation is never used. Remove it.
Signed-off-by: Jiang Biao jiang.biao2@zte.com.cn Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: hpa@zytor.com Cc: dwmw2@amazon.co.uk Cc: konrad.wilk@oracle.com Cc: bp@suse.de Cc: zhong.weidong@zte.com.cn Link: https://lkml.kernel.org/r/1531872194-39207-1-git-send-email-jiang.biao2@zte.... [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/nospec-branch.h | 1 - 1 file changed, 1 deletion(-)
--- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -169,7 +169,6 @@ enum spectre_v2_mitigation { SPECTRE_V2_RETPOLINE_MINIMAL_AMD, SPECTRE_V2_RETPOLINE_GENERIC, SPECTRE_V2_RETPOLINE_AMD, - SPECTRE_V2_IBRS, SPECTRE_V2_IBRS_ENHANCED, };
From: Prarit Bhargava prarit@redhat.com
commit 370a132bb2227ff76278f98370e0e701d86ff752 upstream.
When preparing an MCE record for logging, boot_cpu_data.microcode is used to read out the microcode revision on the box.
However, on systems where a late microcode update has happened, the microcode revision output in an MCE log record is wrong because boot_cpu_data.microcode is not updated when the microcode gets updated.
The microcode revision saved in boot_cpu_data's microcode member should be kept up to date regardless, for consistency.
Make it so.
Fixes: fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check records") Signed-off-by: Prarit Bhargava prarit@redhat.com Signed-off-by: Borislav Petkov bp@suse.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Tony Luck tony.luck@intel.com Cc: sironi@amazon.de Link: http://lkml.kernel.org/r/20180731112739.32338-1-prarit@redhat.com [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/microcode/amd.c | 4 ++++ arch/x86/kernel/cpu/microcode/intel.c | 4 ++++ 2 files changed, 8 insertions(+)
--- a/arch/x86/kernel/cpu/microcode/amd.c +++ b/arch/x86/kernel/cpu/microcode/amd.c @@ -712,6 +712,10 @@ int apply_microcode_amd(int cpu) uci->cpu_sig.rev = mc_amd->hdr.patch_id; c->microcode = mc_amd->hdr.patch_id;
+ /* Update boot_cpu_data's revision too, if we're on the BSP: */ + if (c->cpu_index == boot_cpu_data.cpu_index) + boot_cpu_data.microcode = mc_amd->hdr.patch_id; + return 0; }
--- a/arch/x86/kernel/cpu/microcode/intel.c +++ b/arch/x86/kernel/cpu/microcode/intel.c @@ -905,6 +905,10 @@ static int apply_microcode_intel(int cpu uci->cpu_sig.rev = rev; c->microcode = rev;
+ /* Update boot_cpu_data's revision too, if we're on the BSP: */ + if (c->cpu_index == boot_cpu_data.cpu_index) + boot_cpu_data.microcode = rev; + return 0; }
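The pattern added to both loaders is "update the per-CPU copy, and mirror it into the boot copy when running on the BSP". A minimal userspace sketch follows; struct cpu_sketch and boot_cpu are hypothetical stand-ins for struct cpuinfo_x86 and boot_cpu_data.

```c
#include <assert.h>

struct cpu_sketch {
	int cpu_index;
	unsigned int microcode;
};

/* Stands in for boot_cpu_data, which MCE records read from. */
static struct cpu_sketch boot_cpu = { 0, 0 };

static void record_revision(struct cpu_sketch *c, unsigned int rev)
{
	c->microcode = rev;
	/* Mirror into the boot copy only when this is the BSP. */
	if (c->cpu_index == boot_cpu.cpu_index)
		boot_cpu.microcode = rev;
}
```

An update on an AP leaves the boot copy alone; only the BSP update refreshes it.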
From: Filippo Sironi sironi@amazon.de
commit 8da38ebaad23fe1b0c4a205438676f6356607cfc upstream.
Handle the case where microcode gets loaded on the BSP's hyperthread sibling first and the boot_cpu_data's microcode revision doesn't get updated because of early exit due to the siblings sharing a microcode engine.
For that, simply write the updated revision on all CPUs unconditionally.
Signed-off-by: Filippo Sironi sironi@amazon.de Signed-off-by: Borislav Petkov bp@suse.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: prarit@redhat.com Link: http://lkml.kernel.org/r/1533050970-14385-1-git-send-email-sironi@amazon.de [bwh: Backported to 4.4: - Keep returning 0 on success - Adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/microcode/amd.c | 20 ++++++++++---------- arch/x86/kernel/cpu/microcode/intel.c | 10 ++++------ 2 files changed, 14 insertions(+), 16 deletions(-)
--- a/arch/x86/kernel/cpu/microcode/amd.c +++ b/arch/x86/kernel/cpu/microcode/amd.c @@ -695,26 +695,26 @@ int apply_microcode_amd(int cpu) return -1;
/* need to apply patch? */ - if (rev >= mc_amd->hdr.patch_id) { - c->microcode = rev; - uci->cpu_sig.rev = rev; - return 0; - } + if (rev >= mc_amd->hdr.patch_id) + goto out;
if (__apply_microcode_amd(mc_amd)) { pr_err("CPU%d: update failed for patch_level=0x%08x\n", cpu, mc_amd->hdr.patch_id); return -1; } - pr_info("CPU%d: new patch_level=0x%08x\n", cpu, - mc_amd->hdr.patch_id);
- uci->cpu_sig.rev = mc_amd->hdr.patch_id; - c->microcode = mc_amd->hdr.patch_id; + rev = mc_amd->hdr.patch_id; + + pr_info("CPU%d: new patch_level=0x%08x\n", cpu, rev); + +out: + uci->cpu_sig.rev = rev; + c->microcode = rev;
/* Update boot_cpu_data's revision too, if we're on the BSP: */ if (c->cpu_index == boot_cpu_data.cpu_index) - boot_cpu_data.microcode = mc_amd->hdr.patch_id; + boot_cpu_data.microcode = rev;
return 0; } --- a/arch/x86/kernel/cpu/microcode/intel.c +++ b/arch/x86/kernel/cpu/microcode/intel.c @@ -878,11 +878,8 @@ static int apply_microcode_intel(int cpu * already. */ rev = intel_get_microcode_revision(); - if (rev >= mc_intel->hdr.rev) { - uci->cpu_sig.rev = rev; - c->microcode = rev; - return 0; - } + if (rev >= mc_intel->hdr.rev) + goto out;
/* write microcode via MSR 0x79 */ wrmsr(MSR_IA32_UCODE_WRITE, @@ -902,8 +899,9 @@ static int apply_microcode_intel(int cpu mc_intel->hdr.date >> 24, (mc_intel->hdr.date >> 16) & 0xff);
+out: uci->cpu_sig.rev = rev; - c->microcode = rev; + c->microcode = rev;
/* Update boot_cpu_data's revision too, if we're on the BSP: */ if (c->cpu_index == boot_cpu_data.cpu_index)
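The follow-up fix routes the "already up to date" early exit through the same bookkeeping via a goto out label, so the BSP still records its revision when its hyperthread sibling loaded the shared microcode first. The sketch below is a simplified model of that control flow: the names are hypothetical and the actual patch write is elided.

```c
#include <assert.h>

struct cpu_sketch2 {
	int cpu_index;
	unsigned int microcode;
};

static struct cpu_sketch2 boot_copy = { 0, 0 };  /* models boot_cpu_data */

/* current_rev: what the core reports now; a sibling sharing the
 * microcode engine may already have raised it to patch_rev. */
static int apply_sketch(struct cpu_sketch2 *c, unsigned int current_rev,
			unsigned int patch_rev)
{
	unsigned int rev = current_rev;

	if (rev >= patch_rev)
		goto out;       /* nothing to load, but still record rev */

	/* ...the real loader writes the patch here... */
	rev = patch_rev;
out:
	c->microcode = rev;
	if (c->cpu_index == boot_copy.cpu_index)
		boot_copy.microcode = rev;
	return 0;
}
```

Funnelling both paths into one exit keeps the per-CPU and boot copies from diverging, which was the bug the previous patch only partly fixed.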
From: Peter Zijlstra peterz@infradead.org
commit f2c4db1bd80720cd8cb2a5aa220d9bc9f374f04e upstream.
Going primarily by:
https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors
with additional information gleaned from other related pages; notably:
- Bonnell shrink was called Saltwell
- Moorefield is the Merriefield refresh which makes it Airmont
The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE
for i in `git grep -l FAM6_ATOM` ; do
        sed -i  -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g'             \
                -e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/'          \
                -e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g'         \
                -e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g'   \
                -e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g'           \
                -e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g'       \
                -e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g'     \
                -e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g'    \
                -e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g'       \
                -e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g'         \
                -e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i}
done
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Cc: dave.hansen@linux.intel.com Cc: len.brown@intel.com Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de [bwh: Backported to 4.4: - Drop changes to CPU IDs that weren't already included - Adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/intel-family.h | 30 +++++++++++++++++------------- arch/x86/kernel/cpu/common.c | 28 ++++++++++++++-------------- 2 files changed, 31 insertions(+), 27 deletions(-)
--- a/arch/x86/include/asm/intel-family.h +++ b/arch/x86/include/asm/intel-family.h @@ -50,19 +50,23 @@
/* "Small Core" Processors (Atom) */
-#define INTEL_FAM6_ATOM_PINEVIEW 0x1C -#define INTEL_FAM6_ATOM_LINCROFT 0x26 -#define INTEL_FAM6_ATOM_PENWELL 0x27 -#define INTEL_FAM6_ATOM_CLOVERVIEW 0x35 -#define INTEL_FAM6_ATOM_CEDARVIEW 0x36 -#define INTEL_FAM6_ATOM_SILVERMONT1 0x37 /* BayTrail/BYT / Valleyview */ -#define INTEL_FAM6_ATOM_SILVERMONT2 0x4D /* Avaton/Rangely */ -#define INTEL_FAM6_ATOM_AIRMONT 0x4C /* CherryTrail / Braswell */ -#define INTEL_FAM6_ATOM_MERRIFIELD 0x4A /* Tangier */ -#define INTEL_FAM6_ATOM_MOOREFIELD 0x5A /* Annidale */ -#define INTEL_FAM6_ATOM_GOLDMONT 0x5C -#define INTEL_FAM6_ATOM_DENVERTON 0x5F /* Goldmont Microserver */ -#define INTEL_FAM6_ATOM_GEMINI_LAKE 0x7A +#define INTEL_FAM6_ATOM_BONNELL 0x1C /* Diamondville, Pineview */ +#define INTEL_FAM6_ATOM_BONNELL_MID 0x26 /* Silverthorne, Lincroft */ + +#define INTEL_FAM6_ATOM_SALTWELL 0x36 /* Cedarview */ +#define INTEL_FAM6_ATOM_SALTWELL_MID 0x27 /* Penwell */ +#define INTEL_FAM6_ATOM_SALTWELL_TABLET 0x35 /* Cloverview */ + +#define INTEL_FAM6_ATOM_SILVERMONT 0x37 /* Bay Trail, Valleyview */ +#define INTEL_FAM6_ATOM_SILVERMONT_X 0x4D /* Avaton, Rangely */ +#define INTEL_FAM6_ATOM_SILVERMONT_MID 0x4A /* Merriefield */ + +#define INTEL_FAM6_ATOM_AIRMONT 0x4C /* Cherry Trail, Braswell */ +#define INTEL_FAM6_ATOM_AIRMONT_MID 0x5A /* Moorefield */ + +#define INTEL_FAM6_ATOM_GOLDMONT 0x5C /* Apollo Lake */ +#define INTEL_FAM6_ATOM_GOLDMONT_X 0x5F /* Denverton */ +#define INTEL_FAM6_ATOM_GOLDMONT_PLUS 0x7A /* Gemini Lake */
/* Xeon Phi */
--- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -848,11 +848,11 @@ static void identify_cpu_without_cpuid(s }
static const __initconst struct x86_cpu_id cpu_no_speculation[] = { - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CEDARVIEW, X86_FEATURE_ANY }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CLOVERVIEW, X86_FEATURE_ANY }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_LINCROFT, X86_FEATURE_ANY }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_PENWELL, X86_FEATURE_ANY }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_PINEVIEW, X86_FEATURE_ANY }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SALTWELL, X86_FEATURE_ANY }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SALTWELL_TABLET, X86_FEATURE_ANY }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_BONNELL_MID, X86_FEATURE_ANY }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SALTWELL_MID, X86_FEATURE_ANY }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_BONNELL, X86_FEATURE_ANY }, { X86_VENDOR_CENTAUR, 5 }, { X86_VENDOR_INTEL, 5 }, { X86_VENDOR_NSC, 5 }, @@ -867,10 +867,10 @@ static const __initconst struct x86_cpu_
/* Only list CPUs which speculate but are non susceptible to SSB */ static const __initconst struct x86_cpu_id cpu_no_spec_store_bypass[] = { - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT1 }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT }, { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT2 }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_MERRIFIELD }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT_X }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT_MID }, { X86_VENDOR_INTEL, 6, INTEL_FAM6_CORE_YONAH }, { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNL }, { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNM }, @@ -883,14 +883,14 @@ static const __initconst struct x86_cpu_
static const __initconst struct x86_cpu_id cpu_no_l1tf[] = { /* in addition to cpu_no_speculation */ - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT1 }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT2 }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT_X }, { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_MERRIFIELD }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_MOOREFIELD }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT_MID }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT_MID }, { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GOLDMONT }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_DENVERTON }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GEMINI_LAKE }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GOLDMONT_X }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GOLDMONT_PLUS }, { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNL }, { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNM }, {}
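The FAM6_ATOM_UARCH_SOCTYPE scheme is a plain model-number to name mapping. The sketch below tabulates the values from the new intel-family.h hunk and adds a lookup helper. The table and helper are hypothetical conveniences; only the model numbers and names come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct atom_model {
	unsigned int model;
	const char *name;
};

/* Model numbers and names copied from the intel-family.h hunk. */
static const struct atom_model atom_models[] = {
	{ 0x1C, "ATOM_BONNELL" },
	{ 0x26, "ATOM_BONNELL_MID" },
	{ 0x36, "ATOM_SALTWELL" },
	{ 0x27, "ATOM_SALTWELL_MID" },
	{ 0x35, "ATOM_SALTWELL_TABLET" },
	{ 0x37, "ATOM_SILVERMONT" },
	{ 0x4D, "ATOM_SILVERMONT_X" },
	{ 0x4A, "ATOM_SILVERMONT_MID" },
	{ 0x4C, "ATOM_AIRMONT" },
	{ 0x5A, "ATOM_AIRMONT_MID" },
	{ 0x5C, "ATOM_GOLDMONT" },
	{ 0x5F, "ATOM_GOLDMONT_X" },
	{ 0x7A, "ATOM_GOLDMONT_PLUS" },
};

/* Linear scan; returns NULL for models not in the table. */
static const char *atom_name(unsigned int model)
{
	for (size_t i = 0; i < sizeof(atom_models) / sizeof(atom_models[0]); i++)
		if (atom_models[i].model == model)
			return atom_models[i].name;
	return NULL;
}
```

Because only the names change and the model numbers stay the same, the common.c tables match exactly the same CPUs before and after the rename.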
From: Thomas Gleixner tglx@xxxxxxxxxxxxx
commit 024d83cadc6b2af027e473720f3c3da97496c318 upstream.
Mikhail reported the following lockdep splat:
WARNING: possible irq lock inversion dependency detected
CPU 0/KVM/10284 just changed the state of lock:
000000000d538a88 (&st->lock){+...}, at: speculative_store_bypass_update+0x10b/0x170
but this lock was taken by another, HARDIRQ-safe lock in the past:
(&(&sighand->siglock)->rlock){-.-.}
and interrupts could create inverse lock ordering between them.
Possible interrupt unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&st->lock);
                               local_irq_disable();
                               lock(&(&sighand->siglock)->rlock);
                               lock(&st->lock);
  <Interrupt>
    lock(&(&sighand->siglock)->rlock);

 *** DEADLOCK ***
The code path which connects those locks is:
  speculative_store_bypass_update()
  ssb_prctl_set()
  do_seccomp()
  do_syscall_64()
In svm_vcpu_run(), speculative_store_bypass_update() is called with interrupts enabled via x86_virt_spec_ctrl_set_guest/host().
This is actually a false positive, because GIF=0 so interrupts are disabled even if IF=1; however, we can easily move the invocations of x86_virt_spec_ctrl_set_guest/host() into the interrupt disabled region to cure it, and it's a good idea to keep the GIF=0/IF=1 area as small and self-contained as possible.
Fixes: 1f50ddb4f418 ("x86/speculation: Handle HT correctly on AMD") Reported-by: Mikhail Gavrilov mikhail.v.gavrilov@gmail.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Mikhail Gavrilov mikhail.v.gavrilov@gmail.com Cc: Joerg Roedel joro@8bytes.org Cc: Paolo Bonzini pbonzini@redhat.com Cc: Radim Krčmář rkrcmar@redhat.com Cc: Matthew Wilcox willy@infradead.org Cc: Borislav Petkov bp@suse.de Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Cc: Tom Lendacky thomas.lendacky@amd.com Cc: kvm@vger.kernel.org Cc: x86@kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3928,8 +3928,6 @@ static void svm_vcpu_run(struct kvm_vcpu
clgi();
- local_irq_enable(); - /* * If this vCPU has touched SPEC_CTRL, restore the guest's value if * it's non-zero. Since vmentry is serialising on affected CPUs, there @@ -3938,6 +3936,8 @@ static void svm_vcpu_run(struct kvm_vcpu */ x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
+ local_irq_enable(); + asm volatile ( "push %%" _ASM_BP "; \n\t" "mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t" @@ -4060,12 +4060,12 @@ static void svm_vcpu_run(struct kvm_vcpu if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)) svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
- x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl); - reload_tss(vcpu);
local_irq_disable();
+ x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl); + vcpu->arch.cr2 = svm->vmcb->save.cr2; vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax; vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
From: Nadav Amit namit@vmware.com
commit 9bc4f28af75a91aea0ae383f50b0a430c4509303 upstream.
When page-table entries are set, the compiler might optimize their assignment by using multiple instructions to set the PTE. This might turn into a security hazard if the user somehow manages to use the interim PTE. L1TF does not make our lives easier, making even an interim non-present PTE a security hazard.
Using WRITE_ONCE() to set PTEs and friends should prevent this potential security hazard.
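A minimal user-space sketch of the idea, assuming x86-64 and GCC/Clang: pte_t here is a hypothetical one-field stand-in, and WRITE_ONCE_U64() is a simplified substitute for the kernel's WRITE_ONCE() that only handles a 64-bit scalar. Storing through a volatile pointer keeps the compiler from building the entry piecewise; in practice an aligned 64-bit volatile store becomes a single instruction.

```c
#include <stdint.h>

/* Hypothetical stand-in for the x86-64 pte_t wrapper type. */
typedef struct { uint64_t pte; } pte_t;

/*
 * Simplified WRITE_ONCE() for one 64-bit scalar. Like the kernel's
 * 8-byte case, it stores through a volatile pointer, which in practice
 * makes GCC/Clang emit exactly one full-width store.
 */
#define WRITE_ONCE_U64(x, val) (*(volatile uint64_t *)&(x) = (val))

static inline pte_t make_pte(uint64_t val)
{
	pte_t p = { val };
	return p;
}

/* Mirrors the patched native_set_pte(): no torn or interim values. */
static inline void set_pte_sketch(pte_t *ptep, pte_t pte)
{
	WRITE_ONCE_U64(ptep->pte, pte.pte);
}

/* Mirrors the patched native_pte_clear(): reuse the single-store path. */
static inline void pte_clear_sketch(pte_t *ptep)
{
	set_pte_sketch(ptep, make_pte(0));
}
```

Routing pte_clear through set_pte, as the patch does, means there is only one store path to keep tear-free.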
I skimmed the differences in the binary with and without this patch. The differences are (obviously) greater when CONFIG_PARAVIRT=n, as more code optimizations are possible. For better or worse, the impact of this patch on the binary is pretty small. Skimming the code did not reveal anything that jumped out as a security hazard, but it seems that at least move_soft_dirty_pte() caused set_pte_at() to use multiple writes.
Signed-off-by: Nadav Amit namit@vmware.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Michal Hocko mhocko@suse.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Sean Christopherson sean.j.christopherson@intel.com Cc: Andy Lutomirski luto@kernel.org Link: https://lkml.kernel.org/r/20180902181451.80520-1-namit@vmware.com [bwh: Backported to 4.4: - Drop changes in pmdp_establish(), native_set_p4d(), pudp_set_access_flags() - Adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/pgtable_64.h | 16 ++++++++-------- arch/x86/mm/pgtable.c | 6 +++--- 2 files changed, 11 insertions(+), 11 deletions(-)
--- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -44,15 +44,15 @@ struct mm_struct; void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte);
-static inline void native_pte_clear(struct mm_struct *mm, unsigned long addr, - pte_t *ptep) +static inline void native_set_pte(pte_t *ptep, pte_t pte) { - *ptep = native_make_pte(0); + WRITE_ONCE(*ptep, pte); }
-static inline void native_set_pte(pte_t *ptep, pte_t pte) +static inline void native_pte_clear(struct mm_struct *mm, unsigned long addr, + pte_t *ptep) { - *ptep = pte; + native_set_pte(ptep, native_make_pte(0)); }
static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte) @@ -62,7 +62,7 @@ static inline void native_set_pte_atomic
static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd) { - *pmdp = pmd; + WRITE_ONCE(*pmdp, pmd); }
static inline void native_pmd_clear(pmd_t *pmd) @@ -98,7 +98,7 @@ static inline pmd_t native_pmdp_get_and_
static inline void native_set_pud(pud_t *pudp, pud_t pud) { - *pudp = pud; + WRITE_ONCE(*pudp, pud); }
static inline void native_pud_clear(pud_t *pud) @@ -131,7 +131,7 @@ static inline pgd_t *native_get_shadow_p
static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd) { - *pgdp = kaiser_set_shadow_pgd(pgdp, pgd); + WRITE_ONCE(*pgdp, kaiser_set_shadow_pgd(pgdp, pgd)); }
static inline void native_pgd_clear(pgd_t *pgd) --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -247,7 +247,7 @@ static void pgd_mop_up_pmds(struct mm_st if (pgd_val(pgd) != 0) { pmd_t *pmd = (pmd_t *)pgd_page_vaddr(pgd);
- pgdp[i] = native_make_pgd(0); + pgd_clear(&pgdp[i]);
paravirt_release_pmd(pgd_val(pgd) >> PAGE_SHIFT); pmd_free(mm, pmd); @@ -424,7 +424,7 @@ int ptep_set_access_flags(struct vm_area int changed = !pte_same(*ptep, entry);
if (changed && dirty) { - *ptep = entry; + set_pte(ptep, entry); pte_update_defer(vma->vm_mm, address, ptep); }
@@ -441,7 +441,7 @@ int pmdp_set_access_flags(struct vm_area VM_BUG_ON(address & ~HPAGE_PMD_MASK);
if (changed && dirty) { - *pmdp = entry; + set_pmd(pmdp, entry); pmd_update_defer(vma->vm_mm, address, pmdp); /* * We had a write-protection fault here and changed the pmd
From: Jiri Kosina jkosina@suse.cz
commit dbfe2953f63c640463c630746cd5d9de8b2f63ae upstream.
Currently, IBPB is only issued in cases when switching into a non-dumpable process, the rationale being to protect such 'important and security sensitive' processes (such as GPG) from data leaking into a different userspace process via spectre v2.
This is however completely insufficient to provide proper userspace-to-userspace spectrev2 protection, as any process can poison branch buffers before being scheduled out, and the newly scheduled process immediately becomes a spectrev2 victim.
In order to minimize the performance impact (for use cases that do require spectrev2 protection), issue the barrier only in cases when switching between processes where the victim can't be ptraced by the potential attacker (as in such cases, the attacker doesn't have to bother with branch buffers at all).
[ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably fine-grained ]
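The decision rule can be sketched as follows. Everything here is hypothetical and heavily simplified: the task model is invented, and may_trace_sketch() stands in for the real ptrace check (which compares credentials and capabilities and, with PTRACE_MODE_SCHED, skips audit and the regular LSM hooks). The point is the shape of the rule: flush only when entering a different user address space that the previous task could not trace.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified task model for illustration only. */
struct mm_sketch { uint64_t ctx_id; };
struct task_sketch {
	struct mm_sketch *mm;	/* NULL for kernel threads */
	unsigned int uid;
	bool dumpable;
};

/*
 * Hypothetical stand-in for the ptrace access check: "prev may trace
 * next" if both share a uid and next is dumpable.
 */
static bool may_trace_sketch(const struct task_sketch *prev,
			     const struct task_sketch *next)
{
	return prev->uid == next->uid && next->dumpable;
}

/*
 * IBPB is needed when switching to a user task with a different mm
 * context that the previous task could NOT ptrace; if it could, it
 * has easier ways to attack than branch buffer poisoning.
 */
static bool ibpb_needed_sketch(const struct task_sketch *prev,
			       const struct task_sketch *next,
			       uint64_t last_ctx_id)
{
	return next && next->mm && next->mm->ctx_id != last_ctx_id &&
	       !may_trace_sketch(prev, next);
}
```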
Fixes: 18bf3c3ea8 ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch") Originally-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Peter Zijlstra peterz@infradead.org Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: "WoodhouseDavid" dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: "SchauflerCasey" casey.schaufler@intel.com Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pm Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/mm/tlb.c | 31 ++++++++++++++++++++----------- include/linux/ptrace.h | 21 +++++++++++++++++++-- kernel/ptrace.c | 10 ++++++++++ 3 files changed, 49 insertions(+), 13 deletions(-)
--- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -7,6 +7,7 @@ #include <linux/module.h> #include <linux/cpu.h> #include <linux/debugfs.h> +#include <linux/ptrace.h>
#include <asm/tlbflush.h> #include <asm/mmu_context.h> @@ -101,6 +102,19 @@ void switch_mm(struct mm_struct *prev, s local_irq_restore(flags); }
+static bool ibpb_needed(struct task_struct *tsk, u64 last_ctx_id) +{ + /* + * Check if the current (previous) task has access to the memory + * of the @tsk (next) task. If access is denied, make sure to + * issue a IBPB to stop user->user Spectre-v2 attacks. + * + * Note: __ptrace_may_access() returns 0 or -ERRNO. + */ + return (tsk && tsk->mm && tsk->mm->context.ctx_id != last_ctx_id && + ptrace_may_access_sched(tsk, PTRACE_MODE_SPEC_IBPB)); +} + void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, struct task_struct *tsk) { @@ -115,18 +129,13 @@ void switch_mm_irqs_off(struct mm_struct * one process from doing Spectre-v2 attacks on another. * * As an optimization, flush indirect branches only when - * switching into processes that disable dumping. This - * protects high value processes like gpg, without having - * too high performance overhead. IBPB is *expensive*! - * - * This will not flush branches when switching into kernel - * threads. It will also not flush if we switch to idle - * thread and back to the same process. It will flush if we - * switch to a different non-dumpable process. + * switching into a processes that can't be ptrace by the + * current one (as in such case, attacker has much more + * convenient way how to tamper with the next process than + * branch buffer poisoning). */ - if (tsk && tsk->mm && - tsk->mm->context.ctx_id != last_ctx_id && - get_dumpable(tsk->mm) != SUID_DUMP_USER) + if (static_cpu_has(X86_FEATURE_USE_IBPB) && + ibpb_needed(tsk, last_ctx_id)) indirect_branch_prediction_barrier();
/* --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -57,14 +57,17 @@ extern void exit_ptrace(struct task_stru #define PTRACE_MODE_READ 0x01 #define PTRACE_MODE_ATTACH 0x02 #define PTRACE_MODE_NOAUDIT 0x04 -#define PTRACE_MODE_FSCREDS 0x08 -#define PTRACE_MODE_REALCREDS 0x10 +#define PTRACE_MODE_FSCREDS 0x08 +#define PTRACE_MODE_REALCREDS 0x10 +#define PTRACE_MODE_SCHED 0x20 +#define PTRACE_MODE_IBPB 0x40
/* shorthands for READ/ATTACH and FSCREDS/REALCREDS combinations */ #define PTRACE_MODE_READ_FSCREDS (PTRACE_MODE_READ | PTRACE_MODE_FSCREDS) #define PTRACE_MODE_READ_REALCREDS (PTRACE_MODE_READ | PTRACE_MODE_REALCREDS) #define PTRACE_MODE_ATTACH_FSCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_FSCREDS) #define PTRACE_MODE_ATTACH_REALCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS) +#define PTRACE_MODE_SPEC_IBPB (PTRACE_MODE_ATTACH_REALCREDS | PTRACE_MODE_IBPB)
/** * ptrace_may_access - check whether the caller is permitted to access @@ -82,6 +85,20 @@ extern void exit_ptrace(struct task_stru */ extern bool ptrace_may_access(struct task_struct *task, unsigned int mode);
+/** + * ptrace_may_access - check whether the caller is permitted to access + * a target task. + * @task: target task + * @mode: selects type of access and caller credentials + * + * Returns true on success, false on denial. + * + * Similar to ptrace_may_access(). Only to be called from context switch + * code. Does not call into audit and the regular LSM hooks due to locking + * constraints. + */ +extern bool ptrace_may_access_sched(struct task_struct *task, unsigned int mode); + static inline int ptrace_reparented(struct task_struct *child) { return !same_thread_group(child->real_parent, child->parent); --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -228,6 +228,9 @@ static int ptrace_check_attach(struct ta
static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode) { + if (mode & PTRACE_MODE_SCHED) + return false; + if (mode & PTRACE_MODE_NOAUDIT) return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE); else @@ -295,9 +298,16 @@ ok: !ptrace_has_cap(mm->user_ns, mode))) return -EPERM;
+ if (mode & PTRACE_MODE_SCHED) + return 0; return security_ptrace_access_check(task, mode); }
+bool ptrace_may_access_sched(struct task_struct *task, unsigned int mode) +{ + return __ptrace_may_access(task, mode | PTRACE_MODE_SCHED); +} + bool ptrace_may_access(struct task_struct *task, unsigned int mode) { int err;
From: Jiri Kosina jkosina@suse.cz
commit 53c613fe6349994f023245519265999eed75957f upstream.
STIBP is a feature provided by certain Intel ucodes / CPUs. This feature (once enabled) prevents cross-hyperthread control of decisions made by indirect branch predictors.
Enable this feature if
- the CPU is vulnerable to spectre v2
- the CPU supports SMT and has SMT siblings online
- spectre_v2 mitigation autoselection is enabled (default)
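The resulting bit manipulation can be sketched as below. The stibp_inputs struct is a hypothetical snapshot of the conditions listed above (in the 4.4 backport the SMT test is IS_ENABLED(CONFIG_SMP)); STIBP is bit 1 of IA32_SPEC_CTRL. The sketch only computes the new base value and leaves out the spec_ctrl_mutex locking and the on_each_cpu() MSR write.

```c
#include <stdbool.h>
#include <stdint.h>

/* STIBP is bit 1 of IA32_SPEC_CTRL (bit 0 is IBRS, bit 2 is SSBD). */
#define SPEC_CTRL_STIBP_BIT (UINT64_C(1) << 1)

/* Hypothetical snapshot of the enabling conditions. */
struct stibp_inputs {
	bool v2_mitigation_enabled;	/* spectre_v2_enabled != NONE */
	bool cpu_has_stibp;		/* boot_cpu_has(X86_FEATURE_STIBP) */
	bool smt_active;		/* backport: IS_ENABLED(CONFIG_SMP) */
};

static bool stibp_needed_sketch(const struct stibp_inputs *in)
{
	return in->v2_mitigation_enabled && in->cpu_has_stibp;
}

/* Returns the new SPEC_CTRL base value, touching only the STIBP bit. */
static uint64_t smt_update_sketch(const struct stibp_inputs *in, uint64_t base)
{
	if (!stibp_needed_sketch(in))
		return base;

	return in->smt_active ? (base | SPEC_CTRL_STIBP_BIT)
			      : (base & ~SPEC_CTRL_STIBP_BIT);
}
```

Only the STIBP bit changes, so any other bits already present in x86_spec_ctrl_base are preserved.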
After some previous discussion, this leaves STIBP on all the time, as a wrmsr on every kernel boundary crossing is a no-no. This could perhaps later be optimized a bit more (like disabling it in NOHZ, experimenting with disabling it in idle, etc.) if needed.
Note that the synchronization of the mask manipulation via newly added spec_ctrl_mutex is currently not strictly needed, as the only updater is already being serialized by cpu_add_remove_lock, but let's make this a little bit more future-proof.
Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Peter Zijlstra peterz@infradead.org Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: "WoodhouseDavid" dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: "SchauflerCasey" casey.schaufler@intel.com Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm [bwh: Backported to 4.4: - Don't add any calls to arch_smt_update() yet. They will be introduced by "x86/speculation: Rework SMT state change". - Use IS_ENABLED(CONFIG_SMP) instead of cpu_smt_control for now. This will be fixed by "x86/speculation: Rework SMT state change".] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 55 ++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 50 insertions(+), 5 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -32,12 +32,10 @@ static void __init spectre_v2_select_mit static void __init ssb_select_mitigation(void); static void __init l1tf_select_mitigation(void);
-/* - * Our boot-time value of the SPEC_CTRL MSR. We read it once so that any - * writes to SPEC_CTRL contain whatever reserved bits have been set. - */ +/* The base value of the SPEC_CTRL MSR that always has to be preserved. */ u64 x86_spec_ctrl_base; EXPORT_SYMBOL_GPL(x86_spec_ctrl_base); +static DEFINE_MUTEX(spec_ctrl_mutex);
/* * The vendor and possibly platform specific bits which can be modified in @@ -315,6 +313,46 @@ static enum spectre_v2_mitigation_cmd __ return cmd; }
+static bool stibp_needed(void) +{ + if (spectre_v2_enabled == SPECTRE_V2_NONE) + return false; + + if (!boot_cpu_has(X86_FEATURE_STIBP)) + return false; + + return true; +} + +static void update_stibp_msr(void *info) +{ + wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); +} + +void arch_smt_update(void) +{ + u64 mask; + + if (!stibp_needed()) + return; + + mutex_lock(&spec_ctrl_mutex); + mask = x86_spec_ctrl_base; + if (IS_ENABLED(CONFIG_SMP)) + mask |= SPEC_CTRL_STIBP; + else + mask &= ~SPEC_CTRL_STIBP; + + if (mask != x86_spec_ctrl_base) { + pr_info("Spectre v2 cross-process SMT mitigation: %s STIBP\n", + IS_ENABLED(CONFIG_SMP) ? + "Enabling" : "Disabling"); + x86_spec_ctrl_base = mask; + on_each_cpu(update_stibp_msr, NULL, 1); + } + mutex_unlock(&spec_ctrl_mutex); +} + static void __init spectre_v2_select_mitigation(void) { enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline(); @@ -414,6 +452,9 @@ specv2_set_mode: setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW); pr_info("Enabling Restricted Speculation for firmware calls\n"); } + + /* Enable STIBP if appropriate */ + arch_smt_update(); }
#undef pr_fmt @@ -722,6 +763,8 @@ static void __init l1tf_select_mitigatio static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr, char *buf, unsigned int bug) { + int ret; + if (!boot_cpu_has_bug(bug)) return sprintf(buf, "Not affected\n");
@@ -736,10 +779,12 @@ static ssize_t cpu_show_common(struct de return sprintf(buf, "Mitigation: __user pointer sanitization\n");
case X86_BUG_SPECTRE_V2: - return sprintf(buf, "%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], + ret = sprintf(buf, "%s%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "", boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "", + (x86_spec_ctrl_base & SPEC_CTRL_STIBP) ? ", STIBP" : "", spectre_v2_module_string()); + return ret;
case X86_BUG_SPEC_STORE_BYPASS: return sprintf(buf, "%s\n", ssb_strings[ssb_mode]);
From: Jiri Kosina jkosina@suse.cz
commit bb4b3b7762735cdaba5a40fd94c9303d9ffa147a upstream.
If the spectrev2 mitigation has been enabled, the RSB is filled on context switch in order to protect from various classes of spectrev2 attacks.
If this mitigation is enabled, say so in sysfs for spectrev2.
Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Peter Zijlstra peterz@infradead.org Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: "WoodhouseDavid" dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: "SchauflerCasey" casey.schaufler@intel.com Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438580.15880@cbobk.fhfr.pm Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -779,10 +779,11 @@ static ssize_t cpu_show_common(struct de return sprintf(buf, "Mitigation: __user pointer sanitization\n");
case X86_BUG_SPECTRE_V2: - ret = sprintf(buf, "%s%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], + ret = sprintf(buf, "%s%s%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "", boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "", (x86_spec_ctrl_base & SPEC_CTRL_STIBP) ? ", STIBP" : "", + boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? ", RSB filling" : "", spectre_v2_module_string()); return ret;
From: Tim Chen tim.c.chen@linux.intel.com
commit 8eb729b77faf83ac4c1f363a9ad68d042415f24c upstream.
"Reduced Data Speculation" is an obsolete term. The correct new name is "Speculative store bypass disable" - which is abbreviated into SSBD.
Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185003.593893901@linutronix.de [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/thread_info.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -92,7 +92,7 @@ struct thread_info { #define TIF_SIGPENDING 2 /* signal pending */ #define TIF_NEED_RESCHED 3 /* rescheduling necessary */ #define TIF_SINGLESTEP 4 /* reenable singlestep on user return*/ -#define TIF_SSBD 5 /* Reduced data speculation */ +#define TIF_SSBD 5 /* Speculative store bypass disable */ #define TIF_SYSCALL_EMU 6 /* syscall emulation active */ #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */ #define TIF_SECCOMP 8 /* secure computing */
From: Tim Chen tim.c.chen@linux.intel.com
commit 24848509aa55eac39d524b587b051f4e86df3c12 upstream.
Remove the unnecessary 'else' statement in spectre_v2_parse_cmdline() to save an indentation level.
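The early-return structure the patch moves to can be sketched in isolation. The option table below is a hypothetical, trimmed subset, and strcmp() replaces the kernel's length-aware match_option(); nospectre_v2 models the boolean boot option and arg the value of spectre_v2= (NULL when absent).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum v2_cmd { V2_CMD_NONE, V2_CMD_AUTO, V2_CMD_FORCE, V2_CMD_RETPOLINE };

/* Hypothetical, trimmed copy of the option table. */
static const struct {
	const char *option;
	enum v2_cmd cmd;
} v2_options[] = {
	{ "off",       V2_CMD_NONE },
	{ "on",        V2_CMD_FORCE },
	{ "retpoline", V2_CMD_RETPOLINE },
	{ "auto",      V2_CMD_AUTO },
};

static enum v2_cmd parse_v2_sketch(bool nospectre_v2, const char *arg)
{
	size_t i;

	if (nospectre_v2)
		return V2_CMD_NONE;
	/* No 'else' needed: the early return above ends that case. */
	if (!arg)
		return V2_CMD_AUTO;

	for (i = 0; i < sizeof(v2_options) / sizeof(v2_options[0]); i++) {
		if (strcmp(arg, v2_options[i].option) == 0)
			return v2_options[i].cmd;
	}

	fprintf(stderr, "unknown option (%s). Switching to AUTO select\n", arg);
	return V2_CMD_AUTO;
}
```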
Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185003.688010903@linutronix.de Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -273,22 +273,21 @@ static enum spectre_v2_mitigation_cmd __
if (cmdline_find_option_bool(boot_command_line, "nospectre_v2")) return SPECTRE_V2_CMD_NONE; - else { - ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, sizeof(arg)); - if (ret < 0) - return SPECTRE_V2_CMD_AUTO;
- for (i = 0; i < ARRAY_SIZE(mitigation_options); i++) { - if (!match_option(arg, ret, mitigation_options[i].option)) - continue; - cmd = mitigation_options[i].cmd; - break; - } + ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, sizeof(arg)); + if (ret < 0) + return SPECTRE_V2_CMD_AUTO;
- if (i >= ARRAY_SIZE(mitigation_options)) { - pr_err("unknown option (%s). Switching to AUTO select\n", arg); - return SPECTRE_V2_CMD_AUTO; - } + for (i = 0; i < ARRAY_SIZE(mitigation_options); i++) { + if (!match_option(arg, ret, mitigation_options[i].option)) + continue; + cmd = mitigation_options[i].cmd; + break; + } + + if (i >= ARRAY_SIZE(mitigation_options)) { + pr_err("unknown option (%s). Switching to AUTO select\n", arg); + return SPECTRE_V2_CMD_AUTO; }
if ((cmd == SPECTRE_V2_CMD_RETPOLINE ||
From: Tim Chen tim.c.chen@linux.intel.com
commit b86bda0426853bfe8a3506c7d2a5b332760ae46b upstream.
Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185003.783903657@linutronix.de Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -762,8 +762,6 @@ static void __init l1tf_select_mitigatio static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr, char *buf, unsigned int bug) { - int ret; - if (!boot_cpu_has_bug(bug)) return sprintf(buf, "Not affected\n");
@@ -778,13 +776,12 @@ static ssize_t cpu_show_common(struct de return sprintf(buf, "Mitigation: __user pointer sanitization\n");
case X86_BUG_SPECTRE_V2: - ret = sprintf(buf, "%s%s%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], + return sprintf(buf, "%s%s%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "", boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "", (x86_spec_ctrl_base & SPEC_CTRL_STIBP) ? ", STIBP" : "", boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? ", RSB filling" : "", spectre_v2_module_string()); - return ret;
case X86_BUG_SPEC_STORE_BYPASS: return sprintf(buf, "%s\n", ssb_strings[ssb_mode]);
From: Tim Chen tim.c.chen@linux.intel.com
commit a8f76ae41cd633ac00be1b3019b1eb4741be3828 upstream.
The Spectre V2 printout in cpu_show_common() handles conditionals for the various mitigation methods directly in the sprintf() argument list. That's hard to read and will become unreadable if more complex decisions need to be made for a particular method.
Move the conditionals for STIBP and IBPB string selection into helper functions, so they can be extended later on.
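The helper pattern can be sketched outside the kernel. The v2_state_sketch struct is a hypothetical snapshot of the flags the real code reads from boot_cpu_has() and x86_spec_ctrl_base, snprintf() replaces sprintf() for safety, and spectre_v2_module_string() is omitted.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical snapshot of the state the sysfs line is built from. */
struct v2_state_sketch {
	const char *base;	/* e.g. spectre_v2_strings[spectre_v2_enabled] */
	bool ibpb;
	bool ibrs_fw;
	bool stibp;
	bool rsb_ctxsw;
};

/* Each helper owns the decision for one suffix and can grow later. */
static const char *ibpb_state_sketch(const struct v2_state_sketch *s)
{
	return s->ibpb ? ", IBPB" : "";
}

static const char *stibp_state_sketch(const struct v2_state_sketch *s)
{
	return s->stibp ? ", STIBP" : "";
}

static int show_spectre_v2_sketch(char *buf, size_t len,
				  const struct v2_state_sketch *s)
{
	return snprintf(buf, len, "%s%s%s%s%s\n", s->base,
			ibpb_state_sketch(s),
			s->ibrs_fw ? ", IBRS_FW" : "",
			stibp_state_sketch(s),
			s->rsb_ctxsw ? ", RSB filling" : "");
}
```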
Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185003.874479208@linutronix.de Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -759,6 +759,22 @@ static void __init l1tf_select_mitigatio
#ifdef CONFIG_SYSFS
+static char *stibp_state(void) +{ + if (x86_spec_ctrl_base & SPEC_CTRL_STIBP) + return ", STIBP"; + else + return ""; +} + +static char *ibpb_state(void) +{ + if (boot_cpu_has(X86_FEATURE_USE_IBPB)) + return ", IBPB"; + else + return ""; +} + static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr, char *buf, unsigned int bug) { @@ -777,9 +793,9 @@ static ssize_t cpu_show_common(struct de
case X86_BUG_SPECTRE_V2: return sprintf(buf, "%s%s%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], - boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "", + ibpb_state(), boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "", - (x86_spec_ctrl_base & SPEC_CTRL_STIBP) ? ", STIBP" : "", + stibp_state(), boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? ", RSB filling" : "", spectre_v2_module_string());
From: Tim Chen tim.c.chen@linux.intel.com
commit 34bce7c9690b1d897686aac89604ba7adc365556 upstream.
If enhanced IBRS is active, STIBP is redundant for mitigating Spectre v2 user space exploits from a hyperthread sibling.
Disable STIBP when enhanced IBRS is used.
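A sketch of the two checks this adds, using a hypothetical subset of the mitigation modes: STIBP is neither requested nor reported once enhanced IBRS is the selected mitigation, even if the STIBP bit (bit 1) happens to be set in the base value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the kernel's spectre_v2 mitigation modes. */
enum v2_mode_sketch { V2_NONE, V2_RETPOLINE_GENERIC, V2_IBRS_ENHANCED };

static bool stibp_needed_sketch(enum v2_mode_sketch mode, bool has_stibp)
{
	if (mode == V2_NONE)
		return false;
	/* Per the patch, enhanced IBRS makes STIBP redundant. */
	if (mode == V2_IBRS_ENHANCED)
		return false;
	return has_stibp;
}

/* Mirrors stibp_state(): hide STIBP from sysfs under enhanced IBRS. */
static const char *stibp_state_sketch(enum v2_mode_sketch mode,
				      uint64_t spec_ctrl_base)
{
	if (mode == V2_IBRS_ENHANCED)
		return "";
	return (spec_ctrl_base & (UINT64_C(1) << 1)) ? ", STIBP" : "";
}
```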
Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185003.966801480@linutronix.de Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -317,6 +317,10 @@ static bool stibp_needed(void) if (spectre_v2_enabled == SPECTRE_V2_NONE) return false;
+ /* Enhanced IBRS makes using STIBP unnecessary. */ + if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) + return false; + if (!boot_cpu_has(X86_FEATURE_STIBP)) return false;
@@ -761,6 +765,9 @@ static void __init l1tf_select_mitigatio
static char *stibp_state(void) { + if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) + return ""; + if (x86_spec_ctrl_base & SPEC_CTRL_STIBP) return ", STIBP"; else
From: Thomas Gleixner tglx@linutronix.de
commit 26c4d75b234040c11728a8acb796b3a85ba7507c upstream.
During context switch, the SSBD bit in SPEC_CTRL MSR is updated according to changes of the TIF_SSBD flag in the current and next running task.
Currently, only the bit controlling speculative store bypass disable in SPEC_CTRL MSR is updated and the related update functions all have "speculative_store" or "ssb" in their names.
For enhanced mitigation control, other bits in the SPEC_CTRL MSR need to be updated as well, which makes the SSB-specific names inadequate.
Rename the "speculative_store*" functions to a more generic name. No functional change.
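The context-switch behaviour described above can be sketched with a simulated MSR. The constants follow the values visible in this series (TIF_SSBD is bit 5 in thread_info.h; SSBD is bit 2 of IA32_SPEC_CTRL), but the *_sketch functions and the sim_* variables are hypothetical. The flag is moved into place by a shift, in the style of ssbd_tif_to_spec_ctrl(), and the MSR is only written when the SSBD flag differs between the outgoing and incoming task.

```c
#include <stdint.h>

#define TIF_SSBD_SKETCH			5	/* bit number, as in thread_info.h */
#define TIF_SSBD_MASK			(1UL << TIF_SSBD_SKETCH)
#define SPEC_CTRL_SSBD_SHIFT_SKETCH	2	/* SSBD is bit 2 of IA32_SPEC_CTRL */

/* Simulated MSR and a counter of how often it was written. */
static uint64_t sim_spec_ctrl;
static unsigned int sim_msr_writes;

/* Move the TIF_SSBD bit into the SSBD position of SPEC_CTRL. */
static uint64_t ssbd_tif_to_spec_ctrl_sketch(unsigned long tifn)
{
	return (tifn & TIF_SSBD_MASK) >>
	       (TIF_SSBD_SKETCH - SPEC_CTRL_SSBD_SHIFT_SKETCH);
}

/* Stand-in for the renamed speculation_ctrl_update() path. */
static void speculation_ctrl_update_sketch(unsigned long tifn, uint64_t base)
{
	sim_spec_ctrl = base | ssbd_tif_to_spec_ctrl_sketch(tifn);
	sim_msr_writes++;
}

/* Context switch: only touch the MSR when the SSBD flag differs. */
static void switch_to_xtra_sketch(unsigned long tifp, unsigned long tifn,
				  uint64_t base)
{
	if ((tifp ^ tifn) & TIF_SSBD_MASK)
		speculation_ctrl_update_sketch(tifn, base);
}
```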
Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185004.058866968@linutronix.de Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/spec-ctrl.h | 6 +++--- arch/x86/kernel/cpu/bugs.c | 4 ++-- arch/x86/kernel/process.c | 12 ++++++------ 3 files changed, 11 insertions(+), 11 deletions(-)
--- a/arch/x86/include/asm/spec-ctrl.h +++ b/arch/x86/include/asm/spec-ctrl.h @@ -70,11 +70,11 @@ extern void speculative_store_bypass_ht_ static inline void speculative_store_bypass_ht_init(void) { } #endif
-extern void speculative_store_bypass_update(unsigned long tif); +extern void speculation_ctrl_update(unsigned long tif);
-static inline void speculative_store_bypass_update_current(void) +static inline void speculation_ctrl_update_current(void) { - speculative_store_bypass_update(current_thread_info()->flags); + speculation_ctrl_update(current_thread_info()->flags); }
#endif --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -192,7 +192,7 @@ x86_virt_spec_ctrl(u64 guest_spec_ctrl, tif = setguest ? ssbd_spec_ctrl_to_tif(guestval) : ssbd_spec_ctrl_to_tif(hostval);
- speculative_store_bypass_update(tif); + speculation_ctrl_update(tif); } } EXPORT_SYMBOL_GPL(x86_virt_spec_ctrl); @@ -629,7 +629,7 @@ static int ssb_prctl_set(struct task_str * mitigation until it is next scheduled. */ if (task == current && update) - speculative_store_bypass_update_current(); + speculation_ctrl_update_current();
return 0; } --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -317,27 +317,27 @@ static __always_inline void amd_set_ssb_ wrmsrl(MSR_AMD64_VIRT_SPEC_CTRL, ssbd_tif_to_spec_ctrl(tifn)); }
-static __always_inline void intel_set_ssb_state(unsigned long tifn) +static __always_inline void spec_ctrl_update_msr(unsigned long tifn) { u64 msr = x86_spec_ctrl_base | ssbd_tif_to_spec_ctrl(tifn);
wrmsrl(MSR_IA32_SPEC_CTRL, msr); }
-static __always_inline void __speculative_store_bypass_update(unsigned long tifn) +static __always_inline void __speculation_ctrl_update(unsigned long tifn) { if (static_cpu_has(X86_FEATURE_VIRT_SSBD)) amd_set_ssb_virt_state(tifn); else if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD)) amd_set_core_ssb_state(tifn); else - intel_set_ssb_state(tifn); + spec_ctrl_update_msr(tifn); }
-void speculative_store_bypass_update(unsigned long tif) +void speculation_ctrl_update(unsigned long tif) { preempt_disable(); - __speculative_store_bypass_update(tif); + __speculation_ctrl_update(tif); preempt_enable(); }
@@ -371,7 +371,7 @@ void __switch_to_xtra(struct task_struct cr4_toggle_bits(X86_CR4_TSD);
if ((tifp ^ tifn) & _TIF_SSBD) - __speculative_store_bypass_update(tifn); + __speculation_ctrl_update(tifn); }
/*
From: Tim Chen tim.c.chen@linux.intel.com
commit 01daf56875ee0cd50ed496a09b20eb369b45dfa5 upstream.
The logic that detects whether the previous and next tasks' flags differ in a way that requires updating the speculation control MSRs is spread out across multiple functions.
Consolidate all checks needed for updating the speculation control MSRs into the new __speculation_ctrl_update() helper function.
This makes it easy to pick the right speculation control MSR, and the bits in MSR_IA32_SPEC_CTRL that need updating, based on changes to the TIF flags.
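As a rough user-space model (not kernel code), the consolidated helper touches the MSR only when the relevant TIF bit differs between the previous and next task. The forced-update path passes ~tif as the "previous" flags so that every bit is guaranteed to differ. The bit values, the *_model function names, spec_ctrl_msr and wrmsr_count are hypothetical stand-ins chosen for illustration, and only the SPEC_CTRL path is modelled (the AMD LS_CFG/VIRT_SSBD branches are omitted):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; the real ones live in the kernel headers. */
#define TIF_SSBD_BIT    (1UL << 5)
#define SPEC_CTRL_SSBD  (1ULL << 2)

/* Hypothetical stand-ins for x86_spec_ctrl_base and the MSR itself. */
static uint64_t spec_ctrl_base = 0;
static uint64_t spec_ctrl_msr = 0;
static int wrmsr_count = 0;

static uint64_t ssbd_tif_to_spec_ctrl(unsigned long tifn)
{
	return (tifn & TIF_SSBD_BIT) ? SPEC_CTRL_SSBD : 0;
}

/* Model of __speculation_ctrl_update(): write only if TIF_SSBD changed. */
static void speculation_ctrl_update_model(unsigned long tifp, unsigned long tifn)
{
	uint64_t msr = spec_ctrl_base;
	bool updmsr = false;

	if ((tifp ^ tifn) & TIF_SSBD_BIT) {
		msr |= ssbd_tif_to_spec_ctrl(tifn);
		updmsr = true;
	}

	if (updmsr) {
		spec_ctrl_msr = msr;	/* stands in for wrmsrl() */
		wrmsr_count++;
	}
}

/* Model of speculation_ctrl_update(): ~tif differs from tif in every bit. */
static void speculation_ctrl_force_model(unsigned long tif)
{
	speculation_ctrl_update_model(~tif, tif);
}
```

A context switch between two tasks with the same TIF_SSBD state causes no MSR write, while the forced path always writes.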
Originally-by: Thomas Lendacky Thomas.Lendacky@amd.com Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185004.151077005@linutronix.de Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/process.c | 42 +++++++++++++++++++++++++++--------------- 1 file changed, 27 insertions(+), 15 deletions(-)
--- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -317,27 +317,40 @@ static __always_inline void amd_set_ssb_ wrmsrl(MSR_AMD64_VIRT_SPEC_CTRL, ssbd_tif_to_spec_ctrl(tifn)); }
-static __always_inline void spec_ctrl_update_msr(unsigned long tifn) +/* + * Update the MSRs managing speculation control, during context switch. + * + * tifp: Previous task's thread flags + * tifn: Next task's thread flags + */ +static __always_inline void __speculation_ctrl_update(unsigned long tifp, + unsigned long tifn) { - u64 msr = x86_spec_ctrl_base | ssbd_tif_to_spec_ctrl(tifn); + u64 msr = x86_spec_ctrl_base; + bool updmsr = false;
- wrmsrl(MSR_IA32_SPEC_CTRL, msr); -} + /* If TIF_SSBD is different, select the proper mitigation method */ + if ((tifp ^ tifn) & _TIF_SSBD) { + if (static_cpu_has(X86_FEATURE_VIRT_SSBD)) { + amd_set_ssb_virt_state(tifn); + } else if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD)) { + amd_set_core_ssb_state(tifn); + } else if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) || + static_cpu_has(X86_FEATURE_AMD_SSBD)) { + msr |= ssbd_tif_to_spec_ctrl(tifn); + updmsr = true; + } + }
-static __always_inline void __speculation_ctrl_update(unsigned long tifn) -{ - if (static_cpu_has(X86_FEATURE_VIRT_SSBD)) - amd_set_ssb_virt_state(tifn); - else if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD)) - amd_set_core_ssb_state(tifn); - else - spec_ctrl_update_msr(tifn); + if (updmsr) + wrmsrl(MSR_IA32_SPEC_CTRL, msr); }
void speculation_ctrl_update(unsigned long tif) { + /* Forced update. Make sure all relevant TIF flags are different */ preempt_disable(); - __speculation_ctrl_update(tif); + __speculation_ctrl_update(~tif, tif); preempt_enable(); }
@@ -370,8 +383,7 @@ void __switch_to_xtra(struct task_struct if ((tifp ^ tifn) & _TIF_NOTSC) cr4_toggle_bits(X86_CR4_TSD);
- if ((tifp ^ tifn) & _TIF_SSBD) - __speculation_ctrl_update(tifn); + __speculation_ctrl_update(tifp, tifn); }
/*
From: Thomas Gleixner tglx@linutronix.de
commit dbe733642e01dd108f71436aaea7b328cb28fd87 upstream.
CONFIG_SCHED_SMT is enabled by all distros, so there is no real point in making it configurable. The runtime overhead in the core scheduler code is minimal because the actual SMT scheduling parts are conditional on a static key.
This allows exposing the scheduler's SMT state static key to the speculation control code. Alternatively, the scheduler's static key could be made always available when CONFIG_SMP is enabled, but that would just add an unused static key to every other architecture for nothing.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185004.337452245@linutronix.de Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/Kconfig | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)
--- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -893,13 +893,7 @@ config NR_CPUS approximately eight kilobytes to the kernel image.
config SCHED_SMT - bool "SMT (Hyperthreading) scheduler support" - depends on SMP - ---help--- - SMT scheduler support improves the CPU scheduler's decision making - when dealing with Intel Pentium 4 chips with HyperThreading at a - cost of slightly increased overhead in some places. If unsure say - N here. + def_bool y if SMP
config SCHED_MC def_bool y
From: Ben Hutchings ben@decadent.org.uk
Add the sched_smt_active() function needed for some x86 speculation mitigations. This was introduced upstream by commits 1b568f0aabf2 "sched/core: Optimize SCHED_SMT", ba2591a5993e "sched/smt: Update sched_smt_present at runtime", c5511d03ec09 "sched/smt: Make sched_smt_present track topology", and 321a874a7ef8 "sched/smt: Expose sched_smt_present static key". The upstream implementation uses the static_key_{disable,enable}_cpuslocked() functions, which aren't practical to backport, so this version tracks the number of cores with SMT present in an atomic_t instead.
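A minimal user-space model of the counter-based approach, using C11 <stdatomic.h> in place of the kernel's atomic_t. The smt_mask_weight parameter is a hypothetical stand-in for cpumask_weight(cpu_smt_mask(cpu)) at the time of the hotplug event, and the cpu_going_* names are illustrative only:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Number of cores that currently have two SMT siblings active. */
static atomic_int sched_smt_present = 0;

static bool sched_smt_active(void)
{
	return atomic_load(&sched_smt_present) != 0;
}

/* Hypothetical hook for a CPU becoming active. */
static void cpu_going_active(int smt_mask_weight)
{
	/* The second sibling of a core coming up makes that core SMT. */
	if (smt_mask_weight == 2)
		atomic_fetch_add(&sched_smt_present, 1);
}

/* Hypothetical hook for a CPU about to go inactive. */
static void cpu_going_inactive(int smt_mask_weight)
{
	/* One of two siblings going down: the core is no longer SMT. */
	if (smt_mask_weight == 2)
		atomic_fetch_sub(&sched_smt_present, 1);
}
```

Unlike a static key, reading the counter costs a memory load, which is acceptable for the mitigation-selection paths that call sched_smt_active().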
Signed-off-by: Ben Hutchings ben@decadent.org.uk Cc: Thomas Gleixner tglx@linutronix.de Cc: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra (Intel) peterz@infradead.org Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/sched/smt.h | 18 ++++++++++++++++++ kernel/sched/core.c | 24 ++++++++++++++++++++++++ kernel/sched/sched.h | 1 + 3 files changed, 43 insertions(+) create mode 100644 include/linux/sched/smt.h
--- /dev/null +++ b/include/linux/sched/smt.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_SCHED_SMT_H +#define _LINUX_SCHED_SMT_H + +#include <linux/atomic.h> + +#ifdef CONFIG_SCHED_SMT +extern atomic_t sched_smt_present; + +static __always_inline bool sched_smt_active(void) +{ + return atomic_read(&sched_smt_present); +} +#else +static inline bool sched_smt_active(void) { return false; } +#endif + +#endif --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5610,6 +5610,10 @@ static void set_cpu_rq_start_time(void) rq->age_stamp = sched_clock_cpu(cpu); }
+#ifdef CONFIG_SCHED_SMT +atomic_t sched_smt_present = ATOMIC_INIT(0); +#endif + static int sched_cpu_active(struct notifier_block *nfb, unsigned long action, void *hcpu) { @@ -5626,11 +5630,23 @@ static int sched_cpu_active(struct notif * set_cpu_online(). But it might not yet have marked itself * as active, which is essential from here on. */ +#ifdef CONFIG_SCHED_SMT + /* + * When going up, increment the number of cores with SMT present. + */ + if (cpumask_weight(cpu_smt_mask(cpu)) == 2) + atomic_inc(&sched_smt_present); +#endif set_cpu_active(cpu, true); stop_machine_unpark(cpu); return NOTIFY_OK;
case CPU_DOWN_FAILED: +#ifdef CONFIG_SCHED_SMT + /* Same as for CPU_ONLINE */ + if (cpumask_weight(cpu_smt_mask(cpu)) == 2) + atomic_inc(&sched_smt_present); +#endif set_cpu_active(cpu, true); return NOTIFY_OK;
@@ -5645,7 +5661,15 @@ static int sched_cpu_inactive(struct not switch (action & ~CPU_TASKS_FROZEN) { case CPU_DOWN_PREPARE: set_cpu_active((long)hcpu, false); +#ifdef CONFIG_SCHED_SMT + /* + * When going down, decrement the number of cores with SMT present. + */ + if (cpumask_weight(cpu_smt_mask((long)hcpu)) == 2) + atomic_dec(&sched_smt_present); +#endif return NOTIFY_OK; + default: return NOTIFY_DONE; } --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2,6 +2,7 @@ #include <linux/sched.h> #include <linux/sched/sysctl.h> #include <linux/sched/rt.h> +#include <linux/sched/smt.h> #include <linux/sched/deadline.h> #include <linux/mutex.h> #include <linux/spinlock.h>
From: Thomas Gleixner tglx@linutronix.de
commit a74cfffb03b73d41e08f84c2e5c87dec0ce3db9f upstream.
arch_smt_update() is only called when the sysfs SMT control knob is changed. This means that when SMT is enabled in the sysfs control knob, the system is considered to have SMT active even if all siblings are offline.
To allow fine-grained control of the speculation mitigations, the actual SMT state is more interesting than the fact that siblings could be enabled.
Rework the code so that arch_smt_update() is invoked from each individual CPU hotplug function, and simplify the update function while at it.
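The simplified update function boils down to recomputing the STIBP bit from the actual SMT state and acting only if the result differs from the current base value. A user-space sketch of that decision (stibp_mask_for() and stibp_action() are hypothetical helpers; the MSR bit positions match MSR_IA32_SPEC_CTRL's IBRS and STIBP bits):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SPEC_CTRL_IBRS   (1ULL << 0)
#define SPEC_CTRL_STIBP  (1ULL << 1)

/* New SPEC_CTRL base: clear STIBP, then set it only if SMT is active. */
static uint64_t stibp_mask_for(uint64_t base, bool smt_active)
{
	uint64_t mask = base & ~SPEC_CTRL_STIBP;

	if (smt_active)
		mask |= SPEC_CTRL_STIBP;
	return mask;
}

/* Mirrors the pr_info() decision; NULL means nothing changes. */
static const char *stibp_action(uint64_t base, bool smt_active)
{
	uint64_t mask = stibp_mask_for(base, smt_active);

	if (mask == base)
		return NULL;
	return (mask & SPEC_CTRL_STIBP) ? "Enabling" : "Disabling";
}
```

Other bits in the base value (IBRS here) pass through unchanged; only STIBP tracks the SMT state.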
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 11 +++++------ include/linux/sched/smt.h | 2 ++ kernel/cpu.c | 10 +++++++++- 3 files changed, 16 insertions(+), 7 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -13,6 +13,7 @@ #include <linux/module.h> #include <linux/nospec.h> #include <linux/prctl.h> +#include <linux/sched/smt.h>
#include <asm/spec-ctrl.h> #include <asm/cmdline.h> @@ -340,16 +341,14 @@ void arch_smt_update(void) return;
mutex_lock(&spec_ctrl_mutex); - mask = x86_spec_ctrl_base; - if (IS_ENABLED(CONFIG_SMP)) + + mask = x86_spec_ctrl_base & ~SPEC_CTRL_STIBP; + if (sched_smt_active()) mask |= SPEC_CTRL_STIBP; - else - mask &= ~SPEC_CTRL_STIBP;
if (mask != x86_spec_ctrl_base) { pr_info("Spectre v2 cross-process SMT mitigation: %s STIBP\n", - IS_ENABLED(CONFIG_SMP) ? - "Enabling" : "Disabling"); + mask & SPEC_CTRL_STIBP ? "Enabling" : "Disabling"); x86_spec_ctrl_base = mask; on_each_cpu(update_stibp_msr, NULL, 1); } --- a/include/linux/sched/smt.h +++ b/include/linux/sched/smt.h @@ -15,4 +15,6 @@ static __always_inline bool sched_smt_ac static inline bool sched_smt_active(void) { return false; } #endif
+void arch_smt_update(void); + #endif --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -8,6 +8,7 @@ #include <linux/init.h> #include <linux/notifier.h> #include <linux/sched.h> +#include <linux/sched/smt.h> #include <linux/unistd.h> #include <linux/cpu.h> #include <linux/oom.h> @@ -199,6 +200,12 @@ void cpu_hotplug_enable(void) EXPORT_SYMBOL_GPL(cpu_hotplug_enable); #endif /* CONFIG_HOTPLUG_CPU */
+/* + * Architectures that need SMT-specific errata handling during SMT hotplug + * should override this. + */ +void __weak arch_smt_update(void) { } + /* Need to know about CPUs going up/down? */ int register_cpu_notifier(struct notifier_block *nb) { @@ -434,6 +441,7 @@ out_release: cpu_hotplug_done(); if (!err) cpu_notify_nofail(CPU_POST_DEAD | mod, hcpu); + arch_smt_update(); return err; }
@@ -537,7 +545,7 @@ out_notify: __cpu_notify(CPU_UP_CANCELED | mod, hcpu, nr_calls, NULL); out: cpu_hotplug_done(); - + arch_smt_update(); return ret; }
From: Thomas Gleixner tglx@linutronix.de
commit 15d6b7aab0793b2de8a05d8a828777dd24db424e upstream.
Reorder the code so it is better grouped. No functional change.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185004.707122879@linutronix.de [bwh: Backported to 4.4: - We still have the minimal mitigation modes - Adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 174 ++++++++++++++++++++++----------------------- 1 file changed, 87 insertions(+), 87 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -115,30 +115,6 @@ void __init check_bugs(void) #endif }
-/* The kernel command line selection */ -enum spectre_v2_mitigation_cmd { - SPECTRE_V2_CMD_NONE, - SPECTRE_V2_CMD_AUTO, - SPECTRE_V2_CMD_FORCE, - SPECTRE_V2_CMD_RETPOLINE, - SPECTRE_V2_CMD_RETPOLINE_GENERIC, - SPECTRE_V2_CMD_RETPOLINE_AMD, -}; - -static const char *spectre_v2_strings[] = { - [SPECTRE_V2_NONE] = "Vulnerable", - [SPECTRE_V2_RETPOLINE_MINIMAL] = "Vulnerable: Minimal generic ASM retpoline", - [SPECTRE_V2_RETPOLINE_MINIMAL_AMD] = "Vulnerable: Minimal AMD ASM retpoline", - [SPECTRE_V2_RETPOLINE_GENERIC] = "Mitigation: Full generic retpoline", - [SPECTRE_V2_RETPOLINE_AMD] = "Mitigation: Full AMD retpoline", - [SPECTRE_V2_IBRS_ENHANCED] = "Mitigation: Enhanced IBRS", -}; - -#undef pr_fmt -#define pr_fmt(fmt) "Spectre V2 : " fmt - -static enum spectre_v2_mitigation spectre_v2_enabled = SPECTRE_V2_NONE; - void x86_virt_spec_ctrl(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl, bool setguest) { @@ -208,6 +184,11 @@ static void x86_amd_ssb_disable(void) wrmsrl(MSR_AMD64_LS_CFG, msrval); }
+#undef pr_fmt +#define pr_fmt(fmt) "Spectre V2 : " fmt + +static enum spectre_v2_mitigation spectre_v2_enabled = SPECTRE_V2_NONE; + #ifdef RETPOLINE static bool spectre_v2_bad_module;
@@ -229,6 +210,45 @@ static inline const char *spectre_v2_mod static inline const char *spectre_v2_module_string(void) { return ""; } #endif
+static inline bool match_option(const char *arg, int arglen, const char *opt) +{ + int len = strlen(opt); + + return len == arglen && !strncmp(arg, opt, len); +} + +/* The kernel command line selection for spectre v2 */ +enum spectre_v2_mitigation_cmd { + SPECTRE_V2_CMD_NONE, + SPECTRE_V2_CMD_AUTO, + SPECTRE_V2_CMD_FORCE, + SPECTRE_V2_CMD_RETPOLINE, + SPECTRE_V2_CMD_RETPOLINE_GENERIC, + SPECTRE_V2_CMD_RETPOLINE_AMD, +}; + +static const char *spectre_v2_strings[] = { + [SPECTRE_V2_NONE] = "Vulnerable", + [SPECTRE_V2_RETPOLINE_MINIMAL] = "Vulnerable: Minimal generic ASM retpoline", + [SPECTRE_V2_RETPOLINE_MINIMAL_AMD] = "Vulnerable: Minimal AMD ASM retpoline", + [SPECTRE_V2_RETPOLINE_GENERIC] = "Mitigation: Full generic retpoline", + [SPECTRE_V2_RETPOLINE_AMD] = "Mitigation: Full AMD retpoline", + [SPECTRE_V2_IBRS_ENHANCED] = "Mitigation: Enhanced IBRS", +}; + +static const struct { + const char *option; + enum spectre_v2_mitigation_cmd cmd; + bool secure; +} mitigation_options[] = { + { "off", SPECTRE_V2_CMD_NONE, false }, + { "on", SPECTRE_V2_CMD_FORCE, true }, + { "retpoline", SPECTRE_V2_CMD_RETPOLINE, false }, + { "retpoline,amd", SPECTRE_V2_CMD_RETPOLINE_AMD, false }, + { "retpoline,generic", SPECTRE_V2_CMD_RETPOLINE_GENERIC, false }, + { "auto", SPECTRE_V2_CMD_AUTO, false }, +}; + static void __init spec2_print_if_insecure(const char *reason) { if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) @@ -246,31 +266,11 @@ static inline bool retp_compiler(void) return __is_defined(RETPOLINE); }
-static inline bool match_option(const char *arg, int arglen, const char *opt) -{ - int len = strlen(opt); - - return len == arglen && !strncmp(arg, opt, len); -} - -static const struct { - const char *option; - enum spectre_v2_mitigation_cmd cmd; - bool secure; -} mitigation_options[] = { - { "off", SPECTRE_V2_CMD_NONE, false }, - { "on", SPECTRE_V2_CMD_FORCE, true }, - { "retpoline", SPECTRE_V2_CMD_RETPOLINE, false }, - { "retpoline,amd", SPECTRE_V2_CMD_RETPOLINE_AMD, false }, - { "retpoline,generic", SPECTRE_V2_CMD_RETPOLINE_GENERIC, false }, - { "auto", SPECTRE_V2_CMD_AUTO, false }, -}; - static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void) { + enum spectre_v2_mitigation_cmd cmd = SPECTRE_V2_CMD_AUTO; char arg[20]; int ret, i; - enum spectre_v2_mitigation_cmd cmd = SPECTRE_V2_CMD_AUTO;
if (cmdline_find_option_bool(boot_command_line, "nospectre_v2")) return SPECTRE_V2_CMD_NONE; @@ -313,48 +313,6 @@ static enum spectre_v2_mitigation_cmd __ return cmd; }
-static bool stibp_needed(void) -{ - if (spectre_v2_enabled == SPECTRE_V2_NONE) - return false; - - /* Enhanced IBRS makes using STIBP unnecessary. */ - if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) - return false; - - if (!boot_cpu_has(X86_FEATURE_STIBP)) - return false; - - return true; -} - -static void update_stibp_msr(void *info) -{ - wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); -} - -void arch_smt_update(void) -{ - u64 mask; - - if (!stibp_needed()) - return; - - mutex_lock(&spec_ctrl_mutex); - - mask = x86_spec_ctrl_base & ~SPEC_CTRL_STIBP; - if (sched_smt_active()) - mask |= SPEC_CTRL_STIBP; - - if (mask != x86_spec_ctrl_base) { - pr_info("Spectre v2 cross-process SMT mitigation: %s STIBP\n", - mask & SPEC_CTRL_STIBP ? "Enabling" : "Disabling"); - x86_spec_ctrl_base = mask; - on_each_cpu(update_stibp_msr, NULL, 1); - } - mutex_unlock(&spec_ctrl_mutex); -} - static void __init spectre_v2_select_mitigation(void) { enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline(); @@ -459,6 +417,48 @@ specv2_set_mode: arch_smt_update(); }
+static bool stibp_needed(void) +{ + if (spectre_v2_enabled == SPECTRE_V2_NONE) + return false; + + /* Enhanced IBRS makes using STIBP unnecessary. */ + if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) + return false; + + if (!boot_cpu_has(X86_FEATURE_STIBP)) + return false; + + return true; +} + +static void update_stibp_msr(void *info) +{ + wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); +} + +void arch_smt_update(void) +{ + u64 mask; + + if (!stibp_needed()) + return; + + mutex_lock(&spec_ctrl_mutex); + + mask = x86_spec_ctrl_base & ~SPEC_CTRL_STIBP; + if (sched_smt_active()) + mask |= SPEC_CTRL_STIBP; + + if (mask != x86_spec_ctrl_base) { + pr_info("Spectre v2 cross-process SMT mitigation: %s STIBP\n", + mask & SPEC_CTRL_STIBP ? "Enabling" : "Disabling"); + x86_spec_ctrl_base = mask; + on_each_cpu(update_stibp_msr, NULL, 1); + } + mutex_unlock(&spec_ctrl_mutex); +} + #undef pr_fmt #define pr_fmt(fmt) "Speculative Store Bypass: " fmt
From: Thomas Gleixner tglx@linutronix.de
commit 8770709f411763884535662744a3786a1806afd3 upstream.
checkpatch.pl muttered when reshuffling the code:
  WARNING: static const char * array should probably be static const char * const
Fix up all the string arrays.
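For reference, the extra const makes the array elements themselves read-only, not just the characters they point to. A small sketch using a hypothetical enum and array modelled on ssb_strings:

```c
#include <assert.h>
#include <string.h>

enum ssb_mode { SSB_NONE, SSB_DISABLE, SSB_PRCTL };

/* Both the pointers and the strings they point to are read-only. */
static const char * const ssb_mode_strings[] = {
	[SSB_NONE]    = "Vulnerable",
	[SSB_DISABLE] = "Mitigation: Speculative Store Bypass disabled",
	[SSB_PRCTL]   = "Mitigation: Speculative Store Bypass disabled via prctl",
};

static const char *ssb_mode_string(enum ssb_mode mode)
{
	return ssb_mode_strings[mode];
}

/*
 * With "const char * const", an assignment such as
 *     ssb_mode_strings[SSB_NONE] = "oops";
 * fails to compile; with only "const char *" it would be accepted.
 */
```

The compiler can then also place the pointer table itself in read-only data.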
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185004.800018931@linutronix.de [bwh: Backported to 4.4: drop the part for KVM mitigation modes] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -227,7 +227,7 @@ enum spectre_v2_mitigation_cmd { SPECTRE_V2_CMD_RETPOLINE_AMD, };
-static const char *spectre_v2_strings[] = { +static const char * const spectre_v2_strings[] = { [SPECTRE_V2_NONE] = "Vulnerable", [SPECTRE_V2_RETPOLINE_MINIMAL] = "Vulnerable: Minimal generic ASM retpoline", [SPECTRE_V2_RETPOLINE_MINIMAL_AMD] = "Vulnerable: Minimal AMD ASM retpoline", @@ -473,7 +473,7 @@ enum ssb_mitigation_cmd { SPEC_STORE_BYPASS_CMD_SECCOMP, };
-static const char *ssb_strings[] = { +static const char * const ssb_strings[] = { [SPEC_STORE_BYPASS_NONE] = "Vulnerable", [SPEC_STORE_BYPASS_DISABLE] = "Mitigation: Speculative Store Bypass disabled", [SPEC_STORE_BYPASS_PRCTL] = "Mitigation: Speculative Store Bypass disabled via prctl",
From: Thomas Gleixner tglx@linutronix.de
commit 30ba72a990f5096ae08f284de17986461efcc408 upstream.
There is no point in keeping the command line option tables around after init.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185004.893886356@linutronix.de Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -240,7 +240,7 @@ static const struct { const char *option; enum spectre_v2_mitigation_cmd cmd; bool secure; -} mitigation_options[] = { +} mitigation_options[] __initdata = { { "off", SPECTRE_V2_CMD_NONE, false }, { "on", SPECTRE_V2_CMD_FORCE, true }, { "retpoline", SPECTRE_V2_CMD_RETPOLINE, false }, @@ -483,7 +483,7 @@ static const char * const ssb_strings[] static const struct { const char *option; enum ssb_mitigation_cmd cmd; -} ssb_mitigation_options[] = { +} ssb_mitigation_options[] __initdata = { { "auto", SPEC_STORE_BYPASS_CMD_AUTO }, /* Platform decides */ { "on", SPEC_STORE_BYPASS_CMD_ON }, /* Disable Speculative Store Bypass */ { "off", SPEC_STORE_BYPASS_CMD_NONE }, /* Don't touch Speculative Store Bypass */
From: Thomas Gleixner tglx@linutronix.de
commit 495d470e9828500e0155027f230449ac5e29c025 upstream.
There is no point in having two functions and a conditional at the call site.
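The merge works because the two old helpers are the two halves of a single comparison: print when the CPU's vulnerability status disagrees with the option's "secure" flag. A user-space check of that equivalence (the old_/new_should_print names are hypothetical; cpu_has_bug stands in for boot_cpu_has_bug(X86_BUG_SPECTRE_V2)):

```c
#include <assert.h>
#include <stdbool.h>

/* Old pair of helpers, expressed as a single predicate. */
static bool old_should_print(bool cpu_has_bug, bool secure)
{
	if (secure)
		return !cpu_has_bug;	/* spec2_print_if_secure() */
	return cpu_has_bug;		/* spec2_print_if_insecure() */
}

/* Merged condition used by spec_v2_print_cond(). */
static bool new_should_print(bool cpu_has_bug, bool secure)
{
	return cpu_has_bug != secure;
}
```

All four input combinations give the same result, so the change is purely a simplification.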
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185004.986890749@linutronix.de Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 17 ++++------------- 1 file changed, 4 insertions(+), 13 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -249,15 +249,9 @@ static const struct { { "auto", SPECTRE_V2_CMD_AUTO, false }, };
-static void __init spec2_print_if_insecure(const char *reason) +static void __init spec_v2_print_cond(const char *reason, bool secure) { - if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) - pr_info("%s selected on command line.\n", reason); -} - -static void __init spec2_print_if_secure(const char *reason) -{ - if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) + if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2) != secure) pr_info("%s selected on command line.\n", reason); }
@@ -305,11 +299,8 @@ static enum spectre_v2_mitigation_cmd __ return SPECTRE_V2_CMD_AUTO; }
- if (mitigation_options[i].secure) - spec2_print_if_secure(mitigation_options[i].option); - else - spec2_print_if_insecure(mitigation_options[i].option); - + spec_v2_print_cond(mitigation_options[i].option, + mitigation_options[i].secure); return cmd; }
From: Thomas Gleixner tglx@linutronix.de
commit fa1202ef224391b6f5b26cdd44cc50495e8fab54 upstream.
Add command line control for user space indirect branch speculation mitigations. The new option is: spectre_v2_user=
The initial options are:
  - on:   Unconditionally enabled
  - off:  Unconditionally disabled
  - auto: Kernel selects mitigation (default off for now)
When the spectre_v2= command line argument is either 'on' or 'off', the application-to-application control follows that state even if a contradicting spectre_v2_user= argument is supplied.
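The precedence rule can be sketched in user space as follows. This is a simplified model of spectre_v2_parse_user_cmdline(): the enums are trimmed-down, hypothetical versions of the kernel's, user_arg stands in for the value returned by cmdline_find_option() (NULL when the option is absent), and the pr_info()/pr_err() messages are omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum v2_cmd { V2_CMD_NONE, V2_CMD_AUTO, V2_CMD_FORCE, V2_CMD_RETPOLINE };
enum v2_user_cmd { V2_USER_CMD_NONE, V2_USER_CMD_AUTO, V2_USER_CMD_FORCE };

/* Exact-length match, as in the patch's match_option(). */
static bool match_option(const char *arg, int arglen, const char *opt)
{
	int len = (int)strlen(opt);

	return len == arglen && !strncmp(arg, opt, len);
}

static const struct {
	const char *option;
	enum v2_user_cmd cmd;
} v2_user_opts[] = {
	{ "auto", V2_USER_CMD_AUTO },
	{ "off",  V2_USER_CMD_NONE },
	{ "on",   V2_USER_CMD_FORCE },
};

static enum v2_user_cmd parse_user_cmd(enum v2_cmd v2, const char *user_arg)
{
	size_t i;

	/* spectre_v2=off/on take precedence over spectre_v2_user=. */
	if (v2 == V2_CMD_NONE)
		return V2_USER_CMD_NONE;
	if (v2 == V2_CMD_FORCE)
		return V2_USER_CMD_FORCE;

	if (!user_arg)
		return V2_USER_CMD_AUTO;

	for (i = 0; i < sizeof(v2_user_opts) / sizeof(v2_user_opts[0]); i++) {
		if (match_option(user_arg, (int)strlen(user_arg),
				 v2_user_opts[i].option))
			return v2_user_opts[i].cmd;
	}

	/* Unknown value: fall back to AUTO, as the patch does. */
	return V2_USER_CMD_AUTO;
}
```

For example, "spectre_v2=off spectre_v2_user=on" yields the "off" user-space mode, while "spectre_v2=auto spectre_v2_user=on" yields "on".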
Originally-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185005.082720373@linutronix.de [bwh: Backported to 4.4: - Don't use __ro_after_init or cpu_smt_control - Adjust filename] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/kernel-parameters.txt | 32 ++++++++ arch/x86/include/asm/nospec-branch.h | 10 ++ arch/x86/kernel/cpu/bugs.c | 131 ++++++++++++++++++++++++++++++----- 3 files changed, 154 insertions(+), 19 deletions(-)
--- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -3604,9 +3604,13 @@ bytes respectively. Such letter suffixes
spectre_v2= [X86] Control mitigation of Spectre variant 2 (indirect branch speculation) vulnerability. + The default operation protects the kernel from + user space attacks.
- on - unconditionally enable - off - unconditionally disable + on - unconditionally enable, implies + spectre_v2_user=on + off - unconditionally disable, implies + spectre_v2_user=off auto - kernel detects whether your CPU model is vulnerable
@@ -3616,6 +3620,12 @@ bytes respectively. Such letter suffixes CONFIG_RETPOLINE configuration option, and the compiler with which the kernel was built.
+ Selecting 'on' will also enable the mitigation + against user space to user space task attacks. + + Selecting 'off' will disable both the kernel and + the user space protections. + Specific mitigations can also be selected manually:
retpoline - replace indirect branches @@ -3625,6 +3635,24 @@ bytes respectively. Such letter suffixes Not specifying this option is equivalent to spectre_v2=auto.
+ spectre_v2_user= + [X86] Control mitigation of Spectre variant 2 + (indirect branch speculation) vulnerability between + user space tasks + + on - Unconditionally enable mitigations. Is + enforced by spectre_v2=on + + off - Unconditionally disable mitigations. Is + enforced by spectre_v2=off + + auto - Kernel selects the mitigation depending on + the available CPU features and vulnerability. + Default is off. + + Not specifying this option is equivalent to + spectre_v2_user=auto. + spec_store_bypass_disable= [HW] Control Speculative Store Bypass (SSB) Disable mitigation (Speculative Store Bypass vulnerability) --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -3,6 +3,8 @@ #ifndef _ASM_X86_NOSPEC_BRANCH_H_ #define _ASM_X86_NOSPEC_BRANCH_H_
+#include <linux/static_key.h> + #include <asm/alternative.h> #include <asm/alternative-asm.h> #include <asm/cpufeatures.h> @@ -172,6 +174,12 @@ enum spectre_v2_mitigation { SPECTRE_V2_IBRS_ENHANCED, };
+/* The indirect branch speculation control variants */ +enum spectre_v2_user_mitigation { + SPECTRE_V2_USER_NONE, + SPECTRE_V2_USER_STRICT, +}; + /* The Speculative Store Bypass disable variants */ enum ssb_mitigation { SPEC_STORE_BYPASS_NONE, @@ -248,6 +256,8 @@ do { \ preempt_enable(); \ } while (0)
+DECLARE_STATIC_KEY_FALSE(switch_to_cond_stibp); + #endif /* __ASSEMBLY__ */
/* --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -51,6 +51,9 @@ static u64 x86_spec_ctrl_mask = SPEC_CTR u64 x86_amd_ls_cfg_base; u64 x86_amd_ls_cfg_ssbd_mask;
+/* Control conditional STIPB in switch_to() */ +DEFINE_STATIC_KEY_FALSE(switch_to_cond_stibp); + void __init check_bugs(void) { identify_boot_cpu(); @@ -189,6 +192,8 @@ static void x86_amd_ssb_disable(void)
static enum spectre_v2_mitigation spectre_v2_enabled = SPECTRE_V2_NONE;
+static enum spectre_v2_user_mitigation spectre_v2_user = SPECTRE_V2_USER_NONE; + #ifdef RETPOLINE static bool spectre_v2_bad_module;
@@ -227,6 +232,103 @@ enum spectre_v2_mitigation_cmd { SPECTRE_V2_CMD_RETPOLINE_AMD, };
+enum spectre_v2_user_cmd { + SPECTRE_V2_USER_CMD_NONE, + SPECTRE_V2_USER_CMD_AUTO, + SPECTRE_V2_USER_CMD_FORCE, +}; + +static const char * const spectre_v2_user_strings[] = { + [SPECTRE_V2_USER_NONE] = "User space: Vulnerable", + [SPECTRE_V2_USER_STRICT] = "User space: Mitigation: STIBP protection", +}; + +static const struct { + const char *option; + enum spectre_v2_user_cmd cmd; + bool secure; +} v2_user_options[] __initdata = { + { "auto", SPECTRE_V2_USER_CMD_AUTO, false }, + { "off", SPECTRE_V2_USER_CMD_NONE, false }, + { "on", SPECTRE_V2_USER_CMD_FORCE, true }, +}; + +static void __init spec_v2_user_print_cond(const char *reason, bool secure) +{ + if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2) != secure) + pr_info("spectre_v2_user=%s forced on command line.\n", reason); +} + +static enum spectre_v2_user_cmd __init +spectre_v2_parse_user_cmdline(enum spectre_v2_mitigation_cmd v2_cmd) +{ + char arg[20]; + int ret, i; + + switch (v2_cmd) { + case SPECTRE_V2_CMD_NONE: + return SPECTRE_V2_USER_CMD_NONE; + case SPECTRE_V2_CMD_FORCE: + return SPECTRE_V2_USER_CMD_FORCE; + default: + break; + } + + ret = cmdline_find_option(boot_command_line, "spectre_v2_user", + arg, sizeof(arg)); + if (ret < 0) + return SPECTRE_V2_USER_CMD_AUTO; + + for (i = 0; i < ARRAY_SIZE(v2_user_options); i++) { + if (match_option(arg, ret, v2_user_options[i].option)) { + spec_v2_user_print_cond(v2_user_options[i].option, + v2_user_options[i].secure); + return v2_user_options[i].cmd; + } + } + + pr_err("Unknown user space protection option (%s). 
Switching to AUTO select\n", arg); + return SPECTRE_V2_USER_CMD_AUTO; +} + +static void __init +spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd) +{ + enum spectre_v2_user_mitigation mode = SPECTRE_V2_USER_NONE; + bool smt_possible = IS_ENABLED(CONFIG_SMP); + + if (!boot_cpu_has(X86_FEATURE_IBPB) && !boot_cpu_has(X86_FEATURE_STIBP)) + return; + + if (!IS_ENABLED(CONFIG_SMP)) + smt_possible = false; + + switch (spectre_v2_parse_user_cmdline(v2_cmd)) { + case SPECTRE_V2_USER_CMD_AUTO: + case SPECTRE_V2_USER_CMD_NONE: + goto set_mode; + case SPECTRE_V2_USER_CMD_FORCE: + mode = SPECTRE_V2_USER_STRICT; + break; + } + + /* Initialize Indirect Branch Prediction Barrier */ + if (boot_cpu_has(X86_FEATURE_IBPB)) { + setup_force_cpu_cap(X86_FEATURE_USE_IBPB); + pr_info("Spectre v2 mitigation: Enabling Indirect Branch Prediction Barrier\n"); + } + + /* If enhanced IBRS is enabled no STIPB required */ + if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) + return; + +set_mode: + spectre_v2_user = mode; + /* Only print the STIBP mode when SMT possible */ + if (smt_possible) + pr_info("%s\n", spectre_v2_user_strings[mode]); +} + static const char * const spectre_v2_strings[] = { [SPECTRE_V2_NONE] = "Vulnerable", [SPECTRE_V2_RETPOLINE_MINIMAL] = "Vulnerable: Minimal generic ASM retpoline", @@ -382,12 +484,6 @@ specv2_set_mode: setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW); pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
- /* Initialize Indirect Branch Prediction Barrier if supported */ - if (boot_cpu_has(X86_FEATURE_IBPB)) { - setup_force_cpu_cap(X86_FEATURE_USE_IBPB); - pr_info("Spectre v2 mitigation: Enabling Indirect Branch Prediction Barrier\n"); - } - /* * Retpoline means the kernel is safe because it has no indirect * branches. Enhanced IBRS protects firmware too, so, enable restricted @@ -404,23 +500,21 @@ specv2_set_mode: pr_info("Enabling Restricted Speculation for firmware calls\n"); }
+ /* Set up IBPB and STIBP depending on the general spectre V2 command */ + spectre_v2_user_select_mitigation(cmd); + /* Enable STIBP if appropriate */ arch_smt_update(); }
static bool stibp_needed(void) { - if (spectre_v2_enabled == SPECTRE_V2_NONE) - return false; - /* Enhanced IBRS makes using STIBP unnecessary. */ if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) return false;
- if (!boot_cpu_has(X86_FEATURE_STIBP)) - return false; - - return true; + /* Check for strict user mitigation mode */ + return spectre_v2_user == SPECTRE_V2_USER_STRICT; }
static void update_stibp_msr(void *info) @@ -758,10 +852,13 @@ static char *stibp_state(void) if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) return "";
- if (x86_spec_ctrl_base & SPEC_CTRL_STIBP) - return ", STIBP"; - else - return ""; + switch (spectre_v2_user) { + case SPECTRE_V2_USER_NONE: + return ", STIBP: disabled"; + case SPECTRE_V2_USER_STRICT: + return ", STIBP: forced"; + } + return ""; }
static char *ibpb_state(void)
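The spectre_v2_user= handling added in the hunks above is table driven. Below is a minimal user-space sketch of that lookup. It assumes simplified, hypothetical enum names (v2_cmd, user_cmd) and uses strcmp() in place of the kernel's match_option(). It keeps the patch's precedence rule: spectre_v2=off and spectre_v2=on override spectre_v2_user=, and unknown strings fall back to AUTO.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's spectre_v2 command enums. */
enum v2_cmd { V2_CMD_NONE, V2_CMD_AUTO, V2_CMD_FORCE };
enum user_cmd { USER_CMD_NONE, USER_CMD_AUTO, USER_CMD_FORCE };

static const struct {
	const char *option;
	enum user_cmd cmd;
} user_options[] = {
	{ "auto", USER_CMD_AUTO },
	{ "off",  USER_CMD_NONE },
	{ "on",   USER_CMD_FORCE },
};

/* arg is the value of spectre_v2_user=, or NULL when the option is absent. */
static enum user_cmd parse_user_cmd(enum v2_cmd v2, const char *arg)
{
	size_t i;

	/* The global spectre_v2= setting takes precedence. */
	switch (v2) {
	case V2_CMD_NONE:
		return USER_CMD_NONE;
	case V2_CMD_FORCE:
		return USER_CMD_FORCE;
	default:
		break;
	}

	if (arg == NULL)
		return USER_CMD_AUTO;

	for (i = 0; i < sizeof(user_options) / sizeof(user_options[0]); i++)
		if (strcmp(arg, user_options[i].option) == 0)
			return user_options[i].cmd;

	/* Unknown strings fall back to AUTO, as in the patch. */
	fprintf(stderr, "Unknown user space protection option (%s). "
		"Switching to AUTO select\n", arg);
	return USER_CMD_AUTO;
}
```

For example, parse_user_cmd(V2_CMD_NONE, "on") yields USER_CMD_NONE, because spectre_v2=off disables the user space mitigation regardless of spectre_v2_user=.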
From: Tim Chen tim.c.chen@linux.intel.com
commit 5bfbe3ad5840d941b89bcac54b821ba14f50a0ba upstream.
To avoid the overhead of having STIBP always on, it's necessary to allow per-task control of STIBP.
Add a new task flag TIF_SPEC_IB and evaluate it during context switch if SMT is active and flag evaluation is enabled by the speculation control code. Add the conditional evaluation to x86_virt_spec_ctrl() as well so the guest/host switch works properly.
This has no effect because TIF_SPEC_IB cannot be set yet and the static key which controls evaluation is off. Preparatory patch for adding the control code.
[ tglx: Simplify the context switch logic and make the TIF evaluation depend on SMP=y and on the static key controlling the conditional update. Rename it to TIF_SPEC_IB because it controls both STIBP and IBPB ]
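The flag-to-MSR conversion this patch adds is a single shift: TIF_SPEC_IB (bit 9 in thread_info.h) is moved down to the STIBP position (bit 1) of SPEC_CTRL. The user-space sketch below reproduces stibp_tif_to_spec_ctrl() and models only the STIBP part of __speculation_ctrl_update(). The bool cond_stibp is a hypothetical stand-in for the switch_to_cond_stibp static key, and C11 static_assert plays the role of BUILD_BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from this patch's thread_info.h and msr-index.h hunks. */
#define TIF_SPEC_IB            9
#define _TIF_SPEC_IB           (1UL << TIF_SPEC_IB)
#define SPEC_CTRL_STIBP_SHIFT  1
#define SPEC_CTRL_STIBP        (1ULL << SPEC_CTRL_STIBP_SHIFT)

/* User-space analogue of BUILD_BUG_ON(TIF_SPEC_IB < SPEC_CTRL_STIBP_SHIFT). */
static_assert(TIF_SPEC_IB >= SPEC_CTRL_STIBP_SHIFT, "shift would be negative");

/* Move the TIF_SPEC_IB bit down to the STIBP bit of SPEC_CTRL. */
static uint64_t stibp_tif_to_spec_ctrl(uint64_t tifn)
{
	return (tifn & _TIF_SPEC_IB) >> (TIF_SPEC_IB - SPEC_CTRL_STIBP_SHIFT);
}

/*
 * Compute the MSR value for the incoming task and report whether a write
 * is needed. A write is only needed when TIF_SPEC_IB differs between the
 * outgoing (tifp) and incoming (tifn) task.
 */
static bool stibp_msr_update(uint64_t base, unsigned long tifp,
			     unsigned long tifn, bool cond_stibp,
			     uint64_t *msr)
{
	bool updmsr = false;

	*msr = base;
	if (cond_stibp) {
		updmsr = ((tifp ^ tifn) & _TIF_SPEC_IB) != 0;
		*msr |= stibp_tif_to_spec_ctrl(tifn);
	}
	return updmsr;
}
```

With the key off (the state after this patch alone), the function never requests a write, matching the statement that the patch has no effect yet.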
Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185005.176917199@linutronix.de [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/msr-index.h | 5 +++-- arch/x86/include/asm/spec-ctrl.h | 12 ++++++++++++ arch/x86/include/asm/thread_info.h | 5 ++++- arch/x86/kernel/cpu/bugs.c | 4 ++++ arch/x86/kernel/process.c | 20 ++++++++++++++++++-- 5 files changed, 41 insertions(+), 5 deletions(-)
--- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -34,9 +34,10 @@ /* Intel MSRs. Some also available on other CPUs */ #define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */ #define SPEC_CTRL_IBRS (1 << 0) /* Indirect Branch Restricted Speculation */ -#define SPEC_CTRL_STIBP (1 << 1) /* Single Thread Indirect Branch Predictors */ +#define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */ +#define SPEC_CTRL_STIBP (1 << SPEC_CTRL_STIBP_SHIFT) /* STIBP mask */ #define SPEC_CTRL_SSBD_SHIFT 2 /* Speculative Store Bypass Disable bit */ -#define SPEC_CTRL_SSBD (1 << SPEC_CTRL_SSBD_SHIFT) /* Speculative Store Bypass Disable */ +#define SPEC_CTRL_SSBD (1 << SPEC_CTRL_SSBD_SHIFT) /* Speculative Store Bypass Disable */
#define MSR_IA32_PRED_CMD 0x00000049 /* Prediction Command */ #define PRED_CMD_IBPB (1 << 0) /* Indirect Branch Prediction Barrier */ --- a/arch/x86/include/asm/spec-ctrl.h +++ b/arch/x86/include/asm/spec-ctrl.h @@ -53,12 +53,24 @@ static inline u64 ssbd_tif_to_spec_ctrl( return (tifn & _TIF_SSBD) >> (TIF_SSBD - SPEC_CTRL_SSBD_SHIFT); }
+static inline u64 stibp_tif_to_spec_ctrl(u64 tifn) +{ + BUILD_BUG_ON(TIF_SPEC_IB < SPEC_CTRL_STIBP_SHIFT); + return (tifn & _TIF_SPEC_IB) >> (TIF_SPEC_IB - SPEC_CTRL_STIBP_SHIFT); +} + static inline unsigned long ssbd_spec_ctrl_to_tif(u64 spec_ctrl) { BUILD_BUG_ON(TIF_SSBD < SPEC_CTRL_SSBD_SHIFT); return (spec_ctrl & SPEC_CTRL_SSBD) << (TIF_SSBD - SPEC_CTRL_SSBD_SHIFT); }
+static inline unsigned long stibp_spec_ctrl_to_tif(u64 spec_ctrl) +{ + BUILD_BUG_ON(TIF_SPEC_IB < SPEC_CTRL_STIBP_SHIFT); + return (spec_ctrl & SPEC_CTRL_STIBP) << (TIF_SPEC_IB - SPEC_CTRL_STIBP_SHIFT); +} + static inline u64 ssbd_tif_to_amd_ls_cfg(u64 tifn) { return (tifn & _TIF_SSBD) ? x86_amd_ls_cfg_ssbd_mask : 0ULL; --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -96,6 +96,7 @@ struct thread_info { #define TIF_SYSCALL_EMU 6 /* syscall emulation active */ #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */ #define TIF_SECCOMP 8 /* secure computing */ +#define TIF_SPEC_IB 9 /* Indirect branch speculation mitigation */ #define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */ #define TIF_UPROBE 12 /* breakpointed or singlestepping */ #define TIF_NOTSC 16 /* TSC is not accessible in userland */ @@ -121,6 +122,7 @@ struct thread_info { #define _TIF_SYSCALL_EMU (1 << TIF_SYSCALL_EMU) #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) #define _TIF_SECCOMP (1 << TIF_SECCOMP) +#define _TIF_SPEC_IB (1 << TIF_SPEC_IB) #define _TIF_USER_RETURN_NOTIFY (1 << TIF_USER_RETURN_NOTIFY) #define _TIF_UPROBE (1 << TIF_UPROBE) #define _TIF_NOTSC (1 << TIF_NOTSC) @@ -149,7 +151,8 @@ struct thread_info {
/* flags to check in __switch_to() */ #define _TIF_WORK_CTXSW \ - (_TIF_IO_BITMAP|_TIF_NOTSC|_TIF_BLOCKSTEP|_TIF_SSBD) + (_TIF_IO_BITMAP|_TIF_NOTSC|_TIF_BLOCKSTEP| \ + _TIF_SSBD|_TIF_SPEC_IB)
#define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW|_TIF_USER_RETURN_NOTIFY) #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW) --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -139,6 +139,10 @@ x86_virt_spec_ctrl(u64 guest_spec_ctrl, static_cpu_has(X86_FEATURE_AMD_SSBD)) hostval |= ssbd_tif_to_spec_ctrl(ti->flags);
+ /* Conditional STIBP enabled? */ + if (static_branch_unlikely(&switch_to_cond_stibp)) + hostval |= stibp_tif_to_spec_ctrl(ti->flags); + if (hostval != guestval) { msrval = setguest ? guestval : hostval; wrmsrl(MSR_IA32_SPEC_CTRL, msrval); --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -326,11 +326,17 @@ static __always_inline void amd_set_ssb_ static __always_inline void __speculation_ctrl_update(unsigned long tifp, unsigned long tifn) { + unsigned long tif_diff = tifp ^ tifn; u64 msr = x86_spec_ctrl_base; bool updmsr = false;
- /* If TIF_SSBD is different, select the proper mitigation method */ - if ((tifp ^ tifn) & _TIF_SSBD) { + /* + * If TIF_SSBD is different, select the proper mitigation + * method. Note that if SSBD mitigation is disabled or permanentely + * enabled this branch can't be taken because nothing can set + * TIF_SSBD. + */ + if (tif_diff & _TIF_SSBD) { if (static_cpu_has(X86_FEATURE_VIRT_SSBD)) { amd_set_ssb_virt_state(tifn); } else if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD)) { @@ -342,6 +348,16 @@ static __always_inline void __speculatio } }
+ /* + * Only evaluate TIF_SPEC_IB if conditional STIBP is enabled, + * otherwise avoid the MSR write. + */ + if (IS_ENABLED(CONFIG_SMP) && + static_branch_unlikely(&switch_to_cond_stibp)) { + updmsr |= !!(tif_diff & _TIF_SPEC_IB); + msr |= stibp_tif_to_spec_ctrl(tifn); + } + if (updmsr) wrmsrl(MSR_IA32_SPEC_CTRL, msr); }
From: Thomas Gleixner tglx@linutronix.de
commit ff16701a29cba3aafa0bd1656d766813b2d0a811 upstream.
Move the conditional invocation of __switch_to_xtra() into an inline function so the logic can be shared between 32 and 64 bit.
Stop handing the TSS pointer through and retrieve it directly in the bitmap handling function. Use this_cpu_ptr() instead of the per_cpu() indirection.
This is a preparatory change so that the conditional indirect branch speculation optimization is integrated in only one place.
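The check that switch_to_extra() now performs for both 32 and 64 bit can be sketched in user space as follows. The bit numbers for TIF_SPEC_IB, TIF_USER_RETURN_NOTIFY and TIF_NOTSC are the ones visible in this series' thread_info.h hunk; the work mask is deliberately simplified (the real one also contains IO_BITMAP, BLOCKSTEP and SSBD). The point it illustrates is that the PREV and NEXT masks differ: the user-return notifier only matters for the outgoing task.

```c
#include <stdbool.h>

/* Bit numbers visible in the thread_info.h hunk of this series. */
#define TIF_SPEC_IB             9
#define TIF_USER_RETURN_NOTIFY 11
#define TIF_NOTSC              16

#define _TIF_SPEC_IB            (1UL << TIF_SPEC_IB)
#define _TIF_USER_RETURN_NOTIFY (1UL << TIF_USER_RETURN_NOTIFY)
#define _TIF_NOTSC              (1UL << TIF_NOTSC)

/* Simplified work mask; see the lead-in for what is omitted. */
#define _TIF_WORK_CTXSW      (_TIF_NOTSC | _TIF_SPEC_IB)
#define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW | _TIF_USER_RETURN_NOTIFY)
#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)

/* Mirrors the unlikely() test in switch_to_extra(). */
static bool needs_switch_to_xtra(unsigned long prev_tif, unsigned long next_tif)
{
	return (next_tif & _TIF_WORK_CTXSW_NEXT) ||
	       (prev_tif & _TIF_WORK_CTXSW_PREV);
}
```

Keeping this test inline means the common case (no flags set on either task) costs two AND operations and no function call.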
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185005.280855518@linutronix.de [bwh: Backported to 4.4: - Use cpu_tss instead of cpu_tss_rw - __switch_to() still uses the tss variable, so don't delete it - Adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/switch_to.h | 3 --- arch/x86/kernel/process.c | 12 +++++++----- arch/x86/kernel/process.h | 24 ++++++++++++++++++++++++ arch/x86/kernel/process_32.c | 9 +++------ arch/x86/kernel/process_64.c | 9 +++------ 5 files changed, 37 insertions(+), 20 deletions(-) create mode 100644 arch/x86/kernel/process.h
--- a/arch/x86/include/asm/switch_to.h +++ b/arch/x86/include/asm/switch_to.h @@ -6,9 +6,6 @@ struct task_struct; /* one of the stranger aspects of C forward declarations */ __visible struct task_struct *__switch_to(struct task_struct *prev, struct task_struct *next); -struct tss_struct; -void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p, - struct tss_struct *tss);
#ifdef CONFIG_X86_32
--- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -33,6 +33,8 @@ #include <asm/vm86.h> #include <asm/spec-ctrl.h>
+#include "process.h" + /* * per-CPU TSS segments. Threads are completely 'soft' on Linux, * no more per-task TSS's. The TSS size is kept cacheline-aligned @@ -179,11 +181,12 @@ int set_tsc_mode(unsigned int val) return 0; }
-static inline void switch_to_bitmap(struct tss_struct *tss, - struct thread_struct *prev, +static inline void switch_to_bitmap(struct thread_struct *prev, struct thread_struct *next, unsigned long tifp, unsigned long tifn) { + struct tss_struct *tss = this_cpu_ptr(&cpu_tss); + if (tifn & _TIF_IO_BITMAP) { /* * Copy the relevant range of the IO bitmap. @@ -370,8 +373,7 @@ void speculation_ctrl_update(unsigned lo preempt_enable(); }
-void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p, - struct tss_struct *tss) +void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p) { struct thread_struct *prev, *next; unsigned long tifp, tifn; @@ -381,7 +383,7 @@ void __switch_to_xtra(struct task_struct
tifn = READ_ONCE(task_thread_info(next_p)->flags); tifp = READ_ONCE(task_thread_info(prev_p)->flags); - switch_to_bitmap(tss, prev, next, tifp, tifn); + switch_to_bitmap(prev, next, tifp, tifn);
propagate_user_return_notify(prev_p, next_p);
--- /dev/null +++ b/arch/x86/kernel/process.h @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 +// +// Code shared between 32 and 64 bit + +void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p); + +/* + * This needs to be inline to optimize for the common case where no extra + * work needs to be done. + */ +static inline void switch_to_extra(struct task_struct *prev, + struct task_struct *next) +{ + unsigned long next_tif = task_thread_info(next)->flags; + unsigned long prev_tif = task_thread_info(prev)->flags; + + /* + * __switch_to_xtra() handles debug registers, i/o bitmaps, + * speculation mitigations etc. + */ + if (unlikely(next_tif & _TIF_WORK_CTXSW_NEXT || + prev_tif & _TIF_WORK_CTXSW_PREV)) + __switch_to_xtra(prev, next); +} --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -55,6 +55,8 @@ #include <asm/switch_to.h> #include <asm/vm86.h>
+#include "process.h" + asmlinkage void ret_from_fork(void) __asm__("ret_from_fork"); asmlinkage void ret_from_kernel_thread(void) __asm__("ret_from_kernel_thread");
@@ -279,12 +281,7 @@ __switch_to(struct task_struct *prev_p, if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl)) set_iopl_mask(next->iopl);
- /* - * Now maybe handle debug registers and/or IO bitmaps - */ - if (unlikely(task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV || - task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT)) - __switch_to_xtra(prev_p, next_p, tss); + switch_to_extra(prev_p, next_p);
/* * Leave lazy mode, flushing any hypercalls made here. --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -50,6 +50,8 @@ #include <asm/switch_to.h> #include <asm/xen/hypervisor.h>
+#include "process.h" + asmlinkage extern void ret_from_fork(void);
__visible DEFINE_PER_CPU(unsigned long, rsp_scratch); @@ -406,12 +408,7 @@ __switch_to(struct task_struct *prev_p, /* Reload esp0 and ss1. This changes current_thread_info(). */ load_sp0(tss, next);
- /* - * Now maybe reload the debug registers and handle I/O bitmaps - */ - if (unlikely(task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT || - task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV)) - __switch_to_xtra(prev_p, next_p, tss); + switch_to_extra(prev_p, next_p);
#ifdef CONFIG_XEN /*
From: Thomas Gleixner tglx@linutronix.de
commit 5635d99953f04b550738f6f4c1c532667c3fd872 upstream.
The TIF_SPEC_IB bit does not need to be evaluated in the decision to invoke __switch_to_xtra() when:
- CONFIG_SMP is disabled
- The conditional STIBP mode is disabled
The TIF_SPEC_IB bit still controls IBPB in both cases, so the TIF work mask checks might invoke __switch_to_xtra() for nothing if TIF_SPEC_IB is the only set bit in the work masks.
Optimize it out by masking the bit at compile time for CONFIG_SMP=n and at run time when the static key controlling the conditional STIBP mode is disabled.
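The run-time half of this optimization can be sketched as below. The bool cond_stibp is a hypothetical stand-in for the switch_to_cond_stibp static key, the work mask is simplified to two bits (bit numbers from this series), and the PREV/NEXT masks are treated as equal for brevity. When the key is off, TIF_SPEC_IB is stripped from both flag words before the test, so it can no longer trigger the slow path on its own.

```c
#include <stdbool.h>

#define TIF_SPEC_IB   9
#define TIF_NOTSC    16
#define _TIF_SPEC_IB (1UL << TIF_SPEC_IB)
#define _TIF_NOTSC   (1UL << TIF_NOTSC)

/* Simplified CONFIG_SMP=y work mask (real mask has more bits). */
#define _TIF_WORK_CTXSW (_TIF_NOTSC | _TIF_SPEC_IB)

/* Model of switch_to_extra() after this patch. */
static bool needs_switch_to_xtra(unsigned long prev_tif,
				 unsigned long next_tif, bool cond_stibp)
{
	if (!cond_stibp) {
		/* TIF_SPEC_IB alone must not force the slow path. */
		prev_tif &= ~_TIF_SPEC_IB;
		next_tif &= ~_TIF_SPEC_IB;
	}
	return ((prev_tif | next_tif) & _TIF_WORK_CTXSW) != 0;
}
```

For CONFIG_SMP=n the kernel achieves the same result at compile time by leaving _TIF_SPEC_IB out of _TIF_WORK_CTXSW altogether.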
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185005.374062201@linutronix.de [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/thread_info.h | 13 +++++++++++-- arch/x86/kernel/process.h | 15 +++++++++++++++ 2 files changed, 26 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -150,9 +150,18 @@ struct thread_info { _TIF_NOHZ)
/* flags to check in __switch_to() */ -#define _TIF_WORK_CTXSW \ +#define _TIF_WORK_CTXSW_BASE \ (_TIF_IO_BITMAP|_TIF_NOTSC|_TIF_BLOCKSTEP| \ - _TIF_SSBD|_TIF_SPEC_IB) + _TIF_SSBD) + +/* + * Avoid calls to __switch_to_xtra() on UP as STIBP is not evaluated. + */ +#ifdef CONFIG_SMP +# define _TIF_WORK_CTXSW (_TIF_WORK_CTXSW_BASE | _TIF_SPEC_IB) +#else +# define _TIF_WORK_CTXSW (_TIF_WORK_CTXSW_BASE) +#endif
#define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW|_TIF_USER_RETURN_NOTIFY) #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW) --- a/arch/x86/kernel/process.h +++ b/arch/x86/kernel/process.h @@ -2,6 +2,8 @@ // // Code shared between 32 and 64 bit
+#include <asm/spec-ctrl.h> + void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p);
/* @@ -14,6 +16,19 @@ static inline void switch_to_extra(struc unsigned long next_tif = task_thread_info(next)->flags; unsigned long prev_tif = task_thread_info(prev)->flags;
+ if (IS_ENABLED(CONFIG_SMP)) { + /* + * Avoid __switch_to_xtra() invocation when conditional + * STIPB is disabled and the only different bit is + * TIF_SPEC_IB. For CONFIG_SMP=n TIF_SPEC_IB is not + * in the TIF_WORK_CTXSW masks. + */ + if (!static_branch_likely(&switch_to_cond_stibp)) { + prev_tif &= ~_TIF_SPEC_IB; + next_tif &= ~_TIF_SPEC_IB; + } + } + /* * __switch_to_xtra() handles debug registers, i/o bitmaps, * speculation mitigations etc.
From: Thomas Gleixner tglx@linutronix.de
commit 4c71a2b6fd7e42814aa68a6dec88abf3b42ea573 upstream.
The IBPB speculation barrier is issued from switch_mm() when the kernel switches to a user space task with a different mm than the user space task which ran last on the same CPU.
An additional optimization is to avoid IBPB when the incoming task can be ptraced by the outgoing task. This optimization only works when switching directly between two user space tasks. When switching from a kernel task to a user space task the optimization fails because the previous task cannot be accessed anymore. So in quite a few scenarios the optimization just adds overhead.
The upcoming conditional IBPB support will issue IBPB only for user space tasks which have the TIF_SPEC_IB bit set. This requires handling the following cases:
1) Switch from a user space task (potential attacker) which has TIF_SPEC_IB set to a user space task (potential victim) which has TIF_SPEC_IB not set.
2) Switch from a user space task (potential attacker) which has TIF_SPEC_IB not set to a user space task (potential victim) which has TIF_SPEC_IB set.
This needs to be optimized for the case where the IBPB can be avoided when only kernel threads ran in between user space tasks which belong to the same process.
The current check whether two tasks belong to the same context uses the task's context id. While correct, it's simpler to use the mm pointer because it allows mangling the TIF_SPEC_IB bit into it. The context-id-based mechanism requires extra storage, which results in worse code.
When a task is scheduled out, its TIF_SPEC_IB bit is mangled as bit 0 into the per-CPU storage which is used to track the last user space mm which was running on a CPU. This bit can be used together with the TIF_SPEC_IB bit of the incoming task to decide whether IBPB needs to be issued to cover the two cases above.
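The decision made by the conditional mode of cond_ibpb() in this patch can be modelled in user space as follows. This is a sketch under stated assumptions: mm pointers are plain uintptr_t values, one global replaces the per-CPU cpu_tlbstate.last_user_mm_ibpb, and an mm of 0 stands for a kernel thread. Because mm_struct pointers are word aligned, bit 0 is free to carry the TIF_SPEC_IB state.

```c
#include <stdbool.h>
#include <stdint.h>

#define TIF_SPEC_IB       9      /* bit number from this series */
#define LAST_USER_MM_IBPB 0x1UL  /* bit 0 of the stored mm value */

/* One CPU's last_user_mm_ibpb, modelled as a global. */
static uintptr_t last_user_mm_ibpb;

/* Fold the task's TIF_SPEC_IB bit into bit 0 of its mm pointer. */
static uintptr_t mm_mangle_tif_spec_ib(uintptr_t mm, unsigned long tif)
{
	return mm | ((tif >> TIF_SPEC_IB) & LAST_USER_MM_IBPB);
}

/*
 * Returns true when the conditional mode would issue IBPB for the
 * incoming task. Kernel threads (mm == 0) leave the stored state alone.
 */
static bool cond_ibpb_needed(uintptr_t mm, unsigned long tif)
{
	uintptr_t next_mm, prev_mm;
	bool ibpb;

	if (mm == 0)
		return false;

	next_mm = mm_mangle_tif_spec_ib(mm, tif);
	prev_mm = last_user_mm_ibpb;
	/* Different mm (or flag) and at least one side has the bit set. */
	ibpb = next_mm != prev_mm && ((next_mm | prev_mm) & LAST_USER_MM_IBPB);

	last_user_mm_ibpb = next_mm;
	return ibpb;
}
```

In a sequence such as A(flagged), kernel thread, A(flagged), the kernel thread does not overwrite the stored value, so switching back to A needs no barrier. Switching between two unflagged processes never issues one.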
As conditional IBPB is going to be the default, remove the dubious ptrace check for the IBPB always case and simply issue IBPB always when the process changes.
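The always-on mode reduces to a pointer comparison against the last user mm seen on the CPU. A minimal sketch, assuming a dummy struct mm_struct (hypothetical, only its address matters) and a single global in place of the per-CPU cpu_tlbstate.last_user_mm:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in; only pointer identity is used. */
struct mm_struct { int id; };

static const struct mm_struct *last_user_mm;

/* Model of the switch_mm_always_ibpb branch of cond_ibpb(). */
static bool always_ibpb_needed(const struct mm_struct *next_mm)
{
	if (next_mm == NULL)        /* kernel thread: keep last_user_mm */
		return false;
	if (last_user_mm == next_mm)
		return false;           /* same process as last user task */
	last_user_mm = next_mm;
	return true;
}
```

No ptrace check is involved: any change of user process on a CPU issues the barrier.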
Move the storage to a different place in the struct as the original one created a hole.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185005.466447057@linutronix.de [bwh: Backported to 4.4: - Drop changes in initialize_tlbstate_and_flush() - Adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/nospec-branch.h | 2 arch/x86/include/asm/tlbflush.h | 8 +- arch/x86/kernel/cpu/bugs.c | 29 +++++++- arch/x86/mm/tlb.c | 113 ++++++++++++++++++++++++++--------- 4 files changed, 117 insertions(+), 35 deletions(-)
--- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -257,6 +257,8 @@ do { \ } while (0)
DECLARE_STATIC_KEY_FALSE(switch_to_cond_stibp); +DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ibpb); +DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
#endif /* __ASSEMBLY__ */
--- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -68,8 +68,12 @@ static inline void invpcid_flush_all_non struct tlb_state { struct mm_struct *active_mm; int state; - /* last user mm's ctx id */ - u64 last_ctx_id; + + /* Last user mm for optimizing IBPB */ + union { + struct mm_struct *last_user_mm; + unsigned long last_user_mm_ibpb; + };
/* * Access to this CR4 shadow and to H/W CR4 is protected by --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -53,6 +53,10 @@ u64 x86_amd_ls_cfg_ssbd_mask;
/* Control conditional STIPB in switch_to() */ DEFINE_STATIC_KEY_FALSE(switch_to_cond_stibp); +/* Control conditional IBPB in switch_mm() */ +DEFINE_STATIC_KEY_FALSE(switch_mm_cond_ibpb); +/* Control unconditional IBPB in switch_mm() */ +DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
void __init check_bugs(void) { @@ -319,7 +323,17 @@ spectre_v2_user_select_mitigation(enum s /* Initialize Indirect Branch Prediction Barrier */ if (boot_cpu_has(X86_FEATURE_IBPB)) { setup_force_cpu_cap(X86_FEATURE_USE_IBPB); - pr_info("Spectre v2 mitigation: Enabling Indirect Branch Prediction Barrier\n"); + + switch (mode) { + case SPECTRE_V2_USER_STRICT: + static_branch_enable(&switch_mm_always_ibpb); + break; + default: + break; + } + + pr_info("mitigation: Enabling %s Indirect Branch Prediction Barrier\n", + mode == SPECTRE_V2_USER_STRICT ? "always-on" : "conditional"); }
/* If enhanced IBRS is enabled no STIPB required */ @@ -867,10 +881,15 @@ static char *stibp_state(void)
static char *ibpb_state(void) { - if (boot_cpu_has(X86_FEATURE_USE_IBPB)) - return ", IBPB"; - else - return ""; + if (boot_cpu_has(X86_FEATURE_IBPB)) { + switch (spectre_v2_user) { + case SPECTRE_V2_USER_NONE: + return ", IBPB: disabled"; + case SPECTRE_V2_USER_STRICT: + return ", IBPB: always-on"; + } + } + return ""; }
static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr, --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -7,7 +7,6 @@ #include <linux/module.h> #include <linux/cpu.h> #include <linux/debugfs.h> -#include <linux/ptrace.h>
#include <asm/tlbflush.h> #include <asm/mmu_context.h> @@ -31,6 +30,12 @@ * Implement flush IPI by CALL_FUNCTION_VECTOR, Alex Shi */
+/* + * Use bit 0 to mangle the TIF_SPEC_IB state into the mm pointer which is + * stored in cpu_tlb_state.last_user_mm_ibpb. + */ +#define LAST_USER_MM_IBPB 0x1UL + atomic64_t last_mm_ctx_id = ATOMIC64_INIT(1);
struct flush_tlb_info { @@ -102,17 +107,87 @@ void switch_mm(struct mm_struct *prev, s local_irq_restore(flags); }
-static bool ibpb_needed(struct task_struct *tsk, u64 last_ctx_id) +static inline unsigned long mm_mangle_tif_spec_ib(struct task_struct *next) { + unsigned long next_tif = task_thread_info(next)->flags; + unsigned long ibpb = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_IBPB; + + return (unsigned long)next->mm | ibpb; +} + +static void cond_ibpb(struct task_struct *next) +{ + if (!next || !next->mm) + return; + /* - * Check if the current (previous) task has access to the memory - * of the @tsk (next) task. If access is denied, make sure to - * issue a IBPB to stop user->user Spectre-v2 attacks. - * - * Note: __ptrace_may_access() returns 0 or -ERRNO. + * Both, the conditional and the always IBPB mode use the mm + * pointer to avoid the IBPB when switching between tasks of the + * same process. Using the mm pointer instead of mm->context.ctx_id + * opens a hypothetical hole vs. mm_struct reuse, which is more or + * less impossible to control by an attacker. Aside of that it + * would only affect the first schedule so the theoretically + * exposed data is not really interesting. */ - return (tsk && tsk->mm && tsk->mm->context.ctx_id != last_ctx_id && - ptrace_may_access_sched(tsk, PTRACE_MODE_SPEC_IBPB)); + if (static_branch_likely(&switch_mm_cond_ibpb)) { + unsigned long prev_mm, next_mm; + + /* + * This is a bit more complex than the always mode because + * it has to handle two cases: + * + * 1) Switch from a user space task (potential attacker) + * which has TIF_SPEC_IB set to a user space task + * (potential victim) which has TIF_SPEC_IB not set. + * + * 2) Switch from a user space task (potential attacker) + * which has TIF_SPEC_IB not set to a user space task + * (potential victim) which has TIF_SPEC_IB set. + * + * This could be done by unconditionally issuing IBPB when + * a task which has TIF_SPEC_IB set is either scheduled in + * or out. 
Though that results in two flushes when: + * + * - the same user space task is scheduled out and later + * scheduled in again and only a kernel thread ran in + * between. + * + * - a user space task belonging to the same process is + * scheduled in after a kernel thread ran in between + * + * - a user space task belonging to the same process is + * scheduled in immediately. + * + * Optimize this with reasonably small overhead for the + * above cases. Mangle the TIF_SPEC_IB bit into the mm + * pointer of the incoming task which is stored in + * cpu_tlbstate.last_user_mm_ibpb for comparison. + */ + next_mm = mm_mangle_tif_spec_ib(next); + prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_ibpb); + + /* + * Issue IBPB only if the mm's are different and one or + * both have the IBPB bit set. + */ + if (next_mm != prev_mm && + (next_mm | prev_mm) & LAST_USER_MM_IBPB) + indirect_branch_prediction_barrier(); + + this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, next_mm); + } + + if (static_branch_unlikely(&switch_mm_always_ibpb)) { + /* + * Only flush when switching to a user space task with a + * different context than the user space task which ran + * last on this CPU. + */ + if (this_cpu_read(cpu_tlbstate.last_user_mm) != next->mm) { + indirect_branch_prediction_barrier(); + this_cpu_write(cpu_tlbstate.last_user_mm, next->mm); + } + } }
void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, @@ -121,30 +196,12 @@ void switch_mm_irqs_off(struct mm_struct unsigned cpu = smp_processor_id();
if (likely(prev != next)) { - u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id); - /* * Avoid user/user BTB poisoning by flushing the branch * predictor when switching between processes. This stops * one process from doing Spectre-v2 attacks on another. - * - * As an optimization, flush indirect branches only when - * switching into a processes that can't be ptrace by the - * current one (as in such case, attacker has much more - * convenient way how to tamper with the next process than - * branch buffer poisoning). - */ - if (static_cpu_has(X86_FEATURE_USE_IBPB) && - ibpb_needed(tsk, last_ctx_id)) - indirect_branch_prediction_barrier(); - - /* - * Record last user mm's context id, so we can avoid - * flushing branch buffer with IBPB if we switch back - * to the same user. */ - if (next != &init_mm) - this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id); + cond_ibpb(tsk);
this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK); this_cpu_write(cpu_tlbstate.active_mm, next);
From: Thomas Gleixner tglx@linutronix.de
commit e6da8bb6f9abb2628381904b24163c770e630bac upstream.
The update of the TIF_SSBD flag and the conditional speculation control MSR update are done in the ssb_prctl_set() function directly. The upcoming prctl support for controlling indirect branch speculation via STIBP needs the same mechanism.
Split the code out and make it reusable. Reword the comment about updates for other tasks.
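The helper split out here only touches the MSRs when the flag actually changed and the target is the running task. The user-space sketch below illustrates that rule. It uses a hypothetical struct task, a hypothetical current_task pointer in place of current, an msr_updates counter in place of speculation_ctrl_update_current(), and non-atomic test-and-set/clear helpers (the kernel's versions are atomic).

```c
#include <stdbool.h>

/* Hypothetical task model: thread flags plus a count of MSR updates. */
struct task {
	unsigned long flags;
	int msr_updates;
};

static struct task *current_task;   /* stand-in for 'current' */

/* Non-atomic simplifications of test_and_{set,clear}_tsk_thread_flag(). */
static bool test_and_set_flag(struct task *t, int bit)
{
	unsigned long mask = 1UL << bit;
	bool old = (t->flags & mask) != 0;

	t->flags |= mask;
	return old;
}

static bool test_and_clear_flag(struct task *t, int bit)
{
	unsigned long mask = 1UL << bit;
	bool old = (t->flags & mask) != 0;

	t->flags &= ~mask;
	return old;
}

static void task_update_spec_tif(struct task *tsk, int tifbit, bool on)
{
	bool update;

	/* update is true only if the flag really changed state. */
	if (on)
		update = !test_and_set_flag(tsk, tifbit);
	else
		update = test_and_clear_flag(tsk, tifbit);

	/* Other tasks pick up the new state when next scheduled. */
	if (tsk == current_task && update)
		tsk->msr_updates++;
}
```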
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20181125185005.652305076@linutronix.de Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 35 +++++++++++++++++++++++------------ 1 file changed, 23 insertions(+), 12 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -697,10 +697,29 @@ static void ssb_select_mitigation(void) #undef pr_fmt #define pr_fmt(fmt) "Speculation prctl: " fmt
-static int ssb_prctl_set(struct task_struct *task, unsigned long ctrl) +static void task_update_spec_tif(struct task_struct *tsk, int tifbit, bool on) { bool update;
+ if (on) + update = !test_and_set_tsk_thread_flag(tsk, tifbit); + else + update = test_and_clear_tsk_thread_flag(tsk, tifbit); + + /* + * Immediately update the speculation control MSRs for the current + * task, but for a non-current task delay setting the CPU + * mitigation until it is scheduled next. + * + * This can only happen for SECCOMP mitigation. For PRCTL it's + * always the current task. + */ + if (tsk == current && update) + speculation_ctrl_update_current(); +} + +static int ssb_prctl_set(struct task_struct *task, unsigned long ctrl) +{ if (ssb_mode != SPEC_STORE_BYPASS_PRCTL && ssb_mode != SPEC_STORE_BYPASS_SECCOMP) return -ENXIO; @@ -711,28 +730,20 @@ static int ssb_prctl_set(struct task_str if (task_spec_ssb_force_disable(task)) return -EPERM; task_clear_spec_ssb_disable(task); - update = test_and_clear_tsk_thread_flag(task, TIF_SSBD); + task_update_spec_tif(task, TIF_SSBD, false); break; case PR_SPEC_DISABLE: task_set_spec_ssb_disable(task); - update = !test_and_set_tsk_thread_flag(task, TIF_SSBD); + task_update_spec_tif(task, TIF_SSBD, true); break; case PR_SPEC_FORCE_DISABLE: task_set_spec_ssb_disable(task); task_set_spec_ssb_force_disable(task); - update = !test_and_set_tsk_thread_flag(task, TIF_SSBD); + task_update_spec_tif(task, TIF_SSBD, true); break; default: return -ERANGE; } - - /* - * If being set on non-current task, delay setting the CPU - * mitigation until it is next scheduled. - */ - if (task == current && update) - speculation_ctrl_update_current(); - return 0; }
From: Thomas Gleixner tglx@linutronix.de
commit 6893a959d7fdebbab5f5aa112c277d5a44435ba1 upstream.
The upcoming fine-grained per-task STIBP control needs to be updated on CPU hotplug as well.
Split out the code which controls the strict mode so the prctl control code can be added later. Mark the SMP function call argument __unused while at it.
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Reviewed-by: Ingo Molnar mingo@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Andy Lutomirski luto@kernel.org
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Jiri Kosina jkosina@suse.cz
Cc: Tom Lendacky thomas.lendacky@amd.com
Cc: Josh Poimboeuf jpoimboe@redhat.com
Cc: Andrea Arcangeli aarcange@redhat.com
Cc: David Woodhouse dwmw@amazon.co.uk
Cc: Tim Chen tim.c.chen@linux.intel.com
Cc: Andi Kleen ak@linux.intel.com
Cc: Dave Hansen dave.hansen@intel.com
Cc: Casey Schaufler casey.schaufler@intel.com
Cc: Asit Mallick asit.k.mallick@intel.com
Cc: Arjan van de Ven arjan@linux.intel.com
Cc: Jon Masters jcm@redhat.com
Cc: Waiman Long longman9394@gmail.com
Cc: Greg KH gregkh@linuxfoundation.org
Cc: Dave Stewart david.c.stewart@intel.com
Cc: Kees Cook keescook@chromium.org
Link: https://lkml.kernel.org/r/20181125185005.759457117@linutronix.de
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kernel/cpu/bugs.c | 46 ++++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 21 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -525,40 +525,44 @@ specv2_set_mode:
 	arch_smt_update();
 }

-static bool stibp_needed(void)
+static void update_stibp_msr(void * __unused)
 {
-	/* Enhanced IBRS makes using STIBP unnecessary. */
-	if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED)
-		return false;
-
-	/* Check for strict user mitigation mode */
-	return spectre_v2_user == SPECTRE_V2_USER_STRICT;
+	wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
 }

-static void update_stibp_msr(void *info)
+/* Update x86_spec_ctrl_base in case SMT state changed. */
+static void update_stibp_strict(void)
 {
-	wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
+	u64 mask = x86_spec_ctrl_base & ~SPEC_CTRL_STIBP;
+
+	if (sched_smt_active())
+		mask |= SPEC_CTRL_STIBP;
+
+	if (mask == x86_spec_ctrl_base)
+		return;
+
+	pr_info("Update user space SMT mitigation: STIBP %s\n",
+		mask & SPEC_CTRL_STIBP ? "always-on" : "off");
+	x86_spec_ctrl_base = mask;
+	on_each_cpu(update_stibp_msr, NULL, 1);
 }

 void arch_smt_update(void)
 {
-	u64 mask;
-
-	if (!stibp_needed())
+	/* Enhanced IBRS implies STIBP. No update required. */
+	if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED)
 		return;

 	mutex_lock(&spec_ctrl_mutex);

-	mask = x86_spec_ctrl_base & ~SPEC_CTRL_STIBP;
-	if (sched_smt_active())
-		mask |= SPEC_CTRL_STIBP;
-
-	if (mask != x86_spec_ctrl_base) {
-		pr_info("Spectre v2 cross-process SMT mitigation: %s STIBP\n",
-			mask & SPEC_CTRL_STIBP ? "Enabling" : "Disabling");
-		x86_spec_ctrl_base = mask;
-		on_each_cpu(update_stibp_msr, NULL, 1);
+	switch (spectre_v2_user) {
+	case SPECTRE_V2_USER_NONE:
+		break;
+	case SPECTRE_V2_USER_STRICT:
+		update_stibp_strict();
+		break;
 	}
+
 	mutex_unlock(&spec_ctrl_mutex);
 }
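The mask logic in update_stibp_strict() can be tried outside the kernel. The sketch below is a userspace model, not kernel code: a plain uint64_t stands in for x86_spec_ctrl_base, a bool stands in for sched_smt_active(), and SPEC_CTRL_STIBP is taken as bit 1 of IA32_SPEC_CTRL (the bit the Intel guide quoted later in this series assigns to STIBP). The function name stibp_strict_update() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* STIBP is bit 1 of IA32_SPEC_CTRL. */
#define SPEC_CTRL_STIBP (UINT64_C(1) << 1)

/*
 * Hypothetical model of update_stibp_strict(): recompute the STIBP bit
 * from the SMT state and report whether the base value changed. A true
 * return marks where the kernel would run on_each_cpu(update_stibp_msr, ...).
 */
static bool stibp_strict_update(uint64_t *spec_ctrl_base, bool smt_active)
{
	uint64_t mask;

	assert(spec_ctrl_base != NULL);
	mask = *spec_ctrl_base & ~SPEC_CTRL_STIBP;	/* keep all other bits */
	if (smt_active)
		mask |= SPEC_CTRL_STIBP;

	if (mask == *spec_ctrl_base)
		return false;	/* unchanged: no MSR writes needed */

	*spec_ctrl_base = mask;
	return true;
}
```

Returning early when the mask is unchanged is what keeps repeated SMT hotplug notifications from issuing a cross-CPU MSR rewrite every time.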
From: Thomas Gleixner tglx@linutronix.de
commit 6d991ba509ebcfcc908e009d1db51972a4f7a064 upstream.
The seccomp speculation control operates on all tasks of a process, but only the current task of a process can update the MSR immediately. For the other threads the update is deferred to the next context switch.
This creates the following situation with Process A and B:
Process A task 2 and Process B task 1 are pinned on CPU1. Process A task 2 does not have the speculation control TIF bit set. Process B task 1 has the speculation control TIF bit set.
   CPU0                          CPU1
                                 MSR bit is set
                                 ProcB.T1 schedules out
                                 ProcA.T2 schedules in
                                 MSR bit is cleared
   ProcA.T1
     seccomp_update()
     set TIF bit on ProcA.T2
                                 ProcB.T1 schedules in
                                 MSR is not updated  <-- FAIL
This happens because the context switch code tries to avoid the MSR update if the speculation control TIF bits of the incoming and the outgoing task are the same. In the worst case ProcB.T1 and ProcA.T2 are the only tasks scheduling back and forth on CPU1, which keeps the MSR stale forever.
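The failing sequence can be replayed in a few lines of userspace C. This is a hypothetical model, not kernel code: one bool per task stands in for the speculation control TIF bit, one bool per CPU stands in for the MSR, and context_switch() copies only the "skip the write when both TIF bits are equal" optimization described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a bool stands in for the speculation control TIF bit. */
struct task {
	bool tif_spec;
};

/* The state currently programmed into the MSR on this CPU. */
struct cpu {
	bool msr_spec;
};

/* Pre-fix behaviour: rewrite the MSR only if the TIF bits differ. */
static void context_switch(struct cpu *cpu, const struct task *prev,
			   const struct task *next)
{
	if (prev->tif_spec != next->tif_spec)
		cpu->msr_spec = next->tif_spec;
}

/* Replays the CPU0/CPU1 sequence; returns true if the MSR ends up stale. */
static bool replay_race(void)
{
	struct task proc_a_t2 = { .tif_spec = false };
	struct task proc_b_t1 = { .tif_spec = true };
	struct cpu cpu1 = { .msr_spec = true };		/* ProcB.T1 was running */

	context_switch(&cpu1, &proc_b_t1, &proc_a_t2);
	assert(!cpu1.msr_spec);				/* MSR bit is cleared */

	proc_a_t2.tif_spec = true;	/* CPU0: seccomp sets the TIF bit remotely */

	context_switch(&cpu1, &proc_a_t2, &proc_b_t1);	/* bits equal: skipped */
	return cpu1.msr_spec != proc_b_t1.tif_spec;	/* stale MSR */
}
```

The remote write makes ProcA.T2's TIF bit disagree with the MSR while it is running, so the equality shortcut on the next switch hides the mismatch.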
In theory this could be remedied by IPIs, but chasing the remote task which could be migrated is complex and full of races.
The straightforward solution is to avoid the asynchronous update of the TIF bit and defer it to the next context switch. The speculation control state is already stored in task_struct::atomic_flags by the prctl and seccomp updates.
Add a new TIF_SPEC_FORCE_UPDATE bit and set this after updating the atomic_flags. Check the bit on context switch and force a synchronous update of the speculation control if set. Use the same mechanism for updating the current task.
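The same hypothetical userspace model can be extended with the deferred update. Here want_spec plays the role of the state kept in task_struct::atomic_flags, tif_force_update stands in for TIF_SPEC_FORCE_UPDATE, and sync_tif() loosely mirrors speculation_ctrl_update_tif(). All names are illustrative, and the kernel's real MSR handling is reduced to a single bool.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-task state after this patch. */
struct task {
	bool want_spec;		/* requested state (atomic_flags) */
	bool tif_spec;		/* real TIF bit consulted on context switch */
	bool tif_force_update;	/* stands in for TIF_SPEC_FORCE_UPDATE */
};

struct cpu {
	bool msr_spec;
};

/* Fold a pending request into the real TIF bit and return it. */
static bool sync_tif(struct task *t)
{
	if (t->tif_force_update) {
		t->tif_force_update = false;
		t->tif_spec = t->want_spec;
	}
	return t->tif_spec;
}

/* Remote (seccomp-style) update: record the request, defer the TIF bit. */
static void request_update(struct task *t, bool on)
{
	t->want_spec = on;
	t->tif_force_update = true;
}

static void context_switch(struct cpu *cpu, struct task *prev, struct task *next)
{
	if (!prev->tif_force_update && !next->tif_force_update) {
		if (prev->tif_spec != next->tif_spec)
			cpu->msr_spec = next->tif_spec;
		return;
	}
	sync_tif(prev);
	cpu->msr_spec = sync_tif(next);	/* forced, unconditional write */
}

/* Replays the failing sequence; returns true if the MSR matches ProcB.T1. */
static bool replay_with_fix(void)
{
	struct task proc_a_t2 = { false, false, false };
	struct task proc_b_t1 = { true, true, false };
	struct cpu cpu1 = { .msr_spec = true };

	context_switch(&cpu1, &proc_b_t1, &proc_a_t2);
	assert(!cpu1.msr_spec);
	request_update(&proc_a_t2, true);	/* no longer touches tif_spec */
	context_switch(&cpu1, &proc_a_t2, &proc_b_t1);

	return cpu1.msr_spec == proc_b_t1.tif_spec;
}
```

Because the running task's real TIF bit is never changed behind its back, the TIF bits and the MSR stay consistent, and the force bit guarantees one unconditional write at the next switch.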
Reported-by: Tim Chen tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Cc: Peter Zijlstra peterz@infradead.org
Cc: Andy Lutomirski luto@kernel.org
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Jiri Kosina jkosina@suse.cz
Cc: Tom Lendacky thomas.lendacky@amd.com
Cc: Josh Poimboeuf jpoimboe@redhat.com
Cc: Andrea Arcangeli aarcange@redhat.com
Cc: David Woodhouse dwmw@amazon.co.uk
Cc: Tim Chen tim.c.chen@linux.intel.com
Cc: Andi Kleen ak@linux.intel.com
Cc: Dave Hansen dave.hansen@intel.com
Cc: Casey Schaufler casey.schaufler@intel.com
Cc: Asit Mallick asit.k.mallick@intel.com
Cc: Arjan van de Ven arjan@linux.intel.com
Cc: Jon Masters jcm@redhat.com
Cc: Waiman Long longman9394@gmail.com
Cc: Greg KH gregkh@linuxfoundation.org
Cc: Dave Stewart david.c.stewart@intel.com
Cc: Kees Cook keescook@chromium.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1811272247140.1875@nanos.tec.linut...
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/spec-ctrl.h | 6 +-----
 arch/x86/include/asm/thread_info.h | 4 +++-
 arch/x86/kernel/cpu/bugs.c | 18 +++++++-----------
 arch/x86/kernel/process.c | 30 +++++++++++++++++++++++++++++-
 4 files changed, 40 insertions(+), 18 deletions(-)
--- a/arch/x86/include/asm/spec-ctrl.h
+++ b/arch/x86/include/asm/spec-ctrl.h
@@ -83,10 +83,6 @@ static inline void speculative_store_byp
 #endif

 extern void speculation_ctrl_update(unsigned long tif);
-
-static inline void speculation_ctrl_update_current(void)
-{
-	speculation_ctrl_update(current_thread_info()->flags);
-}
+extern void speculation_ctrl_update_current(void);

 #endif
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -97,6 +97,7 @@ struct thread_info {
 #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */
 #define TIF_SECCOMP 8 /* secure computing */
 #define TIF_SPEC_IB 9 /* Indirect branch speculation mitigation */
+#define TIF_SPEC_FORCE_UPDATE 10 /* Force speculation MSR update in context switch */
 #define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */
 #define TIF_UPROBE 12 /* breakpointed or singlestepping */
 #define TIF_NOTSC 16 /* TSC is not accessible in userland */
@@ -123,6 +124,7 @@ struct thread_info {
 #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP (1 << TIF_SECCOMP)
 #define _TIF_SPEC_IB (1 << TIF_SPEC_IB)
+#define _TIF_SPEC_FORCE_UPDATE (1 << TIF_SPEC_FORCE_UPDATE)
 #define _TIF_USER_RETURN_NOTIFY (1 << TIF_USER_RETURN_NOTIFY)
 #define _TIF_UPROBE (1 << TIF_UPROBE)
 #define _TIF_NOTSC (1 << TIF_NOTSC)
@@ -152,7 +154,7 @@ struct thread_info {
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW_BASE \
 	(_TIF_IO_BITMAP|_TIF_NOTSC|_TIF_BLOCKSTEP| \
-	 _TIF_SSBD)
+	 _TIF_SSBD | _TIF_SPEC_FORCE_UPDATE)

 /*
  * Avoid calls to __switch_to_xtra() on UP as STIBP is not evaluated.
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -701,14 +701,10 @@ static void ssb_select_mitigation(void)
 #undef pr_fmt
 #define pr_fmt(fmt) "Speculation prctl: " fmt

-static void task_update_spec_tif(struct task_struct *tsk, int tifbit, bool on)
+static void task_update_spec_tif(struct task_struct *tsk)
 {
-	bool update;
-
-	if (on)
-		update = !test_and_set_tsk_thread_flag(tsk, tifbit);
-	else
-		update = test_and_clear_tsk_thread_flag(tsk, tifbit);
+	/* Force the update of the real TIF bits */
+	set_tsk_thread_flag(tsk, TIF_SPEC_FORCE_UPDATE);

 	/*
 	 * Immediately update the speculation control MSRs for the current
@@ -718,7 +714,7 @@ static void task_update_spec_tif(struct
 	 * This can only happen for SECCOMP mitigation. For PRCTL it's
 	 * always the current task.
 	 */
-	if (tsk == current && update)
+	if (tsk == current)
 		speculation_ctrl_update_current();
 }

@@ -734,16 +730,16 @@ static int ssb_prctl_set(struct task_str
 		if (task_spec_ssb_force_disable(task))
 			return -EPERM;
 		task_clear_spec_ssb_disable(task);
-		task_update_spec_tif(task, TIF_SSBD, false);
+		task_update_spec_tif(task);
 		break;
 	case PR_SPEC_DISABLE:
 		task_set_spec_ssb_disable(task);
-		task_update_spec_tif(task, TIF_SSBD, true);
+		task_update_spec_tif(task);
 		break;
 	case PR_SPEC_FORCE_DISABLE:
 		task_set_spec_ssb_disable(task);
 		task_set_spec_ssb_force_disable(task);
-		task_update_spec_tif(task, TIF_SSBD, true);
+		task_update_spec_tif(task);
 		break;
 	default:
 		return -ERANGE;
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -365,6 +365,18 @@ static __always_inline void __speculatio
 	wrmsrl(MSR_IA32_SPEC_CTRL, msr);
 }

+static unsigned long speculation_ctrl_update_tif(struct task_struct *tsk)
+{
+	if (test_and_clear_tsk_thread_flag(tsk, TIF_SPEC_FORCE_UPDATE)) {
+		if (task_spec_ssb_disable(tsk))
+			set_tsk_thread_flag(tsk, TIF_SSBD);
+		else
+			clear_tsk_thread_flag(tsk, TIF_SSBD);
+	}
+	/* Return the updated threadinfo flags*/
+	return task_thread_info(tsk)->flags;
+}
+
 void speculation_ctrl_update(unsigned long tif)
 {
 	/* Forced update. Make sure all relevant TIF flags are different */
@@ -373,6 +385,14 @@ void speculation_ctrl_update(unsigned lo
 	preempt_enable();
 }

+/* Called from seccomp/prctl update */
+void speculation_ctrl_update_current(void)
+{
+	preempt_disable();
+	speculation_ctrl_update(speculation_ctrl_update_tif(current));
+	preempt_enable();
+}
+
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
 {
 	struct thread_struct *prev, *next;
@@ -401,7 +421,15 @@ void __switch_to_xtra(struct task_struct
 	if ((tifp ^ tifn) & _TIF_NOTSC)
 		cr4_toggle_bits(X86_CR4_TSD);

-	__speculation_ctrl_update(tifp, tifn);
+	if (likely(!((tifp | tifn) & _TIF_SPEC_FORCE_UPDATE))) {
+		__speculation_ctrl_update(tifp, tifn);
+	} else {
+		speculation_ctrl_update_tif(prev_p);
+		tifn = speculation_ctrl_update_tif(next_p);
+
+		/* Enforce MSR update to ensure consistent state */
+		__speculation_ctrl_update(~tifn, tifn);
+	}
 }

 /*
From: Thomas Gleixner tglx@linutronix.de
commit 9137bb27e60e554dab694eafa4cca241fa3a694f upstream.
Add the PR_SPEC_INDIRECT_BRANCH option for the PR_GET_SPECULATION_CTRL and PR_SET_SPECULATION_CTRL prctls to allow fine-grained per-task control of indirect branch speculation via STIBP and IBPB.
Invocations:
Check indirect branch speculation status with
- prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0);

Enable indirect branch speculation with
- prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0);

Disable indirect branch speculation with
- prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0);

Force disable indirect branch speculation with
- prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0);
See Documentation/userspace-api/spec_ctrl.rst.
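A minimal userspace sketch of the status query listed above, assuming a glibc <sys/prctl.h>. The #ifndef fallbacks cover older headers: PR_SPEC_INDIRECT_BRANCH and PR_SPEC_PRCTL match the values in this patch's include/uapi/linux/prctl.h hunk, and the remaining values are the ones defined in that same uapi header. The helper names ib_query() and ib_state_name() are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/prctl.h>

/* Fallbacks for userspace headers that predate these constants. */
#ifndef PR_GET_SPECULATION_CTRL
#define PR_GET_SPECULATION_CTRL 52
#endif
#ifndef PR_SPEC_INDIRECT_BRANCH
#define PR_SPEC_INDIRECT_BRANCH 1
#endif
#ifndef PR_SPEC_NOT_AFFECTED
#define PR_SPEC_NOT_AFFECTED 0
#endif
#ifndef PR_SPEC_PRCTL
#define PR_SPEC_PRCTL (1UL << 0)
#endif
#ifndef PR_SPEC_ENABLE
#define PR_SPEC_ENABLE (1UL << 1)
#endif
#ifndef PR_SPEC_DISABLE
#define PR_SPEC_DISABLE (1UL << 2)
#endif
#ifndef PR_SPEC_FORCE_DISABLE
#define PR_SPEC_FORCE_DISABLE (1UL << 3)
#endif

/* Query the calling thread; returns the status bits, or -1 with errno set. */
static int ib_query(void)
{
	return prctl(PR_GET_SPECULATION_CTRL,
		     (unsigned long)PR_SPEC_INDIRECT_BRANCH, 0UL, 0UL, 0UL);
}

/* Turn a PR_GET_SPECULATION_CTRL result into a short label. */
static const char *ib_state_name(int ret)
{
	unsigned long bits;

	if (ret < 0)
		return "error";
	if (ret == PR_SPEC_NOT_AFFECTED)
		return "not affected";
	bits = (unsigned long)ret;
	if (bits & PR_SPEC_FORCE_DISABLE)
		return "force-disabled";
	if (bits & PR_SPEC_DISABLE)
		return (bits & PR_SPEC_PRCTL) ? "disabled (prctl)" : "disabled";
	if (bits & PR_SPEC_ENABLE)
		return (bits & PR_SPEC_PRCTL) ? "enabled (prctl)" : "enabled";
	return "unknown";
}
```

Usage: `puts(ib_state_name(ib_query()));` (with <stdio.h>). Per ib_prctl_get() in this patch, strict mode reports a plain PR_SPEC_DISABLE without PR_SPEC_PRCTL, which the label distinguishes from a per-task opt-in. Kernels without this patch return -1 (ENODEV) for PR_SPEC_INDIRECT_BRANCH.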
Signed-off-by: Tim Chen tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Reviewed-by: Ingo Molnar mingo@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Andy Lutomirski luto@kernel.org
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Jiri Kosina jkosina@suse.cz
Cc: Tom Lendacky thomas.lendacky@amd.com
Cc: Josh Poimboeuf jpoimboe@redhat.com
Cc: Andrea Arcangeli aarcange@redhat.com
Cc: David Woodhouse dwmw@amazon.co.uk
Cc: Andi Kleen ak@linux.intel.com
Cc: Dave Hansen dave.hansen@intel.com
Cc: Casey Schaufler casey.schaufler@intel.com
Cc: Asit Mallick asit.k.mallick@intel.com
Cc: Arjan van de Ven arjan@linux.intel.com
Cc: Jon Masters jcm@redhat.com
Cc: Waiman Long longman9394@gmail.com
Cc: Greg KH gregkh@linuxfoundation.org
Cc: Dave Stewart david.c.stewart@intel.com
Cc: Kees Cook keescook@chromium.org
Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.de
[bwh: Backported to 4.4:
 - Renumber the PFA flags
 - Drop changes in tools/include/uapi/linux/prctl.h
 - Adjust filename]
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/spec_ctrl.txt | 9 ++++
 arch/x86/include/asm/nospec-branch.h | 1
 arch/x86/kernel/cpu/bugs.c | 67 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/process.c | 5 ++
 include/linux/sched.h | 9 ++++
 include/uapi/linux/prctl.h | 1
 6 files changed, 92 insertions(+)
--- a/Documentation/spec_ctrl.txt
+++ b/Documentation/spec_ctrl.txt
@@ -92,3 +92,12 @@ Speculation misfeature controls
   * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_ENABLE, 0, 0);
   * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_DISABLE, 0, 0);
   * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_FORCE_DISABLE, 0, 0);
+
+- PR_SPEC_INDIR_BRANCH: Indirect Branch Speculation in User Processes
+                        (Mitigate Spectre V2 style attacks against user processes)
+
+  Invocations:
+  * prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0);
+  * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0);
+  * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0);
+  * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0);
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -178,6 +178,7 @@ enum spectre_v2_mitigation {
 enum spectre_v2_user_mitigation {
 	SPECTRE_V2_USER_NONE,
 	SPECTRE_V2_USER_STRICT,
+	SPECTRE_V2_USER_PRCTL,
 };

 /* The Speculative Store Bypass disable variants */
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -561,6 +561,8 @@ void arch_smt_update(void)
 	case SPECTRE_V2_USER_STRICT:
 		update_stibp_strict();
 		break;
+	case SPECTRE_V2_USER_PRCTL:
+		break;
 	}

 	mutex_unlock(&spec_ctrl_mutex);
@@ -747,12 +749,50 @@ static int ssb_prctl_set(struct task_str
 	return 0;
 }

+static int ib_prctl_set(struct task_struct *task, unsigned long ctrl)
+{
+	switch (ctrl) {
+	case PR_SPEC_ENABLE:
+		if (spectre_v2_user == SPECTRE_V2_USER_NONE)
+			return 0;
+		/*
+		 * Indirect branch speculation is always disabled in strict
+		 * mode.
+		 */
+		if (spectre_v2_user == SPECTRE_V2_USER_STRICT)
+			return -EPERM;
+		task_clear_spec_ib_disable(task);
+		task_update_spec_tif(task);
+		break;
+	case PR_SPEC_DISABLE:
+	case PR_SPEC_FORCE_DISABLE:
+		/*
+		 * Indirect branch speculation is always allowed when
+		 * mitigation is force disabled.
+		 */
+		if (spectre_v2_user == SPECTRE_V2_USER_NONE)
+			return -EPERM;
+		if (spectre_v2_user == SPECTRE_V2_USER_STRICT)
+			return 0;
+		task_set_spec_ib_disable(task);
+		if (ctrl == PR_SPEC_FORCE_DISABLE)
+			task_set_spec_ib_force_disable(task);
+		task_update_spec_tif(task);
+		break;
+	default:
+		return -ERANGE;
+	}
+	return 0;
+}
+
 int arch_prctl_spec_ctrl_set(struct task_struct *task, unsigned long which,
 			     unsigned long ctrl)
 {
 	switch (which) {
 	case PR_SPEC_STORE_BYPASS:
 		return ssb_prctl_set(task, ctrl);
+	case PR_SPEC_INDIRECT_BRANCH:
+		return ib_prctl_set(task, ctrl);
 	default:
 		return -ENODEV;
 	}
@@ -785,11 +825,34 @@ static int ssb_prctl_get(struct task_str
 	}
 }

+static int ib_prctl_get(struct task_struct *task)
+{
+	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
+		return PR_SPEC_NOT_AFFECTED;
+
+	switch (spectre_v2_user) {
+	case SPECTRE_V2_USER_NONE:
+		return PR_SPEC_ENABLE;
+	case SPECTRE_V2_USER_PRCTL:
+		if (task_spec_ib_force_disable(task))
+			return PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE;
+		if (task_spec_ib_disable(task))
+			return PR_SPEC_PRCTL | PR_SPEC_DISABLE;
+		return PR_SPEC_PRCTL | PR_SPEC_ENABLE;
+	case SPECTRE_V2_USER_STRICT:
+		return PR_SPEC_DISABLE;
+	default:
+		return PR_SPEC_NOT_AFFECTED;
+	}
+}
+
 int arch_prctl_spec_ctrl_get(struct task_struct *task, unsigned long which)
 {
 	switch (which) {
 	case PR_SPEC_STORE_BYPASS:
 		return ssb_prctl_get(task);
+	case PR_SPEC_INDIRECT_BRANCH:
+		return ib_prctl_get(task);
 	default:
 		return -ENODEV;
 	}
@@ -886,6 +949,8 @@ static char *stibp_state(void)
 		return ", STIBP: disabled";
 	case SPECTRE_V2_USER_STRICT:
 		return ", STIBP: forced";
+	case SPECTRE_V2_USER_PRCTL:
+		return "";
 	}
 	return "";
 }
@@ -898,6 +963,8 @@ static char *ibpb_state(void)
 			return ", IBPB: disabled";
 		case SPECTRE_V2_USER_STRICT:
 			return ", IBPB: always-on";
+		case SPECTRE_V2_USER_PRCTL:
+			return "";
 		}
 	}
 	return "";
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -372,6 +372,11 @@ static unsigned long speculation_ctrl_up
 			set_tsk_thread_flag(tsk, TIF_SSBD);
 		else
 			clear_tsk_thread_flag(tsk, TIF_SSBD);
+
+		if (task_spec_ib_disable(tsk))
+			set_tsk_thread_flag(tsk, TIF_SPEC_IB);
+		else
+			clear_tsk_thread_flag(tsk, TIF_SPEC_IB);
 	}
 	/* Return the updated threadinfo flags*/
 	return task_thread_info(tsk)->flags;
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2169,6 +2169,8 @@ static inline void memalloc_noio_restore
 #define PFA_SPREAD_SLAB 2 /* Spread some slab caches over cpuset */
 #define PFA_SPEC_SSB_DISABLE 4 /* Speculative Store Bypass disabled */
 #define PFA_SPEC_SSB_FORCE_DISABLE 5 /* Speculative Store Bypass force disabled*/
+#define PFA_SPEC_IB_DISABLE 6 /* Indirect branch speculation restricted */
+#define PFA_SPEC_IB_FORCE_DISABLE 7 /* Indirect branch speculation permanently restricted */


 #define TASK_PFA_TEST(name, func) \
@@ -2199,6 +2201,13 @@ TASK_PFA_CLEAR(SPEC_SSB_DISABLE, spec_ss
 TASK_PFA_TEST(SPEC_SSB_FORCE_DISABLE, spec_ssb_force_disable)
 TASK_PFA_SET(SPEC_SSB_FORCE_DISABLE, spec_ssb_force_disable)

+TASK_PFA_TEST(SPEC_IB_DISABLE, spec_ib_disable)
+TASK_PFA_SET(SPEC_IB_DISABLE, spec_ib_disable)
+TASK_PFA_CLEAR(SPEC_IB_DISABLE, spec_ib_disable)
+
+TASK_PFA_TEST(SPEC_IB_FORCE_DISABLE, spec_ib_force_disable)
+TASK_PFA_SET(SPEC_IB_FORCE_DISABLE, spec_ib_force_disable)
+
 /*
  * task->jobctl flags
  */
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -202,6 +202,7 @@ struct prctl_mm_map {
 #define PR_SET_SPECULATION_CTRL 53
 /* Speculation control variants */
 # define PR_SPEC_STORE_BYPASS 0
+# define PR_SPEC_INDIRECT_BRANCH 1
 /* Return and control values for PR_SET/GET_SPECULATION_CTRL */
 # define PR_SPEC_NOT_AFFECTED 0
 # define PR_SPEC_PRCTL (1UL << 0)
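The permission rules of ib_prctl_set() form a small decision table that can be checked in userspace. The sketch below is a hypothetical model: the enums and the ib_task struct stand in for spectre_v2_user, the PR_SPEC_* control values, and the two PFA flags, and the task_update_spec_tif() call is omitted. It reproduces this patch's logic as written, including the fact that the PR_SPEC_ENABLE path does not consult the force-disable flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirrors of the kernel enums in this patch. */
enum v2_user_mode { USER_NONE, USER_STRICT, USER_PRCTL };
enum ib_ctrl { CTRL_ENABLE, CTRL_DISABLE, CTRL_FORCE_DISABLE };

struct ib_task {
	bool ib_disable;	/* stands in for PFA_SPEC_IB_DISABLE */
	bool ib_force_disable;	/* stands in for PFA_SPEC_IB_FORCE_DISABLE */
};

/* Return codes follow ib_prctl_set() as added by this patch. */
static int model_ib_prctl_set(enum v2_user_mode mode, struct ib_task *t,
			      enum ib_ctrl ctrl)
{
	assert(t != NULL);
	switch (ctrl) {
	case CTRL_ENABLE:
		if (mode == USER_NONE)
			return 0;	/* mitigation off: already enabled */
		if (mode == USER_STRICT)
			return -EPERM;	/* strict mode never re-enables */
		t->ib_disable = false;
		return 0;
	case CTRL_DISABLE:
	case CTRL_FORCE_DISABLE:
		if (mode == USER_NONE)
			return -EPERM;	/* mitigation disabled at boot */
		if (mode == USER_STRICT)
			return 0;	/* already disabled for all tasks */
		t->ib_disable = true;
		if (ctrl == CTRL_FORCE_DISABLE)
			t->ib_force_disable = true;
		return 0;
	}
	return -ERANGE;		/* out-of-range control value */
}
```

The asymmetry is deliberate in the patch: asking for the state the system is already in succeeds, while asking for the opposite of a boot-time policy fails with -EPERM.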
From: Thomas Gleixner tglx@linutronix.de
commit 7cc765a67d8e04ef7d772425ca5a2a1e2b894c15 upstream.
Now that all prerequisites are in place:
- Add the prctl command line option
- Default the 'auto' mode to 'prctl'
- When SMT state changes, update the static key which controls the conditional STIBP evaluation on context switch.
- At init update the static key which controls the conditional IBPB evaluation on context switch.
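How the new option and the changed 'auto' default combine into a mode can be sketched in userspace. This is a hypothetical model of the v2_user_options[] lookup and the switch in spectre_v2_user_select_mitigation(): the enum names are illustrative, the fallback to "auto" for a missing or unknown option is an assumption of this sketch, and the enhanced-IBRS early return is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirrors of the kernel enums at this point in the series. */
enum v2_user_cmd { CMD_NONE, CMD_AUTO, CMD_FORCE, CMD_PRCTL };
enum v2_user_mode { MODE_NONE, MODE_STRICT, MODE_PRCTL };

static const struct {
	const char *option;
	enum v2_user_cmd cmd;
} v2_user_options[] = {
	{ "auto", CMD_AUTO },
	{ "off", CMD_NONE },
	{ "on", CMD_FORCE },
	{ "prctl", CMD_PRCTL },
};

/* Missing or unrecognized values fall back to "auto" in this sketch. */
static enum v2_user_cmd parse_v2_user(const char *arg)
{
	size_t i;

	if (arg == NULL)
		return CMD_AUTO;
	for (i = 0; i < sizeof(v2_user_options) / sizeof(v2_user_options[0]); i++)
		if (strcmp(arg, v2_user_options[i].option) == 0)
			return v2_user_options[i].cmd;
	return CMD_AUTO;
}

/* Case grouping follows the switch in spectre_v2_user_select_mitigation(). */
static enum v2_user_mode select_mode(enum v2_user_cmd cmd, bool smt_possible,
				     bool has_stibp)
{
	enum v2_user_mode mode = MODE_NONE;

	switch (cmd) {
	case CMD_NONE:
		return MODE_NONE;
	case CMD_FORCE:
		mode = MODE_STRICT;
		break;
	case CMD_AUTO:
	case CMD_PRCTL:
		mode = MODE_PRCTL;	/* 'auto' now defaults to 'prctl' */
		break;
	}

	/* Without SMT or STIBP there is nothing for a STIBP mode to do. */
	if (!smt_possible || !has_stibp)
		mode = MODE_NONE;
	return mode;
}
```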
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Reviewed-by: Ingo Molnar mingo@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Andy Lutomirski luto@kernel.org
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Jiri Kosina jkosina@suse.cz
Cc: Tom Lendacky thomas.lendacky@amd.com
Cc: Josh Poimboeuf jpoimboe@redhat.com
Cc: Andrea Arcangeli aarcange@redhat.com
Cc: David Woodhouse dwmw@amazon.co.uk
Cc: Tim Chen tim.c.chen@linux.intel.com
Cc: Andi Kleen ak@linux.intel.com
Cc: Dave Hansen dave.hansen@intel.com
Cc: Casey Schaufler casey.schaufler@intel.com
Cc: Asit Mallick asit.k.mallick@intel.com
Cc: Arjan van de Ven arjan@linux.intel.com
Cc: Jon Masters jcm@redhat.com
Cc: Waiman Long longman9394@gmail.com
Cc: Greg KH gregkh@linuxfoundation.org
Cc: Dave Stewart david.c.stewart@intel.com
Cc: Kees Cook keescook@chromium.org
Link: https://lkml.kernel.org/r/20181125185005.958421388@linutronix.de
[bwh: Backported to 4.4: adjust filename]
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/kernel-parameters.txt | 7 +++++-
 arch/x86/kernel/cpu/bugs.c | 41 ++++++++++++++++++++++++++++--------
 2 files changed, 38 insertions(+), 10 deletions(-)
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3646,9 +3646,14 @@ bytes respectively. Such letter suffixes
 			off - Unconditionally disable mitigations. Is
 			      enforced by spectre_v2=off

+			prctl - Indirect branch speculation is enabled,
+				but mitigation can be enabled via prctl
+				per thread. The mitigation control state
+				is inherited on fork.
+
 			auto - Kernel selects the mitigation depending on
 			       the available CPU features and vulnerability.
-			       Default is off.
+			       Default is prctl.

 			Not specifying this option is equivalent to
 			spectre_v2_user=auto.
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -244,11 +244,13 @@ enum spectre_v2_user_cmd {
 	SPECTRE_V2_USER_CMD_NONE,
 	SPECTRE_V2_USER_CMD_AUTO,
 	SPECTRE_V2_USER_CMD_FORCE,
+	SPECTRE_V2_USER_CMD_PRCTL,
 };

 static const char * const spectre_v2_user_strings[] = {
 	[SPECTRE_V2_USER_NONE] = "User space: Vulnerable",
 	[SPECTRE_V2_USER_STRICT] = "User space: Mitigation: STIBP protection",
+	[SPECTRE_V2_USER_PRCTL] = "User space: Mitigation: STIBP via prctl",
 };

 static const struct {
@@ -259,6 +261,7 @@ static const struct {
 	{ "auto", SPECTRE_V2_USER_CMD_AUTO, false },
 	{ "off", SPECTRE_V2_USER_CMD_NONE, false },
 	{ "on", SPECTRE_V2_USER_CMD_FORCE, true },
+	{ "prctl", SPECTRE_V2_USER_CMD_PRCTL, false },
 };

 static void __init spec_v2_user_print_cond(const char *reason, bool secure)
@@ -312,12 +315,15 @@ spectre_v2_user_select_mitigation(enum s
 		smt_possible = false;

 	switch (spectre_v2_parse_user_cmdline(v2_cmd)) {
-	case SPECTRE_V2_USER_CMD_AUTO:
 	case SPECTRE_V2_USER_CMD_NONE:
 		goto set_mode;
 	case SPECTRE_V2_USER_CMD_FORCE:
 		mode = SPECTRE_V2_USER_STRICT;
 		break;
+	case SPECTRE_V2_USER_CMD_AUTO:
+	case SPECTRE_V2_USER_CMD_PRCTL:
+		mode = SPECTRE_V2_USER_PRCTL;
+		break;
 	}

 	/* Initialize Indirect Branch Prediction Barrier */
@@ -328,6 +334,9 @@ spectre_v2_user_select_mitigation(enum s
 		case SPECTRE_V2_USER_STRICT:
 			static_branch_enable(&switch_mm_always_ibpb);
 			break;
+		case SPECTRE_V2_USER_PRCTL:
+			static_branch_enable(&switch_mm_cond_ibpb);
+			break;
 		default:
 			break;
 		}
@@ -340,6 +349,12 @@ spectre_v2_user_select_mitigation(enum s
 	if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED)
 		return;

+	/*
+	 * If SMT is not possible or STIBP is not available clear the STIPB
+	 * mode.
+	 */
+	if (!smt_possible || !boot_cpu_has(X86_FEATURE_STIBP))
+		mode = SPECTRE_V2_USER_NONE;
 set_mode:
 	spectre_v2_user = mode;
 	/* Only print the STIBP mode when SMT possible */
@@ -547,6 +562,15 @@ static void update_stibp_strict(void)
 	on_each_cpu(update_stibp_msr, NULL, 1);
 }

+/* Update the static key controlling the evaluation of TIF_SPEC_IB */
+static void update_indir_branch_cond(void)
+{
+	if (sched_smt_active())
+		static_branch_enable(&switch_to_cond_stibp);
+	else
+		static_branch_disable(&switch_to_cond_stibp);
+}
+
 void arch_smt_update(void)
 {
 	/* Enhanced IBRS implies STIBP. No update required. */
@@ -562,6 +586,7 @@ void arch_smt_update(void)
 		update_stibp_strict();
 		break;
 	case SPECTRE_V2_USER_PRCTL:
+		update_indir_branch_cond();
 		break;
 	}

@@ -950,7 +975,8 @@ static char *stibp_state(void)
 	case SPECTRE_V2_USER_STRICT:
 		return ", STIBP: forced";
 	case SPECTRE_V2_USER_PRCTL:
-		return "";
+		if (static_key_enabled(&switch_to_cond_stibp))
+			return ", STIBP: conditional";
 	}
 	return "";
 }
@@ -958,14 +984,11 @@ static char *stibp_state(void)
 static char *ibpb_state(void)
 {
 	if (boot_cpu_has(X86_FEATURE_IBPB)) {
-		switch (spectre_v2_user) {
-		case SPECTRE_V2_USER_NONE:
-			return ", IBPB: disabled";
-		case SPECTRE_V2_USER_STRICT:
+		if (static_key_enabled(&switch_mm_always_ibpb))
 			return ", IBPB: always-on";
-		case SPECTRE_V2_USER_PRCTL:
-			return "";
-		}
+		if (static_key_enabled(&switch_mm_cond_ibpb))
+			return ", IBPB: conditional";
+		return ", IBPB: disabled";
 	}
 	return "";
 }
From: Thomas Gleixner tglx@linutronix.de
commit 6b3e64c237c072797a9ec918654a60e3a46488e2 upstream.
If the 'prctl' mode of user space protection from Spectre v2 is selected on the kernel command line, STIBP and IBPB are applied to tasks which restrict their indirect branch speculation via prctl.
SECCOMP already enables the SSBD mitigation for sandboxed tasks, so it makes sense to prevent Spectre v2 user space to user space attacks for them as well.
The Intel mitigation guide documents how STIBP works:
Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor prevents the predicted targets of indirect branches on any logical processor of that core from being controlled by software that executes (or executed previously) on another logical processor of the same core.
Ergo, setting STIBP protects the task itself from being attacked by a task running on a different hyper-thread, and protects the tasks running on the other hyper-threads from being attacked by it.
While the document suggests that the branch predictors are shielded between the logical processors, the observed performance regressions suggest that STIBP simply disables the branch predictor more or less completely. Of course, the wording of the document is vague, but the fact that there is also no requirement for issuing IBPB when STIBP is used points clearly in that direction. Until Intel clarifies the whole mechanism, the kernel still issues IBPB even when STIBP is used.
IBPB is issued when the task switches out, so malicious sandbox code cannot mistrain the branch predictor for the next user space task on the same logical processor.
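The switch-out rule in the paragraph above can be written as a predicate. This is a simplified, hypothetical model of that one sentence only: mm_id stands in for the address space, ib_restricted for a task that opted in via prctl or seccomp, and the kernel's actual conditional IBPB code in switch_mm() is not part of this excerpt and may take additional state into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task view used only by this model. */
struct utask {
	int mm_id;		/* stands in for the task's address space */
	bool ib_restricted;	/* task restricted indirect branch speculation */
};

/*
 * Simplified rule from the text: when switching between different user
 * address spaces, flush the predictors if the outgoing task is a
 * restricted (for example sandboxed) one.
 */
static bool cond_ibpb_needed(const struct utask *prev, const struct utask *next)
{
	assert(prev != NULL && next != NULL);
	if (prev->mm_id == next->mm_id)
		return false;	/* same process: no cross-process mistraining */
	return prev->ib_restricted;
}
```

Skipping the barrier between threads of the same process is what keeps the conditional mode cheaper than issuing IBPB on every switch.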
Signed-off-by: Jiri Kosina jkosina@suse.cz
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Reviewed-by: Ingo Molnar mingo@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Andy Lutomirski luto@kernel.org
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Tom Lendacky thomas.lendacky@amd.com
Cc: Josh Poimboeuf jpoimboe@redhat.com
Cc: Andrea Arcangeli aarcange@redhat.com
Cc: David Woodhouse dwmw@amazon.co.uk
Cc: Tim Chen tim.c.chen@linux.intel.com
Cc: Andi Kleen ak@linux.intel.com
Cc: Dave Hansen dave.hansen@intel.com
Cc: Casey Schaufler casey.schaufler@intel.com
Cc: Asit Mallick asit.k.mallick@intel.com
Cc: Arjan van de Ven arjan@linux.intel.com
Cc: Jon Masters jcm@redhat.com
Cc: Waiman Long longman9394@gmail.com
Cc: Greg KH gregkh@linuxfoundation.org
Cc: Dave Stewart david.c.stewart@intel.com
Cc: Kees Cook keescook@chromium.org
Link: https://lkml.kernel.org/r/20181125185006.051663132@linutronix.de
[bwh: Backported to 4.4: adjust filename]
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/kernel-parameters.txt | 9 ++++++++-
 arch/x86/include/asm/nospec-branch.h | 1 +
 arch/x86/kernel/cpu/bugs.c | 17 ++++++++++++++++-
 3 files changed, 25 insertions(+), 2 deletions(-)
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3651,9 +3651,16 @@ bytes respectively. Such letter suffixes
 				per thread. The mitigation control state
 				is inherited on fork.

+			seccomp
+				- Same as "prctl" above, but all seccomp
+				  threads will enable the mitigation unless
+				  they explicitly opt out.
+
 			auto - Kernel selects the mitigation depending on
 			       the available CPU features and vulnerability.
-			       Default is prctl.
+
+			Default mitigation:
+			If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"

 			Not specifying this option is equivalent to
 			spectre_v2_user=auto.
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -179,6 +179,7 @@ enum spectre_v2_user_mitigation {
 	SPECTRE_V2_USER_NONE,
 	SPECTRE_V2_USER_STRICT,
 	SPECTRE_V2_USER_PRCTL,
+	SPECTRE_V2_USER_SECCOMP,
 };

 /* The Speculative Store Bypass disable variants */
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -245,12 +245,14 @@ enum spectre_v2_user_cmd {
 	SPECTRE_V2_USER_CMD_AUTO,
 	SPECTRE_V2_USER_CMD_FORCE,
 	SPECTRE_V2_USER_CMD_PRCTL,
+	SPECTRE_V2_USER_CMD_SECCOMP,
 };

 static const char * const spectre_v2_user_strings[] = {
 	[SPECTRE_V2_USER_NONE] = "User space: Vulnerable",
 	[SPECTRE_V2_USER_STRICT] = "User space: Mitigation: STIBP protection",
 	[SPECTRE_V2_USER_PRCTL] = "User space: Mitigation: STIBP via prctl",
+	[SPECTRE_V2_USER_SECCOMP] = "User space: Mitigation: STIBP via seccomp and prctl",
 };

 static const struct {
@@ -262,6 +264,7 @@ static const struct {
 	{ "off", SPECTRE_V2_USER_CMD_NONE, false },
 	{ "on", SPECTRE_V2_USER_CMD_FORCE, true },
 	{ "prctl", SPECTRE_V2_USER_CMD_PRCTL, false },
+	{ "seccomp", SPECTRE_V2_USER_CMD_SECCOMP, false },
 };

 static void __init spec_v2_user_print_cond(const char *reason, bool secure)
@@ -320,10 +323,16 @@ spectre_v2_user_select_mitigation(enum s
 	case SPECTRE_V2_USER_CMD_FORCE:
 		mode = SPECTRE_V2_USER_STRICT;
 		break;
-	case SPECTRE_V2_USER_CMD_AUTO:
 	case SPECTRE_V2_USER_CMD_PRCTL:
 		mode = SPECTRE_V2_USER_PRCTL;
 		break;
+	case SPECTRE_V2_USER_CMD_AUTO:
+	case SPECTRE_V2_USER_CMD_SECCOMP:
+		if (IS_ENABLED(CONFIG_SECCOMP))
+			mode = SPECTRE_V2_USER_SECCOMP;
+		else
+			mode = SPECTRE_V2_USER_PRCTL;
+		break;
 	}

 	/* Initialize Indirect Branch Prediction Barrier */
@@ -335,6 +344,7 @@ spectre_v2_user_select_mitigation(enum s
 			static_branch_enable(&switch_mm_always_ibpb);
 			break;
 		case SPECTRE_V2_USER_PRCTL:
+		case SPECTRE_V2_USER_SECCOMP:
 			static_branch_enable(&switch_mm_cond_ibpb);
 			break;
 		default:
@@ -586,6 +596,7 @@ void arch_smt_update(void)
 		update_stibp_strict();
 		break;
 	case SPECTRE_V2_USER_PRCTL:
+	case SPECTRE_V2_USER_SECCOMP:
 		update_indir_branch_cond();
 		break;
 	}
@@ -828,6 +839,8 @@ void arch_seccomp_spec_mitigate(struct t
 {
 	if (ssb_mode == SPEC_STORE_BYPASS_SECCOMP)
 		ssb_prctl_set(task, PR_SPEC_FORCE_DISABLE);
+	if (spectre_v2_user == SPECTRE_V2_USER_SECCOMP)
+		ib_prctl_set(task, PR_SPEC_FORCE_DISABLE);
 }
 #endif

@@ -859,6 +872,7 @@ static int ib_prctl_get(struct task_stru
 	case SPECTRE_V2_USER_NONE:
 		return PR_SPEC_ENABLE;
 	case SPECTRE_V2_USER_PRCTL:
+	case SPECTRE_V2_USER_SECCOMP:
 		if (task_spec_ib_force_disable(task))
 			return PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE;
 		if (task_spec_ib_disable(task))
@@ -975,6 +989,7 @@ static char *stibp_state(void)
 	case SPECTRE_V2_USER_STRICT:
 		return ", STIBP: forced";
 	case SPECTRE_V2_USER_PRCTL:
+	case SPECTRE_V2_USER_SECCOMP:
 		if (static_key_enabled(&switch_to_cond_stibp))
 			return ", STIBP: conditional";
 	}
From: Thomas Gleixner tglx@linutronix.de
commit 55a974021ec952ee460dc31ca08722158639de72 upstream.
Provide the option to always enable IBPB in combination with 'prctl' and 'seccomp'.
Add the extra command line options and rework the IBPB selection to evaluate the command instead of the mode selected by the STIBP switch case.
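Keying IBPB on the command rather than on the STIBP mode can be sketched as a pure function. This is a hypothetical userspace model: the enum names are illustrative, and the case grouping is inferred from the documentation hunk below (the ",ibpb" variants issue IBPB always) together with the earlier patches in this series ('on' enables always-on IBPB; 'prctl', 'seccomp' and 'auto' select conditional IBPB; 'off' skips IBPB setup entirely). The hunk that implements it is cut off in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirrors of the spectre_v2_user_cmd values after this patch. */
enum v2_user_cmd {
	CMD_NONE, CMD_AUTO, CMD_FORCE,
	CMD_PRCTL, CMD_PRCTL_IBPB, CMD_SECCOMP, CMD_SECCOMP_IBPB,
};

enum ibpb_policy { IBPB_NONE, IBPB_COND, IBPB_ALWAYS };

/*
 * The ",ibpb" variants keep per-thread STIBP control but request an
 * unconditional barrier between user space processes.
 */
static enum ibpb_policy select_ibpb(enum v2_user_cmd cmd, bool cpu_has_ibpb)
{
	if (!cpu_has_ibpb)
		return IBPB_NONE;

	switch (cmd) {
	case CMD_FORCE:
	case CMD_PRCTL_IBPB:
	case CMD_SECCOMP_IBPB:
		return IBPB_ALWAYS;
	case CMD_AUTO:
	case CMD_PRCTL:
	case CMD_SECCOMP:
		return IBPB_COND;
	case CMD_NONE:
		break;		/* 'off': mitigation not set up */
	}
	return IBPB_NONE;
}
```

Splitting the decision this way lets STIBP stay per-thread while IBPB becomes unconditional, which a single mode value could not express.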
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Reviewed-by: Ingo Molnar mingo@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Andy Lutomirski luto@kernel.org
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Jiri Kosina jkosina@suse.cz
Cc: Tom Lendacky thomas.lendacky@amd.com
Cc: Josh Poimboeuf jpoimboe@redhat.com
Cc: Andrea Arcangeli aarcange@redhat.com
Cc: David Woodhouse dwmw@amazon.co.uk
Cc: Tim Chen tim.c.chen@linux.intel.com
Cc: Andi Kleen ak@linux.intel.com
Cc: Dave Hansen dave.hansen@intel.com
Cc: Casey Schaufler casey.schaufler@intel.com
Cc: Asit Mallick asit.k.mallick@intel.com
Cc: Arjan van de Ven arjan@linux.intel.com
Cc: Jon Masters jcm@redhat.com
Cc: Waiman Long longman9394@gmail.com
Cc: Greg KH gregkh@linuxfoundation.org
Cc: Dave Stewart david.c.stewart@intel.com
Cc: Kees Cook keescook@chromium.org
Link: https://lkml.kernel.org/r/20181125185006.144047038@linutronix.de
[bwh: Backported to 4.4: adjust filename]
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/kernel-parameters.txt | 12 ++++++++++++
 arch/x86/kernel/cpu/bugs.c | 34 +++++++++++++++++++++++-----------
 2 files changed, 35 insertions(+), 11 deletions(-)
--- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -3651,11 +3651,23 @@ bytes respectively. Such letter suffixes per thread. The mitigation control state is inherited on fork.
+ prctl,ibpb + - Like "prctl" above, but only STIBP is + controlled per thread. IBPB is issued + always when switching between different user + space processes. + seccomp - Same as "prctl" above, but all seccomp threads will enable the mitigation unless they explicitly opt out.
+ seccomp,ibpb + - Like "seccomp" above, but only STIBP is + controlled per thread. IBPB is issued + always when switching between different + user space processes. + auto - Kernel selects the mitigation depending on the available CPU features and vulnerability.
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -245,7 +245,9 @@ enum spectre_v2_user_cmd { SPECTRE_V2_USER_CMD_AUTO, SPECTRE_V2_USER_CMD_FORCE, SPECTRE_V2_USER_CMD_PRCTL, + SPECTRE_V2_USER_CMD_PRCTL_IBPB, SPECTRE_V2_USER_CMD_SECCOMP, + SPECTRE_V2_USER_CMD_SECCOMP_IBPB, };
static const char * const spectre_v2_user_strings[] = { @@ -260,11 +262,13 @@ static const struct { enum spectre_v2_user_cmd cmd; bool secure; } v2_user_options[] __initdata = { - { "auto", SPECTRE_V2_USER_CMD_AUTO, false }, - { "off", SPECTRE_V2_USER_CMD_NONE, false }, - { "on", SPECTRE_V2_USER_CMD_FORCE, true }, - { "prctl", SPECTRE_V2_USER_CMD_PRCTL, false }, - { "seccomp", SPECTRE_V2_USER_CMD_SECCOMP, false }, + { "auto", SPECTRE_V2_USER_CMD_AUTO, false }, + { "off", SPECTRE_V2_USER_CMD_NONE, false }, + { "on", SPECTRE_V2_USER_CMD_FORCE, true }, + { "prctl", SPECTRE_V2_USER_CMD_PRCTL, false }, + { "prctl,ibpb", SPECTRE_V2_USER_CMD_PRCTL_IBPB, false }, + { "seccomp", SPECTRE_V2_USER_CMD_SECCOMP, false }, + { "seccomp,ibpb", SPECTRE_V2_USER_CMD_SECCOMP_IBPB, false }, };
static void __init spec_v2_user_print_cond(const char *reason, bool secure) @@ -310,6 +314,7 @@ spectre_v2_user_select_mitigation(enum s { enum spectre_v2_user_mitigation mode = SPECTRE_V2_USER_NONE; bool smt_possible = IS_ENABLED(CONFIG_SMP); + enum spectre_v2_user_cmd cmd;
if (!boot_cpu_has(X86_FEATURE_IBPB) && !boot_cpu_has(X86_FEATURE_STIBP)) return; @@ -317,17 +322,20 @@ spectre_v2_user_select_mitigation(enum s if (!IS_ENABLED(CONFIG_SMP)) smt_possible = false;
- switch (spectre_v2_parse_user_cmdline(v2_cmd)) { + cmd = spectre_v2_parse_user_cmdline(v2_cmd); + switch (cmd) { case SPECTRE_V2_USER_CMD_NONE: goto set_mode; case SPECTRE_V2_USER_CMD_FORCE: mode = SPECTRE_V2_USER_STRICT; break; case SPECTRE_V2_USER_CMD_PRCTL: + case SPECTRE_V2_USER_CMD_PRCTL_IBPB: mode = SPECTRE_V2_USER_PRCTL; break; case SPECTRE_V2_USER_CMD_AUTO: case SPECTRE_V2_USER_CMD_SECCOMP: + case SPECTRE_V2_USER_CMD_SECCOMP_IBPB: if (IS_ENABLED(CONFIG_SECCOMP)) mode = SPECTRE_V2_USER_SECCOMP; else @@ -339,12 +347,15 @@ spectre_v2_user_select_mitigation(enum s if (boot_cpu_has(X86_FEATURE_IBPB)) { setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
- switch (mode) { - case SPECTRE_V2_USER_STRICT: + switch (cmd) { + case SPECTRE_V2_USER_CMD_FORCE: + case SPECTRE_V2_USER_CMD_PRCTL_IBPB: + case SPECTRE_V2_USER_CMD_SECCOMP_IBPB: static_branch_enable(&switch_mm_always_ibpb); break; - case SPECTRE_V2_USER_PRCTL: - case SPECTRE_V2_USER_SECCOMP: + case SPECTRE_V2_USER_CMD_PRCTL: + case SPECTRE_V2_USER_CMD_AUTO: + case SPECTRE_V2_USER_CMD_SECCOMP: static_branch_enable(&switch_mm_cond_ibpb); break; default: @@ -352,7 +363,8 @@ spectre_v2_user_select_mitigation(enum s }
pr_info("mitigation: Enabling %s Indirect Branch Prediction Barrier\n", - mode == SPECTRE_V2_USER_STRICT ? "always-on" : "conditional"); + static_key_enabled(&switch_mm_always_ibpb) ? + "always-on" : "conditional"); }
/* If enhanced IBRS is enabled no STIPB required */
From: Eduardo Habkost ehabkost@redhat.com
commit d7b09c827a6cf291f66637a36f46928dd1423184 upstream.
Months ago, we added code to give the guest direct access to MSR_IA32_SPEC_CTRL, which makes STIBP available to guests. This was implemented by commits d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL") and b2ac58f90540 ("KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL").
However, we never updated GET_SUPPORTED_CPUID to let userspace know that STIBP can be enabled in CPUID. Fix that by updating kvm_cpuid_8000_0008_ebx_x86_features and kvm_cpuid_7_0_edx_x86_features.
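To see what the change means for a guest, here is a minimal sketch of how guest software might test for STIBP once it is exposed. It assumes the raw register values were already obtained, for example via `__get_cpuid_count()` from GCC/Clang's `<cpuid.h>`. The bit positions follow the X86_FEATURE_* definitions (INTEL_STIBP is word 18 bit 27, i.e. CPUID 7.0 EDX bit 27; AMD_STIBP is CPUID 0x80000008 EBX bit 15). The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EDX bit 27, X86_FEATURE_INTEL_STIBP. */
#define CPUID_7_0_EDX_INTEL_STIBP     (UINT32_C(1) << 27)
/* CPUID.(EAX=0x80000008):EBX bit 15, X86_FEATURE_AMD_STIBP. */
#define CPUID_8000_0008_EBX_AMD_STIBP (UINT32_C(1) << 15)

/* Hypothetical helper: does either vendor's STIBP bit show up? */
static bool guest_sees_stibp(uint32_t leaf7_edx, uint32_t leaf80000008_ebx)
{
	return (leaf7_edx & CPUID_7_0_EDX_INTEL_STIBP) ||
	       (leaf80000008_ebx & CPUID_8000_0008_EBX_AMD_STIBP);
}
```

Before this patch, KVM's GET_SUPPORTED_CPUID masks cleared both bits, so a VMM building the guest CPUID from them would never set either one, even though the MSR itself was already passed through.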
Signed-off-by: Eduardo Habkost ehabkost@redhat.com
Reviewed-by: Jim Mattson jmattson@google.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kvm/cpuid.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -344,7 +344,7 @@ static inline int __do_cpuid_ent(struct
 	/* cpuid 0x80000008.ebx */
 	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
 		F(AMD_IBPB) | F(AMD_IBRS) | F(AMD_SSBD) | F(VIRT_SSBD) |
-		F(AMD_SSB_NO);
+		F(AMD_SSB_NO) | F(AMD_STIBP);
 
 	/* cpuid 0xC0000001.edx */
 	const u32 kvm_supported_word5_x86_features =
@@ -365,7 +365,8 @@ static inline int __do_cpuid_ent(struct
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
-		F(SPEC_CTRL) | F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES);
+		F(SPEC_CTRL) | F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) |
+		F(INTEL_STIBP);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
From: Thomas Gleixner tglx@linutronix.de
commit d8eabc37310a92df40d07c5a8afc53cebf996716 upstream.
Greg pointed out that the speculation-related bit defines use the (1 << N) format instead of BIT(N). Aside from that, (1 << N) is wrong, as it should at least use 1UL.
Clean it up.
[ Josh Poimboeuf: Fix tools build ]
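The reason behind the cleanup can be shown in a few lines. The BIT() macro below is written in the same shape as the kernel helper (an unsigned long 1 shifted left). The speculation masks mirror the new msr-index.h defines. The point is the type: a plain `1` is a signed int, so `(1 << 31)` overflows on typical 32-bit int targets, while BIT() always yields an unsigned long.

```c
/* Same shape as the kernel's BIT() helper: shift an unsigned long 1. */
#define BIT(nr) (1UL << (nr))

/* Speculation-control masks rewritten with BIT(), as in msr-index.h. */
#define SPEC_CTRL_IBRS        BIT(0)
#define SPEC_CTRL_STIBP_SHIFT 1
#define SPEC_CTRL_STIBP       BIT(SPEC_CTRL_STIBP_SHIFT)
#define SPEC_CTRL_SSBD_SHIFT  2
#define SPEC_CTRL_SSBD        BIT(SPEC_CTRL_SSBD_SHIFT)

/*
 * With a plain int, (1 << 31) overflows a 32-bit signed int, which is
 * undefined behaviour in C. BIT() shifts an unsigned long instead, so
 * bit 31 is well defined everywhere, and bits above 31 are well defined
 * on 64-bit (LP64) targets.
 */
static unsigned long spec_ctrl_mask(void)
{
	return SPEC_CTRL_IBRS | SPEC_CTRL_STIBP | SPEC_CTRL_SSBD;
}
```

For the low bits used here the numeric values are unchanged, so the conversion is a type-safety cleanup rather than a behavioural change.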
Reported-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Reviewed-by: Borislav Petkov bp@suse.de
Reviewed-by: Frederic Weisbecker frederic@kernel.org
Reviewed-by: Jon Masters jcm@redhat.com
Tested-by: Jon Masters jcm@redhat.com
[bwh: Backported to 4.4:
 - Drop change to x86_energy_perf_policy, which doesn't use msr-index.h here
 - Drop changes to flush MSRs which we haven't defined]
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/msr-index.h   | 24 +++++++++++++-----------
 tools/power/x86/turbostat/Makefile |  2 +-
 2 files changed, 14 insertions(+), 12 deletions(-)
--- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -1,6 +1,8 @@ #ifndef _ASM_X86_MSR_INDEX_H #define _ASM_X86_MSR_INDEX_H
+#include <linux/bits.h> + /* CPU model specific register (MSR) numbers */
/* x86-64 specific MSRs */ @@ -33,14 +35,14 @@
/* Intel MSRs. Some also available on other CPUs */ #define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */ -#define SPEC_CTRL_IBRS (1 << 0) /* Indirect Branch Restricted Speculation */ +#define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */ #define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */ -#define SPEC_CTRL_STIBP (1 << SPEC_CTRL_STIBP_SHIFT) /* STIBP mask */ +#define SPEC_CTRL_STIBP BIT(SPEC_CTRL_STIBP_SHIFT) /* STIBP mask */ #define SPEC_CTRL_SSBD_SHIFT 2 /* Speculative Store Bypass Disable bit */ -#define SPEC_CTRL_SSBD (1 << SPEC_CTRL_SSBD_SHIFT) /* Speculative Store Bypass Disable */ +#define SPEC_CTRL_SSBD BIT(SPEC_CTRL_SSBD_SHIFT) /* Speculative Store Bypass Disable */
#define MSR_IA32_PRED_CMD 0x00000049 /* Prediction Command */ -#define PRED_CMD_IBPB (1 << 0) /* Indirect Branch Prediction Barrier */ +#define PRED_CMD_IBPB BIT(0) /* Indirect Branch Prediction Barrier */
#define MSR_IA32_PERFCTR0 0x000000c1 #define MSR_IA32_PERFCTR1 0x000000c2 @@ -57,13 +59,13 @@ #define MSR_MTRRcap 0x000000fe
#define MSR_IA32_ARCH_CAPABILITIES 0x0000010a -#define ARCH_CAP_RDCL_NO (1 << 0) /* Not susceptible to Meltdown */ -#define ARCH_CAP_IBRS_ALL (1 << 1) /* Enhanced IBRS support */ -#define ARCH_CAP_SSB_NO (1 << 4) /* - * Not susceptible to Speculative Store Bypass - * attack, so no Speculative Store Bypass - * control required. - */ +#define ARCH_CAP_RDCL_NO BIT(0) /* Not susceptible to Meltdown */ +#define ARCH_CAP_IBRS_ALL BIT(1) /* Enhanced IBRS support */ +#define ARCH_CAP_SSB_NO BIT(4) /* + * Not susceptible to Speculative Store Bypass + * attack, so no Speculative Store Bypass + * control required. + */
 #define MSR_IA32_BBL_CR_CTL		0x00000119
 #define MSR_IA32_BBL_CR_CTL3		0x0000011e
--- a/tools/power/x86/turbostat/Makefile
+++ b/tools/power/x86/turbostat/Makefile
@@ -8,7 +8,7 @@ ifeq ("$(origin O)", "command line")
 endif
 
 turbostat : turbostat.c
-CFLAGS += -Wall
+CFLAGS += -Wall -I../../../include
 CFLAGS += -DMSRHEADER='"../../../../arch/x86/include/asm/msr-index.h"'
 
 %: %.c
From: Thomas Gleixner tglx@linutronix.de
commit 36ad35131adacc29b328b9c8b6277a8bf0d6fd5d upstream.
The CPU vulnerability whitelists overlap to some extent, and more whitelists are on the way.
Use the driver_data field in the x86_cpu_id struct to denote the whitelisted vulnerabilities and combine all whitelists into one.
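The combined whitelist relies on first-match semantics, which is why the AMD catch-all entry must come last. The sketch below models that with a hypothetical simplified table (vendor, family, model, flags) and a linear first-match search standing in for x86_match_cpu(). The `ANY` wildcard, the struct layout and `cpu_matches_ids()` are illustrative assumptions; the real matcher also compares a feature field.

```c
#include <stdbool.h>
#include <stddef.h>

#define NO_SPECULATION (1UL << 0)
#define NO_MELTDOWN    (1UL << 1)
#define NO_SSB         (1UL << 2)
#define NO_L1TF        (1UL << 3)

#define ANY (-1) /* hypothetical wildcard for vendor/family/model */

enum vendor { VENDOR_INTEL, VENDOR_AMD };

struct vuln_entry {
	int vendor;
	int family;
	int model;
	unsigned long driver_data; /* whitelisted vulnerabilities */
};

/* Excerpt: AMD families 0x0f-0x12 listed before the AMD catch-all. */
static const struct vuln_entry whitelist[] = {
	{ VENDOR_AMD, 0x0f, ANY, NO_MELTDOWN | NO_SSB | NO_L1TF },
	{ VENDOR_AMD, 0x10, ANY, NO_MELTDOWN | NO_SSB | NO_L1TF },
	{ VENDOR_AMD, 0x11, ANY, NO_MELTDOWN | NO_SSB | NO_L1TF },
	{ VENDOR_AMD, 0x12, ANY, NO_MELTDOWN | NO_SSB | NO_L1TF },
	/* Catch-all must come last: the search stops at the first hit. */
	{ VENDOR_AMD, ANY,  ANY, NO_MELTDOWN | NO_L1TF },
};

static bool field_matches(int want, int have)
{
	return want == ANY || want == have;
}

/* Return the first matching entry, or NULL if none matches. */
static const struct vuln_entry *match_cpu(int vendor, int family, int model)
{
	for (size_t i = 0; i < sizeof(whitelist) / sizeof(whitelist[0]); i++) {
		const struct vuln_entry *e = &whitelist[i];

		if (field_matches(e->vendor, vendor) &&
		    field_matches(e->family, family) &&
		    field_matches(e->model, model))
			return e;
	}
	return NULL;
}

/* Sketch of cpu_matches(): is vulnerability 'which' whitelisted? */
static bool cpu_matches_ids(int vendor, int family, int model,
			    unsigned long which)
{
	const struct vuln_entry *m = match_cpu(vendor, family, model);

	return m && (m->driver_data & which);
}
```

If the catch-all entry were moved to the top, family 0x10 would match it first and lose its NO_SSB bit. This is the ordering constraint the comment in the patch warns about.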
Suggested-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Frederic Weisbecker frederic@kernel.org Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Borislav Petkov bp@suse.de Reviewed-by: Jon Masters jcm@redhat.com Tested-by: Jon Masters jcm@redhat.com Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/common.c | 103 ++++++++++++++++++++++--------------------- 1 file changed, 55 insertions(+), 48 deletions(-)
--- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -847,60 +847,68 @@ static void identify_cpu_without_cpuid(s #endif }
-static const __initconst struct x86_cpu_id cpu_no_speculation[] = { - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SALTWELL, X86_FEATURE_ANY }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SALTWELL_TABLET, X86_FEATURE_ANY }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_BONNELL_MID, X86_FEATURE_ANY }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SALTWELL_MID, X86_FEATURE_ANY }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_BONNELL, X86_FEATURE_ANY }, - { X86_VENDOR_CENTAUR, 5 }, - { X86_VENDOR_INTEL, 5 }, - { X86_VENDOR_NSC, 5 }, - { X86_VENDOR_ANY, 4 }, - {} -}; +#define NO_SPECULATION BIT(0) +#define NO_MELTDOWN BIT(1) +#define NO_SSB BIT(2) +#define NO_L1TF BIT(3) + +#define VULNWL(_vendor, _family, _model, _whitelist) \ + { X86_VENDOR_##_vendor, _family, _model, X86_FEATURE_ANY, _whitelist } + +#define VULNWL_INTEL(model, whitelist) \ + VULNWL(INTEL, 6, INTEL_FAM6_##model, whitelist) + +#define VULNWL_AMD(family, whitelist) \ + VULNWL(AMD, family, X86_MODEL_ANY, whitelist) + +static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = { + VULNWL(ANY, 4, X86_MODEL_ANY, NO_SPECULATION), + VULNWL(CENTAUR, 5, X86_MODEL_ANY, NO_SPECULATION), + VULNWL(INTEL, 5, X86_MODEL_ANY, NO_SPECULATION), + VULNWL(NSC, 5, X86_MODEL_ANY, NO_SPECULATION), + + VULNWL_INTEL(ATOM_SALTWELL, NO_SPECULATION), + VULNWL_INTEL(ATOM_SALTWELL_TABLET, NO_SPECULATION), + VULNWL_INTEL(ATOM_SALTWELL_MID, NO_SPECULATION), + VULNWL_INTEL(ATOM_BONNELL, NO_SPECULATION), + VULNWL_INTEL(ATOM_BONNELL_MID, NO_SPECULATION), + + VULNWL_INTEL(ATOM_SILVERMONT, NO_SSB | NO_L1TF), + VULNWL_INTEL(ATOM_SILVERMONT_X, NO_SSB | NO_L1TF), + VULNWL_INTEL(ATOM_SILVERMONT_MID, NO_SSB | NO_L1TF), + VULNWL_INTEL(ATOM_AIRMONT, NO_SSB | NO_L1TF), + VULNWL_INTEL(XEON_PHI_KNL, NO_SSB | NO_L1TF), + VULNWL_INTEL(XEON_PHI_KNM, NO_SSB | NO_L1TF), + + VULNWL_INTEL(CORE_YONAH, NO_SSB), + + VULNWL_INTEL(ATOM_AIRMONT_MID, NO_L1TF), + VULNWL_INTEL(ATOM_GOLDMONT, NO_L1TF), + VULNWL_INTEL(ATOM_GOLDMONT_X, NO_L1TF), + VULNWL_INTEL(ATOM_GOLDMONT_PLUS, 
NO_L1TF), + + VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF), + VULNWL_AMD(0x10, NO_MELTDOWN | NO_SSB | NO_L1TF), + VULNWL_AMD(0x11, NO_MELTDOWN | NO_SSB | NO_L1TF), + VULNWL_AMD(0x12, NO_MELTDOWN | NO_SSB | NO_L1TF),
-static const __initconst struct x86_cpu_id cpu_no_meltdown[] = { - { X86_VENDOR_AMD }, + /* FAMILY_ANY must be last, otherwise 0x0f - 0x12 matches won't work */ + VULNWL_AMD(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF), {} };
-/* Only list CPUs which speculate but are non susceptible to SSB */ -static const __initconst struct x86_cpu_id cpu_no_spec_store_bypass[] = { - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT_X }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT_MID }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_CORE_YONAH }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNL }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNM }, - { X86_VENDOR_AMD, 0x12, }, - { X86_VENDOR_AMD, 0x11, }, - { X86_VENDOR_AMD, 0x10, }, - { X86_VENDOR_AMD, 0xf, }, - {} -}; +static bool __init cpu_matches(unsigned long which) +{ + const struct x86_cpu_id *m = x86_match_cpu(cpu_vuln_whitelist);
-static const __initconst struct x86_cpu_id cpu_no_l1tf[] = { - /* in addition to cpu_no_speculation */ - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT_X }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT_MID }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT_MID }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GOLDMONT }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GOLDMONT_X }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GOLDMONT_PLUS }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNL }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNM }, - {} -}; + return m && !!(m->driver_data & which); +}
static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c) { u64 ia32_cap = 0;
- if (x86_match_cpu(cpu_no_speculation)) + if (cpu_matches(NO_SPECULATION)) return;
setup_force_cpu_bug(X86_BUG_SPECTRE_V1); @@ -909,15 +917,14 @@ static void __init cpu_set_bug_bits(stru if (cpu_has(c, X86_FEATURE_ARCH_CAPABILITIES)) rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);
- if (!x86_match_cpu(cpu_no_spec_store_bypass) && - !(ia32_cap & ARCH_CAP_SSB_NO) && + if (!cpu_matches(NO_SSB) && !(ia32_cap & ARCH_CAP_SSB_NO) && !cpu_has(c, X86_FEATURE_AMD_SSB_NO)) setup_force_cpu_bug(X86_BUG_SPEC_STORE_BYPASS);
if (ia32_cap & ARCH_CAP_IBRS_ALL) setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
- if (x86_match_cpu(cpu_no_meltdown)) + if (cpu_matches(NO_MELTDOWN)) return;
/* Rogue Data Cache Load? No! */ @@ -926,7 +933,7 @@ static void __init cpu_set_bug_bits(stru
setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);
- if (x86_match_cpu(cpu_no_l1tf)) + if (cpu_matches(NO_L1TF)) return;
setup_force_cpu_bug(X86_BUG_L1TF);
From: Andi Kleen ak@linux.intel.com
commit ed5194c2732c8084af9fd159c146ea92bf137128 upstream.
Microarchitectural Data Sampling (MDS) is a class of side channel attacks on internal buffers in Intel CPUs. The variants are:
- Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
- Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
- Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a dependent load (store-to-load forwarding) as an optimization. The forward can also happen to a faulting or assisting load operation for a different memory address, which can be exploited under certain conditions. Store buffers are partitioned between Hyper-Threads so cross thread forwarding is not possible. But if a thread enters or exits a sleep state the store buffer is repartitioned which can expose data from one thread to the other.
MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage L1 miss situations and to hold data which is returned or sent in response to a memory or I/O operation. Fill buffers can forward data to a load operation and also write data to the cache. When the fill buffer is deallocated it can retain the stale data of the preceding operations which can then be forwarded to a faulting or assisting load operation, which can be exploited under certain conditions. Fill buffers are shared between Hyper-Threads so cross thread leakage is possible.
MLPDS leaks Load Port Data. Load ports are used to perform load operations from memory or I/O. The received data is then forwarded to the register file or a subsequent operation. In some implementations the Load Port can contain stale data from a previous operation, which can be forwarded to faulting or assisting loads under certain conditions and, again, can eventually be exploited. Load ports are shared between Hyper-Threads, so cross-thread leakage is possible.
All variants have the same mitigation for the single CPU thread case (SMT off), so the kernel can treat them as one MDS issue.
Add the basic infrastructure to detect if the current CPU is affected by MDS.
[ tglx: Rewrote changelog ]
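The detection rule added here is compact: a CPU gets the MDS bug bit unless it is whitelisted with NO_MDS or reports ARCH_CAP_MDS_NO (bit 5 of IA32_ARCH_CAPABILITIES). CPUs whitelisted as NO_SPECULATION return before any of the checks run. The sketch below models that order with hypothetical bug-bit values that exist only for this illustration.

```c
#include <stdint.h>

#define NO_SPECULATION  (1UL << 0)          /* whitelist flags, as in common.c */
#define NO_MDS          (1UL << 4)
#define ARCH_CAP_MDS_NO (UINT64_C(1) << 5)  /* IA32_ARCH_CAPABILITIES bit 5 */

/* Hypothetical bug-bit values for this sketch only. */
#define BUG_SPECTRE_V1 (1u << 0)
#define BUG_MDS        (1u << 1)

static unsigned int mds_bug_bits(unsigned long whitelist, uint64_t ia32_cap)
{
	unsigned int bugs = 0;

	/* CPUs that do not speculate at all get no speculation bugs. */
	if (whitelist & NO_SPECULATION)
		return 0;
	bugs |= BUG_SPECTRE_V1;

	/* Affected unless whitelisted or the MSR says MDS_NO. */
	if (!(whitelist & NO_MDS) && !(ia32_cap & ARCH_CAP_MDS_NO))
		bugs |= BUG_MDS;

	return bugs;
}
```

Either source of "not affected" is sufficient on its own. The whitelist covers older parts, while ARCH_CAP_MDS_NO lets newer CPUs opt out without a table entry.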
Signed-off-by: Andi Kleen ak@linux.intel.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Reviewed-by: Borislav Petkov bp@suse.de
Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Reviewed-by: Frederic Weisbecker frederic@kernel.org
Reviewed-by: Jon Masters jcm@redhat.com
Tested-by: Jon Masters jcm@redhat.com
[bwh: Backported to 4.4: adjust context, indentation]
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/cpufeatures.h |  2 ++
 arch/x86/include/asm/msr-index.h   |  5 +++++
 arch/x86/kernel/cpu/common.c       | 25 ++++++++++++++++---------
 3 files changed, 23 insertions(+), 9 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -310,6 +310,7 @@ /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */ #define X86_FEATURE_AVX512_4VNNIW (18*32+ 2) /* AVX-512 Neural Network Instructions */ #define X86_FEATURE_AVX512_4FMAPS (18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */ +#define X86_FEATURE_MD_CLEAR (18*32+10) /* VERW clears CPU buffers */ #define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */ #define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */ #define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */ @@ -335,5 +336,6 @@ #define X86_BUG_SPECTRE_V2 X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */ #define X86_BUG_SPEC_STORE_BYPASS X86_BUG(17) /* CPU is affected by speculative store bypass attack */ #define X86_BUG_L1TF X86_BUG(18) /* CPU is affected by L1 Terminal Fault */ +#define X86_BUG_MDS X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */
#endif /* _ASM_X86_CPUFEATURES_H */ --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -66,6 +66,11 @@ * attack, so no Speculative Store Bypass * control required. */ +#define ARCH_CAP_MDS_NO BIT(5) /* + * Not susceptible to + * Microarchitectural Data + * Sampling (MDS) vulnerabilities. + */
#define MSR_IA32_BBL_CR_CTL 0x00000119 #define MSR_IA32_BBL_CR_CTL3 0x0000011e --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -851,6 +851,7 @@ static void identify_cpu_without_cpuid(s #define NO_MELTDOWN BIT(1) #define NO_SSB BIT(2) #define NO_L1TF BIT(3) +#define NO_MDS BIT(4)
#define VULNWL(_vendor, _family, _model, _whitelist) \ { X86_VENDOR_##_vendor, _family, _model, X86_FEATURE_ANY, _whitelist } @@ -867,6 +868,7 @@ static const __initconst struct x86_cpu_ VULNWL(INTEL, 5, X86_MODEL_ANY, NO_SPECULATION), VULNWL(NSC, 5, X86_MODEL_ANY, NO_SPECULATION),
+ /* Intel Family 6 */ VULNWL_INTEL(ATOM_SALTWELL, NO_SPECULATION), VULNWL_INTEL(ATOM_SALTWELL_TABLET, NO_SPECULATION), VULNWL_INTEL(ATOM_SALTWELL_MID, NO_SPECULATION), @@ -883,17 +885,19 @@ static const __initconst struct x86_cpu_ VULNWL_INTEL(CORE_YONAH, NO_SSB),
VULNWL_INTEL(ATOM_AIRMONT_MID, NO_L1TF), - VULNWL_INTEL(ATOM_GOLDMONT, NO_L1TF), - VULNWL_INTEL(ATOM_GOLDMONT_X, NO_L1TF), - VULNWL_INTEL(ATOM_GOLDMONT_PLUS, NO_L1TF), - - VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF), - VULNWL_AMD(0x10, NO_MELTDOWN | NO_SSB | NO_L1TF), - VULNWL_AMD(0x11, NO_MELTDOWN | NO_SSB | NO_L1TF), - VULNWL_AMD(0x12, NO_MELTDOWN | NO_SSB | NO_L1TF), + + VULNWL_INTEL(ATOM_GOLDMONT, NO_MDS | NO_L1TF), + VULNWL_INTEL(ATOM_GOLDMONT_X, NO_MDS | NO_L1TF), + VULNWL_INTEL(ATOM_GOLDMONT_PLUS, NO_MDS | NO_L1TF), + + /* AMD Family 0xf - 0x12 */ + VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS), + VULNWL_AMD(0x10, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS), + VULNWL_AMD(0x11, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS), + VULNWL_AMD(0x12, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS),
/* FAMILY_ANY must be last, otherwise 0x0f - 0x12 matches won't work */ - VULNWL_AMD(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF), + VULNWL_AMD(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS), {} };
@@ -924,6 +928,9 @@ static void __init cpu_set_bug_bits(stru if (ia32_cap & ARCH_CAP_IBRS_ALL) setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
+ if (!cpu_matches(NO_MDS) && !(ia32_cap & ARCH_CAP_MDS_NO)) + setup_force_cpu_bug(X86_BUG_MDS); + if (cpu_matches(NO_MELTDOWN)) return;
From: Thomas Gleixner tglx@linutronix.de
commit e261f209c3666e842fd645a1e31f001c3a26def9 upstream.
This bug bit is set on CPUs which are only affected by Microarchitectural Store Buffer Data Sampling (MSBDS) and not by any other MDS variant.
This is important because the Store Buffers are partitioned between Hyper-Threads so cross thread forwarding is not possible. But if a thread enters or exits a sleep state the store buffer is repartitioned which can expose data from one thread to the other. This transition can be mitigated.
This means that on CPUs which are only affected by MSBDS, SMT can be enabled as long as the CPU is not affected by other SMT-sensitive vulnerabilities, e.g. L1TF. The Xeon Phi variants fall into that category. So do the Silvermont/Airmont Atoms, although this is not really relevant for them because they do not support SMT; they are marked for completeness' sake.
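The reasoning above can be written as a small decision function. This is a hypothetical, simplified policy sketch: it only looks at the MDS, MSBDS-only and L1TF flags, whereas the kernel's real L1TF detection also depends on the Meltdown whitelist and other checks. MSBDS_ONLY is only meaningful when the CPU is affected by MDS at all, which mirrors how the patch nests the new bug bit inside the MDS check.

```c
#include <stdbool.h>

#define NO_L1TF    (1UL << 3) /* whitelist flags, as in common.c */
#define NO_MDS     (1UL << 4)
#define MSBDS_ONLY (1UL << 5)

struct bugs {
	bool mds;
	bool msbds_only;
	bool l1tf;
};

/* Simplified classification from whitelist flags and ARCH_CAP_MDS_NO. */
static struct bugs classify(unsigned long whitelist, bool arch_mds_no)
{
	struct bugs b = { false, false, false };

	b.mds = !(whitelist & NO_MDS) && !arch_mds_no;
	/* Only set when MDS applies, like the nested check in the patch. */
	b.msbds_only = b.mds && (whitelist & MSBDS_ONLY);
	b.l1tf = !(whitelist & NO_L1TF);
	return b;
}

/* Hypothetical policy: SMT is acceptable if MDS is MSBDS-only and no L1TF. */
static bool smt_ok(struct bugs b)
{
	if (b.l1tf)
		return false;
	return !b.mds || b.msbds_only;
}
```

With these rules a Xeon Phi style entry (NO_L1TF | MSBDS_ONLY) keeps SMT usable, while a CPU with no whitelist entry does not.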
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Reviewed-by: Frederic Weisbecker frederic@kernel.org
Reviewed-by: Jon Masters jcm@redhat.com
Tested-by: Jon Masters jcm@redhat.com
[bwh: Backported to 4.4: adjust context, indentation]
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/kernel/cpu/common.c       | 20 ++++++++++++--------
 2 files changed, 13 insertions(+), 8 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -337,5 +337,6 @@ #define X86_BUG_SPEC_STORE_BYPASS X86_BUG(17) /* CPU is affected by speculative store bypass attack */ #define X86_BUG_L1TF X86_BUG(18) /* CPU is affected by L1 Terminal Fault */ #define X86_BUG_MDS X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */ +#define X86_BUG_MSBDS_ONLY X86_BUG(20) /* CPU is only affected by the MSDBS variant of BUG_MDS */
#endif /* _ASM_X86_CPUFEATURES_H */ --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -852,6 +852,7 @@ static void identify_cpu_without_cpuid(s #define NO_SSB BIT(2) #define NO_L1TF BIT(3) #define NO_MDS BIT(4) +#define MSBDS_ONLY BIT(5)
#define VULNWL(_vendor, _family, _model, _whitelist) \ { X86_VENDOR_##_vendor, _family, _model, X86_FEATURE_ANY, _whitelist } @@ -875,16 +876,16 @@ static const __initconst struct x86_cpu_ VULNWL_INTEL(ATOM_BONNELL, NO_SPECULATION), VULNWL_INTEL(ATOM_BONNELL_MID, NO_SPECULATION),
- VULNWL_INTEL(ATOM_SILVERMONT, NO_SSB | NO_L1TF), - VULNWL_INTEL(ATOM_SILVERMONT_X, NO_SSB | NO_L1TF), - VULNWL_INTEL(ATOM_SILVERMONT_MID, NO_SSB | NO_L1TF), - VULNWL_INTEL(ATOM_AIRMONT, NO_SSB | NO_L1TF), - VULNWL_INTEL(XEON_PHI_KNL, NO_SSB | NO_L1TF), - VULNWL_INTEL(XEON_PHI_KNM, NO_SSB | NO_L1TF), + VULNWL_INTEL(ATOM_SILVERMONT, NO_SSB | NO_L1TF | MSBDS_ONLY), + VULNWL_INTEL(ATOM_SILVERMONT_X, NO_SSB | NO_L1TF | MSBDS_ONLY), + VULNWL_INTEL(ATOM_SILVERMONT_MID, NO_SSB | NO_L1TF | MSBDS_ONLY), + VULNWL_INTEL(ATOM_AIRMONT, NO_SSB | NO_L1TF | MSBDS_ONLY), + VULNWL_INTEL(XEON_PHI_KNL, NO_SSB | NO_L1TF | MSBDS_ONLY), + VULNWL_INTEL(XEON_PHI_KNM, NO_SSB | NO_L1TF | MSBDS_ONLY),
VULNWL_INTEL(CORE_YONAH, NO_SSB),
- VULNWL_INTEL(ATOM_AIRMONT_MID, NO_L1TF), + VULNWL_INTEL(ATOM_AIRMONT_MID, NO_L1TF | MSBDS_ONLY),
VULNWL_INTEL(ATOM_GOLDMONT, NO_MDS | NO_L1TF), VULNWL_INTEL(ATOM_GOLDMONT_X, NO_MDS | NO_L1TF), @@ -928,8 +929,11 @@ static void __init cpu_set_bug_bits(stru if (ia32_cap & ARCH_CAP_IBRS_ALL) setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
- if (!cpu_matches(NO_MDS) && !(ia32_cap & ARCH_CAP_MDS_NO)) + if (!cpu_matches(NO_MDS) && !(ia32_cap & ARCH_CAP_MDS_NO)) { setup_force_cpu_bug(X86_BUG_MDS); + if (cpu_matches(MSBDS_ONLY)) + setup_force_cpu_bug(X86_BUG_MSBDS_ONLY); + }
if (cpu_matches(NO_MELTDOWN)) return;
From: Andi Kleen ak@linux.intel.com
commit 6c4dbbd14730c43f4ed808a9c42ca41625925c22 upstream.
X86_FEATURE_MD_CLEAR is a new CPUID bit which is set when the microcode provides a mechanism to flush various exploitable CPU buffers by executing the VERW instruction.
Hand it through to guests so they can adjust their mitigations.
This also requires corresponding QEMU changes, which are available separately.
[ tglx: Massaged changelog ]
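Passing the bit through amounts to adding it to the mask KVM applies to CPUID leaf 7 EDX. The sketch below models that filtering in user space. The bit positions follow the X86_FEATURE_* word 18 definitions (MD_CLEAR bit 10, SPEC_CTRL 26, INTEL_STIBP 27, ARCH_CAPABILITIES 29, SPEC_CTRL_SSBD 31). The plain AND with the host value is a simplification of KVM's actual masking, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EDX bits, matching X86_FEATURE_* (18*32 + n). */
#define EDX7_MD_CLEAR       (UINT32_C(1) << 10)
#define EDX7_SPEC_CTRL      (UINT32_C(1) << 26)
#define EDX7_INTEL_STIBP    (UINT32_C(1) << 27)
#define EDX7_ARCH_CAPS      (UINT32_C(1) << 29)
#define EDX7_SPEC_CTRL_SSBD (UINT32_C(1) << 31)

/* Leaf-7 EDX bits reported to userspace after this patch (sketch). */
static const uint32_t kvm_edx7_mask =
	EDX7_SPEC_CTRL | EDX7_SPEC_CTRL_SSBD | EDX7_ARCH_CAPS |
	EDX7_INTEL_STIBP | EDX7_MD_CLEAR;

/* Host bits filtered through the supported mask. */
static uint32_t guest_edx7(uint32_t host_edx7)
{
	return host_edx7 & kvm_edx7_mask;
}

static bool guest_has_md_clear(uint32_t host_edx7)
{
	return guest_edx7(host_edx7) & EDX7_MD_CLEAR;
}
```

The guest only sees MD_CLEAR when the host has it (updated microcode) and the VMM also copies it into the guest CPUID, hence the separate QEMU changes.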
Signed-off-by: Andi Kleen ak@linux.intel.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Reviewed-by: Borislav Petkov bp@suse.de
Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Reviewed-by: Frederic Weisbecker frederic@kernel.org
Reviewed-by: Jon Masters jcm@redhat.com
Tested-by: Jon Masters jcm@redhat.com
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings ben@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kvm/cpuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -366,7 +366,7 @@ static inline int __do_cpuid_ent(struct
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
 		F(SPEC_CTRL) | F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) |
-		F(INTEL_STIBP);
+		F(INTEL_STIBP) | F(MD_CLEAR);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
From: Thomas Gleixner tglx@linutronix.de
commit 6a9e529272517755904b7afa639f6db59ddb793e upstream.
The Microarchitectural Data Sampling (MDS) vulnerabilities are mitigated by clearing the affected CPU buffers. The mechanism for clearing the buffers uses the unused and obsolete VERW instruction in combination with a microcode update which triggers a CPU buffer clear when VERW is executed.
Provide an inline function with the assembly magic. The argument of the VERW instruction must be a memory operand, as documented:
"MD_CLEAR enumerates that the memory-operand variant of VERW (for example, VERW m16) has been extended to also overwrite buffers affected by MDS. This buffer overwriting functionality is not guaranteed for the register operand variant of VERW."
The documentation also recommends using a writable data segment selector:
"The buffer overwriting occurs regardless of the result of the VERW permission check, as well as when the selector is null or causes a descriptor load segment violation. However, for lowest latency we recommend using a selector that indicates a valid writable data segment."
Add x86-specific documentation about MDS and the internal workings of the mitigation.
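For experimenting with the instruction form outside the kernel, here is a user-space sketch of the memory-operand VERW. Its assumptions: it uses whatever selector is currently loaded in %ds (the kernel instead uses __KERNEL_DS), and it reports ZF via SETZ purely to show that VERW modifies the flags, which is why the "cc" clobber is needed. VERW is not a privileged instruction, and per the quoted documentation a null selector does not fault. On CPUs without the MD_CLEAR microcode it only costs a few cycles. Non-x86 builds compile to a stub.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the ZF result of VERW on the current %ds selector (sketch). */
static bool verw_mem(void)
{
#if defined(__x86_64__) || defined(__i386__)
	uint16_t sel;
	unsigned char zf;

	/* Read the current data segment selector into a 16-bit register. */
	__asm__ volatile("mov %%ds, %0" : "=r" (sel));

	/*
	 * Memory-operand form, as the MD_CLEAR documentation requires.
	 * VERW writes ZF, so "cc" is clobbered; SETZ captures the result.
	 */
	__asm__ volatile("verw %[sel]\n\t"
			 "setz %[zf]"
			 : [zf] "=q" (zf)
			 : [sel] "m" (sel)
			 : "cc");
	return zf;
#else
	return false; /* Not x86: nothing to execute. */
#endif
}
```

The "m" constraint forces the selector into memory, which avoids the register-operand variant that the documentation says is not guaranteed to clear the buffers.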
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Borislav Petkov bp@suse.de Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Frederic Weisbecker frederic@kernel.org Reviewed-by: Jon Masters jcm@redhat.com Tested-by: Jon Masters jcm@redhat.com [bwh: Backported to 4.4: drop changes to doc index and configuration] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/x86/mds.rst | 99 +++++++++++++++++++++++++++++++++++ arch/x86/include/asm/nospec-branch.h | 25 ++++++++ 2 files changed, 124 insertions(+) create mode 100644 Documentation/x86/mds.rst
--- /dev/null +++ b/Documentation/x86/mds.rst @@ -0,0 +1,99 @@ +Microarchitectural Data Sampling (MDS) mitigation +================================================= + +.. _mds: + +Overview +-------- + +Microarchitectural Data Sampling (MDS) is a family of side channel attacks +on internal buffers in Intel CPUs. The variants are: + + - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126) + - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130) + - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127) + +MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a +dependent load (store-to-load forwarding) as an optimization. The forward +can also happen to a faulting or assisting load operation for a different +memory address, which can be exploited under certain conditions. Store +buffers are partitioned between Hyper-Threads so cross thread forwarding is +not possible. But if a thread enters or exits a sleep state the store +buffer is repartitioned which can expose data from one thread to the other. + +MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage +L1 miss situations and to hold data which is returned or sent in response +to a memory or I/O operation. Fill buffers can forward data to a load +operation and also write data to the cache. When the fill buffer is +deallocated it can retain the stale data of the preceding operations which +can then be forwarded to a faulting or assisting load operation, which can +be exploited under certain conditions. Fill buffers are shared between +Hyper-Threads so cross thread leakage is possible. + +MLPDS leaks Load Port Data. Load ports are used to perform load operations +from memory or I/O. The received data is then forwarded to the register +file or a subsequent operation. 
In some implementations the Load Port can +contain stale data from a previous operation which can be forwarded to +faulting or assisting loads under certain conditions, which again can be +exploited eventually. Load ports are shared between Hyper-Threads so cross +thread leakage is possible. + + +Exposure assumptions +-------------------- + +It is assumed that attack code resides in user space or in a guest with one +exception. The rationale behind this assumption is that the code construct +needed for exploiting MDS requires: + + - to control the load to trigger a fault or assist + + - to have a disclosure gadget which exposes the speculatively accessed + data for consumption through a side channel. + + - to control the pointer through which the disclosure gadget exposes the + data + +The existence of such a construct in the kernel cannot be excluded with +100% certainty, but the complexity involved makes it extremly unlikely. + +There is one exception, which is untrusted BPF. The functionality of +untrusted BPF is limited, but it needs to be thoroughly investigated +whether it can be used to create such a construct. + + +Mitigation strategy +------------------- + +All variants have the same mitigation strategy at least for the single CPU +thread case (SMT off): Force the CPU to clear the affected buffers. + +This is achieved by using the otherwise unused and obsolete VERW +instruction in combination with a microcode update. The microcode clears +the affected CPU buffers when the VERW instruction is executed. + +For virtualization there are two ways to achieve CPU buffer +clearing. Either the modified VERW instruction or via the L1D Flush +command. The latter is issued when L1TF mitigation is enabled so the extra +VERW can be avoided. If the CPU is not affected by L1TF then VERW needs to +be issued. 
+ +If the VERW instruction with the supplied segment selector argument is +executed on a CPU without the microcode update there is no side effect +other than a small number of pointlessly wasted CPU cycles. + +This does not protect against cross Hyper-Thread attacks except for MSBDS +which is only exploitable cross Hyper-thread when one of the Hyper-Threads +enters a C-state. + +The kernel provides a function to invoke the buffer clearing: + + mds_clear_cpu_buffers() + +The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state +(idle) transitions. + +According to current knowledge additional mitigations inside the kernel +itself are not required because the necessary gadgets to expose the leaked +data cannot be controlled in a way which allows exploitation from malicious +user space or VM guests. --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -262,6 +262,31 @@ DECLARE_STATIC_KEY_FALSE(switch_to_cond_ DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ibpb); DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
+#include <asm/segment.h> + +/** + * mds_clear_cpu_buffers - Mitigation for MDS vulnerability + * + * This uses the otherwise unused and obsolete VERW instruction in + * combination with microcode which triggers a CPU buffer flush when the + * instruction is executed. + */ +static inline void mds_clear_cpu_buffers(void) +{ + static const u16 ds = __KERNEL_DS; + + /* + * Has to be the memory-operand variant because only that + * guarantees the CPU buffer flush functionality according to + * documentation. The register-operand variant does not. + * Works with any segment selector, but a valid writable + * data segment is the fastest variant. + * + * "cc" clobber is required because VERW modifies ZF. + */ + asm volatile("verw %[ds]" : : [ds] "m" (ds) : "cc"); +} + #endif /* __ASSEMBLY__ */
/*
From: Thomas Gleixner tglx@linutronix.de
commit 04dcbdb8057827b043b3c71aa397c4c63e67d086 upstream.
Add a static key which controls the invocation of the CPU buffer clear mechanism on exit to user space and add the call into prepare_exit_to_usermode() and do_nmi() right before actually returning.
Add documentation explaining which kernel-to-user-space transitions this covers and why some corner cases are not mitigated.
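The gating described above can be illustrated with a small user-space model. It is a sketch under stated assumptions, not kernel code. A plain bool stands in for the mds_user_clear static key, and a counter stands in for the VERW-based flush. All model_* names are hypothetical. The model also captures the do_nmi() rule that only an NMI which interrupted user mode gets the extra clear.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a bool models the mds_user_clear static key,
 * and a counter records how often the (modelled) VERW flush would run. */
static bool mds_user_clear_enabled;
static unsigned int verw_flush_count;

/* Models mds_clear_cpu_buffers(); on real hardware this executes VERW. */
static void model_clear_cpu_buffers(void)
{
	verw_flush_count++;
}

/* Models mds_user_clear_cpu_buffers(): flush only when the key is on. */
static void model_user_clear_cpu_buffers(void)
{
	if (mds_user_clear_enabled)
		model_clear_cpu_buffers();
}

/* Models the tail of do_nmi(): only an NMI that interrupted user mode
 * returns straight to user space via the paranoid exit path. */
static void model_nmi_exit(bool interrupted_user_mode)
{
	if (interrupted_user_mode)
		model_user_clear_cpu_buffers();
}
```

With the key disabled, neither path flushes. With it enabled, the regular exit path always flushes, and the NMI path flushes only when returning to user mode.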
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Borislav Petkov bp@suse.de Reviewed-by: Frederic Weisbecker frederic@kernel.org Reviewed-by: Jon Masters jcm@redhat.com Tested-by: Jon Masters jcm@redhat.com [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/x86/mds.rst | 52 +++++++++++++++++++++++++++++++++++ arch/x86/entry/common.c | 3 ++ arch/x86/include/asm/nospec-branch.h | 13 ++++++++ arch/x86/kernel/cpu/bugs.c | 3 ++ arch/x86/kernel/nmi.c | 4 ++ arch/x86/kernel/traps.c | 8 +++++ 6 files changed, 83 insertions(+)
--- a/Documentation/x86/mds.rst +++ b/Documentation/x86/mds.rst @@ -97,3 +97,55 @@ According to current knowledge additiona itself are not required because the necessary gadgets to expose the leaked data cannot be controlled in a way which allows exploitation from malicious user space or VM guests. + +Mitigation points +----------------- + +1. Return to user space +^^^^^^^^^^^^^^^^^^^^^^^ + + When transitioning from kernel to user space the CPU buffers are flushed + on affected CPUs when the mitigation is not disabled on the kernel + command line. The mitigation is enabled through the static key + mds_user_clear. + + The mitigation is invoked in prepare_exit_to_usermode() which covers + most of the kernel to user space transitions. There are a few exceptions + which are not invoking prepare_exit_to_usermode() on return to user + space. These exceptions use the paranoid exit code. + + - Non Maskable Interrupt (NMI): + + Access to sensitive data like keys, credentials in the NMI context is + mostly theoretical: The CPU can do prefetching or execute a + misspeculated code path and thereby fetching data which might end up + leaking through a buffer. + + But for mounting other attacks the kernel stack address of the task is + already valuable information. So in full mitigation mode, the NMI is + mitigated on the return from do_nmi() to provide almost complete + coverage. + + - Double fault (#DF): + + A double fault is usually fatal, but the ESPFIX workaround, which can + be triggered from user space through modify_ldt(2) is a recoverable + double fault. #DF uses the paranoid exit path, so explicit mitigation + in the double fault handler is required. + + - Machine Check Exception (#MC): + + Another corner case is a #MC which hits between the CPU buffer clear + invocation and the actual return to user. As this still is in kernel + space it takes the paranoid exit path which does not clear the CPU + buffers. So the #MC handler repopulates the buffers to some + extent. 
Machine checks are not reliably controllable and the window is + extremely small so mitigation would just tick a checkbox that this + theoretical corner case is covered. To keep the amount of special + cases small, ignore #MC. + + - Debug Exception (#DB): + + This takes the paranoid exit path only when the INT1 breakpoint is in + kernel space. #DB on a user space address takes the regular exit path, + so no extra mitigation required. --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -28,6 +28,7 @@ #include <asm/vdso.h> #include <asm/uaccess.h> #include <asm/cpufeature.h> +#include <asm/nospec-branch.h>
#define CREATE_TRACE_POINTS #include <trace/events/syscalls.h> @@ -295,6 +296,8 @@ __visible inline void prepare_exit_to_us #endif
user_enter(); + + mds_user_clear_cpu_buffers(); }
#define SYSCALL_EXIT_WORK_FLAGS \ --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -262,6 +262,8 @@ DECLARE_STATIC_KEY_FALSE(switch_to_cond_ DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ibpb); DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
+DECLARE_STATIC_KEY_FALSE(mds_user_clear); + #include <asm/segment.h>
/** @@ -287,6 +289,17 @@ static inline void mds_clear_cpu_buffers asm volatile("verw %[ds]" : : [ds] "m" (ds) : "cc"); }
+/** + * mds_user_clear_cpu_buffers - Mitigation for MDS vulnerability + * + * Clear CPU buffers if the corresponding static key is enabled + */ +static inline void mds_user_clear_cpu_buffers(void) +{ + if (static_branch_likely(&mds_user_clear)) + mds_clear_cpu_buffers(); +} + #endif /* __ASSEMBLY__ */
/* --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -58,6 +58,9 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_i /* Control unconditional IBPB in switch_mm() */ DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
+/* Control MDS CPU buffer clear before returning to user space */ +DEFINE_STATIC_KEY_FALSE(mds_user_clear); + void __init check_bugs(void) { identify_boot_cpu(); --- a/arch/x86/kernel/nmi.c +++ b/arch/x86/kernel/nmi.c @@ -29,6 +29,7 @@ #include <asm/mach_traps.h> #include <asm/nmi.h> #include <asm/x86_init.h> +#include <asm/nospec-branch.h>
#define CREATE_TRACE_POINTS #include <trace/events/nmi.h> @@ -522,6 +523,9 @@ nmi_restart: write_cr2(this_cpu_read(nmi_cr2)); if (this_cpu_dec_return(nmi_state)) goto nmi_restart; + + if (user_mode(regs)) + mds_user_clear_cpu_buffers(); } NOKPROBE_SYMBOL(do_nmi);
--- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -61,6 +61,7 @@ #include <asm/alternative.h> #include <asm/fpu/xstate.h> #include <asm/trace/mpx.h> +#include <asm/nospec-branch.h> #include <asm/mpx.h> #include <asm/vm86.h>
@@ -337,6 +338,13 @@ dotraplinkage void do_double_fault(struc regs->ip = (unsigned long)general_protection; regs->sp = (unsigned long)&normal_regs->orig_ax;
+ /* + * This situation can be triggered by userspace via + * modify_ldt(2) and the return does not take the regular + * user space exit, so a CPU buffer clear is required when + * MDS mitigation is enabled. + */ + mds_user_clear_cpu_buffers(); return; } #endif
From: Thomas Gleixner tglx@linutronix.de
commit 07f07f55a29cb705e221eda7894dd67ab81ef343 upstream.
Add a static key which controls the invocation of the CPU buffer clear mechanism on idle entry. This is independent of other MDS mitigations because the idle entry invocation to mitigate the potential leakage due to store buffer repartitioning is only necessary on SMT systems.
Add the actual invocations to the different halt/mwait variants, which cover all usage sites. mwaitx is not patched as it's not available on Intel CPUs.
The buffer clear is only invoked before entering the C-State to prevent stale data from the idling CPU from spilling to the Hyper-Thread sibling after the store buffer got repartitioned and all entries are available to the non-idle sibling.
When coming out of idle the store buffer is partitioned again so each sibling has half of it available. Now the CPU which returned from idle could be speculatively exposed to contents of the sibling, but the buffers are flushed either on exit to user space or on VMENTER.
When conditional buffer clearing is later implemented on top of this, no action is required either, because the context switch will set the condition flag before returning to user space, which causes a flush on the return-to-user path.
Note that the buffer clearing on idle is only sensible on CPUs which are solely affected by MSBDS and not any other variant of MDS, because the other MDS variants cannot be mitigated when SMT is enabled, so the buffer clearing on idle would be a window dressing exercise.
This intentionally does not handle the case in the acpi/processor_idle driver which uses the legacy IO port interface for C-State transitions for two reasons:
- The acpi/processor_idle driver was replaced by the intel_idle driver almost a decade ago. Nehalem and newer CPUs support it and default to that new driver.
- The legacy IO port interface is likely to be used on older and therefore unaffected CPUs or on systems which do not receive microcode updates anymore, so there is no point in adding that.
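The idle-clear policy described above can be modelled in portable C. The key is enabled only on CPUs affected solely by MSBDS, and only while SMT is active. The sketch mirrors the update_mds_branch_idle()/arch_smt_update() logic added by the command line patch further down in this series. The struct, enum and model_* names are hypothetical. Booleans stand in for X86_BUG_MSBDS_ONLY, sched_smt_active() and the mds_idle_clear static key.

```c
#include <assert.h>
#include <stdbool.h>

enum model_mds_mitigation { MODEL_MDS_OFF, MODEL_MDS_FULL };

/* Hypothetical CPU description used only for this model. */
struct model_cpu {
	bool msbds_only;	/* stands in for X86_BUG_MSBDS_ONLY */
	bool smt_active;	/* stands in for sched_smt_active() */
};

/* Stands in for the mds_idle_clear static key. */
static bool mds_idle_clear_enabled;

static void model_update_mds_branch_idle(const struct model_cpu *cpu)
{
	/* Other MDS variants cannot be mitigated with SMT on: leave key alone. */
	if (!cpu->msbds_only)
		return;
	/* Store buffer repartitioning only leaks to a sibling thread. */
	mds_idle_clear_enabled = cpu->smt_active;
}

/* Only touch the idle key when the MDS mitigation is enabled at all. */
static void model_arch_smt_update(enum model_mds_mitigation m,
				  const struct model_cpu *cpu)
{
	if (m == MODEL_MDS_FULL)
		model_update_mds_branch_idle(cpu);
}
```

Calling model_arch_smt_update() again after a simulated SMT on/off change flips the key, which is how the kernel tracks CPU hotplug of sibling threads.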
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Borislav Petkov bp@suse.de Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Frederic Weisbecker frederic@kernel.org Reviewed-by: Jon Masters jcm@redhat.com Tested-by: Jon Masters jcm@redhat.com [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/x86/mds.rst | 42 +++++++++++++++++++++++++++++++++++ arch/x86/include/asm/irqflags.h | 5 ++++ arch/x86/include/asm/mwait.h | 7 +++++ arch/x86/include/asm/nospec-branch.h | 12 ++++++++++ arch/x86/kernel/cpu/bugs.c | 3 ++ 5 files changed, 69 insertions(+)
--- a/Documentation/x86/mds.rst +++ b/Documentation/x86/mds.rst @@ -149,3 +149,45 @@ Mitigation points This takes the paranoid exit path only when the INT1 breakpoint is in kernel space. #DB on a user space address takes the regular exit path, so no extra mitigation required. + + +2. C-State transition +^^^^^^^^^^^^^^^^^^^^^ + + When a CPU goes idle and enters a C-State the CPU buffers need to be + cleared on affected CPUs when SMT is active. This addresses the + repartitioning of the store buffer when one of the Hyper-Threads enters + a C-State. + + When SMT is inactive, i.e. either the CPU does not support it or all + sibling threads are offline CPU buffer clearing is not required. + + The idle clearing is enabled on CPUs which are only affected by MSBDS + and not by any other MDS variant. The other MDS variants cannot be + protected against cross Hyper-Thread attacks because the Fill Buffer and + the Load Ports are shared. So on CPUs affected by other variants, the + idle clearing would be a window dressing exercise and is therefore not + activated. + + The invocation is controlled by the static key mds_idle_clear which is + switched depending on the chosen mitigation mode and the SMT state of + the system. + + The buffer clear is only invoked before entering the C-State to prevent + that stale data from the idling CPU from spilling to the Hyper-Thread + sibling after the store buffer got repartitioned and all entries are + available to the non idle sibling. + + When coming out of idle the store buffer is partitioned again so each + sibling has half of it available. The back from idle CPU could be then + speculatively exposed to contents of the sibling. The buffers are + flushed either on exit to user space or on VMENTER so malicious code + in user space or the guest cannot speculatively access them. 
+ + The mitigation is hooked into all variants of halt()/mwait(), but does + not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver + has been superseded by the intel_idle driver around 2010 and is + preferred on all affected CPUs which are expected to gain the MD_CLEAR + functionality in microcode. Aside of that the IO-Port mechanism is a + legacy interface which is only used on older systems which are either + not affected or do not receive microcode updates anymore. --- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -4,6 +4,9 @@ #include <asm/processor-flags.h>
#ifndef __ASSEMBLY__ + +#include <asm/nospec-branch.h> + /* * Interrupt control: */ @@ -49,11 +52,13 @@ static inline void native_irq_enable(voi
static inline void native_safe_halt(void) { + mds_idle_clear_cpu_buffers(); asm volatile("sti; hlt": : :"memory"); }
static inline void native_halt(void) { + mds_idle_clear_cpu_buffers(); asm volatile("hlt": : :"memory"); }
--- a/arch/x86/include/asm/mwait.h +++ b/arch/x86/include/asm/mwait.h @@ -4,6 +4,7 @@ #include <linux/sched.h>
#include <asm/cpufeature.h> +#include <asm/nospec-branch.h>
#define MWAIT_SUBSTATE_MASK 0xf #define MWAIT_CSTATE_MASK 0xf @@ -38,6 +39,8 @@ static inline void __monitorx(const void
static inline void __mwait(unsigned long eax, unsigned long ecx) { + mds_idle_clear_cpu_buffers(); + /* "mwait %eax, %ecx;" */ asm volatile(".byte 0x0f, 0x01, 0xc9;" :: "a" (eax), "c" (ecx)); @@ -72,6 +75,8 @@ static inline void __mwait(unsigned long static inline void __mwaitx(unsigned long eax, unsigned long ebx, unsigned long ecx) { + /* No MDS buffer clear as this is AMD/HYGON only */ + /* "mwaitx %eax, %ebx, %ecx;" */ asm volatile(".byte 0x0f, 0x01, 0xfb;" :: "a" (eax), "b" (ebx), "c" (ecx)); @@ -79,6 +84,8 @@ static inline void __mwaitx(unsigned lon
static inline void __sti_mwait(unsigned long eax, unsigned long ecx) { + mds_idle_clear_cpu_buffers(); + trace_hardirqs_on(); /* "mwait %eax, %ecx;" */ asm volatile("sti; .byte 0x0f, 0x01, 0xc9;" --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -263,6 +263,7 @@ DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
DECLARE_STATIC_KEY_FALSE(mds_user_clear); +DECLARE_STATIC_KEY_FALSE(mds_idle_clear);
#include <asm/segment.h>
@@ -300,6 +301,17 @@ static inline void mds_user_clear_cpu_bu mds_clear_cpu_buffers(); }
+/** + * mds_idle_clear_cpu_buffers - Mitigation for MDS vulnerability + * + * Clear CPU buffers if the corresponding static key is enabled + */ +static inline void mds_idle_clear_cpu_buffers(void) +{ + if (static_branch_likely(&mds_idle_clear)) + mds_clear_cpu_buffers(); +} + #endif /* __ASSEMBLY__ */
/* --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -60,6 +60,9 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_always
/* Control MDS CPU buffer clear before returning to user space */ DEFINE_STATIC_KEY_FALSE(mds_user_clear); +/* Control MDS CPU buffer clear before idling (halt, mwait) */ +DEFINE_STATIC_KEY_FALSE(mds_idle_clear); +EXPORT_SYMBOL_GPL(mds_idle_clear);
void __init check_bugs(void) {
From: Thomas Gleixner tglx@linutronix.de
commit bc1241700acd82ec69fde98c5763ce51086269f8 upstream.
Now that the mitigations are in place, add a command line parameter to control the mitigation, a mitigation selector function and an SMT update mechanism.
This is the minimal, straightforward initial implementation which just provides an always on/off mode. The command line parameter is:
mds=[full|off]
This is consistent with the existing mitigations for other speculative hardware vulnerabilities.
The idle invocation is dynamically updated according to the SMT state of the system similar to the dynamic update of the STIBP mitigation. The idle mitigation is limited to CPUs which are only affected by MSBDS and not any other variant, because the other variants cannot be mitigated on SMT enabled systems.
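The parsing rules of mds=[full|off] can be checked with a user-space model of mds_cmdline(). This is a sketch: the model_* names are hypothetical, and a bool replaces boot_cpu_has_bug(X86_BUG_MDS). As in the patch, unaffected CPUs ignore the option, a missing value yields -EINVAL, and an unrecognised value silently keeps the default of full.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

enum model_mds_mitigation { MODEL_MDS_OFF, MODEL_MDS_FULL };

/* Default is full, matching "Not specifying this option is equivalent
 * to mds=full". */
static enum model_mds_mitigation model_mds_mitigation = MODEL_MDS_FULL;

/* Models mds_cmdline(); unknown strings leave the current mode untouched. */
static int model_mds_cmdline(const char *str, bool cpu_has_mds_bug)
{
	if (!cpu_has_mds_bug)
		return 0;
	if (!str)
		return -EINVAL;
	if (!strcmp(str, "off"))
		model_mds_mitigation = MODEL_MDS_OFF;
	else if (!strcmp(str, "full"))
		model_mds_mitigation = MODEL_MDS_FULL;
	return 0;
}
```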
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Borislav Petkov bp@suse.de Reviewed-by: Jon Masters jcm@redhat.com Tested-by: Jon Masters jcm@redhat.com [bwh: Backported to 4.4: - Drop " __ro_after_init" - Adjust filename, context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/kernel-parameters.txt | 22 +++++++++++ arch/x86/include/asm/processor.h | 6 +++ arch/x86/kernel/cpu/bugs.c | 70 ++++++++++++++++++++++++++++++++++++ 3 files changed, 98 insertions(+)
--- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2035,6 +2035,28 @@ bytes respectively. Such letter suffixes Format: <first>,<last> Specifies range of consoles to be captured by the MDA.
+ mds= [X86,INTEL] + Control mitigation for the Micro-architectural Data + Sampling (MDS) vulnerability. + + Certain CPUs are vulnerable to an exploit against CPU + internal buffers which can forward information to a + disclosure gadget under certain conditions. + + In vulnerable processors, the speculatively + forwarded data can be used in a cache side channel + attack, to access data to which the attacker does + not have direct access. + + This parameter controls the MDS mitigation. The + options are: + + full - Enable MDS mitigation on vulnerable CPUs + off - Unconditionally disable MDS mitigation + + Not specifying this option is equivalent to + mds=full. + mem=nn[KMG] [KNL,BOOT] Force usage of a specific amount of memory Amount of memory to be used when the kernel is not able to see the whole system memory or for test. --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -845,4 +845,10 @@ bool xen_set_default_idle(void);
void stop_this_cpu(void *dummy); void df_debug(struct pt_regs *regs, long error_code); + +enum mds_mitigations { + MDS_MITIGATION_OFF, + MDS_MITIGATION_FULL, +}; + #endif /* _ASM_X86_PROCESSOR_H */ --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -32,6 +32,7 @@ static void __init spectre_v2_select_mitigation(void); static void __init ssb_select_mitigation(void); static void __init l1tf_select_mitigation(void); +static void __init mds_select_mitigation(void);
/* The base value of the SPEC_CTRL MSR that always has to be preserved. */ u64 x86_spec_ctrl_base; @@ -96,6 +97,8 @@ void __init check_bugs(void)
l1tf_select_mitigation();
+ mds_select_mitigation(); + #ifdef CONFIG_X86_32 /* * Check whether we are able to run this kernel safely on SMP. @@ -202,6 +205,50 @@ static void x86_amd_ssb_disable(void) }
#undef pr_fmt +#define pr_fmt(fmt) "MDS: " fmt + +/* Default mitigation for MDS-affected CPUs */ +static enum mds_mitigations mds_mitigation = MDS_MITIGATION_FULL; + +static const char * const mds_strings[] = { + [MDS_MITIGATION_OFF] = "Vulnerable", + [MDS_MITIGATION_FULL] = "Mitigation: Clear CPU buffers" +}; + +static void __init mds_select_mitigation(void) +{ + if (!boot_cpu_has_bug(X86_BUG_MDS)) { + mds_mitigation = MDS_MITIGATION_OFF; + return; + } + + if (mds_mitigation == MDS_MITIGATION_FULL) { + if (boot_cpu_has(X86_FEATURE_MD_CLEAR)) + static_branch_enable(&mds_user_clear); + else + mds_mitigation = MDS_MITIGATION_OFF; + } + pr_info("%s\n", mds_strings[mds_mitigation]); +} + +static int __init mds_cmdline(char *str) +{ + if (!boot_cpu_has_bug(X86_BUG_MDS)) + return 0; + + if (!str) + return -EINVAL; + + if (!strcmp(str, "off")) + mds_mitigation = MDS_MITIGATION_OFF; + else if (!strcmp(str, "full")) + mds_mitigation = MDS_MITIGATION_FULL; + + return 0; +} +early_param("mds", mds_cmdline); + +#undef pr_fmt #define pr_fmt(fmt) "Spectre V2 : " fmt
static enum spectre_v2_mitigation spectre_v2_enabled = SPECTRE_V2_NONE; @@ -599,6 +646,26 @@ static void update_indir_branch_cond(voi static_branch_disable(&switch_to_cond_stibp); }
+/* Update the static key controlling the MDS CPU buffer clear in idle */ +static void update_mds_branch_idle(void) +{ + /* + * Enable the idle clearing if SMT is active on CPUs which are + * affected only by MSBDS and not any other MDS variant. + * + * The other variants cannot be mitigated when SMT is enabled, so + * clearing the buffers on idle just to prevent the Store Buffer + * repartitioning leak would be a window dressing exercise. + */ + if (!boot_cpu_has_bug(X86_BUG_MSBDS_ONLY)) + return; + + if (sched_smt_active()) + static_branch_enable(&mds_idle_clear); + else + static_branch_disable(&mds_idle_clear); +} + void arch_smt_update(void) { /* Enhanced IBRS implies STIBP. No update required. */ @@ -619,6 +686,9 @@ void arch_smt_update(void) break; }
+ if (mds_mitigation == MDS_MITIGATION_FULL) + update_mds_branch_idle(); + mutex_unlock(&spec_ctrl_mutex); }
From: Ben Hutchings ben@decadent.org.uk
The vulnerabilities/l1tf attribute was added by commit 17dbca119312 "x86/speculation/l1tf: Add sysfs reporting for l1tf", which has already been backported to 3.16, but only documented in commit d90a7a0ec83f "x86/bugs, kvm: Introduce boot-time control of L1TF mitigations", which has not been and probably won't be.
Add just that line of documentation for now.
Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/ABI/testing/sysfs-devices-system-cpu | 1 + 1 file changed, 1 insertion(+)
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -277,6 +277,7 @@ What: /sys/devices/system/cpu/vulnerabi /sys/devices/system/cpu/vulnerabilities/spectre_v1 /sys/devices/system/cpu/vulnerabilities/spectre_v2 /sys/devices/system/cpu/vulnerabilities/spec_store_bypass + /sys/devices/system/cpu/vulnerabilities/l1tf Date: January 2018 Contact: Linux kernel mailing list linux-kernel@vger.kernel.org Description: Information about CPU vulnerabilities
From: Thomas Gleixner tglx@linutronix.de
commit 8a4b06d391b0a42a373808979b5028f5c84d9c6a upstream.
Add the sysfs reporting file for MDS. It exposes the vulnerability and mitigation state similar to the existing files for the other speculative hardware vulnerabilities.
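The strings produced by mds_show_state() can be reproduced with a portable model. This sketch covers the two-mode version of this patch. The model_* names are hypothetical, and booleans replace x86_hyper, X86_BUG_MSBDS_ONLY and sched_smt_active(). In the kernel this function is only reached for CPUs with X86_BUG_MDS. Others get "Not affected" from cpu_show_common(). As a design choice, the model uses bounded snprintf() where the kernel writes into a page-sized buffer with sprintf().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum model_mds_mitigation { MODEL_MDS_OFF, MODEL_MDS_FULL, MODEL_MDS_COUNT };

static const char *const model_mds_strings[] = {
	[MODEL_MDS_OFF]  = "Vulnerable",
	[MODEL_MDS_FULL] = "Mitigation: Clear CPU buffers",
};

/* Models mds_show_state() for an affected CPU. */
static int model_mds_show_state(char *buf, size_t len,
				enum model_mds_mitigation m, bool in_guest,
				bool msbds_only, bool smt_active)
{
	assert(m < MODEL_MDS_COUNT);	/* index must be a valid mode */

	/* A guest cannot see the host's SMT state. */
	if (in_guest)
		return snprintf(buf, len, "%s; SMT Host state unknown\n",
				model_mds_strings[m]);

	/* MSBDS-only CPUs get the idle clear, so SMT counts as mitigated. */
	if (msbds_only)
		return snprintf(buf, len, "%s; SMT %s\n", model_mds_strings[m],
				smt_active ? "mitigated" : "disabled");

	return snprintf(buf, len, "%s; SMT %s\n", model_mds_strings[m],
			smt_active ? "vulnerable" : "disabled");
}
```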
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Borislav Petkov bp@suse.de Reviewed-by: Jon Masters jcm@redhat.com Tested-by: Jon Masters jcm@redhat.com [bwh: Backported to 4.4: - Test x86_hyper instead of using hypervisor_is_type() - Adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/ABI/testing/sysfs-devices-system-cpu | 1 arch/x86/kernel/cpu/bugs.c | 27 +++++++++++++++++++++ drivers/base/cpu.c | 8 ++++++ include/linux/cpu.h | 2 + 4 files changed, 38 insertions(+)
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -278,6 +278,7 @@ What: /sys/devices/system/cpu/vulnerabi /sys/devices/system/cpu/vulnerabilities/spectre_v2 /sys/devices/system/cpu/vulnerabilities/spec_store_bypass /sys/devices/system/cpu/vulnerabilities/l1tf + /sys/devices/system/cpu/vulnerabilities/mds Date: January 2018 Contact: Linux kernel mailing list linux-kernel@vger.kernel.org Description: Information about CPU vulnerabilities --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -24,6 +24,7 @@ #include <asm/msr.h> #include <asm/paravirt.h> #include <asm/alternative.h> +#include <asm/hypervisor.h> #include <asm/pgtable.h> #include <asm/cacheflush.h> #include <asm/intel-family.h> @@ -1066,6 +1067,24 @@ static void __init l1tf_select_mitigatio
#ifdef CONFIG_SYSFS
+static ssize_t mds_show_state(char *buf) +{ +#ifdef CONFIG_HYPERVISOR_GUEST + if (x86_hyper) { + return sprintf(buf, "%s; SMT Host state unknown\n", + mds_strings[mds_mitigation]); + } +#endif + + if (boot_cpu_has(X86_BUG_MSBDS_ONLY)) { + return sprintf(buf, "%s; SMT %s\n", mds_strings[mds_mitigation], + sched_smt_active() ? "mitigated" : "disabled"); + } + + return sprintf(buf, "%s; SMT %s\n", mds_strings[mds_mitigation], + sched_smt_active() ? "vulnerable" : "disabled"); +} + static char *stibp_state(void) { if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) @@ -1128,6 +1147,9 @@ static ssize_t cpu_show_common(struct de return sprintf(buf, "Mitigation: Page Table Inversion\n"); break;
+ case X86_BUG_MDS: + return mds_show_state(buf); + default: break; } @@ -1159,4 +1181,9 @@ ssize_t cpu_show_l1tf(struct device *dev { return cpu_show_common(dev, attr, buf, X86_BUG_L1TF); } + +ssize_t cpu_show_mds(struct device *dev, struct device_attribute *attr, char *buf) +{ + return cpu_show_common(dev, attr, buf, X86_BUG_MDS); +} #endif --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -530,11 +530,18 @@ ssize_t __weak cpu_show_l1tf(struct devi return sprintf(buf, "Not affected\n"); }
+ssize_t __weak cpu_show_mds(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "Not affected\n"); +} + static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL); static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL); static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL); static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL); static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL); +static DEVICE_ATTR(mds, 0444, cpu_show_mds, NULL);
static struct attribute *cpu_root_vulnerabilities_attrs[] = { &dev_attr_meltdown.attr, @@ -542,6 +549,7 @@ static struct attribute *cpu_root_vulner &dev_attr_spectre_v2.attr, &dev_attr_spec_store_bypass.attr, &dev_attr_l1tf.attr, + &dev_attr_mds.attr, NULL };
--- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -50,6 +50,8 @@ extern ssize_t cpu_show_spec_store_bypas struct device_attribute *attr, char *buf); extern ssize_t cpu_show_l1tf(struct device *dev, struct device_attribute *attr, char *buf); +extern ssize_t cpu_show_mds(struct device *dev, + struct device_attribute *attr, char *buf);
extern __printf(4, 5) struct device *cpu_device_create(struct device *parent, void *drvdata,
From: Thomas Gleixner tglx@linutronix.de
commit 22dd8365088b6403630b82423cf906491859b65e upstream.
In virtualized environments it can happen that the host has the microcode update which utilizes the VERW instruction to clear CPU buffers, but the hypervisor is not yet updated to expose the X86_FEATURE_MD_CLEAR CPUID bit to guests.
Introduce an internal mitigation mode VMWERV which enables the invocation of the CPU buffer clearing even if X86_FEATURE_MD_CLEAR is not set. If the system has no updated microcode, this results in a pointless execution of the VERW instruction, wasting a few CPU cycles. If the microcode is updated but not exposed to a guest, then the CPU buffers will be cleared.
That said: Virtual Machines Will Eventually Receive Vaccine
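The selection logic after this change can be summarised in a small model of mds_select_mitigation(). This is a sketch under stated assumptions. The struct and model_* names are hypothetical, and booleans replace the X86_BUG_MDS and X86_FEATURE_MD_CLEAR checks. The user-space clear is enabled for both FULL and VMWERV. Only the reported mode differs when the MD_CLEAR CPUID bit is missing.

```c
#include <assert.h>
#include <stdbool.h>

enum model_mds_mitigation { MODEL_MDS_OFF, MODEL_MDS_FULL, MODEL_MDS_VMWERV };

/* Hypothetical result type: chosen mode plus the mds_user_clear key state. */
struct model_mds_result {
	enum model_mds_mitigation mode;
	bool user_clear;
};

/* Models mds_select_mitigation() with the VMWERV fallback. */
static struct model_mds_result model_mds_select(enum model_mds_mitigation requested,
						bool has_mds_bug, bool has_md_clear)
{
	struct model_mds_result r = { requested, false };

	if (!has_mds_bug) {
		r.mode = MODEL_MDS_OFF;
		return r;
	}
	if (r.mode == MODEL_MDS_FULL) {
		/* No CPUID bit: still issue VERW, but report best effort. */
		if (!has_md_clear)
			r.mode = MODEL_MDS_VMWERV;
		r.user_clear = true;
	}
	return r;
}
```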
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Borislav Petkov bp@suse.de Reviewed-by: Jon Masters jcm@redhat.com Tested-by: Jon Masters jcm@redhat.com Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/x86/mds.rst | 27 +++++++++++++++++++++++++++ arch/x86/include/asm/processor.h | 1 + arch/x86/kernel/cpu/bugs.c | 18 ++++++++++++------ 3 files changed, 40 insertions(+), 6 deletions(-)
--- a/Documentation/x86/mds.rst +++ b/Documentation/x86/mds.rst @@ -93,11 +93,38 @@ The kernel provides a function to invoke The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state (idle) transitions.
+As a special quirk to address virtualization scenarios where the host has +the microcode updated, but the hypervisor does not (yet) expose the +MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the +hope that it might actually clear the buffers. The state is reflected +accordingly. + According to current knowledge additional mitigations inside the kernel itself are not required because the necessary gadgets to expose the leaked data cannot be controlled in a way which allows exploitation from malicious user space or VM guests.
+Kernel internal mitigation modes +-------------------------------- + + ======= ============================================================ + off Mitigation is disabled. Either the CPU is not affected or + mds=off is supplied on the kernel command line + + full Mitigation is enabled. CPU is affected and MD_CLEAR is + advertised in CPUID. + + vmwerv Mitigation is enabled. CPU is affected and MD_CLEAR is not + advertised in CPUID. That is mainly for virtualization + scenarios where the host has the updated microcode but the + hypervisor does not expose MD_CLEAR in CPUID. It's a best + effort approach without guarantee. + ======= ============================================================ + +If the CPU is affected and mds=off is not supplied on the kernel command +line then the kernel selects the appropriate mitigation mode depending on +the availability of the MD_CLEAR CPUID bit. + Mitigation points -----------------
--- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -849,6 +849,7 @@ void df_debug(struct pt_regs *regs, long enum mds_mitigations { MDS_MITIGATION_OFF, MDS_MITIGATION_FULL, + MDS_MITIGATION_VMWERV, };
#endif /* _ASM_X86_PROCESSOR_H */ --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -213,7 +213,8 @@ static enum mds_mitigations mds_mitigati
static const char * const mds_strings[] = { [MDS_MITIGATION_OFF] = "Vulnerable", - [MDS_MITIGATION_FULL] = "Mitigation: Clear CPU buffers" + [MDS_MITIGATION_FULL] = "Mitigation: Clear CPU buffers", + [MDS_MITIGATION_VMWERV] = "Vulnerable: Clear CPU buffers attempted, no microcode", };
static void __init mds_select_mitigation(void) @@ -224,10 +225,9 @@ static void __init mds_select_mitigation }
if (mds_mitigation == MDS_MITIGATION_FULL) { - if (boot_cpu_has(X86_FEATURE_MD_CLEAR)) - static_branch_enable(&mds_user_clear); - else - mds_mitigation = MDS_MITIGATION_OFF; + if (!boot_cpu_has(X86_FEATURE_MD_CLEAR)) + mds_mitigation = MDS_MITIGATION_VMWERV; + static_branch_enable(&mds_user_clear); } pr_info("%s\n", mds_strings[mds_mitigation]); } @@ -687,8 +687,14 @@ void arch_smt_update(void) break; }
- if (mds_mitigation == MDS_MITIGATION_FULL) + switch (mds_mitigation) { + case MDS_MITIGATION_FULL: + case MDS_MITIGATION_VMWERV: update_mds_branch_idle(); + break; + case MDS_MITIGATION_OFF: + break; + }
mutex_unlock(&spec_ctrl_mutex); }
From: Thomas Gleixner tglx@linutronix.de
commit 65fd4cb65b2dad97feb8330b6690445910b56d6a upstream.
Move L1TF to a separate directory so the MDS documentation can be added alongside it. Otherwise all the hardware vulnerabilities would each have their own top-level entry. Should have done that right away.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Jon Masters jcm@redhat.com [bwh: Backported to 4.4: we never added the documentation, so just update the log message] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -1063,7 +1063,7 @@ static void __init l1tf_select_mitigatio pr_info("You may make it effective by booting the kernel with mem=%llu parameter.\n", half_pa); pr_info("However, doing so will make a part of your RAM unusable.\n"); - pr_info("Reading https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html might help you decide.\n"); + pr_info("Reading https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html might help you decide.\n"); return; }
From: Thomas Gleixner tglx@linutronix.de
commit 5999bbe7a6ea3c62029532ec84dc06003a1fa258 upstream.
Add the initial MDS vulnerability documentation.
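The documentation describes the strings reported in /sys/devices/system/cpu/vulnerabilities/mds. A user-space consumer might read and classify that file as in the sketch below. The helper names and the mds_state enum are hypothetical, not an existing library API. The prefixes match the mds_strings[] entries in bugs.c. The best-effort check must come before the plain "Vulnerable" check, because both strings start with "Vulnerable".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of the mds sysfs value. */
enum mds_state {
	MDS_STATE_UNKNOWN,
	MDS_STATE_NOT_AFFECTED,
	MDS_STATE_VULNERABLE,
	MDS_STATE_BEST_EFFORT,
	MDS_STATE_MITIGATED,
};

static int starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Reads the first line of a vulnerability file into buf, newline
 * stripped. Returns 0 on success, -1 on any error. */
static int read_vuln_file(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(len > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Order matters: the best-effort string also starts with "Vulnerable". */
static enum mds_state classify_mds(const char *s)
{
	if (starts_with(s, "Not affected"))
		return MDS_STATE_NOT_AFFECTED;
	if (starts_with(s, "Vulnerable: Clear CPU buffers attempted"))
		return MDS_STATE_BEST_EFFORT;
	if (starts_with(s, "Vulnerable"))
		return MDS_STATE_VULNERABLE;
	if (starts_with(s, "Mitigation:"))
		return MDS_STATE_MITIGATED;
	return MDS_STATE_UNKNOWN;
}
```

On a running system one would call read_vuln_file("/sys/devices/system/cpu/vulnerabilities/mds", buf, sizeof(buf)) and pass buf to classify_mds(). The "; SMT ..." suffix is ignored by the prefix match.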
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Jon Masters jcm@redhat.com [bwh: Backported to 4.4: - Drop the index updates - Adjust filename] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/hw-vuln/mds.rst | 307 ++++++++++++++++++++++++++++++++++++ Documentation/kernel-parameters.txt | 2 2 files changed, 309 insertions(+) create mode 100644 Documentation/hw-vuln/mds.rst
--- /dev/null +++ b/Documentation/hw-vuln/mds.rst @@ -0,0 +1,307 @@ +MDS - Microarchitectural Data Sampling +====================================== + +Microarchitectural Data Sampling is a hardware vulnerability which allows +unprivileged speculative access to data which is available in various CPU +internal buffers. + +Affected processors +------------------- + +This vulnerability affects a wide range of Intel processors. The +vulnerability is not present on: + + - Processors from AMD, Centaur and other non Intel vendors + + - Older processor models, where the CPU family is < 6 + + - Some Atoms (Bonnell, Saltwell, Goldmont, GoldmontPlus) + + - Intel processors which have the ARCH_CAP_MDS_NO bit set in the + IA32_ARCH_CAPABILITIES MSR. + +Whether a processor is affected or not can be read out from the MDS +vulnerability file in sysfs. See :ref:`mds_sys_info`. + +Not all processors are affected by all variants of MDS, but the mitigation +is identical for all of them so the kernel treats them as a single +vulnerability. + +Related CVEs +------------ + +The following CVE entries are related to the MDS vulnerability: + + ============== ===== ============================================== + CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling + CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling + CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling + ============== ===== ============================================== + +Problem +------- + +When performing store, load, L1 refill operations, processors write data +into temporary microarchitectural structures (buffers). The data in the +buffer can be forwarded to load operations as an optimization. + +Under certain conditions, usually a fault/assist caused by a load +operation, data unrelated to the load memory address can be speculatively +forwarded from the buffers. 
Because the load operation causes a fault or +assist and its result will be discarded, the forwarded data will not cause +incorrect program execution or state changes. But a malicious operation +may be able to forward this speculative data to a disclosure gadget which +allows in turn to infer the value via a cache side channel attack. + +Because the buffers are potentially shared between Hyper-Threads cross +Hyper-Thread attacks are possible. + +Deeper technical information is available in the MDS specific x86 +architecture section: :ref:`Documentation/x86/mds.rst <mds>`. + + +Attack scenarios +---------------- + +Attacks against the MDS vulnerabilities can be mounted from malicious non +priviledged user space applications running on hosts or guest. Malicious +guest OSes can obviously mount attacks as well. + +Contrary to other speculation based vulnerabilities the MDS vulnerability +does not allow the attacker to control the memory target address. As a +consequence the attacks are purely sampling based, but as demonstrated with +the TLBleed attack samples can be postprocessed successfully. + +Web-Browsers +^^^^^^^^^^^^ + + It's unclear whether attacks through Web-Browsers are possible at + all. The exploitation through Java-Script is considered very unlikely, + but other widely used web technologies like Webassembly could possibly be + abused. + + +.. _mds_sys_info: + +MDS system information +----------------------- + +The Linux kernel provides a sysfs interface to enumerate the current MDS +status of the system: whether the system is vulnerable, and which +mitigations are active. 
The relevant sysfs file is: + +/sys/devices/system/cpu/vulnerabilities/mds + +The possible values in this file are: + + ========================================= ================================= + 'Not affected' The processor is not vulnerable + + 'Vulnerable' The processor is vulnerable, + but no mitigation enabled + + 'Vulnerable: Clear CPU buffers attempted' The processor is vulnerable but + microcode is not updated. + The mitigation is enabled on a + best effort basis. + See :ref:`vmwerv` + + 'Mitigation: CPU buffer clear' The processor is vulnerable and the + CPU buffer clearing mitigation is + enabled. + ========================================= ================================= + +If the processor is vulnerable then the following information is appended +to the above information: + + ======================== ============================================ + 'SMT vulnerable' SMT is enabled + 'SMT mitigated' SMT is enabled and mitigated + 'SMT disabled' SMT is disabled + 'SMT Host state unknown' Kernel runs in a VM, Host SMT state unknown + ======================== ============================================ + +.. _vmwerv: + +Best effort mitigation mode +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + If the processor is vulnerable, but the availability of the microcode based + mitigation mechanism is not advertised via CPUID the kernel selects a best + effort mitigation mode. This mode invokes the mitigation instructions + without a guarantee that they clear the CPU buffers. + + This is done to address virtualization scenarios where the host has the + microcode update applied, but the hypervisor is not yet updated to expose + the CPUID to the guest. If the host has updated microcode the protection + takes effect otherwise a few cpu cycles are wasted pointlessly. + + The state in the mds sysfs file reflects this situation accordingly. 
+ + +Mitigation mechanism +------------------------- + +The kernel detects the affected CPUs and the presence of the microcode +which is required. + +If a CPU is affected and the microcode is available, then the kernel +enables the mitigation by default. The mitigation can be controlled at boot +time via a kernel command line option. See +:ref:`mds_mitigation_control_command_line`. + +.. _cpu_buffer_clear: + +CPU buffer clearing +^^^^^^^^^^^^^^^^^^^ + + The mitigation for MDS clears the affected CPU buffers on return to user + space and when entering a guest. + + If SMT is enabled it also clears the buffers on idle entry when the CPU + is only affected by MSBDS and not any other MDS variant, because the + other variants cannot be protected against cross Hyper-Thread attacks. + + For CPUs which are only affected by MSBDS the user space, guest and idle + transition mitigations are sufficient and SMT is not affected. + +.. _virt_mechanism: + +Virtualization mitigation +^^^^^^^^^^^^^^^^^^^^^^^^^ + + The protection for host to guest transition depends on the L1TF + vulnerability of the CPU: + + - CPU is affected by L1TF: + + If the L1D flush mitigation is enabled and up to date microcode is + available, the L1D flush mitigation is automatically protecting the + guest transition. + + If the L1D flush mitigation is disabled then the MDS mitigation is + invoked explicit when the host MDS mitigation is enabled. + + For details on L1TF and virtualization see: + :ref:`Documentation/hw-vuln//l1tf.rst <mitigation_control_kvm>`. + + - CPU is not affected by L1TF: + + CPU buffers are flushed before entering the guest when the host MDS + mitigation is enabled. 
+ + The resulting MDS protection matrix for the host to guest transition: + + ============ ===== ============= ============ ================= + L1TF MDS VMX-L1FLUSH Host MDS MDS-State + + Don't care No Don't care N/A Not affected + + Yes Yes Disabled Off Vulnerable + + Yes Yes Disabled Full Mitigated + + Yes Yes Enabled Don't care Mitigated + + No Yes N/A Off Vulnerable + + No Yes N/A Full Mitigated + ============ ===== ============= ============ ================= + + This only covers the host to guest transition, i.e. prevents leakage from + host to guest, but does not protect the guest internally. Guests need to + have their own protections. + +.. _xeon_phi: + +XEON PHI specific considerations +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The XEON PHI processor family is affected by MSBDS which can be exploited + cross Hyper-Threads when entering idle states. Some XEON PHI variants allow + to use MWAIT in user space (Ring 3) which opens an potential attack vector + for malicious user space. The exposure can be disabled on the kernel + command line with the 'ring3mwait=disable' command line option. + + XEON PHI is not affected by the other MDS variants and MSBDS is mitigated + before the CPU enters a idle state. As XEON PHI is not affected by L1TF + either disabling SMT is not required for full protection. + +.. _mds_smt_control: + +SMT control +^^^^^^^^^^^ + + All MDS variants except MSBDS can be attacked cross Hyper-Threads. That + means on CPUs which are affected by MFBDS or MLPDS it is necessary to + disable SMT for full protection. These are most of the affected CPUs; the + exception is XEON PHI, see :ref:`xeon_phi`. + + Disabling SMT can have a significant performance impact, but the impact + depends on the type of workloads. + + See the relevant chapter in the L1TF mitigation documentation for details: + :ref:`Documentation/hw-vuln/l1tf.rst <smt_control>`. + + +.. 
_mds_mitigation_control_command_line: + +Mitigation control on the kernel command line +--------------------------------------------- + +The kernel command line allows to control the MDS mitigations at boot +time with the option "mds=". The valid arguments for this option are: + + ============ ============================================================= + full If the CPU is vulnerable, enable all available mitigations + for the MDS vulnerability, CPU buffer clearing on exit to + userspace and when entering a VM. Idle transitions are + protected as well if SMT is enabled. + + It does not automatically disable SMT. + + off Disables MDS mitigations completely. + + ============ ============================================================= + +Not specifying this option is equivalent to "mds=full". + + +Mitigation selection guide +-------------------------- + +1. Trusted userspace +^^^^^^^^^^^^^^^^^^^^ + + If all userspace applications are from a trusted source and do not + execute untrusted code which is supplied externally, then the mitigation + can be disabled. + + +2. Virtualization with trusted guests +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The same considerations as above versus trusted user space apply. + +3. Virtualization with untrusted guests +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The protection depends on the state of the L1TF mitigations. + See :ref:`virt_mechanism`. + + If the MDS mitigation is enabled and SMT is disabled, guest to host and + guest to guest attacks are prevented. + +.. _mds_default_mitigations: + +Default mitigations +------------------- + + The kernel default mitigations for vulnerable processors are: + + - Enable CPU buffer clearing + + The kernel does not by default enforce the disabling of SMT, which leaves + SMT systems vulnerable when running untrusted code. The same rationale as + for L1TF applies. + See :ref:`Documentation/hw-vuln//l1tf.rst <default_mitigations>`. 
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2057,6 +2057,8 @@ bytes respectively. Such letter suffixes
 			Not specifying this option is equivalent to
 			mds=full.
 
+			For details see: Documentation/hw-vuln/mds.rst
+
 	mem=nn[KMG]	[KNL,BOOT] Force usage of a specific amount of memory
 			Amount of memory to be used when the kernel is not able
 			to see the whole system memory or for test.
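The mds.rst text above documents the sysfs file /sys/devices/system/cpu/vulnerabilities/mds and its "<status>; SMT <state>" layout. The following is a minimal userspace sketch of reading and splitting that value. The helper names (mds_parse, mds_read, struct mds_status) are hypothetical. The parser relies only on the documented "; SMT " separator and does not hard-code the status strings, because the table of possible values is adjusted later in this series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Path documented in the MDS admin guide. */
#define MDS_SYSFS_PATH "/sys/devices/system/cpu/vulnerabilities/mds"

/* Hypothetical container: status text plus the optional SMT suffix. */
struct mds_status {
	char status[128];
	char smt[64];		/* empty when the line has no "; SMT " part */
};

/* Split "<status>; SMT <state>[\n]" into its two parts. */
void mds_parse(const char *line, struct mds_status *out)
{
	size_t len;
	const char *sep;

	assert(line != NULL && out != NULL);
	len = strcspn(line, "\n");	/* ignore the trailing newline */
	sep = strstr(line, "; SMT ");
	out->smt[0] = '\0';

	if (sep && (size_t)(sep - line) < len) {
		size_t head = (size_t)(sep - line);
		size_t tail = head + strlen("; SMT ");

		snprintf(out->status, sizeof(out->status), "%.*s", (int)head, line);
		snprintf(out->smt, sizeof(out->smt), "%.*s",
			 (int)(len - tail), line + tail);
	} else {
		snprintf(out->status, sizeof(out->status), "%.*s", (int)len, line);
	}
}

/* Read the live value; returns -1 if the file is absent (e.g. older kernels). */
int mds_read(struct mds_status *out)
{
	char line[256];
	FILE *f = fopen(MDS_SYSFS_PATH, "r");

	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	mds_parse(line, out);
	return 0;
}
```

Keeping the parser separate from the file access lets it be checked against the documented example strings without needing an affected CPU.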
From: Andi Kleen ak@linux.intel.com
commit 1de7edbb59c8f1b46071f66c5c97b8a59569eb51 upstream.
Some of the recently added const tables use __initdata which causes section attribute conflicts.
Use __initconst instead.
Fixes: fa1202ef2243 ("x86/speculation: Add command line control") Signed-off-by: Andi Kleen ak@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/20190330004743.29541-9-andi@firstfloor.org Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -315,7 +315,7 @@ static const struct {
 	const char *option;
 	enum spectre_v2_user_cmd cmd;
 	bool secure;
-} v2_user_options[] __initdata = {
+} v2_user_options[] __initconst = {
 	{ "auto", SPECTRE_V2_USER_CMD_AUTO, false },
 	{ "off", SPECTRE_V2_USER_CMD_NONE, false },
 	{ "on", SPECTRE_V2_USER_CMD_FORCE, true },
@@ -451,7 +451,7 @@ static const struct {
 	const char *option;
 	enum spectre_v2_mitigation_cmd cmd;
 	bool secure;
-} mitigation_options[] __initdata = {
+} mitigation_options[] __initconst = {
 	{ "off", SPECTRE_V2_CMD_NONE, false },
 	{ "on", SPECTRE_V2_CMD_FORCE, true },
 	{ "retpoline", SPECTRE_V2_CMD_RETPOLINE, false },
@@ -723,7 +723,7 @@ static const char * const ssb_strings[]
 static const struct {
 	const char *option;
 	enum ssb_mitigation_cmd cmd;
-} ssb_mitigation_options[] __initdata = {
+} ssb_mitigation_options[] __initconst = {
 	{ "auto", SPEC_STORE_BYPASS_CMD_AUTO }, /* Platform decides */
 	{ "on", SPEC_STORE_BYPASS_CMD_ON }, /* Disable Speculative Store Bypass */
 	{ "off", SPEC_STORE_BYPASS_CMD_NONE }, /* Don't touch Speculative Store Bypass */
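These option tables are const lookup tables that are only consulted during boot. Roughly, __initdata places an object in the writable .init.data section. Putting a const table there can make GCC report a section type conflict with the non-const objects in that section. __initconst uses the read-only init section instead. The userspace sketch below shows how such a table is typically searched. The annotation is dropped because userspace has no init sections, only the three SSB entries visible in the hunk are modelled, and ssb_parse_option() is a hypothetical helper, not the kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel enum (only the values in the hunk). */
enum ssb_mitigation_cmd {
	SPEC_STORE_BYPASS_CMD_NONE,
	SPEC_STORE_BYPASS_CMD_AUTO,
	SPEC_STORE_BYPASS_CMD_ON,
};

/* Read-only table; in bugs.c this carries __initconst. */
static const struct {
	const char *option;
	enum ssb_mitigation_cmd cmd;
} ssb_mitigation_options[] = {
	{ "auto", SPEC_STORE_BYPASS_CMD_AUTO },	/* Platform decides */
	{ "on",   SPEC_STORE_BYPASS_CMD_ON },	/* Disable Speculative Store Bypass */
	{ "off",  SPEC_STORE_BYPASS_CMD_NONE },	/* Don't touch Speculative Store Bypass */
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Linear search of the table; unknown strings fall back to @dflt. */
enum ssb_mitigation_cmd ssb_parse_option(const char *arg,
					 enum ssb_mitigation_cmd dflt)
{
	size_t i;

	assert(arg != NULL);
	for (i = 0; i < ARRAY_SIZE(ssb_mitigation_options); i++)
		if (!strcmp(arg, ssb_mitigation_options[i].option))
			return ssb_mitigation_options[i].cmd;
	return dflt;
}
```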
From: Josh Poimboeuf jpoimboe@redhat.com
commit 7c3658b20194a5b3209a143f63bc9c643c6a3ae2 upstream.
arch_smt_update() now has a dependency on both Spectre v2 and MDS mitigations. Move its initial call to after all the mitigation decisions have been made.
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Tyler Hicks tyhicks@canonical.com Acked-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -100,6 +100,8 @@ void __init check_bugs(void)
 
 	mds_select_mitigation();
 
+	arch_smt_update();
+
 #ifdef CONFIG_X86_32
 	/*
 	 * Check whether we are able to run this kernel safely on SMP.
@@ -611,9 +613,6 @@ specv2_set_mode:
 
 	/* Set up IBPB and STIBP depending on the general spectre V2 command */
 	spectre_v2_user_select_mitigation(cmd);
-
-	/* Enable STIBP if appropriate */
-	arch_smt_update();
 }
 
 static void update_stibp_msr(void * __unused)
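The ordering problem can be seen in a small model. Previously arch_smt_update() ran from the Spectre v2 selection code, which runs before mds_select_mitigation() in check_bugs(). At that point mds_mitigation still held its static default (MDS_MITIGATION_FULL), even if the command line later turned MDS off. The sketch below is a deliberately simplified, hypothetical model: it ignores STIBP, and it reduces arch_smt_update() and update_mds_branch_idle() to their MDS idle-clear decision for MSBDS-only CPUs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; names echo bugs.c but none of this is kernel code. */
enum mds_mitigations { MDS_MITIGATION_OFF, MDS_MITIGATION_FULL };

struct boot_model {
	bool cpu_has_mds;	/* X86_BUG_MDS */
	bool msbds_only;	/* X86_BUG_MSBDS_ONLY */
	bool smt_active;	/* sched_smt_active() */
	bool mds_off;		/* "mds=off" on the command line */
	enum mds_mitigations mds_mitigation;
	bool mds_idle_clear;	/* state of the mds_idle_clear static key */
};

/* Roughly mds_select_mitigation(): may downgrade the FULL default to OFF. */
static void model_mds_select(struct boot_model *m)
{
	if (!m->cpu_has_mds || m->mds_off)
		m->mds_mitigation = MDS_MITIGATION_OFF;
}

/* Roughly the MDS part of arch_smt_update() plus update_mds_branch_idle(). */
static void model_smt_update(struct boot_model *m)
{
	if (m->mds_mitigation == MDS_MITIGATION_OFF)
		return;
	if (m->msbds_only)
		m->mds_idle_clear = m->smt_active;
}

/* Boot from the static default, calling the two steps in either order. */
bool model_boot(struct boot_model m, bool smt_update_first)
{
	m.mds_mitigation = MDS_MITIGATION_FULL;
	m.mds_idle_clear = false;

	if (smt_update_first) {		/* old placement */
		model_smt_update(&m);
		model_mds_select(&m);
	} else {			/* new placement, after all selections */
		model_mds_select(&m);
		model_smt_update(&m);
	}
	return m.mds_idle_clear;
}
```

With mds=off on an MSBDS-only SMT system, the old order leaves idle clearing enabled even though the mitigation is off. The new order does not.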
From: Josh Poimboeuf jpoimboe@redhat.com
commit 39226ef02bfb43248b7db12a4fdccb39d95318e3 upstream.
MDS is vulnerable with SMT. Make that clear with a one-time printk whenever SMT first gets enabled.
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Tyler Hicks tyhicks@canonical.com Acked-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -646,6 +646,9 @@ static void update_indir_branch_cond(voi
 		static_branch_disable(&switch_to_cond_stibp);
 }
 
+#undef pr_fmt
+#define pr_fmt(fmt) fmt
+
 /* Update the static key controlling the MDS CPU buffer clear in idle */
 static void update_mds_branch_idle(void)
 {
@@ -666,6 +669,8 @@ static void update_mds_branch_idle(void)
 		static_branch_disable(&mds_idle_clear);
 }
 
+#define MDS_MSG_SMT "MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.\n"
+
 void arch_smt_update(void)
 {
 	/* Enhanced IBRS implies STIBP. No update required. */
@@ -689,6 +694,8 @@ void arch_smt_update(void)
 	switch (mds_mitigation) {
 	case MDS_MITIGATION_FULL:
 	case MDS_MITIGATION_VMWERV:
+		if (sched_smt_active() && !boot_cpu_has(X86_BUG_MSBDS_ONLY))
+			pr_warn_once(MDS_MSG_SMT);
 		update_mds_branch_idle();
 		break;
 	case MDS_MITIGATION_OFF:
@@ -1069,6 +1076,7 @@ static void __init l1tf_select_mitigatio
 	setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV);
 }
 #undef pr_fmt
+#define pr_fmt(fmt) fmt
 
 #ifdef CONFIG_SYSFS
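The warning fires only when the MDS mitigation is on, SMT is active, and the CPU is affected by more than MSBDS. MSBDS-only CPUs are covered by idle clearing, so SMT is not a leak path for them. The redefinition of pr_fmt to plain fmt presumably keeps the message free of any subsystem prefix. Below is a hypothetical userspace model of the condition. A static flag stands in for pr_warn_once()'s per-call-site "already printed" state, and mitigation_on covers both the MDS_MITIGATION_FULL and MDS_MITIGATION_VMWERV cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MDS_MSG_SMT "MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.\n"

/* Stand-in for pr_warn_once()'s one-shot flag. */
static bool mds_smt_warned;

/* Returns true only when this call printed the warning. */
bool model_mds_smt_warn(bool mitigation_on, bool smt_active, bool msbds_only)
{
	if (!mitigation_on)
		return false;		/* MDS_MITIGATION_OFF: no message */
	if (!smt_active || msbds_only)
		return false;		/* nothing leaks across Hyper-Threads */
	if (mds_smt_warned)
		return false;		/* "once" semantics */
	mds_smt_warned = true;
	fputs(MDS_MSG_SMT, stderr);
	return true;
}
```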
From: Boris Ostrovsky boris.ostrovsky@oracle.com
commit cae5ec342645746d617dd420d206e1588d47768a upstream.
s/L1TF/MDS/
Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Tyler Hicks tyhicks@canonical.com Reviewed-by: Josh Poimboeuf jpoimboe@redhat.com [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -210,7 +210,7 @@ static void x86_amd_ssb_disable(void)
 #undef pr_fmt
 #define pr_fmt(fmt) "MDS: " fmt
 
-/* Default mitigation for L1TF-affected CPUs */
+/* Default mitigation for MDS-affected CPUs */
 static enum mds_mitigations mds_mitigation = MDS_MITIGATION_FULL;
 
 static const char * const mds_strings[] = {
From: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
commit e2c3c94788b08891dcf3dbe608f9880523ecd71b upstream.
This code is only for CPUs which are affected by MSBDS, but are *not* affected by the other two MDS issues.
For such CPUs, enabling the mds_idle_clear mitigation is enough to mitigate SMT.
However if user boots with 'mds=off' and still has SMT enabled, we should not report that SMT is mitigated:
$ cat /sys/devices/system/cpu/vulnerabilities/mds
Vulnerable; SMT mitigated
But rather: Vulnerable; SMT vulnerable
Signed-off-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Tyler Hicks tyhicks@canonical.com Reviewed-by: Josh Poimboeuf jpoimboe@redhat.com Link: https://lkml.kernel.org/r/20190412215118.294906495@localhost.localdomain Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1091,7 +1091,8 @@ static ssize_t mds_show_state(char *buf)
 
 	if (boot_cpu_has(X86_BUG_MSBDS_ONLY)) {
 		return sprintf(buf, "%s; SMT %s\n", mds_strings[mds_mitigation],
-			       sched_smt_active() ? "mitigated" : "disabled");
+			       (mds_mitigation == MDS_MITIGATION_OFF ? "vulnerable" :
+			        sched_smt_active() ? "mitigated" : "disabled"));
 	}
 
 	return sprintf(buf, "%s; SMT %s\n", mds_strings[mds_mitigation],
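To make the fixed reporting logic concrete, here is a hypothetical userspace model of mds_show_state() with the hypervisor branch omitted. Assumptions: "Vulnerable" matches the output quoted in the commit message, but "Mitigation: Clear CPU buffers" is assumed to be the MDS_MITIGATION_FULL entry of mds_strings[]. The arguments of the final sprintf() are cut off in this excerpt, so the non-MSBDS branch is assumed to report "vulnerable"/"disabled", consistent with the sysfs table in mds.rst.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum mds_mitigations { MDS_MITIGATION_OFF, MDS_MITIGATION_FULL };

/* Assumed to match mds_strings[] in bugs.c for these two states. */
static const char *const mds_strings[] = {
	[MDS_MITIGATION_OFF]  = "Vulnerable",
	[MDS_MITIGATION_FULL] = "Mitigation: Clear CPU buffers",
};

/* Model of mds_show_state() after the fix; returns snprintf()'s result. */
int model_mds_show_state(char *buf, size_t size, enum mds_mitigations m,
			 bool msbds_only, bool smt_active)
{
	const char *smt;

	if (msbds_only) {
		/* Idle clearing only protects SMT if the mitigation is on */
		smt = m == MDS_MITIGATION_OFF ? "vulnerable" :
		      smt_active ? "mitigated" : "disabled";
	} else {
		/* Other MDS variants cannot be mitigated across Hyper-Threads */
		smt = smt_active ? "vulnerable" : "disabled";
	}
	return snprintf(buf, size, "%s; SMT %s\n", mds_strings[m], smt);
}
```

For an MSBDS-only CPU booted with mds=off and SMT on, this yields "Vulnerable; SMT vulnerable", which is the corrected output described in the commit message.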
From: Josh Poimboeuf jpoimboe@redhat.com
commit 98af8452945c55652de68536afdde3b520fec429 upstream.
Keeping track of the number of mitigations for all the CPU speculation bugs has become overwhelming for many users. It's getting more and more complicated to decide which mitigations are needed for a given architecture. Complicating matters is the fact that each arch tends to have its own custom way to mitigate the same vulnerability.
Most users fall into a few basic categories:
a) they want all mitigations off;
b) they want all reasonable mitigations on, with SMT enabled even if it's vulnerable; or
c) they want all reasonable mitigations on, with SMT disabled if vulnerable.
Define a set of curated, arch-independent options, each of which is an aggregation of existing options:
- mitigations=off: Disable all mitigations.
- mitigations=auto: [default] Enable all the default mitigations, but leave SMT enabled, even if it's vulnerable.
- mitigations=auto,nosmt: Enable all the default mitigations, disabling SMT if needed by a mitigation.
Currently, these options are placeholders which don't actually do anything. They will be fleshed out in upcoming patches.
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Jiri Kosina jkosina@suse.cz (on x86) Reviewed-by: Jiri Kosina jkosina@suse.cz Cc: Borislav Petkov bp@alien8.de Cc: "H . Peter Anvin" hpa@zytor.com Cc: Andy Lutomirski luto@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Jiri Kosina jikos@kernel.org Cc: Waiman Long longman@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: Jon Masters jcm@redhat.com Cc: Benjamin Herrenschmidt benh@kernel.crashing.org Cc: Paul Mackerras paulus@samba.org Cc: Michael Ellerman mpe@ellerman.id.au Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky schwidefsky@de.ibm.com Cc: Heiko Carstens heiko.carstens@de.ibm.com Cc: linux-s390@vger.kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Tyler Hicks tyhicks@canonical.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Randy Dunlap rdunlap@infradead.org Cc: Steven Price steven.price@arm.com Cc: Phil Auld pauld@redhat.com Link: https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.155508550... [bwh: Backported to 4.4: - Drop the auto,nosmt option which we can't support - Adjust filename] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/kernel-parameters.txt | 19 +++++++++++++++++++ include/linux/cpu.h | 17 +++++++++++++++++ kernel/cpu.c | 13 +++++++++++++ 3 files changed, 49 insertions(+)
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2173,6 +2173,25 @@ bytes respectively. Such letter suffixes
 			in the "bleeding edge" mini2440 support kernel at
 			http://repo.or.cz/w/linux-2.6/mini2440.git
 
+	mitigations=
+			Control optional mitigations for CPU vulnerabilities.
+			This is a set of curated, arch-independent options, each
+			of which is an aggregation of existing arch-specific
+			options.
+
+			off
+				Disable all optional CPU mitigations. This
+				improves system performance, but it may also
+				expose users to several CPU vulnerabilities.
+
+			auto (default)
+				Mitigate all CPU vulnerabilities, but leave SMT
+				enabled, even if it's vulnerable. This is for
+				users who don't want to be surprised by SMT
+				getting disabled across kernel upgrades, or who
+				have other ways of avoiding SMT-based attacks.
+				This is the default behavior.
+
 	mminit_loglevel=
 			[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
 			parameter allows control of the logging verbosity for
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -296,4 +296,21 @@ bool cpu_wait_death(unsigned int cpu, in
 bool cpu_report_death(void);
 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
 
+/*
+ * These are used for a global "mitigations=" cmdline option for toggling
+ * optional CPU mitigations.
+ */
+enum cpu_mitigations {
+	CPU_MITIGATIONS_OFF,
+	CPU_MITIGATIONS_AUTO,
+};
+
+extern enum cpu_mitigations cpu_mitigations;
+
+/* mitigations=off */
+static inline bool cpu_mitigations_off(void)
+{
+	return cpu_mitigations == CPU_MITIGATIONS_OFF;
+}
+
 #endif /* _LINUX_CPU_H_ */
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -842,3 +842,16 @@ void init_cpu_online(const struct cpumas
 {
 	cpumask_copy(to_cpumask(cpu_online_bits), src);
 }
+
+enum cpu_mitigations cpu_mitigations = CPU_MITIGATIONS_AUTO;
+
+static int __init mitigations_parse_cmdline(char *arg)
+{
+	if (!strcmp(arg, "off"))
+		cpu_mitigations = CPU_MITIGATIONS_OFF;
+	else if (!strcmp(arg, "auto"))
+		cpu_mitigations = CPU_MITIGATIONS_AUTO;
+
+	return 0;
+}
+early_param("mitigations", mitigations_parse_cmdline);
Hi Greg, Ben,
On Wed, May 15, 2019 at 1:12 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
> From: Josh Poimboeuf jpoimboe@redhat.com
> commit 98af8452945c55652de68536afdde3b520fec429 upstream.
> Keeping track of the number of mitigations for all the CPU speculation bugs has become overwhelming for many users. It's getting more and more complicated to decide which mitigations are needed for a given architecture. Complicating matters is the fact that each arch tends to have its own custom way to mitigate the same vulnerability.
> Most users fall into a few basic categories:
> a) they want all mitigations off;
> b) they want all reasonable mitigations on, with SMT enabled even if it's vulnerable; or
> c) they want all reasonable mitigations on, with SMT disabled if vulnerable.
> Define a set of curated, arch-independent options, each of which is an aggregation of existing options:
> - mitigations=off: Disable all mitigations.
> - mitigations=auto: [default] Enable all the default mitigations, but leave SMT enabled, even if it's vulnerable.
> - mitigations=auto,nosmt: Enable all the default mitigations, disabling SMT if needed by a mitigation.
> Currently, these options are placeholders which don't actually do anything. They will be fleshed out in upcoming patches.
> Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de
> [bwh: Backported to 4.4:
> - Drop the auto,nosmt option which we can't support
This doesn't really stand out. I.e. I completely missed it, and started wondering why "auto,nosmt" was not documented in kernel-parameters.txt below...
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2173,6 +2173,25 @@ bytes respectively. Such letter suffixes
>  			in the "bleeding edge" mini2440 support kernel at
>  			http://repo.or.cz/w/linux-2.6/mini2440.git
> +	mitigations=
> +			Control optional mitigations for CPU vulnerabilities.
> +			This is a set of curated, arch-independent options, each
> +			of which is an aggregation of existing arch-specific
> +			options.
> +			off
> +				Disable all optional CPU mitigations. This
> +				improves system performance, but it may also
> +				expose users to several CPU vulnerabilities.
> +			auto (default)
> +				Mitigate all CPU vulnerabilities, but leave SMT
> +				enabled, even if it's vulnerable. This is for
> +				users who don't want to be surprised by SMT
> +				getting disabled across kernel upgrades, or who
> +				have other ways of avoiding SMT-based attacks.
> +				This is the default behavior.
>  	mminit_loglevel=
>  			[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
>  			parameter allows control of the logging verbosity for
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -842,3 +842,16 @@ void init_cpu_online(const struct cpumas
>  {
>  	cpumask_copy(to_cpumask(cpu_online_bits), src);
>  }
> +enum cpu_mitigations cpu_mitigations = CPU_MITIGATIONS_AUTO;
> +static int __init mitigations_parse_cmdline(char *arg)
> +{
> +	if (!strcmp(arg, "off"))
> +		cpu_mitigations = CPU_MITIGATIONS_OFF;
> +	else if (!strcmp(arg, "auto"))
> +		cpu_mitigations = CPU_MITIGATIONS_AUTO;
Perhaps
	else
		pr_crit("mitigations=%s is not supported\n", arg);
?
Actually that makes sense on mainline, too. Cooking a patch...
> +	return 0;
> +}
> +early_param("mitigations", mitigations_parse_cmdline);
Gr{oetje,eeting}s,
Geert
From: Josh Poimboeuf jpoimboe@redhat.com
commit d68be4c4d31295ff6ae34a8ddfaa4c1a8ff42812 upstream.
Configure x86 runtime CPU speculation bug mitigations in accordance with the 'mitigations=' cmdline option. This affects Meltdown, Spectre v2, Speculative Store Bypass, and L1TF.
The default behavior is unchanged.
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Jiri Kosina jkosina@suse.cz (on x86) Reviewed-by: Jiri Kosina jkosina@suse.cz Cc: Borislav Petkov bp@alien8.de Cc: "H . Peter Anvin" hpa@zytor.com Cc: Andy Lutomirski luto@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Jiri Kosina jikos@kernel.org Cc: Waiman Long longman@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: Jon Masters jcm@redhat.com Cc: Benjamin Herrenschmidt benh@kernel.crashing.org Cc: Paul Mackerras paulus@samba.org Cc: Michael Ellerman mpe@ellerman.id.au Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky schwidefsky@de.ibm.com Cc: Heiko Carstens heiko.carstens@de.ibm.com Cc: linux-s390@vger.kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Tyler Hicks tyhicks@canonical.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Randy Dunlap rdunlap@infradead.org Cc: Steven Price steven.price@arm.com Cc: Phil Auld pauld@redhat.com Link: https://lkml.kernel.org/r/6616d0ae169308516cfdf5216bedd169f8a8291b.155508550... [bwh: Backported to 4.4: - Drop the auto,nosmt option and the l1tf mitigation selection, which we can't support - Adjust filenames, context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/kernel-parameters.txt | 14 +++++++++----- arch/x86/kernel/cpu/bugs.c | 6 ++++-- arch/x86/mm/kaiser.c | 4 +++- 3 files changed, 16 insertions(+), 8 deletions(-)
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2174,15 +2174,19 @@ bytes respectively. Such letter suffixes
 			http://repo.or.cz/w/linux-2.6/mini2440.git
 
 	mitigations=
-			Control optional mitigations for CPU vulnerabilities.
-			This is a set of curated, arch-independent options, each
-			of which is an aggregation of existing arch-specific
-			options.
+			[X86] Control optional mitigations for CPU
+			vulnerabilities. This is a set of curated,
+			arch-independent options, each of which is an
+			aggregation of existing arch-specific options.
 
 			off
 				Disable all optional CPU mitigations. This
 				improves system performance, but it may also
 				expose users to several CPU vulnerabilities.
+				Equivalent to: nopti [X86]
+					       nospectre_v2 [X86]
+					       spectre_v2_user=off [X86]
+					       spec_store_bypass_disable=off [X86]
 
 			auto (default)
 				Mitigate all CPU vulnerabilities, but leave SMT
@@ -2190,7 +2194,7 @@ bytes respectively. Such letter suffixes
 				users who don't want to be surprised by SMT
 				getting disabled across kernel upgrades, or who
 				have other ways of avoiding SMT-based attacks.
-				This is the default behavior.
+				Equivalent to: (default behavior)
 
 	mminit_loglevel=
 			[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -479,7 +479,8 @@ static enum spectre_v2_mitigation_cmd __
 	char arg[20];
 	int ret, i;
 
-	if (cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
+	if (cmdline_find_option_bool(boot_command_line, "nospectre_v2") ||
+		cpu_mitigations_off())
 		return SPECTRE_V2_CMD_NONE;
 
 	ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, sizeof(arg));
@@ -743,7 +744,8 @@ static enum ssb_mitigation_cmd __init ss
 	char arg[20];
 	int ret, i;
 
-	if (cmdline_find_option_bool(boot_command_line, "nospec_store_bypass_disable")) {
+	if (cmdline_find_option_bool(boot_command_line, "nospec_store_bypass_disable") ||
+	    cpu_mitigations_off()) {
 		return SPEC_STORE_BYPASS_CMD_NONE;
 	} else {
 		ret = cmdline_find_option(boot_command_line, "spec_store_bypass_disable",
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -10,6 +10,7 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 #include <linux/ftrace.h>
+#include <linux/cpu.h>
 
 #undef pr_fmt
 #define pr_fmt(fmt) "Kernel/User page tables isolation: " fmt
@@ -297,7 +298,8 @@ void __init kaiser_check_boottime_disabl
 		goto skip;
 	}
 
-	if (cmdline_find_option_bool(boot_command_line, "nopti"))
+	if (cmdline_find_option_bool(boot_command_line, "nopti") ||
+	    cpu_mitigations_off())
 		goto disable;
 
 skip:
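Each patched check has the same shape: the specific opt-out word or the global mitigations=off short-circuits the mitigation. The sketch models the Spectre v2 case in userspace. model_find_option_bool() is a hypothetical stand-in for cmdline_find_option_bool() that assumes a space-separated command line and a non-empty option word. The enum is trimmed to two values, and the real parser continues to read "spectre_v2=" rather than returning AUTO.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if @opt occurs as a whole space-separated word in @cmdline. */
bool model_find_option_bool(const char *cmdline, const char *opt)
{
	size_t n = strlen(opt);
	const char *p = cmdline;

	while ((p = strstr(p, opt)) != NULL) {
		bool starts_word = (p == cmdline || p[-1] == ' ');
		bool ends_word = (p[n] == '\0' || p[n] == ' ');

		if (starts_word && ends_word)
			return true;
		p++;	/* partial match, e.g. "nospectre_v2x"; keep looking */
	}
	return false;
}

enum spectre_v2_mitigation_cmd { SPECTRE_V2_CMD_NONE, SPECTRE_V2_CMD_AUTO };

/* Opt-out word OR mitigations=off: either one disables the mitigation. */
enum spectre_v2_mitigation_cmd model_spectre_v2_cmd(const char *cmdline,
						    bool mitigations_off)
{
	if (model_find_option_bool(cmdline, "nospectre_v2") || mitigations_off)
		return SPECTRE_V2_CMD_NONE;
	return SPECTRE_V2_CMD_AUTO;
}
```

The nopti check in kaiser.c and the nospec_store_bypass_disable check follow the same pattern with a different option word.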
From: Josh Poimboeuf jpoimboe@redhat.com
commit 5c14068f87d04adc73ba3f41c2a303d3c3d1fa12 upstream.
Add MDS to the new 'mitigations=' cmdline option.
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de [bwh: Backported to 4.4: - Drop the auto,nosmt option, which we can't support - Adjust filenames, context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/kernel-parameters.txt | 1 + arch/x86/kernel/cpu/bugs.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-)
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2187,6 +2187,7 @@ bytes respectively. Such letter suffixes
 					       nospectre_v2 [X86]
 					       spectre_v2_user=off [X86]
 					       spec_store_bypass_disable=off [X86]
+					       mds=off [X86]
 
 			auto (default)
 				Mitigate all CPU vulnerabilities, but leave SMT
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -221,7 +221,7 @@ static const char * const mds_strings[]
 
 static void __init mds_select_mitigation(void)
 {
-	if (!boot_cpu_has_bug(X86_BUG_MDS)) {
+	if (!boot_cpu_has_bug(X86_BUG_MDS) || cpu_mitigations_off()) {
 		mds_mitigation = MDS_MITIGATION_OFF;
 		return;
 	}
From: speck for Pawan Gupta speck@linutronix.de
commit e672f8bf71c66253197e503f75c771dd28ada4a0 upstream.
Updated the documentation for a new CVE-2019-11091 Microarchitectural Data Sampling Uncacheable Memory (MDSUM) which is a variant of Microarchitectural Data Sampling (MDS). MDS is a family of side channel attacks on internal buffers in Intel CPUs.
MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from memory that takes a fault or assist can leave data in a microarchitectural structure that may later be observed using one of the same methods used by MSBDS, MFBDS or MLPDS. There are no new code changes expected for MDSUM. The existing mitigation for MDS applies to MDSUM as well.
Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Tyler Hicks tyhicks@canonical.com Reviewed-by: Jon Masters jcm@redhat.com [bwh: Backported to 4.4: adjust filename] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/hw-vuln/mds.rst | 5 +++-- Documentation/x86/mds.rst | 5 +++++ 2 files changed, 8 insertions(+), 2 deletions(-)
--- a/Documentation/hw-vuln/mds.rst +++ b/Documentation/hw-vuln/mds.rst @@ -32,11 +32,12 @@ Related CVEs
The following CVE entries are related to the MDS vulnerability:
- ============== ===== ============================================== + ============== ===== =================================================== CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling - ============== ===== ============================================== + CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory + ============== ===== ===================================================
Problem ------- --- a/Documentation/x86/mds.rst +++ b/Documentation/x86/mds.rst @@ -12,6 +12,7 @@ on internal buffers in Intel CPUs. The v - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126) - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130) - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127) + - Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091)
MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a dependent load (store-to-load forwarding) as an optimization. The forward @@ -38,6 +39,10 @@ faulting or assisting loads under certai exploited eventually. Load ports are shared between Hyper-Threads so cross thread leakage is possible.
+MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from +memory that takes a fault or assist can leave data in a microarchitectural +structure that may later be observed using one of the same methods used by +MSBDS, MFBDS or MLPDS.
Exposure assumptions --------------------
From: Tyler Hicks tyhicks@canonical.com
commit ea01668f9f43021b28b3f4d5ffad50106a1e1301 upstream.
Adjust the last two rows in the table that display possible values when MDS mitigation is enabled. Both were slightly inaccurate.
In addition, convert the table of possible values and their descriptions to a list-table. The simple table format uses the top border of equals signs to determine cell width which resulted in the first column being far too wide in comparison to the second column that contained the majority of the text.
Signed-off-by: Tyler Hicks tyhicks@canonical.com Signed-off-by: Thomas Gleixner tglx@linutronix.de [bwh: Backported to 4.4: adjust filename] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/hw-vuln/mds.rst | 27 ++++++++++++--------------- 1 file changed, 12 insertions(+), 15 deletions(-)
--- a/Documentation/hw-vuln/mds.rst +++ b/Documentation/hw-vuln/mds.rst @@ -95,22 +95,19 @@ mitigations are active. The relevant sys
The possible values in this file are:
- ========================================= ================================= - 'Not affected' The processor is not vulnerable + .. list-table::
- 'Vulnerable' The processor is vulnerable, - but no mitigation enabled - - 'Vulnerable: Clear CPU buffers attempted' The processor is vulnerable but - microcode is not updated. - The mitigation is enabled on a - best effort basis. - See :ref:`vmwerv` - - 'Mitigation: CPU buffer clear' The processor is vulnerable and the - CPU buffer clearing mitigation is - enabled. - ========================================= ================================= + * - 'Not affected' + - The processor is not vulnerable + * - 'Vulnerable' + - The processor is vulnerable, but no mitigation enabled + * - 'Vulnerable: Clear CPU buffers attempted, no microcode' + - The processor is vulnerable but microcode is not updated. + + The mitigation is enabled on a best effort basis. See :ref:`vmwerv` + * - 'Mitigation: Clear CPU buffers' + - The processor is vulnerable and the CPU buffer clearing mitigation is + enabled.
If the processor is vulnerable then the following information is appended to the above information:
From: Josh Poimboeuf jpoimboe@redhat.com
commit 95310e348a321b45fb746c176961d4da72344282 upstream.
Fix a minor typo in the MDS documentation: "eanbled" -> "enabled".
Reported-by: Jeff Bastian jbastian@redhat.com Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/x86/mds.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/Documentation/x86/mds.rst +++ b/Documentation/x86/mds.rst @@ -116,7 +116,7 @@ Kernel internal mitigation modes off Mitigation is disabled. Either the CPU is not affected or mds=off is supplied on the kernel command line
- full Mitigation is eanbled. CPU is affected and MD_CLEAR is + full Mitigation is enabled. CPU is affected and MD_CLEAR is advertised in CPUID.
vmwerv Mitigation is enabled. CPU is affected and MD_CLEAR is not
From: Ben Hutchings ben@decadent.org.uk
Commit 72c6d2db64fa "x86/litf: Introduce vmx status variable" upstream changed "Page Table Inversion" to "PTE Inversion". That was part of the implementation of additional mitigations for VMX which haven't been applied to this branch. Just change this string to be consistent and match the documentation.
Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -1160,7 +1160,7 @@ static ssize_t cpu_show_common(struct de
case X86_BUG_L1TF: if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV)) - return sprintf(buf, "Mitigation: Page Table Inversion\n"); + return sprintf(buf, "Mitigation: PTE Inversion\n"); break;
case X86_BUG_MDS:
[ Upstream commit 3161da970d38cd6ed2ba8cadec93874d1d06e11e ]
This patch turns status into a variable read once from the URB. The long-term plan is to deliver status to the callback. In addition, it makes the code a bit more elegant.
Signed-off-by: Oliver Neukum oneukum@suse.com Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/serial/generic.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/usb/serial/generic.c b/drivers/usb/serial/generic.c index 54e170dd3dad0..101051dce60c7 100644 --- a/drivers/usb/serial/generic.c +++ b/drivers/usb/serial/generic.c @@ -350,6 +350,7 @@ void usb_serial_generic_read_bulk_callback(struct urb *urb) struct usb_serial_port *port = urb->context; unsigned char *data = urb->transfer_buffer; unsigned long flags; + int status = urb->status; int i;
for (i = 0; i < ARRAY_SIZE(port->read_urbs); ++i) { @@ -360,22 +361,22 @@ void usb_serial_generic_read_bulk_callback(struct urb *urb)
dev_dbg(&port->dev, "%s - urb %d, len %d\n", __func__, i, urb->actual_length); - switch (urb->status) { + switch (status) { case 0: break; case -ENOENT: case -ECONNRESET: case -ESHUTDOWN: dev_dbg(&port->dev, "%s - urb stopped: %d\n", - __func__, urb->status); + __func__, status); return; case -EPIPE: dev_err(&port->dev, "%s - urb stopped: %d\n", - __func__, urb->status); + __func__, status); return; default: dev_dbg(&port->dev, "%s - nonzero urb status: %d\n", - __func__, urb->status); + __func__, status); goto resubmit; }
@@ -399,6 +400,7 @@ void usb_serial_generic_write_bulk_callback(struct urb *urb) { unsigned long flags; struct usb_serial_port *port = urb->context; + int status = urb->status; int i;
for (i = 0; i < ARRAY_SIZE(port->write_urbs); ++i) { @@ -410,22 +412,22 @@ void usb_serial_generic_write_bulk_callback(struct urb *urb) set_bit(i, &port->write_urbs_free); spin_unlock_irqrestore(&port->lock, flags);
- switch (urb->status) { + switch (status) { case 0: break; case -ENOENT: case -ECONNRESET: case -ESHUTDOWN: dev_dbg(&port->dev, "%s - urb stopped: %d\n", - __func__, urb->status); + __func__, status); return; case -EPIPE: dev_err_console(port, "%s - urb stopped: %d\n", - __func__, urb->status); + __func__, status); return; default: dev_err_console(port, "%s - nonzero urb status: %d\n", - __func__, urb->status); + __func__, status); goto resubmit; }
[ Upstream commit 3f5edd58d040bfa4b74fb89bc02f0bc6b9cd06ab ]
Fix two long-standing bugs which could potentially lead to memory corruption or leave the port throttled until it is reopened (on weakly ordered systems), respectively, when read-URB completion races with unthrottle().
First, the URB must not be marked as free before processing is complete to prevent it from being submitted by unthrottle() on another CPU.
    CPU 1                           CPU 2
    ================                ================
    complete()                      unthrottle()
      process_urb();
      smp_mb__before_atomic();
      set_bit(i, free);               if (test_and_clear_bit(i, free))
                                          submit_urb();
Second, the URB must be marked as free before checking the throttled flag to prevent unthrottle() on another CPU from failing to observe that the URB needs to be submitted if complete() sees that the throttled flag is set.
    CPU 1                           CPU 2
    ================                ================
    complete()                      unthrottle()
      set_bit(i, free);               throttled = 0;
      smp_mb__after_atomic();         smp_mb();
      if (throttled)                  if (test_and_clear_bit(i, free))
          return;                         submit_urb();
Note that test_and_clear_bit() only implies barriers when the test is successful. To handle the case where the URB is still in use an explicit barrier needs to be added to unthrottle() for the second race condition.
Fixes: d83b405383c9 ("USB: serial: add support for multiple read urbs") Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/serial/generic.c | 39 +++++++++++++++++++++++++++++------- 1 file changed, 32 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/serial/generic.c b/drivers/usb/serial/generic.c index 101051dce60c7..faead4f32b1ca 100644 --- a/drivers/usb/serial/generic.c +++ b/drivers/usb/serial/generic.c @@ -350,6 +350,7 @@ void usb_serial_generic_read_bulk_callback(struct urb *urb) struct usb_serial_port *port = urb->context; unsigned char *data = urb->transfer_buffer; unsigned long flags; + bool stopped = false; int status = urb->status; int i;
@@ -357,33 +358,51 @@ void usb_serial_generic_read_bulk_callback(struct urb *urb) if (urb == port->read_urbs[i]) break; } - set_bit(i, &port->read_urbs_free);
dev_dbg(&port->dev, "%s - urb %d, len %d\n", __func__, i, urb->actual_length); switch (status) { case 0: + usb_serial_debug_data(&port->dev, __func__, urb->actual_length, + data); + port->serial->type->process_read_urb(urb); break; case -ENOENT: case -ECONNRESET: case -ESHUTDOWN: dev_dbg(&port->dev, "%s - urb stopped: %d\n", __func__, status); - return; + stopped = true; + break; case -EPIPE: dev_err(&port->dev, "%s - urb stopped: %d\n", __func__, status); - return; + stopped = true; + break; default: dev_dbg(&port->dev, "%s - nonzero urb status: %d\n", __func__, status); - goto resubmit; + break; }
- usb_serial_debug_data(&port->dev, __func__, urb->actual_length, data); - port->serial->type->process_read_urb(urb); + /* + * Make sure URB processing is done before marking as free to avoid + * racing with unthrottle() on another CPU. Matches the barriers + * implied by the test_and_clear_bit() in + * usb_serial_generic_submit_read_urb(). + */ + smp_mb__before_atomic(); + set_bit(i, &port->read_urbs_free); + /* + * Make sure URB is marked as free before checking the throttled flag + * to avoid racing with unthrottle() on another CPU. Matches the + * smp_mb() in unthrottle(). + */ + smp_mb__after_atomic(); + + if (stopped) + return;
-resubmit: /* Throttle the device if requested by tty */ spin_lock_irqsave(&port->lock, flags); port->throttled = port->throttle_req; @@ -458,6 +477,12 @@ void usb_serial_generic_unthrottle(struct tty_struct *tty) port->throttled = port->throttle_req = 0; spin_unlock_irq(&port->lock);
+ /* + * Matches the smp_mb__after_atomic() in + * usb_serial_generic_read_bulk_callback(). + */ + smp_mb(); + if (was_throttled) usb_serial_generic_submit_read_urbs(port, GFP_KERNEL); }
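A userspace sketch of the ordering this patch establishes, using C11 atomics as a stand-in for the kernel primitives. This is an assumption for illustration: seq_cst read-modify-writes and atomic_thread_fence() play the roles of set_bit()/test_and_clear_bit() with smp_mb__before_atomic(), smp_mb__after_atomic() and smp_mb(). The struct and sketch_* functions are hypothetical, handle a single URB, and omit the port lock the real driver takes around the throttled flag. The point is the store-buffering pattern: each side writes its own flag, issues a full barrier, then reads the other side's flag, so at least one side observes the other's write, and test_and_clear semantics let only one of them resubmit.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified port state; not the kernel's usb_serial_port. */
struct sketch_port {
	atomic_ulong read_urbs_free;	/* bit i set => read URB i is idle */
	atomic_bool throttled;
	atomic_int submitted;		/* counts (pretend) URB submissions */
};

/*
 * Stand-in for usb_serial_generic_submit_read_urb(): only an idle URB may be
 * claimed. A seq_cst fetch_and is a full barrier, like a successful
 * test_and_clear_bit().
 */
bool sketch_submit(struct sketch_port *p, unsigned int i)
{
	unsigned long mask = 1UL << i;

	if (!(atomic_fetch_and(&p->read_urbs_free, ~mask) & mask))
		return false;	/* URB still owned by someone else */
	atomic_fetch_add(&p->submitted, 1);
	return true;
}

/* Completion: finish processing, mark the URB free, then check throttled. */
void sketch_complete(struct sketch_port *p, unsigned int i)
{
	/* ... the process_read_urb() equivalent would run here ... */

	/* seq_cst RMW: orders the processing above before the URB is freed
	 * (the patch's smp_mb__before_atomic() + set_bit()) */
	atomic_fetch_or(&p->read_urbs_free, 1UL << i);
	/* full fence: the patch's smp_mb__after_atomic() */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load(&p->throttled))
		return;		/* unthrottle is responsible for resubmitting */
	sketch_submit(p, i);
}

/* Unthrottle: clear the flag, full barrier, then try to claim the URB. */
void sketch_unthrottle(struct sketch_port *p, unsigned int i)
{
	atomic_store(&p->throttled, false);
	/* full fence: the smp_mb() the patch adds to unthrottle() */
	atomic_thread_fence(memory_order_seq_cst);
	sketch_submit(p, i);
}
```

Walked through on one thread: with throttled set, sketch_complete() only marks the URB free; a later sketch_unthrottle() claims and resubmits it, and a second unthrottle finds nothing left to submit.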
From: Breno Leitao leitao@debian.org
commit 42e2acde1237878462b028f5a27d9cc5bea7502c upstream.
The current powerpc security.c file defines functions such as cpu_show_meltdown(), cpu_show_spectre_v{1,2} and others, which are declared in the linux/cpu.h header, without including that header.
Sparse reports this: without a visible declaration, it suggests that these functions should be static:
arch/powerpc/kernel/security.c:105:9: warning: symbol 'cpu_show_meltdown' was not declared. Should it be static? arch/powerpc/kernel/security.c:139:9: warning: symbol 'cpu_show_spectre_v1' was not declared. Should it be static? arch/powerpc/kernel/security.c:161:9: warning: symbol 'cpu_show_spectre_v2' was not declared. Should it be static? arch/powerpc/kernel/security.c:209:6: warning: symbol 'stf_barrier' was not declared. Should it be static? arch/powerpc/kernel/security.c:289:9: warning: symbol 'cpu_show_spec_store_bypass' was not declared. Should it be static?
This patch simply includes the proper header (linux/cpu.h) so that the function definitions match their declarations.
Signed-off-by: Breno Leitao leitao@debian.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Cc: Joel Stanley joel@jms.id.au Cc: Nathan Chancellor natechancellor@gmail.com Cc: Major Hayden major@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kernel/security.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -4,6 +4,7 @@ // // Copyright 2018, Michael Ellerman, IBM Corporation.
+#include <linux/cpu.h> #include <linux/kernel.h> #include <linux/debugfs.h> #include <linux/device.h>
From: "Tobin C. Harding" tobin@kernel.org
[ Upstream commit bdfad5aec1392b93495b77b864d58d7f101dc1c1 ]
Currently, an error return from kobject_init_and_add() is not followed by a call to kobject_put(), which leaks memory. We currently set p to NULL so that kfree() may be called on it as a no-op; the code is arguably clearer if we move the kfree() up, closer to where the memory is allocated (instead of after the goto jump).
Remove the goto label 'err1' and jump to the kobject_put() call on error return from kobject_init_and_add(), fixing the memory leak. Rename the goto label 'put_back' to 'err1' now that the old err1 is no longer used, following the current nomenclature (err1, err2, ...). Move the kfree() call out of the error code at the bottom of the function, up to where the memory was allocated, and add a comment to clarify the call to kfree().
Signed-off-by: Tobin C. Harding tobin@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bridge/br_if.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
--- a/net/bridge/br_if.c +++ b/net/bridge/br_if.c @@ -471,13 +471,15 @@ int br_add_if(struct net_bridge *br, str call_netdevice_notifiers(NETDEV_JOIN, dev);
err = dev_set_allmulti(dev, 1); - if (err) - goto put_back; + if (err) { + kfree(p); /* kobject not yet init'd, manually free */ + goto err1; + }
err = kobject_init_and_add(&p->kobj, &brport_ktype, &(dev->dev.kobj), SYSFS_BRIDGE_PORT_ATTR); if (err) - goto err1; + goto err2;
err = br_sysfs_addif(p); if (err) @@ -551,12 +553,9 @@ err3: sysfs_remove_link(br->ifobj, p->dev->name); err2: kobject_put(&p->kobj); - p = NULL; /* kobject_put frees */ -err1: dev_set_allmulti(dev, -1); -put_back: +err1: dev_put(dev); - kfree(p); return err; }
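The ownership rule behind this fix can be sketched in userspace. The structure and sketch_* helpers below are hypothetical and are not the kobject API; they assume, roughly as with kobject_init_and_add(), that the init step takes the first reference and allocates a name before the part that can fail. Before that step the port is freed directly; after it, only the put path may free it, because a bare free() would leak the name. sketch_live counts outstanding allocations so a leak shows up as a non-zero value.

```c
#include <stdlib.h>
#include <string.h>

int sketch_live;	/* outstanding allocations, to make leaks visible */

static void *sketch_alloc(size_t n)
{
	void *m = malloc(n);

	if (m)
		sketch_live++;
	return m;
}

static void sketch_free(void *m)
{
	if (m) {
		sketch_live--;
		free(m);
	}
}

/* Hypothetical refcounted port; not struct net_bridge_port or kobject. */
struct sketch_port {
	int refcount;
	char *name;	/* allocated during "init_and_add", like a kobject name */
};

/* Release path: frees everything the object owns, then the object. */
static void sketch_release(struct sketch_port *p)
{
	sketch_free(p->name);
	sketch_free(p);
}

/* Like kobject_put(): drop a reference, release on the last one. */
static void sketch_put(struct sketch_port *p)
{
	if (--p->refcount == 0)
		sketch_release(p);
}

/* Takes the first reference and allocates the name before it may fail. */
static int sketch_init_and_add(struct sketch_port *p, int fail)
{
	p->refcount = 1;
	p->name = sketch_alloc(8);
	if (p->name)
		strcpy(p->name, "port");
	return fail ? -1 : 0;
}

int sketch_add_port(int fail_before_init, int fail_add)
{
	struct sketch_port *p = sketch_alloc(sizeof(*p));

	if (!p)
		return -1;
	if (fail_before_init) {
		sketch_free(p);	/* refcount not set up yet: free manually */
		return -1;
	}
	if (sketch_init_and_add(p, fail_add)) {
		sketch_put(p);	/* frees name and p; free(p) alone would leak name */
		return -1;
	}
	sketch_put(p);		/* sketch only: tear down on success as well */
	return 0;
}
```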
From: Christophe Leroy christophe.leroy@c-s.fr
[ Upstream commit ee0df19305d9fabd9479b785918966f6e25b733b ]
When changing the number of buffers in the RX ring while the interface is running, the following Oops is encountered, because the new number of buffers takes effect immediately while the buffers themselves are only allocated when the device is opened.
[ 69.882706] Unable to handle kernel paging request for data at address 0xf0000100 [ 69.890172] Faulting instruction address: 0xc033e164 [ 69.895122] Oops: Kernel access of bad area, sig: 11 [#1] [ 69.900494] BE PREEMPT CMPCPRO [ 69.907120] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.115-00006-g179ade8ce3-dirty #269 [ 69.915956] task: c0684310 task.stack: c06da000 [ 69.920470] NIP: c033e164 LR: c02e44d0 CTR: c02e41fc [ 69.925504] REGS: dfff1e20 TRAP: 0300 Not tainted (4.14.115-00006-g179ade8ce3-dirty) [ 69.934161] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 22004428 XER: 20000000 [ 69.940869] DAR: f0000100 DSISR: 20000000 [ 69.940869] GPR00: c0352d70 dfff1ed0 c0684310 f00000a4 00000040 dfff1f68 00000000 0000001f [ 69.940869] GPR08: df53f410 1cc00040 00000021 c0781640 42004424 100c82b6 f00000a4 df53f5b0 [ 69.940869] GPR16: df53f6c0 c05daf84 00000040 00000000 00000040 c0782be4 00000000 00000001 [ 69.940869] GPR24: 00000000 df53f400 000001b0 df53f410 df53f000 0000003f df708220 1cc00044 [ 69.978348] NIP [c033e164] skb_put+0x0/0x5c [ 69.982528] LR [c02e44d0] ucc_geth_poll+0x2d4/0x3f8 [ 69.987384] Call Trace: [ 69.989830] [dfff1ed0] [c02e4554] ucc_geth_poll+0x358/0x3f8 (unreliable) [ 69.996522] [dfff1f20] [c0352d70] net_rx_action+0x248/0x30c [ 70.002099] [dfff1f80] [c04e93e4] __do_softirq+0xfc/0x310 [ 70.007492] [dfff1fe0] [c0021124] irq_exit+0xd0/0xd4 [ 70.012458] [dfff1ff0] [c000e7e0] call_do_irq+0x24/0x3c [ 70.017683] [c06dbe80] [c0006bac] do_IRQ+0x64/0xc4 [ 70.022474] [c06dbea0] [c001097c] ret_from_except+0x0/0x14 [ 70.027964] --- interrupt: 501 at rcu_idle_exit+0x84/0x90 [ 70.027964] LR = rcu_idle_exit+0x74/0x90 [ 70.037585] [c06dbf60] [20000000] 0x20000000 (unreliable) [ 70.042984] [c06dbf80] [c004bb0c] do_idle+0xb4/0x11c [ 70.047945] [c06dbfa0] [c004bd14] cpu_startup_entry+0x18/0x1c [ 70.053682] [c06dbfb0] [c05fb034] start_kernel+0x370/0x384 [ 70.059153] [c06dbff0] [00003438] 0x3438 [ 70.063062] Instruction dump: [ 70.066023] 38a00000 38800000 90010014 4bfff015 80010014 
7c0803a6 3123ffff 7c691910 [ 70.073767] 38210010 4e800020 38600000 4e800020 <80e3005c> 80c30098 3107ffff 7d083910 [ 70.081690] ---[ end trace be7ccd9c1e1a9f12 ]---
This patch forbids the modification of the number of buffers in the ring while the interface is running.
Fixes: ac421852b3a0 ("ucc_geth: add ethtool support") Signed-off-by: Christophe Leroy christophe.leroy@c-s.fr Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/freescale/ucc_geth_ethtool.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
--- a/drivers/net/ethernet/freescale/ucc_geth_ethtool.c +++ b/drivers/net/ethernet/freescale/ucc_geth_ethtool.c @@ -253,14 +253,12 @@ uec_set_ringparam(struct net_device *net return -EINVAL; }
+ if (netif_running(netdev)) + return -EBUSY; + ug_info->bdRingLenRx[queue] = ring->rx_pending; ug_info->bdRingLenTx[queue] = ring->tx_pending;
- if (netif_running(netdev)) { - /* FIXME: restart automatically */ - netdev_info(netdev, "Please re-open the interface\n"); - } - return ret; }
From: YueHaibing yuehaibing@huawei.com
[ Upstream commit 36096f2f4fa05f7678bc87397665491700bae757 ]
kernel BUG at lib/list_debug.c:47! invalid opcode: 0000 [#1 CPU: 0 PID: 12914 Comm: rmmod Tainted: G W 5.1.0+ #47 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:__list_del_entry_valid+0x53/0x90 Code: 48 8b 32 48 39 fe 75 35 48 8b 50 08 48 39 f2 75 40 b8 01 00 00 00 5d c3 48 89 fe 48 89 c2 48 c7 c7 18 75 fe 82 e8 cb 34 78 ff <0f> 0b 48 89 fe 48 c7 c7 50 75 fe 82 e8 ba 34 78 ff 0f 0b 48 89 f2 RSP: 0018:ffffc90001c2fe40 EFLAGS: 00010286 RAX: 000000000000004e RBX: ffffffffa0184000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff888237a17788 RDI: 00000000ffffffff RBP: ffffc90001c2fe40 R08: 0000000000000000 R09: 0000000000000000 R10: ffffc90001c2fe10 R11: 0000000000000000 R12: 0000000000000000 R13: ffffc90001c2fe50 R14: ffffffffa0184000 R15: 0000000000000000 FS: 00007f3d83634540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000555c350ea818 CR3: 0000000231677000 CR4: 00000000000006f0 Call Trace: unregister_pernet_operations+0x34/0x120 unregister_pernet_subsys+0x1c/0x30 packet_exit+0x1c/0x369 [af_packet __x64_sys_delete_module+0x156/0x260 ? lockdep_hardirqs_on+0x133/0x1b0 ? do_syscall_64+0x12/0x1f0 do_syscall_64+0x6e/0x1f0 entry_SYSCALL_64_after_hwframe+0x49/0xbe
When af_packet is loaded with modprobe and register_pernet_subsys() fails, it cleans up and sets ops->list to LIST_POISON1, but the module init is still considered successful. Then, on rmmod, BUG() is triggered in __list_del_entry_valid(), which is called from unregister_pernet_subsys(). This patch fixes the error handling path in packet_init() to avoid such issues if an error occurs.
Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: YueHaibing yuehaibing@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -4523,14 +4523,29 @@ static void __exit packet_exit(void)
static int __init packet_init(void) { - int rc = proto_register(&packet_proto, 0); + int rc;
- if (rc != 0) + rc = proto_register(&packet_proto, 0); + if (rc) goto out; + rc = sock_register(&packet_family_ops); + if (rc) + goto out_proto; + rc = register_pernet_subsys(&packet_net_ops); + if (rc) + goto out_sock; + rc = register_netdevice_notifier(&packet_netdev_notifier); + if (rc) + goto out_pernet;
- sock_register(&packet_family_ops); - register_pernet_subsys(&packet_net_ops); - register_netdevice_notifier(&packet_netdev_notifier); + return 0; + +out_pernet: + unregister_pernet_subsys(&packet_net_ops); +out_sock: + sock_unregister(PF_PACKET); +out_proto: + proto_unregister(&packet_proto); out: return rc; }
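The error path now unwinds in strict reverse order of registration. A minimal sketch of that goto ladder, with four hypothetical register/unregister steps standing in for proto_register(), sock_register(), register_pernet_subsys() and register_netdevice_notifier(); fail_step selects which step fails (-1 for none). The label names follow the patch.

```c
/* Hypothetical stand-ins: slot s set means "step s is registered". */
int sketch_registered[4];

static int sketch_register(int step, int fail_step)
{
	if (step == fail_step)
		return -1;	/* simulate this registration failing */
	sketch_registered[step] = 1;
	return 0;
}

static void sketch_unregister(int step)
{
	sketch_registered[step] = 0;
}

/*
 * Same shape as the fixed packet_init(): each label undoes only the steps
 * that already succeeded, in reverse order.
 */
int sketch_init(int fail_step)
{
	int rc;

	rc = sketch_register(0, fail_step);	/* proto_register() */
	if (rc)
		goto out;
	rc = sketch_register(1, fail_step);	/* sock_register() */
	if (rc)
		goto out_proto;
	rc = sketch_register(2, fail_step);	/* register_pernet_subsys() */
	if (rc)
		goto out_sock;
	rc = sketch_register(3, fail_step);	/* register_netdevice_notifier() */
	if (rc)
		goto out_pernet;

	return 0;

out_pernet:
	sketch_unregister(2);
out_sock:
	sketch_unregister(1);
out_proto:
	sketch_unregister(0);
out:
	return rc;
}
```

Whichever step fails, every earlier registration is undone and the error is returned from init, so the module never loads in a half-registered state that module exit would later try to tear down.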
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 873017af778439f2f8e3d87f28ddb1fcaf244a76 ]
With NET_ADMIN enabled in a container, a normal user could be mapped to root and change the real device's rx filter via an ioctl on a vlan device, which would affect other PTP processes on the host. Fix it by disabling SIOCSHWTSTAMP in containers (that is, for vlan devices outside init_net).
Fixes: a6111d3c93d0 ("vlan: Pass SIOC[SG]HWTSTAMP ioctls to real device") Signed-off-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Richard Cochran richardcochran@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/8021q/vlan_dev.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -363,10 +363,12 @@ static int vlan_dev_ioctl(struct net_dev ifrr.ifr_ifru = ifr->ifr_ifru;
switch (cmd) { + case SIOCSHWTSTAMP: + if (!net_eq(dev_net(dev), &init_net)) + break; case SIOCGMIIPHY: case SIOCGMIIREG: case SIOCSMIIREG: - case SIOCSHWTSTAMP: case SIOCGHWTSTAMP: if (netif_device_present(real_dev) && ops->ndo_do_ioctl) err = ops->ndo_do_ioctl(real_dev, &ifrr, cmd);
From: Stephen Suryaputra ssuryaextr@gmail.com
[ Upstream commit ff6ab32bd4e073976e4d8797b4d514a172cfe6cb ]
The VRF netdev MTU isn't typically set and has an MTU of 65536. When the link of a tunnel is set, the tunnel MTU is changed from 1480 to the link MTU minus the tunnel header. When the VRF netdev is the link, the tunnel MTU therefore becomes 65516 (65536 minus the 20-byte IPv4 header). Fix it by not setting the tunnel MTU in this case.
Signed-off-by: Stephen Suryaputra ssuryaextr@gmail.com Reviewed-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/sit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -1076,7 +1076,7 @@ static void ipip6_tunnel_bind_dev(struct if (!tdev && tunnel->parms.link) tdev = __dev_get_by_index(tunnel->net, tunnel->parms.link);
- if (tdev) { + if (tdev && !netif_is_l3_master(tdev)) { int t_hlen = tunnel->hlen + sizeof(struct iphdr);
dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
From: David Ahern dsahern@gmail.com
[ Upstream commit 19e4e768064a87b073a4b4c138b55db70e0cfb9f ]
inet_iif should be used for the raw socket lookup. inet_iif considers rt_iif which handles the case of local traffic.
As it stands, ping to a local address with the '-I <dev>' option has failed ever since ping was changed to use SO_BINDTODEVICE instead of cmsg + IP_PKTINFO.
IPv6 works fine.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/raw.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -167,6 +167,7 @@ static int icmp_filter(const struct sock */ static int raw_v4_input(struct sk_buff *skb, const struct iphdr *iph, int hash) { + int dif = inet_iif(skb); struct sock *sk; struct hlist_head *head; int delivered = 0; @@ -179,8 +180,7 @@ static int raw_v4_input(struct sk_buff *
net = dev_net(skb->dev); sk = __raw_v4_lookup(net, __sk_head(head), iph->protocol, - iph->saddr, iph->daddr, - skb->dev->ifindex); + iph->saddr, iph->daddr, dif);
while (sk) { delivered = 1;
From: Jarod Wilson jarod@redhat.com
[ Upstream commit a9b8a2b39ce65df45687cf9ef648885c2a99fe75 ]
There's currently a problem with toggling arp_validate on and off with an active-backup bond. At the moment, you can start up a bond, like so:
modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_targets=192.168.1.1
ip link set bond0 down
echo "ens4f0" > /sys/class/net/bond0/bonding/slaves
echo "ens4f1" > /sys/class/net/bond0/bonding/slaves
ip link set bond0 up
ip addr add 192.168.1.2/24 dev bond0
Pings to 192.168.1.1 work just fine. Now turn on arp_validate:
echo 1 > /sys/class/net/bond0/bonding/arp_validate
Pings to 192.168.1.1 continue to work just fine. Now when you go to turn arp_validate off again, the link falls flat on its face:
echo 0 > /sys/class/net/bond0/bonding/arp_validate
dmesg
...
[133191.911987] bond0: Setting arp_validate to none (0)
[133194.257793] bond0: bond_should_notify_peers: slave ens4f0
[133194.258031] bond0: link status definitely down for interface ens4f0, disabling it
[133194.259000] bond0: making interface ens4f1 the new active one
[133197.330130] bond0: link status definitely down for interface ens4f1, disabling it
[133197.331191] bond0: now running without any active interface!
The problem lies in bond_options.c, where passing in arp_validate=0 results in bond->recv_probe getting set to NULL. This flies directly in the face of commit 3fe68df97c7f, which says we need to set recv_probe = bond_arp_rcv, even if we're not using arp_validate. Said commit fixed this in bond_option_arp_interval_set, but missed that we can get to that same state in bond_option_arp_validate_set as well.
One solution would be to universally set recv_probe = bond_arp_rcv here as well, but I don't think bond_option_arp_validate_set has any business touching recv_probe at all; that should be left to the arp_interval code, so we can just make things much tidier here.
Fixes: 3fe68df97c7f ("bonding: always set recv_probe to bond_arp_rcv in arp monitor") CC: Jay Vosburgh j.vosburgh@gmail.com CC: Veaceslav Falico vfalico@gmail.com CC: Andy Gospodarek andy@greyhouse.net CC: "David S. Miller" davem@davemloft.net CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson jarod@redhat.com Signed-off-by: Jay Vosburgh jay.vosburgh@canonical.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/bonding/bond_options.c | 7 ------- 1 file changed, 7 deletions(-)
--- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -1066,13 +1066,6 @@ static int bond_option_arp_validate_set( { netdev_info(bond->dev, "Setting arp_validate to %s (%llu)\n", newval->string, newval->value); - - if (bond->dev->flags & IFF_UP) { - if (!newval->value) - bond->recv_probe = NULL; - else if (bond->params.arp_interval) - bond->recv_probe = bond_arp_rcv; - } bond->params.arp_validate = newval->value;
return 0;
From: Dan Carpenter dan.carpenter@oracle.com
commit c8ea3663f7a8e6996d44500ee818c9330ac4fd88 upstream.
strndup_user() returns error pointers on error, and the error handling then passes those error pointers to kfree(), which causes an Oops.
Link: http://lkml.kernel.org/r/20181218082003.GD32567@kadam Fixes: 6db7199407ca ("drivers/virt: introduce Freescale hypervisor management driver") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Cc: Timur Tabi timur@freescale.com Cc: Mihai Caraman mihai.caraman@freescale.com Cc: Kumar Gala galak@kernel.crashing.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/virt/fsl_hypervisor.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-)
--- a/drivers/virt/fsl_hypervisor.c +++ b/drivers/virt/fsl_hypervisor.c @@ -335,8 +335,8 @@ static long ioctl_dtprop(struct fsl_hv_i struct fsl_hv_ioctl_prop param; char __user *upath, *upropname; void __user *upropval; - char *path = NULL, *propname = NULL; - void *propval = NULL; + char *path, *propname; + void *propval; int ret = 0;
/* Get the parameters from the user. */ @@ -348,32 +348,30 @@ static long ioctl_dtprop(struct fsl_hv_i upropval = (void __user *)(uintptr_t)param.propval;
path = strndup_user(upath, FH_DTPROP_MAX_PATHLEN); - if (IS_ERR(path)) { - ret = PTR_ERR(path); - goto out; - } + if (IS_ERR(path)) + return PTR_ERR(path);
propname = strndup_user(upropname, FH_DTPROP_MAX_PATHLEN); if (IS_ERR(propname)) { ret = PTR_ERR(propname); - goto out; + goto err_free_path; }
if (param.proplen > FH_DTPROP_MAX_PROPLEN) { ret = -EINVAL; - goto out; + goto err_free_propname; }
propval = kmalloc(param.proplen, GFP_KERNEL); if (!propval) { ret = -ENOMEM; - goto out; + goto err_free_propname; }
if (set) { if (copy_from_user(propval, upropval, param.proplen)) { ret = -EFAULT; - goto out; + goto err_free_propval; }
param.ret = fh_partition_set_dtprop(param.handle, @@ -392,7 +390,7 @@ static long ioctl_dtprop(struct fsl_hv_i if (copy_to_user(upropval, propval, param.proplen) || put_user(param.proplen, &p->proplen)) { ret = -EFAULT; - goto out; + goto err_free_propval; } } } @@ -400,10 +398,12 @@ static long ioctl_dtprop(struct fsl_hv_i if (put_user(param.ret, &p->ret)) ret = -EFAULT;
-out: - kfree(path); +err_free_propval: kfree(propval); +err_free_propname: kfree(propname); +err_free_path: + kfree(path);
return ret; }
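Why kfree() on an error pointer oopses: the kernel encodes small negative errno values as pointers in the last page of the address space (include/linux/err.h uses a MAX_ERRNO of 4095). The userspace re-creation below uses hypothetical sketch_* names and only illustrates the encoding. It shows that such a pointer is non-NULL, so the NULL check that makes kfree(NULL) harmless does not catch it. The patch avoids the problem by returning early or jumping only to labels that free what was actually allocated.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copy of the kernel's limit on encodable errno values. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno as a pointer value (like ERR_PTR()). */
static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode it back (like PTR_ERR()). */
static inline long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for pointers in the top SKETCH_MAX_ERRNO addresses (like IS_ERR()). */
static inline bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}
```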
From: Dan Carpenter dan.carpenter@oracle.com
commit 6a024330650e24556b8a18cc654ad00cfecf6c6c upstream.
The "param.count" value is a u64 that comes from the user. The code later in the function assumes that param.count is at least one; if it's not, it leads to an Oops when we dereference the ZERO_SIZE_PTR.
Also, the addition can overflow, which would lead us to allocate a smaller "pages" array than required. I can't immediately tell what the possible runtime implications are, but it's safest to prevent the overflow.
Link: http://lkml.kernel.org/r/20181218082129.GE32567@kadam Fixes: 6db7199407ca ("drivers/virt: introduce Freescale hypervisor management driver") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Cc: Timur Tabi timur@freescale.com Cc: Mihai Caraman mihai.caraman@freescale.com Cc: Kumar Gala galak@kernel.crashing.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/virt/fsl_hypervisor.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/virt/fsl_hypervisor.c +++ b/drivers/virt/fsl_hypervisor.c @@ -215,6 +215,9 @@ static long ioctl_memcpy(struct fsl_hv_i * hypervisor. */ lb_offset = param.local_vaddr & (PAGE_SIZE - 1); + if (param.count == 0 || + param.count > U64_MAX - lb_offset - PAGE_SIZE + 1) + return -EINVAL; num_pages = (param.count + lb_offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
/* Allocate the buffers we need */
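The new bound is the overflow test rearranged so that it cannot itself wrap: count + lb_offset + PAGE_SIZE - 1 fits in 64 bits exactly when count <= U64_MAX - lb_offset - PAGE_SIZE + 1, and since lb_offset < PAGE_SIZE the right-hand side never underflows. A userspace sketch of the check, assuming 4 KiB pages (SKETCH_PAGE_SHIFT is an illustrative stand-in for PAGE_SHIFT) and returning 0 where the patched ioctl returns -EINVAL.

```c
#include <stdint.h>

/* Illustrative stand-ins for PAGE_SHIFT/PAGE_SIZE, assuming 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE ((uint64_t)1 << SKETCH_PAGE_SHIFT)

/*
 * Number of pages spanned by a buffer of `count` bytes starting at
 * `local_vaddr`, or 0 where the patched ioctl_memcpy() returns -EINVAL.
 */
uint64_t sketch_num_pages(uint64_t count, uint64_t local_vaddr)
{
	uint64_t lb_offset = local_vaddr & (SKETCH_PAGE_SIZE - 1);

	/* Reject zero (ZERO_SIZE_PTR case) and any count whose rounded-up
	 * sum below would wrap around 64 bits. */
	if (count == 0 ||
	    count > UINT64_MAX - lb_offset - SKETCH_PAGE_SIZE + 1)
		return 0;

	return (count + lb_offset + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}
```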
From: Laurentiu Tudor laurentiu.tudor@nxp.com
commit 5266e58d6cd90ac85c187d673093ad9cb649e16d upstream.
Set RI in the default kernel MSR so that the architected way of detecting unrecoverable machine check interrupts has a chance to work. This is in line with the MSR setup of the rest of the booke powerpc architectures configured here.
Signed-off-by: Laurentiu Tudor laurentiu.tudor@nxp.com
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/include/asm/reg_booke.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -41,7 +41,7 @@
 #if defined(CONFIG_PPC_BOOK3E_64)
 #define MSR_64BIT	MSR_CM
 
-#define MSR_		(MSR_ME | MSR_CE)
+#define MSR_		(MSR_ME | MSR_RI | MSR_CE)
 #define MSR_KERNEL	(MSR_ | MSR_64BIT)
 #define MSR_USER32	(MSR_ | MSR_PR | MSR_EE)
 #define MSR_USER64	(MSR_USER32 | MSR_64BIT)
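A small sketch of what the one-line change does to the derived MSR values, and of the kind of test it enables. The bit positions below are assumptions chosen for illustration (check reg.h and reg_booke.h for the authoritative definitions), and `sk_mce_recoverable()` is a hypothetical stand-in for the handler's "is RI set in the saved MSR?" check, not the kernel's machine check code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed bit positions, for illustration only; verify against
 * arch/powerpc/include/asm/reg.h and reg_booke.h.
 */
#define SK_MSR_CM (UINT64_C(1) << 31) /* 64-bit computation mode (MSR_64BIT) */
#define SK_MSR_CE (UINT64_C(1) << 17) /* critical interrupt enable */
#define SK_MSR_EE (UINT64_C(1) << 15) /* external interrupt enable */
#define SK_MSR_PR (UINT64_C(1) << 14) /* problem (user) state */
#define SK_MSR_ME (UINT64_C(1) << 12) /* machine check enable */
#define SK_MSR_RI (UINT64_C(1) << 1)  /* recoverable interrupt */

/* MSR_ before and after the patch, plus the derived kernel/user values. */
#define SK_MSR_BASE_OLD (SK_MSR_ME | SK_MSR_CE)
#define SK_MSR_BASE_NEW (SK_MSR_ME | SK_MSR_RI | SK_MSR_CE)
#define SK_MSR_KERNEL(base) ((base) | SK_MSR_CM)
#define SK_MSR_USER64(base) ((base) | SK_MSR_PR | SK_MSR_EE | SK_MSR_CM)

/* Compile-time check that the patched default carries RI (C11 static_assert). */
static_assert((SK_MSR_KERNEL(SK_MSR_BASE_NEW) & SK_MSR_RI) != 0,
	      "patched kernel MSR must carry RI");

/*
 * Hypothetical model of the architected test: a machine check taken while
 * RI is clear in the saved MSR is treated as unrecoverable.
 */
static int sk_mce_recoverable(uint64_t saved_msr)
{
	return (saved_msr & SK_MSR_RI) != 0;
}
```

Under these assumed bit values, the old default kernel MSR never has RI set, so every machine check would look unrecoverable; with the patch both the kernel and the 64-bit user defaults inherit RI from MSR_.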
stable-rc/linux-4.4.y boot: 98 boots: 1 failed, 92 passed with 3 offline, 1 untried/unknown, 1 conflict (v4.4.179-267-gbe756dada5b7)
Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.4.y/kernel/v4.4.1...
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.4.y/kernel/v4.4.179-267-...
Tree: stable-rc
Branch: linux-4.4.y
Git Describe: v4.4.179-267-gbe756dada5b7
Git Commit: be756dada5b771fe51be37a77ad0bdfba543fdae
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 44 unique boards, 21 SoC families, 14 builds out of 190
Boot Regressions Detected:
arm:
omap2plus_defconfig: gcc-8: omap4-panda: lab-baylibre: new failure (last pass: v4.4.179-254-gce69be0d452a)
Boot Failure Detected:
arm64: defconfig: gcc-8: qcom-qdf2400: 1 failed lab
Offline Platforms:
arm:
tegra_defconfig: gcc-8 tegra20-iris-512: 1 offline lab
multi_v7_defconfig: gcc-8
    stih410-b2120: 1 offline lab
    tegra20-iris-512: 1 offline lab
Conflicting Boot Failure Detected: (These likely are not failures as other labs are reporting PASS. Needs review.)
arm: omap2plus_defconfig: omap4-panda:
    lab-baylibre: FAIL (gcc-8)
    lab-baylibre-seattle: PASS (gcc-8)
--- For more info write to info@kernelci.org
On Wed, May 15, 2019 at 07:47:45AM -0700, kernelci.org bot wrote:
Boot Regressions Detected:
arm:
omap2plus_defconfig: gcc-8: omap4-panda: lab-baylibre: new failure (last pass: v4.4.179-254-gce69be0d452a)
Odd, is this specific to this release?
Greg Kroah-Hartman gregkh@linuxfoundation.org writes:
On Wed, May 15, 2019 at 07:47:45AM -0700, kernelci.org bot wrote:
Boot Regressions Detected:
arm:
omap2plus_defconfig: gcc-8: omap4-panda: lab-baylibre: new failure (last pass: v4.4.179-254-gce69be0d452a)
Odd, is this specific to this release?
No, looks like a lab-specific hiccup.
A little bit further down in the original report (I know, not a useful place for it) was this:
Conflicting Boot Failure Detected: (These likely are not failures as other labs are reporting PASS. Needs review.)
arm: omap2plus_defconfig: omap4-panda:
    lab-baylibre: FAIL (gcc-8)
    lab-baylibre-seattle: PASS (gcc-8)
which means the same board passed in one lab but not the other, suggesting something lab-specific.
This is a bug in our email reports. Regressions should not be reported when there are conflicting results from labs.
Kevin
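The reporting rule stated above can be sketched as a small classifier over per-lab results. This is a minimal sketch assuming a simple PASS/FAIL/OFFLINE outcome per lab plus a "passed last time" flag; `sk_classify()` and its types are hypothetical and are not kernelci's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-lab outcome for one board/defconfig/compiler combination. */
enum sk_boot { SK_BOOT_PASS, SK_BOOT_FAIL, SK_BOOT_OFFLINE };

enum sk_verdict {
	SK_NO_DATA,	/* every lab was offline (or no results at all) */
	SK_PASSING,	/* at least one pass, no failures */
	SK_FAILING,	/* only failures, but the previous build did not pass either */
	SK_REGRESSION,	/* only failures, and the previous build passed */
	SK_CONFLICT,	/* some labs pass, others fail: needs review */
};

/*
 * Classify one board's results across labs. A conflict is never reported
 * as a regression, matching the rule stated in the message above.
 */
static enum sk_verdict sk_classify(const enum sk_boot *res, size_t n,
				   int passed_last_time)
{
	size_t passes = 0, fails = 0;

	assert(res != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (res[i] == SK_BOOT_PASS)
			passes++;
		else if (res[i] == SK_BOOT_FAIL)
			fails++;
		/* Offline labs contribute no evidence either way. */
	}

	if (passes && fails)
		return SK_CONFLICT;
	if (fails)
		return passed_last_time ? SK_REGRESSION : SK_FAILING;
	return passes ? SK_PASSING : SK_NO_DATA;
}
```

With the omap4-panda results from this report (FAIL in lab-baylibre, PASS in lab-baylibre-seattle), the sketch returns SK_CONFLICT rather than SK_REGRESSION.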
On Thu, May 16, 2019 at 03:47:25PM -0700, Kevin Hilman wrote:
This is a bug in our email reports. Regressions should not be reported when there are conflicting results from labs.
Ah, thanks for the explanation, that makes more sense.
greg k-h
On 5/15/19 3:51 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.180 release. There are 266 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Build results: total: 170 pass: 170 fail: 0
Qemu test results: total: 296 pass: 296 fail: 0
Guenter
On Wed, 15 May 2019 at 16:33, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.4.180 release. There are 266 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary ------------------------------------------------------------------------
kernel: 4.4.180-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.4.y
git commit: be756dada5b771fe51be37a77ad0bdfba543fdae
git describe: v4.4.179-267-gbe756dada5b7
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.179-267...
No regressions (compared to build v4.4.179)
No fixes (compared to build v4.4.179)
Ran 13304 total tests in the following environments and test suites.
Environments
--------------
- i386
- juno-r2
- arm64
- qemu_arm
- qemu_i386
- qemu_x86_64
- x15
- arm
- x86_64

Test Suites
-----------
* build
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-open-posix-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* perf
* prep-tmp-disk
* spectre-meltdown-checker-test
* kvm-unit-tests
* v4l2-compliance
* install-android-platform-tools-r2600
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
* ssuite
Summary ------------------------------------------------------------------------
kernel: 4.4.180-rc1
git repo: https://git.linaro.org/lkft/arm64-stable-rc.git
git branch: 4.4.180-rc1-hikey-20190515-440
git commit: 4acf8bfa73bb083efe32d6b2623a48f49e662657
git describe: 4.4.180-rc1-hikey-20190515-440
Test details: https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.1...
No regressions (compared to build 4.4.180-rc1-hikey-20190515-439)
No fixes (compared to build 4.4.180-rc1-hikey-20190515-439)
Ran 3043 total tests in the following environments and test suites.
Environments
--------------
- hi6220-hikey
- arm64
- qemu_arm64

Test Suites
-----------
* build
* install-android-platform-tools-r2600
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* perf
* spectre-meltdown-checker-test
* v4l2-compliance
On 15/05/2019 11:51, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.180 release. There are 266 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Boot regression detected for Tegra ...
Test results for stable-v4.4:
    6 builds: 6 pass, 0 fail
    15 boots: 6 pass, 9 fail
    8 tests: 8 pass, 0 fail
Linux version: 4.4.180-rc1-gbe756da
Boards tested: tegra124-jetson-tk1, tegra20-ventana, tegra30-cardhu-a04
Bisect is pointing to the following commit ...
# first bad commit: [7849d64a1700ddae1963ff22a77292e9fb5c2983] mm, vmstat: make quiet_vmstat lighter
Reverting this on top of v4.4.180-rc1 fixes the problem.
Crash observed ...
[ 17.155812] ------------[ cut here ]------------
[ 17.160431] kernel BUG at /home/jonathanh/workdir/tegra/mlt-linux_stable-4.4/kernel/mm/vmstat.c:1425!
[ 17.169632] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[ 17.175450] Modules linked in: ttm
[ 17.178859] CPU: 0 PID: 92 Comm: kworker/0:2 Not tainted 4.4.179-00160-g7849d64a1700 #8
[ 17.186843] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[ 17.193100] Workqueue: vmstat vmstat_update
[ 17.197279] task: ee14e700 ti: ee17a000 task.ti: ee17a000
[ 17.202663] PC is at vmstat_update+0x9c/0xa4
[ 17.206921] LR is at vmstat_update+0x94/0xa4
[ 17.211179] pc : [<c00cdd80>] lr : [<c00cdd78>] psr: 20000113
[ 17.211179] sp : ee17bef8 ip : 00000000 fp : eef91ac0
[ 17.222629] r10: 00000008 r9 : 00000000 r8 : 00000000
[ 17.227840] r7 : eef99900 r6 : eef91ac0 r5 : eef8f34c r4 : ee13dc00
[ 17.234350] r3 : 00000001 r2 : 0000000f r1 : c0a885e0 r0 : 00000001
[ 17.240861] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 17.247978] Control: 10c5387d Table: ad02006a DAC: 00000051
[ 17.253708] Process kworker/0:2 (pid: 92, stack limit = 0xee17a210)
[ 17.259957] Stack: (0xee17bef8 to 0xee17c000)
[ 17.264301] bee0: ee13dc00 eef8f34c
[ 17.272459] bf00: eef91ac0 c003b69c eef91ac0 ee17a038 c0a4ba60 eef91ac0 eef91ad4 ee17a038
[ 17.280618] bf20: c0a4ba60 ee13dc18 ee13dc00 00000008 eef91ac0 c003b8f8 00000000 c09f6100
[ 17.288778] bf40: c003b8b0 ee102a00 00000000 ee13dc00 c003b8b0 00000000 00000000 00000000
[ 17.296937] bf60: 00000000 c0040ad0 00000000 00000000 00000000 ee13dc00 00000000 00000000
[ 17.305094] bf80: ee17bf80 ee17bf80 00000000 00000000 ee17bf90 ee17bf90 ee17bfac ee102a00
[ 17.313253] bfa0: c00409d0 00000000 00000000 c000f650 00000000 00000000 00000000 00000000
[ 17.321412] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 17.329570] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 17.337733] [<c00cdd80>] (vmstat_update) from [<c003b69c>] (process_one_work+0x124/0x338)
[ 17.345893] [<c003b69c>] (process_one_work) from [<c003b8f8>] (worker_thread+0x48/0x4c4)
[ 17.353966] [<c003b8f8>] (worker_thread) from [<c0040ad0>] (kthread+0x100/0x118)
[ 17.361348] [<c0040ad0>] (kthread) from [<c000f650>] (ret_from_fork+0x14/0x24)
[ 17.368553] Code: e5930010 eb05c417 e3500000 08bd8070 (e7f001f2)
[ 17.374633] ---[ end trace 17cf004302766810 ]---
Cheers Jon
Hi Jon,
# first bad commit: [7849d64a1700ddae1963ff22a77292e9fb5c2983] mm, vmstat: make quiet_vmstat lighter
Reverting this on top of v4.4.180-rc1 fixes the problem.
I guess the patch depends on another change. I'll try to figure out what is missing.
Thanks, Daniel
On Thu, May 16, 2019 at 01:59:43PM +0200, Daniel Wagner wrote:
# first bad commit: [7849d64a1700ddae1963ff22a77292e9fb5c2983] mm, vmstat: make quiet_vmstat lighter
Reverting this on top of v4.4.180-rc1 fixes the problem.
I guess the patch depends on another change. I'll try to figure out what is missing.
Jon, thanks for the testing, I'll go drop this patch now from the final version.
Daniel, if you can come up with a working series, I'll be glad to take it. Or, I'd recommend you just move to a newer kernel :)
thanks,
greg k-h
Hi Greg,
On 16.05.19 18:49, Greg Kroah-Hartman wrote:
Jon, thanks for the testing, I'll go drop this patch now from the final version.
That's fine, I wanted to suggest this too. I have some time to look at this next week. So there is no hurry with this patch.
Daniel, if you can come up with a working series, I'll be glad to take it. Or, I'd recommend you just move to a newer kernel :)
Sure, I will see what is missing.
@Jon if I have something to test, would you have time to give it a try first?
There is someone constantly updating the v4.4.y tree, which makes me update the -rt patches all the time. Don't fear, I am not running 4.4.y, this is only for important infrastructure :)
Thanks, Daniel
On 17/05/2019 08:44, Daniel Wagner wrote:
@Jon if I have something to test, would you have time to give it a try first?
Yes, no problem.
Cheers Jon
On 5/15/19 4:51 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.180 release. There are 266 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
linux-stable-mirror@lists.linaro.org