This is the start of the stable review cycle for the 4.14.57 release. There are 92 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jul 22 12:13:50 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.57-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.57-rc1
Tejun Heo tj@kernel.org string: drop __must_check from strscpy() and restore strscpy() usages in cgroup
Marc Zyngier marc.zyngier@arm.com arm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID
Marc Zyngier marc.zyngier@arm.com arm64: KVM: Handle guest's ARCH_WORKAROUND_2 requests
Marc Zyngier marc.zyngier@arm.com arm64: KVM: Add ARCH_WORKAROUND_2 support for guests
Marc Zyngier marc.zyngier@arm.com arm64: KVM: Add HYP per-cpu accessors
Marc Zyngier marc.zyngier@arm.com arm64: ssbd: Add prctl interface for per-thread mitigation
Marc Zyngier marc.zyngier@arm.com arm64: ssbd: Introduce thread flag to control userspace mitigation
Marc Zyngier marc.zyngier@arm.com arm64: ssbd: Restore mitigation status on CPU resume
Marc Zyngier marc.zyngier@arm.com arm64: ssbd: Skip apply_ssbd if not using dynamic mitigation
Marc Zyngier marc.zyngier@arm.com arm64: ssbd: Add global mitigation state accessor
Marc Zyngier marc.zyngier@arm.com arm64: Add 'ssbd' command-line option
Marc Zyngier marc.zyngier@arm.com arm64: Add ARCH_WORKAROUND_2 probing
Marc Zyngier marc.zyngier@arm.com arm64: Add per-cpu infrastructure to call ARCH_WORKAROUND_2
Marc Zyngier marc.zyngier@arm.com arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1
Marc Zyngier marc.zyngier@arm.com arm/arm64: smccc: Add SMCCC-specific return codes
Christoffer Dall christoffer.dall@linaro.org KVM: arm64: Avoid storing the vcpu pointer on the stack
Marc Zyngier marc.zyngier@arm.com KVM: arm/arm64: Do not use kern_hyp_va() with kvm_vgic_global_state
Marc Zyngier marc.zyngier@arm.com arm64: alternatives: Add dynamic patching feature
James Morse james.morse@arm.com KVM: arm64: Stop save/restoring host tpidr_el1 on VHE
James Morse james.morse@arm.com arm64: alternatives: use tpidr_el2 on VHE hosts
James Morse james.morse@arm.com KVM: arm64: Change hyp_panic()s dependency on tpidr_el2
James Morse james.morse@arm.com KVM: arm/arm64: Convert kvm_host_cpu_state to a static per-cpu allocation
James Morse james.morse@arm.com KVM: arm64: Store vcpu on the stack during __guest_enter()
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.
Santosh Shilimkar santosh.shilimkar@oracle.com rds: avoid unenecessary cong_update in loop transport
Jan Kara jack@suse.cz bdi: Fix another oops in wb_workfn()
Florian Westphal fw@strlen.de netfilter: ipv6: nf_defrag: drop skb dst before queueing
Willem de Bruijn willemb@google.com nsh: set mac len based on inner packet
Tomas Bortoli tomasbortoli@gmail.com autofs: fix slab out of bounds read in getname_kernel()
Dave Watson davejwatson@fb.com tls: Stricter error checking in zerocopy sendmsg path
Eric Biggers ebiggers@google.com KEYS: DNS: fix parsing multiple options
Eric Biggers ebiggers@google.com reiserfs: fix buffer overflow with long warning messages
Florian Westphal fw@strlen.de netfilter: ebtables: reject non-bridge targets
Dexuan Cui decui@microsoft.com PCI: hv: Disable/enable IRQs rather than BH in hv_compose_msi_msg()
Alan Jenkins alan.christopher.jenkins@gmail.com block: do not use interruptible wait anywhere
Masahiro Yamada yamada.masahiro@socionext.com mtd: rawnand: denali_dt: set clk_x_rate to 200 MHz unconditionally
Stephan Mueller smueller@chronox.de crypto: af_alg - Initialize sg_num_bytes in error code path
Peter Zijlstra peterz@infradead.org clocksource: Initialize cs->wd_list
Sean Young sean@mess.org media: rc: oops in ir_timer_keyup after device unplug
Mathias Nyman mathias.nyman@linux.intel.com xhci: Fix USB3 NULL pointer dereference at logical disconnect.
Stefan Wahren stefan.wahren@i2se.com net: lan78xx: Fix race in tx pending skb size calculation
Ping-Ke Shih pkshih@realtek.com rtlwifi: rtl8821ae: fix firmware is not ready to run
Ping-Ke Shih pkshih@realtek.com rtlwifi: Fix kernel Oops "Fw download fail!!"
Gustavo A. R. Silva gustavo@embeddedor.com net: cxgb3_main: fix potential Spectre v1
Claudio Imbrenda imbrenda@linux.vnet.ibm.com VSOCK: fix loopback on big-endian systems
Jason Wang jasowang@redhat.com vhost_net: validate sock before trying to put its fd
Ilpo Järvinen ilpo.jarvinen@helsinki.fi tcp: prevent bogus FRTO undos with non-SACK flows
Yuchung Cheng ycheng@google.com tcp: fix Fast Open key endianness
Doron Roberts-Kedes doronrk@fb.com strparser: Remove early eaten to fix full tcp receive buffer stall
Bhadram Varka vbhadram@nvidia.com stmmac: fix DMA channel hang in half-duplex mode
Jiri Slaby jslaby@suse.cz r8152: napi hangup fix after disconnect
Aleksander Morgado aleksander@aleksander.es qmi_wwan: add support for the Dell Wireless 5821e module
Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com qed: Limit msix vectors in kdump kernel to the minimum required count.
Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com qed: Fix use of incorrect size in memcpy call.
Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com qed: Fix setting of incorrect eswitch mode.
Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com qede: Adverstise software timestamp caps when PHC is not available.
David Ahern dsahern@gmail.com net/tcp: Fix socket lookups with SO_BINDTODEVICE
Eric Dumazet edumazet@google.com net: sungem: fix rx checksum support
Konstantin Khlebnikov khlebnikov@yandex-team.ru net_sched: blackhole: tell upper qdisc about dropped packets
Eric Dumazet edumazet@google.com net/packet: fix use-after-free
Antoine Tenart antoine.tenart@bootlin.com net: mvneta: fix the Rx desc DMA address in the Rx path
Shay Agroskin shayag@mellanox.com net/mlx5: Fix wrong size allocation for QoS ETC TC regitster
Eli Cohen eli@mellanox.com net/mlx5: Fix required capability for manipulating MPFS
Alex Vesker valex@mellanox.com net/mlx5: Fix incorrect raw command length parsing
Alex Vesker valex@mellanox.com net/mlx5: Fix command interface race in polling mode
Or Gerlitz ogerlitz@mellanox.com net/mlx5: E-Switch, Avoid setup attempt if not being e-switch manager
Or Gerlitz ogerlitz@mellanox.com net/mlx5e: Don't attempt to dereference the ppriv struct if not being eswitch manager
Or Gerlitz ogerlitz@mellanox.com net/mlx5e: Avoid dealing with vport representors if not being e-switch manager
Harini Katakam harini.katakam@xilinx.com net: macb: Fix ptp time adjustment for large negative delta
Sabrina Dubroca sd@queasysnail.net net: fix use-after-free in GRO with ESP
Eric Dumazet edumazet@google.com net: dccp: switch rx_tstamp_last_feedback to monotonic clock
Eric Dumazet edumazet@google.com net: dccp: avoid crash in ccid3_hc_rx_send_feedback()
Jesper Dangaard Brouer brouer@redhat.com ixgbe: split XDP_TX tail and XDP_REDIRECT map flushing
Xin Long lucien.xin@gmail.com ipvlan: fix IFLA_MTU ignored on NEWLINK
Eric Biggers ebiggers@google.com ipv6: sr: fix passing wrong flags to crypto_alloc_shash()
Stephen Hemminger sthemmin@microsoft.com hv_netvsc: split sub-channel setup into async and sync
Gustavo A. R. Silva gustavo@embeddedor.com atm: zatm: Fix potential Spectre v1
David Woodhouse dwmw2@infradead.org atm: Preserve value of skb->truesize when accounting to vcc
Sabrina Dubroca sd@queasysnail.net alx: take rtnl before calling __alx_open from resume
Christian Lamparter chunkeey@googlemail.com crypto: crypto4xx - fix crypto4xx_build_pdr, crypto4xx_build_sdr leak
Christian Lamparter chunkeey@googlemail.com crypto: crypto4xx - remove bad list_del
Jaehoon Chung jh80.chung@samsung.com PCI: exynos: Fix a potential init_clk_resources NULL pointer dereference
Jonas Gorski jonas.gorski@gmail.com bcm63xx_enet: do not write to random DMA channel on BCM6345
Jonas Gorski jonas.gorski@gmail.com bcm63xx_enet: correct clock usage
alex chen alex.chen@huawei.com ocfs2: ip_alloc_sem should be taken in ocfs2_get_block()
alex chen alex.chen@huawei.com ocfs2: subsystem.su_mutex is required while accessing the item->ci_parent
Chuck Lever chuck.lever@oracle.com xprtrdma: Fix corner cases when handling device removal
Prashanth Prakash pprakash@codeaurora.org cpufreq / CPPC: Set platform specific transition_delay_us
Filipe Manana fdmanana@suse.com Btrfs: fix duplicate extents after fsync of file with prealloc extents
Nick Desaulniers ndesaulniers@google.com x86/paravirt: Make native_save_fl() extern inline
H. Peter Anvin hpa@linux.intel.com x86/asm: Add _ASM_ARG* constants for argument registers to <asm/asm.h>
Nick Desaulniers ndesaulniers@google.com compiler-gcc.h: Add __attribute__((gnu_inline)) to all inline declarations
-------------
Diffstat:
Documentation/admin-guide/kernel-parameters.txt | 17 ++ Makefile | 4 +- arch/arm/include/asm/kvm_host.h | 12 ++ arch/arm/include/asm/kvm_mmu.h | 12 ++ arch/arm64/Kconfig | 9 ++ arch/arm64/include/asm/alternative.h | 43 ++++- arch/arm64/include/asm/assembler.h | 8 + arch/arm64/include/asm/cpucaps.h | 3 +- arch/arm64/include/asm/cpufeature.h | 22 +++ arch/arm64/include/asm/kvm_asm.h | 41 +++++ arch/arm64/include/asm/kvm_host.h | 43 +++++ arch/arm64/include/asm/kvm_mmu.h | 44 +++++ arch/arm64/include/asm/percpu.h | 11 +- arch/arm64/include/asm/thread_info.h | 1 + arch/arm64/kernel/Makefile | 1 + arch/arm64/kernel/alternative.c | 52 ++++-- arch/arm64/kernel/asm-offsets.c | 2 + arch/arm64/kernel/cpu_errata.c | 180 +++++++++++++++++++++ arch/arm64/kernel/cpufeature.c | 17 ++ arch/arm64/kernel/entry.S | 30 ++++ arch/arm64/kernel/hibernate.c | 11 ++ arch/arm64/kernel/ssbd.c | 108 +++++++++++++ arch/arm64/kernel/suspend.c | 8 + arch/arm64/kvm/hyp-init.S | 4 + arch/arm64/kvm/hyp/entry.S | 12 +- arch/arm64/kvm/hyp/hyp-entry.S | 62 +++++-- arch/arm64/kvm/hyp/switch.c | 64 ++++++-- arch/arm64/kvm/hyp/sysreg-sr.c | 21 ++- arch/arm64/kvm/reset.c | 4 + arch/arm64/mm/proc.S | 8 + arch/x86/include/asm/asm.h | 59 +++++++ arch/x86/include/asm/irqflags.h | 2 +- arch/x86/kernel/Makefile | 1 + arch/x86/kernel/irqflags.S | 26 +++ block/blk-core.c | 9 +- crypto/af_alg.c | 4 +- drivers/atm/zatm.c | 2 + drivers/cpufreq/cppc_cpufreq.c | 46 +++++- drivers/crypto/amcc/crypto4xx_core.c | 23 ++- drivers/media/rc/rc-main.c | 4 +- drivers/mtd/nand/denali_dt.c | 6 +- drivers/net/ethernet/atheros/alx/main.c | 8 +- drivers/net/ethernet/broadcom/bcm63xx_enet.c | 34 ++-- drivers/net/ethernet/cadence/macb_ptp.c | 5 +- drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c | 2 + drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 24 +-- drivers/net/ethernet/marvell/mvneta.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 8 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 12 +- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 8 +- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 2 + .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 4 +- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 3 +- drivers/net/ethernet/mellanox/mlx5/core/fw.c | 5 +- drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c | 9 +- drivers/net/ethernet/mellanox/mlx5/core/port.c | 4 +- drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 7 +- drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 8 +- drivers/net/ethernet/qlogic/qed/qed_dev.c | 2 +- drivers/net/ethernet/qlogic/qed/qed_main.c | 8 + drivers/net/ethernet/qlogic/qed/qed_sriov.c | 19 ++- drivers/net/ethernet/qlogic/qede/qede_ptp.c | 10 +- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 ++ drivers/net/ethernet/sun/sungem.c | 22 +-- drivers/net/geneve.c | 2 +- drivers/net/hyperv/hyperv_net.h | 2 +- drivers/net/hyperv/netvsc.c | 37 ++++- drivers/net/hyperv/netvsc_drv.c | 17 +- drivers/net/hyperv/rndis_filter.c | 61 ++----- drivers/net/ipvlan/ipvlan_main.c | 3 +- drivers/net/usb/lan78xx.c | 5 +- drivers/net/usb/qmi_wwan.c | 1 + drivers/net/usb/r8152.c | 3 +- drivers/net/vxlan.c | 4 +- drivers/net/wireless/realtek/rtlwifi/base.c | 17 +- drivers/net/wireless/realtek/rtlwifi/base.h | 2 +- drivers/net/wireless/realtek/rtlwifi/core.c | 3 +- drivers/net/wireless/realtek/rtlwifi/pci.c | 2 +- drivers/net/wireless/realtek/rtlwifi/ps.c | 4 +- drivers/net/wireless/realtek/rtlwifi/usb.c | 2 +- drivers/pci/dwc/pci-exynos.c | 3 +- drivers/pci/host/pci-hyperv.c | 8 +- drivers/usb/host/xhci-hub.c | 2 +- drivers/vhost/net.c | 3 +- fs/autofs4/dev-ioctl.c | 22 +-- fs/btrfs/tree-log.c | 137 +++++++++++++--- fs/ocfs2/aops.c | 26 ++- fs/ocfs2/cluster/nodemanager.c | 63 +++++++- fs/reiserfs/prints.c | 141 +++++++++------- include/linux/arm-smccc.h | 10 ++ include/linux/atmdev.h | 15 ++ include/linux/backing-dev-defs.h | 2 +- include/linux/compiler-gcc.h | 29 +++- include/linux/mlx5/mlx5_ifc.h | 2 +- include/linux/netdevice.h | 20 +++ include/linux/string.h | 2 +- kernel/time/clocksource.c | 2 + mm/backing-dev.c | 20 +-- net/8021q/vlan.c | 2 +- net/atm/br2684.c | 3 +- net/atm/clip.c | 3 +- net/atm/common.c | 3 +- net/atm/lec.c | 3 +- net/atm/mpc.c | 3 +- net/atm/pppoatm.c | 3 +- net/atm/raw.c | 4 +- net/bridge/netfilter/ebtables.c | 13 ++ net/dccp/ccids/ccid3.c | 16 +- net/dns_resolver/dns_key.c | 28 ++-- net/ipv4/fou.c | 4 +- net/ipv4/gre_offload.c | 2 +- net/ipv4/inet_hashtables.c | 4 +- net/ipv4/sysctl_net_ipv4.c | 18 ++- net/ipv4/tcp_input.c | 9 ++ net/ipv4/udp_offload.c | 2 +- net/ipv6/inet6_hashtables.c | 4 +- net/ipv6/netfilter/nf_conntrack_reasm.c | 2 + net/ipv6/seg6_hmac.c | 2 +- net/nfc/llcp_commands.c | 9 +- net/nsh/nsh.c | 2 +- net/packet/af_packet.c | 16 +- net/rds/loop.c | 1 + net/rds/rds.h | 5 + net/rds/recv.c | 5 + net/sched/sch_blackhole.c | 2 +- net/strparser/strparser.c | 17 +- net/sunrpc/xprtrdma/verbs.c | 13 +- net/tls/tls_sw.c | 2 +- net/vmw_vsock/virtio_transport.c | 2 +- virt/kvm/arm/arm.c | 22 +-- virt/kvm/arm/hyp/vgic-v2-sr.c | 2 +- virt/kvm/arm/psci.c | 18 ++- 133 files changed, 1698 insertions(+), 472 deletions(-)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nick Desaulniers ndesaulniers@google.com
commit d03db2bc26f0e4a6849ad649a09c9c73fccdc656 upstream.
Functions marked extern inline do not emit an externally visible function when the gnu89 C standard is used. Some KBUILD Makefiles overwrite KBUILD_CFLAGS. This is an issue for GCC 5.1+ users as without an explicit C standard specified, the default is gnu11. Since c99, the semantics of extern inline have changed such that an externally visible function is always emitted. This can lead to multiple definition errors of extern inline functions at link time of compilation units whose build files have removed an explicit C standard compiler flag for users of GCC 5.1+ or Clang.
Suggested-by: Arnd Bergmann arnd@arndb.de Suggested-by: H. Peter Anvin hpa@zytor.com Suggested-by: Joe Perches joe@perches.com Signed-off-by: Nick Desaulniers ndesaulniers@google.com Acked-by: Juergen Gross jgross@suse.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: acme@redhat.com Cc: akataria@vmware.com Cc: akpm@linux-foundation.org Cc: andrea.parri@amarulasolutions.com Cc: ard.biesheuvel@linaro.org Cc: aryabinin@virtuozzo.com Cc: astrachan@google.com Cc: boris.ostrovsky@oracle.com Cc: brijesh.singh@amd.com Cc: caoj.fnst@cn.fujitsu.com Cc: geert@linux-m68k.org Cc: ghackmann@google.com Cc: gregkh@linuxfoundation.org Cc: jan.kiszka@siemens.com Cc: jarkko.sakkinen@linux.intel.com Cc: jpoimboe@redhat.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: kstewart@linuxfoundation.org Cc: linux-efi@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Cc: manojgupta@google.com Cc: mawilcox@microsoft.com Cc: michal.lkml@markovi.net Cc: mjg59@google.com Cc: mka@chromium.org Cc: pombredanne@nexb.com Cc: rientjes@google.com Cc: rostedt@goodmis.org Cc: sedat.dilek@gmail.com Cc: thomas.lendacky@amd.com Cc: tstellar@redhat.com Cc: tweek@google.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: yamada.masahiro@socionext.com Link: http://lkml.kernel.org/r/20180621162324.36656-2-ndesaulniers@google.com Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/compiler-gcc.h | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-)
--- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -66,25 +66,40 @@ #endif
/* + * Feature detection for gnu_inline (gnu89 extern inline semantics). Either + * __GNUC_STDC_INLINE__ is defined (not using gnu89 extern inline semantics, + * and we opt in to the gnu89 semantics), or __GNUC_STDC_INLINE__ is not + * defined so the gnu89 semantics are the default. + */ +#ifdef __GNUC_STDC_INLINE__ +# define __gnu_inline __attribute__((gnu_inline)) +#else +# define __gnu_inline +#endif + +/* * Force always-inline if the user requests it so via the .config, * or if gcc is too old. * GCC does not warn about unused static inline functions for * -Wunused-function. This turns out to avoid the need for complex #ifdef * directives. Suppress the warning in clang as well by using "unused" * function attribute, which is redundant but not harmful for gcc. + * Prefer gnu_inline, so that extern inline functions do not emit an + * externally visible function. This makes extern inline behave as per gnu89 + * semantics rather than c99. This prevents multiple symbol definition errors + * of extern inline functions at link time. + * A lot of inline functions can cause havoc with function tracing. */ #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) || \ !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4) -#define inline inline __attribute__((always_inline,unused)) notrace -#define __inline__ __inline__ __attribute__((always_inline,unused)) notrace -#define __inline __inline __attribute__((always_inline,unused)) notrace +#define inline \ + inline __attribute__((always_inline, unused)) notrace __gnu_inline #else -/* A lot of inline functions can cause havoc with function tracing */ -#define inline inline __attribute__((unused)) notrace -#define __inline__ __inline__ __attribute__((unused)) notrace -#define __inline __inline __attribute__((unused)) notrace +#define inline inline __attribute__((unused)) notrace __gnu_inline #endif
+#define __inline__ inline +#define __inline inline #define __always_inline inline __attribute__((always_inline)) #define noinline __attribute__((noinline))
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: H. Peter Anvin hpa@linux.intel.com
commit 0e2e160033283e20f688d8bad5b89460cc5bfcc4 upstream.
i386 and x86-64 uses different registers for arguments; make them available so we don't have to #ifdef in the actual code.
Native size and specified size (q, l, w, b) versions are provided.
Signed-off-by: H. Peter Anvin hpa@linux.intel.com Signed-off-by: Nick Desaulniers ndesaulniers@google.com Reviewed-by: Sedat Dilek sedat.dilek@gmail.com Acked-by: Juergen Gross jgross@suse.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: acme@redhat.com Cc: akataria@vmware.com Cc: akpm@linux-foundation.org Cc: andrea.parri@amarulasolutions.com Cc: ard.biesheuvel@linaro.org Cc: arnd@arndb.de Cc: aryabinin@virtuozzo.com Cc: astrachan@google.com Cc: boris.ostrovsky@oracle.com Cc: brijesh.singh@amd.com Cc: caoj.fnst@cn.fujitsu.com Cc: geert@linux-m68k.org Cc: ghackmann@google.com Cc: gregkh@linuxfoundation.org Cc: jan.kiszka@siemens.com Cc: jarkko.sakkinen@linux.intel.com Cc: joe@perches.com Cc: jpoimboe@redhat.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: kstewart@linuxfoundation.org Cc: linux-efi@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Cc: manojgupta@google.com Cc: mawilcox@microsoft.com Cc: michal.lkml@markovi.net Cc: mjg59@google.com Cc: mka@chromium.org Cc: pombredanne@nexb.com Cc: rientjes@google.com Cc: rostedt@goodmis.org Cc: thomas.lendacky@amd.com Cc: tstellar@redhat.com Cc: tweek@google.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: yamada.masahiro@socionext.com Link: http://lkml.kernel.org/r/20180621162324.36656-3-ndesaulniers@google.com Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/asm.h | 59 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+)
--- a/arch/x86/include/asm/asm.h +++ b/arch/x86/include/asm/asm.h @@ -46,6 +46,65 @@ #define _ASM_SI __ASM_REG(si) #define _ASM_DI __ASM_REG(di)
+#ifndef __x86_64__ +/* 32 bit */ + +#define _ASM_ARG1 _ASM_AX +#define _ASM_ARG2 _ASM_DX +#define _ASM_ARG3 _ASM_CX + +#define _ASM_ARG1L eax +#define _ASM_ARG2L edx +#define _ASM_ARG3L ecx + +#define _ASM_ARG1W ax +#define _ASM_ARG2W dx +#define _ASM_ARG3W cx + +#define _ASM_ARG1B al +#define _ASM_ARG2B dl +#define _ASM_ARG3B cl + +#else +/* 64 bit */ + +#define _ASM_ARG1 _ASM_DI +#define _ASM_ARG2 _ASM_SI +#define _ASM_ARG3 _ASM_DX +#define _ASM_ARG4 _ASM_CX +#define _ASM_ARG5 r8 +#define _ASM_ARG6 r9 + +#define _ASM_ARG1Q rdi +#define _ASM_ARG2Q rsi +#define _ASM_ARG3Q rdx +#define _ASM_ARG4Q rcx +#define _ASM_ARG5Q r8 +#define _ASM_ARG6Q r9 + +#define _ASM_ARG1L edi +#define _ASM_ARG2L esi +#define _ASM_ARG3L edx +#define _ASM_ARG4L ecx +#define _ASM_ARG5L r8d +#define _ASM_ARG6L r9d + +#define _ASM_ARG1W di +#define _ASM_ARG2W si +#define _ASM_ARG3W dx +#define _ASM_ARG4W cx +#define _ASM_ARG5W r8w +#define _ASM_ARG6W r9w + +#define _ASM_ARG1B dil +#define _ASM_ARG2B sil +#define _ASM_ARG3B dl +#define _ASM_ARG4B cl +#define _ASM_ARG5B r8b +#define _ASM_ARG6B r9b + +#endif + /* * Macros to generate condition code outputs from inline assembly, * The output operand must be type "bool".
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nick Desaulniers ndesaulniers@google.com
commit d0a8d9378d16eb3c69bd8e6d23779fbdbee3a8c7 upstream.
native_save_fl() is marked static inline, but by using it as a function pointer in arch/x86/kernel/paravirt.c, it MUST be outlined.
paravirt's use of native_save_fl() also requires that no GPRs other than %rax are clobbered.
Compilers have different heuristics which they use to emit stack guard code, the emittance of which can break paravirt's callee saved assumption by clobbering %rcx.
Marking a function definition extern inline means that if this version cannot be inlined, then the out-of-line version will be preferred. By having the out-of-line version be implemented in assembly, it cannot be instrumented with a stack protector, which might violate custom calling conventions that code like paravirt rely on.
The semantics of extern inline has changed since gnu89. This means that folks using GCC versions >= 5.1 may see symbol redefinition errors at link time for subdirs that override KBUILD_CFLAGS (making the C standard used implicit) regardless of this patch. This has been cleaned up earlier in the patch set, but is left as a note in the commit message for future travelers.
Reports: https://lkml.org/lkml/2018/5/7/534 https://github.com/ClangBuiltLinux/linux/issues/16
Discussion: https://bugs.llvm.org/show_bug.cgi?id=37512 https://lkml.org/lkml/2018/5/24/1371
Thanks to the many folks that participated in the discussion.
Debugged-by: Alistair Strachan astrachan@google.com Debugged-by: Matthias Kaehlcke mka@chromium.org Suggested-by: Arnd Bergmann arnd@arndb.de Suggested-by: H. Peter Anvin hpa@zytor.com Suggested-by: Tom Stellar tstellar@redhat.com Reported-by: Sedat Dilek sedat.dilek@gmail.com Tested-by: Sedat Dilek sedat.dilek@gmail.com Signed-off-by: Nick Desaulniers ndesaulniers@google.com Acked-by: Juergen Gross jgross@suse.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: acme@redhat.com Cc: akataria@vmware.com Cc: akpm@linux-foundation.org Cc: andrea.parri@amarulasolutions.com Cc: ard.biesheuvel@linaro.org Cc: aryabinin@virtuozzo.com Cc: astrachan@google.com Cc: boris.ostrovsky@oracle.com Cc: brijesh.singh@amd.com Cc: caoj.fnst@cn.fujitsu.com Cc: geert@linux-m68k.org Cc: ghackmann@google.com Cc: gregkh@linuxfoundation.org Cc: jan.kiszka@siemens.com Cc: jarkko.sakkinen@linux.intel.com Cc: joe@perches.com Cc: jpoimboe@redhat.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: kstewart@linuxfoundation.org Cc: linux-efi@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Cc: manojgupta@google.com Cc: mawilcox@microsoft.com Cc: michal.lkml@markovi.net Cc: mjg59@google.com Cc: mka@chromium.org Cc: pombredanne@nexb.com Cc: rientjes@google.com Cc: rostedt@goodmis.org Cc: thomas.lendacky@amd.com Cc: tweek@google.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: yamada.masahiro@socionext.com Link: http://lkml.kernel.org/r/20180621162324.36656-4-ndesaulniers@google.com Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/irqflags.h | 2 +- arch/x86/kernel/Makefile | 1 + arch/x86/kernel/irqflags.S | 26 ++++++++++++++++++++++++++ 3 files changed, 28 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -13,7 +13,7 @@ * Interrupt control: */
-static inline unsigned long native_save_fl(void) +extern inline unsigned long native_save_fl(void) { unsigned long flags;
--- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -58,6 +58,7 @@ obj-y += alternative.o i8253.o pci-nom obj-y += tsc.o tsc_msr.o io_delay.o rtc.o obj-y += pci-iommu_table.o obj-y += resource.o +obj-y += irqflags.o
obj-y += process.o obj-y += fpu/ --- /dev/null +++ b/arch/x86/kernel/irqflags.S @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#include <asm/asm.h> +#include <asm/export.h> +#include <linux/linkage.h> + +/* + * unsigned long native_save_fl(void) + */ +ENTRY(native_save_fl) + pushf + pop %_ASM_AX + ret +ENDPROC(native_save_fl) +EXPORT_SYMBOL(native_save_fl) + +/* + * void native_restore_fl(unsigned long flags) + * %eax/%rdi: flags + */ +ENTRY(native_restore_fl) + push %_ASM_ARG1 + popf + ret +ENDPROC(native_restore_fl) +EXPORT_SYMBOL(native_restore_fl)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit 31d11b83b96faaee4bb514d375a09489117c3e8d upstream.
In commit 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay"), on fsync, we started to always log all prealloc extents beyond an inode's i_size in order to avoid losing them after a power failure. However under some cases this can lead to the log replay code to create duplicate extent items, with different lengths, in the extent tree. That happens because, as of that commit, we can now log extent items based on extent maps that are not on the "modified" list of extent maps of the inode's extent map tree. Logging extent items based on extent maps is used during the fast fsync path to save time and for this to work reliably it requires that the extent maps are not merged with other adjacent extent maps - having the extent maps in the list of modified extents gives such guarantee.
Consider the following example, captured during a long run of fsstress, which illustrates this problem.
We have inode 271, in the filesystem tree (root 5), for which all of the following operations and discussion apply to.
A buffered write starts at offset 312391 with a length of 933471 bytes (end offset at 1245862). At this point we have, for this inode, the following extent maps with the their field values:
em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613, block_len 0, orig_block_len 0 em B, start 40960, orig_start 40960, len 376832, block_start 1106399232, block_len 376832, orig_block_len 376832 em C, start 417792, orig_start 417792, len 782336, block_start 18446744073709551613, block_len 0, orig_block_len 0 em D, start 1200128, orig_start 1200128, len 835584, block_start 1106776064, block_len 835584, orig_block_len 835584 em E, start 2035712, orig_start 2035712, len 245760, block_start 1107611648, block_len 245760, orig_block_len 245760
Extent map A corresponds to a hole and extent maps D and E correspond to preallocated extents.
Extent map D ends where extent map E begins (1106776064 + 835584 = 1107611648), but these extent maps were not merged because they are in the inode's list of modified extent maps.
An fsync against this inode is made, which triggers the fast path (BTRFS_INODE_NEEDS_FULL_SYNC is not set). This fsync triggers writeback of the data previously written using buffered IO, and when the respective ordered extent finishes, btrfs_drop_extents() is called against the (aligned) range 311296..1249279. This causes a split of extent map D at btrfs_drop_extent_cache(), replacing extent map D with a new extent map D', also added to the list of modified extents, with the following values:
em D', start 1249280, orig_start of 1200128, block_start 1106825216 (= 1106776064 + 1249280 - 1200128), orig_block_len 835584, block_len 786432 (835584 - (1249280 - 1200128))
Then, during the fast fsync, btrfs_log_changed_extents() is called and extent maps D' and E are removed from the list of modified extents. The flag EXTENT_FLAG_LOGGING is also set on them. After the extents are logged clear_em_logging() is called on each of them, and that makes extent map E to be merged with extent map D' (try_merge_map()), resulting in D' being deleted and E adjusted to:
em E, start 1249280, orig_start 1200128, len 1032192, block_start 1106825216, block_len 1032192, orig_block_len 245760
A direct IO write at offset 1847296 and length of 360448 bytes (end offset at 2207744) starts, and at that moment the following extent maps exist for our inode:
em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613, block_len 0, orig_block_len 0 em B, start 40960, orig_start 40960, len 270336, block_start 1106399232, block_len 270336, orig_block_len 376832 em C, start 311296, orig_start 311296, len 937984, block_start 1112842240, block_len 937984, orig_block_len 937984 em E (prealloc), start 1249280, orig_start 1200128, len 1032192, block_start 1106825216, block_len 1032192, orig_block_len 245760
The dio write results in drop_extent_cache() being called twice. The first time for a range that starts at offset 1847296 and ends at offset 2035711 (length of 188416), which results in a double split of extent map E, replacing it with two new extent maps:
em F, start 1249280, orig_start 1200128, block_start 1106825216, block_len 598016, orig_block_len 598016 em G, start 2035712, orig_start 1200128, block_start 1107611648, block_len 245760, orig_block_len 1032192
It also creates a new extent map that represents a part of the requested IO (through create_io_em()):
em H, start 1847296, len 188416, block_start 1107423232, block_len 188416
The second call to drop_extent_cache() has a range with a start offset of 2035712 and end offset of 2207743 (length of 172032). This leads to replacing extent map G with a new extent map I with the following values:
em I, start 2207744, orig_start 1200128, block_start 1107783680, block_len 73728, orig_block_len 1032192
It also creates a new extent map that represents the second part of the requested IO (through create_io_em()):
em J, start 2035712, len 172032, block_start 1107611648, block_len 172032
The dio write set the inode's i_size to 2207744 bytes.
After the dio write the inode has the following extent maps:
em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613, block_len 0, orig_block_len 0 em B, start 40960, orig_start 40960, len 270336, block_start 1106399232, block_len 270336, orig_block_len 376832 em C, start 311296, orig_start 311296, len 937984, block_start 1112842240, block_len 937984, orig_block_len 937984 em F, start 1249280, orig_start 1200128, len 598016, block_start 1106825216, block_len 598016, orig_block_len 598016 em H, start 1847296, orig_start 1200128, len 188416, block_start 1107423232, block_len 188416, orig_block_len 835584 em J, start 2035712, orig_start 2035712, len 172032, block_start 1107611648, block_len 172032, orig_block_len 245760 em I, start 2207744, orig_start 1200128, len 73728, block_start 1107783680, block_len 73728, orig_block_len 1032192
Now do some change to the file, like adding a xattr for example and then fsync it again. This triggers a fast fsync path, and as of commit 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay"), we use the extent map I to log a file extent item because it's a prealloc extent and it starts at an offset matching the inode's i_size. However when we log it, we create a file extent item with a value for the disk byte location that is wrong, as can be seen from the following output of "btrfs inspect-internal dump-tree":
item 1 key (271 EXTENT_DATA 2207744) itemoff 3782 itemsize 53 generation 22 type 2 (prealloc) prealloc data disk byte 1106776064 nr 1032192 prealloc data offset 1007616 nr 73728
Here the disk byte value corresponds to calculation based on some fields from the extent map I:
1106776064 = block_start (1107783680) - 1007616 (extent_offset) extent_offset = 2207744 (start) - 1200128 (orig_start) = 1007616
The disk byte value of 1106776064 clashes with disk byte values of the file extent items at offsets 1249280 and 1847296 in the fs tree:
item 6 key (271 EXTENT_DATA 1249280) itemoff 3568 itemsize 53 generation 20 type 2 (prealloc) prealloc data disk byte 1106776064 nr 835584 prealloc data offset 49152 nr 598016 item 7 key (271 EXTENT_DATA 1847296) itemoff 3515 itemsize 53 generation 20 type 1 (regular) extent data disk byte 1106776064 nr 835584 extent data offset 647168 nr 188416 ram 835584 extent compression 0 (none) item 8 key (271 EXTENT_DATA 2035712) itemoff 3462 itemsize 53 generation 20 type 1 (regular) extent data disk byte 1107611648 nr 245760 extent data offset 0 nr 172032 ram 245760 extent compression 0 (none) item 9 key (271 EXTENT_DATA 2207744) itemoff 3409 itemsize 53 generation 20 type 2 (prealloc) prealloc data disk byte 1107611648 nr 245760 prealloc data offset 172032 nr 73728
Instead of the disk byte value of 1106776064, the value of 1107611648 should have been logged. Also the data offset value should have been 172032 and not 1007616. After a log replay we end up getting two extent items in the extent tree with different lengths, one of 835584, which is correct and existed before the log replay, and another one of 1032192 which is wrong and is based on the logged file extent item:
item 12 key (1106776064 EXTENT_ITEM 835584) itemoff 3406 itemsize 53 refs 2 gen 15 flags DATA extent data backref root 5 objectid 271 offset 1200128 count 2 item 13 key (1106776064 EXTENT_ITEM 1032192) itemoff 3353 itemsize 53 refs 1 gen 22 flags DATA extent data backref root 5 objectid 271 offset 1200128 count 1
Obviously this leads to many problems and a filesystem check reports many errors:
(...) checking extents Extent back ref already exists for 1106776064 parent 0 root 5 owner 271 offset 1200128 num_refs 1 extent item 1106776064 has multiple extent items ref mismatch on [1106776064 835584] extent item 2, found 3 Incorrect local backref count on 1106776064 root 5 owner 271 offset 1200128 found 2 wanted 1 back 0x55b1d0ad7680 Backref 1106776064 root 5 owner 271 offset 1200128 num_refs 0 not found in extent tree Incorrect local backref count on 1106776064 root 5 owner 271 offset 1200128 found 1 wanted 0 back 0x55b1d0ad4e70 Backref bytes do not match extent backref, bytenr=1106776064, ref bytes=835584, backref bytes=1032192 backpointer mismatch on [1106776064 835584] checking free space cache block group 1103101952 has wrong amount of free space failed to load free space cache for block group 1103101952 checking fs roots (...)
So fix this by logging the prealloc extents beyond the inode's i_size based on searches in the subvolume tree instead of the extent maps.
Fixes: 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/tree-log.c | 137 ++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 112 insertions(+), 25 deletions(-)
--- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -4214,6 +4214,110 @@ static int log_one_extent(struct btrfs_t return ret; }
+/* + * Log all prealloc extents beyond the inode's i_size to make sure we do not + * lose them after doing a fast fsync and replaying the log. We scan the + * subvolume's root instead of iterating the inode's extent map tree because + * otherwise we can log incorrect extent items based on extent map conversion. + * That can happen due to the fact that extent maps are merged when they + * are not in the extent map tree's list of modified extents. + */ +static int btrfs_log_prealloc_extents(struct btrfs_trans_handle *trans, + struct btrfs_inode *inode, + struct btrfs_path *path) +{ + struct btrfs_root *root = inode->root; + struct btrfs_key key; + const u64 i_size = i_size_read(&inode->vfs_inode); + const u64 ino = btrfs_ino(inode); + struct btrfs_path *dst_path = NULL; + u64 last_extent = (u64)-1; + int ins_nr = 0; + int start_slot; + int ret; + + if (!(inode->flags & BTRFS_INODE_PREALLOC)) + return 0; + + key.objectid = ino; + key.type = BTRFS_EXTENT_DATA_KEY; + key.offset = i_size; + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); + if (ret < 0) + goto out; + + while (true) { + struct extent_buffer *leaf = path->nodes[0]; + int slot = path->slots[0]; + + if (slot >= btrfs_header_nritems(leaf)) { + if (ins_nr > 0) { + ret = copy_items(trans, inode, dst_path, path, + &last_extent, start_slot, + ins_nr, 1, 0); + if (ret < 0) + goto out; + ins_nr = 0; + } + ret = btrfs_next_leaf(root, path); + if (ret < 0) + goto out; + if (ret > 0) { + ret = 0; + break; + } + continue; + } + + btrfs_item_key_to_cpu(leaf, &key, slot); + if (key.objectid > ino) + break; + if (WARN_ON_ONCE(key.objectid < ino) || + key.type < BTRFS_EXTENT_DATA_KEY || + key.offset < i_size) { + path->slots[0]++; + continue; + } + if (last_extent == (u64)-1) { + last_extent = key.offset; + /* + * Avoid logging extent items logged in past fsync calls + * and leading to duplicate keys in the log tree. + */ + do { + ret = btrfs_truncate_inode_items(trans, + root->log_root, + &inode->vfs_inode, + i_size, + BTRFS_EXTENT_DATA_KEY); + } while (ret == -EAGAIN); + if (ret) + goto out; + } + if (ins_nr == 0) + start_slot = slot; + ins_nr++; + path->slots[0]++; + if (!dst_path) { + dst_path = btrfs_alloc_path(); + if (!dst_path) { + ret = -ENOMEM; + goto out; + } + } + } + if (ins_nr > 0) { + ret = copy_items(trans, inode, dst_path, path, &last_extent, + start_slot, ins_nr, 1, 0); + if (ret > 0) + ret = 0; + } +out: + btrfs_release_path(path); + btrfs_free_path(dst_path); + return ret; +} + static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_inode *inode, @@ -4256,6 +4360,11 @@ static int btrfs_log_changed_extents(str if (em->generation <= test_gen) continue;
+ /* We log prealloc extents beyond eof later. */ + if (test_bit(EXTENT_FLAG_PREALLOC, &em->flags) && + em->start >= i_size_read(&inode->vfs_inode)) + continue; + if (em->start < logged_start) logged_start = em->start; if ((em->start + em->len - 1) > logged_end) @@ -4268,31 +4377,6 @@ static int btrfs_log_changed_extents(str num++; }
- /* - * Add all prealloc extents beyond the inode's i_size to make sure we - * don't lose them after doing a fast fsync and replaying the log. - */ - if (inode->flags & BTRFS_INODE_PREALLOC) { - struct rb_node *node; - - for (node = rb_last(&tree->map); node; node = rb_prev(node)) { - em = rb_entry(node, struct extent_map, rb_node); - if (em->start < i_size_read(&inode->vfs_inode)) - break; - if (!list_empty(&em->list)) - continue; - /* Same as above loop. */ - if (++num > 32768) { - list_del_init(&tree->modified_extents); - ret = -EFBIG; - goto process; - } - refcount_inc(&em->refs); - set_bit(EXTENT_FLAG_LOGGING, &em->flags); - list_add_tail(&em->list, &extents); - } - } - list_sort(NULL, &extents, extent_cmp); btrfs_get_logged_extents(inode, logged_list, logged_start, logged_end); /* @@ -4337,6 +4421,9 @@ process: up_write(&inode->dio_sem);
btrfs_release_path(path); + if (!ret) + ret = btrfs_log_prealloc_extents(trans, inode, path); + return ret; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Prashanth Prakash pprakash@codeaurora.org
commit d4f3388afd488ed15368fa7413b8bd6d1f98bb1d upstream.
Add support to specify platform specific transition_delay_us instead of using the transition delay derived from PCC.
With commit 3d41386d556d (cpufreq: CPPC: Use transition_delay_us depending transition_latency) we are setting transition_delay_us directly and not applying the LATENCY_MULTIPLIER. Because of that, on Qualcomm Centriq we can end up with a very high rate of frequency change requests when using the schedutil governor (default rate_limit_us=10 compared to an earlier value of 10000).
The PCC subspace describes the rate at which the platform can accept commands on the CPPC's PCC channel. This includes read and write command on the PCC channel that can be used for reasons other than frequency transitions. Moreover the same PCC subspace can be used by multiple freq domains and deriving transition_delay_us from it as we do now can be sub-optimal.
Moreover if a platform does not use PCC for desired_perf register then there is no way to compute the transition latency or the delay_us.
CPPC does not have a standard defined mechanism to get the transition rate or the latency at the moment.
Given the above limitations, it is simpler to have a platform specific transition_delay_us and rely on PCC derived value only if a platform specific value is not available.
Signed-off-by: Prashanth Prakash pprakash@codeaurora.org Cc: 4.14+ stable@vger.kernel.org # 4.14+ Fixes: 3d41386d556d (cpufreq: CPPC: Use transition_delay_us depending transition_latency) Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/cpufreq/cppc_cpufreq.c | 46 +++++++++++++++++++++++++++++++++++++++-- 1 file changed, 44 insertions(+), 2 deletions(-)
--- a/drivers/cpufreq/cppc_cpufreq.c +++ b/drivers/cpufreq/cppc_cpufreq.c @@ -126,6 +126,49 @@ static void cppc_cpufreq_stop_cpu(struct cpu->perf_caps.lowest_perf, cpu_num, ret); }
+/* + * The PCC subspace describes the rate at which platform can accept commands + * on the shared PCC channel (including READs which do not count towards freq + * trasition requests), so ideally we need to use the PCC values as a fallback + * if we don't have a platform specific transition_delay_us + */ +#ifdef CONFIG_ARM64 +#include <asm/cputype.h> + +static unsigned int cppc_cpufreq_get_transition_delay_us(int cpu) +{ + unsigned long implementor = read_cpuid_implementor(); + unsigned long part_num = read_cpuid_part_number(); + unsigned int delay_us = 0; + + switch (implementor) { + case ARM_CPU_IMP_QCOM: + switch (part_num) { + case QCOM_CPU_PART_FALKOR_V1: + case QCOM_CPU_PART_FALKOR: + delay_us = 10000; + break; + default: + delay_us = cppc_get_transition_latency(cpu) / NSEC_PER_USEC; + break; + } + break; + default: + delay_us = cppc_get_transition_latency(cpu) / NSEC_PER_USEC; + break; + } + + return delay_us; +} + +#else + +static unsigned int cppc_cpufreq_get_transition_delay_us(int cpu) +{ + return cppc_get_transition_latency(cpu) / NSEC_PER_USEC; +} +#endif + static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy) { struct cppc_cpudata *cpu; @@ -163,8 +206,7 @@ static int cppc_cpufreq_cpu_init(struct policy->cpuinfo.max_freq = cppc_dmi_max_khz;
policy->cpuinfo.transition_latency = cppc_get_transition_latency(cpu_num); - policy->transition_delay_us = cppc_get_transition_latency(cpu_num) / - NSEC_PER_USEC; + policy->transition_delay_us = cppc_cpufreq_get_transition_delay_us(cpu_num); policy->shared_type = cpu->shared_type;
if (policy->shared_type == CPUFREQ_SHARED_TYPE_ANY) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
commit 25524288631fc5b7d33259fca1e0dc38146be5d6 upstream.
Michal Kalderon has found some corner cases around device unload with active NFS mounts that I didn't have the imagination to test when xprtrdma device removal was added last year.
- The ULP device removal handler is responsible for deallocating the PD. That wasn't clear to me initially, and my own testing suggested it was not necessary, but that is incorrect.
- The transport destruction path can no longer assume that there is a valid ID.
- When destroying a transport, ensure that ib_free_cq() is not invoked on a CQ that was already released.
Reported-by: Michal Kalderon Michal.Kalderon@cavium.com Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA from ...") Signed-off-by: Chuck Lever chuck.lever@oracle.com Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sunrpc/xprtrdma/verbs.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
--- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -270,7 +270,6 @@ rpcrdma_conn_upcall(struct rdma_cm_id *i wait_for_completion(&ia->ri_remove_done);
ia->ri_id = NULL; - ia->ri_pd = NULL; ia->ri_device = NULL; /* Return 1 to ensure the core destroys the id. */ return 1; @@ -464,7 +463,9 @@ rpcrdma_ia_remove(struct rpcrdma_ia *ia) ia->ri_id->qp = NULL; } ib_free_cq(ep->rep_attr.recv_cq); + ep->rep_attr.recv_cq = NULL; ib_free_cq(ep->rep_attr.send_cq); + ep->rep_attr.send_cq = NULL;
/* The ULP is responsible for ensuring all DMA * mappings and MRs are gone. @@ -477,6 +478,8 @@ rpcrdma_ia_remove(struct rpcrdma_ia *ia) rpcrdma_dma_unmap_regbuf(req->rl_recvbuf); } rpcrdma_destroy_mrs(buf); + ib_dealloc_pd(ia->ri_pd); + ia->ri_pd = NULL;
/* Allow waiters to continue */ complete(&ia->ri_remove_done); @@ -650,14 +653,16 @@ rpcrdma_ep_destroy(struct rpcrdma_ep *ep
cancel_delayed_work_sync(&ep->rep_connect_worker);
- if (ia->ri_id->qp) { + if (ia->ri_id && ia->ri_id->qp) { rpcrdma_ep_disconnect(ep, ia); rdma_destroy_qp(ia->ri_id); ia->ri_id->qp = NULL; }
- ib_free_cq(ep->rep_attr.recv_cq); - ib_free_cq(ep->rep_attr.send_cq); + if (ep->rep_attr.recv_cq) + ib_free_cq(ep->rep_attr.recv_cq); + if (ep->rep_attr.send_cq) + ib_free_cq(ep->rep_attr.send_cq); }
/* Re-establish a connection after a device removal event.
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: alex chen alex.chen@huawei.com
commit 853bc26a7ea39e354b9f8889ae7ad1492ffa28d2 upstream.
The subsystem.su_mutex is required while accessing the item->ci_parent, otherwise, NULL pointer dereference to the item->ci_parent will be triggered in the following situation:
add node delete node sys_write vfs_write configfs_write_file o2nm_node_store o2nm_node_local_write do_rmdir vfs_rmdir configfs_rmdir mutex_lock(&subsys->su_mutex); unlink_obj item->ci_group = NULL; item->ci_parent = NULL; to_o2nm_cluster_from_node node->nd_item.ci_parent->ci_parent BUG since of NULL pointer dereference to nd_item.ci_parent
Moreover, the o2nm_cluster also should be protected by the subsystem.su_mutex.
[alex.chen@huawei.com: v2] Link: http://lkml.kernel.org/r/59EEAA69.9080703@huawei.com Link: http://lkml.kernel.org/r/59E9B36A.10700@huawei.com Signed-off-by: Alex Chen alex.chen@huawei.com Reviewed-by: Jun Piao piaojun@huawei.com Reviewed-by: Joseph Qi jiangqi903@gmail.com Cc: Mark Fasheh mfasheh@versity.com Cc: Joel Becker jlbec@evilplan.org Cc: Junxiao Bi junxiao.bi@oracle.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Cc: Salvatore Bonaccorso carnil@debian.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ocfs2/cluster/nodemanager.c | 63 +++++++++++++++++++++++++++++++++++------ 1 file changed, 55 insertions(+), 8 deletions(-)
--- a/fs/ocfs2/cluster/nodemanager.c +++ b/fs/ocfs2/cluster/nodemanager.c @@ -40,6 +40,9 @@ char *o2nm_fence_method_desc[O2NM_FENCE_ "panic", /* O2NM_FENCE_PANIC */ };
+static inline void o2nm_lock_subsystem(void); +static inline void o2nm_unlock_subsystem(void); + struct o2nm_node *o2nm_get_node_by_num(u8 node_num) { struct o2nm_node *node = NULL; @@ -181,7 +184,10 @@ static struct o2nm_cluster *to_o2nm_clus { /* through the first node_set .parent * mycluster/nodes/mynode == o2nm_cluster->o2nm_node_group->o2nm_node */ - return to_o2nm_cluster(node->nd_item.ci_parent->ci_parent); + if (node->nd_item.ci_parent) + return to_o2nm_cluster(node->nd_item.ci_parent->ci_parent); + else + return NULL; }
enum { @@ -194,7 +200,7 @@ static ssize_t o2nm_node_num_store(struc size_t count) { struct o2nm_node *node = to_o2nm_node(item); - struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node); + struct o2nm_cluster *cluster; unsigned long tmp; char *p = (char *)page; int ret = 0; @@ -214,6 +220,13 @@ static ssize_t o2nm_node_num_store(struc !test_bit(O2NM_NODE_ATTR_PORT, &node->nd_set_attributes)) return -EINVAL; /* XXX */
+ o2nm_lock_subsystem(); + cluster = to_o2nm_cluster_from_node(node); + if (!cluster) { + o2nm_unlock_subsystem(); + return -EINVAL; + } + write_lock(&cluster->cl_nodes_lock); if (cluster->cl_nodes[tmp]) ret = -EEXIST; @@ -226,6 +239,8 @@ static ssize_t o2nm_node_num_store(struc set_bit(tmp, cluster->cl_nodes_bitmap); } write_unlock(&cluster->cl_nodes_lock); + o2nm_unlock_subsystem(); + if (ret) return ret;
@@ -269,7 +284,7 @@ static ssize_t o2nm_node_ipv4_address_st size_t count) { struct o2nm_node *node = to_o2nm_node(item); - struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node); + struct o2nm_cluster *cluster; int ret, i; struct rb_node **p, *parent; unsigned int octets[4]; @@ -286,6 +301,13 @@ static ssize_t o2nm_node_ipv4_address_st be32_add_cpu(&ipv4_addr, octets[i] << (i * 8)); }
+ o2nm_lock_subsystem(); + cluster = to_o2nm_cluster_from_node(node); + if (!cluster) { + o2nm_unlock_subsystem(); + return -EINVAL; + } + ret = 0; write_lock(&cluster->cl_nodes_lock); if (o2nm_node_ip_tree_lookup(cluster, ipv4_addr, &p, &parent)) @@ -298,6 +320,8 @@ static ssize_t o2nm_node_ipv4_address_st rb_insert_color(&node->nd_ip_node, &cluster->cl_node_ip_tree); } write_unlock(&cluster->cl_nodes_lock); + o2nm_unlock_subsystem(); + if (ret) return ret;
@@ -315,7 +339,7 @@ static ssize_t o2nm_node_local_store(str size_t count) { struct o2nm_node *node = to_o2nm_node(item); - struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node); + struct o2nm_cluster *cluster; unsigned long tmp; char *p = (char *)page; ssize_t ret; @@ -333,17 +357,26 @@ static ssize_t o2nm_node_local_store(str !test_bit(O2NM_NODE_ATTR_PORT, &node->nd_set_attributes)) return -EINVAL; /* XXX */
+ o2nm_lock_subsystem(); + cluster = to_o2nm_cluster_from_node(node); + if (!cluster) { + ret = -EINVAL; + goto out; + } + /* the only failure case is trying to set a new local node * when a different one is already set */ if (tmp && tmp == cluster->cl_has_local && - cluster->cl_local_node != node->nd_num) - return -EBUSY; + cluster->cl_local_node != node->nd_num) { + ret = -EBUSY; + goto out; + }
/* bring up the rx thread if we're setting the new local node. */ if (tmp && !cluster->cl_has_local) { ret = o2net_start_listening(node); if (ret) - return ret; + goto out; }
if (!tmp && cluster->cl_has_local && @@ -358,7 +391,11 @@ static ssize_t o2nm_node_local_store(str cluster->cl_local_node = node->nd_num; }
- return count; + ret = count; + +out: + o2nm_unlock_subsystem(); + return ret; }
CONFIGFS_ATTR(o2nm_node_, num); @@ -738,6 +775,16 @@ static struct o2nm_cluster_group o2nm_cl }, };
+static inline void o2nm_lock_subsystem(void) +{ + mutex_lock(&o2nm_cluster_group.cs_subsys.su_mutex); +} + +static inline void o2nm_unlock_subsystem(void) +{ + mutex_unlock(&o2nm_cluster_group.cs_subsys.su_mutex); +} + int o2nm_depend_item(struct config_item *item) { return configfs_depend_item(&o2nm_cluster_group.cs_subsys, item);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: alex chen alex.chen@huawei.com
commit 3e4c56d41eef5595035872a2ec5a483f42e8917f upstream.
ip_alloc_sem should be taken in ocfs2_get_block() when reading file in DIRECT mode to prevent concurrent access to extent tree with ocfs2_dio_end_io_write(), which may cause BUGON in the following situation:
read file 'A' end_io of writing file 'A' vfs_read __vfs_read ocfs2_file_read_iter generic_file_read_iter ocfs2_direct_IO __blockdev_direct_IO do_blockdev_direct_IO do_direct_IO get_more_blocks ocfs2_get_block ocfs2_extent_map_get_blocks ocfs2_get_clusters ocfs2_get_clusters_nocache() ocfs2_search_extent_list return the index of record which contains the v_cluster, that is v_cluster > rec[i]->e_cpos. ocfs2_dio_end_io ocfs2_dio_end_io_write down_write(&oi->ip_alloc_sem); ocfs2_mark_extent_written ocfs2_change_extent_flag ocfs2_split_extent ... --> modify the rec[i]->e_cpos, resulting in v_cluster < rec[i]->e_cpos. BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos))
[alex.chen@huawei.com: v3] Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com Fixes: c15471f79506 ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Alex Chen alex.chen@huawei.com Reviewed-by: Jun Piao piaojun@huawei.com Reviewed-by: Joseph Qi jiangqi903@gmail.com Reviewed-by: Gang He ghe@suse.com Acked-by: Changwei Ge ge.changwei@h3c.com Cc: Mark Fasheh mfasheh@versity.com Cc: Joel Becker jlbec@evilplan.org Cc: Junxiao Bi junxiao.bi@oracle.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Cc: Salvatore Bonaccorso carnil@debian.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ocfs2/aops.c | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-)
--- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -134,6 +134,19 @@ bail: return err; }
+static int ocfs2_lock_get_block(struct inode *inode, sector_t iblock, + struct buffer_head *bh_result, int create) +{ + int ret = 0; + struct ocfs2_inode_info *oi = OCFS2_I(inode); + + down_read(&oi->ip_alloc_sem); + ret = ocfs2_get_block(inode, iblock, bh_result, create); + up_read(&oi->ip_alloc_sem); + + return ret; +} + int ocfs2_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create) { @@ -2128,7 +2141,7 @@ static void ocfs2_dio_free_write_ctx(str * called like this: dio->get_blocks(dio->inode, fs_startblk, * fs_count, map_bh, dio->rw == WRITE); */ -static int ocfs2_dio_get_block(struct inode *inode, sector_t iblock, +static int ocfs2_dio_wr_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create) { struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); @@ -2154,12 +2167,9 @@ static int ocfs2_dio_get_block(struct in * while file size will be changed. */ if (pos + total_len <= i_size_read(inode)) { - down_read(&oi->ip_alloc_sem); - /* This is the fast path for re-write. */ - ret = ocfs2_get_block(inode, iblock, bh_result, create); - - up_read(&oi->ip_alloc_sem);
+ /* This is the fast path for re-write. */ + ret = ocfs2_lock_get_block(inode, iblock, bh_result, create); if (buffer_mapped(bh_result) && !buffer_new(bh_result) && ret == 0) @@ -2424,9 +2434,9 @@ static ssize_t ocfs2_direct_IO(struct ki return 0;
if (iov_iter_rw(iter) == READ) - get_block = ocfs2_get_block; + get_block = ocfs2_lock_get_block; else - get_block = ocfs2_dio_get_block; + get_block = ocfs2_dio_wr_get_block;
return __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter, get_block,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonas Gorski jonas.gorski@gmail.com
commit 9c86b846ce02f7e35d7234cf090b80553eba5389 upstream.
Check the return code of prepare_enable and change one last instance of enable only to prepare_enable. Also properly disable and release the clock in error paths and on remove for enetsw.
Signed-off-by: Jonas Gorski jonas.gorski@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/broadcom/bcm63xx_enet.c | 31 ++++++++++++++++++++------- 1 file changed, 23 insertions(+), 8 deletions(-)
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c +++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c @@ -1773,7 +1773,9 @@ static int bcm_enet_probe(struct platfor ret = PTR_ERR(priv->mac_clk); goto out; } - clk_prepare_enable(priv->mac_clk); + ret = clk_prepare_enable(priv->mac_clk); + if (ret) + goto out_put_clk_mac;
/* initialize default and fetch platform data */ priv->rx_ring_size = BCMENET_DEF_RX_DESC; @@ -1805,9 +1807,11 @@ static int bcm_enet_probe(struct platfor if (IS_ERR(priv->phy_clk)) { ret = PTR_ERR(priv->phy_clk); priv->phy_clk = NULL; - goto out_put_clk_mac; + goto out_disable_clk_mac; } - clk_prepare_enable(priv->phy_clk); + ret = clk_prepare_enable(priv->phy_clk); + if (ret) + goto out_put_clk_phy; }
/* do minimal hardware init to be able to probe mii bus */ @@ -1901,13 +1905,16 @@ out_free_mdio: out_uninit_hw: /* turn off mdc clock */ enet_writel(priv, 0, ENET_MIISC_REG); - if (priv->phy_clk) { + if (priv->phy_clk) clk_disable_unprepare(priv->phy_clk); + +out_put_clk_phy: + if (priv->phy_clk) clk_put(priv->phy_clk); - }
-out_put_clk_mac: +out_disable_clk_mac: clk_disable_unprepare(priv->mac_clk); +out_put_clk_mac: clk_put(priv->mac_clk); out: free_netdev(dev); @@ -2752,7 +2759,9 @@ static int bcm_enetsw_probe(struct platf ret = PTR_ERR(priv->mac_clk); goto out_unmap; } - clk_enable(priv->mac_clk); + ret = clk_prepare_enable(priv->mac_clk); + if (ret) + goto out_put_clk;
priv->rx_chan = 0; priv->tx_chan = 1; @@ -2773,7 +2782,7 @@ static int bcm_enetsw_probe(struct platf
ret = register_netdev(dev); if (ret) - goto out_put_clk; + goto out_disable_clk;
netif_carrier_off(dev); platform_set_drvdata(pdev, dev); @@ -2782,6 +2791,9 @@ static int bcm_enetsw_probe(struct platf
return 0;
+out_disable_clk: + clk_disable_unprepare(priv->mac_clk); + out_put_clk: clk_put(priv->mac_clk);
@@ -2813,6 +2825,9 @@ static int bcm_enetsw_remove(struct plat res = platform_get_resource(pdev, IORESOURCE_MEM, 0); release_mem_region(res->start, resource_size(res));
+ clk_disable_unprepare(priv->mac_clk); + clk_put(priv->mac_clk); + free_netdev(dev); return 0; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonas Gorski jonas.gorski@gmail.com
commit d6213c1f2ad54a964b77471690264ed685718928 upstream.
The DMA controller regs actually point to DMA channel 0, so the write to ENETDMA_CFG_REG will actually modify a random DMA channel.
Since DMA controller registers do not exist on BCM6345, guard the write with the usual check for dma_has_sram.
Signed-off-by: Jonas Gorski jonas.gorski@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/broadcom/bcm63xx_enet.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c +++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c @@ -1062,7 +1062,8 @@ static int bcm_enet_open(struct net_devi val = enet_readl(priv, ENET_CTL_REG); val |= ENET_CTL_ENABLE_MASK; enet_writel(priv, val, ENET_CTL_REG); - enet_dma_writel(priv, ENETDMA_CFG_EN_MASK, ENETDMA_CFG_REG); + if (priv->dma_has_sram) + enet_dma_writel(priv, ENETDMA_CFG_EN_MASK, ENETDMA_CFG_REG); enet_dmac_writel(priv, priv->dma_chan_en_mask, ENETDMAC_CHANCFG, priv->rx_chan);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jaehoon Chung jh80.chung@samsung.com
commit b5d6bc90c9129279d363ccbc02ad11e7b657c0b4 upstream.
In order to avoid triggering a NULL pointer dereference in exynos_pcie_probe() a check must be put in place to detect if the init_clk_resources hook is initialized before calling it.
Add the respective function pointer check in exynos_pcie_probe().
Signed-off-by: Jaehoon Chung jh80.chung@samsung.com [lorenzo.pieralisi@arm.com: rewrote the commit log] Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/dwc/pci-exynos.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/pci/dwc/pci-exynos.c +++ b/drivers/pci/dwc/pci-exynos.c @@ -695,7 +695,8 @@ static int __init exynos_pcie_probe(stru return ret; }
- if (ep->ops && ep->ops->get_clk_resources) { + if (ep->ops && ep->ops->get_clk_resources && + ep->ops->init_clk_resources) { ret = ep->ops->get_clk_resources(ep); if (ret) return ret;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Lamparter chunkeey@googlemail.com
commit a728a196d253530f17da5c86dc7dfbe58c5f7094 upstream.
alg entries are only added to the list, after the registration was successful. If the registration failed, it was never added to the list in the first place.
Signed-off-by: Christian Lamparter chunkeey@googlemail.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/amcc/crypto4xx_core.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/drivers/crypto/amcc/crypto4xx_core.c +++ b/drivers/crypto/amcc/crypto4xx_core.c @@ -1033,12 +1033,10 @@ int crypto4xx_register_alg(struct crypto break; }
- if (rc) { - list_del(&alg->entry); + if (rc) kfree(alg); - } else { + else list_add_tail(&alg->entry, &sec_dev->alg_list); - } }
return 0;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Lamparter chunkeey@googlemail.com
commit 5d59ad6eea82ef8df92b4109615a0dde9d8093e9 upstream.
If one of the later memory allocations in rypto4xx_build_pdr() fails: dev->pdr (and/or) dev->pdr_uinfo wouldn't be freed.
crypto4xx_build_sdr() has the same issue with dev->sdr.
Signed-off-by: Christian Lamparter chunkeey@googlemail.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/amcc/crypto4xx_core.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-)
--- a/drivers/crypto/amcc/crypto4xx_core.c +++ b/drivers/crypto/amcc/crypto4xx_core.c @@ -207,7 +207,7 @@ static u32 crypto4xx_build_pdr(struct cr dev->pdr_pa); return -ENOMEM; } - memset(dev->pdr, 0, sizeof(struct ce_pd) * PPC4XX_NUM_PD); + memset(dev->pdr, 0, sizeof(struct ce_pd) * PPC4XX_NUM_PD); dev->shadow_sa_pool = dma_alloc_coherent(dev->core_dev->device, 256 * PPC4XX_NUM_PD, &dev->shadow_sa_pool_pa, @@ -240,13 +240,15 @@ static u32 crypto4xx_build_pdr(struct cr
static void crypto4xx_destroy_pdr(struct crypto4xx_device *dev) { - if (dev->pdr != NULL) + if (dev->pdr) dma_free_coherent(dev->core_dev->device, sizeof(struct ce_pd) * PPC4XX_NUM_PD, dev->pdr, dev->pdr_pa); + if (dev->shadow_sa_pool) dma_free_coherent(dev->core_dev->device, 256 * PPC4XX_NUM_PD, dev->shadow_sa_pool, dev->shadow_sa_pool_pa); + if (dev->shadow_sr_pool) dma_free_coherent(dev->core_dev->device, sizeof(struct sa_state_record) * PPC4XX_NUM_PD, @@ -416,12 +418,12 @@ static u32 crypto4xx_build_sdr(struct cr
static void crypto4xx_destroy_sdr(struct crypto4xx_device *dev) { - if (dev->sdr != NULL) + if (dev->sdr) dma_free_coherent(dev->core_dev->device, sizeof(struct ce_sd) * PPC4XX_NUM_SD, dev->sdr, dev->sdr_pa);
- if (dev->scatter_buffer_va != NULL) + if (dev->scatter_buffer_va) dma_free_coherent(dev->core_dev->device, dev->scatter_buffer_size * PPC4XX_NUM_SD, dev->scatter_buffer_va, @@ -1191,7 +1193,7 @@ static int crypto4xx_probe(struct platfo
rc = crypto4xx_build_gdr(core_dev->dev); if (rc) - goto err_build_gdr; + goto err_build_pdr;
rc = crypto4xx_build_sdr(core_dev->dev); if (rc) @@ -1234,12 +1236,11 @@ err_iomap: err_request_irq: irq_dispose_mapping(core_dev->irq); tasklet_kill(&core_dev->tasklet); - crypto4xx_destroy_sdr(core_dev->dev); err_build_sdr: + crypto4xx_destroy_sdr(core_dev->dev); crypto4xx_destroy_gdr(core_dev->dev); -err_build_gdr: - crypto4xx_destroy_pdr(core_dev->dev); err_build_pdr: + crypto4xx_destroy_pdr(core_dev->dev); kfree(core_dev->dev); err_alloc_dev: kfree(core_dev);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit bc800e8b39bad60ccdb83be828da63af71ab87b3 ]
The __alx_open function can be called from ndo_open, which is called under RTNL, or from alx_resume, which isn't. Since commit d768319cd427, we're calling the netif_set_real_num_{tx,rx}_queues functions, which need to be called under RTNL.
This is similar to commit 0c2cc02e571a ("igb: Move the calls to set the Tx and Rx queues into igb_open").
Fixes: d768319cd427 ("alx: enable multiple tx queues") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/atheros/alx/main.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/atheros/alx/main.c +++ b/drivers/net/ethernet/atheros/alx/main.c @@ -1897,13 +1897,19 @@ static int alx_resume(struct device *dev struct pci_dev *pdev = to_pci_dev(dev); struct alx_priv *alx = pci_get_drvdata(pdev); struct alx_hw *hw = &alx->hw; + int err;
alx_reset_phy(hw);
if (!netif_running(alx->dev)) return 0; netif_device_attach(alx->dev); - return __alx_open(alx, true); + + rtnl_lock(); + err = __alx_open(alx, true); + rtnl_unlock(); + + return err; }
static SIMPLE_DEV_PM_OPS(alx_pm_ops, alx_suspend, alx_resume);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Gustavo A. R. Silva" gustavo@embeddedor.com
[ Upstream commit ced9e191501e52b95e1b57b8e0db00943869eed0 ]
pool can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/atm/zatm.c:1491 zatm_ioctl() warn: potential spectre issue 'zatm_dev->pool_info' (local cap)
Fix this by sanitizing pool before using it to index zatm_dev->pool_info
Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/atm/zatm.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/atm/zatm.c +++ b/drivers/atm/zatm.c @@ -1483,6 +1483,8 @@ static int zatm_ioctl(struct atm_dev *de return -EFAULT; if (pool < 0 || pool > ZATM_LAST_POOL) return -EINVAL; + pool = array_index_nospec(pool, + ZATM_LAST_POOL + 1); if (copy_from_user(&info, &((struct zatm_pool_req __user *) arg)->info, sizeof(info))) return -EFAULT;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephen Hemminger sthemmin@microsoft.com
[ Upstream commit 3ffe64f1a641b80a82d9ef4efa7a05ce69049871 ]
When doing device hotplug the sub channel must be async to avoid deadlock issues because device is discovered in softirq context.
When doing changes to MTU and number of channels, the setup must be synchronous to avoid races such as when MTU and device settings are done in a single ip command.
Reported-by: Thomas Walker Thomas.Walker@twosigma.com Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug") Fixes: 732e49850c5e ("netvsc: fix race on sub channel creation") Signed-off-by: Stephen Hemminger sthemmin@microsoft.com Signed-off-by: Haiyang Zhang haiyangz@microsoft.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/hyperv/hyperv_net.h | 2 - drivers/net/hyperv/netvsc.c | 37 ++++++++++++++++++++++- drivers/net/hyperv/netvsc_drv.c | 17 +++++++++- drivers/net/hyperv/rndis_filter.c | 61 +++++++------------------------------- 4 files changed, 65 insertions(+), 52 deletions(-)
--- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -207,7 +207,7 @@ int netvsc_recv_callback(struct net_devi void netvsc_channel_cb(void *context); int netvsc_poll(struct napi_struct *napi, int budget);
-void rndis_set_subchannel(struct work_struct *w); +int rndis_set_subchannel(struct net_device *ndev, struct netvsc_device *nvdev); int rndis_filter_open(struct netvsc_device *nvdev); int rndis_filter_close(struct netvsc_device *nvdev); struct netvsc_device *rndis_filter_device_add(struct hv_device *dev, --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -62,6 +62,41 @@ void netvsc_switch_datapath(struct net_d VM_PKT_DATA_INBAND, 0); }
+/* Worker to setup sub channels on initial setup + * Initial hotplug event occurs in softirq context + * and can't wait for channels. + */ +static void netvsc_subchan_work(struct work_struct *w) +{ + struct netvsc_device *nvdev = + container_of(w, struct netvsc_device, subchan_work); + struct rndis_device *rdev; + int i, ret; + + /* Avoid deadlock with device removal already under RTNL */ + if (!rtnl_trylock()) { + schedule_work(w); + return; + } + + rdev = nvdev->extension; + if (rdev) { + ret = rndis_set_subchannel(rdev->ndev, nvdev); + if (ret == 0) { + netif_device_attach(rdev->ndev); + } else { + /* fallback to only primary channel */ + for (i = 1; i < nvdev->num_chn; i++) + netif_napi_del(&nvdev->chan_table[i].napi); + + nvdev->max_chn = 1; + nvdev->num_chn = 1; + } + } + + rtnl_unlock(); +} + static struct netvsc_device *alloc_net_device(void) { struct netvsc_device *net_device; @@ -78,7 +113,7 @@ static struct netvsc_device *alloc_net_d
init_completion(&net_device->channel_init_wait); init_waitqueue_head(&net_device->subchan_open); - INIT_WORK(&net_device->subchan_work, rndis_set_subchannel); + INIT_WORK(&net_device->subchan_work, netvsc_subchan_work);
return net_device; } --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -911,8 +911,20 @@ static int netvsc_attach(struct net_devi if (IS_ERR(nvdev)) return PTR_ERR(nvdev);
- /* Note: enable and attach happen when sub-channels setup */ + if (nvdev->num_chn > 1) { + ret = rndis_set_subchannel(ndev, nvdev); + + /* if unavailable, just proceed with one queue */ + if (ret) { + nvdev->max_chn = 1; + nvdev->num_chn = 1; + } + }
+ /* In any case device is now ready */ + netif_device_attach(ndev); + + /* Note: enable and attach happen when sub-channels setup */ netif_carrier_off(ndev);
if (netif_running(ndev)) { @@ -2035,6 +2047,9 @@ static int netvsc_probe(struct hv_device
memcpy(net->dev_addr, device_info.mac_adr, ETH_ALEN);
+ if (nvdev->num_chn > 1) + schedule_work(&nvdev->subchan_work); + /* hw_features computed in rndis_netdev_set_hwcaps() */ net->features = net->hw_features | NETIF_F_HIGHDMA | NETIF_F_SG | --- a/drivers/net/hyperv/rndis_filter.c +++ b/drivers/net/hyperv/rndis_filter.c @@ -1055,29 +1055,15 @@ static void netvsc_sc_open(struct vmbus_ * This breaks overlap of processing the host message for the * new primary channel with the initialization of sub-channels. */ -void rndis_set_subchannel(struct work_struct *w) +int rndis_set_subchannel(struct net_device *ndev, struct netvsc_device *nvdev) { - struct netvsc_device *nvdev - = container_of(w, struct netvsc_device, subchan_work); struct nvsp_message *init_packet = &nvdev->channel_init_pkt; - struct net_device_context *ndev_ctx; - struct rndis_device *rdev; - struct net_device *ndev; - struct hv_device *hv_dev; + struct net_device_context *ndev_ctx = netdev_priv(ndev); + struct hv_device *hv_dev = ndev_ctx->device_ctx; + struct rndis_device *rdev = nvdev->extension; int i, ret;
- if (!rtnl_trylock()) { - schedule_work(w); - return; - } - - rdev = nvdev->extension; - if (!rdev) - goto unlock; /* device was removed */ - - ndev = rdev->ndev; - ndev_ctx = netdev_priv(ndev); - hv_dev = ndev_ctx->device_ctx; + ASSERT_RTNL();
memset(init_packet, 0, sizeof(struct nvsp_message)); init_packet->hdr.msg_type = NVSP_MSG5_TYPE_SUBCHANNEL; @@ -1091,13 +1077,13 @@ void rndis_set_subchannel(struct work_st VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); if (ret) { netdev_err(ndev, "sub channel allocate send failed: %d\n", ret); - goto failed; + return ret; }
wait_for_completion(&nvdev->channel_init_wait); if (init_packet->msg.v5_msg.subchn_comp.status != NVSP_STAT_SUCCESS) { netdev_err(ndev, "sub channel request failed\n"); - goto failed; + return -EIO; }
nvdev->num_chn = 1 + @@ -1116,21 +1102,7 @@ void rndis_set_subchannel(struct work_st for (i = 0; i < VRSS_SEND_TAB_SIZE; i++) ndev_ctx->tx_table[i] = i % nvdev->num_chn;
- netif_device_attach(ndev); - rtnl_unlock(); - return; - -failed: - /* fallback to only primary channel */ - for (i = 1; i < nvdev->num_chn; i++) - netif_napi_del(&nvdev->chan_table[i].napi); - - nvdev->max_chn = 1; - nvdev->num_chn = 1; - - netif_device_attach(ndev); -unlock: - rtnl_unlock(); + return 0; }
static int rndis_netdev_set_hwcaps(struct rndis_device *rndis_device, @@ -1321,21 +1293,12 @@ struct netvsc_device *rndis_filter_devic netif_napi_add(net, &net_device->chan_table[i].napi, netvsc_poll, NAPI_POLL_WEIGHT);
- if (net_device->num_chn > 1) - schedule_work(&net_device->subchan_work); + return net_device;
out: - /* if unavailable, just proceed with one queue */ - if (ret) { - net_device->max_chn = 1; - net_device->num_chn = 1; - } - - /* No sub channels, device is ready */ - if (net_device->num_chn == 1) - netif_device_attach(net); - - return net_device; + /* setting up multiple channels failed */ + net_device->max_chn = 1; + net_device->num_chn = 1;
err_dev_remv: rndis_filter_device_remove(dev, net_device);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Biggers ebiggers@google.com
[ Upstream commit fc9c2029e37c3ae9efc28bf47045e0b87e09660c ]
The 'mask' argument to crypto_alloc_shash() uses the CRYPTO_ALG_* flags, not 'gfp_t'. So don't pass GFP_KERNEL to it.
Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support") Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/seg6_hmac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv6/seg6_hmac.c +++ b/net/ipv6/seg6_hmac.c @@ -373,7 +373,7 @@ static int seg6_hmac_init_algo(void) return -ENOMEM;
for_each_possible_cpu(cpu) { - tfm = crypto_alloc_shash(algo->name, 0, GFP_KERNEL); + tfm = crypto_alloc_shash(algo->name, 0, 0); if (IS_ERR(tfm)) return PTR_ERR(tfm); p_tfm = per_cpu_ptr(algo->tfms, cpu);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 30877961b1cdd6fdca783c2e8c4f0f47e95dc58c ]
Commit 296d48568042 ("ipvlan: inherit MTU from master device") adjusted the mtu from the master device when creating a ipvlan device, but it would also override the mtu value set in rtnl_create_link. It causes IFLA_MTU param not to take effect.
So this patch is to not adjust the mtu if IFLA_MTU param is set when creating a ipvlan device.
Fixes: 296d48568042 ("ipvlan: inherit MTU from master device") Reported-by: Jianlin Shi jishi@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ipvlan/ipvlan_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -546,7 +546,8 @@ int ipvlan_link_new(struct net *src_net, ipvlan->dev = dev; ipvlan->port = port; ipvlan->sfeatures = IPVLAN_FEATURES; - ipvlan_adjust_mtu(ipvlan, phy_dev); + if (!tb[IFLA_MTU]) + ipvlan_adjust_mtu(ipvlan, phy_dev); INIT_LIST_HEAD(&ipvlan->addrs);
/* If the port-id base is at the MAX value, then wrap it around and
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jesper Dangaard Brouer brouer@redhat.com
[ Upstream commit ad088ec480768850db019a5cc543685e868a513d ]
The driver was combining the XDP_TX tail flush and XDP_REDIRECT map flushing (xdp_do_flush_map). This is suboptimal, these two flush operations should be kept separate.
Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map") Signed-off-by: Jesper Dangaard Brouer brouer@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-)
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -2211,9 +2211,10 @@ static struct sk_buff *ixgbe_build_skb(s return skb; }
-#define IXGBE_XDP_PASS 0 -#define IXGBE_XDP_CONSUMED 1 -#define IXGBE_XDP_TX 2 +#define IXGBE_XDP_PASS 0 +#define IXGBE_XDP_CONSUMED BIT(0) +#define IXGBE_XDP_TX BIT(1) +#define IXGBE_XDP_REDIR BIT(2)
static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter, struct xdp_buff *xdp); @@ -2242,7 +2243,7 @@ static struct sk_buff *ixgbe_run_xdp(str case XDP_REDIRECT: err = xdp_do_redirect(adapter->netdev, xdp, xdp_prog); if (!err) - result = IXGBE_XDP_TX; + result = IXGBE_XDP_REDIR; else result = IXGBE_XDP_CONSUMED; break; @@ -2302,7 +2303,7 @@ static int ixgbe_clean_rx_irq(struct ixg unsigned int mss = 0; #endif /* IXGBE_FCOE */ u16 cleaned_count = ixgbe_desc_unused(rx_ring); - bool xdp_xmit = false; + unsigned int xdp_xmit = 0;
while (likely(total_rx_packets < budget)) { union ixgbe_adv_rx_desc *rx_desc; @@ -2342,8 +2343,10 @@ static int ixgbe_clean_rx_irq(struct ixg }
if (IS_ERR(skb)) { - if (PTR_ERR(skb) == -IXGBE_XDP_TX) { - xdp_xmit = true; + unsigned int xdp_res = -PTR_ERR(skb); + + if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR)) { + xdp_xmit |= xdp_res; ixgbe_rx_buffer_flip(rx_ring, rx_buffer, size); } else { rx_buffer->pagecnt_bias++; @@ -2415,7 +2418,10 @@ static int ixgbe_clean_rx_irq(struct ixg total_rx_packets++; }
- if (xdp_xmit) { + if (xdp_xmit & IXGBE_XDP_REDIR) + xdp_do_flush_map(); + + if (xdp_xmit & IXGBE_XDP_TX) { struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
/* Force memory writes to complete before letting h/w @@ -2423,8 +2429,6 @@ static int ixgbe_clean_rx_irq(struct ixg */ wmb(); writel(ring->next_to_use, ring->tail); - - xdp_do_flush_map(); }
u64_stats_update_begin(&rx_ring->syncp);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 74174fe5634ffbf645a7ca5a261571f700b2f332 ]
On fast hosts or malicious bots, we trigger a DCCP_BUG() which seems excessive.
syzbot reported :
BUG: delta (-6195) <= 0 at net/dccp/ccids/ccid3.c:628/ccid3_hc_rx_send_feedback() CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.18.0-rc1+ #112 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 ccid3_hc_rx_send_feedback net/dccp/ccids/ccid3.c:628 [inline] ccid3_hc_rx_packet_recv.cold.16+0x38/0x71 net/dccp/ccids/ccid3.c:793 ccid_hc_rx_packet_recv net/dccp/ccid.h:185 [inline] dccp_deliver_input_to_ccids+0xf0/0x280 net/dccp/input.c:180 dccp_rcv_established+0x87/0xb0 net/dccp/input.c:378 dccp_v4_do_rcv+0x153/0x180 net/dccp/ipv4.c:654 sk_backlog_rcv include/net/sock.h:914 [inline] __sk_receive_skb+0x3ba/0xd80 net/core/sock.c:517 dccp_v4_rcv+0x10f9/0x1f58 net/dccp/ipv4.c:875 ip_local_deliver_finish+0x2eb/0xda0 net/ipv4/ip_input.c:215 NF_HOOK include/linux/netfilter.h:287 [inline] ip_local_deliver+0x1e9/0x750 net/ipv4/ip_input.c:256 dst_input include/net/dst.h:450 [inline] ip_rcv_finish+0x823/0x2220 net/ipv4/ip_input.c:396 NF_HOOK include/linux/netfilter.h:287 [inline] ip_rcv+0xa18/0x1284 net/ipv4/ip_input.c:492 __netif_receive_skb_core+0x2488/0x3680 net/core/dev.c:4628 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 process_backlog+0x219/0x760 net/core/dev.c:5373 napi_poll net/core/dev.c:5771 [inline] net_rx_action+0x7da/0x1980 net/core/dev.c:5837 __do_softirq+0x2e8/0xb17 kernel/softirq.c:284 run_ksoftirqd+0x86/0x100 kernel/softirq.c:645 smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164 kthread+0x345/0x410 kernel/kthread.c:240 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Cc: Gerrit Renker gerrit@erg.abdn.ac.uk Cc: dccp@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/dccp/ccids/ccid3.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -624,9 +624,8 @@ static void ccid3_hc_rx_send_feedback(st case CCID3_FBACK_PERIODIC: delta = ktime_us_delta(now, hc->rx_tstamp_last_feedback); if (delta <= 0) - DCCP_BUG("delta (%ld) <= 0", (long)delta); - else - hc->rx_x_recv = scaled_div32(hc->rx_bytes_recv, delta); + delta = 1; + hc->rx_x_recv = scaled_div32(hc->rx_bytes_recv, delta); break; default: return;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 0ce4e70ff00662ad7490e545ba0cd8c1fa179fca ]
To compute delays, better not use time of the day which can be changed by admins or malicious programs.
Also change ccid3_first_li() to use s64 type for delta variable to avoid potential overflows.
Signed-off-by: Eric Dumazet edumazet@google.com Cc: Gerrit Renker gerrit@erg.abdn.ac.uk Cc: dccp@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/dccp/ccids/ccid3.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -599,7 +599,7 @@ static void ccid3_hc_rx_send_feedback(st { struct ccid3_hc_rx_sock *hc = ccid3_hc_rx_sk(sk); struct dccp_sock *dp = dccp_sk(sk); - ktime_t now = ktime_get_real(); + ktime_t now = ktime_get(); s64 delta = 0;
switch (fbtype) { @@ -631,7 +631,7 @@ static void ccid3_hc_rx_send_feedback(st return; }
- ccid3_pr_debug("Interval %ldusec, X_recv=%u, 1/p=%u\n", (long)delta, + ccid3_pr_debug("Interval %lldusec, X_recv=%u, 1/p=%u\n", delta, hc->rx_x_recv, hc->rx_pinv);
hc->rx_tstamp_last_feedback = now; @@ -678,7 +678,8 @@ static int ccid3_hc_rx_insert_options(st static u32 ccid3_first_li(struct sock *sk) { struct ccid3_hc_rx_sock *hc = ccid3_hc_rx_sk(sk); - u32 x_recv, p, delta; + u32 x_recv, p; + s64 delta; u64 fval;
if (hc->rx_rtt == 0) { @@ -686,7 +687,9 @@ static u32 ccid3_first_li(struct sock *s hc->rx_rtt = DCCP_FALLBACK_RTT; }
- delta = ktime_to_us(net_timedelta(hc->rx_tstamp_last_feedback)); + delta = ktime_us_delta(ktime_get(), hc->rx_tstamp_last_feedback); + if (delta <= 0) + delta = 1; x_recv = scaled_div32(hc->rx_bytes_recv, delta); if (x_recv == 0) { /* would also trigger divide-by-zero */ DCCP_WARN("X_recv==0\n");
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit 603d4cf8fe095b1ee78f423d514427be507fb513 ]
Since the addition of GRO for ESP, gro_receive can consume the skb and return -EINPROGRESS. In that case, the lower layer GRO handler cannot touch the skb anymore.
Commit 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") converted some of the gro_receive handlers that can lead to ESP's gro_receive so that they wouldn't access the skb when -EINPROGRESS is returned, but missed other spots, mainly in tunneling protocols.
This patch finishes the conversion to using skb_gro_flush_final(), and adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and GUE.
Fixes: 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/geneve.c | 2 +- drivers/net/vxlan.c | 4 +--- include/linux/netdevice.h | 20 ++++++++++++++++++++ net/8021q/vlan.c | 2 +- net/ipv4/fou.c | 4 +--- net/ipv4/gre_offload.c | 2 +- net/ipv4/udp_offload.c | 2 +- 7 files changed, 26 insertions(+), 10 deletions(-)
--- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -474,7 +474,7 @@ static struct sk_buff **geneve_gro_recei out_unlock: rcu_read_unlock(); out: - NAPI_GRO_CB(skb)->flush |= flush; + skb_gro_flush_final(skb, pp, flush);
return pp; } --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -623,9 +623,7 @@ static struct sk_buff **vxlan_gro_receiv flush = 0;
out: - skb_gro_remcsum_cleanup(skb, &grc); - skb->remcsum_offload = 0; - NAPI_GRO_CB(skb)->flush |= flush; + skb_gro_flush_final_remcsum(skb, pp, flush, &grc);
return pp; } --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2668,11 +2668,31 @@ static inline void skb_gro_flush_final(s if (PTR_ERR(pp) != -EINPROGRESS) NAPI_GRO_CB(skb)->flush |= flush; } +static inline void skb_gro_flush_final_remcsum(struct sk_buff *skb, + struct sk_buff **pp, + int flush, + struct gro_remcsum *grc) +{ + if (PTR_ERR(pp) != -EINPROGRESS) { + NAPI_GRO_CB(skb)->flush |= flush; + skb_gro_remcsum_cleanup(skb, grc); + skb->remcsum_offload = 0; + } +} #else static inline void skb_gro_flush_final(struct sk_buff *skb, struct sk_buff **pp, int flush) { NAPI_GRO_CB(skb)->flush |= flush; } +static inline void skb_gro_flush_final_remcsum(struct sk_buff *skb, + struct sk_buff **pp, + int flush, + struct gro_remcsum *grc) +{ + NAPI_GRO_CB(skb)->flush |= flush; + skb_gro_remcsum_cleanup(skb, grc); + skb->remcsum_offload = 0; +} #endif
static inline int dev_hard_header(struct sk_buff *skb, struct net_device *dev, --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -664,7 +664,7 @@ static struct sk_buff **vlan_gro_receive out_unlock: rcu_read_unlock(); out: - NAPI_GRO_CB(skb)->flush |= flush; + skb_gro_flush_final(skb, pp, flush);
return pp; } --- a/net/ipv4/fou.c +++ b/net/ipv4/fou.c @@ -448,9 +448,7 @@ next_proto: out_unlock: rcu_read_unlock(); out: - NAPI_GRO_CB(skb)->flush |= flush; - skb_gro_remcsum_cleanup(skb, &grc); - skb->remcsum_offload = 0; + skb_gro_flush_final_remcsum(skb, pp, flush, &grc);
return pp; } --- a/net/ipv4/gre_offload.c +++ b/net/ipv4/gre_offload.c @@ -223,7 +223,7 @@ static struct sk_buff **gre_gro_receive( out_unlock: rcu_read_unlock(); out: - NAPI_GRO_CB(skb)->flush |= flush; + skb_gro_flush_final(skb, pp, flush);
return pp; } --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -295,7 +295,7 @@ unflush: out_unlock: rcu_read_unlock(); out: - NAPI_GRO_CB(skb)->flush |= flush; + skb_gro_flush_final(skb, pp, flush); return pp; } EXPORT_SYMBOL(udp_gro_receive);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Harini Katakam harini.katakam@xilinx.com
[ Upstream commit 64d7839af8c8f67daaf9bf387135052c55d85f90 ]
When delta passed to gem_ptp_adjtime is negative, the sign is maintained in the ns_to_timespec64 conversion. Hence timespec_add should be used directly. timespec_sub will just subtract the negative value thus increasing the time difference.
Signed-off-by: Harini Katakam harini.katakam@xilinx.com Acked-by: Nicolas Ferre nicolas.ferre@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/cadence/macb_ptp.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
--- a/drivers/net/ethernet/cadence/macb_ptp.c +++ b/drivers/net/ethernet/cadence/macb_ptp.c @@ -170,10 +170,7 @@ static int gem_ptp_adjtime(struct ptp_cl
if (delta > TSU_NSEC_MAX_VAL) { gem_tsu_get_time(&bp->ptp_clock_info, &now); - if (sign) - now = timespec64_sub(now, then); - else - now = timespec64_add(now, then); + now = timespec64_add(now, then);
gem_tsu_set_time(&bp->ptp_clock_info, (const struct timespec64 *)&now);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Or Gerlitz ogerlitz@mellanox.com
[ Upstream commit 733d3e5497070d05971352ca5087bac83c197c3d ]
In smartnic env, the host (PF) driver might not be an e-switch manager, hence the switchdev mode representors are running on the embedded cpu (EC) and not at the host.
As such, we should avoid dealing with vport representors if not being esw manager.
While here, make sure to disallow eswitch switchdev related setups through devlink if we are not esw managers.
Fixes: cb67b832921c ('net/mlx5e: Introduce SRIOV VF representors') Signed-off-by: Or Gerlitz ogerlitz@mellanox.com Reviewed-by: Eli Cohen eli@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 12 ++++++------ drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 4 ++-- 3 files changed, 9 insertions(+), 9 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2626,7 +2626,7 @@ void mlx5e_activate_priv_channels(struct mlx5e_activate_channels(&priv->channels); netif_tx_start_all_queues(priv->netdev);
- if (MLX5_VPORT_MANAGER(priv->mdev)) + if (MLX5_ESWITCH_MANAGER(priv->mdev)) mlx5e_add_sqs_fwd_rules(priv);
mlx5e_wait_channels_min_rx_wqes(&priv->channels); @@ -2637,7 +2637,7 @@ void mlx5e_deactivate_priv_channels(stru { mlx5e_redirect_rqts_to_drop(priv);
- if (MLX5_VPORT_MANAGER(priv->mdev)) + if (MLX5_ESWITCH_MANAGER(priv->mdev)) mlx5e_remove_sqs_fwd_rules(priv);
/* FIXME: This is a W/A only for tx timeout watch dog false alarm when @@ -4127,7 +4127,7 @@ static void mlx5e_build_nic_netdev(struc mlx5e_set_netdev_dev_addr(netdev);
#if IS_ENABLED(CONFIG_MLX5_ESWITCH) - if (MLX5_VPORT_MANAGER(mdev)) + if (MLX5_ESWITCH_MANAGER(mdev)) netdev->switchdev_ops = &mlx5e_switchdev_ops; #endif
@@ -4273,7 +4273,7 @@ static void mlx5e_nic_enable(struct mlx5
mlx5e_enable_async_events(priv);
- if (MLX5_VPORT_MANAGER(priv->mdev)) + if (MLX5_ESWITCH_MANAGER(priv->mdev)) mlx5e_register_vport_reps(priv);
if (netdev->reg_state != NETREG_REGISTERED) @@ -4300,7 +4300,7 @@ static void mlx5e_nic_disable(struct mlx
queue_work(priv->wq, &priv->set_rx_mode_work);
- if (MLX5_VPORT_MANAGER(priv->mdev)) + if (MLX5_ESWITCH_MANAGER(priv->mdev)) mlx5e_unregister_vport_reps(priv);
mlx5e_disable_async_events(priv); @@ -4483,7 +4483,7 @@ static void *mlx5e_add(struct mlx5_core_ return NULL;
#ifdef CONFIG_MLX5_ESWITCH - if (MLX5_VPORT_MANAGER(mdev)) { + if (MLX5_ESWITCH_MANAGER(mdev)) { rpriv = mlx5e_alloc_nic_rep_priv(mdev); if (!rpriv) { mlx5_core_warn(mdev, "Failed to alloc NIC rep priv data\n"); --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -710,7 +710,7 @@ bool mlx5e_is_uplink_rep(struct mlx5e_pr struct mlx5e_rep_priv *rpriv = priv->ppriv; struct mlx5_eswitch_rep *rep;
- if (!MLX5_CAP_GEN(priv->mdev, vport_group_manager)) + if (!MLX5_ESWITCH_MANAGER(priv->mdev)) return false;
rep = rpriv->rep; --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -912,8 +912,8 @@ static int mlx5_devlink_eswitch_check(st if (MLX5_CAP_GEN(dev, port_type) != MLX5_CAP_PORT_TYPE_ETH) return -EOPNOTSUPP;
- if (!MLX5_CAP_GEN(dev, vport_group_manager)) - return -EOPNOTSUPP; + if(!MLX5_ESWITCH_MANAGER(dev)) + return -EPERM;
if (dev->priv.eswitch->mode == SRIOV_NONE) return -EOPNOTSUPP;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Or Gerlitz ogerlitz@mellanox.com
[ Upstream commit 8ffd569aaa818f2624ca821d9a246342fa8b8c50 ]
The check for cpu hit statistics was not returning immediate false for any non vport rep netdev and hence we crashed (say on mlx5 probed VFs) if user-space tool was calling into any possible netdev in the system.
Fix that by doing a proper check before dereferencing.
Fixes: 1d447a39142e ('net/mlx5e: Extendable vport representor netdev private data') Signed-off-by: Or Gerlitz ogerlitz@mellanox.com Reported-by: Eli Cohen eli@melloanox.com Reviewed-by: Eli Cohen eli@melloanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -724,8 +724,12 @@ bool mlx5e_is_uplink_rep(struct mlx5e_pr static bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv) { struct mlx5e_rep_priv *rpriv = priv->ppriv; - struct mlx5_eswitch_rep *rep = rpriv->rep; + struct mlx5_eswitch_rep *rep;
+ if (!MLX5_CAP_GEN(priv->mdev, eswitch_flow_table)) + return false; + + rep = rpriv->rep; if (rep && rep->vport != FDB_UPLINK_VPORT) return true;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Or Gerlitz ogerlitz@mellanox.com
[ Upstream commit 0efc8562491b7d36f6bbc4fbc8f3348cb6641e9c ]
In smartnic env, the host (PF) driver might not be an e-switch manager, hence the FW will err on driver attempts to deal with setting/unsetting the eswitch and as a result the overall setup of sriov will fail.
Fix that by avoiding the operation if e-switch management is not allowed for this driver instance. While here, move to use the correct name for the esw manager capability name.
Fixes: 81848731ff40 ('net/mlx5: E-Switch, Add SR-IOV (FDB) support') Signed-off-by: Or Gerlitz ogerlitz@mellanox.com Reported-by: Guy Kushnir guyk@mellanox.com Reviewed-by: Eli Cohen eli@melloanox.com Tested-by: Eli Cohen eli@melloanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 2 ++ drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 3 ++- drivers/net/ethernet/mellanox/mlx5/core/fw.c | 5 +++-- drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 7 ++++++- include/linux/mlx5/mlx5_ifc.h | 2 +- 7 files changed, 16 insertions(+), 7 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -726,7 +726,7 @@ static bool mlx5e_is_vf_vport_rep(struct struct mlx5e_rep_priv *rpriv = priv->ppriv; struct mlx5_eswitch_rep *rep;
- if (!MLX5_CAP_GEN(priv->mdev, eswitch_flow_table)) + if (!MLX5_ESWITCH_MANAGER(priv->mdev)) return false;
rep = rpriv->rep; --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1535,7 +1535,7 @@ int mlx5_eswitch_enable_sriov(struct mlx if (!ESW_ALLOWED(esw)) return 0;
- if (!MLX5_CAP_GEN(esw->dev, eswitch_flow_table) || + if (!MLX5_ESWITCH_MANAGER(esw->dev) || !MLX5_CAP_ESW_FLOWTABLE_FDB(esw->dev, ft_support)) { esw_warn(esw->dev, "E-Switch FDB is not supported, aborting ...\n"); return -EOPNOTSUPP; --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -39,6 +39,8 @@ #include <linux/mlx5/device.h> #include "lib/mpfs.h"
+#define MLX5_ESWITCH_MANAGER(mdev) MLX5_CAP_GEN(mdev, eswitch_manager) + enum { SRIOV_NONE, SRIOV_LEGACY, --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -36,6 +36,7 @@ #include "mlx5_core.h" #include "fs_core.h" #include "fs_cmd.h" +#include "eswitch.h" #include "diag/fs_tracepoint.h"
#define INIT_TREE_NODE_ARRAY_SIZE(...) (sizeof((struct init_tree_node[]){__VA_ARGS__}) /\ @@ -2211,7 +2212,7 @@ int mlx5_init_fs(struct mlx5_core_dev *d goto err; }
- if (MLX5_CAP_GEN(dev, eswitch_flow_table)) { + if (MLX5_ESWITCH_MANAGER(dev)) { if (MLX5_CAP_ESW_FLOWTABLE_FDB(dev, ft_support)) { err = init_fdb_root_ns(steering); if (err) --- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c @@ -34,6 +34,7 @@ #include <linux/mlx5/cmd.h> #include <linux/module.h> #include "mlx5_core.h" +#include "eswitch.h" #include "../../mlxfw/mlxfw.h"
static int mlx5_cmd_query_adapter(struct mlx5_core_dev *dev, u32 *out, @@ -152,13 +153,13 @@ int mlx5_query_hca_caps(struct mlx5_core }
if (MLX5_CAP_GEN(dev, vport_group_manager) && - MLX5_CAP_GEN(dev, eswitch_flow_table)) { + MLX5_ESWITCH_MANAGER(dev)) { err = mlx5_core_get_caps(dev, MLX5_CAP_ESWITCH_FLOW_TABLE); if (err) return err; }
- if (MLX5_CAP_GEN(dev, eswitch_flow_table)) { + if (MLX5_ESWITCH_MANAGER(dev)) { err = mlx5_core_get_caps(dev, MLX5_CAP_ESWITCH); if (err) return err; --- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c @@ -88,6 +88,9 @@ static int mlx5_device_enable_sriov(stru return -EBUSY; }
+ if (!MLX5_ESWITCH_MANAGER(dev)) + goto enable_vfs_hca; + err = mlx5_eswitch_enable_sriov(dev->priv.eswitch, num_vfs, SRIOV_LEGACY); if (err) { mlx5_core_warn(dev, @@ -95,6 +98,7 @@ static int mlx5_device_enable_sriov(stru return err; }
+enable_vfs_hca: for (vf = 0; vf < num_vfs; vf++) { err = mlx5_core_enable_hca(dev, vf + 1); if (err) { @@ -140,7 +144,8 @@ static void mlx5_device_disable_sriov(st }
out: - mlx5_eswitch_disable_sriov(dev->priv.eswitch); + if (MLX5_ESWITCH_MANAGER(dev)) + mlx5_eswitch_disable_sriov(dev->priv.eswitch);
if (mlx5_wait_for_vf_pages(dev)) mlx5_core_warn(dev, "timeout reclaiming VFs pages\n"); --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -857,7 +857,7 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 reserved_at_1a4[0x1]; u8 ets[0x1]; u8 nic_flow_table[0x1]; - u8 eswitch_flow_table[0x1]; + u8 eswitch_manager[0x1]; u8 early_vf_enable[0x1]; u8 mcam_reg[0x1]; u8 pcam_reg[0x1];
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Vesker valex@mellanox.com
[ Upstream commit d412c31dae053bf30a1bc15582a9990df297a660 ]
The command interface can work in two modes: Events and Polling. In the general case, each time we invoke a command, a work is queued to handle it.
When working in events, the interrupt handler completes the command execution. On the other hand, when working in polling mode, the work itself completes it.
Due to a bug in the work handler, a command could have been completed by the interrupt handler, while the work handler hasn't finished yet, causing the it to complete once again if the command interface mode was changed from Events to polling after the interrupt handler was called.
mlx5_unload_one() mlx5_stop_eqs() // Destroy the EQ before cmd EQ ...cmd_work_handler() write_doorbell() --> EVENT_TYPE_CMD mlx5_cmd_comp_handler() // First free free_ent(cmd, ent->idx) complete(&ent->done)
<-- mlx5_stop_eqs //cmd was complete // move to polling before destroying the last cmd EQ mlx5_cmd_use_polling() cmd->mode = POLL;
--> cmd_work_handler (continues) if (cmd->mode == POLL) mlx5_cmd_comp_handler() // Double free
The solution is to store the cmd->mode before writing the doorbell.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Alex Vesker valex@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -801,6 +801,7 @@ static void cmd_work_handler(struct work unsigned long flags; bool poll_cmd = ent->polling; int alloc_ret; + int cmd_mode;
sem = ent->page_queue ? &cmd->pages_sem : &cmd->sem; down(sem); @@ -847,6 +848,7 @@ static void cmd_work_handler(struct work set_signature(ent, !cmd->checksum_disabled); dump_command(dev, ent, 1); ent->ts1 = ktime_get_ns(); + cmd_mode = cmd->mode;
if (ent->callback) schedule_delayed_work(&ent->cb_timeout_work, cb_timeout); @@ -871,7 +873,7 @@ static void cmd_work_handler(struct work iowrite32be(1 << ent->idx, &dev->iseg->cmd_dbell); mmiowb(); /* if not in polling don't use ent after this point */ - if (cmd->mode == CMD_MODE_POLLING || poll_cmd) { + if (cmd_mode == CMD_MODE_POLLING || poll_cmd) { poll_timeout(ent); /* make sure we read the descriptor after ownership is SW */ rmb();
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Vesker valex@mellanox.com
[ Upstream commit 603b7bcff824740500ddfa001d7a7168b0b38542 ]
The NULL character was not set correctly for the string containing the command length, this caused failures reading the output of the command due to a random length. The fix is to initialize the output length string.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Alex Vesker valex@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -1274,7 +1274,7 @@ static ssize_t outlen_write(struct file { struct mlx5_core_dev *dev = filp->private_data; struct mlx5_cmd_debug *dbg = &dev->cmd.dbg; - char outlen_str[8]; + char outlen_str[8] = {0}; int outlen; void *ptr; int err; @@ -1289,8 +1289,6 @@ static ssize_t outlen_write(struct file if (copy_from_user(outlen_str, buf, count)) return -EFAULT;
- outlen_str[7] = 0; - err = sscanf(outlen_str, "%d", &outlen); if (err < 0) return err;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eli Cohen eli@mellanox.com
[ Upstream commit f811980444ec59ad62f9e041adbb576a821132c7 ]
Manipulating of the MPFS requires eswitch manager capabilities.
Fixes: eeb66cdb6826 ('net/mlx5: Separate between E-Switch and MPFS') Signed-off-by: Eli Cohen eli@mellanox.com Reviewed-by: Or Gerlitz ogerlitz@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c @@ -34,6 +34,7 @@ #include <linux/mlx5/driver.h> #include <linux/mlx5/mlx5_ifc.h> #include "mlx5_core.h" +#include "eswitch.h" #include "lib/mpfs.h"
/* HW L2 Table (MPFS) management */ @@ -98,7 +99,7 @@ int mlx5_mpfs_init(struct mlx5_core_dev int l2table_size = 1 << MLX5_CAP_GEN(dev, log_max_l2_table); struct mlx5_mpfs *mpfs;
- if (!MLX5_VPORT_MANAGER(dev)) + if (!MLX5_ESWITCH_MANAGER(dev)) return 0;
mpfs = kzalloc(sizeof(*mpfs), GFP_KERNEL); @@ -122,7 +123,7 @@ void mlx5_mpfs_cleanup(struct mlx5_core_ { struct mlx5_mpfs *mpfs = dev->priv.mpfs;
- if (!MLX5_VPORT_MANAGER(dev)) + if (!MLX5_ESWITCH_MANAGER(dev)) return;
WARN_ON(!hlist_empty(mpfs->hash)); @@ -137,7 +138,7 @@ int mlx5_mpfs_add_mac(struct mlx5_core_d u32 index; int err;
- if (!MLX5_VPORT_MANAGER(dev)) + if (!MLX5_ESWITCH_MANAGER(dev)) return 0;
mutex_lock(&mpfs->lock); @@ -179,7 +180,7 @@ int mlx5_mpfs_del_mac(struct mlx5_core_d int err = 0; u32 index;
- if (!MLX5_VPORT_MANAGER(dev)) + if (!MLX5_ESWITCH_MANAGER(dev)) return 0;
mutex_lock(&mpfs->lock);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shay Agroskin shayag@mellanox.com
[ Upstream commit d14fcb8d877caf1b8d6bd65d444bf62b21f2070c ]
The driver allocates wrong size (due to wrong struct name) when issuing a query/set request to NIC's register.
Fixes: d8880795dabf ("net/mlx5e: Implement DCBNL IEEE max rate") Signed-off-by: Shay Agroskin shayag@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/port.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c @@ -641,7 +641,7 @@ EXPORT_SYMBOL_GPL(mlx5_query_port_prio_t static int mlx5_set_port_qetcr_reg(struct mlx5_core_dev *mdev, u32 *in, int inlen) { - u32 out[MLX5_ST_SZ_DW(qtct_reg)]; + u32 out[MLX5_ST_SZ_DW(qetc_reg)];
if (!MLX5_CAP_GEN(mdev, ets)) return -EOPNOTSUPP; @@ -653,7 +653,7 @@ static int mlx5_set_port_qetcr_reg(struc static int mlx5_query_port_qetcr_reg(struct mlx5_core_dev *mdev, u32 *out, int outlen) { - u32 in[MLX5_ST_SZ_DW(qtct_reg)]; + u32 in[MLX5_ST_SZ_DW(qetc_reg)];
if (!MLX5_CAP_GEN(mdev, ets)) return -EOPNOTSUPP;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Antoine Tenart antoine.tenart@bootlin.com
[ Upstream commit 271f7ff5aa5a73488b7a9d8b84b5205fb5b2f7cc ]
When using s/w buffer management, buffers are allocated and DMA mapped. When doing so on an arm64 platform, an offset correction is applied on the DMA address, before storing it in an Rx descriptor. The issue is this DMA address is then used later in the Rx path without removing the offset correction. Thus the DMA address is wrong, which can led to various issues.
This patch fixes this by removing the offset correction from the DMA address retrieved from the Rx descriptor before using it in the Rx path.
Fixes: 8d5047cf9ca2 ("net: mvneta: Convert to be 64 bits compatible") Signed-off-by: Antoine Tenart antoine.tenart@bootlin.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/marvell/mvneta.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -1959,7 +1959,7 @@ static int mvneta_rx_swbm(struct mvneta_ rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE); index = rx_desc - rxq->descs; data = rxq->buf_virt_addr[index]; - phys_addr = rx_desc->buf_phys_addr; + phys_addr = rx_desc->buf_phys_addr - pp->rx_offset_correction;
if (!mvneta_rxq_desc_is_first_last(rx_status) || (rx_status & MVNETA_RXD_ERR_SUMMARY)) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 945d015ee0c3095d2290e845565a23dedfd8027c ]
We should put copy_skb in receive_queue only after a successful call to virtio_net_hdr_from_skb().
syzbot report :
BUG: KASAN: use-after-free in __skb_unlink include/linux/skbuff.h:1843 [inline] BUG: KASAN: use-after-free in __skb_dequeue include/linux/skbuff.h:1863 [inline] BUG: KASAN: use-after-free in skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815 Read of size 8 at addr ffff8801b044ecc0 by task syz-executor217/4553
CPU: 0 PID: 4553 Comm: syz-executor217 Not tainted 4.18.0-rc1+ #111 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 __skb_unlink include/linux/skbuff.h:1843 [inline] __skb_dequeue include/linux/skbuff.h:1863 [inline] skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815 skb_queue_purge+0x26/0x40 net/core/skbuff.c:2852 packet_set_ring+0x675/0x1da0 net/packet/af_packet.c:4331 packet_release+0x630/0xd90 net/packet/af_packet.c:2991 __sock_release+0xd7/0x260 net/socket.c:603 sock_close+0x19/0x20 net/socket.c:1186 __fput+0x35b/0x8b0 fs/file_table.c:209 ____fput+0x15/0x20 fs/file_table.c:243 task_work_run+0x1ec/0x2a0 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x1b08/0x2750 kernel/exit.c:865 do_group_exit+0x177/0x440 kernel/exit.c:968 __do_sys_exit_group kernel/exit.c:979 [inline] __se_sys_exit_group kernel/exit.c:977 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:977 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4448e9 Code: Bad RIP value. RSP: 002b:00007ffd5f777ca8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004448e9 RDX: 00000000004448e9 RSI: 000000000000fcfb RDI: 0000000000000001 RBP: 00000000006cf018 R08: 00007ffd0000a45b R09: 0000000000000000 R10: 00007ffd5f777e48 R11: 0000000000000202 R12: 00000000004021f0 R13: 0000000000402280 R14: 0000000000000000 R15: 0000000000000000
Allocated by task 4553: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554 skb_clone+0x1f5/0x500 net/core/skbuff.c:1282 tpacket_rcv+0x28f7/0x3200 net/packet/af_packet.c:2221 deliver_skb net/core/dev.c:1925 [inline] deliver_ptype_list_skb net/core/dev.c:1940 [inline] __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767 netif_receive_skb+0xbf/0x420 net/core/dev.c:4791 tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571 tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009 call_write_iter include/linux/fs.h:1795 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6c6/0x9f0 fs/read_write.c:487 vfs_write+0x1f8/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 4553: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x86/0x2d0 mm/slab.c:3756 kfree_skbmem+0x154/0x230 net/core/skbuff.c:582 __kfree_skb net/core/skbuff.c:642 [inline] kfree_skb+0x1a5/0x580 net/core/skbuff.c:659 tpacket_rcv+0x189e/0x3200 net/packet/af_packet.c:2385 deliver_skb net/core/dev.c:1925 [inline] deliver_ptype_list_skb net/core/dev.c:1940 [inline] __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767 netif_receive_skb+0xbf/0x420 net/core/dev.c:4791 tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571 tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009 call_write_iter include/linux/fs.h:1795 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6c6/0x9f0 fs/read_write.c:487 vfs_write+0x1f8/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at ffff8801b044ecc0 which belongs to the cache skbuff_head_cache of size 232 The buggy address is located 0 bytes inside of 232-byte region [ffff8801b044ecc0, ffff8801b044eda8) The buggy address belongs to the page: page:ffffea0006c11380 count:1 mapcount:0 mapping:ffff8801d9be96c0 index:0x0 flags: 0x2fffc0000000100(slab) raw: 02fffc0000000100 ffffea0006c17988 ffff8801d9bec248 ffff8801d9be96c0 raw: 0000000000000000 ffff8801b044e040 000000010000000c 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff8801b044eb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8801b044ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
ffff8801b044ec80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
^ ffff8801b044ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8801b044ed80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Cc: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2303,6 +2303,13 @@ static int tpacket_rcv(struct sk_buff *s if (po->stats.stats1.tp_drops) status |= TP_STATUS_LOSING; } + + if (do_vnet && + virtio_net_hdr_from_skb(skb, h.raw + macoff - + sizeof(struct virtio_net_hdr), + vio_le(), true, 0)) + goto drop_n_account; + po->stats.stats1.tp_packets++; if (copy_skb) { status |= TP_STATUS_COPY; @@ -2310,15 +2317,6 @@ static int tpacket_rcv(struct sk_buff *s } spin_unlock(&sk->sk_receive_queue.lock);
- if (do_vnet) { - if (virtio_net_hdr_from_skb(skb, h.raw + macoff - - sizeof(struct virtio_net_hdr), - vio_le(), true, 0)) { - spin_lock(&sk->sk_receive_queue.lock); - goto drop_n_account; - } - } - skb_copy_bits(skb, 0, h.raw + macoff, snaplen);
if (!(ts_status = tpacket_get_timestamp(skb, &ts, po->tp_tstamp)))
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Konstantin Khlebnikov khlebnikov@yandex-team.ru
[ Upstream commit 7e85dc8cb35abf16455f1511f0670b57c1a84608 ]
When blackhole is used on top of classful qdisc like hfsc it breaks qlen and backlog counters because packets are disappear without notice.
In HFSC non-zero qlen while all classes are inactive triggers warning: WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc] and schedules watchdog work endlessly.
This patch return __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS, this flag tells upper layer: this packet is gone and isn't queued.
Signed-off-by: Konstantin Khlebnikov khlebnikov@yandex-team.ru Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/sch_blackhole.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sched/sch_blackhole.c +++ b/net/sched/sch_blackhole.c @@ -21,7 +21,7 @@ static int blackhole_enqueue(struct sk_b struct sk_buff **to_free) { qdisc_drop(skb, sch, to_free); - return NET_XMIT_SUCCESS; + return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; }
static struct sk_buff *blackhole_dequeue(struct Qdisc *sch)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f ]
After commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"), sungem owners reported the infamous "eth0: hw csum failure" message.
CHECKSUM_COMPLETE has in fact never worked for this driver, but this was masked by the fact that upper stacks had to strip the FCS, and therefore skb->ip_summed was set back to CHECKSUM_NONE before my recent change.
Driver configures a number of bytes to skip when the chip computes the checksum, and for some reason only half of the Ethernet header was skipped.
Then a second problem is that we should strip the FCS by default, unless the driver is updated to eventually support NETIF_F_RXFCS in the future.
Finally, a driver should check if NETIF_F_RXCSUM feature is enabled or not, so that the admin can turn off rx checksum if wanted.
Many thanks to Andreas Schwab and Mathieu Malaterre for their help in debugging this issue.
Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: Meelis Roos mroos@linux.ee Reported-by: Mathieu Malaterre malat@debian.org Reported-by: Andreas Schwab schwab@linux-m68k.org Tested-by: Andreas Schwab schwab@linux-m68k.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/sun/sungem.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-)
--- a/drivers/net/ethernet/sun/sungem.c +++ b/drivers/net/ethernet/sun/sungem.c @@ -59,8 +59,7 @@ #include <linux/sungem_phy.h> #include "sungem.h"
-/* Stripping FCS is causing problems, disabled for now */ -#undef STRIP_FCS +#define STRIP_FCS
#define DEFAULT_MSG (NETIF_MSG_DRV | \ NETIF_MSG_PROBE | \ @@ -434,7 +433,7 @@ static int gem_rxmac_reset(struct gem *g writel(desc_dma & 0xffffffff, gp->regs + RXDMA_DBLOW); writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK); val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) | - ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128); + (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128); writel(val, gp->regs + RXDMA_CFG); if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN) writel(((5 & RXDMA_BLANK_IPKTS) | @@ -759,7 +758,6 @@ static int gem_rx(struct gem *gp, int wo struct net_device *dev = gp->dev; int entry, drops, work_done = 0; u32 done; - __sum16 csum;
if (netif_msg_rx_status(gp)) printk(KERN_DEBUG "%s: rx interrupt, done: %d, rx_new: %d\n", @@ -854,9 +852,13 @@ static int gem_rx(struct gem *gp, int wo skb = copy_skb; }
- csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff); - skb->csum = csum_unfold(csum); - skb->ip_summed = CHECKSUM_COMPLETE; + if (likely(dev->features & NETIF_F_RXCSUM)) { + __sum16 csum; + + csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff); + skb->csum = csum_unfold(csum); + skb->ip_summed = CHECKSUM_COMPLETE; + } skb->protocol = eth_type_trans(skb, gp->dev);
napi_gro_receive(&gp->napi, skb); @@ -1760,7 +1762,7 @@ static void gem_init_dma(struct gem *gp) writel(0, gp->regs + TXDMA_KICK);
val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) | - ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128); + (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128); writel(val, gp->regs + RXDMA_CFG);
writel(desc_dma >> 32, gp->regs + RXDMA_DBHI); @@ -2986,8 +2988,8 @@ static int gem_init_one(struct pci_dev * pci_set_drvdata(pdev, dev);
/* We can do scatter/gather and HW checksum */ - dev->hw_features = NETIF_F_SG | NETIF_F_HW_CSUM; - dev->features |= dev->hw_features | NETIF_F_RXCSUM; + dev->hw_features = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM; + dev->features = dev->hw_features; if (pci_using_dac) dev->features |= NETIF_F_HIGHDMA;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Ahern dsahern@gmail.com
[ Upstream commit 8c43bd1706885ba1acfa88da02bc60a2ec16f68c ]
Similar to 69678bcd4d2d ("udp: fix SO_BINDTODEVICE"), TCP socket lookups need to fail if dev_match is not true. Currently, a packet to a given port can match a socket bound to device when it should not. In the VRF case, this causes the lookup to hit a VRF socket and not a global socket resulting in a response trying to go through the VRF when it should not.
Fixes: 3fa6f616a7a4d ("net: ipv4: add second dif to inet socket lookups") Fixes: 4297a0ef08572 ("net: ipv6: add second dif to inet6 socket lookups") Reported-by: Lou Berger lberger@labn.net Diagnosed-by: Renato Westphal renato@opensourcerouting.org Tested-by: Renato Westphal renato@opensourcerouting.org Signed-off-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/inet_hashtables.c | 4 ++-- net/ipv6/inet6_hashtables.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
--- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -188,9 +188,9 @@ static inline int compute_score(struct s bool dev_match = (sk->sk_bound_dev_if == dif || sk->sk_bound_dev_if == sdif);
- if (exact_dif && !dev_match) + if (!dev_match) return -1; - if (sk->sk_bound_dev_if && dev_match) + if (sk->sk_bound_dev_if) score += 4; } if (sk->sk_incoming_cpu == raw_smp_processor_id()) --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -113,9 +113,9 @@ static inline int compute_score(struct s bool dev_match = (sk->sk_bound_dev_if == dif || sk->sk_bound_dev_if == sdif);
- if (exact_dif && !dev_match) + if (!dev_match) return -1; - if (sk->sk_bound_dev_if && dev_match) + if (sk->sk_bound_dev_if) score++; } if (sk->sk_incoming_cpu == raw_smp_processor_id())
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com
[ Upstream commit 82a4e71b1565dea8387f54503e806cf374e779ec ]
When ptp clock is not available for a PF (e.g., higher PFs in NPAR mode), get-tsinfo() callback should return the software timestamp capabilities instead of returning the error.
Fixes: 4c55215c ("qede: Add driver support for PTP") Signed-off-by: Sudarsana Reddy Kalluru Sudarsana.Kalluru@cavium.com Signed-off-by: Michal Kalderon Michal.Kalderon@cavium.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/qlogic/qede/qede_ptp.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/qlogic/qede/qede_ptp.c +++ b/drivers/net/ethernet/qlogic/qede/qede_ptp.c @@ -337,8 +337,14 @@ int qede_ptp_get_ts_info(struct qede_dev { struct qede_ptp *ptp = edev->ptp;
- if (!ptp) - return -EIO; + if (!ptp) { + info->so_timestamping = SOF_TIMESTAMPING_TX_SOFTWARE | + SOF_TIMESTAMPING_RX_SOFTWARE | + SOF_TIMESTAMPING_SOFTWARE; + info->phc_index = -1; + + return 0; + }
info->so_timestamping = SOF_TIMESTAMPING_TX_SOFTWARE | SOF_TIMESTAMPING_RX_SOFTWARE |
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com
[ Upstream commit 538f8d00ba8bb417c4d9e76c61dee59d812d8287 ]
By default, driver sets the eswitch mode incorrectly as VEB (virtual Ethernet bridging). Need to set VEB eswitch mode only when sriov is enabled, and it should be to set NONE by default. The patch incorporates this change.
Fixes: 0fefbfbaa ("qed*: Management firmware - notifications and defaults") Signed-off-by: Sudarsana Reddy Kalluru Sudarsana.Kalluru@cavium.com Signed-off-by: Michal Kalderon Michal.Kalderon@cavium.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/qlogic/qed/qed_dev.c | 2 +- drivers/net/ethernet/qlogic/qed/qed_sriov.c | 19 +++++++++++++++++-- 2 files changed, 18 insertions(+), 3 deletions(-)
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -1782,7 +1782,7 @@ int qed_hw_init(struct qed_dev *cdev, st DP_INFO(p_hwfn, "Failed to update driver state\n");
rc = qed_mcp_ov_update_eswitch(p_hwfn, p_hwfn->p_main_ptt, - QED_OV_ESWITCH_VEB); + QED_OV_ESWITCH_NONE); if (rc) DP_INFO(p_hwfn, "Failed to update eswitch mode\n"); } --- a/drivers/net/ethernet/qlogic/qed/qed_sriov.c +++ b/drivers/net/ethernet/qlogic/qed/qed_sriov.c @@ -4396,6 +4396,8 @@ static void qed_sriov_enable_qid_config( static int qed_sriov_enable(struct qed_dev *cdev, int num) { struct qed_iov_vf_init_params params; + struct qed_hwfn *hwfn; + struct qed_ptt *ptt; int i, j, rc;
if (num >= RESC_NUM(&cdev->hwfns[0], QED_VPORT)) { @@ -4408,8 +4410,8 @@ static int qed_sriov_enable(struct qed_d
/* Initialize HW for VF access */ for_each_hwfn(cdev, j) { - struct qed_hwfn *hwfn = &cdev->hwfns[j]; - struct qed_ptt *ptt = qed_ptt_acquire(hwfn); + hwfn = &cdev->hwfns[j]; + ptt = qed_ptt_acquire(hwfn);
/* Make sure not to use more than 16 queues per VF */ params.num_queues = min_t(int, @@ -4445,6 +4447,19 @@ static int qed_sriov_enable(struct qed_d goto err; }
+ hwfn = QED_LEADING_HWFN(cdev); + ptt = qed_ptt_acquire(hwfn); + if (!ptt) { + DP_ERR(hwfn, "Failed to acquire ptt\n"); + rc = -EBUSY; + goto err; + } + + rc = qed_mcp_ov_update_eswitch(hwfn, ptt, QED_OV_ESWITCH_VEB); + if (rc) + DP_INFO(cdev, "Failed to update eswitch mode\n"); + qed_ptt_release(hwfn, ptt); + return num;
err:
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com
[ Upstream commit cc9b27cdf7bd3c86df73439758ac1564bc8f5bbe ]
Use the correct size value while copying chassis/port id values.
Fixes: 6ad8c632e ("qed: Add support for query/config dcbx.") Signed-off-by: Sudarsana Reddy Kalluru Sudarsana.Kalluru@cavium.com Signed-off-by: Michal Kalderon Michal.Kalderon@cavium.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c @@ -700,9 +700,9 @@ qed_dcbx_get_local_lldp_params(struct qe p_local = &p_hwfn->p_dcbx_info->lldp_local[LLDP_NEAREST_BRIDGE];
memcpy(params->lldp_local.local_chassis_id, p_local->local_chassis_id, - ARRAY_SIZE(p_local->local_chassis_id)); + sizeof(p_local->local_chassis_id)); memcpy(params->lldp_local.local_port_id, p_local->local_port_id, - ARRAY_SIZE(p_local->local_port_id)); + sizeof(p_local->local_port_id)); }
static void @@ -714,9 +714,9 @@ qed_dcbx_get_remote_lldp_params(struct q p_remote = &p_hwfn->p_dcbx_info->lldp_remote[LLDP_NEAREST_BRIDGE];
memcpy(params->lldp_remote.peer_chassis_id, p_remote->peer_chassis_id, - ARRAY_SIZE(p_remote->peer_chassis_id)); + sizeof(p_remote->peer_chassis_id)); memcpy(params->lldp_remote.peer_port_id, p_remote->peer_port_id, - ARRAY_SIZE(p_remote->peer_port_id)); + sizeof(p_remote->peer_port_id)); }
static int
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com
[ Upstream commit bb7858ba1102f82470a917e041fd23e6385c31be ]
Memory size is limited in the kdump kernel environment. Allocation of more msix-vectors (or queues) consumes few tens of MBs of memory, which might lead to the kdump kernel failure. This patch adds changes to limit the number of MSI-X vectors in kdump kernel to minimum required value (i.e., 2 per engine).
Fixes: fe56b9e6a ("qed: Add module with basic common support") Signed-off-by: Sudarsana Reddy Kalluru Sudarsana.Kalluru@cavium.com Signed-off-by: Michal Kalderon Michal.Kalderon@cavium.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/qlogic/qed/qed_main.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -779,6 +779,14 @@ static int qed_slowpath_setup_int(struct /* We want a minimum of one slowpath and one fastpath vector per hwfn */ cdev->int_params.in.min_msix_cnt = cdev->num_hwfns * 2;
+ if (is_kdump_kernel()) { + DP_INFO(cdev, + "Kdump kernel: Limit the max number of requested MSI-X vectors to %hd\n", + cdev->int_params.in.min_msix_cnt); + cdev->int_params.in.num_vectors = + cdev->int_params.in.min_msix_cnt; + } + rc = qed_set_int_mode(cdev, false); if (rc) { DP_ERR(cdev, "qed_slowpath_setup_int ERR\n");
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Slaby jslaby@suse.cz
[ Upstream commit 0ee1f4734967af8321ecebaf9c74221ace34f2d5 ]
When unplugging an r8152 adapter while the interface is UP, the NIC becomes unusable. usb->disconnect (aka rtl8152_disconnect) deletes napi. Then, rtl8152_disconnect calls unregister_netdev and that invokes netdev->ndo_stop (aka rtl8152_close). rtl8152_close tries to napi_disable, but the napi is already deleted by disconnect above. So the first while loop in napi_disable never finishes. This results in complete deadlock of the network layer as there is rtnl_mutex held by unregister_netdev.
So avoid the call to napi_disable in rtl8152_close when the device is already gone.
The other calls to usb_kill_urb, cancel_delayed_work_sync, netif_stop_queue etc. seem to be fine. The urb and netdev is not destroyed yet.
Signed-off-by: Jiri Slaby jslaby@suse.cz Cc: linux-usb@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/r8152.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -3959,7 +3959,8 @@ static int rtl8152_close(struct net_devi #ifdef CONFIG_PM_SLEEP unregister_pm_notifier(&tp->pm_notifier); #endif - napi_disable(&tp->napi); + if (!test_bit(RTL8152_UNPLUG, &tp->flags)) + napi_disable(&tp->napi); clear_bit(WORK_ENABLE, &tp->flags); usb_kill_urb(tp->intr_urb); cancel_delayed_work_sync(&tp->schedule);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bhadram Varka vbhadram@nvidia.com
[ Upstream commit b6cfffa7ad923c73f317ea50fd4ebcb3b4b6669c ]
HW does not support Half-duplex mode in multi-queue scenario. Fix it by not advertising the Half-Duplex mode if multi-queue enabled.
Signed-off-by: Bhadram Varka vbhadram@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -914,6 +914,7 @@ static void stmmac_check_pcs_mode(struct static int stmmac_init_phy(struct net_device *dev) { struct stmmac_priv *priv = netdev_priv(dev); + u32 tx_cnt = priv->plat->tx_queues_to_use; struct phy_device *phydev; char phy_id_fmt[MII_BUS_ID_SIZE + 3]; char bus_id[MII_BUS_ID_SIZE]; @@ -955,6 +956,15 @@ static int stmmac_init_phy(struct net_de SUPPORTED_1000baseT_Full);
/* + * Half-duplex mode not supported with multiqueue + * half-duplex can only works with single queue + */ + if (tx_cnt > 1) + phydev->supported &= ~(SUPPORTED_1000baseT_Half | + SUPPORTED_100baseT_Half | + SUPPORTED_10baseT_Half); + + /* * Broken HW is sometimes missing the pull-up resistor on the * MDIO line, which results in reads to non-existent devices returning * 0 rather than 0xffff. Catch this here and treat 0 as a non-existent
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Doron Roberts-Kedes doronrk@fb.com
[ Upstream commit 977c7114ebda2e746a114840d3a875e0cdb826fb ]
On receving an incomplete message, the existing code stores the remaining length of the cloned skb in the early_eaten field instead of incrementing the value returned by __strp_recv. This defers invocation of sock_rfree for the current skb until the next invocation of __strp_recv, which returns early_eaten if early_eaten is non-zero.
This behavior causes a stall when the current message occupies the very tail end of a massive skb, and strp_peek/need_bytes indicates that the remainder of the current message has yet to arrive on the socket. The TCP receive buffer is totally full, causing the TCP window to go to zero, so the remainder of the message will never arrive.
Incrementing the value returned by __strp_recv by the amount otherwise stored in early_eaten prevents stalls of this nature.
Signed-off-by: Doron Roberts-Kedes doronrk@fb.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/strparser/strparser.c | 17 +---------------- 1 file changed, 1 insertion(+), 16 deletions(-)
--- a/net/strparser/strparser.c +++ b/net/strparser/strparser.c @@ -35,7 +35,6 @@ struct _strp_msg { */ struct strp_msg strp; int accum_len; - int early_eaten; };
static inline struct _strp_msg *_strp_msg(struct sk_buff *skb) @@ -115,20 +114,6 @@ static int __strp_recv(read_descriptor_t head = strp->skb_head; if (head) { /* Message already in progress */ - - stm = _strp_msg(head); - if (unlikely(stm->early_eaten)) { - /* Already some number of bytes on the receive sock - * data saved in skb_head, just indicate they - * are consumed. - */ - eaten = orig_len <= stm->early_eaten ? - orig_len : stm->early_eaten; - stm->early_eaten -= eaten; - - return eaten; - } - if (unlikely(orig_offset)) { /* Getting data with a non-zero offset when a message is * in progress is not expected. If it does happen, we @@ -297,9 +282,9 @@ static int __strp_recv(read_descriptor_t }
stm->accum_len += cand_len; + eaten += cand_len; strp->need_bytes = stm->strp.full_len - stm->accum_len; - stm->early_eaten = cand_len; STRP_STATS_ADD(strp->stats.bytes, cand_len); desc->count = 0; /* Stop reading socket */ break;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuchung Cheng ycheng@google.com
[ Upstream commit c860e997e9170a6d68f9d1e6e2cf61f572191aaf ]
Fast Open key could be stored in different endian based on the CPU. Previously hosts in different endianness in a server farm using the same key config (sysctl value) would produce different cookies. This patch fixes it by always storing it as little endian to keep same API for LE hosts.
Reported-by: Daniele Iamartino danielei@google.com Signed-off-by: Yuchung Cheng ycheng@google.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: Neal Cardwell ncardwell@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/sysctl_net_ipv4.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
--- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -258,8 +258,9 @@ static int proc_tcp_fastopen_key(struct { struct ctl_table tbl = { .maxlen = (TCP_FASTOPEN_KEY_LENGTH * 2 + 10) }; struct tcp_fastopen_context *ctxt; - int ret; u32 user_key[4]; /* 16 bytes, matching TCP_FASTOPEN_KEY_LENGTH */ + __le32 key[4]; + int ret, i;
tbl.data = kmalloc(tbl.maxlen, GFP_KERNEL); if (!tbl.data) @@ -268,11 +269,14 @@ static int proc_tcp_fastopen_key(struct rcu_read_lock(); ctxt = rcu_dereference(tcp_fastopen_ctx); if (ctxt) - memcpy(user_key, ctxt->key, TCP_FASTOPEN_KEY_LENGTH); + memcpy(key, ctxt->key, TCP_FASTOPEN_KEY_LENGTH); else - memset(user_key, 0, sizeof(user_key)); + memset(key, 0, sizeof(key)); rcu_read_unlock();
+ for (i = 0; i < ARRAY_SIZE(key); i++) + user_key[i] = le32_to_cpu(key[i]); + snprintf(tbl.data, tbl.maxlen, "%08x-%08x-%08x-%08x", user_key[0], user_key[1], user_key[2], user_key[3]); ret = proc_dostring(&tbl, write, buffer, lenp, ppos); @@ -288,12 +292,16 @@ static int proc_tcp_fastopen_key(struct * first invocation of tcp_fastopen_cookie_gen */ tcp_fastopen_init_key_once(false); - tcp_fastopen_reset_cipher(user_key, TCP_FASTOPEN_KEY_LENGTH); + + for (i = 0; i < ARRAY_SIZE(user_key); i++) + key[i] = cpu_to_le32(user_key[i]); + + tcp_fastopen_reset_cipher(key, TCP_FASTOPEN_KEY_LENGTH); }
bad_key: pr_debug("proc FO key set 0x%x-%x-%x-%x <- 0x%s: %u\n", - user_key[0], user_key[1], user_key[2], user_key[3], + user_key[0], user_key[1], user_key[2], user_key[3], (char *)tbl.data, ret); kfree(tbl.data); return ret;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Wang jasowang@redhat.com
[ Upstream commit b8f1f65882f07913157c44673af7ec0b308d03eb ]
Sock will be NULL if we pass -1 to vhost_net_set_backend(), but when we meet errors during ubuf allocation, the code does not check for NULL before calling sockfd_put(), this will lead NULL dereferencing. Fixing by checking sock pointer before.
Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support") Reported-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/vhost/net.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -1186,7 +1186,8 @@ err_used: if (ubufs) vhost_net_ubuf_put_wait_and_free(ubufs); err_ubufs: - sockfd_put(sock); + if (sock) + sockfd_put(sock); err_vq: mutex_unlock(&vq->mutex); err:
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Claudio Imbrenda imbrenda@linux.vnet.ibm.com
[ Upstream commit e5ab564c9ebee77794842ca7d7476147b83d6a27 ]
The dst_cid and src_cid are 64 bits, therefore 64 bit accessors should be used, and in fact in virtio_transport_common.c only 64 bit accessors are used. Using 32 bit accessors for 64 bit values breaks big endian systems.
This patch fixes a wrong use of le32_to_cpu in virtio_transport_send_pkt.
Fixes: b9116823189e85ccf384 ("VSOCK: add loopback to virtio_transport")
Signed-off-by: Claudio Imbrenda imbrenda@linux.vnet.ibm.com Reviewed-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/vmw_vsock/virtio_transport.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -201,7 +201,7 @@ virtio_transport_send_pkt(struct virtio_ return -ENODEV; }
- if (le32_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) + if (le64_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) return virtio_transport_send_pkt_loopback(vsock, pkt);
if (pkt->reply)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gustavo A. R. Silva gustavo@embeddedor.com
commit 676bcfece19f83621e905aa55b5ed2d45cc4f2d3 upstream.
t.qset_idx can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c:2286 cxgb_extension_ioctl() warn: potential spectre issue 'adapter->msix_info'
Fix this by sanitizing t.qset_idx before using it to index adapter->msix_info
Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c @@ -51,6 +51,7 @@ #include <linux/sched.h> #include <linux/slab.h> #include <linux/uaccess.h> +#include <linux/nospec.h>
#include "common.h" #include "cxgb3_ioctl.h" @@ -2268,6 +2269,7 @@ static int cxgb_extension_ioctl(struct n
if (t.qset_idx >= nqsets) return -EINVAL; + t.qset_idx = array_index_nospec(t.qset_idx, nqsets);
q = &adapter->params.sge.qset[q1 + t.qset_idx]; t.rspq_size = q->rspq_size;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ping-Ke Shih pkshih@realtek.com
commit 12dfa2f68ab659636e092db13b5d17cf9aac82af upstream.
When connecting to AP, mac80211 asks driver to enter and leave PS quickly, but driver deinit doesn't wait for delayed work complete when entering PS, then driver reinit procedure and delay work are running simultaneously. This will cause unpredictable kernel oops or crash like
rtl8723be: error H2C cmd because of Fw download fail!!! WARNING: CPU: 3 PID: 159 at drivers/net/wireless/realtek/rtlwifi/ rtl8723be/fw.c:227 rtl8723be_fill_h2c_cmd+0x182/0x510 [rtl8723be] CPU: 3 PID: 159 Comm: kworker/3:2 Tainted: G O 4.16.13-2-ARCH #1 Hardware name: ASUSTeK COMPUTER INC. X556UF/X556UF, BIOS X556UF.406 10/21/2016 Workqueue: rtl8723be_pci rtl_c2hcmd_wq_callback [rtlwifi] RIP: 0010:rtl8723be_fill_h2c_cmd+0x182/0x510 [rtl8723be] RSP: 0018:ffffa6ab01e1bd70 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffa26069071520 RCX: 0000000000000001 RDX: 0000000080000001 RSI: ffffffff8be70e9c RDI: 00000000ffffffff RBP: 0000000000000000 R08: 0000000000000048 R09: 0000000000000348 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: ffffa26069071520 R14: 0000000000000000 R15: ffffa2607d205f70 FS: 0000000000000000(0000) GS:ffffa26081d80000(0000) knlGS:000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000443b39d3000 CR3: 000000037700a005 CR4: 00000000003606e0 Call Trace: ? halbtc_send_bt_mp_operation.constprop.17+0xd5/0xe0 [btcoexist] ? ex_btc8723b1ant_bt_info_notify+0x3b8/0x820 [btcoexist] ? rtl_c2hcmd_launcher+0xab/0x110 [rtlwifi] ? process_one_work+0x1d1/0x3b0 ? worker_thread+0x2b/0x3d0 ? process_one_work+0x3b0/0x3b0 ? kthread+0x112/0x130 ? kthread_create_on_node+0x60/0x60 ? ret_from_fork+0x35/0x40 Code: 00 76 b4 e9 e2 fe ff ff 4c 89 ee 4c 89 e7 e8 56 22 86 ca e9 5e ...
This patch ensures all delayed works done before entering PS to satisfy our expectation, so use cancel_delayed_work_sync() instead. An exception is delayed work ips_nic_off_wq because running task may be itself, so add a parameter ips_wq to deinit function to handle this case.
This issue is reported and fixed in below threads: https://github.com/lwfinger/rtlwifi_new/issues/367 https://github.com/lwfinger/rtlwifi_new/issues/366
Tested-by: Evgeny Kapun abacabadabacaba@gmail.com # 8723DE Tested-by: Shivam Kakkar shivam543@gmail.com # 8723BE on 4.18-rc1 Signed-off-by: Ping-Ke Shih pkshih@realtek.com Fixes: cceb0a597320 ("rtlwifi: Add work queue for c2h cmd.") Cc: Stable stable@vger.kernel.org # 4.11+ Reviewed-by: Larry Finger Larry.Finger@lwfinger.net Signed-off-by: Kalle Valo kvalo@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/realtek/rtlwifi/base.c | 17 ++++++++++------- drivers/net/wireless/realtek/rtlwifi/base.h | 2 +- drivers/net/wireless/realtek/rtlwifi/core.c | 2 +- drivers/net/wireless/realtek/rtlwifi/pci.c | 2 +- drivers/net/wireless/realtek/rtlwifi/ps.c | 4 ++-- drivers/net/wireless/realtek/rtlwifi/usb.c | 2 +- 6 files changed, 16 insertions(+), 13 deletions(-)
--- a/drivers/net/wireless/realtek/rtlwifi/base.c +++ b/drivers/net/wireless/realtek/rtlwifi/base.c @@ -483,18 +483,21 @@ static void _rtl_init_deferred_work(stru
}
-void rtl_deinit_deferred_work(struct ieee80211_hw *hw) +void rtl_deinit_deferred_work(struct ieee80211_hw *hw, bool ips_wq) { struct rtl_priv *rtlpriv = rtl_priv(hw);
del_timer_sync(&rtlpriv->works.watchdog_timer);
- cancel_delayed_work(&rtlpriv->works.watchdog_wq); - cancel_delayed_work(&rtlpriv->works.ips_nic_off_wq); - cancel_delayed_work(&rtlpriv->works.ps_work); - cancel_delayed_work(&rtlpriv->works.ps_rfon_wq); - cancel_delayed_work(&rtlpriv->works.fwevt_wq); - cancel_delayed_work(&rtlpriv->works.c2hcmd_wq); + cancel_delayed_work_sync(&rtlpriv->works.watchdog_wq); + if (ips_wq) + cancel_delayed_work(&rtlpriv->works.ips_nic_off_wq); + else + cancel_delayed_work_sync(&rtlpriv->works.ips_nic_off_wq); + cancel_delayed_work_sync(&rtlpriv->works.ps_work); + cancel_delayed_work_sync(&rtlpriv->works.ps_rfon_wq); + cancel_delayed_work_sync(&rtlpriv->works.fwevt_wq); + cancel_delayed_work_sync(&rtlpriv->works.c2hcmd_wq); } EXPORT_SYMBOL_GPL(rtl_deinit_deferred_work);
--- a/drivers/net/wireless/realtek/rtlwifi/base.h +++ b/drivers/net/wireless/realtek/rtlwifi/base.h @@ -121,7 +121,7 @@ void rtl_init_rfkill(struct ieee80211_hw void rtl_deinit_rfkill(struct ieee80211_hw *hw);
void rtl_watch_dog_timer_callback(unsigned long data); -void rtl_deinit_deferred_work(struct ieee80211_hw *hw); +void rtl_deinit_deferred_work(struct ieee80211_hw *hw, bool ips_wq);
bool rtl_action_proc(struct ieee80211_hw *hw, struct sk_buff *skb, u8 is_tx); int rtlwifi_rate_mapping(struct ieee80211_hw *hw, bool isht, --- a/drivers/net/wireless/realtek/rtlwifi/core.c +++ b/drivers/net/wireless/realtek/rtlwifi/core.c @@ -196,7 +196,7 @@ static void rtl_op_stop(struct ieee80211 /* reset sec info */ rtl_cam_reset_sec_info(hw);
- rtl_deinit_deferred_work(hw); + rtl_deinit_deferred_work(hw, false); } rtlpriv->intf_ops->adapter_stop(hw);
--- a/drivers/net/wireless/realtek/rtlwifi/pci.c +++ b/drivers/net/wireless/realtek/rtlwifi/pci.c @@ -2359,7 +2359,7 @@ void rtl_pci_disconnect(struct pci_dev * ieee80211_unregister_hw(hw); rtlmac->mac80211_registered = 0; } else { - rtl_deinit_deferred_work(hw); + rtl_deinit_deferred_work(hw, false); rtlpriv->intf_ops->adapter_stop(hw); } rtlpriv->cfg->ops->disable_interrupt(hw); --- a/drivers/net/wireless/realtek/rtlwifi/ps.c +++ b/drivers/net/wireless/realtek/rtlwifi/ps.c @@ -66,7 +66,7 @@ bool rtl_ps_disable_nic(struct ieee80211 struct rtl_priv *rtlpriv = rtl_priv(hw);
/*<1> Stop all timer */ - rtl_deinit_deferred_work(hw); + rtl_deinit_deferred_work(hw, true);
/*<2> Disable Interrupt */ rtlpriv->cfg->ops->disable_interrupt(hw); @@ -287,7 +287,7 @@ void rtl_ips_nic_on(struct ieee80211_hw struct rtl_ps_ctl *ppsc = rtl_psc(rtl_priv(hw)); enum rf_pwrstate rtstate;
- cancel_delayed_work(&rtlpriv->works.ips_nic_off_wq); + cancel_delayed_work_sync(&rtlpriv->works.ips_nic_off_wq);
spin_lock(&rtlpriv->locks.ips_lock); if (ppsc->inactiveps) { --- a/drivers/net/wireless/realtek/rtlwifi/usb.c +++ b/drivers/net/wireless/realtek/rtlwifi/usb.c @@ -1150,7 +1150,7 @@ void rtl_usb_disconnect(struct usb_inter ieee80211_unregister_hw(hw); rtlmac->mac80211_registered = 0; } else { - rtl_deinit_deferred_work(hw); + rtl_deinit_deferred_work(hw, false); rtlpriv->intf_ops->adapter_stop(hw); } /*deinit rfkill */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ping-Ke Shih pkshih@realtek.com
commit 9a98302de19991d51e067b88750585203b2a3ab6 upstream.
Without this patch, firmware will not run properly on rtl8821ae, and it causes bad user experience. For example, bad connection performance with low rate, higher power consumption, and so on.
rtl8821ae uses two kinds of firmwares for normal and WoWlan cases, and each firmware has firmware data buffer and size individually. Original code always overwrite size of normal firmware rtlpriv->rtlhal.fwsize, and this mismatch causes firmware checksum error, then firmware can't start.
In this situation, driver gives message "Firmware is not ready to run!".
Fixes: fe89707f0afa ("rtlwifi: rtl8821ae: Simplify loading of WOWLAN firmware") Signed-off-by: Ping-Ke Shih pkshih@realtek.com Cc: Stable stable@vger.kernel.org # 4.0+ Reviewed-by: Larry Finger Larry.Finger@lwfinger.net Signed-off-by: Kalle Valo kvalo@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/realtek/rtlwifi/core.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/net/wireless/realtek/rtlwifi/core.c +++ b/drivers/net/wireless/realtek/rtlwifi/core.c @@ -130,7 +130,6 @@ found_alt: firmware->size); rtlpriv->rtlhal.wowlan_fwsize = firmware->size; } - rtlpriv->rtlhal.fwsize = firmware->size; release_firmware(firmware); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Wahren stefan.wahren@i2se.com
commit dea39aca1d7aef1e2b95b07edeacf04cc8863a2e upstream.
The skb size calculation in lan78xx_tx_bh is in race with the start_xmit, which could lead to rare kernel oopses. So protect the whole skb walk with a spin lock. As a benefit we can unlink the skb directly.
This patch was tested on Raspberry Pi 3B+
Link: https://github.com/raspberrypi/linux/issues/2608 Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet") Cc: stable stable@vger.kernel.org Signed-off-by: Floris Bos bos@je-eigen-domein.nl Signed-off-by: Stefan Wahren stefan.wahren@i2se.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/usb/lan78xx.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -3197,6 +3197,7 @@ static void lan78xx_tx_bh(struct lan78xx pkt_cnt = 0; count = 0; length = 0; + spin_lock_irqsave(&tqp->lock, flags); for (skb = tqp->next; pkt_cnt < tqp->qlen; skb = skb->next) { if (skb_is_gso(skb)) { if (pkt_cnt) { @@ -3205,7 +3206,8 @@ static void lan78xx_tx_bh(struct lan78xx } count = 1; length = skb->len - TX_OVERHEAD; - skb2 = skb_dequeue(tqp); + __skb_unlink(skb, tqp); + spin_unlock_irqrestore(&tqp->lock, flags); goto gso_skb; }
@@ -3214,6 +3216,7 @@ static void lan78xx_tx_bh(struct lan78xx skb_totallen = skb->len + roundup(skb_totallen, sizeof(u32)); pkt_cnt++; } + spin_unlock_irqrestore(&tqp->lock, flags);
/* copy to a single skb */ skb = alloc_skb(skb_totallen, GFP_ATOMIC);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathias Nyman mathias.nyman@linux.intel.com
commit 2278446e2b7cd33ad894b32e7eb63afc7db6c86e upstream.
Hub driver will try to disable a USB3 device twice at logical disconnect, racing with xhci_free_dev() callback from the first port disable.
This can be triggered with "udisksctl power-off --block-device <disk>" or by writing "1" to the "remove" sysfs file for a USB3 device in 4.17-rc4.
USB3 devices don't have a similar disabled link state as USB2 devices, and use a U3 suspended link state instead. In this state the port is still enabled and connected.
hub_port_connect() first disconnects the device, then later it notices that device is still enabled (due to U3 states) it will try to disable the port again (set to U3).
The xhci_free_dev() called during device disable is async, so checking for existing xhci->devs[i] when setting link state to U3 the second time was successful, even if device was being freed.
The regression was caused by, and whole thing revealed by, Commit 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device") which sets xhci->devs[i]->udev to NULL before xhci_virt_dev() returned. and causes a NULL pointer dereference the second time we try to set U3.
Fix this by checking xhci->devs[i]->udev exists before setting link state.
The original patch went to stable so this fix needs to be applied there as well.
Fixes: 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device") Cc: stable@vger.kernel.org Reported-by: Jordan Glover Golden_Miller83@protonmail.ch Tested-by: Jordan Glover Golden_Miller83@protonmail.ch Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/host/xhci-hub.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/usb/host/xhci-hub.c +++ b/drivers/usb/host/xhci-hub.c @@ -366,7 +366,7 @@ int xhci_find_slot_id_by_port(struct usb
slot_id = 0; for (i = 0; i < MAX_HC_SLOTS; i++) { - if (!xhci->devs[i]) + if (!xhci->devs[i] || !xhci->devs[i]->udev) continue; speed = xhci->devs[i]->udev->speed; if (((speed >= USB_SPEED_SUPER) == (hcd->speed >= HCD_USB3))
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Young sean@mess.org
commit 8d4068810d9926250dd2435719a080b889eb44c3 upstream.
If there is IR in the raw kfifo when ir_raw_event_unregister() is called, then kthread_stop() causes ir_raw_event_thread to be scheduled, decode some scancodes and re-arm timer_keyup. The timer_keyup then fires when the rc device is long gone.
Cc: stable@vger.kernel.org Signed-off-by: Sean Young sean@mess.org Signed-off-by: Mauro Carvalho Chehab mchehab@s-opensource.com Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/rc/rc-main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/media/rc/rc-main.c +++ b/drivers/media/rc/rc-main.c @@ -1824,11 +1824,11 @@ void rc_unregister_device(struct rc_dev if (!dev) return;
- del_timer_sync(&dev->timer_keyup); - if (dev->driver_type == RC_DRIVER_IR_RAW) ir_raw_event_unregister(dev);
+ del_timer_sync(&dev->timer_keyup); + rc_free_rx_device(dev);
device_del(&dev->dev);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra peterz@infradead.org
commit 5b9e886a4af97574ca3ce1147f35545da0e7afc7 upstream.
A number of places relies on list_empty(&cs->wd_list), however the list_head does not get initialized. Do so upon registration, such that thereafter it is possible to rely on list_empty() correctly reflecting the list membership status.
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Diego Viola diego.viola@gmail.com Reviewed-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Cc: stable@vger.kernel.org Cc: len.brown@intel.com Cc: rjw@rjwysocki.net Cc: rui.zhang@intel.com Link: https://lkml.kernel.org/r/20180430100344.472662715@infradead.org Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/time/clocksource.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -322,6 +322,8 @@ static void clocksource_enqueue_watchdog { unsigned long flags;
+ INIT_LIST_HEAD(&cs->wd_list); + spin_lock_irqsave(&watchdog_lock, flags); if (cs->flags & CLOCK_SOURCE_MUST_VERIFY) { /* cs is a clocksource to be watched. */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephan Mueller smueller@chronox.de
commit 2546da99212f22034aecf279da9c47cbfac6c981 upstream.
The RX SGL in processing is already registered with the RX SGL tracking list to support proper cleanup. The cleanup code path uses the sg_num_bytes variable which must therefore be always initialized, even in the error code path.
Signed-off-by: Stephan Mueller smueller@chronox.de Reported-by: syzbot+9c251bdd09f83b92ba95@syzkaller.appspotmail.com #syz test: https://github.com/google/kmsan.git master CC: stable@vger.kernel.org #4.14 Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management") Fixes: d887c52d6ae4 ("crypto: algif_aead - overhaul memory management") Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- crypto/af_alg.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -1183,8 +1183,10 @@ int af_alg_get_rsgl(struct sock *sk, str
/* make one iovec available as scatterlist */ err = af_alg_make_sg(&rsgl->sgl, &msg->msg_iter, seglen); - if (err < 0) + if (err < 0) { + rsgl->sg_num_bytes = 0; return err; + }
/* chain the new scatterlist with previous one */ if (areq->last_rsgl)
Hello,
syzbot has tested the proposed patch and the reproducer did not trigger crash:
Reported-and-tested-by: syzbot+9c251bdd09f83b92ba95@syzkaller.appspotmail.com
Tested on:
commit: cf8cd3cd03e2 kmsan: fix CONFIG_KMSAN=n build git tree: https://github.com/google/kmsan.git/master kernel config: https://syzkaller.appspot.com/x/.config?x=c21487206316bcc3 compiler: clang version 7.0.0 (trunk 334104) patch: https://syzkaller.appspot.com/x/patch.diff?x=11fec658400000
Note: testing is done by a robot and is best-effort only.
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masahiro Yamada yamada.masahiro@socionext.com
commit 3f6e6986045d47f87bd982910821b7ab9758487e upstream.
Since commit 1bb88666775e ("mtd: nand: denali: handle timing parameters by setup_data_interface()"), denali_dt.c gets the clock rate from the clock driver. The driver expects the frequency of the bus interface clock, whereas the clock driver of SOCFPGA provides the core clock. Thus, the setup_data_interface() hook calculates timing parameters based on a wrong frequency.
To make it work without relying on the clock driver, hard-code the clock frequency, 200MHz. This is fine for existing DT of UniPhier, and also fixes the issue of SOCFPGA because both platforms use 200 MHz for the bus interface clock.
Fixes: 1bb88666775e ("mtd: nand: denali: handle timing parameters by setup_data_interface()") Cc: linux-stable stable@vger.kernel.org #4.14+ Reported-by: Philipp Rosenberger p.rosenberger@linutronix.de Suggested-by: Boris Brezillon boris.brezillon@bootlin.com Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com Tested-by: Richard Weinberger richard@nod.at Signed-off-by: Boris Brezillon boris.brezillon@bootlin.com Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/denali_dt.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/mtd/nand/denali_dt.c +++ b/drivers/mtd/nand/denali_dt.c @@ -122,7 +122,11 @@ static int denali_dt_probe(struct platfo if (ret) return ret;
- denali->clk_x_rate = clk_get_rate(dt->clk); + /* + * Hardcode the clock rate for the backward compatibility. + * This works for both SOCFPGA and UniPhier. + */ + denali->clk_x_rate = 200000000;
ret = denali_init(denali); if (ret)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alan Jenkins alan.christopher.jenkins@gmail.com
commit 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428 upstream.
When blk_queue_enter() waits for a queue to unfreeze, or unset the PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.
The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work reliably"). Note the SCSI device is resumed asynchronously, i.e. after un-freezing userspace tasks.
So that commit exposed the bug as a regression in v4.15. A mysterious SIGBUS (or -EIO) sometimes happened during the time the device was being resumed. Most frequently, there was no kernel log message, and we saw Xorg or Xwayland killed by SIGBUS.[1]
[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979
Without this fix, I get an IO error in this test:
# dd if=/dev/sda of=/dev/null iflag=direct & \ while killall -SIGUSR1 dd; do sleep 0.1; done & \ echo mem > /sys/power/state ; \ sleep 5; killall dd # stop after 5 seconds
The interruptible wait was added to blk_queue_enter in commit 3ef28e83ab15 ("block: generic request_queue reference counting"). Before then, the interruptible wait was only in blk-mq, but I don't think it could ever have been correct.
Reviewed-by: Bart Van Assche bart.vanassche@wdc.com Cc: stable@vger.kernel.org Signed-off-by: Alan Jenkins alan.christopher.jenkins@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- block/blk-core.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
--- a/block/blk-core.c +++ b/block/blk-core.c @@ -779,7 +779,6 @@ EXPORT_SYMBOL(blk_alloc_queue); int blk_queue_enter(struct request_queue *q, bool nowait) { while (true) { - int ret;
if (percpu_ref_tryget_live(&q->q_usage_counter)) return 0; @@ -796,13 +795,11 @@ int blk_queue_enter(struct request_queue */ smp_rmb();
- ret = wait_event_interruptible(q->mq_freeze_wq, - !atomic_read(&q->mq_freeze_depth) || - blk_queue_dying(q)); + wait_event(q->mq_freeze_wq, + !atomic_read(&q->mq_freeze_depth) || + blk_queue_dying(q)); if (blk_queue_dying(q)) return -ENODEV; - if (ret) - return ret; } }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dexuan Cui decui@microsoft.com
commit 35a88a18d7ea58600e11590405bc93b08e16e7f5 upstream.
Commit de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()") uses local_bh_disable()/enable(), because hv_pci_onchannelcallback() can also run in tasklet context as the channel event callback, so bottom halves should be disabled to prevent a race condition.
With CONFIG_PROVE_LOCKING=y in the recent mainline, or old kernels that don't have commit f71b74bca637 ("irq/softirqs: Use lockdep to assert IRQs are disabled/enabled"), when the upper layer IRQ code calls hv_compose_msi_msg() with local IRQs disabled, we'll see a warning at the beginning of __local_bh_enable_ip():
IRQs not enabled as expected WARNING: CPU: 0 PID: 408 at kernel/softirq.c:162 __local_bh_enable_ip
The warning exposes an issue in de0aa7b2f97d: local_bh_enable() can potentially call do_softirq(), which is not supposed to run when local IRQs are disabled. Let's fix this by using local_irq_save()/restore() instead.
Note: hv_pci_onchannelcallback() is not a hot path because it's only called when the PCI device is hot added and removed, which is infrequent.
Fixes: de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()") Signed-off-by: Dexuan Cui decui@microsoft.com Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Haiyang Zhang haiyangz@microsoft.com Cc: stable@vger.kernel.org Cc: Stephen Hemminger sthemmin@microsoft.com Cc: K. Y. Srinivasan kys@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/host/pci-hyperv.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/pci/host/pci-hyperv.c +++ b/drivers/pci/host/pci-hyperv.c @@ -1091,6 +1091,7 @@ static void hv_compose_msi_msg(struct ir struct pci_bus *pbus; struct pci_dev *pdev; struct cpumask *dest; + unsigned long flags; struct compose_comp_ctxt comp; struct tran_int_desc *int_desc; struct { @@ -1182,14 +1183,15 @@ static void hv_compose_msi_msg(struct ir * the channel callback directly when channel->target_cpu is * the current CPU. When the higher level interrupt code * calls us with interrupt enabled, let's add the - * local_bh_disable()/enable() to avoid race. + * local_irq_save()/restore() to avoid race: + * hv_pci_onchannelcallback() can also run in tasklet. */ - local_bh_disable(); + local_irq_save(flags);
if (hbus->hdev->channel->target_cpu == smp_processor_id()) hv_pci_onchannelcallback(hbus);
- local_bh_enable(); + local_irq_restore(flags);
if (hpdev->state == hv_pcichild_ejecting) { dev_err_once(&hbus->hdev->device,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit 11ff7288beb2b7da889a014aff0a7b80bf8efcf3 upstream.
the ebtables evaluation loop expects targets to return positive values (jumps), or negative values (absolute verdicts).
This is completely different from what xtables does. In xtables, targets are expected to return the standard netfilter verdicts, i.e. NF_DROP, NF_ACCEPT, etc.
ebtables will consider these as jumps.
Therefore reject any target found due to unspec fallback. v2: also reject watchers. ebtables ignores their return value, so a target that assumes skb ownership (and returns NF_STOLEN) causes use-after-free.
The only watchers in the 'ebtables' front-end are log and nflog; both have AF_BRIDGE specific wrappers on kernel side.
Reported-by: syzbot+2b43f681169a2a0d306a@syzkaller.appspotmail.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/bridge/netfilter/ebtables.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
--- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -398,6 +398,12 @@ ebt_check_watcher(struct ebt_entry_watch watcher = xt_request_find_target(NFPROTO_BRIDGE, w->u.name, 0); if (IS_ERR(watcher)) return PTR_ERR(watcher); + + if (watcher->family != NFPROTO_BRIDGE) { + module_put(watcher->me); + return -ENOENT; + } + w->u.watcher = watcher;
par->target = watcher; @@ -719,6 +725,13 @@ ebt_check_entry(struct ebt_entry *e, str goto cleanup_watchers; }
+ /* Reject UNSPEC, xtables verdicts/return values are incompatible */ + if (target->family != NFPROTO_BRIDGE) { + module_put(target->me); + ret = -ENOENT; + goto cleanup_watchers; + } + t->u.target = target; if (t->u.target == &ebt_standard_target) { if (gap < sizeof(struct ebt_standard_target)) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Biggers ebiggers@google.com
commit fe10e398e860955bac4d28ec031b701d358465e4 upstream.
ReiserFS prepares log messages into a 1024-byte buffer with no bounds checks. Long messages, such as the "unknown mount option" warning when userspace passes a crafted mount options string, overflow this buffer. This causes KASAN to report a global-out-of-bounds write.
Fix it by truncating messages to the buffer size.
Link: http://lkml.kernel.org/r/20180707203621.30922-1-ebiggers3@gmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+b890b3335a4d8c608963@syzkaller.appspotmail.com Signed-off-by: Eric Biggers ebiggers@google.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/reiserfs/prints.c | 141 +++++++++++++++++++++++++++++---------------------- 1 file changed, 81 insertions(+), 60 deletions(-)
--- a/fs/reiserfs/prints.c +++ b/fs/reiserfs/prints.c @@ -76,83 +76,99 @@ static char *le_type(struct reiserfs_key }
/* %k */ -static void sprintf_le_key(char *buf, struct reiserfs_key *key) +static int scnprintf_le_key(char *buf, size_t size, struct reiserfs_key *key) { if (key) - sprintf(buf, "[%d %d %s %s]", le32_to_cpu(key->k_dir_id), - le32_to_cpu(key->k_objectid), le_offset(key), - le_type(key)); + return scnprintf(buf, size, "[%d %d %s %s]", + le32_to_cpu(key->k_dir_id), + le32_to_cpu(key->k_objectid), le_offset(key), + le_type(key)); else - sprintf(buf, "[NULL]"); + return scnprintf(buf, size, "[NULL]"); }
/* %K */ -static void sprintf_cpu_key(char *buf, struct cpu_key *key) +static int scnprintf_cpu_key(char *buf, size_t size, struct cpu_key *key) { if (key) - sprintf(buf, "[%d %d %s %s]", key->on_disk_key.k_dir_id, - key->on_disk_key.k_objectid, reiserfs_cpu_offset(key), - cpu_type(key)); + return scnprintf(buf, size, "[%d %d %s %s]", + key->on_disk_key.k_dir_id, + key->on_disk_key.k_objectid, + reiserfs_cpu_offset(key), cpu_type(key)); else - sprintf(buf, "[NULL]"); + return scnprintf(buf, size, "[NULL]"); }
-static void sprintf_de_head(char *buf, struct reiserfs_de_head *deh) +static int scnprintf_de_head(char *buf, size_t size, + struct reiserfs_de_head *deh) { if (deh) - sprintf(buf, - "[offset=%d dir_id=%d objectid=%d location=%d state=%04x]", - deh_offset(deh), deh_dir_id(deh), deh_objectid(deh), - deh_location(deh), deh_state(deh)); + return scnprintf(buf, size, + "[offset=%d dir_id=%d objectid=%d location=%d state=%04x]", + deh_offset(deh), deh_dir_id(deh), + deh_objectid(deh), deh_location(deh), + deh_state(deh)); else - sprintf(buf, "[NULL]"); + return scnprintf(buf, size, "[NULL]");
}
-static void sprintf_item_head(char *buf, struct item_head *ih) +static int scnprintf_item_head(char *buf, size_t size, struct item_head *ih) { if (ih) { - strcpy(buf, - (ih_version(ih) == KEY_FORMAT_3_6) ? "*3.6* " : "*3.5*"); - sprintf_le_key(buf + strlen(buf), &(ih->ih_key)); - sprintf(buf + strlen(buf), ", item_len %d, item_location %d, " - "free_space(entry_count) %d", - ih_item_len(ih), ih_location(ih), ih_free_space(ih)); + char *p = buf; + char * const end = buf + size; + + p += scnprintf(p, end - p, "%s", + (ih_version(ih) == KEY_FORMAT_3_6) ? + "*3.6* " : "*3.5*"); + + p += scnprintf_le_key(p, end - p, &ih->ih_key); + + p += scnprintf(p, end - p, + ", item_len %d, item_location %d, free_space(entry_count) %d", + ih_item_len(ih), ih_location(ih), + ih_free_space(ih)); + return p - buf; } else - sprintf(buf, "[NULL]"); + return scnprintf(buf, size, "[NULL]"); }
-static void sprintf_direntry(char *buf, struct reiserfs_dir_entry *de) +static int scnprintf_direntry(char *buf, size_t size, + struct reiserfs_dir_entry *de) { char name[20];
memcpy(name, de->de_name, de->de_namelen > 19 ? 19 : de->de_namelen); name[de->de_namelen > 19 ? 19 : de->de_namelen] = 0; - sprintf(buf, ""%s"==>[%d %d]", name, de->de_dir_id, de->de_objectid); + return scnprintf(buf, size, ""%s"==>[%d %d]", + name, de->de_dir_id, de->de_objectid); }
-static void sprintf_block_head(char *buf, struct buffer_head *bh) +static int scnprintf_block_head(char *buf, size_t size, struct buffer_head *bh) { - sprintf(buf, "level=%d, nr_items=%d, free_space=%d rdkey ", - B_LEVEL(bh), B_NR_ITEMS(bh), B_FREE_SPACE(bh)); + return scnprintf(buf, size, + "level=%d, nr_items=%d, free_space=%d rdkey ", + B_LEVEL(bh), B_NR_ITEMS(bh), B_FREE_SPACE(bh)); }
-static void sprintf_buffer_head(char *buf, struct buffer_head *bh) +static int scnprintf_buffer_head(char *buf, size_t size, struct buffer_head *bh) { - sprintf(buf, - "dev %pg, size %zd, blocknr %llu, count %d, state 0x%lx, page %p, (%s, %s, %s)", - bh->b_bdev, bh->b_size, - (unsigned long long)bh->b_blocknr, atomic_read(&(bh->b_count)), - bh->b_state, bh->b_page, - buffer_uptodate(bh) ? "UPTODATE" : "!UPTODATE", - buffer_dirty(bh) ? "DIRTY" : "CLEAN", - buffer_locked(bh) ? "LOCKED" : "UNLOCKED"); + return scnprintf(buf, size, + "dev %pg, size %zd, blocknr %llu, count %d, state 0x%lx, page %p, (%s, %s, %s)", + bh->b_bdev, bh->b_size, + (unsigned long long)bh->b_blocknr, + atomic_read(&(bh->b_count)), + bh->b_state, bh->b_page, + buffer_uptodate(bh) ? "UPTODATE" : "!UPTODATE", + buffer_dirty(bh) ? "DIRTY" : "CLEAN", + buffer_locked(bh) ? "LOCKED" : "UNLOCKED"); }
-static void sprintf_disk_child(char *buf, struct disk_child *dc) +static int scnprintf_disk_child(char *buf, size_t size, struct disk_child *dc) { - sprintf(buf, "[dc_number=%d, dc_size=%u]", dc_block_number(dc), - dc_size(dc)); + return scnprintf(buf, size, "[dc_number=%d, dc_size=%u]", + dc_block_number(dc), dc_size(dc)); }
static char *is_there_reiserfs_struct(char *fmt, int *what) @@ -189,55 +205,60 @@ static void prepare_error_buf(const char char *fmt1 = fmt_buf; char *k; char *p = error_buf; + char * const end = &error_buf[sizeof(error_buf)]; int what;
spin_lock(&error_lock);
- strcpy(fmt1, fmt); + if (WARN_ON(strscpy(fmt_buf, fmt, sizeof(fmt_buf)) < 0)) { + strscpy(error_buf, "format string too long", end - error_buf); + goto out_unlock; + }
while ((k = is_there_reiserfs_struct(fmt1, &what)) != NULL) { *k = 0;
- p += vsprintf(p, fmt1, args); + p += vscnprintf(p, end - p, fmt1, args);
switch (what) { case 'k': - sprintf_le_key(p, va_arg(args, struct reiserfs_key *)); + p += scnprintf_le_key(p, end - p, + va_arg(args, struct reiserfs_key *)); break; case 'K': - sprintf_cpu_key(p, va_arg(args, struct cpu_key *)); + p += scnprintf_cpu_key(p, end - p, + va_arg(args, struct cpu_key *)); break; case 'h': - sprintf_item_head(p, va_arg(args, struct item_head *)); + p += scnprintf_item_head(p, end - p, + va_arg(args, struct item_head *)); break; case 't': - sprintf_direntry(p, - va_arg(args, - struct reiserfs_dir_entry *)); + p += scnprintf_direntry(p, end - p, + va_arg(args, struct reiserfs_dir_entry *)); break; case 'y': - sprintf_disk_child(p, - va_arg(args, struct disk_child *)); + p += scnprintf_disk_child(p, end - p, + va_arg(args, struct disk_child *)); break; case 'z': - sprintf_block_head(p, - va_arg(args, struct buffer_head *)); + p += scnprintf_block_head(p, end - p, + va_arg(args, struct buffer_head *)); break; case 'b': - sprintf_buffer_head(p, - va_arg(args, struct buffer_head *)); + p += scnprintf_buffer_head(p, end - p, + va_arg(args, struct buffer_head *)); break; case 'a': - sprintf_de_head(p, - va_arg(args, - struct reiserfs_de_head *)); + p += scnprintf_de_head(p, end - p, + va_arg(args, struct reiserfs_de_head *)); break; }
- p += strlen(p); fmt1 = k + 2; } - vsprintf(p, fmt1, args); + p += vscnprintf(p, end - p, fmt1, args); +out_unlock: spin_unlock(&error_lock);
}
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Biggers ebiggers@google.com
commit c604cb767049b78b3075497b80ebb8fd530ea2cc upstream.
My recent fix for dns_resolver_preparse() printing very long strings was incomplete, as shown by syzbot which still managed to hit the WARN_ONCE() in set_precision() by adding a crafted "dns_resolver" key:
precision 50001 too large WARNING: CPU: 7 PID: 864 at lib/vsprintf.c:2164 vsnprintf+0x48a/0x5a0
The bug this time isn't just a printing bug, but also a logical error when multiple options ("#"-separated strings) are given in the key payload. Specifically, when separating an option string into name and value, if there is no value then the name is incorrectly considered to end at the end of the key payload, rather than the end of the current option. This bypasses validation of the option length, and also means that specifying multiple options is broken -- which presumably has gone unnoticed as there is currently only one valid option anyway.
A similar problem also applied to option values, as the kstrtoul() when parsing the "dnserror" option will read past the end of the current option and into the next option.
Fix these bugs by correctly computing the length of the option name and by copying the option value, null-terminated, into a temporary buffer.
Reproducer for the WARN_ONCE() that syzbot hit:
perl -e 'print "#A#", "\0" x 50000' | keyctl padd dns_resolver desc @s
Reproducer for "dnserror" option being parsed incorrectly (expected behavior is to fail when seeing the unknown option "foo", actual behavior was to read the dnserror value as "1#foo" and fail there):
perl -e 'print "#dnserror=1#foo\0"' | keyctl padd dns_resolver desc @s
Reported-by: syzbot syzkaller@googlegroups.com Fixes: 4a2d789267e0 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]") Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/dns_resolver/dns_key.c | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-)
--- a/net/dns_resolver/dns_key.c +++ b/net/dns_resolver/dns_key.c @@ -87,35 +87,39 @@ dns_resolver_preparse(struct key_prepars opt++; kdebug("options: '%s'", opt); do { + int opt_len, opt_nlen; const char *eq; - int opt_len, opt_nlen, opt_vlen, tmp; + char optval[128];
next_opt = memchr(opt, '#', end - opt) ?: end; opt_len = next_opt - opt; - if (opt_len <= 0 || opt_len > 128) { + if (opt_len <= 0 || opt_len > sizeof(optval)) { pr_warn_ratelimited("Invalid option length (%d) for dns_resolver key\n", opt_len); return -EINVAL; }
- eq = memchr(opt, '=', opt_len) ?: end; - opt_nlen = eq - opt; - eq++; - opt_vlen = next_opt - eq; /* will be -1 if no value */ - - tmp = opt_vlen >= 0 ? opt_vlen : 0; - kdebug("option '%*.*s' val '%*.*s'", - opt_nlen, opt_nlen, opt, tmp, tmp, eq); + eq = memchr(opt, '=', opt_len); + if (eq) { + opt_nlen = eq - opt; + eq++; + memcpy(optval, eq, next_opt - eq); + optval[next_opt - eq] = '\0'; + } else { + opt_nlen = opt_len; + optval[0] = '\0'; + } + + kdebug("option '%*.*s' val '%s'", + opt_nlen, opt_nlen, opt, optval);
/* see if it's an error number representing a DNS error * that's to be recorded as the result in this key */ if (opt_nlen == sizeof(DNS_ERRORNO_OPTION) - 1 && memcmp(opt, DNS_ERRORNO_OPTION, opt_nlen) == 0) { kdebug("dns error number option"); - if (opt_vlen <= 0) - goto bad_option_value;
- ret = kstrtoul(eq, 10, &derrno); + ret = kstrtoul(optval, 10, &derrno); if (ret < 0) goto bad_option_value;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Watson davejwatson@fb.com
commit 32da12216e467dea70a09cd7094c30779ce0f9db upstream.
In the zerocopy sendmsg() path, there are error checks to revert the zerocopy if we get any error code. syzkaller has discovered that tls_push_record can return -ECONNRESET, which is fatal, and happens after the point at which it is safe to revert the iter, as we've already passed the memory to do_tcp_sendpages.
Previously this code could return -ENOMEM and we would want to revert the iter, but AFAIK this no longer returns ENOMEM after a447da7d004 ("tls: fix waitall behavior in tls_sw_recvmsg"), so we fail for all error codes.
Reported-by: syzbot+c226690f7b3126c5ee04@syzkaller.appspotmail.com Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com Signed-off-by: Dave Watson davejwatson@fb.com Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/tls/tls_sw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -449,7 +449,7 @@ alloc_encrypted: ret = tls_push_record(sk, msg->msg_flags, record_type); if (!ret) continue; - if (ret == -EAGAIN) + if (ret < 0) goto send_end;
copied -= try_to_copy;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tomas Bortoli tomasbortoli@gmail.com
commit 02f51d45937f7bc7f4dee21e9f85b2d5eac37104 upstream.
The autofs subsystem does not check that the "path" parameter is present for all cases where it is required when it is passed in via the "param" struct.
In particular it isn't checked for the AUTOFS_DEV_IOCTL_OPENMOUNT_CMD ioctl command.
To solve it, modify validate_dev_ioctl(function to check that a path has been provided for ioctl commands that require it.
Link: http://lkml.kernel.org/r/153060031527.26631.18306637892746301555.stgit@pluto... Signed-off-by: Tomas Bortoli tomasbortoli@gmail.com Signed-off-by: Ian Kent raven@themaw.net Reported-by: syzbot+60c837b428dc84e83a93@syzkaller.appspotmail.com Cc: Dmitry Vyukov dvyukov@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/autofs4/dev-ioctl.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)
--- a/fs/autofs4/dev-ioctl.c +++ b/fs/autofs4/dev-ioctl.c @@ -148,6 +148,15 @@ static int validate_dev_ioctl(int cmd, s cmd); goto out; } + } else { + unsigned int inr = _IOC_NR(cmd); + + if (inr == AUTOFS_DEV_IOCTL_OPENMOUNT_CMD || + inr == AUTOFS_DEV_IOCTL_REQUESTER_CMD || + inr == AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD) { + err = -EINVAL; + goto out; + } }
err = 0; @@ -284,7 +293,8 @@ static int autofs_dev_ioctl_openmount(st dev_t devid; int err, fd;
- /* param->path has already been checked */ + /* param->path has been checked in validate_dev_ioctl() */ + if (!param->openmount.devid) return -EINVAL;
@@ -446,10 +456,7 @@ static int autofs_dev_ioctl_requester(st dev_t devid; int err = -ENOENT;
- if (param->size <= AUTOFS_DEV_IOCTL_SIZE) { - err = -EINVAL; - goto out; - } + /* param->path has been checked in validate_dev_ioctl() */
devid = sbi->sb->s_dev;
@@ -534,10 +541,7 @@ static int autofs_dev_ioctl_ismountpoint unsigned int devid, magic; int err = -ENOENT;
- if (param->size <= AUTOFS_DEV_IOCTL_SIZE) { - err = -EINVAL; - goto out; - } + /* param->path has been checked in validate_dev_ioctl() */
name = param->path; type = param->ismountpoint.in.type;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn willemb@google.com
commit bab2c80e5a6c855657482eac9e97f5f3eedb509a upstream.
When pulling the NSH header in nsh_gso_segment, set the mac length based on the encapsulated packet type.
skb_reset_mac_len computes an offset to the network header, which here still points to the outer packet:
skb_reset_network_header(skb); [...] __skb_pull(skb, nsh_len); skb_reset_mac_header(skb); // now mac hdr starts nsh_len == 8B after net hdr skb_reset_mac_len(skb); // mac len = net hdr - mac hdr == (u16) -8 == 65528 [..] skb_mac_gso_segment(skb, ..)
Link: http://lkml.kernel.org/r/CAF=yD-KeAcTSOn4AxirAxL8m7QAS8GBBe1w09eziYwvPbbUeYA... Reported-by: syzbot+7b9ed9872dab8c32305d@syzkaller.appspotmail.com Fixes: c411ed854584 ("nsh: add GSO support") Signed-off-by: Willem de Bruijn willemb@google.com Acked-by: Jiri Benc jbenc@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/nsh/nsh.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/nsh/nsh.c +++ b/net/nsh/nsh.c @@ -42,7 +42,7 @@ static struct sk_buff *nsh_gso_segment(s __skb_pull(skb, nsh_len);
skb_reset_mac_header(skb); - skb_reset_mac_len(skb); + skb->mac_len = proto == htons(ETH_P_TEB) ? ETH_HLEN : 0; skb->protocol = proto;
features &= NETIF_F_SG;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit 84379c9afe011020e797e3f50a662b08a6355dcf upstream.
Eric Dumazet reports: Here is a reproducer of an annoying bug detected by syzkaller on our production kernel [..] ./b78305423 enable_conntrack Then : sleep 60 dmesg | tail -10 [ 171.599093] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 181.631024] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 191.687076] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 201.703037] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 211.711072] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 221.959070] unregister_netdevice: waiting for lo to become free. Usage count = 2
Reproducer sends ipv6 fragment that hits nfct defrag via LOCAL_OUT hook. skb gets queued until frag timer expiry -- 1 minute.
Normally nf_conntrack_reasm gets called during prerouting, so skb has no dst yet which might explain why this wasn't spotted earlier.
Reported-by: Eric Dumazet eric.dumazet@gmail.com Reported-by: John Sperbeck jsperbeck@google.com Signed-off-by: Florian Westphal fw@strlen.de Tested-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/ipv6/netfilter/nf_conntrack_reasm.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -618,6 +618,8 @@ int nf_ct_frag6_gather(struct net *net, fq->q.meat == fq->q.len && nf_ct_frag6_reasm(fq, skb, dev)) ret = 0; + else + skb_dst_drop(skb);
out_unlock: spin_unlock_bh(&fq->q.lock);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Kara jack@suse.cz
commit 3ee7e8697d5860b173132606d80a9cd35e7113ee upstream.
syzbot is reporting NULL pointer dereference at wb_workfn() [1] due to wb->bdi->dev being NULL. And Dmitry confirmed that wb->state was WB_shutting_down after wb->bdi->dev became NULL. This indicates that unregister_bdi() failed to call wb_shutdown() on one of wb objects.
The problem is in cgwb_bdi_unregister() which does cgwb_kill() and thus drops bdi's reference to wb structures before going through the list of wbs again and calling wb_shutdown() on each of them. This way the loop iterating through all wbs can easily miss a wb if that wb has already passed through cgwb_remove_from_bdi_list() called from wb_shutdown() from cgwb_release_workfn() and as a result fully shutdown bdi although wb_workfn() for this wb structure is still running. In fact there are also other ways cgwb_bdi_unregister() can race with cgwb_release_workfn() leading e.g. to use-after-free issues:
CPU1 CPU2 cgwb_bdi_unregister() cgwb_kill(*slot);
cgwb_release() queue_work(cgwb_release_wq, &wb->release_work); cgwb_release_workfn() wb = list_first_entry(&bdi->wb_list, ...) spin_unlock_irq(&cgwb_lock); wb_shutdown(wb); ... kfree_rcu(wb, rcu); wb_shutdown(wb); -> oops use-after-free
We solve these issues by synchronizing writeback structure shutdown from cgwb_bdi_unregister() with cgwb_release_workfn() using a new mutex. That way we also no longer need synchronization using WB_shutting_down as the mutex provides it for CONFIG_CGROUP_WRITEBACK case and without CONFIG_CGROUP_WRITEBACK wb_shutdown() can be called only once from bdi_unregister().
Reported-by: syzbot syzbot+4a7438e774b21ddd8eca@syzkaller.appspotmail.com Acked-by: Tejun Heo tj@kernel.org Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/backing-dev-defs.h | 2 +- mm/backing-dev.c | 20 +++++++------------- 2 files changed, 8 insertions(+), 14 deletions(-)
--- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -22,7 +22,6 @@ struct dentry; */ enum wb_state { WB_registered, /* bdi_register() was done */ - WB_shutting_down, /* wb_shutdown() in progress */ WB_writeback_running, /* Writeback is in progress */ WB_has_dirty_io, /* Dirty inodes on ->b_{dirty|io|more_io} */ }; @@ -165,6 +164,7 @@ struct backing_dev_info { #ifdef CONFIG_CGROUP_WRITEBACK struct radix_tree_root cgwb_tree; /* radix tree of active cgroup wbs */ struct rb_root cgwb_congested_tree; /* their congested states */ + struct mutex cgwb_release_mutex; /* protect shutdown of wb structs */ #else struct bdi_writeback_congested *wb_congested; #endif --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -356,15 +356,8 @@ static void wb_shutdown(struct bdi_write spin_lock_bh(&wb->work_lock); if (!test_and_clear_bit(WB_registered, &wb->state)) { spin_unlock_bh(&wb->work_lock); - /* - * Wait for wb shutdown to finish if someone else is just - * running wb_shutdown(). Otherwise we could proceed to wb / - * bdi destruction before wb_shutdown() is finished. - */ - wait_on_bit(&wb->state, WB_shutting_down, TASK_UNINTERRUPTIBLE); return; } - set_bit(WB_shutting_down, &wb->state); spin_unlock_bh(&wb->work_lock);
cgwb_remove_from_bdi_list(wb); @@ -376,12 +369,6 @@ static void wb_shutdown(struct bdi_write mod_delayed_work(bdi_wq, &wb->dwork, 0); flush_delayed_work(&wb->dwork); WARN_ON(!list_empty(&wb->work_list)); - /* - * Make sure bit gets cleared after shutdown is finished. Matches with - * the barrier provided by test_and_clear_bit() above. - */ - smp_wmb(); - clear_and_wake_up_bit(WB_shutting_down, &wb->state); }
static void wb_exit(struct bdi_writeback *wb) @@ -505,10 +492,12 @@ static void cgwb_release_workfn(struct w struct bdi_writeback *wb = container_of(work, struct bdi_writeback, release_work);
+ mutex_lock(&wb->bdi->cgwb_release_mutex); wb_shutdown(wb);
css_put(wb->memcg_css); css_put(wb->blkcg_css); + mutex_unlock(&wb->bdi->cgwb_release_mutex);
fprop_local_destroy_percpu(&wb->memcg_completions); percpu_ref_exit(&wb->refcnt); @@ -694,6 +683,7 @@ static int cgwb_bdi_init(struct backing_
INIT_RADIX_TREE(&bdi->cgwb_tree, GFP_ATOMIC); bdi->cgwb_congested_tree = RB_ROOT; + mutex_init(&bdi->cgwb_release_mutex);
ret = wb_init(&bdi->wb, bdi, 1, GFP_KERNEL); if (!ret) { @@ -714,7 +704,10 @@ static void cgwb_bdi_unregister(struct b spin_lock_irq(&cgwb_lock); radix_tree_for_each_slot(slot, &bdi->cgwb_tree, &iter, 0) cgwb_kill(*slot); + spin_unlock_irq(&cgwb_lock);
+ mutex_lock(&bdi->cgwb_release_mutex); + spin_lock_irq(&cgwb_lock); while (!list_empty(&bdi->wb_list)) { wb = list_first_entry(&bdi->wb_list, struct bdi_writeback, bdi_node); @@ -723,6 +716,7 @@ static void cgwb_bdi_unregister(struct b spin_lock_irq(&cgwb_lock); } spin_unlock_irq(&cgwb_lock); + mutex_unlock(&bdi->cgwb_release_mutex); }
/**
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Santosh Shilimkar santosh.shilimkar@oracle.com
commit f1693c63ab133d16994cc50f773982b5905af264 upstream.
Loop transport which is self loopback, remote port congestion update isn't relevant. Infact the xmit path already ignores it. Receive path needs to do the same.
Reported-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com Reviewed-by: Sowmini Varadhan sowmini.varadhan@oracle.com Signed-off-by: Santosh Shilimkar santosh.shilimkar@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/rds/loop.c | 1 + net/rds/rds.h | 5 +++++ net/rds/recv.c | 5 +++++ 3 files changed, 11 insertions(+)
--- a/net/rds/loop.c +++ b/net/rds/loop.c @@ -193,4 +193,5 @@ struct rds_transport rds_loop_transport .inc_copy_to_user = rds_message_inc_copy_to_user, .inc_free = rds_loop_inc_free, .t_name = "loopback", + .t_type = RDS_TRANS_LOOP, }; --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -454,6 +454,11 @@ struct rds_notifier { int n_status; };
+/* Available as part of RDS core, so doesn't need to participate + * in get_preferred transport etc + */ +#define RDS_TRANS_LOOP 3 + /** * struct rds_transport - transport specific behavioural hooks * --- a/net/rds/recv.c +++ b/net/rds/recv.c @@ -103,6 +103,11 @@ static void rds_recv_rcvbuf_delta(struct rds_stats_add(s_recv_bytes_added_to_socket, delta); else rds_stats_add(s_recv_bytes_removed_from_socket, -delta); + + /* loop transport doesn't send/recv congestion updates */ + if (rs->rs_transport->t_type == RDS_TRANS_LOOP) + return; + now_congested = rs->rs_rcv_bytes > rds_sk_rcvbuf(rs);
rdsdebug("rs %p (%pI4:%u) recv bytes %d buf %d "
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit 3bc53be9db21040b5d2de4d455f023c8c494aa68 upstream.
syzbot is reporting stalls at nfc_llcp_send_ui_frame() [1]. This is because nfc_llcp_send_ui_frame() is retrying the loop without any delay when nonblocking nfc_alloc_send_skb() returned NULL.
Since there is no need to use MSG_DONTWAIT if we retry until sock_alloc_send_pskb() succeeds, let's use blocking call. Also, in case an unexpected error occurred, let's break the loop if blocking nfc_alloc_send_skb() failed.
[1] https://syzkaller.appspot.com/bug?id=4a131cc571c3733e0eff6bc673f4e36ae48f19c...
Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Reported-by: syzbot syzbot+d29d18215e477cfbfbdd@syzkaller.appspotmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/nfc/llcp_commands.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/net/nfc/llcp_commands.c +++ b/net/nfc/llcp_commands.c @@ -752,11 +752,14 @@ int nfc_llcp_send_ui_frame(struct nfc_ll pr_debug("Fragment %zd bytes remaining %zd", frag_len, remaining_len);
- pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT, + pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, 0, frag_len + LLCP_HEADER_SIZE, &err); if (pdu == NULL) { - pr_err("Could not allocate PDU\n"); - continue; + pr_err("Could not allocate PDU (error=%d)\n", err); + len -= remaining_len; + if (len == 0) + len = err; + break; }
pdu = llcp_add_header(pdu, dsap, ssap, LLCP_PDU_UI);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Morse james.morse@arm.com
Commit 32b03d1059667a39e089c45ee38ec9c16332430f upstream.
KVM uses tpidr_el2 as its private vcpu register, which makes sense for non-vhe world switch as only KVM can access this register. This means vhe Linux has to use tpidr_el1, which KVM has to save/restore as part of the host context.
If the SDEI handler code runs behind KVMs back, it mustn't access any per-cpu variables. To allow this on systems with vhe we need to make the host use tpidr_el2, saving KVM from save/restoring it.
__guest_enter() stores the host_ctxt on the stack, do the same with the vcpu.
Signed-off-by: James Morse james.morse@arm.com Reviewed-by: Christoffer Dall cdall@linaro.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/hyp/entry.S | 10 +++++++--- arch/arm64/kvm/hyp/hyp-entry.S | 6 +++--- 2 files changed, 10 insertions(+), 6 deletions(-)
--- a/arch/arm64/kvm/hyp/entry.S +++ b/arch/arm64/kvm/hyp/entry.S @@ -62,8 +62,8 @@ ENTRY(__guest_enter) // Store the host regs save_callee_saved_regs x1
- // Store the host_ctxt for use at exit time - str x1, [sp, #-16]! + // Store host_ctxt and vcpu for use at exit time + stp x1, x0, [sp, #-16]!
add x18, x0, #VCPU_CONTEXT
@@ -159,6 +159,10 @@ abort_guest_exit_end: ENDPROC(__guest_exit)
ENTRY(__fpsimd_guest_restore) + // x0: esr + // x1: vcpu + // x2-x29,lr: vcpu regs + // vcpu x0-x1 on the stack stp x2, x3, [sp, #-16]! stp x4, lr, [sp, #-16]!
@@ -173,7 +177,7 @@ alternative_else alternative_endif isb
- mrs x3, tpidr_el2 + mov x3, x1
ldr x0, [x3, #VCPU_HOST_CONTEXT] kern_hyp_va x0 --- a/arch/arm64/kvm/hyp/hyp-entry.S +++ b/arch/arm64/kvm/hyp/hyp-entry.S @@ -120,6 +120,7 @@ el1_trap: /* * x0: ESR_EC */ + ldr x1, [sp, #16 + 8] // vcpu stored by __guest_enter
/* * We trap the first access to the FP/SIMD to save the host context @@ -132,19 +133,18 @@ alternative_if_not ARM64_HAS_NO_FPSIMD b.eq __fpsimd_guest_restore alternative_else_nop_endif
- mrs x1, tpidr_el2 mov x0, #ARM_EXCEPTION_TRAP b __guest_exit
el1_irq: stp x0, x1, [sp, #-16]! - mrs x1, tpidr_el2 + ldr x1, [sp, #16 + 8] mov x0, #ARM_EXCEPTION_IRQ b __guest_exit
el1_error: stp x0, x1, [sp, #-16]! - mrs x1, tpidr_el2 + ldr x1, [sp, #16 + 8] mov x0, #ARM_EXCEPTION_EL1_SERROR b __guest_exit
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Morse james.morse@arm.com
Commit 36989e7fd386a9a5822c48691473863f8fbb404d upstream.
kvm_host_cpu_state is a per-cpu allocation made from kvm_arch_init() used to store the host EL1 registers when KVM switches to a guest.
Make it easier for ASM to generate pointers into this per-cpu memory by making it a static allocation.
Signed-off-by: James Morse james.morse@arm.com Acked-by: Christoffer Dall cdall@linaro.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- virt/kvm/arm/arm.c | 18 +++--------------- 1 file changed, 3 insertions(+), 15 deletions(-)
--- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -51,8 +51,8 @@ __asm__(".arch_extension virt"); #endif
+DEFINE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state); static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page); -static kvm_cpu_context_t __percpu *kvm_host_cpu_state;
/* Per-CPU variable containing the currently running vcpu. */ static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_arm_running_vcpu); @@ -351,7 +351,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu }
vcpu->cpu = cpu; - vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state); + vcpu->arch.host_cpu_context = this_cpu_ptr(&kvm_host_cpu_state);
kvm_arm_set_running_vcpu(vcpu);
@@ -1259,19 +1259,8 @@ static inline void hyp_cpu_pm_exit(void) } #endif
-static void teardown_common_resources(void) -{ - free_percpu(kvm_host_cpu_state); -} - static int init_common_resources(void) { - kvm_host_cpu_state = alloc_percpu(kvm_cpu_context_t); - if (!kvm_host_cpu_state) { - kvm_err("Cannot allocate host CPU state\n"); - return -ENOMEM; - } - /* set size of VMID supported by CPU */ kvm_vmid_bits = kvm_get_vmid_bits(); kvm_info("%d-bit VMID\n", kvm_vmid_bits); @@ -1413,7 +1402,7 @@ static int init_hyp_mode(void) for_each_possible_cpu(cpu) { kvm_cpu_context_t *cpu_ctxt;
- cpu_ctxt = per_cpu_ptr(kvm_host_cpu_state, cpu); + cpu_ctxt = per_cpu_ptr(&kvm_host_cpu_state, cpu); err = create_hyp_mappings(cpu_ctxt, cpu_ctxt + 1, PAGE_HYP);
if (err) { @@ -1497,7 +1486,6 @@ out_hyp: if (!in_hyp_mode) teardown_hyp_mode(); out_err: - teardown_common_resources(); return err; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Morse james.morse@arm.com
Commit c97e166e54b662717d20ec2e36761758d2b6a7c2 upstream.
Make tpidr_el2 a cpu-offset for per-cpu variables in the same way the host uses tpidr_el1. This lets tpidr_el{1,2} have the same value, and on VHE they can be the same register.
KVM calls hyp_panic() when anything unexpected happens. This may occur while a guest owns the EL1 registers. KVM stashes the vcpu pointer in tpidr_el2, which it uses to find the host context in order to restore the host EL1 registers before parachuting into the host's panic().
The host context is a struct kvm_cpu_context allocated in the per-cpu area, and mapped to hyp. Given the per-cpu offset for this CPU, this is easy to find. Change hyp_panic() to take a pointer to the struct kvm_cpu_context. Wrap these calls with an asm function that retrieves the struct kvm_cpu_context from the host's per-cpu area.
Copy the per-cpu offset from the hosts tpidr_el1 into tpidr_el2 during kvm init. (Later patches will make this unnecessary for VHE hosts)
We print out the vcpu pointer as part of the panic message. Add a back reference to the 'running vcpu' in the host cpu context to preserve this.
Signed-off-by: James Morse james.morse@arm.com Reviewed-by: Christoffer Dall cdall@linaro.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/kvm_host.h | 2 ++ arch/arm64/kvm/hyp/hyp-entry.S | 12 ++++++++++++ arch/arm64/kvm/hyp/s2-setup.c | 3 +++ arch/arm64/kvm/hyp/switch.c | 25 +++++++++++++------------ 4 files changed, 30 insertions(+), 12 deletions(-)
--- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -194,6 +194,8 @@ struct kvm_cpu_context { u64 sys_regs[NR_SYS_REGS]; u32 copro[NR_COPRO_REGS]; }; + + struct kvm_vcpu *__hyp_running_vcpu; };
typedef struct kvm_cpu_context kvm_cpu_context_t; --- a/arch/arm64/kvm/hyp/hyp-entry.S +++ b/arch/arm64/kvm/hyp/hyp-entry.S @@ -179,6 +179,18 @@ ENTRY(__hyp_do_panic) eret ENDPROC(__hyp_do_panic)
+ENTRY(__hyp_panic) + /* + * '=kvm_host_cpu_state' is a host VA from the constant pool, it may + * not be accessible by this address from EL2, hyp_panic() converts + * it with kern_hyp_va() before use. + */ + ldr x0, =kvm_host_cpu_state + mrs x1, tpidr_el2 + add x0, x0, x1 + b hyp_panic +ENDPROC(__hyp_panic) + .macro invalid_vector label, target = __hyp_panic .align 2 \label: --- a/arch/arm64/kvm/hyp/s2-setup.c +++ b/arch/arm64/kvm/hyp/s2-setup.c @@ -84,5 +84,8 @@ u32 __hyp_text __init_stage2_translation
write_sysreg(val, vtcr_el2);
+ /* copy tpidr_el1 into tpidr_el2 for use by HYP */ + write_sysreg(read_sysreg(tpidr_el1), tpidr_el2); + return parange; } --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -289,9 +289,9 @@ int __hyp_text __kvm_vcpu_run(struct kvm u64 exit_code;
vcpu = kern_hyp_va(vcpu); - write_sysreg(vcpu, tpidr_el2);
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context); + host_ctxt->__hyp_running_vcpu = vcpu; guest_ctxt = &vcpu->arch.ctxt;
__sysreg_save_host_state(host_ctxt); @@ -406,7 +406,8 @@ again:
static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
-static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par) +static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par, + struct kvm_vcpu *vcpu) { unsigned long str_va;
@@ -420,35 +421,35 @@ static void __hyp_text __hyp_call_panic_ __hyp_do_panic(str_va, spsr, elr, read_sysreg(esr_el2), read_sysreg_el2(far), - read_sysreg(hpfar_el2), par, - (void *)read_sysreg(tpidr_el2)); + read_sysreg(hpfar_el2), par, vcpu); }
-static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par) +static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par, + struct kvm_vcpu *vcpu) { panic(__hyp_panic_string, spsr, elr, read_sysreg_el2(esr), read_sysreg_el2(far), - read_sysreg(hpfar_el2), par, - (void *)read_sysreg(tpidr_el2)); + read_sysreg(hpfar_el2), par, vcpu); }
static hyp_alternate_select(__hyp_call_panic, __hyp_call_panic_nvhe, __hyp_call_panic_vhe, ARM64_HAS_VIRT_HOST_EXTN);
-void __hyp_text __noreturn __hyp_panic(void) +void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *__host_ctxt) { + struct kvm_vcpu *vcpu = NULL; + u64 spsr = read_sysreg_el2(spsr); u64 elr = read_sysreg_el2(elr); u64 par = read_sysreg(par_el1);
if (read_sysreg(vttbr_el2)) { - struct kvm_vcpu *vcpu; struct kvm_cpu_context *host_ctxt;
- vcpu = (struct kvm_vcpu *)read_sysreg(tpidr_el2); - host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context); + host_ctxt = kern_hyp_va(__host_ctxt); + vcpu = host_ctxt->__hyp_running_vcpu; __timer_save_state(vcpu); __deactivate_traps(vcpu); __deactivate_vm(vcpu); @@ -456,7 +457,7 @@ void __hyp_text __noreturn __hyp_panic(v }
/* Call panic for real */ - __hyp_call_panic()(spsr, elr, par); + __hyp_call_panic()(spsr, elr, par, vcpu);
unreachable(); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Morse james.morse@arm.com
Commit 6d99b68933fbcf51f84fcbba49246ce1209ec193 upstream.
Now that KVM uses tpidr_el2 in the same way as Linux's cpu_offset in tpidr_el1, merge the two. This saves KVM from save/restoring tpidr_el1 on VHE hosts, and allows future code to blindly access per-cpu variables without triggering world-switch.
Signed-off-by: James Morse james.morse@arm.com Reviewed-by: Christoffer Dall cdall@linaro.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/alternative.h | 2 ++ arch/arm64/include/asm/assembler.h | 8 ++++++++ arch/arm64/include/asm/percpu.h | 11 +++++++++-- arch/arm64/kernel/alternative.c | 9 +++++---- arch/arm64/kernel/cpufeature.c | 17 +++++++++++++++++ arch/arm64/mm/proc.S | 8 ++++++++ 6 files changed, 49 insertions(+), 6 deletions(-)
--- a/arch/arm64/include/asm/alternative.h +++ b/arch/arm64/include/asm/alternative.h @@ -12,6 +12,8 @@ #include <linux/stddef.h> #include <linux/stringify.h>
+extern int alternatives_applied; + struct alt_instr { s32 orig_offset; /* offset to original instruction */ s32 alt_offset; /* offset to replacement instruction */ --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -260,7 +260,11 @@ lr .req x30 // link register #else adr_l \dst, \sym #endif +alternative_if_not ARM64_HAS_VIRT_HOST_EXTN mrs \tmp, tpidr_el1 +alternative_else + mrs \tmp, tpidr_el2 +alternative_endif add \dst, \dst, \tmp .endm
@@ -271,7 +275,11 @@ lr .req x30 // link register */ .macro ldr_this_cpu dst, sym, tmp adr_l \dst, \sym +alternative_if_not ARM64_HAS_VIRT_HOST_EXTN mrs \tmp, tpidr_el1 +alternative_else + mrs \tmp, tpidr_el2 +alternative_endif ldr \dst, [\dst, \tmp] .endm
--- a/arch/arm64/include/asm/percpu.h +++ b/arch/arm64/include/asm/percpu.h @@ -16,11 +16,15 @@ #ifndef __ASM_PERCPU_H #define __ASM_PERCPU_H
+#include <asm/alternative.h> #include <asm/stack_pointer.h>
static inline void set_my_cpu_offset(unsigned long off) { - asm volatile("msr tpidr_el1, %0" :: "r" (off) : "memory"); + asm volatile(ALTERNATIVE("msr tpidr_el1, %0", + "msr tpidr_el2, %0", + ARM64_HAS_VIRT_HOST_EXTN) + :: "r" (off) : "memory"); }
static inline unsigned long __my_cpu_offset(void) @@ -31,7 +35,10 @@ static inline unsigned long __my_cpu_off * We want to allow caching the value, so avoid using volatile and * instead use a fake stack read to hazard against barrier(). */ - asm("mrs %0, tpidr_el1" : "=r" (off) : + asm(ALTERNATIVE("mrs %0, tpidr_el1", + "mrs %0, tpidr_el2", + ARM64_HAS_VIRT_HOST_EXTN) + : "=r" (off) : "Q" (*(const unsigned long *)current_stack_pointer));
return off; --- a/arch/arm64/kernel/alternative.c +++ b/arch/arm64/kernel/alternative.c @@ -32,6 +32,8 @@ #define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset) #define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset)
+int alternatives_applied; + struct alt_region { struct alt_instr *begin; struct alt_instr *end; @@ -143,7 +145,6 @@ static void __apply_alternatives(void *a */ static int __apply_alternatives_multi_stop(void *unused) { - static int patched = 0; struct alt_region region = { .begin = (struct alt_instr *)__alt_instructions, .end = (struct alt_instr *)__alt_instructions_end, @@ -151,14 +152,14 @@ static int __apply_alternatives_multi_st
/* We always have a CPU 0 at this point (__init) */ if (smp_processor_id()) { - while (!READ_ONCE(patched)) + while (!READ_ONCE(alternatives_applied)) cpu_relax(); isb(); } else { - BUG_ON(patched); + BUG_ON(alternatives_applied); __apply_alternatives(®ion, true); /* Barriers provided by the cache flushing */ - WRITE_ONCE(patched, 1); + WRITE_ONCE(alternatives_applied, 1); }
return 0; --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -880,6 +880,22 @@ static int __init parse_kpti(char *str) early_param("kpti", parse_kpti); #endif /* CONFIG_UNMAP_KERNEL_AT_EL0 */
+static int cpu_copy_el2regs(void *__unused) +{ + /* + * Copy register values that aren't redirected by hardware. + * + * Before code patching, we only set tpidr_el1, all CPUs need to copy + * this value to tpidr_el2 before we patch the code. Once we've done + * that, freshly-onlined CPUs will set tpidr_el2, so we don't need to + * do anything here. + */ + if (!alternatives_applied) + write_sysreg(read_sysreg(tpidr_el1), tpidr_el2); + + return 0; +} + static const struct arm64_cpu_capabilities arm64_features[] = { { .desc = "GIC system register CPU interface", @@ -949,6 +965,7 @@ static const struct arm64_cpu_capabiliti .capability = ARM64_HAS_VIRT_HOST_EXTN, .def_scope = SCOPE_SYSTEM, .matches = runs_at_el2, + .enable = cpu_copy_el2regs, }, { .desc = "32-bit EL0 Support", --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -70,7 +70,11 @@ ENTRY(cpu_do_suspend) mrs x8, mdscr_el1 mrs x9, oslsr_el1 mrs x10, sctlr_el1 +alternative_if_not ARM64_HAS_VIRT_HOST_EXTN mrs x11, tpidr_el1 +alternative_else + mrs x11, tpidr_el2 +alternative_endif mrs x12, sp_el0 stp x2, x3, [x0] stp x4, xzr, [x0, #16] @@ -116,7 +120,11 @@ ENTRY(cpu_do_resume) msr mdscr_el1, x10
msr sctlr_el1, x12 +alternative_if_not ARM64_HAS_VIRT_HOST_EXTN msr tpidr_el1, x13 +alternative_else + msr tpidr_el2, x13 +alternative_endif msr sp_el0, x14 /* * Restore oslsr_el1 by writing oslar_el1
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Morse james.morse@arm.com
Commit 1f742679c33bc083722cb0b442a95d458c491b56 upstream.
Now that a VHE host uses tpidr_el2 for the cpu offset we no longer need KVM to save/restore tpidr_el1. Move this from the 'common' code into the non-vhe code. While we're at it, on VHE we don't need to save the ELR or SPSR as kernel_entry in entry.S will have pushed these onto the kernel stack, and will restore them from there. Move these to the non-vhe code as we need them to get back to the host.
Finally remove the always-copy-tpidr we hid in the stage2 setup code, cpufeature's enable callback will do this for VHE, we only need KVM to do it for non-vhe. Add the copy into kvm-init instead.
Signed-off-by: James Morse james.morse@arm.com Reviewed-by: Christoffer Dall cdall@linaro.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/hyp-init.S | 4 ++++ arch/arm64/kvm/hyp/s2-setup.c | 3 --- arch/arm64/kvm/hyp/sysreg-sr.c | 16 ++++++++-------- 3 files changed, 12 insertions(+), 11 deletions(-)
--- a/arch/arm64/kvm/hyp-init.S +++ b/arch/arm64/kvm/hyp-init.S @@ -122,6 +122,10 @@ CPU_BE( orr x4, x4, #SCTLR_ELx_EE) kern_hyp_va x2 msr vbar_el2, x2
+ /* copy tpidr_el1 into tpidr_el2 for use by HYP */ + mrs x1, tpidr_el1 + msr tpidr_el2, x1 + /* Hello, World! */ eret ENDPROC(__kvm_hyp_init) --- a/arch/arm64/kvm/hyp/s2-setup.c +++ b/arch/arm64/kvm/hyp/s2-setup.c @@ -84,8 +84,5 @@ u32 __hyp_text __init_stage2_translation
write_sysreg(val, vtcr_el2);
- /* copy tpidr_el1 into tpidr_el2 for use by HYP */ - write_sysreg(read_sysreg(tpidr_el1), tpidr_el2); - return parange; } --- a/arch/arm64/kvm/hyp/sysreg-sr.c +++ b/arch/arm64/kvm/hyp/sysreg-sr.c @@ -27,8 +27,8 @@ static void __hyp_text __sysreg_do_nothi /* * Non-VHE: Both host and guest must save everything. * - * VHE: Host must save tpidr*_el[01], actlr_el1, mdscr_el1, sp0, pc, - * pstate, and guest must save everything. + * VHE: Host must save tpidr*_el0, actlr_el1, mdscr_el1, sp_el0, + * and guest must save everything. */
static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt) @@ -36,11 +36,8 @@ static void __hyp_text __sysreg_save_com ctxt->sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1); ctxt->sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0); ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0); - ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1); ctxt->sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1); ctxt->gp_regs.regs.sp = read_sysreg(sp_el0); - ctxt->gp_regs.regs.pc = read_sysreg_el2(elr); - ctxt->gp_regs.regs.pstate = read_sysreg_el2(spsr); }
static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt) @@ -62,10 +59,13 @@ static void __hyp_text __sysreg_save_sta ctxt->sys_regs[AMAIR_EL1] = read_sysreg_el1(amair); ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg_el1(cntkctl); ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1); + ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1);
ctxt->gp_regs.sp_el1 = read_sysreg(sp_el1); ctxt->gp_regs.elr_el1 = read_sysreg_el1(elr); ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr); + ctxt->gp_regs.regs.pc = read_sysreg_el2(elr); + ctxt->gp_regs.regs.pstate = read_sysreg_el2(spsr); }
static hyp_alternate_select(__sysreg_call_save_host_state, @@ -89,11 +89,8 @@ static void __hyp_text __sysreg_restore_ write_sysreg(ctxt->sys_regs[ACTLR_EL1], actlr_el1); write_sysreg(ctxt->sys_regs[TPIDR_EL0], tpidr_el0); write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0); - write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1); write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1); write_sysreg(ctxt->gp_regs.regs.sp, sp_el0); - write_sysreg_el2(ctxt->gp_regs.regs.pc, elr); - write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr); }
static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt) @@ -115,10 +112,13 @@ static void __hyp_text __sysreg_restore_ write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1], amair); write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1], cntkctl); write_sysreg(ctxt->sys_regs[PAR_EL1], par_el1); + write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1);
write_sysreg(ctxt->gp_regs.sp_el1, sp_el1); write_sysreg_el1(ctxt->gp_regs.elr_el1, elr); write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr); + write_sysreg_el2(ctxt->gp_regs.regs.pc, elr); + write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr); }
static hyp_alternate_select(__sysreg_call_restore_host_state,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
Commit dea5e2a4c5bcf196f879a66cebdcca07793e8ba4 upstream.
We've so far relied on a patching infrastructure that only gave us a single alternative, without any way to provide a range of potential replacement instructions. For a single feature, this is an all or nothing thing.
It would be interesting to have a more flexible grained way of patching the kernel though, where we could dynamically tune the code that gets injected.
In order to achive this, let's introduce a new form of dynamic patching, assiciating a callback to a patching site. This callback gets source and target locations of the patching request, as well as the number of instructions to be patched.
Dynamic patching is declared with the new ALTERNATIVE_CB and alternative_cb directives:
asm volatile(ALTERNATIVE_CB("mov %0, #0\n", callback) : "r" (v)); or alternative_cb callback mov x0, #0 alternative_cb_end
where callback is the C function computing the alternative.
Reviewed-by: Christoffer Dall christoffer.dall@linaro.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/alternative.h | 41 ++++++++++++++++++++++++++++++--- arch/arm64/kernel/alternative.c | 43 ++++++++++++++++++++++++++--------- 2 files changed, 69 insertions(+), 15 deletions(-)
--- a/arch/arm64/include/asm/alternative.h +++ b/arch/arm64/include/asm/alternative.h @@ -5,6 +5,8 @@ #include <asm/cpucaps.h> #include <asm/insn.h>
+#define ARM64_CB_PATCH ARM64_NCAPS + #ifndef __ASSEMBLY__
#include <linux/init.h> @@ -22,12 +24,19 @@ struct alt_instr { u8 alt_len; /* size of new instruction(s), <= orig_len */ };
+typedef void (*alternative_cb_t)(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, int nr_inst); + void __init apply_alternatives_all(void); void apply_alternatives(void *start, size_t length);
-#define ALTINSTR_ENTRY(feature) \ +#define ALTINSTR_ENTRY(feature,cb) \ " .word 661b - .\n" /* label */ \ + " .if " __stringify(cb) " == 0\n" \ " .word 663f - .\n" /* new instruction */ \ + " .else\n" \ + " .word " __stringify(cb) "- .\n" /* callback */ \ + " .endif\n" \ " .hword " __stringify(feature) "\n" /* feature bit */ \ " .byte 662b-661b\n" /* source len */ \ " .byte 664f-663f\n" /* replacement len */ @@ -45,15 +54,18 @@ void apply_alternatives(void *start, siz * but most assemblers die if insn1 or insn2 have a .inst. This should * be fixed in a binutils release posterior to 2.25.51.0.2 (anything * containing commit 4e4d08cf7399b606 or c1baaddf8861). + * + * Alternatives with callbacks do not generate replacement instructions. */ -#define __ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg_enabled) \ +#define __ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg_enabled, cb) \ ".if "__stringify(cfg_enabled)" == 1\n" \ "661:\n\t" \ oldinstr "\n" \ "662:\n" \ ".pushsection .altinstructions,"a"\n" \ - ALTINSTR_ENTRY(feature) \ + ALTINSTR_ENTRY(feature,cb) \ ".popsection\n" \ + " .if " __stringify(cb) " == 0\n" \ ".pushsection .altinstr_replacement, "a"\n" \ "663:\n\t" \ newinstr "\n" \ @@ -61,11 +73,17 @@ void apply_alternatives(void *start, siz ".popsection\n\t" \ ".org . - (664b-663b) + (662b-661b)\n\t" \ ".org . - (662b-661b) + (664b-663b)\n" \ + ".else\n\t" \ + "663:\n\t" \ + "664:\n\t" \ + ".endif\n" \ ".endif\n"
#define _ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg, ...) \ - __ALTERNATIVE_CFG(oldinstr, newinstr, feature, IS_ENABLED(cfg)) + __ALTERNATIVE_CFG(oldinstr, newinstr, feature, IS_ENABLED(cfg), 0)
+#define ALTERNATIVE_CB(oldinstr, cb) \ + __ALTERNATIVE_CFG(oldinstr, "NOT_AN_INSTRUCTION", ARM64_CB_PATCH, 1, cb) #else
#include <asm/assembler.h> @@ -132,6 +150,14 @@ void apply_alternatives(void *start, siz 661: .endm
+.macro alternative_cb cb + .set .Lasm_alt_mode, 0 + .pushsection .altinstructions, "a" + altinstruction_entry 661f, \cb, ARM64_CB_PATCH, 662f-661f, 0 + .popsection +661: +.endm + /* * Provide the other half of the alternative code sequence. */ @@ -158,6 +184,13 @@ void apply_alternatives(void *start, siz .endm
/* + * Callback-based alternative epilogue + */ +.macro alternative_cb_end +662: +.endm + +/* * Provides a trivial alternative or default sequence consisting solely * of NOPs. The number of NOPs is chosen automatically to match the * previous case. --- a/arch/arm64/kernel/alternative.c +++ b/arch/arm64/kernel/alternative.c @@ -107,32 +107,53 @@ static u32 get_alt_insn(struct alt_instr return insn; }
+static void patch_alternative(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, int nr_inst) +{ + __le32 *replptr; + int i; + + replptr = ALT_REPL_PTR(alt); + for (i = 0; i < nr_inst; i++) { + u32 insn; + + insn = get_alt_insn(alt, origptr + i, replptr + i); + updptr[i] = cpu_to_le32(insn); + } +} + static void __apply_alternatives(void *alt_region, bool use_linear_alias) { struct alt_instr *alt; struct alt_region *region = alt_region; - __le32 *origptr, *replptr, *updptr; + __le32 *origptr, *updptr; + alternative_cb_t alt_cb;
for (alt = region->begin; alt < region->end; alt++) { - u32 insn; - int i, nr_inst; + int nr_inst;
- if (!cpus_have_cap(alt->cpufeature)) + /* Use ARM64_CB_PATCH as an unconditional patch */ + if (alt->cpufeature < ARM64_CB_PATCH && + !cpus_have_cap(alt->cpufeature)) continue;
- BUG_ON(alt->alt_len != alt->orig_len); + if (alt->cpufeature == ARM64_CB_PATCH) + BUG_ON(alt->alt_len != 0); + else + BUG_ON(alt->alt_len != alt->orig_len);
pr_info_once("patching kernel code\n");
origptr = ALT_ORIG_PTR(alt); - replptr = ALT_REPL_PTR(alt); updptr = use_linear_alias ? lm_alias(origptr) : origptr; - nr_inst = alt->alt_len / sizeof(insn); + nr_inst = alt->orig_len / AARCH64_INSN_SIZE; + + if (alt->cpufeature < ARM64_CB_PATCH) + alt_cb = patch_alternative; + else + alt_cb = ALT_REPL_PTR(alt);
- for (i = 0; i < nr_inst; i++) { - insn = get_alt_insn(alt, origptr + i, replptr + i); - updptr[i] = cpu_to_le32(insn); - } + alt_cb(alt, origptr, updptr, nr_inst);
flush_icache_range((uintptr_t)origptr, (uintptr_t)(origptr + nr_inst));
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
Commit 44a497abd621a71c645f06d3d545ae2f46448830 upstream.
kvm_vgic_global_state is part of the read-only section, and is usually accessed using a PC-relative address generation (adrp + add).
It is thus useless to use kern_hyp_va() on it, and actively problematic if kern_hyp_va() becomes non-idempotent. On the other hand, there is no way that the compiler is going to guarantee that such access is always PC relative.
So let's bite the bullet and provide our own accessor.
Acked-by: Catalin Marinas catalin.marinas@arm.com Reviewed-by: James Morse james.morse@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/include/asm/kvm_mmu.h | 7 +++++++ arch/arm64/include/asm/kvm_mmu.h | 20 ++++++++++++++++++++ virt/kvm/arm/hyp/vgic-v2-sr.c | 2 +- 3 files changed, 28 insertions(+), 1 deletion(-)
--- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -28,6 +28,13 @@ */ #define kern_hyp_va(kva) (kva)
+/* Contrary to arm64, there is no need to generate a PC-relative address */ +#define hyp_symbol_addr(s) \ + ({ \ + typeof(s) *addr = &(s); \ + addr; \ + }) + /* * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels. */ --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -131,6 +131,26 @@ static inline unsigned long __kern_hyp_v #define kern_hyp_va(v) ((typeof(v))(__kern_hyp_va((unsigned long)(v))))
/* + * Obtain the PC-relative address of a kernel symbol + * s: symbol + * + * The goal of this macro is to return a symbol's address based on a + * PC-relative computation, as opposed to a loading the VA from a + * constant pool or something similar. This works well for HYP, as an + * absolute VA is guaranteed to be wrong. Only use this if trying to + * obtain the address of a symbol (i.e. not something you obtained by + * following a pointer). + */ +#define hyp_symbol_addr(s) \ + ({ \ + typeof(s) *addr; \ + asm("adrp %0, %1\n" \ + "add %0, %0, :lo12:%1\n" \ + : "=r" (addr) : "S" (&s)); \ + addr; \ + }) + +/* * We currently only support a 40bit IPA. */ #define KVM_PHYS_SHIFT (40) --- a/virt/kvm/arm/hyp/vgic-v2-sr.c +++ b/virt/kvm/arm/hyp/vgic-v2-sr.c @@ -139,7 +139,7 @@ int __hyp_text __vgic_v2_perform_cpuif_a return -1;
rd = kvm_vcpu_dabt_get_rd(vcpu); - addr = kern_hyp_va((kern_hyp_va(&kvm_vgic_global_state))->vcpu_base_va); + addr = kern_hyp_va(hyp_symbol_addr(kvm_vgic_global_state)->vcpu_base_va); addr += fault_ipa - vgic->vgic_cpu_base;
if (kvm_vcpu_dabt_iswrite(vcpu)) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoffer Dall christoffer.dall@linaro.org
Commit 4464e210de9e80e38de59df052fe09ea2ff80b1b upstream.
We already have the percpu area for the host cpu state, which points to the VCPU, so there's no need to store the VCPU pointer on the stack on every context switch. We can be a little more clever and just use tpidr_el2 for the percpu offset and load the VCPU pointer from the host context.
This has the benefit of being able to retrieve the host context even when our stack is corrupted, and it has a potential performance benefit because we trade a store plus a load for an mrs and a load on a round trip to the guest.
This does require us to calculate the percpu offset without including the offset from the kernel mapping of the percpu array to the linear mapping of the array (which is what we store in tpidr_el1), because a PC-relative generated address in EL2 is already giving us the hyp alias of the linear mapping of a kernel address. We do this in __cpu_init_hyp_mode() by using kvm_ksym_ref().
The code that accesses ESR_EL2 was previously using an alternative to use the _EL1 accessor on VHE systems, but this was actually unnecessary as the _EL1 accessor aliases the ESR_EL2 register on VHE, and the _EL2 accessor does the same thing on both systems.
Cc: Ard Biesheuvel ard.biesheuvel@linaro.org Reviewed-by: Marc Zyngier marc.zyngier@arm.com Reviewed-by: Andrew Jones drjones@redhat.com Signed-off-by: Christoffer Dall christoffer.dall@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/kvm_asm.h | 15 +++++++++++++++ arch/arm64/include/asm/kvm_host.h | 15 +++++++++++++++ arch/arm64/kernel/asm-offsets.c | 1 + arch/arm64/kvm/hyp/entry.S | 6 +----- arch/arm64/kvm/hyp/hyp-entry.S | 28 ++++++++++------------------ arch/arm64/kvm/hyp/switch.c | 5 +---- arch/arm64/kvm/hyp/sysreg-sr.c | 5 +++++ 7 files changed, 48 insertions(+), 27 deletions(-)
--- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -33,6 +33,7 @@ #define KVM_ARM64_DEBUG_DIRTY_SHIFT 0 #define KVM_ARM64_DEBUG_DIRTY (1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
+/* Translate a kernel address of @sym into its equivalent linear mapping */ #define kvm_ksym_ref(sym) \ ({ \ void *val = &sym; \ @@ -68,6 +69,20 @@ extern u32 __init_stage2_translation(voi
extern void __qcom_hyp_sanitize_btac_predictors(void);
+#else /* __ASSEMBLY__ */ + +.macro get_host_ctxt reg, tmp + adr_l \reg, kvm_host_cpu_state + mrs \tmp, tpidr_el2 + add \reg, \reg, \tmp +.endm + +.macro get_vcpu_ptr vcpu, ctxt + get_host_ctxt \ctxt, \vcpu + ldr \vcpu, [\ctxt, #HOST_CONTEXT_VCPU] + kern_hyp_va \vcpu +.endm + #endif
#endif /* __ARM_KVM_ASM_H__ */ --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -350,10 +350,15 @@ int kvm_perf_teardown(void);
struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
+void __kvm_set_tpidr_el2(u64 tpidr_el2); +DECLARE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state); + static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr, unsigned long hyp_stack_ptr, unsigned long vector_ptr) { + u64 tpidr_el2; + /* * Call initialization code, and switch to the full blown HYP code. * If the cpucaps haven't been finalized yet, something has gone very @@ -362,6 +367,16 @@ static inline void __cpu_init_hyp_mode(p */ BUG_ON(!static_branch_likely(&arm64_const_caps_ready)); __kvm_call_hyp((void *)pgd_ptr, hyp_stack_ptr, vector_ptr); + + /* + * Calculate the raw per-cpu offset without a translation from the + * kernel's mapping to the linear mapping, and store it in tpidr_el2 + * so that we can use adr_l to access per-cpu variables in EL2. + */ + tpidr_el2 = (u64)this_cpu_ptr(&kvm_host_cpu_state) + - (u64)kvm_ksym_ref(kvm_host_cpu_state); + + kvm_call_hyp(__kvm_set_tpidr_el2, tpidr_el2); }
static inline void kvm_arch_hardware_unsetup(void) {} --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -136,6 +136,7 @@ int main(void) DEFINE(CPU_FP_REGS, offsetof(struct kvm_regs, fp_regs)); DEFINE(VCPU_FPEXC32_EL2, offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2])); DEFINE(VCPU_HOST_CONTEXT, offsetof(struct kvm_vcpu, arch.host_cpu_context)); + DEFINE(HOST_CONTEXT_VCPU, offsetof(struct kvm_cpu_context, __hyp_running_vcpu)); #endif #ifdef CONFIG_CPU_PM DEFINE(CPU_SUSPEND_SZ, sizeof(struct cpu_suspend_ctx)); --- a/arch/arm64/kvm/hyp/entry.S +++ b/arch/arm64/kvm/hyp/entry.S @@ -62,9 +62,6 @@ ENTRY(__guest_enter) // Store the host regs save_callee_saved_regs x1
- // Store host_ctxt and vcpu for use at exit time - stp x1, x0, [sp, #-16]! - add x18, x0, #VCPU_CONTEXT
// Restore guest regs x0-x17 @@ -118,8 +115,7 @@ ENTRY(__guest_exit) // Store the guest regs x19-x29, lr save_callee_saved_regs x1
- // Restore the host_ctxt from the stack - ldr x2, [sp], #16 + get_host_ctxt x2, x3
// Now restore the host regs restore_callee_saved_regs x2 --- a/arch/arm64/kvm/hyp/hyp-entry.S +++ b/arch/arm64/kvm/hyp/hyp-entry.S @@ -57,13 +57,8 @@ ENDPROC(__vhe_hyp_call) el1_sync: // Guest trapped into EL2 stp x0, x1, [sp, #-16]!
-alternative_if_not ARM64_HAS_VIRT_HOST_EXTN - mrs x1, esr_el2 -alternative_else - mrs x1, esr_el1 -alternative_endif - lsr x0, x1, #ESR_ELx_EC_SHIFT - + mrs x0, esr_el2 + lsr x0, x0, #ESR_ELx_EC_SHIFT cmp x0, #ESR_ELx_EC_HVC64 ccmp x0, #ESR_ELx_EC_HVC32, #4, ne b.ne el1_trap @@ -117,10 +112,14 @@ el1_hvc_guest: eret
el1_trap: + get_vcpu_ptr x1, x0 + + mrs x0, esr_el2 + lsr x0, x0, #ESR_ELx_EC_SHIFT /* * x0: ESR_EC + * x1: vcpu pointer */ - ldr x1, [sp, #16 + 8] // vcpu stored by __guest_enter
/* * We trap the first access to the FP/SIMD to save the host context @@ -138,13 +137,13 @@ alternative_else_nop_endif
el1_irq: stp x0, x1, [sp, #-16]! - ldr x1, [sp, #16 + 8] + get_vcpu_ptr x1, x0 mov x0, #ARM_EXCEPTION_IRQ b __guest_exit
el1_error: stp x0, x1, [sp, #-16]! - ldr x1, [sp, #16 + 8] + get_vcpu_ptr x1, x0 mov x0, #ARM_EXCEPTION_EL1_SERROR b __guest_exit
@@ -180,14 +179,7 @@ ENTRY(__hyp_do_panic) ENDPROC(__hyp_do_panic)
ENTRY(__hyp_panic) - /* - * '=kvm_host_cpu_state' is a host VA from the constant pool, it may - * not be accessible by this address from EL2, hyp_panic() converts - * it with kern_hyp_va() before use. - */ - ldr x0, =kvm_host_cpu_state - mrs x1, tpidr_el2 - add x0, x0, x1 + get_host_ctxt x0, x1 b hyp_panic ENDPROC(__hyp_panic)
--- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -437,7 +437,7 @@ static hyp_alternate_select(__hyp_call_p __hyp_call_panic_nvhe, __hyp_call_panic_vhe, ARM64_HAS_VIRT_HOST_EXTN);
-void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *__host_ctxt) +void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt) { struct kvm_vcpu *vcpu = NULL;
@@ -446,9 +446,6 @@ void __hyp_text __noreturn hyp_panic(str u64 par = read_sysreg(par_el1);
if (read_sysreg(vttbr_el2)) { - struct kvm_cpu_context *host_ctxt; - - host_ctxt = kern_hyp_va(__host_ctxt); vcpu = host_ctxt->__hyp_running_vcpu; __timer_save_state(vcpu); __deactivate_traps(vcpu); --- a/arch/arm64/kvm/hyp/sysreg-sr.c +++ b/arch/arm64/kvm/hyp/sysreg-sr.c @@ -183,3 +183,8 @@ void __hyp_text __sysreg32_restore_state if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY) write_sysreg(sysreg[DBGVCR32_EL2], dbgvcr32_el2); } + +void __hyp_text __kvm_set_tpidr_el2(u64 tpidr_el2) +{ + asm("msr tpidr_el2, %0": : "r" (tpidr_el2)); +}
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit eff0e9e1078ea7dc1d794dc50e31baef984c46d7 upstream.
We've so far used the PSCI return codes for SMCCC because they were extremely similar. But with the new ARM DEN 0070A specification, "NOT_REQUIRED" (-2) is clashing with PSCI's "PSCI_RET_INVALID_PARAMS".
Let's bite the bullet and add SMCCC specific return codes. Users can be repainted as and when required.
Acked-by: Will Deacon will.deacon@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/arm-smccc.h | 5 +++++ 1 file changed, 5 insertions(+)
--- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -291,5 +291,10 @@ asmlinkage void __arm_smccc_hvc(unsigned */ #define arm_smccc_1_1_hvc(...) __arm_smccc_1_1(SMCCC_HVC_INST, __VA_ARGS__)
+/* Return codes defined in ARM DEN 0070A */ +#define SMCCC_RET_SUCCESS 0 +#define SMCCC_RET_NOT_SUPPORTED -1 +#define SMCCC_RET_NOT_REQUIRED -2 + #endif /*__ASSEMBLY__*/ #endif /*__LINUX_ARM_SMCCC_H*/
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 8e2906245f1e3b0d027169d9f2e55ce0548cb96e upstream.
In order for the kernel to protect itself, let's call the SSBD mitigation implemented by the higher exception level (either hypervisor or firmware) on each transition between userspace and kernel.
We must take the PSCI conduit into account in order to target the right exception level, hence the introduction of a runtime patching callback.
Reviewed-by: Mark Rutland mark.rutland@arm.com Reviewed-by: Julien Grall julien.grall@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpu_errata.c | 24 ++++++++++++++++++++++++ arch/arm64/kernel/entry.S | 22 ++++++++++++++++++++++ include/linux/arm-smccc.h | 5 +++++ 3 files changed, 51 insertions(+)
--- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -228,6 +228,30 @@ static int qcom_enable_link_stack_saniti } #endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */
+#ifdef CONFIG_ARM64_SSBD +void __init arm64_update_smccc_conduit(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, + int nr_inst) +{ + u32 insn; + + BUG_ON(nr_inst != 1); + + switch (psci_ops.conduit) { + case PSCI_CONDUIT_HVC: + insn = aarch64_insn_get_hvc_value(); + break; + case PSCI_CONDUIT_SMC: + insn = aarch64_insn_get_smc_value(); + break; + default: + return; + } + + *updptr = cpu_to_le32(insn); +} +#endif /* CONFIG_ARM64_SSBD */ + #define MIDR_RANGE(model, min, max) \ .def_scope = SCOPE_LOCAL_CPU, \ .matches = is_affected_midr_range, \ --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -18,6 +18,7 @@ * along with this program. If not, see http://www.gnu.org/licenses/. */
+#include <linux/arm-smccc.h> #include <linux/init.h> #include <linux/linkage.h>
@@ -137,6 +138,18 @@ alternative_else_nop_endif add \dst, \dst, #(\sym - .entry.tramp.text) .endm
+ // This macro corrupts x0-x3. It is the caller's duty + // to save/restore them if required. + .macro apply_ssbd, state +#ifdef CONFIG_ARM64_SSBD + mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2 + mov w1, #\state +alternative_cb arm64_update_smccc_conduit + nop // Patched to SMC/HVC #0 +alternative_cb_end +#endif + .endm + .macro kernel_entry, el, regsize = 64 .if \regsize == 32 mov w0, w0 // zero upper 32 bits of x0 @@ -163,6 +176,13 @@ alternative_else_nop_endif ldr x19, [tsk, #TSK_TI_FLAGS] // since we can unmask debug disable_step_tsk x19, x20 // exceptions when scheduling.
+ apply_ssbd 1 + +#ifdef CONFIG_ARM64_SSBD + ldp x0, x1, [sp, #16 * 0] + ldp x2, x3, [sp, #16 * 1] +#endif + mov x29, xzr // fp pointed to user-space .else add x21, sp, #S_FRAME_SIZE @@ -301,6 +321,8 @@ alternative_if ARM64_WORKAROUND_845719 alternative_else_nop_endif #endif 3: + apply_ssbd 0 + .endif
msr elr_el1, x21 // set up the return data --- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -80,6 +80,11 @@ ARM_SMCCC_SMC_32, \ 0, 0x8000)
+#define ARM_SMCCC_ARCH_WORKAROUND_2 \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ + ARM_SMCCC_SMC_32, \ + 0, 0x7fff) + #ifndef __ASSEMBLY__
#include <linux/linkage.h>
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 5cf9ce6e5ea50f805c6188c04ed0daaec7b6887d upstream.
In a heterogeneous system, we can end up with both affected and unaffected CPUs. Let's check their status before calling into the firmware.
Reviewed-by: Julien Grall julien.grall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpu_errata.c | 2 ++ arch/arm64/kernel/entry.S | 11 +++++++---- 2 files changed, 9 insertions(+), 4 deletions(-)
--- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -229,6 +229,8 @@ static int qcom_enable_link_stack_saniti #endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */
#ifdef CONFIG_ARM64_SSBD +DEFINE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required); + void __init arm64_update_smccc_conduit(struct alt_instr *alt, __le32 *origptr, __le32 *updptr, int nr_inst) --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -140,8 +140,10 @@ alternative_else_nop_endif
// This macro corrupts x0-x3. It is the caller's duty // to save/restore them if required. - .macro apply_ssbd, state + .macro apply_ssbd, state, targ, tmp1, tmp2 #ifdef CONFIG_ARM64_SSBD + ldr_this_cpu \tmp2, arm64_ssbd_callback_required, \tmp1 + cbz \tmp2, \targ mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2 mov w1, #\state alternative_cb arm64_update_smccc_conduit @@ -176,12 +178,13 @@ alternative_cb_end ldr x19, [tsk, #TSK_TI_FLAGS] // since we can unmask debug disable_step_tsk x19, x20 // exceptions when scheduling.
- apply_ssbd 1 + apply_ssbd 1, 1f, x22, x23
#ifdef CONFIG_ARM64_SSBD ldp x0, x1, [sp, #16 * 0] ldp x2, x3, [sp, #16 * 1] #endif +1:
mov x29, xzr // fp pointed to user-space .else @@ -321,8 +324,8 @@ alternative_if ARM64_WORKAROUND_845719 alternative_else_nop_endif #endif 3: - apply_ssbd 0 - + apply_ssbd 0, 5f, x0, x1 +5: .endif
msr elr_el1, x21 // set up the return data
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit a725e3dda1813ed306734823ac4c65ca04e38500 upstream.
As for Spectre variant-2, we rely on SMCCC 1.1 to provide the discovery mechanism for detecting the SSBD mitigation.
A new capability is also allocated for that purpose, and a config option.
Reviewed-by: Julien Grall julien.grall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/Kconfig | 9 +++++ arch/arm64/include/asm/cpucaps.h | 3 + arch/arm64/kernel/cpu_errata.c | 69 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 80 insertions(+), 1 deletion(-)
--- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -849,6 +849,15 @@ config HARDEN_BRANCH_PREDICTOR
If unsure, say Y.
+config ARM64_SSBD + bool "Speculative Store Bypass Disable" if EXPERT + default y + help + This enables mitigation of the bypassing of previous stores + by speculative loads. + + If unsure, say Y. + menuconfig ARMV8_DEPRECATED bool "Emulate deprecated/obsolete ARMv8 instructions" depends on COMPAT --- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -43,7 +43,8 @@ #define ARM64_UNMAP_KERNEL_AT_EL0 23 #define ARM64_HARDEN_BRANCH_PREDICTOR 24 #define ARM64_HARDEN_BP_POST_GUEST_EXIT 25 +#define ARM64_SSBD 26
-#define ARM64_NCAPS 26 +#define ARM64_NCAPS 27
#endif /* __ASM_CPUCAPS_H */ --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -252,6 +252,67 @@ void __init arm64_update_smccc_conduit(s
*updptr = cpu_to_le32(insn); } + +static void arm64_set_ssbd_mitigation(bool state) +{ + switch (psci_ops.conduit) { + case PSCI_CONDUIT_HVC: + arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_2, state, NULL); + break; + + case PSCI_CONDUIT_SMC: + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2, state, NULL); + break; + + default: + WARN_ON_ONCE(1); + break; + } +} + +static bool has_ssbd_mitigation(const struct arm64_cpu_capabilities *entry, + int scope) +{ + struct arm_smccc_res res; + bool supported = true; + + WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible()); + + if (psci_ops.smccc_version == SMCCC_VERSION_1_0) + return false; + + /* + * The probe function return value is either negative + * (unsupported or mitigated), positive (unaffected), or zero + * (requires mitigation). We only need to do anything in the + * last case. + */ + switch (psci_ops.conduit) { + case PSCI_CONDUIT_HVC: + arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, + ARM_SMCCC_ARCH_WORKAROUND_2, &res); + if ((int)res.a0 != 0) + supported = false; + break; + + case PSCI_CONDUIT_SMC: + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, + ARM_SMCCC_ARCH_WORKAROUND_2, &res); + if ((int)res.a0 != 0) + supported = false; + break; + + default: + supported = false; + } + + if (supported) { + __this_cpu_write(arm64_ssbd_callback_required, 1); + arm64_set_ssbd_mitigation(true); + } + + return supported; +} #endif /* CONFIG_ARM64_SSBD */
#define MIDR_RANGE(model, min, max) \ @@ -452,6 +513,14 @@ const struct arm64_cpu_capabilities arm6 .enable = enable_smccc_arch_workaround_1, }, #endif +#ifdef CONFIG_ARM64_SSBD + { + .desc = "Speculative Store Bypass Disable", + .def_scope = SCOPE_LOCAL_CPU, + .capability = ARM64_SSBD, + .matches = has_ssbd_mitigation, + }, +#endif { } };
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit a43ae4dfe56a01f5b98ba0cb2f784b6a43bafcc6 upstream.
On a system where the firmware implements ARCH_WORKAROUND_2, it may be useful to either permanently enable or disable the workaround for cases where the user decides that they'd rather not get a trap overhead, and keep the mitigation permanently on or off instead of switching it on exception entry/exit.
In any case, default to the mitigation being enabled.
Reviewed-by: Julien Grall julien.grall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/admin-guide/kernel-parameters.txt | 17 +++ arch/arm64/include/asm/cpufeature.h | 6 + arch/arm64/kernel/cpu_errata.c | 103 ++++++++++++++++++++---- 3 files changed, 110 insertions(+), 16 deletions(-)
--- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3997,6 +3997,23 @@ expediting. Set to zero to disable automatic expediting.
+ ssbd= [ARM64,HW] + Speculative Store Bypass Disable control + + On CPUs that are vulnerable to the Speculative + Store Bypass vulnerability and offer a + firmware based mitigation, this parameter + indicates how the mitigation should be used: + + force-on: Unconditionally enable mitigation for + for both kernel and userspace + force-off: Unconditionally disable mitigation for + for both kernel and userspace + kernel: Always enable mitigation in the + kernel, and offer a prctl interface + to allow userspace to register its + interest in being mitigated too. + stack_guard_gap= [MM] override the default stack gap protection. The value is in page units and it defines how many pages prior --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -262,6 +262,12 @@ static inline bool system_uses_ttbr0_pan !cpus_have_const_cap(ARM64_HAS_PAN); }
+#define ARM64_SSBD_UNKNOWN -1 +#define ARM64_SSBD_FORCE_DISABLE 0 +#define ARM64_SSBD_KERNEL 1 +#define ARM64_SSBD_FORCE_ENABLE 2 +#define ARM64_SSBD_MITIGATED 3 + #endif /* __ASSEMBLY__ */
#endif --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -231,6 +231,38 @@ static int qcom_enable_link_stack_saniti #ifdef CONFIG_ARM64_SSBD DEFINE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required);
+int ssbd_state __read_mostly = ARM64_SSBD_KERNEL; + +static const struct ssbd_options { + const char *str; + int state; +} ssbd_options[] = { + { "force-on", ARM64_SSBD_FORCE_ENABLE, }, + { "force-off", ARM64_SSBD_FORCE_DISABLE, }, + { "kernel", ARM64_SSBD_KERNEL, }, +}; + +static int __init ssbd_cfg(char *buf) +{ + int i; + + if (!buf || !buf[0]) + return -EINVAL; + + for (i = 0; i < ARRAY_SIZE(ssbd_options); i++) { + int len = strlen(ssbd_options[i].str); + + if (strncmp(buf, ssbd_options[i].str, len)) + continue; + + ssbd_state = ssbd_options[i].state; + return 0; + } + + return -EINVAL; +} +early_param("ssbd", ssbd_cfg); + void __init arm64_update_smccc_conduit(struct alt_instr *alt, __le32 *origptr, __le32 *updptr, int nr_inst) @@ -274,44 +306,83 @@ static bool has_ssbd_mitigation(const st int scope) { struct arm_smccc_res res; - bool supported = true; + bool required = true; + s32 val;
WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
- if (psci_ops.smccc_version == SMCCC_VERSION_1_0) + if (psci_ops.smccc_version == SMCCC_VERSION_1_0) { + ssbd_state = ARM64_SSBD_UNKNOWN; return false; + }
- /* - * The probe function return value is either negative - * (unsupported or mitigated), positive (unaffected), or zero - * (requires mitigation). We only need to do anything in the - * last case. - */ switch (psci_ops.conduit) { case PSCI_CONDUIT_HVC: arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_ARCH_WORKAROUND_2, &res); - if ((int)res.a0 != 0) - supported = false; break;
case PSCI_CONDUIT_SMC: arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_ARCH_WORKAROUND_2, &res); - if ((int)res.a0 != 0) - supported = false; break;
default: - supported = false; + ssbd_state = ARM64_SSBD_UNKNOWN; + return false; + } + + val = (s32)res.a0; + + switch (val) { + case SMCCC_RET_NOT_SUPPORTED: + ssbd_state = ARM64_SSBD_UNKNOWN; + return false; + + case SMCCC_RET_NOT_REQUIRED: + pr_info_once("%s mitigation not required\n", entry->desc); + ssbd_state = ARM64_SSBD_MITIGATED; + return false; + + case SMCCC_RET_SUCCESS: + required = true; + break; + + case 1: /* Mitigation not required on this CPU */ + required = false; + break; + + default: + WARN_ON(1); + return false; }
- if (supported) { - __this_cpu_write(arm64_ssbd_callback_required, 1); + switch (ssbd_state) { + case ARM64_SSBD_FORCE_DISABLE: + pr_info_once("%s disabled from command-line\n", entry->desc); + arm64_set_ssbd_mitigation(false); + required = false; + break; + + case ARM64_SSBD_KERNEL: + if (required) { + __this_cpu_write(arm64_ssbd_callback_required, 1); + arm64_set_ssbd_mitigation(true); + } + break; + + case ARM64_SSBD_FORCE_ENABLE: + pr_info_once("%s forced from command-line\n", entry->desc); arm64_set_ssbd_mitigation(true); + required = true; + break; + + default: + WARN_ON(1); + break; }
- return supported; + return required; } #endif /* CONFIG_ARM64_SSBD */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit c32e1736ca03904c03de0e4459a673be194f56fd upstream.
We're about to need the mitigation state in various parts of the kernel in order to do the right thing for userspace and guests.
Let's expose an accessor that will let other subsystems know about the state.
Reviewed-by: Julien Grall julien.grall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/cpufeature.h | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -268,6 +268,16 @@ static inline bool system_uses_ttbr0_pan #define ARM64_SSBD_FORCE_ENABLE 2 #define ARM64_SSBD_MITIGATED 3
+static inline int arm64_get_ssbd_state(void) +{ +#ifdef CONFIG_ARM64_SSBD + extern int ssbd_state; + return ssbd_state; +#else + return ARM64_SSBD_UNKNOWN; +#endif +} + #endif /* __ASSEMBLY__ */
#endif
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 986372c4367f46b34a3c0f6918d7fb95cbdf39d6 upstream.
In order to avoid checking arm64_ssbd_callback_required on each kernel entry/exit even if no mitigation is required, let's add yet another alternative that by default jumps over the mitigation, and that gets nop'ed out if we're doing dynamic mitigation.
Think of it as a poor man's static key...
Reviewed-by: Julien Grall julien.grall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpu_errata.c | 14 ++++++++++++++ arch/arm64/kernel/entry.S | 3 +++ 2 files changed, 17 insertions(+)
--- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -285,6 +285,20 @@ void __init arm64_update_smccc_conduit(s *updptr = cpu_to_le32(insn); }
+void __init arm64_enable_wa2_handling(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, + int nr_inst) +{ + BUG_ON(nr_inst != 1); + /* + * Only allow mitigation on EL1 entry/exit and guest + * ARCH_WORKAROUND_2 handling if the SSBD state allows it to + * be flipped. + */ + if (arm64_get_ssbd_state() == ARM64_SSBD_KERNEL) + *updptr = cpu_to_le32(aarch64_insn_gen_nop()); +} + static void arm64_set_ssbd_mitigation(bool state) { switch (psci_ops.conduit) { --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -142,6 +142,9 @@ alternative_else_nop_endif // to save/restore them if required. .macro apply_ssbd, state, targ, tmp1, tmp2 #ifdef CONFIG_ARM64_SSBD +alternative_cb arm64_enable_wa2_handling + b \targ +alternative_cb_end ldr_this_cpu \tmp2, arm64_ssbd_callback_required, \tmp1 cbz \tmp2, \targ mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 647d0519b53f440a55df163de21c52a8205431cc upstream.
On a system where firmware can dynamically change the state of the mitigation, the CPU will always come up with the mitigation enabled, including when coming back from suspend.
If the user has requested "no mitigation" via a command line option, let's enforce it by calling into the firmware again to disable it.
Similarily, for a resume from hibernate, the mitigation could have been disabled by the boot kernel. Let's ensure that it is set back on in that case.
Acked-by: Will Deacon will.deacon@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/cpufeature.h | 6 ++++++ arch/arm64/kernel/cpu_errata.c | 2 +- arch/arm64/kernel/hibernate.c | 11 +++++++++++ arch/arm64/kernel/suspend.c | 8 ++++++++ 4 files changed, 26 insertions(+), 1 deletion(-)
--- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -278,6 +278,12 @@ static inline int arm64_get_ssbd_state(v #endif }
+#ifdef CONFIG_ARM64_SSBD +void arm64_set_ssbd_mitigation(bool state); +#else +static inline void arm64_set_ssbd_mitigation(bool state) {} +#endif + #endif /* __ASSEMBLY__ */
#endif --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -299,7 +299,7 @@ void __init arm64_enable_wa2_handling(st *updptr = cpu_to_le32(aarch64_insn_gen_nop()); }
-static void arm64_set_ssbd_mitigation(bool state) +void arm64_set_ssbd_mitigation(bool state) { switch (psci_ops.conduit) { case PSCI_CONDUIT_HVC: --- a/arch/arm64/kernel/hibernate.c +++ b/arch/arm64/kernel/hibernate.c @@ -313,6 +313,17 @@ int swsusp_arch_suspend(void)
sleep_cpu = -EINVAL; __cpu_suspend_exit(); + + /* + * Just in case the boot kernel did turn the SSBD + * mitigation off behind our back, let's set the state + * to what we expect it to be. + */ + switch (arm64_get_ssbd_state()) { + case ARM64_SSBD_FORCE_ENABLE: + case ARM64_SSBD_KERNEL: + arm64_set_ssbd_mitigation(true); + } }
local_dbg_restore(flags); --- a/arch/arm64/kernel/suspend.c +++ b/arch/arm64/kernel/suspend.c @@ -62,6 +62,14 @@ void notrace __cpu_suspend_exit(void) */ if (hw_breakpoint_restore) hw_breakpoint_restore(cpu); + + /* + * On resume, firmware implementing dynamic mitigation will + * have turned the mitigation on. If the user has forcefully + * disabled it, make sure their wishes are obeyed. + */ + if (arm64_get_ssbd_state() == ARM64_SSBD_FORCE_DISABLE) + arm64_set_ssbd_mitigation(false); }
/*
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 9dd9614f5476687abbff8d4b12cd08ae70d7c2ad upstream.
In order to allow userspace to be mitigated on demand, let's introduce a new thread flag that prevents the mitigation from being turned off when exiting to userspace, and doesn't turn it on on entry into the kernel (with the assumption that the mitigation is always enabled in the kernel itself).
This will be used by a prctl interface introduced in a later patch.
Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/thread_info.h | 1 + arch/arm64/kernel/entry.S | 2 ++ 2 files changed, 3 insertions(+)
--- a/arch/arm64/include/asm/thread_info.h +++ b/arch/arm64/include/asm/thread_info.h @@ -92,6 +92,7 @@ void arch_setup_new_exec(void); #define TIF_RESTORE_SIGMASK 20 #define TIF_SINGLESTEP 21 #define TIF_32BIT 22 /* 32bit process */ +#define TIF_SSBD 23 /* Wants SSB mitigation */
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING) #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -147,6 +147,8 @@ alternative_cb arm64_enable_wa2_handling alternative_cb_end ldr_this_cpu \tmp2, arm64_ssbd_callback_required, \tmp1 cbz \tmp2, \targ + ldr \tmp2, [tsk, #TSK_TI_FLAGS] + tbnz \tmp2, #TIF_SSBD, \targ mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2 mov w1, #\state alternative_cb arm64_update_smccc_conduit
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 9cdc0108baa8ef87c76ed834619886a46bd70cbe upstream.
If running on a system that performs dynamic SSBD mitigation, allow userspace to request the mitigation for itself. This is implemented as a prctl call, allowing the mitigation to be enabled or disabled at will for this particular thread.
Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/Makefile | 1 arch/arm64/kernel/ssbd.c | 108 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 109 insertions(+) create mode 100644 arch/arm64/kernel/ssbd.c
--- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -54,6 +54,7 @@ arm64-obj-$(CONFIG_KEXEC) += machine_ke arm64-obj-$(CONFIG_ARM64_RELOC_TEST) += arm64-reloc-test.o arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o arm64-obj-$(CONFIG_CRASH_DUMP) += crash_dump.o +arm64-obj-$(CONFIG_ARM64_SSBD) += ssbd.o
ifeq ($(CONFIG_KVM),y) arm64-obj-$(CONFIG_HARDEN_BRANCH_PREDICTOR) += bpi.o --- /dev/null +++ b/arch/arm64/kernel/ssbd.c @@ -0,0 +1,108 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2018 ARM Ltd, All Rights Reserved. + */ + +#include <linux/errno.h> +#include <linux/prctl.h> +#include <linux/sched.h> +#include <linux/thread_info.h> + +#include <asm/cpufeature.h> + +/* + * prctl interface for SSBD + */ +static int ssbd_prctl_set(struct task_struct *task, unsigned long ctrl) +{ + int state = arm64_get_ssbd_state(); + + /* Unsupported */ + if (state == ARM64_SSBD_UNKNOWN) + return -EINVAL; + + /* Treat the unaffected/mitigated state separately */ + if (state == ARM64_SSBD_MITIGATED) { + switch (ctrl) { + case PR_SPEC_ENABLE: + return -EPERM; + case PR_SPEC_DISABLE: + case PR_SPEC_FORCE_DISABLE: + return 0; + } + } + + /* + * Things are a bit backward here: the arm64 internal API + * *enables the mitigation* when the userspace API *disables + * speculation*. So much fun. + */ + switch (ctrl) { + case PR_SPEC_ENABLE: + /* If speculation is force disabled, enable is not allowed */ + if (state == ARM64_SSBD_FORCE_ENABLE || + task_spec_ssb_force_disable(task)) + return -EPERM; + task_clear_spec_ssb_disable(task); + clear_tsk_thread_flag(task, TIF_SSBD); + break; + case PR_SPEC_DISABLE: + if (state == ARM64_SSBD_FORCE_DISABLE) + return -EPERM; + task_set_spec_ssb_disable(task); + set_tsk_thread_flag(task, TIF_SSBD); + break; + case PR_SPEC_FORCE_DISABLE: + if (state == ARM64_SSBD_FORCE_DISABLE) + return -EPERM; + task_set_spec_ssb_disable(task); + task_set_spec_ssb_force_disable(task); + set_tsk_thread_flag(task, TIF_SSBD); + break; + default: + return -ERANGE; + } + + return 0; +} + +int arch_prctl_spec_ctrl_set(struct task_struct *task, unsigned long which, + unsigned long ctrl) +{ + switch (which) { + case PR_SPEC_STORE_BYPASS: + return ssbd_prctl_set(task, ctrl); + default: + return -ENODEV; + } +} + +static int ssbd_prctl_get(struct task_struct *task) +{ + switch (arm64_get_ssbd_state()) { + case ARM64_SSBD_UNKNOWN: + return -EINVAL; + case ARM64_SSBD_FORCE_ENABLE: + return PR_SPEC_DISABLE; + case ARM64_SSBD_KERNEL: + if (task_spec_ssb_force_disable(task)) + return PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE; + if (task_spec_ssb_disable(task)) + return PR_SPEC_PRCTL | PR_SPEC_DISABLE; + return PR_SPEC_PRCTL | PR_SPEC_ENABLE; + case ARM64_SSBD_FORCE_DISABLE: + return PR_SPEC_ENABLE; + default: + return PR_SPEC_NOT_AFFECTED; + } +} + +int arch_prctl_spec_ctrl_get(struct task_struct *task, unsigned long which) +{ + switch (which) { + case PR_SPEC_STORE_BYPASS: + return ssbd_prctl_get(task); + default: + return -ENODEV; + } +}
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 85478bab409171de501b719971fd25a3d5d639f9 upstream.
As we're going to require to access per-cpu variables at EL2, let's craft the minimum set of accessors required to implement reading a per-cpu variable, relying on tpidr_el2 to contain the per-cpu offset.
Reviewed-by: Christoffer Dall christoffer.dall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/kvm_asm.h | 27 +++++++++++++++++++++++++-- 1 file changed, 25 insertions(+), 2 deletions(-)
--- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -69,14 +69,37 @@ extern u32 __init_stage2_translation(voi
extern void __qcom_hyp_sanitize_btac_predictors(void);
+/* Home-grown __this_cpu_{ptr,read} variants that always work at HYP */ +#define __hyp_this_cpu_ptr(sym) \ + ({ \ + void *__ptr = hyp_symbol_addr(sym); \ + __ptr += read_sysreg(tpidr_el2); \ + (typeof(&sym))__ptr; \ + }) + +#define __hyp_this_cpu_read(sym) \ + ({ \ + *__hyp_this_cpu_ptr(sym); \ + }) + #else /* __ASSEMBLY__ */
-.macro get_host_ctxt reg, tmp - adr_l \reg, kvm_host_cpu_state +.macro hyp_adr_this_cpu reg, sym, tmp + adr_l \reg, \sym mrs \tmp, tpidr_el2 add \reg, \reg, \tmp .endm
+.macro hyp_ldr_this_cpu reg, sym, tmp + adr_l \reg, \sym + mrs \tmp, tpidr_el2 + ldr \reg, [\reg, \tmp] +.endm + +.macro get_host_ctxt reg, tmp + hyp_adr_this_cpu \reg, kvm_host_cpu_state, \tmp +.endm + .macro get_vcpu_ptr vcpu, ctxt get_host_ctxt \ctxt, \vcpu ldr \vcpu, [\ctxt, #HOST_CONTEXT_VCPU]
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 55e3748e8902ff641e334226bdcb432f9a5d78d3 upstream.
In order to offer ARCH_WORKAROUND_2 support to guests, we need a bit of infrastructure.
Let's add a flag indicating whether or not the guest uses SSBD mitigation. Depending on the state of this flag, allow KVM to disable ARCH_WORKAROUND_2 before entering the guest, and enable it when exiting it.
Reviewed-by: Christoffer Dall christoffer.dall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/include/asm/kvm_mmu.h | 5 +++++ arch/arm64/include/asm/kvm_asm.h | 3 +++ arch/arm64/include/asm/kvm_host.h | 3 +++ arch/arm64/include/asm/kvm_mmu.h | 24 ++++++++++++++++++++++++ arch/arm64/kvm/hyp/switch.c | 38 ++++++++++++++++++++++++++++++++++++++ virt/kvm/arm/arm.c | 4 ++++ 6 files changed, 77 insertions(+)
--- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -254,6 +254,11 @@ static inline int kvm_map_vectors(void) return 0; }
+static inline int hyp_map_aux_data(void) +{ + return 0; +} + #endif /* !__ASSEMBLY__ */
#endif /* __ARM_KVM_MMU_H__ */ --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -33,6 +33,9 @@ #define KVM_ARM64_DEBUG_DIRTY_SHIFT 0 #define KVM_ARM64_DEBUG_DIRTY (1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
+#define VCPU_WORKAROUND_2_FLAG_SHIFT 0 +#define VCPU_WORKAROUND_2_FLAG (_AC(1, UL) << VCPU_WORKAROUND_2_FLAG_SHIFT) + /* Translate a kernel address of @sym into its equivalent linear mapping */ #define kvm_ksym_ref(sym) \ ({ \ --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -210,6 +210,9 @@ struct kvm_vcpu_arch { /* Exception Information */ struct kvm_vcpu_fault_info fault;
+ /* State of various workarounds, see kvm_asm.h for bit assignment */ + u64 workaround_flags; + /* Guest debug state */ u64 debug_flags;
--- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -383,5 +383,29 @@ static inline int kvm_map_vectors(void) } #endif
+#ifdef CONFIG_ARM64_SSBD +DECLARE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required); + +static inline int hyp_map_aux_data(void) +{ + int cpu, err; + + for_each_possible_cpu(cpu) { + u64 *ptr; + + ptr = per_cpu_ptr(&arm64_ssbd_callback_required, cpu); + err = create_hyp_mappings(ptr, ptr + 1, PAGE_HYP); + if (err) + return err; + } + return 0; +} +#else +static inline int hyp_map_aux_data(void) +{ + return 0; +} +#endif + #endif /* __ASSEMBLY__ */ #endif /* __ARM64_KVM_MMU_H__ */ --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -15,6 +15,7 @@ * along with this program. If not, see http://www.gnu.org/licenses/. */
+#include <linux/arm-smccc.h> #include <linux/types.h> #include <linux/jump_label.h> #include <uapi/linux/psci.h> @@ -281,6 +282,39 @@ static void __hyp_text __skip_instr(stru write_sysreg_el2(*vcpu_pc(vcpu), elr); }
+static inline bool __hyp_text __needs_ssbd_off(struct kvm_vcpu *vcpu) +{ + if (!cpus_have_const_cap(ARM64_SSBD)) + return false; + + return !(vcpu->arch.workaround_flags & VCPU_WORKAROUND_2_FLAG); +} + +static void __hyp_text __set_guest_arch_workaround_state(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_ARM64_SSBD + /* + * The host runs with the workaround always present. If the + * guest wants it disabled, so be it... + */ + if (__needs_ssbd_off(vcpu) && + __hyp_this_cpu_read(arm64_ssbd_callback_required)) + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2, 0, NULL); +#endif +} + +static void __hyp_text __set_host_arch_workaround_state(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_ARM64_SSBD + /* + * If the guest has disabled the workaround, bring it back on. + */ + if (__needs_ssbd_off(vcpu) && + __hyp_this_cpu_read(arm64_ssbd_callback_required)) + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2, 1, NULL); +#endif +} + int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu) { struct kvm_cpu_context *host_ctxt; @@ -311,6 +345,8 @@ int __hyp_text __kvm_vcpu_run(struct kvm __sysreg_restore_guest_state(guest_ctxt); __debug_restore_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt);
+ __set_guest_arch_workaround_state(vcpu); + /* Jump in the fire! */ again: exit_code = __guest_enter(vcpu, host_ctxt); @@ -367,6 +403,8 @@ again: /* 0 falls through to be handled out of EL2 */ }
+ __set_host_arch_workaround_state(vcpu); + if (cpus_have_const_cap(ARM64_HARDEN_BP_POST_GUEST_EXIT)) { u32 midr = read_cpuid_id();
--- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -1411,6 +1411,10 @@ static int init_hyp_mode(void) } }
+ err = hyp_map_aux_data(); + if (err) + kvm_err("Cannot map host auxilary data: %d\n", err); + return 0;
out_err:
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit b4f18c063a13dfb33e3a63fe1844823e19c2265e upstream.
In order to forward the guest's ARCH_WORKAROUND_2 calls to EL3, add a small(-ish) sequence to handle it at EL2. Special care must be taken to track the state of the guest itself by updating the workaround flags. We also rely on patching to enable calls into the firmware.
Note that since we need to execute branches, this always executes after the Spectre-v2 mitigation has been applied.
Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/asm-offsets.c | 1 + arch/arm64/kvm/hyp/hyp-entry.S | 38 +++++++++++++++++++++++++++++++++++++- 2 files changed, 38 insertions(+), 1 deletion(-)
--- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -131,6 +131,7 @@ int main(void) BLANK(); #ifdef CONFIG_KVM_ARM_HOST DEFINE(VCPU_CONTEXT, offsetof(struct kvm_vcpu, arch.ctxt)); + DEFINE(VCPU_WORKAROUND_FLAGS, offsetof(struct kvm_vcpu, arch.workaround_flags)); DEFINE(CPU_GP_REGS, offsetof(struct kvm_cpu_context, gp_regs)); DEFINE(CPU_USER_PT_REGS, offsetof(struct kvm_regs, regs)); DEFINE(CPU_FP_REGS, offsetof(struct kvm_regs, fp_regs)); --- a/arch/arm64/kvm/hyp/hyp-entry.S +++ b/arch/arm64/kvm/hyp/hyp-entry.S @@ -106,8 +106,44 @@ el1_hvc_guest: */ ldr x1, [sp] // Guest's x0 eor w1, w1, #ARM_SMCCC_ARCH_WORKAROUND_1 + cbz w1, wa_epilogue + + /* ARM_SMCCC_ARCH_WORKAROUND_2 handling */ + eor w1, w1, #(ARM_SMCCC_ARCH_WORKAROUND_1 ^ \ + ARM_SMCCC_ARCH_WORKAROUND_2) cbnz w1, el1_trap - mov x0, x1 + +#ifdef CONFIG_ARM64_SSBD +alternative_cb arm64_enable_wa2_handling + b wa2_end +alternative_cb_end + get_vcpu_ptr x2, x0 + ldr x0, [x2, #VCPU_WORKAROUND_FLAGS] + + // Sanitize the argument and update the guest flags + ldr x1, [sp, #8] // Guest's x1 + clz w1, w1 // Murphy's device: + lsr w1, w1, #5 // w1 = !!w1 without using + eor w1, w1, #1 // the flags... + bfi x0, x1, #VCPU_WORKAROUND_2_FLAG_SHIFT, #1 + str x0, [x2, #VCPU_WORKAROUND_FLAGS] + + /* Check that we actually need to perform the call */ + hyp_ldr_this_cpu x0, arm64_ssbd_callback_required, x2 + cbz x0, wa2_end + + mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2 + smc #0 + + /* Don't leak data from the SMC call */ + mov x3, xzr +wa2_end: + mov x2, xzr + mov x1, xzr +#endif + +wa_epilogue: + mov x0, xzr add sp, sp, #16 eret
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 5d81f7dc9bca4f4963092433e27b508cbe524a32 upstream.
Now that all our infrastructure is in place, let's expose the availability of ARCH_WORKAROUND_2 to guests. We take this opportunity to tidy up a couple of SMCCC constants.
Acked-by: Christoffer Dall christoffer.dall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/include/asm/kvm_host.h | 12 ++++++++++++ arch/arm64/include/asm/kvm_host.h | 23 +++++++++++++++++++++++ arch/arm64/kvm/reset.c | 4 ++++ virt/kvm/arm/psci.c | 18 ++++++++++++++++-- 4 files changed, 55 insertions(+), 2 deletions(-)
--- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -302,4 +302,16 @@ static inline bool kvm_arm_harden_branch return false; }
+#define KVM_SSBD_UNKNOWN -1 +#define KVM_SSBD_FORCE_DISABLE 0 +#define KVM_SSBD_KERNEL 1 +#define KVM_SSBD_FORCE_ENABLE 2 +#define KVM_SSBD_MITIGATED 3 + +static inline int kvm_arm_have_ssbd(void) +{ + /* No way to detect it yet, pretend it is not there. */ + return KVM_SSBD_UNKNOWN; +} + #endif /* __ARM_KVM_HOST_H__ */ --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -412,4 +412,27 @@ static inline bool kvm_arm_harden_branch return cpus_have_const_cap(ARM64_HARDEN_BRANCH_PREDICTOR); }
+#define KVM_SSBD_UNKNOWN -1 +#define KVM_SSBD_FORCE_DISABLE 0 +#define KVM_SSBD_KERNEL 1 +#define KVM_SSBD_FORCE_ENABLE 2 +#define KVM_SSBD_MITIGATED 3 + +static inline int kvm_arm_have_ssbd(void) +{ + switch (arm64_get_ssbd_state()) { + case ARM64_SSBD_FORCE_DISABLE: + return KVM_SSBD_FORCE_DISABLE; + case ARM64_SSBD_KERNEL: + return KVM_SSBD_KERNEL; + case ARM64_SSBD_FORCE_ENABLE: + return KVM_SSBD_FORCE_ENABLE; + case ARM64_SSBD_MITIGATED: + return KVM_SSBD_MITIGATED; + case ARM64_SSBD_UNKNOWN: + default: + return KVM_SSBD_UNKNOWN; + } +} + #endif /* __ARM64_KVM_HOST_H__ */ --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -122,6 +122,10 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu /* Reset PMU */ kvm_pmu_vcpu_reset(vcpu);
+ /* Default workaround setup is enabled (if supported) */ + if (kvm_arm_have_ssbd() == KVM_SSBD_KERNEL) + vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG; + /* Reset timer */ return kvm_timer_vcpu_reset(vcpu); } --- a/virt/kvm/arm/psci.c +++ b/virt/kvm/arm/psci.c @@ -405,7 +405,7 @@ static int kvm_psci_call(struct kvm_vcpu int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { u32 func_id = smccc_get_function(vcpu); - u32 val = PSCI_RET_NOT_SUPPORTED; + u32 val = SMCCC_RET_NOT_SUPPORTED; u32 feature;
switch (func_id) { @@ -417,7 +417,21 @@ int kvm_hvc_call_handler(struct kvm_vcpu switch(feature) { case ARM_SMCCC_ARCH_WORKAROUND_1: if (kvm_arm_harden_branch_predictor()) - val = 0; + val = SMCCC_RET_SUCCESS; + break; + case ARM_SMCCC_ARCH_WORKAROUND_2: + switch (kvm_arm_have_ssbd()) { + case KVM_SSBD_FORCE_DISABLE: + case KVM_SSBD_UNKNOWN: + break; + case KVM_SSBD_KERNEL: + val = SMCCC_RET_SUCCESS; + break; + case KVM_SSBD_FORCE_ENABLE: + case KVM_SSBD_MITIGATED: + val = SMCCC_RET_NOT_REQUIRED; + break; + } break; } break;
On 20 July 2018 at 17:43, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.14.57 release. There are 92 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jul 22 12:13:50 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.57-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary ------------------------------------------------------------------------
kernel: 4.14.57-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.14.y git commit: ec86d5e19e14ae1ee40cf8f41c4b960472bccf89 git describe: v4.14.56-93-gec86d5e19e14 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.56-93...
No regressions (compared to build v4.14.56-93-ga3636f4156e6)
Ran 16439 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_x86_64 - x15 - arm - x86_64
Test Suites ----------- * boot * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-containers-tests * ltp-cve-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * ltp-open-posix-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none
On 07/20/2018 05:13 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.57 release. There are 92 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jul 22 12:13:50 UTC 2018. Anything received after that time might be too late.
Build results: total: 148 pass: 148 fail: 0 Qemu test results: total: 174 pass: 174 fail: 0
Details are available at http://kerneltests.org/builders/.
Guenter
linux-stable-mirror@lists.linaro.org