This is the start of the stable review cycle for the 4.17.9 release. There are 101 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jul 22 12:13:52 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.17.9-rc1....
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.17.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.17.9-rc1
Daniel Borkmann daniel@iogearbox.net bpf: undo prog rejection on read-only lock failure
Daniel Borkmann daniel@iogearbox.net bpf, arm32: fix to use bpf_jit_binary_lock_ro api
Eric Dumazet edumazet@google.com bpf: enforce correct alignment for instructions
Marc Zyngier marc.zyngier@arm.com arm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID
Marc Zyngier marc.zyngier@arm.com arm64: KVM: Handle guest's ARCH_WORKAROUND_2 requests
Marc Zyngier marc.zyngier@arm.com arm64: KVM: Add ARCH_WORKAROUND_2 support for guests
Marc Zyngier marc.zyngier@arm.com arm64: KVM: Add HYP per-cpu accessors
Marc Zyngier marc.zyngier@arm.com arm64: ssbd: Add prctl interface for per-thread mitigation
Marc Zyngier marc.zyngier@arm.com arm64: ssbd: Introduce thread flag to control userspace mitigation
Marc Zyngier marc.zyngier@arm.com arm64: ssbd: Restore mitigation status on CPU resume
Marc Zyngier marc.zyngier@arm.com arm64: ssbd: Skip apply_ssbd if not using dynamic mitigation
Marc Zyngier marc.zyngier@arm.com arm64: ssbd: Add global mitigation state accessor
Marc Zyngier marc.zyngier@arm.com arm64: Add 'ssbd' command-line option
Marc Zyngier marc.zyngier@arm.com arm64: Add ARCH_WORKAROUND_2 probing
Marc Zyngier marc.zyngier@arm.com arm64: Add per-cpu infrastructure to call ARCH_WORKAROUND_2
Marc Zyngier marc.zyngier@arm.com arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1
Marc Zyngier marc.zyngier@arm.com arm/arm64: smccc: Add SMCCC-specific return codes
Cong Wang xiyou.wangcong@gmail.com ipvs: initialize tbl->entries in ip_vs_lblc_init_svc()
Cong Wang xiyou.wangcong@gmail.com ipvs: initialize tbl->entries after allocation
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.
Daniel Borkmann daniel@iogearbox.net bpf: don't leave partial mangled prog in jit_subprogs error path
John Fastabend john.fastabend@gmail.com bpf: sockmap, consume_skb in close path
John Fastabend john.fastabend@gmail.com bpf: sockmap, fix crash when ipv6 sock is added
Jens Axboe axboe@kernel.dk block: don't use blocking queue entered for recursive bio submits
Santosh Shilimkar santosh.shilimkar@oracle.com rds: avoid unenecessary cong_update in loop transport
Daniel Borkmann daniel@iogearbox.net bpf: reject any prog that failed read-only lock
Jan Kara jack@suse.cz bdi: Fix another oops in wb_workfn()
Florian Westphal fw@strlen.de netfilter: ipv6: nf_defrag: drop skb dst before queueing
Willem de Bruijn willemb@google.com nsh: set mac len based on inner packet
Tomas Bortoli tomasbortoli@gmail.com autofs: fix slab out of bounds read in getname_kernel()
Dave Watson davejwatson@fb.com tls: Stricter error checking in zerocopy sendmsg path
Eric Biggers ebiggers@google.com KEYS: DNS: fix parsing multiple options
Eric Biggers ebiggers@google.com reiserfs: fix buffer overflow with long warning messages
Florian Westphal fw@strlen.de netfilter: ebtables: reject non-bridge targets
Dexuan Cui decui@microsoft.com PCI: hv: Disable/enable IRQs rather than BH in hv_compose_msi_msg()
Stephan Mueller smueller@chronox.de crypto: af_alg - Initialize sg_num_bytes in error code path
Stefan Wahren stefan.wahren@i2se.com net: lan78xx: Fix race in tx pending skb size calculation
Ping-Ke Shih pkshih@realtek.com rtlwifi: rtl8821ae: fix firmware is not ready to run
Ping-Ke Shih pkshih@realtek.com rtlwifi: Fix kernel Oops "Fw download fail!!"
Gustavo A. R. Silva gustavo@embeddedor.com net: cxgb3_main: fix potential Spectre v1
Janakarajan Natarajan Janakarajan.Natarajan@amd.com x86/kvm/Kconfig: Ensure CRYPTO_DEV_CCP_DD state at minimum matches KVM_AMD
Jesper Dangaard Brouer brouer@redhat.com virtio_net: split XDP_TX kick and XDP_REDIRECT map flushing
Bert Kenward bkenward@solarflare.com sfc: correctly initialise filter rwsem for farch
Julian Wiedmann jwi@linux.ibm.com s390/qeth: fix race when setting MAC address
Vasily Gorbik gor@linux.ibm.com s390/qeth: avoid using is_multicast_ether_addr_64bits on (u8 *)[6]
Julian Wiedmann jwi@linux.ibm.com Revert "s390/qeth: use Read device to query hypervisor for MAC"
Or Gerlitz ogerlitz@mellanox.com IB/mlx5: Avoid dealing with vport representors if not being e-switch manager
Jesper Dangaard Brouer brouer@redhat.com i40e: split XDP_TX tail and XDP_REDIRECT map flushing
Govindarajulu Varadarajan gvaradar@cisco.com enic: do not overwrite error code
Ross Lagerwall ross.lagerwall@citrix.com xen-netfront: Update features after registering netdev
Ross Lagerwall ross.lagerwall@citrix.com xen-netfront: Fix mismatched rtnl_unlock
John Hurley john.hurley@netronome.com nfp: reject binding to shared blocks
Cong Wang xiyou.wangcong@gmail.com net: use dev_change_tx_queue_len() for SIOCSIFTXQLEN
Alexandre Belloni alexandre.belloni@bootlin.com net: macb: initialize bp->queues[0].bp for at91rm9200
Pieter Jansen van Vuuren pieter.jansenvanvuuren@netronome.com nfp: flower: fix mpls ether type detection
Wei Yongjun weiyongjun1@huawei.com hinic: reset irq affinity before freeing irq
Claudio Imbrenda imbrenda@linux.vnet.ibm.com VSOCK: fix loopback on big-endian systems
Jason Wang jasowang@redhat.com vhost_net: validate sock before trying to put its fd
Ilpo Järvinen ilpo.jarvinen@helsinki.fi tcp: prevent bogus FRTO undos with non-SACK flows
Yuchung Cheng ycheng@google.com tcp: fix Fast Open key endianness
Doron Roberts-Kedes doronrk@fb.com strparser: Remove early eaten to fix full tcp receive buffer stall
Bhadram Varka vbhadram@nvidia.com stmmac: fix DMA channel hang in half-duplex mode
Julian Wiedmann jwi@linux.ibm.com s390/qeth: don't clobber buffer on async TX completion
Jiri Slaby jslaby@suse.cz r8152: napi hangup fix after disconnect
Aleksander Morgado aleksander@aleksander.es qmi_wwan: add support for the Dell Wireless 5821e module
Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com qed: Limit msix vectors in kdump kernel to the minimum required count.
Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com qed: Fix use of incorrect size in memcpy call.
Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com qed: Fix setting of incorrect eswitch mode.
Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com qede: Adverstise software timestamp caps when PHC is not available.
David Ahern dsahern@gmail.com net/tcp: Fix socket lookups with SO_BINDTODEVICE
Eric Dumazet edumazet@google.com net: sungem: fix rx checksum support
Konstantin Khlebnikov khlebnikov@yandex-team.ru net_sched: blackhole: tell upper qdisc about dropped packets
Davide Caratti dcaratti@redhat.com net/sched: act_ife: preserve the action control in case of error
Davide Caratti dcaratti@redhat.com net/sched: act_ife: fix recursive lock and idr leak
Eric Dumazet edumazet@google.com net/packet: fix use-after-free
Antoine Tenart antoine.tenart@bootlin.com net: mvneta: fix the Rx desc DMA address in the Rx path
Shay Agroskin shayag@mellanox.com net/mlx5: Fix wrong size allocation for QoS ETC TC regitster
Eli Cohen eli@mellanox.com net/mlx5: Fix required capability for manipulating MPFS
Alex Vesker valex@mellanox.com net/mlx5: Fix incorrect raw command length parsing
Alex Vesker valex@mellanox.com net/mlx5: Fix command interface race in polling mode
Or Gerlitz ogerlitz@mellanox.com net/mlx5: E-Switch, Avoid setup attempt if not being e-switch manager
Or Gerlitz ogerlitz@mellanox.com net/mlx5e: Don't attempt to dereference the ppriv struct if not being eswitch manager
Or Gerlitz ogerlitz@mellanox.com net/mlx5e: Avoid dealing with vport representors if not being e-switch manager
Harini Katakam harini.katakam@xilinx.com net: macb: Fix ptp time adjustment for large negative delta
Sabrina Dubroca sd@queasysnail.net net: fix use-after-free in GRO with ESP
Eric Dumazet edumazet@google.com net: dccp: switch rx_tstamp_last_feedback to monotonic clock
Eric Dumazet edumazet@google.com net: dccp: avoid crash in ccid3_hc_rx_send_feedback()
Jesper Dangaard Brouer brouer@redhat.com ixgbe: split XDP_TX tail and XDP_REDIRECT map flushing
Xin Long lucien.xin@gmail.com ipvlan: fix IFLA_MTU ignored on NEWLINK
Eric Biggers ebiggers@google.com ipv6: sr: fix passing wrong flags to crypto_alloc_shash()
Stephen Hemminger sthemmin@microsoft.com hv_netvsc: split sub-channel setup into async and sync
Gustavo A. R. Silva gustavo@embeddedor.com atm: zatm: Fix potential Spectre v1
David Woodhouse dwmw2@infradead.org atm: Preserve value of skb->truesize when accounting to vcc
Sabrina Dubroca sd@queasysnail.net alx: take rtnl before calling __alx_open from resume
Sean Wang sean.wang@mediatek.com pinctrl: mt7622: fix a kernel panic when gpio-hog is being applied
Sean Wang sean.wang@mediatek.com pinctrl: mt7622: stop using the deprecated pinctrl_add_gpio_range
Sean Wang sean.wang@mediatek.com pinctrl: mt7622: fix error path on failing at groups building
Niklas Söderlund niklas.soderlund+renesas@ragnatech.se pinctrl: sh-pfc: r8a77970: remove SH_PFC_PIN_CFG_DRIVE_STRENGTH flag
Nick Desaulniers ndesaulniers@google.com x86/paravirt: Make native_save_fl() extern inline
H. Peter Anvin hpa@linux.intel.com x86/asm: Add _ASM_ARG* constants for argument registers to <asm/asm.h>
Nick Desaulniers ndesaulniers@google.com compiler-gcc.h: Add __attribute__((gnu_inline)) to all inline declarations
-------------
Diffstat:
 Documentation/admin-guide/kernel-parameters.txt    | 17 ++
 Makefile                                           | 4 +-
 arch/arm/include/asm/kvm_host.h                    | 12 ++
 arch/arm/include/asm/kvm_mmu.h                     | 5 +
 arch/arm/net/bpf_jit_32.c                          | 2 +-
 arch/arm64/Kconfig                                 | 9 ++
 arch/arm64/include/asm/cpucaps.h                   | 3 +-
 arch/arm64/include/asm/cpufeature.h                | 22 +++
 arch/arm64/include/asm/kvm_asm.h                   | 30 +++-
 arch/arm64/include/asm/kvm_host.h                  | 26 +++
 arch/arm64/include/asm/kvm_mmu.h                   | 24 +++
 arch/arm64/include/asm/thread_info.h               | 1 +
 arch/arm64/kernel/Makefile                         | 1 +
 arch/arm64/kernel/asm-offsets.c                    | 1 +
 arch/arm64/kernel/cpu_errata.c                     | 180 +++++++++++++++++++++
 arch/arm64/kernel/entry.S                          | 30 ++++
 arch/arm64/kernel/hibernate.c                      | 11 ++
 arch/arm64/kernel/ssbd.c                           | 110 +++++++++++++
 arch/arm64/kernel/suspend.c                        | 8 +
 arch/arm64/kvm/hyp/hyp-entry.S                     | 38 ++++-
 arch/arm64/kvm/hyp/switch.c                        | 42 +++++
 arch/arm64/kvm/reset.c                             | 4 +
 arch/x86/include/asm/asm.h                         | 59 +++++++
 arch/x86/include/asm/irqflags.h                    | 2 +-
 arch/x86/kernel/Makefile                           | 1 +
 arch/x86/kernel/irqflags.S                         | 26 +++
 arch/x86/kvm/Kconfig                               | 2 +-
 block/blk-core.c                                   | 4 +-
 block/blk-merge.c                                  | 10 ++
 crypto/af_alg.c                                    | 4 +-
 drivers/atm/zatm.c                                 | 2 +
 drivers/infiniband/hw/mlx5/main.c                  | 2 +-
 drivers/net/ethernet/atheros/alx/main.c            | 8 +-
 drivers/net/ethernet/cadence/macb_main.c           | 2 +
 drivers/net/ethernet/cadence/macb_ptp.c            | 5 +-
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c    | 2 +
 drivers/net/ethernet/cisco/enic/enic_main.c        | 9 +-
 drivers/net/ethernet/huawei/hinic/hinic_rx.c       | 1 +
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        | 24 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      | 24 +--
 drivers/net/ethernet/marvell/mvneta.c              | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c      | 8 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 8 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  | 2 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 4 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 3 +-
 drivers/net/ethernet/mellanox/mlx5/core/fw.c       | 5 +-
 drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c | 9 +-
 drivers/net/ethernet/mellanox/mlx5/core/port.c     | 4 +-
 drivers/net/ethernet/mellanox/mlx5/core/sriov.c    | 7 +-
 drivers/net/ethernet/netronome/nfp/bpf/main.c      | 3 +
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 14 ++
 .../net/ethernet/netronome/nfp/flower/offload.c    | 11 ++
 drivers/net/ethernet/qlogic/qed/qed_dcbx.c         | 8 +-
 drivers/net/ethernet/qlogic/qed/qed_dev.c          | 2 +-
 drivers/net/ethernet/qlogic/qed/qed_main.c         | 8 +
 drivers/net/ethernet/qlogic/qed/qed_sriov.c        | 19 ++-
 drivers/net/ethernet/qlogic/qede/qede_ptp.c        | 10 +-
 drivers/net/ethernet/sfc/farch.c                   | 1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 10 ++
 drivers/net/ethernet/sun/sungem.c                  | 22 +--
 drivers/net/geneve.c                               | 2 +-
 drivers/net/hyperv/hyperv_net.h                    | 2 +-
 drivers/net/hyperv/netvsc.c                        | 37 ++++-
 drivers/net/hyperv/netvsc_drv.c                    | 17 +-
 drivers/net/hyperv/rndis_filter.c                  | 61 ++-----
 drivers/net/ipvlan/ipvlan_main.c                   | 3 +-
 drivers/net/usb/lan78xx.c                          | 5 +-
 drivers/net/usb/qmi_wwan.c                         | 1 +
 drivers/net/usb/r8152.c                            | 3 +-
 drivers/net/virtio_net.c                           | 30 ++--
 drivers/net/vxlan.c                                | 4 +-
 drivers/net/wireless/realtek/rtlwifi/base.c        | 17 +-
 drivers/net/wireless/realtek/rtlwifi/base.h        | 2 +-
 drivers/net/wireless/realtek/rtlwifi/core.c        | 3 +-
 drivers/net/wireless/realtek/rtlwifi/pci.c         | 2 +-
 drivers/net/wireless/realtek/rtlwifi/ps.c          | 4 +-
 drivers/net/wireless/realtek/rtlwifi/usb.c         | 2 +-
 drivers/net/xen-netfront.c                         | 11 +-
 drivers/pci/host/pci-hyperv.c                      | 8 +-
 drivers/pinctrl/mediatek/pinctrl-mt7622.c          | 25 ++-
 drivers/pinctrl/sh-pfc/pfc-r8a77970.c              | 12 +-
 drivers/s390/net/qeth_core.h                       | 11 ++
 drivers/s390/net/qeth_core_main.c                  | 24 ++-
 drivers/s390/net/qeth_l2_main.c                    | 19 ++-
 drivers/vhost/net.c                                | 3 +-
 fs/autofs4/dev-ioctl.c                             | 22 +--
 fs/reiserfs/prints.c                               | 141 +++++++++-------
 include/linux/arm-smccc.h                          | 10 ++
 include/linux/atmdev.h                             | 15 ++
 include/linux/backing-dev-defs.h                   | 2 +-
 include/linux/blk_types.h                          | 2 +
 include/linux/compiler-gcc.h                       | 29 +++-
 include/linux/filter.h                             | 42 ++---
 include/linux/mlx5/eswitch.h                       | 2 +
 include/linux/mlx5/mlx5_ifc.h                      | 2 +-
 include/linux/netdevice.h                          | 20 +++
 include/net/pkt_cls.h                              | 5 +
 kernel/bpf/core.c                                  | 25 ++-
 kernel/bpf/sockmap.c                               | 63 ++++++--
 kernel/bpf/syscall.c                               | 4 +-
 kernel/bpf/verifier.c                              | 11 +-
 mm/backing-dev.c                                   | 20 +--
 net/8021q/vlan.c                                   | 2 +-
 net/atm/br2684.c                                   | 3 +-
 net/atm/clip.c                                     | 3 +-
 net/atm/common.c                                   | 3 +-
 net/atm/lec.c                                      | 3 +-
 net/atm/mpc.c                                      | 3 +-
 net/atm/pppoatm.c                                  | 3 +-
 net/atm/raw.c                                      | 4 +-
 net/bridge/netfilter/ebtables.c                    | 13 ++
 net/core/dev_ioctl.c                               | 11 +-
 net/dccp/ccids/ccid3.c                             | 16 +-
 net/dns_resolver/dns_key.c                         | 28 ++--
 net/ipv4/fou.c                                     | 4 +-
 net/ipv4/gre_offload.c                             | 2 +-
 net/ipv4/inet_hashtables.c                         | 4 +-
 net/ipv4/sysctl_net_ipv4.c                         | 18 ++-
 net/ipv4/tcp_input.c                               | 9 ++
 net/ipv4/udp_offload.c                             | 2 +-
 net/ipv6/inet6_hashtables.c                        | 4 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c            | 2 +
 net/ipv6/seg6_hmac.c                               | 2 +-
 net/netfilter/ipvs/ip_vs_lblc.c                    | 1 +
 net/netfilter/ipvs/ip_vs_lblcr.c                   | 1 +
 net/nfc/llcp_commands.c                            | 9 +-
 net/nsh/nsh.c                                      | 2 +-
 net/packet/af_packet.c                             | 16 +-
 net/rds/loop.c                                     | 1 +
 net/rds/rds.h                                      | 5 +
 net/rds/recv.c                                     | 5 +
 net/sched/act_ife.c                                | 12 +-
 net/sched/sch_blackhole.c                          | 2 +-
 net/strparser/strparser.c                          | 17 +-
 net/tls/tls_sw.c                                   | 2 +-
 net/vmw_vsock/virtio_transport.c                   | 2 +-
 virt/kvm/arm/arm.c                                 | 4 +
 virt/kvm/arm/psci.c                                | 18 ++-
 140 files changed, 1448 insertions(+), 450 deletions(-)
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nick Desaulniers ndesaulniers@google.com
commit d03db2bc26f0e4a6849ad649a09c9c73fccdc656 upstream.
Functions marked extern inline do not emit an externally visible function when the gnu89 C standard is used. Some KBUILD Makefiles overwrite KBUILD_CFLAGS. This is an issue for GCC 5.1+ users as without an explicit C standard specified, the default is gnu11. Since c99, the semantics of extern inline have changed such that an externally visible function is always emitted. This can lead to multiple definition errors of extern inline functions at link time of compilation units whose build files have removed an explicit C standard compiler flag for users of GCC 5.1+ or Clang.
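As a purely illustrative aside (not part of the patch; the function name is made up, and always_inline is added only so this single-file sketch links even at -O0), the opt-in can be seen in a small userspace program:

/* demo.c - build with: gcc -std=gnu11 demo.c
 * Under C99/gnu11 semantics a plain "extern inline" definition would also be
 * emitted as an externally visible symbol, which can collide at link time
 * with an out-of-line copy in another object file.  gnu_inline restores the
 * gnu89 meaning: the body below is used only for inlining and no external
 * definition is emitted here.
 */
#include <stdio.h>

extern inline __attribute__((gnu_inline, always_inline))
int demo_add_one(int x)
{
	return x + 1;
}

int main(void)
{
	printf("%d\n", demo_add_one(41));	/* prints 42 */
	return 0;
}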
Suggested-by: Arnd Bergmann arnd@arndb.de Suggested-by: H. Peter Anvin hpa@zytor.com Suggested-by: Joe Perches joe@perches.com Signed-off-by: Nick Desaulniers ndesaulniers@google.com Acked-by: Juergen Gross jgross@suse.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: acme@redhat.com Cc: akataria@vmware.com Cc: akpm@linux-foundation.org Cc: andrea.parri@amarulasolutions.com Cc: ard.biesheuvel@linaro.org Cc: aryabinin@virtuozzo.com Cc: astrachan@google.com Cc: boris.ostrovsky@oracle.com Cc: brijesh.singh@amd.com Cc: caoj.fnst@cn.fujitsu.com Cc: geert@linux-m68k.org Cc: ghackmann@google.com Cc: gregkh@linuxfoundation.org Cc: jan.kiszka@siemens.com Cc: jarkko.sakkinen@linux.intel.com Cc: jpoimboe@redhat.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: kstewart@linuxfoundation.org Cc: linux-efi@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Cc: manojgupta@google.com Cc: mawilcox@microsoft.com Cc: michal.lkml@markovi.net Cc: mjg59@google.com Cc: mka@chromium.org Cc: pombredanne@nexb.com Cc: rientjes@google.com Cc: rostedt@goodmis.org Cc: sedat.dilek@gmail.com Cc: thomas.lendacky@amd.com Cc: tstellar@redhat.com Cc: tweek@google.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: yamada.masahiro@socionext.com Link: http://lkml.kernel.org/r/20180621162324.36656-2-ndesaulniers@google.com Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/compiler-gcc.h | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-)
--- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -66,25 +66,40 @@ #endif
/* + * Feature detection for gnu_inline (gnu89 extern inline semantics). Either + * __GNUC_STDC_INLINE__ is defined (not using gnu89 extern inline semantics, + * and we opt in to the gnu89 semantics), or __GNUC_STDC_INLINE__ is not + * defined so the gnu89 semantics are the default. + */ +#ifdef __GNUC_STDC_INLINE__ +# define __gnu_inline __attribute__((gnu_inline)) +#else +# define __gnu_inline +#endif + +/* * Force always-inline if the user requests it so via the .config, * or if gcc is too old. * GCC does not warn about unused static inline functions for * -Wunused-function. This turns out to avoid the need for complex #ifdef * directives. Suppress the warning in clang as well by using "unused" * function attribute, which is redundant but not harmful for gcc. + * Prefer gnu_inline, so that extern inline functions do not emit an + * externally visible function. This makes extern inline behave as per gnu89 + * semantics rather than c99. This prevents multiple symbol definition errors + * of extern inline functions at link time. + * A lot of inline functions can cause havoc with function tracing. */ #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) || \ !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4) -#define inline inline __attribute__((always_inline,unused)) notrace -#define __inline__ __inline__ __attribute__((always_inline,unused)) notrace -#define __inline __inline __attribute__((always_inline,unused)) notrace +#define inline \ + inline __attribute__((always_inline, unused)) notrace __gnu_inline #else -/* A lot of inline functions can cause havoc with function tracing */ -#define inline inline __attribute__((unused)) notrace -#define __inline__ __inline__ __attribute__((unused)) notrace -#define __inline __inline __attribute__((unused)) notrace +#define inline inline __attribute__((unused)) notrace __gnu_inline #endif
+#define __inline__ inline +#define __inline inline #define __always_inline inline __attribute__((always_inline)) #define noinline __attribute__((noinline))
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: H. Peter Anvin hpa@linux.intel.com
commit 0e2e160033283e20f688d8bad5b89460cc5bfcc4 upstream.
i386 and x86-64 use different registers for arguments; make them available so we don't have to #ifdef in the actual code.
Native size and specified size (q, l, w, b) versions are provided.
Signed-off-by: H. Peter Anvin hpa@linux.intel.com Signed-off-by: Nick Desaulniers ndesaulniers@google.com Reviewed-by: Sedat Dilek sedat.dilek@gmail.com Acked-by: Juergen Gross jgross@suse.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: acme@redhat.com Cc: akataria@vmware.com Cc: akpm@linux-foundation.org Cc: andrea.parri@amarulasolutions.com Cc: ard.biesheuvel@linaro.org Cc: arnd@arndb.de Cc: aryabinin@virtuozzo.com Cc: astrachan@google.com Cc: boris.ostrovsky@oracle.com Cc: brijesh.singh@amd.com Cc: caoj.fnst@cn.fujitsu.com Cc: geert@linux-m68k.org Cc: ghackmann@google.com Cc: gregkh@linuxfoundation.org Cc: jan.kiszka@siemens.com Cc: jarkko.sakkinen@linux.intel.com Cc: joe@perches.com Cc: jpoimboe@redhat.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: kstewart@linuxfoundation.org Cc: linux-efi@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Cc: manojgupta@google.com Cc: mawilcox@microsoft.com Cc: michal.lkml@markovi.net Cc: mjg59@google.com Cc: mka@chromium.org Cc: pombredanne@nexb.com Cc: rientjes@google.com Cc: rostedt@goodmis.org Cc: thomas.lendacky@amd.com Cc: tstellar@redhat.com Cc: tweek@google.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: yamada.masahiro@socionext.com Link: http://lkml.kernel.org/r/20180621162324.36656-3-ndesaulniers@google.com Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/asm.h | 59 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+)
--- a/arch/x86/include/asm/asm.h +++ b/arch/x86/include/asm/asm.h @@ -46,6 +46,65 @@ #define _ASM_SI __ASM_REG(si) #define _ASM_DI __ASM_REG(di)
+#ifndef __x86_64__ +/* 32 bit */ + +#define _ASM_ARG1 _ASM_AX +#define _ASM_ARG2 _ASM_DX +#define _ASM_ARG3 _ASM_CX + +#define _ASM_ARG1L eax +#define _ASM_ARG2L edx +#define _ASM_ARG3L ecx + +#define _ASM_ARG1W ax +#define _ASM_ARG2W dx +#define _ASM_ARG3W cx + +#define _ASM_ARG1B al +#define _ASM_ARG2B dl +#define _ASM_ARG3B cl + +#else +/* 64 bit */ + +#define _ASM_ARG1 _ASM_DI +#define _ASM_ARG2 _ASM_SI +#define _ASM_ARG3 _ASM_DX +#define _ASM_ARG4 _ASM_CX +#define _ASM_ARG5 r8 +#define _ASM_ARG6 r9 + +#define _ASM_ARG1Q rdi +#define _ASM_ARG2Q rsi +#define _ASM_ARG3Q rdx +#define _ASM_ARG4Q rcx +#define _ASM_ARG5Q r8 +#define _ASM_ARG6Q r9 + +#define _ASM_ARG1L edi +#define _ASM_ARG2L esi +#define _ASM_ARG3L edx +#define _ASM_ARG4L ecx +#define _ASM_ARG5L r8d +#define _ASM_ARG6L r9d + +#define _ASM_ARG1W di +#define _ASM_ARG2W si +#define _ASM_ARG3W dx +#define _ASM_ARG4W cx +#define _ASM_ARG5W r8w +#define _ASM_ARG6W r9w + +#define _ASM_ARG1B dil +#define _ASM_ARG2B sil +#define _ASM_ARG3B dl +#define _ASM_ARG4B cl +#define _ASM_ARG5B r8b +#define _ASM_ARG6B r9b + +#endif + /* * Macros to generate condition code outputs from inline assembly, * The output operand must be type "bool".
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nick Desaulniers ndesaulniers@google.com
commit d0a8d9378d16eb3c69bd8e6d23779fbdbee3a8c7 upstream.
native_save_fl() is marked static inline, but by using it as a function pointer in arch/x86/kernel/paravirt.c, it MUST be outlined.
paravirt's use of native_save_fl() also requires that no GPRs other than %rax are clobbered.
Compilers use different heuristics to decide when to emit stack guard code, and that code can break paravirt's callee-saved assumption by clobbering %rcx.
Marking a function definition extern inline means that if this version cannot be inlined, then the out-of-line version will be preferred. By having the out-of-line version be implemented in assembly, it cannot be instrumented with a stack protector, which might violate custom calling conventions that code like paravirt rely on.
The semantics of extern inline have changed since gnu89. This means that folks using GCC versions >= 5.1 may see symbol redefinition errors at link time for subdirs that override KBUILD_CFLAGS (making the C standard used implicit) regardless of this patch. This has been cleaned up earlier in the patch set, but is left as a note in the commit message for future travelers.
Reports: https://lkml.org/lkml/2018/5/7/534 https://github.com/ClangBuiltLinux/linux/issues/16
Discussion: https://bugs.llvm.org/show_bug.cgi?id=37512 https://lkml.org/lkml/2018/5/24/1371
Thanks to the many folks that participated in the discussion.
Debugged-by: Alistair Strachan astrachan@google.com Debugged-by: Matthias Kaehlcke mka@chromium.org Suggested-by: Arnd Bergmann arnd@arndb.de Suggested-by: H. Peter Anvin hpa@zytor.com Suggested-by: Tom Stellar tstellar@redhat.com Reported-by: Sedat Dilek sedat.dilek@gmail.com Tested-by: Sedat Dilek sedat.dilek@gmail.com Signed-off-by: Nick Desaulniers ndesaulniers@google.com Acked-by: Juergen Gross jgross@suse.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: acme@redhat.com Cc: akataria@vmware.com Cc: akpm@linux-foundation.org Cc: andrea.parri@amarulasolutions.com Cc: ard.biesheuvel@linaro.org Cc: aryabinin@virtuozzo.com Cc: astrachan@google.com Cc: boris.ostrovsky@oracle.com Cc: brijesh.singh@amd.com Cc: caoj.fnst@cn.fujitsu.com Cc: geert@linux-m68k.org Cc: ghackmann@google.com Cc: gregkh@linuxfoundation.org Cc: jan.kiszka@siemens.com Cc: jarkko.sakkinen@linux.intel.com Cc: joe@perches.com Cc: jpoimboe@redhat.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: kstewart@linuxfoundation.org Cc: linux-efi@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Cc: manojgupta@google.com Cc: mawilcox@microsoft.com Cc: michal.lkml@markovi.net Cc: mjg59@google.com Cc: mka@chromium.org Cc: pombredanne@nexb.com Cc: rientjes@google.com Cc: rostedt@goodmis.org Cc: thomas.lendacky@amd.com Cc: tweek@google.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: yamada.masahiro@socionext.com Link: http://lkml.kernel.org/r/20180621162324.36656-4-ndesaulniers@google.com Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/irqflags.h | 2 +- arch/x86/kernel/Makefile | 1 + arch/x86/kernel/irqflags.S | 26 ++++++++++++++++++++++++++ 3 files changed, 28 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -13,7 +13,7 @@ * Interrupt control: */
-static inline unsigned long native_save_fl(void) +extern inline unsigned long native_save_fl(void) { unsigned long flags;
--- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -61,6 +61,7 @@ obj-y += alternative.o i8253.o hw_brea obj-y += tsc.o tsc_msr.o io_delay.o rtc.o obj-y += pci-iommu_table.o obj-y += resource.o +obj-y += irqflags.o
obj-y += process.o obj-y += fpu/ --- /dev/null +++ b/arch/x86/kernel/irqflags.S @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#include <asm/asm.h> +#include <asm/export.h> +#include <linux/linkage.h> + +/* + * unsigned long native_save_fl(void) + */ +ENTRY(native_save_fl) + pushf + pop %_ASM_AX + ret +ENDPROC(native_save_fl) +EXPORT_SYMBOL(native_save_fl) + +/* + * void native_restore_fl(unsigned long flags) + * %eax/%rdi: flags + */ +ENTRY(native_restore_fl) + push %_ASM_ARG1 + popf + ret +ENDPROC(native_restore_fl) +EXPORT_SYMBOL(native_restore_fl)
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Wang sean.wang@mediatek.com
commit fafa35cce34ba4c4f6fd7f1026c038de0a2884af upstream.
Return an error code instead of success when group building fails.
Cc: stable@vger.kernel.org Fixes: d6ed93551320 ("pinctrl: mediatek: add pinctrl driver for MT7622 SoC") Signed-off-by: Sean Wang sean.wang@mediatek.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pinctrl/mediatek/pinctrl-mt7622.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pinctrl/mediatek/pinctrl-mt7622.c +++ b/drivers/pinctrl/mediatek/pinctrl-mt7622.c @@ -1561,7 +1561,7 @@ static int mtk_pinctrl_probe(struct plat err = mtk_build_groups(hw); if (err) { dev_err(&pdev->dev, "Failed to build groups\n"); - return 0; + return err; }
/* Setup functions descriptions per SoC types */
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Wang sean.wang@mediatek.com
commit de227ed7965d06dcfcd06376e03c107004a4881c upstream.
If the pinctrl node has the gpio-ranges property, the range will be added by the gpio core and doesn't need to be added by the pinctrl driver.
But to keep backward compatibility with old dts files whose pinctrl node is missing the gpio-ranges property, an explicit pinctrl_add_gpio_range call is still needed in that case.
Cc: stable@vger.kernel.org Fixes: d6ed93551320 ("pinctrl: mediatek: add pinctrl driver for MT7622 SoC") Signed-off-by: Sean Wang sean.wang@mediatek.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pinctrl/mediatek/pinctrl-mt7622.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)
--- a/drivers/pinctrl/mediatek/pinctrl-mt7622.c +++ b/drivers/pinctrl/mediatek/pinctrl-mt7622.c @@ -1463,11 +1463,20 @@ static int mtk_build_gpiochip(struct mtk if (ret < 0) return ret;
- ret = gpiochip_add_pin_range(chip, dev_name(hw->dev), 0, 0, - chip->ngpio); - if (ret < 0) { - gpiochip_remove(chip); - return ret; + /* Just for backward compatible for these old pinctrl nodes without + * "gpio-ranges" property. Otherwise, called directly from a + * DeviceTree-supported pinctrl driver is DEPRECATED. + * Please see Section 2.1 of + * Documentation/devicetree/bindings/gpio/gpio.txt on how to + * bind pinctrl and gpio drivers via the "gpio-ranges" property. + */ + if (!of_find_property(np, "gpio-ranges", NULL)) { + ret = gpiochip_add_pin_range(chip, dev_name(hw->dev), 0, 0, + chip->ngpio); + if (ret < 0) { + gpiochip_remove(chip); + return ret; + } }
return 0;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Wang sean.wang@mediatek.com
commit 5b1c4bf2519efc2328d252fd7697bdfb306f10f3 upstream.
When we are explicitly using GPIO hogging mechanism in the pinctrl node, such as:
&pio {
	line_input {
		gpio-hog;
		gpios = <95 0>, <96 0>, <97 0>;
		input;
	};
};
A kernel panic happens on a NULL pointer dereference: in this case, the drvdata is not yet set up properly when it is accessed.
A better fix is to obtain the private data from struct gpio_chip using the specific gpiochip_get_data() helper instead of the generic dev_get_drvdata().
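For reference, a minimal sketch of that pairing (names here are hypothetical, not the driver's; only gpiochip_get_data() and devm_gpiochip_add_data() are the real GPIO core APIs): the pointer handed to the GPIO core at registration time is exactly what gpiochip_get_data() returns in the callbacks, independent of when platform drvdata is set.

#include <linux/device.h>
#include <linux/errno.h>
#include <linux/gpio/driver.h>

struct sketch_pinctrl {
	struct gpio_chip chip;
	/* register base, pin tables, ... */
};

static int sketch_gpio_get(struct gpio_chip *chip, unsigned int gpio)
{
	/* Works even while gpio-hogs are applied during registration,
	 * i.e. before platform_set_drvdata() has run. */
	struct sketch_pinctrl *hw = gpiochip_get_data(chip);

	return hw ? 0 : -ENODEV;	/* real code would read the DI register via hw */
}

static int sketch_register(struct device *dev, struct sketch_pinctrl *hw)
{
	hw->chip.get = sketch_gpio_get;
	/* The third argument is the value gpiochip_get_data() hands back. */
	return devm_gpiochip_add_data(dev, &hw->chip, hw);
}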
[ 0.249424] Unable to handle kernel NULL pointer dereference at virtual address 000000c8
[ 0.257818] Mem abort info:
[ 0.260704] ESR = 0x96000005
[ 0.263869] Exception class = DABT (current EL), IL = 32 bits
[ 0.270011] SET = 0, FnV = 0
[ 0.273167] EA = 0, S1PTW = 0
[ 0.276421] Data abort info:
[ 0.279398] ISV = 0, ISS = 0x00000005
[ 0.283372] CM = 0, WnR = 0
[ 0.286440] [00000000000000c8] user address but active_mm is swapper
[ 0.293027] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 0.298795] Modules linked in:
[ 0.301958] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1+ #389
[ 0.308716] Hardware name: MediaTek MT7622 RFB1 board (DT)
[ 0.314396] pstate: 80000005 (Nzcv daif -PAN -UAO)
[ 0.319362] pc : mtk_hw_pin_field_get+0x28/0x118
[ 0.324140] lr : mtk_hw_set_value+0x30/0x104
[ 0.328557] sp : ffffff800801b6d0
[ 0.331983] x29: ffffff800801b6d0 x28: ffffff80086b7970
[ 0.337484] x27: 0000000000000000 x26: ffffff80087b8000
[ 0.342986] x25: 0000000000000000 x24: ffffffc00324c230
[ 0.348487] x23: 0000000000000003 x22: 0000000000000000
[ 0.353988] x21: ffffff80087b8000 x20: 0000000000000000
[ 0.359489] x19: 0000000000000054 x18: 00000000fffff7c0
[ 0.364990] x17: 0000000000006300 x16: 000000000000003f
[ 0.370492] x15: 000000000000000e x14: ffffffffffffffff
[ 0.375993] x13: 0000000000000000 x12: 0000000000000020
[ 0.381494] x11: 0000000000000006 x10: 0101010101010101
[ 0.386995] x9 : fffffffffffffffa x8 : 0000000000000007
[ 0.392496] x7 : ffffff80085d63f8 x6 : 0000000000000003
[ 0.397997] x5 : 0000000000000054 x4 : ffffffc0031eb800
[ 0.403499] x3 : ffffff800801b728 x2 : 0000000000000003
[ 0.409000] x1 : 0000000000000054 x0 : 0000000000000000
[ 0.414502] Process swapper/0 (pid: 1, stack limit = 0x000000002a913c1c)
[ 0.421441] Call trace:
[ 0.423968] mtk_hw_pin_field_get+0x28/0x118
[ 0.428387] mtk_hw_set_value+0x30/0x104
[ 0.432445] mtk_gpio_set+0x20/0x28
[ 0.436052] mtk_gpio_direction_output+0x18/0x30
[ 0.440833] gpiod_direction_output_raw_commit+0x7c/0xa0
[ 0.446333] gpiod_direction_output+0x104/0x114
[ 0.451022] gpiod_configure_flags+0xbc/0xfc
[ 0.455441] gpiod_hog+0x8c/0x140
[ 0.458869] of_gpiochip_add+0x27c/0x2d4
[ 0.462928] gpiochip_add_data_with_key+0x338/0x5f0
[ 0.467976] mtk_pinctrl_probe+0x388/0x400
[ 0.472217] platform_drv_probe+0x58/0xa4
[ 0.476365] driver_probe_device+0x204/0x44c
[ 0.480783] __device_attach_driver+0xac/0x108
[ 0.485384] bus_for_each_drv+0x7c/0xac
[ 0.489352] __device_attach+0xa0/0x144
[ 0.493320] device_initial_probe+0x10/0x18
[ 0.497647] bus_probe_device+0x2c/0x8c
[ 0.501616] device_add+0x2f8/0x540
[ 0.505226] of_device_add+0x3c/0x44
[ 0.508925] of_platform_device_create_pdata+0x80/0xb8
[ 0.514245] of_platform_bus_create+0x290/0x3e8
[ 0.518933] of_platform_populate+0x78/0x100
[ 0.523352] of_platform_default_populate+0x24/0x2c
[ 0.528403] of_platform_default_populate_init+0x94/0xa4
[ 0.533903] do_one_initcall+0x98/0x130
[ 0.537874] kernel_init_freeable+0x13c/0x1d4
[ 0.542385] kernel_init+0x10/0xf8
[ 0.545903] ret_from_fork+0x10/0x18
[ 0.549603] Code: 900020a1 f9400800 911dcc21 1400001f (f9406401)
[ 0.555916] ---[ end trace de8c34787fdad3b3 ]---
[ 0.560722] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 0.560722]
[ 0.570188] SMP: stopping secondary CPUs
[ 0.574253] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 0.574253]
Cc: stable@vger.kernel.org Fixes: d6ed93551320 ("pinctrl: mediatek: add pinctrl driver for MT7622 SoC") Signed-off-by: Sean Wang sean.wang@mediatek.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pinctrl/mediatek/pinctrl-mt7622.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/pinctrl/mediatek/pinctrl-mt7622.c +++ b/drivers/pinctrl/mediatek/pinctrl-mt7622.c @@ -1411,7 +1411,7 @@ static struct pinctrl_desc mtk_desc = {
static int mtk_gpio_get(struct gpio_chip *chip, unsigned int gpio) { - struct mtk_pinctrl *hw = dev_get_drvdata(chip->parent); + struct mtk_pinctrl *hw = gpiochip_get_data(chip); int value, err;
err = mtk_hw_get_value(hw, gpio, PINCTRL_PIN_REG_DI, &value); @@ -1423,7 +1423,7 @@ static int mtk_gpio_get(struct gpio_chip
static void mtk_gpio_set(struct gpio_chip *chip, unsigned int gpio, int value) { - struct mtk_pinctrl *hw = dev_get_drvdata(chip->parent); + struct mtk_pinctrl *hw = gpiochip_get_data(chip);
mtk_hw_set_value(hw, gpio, PINCTRL_PIN_REG_DO, !!value); }
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit bc800e8b39bad60ccdb83be828da63af71ab87b3 ]
The __alx_open function can be called from ndo_open, which is called under RTNL, or from alx_resume, which isn't. Since commit d768319cd427, we're calling the netif_set_real_num_{tx,rx}_queues functions, which need to be called under RTNL.
This is similar to commit 0c2cc02e571a ("igb: Move the calls to set the Tx and Rx queues into igb_open").
Fixes: d768319cd427 ("alx: enable multiple tx queues") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/atheros/alx/main.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/atheros/alx/main.c +++ b/drivers/net/ethernet/atheros/alx/main.c @@ -1897,13 +1897,19 @@ static int alx_resume(struct device *dev struct pci_dev *pdev = to_pci_dev(dev); struct alx_priv *alx = pci_get_drvdata(pdev); struct alx_hw *hw = &alx->hw; + int err;
alx_reset_phy(hw);
if (!netif_running(alx->dev)) return 0; netif_device_attach(alx->dev); - return __alx_open(alx, true); + + rtnl_lock(); + err = __alx_open(alx, true); + rtnl_unlock(); + + return err; }
static SIMPLE_DEV_PM_OPS(alx_pm_ops, alx_suspend, alx_resume);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Gustavo A. R. Silva" gustavo@embeddedor.com
[ Upstream commit ced9e191501e52b95e1b57b8e0db00943869eed0 ]
pool can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/atm/zatm.c:1491 zatm_ioctl() warn: potential spectre issue 'zatm_dev->pool_info' (local cap)
Fix this by sanitizing pool before using it to index zatm_dev->pool_info
Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
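For reference, a hedged sketch of the general pattern (kernel context; the lookup function and table are made up, only array_index_nospec() is the real helper):

#include <linux/errno.h>
#include <linux/nospec.h>

/* Clamp a user-controlled index right after the bounds check so that a
 * mispredicted branch cannot speculatively read table[] out of bounds. */
static int sketch_lookup(int idx, const int *table, int nr_entries)
{
	if (idx < 0 || idx >= nr_entries)
		return -EINVAL;

	idx = array_index_nospec(idx, nr_entries);
	return table[idx];
}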
Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/atm/zatm.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/atm/zatm.c +++ b/drivers/atm/zatm.c @@ -1483,6 +1483,8 @@ static int zatm_ioctl(struct atm_dev *de return -EFAULT; if (pool < 0 || pool > ZATM_LAST_POOL) return -EINVAL; + pool = array_index_nospec(pool, + ZATM_LAST_POOL + 1); if (copy_from_user(&info, &((struct zatm_pool_req __user *) arg)->info, sizeof(info))) return -EFAULT;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephen Hemminger sthemmin@microsoft.com
[ Upstream commit 3ffe64f1a641b80a82d9ef4efa7a05ce69049871 ]
When doing device hotplug, the sub-channel setup must be async to avoid deadlock issues because the device is discovered in softirq context.
When doing changes to MTU and number of channels, the setup must be synchronous to avoid races such as when MTU and device settings are done in a single ip command.
Reported-by: Thomas Walker Thomas.Walker@twosigma.com Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug") Fixes: 732e49850c5e ("netvsc: fix race on sub channel creation") Signed-off-by: Stephen Hemminger sthemmin@microsoft.com Signed-off-by: Haiyang Zhang haiyangz@microsoft.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/hyperv/hyperv_net.h | 2 - drivers/net/hyperv/netvsc.c | 37 ++++++++++++++++++++++- drivers/net/hyperv/netvsc_drv.c | 17 +++++++++- drivers/net/hyperv/rndis_filter.c | 61 +++++++------------------------------- 4 files changed, 65 insertions(+), 52 deletions(-)
--- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -211,7 +211,7 @@ int netvsc_recv_callback(struct net_devi void netvsc_channel_cb(void *context); int netvsc_poll(struct napi_struct *napi, int budget);
-void rndis_set_subchannel(struct work_struct *w); +int rndis_set_subchannel(struct net_device *ndev, struct netvsc_device *nvdev); int rndis_filter_open(struct netvsc_device *nvdev); int rndis_filter_close(struct netvsc_device *nvdev); struct netvsc_device *rndis_filter_device_add(struct hv_device *dev, --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -66,6 +66,41 @@ void netvsc_switch_datapath(struct net_d VM_PKT_DATA_INBAND, 0); }
+/* Worker to setup sub channels on initial setup + * Initial hotplug event occurs in softirq context + * and can't wait for channels. + */ +static void netvsc_subchan_work(struct work_struct *w) +{ + struct netvsc_device *nvdev = + container_of(w, struct netvsc_device, subchan_work); + struct rndis_device *rdev; + int i, ret; + + /* Avoid deadlock with device removal already under RTNL */ + if (!rtnl_trylock()) { + schedule_work(w); + return; + } + + rdev = nvdev->extension; + if (rdev) { + ret = rndis_set_subchannel(rdev->ndev, nvdev); + if (ret == 0) { + netif_device_attach(rdev->ndev); + } else { + /* fallback to only primary channel */ + for (i = 1; i < nvdev->num_chn; i++) + netif_napi_del(&nvdev->chan_table[i].napi); + + nvdev->max_chn = 1; + nvdev->num_chn = 1; + } + } + + rtnl_unlock(); +} + static struct netvsc_device *alloc_net_device(void) { struct netvsc_device *net_device; @@ -82,7 +117,7 @@ static struct netvsc_device *alloc_net_d
init_completion(&net_device->channel_init_wait); init_waitqueue_head(&net_device->subchan_open); - INIT_WORK(&net_device->subchan_work, rndis_set_subchannel); + INIT_WORK(&net_device->subchan_work, netvsc_subchan_work);
return net_device; } --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -905,8 +905,20 @@ static int netvsc_attach(struct net_devi if (IS_ERR(nvdev)) return PTR_ERR(nvdev);
- /* Note: enable and attach happen when sub-channels setup */ + if (nvdev->num_chn > 1) { + ret = rndis_set_subchannel(ndev, nvdev); + + /* if unavailable, just proceed with one queue */ + if (ret) { + nvdev->max_chn = 1; + nvdev->num_chn = 1; + } + }
+ /* In any case device is now ready */ + netif_device_attach(ndev); + + /* Note: enable and attach happen when sub-channels setup */ netif_carrier_off(ndev);
if (netif_running(ndev)) { @@ -2064,6 +2076,9 @@ static int netvsc_probe(struct hv_device
memcpy(net->dev_addr, device_info.mac_adr, ETH_ALEN);
+ if (nvdev->num_chn > 1) + schedule_work(&nvdev->subchan_work); + /* hw_features computed in rndis_netdev_set_hwcaps() */ net->features = net->hw_features | NETIF_F_HIGHDMA | NETIF_F_SG | --- a/drivers/net/hyperv/rndis_filter.c +++ b/drivers/net/hyperv/rndis_filter.c @@ -1061,29 +1061,15 @@ static void netvsc_sc_open(struct vmbus_ * This breaks overlap of processing the host message for the * new primary channel with the initialization of sub-channels. */ -void rndis_set_subchannel(struct work_struct *w) +int rndis_set_subchannel(struct net_device *ndev, struct netvsc_device *nvdev) { - struct netvsc_device *nvdev - = container_of(w, struct netvsc_device, subchan_work); struct nvsp_message *init_packet = &nvdev->channel_init_pkt; - struct net_device_context *ndev_ctx; - struct rndis_device *rdev; - struct net_device *ndev; - struct hv_device *hv_dev; + struct net_device_context *ndev_ctx = netdev_priv(ndev); + struct hv_device *hv_dev = ndev_ctx->device_ctx; + struct rndis_device *rdev = nvdev->extension; int i, ret;
- if (!rtnl_trylock()) { - schedule_work(w); - return; - } - - rdev = nvdev->extension; - if (!rdev) - goto unlock; /* device was removed */ - - ndev = rdev->ndev; - ndev_ctx = netdev_priv(ndev); - hv_dev = ndev_ctx->device_ctx; + ASSERT_RTNL();
memset(init_packet, 0, sizeof(struct nvsp_message)); init_packet->hdr.msg_type = NVSP_MSG5_TYPE_SUBCHANNEL; @@ -1099,13 +1085,13 @@ void rndis_set_subchannel(struct work_st VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); if (ret) { netdev_err(ndev, "sub channel allocate send failed: %d\n", ret); - goto failed; + return ret; }
wait_for_completion(&nvdev->channel_init_wait); if (init_packet->msg.v5_msg.subchn_comp.status != NVSP_STAT_SUCCESS) { netdev_err(ndev, "sub channel request failed\n"); - goto failed; + return -EIO; }
nvdev->num_chn = 1 + @@ -1124,21 +1110,7 @@ void rndis_set_subchannel(struct work_st for (i = 0; i < VRSS_SEND_TAB_SIZE; i++) ndev_ctx->tx_table[i] = i % nvdev->num_chn;
- netif_device_attach(ndev); - rtnl_unlock(); - return; - -failed: - /* fallback to only primary channel */ - for (i = 1; i < nvdev->num_chn; i++) - netif_napi_del(&nvdev->chan_table[i].napi); - - nvdev->max_chn = 1; - nvdev->num_chn = 1; - - netif_device_attach(ndev); -unlock: - rtnl_unlock(); + return 0; }
static int rndis_netdev_set_hwcaps(struct rndis_device *rndis_device, @@ -1329,21 +1301,12 @@ struct netvsc_device *rndis_filter_devic netif_napi_add(net, &net_device->chan_table[i].napi, netvsc_poll, NAPI_POLL_WEIGHT);
- if (net_device->num_chn > 1) - schedule_work(&net_device->subchan_work); + return net_device;
out: - /* if unavailable, just proceed with one queue */ - if (ret) { - net_device->max_chn = 1; - net_device->num_chn = 1; - } - - /* No sub channels, device is ready */ - if (net_device->num_chn == 1) - netif_device_attach(net); - - return net_device; + /* setting up multiple channels failed */ + net_device->max_chn = 1; + net_device->num_chn = 1;
err_dev_remv: rndis_filter_device_remove(dev, net_device);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Biggers ebiggers@google.com
[ Upstream commit fc9c2029e37c3ae9efc28bf47045e0b87e09660c ]
The 'mask' argument to crypto_alloc_shash() uses the CRYPTO_ALG_* flags, not 'gfp_t'. So don't pass GFP_KERNEL to it.
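A brief sketch of the intended call (the wrapper and algorithm choice are illustrative only; crypto_alloc_shash() is the real API):

#include <crypto/hash.h>

static struct crypto_shash *sketch_alloc_tfm(const char *alg_name)
{
	/* (name, type, mask): type/mask select CRYPTO_ALG_* properties of the
	 * implementation; they are not allocation flags, so 0/0 means "no
	 * special requirements" - which is what the fix restores. */
	return crypto_alloc_shash(alg_name, 0, 0);
}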
Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support") Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/seg6_hmac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv6/seg6_hmac.c +++ b/net/ipv6/seg6_hmac.c @@ -373,7 +373,7 @@ static int seg6_hmac_init_algo(void) return -ENOMEM;
for_each_possible_cpu(cpu) { - tfm = crypto_alloc_shash(algo->name, 0, GFP_KERNEL); + tfm = crypto_alloc_shash(algo->name, 0, 0); if (IS_ERR(tfm)) return PTR_ERR(tfm); p_tfm = per_cpu_ptr(algo->tfms, cpu);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 30877961b1cdd6fdca783c2e8c4f0f47e95dc58c ]
Commit 296d48568042 ("ipvlan: inherit MTU from master device") adjusted the mtu from the master device when creating an ipvlan device, but it would also override the mtu value set in rtnl_create_link. This causes the IFLA_MTU param not to take effect.
So this patch does not adjust the mtu if the IFLA_MTU param is set when creating an ipvlan device.
Fixes: 296d48568042 ("ipvlan: inherit MTU from master device") Reported-by: Jianlin Shi jishi@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ipvlan/ipvlan_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -594,7 +594,8 @@ int ipvlan_link_new(struct net *src_net, ipvlan->phy_dev = phy_dev; ipvlan->dev = dev; ipvlan->sfeatures = IPVLAN_FEATURES; - ipvlan_adjust_mtu(ipvlan, phy_dev); + if (!tb[IFLA_MTU]) + ipvlan_adjust_mtu(ipvlan, phy_dev); INIT_LIST_HEAD(&ipvlan->addrs); spin_lock_init(&ipvlan->addrs_lock);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jesper Dangaard Brouer brouer@redhat.com
[ Upstream commit ad088ec480768850db019a5cc543685e868a513d ]
The driver was combining the XDP_TX tail flush and the XDP_REDIRECT map flushing (xdp_do_flush_map). This is suboptimal; these two flush operations should be kept separate.
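A condensed sketch of the resulting end-of-poll pattern (illustrative only; the ring type and SKETCH_* bits are placeholders, xdp_do_flush_map() is the real helper), see also the diff below:

#include <linux/bitops.h>
#include <linux/filter.h>
#include <linux/io.h>
#include <linux/types.h>

#define SKETCH_XDP_TX		BIT(0)
#define SKETCH_XDP_REDIR	BIT(1)

struct sketch_ring {
	u32 next_to_use;
	void __iomem *tail;
};

/* xdp_xmit accumulates which XDP verdicts were seen while cleaning RX. */
static void sketch_finish_poll(unsigned int xdp_xmit, struct sketch_ring *ring)
{
	if (xdp_xmit & SKETCH_XDP_REDIR)
		xdp_do_flush_map();	/* flush redirect maps only if used */

	if (xdp_xmit & SKETCH_XDP_TX) {
		/* Make descriptor writes visible before bumping the tail. */
		wmb();
		writel(ring->next_to_use, ring->tail);
	}
}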
Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map") Signed-off-by: Jesper Dangaard Brouer brouer@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-)
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -2257,9 +2257,10 @@ static struct sk_buff *ixgbe_build_skb(s return skb; }
-#define IXGBE_XDP_PASS 0 -#define IXGBE_XDP_CONSUMED 1 -#define IXGBE_XDP_TX 2 +#define IXGBE_XDP_PASS 0 +#define IXGBE_XDP_CONSUMED BIT(0) +#define IXGBE_XDP_TX BIT(1) +#define IXGBE_XDP_REDIR BIT(2)
static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter, struct xdp_buff *xdp); @@ -2288,7 +2289,7 @@ static struct sk_buff *ixgbe_run_xdp(str case XDP_REDIRECT: err = xdp_do_redirect(adapter->netdev, xdp, xdp_prog); if (!err) - result = IXGBE_XDP_TX; + result = IXGBE_XDP_REDIR; else result = IXGBE_XDP_CONSUMED; break; @@ -2348,7 +2349,7 @@ static int ixgbe_clean_rx_irq(struct ixg unsigned int mss = 0; #endif /* IXGBE_FCOE */ u16 cleaned_count = ixgbe_desc_unused(rx_ring); - bool xdp_xmit = false; + unsigned int xdp_xmit = 0; struct xdp_buff xdp;
xdp.rxq = &rx_ring->xdp_rxq; @@ -2391,8 +2392,10 @@ static int ixgbe_clean_rx_irq(struct ixg }
if (IS_ERR(skb)) { - if (PTR_ERR(skb) == -IXGBE_XDP_TX) { - xdp_xmit = true; + unsigned int xdp_res = -PTR_ERR(skb); + + if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR)) { + xdp_xmit |= xdp_res; ixgbe_rx_buffer_flip(rx_ring, rx_buffer, size); } else { rx_buffer->pagecnt_bias++; @@ -2464,7 +2467,10 @@ static int ixgbe_clean_rx_irq(struct ixg total_rx_packets++; }
- if (xdp_xmit) { + if (xdp_xmit & IXGBE_XDP_REDIR) + xdp_do_flush_map(); + + if (xdp_xmit & IXGBE_XDP_TX) { struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
/* Force memory writes to complete before letting h/w @@ -2472,8 +2478,6 @@ static int ixgbe_clean_rx_irq(struct ixg */ wmb(); writel(ring->next_to_use, ring->tail); - - xdp_do_flush_map(); }
u64_stats_update_begin(&rx_ring->syncp);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 74174fe5634ffbf645a7ca5a261571f700b2f332 ]
On fast hosts or malicious bots, we trigger a DCCP_BUG() which seems excessive.
syzbot reported :
BUG: delta (-6195) <= 0 at net/dccp/ccids/ccid3.c:628/ccid3_hc_rx_send_feedback()
CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.18.0-rc1+ #112
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 ccid3_hc_rx_send_feedback net/dccp/ccids/ccid3.c:628 [inline]
 ccid3_hc_rx_packet_recv.cold.16+0x38/0x71 net/dccp/ccids/ccid3.c:793
 ccid_hc_rx_packet_recv net/dccp/ccid.h:185 [inline]
 dccp_deliver_input_to_ccids+0xf0/0x280 net/dccp/input.c:180
 dccp_rcv_established+0x87/0xb0 net/dccp/input.c:378
 dccp_v4_do_rcv+0x153/0x180 net/dccp/ipv4.c:654
 sk_backlog_rcv include/net/sock.h:914 [inline]
 __sk_receive_skb+0x3ba/0xd80 net/core/sock.c:517
 dccp_v4_rcv+0x10f9/0x1f58 net/dccp/ipv4.c:875
 ip_local_deliver_finish+0x2eb/0xda0 net/ipv4/ip_input.c:215
 NF_HOOK include/linux/netfilter.h:287 [inline]
 ip_local_deliver+0x1e9/0x750 net/ipv4/ip_input.c:256
 dst_input include/net/dst.h:450 [inline]
 ip_rcv_finish+0x823/0x2220 net/ipv4/ip_input.c:396
 NF_HOOK include/linux/netfilter.h:287 [inline]
 ip_rcv+0xa18/0x1284 net/ipv4/ip_input.c:492
 __netif_receive_skb_core+0x2488/0x3680 net/core/dev.c:4628
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
 process_backlog+0x219/0x760 net/core/dev.c:5373
 napi_poll net/core/dev.c:5771 [inline]
 net_rx_action+0x7da/0x1980 net/core/dev.c:5837
 __do_softirq+0x2e8/0xb17 kernel/softirq.c:284
 run_ksoftirqd+0x86/0x100 kernel/softirq.c:645
 smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
 kthread+0x345/0x410 kernel/kthread.c:240
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Cc: Gerrit Renker gerrit@erg.abdn.ac.uk Cc: dccp@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/dccp/ccids/ccid3.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -625,9 +625,8 @@ static void ccid3_hc_rx_send_feedback(st case CCID3_FBACK_PERIODIC: delta = ktime_us_delta(now, hc->rx_tstamp_last_feedback); if (delta <= 0) - DCCP_BUG("delta (%ld) <= 0", (long)delta); - else - hc->rx_x_recv = scaled_div32(hc->rx_bytes_recv, delta); + delta = 1; + hc->rx_x_recv = scaled_div32(hc->rx_bytes_recv, delta); break; default: return;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 0ce4e70ff00662ad7490e545ba0cd8c1fa179fca ]
To compute delays, it is better not to use time of day, which can be changed by admins or malicious programs.
Also change ccid3_first_li() to use s64 type for delta variable to avoid potential overflows.
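The underlying point reduces to a small hedged sketch (the helper name is made up):

#include <linux/ktime.h>

/* Elapsed microseconds since the last feedback, measured on the monotonic
 * clock so that wall-clock changes (settimeofday, NTP steps) cannot skew it. */
static s64 sketch_elapsed_us(ktime_t last_feedback)
{
	s64 delta = ktime_us_delta(ktime_get(), last_feedback);

	return delta > 0 ? delta : 1;	/* never zero or negative on fast hosts */
}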
Signed-off-by: Eric Dumazet edumazet@google.com Cc: Gerrit Renker gerrit@erg.abdn.ac.uk Cc: dccp@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/dccp/ccids/ccid3.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -600,7 +600,7 @@ static void ccid3_hc_rx_send_feedback(st { struct ccid3_hc_rx_sock *hc = ccid3_hc_rx_sk(sk); struct dccp_sock *dp = dccp_sk(sk); - ktime_t now = ktime_get_real(); + ktime_t now = ktime_get(); s64 delta = 0;
switch (fbtype) { @@ -632,7 +632,7 @@ static void ccid3_hc_rx_send_feedback(st return; }
- ccid3_pr_debug("Interval %ldusec, X_recv=%u, 1/p=%u\n", (long)delta, + ccid3_pr_debug("Interval %lldusec, X_recv=%u, 1/p=%u\n", delta, hc->rx_x_recv, hc->rx_pinv);
hc->rx_tstamp_last_feedback = now; @@ -679,7 +679,8 @@ static int ccid3_hc_rx_insert_options(st static u32 ccid3_first_li(struct sock *sk) { struct ccid3_hc_rx_sock *hc = ccid3_hc_rx_sk(sk); - u32 x_recv, p, delta; + u32 x_recv, p; + s64 delta; u64 fval;
if (hc->rx_rtt == 0) { @@ -687,7 +688,9 @@ static u32 ccid3_first_li(struct sock *s hc->rx_rtt = DCCP_FALLBACK_RTT; }
- delta = ktime_to_us(net_timedelta(hc->rx_tstamp_last_feedback)); + delta = ktime_us_delta(ktime_get(), hc->rx_tstamp_last_feedback); + if (delta <= 0) + delta = 1; x_recv = scaled_div32(hc->rx_bytes_recv, delta); if (x_recv == 0) { /* would also trigger divide-by-zero */ DCCP_WARN("X_recv==0\n");
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit 603d4cf8fe095b1ee78f423d514427be507fb513 ]
Since the addition of GRO for ESP, gro_receive can consume the skb and return -EINPROGRESS. In that case, the lower layer GRO handler cannot touch the skb anymore.
Commit 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") converted some of the gro_receive handlers that can lead to ESP's gro_receive so that they wouldn't access the skb when -EINPROGRESS is returned, but missed other spots, mainly in tunneling protocols.
This patch finishes the conversion to using skb_gro_flush_final(), and adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and GUE.
Fixes: 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/geneve.c | 2 +- drivers/net/vxlan.c | 4 +--- include/linux/netdevice.h | 20 ++++++++++++++++++++ net/8021q/vlan.c | 2 +- net/ipv4/fou.c | 4 +--- net/ipv4/gre_offload.c | 2 +- net/ipv4/udp_offload.c | 2 +- 7 files changed, 26 insertions(+), 10 deletions(-)
--- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -474,7 +474,7 @@ static struct sk_buff **geneve_gro_recei out_unlock: rcu_read_unlock(); out: - NAPI_GRO_CB(skb)->flush |= flush; + skb_gro_flush_final(skb, pp, flush);
return pp; } --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -623,9 +623,7 @@ static struct sk_buff **vxlan_gro_receiv flush = 0;
out: - skb_gro_remcsum_cleanup(skb, &grc); - skb->remcsum_offload = 0; - NAPI_GRO_CB(skb)->flush |= flush; + skb_gro_flush_final_remcsum(skb, pp, flush, &grc);
return pp; } --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2735,11 +2735,31 @@ static inline void skb_gro_flush_final(s if (PTR_ERR(pp) != -EINPROGRESS) NAPI_GRO_CB(skb)->flush |= flush; } +static inline void skb_gro_flush_final_remcsum(struct sk_buff *skb, + struct sk_buff **pp, + int flush, + struct gro_remcsum *grc) +{ + if (PTR_ERR(pp) != -EINPROGRESS) { + NAPI_GRO_CB(skb)->flush |= flush; + skb_gro_remcsum_cleanup(skb, grc); + skb->remcsum_offload = 0; + } +} #else static inline void skb_gro_flush_final(struct sk_buff *skb, struct sk_buff **pp, int flush) { NAPI_GRO_CB(skb)->flush |= flush; } +static inline void skb_gro_flush_final_remcsum(struct sk_buff *skb, + struct sk_buff **pp, + int flush, + struct gro_remcsum *grc) +{ + NAPI_GRO_CB(skb)->flush |= flush; + skb_gro_remcsum_cleanup(skb, grc); + skb->remcsum_offload = 0; +} #endif
static inline int dev_hard_header(struct sk_buff *skb, struct net_device *dev, --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -688,7 +688,7 @@ static struct sk_buff **vlan_gro_receive out_unlock: rcu_read_unlock(); out: - NAPI_GRO_CB(skb)->flush |= flush; + skb_gro_flush_final(skb, pp, flush);
return pp; } --- a/net/ipv4/fou.c +++ b/net/ipv4/fou.c @@ -448,9 +448,7 @@ next_proto: out_unlock: rcu_read_unlock(); out: - NAPI_GRO_CB(skb)->flush |= flush; - skb_gro_remcsum_cleanup(skb, &grc); - skb->remcsum_offload = 0; + skb_gro_flush_final_remcsum(skb, pp, flush, &grc);
return pp; } --- a/net/ipv4/gre_offload.c +++ b/net/ipv4/gre_offload.c @@ -223,7 +223,7 @@ static struct sk_buff **gre_gro_receive( out_unlock: rcu_read_unlock(); out: - NAPI_GRO_CB(skb)->flush |= flush; + skb_gro_flush_final(skb, pp, flush);
return pp; } --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -295,7 +295,7 @@ unflush: out_unlock: rcu_read_unlock(); out: - NAPI_GRO_CB(skb)->flush |= flush; + skb_gro_flush_final(skb, pp, flush); return pp; } EXPORT_SYMBOL(udp_gro_receive);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Harini Katakam harini.katakam@xilinx.com
[ Upstream commit 64d7839af8c8f67daaf9bf387135052c55d85f90 ]
When the delta passed to gem_ptp_adjtime() is negative, the sign is preserved in the ns_to_timespec64() conversion. Hence timespec64_add() should be used directly; timespec64_sub() would just subtract the already-negative value, thus increasing the time difference.
Signed-off-by: Harini Katakam harini.katakam@xilinx.com Acked-by: Nicolas Ferre nicolas.ferre@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/cadence/macb_ptp.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
--- a/drivers/net/ethernet/cadence/macb_ptp.c +++ b/drivers/net/ethernet/cadence/macb_ptp.c @@ -170,10 +170,7 @@ static int gem_ptp_adjtime(struct ptp_cl
if (delta > TSU_NSEC_MAX_VAL) { gem_tsu_get_time(&bp->ptp_clock_info, &now); - if (sign) - now = timespec64_sub(now, then); - else - now = timespec64_add(now, then); + now = timespec64_add(now, then);
gem_tsu_set_time(&bp->ptp_clock_info, (const struct timespec64 *)&now);
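A minimal user-space sketch of the sign handling described above. The ns_to_ts()/ts_add() helpers are simplified stand-ins for the kernel's ns_to_timespec64()/timespec64_add(), not the real implementations; they only show why adding the signed timespec is already enough to move the clock backwards:

#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000LL

struct ts { int64_t sec; int64_t nsec; };

/* Stand-in for ns_to_timespec64(): a negative nanosecond count becomes a
 * normalized timespec with a negative seconds part. */
static struct ts ns_to_ts(int64_t ns)
{
	struct ts t = { ns / NSEC_PER_SEC, ns % NSEC_PER_SEC };

	if (t.nsec < 0) {		/* keep nsec in 0..NSEC_PER_SEC-1 */
		t.sec -= 1;
		t.nsec += NSEC_PER_SEC;
	}
	return t;
}

/* Stand-in for timespec64_add(). */
static struct ts ts_add(struct ts a, struct ts b)
{
	struct ts r = { a.sec + b.sec, a.nsec + b.nsec };

	if (r.nsec >= NSEC_PER_SEC) {
		r.sec += 1;
		r.nsec -= NSEC_PER_SEC;
	}
	return r;
}

int main(void)
{
	struct ts now = { 100, 0 };		/* arbitrary TSU time */
	struct ts then = ns_to_ts(-5000);	/* adjtime with delta = -5000 ns */
	struct ts res = ts_add(now, then);

	/* Prints 99.999995000: the add already moves the clock backwards by
	 * 5 usec, so the old subtract branch would have moved it forwards
	 * by mistake. */
	printf("%lld.%09lld\n", (long long)res.sec, (long long)res.nsec);
	return 0;
}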
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Or Gerlitz ogerlitz@mellanox.com
[ Upstream commit 733d3e5497070d05971352ca5087bac83c197c3d ]
In a smartnic environment, the host (PF) driver might not be an e-switch manager, hence the switchdev mode representors run on the embedded CPU (EC) and not on the host.
As such, we should avoid dealing with vport representors if we are not the e-switch manager.
While here, make sure to disallow eswitch switchdev-related setups through devlink if we are not the e-switch manager.
Fixes: cb67b832921c ('net/mlx5e: Introduce SRIOV VF representors') Signed-off-by: Or Gerlitz ogerlitz@mellanox.com Reviewed-by: Eli Cohen eli@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 12 ++++++------ drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 4 ++-- 3 files changed, 9 insertions(+), 9 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2612,7 +2612,7 @@ void mlx5e_activate_priv_channels(struct mlx5e_activate_channels(&priv->channels); netif_tx_start_all_queues(priv->netdev);
- if (MLX5_VPORT_MANAGER(priv->mdev)) + if (MLX5_ESWITCH_MANAGER(priv->mdev)) mlx5e_add_sqs_fwd_rules(priv);
mlx5e_wait_channels_min_rx_wqes(&priv->channels); @@ -2623,7 +2623,7 @@ void mlx5e_deactivate_priv_channels(stru { mlx5e_redirect_rqts_to_drop(priv);
- if (MLX5_VPORT_MANAGER(priv->mdev)) + if (MLX5_ESWITCH_MANAGER(priv->mdev)) mlx5e_remove_sqs_fwd_rules(priv);
/* FIXME: This is a W/A only for tx timeout watch dog false alarm when @@ -4315,7 +4315,7 @@ static void mlx5e_build_nic_netdev(struc mlx5e_set_netdev_dev_addr(netdev);
#if IS_ENABLED(CONFIG_MLX5_ESWITCH) - if (MLX5_VPORT_MANAGER(mdev)) + if (MLX5_ESWITCH_MANAGER(mdev)) netdev->switchdev_ops = &mlx5e_switchdev_ops; #endif
@@ -4465,7 +4465,7 @@ static void mlx5e_nic_enable(struct mlx5
mlx5e_enable_async_events(priv);
- if (MLX5_VPORT_MANAGER(priv->mdev)) + if (MLX5_ESWITCH_MANAGER(priv->mdev)) mlx5e_register_vport_reps(priv);
if (netdev->reg_state != NETREG_REGISTERED) @@ -4500,7 +4500,7 @@ static void mlx5e_nic_disable(struct mlx
queue_work(priv->wq, &priv->set_rx_mode_work);
- if (MLX5_VPORT_MANAGER(priv->mdev)) + if (MLX5_ESWITCH_MANAGER(priv->mdev)) mlx5e_unregister_vport_reps(priv);
mlx5e_disable_async_events(priv); @@ -4684,7 +4684,7 @@ static void *mlx5e_add(struct mlx5_core_ return NULL;
#ifdef CONFIG_MLX5_ESWITCH - if (MLX5_VPORT_MANAGER(mdev)) { + if (MLX5_ESWITCH_MANAGER(mdev)) { rpriv = mlx5e_alloc_nic_rep_priv(mdev); if (!rpriv) { mlx5_core_warn(mdev, "Failed to alloc NIC rep priv data\n"); --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -790,7 +790,7 @@ bool mlx5e_is_uplink_rep(struct mlx5e_pr struct mlx5e_rep_priv *rpriv = priv->ppriv; struct mlx5_eswitch_rep *rep;
- if (!MLX5_CAP_GEN(priv->mdev, vport_group_manager)) + if (!MLX5_ESWITCH_MANAGER(priv->mdev)) return false;
rep = rpriv->rep; --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -983,8 +983,8 @@ static int mlx5_devlink_eswitch_check(st if (MLX5_CAP_GEN(dev, port_type) != MLX5_CAP_PORT_TYPE_ETH) return -EOPNOTSUPP;
- if (!MLX5_CAP_GEN(dev, vport_group_manager)) - return -EOPNOTSUPP; + if(!MLX5_ESWITCH_MANAGER(dev)) + return -EPERM;
if (dev->priv.eswitch->mode == SRIOV_NONE) return -EOPNOTSUPP;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Or Gerlitz ogerlitz@mellanox.com
[ Upstream commit 8ffd569aaa818f2624ca821d9a246342fa8b8c50 ]
The check for CPU hit statistics was not immediately returning false for non vport-rep netdevs, hence we crashed (e.g. on mlx5 probed VFs) if a user-space tool called into any possible netdev in the system.
Fix that by doing a proper check before dereferencing.
Fixes: 1d447a39142e ('net/mlx5e: Extendable vport representor netdev private data') Signed-off-by: Or Gerlitz ogerlitz@mellanox.com Reported-by: Eli Cohen eli@melloanox.com Reviewed-by: Eli Cohen eli@melloanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -804,8 +804,12 @@ bool mlx5e_is_uplink_rep(struct mlx5e_pr static bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv) { struct mlx5e_rep_priv *rpriv = priv->ppriv; - struct mlx5_eswitch_rep *rep = rpriv->rep; + struct mlx5_eswitch_rep *rep;
+ if (!MLX5_CAP_GEN(priv->mdev, eswitch_flow_table)) + return false; + + rep = rpriv->rep; if (rep && rep->vport != FDB_UPLINK_VPORT) return true;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Or Gerlitz ogerlitz@mellanox.com
[ Upstream commit 0efc8562491b7d36f6bbc4fbc8f3348cb6641e9c ]
In a smartnic environment, the host (PF) driver might not be an e-switch manager, hence the FW will reject driver attempts to set/unset the eswitch, and as a result the overall SR-IOV setup will fail.
Fix that by avoiding the operation if e-switch management is not allowed for this driver instance. While here, move to the correct name for the e-switch manager capability.
Fixes: 81848731ff40 ('net/mlx5: E-Switch, Add SR-IOV (FDB) support') Signed-off-by: Or Gerlitz ogerlitz@mellanox.com Reported-by: Guy Kushnir guyk@mellanox.com Reviewed-by: Eli Cohen eli@melloanox.com Tested-by: Eli Cohen eli@melloanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 3 ++- drivers/net/ethernet/mellanox/mlx5/core/fw.c | 5 +++-- drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 7 ++++++- include/linux/mlx5/eswitch.h | 2 ++ include/linux/mlx5/mlx5_ifc.h | 2 +- 7 files changed, 16 insertions(+), 7 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -806,7 +806,7 @@ static bool mlx5e_is_vf_vport_rep(struct struct mlx5e_rep_priv *rpriv = priv->ppriv; struct mlx5_eswitch_rep *rep;
- if (!MLX5_CAP_GEN(priv->mdev, eswitch_flow_table)) + if (!MLX5_ESWITCH_MANAGER(priv->mdev)) return false;
rep = rpriv->rep; --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1604,7 +1604,7 @@ int mlx5_eswitch_enable_sriov(struct mlx if (!ESW_ALLOWED(esw)) return 0;
- if (!MLX5_CAP_GEN(esw->dev, eswitch_flow_table) || + if (!MLX5_ESWITCH_MANAGER(esw->dev) || !MLX5_CAP_ESW_FLOWTABLE_FDB(esw->dev, ft_support)) { esw_warn(esw->dev, "E-Switch FDB is not supported, aborting ...\n"); return -EOPNOTSUPP; --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -32,6 +32,7 @@
#include <linux/mutex.h> #include <linux/mlx5/driver.h> +#include <linux/mlx5/eswitch.h>
#include "mlx5_core.h" #include "fs_core.h" @@ -2631,7 +2632,7 @@ int mlx5_init_fs(struct mlx5_core_dev *d goto err; }
- if (MLX5_CAP_GEN(dev, eswitch_flow_table)) { + if (MLX5_ESWITCH_MANAGER(dev)) { if (MLX5_CAP_ESW_FLOWTABLE_FDB(dev, ft_support)) { err = init_fdb_root_ns(steering); if (err) --- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c @@ -32,6 +32,7 @@
#include <linux/mlx5/driver.h> #include <linux/mlx5/cmd.h> +#include <linux/mlx5/eswitch.h> #include <linux/module.h> #include "mlx5_core.h" #include "../../mlxfw/mlxfw.h" @@ -159,13 +160,13 @@ int mlx5_query_hca_caps(struct mlx5_core }
if (MLX5_CAP_GEN(dev, vport_group_manager) && - MLX5_CAP_GEN(dev, eswitch_flow_table)) { + MLX5_ESWITCH_MANAGER(dev)) { err = mlx5_core_get_caps(dev, MLX5_CAP_ESWITCH_FLOW_TABLE); if (err) return err; }
- if (MLX5_CAP_GEN(dev, eswitch_flow_table)) { + if (MLX5_ESWITCH_MANAGER(dev)) { err = mlx5_core_get_caps(dev, MLX5_CAP_ESWITCH); if (err) return err; --- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c @@ -88,6 +88,9 @@ static int mlx5_device_enable_sriov(stru return -EBUSY; }
+ if (!MLX5_ESWITCH_MANAGER(dev)) + goto enable_vfs_hca; + err = mlx5_eswitch_enable_sriov(dev->priv.eswitch, num_vfs, SRIOV_LEGACY); if (err) { mlx5_core_warn(dev, @@ -95,6 +98,7 @@ static int mlx5_device_enable_sriov(stru return err; }
+enable_vfs_hca: for (vf = 0; vf < num_vfs; vf++) { err = mlx5_core_enable_hca(dev, vf + 1); if (err) { @@ -140,7 +144,8 @@ static void mlx5_device_disable_sriov(st }
out: - mlx5_eswitch_disable_sriov(dev->priv.eswitch); + if (MLX5_ESWITCH_MANAGER(dev)) + mlx5_eswitch_disable_sriov(dev->priv.eswitch);
if (mlx5_wait_for_vf_pages(dev)) mlx5_core_warn(dev, "timeout reclaiming VFs pages\n"); --- a/include/linux/mlx5/eswitch.h +++ b/include/linux/mlx5/eswitch.h @@ -8,6 +8,8 @@
#include <linux/mlx5/driver.h>
+#define MLX5_ESWITCH_MANAGER(mdev) MLX5_CAP_GEN(mdev, eswitch_manager) + enum { SRIOV_NONE, SRIOV_LEGACY, --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -905,7 +905,7 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 vnic_env_queue_counters[0x1]; u8 ets[0x1]; u8 nic_flow_table[0x1]; - u8 eswitch_flow_table[0x1]; + u8 eswitch_manager[0x1]; u8 device_memory[0x1]; u8 mcam_reg[0x1]; u8 pcam_reg[0x1];
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Vesker valex@mellanox.com
[ Upstream commit d412c31dae053bf30a1bc15582a9990df297a660 ]
The command interface can work in two modes: Events and Polling. In the general case, each time we invoke a command, a work is queued to handle it.
When working in events mode, the interrupt handler completes the command execution. On the other hand, when working in polling mode, the work itself completes it.
Due to a bug in the work handler, a command could be completed by the interrupt handler while the work handler had not finished yet, causing it to be completed once again if the command interface mode was changed from events to polling after the interrupt handler was called.
mlx5_unload_one()
        mlx5_stop_eqs()  // Destroy the EQ before cmd EQ
        ...cmd_work_handler()
                write_doorbell()
                --> EVENT_TYPE_CMD
                mlx5_cmd_comp_handler()  // First free
                        free_ent(cmd, ent->idx)
                        complete(&ent->done)

        <-- mlx5_stop_eqs  //cmd was complete
        // move to polling before destroying the last cmd EQ
        mlx5_cmd_use_polling()
                cmd->mode = POLL;

        --> cmd_work_handler (continues)
                if (cmd->mode == POLL)
                        mlx5_cmd_comp_handler()  // Double free
The solution is to store the cmd->mode before writing the doorbell.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Alex Vesker valex@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -803,6 +803,7 @@ static void cmd_work_handler(struct work unsigned long flags; bool poll_cmd = ent->polling; int alloc_ret; + int cmd_mode;
sem = ent->page_queue ? &cmd->pages_sem : &cmd->sem; down(sem); @@ -849,6 +850,7 @@ static void cmd_work_handler(struct work set_signature(ent, !cmd->checksum_disabled); dump_command(dev, ent, 1); ent->ts1 = ktime_get_ns(); + cmd_mode = cmd->mode;
if (ent->callback) schedule_delayed_work(&ent->cb_timeout_work, cb_timeout); @@ -873,7 +875,7 @@ static void cmd_work_handler(struct work iowrite32be(1 << ent->idx, &dev->iseg->cmd_dbell); mmiowb(); /* if not in polling don't use ent after this point */ - if (cmd->mode == CMD_MODE_POLLING || poll_cmd) { + if (cmd_mode == CMD_MODE_POLLING || poll_cmd) { poll_timeout(ent); /* make sure we read the descriptor after ownership is SW */ rmb();
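The essence of the fix is a snapshot-before-hand-off pattern. Below is a schematic, hedged user-space sketch of that pattern; struct cmd_iface, ring_doorbell() and the printf placeholders are hypothetical and do not reflect the mlx5 driver's actual types or locking:

#include <stdatomic.h>
#include <stdio.h>

enum cmd_mode { CMD_MODE_POLLING, CMD_MODE_EVENTS };

/* Hypothetical stand-in for the shared command-interface state. */
struct cmd_iface {
	_Atomic enum cmd_mode mode;
};

static void ring_doorbell(void)
{
	/* Hands the entry to the hardware; after this point the interrupt
	 * handler may complete it and another context may flip the mode. */
}

static void work_handler(struct cmd_iface *cmd)
{
	/* Snapshot the mode *before* the doorbell write, so a later mode
	 * switch to polling cannot make this context complete an entry
	 * that the interrupt handler already completed. */
	enum cmd_mode cmd_mode = atomic_load(&cmd->mode);

	ring_doorbell();

	if (cmd_mode == CMD_MODE_POLLING)
		printf("poll for completion here\n");
	else
		printf("interrupt handler completes the command\n");
}

int main(void)
{
	struct cmd_iface cmd = { CMD_MODE_EVENTS };

	work_handler(&cmd);
	return 0;
}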
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Vesker valex@mellanox.com
[ Upstream commit 603b7bcff824740500ddfa001d7a7168b0b38542 ]
The NULL character was not set correctly for the string containing the command length; this caused failures when reading the output of the command due to a random length. The fix is to initialize the output length string.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Alex Vesker valex@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -1276,7 +1276,7 @@ static ssize_t outlen_write(struct file { struct mlx5_core_dev *dev = filp->private_data; struct mlx5_cmd_debug *dbg = &dev->cmd.dbg; - char outlen_str[8]; + char outlen_str[8] = {0}; int outlen; void *ptr; int err; @@ -1291,8 +1291,6 @@ static ssize_t outlen_write(struct file if (copy_from_user(outlen_str, buf, count)) return -EFAULT;
- outlen_str[7] = 0; - err = sscanf(outlen_str, "%d", &outlen); if (err < 0) return err;
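To see why zero-initializing the whole buffer matters, here is a small user-space simulation; parse_outlen() and the '9' filler bytes are purely illustrative (they stand in for leftover stack contents), not the driver code. With only a terminator at index 7, a short write such as "16" lets sscanf() run into the leftover bytes and parse a bogus length:

#include <stdio.h>
#include <string.h>

/* Simulates copying 'count' user bytes into a small stack buffer. */
static int parse_outlen(const char *user_buf, size_t count, int zero_init)
{
	char buf_garbage[8] = { '9', '9', '9', '9', '9', '9', '9', 0 };
	char buf_zeroed[8] = { 0 };
	char *buf = zero_init ? buf_zeroed : buf_garbage;
	int outlen = 0;

	/* The old code only set buf[7] = 0; bytes past 'count' kept whatever
	 * was already on the stack (simulated here with '9' characters). */
	memcpy(buf, user_buf, count);
	sscanf(buf, "%d", &outlen);
	return outlen;
}

int main(void)
{
	/* The user wrote "16": two bytes, no terminator. */
	printf("without zero-init: %d\n", parse_outlen("16", 2, 0)); /* 1699999 */
	printf("with zero-init:    %d\n", parse_outlen("16", 2, 1)); /* 16 */
	return 0;
}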
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eli Cohen eli@mellanox.com
[ Upstream commit f811980444ec59ad62f9e041adbb576a821132c7 ]
Manipulating the MPFS requires eswitch manager capabilities.
Fixes: eeb66cdb6826 ('net/mlx5: Separate between E-Switch and MPFS') Signed-off-by: Eli Cohen eli@mellanox.com Reviewed-by: Or Gerlitz ogerlitz@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mpfs.c @@ -33,6 +33,7 @@ #include <linux/etherdevice.h> #include <linux/mlx5/driver.h> #include <linux/mlx5/mlx5_ifc.h> +#include <linux/mlx5/eswitch.h> #include "mlx5_core.h" #include "lib/mpfs.h"
@@ -98,7 +99,7 @@ int mlx5_mpfs_init(struct mlx5_core_dev int l2table_size = 1 << MLX5_CAP_GEN(dev, log_max_l2_table); struct mlx5_mpfs *mpfs;
- if (!MLX5_VPORT_MANAGER(dev)) + if (!MLX5_ESWITCH_MANAGER(dev)) return 0;
mpfs = kzalloc(sizeof(*mpfs), GFP_KERNEL); @@ -122,7 +123,7 @@ void mlx5_mpfs_cleanup(struct mlx5_core_ { struct mlx5_mpfs *mpfs = dev->priv.mpfs;
- if (!MLX5_VPORT_MANAGER(dev)) + if (!MLX5_ESWITCH_MANAGER(dev)) return;
WARN_ON(!hlist_empty(mpfs->hash)); @@ -137,7 +138,7 @@ int mlx5_mpfs_add_mac(struct mlx5_core_d u32 index; int err;
- if (!MLX5_VPORT_MANAGER(dev)) + if (!MLX5_ESWITCH_MANAGER(dev)) return 0;
mutex_lock(&mpfs->lock); @@ -179,7 +180,7 @@ int mlx5_mpfs_del_mac(struct mlx5_core_d int err = 0; u32 index;
- if (!MLX5_VPORT_MANAGER(dev)) + if (!MLX5_ESWITCH_MANAGER(dev)) return 0;
mutex_lock(&mpfs->lock);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shay Agroskin shayag@mellanox.com
[ Upstream commit d14fcb8d877caf1b8d6bd65d444bf62b21f2070c ]
The driver allocates the wrong size (due to a wrong struct name) when issuing a query/set request to the NIC's register.
Fixes: d8880795dabf ("net/mlx5e: Implement DCBNL IEEE max rate") Signed-off-by: Shay Agroskin shayag@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/port.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c @@ -701,7 +701,7 @@ EXPORT_SYMBOL_GPL(mlx5_query_port_prio_t static int mlx5_set_port_qetcr_reg(struct mlx5_core_dev *mdev, u32 *in, int inlen) { - u32 out[MLX5_ST_SZ_DW(qtct_reg)]; + u32 out[MLX5_ST_SZ_DW(qetc_reg)];
if (!MLX5_CAP_GEN(mdev, ets)) return -EOPNOTSUPP; @@ -713,7 +713,7 @@ static int mlx5_set_port_qetcr_reg(struc static int mlx5_query_port_qetcr_reg(struct mlx5_core_dev *mdev, u32 *out, int outlen) { - u32 in[MLX5_ST_SZ_DW(qtct_reg)]; + u32 in[MLX5_ST_SZ_DW(qetc_reg)];
if (!MLX5_CAP_GEN(mdev, ets)) return -EOPNOTSUPP;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Antoine Tenart antoine.tenart@bootlin.com
[ Upstream commit 271f7ff5aa5a73488b7a9d8b84b5205fb5b2f7cc ]
When using s/w buffer management, buffers are allocated and DMA mapped. When doing so on an arm64 platform, an offset correction is applied to the DMA address before storing it in an Rx descriptor. The issue is that this DMA address is then used later in the Rx path without removing the offset correction. Thus the DMA address is wrong, which can lead to various issues.
This patch fixes this by removing the offset correction from the DMA address retrieved from the Rx descriptor before using it in the Rx path.
Fixes: 8d5047cf9ca2 ("net: mvneta: Convert to be 64 bits compatible") Signed-off-by: Antoine Tenart antoine.tenart@bootlin.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/marvell/mvneta.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -1932,7 +1932,7 @@ static int mvneta_rx_swbm(struct mvneta_ rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE); index = rx_desc - rxq->descs; data = rxq->buf_virt_addr[index]; - phys_addr = rx_desc->buf_phys_addr; + phys_addr = rx_desc->buf_phys_addr - pp->rx_offset_correction;
if (!mvneta_rxq_desc_is_first_last(rx_status) || (rx_status & MVNETA_RXD_ERR_SUMMARY)) {
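A tiny arithmetic sketch of the relationship described above; the numeric values and the rx_offset_correction amount are made up for illustration and are not taken from the driver:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t dma_addr = 0x80001000;		/* address handed to the DMA API */
	uint64_t rx_offset_correction = 0x40;	/* hypothetical arm64 offset */

	/* What the driver stores in the Rx descriptor. */
	uint64_t desc_buf_phys_addr = dma_addr + rx_offset_correction;

	/* Buggy read path: uses the descriptor value as-is. */
	uint64_t wrong = desc_buf_phys_addr;
	/* Fixed read path: undoes the correction before unmapping/syncing. */
	uint64_t right = desc_buf_phys_addr - rx_offset_correction;

	printf("wrong=0x%llx right=0x%llx\n",
	       (unsigned long long)wrong, (unsigned long long)right);
	return 0;
}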
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 945d015ee0c3095d2290e845565a23dedfd8027c ]
We should put copy_skb in receive_queue only after a successful call to virtio_net_hdr_from_skb().
syzbot report :
BUG: KASAN: use-after-free in __skb_unlink include/linux/skbuff.h:1843 [inline] BUG: KASAN: use-after-free in __skb_dequeue include/linux/skbuff.h:1863 [inline] BUG: KASAN: use-after-free in skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815 Read of size 8 at addr ffff8801b044ecc0 by task syz-executor217/4553
CPU: 0 PID: 4553 Comm: syz-executor217 Not tainted 4.18.0-rc1+ #111 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 __skb_unlink include/linux/skbuff.h:1843 [inline] __skb_dequeue include/linux/skbuff.h:1863 [inline] skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815 skb_queue_purge+0x26/0x40 net/core/skbuff.c:2852 packet_set_ring+0x675/0x1da0 net/packet/af_packet.c:4331 packet_release+0x630/0xd90 net/packet/af_packet.c:2991 __sock_release+0xd7/0x260 net/socket.c:603 sock_close+0x19/0x20 net/socket.c:1186 __fput+0x35b/0x8b0 fs/file_table.c:209 ____fput+0x15/0x20 fs/file_table.c:243 task_work_run+0x1ec/0x2a0 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x1b08/0x2750 kernel/exit.c:865 do_group_exit+0x177/0x440 kernel/exit.c:968 __do_sys_exit_group kernel/exit.c:979 [inline] __se_sys_exit_group kernel/exit.c:977 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:977 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4448e9 Code: Bad RIP value. RSP: 002b:00007ffd5f777ca8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004448e9 RDX: 00000000004448e9 RSI: 000000000000fcfb RDI: 0000000000000001 RBP: 00000000006cf018 R08: 00007ffd0000a45b R09: 0000000000000000 R10: 00007ffd5f777e48 R11: 0000000000000202 R12: 00000000004021f0 R13: 0000000000402280 R14: 0000000000000000 R15: 0000000000000000
Allocated by task 4553: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554 skb_clone+0x1f5/0x500 net/core/skbuff.c:1282 tpacket_rcv+0x28f7/0x3200 net/packet/af_packet.c:2221 deliver_skb net/core/dev.c:1925 [inline] deliver_ptype_list_skb net/core/dev.c:1940 [inline] __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767 netif_receive_skb+0xbf/0x420 net/core/dev.c:4791 tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571 tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009 call_write_iter include/linux/fs.h:1795 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6c6/0x9f0 fs/read_write.c:487 vfs_write+0x1f8/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 4553: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x86/0x2d0 mm/slab.c:3756 kfree_skbmem+0x154/0x230 net/core/skbuff.c:582 __kfree_skb net/core/skbuff.c:642 [inline] kfree_skb+0x1a5/0x580 net/core/skbuff.c:659 tpacket_rcv+0x189e/0x3200 net/packet/af_packet.c:2385 deliver_skb net/core/dev.c:1925 [inline] deliver_ptype_list_skb net/core/dev.c:1940 [inline] __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767 netif_receive_skb+0xbf/0x420 net/core/dev.c:4791 tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571 tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009 call_write_iter include/linux/fs.h:1795 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6c6/0x9f0 fs/read_write.c:487 vfs_write+0x1f8/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at ffff8801b044ecc0 which belongs to the cache skbuff_head_cache of size 232 The buggy address is located 0 bytes inside of 232-byte region [ffff8801b044ecc0, ffff8801b044eda8) The buggy address belongs to the page: page:ffffea0006c11380 count:1 mapcount:0 mapping:ffff8801d9be96c0 index:0x0 flags: 0x2fffc0000000100(slab) raw: 02fffc0000000100 ffffea0006c17988 ffff8801d9bec248 ffff8801d9be96c0 raw: 0000000000000000 ffff8801b044e040 000000010000000c 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff8801b044eb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8801b044ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
ffff8801b044ec80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
^ ffff8801b044ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8801b044ed80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Cc: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2294,6 +2294,13 @@ static int tpacket_rcv(struct sk_buff *s if (po->stats.stats1.tp_drops) status |= TP_STATUS_LOSING; } + + if (do_vnet && + virtio_net_hdr_from_skb(skb, h.raw + macoff - + sizeof(struct virtio_net_hdr), + vio_le(), true, 0)) + goto drop_n_account; + po->stats.stats1.tp_packets++; if (copy_skb) { status |= TP_STATUS_COPY; @@ -2301,15 +2308,6 @@ static int tpacket_rcv(struct sk_buff *s } spin_unlock(&sk->sk_receive_queue.lock);
- if (do_vnet) { - if (virtio_net_hdr_from_skb(skb, h.raw + macoff - - sizeof(struct virtio_net_hdr), - vio_le(), true, 0)) { - spin_lock(&sk->sk_receive_queue.lock); - goto drop_n_account; - } - } - skb_copy_bits(skb, 0, h.raw + macoff, snaplen);
if (!(ts_status = tpacket_get_timestamp(skb, &ts, po->tp_tstamp)))
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Davide Caratti dcaratti@redhat.com
[ Upstream commit 0a889b9404c084c6fd145020c939a8f688b3e058 ]
a recursive lock warning [1] can be observed with the following script,
# $TC actions add action ife encode allow prio pass index 42
IFE type 0xED3E
# $TC actions replace action ife encode allow tcindex pass index 42
in case the kernel was unable to run the last command (e.g. because it was impossible to load 'act_meta_skbtcindex'). For a similar reason, the kernel can leak an idr entry in the error path of tcf_ife_init(), because tcf_idr_release() is not called after a successful idr reservation:
# $TC actions add action ife encode allow tcindex index 47
IFE type 0xED3E
RTNETLINK answers: No such file or directory
We have an error talking to the kernel
# $TC actions add action ife encode allow tcindex index 47
IFE type 0xED3E
RTNETLINK answers: No space left on device
We have an error talking to the kernel
# $TC actions add action ife encode use mark 7 type 0xfefe pass index 47
IFE type 0xFEFE
RTNETLINK answers: No space left on device
We have an error talking to the kernel
Since tcfa_lock is already taken when the action is being edited, a call to tcf_idr_release() wrongly makes tcf_idr_cleanup() take the same lock again. On the other hand, tcf_idr_release() needs to be called in the error path of tcf_ife_init(), to undo the last tcf_idr_create() invocation. Fix both problems in tcf_ife_init(). Since the cleanup() routine can now be called when ife->params is NULL, also add a NULL pointer check to avoid calling kfree_rcu(NULL, rcu).
[1] ============================================ WARNING: possible recursive locking detected 4.17.0-rc4.kasan+ #417 Tainted: G E -------------------------------------------- tc/3932 is trying to acquire lock: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_cleanup+0x19/0x80 [act_ife]
but task is already holding lock: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife]
other info that might help us debug this: Possible unsafe locking scenario:
CPU0 ---- lock(&(&p->tcfa_lock)->rlock); lock(&(&p->tcfa_lock)->rlock);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by tc/3932: #0: 000000007ca8e990 (rtnl_mutex){+.+.}, at: tcf_ife_init+0xf61/0x13c0 [act_ife] #1: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife]
stack backtrace: CPU: 3 PID: 3932 Comm: tc Tainted: G E 4.17.0-rc4.kasan+ #417 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: dump_stack+0x9a/0xeb __lock_acquire+0xf43/0x34a0 ? debug_check_no_locks_freed+0x2b0/0x2b0 ? debug_check_no_locks_freed+0x2b0/0x2b0 ? debug_check_no_locks_freed+0x2b0/0x2b0 ? __mutex_lock+0x62f/0x1240 ? kvm_sched_clock_read+0x1a/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? find_held_lock+0x39/0x1d0 ? lock_acquire+0x10b/0x330 lock_acquire+0x10b/0x330 ? tcf_ife_cleanup+0x19/0x80 [act_ife] _raw_spin_lock_bh+0x38/0x70 ? tcf_ife_cleanup+0x19/0x80 [act_ife] tcf_ife_cleanup+0x19/0x80 [act_ife] __tcf_idr_release+0xff/0x350 tcf_ife_init+0xdde/0x13c0 [act_ife] ? ife_exit_net+0x290/0x290 [act_ife] ? __lock_is_held+0xb4/0x140 tcf_action_init_1+0x67b/0xad0 ? tcf_action_dump_old+0xa0/0xa0 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? kvm_sched_clock_read+0x1a/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? memset+0x1f/0x40 tcf_action_init+0x30f/0x590 ? tcf_action_init_1+0xad0/0xad0 ? memset+0x1f/0x40 tc_ctl_action+0x48e/0x5e0 ? mutex_lock_io_nested+0x1160/0x1160 ? tca_action_gd+0x990/0x990 ? sched_clock+0x5/0x10 ? find_held_lock+0x39/0x1d0 rtnetlink_rcv_msg+0x4da/0x990 ? validate_linkmsg+0x680/0x680 ? sched_clock_cpu+0x18/0x170 ? find_held_lock+0x39/0x1d0 netlink_rcv_skb+0x127/0x350 ? validate_linkmsg+0x680/0x680 ? netlink_ack+0x970/0x970 ? __kmalloc_node_track_caller+0x304/0x3a0 netlink_unicast+0x40f/0x5d0 ? netlink_attachskb+0x580/0x580 ? _copy_from_iter_full+0x187/0x760 ? import_iovec+0x90/0x390 netlink_sendmsg+0x67f/0xb50 ? netlink_unicast+0x5d0/0x5d0 ? copy_msghdr_from_user+0x206/0x340 ? netlink_unicast+0x5d0/0x5d0 sock_sendmsg+0xb3/0xf0 ___sys_sendmsg+0x60a/0x8b0 ? copy_msghdr_from_user+0x340/0x340 ? lock_downgrade+0x5e0/0x5e0 ? tty_write_lock+0x18/0x50 ? kvm_sched_clock_read+0x1a/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? find_held_lock+0x39/0x1d0 ? lock_downgrade+0x5e0/0x5e0 ? lock_acquire+0x10b/0x330 ? __audit_syscall_entry+0x316/0x690 ? current_kernel_time64+0x6b/0xd0 ? __fget_light+0x55/0x1f0 ? __sys_sendmsg+0xd2/0x170 __sys_sendmsg+0xd2/0x170 ? __ia32_sys_shutdown+0x70/0x70 ? syscall_trace_enter+0x57a/0xd60 ? rcu_read_lock_sched_held+0xdc/0x110 ? __bpf_trace_sys_enter+0x10/0x10 ? do_syscall_64+0x22/0x480 do_syscall_64+0xa5/0x480 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fd646988ba0 RSP: 002b:00007fffc9fab3c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fffc9fab4f0 RCX: 00007fd646988ba0 RDX: 0000000000000000 RSI: 00007fffc9fab440 RDI: 0000000000000003 RBP: 000000005b28c8b3 R08: 0000000000000002 R09: 0000000000000000 R10: 00007fffc9faae20 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fffc9fab504 R14: 0000000000000001 R15: 000000000066c100
Fixes: 4e8c86155010 ("net sched: net sched: ife action fix late binding") Fixes: ef6980b6becb ("introduce IFE action") Signed-off-by: Davide Caratti dcaratti@redhat.com Acked-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/act_ife.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
--- a/net/sched/act_ife.c +++ b/net/sched/act_ife.c @@ -415,7 +415,8 @@ static void tcf_ife_cleanup(struct tc_ac spin_unlock_bh(&ife->tcf_lock);
p = rcu_dereference_protected(ife->params, 1); - kfree_rcu(p, rcu); + if (p) + kfree_rcu(p, rcu); }
/* under ife->tcf_lock for existing action */ @@ -543,10 +544,8 @@ static int tcf_ife_init(struct net *net, NULL, NULL); if (err) { metadata_parse_err: - if (exists) - tcf_idr_release(*a, bind); if (ret == ACT_P_CREATED) - _tcf_ife_cleanup(*a); + tcf_idr_release(*a, bind);
if (exists) spin_unlock_bh(&ife->tcf_lock); @@ -567,7 +566,7 @@ metadata_parse_err: err = use_all_metadata(ife); if (err) { if (ret == ACT_P_CREATED) - _tcf_ife_cleanup(*a); + tcf_idr_release(*a, bind);
if (exists) spin_unlock_bh(&ife->tcf_lock);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Davide Caratti dcaratti@redhat.com
[ Upstream commit cbf56c29624fa056a0c1c3d177e67aa51a7fd8d6 ]
in the following script
# tc actions add action ife encode allow prio pass index 42
# tc actions replace action ife encode allow tcindex drop index 42
the action control should remain equal to 'pass' if the kernel failed to replace the TC action. Postpone the assignment of the action control to ensure it is not overwritten in the error path of tcf_ife_init().
Fixes: ef6980b6becb ("introduce IFE action") Signed-off-by: Davide Caratti dcaratti@redhat.com Acked-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/act_ife.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/net/sched/act_ife.c +++ b/net/sched/act_ife.c @@ -517,8 +517,6 @@ static int tcf_ife_init(struct net *net, saddr = nla_data(tb[TCA_IFE_SMAC]); }
- ife->tcf_action = parm->action; - if (parm->flags & IFE_ENCODE) { if (daddr) ether_addr_copy(p->eth_dst, daddr); @@ -575,6 +573,7 @@ metadata_parse_err: } }
+ ife->tcf_action = parm->action; if (exists) spin_unlock_bh(&ife->tcf_lock);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Konstantin Khlebnikov khlebnikov@yandex-team.ru
[ Upstream commit 7e85dc8cb35abf16455f1511f0670b57c1a84608 ]
When blackhole is used on top of a classful qdisc like hfsc, it breaks the qlen and backlog counters because packets disappear without notice.
In HFSC, a non-zero qlen while all classes are inactive triggers the warning:
WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc]
and schedules watchdog work endlessly.
This patch returns __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS; this flag tells the upper layer that the packet is gone and was not queued.
Signed-off-by: Konstantin Khlebnikov khlebnikov@yandex-team.ru Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/sch_blackhole.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sched/sch_blackhole.c +++ b/net/sched/sch_blackhole.c @@ -21,7 +21,7 @@ static int blackhole_enqueue(struct sk_b struct sk_buff **to_free) { qdisc_drop(skb, sch, to_free); - return NET_XMIT_SUCCESS; + return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; }
static struct sk_buff *blackhole_dequeue(struct Qdisc *sch)
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f ]
After commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"), sungem owners reported the infamous "eth0: hw csum failure" message.
CHECKSUM_COMPLETE has in fact never worked for this driver, but this was masked by the fact that upper stacks had to strip the FCS, and therefore skb->ip_summed was set back to CHECKSUM_NONE before my recent change.
The driver configures a number of bytes to skip when the chip computes the checksum, and for some reason only half of the Ethernet header was being skipped.
A second problem is that we should strip the FCS by default, unless the driver is updated to eventually support NETIF_F_RXFCS in the future.
Finally, a driver should check if NETIF_F_RXCSUM feature is enabled or not, so that the admin can turn off rx checksum if wanted.
Many thanks to Andreas Schwab and Mathieu Malaterre for their help in debugging this issue.
Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: Meelis Roos mroos@linux.ee Reported-by: Mathieu Malaterre malat@debian.org Reported-by: Andreas Schwab schwab@linux-m68k.org Tested-by: Andreas Schwab schwab@linux-m68k.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/sun/sungem.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-)
--- a/drivers/net/ethernet/sun/sungem.c +++ b/drivers/net/ethernet/sun/sungem.c @@ -60,8 +60,7 @@ #include <linux/sungem_phy.h> #include "sungem.h"
-/* Stripping FCS is causing problems, disabled for now */ -#undef STRIP_FCS +#define STRIP_FCS
#define DEFAULT_MSG (NETIF_MSG_DRV | \ NETIF_MSG_PROBE | \ @@ -435,7 +434,7 @@ static int gem_rxmac_reset(struct gem *g writel(desc_dma & 0xffffffff, gp->regs + RXDMA_DBLOW); writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK); val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) | - ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128); + (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128); writel(val, gp->regs + RXDMA_CFG); if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN) writel(((5 & RXDMA_BLANK_IPKTS) | @@ -760,7 +759,6 @@ static int gem_rx(struct gem *gp, int wo struct net_device *dev = gp->dev; int entry, drops, work_done = 0; u32 done; - __sum16 csum;
if (netif_msg_rx_status(gp)) printk(KERN_DEBUG "%s: rx interrupt, done: %d, rx_new: %d\n", @@ -855,9 +853,13 @@ static int gem_rx(struct gem *gp, int wo skb = copy_skb; }
- csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff); - skb->csum = csum_unfold(csum); - skb->ip_summed = CHECKSUM_COMPLETE; + if (likely(dev->features & NETIF_F_RXCSUM)) { + __sum16 csum; + + csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff); + skb->csum = csum_unfold(csum); + skb->ip_summed = CHECKSUM_COMPLETE; + } skb->protocol = eth_type_trans(skb, gp->dev);
napi_gro_receive(&gp->napi, skb); @@ -1761,7 +1763,7 @@ static void gem_init_dma(struct gem *gp) writel(0, gp->regs + TXDMA_KICK);
val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) | - ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128); + (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128); writel(val, gp->regs + RXDMA_CFG);
writel(desc_dma >> 32, gp->regs + RXDMA_DBHI); @@ -2985,8 +2987,8 @@ static int gem_init_one(struct pci_dev * pci_set_drvdata(pdev, dev);
/* We can do scatter/gather and HW checksum */ - dev->hw_features = NETIF_F_SG | NETIF_F_HW_CSUM; - dev->features |= dev->hw_features | NETIF_F_RXCSUM; + dev->hw_features = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM; + dev->features = dev->hw_features; if (pci_using_dac) dev->features |= NETIF_F_HIGHDMA;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Ahern dsahern@gmail.com
[ Upstream commit 8c43bd1706885ba1acfa88da02bc60a2ec16f68c ]
Similar to 69678bcd4d2d ("udp: fix SO_BINDTODEVICE"), TCP socket lookups need to fail if dev_match is not true. Currently, a packet to a given port can match a socket bound to a device when it should not. In the VRF case, this causes the lookup to hit a VRF socket and not a global socket, resulting in a response trying to go through the VRF when it should not.
Fixes: 3fa6f616a7a4d ("net: ipv4: add second dif to inet socket lookups") Fixes: 4297a0ef08572 ("net: ipv6: add second dif to inet6 socket lookups") Reported-by: Lou Berger lberger@labn.net Diagnosed-by: Renato Westphal renato@opensourcerouting.org Tested-by: Renato Westphal renato@opensourcerouting.org Signed-off-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/inet_hashtables.c | 4 ++-- net/ipv6/inet6_hashtables.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
--- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -243,9 +243,9 @@ static inline int compute_score(struct s bool dev_match = (sk->sk_bound_dev_if == dif || sk->sk_bound_dev_if == sdif);
- if (exact_dif && !dev_match) + if (!dev_match) return -1; - if (sk->sk_bound_dev_if && dev_match) + if (sk->sk_bound_dev_if) score += 4; } if (sk->sk_incoming_cpu == raw_smp_processor_id()) --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -113,9 +113,9 @@ static inline int compute_score(struct s bool dev_match = (sk->sk_bound_dev_if == dif || sk->sk_bound_dev_if == sdif);
- if (exact_dif && !dev_match) + if (!dev_match) return -1; - if (sk->sk_bound_dev_if && dev_match) + if (sk->sk_bound_dev_if) score++; } if (sk->sk_incoming_cpu == raw_smp_processor_id())
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com
[ Upstream commit 82a4e71b1565dea8387f54503e806cf374e779ec ]
When a ptp clock is not available for a PF (e.g., higher PFs in NPAR mode), the get-tsinfo() callback should return the software timestamping capabilities instead of returning an error.
Fixes: 4c55215c ("qede: Add driver support for PTP") Signed-off-by: Sudarsana Reddy Kalluru Sudarsana.Kalluru@cavium.com Signed-off-by: Michal Kalderon Michal.Kalderon@cavium.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/qlogic/qede/qede_ptp.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/qlogic/qede/qede_ptp.c +++ b/drivers/net/ethernet/qlogic/qede/qede_ptp.c @@ -337,8 +337,14 @@ int qede_ptp_get_ts_info(struct qede_dev { struct qede_ptp *ptp = edev->ptp;
- if (!ptp) - return -EIO; + if (!ptp) { + info->so_timestamping = SOF_TIMESTAMPING_TX_SOFTWARE | + SOF_TIMESTAMPING_RX_SOFTWARE | + SOF_TIMESTAMPING_SOFTWARE; + info->phc_index = -1; + + return 0; + }
info->so_timestamping = SOF_TIMESTAMPING_TX_SOFTWARE | SOF_TIMESTAMPING_RX_SOFTWARE |
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com
[ Upstream commit 538f8d00ba8bb417c4d9e76c61dee59d812d8287 ]
By default, the driver incorrectly sets the eswitch mode to VEB (virtual Ethernet bridging). The VEB eswitch mode should be set only when SR-IOV is enabled, and it should be NONE by default. The patch incorporates this change.
Fixes: 0fefbfbaa ("qed*: Management firmware - notifications and defaults") Signed-off-by: Sudarsana Reddy Kalluru Sudarsana.Kalluru@cavium.com Signed-off-by: Michal Kalderon Michal.Kalderon@cavium.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/qlogic/qed/qed_dev.c | 2 +- drivers/net/ethernet/qlogic/qed/qed_sriov.c | 19 +++++++++++++++++-- 2 files changed, 18 insertions(+), 3 deletions(-)
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -1789,7 +1789,7 @@ int qed_hw_init(struct qed_dev *cdev, st DP_INFO(p_hwfn, "Failed to update driver state\n");
rc = qed_mcp_ov_update_eswitch(p_hwfn, p_hwfn->p_main_ptt, - QED_OV_ESWITCH_VEB); + QED_OV_ESWITCH_NONE); if (rc) DP_INFO(p_hwfn, "Failed to update eswitch mode\n"); } --- a/drivers/net/ethernet/qlogic/qed/qed_sriov.c +++ b/drivers/net/ethernet/qlogic/qed/qed_sriov.c @@ -4400,6 +4400,8 @@ static void qed_sriov_enable_qid_config( static int qed_sriov_enable(struct qed_dev *cdev, int num) { struct qed_iov_vf_init_params params; + struct qed_hwfn *hwfn; + struct qed_ptt *ptt; int i, j, rc;
if (num >= RESC_NUM(&cdev->hwfns[0], QED_VPORT)) { @@ -4412,8 +4414,8 @@ static int qed_sriov_enable(struct qed_d
/* Initialize HW for VF access */ for_each_hwfn(cdev, j) { - struct qed_hwfn *hwfn = &cdev->hwfns[j]; - struct qed_ptt *ptt = qed_ptt_acquire(hwfn); + hwfn = &cdev->hwfns[j]; + ptt = qed_ptt_acquire(hwfn);
/* Make sure not to use more than 16 queues per VF */ params.num_queues = min_t(int, @@ -4449,6 +4451,19 @@ static int qed_sriov_enable(struct qed_d goto err; }
+ hwfn = QED_LEADING_HWFN(cdev); + ptt = qed_ptt_acquire(hwfn); + if (!ptt) { + DP_ERR(hwfn, "Failed to acquire ptt\n"); + rc = -EBUSY; + goto err; + } + + rc = qed_mcp_ov_update_eswitch(hwfn, ptt, QED_OV_ESWITCH_VEB); + if (rc) + DP_INFO(cdev, "Failed to update eswitch mode\n"); + qed_ptt_release(hwfn, ptt); + return num;
err:
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com
[ Upstream commit cc9b27cdf7bd3c86df73439758ac1564bc8f5bbe ]
Use the correct size value while copying chassis/port id values.
Fixes: 6ad8c632e ("qed: Add support for query/config dcbx.") Signed-off-by: Sudarsana Reddy Kalluru Sudarsana.Kalluru@cavium.com Signed-off-by: Michal Kalderon Michal.Kalderon@cavium.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c @@ -700,9 +700,9 @@ qed_dcbx_get_local_lldp_params(struct qe p_local = &p_hwfn->p_dcbx_info->lldp_local[LLDP_NEAREST_BRIDGE];
memcpy(params->lldp_local.local_chassis_id, p_local->local_chassis_id, - ARRAY_SIZE(p_local->local_chassis_id)); + sizeof(p_local->local_chassis_id)); memcpy(params->lldp_local.local_port_id, p_local->local_port_id, - ARRAY_SIZE(p_local->local_port_id)); + sizeof(p_local->local_port_id)); }
static void @@ -714,9 +714,9 @@ qed_dcbx_get_remote_lldp_params(struct q p_remote = &p_hwfn->p_dcbx_info->lldp_remote[LLDP_NEAREST_BRIDGE];
memcpy(params->lldp_remote.peer_chassis_id, p_remote->peer_chassis_id, - ARRAY_SIZE(p_remote->peer_chassis_id)); + sizeof(p_remote->peer_chassis_id)); memcpy(params->lldp_remote.peer_port_id, p_remote->peer_port_id, - ARRAY_SIZE(p_remote->peer_port_id)); + sizeof(p_remote->peer_port_id)); }
static int
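The difference between the two size expressions is easy to demonstrate in user space. The sketch below assumes the chassis/port id fields are arrays of u32 words (the exact qed types are not shown in this mail): ARRAY_SIZE() yields an element count, so the old code copied only a quarter of the bytes.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

int main(void)
{
	/* Hypothetical LLDP id layout: four 32-bit words. */
	uint32_t src[4] = { 0x11111111, 0x22222222, 0x33333333, 0x44444444 };
	uint32_t dst[4] = { 0 };

	/* Buggy pattern: ARRAY_SIZE() is an element count (4), so only
	 * 4 *bytes* are copied instead of 16. */
	memcpy(dst, src, ARRAY_SIZE(src));
	printf("ARRAY_SIZE copy: dst[1] = 0x%08x (truncated)\n", (unsigned)dst[1]);

	/* Fixed pattern: sizeof() is the byte count memcpy() expects. */
	memcpy(dst, src, sizeof(src));
	printf("sizeof copy:     dst[1] = 0x%08x\n", (unsigned)dst[1]);
	return 0;
}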
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com
[ Upstream commit bb7858ba1102f82470a917e041fd23e6385c31be ]
Memory size is limited in the kdump kernel environment. Allocating more MSI-X vectors (or queues) consumes a few tens of MBs of memory, which might lead to kdump kernel failure. This patch limits the number of MSI-X vectors in the kdump kernel to the minimum required value (i.e., 2 per engine).
Fixes: fe56b9e6a ("qed: Add module with basic common support") Signed-off-by: Sudarsana Reddy Kalluru Sudarsana.Kalluru@cavium.com Signed-off-by: Michal Kalderon Michal.Kalderon@cavium.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/qlogic/qed/qed_main.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -780,6 +780,14 @@ static int qed_slowpath_setup_int(struct /* We want a minimum of one slowpath and one fastpath vector per hwfn */ cdev->int_params.in.min_msix_cnt = cdev->num_hwfns * 2;
+ if (is_kdump_kernel()) { + DP_INFO(cdev, + "Kdump kernel: Limit the max number of requested MSI-X vectors to %hd\n", + cdev->int_params.in.min_msix_cnt); + cdev->int_params.in.num_vectors = + cdev->int_params.in.min_msix_cnt; + } + rc = qed_set_int_mode(cdev, false); if (rc) { DP_ERR(cdev, "qed_slowpath_setup_int ERR\n");
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Slaby jslaby@suse.cz
[ Upstream commit 0ee1f4734967af8321ecebaf9c74221ace34f2d5 ]
When unplugging an r8152 adapter while the interface is UP, the NIC becomes unusable. usb->disconnect (aka rtl8152_disconnect) deletes the napi. Then, rtl8152_disconnect calls unregister_netdev and that invokes netdev->ndo_stop (aka rtl8152_close). rtl8152_close tries to napi_disable, but the napi was already deleted by disconnect above, so the first while loop in napi_disable never finishes. This results in a complete deadlock of the network layer, as the rtnl_mutex is held by unregister_netdev.
So avoid the call to napi_disable in rtl8152_close when the device is already gone.
The other calls to usb_kill_urb, cancel_delayed_work_sync, netif_stop_queue etc. seem to be fine. The urb and netdev are not destroyed yet.
Signed-off-by: Jiri Slaby jslaby@suse.cz Cc: linux-usb@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/r8152.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -3962,7 +3962,8 @@ static int rtl8152_close(struct net_devi #ifdef CONFIG_PM_SLEEP unregister_pm_notifier(&tp->pm_notifier); #endif - napi_disable(&tp->napi); + if (!test_bit(RTL8152_UNPLUG, &tp->flags)) + napi_disable(&tp->napi); clear_bit(WORK_ENABLE, &tp->flags); usb_kill_urb(tp->intr_urb); cancel_delayed_work_sync(&tp->schedule);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann jwi@linux.ibm.com
[ Upstream commit ce28867fd20c23cd769e78b4d619c4755bf71a1c ]
If qeth_qdio_output_handler() detects that a transmit requires async completion, it replaces the pending buffer's metadata object (qeth_qdio_out_buffer) so that this queue buffer can be re-used while the data is pending completion.
Later, when the CQ indicates async completion of such a metadata object, qeth_qdio_cq_handler() tries to free any data associated with this object (since HW has now completed the transfer). By calling qeth_clear_output_buffer(), it erroneously operates on the queue buffer that _previously_ belonged to this transfer ... but which has potentially been re-used several times by now. This results in double-frees of the buffer's data, and failing transmits as the buffer descriptor is scrubbed in mid-air.
The correct way of handling this situation is to 1. scrub the queue buffer when it is prepared for re-use, and 2. later obtain the data addresses from the async-completion notifier (i.e. the AOB), instead of from the queue buffer.
All this only affects qeth devices used for af_iucv HiperTransport.
Fixes: 0da9581ddb0f ("qeth: exploit asynchronous delivery of storage blocks") Signed-off-by: Julian Wiedmann jwi@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/net/qeth_core.h | 11 +++++++++++ drivers/s390/net/qeth_core_main.c | 22 ++++++++++++++++------ 2 files changed, 27 insertions(+), 6 deletions(-)
--- a/drivers/s390/net/qeth_core.h +++ b/drivers/s390/net/qeth_core.h @@ -831,6 +831,17 @@ struct qeth_trap_id { /*some helper functions*/ #define QETH_CARD_IFNAME(card) (((card)->dev)? (card)->dev->name : "")
+static inline void qeth_scrub_qdio_buffer(struct qdio_buffer *buf, + unsigned int elements) +{ + unsigned int i; + + for (i = 0; i < elements; i++) + memset(&buf->element[i], 0, sizeof(struct qdio_buffer_element)); + buf->element[14].sflags = 0; + buf->element[15].sflags = 0; +} + /** * qeth_get_elements_for_range() - find number of SBALEs to cover range. * @start: Start of the address range. --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -73,9 +73,6 @@ static void qeth_notify_skbs(struct qeth struct qeth_qdio_out_buffer *buf, enum iucv_tx_notify notification); static void qeth_release_skbs(struct qeth_qdio_out_buffer *buf); -static void qeth_clear_output_buffer(struct qeth_qdio_out_q *queue, - struct qeth_qdio_out_buffer *buf, - enum qeth_qdio_buffer_states newbufstate); static int qeth_init_qdio_out_buf(struct qeth_qdio_out_q *, int);
struct workqueue_struct *qeth_wq; @@ -488,6 +485,7 @@ static void qeth_qdio_handle_aob(struct struct qaob *aob; struct qeth_qdio_out_buffer *buffer; enum iucv_tx_notify notification; + unsigned int i;
aob = (struct qaob *) phys_to_virt(phys_aob_addr); QETH_CARD_TEXT(card, 5, "haob"); @@ -512,10 +510,18 @@ static void qeth_qdio_handle_aob(struct qeth_notify_skbs(buffer->q, buffer, notification);
buffer->aob = NULL; - qeth_clear_output_buffer(buffer->q, buffer, - QETH_QDIO_BUF_HANDLED_DELAYED); + /* Free dangling allocations. The attached skbs are handled by + * qeth_cleanup_handled_pending(). + */ + for (i = 0; + i < aob->sb_count && i < QETH_MAX_BUFFER_ELEMENTS(card); + i++) { + if (aob->sba[i] && buffer->is_header[i]) + kmem_cache_free(qeth_core_header_cache, + (void *) aob->sba[i]); + } + atomic_set(&buffer->state, QETH_QDIO_BUF_HANDLED_DELAYED);
- /* from here on: do not touch buffer anymore */ qdio_release_aob(aob); }
@@ -3759,6 +3765,10 @@ void qeth_qdio_output_handler(struct ccw QETH_CARD_TEXT(queue->card, 5, "aob"); QETH_CARD_TEXT_(queue->card, 5, "%lx", virt_to_phys(buffer->aob)); + + /* prepare the queue slot for re-use: */ + qeth_scrub_qdio_buffer(buffer->buffer, + QETH_MAX_BUFFER_ELEMENTS(card)); if (qeth_init_qdio_out_buf(queue, bidx)) { QETH_CARD_TEXT(card, 2, "outofbuf"); qeth_schedule_recovery(card);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bhadram Varka vbhadram@nvidia.com
[ Upstream commit b6cfffa7ad923c73f317ea50fd4ebcb3b4b6669c ]
HW does not support half-duplex mode in a multi-queue scenario. Fix it by not advertising half-duplex mode if multi-queue is enabled.
Signed-off-by: Bhadram Varka vbhadram@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -927,6 +927,7 @@ static void stmmac_check_pcs_mode(struct static int stmmac_init_phy(struct net_device *dev) { struct stmmac_priv *priv = netdev_priv(dev); + u32 tx_cnt = priv->plat->tx_queues_to_use; struct phy_device *phydev; char phy_id_fmt[MII_BUS_ID_SIZE + 3]; char bus_id[MII_BUS_ID_SIZE]; @@ -968,6 +969,15 @@ static int stmmac_init_phy(struct net_de SUPPORTED_1000baseT_Full);
/* + * Half-duplex mode not supported with multiqueue + * half-duplex can only works with single queue + */ + if (tx_cnt > 1) + phydev->supported &= ~(SUPPORTED_1000baseT_Half | + SUPPORTED_100baseT_Half | + SUPPORTED_10baseT_Half); + + /* * Broken HW is sometimes missing the pull-up resistor on the * MDIO line, which results in reads to non-existent devices returning * 0 rather than 0xffff. Catch this here and treat 0 as a non-existent
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Doron Roberts-Kedes doronrk@fb.com
[ Upstream commit 977c7114ebda2e746a114840d3a875e0cdb826fb ]
On receiving an incomplete message, the existing code stores the remaining length of the cloned skb in the early_eaten field instead of incrementing the value returned by __strp_recv. This defers invocation of sock_rfree for the current skb until the next invocation of __strp_recv, which returns early_eaten if early_eaten is non-zero.
This behavior causes a stall when the current message occupies the very tail end of a massive skb, and strp_peek/need_bytes indicates that the remainder of the current message has yet to arrive on the socket. The TCP receive buffer is totally full, causing the TCP window to go to zero, so the remainder of the message will never arrive.
Incrementing the value returned by __strp_recv by the amount otherwise stored in early_eaten prevents stalls of this nature.
Signed-off-by: Doron Roberts-Kedes doronrk@fb.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/strparser/strparser.c | 17 +---------------- 1 file changed, 1 insertion(+), 16 deletions(-)
--- a/net/strparser/strparser.c +++ b/net/strparser/strparser.c @@ -35,7 +35,6 @@ struct _strp_msg { */ struct strp_msg strp; int accum_len; - int early_eaten; };
static inline struct _strp_msg *_strp_msg(struct sk_buff *skb) @@ -115,20 +114,6 @@ static int __strp_recv(read_descriptor_t head = strp->skb_head; if (head) { /* Message already in progress */ - - stm = _strp_msg(head); - if (unlikely(stm->early_eaten)) { - /* Already some number of bytes on the receive sock - * data saved in skb_head, just indicate they - * are consumed. - */ - eaten = orig_len <= stm->early_eaten ? - orig_len : stm->early_eaten; - stm->early_eaten -= eaten; - - return eaten; - } - if (unlikely(orig_offset)) { /* Getting data with a non-zero offset when a message is * in progress is not expected. If it does happen, we @@ -297,9 +282,9 @@ static int __strp_recv(read_descriptor_t }
stm->accum_len += cand_len; + eaten += cand_len; strp->need_bytes = stm->strp.full_len - stm->accum_len; - stm->early_eaten = cand_len; STRP_STATS_ADD(strp->stats.bytes, cand_len); desc->count = 0; /* Stop reading socket */ break;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuchung Cheng ycheng@google.com
[ Upstream commit c860e997e9170a6d68f9d1e6e2cf61f572191aaf ]
The Fast Open key could be stored in a different byte order depending on the CPU. Previously, hosts of different endianness in a server farm using the same key config (sysctl value) would produce different cookies. This patch fixes it by always storing the key as little endian, keeping the same API for LE hosts.
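To illustrate the idea, here is a minimal user-space sketch (not the kernel code; glibc's htole32()/le32toh() stand in for cpu_to_le32()/le32_to_cpu(), and all names are illustrative): the stored byte sequence becomes identical on little- and big-endian hosts, while the value echoed back to userspace is unchanged.

#include <endian.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        uint32_t user_key[4] = { 0x01020304, 0x05060708,
                                 0x090a0b0c, 0x0d0e0f10 };
        uint32_t stored[4];     /* models what would live in the kernel context */
        uint32_t shown[4];      /* models what proc would print back */
        unsigned int i;

        /* write path: fix the byte order before storing */
        for (i = 0; i < 4; i++)
                stored[i] = htole32(user_key[i]);

        /* read path: undo it before formatting for userspace */
        for (i = 0; i < 4; i++)
                shown[i] = le32toh(stored[i]);

        /* the stored byte sequence is the same on LE and BE hosts */
        for (i = 0; i < sizeof(stored); i++)
                printf("%02x", ((unsigned char *)stored)[i]);
        printf("\n");

        /* and the round trip is lossless for the user */
        printf("%s\n", memcmp(shown, user_key, sizeof(shown)) ?
               "mismatch" : "match");
        return 0;
}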
Reported-by: Daniele Iamartino danielei@google.com Signed-off-by: Yuchung Cheng ycheng@google.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: Neal Cardwell ncardwell@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/sysctl_net_ipv4.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
--- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -263,8 +263,9 @@ static int proc_tcp_fastopen_key(struct ipv4.sysctl_tcp_fastopen); struct ctl_table tbl = { .maxlen = (TCP_FASTOPEN_KEY_LENGTH * 2 + 10) }; struct tcp_fastopen_context *ctxt; - int ret; u32 user_key[4]; /* 16 bytes, matching TCP_FASTOPEN_KEY_LENGTH */ + __le32 key[4]; + int ret, i;
tbl.data = kmalloc(tbl.maxlen, GFP_KERNEL); if (!tbl.data) @@ -273,11 +274,14 @@ static int proc_tcp_fastopen_key(struct rcu_read_lock(); ctxt = rcu_dereference(net->ipv4.tcp_fastopen_ctx); if (ctxt) - memcpy(user_key, ctxt->key, TCP_FASTOPEN_KEY_LENGTH); + memcpy(key, ctxt->key, TCP_FASTOPEN_KEY_LENGTH); else - memset(user_key, 0, sizeof(user_key)); + memset(key, 0, sizeof(key)); rcu_read_unlock();
+ for (i = 0; i < ARRAY_SIZE(key); i++) + user_key[i] = le32_to_cpu(key[i]); + snprintf(tbl.data, tbl.maxlen, "%08x-%08x-%08x-%08x", user_key[0], user_key[1], user_key[2], user_key[3]); ret = proc_dostring(&tbl, write, buffer, lenp, ppos); @@ -288,13 +292,17 @@ static int proc_tcp_fastopen_key(struct ret = -EINVAL; goto bad_key; } - tcp_fastopen_reset_cipher(net, NULL, user_key, + + for (i = 0; i < ARRAY_SIZE(user_key); i++) + key[i] = cpu_to_le32(user_key[i]); + + tcp_fastopen_reset_cipher(net, NULL, key, TCP_FASTOPEN_KEY_LENGTH); }
bad_key: pr_debug("proc FO key set 0x%x-%x-%x-%x <- 0x%s: %u\n", - user_key[0], user_key[1], user_key[2], user_key[3], + user_key[0], user_key[1], user_key[2], user_key[3], (char *)tbl.data, ret); kfree(tbl.data); return ret;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Wang jasowang@redhat.com
[ Upstream commit b8f1f65882f07913157c44673af7ec0b308d03eb ]
Sock will be NULL if we pass -1 to vhost_net_set_backend(), but when we hit errors during ubuf allocation, the code does not check for NULL before calling sockfd_put(), which leads to a NULL dereference. Fix this by checking the sock pointer first.
Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support") Reported-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/vhost/net.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -1219,7 +1219,8 @@ err_used: if (ubufs) vhost_net_ubuf_put_wait_and_free(ubufs); err_ubufs: - sockfd_put(sock); + if (sock) + sockfd_put(sock); err_vq: mutex_unlock(&vq->mutex); err:
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Claudio Imbrenda imbrenda@linux.vnet.ibm.com
[ Upstream commit e5ab564c9ebee77794842ca7d7476147b83d6a27 ]
The dst_cid and src_cid are 64 bits wide, therefore 64-bit accessors should be used, and in fact in virtio_transport_common.c only 64-bit accessors are used. Using 32-bit accessors for 64-bit values breaks big-endian systems.
This patch fixes a wrong use of le32_to_cpu in virtio_transport_send_pkt.
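A minimal user-space sketch of the bug class (variable names are illustrative; htole64()/le64toh()/le32toh() stand in for the kernel accessors): truncating a little-endian 64-bit field to 32 bits before the byte swap happens to work for small CIDs on little-endian hosts, but returns garbage on big-endian ones.

#include <endian.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint64_t cid = 3;               /* a guest CID that fits easily in 32 bits */
        uint64_t wire = htole64(cid);   /* models the __le64 header field */

        /* correct: 64-bit accessor for a 64-bit field */
        printf("le64 decode: %llu\n", (unsigned long long)le64toh(wire));

        /* buggy: implicit truncation to 32 bits, then a 32-bit byte swap;
         * prints 3 on a little-endian host but 0 on a big-endian one
         */
        printf("le32 decode: %lu\n", (unsigned long)le32toh((uint32_t)wire));

        return 0;
}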
Fixes: b9116823189e85ccf384 ("VSOCK: add loopback to virtio_transport")
Signed-off-by: Claudio Imbrenda imbrenda@linux.vnet.ibm.com Reviewed-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/vmw_vsock/virtio_transport.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -201,7 +201,7 @@ virtio_transport_send_pkt(struct virtio_ return -ENODEV; }
- if (le32_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) + if (le64_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) return virtio_transport_send_pkt_loopback(vsock, pkt);
if (pkt->reply)
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wei Yongjun weiyongjun1@huawei.com
[ Upstream commit 82be2ab159a3a0ae4024b946a31f12b221f6c8ff ]
The following warning is seen when running rmmod hinic. This is because the affinity value is not reset before calling free_irq(). This patch fixes it.
[ 55.181232] WARNING: CPU: 38 PID: 19589 at kernel/irq/manage.c:1608 __free_irq+0x2aa/0x2c0
Fixes: 352f58b0d9f2 ("net-next/hinic: Set Rxq irq to specific cpu for NUMA") Signed-off-by: Wei Yongjun weiyongjun1@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/huawei/hinic/hinic_rx.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/ethernet/huawei/hinic/hinic_rx.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_rx.c @@ -439,6 +439,7 @@ static void rx_free_irq(struct hinic_rxq { struct hinic_rq *rq = rxq->rq;
+ irq_set_affinity_hint(rq->irq, NULL); free_irq(rq->irq, rxq); rx_del_napi(rxq); }
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pieter Jansen van Vuuren pieter.jansenvanvuuren@netronome.com
[ Upstream commit a64119415ff248efa61301783bc26551df5dabf6 ]
Previously it was not possible to distinguish between MPLS ether types and other ether types, which leads to incorrect classification of offloaded filters that match on an MPLS ether type. For example, the following two filters overlap:
# tc filter add dev eth0 parent ffff: \
    protocol 0x8847 flower \
    action mirred egress redirect dev eth1
# tc filter add dev eth0 parent ffff: \
    protocol 0x0800 flower \
    action mirred egress redirect dev eth2
The driver now correctly includes the mac_mpls layer, where the HW stores MPLS fields, when it detects an MPLS ether type. It also sets the MPLS_Q bit to indicate that the filter should match MPLS packets.
Fixes: bb055c198d9b ("nfp: add mpls match offloading support") Signed-off-by: Pieter Jansen van Vuuren pieter.jansenvanvuuren@netronome.com Reviewed-by: Simon Horman simon.horman@netronome.com Reviewed-by: Jakub Kicinski jakub.kicinski@netronome.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/netronome/nfp/flower/match.c | 14 ++++++++++++++ drivers/net/ethernet/netronome/nfp/flower/offload.c | 8 ++++++++ 2 files changed, 22 insertions(+)
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c +++ b/drivers/net/ethernet/netronome/nfp/flower/match.c @@ -123,6 +123,20 @@ nfp_flower_compile_mac(struct nfp_flower NFP_FLOWER_MASK_MPLS_Q;
frame->mpls_lse = cpu_to_be32(t_mpls); + } else if (dissector_uses_key(flow->dissector, + FLOW_DISSECTOR_KEY_BASIC)) { + /* Check for mpls ether type and set NFP_FLOWER_MASK_MPLS_Q + * bit, which indicates an mpls ether type but without any + * mpls fields. + */ + struct flow_dissector_key_basic *key_basic; + + key_basic = skb_flow_dissector_target(flow->dissector, + FLOW_DISSECTOR_KEY_BASIC, + flow->key); + if (key_basic->n_proto == cpu_to_be16(ETH_P_MPLS_UC) || + key_basic->n_proto == cpu_to_be16(ETH_P_MPLS_MC)) + frame->mpls_lse = cpu_to_be32(NFP_FLOWER_MASK_MPLS_Q); } }
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c @@ -264,6 +264,14 @@ nfp_flower_calculate_key_layers(struct n case cpu_to_be16(ETH_P_ARP): return -EOPNOTSUPP;
+ case cpu_to_be16(ETH_P_MPLS_UC): + case cpu_to_be16(ETH_P_MPLS_MC): + if (!(key_layer & NFP_FLOWER_LAYER_MAC)) { + key_layer |= NFP_FLOWER_LAYER_MAC; + key_size += sizeof(struct nfp_flower_mac_mpls); + } + break; + /* Will be included in layer 2. */ case cpu_to_be16(ETH_P_8021Q): break;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Belloni alexandre.belloni@bootlin.com
[ Upstream commit fec9d3b1dc4c481f20f5d2f5aef3ad1cb7504186 ]
The macb driver currently crashes on at91rm9200 with the following trace:
Unable to handle kernel NULL pointer dereference at virtual address 00000014
[...]
[<c031da44>] (macb_rx_desc) from [<c031f2bc>] (at91ether_open+0x2e8/0x3f8)
[<c031f2bc>] (at91ether_open) from [<c041e8d8>] (__dev_open+0x120/0x13c)
[<c041e8d8>] (__dev_open) from [<c041ec08>] (__dev_change_flags+0x17c/0x1a8)
[<c041ec08>] (__dev_change_flags) from [<c041ec4c>] (dev_change_flags+0x18/0x4c)
[<c041ec4c>] (dev_change_flags) from [<c07a5f4c>] (ip_auto_config+0x220/0x10b0)
[<c07a5f4c>] (ip_auto_config) from [<c000a4fc>] (do_one_initcall+0x78/0x18c)
[<c000a4fc>] (do_one_initcall) from [<c0783e50>] (kernel_init_freeable+0x184/0x1c4)
[<c0783e50>] (kernel_init_freeable) from [<c0574d70>] (kernel_init+0x8/0xe8)
[<c0574d70>] (kernel_init) from [<c00090e0>] (ret_from_fork+0x14/0x34)
Solve that by initializing bp->queues[0].bp in at91ether_init (as is done in macb_init).
Fixes: ae1f2a56d273 ("net: macb: Added support for many RX queues") Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Acked-by: Nicolas Ferre nicolas.ferre@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/cadence/macb_main.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -3732,6 +3732,8 @@ static int at91ether_init(struct platfor int err; u32 reg;
+ bp->queues[0].bp = bp; + dev->netdev_ops = &at91ether_netdev_ops; dev->ethtool_ops = &macb_ethtool_ops;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 3f76df198288ceec92fc9eddecad1e73c52769b0 ]
As noticed by Eric, we need to switch to the helper dev_change_tx_queue_len() for the SIOCSIFTXQLEN call path too, otherwise we still miss dev_qdisc_change_tx_queue_len().
Fixes: 6a643ddb5624 ("net: introduce helper dev_change_tx_queue_len()") Reported-by: Eric Dumazet edumazet@google.com Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/dev_ioctl.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-)
--- a/net/core/dev_ioctl.c +++ b/net/core/dev_ioctl.c @@ -285,16 +285,9 @@ static int dev_ifsioc(struct net *net, s if (ifr->ifr_qlen < 0) return -EINVAL; if (dev->tx_queue_len ^ ifr->ifr_qlen) { - unsigned int orig_len = dev->tx_queue_len; - - dev->tx_queue_len = ifr->ifr_qlen; - err = call_netdevice_notifiers( - NETDEV_CHANGE_TX_QUEUE_LEN, dev); - err = notifier_to_errno(err); - if (err) { - dev->tx_queue_len = orig_len; + err = dev_change_tx_queue_len(dev, ifr->ifr_qlen); + if (err) return err; - } } return 0;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Hurley john.hurley@netronome.com
[ Upstream commit 951a8ee6def39e25d0e60b9394e5a249ba8b2390 ]
TC shared blocks allow multiple qdiscs to be grouped together and filters shared between them. Currently the chains of filters attached to a block are only flushed when the block is removed. If a qdisc is removed from a block but the block still exists, flow del messages are not passed to the callback registered for that qdisc. For the NFP, this presents the possibility of rules still existing in hw when they should be removed.
Prevent binding to shared blocks until the kernel can send per qdisc del messages when block unbinds occur.
tcf_block_shared() was not used outside of the core until now, so also add an empty implementation for builds with CONFIG_NET_CLS=n.
Fixes: 4861738775d7 ("net: sched: introduce shared filter blocks infrastructure") Signed-off-by: John Hurley john.hurley@netronome.com Signed-off-by: Jakub Kicinski jakub.kicinski@netronome.com Reviewed-by: Simon Horman simon.horman@netronome.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/netronome/nfp/bpf/main.c | 3 +++ drivers/net/ethernet/netronome/nfp/flower/offload.c | 3 +++ include/net/pkt_cls.h | 5 +++++ 3 files changed, 11 insertions(+)
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c @@ -194,6 +194,9 @@ static int nfp_bpf_setup_tc_block(struct if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS) return -EOPNOTSUPP;
+ if (tcf_block_shared(f->block)) + return -EOPNOTSUPP; + switch (f->command) { case TC_BLOCK_BIND: return tcf_block_cb_register(f->block, --- a/drivers/net/ethernet/netronome/nfp/flower/offload.c +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c @@ -601,6 +601,9 @@ static int nfp_flower_setup_tc_block(str if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS) return -EOPNOTSUPP;
+ if (tcf_block_shared(f->block)) + return -EOPNOTSUPP; + switch (f->command) { case TC_BLOCK_BIND: return tcf_block_cb_register(f->block, --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -111,6 +111,11 @@ void tcf_block_put_ext(struct tcf_block { }
+static inline bool tcf_block_shared(struct tcf_block *block) +{ + return false; +} + static inline struct Qdisc *tcf_block_q(struct tcf_block *block) { return NULL;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ross Lagerwall ross.lagerwall@citrix.com
[ Upstream commit cb257783c2927b73614b20f915a91ff78aa6f3e8 ]
Fixes: f599c64fdf7d ("xen-netfront: Fix race between device setup and open") Reported-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Ross Lagerwall ross.lagerwall@citrix.com Reviewed-by: Juergen Gross jgross@suse.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/xen-netfront.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -1810,7 +1810,7 @@ static int talk_to_netback(struct xenbus err = xen_net_read_mac(dev, info->netdev->dev_addr); if (err) { xenbus_dev_fatal(dev, err, "parsing %s/mac", dev->nodename); - goto out; + goto out_unlocked; }
rtnl_lock(); @@ -1925,6 +1925,7 @@ abort_transaction_no_dev_fatal: xennet_destroy_queues(info); out: rtnl_unlock(); +out_unlocked: device_unregister(&dev->dev); return err; }
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ross Lagerwall ross.lagerwall@citrix.com
[ Upstream commit 45c8184c1bed1ca8a7f02918552063a00b909bf5 ]
Update the features after calling register_netdev(), otherwise the device features are not set up correctly and it is not possible to change the MTU of the device. After this change, the features reported by ethtool match the device's features from before the commit that introduced the issue, and it is possible to change the device's MTU.
Fixes: f599c64fdf7d ("xen-netfront: Fix race between device setup and open") Reported-by: Liam Shepherd liam@dancer.es Signed-off-by: Ross Lagerwall ross.lagerwall@citrix.com Reviewed-by: Juergen Gross jgross@suse.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/xen-netfront.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -1951,10 +1951,6 @@ static int xennet_connect(struct net_dev /* talk_to_netback() sets the correct number of queues */ num_queues = dev->real_num_tx_queues;
- rtnl_lock(); - netdev_update_features(dev); - rtnl_unlock(); - if (dev->reg_state == NETREG_UNINITIALIZED) { err = register_netdev(dev); if (err) { @@ -1964,6 +1960,10 @@ static int xennet_connect(struct net_dev } }
+ rtnl_lock(); + netdev_update_features(dev); + rtnl_unlock(); + /* * All public and private state should now be sane. Get * ready to start sending and receiving packets and give the driver
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Govindarajulu Varadarajan gvaradar@cisco.com
[ Upstream commit 56f772279a762984f6e9ebbf24a7c829faba5712 ]
In the failure path, err is overwritten with whatever vnic_rq_disable() returns. If that returns 0, enic_open() reports success even though an error occurred.
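Sketched in plain user-space C (the function names are illustrative stand-ins, not the driver's API), the pattern of the fix is to give the cleanup loop its own return variable so a successful cleanup step cannot overwrite the error the caller must see:

#include <errno.h>
#include <stdio.h>

static int disable_queue(int i)         /* stand-in for vnic_rq_disable() */
{
        (void)i;
        return 0;                       /* cleanup happens to succeed */
}

static void clean_queue(int i)          /* stand-in for vnic_rq_clean() */
{
        (void)i;
}

static int open_device(void)
{
        int err = -ENOMEM;              /* pretend a later setup step failed */
        int i, ret;

        /* error path: tear everything down, but keep 'err' intact */
        for (i = 0; i < 4; i++) {
                ret = disable_queue(i); /* a separate variable ... */
                if (!ret)               /* ... so a 0 here cannot turn err into 0 */
                        clean_queue(i);
        }
        return err;
}

int main(void)
{
        printf("open_device() = %d\n", open_device());  /* still -ENOMEM */
        return 0;
}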
Reported-by: Ben Hutchings ben.hutchings@codethink.co.uk Fixes: e8588e268509 ("enic: enable rq before updating rq descriptors") Signed-off-by: Govindarajulu Varadarajan gvaradar@cisco.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/cisco/enic/enic_main.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
--- a/drivers/net/ethernet/cisco/enic/enic_main.c +++ b/drivers/net/ethernet/cisco/enic/enic_main.c @@ -1920,7 +1920,7 @@ static int enic_open(struct net_device * { struct enic *enic = netdev_priv(netdev); unsigned int i; - int err; + int err, ret;
err = enic_request_intr(enic); if (err) { @@ -1977,10 +1977,9 @@ static int enic_open(struct net_device *
err_out_free_rq: for (i = 0; i < enic->rq_count; i++) { - err = vnic_rq_disable(&enic->rq[i]); - if (err) - return err; - vnic_rq_clean(&enic->rq[i], enic_free_rq_buf); + ret = vnic_rq_disable(&enic->rq[i]); + if (!ret) + vnic_rq_clean(&enic->rq[i], enic_free_rq_buf); } enic_dev_notify_unset(enic); err_out_free_intr:
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Or Gerlitz ogerlitz@mellanox.com
[ Upstream commit aff2252a2ad3844ca47bf2f18af071101baace40 ]
In a SmartNIC environment, the host (PF) driver might not be the e-switch manager, hence the switchdev mode representors are running on the embedded CPU (EC) and not at the host.

As such, we should avoid dealing with vport representors if we are not the e-switch manager.
Fixes: b5ca15ad7e61 ('IB/mlx5: Add proper representors support') Signed-off-by: Or Gerlitz ogerlitz@mellanox.com Reviewed-by: Eli Cohen eli@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/infiniband/hw/mlx5/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -5736,7 +5736,7 @@ static void *mlx5_ib_add(struct mlx5_cor dev->num_ports = max(MLX5_CAP_GEN(mdev, num_ports), MLX5_CAP_GEN(mdev, num_vhca_ports));
- if (MLX5_VPORT_MANAGER(mdev) && + if (MLX5_ESWITCH_MANAGER(mdev) && mlx5_ib_eswitch_mode(mdev->priv.eswitch) == SRIOV_OFFLOADS) { dev->rep = mlx5_ib_vport_rep(mdev->priv.eswitch, 0);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann jwi@linux.ibm.com
[ Upstream commit 4664610537d398d55be19432f9cd9c29c831e159 ]
This reverts commit b7493e91c11a757cf0f8ab26989642ee4bb2c642.
On its own, querying RDEV for a MAC address works fine. But when upgrading from a qeth that previously queried DDEV on a z/VM NIC (i.e. any kernel with commit ec61bd2fd2a2), the RDEV query now returns a _different_ MAC address than the DDEV query.

If the NIC is configured with MACPROTECT, z/VM apparently requires us to use the MAC that was initially returned (on DDEV) and registered. So after upgrading to a kernel that uses RDEV, the SETVMAC registration cmd for the new MAC address fails and we end up with a non-operable interface.
To avoid regressions on upgrade, switch back to using DDEV for the MAC address query. The downgrade path (first RDEV, later DDEV) is fine, in this case both queries return the same MAC address.
Fixes: b7493e91c11a ("s390/qeth: use Read device to query hypervisor for MAC") Reported-by: Michal Kubecek mkubecek@suse.com Tested-by: Karsten Graul kgraul@linux.ibm.com Signed-off-by: Julian Wiedmann jwi@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/net/qeth_core_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -4845,7 +4845,7 @@ int qeth_vm_request_mac(struct qeth_card goto out; }
- ccw_device_get_id(CARD_RDEV(card), &id); + ccw_device_get_id(CARD_DDEV(card), &id); request->resp_buf_len = sizeof(*response); request->resp_version = DIAG26C_VERSION2; request->op_code = DIAG26C_GET_MAC;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasily Gorbik gor@linux.ibm.com
[ Upstream commit 9d0a58fb9747afd27d490c02a97889a1b59f6be4 ]
*ether_addr*_64bits functions have been introduced to optimize performance-critical paths; they access the 6-byte ethernet address as a u64 value to get "nice" assembly. This harmless hack works nicely on ethernet addresses shoved into a structure or a larger buffer, until busted by KASAN on something like a plain (u8 *)[6].
qeth_l2_set_mac_address calls qeth_l2_remove_mac passing u8 old_addr[ETH_ALEN] as an argument.
Adding/removing MACs for an ethernet adapter is not that performance critical. Moreover, is_multicast_ether_addr_64bits itself is not faster than is_multicast_ether_addr on s390:
is_multicast_ether_addr(%r2) -> %r2
        llc     %r2,0(%r2)
        risbg   %r2,%r2,63,191,0

is_multicast_ether_addr_64bits(%r2) -> %r2
        llgc    %r2,0(%r2)
        risbg   %r2,%r2,63,191,0
So, let's just use is_multicast_ether_addr instead of is_multicast_ether_addr_64bits.
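For illustration, a small user-space sketch of why the _64bits variants are only safe with padded storage (a simplified model, not the kernel implementation): they load 8 bytes starting at the address, so a bare u8[6] gets read 2 bytes out of bounds.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* plain check: touches exactly the 6 address bytes */
static int is_multicast(const uint8_t *addr)
{
        return addr[0] & 0x01;
}

/* "64-bit" style check: loads 8 bytes starting at addr, so the address
 * must be followed by at least 2 bytes of readable padding
 */
static int is_multicast_64bits(const uint8_t addr[6 + 2])
{
        uint64_t v;

        memcpy(&v, addr, sizeof(v));    /* reads addr[0..7] */
        return ((const uint8_t *)&v)[0] & 0x01;
}

int main(void)
{
        struct { uint8_t mac[6]; uint8_t pad[2]; } padded = {
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x01 }, { 0, 0 }
        };
        uint8_t bare[6] = { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x01 };

        printf("%d %d\n", is_multicast(bare),           /* fine */
               is_multicast_64bits(padded.mac));        /* fine: padding exists */

        /* is_multicast_64bits(bare) would read 2 bytes past the array,
         * which is exactly the kind of access KASAN flags for a plain u8[6]
         */
        return 0;
}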
Fixes: bcacfcbc82b4 ("s390/qeth: fix MAC address update sequence") Reviewed-by: Julian Wiedmann jwi@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Julian Wiedmann jwi@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/net/qeth_l2_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -141,7 +141,7 @@ static int qeth_l2_send_setmac(struct qe
static int qeth_l2_write_mac(struct qeth_card *card, u8 *mac) { - enum qeth_ipa_cmds cmd = is_multicast_ether_addr_64bits(mac) ? + enum qeth_ipa_cmds cmd = is_multicast_ether_addr(mac) ? IPA_CMD_SETGMAC : IPA_CMD_SETVMAC; int rc;
@@ -158,7 +158,7 @@ static int qeth_l2_write_mac(struct qeth
static int qeth_l2_remove_mac(struct qeth_card *card, u8 *mac) { - enum qeth_ipa_cmds cmd = is_multicast_ether_addr_64bits(mac) ? + enum qeth_ipa_cmds cmd = is_multicast_ether_addr(mac) ? IPA_CMD_DELGMAC : IPA_CMD_DELVMAC; int rc;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann jwi@linux.ibm.com
[ Upstream commit 4789a21880488048105590049fc41a99f53d565d ]
When qeth_l2_set_mac_address() finds the card in a non-reachable state, it merely copies the new MAC address into dev->dev_addr so that __qeth_l2_set_online() can later register it with the HW.
But __qeth_l2_set_online() may very well be running concurrently, so we can't trust the card state without appropriate locking: If the online sequence is past the point where it registers dev->dev_addr (but not yet in SOFTSETUP state), any address change needs to be properly programmed into the HW. Otherwise the netdevice ends up with a different MAC address than what's set in the HW, and inbound traffic is not forwarded as expected.
This is most likely to occur for OSD in LPAR, where commit 21b1702af12e ("s390/qeth: improve fallback to random MAC address") now triggers e.g. systemd to immediately change the MAC when the netdevice is registered with a NET_ADDR_RANDOM address.
Fixes: bcacfcbc82b4 ("s390/qeth: fix MAC address update sequence") Signed-off-by: Julian Wiedmann jwi@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/net/qeth_l2_main.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
--- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -523,27 +523,34 @@ static int qeth_l2_set_mac_address(struc return -ERESTARTSYS; }
+ /* avoid racing against concurrent state change: */ + if (!mutex_trylock(&card->conf_mutex)) + return -EAGAIN; + if (!qeth_card_hw_is_reachable(card)) { ether_addr_copy(dev->dev_addr, addr->sa_data); - return 0; + goto out_unlock; }
/* don't register the same address twice */ if (ether_addr_equal_64bits(dev->dev_addr, addr->sa_data) && (card->info.mac_bits & QETH_LAYER2_MAC_REGISTERED)) - return 0; + goto out_unlock;
/* add the new address, switch over, drop the old */ rc = qeth_l2_send_setmac(card, addr->sa_data); if (rc) - return rc; + goto out_unlock; ether_addr_copy(old_addr, dev->dev_addr); ether_addr_copy(dev->dev_addr, addr->sa_data);
if (card->info.mac_bits & QETH_LAYER2_MAC_REGISTERED) qeth_l2_remove_mac(card, old_addr); card->info.mac_bits |= QETH_LAYER2_MAC_REGISTERED; - return 0; + +out_unlock: + mutex_unlock(&card->conf_mutex); + return rc; }
static void qeth_promisc_to_bridge(struct qeth_card *card)
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bert Kenward bkenward@solarflare.com
[ Upstream commit cafb39600e7a73263122a0e2db052d691686378f ]
Fixes: fc7a6c287ff3 ("sfc: use a semaphore to lock farch filters too") Suggested-by: Joseph Korty joe.korty@concurrent-rt.com Signed-off-by: Bert Kenward bkenward@solarflare.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/sfc/farch.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/ethernet/sfc/farch.c +++ b/drivers/net/ethernet/sfc/farch.c @@ -2794,6 +2794,7 @@ int efx_farch_filter_table_probe(struct if (!state) return -ENOMEM; efx->filter_state = state; + init_rwsem(&state->lock);
table = &state->table[EFX_FARCH_FILTER_TABLE_RX_IP]; table->id = EFX_FARCH_FILTER_TABLE_RX_IP;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jesper Dangaard Brouer brouer@redhat.com
[ Upstream commit 2471c75efed32529698c26da499954f0253cb401 ]
The driver was combining the XDP_TX virtqueue_kick and the XDP_REDIRECT map flushing (xdp_do_flush_map). This is suboptimal; these two flush operations should be kept separate.
The suboptimal behavior was introduced in commit 9267c430c6b6 ("virtio-net: add missing virtqueue kick when flushing packets").
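A small user-space sketch of the pattern (the flag names mirror the patch, everything else is illustrative): per-packet handlers OR a flag into an unsigned int, and after the receive loop each kind of flush runs once, and only if its flag is set.

#include <stdio.h>

#define XDP_TX    (1u << 0)     /* at least one packet was sent via XDP_TX */
#define XDP_REDIR (1u << 1)     /* at least one packet was redirected */

static void kick_tx_queue(void)  { puts("virtqueue kick"); }
static void flush_redirect(void) { puts("redirect map flush"); }

int main(void)
{
        /* pretend the receive loop saw two XDP_TX packets and one redirect */
        unsigned int xdp_xmit = 0;

        xdp_xmit |= XDP_TX;
        xdp_xmit |= XDP_TX;
        xdp_xmit |= XDP_REDIR;

        /* after the loop: each flush happens once, and only if needed */
        if (xdp_xmit & XDP_REDIR)
                flush_redirect();
        if (xdp_xmit & XDP_TX)
                kick_tx_queue();

        return 0;
}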
Fixes: 9267c430c6b6 ("virtio-net: add missing virtqueue kick when flushing packets") Signed-off-by: Jesper Dangaard Brouer brouer@redhat.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/virtio_net.c | 30 +++++++++++++++++++----------- 1 file changed, 19 insertions(+), 11 deletions(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -50,6 +50,10 @@ module_param(napi_tx, bool, 0644); /* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */ #define VIRTIO_XDP_HEADROOM 256
+/* Separating two types of XDP xmit */ +#define VIRTIO_XDP_TX BIT(0) +#define VIRTIO_XDP_REDIR BIT(1) + /* RX packet size EWMA. The average packet size is used to determine the packet * buffer size when refilling RX rings. As the entire RX ring may be refilled * at once, the weight is chosen so that the EWMA will be insensitive to short- @@ -547,7 +551,7 @@ static struct sk_buff *receive_small(str struct receive_queue *rq, void *buf, void *ctx, unsigned int len, - bool *xdp_xmit) + unsigned int *xdp_xmit) { struct sk_buff *skb; struct bpf_prog *xdp_prog; @@ -615,14 +619,14 @@ static struct sk_buff *receive_small(str trace_xdp_exception(vi->dev, xdp_prog, act); goto err_xdp; } - *xdp_xmit = true; + *xdp_xmit |= VIRTIO_XDP_TX; rcu_read_unlock(); goto xdp_xmit; case XDP_REDIRECT: err = xdp_do_redirect(dev, &xdp, xdp_prog); if (err) goto err_xdp; - *xdp_xmit = true; + *xdp_xmit |= VIRTIO_XDP_REDIR; rcu_read_unlock(); goto xdp_xmit; default: @@ -684,7 +688,7 @@ static struct sk_buff *receive_mergeable void *buf, void *ctx, unsigned int len, - bool *xdp_xmit) + unsigned int *xdp_xmit) { struct virtio_net_hdr_mrg_rxbuf *hdr = buf; u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers); @@ -772,7 +776,7 @@ static struct sk_buff *receive_mergeable put_page(xdp_page); goto err_xdp; } - *xdp_xmit = true; + *xdp_xmit |= VIRTIO_XDP_REDIR; if (unlikely(xdp_page != page)) put_page(page); rcu_read_unlock(); @@ -784,7 +788,7 @@ static struct sk_buff *receive_mergeable put_page(xdp_page); goto err_xdp; } - *xdp_xmit = true; + *xdp_xmit |= VIRTIO_XDP_TX; if (unlikely(xdp_page != page)) put_page(page); rcu_read_unlock(); @@ -893,7 +897,8 @@ xdp_xmit: }
static int receive_buf(struct virtnet_info *vi, struct receive_queue *rq, - void *buf, unsigned int len, void **ctx, bool *xdp_xmit) + void *buf, unsigned int len, void **ctx, + unsigned int *xdp_xmit) { struct net_device *dev = vi->dev; struct sk_buff *skb; @@ -1186,7 +1191,8 @@ static void refill_work(struct work_stru } }
-static int virtnet_receive(struct receive_queue *rq, int budget, bool *xdp_xmit) +static int virtnet_receive(struct receive_queue *rq, int budget, + unsigned int *xdp_xmit) { struct virtnet_info *vi = rq->vq->vdev->priv; unsigned int len, received = 0, bytes = 0; @@ -1275,7 +1281,7 @@ static int virtnet_poll(struct napi_stru struct virtnet_info *vi = rq->vq->vdev->priv; struct send_queue *sq; unsigned int received, qp; - bool xdp_xmit = false; + unsigned int xdp_xmit = 0;
virtnet_poll_cleantx(rq);
@@ -1285,12 +1291,14 @@ static int virtnet_poll(struct napi_stru if (received < budget) virtqueue_napi_complete(napi, rq->vq, received);
- if (xdp_xmit) { + if (xdp_xmit & VIRTIO_XDP_REDIR) + xdp_do_flush_map(); + + if (xdp_xmit & VIRTIO_XDP_TX) { qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id(); sq = &vi->sq[qp]; virtqueue_kick(sq->vq); - xdp_do_flush_map(); }
return received;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Janakarajan Natarajan Janakarajan.Natarajan@amd.com
commit d30f370d3a4998c13ed3e5c8ef607d05be0a987a upstream.
Prevent a config where KVM_AMD=y and CRYPTO_DEV_CCP_DD=m, thereby ensuring that the AMD Secure Processor device driver is built in when KVM_AMD is also built in.
v1->v2:
* Removed usage of 'imply' Kconfig option.
* Changed patch commit message.
Fixes: 505c9e94d832 ("KVM: x86: prefer "depends on" to "select" for SEV")
Cc: stable@vger.kernel.org # 4.16.x Signed-off-by: Janakarajan Natarajan Janakarajan.Natarajan@amd.com Reviewed-by: Brijesh Singh brijesh.singh@amd.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -85,7 +85,7 @@ config KVM_AMD_SEV def_bool y bool "AMD Secure Encrypted Virtualization (SEV) support" depends on KVM_AMD && X86_64 - depends on CRYPTO_DEV_CCP && CRYPTO_DEV_CCP_DD && CRYPTO_DEV_SP_PSP + depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m) ---help--- Provides support for launching Encrypted VMs on AMD processors.
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gustavo A. R. Silva gustavo@embeddedor.com
commit 676bcfece19f83621e905aa55b5ed2d45cc4f2d3 upstream.
t.qset_idx can be indirectly controlled by user space, potentially leading to exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c:2286 cxgb_extension_ioctl() warn: potential spectre issue 'adapter->msix_info'
Fix this by sanitizing t.qset_idx before using it to index adapter->msix_info.
Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
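For reference, a user-space sketch of the idea behind array_index_nospec() (a simplified model of the kernel's generic mask helper, assuming an arithmetic right shift as the kernel does): the index is forced to 0 with a branch-free mask whenever it is out of range, so a mispredicted bounds check cannot steer a speculative load.

#include <stdint.h>
#include <stdio.h>

/* all-ones when index < size, all-zeroes otherwise, computed without a
 * conditional branch (modelled on the kernel's generic helper; relies on
 * the usual arithmetic right shift of signed values)
 */
static uint64_t index_mask_nospec(uint64_t index, uint64_t size)
{
        return ~(int64_t)(index | (size - 1 - index)) >> 63;
}

int main(void)
{
        int msix[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };
        uint64_t nqsets = 8;
        uint64_t idx = 5;               /* imagine this came in via an ioctl */

        /* bounds check as before ... */
        if (idx >= nqsets)
                return 1;

        /* ... then sanitize before using it as an array index, so even a
         * mispredicted branch above cannot index out of bounds
         */
        idx &= index_mask_nospec(idx, nqsets);

        printf("msix[%llu] = %d\n", (unsigned long long)idx, msix[idx]);
        return 0;
}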
Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c @@ -51,6 +51,7 @@ #include <linux/sched.h> #include <linux/slab.h> #include <linux/uaccess.h> +#include <linux/nospec.h>
#include "common.h" #include "cxgb3_ioctl.h" @@ -2268,6 +2269,7 @@ static int cxgb_extension_ioctl(struct n
if (t.qset_idx >= nqsets) return -EINVAL; + t.qset_idx = array_index_nospec(t.qset_idx, nqsets);
q = &adapter->params.sge.qset[q1 + t.qset_idx]; t.rspq_size = q->rspq_size;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ping-Ke Shih pkshih@realtek.com
commit 12dfa2f68ab659636e092db13b5d17cf9aac82af upstream.
When connecting to an AP, mac80211 asks the driver to enter and leave PS quickly, but the driver deinit doesn't wait for the delayed work to complete when entering PS, so the driver reinit procedure and the delayed work run simultaneously. This will cause an unpredictable kernel oops or a crash like:
rtl8723be: error H2C cmd because of Fw download fail!!!
WARNING: CPU: 3 PID: 159 at drivers/net/wireless/realtek/rtlwifi/rtl8723be/fw.c:227 rtl8723be_fill_h2c_cmd+0x182/0x510 [rtl8723be]
CPU: 3 PID: 159 Comm: kworker/3:2 Tainted: G O 4.16.13-2-ARCH #1
Hardware name: ASUSTeK COMPUTER INC. X556UF/X556UF, BIOS X556UF.406 10/21/2016
Workqueue: rtl8723be_pci rtl_c2hcmd_wq_callback [rtlwifi]
RIP: 0010:rtl8723be_fill_h2c_cmd+0x182/0x510 [rtl8723be]
RSP: 0018:ffffa6ab01e1bd70 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffa26069071520 RCX: 0000000000000001
RDX: 0000000080000001 RSI: ffffffff8be70e9c RDI: 00000000ffffffff
RBP: 0000000000000000 R08: 0000000000000048 R09: 0000000000000348
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: ffffa26069071520 R14: 0000000000000000 R15: ffffa2607d205f70
FS: 0000000000000000(0000) GS:ffffa26081d80000(0000) knlGS:000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000443b39d3000 CR3: 000000037700a005 CR4: 00000000003606e0
Call Trace:
 ? halbtc_send_bt_mp_operation.constprop.17+0xd5/0xe0 [btcoexist]
 ? ex_btc8723b1ant_bt_info_notify+0x3b8/0x820 [btcoexist]
 ? rtl_c2hcmd_launcher+0xab/0x110 [rtlwifi]
 ? process_one_work+0x1d1/0x3b0
 ? worker_thread+0x2b/0x3d0
 ? process_one_work+0x3b0/0x3b0
 ? kthread+0x112/0x130
 ? kthread_create_on_node+0x60/0x60
 ? ret_from_fork+0x35/0x40
Code: 00 76 b4 e9 e2 fe ff ff 4c 89 ee 4c 89 e7 e8 56 22 86 ca e9 5e ...
This patch ensures all delayed work is completed before entering PS to satisfy our expectation, so use cancel_delayed_work_sync() instead. An exception is the delayed work ips_nic_off_wq, because the running task may be the work item itself, so add an ips_wq parameter to the deinit function to handle this case.
This issue is reported and fixed in the threads below:
https://github.com/lwfinger/rtlwifi_new/issues/367
https://github.com/lwfinger/rtlwifi_new/issues/366
Tested-by: Evgeny Kapun abacabadabacaba@gmail.com # 8723DE Tested-by: Shivam Kakkar shivam543@gmail.com # 8723BE on 4.18-rc1 Signed-off-by: Ping-Ke Shih pkshih@realtek.com Fixes: cceb0a597320 ("rtlwifi: Add work queue for c2h cmd.") Cc: Stable stable@vger.kernel.org # 4.11+ Reviewed-by: Larry Finger Larry.Finger@lwfinger.net Signed-off-by: Kalle Valo kvalo@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/realtek/rtlwifi/base.c | 17 ++++++++++------- drivers/net/wireless/realtek/rtlwifi/base.h | 2 +- drivers/net/wireless/realtek/rtlwifi/core.c | 2 +- drivers/net/wireless/realtek/rtlwifi/pci.c | 2 +- drivers/net/wireless/realtek/rtlwifi/ps.c | 4 ++-- drivers/net/wireless/realtek/rtlwifi/usb.c | 2 +- 6 files changed, 16 insertions(+), 13 deletions(-)
--- a/drivers/net/wireless/realtek/rtlwifi/base.c +++ b/drivers/net/wireless/realtek/rtlwifi/base.c @@ -485,18 +485,21 @@ static void _rtl_init_deferred_work(stru
}
-void rtl_deinit_deferred_work(struct ieee80211_hw *hw) +void rtl_deinit_deferred_work(struct ieee80211_hw *hw, bool ips_wq) { struct rtl_priv *rtlpriv = rtl_priv(hw);
del_timer_sync(&rtlpriv->works.watchdog_timer);
- cancel_delayed_work(&rtlpriv->works.watchdog_wq); - cancel_delayed_work(&rtlpriv->works.ips_nic_off_wq); - cancel_delayed_work(&rtlpriv->works.ps_work); - cancel_delayed_work(&rtlpriv->works.ps_rfon_wq); - cancel_delayed_work(&rtlpriv->works.fwevt_wq); - cancel_delayed_work(&rtlpriv->works.c2hcmd_wq); + cancel_delayed_work_sync(&rtlpriv->works.watchdog_wq); + if (ips_wq) + cancel_delayed_work(&rtlpriv->works.ips_nic_off_wq); + else + cancel_delayed_work_sync(&rtlpriv->works.ips_nic_off_wq); + cancel_delayed_work_sync(&rtlpriv->works.ps_work); + cancel_delayed_work_sync(&rtlpriv->works.ps_rfon_wq); + cancel_delayed_work_sync(&rtlpriv->works.fwevt_wq); + cancel_delayed_work_sync(&rtlpriv->works.c2hcmd_wq); } EXPORT_SYMBOL_GPL(rtl_deinit_deferred_work);
--- a/drivers/net/wireless/realtek/rtlwifi/base.h +++ b/drivers/net/wireless/realtek/rtlwifi/base.h @@ -121,7 +121,7 @@ void rtl_init_rfkill(struct ieee80211_hw void rtl_deinit_rfkill(struct ieee80211_hw *hw);
void rtl_watch_dog_timer_callback(struct timer_list *t); -void rtl_deinit_deferred_work(struct ieee80211_hw *hw); +void rtl_deinit_deferred_work(struct ieee80211_hw *hw, bool ips_wq);
bool rtl_action_proc(struct ieee80211_hw *hw, struct sk_buff *skb, u8 is_tx); int rtlwifi_rate_mapping(struct ieee80211_hw *hw, bool isht, --- a/drivers/net/wireless/realtek/rtlwifi/core.c +++ b/drivers/net/wireless/realtek/rtlwifi/core.c @@ -196,7 +196,7 @@ static void rtl_op_stop(struct ieee80211 /* reset sec info */ rtl_cam_reset_sec_info(hw);
- rtl_deinit_deferred_work(hw); + rtl_deinit_deferred_work(hw, false); } rtlpriv->intf_ops->adapter_stop(hw);
--- a/drivers/net/wireless/realtek/rtlwifi/pci.c +++ b/drivers/net/wireless/realtek/rtlwifi/pci.c @@ -2375,7 +2375,7 @@ void rtl_pci_disconnect(struct pci_dev * ieee80211_unregister_hw(hw); rtlmac->mac80211_registered = 0; } else { - rtl_deinit_deferred_work(hw); + rtl_deinit_deferred_work(hw, false); rtlpriv->intf_ops->adapter_stop(hw); } rtlpriv->cfg->ops->disable_interrupt(hw); --- a/drivers/net/wireless/realtek/rtlwifi/ps.c +++ b/drivers/net/wireless/realtek/rtlwifi/ps.c @@ -71,7 +71,7 @@ bool rtl_ps_disable_nic(struct ieee80211 struct rtl_priv *rtlpriv = rtl_priv(hw);
/*<1> Stop all timer */ - rtl_deinit_deferred_work(hw); + rtl_deinit_deferred_work(hw, true);
/*<2> Disable Interrupt */ rtlpriv->cfg->ops->disable_interrupt(hw); @@ -292,7 +292,7 @@ void rtl_ips_nic_on(struct ieee80211_hw struct rtl_ps_ctl *ppsc = rtl_psc(rtl_priv(hw)); enum rf_pwrstate rtstate;
- cancel_delayed_work(&rtlpriv->works.ips_nic_off_wq); + cancel_delayed_work_sync(&rtlpriv->works.ips_nic_off_wq);
mutex_lock(&rtlpriv->locks.ips_mutex); if (ppsc->inactiveps) { --- a/drivers/net/wireless/realtek/rtlwifi/usb.c +++ b/drivers/net/wireless/realtek/rtlwifi/usb.c @@ -1132,7 +1132,7 @@ void rtl_usb_disconnect(struct usb_inter ieee80211_unregister_hw(hw); rtlmac->mac80211_registered = 0; } else { - rtl_deinit_deferred_work(hw); + rtl_deinit_deferred_work(hw, false); rtlpriv->intf_ops->adapter_stop(hw); } /*deinit rfkill */
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ping-Ke Shih pkshih@realtek.com
commit 9a98302de19991d51e067b88750585203b2a3ab6 upstream.
Without this patch, the firmware will not run properly on rtl8821ae, and this causes a bad user experience: for example, poor connection performance at low rates, higher power consumption, and so on.

rtl8821ae uses two kinds of firmware, for the normal and WoWLAN cases, and each firmware has its own data buffer and size. The original code always overwrites the size of the normal firmware, rtlpriv->rtlhal.fwsize, and this mismatch causes a firmware checksum error, so the firmware can't start.

In this situation, the driver gives the message "Firmware is not ready to run!".
Fixes: fe89707f0afa ("rtlwifi: rtl8821ae: Simplify loading of WOWLAN firmware") Signed-off-by: Ping-Ke Shih pkshih@realtek.com Cc: Stable stable@vger.kernel.org # 4.0+ Reviewed-by: Larry Finger Larry.Finger@lwfinger.net Signed-off-by: Kalle Valo kvalo@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/realtek/rtlwifi/core.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/net/wireless/realtek/rtlwifi/core.c +++ b/drivers/net/wireless/realtek/rtlwifi/core.c @@ -130,7 +130,6 @@ found_alt: firmware->size); rtlpriv->rtlhal.wowlan_fwsize = firmware->size; } - rtlpriv->rtlhal.fwsize = firmware->size; release_firmware(firmware); }
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Wahren stefan.wahren@i2se.com
commit dea39aca1d7aef1e2b95b07edeacf04cc8863a2e upstream.
The skb size calculation in lan78xx_tx_bh races with start_xmit, which could lead to rare kernel oopses. So protect the whole skb walk with a spin lock. As a benefit, we can unlink the skb directly.
This patch was tested on Raspberry Pi 3B+
Link: https://github.com/raspberrypi/linux/issues/2608 Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet") Cc: stable stable@vger.kernel.org Signed-off-by: Floris Bos bos@je-eigen-domein.nl Signed-off-by: Stefan Wahren stefan.wahren@i2se.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/usb/lan78xx.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -3193,6 +3193,7 @@ static void lan78xx_tx_bh(struct lan78xx pkt_cnt = 0; count = 0; length = 0; + spin_lock_irqsave(&tqp->lock, flags); for (skb = tqp->next; pkt_cnt < tqp->qlen; skb = skb->next) { if (skb_is_gso(skb)) { if (pkt_cnt) { @@ -3201,7 +3202,8 @@ static void lan78xx_tx_bh(struct lan78xx } count = 1; length = skb->len - TX_OVERHEAD; - skb2 = skb_dequeue(tqp); + __skb_unlink(skb, tqp); + spin_unlock_irqrestore(&tqp->lock, flags); goto gso_skb; }
@@ -3210,6 +3212,7 @@ static void lan78xx_tx_bh(struct lan78xx skb_totallen = skb->len + roundup(skb_totallen, sizeof(u32)); pkt_cnt++; } + spin_unlock_irqrestore(&tqp->lock, flags);
/* copy to a single skb */ skb = alloc_skb(skb_totallen, GFP_ATOMIC);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephan Mueller smueller@chronox.de
commit 2546da99212f22034aecf279da9c47cbfac6c981 upstream.
The RX SGL in processing is already registered with the RX SGL tracking list to support proper cleanup. The cleanup code path uses the sg_num_bytes variable which must therefore be always initialized, even in the error code path.
Signed-off-by: Stephan Mueller smueller@chronox.de Reported-by: syzbot+9c251bdd09f83b92ba95@syzkaller.appspotmail.com #syz test: https://github.com/google/kmsan.git master CC: stable@vger.kernel.org #4.14 Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management") Fixes: d887c52d6ae4 ("crypto: algif_aead - overhaul memory management") Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- crypto/af_alg.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -1156,8 +1156,10 @@ int af_alg_get_rsgl(struct sock *sk, str
/* make one iovec available as scatterlist */ err = af_alg_make_sg(&rsgl->sgl, &msg->msg_iter, seglen); - if (err < 0) + if (err < 0) { + rsgl->sg_num_bytes = 0; return err; + }
/* chain the new scatterlist with previous one */ if (areq->last_rsgl)
Hello,
syzbot has tested the proposed patch and the reproducer did not trigger crash:
Reported-and-tested-by: syzbot+9c251bdd09f83b92ba95@syzkaller.appspotmail.com
Tested on:
commit:         cf8cd3cd03e2 kmsan: fix CONFIG_KMSAN=n build
git tree:       https://github.com/google/kmsan.git/master
kernel config:  https://syzkaller.appspot.com/x/.config?x=c21487206316bcc3
compiler:       clang version 7.0.0 (trunk 334104)
patch:          https://syzkaller.appspot.com/x/patch.diff?x=15fec658400000
Note: testing is done by a robot and is best-effort only.
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dexuan Cui decui@microsoft.com
commit 35a88a18d7ea58600e11590405bc93b08e16e7f5 upstream.
Commit de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()") uses local_bh_disable()/enable(), because hv_pci_onchannelcallback() can also run in tasklet context as the channel event callback, so bottom halves should be disabled to prevent a race condition.
With CONFIG_PROVE_LOCKING=y in the recent mainline, or old kernels that don't have commit f71b74bca637 ("irq/softirqs: Use lockdep to assert IRQs are disabled/enabled"), when the upper layer IRQ code calls hv_compose_msi_msg() with local IRQs disabled, we'll see a warning at the beginning of __local_bh_enable_ip():
IRQs not enabled as expected
WARNING: CPU: 0 PID: 408 at kernel/softirq.c:162 __local_bh_enable_ip
The warning exposes an issue in de0aa7b2f97d: local_bh_enable() can potentially call do_softirq(), which is not supposed to run when local IRQs are disabled. Let's fix this by using local_irq_save()/restore() instead.
Note: hv_pci_onchannelcallback() is not a hot path because it's only called when the PCI device is hot added and removed, which is infrequent.
Fixes: de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()") Signed-off-by: Dexuan Cui decui@microsoft.com Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Haiyang Zhang haiyangz@microsoft.com Cc: stable@vger.kernel.org Cc: Stephen Hemminger sthemmin@microsoft.com Cc: K. Y. Srinivasan kys@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/host/pci-hyperv.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/pci/host/pci-hyperv.c +++ b/drivers/pci/host/pci-hyperv.c @@ -1077,6 +1077,7 @@ static void hv_compose_msi_msg(struct ir struct pci_bus *pbus; struct pci_dev *pdev; struct cpumask *dest; + unsigned long flags; struct compose_comp_ctxt comp; struct tran_int_desc *int_desc; struct { @@ -1168,14 +1169,15 @@ static void hv_compose_msi_msg(struct ir * the channel callback directly when channel->target_cpu is * the current CPU. When the higher level interrupt code * calls us with interrupt enabled, let's add the - * local_bh_disable()/enable() to avoid race. + * local_irq_save()/restore() to avoid race: + * hv_pci_onchannelcallback() can also run in tasklet. */ - local_bh_disable(); + local_irq_save(flags);
if (hbus->hdev->channel->target_cpu == smp_processor_id()) hv_pci_onchannelcallback(hbus);
- local_bh_enable(); + local_irq_restore(flags);
if (hpdev->state == hv_pcichild_ejecting) { dev_err_once(&hbus->hdev->device,
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit 11ff7288beb2b7da889a014aff0a7b80bf8efcf3 upstream.
The ebtables evaluation loop expects targets to return positive values (jumps) or negative values (absolute verdicts).
This is completely different from what xtables does. In xtables, targets are expected to return the standard netfilter verdicts, i.e. NF_DROP, NF_ACCEPT, etc.
ebtables will consider these as jumps.
Therefore, reject any target found due to the unspec fallback.

v2: also reject watchers. ebtables ignores their return value, so a target that assumes skb ownership (and returns NF_STOLEN) causes a use-after-free.
The only watchers in the 'ebtables' front-end are log and nflog; both have AF_BRIDGE specific wrappers on kernel side.
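To make the incompatibility concrete, a small sketch (the constant values are taken from the respective uapi headers; the interpret() helper is illustrative): xtables targets return netfilter verdicts such as NF_DROP (0) or NF_ACCEPT (1), while ebtables treats any value >= 0 as a jump and uses its own negative verdict encoding.

#include <stdio.h>

/* netfilter verdicts returned by xtables targets (linux/netfilter.h) */
#define NF_DROP   0
#define NF_ACCEPT 1

/* ebtables verdict encoding (linux/netfilter_bridge/ebtables.h):
 * negative values are absolute verdicts, values >= 0 are jumps
 */
#define EBT_ACCEPT   -1
#define EBT_DROP     -2
#define EBT_CONTINUE -3
#define EBT_RETURN   -4

/* sketch of how the ebtables loop interprets a target's return value */
static void interpret(int verdict)
{
        if (verdict >= 0)
                printf("%d -> jump to chain entry %d\n", verdict, verdict);
        else
                printf("%d -> absolute verdict\n", verdict);
}

int main(void)
{
        /* an xtables-style target dropping or accepting a packet... */
        interpret(NF_DROP);     /* 0: ebtables reads this as "jump to entry 0" */
        interpret(NF_ACCEPT);   /* 1: ebtables reads this as "jump to entry 1" */

        /* ...versus what ebtables actually expects */
        interpret(EBT_DROP);
        interpret(EBT_ACCEPT);
        return 0;
}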
Reported-by: syzbot+2b43f681169a2a0d306a@syzkaller.appspotmail.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/bridge/netfilter/ebtables.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
--- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -396,6 +396,12 @@ ebt_check_watcher(struct ebt_entry_watch watcher = xt_request_find_target(NFPROTO_BRIDGE, w->u.name, 0); if (IS_ERR(watcher)) return PTR_ERR(watcher); + + if (watcher->family != NFPROTO_BRIDGE) { + module_put(watcher->me); + return -ENOENT; + } + w->u.watcher = watcher;
par->target = watcher; @@ -717,6 +723,13 @@ ebt_check_entry(struct ebt_entry *e, str goto cleanup_watchers; }
+ /* Reject UNSPEC, xtables verdicts/return values are incompatible */ + if (target->family != NFPROTO_BRIDGE) { + module_put(target->me); + ret = -ENOENT; + goto cleanup_watchers; + } + t->u.target = target; if (t->u.target == &ebt_standard_target) { if (gap < sizeof(struct ebt_standard_target)) {
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Biggers ebiggers@google.com
commit fe10e398e860955bac4d28ec031b701d358465e4 upstream.
ReiserFS prepares log messages into a 1024-byte buffer with no bounds checks. Long messages, such as the "unknown mount option" warning when userspace passes a crafted mount options string, overflow this buffer. This causes KASAN to report a global-out-of-bounds write.
Fix it by truncating messages to the buffer size.
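The bounded-accumulation pattern the fix switches to, sketched in user-space C (my_scnprintf() is a small stand-in for the kernel's scnprintf(), since plain snprintf() returns the would-be length rather than the number of bytes written; buffer size and messages are illustrative):

#include <stdarg.h>
#include <stdio.h>

/* user-space stand-in for the kernel's scnprintf(): never writes past
 * 'size' bytes and returns how many characters actually landed in buf
 */
static int my_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
        va_list args;
        int n;

        if (!size)
                return 0;
        va_start(args, fmt);
        n = vsnprintf(buf, size, fmt, args);
        va_end(args);
        if (n < 0)
                return 0;
        return (size_t)n >= size ? (int)(size - 1) : n;
}

int main(void)
{
        char error_buf[32];     /* deliberately small to force truncation */
        char *p = error_buf;
        char * const end = error_buf + sizeof(error_buf);

        /* each step advances p by what was really written, so 'end - p'
         * always reflects the space left; long input is truncated instead
         * of overflowing the buffer
         */
        p += my_scnprintf(p, end - p, "unknown mount option ");
        p += my_scnprintf(p, end - p, "\"%s\"", "averyveryverylongoptionstring");
        p += my_scnprintf(p, end - p, " (ignored)");

        printf("%s\n", error_buf);
        return 0;
}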
Link: http://lkml.kernel.org/r/20180707203621.30922-1-ebiggers3@gmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+b890b3335a4d8c608963@syzkaller.appspotmail.com Signed-off-by: Eric Biggers ebiggers@google.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/reiserfs/prints.c | 141 +++++++++++++++++++++++++++++---------------------- 1 file changed, 81 insertions(+), 60 deletions(-)
--- a/fs/reiserfs/prints.c +++ b/fs/reiserfs/prints.c @@ -76,83 +76,99 @@ static char *le_type(struct reiserfs_key }
/* %k */ -static void sprintf_le_key(char *buf, struct reiserfs_key *key) +static int scnprintf_le_key(char *buf, size_t size, struct reiserfs_key *key) { if (key) - sprintf(buf, "[%d %d %s %s]", le32_to_cpu(key->k_dir_id), - le32_to_cpu(key->k_objectid), le_offset(key), - le_type(key)); + return scnprintf(buf, size, "[%d %d %s %s]", + le32_to_cpu(key->k_dir_id), + le32_to_cpu(key->k_objectid), le_offset(key), + le_type(key)); else - sprintf(buf, "[NULL]"); + return scnprintf(buf, size, "[NULL]"); }
/* %K */ -static void sprintf_cpu_key(char *buf, struct cpu_key *key) +static int scnprintf_cpu_key(char *buf, size_t size, struct cpu_key *key) { if (key) - sprintf(buf, "[%d %d %s %s]", key->on_disk_key.k_dir_id, - key->on_disk_key.k_objectid, reiserfs_cpu_offset(key), - cpu_type(key)); + return scnprintf(buf, size, "[%d %d %s %s]", + key->on_disk_key.k_dir_id, + key->on_disk_key.k_objectid, + reiserfs_cpu_offset(key), cpu_type(key)); else - sprintf(buf, "[NULL]"); + return scnprintf(buf, size, "[NULL]"); }
-static void sprintf_de_head(char *buf, struct reiserfs_de_head *deh) +static int scnprintf_de_head(char *buf, size_t size, + struct reiserfs_de_head *deh) { if (deh) - sprintf(buf, - "[offset=%d dir_id=%d objectid=%d location=%d state=%04x]", - deh_offset(deh), deh_dir_id(deh), deh_objectid(deh), - deh_location(deh), deh_state(deh)); + return scnprintf(buf, size, + "[offset=%d dir_id=%d objectid=%d location=%d state=%04x]", + deh_offset(deh), deh_dir_id(deh), + deh_objectid(deh), deh_location(deh), + deh_state(deh)); else - sprintf(buf, "[NULL]"); + return scnprintf(buf, size, "[NULL]");
}
-static void sprintf_item_head(char *buf, struct item_head *ih) +static int scnprintf_item_head(char *buf, size_t size, struct item_head *ih) { if (ih) { - strcpy(buf, - (ih_version(ih) == KEY_FORMAT_3_6) ? "*3.6* " : "*3.5*"); - sprintf_le_key(buf + strlen(buf), &(ih->ih_key)); - sprintf(buf + strlen(buf), ", item_len %d, item_location %d, " - "free_space(entry_count) %d", - ih_item_len(ih), ih_location(ih), ih_free_space(ih)); + char *p = buf; + char * const end = buf + size; + + p += scnprintf(p, end - p, "%s", + (ih_version(ih) == KEY_FORMAT_3_6) ? + "*3.6* " : "*3.5*"); + + p += scnprintf_le_key(p, end - p, &ih->ih_key); + + p += scnprintf(p, end - p, + ", item_len %d, item_location %d, free_space(entry_count) %d", + ih_item_len(ih), ih_location(ih), + ih_free_space(ih)); + return p - buf; } else - sprintf(buf, "[NULL]"); + return scnprintf(buf, size, "[NULL]"); }
-static void sprintf_direntry(char *buf, struct reiserfs_dir_entry *de) +static int scnprintf_direntry(char *buf, size_t size, + struct reiserfs_dir_entry *de) { char name[20];
memcpy(name, de->de_name, de->de_namelen > 19 ? 19 : de->de_namelen); name[de->de_namelen > 19 ? 19 : de->de_namelen] = 0; - sprintf(buf, ""%s"==>[%d %d]", name, de->de_dir_id, de->de_objectid); + return scnprintf(buf, size, ""%s"==>[%d %d]", + name, de->de_dir_id, de->de_objectid); }
-static void sprintf_block_head(char *buf, struct buffer_head *bh) +static int scnprintf_block_head(char *buf, size_t size, struct buffer_head *bh) { - sprintf(buf, "level=%d, nr_items=%d, free_space=%d rdkey ", - B_LEVEL(bh), B_NR_ITEMS(bh), B_FREE_SPACE(bh)); + return scnprintf(buf, size, + "level=%d, nr_items=%d, free_space=%d rdkey ", + B_LEVEL(bh), B_NR_ITEMS(bh), B_FREE_SPACE(bh)); }
-static void sprintf_buffer_head(char *buf, struct buffer_head *bh) +static int scnprintf_buffer_head(char *buf, size_t size, struct buffer_head *bh) { - sprintf(buf, - "dev %pg, size %zd, blocknr %llu, count %d, state 0x%lx, page %p, (%s, %s, %s)", - bh->b_bdev, bh->b_size, - (unsigned long long)bh->b_blocknr, atomic_read(&(bh->b_count)), - bh->b_state, bh->b_page, - buffer_uptodate(bh) ? "UPTODATE" : "!UPTODATE", - buffer_dirty(bh) ? "DIRTY" : "CLEAN", - buffer_locked(bh) ? "LOCKED" : "UNLOCKED"); + return scnprintf(buf, size, + "dev %pg, size %zd, blocknr %llu, count %d, state 0x%lx, page %p, (%s, %s, %s)", + bh->b_bdev, bh->b_size, + (unsigned long long)bh->b_blocknr, + atomic_read(&(bh->b_count)), + bh->b_state, bh->b_page, + buffer_uptodate(bh) ? "UPTODATE" : "!UPTODATE", + buffer_dirty(bh) ? "DIRTY" : "CLEAN", + buffer_locked(bh) ? "LOCKED" : "UNLOCKED"); }
-static void sprintf_disk_child(char *buf, struct disk_child *dc) +static int scnprintf_disk_child(char *buf, size_t size, struct disk_child *dc) { - sprintf(buf, "[dc_number=%d, dc_size=%u]", dc_block_number(dc), - dc_size(dc)); + return scnprintf(buf, size, "[dc_number=%d, dc_size=%u]", + dc_block_number(dc), dc_size(dc)); }
static char *is_there_reiserfs_struct(char *fmt, int *what) @@ -189,55 +205,60 @@ static void prepare_error_buf(const char char *fmt1 = fmt_buf; char *k; char *p = error_buf; + char * const end = &error_buf[sizeof(error_buf)]; int what;
spin_lock(&error_lock);
- strcpy(fmt1, fmt); + if (WARN_ON(strscpy(fmt_buf, fmt, sizeof(fmt_buf)) < 0)) { + strscpy(error_buf, "format string too long", end - error_buf); + goto out_unlock; + }
while ((k = is_there_reiserfs_struct(fmt1, &what)) != NULL) { *k = 0;
- p += vsprintf(p, fmt1, args); + p += vscnprintf(p, end - p, fmt1, args);
switch (what) { case 'k': - sprintf_le_key(p, va_arg(args, struct reiserfs_key *)); + p += scnprintf_le_key(p, end - p, + va_arg(args, struct reiserfs_key *)); break; case 'K': - sprintf_cpu_key(p, va_arg(args, struct cpu_key *)); + p += scnprintf_cpu_key(p, end - p, + va_arg(args, struct cpu_key *)); break; case 'h': - sprintf_item_head(p, va_arg(args, struct item_head *)); + p += scnprintf_item_head(p, end - p, + va_arg(args, struct item_head *)); break; case 't': - sprintf_direntry(p, - va_arg(args, - struct reiserfs_dir_entry *)); + p += scnprintf_direntry(p, end - p, + va_arg(args, struct reiserfs_dir_entry *)); break; case 'y': - sprintf_disk_child(p, - va_arg(args, struct disk_child *)); + p += scnprintf_disk_child(p, end - p, + va_arg(args, struct disk_child *)); break; case 'z': - sprintf_block_head(p, - va_arg(args, struct buffer_head *)); + p += scnprintf_block_head(p, end - p, + va_arg(args, struct buffer_head *)); break; case 'b': - sprintf_buffer_head(p, - va_arg(args, struct buffer_head *)); + p += scnprintf_buffer_head(p, end - p, + va_arg(args, struct buffer_head *)); break; case 'a': - sprintf_de_head(p, - va_arg(args, - struct reiserfs_de_head *)); + p += scnprintf_de_head(p, end - p, + va_arg(args, struct reiserfs_de_head *)); break; }
- p += strlen(p); fmt1 = k + 2; } - vsprintf(p, fmt1, args); + p += vscnprintf(p, end - p, fmt1, args); +out_unlock: spin_unlock(&error_lock);
}
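The hunks above replace reiserfs' unbounded sprintf()-style helpers with size-aware ones that return the number of bytes written, so callers can chain them as p += scnprintf(p, end - p, ...). Since scnprintf() is a kernel helper (unlike snprintf(), it returns the bytes actually stored, capped at size - 1), here is a rough user-space sketch of the same append idiom; the wrapper name, buffer size and format strings are invented for the example:

#include <stdarg.h>
#include <stdio.h>

/* User-space stand-in for the kernel's scnprintf(): like snprintf(), but
 * returns the number of characters actually written into buf (excluding
 * the NUL), never the would-be length, so "p += ..." cannot run past end.
 */
static int my_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
        va_list args;
        int i;

        if (size == 0)
                return 0;
        va_start(args, fmt);
        i = vsnprintf(buf, size, fmt, args);
        va_end(args);
        return i < (int)size ? i : (int)size - 1;
}

int main(void)
{
        char error_buf[32];                     /* deliberately small */
        char *p = error_buf;
        char * const end = error_buf + sizeof(error_buf);

        /* Each append is clamped to the space that is actually left. */
        p += my_scnprintf(p, end - p, "[%d %d]", 1, 2);
        p += my_scnprintf(p, end - p, ", item_len %d, item_location %d", 4096, 48);
        p += my_scnprintf(p, end - p, ", free_space %d", 0);

        printf("%s\n", error_buf);
        printf("used %zu of %zu bytes\n", (size_t)(p - error_buf), sizeof(error_buf));
        return 0;
}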
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Biggers ebiggers@google.com
commit c604cb767049b78b3075497b80ebb8fd530ea2cc upstream.
My recent fix for dns_resolver_preparse() printing very long strings was incomplete, as shown by syzbot which still managed to hit the WARN_ONCE() in set_precision() by adding a crafted "dns_resolver" key:
precision 50001 too large
WARNING: CPU: 7 PID: 864 at lib/vsprintf.c:2164 vsnprintf+0x48a/0x5a0
The bug this time isn't just a printing bug, but also a logical error when multiple options ("#"-separated strings) are given in the key payload. Specifically, when separating an option string into name and value, if there is no value then the name is incorrectly considered to end at the end of the key payload, rather than the end of the current option. This bypasses validation of the option length, and also means that specifying multiple options is broken -- which presumably has gone unnoticed as there is currently only one valid option anyway.
A similar problem also applied to option values, as the kstrtoul() when parsing the "dnserror" option will read past the end of the current option and into the next option.
Fix these bugs by correctly computing the length of the option name and by copying the option value, null-terminated, into a temporary buffer.
Reproducer for the WARN_ONCE() that syzbot hit:
perl -e 'print "#A#", "\0" x 50000' | keyctl padd dns_resolver desc @s
Reproducer for "dnserror" option being parsed incorrectly (expected behavior is to fail when seeing the unknown option "foo", actual behavior was to read the dnserror value as "1#foo" and fail there):
perl -e 'print "#dnserror=1#foo\0"' | keyctl padd dns_resolver desc @s
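A rough user-space sketch of the corrected parsing described above (illustrative only -- the function and variable names are invented, not the dns_key.c code): the option name is bounded by the current '#'-separated option, and the value is copied NUL-terminated into a scratch buffer before any numeric parsing.

#include <stdio.h>
#include <string.h>

static int parse_options(const char *opt, const char *end)
{
        while (opt < end) {
                const char *next_opt = memchr(opt, '#', end - opt);
                const char *eq;
                char optval[128];
                int opt_len, opt_nlen;

                if (!next_opt)
                        next_opt = end;
                opt_len = next_opt - opt;
                if (opt_len <= 0 || opt_len > (int)sizeof(optval))
                        return -1;

                eq = memchr(opt, '=', opt_len);
                if (eq) {
                        opt_nlen = eq - opt;            /* name stops at '=' */
                        eq++;
                        memcpy(optval, eq, next_opt - eq);
                        optval[next_opt - eq] = '\0';   /* value stops at '#'/end */
                } else {
                        opt_nlen = opt_len;             /* name is the whole option */
                        optval[0] = '\0';
                }

                printf("option '%.*s' val '%s'\n", opt_nlen, opt, optval);
                opt = next_opt + 1;
        }
        return 0;
}

int main(void)
{
        const char payload[] = "#dnserror=1#foo";

        /* skip the leading '#', as the preparse code does */
        return parse_options(payload + 1, payload + strlen(payload));
}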
Reported-by: syzbot syzkaller@googlegroups.com Fixes: 4a2d789267e0 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]") Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/dns_resolver/dns_key.c | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-)
--- a/net/dns_resolver/dns_key.c +++ b/net/dns_resolver/dns_key.c @@ -86,35 +86,39 @@ dns_resolver_preparse(struct key_prepars opt++; kdebug("options: '%s'", opt); do { + int opt_len, opt_nlen; const char *eq; - int opt_len, opt_nlen, opt_vlen, tmp; + char optval[128];
next_opt = memchr(opt, '#', end - opt) ?: end; opt_len = next_opt - opt; - if (opt_len <= 0 || opt_len > 128) { + if (opt_len <= 0 || opt_len > sizeof(optval)) { pr_warn_ratelimited("Invalid option length (%d) for dns_resolver key\n", opt_len); return -EINVAL; }
- eq = memchr(opt, '=', opt_len) ?: end; - opt_nlen = eq - opt; - eq++; - opt_vlen = next_opt - eq; /* will be -1 if no value */ - - tmp = opt_vlen >= 0 ? opt_vlen : 0; - kdebug("option '%*.*s' val '%*.*s'", - opt_nlen, opt_nlen, opt, tmp, tmp, eq); + eq = memchr(opt, '=', opt_len); + if (eq) { + opt_nlen = eq - opt; + eq++; + memcpy(optval, eq, next_opt - eq); + optval[next_opt - eq] = '\0'; + } else { + opt_nlen = opt_len; + optval[0] = '\0'; + } + + kdebug("option '%*.*s' val '%s'", + opt_nlen, opt_nlen, opt, optval);
/* see if it's an error number representing a DNS error * that's to be recorded as the result in this key */ if (opt_nlen == sizeof(DNS_ERRORNO_OPTION) - 1 && memcmp(opt, DNS_ERRORNO_OPTION, opt_nlen) == 0) { kdebug("dns error number option"); - if (opt_vlen <= 0) - goto bad_option_value;
- ret = kstrtoul(eq, 10, &derrno); + ret = kstrtoul(optval, 10, &derrno); if (ret < 0) goto bad_option_value;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Watson davejwatson@fb.com
commit 32da12216e467dea70a09cd7094c30779ce0f9db upstream.
In the zerocopy sendmsg() path, there are error checks to revert the zerocopy if we get any error code. syzkaller has discovered that tls_push_record can return -ECONNRESET, which is fatal, and happens after the point at which it is safe to revert the iter, as we've already passed the memory to do_tcp_sendpages.
Previously this code could return -ENOMEM and we would want to revert the iter, but AFAIK this no longer returns ENOMEM after a447da7d004 ("tls: fix waitall behavior in tls_sw_recvmsg"), so we fail for all error codes.
Reported-by: syzbot+c226690f7b3126c5ee04@syzkaller.appspotmail.com Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com Signed-off-by: Dave Watson davejwatson@fb.com Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/tls/tls_sw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -440,7 +440,7 @@ alloc_encrypted: ret = tls_push_record(sk, msg->msg_flags, record_type); if (!ret) continue; - if (ret == -EAGAIN) + if (ret < 0) goto send_end;
copied -= try_to_copy;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tomas Bortoli tomasbortoli@gmail.com
commit 02f51d45937f7bc7f4dee21e9f85b2d5eac37104 upstream.
The autofs subsystem does not check that the "path" parameter is present for all cases where it is required when it is passed in via the "param" struct.
In particular it isn't checked for the AUTOFS_DEV_IOCTL_OPENMOUNT_CMD ioctl command.
To solve it, modify the validate_dev_ioctl() function to check that a path has been provided for ioctl commands that require it.
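As a rough illustration of the validation rule (a toy sketch with invented names, not the autofs code): the ioctls whose handlers dereference param->path are rejected up front when no path was supplied.

#include <stdbool.h>
#include <stdio.h>

enum cmd { CMD_OPENMOUNT, CMD_REQUESTER, CMD_ISMOUNTPOINT, CMD_VERSION };

static int validate(enum cmd cmd, bool has_path)
{
        if (!has_path) {
                switch (cmd) {
                case CMD_OPENMOUNT:
                case CMD_REQUESTER:
                case CMD_ISMOUNTPOINT:
                        return -1;      /* -EINVAL in the real ioctl */
                default:
                        break;
                }
        }
        return 0;
}

int main(void)
{
        printf("openmount, no path -> %d\n", validate(CMD_OPENMOUNT, false));
        printf("version,   no path -> %d\n", validate(CMD_VERSION, false));
        return 0;
}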
Link: http://lkml.kernel.org/r/153060031527.26631.18306637892746301555.stgit@pluto... Signed-off-by: Tomas Bortoli tomasbortoli@gmail.com Signed-off-by: Ian Kent raven@themaw.net Reported-by: syzbot+60c837b428dc84e83a93@syzkaller.appspotmail.com Cc: Dmitry Vyukov dvyukov@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/autofs4/dev-ioctl.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)
--- a/fs/autofs4/dev-ioctl.c +++ b/fs/autofs4/dev-ioctl.c @@ -148,6 +148,15 @@ static int validate_dev_ioctl(int cmd, s cmd); goto out; } + } else { + unsigned int inr = _IOC_NR(cmd); + + if (inr == AUTOFS_DEV_IOCTL_OPENMOUNT_CMD || + inr == AUTOFS_DEV_IOCTL_REQUESTER_CMD || + inr == AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD) { + err = -EINVAL; + goto out; + } }
err = 0; @@ -284,7 +293,8 @@ static int autofs_dev_ioctl_openmount(st dev_t devid; int err, fd;
- /* param->path has already been checked */ + /* param->path has been checked in validate_dev_ioctl() */ + if (!param->openmount.devid) return -EINVAL;
@@ -446,10 +456,7 @@ static int autofs_dev_ioctl_requester(st dev_t devid; int err = -ENOENT;
- if (param->size <= AUTOFS_DEV_IOCTL_SIZE) { - err = -EINVAL; - goto out; - } + /* param->path has been checked in validate_dev_ioctl() */
devid = sbi->sb->s_dev;
@@ -534,10 +541,7 @@ static int autofs_dev_ioctl_ismountpoint unsigned int devid, magic; int err = -ENOENT;
- if (param->size <= AUTOFS_DEV_IOCTL_SIZE) { - err = -EINVAL; - goto out; - } + /* param->path has been checked in validate_dev_ioctl() */
name = param->path; type = param->ismountpoint.in.type;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn willemb@google.com
commit bab2c80e5a6c855657482eac9e97f5f3eedb509a upstream.
When pulling the NSH header in nsh_gso_segment, set the mac length based on the encapsulated packet type.
skb_reset_mac_len computes an offset to the network header, which here still points to the outer packet:
skb_reset_network_header(skb);
[...]
__skb_pull(skb, nsh_len);
skb_reset_mac_header(skb);    // now mac hdr starts nsh_len == 8B after net hdr
skb_reset_mac_len(skb);       // mac len = net hdr - mac hdr == (u16) -8 == 65528
[..]
skb_mac_gso_segment(skb, ..)
Link: http://lkml.kernel.org/r/CAF=yD-KeAcTSOn4AxirAxL8m7QAS8GBBe1w09eziYwvPbbUeYA... Reported-by: syzbot+7b9ed9872dab8c32305d@syzkaller.appspotmail.com Fixes: c411ed854584 ("nsh: add GSO support") Signed-off-by: Willem de Bruijn willemb@google.com Acked-by: Jiri Benc jbenc@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/nsh/nsh.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/nsh/nsh.c +++ b/net/nsh/nsh.c @@ -104,7 +104,7 @@ static struct sk_buff *nsh_gso_segment(s __skb_pull(skb, nsh_len);
skb_reset_mac_header(skb); - skb_reset_mac_len(skb); + skb->mac_len = proto == htons(ETH_P_TEB) ? ETH_HLEN : 0; skb->protocol = proto;
features &= NETIF_F_SG;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit 84379c9afe011020e797e3f50a662b08a6355dcf upstream.
Eric Dumazet reports:
Here is a reproducer of an annoying bug detected by syzkaller on our production kernel
[..]
./b78305423 enable_conntrack
Then :
sleep 60
dmesg | tail -10
[ 171.599093] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 181.631024] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 191.687076] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 201.703037] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 211.711072] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 221.959070] unregister_netdevice: waiting for lo to become free. Usage count = 2
Reproducer sends ipv6 fragment that hits nfct defrag via LOCAL_OUT hook. skb gets queued until frag timer expiry -- 1 minute.
Normally nf_conntrack_reasm gets called during prerouting, so skb has no dst yet which might explain why this wasn't spotted earlier.
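The two added lines are easiest to read as a reference-counting rule: if the fragment is parked on the reassembly queue instead of being reassembled now, its cached route (and the device reference that route pins) must be released first. A toy user-space model of that rule, with all names invented:

#include <stdio.h>

struct device { int refcnt; };
struct route  { struct device *dev; };
struct packet { struct route *rt; };

static void route_put(struct packet *p)
{
        if (p->rt) {
                p->rt->dev->refcnt--;   /* skb_dst_drop() analogue */
                p->rt = NULL;
        }
}

static void queue_fragment(struct packet *p, int drop_dst_first)
{
        if (drop_dst_first)
                route_put(p);
        /* the packet now waits on the reassembly queue until the timer fires */
}

int main(void)
{
        struct device lo = { .refcnt = 1 };

        /* old behaviour: the queued fragment keeps its route */
        struct route rt_old = { .dev = &lo };
        struct packet p_old = { .rt = &rt_old };
        lo.refcnt++;
        queue_fragment(&p_old, 0);
        printf("dst kept while queued:    lo refcnt = %d\n", lo.refcnt);
        route_put(&p_old);              /* what finally happens at timer expiry */

        /* fixed behaviour: drop the route before parking the fragment */
        struct route rt_new = { .dev = &lo };
        struct packet p_new = { .rt = &rt_new };
        lo.refcnt++;
        queue_fragment(&p_new, 1);
        printf("dst dropped before queue: lo refcnt = %d\n", lo.refcnt);
        return 0;
}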
Reported-by: Eric Dumazet eric.dumazet@gmail.com Reported-by: John Sperbeck jsperbeck@google.com Signed-off-by: Florian Westphal fw@strlen.de Tested-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/ipv6/netfilter/nf_conntrack_reasm.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -585,6 +585,8 @@ int nf_ct_frag6_gather(struct net *net, fq->q.meat == fq->q.len && nf_ct_frag6_reasm(fq, skb, dev)) ret = 0; + else + skb_dst_drop(skb);
out_unlock: spin_unlock_bh(&fq->q.lock);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Kara jack@suse.cz
commit 3ee7e8697d5860b173132606d80a9cd35e7113ee upstream.
syzbot is reporting NULL pointer dereference at wb_workfn() [1] due to wb->bdi->dev being NULL. And Dmitry confirmed that wb->state was WB_shutting_down after wb->bdi->dev became NULL. This indicates that unregister_bdi() failed to call wb_shutdown() on one of wb objects.
The problem is in cgwb_bdi_unregister() which does cgwb_kill() and thus drops bdi's reference to wb structures before going through the list of wbs again and calling wb_shutdown() on each of them. This way the loop iterating through all wbs can easily miss a wb if that wb has already passed through cgwb_remove_from_bdi_list() called from wb_shutdown() from cgwb_release_workfn() and as a result fully shutdown bdi although wb_workfn() for this wb structure is still running. In fact there are also other ways cgwb_bdi_unregister() can race with cgwb_release_workfn() leading e.g. to use-after-free issues:
CPU1                            CPU2
                                cgwb_bdi_unregister()
                                  cgwb_kill(*slot);

cgwb_release()
  queue_work(cgwb_release_wq, &wb->release_work);
cgwb_release_workfn()
                                  wb = list_first_entry(&bdi->wb_list, ...)
                                  spin_unlock_irq(&cgwb_lock);
  wb_shutdown(wb);
  ...
  kfree_rcu(wb, rcu);
                                  wb_shutdown(wb); -> oops use-after-free
We solve these issues by synchronizing writeback structure shutdown from cgwb_bdi_unregister() with cgwb_release_workfn() using a new mutex. That way we also no longer need synchronization using WB_shutting_down as the mutex provides it for CONFIG_CGROUP_WRITEBACK case and without CONFIG_CGROUP_WRITEBACK wb_shutdown() can be called only once from bdi_unregister().
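A heavily simplified user-space model of the new locking (invented names; the real code also walks bdi->wb_list under cgwb_lock): both teardown paths serialize on one mutex, so the unregister loop can no longer shut down a wb that the release worker is concurrently freeing.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t release_mutex = PTHREAD_MUTEX_INITIALIZER;
static int *wb;                         /* stands in for a bdi_writeback */

static void wb_shutdown_model(int *w)
{
        if (w)
                printf("shutting down wb, %d dirty pages\n", *w);
}

static void *release_workfn(void *arg)  /* cgwb_release_workfn() analogue */
{
        (void)arg;
        pthread_mutex_lock(&release_mutex);
        wb_shutdown_model(wb);
        free(wb);                       /* the kfree_rcu() step */
        wb = NULL;
        pthread_mutex_unlock(&release_mutex);
        return NULL;
}

static void bdi_unregister_path(void)   /* cgwb_bdi_unregister() analogue */
{
        pthread_mutex_lock(&release_mutex);
        wb_shutdown_model(wb);          /* wb is either still valid or NULL */
        pthread_mutex_unlock(&release_mutex);
}

int main(void)
{
        pthread_t worker;

        wb = malloc(sizeof(*wb));
        if (!wb)
                return 1;
        *wb = 42;
        pthread_create(&worker, NULL, release_workfn, NULL);
        bdi_unregister_path();
        pthread_join(worker, NULL);
        return 0;
}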
Reported-by: syzbot syzbot+4a7438e774b21ddd8eca@syzkaller.appspotmail.com Acked-by: Tejun Heo tj@kernel.org Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/backing-dev-defs.h | 2 +- mm/backing-dev.c | 20 +++++++------------- 2 files changed, 8 insertions(+), 14 deletions(-)
--- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -22,7 +22,6 @@ struct dentry; */ enum wb_state { WB_registered, /* bdi_register() was done */ - WB_shutting_down, /* wb_shutdown() in progress */ WB_writeback_running, /* Writeback is in progress */ WB_has_dirty_io, /* Dirty inodes on ->b_{dirty|io|more_io} */ WB_start_all, /* nr_pages == 0 (all) work pending */ @@ -189,6 +188,7 @@ struct backing_dev_info { #ifdef CONFIG_CGROUP_WRITEBACK struct radix_tree_root cgwb_tree; /* radix tree of active cgroup wbs */ struct rb_root cgwb_congested_tree; /* their congested states */ + struct mutex cgwb_release_mutex; /* protect shutdown of wb structs */ #else struct bdi_writeback_congested *wb_congested; #endif --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -359,15 +359,8 @@ static void wb_shutdown(struct bdi_write spin_lock_bh(&wb->work_lock); if (!test_and_clear_bit(WB_registered, &wb->state)) { spin_unlock_bh(&wb->work_lock); - /* - * Wait for wb shutdown to finish if someone else is just - * running wb_shutdown(). Otherwise we could proceed to wb / - * bdi destruction before wb_shutdown() is finished. - */ - wait_on_bit(&wb->state, WB_shutting_down, TASK_UNINTERRUPTIBLE); return; } - set_bit(WB_shutting_down, &wb->state); spin_unlock_bh(&wb->work_lock);
cgwb_remove_from_bdi_list(wb); @@ -379,12 +372,6 @@ static void wb_shutdown(struct bdi_write mod_delayed_work(bdi_wq, &wb->dwork, 0); flush_delayed_work(&wb->dwork); WARN_ON(!list_empty(&wb->work_list)); - /* - * Make sure bit gets cleared after shutdown is finished. Matches with - * the barrier provided by test_and_clear_bit() above. - */ - smp_wmb(); - clear_and_wake_up_bit(WB_shutting_down, &wb->state); }
static void wb_exit(struct bdi_writeback *wb) @@ -508,10 +495,12 @@ static void cgwb_release_workfn(struct w struct bdi_writeback *wb = container_of(work, struct bdi_writeback, release_work);
+ mutex_lock(&wb->bdi->cgwb_release_mutex); wb_shutdown(wb);
css_put(wb->memcg_css); css_put(wb->blkcg_css); + mutex_unlock(&wb->bdi->cgwb_release_mutex);
fprop_local_destroy_percpu(&wb->memcg_completions); percpu_ref_exit(&wb->refcnt); @@ -697,6 +686,7 @@ static int cgwb_bdi_init(struct backing_
INIT_RADIX_TREE(&bdi->cgwb_tree, GFP_ATOMIC); bdi->cgwb_congested_tree = RB_ROOT; + mutex_init(&bdi->cgwb_release_mutex);
ret = wb_init(&bdi->wb, bdi, 1, GFP_KERNEL); if (!ret) { @@ -717,7 +707,10 @@ static void cgwb_bdi_unregister(struct b spin_lock_irq(&cgwb_lock); radix_tree_for_each_slot(slot, &bdi->cgwb_tree, &iter, 0) cgwb_kill(*slot); + spin_unlock_irq(&cgwb_lock);
+ mutex_lock(&bdi->cgwb_release_mutex); + spin_lock_irq(&cgwb_lock); while (!list_empty(&bdi->wb_list)) { wb = list_first_entry(&bdi->wb_list, struct bdi_writeback, bdi_node); @@ -726,6 +719,7 @@ static void cgwb_bdi_unregister(struct b spin_lock_irq(&cgwb_lock); } spin_unlock_irq(&cgwb_lock); + mutex_unlock(&bdi->cgwb_release_mutex); }
/**
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
commit 9facc336876f7ecf9edba4c67b90426fde4ec898 upstream.
We currently lock any JITed image as read-only via bpf_jit_binary_lock_ro() as well as the BPF image as read-only through bpf_prog_lock_ro(). In case either of these fails we throw a WARN_ON_ONCE() in order to yell loudly to the log. Perhaps, to some extent, this may be comparable to an allocation where __GFP_NOWARN is explicitly not set.

Added via 65869a47f348 ("bpf: improve read-only handling"), this behavior is slightly different compared to any of the other in-kernel set_memory_ro() users, who do not check the return code of set_memory_ro() and friends /at all/ (e.g. module_enable_ro() / module_disable_ro()). Given that in BPF this is a mandatory hardening step, we want to know whether there are any issues that would leave the BPF or JITed image writable. As it happens, syzkaller enabled fault injection and triggered a memory allocation failure deep inside x86's change_page_attr_set_clr(), which was called from set_memory_ro().

Now, there are two options: i) leave everything as is, or ii) rework the image locking code so that the central bpf_prog_select_runtime() performs a final checkpoint which probes whether any of the calls during prog setup were unsuccessful, and then bails out with an error. Option ii) is the better approach, since this additional paranoia avoids altogether leaving any potential W+X pages from the BPF side in the system. Therefore, let's be strict about it and reject programs in such an unlikely case. While testing I also noticed that one bpf_prog_lock_ro() call was missing on the outer dummy prog in the case of calls, e.g. in the destructor we call bpf_prog_free_deferred() on the main prog where we try to bpf_prog_unlock_free() the program, and since we now go via bpf_prog_select_runtime(), we do that as well.
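A rough user-space sketch of the checkpoint idea, using mprotect() as a stand-in for set_memory_ro() (struct and function names are invented; only the -ENOLCK convention is taken from the patch): each image records whether its read-only switch succeeded, and one final pass rejects the program if any image is still writable.

#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>

struct image {
        void  *mem;
        size_t len;
        int    locked;
};

static void image_lock_ro(struct image *img)
{
        /* mirrors the lock helpers: on failure just leave locked == 0 */
        img->locked = (mprotect(img->mem, img->len, PROT_READ) == 0);
}

static int images_check_ro(struct image *imgs, int n)
{
        for (int i = 0; i < n; i++)
                if (!imgs[i].locked)
                        return -ENOLCK; /* reject the whole program */
        return 0;
}

int main(void)
{
        struct image imgs[2];

        for (int i = 0; i < 2; i++) {
                imgs[i].len = 4096;
                imgs[i].mem = mmap(NULL, imgs[i].len, PROT_READ | PROT_WRITE,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                imgs[i].locked = 0;
                image_lock_ro(&imgs[i]);
        }

        printf("checkpoint: %s\n",
               images_check_ro(imgs, 2) ? "reject (W+X risk)" : "all read-only");
        return 0;
}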
Reported-by: syzbot+3b889862e65a98317058@syzkaller.appspotmail.com Reported-by: syzbot+9e762b52dd17e616a7a5@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Martin KaFai Lau kafai@fb.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/filter.h | 60 +++++++++++++++++++++++++++++++------------------ kernel/bpf/core.c | 53 ++++++++++++++++++++++++++++++++++++++----- kernel/bpf/syscall.c | 4 --- 3 files changed, 86 insertions(+), 31 deletions(-)
--- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -453,7 +453,8 @@ struct sock_fprog_kern { };
struct bpf_binary_header { - unsigned int pages; + u16 pages; + u16 locked:1; u8 image[]; };
@@ -644,15 +645,18 @@ bpf_ctx_narrow_access_ok(u32 off, u32 si
#define bpf_classic_proglen(fprog) (fprog->len * sizeof(fprog->filter[0]))
-#ifdef CONFIG_ARCH_HAS_SET_MEMORY static inline void bpf_prog_lock_ro(struct bpf_prog *fp) { +#ifdef CONFIG_ARCH_HAS_SET_MEMORY fp->locked = 1; - WARN_ON_ONCE(set_memory_ro((unsigned long)fp, fp->pages)); + if (set_memory_ro((unsigned long)fp, fp->pages)) + fp->locked = 0; +#endif }
static inline void bpf_prog_unlock_ro(struct bpf_prog *fp) { +#ifdef CONFIG_ARCH_HAS_SET_MEMORY if (fp->locked) { WARN_ON_ONCE(set_memory_rw((unsigned long)fp, fp->pages)); /* In case set_memory_rw() fails, we want to be the first @@ -660,34 +664,30 @@ static inline void bpf_prog_unlock_ro(st */ fp->locked = 0; } +#endif }
static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr) { - WARN_ON_ONCE(set_memory_ro((unsigned long)hdr, hdr->pages)); -} - -static inline void bpf_jit_binary_unlock_ro(struct bpf_binary_header *hdr) -{ - WARN_ON_ONCE(set_memory_rw((unsigned long)hdr, hdr->pages)); -} -#else -static inline void bpf_prog_lock_ro(struct bpf_prog *fp) -{ -} - -static inline void bpf_prog_unlock_ro(struct bpf_prog *fp) -{ -} - -static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr) -{ +#ifdef CONFIG_ARCH_HAS_SET_MEMORY + hdr->locked = 1; + if (set_memory_ro((unsigned long)hdr, hdr->pages)) + hdr->locked = 0; +#endif }
static inline void bpf_jit_binary_unlock_ro(struct bpf_binary_header *hdr) { +#ifdef CONFIG_ARCH_HAS_SET_MEMORY + if (hdr->locked) { + WARN_ON_ONCE(set_memory_rw((unsigned long)hdr, hdr->pages)); + /* In case set_memory_rw() fails, we want to be the first + * to crash here instead of some random place later on. + */ + hdr->locked = 0; + } +#endif } -#endif /* CONFIG_ARCH_HAS_SET_MEMORY */
static inline struct bpf_binary_header * bpf_jit_binary_hdr(const struct bpf_prog *fp) @@ -698,6 +698,22 @@ bpf_jit_binary_hdr(const struct bpf_prog return (void *)addr; }
+#ifdef CONFIG_ARCH_HAS_SET_MEMORY +static inline int bpf_prog_check_pages_ro_single(const struct bpf_prog *fp) +{ + if (!fp->locked) + return -ENOLCK; + if (fp->jited) { + const struct bpf_binary_header *hdr = bpf_jit_binary_hdr(fp); + + if (!hdr->locked) + return -ENOLCK; + } + + return 0; +} +#endif + int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap); static inline int sk_filter(struct sock *sk, struct sk_buff *skb) { --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -583,6 +583,8 @@ bpf_jit_binary_alloc(unsigned int progle bpf_fill_ill_insns(hdr, size);
hdr->pages = size / PAGE_SIZE; + hdr->locked = 0; + hole = min_t(unsigned int, size - (proglen + sizeof(*hdr)), PAGE_SIZE - sizeof(*hdr)); start = (get_random_int() % hole) & ~(alignment - 1); @@ -1513,6 +1515,33 @@ static int bpf_check_tail_call(const str return 0; }
+static int bpf_prog_check_pages_ro_locked(const struct bpf_prog *fp) +{ +#ifdef CONFIG_ARCH_HAS_SET_MEMORY + int i, err; + + for (i = 0; i < fp->aux->func_cnt; i++) { + err = bpf_prog_check_pages_ro_single(fp->aux->func[i]); + if (err) + return err; + } + + return bpf_prog_check_pages_ro_single(fp); +#endif + return 0; +} + +static void bpf_prog_select_func(struct bpf_prog *fp) +{ +#ifndef CONFIG_BPF_JIT_ALWAYS_ON + u32 stack_depth = max_t(u32, fp->aux->stack_depth, 1); + + fp->bpf_func = interpreters[(round_up(stack_depth, 32) / 32) - 1]; +#else + fp->bpf_func = __bpf_prog_ret0_warn; +#endif +} + /** * bpf_prog_select_runtime - select exec runtime for BPF program * @fp: bpf_prog populated with internal BPF program @@ -1523,13 +1552,13 @@ static int bpf_check_tail_call(const str */ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err) { -#ifndef CONFIG_BPF_JIT_ALWAYS_ON - u32 stack_depth = max_t(u32, fp->aux->stack_depth, 1); + /* In case of BPF to BPF calls, verifier did all the prep + * work with regards to JITing, etc. + */ + if (fp->bpf_func) + goto finalize;
- fp->bpf_func = interpreters[(round_up(stack_depth, 32) / 32) - 1]; -#else - fp->bpf_func = __bpf_prog_ret0_warn; -#endif + bpf_prog_select_func(fp);
/* eBPF JITs can rewrite the program in case constant * blinding is active. However, in case of error during @@ -1550,6 +1579,8 @@ struct bpf_prog *bpf_prog_select_runtime if (*err) return fp; } + +finalize: bpf_prog_lock_ro(fp);
/* The tail call compatibility check can only be done at @@ -1558,7 +1589,17 @@ struct bpf_prog *bpf_prog_select_runtime * all eBPF JITs might immediately support all features. */ *err = bpf_check_tail_call(fp); + if (*err) + return fp;
+ /* Checkpoint: at this point onwards any cBPF -> eBPF or + * native eBPF program is read-only. If we failed to change + * the page attributes (e.g. allocation failure from + * splitting large pages), then reject the whole program + * in order to guarantee not ending up with any W+X pages + * from BPF side in kernel. + */ + *err = bpf_prog_check_pages_ro_locked(fp); return fp; } EXPORT_SYMBOL_GPL(bpf_prog_select_runtime); --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1328,9 +1328,7 @@ static int bpf_prog_load(union bpf_attr if (err < 0) goto free_used_maps;
- /* eBPF program is ready to be JITed */ - if (!prog->bpf_func) - prog = bpf_prog_select_runtime(prog, &err); + prog = bpf_prog_select_runtime(prog, &err); if (err < 0) goto free_used_maps;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Santosh Shilimkar santosh.shilimkar@oracle.com
commit f1693c63ab133d16994cc50f773982b5905af264 upstream.
For the loop transport, which is a self-loopback, remote port congestion updates aren't relevant. In fact, the xmit path already ignores them; the receive path needs to do the same.
Reported-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com Reviewed-by: Sowmini Varadhan sowmini.varadhan@oracle.com Signed-off-by: Santosh Shilimkar santosh.shilimkar@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/rds/loop.c | 1 + net/rds/rds.h | 5 +++++ net/rds/recv.c | 5 +++++ 3 files changed, 11 insertions(+)
--- a/net/rds/loop.c +++ b/net/rds/loop.c @@ -193,4 +193,5 @@ struct rds_transport rds_loop_transport .inc_copy_to_user = rds_message_inc_copy_to_user, .inc_free = rds_loop_inc_free, .t_name = "loopback", + .t_type = RDS_TRANS_LOOP, }; --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -479,6 +479,11 @@ struct rds_notifier { int n_status; };
+/* Available as part of RDS core, so doesn't need to participate + * in get_preferred transport etc + */ +#define RDS_TRANS_LOOP 3 + /** * struct rds_transport - transport specific behavioural hooks * --- a/net/rds/recv.c +++ b/net/rds/recv.c @@ -103,6 +103,11 @@ static void rds_recv_rcvbuf_delta(struct rds_stats_add(s_recv_bytes_added_to_socket, delta); else rds_stats_add(s_recv_bytes_removed_from_socket, -delta); + + /* loop transport doesn't send/recv congestion updates */ + if (rs->rs_transport->t_type == RDS_TRANS_LOOP) + return; + now_congested = rs->rs_rcv_bytes > rds_sk_rcvbuf(rs);
rdsdebug("rs %p (%pI4:%u) recv bytes %d buf %d "
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
commit cd4a4ae4683dc2e09380118e205e057896dcda2b upstream.
If we end up splitting a bio and the queue goes away between the initial submission and the later split submission, then we can block forever in blk_queue_enter() waiting for the reference to drop to zero. This will never happen, since we already hold a reference.
Mark a split bio as already having entered the queue, so we can just use the live non-blocking queue enter variant.
Thanks to Tetsuo Handa for the analysis.
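A toy model of what the new flag expresses (invented names; the real blk_queue_enter() blocks rather than failing, and blk_queue_enter_live() simply takes an extra reference): the recursive submission of a split bio already runs under the outer bio's queue reference, so it may skip the blocking entry point.

#include <stdbool.h>
#include <stdio.h>

struct queue {
        int  usage;
        bool dying;
};

static int queue_enter(struct queue *q)         /* blk_queue_enter() analogue */
{
        if (q->dying)
                return -1;                      /* would otherwise wait forever */
        q->usage++;
        return 0;
}

static void queue_enter_live(struct queue *q)   /* blk_queue_enter_live() analogue */
{
        q->usage++;                             /* caller already guarantees q is alive */
}

static int submit(struct queue *q, bool queue_entered)
{
        if (queue_entered) {
                queue_enter_live(q);
                return 0;
        }
        return queue_enter(q);
}

int main(void)
{
        struct queue q = { .usage = 1, .dying = true }; /* outer bio holds a ref */

        printf("unmarked split bio: %s\n", submit(&q, false) ? "stuck/failed" : "ok");
        printf("marked split bio:   %s\n", submit(&q, true)  ? "stuck/failed" : "ok");
        return 0;
}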
Reported-by: syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- block/blk-core.c | 4 +++- block/blk-merge.c | 10 ++++++++++ include/linux/blk_types.h | 2 ++ 3 files changed, 15 insertions(+), 1 deletion(-)
--- a/block/blk-core.c +++ b/block/blk-core.c @@ -2392,7 +2392,9 @@ blk_qc_t generic_make_request(struct bio
if (bio->bi_opf & REQ_NOWAIT) flags = BLK_MQ_REQ_NOWAIT; - if (blk_queue_enter(q, flags) < 0) { + if (bio_flagged(bio, BIO_QUEUE_ENTERED)) + blk_queue_enter_live(q); + else if (blk_queue_enter(q, flags) < 0) { if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) bio_wouldblock_error(bio); else --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -210,6 +210,16 @@ void blk_queue_split(struct request_queu /* there isn't chance to merge the splitted bio */ split->bi_opf |= REQ_NOMERGE;
+ /* + * Since we're recursing into make_request here, ensure + * that we mark this bio as already having entered the queue. + * If not, and the queue is going away, we can get stuck + * forever on waiting for the queue reference to drop. But + * that will never happen, as we're already holding a + * reference to it. + */ + bio_set_flag(*bio, BIO_QUEUE_ENTERED); + bio_chain(split, *bio); trace_block_split(q, split, (*bio)->bi_iter.bi_sector); generic_make_request(*bio); --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -186,6 +186,8 @@ struct bio { * throttling rules. Don't do it again. */ #define BIO_TRACE_COMPLETION 10 /* bio_endio() should trace the final completion * of this bio. */ +#define BIO_QUEUE_ENTERED 11 /* can use blk_queue_enter_live() */ + /* See BVEC_POOL_OFFSET below before adding new flags */
/*
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Fastabend john.fastabend@gmail.com
commit 9901c5d77e969d8215a8e8d087ef02e6feddc84c upstream.
This fixes a crash where we assign tcp_prot to IPv6 sockets instead of tcpv6_prot.
Previously we overwrote the sk->sk_prot field with tcp_prot even in the AF_INET6 case. This patch ensures the correct tcp_prot and tcpv6_prot are used.
Tested with 'netserver -6' and 'netperf -H [IPv6]' as well as 'netperf -H [IPv4]'. The ESTABLISHED check resolves the previously crashing case here.
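A small user-space sketch of the selection logic (strings stand in for struct proto; only the enum names are taken from the patch): one saved template per (address family, config) pair, chosen per socket, so an IPv6 socket never inherits the IPv4 callbacks.

#include <stdio.h>
#include <sys/socket.h>

enum { SOCKMAP_IPV4, SOCKMAP_IPV6, SOCKMAP_NUM_PROTS };
enum { SOCKMAP_BASE, SOCKMAP_TX, SOCKMAP_NUM_CONFIGS };

static const char *bpf_tcp_prots[SOCKMAP_NUM_PROTS][SOCKMAP_NUM_CONFIGS] = {
        [SOCKMAP_IPV4] = { "tcp_prot/base",   "tcp_prot/tx"   },
        [SOCKMAP_IPV6] = { "tcpv6_prot/base", "tcpv6_prot/tx" },
};

static const char *select_prot(int sk_family, int has_tx_msg)
{
        int family = (sk_family == AF_INET6) ? SOCKMAP_IPV6 : SOCKMAP_IPV4;
        int conf = has_tx_msg ? SOCKMAP_TX : SOCKMAP_BASE;

        return bpf_tcp_prots[family][conf];
}

int main(void)
{
        printf("IPv4 sock, no tx program: %s\n", select_prot(AF_INET, 0));
        printf("IPv6 sock, tx program:    %s\n", select_prot(AF_INET6, 1));
        return 0;
}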
Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support") Reported-by: syzbot+5c063698bdbfac19f363@syzkaller.appspotmail.com Acked-by: Martin KaFai Lau kafai@fb.com Signed-off-by: John Fastabend john.fastabend@gmail.com Signed-off-by: Wei Wang weiwan@google.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/bpf/sockmap.c | 58 ++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 48 insertions(+), 10 deletions(-)
--- a/kernel/bpf/sockmap.c +++ b/kernel/bpf/sockmap.c @@ -112,6 +112,7 @@ static int bpf_tcp_recvmsg(struct sock * static int bpf_tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size); static int bpf_tcp_sendpage(struct sock *sk, struct page *page, int offset, size_t size, int flags); +static void bpf_tcp_close(struct sock *sk, long timeout);
static inline struct smap_psock *smap_psock_sk(const struct sock *sk) { @@ -133,7 +134,42 @@ out: return !empty; }
-static struct proto tcp_bpf_proto; +enum { + SOCKMAP_IPV4, + SOCKMAP_IPV6, + SOCKMAP_NUM_PROTS, +}; + +enum { + SOCKMAP_BASE, + SOCKMAP_TX, + SOCKMAP_NUM_CONFIGS, +}; + +static struct proto *saved_tcpv6_prot __read_mostly; +static DEFINE_SPINLOCK(tcpv6_prot_lock); +static struct proto bpf_tcp_prots[SOCKMAP_NUM_PROTS][SOCKMAP_NUM_CONFIGS]; +static void build_protos(struct proto prot[SOCKMAP_NUM_CONFIGS], + struct proto *base) +{ + prot[SOCKMAP_BASE] = *base; + prot[SOCKMAP_BASE].close = bpf_tcp_close; + prot[SOCKMAP_BASE].recvmsg = bpf_tcp_recvmsg; + prot[SOCKMAP_BASE].stream_memory_read = bpf_tcp_stream_read; + + prot[SOCKMAP_TX] = prot[SOCKMAP_BASE]; + prot[SOCKMAP_TX].sendmsg = bpf_tcp_sendmsg; + prot[SOCKMAP_TX].sendpage = bpf_tcp_sendpage; +} + +static void update_sk_prot(struct sock *sk, struct smap_psock *psock) +{ + int family = sk->sk_family == AF_INET6 ? SOCKMAP_IPV6 : SOCKMAP_IPV4; + int conf = psock->bpf_tx_msg ? SOCKMAP_TX : SOCKMAP_BASE; + + sk->sk_prot = &bpf_tcp_prots[family][conf]; +} + static int bpf_tcp_init(struct sock *sk) { struct smap_psock *psock; @@ -153,14 +189,17 @@ static int bpf_tcp_init(struct sock *sk) psock->save_close = sk->sk_prot->close; psock->sk_proto = sk->sk_prot;
- if (psock->bpf_tx_msg) { - tcp_bpf_proto.sendmsg = bpf_tcp_sendmsg; - tcp_bpf_proto.sendpage = bpf_tcp_sendpage; - tcp_bpf_proto.recvmsg = bpf_tcp_recvmsg; - tcp_bpf_proto.stream_memory_read = bpf_tcp_stream_read; + /* Build IPv6 sockmap whenever the address of tcpv6_prot changes */ + if (sk->sk_family == AF_INET6 && + unlikely(sk->sk_prot != smp_load_acquire(&saved_tcpv6_prot))) { + spin_lock_bh(&tcpv6_prot_lock); + if (likely(sk->sk_prot != saved_tcpv6_prot)) { + build_protos(bpf_tcp_prots[SOCKMAP_IPV6], sk->sk_prot); + smp_store_release(&saved_tcpv6_prot, sk->sk_prot); + } + spin_unlock_bh(&tcpv6_prot_lock); } - - sk->sk_prot = &tcp_bpf_proto; + update_sk_prot(sk, psock); rcu_read_unlock(); return 0; } @@ -1070,8 +1109,7 @@ static void bpf_tcp_msg_add(struct smap_
static int bpf_tcp_ulp_register(void) { - tcp_bpf_proto = tcp_prot; - tcp_bpf_proto.close = bpf_tcp_close; + build_protos(bpf_tcp_prots[SOCKMAP_IPV4], &tcp_prot); /* Once BPF TX ULP is registered it is never unregistered. It * will be in the ULP list for the lifetime of the system. Doing * duplicate registers is not a problem.
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Fastabend john.fastabend@gmail.com
commit 7ebc14d507b4b55105da8d1a1eda323381529cc7 upstream.
Currently, when a sock is closed and the bpf_tcp_close() callback is used we remove memory but do not free the skb. Call consume_skb() if the skb is attached to the buffer.
Reported-by: syzbot+d464d2c20c717ef5a6a8@syzkaller.appspotmail.com Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks") Signed-off-by: John Fastabend john.fastabend@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/bpf/sockmap.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/kernel/bpf/sockmap.c +++ b/kernel/bpf/sockmap.c @@ -471,7 +471,8 @@ static int free_sg(struct sock *sk, int while (sg[i].length) { free += sg[i].length; sk_mem_uncharge(sk, sg[i].length); - put_page(sg_page(&sg[i])); + if (!md->skb) + put_page(sg_page(&sg[i])); sg[i].length = 0; sg[i].page_link = 0; sg[i].offset = 0; @@ -480,6 +481,8 @@ static int free_sg(struct sock *sk, int if (i == MAX_SKB_FRAGS) i = 0; } + if (md->skb) + consume_skb(md->skb);
return free; }
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
commit c7a897843224a92209f306c984975b704969b89d upstream.
syzkaller managed to trigger the following bug through fault injection:
[...]
[ 141.043668] verifier bug. No program starts at insn 3
[ 141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613 get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
[ 141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613 fixup_call_args kernel/bpf/verifier.c:5587 [inline]
[ 141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613 bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
[ 141.047355] CPU: 3 PID: 4072 Comm: a.out Not tainted 4.18.0-rc4+ #51
[ 141.048446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS 1.10.2-1 04/01/2014
[ 141.049877] Call Trace:
[ 141.050324]  __dump_stack lib/dump_stack.c:77 [inline]
[ 141.050324]  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
[ 141.050950]  ? dump_stack_print_info.cold.2+0x52/0x52 lib/dump_stack.c:60
[ 141.051837]  panic+0x238/0x4e7 kernel/panic.c:184
[ 141.052386]  ? add_taint.cold.5+0x16/0x16 kernel/panic.c:385
[ 141.053101]  ? __warn.cold.8+0x148/0x1ba kernel/panic.c:537
[ 141.053814]  ? __warn.cold.8+0x117/0x1ba kernel/panic.c:530
[ 141.054506]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
[ 141.054506]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
[ 141.054506]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
[...]
What happens in jit_subprogs() is that the kcalloc() for the subprog func buffer fails with NULL, at which point we bail out. The latter is a plain return -ENOMEM, and this is definitely not okay, since earlier in the loop we walk all subprogs and temporarily rewrite insn->off to remember the subprog id as well as insn->imm to temporarily point the call to __bpf_call_base + 1 for the initial JIT pass. Thus, bailing out in such a state and handing this over to the interpreter is troublesome, since later/subsequent e.g. find_subprog() lookups would be based on the wrong insn->imm.
Therefore, once we hit this point, we need to jump to out_free path where we undo all changes from earlier loop, so that interpreter can work on unmodified insn->{off,imm}.
Another point is that should find_subprog() fail in jit_subprogs() due to a verifier bug, then we also should not simply defer the program to the interpreter since also here we did partial modifications. Instead we should just bail out entirely and return an error to the user who is trying to load the program.
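A rough user-space sketch of the error-path rule (invented structures, not the verifier code): once the setup loop has rewritten fields in place, every later failure must restore the originals before the program is handed back to the interpreter.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct insn { int off, imm; };

static int fake_jit(struct insn *insns, int len)
{
        struct insn *saved;
        int err;

        saved = malloc(len * sizeof(*insns));
        if (!saved)
                return -ENOMEM;         /* nothing rewritten yet: plain return is fine */
        memcpy(saved, insns, len * sizeof(*insns));

        for (int i = 0; i < len; i++) {
                insns[i].off = i;       /* temporary bookkeeping, as in the patch */
                insns[i].imm = 1;
        }

        err = -ENOTSUP;                 /* pretend a later allocation failed */
        if (err) {
                /* out_undo_insn: put the program back the way we found it */
                memcpy(insns, saved, len * sizeof(*insns));
        }
        free(saved);
        return err;
}

int main(void)
{
        struct insn prog[3] = { { -1, 7 }, { -1, 8 }, { -1, 9 } };

        fake_jit(prog, 3);
        printf("insn[1] after failed JIT: off=%d imm=%d (must be untouched)\n",
               prog[1].off, prog[1].imm);
        return 0;
}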
Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs") Reported-by: syzbot+7d427828b2ea6e592804@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/bpf/verifier.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -5349,6 +5349,10 @@ static int jit_subprogs(struct bpf_verif if (insn->code != (BPF_JMP | BPF_CALL) || insn->src_reg != BPF_PSEUDO_CALL) continue; + /* Upon error here we cannot fall back to interpreter but + * need a hard reject of the program. Thus -EFAULT is + * propagated in any case. + */ subprog = find_subprog(env, i + insn->imm + 1); if (subprog < 0) { WARN_ONCE(1, "verifier bug. No program starts at insn %d\n", @@ -5369,7 +5373,7 @@ static int jit_subprogs(struct bpf_verif
func = kzalloc(sizeof(prog) * (env->subprog_cnt + 1), GFP_KERNEL); if (!func) - return -ENOMEM; + goto out_undo_insn;
for (i = 0; i <= env->subprog_cnt; i++) { subprog_start = subprog_end; @@ -5424,7 +5428,7 @@ static int jit_subprogs(struct bpf_verif tmp = bpf_int_jit_compile(func[i]); if (tmp != func[i] || func[i]->bpf_func != old_bpf_func) { verbose(env, "JIT doesn't support bpf-to-bpf calls\n"); - err = -EFAULT; + err = -ENOTSUPP; goto out_free; } cond_resched(); @@ -5466,6 +5470,7 @@ out_free: if (func[i]) bpf_jit_free(func[i]); kfree(func); +out_undo_insn: /* cleanup main prog to be interpreted */ prog->jit_requested = 0; for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) { @@ -5492,6 +5497,8 @@ static int fixup_call_args(struct bpf_ve err = jit_subprogs(env); if (err == 0) return 0; + if (err == -EFAULT) + return err; } #ifndef CONFIG_BPF_JIT_ALWAYS_ON for (i = 0; i < prog->len; i++, insn++) {
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit 3bc53be9db21040b5d2de4d455f023c8c494aa68 upstream.
syzbot is reporting stalls at nfc_llcp_send_ui_frame() [1]. This is because nfc_llcp_send_ui_frame() is retrying the loop without any delay when nonblocking nfc_alloc_send_skb() returned NULL.
Since there is no need to use MSG_DONTWAIT if we retry until sock_alloc_send_pskb() succeeds, let's use a blocking call instead. Also, in case an unexpected error occurs, let's break out of the loop if the blocking nfc_alloc_send_skb() failed.
[1] https://syzkaller.appspot.com/bug?id=4a131cc571c3733e0eff6bc673f4e36ae48f19c...
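A rough sketch of the corrected loop shape (invented helpers, not the NFC API): the allocation may sleep, and if it still fails the loop ends, returning the bytes already sent or, when nothing was sent, the error code.

#include <errno.h>
#include <stdio.h>

static void *alloc_pdu(int *err)
{
        *err = -ENOMEM;                 /* pretend even the blocking alloc failed */
        return NULL;
}

static int send_ui_frames(int len)
{
        int remaining = len, err = 0;

        while (remaining > 0) {
                int frag = remaining > 128 ? 128 : remaining;

                if (!alloc_pdu(&err)) {
                        len -= remaining;       /* bytes sent so far */
                        if (len == 0)
                                len = err;      /* nothing sent: report the error */
                        break;
                }
                remaining -= frag;              /* "send" this fragment */
        }
        return len;
}

int main(void)
{
        printf("send_ui_frames(300) -> %d\n", send_ui_frames(300));
        return 0;
}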
Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Reported-by: syzbot syzbot+d29d18215e477cfbfbdd@syzkaller.appspotmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/nfc/llcp_commands.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/net/nfc/llcp_commands.c +++ b/net/nfc/llcp_commands.c @@ -752,11 +752,14 @@ int nfc_llcp_send_ui_frame(struct nfc_ll pr_debug("Fragment %zd bytes remaining %zd", frag_len, remaining_len);
- pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT, + pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, 0, frag_len + LLCP_HEADER_SIZE, &err); if (pdu == NULL) { - pr_err("Could not allocate PDU\n"); - continue; + pr_err("Could not allocate PDU (error=%d)\n", err); + len -= remaining_len; + if (len == 0) + len = err; + break; }
pdu = llcp_add_header(pdu, dsap, ssap, LLCP_PDU_UI);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
commit 3aa1409a7b160f9444945c0df1cb079df82be84e upstream.
tbl->entries is not initialized after kmalloc(), which causes an uninit-value warning in ip_vs_lblc_check_expire(), as reported by syzbot.
Reported-by: syzbot+3dfdea57819073a04f21@syzkaller.appspotmail.com Cc: Simon Horman horms@verge.net.au Cc: Julian Anastasov ja@ssi.bg Cc: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Acked-by: Julian Anastasov ja@ssi.bg Acked-by: Simon Horman horms@verge.net.au Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netfilter/ipvs/ip_vs_lblcr.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/netfilter/ipvs/ip_vs_lblcr.c +++ b/net/netfilter/ipvs/ip_vs_lblcr.c @@ -534,6 +534,7 @@ static int ip_vs_lblcr_init_svc(struct i tbl->counter = 1; tbl->dead = false; tbl->svc = svc; + atomic_set(&tbl->entries, 0);
/* * Hook periodic timer for garbage collection
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
commit 8b2ebb6cf064247d60cccbf1750610ac9bb2e672 upstream.
Similarly, tbl->entries is not initialized after kmalloc(), which causes an uninit-value warning in ip_vs_lblc_check_expire(), as reported by syzbot.
Reported-by: syzbot+3e9695f147fb529aa9bc@syzkaller.appspotmail.com Cc: Simon Horman horms@verge.net.au Cc: Julian Anastasov ja@ssi.bg Cc: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Acked-by: Julian Anastasov ja@ssi.bg Acked-by: Simon Horman horms@verge.net.au Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netfilter/ipvs/ip_vs_lblc.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/netfilter/ipvs/ip_vs_lblc.c +++ b/net/netfilter/ipvs/ip_vs_lblc.c @@ -371,6 +371,7 @@ static int ip_vs_lblc_init_svc(struct ip tbl->counter = 1; tbl->dead = false; tbl->svc = svc; + atomic_set(&tbl->entries, 0);
/* * Hook periodic timer for garbage collection
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit eff0e9e1078ea7dc1d794dc50e31baef984c46d7 upstream.
We've so far used the PSCI return codes for SMCCC because they were extremely similar. But with the new ARM DEN 0070A specification, "NOT_REQUIRED" (-2) is clashing with PSCI's "PSCI_RET_INVALID_PARAMS".
Let's bite the bullet and add SMCCC specific return codes. Users can be repainted as and when required.
Acked-by: Will Deacon will.deacon@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/arm-smccc.h | 5 +++++ 1 file changed, 5 insertions(+)
--- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -291,5 +291,10 @@ asmlinkage void __arm_smccc_hvc(unsigned */ #define arm_smccc_1_1_hvc(...) __arm_smccc_1_1(SMCCC_HVC_INST, __VA_ARGS__)
+/* Return codes defined in ARM DEN 0070A */ +#define SMCCC_RET_SUCCESS 0 +#define SMCCC_RET_NOT_SUPPORTED -1 +#define SMCCC_RET_NOT_REQUIRED -2 + #endif /*__ASSEMBLY__*/ #endif /*__LINUX_ARM_SMCCC_H*/
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 8e2906245f1e3b0d027169d9f2e55ce0548cb96e upstream.
In order for the kernel to protect itself, let's call the SSBD mitigation implemented by the higher exception level (either hypervisor or firmware) on each transition between userspace and kernel.
We must take the PSCI conduit into account in order to target the right exception level, hence the introduction of a runtime patching callback.
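A user-space sketch of what the patching callback decides (the encodings are the usual AArch64 "hvc #0" / "smc #0" / "nop" values; the actual alternatives patching is only modelled here by a store into a variable):

#include <stdint.h>
#include <stdio.h>

enum conduit { CONDUIT_NONE, CONDUIT_HVC, CONDUIT_SMC };

static void update_smccc_conduit(uint32_t *slot, enum conduit c)
{
        switch (c) {
        case CONDUIT_HVC:
                *slot = 0xd4000002;     /* hvc #0 */
                break;
        case CONDUIT_SMC:
                *slot = 0xd4000003;     /* smc #0 */
                break;
        default:
                /* no conduit: leave the nop, the mitigation call is skipped */
                break;
        }
}

int main(void)
{
        uint32_t insn = 0xd503201f;     /* nop */

        update_smccc_conduit(&insn, CONDUIT_HVC);
        printf("patched instruction: 0x%08x\n", insn);
        return 0;
}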
Reviewed-by: Mark Rutland mark.rutland@arm.com Reviewed-by: Julien Grall julien.grall@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpu_errata.c | 24 ++++++++++++++++++++++++ arch/arm64/kernel/entry.S | 22 ++++++++++++++++++++++ include/linux/arm-smccc.h | 5 +++++ 3 files changed, 51 insertions(+)
--- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -232,6 +232,30 @@ enable_smccc_arch_workaround_1(const str } #endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */
+#ifdef CONFIG_ARM64_SSBD +void __init arm64_update_smccc_conduit(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, + int nr_inst) +{ + u32 insn; + + BUG_ON(nr_inst != 1); + + switch (psci_ops.conduit) { + case PSCI_CONDUIT_HVC: + insn = aarch64_insn_get_hvc_value(); + break; + case PSCI_CONDUIT_SMC: + insn = aarch64_insn_get_smc_value(); + break; + default: + return; + } + + *updptr = cpu_to_le32(insn); +} +#endif /* CONFIG_ARM64_SSBD */ + #define CAP_MIDR_RANGE(model, v_min, r_min, v_max, r_max) \ .matches = is_affected_midr_range, \ .midr_range = MIDR_RANGE(model, v_min, r_min, v_max, r_max) --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -18,6 +18,7 @@ * along with this program. If not, see http://www.gnu.org/licenses/. */
+#include <linux/arm-smccc.h> #include <linux/init.h> #include <linux/linkage.h>
@@ -137,6 +138,18 @@ alternative_else_nop_endif add \dst, \dst, #(\sym - .entry.tramp.text) .endm
+ // This macro corrupts x0-x3. It is the caller's duty + // to save/restore them if required. + .macro apply_ssbd, state +#ifdef CONFIG_ARM64_SSBD + mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2 + mov w1, #\state +alternative_cb arm64_update_smccc_conduit + nop // Patched to SMC/HVC #0 +alternative_cb_end +#endif + .endm + .macro kernel_entry, el, regsize = 64 .if \regsize == 32 mov w0, w0 // zero upper 32 bits of x0 @@ -163,6 +176,13 @@ alternative_else_nop_endif ldr x19, [tsk, #TSK_TI_FLAGS] // since we can unmask debug disable_step_tsk x19, x20 // exceptions when scheduling.
+ apply_ssbd 1 + +#ifdef CONFIG_ARM64_SSBD + ldp x0, x1, [sp, #16 * 0] + ldp x2, x3, [sp, #16 * 1] +#endif + mov x29, xzr // fp pointed to user-space .else add x21, sp, #S_FRAME_SIZE @@ -303,6 +323,8 @@ alternative_if ARM64_WORKAROUND_845719 alternative_else_nop_endif #endif 3: + apply_ssbd 0 + .endif
msr elr_el1, x21 // set up the return data --- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -80,6 +80,11 @@ ARM_SMCCC_SMC_32, \ 0, 0x8000)
+#define ARM_SMCCC_ARCH_WORKAROUND_2 \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ + ARM_SMCCC_SMC_32, \ + 0, 0x7fff) + #ifndef __ASSEMBLY__
#include <linux/linkage.h>
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 5cf9ce6e5ea50f805c6188c04ed0daaec7b6887d upstream.
In a heterogeneous system, we can end up with both affected and unaffected CPUs. Let's check their status before calling into the firmware.
Reviewed-by: Julien Grall julien.grall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpu_errata.c | 2 ++ arch/arm64/kernel/entry.S | 11 +++++++---- 2 files changed, 9 insertions(+), 4 deletions(-)
--- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -233,6 +233,8 @@ enable_smccc_arch_workaround_1(const str #endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */
#ifdef CONFIG_ARM64_SSBD +DEFINE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required); + void __init arm64_update_smccc_conduit(struct alt_instr *alt, __le32 *origptr, __le32 *updptr, int nr_inst) --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -140,8 +140,10 @@ alternative_else_nop_endif
// This macro corrupts x0-x3. It is the caller's duty // to save/restore them if required. - .macro apply_ssbd, state + .macro apply_ssbd, state, targ, tmp1, tmp2 #ifdef CONFIG_ARM64_SSBD + ldr_this_cpu \tmp2, arm64_ssbd_callback_required, \tmp1 + cbz \tmp2, \targ mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2 mov w1, #\state alternative_cb arm64_update_smccc_conduit @@ -176,12 +178,13 @@ alternative_cb_end ldr x19, [tsk, #TSK_TI_FLAGS] // since we can unmask debug disable_step_tsk x19, x20 // exceptions when scheduling.
- apply_ssbd 1 + apply_ssbd 1, 1f, x22, x23
#ifdef CONFIG_ARM64_SSBD ldp x0, x1, [sp, #16 * 0] ldp x2, x3, [sp, #16 * 1] #endif +1:
mov x29, xzr // fp pointed to user-space .else @@ -323,8 +326,8 @@ alternative_if ARM64_WORKAROUND_845719 alternative_else_nop_endif #endif 3: - apply_ssbd 0 - + apply_ssbd 0, 5f, x0, x1 +5: .endif
msr elr_el1, x21 // set up the return data
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit a725e3dda1813ed306734823ac4c65ca04e38500 upstream.
As for Spectre variant-2, we rely on SMCCC 1.1 to provide the discovery mechanism for detecting the SSBD mitigation.
A new capability is also allocated for that purpose, and a config option.
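A small sketch of how this version of the probe consumes the firmware answer, matching the comment in the hunk above (values and helper name are illustrative): only a zero return from the ARCH_WORKAROUND_2 feature query enables the dynamic per-CPU mitigation; negative (unsupported or already mitigated) and positive (unaffected) both leave it off.

#include <stdbool.h>
#include <stdio.h>

static bool needs_dynamic_mitigation(int fw_answer)
{
        return fw_answer == 0;
}

int main(void)
{
        int answers[] = { -1 /* not supported */, 1 /* unaffected */, 0 /* required */ };
        int n = (int)(sizeof(answers) / sizeof(answers[0]));

        for (int i = 0; i < n; i++)
                printf("firmware says %2d -> callback %s\n", answers[i],
                       needs_dynamic_mitigation(answers[i]) ? "required" : "not required");
        return 0;
}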
Reviewed-by: Julien Grall julien.grall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/Kconfig | 9 +++++ arch/arm64/include/asm/cpucaps.h | 3 + arch/arm64/kernel/cpu_errata.c | 69 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 80 insertions(+), 1 deletion(-)
--- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -938,6 +938,15 @@ config HARDEN_EL2_VECTORS
If unsure, say Y.
+config ARM64_SSBD + bool "Speculative Store Bypass Disable" if EXPERT + default y + help + This enables mitigation of the bypassing of previous stores + by speculative loads. + + If unsure, say Y. + menuconfig ARMV8_DEPRECATED bool "Emulate deprecated/obsolete ARMv8 instructions" depends on COMPAT --- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -48,7 +48,8 @@ #define ARM64_HAS_CACHE_IDC 27 #define ARM64_HAS_CACHE_DIC 28 #define ARM64_HW_DBM 29 +#define ARM64_SSBD 30
-#define ARM64_NCAPS 30 +#define ARM64_NCAPS 31
#endif /* __ASM_CPUCAPS_H */ --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -256,6 +256,67 @@ void __init arm64_update_smccc_conduit(s
*updptr = cpu_to_le32(insn); } + +static void arm64_set_ssbd_mitigation(bool state) +{ + switch (psci_ops.conduit) { + case PSCI_CONDUIT_HVC: + arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_2, state, NULL); + break; + + case PSCI_CONDUIT_SMC: + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2, state, NULL); + break; + + default: + WARN_ON_ONCE(1); + break; + } +} + +static bool has_ssbd_mitigation(const struct arm64_cpu_capabilities *entry, + int scope) +{ + struct arm_smccc_res res; + bool supported = true; + + WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible()); + + if (psci_ops.smccc_version == SMCCC_VERSION_1_0) + return false; + + /* + * The probe function return value is either negative + * (unsupported or mitigated), positive (unaffected), or zero + * (requires mitigation). We only need to do anything in the + * last case. + */ + switch (psci_ops.conduit) { + case PSCI_CONDUIT_HVC: + arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, + ARM_SMCCC_ARCH_WORKAROUND_2, &res); + if ((int)res.a0 != 0) + supported = false; + break; + + case PSCI_CONDUIT_SMC: + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, + ARM_SMCCC_ARCH_WORKAROUND_2, &res); + if ((int)res.a0 != 0) + supported = false; + break; + + default: + supported = false; + } + + if (supported) { + __this_cpu_write(arm64_ssbd_callback_required, 1); + arm64_set_ssbd_mitigation(true); + } + + return supported; +} #endif /* CONFIG_ARM64_SSBD */
#define CAP_MIDR_RANGE(model, v_min, r_min, v_max, r_max) \ @@ -514,6 +575,14 @@ const struct arm64_cpu_capabilities arm6 ERRATA_MIDR_RANGE_LIST(arm64_harden_el2_vectors), }, #endif +#ifdef CONFIG_ARM64_SSBD + { + .desc = "Speculative Store Bypass Disable", + .capability = ARM64_SSBD, + .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM, + .matches = has_ssbd_mitigation, + }, +#endif { } };
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit a43ae4dfe56a01f5b98ba0cb2f784b6a43bafcc6 upstream.
On a system where the firmware implements ARCH_WORKAROUND_2, it may be useful to either permanently enable or disable the workaround for cases where the user decides that they'd rather not get a trap overhead, and keep the mitigation permanently on or off instead of switching it on exception entry/exit.
In any case, default to the mitigation being enabled.
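A user-space sketch of the option handling (the table and state names mirror the patch; the parsing helper is simplified): the ssbd= string is matched against a small table and collapsed into one global policy, defaulting to the kernel-enabled mitigation.

#include <stdio.h>
#include <string.h>

enum { SSBD_FORCE_DISABLE, SSBD_KERNEL, SSBD_FORCE_ENABLE };

static const struct { const char *str; int state; } ssbd_options[] = {
        { "force-on",  SSBD_FORCE_ENABLE  },
        { "force-off", SSBD_FORCE_DISABLE },
        { "kernel",    SSBD_KERNEL        },
};

static int ssbd_state = SSBD_KERNEL;    /* default: mitigation enabled */

static int ssbd_cfg(const char *buf)
{
        if (!buf || !buf[0])
                return -1;

        for (unsigned int i = 0; i < sizeof(ssbd_options) / sizeof(ssbd_options[0]); i++) {
                if (strncmp(buf, ssbd_options[i].str, strlen(ssbd_options[i].str)))
                        continue;
                ssbd_state = ssbd_options[i].state;
                return 0;
        }
        return -1;
}

int main(void)
{
        ssbd_cfg("force-off");
        printf("ssbd_state = %d (0=force-off, 1=kernel, 2=force-on)\n", ssbd_state);
        return 0;
}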
Reviewed-by: Julien Grall julien.grall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/admin-guide/kernel-parameters.txt | 17 +++ arch/arm64/include/asm/cpufeature.h | 6 + arch/arm64/kernel/cpu_errata.c | 103 ++++++++++++++++++++---- 3 files changed, 110 insertions(+), 16 deletions(-)
--- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4092,6 +4092,23 @@ expediting. Set to zero to disable automatic expediting.
+ ssbd= [ARM64,HW] + Speculative Store Bypass Disable control + + On CPUs that are vulnerable to the Speculative + Store Bypass vulnerability and offer a + firmware based mitigation, this parameter + indicates how the mitigation should be used: + + force-on: Unconditionally enable mitigation for + for both kernel and userspace + force-off: Unconditionally disable mitigation for + for both kernel and userspace + kernel: Always enable mitigation in the + kernel, and offer a prctl interface + to allow userspace to register its + interest in being mitigated too. + stack_guard_gap= [MM] override the default stack gap protection. The value is in page units and it defines how many pages prior --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -537,6 +537,12 @@ static inline u64 read_zcr_features(void return zcr; }
+#define ARM64_SSBD_UNKNOWN -1 +#define ARM64_SSBD_FORCE_DISABLE 0 +#define ARM64_SSBD_KERNEL 1 +#define ARM64_SSBD_FORCE_ENABLE 2 +#define ARM64_SSBD_MITIGATED 3 + #endif /* __ASSEMBLY__ */
#endif --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -235,6 +235,38 @@ enable_smccc_arch_workaround_1(const str #ifdef CONFIG_ARM64_SSBD DEFINE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required);
+int ssbd_state __read_mostly = ARM64_SSBD_KERNEL; + +static const struct ssbd_options { + const char *str; + int state; +} ssbd_options[] = { + { "force-on", ARM64_SSBD_FORCE_ENABLE, }, + { "force-off", ARM64_SSBD_FORCE_DISABLE, }, + { "kernel", ARM64_SSBD_KERNEL, }, +}; + +static int __init ssbd_cfg(char *buf) +{ + int i; + + if (!buf || !buf[0]) + return -EINVAL; + + for (i = 0; i < ARRAY_SIZE(ssbd_options); i++) { + int len = strlen(ssbd_options[i].str); + + if (strncmp(buf, ssbd_options[i].str, len)) + continue; + + ssbd_state = ssbd_options[i].state; + return 0; + } + + return -EINVAL; +} +early_param("ssbd", ssbd_cfg); + void __init arm64_update_smccc_conduit(struct alt_instr *alt, __le32 *origptr, __le32 *updptr, int nr_inst) @@ -278,44 +310,83 @@ static bool has_ssbd_mitigation(const st int scope) { struct arm_smccc_res res; - bool supported = true; + bool required = true; + s32 val;
WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
- if (psci_ops.smccc_version == SMCCC_VERSION_1_0) + if (psci_ops.smccc_version == SMCCC_VERSION_1_0) { + ssbd_state = ARM64_SSBD_UNKNOWN; return false; + }
- /* - * The probe function return value is either negative - * (unsupported or mitigated), positive (unaffected), or zero - * (requires mitigation). We only need to do anything in the - * last case. - */ switch (psci_ops.conduit) { case PSCI_CONDUIT_HVC: arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_ARCH_WORKAROUND_2, &res); - if ((int)res.a0 != 0) - supported = false; break;
case PSCI_CONDUIT_SMC: arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_ARCH_WORKAROUND_2, &res); - if ((int)res.a0 != 0) - supported = false; break;
default: - supported = false; + ssbd_state = ARM64_SSBD_UNKNOWN; + return false; + } + + val = (s32)res.a0; + + switch (val) { + case SMCCC_RET_NOT_SUPPORTED: + ssbd_state = ARM64_SSBD_UNKNOWN; + return false; + + case SMCCC_RET_NOT_REQUIRED: + pr_info_once("%s mitigation not required\n", entry->desc); + ssbd_state = ARM64_SSBD_MITIGATED; + return false; + + case SMCCC_RET_SUCCESS: + required = true; + break; + + case 1: /* Mitigation not required on this CPU */ + required = false; + break; + + default: + WARN_ON(1); + return false; }
- if (supported) { - __this_cpu_write(arm64_ssbd_callback_required, 1); + switch (ssbd_state) { + case ARM64_SSBD_FORCE_DISABLE: + pr_info_once("%s disabled from command-line\n", entry->desc); + arm64_set_ssbd_mitigation(false); + required = false; + break; + + case ARM64_SSBD_KERNEL: + if (required) { + __this_cpu_write(arm64_ssbd_callback_required, 1); + arm64_set_ssbd_mitigation(true); + } + break; + + case ARM64_SSBD_FORCE_ENABLE: + pr_info_once("%s forced from command-line\n", entry->desc); arm64_set_ssbd_mitigation(true); + required = true; + break; + + default: + WARN_ON(1); + break; }
- return supported; + return required; } #endif /* CONFIG_ARM64_SSBD */
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit c32e1736ca03904c03de0e4459a673be194f56fd upstream.
We're about to need the mitigation state in various parts of the kernel in order to do the right thing for userspace and guests.
Let's expose an accessor that will let other subsystems know about the state.
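As a rough sketch of the intended use (the caller below is hypothetical; only the accessor and the ARM64_SSBD_KERNEL constant come from this series):

    /* Hypothetical caller: only do per-thread work when the dynamic,
     * kernel-managed mitigation is in use. */
    if (arm64_get_ssbd_state() == ARM64_SSBD_KERNEL) {
            /* arm the per-cpu callback, honour TIF flags, etc. */
    }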
Reviewed-by: Julien Grall julien.grall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/cpufeature.h | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -543,6 +543,16 @@ static inline u64 read_zcr_features(void #define ARM64_SSBD_FORCE_ENABLE 2 #define ARM64_SSBD_MITIGATED 3
+static inline int arm64_get_ssbd_state(void) +{ +#ifdef CONFIG_ARM64_SSBD + extern int ssbd_state; + return ssbd_state; +#else + return ARM64_SSBD_UNKNOWN; +#endif +} + #endif /* __ASSEMBLY__ */
#endif
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 986372c4367f46b34a3c0f6918d7fb95cbdf39d6 upstream.
In order to avoid checking arm64_ssbd_callback_required on each kernel entry/exit even if no mitigation is required, let's add yet another alternative that by default jumps over the mitigation, and that gets nop'ed out if we're doing dynamic mitigation.
Think of it as a poor man's static key...
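Roughly, the patched alternative behaves like the pseudo-C below; the real thing is an assembly branch that gets turned into a NOP at alternative-patching time, not a runtime check (names as used in this series):

    /* Decided once, at patch time, not on every entry/exit: */
    if (arm64_get_ssbd_state() != ARM64_SSBD_KERNEL)
            goto skip;              /* the 'b \targ' stays in place */
    /* branch NOP'ed out: the per-cpu test below actually runs */
    if (!__this_cpu_read(arm64_ssbd_callback_required))
            goto skip;
    /* ... issue the ARM_SMCCC_ARCH_WORKAROUND_2 call ... */
    skip: ;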
Reviewed-by: Julien Grall julien.grall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpu_errata.c | 14 ++++++++++++++ arch/arm64/kernel/entry.S | 3 +++ 2 files changed, 17 insertions(+)
--- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -289,6 +289,20 @@ void __init arm64_update_smccc_conduit(s *updptr = cpu_to_le32(insn); }
+void __init arm64_enable_wa2_handling(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, + int nr_inst) +{ + BUG_ON(nr_inst != 1); + /* + * Only allow mitigation on EL1 entry/exit and guest + * ARCH_WORKAROUND_2 handling if the SSBD state allows it to + * be flipped. + */ + if (arm64_get_ssbd_state() == ARM64_SSBD_KERNEL) + *updptr = cpu_to_le32(aarch64_insn_gen_nop()); +} + static void arm64_set_ssbd_mitigation(bool state) { switch (psci_ops.conduit) { --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -142,6 +142,9 @@ alternative_else_nop_endif // to save/restore them if required. .macro apply_ssbd, state, targ, tmp1, tmp2 #ifdef CONFIG_ARM64_SSBD +alternative_cb arm64_enable_wa2_handling + b \targ +alternative_cb_end ldr_this_cpu \tmp2, arm64_ssbd_callback_required, \tmp1 cbz \tmp2, \targ mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 647d0519b53f440a55df163de21c52a8205431cc upstream.
On a system where firmware can dynamically change the state of the mitigation, the CPU will always come up with the mitigation enabled, including when coming back from suspend.
If the user has requested "no mitigation" via a command line option, let's enforce it by calling into the firmware again to disable it.
Similarly, for a resume from hibernate, the mitigation could have been disabled by the boot kernel. Let's ensure that it is set back on in that case.
Acked-by: Will Deacon will.deacon@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/cpufeature.h | 6 ++++++ arch/arm64/kernel/cpu_errata.c | 2 +- arch/arm64/kernel/hibernate.c | 11 +++++++++++ arch/arm64/kernel/suspend.c | 8 ++++++++ 4 files changed, 26 insertions(+), 1 deletion(-)
--- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -553,6 +553,12 @@ static inline int arm64_get_ssbd_state(v #endif }
+#ifdef CONFIG_ARM64_SSBD +void arm64_set_ssbd_mitigation(bool state); +#else +static inline void arm64_set_ssbd_mitigation(bool state) {} +#endif + #endif /* __ASSEMBLY__ */
#endif --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -303,7 +303,7 @@ void __init arm64_enable_wa2_handling(st *updptr = cpu_to_le32(aarch64_insn_gen_nop()); }
-static void arm64_set_ssbd_mitigation(bool state) +void arm64_set_ssbd_mitigation(bool state) { switch (psci_ops.conduit) { case PSCI_CONDUIT_HVC: --- a/arch/arm64/kernel/hibernate.c +++ b/arch/arm64/kernel/hibernate.c @@ -313,6 +313,17 @@ int swsusp_arch_suspend(void)
sleep_cpu = -EINVAL; __cpu_suspend_exit(); + + /* + * Just in case the boot kernel did turn the SSBD + * mitigation off behind our back, let's set the state + * to what we expect it to be. + */ + switch (arm64_get_ssbd_state()) { + case ARM64_SSBD_FORCE_ENABLE: + case ARM64_SSBD_KERNEL: + arm64_set_ssbd_mitigation(true); + } }
local_daif_restore(flags); --- a/arch/arm64/kernel/suspend.c +++ b/arch/arm64/kernel/suspend.c @@ -62,6 +62,14 @@ void notrace __cpu_suspend_exit(void) */ if (hw_breakpoint_restore) hw_breakpoint_restore(cpu); + + /* + * On resume, firmware implementing dynamic mitigation will + * have turned the mitigation on. If the user has forcefully + * disabled it, make sure their wishes are obeyed. + */ + if (arm64_get_ssbd_state() == ARM64_SSBD_FORCE_DISABLE) + arm64_set_ssbd_mitigation(false); }
/*
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 9dd9614f5476687abbff8d4b12cd08ae70d7c2ad upstream.
In order to allow userspace to be mitigated on demand, let's introduce a new thread flag that prevents the mitigation from being turned off when exiting to userspace, and doesn't turn it on on entry into the kernel (with the assumption that the mitigation is always enabled in the kernel itself).
This will be used by a prctl interface introduced in a later patch.
Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/thread_info.h | 1 + arch/arm64/kernel/entry.S | 2 ++ 2 files changed, 3 insertions(+)
--- a/arch/arm64/include/asm/thread_info.h +++ b/arch/arm64/include/asm/thread_info.h @@ -94,6 +94,7 @@ void arch_release_task_struct(struct tas #define TIF_32BIT 22 /* 32bit process */ #define TIF_SVE 23 /* Scalable Vector Extension in use */ #define TIF_SVE_VL_INHERIT 24 /* Inherit sve_vl_onexec across exec */ +#define TIF_SSBD 25 /* Wants SSB mitigation */
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING) #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -147,6 +147,8 @@ alternative_cb arm64_enable_wa2_handling alternative_cb_end ldr_this_cpu \tmp2, arm64_ssbd_callback_required, \tmp1 cbz \tmp2, \targ + ldr \tmp2, [tsk, #TSK_TI_FLAGS] + tbnz \tmp2, #TIF_SSBD, \targ mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2 mov w1, #\state alternative_cb arm64_update_smccc_conduit
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 9cdc0108baa8ef87c76ed834619886a46bd70cbe upstream.
If running on a system that performs dynamic SSBD mitigation, allow userspace to request the mitigation for itself. This is implemented as a prctl call, allowing the mitigation to be enabled or disabled at will for this particular thread.
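A minimal userspace sketch of the new interface (assuming the generic speculation-control constants from <linux/prctl.h>; this program is illustrative only and not part of the patch):

    #include <stdio.h>
    #include <sys/prctl.h>
    #include <linux/prctl.h>

    int main(void)
    {
            /* Ask the kernel to keep the SSB mitigation on for this thread,
             * i.e. disable speculative store bypass. */
            if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS,
                      PR_SPEC_DISABLE, 0, 0))
                    perror("PR_SET_SPECULATION_CTRL");

            /* Read it back; with this patch applied and dynamic mitigation
             * in use, this returns PR_SPEC_PRCTL | PR_SPEC_DISABLE. */
            printf("SSB state: %ld\n",
                   (long)prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS,
                               0, 0, 0));
            return 0;
    }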
Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/Makefile | 1 arch/arm64/kernel/ssbd.c | 110 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 111 insertions(+) create mode 100644 arch/arm64/kernel/ssbd.c
--- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -54,6 +54,7 @@ arm64-obj-$(CONFIG_ARM64_RELOC_TEST) += arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o arm64-obj-$(CONFIG_CRASH_DUMP) += crash_dump.o arm64-obj-$(CONFIG_ARM_SDE_INTERFACE) += sdei.o +arm64-obj-$(CONFIG_ARM64_SSBD) += ssbd.o
obj-y += $(arm64-obj-y) vdso/ probes/ obj-m += $(arm64-obj-m) --- /dev/null +++ b/arch/arm64/kernel/ssbd.c @@ -0,0 +1,110 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2018 ARM Ltd, All Rights Reserved. + */ + +#include <linux/errno.h> +#include <linux/sched.h> +#include <linux/thread_info.h> + +#include <asm/cpufeature.h> + +/* + * prctl interface for SSBD + * FIXME: Drop the below ifdefery once merged in 4.18. + */ +#ifdef PR_SPEC_STORE_BYPASS +static int ssbd_prctl_set(struct task_struct *task, unsigned long ctrl) +{ + int state = arm64_get_ssbd_state(); + + /* Unsupported */ + if (state == ARM64_SSBD_UNKNOWN) + return -EINVAL; + + /* Treat the unaffected/mitigated state separately */ + if (state == ARM64_SSBD_MITIGATED) { + switch (ctrl) { + case PR_SPEC_ENABLE: + return -EPERM; + case PR_SPEC_DISABLE: + case PR_SPEC_FORCE_DISABLE: + return 0; + } + } + + /* + * Things are a bit backward here: the arm64 internal API + * *enables the mitigation* when the userspace API *disables + * speculation*. So much fun. + */ + switch (ctrl) { + case PR_SPEC_ENABLE: + /* If speculation is force disabled, enable is not allowed */ + if (state == ARM64_SSBD_FORCE_ENABLE || + task_spec_ssb_force_disable(task)) + return -EPERM; + task_clear_spec_ssb_disable(task); + clear_tsk_thread_flag(task, TIF_SSBD); + break; + case PR_SPEC_DISABLE: + if (state == ARM64_SSBD_FORCE_DISABLE) + return -EPERM; + task_set_spec_ssb_disable(task); + set_tsk_thread_flag(task, TIF_SSBD); + break; + case PR_SPEC_FORCE_DISABLE: + if (state == ARM64_SSBD_FORCE_DISABLE) + return -EPERM; + task_set_spec_ssb_disable(task); + task_set_spec_ssb_force_disable(task); + set_tsk_thread_flag(task, TIF_SSBD); + break; + default: + return -ERANGE; + } + + return 0; +} + +int arch_prctl_spec_ctrl_set(struct task_struct *task, unsigned long which, + unsigned long ctrl) +{ + switch (which) { + case PR_SPEC_STORE_BYPASS: + return ssbd_prctl_set(task, ctrl); + default: + return -ENODEV; + } +} + +static int ssbd_prctl_get(struct task_struct *task) +{ + switch (arm64_get_ssbd_state()) { + case ARM64_SSBD_UNKNOWN: + return -EINVAL; + case ARM64_SSBD_FORCE_ENABLE: + return PR_SPEC_DISABLE; + case ARM64_SSBD_KERNEL: + if (task_spec_ssb_force_disable(task)) + return PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE; + if (task_spec_ssb_disable(task)) + return PR_SPEC_PRCTL | PR_SPEC_DISABLE; + return PR_SPEC_PRCTL | PR_SPEC_ENABLE; + case ARM64_SSBD_FORCE_DISABLE: + return PR_SPEC_ENABLE; + default: + return PR_SPEC_NOT_AFFECTED; + } +} + +int arch_prctl_spec_ctrl_get(struct task_struct *task, unsigned long which) +{ + switch (which) { + case PR_SPEC_STORE_BYPASS: + return ssbd_prctl_get(task); + default: + return -ENODEV; + } +} +#endif /* PR_SPEC_STORE_BYPASS */
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 85478bab409171de501b719971fd25a3d5d639f9 upstream.
As we're going to need to access per-cpu variables at EL2, let's craft the minimum set of accessors required to read a per-cpu variable, relying on tpidr_el2 to contain the per-cpu offset.
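As a sketch of the intended use (a later patch in this series does exactly this in the world-switch code):

    /* At EL2 the normal this_cpu accessors can't be used, so go through
     * the tpidr_el2-based helper instead. */
    u64 required = __hyp_this_cpu_read(arm64_ssbd_callback_required);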
Reviewed-by: Christoffer Dall christoffer.dall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/kvm_asm.h | 27 +++++++++++++++++++++++++-- 1 file changed, 25 insertions(+), 2 deletions(-)
--- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -71,14 +71,37 @@ extern u32 __kvm_get_mdcr_el2(void);
extern u32 __init_stage2_translation(void);
+/* Home-grown __this_cpu_{ptr,read} variants that always work at HYP */ +#define __hyp_this_cpu_ptr(sym) \ + ({ \ + void *__ptr = hyp_symbol_addr(sym); \ + __ptr += read_sysreg(tpidr_el2); \ + (typeof(&sym))__ptr; \ + }) + +#define __hyp_this_cpu_read(sym) \ + ({ \ + *__hyp_this_cpu_ptr(sym); \ + }) + #else /* __ASSEMBLY__ */
-.macro get_host_ctxt reg, tmp - adr_l \reg, kvm_host_cpu_state +.macro hyp_adr_this_cpu reg, sym, tmp + adr_l \reg, \sym mrs \tmp, tpidr_el2 add \reg, \reg, \tmp .endm
+.macro hyp_ldr_this_cpu reg, sym, tmp + adr_l \reg, \sym + mrs \tmp, tpidr_el2 + ldr \reg, [\reg, \tmp] +.endm + +.macro get_host_ctxt reg, tmp + hyp_adr_this_cpu \reg, kvm_host_cpu_state, \tmp +.endm + .macro get_vcpu_ptr vcpu, ctxt get_host_ctxt \ctxt, \vcpu ldr \vcpu, [\ctxt, #HOST_CONTEXT_VCPU]
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 55e3748e8902ff641e334226bdcb432f9a5d78d3 upstream.
In order to offer ARCH_WORKAROUND_2 support to guests, we need a bit of infrastructure.
Let's add a flag indicating whether or not the guest uses SSBD mitigation. Depending on the state of this flag, allow KVM to disable ARCH_WORKAROUND_2 before entering the guest, and enable it when exiting it.
Reviewed-by: Christoffer Dall christoffer.dall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/include/asm/kvm_mmu.h | 5 ++++ arch/arm64/include/asm/kvm_asm.h | 3 ++ arch/arm64/include/asm/kvm_host.h | 3 ++ arch/arm64/include/asm/kvm_mmu.h | 24 +++++++++++++++++++++ arch/arm64/kvm/hyp/switch.c | 42 ++++++++++++++++++++++++++++++++++++++ virt/kvm/arm/arm.c | 4 +++ 6 files changed, 81 insertions(+)
--- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -335,6 +335,11 @@ static inline int kvm_map_vectors(void) return 0; }
+static inline int hyp_map_aux_data(void) +{ + return 0; +} + #define kvm_phys_to_vttbr(addr) (addr)
#endif /* !__ASSEMBLY__ */ --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -33,6 +33,9 @@ #define KVM_ARM64_DEBUG_DIRTY_SHIFT 0 #define KVM_ARM64_DEBUG_DIRTY (1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
+#define VCPU_WORKAROUND_2_FLAG_SHIFT 0 +#define VCPU_WORKAROUND_2_FLAG (_AC(1, UL) << VCPU_WORKAROUND_2_FLAG_SHIFT) + /* Translate a kernel address of @sym into its equivalent linear mapping */ #define kvm_ksym_ref(sym) \ ({ \ --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -216,6 +216,9 @@ struct kvm_vcpu_arch { /* Exception Information */ struct kvm_vcpu_fault_info fault;
+ /* State of various workarounds, see kvm_asm.h for bit assignment */ + u64 workaround_flags; + /* Guest debug state */ u64 debug_flags;
--- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -473,6 +473,30 @@ static inline int kvm_map_vectors(void) } #endif
+#ifdef CONFIG_ARM64_SSBD +DECLARE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required); + +static inline int hyp_map_aux_data(void) +{ + int cpu, err; + + for_each_possible_cpu(cpu) { + u64 *ptr; + + ptr = per_cpu_ptr(&arm64_ssbd_callback_required, cpu); + err = create_hyp_mappings(ptr, ptr + 1, PAGE_HYP); + if (err) + return err; + } + return 0; +} +#else +static inline int hyp_map_aux_data(void) +{ + return 0; +} +#endif + #define kvm_phys_to_vttbr(addr) phys_to_ttbr(addr)
#endif /* __ASSEMBLY__ */ --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -15,6 +15,7 @@ * along with this program. If not, see http://www.gnu.org/licenses/. */
+#include <linux/arm-smccc.h> #include <linux/types.h> #include <linux/jump_label.h> #include <uapi/linux/psci.h> @@ -389,6 +390,39 @@ static bool __hyp_text fixup_guest_exit( return false; }
+static inline bool __hyp_text __needs_ssbd_off(struct kvm_vcpu *vcpu) +{ + if (!cpus_have_const_cap(ARM64_SSBD)) + return false; + + return !(vcpu->arch.workaround_flags & VCPU_WORKAROUND_2_FLAG); +} + +static void __hyp_text __set_guest_arch_workaround_state(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_ARM64_SSBD + /* + * The host runs with the workaround always present. If the + * guest wants it disabled, so be it... + */ + if (__needs_ssbd_off(vcpu) && + __hyp_this_cpu_read(arm64_ssbd_callback_required)) + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2, 0, NULL); +#endif +} + +static void __hyp_text __set_host_arch_workaround_state(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_ARM64_SSBD + /* + * If the guest has disabled the workaround, bring it back on. + */ + if (__needs_ssbd_off(vcpu) && + __hyp_this_cpu_read(arm64_ssbd_callback_required)) + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2, 1, NULL); +#endif +} + /* Switch to the guest for VHE systems running in EL2 */ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) { @@ -409,6 +443,8 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vc sysreg_restore_guest_state_vhe(guest_ctxt); __debug_switch_to_guest(vcpu);
+ __set_guest_arch_workaround_state(vcpu); + do { /* Jump in the fire! */ exit_code = __guest_enter(vcpu, host_ctxt); @@ -416,6 +452,8 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vc /* And we're baaack! */ } while (fixup_guest_exit(vcpu, &exit_code));
+ __set_host_arch_workaround_state(vcpu); + fp_enabled = fpsimd_enabled_vhe();
sysreg_save_guest_state_vhe(guest_ctxt); @@ -465,6 +503,8 @@ int __hyp_text __kvm_vcpu_run_nvhe(struc __sysreg_restore_state_nvhe(guest_ctxt); __debug_switch_to_guest(vcpu);
+ __set_guest_arch_workaround_state(vcpu); + do { /* Jump in the fire! */ exit_code = __guest_enter(vcpu, host_ctxt); @@ -472,6 +512,8 @@ int __hyp_text __kvm_vcpu_run_nvhe(struc /* And we're baaack! */ } while (fixup_guest_exit(vcpu, &exit_code));
+ __set_host_arch_workaround_state(vcpu); + fp_enabled = __fpsimd_enabled_nvhe();
__sysreg_save_state_nvhe(guest_ctxt); --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -1490,6 +1490,10 @@ static int init_hyp_mode(void) } }
+ err = hyp_map_aux_data(); + if (err) + kvm_err("Cannot map host auxilary data: %d\n", err); + return 0;
out_err:
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit b4f18c063a13dfb33e3a63fe1844823e19c2265e upstream.
In order to forward the guest's ARCH_WORKAROUND_2 calls to EL3, add a small(-ish) sequence to handle it at EL2. Special care must be taken to track the state of the guest itself by updating the workaround flags. We also rely on patching to enable calls into the firmware.
Note that since we need to execute branches, this always executes after the Spectre-v2 mitigation has been applied.
Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/asm-offsets.c | 1 + arch/arm64/kvm/hyp/hyp-entry.S | 38 +++++++++++++++++++++++++++++++++++++- 2 files changed, 38 insertions(+), 1 deletion(-)
--- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -136,6 +136,7 @@ int main(void) #ifdef CONFIG_KVM_ARM_HOST DEFINE(VCPU_CONTEXT, offsetof(struct kvm_vcpu, arch.ctxt)); DEFINE(VCPU_FAULT_DISR, offsetof(struct kvm_vcpu, arch.fault.disr_el1)); + DEFINE(VCPU_WORKAROUND_FLAGS, offsetof(struct kvm_vcpu, arch.workaround_flags)); DEFINE(CPU_GP_REGS, offsetof(struct kvm_cpu_context, gp_regs)); DEFINE(CPU_USER_PT_REGS, offsetof(struct kvm_regs, regs)); DEFINE(CPU_FP_REGS, offsetof(struct kvm_regs, fp_regs)); --- a/arch/arm64/kvm/hyp/hyp-entry.S +++ b/arch/arm64/kvm/hyp/hyp-entry.S @@ -106,8 +106,44 @@ el1_hvc_guest: */ ldr x1, [sp] // Guest's x0 eor w1, w1, #ARM_SMCCC_ARCH_WORKAROUND_1 + cbz w1, wa_epilogue + + /* ARM_SMCCC_ARCH_WORKAROUND_2 handling */ + eor w1, w1, #(ARM_SMCCC_ARCH_WORKAROUND_1 ^ \ + ARM_SMCCC_ARCH_WORKAROUND_2) cbnz w1, el1_trap - mov x0, x1 + +#ifdef CONFIG_ARM64_SSBD +alternative_cb arm64_enable_wa2_handling + b wa2_end +alternative_cb_end + get_vcpu_ptr x2, x0 + ldr x0, [x2, #VCPU_WORKAROUND_FLAGS] + + // Sanitize the argument and update the guest flags + ldr x1, [sp, #8] // Guest's x1 + clz w1, w1 // Murphy's device: + lsr w1, w1, #5 // w1 = !!w1 without using + eor w1, w1, #1 // the flags... + bfi x0, x1, #VCPU_WORKAROUND_2_FLAG_SHIFT, #1 + str x0, [x2, #VCPU_WORKAROUND_FLAGS] + + /* Check that we actually need to perform the call */ + hyp_ldr_this_cpu x0, arm64_ssbd_callback_required, x2 + cbz x0, wa2_end + + mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2 + smc #0 + + /* Don't leak data from the SMC call */ + mov x3, xzr +wa2_end: + mov x2, xzr + mov x1, xzr +#endif + +wa_epilogue: + mov x0, xzr add sp, sp, #16 eret
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 5d81f7dc9bca4f4963092433e27b508cbe524a32 upstream.
Now that all our infrastructure is in place, let's expose the availability of ARCH_WORKAROUND_2 to guests. We take this opportunity to tidy up a couple of SMCCC constants.
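For context, a guest kernel discovers this the same way the host probe added earlier in this series does; roughly, for the HVC conduit (sketch only):

    struct arm_smccc_res res;

    /* Ask the (virtual) firmware whether ARCH_WORKAROUND_2 is offered. */
    arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
                      ARM_SMCCC_ARCH_WORKAROUND_2, &res);
    /* SMCCC_RET_SUCCESS       -> dynamic mitigation available
     * SMCCC_RET_NOT_REQUIRED  -> mitigated/unaffected, nothing to do
     * SMCCC_RET_NOT_SUPPORTED -> not offered */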
Acked-by: Christoffer Dall christoffer.dall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/include/asm/kvm_host.h | 12 ++++++++++++ arch/arm64/include/asm/kvm_host.h | 23 +++++++++++++++++++++++ arch/arm64/kvm/reset.c | 4 ++++ virt/kvm/arm/psci.c | 18 ++++++++++++++++-- 4 files changed, 55 insertions(+), 2 deletions(-)
--- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -315,6 +315,18 @@ static inline bool kvm_arm_harden_branch return false; }
+#define KVM_SSBD_UNKNOWN -1 +#define KVM_SSBD_FORCE_DISABLE 0 +#define KVM_SSBD_KERNEL 1 +#define KVM_SSBD_FORCE_ENABLE 2 +#define KVM_SSBD_MITIGATED 3 + +static inline int kvm_arm_have_ssbd(void) +{ + /* No way to detect it yet, pretend it is not there. */ + return KVM_SSBD_UNKNOWN; +} + static inline void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu) {} static inline void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu) {}
--- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -455,6 +455,29 @@ static inline bool kvm_arm_harden_branch return cpus_have_const_cap(ARM64_HARDEN_BRANCH_PREDICTOR); }
+#define KVM_SSBD_UNKNOWN -1 +#define KVM_SSBD_FORCE_DISABLE 0 +#define KVM_SSBD_KERNEL 1 +#define KVM_SSBD_FORCE_ENABLE 2 +#define KVM_SSBD_MITIGATED 3 + +static inline int kvm_arm_have_ssbd(void) +{ + switch (arm64_get_ssbd_state()) { + case ARM64_SSBD_FORCE_DISABLE: + return KVM_SSBD_FORCE_DISABLE; + case ARM64_SSBD_KERNEL: + return KVM_SSBD_KERNEL; + case ARM64_SSBD_FORCE_ENABLE: + return KVM_SSBD_FORCE_ENABLE; + case ARM64_SSBD_MITIGATED: + return KVM_SSBD_MITIGATED; + case ARM64_SSBD_UNKNOWN: + default: + return KVM_SSBD_UNKNOWN; + } +} + void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu); void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu);
--- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -122,6 +122,10 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu /* Reset PMU */ kvm_pmu_vcpu_reset(vcpu);
+ /* Default workaround setup is enabled (if supported) */ + if (kvm_arm_have_ssbd() == KVM_SSBD_KERNEL) + vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG; + /* Reset timer */ return kvm_timer_vcpu_reset(vcpu); } --- a/virt/kvm/arm/psci.c +++ b/virt/kvm/arm/psci.c @@ -405,7 +405,7 @@ static int kvm_psci_call(struct kvm_vcpu int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { u32 func_id = smccc_get_function(vcpu); - u32 val = PSCI_RET_NOT_SUPPORTED; + u32 val = SMCCC_RET_NOT_SUPPORTED; u32 feature;
switch (func_id) { @@ -417,7 +417,21 @@ int kvm_hvc_call_handler(struct kvm_vcpu switch(feature) { case ARM_SMCCC_ARCH_WORKAROUND_1: if (kvm_arm_harden_branch_predictor()) - val = 0; + val = SMCCC_RET_SUCCESS; + break; + case ARM_SMCCC_ARCH_WORKAROUND_2: + switch (kvm_arm_have_ssbd()) { + case KVM_SSBD_FORCE_DISABLE: + case KVM_SSBD_UNKNOWN: + break; + case KVM_SSBD_KERNEL: + val = SMCCC_RET_SUCCESS; + break; + case KVM_SSBD_FORCE_ENABLE: + case KVM_SSBD_MITIGATED: + val = SMCCC_RET_NOT_REQUIRED; + break; + } break; } break;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit 9262478220eac908ae6e168c3df2c453c87e2da3 upstream.
After commit 9facc336876f ("bpf: reject any prog that failed read-only lock") offsetof(struct bpf_binary_header, image) became 3 instead of 4, breaking powerpc BPF badly, since instructions need to be word aligned.
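To illustrate the layout issue outside the kernel tree, a stand-alone sketch (struct names here are made up; per the commit message, the flexible array member landed at offset 3 without an explicit alignment, and the __aligned(4) pushes it back to a word boundary):

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    struct hdr_old { uint16_t pages; uint16_t locked:1; uint8_t image[]; };
    struct hdr_new { uint16_t pages; uint16_t locked:1;
                     uint8_t image[] __attribute__((aligned(4))); };

    int main(void)
    {
            /* Without the explicit alignment, image[] may start right after
             * the bitfield's storage byte; with it, it is pushed to the next
             * 4-byte boundary. */
            printf("old: %zu, new: %zu\n",
                   offsetof(struct hdr_old, image),
                   offsetof(struct hdr_new, image));
            return 0;
    }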
Fixes: 9facc336876f ("bpf: reject any prog that failed read-only lock") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Daniel Borkmann daniel@iogearbox.net Cc: Martin KaFai Lau kafai@fb.com Cc: Alexei Starovoitov ast@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/filter.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -455,7 +455,9 @@ struct sock_fprog_kern { struct bpf_binary_header { u16 pages; u16 locked:1; - u8 image[]; + + /* Some arches need word alignment for their instructions */ + u8 image[] __aligned(4); };
struct bpf_prog {
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
commit 18d405af30bf6506bf2fc49056de7691c949812e upstream.
Any eBPF JIT whose underlying arch supports ARCH_HAS_SET_MEMORY needs to use the bpf_jit_binary_{un,}lock_ro() pair instead of calling set_memory_{ro,rw}() directly, as otherwise changes to the former might break it. arm32's eBPF conversion missed this change, so fix it up here.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/net/bpf_jit_32.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/net/bpf_jit_32.c +++ b/arch/arm/net/bpf_jit_32.c @@ -1928,7 +1928,7 @@ struct bpf_prog *bpf_int_jit_compile(str /* there are 2 passes here */ bpf_jit_dump(prog->len, image_size, 2, ctx.target);
- set_memory_ro((unsigned long)header, header->pages); + bpf_jit_binary_lock_ro(header); prog->bpf_func = (void *)ctx.target; prog->jited = 1; prog->jited_len = image_size;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
commit 85782e037f8aba8922dadb24a1523ca0b82ab8bc upstream.
Partially undo commit 9facc336876f ("bpf: reject any prog that failed read-only lock") since it caused a regression: syzkaller managed to cause a panic via fault injection deep in the set_memory_ro() path by letting an allocation fail. In x86's __change_page_attr_set_clr() it was able to change the attributes of the primary mapping but not of the alias mapping via cpa_process_alias(), so the second, inner call to __change_page_attr() via __change_page_attr_set_clr() had to split a larger page and failed in alloc_pages() with the artificially triggered allocation error, which was then propagated down to the call site.
Thus, set_memory_ro() returned with an error, but debugging with probe_kernel_write() revealed EFAULT on that memory, since the primary mapping had in fact been changed. The subsequent hdr->locked = 0 reset therefore triggered the panic, as it was performed on read-only memory. The call-site assumption that set_memory_ro() would either fully succeed or fully fail was wrong, since set_memory_*() calls provide no rollback on a partial change of mappings; in other words, we are left in a state that is "half done". A later undo via set_memory_rw() nevertheless succeeds due to matching permissions on that part (i.e. because try_preserve_large_page() succeeds). While reproducing this locally by explicitly triggering the error, the initial splitting only happens on rare occasions, and in the real world it would additionally need OOM conditions; but that said, it can partially fail. It is therefore wrong to bail out on a set_memory_ro() error and reject the program with the set_memory_*() semantics we have today. We shouldn't have gone the extra mile, since no other in-tree user today checks for set_memory_*() errors: neither module_enable_ro() / module_disable_ro() for module RO/NX handling (which is mostly the default these days), nor the kprobes core with alloc_insn_page() / free_insn_page(), both of which can be invoked long after bootup. The original 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit against spraying attacks") didn't check either when it was first introduced to BPF, so "improving" things by bailing out was clearly not right while set_memory_*() cannot handle it today.
Kees suggested that if set_memory_*() can fail, it should be annotated with __must_check, and all callers need to deal with failure gracefully, given that the set_memory_*() markings aren't "advisory" but are expected to actually do what they say. This might be an option worth moving forward with in the future, but it would also require that set_memory_*() calls on supporting archs are guaranteed to be "atomic", in that they roll back if part of the range fails. Once that is the case, the RW -> RO transition could be made more robust that way, while the subsequent RO -> RW transition /must/ continue to guarantee that the undo always succeeds.
Reported-by: syzbot+a4eb8c7766952a1ca872@syzkaller.appspotmail.com Reported-by: syzbot+d866d1925855328eac3b@syzkaller.appspotmail.com Fixes: 9facc336876f ("bpf: reject any prog that failed read-only lock") Cc: Laura Abbott labbott@redhat.com Cc: Kees Cook keescook@chromium.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/filter.h | 56 +++++++------------------------------------------ kernel/bpf/core.c | 28 ------------------------ 2 files changed, 8 insertions(+), 76 deletions(-)
--- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -453,9 +453,7 @@ struct sock_fprog_kern { };
struct bpf_binary_header { - u16 pages; - u16 locked:1; - + u32 pages; /* Some arches need word alignment for their instructions */ u8 image[] __aligned(4); }; @@ -464,7 +462,7 @@ struct bpf_prog { u16 pages; /* Number of allocated pages */ u16 jited:1, /* Is our filter JIT'ed? */ jit_requested:1,/* archs need to JIT the prog */ - locked:1, /* Program image locked? */ + undo_set_mem:1, /* Passed set_memory_ro() checkpoint */ gpl_compatible:1, /* Is filter GPL compatible? */ cb_access:1, /* Is control block accessed? */ dst_needed:1, /* Do we need dst entry? */ @@ -649,46 +647,24 @@ bpf_ctx_narrow_access_ok(u32 off, u32 si
static inline void bpf_prog_lock_ro(struct bpf_prog *fp) { -#ifdef CONFIG_ARCH_HAS_SET_MEMORY - fp->locked = 1; - if (set_memory_ro((unsigned long)fp, fp->pages)) - fp->locked = 0; -#endif + fp->undo_set_mem = 1; + set_memory_ro((unsigned long)fp, fp->pages); }
static inline void bpf_prog_unlock_ro(struct bpf_prog *fp) { -#ifdef CONFIG_ARCH_HAS_SET_MEMORY - if (fp->locked) { - WARN_ON_ONCE(set_memory_rw((unsigned long)fp, fp->pages)); - /* In case set_memory_rw() fails, we want to be the first - * to crash here instead of some random place later on. - */ - fp->locked = 0; - } -#endif + if (fp->undo_set_mem) + set_memory_rw((unsigned long)fp, fp->pages); }
static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr) { -#ifdef CONFIG_ARCH_HAS_SET_MEMORY - hdr->locked = 1; - if (set_memory_ro((unsigned long)hdr, hdr->pages)) - hdr->locked = 0; -#endif + set_memory_ro((unsigned long)hdr, hdr->pages); }
static inline void bpf_jit_binary_unlock_ro(struct bpf_binary_header *hdr) { -#ifdef CONFIG_ARCH_HAS_SET_MEMORY - if (hdr->locked) { - WARN_ON_ONCE(set_memory_rw((unsigned long)hdr, hdr->pages)); - /* In case set_memory_rw() fails, we want to be the first - * to crash here instead of some random place later on. - */ - hdr->locked = 0; - } -#endif + set_memory_rw((unsigned long)hdr, hdr->pages); }
static inline struct bpf_binary_header * @@ -700,22 +676,6 @@ bpf_jit_binary_hdr(const struct bpf_prog return (void *)addr; }
-#ifdef CONFIG_ARCH_HAS_SET_MEMORY -static inline int bpf_prog_check_pages_ro_single(const struct bpf_prog *fp) -{ - if (!fp->locked) - return -ENOLCK; - if (fp->jited) { - const struct bpf_binary_header *hdr = bpf_jit_binary_hdr(fp); - - if (!hdr->locked) - return -ENOLCK; - } - - return 0; -} -#endif - int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap); static inline int sk_filter(struct sock *sk, struct sk_buff *skb) { --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -583,8 +583,6 @@ bpf_jit_binary_alloc(unsigned int progle bpf_fill_ill_insns(hdr, size);
hdr->pages = size / PAGE_SIZE; - hdr->locked = 0; - hole = min_t(unsigned int, size - (proglen + sizeof(*hdr)), PAGE_SIZE - sizeof(*hdr)); start = (get_random_int() % hole) & ~(alignment - 1); @@ -1515,22 +1513,6 @@ static int bpf_check_tail_call(const str return 0; }
-static int bpf_prog_check_pages_ro_locked(const struct bpf_prog *fp) -{ -#ifdef CONFIG_ARCH_HAS_SET_MEMORY - int i, err; - - for (i = 0; i < fp->aux->func_cnt; i++) { - err = bpf_prog_check_pages_ro_single(fp->aux->func[i]); - if (err) - return err; - } - - return bpf_prog_check_pages_ro_single(fp); -#endif - return 0; -} - static void bpf_prog_select_func(struct bpf_prog *fp) { #ifndef CONFIG_BPF_JIT_ALWAYS_ON @@ -1589,17 +1571,7 @@ finalize: * all eBPF JITs might immediately support all features. */ *err = bpf_check_tail_call(fp); - if (*err) - return fp;
- /* Checkpoint: at this point onwards any cBPF -> eBPF or - * native eBPF program is read-only. If we failed to change - * the page attributes (e.g. allocation failure from - * splitting large pages), then reject the whole program - * in order to guarantee not ending up with any W+X pages - * from BPF side in kernel. - */ - *err = bpf_prog_check_pages_ro_locked(fp); return fp; } EXPORT_SYMBOL_GPL(bpf_prog_select_runtime);
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary
------------------------------------------------------------------------
kernel: 4.17.9-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.17.y
git commit: 8055764ff6b0c0f7e1b363542c7ffab1af1e498b
git describe: v4.17.8-102-g8055764ff6b0
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.17-oe/build/v4.17.8-102...
No regressions (compared to build v4.17.8)
Ran 16384 total tests in the following environments and test suites.
Environments
--------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_x86_64
- x15 - arm
- x86_64
Test Suites
-----------
* boot
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
On Sat, Jul 21, 2018 at 01:24:17PM +0530, Naresh Kamboju wrote:
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Thanks for testing all of these and letting me know.
greg k-h
Build results:
	total: 134 pass: 134 fail: 0
Qemu test results:
	total: 172 pass: 172 fail: 0
Details are available at http://kerneltests.org/builders/.
Guenter