This is the start of the stable review cycle for the 4.19.35 release. There are 101 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Apr 17 18:36:40 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.35-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.19.35-rc1
Marc Orr marcorr@google.com KVM: x86: nVMX: fix x2APIC VTPR read intercept
Marc Orr marcorr@google.com KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887)
Erik Schmauss erik.schmauss@intel.com ACPICA: AML interpreter: add region addresses in global list during initialization
Tomohiro Mayama parly-gh@iris.mystia.org arm64: dts: rockchip: Fix vcc_host1_5v GPIO polarity on rk3328-rock64
Katsuhiro Suzuki katsuhiro@katsuster.net arm64: dts: rockchip: fix vcc_host1_5v pin assign on rk3328-rock64
Mikulas Patocka mpatocka@redhat.com dm integrity: fix deadlock with overlapping I/O
Ilya Dryomov idryomov@gmail.com dm table: propagate BDI_CAP_STABLE_WRITES to fix sporadic checksum errors
Mikulas Patocka mpatocka@redhat.com dm: revert 8f50e358153d ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE")
Mikulas Patocka mpatocka@redhat.com dm integrity: change memcmp to strncmp in dm_integrity_ctr
Sergey Miroshnichenko s.miroshnichenko@yadro.com PCI: pciehp: Ignore Link State Changes after powering off a slot
Andre Przywara andre.przywara@arm.com PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller
Lendacky, Thomas Thomas.Lendacky@amd.com x86/perf/amd: Remove need to check "running" bit in NMI handler
Lendacky, Thomas Thomas.Lendacky@amd.com x86/perf/amd: Resolve NMI latency issues for active PMCs
Lendacky, Thomas Thomas.Lendacky@amd.com x86/perf/amd: Resolve race condition when disabling PMC
Alexander Potapenko glider@google.com x86/asm: Use stricter assembly constraints in bitops
Rasmus Villemoes linux@rasmusvillemoes.dk x86/asm: Remove dead __GNUC__ conditionals
Max Filippov jcmvbkbc@gmail.com xtensa: fix return_address
Mel Gorman mgorman@techsingularity.net sched/fair: Do not re-read ->h_load_next during hierarchical load calculation
Dan Carpenter dan.carpenter@oracle.com xen: Prevent buffer overflow in privcmd ioctl
Will Deacon will.deacon@arm.com arm64: backtrace: Don't bother trying to unwind the userspace stack
Peter Geis pgwipeout@gmail.com arm64: dts: rockchip: fix rk3328 rgmii high tx error rate
Will Deacon will.deacon@arm.com arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
David Engraf david.engraf@sysgo.com ARM: dts: at91: Fix typo in ISC_D0 on PC9
Peter Ujfalusi peter.ujfalusi@ti.com ARM: dts: am335x-evm: Correct the regulators for the audio codec
Peter Ujfalusi peter.ujfalusi@ti.com ARM: dts: am335x-evmsk: Correct the regulators for the audio codec
Jonas Karlman jonas@kwiboo.se ARM: dts: rockchip: fix rk3288 cpu opp node reference
Cornelia Huck cohuck@redhat.com virtio: Honour 'may_reduce_num' in vring_create_virtqueue
Kefeng Wang wangkefeng.wang@huawei.com genirq: Initialize request_mutex if CONFIG_SPARSE_IRQ=n
Stephen Boyd swboyd@chromium.org genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent()
Jason Yan yanaijie@huawei.com block: fix the return errno for direct IO
Jérôme Glisse jglisse@redhat.com block: do not leak memory in bio_copy_user_iov()
Dmitry V. Levin ldv@altlinux.org riscv: Fix syscall_get_arguments() and syscall_set_arguments()
Anand Jain anand.jain@oracle.com btrfs: prop: fix vanished compression property after failed set
Anand Jain anand.jain@oracle.com btrfs: prop: fix zstd compression parameter validation
Filipe Manana fdmanana@suse.com Btrfs: do not allow trimming when a fs is mounted with the nologreplay option
S.j. Wang shengjiu.wang@nxp.com ASoC: fsl_esai: fix channel swap issue when stream starts
Guenter Roeck linux@roeck-us.net ASoC: intel: Fix crash at suspend/resume after failed codec registration
Greg Thelen gthelen@google.com mm: writeback: use exact memcg dirty counts
Arnd Bergmann arnd@arndb.de include/linux/bitrev.h: fix constant bitrev
David Rientjes rientjes@google.com kvm: svm: fix potential get_num_contig_pages overflow
Dave Airlie airlied@redhat.com drm/udl: add a release method and delay modeset teardown
Yan Zhao yan.y.zhao@intel.com drm/i915/gvt: do not deliver a workload if its creation fails
Andrei Vagin avagin@gmail.com alarmtimer: Return correct remaining time
Sven Schnelle svens@stackframe.org parisc: also set iaoq_b in instruction_pointer_set()
Sven Schnelle svens@stackframe.org parisc: regs_return_value() should return gpr28
Helge Deller deller@gmx.de parisc: Detect QEMU earlier in boot process
Peter Geis pgwipeout@gmail.com arm64: dts: rockchip: fix rk3328 sdmmc0 write errors
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()
Hui Wang hui.wang@canonical.com ALSA: hda - Add two more machines to the power_save_blacklist
Richard Sailer rs@tuxedocomputers.com ALSA: hda/realtek - Add quirk for Tuxedo XC 1509
Jian-Hong Pan jian-hong@endlessm.com ALSA: hda/realtek: Enable headset MIC of Acer TravelMate B114-21 with ALC233
Zubin Mithra zsm@chromium.org ALSA: seq: Fix OOB-reads from strlcpy
Erik Schmauss erik.schmauss@intel.com ACPICA: Namespace: remove address node from global list after method termination
Furquan Shaikh furquan@google.com ACPICA: Clear status of GPEs before enabling them
Axel Lin axel.lin@ingics.com hwmon: (w83773g) Select REGMAP_I2C to fix build error
Greg Kroah-Hartman gregkh@linuxfoundation.org tty: ldisc: add sysctl to prevent autoloading of ldiscs
Greg Kroah-Hartman gregkh@linuxfoundation.org tty: mark Siemens R3964 line discipline as BROKEN
Yueyi Li liyueyi@live.com arm64: kaslr: Reserve size of ARM64_MEMSTART_ALIGN in linear region
Florian Westphal fw@strlen.de netfilter: nfnetlink_cttimeout: fetch timeouts for udplite and gre, too
Pablo Neira Ayuso pablo@netfilter.org netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr
Neil Armstrong narmstrong@baylibre.com Revert "clk: meson: clean-up clock registration"
Nick Desaulniers ndesaulniers@google.com lib/string.c: implement a basic bcmp
Nick Desaulniers ndesaulniers@google.com x86/vdso: Drop implicit common-page-size linker flag
Nick Desaulniers ndesaulniers@google.com kbuild: clang: choose GCC_TOOLCHAIN_DIR not on LD
Masahiro Yamada yamada.masahiro@socionext.com kbuild: deb-pkg: fix bindeb-pkg breakage when O= is used
Huy Nguyen huyn@mellanox.com net/mlx5e: Update xon formula
Huy Nguyen huyn@mellanox.com net/mlx5e: Update xoff formula
Aditya Pakki pakki001@umn.edu net: mlx5: Add a missing check on idr_find, free buf
Heiner Kallweit hkallweit1@gmail.com r8169: disable default rx interrupt coalescing on RTL8168
Alexander Lobakin alobakin@dlink.ru net: core: netif_receive_skb_list: unlist skb before passing to pt->func
Lorenzo Bianconi lorenzo.bianconi@redhat.com net: ip6_gre: fix possible use-after-free in ip6erspan_rcv
Lorenzo Bianconi lorenzo.bianconi@redhat.com net: ip_gre: fix possible use-after-free in erspan_rcv
Michael Chan michael.chan@broadcom.com bnxt_en: Reset device on RX buffer errors.
Michael Chan michael.chan@broadcom.com bnxt_en: Improve RX consumer index validity check.
Jakub Kicinski jakub.kicinski@netronome.com nfp: disable netpoll on representors
Jakub Kicinski jakub.kicinski@netronome.com nfp: validate the return code from dev_queue_xmit()
Yuval Avnery yuvalav@mellanox.com net/mlx5e: Add a lock on tir list
Gavi Teitz gavi@mellanox.com net/mlx5e: Fix error handling when refreshing TIRs
Stephen Suryaputra ssuryaextr@gmail.com vrf: check accept_source_route on the original netdevice
Dust Li dust.li@linux.alibaba.com tcp: fix a potential NULL pointer dereference in tcp_sk_exit
Koen De Schepper koen.de_schepper@nokia-bell-labs.com tcp: Ensure DCTCP reacts to losses
Xin Long lucien.xin@gmail.com sctp: initialize _pad of sockaddr_in before copying to user memory
Heiner Kallweit hkallweit1@gmail.com r8169: disable ASPM again
Bjørn Mork bjorn@mork.no qmi_wwan: add Olicard 600
Andrea Righi andrea.righi@canonical.com openvswitch: fix flow actions reallocation
Nicolas Dichtel nicolas.dichtel@6wind.com net/sched: fix ->get helper of the matchall cls
Davide Caratti dcaratti@redhat.com net/sched: act_sample: fix divide by zero in the traffic path
Mao Wenan maowenan@huawei.com net: rds: force to destroy connection if t_sock is NULL in rds_tcp_kill_sock().
Eric Dumazet edumazet@google.com netns: provide pure entropy for net_hash_mix()
Artemy Kovalyov artemyko@mellanox.com net/mlx5: Decrease default mr cache size
Steffen Klassert steffen.klassert@secunet.com net-gro: Fix GRO flush when receiving a GSO packet.
Li RongQing lirongqing@baidu.com net: ethtool: not call vzalloc for zero sized memory request
Jiri Slaby jslaby@suse.cz kcm: switch order of device registration to fix a crash
Lorenzo Bianconi lorenzo.bianconi@redhat.com ipv6: sit: reset ip header pointer in ipip6_rcv
Junwei Hu hujunwei4@huawei.com ipv6: Fix dangling pointer when ipv6 fragment
Sheena Mira-ato sheena.mira-ato@alliedtelesis.co.nz ip6_tunnel: Match to ARPHRD_TUNNEL6 for dev type
Thomas Falcon tlfalcon@linux.ibm.com ibmvnic: Fix completion structure initialization
Haiyang Zhang haiyangz@microsoft.com hv_netvsc: Fix unwanted wakeup after tx_disable
Breno Leitao leitao@debian.org powerpc/tm: Limit TM code inside PPC_TRANSACTIONAL_MEM
Yan Zhao yan.y.zhao@intel.com drm/i915/gvt: do not let pin count of shadow mm go negative
Jim Mattson jmattson@google.com kvm: nVMX: NMI-window and interrupt-window exiting should wake L2 from HLT
-------------
Diffstat:
Makefile | 6 +-
arch/arm/boot/dts/am335x-evm.dts | 26 +++-
arch/arm/boot/dts/am335x-evmsk.dts | 26 +++-
arch/arm/boot/dts/rk3288.dtsi | 6 +-
arch/arm/boot/dts/sama5d2-pinfunc.h | 2 +-
arch/arm64/boot/dts/rockchip/rk3328-rock64.dts | 5 +-
arch/arm64/boot/dts/rockchip/rk3328.dtsi | 58 ++++-----
arch/arm64/include/asm/futex.h | 16 +--
arch/arm64/kernel/traps.c | 15 ++-
arch/arm64/mm/init.c | 2 +-
arch/parisc/include/asm/ptrace.h | 5 +-
arch/parisc/kernel/process.c | 6 -
arch/parisc/kernel/setup.c | 3 +
arch/powerpc/kernel/signal_64.c | 23 +++-
arch/riscv/include/asm/syscall.h | 12 +-
arch/x86/entry/vdso/Makefile | 4 +-
arch/x86/events/amd/core.c | 140 ++++++++++++++++++++-
arch/x86/events/core.c | 13 +-
arch/x86/include/asm/bitops.h | 47 +++----
arch/x86/include/asm/string_32.h | 20 ---
arch/x86/include/asm/string_64.h | 15 ---
arch/x86/include/asm/xen/hypercall.h | 3 +
arch/x86/kvm/svm.c | 10 +-
arch/x86/kvm/vmx.c | 84 ++++++++-----
arch/xtensa/kernel/stacktrace.c | 6 +-
block/bio.c | 5 +-
drivers/acpi/acpica/dsopcode.c | 4 +
drivers/acpi/acpica/evgpe.c | 6 +-
drivers/acpi/acpica/nsobject.c | 4 +
drivers/char/Kconfig | 2 +-
drivers/clk/meson/meson-aoclk.c | 15 +--
drivers/gpu/drm/i915/gvt/gtt.c | 2 +-
drivers/gpu/drm/i915/gvt/scheduler.c | 5 +-
drivers/gpu/drm/udl/udl_drv.c | 1 +
drivers/gpu/drm/udl/udl_drv.h | 1 +
drivers/gpu/drm/udl/udl_main.c | 8 +-
drivers/hwmon/Kconfig | 1 +
drivers/md/dm-integrity.c | 12 +-
drivers/md/dm-table.c | 39 ++++++
drivers/md/dm.c | 10 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 16 ++-
drivers/net/ethernet/ibm/ibmvnic.c | 5 +-
.../ethernet/mellanox/mlx5/core/en/port_buffer.c | 39 +++---
.../net/ethernet/mellanox/mlx5/core/en_common.c | 13 +-
drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c | 14 ++-
drivers/net/ethernet/mellanox/mlx5/core/main.c | 20 ---
drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 4 +-
drivers/net/ethernet/realtek/r8169.c | 8 +-
drivers/net/hyperv/hyperv_net.h | 1 +
drivers/net/hyperv/netvsc.c | 6 +-
drivers/net/hyperv/netvsc_drv.c | 32 ++++-
drivers/net/usb/qmi_wwan.c | 1 +
drivers/pci/hotplug/pciehp_ctrl.c | 4 +
drivers/pci/quirks.c | 2 +
drivers/tty/Kconfig | 24 ++++
drivers/tty/tty_io.c | 3 +
drivers/tty/tty_ldisc.c | 47 +++++++
drivers/virtio/virtio_ring.c | 2 +
fs/block_dev.c | 8 +-
fs/btrfs/ioctl.c | 10 ++
fs/btrfs/props.c | 8 +-
include/linux/bitrev.h | 46 +++----
include/linux/memcontrol.h | 5 +-
include/linux/mlx5/driver.h | 2 +
include/linux/netfilter/nf_conntrack_proto_gre.h | 13 ++
include/linux/string.h | 3 +
include/linux/virtio_ring.h | 2 +-
include/net/ip.h | 2 +-
include/net/net_namespace.h | 1 +
include/net/netns/hash.h | 10 +-
kernel/irq/chip.c | 4 +
kernel/irq/irqdesc.c | 1 +
kernel/sched/fair.c | 6 +-
kernel/time/alarmtimer.c | 2 +-
lib/string.c | 20 +++
mm/huge_memory.c | 36 ++++++
mm/memcontrol.c | 20 ++-
net/core/dev.c | 4 +-
net/core/ethtool.c | 46 ++++---
net/core/net_namespace.c | 1 +
net/core/skbuff.c | 2 +-
net/ipv4/ip_gre.c | 15 ++-
net/ipv4/ip_input.c | 7 +-
net/ipv4/ip_options.c | 4 +-
net/ipv4/tcp_dctcp.c | 36 +++---
net/ipv4/tcp_ipv4.c | 3 +-
net/ipv6/ip6_gre.c | 21 ++--
net/ipv6/ip6_output.c | 4 +-
net/ipv6/ip6_tunnel.c | 4 +-
net/ipv6/sit.c | 4 +
net/kcm/kcmsock.c | 16 +--
net/netfilter/nf_conntrack_proto_gre.c | 14 +--
net/netfilter/nfnetlink_cttimeout.c | 57 ++++++++-
net/openvswitch/flow_netlink.c | 4 +-
net/rds/tcp.c | 2 +-
net/sched/act_sample.c | 10 +-
net/sched/cls_matchall.c | 5 +
net/sctp/protocol.c | 1 +
scripts/package/builddeb | 2 +-
sound/core/seq/seq_clientmgr.c | 6 +-
sound/pci/hda/hda_intel.c | 4 +
sound/pci/hda/patch_realtek.c | 31 +++--
sound/soc/fsl/fsl_esai.c | 47 +++++--
sound/soc/intel/atom/sst-mfld-platform-pcm.c | 8 ++
.../tc-testing/tc-tests/actions/sample.json | 24 ++++
105 files changed, 1040 insertions(+), 450 deletions(-)
[ Upstream commit 9ebdfe5230f2e50e3ba05c57723a06e90946815a ]
According to the SDM, "NMI-window exiting" VM-exits wake a logical processor from the same inactive states as would an NMI and "interrupt-window exiting" VM-exits wake a logical processor from the same inactive states as would an external interrupt. Specifically, they wake a logical processor from the shutdown state and from the states entered using the HLT and MWAIT instructions.
Fixes: 6dfacadd5858 ("KVM: nVMX: Add support for activity state HLT") Signed-off-by: Jim Mattson jmattson@google.com Reviewed-by: Peter Shier pshier@google.com Suggested-by: Sean Christopherson sean.j.christopherson@intel.com [Squashed comments of two Jim's patches and used the simplified code hunk provided by Sean. - Radim] Signed-off-by: Radim Krčmář rkrcmar@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/vmx.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index f99f59625da5..e61ac229a6c1 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -12836,11 +12836,15 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch) nested_cache_shadow_vmcs12(vcpu, vmcs12);
/* - * If we're entering a halted L2 vcpu and the L2 vcpu won't be woken - * by event injection, halt vcpu. + * If we're entering a halted L2 vcpu and the L2 vcpu won't be + * awakened by event injection or by an NMI-window VM-exit or + * by an interrupt-window VM-exit, halt the vcpu. */ if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) && - !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK)) { + !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) && + !(vmcs12->cpu_based_vm_exec_control & CPU_BASED_VIRTUAL_NMI_PENDING) && + !((vmcs12->cpu_based_vm_exec_control & CPU_BASED_VIRTUAL_INTR_PENDING) && + (vmcs12->guest_rflags & X86_EFLAGS_IF))) { vmx->nested.nested_run_pending = 0; return kvm_vcpu_halt(vcpu); }
[ Upstream commit 663a50ceac75c2208d2ad95365bc8382fd42f44d ]
The shadow mm's pin count is increased in the workload preparation phase, which happens after workload scanning, and it is decreased in complete_current_workload() after workload completion. However, if a workload hits a scanning error, its shadow mm pin count is never increased yet still gets decreased in the end. This patch keeps the shadow mm's pin count from going below 0.
Fixes: 2707e4446688 ("drm/i915/gvt: vGPU graphics memory virtualization") Cc: zhenyuw@linux.intel.com Cc: stable@vger.kernel.org #4.14+ Signed-off-by: Yan Zhao yan.y.zhao@intel.com Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gvt/gtt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c index 00aad8164dec..542f31ce108f 100644 --- a/drivers/gpu/drm/i915/gvt/gtt.c +++ b/drivers/gpu/drm/i915/gvt/gtt.c @@ -1940,7 +1940,7 @@ void _intel_vgpu_mm_release(struct kref *mm_ref) */ void intel_vgpu_unpin_mm(struct intel_vgpu_mm *mm) { - atomic_dec(&mm->pincount); + atomic_dec_if_positive(&mm->pincount); }
/**
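As an aside, the one-line change above relies on atomic_dec_if_positive(), which only performs the decrement when the result stays non-negative. Below is a minimal, non-atomic userspace sketch of that semantic; the names are illustrative, not the kernel implementation:

#include <stdio.h>

/* Illustrative model of atomic_dec_if_positive(): decrement only if the
 * result would be >= 0; return old - 1 either way (a negative return means
 * "not done").  The real kernel helper does this atomically; this sketch
 * ignores concurrency.
 */
static int dec_if_positive(int *v)
{
	int new = *v - 1;

	if (new >= 0)
		*v = new;
	return new;
}

int main(void)
{
	int pincount = 0;

	/* A plain decrement would drive the counter to -1 on an unpinned mm. */
	printf("dec_if_positive -> %d, pincount stays %d\n",
	       dec_if_positive(&pincount), pincount);
	return 0;
}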
commit 897bc3df8c5aebb54c32d831f917592e873d0559 upstream.
Commit e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint") moved a code block around, and this block uses the 'msr' variable outside of CONFIG_PPC_TRANSACTIONAL_MEM; however, 'msr' is declared inside a CONFIG_PPC_TRANSACTIONAL_MEM block, causing a possible build error when CONFIG_PPC_TRANSACTIONAL_MEM is not defined:
error: 'msr' undeclared (first use in this function)
This is not causing a compilation error in the mainline kernel, because 'msr' is being used as an argument of MSR_TM_ACTIVE(), which is defined as the following when CONFIG_PPC_TRANSACTIONAL_MEM is *not* set:
#define MSR_TM_ACTIVE(x) 0
This patch fixes the issue by avoiding use of the 'msr' variable outside the CONFIG_PPC_TRANSACTIONAL_MEM block, rather than relying on the MSR_TM_ACTIVE() definition.
Cc: stable@vger.kernel.org Reported-by: Christoph Biedl linux-kernel.bfrz@manchmal.in-ulm.de Fixes: e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint") Signed-off-by: Breno Leitao leitao@debian.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Michael Neuling mikey@neuling.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/signal_64.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index bbd1c73243d7..14b0f5b6a373 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -755,12 +755,25 @@ SYSCALL_DEFINE0(rt_sigreturn) if (restore_tm_sigcontexts(current, &uc->uc_mcontext, &uc_transact->uc_mcontext)) goto badframe; - } - else - /* Fall through, for non-TM restore */ + } else #endif - if (restore_sigcontext(current, NULL, 1, &uc->uc_mcontext)) - goto badframe; + { + /* + * Fall through, for non-TM restore + * + * Unset MSR[TS] on the thread regs since MSR from user + * context does not have MSR active, and recheckpoint was + * not called since restore_tm_sigcontexts() was not called + * also. + * + * If not unsetting it, the code can RFID to userspace with + * MSR[TS] set, but without CPU in the proper state, + * causing a TM bad thing. + */ + current->thread.regs->msr &= ~MSR_TS_MASK; + if (restore_sigcontext(current, NULL, 1, &uc->uc_mcontext)) + goto badframe; + }
if (restore_altstack(&uc->uc_stack)) goto badframe;
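For anyone wondering why mainline still builds: MSR_TM_ACTIVE(x) expands to the constant 0 when CONFIG_PPC_TRANSACTIONAL_MEM is off, so the preprocessor discards the undeclared 'msr' before the compiler ever sees it. A tiny standalone illustration of that effect, with hypothetical names:

#include <stdio.h>

/* When the "config" is off, the macro drops its argument entirely, so an
 * otherwise-undeclared identifier inside it never reaches the compiler.
 */
#ifdef CONFIG_FOO_FEATURE
#define FEATURE_ACTIVE(x)	((x) & 0x1)
#else
#define FEATURE_ACTIVE(x)	0
#endif

int main(void)
{
#ifdef CONFIG_FOO_FEATURE
	int state = 1;	/* only declared when the feature is built in */
#endif
	/* Compiles with the feature off because 'state' is macro-erased;
	 * turning FEATURE_ACTIVE into a real function would break the build.
	 */
	printf("active: %d\n", FEATURE_ACTIVE(state));
	return 0;
}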
[ Upstream commit 1b704c4a1ba95574832e730f23817b651db2aa59 ]
After a queue is stopped, the wakeup mechanism may wake it up again when ring buffer usage drops below a threshold. This may cause a NULL-pointer panic in the send path once all tx queues have been stopped in netvsc_detach and removal of the netvsc device has started.
This patch fixes it by adding a tx_disable flag to prevent unwanted queue wakeups.
Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Reported-by: Mohammed Gamal mgamal@redhat.com Signed-off-by: Haiyang Zhang haiyangz@microsoft.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/hyperv/hyperv_net.h | 1 + drivers/net/hyperv/netvsc.c | 6 ++++-- drivers/net/hyperv/netvsc_drv.c | 32 ++++++++++++++++++++++++++------ 3 files changed, 31 insertions(+), 8 deletions(-)
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index 42d284669b03..31d8d83c25ac 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -970,6 +970,7 @@ struct netvsc_device {
wait_queue_head_t wait_drain; bool destroy; + bool tx_disable; /* if true, do not wake up queue again */
/* Receive buffer allocated by us but manages by NetVSP */ void *recv_buf; diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 1a942feab954..fb12b63439c6 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -110,6 +110,7 @@ static struct netvsc_device *alloc_net_device(void)
init_waitqueue_head(&net_device->wait_drain); net_device->destroy = false; + net_device->tx_disable = false;
net_device->max_pkt = RNDIS_MAX_PKT_DEFAULT; net_device->pkt_align = RNDIS_PKT_ALIGN_DEFAULT; @@ -716,7 +717,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev, } else { struct netdev_queue *txq = netdev_get_tx_queue(ndev, q_idx);
- if (netif_tx_queue_stopped(txq) && + if (netif_tx_queue_stopped(txq) && !net_device->tx_disable && (hv_get_avail_to_write_percent(&channel->outbound) > RING_AVAIL_PERCENT_HIWATER || queue_sends < 1)) { netif_tx_wake_queue(txq); @@ -871,7 +872,8 @@ static inline int netvsc_send_pkt( } else if (ret == -EAGAIN) { netif_tx_stop_queue(txq); ndev_ctx->eth_stats.stop_queue++; - if (atomic_read(&nvchan->queue_sends) < 1) { + if (atomic_read(&nvchan->queue_sends) < 1 && + !net_device->tx_disable) { netif_tx_wake_queue(txq); ndev_ctx->eth_stats.wake_queue++; ret = -ENOSPC; diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index c8320405c8f1..9d699bd5f715 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -109,6 +109,15 @@ static void netvsc_set_rx_mode(struct net_device *net) rcu_read_unlock(); }
+static void netvsc_tx_enable(struct netvsc_device *nvscdev, + struct net_device *ndev) +{ + nvscdev->tx_disable = false; + virt_wmb(); /* ensure queue wake up mechanism is on */ + + netif_tx_wake_all_queues(ndev); +} + static int netvsc_open(struct net_device *net) { struct net_device_context *ndev_ctx = netdev_priv(net); @@ -129,7 +138,7 @@ static int netvsc_open(struct net_device *net) rdev = nvdev->extension; if (!rdev->link_state) { netif_carrier_on(net); - netif_tx_wake_all_queues(net); + netvsc_tx_enable(nvdev, net); }
if (vf_netdev) { @@ -184,6 +193,17 @@ static int netvsc_wait_until_empty(struct netvsc_device *nvdev) } }
+static void netvsc_tx_disable(struct netvsc_device *nvscdev, + struct net_device *ndev) +{ + if (nvscdev) { + nvscdev->tx_disable = true; + virt_wmb(); /* ensure txq will not wake up after stop */ + } + + netif_tx_disable(ndev); +} + static int netvsc_close(struct net_device *net) { struct net_device_context *net_device_ctx = netdev_priv(net); @@ -192,7 +212,7 @@ static int netvsc_close(struct net_device *net) struct netvsc_device *nvdev = rtnl_dereference(net_device_ctx->nvdev); int ret;
- netif_tx_disable(net); + netvsc_tx_disable(nvdev, net);
/* No need to close rndis filter if it is removed already */ if (!nvdev) @@ -918,7 +938,7 @@ static int netvsc_detach(struct net_device *ndev,
/* If device was up (receiving) then shutdown */ if (netif_running(ndev)) { - netif_tx_disable(ndev); + netvsc_tx_disable(nvdev, ndev);
ret = rndis_filter_close(nvdev); if (ret) { @@ -1899,7 +1919,7 @@ static void netvsc_link_change(struct work_struct *w) if (rdev->link_state) { rdev->link_state = false; netif_carrier_on(net); - netif_tx_wake_all_queues(net); + netvsc_tx_enable(net_device, net); } else { notify = true; } @@ -1909,7 +1929,7 @@ static void netvsc_link_change(struct work_struct *w) if (!rdev->link_state) { rdev->link_state = true; netif_carrier_off(net); - netif_tx_stop_all_queues(net); + netvsc_tx_disable(net_device, net); } kfree(event); break; @@ -1918,7 +1938,7 @@ static void netvsc_link_change(struct work_struct *w) if (!rdev->link_state) { rdev->link_state = true; netif_carrier_off(net); - netif_tx_stop_all_queues(net); + netvsc_tx_disable(net_device, net); event->event = RNDIS_STATUS_MEDIA_CONNECT; spin_lock_irqsave(&ndev_ctx->lock, flags); list_add(&event->list, &ndev_ctx->reconfig_events);
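The change above is essentially a boolean gate on the wake path, with virt_wmb() ordering the flag write against the queue state. Here is a toy, single-threaded sketch of that gate; the struct and helper names are made up, and the real driver uses netif_tx_*() plus memory barriers:

#include <stdbool.h>
#include <stdio.h>

/* Toy model: the wake path re-checks a disable flag so a late completion
 * cannot restart a queue that is being torn down.
 */
struct toy_queue {
	bool stopped;
	bool tx_disable;
};

static void try_wake(struct toy_queue *q)
{
	if (q->stopped && !q->tx_disable) {
		q->stopped = false;
		printf("queue woken\n");
	} else {
		printf("wake suppressed\n");
	}
}

int main(void)
{
	struct toy_queue q = { .stopped = true, .tx_disable = false };

	try_wake(&q);		/* normal case: queue wakes */

	q.stopped = true;
	q.tx_disable = true;	/* detach path sets the flag first */
	try_wake(&q);		/* a late completion can no longer wake it */
	return 0;
}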
[ Upstream commit bbd669a868bba591ffd38b7bc75a7b361bb54b04 ]
Fix device initialization completion handling for vNIC adapters. Initialize the completion structure on probe and reinitialize when needed. This also fixes a race condition during kdump where the driver can attempt to access the completion struct before it is initialized:
Unable to handle kernel paging request for data at address 0x00000000 Faulting instruction address: 0xc0000000081acbe0 Oops: Kernel access of bad area, sig: 11 [#1] LE SMP NR_CPUS=2048 NUMA pSeries Modules linked in: ibmvnic(+) ibmveth sunrpc overlay squashfs loop CPU: 19 PID: 301 Comm: systemd-udevd Not tainted 4.18.0-64.el8.ppc64le #1 NIP: c0000000081acbe0 LR: c0000000081ad964 CTR: c0000000081ad900 REGS: c000000027f3f990 TRAP: 0300 Not tainted (4.18.0-64.el8.ppc64le) MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28228288 XER: 00000006 CFAR: c000000008008934 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1 GPR00: c0000000081ad964 c000000027f3fc10 c0000000095b5800 c0000000221b4e58 GPR04: 0000000000000003 0000000000000001 000049a086918581 00000000000000d4 GPR08: 0000000000000007 0000000000000000 ffffffffffffffe8 d0000000014dde28 GPR12: c0000000081ad900 c000000009a00c00 0000000000000001 0000000000000100 GPR16: 0000000000000038 0000000000000007 c0000000095e2230 0000000000000006 GPR20: 0000000000400140 0000000000000001 c00000000910c880 0000000000000000 GPR24: 0000000000000000 0000000000000006 0000000000000000 0000000000000003 GPR28: 0000000000000001 0000000000000001 c0000000221b4e60 c0000000221b4e58 NIP [c0000000081acbe0] __wake_up_locked+0x50/0x100 LR [c0000000081ad964] complete+0x64/0xa0 Call Trace: [c000000027f3fc10] [c000000027f3fc60] 0xc000000027f3fc60 (unreliable) [c000000027f3fc60] [c0000000081ad964] complete+0x64/0xa0 [c000000027f3fca0] [d0000000014dad58] ibmvnic_handle_crq+0xce0/0x1160 [ibmvnic] [c000000027f3fd50] [d0000000014db270] ibmvnic_tasklet+0x98/0x130 [ibmvnic] [c000000027f3fda0] [c00000000813f334] tasklet_action_common.isra.3+0xc4/0x1a0 [c000000027f3fe00] [c000000008cd13f4] __do_softirq+0x164/0x400 [c000000027f3fef0] [c00000000813ed64] irq_exit+0x184/0x1c0 [c000000027f3ff20] [c0000000080188e8] __do_irq+0xb8/0x210 [c000000027f3ff90] [c00000000802d0a4] call_do_irq+0x14/0x24 [c000000026a5b010] [c000000008018adc] do_IRQ+0x9c/0x130 [c000000026a5b060] [c000000008008ce4] hardware_interrupt_common+0x114/0x120
Signed-off-by: Thomas Falcon tlfalcon@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ibm/ibmvnic.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index c8704b1690eb..a475f36ddf8c 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1888,6 +1888,7 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter, */ adapter->state = VNIC_PROBED;
+ reinit_completion(&adapter->init_done); rc = init_crq_queue(adapter); if (rc) { netdev_err(adapter->netdev, @@ -4569,7 +4570,7 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter) old_num_rx_queues = adapter->req_rx_queues; old_num_tx_queues = adapter->req_tx_queues;
- init_completion(&adapter->init_done); + reinit_completion(&adapter->init_done); adapter->init_done_rc = 0; ibmvnic_send_crq_init(adapter); if (!wait_for_completion_timeout(&adapter->init_done, timeout)) { @@ -4624,7 +4625,6 @@ static int ibmvnic_init(struct ibmvnic_adapter *adapter)
adapter->from_passive_init = false;
- init_completion(&adapter->init_done); adapter->init_done_rc = 0; ibmvnic_send_crq_init(adapter); if (!wait_for_completion_timeout(&adapter->init_done, timeout)) { @@ -4703,6 +4703,7 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id) INIT_WORK(&adapter->ibmvnic_reset, __ibmvnic_reset); INIT_LIST_HEAD(&adapter->rwi_list); spin_lock_init(&adapter->rwi_lock); + init_completion(&adapter->init_done); adapter->resetting = false;
adapter->mac_change_pending = false;
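The pattern in this fix is: call init_completion() exactly once where the object is created (probe), and reinit_completion() before each reuse, so a handler firing early always finds an initialized structure. A condensed kernel-style sketch under that assumption; 'my_adapter' and the helper names are hypothetical, not the ibmvnic code:

#include <linux/completion.h>
#include <linux/errno.h>

/* Sketch only: initialize once at probe time, re-arm before each request. */
struct my_adapter {
	struct completion init_done;
};

static void my_probe(struct my_adapter *a)
{
	init_completion(&a->init_done);		/* done before any IRQ can fire */
}

static int my_start_request(struct my_adapter *a, unsigned long timeout)
{
	reinit_completion(&a->init_done);	/* re-arm for this round */
	/* ...send the request; the completion handler calls complete()... */
	if (!wait_for_completion_timeout(&a->init_done, timeout))
		return -ETIMEDOUT;
	return 0;
}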
[ Upstream commit b2e54b09a3d29c4db883b920274ca8dca4d9f04d ]
The device type for ip6 tunnels is set to ARPHRD_TUNNEL6. However, the ip4ip6_err function is expecting the device type of the tunnel to be ARPHRD_TUNNEL. Since the device types do not match, the function exits and the ICMP error packet is not sent to the originating host. Note that the device type for IPv4 tunnels is set to ARPHRD_TUNNEL.
Fix is to expect a tunnel device type of ARPHRD_TUNNEL6 instead. Now the tunnel device type matches and the ICMP error packet is sent to the originating host.
Signed-off-by: Sheena Mira-ato sheena.mira-ato@alliedtelesis.co.nz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_tunnel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 0c6403cf8b52..ade1390c6348 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -627,7 +627,7 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, rt = ip_route_output_ports(dev_net(skb->dev), &fl4, NULL, eiph->daddr, eiph->saddr, 0, 0, IPPROTO_IPIP, RT_TOS(eiph->tos), 0); - if (IS_ERR(rt) || rt->dst.dev->type != ARPHRD_TUNNEL) { + if (IS_ERR(rt) || rt->dst.dev->type != ARPHRD_TUNNEL6) { if (!IS_ERR(rt)) ip_rt_put(rt); goto out; @@ -636,7 +636,7 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, } else { if (ip_route_input(skb2, eiph->daddr, eiph->saddr, eiph->tos, skb2->dev) || - skb_dst(skb2)->dev->type != ARPHRD_TUNNEL) + skb_dst(skb2)->dev->type != ARPHRD_TUNNEL6) goto out; }
[ Upstream commit ef0efcd3bd3fd0589732b67fb586ffd3c8705806 ]
At the beginning of the ip6_fragment func, the prevhdr pointer is obtained from the ip6_find_1stfragopt func. However, all pointers into the skb header may change when the skb_checksum_help func is called with skb->ip_summed == CHECKSUM_PARTIAL, so the prevhdr pointer will be left dangling if it is not reloaded after __skb_linearize runs inside skb_checksum_help.
Here, I add a variable, nexthdr_offset, to record the offset, which does not change even after calling the __skb_linearize func.
Fixes: 405c92f7a541 ("ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment") Signed-off-by: Junwei Hu hujunwei4@huawei.com Reported-by: Wenhao Zhang zhangwenhao8@huawei.com Reported-by: syzbot+e8ce541d095e486074fc@syzkaller.appspotmail.com Reviewed-by: Zhiqiang Liu liuzhiqiang26@huawei.com Acked-by: Martin KaFai Lau kafai@fb.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_output.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 0bb87f3a10c7..eed9231c90ad 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -587,7 +587,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, inet6_sk(skb->sk) : NULL; struct ipv6hdr *tmp_hdr; struct frag_hdr *fh; - unsigned int mtu, hlen, left, len; + unsigned int mtu, hlen, left, len, nexthdr_offset; int hroom, troom; __be32 frag_id; int ptr, offset = 0, err = 0; @@ -598,6 +598,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, goto fail; hlen = err; nexthdr = *prevhdr; + nexthdr_offset = prevhdr - skb_network_header(skb);
mtu = ip6_skb_dst_mtu(skb);
@@ -632,6 +633,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, (err = skb_checksum_help(skb))) goto fail;
+ prevhdr = skb_network_header(skb) + nexthdr_offset; hroom = LL_RESERVED_SPACE(rt->dst.dev); if (skb_has_frag_list(skb)) { unsigned int first_len = skb_pagelen(skb);
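The general rule behind this fix: a raw pointer into a buffer that can be reallocated must be re-derived from an offset after any call that may move the storage. A small standalone illustration with a hypothetical buffer type (error handling elided):

#include <stdio.h>
#include <stdlib.h>

/* Toy buffer whose storage can move, like an skb head being reallocated. */
struct buf {
	unsigned char *data;
	size_t len;
};

static void maybe_realloc(struct buf *b)
{
	b->data = realloc(b->data, b->len * 2);	/* may move the storage */
}

int main(void)
{
	struct buf b = { .data = malloc(16), .len = 16 };
	unsigned char *hdr = b.data + 4;	/* raw pointer into the buffer */
	size_t hdr_off = hdr - b.data;		/* keep the offset instead */

	maybe_realloc(&b);
	/* 'hdr' may now dangle; recompute it from the saved offset. */
	hdr = b.data + hdr_off;
	printf("hdr is valid again at offset %zu\n", hdr_off);

	free(b.data);
	return 0;
}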
[ Upstream commit bb9bd814ebf04f579be466ba61fc922625508807 ]
ipip6 tunnels run iptunnel_pull_header on received skbs. This can cause the use-after-free below when accessing the iph pointer, since the packet will be 'uncloned' by pskb_expand_head if it is a cloned GSO skb (e.g. if the packet has been sent through a veth device):
[ 706.369655] BUG: KASAN: use-after-free in ipip6_rcv+0x1678/0x16e0 [sit] [ 706.449056] Read of size 1 at addr ffffe01b6bd855f5 by task ksoftirqd/1/= [ 706.669494] Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 08/19/2016 [ 706.771839] Call trace: [ 706.801159] dump_backtrace+0x0/0x2f8 [ 706.845079] show_stack+0x24/0x30 [ 706.884833] dump_stack+0xe0/0x11c [ 706.925629] print_address_description+0x68/0x260 [ 706.982070] kasan_report+0x178/0x340 [ 707.025995] __asan_report_load1_noabort+0x30/0x40 [ 707.083481] ipip6_rcv+0x1678/0x16e0 [sit] [ 707.132623] tunnel64_rcv+0xd4/0x200 [tunnel4] [ 707.185940] ip_local_deliver_finish+0x3b8/0x988 [ 707.241338] ip_local_deliver+0x144/0x470 [ 707.289436] ip_rcv_finish+0x43c/0x14b0 [ 707.335447] ip_rcv+0x628/0x1138 [ 707.374151] __netif_receive_skb_core+0x1670/0x2600 [ 707.432680] __netif_receive_skb+0x28/0x190 [ 707.482859] process_backlog+0x1d0/0x610 [ 707.529913] net_rx_action+0x37c/0xf68 [ 707.574882] __do_softirq+0x288/0x1018 [ 707.619852] run_ksoftirqd+0x70/0xa8 [ 707.662734] smpboot_thread_fn+0x3a4/0x9e8 [ 707.711875] kthread+0x2c8/0x350 [ 707.750583] ret_from_fork+0x10/0x18
[ 707.811302] Allocated by task 16982: [ 707.854182] kasan_kmalloc.part.1+0x40/0x108 [ 707.905405] kasan_kmalloc+0xb4/0xc8 [ 707.948291] kasan_slab_alloc+0x14/0x20 [ 707.994309] __kmalloc_node_track_caller+0x158/0x5e0 [ 708.053902] __kmalloc_reserve.isra.8+0x54/0xe0 [ 708.108280] __alloc_skb+0xd8/0x400 [ 708.150139] sk_stream_alloc_skb+0xa4/0x638 [ 708.200346] tcp_sendmsg_locked+0x818/0x2b90 [ 708.251581] tcp_sendmsg+0x40/0x60 [ 708.292376] inet_sendmsg+0xf0/0x520 [ 708.335259] sock_sendmsg+0xac/0xf8 [ 708.377096] sock_write_iter+0x1c0/0x2c0 [ 708.424154] new_sync_write+0x358/0x4a8 [ 708.470162] __vfs_write+0xc4/0xf8 [ 708.510950] vfs_write+0x12c/0x3d0 [ 708.551739] ksys_write+0xcc/0x178 [ 708.592533] __arm64_sys_write+0x70/0xa0 [ 708.639593] el0_svc_handler+0x13c/0x298 [ 708.686646] el0_svc+0x8/0xc
[ 708.739019] Freed by task 17: [ 708.774597] __kasan_slab_free+0x114/0x228 [ 708.823736] kasan_slab_free+0x10/0x18 [ 708.868703] kfree+0x100/0x3d8 [ 708.905320] skb_free_head+0x7c/0x98 [ 708.948204] skb_release_data+0x320/0x490 [ 708.996301] pskb_expand_head+0x60c/0x970 [ 709.044399] __iptunnel_pull_header+0x3b8/0x5d0 [ 709.098770] ipip6_rcv+0x41c/0x16e0 [sit] [ 709.146873] tunnel64_rcv+0xd4/0x200 [tunnel4] [ 709.200195] ip_local_deliver_finish+0x3b8/0x988 [ 709.255596] ip_local_deliver+0x144/0x470 [ 709.303692] ip_rcv_finish+0x43c/0x14b0 [ 709.349705] ip_rcv+0x628/0x1138 [ 709.388413] __netif_receive_skb_core+0x1670/0x2600 [ 709.446943] __netif_receive_skb+0x28/0x190 [ 709.497120] process_backlog+0x1d0/0x610 [ 709.544169] net_rx_action+0x37c/0xf68 [ 709.589131] __do_softirq+0x288/0x1018
[ 709.651938] The buggy address belongs to the object at ffffe01b6bd85580 which belongs to the cache kmalloc-1024 of size 1024 [ 709.804356] The buggy address is located 117 bytes inside of 1024-byte region [ffffe01b6bd85580, ffffe01b6bd85980) [ 709.946340] The buggy address belongs to the page: [ 710.003824] page:ffff7ff806daf600 count:1 mapcount:0 mapping:ffffe01c4001f600 index:0x0 [ 710.099914] flags: 0xfffff8000000100(slab) [ 710.149059] raw: 0fffff8000000100 dead000000000100 dead000000000200 ffffe01c4001f600 [ 710.242011] raw: 0000000000000000 0000000000380038 00000001ffffffff 0000000000000000 [ 710.334966] page dumped because: kasan: bad access detected
Fix it by resetting the iph pointer after iptunnel_pull_header.
Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap") Tested-by: Jianlin Shi jishi@redhat.com Signed-off-by: Lorenzo Bianconi lorenzo.bianconi@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/sit.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index de9aa5cb295c..8f6cf8e6b5c1 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -669,6 +669,10 @@ static int ipip6_rcv(struct sk_buff *skb) !net_eq(tunnel->net, dev_net(tunnel->dev)))) goto out;
+ /* skb can be uncloned in iptunnel_pull_header, so + * old iph is no longer valid + */ + iph = (const struct iphdr *)skb_mac_header(skb); err = IP_ECN_decapsulate(iph, skb); if (unlikely(err)) { if (log_ecn_error)
[ Upstream commit 3c446e6f96997f2a95bf0037ef463802162d2323 ]
When kcm is loaded while many processes try to create a KCM socket, a crash occurs: BUG: unable to handle kernel NULL pointer dereference at 000000000000000e IP: mutex_lock+0x27/0x40 kernel/locking/mutex.c:240 PGD 8000000016ef2067 P4D 8000000016ef2067 PUD 3d6e9067 PMD 0 Oops: 0002 [#1] SMP KASAN PTI CPU: 0 PID: 7005 Comm: syz-executor.5 Not tainted 4.12.14-396-default #1 SLE15-SP1 (unreleased) RIP: 0010:mutex_lock+0x27/0x40 kernel/locking/mutex.c:240 RSP: 0018:ffff88000d487a00 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 000000000000000e RCX: 1ffff100082b0719 ... CR2: 000000000000000e CR3: 000000004b1bc003 CR4: 0000000000060ef0 Call Trace: kcm_create+0x600/0xbf0 [kcm] __sock_create+0x324/0x750 net/socket.c:1272 ...
This is due to a race between sock_create and the not-yet-finished register_pernet_device: kcm_create tries to do net_generic(net, kcm_net_id), but kcm_net_id is not initialized yet.
So switch the order of the two to close the race.
This can be reproduced with multiple processes doing socket(PF_KCM, ...) and one process doing module removal.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reviewed-by: Michal Kubecek mkubecek@suse.cz Signed-off-by: Jiri Slaby jslaby@suse.cz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/kcm/kcmsock.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c index 571d824e4e24..b919db02c7f9 100644 --- a/net/kcm/kcmsock.c +++ b/net/kcm/kcmsock.c @@ -2054,14 +2054,14 @@ static int __init kcm_init(void) if (err) goto fail;
- err = sock_register(&kcm_family_ops); - if (err) - goto sock_register_fail; - err = register_pernet_device(&kcm_net_ops); if (err) goto net_ops_fail;
+ err = sock_register(&kcm_family_ops); + if (err) + goto sock_register_fail; + err = kcm_proc_init(); if (err) goto proc_init_fail; @@ -2069,12 +2069,12 @@ static int __init kcm_init(void) return 0;
proc_init_fail: - unregister_pernet_device(&kcm_net_ops); - -net_ops_fail: sock_unregister(PF_KCM);
sock_register_fail: + unregister_pernet_device(&kcm_net_ops); + +net_ops_fail: proto_unregister(&kcm_proto);
fail: @@ -2090,8 +2090,8 @@ static int __init kcm_init(void) static void __exit kcm_exit(void) { kcm_proc_exit(); - unregister_pernet_device(&kcm_net_ops); sock_unregister(PF_KCM); + unregister_pernet_device(&kcm_net_ops); proto_unregister(&kcm_proto); destroy_workqueue(kcm_wq);
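The fix follows the usual rule that a user-visible entry point may only be registered after everything it depends on exists, and teardown runs in reverse order. A standalone sketch of that init/unwind shape; the helper names are stand-ins, not the kcm functions:

#include <stdio.h>

/* Hypothetical sub-steps standing in for register_pernet_device() and
 * sock_register(): the backing state must exist before the user-visible
 * entry point is published, and teardown runs in reverse.
 */
static int register_backing_state(void)    { puts("backing state up");   return 0; }
static void unregister_backing_state(void) { puts("backing state down"); }
static int register_user_entry(void)       { puts("entry published");    return 0; }
static void unregister_user_entry(void)    { puts("entry removed");      }

static int toy_init(void)
{
	int err;

	err = register_backing_state();
	if (err)
		goto fail;

	err = register_user_entry();	/* only after its dependencies exist */
	if (err)
		goto unregister_state;

	return 0;

unregister_state:
	unregister_backing_state();
fail:
	return err;
}

static void toy_exit(void)
{
	unregister_user_entry();	/* reverse order of toy_init() */
	unregister_backing_state();
}

int main(void)
{
	if (!toy_init())
		toy_exit();
	return 0;
}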
[ Upstream commit 3d8830266ffc28c16032b859e38a0252e014b631 ]
NULL or ZERO_SIZE_PTR will be returned for a zero-sized memory request, and dereferencing them will lead to a segfault.
So it is unnecessary to call vzalloc for a zero-sized memory request, or to call functions which may dereference the resulting NULL pointer.
This also fixes a possible memory leak: if phy_ethtool_get_stats returns an error, the memory should be freed before exit.
Signed-off-by: Li RongQing lirongqing@baidu.com Reviewed-by: Wang Li wangli39@baidu.com Reviewed-by: Michal Kubecek mkubecek@suse.cz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/ethtool.c | 46 ++++++++++++++++++++++++++++++---------------- 1 file changed, 30 insertions(+), 16 deletions(-)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c index aeabc4831fca..7cc97f43f138 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -1863,11 +1863,16 @@ static int ethtool_get_strings(struct net_device *dev, void __user *useraddr) WARN_ON_ONCE(!ret);
gstrings.len = ret; - data = vzalloc(array_size(gstrings.len, ETH_GSTRING_LEN)); - if (gstrings.len && !data) - return -ENOMEM;
- __ethtool_get_strings(dev, gstrings.string_set, data); + if (gstrings.len) { + data = vzalloc(array_size(gstrings.len, ETH_GSTRING_LEN)); + if (!data) + return -ENOMEM; + + __ethtool_get_strings(dev, gstrings.string_set, data); + } else { + data = NULL; + }
ret = -EFAULT; if (copy_to_user(useraddr, &gstrings, sizeof(gstrings))) @@ -1963,11 +1968,15 @@ static int ethtool_get_stats(struct net_device *dev, void __user *useraddr) return -EFAULT;
stats.n_stats = n_stats; - data = vzalloc(array_size(n_stats, sizeof(u64))); - if (n_stats && !data) - return -ENOMEM;
- ops->get_ethtool_stats(dev, &stats, data); + if (n_stats) { + data = vzalloc(array_size(n_stats, sizeof(u64))); + if (!data) + return -ENOMEM; + ops->get_ethtool_stats(dev, &stats, data); + } else { + data = NULL; + }
ret = -EFAULT; if (copy_to_user(useraddr, &stats, sizeof(stats))) @@ -2007,16 +2016,21 @@ static int ethtool_get_phy_stats(struct net_device *dev, void __user *useraddr) return -EFAULT;
stats.n_stats = n_stats; - data = vzalloc(array_size(n_stats, sizeof(u64))); - if (n_stats && !data) - return -ENOMEM;
- if (dev->phydev && !ops->get_ethtool_phy_stats) { - ret = phy_ethtool_get_stats(dev->phydev, &stats, data); - if (ret < 0) - return ret; + if (n_stats) { + data = vzalloc(array_size(n_stats, sizeof(u64))); + if (!data) + return -ENOMEM; + + if (dev->phydev && !ops->get_ethtool_phy_stats) { + ret = phy_ethtool_get_stats(dev->phydev, &stats, data); + if (ret < 0) + goto out; + } else { + ops->get_ethtool_phy_stats(dev, &stats, data); + } } else { - ops->get_ethtool_phy_stats(dev, &stats, data); + data = NULL; }
ret = -EFAULT;
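All three hunks above apply the same guard: allocate and call the fill helpers only when the element count is non-zero, and free on the error path. A minimal standalone sketch of that guard (hypothetical function name):

#include <stdio.h>
#include <stdlib.h>

/* Allocate space for n 64-bit counters only when n > 0; callers must cope
 * with data == NULL in the zero-count case.
 */
static long long *get_stats(size_t n_stats)
{
	long long *data = NULL;

	if (n_stats) {
		data = calloc(n_stats, sizeof(*data));
		if (!data)
			return NULL;	/* -ENOMEM equivalent */
		/* ...fill helpers are only called when a buffer exists... */
	}
	return data;
}

int main(void)
{
	long long *stats = get_stats(0);

	/* Zero-sized request: nothing allocated, nothing to dereference. */
	printf("stats buffer: %p\n", (void *)stats);
	free(stats);	/* free(NULL) is a no-op */
	return 0;
}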
[ Upstream commit 0ab03f353d3613ea49d1f924faf98559003670a8 ]
Currently we may incorrectly merge a received GSO packet, or a packet with a frag_list, into a packet sitting in the gro_hash list. skb_segment() may then crash because its assumptions about the skb layout are not met. The correct behaviour would be to flush the packet in the gro_hash list and send the received GSO packet directly afterwards. Commit d61d072e87c8e ("net-gro: avoid reorders") sets NAPI_GRO_CB(skb)->flush in this case, but this is not checked before merging. This patch makes sure to check this flag and to not merge in that case.
Fixes: d61d072e87c8e ("net-gro: avoid reorders") Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/skbuff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 8656b1e20d35..ceee28e184af 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3832,7 +3832,7 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb) unsigned int delta_truesize; struct sk_buff *lp;
- if (unlikely(p->len + len >= 65536)) + if (unlikely(p->len + len >= 65536 || NAPI_GRO_CB(skb)->flush)) return -E2BIG;
lp = NAPI_GRO_CB(p)->last;
[ Upstream commit e8b26b2135dedc0284490bfeac06dfc4418d0105 ]
Delete initialization of high order entries in mr cache to decrease initial memory footprint. When required, the administrator can populate the entries with memory keys via the /sys interface.
This approach is very helpful to significantly reduce the per HW function memory footprint in virtualization environments such as SRIOV.
Fixes: 9603b61de1ee ("mlx5: Move pci device handling from mlx5_ib to mlx5_core") Signed-off-by: Artemy Kovalyov artemyko@mellanox.com Signed-off-by: Moni Shoua monis@mellanox.com Signed-off-by: Leon Romanovsky leonro@mellanox.com Reported-by: Shalom Toledo shalomt@mellanox.com Acked-by: Or Gerlitz ogerlitz@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/main.c | 20 ------------------- 1 file changed, 20 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 563ce3fedab4..0e820cf92f8a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -162,26 +162,6 @@ static struct mlx5_profile profile[] = { .size = 8, .limit = 4 }, - .mr_cache[16] = { - .size = 8, - .limit = 4 - }, - .mr_cache[17] = { - .size = 8, - .limit = 4 - }, - .mr_cache[18] = { - .size = 8, - .limit = 4 - }, - .mr_cache[19] = { - .size = 4, - .limit = 2 - }, - .mr_cache[20] = { - .size = 4, - .limit = 2 - }, }, };
[ Upstream commit 355b98553789b646ed97ad801a619ff898471b92 ]
net_hash_mix() currently uses the kernel address of a struct net, and is used in many places that could reveal this address to a patient attacker, thus defeating KASLR for the typical case (the initial net namespace, since &init_net is not dynamically allocated).
I believe the original implementation tried to avoid spending too many cycles in this function, but security comes first.
Also provide entropy regardless of CONFIG_NET_NS.
Fixes: 0b4419162aa6 ("netns: introduce the net_hash_mix "salt" for hashes") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: Amit Klein aksecurity@gmail.com Reported-by: Benny Pinkas benny@pinkas.net Cc: Pavel Emelyanov xemul@openvz.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/net_namespace.h | 1 + include/net/netns/hash.h | 10 ++-------- net/core/net_namespace.c | 1 + 3 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 9b5fdc50519a..3f7b166262d7 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -57,6 +57,7 @@ struct net { */ spinlock_t rules_mod_lock;
+ u32 hash_mix; atomic64_t cookie_gen;
struct list_head list; /* list of network namespaces */ diff --git a/include/net/netns/hash.h b/include/net/netns/hash.h index 16a842456189..d9b665151f3d 100644 --- a/include/net/netns/hash.h +++ b/include/net/netns/hash.h @@ -2,16 +2,10 @@ #ifndef __NET_NS_HASH_H__ #define __NET_NS_HASH_H__
-#include <asm/cache.h> - -struct net; +#include <net/net_namespace.h>
static inline u32 net_hash_mix(const struct net *net) { -#ifdef CONFIG_NET_NS - return (u32)(((unsigned long)net) >> ilog2(sizeof(*net))); -#else - return 0; -#endif + return net->hash_mix; } #endif diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 670c84b1bfc2..7320f0844a50 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -304,6 +304,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
refcount_set(&net->count, 1); refcount_set(&net->passive, 1); + get_random_bytes(&net->hash_mix, sizeof(u32)); net->dev_base_seq = 1; net->user_ns = user_ns; idr_init(&net->netns_ids);
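The idea of the fix is to hash with a per-namespace random salt instead of a value derived from the object's kernel address; the diff fills the salt with get_random_bytes() in setup_net(). A standalone sketch of the same approach with a hypothetical object:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical per-namespace object: the hash salt is random per instance,
 * so hash outputs no longer encode the object's address.
 */
struct toy_ns {
	uint32_t hash_mix;
};

static void toy_ns_setup(struct toy_ns *ns)
{
	ns->hash_mix = (uint32_t)rand();	/* stand-in for get_random_bytes() */
}

static uint32_t toy_hash(const struct toy_ns *ns, uint32_t key)
{
	return (key ^ ns->hash_mix) * 0x9e3779b1u;	/* any mixing step */
}

int main(void)
{
	struct toy_ns ns;

	srand((unsigned int)time(NULL));
	toy_ns_setup(&ns);
	printf("hash(42) = %08x\n", toy_hash(&ns, 42));
	return 0;
}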
[ Upstream commit cb66ddd156203daefb8d71158036b27b0e2caf63 ]
When cleaning up a net namespace, rds_tcp_exit_net() calls rds_tcp_kill_sock(). If t_sock is NULL, it will not call rds_conn_destroy(), rds_conn_path_destroy() and rds_tcp_conn_free() to free the connection, and the worker cp_conn_w is not stopped; afterwards the net is freed in net_drop_ns(), while cp_conn_w's rds_connect_worker() will call rds_tcp_conn_path_connect() and reference the 'net' that has already been freed.
In rds_tcp_conn_path_connect(), rds_tcp_set_callbacks() sets t_sock = sock before sock->ops->connect, but if connect() fails it calls rds_tcp_restore_callbacks() and sets t_sock = NULL. If connect always fails, rds_connect_worker() will keep trying to reconnect, so rds_tcp_kill_sock() will never cancel the worker cp_conn_w and free the connections.
Therefore the !tc->t_sock condition is not needed on the cleanup_net -> rds_tcp_exit_net -> rds_tcp_kill_sock path, because tc->t_sock is always NULL there, and there is no other path that cancels cp_conn_w and frees the connection. So this patch fixes it:
rds_tcp_kill_sock(): ... if (net != c_net || !tc->t_sock) ... Acked-by: Santosh Shilimkar santosh.shilimkar@oracle.com
================================================================== BUG: KASAN: use-after-free in inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340 Read of size 4 at addr ffff8003496a4684 by task kworker/u8:4/3721
CPU: 3 PID: 3721 Comm: kworker/u8:4 Not tainted 5.1.0 #11 Hardware name: linux,dummy-virt (DT) Workqueue: krdsd rds_connect_worker Call trace: dump_backtrace+0x0/0x3c0 arch/arm64/kernel/time.c:53 show_stack+0x28/0x38 arch/arm64/kernel/traps.c:152 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x120/0x188 lib/dump_stack.c:113 print_address_description+0x68/0x278 mm/kasan/report.c:253 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x21c/0x348 mm/kasan/report.c:409 __asan_report_load4_noabort+0x30/0x40 mm/kasan/report.c:429 inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340 __sock_create+0x4f8/0x770 net/socket.c:1276 sock_create_kern+0x50/0x68 net/socket.c:1322 rds_tcp_conn_path_connect+0x2b4/0x690 net/rds/tcp_connect.c:114 rds_connect_worker+0x108/0x1d0 net/rds/threads.c:175 process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153 worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296 kthread+0x2f0/0x378 kernel/kthread.c:255 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
Allocated by task 687: save_stack mm/kasan/kasan.c:448 [inline] set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xd4/0x180 mm/kasan/kasan.c:553 kasan_slab_alloc+0x14/0x20 mm/kasan/kasan.c:490 slab_post_alloc_hook mm/slab.h:444 [inline] slab_alloc_node mm/slub.c:2705 [inline] slab_alloc mm/slub.c:2713 [inline] kmem_cache_alloc+0x14c/0x388 mm/slub.c:2718 kmem_cache_zalloc include/linux/slab.h:697 [inline] net_alloc net/core/net_namespace.c:384 [inline] copy_net_ns+0xc4/0x2d0 net/core/net_namespace.c:424 create_new_namespaces+0x300/0x658 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xa0/0x198 kernel/nsproxy.c:206 ksys_unshare+0x340/0x628 kernel/fork.c:2577 __do_sys_unshare kernel/fork.c:2645 [inline] __se_sys_unshare kernel/fork.c:2643 [inline] __arm64_sys_unshare+0x38/0x58 kernel/fork.c:2643 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall arch/arm64/kernel/syscall.c:47 [inline] el0_svc_common+0x168/0x390 arch/arm64/kernel/syscall.c:83 el0_svc_handler+0x60/0xd0 arch/arm64/kernel/syscall.c:129 el0_svc+0x8/0xc arch/arm64/kernel/entry.S:960
Freed by task 264: save_stack mm/kasan/kasan.c:448 [inline] set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x114/0x220 mm/kasan/kasan.c:521 kasan_slab_free+0x10/0x18 mm/kasan/kasan.c:528 slab_free_hook mm/slub.c:1370 [inline] slab_free_freelist_hook mm/slub.c:1397 [inline] slab_free mm/slub.c:2952 [inline] kmem_cache_free+0xb8/0x3a8 mm/slub.c:2968 net_free net/core/net_namespace.c:400 [inline] net_drop_ns.part.6+0x78/0x90 net/core/net_namespace.c:407 net_drop_ns net/core/net_namespace.c:406 [inline] cleanup_net+0x53c/0x6d8 net/core/net_namespace.c:569 process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153 worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296 kthread+0x2f0/0x378 kernel/kthread.c:255 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
The buggy address belongs to the object at ffff8003496a3f80 which belongs to the cache net_namespace of size 7872 The buggy address is located 1796 bytes inside of 7872-byte region [ffff8003496a3f80, ffff8003496a5e40) The buggy address belongs to the page: page:ffff7e000d25a800 count:1 mapcount:0 mapping:ffff80036ce4b000 index:0x0 compound_mapcount: 0 flags: 0xffffe0000008100(slab|head) raw: 0ffffe0000008100 dead000000000100 dead000000000200 ffff80036ce4b000 raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff8003496a4580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8003496a4600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8003496a4680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff8003496a4700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8003496a4780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ==================================================================
Fixes: 467fa15356ac("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Mao Wenan maowenan@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/rds/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rds/tcp.c b/net/rds/tcp.c index b9bbcf3d6c63..18bb522df282 100644 --- a/net/rds/tcp.c +++ b/net/rds/tcp.c @@ -600,7 +600,7 @@ static void rds_tcp_kill_sock(struct net *net) list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) { struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net);
- if (net != c_net || !tc->t_sock) + if (net != c_net) continue; if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn)) { list_move_tail(&tc->t_tcp_node, &tmp_list);
[ Upstream commit fae2708174ae95d98d19f194e03d6e8f688ae195 ]
The control path of the 'sample' action does not validate the value of 'rate' provided by the user, but then uses it as a divisor in the traffic path. Validate it in tcf_sample_init(), and return -EINVAL with a proper extack message when the value is zero, to fix the splat triggered by the script below:
# tc f a dev test0 egress matchall action sample rate 0 group 1 index 2 # tc -s a s action sample total acts 1
action order 0: sample rate 1/0 group 1 pipe index 2 ref 1 bind 1 installed 19 sec used 19 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 # ping 192.0.2.1 -I test0 -c1 -q
divide error: 0000 [#1] SMP PTI CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ #591 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample] Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 <41> f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01 RSP: 0018:ffffae320190ba30 EFLAGS: 00010246 RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49 RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0 RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000 R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80 R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000 FS: 00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0 Call Trace: tcf_action_exec+0x7c/0x1c0 tcf_classify+0x57/0x160 __dev_queue_xmit+0x3dc/0xd10 ip_finish_output2+0x257/0x6d0 ip_output+0x75/0x280 ip_send_skb+0x15/0x40 raw_sendmsg+0xae3/0x1410 sock_sendmsg+0x36/0x40 __sys_sendto+0x10e/0x140 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x60/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe [...] Kernel panic - not syncing: Fatal exception in interrupt
Add a TDC selftest to document that 'rate' is now being validated.
Reported-by: Matteo Croce mcroce@redhat.com Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Signed-off-by: Davide Caratti dcaratti@redhat.com Acked-by: Yotam Gigi yotam.gi@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/act_sample.c | 10 ++++++-- .../tc-testing/tc-tests/actions/sample.json | 24 +++++++++++++++++++ 2 files changed, 32 insertions(+), 2 deletions(-)
diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c index 6b67aa13d2dd..c7f5d630d97c 100644 --- a/net/sched/act_sample.c +++ b/net/sched/act_sample.c @@ -43,8 +43,8 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla, struct tc_action_net *tn = net_generic(net, sample_net_id); struct nlattr *tb[TCA_SAMPLE_MAX + 1]; struct psample_group *psample_group; + u32 psample_group_num, rate; struct tc_sample *parm; - u32 psample_group_num; struct tcf_sample *s; bool exists = false; int ret, err; @@ -80,6 +80,12 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla, return -EEXIST; }
+ rate = nla_get_u32(tb[TCA_SAMPLE_RATE]); + if (!rate) { + NL_SET_ERR_MSG(extack, "invalid sample rate"); + tcf_idr_release(*a, bind); + return -EINVAL; + } psample_group_num = nla_get_u32(tb[TCA_SAMPLE_PSAMPLE_GROUP]); psample_group = psample_group_get(net, psample_group_num); if (!psample_group) { @@ -91,7 +97,7 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla,
spin_lock_bh(&s->tcf_lock); s->tcf_action = parm->action; - s->rate = nla_get_u32(tb[TCA_SAMPLE_RATE]); + s->rate = rate; s->psample_group_num = psample_group_num; RCU_INIT_POINTER(s->psample_group, psample_group);
diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/sample.json b/tools/testing/selftests/tc-testing/tc-tests/actions/sample.json index 3aca33c00039..618def9bdf0e 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/sample.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/sample.json @@ -143,6 +143,30 @@ "$TC actions flush action sample" ] }, + { + "id": "7571", + "name": "Add sample action with invalid rate", + "category": [ + "actions", + "sample" + ], + "setup": [ + [ + "$TC actions flush action sample", + 0, + 1, + 255 + ] + ], + "cmdUnderTest": "$TC actions add action sample rate 0 group 1 index 2", + "expExitCode": "255", + "verifyCmd": "$TC actions get action sample index 2", + "matchPattern": "action order [0-9]+: sample rate 1/0 group 1.*index 2 ref", + "matchCount": "0", + "teardown": [ + "$TC actions flush action sample" + ] + }, { "id": "b6d4", "name": "Add sample action with mandatory arguments and invalid control action",
[ Upstream commit 0db6f8befc32c68bb13d7ffbb2e563c79e913e13 ]
The matchall ->get() callback always returned NULL, so it was never possible to get the filter.
Example: $ ip link add foo type dummy $ ip link add bar type dummy $ tc qdisc add dev foo clsact $ tc filter add dev foo protocol all pref 1 ingress handle 1234 \ matchall action mirred ingress mirror dev bar
Before the patch: $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall Error: Specified filter handle not found. We have an error talking to the kernel
After: $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall filter ingress protocol all pref 1 matchall chain 0 handle 0x4d2 not_in_hw action order 1: mirred (Ingress Mirror to device bar) pipe index 1 ref 1 bind 1
CC: Yotam Gigi yotamg@mellanox.com CC: Jiri Pirko jiri@mellanox.com Fixes: fd62d9f5c575 ("net/sched: matchall: Fix configuration race") Signed-off-by: Nicolas Dichtel nicolas.dichtel@6wind.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/cls_matchall.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c index 856fa79d4ffd..621bc1d5b057 100644 --- a/net/sched/cls_matchall.c +++ b/net/sched/cls_matchall.c @@ -126,6 +126,11 @@ static void mall_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
static void *mall_get(struct tcf_proto *tp, u32 handle) { + struct cls_mall_head *head = rtnl_dereference(tp->root); + + if (head && head->handle == handle) + return head; + return NULL; }
[ Upstream commit f28cd2af22a0c134e4aa1c64a70f70d815d473fb ]
The flow action buffer can be resized if it's not big enough to contain all the requested flow actions. However, this resize doesn't take the newly requested size into account; the buffer is only increased by a factor of 2x. This might not be enough to contain the new data, causing a buffer overflow, for example:
[ 42.044472] ============================================================================= [ 42.045608] BUG kmalloc-96 (Not tainted): Redzone overwritten [ 42.046415] -----------------------------------------------------------------------------
[ 42.047715] Disabling lock debugging due to kernel taint [ 42.047716] INFO: 0x8bf2c4a5-0x720c0928. First byte 0x0 instead of 0xcc [ 42.048677] INFO: Slab 0xbc6d2040 objects=29 used=18 fp=0xdc07dec4 flags=0x2808101 [ 42.049743] INFO: Object 0xd53a3464 @offset=2528 fp=0xccdcdebb
[ 42.050747] Redzone 76f1b237: cc cc cc cc cc cc cc cc ........ [ 42.051839] Object d53a3464: 6b 6b 6b 6b 6b 6b 6b 6b 0c 00 00 00 6c 00 00 00 kkkkkkkk....l... [ 42.053015] Object f49a30cc: 6c 00 0c 00 00 00 00 00 00 00 00 03 78 a3 15 f6 l...........x... [ 42.054203] Object acfe4220: 20 00 02 00 ff ff ff ff 00 00 00 00 00 00 00 00 ............... [ 42.055370] Object 21024e91: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 42.056541] Object 070e04c3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 42.057797] Object 948a777a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 42.059061] Redzone 8bf2c4a5: 00 00 00 00 .... [ 42.060189] Padding a681b46e: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
Fix by making sure the new buffer is properly resized to contain all the requested data.
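The size computation can be illustrated with a small stand-alone C sketch (the numbers are made up, not taken from the report): doubling a 96-byte allocation cannot hold a 200-byte request, while taking the maximum of the doubled size and next_offset + req_size can.

#include <stdio.h>

static size_t max_size(size_t a, size_t b)
{
	return a > b ? a : b;
}

int main(void)
{
	/* Assumed example: a 96-byte slab object, 40 bytes already used,
	 * asked to hold 200 more bytes of flow actions. */
	size_t ksize = 96, next_offset = 40, req_size = 200;

	size_t doubled = ksize * 2;                                  /* 192: still too small */
	size_t fixed = max_size(next_offset + req_size, ksize * 2);  /* 240: fits */

	printf("need %zu, doubling gives %zu, fixed policy gives %zu\n",
	       next_offset + req_size, doubled, fixed);
	return 0;
}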
BugLink: https://bugs.launchpad.net/bugs/1813244 Signed-off-by: Andrea Righi andrea.righi@canonical.com Acked-by: Pravin B Shelar pshelar@ovn.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/openvswitch/flow_netlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c index c7b6010b2c09..eab5e8eaddaa 100644 --- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -2306,14 +2306,14 @@ static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa,
struct sw_flow_actions *acts; int new_acts_size; - int req_size = NLA_ALIGN(attr_len); + size_t req_size = NLA_ALIGN(attr_len); int next_offset = offsetof(struct sw_flow_actions, actions) + (*sfa)->actions_len;
if (req_size <= (ksize(*sfa) - next_offset)) goto out;
- new_acts_size = ksize(*sfa) * 2; + new_acts_size = max(next_offset + req_size, ksize(*sfa) * 2);
if (new_acts_size > MAX_ACTIONS_BUFSIZE) { if ((MAX_ACTIONS_BUFSIZE - next_offset) < req_size) {
[ Upstream commit 6289d0facd9ebce4cc83e5da39e15643ee998dc5 ]
This is a Qualcomm based device with a QMI function on interface 4. It is mode switched from 2020:2030 using a standard eject message.
T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 6 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=2020 ProdID=2031 Rev= 2.32 S: Manufacturer=Mobile Connect S: Product=Mobile Connect S: SerialNumber=0123456789ABCDEF C:* #Ifs= 6 Cfg#= 1 Atr=80 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) E: Ad=83(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) E: Ad=85(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) E: Ad=87(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) E: Ad=89(I) Atr=03(Int.) MxPS= 8 Ivl=32ms E: Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 5 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=(none) E: Ad=8a(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=125us
Signed-off-by: Bjørn Mork bjorn@mork.no Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 74bebbdb4b15..9195f3476b1d 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1203,6 +1203,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x19d2, 0x2002, 4)}, /* ZTE (Vodafone) K3765-Z */ {QMI_FIXED_INTF(0x2001, 0x7e19, 4)}, /* D-Link DWM-221 B1 */ {QMI_FIXED_INTF(0x2001, 0x7e35, 4)}, /* D-Link DWM-222 */ + {QMI_FIXED_INTF(0x2020, 0x2031, 4)}, /* Olicard 600 */ {QMI_FIXED_INTF(0x2020, 0x2033, 4)}, /* BroadMobi BM806U */ {QMI_FIXED_INTF(0x0f3d, 0x68a2, 8)}, /* Sierra Wireless MC7700 */ {QMI_FIXED_INTF(0x114f, 0x68a2, 8)}, /* Sierra Wireless MC7750 */
[ Upstream commit b75bb8a5b755d0c7bf1ac071e4df2349a7644a1e ]
There's a significant number of reports that re-enabling ASPM causes different issues, ranging from decreased performance to the system not booting at all. This affects only a minority of users, but the number of affected users is big enough that we'd better switch off ASPM again.
This will hurt notebook users who are not affected by the issues; they may see decreased battery runtime without ASPM. Adding generic sysfs attributes to control ASPM is being discussed with the PCI core folks. Once this is in place, users brave enough can re-enable ASPM on their systems.
Fixes: a99790bf5c7f ("r8169: Reinstate ASPM Support") Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/realtek/r8169.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 5f45ffeeecb4..1d24884e9897 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -28,6 +28,7 @@ #include <linux/pm_runtime.h> #include <linux/firmware.h> #include <linux/prefetch.h> +#include <linux/pci-aspm.h> #include <linux/ipv6.h> #include <net/ip6_checksum.h>
@@ -7324,6 +7325,11 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) return rc; }
+ /* Disable ASPM completely as that cause random device stop working + * problems as well as full system hangs for some PCIe devices users. + */ + pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1); + /* enable device (incl. PCI PM wakeup and hotplug setup) */ rc = pcim_enable_device(pdev); if (rc < 0) {
[ Upstream commit 09279e615c81ce55e04835970601ae286e3facbe ]
Syzbot report a kernel-infoleak:
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32 Call Trace: _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32 copy_to_user include/linux/uaccess.h:174 [inline] sctp_getsockopt_peer_addrs net/sctp/socket.c:5911 [inline] sctp_getsockopt+0x1668e/0x17f70 net/sctp/socket.c:7562 ... Uninit was stored to memory at: sctp_transport_init net/sctp/transport.c:61 [inline] sctp_transport_new+0x16d/0x9a0 net/sctp/transport.c:115 sctp_assoc_add_peer+0x532/0x1f70 net/sctp/associola.c:637 sctp_process_param net/sctp/sm_make_chunk.c:2548 [inline] sctp_process_init+0x1a1b/0x3ed0 net/sctp/sm_make_chunk.c:2361 ... Bytes 8-15 of 16 are uninitialized
It was caused by the fact that the _pad field (bytes 8-15) of a v4 addr (saved in struct sockaddr_in) wasn't initialized, but was directly copied to user memory in sctp_getsockopt_peer_addrs().
So fix it by calling memset(addr->v4.sin_zero, 0, 8) to initialize _pad of sockaddr_in before copying it to user memory in sctp_v4_addr_to_user(), as sctp_v6_addr_to_user() does.
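A self-contained user-space analogue of the fix (not the SCTP code itself): a struct sockaddr_in built on the stack carries an 8-byte sin_zero pad, and that pad has to be cleared before the structure is handed out, otherwise stale stack bytes leak with it.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	struct sockaddr_in sin;	/* deliberately not zero-initialized */

	sin.sin_family = AF_INET;
	sin.sin_port = htons(80);
	sin.sin_addr.s_addr = inet_addr("192.0.2.1");

	/* The fix: clear the padding before the struct leaves this scope
	 * (in the kernel, before it is copied to user memory). */
	memset(sin.sin_zero, 0, sizeof(sin.sin_zero));

	printf("cleared %zu bytes of sin_zero padding\n", sizeof(sin.sin_zero));
	return 0;
}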
Reported-by: syzbot+86b5c7c236a22616a72f@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Tested-by: Alexander Potapenko glider@google.com Acked-by: Neil Horman nhorman@tuxdriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sctp/protocol.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index 1c9f079e8a50..d97b2b4b7a8b 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -600,6 +600,7 @@ static struct sock *sctp_v4_create_accept_sk(struct sock *sk, static int sctp_v4_addr_to_user(struct sctp_sock *sp, union sctp_addr *addr) { /* No address mapping for V4 sockets */ + memset(addr->v4.sin_zero, 0, sizeof(addr->v4.sin_zero)); return sizeof(struct sockaddr_in); }
[ Upstream commit aecfde23108b8e637d9f5c5e523b24fb97035dc3 ]
RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to loss episodes in the same way as conventional TCP".
Currently, Linux DCTCP performs no cwnd reduction when losses are encountered. Optionally, the dctcp_clamp_alpha_on_loss parameter resets alpha to its maximal value if an RTO happens. This behavior is sub-optimal for at least two reasons: i) it ignores losses triggering fast retransmissions; and ii) it causes unnecessarily large cwnd reductions in the future if the loss was isolated, as it resets the historical term of DCTCP's alpha EWMA to its maximal value (i.e., denoting total congestion). The second reason has an especially noticeable effect when using DCTCP in high-BDP environments, where alpha normally stays at low values.
This patch replaces the clamping of alpha with setting ssthresh to half of cwnd for both fast retransmissions and RTOs, at most once per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter has been removed.
The table below shows experimental results where we measured the drop probability of a PIE AQM (not applying ECN marks) at a bottleneck in the presence of a single TCP flow with either the alpha-clamping option enabled or the cwnd halving proposed by this patch. Results using reno or cubic are given for comparison.
                  | Link    | RTT      | Drop
TCP CC            | speed   | base+AQM | probability
==================|=========|==========|============
CUBIC             | 40Mbps  | 7+20ms   |    0.21%
RENO              |         |          |    0.19%
DCTCP-CLAMP-ALPHA |         |          |   25.80%
DCTCP-HALVE-CWND  |         |          |    0.22%
------------------|---------|----------|------------
CUBIC             | 100Mbps | 7+20ms   |    0.03%
RENO              |         |          |    0.02%
DCTCP-CLAMP-ALPHA |         |          |   23.30%
DCTCP-HALVE-CWND  |         |          |    0.04%
------------------|---------|----------|------------
CUBIC             | 800Mbps | 1+1ms    |    0.04%
RENO              |         |          |    0.05%
DCTCP-CLAMP-ALPHA |         |          |   18.70%
DCTCP-HALVE-CWND  |         |          |    0.06%
We see that, without halving its cwnd for all sources of losses, DCTCP drives the AQM to large drop probabilities in order to keep the queue length under control (i.e., it repeatedly faces RTOs). Instead, if DCTCP reacts to all sources of losses, it can then be controlled by the AQM using drop levels similar to those of cubic or reno.
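The reaction itself is small enough to sketch in stand-alone C (illustrative only, the names are not the kernel's): on a loss event, ssthresh is set to half the congestion window, with a floor of two segments.

#include <stdio.h>

/* Mirrors the effect of "ssthresh = max(cwnd >> 1, 2)" applied at most
 * once per RTT by the patch. */
static unsigned int ssthresh_after_loss(unsigned int snd_cwnd)
{
	unsigned int half = snd_cwnd >> 1;

	return half > 2 ? half : 2;
}

int main(void)
{
	printf("cwnd 40 -> ssthresh %u\n", ssthresh_after_loss(40)); /* 20 */
	printf("cwnd 3  -> ssthresh %u\n", ssthresh_after_loss(3));  /* floor: 2 */
	return 0;
}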
Signed-off-by: Koen De Schepper koen.de_schepper@nokia-bell-labs.com Signed-off-by: Olivier Tilmans olivier.tilmans@nokia-bell-labs.com Cc: Bob Briscoe research@bobbriscoe.net Cc: Lawrence Brakmo brakmo@fb.com Cc: Florian Westphal fw@strlen.de Cc: Daniel Borkmann borkmann@iogearbox.net Cc: Yuchung Cheng ycheng@google.com Cc: Neal Cardwell ncardwell@google.com Cc: Eric Dumazet edumazet@google.com Cc: Andrew Shewmaker agshew@gmail.com Cc: Glenn Judd glenn.judd@morganstanley.com Acked-by: Florian Westphal fw@strlen.de Acked-by: Neal Cardwell ncardwell@google.com Acked-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_dctcp.c | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c index ca61e2a659e7..5205c5a5d8d5 100644 --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -66,11 +66,6 @@ static unsigned int dctcp_alpha_on_init __read_mostly = DCTCP_MAX_ALPHA; module_param(dctcp_alpha_on_init, uint, 0644); MODULE_PARM_DESC(dctcp_alpha_on_init, "parameter for initial alpha value");
-static unsigned int dctcp_clamp_alpha_on_loss __read_mostly; -module_param(dctcp_clamp_alpha_on_loss, uint, 0644); -MODULE_PARM_DESC(dctcp_clamp_alpha_on_loss, - "parameter for clamping alpha on loss"); - static struct tcp_congestion_ops dctcp_reno;
static void dctcp_reset(const struct tcp_sock *tp, struct dctcp *ca) @@ -211,21 +206,23 @@ static void dctcp_update_alpha(struct sock *sk, u32 flags) } }
-static void dctcp_state(struct sock *sk, u8 new_state) +static void dctcp_react_to_loss(struct sock *sk) { - if (dctcp_clamp_alpha_on_loss && new_state == TCP_CA_Loss) { - struct dctcp *ca = inet_csk_ca(sk); + struct dctcp *ca = inet_csk_ca(sk); + struct tcp_sock *tp = tcp_sk(sk);
- /* If this extension is enabled, we clamp dctcp_alpha to - * max on packet loss; the motivation is that dctcp_alpha - * is an indicator to the extend of congestion and packet - * loss is an indicator of extreme congestion; setting - * this in practice turned out to be beneficial, and - * effectively assumes total congestion which reduces the - * window by half. - */ - ca->dctcp_alpha = DCTCP_MAX_ALPHA; - } + ca->loss_cwnd = tp->snd_cwnd; + tp->snd_ssthresh = max(tp->snd_cwnd >> 1U, 2U); +} + +static void dctcp_state(struct sock *sk, u8 new_state) +{ + if (new_state == TCP_CA_Recovery && + new_state != inet_csk(sk)->icsk_ca_state) + dctcp_react_to_loss(sk); + /* We handle RTO in dctcp_cwnd_event to ensure that we perform only + * one loss-adjustment per RTT. + */ }
static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev) @@ -237,6 +234,9 @@ static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev) case CA_EVENT_ECN_NO_CE: dctcp_ce_state_1_to_0(sk); break; + case CA_EVENT_LOSS: + dctcp_react_to_loss(sk); + break; default: /* Don't care for the rest. */ break;
[ Upstream commit b506bc975f60f06e13e74adb35e708a23dc4e87c ]
When tcp_sk_init() fails in inet_ctl_sock_create(), 'net->ipv4.tcp_congestion_control' will be left uninitialized, but tcp_sk_exit() doesn't check for that.
This patch adds a check on 'net->ipv4.tcp_congestion_control' in tcp_sk_exit() to prevent a NULL pointer dereference.
Fixes: 6670e1524477 ("tcp: Namespace-ify sysctl_tcp_default_congestion_control") Signed-off-by: Dust Li dust.li@linux.alibaba.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_ipv4.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 30fdf891940b..11101cf8693b 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2490,7 +2490,8 @@ static void __net_exit tcp_sk_exit(struct net *net) { int cpu;
- module_put(net->ipv4.tcp_congestion_control->owner); + if (net->ipv4.tcp_congestion_control) + module_put(net->ipv4.tcp_congestion_control->owner);
for_each_possible_cpu(cpu) inet_ctl_sock_destroy(*per_cpu_ptr(net->ipv4.tcp_sk, cpu));
[ Upstream commit 8c83f2df9c6578ea4c5b940d8238ad8a41b87e9e ]
The configuration check to accept source route IP options should be made on the incoming netdevice when skb->dev is an l3mdev master. The route lookup for the source route next hop also needs the incoming netdev.
v2->v3: - Simplify by passing the original netdevice down the stack (per David Ahern).
Signed-off-by: Stephen Suryaputra ssuryaextr@gmail.com Reviewed-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/ip.h | 2 +- net/ipv4/ip_input.c | 7 +++---- net/ipv4/ip_options.c | 4 ++-- 3 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/include/net/ip.h b/include/net/ip.h index 71d31e4d4391..cfc3dd5ff085 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -648,7 +648,7 @@ int ip_options_get_from_user(struct net *net, struct ip_options_rcu **optp, unsigned char __user *data, int optlen); void ip_options_undo(struct ip_options *opt); void ip_forward_options(struct sk_buff *skb); -int ip_options_rcv_srr(struct sk_buff *skb); +int ip_options_rcv_srr(struct sk_buff *skb, struct net_device *dev);
/* * Functions provided by ip_sockglue.c diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c index bd8ef4f87c79..c3a0683e83df 100644 --- a/net/ipv4/ip_input.c +++ b/net/ipv4/ip_input.c @@ -258,11 +258,10 @@ int ip_local_deliver(struct sk_buff *skb) ip_local_deliver_finish); }
-static inline bool ip_rcv_options(struct sk_buff *skb) +static inline bool ip_rcv_options(struct sk_buff *skb, struct net_device *dev) { struct ip_options *opt; const struct iphdr *iph; - struct net_device *dev = skb->dev;
/* It looks as overkill, because not all IP options require packet mangling. @@ -298,7 +297,7 @@ static inline bool ip_rcv_options(struct sk_buff *skb) } }
- if (ip_options_rcv_srr(skb)) + if (ip_options_rcv_srr(skb, dev)) goto drop; }
@@ -354,7 +353,7 @@ static int ip_rcv_finish_core(struct net *net, struct sock *sk, } #endif
- if (iph->ihl > 5 && ip_rcv_options(skb)) + if (iph->ihl > 5 && ip_rcv_options(skb, dev)) goto drop;
rt = skb_rtable(skb); diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c index 32a35043c9f5..3db31bb9df50 100644 --- a/net/ipv4/ip_options.c +++ b/net/ipv4/ip_options.c @@ -612,7 +612,7 @@ void ip_forward_options(struct sk_buff *skb) } }
-int ip_options_rcv_srr(struct sk_buff *skb) +int ip_options_rcv_srr(struct sk_buff *skb, struct net_device *dev) { struct ip_options *opt = &(IPCB(skb)->opt); int srrspace, srrptr; @@ -647,7 +647,7 @@ int ip_options_rcv_srr(struct sk_buff *skb)
orefdst = skb->_skb_refdst; skb_dst_set(skb, NULL); - err = ip_route_input(skb, nexthop, iph->saddr, iph->tos, skb->dev); + err = ip_route_input(skb, nexthop, iph->saddr, iph->tos, dev); rt2 = skb_rtable(skb); if (err || (rt2->rt_type != RTN_UNICAST && rt2->rt_type != RTN_LOCAL)) { skb_dst_drop(skb);
[ Upstream commit bc87a0036826a37b43489b029af8143bd07c6cca ]
Previously, a false positive error would be returned if the TIRs list was empty, since the err value was initialized to -ENOMEM and was only updated if a TIR was refreshed. This is resolved by initializing the err value to zero.
Fixes: b676f653896a ("net/mlx5e: Refactor refresh TIRs") Signed-off-by: Gavi Teitz gavi@mellanox.com Reviewed-by: Roi Dayan roid@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_common.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c index db3278cc052b..1e28866e3924 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c @@ -141,15 +141,17 @@ int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb) { struct mlx5_core_dev *mdev = priv->mdev; struct mlx5e_tir *tir; - int err = -ENOMEM; + int err = 0; u32 tirn = 0; int inlen; void *in;
inlen = MLX5_ST_SZ_BYTES(modify_tir_in); in = kvzalloc(inlen, GFP_KERNEL); - if (!in) + if (!in) { + err = -ENOMEM; goto out; + }
if (enable_uc_lb) MLX5_SET(modify_tir_in, in, ctx.self_lb_block,
[ Upstream commit 80a2a9026b24c6bd34b8d58256973e22270bedec ]
Refreshing TIRs loops over a global list of TIRs while netdevs are adding and removing TIRs from that list. That is why a lock is required.
Fixes: 724b2aa15126 ("net/mlx5e: TIRs management refactoring") Signed-off-by: Yuval Avnery yuvalav@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_common.c | 7 +++++++ include/linux/mlx5/driver.h | 2 ++ 2 files changed, 9 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c index 1e28866e3924..124e4567a4ee 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c @@ -45,7 +45,9 @@ int mlx5e_create_tir(struct mlx5_core_dev *mdev, if (err) return err;
+ mutex_lock(&mdev->mlx5e_res.td.list_lock); list_add(&tir->list, &mdev->mlx5e_res.td.tirs_list); + mutex_unlock(&mdev->mlx5e_res.td.list_lock);
return 0; } @@ -53,8 +55,10 @@ int mlx5e_create_tir(struct mlx5_core_dev *mdev, void mlx5e_destroy_tir(struct mlx5_core_dev *mdev, struct mlx5e_tir *tir) { + mutex_lock(&mdev->mlx5e_res.td.list_lock); mlx5_core_destroy_tir(mdev, tir->tirn); list_del(&tir->list); + mutex_unlock(&mdev->mlx5e_res.td.list_lock); }
static int mlx5e_create_mkey(struct mlx5_core_dev *mdev, u32 pdn, @@ -114,6 +118,7 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev) }
INIT_LIST_HEAD(&mdev->mlx5e_res.td.tirs_list); + mutex_init(&mdev->mlx5e_res.td.list_lock);
return 0;
@@ -159,6 +164,7 @@ int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb)
MLX5_SET(modify_tir_in, in, bitmask.self_lb_en, 1);
+ mutex_lock(&mdev->mlx5e_res.td.list_lock); list_for_each_entry(tir, &mdev->mlx5e_res.td.tirs_list, list) { tirn = tir->tirn; err = mlx5_core_modify_tir(mdev, tirn, in, inlen); @@ -170,6 +176,7 @@ int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb) kvfree(in); if (err) netdev_err(priv->netdev, "refresh tir(0x%x) failed, %d\n", tirn, err); + mutex_unlock(&mdev->mlx5e_res.td.list_lock);
return err; } diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index bbcfe2e5fd91..e8b92dee5a72 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -776,6 +776,8 @@ struct mlx5_pagefault { };
struct mlx5_td { + /* protects tirs list changes while tirs refresh */ + struct mutex list_lock; struct list_head tirs_list; u32 tdn; };
[ Upstream commit c8ba5b91a04e3e2643e48501c114108802f21cda ]
dev_queue_xmit() may return error codes as well as netdev_tx_t, and it always consumes the skb. Make sure we always return a correct netdev_tx_t value.
Fixes: eadfa4c3be99 ("nfp: add stats and xmit helpers for representors") Signed-off-by: Jakub Kicinski jakub.kicinski@netronome.com Reviewed-by: John Hurley john.hurley@netronome.com Reviewed-by: Simon Horman simon.horman@netronome.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c index 18a09cdcd9c6..e0d73b385563 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c @@ -225,7 +225,7 @@ static netdev_tx_t nfp_repr_xmit(struct sk_buff *skb, struct net_device *netdev) ret = dev_queue_xmit(skb); nfp_repr_inc_tx_stats(netdev, len, ret);
- return ret; + return NETDEV_TX_OK; }
static int nfp_repr_stop(struct net_device *netdev)
[ Upstream commit c3e1f7fff69c78169c8ac40cc74ac4307f74e36d ]
NFP reprs are software devices on top of the PF's vNIC. The comment above __dev_queue_xmit() sayeth:
When calling this method, interrupts MUST be enabled. This is because the BH enable code must have IRQs enabled so that it will not deadlock.
For netconsole we can't guarantee the IRQ state, so let's just disable netpoll on representors to be on the safe side.
When the initial implementation of NFP reprs was added by the commit 5de73ee46704 ("nfp: general representor implementation") .ndo_poll_controller was required for netpoll to be enabled.
Fixes: ac3d9dd034e5 ("netpoll: make ndo_poll_controller() optional") Signed-off-by: Jakub Kicinski jakub.kicinski@netronome.com Reviewed-by: John Hurley john.hurley@netronome.com Reviewed-by: Simon Horman simon.horman@netronome.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c index e0d73b385563..aa5869eb2e3f 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c @@ -329,6 +329,8 @@ int nfp_repr_init(struct nfp_app *app, struct net_device *netdev,
SWITCHDEV_SET_OPS(netdev, &nfp_port_switchdev_ops);
+ netdev->priv_flags |= IFF_DISABLE_NETPOLL; + if (nfp_app_has_tc(app)) { netdev->features |= NETIF_F_HW_TC; netdev->hw_features |= NETIF_F_HW_TC;
[ Upstream commit a1b0e4e684e9c300b9e759b46cb7a0147e61ddff ]
There is logic to check that the RX/TPA consumer index is the expected index, to work around a hardware problem. However, the potentially bad consumer index is first used to index into an array to reference an entry. This can potentially crash if the bad consumer index is beyond the legal range. Improve the logic to dereference using the consumer index only after the validity check, and log an error message.
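As a rough stand-alone sketch of the pattern (made-up ring size and names, not the bnxt code): the hardware-supplied index is compared against the expected consumer index before it is used to index the ring array.

#include <stdio.h>

#define RING_SIZE 8

struct rx_buf { void *data; };

static struct rx_buf ring[RING_SIZE];
static unsigned int expected_cons = 3;

/* Only dereference ring[cons] after the sanity check has passed;
 * on mismatch the real driver logs a warning and schedules a reset. */
static struct rx_buf *lookup_rx_buf(unsigned int cons)
{
	if (cons != expected_cons) {
		fprintf(stderr, "RX cons %x != expected cons %x\n",
			cons, expected_cons);
		return NULL;
	}
	return &ring[cons];
}

int main(void)
{
	printf("bad cons  -> %p\n", (void *)lookup_rx_buf(7));
	printf("good cons -> %p\n", (void *)lookup_rx_buf(3));
	return 0;
}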
Fixes: fa7e28127a5a ("bnxt_en: Add workaround to detect bad opaque in rx completion (part 2)") Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 0bd93bb7d1a2..a8abb47178be 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -1092,6 +1092,8 @@ static void bnxt_tpa_start(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, tpa_info = &rxr->rx_tpa[agg_id];
if (unlikely(cons != rxr->rx_next_cons)) { + netdev_warn(bp->dev, "TPA cons %x != expected cons %x\n", + cons, rxr->rx_next_cons); bnxt_sched_reset(bp, rxr); return; } @@ -1544,15 +1546,17 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_napi *bnapi, u32 *raw_cons, }
cons = rxcmp->rx_cmp_opaque; - rx_buf = &rxr->rx_buf_ring[cons]; - data = rx_buf->data; - data_ptr = rx_buf->data_ptr; if (unlikely(cons != rxr->rx_next_cons)) { int rc1 = bnxt_discard_rx(bp, bnapi, raw_cons, rxcmp);
+ netdev_warn(bp->dev, "RX cons %x != expected cons %x\n", + cons, rxr->rx_next_cons); bnxt_sched_reset(bp, rxr); return rc1; } + rx_buf = &rxr->rx_buf_ring[cons]; + data = rx_buf->data; + data_ptr = rx_buf->data_ptr; prefetch(data_ptr);
misc = le32_to_cpu(rxcmp->rx_cmp_misc_v1);
[ Upstream commit 8e44e96c6c8e8fb80b84a2ca11798a8554f710f2 ]
If the RX completion indicates RX buffer errors, the RX ring will be disabled by firmware and no packets will be received on that ring from that point on. Recover by resetting the device.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index a8abb47178be..581ad0a17d0c 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -1573,11 +1573,17 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_napi *bnapi, u32 *raw_cons,
rx_buf->data = NULL; if (rxcmp1->rx_cmp_cfa_code_errors_v2 & RX_CMP_L2_ERRORS) { + u32 rx_err = le32_to_cpu(rxcmp1->rx_cmp_cfa_code_errors_v2); + bnxt_reuse_rx_data(rxr, cons, data); if (agg_bufs) bnxt_reuse_rx_agg_bufs(bnapi, cp_cons, agg_bufs);
rc = -EIO; + if (rx_err & RX_CMPL_ERRORS_BUFFER_ERROR_MASK) { + netdev_warn(bp->dev, "RX buffer error %x\n", rx_err); + bnxt_sched_reset(bp, rxr); + } goto next_rx; }
[ Upstream commit 492b67e28ee5f2a2522fb72e3d3bcb990e461514 ]
erspan tunnels run __iptunnel_pull_header on received skbs to remove the gre and erspan headers. This can lead to a possible use-after-free when accessing the pkt_md pointer in erspan_rcv, since the packet will be 'uncloned' (running pskb_expand_head) if it is a cloned gso skb (e.g. if the packet has been sent through a veth device). Fix it by resetting the pkt_md pointer after __iptunnel_pull_header.
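The underlying rule is the usual one for any buffer that may be reallocated: pointers computed before the call are stale afterwards and must be re-derived from the new base. A hedged user-space analogue using realloc() (not the tunnel code itself):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	char *buf = malloc(16);
	if (!buf)
		return 1;
	strcpy(buf, "hdr:payload");

	char *payload = buf + 4;	/* pointer into the old buffer */

	/* Like pskb_expand_head(), realloc() may move the data... */
	char *tmp = realloc(buf, 4096);
	if (!tmp) {
		free(buf);
		return 1;
	}
	buf = tmp;

	/* ...so 'payload' must be recomputed from the new base, just as
	 * the patch recomputes pkt_md from skb_network_header(). */
	payload = buf + 4;
	printf("%s\n", payload);

	free(buf);
	return 0;
}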
Fixes: 1d7e2ed22f8d ("net: erspan: refactor existing erspan code") Signed-off-by: Lorenzo Bianconi lorenzo.bianconi@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/ip_gre.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index f199945f6e4a..3c734832bb7c 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -260,7 +260,6 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi, struct net *net = dev_net(skb->dev); struct metadata_dst *tun_dst = NULL; struct erspan_base_hdr *ershdr; - struct erspan_metadata *pkt_md; struct ip_tunnel_net *itn; struct ip_tunnel *tunnel; const struct iphdr *iph; @@ -283,9 +282,6 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi, if (unlikely(!pskb_may_pull(skb, len))) return PACKET_REJECT;
- ershdr = (struct erspan_base_hdr *)(skb->data + gre_hdr_len); - pkt_md = (struct erspan_metadata *)(ershdr + 1); - if (__iptunnel_pull_header(skb, len, htons(ETH_P_TEB), @@ -293,8 +289,9 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi, goto drop;
if (tunnel->collect_md) { + struct erspan_metadata *pkt_md, *md; struct ip_tunnel_info *info; - struct erspan_metadata *md; + unsigned char *gh; __be64 tun_id; __be16 flags;
@@ -307,6 +304,14 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi, if (!tun_dst) return PACKET_REJECT;
+ /* skb can be uncloned in __iptunnel_pull_header, so + * old pkt_md is no longer valid and we need to reset + * it + */ + gh = skb_network_header(skb) + + skb_network_header_len(skb); + pkt_md = (struct erspan_metadata *)(gh + gre_hdr_len + + sizeof(*ershdr)); md = ip_tunnel_info_opts(&tun_dst->u.tun_info); md->version = ver; md2 = &md->u.md2;
[ Upstream commit 2a3cabae4536edbcb21d344e7aa8be7a584d2afb ]
erspan_v6 tunnels run __iptunnel_pull_header on received skbs to remove the erspan header. This can lead to a possible use-after-free when accessing the pkt_md pointer in ip6erspan_rcv, since the packet will be 'uncloned' (running pskb_expand_head) if it is a cloned gso skb (e.g. if the packet has been sent through a veth device). Fix it by resetting the pkt_md pointer after __iptunnel_pull_header.
Fixes: 1d7e2ed22f8d ("net: erspan: refactor existing erspan code") Signed-off-by: Lorenzo Bianconi lorenzo.bianconi@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_gre.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index faed98dab913..c4a7db62658e 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -540,11 +540,10 @@ static int ip6gre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi) return PACKET_REJECT; }
-static int ip6erspan_rcv(struct sk_buff *skb, int gre_hdr_len, - struct tnl_ptk_info *tpi) +static int ip6erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi, + int gre_hdr_len) { struct erspan_base_hdr *ershdr; - struct erspan_metadata *pkt_md; const struct ipv6hdr *ipv6h; struct erspan_md2 *md2; struct ip6_tnl *tunnel; @@ -563,18 +562,16 @@ static int ip6erspan_rcv(struct sk_buff *skb, int gre_hdr_len, if (unlikely(!pskb_may_pull(skb, len))) return PACKET_REJECT;
- ershdr = (struct erspan_base_hdr *)skb->data; - pkt_md = (struct erspan_metadata *)(ershdr + 1); - if (__iptunnel_pull_header(skb, len, htons(ETH_P_TEB), false, false) < 0) return PACKET_REJECT;
if (tunnel->parms.collect_md) { + struct erspan_metadata *pkt_md, *md; struct metadata_dst *tun_dst; struct ip_tunnel_info *info; - struct erspan_metadata *md; + unsigned char *gh; __be64 tun_id; __be16 flags;
@@ -587,6 +584,14 @@ static int ip6erspan_rcv(struct sk_buff *skb, int gre_hdr_len, if (!tun_dst) return PACKET_REJECT;
+ /* skb can be uncloned in __iptunnel_pull_header, so + * old pkt_md is no longer valid and we need to reset + * it + */ + gh = skb_network_header(skb) + + skb_network_header_len(skb); + pkt_md = (struct erspan_metadata *)(gh + gre_hdr_len + + sizeof(*ershdr)); info = &tun_dst->u.tun_info; md = ip_tunnel_info_opts(info); md->version = ver; @@ -623,7 +628,7 @@ static int gre_rcv(struct sk_buff *skb)
if (unlikely(tpi.proto == htons(ETH_P_ERSPAN) || tpi.proto == htons(ETH_P_ERSPAN2))) { - if (ip6erspan_rcv(skb, hdr_len, &tpi) == PACKET_RCVD) + if (ip6erspan_rcv(skb, &tpi, hdr_len) == PACKET_RCVD) return 0; goto out; }
[ Upstream commit 9a5a90d167b0e5fe3d47af16b68fd09ce64085cd ]
__netif_receive_skb_list_ptype() leaves skb->next poisoned before passing it to the pt_prev->func handler, which may produce (in certain cases, e.g. a DSA setup) crashes like:
[ 88.606777] CPU 0 Unable to handle kernel paging request at virtual address 0000000e, epc == 80687078, ra == 8052cc7c [ 88.618666] Oops[#1]: [ 88.621196] CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc2-dlink-00206-g4192a172-dirty #1473 [ 88.630885] $ 0 : 00000000 10000400 00000002 864d7850 [ 88.636709] $ 4 : 87c0ddf0 864d7800 87c0ddf0 00000000 [ 88.642526] $ 8 : 00000000 49600000 00000001 00000001 [ 88.648342] $12 : 00000000 c288617b dadbee27 25d17c41 [ 88.654159] $16 : 87c0ddf0 85cff080 80790000 fffffffd [ 88.659975] $20 : 80797b20 ffffffff 00000001 864d7800 [ 88.665793] $24 : 00000000 8011e658 [ 88.671609] $28 : 80790000 87c0dbc0 87cabf00 8052cc7c [ 88.677427] Hi : 00000003 [ 88.680622] Lo : 7b5b4220 [ 88.683840] epc : 80687078 vlan_dev_hard_start_xmit+0x1c/0x1a0 [ 88.690532] ra : 8052cc7c dev_hard_start_xmit+0xac/0x188 [ 88.696734] Status: 10000404 IEp [ 88.700422] Cause : 50000008 (ExcCode 02) [ 88.704874] BadVA : 0000000e [ 88.708069] PrId : 0001a120 (MIPS interAptiv (multi)) [ 88.713005] Modules linked in: [ 88.716407] Process swapper (pid: 0, threadinfo=(ptrval), task=(ptrval), tls=00000000) [ 88.725219] Stack : 85f61c28 00000000 0000000e 80780000 87c0ddf0 85cff080 80790000 8052cc7c [ 88.734529] 87cabf00 00000000 00000001 85f5fb40 807b0000 864d7850 87cabf00 807d0000 [ 88.743839] 864d7800 8655f600 00000000 85cff080 87c1c000 0000006a 00000000 8052d96c [ 88.753149] 807a0000 8057adb8 87c0dcc8 87c0dc50 85cfff08 00000558 87cabf00 85f58c50 [ 88.762460] 00000002 85f58c00 864d7800 80543308 fffffff4 00000001 85f58c00 864d7800 [ 88.771770] ... [ 88.774483] Call Trace: [ 88.777199] [<80687078>] vlan_dev_hard_start_xmit+0x1c/0x1a0 [ 88.783504] [<8052cc7c>] dev_hard_start_xmit+0xac/0x188 [ 88.789326] [<8052d96c>] __dev_queue_xmit+0x6e8/0x7d4 [ 88.794955] [<805a8640>] ip_finish_output2+0x238/0x4d0 [ 88.800677] [<805ab6a0>] ip_output+0xc8/0x140 [ 88.805526] [<805a68f4>] ip_forward+0x364/0x560 [ 88.810567] [<805a4ff8>] ip_rcv+0x48/0xe4 [ 88.815030] [<80528d44>] __netif_receive_skb_one_core+0x44/0x58 [ 88.821635] [<8067f220>] dsa_switch_rcv+0x108/0x1ac [ 88.827067] [<80528f80>] __netif_receive_skb_list_core+0x228/0x26c [ 88.833951] [<8052ed84>] netif_receive_skb_list+0x1d4/0x394 [ 88.840160] [<80355a88>] lunar_rx_poll+0x38c/0x828 [ 88.845496] [<8052fa78>] net_rx_action+0x14c/0x3cc [ 88.850835] [<806ad300>] __do_softirq+0x178/0x338 [ 88.856077] [<8012a2d4>] irq_exit+0xbc/0x100 [ 88.860846] [<802f8b70>] plat_irq_dispatch+0xc0/0x144 [ 88.866477] [<80105974>] handle_int+0x14c/0x158 [ 88.871516] [<806acfb0>] r4k_wait+0x30/0x40 [ 88.876462] Code: afb10014 8c8200a0 00803025 <9443000c> 94a20468 00000000 10620042 00a08025 9605046a [ 88.887332] [ 88.888982] ---[ end trace eb863d007da11cf1 ]--- [ 88.894122] Kernel panic - not syncing: Fatal exception in interrupt [ 88.901202] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Fix this by pulling the skb off the sublist and zeroing its skb->next pointer before calling the ptype callback.
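A simplified user-space model of the fix (made-up types, not the kernel's skb or list API): each node is unlinked and its next pointer cleared before it is handed to a handler that expects a standalone object.

#include <stdio.h>
#include <stddef.h>

struct pkt {
	struct pkt *next;
	int id;
};

/* Handler that, like pt_prev->func(), assumes a single detached packet
 * and would misbehave if p->next still pointed into the list. */
static void handle_one(struct pkt *p)
{
	printf("pkt %d, next=%p\n", p->id, (void *)p->next);
}

int main(void)
{
	struct pkt c = { NULL, 3 };
	struct pkt b = { &c, 2 };
	struct pkt a = { &b, 1 };
	struct pkt *head = &a;

	while (head) {
		struct pkt *p = head;

		head = head->next;
		p->next = NULL;		/* the skb_list_del_init() step */
		handle_one(p);
	}
	return 0;
}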
Fixes: 88eb1944e18c ("net: core: propagate SKB lists through packet_type lookup") Reviewed-by: Edward Cree ecree@solarflare.com Signed-off-by: Alexander Lobakin alobakin@dlink.ru Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/dev.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c index 5c8c0a572ee9..d47554307a6d 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4959,8 +4959,10 @@ static inline void __netif_receive_skb_list_ptype(struct list_head *head, if (pt_prev->list_func != NULL) pt_prev->list_func(head, pt_prev, orig_dev); else - list_for_each_entry_safe(skb, next, head, list) + list_for_each_entry_safe(skb, next, head, list) { + skb_list_del_init(skb); pt_prev->func(skb, skb->dev, pt_prev, orig_dev); + } }
static void __netif_receive_skb_list_core(struct list_head *head, bool pfmemalloc)
[ Upstream commit 288ac524cf70a8e7ed851a61ed2a9744039dae8d ]
It was reported that re-introducing ASPM, in combination with RX interrupt coalescing, results in significantly increased packet latency, see [0]. Disabling ASPM or RX interrupt coalescing fixes the issue. Therefore change the driver's default to disable RX interrupt coalescing. Users still have the option to enable RX coalescing via ethtool.
[0] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=925496
Fixes: a99790bf5c7f ("r8169: Reinstate ASPM Support") Reported-by: Mike Crowe mac@mcrowe.com Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/realtek/r8169.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 1d24884e9897..7a50b911b180 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -5418,7 +5418,7 @@ static void rtl_hw_start_8168(struct rtl8169_private *tp) tp->cp_cmd |= PktCntrDisable | INTT_1; RTL_W16(tp, CPlusCmd, tp->cp_cmd);
- RTL_W16(tp, IntrMitigate, 0x5151); + RTL_W16(tp, IntrMitigate, 0x5100);
/* Work around for RxFIFO overflow. */ if (tp->mac_version == RTL_GIGA_MAC_VER_11) {
[ Upstream commit 8e949363f017e2011464812a714fb29710fb95b4 ]
idr_find() can return a NULL value to 'flow', which is then used without a check. The patch adds a check to avoid a potential NULL pointer dereference.
In case of mlx5_fpga_sbu_conn_sendmsg() failure, free the buf allocated with kzalloc.
Fixes: ab412e1dd7db ("net/mlx5: Accel, add TLS rx offload routines") Signed-off-by: Aditya Pakki pakki001@umn.edu Reviewed-by: Yuval Shaia yuval.shaia@oracle.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c index 5cf5f2a9d51f..8de64e88c670 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c @@ -217,15 +217,21 @@ int mlx5_fpga_tls_resync_rx(struct mlx5_core_dev *mdev, u32 handle, u32 seq, void *cmd; int ret;
+ rcu_read_lock(); + flow = idr_find(&mdev->fpga->tls->rx_idr, ntohl(handle)); + rcu_read_unlock(); + + if (!flow) { + WARN_ONCE(1, "Received NULL pointer for handle\n"); + return -EINVAL; + } + buf = kzalloc(size, GFP_ATOMIC); if (!buf) return -ENOMEM;
cmd = (buf + 1);
- rcu_read_lock(); - flow = idr_find(&mdev->fpga->tls->rx_idr, ntohl(handle)); - rcu_read_unlock(); mlx5_fpga_tls_flow_to_cmd(flow, cmd);
MLX5_SET(tls_cmd, cmd, swid, ntohl(handle)); @@ -238,6 +244,8 @@ int mlx5_fpga_tls_resync_rx(struct mlx5_core_dev *mdev, u32 handle, u32 seq, buf->complete = mlx_tls_kfree_complete;
ret = mlx5_fpga_sbu_conn_sendmsg(mdev->fpga->tls->conn, buf); + if (ret < 0) + kfree(buf);
return ret; }
[ Upstream commit 5ec983e924c7978aaec3cf8679ece9436508bb20 ]
Set the minimum speed in the xoff threshold formula to 40Gbps.
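As a worked example (assumed inputs: 7 m cable, 1500-byte MTU, a 10 Gbps link clamped up to the new 40 Gbps floor), the integer form of the formula used by the driver gives an xoff threshold of 16720 bytes. A minimal stand-alone sketch, not the driver code:

#include <stdio.h>

/* Integer form of the xoff formula, with speed in Mbps, cable_len in
 * meters and mtu in bytes; enforces the 40 Gbps floor from this patch. */
static unsigned int calc_xoff(unsigned int speed, unsigned int cable_len,
			      unsigned int mtu)
{
	if (speed < 40000)
		speed = 40000;

	return (301 + 216 * cable_len / 100) * speed / 1000 + 272 * mtu / 100;
}

int main(void)
{
	printf("xoff = %u bytes\n", calc_xoff(10000, 7, 1500)); /* 16720 */
	return 0;
}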
Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Huy Nguyen huyn@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/en/port_buffer.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c index eac245a93f91..f00de0c987cd 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c @@ -122,7 +122,9 @@ static int port_set_buffer(struct mlx5e_priv *priv, return err; }
-/* xoff = ((301+2.16 * len [m]) * speed [Gbps] + 2.72 MTU [B]) */ +/* xoff = ((301+2.16 * len [m]) * speed [Gbps] + 2.72 MTU [B]) + * minimum speed value is 40Gbps + */ static u32 calculate_xoff(struct mlx5e_priv *priv, unsigned int mtu) { u32 speed; @@ -130,10 +132,9 @@ static u32 calculate_xoff(struct mlx5e_priv *priv, unsigned int mtu) int err;
err = mlx5e_port_linkspeed(priv->mdev, &speed); - if (err) { - mlx5_core_warn(priv->mdev, "cannot get port speed\n"); - return 0; - } + if (err) + speed = SPEED_40000; + speed = max_t(u32, speed, SPEED_40000);
xoff = (301 + 216 * priv->dcbx.cable_len / 100) * speed / 1000 + 272 * mtu / 100;
[ Upstream commit e28408e98bced123038857b6e3c81fa12a2e3e68 ]
Set xon = xoff - netdev's max_mtu. Using the netdev's max_mtu gives the pause frame enough time to arrive at the sender.
Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Huy Nguyen huyn@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../mellanox/mlx5/core/en/port_buffer.c | 28 +++++++++++-------- 1 file changed, 16 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c index f00de0c987cd..4ab0d030b544 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c @@ -143,7 +143,7 @@ static u32 calculate_xoff(struct mlx5e_priv *priv, unsigned int mtu) }
static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer, - u32 xoff, unsigned int mtu) + u32 xoff, unsigned int max_mtu) { int i;
@@ -155,11 +155,12 @@ static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer, }
if (port_buffer->buffer[i].size < - (xoff + mtu + (1 << MLX5E_BUFFER_CELL_SHIFT))) + (xoff + max_mtu + (1 << MLX5E_BUFFER_CELL_SHIFT))) return -ENOMEM;
port_buffer->buffer[i].xoff = port_buffer->buffer[i].size - xoff; - port_buffer->buffer[i].xon = port_buffer->buffer[i].xoff - mtu; + port_buffer->buffer[i].xon = + port_buffer->buffer[i].xoff - max_mtu; }
return 0; @@ -167,7 +168,7 @@ static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer,
/** * update_buffer_lossy() - * mtu: device's MTU + * max_mtu: netdev's max_mtu * pfc_en: <input> current pfc configuration * buffer: <input> current prio to buffer mapping * xoff: <input> xoff value @@ -184,7 +185,7 @@ static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer, * Return 0 if no error. * Set change to true if buffer configuration is modified. */ -static int update_buffer_lossy(unsigned int mtu, +static int update_buffer_lossy(unsigned int max_mtu, u8 pfc_en, u8 *buffer, u32 xoff, struct mlx5e_port_buffer *port_buffer, bool *change) @@ -221,7 +222,7 @@ static int update_buffer_lossy(unsigned int mtu, }
if (changed) { - err = update_xoff_threshold(port_buffer, xoff, mtu); + err = update_xoff_threshold(port_buffer, xoff, max_mtu); if (err) return err;
@@ -231,6 +232,7 @@ static int update_buffer_lossy(unsigned int mtu, return 0; }
+#define MINIMUM_MAX_MTU 9216 int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv, u32 change, unsigned int mtu, struct ieee_pfc *pfc, @@ -242,12 +244,14 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv, bool update_prio2buffer = false; u8 buffer[MLX5E_MAX_PRIORITY]; bool update_buffer = false; + unsigned int max_mtu; u32 total_used = 0; u8 curr_pfc_en; int err; int i;
mlx5e_dbg(HW, priv, "%s: change=%x\n", __func__, change); + max_mtu = max_t(unsigned int, priv->netdev->max_mtu, MINIMUM_MAX_MTU);
err = mlx5e_port_query_buffer(priv, &port_buffer); if (err) @@ -255,7 +259,7 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
if (change & MLX5E_PORT_BUFFER_CABLE_LEN) { update_buffer = true; - err = update_xoff_threshold(&port_buffer, xoff, mtu); + err = update_xoff_threshold(&port_buffer, xoff, max_mtu); if (err) return err; } @@ -265,7 +269,7 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv, if (err) return err;
- err = update_buffer_lossy(mtu, pfc->pfc_en, buffer, xoff, + err = update_buffer_lossy(max_mtu, pfc->pfc_en, buffer, xoff, &port_buffer, &update_buffer); if (err) return err; @@ -277,8 +281,8 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv, if (err) return err;
- err = update_buffer_lossy(mtu, curr_pfc_en, prio2buffer, xoff, - &port_buffer, &update_buffer); + err = update_buffer_lossy(max_mtu, curr_pfc_en, prio2buffer, + xoff, &port_buffer, &update_buffer); if (err) return err; } @@ -302,7 +306,7 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv, return -EINVAL;
update_buffer = true; - err = update_xoff_threshold(&port_buffer, xoff, mtu); + err = update_xoff_threshold(&port_buffer, xoff, max_mtu); if (err) return err; } @@ -310,7 +314,7 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv, /* Need to update buffer configuration if xoff value is changed */ if (!update_buffer && xoff != priv->dcbx.xoff) { update_buffer = true; - err = update_xoff_threshold(&port_buffer, xoff, mtu); + err = update_xoff_threshold(&port_buffer, xoff, max_mtu); if (err) return err; }
[ Upstream commit 02826a6ba301b72461c3706e1cc66d5571cd327e ]
Ard Biesheuvel reports that bindeb-pkg with the O= option is broken in the following way:
... LD [M] sound/soc/rockchip/snd-soc-rk3399-gru-sound.ko LD [M] sound/soc/rockchip/snd-soc-rockchip-pcm.ko LD [M] sound/soc/rockchip/snd-soc-rockchip-rt5645.ko LD [M] sound/soc/rockchip/snd-soc-rockchip-spdif.ko LD [M] sound/soc/sh/rcar/snd-soc-rcar.ko fakeroot -u debian/rules binary make KERNELRELEASE=4.19.0-12677-g19beffaf7a99-dirty ARCH=arm64 KBUILD_SRC= intdeb-pkg /bin/bash /home/ard/linux/scripts/package/builddeb Makefile:600: include/config/auto.conf: No such file or directory *** *** Configuration file ".config" not found! *** *** Please run some configurator (e.g. "make oldconfig" or *** "make menuconfig" or "make xconfig"). *** make[12]: *** [syncconfig] Error 1 make[11]: *** [syncconfig] Error 2 make[10]: *** [include/config/auto.conf] Error 2 make[9]: *** [__sub-make] Error 2 ...
Prior to commit 80463f1b7bf9 ("kbuild: add --include-dir flag only for out-of-tree build"), both srctree and objtree were redundantly added to --include-dir, and the incorrect invocation '$MAKE image_name' happened to work by relying on that. Now, the issue that had previously been hidden has shown up.
'$MAKE image_name' recurses to the generated $(objtree)/Makefile and ends up running in srctree, which is incorrect. It should be invoked with '-f $srctree/Makefile' (or KBUILD_SRC=) to be executed in objtree.
Fixes: 80463f1b7bf9 ("kbuild: add --include-dir flag only for out-of-tree build") Reported-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/package/builddeb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/package/builddeb b/scripts/package/builddeb index 90c9a8ac7adb..0b31f4f1f92c 100755 --- a/scripts/package/builddeb +++ b/scripts/package/builddeb @@ -81,7 +81,7 @@ else cp System.map "$tmpdir/boot/System.map-$version" cp $KCONFIG_CONFIG "$tmpdir/boot/config-$version" fi -cp "$($MAKE -s image_name)" "$tmpdir/$installed_image_path" +cp "$($MAKE -s -f $srctree/Makefile image_name)" "$tmpdir/$installed_image_path"
if grep -q "^CONFIG_OF=y" $KCONFIG_CONFIG ; then # Only some architectures with OF support have this target
commit ad15006cc78459d059af56729c4d9bed7c7fd860 upstream.
This causes an issue when trying to build with `make LD=ld.lld` if ld.lld and the rest of your cross tools aren't in the same directory (e.g. /usr/local/bin) (as is the case for Android's build system), as GCC_TOOLCHAIN_DIR then gets set based on `which $(LD)`, which will point to where the LLVM tools are located, not where the GCC/binutils tools are.
Instead, select GCC_TOOLCHAIN_DIR based on another tool provided by binutils for which LLVM does not provide a substitute, such as elfedit.
Fixes: 785f11aa595b ("kbuild: Add better clang cross build support") Link: https://github.com/ClangBuiltLinux/linux/issues/341 Suggested-by: Nathan Chancellor natechancellor@gmail.com Reviewed-by: Nathan Chancellor natechancellor@gmail.com Tested-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com Signed-off-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile index 8fdfe0af5862..c07f25e2e11b 100644 --- a/Makefile +++ b/Makefile @@ -483,7 +483,7 @@ endif ifeq ($(cc-name),clang) ifneq ($(CROSS_COMPILE),) CLANG_FLAGS := --target=$(notdir $(CROSS_COMPILE:%-=%)) -GCC_TOOLCHAIN_DIR := $(dir $(shell which $(LD))) +GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit)) CLANG_FLAGS += --prefix=$(GCC_TOOLCHAIN_DIR) GCC_TOOLCHAIN := $(realpath $(GCC_TOOLCHAIN_DIR)/..) endif
commit ac3e233d29f7f77f28243af0132057d378d3ea58 upstream.
GNU linker's -z common-page-size's default value is based on the target architecture. arch/x86/entry/vdso/Makefile sets it to the architecture default, which is implicit and redundant. Drop it.
Fixes: 2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu") Reported-by: Dmitry Golovin dima@golovin.in Reported-by: Bill Wendling morbo@google.com Suggested-by: Dmitry Golovin dima@golovin.in Suggested-by: Rui Ueyama ruiu@google.com Signed-off-by: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Andy Lutomirski luto@kernel.org Cc: Andi Kleen andi@firstfloor.org Cc: Fangrui Song maskray@google.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Thomas Gleixner tglx@linutronix.de Cc: x86-ml x86@kernel.org Link: https://lkml.kernel.org/r/20181206191231.192355-1-ndesaulniers@google.com Link: https://bugs.llvm.org/show_bug.cgi?id=38774 Link: https://github.com/ClangBuiltLinux/linux/issues/31 Signed-off-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/entry/vdso/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile index c3d7ccd25381..5bfe2243a08f 100644 --- a/arch/x86/entry/vdso/Makefile +++ b/arch/x86/entry/vdso/Makefile @@ -47,7 +47,7 @@ targets += $(vdso_img_sodbg) $(vdso_img-y:%=vdso%.so) CPPFLAGS_vdso.lds += -P -C
VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no-undefined \ - -z max-page-size=4096 -z common-page-size=4096 + -z max-page-size=4096
$(obj)/vdso64.so.dbg: $(obj)/vdso.lds $(vobjs) FORCE $(call if_changed,vdso) @@ -98,7 +98,7 @@ CFLAGS_REMOVE_vvar.o = -pg
CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds) VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \ - -z max-page-size=4096 -z common-page-size=4096 + -z max-page-size=4096
# x32-rebranded versions vobjx32s-y := $(vobjs-y:.o=-x32.o)
[ Upstream commit 5f074f3e192f10c9fade898b9b3b8812e3d83342 ]
A recent optimization in Clang (r355672) lowers comparisons of the return value of memcmp against zero to comparisons of the return value of bcmp against zero. This helps some platforms that implement bcmp more efficiently than memcmp. glibc simply aliases bcmp to memcmp, but an optimized implementation is in the works.
This results in linkage failures for all targets with Clang due to the undefined symbol. For now, just implement bcmp as a tail call to memcmp to unbreak the build. This routine can be further optimized in the future.
Other ideas discussed:
* A weak alias was discussed, but breaks for architectures that define their own implementations of memcmp since aliases to declarations are not permitted (only definitions). Arch-specific memcmp implementations typically declare memcmp in C headers, but implement them in assembly.
* -ffreestanding also is used sporadically throughout the kernel.
* -fno-builtin-bcmp doesn't work when doing LTO.
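A small user-space demonstration of the contract being relied on (using the libc bcmp/memcmp, not the kernel's): only the zero/non-zero distinction of bcmp() is meaningful, which is why a tail call to memcmp() is a valid implementation.

#include <stdio.h>
#include <string.h>
#include <strings.h>	/* userspace declaration of bcmp() */

int main(void)
{
	const char a[] = "abc", b[] = "abd";

	/* Equal buffers: bcmp() must return 0, and memcmp() does too. */
	printf("equal:   bcmp=%d memcmp=%d\n", bcmp(a, a, 3), memcmp(a, a, 3));

	/* Unequal buffers: only "non-zero" is guaranteed for bcmp();
	 * the sign and magnitude carry no meaning. */
	printf("unequal: bcmp=%d memcmp=%d\n", bcmp(a, b, 3), memcmp(a, b, 3));
	return 0;
}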
Link: https://bugs.llvm.org/show_bug.cgi?id=41035 Link: https://code.woboq.org/userspace/glibc/string/memcmp.c.html#bcmp Link: https://github.com/llvm/llvm-project/commit/8e16d73346f8091461319a7dfc4ddd18... Link: https://github.com/ClangBuiltLinux/linux/issues/416 Link: http://lkml.kernel.org/r/20190313211335.165605-1-ndesaulniers@google.com Signed-off-by: Nick Desaulniers ndesaulniers@google.com Reported-by: Nathan Chancellor natechancellor@gmail.com Reported-by: Adhemerval Zanella adhemerval.zanella@linaro.org Suggested-by: Arnd Bergmann arnd@arndb.de Suggested-by: James Y Knight jyknight@google.com Suggested-by: Masahiro Yamada yamada.masahiro@socionext.com Suggested-by: Nathan Chancellor natechancellor@gmail.com Suggested-by: Rasmus Villemoes linux@rasmusvillemoes.dk Acked-by: Steven Rostedt (VMware) rostedt@goodmis.org Reviewed-by: Nathan Chancellor natechancellor@gmail.com Tested-by: Nathan Chancellor natechancellor@gmail.com Reviewed-by: Masahiro Yamada yamada.masahiro@socionext.com Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: David Laight David.Laight@ACULAB.COM Cc: Rasmus Villemoes linux@rasmusvillemoes.dk Cc: Namhyung Kim namhyung@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Dan Williams dan.j.williams@intel.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/string.h | 3 +++ lib/string.c | 20 ++++++++++++++++++++ 2 files changed, 23 insertions(+)
diff --git a/include/linux/string.h b/include/linux/string.h index 4a5a0eb7df51..f58e1ef76572 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -143,6 +143,9 @@ extern void * memscan(void *,int,__kernel_size_t); #ifndef __HAVE_ARCH_MEMCMP extern int memcmp(const void *,const void *,__kernel_size_t); #endif +#ifndef __HAVE_ARCH_BCMP +extern int bcmp(const void *,const void *,__kernel_size_t); +#endif #ifndef __HAVE_ARCH_MEMCHR extern void * memchr(const void *,int,__kernel_size_t); #endif diff --git a/lib/string.c b/lib/string.c index 2c0900a5d51a..72125fd5b4a6 100644 --- a/lib/string.c +++ b/lib/string.c @@ -865,6 +865,26 @@ __visible int memcmp(const void *cs, const void *ct, size_t count) EXPORT_SYMBOL(memcmp); #endif
+#ifndef __HAVE_ARCH_BCMP +/** + * bcmp - returns 0 if and only if the buffers have identical contents. + * @a: pointer to first buffer. + * @b: pointer to second buffer. + * @len: size of buffers. + * + * The sign or magnitude of a non-zero return value has no particular + * meaning, and architectures may implement their own more efficient bcmp(). So + * while this particular implementation is a simple (tail) call to memcmp, do + * not rely on anything but whether the return value is zero or non-zero. + */ +#undef bcmp +int bcmp(const void *a, const void *b, size_t len) +{ + return memcmp(a, b, len); +} +EXPORT_SYMBOL(bcmp); +#endif + #ifndef __HAVE_ARCH_MEMSCAN /** * memscan - Find a character in an area of memory.
This reverts commit c8e4f8406842332fb55cd792016e5dac266f6354.
This patch was not initially a fix and depends on other changes which are not fixes either.
With this change, multiple Amlogic-based boards fail to boot, as reported by kernelci.
Cc: stable@vger.kernel.org # 4.19.34 Signed-off-by: Neil Armstrong narmstrong@baylibre.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/meson/meson-aoclk.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/drivers/clk/meson/meson-aoclk.c b/drivers/clk/meson/meson-aoclk.c index 258c8d259ea1..f965845917e3 100644 --- a/drivers/clk/meson/meson-aoclk.c +++ b/drivers/clk/meson/meson-aoclk.c @@ -65,20 +65,15 @@ int meson_aoclkc_probe(struct platform_device *pdev) return ret; }
- /* Populate regmap */ - for (clkid = 0; clkid < data->num_clks; clkid++) + /* + * Populate regmap and register all clks + */ + for (clkid = 0; clkid < data->num_clks; clkid++) { data->clks[clkid]->map = regmap;
- /* Register all clks */ - for (clkid = 0; clkid < data->hw_data->num; clkid++) { - if (!data->hw_data->hws[clkid]) - continue; - ret = devm_clk_hw_register(dev, data->hw_data->hws[clkid]); - if (ret) { - dev_err(dev, "Clock registration failed\n"); + if (ret) return ret; - } }
return devm_of_clk_add_hw_provider(dev, of_clk_hw_onecell_get,
commit 8866df9264a34e675b4ee8a151db819b87cce2d3 upstream
Otherwise, we hit a NULL pointer dereference since handlers always assume a default timeout policy is passed.
netlink: 24 bytes leftover after parsing attributes in process `syz-executor2'. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 9575 Comm: syz-executor1 Not tainted 4.19.0+ #312 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:icmp_timeout_obj_to_nlattr+0x77/0x170 net/netfilter/nf_conntrack_proto_icmp.c:297
Fixes: c779e849608a ("netfilter: conntrack: remove get_timeout() indirection") Reported-by: Eric Dumazet eric.dumazet@gmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Zubin Mithra zsm@chromium.org Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- net/netfilter/nfnetlink_cttimeout.c | 46 +++++++++++++++++++++++++---- 1 file changed, 40 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c index a30f8ba4b89a..1dc4ea327cbe 100644 --- a/net/netfilter/nfnetlink_cttimeout.c +++ b/net/netfilter/nfnetlink_cttimeout.c @@ -392,7 +392,8 @@ err: static int cttimeout_default_fill_info(struct net *net, struct sk_buff *skb, u32 portid, u32 seq, u32 type, int event, - const struct nf_conntrack_l4proto *l4proto) + const struct nf_conntrack_l4proto *l4proto, + const unsigned int *timeouts) { struct nlmsghdr *nlh; struct nfgenmsg *nfmsg; @@ -421,7 +422,7 @@ cttimeout_default_fill_info(struct net *net, struct sk_buff *skb, u32 portid, if (!nest_parms) goto nla_put_failure;
- ret = l4proto->ctnl_timeout.obj_to_nlattr(skb, NULL); + ret = l4proto->ctnl_timeout.obj_to_nlattr(skb, timeouts); if (ret < 0) goto nla_put_failure;
@@ -444,6 +445,7 @@ static int cttimeout_default_get(struct net *net, struct sock *ctnl, struct netlink_ext_ack *extack) { const struct nf_conntrack_l4proto *l4proto; + unsigned int *timeouts = NULL; struct sk_buff *skb2; int ret, err; __u16 l3num; @@ -456,12 +458,44 @@ static int cttimeout_default_get(struct net *net, struct sock *ctnl, l4num = nla_get_u8(cda[CTA_TIMEOUT_L4PROTO]); l4proto = nf_ct_l4proto_find_get(l3num, l4num);
- /* This protocol is not supported, skip. */ - if (l4proto->l4proto != l4num) { - err = -EOPNOTSUPP; + err = -EOPNOTSUPP; + if (l4proto->l4proto != l4num) goto err; + + switch (l4proto->l4proto) { + case IPPROTO_ICMP: + timeouts = &net->ct.nf_ct_proto.icmp.timeout; + break; + case IPPROTO_TCP: + timeouts = net->ct.nf_ct_proto.tcp.timeouts; + break; + case IPPROTO_UDP: + timeouts = net->ct.nf_ct_proto.udp.timeouts; + break; + case IPPROTO_DCCP: +#ifdef CONFIG_NF_CT_PROTO_DCCP + timeouts = net->ct.nf_ct_proto.dccp.dccp_timeout; +#endif + break; + case IPPROTO_ICMPV6: + timeouts = &net->ct.nf_ct_proto.icmpv6.timeout; + break; + case IPPROTO_SCTP: +#ifdef CONFIG_NF_CT_PROTO_SCTP + timeouts = net->ct.nf_ct_proto.sctp.timeouts; +#endif + break; + case 255: + timeouts = &net->ct.nf_ct_proto.generic.timeout; + break; + default: + WARN_ON_ONCE(1); + break; }
+ if (!timeouts) + goto err; + skb2 = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (skb2 == NULL) { err = -ENOMEM; @@ -472,7 +506,7 @@ static int cttimeout_default_get(struct net *net, struct sock *ctnl, nlh->nlmsg_seq, NFNL_MSG_TYPE(nlh->nlmsg_type), IPCTNL_MSG_TIMEOUT_DEFAULT_SET, - l4proto); + l4proto, timeouts); if (ret <= 0) { kfree_skb(skb2); err = -ENOMEM;
commit 89259088c1b7fecb43e8e245dc931909132a4e03 upstream
syzbot was able to trigger the WARN in cttimeout_default_get() by passing UDPLITE as the l4protocol. Alias UDPLITE to UDP; both use the same timeout values.
Furthermore, also fetch the GRE timeouts. GRE is a bit more complicated, as it can still be a module and its netns_proto_gre struct layout isn't visible outside of the gre module. The timeouts can't be moved around: it appears the conntrack sysctl unregister code assumes net_generic() returns an nf_proto_net, so we would get a crash. Expose the layout of netns_proto_gre instead.
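As a rough standalone sketch of the layout assumption being relied on here (only the placement of the nf member mirrors the patch; the other members shown are illustrative, not the real definitions):

#include <stddef.h>

struct nf_proto_net {
	unsigned int users;			/* stand-in contents */
};

struct netns_proto_gre {
	struct nf_proto_net nf;			/* must stay the first member */
	unsigned int gre_timeouts[2];
};

/* Same idea as the BUILD_BUG_ON() added by the patch: a pointer to
 * netns_proto_gre can then also be treated as a pointer to nf_proto_net,
 * which is what the conntrack sysctl code expects from net_generic(). */
_Static_assert(offsetof(struct netns_proto_gre, nf) == 0,
	       "nf_proto_net must be the first member");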
A follow-up nf-next patch could make the gre tracker built-in as well if needed; it's not that large.
Last, make the WARN() mention the missing protocol value in case anything else is missing.
Reported-by: syzbot+2fae8fa157dd92618cae@syzkaller.appspotmail.com Fixes: 8866df9264a3 ("netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Zubin Mithra zsm@chromium.org Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- include/linux/netfilter/nf_conntrack_proto_gre.h | 13 +++++++++++++ net/netfilter/nf_conntrack_proto_gre.c | 14 ++------------ net/netfilter/nfnetlink_cttimeout.c | 15 +++++++++++++-- 3 files changed, 28 insertions(+), 14 deletions(-)
diff --git a/include/linux/netfilter/nf_conntrack_proto_gre.h b/include/linux/netfilter/nf_conntrack_proto_gre.h index b8d95564bd53..14edb795ab43 100644 --- a/include/linux/netfilter/nf_conntrack_proto_gre.h +++ b/include/linux/netfilter/nf_conntrack_proto_gre.h @@ -21,6 +21,19 @@ struct nf_ct_gre_keymap { struct nf_conntrack_tuple tuple; };
+enum grep_conntrack { + GRE_CT_UNREPLIED, + GRE_CT_REPLIED, + GRE_CT_MAX +}; + +struct netns_proto_gre { + struct nf_proto_net nf; + rwlock_t keymap_lock; + struct list_head keymap_list; + unsigned int gre_timeouts[GRE_CT_MAX]; +}; + /* add new tuple->key_reply pair to keymap */ int nf_ct_gre_keymap_add(struct nf_conn *ct, enum ip_conntrack_dir dir, struct nf_conntrack_tuple *t); diff --git a/net/netfilter/nf_conntrack_proto_gre.c b/net/netfilter/nf_conntrack_proto_gre.c index 650eb4fba2c5..841c472aae1c 100644 --- a/net/netfilter/nf_conntrack_proto_gre.c +++ b/net/netfilter/nf_conntrack_proto_gre.c @@ -43,24 +43,12 @@ #include <linux/netfilter/nf_conntrack_proto_gre.h> #include <linux/netfilter/nf_conntrack_pptp.h>
-enum grep_conntrack { - GRE_CT_UNREPLIED, - GRE_CT_REPLIED, - GRE_CT_MAX -}; - static const unsigned int gre_timeouts[GRE_CT_MAX] = { [GRE_CT_UNREPLIED] = 30*HZ, [GRE_CT_REPLIED] = 180*HZ, };
static unsigned int proto_gre_net_id __read_mostly; -struct netns_proto_gre { - struct nf_proto_net nf; - rwlock_t keymap_lock; - struct list_head keymap_list; - unsigned int gre_timeouts[GRE_CT_MAX]; -};
static inline struct netns_proto_gre *gre_pernet(struct net *net) { @@ -408,6 +396,8 @@ static int __init nf_ct_proto_gre_init(void) { int ret;
+ BUILD_BUG_ON(offsetof(struct netns_proto_gre, nf) != 0); + ret = register_pernet_subsys(&proto_gre_net_ops); if (ret < 0) goto out_pernet; diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c index 1dc4ea327cbe..70a7382b9787 100644 --- a/net/netfilter/nfnetlink_cttimeout.c +++ b/net/netfilter/nfnetlink_cttimeout.c @@ -469,7 +469,8 @@ static int cttimeout_default_get(struct net *net, struct sock *ctnl, case IPPROTO_TCP: timeouts = net->ct.nf_ct_proto.tcp.timeouts; break; - case IPPROTO_UDP: + case IPPROTO_UDP: /* fallthrough */ + case IPPROTO_UDPLITE: timeouts = net->ct.nf_ct_proto.udp.timeouts; break; case IPPROTO_DCCP: @@ -483,13 +484,23 @@ static int cttimeout_default_get(struct net *net, struct sock *ctnl, case IPPROTO_SCTP: #ifdef CONFIG_NF_CT_PROTO_SCTP timeouts = net->ct.nf_ct_proto.sctp.timeouts; +#endif + break; + case IPPROTO_GRE: +#ifdef CONFIG_NF_CT_PROTO_GRE + if (l4proto->net_id) { + struct netns_proto_gre *net_gre; + + net_gre = net_generic(net, *l4proto->net_id); + timeouts = net_gre->gre_timeouts; + } #endif break; case 255: timeouts = &net->ct.nf_ct_proto.generic.timeout; break; default: - WARN_ON_ONCE(1); + WARN_ONCE(1, "Missing timeouts for proto %d", l4proto->l4proto); break; }
[ Upstream commit c8a43c18a97845e7f94ed7d181c11f41964976a2 ]
When KASLR is enabled (CONFIG_RANDOMIZE_BASE=y), the top 4K of kernel virtual address space may be mapped to physical addresses despite being reserved for ERR_PTR values.
Fix the randomization of the linear region so that we avoid mapping the last page of the virtual address space.
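As a rough standalone illustration of the arithmetic only (the constants below are assumptions for the example, not the kernel's real values): with the old formula the worst-case offset can consume the entire randomization slack, while the new one always keeps at least one ARM64_MEMSTART_ALIGN chunk in reserve.

#include <stdio.h>
#include <stdint.h>

#define ALIGN (1ULL << 21)	/* stand-in for ARM64_MEMSTART_ALIGN */

int main(void)
{
	uint64_t range = 1ULL << 30;	/* assumed randomization slack */
	uint64_t seed = 0xffff;		/* worst-case 16-bit seed */

	/* old code: range = range / ALIGN + 1 */
	uint64_t old_off = ALIGN * (((range / ALIGN + 1) * seed) >> 16);
	/* new code: range /= ALIGN */
	uint64_t new_off = ALIGN * (((range / ALIGN) * seed) >> 16);

	/* prints: slack 1073741824, old 1073741824, new 1071644672 */
	printf("slack %llu, old %llu, new %llu\n",
	       (unsigned long long)range,
	       (unsigned long long)old_off,
	       (unsigned long long)new_off);
	return 0;
}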
Cc: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: liyueyi liyueyi@live.com [will: rewrote commit message; merged in suggestion from Ard] Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- arch/arm64/mm/init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 787e27964ab9..774c3e17c798 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -450,7 +450,7 @@ void __init arm64_memblock_init(void) * memory spans, randomize the linear region as well. */ if (memstart_offset_seed > 0 && range >= ARM64_MEMSTART_ALIGN) { - range = range / ARM64_MEMSTART_ALIGN + 1; + range /= ARM64_MEMSTART_ALIGN; memstart_addr -= ARM64_MEMSTART_ALIGN * ((range * memstart_offset_seed) >> 16); }
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit c7084edc3f6d67750f50d4183134c4fb5712a5c8 upstream.
The n_r3964 line discipline driver was written in a different time, when SMP machines were rare, and users were trusted to do the right thing. Since then, the world has moved on, but not this code; it has stayed rooted in the past with its lovely hand-crafted list structures and loads of "interesting" race conditions all over the place.
After attempting to clean up most of the issues, I just gave up and am now marking the driver as BROKEN so that hopefully someone who has this hardware will show up out of the woodwork (I know you are out there!) and will help with debugging a raft of changes that I had lying around for the code, but was too afraid to commit as odds are they would break things.
Many thanks to Jann and Linus for pointing out the initial problems in this codebase, as well as many reviews of my attempts to fix the issues. It was a case of whack-a-mole, and as you can see, the mole won.
Reported-by: Jann Horn jannh@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
--- drivers/char/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/char/Kconfig +++ b/drivers/char/Kconfig @@ -343,7 +343,7 @@ config XILINX_HWICAP
config R3964 tristate "Siemens R3964 line discipline" - depends on TTY + depends on TTY && BROKEN ---help--- This driver allows synchronous communication with devices using the Siemens R3964 packet protocol. Unless you are dealing with special
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit 7c0cca7c847e6e019d67b7d793efbbe3b947d004 upstream.
By default, the kernel will automatically load the module of any line discipline that is asked for. As this sometimes isn't the safest thing to do, provide a sysctl to disable this feature.
By default, we set this to 'y' as that is the historical way that Linux has worked, and we do not want to break working systems. But in the future, perhaps this can default to 'n' to disable this functionality.
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/Kconfig | 24 ++++++++++++++++++++++++ drivers/tty/tty_io.c | 3 +++ drivers/tty/tty_ldisc.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 74 insertions(+)
--- a/drivers/tty/Kconfig +++ b/drivers/tty/Kconfig @@ -441,4 +441,28 @@ config VCC depends on SUN_LDOMS help Support for Sun logical domain consoles. + +config LDISC_AUTOLOAD + bool "Automatically load TTY Line Disciplines" + default y + help + Historically the kernel has always automatically loaded any + line discipline that is in a kernel module when a user asks + for it to be loaded with the TIOCSETD ioctl, or through other + means. This is not always the best thing to do on systems + where you know you will not be using some of the more + "ancient" line disciplines, so prevent the kernel from doing + this unless the request is coming from a process with the + CAP_SYS_MODULE permissions. + + Say 'Y' here if you trust your userspace users to do the right + thing, or if you have only provided the line disciplines that + you know you will be using, or if you wish to continue to use + the traditional method of on-demand loading of these modules + by any user. + + This functionality can be changed at runtime with the + dev.tty.ldisc_autoload sysctl, this configuration option will + only set the default value of this functionality. + endif # TTY --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -512,6 +512,8 @@ static const struct file_operations hung static DEFINE_SPINLOCK(redirect_lock); static struct file *redirect;
+extern void tty_sysctl_init(void); + /** * tty_wakeup - request more data * @tty: terminal @@ -3340,6 +3342,7 @@ void console_sysfs_notify(void) */ int __init tty_init(void) { + tty_sysctl_init(); cdev_init(&tty_cdev, &tty_fops); if (cdev_add(&tty_cdev, MKDEV(TTYAUX_MAJOR, 0), 1) || register_chrdev_region(MKDEV(TTYAUX_MAJOR, 0), 1, "/dev/tty") < 0) --- a/drivers/tty/tty_ldisc.c +++ b/drivers/tty/tty_ldisc.c @@ -156,6 +156,13 @@ static void put_ldops(struct tty_ldisc_o * takes tty_ldiscs_lock to guard against ldisc races */
+#if defined(CONFIG_LDISC_AUTOLOAD) + #define INITIAL_AUTOLOAD_STATE 1 +#else + #define INITIAL_AUTOLOAD_STATE 0 +#endif +static int tty_ldisc_autoload = INITIAL_AUTOLOAD_STATE; + static struct tty_ldisc *tty_ldisc_get(struct tty_struct *tty, int disc) { struct tty_ldisc *ld; @@ -170,6 +177,8 @@ static struct tty_ldisc *tty_ldisc_get(s */ ldops = get_ldops(disc); if (IS_ERR(ldops)) { + if (!capable(CAP_SYS_MODULE) && !tty_ldisc_autoload) + return ERR_PTR(-EPERM); request_module("tty-ldisc-%d", disc); ldops = get_ldops(disc); if (IS_ERR(ldops)) @@ -829,3 +838,41 @@ void tty_ldisc_deinit(struct tty_struct tty_ldisc_put(tty->ldisc); tty->ldisc = NULL; } + +static int zero; +static int one = 1; +static struct ctl_table tty_table[] = { + { + .procname = "ldisc_autoload", + .data = &tty_ldisc_autoload, + .maxlen = sizeof(tty_ldisc_autoload), + .mode = 0644, + .proc_handler = proc_dointvec, + .extra1 = &zero, + .extra2 = &one, + }, + { } +}; + +static struct ctl_table tty_dir_table[] = { + { + .procname = "tty", + .mode = 0555, + .child = tty_table, + }, + { } +}; + +static struct ctl_table tty_root_table[] = { + { + .procname = "dev", + .mode = 0555, + .child = tty_dir_table, + }, + { } +}; + +void tty_sysctl_init(void) +{ + register_sysctl_table(tty_root_table); +}
From: Axel Lin axel.lin@ingics.com
commit a165dcc923ada2ffdee1d4f41f12f81b66d04c55 upstream.
Select REGMAP_I2C to avoid the build error below:
ERROR: "__devm_regmap_init_i2c" [drivers/hwmon/w83773g.ko] undefined!
Fixes: ee249f271524 ("hwmon: Add W83773G driver") Cc: stable@vger.kernel.org Signed-off-by: Axel Lin axel.lin@ingics.com Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hwmon/Kconfig | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/hwmon/Kconfig +++ b/drivers/hwmon/Kconfig @@ -1755,6 +1755,7 @@ config SENSORS_VT8231 config SENSORS_W83773G tristate "Nuvoton W83773G" depends on I2C + select REGMAP_I2C help If you say yes here you get support for the Nuvoton W83773G hardware monitoring chip.
From: Furquan Shaikh furquan@google.com
commit c8b1917c8987a6fa3695d479b4d60fbbbc3e537b upstream.
Commit 18996f2db918 ("ACPICA: Events: Stop unconditionally clearing ACPI IRQs during suspend/resume") was added to stop clearing event status bits unconditionally in the system-wide suspend and resume paths. This was done because of an issue with a laptop lid appearing to be closed even when it was used to wake up the system from suspend (see https://bugzilla.kernel.org/show_bug.cgi?id=196249), which happened because event status bits were cleared unconditionally on system resume. Though this change fixed the issue in the resume path, it introduced regressions in a few suspend paths.
First regression was reported and fixed in the S5 entry path by commit fa85015c0d95 ("ACPICA: Clear status of all events when entering S5"). Next regression was reported and fixed for all legacy sleep paths by commit f317c7dc12b7 ("ACPICA: Clear status of all events when entering sleep states"). However, there still is a suspend-to-idle regression, since suspend-to-idle does not follow the legacy sleep paths.
In the suspend-to-idle case, wakeup is enabled as part of device suspend. If the status bits of wakeup GPEs are set when they are enabled, it causes a premature system wakeup to occur.
To address that problem, partially revert commit 18996f2db918 to restore GPE status bits clearing before the GPE is enabled in acpi_ev_enable_gpe().
Fixes: 18996f2db918 ("ACPICA: Events: Stop unconditionally clearing ACPI IRQs during suspend/resume") Signed-off-by: Furquan Shaikh furquan@google.com Cc: 4.17+ stable@vger.kernel.org # 4.17+ [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/acpica/evgpe.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/acpi/acpica/evgpe.c +++ b/drivers/acpi/acpica/evgpe.c @@ -81,8 +81,12 @@ acpi_status acpi_ev_enable_gpe(struct ac
ACPI_FUNCTION_TRACE(ev_enable_gpe);
- /* Enable the requested GPE */ + /* Clear the GPE status */ + status = acpi_hw_clear_gpe(gpe_event_info); + if (ACPI_FAILURE(status)) + return_ACPI_STATUS(status);
+ /* Enable the requested GPE */ status = acpi_hw_low_set_gpe(gpe_event_info, ACPI_GPE_ENABLE); return_ACPI_STATUS(status); }
From: Erik Schmauss erik.schmauss@intel.com
commit c5781ffbbd4f742a58263458145fe7f0ac01d9e0 upstream.
ACPICA commit b233720031a480abd438f2e9c643080929d144c3
An ASL operation_region declares a range of addresses that it uses. In a perfect world, that range of addresses should be used exclusively by the AML interpreter. The OS can use this information to decide which drivers to load so that the AML interpreter and device drivers use different regions of memory.
During table load, the address information is added to a global address range list. Each node in this list contains an address range as well as a namespace node of the operation_region. This list is deleted at ACPI shutdown.
Unfortunately, ASL operation_regions can be declared inside of control methods. Although this is not recommended, modern firmware contains such code. New module level code changes unintentionally removed the functionality of adding and removing nodes to the global address range list.
A few months ago, support for adding addresses was re-implemented. However, the removal of addresses from the range list was missed, which caused some systems to crash because the address list contained bogus namespace nodes from operation_regions declared in control methods. In order to fix the crash, this change removes dynamic operation_regions after control method termination.
Link: https://github.com/acpica/acpica/commit/b2337200 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202475 Fixes: 4abb951b73ff ("ACPICA: AML interpreter: add region addresses in global list during initialization") Reported-by: Michael J Gruber mjg@fedoraproject.org Signed-off-by: Erik Schmauss erik.schmauss@intel.com Signed-off-by: Bob Moore robert.moore@intel.com Cc: 4.20+ stable@vger.kernel.org # 4.20+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/acpica/nsobject.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/acpi/acpica/nsobject.c +++ b/drivers/acpi/acpica/nsobject.c @@ -186,6 +186,10 @@ void acpi_ns_detach_object(struct acpi_n } }
+ if (obj_desc->common.type == ACPI_TYPE_REGION) { + acpi_ut_remove_address_range(obj_desc->region.space_id, node); + } + /* Clear the Node entry in all cases */
node->object = NULL;
From: Zubin Mithra zsm@chromium.org
commit 212ac181c158c09038c474ba68068be49caecebb upstream.
When ioctl calls are made with non-null-terminated userspace strings, strlcpy() causes an OOB read from within strlen(). Fix this by using strscpy() instead.
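A compact userspace sketch of the two copy semantics (these are not the kernel implementations, just the shape of the difference): an strlcpy()-style copy always walks the source to its NUL to compute its return value, while an strscpy()-style copy never reads past the given size.

#include <stddef.h>
#include <string.h>
#include <sys/types.h>

static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);	/* over-reads if src has no NUL */

	if (size) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

static ssize_t sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	for (i = 0; i < size; i++) {	/* never reads past @size bytes */
		dst[i] = src[i];
		if (!src[i])
			return i;
	}
	if (size)
		dst[size - 1] = '\0';
	return -7;			/* -E2BIG in the kernel */
}

int main(void)
{
	char src[4] = { 'a', 'b', 'c', 'd' };	/* no terminating NUL */
	char dst[4];

	/* Reads at most sizeof(dst) == 4 bytes of src: stays in bounds. */
	sketch_strscpy(dst, src, sizeof(dst));

	/* sketch_strlcpy(dst, src, sizeof(dst)) would call strlen(src)
	 * and walk past the end of src[] looking for a NUL - the OOB
	 * read that switching to strscpy() avoids. */
	return 0;
}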
Signed-off-by: Zubin Mithra zsm@chromium.org Reviewed-by: Guenter Roeck groeck@chromium.org Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/core/seq/seq_clientmgr.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/sound/core/seq/seq_clientmgr.c +++ b/sound/core/seq/seq_clientmgr.c @@ -1252,7 +1252,7 @@ static int snd_seq_ioctl_set_client_info
/* fill the info fields */ if (client_info->name[0]) - strlcpy(client->name, client_info->name, sizeof(client->name)); + strscpy(client->name, client_info->name, sizeof(client->name));
client->filter = client_info->filter; client->event_lost = client_info->event_lost; @@ -1530,7 +1530,7 @@ static int snd_seq_ioctl_create_queue(st /* set queue name */ if (!info->name[0]) snprintf(info->name, sizeof(info->name), "Queue-%d", q->queue); - strlcpy(q->name, info->name, sizeof(q->name)); + strscpy(q->name, info->name, sizeof(q->name)); snd_use_lock_free(&q->use_lock);
return 0; @@ -1592,7 +1592,7 @@ static int snd_seq_ioctl_set_queue_info( queuefree(q); return -EPERM; } - strlcpy(q->name, info->name, sizeof(q->name)); + strscpy(q->name, info->name, sizeof(q->name)); queuefree(q);
return 0;
From: Jian-Hong Pan jian-hong@endlessm.com
commit ea5c7eba216e832906e594799b8670f1954a588c upstream.
The Acer TravelMate B114-21 laptop cannot detect and record sound from the headset MIC. This patch adds the ALC233_FIXUP_ACER_HEADSET_MIC HDA verb quirk, chained with the ALC233_FIXUP_ASUS_MIC_NO_PRESENCE pin quirk, to fix this issue.
[ fixed the missing brace and reordered the entry -- tiwai ]
Signed-off-by: Jian-Hong Pan jian-hong@endlessm.com Signed-off-by: Daniel Drake drake@endlessm.com Reviewed-by: Kailang Yang kailang@realtek.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/patch_realtek.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -5594,6 +5594,7 @@ enum { ALC233_FIXUP_ASUS_MIC_NO_PRESENCE, ALC233_FIXUP_EAPD_COEF_AND_MIC_NO_PRESENCE, ALC233_FIXUP_LENOVO_MULTI_CODECS, + ALC233_FIXUP_ACER_HEADSET_MIC, ALC294_FIXUP_LENOVO_MIC_LOCATION, ALC225_FIXUP_DELL_WYSE_MIC_NO_PRESENCE, ALC700_FIXUP_INTEL_REFERENCE, @@ -6401,6 +6402,16 @@ static const struct hda_fixup alc269_fix .type = HDA_FIXUP_FUNC, .v.func = alc233_alc662_fixup_lenovo_dual_codecs, }, + [ALC233_FIXUP_ACER_HEADSET_MIC] = { + .type = HDA_FIXUP_VERBS, + .v.verbs = (const struct hda_verb[]) { + { 0x20, AC_VERB_SET_COEF_INDEX, 0x45 }, + { 0x20, AC_VERB_SET_PROC_COEF, 0x5089 }, + { } + }, + .chained = true, + .chain_id = ALC233_FIXUP_ASUS_MIC_NO_PRESENCE + }, [ALC294_FIXUP_LENOVO_MIC_LOCATION] = { .type = HDA_FIXUP_PINS, .v.pins = (const struct hda_pintbl[]) { @@ -6644,6 +6655,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x1025, 0x1290, "Acer Veriton Z4860G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), SND_PCI_QUIRK(0x1025, 0x1291, "Acer Veriton Z4660G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), SND_PCI_QUIRK(0x1025, 0x1308, "Acer Aspire Z24-890", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), + SND_PCI_QUIRK(0x1025, 0x132a, "Acer TravelMate B114-21", ALC233_FIXUP_ACER_HEADSET_MIC), SND_PCI_QUIRK(0x1025, 0x1330, "Acer TravelMate X514-51T", ALC255_FIXUP_ACER_HEADSET_MIC), SND_PCI_QUIRK(0x1028, 0x0470, "Dell M101z", ALC269_FIXUP_DELL_M101Z), SND_PCI_QUIRK(0x1028, 0x054b, "Dell XPS one 2710", ALC275_FIXUP_DELL_XPS),
From: Richard Sailer rs@tuxedocomputers.com
commit 80690a276f444a68a332136d98bfea1c338bc263 upstream.
This adds a SND_PCI_QUIRK(...) line for the Tuxedo XC 1509.
The Tuxedo XC 1509 and the System76 oryp5 are the same barebone notebook manufactured by Clevo. So that the fixups both machines use are named after the actual underlying hardware, this patch also changes System76_oryp5 to clevo_pb51ed in two enum symbols and one function name, matching the other pci_quirk entries, which are also named after the device ODM.
Fixes: 7f665b1c3283 ("ALSA: hda/realtek - Headset microphone and internal speaker support for System76 oryp5") Signed-off-by: Richard Sailer rs@tuxedocomputers.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/patch_realtek.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -1864,8 +1864,8 @@ enum { ALC887_FIXUP_BASS_CHMAP, ALC1220_FIXUP_GB_DUAL_CODECS, ALC1220_FIXUP_CLEVO_P950, - ALC1220_FIXUP_SYSTEM76_ORYP5, - ALC1220_FIXUP_SYSTEM76_ORYP5_PINS, + ALC1220_FIXUP_CLEVO_PB51ED, + ALC1220_FIXUP_CLEVO_PB51ED_PINS, };
static void alc889_fixup_coef(struct hda_codec *codec, @@ -2070,7 +2070,7 @@ static void alc1220_fixup_clevo_p950(str static void alc_fixup_headset_mode_no_hp_mic(struct hda_codec *codec, const struct hda_fixup *fix, int action);
-static void alc1220_fixup_system76_oryp5(struct hda_codec *codec, +static void alc1220_fixup_clevo_pb51ed(struct hda_codec *codec, const struct hda_fixup *fix, int action) { @@ -2322,18 +2322,18 @@ static const struct hda_fixup alc882_fix .type = HDA_FIXUP_FUNC, .v.func = alc1220_fixup_clevo_p950, }, - [ALC1220_FIXUP_SYSTEM76_ORYP5] = { + [ALC1220_FIXUP_CLEVO_PB51ED] = { .type = HDA_FIXUP_FUNC, - .v.func = alc1220_fixup_system76_oryp5, + .v.func = alc1220_fixup_clevo_pb51ed, }, - [ALC1220_FIXUP_SYSTEM76_ORYP5_PINS] = { + [ALC1220_FIXUP_CLEVO_PB51ED_PINS] = { .type = HDA_FIXUP_PINS, .v.pins = (const struct hda_pintbl[]) { { 0x19, 0x01a1913c }, /* use as headset mic, without its own jack detect */ {} }, .chained = true, - .chain_id = ALC1220_FIXUP_SYSTEM76_ORYP5, + .chain_id = ALC1220_FIXUP_CLEVO_PB51ED, }, };
@@ -2411,8 +2411,9 @@ static const struct snd_pci_quirk alc882 SND_PCI_QUIRK(0x1558, 0x9501, "Clevo P950HR", ALC1220_FIXUP_CLEVO_P950), SND_PCI_QUIRK(0x1558, 0x95e1, "Clevo P95xER", ALC1220_FIXUP_CLEVO_P950), SND_PCI_QUIRK(0x1558, 0x95e2, "Clevo P950ER", ALC1220_FIXUP_CLEVO_P950), - SND_PCI_QUIRK(0x1558, 0x96e1, "System76 Oryx Pro (oryp5)", ALC1220_FIXUP_SYSTEM76_ORYP5_PINS), - SND_PCI_QUIRK(0x1558, 0x97e1, "System76 Oryx Pro (oryp5)", ALC1220_FIXUP_SYSTEM76_ORYP5_PINS), + SND_PCI_QUIRK(0x1558, 0x96e1, "System76 Oryx Pro (oryp5)", ALC1220_FIXUP_CLEVO_PB51ED_PINS), + SND_PCI_QUIRK(0x1558, 0x97e1, "System76 Oryx Pro (oryp5)", ALC1220_FIXUP_CLEVO_PB51ED_PINS), + SND_PCI_QUIRK(0x1558, 0x65d1, "Tuxedo Book XC1509", ALC1220_FIXUP_CLEVO_PB51ED_PINS), SND_PCI_QUIRK_VENDOR(0x1558, "Clevo laptop", ALC882_FIXUP_EAPD), SND_PCI_QUIRK(0x161f, 0x2054, "Medion laptop", ALC883_FIXUP_EAPD), SND_PCI_QUIRK(0x17aa, 0x3a0d, "Lenovo Y530", ALC882_FIXUP_LENOVO_Y530),
From: Hui Wang hui.wang@canonical.com
commit cae30527901d9590db0e12ace994c1d58bea87fd upstream.
Recently we set CONFIG_SND_HDA_POWER_SAVE_DEFAULT to 1 when configuring the kernel, and two machines were then reported to have noise after installing the new kernel. Putting them in the blacklist makes the noise disappear.
https://bugs.launchpad.net/bugs/1821663 Cc: stable@vger.kernel.org Signed-off-by: Hui Wang hui.wang@canonical.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/hda_intel.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -2272,6 +2272,8 @@ static struct snd_pci_quirk power_save_b SND_PCI_QUIRK(0x8086, 0x2040, "Intel DZ77BH-55K", 0), /* https://bugzilla.kernel.org/show_bug.cgi?id=199607 */ SND_PCI_QUIRK(0x8086, 0x2057, "Intel NUC5i7RYB", 0), + /* https://bugs.launchpad.net/bugs/1821663 */ + SND_PCI_QUIRK(0x8086, 0x2064, "Intel SDP 8086:2064", 0), /* https://bugzilla.redhat.com/show_bug.cgi?id=1520902 */ SND_PCI_QUIRK(0x8086, 0x2068, "Intel NUC7i3BNB", 0), /* https://bugzilla.kernel.org/show_bug.cgi?id=198611 */ @@ -2280,6 +2282,8 @@ static struct snd_pci_quirk power_save_b SND_PCI_QUIRK(0x17aa, 0x367b, "Lenovo IdeaCentre B550", 0), /* https://bugzilla.redhat.com/show_bug.cgi?id=1572975 */ SND_PCI_QUIRK(0x17aa, 0x36a7, "Lenovo C50 All in one", 0), + /* https://bugs.launchpad.net/bugs/1821663 */ + SND_PCI_QUIRK(0x1631, 0xe017, "Packard Bell NEC IMEDIA 5204", 0), {} }; #endif /* CONFIG_PM */
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
commit c6f3c5ee40c10bb65725047a220570f718507001 upstream.
With some architectures like ppc64, set_pmd_at() cannot cope with a situation where there is already some (different) valid entry present.
Use pmdp_set_access_flags() instead to modify the pfn, since it is built to deal with modifying existing PMD entries.
This is similar to commit cae85cb8add3 ("mm/memory.c: fix modifying of page protection by insert_pfn()")
We also make a similar update w.r.t. insert_pfn_pud, even though ppc64 doesn't support pud pfn entries now.
Without this patch we also see the message below in the kernel log: "BUG: non-zero pgtables_bytes on freeing mm:"
Link: http://lkml.kernel.org/r/20190402115125.18803-1-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Reported-by: Chandan Rajendra chandan@linux.ibm.com Reviewed-by: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/huge_memory.c | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+)
--- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -734,6 +734,21 @@ static void insert_pfn_pmd(struct vm_are spinlock_t *ptl;
ptl = pmd_lock(mm, pmd); + if (!pmd_none(*pmd)) { + if (write) { + if (pmd_pfn(*pmd) != pfn_t_to_pfn(pfn)) { + WARN_ON_ONCE(!is_huge_zero_pmd(*pmd)); + goto out_unlock; + } + entry = pmd_mkyoung(*pmd); + entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); + if (pmdp_set_access_flags(vma, addr, pmd, entry, 1)) + update_mmu_cache_pmd(vma, addr, pmd); + } + + goto out_unlock; + } + entry = pmd_mkhuge(pfn_t_pmd(pfn, prot)); if (pfn_t_devmap(pfn)) entry = pmd_mkdevmap(entry); @@ -745,11 +760,16 @@ static void insert_pfn_pmd(struct vm_are if (pgtable) { pgtable_trans_huge_deposit(mm, pmd, pgtable); mm_inc_nr_ptes(mm); + pgtable = NULL; }
set_pmd_at(mm, addr, pmd, entry); update_mmu_cache_pmd(vma, addr, pmd); + +out_unlock: spin_unlock(ptl); + if (pgtable) + pte_free(mm, pgtable); }
vm_fault_t vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr, @@ -800,6 +820,20 @@ static void insert_pfn_pud(struct vm_are spinlock_t *ptl;
ptl = pud_lock(mm, pud); + if (!pud_none(*pud)) { + if (write) { + if (pud_pfn(*pud) != pfn_t_to_pfn(pfn)) { + WARN_ON_ONCE(!is_huge_zero_pud(*pud)); + goto out_unlock; + } + entry = pud_mkyoung(*pud); + entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma); + if (pudp_set_access_flags(vma, addr, pud, entry, 1)) + update_mmu_cache_pud(vma, addr, pud); + } + goto out_unlock; + } + entry = pud_mkhuge(pfn_t_pud(pfn, prot)); if (pfn_t_devmap(pfn)) entry = pud_mkdevmap(entry); @@ -809,6 +843,8 @@ static void insert_pfn_pud(struct vm_are } set_pud_at(mm, addr, pud, entry); update_mmu_cache_pud(vma, addr, pud); + +out_unlock: spin_unlock(ptl); }
From: Peter Geis pgwipeout@gmail.com
commit 09f91381fa5de1d44bc323d8bf345f5d57b3d9b5 upstream.
Various rk3328 based boards experience occasional sdmmc0 write errors. This is due to the rk3328.dtsi tx drive levels being set to 4ma, vs 8ma per the rk3328 datasheet default settings.
Fix this by setting the tx signal pins to 8ma. Inspiration from tonymac32's patch, https://github.com/ayufan-rock64/linux-kernel/commit/dc1212b347e0da17c5460bc...
Fixes issues on the rk3328-roc-cc and the rk3328-rock64 (as per the above commit message).
Tested on the rk3328-roc-cc board.
Fixes: 52e02d377a72 ("arm64: dts: rockchip: add core dtsi file for RK3328 SoCs") Cc: stable@vger.kernel.org Signed-off-by: Peter Geis pgwipeout@gmail.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/boot/dts/rockchip/rk3328.dtsi | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
--- a/arch/arm64/boot/dts/rockchip/rk3328.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3328.dtsi @@ -1356,11 +1356,11 @@
sdmmc0 { sdmmc0_clk: sdmmc0-clk { - rockchip,pins = <1 RK_PA6 1 &pcfg_pull_none_4ma>; + rockchip,pins = <1 RK_PA6 1 &pcfg_pull_none_8ma>; };
sdmmc0_cmd: sdmmc0-cmd { - rockchip,pins = <1 RK_PA4 1 &pcfg_pull_up_4ma>; + rockchip,pins = <1 RK_PA4 1 &pcfg_pull_up_8ma>; };
sdmmc0_dectn: sdmmc0-dectn { @@ -1372,14 +1372,14 @@ };
sdmmc0_bus1: sdmmc0-bus1 { - rockchip,pins = <1 RK_PA0 1 &pcfg_pull_up_4ma>; + rockchip,pins = <1 RK_PA0 1 &pcfg_pull_up_8ma>; };
sdmmc0_bus4: sdmmc0-bus4 { - rockchip,pins = <1 RK_PA0 1 &pcfg_pull_up_4ma>, - <1 RK_PA1 1 &pcfg_pull_up_4ma>, - <1 RK_PA2 1 &pcfg_pull_up_4ma>, - <1 RK_PA3 1 &pcfg_pull_up_4ma>; + rockchip,pins = <1 RK_PA0 1 &pcfg_pull_up_8ma>, + <1 RK_PA1 1 &pcfg_pull_up_8ma>, + <1 RK_PA2 1 &pcfg_pull_up_8ma>, + <1 RK_PA3 1 &pcfg_pull_up_8ma>; };
sdmmc0_gpio: sdmmc0-gpio {
From: Helge Deller deller@gmx.de
commit d006e95b5561f708d0385e9677ffe2c46f2ae345 upstream.
While adding LASI support to QEMU, I noticed that the QEMU detection in the kernel happens much too late. For example, when a LASI chip is found by the kernel, it registers the LASI LED driver as well. But when we run on QEMU it makes sense to avoid spending unnecessary CPU cycles, so we need to access the running_on_QEMU flag earlier than before.
This patch now makes the QEMU detection the first task of the Linux kernel by moving it to where the kernel enters the C code.
Fixes: 310d82784fb4 ("parisc: qemu idle sleep support") Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/parisc/kernel/process.c | 6 ------ arch/parisc/kernel/setup.c | 3 +++ 2 files changed, 3 insertions(+), 6 deletions(-)
--- a/arch/parisc/kernel/process.c +++ b/arch/parisc/kernel/process.c @@ -210,12 +210,6 @@ void __cpuidle arch_cpu_idle(void)
static int __init parisc_idle_init(void) { - const char *marker; - - /* check QEMU/SeaBIOS marker in PAGE0 */ - marker = (char *) &PAGE0->pad0; - running_on_qemu = (memcmp(marker, "SeaBIOS", 8) == 0); - if (!running_on_qemu) cpu_idle_poll_ctrl(1);
--- a/arch/parisc/kernel/setup.c +++ b/arch/parisc/kernel/setup.c @@ -399,6 +399,9 @@ void __init start_parisc(void) int ret, cpunum; struct pdc_coproc_cfg coproc_cfg;
+ /* check QEMU/SeaBIOS marker in PAGE0 */ + running_on_qemu = (memcmp(&PAGE0->pad0, "SeaBIOS", 8) == 0); + cpunum = smp_processor_id();
init_cpu_topology();
From: Sven Schnelle svens@stackframe.org
commit 45efd871bf0a47648f119d1b41467f70484de5bc upstream.
While working on kretprobes for PA-RISC I was wondering why the kprobes sanity test always fails on kretprobes. This is caused by returning gpr20 instead of gpr28.
Signed-off-by: Sven Schnelle svens@stackframe.org Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # 4.14+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/parisc/include/asm/ptrace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/parisc/include/asm/ptrace.h +++ b/arch/parisc/include/asm/ptrace.h @@ -22,7 +22,7 @@ unsigned long profile_pc(struct pt_regs
static inline unsigned long regs_return_value(struct pt_regs *regs) { - return regs->gr[20]; + return regs->gr[28]; }
static inline void instruction_pointer_set(struct pt_regs *regs,
From: Sven Schnelle svens@stackframe.org
commit f324fa58327791b2696628b31480e7e21c745706 upstream.
When setting the instruction pointer on PA-RISC we also need to set the back of the instruction queue to the new offset, otherwise we will execute one instruction from the new location and then jump back to the old location stored in iaoq_b.
Signed-off-by: Sven Schnelle svens@stackframe.org Signed-off-by: Helge Deller deller@gmx.de Fixes: 75ebedf1d263 ("parisc: Add HAVE_REGS_AND_STACK_ACCESS_API feature") Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/parisc/include/asm/ptrace.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/parisc/include/asm/ptrace.h +++ b/arch/parisc/include/asm/ptrace.h @@ -28,7 +28,8 @@ static inline unsigned long regs_return_ static inline void instruction_pointer_set(struct pt_regs *regs, unsigned long val) { - regs->iaoq[0] = val; + regs->iaoq[0] = val; + regs->iaoq[1] = val + 4; }
/* Query offset/name of register from its name/offset */
From: Andrei Vagin avagin@gmail.com
commit 07d7e12091f4ab869cc6a4bb276399057e73b0b3 upstream.
To calculate the remaining time, the current time must be subtracted from the expiration time, but in alarm_timer_remaining() the arguments of ktime_sub() are swapped.
Fixes: d653d8457c76 ("alarmtimer: Implement remaining callback") Signed-off-by: Andrei Vagin avagin@gmail.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Mukesh Ojha mojha@codeaurora.org Cc: Stephen Boyd sboyd@kernel.org Cc: John Stultz john.stultz@linaro.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190408041542.26338-1-avagin@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/time/alarmtimer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/time/alarmtimer.c +++ b/kernel/time/alarmtimer.c @@ -597,7 +597,7 @@ static ktime_t alarm_timer_remaining(str { struct alarm *alarm = &timr->it.alarm.alarmtimer;
- return ktime_sub(now, alarm->node.expires); + return ktime_sub(alarm->node.expires, now); }
/**
From: Yan Zhao yan.y.zhao@intel.com
commit dade58ed5af6365ac50ff4259c2a0bf31219e285 upstream.
In the workload creation routine, if any failure occurs, do not queue the workload for delivery. If the failure is fatal, enter failsafe mode.
Fixes: 6d76303553ba ("drm/i915/gvt: Move common vGPU workload creation into scheduler.c") Cc: stable@vger.kernel.org #4.19+ Cc: zhenyuw@linux.intel.com Signed-off-by: Yan Zhao yan.y.zhao@intel.com Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/gvt/scheduler.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/i915/gvt/scheduler.c +++ b/drivers/gpu/drm/i915/gvt/scheduler.c @@ -1389,8 +1389,9 @@ intel_vgpu_create_workload(struct intel_ intel_runtime_pm_put(dev_priv); }
- if (ret && (vgpu_is_vm_unhealthy(ret))) { - enter_failsafe_mode(vgpu, GVT_FAILSAFE_GUEST_ERR); + if (ret) { + if (vgpu_is_vm_unhealthy(ret)) + enter_failsafe_mode(vgpu, GVT_FAILSAFE_GUEST_ERR); intel_vgpu_destroy_workload(workload); return ERR_PTR(ret); }
From: Dave Airlie airlied@redhat.com
commit 9b39b013037fbfa8d4b999345d9e904d8a336fc2 upstream.
If we unplug a udl device, the usb callback will deinit the mode_config struct; however, userspace will still have an open file descriptor and a framebuffer on that device. When userspace closes the fd, we'll oops because it'll try to look stuff up in the object idr which we've destroyed.
This punts destroying the mode objects until release time instead.
Cc: stable@vger.kernel.org Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Dave Airlie airlied@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20190405031715.5959-2-airlied@... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/udl/udl_drv.c | 1 + drivers/gpu/drm/udl/udl_drv.h | 1 + drivers/gpu/drm/udl/udl_main.c | 8 +++++++- 3 files changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/udl/udl_drv.c +++ b/drivers/gpu/drm/udl/udl_drv.c @@ -51,6 +51,7 @@ static struct drm_driver driver = { .driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_PRIME, .load = udl_driver_load, .unload = udl_driver_unload, + .release = udl_driver_release,
/* gem hooks */ .gem_free_object_unlocked = udl_gem_free_object, --- a/drivers/gpu/drm/udl/udl_drv.h +++ b/drivers/gpu/drm/udl/udl_drv.h @@ -104,6 +104,7 @@ void udl_urb_completion(struct urb *urb)
int udl_driver_load(struct drm_device *dev, unsigned long flags); void udl_driver_unload(struct drm_device *dev); +void udl_driver_release(struct drm_device *dev);
int udl_fbdev_init(struct drm_device *dev); void udl_fbdev_cleanup(struct drm_device *dev); --- a/drivers/gpu/drm/udl/udl_main.c +++ b/drivers/gpu/drm/udl/udl_main.c @@ -378,6 +378,12 @@ void udl_driver_unload(struct drm_device udl_free_urb_list(dev);
udl_fbdev_cleanup(dev); - udl_modeset_cleanup(dev); kfree(udl); } + +void udl_driver_release(struct drm_device *dev) +{ + udl_modeset_cleanup(dev); + drm_dev_fini(dev); + kfree(dev); +}
From: David Rientjes rientjes@google.com
commit ede885ecb2cdf8a8dd5367702e3d964ec846a2d5 upstream.
get_num_contig_pages() could potentially overflow an int, so make its type consistent with its usage.
Reported-by: Cfir Cohen cfir@google.com Cc: stable@vger.kernel.org Signed-off-by: David Rientjes rientjes@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/svm.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -6398,11 +6398,11 @@ e_free: return ret; }
-static int get_num_contig_pages(int idx, struct page **inpages, - unsigned long npages) +static unsigned long get_num_contig_pages(unsigned long idx, + struct page **inpages, unsigned long npages) { unsigned long paddr, next_paddr; - int i = idx + 1, pages = 1; + unsigned long i = idx + 1, pages = 1;
/* find the number of contiguous pages starting from idx */ paddr = __sme_page_pa(inpages[idx]); @@ -6421,12 +6421,12 @@ static int get_num_contig_pages(int idx,
static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp) { - unsigned long vaddr, vaddr_end, next_vaddr, npages, size; + unsigned long vaddr, vaddr_end, next_vaddr, npages, pages, size, i; struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; struct kvm_sev_launch_update_data params; struct sev_data_launch_update_data *data; struct page **inpages; - int i, ret, pages; + int ret;
if (!sev_guest(kvm)) return -ENOTTY;
From: Arnd Bergmann arnd@arndb.de
commit 6147e136ff5071609b54f18982dea87706288e21 upstream.
clang points out with hundreds of warnings that the bitrev macros have a problem with constant input:
drivers/hwmon/sht15.c:187:11: error: variable '__x' is uninitialized when used within its own initialization [-Werror,-Wuninitialized] u8 crc = bitrev8(data->val_status & 0x0F); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/bitrev.h:102:21: note: expanded from macro 'bitrev8' __constant_bitrev8(__x) : \ ~~~~~~~~~~~~~~~~~~~^~~~ include/linux/bitrev.h:67:11: note: expanded from macro '__constant_bitrev8' u8 __x = x; \ ~~~ ^
Both the bitrev and the __constant_bitrev macros use an internal variable named __x, which goes horribly wrong when passing one to the other.
The obvious fix is to rename one of the variables, so this adds an extra '_'.
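A distilled standalone reproduction of the shadowing (these are not the kernel's actual macros, only the failure mode; it needs GNU statement expressions, like the originals): the inner macro's "u8 __x = x;" becomes "u8 __x = __x;" once the outer macro passes its own __x in, i.e. self-initialization from an uninitialized variable. Renaming the inner variable breaks the capture, which is what the extra '_' does.

#include <stdio.h>

typedef unsigned char u8;

#define inner_bitrev8(x)				\
({							\
	u8 __x = x;	/* here: u8 __x = __x; */	\
	__x = (u8)((__x >> 4) | (__x << 4));		\
	__x;						\
})

#define outer_bitrev8(x)				\
({							\
	u8 __x = x;					\
	inner_bitrev8(__x);				\
})

int main(void)
{
	/* clang -Wuninitialized flags the self-initialization above;
	 * the result here is effectively garbage. */
	printf("0x%02x\n", outer_bitrev8(0x12));
	return 0;
}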
It seems we got away with this because
- there are only a few drivers using bitrev macros
- usually there are no constant arguments to those
- when they are constant, they tend to be either 0 or (unsigned)-1 (drivers/isdn/i4l/isdnhdlc.o, drivers/iio/amplifiers/ad8366.c) and give the correct result by pure chance.
In fact, the only driver that I could find that gets different results with this is drivers/net/wan/slic_ds26522.c, which in turn is a driver for fairly rare hardware (adding the maintainer to Cc for testing).
Link: http://lkml.kernel.org/r/20190322140503.123580-1-arnd@arndb.de Fixes: 556d2f055bf6 ("ARM: 8187/1: add CONFIG_HAVE_ARCH_BITREVERSE to support rbit instruction") Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Nick Desaulniers ndesaulniers@google.com Cc: Zhao Qiang qiang.zhao@nxp.com Cc: Yalin Wang yalin.wang@sonymobile.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/bitrev.h | 46 +++++++++++++++++++++++----------------------- 1 file changed, 23 insertions(+), 23 deletions(-)
--- a/include/linux/bitrev.h +++ b/include/linux/bitrev.h @@ -34,41 +34,41 @@ static inline u32 __bitrev32(u32 x)
#define __constant_bitrev32(x) \ ({ \ - u32 __x = x; \ - __x = (__x >> 16) | (__x << 16); \ - __x = ((__x & (u32)0xFF00FF00UL) >> 8) | ((__x & (u32)0x00FF00FFUL) << 8); \ - __x = ((__x & (u32)0xF0F0F0F0UL) >> 4) | ((__x & (u32)0x0F0F0F0FUL) << 4); \ - __x = ((__x & (u32)0xCCCCCCCCUL) >> 2) | ((__x & (u32)0x33333333UL) << 2); \ - __x = ((__x & (u32)0xAAAAAAAAUL) >> 1) | ((__x & (u32)0x55555555UL) << 1); \ - __x; \ + u32 ___x = x; \ + ___x = (___x >> 16) | (___x << 16); \ + ___x = ((___x & (u32)0xFF00FF00UL) >> 8) | ((___x & (u32)0x00FF00FFUL) << 8); \ + ___x = ((___x & (u32)0xF0F0F0F0UL) >> 4) | ((___x & (u32)0x0F0F0F0FUL) << 4); \ + ___x = ((___x & (u32)0xCCCCCCCCUL) >> 2) | ((___x & (u32)0x33333333UL) << 2); \ + ___x = ((___x & (u32)0xAAAAAAAAUL) >> 1) | ((___x & (u32)0x55555555UL) << 1); \ + ___x; \ })
#define __constant_bitrev16(x) \ ({ \ - u16 __x = x; \ - __x = (__x >> 8) | (__x << 8); \ - __x = ((__x & (u16)0xF0F0U) >> 4) | ((__x & (u16)0x0F0FU) << 4); \ - __x = ((__x & (u16)0xCCCCU) >> 2) | ((__x & (u16)0x3333U) << 2); \ - __x = ((__x & (u16)0xAAAAU) >> 1) | ((__x & (u16)0x5555U) << 1); \ - __x; \ + u16 ___x = x; \ + ___x = (___x >> 8) | (___x << 8); \ + ___x = ((___x & (u16)0xF0F0U) >> 4) | ((___x & (u16)0x0F0FU) << 4); \ + ___x = ((___x & (u16)0xCCCCU) >> 2) | ((___x & (u16)0x3333U) << 2); \ + ___x = ((___x & (u16)0xAAAAU) >> 1) | ((___x & (u16)0x5555U) << 1); \ + ___x; \ })
#define __constant_bitrev8x4(x) \ ({ \ - u32 __x = x; \ - __x = ((__x & (u32)0xF0F0F0F0UL) >> 4) | ((__x & (u32)0x0F0F0F0FUL) << 4); \ - __x = ((__x & (u32)0xCCCCCCCCUL) >> 2) | ((__x & (u32)0x33333333UL) << 2); \ - __x = ((__x & (u32)0xAAAAAAAAUL) >> 1) | ((__x & (u32)0x55555555UL) << 1); \ - __x; \ + u32 ___x = x; \ + ___x = ((___x & (u32)0xF0F0F0F0UL) >> 4) | ((___x & (u32)0x0F0F0F0FUL) << 4); \ + ___x = ((___x & (u32)0xCCCCCCCCUL) >> 2) | ((___x & (u32)0x33333333UL) << 2); \ + ___x = ((___x & (u32)0xAAAAAAAAUL) >> 1) | ((___x & (u32)0x55555555UL) << 1); \ + ___x; \ })
#define __constant_bitrev8(x) \ ({ \ - u8 __x = x; \ - __x = (__x >> 4) | (__x << 4); \ - __x = ((__x & (u8)0xCCU) >> 2) | ((__x & (u8)0x33U) << 2); \ - __x = ((__x & (u8)0xAAU) >> 1) | ((__x & (u8)0x55U) << 1); \ - __x; \ + u8 ___x = x; \ + ___x = (___x >> 4) | (___x << 4); \ + ___x = ((___x & (u8)0xCCU) >> 2) | ((___x & (u8)0x33U) << 2); \ + ___x = ((___x & (u8)0xAAU) >> 1) | ((___x & (u8)0x55U) << 1); \ + ___x; \ })
#define bitrev32(x) \
From: Greg Thelen gthelen@google.com
commit 0b3d6e6f2dd0a7b697b1aa8c167265908940624b upstream.
Since commit a983b5ebee57 ("mm: memcontrol: fix excessive complexity in memory.stat reporting") memcg dirty and writeback counters are managed as:
1) per-memcg per-cpu values in range of [-32..32]
2) per-memcg atomic counter
When a per-cpu counter cannot fit in [-32..32] it's flushed to the atomic. Stat readers only check the atomic. Thus readers such as balance_dirty_pages() may see a nontrivial error margin: 32 pages per cpu.
Assuming 100 cpus:
   4k x86 page_size:  13 MiB error per memcg
  64k ppc page_size: 200 MiB error per memcg
Considering that dirty+writeback are used together for some decisions the errors double.
This inaccuracy can lead to undeserved oom kills. One nasty case is when all per-cpu counters hold positive values offsetting an atomic negative value (i.e. per_cpu[*]=32, atomic=n_cpu*-32). balance_dirty_pages() only consults the atomic and does not consider throttling the next n_cpu*32 dirty pages. If the file_lru is in the 13..200 MiB range then there's absolutely no dirty throttling, which burdens vmscan with only dirty+writeback pages thus resorting to oom kill.
It could be argued that tiny containers are not supported, but it's more subtle. It's the amount of space available for the file lru that matters. If a container has memory.max - 200MiB of non-reclaimable memory, then it will also suffer such oom kills on a 100 cpu machine.
The following test reliably ooms without this patch. This patch avoids oom kills.
$ cat test
mount -t cgroup2 none /dev/cgroup
cd /dev/cgroup
echo +io +memory > cgroup.subtree_control
mkdir test
cd test
echo 10M > memory.max
(echo $BASHPID > cgroup.procs && exec /memcg-writeback-stress /foo)
(echo $BASHPID > cgroup.procs && exec dd if=/dev/zero of=/foo bs=2M count=100)

$ cat memcg-writeback-stress.c
/*
 * Dirty pages from all but one cpu.
 * Clean pages from the non dirtying cpu.
 * This is to stress per cpu counter imbalance.
 * On a 100 cpu machine:
 * - per memcg per cpu dirty count is 32 pages for each of 99 cpus
 * - per memcg atomic is -99*32 pages
 * - thus the complete dirty limit: sum of all counters 0
 * - balance_dirty_pages() only sees atomic count -99*32 pages, which
 *   it max()s to 0.
 * - So a workload can dirty -99*32 pages before balance_dirty_pages()
 *   cares.
 */
#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysinfo.h>
#include <sys/types.h>
#include <unistd.h>

static char *buf;
static int bufSize;

static void set_affinity(int cpu)
{
	cpu_set_t affinity;

	CPU_ZERO(&affinity);
	CPU_SET(cpu, &affinity);
	if (sched_setaffinity(0, sizeof(affinity), &affinity))
		err(1, "sched_setaffinity");
}

static void dirty_on(int output_fd, int cpu)
{
	int i, wrote;

	set_affinity(cpu);
	for (i = 0; i < 32; i++) {
		for (wrote = 0; wrote < bufSize; ) {
			int ret = write(output_fd, buf+wrote, bufSize-wrote);
			if (ret == -1)
				err(1, "write");
			wrote += ret;
		}
	}
}

int main(int argc, char **argv)
{
	int cpu, flush_cpu = 1, output_fd;
	const char *output;

	if (argc != 2)
		errx(1, "usage: output_file");

	output = argv[1];
	bufSize = getpagesize();
	buf = malloc(getpagesize());
	if (buf == NULL)
		errx(1, "malloc failed");

	output_fd = open(output, O_CREAT|O_RDWR);
	if (output_fd == -1)
		err(1, "open(%s)", output);

	for (cpu = 0; cpu < get_nprocs(); cpu++) {
		if (cpu != flush_cpu)
			dirty_on(output_fd, cpu);
	}

	set_affinity(flush_cpu);
	if (fsync(output_fd))
		err(1, "fsync(%s)", output);
	if (close(output_fd))
		err(1, "close(%s)", output);
	free(buf);
}
Make balance_dirty_pages() and wb_over_bg_thresh() work harder to collect exact per memcg counters. This avoids the aforementioned oom kills.
This does not affect the overhead of memory.stat, which still reads the single atomic counter.
Why not use percpu_counter? memcg already handles cpus going offline, so no need for that overhead from percpu_counter. And the percpu_counter spinlocks are more heavyweight than is required.
It probably also makes sense to use exact dirty and writeback counters in memcg oom reports. But that is saved for later.
Link: http://lkml.kernel.org/r/20190329174609.164344-1-gthelen@google.com Signed-off-by: Greg Thelen gthelen@google.com Reviewed-by: Roman Gushchin guro@fb.com Acked-by: Johannes Weiner hannes@cmpxchg.org Cc: Michal Hocko mhocko@kernel.org Cc: Vladimir Davydov vdavydov.dev@gmail.com Cc: Tejun Heo tj@kernel.org Cc: stable@vger.kernel.org [4.16+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/memcontrol.h | 5 ++++- mm/memcontrol.c | 20 ++++++++++++++++++-- 2 files changed, 22 insertions(+), 3 deletions(-)
--- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -559,7 +559,10 @@ struct mem_cgroup *lock_page_memcg(struc void __unlock_page_memcg(struct mem_cgroup *memcg); void unlock_page_memcg(struct page *page);
-/* idx can be of type enum memcg_stat_item or node_stat_item */ +/* + * idx can be of type enum memcg_stat_item or node_stat_item. + * Keep in sync with memcg_exact_page_state(). + */ static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx) { --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3897,6 +3897,22 @@ struct wb_domain *mem_cgroup_wb_domain(s return &memcg->cgwb_domain; }
+/* + * idx can be of type enum memcg_stat_item or node_stat_item. + * Keep in sync with memcg_exact_page(). + */ +static unsigned long memcg_exact_page_state(struct mem_cgroup *memcg, int idx) +{ + long x = atomic_long_read(&memcg->stat[idx]); + int cpu; + + for_each_online_cpu(cpu) + x += per_cpu_ptr(memcg->stat_cpu, cpu)->count[idx]; + if (x < 0) + x = 0; + return x; +} + /** * mem_cgroup_wb_stats - retrieve writeback related stats from its memcg * @wb: bdi_writeback in question @@ -3922,10 +3938,10 @@ void mem_cgroup_wb_stats(struct bdi_writ struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css); struct mem_cgroup *parent;
- *pdirty = memcg_page_state(memcg, NR_FILE_DIRTY); + *pdirty = memcg_exact_page_state(memcg, NR_FILE_DIRTY);
/* this should eventually include NR_UNSTABLE_NFS */ - *pwriteback = memcg_page_state(memcg, NR_WRITEBACK); + *pwriteback = memcg_exact_page_state(memcg, NR_WRITEBACK); *pfilepages = mem_cgroup_nr_lru_pages(memcg, (1 << LRU_INACTIVE_FILE) | (1 << LRU_ACTIVE_FILE)); *pheadroom = PAGE_COUNTER_MAX;
From: Guenter Roeck linux@roeck-us.net
commit 8f71370f4b02730e8c27faf460af7a3586e24e1f upstream.
If codec registration fails after the ASoC Intel SST driver has been probed, the kernel will Oops and crash at suspend/resume.
general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 1 PID: 2811 Comm: cat Tainted: G W 4.19.30 #15 Hardware name: GOOGLE Clapper, BIOS Google_Clapper.5216.199.7 08/22/2014 RIP: 0010:snd_soc_suspend+0x5a/0xd21 Code: 03 80 3c 10 00 49 89 d7 74 0b 48 89 df e8 71 72 c4 fe 4c 89 fa 48 8b 03 48 89 45 d0 48 8d 98 a0 01 00 00 48 89 d8 48 c1 e8 03 <8a> 04 10 84 c0 0f 85 85 0c 00 00 80 3b 00 0f 84 6b 0c 00 00 48 8b RSP: 0018:ffff888035407750 EFLAGS: 00010202 RAX: 0000000000000034 RBX: 00000000000001a0 RCX: 0000000000000000 RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88805c417098 RBP: ffff8880354077b0 R08: dffffc0000000000 R09: ffffed100b975718 R10: 0000000000000001 R11: ffffffff949ea4a3 R12: 1ffff1100b975746 R13: dffffc0000000000 R14: ffff88805cba4588 R15: dffffc0000000000 FS: 0000794a78e91b80(0000) GS:ffff888068d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007bd5283ccf58 CR3: 000000004b7aa000 CR4: 00000000001006e0 Call Trace: ? dpm_complete+0x67b/0x67b ? i915_gem_suspend+0x14d/0x1ad sst_soc_prepare+0x91/0x1dd ? sst_be_hw_params+0x7e/0x7e dpm_prepare+0x39a/0x88b dpm_suspend_start+0x13/0x9d suspend_devices_and_enter+0x18f/0xbd7 ? arch_suspend_enable_irqs+0x11/0x11 ? printk+0xd9/0x12d ? lock_release+0x95f/0x95f ? log_buf_vmcoreinfo_setup+0x131/0x131 ? rcu_read_lock_sched_held+0x140/0x22a ? __bpf_trace_rcu_utilization+0xa/0xa ? __pm_pr_dbg+0x186/0x190 ? pm_notifier_call_chain+0x39/0x39 ? suspend_test+0x9d/0x9d pm_suspend+0x2f4/0x728 ? trace_suspend_resume+0x3da/0x3da ? lock_release+0x95f/0x95f ? kernfs_fop_write+0x19f/0x32d state_store+0xd8/0x147 ? sysfs_kf_read+0x155/0x155 kernfs_fop_write+0x23e/0x32d __vfs_write+0x108/0x608 ? vfs_read+0x2e9/0x2e9 ? rcu_read_lock_sched_held+0x140/0x22a ? __bpf_trace_rcu_utilization+0xa/0xa ? debug_smp_processor_id+0x10/0x10 ? selinux_file_permission+0x1c5/0x3c8 ? rcu_sync_lockdep_assert+0x6a/0xad ? __sb_start_write+0x129/0x2ac vfs_write+0x1aa/0x434 ksys_write+0xfe/0x1be ? __ia32_sys_read+0x82/0x82 do_syscall_64+0xcd/0x120 entry_SYSCALL_64_after_hwframe+0x49/0xbe
In the observed situation, the problem is seen because the codec driver failed to probe due to a hardware problem.
max98090 i2c-193C9890:00: Failed to read device revision: -1 max98090 i2c-193C9890:00: ASoC: failed to probe component -1 cht-bsw-max98090 cht-bsw-max98090: ASoC: failed to instantiate card -1 cht-bsw-max98090 cht-bsw-max98090: snd_soc_register_card failed -1 cht-bsw-max98090: probe of cht-bsw-max98090 failed with error -1
The problem is similar to the one solved by commit 2fc995a87f2e ("ASoC: intel: Fix crash at suspend/resume without card registration"), but here codec registration fails at a later point. At that time, the pointer checked by the above-mentioned commit is already set, but it is not cleared if the device is subsequently removed. Adding a remove function to clear the pointer fixes the problem.
Cc: stable@vger.kernel.org Cc: Jarkko Nikula jarkko.nikula@linux.intel.com Cc: Curtis Malainey cujomalainey@chromium.org Signed-off-by: Guenter Roeck linux@roeck-us.net Acked-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/soc/intel/atom/sst-mfld-platform-pcm.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/sound/soc/intel/atom/sst-mfld-platform-pcm.c +++ b/sound/soc/intel/atom/sst-mfld-platform-pcm.c @@ -711,9 +711,17 @@ static int sst_soc_probe(struct snd_soc_ return sst_dsp_init_v2_dpcm(component); }
+static void sst_soc_remove(struct snd_soc_component *component) +{ + struct sst_data *drv = dev_get_drvdata(component->dev); + + drv->soc_card = NULL; +} + static const struct snd_soc_component_driver sst_soc_platform_drv = { .name = DRV_NAME, .probe = sst_soc_probe, + .remove = sst_soc_remove, .ops = &sst_platform_ops, .compr_ops = &sst_platform_compr_ops, .pcm_new = sst_pcm_new,
From: S.j. Wang shengjiu.wang@nxp.com
commit 0ff4e8c61b794a4bf6c854ab071a1abaaa80f358 upstream.
There is a very low probability (< 0.1%) of a channel swap at the beginning of playback or capture when multiple output/input pins are enabled. The issue is that, with the normal enable flow, the hardware cannot route data to the correct pin at the start of the stream.
This is a hardware issue with no published erratum. The workaround is: each time playback or recording starts, first clear xSMA/xSMB, then enable TE/RE, then enable xSMB and finally xSMA (xSMB must be enabled before xSMA). In effect xSMA becomes the trigger-start register, whereas previously the xCR_TE or xCR_RE bit started the transfer.
Fixes: 43d24e76b698 ("ASoC: fsl_esai: Add ESAI CPU DAI driver") Cc: stable@vger.kernel.org Reviewed-by: Fabio Estevam festevam@gmail.com Acked-by: Nicolin Chen nicoleotsuka@gmail.com Signed-off-by: Shengjiu Wang shengjiu.wang@nxp.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/soc/fsl/fsl_esai.c | 47 +++++++++++++++++++++++++++++++++++++---------- 1 file changed, 37 insertions(+), 10 deletions(-)
--- a/sound/soc/fsl/fsl_esai.c +++ b/sound/soc/fsl/fsl_esai.c @@ -54,6 +54,8 @@ struct fsl_esai { u32 fifo_depth; u32 slot_width; u32 slots; + u32 tx_mask; + u32 rx_mask; u32 hck_rate[2]; u32 sck_rate[2]; bool hck_dir[2]; @@ -361,21 +363,13 @@ static int fsl_esai_set_dai_tdm_slot(str regmap_update_bits(esai_priv->regmap, REG_ESAI_TCCR, ESAI_xCCR_xDC_MASK, ESAI_xCCR_xDC(slots));
- regmap_update_bits(esai_priv->regmap, REG_ESAI_TSMA, - ESAI_xSMA_xS_MASK, ESAI_xSMA_xS(tx_mask)); - regmap_update_bits(esai_priv->regmap, REG_ESAI_TSMB, - ESAI_xSMB_xS_MASK, ESAI_xSMB_xS(tx_mask)); - regmap_update_bits(esai_priv->regmap, REG_ESAI_RCCR, ESAI_xCCR_xDC_MASK, ESAI_xCCR_xDC(slots));
- regmap_update_bits(esai_priv->regmap, REG_ESAI_RSMA, - ESAI_xSMA_xS_MASK, ESAI_xSMA_xS(rx_mask)); - regmap_update_bits(esai_priv->regmap, REG_ESAI_RSMB, - ESAI_xSMB_xS_MASK, ESAI_xSMB_xS(rx_mask)); - esai_priv->slot_width = slot_width; esai_priv->slots = slots; + esai_priv->tx_mask = tx_mask; + esai_priv->rx_mask = rx_mask;
return 0; } @@ -596,6 +590,7 @@ static int fsl_esai_trigger(struct snd_p bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK; u8 i, channels = substream->runtime->channels; u32 pins = DIV_ROUND_UP(channels, esai_priv->slots); + u32 mask;
switch (cmd) { case SNDRV_PCM_TRIGGER_START: @@ -608,15 +603,38 @@ static int fsl_esai_trigger(struct snd_p for (i = 0; tx && i < channels; i++) regmap_write(esai_priv->regmap, REG_ESAI_ETDR, 0x0);
+ /* + * When set the TE/RE in the end of enablement flow, there + * will be channel swap issue for multi data line case. + * In order to workaround this issue, we switch the bit + * enablement sequence to below sequence + * 1) clear the xSMB & xSMA: which is done in probe and + * stop state. + * 2) set TE/RE + * 3) set xSMB + * 4) set xSMA: xSMA is the last one in this flow, which + * will trigger esai to start. + */ regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), tx ? ESAI_xCR_TE_MASK : ESAI_xCR_RE_MASK, tx ? ESAI_xCR_TE(pins) : ESAI_xCR_RE(pins)); + mask = tx ? esai_priv->tx_mask : esai_priv->rx_mask; + + regmap_update_bits(esai_priv->regmap, REG_ESAI_xSMB(tx), + ESAI_xSMB_xS_MASK, ESAI_xSMB_xS(mask)); + regmap_update_bits(esai_priv->regmap, REG_ESAI_xSMA(tx), + ESAI_xSMA_xS_MASK, ESAI_xSMA_xS(mask)); + break; case SNDRV_PCM_TRIGGER_SUSPEND: case SNDRV_PCM_TRIGGER_STOP: case SNDRV_PCM_TRIGGER_PAUSE_PUSH: regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), tx ? ESAI_xCR_TE_MASK : ESAI_xCR_RE_MASK, 0); + regmap_update_bits(esai_priv->regmap, REG_ESAI_xSMA(tx), + ESAI_xSMA_xS_MASK, 0); + regmap_update_bits(esai_priv->regmap, REG_ESAI_xSMB(tx), + ESAI_xSMB_xS_MASK, 0);
/* Disable and reset FIFO */ regmap_update_bits(esai_priv->regmap, REG_ESAI_xFCR(tx), @@ -906,6 +924,15 @@ static int fsl_esai_probe(struct platfor return ret; }
+ esai_priv->tx_mask = 0xFFFFFFFF; + esai_priv->rx_mask = 0xFFFFFFFF; + + /* Clear the TSMA, TSMB, RSMA, RSMB */ + regmap_write(esai_priv->regmap, REG_ESAI_TSMA, 0); + regmap_write(esai_priv->regmap, REG_ESAI_TSMB, 0); + regmap_write(esai_priv->regmap, REG_ESAI_RSMA, 0); + regmap_write(esai_priv->regmap, REG_ESAI_RSMB, 0); + ret = devm_snd_soc_register_component(&pdev->dev, &fsl_esai_component, &fsl_esai_dai, 1); if (ret) {
From: Filipe Manana fdmanana@suse.com
commit f35f06c35560a86e841631f0243b83a984dc11a9 upstream.
When a filesystem is mounted with the nologreplay mount option, which requires it to be mounted in RO mode as well, we cannot allow discard on free space inside block groups, because log trees refer to extents that are not pinned in a block group's free space cache (pinning the extents is precisely the first phase of replaying a log tree).
So do not allow the fitrim ioctl to do anything when the filesystem is mounted with the nologreplay option, because it can later be mounted RW without that option, which triggers log replay and can result in either a failure to replay the log trees (leading to a mount failure), a crash, or silent corruption.
Reported-by: Darrick J. Wong darrick.wong@oracle.com Fixes: 96da09192cda ("btrfs: Introduce new mount option to disable tree log replay") CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Nikolay Borisov nborisov@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/ioctl.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -496,6 +496,16 @@ static noinline int btrfs_ioctl_fitrim(s if (!capable(CAP_SYS_ADMIN)) return -EPERM;
+ /* + * If the fs is mounted with nologreplay, which requires it to be + * mounted in RO mode as well, we can not allow discard on free space + * inside block groups, because log trees refer to extents that are not + * pinned in a block group's free space cache (pinning the extents is + * precisely the first phase of replaying a log tree). + */ + if (btrfs_test_opt(fs_info, NOLOGREPLAY)) + return -EROFS; + rcu_read_lock(); list_for_each_entry_rcu(device, &fs_info->fs_devices->devices, dev_list) {
From: Anand Jain anand.jain@oracle.com
commit 50398fde997f6be8faebdb5f38e9c9c467370f51 upstream.
We accept a zstd compression parameter even if it is not fully valid. For example:
$ btrfs prop set /btrfs compression zst $ btrfs prop get /btrfs compression compression=zst
zlib and lzo are fine.
Fix it by checking the correct prefix length.
Fixes: 5c1aab1dd544 ("btrfs: Add zstd support") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Nikolay Borisov nborisov@suse.com Signed-off-by: Anand Jain anand.jain@oracle.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/props.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/btrfs/props.c +++ b/fs/btrfs/props.c @@ -396,7 +396,7 @@ static int prop_compression_apply(struct btrfs_set_fs_incompat(fs_info, COMPRESS_LZO); } else if (!strncmp("zlib", value, 4)) { type = BTRFS_COMPRESS_ZLIB; - } else if (!strncmp("zstd", value, len)) { + } else if (!strncmp("zstd", value, 4)) { type = BTRFS_COMPRESS_ZSTD; btrfs_set_fs_incompat(fs_info, COMPRESS_ZSTD); } else {
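For illustration: strncmp() compares at most n bytes, so bounding the comparison by the (possibly shorter) user-supplied length lets any prefix of "zstd" match. A standalone userspace sketch of the buggy versus fixed check (plain C, not btrfs code; check() is a made-up helper):

#include <stdio.h>
#include <string.h>

/* Compare a user-supplied value against "zstd" the buggy way (bounded by the
 * user string's length) and the fixed way (bounded by the literal's length). */
static void check(const char *value)
{
	size_t len = strlen(value);
	int buggy = !strncmp("zstd", value, len);	/* n comes from userspace */
	int fixed = !strncmp("zstd", value, 4);		/* n is the prefix length */

	printf("%-5s buggy=%d fixed=%d\n", value, buggy, fixed);
}

int main(void)
{
	check("zstd");	/* accepted by both */
	check("zst");	/* buggy: accepted, fixed: rejected */
	check("z");	/* buggy: accepted, fixed: rejected */
	return 0;
}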
From: Anand Jain anand.jain@oracle.com
commit 272e5326c7837697882ce3162029ba893059b616 upstream.
The compression property resets to NULL instead of keeping the old value if we fail to set the new compression parameter.
$ btrfs prop get /btrfs compression compression=lzo $ btrfs prop set /btrfs compression zli ERROR: failed to set compression for /btrfs: Invalid argument $ btrfs prop get /btrfs compression
This is because the compression property's ->validate() succeeds for 'zli', as strncmp() used the length passed from userspace.
Fix it by using the expected string length in strncmp().
Fixes: 63541927c8d1 ("Btrfs: add support for inode properties") Fixes: 5c1aab1dd544 ("btrfs: Add zstd support") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Nikolay Borisov nborisov@suse.com Signed-off-by: Anand Jain anand.jain@oracle.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/props.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/btrfs/props.c +++ b/fs/btrfs/props.c @@ -366,11 +366,11 @@ int btrfs_subvol_inherit_props(struct bt
static int prop_compression_validate(const char *value, size_t len) { - if (!strncmp("lzo", value, len)) + if (!strncmp("lzo", value, 3)) return 0; - else if (!strncmp("zlib", value, len)) + else if (!strncmp("zlib", value, 4)) return 0; - else if (!strncmp("zstd", value, len)) + else if (!strncmp("zstd", value, 4)) return 0;
return -EINVAL;
From: Dmitry V. Levin ldv@altlinux.org
commit 10a16997db3d99fc02c026cf2c6e6c670acafab0 upstream.
RISC-V syscall arguments are located in orig_a0,a1..a5 fields of struct pt_regs.
Due to an off-by-one bug and a bug in pointer arithmetic, syscall_get_arguments() was reading the s3..s7 fields instead of a1..a5. Likewise, syscall_set_arguments() was writing the s3..s7 fields instead of a1..a5.
Link: http://lkml.kernel.org/r/20190329171221.GA32456@altlinux.org
Fixes: e2c0cdfba7f69 ("RISC-V: User-facing API") Cc: Ingo Molnar mingo@redhat.com Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Albert Ou aou@eecs.berkeley.edu Cc: linux-riscv@lists.infradead.org Cc: stable@vger.kernel.org # v4.15+ Acked-by: Palmer Dabbelt palmer@sifive.com Signed-off-by: Dmitry V. Levin ldv@altlinux.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/riscv/include/asm/syscall.h | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
--- a/arch/riscv/include/asm/syscall.h +++ b/arch/riscv/include/asm/syscall.h @@ -78,10 +78,11 @@ static inline void syscall_get_arguments if (i == 0) { args[0] = regs->orig_a0; args++; - i++; n--; + } else { + i--; } - memcpy(args, ®s->a1 + i * sizeof(regs->a1), n * sizeof(args[0])); + memcpy(args, ®s->a1 + i, n * sizeof(args[0])); }
static inline void syscall_set_arguments(struct task_struct *task, @@ -93,10 +94,11 @@ static inline void syscall_set_arguments if (i == 0) { regs->orig_a0 = args[0]; args++; - i++; n--; - } - memcpy(®s->a1 + i * sizeof(regs->a1), args, n * sizeof(regs->a0)); + } else { + i--; + } + memcpy(®s->a1 + i, args, n * sizeof(regs->a1)); }
#endif /* _ASM_RISCV_SYSCALL_H */
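The pointer-arithmetic half of the bug is worth spelling out: &regs->a1 already has type unsigned long *, so adding i * sizeof(regs->a1) scales the offset twice. A standalone sketch with a mock, approximate register layout (not the real pt_regs):

#include <stdio.h>

/* Mock of (part of) the register layout; field order approximates pt_regs. */
struct mock_regs {
	unsigned long orig_a0;
	unsigned long a1, a2, a3, a4, a5, a6, a7;
	unsigned long s2, s3, s4, s5, s6, s7;	/* callee-saved regs follow */
};

int main(void)
{
	struct mock_regs regs;
	unsigned int i = 1;	/* want the slot one past a1, i.e. a2 */

	/* Buggy: the offset is scaled by sizeof() *and* by pointer arithmetic. */
	unsigned long *wrong = &regs.a1 + i * sizeof(regs.a1);
	/* Fixed: pointer arithmetic already advances in sizeof(long) steps. */
	unsigned long *right = &regs.a1 + i;

	printf("wrong: %td bytes past a1\n", (char *)wrong - (char *)&regs.a1);
	printf("right: %td bytes past a1\n", (char *)right - (char *)&regs.a1);
	/* On a 64-bit build: 64 bytes (lands on s3 here) vs. 8 bytes (a2). */
	return 0;
}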
From: Jérôme Glisse jglisse@redhat.com
commit a3761c3c91209b58b6f33bf69dd8bb8ec0c9d925 upstream.
When bio_add_pc_page() fails in bio_copy_user_iov(), we should free the page we just allocated; otherwise we are leaking it.
Cc: linux-block@vger.kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: stable@vger.kernel.org Reviewed-by: Chaitanya Kulkarni chaitanya.kulkarni@wdc.com Signed-off-by: Jérôme Glisse jglisse@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- block/bio.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/block/bio.c +++ b/block/bio.c @@ -1240,8 +1240,11 @@ struct bio *bio_copy_user_iov(struct req } }
- if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes) + if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes) { + if (!map_data) + __free_page(page); break; + }
len -= bytes; offset = 0;
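The rule the patch restores is the usual one: if the "add" step refuses the object you just allocated, the error path must free it. A minimal userspace sketch of that pattern, with malloc() standing in for page allocation and a made-up, capacity-limited container standing in for bio_add_pc_page() (the real fix additionally skips the free when the page came from caller-provided map_data):

#include <stdio.h>
#include <stdlib.h>

#define MAX_PARTS 4

struct container {
	void *parts[MAX_PARTS];
	int nr;
};

/* Stand-in for bio_add_pc_page(): returns how many bytes were taken (0 = refused). */
static size_t container_add(struct container *c, void *part, size_t bytes)
{
	if (c->nr >= MAX_PARTS)
		return 0;
	c->parts[c->nr++] = part;
	return bytes;
}

static void fill(struct container *c, int n)
{
	for (int i = 0; i < n; i++) {
		void *part = malloc(64);

		if (!part)
			return;
		if (container_add(c, part, 64) < 64) {
			/* The container did not take ownership: free it here,
			 * otherwise the allocation leaks (the bug being fixed). */
			free(part);
			return;
		}
	}
}

int main(void)
{
	struct container c = { .nr = 0 };

	fill(&c, 8);			/* only 4 parts fit; the rest must not leak */
	for (int i = 0; i < c.nr; i++)
		free(c.parts[i]);
	printf("stored %d parts\n", c.nr);
	return 0;
}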
From: Jason Yan yanaijie@huawei.com
commit a89afe58f1a74aac768a5eb77af95ef4ee15beaa upstream.
If the last bio to complete is not dio->bio, that bio's error status will not be assigned to dio->bio. This causes the status of the whole IO to be reported incorrectly.
ksoftirqd/21-117 [021] ..s. 4017.966090: 8,0 C N 4883648 [0] <idle>-0 [018] ..s. 4017.970888: 8,0 C WS 4924800 + 1024 [0] <idle>-0 [018] ..s. 4017.970909: 8,0 D WS 4935424 + 1024 [<idle>] <idle>-0 [018] ..s. 4017.970924: 8,0 D WS 4936448 + 321 [<idle>] ksoftirqd/21-117 [021] ..s. 4017.995033: 8,0 C R 4883648 + 336 [65475] ksoftirqd/21-117 [021] d.s. 4018.001988: myprobe1: (blkdev_bio_end_io+0x0/0x168) bi_status=7 ksoftirqd/21-117 [021] d.s. 4018.001992: myprobe: (aio_complete_rw+0x0/0x148) x0=0xffff802f2595ad80 res=0x12a000 res2=0x0
We always have to propagate bio->bi_status to dio->bio.bi_status, because only dio->bio.bi_status is checked when the whole IO is returned to the upper layer.
Fixes: 542ff7bf18c6 ("block: new direct I/O implementation") Cc: stable@vger.kernel.org Cc: Christoph Hellwig hch@lst.de Cc: Jens Axboe axboe@kernel.dk Reviewed-by: Ming Lei ming.lei@redhat.com Signed-off-by: Jason Yan yanaijie@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/block_dev.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -296,10 +296,10 @@ static void blkdev_bio_end_io(struct bio struct blkdev_dio *dio = bio->bi_private; bool should_dirty = dio->should_dirty;
- if (dio->multi_bio && !atomic_dec_and_test(&dio->ref)) { - if (bio->bi_status && !dio->bio.bi_status) - dio->bio.bi_status = bio->bi_status; - } else { + if (bio->bi_status && !dio->bio.bi_status) + dio->bio.bi_status = bio->bi_status; + + if (!dio->multi_bio || atomic_dec_and_test(&dio->ref)) { if (!dio->is_sync) { struct kiocb *iocb = dio->iocb; ssize_t ret;
From: Stephen Boyd swboyd@chromium.org
commit 325aa19598e410672175ed50982f902d4e3f31c5 upstream.
If a child irqchip calls irq_chip_set_wake_parent() but its parent irqchip has the IRQCHIP_SKIP_SET_WAKE flag set, an error is returned.
This is inconsistent behaviour vs. set_irq_wake_real() which returns 0 when the irqchip has the IRQCHIP_SKIP_SET_WAKE flag set. It doesn't attempt to walk the chain of parents and set irq wake on any chips that don't have the flag set either. If the intent is to call the .irq_set_wake() callback of the parent irqchip, then we expect irqchip implementations to omit the IRQCHIP_SKIP_SET_WAKE flag and implement an .irq_set_wake() function that calls irq_chip_set_wake_parent().
The problem has been observed on a Qualcomm sdm845 device where set wake fails on any GPIO interrupts after applying work in progress wakeup irq patches to the GPIO driver. The chain of chips looks like this:
QCOM GPIO -> QCOM PDC (SKIP) -> ARM GIC (SKIP)
The GPIO controller's parent is the QCOM PDC irqchip, which in turn has the ARM GIC as its parent. The QCOM PDC irqchip has the IRQCHIP_SKIP_SET_WAKE flag set, and so does the grandparent ARM GIC.
The GPIO driver doesn't know if the parent needs to set wake or not, so it unconditionally calls irq_chip_set_wake_parent() causing this function to return a failure because the parent irqchip (PDC) doesn't have the .irq_set_wake() callback set. Returning 0 instead makes everything work and irqs from the GPIO controller can be configured for wakeup.
Make it consistent by returning 0 (success) from irq_chip_set_wake_parent() when a parent chip has IRQCHIP_SKIP_SET_WAKE set.
[ tglx: Massaged changelog ]
Fixes: 08b55e2a9208e ("genirq: Add irqchip_set_wake_parent") Signed-off-by: Stephen Boyd swboyd@chromium.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Marc Zyngier marc.zyngier@arm.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-gpio@vger.kernel.org Cc: Lina Iyer ilina@codeaurora.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190325181026.247796-1-swboyd@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/irq/chip.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/kernel/irq/chip.c +++ b/kernel/irq/chip.c @@ -1384,6 +1384,10 @@ int irq_chip_set_vcpu_affinity_parent(st int irq_chip_set_wake_parent(struct irq_data *data, unsigned int on) { data = data->parent_data; + + if (data->chip->flags & IRQCHIP_SKIP_SET_WAKE) + return 0; + if (data->chip->irq_set_wake) return data->chip->irq_set_wake(data, on);
From: Kefeng Wang wangkefeng.wang@huawei.com
commit e8458e7afa855317b14915d7b86ab3caceea7eb6 upstream.
When CONFIG_SPARSE_IRQ is disabled, the request_mutex in struct irq_desc is not initialized, which causes a malfunction.
Fixes: 9114014cf4e6 ("genirq: Add mutex to irq desc to serialize request/free_irq()") Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Mukesh Ojha mojha@codeaurora.org Cc: Marc Zyngier marc.zyngier@arm.com Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190404074512.145533-1-wangkefeng.wang@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/irq/irqdesc.c | 1 + 1 file changed, 1 insertion(+)
--- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -554,6 +554,7 @@ int __init early_irq_init(void) alloc_masks(&desc[i], node); raw_spin_lock_init(&desc[i].lock); lockdep_set_class(&desc[i].lock, &irq_desc_lock_class); + mutex_init(&desc[i].request_mutex); desc_set_defaults(i, &desc[i], node, NULL, NULL); } return arch_early_irq_init();
From: Cornelia Huck cohuck@redhat.com
commit cf94db21905333e610e479688add629397a4b384 upstream.
vring_create_virtqueue() allows the caller to specify via the may_reduce_num parameter whether the vring code is allowed to allocate a smaller ring than specified.
However, the split ring allocation code tries to allocate a smaller ring on allocation failure regardless of what the caller specified. This may cause trouble for e.g. virtio-pci in legacy mode, which does not support ring resizing. (The packed ring code does not resize in any case.)
Let's fix this by bailing out immediately in the split ring code if the requested size cannot be allocated and may_reduce_num has not been specified.
While at it, fix a typo in the usage instructions.
Fixes: 2a2d1382fe9d ("virtio: Add improved queue allocation API") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Cornelia Huck cohuck@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Halil Pasic pasic@linux.ibm.com Reviewed-by: Jens Freimann jfreimann@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/virtio/virtio_ring.c | 2 ++ include/linux/virtio_ring.h | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -1086,6 +1086,8 @@ struct virtqueue *vring_create_virtqueue GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO); if (queue) break; + if (!may_reduce_num) + return NULL; }
if (!num) --- a/include/linux/virtio_ring.h +++ b/include/linux/virtio_ring.h @@ -63,7 +63,7 @@ struct virtqueue; /* * Creates a virtqueue and allocates the descriptor ring. If * may_reduce_num is set, then this may allocate a smaller ring than - * expected. The caller should query virtqueue_get_ring_size to learn + * expected. The caller should query virtqueue_get_vring_size to learn * the actual size of the ring. */ struct virtqueue *vring_create_virtqueue(unsigned int index,
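The control flow being fixed is "retry with a smaller ring only if the caller allowed it". A standalone sketch of that loop with calloc() standing in for the vring allocation (alloc_ring() is a made-up name, not the virtio API):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Try to allocate a ring of *num entries; halve the size on failure only
 * when may_reduce_num permits it, mirroring the corrected control flow. */
static void *alloc_ring(unsigned int *num, size_t entry_size, bool may_reduce_num)
{
	for (; *num; *num /= 2) {
		void *queue = calloc(*num, entry_size);

		if (queue)
			return queue;
		if (!may_reduce_num)
			return NULL;	/* the fix: never shrink behind the caller's back */
	}
	return NULL;
}

int main(void)
{
	unsigned int num = 256;
	void *ring = alloc_ring(&num, 16, false);

	printf("got %s, num=%u\n", ring ? "a ring" : "nothing", num);
	free(ring);
	return 0;
}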
From: Jonas Karlman jonas@kwiboo.se
commit 6b2fde3dbfab6ebc45b0cd605e17ca5057ff9a3b upstream.
The following error can be seen during boot:
of: /cpus/cpu@501: Couldn't find opp node
Change cpu nodes to use operating-points-v2 in order to fix this.
Fixes: ce76de984649 ("ARM: dts: rockchip: convert rk3288 to operating-points-v2") Cc: stable@vger.kernel.org Signed-off-by: Jonas Karlman jonas@kwiboo.se Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/rk3288.dtsi | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/arch/arm/boot/dts/rk3288.dtsi +++ b/arch/arm/boot/dts/rk3288.dtsi @@ -70,7 +70,7 @@ compatible = "arm,cortex-a12"; reg = <0x501>; resets = <&cru SRST_CORE1>; - operating-points = <&cpu_opp_table>; + operating-points-v2 = <&cpu_opp_table>; #cooling-cells = <2>; /* min followed by max */ clock-latency = <40000>; clocks = <&cru ARMCLK>; @@ -80,7 +80,7 @@ compatible = "arm,cortex-a12"; reg = <0x502>; resets = <&cru SRST_CORE2>; - operating-points = <&cpu_opp_table>; + operating-points-v2 = <&cpu_opp_table>; #cooling-cells = <2>; /* min followed by max */ clock-latency = <40000>; clocks = <&cru ARMCLK>; @@ -90,7 +90,7 @@ compatible = "arm,cortex-a12"; reg = <0x503>; resets = <&cru SRST_CORE3>; - operating-points = <&cpu_opp_table>; + operating-points-v2 = <&cpu_opp_table>; #cooling-cells = <2>; /* min followed by max */ clock-latency = <40000>; clocks = <&cru ARMCLK>;
From: Peter Ujfalusi peter.ujfalusi@ti.com
commit 6691370646e844be98bb6558c024269791d20bd7 upstream.
Correctly map the regulators used by the tlv320aic3106. Both the 1.8V and 3.3V supplies for the codec are derived from VBAT via fixed regulators.
Cc: Stable@vger.kernel.org # v4.14+ Signed-off-by: Peter Ujfalusi peter.ujfalusi@ti.com Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/am335x-evmsk.dts | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-)
--- a/arch/arm/boot/dts/am335x-evmsk.dts +++ b/arch/arm/boot/dts/am335x-evmsk.dts @@ -73,6 +73,24 @@ enable-active-high; };
+ /* TPS79518 */ + v1_8d_reg: fixedregulator-v1_8d { + compatible = "regulator-fixed"; + regulator-name = "v1_8d"; + vin-supply = <&vbat>; + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <1800000>; + }; + + /* TPS78633 */ + v3_3d_reg: fixedregulator-v3_3d { + compatible = "regulator-fixed"; + regulator-name = "v3_3d"; + vin-supply = <&vbat>; + regulator-min-microvolt = <3300000>; + regulator-max-microvolt = <3300000>; + }; + leds { pinctrl-names = "default"; pinctrl-0 = <&user_leds_s0>; @@ -501,10 +519,10 @@ status = "okay";
/* Regulators */ - AVDD-supply = <&vaux2_reg>; - IOVDD-supply = <&vaux2_reg>; - DRVDD-supply = <&vaux2_reg>; - DVDD-supply = <&vbat>; + AVDD-supply = <&v3_3d_reg>; + IOVDD-supply = <&v3_3d_reg>; + DRVDD-supply = <&v3_3d_reg>; + DVDD-supply = <&v1_8d_reg>; }; };
From: Peter Ujfalusi peter.ujfalusi@ti.com
commit 4f96dc0a3e79ec257a2b082dab3ee694ff88c317 upstream.
Correctly map the regulators used by the tlv320aic3106. Both the 1.8V and 3.3V supplies for the codec are derived from VBAT via fixed regulators.
Cc: Stable@vger.kernel.org # v4.14+ Signed-off-by: Peter Ujfalusi peter.ujfalusi@ti.com Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/am335x-evm.dts | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-)
--- a/arch/arm/boot/dts/am335x-evm.dts +++ b/arch/arm/boot/dts/am335x-evm.dts @@ -57,6 +57,24 @@ enable-active-high; };
+ /* TPS79501 */ + v1_8d_reg: fixedregulator-v1_8d { + compatible = "regulator-fixed"; + regulator-name = "v1_8d"; + vin-supply = <&vbat>; + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <1800000>; + }; + + /* TPS79501 */ + v3_3d_reg: fixedregulator-v3_3d { + compatible = "regulator-fixed"; + regulator-name = "v3_3d"; + vin-supply = <&vbat>; + regulator-min-microvolt = <3300000>; + regulator-max-microvolt = <3300000>; + }; + matrix_keypad: matrix_keypad0 { compatible = "gpio-matrix-keypad"; debounce-delay-ms = <5>; @@ -499,10 +517,10 @@ status = "okay";
/* Regulators */ - AVDD-supply = <&vaux2_reg>; - IOVDD-supply = <&vaux2_reg>; - DRVDD-supply = <&vaux2_reg>; - DVDD-supply = <&vbat>; + AVDD-supply = <&v3_3d_reg>; + IOVDD-supply = <&v3_3d_reg>; + DRVDD-supply = <&v3_3d_reg>; + DVDD-supply = <&v1_8d_reg>; }; };
From: David Engraf david.engraf@sysgo.com
commit e7dfb6d04e4715be1f3eb2c60d97b753fd2e4516 upstream.
The function argument for ISC_D0 on PC9 was incorrect. According to the documentation, it should be function 'C', i.e. 3.
Signed-off-by: David Engraf david.engraf@sysgo.com Reviewed-by: Nicolas Ferre nicolas.ferre@microchip.com Signed-off-by: Ludovic Desroches ludovic.desroches@microchip.com Fixes: 7f16cb676c00 ("ARM: at91/dt: add sama5d2 pinmux") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/sama5d2-pinfunc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/boot/dts/sama5d2-pinfunc.h +++ b/arch/arm/boot/dts/sama5d2-pinfunc.h @@ -518,7 +518,7 @@ #define PIN_PC9__GPIO PINMUX_PIN(PIN_PC9, 0, 0) #define PIN_PC9__FIQ PINMUX_PIN(PIN_PC9, 1, 3) #define PIN_PC9__GTSUCOMP PINMUX_PIN(PIN_PC9, 2, 1) -#define PIN_PC9__ISC_D0 PINMUX_PIN(PIN_PC9, 2, 1) +#define PIN_PC9__ISC_D0 PINMUX_PIN(PIN_PC9, 3, 1) #define PIN_PC9__TIOA4 PINMUX_PIN(PIN_PC9, 4, 2) #define PIN_PC10 74 #define PIN_PC10__GPIO PINMUX_PIN(PIN_PC10, 0, 0)
From: Will Deacon will.deacon@arm.com
commit 045afc24124d80c6998d9c770844c67912083506 upstream.
Rather embarrassingly, our futex() FUTEX_WAKE_OP implementation doesn't explicitly set the return value on the non-faulting path and instead leaves it holding the result of the underlying atomic operation. This means that any FUTEX_WAKE_OP atomic operation which computes a non-zero value will be reported as having failed. Regrettably, I wrote the buggy code back in 2011 and it was upstreamed as part of the initial arm64 support in 2012.
The reasons we appear to get away with this are:
1. FUTEX_WAKE_OP is rarely used and therefore doesn't appear to get exercised by futex() test applications
2. If the result of the atomic operation is zero, the system call behaves correctly
3. Prior to version 2.25, the only operation used by GLIBC set the futex to zero, and therefore worked as expected. From 2.25 onwards, FUTEX_WAKE_OP is not used by GLIBC at all.
Fix the implementation by ensuring that the return value is either 0 to indicate that the atomic operation completed successfully, or -EFAULT if we encountered a fault when accessing the user mapping.
Cc: stable@kernel.org Fixes: 6170a97460db ("arm64: Atomic operations") Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/include/asm/futex.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/arch/arm64/include/asm/futex.h +++ b/arch/arm64/include/asm/futex.h @@ -30,8 +30,8 @@ do { \ " prfm pstl1strm, %2\n" \ "1: ldxr %w1, %2\n" \ insn "\n" \ -"2: stlxr %w3, %w0, %2\n" \ -" cbnz %w3, 1b\n" \ +"2: stlxr %w0, %w3, %2\n" \ +" cbnz %w0, 1b\n" \ " dmb ish\n" \ "3:\n" \ " .pushsection .fixup,"ax"\n" \ @@ -50,30 +50,30 @@ do { \ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *_uaddr) { - int oldval = 0, ret, tmp; + int oldval, ret, tmp; u32 __user *uaddr = __uaccess_mask_ptr(_uaddr);
pagefault_disable();
switch (op) { case FUTEX_OP_SET: - __futex_atomic_op("mov %w0, %w4", + __futex_atomic_op("mov %w3, %w4", ret, oldval, uaddr, tmp, oparg); break; case FUTEX_OP_ADD: - __futex_atomic_op("add %w0, %w1, %w4", + __futex_atomic_op("add %w3, %w1, %w4", ret, oldval, uaddr, tmp, oparg); break; case FUTEX_OP_OR: - __futex_atomic_op("orr %w0, %w1, %w4", + __futex_atomic_op("orr %w3, %w1, %w4", ret, oldval, uaddr, tmp, oparg); break; case FUTEX_OP_ANDN: - __futex_atomic_op("and %w0, %w1, %w4", + __futex_atomic_op("and %w3, %w1, %w4", ret, oldval, uaddr, tmp, ~oparg); break; case FUTEX_OP_XOR: - __futex_atomic_op("eor %w0, %w1, %w4", + __futex_atomic_op("eor %w3, %w1, %w4", ret, oldval, uaddr, tmp, oparg); break; default:
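For reference, the contract being restored is: on success the function returns 0 and the previous futex value is handed back through *oval; the arithmetic result of the operation is never the return value. A userspace sketch of that convention for the ADD op, using GCC atomic builtins instead of the arm64 LL/SC assembly (futex_atomic_add() is a made-up name):

#include <stdio.h>

/* Illustrative only: the real arch_futex_atomic_op_inuser() performs the op
 * with LL/SC assembly on a user mapping and returns -EFAULT on a fault. */
static int futex_atomic_add(int oparg, int *oval, int *uaddr)
{
	*oval = __atomic_fetch_add(uaddr, oparg, __ATOMIC_SEQ_CST);
	return 0;	/* success is reported as 0, never as the arithmetic result */
}

int main(void)
{
	int futex_word = 40;
	int oldval;
	int ret = futex_atomic_add(2, &oldval, &futex_word);

	/* ret == 0 even though 40 + 2 != 0; oldval == 40, futex_word == 42 */
	printf("ret=%d oldval=%d futex=%d\n", ret, oldval, futex_word);
	return 0;
}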
From: Peter Geis pgwipeout@gmail.com
commit 6fd8b9780ec1a49ac46e0aaf8775247205e66231 upstream.
Several rk3328 based boards experience high rgmii tx error rates. This is due to several pins in the rk3328.dtsi rgmii pinmux that are missing a defined pull strength setting. This causes the pinmux driver to default to 2ma (bit mask 00).
These pins are only defined in the rk3328.dtsi, and are not listed in the rk3328 specification. The TRM only lists them as "Reserved" (RK3328 TRM V1.1, 3.3.3 Detail Register Description, GRF_GPIO0B_IOMUX, GRF_GPIO0C_IOMUX, GRF_GPIO0D_IOMUX). However, removal of these pins from the rgmii pinmux definition causes the interface to fail to transmit.
Also, the rgmii tx and rx pins defined in the dtsi are not consistent with the rk3328 specification, with tx pins currently set to 12ma and rx pins set to 2ma.
Fix this by setting tx pins to 8ma and the rx pins to 4ma, consistent with the specification. Defining the drive strength for the undefined pins eliminated the high tx packet error rate observed under heavy data transfers. Aligning the drive strength to the TRM values eliminated the occasional packet retry errors under iperf3 testing. This allows much higher data rates with no recorded tx errors.
Tested on the rk3328-roc-cc board.
Fixes: 52e02d377a72 ("arm64: dts: rockchip: add core dtsi file for RK3328 SoCs") Cc: stable@vger.kernel.org Signed-off-by: Peter Geis pgwipeout@gmail.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/boot/dts/rockchip/rk3328.dtsi | 44 +++++++++++++++---------------- 1 file changed, 22 insertions(+), 22 deletions(-)
--- a/arch/arm64/boot/dts/rockchip/rk3328.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3328.dtsi @@ -1553,50 +1553,50 @@ rgmiim1_pins: rgmiim1-pins { rockchip,pins = /* mac_txclk */ - <1 RK_PB4 2 &pcfg_pull_none_12ma>, + <1 RK_PB4 2 &pcfg_pull_none_8ma>, /* mac_rxclk */ - <1 RK_PB5 2 &pcfg_pull_none_2ma>, + <1 RK_PB5 2 &pcfg_pull_none_4ma>, /* mac_mdio */ - <1 RK_PC3 2 &pcfg_pull_none_2ma>, + <1 RK_PC3 2 &pcfg_pull_none_4ma>, /* mac_txen */ - <1 RK_PD1 2 &pcfg_pull_none_12ma>, + <1 RK_PD1 2 &pcfg_pull_none_8ma>, /* mac_clk */ - <1 RK_PC5 2 &pcfg_pull_none_2ma>, + <1 RK_PC5 2 &pcfg_pull_none_4ma>, /* mac_rxdv */ - <1 RK_PC6 2 &pcfg_pull_none_2ma>, + <1 RK_PC6 2 &pcfg_pull_none_4ma>, /* mac_mdc */ - <1 RK_PC7 2 &pcfg_pull_none_2ma>, + <1 RK_PC7 2 &pcfg_pull_none_4ma>, /* mac_rxd1 */ - <1 RK_PB2 2 &pcfg_pull_none_2ma>, + <1 RK_PB2 2 &pcfg_pull_none_4ma>, /* mac_rxd0 */ - <1 RK_PB3 2 &pcfg_pull_none_2ma>, + <1 RK_PB3 2 &pcfg_pull_none_4ma>, /* mac_txd1 */ - <1 RK_PB0 2 &pcfg_pull_none_12ma>, + <1 RK_PB0 2 &pcfg_pull_none_8ma>, /* mac_txd0 */ - <1 RK_PB1 2 &pcfg_pull_none_12ma>, + <1 RK_PB1 2 &pcfg_pull_none_8ma>, /* mac_rxd3 */ - <1 RK_PB6 2 &pcfg_pull_none_2ma>, + <1 RK_PB6 2 &pcfg_pull_none_4ma>, /* mac_rxd2 */ - <1 RK_PB7 2 &pcfg_pull_none_2ma>, + <1 RK_PB7 2 &pcfg_pull_none_4ma>, /* mac_txd3 */ - <1 RK_PC0 2 &pcfg_pull_none_12ma>, + <1 RK_PC0 2 &pcfg_pull_none_8ma>, /* mac_txd2 */ - <1 RK_PC1 2 &pcfg_pull_none_12ma>, + <1 RK_PC1 2 &pcfg_pull_none_8ma>,
/* mac_txclk */ - <0 RK_PB0 1 &pcfg_pull_none>, + <0 RK_PB0 1 &pcfg_pull_none_8ma>, /* mac_txen */ - <0 RK_PB4 1 &pcfg_pull_none>, + <0 RK_PB4 1 &pcfg_pull_none_8ma>, /* mac_clk */ - <0 RK_PD0 1 &pcfg_pull_none>, + <0 RK_PD0 1 &pcfg_pull_none_4ma>, /* mac_txd1 */ - <0 RK_PC0 1 &pcfg_pull_none>, + <0 RK_PC0 1 &pcfg_pull_none_8ma>, /* mac_txd0 */ - <0 RK_PC1 1 &pcfg_pull_none>, + <0 RK_PC1 1 &pcfg_pull_none_8ma>, /* mac_txd3 */ - <0 RK_PC7 1 &pcfg_pull_none>, + <0 RK_PC7 1 &pcfg_pull_none_8ma>, /* mac_txd2 */ - <0 RK_PC6 1 &pcfg_pull_none>; + <0 RK_PC6 1 &pcfg_pull_none_8ma>; };
rmiim1_pins: rmiim1-pins {
From: Will Deacon will.deacon@arm.com
commit 1e6f5440a6814d28c32d347f338bfef68bc3e69d upstream.
Calling dump_backtrace() with a pt_regs argument corresponding to userspace doesn't make any sense and our unwinder will simply print "Call trace:" before unwinding the stack looking for user frames.
Rather than go through this song and dance, just return early if we're passed a user register state.
Cc: stable@vger.kernel.org Fixes: 1149aad10b1e ("arm64: Add dump_backtrace() in show_regs") Reported-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/kernel/traps.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
--- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c @@ -101,10 +101,16 @@ static void dump_instr(const char *lvl, void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) { struct stackframe frame; - int skip; + int skip = 0;
pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
+ if (regs) { + if (user_mode(regs)) + return; + skip = 1; + } + if (!tsk) tsk = current;
@@ -125,7 +131,6 @@ void dump_backtrace(struct pt_regs *regs frame.graph = tsk->curr_ret_stack; #endif
- skip = !!regs; printk("Call trace:\n"); do { /* skip until specified stack frame */ @@ -175,15 +180,13 @@ static int __die(const char *str, int er return ret;
print_modules(); - __show_regs(regs); pr_emerg("Process %.*s (pid: %d, stack limit = 0x%p)\n", TASK_COMM_LEN, tsk->comm, task_pid_nr(tsk), end_of_stack(tsk)); + show_regs(regs);
- if (!user_mode(regs)) { - dump_backtrace(regs, tsk); + if (!user_mode(regs)) dump_instr(KERN_EMERG, regs); - }
return ret; }
From: Dan Carpenter dan.carpenter@oracle.com
commit 42d8644bd77dd2d747e004e367cb0c895a606f39 upstream.
The "call" variable comes from the user in privcmd_ioctl_hypercall(). It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32) elements. We need to put an upper bound on it to prevent an out of bounds access.
Cc: stable@vger.kernel.org Fixes: 1246ae0bb992 ("xen: add variable hypercall caller") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/xen/hypercall.h | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -206,6 +206,9 @@ xen_single_call(unsigned int call, __HYPERCALL_DECLS; __HYPERCALL_5ARG(a1, a2, a3, a4, a5);
+ if (call >= PAGE_SIZE / sizeof(hypercall_page[0])) + return -EINVAL; + asm volatile(CALL_NOSPEC : __HYPERCALL_5PARAM : [thunk_target] "a" (&hypercall_page[call])
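The fix is a plain upper-bound check on a user-controlled index before it is used to address a fixed-size table. A generic sketch of the same pattern (hypothetical table, not the Xen hypercall page):

#include <errno.h>
#include <stdio.h>

#define TABLE_ENTRIES 128	/* stand-in for PAGE_SIZE / 32 entries */

static long table[TABLE_ENTRIES];

static long lookup(unsigned long call)
{
	/* Reject out-of-range indices before touching the table. */
	if (call >= TABLE_ENTRIES)
		return -EINVAL;
	return table[call];
}

int main(void)
{
	printf("%ld\n", lookup(5));	/* in range: returns the entry */
	printf("%ld\n", lookup(4096));	/* rejected instead of reading out of bounds */
	return 0;
}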
From: Mel Gorman mgorman@techsingularity.net
commit 0e9f02450da07fc7b1346c8c32c771555173e397 upstream.
A NULL pointer dereference bug was reported on a distribution kernel, but the same issue should be present in the mainline kernel. It occurred on s390 but should not be arch-specific. A partial oops looks like:
Unable to handle kernel pointer dereference in virtual kernel address space ... Call Trace: ... try_to_wake_up+0xfc/0x450 vhost_poll_wakeup+0x3a/0x50 [vhost] __wake_up_common+0xbc/0x178 __wake_up_common_lock+0x9e/0x160 __wake_up_sync_key+0x4e/0x60 sock_def_readable+0x5e/0x98
The bug hits at any time between 1 hour and 3 days. The dereference occurs in update_cfs_rq_h_load when accumulating h_load. The problem is that cfs_rq->h_load_next is not protected by any locking and can be updated by parallel calls to task_h_load. Depending on the compiler, code may be generated that re-reads cfs_rq->h_load_next after the check for NULL and then oopses when reading se->avg.load_avg. The disassembly showed that it was possible to re-read h_load_next after the check for NULL.
While this does not appear to be an issue for later compilers, it is still only by accident that the correct code is generated. Full locking in this path would have high overhead, so this patch uses READ_ONCE to read h_load_next only once and checks it for NULL before dereferencing. It was confirmed that there were no further oopses after 10 days of testing.
As Peter pointed out, it is also necessary to use WRITE_ONCE() to avoid any potential problems with store tearing.
Signed-off-by: Mel Gorman mgorman@techsingularity.net Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Valentin Schneider valentin.schneider@arm.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mike Galbraith efault@gmx.de Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Fixes: 685207963be9 ("sched: Move h_load calculation to task_h_load()") Link: https://lkml.kernel.org/r/20190319123610.nsivgf3mjbjjesxb@techsingularity.ne... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/sched/fair.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7437,10 +7437,10 @@ static void update_cfs_rq_h_load(struct if (cfs_rq->last_h_load_update == now) return;
- cfs_rq->h_load_next = NULL; + WRITE_ONCE(cfs_rq->h_load_next, NULL); for_each_sched_entity(se) { cfs_rq = cfs_rq_of(se); - cfs_rq->h_load_next = se; + WRITE_ONCE(cfs_rq->h_load_next, se); if (cfs_rq->last_h_load_update == now) break; } @@ -7450,7 +7450,7 @@ static void update_cfs_rq_h_load(struct cfs_rq->last_h_load_update = now; }
- while ((se = cfs_rq->h_load_next) != NULL) { + while ((se = READ_ONCE(cfs_rq->h_load_next)) != NULL) { load = cfs_rq->h_load; load = div64_ul(load * se->avg.load_avg, cfs_rq_load_avg(cfs_rq) + 1);
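The essence of the fix is "load the shared pointer once, then test and use that one snapshot". A userspace sketch with simplified READ_ONCE/WRITE_ONCE stand-ins (not the kernel macros; the structs are reduced to the fields needed here):

#include <stdio.h>

/* Simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(). */
#define READ_ONCE(x)		(*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

struct sched_entity {
	unsigned long load_avg;
	struct sched_entity *next;	/* stand-in for the hierarchy walk */
};

struct cfs_rq {
	struct sched_entity *h_load_next;	/* may be rewritten by other CPUs */
};

static unsigned long accumulate(struct cfs_rq *cfs_rq)
{
	struct sched_entity *se;
	unsigned long load = 0;

	/* One load per iteration: the NULL test and the dereference use the
	 * same snapshot, so the compiler cannot legally re-read the shared
	 * pointer in between (that re-read was the source of the oops). */
	while ((se = READ_ONCE(cfs_rq->h_load_next)) != NULL) {
		load += se->load_avg;
		WRITE_ONCE(cfs_rq->h_load_next, se->next);
	}
	return load;
}

int main(void)
{
	struct sched_entity leaf = { .load_avg = 512, .next = NULL };
	struct sched_entity root = { .load_avg = 1024, .next = &leaf };
	struct cfs_rq rq = { .h_load_next = &root };

	printf("load=%lu\n", accumulate(&rq));	/* 1536 */
	return 0;
}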
From: Max Filippov jcmvbkbc@gmail.com
commit ada770b1e74a77fff2d5f539bf6c42c25f4784db upstream.
return_address returns the address that is one level higher in the call stack than requested in its argument, because level 0 corresponds to its caller's return address. Use the requested level as the number of stack frames to skip.
This fixes the address reported by might_sleep and friends.
Cc: stable@vger.kernel.org Signed-off-by: Max Filippov jcmvbkbc@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/xtensa/kernel/stacktrace.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/arch/xtensa/kernel/stacktrace.c +++ b/arch/xtensa/kernel/stacktrace.c @@ -253,10 +253,14 @@ static int return_address_cb(struct stac return 1; }
+/* + * level == 0 is for the return address from the caller of this function, + * not from this function itself. + */ unsigned long return_address(unsigned level) { struct return_addr_data r = { - .skip = level + 1, + .skip = level, }; walk_stackframe(stack_pointer(NULL), return_address_cb, &r); return r.addr;
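The off-by-one is easiest to see with the callback in isolation: if the walker's first callback already corresponds to the caller of return_address(), then level is exactly the number of frames to skip, and level + 1 skips one too many. A standalone sketch over a fake frame list (not the xtensa unwinder):

#include <stdio.h>

struct return_addr_data {
	unsigned int skip;
	unsigned long addr;
};

/* Called once per frame, innermost first; returns non-zero to stop walking. */
static int return_address_cb(unsigned long pc, void *data)
{
	struct return_addr_data *r = data;

	if (r->skip) {
		r->skip--;
		return 0;
	}
	r->addr = pc;
	return 1;
}

int main(void)
{
	/* Fake return addresses, [0] being the caller of return_address(). */
	unsigned long frames[] = { 0x1000, 0x2000, 0x3000 };
	unsigned int level = 1;				/* want the caller's caller */
	struct return_addr_data r = { .skip = level };	/* fixed: skip = level */

	for (unsigned int i = 0; i < 3 && !return_address_cb(frames[i], &r); i++)
		;
	printf("level %u -> %#lx\n", level, r.addr);	/* 0x2000; skip = level + 1 would give 0x3000 */
	return 0;
}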
From: Rasmus Villemoes linux@rasmusvillemoes.dk
commit 88ca66d8540ca26119b1428cddb96b37925bdf01 upstream.
The minimum supported gcc version is >= 4.6, so these can be removed.
Signed-off-by: Rasmus Villemoes linux@rasmusvillemoes.dk Signed-off-by: Borislav Petkov bp@suse.de Cc: "H. Peter Anvin" hpa@zytor.com Cc: Dan Williams dan.j.williams@intel.com Cc: Geert Uytterhoeven geert@linux-m68k.org Cc: Ingo Molnar mingo@redhat.com Cc: Matthew Wilcox willy@infradead.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: x86-ml x86@kernel.org Link: https://lkml.kernel.org/r/20190111084931.24601-1-linux@rasmusvillemoes.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/bitops.h | 6 ------ arch/x86/include/asm/string_32.h | 20 -------------------- arch/x86/include/asm/string_64.h | 15 --------------- 3 files changed, 41 deletions(-)
--- a/arch/x86/include/asm/bitops.h +++ b/arch/x86/include/asm/bitops.h @@ -36,13 +36,7 @@ * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1). */
-#if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 1) -/* Technically wrong, but this avoids compilation errors on some gcc - versions. */ -#define BITOP_ADDR(x) "=m" (*(volatile long *) (x)) -#else #define BITOP_ADDR(x) "+m" (*(volatile long *) (x)) -#endif
#define ADDR BITOP_ADDR(addr)
--- a/arch/x86/include/asm/string_32.h +++ b/arch/x86/include/asm/string_32.h @@ -179,14 +179,7 @@ static inline void *__memcpy3d(void *to, * No 3D Now! */
-#if (__GNUC__ >= 4) #define memcpy(t, f, n) __builtin_memcpy(t, f, n) -#else -#define memcpy(t, f, n) \ - (__builtin_constant_p((n)) \ - ? __constant_memcpy((t), (f), (n)) \ - : __memcpy((t), (f), (n))) -#endif
#endif #endif /* !CONFIG_FORTIFY_SOURCE */ @@ -282,12 +275,7 @@ void *__constant_c_and_count_memset(void
{ int d0, d1; -#if __GNUC__ == 4 && __GNUC_MINOR__ == 0 - /* Workaround for broken gcc 4.0 */ - register unsigned long eax asm("%eax") = pattern; -#else unsigned long eax = pattern; -#endif
switch (count % 4) { case 0: @@ -321,15 +309,7 @@ void *__constant_c_and_count_memset(void #define __HAVE_ARCH_MEMSET extern void *memset(void *, int, size_t); #ifndef CONFIG_FORTIFY_SOURCE -#if (__GNUC__ >= 4) #define memset(s, c, count) __builtin_memset(s, c, count) -#else -#define memset(s, c, count) \ - (__builtin_constant_p(c) \ - ? __constant_c_x_memset((s), (0x01010101UL * (unsigned char)(c)), \ - (count)) \ - : __memset((s), (c), (count))) -#endif #endif /* !CONFIG_FORTIFY_SOURCE */
#define __HAVE_ARCH_MEMSET16 --- a/arch/x86/include/asm/string_64.h +++ b/arch/x86/include/asm/string_64.h @@ -32,21 +32,6 @@ static __always_inline void *__inline_me extern void *memcpy(void *to, const void *from, size_t len); extern void *__memcpy(void *to, const void *from, size_t len);
-#ifndef CONFIG_FORTIFY_SOURCE -#if (__GNUC__ == 4 && __GNUC_MINOR__ < 3) || __GNUC__ < 4 -#define memcpy(dst, src, len) \ -({ \ - size_t __len = (len); \ - void *__ret; \ - if (__builtin_constant_p(len) && __len >= 64) \ - __ret = __memcpy((dst), (src), __len); \ - else \ - __ret = __builtin_memcpy((dst), (src), __len); \ - __ret; \ -}) -#endif -#endif /* !CONFIG_FORTIFY_SOURCE */ - #define __HAVE_ARCH_MEMSET void *memset(void *s, int c, size_t n); void *__memset(void *s, int c, size_t n);
From: Alexander Potapenko glider@google.com
commit 5b77e95dd7790ff6c8fbf1cd8d0104ebed818a03 upstream.
There are a number of problems with how arch/x86/include/asm/bitops.h currently uses assembly constraints for the memory region the bitops are modifying:
1) Use memory clobber in bitops that touch arbitrary memory
Certain bit operations that read/write bits take a base pointer and an arbitrarily large offset to address the bit relative to that base. Inline assembly constraints aren't expressive enough to tell the compiler that the assembly directive is going to touch a specific memory location of unknown size, therefore we have to use the "memory" clobber to indicate that the assembly is going to access memory locations other than those listed in the inputs/outputs.
To indicate that BTR/BTS instructions don't necessarily touch the first sizeof(long) bytes of the argument, we also move the address to assembly inputs.
This particular change leads to a size increase of 124 kernel functions in a defconfig build. For some of them the diff is only in NOP operations; others end up re-reading values from memory and may potentially slow down execution. But without these clobbers the compiler is free to cache the contents of the bitmaps and use them as if they weren't changed by the inline assembly.
2) Use byte-sized arguments for operations touching single bytes.
Passing a long value to ANDB/ORB/XORB instructions makes the compiler treat sizeof(long) bytes as being clobbered, which isn't the case. This may theoretically lead to worse code in the case of heavy optimization.
Practical impact:
I've built a defconfig kernel and looked through some of the functions generated by GCC 7.3.0 with and without this clobber, and didn't spot any miscompilations.
However there is a (trivial) theoretical case where this code leads to miscompilation:
https://lkml.org/lkml/2019/3/28/393
using just GCC 8.3.0 with -O2. It isn't hard to imagine someone writing such a function in the kernel someday.
So the primary motivation is to fix an existing misuse of the asm directive, which happens to work in certain configurations now, but isn't guaranteed to work under different circumstances.
[ --mingo: Added -stable tag because defconfig only builds a fraction of the kernel and the trivial testcase looks normal enough to be used in existing or in-development code. ]
Signed-off-by: Alexander Potapenko glider@google.com Cc: stable@vger.kernel.org Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Brian Gerst brgerst@gmail.com Cc: Denys Vlasenko dvlasenk@redhat.com Cc: Dmitry Vyukov dvyukov@google.com Cc: H. Peter Anvin hpa@zytor.com Cc: James Y Knight jyknight@google.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Paul E. McKenney paulmck@linux.ibm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/20190402112813.193378-1-glider@google.com [ Edited the changelog, tidied up one of the defines. ] Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/bitops.h | 41 ++++++++++++++++++----------------------- 1 file changed, 18 insertions(+), 23 deletions(-)
--- a/arch/x86/include/asm/bitops.h +++ b/arch/x86/include/asm/bitops.h @@ -36,16 +36,17 @@ * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1). */
-#define BITOP_ADDR(x) "+m" (*(volatile long *) (x)) +#define RLONG_ADDR(x) "m" (*(volatile long *) (x)) +#define WBYTE_ADDR(x) "+m" (*(volatile char *) (x))
-#define ADDR BITOP_ADDR(addr) +#define ADDR RLONG_ADDR(addr)
/* * We do the locked ops that don't return the old value as * a mask operation on a byte. */ #define IS_IMMEDIATE(nr) (__builtin_constant_p(nr)) -#define CONST_MASK_ADDR(nr, addr) BITOP_ADDR((void *)(addr) + ((nr)>>3)) +#define CONST_MASK_ADDR(nr, addr) WBYTE_ADDR((void *)(addr) + ((nr)>>3)) #define CONST_MASK(nr) (1 << ((nr) & 7))
/** @@ -73,7 +74,7 @@ set_bit(long nr, volatile unsigned long : "memory"); } else { asm volatile(LOCK_PREFIX __ASM_SIZE(bts) " %1,%0" - : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); + : : RLONG_ADDR(addr), "Ir" (nr) : "memory"); } }
@@ -88,7 +89,7 @@ set_bit(long nr, volatile unsigned long */ static __always_inline void __set_bit(long nr, volatile unsigned long *addr) { - asm volatile(__ASM_SIZE(bts) " %1,%0" : ADDR : "Ir" (nr) : "memory"); + asm volatile(__ASM_SIZE(bts) " %1,%0" : : ADDR, "Ir" (nr) : "memory"); }
/** @@ -110,8 +111,7 @@ clear_bit(long nr, volatile unsigned lon : "iq" ((u8)~CONST_MASK(nr))); } else { asm volatile(LOCK_PREFIX __ASM_SIZE(btr) " %1,%0" - : BITOP_ADDR(addr) - : "Ir" (nr)); + : : RLONG_ADDR(addr), "Ir" (nr) : "memory"); } }
@@ -131,7 +131,7 @@ static __always_inline void clear_bit_un
static __always_inline void __clear_bit(long nr, volatile unsigned long *addr) { - asm volatile(__ASM_SIZE(btr) " %1,%0" : ADDR : "Ir" (nr)); + asm volatile(__ASM_SIZE(btr) " %1,%0" : : ADDR, "Ir" (nr) : "memory"); }
static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr) @@ -139,7 +139,7 @@ static __always_inline bool clear_bit_un bool negative; asm volatile(LOCK_PREFIX "andb %2,%1" CC_SET(s) - : CC_OUT(s) (negative), ADDR + : CC_OUT(s) (negative), WBYTE_ADDR(addr) : "ir" ((char) ~(1 << nr)) : "memory"); return negative; } @@ -155,13 +155,9 @@ static __always_inline bool clear_bit_un * __clear_bit() is non-atomic and implies release semantics before the memory * operation. It can be used for an unlock if no other CPUs can concurrently * modify other bits in the word. - * - * No memory barrier is required here, because x86 cannot reorder stores past - * older loads. Same principle as spin_unlock. */ static __always_inline void __clear_bit_unlock(long nr, volatile unsigned long *addr) { - barrier(); __clear_bit(nr, addr); }
@@ -176,7 +172,7 @@ static __always_inline void __clear_bit_ */ static __always_inline void __change_bit(long nr, volatile unsigned long *addr) { - asm volatile(__ASM_SIZE(btc) " %1,%0" : ADDR : "Ir" (nr)); + asm volatile(__ASM_SIZE(btc) " %1,%0" : : ADDR, "Ir" (nr) : "memory"); }
/** @@ -196,8 +192,7 @@ static __always_inline void change_bit(l : "iq" ((u8)CONST_MASK(nr))); } else { asm volatile(LOCK_PREFIX __ASM_SIZE(btc) " %1,%0" - : BITOP_ADDR(addr) - : "Ir" (nr)); + : : RLONG_ADDR(addr), "Ir" (nr) : "memory"); } }
@@ -243,8 +238,8 @@ static __always_inline bool __test_and_s
asm(__ASM_SIZE(bts) " %2,%1" CC_SET(c) - : CC_OUT(c) (oldbit), ADDR - : "Ir" (nr)); + : CC_OUT(c) (oldbit) + : ADDR, "Ir" (nr) : "memory"); return oldbit; }
@@ -284,8 +279,8 @@ static __always_inline bool __test_and_c
asm volatile(__ASM_SIZE(btr) " %2,%1" CC_SET(c) - : CC_OUT(c) (oldbit), ADDR - : "Ir" (nr)); + : CC_OUT(c) (oldbit) + : ADDR, "Ir" (nr) : "memory"); return oldbit; }
@@ -296,8 +291,8 @@ static __always_inline bool __test_and_c
asm volatile(__ASM_SIZE(btc) " %2,%1" CC_SET(c) - : CC_OUT(c) (oldbit), ADDR - : "Ir" (nr) : "memory"); + : CC_OUT(c) (oldbit) + : ADDR, "Ir" (nr) : "memory");
return oldbit; } @@ -329,7 +324,7 @@ static __always_inline bool variable_tes asm volatile(__ASM_SIZE(bt) " %2,%1" CC_SET(c) : CC_OUT(c) (oldbit) - : "m" (*(unsigned long *)addr), "Ir" (nr)); + : "m" (*(unsigned long *)addr), "Ir" (nr) : "memory");
return oldbit; }
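A self-contained illustration of the constraint style the patch moves to: the bit base is passed as a plain "m" input (the instruction may touch memory far past the first long) and a "memory" clobber tells the compiler that any cached memory values may be stale. This mirrors the idea only; it is x86-64 + GCC/Clang specific and uses an explicitly sized btsq with an "r" constraint rather than the kernel's exact macros:

#include <stdio.h>

/* Set bit `nr` relative to `addr`; the bit may land far beyond *addr itself,
 * so the memory operand is an input and the write is described by a
 * "memory" clobber instead of pretending only the first long changes. */
static inline void my_set_bit(long nr, volatile unsigned long *addr)
{
#if defined(__x86_64__)
	asm volatile("btsq %1, %0"
		     : : "m" (*(volatile long *)addr), "r" (nr) : "memory");
#else
	addr[nr / (8 * sizeof(long))] |= 1UL << (nr % (8 * sizeof(long)));
#endif
}

int main(void)
{
	unsigned long bitmap[4] = { 0 };

	my_set_bit(3, bitmap);
	my_set_bit(100, bitmap);	/* lands past the first long of the bitmap */
	printf("%lx %lx\n", bitmap[0], bitmap[1]);
	return 0;
}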
From: Lendacky, Thomas Thomas.Lendacky@amd.com
commit 914123fa39042e651d79eaf86bbf63a1b938dddf upstream.
On AMD processors, the detection of an overflowed counter in the NMI handler relies on the current value of the counter. So, for example, to check for overflow on a 48 bit counter, bit 47 is checked to see if it is 1 (not overflowed) or 0 (overflowed).
There is currently a race condition present when disabling and then updating the PMC. Increased NMI latency in newer AMD processors makes this race condition more pronounced. If the counter value has overflowed, it is possible to update the PMC value before the NMI handler can run. The updated PMC value is not an overflowed value, so when the perf NMI handler does run, it will not find an overflowed counter. This may appear as an unknown NMI resulting in either a panic or a series of messages, depending on how the kernel is configured.
To eliminate this race condition, the PMC value must be checked after disabling the counter. Add an AMD function, amd_pmu_disable_all(), that will wait for the NMI handler to reset any active and overflowed counter after calling x86_pmu_disable_all().
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org # 4.14.x- Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@kernel.org Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/amd/core.c | 65 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 62 insertions(+), 3 deletions(-)
--- a/arch/x86/events/amd/core.c +++ b/arch/x86/events/amd/core.c @@ -3,6 +3,7 @@ #include <linux/types.h> #include <linux/init.h> #include <linux/slab.h> +#include <linux/delay.h> #include <asm/apicdef.h>
#include "../perf_event.h" @@ -429,6 +430,64 @@ static void amd_pmu_cpu_dead(int cpu) } }
+/* + * When a PMC counter overflows, an NMI is used to process the event and + * reset the counter. NMI latency can result in the counter being updated + * before the NMI can run, which can result in what appear to be spurious + * NMIs. This function is intended to wait for the NMI to run and reset + * the counter to avoid possible unhandled NMI messages. + */ +#define OVERFLOW_WAIT_COUNT 50 + +static void amd_pmu_wait_on_overflow(int idx) +{ + unsigned int i; + u64 counter; + + /* + * Wait for the counter to be reset if it has overflowed. This loop + * should exit very, very quickly, but just in case, don't wait + * forever... + */ + for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) { + rdmsrl(x86_pmu_event_addr(idx), counter); + if (counter & (1ULL << (x86_pmu.cntval_bits - 1))) + break; + + /* Might be in IRQ context, so can't sleep */ + udelay(1); + } +} + +static void amd_pmu_disable_all(void) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + int idx; + + x86_pmu_disable_all(); + + /* + * This shouldn't be called from NMI context, but add a safeguard here + * to return, since if we're in NMI context we can't wait for an NMI + * to reset an overflowed counter value. + */ + if (in_nmi()) + return; + + /* + * Check each counter for overflow and wait for it to be reset by the + * NMI if it has overflowed. This relies on the fact that all active + * counters are always enabled when this function is caled and + * ARCH_PERFMON_EVENTSEL_INT is always set. + */ + for (idx = 0; idx < x86_pmu.num_counters; idx++) { + if (!test_bit(idx, cpuc->active_mask)) + continue; + + amd_pmu_wait_on_overflow(idx); + } +} + static struct event_constraint * amd_get_event_constraints(struct cpu_hw_events *cpuc, int idx, struct perf_event *event) @@ -622,7 +681,7 @@ static ssize_t amd_event_sysfs_show(char static __initconst const struct x86_pmu amd_pmu = { .name = "AMD", .handle_irq = x86_pmu_handle_irq, - .disable_all = x86_pmu_disable_all, + .disable_all = amd_pmu_disable_all, .enable_all = x86_pmu_enable_all, .enable = x86_pmu_enable_event, .disable = x86_pmu_disable_event, @@ -728,7 +787,7 @@ void amd_pmu_enable_virt(void) cpuc->perf_ctr_virt_mask = 0;
/* Reload all events */ - x86_pmu_disable_all(); + amd_pmu_disable_all(); x86_pmu_enable_all(0); } EXPORT_SYMBOL_GPL(amd_pmu_enable_virt); @@ -746,7 +805,7 @@ void amd_pmu_disable_virt(void) cpuc->perf_ctr_virt_mask = AMD64_EVENTSEL_HOSTONLY;
/* Reload all events */ - x86_pmu_disable_all(); + amd_pmu_disable_all(); x86_pmu_enable_all(0); } EXPORT_SYMBOL_GPL(amd_pmu_disable_virt);
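The disable path boils down to "poll the counter's top implemented bit a bounded number of times, giving the in-flight NMI a chance to reset it". A standalone sketch of that bounded wait, with a simulated counter read in place of rdmsrl() (read_counter() and its values are made up):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CNTVAL_BITS		48
#define OVERFLOW_WAIT_COUNT	50

/* Simulated counter read; the real code reads the PMC MSR with rdmsrl(). */
static uint64_t read_counter(int idx)
{
	static const uint64_t fake[2] = {
		0x0000800000000010ULL,	/* bit 47 set: not (or no longer) overflowed */
		0x0000000000000123ULL,	/* bit 47 clear: still shows an overflow */
	};
	return fake[idx];
}

/* Returns true if the counter stopped showing an overflow within the budget. */
static bool wait_on_overflow(int idx)
{
	for (unsigned int i = 0; i < OVERFLOW_WAIT_COUNT; i++) {
		uint64_t counter = read_counter(idx);

		if (counter & (1ULL << (CNTVAL_BITS - 1)))
			return true;
		/* the real code does udelay(1) here; it must not sleep */
	}
	return false;	/* gave up after a bounded number of polls */
}

int main(void)
{
	printf("idx 0: %s\n", wait_on_overflow(0) ? "ok" : "timed out");
	printf("idx 1: %s\n", wait_on_overflow(1) ? "ok" : "timed out");
	return 0;
}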
From: Lendacky, Thomas Thomas.Lendacky@amd.com
commit 6d3edaae16c6c7d238360f2841212c2b26774d5e upstream.
On AMD processors, the detection of an overflowed PMC counter in the NMI handler relies on the current value of the PMC. So, for example, to check for overflow on a 48-bit counter, bit 47 is checked to see if it is 1 (not overflowed) or 0 (overflowed).
When the perf NMI handler executes, it does not know in advance which PMC counters have overflowed. As such, the NMI handler will process all active PMC counters that have overflowed. NMI latency in newer AMD processors can result in multiple overflowed PMC counters being processed in one NMI, and then a subsequent NMI, which does not appear to be a back-to-back NMI, finding no overflowed PMC counters to process. This may appear to be an unhandled NMI, resulting in either a panic or a series of messages, depending on how the kernel was configured.
To mitigate this issue, add an AMD handle_irq callback function, amd_pmu_handle_irq(), that invokes the common x86_pmu_handle_irq() function and, upon return, performs some additional processing to indicate whether the NMI has been handled or would have been handled had an earlier NMI not handled the overflowed PMC. Whenever a PMC is active, a per-CPU variable is set to the smaller of the number of active PMCs and 2. This is used to indicate the number of NMIs that may still arrive. The value of 2 covers the case where an NMI does not reach the LAPIC in time to be collapsed into an already pending NMI. Each time the handler is called without having handled an overflowed counter, the per-CPU value is checked: if it is non-zero, it is decremented and the handler reports that it handled the NMI; if it is zero, the handler reports that it did not.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org # 4.14.x- Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@kernel.org Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/amd/core.c | 56 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 55 insertions(+), 1 deletion(-)
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -4,10 +4,13 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/delay.h>
+#include <linux/nmi.h>
 #include <asm/apicdef.h>

 #include "../perf_event.h"

+static DEFINE_PER_CPU(unsigned int, perf_nmi_counter);
+
 static __initconst const u64 amd_hw_cache_event_ids
				[PERF_COUNT_HW_CACHE_MAX]
				[PERF_COUNT_HW_CACHE_OP_MAX]
@@ -488,6 +491,57 @@ static void amd_pmu_disable_all(void)
 	}
 }

+/*
+ * Because of NMI latency, if multiple PMC counters are active or other sources
+ * of NMIs are received, the perf NMI handler can handle one or more overflowed
+ * PMC counters outside of the NMI associated with the PMC overflow. If the NMI
+ * doesn't arrive at the LAPIC in time to become a pending NMI, then the kernel
+ * back-to-back NMI support won't be active. This PMC handler needs to take into
+ * account that this can occur, otherwise this could result in unknown NMI
+ * messages being issued. Examples of this are PMC overflow while in the NMI
+ * handler when multiple PMCs are active or PMC overflow while handling some
+ * other source of an NMI.
+ *
+ * Attempt to mitigate this by using the number of active PMCs to determine
+ * whether to return NMI_HANDLED if the perf NMI handler did not handle/reset
+ * any PMCs. The per-CPU perf_nmi_counter variable is set to a minimum of the
+ * number of active PMCs or 2. The value of 2 is used in case an NMI does not
+ * arrive at the LAPIC in time to be collapsed into an already pending NMI.
+ */
+static int amd_pmu_handle_irq(struct pt_regs *regs)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int active, handled;
+
+	/*
+	 * Obtain the active count before calling x86_pmu_handle_irq() since
+	 * it is possible that x86_pmu_handle_irq() may make a counter
+	 * inactive (through x86_pmu_stop).
+	 */
+	active = __bitmap_weight(cpuc->active_mask, X86_PMC_IDX_MAX);
+
+	/* Process any counter overflows */
+	handled = x86_pmu_handle_irq(regs);
+
+	/*
+	 * If a counter was handled, record the number of possible remaining
+	 * NMIs that can occur.
+	 */
+	if (handled) {
+		this_cpu_write(perf_nmi_counter,
+			       min_t(unsigned int, 2, active));
+
+		return handled;
+	}
+
+	if (!this_cpu_read(perf_nmi_counter))
+		return NMI_DONE;
+
+	this_cpu_dec(perf_nmi_counter);
+
+	return NMI_HANDLED;
+}
+
 static struct event_constraint *
 amd_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
			   struct perf_event *event)
@@ -680,7 +734,7 @@ static ssize_t amd_event_sysfs_show(char

 static __initconst const struct x86_pmu amd_pmu = {
 	.name			= "AMD",
-	.handle_irq		= x86_pmu_handle_irq,
+	.handle_irq		= amd_pmu_handle_irq,
 	.disable_all		= amd_pmu_disable_all,
 	.enable_all		= x86_pmu_enable_all,
 	.enable			= x86_pmu_enable_event,
From: Lendacky, Thomas Thomas.Lendacky@amd.com
commit 3966c3feca3fd10b2935caa0b4a08c7dd59469e5 upstream.
Spurious interrupt support was added to perf in the following commit, almost a decade ago:
63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters")
The two previous patches (resolving the race condition when disabling a PMC and NMI latency mitigation) allow for the removal of this older spurious interrupt support.
Currently in x86_pmu_stop(), the bit for the PMC in the active_mask bitmap is cleared before disabling the PMC, which sets up a race condition. This race condition was mitigated by introducing the running bitmap. That race condition can be eliminated by first disabling the PMC, waiting for PMC reset on overflow and then clearing the bit for the PMC in the active_mask bitmap. The NMI handler will not re-enable a disabled counter.
If x86_pmu_stop() is called from the perf NMI handler, the NMI latency mitigation support will guard against any unhandled NMI messages.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org # 4.14.x- Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@kernel.org Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/amd/core.c | 21 +++++++++++++++++++-- arch/x86/events/core.c | 13 +++---------- 2 files changed, 22 insertions(+), 12 deletions(-)
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -4,8 +4,8 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/delay.h>
-#include <linux/nmi.h>
 #include <asm/apicdef.h>
+#include <asm/nmi.h>

 #include "../perf_event.h"

@@ -491,6 +491,23 @@ static void amd_pmu_disable_all(void)
 	}
 }

+static void amd_pmu_disable_event(struct perf_event *event)
+{
+	x86_pmu_disable_event(event);
+
+	/*
+	 * This can be called from NMI context (via x86_pmu_stop). The counter
+	 * may have overflowed, but either way, we'll never see it get reset
+	 * by the NMI if we're already in the NMI. And the NMI latency support
+	 * below will take care of any pending NMI that might have been
+	 * generated by the overflow.
+	 */
+	if (in_nmi())
+		return;
+
+	amd_pmu_wait_on_overflow(event->hw.idx);
+}
+
 /*
  * Because of NMI latency, if multiple PMC counters are active or other sources
  * of NMIs are received, the perf NMI handler can handle one or more overflowed
@@ -738,7 +755,7 @@ static __initconst const struct x86_pmu
 	.disable_all		= amd_pmu_disable_all,
 	.enable_all		= x86_pmu_enable_all,
 	.enable			= x86_pmu_enable_event,
-	.disable		= x86_pmu_disable_event,
+	.disable		= amd_pmu_disable_event,
 	.hw_config		= amd_pmu_hw_config,
 	.schedule_events	= x86_schedule_events,
 	.eventsel		= MSR_K7_EVNTSEL0,
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1328,8 +1328,9 @@ void x86_pmu_stop(struct perf_event *eve
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct hw_perf_event *hwc = &event->hw;

-	if (__test_and_clear_bit(hwc->idx, cpuc->active_mask)) {
+	if (test_bit(hwc->idx, cpuc->active_mask)) {
 		x86_pmu.disable(event);
+		__clear_bit(hwc->idx, cpuc->active_mask);
 		cpuc->events[hwc->idx] = NULL;
 		WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
 		hwc->state |= PERF_HES_STOPPED;
@@ -1426,16 +1427,8 @@ int x86_pmu_handle_irq(struct pt_regs *r
 	apic_write(APIC_LVTPC, APIC_DM_NMI);

 	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
-		if (!test_bit(idx, cpuc->active_mask)) {
-			/*
-			 * Though we deactivated the counter some cpus
-			 * might still deliver spurious interrupts still
-			 * in flight. Catch them:
-			 */
-			if (__test_and_clear_bit(idx, cpuc->running))
-				handled++;
+		if (!test_bit(idx, cpuc->active_mask))
 			continue;
-		}
event = cpuc->events[idx];
From: Andre Przywara andre.przywara@arm.com
commit 9cde402a59770a0669d895399c13407f63d7d209 upstream.
I found a Marvell 88SE9170 PCIe SATA controller on a board here. Some quick testing with the ARM SMMU enabled reveals that it suffers from the same requester ID mixup problems as the other Marvell chips already listed.
Add the PCI vendor/device ID to the list of chips which need the workaround.
Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com CC: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/quirks.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3852,6 +3852,8 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_M
 /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c14 */
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9130,
			 quirk_dma_func1_alias);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9170,
+			 quirk_dma_func1_alias);
 /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c47 + c57 */
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9172,
			 quirk_dma_func1_alias);
From: Sergey Miroshnichenko s.miroshnichenko@yadro.com
commit 3943af9d01e94330d0cfac6fccdbc829aad50c92 upstream.
During a safe hot remove, the OS powers off the slot, which may cause a Data Link Layer State Changed event. The slot has already been set to OFF_STATE, so that event results in re-enabling the device, making it impossible to safely remove it.
Clear out the Presence Detect Changed and Data Link Layer State Changed events when the disabled slot has settled down.
If the device remains in the slot after it has been powered off via the Attention Button, it can still be re-enabled by pressing the button again.
Fixes the problem that Micah reported below: an NVMe drive power button may not actually turn off the drive.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203237 Reported-by: Micah Parrish micah.parrish@hpe.com Tested-by: Micah Parrish micah.parrish@hpe.com Signed-off-by: Sergey Miroshnichenko s.miroshnichenko@yadro.com [bhelgaas: changelog, add bugzilla URL] Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Lukas Wunner lukas@wunner.de Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/hotplug/pciehp_ctrl.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -117,6 +117,10 @@ static void remove_board(struct slot *p_
 	 * removed from the slot/adapter.
 	 */
 	msleep(1000);
+
+	/* Ignore link or presence changes caused by power off */
+	atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),
+		   &ctrl->pending_events);
 }
/* turn off Green LED */
From: Mikulas Patocka mpatocka@redhat.com
commit 0d74e6a3b6421d98eeafbed26f29156d469bc0b5 upstream.
If the string opt_string is short, memcmp() can access bytes beyond the terminating NUL character. In theory, this could cause a segfault if opt_string were located just below some unmapped memory.
Change from memcmp to strncmp so that we don't read bytes beyond the end of the string.
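A small user-space demonstration of the difference (illustrative only; the buffers here are padded so the memcmp() stays in bounds, unlike the short opt_string case described above):

#include <stdio.h>
#include <string.h>

int main(void)
{
	/* Identical text up to the NUL, different bytes after it. */
	char a[16] = "meta\0AAAAAAAAAA";
	char b[16] = "meta\0BBBBBBBBBB";

	/* memcmp() keeps comparing past the NUL; strncmp() stops at it. */
	printf("memcmp:  %d\n", memcmp(a, b, strlen("meta_device:")));	/* non-zero */
	printf("strncmp: %d\n", strncmp(a, b, strlen("meta_device:")));	/* 0 */
	return 0;
}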
Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-integrity.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -3174,7 +3174,7 @@ static int dm_integrity_ctr(struct dm_ta
 			journal_watermark = val;
 		else if (sscanf(opt_string, "commit_time:%u%c", &val, &dummy) == 1)
 			sync_msec = val;
-		else if (!memcmp(opt_string, "meta_device:", strlen("meta_device:"))) {
+		else if (!strncmp(opt_string, "meta_device:", strlen("meta_device:"))) {
 			if (ic->meta_dev) {
 				dm_put_device(ti, ic->meta_dev);
 				ic->meta_dev = NULL;
@@ -3193,17 +3193,17 @@ static int dm_integrity_ctr(struct dm_ta
 				goto bad;
 			}
 			ic->sectors_per_block = val >> SECTOR_SHIFT;
-		} else if (!memcmp(opt_string, "internal_hash:", strlen("internal_hash:"))) {
+		} else if (!strncmp(opt_string, "internal_hash:", strlen("internal_hash:"))) {
 			r = get_alg_and_key(opt_string, &ic->internal_hash_alg, &ti->error,
					    "Invalid internal_hash argument");
 			if (r)
 				goto bad;
-		} else if (!memcmp(opt_string, "journal_crypt:", strlen("journal_crypt:"))) {
+		} else if (!strncmp(opt_string, "journal_crypt:", strlen("journal_crypt:"))) {
 			r = get_alg_and_key(opt_string, &ic->journal_crypt_alg, &ti->error,
					    "Invalid journal_crypt argument");
 			if (r)
 				goto bad;
-		} else if (!memcmp(opt_string, "journal_mac:", strlen("journal_mac:"))) {
+		} else if (!strncmp(opt_string, "journal_mac:", strlen("journal_mac:"))) {
 			r = get_alg_and_key(opt_string, &ic->journal_mac_alg, &ti->error,
					    "Invalid journal_mac argument");
 			if (r)
From: Mikulas Patocka mpatocka@redhat.com
commit 75ae193626de3238ca5fb895868ec91c94e63b1b upstream.
The limit was already incorporated into dm-crypt with commit 4e870e948fba ("dm crypt: fix error with too large bios"), so we don't need to apply it globally to all targets. The quantity BIO_MAX_PAGES * PAGE_SIZE is wrong anyway, because the variable ti->max_io_len is supposed to be in units of 512-byte sectors, not bytes.
Reduction of the limit to 1048576 sectors could even cause data corruption in rare cases - suppose that we have a dm-striped device with a stripe size of 768MiB. The target will call dm_set_target_max_io_len with the value 1572864. The buggy code would reduce it to 1048576. Now, dm-core would erroneously split the bios on a 1048576-sector boundary instead of a 1572864-sector boundary and pass these stripe-crossing bios to the striped target.
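For reference, the arithmetic behind those numbers as a stand-alone sketch (BIO_MAX_PAGES = 256 and PAGE_SIZE = 4096 are the usual x86-64 values and are assumptions here, not taken from this patch):

#include <stdio.h>

int main(void)
{
	/* ti->max_io_len is meant to be in 512-byte sectors, not bytes. */
	unsigned long long stripe_bytes   = 768ULL << 20;	  /* 768 MiB stripe */
	unsigned long long stripe_sectors = stripe_bytes / 512;  /* 1572864 sectors */

	/* The old cap was a byte count (BIO_MAX_PAGES * PAGE_SIZE) misused as sectors. */
	unsigned long long buggy_cap = 256ULL * 4096;		  /* 1048576 */

	printf("stripe: %llu sectors, buggy cap: %llu sectors\n",
	       stripe_sectors, buggy_cap);
	return 0;
}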
Cc: stable@vger.kernel.org # v4.16+ Fixes: 8f50e358153d ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE") Signed-off-by: Mikulas Patocka mpatocka@redhat.com Acked-by: Ming Lei ming.lei@redhat.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-)
--- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1007,15 +1007,7 @@ int dm_set_target_max_io_len(struct dm_t return -EINVAL; }
-	/*
-	 * BIO based queue uses its own splitting. When multipage bvecs
-	 * is switched on, size of the incoming bio may be too big to
-	 * be handled in some targets, such as crypt.
-	 *
-	 * When these targets are ready for the big bio, we can remove
-	 * the limit.
-	 */
-	ti->max_io_len = min_t(uint32_t, len, BIO_MAX_PAGES * PAGE_SIZE);
+	ti->max_io_len = (uint32_t) len;

 	return 0;
 }
From: Ilya Dryomov idryomov@gmail.com
commit eb40c0acdc342b815d4d03ae6abb09e80c0f2988 upstream.
Some devices don't use blk_integrity but still want stable pages because they do their own checksumming. Examples include rbd and iSCSI when data digests are negotiated. Stacking DM (and thus LVM) on top of these devices results in sporadic checksum errors.
Set BDI_CAP_STABLE_WRITES if any underlying device has it set.
Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-table.c | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+)
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1872,6 +1872,36 @@ static bool dm_table_supports_secure_era
 	return true;
 }

+static int device_requires_stable_pages(struct dm_target *ti,
+					struct dm_dev *dev, sector_t start,
+					sector_t len, void *data)
+{
+	struct request_queue *q = bdev_get_queue(dev->bdev);
+
+	return q && bdi_cap_stable_pages_required(q->backing_dev_info);
+}
+
+/*
+ * If any underlying device requires stable pages, a table must require
+ * them as well. Only targets that support iterate_devices are considered:
+ * don't want error, zero, etc to require stable pages.
+ */
+static bool dm_table_requires_stable_pages(struct dm_table *t)
+{
+	struct dm_target *ti;
+	unsigned i;
+
+	for (i = 0; i < dm_table_get_num_targets(t); i++) {
+		ti = dm_table_get_target(t, i);
+
+		if (ti->type->iterate_devices &&
+		    ti->type->iterate_devices(ti, device_requires_stable_pages, NULL))
+			return true;
+	}
+
+	return false;
+}
+
 void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
			       struct queue_limits *limits)
 {
@@ -1930,6 +1960,15 @@ void dm_table_set_restrictions(struct dm
 	dm_table_verify_integrity(t);

 	/*
+	 * Some devices don't use blk_integrity but still want stable pages
+	 * because they do their own checksumming.
+	 */
+	if (dm_table_requires_stable_pages(t))
+		q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
+	else
+		q->backing_dev_info->capabilities &= ~BDI_CAP_STABLE_WRITES;
+
+	/*
 	 * Determine whether or not this queue's I/O timings contribute
 	 * to the entropy pool, Only request-based targets use this.
 	 * Clear QUEUE_FLAG_ADD_RANDOM if any underlying device does not
From: Mikulas Patocka mpatocka@redhat.com
commit 4ed319c6ac08e9a28fca7ac188181ac122f4de84 upstream.
dm-integrity will deadlock if overlapping I/O is issued to it; the bug was introduced by commit 724376a04d1a ("dm integrity: implement fair range locks"). Users rarely use overlapping I/O, so this bug went undetected until now.
Fix this bug by correcting likely cut-and-paste typos in ranges_overlap(), and also remove a flawed ranges_overlap() check in remove_range_unlocked(); that check could leave unprocessed bios hanging on wait_list forever.
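To illustrate the corrected half-open interval test, here is a user-space sketch mirroring the fix in the hunk below (illustrative only, not the dm-integrity source):

#include <stdbool.h>
#include <stdio.h>

struct range {
	unsigned long long logical_sector;
	unsigned long long n_sectors;
};

/* Two half-open ranges [start, start + n) overlap iff each starts before the other ends. */
static bool ranges_overlap(const struct range *r1, const struct range *r2)
{
	return r1->logical_sector < r2->logical_sector + r2->n_sectors &&
	       r1->logical_sector + r1->n_sectors > r2->logical_sector;
}

int main(void)
{
	struct range a = { .logical_sector = 0,  .n_sectors = 8 };
	struct range b = { .logical_sector = 16, .n_sectors = 8 };

	/*
	 * The original typo compared range2 against itself in the second
	 * half of the test, which is always true for a non-empty range2, so
	 * the check degenerated to "range1 starts before range2 ends" and
	 * disjoint ranges like a and b were reported as overlapping.
	 */
	printf("%d\n", ranges_overlap(&a, &b));	/* 0: disjoint */
	printf("%d\n", ranges_overlap(&b, &a));	/* 0: disjoint */
	return 0;
}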
Cc: stable@vger.kernel.org # v4.19+ Fixes: 724376a04d1a ("dm integrity: implement fair range locks") Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-integrity.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -908,7 +908,7 @@ static void copy_from_journal(struct dm_
 static bool ranges_overlap(struct dm_integrity_range *range1, struct dm_integrity_range *range2)
 {
 	return range1->logical_sector < range2->logical_sector + range2->n_sectors &&
-	       range2->logical_sector + range2->n_sectors > range2->logical_sector;
+	       range1->logical_sector + range1->n_sectors > range2->logical_sector;
 }

 static bool add_new_range(struct dm_integrity_c *ic, struct dm_integrity_range *new_range, bool check_waiting)
@@ -954,8 +954,6 @@ static void remove_range_unlocked(struct
 		struct dm_integrity_range *last_range =
			list_first_entry(&ic->wait_list, struct dm_integrity_range, wait_entry);
 		struct task_struct *last_range_task;
-		if (!ranges_overlap(range, last_range))
-			break;
 		last_range_task = last_range->task;
 		list_del(&last_range->wait_entry);
 		if (!add_new_range(ic, last_range, false)) {
From: Katsuhiro Suzuki katsuhiro@katsuster.net
commit ef05bcb60c1a8841e38c91923ba998181117a87c upstream.
This patch fixes the pin assignment of vcc_host1_5v. This regulator is controlled by the USB20_HOST_DRV signal.
The ROCK64 schematic says that the GPIO0_A2 pin is used as USB20_HOST_DRV; the GPIO0_D3 pin is for SPDIF_TX_M0.
Signed-off-by: Katsuhiro Suzuki katsuhiro@katsuster.net Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/boot/dts/rockchip/rk3328-rock64.dts | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/rockchip/rk3328-rock64.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3328-rock64.dts
@@ -46,7 +46,7 @@
 	vcc_host1_5v: vcc_otg_5v: vcc-host1-5v-regulator {
 		compatible = "regulator-fixed";
 		enable-active-high;
-		gpio = <&gpio0 RK_PD3 GPIO_ACTIVE_HIGH>;
+		gpio = <&gpio0 RK_PA2 GPIO_ACTIVE_HIGH>;
 		pinctrl-names = "default";
 		pinctrl-0 = <&usb20_host_drv>;
 		regulator-name = "vcc_host1_5v";
@@ -238,7 +238,7 @@

 	usb2 {
 		usb20_host_drv: usb20-host-drv {
-			rockchip,pins = <0 RK_PD3 RK_FUNC_GPIO &pcfg_pull_none>;
+			rockchip,pins = <0 RK_PA2 RK_FUNC_GPIO &pcfg_pull_none>;
 		};
 	};
From: Tomohiro Mayama parly-gh@iris.mystia.org
commit a8772e5d826d0f61f8aa9c284b3ab49035d5273d upstream.
This patch makes the USB ports function again.
Fixes: 955bebde057e ("arm64: dts: rockchip: add rk3328-rock64 board") Cc: stable@vger.kernel.org Suggested-by: Robin Murphy robin.murphy@arm.com Signed-off-by: Tomohiro Mayama parly-gh@iris.mystia.org Tested-by: Katsuhiro Suzuki katsuhiro@katsuster.net Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/boot/dts/rockchip/rk3328-rock64.dts | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/rockchip/rk3328-rock64.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3328-rock64.dts
@@ -45,8 +45,7 @@

 	vcc_host1_5v: vcc_otg_5v: vcc-host1-5v-regulator {
 		compatible = "regulator-fixed";
-		enable-active-high;
-		gpio = <&gpio0 RK_PA2 GPIO_ACTIVE_HIGH>;
+		gpio = <&gpio0 RK_PA2 GPIO_ACTIVE_LOW>;
 		pinctrl-names = "default";
 		pinctrl-0 = <&usb20_host_drv>;
 		regulator-name = "vcc_host1_5v";
From: Erik Schmauss erik.schmauss@intel.com
commit 4abb951b73ff0a8a979113ef185651aa3c8da19b upstream.
The table load process omitted adding the operation region address range to the global list. This omission is problematic because the OS queries the global list to check for address range conflicts before deciding which drivers to load. This commit may result in warning messages that look like the following:
[ 7.871761] ACPI Warning: system_IO range 0x00000428-0x0000042F conflicts with op_region 0x00000400-0x0000047F (\PMIO) (20180531/utaddress-213)
[ 7.871769] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
However, these messages do not signify regressions; they are a result of properly adding address ranges within the global address list.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200011 Tested-by: Jean-Marc Lenoir archlinux@jihemel.com Signed-off-by: Erik Schmauss erik.schmauss@intel.com Cc: All applicable stable@vger.kernel.org Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/acpica/dsopcode.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/acpi/acpica/dsopcode.c
+++ b/drivers/acpi/acpica/dsopcode.c
@@ -523,6 +523,10 @@ acpi_ds_eval_table_region_operands(struc
			  ACPI_FORMAT_UINT64(obj_desc->region.address),
			  obj_desc->region.length));

+	status = acpi_ut_add_address_range(obj_desc->region.space_id,
+					   obj_desc->region.address,
+					   obj_desc->region.length, node);
+
 	/* Now the address and length are valid for this opregion */
obj_desc->region.flags |= AOPOBJ_DATA_VALID;
From: Marc Orr marcorr@google.com
commit acff78477b9b4f26ecdf65733a4ed77fe837e9dc upstream.
The nested_vmx_prepare_msr_bitmap() function doesn't directly guard the x2APIC MSR intercepts with the "virtualize x2APIC mode" MSR. As a result, we discovered the potential for a buggy or malicious L1 to get access to L0's x2APIC MSRs, via an L2, as follows.
1. L1 executes WRMSR(IA32_SPEC_CTRL, 1). This causes the spec_ctrl variable in nested_vmx_prepare_msr_bitmap() to become true.
2. L1 disables "virtualize x2APIC mode" in VMCS12.
3. L1 enables "APIC-register virtualization" in VMCS12.
Now, KVM will set VMCS02's x2APIC MSR intercepts from VMCS12, and then set "virtualize x2APIC mode" to 0 in VMCS02. Oops.
This patch closes the leak by explicitly guarding VMCS02's x2APIC MSR intercepts with VMCS12's "virtualize x2APIC mode" control.
The scenario outlined above, and the fix prescribed here, were verified with a related patch in kvm-unit-tests titled "Add leak scenario to virt_x2apic_mode_test".
Note: it looks like this issue may have been introduced inadvertently during a merge; see commit 15303ba5d1cd.
Signed-off-by: Marc Orr marcorr@google.com Reviewed-by: Jim Mattson jmattson@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/vmx.c | 72 ++++++++++++++++++++++++++++++++--------------------- 1 file changed, 44 insertions(+), 28 deletions(-)
--- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -11582,6 +11582,17 @@ static int nested_vmx_check_tpr_shadow_c return 0; }
+static inline void enable_x2apic_msr_intercepts(unsigned long *msr_bitmap) { + int msr; + + for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) { + unsigned word = msr / BITS_PER_LONG; + + msr_bitmap[word] = ~0; + msr_bitmap[word + (0x800 / sizeof(long))] = ~0; + } +} + /* * Merge L0's and L1's MSR bitmap, return false to indicate that * we do not use the hardware. @@ -11623,39 +11634,44 @@ static inline bool nested_vmx_prepare_ms return false;
msr_bitmap_l1 = (unsigned long *)kmap(page); - if (nested_cpu_has_apic_reg_virt(vmcs12)) { - /* - * L0 need not intercept reads for MSRs between 0x800 and 0x8ff, it - * just lets the processor take the value from the virtual-APIC page; - * take those 256 bits directly from the L1 bitmap. - */ - for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) { - unsigned word = msr / BITS_PER_LONG; - msr_bitmap_l0[word] = msr_bitmap_l1[word]; - msr_bitmap_l0[word + (0x800 / sizeof(long))] = ~0; - } - } else { - for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) { - unsigned word = msr / BITS_PER_LONG; - msr_bitmap_l0[word] = ~0; - msr_bitmap_l0[word + (0x800 / sizeof(long))] = ~0; - } - }
- nested_vmx_disable_intercept_for_msr( - msr_bitmap_l1, msr_bitmap_l0, - X2APIC_MSR(APIC_TASKPRI), - MSR_TYPE_W); + /* + * To keep the control flow simple, pay eight 8-byte writes (sixteen + * 4-byte writes on 32-bit systems) up front to enable intercepts for + * the x2APIC MSR range and selectively disable them below. + */ + enable_x2apic_msr_intercepts(msr_bitmap_l0); + + if (nested_cpu_has_virt_x2apic_mode(vmcs12)) { + if (nested_cpu_has_apic_reg_virt(vmcs12)) { + /* + * L0 need not intercept reads for MSRs between 0x800 + * and 0x8ff, it just lets the processor take the value + * from the virtual-APIC page; take those 256 bits + * directly from the L1 bitmap. + */ + for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) { + unsigned word = msr / BITS_PER_LONG; + + msr_bitmap_l0[word] = msr_bitmap_l1[word]; + } + }
- if (nested_cpu_has_vid(vmcs12)) { - nested_vmx_disable_intercept_for_msr( - msr_bitmap_l1, msr_bitmap_l0, - X2APIC_MSR(APIC_EOI), - MSR_TYPE_W); nested_vmx_disable_intercept_for_msr( msr_bitmap_l1, msr_bitmap_l0, - X2APIC_MSR(APIC_SELF_IPI), + X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_W); + + if (nested_cpu_has_vid(vmcs12)) { + nested_vmx_disable_intercept_for_msr( + msr_bitmap_l1, msr_bitmap_l0, + X2APIC_MSR(APIC_EOI), + MSR_TYPE_W); + nested_vmx_disable_intercept_for_msr( + msr_bitmap_l1, msr_bitmap_l0, + X2APIC_MSR(APIC_SELF_IPI), + MSR_TYPE_W); + } }
if (spec_ctrl)
From: Marc Orr marcorr@google.com
commit c73f4c998e1fd4249b9edfa39e23f4fda2b9b041 upstream.
Referring to the "VIRTUALIZING MSR-BASED APIC ACCESSES" chapter of the SDM, when "virtualize x2APIC mode" is 1 and "APIC-register virtualization" is 0, a RDMSR of 808H should return the VTPR from the virtual APIC page.
However, for nested, KVM currently fails to disable the read intercept for this MSR. This means that a RDMSR exit takes precedence over "virtualize x2APIC mode", and KVM passes through L1's TPR to L2, instead of sourcing the value from L2's virtual APIC page.
This patch fixes the issue by disabling the read intercept for the VTPR in VMCS02 when "APIC-register virtualization" is 0.
The issue described above, and the fix prescribed here, were verified with a related patch in kvm-unit-tests titled "Test VMX's virtualize x2APIC mode w/ nested".
Signed-off-by: Marc Orr marcorr@google.com Reviewed-by: Jim Mattson jmattson@google.com Fixes: c992384bde84f ("KVM: vmx: speed up MSR bitmap merge") Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/vmx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11660,7 +11660,7 @@ static inline bool nested_vmx_prepare_ms
 		nested_vmx_disable_intercept_for_msr(
 			msr_bitmap_l1, msr_bitmap_l0,
 			X2APIC_MSR(APIC_TASKPRI),
-			MSR_TYPE_W);
+			MSR_TYPE_R | MSR_TYPE_W);

 		if (nested_cpu_has_vid(vmcs12)) {
 			nested_vmx_disable_intercept_for_msr(
stable-rc/linux-4.19.y boot: 110 boots: 1 failed, 102 passed with 7 offline (v4.19.34-102-ge4f859c2cd83)
Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.19.y/kernel/v4.19... Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.19.y/kernel/v4.19.34-102...
Tree: stable-rc
Branch: linux-4.19.y
Git Describe: v4.19.34-102-ge4f859c2cd83
Git Commit: e4f859c2cd83c05fc968871175b2a2ed5e43eb49
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 65 unique boards, 24 SoC families, 14 builds out of 206
Boot Regressions Detected:
arm64:
defconfig: gcc-7: apq8016-sbc: lab-mhart: new failure (last pass: v4.19.34)
Boot Failure Detected:
arm64:
defconfig: gcc-7: apq8016-sbc: 1 failed lab
Offline Platforms:
arm:
qcom_defconfig: gcc-7
    qcom-apq8064-cm-qs600: 1 offline lab
    qcom-apq8064-ifc6410: 1 offline lab

multi_v7_defconfig: gcc-7
    qcom-apq8064-cm-qs600: 1 offline lab
    qcom-apq8064-ifc6410: 1 offline lab
    sun5i-r8-chip: 1 offline lab

sunxi_defconfig: gcc-7
    sun5i-r8-chip: 1 offline lab
arm64:
defconfig: gcc-7
    apq8016-sbc: 1 offline lab
---
For more info write to info@kernelci.org
On 15/04/2019 19:57, Greg Kroah-Hartman wrote:
All tests are passing for Tegra ...
Test results for stable-v4.19:
    11 builds: 11 pass, 0 fail
    22 boots: 22 pass, 0 fail
    30 tests: 30 pass, 0 fail

Linux version: 4.19.35-rc1-ge4f859c
Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Cheers Jon
On Tue, 16 Apr 2019 at 00:36, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary
------------------------------------------------------------------------

kernel: 4.19.35-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.19.y
git commit: e4f859c2cd83c05fc968871175b2a2ed5e43eb49
git describe: v4.19.34-102-ge4f859c2cd83
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.19-oe/build/v4.19.34-10...
No regressions (compared to build v4.19.34)
No fixes (compared to build v4.19.34)
Ran 23582 total tests in the following environments and test suites.
Environments
--------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64
Test Suites
-----------
* boot
* install-android-platform-tools-r2600
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* perf
* spectre-meltdown-checker-test
* v4l2-compliance
* kvm-unit-tests
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
On Mon, Apr 15, 2019 at 08:57:58PM +0200, Greg Kroah-Hartman wrote:
Build results:
    total: 156 pass: 156 fail: 0
Qemu test results:
    total: 349 pass: 349 fail: 0
Guenter
On 4/15/19 12:57 PM, Greg Kroah-Hartman wrote:
Compiled and booted on my test system. No dmesg regressions.
thanks,
-- Shuah
On Wed, Apr 17, 2019 at 08:15:56AM +0200, Greg Kroah-Hartman wrote:
On Wed, Apr 17, 2019 at 03:46:09AM +0530, Bharath Vedartham wrote:
Compiled and booted (defconfig) on my x86 machine. No dmesg regressions.
Thanks for testing 2 of these and letting me know.
Oops, you tested 3, thanks for that :)
On Wed, Apr 17, 2019 at 08:16:15AM +0200, Greg Kroah-Hartman wrote:
eager to do more!
linux-stable-mirror@lists.linaro.org