This is the start of the stable review cycle for the 5.10.43 release. There are 137 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Jun 2021 17:59:18 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.43-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.10.43-rc1
David Ahern dsahern@kernel.org neighbour: allow NUD_NOARP entries to be forced GCed
Roger Pau Monne roger.pau@citrix.com xen-netback: take a reference to the RX task thread
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: missing error reporting for not selected expressions
Roja Rani Yarubandi rojay@codeaurora.org i2c: qcom-geni: Suspend and resume the bus during SYSTEM_SLEEP_PM ops
Gao Xiang hsiangkao@redhat.com lib/lz4: explicitly support in-place decompression
Vitaly Kuznetsov vkuznets@redhat.com x86/kvm: Disable all PV features on crash
Vitaly Kuznetsov vkuznets@redhat.com x86/kvm: Disable kvmclock on all CPUs on shutdown
Vitaly Kuznetsov vkuznets@redhat.com x86/kvm: Teardown PV features on boot CPU as well
Marc Zyngier maz@kernel.org KVM: arm64: Fix debug register indexing
Sean Christopherson seanjc@google.com KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
Anand Jain anand.jain@oracle.com btrfs: fix unmountable seed device after fstrim
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: always use mdp device to scale bandwidth
Mina Almasry almasrymina@google.com mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
Filipe Manana fdmanana@suse.com btrfs: fix deadlock when cloning inline extents and low on available space
Josef Bacik josef@toxicpanda.com btrfs: abort in rename_exchange if we fail to insert the second ref
Josef Bacik josef@toxicpanda.com btrfs: fixup error handling in fixup_inode_link_counts
Josef Bacik josef@toxicpanda.com btrfs: return errors from btrfs_del_csums in cleanup_ref_head
Josef Bacik josef@toxicpanda.com btrfs: fix error handling in btrfs_del_csums
Josef Bacik josef@toxicpanda.com btrfs: mark ordered extent and inode with error if we fail to finish
Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com powerpc/kprobes: Fix validation of prefixed instructions across page boundary
Thomas Gleixner tglx@linutronix.de x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
Nirmoy Das nirmoy.das@amd.com drm/amdgpu: make sure we unpin the UVD BO
Luben Tuikov luben.tuikov@amd.com drm/amdgpu: Don't query CE and UE errors
Krzysztof Kozlowski krzysztof.kozlowski@canonical.com nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
Pu Wen puwen@hygon.cn x86/sev: Check SME/SEV support in CPUID first
Thomas Gleixner tglx@linutronix.de x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
Ding Hui dinghui@sangfor.com.cn mm/page_alloc: fix counting of free pages after take off from buddy
Gerald Schaefer gerald.schaefer@linux.ibm.com mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
Junxiao Bi junxiao.bi@oracle.com ocfs2: fix data corruption by fallocate
Mark Rutland mark.rutland@arm.com pid: take a reference when initializing `cad_pid`
Phil Elwell phil@raspberrypi.com usb: dwc2: Fix build in periphal-only mode
Ritesh Harjani riteshh@linux.ibm.com ext4: fix accessing uninit percpu counter variable with fast_commit
Phillip Potter phil@philpotter.co.uk ext4: fix memory leak in ext4_mb_init_backend on error path.
Harshad Shirwadkar harshadshirwadkar@gmail.com ext4: fix fast commit alignment issues
Ye Bin yebin10@huawei.com ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
Alexey Makhalov amakhalov@vmware.com ext4: fix memory leak in ext4_fill_super
Marek Vasut marex@denx.de ARM: dts: imx6q-dhcom: Add PU,VDD1P1,VDD2P5 regulators
Michal Vokáč michal.vokac@ysoft.com ARM: dts: imx6dl-yapp4: Fix RGMII connection to QCA8334 switch
Hui Wang hui.wang@canonical.com ALSA: hda: update the power_state during the direct-complete
Carlos M carlos.marr.pz@gmail.com ALSA: hda: Fix for mute key LED for HP Pavilion 15-CK0xx
Takashi Iwai tiwai@suse.de ALSA: timer: Fix master timer notification
Bob Peterson rpeterso@redhat.com gfs2: fix scheduling while atomic bug in glocks
Ahelenia Ziemiańska nabijaczleweli@nabijaczleweli.xyz HID: multitouch: require Finger field to mark Win8 reports as MT
Johan Hovold johan@kernel.org HID: magicmouse: fix NULL-deref on disconnect
Johnny Chuang johnny.chuang.emc@gmail.com HID: i2c-hid: Skip ELAN power-on command after reset
Pavel Skripkin paskripkin@gmail.com net: caif: fix memory leak in cfusbl_device_notify
Pavel Skripkin paskripkin@gmail.com net: caif: fix memory leak in caif_device_notify
Pavel Skripkin paskripkin@gmail.com net: caif: add proper error handling
Pavel Skripkin paskripkin@gmail.com net: caif: added cfserl_release function
Jason A. Donenfeld Jason@zx2c4.com wireguard: allowedips: free empty intermediate nodes when removing single node
Jason A. Donenfeld Jason@zx2c4.com wireguard: allowedips: allocate nodes in kmem_cache
Jason A. Donenfeld Jason@zx2c4.com wireguard: allowedips: remove nodes in O(1)
Jason A. Donenfeld Jason@zx2c4.com wireguard: allowedips: initialize list head in selftest
Jason A. Donenfeld Jason@zx2c4.com wireguard: selftests: make sure rp_filter is disabled on vethc
Jason A. Donenfeld Jason@zx2c4.com wireguard: selftests: remove old conntrack kconfig value
Jason A. Donenfeld Jason@zx2c4.com wireguard: use synchronize_net rather than synchronize_rcu
Jason A. Donenfeld Jason@zx2c4.com wireguard: peer: allocate in kmem_cache
Jason A. Donenfeld Jason@zx2c4.com wireguard: do not use -O3
Lin Ma linma@zju.edu.cn Bluetooth: use correct lock to prevent UAF of hdev object
Lin Ma linma@zju.edu.cn Bluetooth: fix the erroneous flush_work() order
James Zhu James.Zhu@amd.com drm/amdgpu/jpeg3: add cancel_delayed_work_sync before power gate
James Zhu James.Zhu@amd.com drm/amdgpu/jpeg2.5: add cancel_delayed_work_sync before power gate
James Zhu James.Zhu@amd.com drm/amdgpu/vcn3: add cancel_delayed_work_sync before power gate
Pavel Begunkov asml.silence@gmail.com io_uring: use better types for cflags
Pavel Begunkov asml.silence@gmail.com io_uring: fix link timeout refs
Jisheng Zhang jszhang@kernel.org riscv: vdso: fix and clean-up Makefile
Johan Hovold johan@kernel.org serial: stm32: fix threaded interrupt handling
Hoang Le hoang.h.le@dektech.com.au tipc: fix unique bearer names sanity check
Hoang Le hoang.h.le@dektech.com.au tipc: add extack messages for bearer/media failure
Tony Lindgren tony@atomide.com bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
Geert Uytterhoeven geert+renesas@glider.be ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
Fabio Estevam festevam@gmail.com ARM: dts: imx7d-pico: Fix the 'tuning-step' property
Fabio Estevam festevam@gmail.com ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
Michael Walle michael@walle.cc arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
Lucas Stach l.stach@pengutronix.de arm64: dts: zii-ultra: fix 12V_MAIN voltage
Michael Walle michael@walle.cc arm64: dts: ls1028a: fix memory node
Tony Lindgren tony@atomide.com bus: ti-sysc: Fix am335x resume hang for usb otg module
Jens Wiklander jens.wiklander@linaro.org optee: use export_uuid() to copy client UUID
Vignesh Raghavendra vigneshr@ti.com arm64: dts: ti: j7200-main: Mark Main NAVSS as dma-coherent
Magnus Karlsson magnus.karlsson@intel.com ixgbe: add correct exception tracing for XDP
Magnus Karlsson magnus.karlsson@intel.com ixgbe: optimize for XDP_REDIRECT in xsk path
Magnus Karlsson magnus.karlsson@intel.com ice: add correct exception tracing for XDP
Magnus Karlsson magnus.karlsson@intel.com ice: optimize for XDP_REDIRECT in xsk path
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: simplify ice_run_xdp
Magnus Karlsson magnus.karlsson@intel.com i40e: add correct exception tracing for XDP
Magnus Karlsson magnus.karlsson@intel.com i40e: optimize for XDP_REDIRECT in xsk path
Rahul Lakkireddy rahul.lakkireddy@chelsio.com cxgb4: avoid link re-train during TC-MQPRIO configuration
Roja Rani Yarubandi rojay@codeaurora.org i2c: qcom-geni: Add shutdown callback for i2c
Dave Ertman david.m.ertman@intel.com ice: Allow all LLDP packets from PF to Tx
Paul Greenwalt paul.greenwalt@intel.com ice: report supported and advertised autoneg using PHY capabilities
Haiyue Wang haiyue.wang@intel.com ice: handle the VF VSI rebuild failure
Brett Creeley brett.creeley@intel.com ice: Fix VFR issues for AVF drivers that expect ATQLEN cleared
Brett Creeley brett.creeley@intel.com ice: Fix allowing VF to request more/less queues via virtchnl
Coco Li lixiaoyan@google.com ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions
Rahul Lakkireddy rahul.lakkireddy@chelsio.com cxgb4: fix regression with HASH tc prio value update
Magnus Karlsson magnus.karlsson@intel.com ixgbevf: add correct exception tracing for XDP
Magnus Karlsson magnus.karlsson@intel.com igb: add correct exception tracing for XDP
Wei Yongjun weiyongjun1@huawei.com ieee802154: fix error return code in ieee802154_llsec_getparams()
Zhen Lei thunder.leizhen@huawei.com ieee802154: fix error return code in ieee802154_add_iface()
Daniel Borkmann daniel@iogearbox.net bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks
Tobias Klauser tklauser@distanz.ch bpf: Simplify cases in bpf_base_func_proto
Zhihao Cheng chengzhihao1@huawei.com drm/i915/selftests: Fix return value check in live_breadcrumbs_smoketest()
Pablo Neira Ayuso pablo@netfilter.org netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_ct: skip expectations for confirmed conntrack
Max Gurtovoy mgurtovoy@nvidia.com nvmet: fix freeing unallocated p2pmem
Yevgeny Kliteynik kliteyn@nvidia.com net/mlx5: DR, Create multi-destination flow table with level less than 64
Roi Dayan roid@nvidia.com net/mlx5e: Check for needed capability for cvlan matching
Moshe Shemesh moshe@nvidia.com net/mlx5: Check firmware sync reset requested is set before trying to abort it
Aya Levin ayal@nvidia.com net/mlx5e: Fix incompatible casting
Maxim Mikityanskiy maximmi@nvidia.com net/tls: Fix use-after-free after the TLS device goes down and up
Maxim Mikityanskiy maximmi@nvidia.com net/tls: Replace TLS_RX_SYNC_RUNNING with RCU
Alexander Aring aahringo@redhat.com net: sock: fix in-kernel mark setting
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: tag_8021q: fix the VLAN IDs used for encoding sub-VLANs
Li Huafei lihuafei1@huawei.com perf probe: Fix NULL pointer dereference in convert_variable_location()
Erik Kaneda erik.kaneda@intel.com ACPICA: Clean up context mutex during object deletion
Sagi Grimberg sagi@grimberg.me nvme-rdma: fix in-casule data send for chained sgls
Paolo Abeni pabeni@redhat.com mptcp: always parse mptcp options for MPC reqsk
Ariel Levkovich lariel@nvidia.com net/sched: act_ct: Fix ct template allocation for zone 0
Paul Blakey paulb@nvidia.com net/sched: act_ct: Offload connections with commit action
Parav Pandit parav@nvidia.com devlink: Correct VIRTUAL port to not have phys_port attributes
Arnd Bergmann arnd@arndb.de HID: i2c-hid: fix format string mismatch
Zhen Lei thunder.leizhen@huawei.com HID: pidff: fix error return code in hid_pidff_init()
Tom Rix trix@redhat.com HID: logitech-hidpp: initialize level variable
Julian Anastasov ja@ssi.bg ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
Max Gurtovoy mgurtovoy@nvidia.com vfio/platform: fix module_put call in error flow
Wei Yongjun weiyongjun1@huawei.com samples: vfio-mdev: fix error handing in mdpy_fb_probe()
Randy Dunlap rdunlap@infradead.org vfio/pci: zap_vma_ptes() needs MMU
Zhen Lei thunder.leizhen@huawei.com vfio/pci: Fix error return code in vfio_ecap_init()
Rasmus Villemoes linux@rasmusvillemoes.dk efi: cper: fix snprintf() use in cper_dimm_err_location()
Dan Carpenter dan.carpenter@oracle.com efi/libstub: prevent read overflow in find_file_option()
Heiner Kallweit hkallweit1@gmail.com efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared
Changbin Du changbin.du@intel.com efi/fdt: fix panic when no valid fdt found
Florian Westphal fw@strlen.de netfilter: conntrack: unregister ipv4 sockopts on error unwind
Grant Peltier grantpeltier93@gmail.com hwmon: (pmbus/isl68137) remove READ_TEMPERATURE_3 for RAA228228
Armin Wolf W_Armin@gmx.de hwmon: (dell-smm-hwmon) Fix index values
Grant Grundler grundler@chromium.org net: usb: cdc_ncm: don't spew notifications
Josef Bacik josef@toxicpanda.com btrfs: tree-checker: do not error out if extent ref hash doesn't match
-------------
Diffstat:
Makefile | 4 +- arch/arm/boot/dts/imx6dl-yapp4-common.dtsi | 6 +- arch/arm/boot/dts/imx6q-dhcom-som.dtsi | 12 ++ arch/arm/boot/dts/imx6qdl-emcon-avari.dtsi | 2 +- arch/arm/boot/dts/imx7d-meerkat96.dts | 2 +- arch/arm/boot/dts/imx7d-pico.dtsi | 2 +- .../freescale/fsl-ls1028a-kontron-sl28-var4.dts | 5 +- arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi | 4 +- .../arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi | 4 +- arch/arm64/boot/dts/ti/k3-j7200-main.dtsi | 2 + arch/arm64/kvm/sys_regs.c | 42 ++--- arch/powerpc/kernel/kprobes.c | 4 +- arch/riscv/kernel/vdso/Makefile | 4 +- arch/x86/include/asm/apic.h | 1 + arch/x86/include/asm/disabled-features.h | 7 +- arch/x86/include/asm/fpu/api.h | 6 +- arch/x86/include/asm/fpu/internal.h | 7 - arch/x86/include/asm/kvm_para.h | 10 +- arch/x86/kernel/apic/apic.c | 1 + arch/x86/kernel/apic/vector.c | 20 +++ arch/x86/kernel/fpu/xstate.c | 57 ------- arch/x86/kernel/kvm.c | 92 +++++++--- arch/x86/kernel/kvmclock.c | 26 +-- arch/x86/kvm/svm/svm.c | 8 +- arch/x86/mm/mem_encrypt_identity.c | 11 +- drivers/acpi/acpica/utdelete.c | 8 + drivers/bus/ti-sysc.c | 57 ++++++- drivers/firmware/efi/cper.c | 4 +- drivers/firmware/efi/fdtparams.c | 3 + drivers/firmware/efi/libstub/file.c | 2 +- drivers/firmware/efi/memattr.c | 5 - drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 16 -- drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c | 4 +- drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c | 4 +- drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 1 + drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 5 +- drivers/gpu/drm/i915/selftests/i915_request.c | 4 +- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 3 +- drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c | 51 +----- drivers/hid/hid-logitech-hidpp.c | 1 + drivers/hid/hid-magicmouse.c | 2 +- drivers/hid/hid-multitouch.c | 10 +- drivers/hid/i2c-hid/i2c-hid-core.c | 13 +- drivers/hid/usbhid/hid-pidff.c | 1 + drivers/hwmon/dell-smm-hwmon.c | 4 +- drivers/hwmon/pmbus/isl68137.c | 4 +- drivers/i2c/busses/i2c-qcom-geni.c | 21 ++- drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 2 - drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 4 +- .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 14 +- .../net/ethernet/chelsio/cxgb4/cxgb4_tc_mqprio.c | 9 +- drivers/net/ethernet/chelsio/cxgb4/sge.c | 6 + drivers/net/ethernet/intel/i40e/i40e_txrx.c | 7 +- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 15 +- drivers/net/ethernet/intel/ice/ice_ethtool.c | 51 +----- drivers/net/ethernet/intel/ice/ice_hw_autogen.h | 1 + drivers/net/ethernet/intel/ice/ice_lib.c | 2 + drivers/net/ethernet/intel/ice/ice_txrx.c | 24 +-- drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 19 ++- drivers/net/ethernet/intel/ice/ice_xsk.c | 16 +- drivers/net/ethernet/intel/igb/igb_main.c | 10 +- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 +- drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 21 ++- drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 3 + .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 5 +- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 9 + drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c | 3 + .../ethernet/mellanox/mlx5/core/steering/dr_fw.c | 3 +- drivers/net/usb/cdc_ncm.c | 12 +- drivers/net/wireguard/Makefile | 3 +- drivers/net/wireguard/allowedips.c | 189 +++++++++++---------- drivers/net/wireguard/allowedips.h | 14 +- drivers/net/wireguard/main.c | 17 +- drivers/net/wireguard/peer.c | 27 ++- drivers/net/wireguard/peer.h | 3 + drivers/net/wireguard/selftest/allowedips.c | 165 +++++++++--------- drivers/net/wireguard/socket.c | 2 +- drivers/net/xen-netback/interface.c | 6 + drivers/nvme/host/rdma.c | 5 +- drivers/nvme/target/core.c | 33 ++-- drivers/tee/optee/call.c | 6 +- drivers/tee/optee/optee_msg.h | 6 +- drivers/tty/serial/stm32-usart.c | 22 +-- drivers/usb/dwc2/core_intr.c | 4 + drivers/vfio/pci/Kconfig | 1 + drivers/vfio/pci/vfio_pci_config.c | 2 +- drivers/vfio/platform/vfio_platform_common.c | 2 +- fs/btrfs/extent-tree.c | 12 +- fs/btrfs/file-item.c | 10 +- fs/btrfs/inode.c | 19 ++- fs/btrfs/reflink.c | 38 +++-- fs/btrfs/tree-checker.c | 16 +- fs/btrfs/tree-log.c | 13 +- fs/ext4/extents.c | 43 ++--- fs/ext4/fast_commit.c | 182 ++++++++++---------- fs/ext4/fast_commit.h | 7 - fs/ext4/ialloc.c | 6 +- fs/ext4/mballoc.c | 2 +- fs/ext4/super.c | 11 +- fs/gfs2/glock.c | 2 + fs/io_uring.c | 6 +- fs/ocfs2/file.c | 55 +++++- include/linux/mlx5/mlx5_ifc.h | 2 + include/linux/platform_data/ti-sysc.h | 1 + include/linux/usb/usbnet.h | 2 + include/net/caif/caif_dev.h | 2 +- include/net/caif/cfcnfg.h | 2 +- include/net/caif/cfserl.h | 1 + include/net/tls.h | 10 +- init/main.c | 2 +- kernel/bpf/helpers.c | 19 +-- kernel/trace/bpf_trace.c | 32 ++-- lib/lz4/lz4_decompress.c | 6 +- lib/lz4/lz4defs.h | 1 + mm/debug_vm_pgtable.c | 4 +- mm/hugetlb.c | 14 +- mm/page_alloc.c | 2 + net/bluetooth/hci_core.c | 7 +- net/bluetooth/hci_sock.c | 4 +- net/caif/caif_dev.c | 13 +- net/caif/caif_usb.c | 14 +- net/caif/cfcnfg.c | 16 +- net/caif/cfserl.c | 5 + net/core/devlink.c | 4 +- net/core/neighbour.c | 1 + net/core/sock.c | 16 +- net/dsa/tag_8021q.c | 2 +- net/ieee802154/nl-mac.c | 4 +- net/ieee802154/nl-phy.c | 4 +- net/ipv6/route.c | 8 +- net/mptcp/subflow.c | 17 +- net/netfilter/ipvs/ip_vs_ctl.c | 2 +- net/netfilter/nf_conntrack_proto.c | 2 +- net/netfilter/nf_tables_api.c | 4 +- net/netfilter/nfnetlink_cthelper.c | 8 +- net/netfilter/nft_ct.c | 2 +- net/nfc/llcp_sock.c | 2 + net/sched/act_ct.c | 10 +- net/tipc/bearer.c | 94 +++++++--- net/tls/tls_device.c | 60 +++++-- net/tls/tls_device_fallback.c | 7 + net/tls/tls_main.c | 1 + samples/vfio-mdev/mdpy-fb.c | 13 +- sound/core/timer.c | 3 +- sound/pci/hda/hda_codec.c | 5 + sound/pci/hda/patch_realtek.c | 1 + tools/perf/util/dwarf-aux.c | 8 +- tools/perf/util/probe-finder.c | 3 + tools/testing/selftests/wireguard/netns.sh | 1 + .../testing/selftests/wireguard/qemu/kernel.config | 1 - 150 files changed, 1260 insertions(+), 925 deletions(-)
From: Josef Bacik josef@toxicpanda.com
commit 1119a72e223f3073a604f8fccb3a470ccd8a4416 upstream.
The tree checker checks the extent ref hash at read and write time to make sure we do not corrupt the file system. Generally extent references go inline, but if we have enough of them we need to make an item, which looks like
key.objectid = <bytenr> key.type = <BTRFS_EXTENT_DATA_REF_KEY|BTRFS_TREE_BLOCK_REF_KEY> key.offset = hash(tree, owner, offset)
However if key.offset collide with an unrelated extent reference we'll simply key.offset++ until we get something that doesn't collide. Obviously this doesn't match at tree checker time, and thus we error while writing out the transaction. This is relatively easy to reproduce, simply do something like the following
xfs_io -f -c "pwrite 0 1M" file offset=2
for i in {0..10000} do xfs_io -c "reflink file 0 ${offset}M 1M" file offset=$(( offset + 2 )) done
xfs_io -c "reflink file 0 17999258914816 1M" file xfs_io -c "reflink file 0 35998517829632 1M" file xfs_io -c "reflink file 0 53752752058368 1M" file
btrfs filesystem sync
And the sync will error out because we'll abort the transaction. The magic values above are used because they generate hash collisions with the first file in the main subvol.
The fix for this is to remove the hash value check from tree checker, as we have no idea which offset ours should belong to.
Reported-by: Tuomas Lähdekorpi tuomas.lahdekorpi@gmail.com Fixes: 0785a9aacf9d ("btrfs: tree-checker: Add EXTENT_DATA_REF check") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Josef Bacik josef@toxicpanda.com Reviewed-by: David Sterba dsterba@suse.com [ add comment] Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tree-checker.c | 16 ++++------------ 1 file changed, 4 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index 40845428b739..d4a3a56726aa 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -1440,22 +1440,14 @@ static int check_extent_data_ref(struct extent_buffer *leaf, return -EUCLEAN; } for (; ptr < end; ptr += sizeof(*dref)) { - u64 root_objectid; - u64 owner; u64 offset; - u64 hash;
+ /* + * We cannot check the extent_data_ref hash due to possible + * overflow from the leaf due to hash collisions. + */ dref = (struct btrfs_extent_data_ref *)ptr; - root_objectid = btrfs_extent_data_ref_root(leaf, dref); - owner = btrfs_extent_data_ref_objectid(leaf, dref); offset = btrfs_extent_data_ref_offset(leaf, dref); - hash = hash_extent_data_ref(root_objectid, owner, offset); - if (hash != key->offset) { - extent_err(leaf, slot, - "invalid extent data ref hash, item has 0x%016llx key has 0x%016llx", - hash, key->offset); - return -EUCLEAN; - } if (!IS_ALIGNED(offset, leaf->fs_info->sectorsize)) { extent_err(leaf, slot, "invalid extent data backref offset, have %llu expect aligned to %u",
From: Grant Grundler grundler@chromium.org
[ Upstream commit de658a195ee23ca6aaffe197d1d2ea040beea0a2 ]
RTL8156 sends notifications about every 32ms. Only display/log notifications when something changes.
This issue has been reported by others: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832472 https://lkml.org/lkml/2020/8/27/1083
... [785962.779840] usb 1-1: new high-speed USB device number 5 using xhci_hcd [785962.929944] usb 1-1: New USB device found, idVendor=0bda, idProduct=8156, bcdDevice=30.00 [785962.929949] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=6 [785962.929952] usb 1-1: Product: USB 10/100/1G/2.5G LAN [785962.929954] usb 1-1: Manufacturer: Realtek [785962.929956] usb 1-1: SerialNumber: 000000001 [785962.991755] usbcore: registered new interface driver cdc_ether [785963.017068] cdc_ncm 1-1:2.0: MAC-Address: 00:24:27:88:08:15 [785963.017072] cdc_ncm 1-1:2.0: setting rx_max = 16384 [785963.017169] cdc_ncm 1-1:2.0: setting tx_max = 16384 [785963.017682] cdc_ncm 1-1:2.0 usb0: register 'cdc_ncm' at usb-0000:00:14.0-1, CDC NCM, 00:24:27:88:08:15 [785963.019211] usbcore: registered new interface driver cdc_ncm [785963.023856] usbcore: registered new interface driver cdc_wdm [785963.025461] usbcore: registered new interface driver cdc_mbim [785963.038824] cdc_ncm 1-1:2.0 enx002427880815: renamed from usb0 [785963.089586] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected [785963.121673] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected [785963.153682] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected ...
This is about 2KB per second and will overwrite all contents of a 1MB dmesg buffer in under 10 minutes rendering them useless for debugging many kernel problems.
This is also an extra 180 MB/day in /var/logs (or 1GB per week) rendering the majority of those logs useless too.
When the link is up (expected state), spew amount is >2x higher: ... [786139.600992] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected [786139.632997] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink [786139.665097] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected [786139.697100] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink [786139.729094] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected [786139.761108] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink ...
Chrome OS cannot support RTL8156 until this is fixed.
Signed-off-by: Grant Grundler grundler@chromium.org Reviewed-by: Hayes Wang hayeswang@realtek.com Link: https://lore.kernel.org/r/20210120011208.3768105-1-grundler@chromium.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/cdc_ncm.c | 12 +++++++++++- include/linux/usb/usbnet.h | 2 ++ 2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c index 854c6624e685..1d3bf810f2ca 100644 --- a/drivers/net/usb/cdc_ncm.c +++ b/drivers/net/usb/cdc_ncm.c @@ -1827,6 +1827,15 @@ cdc_ncm_speed_change(struct usbnet *dev, uint32_t rx_speed = le32_to_cpu(data->DLBitRRate); uint32_t tx_speed = le32_to_cpu(data->ULBitRate);
+ /* if the speed hasn't changed, don't report it. + * RTL8156 shipped before 2021 sends notification about every 32ms. + */ + if (dev->rx_speed == rx_speed && dev->tx_speed == tx_speed) + return; + + dev->rx_speed = rx_speed; + dev->tx_speed = tx_speed; + /* * Currently the USB-NET API does not support reporting the actual * device speed. Do print it instead. @@ -1867,7 +1876,8 @@ static void cdc_ncm_status(struct usbnet *dev, struct urb *urb) * USB_CDC_NOTIFY_NETWORK_CONNECTION notification shall be * sent by device after USB_CDC_NOTIFY_SPEED_CHANGE. */ - usbnet_link_change(dev, !!event->wValue, 0); + if (netif_carrier_ok(dev->net) != !!event->wValue) + usbnet_link_change(dev, !!event->wValue, 0); break;
case USB_CDC_NOTIFY_SPEED_CHANGE: diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h index 2e4f7721fc4e..8110c29fab42 100644 --- a/include/linux/usb/usbnet.h +++ b/include/linux/usb/usbnet.h @@ -83,6 +83,8 @@ struct usbnet { # define EVENT_LINK_CHANGE 11 # define EVENT_SET_RX_MODE 12 # define EVENT_NO_IP_ALIGN 13 + u32 rx_speed; /* in bps - NOT Mbps */ + u32 tx_speed; /* in bps - NOT Mbps */ };
static inline struct usb_driver *driver_of(struct usb_interface *intf)
From: Armin Wolf W_Armin@gmx.de
[ Upstream commit 35d470b5fbc9f82feb77b56bb0d5d0b5cd73e9da ]
When support for up to 10 temp sensors and for disabling automatic BIOS fan control was added, noone updated the index values used for disallowing fan support and fan type calls. Fix those values.
Signed-off-by: Armin Wolf W_Armin@gmx.de Reviewed-by: Pali Rohár pali@kernel.org Link: https://lore.kernel.org/r/20210513154546.12430-1-W_Armin@gmx.de Fixes: 1bb46a20e73b ("hwmon: (dell-smm) Support up to 10 temp sensors") Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/dell-smm-hwmon.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/hwmon/dell-smm-hwmon.c b/drivers/hwmon/dell-smm-hwmon.c index 73b9db9e3aab..63b74e781c5d 100644 --- a/drivers/hwmon/dell-smm-hwmon.c +++ b/drivers/hwmon/dell-smm-hwmon.c @@ -838,10 +838,10 @@ static struct attribute *i8k_attrs[] = { static umode_t i8k_is_visible(struct kobject *kobj, struct attribute *attr, int index) { - if (disallow_fan_support && index >= 8) + if (disallow_fan_support && index >= 20) return 0; if (disallow_fan_type_call && - (index == 9 || index == 12 || index == 15)) + (index == 21 || index == 25 || index == 28)) return 0; if (index >= 0 && index <= 1 && !(i8k_hwmon_flags & I8K_HWMON_HAVE_TEMP1))
From: Grant Peltier grantpeltier93@gmail.com
[ Upstream commit 2a29db088c7ae7121801a0d7a60740ed2d18c4f3 ]
The initial version of the RAA228228 datasheet claimed that the device supported READ_TEMPERATURE_3 but not READ_TEMPERATURE_1. It has since been discovered that the datasheet was incorrect. The RAA228228 does support READ_TEMPERATURE_1 but does not support READ_TEMPERATURE_3.
Signed-off-by: Grant Peltier grantpeltier93@gmail.com Fixes: 51fb91ed5a6f ("hwmon: (pmbus/isl68137) remove READ_TEMPERATURE_1 telemetry for RAA228228") Link: https://lore.kernel.org/r/20210514211954.GA24646@raspberrypi Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/pmbus/isl68137.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/hwmon/pmbus/isl68137.c b/drivers/hwmon/pmbus/isl68137.c index 7cad76e07f70..3f1b826dac8a 100644 --- a/drivers/hwmon/pmbus/isl68137.c +++ b/drivers/hwmon/pmbus/isl68137.c @@ -244,8 +244,8 @@ static int isl68137_probe(struct i2c_client *client) info->read_word_data = raa_dmpvr2_read_word_data; break; case raa_dmpvr2_2rail_nontc: - info->func[0] &= ~PMBUS_HAVE_TEMP; - info->func[1] &= ~PMBUS_HAVE_TEMP; + info->func[0] &= ~PMBUS_HAVE_TEMP3; + info->func[1] &= ~PMBUS_HAVE_TEMP3; fallthrough; case raa_dmpvr2_2rail: info->pages = 2;
From: Florian Westphal fw@strlen.de
[ Upstream commit 22cbdbcfb61acc78d5fc21ebb13ccc0d7e29f793 ]
When ipv6 sockopt register fails, the ipv4 one needs to be removed.
Fixes: a0ae2562c6c ("netfilter: conntrack: remove l3proto abstraction") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_conntrack_proto.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c index 47e9319d2cf3..71892822bbf5 100644 --- a/net/netfilter/nf_conntrack_proto.c +++ b/net/netfilter/nf_conntrack_proto.c @@ -660,7 +660,7 @@ int nf_conntrack_proto_init(void)
#if IS_ENABLED(CONFIG_IPV6) cleanup_sockopt: - nf_unregister_sockopt(&so_getorigdst6); + nf_unregister_sockopt(&so_getorigdst); #endif return ret; }
From: Changbin Du changbin.du@gmail.com
[ Upstream commit 668a84c1bfb2b3fd5a10847825a854d63fac7baa ]
setup_arch() would invoke efi_init()->efi_get_fdt_params(). If no valid fdt found then initial_boot_params will be null. So we should stop further fdt processing here. I encountered this issue on risc-v.
Signed-off-by: Changbin Du changbin.du@gmail.com Fixes: b91540d52a08b ("RISC-V: Add EFI runtime services") Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/fdtparams.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/firmware/efi/fdtparams.c b/drivers/firmware/efi/fdtparams.c index bb042ab7c2be..e901f8564ca0 100644 --- a/drivers/firmware/efi/fdtparams.c +++ b/drivers/firmware/efi/fdtparams.c @@ -98,6 +98,9 @@ u64 __init efi_get_fdt_params(struct efi_memory_map_data *mm) BUILD_BUG_ON(ARRAY_SIZE(target) != ARRAY_SIZE(name)); BUILD_BUG_ON(ARRAY_SIZE(target) != ARRAY_SIZE(dt_params[0].params));
+ if (!fdt) + return 0; + for (i = 0; i < ARRAY_SIZE(dt_params); i++) { node = fdt_path_offset(fdt, dt_params[i].path); if (node < 0)
From: Heiner Kallweit hkallweit1@gmail.com
[ Upstream commit 45add3cc99feaaf57d4b6f01d52d532c16a1caee ]
UEFI spec 2.9, p.108, table 4-1 lists the scenario that both attributes are cleared with the description "No memory access protection is possible for Entry". So we can have valid entries where both attributes are cleared, so remove the check.
Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Fixes: 10f0d2f577053 ("efi: Implement generic support for the Memory Attributes table") Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/memattr.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c index 5737cb0fcd44..0a9aba5f9cef 100644 --- a/drivers/firmware/efi/memattr.c +++ b/drivers/firmware/efi/memattr.c @@ -67,11 +67,6 @@ static bool entry_is_valid(const efi_memory_desc_t *in, efi_memory_desc_t *out) return false; }
- if (!(in->attribute & (EFI_MEMORY_RO | EFI_MEMORY_XP))) { - pr_warn("Entry attributes invalid: RO and XP bits both cleared\n"); - return false; - } - if (PAGE_SIZE > EFI_PAGE_SIZE && (!PAGE_ALIGNED(in->phys_addr) || !PAGE_ALIGNED(in->num_pages << EFI_PAGE_SHIFT))) {
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit c4039b29fe9637e1135912813f830994af4c867f ]
If the buffer has slashes up to the end then this will read past the end of the array. I don't anticipate that this is an issue for many people in real life, but it's the right thing to do and it makes static checkers happy.
Fixes: 7a88a6227dc7 ("efi/libstub: Fix path separator regression") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/libstub/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/libstub/file.c b/drivers/firmware/efi/libstub/file.c index 4e81c6077188..dd95f330fe6e 100644 --- a/drivers/firmware/efi/libstub/file.c +++ b/drivers/firmware/efi/libstub/file.c @@ -103,7 +103,7 @@ static int find_file_option(const efi_char16_t *cmdline, int cmdline_len, return 0;
/* Skip any leading slashes */ - while (cmdline[i] == L'/' || cmdline[i] == L'\') + while (i < cmdline_len && (cmdline[i] == L'/' || cmdline[i] == L'\')) i++;
while (--result_len > 0 && i < cmdline_len) {
From: Rasmus Villemoes linux@rasmusvillemoes.dk
[ Upstream commit 942859d969de7f6f7f2659a79237a758b42782da ]
snprintf() should be given the full buffer size, not one less. And it guarantees nul-termination, so doing it manually afterwards is pointless.
It's even potentially harmful (though probably not in practice because CPER_REC_LEN is 256), due to the "return how much would have been written had the buffer been big enough" semantics. I.e., if the bank and/or device strings are long enough that the "DIMM location ..." output gets truncated, writing to msg[n] is a buffer overflow.
Signed-off-by: Rasmus Villemoes linux@rasmusvillemoes.dk Fixes: 3760cd20402d4 ("CPER: Adjust code flow of some functions") Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/cper.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c index e15d484b6a5a..ea7ca74fc173 100644 --- a/drivers/firmware/efi/cper.c +++ b/drivers/firmware/efi/cper.c @@ -276,8 +276,7 @@ static int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg) if (!msg || !(mem->validation_bits & CPER_MEM_VALID_MODULE_HANDLE)) return 0;
- n = 0; - len = CPER_REC_LEN - 1; + len = CPER_REC_LEN; dmi_memdev_name(mem->mem_dev_handle, &bank, &device); if (bank && device) n = snprintf(msg, len, "DIMM location: %s %s ", bank, device); @@ -286,7 +285,6 @@ static int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg) "DIMM location: not present. DMI handle: 0x%.4x ", mem->mem_dev_handle);
- msg[n] = '\0'; return n; }
From: Zhen Lei thunder.leizhen@huawei.com
[ Upstream commit d1ce2c79156d3baf0830990ab06d296477b93c26 ]
The error code returned from vfio_ext_cap_len() is stored in 'len', not in 'ret'.
Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Zhen Lei thunder.leizhen@huawei.com Reviewed-by: Max Gurtovoy mgurtovoy@nvidia.com Message-Id: 20210515020458.6771-1-thunder.leizhen@huawei.com Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vfio/pci/vfio_pci_config.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c index a402adee8a21..47f21a6ca7fe 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -1581,7 +1581,7 @@ static int vfio_ecap_init(struct vfio_pci_device *vdev) if (len == 0xFF) { len = vfio_ext_cap_len(vdev, ecap, epos); if (len < 0) - return ret; + return len; } }
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit 2a55ca37350171d9b43d561528f23d4130097255 ]
zap_vma_ptes() is only available when CONFIG_MMU is set/enabled. Without CONFIG_MMU, vfio_pci.o has build errors, so make VFIO_PCI depend on MMU.
riscv64-linux-ld: drivers/vfio/pci/vfio_pci.o: in function `vfio_pci_mmap_open': vfio_pci.c:(.text+0x1ec): undefined reference to `zap_vma_ptes' riscv64-linux-ld: drivers/vfio/pci/vfio_pci.o: in function `.L0 ': vfio_pci.c:(.text+0x165c): undefined reference to `zap_vma_ptes'
Fixes: 11c4cd07ba11 ("vfio-pci: Fault mmaps to enable vma tracking") Signed-off-by: Randy Dunlap rdunlap@infradead.org Reported-by: kernel test robot lkp@intel.com Cc: Alex Williamson alex.williamson@redhat.com Cc: Cornelia Huck cohuck@redhat.com Cc: kvm@vger.kernel.org Cc: Jason Gunthorpe jgg@nvidia.com Cc: Eric Auger eric.auger@redhat.com Message-Id: 20210515190856.2130-1-rdunlap@infradead.org Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vfio/pci/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig index 0f28bf99efeb..4e1107767e29 100644 --- a/drivers/vfio/pci/Kconfig +++ b/drivers/vfio/pci/Kconfig @@ -2,6 +2,7 @@ config VFIO_PCI tristate "VFIO support for PCI devices" depends on VFIO && PCI && EVENTFD + depends on MMU select VFIO_VIRQFD select IRQ_BYPASS_MANAGER help
From: Wei Yongjun weiyongjun1@huawei.com
[ Upstream commit 752774ce7793a1f8baa55aae31f3b4caac49cbe4 ]
Fix to return a negative error code from the framebuffer_alloc() error handling case instead of 0, also release regions in some error handing cases.
Fixes: cacade1946a4 ("sample: vfio mdev display - guest driver") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Wei Yongjun weiyongjun1@huawei.com Message-Id: 20210520133641.1421378-1-weiyongjun1@huawei.com Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- samples/vfio-mdev/mdpy-fb.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/samples/vfio-mdev/mdpy-fb.c b/samples/vfio-mdev/mdpy-fb.c index 21dbf63d6e41..9ec93d90e8a5 100644 --- a/samples/vfio-mdev/mdpy-fb.c +++ b/samples/vfio-mdev/mdpy-fb.c @@ -117,22 +117,27 @@ static int mdpy_fb_probe(struct pci_dev *pdev, if (format != DRM_FORMAT_XRGB8888) { pci_err(pdev, "format mismatch (0x%x != 0x%x)\n", format, DRM_FORMAT_XRGB8888); - return -EINVAL; + ret = -EINVAL; + goto err_release_regions; } if (width < 100 || width > 10000) { pci_err(pdev, "width (%d) out of range\n", width); - return -EINVAL; + ret = -EINVAL; + goto err_release_regions; } if (height < 100 || height > 10000) { pci_err(pdev, "height (%d) out of range\n", height); - return -EINVAL; + ret = -EINVAL; + goto err_release_regions; } pci_info(pdev, "mdpy found: %dx%d framebuffer\n", width, height);
info = framebuffer_alloc(sizeof(struct mdpy_fb_par), &pdev->dev); - if (!info) + if (!info) { + ret = -ENOMEM; goto err_release_regions; + } pci_set_drvdata(pdev, info); par = info->par;
From: Max Gurtovoy mgurtovoy@nvidia.com
[ Upstream commit dc51ff91cf2d1e9a2d941da483602f71d4a51472 ]
The ->parent_module is the one that use in try_module_get. It should also be the one the we use in module_put during vfio_platform_open().
Fixes: 32a2d71c4e80 ("vfio: platform: introduce vfio-platform-base module") Signed-off-by: Max Gurtovoy mgurtovoy@nvidia.com Message-Id: 20210518192133.59195-1-mgurtovoy@nvidia.com Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vfio/platform/vfio_platform_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index fb4b385191f2..e83a7cd15c95 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -289,7 +289,7 @@ err_irq: vfio_platform_regions_cleanup(vdev); err_reg: mutex_unlock(&driver_lock); - module_put(THIS_MODULE); + module_put(vdev->parent_module); return ret; }
From: Julian Anastasov ja@ssi.bg
[ Upstream commit 56e4ee82e850026d71223262c07df7d6af3bd872 ]
syzbot reported memory leak [1] when adding service with HASHED flag. We should ignore this flag both from sockopt and netlink provided data, otherwise the service is not hashed and not visible while releasing resources.
[1] BUG: memory leak unreferenced object 0xffff888115227800 (size 512): comm "syz-executor263", pid 8658, jiffies 4294951882 (age 12.560s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff83977188>] kmalloc include/linux/slab.h:556 [inline] [<ffffffff83977188>] kzalloc include/linux/slab.h:686 [inline] [<ffffffff83977188>] ip_vs_add_service+0x598/0x7c0 net/netfilter/ipvs/ip_vs_ctl.c:1343 [<ffffffff8397d770>] do_ip_vs_set_ctl+0x810/0xa40 net/netfilter/ipvs/ip_vs_ctl.c:2570 [<ffffffff838449a8>] nf_setsockopt+0x68/0xa0 net/netfilter/nf_sockopt.c:101 [<ffffffff839ae4e9>] ip_setsockopt+0x259/0x1ff0 net/ipv4/ip_sockglue.c:1435 [<ffffffff839fa03c>] raw_setsockopt+0x18c/0x1b0 net/ipv4/raw.c:857 [<ffffffff83691f20>] __sys_setsockopt+0x1b0/0x360 net/socket.c:2117 [<ffffffff836920f2>] __do_sys_setsockopt net/socket.c:2128 [inline] [<ffffffff836920f2>] __se_sys_setsockopt net/socket.c:2125 [inline] [<ffffffff836920f2>] __x64_sys_setsockopt+0x22/0x30 net/socket.c:2125 [<ffffffff84350efa>] do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47 [<ffffffff84400068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Reported-and-tested-by: syzbot+e562383183e4b1766930@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Julian Anastasov ja@ssi.bg Reviewed-by: Simon Horman horms@verge.net.au Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/ipvs/ip_vs_ctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c index d45dbcba8b49..c25097092a06 100644 --- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -1367,7 +1367,7 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u, ip_vs_addr_copy(svc->af, &svc->addr, &u->addr); svc->port = u->port; svc->fwmark = u->fwmark; - svc->flags = u->flags; + svc->flags = u->flags & ~IP_VS_SVC_F_HASHED; svc->timeout = u->timeout * HZ; svc->netmask = u->netmask; svc->ipvs = ipvs;
From: Tom Rix trix@redhat.com
[ Upstream commit 81c8bf9170477d453b24a6bc3300d201d641e645 ]
Static analysis reports this representative problem
hid-logitech-hidpp.c:1356:23: warning: Assigned value is garbage or undefined hidpp->battery.level = level; ^ ~~~~~
In some cases, 'level' is never set in hidpp20_battery_map_status_voltage() Since level is not available on all hw, initialize level to unknown.
Fixes: be281368f297 ("hid-logitech-hidpp: read battery voltage from newer devices") Signed-off-by: Tom Rix trix@redhat.com Reviewed-by: Filipe Laíns lains@riseup.net Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-logitech-hidpp.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c index 74ebfb12c360..66b105162039 100644 --- a/drivers/hid/hid-logitech-hidpp.c +++ b/drivers/hid/hid-logitech-hidpp.c @@ -1259,6 +1259,7 @@ static int hidpp20_battery_map_status_voltage(u8 data[3], int *voltage, int status;
long flags = (long) data[2]; + *level = POWER_SUPPLY_CAPACITY_LEVEL_UNKNOWN;
if (flags & 0x80) switch (flags & 0x07) {
From: Zhen Lei thunder.leizhen@huawei.com
[ Upstream commit 3dd653c077efda8152f4dd395359617d577a54cd ]
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function.
Fixes: 224ee88fe395 ("Input: add force feedback driver for PID devices") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Zhen Lei thunder.leizhen@huawei.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/usbhid/hid-pidff.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/hid/usbhid/hid-pidff.c b/drivers/hid/usbhid/hid-pidff.c index fddac7c72f64..07a9fe97d2e0 100644 --- a/drivers/hid/usbhid/hid-pidff.c +++ b/drivers/hid/usbhid/hid-pidff.c @@ -1292,6 +1292,7 @@ int hid_pidff_init(struct hid_device *hid)
if (pidff->pool[PID_DEVICE_MANAGED_POOL].value && pidff->pool[PID_DEVICE_MANAGED_POOL].value[0] == 0) { + error = -EPERM; hid_notice(hid, "device does not support device managed pool\n"); goto fail;
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit dc5f9f55502e13ba05731d5046a14620aa2ff456 ]
clang doesn't like printing a 32-bit integer using %hX format string:
drivers/hid/i2c-hid/i2c-hid-core.c:994:18: error: format specifies type 'unsigned short' but the argument has type '__u32' (aka 'unsigned int') [-Werror,-Wformat] client->name, hid->vendor, hid->product); ^~~~~~~~~~~ drivers/hid/i2c-hid/i2c-hid-core.c:994:31: error: format specifies type 'unsigned short' but the argument has type '__u32' (aka 'unsigned int') [-Werror,-Wformat] client->name, hid->vendor, hid->product); ^~~~~~~~~~~~
Use an explicit cast to truncate it to the low 16 bits instead.
Fixes: 9ee3e06610fd ("HID: i2c-hid: override HID descriptors for certain devices") Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/i2c-hid/i2c-hid-core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c index cb7758d59014..13bc59ed1d9e 100644 --- a/drivers/hid/i2c-hid/i2c-hid-core.c +++ b/drivers/hid/i2c-hid/i2c-hid-core.c @@ -1131,8 +1131,8 @@ static int i2c_hid_probe(struct i2c_client *client, hid->vendor = le16_to_cpu(ihid->hdesc.wVendorID); hid->product = le16_to_cpu(ihid->hdesc.wProductID);
- snprintf(hid->name, sizeof(hid->name), "%s %04hX:%04hX", - client->name, hid->vendor, hid->product); + snprintf(hid->name, sizeof(hid->name), "%s %04X:%04X", + client->name, (u16)hid->vendor, (u16)hid->product); strlcpy(hid->phys, dev_name(&client->dev), sizeof(hid->phys));
ihid->quirks = i2c_hid_lookup_quirk(hid->vendor, hid->product);
From: Parav Pandit parav@nvidia.com
[ Upstream commit b28d8f0c25a9b0355116cace5f53ea52bd4020c8 ]
Physical port name, port number attributes do not belong to virtual port flavour. When VF or SF virtual ports are registered they incorrectly append "np0" string in the netdevice name of the VF/SF.
Before this fix, VF netdevice name were ens2f0np0v0, ens2f0np0v1 for VF 0 and 1 respectively.
After the fix, they are ens2f0v0, ens2f0v1.
With this fix, reading /sys/class/net/ens2f0v0/phys_port_name returns -EOPNOTSUPP.
Also devlink port show example for 2 VFs on one PF to ensure that any physical port attributes are not exposed.
$ devlink port show pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false pci/0000:06:00.3/196608: type eth netdev ens2f0v0 flavour virtual splittable false pci/0000:06:00.4/262144: type eth netdev ens2f0v1 flavour virtual splittable false
This change introduces a netdevice name change on systemd/udev version 245 and higher which honors phys_port_name sysfs file for generation of netdevice name.
This also aligns to phys_port_name usage which is limited to switchdev ports as described in [1].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/Doc...
Fixes: acf1ee44ca5d ("devlink: Introduce devlink port flavour virtual") Signed-off-by: Parav Pandit parav@nvidia.com Reviewed-by: Jiri Pirko jiri@nvidia.com Link: https://lore.kernel.org/r/20210526200027.14008-1-parav@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/devlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/devlink.c b/net/core/devlink.c index 5d397838bceb..90badb6f7227 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -693,7 +693,6 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg, case DEVLINK_PORT_FLAVOUR_PHYSICAL: case DEVLINK_PORT_FLAVOUR_CPU: case DEVLINK_PORT_FLAVOUR_DSA: - case DEVLINK_PORT_FLAVOUR_VIRTUAL: if (nla_put_u32(msg, DEVLINK_ATTR_PORT_NUMBER, attrs->phys.port_number)) return -EMSGSIZE; @@ -8376,7 +8375,6 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
switch (attrs->flavour) { case DEVLINK_PORT_FLAVOUR_PHYSICAL: - case DEVLINK_PORT_FLAVOUR_VIRTUAL: if (!attrs->split) n = snprintf(name, len, "p%u", attrs->phys.port_number); else @@ -8413,6 +8411,8 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port, n = snprintf(name, len, "pf%uvf%u", attrs->pci_vf.pf, attrs->pci_vf.vf); break; + case DEVLINK_PORT_FLAVOUR_VIRTUAL: + return -EOPNOTSUPP; }
if (n >= len)
From: Paul Blakey paulb@nvidia.com
[ Upstream commit 0cc254e5aa37cf05f65bcdcdc0ac5c58010feb33 ]
Currently established connections are not offloaded if the filter has a "ct commit" action. This behavior will not offload connections of the following scenario:
$ tc_filter add dev $DEV ingress protocol ip prio 1 flower \ ct_state -trk \ action ct commit action goto chain 1
$ tc_filter add dev $DEV ingress protocol ip chain 1 prio 1 flower \ action mirred egress redirect dev $DEV2
$ tc_filter add dev $DEV2 ingress protocol ip prio 1 flower \ action ct commit action goto chain 1
$ tc_filter add dev $DEV2 ingress protocol ip prio 1 chain 1 flower \ ct_state +trk+est \ action mirred egress redirect dev $DEV
Offload established connections, regardless of the commit flag.
Fixes: 46475bb20f4b ("net/sched: act_ct: Software offload of established flows") Reviewed-by: Oz Shlomo ozsh@nvidia.com Reviewed-by: Jiri Pirko jiri@nvidia.com Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Signed-off-by: Paul Blakey paulb@nvidia.com Link: https://lore.kernel.org/r/1622029449-27060-1-git-send-email-paulb@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/act_ct.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index aba3cd85f284..6376b7b22325 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -979,7 +979,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, */ cached = tcf_ct_skb_nfct_cached(net, skb, p->zone, force); if (!cached) { - if (!commit && tcf_ct_flow_table_lookup(p, skb, family)) { + if (tcf_ct_flow_table_lookup(p, skb, family)) { skip_add = true; goto do_nat; } @@ -1019,10 +1019,11 @@ do_nat: * even if the connection is already confirmed. */ nf_conntrack_confirm(skb); - } else if (!skip_add) { - tcf_ct_flow_table_process_conn(p->ct_ft, ct, ctinfo); }
+ if (!skip_add) + tcf_ct_flow_table_process_conn(p->ct_ft, ct, ctinfo); + out_push: skb_push_rcsum(skb, nh_ofs);
From: Ariel Levkovich lariel@nvidia.com
[ Upstream commit fb91702b743dec78d6507c53a2dec8a8883f509d ]
Fix current behavior of skipping template allocation in case the ct action is in zone 0.
Skipping the allocation may cause the datapath ct code to ignore the entire ct action with all its attributes (commit, nat) in case the ct action in zone 0 was preceded by a ct clear action.
The ct clear action sets the ct_state to untracked and resets the skb->_nfct pointer. Under these conditions and without an allocated ct template, the skb->_nfct pointer will remain NULL which will cause the tc ct action handler to exit without handling commit and nat actions, if such exist.
For example, the following rule in OVS dp: recirc_id(0x2),ct_state(+new-est-rel-rpl+trk),ct_label(0/0x1), \ in_port(eth0),actions:ct_clear,ct(commit,nat(src=10.11.0.12)), \ recirc(0x37a)
Will result in act_ct skipping the commit and nat actions in zone 0.
The change removes the skipping of template allocation for zone 0 and treats it the same as any other zone.
Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Ariel Levkovich lariel@nvidia.com Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Link: https://lore.kernel.org/r/20210526170110.54864-1-lariel@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/act_ct.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 6376b7b22325..315a5b2f3add 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -1199,9 +1199,6 @@ static int tcf_ct_fill_params(struct net *net, sizeof(p->zone)); }
- if (p->zone == NF_CT_DEFAULT_ZONE_ID) - return 0; - nf_ct_zone_init(&zone, p->zone, NF_CT_DEFAULT_ZONE_DIR, 0); tmpl = nf_ct_tmpl_alloc(net, &zone, GFP_KERNEL); if (!tmpl) {
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 06f9a435b3aa12f4de6da91f11fdce8ce7b46205 ]
In subflow_syn_recv_sock() we currently skip options parsing for OoO packet, given that such packets may not carry the relevant MPC option.
If the peer generates an MPC+data TSO packet and some of the early segments are lost or get reorder, we server will ignore the peer key, causing transient, unexpected fallback to TCP.
The solution is always parsing the incoming MPTCP options, and do the fallback only for in-order packets. This actually cleans the existing code a bit.
Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option") Reported-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/subflow.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index bdd6af38a9ae..96b6aca9d0ae 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -527,21 +527,20 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
/* if the sk is MP_CAPABLE, we try to fetch the client key */ if (subflow_req->mp_capable) { - if (TCP_SKB_CB(skb)->seq != subflow_req->ssn_offset + 1) { - /* here we can receive and accept an in-window, - * out-of-order pkt, which will not carry the MP_CAPABLE - * opt even on mptcp enabled paths - */ - goto create_msk; - } - + /* we can receive and accept an in-window, out-of-order pkt, + * which may not carry the MP_CAPABLE opt even on mptcp enabled + * paths: always try to extract the peer key, and fallback + * for packets missing it. + * Even OoO DSS packets coming legitly after dropped or + * reordered MPC will cause fallback, but we don't have other + * options. + */ mptcp_get_options(skb, &mp_opt); if (!mp_opt.mp_capable) { fallback = true; goto create_child; }
-create_msk: new_msk = mptcp_sk_clone(listener->conn, &mp_opt, req); if (!new_msk) fallback = true;
From: Sagi Grimberg sagi@grimberg.me
[ Upstream commit 12b2aaadb6d5ef77434e8db21f469f46fe2d392e ]
We have only 2 inline sg entries and we allow 4 sg entries for the send wr sge. Larger sgls entries will be chained. However when we build in-capsule send wr sge, we iterate without taking into account that the sgl may be chained and still fit in-capsule (which can happen if the sgl is bigger than 2, but lower-equal to 4).
Fix in-capsule data mapping to correctly iterate chained sgls.
Fixes: 38e1800275d3 ("nvme-rdma: Avoid preallocating big SGL for data") Reported-by: Walker, Benjamin benjamin.walker@intel.com Signed-off-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Max Gurtovoy mgurtovoy@nvidia.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/rdma.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index 8b326508a480..e6d58402b829 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -1327,16 +1327,17 @@ static int nvme_rdma_map_sg_inline(struct nvme_rdma_queue *queue, int count) { struct nvme_sgl_desc *sg = &c->common.dptr.sgl; - struct scatterlist *sgl = req->data_sgl.sg_table.sgl; struct ib_sge *sge = &req->sge[1]; + struct scatterlist *sgl; u32 len = 0; int i;
- for (i = 0; i < count; i++, sgl++, sge++) { + for_each_sg(req->data_sgl.sg_table.sgl, sgl, count, i) { sge->addr = sg_dma_address(sgl); sge->length = sg_dma_len(sgl); sge->lkey = queue->device->pd->local_dma_lkey; len += sge->length; + sge++; }
sg->addr = cpu_to_le64(queue->ctrl->ctrl.icdoff);
From: Erik Kaneda erik.kaneda@intel.com
[ Upstream commit e4dfe108371214500ee10c2cf19268f53acaa803 ]
ACPICA commit bc43c878fd4ff27ba75b1d111b97ee90d4a82707
Fixes: c27f3d011b08 ("Fix race in GenericSerialBus (I2C) and GPIO OpRegion parameter handling") Link: https://github.com/acpica/acpica/commit/bc43c878 Reported-by: John Garry john.garry@huawei.com Reported-by: Xiang Chen chenxiang66@hisilicon.com Tested-by: Xiang Chen chenxiang66@hisilicon.com Signed-off-by: Erik Kaneda erik.kaneda@intel.com Signed-off-by: Bob Moore robert.moore@intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/acpica/utdelete.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/acpi/acpica/utdelete.c b/drivers/acpi/acpica/utdelete.c index 4c0d4e434196..72d2c0b65633 100644 --- a/drivers/acpi/acpica/utdelete.c +++ b/drivers/acpi/acpica/utdelete.c @@ -285,6 +285,14 @@ static void acpi_ut_delete_internal_obj(union acpi_operand_object *object) } break;
+ case ACPI_TYPE_LOCAL_ADDRESS_HANDLER: + + ACPI_DEBUG_PRINT((ACPI_DB_ALLOCATIONS, + "***** Address handler %p\n", object)); + + acpi_os_delete_mutex(object->address_space.context_mutex); + break; + default:
break;
From: Li Huafei lihuafei1@huawei.com
[ Upstream commit 3cb17cce1e76ccc5499915a4d7e095a1ad6bf7ff ]
If we just check whether the variable can be converted, 'tvar' should be a null pointer. However, the null pointer check is missing in the 'Constant value' execution path.
The following cases can trigger this problem:
$ cat test.c #include <stdio.h>
void main(void) { int a; const int b = 1;
asm volatile("mov %1, %0" : "=r"(a): "i"(b)); printf("a: %d\n", a); }
$ gcc test.c -o test -O -g $ sudo ./perf probe -x ./test -L "main" <main@/home/lhf/test.c:0> 0 void main(void) { 2 int a; const int b = 1;
asm volatile("mov %1, %0" : "=r"(a): "i"(b)); 6 printf("a: %d\n", a); }
$ sudo ./perf probe -x ./test -V "main:6" Segmentation fault
The check on 'tvar' is added. If 'tavr' is a null pointer, we return 0 to indicate that the variable can be converted. Now, we can successfully show the variables that can be accessed.
$ sudo ./perf probe -x ./test -V "main:6" Available variables at main:6 @<main+13> char* __fmt int a int b
However, the variable 'b' cannot be tracked.
$ sudo ./perf probe -x ./test -D "main:6 b" Failed to find the location of the 'b' variable at this address. Perhaps it has been optimized out. Use -V with the --range option to show 'b' location range. Error: Failed to add events.
This is because __die_find_variable_cb() did not successfully match variable 'b', which has the DW_AT_const_value attribute instead of DW_AT_location. We added support for DW_AT_const_value in __die_find_variable_cb(). With this modification, we can successfully track the variable 'b'.
$ sudo ./perf probe -x ./test -D "main:6 b" p:probe_test/main_L6 /home/lhf/test:0x1156 b=\1:s32
Fixes: 66f69b219716 ("perf probe: Support DW_AT_const_value constant value") Signed-off-by: Li Huafei lihuafei1@huawei.com Tested-by: Arnaldo Carvalho de Melo acme@redhat.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Frank Ch. Eigler fche@redhat.com Cc: Jianlin Lv jianlin.lv@arm.com Cc: Jiri Olsa jolsa@redhat.com Cc: Mark Rutland mark.rutland@arm.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Srikar Dronamraju srikar@linux.vnet.ibm.com Cc: Yang Jihong yangjihong1@huawei.com Cc: Zhang Jinhao zhangjinhao2@huawei.com http://lore.kernel.org/lkml/20210601092750.169601-1-lihuafei1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/dwarf-aux.c | 8 ++++++-- tools/perf/util/probe-finder.c | 3 +++ 2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c index 7b2d471a6419..4343356f3cf9 100644 --- a/tools/perf/util/dwarf-aux.c +++ b/tools/perf/util/dwarf-aux.c @@ -975,9 +975,13 @@ static int __die_find_variable_cb(Dwarf_Die *die_mem, void *data) if ((tag == DW_TAG_formal_parameter || tag == DW_TAG_variable) && die_compare_name(die_mem, fvp->name) && - /* Does the DIE have location information or external instance? */ + /* + * Does the DIE have location information or const value + * or external instance? + */ (dwarf_attr(die_mem, DW_AT_external, &attr) || - dwarf_attr(die_mem, DW_AT_location, &attr))) + dwarf_attr(die_mem, DW_AT_location, &attr) || + dwarf_attr(die_mem, DW_AT_const_value, &attr))) return DIE_FIND_CB_END; if (dwarf_haspc(die_mem, fvp->addr)) return DIE_FIND_CB_CONTINUE; diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c index 76dd349aa48d..fdafbfcef687 100644 --- a/tools/perf/util/probe-finder.c +++ b/tools/perf/util/probe-finder.c @@ -190,6 +190,9 @@ static int convert_variable_location(Dwarf_Die *vr_die, Dwarf_Addr addr, immediate_value_is_supported()) { Dwarf_Sword snum;
+ if (!tvar) + return 0; + dwarf_formsdata(&attr, &snum); ret = asprintf(&tvar->value, "\%ld", (long)snum);
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 4ef8d857b5f494e62bce9085031563fda35f9563 ]
When using sub-VLANs in the range of 1-7, the resulting value from:
rx_vid = dsa_8021q_rx_vid_subvlan(ds, port, subvlan);
is wrong according to the description from tag_8021q.c:
| 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | +-----------+-----+-----------------+-----------+-----------------------+ | DIR | SVL | SWITCH_ID | SUBVLAN | PORT | +-----------+-----+-----------------+-----------+-----------------------+
For example, when ds->index == 0, port == 3 and subvlan == 1, dsa_8021q_rx_vid_subvlan() returns 1027, same as it returns for subvlan == 0, but it should have returned 1043.
This is because the low portion of the subvlan bits are not masked properly when writing into the 12-bit VLAN value. They are masked into bits 4:3, but they should be masked into bits 5:4.
Fixes: 3eaae1d05f2b ("net: dsa: tag_8021q: support up to 8 VLANs per port using sub-VLANs") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/dsa/tag_8021q.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/dsa/tag_8021q.c b/net/dsa/tag_8021q.c index 8e3e8a5b8559..a00b513c22a1 100644 --- a/net/dsa/tag_8021q.c +++ b/net/dsa/tag_8021q.c @@ -64,7 +64,7 @@ #define DSA_8021Q_SUBVLAN_HI_SHIFT 9 #define DSA_8021Q_SUBVLAN_HI_MASK GENMASK(9, 9) #define DSA_8021Q_SUBVLAN_LO_SHIFT 4 -#define DSA_8021Q_SUBVLAN_LO_MASK GENMASK(4, 3) +#define DSA_8021Q_SUBVLAN_LO_MASK GENMASK(5, 4) #define DSA_8021Q_SUBVLAN_HI(x) (((x) & GENMASK(2, 2)) >> 2) #define DSA_8021Q_SUBVLAN_LO(x) ((x) & GENMASK(1, 0)) #define DSA_8021Q_SUBVLAN(x) \
From: Alexander Aring aahringo@redhat.com
[ Upstream commit dd9082f4a9f94280fbbece641bf8fc0a25f71f7a ]
This patch fixes the in-kernel mark setting by doing an additional sk_dst_reset() which was introduced by commit 50254256f382 ("sock: Reset dst when changing sk_mark via setsockopt"). The code is now shared to avoid any further suprises when changing the socket mark value.
Fixes: 84d1c617402e ("net: sock: add sock_set_mark") Reported-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Signed-off-by: Alexander Aring aahringo@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/net/core/sock.c b/net/core/sock.c index dee29f41beaf..7de51ea15cdf 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -807,10 +807,18 @@ void sock_set_rcvbuf(struct sock *sk, int val) } EXPORT_SYMBOL(sock_set_rcvbuf);
+static void __sock_set_mark(struct sock *sk, u32 val) +{ + if (val != sk->sk_mark) { + sk->sk_mark = val; + sk_dst_reset(sk); + } +} + void sock_set_mark(struct sock *sk, u32 val) { lock_sock(sk); - sk->sk_mark = val; + __sock_set_mark(sk, val); release_sock(sk); } EXPORT_SYMBOL(sock_set_mark); @@ -1118,10 +1126,10 @@ set_sndbuf: case SO_MARK: if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { ret = -EPERM; - } else if (val != sk->sk_mark) { - sk->sk_mark = val; - sk_dst_reset(sk); + break; } + + __sock_set_mark(sk, val); break;
case SO_RXQ_OVFL:
From: Maxim Mikityanskiy maximmi@nvidia.com
[ Upstream commit 05fc8b6cbd4f979a6f25759c4a17dd5f657f7ecd ]
RCU synchronization is guaranteed to finish in finite time, unlike a busy loop that polls a flag. This patch is a preparation for the bugfix in the next patch, where the same synchronize_net() call will also be used to sync with the TX datapath.
Signed-off-by: Maxim Mikityanskiy maximmi@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/tls.h | 1 - net/tls/tls_device.c | 10 +++------- 2 files changed, 3 insertions(+), 8 deletions(-)
diff --git a/include/net/tls.h b/include/net/tls.h index 2bdd802212fe..d32a06705587 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -193,7 +193,6 @@ struct tls_offload_context_tx { (sizeof(struct tls_offload_context_tx) + TLS_DRIVER_STATE_SIZE_TX)
enum tls_context_flags { - TLS_RX_SYNC_RUNNING = 0, /* Unlike RX where resync is driven entirely by the core in TX only * the driver knows when things went out of sync, so we need the flag * to be atomic. diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index a3ab2d3d4e4e..abc04045577d 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -680,15 +680,13 @@ static void tls_device_resync_rx(struct tls_context *tls_ctx, struct tls_offload_context_rx *rx_ctx = tls_offload_ctx_rx(tls_ctx); struct net_device *netdev;
- if (WARN_ON(test_and_set_bit(TLS_RX_SYNC_RUNNING, &tls_ctx->flags))) - return; - trace_tls_device_rx_resync_send(sk, seq, rcd_sn, rx_ctx->resync_type); + rcu_read_lock(); netdev = READ_ONCE(tls_ctx->netdev); if (netdev) netdev->tlsdev_ops->tls_dev_resync(netdev, sk, seq, rcd_sn, TLS_OFFLOAD_CTX_DIR_RX); - clear_bit_unlock(TLS_RX_SYNC_RUNNING, &tls_ctx->flags); + rcu_read_unlock(); TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSRXDEVICERESYNC); }
@@ -1298,9 +1296,7 @@ static int tls_device_down(struct net_device *netdev) netdev->tlsdev_ops->tls_dev_del(netdev, ctx, TLS_OFFLOAD_CTX_DIR_RX); WRITE_ONCE(ctx->netdev, NULL); - smp_mb__before_atomic(); /* pairs with test_and_set_bit() */ - while (test_bit(TLS_RX_SYNC_RUNNING, &ctx->flags)) - usleep_range(10, 200); + synchronize_net(); dev_put(netdev); list_del_init(&ctx->list);
From: Maxim Mikityanskiy maximmi@nvidia.com
[ Upstream commit c55dcdd435aa6c6ad6ccac0a4c636d010ee367a4 ]
When a netdev with active TLS offload goes down, tls_device_down is called to stop the offload and tear down the TLS context. However, the socket stays alive, and it still points to the TLS context, which is now deallocated. If a netdev goes up, while the connection is still active, and the data flow resumes after a number of TCP retransmissions, it will lead to a use-after-free of the TLS context.
This commit addresses this bug by keeping the context alive until its normal destruction, and implements the necessary fallbacks, so that the connection can resume in software (non-offloaded) kTLS mode.
On the TX side tls_sw_fallback is used to encrypt all packets. The RX side already has all the necessary fallbacks, because receiving non-decrypted packets is supported. The thing needed on the RX side is to block resync requests, which are normally produced after receiving non-decrypted packets.
The necessary synchronization is implemented for a graceful teardown: first the fallbacks are deployed, then the driver resources are released (it used to be possible to have a tls_dev_resync after tls_dev_del).
A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback mode. It's used to skip the RX resync logic completely, as it becomes useless, and some objects may be released (for example, resync_async, which is allocated and freed by the driver).
Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy maximmi@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/tls.h | 9 ++++++ net/tls/tls_device.c | 52 +++++++++++++++++++++++++++++++---- net/tls/tls_device_fallback.c | 7 +++++ net/tls/tls_main.c | 1 + 4 files changed, 64 insertions(+), 5 deletions(-)
diff --git a/include/net/tls.h b/include/net/tls.h index d32a06705587..43891b28fc48 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -193,6 +193,11 @@ struct tls_offload_context_tx { (sizeof(struct tls_offload_context_tx) + TLS_DRIVER_STATE_SIZE_TX)
enum tls_context_flags { + /* tls_device_down was called after the netdev went down, device state + * was released, and kTLS works in software, even though rx_conf is + * still TLS_HW (needed for transition). + */ + TLS_RX_DEV_DEGRADED = 0, /* Unlike RX where resync is driven entirely by the core in TX only * the driver knows when things went out of sync, so we need the flag * to be atomic. @@ -264,6 +269,7 @@ struct tls_context {
/* cache cold stuff */ struct proto *sk_proto; + struct sock *sk;
void (*sk_destruct)(struct sock *sk);
@@ -446,6 +452,9 @@ static inline u16 tls_user_config(struct tls_context *ctx, bool tx) struct sk_buff * tls_validate_xmit_skb(struct sock *sk, struct net_device *dev, struct sk_buff *skb); +struct sk_buff * +tls_validate_xmit_skb_sw(struct sock *sk, struct net_device *dev, + struct sk_buff *skb);
static inline bool tls_is_sk_tx_device_offloaded(struct sock *sk) { diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index abc04045577d..f718c7346088 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -50,6 +50,7 @@ static void tls_device_gc_task(struct work_struct *work); static DECLARE_WORK(tls_device_gc_work, tls_device_gc_task); static LIST_HEAD(tls_device_gc_list); static LIST_HEAD(tls_device_list); +static LIST_HEAD(tls_device_down_list); static DEFINE_SPINLOCK(tls_device_lock);
static void tls_device_free_ctx(struct tls_context *ctx) @@ -759,6 +760,8 @@ void tls_device_rx_resync_new_rec(struct sock *sk, u32 rcd_len, u32 seq)
if (tls_ctx->rx_conf != TLS_HW) return; + if (unlikely(test_bit(TLS_RX_DEV_DEGRADED, &tls_ctx->flags))) + return;
prot = &tls_ctx->prot_info; rx_ctx = tls_offload_ctx_rx(tls_ctx); @@ -961,6 +964,17 @@ int tls_device_decrypted(struct sock *sk, struct tls_context *tls_ctx,
ctx->sw.decrypted |= is_decrypted;
+ if (unlikely(test_bit(TLS_RX_DEV_DEGRADED, &tls_ctx->flags))) { + if (likely(is_encrypted || is_decrypted)) + return 0; + + /* After tls_device_down disables the offload, the next SKB will + * likely have initial fragments decrypted, and final ones not + * decrypted. We need to reencrypt that single SKB. + */ + return tls_device_reencrypt(sk, skb); + } + /* Return immediately if the record is either entirely plaintext or * entirely ciphertext. Otherwise handle reencrypt partially decrypted * record. @@ -1288,6 +1302,26 @@ static int tls_device_down(struct net_device *netdev) spin_unlock_irqrestore(&tls_device_lock, flags);
list_for_each_entry_safe(ctx, tmp, &list, list) { + /* Stop offloaded TX and switch to the fallback. + * tls_is_sk_tx_device_offloaded will return false. + */ + WRITE_ONCE(ctx->sk->sk_validate_xmit_skb, tls_validate_xmit_skb_sw); + + /* Stop the RX and TX resync. + * tls_dev_resync must not be called after tls_dev_del. + */ + WRITE_ONCE(ctx->netdev, NULL); + + /* Start skipping the RX resync logic completely. */ + set_bit(TLS_RX_DEV_DEGRADED, &ctx->flags); + + /* Sync with inflight packets. After this point: + * TX: no non-encrypted packets will be passed to the driver. + * RX: resync requests from the driver will be ignored. + */ + synchronize_net(); + + /* Release the offload context on the driver side. */ if (ctx->tx_conf == TLS_HW) netdev->tlsdev_ops->tls_dev_del(netdev, ctx, TLS_OFFLOAD_CTX_DIR_TX); @@ -1295,13 +1329,21 @@ static int tls_device_down(struct net_device *netdev) !test_bit(TLS_RX_DEV_CLOSED, &ctx->flags)) netdev->tlsdev_ops->tls_dev_del(netdev, ctx, TLS_OFFLOAD_CTX_DIR_RX); - WRITE_ONCE(ctx->netdev, NULL); - synchronize_net(); + dev_put(netdev); - list_del_init(&ctx->list);
- if (refcount_dec_and_test(&ctx->refcount)) - tls_device_free_ctx(ctx); + /* Move the context to a separate list for two reasons: + * 1. When the context is deallocated, list_del is called. + * 2. It's no longer an offloaded context, so we don't want to + * run offload-specific code on this context. + */ + spin_lock_irqsave(&tls_device_lock, flags); + list_move_tail(&ctx->list, &tls_device_down_list); + spin_unlock_irqrestore(&tls_device_lock, flags); + + /* Device contexts for RX and TX will be freed in on sk_destruct + * by tls_device_free_ctx. rx_conf and tx_conf stay in TLS_HW. + */ }
up_write(&device_offload_lock); diff --git a/net/tls/tls_device_fallback.c b/net/tls/tls_device_fallback.c index 28895333701e..0d40016bf69e 100644 --- a/net/tls/tls_device_fallback.c +++ b/net/tls/tls_device_fallback.c @@ -430,6 +430,13 @@ struct sk_buff *tls_validate_xmit_skb(struct sock *sk, } EXPORT_SYMBOL_GPL(tls_validate_xmit_skb);
+struct sk_buff *tls_validate_xmit_skb_sw(struct sock *sk, + struct net_device *dev, + struct sk_buff *skb) +{ + return tls_sw_fallback(sk, skb); +} + struct sk_buff *tls_encrypt_skb(struct sk_buff *skb) { return tls_sw_fallback(skb->sk, skb); diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index 8d93cea99f2c..32a51b20509c 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -633,6 +633,7 @@ struct tls_context *tls_ctx_create(struct sock *sk) mutex_init(&ctx->tx_lock); rcu_assign_pointer(icsk->icsk_ulp_data, ctx); ctx->sk_proto = READ_ONCE(sk->sk_prot); + ctx->sk = sk; return ctx; }
From: Aya Levin ayal@nvidia.com
[ Upstream commit d8ec92005f806dfa7524e9171eca707c0bb1267e ]
Device supports setting of a single fec mode at a time, enforce this by bitmap_weight == 1. Input from fec command is in u32, avoid cast to unsigned long and use bitmap_from_arr32 to populate bitmap safely.
Fixes: 4bd9d5070b92 ("net/mlx5e: Enforce setting of a single FEC mode") Signed-off-by: Aya Levin ayal@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index 986f0d86e94d..bc7c1962f9e6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -1618,12 +1618,13 @@ static int mlx5e_set_fecparam(struct net_device *netdev, { struct mlx5e_priv *priv = netdev_priv(netdev); struct mlx5_core_dev *mdev = priv->mdev; + unsigned long fec_bitmap; u16 fec_policy = 0; int mode; int err;
- if (bitmap_weight((unsigned long *)&fecparam->fec, - ETHTOOL_FEC_LLRS_BIT + 1) > 1) + bitmap_from_arr32(&fec_bitmap, &fecparam->fec, sizeof(fecparam->fec) * BITS_PER_BYTE); + if (bitmap_weight(&fec_bitmap, ETHTOOL_FEC_LLRS_BIT + 1) > 1) return -EOPNOTSUPP;
for (mode = 0; mode < ARRAY_SIZE(pplm_fec_2_ethtool); mode++) {
From: Moshe Shemesh moshe@nvidia.com
[ Upstream commit 5940e64281c09976ce2b560244217e610bf9d029 ]
In case driver sent NACK to firmware on sync reset request, it will get sync reset abort event while it didn't set sync reset requested mode. Thus, on abort sync reset event handler, driver should check reset requested is set before trying to stop sync reset poll.
Fixes: 7dd6df329d4c ("net/mlx5: Handle sync reset abort event") Signed-off-by: Moshe Shemesh moshe@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c index f9042e147c7f..ee710ce00795 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c @@ -354,6 +354,9 @@ static void mlx5_sync_reset_abort_event(struct work_struct *work) reset_abort_work); struct mlx5_core_dev *dev = fw_reset->dev;
+ if (!test_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags)) + return; + mlx5_sync_reset_clear_reset_requested(dev, true); mlx5_core_warn(dev, "PCI Sync FW Update Reset Aborted.\n"); }
From: Roi Dayan roid@nvidia.com
[ Upstream commit afe93f71b5d3cdae7209213ec8ef25210b837b93 ]
If not supported show an error and return instead of trying to offload to the hardware and fail.
Fixes: 699e96ddf47f ("net/mlx5e: Support offloading tc double vlan headers match") Reported-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Roi Dayan roid@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 1bdeb948f56d..80abdb0b47d7 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -2253,11 +2253,13 @@ static int __parse_cls_flower(struct mlx5e_priv *priv, misc_parameters); struct flow_rule *rule = flow_cls_offload_flow_rule(f); struct flow_dissector *dissector = rule->match.dissector; + enum fs_flow_table_type fs_type; u16 addr_type = 0; u8 ip_proto = 0; u8 *match_level; int err;
+ fs_type = mlx5e_is_eswitch_flow(flow) ? FS_FT_FDB : FS_FT_NIC_RX; match_level = outer_match_level;
if (dissector->used_keys & @@ -2382,6 +2384,13 @@ static int __parse_cls_flower(struct mlx5e_priv *priv, if (match.mask->vlan_id || match.mask->vlan_priority || match.mask->vlan_tpid) { + if (!MLX5_CAP_FLOWTABLE_TYPE(priv->mdev, ft_field_support.outer_second_vid, + fs_type)) { + NL_SET_ERR_MSG_MOD(extack, + "Matching on CVLAN is not supported"); + return -EOPNOTSUPP; + } + if (match.key->vlan_tpid == htons(ETH_P_8021AD)) { MLX5_SET(fte_match_set_misc, misc_c, outer_second_svlan_tag, 1);
From: Yevgeny Kliteynik kliteyn@nvidia.com
[ Upstream commit 216214c64a8c1cb9078c2c0aec7bb4a2f8e75397 ]
Flow table that contains flow pointing to multiple flow tables or multiple TIRs must have a level lower than 64. In our case it applies to muli- destination flow table. Fix the level of the created table to comply with HW Spec definitions, and still make sure that its level lower than SW-owned tables, so that it would be possible to point from the multi-destination FW table to SW tables.
Fixes: 34583beea4b7 ("net/mlx5: DR, Create multi-destination table for SW-steering use") Signed-off-by: Yevgeny Kliteynik kliteyn@nvidia.com Reviewed-by: Alex Vesker valex@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/steering/dr_fw.c | 3 ++- include/linux/mlx5/mlx5_ifc.h | 2 ++ 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_fw.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_fw.c index 1fbcd012bb85..7ccfd40586ce 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_fw.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_fw.c @@ -112,7 +112,8 @@ int mlx5dr_fw_create_md_tbl(struct mlx5dr_domain *dmn, int ret;
ft_attr.table_type = MLX5_FLOW_TABLE_TYPE_FDB; - ft_attr.level = dmn->info.caps.max_ft_level - 2; + ft_attr.level = min_t(int, dmn->info.caps.max_ft_level - 2, + MLX5_FT_MAX_MULTIPATH_LEVEL); ft_attr.reformat_en = reformat_req; ft_attr.decap_en = reformat_req;
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index cc9ee0776974..af8f4e2cf21d 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1223,6 +1223,8 @@ enum mlx5_fc_bulk_alloc_bitmask {
#define MLX5_FC_BULK_NUM_FCS(fc_enum) (MLX5_FC_BULK_SIZE_FACTOR * (fc_enum))
+#define MLX5_FT_MAX_MULTIPATH_LEVEL 63 + enum { MLX5_STEERING_FORMAT_CONNECTX_5 = 0, MLX5_STEERING_FORMAT_CONNECTX_6DX = 1,
From: Max Gurtovoy mgurtovoy@nvidia.com
[ Upstream commit bcd9a0797d73eeff659582f23277e7ab6e5f18f3 ]
In case p2p device was found but the p2p pool is empty, the nvme target is still trying to free the sgl from the p2p pool instead of the regular sgl pool and causing a crash (BUG() is called). Instead, assign the p2p_dev for the request only if it was allocated from p2p pool.
This is the crash that was caused:
[Sun May 30 19:13:53 2021] ------------[ cut here ]------------ [Sun May 30 19:13:53 2021] kernel BUG at lib/genalloc.c:518! [Sun May 30 19:13:53 2021] invalid opcode: 0000 [#1] SMP PTI ... [Sun May 30 19:13:53 2021] kernel BUG at lib/genalloc.c:518! ... [Sun May 30 19:13:53 2021] RIP: 0010:gen_pool_free_owner+0xa8/0xb0 ... [Sun May 30 19:13:53 2021] Call Trace: [Sun May 30 19:13:53 2021] ------------[ cut here ]------------ [Sun May 30 19:13:53 2021] pci_free_p2pmem+0x2b/0x70 [Sun May 30 19:13:53 2021] pci_p2pmem_free_sgl+0x4f/0x80 [Sun May 30 19:13:53 2021] nvmet_req_free_sgls+0x1e/0x80 [nvmet] [Sun May 30 19:13:53 2021] kernel BUG at lib/genalloc.c:518! [Sun May 30 19:13:53 2021] nvmet_rdma_release_rsp+0x4e/0x1f0 [nvmet_rdma] [Sun May 30 19:13:53 2021] nvmet_rdma_send_done+0x1c/0x60 [nvmet_rdma]
Fixes: c6e3f1339812 ("nvmet: add metadata support for block devices") Reviewed-by: Israel Rukshin israelr@nvidia.com Signed-off-by: Max Gurtovoy mgurtovoy@nvidia.com Reviewed-by: Logan Gunthorpe logang@deltatee.com Reviewed-by: Chaitanya Kulkarni chaitanya.kulkarni@wdc.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/core.c | 33 ++++++++++++++++----------------- 1 file changed, 16 insertions(+), 17 deletions(-)
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index 46e4f7ea34c8..8b939e9db470 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -988,19 +988,23 @@ static unsigned int nvmet_data_transfer_len(struct nvmet_req *req) return req->transfer_len - req->metadata_len; }
-static int nvmet_req_alloc_p2pmem_sgls(struct nvmet_req *req) +static int nvmet_req_alloc_p2pmem_sgls(struct pci_dev *p2p_dev, + struct nvmet_req *req) { - req->sg = pci_p2pmem_alloc_sgl(req->p2p_dev, &req->sg_cnt, + req->sg = pci_p2pmem_alloc_sgl(p2p_dev, &req->sg_cnt, nvmet_data_transfer_len(req)); if (!req->sg) goto out_err;
if (req->metadata_len) { - req->metadata_sg = pci_p2pmem_alloc_sgl(req->p2p_dev, + req->metadata_sg = pci_p2pmem_alloc_sgl(p2p_dev, &req->metadata_sg_cnt, req->metadata_len); if (!req->metadata_sg) goto out_free_sg; } + + req->p2p_dev = p2p_dev; + return 0; out_free_sg: pci_p2pmem_free_sgl(req->p2p_dev, req->sg); @@ -1008,25 +1012,19 @@ out_err: return -ENOMEM; }
-static bool nvmet_req_find_p2p_dev(struct nvmet_req *req) +static struct pci_dev *nvmet_req_find_p2p_dev(struct nvmet_req *req) { - if (!IS_ENABLED(CONFIG_PCI_P2PDMA)) - return false; - - if (req->sq->ctrl && req->sq->qid && req->ns) { - req->p2p_dev = radix_tree_lookup(&req->sq->ctrl->p2p_ns_map, - req->ns->nsid); - if (req->p2p_dev) - return true; - } - - req->p2p_dev = NULL; - return false; + if (!IS_ENABLED(CONFIG_PCI_P2PDMA) || + !req->sq->ctrl || !req->sq->qid || !req->ns) + return NULL; + return radix_tree_lookup(&req->sq->ctrl->p2p_ns_map, req->ns->nsid); }
int nvmet_req_alloc_sgls(struct nvmet_req *req) { - if (nvmet_req_find_p2p_dev(req) && !nvmet_req_alloc_p2pmem_sgls(req)) + struct pci_dev *p2p_dev = nvmet_req_find_p2p_dev(req); + + if (p2p_dev && !nvmet_req_alloc_p2pmem_sgls(p2p_dev, req)) return 0;
req->sg = sgl_alloc(nvmet_data_transfer_len(req), GFP_KERNEL, @@ -1055,6 +1053,7 @@ void nvmet_req_free_sgls(struct nvmet_req *req) pci_p2pmem_free_sgl(req->p2p_dev, req->sg); if (req->metadata_sg) pci_p2pmem_free_sgl(req->p2p_dev, req->metadata_sg); + req->p2p_dev = NULL; } else { sgl_free(req->sg); if (req->metadata_sg)
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 1710eb913bdcda3917f44d383c32de6bdabfc836 ]
nft_ct_expect_obj_eval() calls nf_ct_ext_add() for a confirmed conntrack entry. However, nf_ct_ext_add() can only be called for !nf_ct_is_confirmed().
[ 1825.349056] WARNING: CPU: 0 PID: 1279 at net/netfilter/nf_conntrack_extend.c:48 nf_ct_xt_add+0x18e/0x1a0 [nf_conntrack] [ 1825.351391] RIP: 0010:nf_ct_ext_add+0x18e/0x1a0 [nf_conntrack] [ 1825.351493] Code: 41 5c 41 5d 41 5e 41 5f c3 41 bc 0a 00 00 00 e9 15 ff ff ff ba 09 00 00 00 31 f6 4c 89 ff e8 69 6c 3d e9 eb 96 45 31 ed eb cd <0f> 0b e9 b1 fe ff ff e8 86 79 14 e9 eb bf 0f 1f 40 00 0f 1f 44 00 [ 1825.351721] RSP: 0018:ffffc90002e1f1e8 EFLAGS: 00010202 [ 1825.351790] RAX: 000000000000000e RBX: ffff88814f5783c0 RCX: ffffffffc0e4f887 [ 1825.351881] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88814f578440 [ 1825.351971] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88814f578447 [ 1825.352060] R10: ffffed1029eaf088 R11: 0000000000000001 R12: ffff88814f578440 [ 1825.352150] R13: ffff8882053f3a00 R14: 0000000000000000 R15: 0000000000000a20 [ 1825.352240] FS: 00007f992261c900(0000) GS:ffff889faec00000(0000) knlGS:0000000000000000 [ 1825.352343] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1825.352417] CR2: 000056070a4d1158 CR3: 000000015efe0000 CR4: 0000000000350ee0 [ 1825.352508] Call Trace: [ 1825.352544] nf_ct_helper_ext_add+0x10/0x60 [nf_conntrack] [ 1825.352641] nft_ct_expect_obj_eval+0x1b8/0x1e0 [nft_ct] [ 1825.352716] nft_do_chain+0x232/0x850 [nf_tables]
Add the ct helper extension only for unconfirmed conntrack. Skip rule evaluation if the ct helper extension does not exist. Thus, you can only create expectations from the first packet.
It should be possible to remove this limitation by adding a new action to attach a generic ct helper to the first packet. Then, use this ct helper extension from follow up packets to create the ct expectation.
While at it, add a missing check to skip the template conntrack too and remove check for IPCT_UNTRACK which is implicit to !ct.
Fixes: 857b46027d6f ("netfilter: nft_ct: add ct expectations support") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_ct.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c index a1b0aac46e9e..70d46e0bbf06 100644 --- a/net/netfilter/nft_ct.c +++ b/net/netfilter/nft_ct.c @@ -1218,7 +1218,7 @@ static void nft_ct_expect_obj_eval(struct nft_object *obj, struct nf_conn *ct;
ct = nf_ct_get(pkt->skb, &ctinfo); - if (!ct || ctinfo == IP_CT_UNTRACKED) { + if (!ct || nf_ct_is_confirmed(ct) || nf_ct_is_template(ct)) { regs->verdict.code = NFT_BREAK; return; }
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 8971ee8b087750a23f3cd4dc55bff2d0303fd267 ]
The private helper data size cannot be updated. However, updates that contain NFCTH_PRIV_DATA_LEN might bogusly hit EBUSY even if the size is the same.
Fixes: 12f7a505331e ("netfilter: add user-space connection tracking helper infrastructure") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nfnetlink_cthelper.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nfnetlink_cthelper.c b/net/netfilter/nfnetlink_cthelper.c index 5b0d0a77379c..91afbf8ac8cf 100644 --- a/net/netfilter/nfnetlink_cthelper.c +++ b/net/netfilter/nfnetlink_cthelper.c @@ -380,10 +380,14 @@ static int nfnl_cthelper_update(const struct nlattr * const tb[], struct nf_conntrack_helper *helper) { + u32 size; int ret;
- if (tb[NFCTH_PRIV_DATA_LEN]) - return -EBUSY; + if (tb[NFCTH_PRIV_DATA_LEN]) { + size = ntohl(nla_get_be32(tb[NFCTH_PRIV_DATA_LEN])); + if (size != helper->data_len) + return -EBUSY; + }
if (tb[NFCTH_POLICY]) { ret = nfnl_cthelper_update_policy(helper, tb[NFCTH_POLICY]);
From: Zhihao Cheng chengzhihao1@huawei.com
[ Upstream commit 10c1f0cbcea93beec5d3bdc02b1a3b577b4985e7 ]
In case of error, the function live_context() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR().
Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/33c46ef24cd547d0ad21dc10644149... [tursulin: Wrap commit text, fix Fixes: tag.] Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com (cherry picked from commit 8f4caef8d5401b42c6367d46c23da5e0e8111516) Signed-off-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c index e424a6d1a68c..7a72faf29f27 100644 --- a/drivers/gpu/drm/i915/selftests/i915_request.c +++ b/drivers/gpu/drm/i915/selftests/i915_request.c @@ -1391,8 +1391,8 @@ static int live_breadcrumbs_smoketest(void *arg)
for (n = 0; n < smoke[0].ncontexts; n++) { smoke[0].contexts[n] = live_context(i915, file); - if (!smoke[0].contexts[n]) { - ret = -ENOMEM; + if (IS_ERR(smoke[0].contexts[n])) { + ret = PTR_ERR(smoke[0].contexts[n]); goto out_contexts; } }
From: Tobias Klauser tklauser@distanz.ch
[ Upstream commit 61ca36c8c4eb3bae35a285b1ae18c514cde65439 ]
!perfmon_capable() is checked before the last switch(func_id) in bpf_base_func_proto. Thus, the cases BPF_FUNC_trace_printk and BPF_FUNC_snprintf_btf can be moved to that last switch(func_id) to omit the inline !perfmon_capable() checks.
Signed-off-by: Tobias Klauser tklauser@distanz.ch Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20210127174615.3038-1-tklauser@distanz.ch Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/helpers.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index c489430cac78..d0fc091d2ab4 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -707,14 +707,6 @@ bpf_base_func_proto(enum bpf_func_id func_id) return &bpf_spin_lock_proto; case BPF_FUNC_spin_unlock: return &bpf_spin_unlock_proto; - case BPF_FUNC_trace_printk: - if (!perfmon_capable()) - return NULL; - return bpf_get_trace_printk_proto(); - case BPF_FUNC_snprintf_btf: - if (!perfmon_capable()) - return NULL; - return &bpf_snprintf_btf_proto; case BPF_FUNC_jiffies64: return &bpf_jiffies64_proto; case BPF_FUNC_per_cpu_ptr: @@ -729,6 +721,8 @@ bpf_base_func_proto(enum bpf_func_id func_id) return NULL;
switch (func_id) { + case BPF_FUNC_trace_printk: + return bpf_get_trace_printk_proto(); case BPF_FUNC_get_current_task: return &bpf_get_current_task_proto; case BPF_FUNC_probe_read_user: @@ -739,6 +733,8 @@ bpf_base_func_proto(enum bpf_func_id func_id) return &bpf_probe_read_user_str_proto; case BPF_FUNC_probe_read_kernel_str: return &bpf_probe_read_kernel_str_proto; + case BPF_FUNC_snprintf_btf: + return &bpf_snprintf_btf_proto; default: return NULL; }
From: Daniel Borkmann daniel@iogearbox.net
[ Upstream commit ff40e51043af63715ab413995ff46996ecf9583f ]
Commit 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown") added an implementation of the locked_down LSM hook to SELinux, with the aim to restrict which domains are allowed to perform operations that would breach lockdown. This is indirectly also getting audit subsystem involved to report events. The latter is problematic, as reported by Ondrej and Serhei, since it can bring down the whole system via audit:
1) The audit events that are triggered due to calls to security_locked_down() can OOM kill a machine, see below details [0].
2) It also seems to be causing a deadlock via avc_has_perm()/slow_avc_audit() when trying to wake up kauditd, for example, when using trace_sched_switch() tracepoint, see details in [1]. Triggering this was not via some hypothetical corner case, but with existing tools like runqlat & runqslower from bcc, for example, which make use of this tracepoint. Rough call sequence goes like:
rq_lock(rq) -> -------------------------+ trace_sched_switch() -> | bpf_prog_xyz() -> +-> deadlock selinux_lockdown() -> | audit_log_end() -> | wake_up_interruptible() -> | try_to_wake_up() -> | rq_lock(rq) --------------+
What's worse is that the intention of 59438b46471a to further restrict lockdown settings for specific applications in respect to the global lockdown policy is completely broken for BPF. The SELinux policy rule for the current lockdown check looks something like this:
allow <who> <who> : lockdown { <reason> };
However, this doesn't match with the 'current' task where the security_locked_down() is executed, example: httpd does a syscall. There is a tracing program attached to the syscall which triggers a BPF program to run, which ends up doing a bpf_probe_read_kernel{,_str}() helper call. The selinux_lockdown() hook does the permission check against 'current', that is, httpd in this example. httpd has literally zero relation to this tracing program, and it would be nonsensical having to write an SELinux policy rule against httpd to let the tracing helper pass. The policy in this case needs to be against the entity that is installing the BPF program. For example, if bpftrace would generate a histogram of syscall counts by user space application:
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
bpftrace would then go and generate a BPF program from this internally. One way of doing it [for the sake of the example] could be to call bpf_get_current_task() helper and then access current->comm via one of bpf_probe_read_kernel{,_str}() helpers. So the program itself has nothing to do with httpd or any other random app doing a syscall here. The BPF program _explicitly initiated_ the lockdown check. The allow/deny policy belongs in the context of bpftrace: meaning, you want to grant bpftrace access to use these helpers, but other tracers on the system like my_random_tracer _not_.
Therefore fix all three issues at the same time by taking a completely different approach for the security_locked_down() hook, that is, move the check into the program verification phase where we actually retrieve the BPF func proto. This also reliably gets the task (current) that is trying to install the BPF tracing program, e.g. bpftrace/bcc/perf/systemtap/etc, and it also fixes the OOM since we're moving this out of the BPF helper's fast-path which can be called several millions of times per second.
The check is then also in line with other security_locked_down() hooks in the system where the enforcement is performed at open/load time, for example, open_kcore() for /proc/kcore access or module_sig_check() for module signatures just to pick few random ones. What's out of scope in the fix as well as in other security_locked_down() hook locations /outside/ of BPF subsystem is that if the lockdown policy changes on the fly there is no retrospective action. This requires a different discussion, potentially complex infrastructure, and it's also not clear whether this can be solved generically. Either way, it is out of scope for a suitable stable fix which this one is targeting. Note that the breakage is specifically on 59438b46471a where it started to rely on 'current' as UAPI behavior, and _not_ earlier infrastructure such as 9d1f8be5cf42 ("bpf: Restrict bpf when kernel lockdown is in confidentiality mode").
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1955585, Jakub Hrozek says:
I starting seeing this with F-34. When I run a container that is traced with BPF to record the syscalls it is doing, auditd is flooded with messages like:
type=AVC msg=audit(1619784520.593:282387): avc: denied { confidentiality } for pid=476 comm="auditd" lockdown_reason="use of bpf to read kernel RAM" scontext=system_u:system_r:auditd_t:s0 tcontext=system_u:system_r:auditd_t:s0 tclass=lockdown permissive=0
This seems to be leading to auditd running out of space in the backlog buffer and eventually OOMs the machine.
[...] auditd running at 99% CPU presumably processing all the messages, eventually I get: Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152579 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152626 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152694 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_lost=6878426 audit_rate_limit=0 audit_backlog_limit=64 Apr 30 12:20:45 fedora kernel: oci-seccomp-bpf invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000 Apr 30 12:20:45 fedora kernel: CPU: 0 PID: 13284 Comm: oci-seccomp-bpf Not tainted 5.11.12-300.fc34.x86_64 #1 Apr 30 12:20:45 fedora kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 [...]
[1] https://lore.kernel.org/linux-audit/CANYvDQN7H5tVp47fbYcRasv4XF07eUbsDwT_eDC..., Serhei Makarov says:
Upstream kernel 5.11.0-rc7 and later was found to deadlock during a bpf_probe_read_compat() call within a sched_switch tracepoint. The problem is reproducible with the reg_alloc3 testcase from SystemTap's BPF backend testsuite on x86_64 as well as the runqlat, runqslower tools from bcc on ppc64le. Example stack trace:
[...] [ 730.868702] stack backtrace: [ 730.869590] CPU: 1 PID: 701 Comm: in:imjournal Not tainted, 5.12.0-0.rc2.20210309git144c79ef3353.166.fc35.x86_64 #1 [ 730.871605] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [ 730.873278] Call Trace: [ 730.873770] dump_stack+0x7f/0xa1 [ 730.874433] check_noncircular+0xdf/0x100 [ 730.875232] __lock_acquire+0x1202/0x1e10 [ 730.876031] ? __lock_acquire+0xfc0/0x1e10 [ 730.876844] lock_acquire+0xc2/0x3a0 [ 730.877551] ? __wake_up_common_lock+0x52/0x90 [ 730.878434] ? lock_acquire+0xc2/0x3a0 [ 730.879186] ? lock_is_held_type+0xa7/0x120 [ 730.880044] ? skb_queue_tail+0x1b/0x50 [ 730.880800] _raw_spin_lock_irqsave+0x4d/0x90 [ 730.881656] ? __wake_up_common_lock+0x52/0x90 [ 730.882532] __wake_up_common_lock+0x52/0x90 [ 730.883375] audit_log_end+0x5b/0x100 [ 730.884104] slow_avc_audit+0x69/0x90 [ 730.884836] avc_has_perm+0x8b/0xb0 [ 730.885532] selinux_lockdown+0xa5/0xd0 [ 730.886297] security_locked_down+0x20/0x40 [ 730.887133] bpf_probe_read_compat+0x66/0xd0 [ 730.887983] bpf_prog_250599c5469ac7b5+0x10f/0x820 [ 730.888917] trace_call_bpf+0xe9/0x240 [ 730.889672] perf_trace_run_bpf_submit+0x4d/0xc0 [ 730.890579] perf_trace_sched_switch+0x142/0x180 [ 730.891485] ? __schedule+0x6d8/0xb20 [ 730.892209] __schedule+0x6d8/0xb20 [ 730.892899] schedule+0x5b/0xc0 [ 730.893522] exit_to_user_mode_prepare+0x11d/0x240 [ 730.894457] syscall_exit_to_user_mode+0x27/0x70 [ 730.895361] entry_SYSCALL_64_after_hwframe+0x44/0xae [...]
Fixes: 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown") Reported-by: Ondrej Mosnacek omosnace@redhat.com Reported-by: Jakub Hrozek jhrozek@redhat.com Reported-by: Serhei Makarov smakarov@redhat.com Reported-by: Jiri Olsa jolsa@redhat.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Alexei Starovoitov ast@kernel.org Tested-by: Jiri Olsa jolsa@redhat.com Cc: Paul Moore paul@paul-moore.com Cc: James Morris jamorris@linux.microsoft.com Cc: Jerome Marchand jmarchan@redhat.com Cc: Frank Eigler fche@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Link: https://lore.kernel.org/bpf/01135120-8bf7-df2e-cff0-1d73f1f841c3@iogearbox.n... Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/helpers.c | 7 +++++-- kernel/trace/bpf_trace.c | 32 ++++++++++++-------------------- 2 files changed, 17 insertions(+), 22 deletions(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index d0fc091d2ab4..f7e99bb8c3b6 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -14,6 +14,7 @@ #include <linux/jiffies.h> #include <linux/pid_namespace.h> #include <linux/proc_ns.h> +#include <linux/security.h>
#include "../../lib/kstrtox.h"
@@ -728,11 +729,13 @@ bpf_base_func_proto(enum bpf_func_id func_id) case BPF_FUNC_probe_read_user: return &bpf_probe_read_user_proto; case BPF_FUNC_probe_read_kernel: - return &bpf_probe_read_kernel_proto; + return security_locked_down(LOCKDOWN_BPF_READ) < 0 ? + NULL : &bpf_probe_read_kernel_proto; case BPF_FUNC_probe_read_user_str: return &bpf_probe_read_user_str_proto; case BPF_FUNC_probe_read_kernel_str: - return &bpf_probe_read_kernel_str_proto; + return security_locked_down(LOCKDOWN_BPF_READ) < 0 ? + NULL : &bpf_probe_read_kernel_str_proto; case BPF_FUNC_snprintf_btf: return &bpf_snprintf_btf_proto; default: diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index fcbfc9564996..01710831fd02 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -212,16 +212,11 @@ const struct bpf_func_proto bpf_probe_read_user_str_proto = { static __always_inline int bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr) { - int ret = security_locked_down(LOCKDOWN_BPF_READ); + int ret;
- if (unlikely(ret < 0)) - goto fail; ret = copy_from_kernel_nofault(dst, unsafe_ptr, size); if (unlikely(ret < 0)) - goto fail; - return ret; -fail: - memset(dst, 0, size); + memset(dst, 0, size); return ret; }
@@ -243,10 +238,7 @@ const struct bpf_func_proto bpf_probe_read_kernel_proto = { static __always_inline int bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr) { - int ret = security_locked_down(LOCKDOWN_BPF_READ); - - if (unlikely(ret < 0)) - goto fail; + int ret;
/* * The strncpy_from_kernel_nofault() call will likely not fill the @@ -259,11 +251,7 @@ bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr) */ ret = strncpy_from_kernel_nofault(dst, unsafe_ptr, size); if (unlikely(ret < 0)) - goto fail; - - return ret; -fail: - memset(dst, 0, size); + memset(dst, 0, size); return ret; }
@@ -1293,16 +1281,20 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) case BPF_FUNC_probe_read_user: return &bpf_probe_read_user_proto; case BPF_FUNC_probe_read_kernel: - return &bpf_probe_read_kernel_proto; + return security_locked_down(LOCKDOWN_BPF_READ) < 0 ? + NULL : &bpf_probe_read_kernel_proto; case BPF_FUNC_probe_read_user_str: return &bpf_probe_read_user_str_proto; case BPF_FUNC_probe_read_kernel_str: - return &bpf_probe_read_kernel_str_proto; + return security_locked_down(LOCKDOWN_BPF_READ) < 0 ? + NULL : &bpf_probe_read_kernel_str_proto; #ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE case BPF_FUNC_probe_read: - return &bpf_probe_read_compat_proto; + return security_locked_down(LOCKDOWN_BPF_READ) < 0 ? + NULL : &bpf_probe_read_compat_proto; case BPF_FUNC_probe_read_str: - return &bpf_probe_read_compat_str_proto; + return security_locked_down(LOCKDOWN_BPF_READ) < 0 ? + NULL : &bpf_probe_read_compat_str_proto; #endif #ifdef CONFIG_CGROUPS case BPF_FUNC_get_current_cgroup_id:
From: Zhen Lei thunder.leizhen@huawei.com
[ Upstream commit 79c6b8ed30e54b401c873dbad2511f2a1c525fd5 ]
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function.
Fixes: be51da0f3e34 ("ieee802154: Stop using NLA_PUT*().") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Zhen Lei thunder.leizhen@huawei.com Link: https://lore.kernel.org/r/20210508062517.2574-1-thunder.leizhen@huawei.com Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ieee802154/nl-phy.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ieee802154/nl-phy.c b/net/ieee802154/nl-phy.c index 2cdc7e63fe17..88215b5c93aa 100644 --- a/net/ieee802154/nl-phy.c +++ b/net/ieee802154/nl-phy.c @@ -241,8 +241,10 @@ int ieee802154_add_iface(struct sk_buff *skb, struct genl_info *info) }
if (nla_put_string(msg, IEEE802154_ATTR_PHY_NAME, wpan_phy_name(phy)) || - nla_put_string(msg, IEEE802154_ATTR_DEV_NAME, dev->name)) + nla_put_string(msg, IEEE802154_ATTR_DEV_NAME, dev->name)) { + rc = -EMSGSIZE; goto nla_put_failure; + } dev_put(dev);
wpan_phy_put(phy);
From: Wei Yongjun weiyongjun1@huawei.com
[ Upstream commit 373e864cf52403b0974c2f23ca8faf9104234555 ]
Fix to return negative error code -ENOBUFS from the error handling case instead of 0, as done elsewhere in this function.
Fixes: 3e9c156e2c21 ("ieee802154: add netlink interfaces for llsec") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Wei Yongjun weiyongjun1@huawei.com Link: https://lore.kernel.org/r/20210519141614.3040055-1-weiyongjun1@huawei.com Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ieee802154/nl-mac.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ieee802154/nl-mac.c b/net/ieee802154/nl-mac.c index d19c40c684e8..71be75112321 100644 --- a/net/ieee802154/nl-mac.c +++ b/net/ieee802154/nl-mac.c @@ -680,8 +680,10 @@ int ieee802154_llsec_getparams(struct sk_buff *skb, struct genl_info *info) nla_put_u8(msg, IEEE802154_ATTR_LLSEC_SECLEVEL, params.out_level) || nla_put_u32(msg, IEEE802154_ATTR_LLSEC_FRAME_COUNTER, be32_to_cpu(params.frame_counter)) || - ieee802154_llsec_fill_key_id(msg, ¶ms.out_key)) + ieee802154_llsec_fill_key_id(msg, ¶ms.out_key)) { + rc = -ENOBUFS; goto out_free; + }
dev_put(dev);
From: Magnus Karlsson magnus.karlsson@intel.com
[ Upstream commit 74431c40b9c5fa673fff83ec157a76a69efd5c72 ]
Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared.
Fixes: 9cbc948b5a20 ("igb: add XDP support") Reported-by: Jesper Dangaard Brouer brouer@redhat.com Signed-off-by: Magnus Karlsson magnus.karlsson@intel.com Tested-by: Vishakha Jambekar vishakha.jambekar@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igb/igb_main.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 368f0aac5e1d..5c87c0a7ce3d 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -8419,18 +8419,20 @@ static struct sk_buff *igb_run_xdp(struct igb_adapter *adapter, break; case XDP_TX: result = igb_xdp_xmit_back(adapter, xdp); + if (result == IGB_XDP_CONSUMED) + goto out_failure; break; case XDP_REDIRECT: err = xdp_do_redirect(adapter->netdev, xdp, xdp_prog); - if (!err) - result = IGB_XDP_REDIR; - else - result = IGB_XDP_CONSUMED; + if (err) + goto out_failure; + result = IGB_XDP_REDIR; break; default: bpf_warn_invalid_xdp_action(act); fallthrough; case XDP_ABORTED: +out_failure: trace_xdp_exception(rx_ring->netdev, xdp_prog, act); fallthrough; case XDP_DROP:
From: Magnus Karlsson magnus.karlsson@intel.com
[ Upstream commit faae81420d162551b6ef2d804aafc00f4cd68e0e ]
Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared.
Fixes: 21092e9ce8b1 ("ixgbevf: Add support for XDP_TX action") Reported-by: Jesper Dangaard Brouer brouer@redhat.com Signed-off-by: Magnus Karlsson magnus.karlsson@intel.com Tested-by: Vishakha Jambekar vishakha.jambekar@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c index 82fce27f682b..a7d0a459969a 100644 --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c @@ -1072,11 +1072,14 @@ static struct sk_buff *ixgbevf_run_xdp(struct ixgbevf_adapter *adapter, case XDP_TX: xdp_ring = adapter->xdp_ring[rx_ring->queue_index]; result = ixgbevf_xmit_xdp_ring(xdp_ring, xdp); + if (result == IXGBEVF_XDP_CONSUMED) + goto out_failure; break; default: bpf_warn_invalid_xdp_action(act); fallthrough; case XDP_ABORTED: +out_failure: trace_xdp_exception(rx_ring->netdev, xdp_prog, act); fallthrough; /* handle aborts by dropping packet */ case XDP_DROP:
From: Rahul Lakkireddy rahul.lakkireddy@chelsio.com
[ Upstream commit a27fb314cba8cb84cd6456a4699c3330a83c326d ]
commit db43b30cd89c ("cxgb4: add ethtool n-tuple filter deletion") has moved searching for next highest priority HASH filter rule to cxgb4_flow_rule_destroy(), which searches the rhashtable before the the rule is removed from it and hence always finds at least 1 entry. Fix by removing the rule from rhashtable first before calling cxgb4_flow_rule_destroy() and hence avoid fetching stale info.
Fixes: db43b30cd89c ("cxgb4: add ethtool n-tuple filter deletion") Signed-off-by: Rahul Lakkireddy rahul.lakkireddy@chelsio.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c index 1b88bd1c2dbe..dd9be229819a 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c @@ -997,20 +997,16 @@ int cxgb4_tc_flower_destroy(struct net_device *dev, if (!ch_flower) return -ENOENT;
+ rhashtable_remove_fast(&adap->flower_tbl, &ch_flower->node, + adap->flower_ht_params); + ret = cxgb4_flow_rule_destroy(dev, ch_flower->fs.tc_prio, &ch_flower->fs, ch_flower->filter_id); if (ret) - goto err; + netdev_err(dev, "Flow rule destroy failed for tid: %u, ret: %d", + ch_flower->filter_id, ret);
- ret = rhashtable_remove_fast(&adap->flower_tbl, &ch_flower->node, - adap->flower_ht_params); - if (ret) { - netdev_err(dev, "Flow remove from rhashtable failed"); - goto err; - } kfree_rcu(ch_flower, rcu); - -err: return ret; }
From: Coco Li lixiaoyan@google.com
[ Upstream commit 821bbf79fe46a8b1d18aa456e8ed0a3c208c3754 ]
Reported by syzbot: HEAD commit: 90c911ad Merge tag 'fixes' of git://git.kernel.org/pub/scm.. git tree: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master dashboard link: https://syzkaller.appspot.com/bug?extid=123aa35098fd3c000eb7 compiler: Debian clang version 11.0.1-2
================================================================== BUG: KASAN: slab-out-of-bounds in fib6_nh_get_excptn_bucket net/ipv6/route.c:1604 [inline] BUG: KASAN: slab-out-of-bounds in fib6_nh_flush_exceptions+0xbd/0x360 net/ipv6/route.c:1732 Read of size 8 at addr ffff8880145c78f8 by task syz-executor.4/17760
CPU: 0 PID: 17760 Comm: syz-executor.4 Not tainted 5.12.0-rc8-syzkaller #0 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x202/0x31e lib/dump_stack.c:120 print_address_description+0x5f/0x3b0 mm/kasan/report.c:232 __kasan_report mm/kasan/report.c:399 [inline] kasan_report+0x15c/0x200 mm/kasan/report.c:416 fib6_nh_get_excptn_bucket net/ipv6/route.c:1604 [inline] fib6_nh_flush_exceptions+0xbd/0x360 net/ipv6/route.c:1732 fib6_nh_release+0x9a/0x430 net/ipv6/route.c:3536 fib6_info_destroy_rcu+0xcb/0x1c0 net/ipv6/ip6_fib.c:174 rcu_do_batch kernel/rcu/tree.c:2559 [inline] rcu_core+0x8f6/0x1450 kernel/rcu/tree.c:2794 __do_softirq+0x372/0x7a6 kernel/softirq.c:345 invoke_softirq kernel/softirq.c:221 [inline] __irq_exit_rcu+0x22c/0x260 kernel/softirq.c:422 irq_exit_rcu+0x5/0x20 kernel/softirq.c:434 sysvec_apic_timer_interrupt+0x91/0xb0 arch/x86/kernel/apic/apic.c:1100 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632 RIP: 0010:lock_acquire+0x1f6/0x720 kernel/locking/lockdep.c:5515 Code: f6 84 24 a1 00 00 00 02 0f 85 8d 02 00 00 f7 c3 00 02 00 00 49 bd 00 00 00 00 00 fc ff df 74 01 fb 48 c7 44 24 40 0e 36 e0 45 <4b> c7 44 3d 00 00 00 00 00 4b c7 44 3d 09 00 00 00 00 43 c7 44 3d RSP: 0018:ffffc90009e06560 EFLAGS: 00000206 RAX: 1ffff920013c0cc0 RBX: 0000000000000246 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffc90009e066e0 R08: dffffc0000000000 R09: fffffbfff1f992b1 R10: fffffbfff1f992b1 R11: 0000000000000000 R12: 0000000000000000 R13: dffffc0000000000 R14: 0000000000000000 R15: 1ffff920013c0cb4 rcu_lock_acquire+0x2a/0x30 include/linux/rcupdate.h:267 rcu_read_lock include/linux/rcupdate.h:656 [inline] ext4_get_group_info+0xea/0x340 fs/ext4/ext4.h:3231 ext4_mb_prefetch+0x123/0x5d0 fs/ext4/mballoc.c:2212 ext4_mb_regular_allocator+0x8a5/0x28f0 fs/ext4/mballoc.c:2379 ext4_mb_new_blocks+0xc6e/0x24f0 fs/ext4/mballoc.c:4982 ext4_ext_map_blocks+0x2be3/0x7210 fs/ext4/extents.c:4238 ext4_map_blocks+0xab3/0x1cb0 fs/ext4/inode.c:638 ext4_getblk+0x187/0x6c0 fs/ext4/inode.c:848 ext4_bread+0x2a/0x1c0 fs/ext4/inode.c:900 ext4_append+0x1a4/0x360 fs/ext4/namei.c:67 ext4_init_new_dir+0x337/0xa10 fs/ext4/namei.c:2768 ext4_mkdir+0x4b8/0xc00 fs/ext4/namei.c:2814 vfs_mkdir+0x45b/0x640 fs/namei.c:3819 ovl_do_mkdir fs/overlayfs/overlayfs.h:161 [inline] ovl_mkdir_real+0x53/0x1a0 fs/overlayfs/dir.c:146 ovl_create_real+0x280/0x490 fs/overlayfs/dir.c:193 ovl_workdir_create+0x425/0x600 fs/overlayfs/super.c:788 ovl_make_workdir+0xed/0x1140 fs/overlayfs/super.c:1355 ovl_get_workdir fs/overlayfs/super.c:1492 [inline] ovl_fill_super+0x39ee/0x5370 fs/overlayfs/super.c:2035 mount_nodev+0x52/0xe0 fs/super.c:1413 legacy_get_tree+0xea/0x180 fs/fs_context.c:592 vfs_get_tree+0x86/0x270 fs/super.c:1497 do_new_mount fs/namespace.c:2903 [inline] path_mount+0x196f/0x2be0 fs/namespace.c:3233 do_mount fs/namespace.c:3246 [inline] __do_sys_mount fs/namespace.c:3454 [inline] __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3431 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4665f9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f68f2b87188 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 000000000056bf60 RCX: 00000000004665f9 RDX: 00000000200000c0 RSI: 0000000020000000 RDI: 000000000040000a RBP: 00000000004bfbb9 R08: 0000000020000100 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf60 R13: 00007ffe19002dff R14: 00007f68f2b87300 R15: 0000000000022000
Allocated by task 17768: kasan_save_stack mm/kasan/common.c:38 [inline] kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:427 [inline] ____kasan_kmalloc+0xc2/0xf0 mm/kasan/common.c:506 kasan_kmalloc include/linux/kasan.h:233 [inline] __kmalloc+0xb4/0x380 mm/slub.c:4055 kmalloc include/linux/slab.h:559 [inline] kzalloc include/linux/slab.h:684 [inline] fib6_info_alloc+0x2c/0xd0 net/ipv6/ip6_fib.c:154 ip6_route_info_create+0x55d/0x1a10 net/ipv6/route.c:3638 ip6_route_add+0x22/0x120 net/ipv6/route.c:3728 inet6_rtm_newroute+0x2cd/0x2260 net/ipv6/route.c:5352 rtnetlink_rcv_msg+0xb34/0xe70 net/core/rtnetlink.c:5553 netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg net/socket.c:674 [inline] ____sys_sendmsg+0x5a2/0x900 net/socket.c:2350 ___sys_sendmsg net/socket.c:2404 [inline] __sys_sendmsg+0x319/0x400 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae
Last potentially related work creation: kasan_save_stack+0x27/0x50 mm/kasan/common.c:38 kasan_record_aux_stack+0xee/0x120 mm/kasan/generic.c:345 __call_rcu kernel/rcu/tree.c:3039 [inline] call_rcu+0x1b1/0xa30 kernel/rcu/tree.c:3114 fib6_info_release include/net/ip6_fib.h:337 [inline] ip6_route_info_create+0x10c4/0x1a10 net/ipv6/route.c:3718 ip6_route_add+0x22/0x120 net/ipv6/route.c:3728 inet6_rtm_newroute+0x2cd/0x2260 net/ipv6/route.c:5352 rtnetlink_rcv_msg+0xb34/0xe70 net/core/rtnetlink.c:5553 netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg net/socket.c:674 [inline] ____sys_sendmsg+0x5a2/0x900 net/socket.c:2350 ___sys_sendmsg net/socket.c:2404 [inline] __sys_sendmsg+0x319/0x400 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae
Second to last potentially related work creation: kasan_save_stack+0x27/0x50 mm/kasan/common.c:38 kasan_record_aux_stack+0xee/0x120 mm/kasan/generic.c:345 insert_work+0x54/0x400 kernel/workqueue.c:1331 __queue_work+0x981/0xcc0 kernel/workqueue.c:1497 queue_work_on+0x111/0x200 kernel/workqueue.c:1524 queue_work include/linux/workqueue.h:507 [inline] call_usermodehelper_exec+0x283/0x470 kernel/umh.c:433 kobject_uevent_env+0x1349/0x1730 lib/kobject_uevent.c:617 kvm_uevent_notify_change+0x309/0x3b0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4809 kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:877 [inline] kvm_put_kvm+0x9c/0xd10 arch/x86/kvm/../../../virt/kvm/kvm_main.c:920 kvm_vcpu_release+0x53/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3120 __fput+0x352/0x7b0 fs/file_table.c:280 task_work_run+0x146/0x1c0 kernel/task_work.c:140 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:174 [inline] exit_to_user_mode_prepare+0x10b/0x1e0 kernel/entry/common.c:208 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline] syscall_exit_to_user_mode+0x26/0x70 kernel/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at ffff8880145c7800 which belongs to the cache kmalloc-192 of size 192 The buggy address is located 56 bytes to the right of 192-byte region [ffff8880145c7800, ffff8880145c78c0) The buggy address belongs to the page: page:ffffea00005171c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x145c7 flags: 0xfff00000000200(slab) raw: 00fff00000000200 ffffea00006474c0 0000000200000002 ffff888010c41a00 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff8880145c7780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8880145c7800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff8880145c7880: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff8880145c7900: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880145c7980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ==================================================================
In the ip6_route_info_create function, in the case that the nh pointer is not NULL, the fib6_nh in fib6_info has not been allocated. Therefore, when trying to free fib6_info in this error case using fib6_info_release, the function will call fib6_info_destroy_rcu, which it will access fib6_nh_release(f6i->fib6_nh); However, f6i->fib6_nh doesn't have any refcount yet given the lack of allocation causing the reported memory issue above. Therefore, releasing the empty pointer directly instead would be the solution.
Fixes: f88d8ea67fbdb ("ipv6: Plumb support for nexthop object in a fib6_info") Fixes: 706ec91916462 ("ipv6: Fix nexthop refcnt leak when creating ipv6 route info") Signed-off-by: Coco Li lixiaoyan@google.com Cc: David Ahern dsahern@kernel.org Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/route.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 71e578ed8699..ccff4738313c 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3671,11 +3671,11 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, if (nh) { if (rt->fib6_src.plen) { NL_SET_ERR_MSG(extack, "Nexthops can not be used with source routing"); - goto out; + goto out_free; } if (!nexthop_get(nh)) { NL_SET_ERR_MSG(extack, "Nexthop has been deleted"); - goto out; + goto out_free; } rt->nh = nh; fib6_nh = nexthop_fib6_nh(rt->nh); @@ -3712,6 +3712,10 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, out: fib6_info_release(rt); return ERR_PTR(err); +out_free: + ip_fib_metrics_put(rt->fib6_metrics); + kfree(rt); + return ERR_PTR(err); }
int ip6_route_add(struct fib6_config *cfg, gfp_t gfp_flags,
From: Brett Creeley brett.creeley@intel.com
[ Upstream commit f0457690af56673cb0c47af6e25430389a149225 ]
Commit 12bb018c538c ("ice: Refactor VF reset") caused a regression that removes the ability for a VF to request a different amount of queues via VIRTCHNL_OP_REQUEST_QUEUES. This prevents VF drivers to either increase or decrease the number of queue pairs they are allocated. Fix this by using the variable vf->num_req_qs when determining the vf->num_vf_qs during VF VSI creation.
Fixes: 12bb018c538c ("ice: Refactor VF reset") Signed-off-by: Brett Creeley brett.creeley@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_lib.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index e1384503dd4d..fb20c6971f4c 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -192,6 +192,8 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, u16 vf_id) break; case ICE_VSI_VF: vf = &pf->vf[vsi->vf_id]; + if (vf->num_req_qs) + vf->num_vf_qs = vf->num_req_qs; vsi->alloc_txq = vf->num_vf_qs; vsi->alloc_rxq = vf->num_vf_qs; /* pf->num_msix_per_vf includes (VF miscellaneous vector +
From: Brett Creeley brett.creeley@intel.com
[ Upstream commit 8679f07a9922068b9b6be81b632f52cac45d1b91 ]
Some AVF drivers expect the VF_MBX_ATQLEN register to be cleared for any type of VFR/VFLR. Fix this by clearing the VF_MBX_ATQLEN register at the same time as VF_MBX_ARQLEN.
Fixes: 82ba01282cf8 ("ice: clear VF ARQLEN register on reset") Signed-off-by: Brett Creeley brett.creeley@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_hw_autogen.h | 1 + drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 12 +++++++----- 2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h index 90abc8612a6a..406dd6bd97a7 100644 --- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h +++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h @@ -31,6 +31,7 @@ #define PF_FW_ATQLEN_ATQOVFL_M BIT(29) #define PF_FW_ATQLEN_ATQCRIT_M BIT(30) #define VF_MBX_ARQLEN(_VF) (0x0022BC00 + ((_VF) * 4)) +#define VF_MBX_ATQLEN(_VF) (0x0022A800 + ((_VF) * 4)) #define PF_FW_ATQLEN_ATQENABLE_M BIT(31) #define PF_FW_ATQT 0x00080400 #define PF_MBX_ARQBAH 0x0022E400 diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c index b3161c5def46..374ccce2a784 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c @@ -435,13 +435,15 @@ static void ice_trigger_vf_reset(struct ice_vf *vf, bool is_vflr, bool is_pfr) */ clear_bit(ICE_VF_STATE_INIT, vf->vf_states);
- /* VF_MBX_ARQLEN is cleared by PFR, so the driver needs to clear it - * in the case of VFR. If this is done for PFR, it can mess up VF - * resets because the VF driver may already have started cleanup - * by the time we get here. + /* VF_MBX_ARQLEN and VF_MBX_ATQLEN are cleared by PFR, so the driver + * needs to clear them in the case of VFR/VFLR. If this is done for + * PFR, it can mess up VF resets because the VF driver may already + * have started cleanup by the time we get here. */ - if (!is_pfr) + if (!is_pfr) { wr32(hw, VF_MBX_ARQLEN(vf->vf_id), 0); + wr32(hw, VF_MBX_ATQLEN(vf->vf_id), 0); + }
/* In the case of a VFLR, the HW has already reset the VF and we * just need to clean up, so don't hit the VFRTRIG register.
From: Haiyue Wang haiyue.wang@intel.com
[ Upstream commit c7ee6ce1cf60b7fcdbdd2354d377d00bae3fa2d2 ]
VSI rebuild can be failed for LAN queue config, then the VF's VSI will be NULL, the VF reset should be stopped with the VF entering into the disable state.
Fixes: 12bb018c538c ("ice: Refactor VF reset") Signed-off-by: Haiyue Wang haiyue.wang@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c index 374ccce2a784..c9f82fd3cf48 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c @@ -1341,7 +1341,12 @@ bool ice_reset_vf(struct ice_vf *vf, bool is_vflr) }
ice_vf_pre_vsi_rebuild(vf); - ice_vf_rebuild_vsi_with_release(vf); + + if (ice_vf_rebuild_vsi_with_release(vf)) { + dev_err(dev, "Failed to release and setup the VF%u's VSI\n", vf->vf_id); + return false; + } + ice_vf_post_vsi_rebuild(vf);
return true;
From: Paul Greenwalt paul.greenwalt@intel.com
[ Upstream commit 5cd349c349d6ec52862e550d3576893d35ab8ac2 ]
Ethtool incorrectly reported supported and advertised auto-negotiation settings for a backplane PHY image which did not support auto-negotiation. This can occur when using media or PHY type for reporting ethtool supported and advertised auto-negotiation settings.
Remove setting supported and advertised auto-negotiation settings based on PHY type in ice_phy_type_to_ethtool(), and MAC type in ice_get_link_ksettings().
Ethtool supported and advertised auto-negotiation settings should be based on the PHY image using the AQ command get PHY capabilities with media. Add setting supported and advertised auto-negotiation settings based get PHY capabilities with media in ice_get_link_ksettings().
Fixes: 48cb27f2fd18 ("ice: Implement handlers for ethtool PHY/link operations") Signed-off-by: Paul Greenwalt paul.greenwalt@intel.com Tested-by: Tony Brelinski tonyx.brelinski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_ethtool.c | 51 +++----------------- 1 file changed, 6 insertions(+), 45 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c index d70573f5072c..a7975afecf70 100644 --- a/drivers/net/ethernet/intel/ice/ice_ethtool.c +++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c @@ -1797,49 +1797,6 @@ ice_phy_type_to_ethtool(struct net_device *netdev, ice_ethtool_advertise_link_mode(ICE_AQ_LINK_SPEED_100GB, 100000baseKR4_Full); } - - /* Autoneg PHY types */ - if (phy_types_low & ICE_PHY_TYPE_LOW_100BASE_TX || - phy_types_low & ICE_PHY_TYPE_LOW_1000BASE_T || - phy_types_low & ICE_PHY_TYPE_LOW_1000BASE_KX || - phy_types_low & ICE_PHY_TYPE_LOW_2500BASE_T || - phy_types_low & ICE_PHY_TYPE_LOW_2500BASE_KX || - phy_types_low & ICE_PHY_TYPE_LOW_5GBASE_T || - phy_types_low & ICE_PHY_TYPE_LOW_5GBASE_KR || - phy_types_low & ICE_PHY_TYPE_LOW_10GBASE_T || - phy_types_low & ICE_PHY_TYPE_LOW_10GBASE_KR_CR1 || - phy_types_low & ICE_PHY_TYPE_LOW_25GBASE_T || - phy_types_low & ICE_PHY_TYPE_LOW_25GBASE_CR || - phy_types_low & ICE_PHY_TYPE_LOW_25GBASE_CR_S || - phy_types_low & ICE_PHY_TYPE_LOW_25GBASE_CR1 || - phy_types_low & ICE_PHY_TYPE_LOW_25GBASE_KR || - phy_types_low & ICE_PHY_TYPE_LOW_25GBASE_KR_S || - phy_types_low & ICE_PHY_TYPE_LOW_25GBASE_KR1 || - phy_types_low & ICE_PHY_TYPE_LOW_40GBASE_CR4 || - phy_types_low & ICE_PHY_TYPE_LOW_40GBASE_KR4) { - ethtool_link_ksettings_add_link_mode(ks, supported, - Autoneg); - ethtool_link_ksettings_add_link_mode(ks, advertising, - Autoneg); - } - if (phy_types_low & ICE_PHY_TYPE_LOW_50GBASE_CR2 || - phy_types_low & ICE_PHY_TYPE_LOW_50GBASE_KR2 || - phy_types_low & ICE_PHY_TYPE_LOW_50GBASE_CP || - phy_types_low & ICE_PHY_TYPE_LOW_50GBASE_KR_PAM4) { - ethtool_link_ksettings_add_link_mode(ks, supported, - Autoneg); - ethtool_link_ksettings_add_link_mode(ks, advertising, - Autoneg); - } - if (phy_types_low & ICE_PHY_TYPE_LOW_100GBASE_CR4 || - phy_types_low & ICE_PHY_TYPE_LOW_100GBASE_KR4 || - phy_types_low & ICE_PHY_TYPE_LOW_100GBASE_KR_PAM4 || - phy_types_low & ICE_PHY_TYPE_LOW_100GBASE_CP2) { - ethtool_link_ksettings_add_link_mode(ks, supported, - Autoneg); - ethtool_link_ksettings_add_link_mode(ks, advertising, - Autoneg); - } }
#define TEST_SET_BITS_TIMEOUT 50 @@ -1996,9 +1953,7 @@ ice_get_link_ksettings(struct net_device *netdev, ks->base.port = PORT_TP; break; case ICE_MEDIA_BACKPLANE: - ethtool_link_ksettings_add_link_mode(ks, supported, Autoneg); ethtool_link_ksettings_add_link_mode(ks, supported, Backplane); - ethtool_link_ksettings_add_link_mode(ks, advertising, Autoneg); ethtool_link_ksettings_add_link_mode(ks, advertising, Backplane); ks->base.port = PORT_NONE; @@ -2073,6 +2028,12 @@ ice_get_link_ksettings(struct net_device *netdev, if (caps->link_fec_options & ICE_AQC_PHY_FEC_25G_RS_CLAUSE91_EN) ethtool_link_ksettings_add_link_mode(ks, supported, FEC_RS);
+ /* Set supported and advertised autoneg */ + if (ice_is_phy_caps_an_enabled(caps)) { + ethtool_link_ksettings_add_link_mode(ks, supported, Autoneg); + ethtool_link_ksettings_add_link_mode(ks, advertising, Autoneg); + } + done: kfree(caps); return err;
From: Dave Ertman david.m.ertman@intel.com
[ Upstream commit f9f83202b7263ac371d616d6894a2c9ed79158ef ]
Currently in the ice driver, the check whether to allow a LLDP packet to egress the interface from the PF_VSI is being based on the SKB's priority field. It checks to see if the packets priority is equal to TC_PRIO_CONTROL. Injected LLDP packets do not always meet this condition.
SCAPY defaults to a sk_buff->protocol value of ETH_P_ALL (0x0003) and does not set the priority field. There will be other injection methods (even ones used by end users) that will not correctly configure the socket so that SKB fields are correctly populated.
Then ethernet header has to have to correct value for the protocol though.
Add a check to also allow packets whose ethhdr->h_proto matches ETH_P_LLDP (0x88CC).
Fixes: 0c3a6101ff2d ("ice: Allow egress control packets from PF_VSI") Signed-off-by: Dave Ertman david.m.ertman@intel.com Tested-by: Tony Brelinski tonyx.brelinski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_txrx.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 0f2544c420ac..1510091a63e8 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -2373,6 +2373,7 @@ ice_xmit_frame_ring(struct sk_buff *skb, struct ice_ring *tx_ring) struct ice_tx_offload_params offload = { 0 }; struct ice_vsi *vsi = tx_ring->vsi; struct ice_tx_buf *first; + struct ethhdr *eth; unsigned int count; int tso, csum;
@@ -2419,7 +2420,9 @@ ice_xmit_frame_ring(struct sk_buff *skb, struct ice_ring *tx_ring) goto out_drop;
/* allow CONTROL frames egress from main VSI if FW LLDP disabled */ - if (unlikely(skb->priority == TC_PRIO_CONTROL && + eth = (struct ethhdr *)skb_mac_header(skb); + if (unlikely((skb->priority == TC_PRIO_CONTROL || + eth->h_proto == htons(ETH_P_LLDP)) && vsi->type == ICE_VSI_PF && vsi->port_info->qos_cfg.is_sw_lldp)) offload.cd_qw1 |= (u64)(ICE_TX_DESC_DTYPE_CTX |
From: Roja Rani Yarubandi rojay@codeaurora.org
[ Upstream commit 9f78c607600ce4f2a952560de26534715236f612 ]
If the hardware is still accessing memory after SMMU translation is disabled (as part of smmu shutdown callback), then the IOVAs (I/O virtual address) which it was using will go on the bus as the physical addresses which will result in unknown crashes like NoC/interconnect errors.
So, implement shutdown callback for i2c driver to suspend the bus during system "reboot" or "shutdown".
Fixes: 37692de5d523 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller") Signed-off-by: Roja Rani Yarubandi rojay@codeaurora.org Reviewed-by: Stephen Boyd swboyd@chromium.org Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-qcom-geni.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c index 4a6dd05d6dbf..899ad2c7d67d 100644 --- a/drivers/i2c/busses/i2c-qcom-geni.c +++ b/drivers/i2c/busses/i2c-qcom-geni.c @@ -654,6 +654,14 @@ static int geni_i2c_remove(struct platform_device *pdev) return 0; }
+static void geni_i2c_shutdown(struct platform_device *pdev) +{ + struct geni_i2c_dev *gi2c = platform_get_drvdata(pdev); + + /* Make client i2c transfers start failing */ + i2c_mark_adapter_suspended(&gi2c->adap); +} + static int __maybe_unused geni_i2c_runtime_suspend(struct device *dev) { int ret; @@ -718,6 +726,7 @@ MODULE_DEVICE_TABLE(of, geni_i2c_dt_match); static struct platform_driver geni_i2c_driver = { .probe = geni_i2c_probe, .remove = geni_i2c_remove, + .shutdown = geni_i2c_shutdown, .driver = { .name = "geni_i2c", .pm = &geni_i2c_pm_ops,
From: Rahul Lakkireddy rahul.lakkireddy@chelsio.com
[ Upstream commit 3822d0670c9d4342794d73e0d0e615322b40438e ]
When configuring TC-MQPRIO offload, only turn off netdev carrier and don't bring physical link down in hardware. Otherwise, when the physical link is brought up again after configuration, it gets re-trained and stalls ongoing traffic.
Also, when firmware is no longer accessible or crashed, avoid sending FLOWC and waiting for reply that will never come.
Fix following hung_task_timeout_secs trace seen in these cases.
INFO: task tc:20807 blocked for more than 122 seconds. Tainted: G S 5.13.0-rc3+ #122 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:tc state:D stack:14768 pid:20807 ppid: 19366 flags:0x00000000 Call Trace: __schedule+0x27b/0x6a0 schedule+0x37/0xa0 schedule_preempt_disabled+0x5/0x10 __mutex_lock.isra.14+0x2a0/0x4a0 ? netlink_lookup+0x120/0x1a0 ? rtnl_fill_ifinfo+0x10f0/0x10f0 __netlink_dump_start+0x70/0x250 rtnetlink_rcv_msg+0x28b/0x380 ? rtnl_fill_ifinfo+0x10f0/0x10f0 ? rtnl_calcit.isra.42+0x120/0x120 netlink_rcv_skb+0x4b/0xf0 netlink_unicast+0x1a0/0x280 netlink_sendmsg+0x216/0x440 sock_sendmsg+0x56/0x60 __sys_sendto+0xe9/0x150 ? handle_mm_fault+0x6d/0x1b0 ? do_user_addr_fault+0x1c5/0x620 __x64_sys_sendto+0x1f/0x30 do_syscall_64+0x3c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f7f73218321 RSP: 002b:00007ffd19626208 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 000055b7c0a8b240 RCX: 00007f7f73218321 RDX: 0000000000000028 RSI: 00007ffd19626210 RDI: 0000000000000003 RBP: 000055b7c08680ff R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000055b7c085f5f6 R13: 000055b7c085f60a R14: 00007ffd19636470 R15: 00007ffd196262a0
Fixes: b1396c2bd675 ("cxgb4: parse and configure TC-MQPRIO offload") Signed-off-by: Rahul Lakkireddy rahul.lakkireddy@chelsio.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 2 -- drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 4 ++-- drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_mqprio.c | 9 ++++++--- drivers/net/ethernet/chelsio/cxgb4/sge.c | 6 ++++++ 4 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h index 27308600da15..2dd486915629 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h @@ -2177,8 +2177,6 @@ int cxgb4_update_mac_filt(struct port_info *pi, unsigned int viid, bool persistent, u8 *smt_idx); int cxgb4_get_msix_idx_from_bmap(struct adapter *adap); void cxgb4_free_msix_idx_in_bmap(struct adapter *adap, u32 msix_idx); -int cxgb_open(struct net_device *dev); -int cxgb_close(struct net_device *dev); void cxgb4_enable_rx(struct adapter *adap, struct sge_rspq *q); void cxgb4_quiesce_rx(struct sge_rspq *q); int cxgb4_port_mirror_alloc(struct net_device *dev); diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index 23c13f34a572..04dcb5e4b316 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -2834,7 +2834,7 @@ static void cxgb_down(struct adapter *adapter) /* * net_device operations */ -int cxgb_open(struct net_device *dev) +static int cxgb_open(struct net_device *dev) { struct port_info *pi = netdev_priv(dev); struct adapter *adapter = pi->adapter; @@ -2882,7 +2882,7 @@ out_unlock: return err; }
-int cxgb_close(struct net_device *dev) +static int cxgb_close(struct net_device *dev) { struct port_info *pi = netdev_priv(dev); struct adapter *adapter = pi->adapter; diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_mqprio.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_mqprio.c index 6c259de96f96..338b04f339b3 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_mqprio.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_mqprio.c @@ -589,7 +589,8 @@ int cxgb4_setup_tc_mqprio(struct net_device *dev, * down before configuring tc params. */ if (netif_running(dev)) { - cxgb_close(dev); + netif_tx_stop_all_queues(dev); + netif_carrier_off(dev); needs_bring_up = true; }
@@ -615,8 +616,10 @@ int cxgb4_setup_tc_mqprio(struct net_device *dev, }
out: - if (needs_bring_up) - cxgb_open(dev); + if (needs_bring_up) { + netif_tx_start_all_queues(dev); + netif_carrier_on(dev); + }
mutex_unlock(&adap->tc_mqprio->mqprio_mutex); return ret; diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c index 546301272271..ccb6bd002b20 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/sge.c +++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c @@ -2552,6 +2552,12 @@ int cxgb4_ethofld_send_flowc(struct net_device *dev, u32 eotid, u32 tc) if (!eosw_txq) return -ENOMEM;
+ if (!(adap->flags & CXGB4_FW_OK)) { + /* Don't stall caller when access to FW is lost */ + complete(&eosw_txq->completion); + return -EIO; + } + skb = alloc_skb(len, GFP_KERNEL); if (!skb) return -ENOMEM;
From: Magnus Karlsson magnus.karlsson@intel.com
[ Upstream commit 346497c78d15cdd5bdc3b642a895009359e5457f ]
Optimize i40e_run_xdp_zc() for the XDP program verdict being XDP_REDIRECT in the xsk zero-copy path. This path is only used when having AF_XDP zero-copy on and in that case most packets will be directed to user space. This provides a little over 100k extra packets in throughput on my server when running l2fwd in xdpsock.
Signed-off-by: Magnus Karlsson magnus.karlsson@intel.com Tested-by: George Kuruvinakunnel george.kuruvinakunnel@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index 8557807b4171..18e8d4e456d6 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -159,6 +159,13 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp) xdp_prog = READ_ONCE(rx_ring->xdp_prog); act = bpf_prog_run_xdp(xdp_prog, xdp);
+ if (likely(act == XDP_REDIRECT)) { + err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); + result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED; + rcu_read_unlock(); + return result; + } + switch (act) { case XDP_PASS: break; @@ -166,10 +173,6 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp) xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index]; result = i40e_xmit_xdp_tx_ring(xdp, xdp_ring); break; - case XDP_REDIRECT: - err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); - result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED; - break; default: bpf_warn_invalid_xdp_action(act); fallthrough;
From: Magnus Karlsson magnus.karlsson@intel.com
[ Upstream commit f6c10b48f8c8da44adaff730d8e700b6272add2b ]
Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared.
Fixes: 74608d17fe29 ("i40e: add support for XDP_TX action") Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support") Reported-by: Jesper Dangaard Brouer brouer@redhat.com Signed-off-by: Magnus Karlsson magnus.karlsson@intel.com Tested-by: Kiran Bhandare kiranx.bhandare@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 7 ++++++- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 8 ++++++-- 2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index 011f484606a3..c40ac82db863 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -2205,15 +2205,20 @@ static int i40e_run_xdp(struct i40e_ring *rx_ring, struct xdp_buff *xdp) case XDP_TX: xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index]; result = i40e_xmit_xdp_tx_ring(xdp, xdp_ring); + if (result == I40E_XDP_CONSUMED) + goto out_failure; break; case XDP_REDIRECT: err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); - result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED; + if (err) + goto out_failure; + result = I40E_XDP_REDIR; break; default: bpf_warn_invalid_xdp_action(act); fallthrough; case XDP_ABORTED: +out_failure: trace_xdp_exception(rx_ring->netdev, xdp_prog, act); fallthrough; /* handle aborts by dropping packet */ case XDP_DROP: diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index 18e8d4e456d6..86c79f71c685 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -161,9 +161,10 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
if (likely(act == XDP_REDIRECT)) { err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); - result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED; + if (err) + goto out_failure; rcu_read_unlock(); - return result; + return I40E_XDP_REDIR; }
switch (act) { @@ -172,11 +173,14 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp) case XDP_TX: xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index]; result = i40e_xmit_xdp_tx_ring(xdp, xdp_ring); + if (result == I40E_XDP_CONSUMED) + goto out_failure; break; default: bpf_warn_invalid_xdp_action(act); fallthrough; case XDP_ABORTED: +out_failure: trace_xdp_exception(rx_ring->netdev, xdp_prog, act); fallthrough; /* handle aborts by dropping packet */ case XDP_DROP:
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit 59c97d1b51b119eace6b1e61a6f820701f5a8299 ]
There's no need for 'result' variable, we can directly return the internal status based on action returned by xdp prog.
Reviewed-by: Björn Töpel bjorn.topel@intel.com Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Kiran Bhandare kiranx.bhandare@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_txrx.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 1510091a63e8..5cc2854a5d48 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -537,22 +537,20 @@ static int ice_run_xdp(struct ice_ring *rx_ring, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { - int err, result = ICE_XDP_PASS; struct ice_ring *xdp_ring; + int err; u32 act;
act = bpf_prog_run_xdp(xdp_prog, xdp); switch (act) { case XDP_PASS: - break; + return ICE_XDP_PASS; case XDP_TX: xdp_ring = rx_ring->vsi->xdp_rings[smp_processor_id()]; - result = ice_xmit_xdp_buff(xdp, xdp_ring); - break; + return ice_xmit_xdp_buff(xdp, xdp_ring); case XDP_REDIRECT: err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); - result = !err ? ICE_XDP_REDIR : ICE_XDP_CONSUMED; - break; + return !err ? ICE_XDP_REDIR : ICE_XDP_CONSUMED; default: bpf_warn_invalid_xdp_action(act); fallthrough; @@ -560,11 +558,8 @@ ice_run_xdp(struct ice_ring *rx_ring, struct xdp_buff *xdp, trace_xdp_exception(rx_ring->netdev, xdp_prog, act); fallthrough; case XDP_DROP: - result = ICE_XDP_CONSUMED; - break; + return ICE_XDP_CONSUMED; } - - return result; }
/**
From: Magnus Karlsson magnus.karlsson@intel.com
[ Upstream commit bb52073645a618ab4d93c8d932fb8faf114c55bc ]
Optimize ice_run_xdp_zc() for the XDP program verdict being XDP_REDIRECT in the xsk zero-copy path. This path is only used when having AF_XDP zero-copy on and in that case most packets will be directed to user space. This provides a little over 100k extra packets in throughput on my server when running l2fwd in xdpsock.
Signed-off-by: Magnus Karlsson magnus.karlsson@intel.com Tested-by: George Kuruvinakunnel george.kuruvinakunnel@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_xsk.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 98101a8e2952..952148eede30 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -524,6 +524,14 @@ ice_run_xdp_zc(struct ice_ring *rx_ring, struct xdp_buff *xdp) }
act = bpf_prog_run_xdp(xdp_prog, xdp); + + if (likely(act == XDP_REDIRECT)) { + err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); + result = !err ? ICE_XDP_REDIR : ICE_XDP_CONSUMED; + rcu_read_unlock(); + return result; + } + switch (act) { case XDP_PASS: break; @@ -531,10 +539,6 @@ ice_run_xdp_zc(struct ice_ring *rx_ring, struct xdp_buff *xdp) xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->q_index]; result = ice_xmit_xdp_buff(xdp, xdp_ring); break; - case XDP_REDIRECT: - err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); - result = !err ? ICE_XDP_REDIR : ICE_XDP_CONSUMED; - break; default: bpf_warn_invalid_xdp_action(act); fallthrough;
From: Magnus Karlsson magnus.karlsson@intel.com
[ Upstream commit 89d65df024c59988291f643b4e45d1528c51aef9 ]
Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared.
Fixes: efc2214b6047 ("ice: Add support for XDP") Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reported-by: Jesper Dangaard Brouer brouer@redhat.com Signed-off-by: Magnus Karlsson magnus.karlsson@intel.com Tested-by: Kiran Bhandare kiranx.bhandare@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_txrx.c | 12 +++++++++--- drivers/net/ethernet/intel/ice/ice_xsk.c | 8 ++++++-- 2 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 5cc2854a5d48..442a9bcbf60a 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -538,7 +538,7 @@ ice_run_xdp(struct ice_ring *rx_ring, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { struct ice_ring *xdp_ring; - int err; + int err, result; u32 act;
act = bpf_prog_run_xdp(xdp_prog, xdp); @@ -547,14 +547,20 @@ ice_run_xdp(struct ice_ring *rx_ring, struct xdp_buff *xdp, return ICE_XDP_PASS; case XDP_TX: xdp_ring = rx_ring->vsi->xdp_rings[smp_processor_id()]; - return ice_xmit_xdp_buff(xdp, xdp_ring); + result = ice_xmit_xdp_buff(xdp, xdp_ring); + if (result == ICE_XDP_CONSUMED) + goto out_failure; + return result; case XDP_REDIRECT: err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); - return !err ? ICE_XDP_REDIR : ICE_XDP_CONSUMED; + if (err) + goto out_failure; + return ICE_XDP_REDIR; default: bpf_warn_invalid_xdp_action(act); fallthrough; case XDP_ABORTED: +out_failure: trace_xdp_exception(rx_ring->netdev, xdp_prog, act); fallthrough; case XDP_DROP: diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 952148eede30..9f36f8d7a985 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -527,9 +527,10 @@ ice_run_xdp_zc(struct ice_ring *rx_ring, struct xdp_buff *xdp)
if (likely(act == XDP_REDIRECT)) { err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); - result = !err ? ICE_XDP_REDIR : ICE_XDP_CONSUMED; + if (err) + goto out_failure; rcu_read_unlock(); - return result; + return ICE_XDP_REDIR; }
switch (act) { @@ -538,11 +539,14 @@ ice_run_xdp_zc(struct ice_ring *rx_ring, struct xdp_buff *xdp) case XDP_TX: xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->q_index]; result = ice_xmit_xdp_buff(xdp, xdp_ring); + if (result == ICE_XDP_CONSUMED) + goto out_failure; break; default: bpf_warn_invalid_xdp_action(act); fallthrough; case XDP_ABORTED: +out_failure: trace_xdp_exception(rx_ring->netdev, xdp_prog, act); fallthrough; case XDP_DROP:
From: Magnus Karlsson magnus.karlsson@intel.com
[ Upstream commit 7d52fe2eaddfa3d7255d43c3e89ebf2748b7ea7a ]
Optimize ixgbe_run_xdp_zc() for the XDP program verdict being XDP_REDIRECT in the xsk zero-copy path. This path is only used when having AF_XDP zero-copy on and in that case most packets will be directed to user space. This provides a little under 100k extra packets in throughput on my server when running l2fwd in xdpsock.
Signed-off-by: Magnus Karlsson magnus.karlsson@intel.com Tested-by: Vishakha Jambekar vishakha.jambekar@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c index 3771857cf887..91ad5b902673 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c @@ -104,6 +104,13 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter, xdp_prog = READ_ONCE(rx_ring->xdp_prog); act = bpf_prog_run_xdp(xdp_prog, xdp);
+ if (likely(act == XDP_REDIRECT)) { + err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); + result = !err ? IXGBE_XDP_REDIR : IXGBE_XDP_CONSUMED; + rcu_read_unlock(); + return result; + } + switch (act) { case XDP_PASS: break; @@ -115,10 +122,6 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter, } result = ixgbe_xmit_xdp_ring(adapter, xdpf); break; - case XDP_REDIRECT: - err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); - result = !err ? IXGBE_XDP_REDIR : IXGBE_XDP_CONSUMED; - break; default: bpf_warn_invalid_xdp_action(act); fallthrough;
From: Magnus Karlsson magnus.karlsson@intel.com
[ Upstream commit 8281356b1cab1cccc71412eb4cf28b99d6bb2c19 ]
Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared.
Fixes: 33fdc82f0883 ("ixgbe: add support for XDP_TX action") Fixes: d0bcacd0a130 ("ixgbe: add AF_XDP zero-copy Rx support") Reported-by: Jesper Dangaard Brouer brouer@redhat.com Signed-off-by: Magnus Karlsson magnus.karlsson@intel.com Tested-by: Vishakha Jambekar vishakha.jambekar@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 ++++++++-------- drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 14 ++++++++------ 2 files changed, 16 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 0b9fddbc5db4..1bfba87f1ff6 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -2218,23 +2218,23 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter, break; case XDP_TX: xdpf = xdp_convert_buff_to_frame(xdp); - if (unlikely(!xdpf)) { - result = IXGBE_XDP_CONSUMED; - break; - } + if (unlikely(!xdpf)) + goto out_failure; result = ixgbe_xmit_xdp_ring(adapter, xdpf); + if (result == IXGBE_XDP_CONSUMED) + goto out_failure; break; case XDP_REDIRECT: err = xdp_do_redirect(adapter->netdev, xdp, xdp_prog); - if (!err) - result = IXGBE_XDP_REDIR; - else - result = IXGBE_XDP_CONSUMED; + if (err) + goto out_failure; + result = IXGBE_XDP_REDIR; break; default: bpf_warn_invalid_xdp_action(act); fallthrough; case XDP_ABORTED: +out_failure: trace_xdp_exception(rx_ring->netdev, xdp_prog, act); fallthrough; /* handle aborts by dropping packet */ case XDP_DROP: diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c index 91ad5b902673..f72d2978263b 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c @@ -106,9 +106,10 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
if (likely(act == XDP_REDIRECT)) { err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); - result = !err ? IXGBE_XDP_REDIR : IXGBE_XDP_CONSUMED; + if (err) + goto out_failure; rcu_read_unlock(); - return result; + return IXGBE_XDP_REDIR; }
switch (act) { @@ -116,16 +117,17 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter, break; case XDP_TX: xdpf = xdp_convert_buff_to_frame(xdp); - if (unlikely(!xdpf)) { - result = IXGBE_XDP_CONSUMED; - break; - } + if (unlikely(!xdpf)) + goto out_failure; result = ixgbe_xmit_xdp_ring(adapter, xdpf); + if (result == IXGBE_XDP_CONSUMED) + goto out_failure; break; default: bpf_warn_invalid_xdp_action(act); fallthrough; case XDP_ABORTED: +out_failure: trace_xdp_exception(rx_ring->netdev, xdp_prog, act); fallthrough; /* handle aborts by dropping packet */ case XDP_DROP:
From: Vignesh Raghavendra vigneshr@ti.com
[ Upstream commit 52ae30f55a2a40cff549fac95de82f25403bd387 ]
Traffic through main NAVSS interconnect is coherent wrt ARM caches on J7200 SoC. Add missing dma-coherent property to main_navss node.
Also add dma-ranges to be consistent with mcu_navss node and with AM65/J721e main_navss and mcu_navss nodes.
Fixes: d361ed88455fe ("arm64: dts: ti: Add support for J7200 SoC") Signed-off-by: Vignesh Raghavendra vigneshr@ti.com Reviewed-by: Peter Ujfalusi peter.ujfalusi@gmail.com Signed-off-by: Nishanth Menon nm@ti.com Link: https://lore.kernel.org/r/20210510180601.19458-1-vigneshr@ti.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/ti/k3-j7200-main.dtsi | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/ti/k3-j7200-main.dtsi b/arch/arm64/boot/dts/ti/k3-j7200-main.dtsi index 72d6496e88dd..689538244392 100644 --- a/arch/arm64/boot/dts/ti/k3-j7200-main.dtsi +++ b/arch/arm64/boot/dts/ti/k3-j7200-main.dtsi @@ -78,6 +78,8 @@ #size-cells = <2>; ranges = <0x00 0x30000000 0x00 0x30000000 0x00 0x0c400000>; ti,sci-dev-id = <199>; + dma-coherent; + dma-ranges;
main_navss_intr: interrupt-controller1 { compatible = "ti,sci-intr";
From: Jens Wiklander jens.wiklander@linaro.org
[ Upstream commit 673c7aa2436bfc857b92417f3e590a297c586dde ]
Prior to this patch optee_open_session() was making assumptions about the internal format of uuid_t by casting a memory location in a parameter struct to uuid_t *. Fix this using export_uuid() to get a well defined binary representation and also add an octets field in struct optee_msg_param in order to avoid casting.
Fixes: c5b4312bea5d ("tee: optee: Add support for session login client UUID generation") Suggested-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Jens Wiklander jens.wiklander@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tee/optee/call.c | 6 ++++-- drivers/tee/optee/optee_msg.h | 6 ++++-- 2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c index 780d7c4fd756..0790de29f0ca 100644 --- a/drivers/tee/optee/call.c +++ b/drivers/tee/optee/call.c @@ -217,6 +217,7 @@ int optee_open_session(struct tee_context *ctx, struct optee_msg_arg *msg_arg; phys_addr_t msg_parg; struct optee_session *sess = NULL; + uuid_t client_uuid;
/* +2 for the meta parameters added below */ shm = get_msg_arg(ctx, arg->num_params + 2, &msg_arg, &msg_parg); @@ -237,10 +238,11 @@ int optee_open_session(struct tee_context *ctx, memcpy(&msg_arg->params[0].u.value, arg->uuid, sizeof(arg->uuid)); msg_arg->params[1].u.value.c = arg->clnt_login;
- rc = tee_session_calc_client_uuid((uuid_t *)&msg_arg->params[1].u.value, - arg->clnt_login, arg->clnt_uuid); + rc = tee_session_calc_client_uuid(&client_uuid, arg->clnt_login, + arg->clnt_uuid); if (rc) goto out; + export_uuid(msg_arg->params[1].u.octets, &client_uuid);
rc = optee_to_msg_param(msg_arg->params + 2, arg->num_params, param); if (rc) diff --git a/drivers/tee/optee/optee_msg.h b/drivers/tee/optee/optee_msg.h index 7b2d919da2ac..c7ac7d02d6cc 100644 --- a/drivers/tee/optee/optee_msg.h +++ b/drivers/tee/optee/optee_msg.h @@ -9,7 +9,7 @@ #include <linux/types.h>
/* - * This file defines the OP-TEE message protocol used to communicate + * This file defines the OP-TEE message protocol (ABI) used to communicate * with an instance of OP-TEE running in secure world. * * This file is divided into three sections. @@ -146,9 +146,10 @@ struct optee_msg_param_value { * @tmem: parameter by temporary memory reference * @rmem: parameter by registered memory reference * @value: parameter by opaque value + * @octets: parameter by octet string * * @attr & OPTEE_MSG_ATTR_TYPE_MASK indicates if tmem, rmem or value is used in - * the union. OPTEE_MSG_ATTR_TYPE_VALUE_* indicates value, + * the union. OPTEE_MSG_ATTR_TYPE_VALUE_* indicates value or octets, * OPTEE_MSG_ATTR_TYPE_TMEM_* indicates @tmem and * OPTEE_MSG_ATTR_TYPE_RMEM_* indicates @rmem, * OPTEE_MSG_ATTR_TYPE_NONE indicates that none of the members are used. @@ -159,6 +160,7 @@ struct optee_msg_param { struct optee_msg_param_tmem tmem; struct optee_msg_param_rmem rmem; struct optee_msg_param_value value; + u8 octets[24]; } u; };
From: Tony Lindgren tony@atomide.com
[ Upstream commit 4d7b324e231366ea772ab10df46be31273ca39af ]
On am335x, suspend and resume only works once, and the system hangs if suspend is attempted again. However, turns out suspend and resume works fine multiple times if the USB OTG driver for musb controller is loaded.
The issue is caused my the interconnect target module losing context during suspend, and it needs a restore on resume to be reconfigure again as debugged earlier by Dave Gerlach d-gerlach@ti.com.
There are also other modules that need a restore on resume, like gpmc as noted by Dave. So let's add a common way to restore an interconnect target module based on a quirk flag. For now, let's enable the quirk for am335x otg only to fix the suspend and resume issue.
As gpmc is not causing hangs based on tests with BeagleBone, let's patch gpmc separately. For gpmc, we also need a hardware reset done before restore according to Dave.
To reinit the modules, we decouple system suspend from PM runtime. We replace calls to pm_runtime_force_suspend() and pm_runtime_force_resume() with direct calls to internal functions and rely on the driver internal state. There no point trying to handle complex system suspend and resume quirks via PM runtime.
This is issue should have already been noticed with commit 1819ef2e2d12 ("bus: ti-sysc: Use swsup quirks also for am335x musb") when quirk handling was added for am335x otg for swsup. But the issue went unnoticed as having musb driver loaded hides the issue, and suspend and resume works once without the driver loaded.
Fixes: 1819ef2e2d12 ("bus: ti-sysc: Use swsup quirks also for am335x musb") Suggested-by: Dave Gerlach d-gerlach@ti.com Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bus/ti-sysc.c | 53 +++++++++++++++++++++++++-- include/linux/platform_data/ti-sysc.h | 1 + 2 files changed, 51 insertions(+), 3 deletions(-)
diff --git a/drivers/bus/ti-sysc.c b/drivers/bus/ti-sysc.c index 9afbe4992a1d..b7f8c6074a15 100644 --- a/drivers/bus/ti-sysc.c +++ b/drivers/bus/ti-sysc.c @@ -1330,6 +1330,34 @@ static int __maybe_unused sysc_runtime_resume(struct device *dev) return error; }
+static int sysc_reinit_module(struct sysc *ddata, bool leave_enabled) +{ + struct device *dev = ddata->dev; + int error; + + /* Disable target module if it is enabled */ + if (ddata->enabled) { + error = sysc_runtime_suspend(dev); + if (error) + dev_warn(dev, "reinit suspend failed: %i\n", error); + } + + /* Enable target module */ + error = sysc_runtime_resume(dev); + if (error) + dev_warn(dev, "reinit resume failed: %i\n", error); + + if (leave_enabled) + return error; + + /* Disable target module if no leave_enabled was set */ + error = sysc_runtime_suspend(dev); + if (error) + dev_warn(dev, "reinit suspend failed: %i\n", error); + + return error; +} + static int __maybe_unused sysc_noirq_suspend(struct device *dev) { struct sysc *ddata; @@ -1340,12 +1368,18 @@ static int __maybe_unused sysc_noirq_suspend(struct device *dev) (SYSC_QUIRK_LEGACY_IDLE | SYSC_QUIRK_NO_IDLE)) return 0;
- return pm_runtime_force_suspend(dev); + if (!ddata->enabled) + return 0; + + ddata->needs_resume = 1; + + return sysc_runtime_suspend(dev); }
static int __maybe_unused sysc_noirq_resume(struct device *dev) { struct sysc *ddata; + int error = 0;
ddata = dev_get_drvdata(dev);
@@ -1353,7 +1387,19 @@ static int __maybe_unused sysc_noirq_resume(struct device *dev) (SYSC_QUIRK_LEGACY_IDLE | SYSC_QUIRK_NO_IDLE)) return 0;
- return pm_runtime_force_resume(dev); + if (ddata->cfg.quirks & SYSC_QUIRK_REINIT_ON_RESUME) { + error = sysc_reinit_module(ddata, ddata->needs_resume); + if (error) + dev_warn(dev, "noirq_resume failed: %i\n", error); + } else if (ddata->needs_resume) { + error = sysc_runtime_resume(dev); + if (error) + dev_warn(dev, "noirq_resume failed: %i\n", error); + } + + ddata->needs_resume = 0; + + return error; }
static const struct dev_pm_ops sysc_pm_ops = { @@ -1462,7 +1508,8 @@ static const struct sysc_revision_quirk sysc_revision_quirks[] = { SYSC_QUIRK("usb_otg_hs", 0, 0x400, 0x404, 0x408, 0x00000050, 0xffffffff, SYSC_QUIRK_SWSUP_SIDLE | SYSC_QUIRK_SWSUP_MSTANDBY), SYSC_QUIRK("usb_otg_hs", 0, 0, 0x10, -ENODEV, 0x4ea2080d, 0xffffffff, - SYSC_QUIRK_SWSUP_SIDLE | SYSC_QUIRK_SWSUP_MSTANDBY), + SYSC_QUIRK_SWSUP_SIDLE | SYSC_QUIRK_SWSUP_MSTANDBY | + SYSC_QUIRK_REINIT_ON_RESUME), SYSC_QUIRK("wdt", 0, 0, 0x10, 0x14, 0x502a0500, 0xfffff0f0, SYSC_MODULE_QUIRK_WDT), /* PRUSS on am3, am4 and am5 */ diff --git a/include/linux/platform_data/ti-sysc.h b/include/linux/platform_data/ti-sysc.h index fafc1beea504..9837fb011f2f 100644 --- a/include/linux/platform_data/ti-sysc.h +++ b/include/linux/platform_data/ti-sysc.h @@ -50,6 +50,7 @@ struct sysc_regbits { s8 emufree_shift; };
+#define SYSC_QUIRK_REINIT_ON_RESUME BIT(27) #define SYSC_QUIRK_GPMC_DEBUG BIT(26) #define SYSC_MODULE_QUIRK_ENA_RESETDONE BIT(25) #define SYSC_MODULE_QUIRK_PRUSS BIT(24)
From: Michael Walle michael@walle.cc
[ Upstream commit dabea675faf16e8682aa478ff3ce65dd775620bc ]
While enabling EDAC support for the LS1028A it was discovered that the memory node has a wrong endianness setting as well as a wrong interrupt assignment. Fix both.
This was tested on a sl28 board. To force ECC errors, you can use the error injection supported by the controller in hardware (with CONFIG_EDAC_DEBUG enabled):
# enable error injection $ echo 0x100 > /sys/devices/system/edac/mc/mc0/inject_ctrl # flip lowest bit of the data $ echo 0x1 > /sys/devices/system/edac/mc/mc0/inject_data_lo
Fixes: 8897f3255c9c ("arm64: dts: Add support for NXP LS1028A SoC") Signed-off-by: Michael Walle michael@walle.cc Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi index 62f4dcb96e70..f3b58bb9b840 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi @@ -192,8 +192,8 @@ ddr: memory-controller@1080000 { compatible = "fsl,qoriq-memory-controller"; reg = <0x0 0x1080000 0x0 0x1000>; - interrupts = <GIC_SPI 144 IRQ_TYPE_LEVEL_HIGH>; - big-endian; + interrupts = <GIC_SPI 17 IRQ_TYPE_LEVEL_HIGH>; + little-endian; };
dcfg: syscon@1e00000 {
From: Lucas Stach l.stach@pengutronix.de
[ Upstream commit ac0cbf9d13dccfd09bebc2f8f5697b6d3ffe27c4 ]
As this is a fixed regulator on the board there was no harm in the wrong voltage being specified, apart from a confusing reporting to userspace.
Fixes: 4a13b3bec3b4 ("arm64: dts: imx: add Zii Ultra board support") Signed-off-by: Lucas Stach l.stach@pengutronix.de Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi b/arch/arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi index fa7a041ffcfd..825c83c71a9f 100644 --- a/arch/arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi @@ -45,8 +45,8 @@ reg_12p0_main: regulator-12p0-main { compatible = "regulator-fixed"; regulator-name = "12V_MAIN"; - regulator-min-microvolt = <5000000>; - regulator-max-microvolt = <5000000>; + regulator-min-microvolt = <12000000>; + regulator-max-microvolt = <12000000>; regulator-always-on; };
From: Michael Walle michael@walle.cc
[ Upstream commit 25201269c6ec3e9398426962ccdd55428261f7d0 ]
During hardware validation it was noticed that the clock isn't continuously enabled when there is no link. This is because the 125MHz clock is derived from the internal PLL which seems to go into some kind of power-down mode every once in a while. The LS1028A expects a contiuous clock. Thus enable the PLL all the time.
Also, the RGMII pad voltage is wrong. It was configured to 2.5V (that is the VDDH regulator). The correct voltage is 1.8V, i.e. the VDDIO regulator.
This fix is for the freescale/fsl-ls1028a-kontron-sl28-var4.dts.
Fixes: 815364d0424e ("arm64: dts: freescale: add Kontron sl28 support") Signed-off-by: Michael Walle michael@walle.cc Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../boot/dts/freescale/fsl-ls1028a-kontron-sl28-var4.dts | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28-var4.dts b/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28-var4.dts index df212ed5bb94..e65d1c477e2c 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28-var4.dts +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28-var4.dts @@ -31,11 +31,10 @@ reg = <0x4>; eee-broken-1000t; eee-broken-100tx; - qca,clk-out-frequency = <125000000>; qca,clk-out-strength = <AR803X_STRENGTH_FULL>; - - vddio-supply = <&vddh>; + qca,keep-pll-enabled; + vddio-supply = <&vddio>;
vddio: vddio-regulator { regulator-name = "VDDIO";
From: Fabio Estevam festevam@gmail.com
[ Upstream commit 7c8f0338cdacc90fdf6468adafa8e27952987f00 ]
According to Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml, the correct name of the property is 'fsl,tuning-step'.
Fix it accordingly.
Signed-off-by: Fabio Estevam festevam@gmail.com Fixes: ae7b3384b61b ("ARM: dts: Add support for 96Boards Meerkat96 board") Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/imx7d-meerkat96.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/imx7d-meerkat96.dts b/arch/arm/boot/dts/imx7d-meerkat96.dts index 5339210b63d0..dd8003bd1fc0 100644 --- a/arch/arm/boot/dts/imx7d-meerkat96.dts +++ b/arch/arm/boot/dts/imx7d-meerkat96.dts @@ -193,7 +193,7 @@ pinctrl-names = "default"; pinctrl-0 = <&pinctrl_usdhc1>; keep-power-in-suspend; - tuning-step = <2>; + fsl,tuning-step = <2>; vmmc-supply = <®_3p3v>; no-1-8-v; broken-cd;
From: Fabio Estevam festevam@gmail.com
[ Upstream commit 0e2fa4959c4f44815ce33e46e4054eeb0f346053 ]
According to Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml, the correct name of the property is 'fsl,tuning-step'.
Fix it accordingly.
Signed-off-by: Fabio Estevam festevam@gmail.com Fixes: f13f571ac8a1 ("ARM: dts: imx7d-pico: Extend peripherals support") Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/imx7d-pico.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/imx7d-pico.dtsi b/arch/arm/boot/dts/imx7d-pico.dtsi index e57da0d32b98..e519897fae08 100644 --- a/arch/arm/boot/dts/imx7d-pico.dtsi +++ b/arch/arm/boot/dts/imx7d-pico.dtsi @@ -351,7 +351,7 @@ pinctrl-2 = <&pinctrl_usdhc1_200mhz>; cd-gpios = <&gpio5 0 GPIO_ACTIVE_LOW>; bus-width = <4>; - tuning-step = <2>; + fsl,tuning-step = <2>; vmmc-supply = <®_3p3v>; wakeup-source; no-1-8-v;
From: Geert Uytterhoeven geert+renesas@glider.be
[ Upstream commit b73eb6b3b91ff7d76cff5f8c7ab92fe0c51e3829 ]
According to the DT bindings, #gpio-cells must be two.
Fixes: 63e71fedc07c4ece ("ARM: dts: Add support for emtrion emCON-MX6 series") Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Reviewed-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/imx6qdl-emcon-avari.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/imx6qdl-emcon-avari.dtsi b/arch/arm/boot/dts/imx6qdl-emcon-avari.dtsi index 828cf3e39784..c4e146f3341b 100644 --- a/arch/arm/boot/dts/imx6qdl-emcon-avari.dtsi +++ b/arch/arm/boot/dts/imx6qdl-emcon-avari.dtsi @@ -126,7 +126,7 @@ compatible = "nxp,pca8574"; reg = <0x3a>; gpio-controller; - #gpio-cells = <1>; + #gpio-cells = <2>; }; };
From: Tony Lindgren tony@atomide.com
[ Upstream commit c8692ad416dcc420ce1b403596a425c8f4c2720b ]
Looks like the swsup_sidle_act quirk handling is unreliable for serial ports. The serial ports just eventually stop idling until woken up and re-idled again. As the serial port not idling blocks any deeper SoC idle states, it's adds an annoying random flakeyness for power management.
Let's just switch to swsup_sidle quirk instead like we already do for omap3 uarts. This means we manually idle the port instead of trying to use the hardware autoidle features when not in use.
For more details on why the serial ports have been using swsup_idle_act, see commit 66dde54e978a ("ARM: OMAP2+: hwmod-data: UART IP needs software control to manage sidle modes"). It seems that the swsup_idle_act quirk handling is not enough though, and for example the TI Android kernel changed to using swsup_sidle with commit 77c34c84e1e0 ("OMAP4: HWMOD: UART1: disable smart-idle.").
Fixes: b4a9a7a38917 ("bus: ti-sysc: Handle swsup idle mode quirks") Cc: Carl Philipp Klemm philipp@uvos.xyz Cc: Ivan Jelincic parazyd@dyne.org Cc: Merlijn Wajer merlijn@wizzup.org Cc: Pavel Machek pavel@ucw.cz Cc: Sebastian Reichel sre@kernel.org Cc: Sicelo A. Mhlongo absicsz@gmail.com Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bus/ti-sysc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/bus/ti-sysc.c b/drivers/bus/ti-sysc.c index b7f8c6074a15..818dc7f54f03 100644 --- a/drivers/bus/ti-sysc.c +++ b/drivers/bus/ti-sysc.c @@ -1450,9 +1450,9 @@ static const struct sysc_revision_quirk sysc_revision_quirks[] = { SYSC_QUIRK_SWSUP_SIDLE | SYSC_QUIRK_LEGACY_IDLE), /* Uarts on omap4 and later */ SYSC_QUIRK("uart", 0, 0x50, 0x54, 0x58, 0x50411e03, 0xffff00ff, - SYSC_QUIRK_SWSUP_SIDLE_ACT | SYSC_QUIRK_LEGACY_IDLE), + SYSC_QUIRK_SWSUP_SIDLE | SYSC_QUIRK_LEGACY_IDLE), SYSC_QUIRK("uart", 0, 0x50, 0x54, 0x58, 0x47422e03, 0xffffffff, - SYSC_QUIRK_SWSUP_SIDLE_ACT | SYSC_QUIRK_LEGACY_IDLE), + SYSC_QUIRK_SWSUP_SIDLE | SYSC_QUIRK_LEGACY_IDLE),
/* Quirks that need to be set based on the module address */ SYSC_QUIRK("mcpdm", 0x40132000, 0, 0x10, -ENODEV, 0x50000800, 0xffffffff,
From: Hoang Le hoang.h.le@dektech.com.au
[ Upstream commit b83e214b2e04204f1fc674574362061492c37245 ]
Add extack error messages for -EINVAL errors when enabling bearer, getting/setting properties for a media/bearer
Acked-by: Jon Maloy jmaloy@redhat.com Signed-off-by: Hoang Le hoang.h.le@dektech.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/tipc/bearer.c | 50 +++++++++++++++++++++++++++++++++++++---------- 1 file changed, 40 insertions(+), 10 deletions(-)
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 650414110452..4d0e11623e5c 100644 --- a/net/tipc/bearer.c +++ b/net/tipc/bearer.c @@ -234,7 +234,8 @@ void tipc_bearer_remove_dest(struct net *net, u32 bearer_id, u32 dest) */ static int tipc_enable_bearer(struct net *net, const char *name, u32 disc_domain, u32 prio, - struct nlattr *attr[]) + struct nlattr *attr[], + struct netlink_ext_ack *extack) { struct tipc_net *tn = tipc_net(net); struct tipc_bearer_names b_names; @@ -248,17 +249,20 @@ static int tipc_enable_bearer(struct net *net, const char *name,
if (!bearer_name_validate(name, &b_names)) { errstr = "illegal name"; + NL_SET_ERR_MSG(extack, "Illegal name"); goto rejected; }
if (prio > TIPC_MAX_LINK_PRI && prio != TIPC_MEDIA_LINK_PRI) { errstr = "illegal priority"; + NL_SET_ERR_MSG(extack, "Illegal priority"); goto rejected; }
m = tipc_media_find(b_names.media_name); if (!m) { errstr = "media not registered"; + NL_SET_ERR_MSG(extack, "Media not registered"); goto rejected; }
@@ -272,6 +276,7 @@ static int tipc_enable_bearer(struct net *net, const char *name, break; if (!strcmp(name, b->name)) { errstr = "already enabled"; + NL_SET_ERR_MSG(extack, "Already enabled"); goto rejected; } bearer_id++; @@ -283,6 +288,7 @@ static int tipc_enable_bearer(struct net *net, const char *name, name, prio); if (prio == TIPC_MIN_LINK_PRI) { errstr = "cannot adjust to lower"; + NL_SET_ERR_MSG(extack, "Cannot adjust to lower"); goto rejected; } pr_warn("Bearer <%s>: trying with adjusted priority\n", name); @@ -293,6 +299,7 @@ static int tipc_enable_bearer(struct net *net, const char *name,
if (bearer_id >= MAX_BEARERS) { errstr = "max 3 bearers permitted"; + NL_SET_ERR_MSG(extack, "Max 3 bearers permitted"); goto rejected; }
@@ -306,6 +313,7 @@ static int tipc_enable_bearer(struct net *net, const char *name, if (res) { kfree(b); errstr = "failed to enable media"; + NL_SET_ERR_MSG(extack, "Failed to enable media"); goto rejected; }
@@ -322,6 +330,7 @@ static int tipc_enable_bearer(struct net *net, const char *name, if (res) { bearer_disable(net, b); errstr = "failed to create discoverer"; + NL_SET_ERR_MSG(extack, "Failed to create discoverer"); goto rejected; }
@@ -894,6 +903,7 @@ int tipc_nl_bearer_get(struct sk_buff *skb, struct genl_info *info) bearer = tipc_bearer_find(net, name); if (!bearer) { err = -EINVAL; + NL_SET_ERR_MSG(info->extack, "Bearer not found"); goto err_out; }
@@ -933,8 +943,10 @@ int __tipc_nl_bearer_disable(struct sk_buff *skb, struct genl_info *info) name = nla_data(attrs[TIPC_NLA_BEARER_NAME]);
bearer = tipc_bearer_find(net, name); - if (!bearer) + if (!bearer) { + NL_SET_ERR_MSG(info->extack, "Bearer not found"); return -EINVAL; + }
bearer_disable(net, bearer);
@@ -992,7 +1004,8 @@ int __tipc_nl_bearer_enable(struct sk_buff *skb, struct genl_info *info) prio = nla_get_u32(props[TIPC_NLA_PROP_PRIO]); }
- return tipc_enable_bearer(net, bearer, domain, prio, attrs); + return tipc_enable_bearer(net, bearer, domain, prio, attrs, + info->extack); }
int tipc_nl_bearer_enable(struct sk_buff *skb, struct genl_info *info) @@ -1031,6 +1044,7 @@ int tipc_nl_bearer_add(struct sk_buff *skb, struct genl_info *info) b = tipc_bearer_find(net, name); if (!b) { rtnl_unlock(); + NL_SET_ERR_MSG(info->extack, "Bearer not found"); return -EINVAL; }
@@ -1071,8 +1085,10 @@ int __tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info) name = nla_data(attrs[TIPC_NLA_BEARER_NAME]);
b = tipc_bearer_find(net, name); - if (!b) + if (!b) { + NL_SET_ERR_MSG(info->extack, "Bearer not found"); return -EINVAL; + }
if (attrs[TIPC_NLA_BEARER_PROP]) { struct nlattr *props[TIPC_NLA_PROP_MAX + 1]; @@ -1091,12 +1107,18 @@ int __tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info) if (props[TIPC_NLA_PROP_WIN]) b->max_win = nla_get_u32(props[TIPC_NLA_PROP_WIN]); if (props[TIPC_NLA_PROP_MTU]) { - if (b->media->type_id != TIPC_MEDIA_TYPE_UDP) + if (b->media->type_id != TIPC_MEDIA_TYPE_UDP) { + NL_SET_ERR_MSG(info->extack, + "MTU property is unsupported"); return -EINVAL; + } #ifdef CONFIG_TIPC_MEDIA_UDP if (tipc_udp_mtu_bad(nla_get_u32 - (props[TIPC_NLA_PROP_MTU]))) + (props[TIPC_NLA_PROP_MTU]))) { + NL_SET_ERR_MSG(info->extack, + "MTU value is out-of-range"); return -EINVAL; + } b->mtu = nla_get_u32(props[TIPC_NLA_PROP_MTU]); tipc_node_apply_property(net, b, TIPC_NLA_PROP_MTU); #endif @@ -1224,6 +1246,7 @@ int tipc_nl_media_get(struct sk_buff *skb, struct genl_info *info) rtnl_lock(); media = tipc_media_find(name); if (!media) { + NL_SET_ERR_MSG(info->extack, "Media not found"); err = -EINVAL; goto err_out; } @@ -1260,9 +1283,10 @@ int __tipc_nl_media_set(struct sk_buff *skb, struct genl_info *info) name = nla_data(attrs[TIPC_NLA_MEDIA_NAME]);
m = tipc_media_find(name); - if (!m) + if (!m) { + NL_SET_ERR_MSG(info->extack, "Media not found"); return -EINVAL; - + } if (attrs[TIPC_NLA_MEDIA_PROP]) { struct nlattr *props[TIPC_NLA_PROP_MAX + 1];
@@ -1278,12 +1302,18 @@ int __tipc_nl_media_set(struct sk_buff *skb, struct genl_info *info) if (props[TIPC_NLA_PROP_WIN]) m->max_win = nla_get_u32(props[TIPC_NLA_PROP_WIN]); if (props[TIPC_NLA_PROP_MTU]) { - if (m->type_id != TIPC_MEDIA_TYPE_UDP) + if (m->type_id != TIPC_MEDIA_TYPE_UDP) { + NL_SET_ERR_MSG(info->extack, + "MTU property is unsupported"); return -EINVAL; + } #ifdef CONFIG_TIPC_MEDIA_UDP if (tipc_udp_mtu_bad(nla_get_u32 - (props[TIPC_NLA_PROP_MTU]))) + (props[TIPC_NLA_PROP_MTU]))) { + NL_SET_ERR_MSG(info->extack, + "MTU value is out-of-range"); return -EINVAL; + } m->mtu = nla_get_u32(props[TIPC_NLA_PROP_MTU]); #endif }
From: Hoang Le hoang.h.le@dektech.com.au
[ Upstream commit f20a46c3044c3f75232b3d0e2d09af9b25efaf45 ]
When enabling a bearer by name, we don't sanity check its name with higher slot in bearer list. This may have the effect that the name of an already enabled bearer bypasses the check.
To fix the above issue, we just perform an extra checking with all existing bearers.
Fixes: cb30a63384bc9 ("tipc: refactor function tipc_enable_bearer()") Cc: stable@vger.kernel.org Acked-by: Jon Maloy jmaloy@redhat.com Signed-off-by: Hoang Le hoang.h.le@dektech.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/tipc/bearer.c | 46 +++++++++++++++++++++++++++------------------- 1 file changed, 27 insertions(+), 19 deletions(-)
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 4d0e11623e5c..12e535b43d88 100644 --- a/net/tipc/bearer.c +++ b/net/tipc/bearer.c @@ -246,6 +246,7 @@ static int tipc_enable_bearer(struct net *net, const char *name, int bearer_id = 0; int res = -EINVAL; char *errstr = ""; + u32 i;
if (!bearer_name_validate(name, &b_names)) { errstr = "illegal name"; @@ -270,31 +271,38 @@ static int tipc_enable_bearer(struct net *net, const char *name, prio = m->priority;
/* Check new bearer vs existing ones and find free bearer id if any */ - while (bearer_id < MAX_BEARERS) { - b = rtnl_dereference(tn->bearer_list[bearer_id]); - if (!b) - break; + bearer_id = MAX_BEARERS; + i = MAX_BEARERS; + while (i-- != 0) { + b = rtnl_dereference(tn->bearer_list[i]); + if (!b) { + bearer_id = i; + continue; + } if (!strcmp(name, b->name)) { errstr = "already enabled"; NL_SET_ERR_MSG(extack, "Already enabled"); goto rejected; } - bearer_id++; - if (b->priority != prio) - continue; - if (++with_this_prio <= 2) - continue; - pr_warn("Bearer <%s>: already 2 bearers with priority %u\n", - name, prio); - if (prio == TIPC_MIN_LINK_PRI) { - errstr = "cannot adjust to lower"; - NL_SET_ERR_MSG(extack, "Cannot adjust to lower"); - goto rejected; + + if (b->priority == prio && + (++with_this_prio > 2)) { + pr_warn("Bearer <%s>: already 2 bearers with priority %u\n", + name, prio); + + if (prio == TIPC_MIN_LINK_PRI) { + errstr = "cannot adjust to lower"; + NL_SET_ERR_MSG(extack, "Cannot adjust to lower"); + goto rejected; + } + + pr_warn("Bearer <%s>: trying with adjusted priority\n", + name); + prio--; + bearer_id = MAX_BEARERS; + i = MAX_BEARERS; + with_this_prio = 1; } - pr_warn("Bearer <%s>: trying with adjusted priority\n", name); - prio--; - bearer_id = 0; - with_this_prio = 1; }
if (bearer_id >= MAX_BEARERS) {
From: Johan Hovold johan@kernel.org
[ Upstream commit e359b4411c2836cf87c8776682d1b594635570de ]
When DMA is enabled the receive handler runs in a threaded handler, but the primary handler up until very recently neither disabled interrupts in the device or used IRQF_ONESHOT. This would lead to a deadlock if an interrupt comes in while the threaded receive handler is running under the port lock.
Commit ad7676812437 ("serial: stm32: fix a deadlock condition with wakeup event") claimed to fix an unrelated deadlock, but unfortunately also disabled interrupts in the threaded handler. While this prevents the deadlock mentioned in the previous paragraph it also defeats the purpose of using a threaded handler in the first place.
Fix this by making the interrupt one-shot and not disabling interrupts in the threaded handler.
Note that (receive) DMA must not be used for a console port as the threaded handler could be interrupted while holding the port lock, something which could lead to a deadlock in case an interrupt handler ends up calling printk.
Fixes: ad7676812437 ("serial: stm32: fix a deadlock condition with wakeup event") Fixes: 3489187204eb ("serial: stm32: adding dma support") Cc: stable@vger.kernel.org # 4.9 Cc: Alexandre TORGUE alexandre.torgue@st.com Cc: Gerald Baeza gerald.baeza@st.com Reviewed-by: Valentin Caronvalentin.caron@foss.st.com Signed-off-by: Johan Hovold johan@kernel.org Link: https://lore.kernel.org/r/20210416140557.25177-3-johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/stm32-usart.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c index 2cf9fc915510..844059861f9e 100644 --- a/drivers/tty/serial/stm32-usart.c +++ b/drivers/tty/serial/stm32-usart.c @@ -213,14 +213,11 @@ static void stm32_usart_receive_chars(struct uart_port *port, bool threaded) struct tty_port *tport = &port->state->port; struct stm32_port *stm32_port = to_stm32_port(port); const struct stm32_usart_offsets *ofs = &stm32_port->info->ofs; - unsigned long c, flags; + unsigned long c; u32 sr; char flag;
- if (threaded) - spin_lock_irqsave(&port->lock, flags); - else - spin_lock(&port->lock); + spin_lock(&port->lock);
while (stm32_usart_pending_rx(port, &sr, &stm32_port->last_res, threaded)) { @@ -277,10 +274,7 @@ static void stm32_usart_receive_chars(struct uart_port *port, bool threaded) uart_insert_char(port, sr, USART_SR_ORE, c, flag); }
- if (threaded) - spin_unlock_irqrestore(&port->lock, flags); - else - spin_unlock(&port->lock); + spin_unlock(&port->lock);
tty_flip_buffer_push(tport); } @@ -653,7 +647,8 @@ static int stm32_usart_startup(struct uart_port *port)
ret = request_threaded_irq(port->irq, stm32_usart_interrupt, stm32_usart_threaded_interrupt, - IRQF_NO_SUSPEND, name, port); + IRQF_ONESHOT | IRQF_NO_SUSPEND, + name, port); if (ret) return ret;
@@ -1126,6 +1121,13 @@ static int stm32_usart_of_dma_rx_probe(struct stm32_port *stm32port, struct dma_async_tx_descriptor *desc = NULL; int ret;
+ /* + * Using DMA and threaded handler for the console could lead to + * deadlocks. + */ + if (uart_console(port)) + return -ENODEV; + /* Request DMA RX channel */ stm32port->rx_ch = dma_request_slave_channel(dev, "rx"); if (!stm32port->rx_ch) {
From: Jisheng Zhang jszhang@kernel.org
[ Upstream commit 772d7891e8b3b0baae7bb88a294d61fd07ba6d15 ]
Running "make" on an already compiled kernel tree will rebuild the kernel even without any modifications:
CALL linux/scripts/checksyscalls.sh CALL linux/scripts/atomic/check-atomics.sh CHK include/generated/compile.h SO2S arch/riscv/kernel/vdso/vdso-syms.S AS arch/riscv/kernel/vdso/vdso-syms.o AR arch/riscv/kernel/vdso/built-in.a AR arch/riscv/kernel/built-in.a AR arch/riscv/built-in.a GEN .version CHK include/generated/compile.h UPD include/generated/compile.h CC init/version.o AR init/built-in.a LD vmlinux.o
The reason is "Any target that utilizes if_changed must be listed in $(targets), otherwise the command line check will fail, and the target will always be built" as explained by Documentation/kbuild/makefiles.rst
Fix this build bug by adding vdso-syms.S to $(targets)
At the same time, there are two trivial clean up modifications:
- the vdso-dummy.o is not needed any more after so remove it.
- vdso.lds is a generated file, so it should be prefixed with $(obj)/ instead of $(src)/
Fixes: c2c81bb2f691 ("RISC-V: Fix the VDSO symbol generaton for binutils-2.35+") Cc: stable@vger.kernel.org Signed-off-by: Jisheng Zhang jszhang@kernel.org Signed-off-by: Palmer Dabbelt palmerdabbelt@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/kernel/vdso/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile index ca2b40dfd24b..24d936c147cd 100644 --- a/arch/riscv/kernel/vdso/Makefile +++ b/arch/riscv/kernel/vdso/Makefile @@ -23,7 +23,7 @@ ifneq ($(c-gettimeofday-y),) endif
# Build rules -targets := $(obj-vdso) vdso.so vdso.so.dbg vdso.lds vdso-dummy.o +targets := $(obj-vdso) vdso.so vdso.so.dbg vdso.lds vdso-syms.S obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
obj-y += vdso.o vdso-syms.o @@ -41,7 +41,7 @@ KASAN_SANITIZE := n $(obj)/vdso.o: $(obj)/vdso.so
# link rule for the .so file, .lds has to be first -$(obj)/vdso.so.dbg: $(src)/vdso.lds $(obj-vdso) FORCE +$(obj)/vdso.so.dbg: $(obj)/vdso.lds $(obj-vdso) FORCE $(call if_changed,vdsold) LDFLAGS_vdso.so.dbg = -shared -s -soname=linux-vdso.so.1 \ --build-id=sha1 --hash-style=both --eh-frame-hdr
From: Pavel Begunkov asml.silence@gmail.com
[ Upstream commit a298232ee6b9a1d5d732aa497ff8be0d45b5bd82 ]
WARNING: CPU: 0 PID: 10242 at lib/refcount.c:28 refcount_warn_saturate+0x15b/0x1a0 lib/refcount.c:28 RIP: 0010:refcount_warn_saturate+0x15b/0x1a0 lib/refcount.c:28 Call Trace: __refcount_sub_and_test include/linux/refcount.h:283 [inline] __refcount_dec_and_test include/linux/refcount.h:315 [inline] refcount_dec_and_test include/linux/refcount.h:333 [inline] io_put_req fs/io_uring.c:2140 [inline] io_queue_linked_timeout fs/io_uring.c:6300 [inline] __io_queue_sqe+0xbef/0xec0 fs/io_uring.c:6354 io_submit_sqe fs/io_uring.c:6534 [inline] io_submit_sqes+0x2bbd/0x7c50 fs/io_uring.c:6660 __do_sys_io_uring_enter fs/io_uring.c:9240 [inline] __se_sys_io_uring_enter+0x256/0x1d60 fs/io_uring.c:9182
io_link_timeout_fn() should put only one reference of the linked timeout request, however in case of racing with the master request's completion first io_req_complete() puts one and then io_put_req_deferred() is called.
Cc: stable@vger.kernel.org # 5.12+ Fixes: 9ae1f8dd372e0 ("io_uring: fix inconsistent lock state") Reported-by: syzbot+a2910119328ce8e7996f@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov asml.silence@gmail.com Link: https://lore.kernel.org/r/ff51018ff29de5ffa76f09273ef48cb24c720368.162041762... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 369ec81033d6..958c463c11eb 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -6266,6 +6266,7 @@ static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer) if (prev) { io_async_find_and_cancel(ctx, req, prev->user_data, -ETIME); io_put_req_deferred(prev, 1); + io_put_req_deferred(req, 1); } else { io_cqring_add_event(req, -ETIME, 0); io_put_req_deferred(req, 1);
From: Pavel Begunkov asml.silence@gmail.com
[ Upstream commit 8c3f9cd1603d0e4af6c50ebc6d974ab7bdd03cf4 ]
__io_cqring_fill_event() takes cflags as long to squeeze it into u32 in an CQE, awhile all users pass int or unsigned. Replace it with unsigned int and store it as u32 in struct io_completion to match CQE.
Signed-off-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 958c463c11eb..fdbaaf579cc6 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -545,7 +545,7 @@ struct io_statx { struct io_completion { struct file *file; struct list_head list; - int cflags; + u32 cflags; };
struct io_async_connect { @@ -1711,7 +1711,8 @@ static void io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force, } }
-static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags) +static void __io_cqring_fill_event(struct io_kiocb *req, long res, + unsigned int cflags) { struct io_ring_ctx *ctx = req->ctx; struct io_uring_cqe *cqe;
From: James Zhu James.Zhu@amd.com
[ Upstream commit 4a62542ae064e3b645d6bbf2295a6c05136956c6 ]
Add cancel_delayed_work_sync before set power gating state to avoid race condition issue when power gating.
Signed-off-by: James Zhu James.Zhu@amd.com Reviewed-by: Leo Liu leo.liu@amd.com Acked-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c index 700621ddc02e..c9c888be1228 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c @@ -345,15 +345,14 @@ done: static int vcn_v3_0_hw_fini(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle; - struct amdgpu_ring *ring; int i;
+ cancel_delayed_work_sync(&adev->vcn.idle_work); + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { if (adev->vcn.harvest_config & (1 << i)) continue;
- ring = &adev->vcn.inst[i].ring_dec; - if (!amdgpu_sriov_vf(adev)) { if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) || (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
From: James Zhu James.Zhu@amd.com
[ Upstream commit 23f10a571da5eaa63b7845d16e2f49837e841ab9 ]
Add cancel_delayed_work_sync before set power gating state to avoid race condition issue when power gating.
Signed-off-by: James Zhu James.Zhu@amd.com Reviewed-by: Leo Liu leo.liu@amd.com Acked-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c index 63b350182389..8c84e35c2719 100644 --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c @@ -187,14 +187,14 @@ static int jpeg_v2_5_hw_init(void *handle) static int jpeg_v2_5_hw_fini(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle; - struct amdgpu_ring *ring; int i;
+ cancel_delayed_work_sync(&adev->vcn.idle_work); + for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) { if (adev->jpeg.harvest_config & (1 << i)) continue;
- ring = &adev->jpeg.inst[i].ring_dec; if (adev->jpeg.cur_state != AMD_PG_STATE_GATE && RREG32_SOC15(JPEG, i, mmUVD_JRBC_STATUS)) jpeg_v2_5_set_powergating_state(adev, AMD_PG_STATE_GATE);
From: James Zhu James.Zhu@amd.com
[ Upstream commit 20ebbfd22f8115a1e4f60d3d289f66be4d47f1ec ]
Add cancel_delayed_work_sync before set power gating state to avoid race condition issue when power gating.
Signed-off-by: James Zhu James.Zhu@amd.com Reviewed-by: Leo Liu leo.liu@amd.com Acked-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c index 9259e35f0f55..e00c88abeaed 100644 --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c @@ -159,9 +159,9 @@ static int jpeg_v3_0_hw_init(void *handle) static int jpeg_v3_0_hw_fini(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle; - struct amdgpu_ring *ring;
- ring = &adev->jpeg.inst->ring_dec; + cancel_delayed_work_sync(&adev->vcn.idle_work); + if (adev->jpeg.cur_state != AMD_PG_STATE_GATE && RREG32_SOC15(JPEG, 0, mmUVD_JRBC_STATUS)) jpeg_v3_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
From: Lin Ma linma@zju.edu.cn
commit 6a137caec23aeb9e036cdfd8a46dd8a366460e5d upstream.
In the cleanup routine for failed initialization of HCI device, the flush_work(&hdev->rx_work) need to be finished before the flush_work(&hdev->cmd_work). Otherwise, the hci_rx_work() can possibly invoke new cmd_work and cause a bug, like double free, in late processings.
This was assigned CVE-2021-3564.
This patch reorder the flush_work() to fix this bug.
Cc: Marcel Holtmann marcel@holtmann.org Cc: Johan Hedberg johan.hedberg@gmail.com Cc: Luiz Augusto von Dentz luiz.dentz@gmail.com Cc: "David S. Miller" davem@davemloft.net Cc: Jakub Kicinski kuba@kernel.org Cc: linux-bluetooth@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Lin Ma linma@zju.edu.cn Signed-off-by: Hao Xiong mart1n@zju.edu.cn Cc: stable stable@vger.kernel.org Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/hci_core.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -1602,8 +1602,13 @@ setup_failed: } else { /* Init failed, cleanup */ flush_work(&hdev->tx_work); - flush_work(&hdev->cmd_work); + + /* Since hci_rx_work() is possible to awake new cmd_work + * it should be flushed first to avoid unexpected call of + * hci_cmd_work() + */ flush_work(&hdev->rx_work); + flush_work(&hdev->cmd_work);
skb_queue_purge(&hdev->cmd_q); skb_queue_purge(&hdev->rx_q);
From: Lin Ma linma@zju.edu.cn
commit e305509e678b3a4af2b3cfd410f409f7cdaabb52 upstream.
The hci_sock_dev_event() function will cleanup the hdev object for sockets even if this object may still be in used within the hci_sock_bound_ioctl() function, result in UAF vulnerability.
This patch replace the BH context lock to serialize these affairs and prevent the race condition.
Signed-off-by: Lin Ma linma@zju.edu.cn Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/hci_sock.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -762,7 +762,7 @@ void hci_sock_dev_event(struct hci_dev * /* Detach sockets from device */ read_lock(&hci_sk_list.lock); sk_for_each(sk, &hci_sk_list.head) { - bh_lock_sock_nested(sk); + lock_sock(sk); if (hci_pi(sk)->hdev == hdev) { hci_pi(sk)->hdev = NULL; sk->sk_err = EPIPE; @@ -771,7 +771,7 @@ void hci_sock_dev_event(struct hci_dev *
hci_dev_put(hdev); } - bh_unlock_sock(sk); + release_sock(sk); } read_unlock(&hci_sk_list.lock); }
From: Jason A. Donenfeld Jason@zx2c4.com
commit cc5060ca0285efe2728bced399a1955a7ce808b2 upstream.
Apparently, various versions of gcc have O3-related miscompiles. Looking at the difference between -O2 and -O3 for gcc 11 doesn't indicate miscompiles, but the difference also doesn't seem so significant for performance that it's worth risking.
Link: https://lore.kernel.org/lkml/CAHk-=wjuoGyxDhAF8SsrTkN0-YfCx7E6jUN3ikC_tn2AKW... Link: https://lore.kernel.org/lkml/CAHmME9otB5Wwxp7H8bR_i2uH2esEMvoBMC8uEXBMH9p0q1... Reported-by: Linus Torvalds torvalds@linux-foundation.org Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireguard/Makefile | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/net/wireguard/Makefile +++ b/drivers/net/wireguard/Makefile @@ -1,5 +1,4 @@ -ccflags-y := -O3 -ccflags-y += -D'pr_fmt(fmt)=KBUILD_MODNAME ": " fmt' +ccflags-y := -D'pr_fmt(fmt)=KBUILD_MODNAME ": " fmt' ccflags-$(CONFIG_WIREGUARD_DEBUG) += -DDEBUG wireguard-y := main.o wireguard-y += noise.o
From: Jason A. Donenfeld Jason@zx2c4.com
commit a4e9f8e3287c9eb6bf70df982870980dd3341863 upstream.
With deployments having upwards of 600k peers now, this somewhat heavy structure could benefit from more fine-grained allocations. Specifically, instead of using a 2048-byte slab for a 1544-byte object, we can now use 1544-byte objects directly, thus saving almost 25% per-peer, or with 600k peers, that's a savings of 303 MiB. This also makes wireguard's memory usage more transparent in tools like slabtop and /proc/slabinfo.
Fixes: 8b5553ace83c ("wireguard: queueing: get rid of per-peer ring buffers") Suggested-by: Arnd Bergmann arnd@arndb.de Suggested-by: Matthew Wilcox willy@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireguard/main.c | 7 +++++++ drivers/net/wireguard/peer.c | 21 +++++++++++++++++---- drivers/net/wireguard/peer.h | 3 +++ 3 files changed, 27 insertions(+), 4 deletions(-)
--- a/drivers/net/wireguard/main.c +++ b/drivers/net/wireguard/main.c @@ -28,6 +28,10 @@ static int __init mod_init(void) #endif wg_noise_init();
+ ret = wg_peer_init(); + if (ret < 0) + goto err_peer; + ret = wg_device_init(); if (ret < 0) goto err_device; @@ -44,6 +48,8 @@ static int __init mod_init(void) err_netlink: wg_device_uninit(); err_device: + wg_peer_uninit(); +err_peer: return ret; }
@@ -51,6 +57,7 @@ static void __exit mod_exit(void) { wg_genetlink_uninit(); wg_device_uninit(); + wg_peer_uninit(); }
module_init(mod_init); --- a/drivers/net/wireguard/peer.c +++ b/drivers/net/wireguard/peer.c @@ -15,6 +15,7 @@ #include <linux/rcupdate.h> #include <linux/list.h>
+static struct kmem_cache *peer_cache; static atomic64_t peer_counter = ATOMIC64_INIT(0);
struct wg_peer *wg_peer_create(struct wg_device *wg, @@ -29,10 +30,10 @@ struct wg_peer *wg_peer_create(struct wg if (wg->num_peers >= MAX_PEERS_PER_DEVICE) return ERR_PTR(ret);
- peer = kzalloc(sizeof(*peer), GFP_KERNEL); + peer = kmem_cache_zalloc(peer_cache, GFP_KERNEL); if (unlikely(!peer)) return ERR_PTR(ret); - if (dst_cache_init(&peer->endpoint_cache, GFP_KERNEL)) + if (unlikely(dst_cache_init(&peer->endpoint_cache, GFP_KERNEL))) goto err;
peer->device = wg; @@ -64,7 +65,7 @@ struct wg_peer *wg_peer_create(struct wg return peer;
err: - kfree(peer); + kmem_cache_free(peer_cache, peer); return ERR_PTR(ret); }
@@ -193,7 +194,8 @@ static void rcu_release(struct rcu_head /* The final zeroing takes care of clearing any remaining handshake key * material and other potentially sensitive information. */ - kfree_sensitive(peer); + memzero_explicit(peer, sizeof(*peer)); + kmem_cache_free(peer_cache, peer); }
static void kref_release(struct kref *refcount) @@ -225,3 +227,14 @@ void wg_peer_put(struct wg_peer *peer) return; kref_put(&peer->refcount, kref_release); } + +int __init wg_peer_init(void) +{ + peer_cache = KMEM_CACHE(wg_peer, 0); + return peer_cache ? 0 : -ENOMEM; +} + +void wg_peer_uninit(void) +{ + kmem_cache_destroy(peer_cache); +} --- a/drivers/net/wireguard/peer.h +++ b/drivers/net/wireguard/peer.h @@ -80,4 +80,7 @@ void wg_peer_put(struct wg_peer *peer); void wg_peer_remove(struct wg_peer *peer); void wg_peer_remove_all(struct wg_device *wg);
+int wg_peer_init(void); +void wg_peer_uninit(void); + #endif /* _WG_PEER_H */
From: Jason A. Donenfeld Jason@zx2c4.com
commit 24b70eeeb4f46c09487f8155239ebfb1f875774a upstream.
Many of the synchronization points are sometimes called under the rtnl lock, which means we should use synchronize_net rather than synchronize_rcu. Under the hood, this expands to using the expedited flavor of function in the event that rtnl is held, in order to not stall other concurrent changes.
This fixes some very, very long delays when removing multiple peers at once, which would cause some operations to take several minutes.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireguard/peer.c | 6 +++--- drivers/net/wireguard/socket.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/net/wireguard/peer.c +++ b/drivers/net/wireguard/peer.c @@ -89,7 +89,7 @@ static void peer_make_dead(struct wg_pee /* Mark as dead, so that we don't allow jumping contexts after. */ WRITE_ONCE(peer->is_dead, true);
- /* The caller must now synchronize_rcu() for this to take effect. */ + /* The caller must now synchronize_net() for this to take effect. */ }
static void peer_remove_after_dead(struct wg_peer *peer) @@ -161,7 +161,7 @@ void wg_peer_remove(struct wg_peer *peer lockdep_assert_held(&peer->device->device_update_lock);
peer_make_dead(peer); - synchronize_rcu(); + synchronize_net(); peer_remove_after_dead(peer); }
@@ -179,7 +179,7 @@ void wg_peer_remove_all(struct wg_device peer_make_dead(peer); list_add_tail(&peer->peer_list, &dead_peers); } - synchronize_rcu(); + synchronize_net(); list_for_each_entry_safe(peer, temp, &dead_peers, peer_list) peer_remove_after_dead(peer); } --- a/drivers/net/wireguard/socket.c +++ b/drivers/net/wireguard/socket.c @@ -430,7 +430,7 @@ void wg_socket_reinit(struct wg_device * if (new4) wg->incoming_port = ntohs(inet_sk(new4)->inet_sport); mutex_unlock(&wg->socket_update_lock); - synchronize_rcu(); + synchronize_net(); sock_free(old4); sock_free(old6); }
From: Jason A. Donenfeld Jason@zx2c4.com
commit acf2492b51c9a3c4dfb947f4d3477a86d315150f upstream.
On recent kernels, this config symbol is no longer used.
Reported-by: Rui Salvaterra rsalvaterra@gmail.com Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/wireguard/qemu/kernel.config | 1 - 1 file changed, 1 deletion(-)
--- a/tools/testing/selftests/wireguard/qemu/kernel.config +++ b/tools/testing/selftests/wireguard/qemu/kernel.config @@ -19,7 +19,6 @@ CONFIG_NETFILTER_XTABLES=y CONFIG_NETFILTER_XT_NAT=y CONFIG_NETFILTER_XT_MATCH_LENGTH=y CONFIG_NETFILTER_XT_MARK=y -CONFIG_NF_CONNTRACK_IPV4=y CONFIG_NF_NAT_IPV4=y CONFIG_IP_NF_IPTABLES=y CONFIG_IP_NF_FILTER=y
From: Jason A. Donenfeld Jason@zx2c4.com
commit f8873d11d4121aad35024f9379e431e0c83abead upstream.
Some distros may enable strict rp_filter by default, which will prevent vethc from receiving the packets with an unrouteable reverse path address.
Reported-by: Hangbin Liu liuhangbin@gmail.com Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/wireguard/netns.sh | 1 + 1 file changed, 1 insertion(+)
--- a/tools/testing/selftests/wireguard/netns.sh +++ b/tools/testing/selftests/wireguard/netns.sh @@ -363,6 +363,7 @@ ip1 -6 rule add table main suppress_pref ip1 -4 route add default dev wg0 table 51820 ip1 -4 rule add not fwmark 51820 table 51820 ip1 -4 rule add table main suppress_prefixlength 0 +n1 bash -c 'printf 0 > /proc/sys/net/ipv4/conf/vethc/rp_filter' # Flood the pings instead of sending just one, to trigger routing table reference counting bugs. n1 ping -W 1 -c 100 -f 192.168.99.7 n1 ping -W 1 -c 100 -f abab::1111
From: Jason A. Donenfeld Jason@zx2c4.com
commit 46cfe8eee285cde465b420637507884551f5d7ca upstream.
The randomized trie tests weren't initializing the dummy peer list head, resulting in a NULL pointer dereference when used. Fix this by initializing it in the randomized trie test, just like we do for the static unit test.
While we're at it, all of the other strings like this have the word "self-test", so add it to the missing place here.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireguard/selftest/allowedips.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/wireguard/selftest/allowedips.c +++ b/drivers/net/wireguard/selftest/allowedips.c @@ -296,6 +296,7 @@ static __init bool randomized_test(void) goto free; } kref_init(&peers[i]->refcount); + INIT_LIST_HEAD(&peers[i]->allowedips_list); }
mutex_lock(&mutex); @@ -333,7 +334,7 @@ static __init bool randomized_test(void) if (wg_allowedips_insert_v4(&t, (struct in_addr *)mutated, cidr, peer, &mutex) < 0) { - pr_err("allowedips random malloc: FAIL\n"); + pr_err("allowedips random self-test malloc: FAIL\n"); goto free_locked; } if (horrible_allowedips_insert_v4(&h,
From: Jason A. Donenfeld Jason@zx2c4.com
commit f634f418c227c912e7ea95a3299efdc9b10e4022 upstream.
Previously, deleting peers would require traversing the entire trie in order to rebalance nodes and safely free them. This meant that removing 1000 peers from a trie with a half million nodes would take an extremely long time, during which we're holding the rtnl lock. Large-scale users were reporting 200ms latencies added to the networking stack as a whole every time their userspace software would queue up significant removals. That's a serious situation.
This commit fixes that by maintaining a double pointer to the parent's bit pointer for each node, and then using the already existing node list belonging to each peer to go directly to the node, fix up its pointers, and free it with RCU. This means removal is O(1) instead of O(n), and we don't use gobs of stack.
The removal algorithm has the same downside as the code that it fixes: it won't collapse needlessly long runs of fillers. We can enhance that in the future if it ever becomes a problem. This commit documents that limitation with a TODO comment in code, a small but meaningful improvement over the prior situation.
Currently the biggest flaw, which the next commit addresses, is that because this increases the node size on 64-bit machines from 60 bytes to 68 bytes. 60 rounds up to 64, but 68 rounds up to 128. So we wind up using twice as much memory per node, because of power-of-two allocations, which is a big bummer. We'll need to figure something out there.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireguard/allowedips.c | 130 +++++++++++++++---------------------- drivers/net/wireguard/allowedips.h | 9 -- 2 files changed, 56 insertions(+), 83 deletions(-)
--- a/drivers/net/wireguard/allowedips.c +++ b/drivers/net/wireguard/allowedips.c @@ -66,60 +66,6 @@ static void root_remove_peer_lists(struc } }
-static void walk_remove_by_peer(struct allowedips_node __rcu **top, - struct wg_peer *peer, struct mutex *lock) -{ -#define REF(p) rcu_access_pointer(p) -#define DEREF(p) rcu_dereference_protected(*(p), lockdep_is_held(lock)) -#define PUSH(p) ({ \ - WARN_ON(IS_ENABLED(DEBUG) && len >= 128); \ - stack[len++] = p; \ - }) - - struct allowedips_node __rcu **stack[128], **nptr; - struct allowedips_node *node, *prev; - unsigned int len; - - if (unlikely(!peer || !REF(*top))) - return; - - for (prev = NULL, len = 0, PUSH(top); len > 0; prev = node) { - nptr = stack[len - 1]; - node = DEREF(nptr); - if (!node) { - --len; - continue; - } - if (!prev || REF(prev->bit[0]) == node || - REF(prev->bit[1]) == node) { - if (REF(node->bit[0])) - PUSH(&node->bit[0]); - else if (REF(node->bit[1])) - PUSH(&node->bit[1]); - } else if (REF(node->bit[0]) == prev) { - if (REF(node->bit[1])) - PUSH(&node->bit[1]); - } else { - if (rcu_dereference_protected(node->peer, - lockdep_is_held(lock)) == peer) { - RCU_INIT_POINTER(node->peer, NULL); - list_del_init(&node->peer_list); - if (!node->bit[0] || !node->bit[1]) { - rcu_assign_pointer(*nptr, DEREF( - &node->bit[!REF(node->bit[0])])); - kfree_rcu(node, rcu); - node = DEREF(nptr); - } - } - --len; - } - } - -#undef REF -#undef DEREF -#undef PUSH -} - static unsigned int fls128(u64 a, u64 b) { return a ? fls64(a) + 64U : fls64(b); @@ -224,6 +170,7 @@ static int add(struct allowedips_node __ RCU_INIT_POINTER(node->peer, peer); list_add_tail(&node->peer_list, &peer->allowedips_list); copy_and_assign_cidr(node, key, cidr, bits); + rcu_assign_pointer(node->parent_bit, trie); rcu_assign_pointer(*trie, node); return 0; } @@ -243,9 +190,9 @@ static int add(struct allowedips_node __ if (!node) { down = rcu_dereference_protected(*trie, lockdep_is_held(lock)); } else { - down = rcu_dereference_protected(CHOOSE_NODE(node, key), - lockdep_is_held(lock)); + down = rcu_dereference_protected(CHOOSE_NODE(node, key), lockdep_is_held(lock)); if (!down) { + rcu_assign_pointer(newnode->parent_bit, &CHOOSE_NODE(node, key)); rcu_assign_pointer(CHOOSE_NODE(node, key), newnode); return 0; } @@ -254,29 +201,37 @@ static int add(struct allowedips_node __ parent = node;
if (newnode->cidr == cidr) { + rcu_assign_pointer(down->parent_bit, &CHOOSE_NODE(newnode, down->bits)); rcu_assign_pointer(CHOOSE_NODE(newnode, down->bits), down); - if (!parent) + if (!parent) { + rcu_assign_pointer(newnode->parent_bit, trie); rcu_assign_pointer(*trie, newnode); - else - rcu_assign_pointer(CHOOSE_NODE(parent, newnode->bits), - newnode); - } else { - node = kzalloc(sizeof(*node), GFP_KERNEL); - if (unlikely(!node)) { - list_del(&newnode->peer_list); - kfree(newnode); - return -ENOMEM; + } else { + rcu_assign_pointer(newnode->parent_bit, &CHOOSE_NODE(parent, newnode->bits)); + rcu_assign_pointer(CHOOSE_NODE(parent, newnode->bits), newnode); } - INIT_LIST_HEAD(&node->peer_list); - copy_and_assign_cidr(node, newnode->bits, cidr, bits); + return 0; + } + + node = kzalloc(sizeof(*node), GFP_KERNEL); + if (unlikely(!node)) { + list_del(&newnode->peer_list); + kfree(newnode); + return -ENOMEM; + } + INIT_LIST_HEAD(&node->peer_list); + copy_and_assign_cidr(node, newnode->bits, cidr, bits);
- rcu_assign_pointer(CHOOSE_NODE(node, down->bits), down); - rcu_assign_pointer(CHOOSE_NODE(node, newnode->bits), newnode); - if (!parent) - rcu_assign_pointer(*trie, node); - else - rcu_assign_pointer(CHOOSE_NODE(parent, node->bits), - node); + rcu_assign_pointer(down->parent_bit, &CHOOSE_NODE(node, down->bits)); + rcu_assign_pointer(CHOOSE_NODE(node, down->bits), down); + rcu_assign_pointer(newnode->parent_bit, &CHOOSE_NODE(node, newnode->bits)); + rcu_assign_pointer(CHOOSE_NODE(node, newnode->bits), newnode); + if (!parent) { + rcu_assign_pointer(node->parent_bit, trie); + rcu_assign_pointer(*trie, node); + } else { + rcu_assign_pointer(node->parent_bit, &CHOOSE_NODE(parent, node->bits)); + rcu_assign_pointer(CHOOSE_NODE(parent, node->bits), node); } return 0; } @@ -335,9 +290,30 @@ int wg_allowedips_insert_v6(struct allow void wg_allowedips_remove_by_peer(struct allowedips *table, struct wg_peer *peer, struct mutex *lock) { + struct allowedips_node *node, *child, *tmp; + + if (list_empty(&peer->allowedips_list)) + return; ++table->seq; - walk_remove_by_peer(&table->root4, peer, lock); - walk_remove_by_peer(&table->root6, peer, lock); + list_for_each_entry_safe(node, tmp, &peer->allowedips_list, peer_list) { + list_del_init(&node->peer_list); + RCU_INIT_POINTER(node->peer, NULL); + if (node->bit[0] && node->bit[1]) + continue; + child = rcu_dereference_protected( + node->bit[!rcu_access_pointer(node->bit[0])], + lockdep_is_held(lock)); + if (child) + child->parent_bit = node->parent_bit; + *rcu_dereference_protected(node->parent_bit, lockdep_is_held(lock)) = child; + kfree_rcu(node, rcu); + + /* TODO: Note that we currently don't walk up and down in order to + * free any potential filler nodes. This means that this function + * doesn't free up as much as it could, which could be revisited + * at some point. + */ + } }
int wg_allowedips_read_node(struct allowedips_node *node, u8 ip[16], u8 *cidr) --- a/drivers/net/wireguard/allowedips.h +++ b/drivers/net/wireguard/allowedips.h @@ -15,14 +15,11 @@ struct wg_peer; struct allowedips_node { struct wg_peer __rcu *peer; struct allowedips_node __rcu *bit[2]; - /* While it may seem scandalous that we waste space for v4, - * we're alloc'ing to the nearest power of 2 anyway, so this - * doesn't actually make a difference. - */ - u8 bits[16] __aligned(__alignof(u64)); u8 cidr, bit_at_a, bit_at_b, bitlen; + u8 bits[16] __aligned(__alignof(u64));
- /* Keep rarely used list at bottom to be beyond cache line. */ + /* Keep rarely used members at bottom to be beyond cache line. */ + struct allowedips_node *__rcu *parent_bit; /* XXX: this puts us at 68->128 bytes instead of 60->64 bytes!! */ union { struct list_head peer_list; struct rcu_head rcu;
From: Jason A. Donenfeld Jason@zx2c4.com
commit dc680de28ca849dfe589dc15ac56d22505f0ef11 upstream.
The previous commit moved from O(n) to O(1) for removal, but in the process introduced an additional pointer member to a struct that increased the size from 60 to 68 bytes, putting nodes in the 128-byte slab. With deployed systems having as many as 2 million nodes, this represents a significant doubling in memory usage (128 MiB -> 256 MiB). Fix this by using our own kmem_cache, that's sized exactly right. This also makes wireguard's memory usage more transparent in tools like slabtop and /proc/slabinfo.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Suggested-by: Arnd Bergmann arnd@arndb.de Suggested-by: Matthew Wilcox willy@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireguard/allowedips.c | 31 +++++++++++++++++++++++++------ drivers/net/wireguard/allowedips.h | 5 ++++- drivers/net/wireguard/main.c | 10 +++++++++- 3 files changed, 38 insertions(+), 8 deletions(-)
--- a/drivers/net/wireguard/allowedips.c +++ b/drivers/net/wireguard/allowedips.c @@ -6,6 +6,8 @@ #include "allowedips.h" #include "peer.h"
+static struct kmem_cache *node_cache; + static void swap_endian(u8 *dst, const u8 *src, u8 bits) { if (bits == 32) { @@ -40,6 +42,11 @@ static void push_rcu(struct allowedips_n } }
+static void node_free_rcu(struct rcu_head *rcu) +{ + kmem_cache_free(node_cache, container_of(rcu, struct allowedips_node, rcu)); +} + static void root_free_rcu(struct rcu_head *rcu) { struct allowedips_node *node, *stack[128] = { @@ -49,7 +56,7 @@ static void root_free_rcu(struct rcu_hea while (len > 0 && (node = stack[--len])) { push_rcu(stack, node->bit[0], &len); push_rcu(stack, node->bit[1], &len); - kfree(node); + kmem_cache_free(node_cache, node); } }
@@ -164,7 +171,7 @@ static int add(struct allowedips_node __ return -EINVAL;
if (!rcu_access_pointer(*trie)) { - node = kzalloc(sizeof(*node), GFP_KERNEL); + node = kmem_cache_zalloc(node_cache, GFP_KERNEL); if (unlikely(!node)) return -ENOMEM; RCU_INIT_POINTER(node->peer, peer); @@ -180,7 +187,7 @@ static int add(struct allowedips_node __ return 0; }
- newnode = kzalloc(sizeof(*newnode), GFP_KERNEL); + newnode = kmem_cache_zalloc(node_cache, GFP_KERNEL); if (unlikely(!newnode)) return -ENOMEM; RCU_INIT_POINTER(newnode->peer, peer); @@ -213,10 +220,10 @@ static int add(struct allowedips_node __ return 0; }
- node = kzalloc(sizeof(*node), GFP_KERNEL); + node = kmem_cache_zalloc(node_cache, GFP_KERNEL); if (unlikely(!node)) { list_del(&newnode->peer_list); - kfree(newnode); + kmem_cache_free(node_cache, newnode); return -ENOMEM; } INIT_LIST_HEAD(&node->peer_list); @@ -306,7 +313,7 @@ void wg_allowedips_remove_by_peer(struct if (child) child->parent_bit = node->parent_bit; *rcu_dereference_protected(node->parent_bit, lockdep_is_held(lock)) = child; - kfree_rcu(node, rcu); + call_rcu(&node->rcu, node_free_rcu);
/* TODO: Note that we currently don't walk up and down in order to * free any potential filler nodes. This means that this function @@ -350,4 +357,16 @@ struct wg_peer *wg_allowedips_lookup_src return NULL; }
+int __init wg_allowedips_slab_init(void) +{ + node_cache = KMEM_CACHE(allowedips_node, 0); + return node_cache ? 0 : -ENOMEM; +} + +void wg_allowedips_slab_uninit(void) +{ + rcu_barrier(); + kmem_cache_destroy(node_cache); +} + #include "selftest/allowedips.c" --- a/drivers/net/wireguard/allowedips.h +++ b/drivers/net/wireguard/allowedips.h @@ -19,7 +19,7 @@ struct allowedips_node { u8 bits[16] __aligned(__alignof(u64));
/* Keep rarely used members at bottom to be beyond cache line. */ - struct allowedips_node *__rcu *parent_bit; /* XXX: this puts us at 68->128 bytes instead of 60->64 bytes!! */ + struct allowedips_node *__rcu *parent_bit; union { struct list_head peer_list; struct rcu_head rcu; @@ -53,4 +53,7 @@ struct wg_peer *wg_allowedips_lookup_src bool wg_allowedips_selftest(void); #endif
+int wg_allowedips_slab_init(void); +void wg_allowedips_slab_uninit(void); + #endif /* _WG_ALLOWEDIPS_H */ --- a/drivers/net/wireguard/main.c +++ b/drivers/net/wireguard/main.c @@ -21,10 +21,15 @@ static int __init mod_init(void) { int ret;
+ ret = wg_allowedips_slab_init(); + if (ret < 0) + goto err_allowedips; + #ifdef DEBUG + ret = -ENOTRECOVERABLE; if (!wg_allowedips_selftest() || !wg_packet_counter_selftest() || !wg_ratelimiter_selftest()) - return -ENOTRECOVERABLE; + goto err_peer; #endif wg_noise_init();
@@ -50,6 +55,8 @@ err_netlink: err_device: wg_peer_uninit(); err_peer: + wg_allowedips_slab_uninit(); +err_allowedips: return ret; }
@@ -58,6 +65,7 @@ static void __exit mod_exit(void) wg_genetlink_uninit(); wg_device_uninit(); wg_peer_uninit(); + wg_allowedips_slab_uninit(); }
module_init(mod_init);
From: Jason A. Donenfeld Jason@zx2c4.com
commit bf7b042dc62a31f66d3a41dd4dfc7806f267b307 upstream.
When removing single nodes, it's possible that that node's parent is an empty intermediate node, in which case, it too should be removed. Otherwise the trie fills up and never is fully emptied, leading to gradual memory leaks over time for tries that are modified often. There was originally code to do this, but was removed during refactoring in 2016 and never reworked. Now that we have proper parent pointers from the previous commits, we can implement this properly.
In order to reduce branching and expensive comparisons, we want to keep the double pointer for parent assignment (which lets us easily chain up to the root), but we still need to actually get the parent's base address. So encode the bit number into the last two bits of the pointer, and pack and unpack it as needed. This is a little bit clumsy but is the fastest and less memory wasteful of the compromises. Note that we align the root struct here to a minimum of 4, because it's embedded into a larger struct, and we're relying on having the bottom two bits for our flag, which would only be 16-bit aligned on m68k.
The existing macro-based helpers were a bit unwieldy for adding the bit packing to, so this commit replaces them with safer and clearer ordinary functions.
We add a test to the randomized/fuzzer part of the selftests, to free the randomized tries by-peer, refuzz it, and repeat, until it's supposed to be empty, and then then see if that actually resulted in the whole thing being emptied. That combined with kmemcheck should hopefully make sure this commit is doing what it should. Along the way this resulted in various other cleanups of the tests and fixes for recent graphviz.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireguard/allowedips.c | 102 ++++++++++------- drivers/net/wireguard/allowedips.h | 4 drivers/net/wireguard/selftest/allowedips.c | 162 +++++++++++++--------------- 3 files changed, 137 insertions(+), 131 deletions(-)
--- a/drivers/net/wireguard/allowedips.c +++ b/drivers/net/wireguard/allowedips.c @@ -30,8 +30,11 @@ static void copy_and_assign_cidr(struct node->bitlen = bits; memcpy(node->bits, src, bits / 8U); } -#define CHOOSE_NODE(parent, key) \ - parent->bit[(key[parent->bit_at_a] >> parent->bit_at_b) & 1] + +static inline u8 choose(struct allowedips_node *node, const u8 *key) +{ + return (key[node->bit_at_a] >> node->bit_at_b) & 1; +}
static void push_rcu(struct allowedips_node **stack, struct allowedips_node __rcu *p, unsigned int *len) @@ -112,7 +115,7 @@ static struct allowedips_node *find_node found = node; if (node->cidr == bits) break; - node = rcu_dereference_bh(CHOOSE_NODE(node, key)); + node = rcu_dereference_bh(node->bit[choose(node, key)]); } return found; } @@ -144,8 +147,7 @@ static bool node_placement(struct allowe u8 cidr, u8 bits, struct allowedips_node **rnode, struct mutex *lock) { - struct allowedips_node *node = rcu_dereference_protected(trie, - lockdep_is_held(lock)); + struct allowedips_node *node = rcu_dereference_protected(trie, lockdep_is_held(lock)); struct allowedips_node *parent = NULL; bool exact = false;
@@ -155,13 +157,24 @@ static bool node_placement(struct allowe exact = true; break; } - node = rcu_dereference_protected(CHOOSE_NODE(parent, key), - lockdep_is_held(lock)); + node = rcu_dereference_protected(parent->bit[choose(parent, key)], lockdep_is_held(lock)); } *rnode = parent; return exact; }
+static inline void connect_node(struct allowedips_node **parent, u8 bit, struct allowedips_node *node) +{ + node->parent_bit_packed = (unsigned long)parent | bit; + rcu_assign_pointer(*parent, node); +} + +static inline void choose_and_connect_node(struct allowedips_node *parent, struct allowedips_node *node) +{ + u8 bit = choose(parent, node->bits); + connect_node(&parent->bit[bit], bit, node); +} + static int add(struct allowedips_node __rcu **trie, u8 bits, const u8 *key, u8 cidr, struct wg_peer *peer, struct mutex *lock) { @@ -177,8 +190,7 @@ static int add(struct allowedips_node __ RCU_INIT_POINTER(node->peer, peer); list_add_tail(&node->peer_list, &peer->allowedips_list); copy_and_assign_cidr(node, key, cidr, bits); - rcu_assign_pointer(node->parent_bit, trie); - rcu_assign_pointer(*trie, node); + connect_node(trie, 2, node); return 0; } if (node_placement(*trie, key, cidr, bits, &node, lock)) { @@ -197,10 +209,10 @@ static int add(struct allowedips_node __ if (!node) { down = rcu_dereference_protected(*trie, lockdep_is_held(lock)); } else { - down = rcu_dereference_protected(CHOOSE_NODE(node, key), lockdep_is_held(lock)); + const u8 bit = choose(node, key); + down = rcu_dereference_protected(node->bit[bit], lockdep_is_held(lock)); if (!down) { - rcu_assign_pointer(newnode->parent_bit, &CHOOSE_NODE(node, key)); - rcu_assign_pointer(CHOOSE_NODE(node, key), newnode); + connect_node(&node->bit[bit], bit, newnode); return 0; } } @@ -208,15 +220,11 @@ static int add(struct allowedips_node __ parent = node;
if (newnode->cidr == cidr) { - rcu_assign_pointer(down->parent_bit, &CHOOSE_NODE(newnode, down->bits)); - rcu_assign_pointer(CHOOSE_NODE(newnode, down->bits), down); - if (!parent) { - rcu_assign_pointer(newnode->parent_bit, trie); - rcu_assign_pointer(*trie, newnode); - } else { - rcu_assign_pointer(newnode->parent_bit, &CHOOSE_NODE(parent, newnode->bits)); - rcu_assign_pointer(CHOOSE_NODE(parent, newnode->bits), newnode); - } + choose_and_connect_node(newnode, down); + if (!parent) + connect_node(trie, 2, newnode); + else + choose_and_connect_node(parent, newnode); return 0; }
@@ -229,17 +237,12 @@ static int add(struct allowedips_node __ INIT_LIST_HEAD(&node->peer_list); copy_and_assign_cidr(node, newnode->bits, cidr, bits);
- rcu_assign_pointer(down->parent_bit, &CHOOSE_NODE(node, down->bits)); - rcu_assign_pointer(CHOOSE_NODE(node, down->bits), down); - rcu_assign_pointer(newnode->parent_bit, &CHOOSE_NODE(node, newnode->bits)); - rcu_assign_pointer(CHOOSE_NODE(node, newnode->bits), newnode); - if (!parent) { - rcu_assign_pointer(node->parent_bit, trie); - rcu_assign_pointer(*trie, node); - } else { - rcu_assign_pointer(node->parent_bit, &CHOOSE_NODE(parent, node->bits)); - rcu_assign_pointer(CHOOSE_NODE(parent, node->bits), node); - } + choose_and_connect_node(node, down); + choose_and_connect_node(node, newnode); + if (!parent) + connect_node(trie, 2, node); + else + choose_and_connect_node(parent, node); return 0; }
@@ -297,7 +300,8 @@ int wg_allowedips_insert_v6(struct allow void wg_allowedips_remove_by_peer(struct allowedips *table, struct wg_peer *peer, struct mutex *lock) { - struct allowedips_node *node, *child, *tmp; + struct allowedips_node *node, *child, **parent_bit, *parent, *tmp; + bool free_parent;
if (list_empty(&peer->allowedips_list)) return; @@ -307,19 +311,29 @@ void wg_allowedips_remove_by_peer(struct RCU_INIT_POINTER(node->peer, NULL); if (node->bit[0] && node->bit[1]) continue; - child = rcu_dereference_protected( - node->bit[!rcu_access_pointer(node->bit[0])], - lockdep_is_held(lock)); + child = rcu_dereference_protected(node->bit[!rcu_access_pointer(node->bit[0])], + lockdep_is_held(lock)); if (child) - child->parent_bit = node->parent_bit; - *rcu_dereference_protected(node->parent_bit, lockdep_is_held(lock)) = child; + child->parent_bit_packed = node->parent_bit_packed; + parent_bit = (struct allowedips_node **)(node->parent_bit_packed & ~3UL); + *parent_bit = child; + parent = (void *)parent_bit - + offsetof(struct allowedips_node, bit[node->parent_bit_packed & 1]); + free_parent = !rcu_access_pointer(node->bit[0]) && + !rcu_access_pointer(node->bit[1]) && + (node->parent_bit_packed & 3) <= 1 && + !rcu_access_pointer(parent->peer); + if (free_parent) + child = rcu_dereference_protected( + parent->bit[!(node->parent_bit_packed & 1)], + lockdep_is_held(lock)); call_rcu(&node->rcu, node_free_rcu); - - /* TODO: Note that we currently don't walk up and down in order to - * free any potential filler nodes. This means that this function - * doesn't free up as much as it could, which could be revisited - * at some point. - */ + if (!free_parent) + continue; + if (child) + child->parent_bit_packed = parent->parent_bit_packed; + *(struct allowedips_node **)(parent->parent_bit_packed & ~3UL) = child; + call_rcu(&parent->rcu, node_free_rcu); } }
--- a/drivers/net/wireguard/allowedips.h +++ b/drivers/net/wireguard/allowedips.h @@ -19,7 +19,7 @@ struct allowedips_node { u8 bits[16] __aligned(__alignof(u64));
/* Keep rarely used members at bottom to be beyond cache line. */ - struct allowedips_node *__rcu *parent_bit; + unsigned long parent_bit_packed; union { struct list_head peer_list; struct rcu_head rcu; @@ -30,7 +30,7 @@ struct allowedips { struct allowedips_node __rcu *root4; struct allowedips_node __rcu *root6; u64 seq; -}; +} __aligned(4); /* We pack the lower 2 bits of &root, but m68k only gives 16-bit alignment. */
void wg_allowedips_init(struct allowedips *table); void wg_allowedips_free(struct allowedips *table, struct mutex *mutex); --- a/drivers/net/wireguard/selftest/allowedips.c +++ b/drivers/net/wireguard/selftest/allowedips.c @@ -19,32 +19,22 @@
#include <linux/siphash.h>
-static __init void swap_endian_and_apply_cidr(u8 *dst, const u8 *src, u8 bits, - u8 cidr) -{ - swap_endian(dst, src, bits); - memset(dst + (cidr + 7) / 8, 0, bits / 8 - (cidr + 7) / 8); - if (cidr) - dst[(cidr + 7) / 8 - 1] &= ~0U << ((8 - (cidr % 8)) % 8); -} - static __init void print_node(struct allowedips_node *node, u8 bits) { char *fmt_connection = KERN_DEBUG "\t"%p/%d" -> "%p/%d";\n"; - char *fmt_declaration = KERN_DEBUG - "\t"%p/%d"[style=%s, color="#%06x"];\n"; + char *fmt_declaration = KERN_DEBUG "\t"%p/%d"[style=%s, color="#%06x"];\n"; + u8 ip1[16], ip2[16], cidr1, cidr2; char *style = "dotted"; - u8 ip1[16], ip2[16]; u32 color = 0;
+ if (node == NULL) + return; if (bits == 32) { fmt_connection = KERN_DEBUG "\t"%pI4/%d" -> "%pI4/%d";\n"; - fmt_declaration = KERN_DEBUG - "\t"%pI4/%d"[style=%s, color="#%06x"];\n"; + fmt_declaration = KERN_DEBUG "\t"%pI4/%d"[style=%s, color="#%06x"];\n"; } else if (bits == 128) { fmt_connection = KERN_DEBUG "\t"%pI6/%d" -> "%pI6/%d";\n"; - fmt_declaration = KERN_DEBUG - "\t"%pI6/%d"[style=%s, color="#%06x"];\n"; + fmt_declaration = KERN_DEBUG "\t"%pI6/%d"[style=%s, color="#%06x"];\n"; } if (node->peer) { hsiphash_key_t key = { { 0 } }; @@ -55,24 +45,20 @@ static __init void print_node(struct all hsiphash_1u32(0xabad1dea, &key) % 200; style = "bold"; } - swap_endian_and_apply_cidr(ip1, node->bits, bits, node->cidr); - printk(fmt_declaration, ip1, node->cidr, style, color); + wg_allowedips_read_node(node, ip1, &cidr1); + printk(fmt_declaration, ip1, cidr1, style, color); if (node->bit[0]) { - swap_endian_and_apply_cidr(ip2, - rcu_dereference_raw(node->bit[0])->bits, bits, - node->cidr); - printk(fmt_connection, ip1, node->cidr, ip2, - rcu_dereference_raw(node->bit[0])->cidr); - print_node(rcu_dereference_raw(node->bit[0]), bits); + wg_allowedips_read_node(rcu_dereference_raw(node->bit[0]), ip2, &cidr2); + printk(fmt_connection, ip1, cidr1, ip2, cidr2); } if (node->bit[1]) { - swap_endian_and_apply_cidr(ip2, - rcu_dereference_raw(node->bit[1])->bits, - bits, node->cidr); - printk(fmt_connection, ip1, node->cidr, ip2, - rcu_dereference_raw(node->bit[1])->cidr); - print_node(rcu_dereference_raw(node->bit[1]), bits); + wg_allowedips_read_node(rcu_dereference_raw(node->bit[1]), ip2, &cidr2); + printk(fmt_connection, ip1, cidr1, ip2, cidr2); } + if (node->bit[0]) + print_node(rcu_dereference_raw(node->bit[0]), bits); + if (node->bit[1]) + print_node(rcu_dereference_raw(node->bit[1]), bits); }
static __init void print_tree(struct allowedips_node __rcu *top, u8 bits) @@ -121,8 +107,8 @@ static __init inline union nf_inet_addr { union nf_inet_addr mask;
- memset(&mask, 0x00, 128 / 8); - memset(&mask, 0xff, cidr / 8); + memset(&mask, 0, sizeof(mask)); + memset(&mask.all, 0xff, cidr / 8); if (cidr % 32) mask.all[cidr / 32] = (__force u32)htonl( (0xFFFFFFFFUL << (32 - (cidr % 32))) & 0xFFFFFFFFUL); @@ -149,42 +135,36 @@ horrible_mask_self(struct horrible_allow }
static __init inline bool -horrible_match_v4(const struct horrible_allowedips_node *node, - struct in_addr *ip) +horrible_match_v4(const struct horrible_allowedips_node *node, struct in_addr *ip) { return (ip->s_addr & node->mask.ip) == node->ip.ip; }
static __init inline bool -horrible_match_v6(const struct horrible_allowedips_node *node, - struct in6_addr *ip) +horrible_match_v6(const struct horrible_allowedips_node *node, struct in6_addr *ip) { - return (ip->in6_u.u6_addr32[0] & node->mask.ip6[0]) == - node->ip.ip6[0] && - (ip->in6_u.u6_addr32[1] & node->mask.ip6[1]) == - node->ip.ip6[1] && - (ip->in6_u.u6_addr32[2] & node->mask.ip6[2]) == - node->ip.ip6[2] && + return (ip->in6_u.u6_addr32[0] & node->mask.ip6[0]) == node->ip.ip6[0] && + (ip->in6_u.u6_addr32[1] & node->mask.ip6[1]) == node->ip.ip6[1] && + (ip->in6_u.u6_addr32[2] & node->mask.ip6[2]) == node->ip.ip6[2] && (ip->in6_u.u6_addr32[3] & node->mask.ip6[3]) == node->ip.ip6[3]; }
static __init void -horrible_insert_ordered(struct horrible_allowedips *table, - struct horrible_allowedips_node *node) +horrible_insert_ordered(struct horrible_allowedips *table, struct horrible_allowedips_node *node) { struct horrible_allowedips_node *other = NULL, *where = NULL; u8 my_cidr = horrible_mask_to_cidr(node->mask);
hlist_for_each_entry(other, &table->head, table) { - if (!memcmp(&other->mask, &node->mask, - sizeof(union nf_inet_addr)) && - !memcmp(&other->ip, &node->ip, - sizeof(union nf_inet_addr)) && - other->ip_version == node->ip_version) { + if (other->ip_version == node->ip_version && + !memcmp(&other->mask, &node->mask, sizeof(union nf_inet_addr)) && + !memcmp(&other->ip, &node->ip, sizeof(union nf_inet_addr))) { other->value = node->value; kfree(node); return; } + } + hlist_for_each_entry(other, &table->head, table) { where = other; if (horrible_mask_to_cidr(other->mask) <= my_cidr) break; @@ -201,8 +181,7 @@ static __init int horrible_allowedips_insert_v4(struct horrible_allowedips *table, struct in_addr *ip, u8 cidr, void *value) { - struct horrible_allowedips_node *node = kzalloc(sizeof(*node), - GFP_KERNEL); + struct horrible_allowedips_node *node = kzalloc(sizeof(*node), GFP_KERNEL);
if (unlikely(!node)) return -ENOMEM; @@ -219,8 +198,7 @@ static __init int horrible_allowedips_insert_v6(struct horrible_allowedips *table, struct in6_addr *ip, u8 cidr, void *value) { - struct horrible_allowedips_node *node = kzalloc(sizeof(*node), - GFP_KERNEL); + struct horrible_allowedips_node *node = kzalloc(sizeof(*node), GFP_KERNEL);
if (unlikely(!node)) return -ENOMEM; @@ -234,39 +212,43 @@ horrible_allowedips_insert_v6(struct hor }
static __init void * -horrible_allowedips_lookup_v4(struct horrible_allowedips *table, - struct in_addr *ip) +horrible_allowedips_lookup_v4(struct horrible_allowedips *table, struct in_addr *ip) { struct horrible_allowedips_node *node; - void *ret = NULL;
hlist_for_each_entry(node, &table->head, table) { - if (node->ip_version != 4) - continue; - if (horrible_match_v4(node, ip)) { - ret = node->value; - break; - } + if (node->ip_version == 4 && horrible_match_v4(node, ip)) + return node->value; } - return ret; + return NULL; }
static __init void * -horrible_allowedips_lookup_v6(struct horrible_allowedips *table, - struct in6_addr *ip) +horrible_allowedips_lookup_v6(struct horrible_allowedips *table, struct in6_addr *ip) { struct horrible_allowedips_node *node; - void *ret = NULL;
hlist_for_each_entry(node, &table->head, table) { - if (node->ip_version != 6) + if (node->ip_version == 6 && horrible_match_v6(node, ip)) + return node->value; + } + return NULL; +} + + +static __init void +horrible_allowedips_remove_by_value(struct horrible_allowedips *table, void *value) +{ + struct horrible_allowedips_node *node; + struct hlist_node *h; + + hlist_for_each_entry_safe(node, h, &table->head, table) { + if (node->value != value) continue; - if (horrible_match_v6(node, ip)) { - ret = node->value; - break; - } + hlist_del(&node->table); + kfree(node); } - return ret; + }
static __init bool randomized_test(void) @@ -397,23 +379,33 @@ static __init bool randomized_test(void) print_tree(t.root6, 128); }
- for (i = 0; i < NUM_QUERIES; ++i) { - prandom_bytes(ip, 4); - if (lookup(t.root4, 32, ip) != - horrible_allowedips_lookup_v4(&h, (struct in_addr *)ip)) { - pr_err("allowedips random self-test: FAIL\n"); - goto free; + for (j = 0;; ++j) { + for (i = 0; i < NUM_QUERIES; ++i) { + prandom_bytes(ip, 4); + if (lookup(t.root4, 32, ip) != horrible_allowedips_lookup_v4(&h, (struct in_addr *)ip)) { + horrible_allowedips_lookup_v4(&h, (struct in_addr *)ip); + pr_err("allowedips random v4 self-test: FAIL\n"); + goto free; + } + prandom_bytes(ip, 16); + if (lookup(t.root6, 128, ip) != horrible_allowedips_lookup_v6(&h, (struct in6_addr *)ip)) { + pr_err("allowedips random v6 self-test: FAIL\n"); + goto free; + } } + if (j >= NUM_PEERS) + break; + mutex_lock(&mutex); + wg_allowedips_remove_by_peer(&t, peers[j], &mutex); + mutex_unlock(&mutex); + horrible_allowedips_remove_by_value(&h, peers[j]); }
- for (i = 0; i < NUM_QUERIES; ++i) { - prandom_bytes(ip, 16); - if (lookup(t.root6, 128, ip) != - horrible_allowedips_lookup_v6(&h, (struct in6_addr *)ip)) { - pr_err("allowedips random self-test: FAIL\n"); - goto free; - } + if (t.root4 || t.root6) { + pr_err("allowedips random self-test removal: FAIL\n"); + goto free; } + ret = true;
free:
From: Pavel Skripkin paskripkin@gmail.com
commit bce130e7f392ddde8cfcb09927808ebd5f9c8669 upstream.
Added cfserl_release() function.
Cc: stable@vger.kernel.org Signed-off-by: Pavel Skripkin paskripkin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/caif/cfserl.h | 1 + net/caif/cfserl.c | 5 +++++ 2 files changed, 6 insertions(+)
--- a/include/net/caif/cfserl.h +++ b/include/net/caif/cfserl.h @@ -9,4 +9,5 @@ #include <net/caif/caif_layer.h>
struct cflayer *cfserl_create(int instance, bool use_stx); +void cfserl_release(struct cflayer *layer); #endif --- a/net/caif/cfserl.c +++ b/net/caif/cfserl.c @@ -31,6 +31,11 @@ static int cfserl_transmit(struct cflaye static void cfserl_ctrlcmd(struct cflayer *layr, enum caif_ctrlcmd ctrl, int phyid);
+void cfserl_release(struct cflayer *layer) +{ + kfree(layer); +} + struct cflayer *cfserl_create(int instance, bool use_stx) { struct cfserl *this = kzalloc(sizeof(struct cfserl), GFP_ATOMIC);
From: Pavel Skripkin paskripkin@gmail.com
commit a2805dca5107d5603f4bbc027e81e20d93476e96 upstream.
caif_enroll_dev() can fail in some cases. Ingnoring these cases can lead to memory leak due to not assigning link_support pointer to anywhere.
Fixes: 7c18d2205ea7 ("caif: Restructure how link caif link layer enroll") Cc: stable@vger.kernel.org Signed-off-by: Pavel Skripkin paskripkin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/caif/caif_dev.h | 2 +- include/net/caif/cfcnfg.h | 2 +- net/caif/caif_dev.c | 8 +++++--- net/caif/cfcnfg.c | 16 +++++++++++----- 4 files changed, 18 insertions(+), 10 deletions(-)
--- a/include/net/caif/caif_dev.h +++ b/include/net/caif/caif_dev.h @@ -119,7 +119,7 @@ void caif_free_client(struct cflayer *ad * The link_support layer is used to add any Link Layer specific * framing. */ -void caif_enroll_dev(struct net_device *dev, struct caif_dev_common *caifdev, +int caif_enroll_dev(struct net_device *dev, struct caif_dev_common *caifdev, struct cflayer *link_support, int head_room, struct cflayer **layer, int (**rcv_func)( struct sk_buff *, struct net_device *, --- a/include/net/caif/cfcnfg.h +++ b/include/net/caif/cfcnfg.h @@ -62,7 +62,7 @@ void cfcnfg_remove(struct cfcnfg *cfg); * @fcs: Specify if checksum is used in CAIF Framing Layer. * @head_room: Head space needed by link specific protocol. */ -void +int cfcnfg_add_phy_layer(struct cfcnfg *cnfg, struct net_device *dev, struct cflayer *phy_layer, enum cfcnfg_phy_preference pref, --- a/net/caif/caif_dev.c +++ b/net/caif/caif_dev.c @@ -308,7 +308,7 @@ static void dev_flowctrl(struct net_devi caifd_put(caifd); }
-void caif_enroll_dev(struct net_device *dev, struct caif_dev_common *caifdev, +int caif_enroll_dev(struct net_device *dev, struct caif_dev_common *caifdev, struct cflayer *link_support, int head_room, struct cflayer **layer, int (**rcv_func)(struct sk_buff *, struct net_device *, @@ -319,11 +319,12 @@ void caif_enroll_dev(struct net_device * enum cfcnfg_phy_preference pref; struct cfcnfg *cfg = get_cfcnfg(dev_net(dev)); struct caif_device_entry_list *caifdevs; + int res;
caifdevs = caif_device_list(dev_net(dev)); caifd = caif_device_alloc(dev); if (!caifd) - return; + return -ENOMEM; *layer = &caifd->layer; spin_lock_init(&caifd->flow_lock);
@@ -344,7 +345,7 @@ void caif_enroll_dev(struct net_device * strlcpy(caifd->layer.name, dev->name, sizeof(caifd->layer.name)); caifd->layer.transmit = transmit; - cfcnfg_add_phy_layer(cfg, + res = cfcnfg_add_phy_layer(cfg, dev, &caifd->layer, pref, @@ -354,6 +355,7 @@ void caif_enroll_dev(struct net_device * mutex_unlock(&caifdevs->lock); if (rcv_func) *rcv_func = receive; + return res; } EXPORT_SYMBOL(caif_enroll_dev);
--- a/net/caif/cfcnfg.c +++ b/net/caif/cfcnfg.c @@ -450,7 +450,7 @@ unlock: rcu_read_unlock(); }
-void +int cfcnfg_add_phy_layer(struct cfcnfg *cnfg, struct net_device *dev, struct cflayer *phy_layer, enum cfcnfg_phy_preference pref, @@ -459,7 +459,7 @@ cfcnfg_add_phy_layer(struct cfcnfg *cnfg { struct cflayer *frml; struct cfcnfg_phyinfo *phyinfo = NULL; - int i; + int i, res = 0; u8 phyid;
mutex_lock(&cnfg->lock); @@ -473,12 +473,15 @@ cfcnfg_add_phy_layer(struct cfcnfg *cnfg goto got_phyid; } pr_warn("Too many CAIF Link Layers (max 6)\n"); + res = -EEXIST; goto out;
got_phyid: phyinfo = kzalloc(sizeof(struct cfcnfg_phyinfo), GFP_ATOMIC); - if (!phyinfo) + if (!phyinfo) { + res = -ENOMEM; goto out_err; + }
phy_layer->id = phyid; phyinfo->pref = pref; @@ -492,8 +495,10 @@ got_phyid:
frml = cffrml_create(phyid, fcs);
- if (!frml) + if (!frml) { + res = -ENOMEM; goto out_err; + } phyinfo->frm_layer = frml; layer_set_up(frml, cnfg->mux);
@@ -511,11 +516,12 @@ got_phyid: list_add_rcu(&phyinfo->node, &cnfg->phys); out: mutex_unlock(&cnfg->lock); - return; + return res;
out_err: kfree(phyinfo); mutex_unlock(&cnfg->lock); + return res; } EXPORT_SYMBOL(cfcnfg_add_phy_layer);
From: Pavel Skripkin paskripkin@gmail.com
commit b53558a950a89824938e9811eddfc8efcd94e1bb upstream.
In case of caif_enroll_dev() fail, allocated link_support won't be assigned to the corresponding structure. So simply free allocated pointer in case of error
Fixes: 7c18d2205ea7 ("caif: Restructure how link caif link layer enroll") Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+7ec324747ce876a29db6@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin paskripkin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/caif/caif_dev.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/net/caif/caif_dev.c +++ b/net/caif/caif_dev.c @@ -370,6 +370,7 @@ static int caif_device_notify(struct not struct cflayer *layer, *link_support; int head_room = 0; struct caif_device_entry_list *caifdevs; + int res;
cfg = get_cfcnfg(dev_net(dev)); caifdevs = caif_device_list(dev_net(dev)); @@ -395,8 +396,10 @@ static int caif_device_notify(struct not break; } } - caif_enroll_dev(dev, caifdev, link_support, head_room, + res = caif_enroll_dev(dev, caifdev, link_support, head_room, &layer, NULL); + if (res) + cfserl_release(link_support); caifdev->flowctrl = dev_flowctrl; break;
From: Pavel Skripkin paskripkin@gmail.com
commit 7f5d86669fa4d485523ddb1d212e0a2d90bd62bb upstream.
In case of caif_enroll_dev() fail, allocated link_support won't be assigned to the corresponding structure. So simply free allocated pointer in case of error.
Fixes: 7ad65bf68d70 ("caif: Add support for CAIF over CDC NCM USB interface") Cc: stable@vger.kernel.org Signed-off-by: Pavel Skripkin paskripkin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/caif/caif_usb.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
--- a/net/caif/caif_usb.c +++ b/net/caif/caif_usb.c @@ -115,6 +115,11 @@ static struct cflayer *cfusbl_create(int return (struct cflayer *) this; }
+static void cfusbl_release(struct cflayer *layer) +{ + kfree(layer); +} + static struct packet_type caif_usb_type __read_mostly = { .type = cpu_to_be16(ETH_P_802_EX1), }; @@ -127,6 +132,7 @@ static int cfusbl_device_notify(struct n struct cflayer *layer, *link_support; struct usbnet *usbnet; struct usb_device *usbdev; + int res;
/* Check whether we have a NCM device, and find its VID/PID. */ if (!(dev->dev.parent && dev->dev.parent->driver && @@ -169,8 +175,11 @@ static int cfusbl_device_notify(struct n if (dev->num_tx_queues > 1) pr_warn("USB device uses more than one tx queue\n");
- caif_enroll_dev(dev, &common, link_support, CFUSB_MAX_HEADLEN, + res = caif_enroll_dev(dev, &common, link_support, CFUSB_MAX_HEADLEN, &layer, &caif_usb_type.func); + if (res) + goto err; + if (!pack_added) dev_add_pack(&caif_usb_type); pack_added = true; @@ -178,6 +187,9 @@ static int cfusbl_device_notify(struct n strlcpy(layer->name, dev->name, sizeof(layer->name));
return 0; +err: + cfusbl_release(link_support); + return res; }
static struct notifier_block caif_device_notifier = {
From: Johnny Chuang johnny.chuang.emc@gmail.com
commit ca66a6770bd9d6d99e469debd1c7363ac455daf9 upstream.
For ELAN touchscreen, we found our boot code of IC was not flexible enough to receive and handle this command. Once the FW main code of our controller is crashed for some reason, the controller could not be enumerated successfully to be recognized by the system host. therefore, it lost touch functionality.
Add quirk for skip send power-on command after reset. It will impact to ELAN touchscreen and touchpad on HID over I2C projects.
Fixes: 43b7029f475e ("HID: i2c-hid: Send power-on command after reset").
Cc: stable@vger.kernel.org Signed-off-by: Johnny Chuang johnny.chuang.emc@gmail.com Reviewed-by: Harry Cutts hcutts@chromium.org Reviewed-by: Douglas Anderson dianders@chromium.org Tested-by: Douglas Anderson dianders@chromium.org Signed-off-by: Benjamin Tissoires benjamin.tissoires@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/i2c-hid/i2c-hid-core.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/drivers/hid/i2c-hid/i2c-hid-core.c +++ b/drivers/hid/i2c-hid/i2c-hid-core.c @@ -50,6 +50,7 @@ #define I2C_HID_QUIRK_BOGUS_IRQ BIT(4) #define I2C_HID_QUIRK_RESET_ON_RESUME BIT(5) #define I2C_HID_QUIRK_BAD_INPUT_SIZE BIT(6) +#define I2C_HID_QUIRK_NO_WAKEUP_AFTER_RESET BIT(7)
/* flags */ @@ -183,6 +184,11 @@ static const struct i2c_hid_quirks { I2C_HID_QUIRK_RESET_ON_RESUME }, { USB_VENDOR_ID_ITE, I2C_DEVICE_ID_ITE_LENOVO_LEGION_Y720, I2C_HID_QUIRK_BAD_INPUT_SIZE }, + /* + * Sending the wakeup after reset actually break ELAN touchscreen controller + */ + { USB_VENDOR_ID_ELAN, HID_ANY_ID, + I2C_HID_QUIRK_NO_WAKEUP_AFTER_RESET }, { 0, 0 } };
@@ -466,7 +472,8 @@ static int i2c_hid_hwreset(struct i2c_cl }
/* At least some SIS devices need this after reset */ - ret = i2c_hid_set_power(client, I2C_HID_PWR_ON); + if (!(ihid->quirks & I2C_HID_QUIRK_NO_WAKEUP_AFTER_RESET)) + ret = i2c_hid_set_power(client, I2C_HID_PWR_ON);
out_unlock: mutex_unlock(&ihid->reset_lock);
From: Johan Hovold johan@kernel.org
commit 4b4f6cecca446abcb686c6e6c451d4f1ec1a7497 upstream.
Commit 9d7b18668956 ("HID: magicmouse: add support for Apple Magic Trackpad 2") added a sanity check for an Apple trackpad but returned success instead of -ENODEV when the check failed. This means that the remove callback will dereference the never-initialised driver data pointer when the driver is later unbound (e.g. on USB disconnect).
Reported-by: syzbot+ee6f6e2e68886ca256a8@syzkaller.appspotmail.com Fixes: 9d7b18668956 ("HID: magicmouse: add support for Apple Magic Trackpad 2") Cc: stable@vger.kernel.org # 4.20 Cc: Claudio Mettler claudio@ponyfleisch.ch Cc: Marek Wyborski marek.wyborski@emwesoft.com Cc: Sean O'Brien seobrien@chromium.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/hid-magicmouse.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/hid/hid-magicmouse.c +++ b/drivers/hid/hid-magicmouse.c @@ -597,7 +597,7 @@ static int magicmouse_probe(struct hid_d if (id->vendor == USB_VENDOR_ID_APPLE && id->product == USB_DEVICE_ID_APPLE_MAGICTRACKPAD2 && hdev->type != HID_TYPE_USBMOUSE) - return 0; + return -ENODEV;
msc = devm_kzalloc(&hdev->dev, sizeof(*msc), GFP_KERNEL); if (msc == NULL) {
From: Ahelenia Ziemiańska nabijaczleweli@nabijaczleweli.xyz
commit a2353e3b26012ff43bcdf81d37a3eaddd7ecdbf3 upstream.
This effectively changes collection_is_mt from contact ID in report->field to (device is Win8 => collection is finger) && contact ID in report->field
Some devices erroneously report Pen for fingers, and Win8 stylus-on-touchscreen devices report contact ID, but mark the accompanying touchscreen device's collection correctly
Cc: stable@vger.kernel.org Signed-off-by: Ahelenia Ziemiańska nabijaczleweli@nabijaczleweli.xyz Acked-by: Benjamin Tissoires benjamin.tissoires@redhat.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/hid-multitouch.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/drivers/hid/hid-multitouch.c +++ b/drivers/hid/hid-multitouch.c @@ -604,9 +604,13 @@ static struct mt_report_data *mt_allocat if (!(HID_MAIN_ITEM_VARIABLE & field->flags)) continue;
- for (n = 0; n < field->report_count; n++) { - if (field->usage[n].hid == HID_DG_CONTACTID) - rdata->is_mt_collection = true; + if (field->logical == HID_DG_FINGER || td->hdev->group != HID_GROUP_MULTITOUCH_WIN_8) { + for (n = 0; n < field->report_count; n++) { + if (field->usage[n].hid == HID_DG_CONTACTID) { + rdata->is_mt_collection = true; + break; + } + } } }
From: Bob Peterson rpeterso@redhat.com
commit 20265d9a67e40eafd39a8884658ca2e36f05985d upstream.
Before this patch, in the unlikely event that gfs2_glock_dq encountered a withdraw, it would do a wait_on_bit to wait for its journal to be recovered, but it never released the glock's spin_lock, which caused a scheduling-while-atomic error.
This patch unlocks the lockref spin_lock before waiting for recovery.
Fixes: 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish") Cc: stable@vger.kernel.org # v5.7+ Reported-by: Alexander Aring aahringo@redhat.com Signed-off-by: Bob Peterson rpeterso@redhat.com Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/gfs2/glock.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -1457,9 +1457,11 @@ void gfs2_glock_dq(struct gfs2_holder *g glock_blocked_by_withdraw(gl) && gh->gh_gl != sdp->sd_jinode_gl) { sdp->sd_glock_dqs_held++; + spin_unlock(&gl->gl_lockref.lock); might_sleep(); wait_on_bit(&sdp->sd_flags, SDF_WITHDRAW_RECOVERY, TASK_UNINTERRUPTIBLE); + spin_lock(&gl->gl_lockref.lock); } if (gh->gh_flags & GL_NOCACHE) handle_callback(gl, LM_ST_UNLOCKED, 0, false);
From: Takashi Iwai tiwai@suse.de
commit 9c1fe96bded935369f8340c2ac2e9e189f697d5d upstream.
snd_timer_notify1() calls the notification to each slave for a master event, but it passes a wrong event number. It should be +10 offset, corresponding to SNDRV_TIMER_EVENT_MXXX, but it's incorrectly with +100 offset. Casually this was spotted by UBSAN check via syzkaller.
Reported-by: syzbot+d102fa5b35335a7e544e@syzkaller.appspotmail.com Reviewed-by: Jaroslav Kysela perex@perex.cz Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/000000000000e5560e05c3bd1d63@google.com Link: https://lore.kernel.org/r/20210602113823.23777-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/core/timer.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/sound/core/timer.c +++ b/sound/core/timer.c @@ -520,9 +520,10 @@ static void snd_timer_notify1(struct snd return; if (timer->hw.flags & SNDRV_TIMER_HW_SLAVE) return; + event += 10; /* convert to SNDRV_TIMER_EVENT_MXXX */ list_for_each_entry(ts, &ti->slave_active_head, active_list) if (ts->ccallback) - ts->ccallback(ts, event + 100, &tstamp, resolution); + ts->ccallback(ts, event, &tstamp, resolution); }
/* start/continue a master timer */
From: Carlos M carlos.marr.pz@gmail.com
commit 901be145a46eb79879367d853194346a549e623d upstream.
For the HP Pavilion 15-CK0xx, with audio subsystem ID 0x103c:0x841c, adding a line in patch_realtek.c to apply the ALC269_FIXUP_HP_MUTE_LED_MIC3 fix activates the mute key LED.
Signed-off-by: Carlos M carlos.marr.pz@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210531202026.35427-1-carlos.marr.pz@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -8289,6 +8289,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x103c, 0x82bf, "HP G3 mini", ALC221_FIXUP_HP_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x103c, 0x82c0, "HP G3 mini premium", ALC221_FIXUP_HP_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x103c, 0x83b9, "HP Spectre x360", ALC269_FIXUP_HP_MUTE_LED_MIC3), + SND_PCI_QUIRK(0x103c, 0x841c, "HP Pavilion 15-CK0xx", ALC269_FIXUP_HP_MUTE_LED_MIC3), SND_PCI_QUIRK(0x103c, 0x8497, "HP Envy x360", ALC269_FIXUP_HP_MUTE_LED_MIC3), SND_PCI_QUIRK(0x103c, 0x84da, "HP OMEN dc0019-ur", ALC295_FIXUP_HP_OMEN), SND_PCI_QUIRK(0x103c, 0x84e7, "HP Pavilion 15", ALC269_FIXUP_HP_MUTE_LED_MIC3),
From: Hui Wang hui.wang@canonical.com
commit b8b90c17602689eeaa5b219d104bbc215d1225cc upstream.
The patch_realtek.c needs to check if the power_state.event equals PM_EVENT_SUSPEND, after using the direct-complete, the suspend() and resume() will be skipped if the codec is already rt_suspended, in this case, the patch_realtek.c will always get PM_EVENT_ON even the system is really resumed from S3.
We could set power_state to PMSG_SUSPEND in the prepare(), if other PM functions are called before complete(), those functions will override power_state; if no other PM functions are called before complete(), we could know the suspend() and resume() are skipped since only S3 pm functions could be skipped by direct-complete, in this case set power_state to PMSG_RESUME in the complete(). This could guarantee the first time of calling hda_codec_runtime_resume() after complete() has the correct power_state.
Fixes: 215a22ed31a1 ("ALSA: hda: Refactor codec PM to use direct-complete optimization") Cc: stable@vger.kernel.org Signed-off-by: Hui Wang hui.wang@canonical.com Link: https://lore.kernel.org/r/20210602145424.3132-1-hui.wang@canonical.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/hda_codec.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/sound/pci/hda/hda_codec.c +++ b/sound/pci/hda/hda_codec.c @@ -2973,6 +2973,7 @@ static int hda_codec_runtime_resume(stru #ifdef CONFIG_PM_SLEEP static int hda_codec_pm_prepare(struct device *dev) { + dev->power.power_state = PMSG_SUSPEND; return pm_runtime_suspended(dev); }
@@ -2980,6 +2981,10 @@ static void hda_codec_pm_complete(struct { struct hda_codec *codec = dev_to_hda_codec(dev);
+ /* If no other pm-functions are called between prepare() and complete() */ + if (dev->power.power_state.event == PM_EVENT_SUSPEND) + dev->power.power_state = PMSG_RESUME; + if (pm_runtime_suspended(dev) && (codec->jackpoll_interval || hda_codec_need_resume(codec) || codec->forced_resume)) pm_request_resume(dev);
From: Michal Vokáč michal.vokac@ysoft.com
commit 0e4a4a08cd78efcaddbc2e4c5ed86b5a5cb8a15e upstream.
The FEC does not have a PHY so it should not have a phy-handle. It is connected to the switch at RGMII level so we need a fixed-link sub-node on both ends.
This was not a problem until the qca8k.c driver was converted to PHYLINK by commit b3591c2a3661 ("net: dsa: qca8k: Switch to PHYLINK instead of PHYLIB"). That commit revealed the FEC configuration was not correct.
Fixes: 87489ec3a77f ("ARM: dts: imx: Add Y Soft IOTA Draco, Hydra and Ursa boards") Cc: stable@vger.kernel.org Signed-off-by: Michal Vokáč michal.vokac@ysoft.com Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/imx6dl-yapp4-common.dtsi | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/arch/arm/boot/dts/imx6dl-yapp4-common.dtsi +++ b/arch/arm/boot/dts/imx6dl-yapp4-common.dtsi @@ -105,9 +105,13 @@ phy-reset-gpios = <&gpio1 25 GPIO_ACTIVE_LOW>; phy-reset-duration = <20>; phy-supply = <&sw2_reg>; - phy-handle = <ðphy0>; status = "okay";
+ fixed-link { + speed = <1000>; + full-duplex; + }; + mdio { #address-cells = <1>; #size-cells = <0>;
From: Marek Vasut marex@denx.de
commit 8967b27a6c1c19251989c7ab33c058d16e4a5f53 upstream.
Per schematic, both PU and SOC regulator are supplied from LTC3676 SW1 via VDDSOC_IN rail, add the PU input. Both VDD1P1, VDD2P5 are supplied from LTC3676 SW2 via VDDHIGH_IN rail, add both inputs.
While no instability or problems are currently observed, the regulators should be fully described in DT and that description should fully match the hardware, else this might lead to unforseen issues later. Fix this.
Fixes: 52c7a088badd ("ARM: dts: imx6q: Add support for the DHCOM iMX6 SoM and PDK2") Reviewed-by: Fabio Estevam festevam@gmail.com Signed-off-by: Marek Vasut marex@denx.de Cc: Christoph Niedermaier cniedermaier@dh-electronics.com Cc: Fabio Estevam festevam@gmail.com Cc: Ludwig Zenz lzenz@dh-electronics.com Cc: NXP Linux Team linux-imx@nxp.com Cc: Shawn Guo shawnguo@kernel.org Cc: stable@vger.kernel.org Reviewed-by: Christoph Niedermaier cniedermaier@dh-electronics.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/imx6q-dhcom-som.dtsi | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/arch/arm/boot/dts/imx6q-dhcom-som.dtsi +++ b/arch/arm/boot/dts/imx6q-dhcom-som.dtsi @@ -406,6 +406,18 @@ vin-supply = <&sw1_reg>; };
+®_pu { + vin-supply = <&sw1_reg>; +}; + +®_vdd1p1 { + vin-supply = <&sw2_reg>; +}; + +®_vdd2p5 { + vin-supply = <&sw2_reg>; +}; + &uart1 { pinctrl-names = "default"; pinctrl-0 = <&pinctrl_uart1>;
From: Alexey Makhalov amakhalov@vmware.com
commit afd09b617db3786b6ef3dc43e28fe728cfea84df upstream.
Buffer head references must be released before calling kill_bdev(); otherwise the buffer head (and its page referenced by b_data) will not be freed by kill_bdev, and subsequently that bh will be leaked.
If blocksizes differ, sb_set_blocksize() will kill current buffers and page cache by using kill_bdev(). And then super block will be reread again but using correct blocksize this time. sb_set_blocksize() didn't fully free superblock page and buffer head, and being busy, they were not freed and instead leaked.
This can easily be reproduced by calling an infinite loop of:
systemctl start <ext4_on_lvm>.mount, and systemctl stop <ext4_on_lvm>.mount
... since systemd creates a cgroup for each slice which it mounts, and the bh leak get amplified by a dying memory cgroup that also never gets freed, and memory consumption is much more easily noticed.
Fixes: ce40733ce93d ("ext4: Check for return value from sb_set_blocksize") Fixes: ac27a0ec112a ("ext4: initial copy of files from ext3") Link: https://lore.kernel.org/r/20210521075533.95732-1-amakhalov@vmware.com Signed-off-by: Alexey Makhalov amakhalov@vmware.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/super.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -4451,14 +4451,20 @@ static int ext4_fill_super(struct super_ }
if (sb->s_blocksize != blocksize) { + /* + * bh must be released before kill_bdev(), otherwise + * it won't be freed and its page also. kill_bdev() + * is called by sb_set_blocksize(). + */ + brelse(bh); /* Validate the filesystem blocksize */ if (!sb_set_blocksize(sb, blocksize)) { ext4_msg(sb, KERN_ERR, "bad block size %d", blocksize); + bh = NULL; goto failed_mount; }
- brelse(bh); logical_sb_block = sb_block * EXT4_MIN_BLOCK_SIZE; offset = do_div(logical_sb_block, blocksize); bh = ext4_sb_bread_unmovable(sb, logical_sb_block); @@ -5181,8 +5187,9 @@ failed_mount: kfree(get_qf_name(sb, sbi, i)); #endif fscrypt_free_dummy_policy(&sbi->s_dummy_enc_policy); - ext4_blkdev_remove(sbi); + /* ext4_blkdev_remove() calls kill_bdev(), release bh before it. */ brelse(bh); + ext4_blkdev_remove(sbi); out_fail: sb->s_fs_info = NULL; kfree(sbi->s_blockgroup_lock);
From: Ye Bin yebin10@huawei.com
commit 082cd4ec240b8734a82a89ffb890216ac98fec68 upstream.
We got follow bug_on when run fsstress with injecting IO fault: [130747.323114] kernel BUG at fs/ext4/extents_status.c:762! [130747.323117] Internal error: Oops - BUG: 0 [#1] SMP ...... [130747.334329] Call trace: [130747.334553] ext4_es_cache_extent+0x150/0x168 [ext4] [130747.334975] ext4_cache_extents+0x64/0xe8 [ext4] [130747.335368] ext4_find_extent+0x300/0x330 [ext4] [130747.335759] ext4_ext_map_blocks+0x74/0x1178 [ext4] [130747.336179] ext4_map_blocks+0x2f4/0x5f0 [ext4] [130747.336567] ext4_mpage_readpages+0x4a8/0x7a8 [ext4] [130747.336995] ext4_readpage+0x54/0x100 [ext4] [130747.337359] generic_file_buffered_read+0x410/0xae8 [130747.337767] generic_file_read_iter+0x114/0x190 [130747.338152] ext4_file_read_iter+0x5c/0x140 [ext4] [130747.338556] __vfs_read+0x11c/0x188 [130747.338851] vfs_read+0x94/0x150 [130747.339110] ksys_read+0x74/0xf0
This patch's modification is according to Jan Kara's suggestion in: https://patchwork.ozlabs.org/project/linux-ext4/patch/20210428085158.3728201... "I see. Now I understand your patch. Honestly, seeing how fragile is trying to fix extent tree after split has failed in the middle, I would probably go even further and make sure we fix the tree properly in case of ENOSPC and EDQUOT (those are easily user triggerable). Anything else indicates a HW problem or fs corruption so I'd rather leave the extent tree as is and don't try to fix it (which also means we will not create overlapping extents)."
Cc: stable@kernel.org Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20210506141042.3298679-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/extents.c | 43 +++++++++++++++++++++++-------------------- 1 file changed, 23 insertions(+), 20 deletions(-)
--- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3206,7 +3206,10 @@ static int ext4_split_extent_at(handle_t ext4_ext_mark_unwritten(ex2);
err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags); - if (err == -ENOSPC && (EXT4_EXT_MAY_ZEROOUT & split_flag)) { + if (err != -ENOSPC && err != -EDQUOT) + goto out; + + if (EXT4_EXT_MAY_ZEROOUT & split_flag) { if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) { if (split_flag & EXT4_EXT_DATA_VALID1) { err = ext4_ext_zeroout(inode, ex2); @@ -3232,25 +3235,22 @@ static int ext4_split_extent_at(handle_t ext4_ext_pblock(&orig_ex)); }
- if (err) - goto fix_extent_len; - /* update the extent length and mark as initialized */ - ex->ee_len = cpu_to_le16(ee_len); - ext4_ext_try_to_merge(handle, inode, path, ex); - err = ext4_ext_dirty(handle, inode, path + path->p_depth); - if (err) - goto fix_extent_len; - - /* update extent status tree */ - err = ext4_zeroout_es(inode, &zero_ex); - - goto out; - } else if (err) - goto fix_extent_len; - -out: - ext4_ext_show_leaf(inode, path); - return err; + if (!err) { + /* update the extent length and mark as initialized */ + ex->ee_len = cpu_to_le16(ee_len); + ext4_ext_try_to_merge(handle, inode, path, ex); + err = ext4_ext_dirty(handle, inode, path + path->p_depth); + if (!err) + /* update extent status tree */ + err = ext4_zeroout_es(inode, &zero_ex); + /* If we failed at this point, we don't know in which + * state the extent tree exactly is so don't try to fix + * length of the original extent as it may do even more + * damage. + */ + goto out; + } + }
fix_extent_len: ex->ee_len = orig_ex.ee_len; @@ -3260,6 +3260,9 @@ fix_extent_len: */ ext4_ext_dirty(handle, inode, path + path->p_depth); return err; +out: + ext4_ext_show_leaf(inode, path); + return err; }
/*
From: Harshad Shirwadkar harshadshirwadkar@gmail.com
commit a7ba36bc94f20b6c77f16364b9a23f582ea8faac upstream.
Fast commit recovery data on disk may not be aligned. So, when the recovery code reads it, this patch makes sure that fast commit info found on-disk is first memcpy-ed into an aligned variable before accessing it. As a consequence of it, we also remove some macros that could resulted in unaligned accesses.
Cc: stable@kernel.org Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Signed-off-by: Harshad Shirwadkar harshadshirwadkar@gmail.com Link: https://lore.kernel.org/r/20210519215920.2037527-1-harshads@google.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/fast_commit.c | 182 ++++++++++++++++++++++++-------------------------- fs/ext4/fast_commit.h | 7 - 2 files changed, 90 insertions(+), 99 deletions(-)
--- a/fs/ext4/fast_commit.c +++ b/fs/ext4/fast_commit.c @@ -1227,18 +1227,6 @@ static void ext4_fc_cleanup(journal_t *j
/* Ext4 Replay Path Routines */
-/* Get length of a particular tlv */ -static inline int ext4_fc_tag_len(struct ext4_fc_tl *tl) -{ - return le16_to_cpu(tl->fc_len); -} - -/* Get a pointer to "value" of a tlv */ -static inline u8 *ext4_fc_tag_val(struct ext4_fc_tl *tl) -{ - return (u8 *)tl + sizeof(*tl); -} - /* Helper struct for dentry replay routines */ struct dentry_info_args { int parent_ino, dname_len, ino, inode_len; @@ -1246,28 +1234,29 @@ struct dentry_info_args { };
static inline void tl_to_darg(struct dentry_info_args *darg, - struct ext4_fc_tl *tl) + struct ext4_fc_tl *tl, u8 *val) { - struct ext4_fc_dentry_info *fcd; + struct ext4_fc_dentry_info fcd;
- fcd = (struct ext4_fc_dentry_info *)ext4_fc_tag_val(tl); + memcpy(&fcd, val, sizeof(fcd));
- darg->parent_ino = le32_to_cpu(fcd->fc_parent_ino); - darg->ino = le32_to_cpu(fcd->fc_ino); - darg->dname = fcd->fc_dname; - darg->dname_len = ext4_fc_tag_len(tl) - - sizeof(struct ext4_fc_dentry_info); + darg->parent_ino = le32_to_cpu(fcd.fc_parent_ino); + darg->ino = le32_to_cpu(fcd.fc_ino); + darg->dname = val + offsetof(struct ext4_fc_dentry_info, fc_dname); + darg->dname_len = le16_to_cpu(tl->fc_len) - + sizeof(struct ext4_fc_dentry_info); }
/* Unlink replay function */ -static int ext4_fc_replay_unlink(struct super_block *sb, struct ext4_fc_tl *tl) +static int ext4_fc_replay_unlink(struct super_block *sb, struct ext4_fc_tl *tl, + u8 *val) { struct inode *inode, *old_parent; struct qstr entry; struct dentry_info_args darg; int ret = 0;
- tl_to_darg(&darg, tl); + tl_to_darg(&darg, tl, val);
trace_ext4_fc_replay(sb, EXT4_FC_TAG_UNLINK, darg.ino, darg.parent_ino, darg.dname_len); @@ -1357,13 +1346,14 @@ out: }
/* Link replay function */ -static int ext4_fc_replay_link(struct super_block *sb, struct ext4_fc_tl *tl) +static int ext4_fc_replay_link(struct super_block *sb, struct ext4_fc_tl *tl, + u8 *val) { struct inode *inode; struct dentry_info_args darg; int ret = 0;
- tl_to_darg(&darg, tl); + tl_to_darg(&darg, tl, val); trace_ext4_fc_replay(sb, EXT4_FC_TAG_LINK, darg.ino, darg.parent_ino, darg.dname_len);
@@ -1408,9 +1398,10 @@ static int ext4_fc_record_modified_inode /* * Inode replay function */ -static int ext4_fc_replay_inode(struct super_block *sb, struct ext4_fc_tl *tl) +static int ext4_fc_replay_inode(struct super_block *sb, struct ext4_fc_tl *tl, + u8 *val) { - struct ext4_fc_inode *fc_inode; + struct ext4_fc_inode fc_inode; struct ext4_inode *raw_inode; struct ext4_inode *raw_fc_inode; struct inode *inode = NULL; @@ -1418,9 +1409,9 @@ static int ext4_fc_replay_inode(struct s int inode_len, ino, ret, tag = le16_to_cpu(tl->fc_tag); struct ext4_extent_header *eh;
- fc_inode = (struct ext4_fc_inode *)ext4_fc_tag_val(tl); + memcpy(&fc_inode, val, sizeof(fc_inode));
- ino = le32_to_cpu(fc_inode->fc_ino); + ino = le32_to_cpu(fc_inode.fc_ino); trace_ext4_fc_replay(sb, tag, ino, 0, 0);
inode = ext4_iget(sb, ino, EXT4_IGET_NORMAL); @@ -1432,12 +1423,13 @@ static int ext4_fc_replay_inode(struct s
ext4_fc_record_modified_inode(sb, ino);
- raw_fc_inode = (struct ext4_inode *)fc_inode->fc_raw_inode; + raw_fc_inode = (struct ext4_inode *) + (val + offsetof(struct ext4_fc_inode, fc_raw_inode)); ret = ext4_get_fc_inode_loc(sb, ino, &iloc); if (ret) goto out;
- inode_len = ext4_fc_tag_len(tl) - sizeof(struct ext4_fc_inode); + inode_len = le16_to_cpu(tl->fc_len) - sizeof(struct ext4_fc_inode); raw_inode = ext4_raw_inode(&iloc);
memcpy(raw_inode, raw_fc_inode, offsetof(struct ext4_inode, i_block)); @@ -1505,14 +1497,15 @@ out: * inode for which we are trying to create a dentry here, should already have * been replayed before we start here. */ -static int ext4_fc_replay_create(struct super_block *sb, struct ext4_fc_tl *tl) +static int ext4_fc_replay_create(struct super_block *sb, struct ext4_fc_tl *tl, + u8 *val) { int ret = 0; struct inode *inode = NULL; struct inode *dir = NULL; struct dentry_info_args darg;
- tl_to_darg(&darg, tl); + tl_to_darg(&darg, tl, val);
trace_ext4_fc_replay(sb, EXT4_FC_TAG_CREAT, darg.ino, darg.parent_ino, darg.dname_len); @@ -1591,9 +1584,9 @@ static int ext4_fc_record_regions(struct
/* Replay add range tag */ static int ext4_fc_replay_add_range(struct super_block *sb, - struct ext4_fc_tl *tl) + struct ext4_fc_tl *tl, u8 *val) { - struct ext4_fc_add_range *fc_add_ex; + struct ext4_fc_add_range fc_add_ex; struct ext4_extent newex, *ex; struct inode *inode; ext4_lblk_t start, cur; @@ -1603,15 +1596,14 @@ static int ext4_fc_replay_add_range(stru struct ext4_ext_path *path = NULL; int ret;
- fc_add_ex = (struct ext4_fc_add_range *)ext4_fc_tag_val(tl); - ex = (struct ext4_extent *)&fc_add_ex->fc_ex; + memcpy(&fc_add_ex, val, sizeof(fc_add_ex)); + ex = (struct ext4_extent *)&fc_add_ex.fc_ex;
trace_ext4_fc_replay(sb, EXT4_FC_TAG_ADD_RANGE, - le32_to_cpu(fc_add_ex->fc_ino), le32_to_cpu(ex->ee_block), + le32_to_cpu(fc_add_ex.fc_ino), le32_to_cpu(ex->ee_block), ext4_ext_get_actual_len(ex));
- inode = ext4_iget(sb, le32_to_cpu(fc_add_ex->fc_ino), - EXT4_IGET_NORMAL); + inode = ext4_iget(sb, le32_to_cpu(fc_add_ex.fc_ino), EXT4_IGET_NORMAL); if (IS_ERR(inode)) { jbd_debug(1, "Inode not found."); return 0; @@ -1720,32 +1712,33 @@ next:
/* Replay DEL_RANGE tag */ static int -ext4_fc_replay_del_range(struct super_block *sb, struct ext4_fc_tl *tl) +ext4_fc_replay_del_range(struct super_block *sb, struct ext4_fc_tl *tl, + u8 *val) { struct inode *inode; - struct ext4_fc_del_range *lrange; + struct ext4_fc_del_range lrange; struct ext4_map_blocks map; ext4_lblk_t cur, remaining; int ret;
- lrange = (struct ext4_fc_del_range *)ext4_fc_tag_val(tl); - cur = le32_to_cpu(lrange->fc_lblk); - remaining = le32_to_cpu(lrange->fc_len); + memcpy(&lrange, val, sizeof(lrange)); + cur = le32_to_cpu(lrange.fc_lblk); + remaining = le32_to_cpu(lrange.fc_len);
trace_ext4_fc_replay(sb, EXT4_FC_TAG_DEL_RANGE, - le32_to_cpu(lrange->fc_ino), cur, remaining); + le32_to_cpu(lrange.fc_ino), cur, remaining);
- inode = ext4_iget(sb, le32_to_cpu(lrange->fc_ino), EXT4_IGET_NORMAL); + inode = ext4_iget(sb, le32_to_cpu(lrange.fc_ino), EXT4_IGET_NORMAL); if (IS_ERR(inode)) { - jbd_debug(1, "Inode %d not found", le32_to_cpu(lrange->fc_ino)); + jbd_debug(1, "Inode %d not found", le32_to_cpu(lrange.fc_ino)); return 0; }
ret = ext4_fc_record_modified_inode(sb, inode->i_ino);
jbd_debug(1, "DEL_RANGE, inode %ld, lblk %d, len %d\n", - inode->i_ino, le32_to_cpu(lrange->fc_lblk), - le32_to_cpu(lrange->fc_len)); + inode->i_ino, le32_to_cpu(lrange.fc_lblk), + le32_to_cpu(lrange.fc_len)); while (remaining > 0) { map.m_lblk = cur; map.m_len = remaining; @@ -1766,8 +1759,8 @@ ext4_fc_replay_del_range(struct super_bl }
ret = ext4_punch_hole(inode, - le32_to_cpu(lrange->fc_lblk) << sb->s_blocksize_bits, - le32_to_cpu(lrange->fc_len) << sb->s_blocksize_bits); + le32_to_cpu(lrange.fc_lblk) << sb->s_blocksize_bits, + le32_to_cpu(lrange.fc_len) << sb->s_blocksize_bits); if (ret) jbd_debug(1, "ext4_punch_hole returned %d", ret); ext4_ext_replay_shrink_inode(inode, @@ -1909,11 +1902,11 @@ static int ext4_fc_replay_scan(journal_t struct ext4_sb_info *sbi = EXT4_SB(sb); struct ext4_fc_replay_state *state; int ret = JBD2_FC_REPLAY_CONTINUE; - struct ext4_fc_add_range *ext; - struct ext4_fc_tl *tl; - struct ext4_fc_tail *tail; - __u8 *start, *end; - struct ext4_fc_head *head; + struct ext4_fc_add_range ext; + struct ext4_fc_tl tl; + struct ext4_fc_tail tail; + __u8 *start, *end, *cur, *val; + struct ext4_fc_head head; struct ext4_extent *ex;
state = &sbi->s_fc_replay_state; @@ -1940,15 +1933,17 @@ static int ext4_fc_replay_scan(journal_t }
state->fc_replay_expected_off++; - fc_for_each_tl(start, end, tl) { + for (cur = start; cur < end; cur = cur + sizeof(tl) + le16_to_cpu(tl.fc_len)) { + memcpy(&tl, cur, sizeof(tl)); + val = cur + sizeof(tl); jbd_debug(3, "Scan phase, tag:%s, blk %lld\n", - tag2str(le16_to_cpu(tl->fc_tag)), bh->b_blocknr); - switch (le16_to_cpu(tl->fc_tag)) { + tag2str(le16_to_cpu(tl.fc_tag)), bh->b_blocknr); + switch (le16_to_cpu(tl.fc_tag)) { case EXT4_FC_TAG_ADD_RANGE: - ext = (struct ext4_fc_add_range *)ext4_fc_tag_val(tl); - ex = (struct ext4_extent *)&ext->fc_ex; + memcpy(&ext, val, sizeof(ext)); + ex = (struct ext4_extent *)&ext.fc_ex; ret = ext4_fc_record_regions(sb, - le32_to_cpu(ext->fc_ino), + le32_to_cpu(ext.fc_ino), le32_to_cpu(ex->ee_block), ext4_ext_pblock(ex), ext4_ext_get_actual_len(ex)); if (ret < 0) @@ -1962,18 +1957,18 @@ static int ext4_fc_replay_scan(journal_t case EXT4_FC_TAG_INODE: case EXT4_FC_TAG_PAD: state->fc_cur_tag++; - state->fc_crc = ext4_chksum(sbi, state->fc_crc, tl, - sizeof(*tl) + ext4_fc_tag_len(tl)); + state->fc_crc = ext4_chksum(sbi, state->fc_crc, cur, + sizeof(tl) + le16_to_cpu(tl.fc_len)); break; case EXT4_FC_TAG_TAIL: state->fc_cur_tag++; - tail = (struct ext4_fc_tail *)ext4_fc_tag_val(tl); - state->fc_crc = ext4_chksum(sbi, state->fc_crc, tl, - sizeof(*tl) + + memcpy(&tail, val, sizeof(tail)); + state->fc_crc = ext4_chksum(sbi, state->fc_crc, cur, + sizeof(tl) + offsetof(struct ext4_fc_tail, fc_crc)); - if (le32_to_cpu(tail->fc_tid) == expected_tid && - le32_to_cpu(tail->fc_crc) == state->fc_crc) { + if (le32_to_cpu(tail.fc_tid) == expected_tid && + le32_to_cpu(tail.fc_crc) == state->fc_crc) { state->fc_replay_num_tags = state->fc_cur_tag; state->fc_regions_valid = state->fc_regions_used; @@ -1984,19 +1979,19 @@ static int ext4_fc_replay_scan(journal_t state->fc_crc = 0; break; case EXT4_FC_TAG_HEAD: - head = (struct ext4_fc_head *)ext4_fc_tag_val(tl); - if (le32_to_cpu(head->fc_features) & + memcpy(&head, val, sizeof(head)); + if (le32_to_cpu(head.fc_features) & ~EXT4_FC_SUPPORTED_FEATURES) { ret = -EOPNOTSUPP; break; } - if (le32_to_cpu(head->fc_tid) != expected_tid) { + if (le32_to_cpu(head.fc_tid) != expected_tid) { ret = JBD2_FC_REPLAY_STOP; break; } state->fc_cur_tag++; - state->fc_crc = ext4_chksum(sbi, state->fc_crc, tl, - sizeof(*tl) + ext4_fc_tag_len(tl)); + state->fc_crc = ext4_chksum(sbi, state->fc_crc, cur, + sizeof(tl) + le16_to_cpu(tl.fc_len)); break; default: ret = state->fc_replay_num_tags ? @@ -2020,11 +2015,11 @@ static int ext4_fc_replay(journal_t *jou { struct super_block *sb = journal->j_private; struct ext4_sb_info *sbi = EXT4_SB(sb); - struct ext4_fc_tl *tl; - __u8 *start, *end; + struct ext4_fc_tl tl; + __u8 *start, *end, *cur, *val; int ret = JBD2_FC_REPLAY_CONTINUE; struct ext4_fc_replay_state *state = &sbi->s_fc_replay_state; - struct ext4_fc_tail *tail; + struct ext4_fc_tail tail;
if (pass == PASS_SCAN) { state->fc_current_pass = PASS_SCAN; @@ -2051,49 +2046,52 @@ static int ext4_fc_replay(journal_t *jou start = (u8 *)bh->b_data; end = (__u8 *)bh->b_data + journal->j_blocksize - 1;
- fc_for_each_tl(start, end, tl) { + for (cur = start; cur < end; cur = cur + sizeof(tl) + le16_to_cpu(tl.fc_len)) { + memcpy(&tl, cur, sizeof(tl)); + val = cur + sizeof(tl); + if (state->fc_replay_num_tags == 0) { ret = JBD2_FC_REPLAY_STOP; ext4_fc_set_bitmaps_and_counters(sb); break; } jbd_debug(3, "Replay phase, tag:%s\n", - tag2str(le16_to_cpu(tl->fc_tag))); + tag2str(le16_to_cpu(tl.fc_tag))); state->fc_replay_num_tags--; - switch (le16_to_cpu(tl->fc_tag)) { + switch (le16_to_cpu(tl.fc_tag)) { case EXT4_FC_TAG_LINK: - ret = ext4_fc_replay_link(sb, tl); + ret = ext4_fc_replay_link(sb, &tl, val); break; case EXT4_FC_TAG_UNLINK: - ret = ext4_fc_replay_unlink(sb, tl); + ret = ext4_fc_replay_unlink(sb, &tl, val); break; case EXT4_FC_TAG_ADD_RANGE: - ret = ext4_fc_replay_add_range(sb, tl); + ret = ext4_fc_replay_add_range(sb, &tl, val); break; case EXT4_FC_TAG_CREAT: - ret = ext4_fc_replay_create(sb, tl); + ret = ext4_fc_replay_create(sb, &tl, val); break; case EXT4_FC_TAG_DEL_RANGE: - ret = ext4_fc_replay_del_range(sb, tl); + ret = ext4_fc_replay_del_range(sb, &tl, val); break; case EXT4_FC_TAG_INODE: - ret = ext4_fc_replay_inode(sb, tl); + ret = ext4_fc_replay_inode(sb, &tl, val); break; case EXT4_FC_TAG_PAD: trace_ext4_fc_replay(sb, EXT4_FC_TAG_PAD, 0, - ext4_fc_tag_len(tl), 0); + le16_to_cpu(tl.fc_len), 0); break; case EXT4_FC_TAG_TAIL: trace_ext4_fc_replay(sb, EXT4_FC_TAG_TAIL, 0, - ext4_fc_tag_len(tl), 0); - tail = (struct ext4_fc_tail *)ext4_fc_tag_val(tl); - WARN_ON(le32_to_cpu(tail->fc_tid) != expected_tid); + le16_to_cpu(tl.fc_len), 0); + memcpy(&tail, val, sizeof(tail)); + WARN_ON(le32_to_cpu(tail.fc_tid) != expected_tid); break; case EXT4_FC_TAG_HEAD: break; default: - trace_ext4_fc_replay(sb, le16_to_cpu(tl->fc_tag), 0, - ext4_fc_tag_len(tl), 0); + trace_ext4_fc_replay(sb, le16_to_cpu(tl.fc_tag), 0, + le16_to_cpu(tl.fc_len), 0); ret = -ECANCELED; break; } --- a/fs/ext4/fast_commit.h +++ b/fs/ext4/fast_commit.h @@ -146,12 +146,5 @@ struct ext4_fc_replay_state {
#define region_last(__region) (((__region)->lblk) + ((__region)->len) - 1)
-#define fc_for_each_tl(__start, __end, __tl) \ - for (tl = (struct ext4_fc_tl *)start; \ - (u8 *)tl < (u8 *)end; \ - tl = (struct ext4_fc_tl *)((u8 *)tl + \ - sizeof(struct ext4_fc_tl) + \ - + le16_to_cpu(tl->fc_len))) -
#endif /* __FAST_COMMIT_H__ */
From: Phillip Potter phil@philpotter.co.uk
commit a8867f4e3809050571c98de7a2d465aff5e4daf5 upstream.
Fix a memory leak discovered by syzbot when a file system is corrupted with an illegally large s_log_groups_per_flex.
Reported-by: syzbot+aa12d6106ea4ca1b6aae@syzkaller.appspotmail.com Signed-off-by: Phillip Potter phil@philpotter.co.uk Cc: stable@kernel.org Link: https://lore.kernel.org/r/20210412073837.1686-1-phil@philpotter.co.uk Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/mballoc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2738,7 +2738,7 @@ static int ext4_mb_init_backend(struct s */ if (sbi->s_es->s_log_groups_per_flex >= 32) { ext4_msg(sb, KERN_ERR, "too many log groups per flexible block group"); - goto err_freesgi; + goto err_freebuddy; } sbi->s_mb_prefetch = min_t(uint, 1 << sbi->s_es->s_log_groups_per_flex, BLK_MAX_SEGMENT_SIZE >> (sb->s_blocksize_bits - 9));
From: Ritesh Harjani riteshh@linux.ibm.com
commit b45f189a19b38e01676628db79cd3eeb1333516e upstream.
When running generic/527 with fast_commit configuration, the following issue is seen on Power. With fast_commit, during ext4_fc_replay() (which can be called from ext4_fill_super()), if inode eviction happens then it can access an uninitialized percpu counter variable.
This patch adds the check before accessing the counters in ext4_free_inode() path.
[ 321.165371] run fstests generic/527 at 2021-04-29 08:38:43 [ 323.027786] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: block_validity. Quota mode: none. [ 323.618772] BUG: Unable to handle kernel data access on read at 0x1fbd80000 [ 323.619767] Faulting instruction address: 0xc000000000bae78c cpu 0x1: Vector: 300 (Data Access) at [c000000010706ef0] pc: c000000000bae78c: percpu_counter_add_batch+0x3c/0x100 lr: c0000000006d0bb0: ext4_free_inode+0x780/0xb90 pid = 5593, comm = mount ext4_free_inode+0x780/0xb90 ext4_evict_inode+0xa8c/0xc60 evict+0xfc/0x1e0 ext4_fc_replay+0xc50/0x20f0 do_one_pass+0xfe0/0x1350 jbd2_journal_recover+0x184/0x2e0 jbd2_journal_load+0x1c0/0x4a0 ext4_fill_super+0x2458/0x4200 mount_bdev+0x1dc/0x290 ext4_mount+0x28/0x40 legacy_get_tree+0x4c/0xa0 vfs_get_tree+0x4c/0x120 path_mount+0xcf8/0xd70 do_mount+0x80/0xd0 sys_mount+0x3fc/0x490 system_call_exception+0x384/0x3d0 system_call_common+0xec/0x278
Cc: stable@kernel.org Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Signed-off-by: Ritesh Harjani riteshh@linux.ibm.com Reviewed-by: Harshad Shirwadkar harshadshirwadkar@gmail.com Link: https://lore.kernel.org/r/6cceb9a75c54bef8fa9696c1b08c8df5ff6169e2.161969241... Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/ialloc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -322,14 +322,16 @@ void ext4_free_inode(handle_t *handle, s if (is_directory) { count = ext4_used_dirs_count(sb, gdp) - 1; ext4_used_dirs_set(sb, gdp, count); - percpu_counter_dec(&sbi->s_dirs_counter); + if (percpu_counter_initialized(&sbi->s_dirs_counter)) + percpu_counter_dec(&sbi->s_dirs_counter); } ext4_inode_bitmap_csum_set(sb, block_group, gdp, bitmap_bh, EXT4_INODES_PER_GROUP(sb) / 8); ext4_group_desc_csum_set(sb, block_group, gdp); ext4_unlock_group(sb, block_group);
- percpu_counter_inc(&sbi->s_freeinodes_counter); + if (percpu_counter_initialized(&sbi->s_freeinodes_counter)) + percpu_counter_inc(&sbi->s_freeinodes_counter); if (sbi->s_log_groups_per_flex) { struct flex_groups *fg;
From: Phil Elwell phil@raspberrypi.com
In branches to which 24d209dba5a3 ("usb: dwc2: Fix hibernation between host and device modes.") has been back-ported, the bus_suspended member of struct dwc2_hsotg is only present in builds that support host-mode. To avoid having to pull in several more non-Fix commits in order to get it to compile, wrap the usage of the member in a macro conditional.
Fixes: 24d209dba5a3 ("usb: dwc2: Fix hibernation between host and device modes.") Signed-off-by: Phil Elwell phil@raspberrypi.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/dwc2/core_intr.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/usb/dwc2/core_intr.c +++ b/drivers/usb/dwc2/core_intr.c @@ -707,7 +707,11 @@ static inline void dwc_handle_gpwrdn_dis dwc2_writel(hsotg, gpwrdn_tmp, GPWRDN);
hsotg->hibernated = 0; + +#if IS_ENABLED(CONFIG_USB_DWC2_HOST) || \ + IS_ENABLED(CONFIG_USB_DWC2_DUAL_ROLE) hsotg->bus_suspended = 0; +#endif
if (gpwrdn & GPWRDN_IDSTS) { hsotg->op_state = OTG_STATE_B_PERIPHERAL;
From: Mark Rutland mark.rutland@arm.com
commit 0711f0d7050b9e07c44bc159bbc64ac0a1022c7f upstream.
During boot, kernel_init_freeable() initializes `cad_pid` to the init task's struct pid. Later on, we may change `cad_pid` via a sysctl, and when this happens proc_do_cad_pid() will increment the refcount on the new pid via get_pid(), and will decrement the refcount on the old pid via put_pid(). As we never called get_pid() when we initialized `cad_pid`, we decrement a reference we never incremented, can therefore free the init task's struct pid early. As there can be dangling references to the struct pid, we can later encounter a use-after-free (e.g. when delivering signals).
This was spotted when fuzzing v5.13-rc3 with Syzkaller, but seems to have been around since the conversion of `cad_pid` to struct pid in commit 9ec52099e4b8 ("[PATCH] replace cad_pid by a struct pid") from the pre-KASAN stone age of v2.6.19.
Fix this by getting a reference to the init task's struct pid when we assign it to `cad_pid`.
Full KASAN splat below.
================================================================== BUG: KASAN: use-after-free in ns_of_pid include/linux/pid.h:153 [inline] BUG: KASAN: use-after-free in task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509 Read of size 4 at addr ffff23794dda0004 by task syz-executor.0/273
CPU: 1 PID: 273 Comm: syz-executor.0 Not tainted 5.12.0-00001-g9aef892b2d15 #1 Hardware name: linux,dummy-virt (DT) Call trace: ns_of_pid include/linux/pid.h:153 [inline] task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509 do_notify_parent+0x308/0xe60 kernel/signal.c:1950 exit_notify kernel/exit.c:682 [inline] do_exit+0x2334/0x2bd0 kernel/exit.c:845 do_group_exit+0x108/0x2c8 kernel/exit.c:922 get_signal+0x4e4/0x2a88 kernel/signal.c:2781 do_signal arch/arm64/kernel/signal.c:882 [inline] do_notify_resume+0x300/0x970 arch/arm64/kernel/signal.c:936 work_pending+0xc/0x2dc
Allocated by task 0: slab_post_alloc_hook+0x50/0x5c0 mm/slab.h:516 slab_alloc_node mm/slub.c:2907 [inline] slab_alloc mm/slub.c:2915 [inline] kmem_cache_alloc+0x1f4/0x4c0 mm/slub.c:2920 alloc_pid+0xdc/0xc00 kernel/pid.c:180 copy_process+0x2794/0x5e18 kernel/fork.c:2129 kernel_clone+0x194/0x13c8 kernel/fork.c:2500 kernel_thread+0xd4/0x110 kernel/fork.c:2552 rest_init+0x44/0x4a0 init/main.c:687 arch_call_rest_init+0x1c/0x28 start_kernel+0x520/0x554 init/main.c:1064 0x0
Freed by task 270: slab_free_hook mm/slub.c:1562 [inline] slab_free_freelist_hook+0x98/0x260 mm/slub.c:1600 slab_free mm/slub.c:3161 [inline] kmem_cache_free+0x224/0x8e0 mm/slub.c:3177 put_pid.part.4+0xe0/0x1a8 kernel/pid.c:114 put_pid+0x30/0x48 kernel/pid.c:109 proc_do_cad_pid+0x190/0x1b0 kernel/sysctl.c:1401 proc_sys_call_handler+0x338/0x4b0 fs/proc/proc_sysctl.c:591 proc_sys_write+0x34/0x48 fs/proc/proc_sysctl.c:617 call_write_iter include/linux/fs.h:1977 [inline] new_sync_write+0x3ac/0x510 fs/read_write.c:518 vfs_write fs/read_write.c:605 [inline] vfs_write+0x9c4/0x1018 fs/read_write.c:585 ksys_write+0x124/0x240 fs/read_write.c:658 __do_sys_write fs/read_write.c:670 [inline] __se_sys_write fs/read_write.c:667 [inline] __arm64_sys_write+0x78/0xb0 fs/read_write.c:667 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline] invoke_syscall arch/arm64/kernel/syscall.c:49 [inline] el0_svc_common.constprop.1+0x16c/0x388 arch/arm64/kernel/syscall.c:129 do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:168 el0_svc+0x28/0x38 arch/arm64/kernel/entry-common.c:416 el0_sync_handler+0x134/0x180 arch/arm64/kernel/entry-common.c:432 el0_sync+0x154/0x180 arch/arm64/kernel/entry.S:701
The buggy address belongs to the object at ffff23794dda0000 which belongs to the cache pid of size 224 The buggy address is located 4 bytes inside of 224-byte region [ffff23794dda0000, ffff23794dda00e0) The buggy address belongs to the page: page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dda0 head:(____ptrval____) order:1 compound_mapcount:0 flags: 0x3fffc0000010200(slab|head) raw: 03fffc0000010200 dead000000000100 dead000000000122 ffff23794d40d080 raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff23794dd9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff23794dd9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff23794dda0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff23794dda0080: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff23794dda0100: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00 ==================================================================
Link: https://lkml.kernel.org/r/20210524172230.38715-1-mark.rutland@arm.com Fixes: 9ec52099e4b8678a ("[PATCH] replace cad_pid by a struct pid") Signed-off-by: Mark Rutland mark.rutland@arm.com Acked-by: Christian Brauner christian.brauner@ubuntu.com Cc: Cedric Le Goater clg@fr.ibm.com Cc: Christian Brauner christian@brauner.io Cc: Eric W. Biederman ebiederm@xmission.com Cc: Kees Cook <keescook@chromium.org Cc: Martin Schwidefsky schwidefsky@de.ibm.com Cc: Paul Mackerras paulus@samba.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- init/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/init/main.c +++ b/init/main.c @@ -1505,7 +1505,7 @@ static noinline void __init kernel_init_ */ set_mems_allowed(node_states[N_MEMORY]);
- cad_pid = task_pid(current); + cad_pid = get_pid(task_pid(current));
smp_prepare_cpus(setup_max_cpus);
From: Junxiao Bi junxiao.bi@oracle.com
commit 6bba4471f0cc1296fe3c2089b9e52442d3074b2e upstream.
When fallocate punches holes out of inode size, if original isize is in the middle of last cluster, then the part from isize to the end of the cluster will be zeroed with buffer write, at that time isize is not yet updated to match the new size, if writeback is kicked in, it will invoke ocfs2_writepage()->block_write_full_page() where the pages out of inode size will be dropped. That will cause file corruption. Fix this by zero out eof blocks when extending the inode size.
Running the following command with qemu-image 4.2.1 can get a corrupted coverted image file easily.
qemu-img convert -p -t none -T none -f qcow2 $qcow_image \ -O qcow2 -o compat=1.1 $qcow_image.conv
The usage of fallocate in qemu is like this, it first punches holes out of inode size, then extend the inode size.
fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0 fallocate(11, 0, 2276196352, 65536) = 0
v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html v2: https://lore.kernel.org/linux-fsdevel/20210525093034.GB4112@quack2.suse.cz/T...
Link: https://lkml.kernel.org/r/20210528210648.9124-1-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi junxiao.bi@oracle.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Cc: Jan Kara jack@suse.cz Cc: Mark Fasheh mark@fasheh.com Cc: Joel Becker jlbec@evilplan.org Cc: Changwei Ge gechangwei@live.cn Cc: Gang He ghe@suse.com Cc: Jun Piao piaojun@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ocfs2/file.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 50 insertions(+), 5 deletions(-)
--- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1856,6 +1856,45 @@ out: }
/* + * zero out partial blocks of one cluster. + * + * start: file offset where zero starts, will be made upper block aligned. + * len: it will be trimmed to the end of current cluster if "start + len" + * is bigger than it. + */ +static int ocfs2_zeroout_partial_cluster(struct inode *inode, + u64 start, u64 len) +{ + int ret; + u64 start_block, end_block, nr_blocks; + u64 p_block, offset; + u32 cluster, p_cluster, nr_clusters; + struct super_block *sb = inode->i_sb; + u64 end = ocfs2_align_bytes_to_clusters(sb, start); + + if (start + len < end) + end = start + len; + + start_block = ocfs2_blocks_for_bytes(sb, start); + end_block = ocfs2_blocks_for_bytes(sb, end); + nr_blocks = end_block - start_block; + if (!nr_blocks) + return 0; + + cluster = ocfs2_bytes_to_clusters(sb, start); + ret = ocfs2_get_clusters(inode, cluster, &p_cluster, + &nr_clusters, NULL); + if (ret) + return ret; + if (!p_cluster) + return 0; + + offset = start_block - ocfs2_clusters_to_blocks(sb, cluster); + p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset; + return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS); +} + +/* * Parts of this function taken from xfs_change_file_space() */ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, @@ -1865,7 +1904,7 @@ static int __ocfs2_change_file_space(str { int ret; s64 llen; - loff_t size; + loff_t size, orig_isize; struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); struct buffer_head *di_bh = NULL; handle_t *handle; @@ -1896,6 +1935,7 @@ static int __ocfs2_change_file_space(str goto out_inode_unlock; }
+ orig_isize = i_size_read(inode); switch (sr->l_whence) { case 0: /*SEEK_SET*/ break; @@ -1903,7 +1943,7 @@ static int __ocfs2_change_file_space(str sr->l_start += f_pos; break; case 2: /*SEEK_END*/ - sr->l_start += i_size_read(inode); + sr->l_start += orig_isize; break; default: ret = -EINVAL; @@ -1957,6 +1997,14 @@ static int __ocfs2_change_file_space(str default: ret = -EINVAL; } + + /* zeroout eof blocks in the cluster. */ + if (!ret && change_size && orig_isize < size) { + ret = ocfs2_zeroout_partial_cluster(inode, orig_isize, + size - orig_isize); + if (!ret) + i_size_write(inode, size); + } up_write(&OCFS2_I(inode)->ip_alloc_sem); if (ret) { mlog_errno(ret); @@ -1973,9 +2021,6 @@ static int __ocfs2_change_file_space(str goto out_inode_unlock; }
- if (change_size && i_size_read(inode) < size) - i_size_write(inode, size); - inode->i_ctime = inode->i_mtime = current_time(inode); ret = ocfs2_mark_inode_dirty(handle, inode, di_bh); if (ret < 0)
From: Gerald Schaefer gerald.schaefer@linux.ibm.com
commit 04f7ce3f07ce39b1a3ca03a56b238a53acc52cfd upstream.
In pmd/pud_advanced_tests(), the vaddr is aligned up to the next pmd/pud entry, and so it does not match the given pmdp/pudp and (aligned down) pfn any more.
For s390, this results in memory corruption, because the IDTE instruction used e.g. in xxx_get_and_clear() will take the vaddr for some calculations, in combination with the given pmdp. It will then end up with a wrong table origin, ending on ...ff8, and some of those wrongly set low-order bits will also select a wrong pagetable level for the index addition. IDTE could therefore invalidate (or 0x20) something outside of the page tables, depending on the wrongly picked index, which in turn depends on the random vaddr.
As result, we sometimes see "BUG task_struct (Not tainted): Padding overwritten" on s390, where one 0x5a padding value got overwritten with 0x7a.
Fix this by aligning down, similar to how the pmd/pud_aligned pfns are calculated.
Link: https://lkml.kernel.org/r/20210525130043.186290-2-gerald.schaefer@linux.ibm.... Fixes: a5c3b9ffb0f40 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers") Signed-off-by: Gerald Schaefer gerald.schaefer@linux.ibm.com Reviewed-by: Anshuman Khandual anshuman.khandual@arm.com Cc: Vineet Gupta vgupta@synopsys.com Cc: Palmer Dabbelt palmer@dabbelt.com Cc: Paul Walmsley paul.walmsley@sifive.com Cc: stable@vger.kernel.org [5.9+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/debug_vm_pgtable.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -163,7 +163,7 @@ static void __init pmd_advanced_tests(st
pr_debug("Validating PMD advanced\n"); /* Align the address wrt HPAGE_PMD_SIZE */ - vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE; + vaddr &= HPAGE_PMD_MASK;
pgtable_trans_huge_deposit(mm, pmdp, pgtable);
@@ -285,7 +285,7 @@ static void __init pud_advanced_tests(st
pr_debug("Validating PUD advanced\n"); /* Align the address wrt HPAGE_PUD_SIZE */ - vaddr = (vaddr & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE; + vaddr &= HPAGE_PUD_MASK;
set_pud_at(mm, vaddr, pudp, pud); pudp_set_wrprotect(mm, vaddr, pudp);
From: Ding Hui dinghui@sangfor.com.cn
commit bac9c6fa1f929213bbd0ac9cdf21e8e2f0916828 upstream.
Recently we found that there is a lot MemFree left in /proc/meminfo after do a lot of pages soft offline, it's not quite correct.
Before Oscar's rework of soft offline for free pages [1], if we soft offline free pages, these pages are left in buddy with HWPoison flag, and NR_FREE_PAGES is not updated immediately. So the difference between NR_FREE_PAGES and real number of available free pages is also even big at the beginning.
However, with the workload running, when we catch HWPoison page in any alloc functions subsequently, we will remove it from buddy, meanwhile update the NR_FREE_PAGES and try again, so the NR_FREE_PAGES will get more and more closer to the real number of available free pages. (regardless of unpoison_memory())
Now, for offline free pages, after a successful call take_page_off_buddy(), the page is no longer belong to buddy allocator, and will not be used any more, but we missed accounting NR_FREE_PAGES in this situation, and there is no chance to be updated later.
Do update in take_page_off_buddy() like rmqueue() does, but avoid double counting if some one already set_migratetype_isolate() on the page.
[1]: commit 06be6ff3d2ec ("mm,hwpoison: rework soft offline for free pages")
Link: https://lkml.kernel.org/r/20210526075247.11130-1-dinghui@sangfor.com.cn Fixes: 06be6ff3d2ec ("mm,hwpoison: rework soft offline for free pages") Signed-off-by: Ding Hui dinghui@sangfor.com.cn Suggested-by: Naoya Horiguchi naoya.horiguchi@nec.com Reviewed-by: Oscar Salvador osalvador@suse.de Acked-by: David Hildenbrand david@redhat.com Acked-by: Naoya Horiguchi naoya.horiguchi@nec.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/page_alloc.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8870,6 +8870,8 @@ bool take_page_off_buddy(struct page *pa del_page_from_free_list(page_head, zone, page_order); break_down_buddy_pages(zone, page_head, page, 0, page_order, migratetype); + if (!is_migrate_isolate(migratetype)) + __mod_zone_freepage_state(zone, -1, migratetype); ret = true; break; }
From: Thomas Gleixner tglx@linutronix.de
commit 9bfecd05833918526cc7357d55e393393440c5fa upstream.
While digesting the XSAVE-related horrors which got introduced with the supervisor/user split, the recent addition of ENQCMD-related functionality got on the radar and turned out to be similarly broken.
update_pasid(), which is only required when X86_FEATURE_ENQCMD is available, is invoked from two places:
1) From switch_to() for the incoming task
2) Via a SMP function call from the IOMMU/SMV code
#1 is half-ways correct as it hacks around the brokenness of get_xsave_addr() by enforcing the state to be 'present', but all the conditionals in that code are completely pointless for that.
Also the invocation is just useless overhead because at that point it's guaranteed that TIF_NEED_FPU_LOAD is set on the incoming task and all of this can be handled at return to user space.
#2 is broken beyond repair. The comment in the code claims that it is safe to invoke this in an IPI, but that's just wishful thinking.
FPU state of a running task is protected by fregs_lock() which is nothing else than a local_bh_disable(). As BH-disabled regions run usually with interrupts enabled the IPI can hit a code section which modifies FPU state and there is absolutely no guarantee that any of the assumptions which are made for the IPI case is true.
Also the IPI is sent to all CPUs in mm_cpumask(mm), but the IPI is invoked with a NULL pointer argument, so it can hit a completely unrelated task and unconditionally force an update for nothing. Worse, it can hit a kernel thread which operates on a user space address space and set a random PASID for it.
The offending commit does not cleanly revert, but it's sufficient to force disable X86_FEATURE_ENQCMD and to remove the broken update_pasid() code to make this dysfunctional all over the place. Anything more complex would require more surgery and none of the related functions outside of the x86 core code are blatantly wrong, so removing those would be overkill.
As nothing enables the PASID bit in the IA32_XSS MSR yet, which is required to make this actually work, this cannot result in a regression except for related out of tree train-wrecks, but they are broken already today.
Fixes: 20f0afd1fb3d ("x86/mmu: Allocate/free a PASID") Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Andy Lutomirski luto@kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87mtsd6gr9.ffs@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/disabled-features.h | 7 +-- arch/x86/include/asm/fpu/api.h | 6 --- arch/x86/include/asm/fpu/internal.h | 7 --- arch/x86/kernel/fpu/xstate.c | 57 ------------------------------- 4 files changed, 3 insertions(+), 74 deletions(-)
--- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -56,11 +56,8 @@ # define DISABLE_PTI (1 << (X86_FEATURE_PTI & 31)) #endif
-#ifdef CONFIG_IOMMU_SUPPORT -# define DISABLE_ENQCMD 0 -#else -# define DISABLE_ENQCMD (1 << (X86_FEATURE_ENQCMD & 31)) -#endif +/* Force disable because it's broken beyond repair */ +#define DISABLE_ENQCMD (1 << (X86_FEATURE_ENQCMD & 31))
/* * Make sure to add features to the correct mask --- a/arch/x86/include/asm/fpu/api.h +++ b/arch/x86/include/asm/fpu/api.h @@ -79,10 +79,6 @@ extern int cpu_has_xfeatures(u64 xfeatur */ #define PASID_DISABLED 0
-#ifdef CONFIG_IOMMU_SUPPORT -/* Update current's PASID MSR/state by mm's PASID. */ -void update_pasid(void); -#else static inline void update_pasid(void) { } -#endif + #endif /* _ASM_X86_FPU_API_H */ --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -584,13 +584,6 @@ static inline void switch_fpu_finish(str pkru_val = pk->pkru; } __write_pkru(pkru_val); - - /* - * Expensive PASID MSR write will be avoided in update_pasid() because - * TIF_NEED_FPU_LOAD was set. And the PASID state won't be updated - * unless it's different from mm->pasid to reduce overhead. - */ - update_pasid(); }
#endif /* _ASM_X86_FPU_INTERNAL_H */ --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -1402,60 +1402,3 @@ int proc_pid_arch_status(struct seq_file return 0; } #endif /* CONFIG_PROC_PID_ARCH_STATUS */ - -#ifdef CONFIG_IOMMU_SUPPORT -void update_pasid(void) -{ - u64 pasid_state; - u32 pasid; - - if (!cpu_feature_enabled(X86_FEATURE_ENQCMD)) - return; - - if (!current->mm) - return; - - pasid = READ_ONCE(current->mm->pasid); - /* Set the valid bit in the PASID MSR/state only for valid pasid. */ - pasid_state = pasid == PASID_DISABLED ? - pasid : pasid | MSR_IA32_PASID_VALID; - - /* - * No need to hold fregs_lock() since the task's fpstate won't - * be changed by others (e.g. ptrace) while the task is being - * switched to or is in IPI. - */ - if (!test_thread_flag(TIF_NEED_FPU_LOAD)) { - /* The MSR is active and can be directly updated. */ - wrmsrl(MSR_IA32_PASID, pasid_state); - } else { - struct fpu *fpu = ¤t->thread.fpu; - struct ia32_pasid_state *ppasid_state; - struct xregs_state *xsave; - - /* - * The CPU's xstate registers are not currently active. Just - * update the PASID state in the memory buffer here. The - * PASID MSR will be loaded when returning to user mode. - */ - xsave = &fpu->state.xsave; - xsave->header.xfeatures |= XFEATURE_MASK_PASID; - ppasid_state = get_xsave_addr(xsave, XFEATURE_PASID); - /* - * Since XFEATURE_MASK_PASID is set in xfeatures, ppasid_state - * won't be NULL and no need to check its value. - * - * Only update the task's PASID state when it's different - * from the mm's pasid. - */ - if (ppasid_state->pasid != pasid_state) { - /* - * Invalid fpregs so that state restoring will pick up - * the PASID state. - */ - __fpu_invalidate_fpregs_state(fpu); - ppasid_state->pasid = pasid_state; - } - } -} -#endif /* CONFIG_IOMMU_SUPPORT */
From: Pu Wen puwen@hygon.cn
commit 009767dbf42ac0dbe3cf48c1ee224f6b778aa85a upstream.
The first two bits of the CPUID leaf 0x8000001F EAX indicate whether SEV or SME is supported, respectively. It's better to check whether SEV or SME is actually supported before accessing the MSR_AMD64_SEV to check whether SEV or SME is enabled.
This is both a bare-metal issue and a guest/VM issue. Since the first generation Hygon Dhyana CPU doesn't support the MSR_AMD64_SEV, reading that MSR results in a #GP - either directly from hardware in the bare-metal case or via the hypervisor (because the RDMSR is actually intercepted) in the guest/VM case, resulting in a failed boot. And since this is very early in the boot phase, rdmsrl_safe()/native_read_msr_safe() can't be used.
So check the CPUID bits first, before accessing the MSR.
[ tlendacky: Expand and improve commit message. ] [ bp: Massage commit message. ]
Fixes: eab696d8e8b9 ("x86/sev: Do not require Hypervisor CPUID bit for SEV guests") Signed-off-by: Pu Wen puwen@hygon.cn Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Tom Lendacky thomas.lendacky@amd.com Cc: stable@vger.kernel.org # v5.10+ Link: https://lkml.kernel.org/r/20210602070207.2480-1-puwen@hygon.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/mm/mem_encrypt_identity.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
--- a/arch/x86/mm/mem_encrypt_identity.c +++ b/arch/x86/mm/mem_encrypt_identity.c @@ -504,10 +504,6 @@ void __init sme_enable(struct boot_param #define AMD_SME_BIT BIT(0) #define AMD_SEV_BIT BIT(1)
- /* Check the SEV MSR whether SEV or SME is enabled */ - sev_status = __rdmsr(MSR_AMD64_SEV); - feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT; - /* * Check for the SME/SEV feature: * CPUID Fn8000_001F[EAX] @@ -519,11 +515,16 @@ void __init sme_enable(struct boot_param eax = 0x8000001f; ecx = 0; native_cpuid(&eax, &ebx, &ecx, &edx); - if (!(eax & feature_mask)) + /* Check whether SEV or SME is supported */ + if (!(eax & (AMD_SEV_BIT | AMD_SME_BIT))) return;
me_mask = 1UL << (ebx & 0x3f);
+ /* Check the SEV MSR whether SEV or SME is enabled */ + sev_status = __rdmsr(MSR_AMD64_SEV); + feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT; + /* Check if memory encryption is enabled */ if (feature_mask == AMD_SME_BIT) { /*
From: Krzysztof Kozlowski krzysztof.kozlowski@canonical.com
commit 4ac06a1e013cf5fdd963317ffd3b968560f33bba upstream.
It's possible to trigger NULL pointer dereference by local unprivileged user, when calling getsockname() after failed bind() (e.g. the bind fails because LLCP_SAP_MAX used as SAP):
BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 1 PID: 426 Comm: llcp_sock_getna Not tainted 5.13.0-rc2-next-20210521+ #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014 Call Trace: llcp_sock_getname+0xb1/0xe0 __sys_getpeername+0x95/0xc0 ? lockdep_hardirqs_on_prepare+0xd5/0x180 ? syscall_enter_from_user_mode+0x1c/0x40 __x64_sys_getpeername+0x11/0x20 do_syscall_64+0x36/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae
This can be reproduced with Syzkaller C repro (bind followed by getpeername): https://syzkaller.appspot.com/x/repro.c?x=14def446e00000
Cc: stable@vger.kernel.org Fixes: d646960f7986 ("NFC: Initial LLCP support") Reported-by: syzbot+80fb126e7f7d8b1a5914@syzkaller.appspotmail.com Reported-by: butt3rflyh4ck butterflyhuangxx@gmail.com Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@canonical.com Link: https://lore.kernel.org/r/20210531072138.5219-1-krzysztof.kozlowski@canonica... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/nfc/llcp_sock.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/nfc/llcp_sock.c +++ b/net/nfc/llcp_sock.c @@ -110,6 +110,7 @@ static int llcp_sock_bind(struct socket if (!llcp_sock->service_name) { nfc_llcp_local_put(llcp_sock->local); llcp_sock->local = NULL; + llcp_sock->dev = NULL; ret = -ENOMEM; goto put_dev; } @@ -119,6 +120,7 @@ static int llcp_sock_bind(struct socket llcp_sock->local = NULL; kfree(llcp_sock->service_name); llcp_sock->service_name = NULL; + llcp_sock->dev = NULL; ret = -EADDRINUSE; goto put_dev; }
From: Luben Tuikov luben.tuikov@amd.com
commit dce3d8e1d070900e0feeb06787a319ff9379212c upstream.
On QUERY2 IOCTL don't query counts of correctable and uncorrectable errors, since when RAS is enabled and supported on Vega20 server boards, this takes insurmountably long time, in O(n^3), which slows the system down to the point of it being unusable when we have GUI up.
Fixes: ae363a212b14 ("drm/amdgpu: Add a new flag to AMDGPU_CTX_OP_QUERY_STATE2") Cc: Alexander Deucher Alexander.Deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Luben Tuikov luben.tuikov@amd.com Reviewed-by: Alexander Deucher Alexander.Deucher@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 16 ---------------- 1 file changed, 16 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c @@ -337,7 +337,6 @@ static int amdgpu_ctx_query2(struct amdg { struct amdgpu_ctx *ctx; struct amdgpu_ctx_mgr *mgr; - unsigned long ras_counter;
if (!fpriv) return -EINVAL; @@ -362,21 +361,6 @@ static int amdgpu_ctx_query2(struct amdg if (atomic_read(&ctx->guilty)) out->state.flags |= AMDGPU_CTX_QUERY2_FLAGS_GUILTY;
- /*query ue count*/ - ras_counter = amdgpu_ras_query_error_count(adev, false); - /*ras counter is monotonic increasing*/ - if (ras_counter != ctx->ras_counter_ue) { - out->state.flags |= AMDGPU_CTX_QUERY2_FLAGS_RAS_UE; - ctx->ras_counter_ue = ras_counter; - } - - /*query ce count*/ - ras_counter = amdgpu_ras_query_error_count(adev, true); - if (ras_counter != ctx->ras_counter_ce) { - out->state.flags |= AMDGPU_CTX_QUERY2_FLAGS_RAS_CE; - ctx->ras_counter_ce = ras_counter; - } - mutex_unlock(&mgr->lock); return 0; }
From: Nirmoy Das nirmoy.das@amd.com
commit 07438603a07e52f1c6aa731842bd298d2725b7be upstream.
Releasing pinned BOs is illegal now. UVD 6 was missing from: commit 2f40801dc553 ("drm/amdgpu: make sure we unpin the UVD BO")
Fixes: 2f40801dc553 ("drm/amdgpu: make sure we unpin the UVD BO") Cc: stable@vger.kernel.org Signed-off-by: Nirmoy Das nirmoy.das@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c @@ -356,6 +356,7 @@ static int uvd_v6_0_enc_ring_test_ib(str
error: dma_fence_put(fence); + amdgpu_bo_unpin(bo); amdgpu_bo_unreserve(bo); amdgpu_bo_unref(&bo); return r;
From: Thomas Gleixner tglx@linutronix.de
commit 7d65f9e80646c595e8c853640a9d0768a33e204c upstream.
PIC interrupts do not support affinity setting and they can end up on any online CPU. Therefore, it's required to mark the associated vectors as system-wide reserved. Otherwise, the corresponding irq descriptors are copied to the secondary CPUs but the vectors are not marked as assigned or reserved. This works correctly for the IO/APIC case.
When the IO/APIC is disabled via config, kernel command line or lack of enumeration then all legacy interrupts are routed through the PIC, but nothing marks them as system-wide reserved vectors.
As a consequence, a subsequent allocation on a secondary CPU can result in allocating one of these vectors, which triggers the BUG() in apic_update_vector() because the interrupt descriptor slot is not empty.
Imran tried to work around that by marking those interrupts as allocated when a CPU comes online. But that's wrong in case that the IO/APIC is available and one of the legacy interrupts, e.g. IRQ0, has been switched to PIC mode because then marking them as allocated will fail as they are already marked as system vectors.
Stay consistent and update the legacy vectors after attempting IO/APIC initialization and mark them as system vectors in case that no IO/APIC is available.
Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment") Reported-by: Imran Khan imran.f.khan@oracle.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Borislav Petkov bp@suse.de Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210519233928.2157496-1-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/apic.h | 1 + arch/x86/kernel/apic/apic.c | 1 + arch/x86/kernel/apic/vector.c | 20 ++++++++++++++++++++ 3 files changed, 22 insertions(+)
--- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -174,6 +174,7 @@ static inline int apic_is_clustered_box( extern int setup_APIC_eilvt(u8 lvt_off, u8 vector, u8 msg_type, u8 mask); extern void lapic_assign_system_vectors(void); extern void lapic_assign_legacy_vector(unsigned int isairq, bool replace); +extern void lapic_update_legacy_vectors(void); extern void lapic_online(void); extern void lapic_offline(void); extern bool apic_needs_pit(void); --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -2539,6 +2539,7 @@ static void __init apic_bsp_setup(bool u end_local_APIC_setup(); irq_remap_enable_fault_handling(); setup_IO_APIC(); + lapic_update_legacy_vectors(); }
#ifdef CONFIG_UP_LATE_INIT --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -687,6 +687,26 @@ void lapic_assign_legacy_vector(unsigned irq_matrix_assign_system(vector_matrix, ISA_IRQ_VECTOR(irq), replace); }
+void __init lapic_update_legacy_vectors(void) +{ + unsigned int i; + + if (IS_ENABLED(CONFIG_X86_IO_APIC) && nr_ioapics > 0) + return; + + /* + * If the IO/APIC is disabled via config, kernel command line or + * lack of enumeration then all legacy interrupts are routed + * through the PIC. Make sure that they are marked as legacy + * vectors. PIC_CASCADE_IRQ has already been marked in + * lapic_assign_system_vectors(). + */ + for (i = 0; i < nr_legacy_irqs(); i++) { + if (i != PIC_CASCADE_IR) + lapic_assign_legacy_vector(i, true); + } +} + void __init lapic_assign_system_vectors(void) { unsigned int i, vector = 0;
From: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com
commit 82123a3d1d5a306fdf50c968a474cc60fe43a80f upstream.
When checking if the probed instruction is the suffix of a prefixed instruction, we access the instruction at the previous word. If the probed instruction is the very first word of a module, we can end up trying to access an invalid page.
Fix this by skipping the check for all instructions at the beginning of a page. Prefixed instructions cannot cross a 64-byte boundary and as such, we don't expect to encounter a suffix as the very first word in a page for kernel text. Even if there are prefixed instructions crossing a page boundary (from a module, for instance), the instruction will be illegal, so preventing probing on the suffix of such prefix instructions isn't worthwhile.
Fixes: b4657f7650ba ("powerpc/kprobes: Don't allow breakpoints on suffixes") Cc: stable@vger.kernel.org # v5.8+ Reported-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/0df9a032a05576a2fa8e97d1b769af2ff0eafbd6.162141666... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/kprobes.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/powerpc/kernel/kprobes.c +++ b/arch/powerpc/kernel/kprobes.c @@ -108,7 +108,6 @@ int arch_prepare_kprobe(struct kprobe *p int ret = 0; struct kprobe *prev; struct ppc_inst insn = ppc_inst_read((struct ppc_inst *)p->addr); - struct ppc_inst prefix = ppc_inst_read((struct ppc_inst *)(p->addr - 1));
if ((unsigned long)p->addr & 0x03) { printk("Attempt to register kprobe at an unaligned address\n"); @@ -116,7 +115,8 @@ int arch_prepare_kprobe(struct kprobe *p } else if (IS_MTMSRD(insn) || IS_RFID(insn) || IS_RFI(insn)) { printk("Cannot register a kprobe on rfi/rfid or mtmsr[d]\n"); ret = -EINVAL; - } else if (ppc_inst_prefixed(prefix)) { + } else if ((unsigned long)p->addr & ~PAGE_MASK && + ppc_inst_prefixed(ppc_inst_read((struct ppc_inst *)(p->addr - 1)))) { printk("Cannot register a kprobe on the second word of prefixed instruction\n"); ret = -EINVAL; }
From: Josef Bacik josef@toxicpanda.com
commit d61bec08b904cf171835db98168f82bc338e92e4 upstream.
While doing error injection testing I saw that sometimes we'd get an abort that wouldn't stop the current transaction commit from completing. This abort was coming from finish ordered IO, but at this point in the transaction commit we should have gotten an error and stopped.
It turns out the abort came from finish ordered io while trying to write out the free space cache. It occurred to me that any failure inside of finish_ordered_io isn't actually raised to the person doing the writing, so we could have any number of failures in this path and think the ordered extent completed successfully and the inode was fine.
Fix this by marking the ordered extent with BTRFS_ORDERED_IOERR, and marking the mapping of the inode with mapping_set_error, so any callers that simply call fdatawait will also get the error.
With this we're seeing the IO error on the free space inode when we fail to do the finish_ordered_io.
CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Josef Bacik josef@toxicpanda.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/inode.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2760,6 +2760,18 @@ out: if (ret || truncated) { u64 unwritten_start = start;
+ /* + * If we failed to finish this ordered extent for any reason we + * need to make sure BTRFS_ORDERED_IOERR is set on the ordered + * extent, and mark the inode with the error if it wasn't + * already set. Any error during writeback would have already + * set the mapping error, so we need to set it if we're the ones + * marking this ordered extent as failed. + */ + if (ret && !test_and_set_bit(BTRFS_ORDERED_IOERR, + &ordered_extent->flags)) + mapping_set_error(ordered_extent->inode->i_mapping, -EIO); + if (truncated) unwritten_start += logical_len; clear_extent_uptodate(io_tree, unwritten_start, end, NULL);
From: Josef Bacik josef@toxicpanda.com
commit b86652be7c83f70bf406bed18ecf55adb9bfb91b upstream.
Error injection stress would sometimes fail with checksums on disk that did not have a corresponding extent. This occurred because the pattern in btrfs_del_csums was
while (1) { ret = btrfs_search_slot(); if (ret < 0) break; } ret = 0; out: btrfs_free_path(path); return ret;
If we got an error from btrfs_search_slot we'd clear the error because we were breaking instead of goto out. Instead of using goto out, simply handle the cases where we may leave a random value in ret, and get rid of the
ret = 0; out:
pattern and simply allow break to have the proper error reporting. With this fix we properly abort the transaction and do not commit thinking we successfully deleted the csum.
Reviewed-by: Qu Wenruo wqu@suse.com CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik josef@toxicpanda.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/file-item.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -690,7 +690,7 @@ int btrfs_del_csums(struct btrfs_trans_h u64 end_byte = bytenr + len; u64 csum_end; struct extent_buffer *leaf; - int ret; + int ret = 0; u16 csum_size = btrfs_super_csum_size(fs_info->super_copy); int blocksize_bits = fs_info->sb->s_blocksize_bits;
@@ -709,6 +709,7 @@ int btrfs_del_csums(struct btrfs_trans_h path->leave_spinning = 1; ret = btrfs_search_slot(trans, root, &key, path, -1, 1); if (ret > 0) { + ret = 0; if (path->slots[0] == 0) break; path->slots[0]--; @@ -765,7 +766,7 @@ int btrfs_del_csums(struct btrfs_trans_h ret = btrfs_del_items(trans, root, path, path->slots[0], del_nr); if (ret) - goto out; + break; if (key.offset == bytenr) break; } else if (key.offset < bytenr && csum_end > end_byte) { @@ -809,8 +810,9 @@ int btrfs_del_csums(struct btrfs_trans_h ret = btrfs_split_item(trans, root, path, &key, offset); if (ret && ret != -EAGAIN) { btrfs_abort_transaction(trans, ret); - goto out; + break; } + ret = 0;
key.offset = end_byte - 1; } else { @@ -820,8 +822,6 @@ int btrfs_del_csums(struct btrfs_trans_h } btrfs_release_path(path); } - ret = 0; -out: btrfs_free_path(path); return ret; }
From: Josef Bacik josef@toxicpanda.com
commit 856bd270dc4db209c779ce1e9555c7641ffbc88e upstream.
We are unconditionally returning 0 in cleanup_ref_head, despite the fact that btrfs_del_csums could fail. We need to return the error so the transaction gets aborted properly, fix this by returning ret from btrfs_del_csums in cleanup_ref_head.
Reviewed-by: Qu Wenruo wqu@suse.com CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Josef Bacik josef@toxicpanda.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent-tree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1830,7 +1830,7 @@ static int cleanup_ref_head(struct btrfs trace_run_delayed_ref_head(fs_info, head, 0); btrfs_delayed_ref_unlock(head); btrfs_put_delayed_ref_head(head); - return 0; + return ret; }
static struct btrfs_delayed_ref_head *btrfs_obtain_ref_head(
From: Josef Bacik josef@toxicpanda.com
commit 011b28acf940eb61c000059dd9e2cfcbf52ed96b upstream.
This function has the following pattern
while (1) { ret = whatever(); if (ret) goto out; } ret = 0 out: return ret;
However several places in this while loop we simply break; when there's a problem, thus clearing the return value, and in one case we do a return -EIO, and leak the memory for the path.
Fix this by re-arranging the loop to deal with ret == 1 coming from btrfs_search_slot, and then simply delete the
ret = 0; out:
bit so everybody can break if there is an error, which will allow for proper error handling to occur.
CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik josef@toxicpanda.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/tree-log.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
--- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -1752,6 +1752,7 @@ static noinline int fixup_inode_link_cou break;
if (ret == 1) { + ret = 0; if (path->slots[0] == 0) break; path->slots[0]--; @@ -1764,17 +1765,19 @@ static noinline int fixup_inode_link_cou
ret = btrfs_del_item(trans, root, path); if (ret) - goto out; + break;
btrfs_release_path(path); inode = read_one_inode(root, key.offset); - if (!inode) - return -EIO; + if (!inode) { + ret = -EIO; + break; + }
ret = fixup_inode_link_count(trans, root, inode); iput(inode); if (ret) - goto out; + break;
/* * fixup on a directory may create new entries, @@ -1783,8 +1786,6 @@ static noinline int fixup_inode_link_cou */ key.offset = (u64)-1; } - ret = 0; -out: btrfs_release_path(path); return ret; }
From: Josef Bacik josef@toxicpanda.com
commit dc09ef3562726cd520c8338c1640872a60187af5 upstream.
Error injection stress uncovered a problem where we'd leave a dangling inode ref if we failed during a rename_exchange. This happens because we insert the inode ref for one side of the rename, and then for the other side. If this second inode ref insert fails we'll leave the first one dangling and leave a corrupt file system behind. Fix this by aborting if we did the insert for the first inode ref.
CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Josef Bacik josef@toxicpanda.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/inode.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8890,6 +8890,7 @@ static int btrfs_rename_exchange(struct int ret2; bool root_log_pinned = false; bool dest_log_pinned = false; + bool need_abort = false;
/* we only allow rename subvolume link between subvolumes */ if (old_ino != BTRFS_FIRST_FREE_OBJECTID && root != dest) @@ -8946,6 +8947,7 @@ static int btrfs_rename_exchange(struct old_idx); if (ret) goto out_fail; + need_abort = true; }
/* And now for the dest. */ @@ -8961,8 +8963,11 @@ static int btrfs_rename_exchange(struct new_ino, btrfs_ino(BTRFS_I(old_dir)), new_idx); - if (ret) + if (ret) { + if (need_abort) + btrfs_abort_transaction(trans, ret); goto out_fail; + } }
/* Update inode version and ctime/mtime. */
From: Filipe Manana fdmanana@suse.com
commit 76a6d5cd74479e7ec8a7f9a29bce63d5549b6b2e upstream.
There are a few cases where cloning an inline extent requires copying data into a page of the destination inode. For these cases we are allocating the required data and metadata space while holding a leaf locked. This can result in a deadlock when we are low on available space because allocating the space may flush delalloc and two deadlock scenarios can happen:
1) When starting writeback for an inode with a very small dirty range that fits in an inline extent, we deadlock during the writeback when trying to insert the inline extent, at cow_file_range_inline(), if the extent is going to be located in the leaf for which we are already holding a read lock;
2) After successfully starting writeback, for non-inline extent cases, the async reclaim thread will hang waiting for an ordered extent to complete if the ordered extent completion needs to modify the leaf for which the clone task is holding a read lock (for adding or replacing file extent items). So the cloning task will wait forever on the async reclaim thread to make progress, which in turn is waiting for the ordered extent completion which in turn is waiting to acquire a write lock on the same leaf.
So fix this by making sure we release the path (and therefore the leaf) every time we need to copy the inline extent's data into a page of the destination inode, as by that time we do not need to have the leaf locked.
Fixes: 05a5a7621ce66c ("Btrfs: implement full reflink support for inline extents") CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/reflink.c | 38 ++++++++++++++++++++++---------------- 1 file changed, 22 insertions(+), 16 deletions(-)
--- a/fs/btrfs/reflink.c +++ b/fs/btrfs/reflink.c @@ -207,10 +207,7 @@ static int clone_copy_inline_extent(stru * inline extent's data to the page. */ ASSERT(key.offset > 0); - ret = copy_inline_to_page(BTRFS_I(dst), new_key->offset, - inline_data, size, datal, - comp_type); - goto out; + goto copy_to_page; } } else if (i_size_read(dst) <= datal) { struct btrfs_file_extent_item *ei; @@ -226,13 +223,10 @@ static int clone_copy_inline_extent(stru BTRFS_FILE_EXTENT_INLINE) goto copy_inline_extent;
- ret = copy_inline_to_page(BTRFS_I(dst), new_key->offset, - inline_data, size, datal, comp_type); - goto out; + goto copy_to_page; }
copy_inline_extent: - ret = 0; /* * We have no extent items, or we have an extent at offset 0 which may * or may not be inlined. All these cases are dealt the same way. @@ -244,11 +238,13 @@ copy_inline_extent: * clone. Deal with all these cases by copying the inline extent * data into the respective page at the destination inode. */ - ret = copy_inline_to_page(BTRFS_I(dst), new_key->offset, - inline_data, size, datal, comp_type); - goto out; + goto copy_to_page; }
+ /* + * Release path before starting a new transaction so we don't hold locks + * that would confuse lockdep. + */ btrfs_release_path(path); /* * If we end up here it means were copy the inline extent into a leaf @@ -282,11 +278,6 @@ copy_inline_extent: out: if (!ret && !trans) { /* - * Release path before starting a new transaction so we don't - * hold locks that would confuse lockdep. - */ - btrfs_release_path(path); - /* * No transaction here means we copied the inline extent into a * page of the destination inode. * @@ -306,6 +297,21 @@ out: *trans_out = trans;
return ret; + +copy_to_page: + /* + * Release our path because we don't need it anymore and also because + * copy_inline_to_page() needs to reserve data and metadata, which may + * need to flush delalloc when we are low on available space and + * therefore cause a deadlock if writeback of an inline extent needs to + * write to the same leaf or an ordered extent completion needs to write + * to the same leaf. + */ + btrfs_release_path(path); + + ret = copy_inline_to_page(BTRFS_I(dst), new_key->offset, + inline_data, size, datal, comp_type); + goto out; }
/**
From: Mina Almasry almasrymina@google.com
[ Upstream commit d84cf06e3dd8c5c5b547b5d8931015fc536678e5 ]
The userfaultfd hugetlb tests cause a resv_huge_pages underflow. This happens when hugetlb_mcopy_atomic_pte() is called with !is_continue on an index for which we already have a page in the cache. When this happens, we allocate a second page, double consuming the reservation, and then fail to insert the page into the cache and return -EEXIST.
To fix this, we first check if there is a page in the cache which already consumed the reservation, and return -EEXIST immediately if so.
There is still a rare condition where we fail to copy the page contents AND race with a call for hugetlb_no_page() for this index and again we will underflow resv_huge_pages. That is fixed in a more complicated patch not targeted for -stable.
Test:
Hacked the code locally such that resv_huge_pages underflows produce a warning, then:
./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success ./tools/testing/selftests/vm/userfaultfd hugetlb 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
Both tests succeed and produce no warnings. After the test runs number of free/resv hugepages is correct.
[mike.kravetz@oracle.com: changelog fixes]
Link: https://lkml.kernel.org/r/20210528004649.85298-1-almasrymina@google.com Fixes: 8fb5debc5fcd ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support") Signed-off-by: Mina Almasry almasrymina@google.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Cc: Axel Rasmussen axelrasmussen@google.com Cc: Peter Xu peterx@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/hugetlb.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 900851a4f914..bc1006a32733 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4708,10 +4708,20 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, struct page *page;
if (!*pagep) { - ret = -ENOMEM; + /* If a page already exists, then it's UFFDIO_COPY for + * a non-missing case. Return -EEXIST. + */ + if (vm_shared && + hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) { + ret = -EEXIST; + goto out; + } + page = alloc_huge_page(dst_vma, dst_addr, 0); - if (IS_ERR(page)) + if (IS_ERR(page)) { + ret = -ENOMEM; goto out; + }
ret = copy_huge_page_from_user(page, (const void __user *) src_addr,
From: Dmitry Baryshkov dmitry.baryshkov@linaro.org
commit a670ff578f1fb855fedc7931fa5bbc06b567af22 upstream.
Currently DPU driver scales bandwidth and core clock for sc7180 only, while the rest of chips get static bandwidth votes. Make all chipsets scale bandwidth and clock per composition requirements like sc7180 does. Drop old voting path completely.
Tested on RB3 (SDM845) and RB5 (SM8250).
Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Link: https://lore.kernel.org/r/20210401020533.3956787-2-dmitry.baryshkov@linaro.o... Signed-off-by: Rob Clark robdclark@chromium.org Signed-off-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 3 - drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c | 51 ------------------------------- 2 files changed, 2 insertions(+), 52 deletions(-)
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c @@ -931,8 +931,7 @@ static int dpu_kms_hw_init(struct msm_km DPU_DEBUG("REG_DMA is not defined"); }
- if (of_device_is_compatible(dev->dev->of_node, "qcom,sc7180-mdss")) - dpu_kms_parse_data_bus_icc_path(dpu_kms); + dpu_kms_parse_data_bus_icc_path(dpu_kms);
pm_runtime_get_sync(&dpu_kms->pdev->dev);
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c @@ -31,40 +31,8 @@ struct dpu_mdss { void __iomem *mmio; struct dss_module_power mp; struct dpu_irq_controller irq_controller; - struct icc_path *path[2]; - u32 num_paths; };
-static int dpu_mdss_parse_data_bus_icc_path(struct drm_device *dev, - struct dpu_mdss *dpu_mdss) -{ - struct icc_path *path0 = of_icc_get(dev->dev, "mdp0-mem"); - struct icc_path *path1 = of_icc_get(dev->dev, "mdp1-mem"); - - if (IS_ERR_OR_NULL(path0)) - return PTR_ERR_OR_ZERO(path0); - - dpu_mdss->path[0] = path0; - dpu_mdss->num_paths = 1; - - if (!IS_ERR_OR_NULL(path1)) { - dpu_mdss->path[1] = path1; - dpu_mdss->num_paths++; - } - - return 0; -} - -static void dpu_mdss_icc_request_bw(struct msm_mdss *mdss) -{ - struct dpu_mdss *dpu_mdss = to_dpu_mdss(mdss); - int i; - u64 avg_bw = dpu_mdss->num_paths ? MAX_BW / dpu_mdss->num_paths : 0; - - for (i = 0; i < dpu_mdss->num_paths; i++) - icc_set_bw(dpu_mdss->path[i], avg_bw, kBps_to_icc(MAX_BW)); -} - static void dpu_mdss_irq(struct irq_desc *desc) { struct dpu_mdss *dpu_mdss = irq_desc_get_handler_data(desc); @@ -178,8 +146,6 @@ static int dpu_mdss_enable(struct msm_md struct dss_module_power *mp = &dpu_mdss->mp; int ret;
- dpu_mdss_icc_request_bw(mdss); - ret = msm_dss_enable_clk(mp->clk_config, mp->num_clk, true); if (ret) { DPU_ERROR("clock enable failed, ret:%d\n", ret); @@ -213,15 +179,12 @@ static int dpu_mdss_disable(struct msm_m { struct dpu_mdss *dpu_mdss = to_dpu_mdss(mdss); struct dss_module_power *mp = &dpu_mdss->mp; - int ret, i; + int ret;
ret = msm_dss_enable_clk(mp->clk_config, mp->num_clk, false); if (ret) DPU_ERROR("clock disable failed, ret:%d\n", ret);
- for (i = 0; i < dpu_mdss->num_paths; i++) - icc_set_bw(dpu_mdss->path[i], 0, 0); - return ret; }
@@ -232,7 +195,6 @@ static void dpu_mdss_destroy(struct drm_ struct dpu_mdss *dpu_mdss = to_dpu_mdss(priv->mdss); struct dss_module_power *mp = &dpu_mdss->mp; int irq; - int i;
pm_runtime_suspend(dev->dev); pm_runtime_disable(dev->dev); @@ -242,9 +204,6 @@ static void dpu_mdss_destroy(struct drm_ msm_dss_put_clk(mp->clk_config, mp->num_clk); devm_kfree(&pdev->dev, mp->clk_config);
- for (i = 0; i < dpu_mdss->num_paths; i++) - icc_put(dpu_mdss->path[i]); - if (dpu_mdss->mmio) devm_iounmap(&pdev->dev, dpu_mdss->mmio); dpu_mdss->mmio = NULL; @@ -276,12 +235,6 @@ int dpu_mdss_init(struct drm_device *dev
DRM_DEBUG("mapped mdss address space @%pK\n", dpu_mdss->mmio);
- if (!of_device_is_compatible(dev->dev->of_node, "qcom,sc7180-mdss")) { - ret = dpu_mdss_parse_data_bus_icc_path(dev, dpu_mdss); - if (ret) - return ret; - } - mp = &dpu_mdss->mp; ret = msm_dss_parse_clock(pdev, mp); if (ret) { @@ -307,8 +260,6 @@ int dpu_mdss_init(struct drm_device *dev
pm_runtime_enable(dev->dev);
- dpu_mdss_icc_request_bw(priv->mdss); - return ret;
irq_error:
From: Anand Jain anand.jain@oracle.com
commit 5e753a817b2d5991dfe8a801b7b1e8e79a1c5a20 upstream.
The following test case reproduces an issue of wrongly freeing in-use blocks on the readonly seed device when fstrim is called on the rw sprout device. As shown below.
Create a seed device and add a sprout device to it:
$ mkfs.btrfs -fq -dsingle -msingle /dev/loop0 $ btrfstune -S 1 /dev/loop0 $ mount /dev/loop0 /btrfs $ btrfs dev add -f /dev/loop1 /btrfs BTRFS info (device loop0): relocating block group 290455552 flags system BTRFS info (device loop0): relocating block group 1048576 flags system BTRFS info (device loop0): disk added /dev/loop1 $ umount /btrfs
Mount the sprout device and run fstrim:
$ mount /dev/loop1 /btrfs $ fstrim /btrfs $ umount /btrfs
Now try to mount the seed device, and it fails:
$ mount /dev/loop0 /btrfs mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
Block 5292032 is missing on the readonly seed device:
$ dmesg -kt | tail <snip> BTRFS error (device loop0): bad tree block start, want 5292032 have 0 BTRFS warning (device loop0): couldn't read-tree root BTRFS error (device loop0): open_ctree failed
From the dump-tree of the seed device (taken before the fstrim). Block
5292032 belonged to the block group starting at 5242880:
$ btrfs inspect dump-tree -e /dev/loop0 | grep -A1 BLOCK_GROUP <snip> item 3 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16169 itemsize 24 block group used 114688 chunk_objectid 256 flags METADATA <snip>
From the dump-tree of the sprout device (taken before the fstrim).
fstrim used block-group 5242880 to find the related free space to free:
$ btrfs inspect dump-tree -e /dev/loop1 | grep -A1 BLOCK_GROUP <snip> item 1 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16226 itemsize 24 block group used 32768 chunk_objectid 256 flags METADATA <snip>
BPF kernel tracing the fstrim command finds the missing block 5292032 within the range of the discarded blocks as below:
kprobe:btrfs_discard_extent { printf("freeing start %llu end %llu num_bytes %llu:\n", arg1, arg1+arg2, arg2); }
freeing start 5259264 end 5406720 num_bytes 147456 <snip>
Fix this by avoiding the discard command to the readonly seed device.
Reported-by: Chris Murphy lists@colorremedies.com CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Anand Jain anand.jain@oracle.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent-tree.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1297,16 +1297,20 @@ int btrfs_discard_extent(struct btrfs_fs for (i = 0; i < bbio->num_stripes; i++, stripe++) { u64 bytes; struct request_queue *req_q; + struct btrfs_device *device = stripe->dev;
- if (!stripe->dev->bdev) { + if (!device->bdev) { ASSERT(btrfs_test_opt(fs_info, DEGRADED)); continue; } - req_q = bdev_get_queue(stripe->dev->bdev); + req_q = bdev_get_queue(device->bdev); if (!blk_queue_discard(req_q)) continue;
- ret = btrfs_issue_discard(stripe->dev->bdev, + if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) + continue; + + ret = btrfs_issue_discard(device->bdev, stripe->physical, stripe->length, &bytes);
From: Sean Christopherson seanjc@google.com
commit 0884335a2e653b8a045083aa1d57ce74269ac81d upstream.
Drop bits 63:32 on loads/stores to/from DRs and CRs when the vCPU is not in 64-bit mode. The APM states bits 63:32 are dropped for both DRs and CRs:
In 64-bit mode, the operand size is fixed at 64 bits without the need for a REX prefix. In non-64-bit mode, the operand size is fixed at 32 bits and the upper 32 bits of the destination are forced to 0.
Fixes: 7ff76d58a9dc ("KVM: SVM: enhance MOV CR intercept handler") Fixes: cae3797a4639 ("KVM: SVM: enhance mov DR intercept handler") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20210422022128.3464144-4-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/svm.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -2362,7 +2362,7 @@ static int cr_interception(struct vcpu_s err = 0; if (cr >= 16) { /* mov to cr */ cr -= 16; - val = kvm_register_read(&svm->vcpu, reg); + val = kvm_register_readl(&svm->vcpu, reg); trace_kvm_cr_write(cr, val); switch (cr) { case 0: @@ -2408,7 +2408,7 @@ static int cr_interception(struct vcpu_s kvm_queue_exception(&svm->vcpu, UD_VECTOR); return 1; } - kvm_register_write(&svm->vcpu, reg, val); + kvm_register_writel(&svm->vcpu, reg, val); trace_kvm_cr_read(cr, val); } return kvm_complete_insn_gp(&svm->vcpu, err); @@ -2439,13 +2439,13 @@ static int dr_interception(struct vcpu_s if (dr >= 16) { /* mov to DRn */ if (!kvm_require_dr(&svm->vcpu, dr - 16)) return 1; - val = kvm_register_read(&svm->vcpu, reg); + val = kvm_register_readl(&svm->vcpu, reg); kvm_set_dr(&svm->vcpu, dr - 16, val); } else { if (!kvm_require_dr(&svm->vcpu, dr)) return 1; kvm_get_dr(&svm->vcpu, dr, &val); - kvm_register_write(&svm->vcpu, reg, val); + kvm_register_writel(&svm->vcpu, reg, val); }
return kvm_skip_emulated_instruction(&svm->vcpu);
From: Marc Zyngier maz@kernel.org
commit cb853ded1d25e5b026ce115dbcde69e3d7e2e831 upstream.
Commit 03fdfb2690099 ("KVM: arm64: Don't write junk to sysregs on reset") flipped the register number to 0 for all the debug registers in the sysreg table, hereby indicating that these registers live in a separate shadow structure.
However, the author of this patch failed to realise that all the accessors are using that particular index instead of the register encoding, resulting in all the registers hitting index 0. Not quite a valid implementation of the architecture...
Address the issue by fixing all the accessors to use the CRm field of the encoding, which contains the debug register index.
Fixes: 03fdfb2690099 ("KVM: arm64: Don't write junk to sysregs on reset") Reported-by: Ricardo Koller ricarkol@google.com Signed-off-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/sys_regs.c | 42 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 21 deletions(-)
--- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -464,14 +464,14 @@ static bool trap_bvr(struct kvm_vcpu *vc struct sys_reg_params *p, const struct sys_reg_desc *rd) { - u64 *dbg_reg = &vcpu->arch.vcpu_debug_state.dbg_bvr[rd->reg]; + u64 *dbg_reg = &vcpu->arch.vcpu_debug_state.dbg_bvr[rd->CRm];
if (p->is_write) reg_to_dbg(vcpu, p, dbg_reg); else dbg_to_reg(vcpu, p, dbg_reg);
- trace_trap_reg(__func__, rd->reg, p->is_write, *dbg_reg); + trace_trap_reg(__func__, rd->CRm, p->is_write, *dbg_reg);
return true; } @@ -479,7 +479,7 @@ static bool trap_bvr(struct kvm_vcpu *vc static int set_bvr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, const struct kvm_one_reg *reg, void __user *uaddr) { - __u64 *r = &vcpu->arch.vcpu_debug_state.dbg_bvr[rd->reg]; + __u64 *r = &vcpu->arch.vcpu_debug_state.dbg_bvr[rd->CRm];
if (copy_from_user(r, uaddr, KVM_REG_SIZE(reg->id)) != 0) return -EFAULT; @@ -489,7 +489,7 @@ static int set_bvr(struct kvm_vcpu *vcpu static int get_bvr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, const struct kvm_one_reg *reg, void __user *uaddr) { - __u64 *r = &vcpu->arch.vcpu_debug_state.dbg_bvr[rd->reg]; + __u64 *r = &vcpu->arch.vcpu_debug_state.dbg_bvr[rd->CRm];
if (copy_to_user(uaddr, r, KVM_REG_SIZE(reg->id)) != 0) return -EFAULT; @@ -499,21 +499,21 @@ static int get_bvr(struct kvm_vcpu *vcpu static void reset_bvr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd) { - vcpu->arch.vcpu_debug_state.dbg_bvr[rd->reg] = rd->val; + vcpu->arch.vcpu_debug_state.dbg_bvr[rd->CRm] = rd->val; }
static bool trap_bcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p, const struct sys_reg_desc *rd) { - u64 *dbg_reg = &vcpu->arch.vcpu_debug_state.dbg_bcr[rd->reg]; + u64 *dbg_reg = &vcpu->arch.vcpu_debug_state.dbg_bcr[rd->CRm];
if (p->is_write) reg_to_dbg(vcpu, p, dbg_reg); else dbg_to_reg(vcpu, p, dbg_reg);
- trace_trap_reg(__func__, rd->reg, p->is_write, *dbg_reg); + trace_trap_reg(__func__, rd->CRm, p->is_write, *dbg_reg);
return true; } @@ -521,7 +521,7 @@ static bool trap_bcr(struct kvm_vcpu *vc static int set_bcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, const struct kvm_one_reg *reg, void __user *uaddr) { - __u64 *r = &vcpu->arch.vcpu_debug_state.dbg_bcr[rd->reg]; + __u64 *r = &vcpu->arch.vcpu_debug_state.dbg_bcr[rd->CRm];
if (copy_from_user(r, uaddr, KVM_REG_SIZE(reg->id)) != 0) return -EFAULT; @@ -532,7 +532,7 @@ static int set_bcr(struct kvm_vcpu *vcpu static int get_bcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, const struct kvm_one_reg *reg, void __user *uaddr) { - __u64 *r = &vcpu->arch.vcpu_debug_state.dbg_bcr[rd->reg]; + __u64 *r = &vcpu->arch.vcpu_debug_state.dbg_bcr[rd->CRm];
if (copy_to_user(uaddr, r, KVM_REG_SIZE(reg->id)) != 0) return -EFAULT; @@ -542,22 +542,22 @@ static int get_bcr(struct kvm_vcpu *vcpu static void reset_bcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd) { - vcpu->arch.vcpu_debug_state.dbg_bcr[rd->reg] = rd->val; + vcpu->arch.vcpu_debug_state.dbg_bcr[rd->CRm] = rd->val; }
static bool trap_wvr(struct kvm_vcpu *vcpu, struct sys_reg_params *p, const struct sys_reg_desc *rd) { - u64 *dbg_reg = &vcpu->arch.vcpu_debug_state.dbg_wvr[rd->reg]; + u64 *dbg_reg = &vcpu->arch.vcpu_debug_state.dbg_wvr[rd->CRm];
if (p->is_write) reg_to_dbg(vcpu, p, dbg_reg); else dbg_to_reg(vcpu, p, dbg_reg);
- trace_trap_reg(__func__, rd->reg, p->is_write, - vcpu->arch.vcpu_debug_state.dbg_wvr[rd->reg]); + trace_trap_reg(__func__, rd->CRm, p->is_write, + vcpu->arch.vcpu_debug_state.dbg_wvr[rd->CRm]);
return true; } @@ -565,7 +565,7 @@ static bool trap_wvr(struct kvm_vcpu *vc static int set_wvr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, const struct kvm_one_reg *reg, void __user *uaddr) { - __u64 *r = &vcpu->arch.vcpu_debug_state.dbg_wvr[rd->reg]; + __u64 *r = &vcpu->arch.vcpu_debug_state.dbg_wvr[rd->CRm];
if (copy_from_user(r, uaddr, KVM_REG_SIZE(reg->id)) != 0) return -EFAULT; @@ -575,7 +575,7 @@ static int set_wvr(struct kvm_vcpu *vcpu static int get_wvr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, const struct kvm_one_reg *reg, void __user *uaddr) { - __u64 *r = &vcpu->arch.vcpu_debug_state.dbg_wvr[rd->reg]; + __u64 *r = &vcpu->arch.vcpu_debug_state.dbg_wvr[rd->CRm];
if (copy_to_user(uaddr, r, KVM_REG_SIZE(reg->id)) != 0) return -EFAULT; @@ -585,21 +585,21 @@ static int get_wvr(struct kvm_vcpu *vcpu static void reset_wvr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd) { - vcpu->arch.vcpu_debug_state.dbg_wvr[rd->reg] = rd->val; + vcpu->arch.vcpu_debug_state.dbg_wvr[rd->CRm] = rd->val; }
static bool trap_wcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p, const struct sys_reg_desc *rd) { - u64 *dbg_reg = &vcpu->arch.vcpu_debug_state.dbg_wcr[rd->reg]; + u64 *dbg_reg = &vcpu->arch.vcpu_debug_state.dbg_wcr[rd->CRm];
if (p->is_write) reg_to_dbg(vcpu, p, dbg_reg); else dbg_to_reg(vcpu, p, dbg_reg);
- trace_trap_reg(__func__, rd->reg, p->is_write, *dbg_reg); + trace_trap_reg(__func__, rd->CRm, p->is_write, *dbg_reg);
return true; } @@ -607,7 +607,7 @@ static bool trap_wcr(struct kvm_vcpu *vc static int set_wcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, const struct kvm_one_reg *reg, void __user *uaddr) { - __u64 *r = &vcpu->arch.vcpu_debug_state.dbg_wcr[rd->reg]; + __u64 *r = &vcpu->arch.vcpu_debug_state.dbg_wcr[rd->CRm];
if (copy_from_user(r, uaddr, KVM_REG_SIZE(reg->id)) != 0) return -EFAULT; @@ -617,7 +617,7 @@ static int set_wcr(struct kvm_vcpu *vcpu static int get_wcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, const struct kvm_one_reg *reg, void __user *uaddr) { - __u64 *r = &vcpu->arch.vcpu_debug_state.dbg_wcr[rd->reg]; + __u64 *r = &vcpu->arch.vcpu_debug_state.dbg_wcr[rd->CRm];
if (copy_to_user(uaddr, r, KVM_REG_SIZE(reg->id)) != 0) return -EFAULT; @@ -627,7 +627,7 @@ static int get_wcr(struct kvm_vcpu *vcpu static void reset_wcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd) { - vcpu->arch.vcpu_debug_state.dbg_wcr[rd->reg] = rd->val; + vcpu->arch.vcpu_debug_state.dbg_wcr[rd->CRm] = rd->val; }
static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
From: Vitaly Kuznetsov vkuznets@redhat.com
commit 8b79feffeca28c5459458fe78676b081e87c93a4 upstream.
Various PV features (Async PF, PV EOI, steal time) work through memory shared with hypervisor and when we restore from hibernation we must properly teardown all these features to make sure hypervisor doesn't write to stale locations after we jump to the previously hibernated kernel (which can try to place anything there). For secondary CPUs the job is already done by kvm_cpu_down_prepare(), register syscore ops to do the same for boot CPU.
Krzysztof: This fixes memory corruption visible after second resume from hibernation:
BUG: Bad page state in process dbus-daemon pfn:18b01 page:ffffea000062c040 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 compound_mapcount: -30591 flags: 0xfffffc0078141(locked|error|workingset|writeback|head|mappedtodisk|reclaim) raw: 000fffffc0078141 dead0000000002d0 dead000000000100 0000000000000000 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set bad because of flags: 0x78141(locked|error|workingset|writeback|head|mappedtodisk|reclaim)
Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com Message-Id: 20210414123544.1060604-3-vkuznets@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Andrea Righi andrea.righi@canonical.com [krzysztof: Extend the commit message, adjust for v5.10 context] Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@canonical.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/kvm.c | 57 +++++++++++++++++++++++++++++++++++--------------- 1 file changed, 41 insertions(+), 16 deletions(-)
--- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -26,6 +26,7 @@ #include <linux/kprobes.h> #include <linux/nmi.h> #include <linux/swait.h> +#include <linux/syscore_ops.h> #include <asm/timer.h> #include <asm/cpu.h> #include <asm/traps.h> @@ -460,6 +461,25 @@ static bool pv_tlb_flush_supported(void)
static DEFINE_PER_CPU(cpumask_var_t, __pv_cpu_mask);
+static void kvm_guest_cpu_offline(void) +{ + kvm_disable_steal_time(); + if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) + wrmsrl(MSR_KVM_PV_EOI_EN, 0); + kvm_pv_disable_apf(); + apf_task_wake_all(); +} + +static int kvm_cpu_online(unsigned int cpu) +{ + unsigned long flags; + + local_irq_save(flags); + kvm_guest_cpu_init(); + local_irq_restore(flags); + return 0; +} + #ifdef CONFIG_SMP
static bool pv_ipi_supported(void) @@ -587,31 +607,34 @@ static void __init kvm_smp_prepare_boot_ kvm_spinlock_init(); }
-static void kvm_guest_cpu_offline(void) +static int kvm_cpu_down_prepare(unsigned int cpu) { - kvm_disable_steal_time(); - if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) - wrmsrl(MSR_KVM_PV_EOI_EN, 0); - kvm_pv_disable_apf(); - apf_task_wake_all(); -} + unsigned long flags;
-static int kvm_cpu_online(unsigned int cpu) -{ - local_irq_disable(); - kvm_guest_cpu_init(); - local_irq_enable(); + local_irq_save(flags); + kvm_guest_cpu_offline(); + local_irq_restore(flags); return 0; }
-static int kvm_cpu_down_prepare(unsigned int cpu) +#endif + +static int kvm_suspend(void) { - local_irq_disable(); kvm_guest_cpu_offline(); - local_irq_enable(); + return 0; } -#endif + +static void kvm_resume(void) +{ + kvm_cpu_online(raw_smp_processor_id()); +} + +static struct syscore_ops kvm_syscore_ops = { + .suspend = kvm_suspend, + .resume = kvm_resume, +};
static void kvm_flush_tlb_others(const struct cpumask *cpumask, const struct flush_tlb_info *info) @@ -681,6 +704,8 @@ static void __init kvm_guest_init(void) kvm_guest_cpu_init(); #endif
+ register_syscore_ops(&kvm_syscore_ops); + /* * Hard lockup detection is enabled by default. Disable it, as guests * can get false positives too easily, for example if the host is
From: Vitaly Kuznetsov vkuznets@redhat.com
commit c02027b5742b5aa804ef08a4a9db433295533046 upstream.
Currenly, we disable kvmclock from machine_shutdown() hook and this only happens for boot CPU. We need to disable it for all CPUs to guard against memory corruption e.g. on restore from hibernate.
Note, writing '0' to kvmclock MSR doesn't clear memory location, it just prevents hypervisor from updating the location so for the short while after write and while CPU is still alive, the clock remains usable and correct so we don't need to switch to some other clocksource.
Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com Message-Id: 20210414123544.1060604-4-vkuznets@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Andrea Righi andrea.righi@canonical.com Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@canonical.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/kvm_para.h | 4 ++-- arch/x86/kernel/kvm.c | 1 + arch/x86/kernel/kvmclock.c | 5 +---- 3 files changed, 4 insertions(+), 6 deletions(-)
--- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -7,8 +7,6 @@ #include <linux/interrupt.h> #include <uapi/asm/kvm_para.h>
-extern void kvmclock_init(void); - #ifdef CONFIG_KVM_GUEST bool kvm_check_and_clear_guest_paused(void); #else @@ -86,6 +84,8 @@ static inline long kvm_hypercall4(unsign }
#ifdef CONFIG_KVM_GUEST +void kvmclock_init(void); +void kvmclock_disable(void); bool kvm_para_available(void); unsigned int kvm_arch_para_features(void); unsigned int kvm_arch_para_hints(void); --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -468,6 +468,7 @@ static void kvm_guest_cpu_offline(void) wrmsrl(MSR_KVM_PV_EOI_EN, 0); kvm_pv_disable_apf(); apf_task_wake_all(); + kvmclock_disable(); }
static int kvm_cpu_online(unsigned int cpu) --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -221,11 +221,9 @@ static void kvm_crash_shutdown(struct pt } #endif
-static void kvm_shutdown(void) +void kvmclock_disable(void) { native_write_msr(msr_kvm_system_time, 0, 0); - kvm_disable_steal_time(); - native_machine_shutdown(); }
static void __init kvmclock_init_mem(void) @@ -352,7 +350,6 @@ void __init kvmclock_init(void) #endif x86_platform.save_sched_clock_state = kvm_save_sched_clock_state; x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state; - machine_ops.shutdown = kvm_shutdown; #ifdef CONFIG_KEXEC_CORE machine_ops.crash_shutdown = kvm_crash_shutdown; #endif
From: Vitaly Kuznetsov vkuznets@redhat.com
commit 3d6b84132d2a57b5a74100f6923a8feb679ac2ce upstream.
Crash shutdown handler only disables kvmclock and steal time, other PV features remain active so we risk corrupting memory or getting some side-effects in kdump kernel. Move crash handler to kvm.c and unify with CPU offline.
Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com Message-Id: 20210414123544.1060604-5-vkuznets@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@canonical.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/kvm_para.h | 6 ----- arch/x86/kernel/kvm.c | 44 +++++++++++++++++++++++++++++----------- arch/x86/kernel/kvmclock.c | 21 ------------------- 3 files changed, 32 insertions(+), 39 deletions(-)
--- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -92,7 +92,6 @@ unsigned int kvm_arch_para_hints(void); void kvm_async_pf_task_wait_schedule(u32 token); void kvm_async_pf_task_wake(u32 token); u32 kvm_read_and_reset_apf_flags(void); -void kvm_disable_steal_time(void); bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token);
DECLARE_STATIC_KEY_FALSE(kvm_async_pf_enabled); @@ -137,11 +136,6 @@ static inline u32 kvm_read_and_reset_apf return 0; }
-static inline void kvm_disable_steal_time(void) -{ - return; -} - static __always_inline bool kvm_handle_async_pf(struct pt_regs *regs, u32 token) { return false; --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -38,6 +38,7 @@ #include <asm/tlb.h> #include <asm/cpuidle_haltpoll.h> #include <asm/ptrace.h> +#include <asm/reboot.h> #include <asm/svm.h>
DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled); @@ -375,6 +376,14 @@ static void kvm_pv_disable_apf(void) pr_info("Unregister pv shared memory for cpu %d\n", smp_processor_id()); }
+static void kvm_disable_steal_time(void) +{ + if (!has_steal_clock) + return; + + wrmsr(MSR_KVM_STEAL_TIME, 0, 0); +} + static void kvm_pv_guest_cpu_reboot(void *unused) { /* @@ -417,14 +426,6 @@ static u64 kvm_steal_clock(int cpu) return steal; }
-void kvm_disable_steal_time(void) -{ - if (!has_steal_clock) - return; - - wrmsr(MSR_KVM_STEAL_TIME, 0, 0); -} - static inline void __set_percpu_decrypted(void *ptr, unsigned long size) { early_set_memory_decrypted((unsigned long) ptr, size); @@ -461,13 +462,14 @@ static bool pv_tlb_flush_supported(void)
static DEFINE_PER_CPU(cpumask_var_t, __pv_cpu_mask);
-static void kvm_guest_cpu_offline(void) +static void kvm_guest_cpu_offline(bool shutdown) { kvm_disable_steal_time(); if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) wrmsrl(MSR_KVM_PV_EOI_EN, 0); kvm_pv_disable_apf(); - apf_task_wake_all(); + if (!shutdown) + apf_task_wake_all(); kvmclock_disable(); }
@@ -613,7 +615,7 @@ static int kvm_cpu_down_prepare(unsigned unsigned long flags;
local_irq_save(flags); - kvm_guest_cpu_offline(); + kvm_guest_cpu_offline(false); local_irq_restore(flags); return 0; } @@ -622,7 +624,7 @@ static int kvm_cpu_down_prepare(unsigned
static int kvm_suspend(void) { - kvm_guest_cpu_offline(); + kvm_guest_cpu_offline(false);
return 0; } @@ -637,6 +639,20 @@ static struct syscore_ops kvm_syscore_op .resume = kvm_resume, };
+/* + * After a PV feature is registered, the host will keep writing to the + * registered memory location. If the guest happens to shutdown, this memory + * won't be valid. In cases like kexec, in which you install a new kernel, this + * means a random memory location will be kept being written. + */ +#ifdef CONFIG_KEXEC_CORE +static void kvm_crash_shutdown(struct pt_regs *regs) +{ + kvm_guest_cpu_offline(true); + native_machine_crash_shutdown(regs); +} +#endif + static void kvm_flush_tlb_others(const struct cpumask *cpumask, const struct flush_tlb_info *info) { @@ -705,6 +721,10 @@ static void __init kvm_guest_init(void) kvm_guest_cpu_init(); #endif
+#ifdef CONFIG_KEXEC_CORE + machine_ops.crash_shutdown = kvm_crash_shutdown; +#endif + register_syscore_ops(&kvm_syscore_ops);
/* --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -20,7 +20,6 @@ #include <asm/hypervisor.h> #include <asm/mem_encrypt.h> #include <asm/x86_init.h> -#include <asm/reboot.h> #include <asm/kvmclock.h>
static int kvmclock __initdata = 1; @@ -204,23 +203,6 @@ static void kvm_setup_secondary_clock(vo } #endif
-/* - * After the clock is registered, the host will keep writing to the - * registered memory location. If the guest happens to shutdown, this memory - * won't be valid. In cases like kexec, in which you install a new kernel, this - * means a random memory location will be kept being written. So before any - * kind of shutdown from our side, we unregister the clock by writing anything - * that does not have the 'enable' bit set in the msr - */ -#ifdef CONFIG_KEXEC_CORE -static void kvm_crash_shutdown(struct pt_regs *regs) -{ - native_write_msr(msr_kvm_system_time, 0, 0); - kvm_disable_steal_time(); - native_machine_crash_shutdown(regs); -} -#endif - void kvmclock_disable(void) { native_write_msr(msr_kvm_system_time, 0, 0); @@ -350,9 +332,6 @@ void __init kvmclock_init(void) #endif x86_platform.save_sched_clock_state = kvm_save_sched_clock_state; x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state; -#ifdef CONFIG_KEXEC_CORE - machine_ops.crash_shutdown = kvm_crash_shutdown; -#endif kvm_get_preset_lpj();
/*
From: Gao Xiang hsiangkao@redhat.com
commit 89b158635ad79574bde8e94d45dad33f8cf09549 upstream.
LZ4 final literal copy could be overlapped when doing in-place decompression, so it's unsafe to just use memcpy() on an optimized memcpy approach but memmove() instead.
Upstream LZ4 has updated this years ago [1] (and the impact is non-sensible [2] plus only a few bytes remain), this commit just synchronizes LZ4 upstream code to the kernel side as well.
It can be observed as EROFS in-place decompression failure on specific files when X86_FEATURE_ERMS is unsupported, memcpy() optimization of commit 59daa706fbec ("x86, mem: Optimize memcpy by avoiding memory false dependece") will be enabled then.
Currently most modern x86-CPUs support ERMS, these CPUs just use "rep movsb" approach so no problem at all. However, it can still be verified with forcely disabling ERMS feature...
arch/x86/lib/memcpy_64.S: ALTERNATIVE_2 "jmp memcpy_orig", "", X86_FEATURE_REP_GOOD, \ - "jmp memcpy_erms", X86_FEATURE_ERMS + "jmp memcpy_orig", X86_FEATURE_ERMS
We didn't observe any strange on arm64/arm/x86 platform before since most memcpy() would behave in an increasing address order ("copy upwards" [3]) and it's the correct order of in-place decompression but it really needs an update to memmove() for sure considering it's an undefined behavior according to the standard and some unique optimization already exists in the kernel.
[1] https://github.com/lz4/lz4/commit/33cb8518ac385835cc17be9a770b27b40cd0e15b [2] https://github.com/lz4/lz4/pull/717#issuecomment-497818921 [3] https://sourceware.org/bugzilla/show_bug.cgi?id=12518
Link: https://lkml.kernel.org/r/20201122030749.2698994-1-hsiangkao@redhat.com Signed-off-by: Gao Xiang hsiangkao@redhat.com Reviewed-by: Nick Terrell terrelln@fb.com Cc: Yann Collet yann.collet.73@gmail.com Cc: Miao Xie miaoxie@huawei.com Cc: Chao Yu yuchao0@huawei.com Cc: Li Guifu bluce.liguifu@huawei.com Cc: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- lib/lz4/lz4_decompress.c | 6 +++++- lib/lz4/lz4defs.h | 1 + 2 files changed, 6 insertions(+), 1 deletion(-)
--- a/lib/lz4/lz4_decompress.c +++ b/lib/lz4/lz4_decompress.c @@ -263,7 +263,11 @@ static FORCE_INLINE int LZ4_decompress_g } }
- LZ4_memcpy(op, ip, length); + /* + * supports overlapping memory regions; only matters + * for in-place decompression scenarios + */ + LZ4_memmove(op, ip, length); ip += length; op += length;
--- a/lib/lz4/lz4defs.h +++ b/lib/lz4/lz4defs.h @@ -146,6 +146,7 @@ static FORCE_INLINE void LZ4_writeLE16(v * environments. This is needed when decompressing the Linux Kernel, for example. */ #define LZ4_memcpy(dst, src, size) __builtin_memcpy(dst, src, size) +#define LZ4_memmove(dst, src, size) __builtin_memmove(dst, src, size)
static FORCE_INLINE void LZ4_copy8(void *dst, const void *src) {
From: Roja Rani Yarubandi rojay@codeaurora.org
commit 57648e860485de39c800a89f849fdd03c2d31d15 upstream.
Mark bus as suspended during system suspend to block the future transfers. Implement geni_i2c_resume_noirq() to resume the bus.
Fixes: 37692de5d523 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller") Signed-off-by: Roja Rani Yarubandi rojay@codeaurora.org Reviewed-by: Stephen Boyd swboyd@chromium.org Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i2c/busses/i2c-qcom-geni.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/drivers/i2c/busses/i2c-qcom-geni.c +++ b/drivers/i2c/busses/i2c-qcom-geni.c @@ -702,6 +702,8 @@ static int __maybe_unused geni_i2c_suspe { struct geni_i2c_dev *gi2c = dev_get_drvdata(dev);
+ i2c_mark_adapter_suspended(&gi2c->adap); + if (!gi2c->suspended) { geni_i2c_runtime_suspend(dev); pm_runtime_disable(dev); @@ -711,8 +713,16 @@ static int __maybe_unused geni_i2c_suspe return 0; }
+static int __maybe_unused geni_i2c_resume_noirq(struct device *dev) +{ + struct geni_i2c_dev *gi2c = dev_get_drvdata(dev); + + i2c_mark_adapter_resumed(&gi2c->adap); + return 0; +} + static const struct dev_pm_ops geni_i2c_pm_ops = { - SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(geni_i2c_suspend_noirq, NULL) + SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(geni_i2c_suspend_noirq, geni_i2c_resume_noirq) SET_RUNTIME_PM_OPS(geni_i2c_runtime_suspend, geni_i2c_runtime_resume, NULL) };
From: Pablo Neira Ayuso pablo@netfilter.org
commit c781471d67a56d7d4c113669a11ede0463b5c719 upstream.
Sometimes users forget to turn on nftables extensions from Kconfig that they need. In such case, the error reporting from userspace is misleading:
$ sudo nft add rule x y counter Error: Could not process rule: No such file or directory add rule x y counter ^^^^^^^^^^^^^^^^^^^^
Add missing NL_SET_BAD_ATTR() to provide a hint:
$ nft add rule x y counter Error: Could not process rule: No such file or directory add rule x y counter ^^^^^^^
Fixes: 83d9dcba06c5 ("netfilter: nf_tables: extended netlink error reporting for expressions") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3263,8 +3263,10 @@ static int nf_tables_newrule(struct net if (n == NFT_RULE_MAXEXPRS) goto err1; err = nf_tables_expr_parse(&ctx, tmp, &info[n]); - if (err < 0) + if (err < 0) { + NL_SET_BAD_ATTR(extack, tmp); goto err1; + } size += info[n].ops->size; n++; }
From: Roger Pau Monne roger.pau@citrix.com
commit 107866a8eb0b664675a260f1ba0655010fac1e08 upstream.
Do this in order to prevent the task from being freed if the thread returns (which can be triggered by the frontend) before the call to kthread_stop done as part of the backend tear down. Not taking the reference will lead to a use-after-free in that scenario. Such reference was taken before but dropped as part of the rework done in 2ac061ce97f4.
Reintroduce the reference taking and add a comment this time explaining why it's needed.
This is XSA-374 / CVE-2021-28691.
Fixes: 2ac061ce97f4 ('xen/netback: cleanup init and deinit code') Signed-off-by: Roger Pau Monné roger.pau@citrix.com Cc: stable@vger.kernel.org Reviewed-by: Jan Beulich jbeulich@suse.com Reviewed-by: Juergen Gross jgross@suse.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/xen-netback/interface.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -685,6 +685,7 @@ static void xenvif_disconnect_queue(stru { if (queue->task) { kthread_stop(queue->task); + put_task_struct(queue->task); queue->task = NULL; }
@@ -745,6 +746,11 @@ int xenvif_connect_data(struct xenvif_qu if (IS_ERR(task)) goto kthread_err; queue->task = task; + /* + * Take a reference to the task in order to prevent it from being freed + * if the thread function returns before kthread_stop is called. + */ + get_task_struct(task);
task = kthread_run(xenvif_dealloc_kthread, queue, "%s-dealloc", queue->name);
From: David Ahern dsahern@kernel.org
commit 7a6b1ab7475fd6478eeaf5c9d1163e7a18125c8f upstream.
IFF_POINTOPOINT interfaces use NUD_NOARP entries for IPv6. It's possible to fill up the neighbour table with enough entries that it will overflow for valid connections after that.
This behaviour is more prevalent after commit 58956317c8de ("neighbor: Improve garbage collection") is applied, as it prevents removal from entries that are not NUD_FAILED, unless they are more than 5s old.
Fixes: 58956317c8de (neighbor: Improve garbage collection) Reported-by: Kasper Dupont kasperd@gjkwv.06.feb.2021.kasperd.net Signed-off-by: Thadeu Lima de Souza Cascardo cascardo@canonical.com Signed-off-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/neighbour.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -239,6 +239,7 @@ static int neigh_forced_gc(struct neigh_
write_lock(&n->lock); if ((n->nud_state == NUD_FAILED) || + (n->nud_state == NUD_NOARP) || (tbl->is_multicast && tbl->is_multicast(n->primary_key)) || time_after(tref, n->updated))
On 6/8/21 12:25 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.43 release. There are 137 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Jun 2021 17:59:18 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.43-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On Tue, 08 Jun 2021 20:25:40 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.43 release. There are 137 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Jun 2021 17:59:18 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.43-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.10: 12 builds: 12 pass, 0 fail 28 boots: 28 pass, 0 fail 70 tests: 70 pass, 0 fail
Linux version: 5.10.43-rc1-gc108263eaf06 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On Wed, 9 Jun 2021 at 00:09, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.10.43 release. There are 137 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Jun 2021 17:59:18 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.43-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 5.10.43-rc1 * git: ['https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git', 'https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc'] * git branch: linux-5.10.y * git commit: c108263eaf06733c15c23cb4fdf83d188c0e2881 * git describe: v5.10.42-138-gc108263eaf06 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10....
## No regressions (compared to v5.10.42)
## No fixes (compared to v5.10.42)
## Test result summary total: 79560, pass: 65027, fail: 2540, skip: 11224, xfail: 769,
## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 193 total, 193 passed, 0 failed * arm64: 27 total, 27 passed, 0 failed * dragonboard-410c: 1 total, 1 passed, 0 failed * hi6220-hikey: 1 total, 1 passed, 0 failed * i386: 26 total, 26 passed, 0 failed * juno-r2: 1 total, 1 passed, 0 failed * mips: 42 total, 42 passed, 0 failed * parisc: 9 total, 9 passed, 0 failed * powerpc: 27 total, 27 passed, 0 failed * riscv: 18 total, 18 passed, 0 failed * s390: 18 total, 18 passed, 0 failed * sh: 18 total, 18 passed, 0 failed * sparc: 9 total, 9 passed, 0 failed * x15: 1 total, 1 passed, 0 failed * x86: 1 total, 1 passed, 0 failed * x86_64: 27 total, 27 passed, 0 failed
## Test suites summary * fwts * install-android-platform-tools-r2600 * kselftest- * kselftest-android * kselftest-bpf * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-lkdtm * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-vsyscall-mode-native- * kselftest-vsyscall-mode-none- * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libhugetlbfs * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * packetdrill * perf * rcutorture * ssuite * v4l2-compliance
-- Linaro LKFT https://lkft.linaro.org
On 6/8/2021 11:25 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.43 release. There are 137 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Jun 2021 17:59:18 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.43-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB, using 32-bit and 64-bit ARM kernels:
Tested-by: Florian Fainelli f.fainelli@gmail.com
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.10.43-rc1
David Ahern dsahern@kernel.org neighbour: allow NUD_NOARP entries to be forced GCed
Roger Pau Monne roger.pau@citrix.com xen-netback: take a reference to the RX task thread
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: missing error reporting for not selected expressions
Roja Rani Yarubandi rojay@codeaurora.org i2c: qcom-geni: Suspend and resume the bus during SYSTEM_SLEEP_PM ops
Gao Xiang hsiangkao@redhat.com lib/lz4: explicitly support in-place decompression
Vitaly Kuznetsov vkuznets@redhat.com x86/kvm: Disable all PV features on crash
Vitaly Kuznetsov vkuznets@redhat.com x86/kvm: Disable kvmclock on all CPUs on shutdown
Vitaly Kuznetsov vkuznets@redhat.com x86/kvm: Teardown PV features on boot CPU as well
Marc Zyngier maz@kernel.org KVM: arm64: Fix debug register indexing
Sean Christopherson seanjc@google.com KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
Anand Jain anand.jain@oracle.com btrfs: fix unmountable seed device after fstrim
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: always use mdp device to scale bandwidth
Mina Almasry almasrymina@google.com mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
Filipe Manana fdmanana@suse.com btrfs: fix deadlock when cloning inline extents and low on available space
Josef Bacik josef@toxicpanda.com btrfs: abort in rename_exchange if we fail to insert the second ref
Josef Bacik josef@toxicpanda.com btrfs: fixup error handling in fixup_inode_link_counts
Josef Bacik josef@toxicpanda.com btrfs: return errors from btrfs_del_csums in cleanup_ref_head
Josef Bacik josef@toxicpanda.com btrfs: fix error handling in btrfs_del_csums
Josef Bacik josef@toxicpanda.com btrfs: mark ordered extent and inode with error if we fail to finish
Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com powerpc/kprobes: Fix validation of prefixed instructions across page boundary
Thomas Gleixner tglx@linutronix.de x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
Nirmoy Das nirmoy.das@amd.com drm/amdgpu: make sure we unpin the UVD BO
Luben Tuikov luben.tuikov@amd.com drm/amdgpu: Don't query CE and UE errors
Krzysztof Kozlowski krzysztof.kozlowski@canonical.com nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
Pu Wen puwen@hygon.cn x86/sev: Check SME/SEV support in CPUID first
Thomas Gleixner tglx@linutronix.de x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
Ding Hui dinghui@sangfor.com.cn mm/page_alloc: fix counting of free pages after take off from buddy
Gerald Schaefer gerald.schaefer@linux.ibm.com mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
Junxiao Bi junxiao.bi@oracle.com ocfs2: fix data corruption by fallocate
Mark Rutland mark.rutland@arm.com pid: take a reference when initializing `cad_pid`
Phil Elwell phil@raspberrypi.com usb: dwc2: Fix build in periphal-only mode
Ritesh Harjani riteshh@linux.ibm.com ext4: fix accessing uninit percpu counter variable with fast_commit
Phillip Potter phil@philpotter.co.uk ext4: fix memory leak in ext4_mb_init_backend on error path.
Harshad Shirwadkar harshadshirwadkar@gmail.com ext4: fix fast commit alignment issues
Ye Bin yebin10@huawei.com ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
Alexey Makhalov amakhalov@vmware.com ext4: fix memory leak in ext4_fill_super
Marek Vasut marex@denx.de ARM: dts: imx6q-dhcom: Add PU,VDD1P1,VDD2P5 regulators
Michal Vokáč michal.vokac@ysoft.com ARM: dts: imx6dl-yapp4: Fix RGMII connection to QCA8334 switch
Hui Wang hui.wang@canonical.com ALSA: hda: update the power_state during the direct-complete
Carlos M carlos.marr.pz@gmail.com ALSA: hda: Fix for mute key LED for HP Pavilion 15-CK0xx
Takashi Iwai tiwai@suse.de ALSA: timer: Fix master timer notification
Bob Peterson rpeterso@redhat.com gfs2: fix scheduling while atomic bug in glocks
Ahelenia Ziemiańska nabijaczleweli@nabijaczleweli.xyz HID: multitouch: require Finger field to mark Win8 reports as MT
Johan Hovold johan@kernel.org HID: magicmouse: fix NULL-deref on disconnect
Johnny Chuang johnny.chuang.emc@gmail.com HID: i2c-hid: Skip ELAN power-on command after reset
Pavel Skripkin paskripkin@gmail.com net: caif: fix memory leak in cfusbl_device_notify
Pavel Skripkin paskripkin@gmail.com net: caif: fix memory leak in caif_device_notify
Pavel Skripkin paskripkin@gmail.com net: caif: add proper error handling
Pavel Skripkin paskripkin@gmail.com net: caif: added cfserl_release function
Jason A. Donenfeld Jason@zx2c4.com wireguard: allowedips: free empty intermediate nodes when removing single node
Jason A. Donenfeld Jason@zx2c4.com wireguard: allowedips: allocate nodes in kmem_cache
Jason A. Donenfeld Jason@zx2c4.com wireguard: allowedips: remove nodes in O(1)
Jason A. Donenfeld Jason@zx2c4.com wireguard: allowedips: initialize list head in selftest
Jason A. Donenfeld Jason@zx2c4.com wireguard: selftests: make sure rp_filter is disabled on vethc
Jason A. Donenfeld Jason@zx2c4.com wireguard: selftests: remove old conntrack kconfig value
Jason A. Donenfeld Jason@zx2c4.com wireguard: use synchronize_net rather than synchronize_rcu
Jason A. Donenfeld Jason@zx2c4.com wireguard: peer: allocate in kmem_cache
Jason A. Donenfeld Jason@zx2c4.com wireguard: do not use -O3
Lin Ma linma@zju.edu.cn Bluetooth: use correct lock to prevent UAF of hdev object
Lin Ma linma@zju.edu.cn Bluetooth: fix the erroneous flush_work() order
James Zhu James.Zhu@amd.com drm/amdgpu/jpeg3: add cancel_delayed_work_sync before power gate
James Zhu James.Zhu@amd.com drm/amdgpu/jpeg2.5: add cancel_delayed_work_sync before power gate
James Zhu James.Zhu@amd.com drm/amdgpu/vcn3: add cancel_delayed_work_sync before power gate
Pavel Begunkov asml.silence@gmail.com io_uring: use better types for cflags
Pavel Begunkov asml.silence@gmail.com io_uring: fix link timeout refs
Jisheng Zhang jszhang@kernel.org riscv: vdso: fix and clean-up Makefile
Johan Hovold johan@kernel.org serial: stm32: fix threaded interrupt handling
Hoang Le hoang.h.le@dektech.com.au tipc: fix unique bearer names sanity check
Hoang Le hoang.h.le@dektech.com.au tipc: add extack messages for bearer/media failure
Tony Lindgren tony@atomide.com bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
Geert Uytterhoeven geert+renesas@glider.be ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
Fabio Estevam festevam@gmail.com ARM: dts: imx7d-pico: Fix the 'tuning-step' property
Fabio Estevam festevam@gmail.com ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
Michael Walle michael@walle.cc arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
Lucas Stach l.stach@pengutronix.de arm64: dts: zii-ultra: fix 12V_MAIN voltage
Michael Walle michael@walle.cc arm64: dts: ls1028a: fix memory node
Tony Lindgren tony@atomide.com bus: ti-sysc: Fix am335x resume hang for usb otg module
Jens Wiklander jens.wiklander@linaro.org optee: use export_uuid() to copy client UUID
Vignesh Raghavendra vigneshr@ti.com arm64: dts: ti: j7200-main: Mark Main NAVSS as dma-coherent
Magnus Karlsson magnus.karlsson@intel.com ixgbe: add correct exception tracing for XDP
Magnus Karlsson magnus.karlsson@intel.com ixgbe: optimize for XDP_REDIRECT in xsk path
Magnus Karlsson magnus.karlsson@intel.com ice: add correct exception tracing for XDP
Magnus Karlsson magnus.karlsson@intel.com ice: optimize for XDP_REDIRECT in xsk path
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: simplify ice_run_xdp
Magnus Karlsson magnus.karlsson@intel.com i40e: add correct exception tracing for XDP
Magnus Karlsson magnus.karlsson@intel.com i40e: optimize for XDP_REDIRECT in xsk path
Rahul Lakkireddy rahul.lakkireddy@chelsio.com cxgb4: avoid link re-train during TC-MQPRIO configuration
Roja Rani Yarubandi rojay@codeaurora.org i2c: qcom-geni: Add shutdown callback for i2c
Dave Ertman david.m.ertman@intel.com ice: Allow all LLDP packets from PF to Tx
Paul Greenwalt paul.greenwalt@intel.com ice: report supported and advertised autoneg using PHY capabilities
Haiyue Wang haiyue.wang@intel.com ice: handle the VF VSI rebuild failure
Brett Creeley brett.creeley@intel.com ice: Fix VFR issues for AVF drivers that expect ATQLEN cleared
Brett Creeley brett.creeley@intel.com ice: Fix allowing VF to request more/less queues via virtchnl
Coco Li lixiaoyan@google.com ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions
Rahul Lakkireddy rahul.lakkireddy@chelsio.com cxgb4: fix regression with HASH tc prio value update
Magnus Karlsson magnus.karlsson@intel.com ixgbevf: add correct exception tracing for XDP
Magnus Karlsson magnus.karlsson@intel.com igb: add correct exception tracing for XDP
Wei Yongjun weiyongjun1@huawei.com ieee802154: fix error return code in ieee802154_llsec_getparams()
Zhen Lei thunder.leizhen@huawei.com ieee802154: fix error return code in ieee802154_add_iface()
Daniel Borkmann daniel@iogearbox.net bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks
Tobias Klauser tklauser@distanz.ch bpf: Simplify cases in bpf_base_func_proto
Zhihao Cheng chengzhihao1@huawei.com drm/i915/selftests: Fix return value check in live_breadcrumbs_smoketest()
Pablo Neira Ayuso pablo@netfilter.org netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_ct: skip expectations for confirmed conntrack
Max Gurtovoy mgurtovoy@nvidia.com nvmet: fix freeing unallocated p2pmem
Yevgeny Kliteynik kliteyn@nvidia.com net/mlx5: DR, Create multi-destination flow table with level less than 64
Roi Dayan roid@nvidia.com net/mlx5e: Check for needed capability for cvlan matching
Moshe Shemesh moshe@nvidia.com net/mlx5: Check firmware sync reset requested is set before trying to abort it
Aya Levin ayal@nvidia.com net/mlx5e: Fix incompatible casting
Maxim Mikityanskiy maximmi@nvidia.com net/tls: Fix use-after-free after the TLS device goes down and up
Maxim Mikityanskiy maximmi@nvidia.com net/tls: Replace TLS_RX_SYNC_RUNNING with RCU
Alexander Aring aahringo@redhat.com net: sock: fix in-kernel mark setting
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: tag_8021q: fix the VLAN IDs used for encoding sub-VLANs
Li Huafei lihuafei1@huawei.com perf probe: Fix NULL pointer dereference in convert_variable_location()
Erik Kaneda erik.kaneda@intel.com ACPICA: Clean up context mutex during object deletion
Sagi Grimberg sagi@grimberg.me nvme-rdma: fix in-casule data send for chained sgls
Paolo Abeni pabeni@redhat.com mptcp: always parse mptcp options for MPC reqsk
Ariel Levkovich lariel@nvidia.com net/sched: act_ct: Fix ct template allocation for zone 0
Paul Blakey paulb@nvidia.com net/sched: act_ct: Offload connections with commit action
Parav Pandit parav@nvidia.com devlink: Correct VIRTUAL port to not have phys_port attributes
Arnd Bergmann arnd@arndb.de HID: i2c-hid: fix format string mismatch
Zhen Lei thunder.leizhen@huawei.com HID: pidff: fix error return code in hid_pidff_init()
Tom Rix trix@redhat.com HID: logitech-hidpp: initialize level variable
Julian Anastasov ja@ssi.bg ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
Max Gurtovoy mgurtovoy@nvidia.com vfio/platform: fix module_put call in error flow
Wei Yongjun weiyongjun1@huawei.com samples: vfio-mdev: fix error handing in mdpy_fb_probe()
Randy Dunlap rdunlap@infradead.org vfio/pci: zap_vma_ptes() needs MMU
Zhen Lei thunder.leizhen@huawei.com vfio/pci: Fix error return code in vfio_ecap_init()
Rasmus Villemoes linux@rasmusvillemoes.dk efi: cper: fix snprintf() use in cper_dimm_err_location()
Dan Carpenter dan.carpenter@oracle.com efi/libstub: prevent read overflow in find_file_option()
Heiner Kallweit hkallweit1@gmail.com efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared
Changbin Du changbin.du@intel.com efi/fdt: fix panic when no valid fdt found
Florian Westphal fw@strlen.de netfilter: conntrack: unregister ipv4 sockopts on error unwind
Grant Peltier grantpeltier93@gmail.com hwmon: (pmbus/isl68137) remove READ_TEMPERATURE_3 for RAA228228
Armin Wolf W_Armin@gmx.de hwmon: (dell-smm-hwmon) Fix index values
Grant Grundler grundler@chromium.org net: usb: cdc_ncm: don't spew notifications
Josef Bacik josef@toxicpanda.com btrfs: tree-checker: do not error out if extent ref hash doesn't match
Diffstat:
Makefile | 4 +- arch/arm/boot/dts/imx6dl-yapp4-common.dtsi | 6 +- arch/arm/boot/dts/imx6q-dhcom-som.dtsi | 12 ++ arch/arm/boot/dts/imx6qdl-emcon-avari.dtsi | 2 +- arch/arm/boot/dts/imx7d-meerkat96.dts | 2 +- arch/arm/boot/dts/imx7d-pico.dtsi | 2 +- .../freescale/fsl-ls1028a-kontron-sl28-var4.dts | 5 +- arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi | 4 +- .../arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi | 4 +- arch/arm64/boot/dts/ti/k3-j7200-main.dtsi | 2 + arch/arm64/kvm/sys_regs.c | 42 ++--- arch/powerpc/kernel/kprobes.c | 4 +- arch/riscv/kernel/vdso/Makefile | 4 +- arch/x86/include/asm/apic.h | 1 + arch/x86/include/asm/disabled-features.h | 7 +- arch/x86/include/asm/fpu/api.h | 6 +- arch/x86/include/asm/fpu/internal.h | 7 - arch/x86/include/asm/kvm_para.h | 10 +- arch/x86/kernel/apic/apic.c | 1 + arch/x86/kernel/apic/vector.c | 20 +++ arch/x86/kernel/fpu/xstate.c | 57 ------- arch/x86/kernel/kvm.c | 92 +++++++--- arch/x86/kernel/kvmclock.c | 26 +-- arch/x86/kvm/svm/svm.c | 8 +- arch/x86/mm/mem_encrypt_identity.c | 11 +- drivers/acpi/acpica/utdelete.c | 8 + drivers/bus/ti-sysc.c | 57 ++++++- drivers/firmware/efi/cper.c | 4 +- drivers/firmware/efi/fdtparams.c | 3 + drivers/firmware/efi/libstub/file.c | 2 +- drivers/firmware/efi/memattr.c | 5 - drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 16 -- drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c | 4 +- drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c | 4 +- drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 1 + drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 5 +- drivers/gpu/drm/i915/selftests/i915_request.c | 4 +- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 3 +- drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c | 51 +----- drivers/hid/hid-logitech-hidpp.c | 1 + drivers/hid/hid-magicmouse.c | 2 +- drivers/hid/hid-multitouch.c | 10 +- drivers/hid/i2c-hid/i2c-hid-core.c | 13 +- drivers/hid/usbhid/hid-pidff.c | 1 + drivers/hwmon/dell-smm-hwmon.c | 4 +- drivers/hwmon/pmbus/isl68137.c | 4 +- drivers/i2c/busses/i2c-qcom-geni.c | 21 ++- drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 2 - drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 4 +- .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 14 +- .../net/ethernet/chelsio/cxgb4/cxgb4_tc_mqprio.c | 9 +- drivers/net/ethernet/chelsio/cxgb4/sge.c | 6 + drivers/net/ethernet/intel/i40e/i40e_txrx.c | 7 +- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 15 +- drivers/net/ethernet/intel/ice/ice_ethtool.c | 51 +----- drivers/net/ethernet/intel/ice/ice_hw_autogen.h | 1 + drivers/net/ethernet/intel/ice/ice_lib.c | 2 + drivers/net/ethernet/intel/ice/ice_txrx.c | 24 +-- drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 19 ++- drivers/net/ethernet/intel/ice/ice_xsk.c | 16 +- drivers/net/ethernet/intel/igb/igb_main.c | 10 +- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 +- drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 21 ++- drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 3 + .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 5 +- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 9 + drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c | 3 + .../ethernet/mellanox/mlx5/core/steering/dr_fw.c | 3 +- drivers/net/usb/cdc_ncm.c | 12 +- drivers/net/wireguard/Makefile | 3 +- drivers/net/wireguard/allowedips.c | 189 +++++++++++---------- drivers/net/wireguard/allowedips.h | 14 +- drivers/net/wireguard/main.c | 17 +- drivers/net/wireguard/peer.c | 27 ++- drivers/net/wireguard/peer.h | 3 + drivers/net/wireguard/selftest/allowedips.c | 165 +++++++++--------- drivers/net/wireguard/socket.c | 2 +- drivers/net/xen-netback/interface.c | 6 + drivers/nvme/host/rdma.c | 5 +- drivers/nvme/target/core.c | 33 ++-- drivers/tee/optee/call.c | 6 +- drivers/tee/optee/optee_msg.h | 6 +- drivers/tty/serial/stm32-usart.c | 22 +-- drivers/usb/dwc2/core_intr.c | 4 + drivers/vfio/pci/Kconfig | 1 + drivers/vfio/pci/vfio_pci_config.c | 2 +- drivers/vfio/platform/vfio_platform_common.c | 2 +- fs/btrfs/extent-tree.c | 12 +- fs/btrfs/file-item.c | 10 +- fs/btrfs/inode.c | 19 ++- fs/btrfs/reflink.c | 38 +++-- fs/btrfs/tree-checker.c | 16 +- fs/btrfs/tree-log.c | 13 +- fs/ext4/extents.c | 43 ++--- fs/ext4/fast_commit.c | 182 ++++++++++---------- fs/ext4/fast_commit.h | 7 - fs/ext4/ialloc.c | 6 +- fs/ext4/mballoc.c | 2 +- fs/ext4/super.c | 11 +- fs/gfs2/glock.c | 2 + fs/io_uring.c | 6 +- fs/ocfs2/file.c | 55 +++++- include/linux/mlx5/mlx5_ifc.h | 2 + include/linux/platform_data/ti-sysc.h | 1 + include/linux/usb/usbnet.h | 2 + include/net/caif/caif_dev.h | 2 +- include/net/caif/cfcnfg.h | 2 +- include/net/caif/cfserl.h | 1 + include/net/tls.h | 10 +- init/main.c | 2 +- kernel/bpf/helpers.c | 19 +-- kernel/trace/bpf_trace.c | 32 ++-- lib/lz4/lz4_decompress.c | 6 +- lib/lz4/lz4defs.h | 1 + mm/debug_vm_pgtable.c | 4 +- mm/hugetlb.c | 14 +- mm/page_alloc.c | 2 + net/bluetooth/hci_core.c | 7 +- net/bluetooth/hci_sock.c | 4 +- net/caif/caif_dev.c | 13 +- net/caif/caif_usb.c | 14 +- net/caif/cfcnfg.c | 16 +- net/caif/cfserl.c | 5 + net/core/devlink.c | 4 +- net/core/neighbour.c | 1 + net/core/sock.c | 16 +- net/dsa/tag_8021q.c | 2 +- net/ieee802154/nl-mac.c | 4 +- net/ieee802154/nl-phy.c | 4 +- net/ipv6/route.c | 8 +- net/mptcp/subflow.c | 17 +- net/netfilter/ipvs/ip_vs_ctl.c | 2 +- net/netfilter/nf_conntrack_proto.c | 2 +- net/netfilter/nf_tables_api.c | 4 +- net/netfilter/nfnetlink_cthelper.c | 8 +- net/netfilter/nft_ct.c | 2 +- net/nfc/llcp_sock.c | 2 + net/sched/act_ct.c | 10 +- net/tipc/bearer.c | 94 +++++++--- net/tls/tls_device.c | 60 +++++-- net/tls/tls_device_fallback.c | 7 + net/tls/tls_main.c | 1 + samples/vfio-mdev/mdpy-fb.c | 13 +- sound/core/timer.c | 3 +- sound/pci/hda/hda_codec.c | 5 + sound/pci/hda/patch_realtek.c | 1 + tools/perf/util/dwarf-aux.c | 8 +- tools/perf/util/probe-finder.c | 3 + tools/testing/selftests/wireguard/netns.sh | 1 + .../testing/selftests/wireguard/qemu/kernel.config | 1 - 150 files changed, 1260 insertions(+), 925 deletions(-)
On Tue, Jun 08, 2021 at 08:25:40PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.43 release. There are 137 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Jun 2021 17:59:18 +0000. Anything received after that time might be too late.
Build results: total: 156 pass: 156 fail: 0 Qemu test results: total: 455 pass: 455 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
Hi!
This is the start of the stable review cycle for the 5.10.43 release. There are 137 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
CIP testing did not find any problems here:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-5...
Tested-by: Pavel Machek (CIP) pavel@denx.de
Best regards, Pavel
Hi Greg,
On Tue, Jun 08, 2021 at 08:25:40PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.43 release. There are 137 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Jun 2021 17:59:18 +0000. Anything received after that time might be too late.
Build test: mips (gcc version 11.1.1 20210523): 63 configs -> no failure arm (gcc version 11.1.1 20210523): 105 configs -> no new failure arm64 (gcc version 11.1.1 20210523): 2 configs -> no failure x86_64 (gcc version 10.2.1 20210110): 2 configs -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression. arm64: Booted on rpi4b (4GB model). No regression.
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
-- Regards Sudip
linux-stable-mirror@lists.linaro.org