This is the start of the stable review cycle for the 5.10.112 release. There are 105 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 20 Apr 2022 12:11:14 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.112-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.10.112-rc1
Duoming Zhou duoming@zju.edu.cn ax25: Fix UAF bugs in ax25 timers
Duoming Zhou duoming@zju.edu.cn ax25: Fix NULL pointer dereferences in ax25 timers
Duoming Zhou duoming@zju.edu.cn ax25: fix NPD bug in ax25_disconnect
Duoming Zhou duoming@zju.edu.cn ax25: fix UAF bug in ax25_send_control()
Duoming Zhou duoming@zju.edu.cn ax25: Fix refcount leaks caused by ax25_cb_del()
Duoming Zhou duoming@zju.edu.cn ax25: fix UAF bugs of net_device caused by rebinding operation
Duoming Zhou duoming@zju.edu.cn ax25: fix reference count leaks of ax25_dev
Duoming Zhou duoming@zju.edu.cn ax25: add refcount in ax25_dev to avoid UAF bugs
Mike Christie michael.christie@oracle.com scsi: iscsi: Fix unbound endpoint error handling
Mike Christie michael.christie@oracle.com scsi: iscsi: Fix endpoint reuse regression
Chao Gao chao.gao@intel.com dma-direct: avoid redundant memory sync for swiotlb
Anna-Maria Behnsen anna-maria@linutronix.de timers: Fix warning condition in __run_timers()
Martin Povišer povik+lin@cutebit.org i2c: pasemi: Wait for write xfers to finish
Nadav Amit namit@vmware.com smp: Fix offline cpu check in flush_smp_call_function_queue()
Mikulas Patocka mpatocka@redhat.com dm integrity: fix memory corruption when tag_size is less than digest size
Nathan Chancellor nathan@kernel.org ARM: davinci: da850-evm: Avoid NULL pointer dereference
Paul Gortmaker paul.gortmaker@windriver.com tick/nohz: Use WARN_ON_ONCE() to prevent console saturation
Rei Yamamoto yamamoto.rei@jp.fujitsu.com genirq/affinity: Consider that CPUs on nodes can be unbalanced
Tomasz Moń desowin@gmail.com drm/amdgpu: Enable gfxoff quirk on MacBook Pro
Melissa Wen mwen@igalia.com drm/amd/display: don't ignore alpha property on pre-multiplied mode
Nicolas Dichtel nicolas.dichtel@6wind.com ipv6: fix panic when forwarding a pkt with no in6 dev
Johannes Berg johannes.berg@intel.com nl80211: correctly check NL80211_ATTR_REG_ALPHA2 size
Fabio M. De Francesco fmdefrancesco@gmail.com ALSA: pcm: Test for "silence" field in struct "pcm_format_data"
Tao Jin tao-j@outlook.com ALSA: hda/realtek: add quirk for Lenovo Thinkpad X12 speakers
Tim Crawford tcrawford@system76.com ALSA: hda/realtek: Add quirk for Clevo PD50PNT
Naohiro Aota naohiro.aota@wdc.com btrfs: mark resumed async balance as writing
Jia-Ju Bai baijiaju1990@gmail.com btrfs: fix root ref counts in error handling in btrfs_get_root_ref
Toke Høiland-Jørgensen toke@redhat.com ath9k: Fix usage of driver-private space in tx_info
Toke Høiland-Jørgensen toke@toke.dk ath9k: Properly clear TX status area before reporting to mac80211
Jason A. Donenfeld Jason@zx2c4.com gcc-plugins: latent_entropy: use /dev/urandom
Johan Hovold johan@kernel.org memory: renesas-rpc-if: fix platform-device leak in error path
Oliver Upton oupton@google.com KVM: Don't create VM debugfs files outside of the VM directory
Sean Christopherson seanjc@google.com KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded
Patrick Wang patrick.wang.shcn@gmail.com mm: kmemleak: take a full lowmem check in kmemleak_*_phys()
Minchan Kim minchan@kernel.org mm: fix unexpected zeroed page mapping with zram swap
Juergen Gross jgross@suse.com mm, page_alloc: fix build_zonerefs_node()
Borislav Petkov bp@suse.de perf/imx_ddr: Fix undefined behavior due to shift overflowing the constant
Duoming Zhou duoming@zju.edu.cn drivers: net: slip: fix NPD bug in sl_tx_timeout()
Chandrakanth patil chandrakanth.patil@broadcom.com scsi: megaraid_sas: Target with invalid LUN ID is deleted during scan
Alexey Galakhov agalakhov@gmail.com scsi: mvsas: Add PCI ID of RocketRaid 2640
Roman Li Roman.Li@amd.com drm/amd/display: Fix allocate_mst_payload assert on resume
Martin Leung Martin.Leung@amd.com drm/amd/display: Revert FEC check in validation
Xiaomeng Tong xiam0nd.tong@gmail.com myri10ge: fix an incorrect free for skb in myri10ge_sw_tso
Marcin Kozlowski marcinguy@gmail.com net: usb: aqc111: Fix out-of-bounds accesses in RX fixup
Andy Chiu andy.chiu@sifive.com net: axienet: setup mdio unconditionally
Steve Capper steve.capper@arm.com tlb: hugetlb: Add more sizes to tlb_remove_huge_tlb_entry
Joey Gouly joey.gouly@arm.com arm64: alternatives: mark patch_alternative() as `noinstr`
Jonathan Bakker xc-racer2@live.ca regulator: wm8994: Add an off-on delay for WM8994 variant
Leo Ruan tingquan.ruan@cn.bosch.com gpu: ipu-v3: Fix dev_dbg frequency output
Christian Lamparter chunkeey@gmail.com ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs
Randy Dunlap rdunlap@infradead.org net: micrel: fix KS8851_MLL Kconfig
Tyrel Datwyler tyreld@linux.ibm.com scsi: ibmvscsis: Increase INITIAL_SRP_LIMIT to 1024
James Smart jsmart2021@gmail.com scsi: lpfc: Fix queue failures when recovering from PCI parity error
Xiaoguang Wang xiaoguang.wang@linux.alibaba.com scsi: target: tcmu: Fix possible page UAF
Michael Kelley mikelley@microsoft.com Drivers: hv: vmbus: Prevent load re-ordering when reading ring buffer
QintaoShen unSimple1993@163.com drm/amdkfd: Check for potential null return of kmalloc_array()
Tianci Yin tianci.yin@amd.com drm/amdgpu/vcn: improve vcn dpg stop procedure
Tushar Patel tushar.patel@amd.com drm/amdkfd: Fix Incorrect VMIDs passed to HWS
Leo (Hanghong) Ma hanghong.ma@amd.com drm/amd/display: Update VTEM Infopacket definition
Chiawen Huang chiawen.huang@amd.com drm/amd/display: FEC check in timing validation
Charlene Liu Charlene.Liu@amd.com drm/amd/display: fix audio format not updated after edid updated
Josef Bacik josef@toxicpanda.com btrfs: do not warn for free space inode in cow_file_range
Darrick J. Wong djwong@kernel.org btrfs: fix fallocate to use file_modified to update permissions consistently
Aurabindo Pillai aurabindo.pillai@amd.com drm/amd: Add USBC connector ID
Jeremy Linton jeremy.linton@arm.com net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"
Khazhismel Kumykov khazhy@google.com dm mpath: only use ktime_get_ns() in historical selector
Harshit Mogalapalli harshit.m.mogalapalli@oracle.com cifs: potential buffer overflow in handling symlinks
Lin Ma linma@zju.edu.cn nfc: nci: add flush_workqueue to prevent uaf
Adrian Hunter adrian.hunter@intel.com perf tools: Fix misleading add event PMU debug message
Athira Rajeev atrajeev@linux.vnet.ibm.com testing/selftests/mqueue: Fix mq_perf_tests to free the allocated cpu set
Petr Malat oss@malat.biz sctp: Initialize daddr on peeled off socket
Mike Christie michael.christie@oracle.com scsi: iscsi: Fix conn cleanup and stop race during iscsid restart
Mike Christie michael.christie@oracle.com scsi: iscsi: Fix offload conn cleanup when iscsid restarts
Mike Christie michael.christie@oracle.com scsi: iscsi: Move iscsi_ep_disconnect()
Mike Christie michael.christie@oracle.com scsi: iscsi: Fix in-kernel conn failure handling
Mike Christie michael.christie@oracle.com scsi: iscsi: Rel ref after iscsi_lookup_endpoint()
Mike Christie michael.christie@oracle.com scsi: iscsi: Use system_unbound_wq for destroy_work
Mike Christie michael.christie@oracle.com scsi: iscsi: Force immediate failure during shutdown
Mike Christie michael.christie@oracle.com scsi: iscsi: Stop queueing during ep_disconnect
Ajish Koshy Ajish.Koshy@microchip.com scsi: pm80xx: Enable upper inbound, outbound queues
Ajish Koshy Ajish.Koshy@microchip.com scsi: pm80xx: Mask and unmask upper interrupt vectors 32-63
Karsten Graul kgraul@linux.ibm.com net/smc: Fix NULL pointer dereference in smc_pnet_find_ib()
Stephen Boyd swboyd@chromium.org drm/msm/dsi: Use connector directly in msm_dsi_manager_connector_init()
Rob Clark robdclark@chromium.org drm/msm: Fix range size vs end confusion
Rameshkumar Sundaram quic_ramess@quicinc.com cfg80211: hold bss_lock while updating nontrans_list
Benedikt Spranger b.spranger@linutronix.de net/sched: taprio: Check if socket flags are valid
Dinh Nguyen dinguyen@kernel.org net: ethernet: stmmac: fix altr_tse_pcs function when using a fixed-link
Michael Walle michael@walle.cc net: dsa: felix: suppress -EPROBE_DEFER errors
Marcelo Ricardo Leitner marcelo.leitner@gmail.com net/sched: fix initialization order when updating chain 0 head
Vadim Pasternak vadimp@nvidia.com mlxsw: i2c: Fix initialization error flow
Calvin Johnson calvin.johnson@oss.nxp.com net: mdio: Alphabetically sort header inclusion
Linus Torvalds torvalds@linux-foundation.org gpiolib: acpi: use correct format characters
Guillaume Nault gnault@redhat.com veth: Ensure eth header is in skb's linear part
Vlad Buslov vladbu@nvidia.com net/sched: flower: fix parsing of ethertype following VLAN header
Chuck Lever chuck.lever@oracle.com SUNRPC: Fix the svc_deferred_event trace class
Kyle Copperfield kmcopper@danwin1210.me media: rockchip/rga: do proper error checking in probe
Cristian Marussi cristian.marussi@arm.com firmware: arm_scmi: Fix sorting of retrieved clock rates
Miaoqian Lin linmq006@gmail.com memory: atmel-ebi: Fix missing of_node_put in atmel_ebi_probe
Rob Clark robdclark@chromium.org drm/msm: Add missing put_task_struct() in debugfs path
Nathan Chancellor nathan@kernel.org btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
Mario Limonciello mario.limonciello@amd.com ACPI: processor idle: Check for architectural support for LPI
Mario Limonciello mario.limonciello@amd.com cpuidle: PSCI: Move the `has_lpi` check to the beginning of the function
Lin Ma linma@zju.edu.cn hamradio: remove needs_free_netdev to avoid UAF
Lin Ma linma@zju.edu.cn hamradio: defer 6pack kfree after unregister_netdev
Felix Kuehling Felix.Kuehling@amd.com drm/amdkfd: Use drm_priv to pass VM from KFD to amdgpu
-------------
Diffstat:
Makefile | 4 +- arch/arm/mach-davinci/board-da850-evm.c | 4 +- arch/arm64/kernel/alternative.c | 6 +- arch/arm64/kernel/cpuidle.c | 6 +- arch/x86/include/asm/kvm_host.h | 5 +- arch/x86/kvm/mmu/mmu.c | 20 +- arch/x86/kvm/x86.c | 20 +- drivers/acpi/processor_idle.c | 15 +- drivers/ata/libata-core.c | 3 + drivers/firmware/arm_scmi/clock.c | 3 +- drivers/gpio/gpiolib-acpi.c | 4 +- drivers/gpu/drm/amd/amdgpu/ObjectID.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 +- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 + drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 3 + drivers/gpu/drm/amd/amdkfd/kfd_device.c | 11 +- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 + drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +- drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 +- .../drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 14 +- drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 14 +- .../amd/display/modules/info_packet/info_packet.c | 5 +- drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 +- drivers/gpu/drm/msm/dsi/dsi_manager.c | 2 +- drivers/gpu/drm/msm/msm_gem.c | 1 + drivers/gpu/ipu-v3/ipu-di.c | 5 +- drivers/hv/ring_buffer.c | 11 +- drivers/i2c/busses/i2c-pasemi.c | 6 + drivers/infiniband/ulp/iser/iscsi_iser.c | 2 + drivers/md/dm-historical-service-time.c | 10 +- drivers/md/dm-integrity.c | 7 +- drivers/media/platform/rockchip/rga/rga.c | 2 +- drivers/memory/atmel-ebi.c | 23 +- drivers/memory/renesas-rpc-if.c | 10 +- drivers/net/dsa/ocelot/felix_vsc9959.c | 2 +- drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 +- drivers/net/ethernet/mellanox/mlxsw/i2c.c | 1 + drivers/net/ethernet/micrel/Kconfig | 1 + drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 6 +- drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c | 8 - drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.h | 4 + .../net/ethernet/stmicro/stmmac/dwmac-socfpga.c | 13 +- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 13 +- drivers/net/hamradio/6pack.c | 5 +- drivers/net/mdio/mdio-bcm-unimac.c | 16 +- drivers/net/mdio/mdio-bitbang.c | 4 +- drivers/net/mdio/mdio-cavium.c | 2 +- drivers/net/mdio/mdio-gpio.c | 10 +- drivers/net/mdio/mdio-ipq4019.c | 4 +- drivers/net/mdio/mdio-ipq8064.c | 4 +- drivers/net/mdio/mdio-mscc-miim.c | 8 +- drivers/net/mdio/mdio-mux-bcm-iproc.c | 10 +- drivers/net/mdio/mdio-mux-gpio.c | 8 +- drivers/net/mdio/mdio-mux-mmioreg.c | 6 +- drivers/net/mdio/mdio-mux-multiplexer.c | 2 +- drivers/net/mdio/mdio-mux.c | 6 +- drivers/net/mdio/mdio-octeon.c | 8 +- drivers/net/mdio/mdio-thunder.c | 10 +- drivers/net/mdio/mdio-xgene.c | 6 +- drivers/net/mdio/of_mdio.c | 10 +- drivers/net/slip/slip.c | 2 +- drivers/net/usb/aqc111.c | 9 +- drivers/net/veth.c | 2 +- drivers/net/wireless/ath/ath9k/main.c | 2 +- drivers/net/wireless/ath/ath9k/xmit.c | 33 +- drivers/perf/fsl_imx8_ddr_perf.c | 2 +- drivers/regulator/wm8994-regulator.c | 42 +- drivers/scsi/be2iscsi/be_iscsi.c | 19 +- drivers/scsi/be2iscsi/be_main.c | 1 + drivers/scsi/bnx2i/bnx2i_iscsi.c | 24 +- drivers/scsi/cxgbi/cxgb3i/cxgb3i.c | 1 + drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 1 + drivers/scsi/cxgbi/libcxgbi.c | 12 +- drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 2 +- drivers/scsi/libiscsi.c | 70 ++- drivers/scsi/lpfc/lpfc_init.c | 2 + drivers/scsi/megaraid/megaraid_sas.h | 3 + drivers/scsi/megaraid/megaraid_sas_base.c | 7 + drivers/scsi/mvsas/mv_init.c | 1 + drivers/scsi/pm8001/pm80xx_hwi.c | 33 +- drivers/scsi/qedi/qedi_iscsi.c | 26 +- drivers/scsi/qla4xxx/ql4_os.c | 2 + drivers/scsi/scsi_transport_iscsi.c | 541 +++++++++++++-------- drivers/target/target_core_user.c | 3 +- fs/btrfs/block-group.c | 4 - fs/btrfs/disk-io.c | 5 +- fs/btrfs/file.c | 13 +- fs/btrfs/inode.c | 1 - fs/btrfs/volumes.c | 2 + fs/cifs/link.c | 3 + include/asm-generic/tlb.h | 10 +- include/net/ax25.h | 12 + include/net/flow_dissector.h | 2 + include/scsi/libiscsi.h | 1 + include/scsi/scsi_transport_iscsi.h | 14 +- include/trace/events/sunrpc.h | 7 +- kernel/dma/direct.h | 3 +- kernel/irq/affinity.c | 5 +- kernel/smp.c | 2 +- kernel/time/tick-sched.c | 2 +- kernel/time/timer.c | 11 +- mm/kmemleak.c | 8 +- mm/page_alloc.c | 2 +- mm/page_io.c | 54 -- net/ax25/af_ax25.c | 38 +- net/ax25/ax25_dev.c | 28 +- net/ax25/ax25_route.c | 13 +- net/ax25/ax25_subr.c | 20 +- net/core/flow_dissector.c | 1 + net/ipv6/ip6_output.c | 2 +- net/nfc/nci/core.c | 4 + net/sched/cls_api.c | 2 +- net/sched/cls_flower.c | 18 +- net/sched/sch_taprio.c | 3 +- net/sctp/socket.c | 2 +- net/smc/smc_pnet.c | 5 +- net/wireless/nl80211.c | 3 +- net/wireless/scan.c | 2 + scripts/gcc-plugins/latent_entropy_plugin.c | 44 +- sound/core/pcm_misc.c | 2 +- sound/pci/hda/patch_realtek.c | 2 + tools/perf/util/parse-events.c | 5 +- tools/testing/selftests/mqueue/mq_perf_tests.c | 25 +- virt/kvm/kvm_main.c | 10 +- 125 files changed, 1062 insertions(+), 581 deletions(-)
From: Felix Kuehling Felix.Kuehling@amd.com
commit b40a6ab2cf9213923bf8e821ce7fa7f6a0a26990 upstream.
amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu needs the drm_priv to allow mmap to access the BO through the corresponding file descriptor. The VM can also be extracted from drm_priv, so drm_priv can replace the vm parameter in the kfd2kgd interface.
Signed-off-by: Felix Kuehling Felix.Kuehling@amd.com Reviewed-by: Philip Yang philip.yang@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com [This is a partial cherry-pick of the upstream commit.] Signed-off-by: Lee Jones lee.jones@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1024,11 +1024,15 @@ int amdgpu_amdkfd_gpuvm_acquire_process_ struct dma_fence **ef) { struct amdgpu_device *adev = get_amdgpu_device(kgd); - struct drm_file *drm_priv = filp->private_data; - struct amdgpu_fpriv *drv_priv = drm_priv->driver_priv; - struct amdgpu_vm *avm = &drv_priv->vm; + struct amdgpu_fpriv *drv_priv; + struct amdgpu_vm *avm; int ret;
+ ret = amdgpu_file_to_fpriv(filp, &drv_priv); + if (ret) + return ret; + avm = &drv_priv->vm; + /* Already a compute VM? */ if (avm->process_info) return -EINVAL;
From: Lin Ma linma@zju.edu.cn
commit 0b9111922b1f399aba6ed1e1b8f2079c3da1aed8 upstream.
There is a possible race condition (use-after-free) like below
(USE) | (FREE) dev_queue_xmit | __dev_queue_xmit | __dev_xmit_skb | sch_direct_xmit | ... xmit_one | netdev_start_xmit | tty_ldisc_kill __netdev_start_xmit | 6pack_close sp_xmit | kfree sp_encaps | |
According to the patch "defer ax25 kfree after unregister_netdev", this patch reorder the kfree after the unregister_netdev to avoid the possible UAF as the unregister_netdev() is well synchronized and won't return if there is a running routine.
Signed-off-by: Lin Ma linma@zju.edu.cn Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Xu Jia xujia39@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/hamradio/6pack.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/net/hamradio/6pack.c +++ b/drivers/net/hamradio/6pack.c @@ -679,9 +679,11 @@ static void sixpack_close(struct tty_str del_timer_sync(&sp->tx_t); del_timer_sync(&sp->resync_t);
- /* Free all 6pack frame buffers. */ + /* Free all 6pack frame buffers after unreg. */ kfree(sp->rbuff); kfree(sp->xbuff); + + free_netdev(sp->dev); }
/* Perform I/O control on an active 6pack channel. */
From: Lin Ma linma@zju.edu.cn
commit 81b1d548d00bcd028303c4f3150fa753b9b8aa71 upstream.
The former patch "defer 6pack kfree after unregister_netdev" reorders the kfree of two buffer after the unregister_netdev to prevent the race condition. It also adds free_netdev() function in sixpack_close(), which is a direct copy from the similar code in mkiss_close().
However, in sixpack driver, the flag needs_free_netdev is set to true in sp_setup(), hence the unregister_netdev() will free the netdev automatically. Therefore, as the sp is netdev_priv, use-after-free occurs.
This patch removes the needs_free_netdev = true and just let the free_netdev to finish this deallocation task.
Fixes: 0b9111922b1f ("hamradio: defer 6pack kfree after unregister_netdev") Signed-off-by: Lin Ma linma@zju.edu.cn Link: https://lore.kernel.org/r/20211111141402.7551-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Xu Jia xujia39@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/hamradio/6pack.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/net/hamradio/6pack.c +++ b/drivers/net/hamradio/6pack.c @@ -311,7 +311,6 @@ static void sp_setup(struct net_device * { /* Finish setting up the DEVICE info. */ dev->netdev_ops = &sp_netdev_ops; - dev->needs_free_netdev = true; dev->mtu = SIXP_MTU; dev->hard_header_len = AX25_MAX_HEADER_LEN; dev->header_ops = &ax25_header_ops;
From: Mario Limonciello mario.limonciello@amd.com
commit 01f6c7338ce267959975da65d86ba34f44d54220 upstream.
Currently the first thing checked is whether the PCSI cpu_suspend function has been initialized.
Another change will be overloading `acpi_processor_ffh_lpi_probe` and calling it sooner. So make the `has_lpi` check the first thing checked to prepare for that change.
Reviewed-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpuidle.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/arch/arm64/kernel/cpuidle.c +++ b/arch/arm64/kernel/cpuidle.c @@ -54,6 +54,9 @@ static int psci_acpi_cpu_init_idle(unsig struct acpi_lpi_state *lpi; struct acpi_processor *pr = per_cpu(processors, cpu);
+ if (unlikely(!pr || !pr->flags.has_lpi)) + return -EINVAL; + /* * If the PSCI cpu_suspend function hook has not been initialized * idle states must not be enabled, so bail out @@ -61,9 +64,6 @@ static int psci_acpi_cpu_init_idle(unsig if (!psci_ops.cpu_suspend) return -EOPNOTSUPP;
- if (unlikely(!pr || !pr->flags.has_lpi)) - return -EINVAL; - count = pr->power.count - 1; if (count <= 0) return -ENODEV;
From: Mario Limonciello mario.limonciello@amd.com
commit eb087f305919ee8169ad65665610313e74260463 upstream.
When `osc_pc_lpi_support_confirmed` is set through `_OSC` and `_LPI` is populated then the cpuidle driver assumes that LPI is fully functional.
However currently the kernel only provides architectural support for LPI on ARM. This leads to high power consumption on X86 platforms that otherwise try to enable LPI.
So probe whether or not LPI support is implemented before enabling LPI in the kernel. This is done by overloading `acpi_processor_ffh_lpi_probe` to check whether it returns `-EOPNOTSUPP`. It also means that all future implementations of `acpi_processor_ffh_lpi_probe` will need to follow these semantics as well.
Reviewed-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/processor_idle.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
--- a/drivers/acpi/processor_idle.c +++ b/drivers/acpi/processor_idle.c @@ -1080,6 +1080,11 @@ static int flatten_lpi_states(struct acp return 0; }
+int __weak acpi_processor_ffh_lpi_probe(unsigned int cpu) +{ + return -EOPNOTSUPP; +} + static int acpi_processor_get_lpi_info(struct acpi_processor *pr) { int ret, i; @@ -1088,6 +1093,11 @@ static int acpi_processor_get_lpi_info(s struct acpi_device *d = NULL; struct acpi_lpi_states_array info[2], *tmp, *prev, *curr;
+ /* make sure our architecture has support */ + ret = acpi_processor_ffh_lpi_probe(pr->id); + if (ret == -EOPNOTSUPP) + return ret; + if (!osc_pc_lpi_support_confirmed) return -EOPNOTSUPP;
@@ -1139,11 +1149,6 @@ static int acpi_processor_get_lpi_info(s return 0; }
-int __weak acpi_processor_ffh_lpi_probe(unsigned int cpu) -{ - return -ENODEV; -} - int __weak acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi) { return -ENODEV;
From: Nathan Chancellor nathan@kernel.org
commit 6d4a6b515c39f1f8763093e0f828959b2fbc2f45 upstream.
Clang's version of -Wunused-but-set-variable recently gained support for unary operations, which reveals two unused variables:
fs/btrfs/block-group.c:2949:6: error: variable 'num_started' set but not used [-Werror,-Wunused-but-set-variable] int num_started = 0; ^ fs/btrfs/block-group.c:3116:6: error: variable 'num_started' set but not used [-Werror,-Wunused-but-set-variable] int num_started = 0; ^ 2 errors generated.
These variables appear to be unused from their introduction, so just remove them to silence the warnings.
Fixes: c9dc4c657850 ("Btrfs: two stage dirty block group writeout") Fixes: 1bbc621ef284 ("Btrfs: allow block group cache writeout outside critical section in commit") CC: stable@vger.kernel.org # 5.4+ Link: https://github.com/ClangBuiltLinux/linux/issues/1614 Signed-off-by: Nathan Chancellor nathan@kernel.org Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/block-group.c | 4 ---- 1 file changed, 4 deletions(-)
--- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -2570,7 +2570,6 @@ int btrfs_start_dirty_block_groups(struc struct btrfs_path *path = NULL; LIST_HEAD(dirty); struct list_head *io = &cur_trans->io_bgs; - int num_started = 0; int loops = 0;
spin_lock(&cur_trans->dirty_bgs_lock); @@ -2636,7 +2635,6 @@ again: cache->io_ctl.inode = NULL; ret = btrfs_write_out_cache(trans, cache, path); if (ret == 0 && cache->io_ctl.inode) { - num_started++; should_put = 0;
/* @@ -2737,7 +2735,6 @@ int btrfs_write_dirty_block_groups(struc int should_put; struct btrfs_path *path; struct list_head *io = &cur_trans->io_bgs; - int num_started = 0;
path = btrfs_alloc_path(); if (!path) @@ -2795,7 +2792,6 @@ int btrfs_write_dirty_block_groups(struc cache->io_ctl.inode = NULL; ret = btrfs_write_out_cache(trans, cache, path); if (ret == 0 && cache->io_ctl.inode) { - num_started++; should_put = 0; list_add_tail(&cache->io_list, io); } else {
From: Rob Clark robdclark@chromium.org
[ Upstream commit ac3e4f42d5ec459f701743debd9c1ad2f2247402 ]
Fixes: 25faf2f2e065 ("drm/msm: Show process names in gem_describe") Signed-off-by: Rob Clark robdclark@chromium.org Link: https://lore.kernel.org/r/20220317184550.227991-1-robdclark@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/msm_gem.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c index 819567e40565..9c05bf6c4551 100644 --- a/drivers/gpu/drm/msm/msm_gem.c +++ b/drivers/gpu/drm/msm/msm_gem.c @@ -849,6 +849,7 @@ void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m) get_pid_task(aspace->pid, PIDTYPE_PID); if (task) { comm = kstrdup(task->comm, GFP_KERNEL); + put_task_struct(task); } else { comm = NULL; }
From: Miaoqian Lin linmq006@gmail.com
[ Upstream commit 6f296a9665ba5ac68937bf11f96214eb9de81baa ]
The device_node pointer is returned by of_parse_phandle() with refcount incremented. We should use of_node_put() on it when done.
Fixes: 87108dc78eb8 ("memory: atmel-ebi: Enable the SMC clock if specified") Signed-off-by: Miaoqian Lin linmq006@gmail.com Reviewed-by: Claudiu Beznea claudiu.beznea@microchip.com Link: https://lore.kernel.org/r/20220309110144.22412-1-linmq006@gmail.com Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/memory/atmel-ebi.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/drivers/memory/atmel-ebi.c b/drivers/memory/atmel-ebi.c index c267283b01fd..e749dcb3ddea 100644 --- a/drivers/memory/atmel-ebi.c +++ b/drivers/memory/atmel-ebi.c @@ -544,20 +544,27 @@ static int atmel_ebi_probe(struct platform_device *pdev) smc_np = of_parse_phandle(dev->of_node, "atmel,smc", 0);
ebi->smc.regmap = syscon_node_to_regmap(smc_np); - if (IS_ERR(ebi->smc.regmap)) - return PTR_ERR(ebi->smc.regmap); + if (IS_ERR(ebi->smc.regmap)) { + ret = PTR_ERR(ebi->smc.regmap); + goto put_node; + }
ebi->smc.layout = atmel_hsmc_get_reg_layout(smc_np); - if (IS_ERR(ebi->smc.layout)) - return PTR_ERR(ebi->smc.layout); + if (IS_ERR(ebi->smc.layout)) { + ret = PTR_ERR(ebi->smc.layout); + goto put_node; + }
ebi->smc.clk = of_clk_get(smc_np, 0); if (IS_ERR(ebi->smc.clk)) { - if (PTR_ERR(ebi->smc.clk) != -ENOENT) - return PTR_ERR(ebi->smc.clk); + if (PTR_ERR(ebi->smc.clk) != -ENOENT) { + ret = PTR_ERR(ebi->smc.clk); + goto put_node; + }
ebi->smc.clk = NULL; } + of_node_put(smc_np); ret = clk_prepare_enable(ebi->smc.clk); if (ret) return ret; @@ -608,6 +615,10 @@ static int atmel_ebi_probe(struct platform_device *pdev) }
return of_platform_populate(np, NULL, NULL, dev); + +put_node: + of_node_put(smc_np); + return ret; }
static __maybe_unused int atmel_ebi_resume(struct device *dev)
From: Cristian Marussi cristian.marussi@arm.com
[ Upstream commit 23274739a5b6166f74d8d9cb5243d7bf6b46aab9 ]
During SCMI Clock protocol initialization, after having retrieved from the SCMI platform all the available discrete rates for a specific clock, the clock rates array is sorted, unfortunately using a pointer to its end as a base instead of its start, so that sorting does not work.
Fix invocation of sort() passing as base a pointer to the start of the retrieved clock rates array.
Link: https://lore.kernel.org/r/20220318092813.49283-1-cristian.marussi@arm.com Fixes: dccec73de91d ("firmware: arm_scmi: Keep the discrete clock rates sorted") Signed-off-by: Cristian Marussi cristian.marussi@arm.com Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/arm_scmi/clock.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/arm_scmi/clock.c b/drivers/firmware/arm_scmi/clock.c index 4645677d86f1..a45678cd9b74 100644 --- a/drivers/firmware/arm_scmi/clock.c +++ b/drivers/firmware/arm_scmi/clock.c @@ -202,7 +202,8 @@ scmi_clock_describe_rates_get(const struct scmi_handle *handle, u32 clk_id,
if (rate_discrete && rate) { clk->list.num_rates = tot_rate_cnt; - sort(rate, tot_rate_cnt, sizeof(*rate), rate_cmp_func, NULL); + sort(clk->list.rates, tot_rate_cnt, sizeof(*rate), + rate_cmp_func, NULL); }
clk->rate_discrete = rate_discrete;
From: Kyle Copperfield kmcopper@danwin1210.me
[ Upstream commit 6150f276073a1480030242a7e006a89e161d6cd6 ]
The latest fix for probe error handling contained a typo that causes probing to fail with the following message:
rockchip-rga: probe of ff680000.rga failed with error -12
This patch fixes the typo.
Fixes: e58430e1d4fd (media: rockchip/rga: fix error handling in probe) Reviewed-by: Dragan Simic dragan.simic@gmail.com Signed-off-by: Kyle Copperfield kmcopper@danwin1210.me Reviewed-by: Kieran Bingham kieran.bingham+renesas@ideasonboard.com Reviewed-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab+huawei@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/platform/rockchip/rga/rga.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/media/platform/rockchip/rga/rga.c b/drivers/media/platform/rockchip/rga/rga.c index 6759091b15e0..d99ea8973b67 100644 --- a/drivers/media/platform/rockchip/rga/rga.c +++ b/drivers/media/platform/rockchip/rga/rga.c @@ -895,7 +895,7 @@ static int rga_probe(struct platform_device *pdev) } rga->dst_mmu_pages = (unsigned int *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 3); - if (rga->dst_mmu_pages) { + if (!rga->dst_mmu_pages) { ret = -ENOMEM; goto free_src_pages; }
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 4d5004451ab2218eab94a30e1841462c9316ba19 ]
Fix a NULL deref crash that occurs when an svc_rqst is deferred while the sunrpc tracing subsystem is enabled. svc_revisit() sets dr->xprt to NULL, so it can't be relied upon in the tracepoint to provide the remote's address.
Unfortunately we can't revert the "svc_deferred_class" hunk in commit ece200ddd54b ("sunrpc: Save remote presentation address in svc_xprt for trace events") because there is now a specific check of event format specifiers for unsafe dereferences. The warning that check emits is:
event svc_defer_recv has unsafe dereference of argument 1
A "%pISpc" format specifier with a "struct sockaddr *" is indeed flagged by this check.
Instead, take the brute-force approach used by the svcrdma_qp_error tracepoint. Convert the dr::addr field into a presentation address in the TP_fast_assign() arm of the trace event, and store that as a string. This fix can be backported to -stable kernels.
In the meantime, commit c6ced22997ad ("tracing: Update print fmt check to handle new __get_sockaddr() macro") is now in v5.18, so this wonky fix can be replaced with __sockaddr() and friends properly during the v5.19 merge window.
Fixes: ece200ddd54b ("sunrpc: Save remote presentation address in svc_xprt for trace events") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/trace/events/sunrpc.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 23db248a7fdb..ed1bbac004d5 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -1874,17 +1874,18 @@ DECLARE_EVENT_CLASS(svc_deferred_event, TP_STRUCT__entry( __field(const void *, dr) __field(u32, xid) - __string(addr, dr->xprt->xpt_remotebuf) + __array(__u8, addr, INET6_ADDRSTRLEN + 10) ),
TP_fast_assign( __entry->dr = dr; __entry->xid = be32_to_cpu(*(__be32 *)(dr->args + (dr->xprt_hlen>>2))); - __assign_str(addr, dr->xprt->xpt_remotebuf); + snprintf(__entry->addr, sizeof(__entry->addr) - 1, + "%pISpc", (struct sockaddr *)&dr->addr); ),
- TP_printk("addr=%s dr=%p xid=0x%08x", __get_str(addr), __entry->dr, + TP_printk("addr=%s dr=%p xid=0x%08x", __entry->addr, __entry->dr, __entry->xid) );
From: Vlad Buslov vladbu@nvidia.com
[ Upstream commit 2105f700b53c24aa48b65c15652acc386044d26a ]
A tc flower filter matching TCA_FLOWER_KEY_VLAN_ETH_TYPE is expected to match the L2 ethertype following the first VLAN header, as confirmed by linked discussion with the maintainer. However, such rule also matches packets that have additional second VLAN header, even though filter has both eth_type and vlan_ethtype set to "ipv4". Looking at the code this seems to be mostly an artifact of the way flower uses flow dissector. First, even though looking at the uAPI eth_type and vlan_ethtype appear like a distinct fields, in flower they are all mapped to the same key->basic.n_proto. Second, flow dissector skips following VLAN header as no keys for FLOW_DISSECTOR_KEY_CVLAN are set and eventually assigns the value of n_proto to last parsed header. With these, such filters ignore any headers present between first VLAN header and first "non magic" header (ipv4 in this case) that doesn't result FLOW_DISSECT_RET_PROTO_AGAIN.
Fix the issue by extending flow dissector VLAN key structure with new 'vlan_eth_type' field that matches first ethertype following previously parsed VLAN header. Modify flower classifier to set the new flow_dissector_key_vlan->vlan_eth_type with value obtained from TCA_FLOWER_KEY_VLAN_ETH_TYPE/TCA_FLOWER_KEY_CVLAN_ETH_TYPE uAPIs.
Link: https://lore.kernel.org/all/Yjhgi48BpTGh6dig@nanopsycho/ Fixes: 9399ae9a6cb2 ("net_sched: flower: Add vlan support") Fixes: d64efd0926ba ("net/sched: flower: Add supprt for matching on QinQ vlan headers") Signed-off-by: Vlad Buslov vladbu@nvidia.com Reviewed-by: Jiri Pirko jiri@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/flow_dissector.h | 2 ++ net/core/flow_dissector.c | 1 + net/sched/cls_flower.c | 18 +++++++++++++----- 3 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h index cc10b10dc3a1..5eecf4436965 100644 --- a/include/net/flow_dissector.h +++ b/include/net/flow_dissector.h @@ -59,6 +59,8 @@ struct flow_dissector_key_vlan { __be16 vlan_tci; }; __be16 vlan_tpid; + __be16 vlan_eth_type; + u16 padding; };
struct flow_dissector_mpls_lse { diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 813c709c61cf..f9baa9b1c77f 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -1171,6 +1171,7 @@ bool __skb_flow_dissect(const struct net *net, VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT; } key_vlan->vlan_tpid = saved_vlan_tpid; + key_vlan->vlan_eth_type = proto; }
fdret = FLOW_DISSECT_RET_PROTO_AGAIN; diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index 8ff6945b9f8f..35ee6d8226e6 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -998,6 +998,7 @@ static int fl_set_key_mpls(struct nlattr **tb, static void fl_set_key_vlan(struct nlattr **tb, __be16 ethertype, int vlan_id_key, int vlan_prio_key, + int vlan_next_eth_type_key, struct flow_dissector_key_vlan *key_val, struct flow_dissector_key_vlan *key_mask) { @@ -1016,6 +1017,11 @@ static void fl_set_key_vlan(struct nlattr **tb, } key_val->vlan_tpid = ethertype; key_mask->vlan_tpid = cpu_to_be16(~0); + if (tb[vlan_next_eth_type_key]) { + key_val->vlan_eth_type = + nla_get_be16(tb[vlan_next_eth_type_key]); + key_mask->vlan_eth_type = cpu_to_be16(~0); + } }
static void fl_set_key_flag(u32 flower_key, u32 flower_mask, @@ -1497,8 +1503,9 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
if (eth_type_vlan(ethertype)) { fl_set_key_vlan(tb, ethertype, TCA_FLOWER_KEY_VLAN_ID, - TCA_FLOWER_KEY_VLAN_PRIO, &key->vlan, - &mask->vlan); + TCA_FLOWER_KEY_VLAN_PRIO, + TCA_FLOWER_KEY_VLAN_ETH_TYPE, + &key->vlan, &mask->vlan);
if (tb[TCA_FLOWER_KEY_VLAN_ETH_TYPE]) { ethertype = nla_get_be16(tb[TCA_FLOWER_KEY_VLAN_ETH_TYPE]); @@ -1506,6 +1513,7 @@ static int fl_set_key(struct net *net, struct nlattr **tb, fl_set_key_vlan(tb, ethertype, TCA_FLOWER_KEY_CVLAN_ID, TCA_FLOWER_KEY_CVLAN_PRIO, + TCA_FLOWER_KEY_CVLAN_ETH_TYPE, &key->cvlan, &mask->cvlan); fl_set_key_val(tb, &key->basic.n_proto, TCA_FLOWER_KEY_CVLAN_ETH_TYPE, @@ -2861,13 +2869,13 @@ static int fl_dump_key(struct sk_buff *skb, struct net *net, goto nla_put_failure;
if (mask->basic.n_proto) { - if (mask->cvlan.vlan_tpid) { + if (mask->cvlan.vlan_eth_type) { if (nla_put_be16(skb, TCA_FLOWER_KEY_CVLAN_ETH_TYPE, key->basic.n_proto)) goto nla_put_failure; - } else if (mask->vlan.vlan_tpid) { + } else if (mask->vlan.vlan_eth_type) { if (nla_put_be16(skb, TCA_FLOWER_KEY_VLAN_ETH_TYPE, - key->basic.n_proto)) + key->vlan.vlan_eth_type)) goto nla_put_failure; } }
From: Guillaume Nault gnault@redhat.com
[ Upstream commit 726e2c5929de841fdcef4e2bf995680688ae1b87 ]
After feeding a decapsulated packet to a veth device with act_mirred, skb_headlen() may be 0. But veth_xmit() calls __dev_forward_skb(), which expects at least ETH_HLEN byte of linear data (as __dev_forward_skb2() calls eth_type_trans(), which pulls ETH_HLEN bytes unconditionally).
Use pskb_may_pull() to ensure veth_xmit() respects this constraint.
kernel BUG at include/linux/skbuff.h:2328! RIP: 0010:eth_type_trans+0xcf/0x140 Call Trace: <IRQ> __dev_forward_skb2+0xe3/0x160 veth_xmit+0x6e/0x250 [veth] dev_hard_start_xmit+0xc7/0x200 __dev_queue_xmit+0x47f/0x520 ? skb_ensure_writable+0x85/0xa0 ? skb_mpls_pop+0x98/0x1c0 tcf_mirred_act+0x442/0x47e [act_mirred] tcf_action_exec+0x86/0x140 fl_classify+0x1d8/0x1e0 [cls_flower] ? dma_pte_clear_level+0x129/0x1a0 ? dma_pte_clear_level+0x129/0x1a0 ? prb_fill_curr_block+0x2f/0xc0 ? skb_copy_bits+0x11a/0x220 __tcf_classify+0x58/0x110 tcf_classify_ingress+0x6b/0x140 __netif_receive_skb_core.constprop.0+0x47d/0xfd0 ? __iommu_dma_unmap_swiotlb+0x44/0x90 __netif_receive_skb_one_core+0x3d/0xa0 netif_receive_skb+0x116/0x170 be_process_rx+0x22f/0x330 [be2net] be_poll+0x13c/0x370 [be2net] __napi_poll+0x2a/0x170 net_rx_action+0x22f/0x2f0 __do_softirq+0xca/0x2a8 __irq_exit_rcu+0xc1/0xe0 common_interrupt+0x83/0xa0
Fixes: e314dbdc1c0d ("[NET]: Virtual ethernet device driver.") Signed-off-by: Guillaume Nault gnault@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/veth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c index f7e3eb309a26..5be8ed910553 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -292,7 +292,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
rcu_read_lock(); rcv = rcu_dereference(priv->peer); - if (unlikely(!rcv)) { + if (unlikely(!rcv) || !pskb_may_pull(skb, ETH_HLEN)) { kfree_skb(skb); goto drop; }
From: Linus Torvalds torvalds@linux-foundation.org
[ Upstream commit 213d266ebfb1621aab79cfe63388facc520a1381 ]
When compiling with -Wformat, clang emits the following warning:
gpiolib-acpi.c:393:4: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] pin); ^~~
So warning that '%hhX' is paired with an 'int' is all just completely mindless and wrong. Sadly, I can see a different bogus warning reason why people would want to use '%02hhX'.
Again, the *sane* thing from a human perspective is to use '%02X. But if the compiler doesn't do any range analysis at all, it could decide that "Oh, that print format could need up to 8 bytes of space in the result". Using '%02hhX' would cut that down to two.
And since we use
char ev_name[5];
and currently use "_%c%02hhX" as the format string, even a compiler that doesn't notice that "pin <= 255" test that guards this all will go "OK, that's at most 4 bytes and the final NUL termination, so it's fine".
While a compiler - like gcc - that only sees that the original source of the 'pin' value is a 'unsigned short' array, and then doesn't take the "pin <= 255" into account, will warn like this:
gpiolib-acpi.c: In function 'acpi_gpiochip_request_interrupt': gpiolib-acpi.c:206:24: warning: '%02X' directive writing between 2 and 4 bytes into a region of size 3 [-Wformat-overflow=] sprintf(ev_name, "_%c%02X", ^~~~ gpiolib-acpi.c:206:20: note: directive argument in the range [0, 65535]
because gcc isn't being very good at that argument range analysis either.
In other words, the original use of 'hhx' was bogus to begin with, and due to *another* compiler warning being bad, and we had that bad code being written back in 2016 to work around _that_ compiler warning (commit e40a3ae1f794: "gpio: acpi: work around false-positive -Wstring-overflow warning").
Sadly, two different bad compiler warnings together does not make for one good one.
It just makes for even more pain.
End result: I think the simplest and cleanest option is simply the proposed change which undoes that '%hhX' change for gcc, and replaces it with just using a slightly bigger stack allocation. It's not like a 5-byte allocation is in any way likely to have saved any actual stack, since all the other variables in that function are 'int' or bigger.
False-positive compiler warnings really do make people write worse code, and that's a problem. But on a scale of bad code, I feel that extending the buffer trivially is better than adding a pointless cast that literally makes no sense.
At least in this case the end result isn't unreadable or buggy. We've had several cases of bad compiler warnings that caused changes that were actually horrendously wrong.
Fixes: e40a3ae1f794 ("gpio: acpi: work around false-positive -Wstring-overflow warning") Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpiolib-acpi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpio/gpiolib-acpi.c b/drivers/gpio/gpiolib-acpi.c index 55e4f402ec8b..44ee319da1b3 100644 --- a/drivers/gpio/gpiolib-acpi.c +++ b/drivers/gpio/gpiolib-acpi.c @@ -276,8 +276,8 @@ static acpi_status acpi_gpiochip_alloc_event(struct acpi_resource *ares, pin = agpio->pin_table[0];
if (pin <= 255) { - char ev_name[5]; - sprintf(ev_name, "_%c%02hhX", + char ev_name[8]; + sprintf(ev_name, "_%c%02X", agpio->triggering == ACPI_EDGE_SENSITIVE ? 'E' : 'L', pin); if (ACPI_SUCCESS(acpi_get_handle(handle, ev_name, &evt_handle)))
From: Calvin Johnson calvin.johnson@oss.nxp.com
[ Upstream commit 1bf343665057312167750509b0c48e8299293ac5 ]
Alphabetically sort header inclusion
Signed-off-by: Calvin Johnson calvin.johnson@oss.nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/mdio/mdio-bcm-unimac.c | 16 +++++++--------- drivers/net/mdio/mdio-bitbang.c | 4 ++-- drivers/net/mdio/mdio-cavium.c | 2 +- drivers/net/mdio/mdio-gpio.c | 10 +++++----- drivers/net/mdio/mdio-ipq4019.c | 4 ++-- drivers/net/mdio/mdio-ipq8064.c | 4 ++-- drivers/net/mdio/mdio-mscc-miim.c | 8 ++++---- drivers/net/mdio/mdio-mux-bcm-iproc.c | 10 +++++----- drivers/net/mdio/mdio-mux-gpio.c | 8 ++++---- drivers/net/mdio/mdio-mux-mmioreg.c | 6 +++--- drivers/net/mdio/mdio-mux-multiplexer.c | 2 +- drivers/net/mdio/mdio-mux.c | 6 +++--- drivers/net/mdio/mdio-octeon.c | 8 ++++---- drivers/net/mdio/mdio-thunder.c | 10 +++++----- drivers/net/mdio/mdio-xgene.c | 6 +++--- drivers/net/mdio/of_mdio.c | 10 +++++----- 16 files changed, 56 insertions(+), 58 deletions(-)
diff --git a/drivers/net/mdio/mdio-bcm-unimac.c b/drivers/net/mdio/mdio-bcm-unimac.c index fbd36891ee64..5d171e7f118d 100644 --- a/drivers/net/mdio/mdio-bcm-unimac.c +++ b/drivers/net/mdio/mdio-bcm-unimac.c @@ -5,20 +5,18 @@ * Copyright (C) 2014-2017 Broadcom */
+#include <linux/clk.h> +#include <linux/delay.h> +#include <linux/io.h> #include <linux/kernel.h> -#include <linux/phy.h> -#include <linux/platform_device.h> -#include <linux/sched.h> #include <linux/module.h> -#include <linux/io.h> -#include <linux/delay.h> -#include <linux/clk.h> - #include <linux/of.h> -#include <linux/of_platform.h> #include <linux/of_mdio.h> - +#include <linux/of_platform.h> +#include <linux/phy.h> #include <linux/platform_data/mdio-bcm-unimac.h> +#include <linux/platform_device.h> +#include <linux/sched.h>
#define MDIO_CMD 0x00 #define MDIO_START_BUSY (1 << 29) diff --git a/drivers/net/mdio/mdio-bitbang.c b/drivers/net/mdio/mdio-bitbang.c index 5136275c8e73..99588192cc78 100644 --- a/drivers/net/mdio/mdio-bitbang.c +++ b/drivers/net/mdio/mdio-bitbang.c @@ -14,10 +14,10 @@ * Vitaly Bordug vbordug@ru.mvista.com */
-#include <linux/module.h> +#include <linux/delay.h> #include <linux/mdio-bitbang.h> +#include <linux/module.h> #include <linux/types.h> -#include <linux/delay.h>
#define MDIO_READ 2 #define MDIO_WRITE 1 diff --git a/drivers/net/mdio/mdio-cavium.c b/drivers/net/mdio/mdio-cavium.c index 1afd6fc1a351..95ce274c1be1 100644 --- a/drivers/net/mdio/mdio-cavium.c +++ b/drivers/net/mdio/mdio-cavium.c @@ -4,9 +4,9 @@ */
#include <linux/delay.h> +#include <linux/io.h> #include <linux/module.h> #include <linux/phy.h> -#include <linux/io.h>
#include "mdio-cavium.h"
diff --git a/drivers/net/mdio/mdio-gpio.c b/drivers/net/mdio/mdio-gpio.c index 1b00235d7dc5..56c8f914f893 100644 --- a/drivers/net/mdio/mdio-gpio.c +++ b/drivers/net/mdio/mdio-gpio.c @@ -17,15 +17,15 @@ * Vitaly Bordug vbordug@ru.mvista.com */
-#include <linux/module.h> -#include <linux/slab.h> +#include <linux/gpio/consumer.h> #include <linux/interrupt.h> -#include <linux/platform_device.h> -#include <linux/platform_data/mdio-gpio.h> #include <linux/mdio-bitbang.h> #include <linux/mdio-gpio.h> -#include <linux/gpio/consumer.h> +#include <linux/module.h> #include <linux/of_mdio.h> +#include <linux/platform_data/mdio-gpio.h> +#include <linux/platform_device.h> +#include <linux/slab.h>
struct mdio_gpio_info { struct mdiobb_ctrl ctrl; diff --git a/drivers/net/mdio/mdio-ipq4019.c b/drivers/net/mdio/mdio-ipq4019.c index 25c25ea6da66..9cd71d896963 100644 --- a/drivers/net/mdio/mdio-ipq4019.c +++ b/drivers/net/mdio/mdio-ipq4019.c @@ -3,10 +3,10 @@ /* Copyright (c) 2020 Sartura Ltd. */
#include <linux/delay.h> -#include <linux/kernel.h> -#include <linux/module.h> #include <linux/io.h> #include <linux/iopoll.h> +#include <linux/kernel.h> +#include <linux/module.h> #include <linux/of_address.h> #include <linux/of_mdio.h> #include <linux/phy.h> diff --git a/drivers/net/mdio/mdio-ipq8064.c b/drivers/net/mdio/mdio-ipq8064.c index f0a6bfa61645..49d4e9aa30bb 100644 --- a/drivers/net/mdio/mdio-ipq8064.c +++ b/drivers/net/mdio/mdio-ipq8064.c @@ -7,12 +7,12 @@
#include <linux/delay.h> #include <linux/kernel.h> +#include <linux/mfd/syscon.h> #include <linux/module.h> -#include <linux/regmap.h> #include <linux/of_mdio.h> #include <linux/of_address.h> #include <linux/platform_device.h> -#include <linux/mfd/syscon.h> +#include <linux/regmap.h>
/* MII address register definitions */ #define MII_ADDR_REG_ADDR 0x10 diff --git a/drivers/net/mdio/mdio-mscc-miim.c b/drivers/net/mdio/mdio-mscc-miim.c index 1c9232fca1e2..037649bef92e 100644 --- a/drivers/net/mdio/mdio-mscc-miim.c +++ b/drivers/net/mdio/mdio-mscc-miim.c @@ -6,14 +6,14 @@ * Copyright (c) 2017 Microsemi Corporation */
-#include <linux/kernel.h> -#include <linux/module.h> -#include <linux/phy.h> -#include <linux/platform_device.h> #include <linux/bitops.h> #include <linux/io.h> #include <linux/iopoll.h> +#include <linux/kernel.h> +#include <linux/module.h> #include <linux/of_mdio.h> +#include <linux/phy.h> +#include <linux/platform_device.h>
#define MSCC_MIIM_REG_STATUS 0x0 #define MSCC_MIIM_STATUS_STAT_PENDING BIT(2) diff --git a/drivers/net/mdio/mdio-mux-bcm-iproc.c b/drivers/net/mdio/mdio-mux-bcm-iproc.c index 42fb5f166136..641cfa41f492 100644 --- a/drivers/net/mdio/mdio-mux-bcm-iproc.c +++ b/drivers/net/mdio/mdio-mux-bcm-iproc.c @@ -3,14 +3,14 @@ * Copyright 2016 Broadcom */ #include <linux/clk.h> -#include <linux/platform_device.h> +#include <linux/delay.h> #include <linux/device.h> -#include <linux/of_mdio.h> +#include <linux/iopoll.h> +#include <linux/mdio-mux.h> #include <linux/module.h> +#include <linux/of_mdio.h> #include <linux/phy.h> -#include <linux/mdio-mux.h> -#include <linux/delay.h> -#include <linux/iopoll.h> +#include <linux/platform_device.h>
#define MDIO_RATE_ADJ_EXT_OFFSET 0x000 #define MDIO_RATE_ADJ_INT_OFFSET 0x004 diff --git a/drivers/net/mdio/mdio-mux-gpio.c b/drivers/net/mdio/mdio-mux-gpio.c index 10a758fdc9e6..3c7f16f06b45 100644 --- a/drivers/net/mdio/mdio-mux-gpio.c +++ b/drivers/net/mdio/mdio-mux-gpio.c @@ -3,13 +3,13 @@ * Copyright (C) 2011, 2012 Cavium, Inc. */
-#include <linux/platform_device.h> #include <linux/device.h> -#include <linux/of_mdio.h> +#include <linux/gpio/consumer.h> +#include <linux/mdio-mux.h> #include <linux/module.h> +#include <linux/of_mdio.h> #include <linux/phy.h> -#include <linux/mdio-mux.h> -#include <linux/gpio/consumer.h> +#include <linux/platform_device.h>
#define DRV_VERSION "1.1" #define DRV_DESCRIPTION "GPIO controlled MDIO bus multiplexer driver" diff --git a/drivers/net/mdio/mdio-mux-mmioreg.c b/drivers/net/mdio/mdio-mux-mmioreg.c index d1a8780e24d8..c02fb2a067ee 100644 --- a/drivers/net/mdio/mdio-mux-mmioreg.c +++ b/drivers/net/mdio/mdio-mux-mmioreg.c @@ -7,13 +7,13 @@ * Copyright 2012 Freescale Semiconductor, Inc. */
-#include <linux/platform_device.h> #include <linux/device.h> +#include <linux/mdio-mux.h> +#include <linux/module.h> #include <linux/of_address.h> #include <linux/of_mdio.h> -#include <linux/module.h> #include <linux/phy.h> -#include <linux/mdio-mux.h> +#include <linux/platform_device.h>
struct mdio_mux_mmioreg_state { void *mux_handle; diff --git a/drivers/net/mdio/mdio-mux-multiplexer.c b/drivers/net/mdio/mdio-mux-multiplexer.c index d6564381aa3e..527acfc3c045 100644 --- a/drivers/net/mdio/mdio-mux-multiplexer.c +++ b/drivers/net/mdio/mdio-mux-multiplexer.c @@ -4,10 +4,10 @@ * Copyright 2019 NXP */
-#include <linux/platform_device.h> #include <linux/mdio-mux.h> #include <linux/module.h> #include <linux/mux/consumer.h> +#include <linux/platform_device.h>
struct mdio_mux_multiplexer_state { struct mux_control *muxc; diff --git a/drivers/net/mdio/mdio-mux.c b/drivers/net/mdio/mdio-mux.c index ccb3ee704eb1..3dde0c2b3e09 100644 --- a/drivers/net/mdio/mdio-mux.c +++ b/drivers/net/mdio/mdio-mux.c @@ -3,12 +3,12 @@ * Copyright (C) 2011, 2012 Cavium, Inc. */
-#include <linux/platform_device.h> -#include <linux/mdio-mux.h> -#include <linux/of_mdio.h> #include <linux/device.h> +#include <linux/mdio-mux.h> #include <linux/module.h> +#include <linux/of_mdio.h> #include <linux/phy.h> +#include <linux/platform_device.h>
#define DRV_DESCRIPTION "MDIO bus multiplexer driver"
diff --git a/drivers/net/mdio/mdio-octeon.c b/drivers/net/mdio/mdio-octeon.c index 6faf39314ac9..e096e68ac667 100644 --- a/drivers/net/mdio/mdio-octeon.c +++ b/drivers/net/mdio/mdio-octeon.c @@ -3,13 +3,13 @@ * Copyright (C) 2009-2015 Cavium, Inc. */
-#include <linux/platform_device.h> +#include <linux/gfp.h> +#include <linux/io.h> +#include <linux/module.h> #include <linux/of_address.h> #include <linux/of_mdio.h> -#include <linux/module.h> -#include <linux/gfp.h> #include <linux/phy.h> -#include <linux/io.h> +#include <linux/platform_device.h>
#include "mdio-cavium.h"
diff --git a/drivers/net/mdio/mdio-thunder.c b/drivers/net/mdio/mdio-thunder.c index dd7430c998a2..822d2cdd2f35 100644 --- a/drivers/net/mdio/mdio-thunder.c +++ b/drivers/net/mdio/mdio-thunder.c @@ -3,14 +3,14 @@ * Copyright (C) 2009-2016 Cavium, Inc. */
-#include <linux/of_address.h> -#include <linux/of_mdio.h> -#include <linux/module.h> +#include <linux/acpi.h> #include <linux/gfp.h> -#include <linux/phy.h> #include <linux/io.h> -#include <linux/acpi.h> +#include <linux/module.h> +#include <linux/of_address.h> +#include <linux/of_mdio.h> #include <linux/pci.h> +#include <linux/phy.h>
#include "mdio-cavium.h"
diff --git a/drivers/net/mdio/mdio-xgene.c b/drivers/net/mdio/mdio-xgene.c index 461207cdf5d6..7ab4e26db08c 100644 --- a/drivers/net/mdio/mdio-xgene.c +++ b/drivers/net/mdio/mdio-xgene.c @@ -13,11 +13,11 @@ #include <linux/io.h> #include <linux/mdio/mdio-xgene.h> #include <linux/module.h> -#include <linux/of_platform.h> -#include <linux/of_net.h> #include <linux/of_mdio.h> -#include <linux/prefetch.h> +#include <linux/of_net.h> +#include <linux/of_platform.h> #include <linux/phy.h> +#include <linux/prefetch.h> #include <net/ip.h>
static bool xgene_mdio_status; diff --git a/drivers/net/mdio/of_mdio.c b/drivers/net/mdio/of_mdio.c index 4daf94bb56a5..ea0bf13e8ac3 100644 --- a/drivers/net/mdio/of_mdio.c +++ b/drivers/net/mdio/of_mdio.c @@ -8,17 +8,17 @@ * out of the OpenFirmware device tree and using it to populate an mii_bus. */
-#include <linux/kernel.h> #include <linux/device.h> -#include <linux/netdevice.h> #include <linux/err.h> -#include <linux/phy.h> -#include <linux/phy_fixed.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/netdevice.h> #include <linux/of.h> #include <linux/of_irq.h> #include <linux/of_mdio.h> #include <linux/of_net.h> -#include <linux/module.h> +#include <linux/phy.h> +#include <linux/phy_fixed.h>
#define DEFAULT_GPIO_RESET_DELAY 10 /* in microseconds */
From: Vadim Pasternak vadimp@nvidia.com
[ Upstream commit d452088cdfd5a4ad9d96d847d2273fe958d6339b ]
Add mutex_destroy() call in driver initialization error flow.
Fixes: 6882b0aee180f ("mlxsw: Introduce support for I2C bus") Signed-off-by: Vadim Pasternak vadimp@nvidia.com Signed-off-by: Ido Schimmel idosch@nvidia.com Link: https://lore.kernel.org/r/20220407070703.2421076-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlxsw/i2c.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/i2c.c b/drivers/net/ethernet/mellanox/mlxsw/i2c.c index 939b692ffc33..ce843ea91464 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/i2c.c +++ b/drivers/net/ethernet/mellanox/mlxsw/i2c.c @@ -650,6 +650,7 @@ static int mlxsw_i2c_probe(struct i2c_client *client, return 0;
errout: + mutex_destroy(&mlxsw_i2c->cmd.lock); i2c_set_clientdata(client, NULL);
return err;
From: Marcelo Ricardo Leitner marcelo.leitner@gmail.com
[ Upstream commit e65812fd22eba32f11abe28cb377cbd64cfb1ba0 ]
Currently, when inserting a new filter that needs to sit at the head of chain 0, it will first update the heads pointer on all devices using the (shared) block, and only then complete the initialization of the new element so that it has a "next" element.
This can lead to a situation that the chain 0 head is propagated to another CPU before the "next" initialization is done. When this race condition is triggered, packets being matched on that CPU will simply miss all other filters, and will flow through the stack as if there were no other filters installed. If the system is using OVS + TC, such packets will get handled by vswitchd via upcall, which results in much higher latency and reordering. For other applications it may result in packet drops.
This is reproducible with a tc only setup, but it varies from system to system. It could be reproduced with a shared block amongst 10 veth tunnels, and an ingress filter mirroring packets to another veth. That's because using the last added veth tunnel to the shared block to do the actual traffic, it makes the race window bigger and easier to trigger.
The fix is rather simple, to just initialize the next pointer of the new filter instance (tp) before propagating the head change.
The fixes tag is pointing to the original code though this issue should only be observed when using it unlocked.
Fixes: 2190d1d0944f ("net: sched: introduce helpers to work with filter chains") Signed-off-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Signed-off-by: Vlad Buslov vladbu@nvidia.com Reviewed-by: Davide Caratti dcaratti@redhat.com Link: https://lore.kernel.org/r/b97d5f4eaffeeb9d058155bcab63347527261abf.164934136... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/cls_api.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 9a789a057a74..b8ffb7e4f696 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -1656,10 +1656,10 @@ static int tcf_chain_tp_insert(struct tcf_chain *chain, if (chain->flushing) return -EAGAIN;
+ RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain, chain_info)); if (*chain_info->pprev == chain->filter_chain) tcf_chain0_head_change(chain, tp); tcf_proto_get(tp); - RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain, chain_info)); rcu_assign_pointer(*chain_info->pprev, tp);
return 0;
From: Michael Walle michael@walle.cc
[ Upstream commit e6934e4048c91502efcb21da92b7ae37cd8fa741 ]
The DSA master might not have been probed yet in which case the probe of the felix switch fails with -EPROBE_DEFER: [ 4.435305] mscc_felix 0000:00:00.5: Failed to register DSA switch: -517
It is not an error. Use dev_err_probe() to demote this particular error to a debug message.
Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Signed-off-by: Michael Walle michael@walle.cc Reviewed-by: Vladimir Oltean vladimir.oltean@nxp.com Link: https://lore.kernel.org/r/20220408101521.281886-1-michael@walle.cc Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/ocelot/felix_vsc9959.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c index cd8d9b0e0edb..c96dfc11aa6f 100644 --- a/drivers/net/dsa/ocelot/felix_vsc9959.c +++ b/drivers/net/dsa/ocelot/felix_vsc9959.c @@ -1466,7 +1466,7 @@ static int felix_pci_probe(struct pci_dev *pdev,
err = dsa_register_switch(ds); if (err) { - dev_err(&pdev->dev, "Failed to register DSA switch: %d\n", err); + dev_err_probe(&pdev->dev, err, "Failed to register DSA switch\n"); goto err_register_ds; }
From: Dinh Nguyen dinguyen@kernel.org
[ Upstream commit a6aaa00324240967272b451bfa772547bd576ee6 ]
When using a fixed-link, the altr_tse_pcs driver crashes due to null-pointer dereference as no phy_device is provided to tse_pcs_fix_mac_speed function. Fix this by adding a check for phy_dev before calling the tse_pcs_fix_mac_speed() function.
Also clean up the tse_pcs_fix_mac_speed function a bit. There is no need to check for splitter_base and sgmii_adapter_base because the driver will fail if these 2 variables are not derived from the device tree.
Fixes: fb3bbdb85989 ("net: ethernet: Add TSE PCS support to dwmac-socfpga") Signed-off-by: Dinh Nguyen dinguyen@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c | 8 -------- drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.h | 4 ++++ drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c | 13 +++++-------- 3 files changed, 9 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c b/drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c index cd478d2cd871..00f6d347eaf7 100644 --- a/drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c +++ b/drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c @@ -57,10 +57,6 @@ #define TSE_PCS_USE_SGMII_ENA BIT(0) #define TSE_PCS_IF_USE_SGMII 0x03
-#define SGMII_ADAPTER_CTRL_REG 0x00 -#define SGMII_ADAPTER_DISABLE 0x0001 -#define SGMII_ADAPTER_ENABLE 0x0000 - #define AUTONEGO_LINK_TIMER 20
static int tse_pcs_reset(void __iomem *base, struct tse_pcs *pcs) @@ -202,12 +198,8 @@ void tse_pcs_fix_mac_speed(struct tse_pcs *pcs, struct phy_device *phy_dev, unsigned int speed) { void __iomem *tse_pcs_base = pcs->tse_pcs_base; - void __iomem *sgmii_adapter_base = pcs->sgmii_adapter_base; u32 val;
- writew(SGMII_ADAPTER_ENABLE, - sgmii_adapter_base + SGMII_ADAPTER_CTRL_REG); - pcs->autoneg = phy_dev->autoneg;
if (phy_dev->autoneg == AUTONEG_ENABLE) { diff --git a/drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.h b/drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.h index 442812c0a4bd..694ac25ef426 100644 --- a/drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.h +++ b/drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.h @@ -10,6 +10,10 @@ #include <linux/phy.h> #include <linux/timer.h>
+#define SGMII_ADAPTER_CTRL_REG 0x00 +#define SGMII_ADAPTER_ENABLE 0x0000 +#define SGMII_ADAPTER_DISABLE 0x0001 + struct tse_pcs { struct device *dev; void __iomem *tse_pcs_base; diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c index f37b6d57b2fe..8bb0106cb7ea 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c @@ -18,9 +18,6 @@
#include "altr_tse_pcs.h"
-#define SGMII_ADAPTER_CTRL_REG 0x00 -#define SGMII_ADAPTER_DISABLE 0x0001 - #define SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_GMII_MII 0x0 #define SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_RGMII 0x1 #define SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_RMII 0x2 @@ -62,16 +59,14 @@ static void socfpga_dwmac_fix_mac_speed(void *priv, unsigned int speed) { struct socfpga_dwmac *dwmac = (struct socfpga_dwmac *)priv; void __iomem *splitter_base = dwmac->splitter_base; - void __iomem *tse_pcs_base = dwmac->pcs.tse_pcs_base; void __iomem *sgmii_adapter_base = dwmac->pcs.sgmii_adapter_base; struct device *dev = dwmac->dev; struct net_device *ndev = dev_get_drvdata(dev); struct phy_device *phy_dev = ndev->phydev; u32 val;
- if ((tse_pcs_base) && (sgmii_adapter_base)) - writew(SGMII_ADAPTER_DISABLE, - sgmii_adapter_base + SGMII_ADAPTER_CTRL_REG); + writew(SGMII_ADAPTER_DISABLE, + sgmii_adapter_base + SGMII_ADAPTER_CTRL_REG);
if (splitter_base) { val = readl(splitter_base + EMAC_SPLITTER_CTRL_REG); @@ -93,7 +88,9 @@ static void socfpga_dwmac_fix_mac_speed(void *priv, unsigned int speed) writel(val, splitter_base + EMAC_SPLITTER_CTRL_REG); }
- if (tse_pcs_base && sgmii_adapter_base) + writew(SGMII_ADAPTER_ENABLE, + sgmii_adapter_base + SGMII_ADAPTER_CTRL_REG); + if (phy_dev) tse_pcs_fix_mac_speed(&dwmac->pcs, phy_dev, speed); }
From: Benedikt Spranger b.spranger@linutronix.de
[ Upstream commit e8a64bbaaad1f6548cec5508297bc6d45e8ab69e ]
A user may set the SO_TXTIME socket option to ensure a packet is send at a given time. The taprio scheduler has to confirm, that it is allowed to send a packet at that given time, by a check against the packet time schedule. The scheduler drop the packet, if the gates are closed at the given send time.
The check, if SO_TXTIME is set, may fail since sk_flags are part of an union and the union is used otherwise. This happen, if a socket is not a full socket, like a request socket for example.
Add a check to verify, if the union is used for sk_flags.
Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode") Signed-off-by: Benedikt Spranger b.spranger@linutronix.de Reviewed-by: Kurt Kanzenbach kurt@linutronix.de Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_taprio.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 806babdd838d..eca525791013 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -427,7 +427,8 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch, if (unlikely(!child)) return qdisc_drop(skb, sch, to_free);
- if (skb->sk && sock_flag(skb->sk, SOCK_TXTIME)) { + /* sk_flags are only safe to use on full sockets. */ + if (skb->sk && sk_fullsock(skb->sk) && sock_flag(skb->sk, SOCK_TXTIME)) { if (!is_valid_interval(skb, sch)) return qdisc_drop(skb, sch, to_free); } else if (TXTIME_ASSIST_IS_ENABLED(q->flags)) {
From: Rameshkumar Sundaram quic_ramess@quicinc.com
[ Upstream commit a5199b5626cd6913cf8776a835bc63d40e0686ad ]
Synchronize additions to nontrans_list of transmitting BSS with bss_lock to avoid races. Also when cfg80211_add_nontrans_list() fails __cfg80211_unlink_bss() needs bss_lock to be held (has lockdep assert on bss_lock). So protect the whole block with bss_lock to avoid races and warnings. Found during code review.
Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning") Signed-off-by: Rameshkumar Sundaram quic_ramess@quicinc.com Link: https://lore.kernel.org/r/1649668071-9370-1-git-send-email-quic_ramess@quici... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/scan.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/wireless/scan.c b/net/wireless/scan.c index c1b2655682a8..6dc9b7e22b71 100644 --- a/net/wireless/scan.c +++ b/net/wireless/scan.c @@ -1968,11 +1968,13 @@ cfg80211_inform_single_bss_data(struct wiphy *wiphy, /* this is a nontransmitting bss, we need to add it to * transmitting bss' list if it is not there */ + spin_lock_bh(&rdev->bss_lock); if (cfg80211_add_nontrans_list(non_tx_data->tx_bss, &res->pub)) { if (__cfg80211_unlink_bss(rdev, res)) rdev->bss_generation++; } + spin_unlock_bh(&rdev->bss_lock); }
trace_cfg80211_return_bss(&res->pub);
From: Rob Clark robdclark@chromium.org
[ Upstream commit 537fef808be5ea56f6fc06932162550819a3b3c3 ]
The fourth param is size, rather than range_end.
Note that we could increase the address space size if we had a way to prevent buffers from spanning a 4G split, mostly just to avoid fw bugs with 64b math.
Fixes: 84c31ee16f90 ("drm/msm/a6xx: Add support for per-instance pagetables") Signed-off-by: Rob Clark robdclark@chromium.org Link: https://lore.kernel.org/r/20220407202836.1211268-1-robdclark@gmail.com Signed-off-by: Rob Clark robdclark@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index 9e09805575db..39563daff4a0 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -1222,7 +1222,7 @@ a6xx_create_private_address_space(struct msm_gpu *gpu) return ERR_CAST(mmu);
return msm_gem_address_space_create(mmu, - "gpu", 0x100000000ULL, 0x1ffffffffULL); + "gpu", 0x100000000ULL, SZ_4G); }
static uint32_t a6xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
From: Stephen Boyd swboyd@chromium.org
[ Upstream commit 47b7de6b88b962ef339a2427a023d2a23d161654 ]
The member 'msm_dsi->connector' isn't assigned until msm_dsi_manager_connector_init() returns (see msm_dsi_modeset_init() and how it assigns the return value). Therefore this pointer is going to be NULL here. Let's use 'connector' which is what was intended.
Cc: Dmitry Baryshkov dmitry.baryshkov@linaro.org Cc: Sean Paul seanpaul@chromium.org Fixes: 6d5e78406991 ("drm/msm/dsi: Move dsi panel init into modeset init path") Signed-off-by: Stephen Boyd swboyd@chromium.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Patchwork: https://patchwork.freedesktop.org/patch/478693/ Link: https://lore.kernel.org/r/20220318000731.2823718-1-swboyd@chromium.org Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Rob Clark robdclark@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/dsi/dsi_manager.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/dsi/dsi_manager.c b/drivers/gpu/drm/msm/dsi/dsi_manager.c index 1d28dfba2c9b..fb421ca56b3d 100644 --- a/drivers/gpu/drm/msm/dsi/dsi_manager.c +++ b/drivers/gpu/drm/msm/dsi/dsi_manager.c @@ -644,7 +644,7 @@ struct drm_connector *msm_dsi_manager_connector_init(u8 id) return connector;
fail: - connector->funcs->destroy(msm_dsi->connector); + connector->funcs->destroy(connector); return ERR_PTR(ret); }
From: Karsten Graul kgraul@linux.ibm.com
[ Upstream commit d22f4f977236f97e01255a80bca2ea93a8094fc8 ]
dev_name() was called with dev.parent as argument but without to NULL-check it before. Solve this by checking the pointer before the call to dev_name().
Fixes: af5f60c7e3d5 ("net/smc: allow PCI IDs as ib device names in the pnet table") Reported-by: syzbot+03e3e228510223dabd34@syzkaller.appspotmail.com Signed-off-by: Karsten Graul kgraul@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/smc/smc_pnet.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c index 9007c7e3bae4..30bae60d626c 100644 --- a/net/smc/smc_pnet.c +++ b/net/smc/smc_pnet.c @@ -310,8 +310,9 @@ static struct smc_ib_device *smc_pnet_find_ib(char *ib_name) list_for_each_entry(ibdev, &smc_ib_devices.list, list) { if (!strncmp(ibdev->ibdev->name, ib_name, sizeof(ibdev->ibdev->name)) || - !strncmp(dev_name(ibdev->ibdev->dev.parent), ib_name, - IB_DEVICE_NAME_MAX - 1)) { + (ibdev->ibdev->dev.parent && + !strncmp(dev_name(ibdev->ibdev->dev.parent), ib_name, + IB_DEVICE_NAME_MAX - 1))) { goto out; } }
From: Ajish Koshy Ajish.Koshy@microchip.com
[ Upstream commit 294080eacf92a0781e6d43663448a55001ec8c64 ]
When upper inbound and outbound queues 32-63 are enabled, we see upper vectors 32-63 in interrupt service routine. We need corresponding registers to handle masking and unmasking of these upper interrupts.
To achieve this, we use registers MSGU_ODMR_U(0x34) to mask and MSGU_ODMR_CLR_U(0x3C) to unmask the interrupts. In these registers bit 0-31 represents interrupt vectors 32-63.
Link: https://lore.kernel.org/r/20220411064603.668448-2-Ajish.Koshy@microchip.com Fixes: 05c6c029a44d ("scsi: pm80xx: Increase number of supported queues") Reviewed-by: John Garry john.garry@huawei.com Acked-by: Jack Wang jinpu.wang@ionos.com Signed-off-by: Ajish Koshy Ajish.Koshy@microchip.com Signed-off-by: Viswas G Viswas.G@microchip.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/pm8001/pm80xx_hwi.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c index 4c03bf08b543..0543ff3ff1ba 100644 --- a/drivers/scsi/pm8001/pm80xx_hwi.c +++ b/drivers/scsi/pm8001/pm80xx_hwi.c @@ -1682,10 +1682,11 @@ static void pm80xx_chip_interrupt_enable(struct pm8001_hba_info *pm8001_ha, u8 vec) { #ifdef PM8001_USE_MSIX - u32 mask; - mask = (u32)(1 << vec); - - pm8001_cw32(pm8001_ha, 0, MSGU_ODMR_CLR, (u32)(mask & 0xFFFFFFFF)); + if (vec < 32) + pm8001_cw32(pm8001_ha, 0, MSGU_ODMR_CLR, 1U << vec); + else + pm8001_cw32(pm8001_ha, 0, MSGU_ODMR_CLR_U, + 1U << (vec - 32)); return; #endif pm80xx_chip_intx_interrupt_enable(pm8001_ha); @@ -1701,12 +1702,15 @@ static void pm80xx_chip_interrupt_disable(struct pm8001_hba_info *pm8001_ha, u8 vec) { #ifdef PM8001_USE_MSIX - u32 mask; - if (vec == 0xFF) - mask = 0xFFFFFFFF; + if (vec == 0xFF) { + /* disable all vectors 0-31, 32-63 */ + pm8001_cw32(pm8001_ha, 0, MSGU_ODMR, 0xFFFFFFFF); + pm8001_cw32(pm8001_ha, 0, MSGU_ODMR_U, 0xFFFFFFFF); + } else if (vec < 32) + pm8001_cw32(pm8001_ha, 0, MSGU_ODMR, 1U << vec); else - mask = (u32)(1 << vec); - pm8001_cw32(pm8001_ha, 0, MSGU_ODMR, (u32)(mask & 0xFFFFFFFF)); + pm8001_cw32(pm8001_ha, 0, MSGU_ODMR_U, + 1U << (vec - 32)); return; #endif pm80xx_chip_intx_interrupt_disable(pm8001_ha);
From: Ajish Koshy Ajish.Koshy@microchip.com
[ Upstream commit bcd8a45223470e00b5f254018174d64a75db4bbe ]
Executing driver on servers with more than 32 CPUs were faced with command timeouts. This is because we were not geting completions for commands submitted on IQ32 - IQ63.
Set E64Q bit to enable upper inbound and outbound queues 32 to 63 in the MPI main configuration table.
Added 500ms delay after successful MPI initialization as mentioned in controller datasheet.
Link: https://lore.kernel.org/r/20220411064603.668448-3-Ajish.Koshy@microchip.com Fixes: 05c6c029a44d ("scsi: pm80xx: Increase number of supported queues") Reviewed-by: Damien Le Moal damien.lemoal@opensource.wdc.com Acked-by: Jack Wang jinpu.wang@ionos.com Signed-off-by: Ajish Koshy Ajish.Koshy@microchip.com Signed-off-by: Viswas G Viswas.G@microchip.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/pm8001/pm80xx_hwi.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c index 0543ff3ff1ba..0305c8999ba5 100644 --- a/drivers/scsi/pm8001/pm80xx_hwi.c +++ b/drivers/scsi/pm8001/pm80xx_hwi.c @@ -765,6 +765,10 @@ static void init_default_table_values(struct pm8001_hba_info *pm8001_ha) pm8001_ha->main_cfg_tbl.pm80xx_tbl.pcs_event_log_severity = 0x01; pm8001_ha->main_cfg_tbl.pm80xx_tbl.fatal_err_interrupt = 0x01;
+ /* Enable higher IQs and OQs, 32 to 63, bit 16 */ + if (pm8001_ha->max_q_num > 32) + pm8001_ha->main_cfg_tbl.pm80xx_tbl.fatal_err_interrupt |= + 1 << 16; /* Disable end to end CRC checking */ pm8001_ha->main_cfg_tbl.pm80xx_tbl.crc_core_dump = (0x1 << 16);
@@ -1024,6 +1028,13 @@ static int mpi_init_check(struct pm8001_hba_info *pm8001_ha) if (0x0000 != gst_len_mpistate) return -EBUSY;
+ /* + * As per controller datasheet, after successful MPI + * initialization minimum 500ms delay is required before + * issuing commands. + */ + msleep(500); + return 0; }
From: Mike Christie michael.christie@oracle.com
[ Upstream commit 891e2639deae721dc43764a44fa255890dc34313 ]
During ep_disconnect we have been doing iscsi_suspend_tx/queue to block new I/O but every driver except cxgbi and iscsi_tcp can still get I/O from __iscsi_conn_send_pdu() if we haven't called iscsi_conn_failure() before ep_disconnect. This could happen if we were terminating the session, and the logout timed out before it was even sent to libiscsi.
Fix the issue by adding a helper which reverses the bind_conn call that allows new I/O to be queued. Drivers implementing ep_disconnect can use this to make sure new I/O is not queued to them when handling the disconnect.
Link: https://lore.kernel.org/r/20210525181821.7617-3-michael.christie@oracle.com Reviewed-by: Lee Duncan lduncan@suse.com Signed-off-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/ulp/iser/iscsi_iser.c | 1 + drivers/scsi/be2iscsi/be_main.c | 1 + drivers/scsi/bnx2i/bnx2i_iscsi.c | 1 + drivers/scsi/cxgbi/cxgb3i/cxgb3i.c | 1 + drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 1 + drivers/scsi/libiscsi.c | 70 +++++++++++++++++++++--- drivers/scsi/qedi/qedi_iscsi.c | 1 + drivers/scsi/qla4xxx/ql4_os.c | 1 + drivers/scsi/scsi_transport_iscsi.c | 10 +++- include/scsi/libiscsi.h | 1 + include/scsi/scsi_transport_iscsi.h | 1 + 11 files changed, 78 insertions(+), 11 deletions(-)
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index 3690e28cc7ea..8857d8322710 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -988,6 +988,7 @@ static struct iscsi_transport iscsi_iser_transport = { /* connection management */ .create_conn = iscsi_iser_conn_create, .bind_conn = iscsi_iser_conn_bind, + .unbind_conn = iscsi_conn_unbind, .destroy_conn = iscsi_conn_teardown, .attr_is_visible = iser_attr_is_visible, .set_param = iscsi_iser_set_param, diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c index 987dc8135a9b..b977e039bb78 100644 --- a/drivers/scsi/be2iscsi/be_main.c +++ b/drivers/scsi/be2iscsi/be_main.c @@ -5810,6 +5810,7 @@ struct iscsi_transport beiscsi_iscsi_transport = { .destroy_session = beiscsi_session_destroy, .create_conn = beiscsi_conn_create, .bind_conn = beiscsi_conn_bind, + .unbind_conn = iscsi_conn_unbind, .destroy_conn = iscsi_conn_teardown, .attr_is_visible = beiscsi_attr_is_visible, .set_iface_param = beiscsi_iface_set_param, diff --git a/drivers/scsi/bnx2i/bnx2i_iscsi.c b/drivers/scsi/bnx2i/bnx2i_iscsi.c index 21efc73b87be..173d8c6a8cbf 100644 --- a/drivers/scsi/bnx2i/bnx2i_iscsi.c +++ b/drivers/scsi/bnx2i/bnx2i_iscsi.c @@ -2278,6 +2278,7 @@ struct iscsi_transport bnx2i_iscsi_transport = { .destroy_session = bnx2i_session_destroy, .create_conn = bnx2i_conn_create, .bind_conn = bnx2i_conn_bind, + .unbind_conn = iscsi_conn_unbind, .destroy_conn = bnx2i_conn_destroy, .attr_is_visible = bnx2i_attr_is_visible, .set_param = iscsi_set_param, diff --git a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c index 37d99357120f..edcd3fab6973 100644 --- a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c +++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c @@ -117,6 +117,7 @@ static struct iscsi_transport cxgb3i_iscsi_transport = { /* connection management */ .create_conn = cxgbi_create_conn, .bind_conn = cxgbi_bind_conn, + .unbind_conn = iscsi_conn_unbind, .destroy_conn = iscsi_tcp_conn_teardown, .start_conn = iscsi_conn_start, .stop_conn = iscsi_conn_stop, diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c index 2c3491528d42..efb3e2b3398e 100644 --- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c +++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c @@ -134,6 +134,7 @@ static struct iscsi_transport cxgb4i_iscsi_transport = { /* connection management */ .create_conn = cxgbi_create_conn, .bind_conn = cxgbi_bind_conn, + .unbind_conn = iscsi_conn_unbind, .destroy_conn = iscsi_tcp_conn_teardown, .start_conn = iscsi_conn_start, .stop_conn = iscsi_conn_stop, diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index d4e66c595eb8..05799b41974d 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1367,23 +1367,32 @@ void iscsi_session_failure(struct iscsi_session *session, } EXPORT_SYMBOL_GPL(iscsi_session_failure);
-void iscsi_conn_failure(struct iscsi_conn *conn, enum iscsi_err err) +static bool iscsi_set_conn_failed(struct iscsi_conn *conn) { struct iscsi_session *session = conn->session;
- spin_lock_bh(&session->frwd_lock); - if (session->state == ISCSI_STATE_FAILED) { - spin_unlock_bh(&session->frwd_lock); - return; - } + if (session->state == ISCSI_STATE_FAILED) + return false;
if (conn->stop_stage == 0) session->state = ISCSI_STATE_FAILED; - spin_unlock_bh(&session->frwd_lock);
set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx); set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_rx); - iscsi_conn_error_event(conn->cls_conn, err); + return true; +} + +void iscsi_conn_failure(struct iscsi_conn *conn, enum iscsi_err err) +{ + struct iscsi_session *session = conn->session; + bool needs_evt; + + spin_lock_bh(&session->frwd_lock); + needs_evt = iscsi_set_conn_failed(conn); + spin_unlock_bh(&session->frwd_lock); + + if (needs_evt) + iscsi_conn_error_event(conn->cls_conn, err); } EXPORT_SYMBOL_GPL(iscsi_conn_failure);
@@ -2117,6 +2126,51 @@ static void iscsi_check_transport_timeouts(struct timer_list *t) spin_unlock(&session->frwd_lock); }
+/** + * iscsi_conn_unbind - prevent queueing to conn. + * @cls_conn: iscsi conn ep is bound to. + * @is_active: is the conn in use for boot or is this for EH/termination + * + * This must be called by drivers implementing the ep_disconnect callout. + * It disables queueing to the connection from libiscsi in preparation for + * an ep_disconnect call. + */ +void iscsi_conn_unbind(struct iscsi_cls_conn *cls_conn, bool is_active) +{ + struct iscsi_session *session; + struct iscsi_conn *conn; + + if (!cls_conn) + return; + + conn = cls_conn->dd_data; + session = conn->session; + /* + * Wait for iscsi_eh calls to exit. We don't wait for the tmf to + * complete or timeout. The caller just wants to know what's running + * is everything that needs to be cleaned up, and no cmds will be + * queued. + */ + mutex_lock(&session->eh_mutex); + + iscsi_suspend_queue(conn); + iscsi_suspend_tx(conn); + + spin_lock_bh(&session->frwd_lock); + if (!is_active) { + /* + * if logout timed out before userspace could even send a PDU + * the state might still be in ISCSI_STATE_LOGGED_IN and + * allowing new cmds and TMFs. + */ + if (session->state == ISCSI_STATE_LOGGED_IN) + iscsi_set_conn_failed(conn); + } + spin_unlock_bh(&session->frwd_lock); + mutex_unlock(&session->eh_mutex); +} +EXPORT_SYMBOL_GPL(iscsi_conn_unbind); + static void iscsi_prep_abort_task_pdu(struct iscsi_task *task, struct iscsi_tm *hdr) { diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c index f51723e2d522..8f8036ae9a56 100644 --- a/drivers/scsi/qedi/qedi_iscsi.c +++ b/drivers/scsi/qedi/qedi_iscsi.c @@ -1428,6 +1428,7 @@ struct iscsi_transport qedi_iscsi_transport = { .destroy_session = qedi_session_destroy, .create_conn = qedi_conn_create, .bind_conn = qedi_conn_bind, + .unbind_conn = iscsi_conn_unbind, .start_conn = qedi_conn_start, .stop_conn = iscsi_conn_stop, .destroy_conn = qedi_conn_destroy, diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c index 2c23b692e318..a8b8bf118c76 100644 --- a/drivers/scsi/qla4xxx/ql4_os.c +++ b/drivers/scsi/qla4xxx/ql4_os.c @@ -259,6 +259,7 @@ static struct iscsi_transport qla4xxx_iscsi_transport = { .start_conn = qla4xxx_conn_start, .create_conn = qla4xxx_conn_create, .bind_conn = qla4xxx_conn_bind, + .unbind_conn = iscsi_conn_unbind, .stop_conn = iscsi_conn_stop, .destroy_conn = qla4xxx_conn_destroy, .set_param = iscsi_set_param, diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c index a5759d0e388a..6490211b55d2 100644 --- a/drivers/scsi/scsi_transport_iscsi.c +++ b/drivers/scsi/scsi_transport_iscsi.c @@ -2957,7 +2957,7 @@ static int iscsi_if_ep_connect(struct iscsi_transport *transport, }
static int iscsi_if_ep_disconnect(struct iscsi_transport *transport, - u64 ep_handle) + u64 ep_handle, bool is_active) { struct iscsi_cls_conn *conn; struct iscsi_endpoint *ep; @@ -2974,6 +2974,8 @@ static int iscsi_if_ep_disconnect(struct iscsi_transport *transport, conn->ep = NULL; mutex_unlock(&conn->ep_mutex); conn->state = ISCSI_CONN_FAILED; + + transport->unbind_conn(conn, is_active); }
transport->ep_disconnect(ep); @@ -3005,7 +3007,8 @@ iscsi_if_transport_ep(struct iscsi_transport *transport, break; case ISCSI_UEVENT_TRANSPORT_EP_DISCONNECT: rc = iscsi_if_ep_disconnect(transport, - ev->u.ep_disconnect.ep_handle); + ev->u.ep_disconnect.ep_handle, + false); break; } return rc; @@ -3730,7 +3733,7 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group) conn = iscsi_conn_lookup(ev->u.b_conn.sid, ev->u.b_conn.cid);
if (conn && conn->ep) - iscsi_if_ep_disconnect(transport, conn->ep->id); + iscsi_if_ep_disconnect(transport, conn->ep->id, true);
if (!session || !conn) { err = -EINVAL; @@ -4649,6 +4652,7 @@ iscsi_register_transport(struct iscsi_transport *tt) int err;
BUG_ON(!tt); + WARN_ON(tt->ep_disconnect && !tt->unbind_conn);
priv = iscsi_if_transport_lookup(tt); if (priv) diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h index 2b5f97224f69..fa00e2543ad6 100644 --- a/include/scsi/libiscsi.h +++ b/include/scsi/libiscsi.h @@ -421,6 +421,7 @@ extern int iscsi_conn_start(struct iscsi_cls_conn *); extern void iscsi_conn_stop(struct iscsi_cls_conn *, int); extern int iscsi_conn_bind(struct iscsi_cls_session *, struct iscsi_cls_conn *, int); +extern void iscsi_conn_unbind(struct iscsi_cls_conn *cls_conn, bool is_active); extern void iscsi_conn_failure(struct iscsi_conn *conn, enum iscsi_err err); extern void iscsi_session_failure(struct iscsi_session *session, enum iscsi_err err); diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h index f28bb20d6271..eb6ed499324d 100644 --- a/include/scsi/scsi_transport_iscsi.h +++ b/include/scsi/scsi_transport_iscsi.h @@ -82,6 +82,7 @@ struct iscsi_transport { void (*destroy_session) (struct iscsi_cls_session *session); struct iscsi_cls_conn *(*create_conn) (struct iscsi_cls_session *sess, uint32_t cid); + void (*unbind_conn) (struct iscsi_cls_conn *conn, bool is_active); int (*bind_conn) (struct iscsi_cls_session *session, struct iscsi_cls_conn *cls_conn, uint64_t transport_eph, int is_leading);
From: Mike Christie michael.christie@oracle.com
[ Upstream commit 06c203a5566beecebb1f8838d026de8a61c8df71 ]
If the system is not up, we can just fail immediately since iscsid is not going to ever answer our netlink events. We are already setting the recovery_tmo to 0, but by passing stop_conn STOP_CONN_TERM we never will block the session and start the recovery timer, because for that flag userspace will do the unbind and destroy events which would remove the devices and wake up and kill the eh.
Since the conn is dead and the system is going dowm this just has us use STOP_CONN_RECOVER with recovery_tmo=0 so we fail immediately. However, if the user has set the recovery_tmo=-1 we let the system hang like they requested since they might have used that setting for specific reasons (one known reason is for buggy cluster software).
Link: https://lore.kernel.org/r/20210525181821.7617-5-michael.christie@oracle.com Signed-off-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_transport_iscsi.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c index 6490211b55d2..f93660142e35 100644 --- a/drivers/scsi/scsi_transport_iscsi.c +++ b/drivers/scsi/scsi_transport_iscsi.c @@ -2508,11 +2508,17 @@ static void stop_conn_work_fn(struct work_struct *work) session = iscsi_session_lookup(sid); if (session) { if (system_state != SYSTEM_RUNNING) { - session->recovery_tmo = 0; - iscsi_if_stop_conn(conn, STOP_CONN_TERM); - } else { - iscsi_if_stop_conn(conn, STOP_CONN_RECOVER); + /* + * If the user has set up for the session to + * never timeout then hang like they wanted. + * For all other cases fail right away since + * userspace is not going to relogin. + */ + if (session->recovery_tmo > 0) + session->recovery_tmo = 0; } + + iscsi_if_stop_conn(conn, STOP_CONN_RECOVER); }
list_del_init(&conn->conn_list_err);
From: Mike Christie michael.christie@oracle.com
[ Upstream commit b25b957d2db1585602c2c70fdf4261a5641fe6b7 ]
Use the system_unbound_wq for async session destruction. We don't need a dedicated workqueue for async session destruction because:
1. perf does not seem to be an issue since we only allow 1 active work.
2. it does not have deps with other system works and we can run them in parallel with each other.
Link: https://lore.kernel.org/r/20210525181821.7617-6-michael.christie@oracle.com Reviewed-by: Lee Duncan lduncan@suse.com Signed-off-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_transport_iscsi.c | 15 +-------------- 1 file changed, 1 insertion(+), 14 deletions(-)
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c index f93660142e35..ed0c7e812445 100644 --- a/drivers/scsi/scsi_transport_iscsi.c +++ b/drivers/scsi/scsi_transport_iscsi.c @@ -95,8 +95,6 @@ static DECLARE_WORK(stop_conn_work, stop_conn_work_fn); static atomic_t iscsi_session_nr; /* sysfs session id for next new session */ static struct workqueue_struct *iscsi_eh_timer_workq;
-static struct workqueue_struct *iscsi_destroy_workq; - static DEFINE_IDA(iscsi_sess_ida); /* * list of registered transports and lock that must @@ -3717,7 +3715,7 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group) list_del_init(&session->sess_list); spin_unlock_irqrestore(&sesslock, flags);
- queue_work(iscsi_destroy_workq, &session->destroy_work); + queue_work(system_unbound_wq, &session->destroy_work); } break; case ISCSI_UEVENT_UNBIND_SESSION: @@ -4813,18 +4811,8 @@ static __init int iscsi_transport_init(void) goto release_nls; }
- iscsi_destroy_workq = alloc_workqueue("%s", - WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND, - 1, "iscsi_destroy"); - if (!iscsi_destroy_workq) { - err = -ENOMEM; - goto destroy_wq; - } - return 0;
-destroy_wq: - destroy_workqueue(iscsi_eh_timer_workq); release_nls: netlink_kernel_release(nls); unregister_flashnode_bus: @@ -4846,7 +4834,6 @@ static __init int iscsi_transport_init(void)
static void __exit iscsi_transport_exit(void) { - destroy_workqueue(iscsi_destroy_workq); destroy_workqueue(iscsi_eh_timer_workq); netlink_kernel_release(nls); bus_unregister(&iscsi_flashnode_bus);
From: Mike Christie michael.christie@oracle.com
[ Upstream commit 9e5fe1700896c85040943fdc0d3fee0dd3e0d36f ]
Subsequent commits allow the kernel to do ep_disconnect. In that case we will have to get a proper refcount on the ep so one thread does not delete it from under another.
Link: https://lore.kernel.org/r/20210525181821.7617-7-michael.christie@oracle.com Reviewed-by: Lee Duncan lduncan@suse.com Signed-off-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/ulp/iser/iscsi_iser.c | 1 + drivers/scsi/be2iscsi/be_iscsi.c | 19 ++++++++++++------ drivers/scsi/bnx2i/bnx2i_iscsi.c | 23 +++++++++++++++------- drivers/scsi/cxgbi/libcxgbi.c | 12 ++++++++---- drivers/scsi/qedi/qedi_iscsi.c | 25 +++++++++++++++++------- drivers/scsi/qla4xxx/ql4_os.c | 1 + drivers/scsi/scsi_transport_iscsi.c | 25 ++++++++++++++++-------- include/scsi/scsi_transport_iscsi.h | 1 + 8 files changed, 75 insertions(+), 32 deletions(-)
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index 8857d8322710..a16e066989fa 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -499,6 +499,7 @@ iscsi_iser_conn_bind(struct iscsi_cls_session *cls_session, iser_conn->iscsi_conn = conn;
out: + iscsi_put_endpoint(ep); mutex_unlock(&iser_conn->state_mutex); return error; } diff --git a/drivers/scsi/be2iscsi/be_iscsi.c b/drivers/scsi/be2iscsi/be_iscsi.c index a13c203ef7a9..c4881657a807 100644 --- a/drivers/scsi/be2iscsi/be_iscsi.c +++ b/drivers/scsi/be2iscsi/be_iscsi.c @@ -182,6 +182,7 @@ int beiscsi_conn_bind(struct iscsi_cls_session *cls_session, struct beiscsi_endpoint *beiscsi_ep; struct iscsi_endpoint *ep; uint16_t cri_index; + int rc = 0;
ep = iscsi_lookup_endpoint(transport_fd); if (!ep) @@ -189,15 +190,17 @@ int beiscsi_conn_bind(struct iscsi_cls_session *cls_session,
beiscsi_ep = ep->dd_data;
- if (iscsi_conn_bind(cls_session, cls_conn, is_leading)) - return -EINVAL; + if (iscsi_conn_bind(cls_session, cls_conn, is_leading)) { + rc = -EINVAL; + goto put_ep; + }
if (beiscsi_ep->phba != phba) { beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_CONFIG, "BS_%d : beiscsi_ep->hba=%p not equal to phba=%p\n", beiscsi_ep->phba, phba); - - return -EEXIST; + rc = -EEXIST; + goto put_ep; } cri_index = BE_GET_CRI_FROM_CID(beiscsi_ep->ep_cid); if (phba->conn_table[cri_index]) { @@ -209,7 +212,8 @@ int beiscsi_conn_bind(struct iscsi_cls_session *cls_session, beiscsi_ep->ep_cid, beiscsi_conn, phba->conn_table[cri_index]); - return -EINVAL; + rc = -EINVAL; + goto put_ep; } }
@@ -226,7 +230,10 @@ int beiscsi_conn_bind(struct iscsi_cls_session *cls_session, "BS_%d : cid %d phba->conn_table[%u]=%p\n", beiscsi_ep->ep_cid, cri_index, beiscsi_conn); phba->conn_table[cri_index] = beiscsi_conn; - return 0; + +put_ep: + iscsi_put_endpoint(ep); + return rc; }
static int beiscsi_iface_create_ipv4(struct beiscsi_hba *phba) diff --git a/drivers/scsi/bnx2i/bnx2i_iscsi.c b/drivers/scsi/bnx2i/bnx2i_iscsi.c index 173d8c6a8cbf..649664dc6da4 100644 --- a/drivers/scsi/bnx2i/bnx2i_iscsi.c +++ b/drivers/scsi/bnx2i/bnx2i_iscsi.c @@ -1422,17 +1422,23 @@ static int bnx2i_conn_bind(struct iscsi_cls_session *cls_session, * Forcefully terminate all in progress connection recovery at the * earliest, either in bind(), send_pdu(LOGIN), or conn_start() */ - if (bnx2i_adapter_ready(hba)) - return -EIO; + if (bnx2i_adapter_ready(hba)) { + ret_code = -EIO; + goto put_ep; + }
bnx2i_ep = ep->dd_data; if ((bnx2i_ep->state == EP_STATE_TCP_FIN_RCVD) || - (bnx2i_ep->state == EP_STATE_TCP_RST_RCVD)) + (bnx2i_ep->state == EP_STATE_TCP_RST_RCVD)) { /* Peer disconnect via' FIN or RST */ - return -EINVAL; + ret_code = -EINVAL; + goto put_ep; + }
- if (iscsi_conn_bind(cls_session, cls_conn, is_leading)) - return -EINVAL; + if (iscsi_conn_bind(cls_session, cls_conn, is_leading)) { + ret_code = -EINVAL; + goto put_ep; + }
if (bnx2i_ep->hba != hba) { /* Error - TCP connection does not belong to this device @@ -1443,7 +1449,8 @@ static int bnx2i_conn_bind(struct iscsi_cls_session *cls_session, iscsi_conn_printk(KERN_ALERT, cls_conn->dd_data, "belong to hba (%s)\n", hba->netdev->name); - return -EEXIST; + ret_code = -EEXIST; + goto put_ep; } bnx2i_ep->conn = bnx2i_conn; bnx2i_conn->ep = bnx2i_ep; @@ -1460,6 +1467,8 @@ static int bnx2i_conn_bind(struct iscsi_cls_session *cls_session, bnx2i_put_rq_buf(bnx2i_conn, 0);
bnx2i_arm_cq_event_coalescing(bnx2i_conn->ep, CNIC_ARM_CQE); +put_ep: + iscsi_put_endpoint(ep); return ret_code; }
diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c index ecb134b4699f..506b561670af 100644 --- a/drivers/scsi/cxgbi/libcxgbi.c +++ b/drivers/scsi/cxgbi/libcxgbi.c @@ -2690,11 +2690,13 @@ int cxgbi_bind_conn(struct iscsi_cls_session *cls_session, err = csk->cdev->csk_ddp_setup_pgidx(csk, csk->tid, ppm->tformat.pgsz_idx_dflt); if (err < 0) - return err; + goto put_ep;
err = iscsi_conn_bind(cls_session, cls_conn, is_leading); - if (err) - return -EINVAL; + if (err) { + err = -EINVAL; + goto put_ep; + }
/* calculate the tag idx bits needed for this conn based on cmds_max */ cconn->task_idx_bits = (__ilog2_u32(conn->session->cmds_max - 1)) + 1; @@ -2715,7 +2717,9 @@ int cxgbi_bind_conn(struct iscsi_cls_session *cls_session, /* init recv engine */ iscsi_tcp_hdr_recv_prep(tcp_conn);
- return 0; +put_ep: + iscsi_put_endpoint(ep); + return err; } EXPORT_SYMBOL_GPL(cxgbi_bind_conn);
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c index 8f8036ae9a56..5f7e62f19d83 100644 --- a/drivers/scsi/qedi/qedi_iscsi.c +++ b/drivers/scsi/qedi/qedi_iscsi.c @@ -387,6 +387,7 @@ static int qedi_conn_bind(struct iscsi_cls_session *cls_session, struct qedi_ctx *qedi = iscsi_host_priv(shost); struct qedi_endpoint *qedi_ep; struct iscsi_endpoint *ep; + int rc = 0;
ep = iscsi_lookup_endpoint(transport_fd); if (!ep) @@ -394,11 +395,16 @@ static int qedi_conn_bind(struct iscsi_cls_session *cls_session,
qedi_ep = ep->dd_data; if ((qedi_ep->state == EP_STATE_TCP_FIN_RCVD) || - (qedi_ep->state == EP_STATE_TCP_RST_RCVD)) - return -EINVAL; + (qedi_ep->state == EP_STATE_TCP_RST_RCVD)) { + rc = -EINVAL; + goto put_ep; + } + + if (iscsi_conn_bind(cls_session, cls_conn, is_leading)) { + rc = -EINVAL; + goto put_ep; + }
- if (iscsi_conn_bind(cls_session, cls_conn, is_leading)) - return -EINVAL;
qedi_ep->conn = qedi_conn; qedi_conn->ep = qedi_ep; @@ -408,13 +414,18 @@ static int qedi_conn_bind(struct iscsi_cls_session *cls_session, qedi_conn->cmd_cleanup_req = 0; qedi_conn->cmd_cleanup_cmpl = 0;
- if (qedi_bind_conn_to_iscsi_cid(qedi, qedi_conn)) - return -EINVAL; + if (qedi_bind_conn_to_iscsi_cid(qedi, qedi_conn)) { + rc = -EINVAL; + goto put_ep; + } +
spin_lock_init(&qedi_conn->tmf_work_lock); INIT_LIST_HEAD(&qedi_conn->tmf_work_list); init_waitqueue_head(&qedi_conn->wait_queue); - return 0; +put_ep: + iscsi_put_endpoint(ep); + return rc; }
static int qedi_iscsi_update_conn(struct qedi_ctx *qedi, diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c index a8b8bf118c76..8d82d2a83059 100644 --- a/drivers/scsi/qla4xxx/ql4_os.c +++ b/drivers/scsi/qla4xxx/ql4_os.c @@ -3238,6 +3238,7 @@ static int qla4xxx_conn_bind(struct iscsi_cls_session *cls_session, conn = cls_conn->dd_data; qla_conn = conn->dd_data; qla_conn->qla_ep = ep->dd_data; + iscsi_put_endpoint(ep); return 0; }
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c index ed0c7e812445..dea0944ebbfa 100644 --- a/drivers/scsi/scsi_transport_iscsi.c +++ b/drivers/scsi/scsi_transport_iscsi.c @@ -266,9 +266,20 @@ void iscsi_destroy_endpoint(struct iscsi_endpoint *ep) } EXPORT_SYMBOL_GPL(iscsi_destroy_endpoint);
+void iscsi_put_endpoint(struct iscsi_endpoint *ep) +{ + put_device(&ep->dev); +} +EXPORT_SYMBOL_GPL(iscsi_put_endpoint); + +/** + * iscsi_lookup_endpoint - get ep from handle + * @handle: endpoint handle + * + * Caller must do a iscsi_put_endpoint. + */ struct iscsi_endpoint *iscsi_lookup_endpoint(u64 handle) { - struct iscsi_endpoint *ep; struct device *dev;
dev = class_find_device(&iscsi_endpoint_class, NULL, &handle, @@ -276,13 +287,7 @@ struct iscsi_endpoint *iscsi_lookup_endpoint(u64 handle) if (!dev) return NULL;
- ep = iscsi_dev_to_endpoint(dev); - /* - * we can drop this now because the interface will prevent - * removals and lookups from racing. - */ - put_device(dev); - return ep; + return iscsi_dev_to_endpoint(dev); } EXPORT_SYMBOL_GPL(iscsi_lookup_endpoint);
@@ -2983,6 +2988,7 @@ static int iscsi_if_ep_disconnect(struct iscsi_transport *transport, }
transport->ep_disconnect(ep); + iscsi_put_endpoint(ep); return 0; }
@@ -3008,6 +3014,7 @@ iscsi_if_transport_ep(struct iscsi_transport *transport,
ev->r.retcode = transport->ep_poll(ep, ev->u.ep_poll.timeout_ms); + iscsi_put_endpoint(ep); break; case ISCSI_UEVENT_TRANSPORT_EP_DISCONNECT: rc = iscsi_if_ep_disconnect(transport, @@ -3691,6 +3698,7 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group) ev->u.c_bound_session.initial_cmdsn, ev->u.c_bound_session.cmds_max, ev->u.c_bound_session.queue_depth); + iscsi_put_endpoint(ep); break; case ISCSI_UEVENT_DESTROY_SESSION: session = iscsi_session_lookup(ev->u.d_session.sid); @@ -3762,6 +3770,7 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group) mutex_lock(&conn->ep_mutex); conn->ep = ep; mutex_unlock(&conn->ep_mutex); + iscsi_put_endpoint(ep); } else iscsi_cls_conn_printk(KERN_ERR, conn, "Could not set ep conn " diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h index eb6ed499324d..09992f019dc9 100644 --- a/include/scsi/scsi_transport_iscsi.h +++ b/include/scsi/scsi_transport_iscsi.h @@ -444,6 +444,7 @@ extern int iscsi_scan_finished(struct Scsi_Host *shost, unsigned long time); extern struct iscsi_endpoint *iscsi_create_endpoint(int dd_size); extern void iscsi_destroy_endpoint(struct iscsi_endpoint *ep); extern struct iscsi_endpoint *iscsi_lookup_endpoint(u64 handle); +extern void iscsi_put_endpoint(struct iscsi_endpoint *ep); extern int iscsi_block_scsi_eh(struct scsi_cmnd *cmd); extern struct iscsi_iface *iscsi_create_iface(struct Scsi_Host *shost, struct iscsi_transport *t,
From: Mike Christie michael.christie@oracle.com
[ Upstream commit 23d6fefbb3f6b1cc29794427588b470ed06ff64e ]
Commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") has the following regressions/bugs that this patch fixes:
1. It can return cmds to upper layers like dm-multipath where that can retry them. After they are successful the fs/app can send new I/O to the same sectors, but we've left the cmds running in FW or in the net layer. We need to be calling ep_disconnect if userspace is not up.
This patch only fixes the issue for offload drivers. iscsi_tcp will be fixed in separate commit because it doesn't have a ep_disconnect call.
2. The drivers that implement ep_disconnect expect that it's called before conn_stop. Besides crashes, if the cleanup_task callout is called before ep_disconnect it might free up driver/card resources for session1 then they could be allocated for session2. But because the driver's ep_disconnect is not called it has not cleaned up the firmware so the card is still using the resources for the original cmd.
3. The stop_conn_work_fn can run after userspace has done its recovery and we are happily using the session. We will then end up with various bugs depending on what is going on at the time.
We may also run stop_conn_work_fn late after userspace has called stop_conn and ep_disconnect and is now going to call start/bind conn. If stop_conn_work_fn runs after bind but before start, we would leave the conn in a unbound but sort of started state where IO might be allowed even though the drivers have been set in a state where they no longer expect I/O.
4. Returning -EAGAIN in iscsi_if_destroy_conn if we haven't yet run the in kernel stop_conn function is breaking userspace. We should have been doing this for the caller.
Link: https://lore.kernel.org/r/20210525181821.7617-8-michael.christie@oracle.com Fixes: 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") Reviewed-by: Lee Duncan lduncan@suse.com Signed-off-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_transport_iscsi.c | 471 ++++++++++++++++------------ include/scsi/scsi_transport_iscsi.h | 10 +- 2 files changed, 283 insertions(+), 198 deletions(-)
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c index dea0944ebbfa..7246f7b2ff03 100644 --- a/drivers/scsi/scsi_transport_iscsi.c +++ b/drivers/scsi/scsi_transport_iscsi.c @@ -86,15 +86,11 @@ struct iscsi_internal { struct transport_container session_cont; };
-/* Worker to perform connection failure on unresponsive connections - * completely in kernel space. - */ -static void stop_conn_work_fn(struct work_struct *work); -static DECLARE_WORK(stop_conn_work, stop_conn_work_fn); - static atomic_t iscsi_session_nr; /* sysfs session id for next new session */ static struct workqueue_struct *iscsi_eh_timer_workq;
+static struct workqueue_struct *iscsi_conn_cleanup_workq; + static DEFINE_IDA(iscsi_sess_ida); /* * list of registered transports and lock that must @@ -1601,12 +1597,6 @@ static DECLARE_TRANSPORT_CLASS(iscsi_connection_class, static struct sock *nls; static DEFINE_MUTEX(rx_queue_mutex);
-/* - * conn_mutex protects the {start,bind,stop,destroy}_conn from racing - * against the kernel stop_connection recovery mechanism - */ -static DEFINE_MUTEX(conn_mutex); - static LIST_HEAD(sesslist); static DEFINE_SPINLOCK(sesslock); static LIST_HEAD(connlist); @@ -2228,6 +2218,123 @@ void iscsi_remove_session(struct iscsi_cls_session *session) } EXPORT_SYMBOL_GPL(iscsi_remove_session);
+static void iscsi_stop_conn(struct iscsi_cls_conn *conn, int flag) +{ + ISCSI_DBG_TRANS_CONN(conn, "Stopping conn.\n"); + + switch (flag) { + case STOP_CONN_RECOVER: + conn->state = ISCSI_CONN_FAILED; + break; + case STOP_CONN_TERM: + conn->state = ISCSI_CONN_DOWN; + break; + default: + iscsi_cls_conn_printk(KERN_ERR, conn, "invalid stop flag %d\n", + flag); + return; + } + + conn->transport->stop_conn(conn, flag); + ISCSI_DBG_TRANS_CONN(conn, "Stopping conn done.\n"); +} + +static int iscsi_if_stop_conn(struct iscsi_transport *transport, + struct iscsi_uevent *ev) +{ + int flag = ev->u.stop_conn.flag; + struct iscsi_cls_conn *conn; + + conn = iscsi_conn_lookup(ev->u.stop_conn.sid, ev->u.stop_conn.cid); + if (!conn) + return -EINVAL; + + ISCSI_DBG_TRANS_CONN(conn, "iscsi if conn stop.\n"); + /* + * If this is a termination we have to call stop_conn with that flag + * so the correct states get set. If we haven't run the work yet try to + * avoid the extra run. + */ + if (flag == STOP_CONN_TERM) { + cancel_work_sync(&conn->cleanup_work); + iscsi_stop_conn(conn, flag); + } else { + /* + * Figure out if it was the kernel or userspace initiating this. + */ + if (!test_and_set_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) { + iscsi_stop_conn(conn, flag); + } else { + ISCSI_DBG_TRANS_CONN(conn, + "flush kernel conn cleanup.\n"); + flush_work(&conn->cleanup_work); + } + /* + * Only clear for recovery to avoid extra cleanup runs during + * termination. + */ + clear_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags); + } + ISCSI_DBG_TRANS_CONN(conn, "iscsi if conn stop done.\n"); + return 0; +} + +static void iscsi_ep_disconnect(struct iscsi_cls_conn *conn, bool is_active) +{ + struct iscsi_cls_session *session = iscsi_conn_to_session(conn); + struct iscsi_endpoint *ep; + + ISCSI_DBG_TRANS_CONN(conn, "disconnect ep.\n"); + conn->state = ISCSI_CONN_FAILED; + + if (!conn->ep || !session->transport->ep_disconnect) + return; + + ep = conn->ep; + conn->ep = NULL; + + session->transport->unbind_conn(conn, is_active); + session->transport->ep_disconnect(ep); + ISCSI_DBG_TRANS_CONN(conn, "disconnect ep done.\n"); +} + +static void iscsi_cleanup_conn_work_fn(struct work_struct *work) +{ + struct iscsi_cls_conn *conn = container_of(work, struct iscsi_cls_conn, + cleanup_work); + struct iscsi_cls_session *session = iscsi_conn_to_session(conn); + + mutex_lock(&conn->ep_mutex); + /* + * If we are not at least bound there is nothing for us to do. Userspace + * will do a ep_disconnect call if offload is used, but will not be + * doing a stop since there is nothing to clean up, so we have to clear + * the cleanup bit here. + */ + if (conn->state != ISCSI_CONN_BOUND && conn->state != ISCSI_CONN_UP) { + ISCSI_DBG_TRANS_CONN(conn, "Got error while conn is already failed. Ignoring.\n"); + clear_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags); + mutex_unlock(&conn->ep_mutex); + return; + } + + iscsi_ep_disconnect(conn, false); + + if (system_state != SYSTEM_RUNNING) { + /* + * If the user has set up for the session to never timeout + * then hang like they wanted. For all other cases fail right + * away since userspace is not going to relogin. + */ + if (session->recovery_tmo > 0) + session->recovery_tmo = 0; + } + + iscsi_stop_conn(conn, STOP_CONN_RECOVER); + mutex_unlock(&conn->ep_mutex); + ISCSI_DBG_TRANS_CONN(conn, "cleanup done.\n"); +} + void iscsi_free_session(struct iscsi_cls_session *session) { ISCSI_DBG_TRANS_SESSION(session, "Freeing session\n"); @@ -2267,7 +2374,7 @@ iscsi_create_conn(struct iscsi_cls_session *session, int dd_size, uint32_t cid)
mutex_init(&conn->ep_mutex); INIT_LIST_HEAD(&conn->conn_list); - INIT_LIST_HEAD(&conn->conn_list_err); + INIT_WORK(&conn->cleanup_work, iscsi_cleanup_conn_work_fn); conn->transport = transport; conn->cid = cid; conn->state = ISCSI_CONN_DOWN; @@ -2324,7 +2431,6 @@ int iscsi_destroy_conn(struct iscsi_cls_conn *conn)
spin_lock_irqsave(&connlock, flags); list_del(&conn->conn_list); - list_del(&conn->conn_list_err); spin_unlock_irqrestore(&connlock, flags);
transport_unregister_device(&conn->dev); @@ -2451,83 +2557,6 @@ int iscsi_offload_mesg(struct Scsi_Host *shost, } EXPORT_SYMBOL_GPL(iscsi_offload_mesg);
-/* - * This can be called without the rx_queue_mutex, if invoked by the kernel - * stop work. But, in that case, it is guaranteed not to race with - * iscsi_destroy by conn_mutex. - */ -static void iscsi_if_stop_conn(struct iscsi_cls_conn *conn, int flag) -{ - /* - * It is important that this path doesn't rely on - * rx_queue_mutex, otherwise, a thread doing allocation on a - * start_session/start_connection could sleep waiting on a - * writeback to a failed iscsi device, that cannot be recovered - * because the lock is held. If we don't hold it here, the - * kernel stop_conn_work_fn has a chance to stop the broken - * session and resolve the allocation. - * - * Still, the user invoked .stop_conn() needs to be serialized - * with stop_conn_work_fn by a private mutex. Not pretty, but - * it works. - */ - mutex_lock(&conn_mutex); - switch (flag) { - case STOP_CONN_RECOVER: - conn->state = ISCSI_CONN_FAILED; - break; - case STOP_CONN_TERM: - conn->state = ISCSI_CONN_DOWN; - break; - default: - iscsi_cls_conn_printk(KERN_ERR, conn, - "invalid stop flag %d\n", flag); - goto unlock; - } - - conn->transport->stop_conn(conn, flag); -unlock: - mutex_unlock(&conn_mutex); -} - -static void stop_conn_work_fn(struct work_struct *work) -{ - struct iscsi_cls_conn *conn, *tmp; - unsigned long flags; - LIST_HEAD(recovery_list); - - spin_lock_irqsave(&connlock, flags); - if (list_empty(&connlist_err)) { - spin_unlock_irqrestore(&connlock, flags); - return; - } - list_splice_init(&connlist_err, &recovery_list); - spin_unlock_irqrestore(&connlock, flags); - - list_for_each_entry_safe(conn, tmp, &recovery_list, conn_list_err) { - uint32_t sid = iscsi_conn_get_sid(conn); - struct iscsi_cls_session *session; - - session = iscsi_session_lookup(sid); - if (session) { - if (system_state != SYSTEM_RUNNING) { - /* - * If the user has set up for the session to - * never timeout then hang like they wanted. - * For all other cases fail right away since - * userspace is not going to relogin. - */ - if (session->recovery_tmo > 0) - session->recovery_tmo = 0; - } - - iscsi_if_stop_conn(conn, STOP_CONN_RECOVER); - } - - list_del_init(&conn->conn_list_err); - } -} - void iscsi_conn_error_event(struct iscsi_cls_conn *conn, enum iscsi_err error) { struct nlmsghdr *nlh; @@ -2535,12 +2564,9 @@ void iscsi_conn_error_event(struct iscsi_cls_conn *conn, enum iscsi_err error) struct iscsi_uevent *ev; struct iscsi_internal *priv; int len = nlmsg_total_size(sizeof(*ev)); - unsigned long flags;
- spin_lock_irqsave(&connlock, flags); - list_add(&conn->conn_list_err, &connlist_err); - spin_unlock_irqrestore(&connlock, flags); - queue_work(system_unbound_wq, &stop_conn_work); + if (!test_and_set_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) + queue_work(iscsi_conn_cleanup_workq, &conn->cleanup_work);
priv = iscsi_if_transport_lookup(conn->transport); if (!priv) @@ -2870,26 +2896,17 @@ static int iscsi_if_destroy_conn(struct iscsi_transport *transport, struct iscsi_uevent *ev) { struct iscsi_cls_conn *conn; - unsigned long flags;
conn = iscsi_conn_lookup(ev->u.d_conn.sid, ev->u.d_conn.cid); if (!conn) return -EINVAL;
- spin_lock_irqsave(&connlock, flags); - if (!list_empty(&conn->conn_list_err)) { - spin_unlock_irqrestore(&connlock, flags); - return -EAGAIN; - } - spin_unlock_irqrestore(&connlock, flags); - + ISCSI_DBG_TRANS_CONN(conn, "Flushing cleanup during destruction\n"); + flush_work(&conn->cleanup_work); ISCSI_DBG_TRANS_CONN(conn, "Destroying transport conn\n");
- mutex_lock(&conn_mutex); if (transport->destroy_conn) transport->destroy_conn(conn); - mutex_unlock(&conn_mutex); - return 0; }
@@ -2966,7 +2983,7 @@ static int iscsi_if_ep_connect(struct iscsi_transport *transport, }
static int iscsi_if_ep_disconnect(struct iscsi_transport *transport, - u64 ep_handle, bool is_active) + u64 ep_handle) { struct iscsi_cls_conn *conn; struct iscsi_endpoint *ep; @@ -2977,17 +2994,30 @@ static int iscsi_if_ep_disconnect(struct iscsi_transport *transport, ep = iscsi_lookup_endpoint(ep_handle); if (!ep) return -EINVAL; + conn = ep->conn; - if (conn) { - mutex_lock(&conn->ep_mutex); - conn->ep = NULL; + if (!conn) { + /* + * conn was not even bound yet, so we can't get iscsi conn + * failures yet. + */ + transport->ep_disconnect(ep); + goto put_ep; + } + + mutex_lock(&conn->ep_mutex); + /* Check if this was a conn error and the kernel took ownership */ + if (test_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) { + ISCSI_DBG_TRANS_CONN(conn, "flush kernel conn cleanup.\n"); mutex_unlock(&conn->ep_mutex); - conn->state = ISCSI_CONN_FAILED;
- transport->unbind_conn(conn, is_active); + flush_work(&conn->cleanup_work); + goto put_ep; }
- transport->ep_disconnect(ep); + iscsi_ep_disconnect(conn, false); + mutex_unlock(&conn->ep_mutex); +put_ep: iscsi_put_endpoint(ep); return 0; } @@ -3018,8 +3048,7 @@ iscsi_if_transport_ep(struct iscsi_transport *transport, break; case ISCSI_UEVENT_TRANSPORT_EP_DISCONNECT: rc = iscsi_if_ep_disconnect(transport, - ev->u.ep_disconnect.ep_handle, - false); + ev->u.ep_disconnect.ep_handle); break; } return rc; @@ -3646,18 +3675,129 @@ iscsi_get_host_stats(struct iscsi_transport *transport, struct nlmsghdr *nlh) return err; }
+static int iscsi_if_transport_conn(struct iscsi_transport *transport, + struct nlmsghdr *nlh) +{ + struct iscsi_uevent *ev = nlmsg_data(nlh); + struct iscsi_cls_session *session; + struct iscsi_cls_conn *conn = NULL; + struct iscsi_endpoint *ep; + uint32_t pdu_len; + int err = 0; + + switch (nlh->nlmsg_type) { + case ISCSI_UEVENT_CREATE_CONN: + return iscsi_if_create_conn(transport, ev); + case ISCSI_UEVENT_DESTROY_CONN: + return iscsi_if_destroy_conn(transport, ev); + case ISCSI_UEVENT_STOP_CONN: + return iscsi_if_stop_conn(transport, ev); + } + + /* + * The following cmds need to be run under the ep_mutex so in kernel + * conn cleanup (ep_disconnect + unbind and conn) is not done while + * these are running. They also must not run if we have just run a conn + * cleanup because they would set the state in a way that might allow + * IO or send IO themselves. + */ + switch (nlh->nlmsg_type) { + case ISCSI_UEVENT_START_CONN: + conn = iscsi_conn_lookup(ev->u.start_conn.sid, + ev->u.start_conn.cid); + break; + case ISCSI_UEVENT_BIND_CONN: + conn = iscsi_conn_lookup(ev->u.b_conn.sid, ev->u.b_conn.cid); + break; + case ISCSI_UEVENT_SEND_PDU: + conn = iscsi_conn_lookup(ev->u.send_pdu.sid, ev->u.send_pdu.cid); + break; + } + + if (!conn) + return -EINVAL; + + mutex_lock(&conn->ep_mutex); + if (test_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) { + mutex_unlock(&conn->ep_mutex); + ev->r.retcode = -ENOTCONN; + return 0; + } + + switch (nlh->nlmsg_type) { + case ISCSI_UEVENT_BIND_CONN: + if (conn->ep) { + /* + * For offload boot support where iscsid is restarted + * during the pivot root stage, the ep will be intact + * here when the new iscsid instance starts up and + * reconnects. + */ + iscsi_ep_disconnect(conn, true); + } + + session = iscsi_session_lookup(ev->u.b_conn.sid); + if (!session) { + err = -EINVAL; + break; + } + + ev->r.retcode = transport->bind_conn(session, conn, + ev->u.b_conn.transport_eph, + ev->u.b_conn.is_leading); + if (!ev->r.retcode) + conn->state = ISCSI_CONN_BOUND; + + if (ev->r.retcode || !transport->ep_connect) + break; + + ep = iscsi_lookup_endpoint(ev->u.b_conn.transport_eph); + if (ep) { + ep->conn = conn; + conn->ep = ep; + iscsi_put_endpoint(ep); + } else { + err = -ENOTCONN; + iscsi_cls_conn_printk(KERN_ERR, conn, + "Could not set ep conn binding\n"); + } + break; + case ISCSI_UEVENT_START_CONN: + ev->r.retcode = transport->start_conn(conn); + if (!ev->r.retcode) + conn->state = ISCSI_CONN_UP; + break; + case ISCSI_UEVENT_SEND_PDU: + pdu_len = nlh->nlmsg_len - sizeof(*nlh) - sizeof(*ev); + + if ((ev->u.send_pdu.hdr_size > pdu_len) || + (ev->u.send_pdu.data_size > (pdu_len - ev->u.send_pdu.hdr_size))) { + err = -EINVAL; + break; + } + + ev->r.retcode = transport->send_pdu(conn, + (struct iscsi_hdr *)((char *)ev + sizeof(*ev)), + (char *)ev + sizeof(*ev) + ev->u.send_pdu.hdr_size, + ev->u.send_pdu.data_size); + break; + default: + err = -ENOSYS; + } + + mutex_unlock(&conn->ep_mutex); + return err; +}
static int iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group) { int err = 0; u32 portid; - u32 pdu_len; struct iscsi_uevent *ev = nlmsg_data(nlh); struct iscsi_transport *transport = NULL; struct iscsi_internal *priv; struct iscsi_cls_session *session; - struct iscsi_cls_conn *conn; struct iscsi_endpoint *ep = NULL;
if (!netlink_capable(skb, CAP_SYS_ADMIN)) @@ -3734,90 +3874,16 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group) else err = -EINVAL; break; - case ISCSI_UEVENT_CREATE_CONN: - err = iscsi_if_create_conn(transport, ev); - break; - case ISCSI_UEVENT_DESTROY_CONN: - err = iscsi_if_destroy_conn(transport, ev); - break; - case ISCSI_UEVENT_BIND_CONN: - session = iscsi_session_lookup(ev->u.b_conn.sid); - conn = iscsi_conn_lookup(ev->u.b_conn.sid, ev->u.b_conn.cid); - - if (conn && conn->ep) - iscsi_if_ep_disconnect(transport, conn->ep->id, true); - - if (!session || !conn) { - err = -EINVAL; - break; - } - - mutex_lock(&conn_mutex); - ev->r.retcode = transport->bind_conn(session, conn, - ev->u.b_conn.transport_eph, - ev->u.b_conn.is_leading); - if (!ev->r.retcode) - conn->state = ISCSI_CONN_BOUND; - mutex_unlock(&conn_mutex); - - if (ev->r.retcode || !transport->ep_connect) - break; - - ep = iscsi_lookup_endpoint(ev->u.b_conn.transport_eph); - if (ep) { - ep->conn = conn; - - mutex_lock(&conn->ep_mutex); - conn->ep = ep; - mutex_unlock(&conn->ep_mutex); - iscsi_put_endpoint(ep); - } else - iscsi_cls_conn_printk(KERN_ERR, conn, - "Could not set ep conn " - "binding\n"); - break; case ISCSI_UEVENT_SET_PARAM: err = iscsi_set_param(transport, ev); break; - case ISCSI_UEVENT_START_CONN: - conn = iscsi_conn_lookup(ev->u.start_conn.sid, ev->u.start_conn.cid); - if (conn) { - mutex_lock(&conn_mutex); - ev->r.retcode = transport->start_conn(conn); - if (!ev->r.retcode) - conn->state = ISCSI_CONN_UP; - mutex_unlock(&conn_mutex); - } - else - err = -EINVAL; - break; + case ISCSI_UEVENT_CREATE_CONN: + case ISCSI_UEVENT_DESTROY_CONN: case ISCSI_UEVENT_STOP_CONN: - conn = iscsi_conn_lookup(ev->u.stop_conn.sid, ev->u.stop_conn.cid); - if (conn) - iscsi_if_stop_conn(conn, ev->u.stop_conn.flag); - else - err = -EINVAL; - break; + case ISCSI_UEVENT_START_CONN: + case ISCSI_UEVENT_BIND_CONN: case ISCSI_UEVENT_SEND_PDU: - pdu_len = nlh->nlmsg_len - sizeof(*nlh) - sizeof(*ev); - - if ((ev->u.send_pdu.hdr_size > pdu_len) || - (ev->u.send_pdu.data_size > (pdu_len - ev->u.send_pdu.hdr_size))) { - err = -EINVAL; - break; - } - - conn = iscsi_conn_lookup(ev->u.send_pdu.sid, ev->u.send_pdu.cid); - if (conn) { - mutex_lock(&conn_mutex); - ev->r.retcode = transport->send_pdu(conn, - (struct iscsi_hdr*)((char*)ev + sizeof(*ev)), - (char*)ev + sizeof(*ev) + ev->u.send_pdu.hdr_size, - ev->u.send_pdu.data_size); - mutex_unlock(&conn_mutex); - } - else - err = -EINVAL; + err = iscsi_if_transport_conn(transport, nlh); break; case ISCSI_UEVENT_GET_STATS: err = iscsi_if_get_stats(transport, nlh); @@ -4820,8 +4886,18 @@ static __init int iscsi_transport_init(void) goto release_nls; }
+ iscsi_conn_cleanup_workq = alloc_workqueue("%s", + WQ_SYSFS | WQ_MEM_RECLAIM | WQ_UNBOUND, 0, + "iscsi_conn_cleanup"); + if (!iscsi_conn_cleanup_workq) { + err = -ENOMEM; + goto destroy_wq; + } + return 0;
+destroy_wq: + destroy_workqueue(iscsi_eh_timer_workq); release_nls: netlink_kernel_release(nls); unregister_flashnode_bus: @@ -4843,6 +4919,7 @@ static __init int iscsi_transport_init(void)
static void __exit iscsi_transport_exit(void) { + destroy_workqueue(iscsi_conn_cleanup_workq); destroy_workqueue(iscsi_eh_timer_workq); netlink_kernel_release(nls); bus_unregister(&iscsi_flashnode_bus); diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h index 09992f019dc9..c5d7810fd792 100644 --- a/include/scsi/scsi_transport_iscsi.h +++ b/include/scsi/scsi_transport_iscsi.h @@ -197,15 +197,23 @@ enum iscsi_connection_state { ISCSI_CONN_BOUND, };
+#define ISCSI_CLS_CONN_BIT_CLEANUP 1 + struct iscsi_cls_conn { struct list_head conn_list; /* item in connlist */ - struct list_head conn_list_err; /* item in connlist_err */ void *dd_data; /* LLD private data */ struct iscsi_transport *transport; uint32_t cid; /* connection id */ + /* + * This protects the conn startup and binding/unbinding of the ep to + * the conn. Unbinding includes ep_disconnect and stop_conn. + */ struct mutex ep_mutex; struct iscsi_endpoint *ep;
+ unsigned long flags; + struct work_struct cleanup_work; + struct device dev; /* sysfs transport/container device */ enum iscsi_connection_state state; };
From: Mike Christie michael.christie@oracle.com
[ Upstream commit c34f95e98d8fb750eefd4f3fe58b4f8b5e89253b ]
This patch moves iscsi_ep_disconnect() so it can be called earlier in the next patch.
Link: https://lore.kernel.org/r/20220408001314.5014-2-michael.christie@oracle.com Tested-by: Manish Rangankar mrangankar@marvell.com Reviewed-by: Lee Duncan lduncan@suse.com Reviewed-by: Chris Leech cleech@redhat.com Signed-off-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_transport_iscsi.c | 38 ++++++++++++++--------------- 1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c index 7246f7b2ff03..8f6dc391356d 100644 --- a/drivers/scsi/scsi_transport_iscsi.c +++ b/drivers/scsi/scsi_transport_iscsi.c @@ -2239,6 +2239,25 @@ static void iscsi_stop_conn(struct iscsi_cls_conn *conn, int flag) ISCSI_DBG_TRANS_CONN(conn, "Stopping conn done.\n"); }
+static void iscsi_ep_disconnect(struct iscsi_cls_conn *conn, bool is_active) +{ + struct iscsi_cls_session *session = iscsi_conn_to_session(conn); + struct iscsi_endpoint *ep; + + ISCSI_DBG_TRANS_CONN(conn, "disconnect ep.\n"); + conn->state = ISCSI_CONN_FAILED; + + if (!conn->ep || !session->transport->ep_disconnect) + return; + + ep = conn->ep; + conn->ep = NULL; + + session->transport->unbind_conn(conn, is_active); + session->transport->ep_disconnect(ep); + ISCSI_DBG_TRANS_CONN(conn, "disconnect ep done.\n"); +} + static int iscsi_if_stop_conn(struct iscsi_transport *transport, struct iscsi_uevent *ev) { @@ -2279,25 +2298,6 @@ static int iscsi_if_stop_conn(struct iscsi_transport *transport, return 0; }
-static void iscsi_ep_disconnect(struct iscsi_cls_conn *conn, bool is_active) -{ - struct iscsi_cls_session *session = iscsi_conn_to_session(conn); - struct iscsi_endpoint *ep; - - ISCSI_DBG_TRANS_CONN(conn, "disconnect ep.\n"); - conn->state = ISCSI_CONN_FAILED; - - if (!conn->ep || !session->transport->ep_disconnect) - return; - - ep = conn->ep; - conn->ep = NULL; - - session->transport->unbind_conn(conn, is_active); - session->transport->ep_disconnect(ep); - ISCSI_DBG_TRANS_CONN(conn, "disconnect ep done.\n"); -} - static void iscsi_cleanup_conn_work_fn(struct work_struct *work) { struct iscsi_cls_conn *conn = container_of(work, struct iscsi_cls_conn,
From: Mike Christie michael.christie@oracle.com
[ Upstream commit cbd2283aaf47fef4ded4b29124b1ef3beb515f3a ]
When userspace restarts during boot or upgrades it won't know about the offload driver's endpoint and connection mappings. iscsid will start by cleaning up the old session by doing a stop_conn call. Later, if we are able to create a new connection, we clean up the old endpoint during the binding stage. The problem is that if we do stop_conn before doing the ep_disconnect call offload, drivers can still be executing I/O. We then might free tasks from the under the card/driver.
This moves the ep_disconnect call to before we do the stop_conn call for this case. It will then work and look like a normal recovery/cleanup procedure from the driver's point of view.
Link: https://lore.kernel.org/r/20220408001314.5014-3-michael.christie@oracle.com Tested-by: Manish Rangankar mrangankar@marvell.com Reviewed-by: Lee Duncan lduncan@suse.com Reviewed-by: Chris Leech cleech@redhat.com Signed-off-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_transport_iscsi.c | 48 +++++++++++++++++------------ 1 file changed, 28 insertions(+), 20 deletions(-)
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c index 8f6dc391356d..38abc3a15692 100644 --- a/drivers/scsi/scsi_transport_iscsi.c +++ b/drivers/scsi/scsi_transport_iscsi.c @@ -2258,6 +2258,23 @@ static void iscsi_ep_disconnect(struct iscsi_cls_conn *conn, bool is_active) ISCSI_DBG_TRANS_CONN(conn, "disconnect ep done.\n"); }
+static void iscsi_if_disconnect_bound_ep(struct iscsi_cls_conn *conn, + struct iscsi_endpoint *ep, + bool is_active) +{ + /* Check if this was a conn error and the kernel took ownership */ + if (!test_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) { + iscsi_ep_disconnect(conn, is_active); + } else { + ISCSI_DBG_TRANS_CONN(conn, "flush kernel conn cleanup.\n"); + mutex_unlock(&conn->ep_mutex); + + flush_work(&conn->cleanup_work); + + mutex_lock(&conn->ep_mutex); + } +} + static int iscsi_if_stop_conn(struct iscsi_transport *transport, struct iscsi_uevent *ev) { @@ -2278,6 +2295,16 @@ static int iscsi_if_stop_conn(struct iscsi_transport *transport, cancel_work_sync(&conn->cleanup_work); iscsi_stop_conn(conn, flag); } else { + /* + * For offload, when iscsid is restarted it won't know about + * existing endpoints so it can't do a ep_disconnect. We clean + * it up here for userspace. + */ + mutex_lock(&conn->ep_mutex); + if (conn->ep) + iscsi_if_disconnect_bound_ep(conn, conn->ep, true); + mutex_unlock(&conn->ep_mutex); + /* * Figure out if it was the kernel or userspace initiating this. */ @@ -3006,16 +3033,7 @@ static int iscsi_if_ep_disconnect(struct iscsi_transport *transport, }
mutex_lock(&conn->ep_mutex); - /* Check if this was a conn error and the kernel took ownership */ - if (test_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) { - ISCSI_DBG_TRANS_CONN(conn, "flush kernel conn cleanup.\n"); - mutex_unlock(&conn->ep_mutex); - - flush_work(&conn->cleanup_work); - goto put_ep; - } - - iscsi_ep_disconnect(conn, false); + iscsi_if_disconnect_bound_ep(conn, ep, false); mutex_unlock(&conn->ep_mutex); put_ep: iscsi_put_endpoint(ep); @@ -3726,16 +3744,6 @@ static int iscsi_if_transport_conn(struct iscsi_transport *transport,
switch (nlh->nlmsg_type) { case ISCSI_UEVENT_BIND_CONN: - if (conn->ep) { - /* - * For offload boot support where iscsid is restarted - * during the pivot root stage, the ep will be intact - * here when the new iscsid instance starts up and - * reconnects. - */ - iscsi_ep_disconnect(conn, true); - } - session = iscsi_session_lookup(ev->u.b_conn.sid); if (!session) { err = -EINVAL;
From: Mike Christie michael.christie@oracle.com
[ Upstream commit 7c6e99c18167ed89729bf167ccb4a7e3ab3115ba ]
If iscsid is doing a stop_conn at the same time the kernel is starting error recovery we can hit a race that allows the cleanup work to run on a valid connection. In the race, iscsi_if_stop_conn sees the cleanup bit set, but it calls flush_work on the clean_work before iscsi_conn_error_event has queued it. The flush then returns before the queueing and so the cleanup_work can run later and disconnect/stop a conn while it's in a connected state.
The patch:
Commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space")
added the late stop_conn call bug originally, and the patch:
Commit 23d6fefbb3f6 ("scsi: iscsi: Fix in-kernel conn failure handling")
attempted to fix it but only fixed the normal EH case and left the above race for the iscsid restart case. For the normal EH case we don't hit the race because we only signal userspace to start recovery after we have done the queueing, so the flush will always catch the queued work or see it completed.
For iscsid restart cases like boot, we can hit the race because iscsid will call down to the kernel before the kernel has signaled any error, so both code paths can be running at the same time. This adds a lock around the setting of the cleanup bit and queueing so they happen together.
Link: https://lore.kernel.org/r/20220408001314.5014-6-michael.christie@oracle.com Fixes: 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") Tested-by: Manish Rangankar mrangankar@marvell.com Reviewed-by: Lee Duncan lduncan@suse.com Reviewed-by: Chris Leech cleech@redhat.com Signed-off-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_transport_iscsi.c | 17 +++++++++++++++++ include/scsi/scsi_transport_iscsi.h | 2 ++ 2 files changed, 19 insertions(+)
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c index 38abc3a15692..21cc306a0f11 100644 --- a/drivers/scsi/scsi_transport_iscsi.c +++ b/drivers/scsi/scsi_transport_iscsi.c @@ -2263,9 +2263,12 @@ static void iscsi_if_disconnect_bound_ep(struct iscsi_cls_conn *conn, bool is_active) { /* Check if this was a conn error and the kernel took ownership */ + spin_lock_irq(&conn->lock); if (!test_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) { + spin_unlock_irq(&conn->lock); iscsi_ep_disconnect(conn, is_active); } else { + spin_unlock_irq(&conn->lock); ISCSI_DBG_TRANS_CONN(conn, "flush kernel conn cleanup.\n"); mutex_unlock(&conn->ep_mutex);
@@ -2308,9 +2311,12 @@ static int iscsi_if_stop_conn(struct iscsi_transport *transport, /* * Figure out if it was the kernel or userspace initiating this. */ + spin_lock_irq(&conn->lock); if (!test_and_set_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) { + spin_unlock_irq(&conn->lock); iscsi_stop_conn(conn, flag); } else { + spin_unlock_irq(&conn->lock); ISCSI_DBG_TRANS_CONN(conn, "flush kernel conn cleanup.\n"); flush_work(&conn->cleanup_work); @@ -2319,7 +2325,9 @@ static int iscsi_if_stop_conn(struct iscsi_transport *transport, * Only clear for recovery to avoid extra cleanup runs during * termination. */ + spin_lock_irq(&conn->lock); clear_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags); + spin_unlock_irq(&conn->lock); } ISCSI_DBG_TRANS_CONN(conn, "iscsi if conn stop done.\n"); return 0; @@ -2340,7 +2348,9 @@ static void iscsi_cleanup_conn_work_fn(struct work_struct *work) */ if (conn->state != ISCSI_CONN_BOUND && conn->state != ISCSI_CONN_UP) { ISCSI_DBG_TRANS_CONN(conn, "Got error while conn is already failed. Ignoring.\n"); + spin_lock_irq(&conn->lock); clear_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags); + spin_unlock_irq(&conn->lock); mutex_unlock(&conn->ep_mutex); return; } @@ -2400,6 +2410,7 @@ iscsi_create_conn(struct iscsi_cls_session *session, int dd_size, uint32_t cid) conn->dd_data = &conn[1];
mutex_init(&conn->ep_mutex); + spin_lock_init(&conn->lock); INIT_LIST_HEAD(&conn->conn_list); INIT_WORK(&conn->cleanup_work, iscsi_cleanup_conn_work_fn); conn->transport = transport; @@ -2591,9 +2602,12 @@ void iscsi_conn_error_event(struct iscsi_cls_conn *conn, enum iscsi_err error) struct iscsi_uevent *ev; struct iscsi_internal *priv; int len = nlmsg_total_size(sizeof(*ev)); + unsigned long flags;
+ spin_lock_irqsave(&conn->lock, flags); if (!test_and_set_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) queue_work(iscsi_conn_cleanup_workq, &conn->cleanup_work); + spin_unlock_irqrestore(&conn->lock, flags);
priv = iscsi_if_transport_lookup(conn->transport); if (!priv) @@ -3736,11 +3750,14 @@ static int iscsi_if_transport_conn(struct iscsi_transport *transport, return -EINVAL;
mutex_lock(&conn->ep_mutex); + spin_lock_irq(&conn->lock); if (test_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) { + spin_unlock_irq(&conn->lock); mutex_unlock(&conn->ep_mutex); ev->r.retcode = -ENOTCONN; return 0; } + spin_unlock_irq(&conn->lock);
switch (nlh->nlmsg_type) { case ISCSI_UEVENT_BIND_CONN: diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h index c5d7810fd792..037c77fb5dc5 100644 --- a/include/scsi/scsi_transport_iscsi.h +++ b/include/scsi/scsi_transport_iscsi.h @@ -211,6 +211,8 @@ struct iscsi_cls_conn { struct mutex ep_mutex; struct iscsi_endpoint *ep;
+ /* Used when accessing flags and queueing work. */ + spinlock_t lock; unsigned long flags; struct work_struct cleanup_work;
From: Petr Malat oss@malat.biz
[ Upstream commit 8467dda0c26583547731e7f3ea73fc3856bae3bf ]
Function sctp_do_peeloff() wrongly initializes daddr of the original socket instead of the peeled off socket, which makes getpeername() return zeroes instead of the primary address. Initialize the new socket instead.
Fixes: d570ee490fb1 ("[SCTP]: Correctly set daddr for IPv6 sockets during peeloff") Signed-off-by: Petr Malat oss@malat.biz Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Link: https://lore.kernel.org/r/20220409063611.673193-1-oss@malat.biz Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/sctp/socket.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 0a9e2c7d8e5f..e9b4ea3d934f 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -5518,7 +5518,7 @@ int sctp_do_peeloff(struct sock *sk, sctp_assoc_t id, struct socket **sockp) * Set the daddr and initialize id to something more random and also * copy over any ip options. */ - sp->pf->to_sk_daddr(&asoc->peer.primary_addr, sk); + sp->pf->to_sk_daddr(&asoc->peer.primary_addr, sock->sk); sp->pf->copy_ip_options(sk, sock->sk);
/* Populate the fields of the newsk from the oldsk and migrate the
From: Athira Rajeev atrajeev@linux.vnet.ibm.com
[ Upstream commit ce64763c63854b4079f2e036638aa881a1fb3fbc ]
The selftest "mqueue/mq_perf_tests.c" use CPU_ALLOC to allocate CPU set. This cpu set is used further in pthread_attr_setaffinity_np and by pthread_create in the code. But in current code, allocated cpu set is not freed.
Fix this issue by adding CPU_FREE in the "shutdown" function which is called in most of the error/exit path for the cleanup. There are few error paths which exit without using shutdown. Add a common goto error path with CPU_FREE for these cases.
Fixes: 7820b0715b6f ("tools/selftests: add mq_perf_tests") Signed-off-by: Athira Rajeev atrajeev@linux.vnet.ibm.com Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../testing/selftests/mqueue/mq_perf_tests.c | 25 +++++++++++++------ 1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/mqueue/mq_perf_tests.c b/tools/testing/selftests/mqueue/mq_perf_tests.c index b019e0b8221c..84fda3b49073 100644 --- a/tools/testing/selftests/mqueue/mq_perf_tests.c +++ b/tools/testing/selftests/mqueue/mq_perf_tests.c @@ -180,6 +180,9 @@ void shutdown(int exit_val, char *err_cause, int line_no) if (in_shutdown++) return;
+ /* Free the cpu_set allocated using CPU_ALLOC in main function */ + CPU_FREE(cpu_set); + for (i = 0; i < num_cpus_to_pin; i++) if (cpu_threads[i]) { pthread_kill(cpu_threads[i], SIGUSR1); @@ -551,6 +554,12 @@ int main(int argc, char *argv[]) perror("sysconf(_SC_NPROCESSORS_ONLN)"); exit(1); } + + if (getuid() != 0) + ksft_exit_skip("Not running as root, but almost all tests " + "require root in order to modify\nsystem settings. " + "Exiting.\n"); + cpus_online = min(MAX_CPUS, sysconf(_SC_NPROCESSORS_ONLN)); cpu_set = CPU_ALLOC(cpus_online); if (cpu_set == NULL) { @@ -589,7 +598,7 @@ int main(int argc, char *argv[]) cpu_set)) { fprintf(stderr, "Any given CPU may " "only be given once.\n"); - exit(1); + goto err_code; } else CPU_SET_S(cpus_to_pin[cpu], cpu_set_size, cpu_set); @@ -607,7 +616,7 @@ int main(int argc, char *argv[]) queue_path = malloc(strlen(option) + 2); if (!queue_path) { perror("malloc()"); - exit(1); + goto err_code; } queue_path[0] = '/'; queue_path[1] = 0; @@ -622,17 +631,12 @@ int main(int argc, char *argv[]) fprintf(stderr, "Must pass at least one CPU to continuous " "mode.\n"); poptPrintUsage(popt_context, stderr, 0); - exit(1); + goto err_code; } else if (!continuous_mode) { num_cpus_to_pin = 1; cpus_to_pin[0] = cpus_online - 1; }
- if (getuid() != 0) - ksft_exit_skip("Not running as root, but almost all tests " - "require root in order to modify\nsystem settings. " - "Exiting.\n"); - max_msgs = fopen(MAX_MSGS, "r+"); max_msgsize = fopen(MAX_MSGSIZE, "r+"); if (!max_msgs) @@ -740,4 +744,9 @@ int main(int argc, char *argv[]) sleep(1); } shutdown(0, "", 0); + +err_code: + CPU_FREE(cpu_set); + exit(1); + }
From: Adrian Hunter adrian.hunter@intel.com
[ Upstream commit f034fc50d3c7d9385c20d505ab4cf56b8fd18ac7 ]
Fix incorrect debug message:
Attempting to add event pmu 'intel_pt' with '' that may result in non-fatal errors
which always appears with perf record -vv and intel_pt e.g.
perf record -vv -e intel_pt//u uname
The message is incorrect because there will never be non-fatal errors.
Suppress the message if the PMU is 'selectable' i.e. meant to be selected directly as an event.
Fixes: 4ac22b484d4c79e8 ("perf parse-events: Make add PMU verbose output clearer") Signed-off-by: Adrian Hunter adrian.hunter@intel.com Cc: Ian Rogers irogers@google.com Cc: Jiri Olsa jolsa@kernel.org Link: http://lore.kernel.org/lkml/20220411061758.2458417-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/parse-events.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index 3b273580fb84..3a0a7930cd10 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -1442,7 +1442,9 @@ int parse_events_add_pmu(struct parse_events_state *parse_state, bool use_uncore_alias; LIST_HEAD(config_terms);
- if (verbose > 1) { + pmu = parse_state->fake_pmu ?: perf_pmu__find(name); + + if (verbose > 1 && !(pmu && pmu->selectable)) { fprintf(stderr, "Attempting to add event pmu '%s' with '", name); if (head_config) { @@ -1455,7 +1457,6 @@ int parse_events_add_pmu(struct parse_events_state *parse_state, fprintf(stderr, "' that may result in non-fatal errors\n"); }
- pmu = parse_state->fake_pmu ?: perf_pmu__find(name); if (!pmu) { char *err_str;
From: Lin Ma linma@zju.edu.cn
[ Upstream commit ef27324e2cb7bb24542d6cb2571740eefe6b00dc ]
Our detector found a concurrent use-after-free bug when detaching an NCI device. The main reason for this bug is the unexpected scheduling between the used delayed mechanism (timer and workqueue).
The race can be demonstrated below:
Thread-1 Thread-2 | nci_dev_up() | nci_open_device() | __nci_request(nci_reset_req) | nci_send_cmd | queue_work(cmd_work) nci_unregister_device() | nci_close_device() | ... del_timer_sync(cmd_timer)[1] | ... | Worker nci_free_device() | nci_cmd_work() kfree(ndev)[3] | mod_timer(cmd_timer)[2]
In short, the cleanup routine thought that the cmd_timer has already been detached by [1] but the mod_timer can re-attach the timer [2], even it is already released [3], resulting in UAF.
This UAF is easy to trigger, crash trace by POC is like below
[ 66.703713] ================================================================== [ 66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490 [ 66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33 [ 66.703974] [ 66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5 [ 66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work [ 66.703974] Call Trace: [ 66.703974] <TASK> [ 66.703974] dump_stack_lvl+0x57/0x7d [ 66.703974] print_report.cold+0x5e/0x5db [ 66.703974] ? enqueue_timer+0x448/0x490 [ 66.703974] kasan_report+0xbe/0x1c0 [ 66.703974] ? enqueue_timer+0x448/0x490 [ 66.703974] enqueue_timer+0x448/0x490 [ 66.703974] __mod_timer+0x5e6/0xb80 [ 66.703974] ? mark_held_locks+0x9e/0xe0 [ 66.703974] ? try_to_del_timer_sync+0xf0/0xf0 [ 66.703974] ? lockdep_hardirqs_on_prepare+0x17b/0x410 [ 66.703974] ? queue_work_on+0x61/0x80 [ 66.703974] ? lockdep_hardirqs_on+0xbf/0x130 [ 66.703974] process_one_work+0x8bb/0x1510 [ 66.703974] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 66.703974] ? pwq_dec_nr_in_flight+0x230/0x230 [ 66.703974] ? rwlock_bug.part.0+0x90/0x90 [ 66.703974] ? _raw_spin_lock_irq+0x41/0x50 [ 66.703974] worker_thread+0x575/0x1190 [ 66.703974] ? process_one_work+0x1510/0x1510 [ 66.703974] kthread+0x2a0/0x340 [ 66.703974] ? kthread_complete_and_exit+0x20/0x20 [ 66.703974] ret_from_fork+0x22/0x30 [ 66.703974] </TASK> [ 66.703974] [ 66.703974] Allocated by task 267: [ 66.703974] kasan_save_stack+0x1e/0x40 [ 66.703974] __kasan_kmalloc+0x81/0xa0 [ 66.703974] nci_allocate_device+0xd3/0x390 [ 66.703974] nfcmrvl_nci_register_dev+0x183/0x2c0 [ 66.703974] nfcmrvl_nci_uart_open+0xf2/0x1dd [ 66.703974] nci_uart_tty_ioctl+0x2c3/0x4a0 [ 66.703974] tty_ioctl+0x764/0x1310 [ 66.703974] __x64_sys_ioctl+0x122/0x190 [ 66.703974] do_syscall_64+0x3b/0x90 [ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 66.703974] [ 66.703974] Freed by task 406: [ 66.703974] kasan_save_stack+0x1e/0x40 [ 66.703974] kasan_set_track+0x21/0x30 [ 66.703974] kasan_set_free_info+0x20/0x30 [ 66.703974] __kasan_slab_free+0x108/0x170 [ 66.703974] kfree+0xb0/0x330 [ 66.703974] nfcmrvl_nci_unregister_dev+0x90/0xd0 [ 66.703974] nci_uart_tty_close+0xdf/0x180 [ 66.703974] tty_ldisc_kill+0x73/0x110 [ 66.703974] tty_ldisc_hangup+0x281/0x5b0 [ 66.703974] __tty_hangup.part.0+0x431/0x890 [ 66.703974] tty_release+0x3a8/0xc80 [ 66.703974] __fput+0x1f0/0x8c0 [ 66.703974] task_work_run+0xc9/0x170 [ 66.703974] exit_to_user_mode_prepare+0x194/0x1a0 [ 66.703974] syscall_exit_to_user_mode+0x19/0x50 [ 66.703974] do_syscall_64+0x48/0x90 [ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae
To fix the UAF, this patch adds flush_workqueue() to ensure the nci_cmd_work is finished before the following del_timer_sync. This combination will promise the timer is actually detached.
Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation") Signed-off-by: Lin Ma linma@zju.edu.cn Reviewed-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/nfc/nci/core.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/nfc/nci/core.c b/net/nfc/nci/core.c index e38719e2ee58..2cfff70f70e0 100644 --- a/net/nfc/nci/core.c +++ b/net/nfc/nci/core.c @@ -548,6 +548,10 @@ static int nci_close_device(struct nci_dev *ndev) mutex_lock(&ndev->req_lock);
if (!test_and_clear_bit(NCI_UP, &ndev->flags)) { + /* Need to flush the cmd wq in case + * there is a queued/running cmd_work + */ + flush_workqueue(ndev->cmd_wq); del_timer_sync(&ndev->cmd_timer); del_timer_sync(&ndev->data_timer); mutex_unlock(&ndev->req_lock);
From: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com
[ Upstream commit 64c4a37ac04eeb43c42d272f6e6c8c12bfcf4304 ]
Smatch printed a warning: arch/x86/crypto/poly1305_glue.c:198 poly1305_update_arch() error: __memcpy() 'dctx->buf' too small (16 vs u32max)
It's caused because Smatch marks 'link_len' as untrusted since it comes from sscanf(). Add a check to ensure that 'link_len' is not larger than the size of the 'link_str' buffer.
Fixes: c69c1b6eaea1 ("cifs: implement CIFSParseMFSymlink()") Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com Reviewed-by: Ronnie Sahlberg lsahlber@redhat.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cifs/link.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/cifs/link.c b/fs/cifs/link.c index 94dab4309fbb..85d30fef98a2 100644 --- a/fs/cifs/link.c +++ b/fs/cifs/link.c @@ -97,6 +97,9 @@ parse_mf_symlink(const u8 *buf, unsigned int buf_len, unsigned int *_link_len, if (rc != 1) return -EINVAL;
+ if (link_len > CIFS_MF_SYMLINK_LINK_MAXLEN) + return -EINVAL; + rc = symlink_hash(link_len, link_str, md5_hash); if (rc) { cifs_dbg(FYI, "%s: MD5 hash failure: %d\n", __func__, rc);
From: Khazhismel Kumykov khazhy@google.com
[ Upstream commit ce40426fdc3c92acdba6b5ca74bc7277ffaa6a3d ]
Mixing sched_clock() and ktime_get_ns() usage will give bad results.
Switch hst_select_path() from using sched_clock() to ktime_get_ns(). Also rename path_service_time()'s 'sched_now' variable to 'now'.
Fixes: 2613eab11996 ("dm mpath: add Historical Service Time Path Selector") Signed-off-by: Khazhismel Kumykov khazhy@google.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/dm-historical-service-time.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/md/dm-historical-service-time.c b/drivers/md/dm-historical-service-time.c index 186f91e2752c..06fe43c13ba3 100644 --- a/drivers/md/dm-historical-service-time.c +++ b/drivers/md/dm-historical-service-time.c @@ -429,7 +429,7 @@ static struct dm_path *hst_select_path(struct path_selector *ps, { struct selector *s = ps->context; struct path_info *pi = NULL, *best = NULL; - u64 time_now = sched_clock(); + u64 time_now = ktime_get_ns(); struct dm_path *ret = NULL; unsigned long flags;
@@ -470,7 +470,7 @@ static int hst_start_io(struct path_selector *ps, struct dm_path *path,
static u64 path_service_time(struct path_info *pi, u64 start_time) { - u64 sched_now = ktime_get_ns(); + u64 now = ktime_get_ns();
/* if a previous disk request has finished after this IO was * sent to the hardware, pretend the submission happened @@ -479,11 +479,11 @@ static u64 path_service_time(struct path_info *pi, u64 start_time) if (time_after64(pi->last_finish, start_time)) start_time = pi->last_finish;
- pi->last_finish = sched_now; - if (time_before64(sched_now, start_time)) + pi->last_finish = now; + if (time_before64(now, start_time)) return 0;
- return sched_now - start_time; + return now - start_time; }
static int hst_end_io(struct path_selector *ps, struct dm_path *path,
From: Jeremy Linton jeremy.linton@arm.com
[ Upstream commit 2df3fc4a84e917a422935cc5bae18f43f9955d31 ]
It turns out after digging deeper into this bug, that it was being triggered by GCC12 failing to call the bcmgenet_enable_dma() routine. Given that a gcc12 fix has been merged [1] and the genet driver now works properly when built with gcc12, this commit should be reverted.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160 https://gcc.gnu.org/git/?p=gcc.git%3Ba=commit%3Bh=aabb9a261ef060cf24fd626713...
Fixes: 8d3ea3d402db ("net: bcmgenet: Use stronger register read/writes to assure ordering") Signed-off-by: Jeremy Linton jeremy.linton@arm.com Acked-by: Florian Fainelli f.fainelli@gmail.com Link: https://lore.kernel.org/r/20220412210420.1129430-1-jeremy.linton@arm.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c index 7dcd5613ee56..a2062144d7ca 100644 --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c @@ -76,7 +76,7 @@ static inline void bcmgenet_writel(u32 value, void __iomem *offset) if (IS_ENABLED(CONFIG_MIPS) && IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) __raw_writel(value, offset); else - writel(value, offset); + writel_relaxed(value, offset); }
static inline u32 bcmgenet_readl(void __iomem *offset) @@ -84,7 +84,7 @@ static inline u32 bcmgenet_readl(void __iomem *offset) if (IS_ENABLED(CONFIG_MIPS) && IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) return __raw_readl(offset); else - return readl(offset); + return readl_relaxed(offset); }
static inline void dmadesc_set_length_status(struct bcmgenet_priv *priv,
From: Aurabindo Pillai aurabindo.pillai@amd.com
[ Upstream commit c5c948aa894a831f96fccd025e47186b1ee41615 ]
[Why&How] Add a dedicated AMDGPU specific ID for use with newer ASICs that support USB-C output
Signed-off-by: Aurabindo Pillai aurabindo.pillai@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/ObjectID.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/ObjectID.h b/drivers/gpu/drm/amd/amdgpu/ObjectID.h index 5b393622f592..a0f0a17e224f 100644 --- a/drivers/gpu/drm/amd/amdgpu/ObjectID.h +++ b/drivers/gpu/drm/amd/amdgpu/ObjectID.h @@ -119,6 +119,7 @@ #define CONNECTOR_OBJECT_ID_eDP 0x14 #define CONNECTOR_OBJECT_ID_MXM 0x15 #define CONNECTOR_OBJECT_ID_LVDS_eDP 0x16 +#define CONNECTOR_OBJECT_ID_USBC 0x17
/* deleted */
From: Darrick J. Wong djwong@kernel.org
[ Upstream commit 05fd9564e9faf0f23b4676385e27d9405cef6637 ]
Since the initial introduction of (posix) fallocate back at the turn of the century, it has been possible to use this syscall to change the user-visible contents of files. This can happen by extending the file size during a preallocation, or through any of the newer modes (punch, zero range). Because the call can be used to change file contents, we should treat it like we do any other modification to a file -- update the mtime, and drop set[ug]id privileges/capabilities.
The VFS function file_modified() does all this for us if pass it a locked inode, so let's make fallocate drop permissions correctly.
Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/file.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index f59ec55e5feb..416a1b753ff6 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2833,8 +2833,9 @@ int btrfs_replace_file_extents(struct inode *inode, struct btrfs_path *path, return ret; }
-static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) +static int btrfs_punch_hole(struct file *file, loff_t offset, loff_t len) { + struct inode *inode = file_inode(file); struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct btrfs_root *root = BTRFS_I(inode)->root; struct extent_state *cached_state = NULL; @@ -2866,6 +2867,10 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) goto out_only_mutex; }
+ ret = file_modified(file); + if (ret) + goto out_only_mutex; + lockstart = round_up(offset, btrfs_inode_sectorsize(BTRFS_I(inode))); lockend = round_down(offset + len, btrfs_inode_sectorsize(BTRFS_I(inode))) - 1; @@ -3301,7 +3306,7 @@ static long btrfs_fallocate(struct file *file, int mode, return -EOPNOTSUPP;
if (mode & FALLOC_FL_PUNCH_HOLE) - return btrfs_punch_hole(inode, offset, len); + return btrfs_punch_hole(file, offset, len);
/* * Only trigger disk allocation, don't trigger qgroup reserve @@ -3323,6 +3328,10 @@ static long btrfs_fallocate(struct file *file, int mode, goto out; }
+ ret = file_modified(file); + if (ret) + goto out; + /* * TODO: Move these two operations after we have checked * accurate reserved space, or fallocate can still fail but
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit a7d16d9a07bbcb7dcd5214a1bea75c808830bc0d ]
This is a long time leftover from when I originally added the free space inode, the point was to catch cases where we weren't honoring the NOCOW flag. However there exists a race with relocation, if we allocate our free space inode in a block group that is about to be relocated, we could trigger the COW path before the relocation has the opportunity to find the extents and delete the free space cache. In production where we have auto-relocation enabled we're seeing this WARN_ON_ONCE() around 5k times in a 2 week period, so not super common but enough that it's at the top of our metrics.
We're properly handling the error here, and with us phasing out v1 space cache anyway just drop the WARN_ON_ONCE.
Signed-off-by: Josef Bacik josef@toxicpanda.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/inode.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f7f4ac01589b..4a5248097d7a 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -995,7 +995,6 @@ static noinline int cow_file_range(struct btrfs_inode *inode, int ret = 0;
if (btrfs_is_free_space_inode(inode)) { - WARN_ON_ONCE(1); ret = -EINVAL; goto out_unlock; }
From: Charlene Liu Charlene.Liu@amd.com
[ Upstream commit 5e8a71cf13bc9184fee915b2220be71b4c6cac74 ]
[why] for the case edid change only changed audio format. driver still need to update stream.
Reviewed-by: Alvin Lee Alvin.Lee2@amd.com Reviewed-by: Aric Cyr Aric.Cyr@amd.com Acked-by: Alex Hung alex.hung@amd.com Signed-off-by: Charlene Liu Charlene.Liu@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c index 5c5ccbad9658..1e47afc4ccc1 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c @@ -1701,8 +1701,8 @@ bool dc_is_stream_unchanged( if (old_stream->ignore_msa_timing_param != stream->ignore_msa_timing_param) return false;
- // Only Have Audio left to check whether it is same or not. This is a corner case for Tiled sinks - if (old_stream->audio_info.mode_count != stream->audio_info.mode_count) + /*compare audio info*/ + if (memcmp(&old_stream->audio_info, &stream->audio_info, sizeof(stream->audio_info)) != 0) return false;
return true;
From: Chiawen Huang chiawen.huang@amd.com
[ Upstream commit 7d56a154e22ffb3613fdebf83ec34d5225a22993 ]
[Why] disable/enable leads FEC mismatch between hw/sw FEC state.
[How] check FEC status to fastboot on/off.
Reviewed-by: Anthony Koo Anthony.Koo@amd.com Acked-by: Alex Hung alex.hung@amd.com Signed-off-by: Chiawen Huang chiawen.huang@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/core/dc.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index 93f5229c303e..ac5323596c65 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -1173,6 +1173,10 @@ bool dc_validate_seamless_boot_timing(const struct dc *dc, if (!link->link_enc->funcs->is_dig_enabled(link->link_enc)) return false;
+ /* Check for FEC status*/ + if (link->link_enc->funcs->fec_is_active(link->link_enc)) + return false; + enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc);
if (enc_inst == ENGINE_ID_UNKNOWN)
From: Leo (Hanghong) Ma hanghong.ma@amd.com
[ Upstream commit c9fbf6435162ed5fb7201d1d4adf6585c6a8c327 ]
[Why & How] The latest HDMI SPEC has updated the VTEM packet structure, so change the VTEM Infopacket defined in the driver side to align with the SPEC.
Reviewed-by: Chris Park Chris.Park@amd.com Acked-by: Alex Hung alex.hung@amd.com Signed-off-by: Leo (Hanghong) Ma hanghong.ma@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../gpu/drm/amd/display/modules/info_packet/info_packet.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c index 0fdf7a3e96de..96e18050a617 100644 --- a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c +++ b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c @@ -100,7 +100,8 @@ enum vsc_packet_revision { //PB7 = MD0 #define MASK_VTEM_MD0__VRR_EN 0x01 #define MASK_VTEM_MD0__M_CONST 0x02 -#define MASK_VTEM_MD0__RESERVED2 0x0C +#define MASK_VTEM_MD0__QMS_EN 0x04 +#define MASK_VTEM_MD0__RESERVED2 0x08 #define MASK_VTEM_MD0__FVA_FACTOR_M1 0xF0
//MD1 @@ -109,7 +110,7 @@ enum vsc_packet_revision { //MD2 #define MASK_VTEM_MD2__BASE_REFRESH_RATE_98 0x03 #define MASK_VTEM_MD2__RB 0x04 -#define MASK_VTEM_MD2__RESERVED3 0xF8 +#define MASK_VTEM_MD2__NEXT_TFR 0xF8
//MD3 #define MASK_VTEM_MD3__BASE_REFRESH_RATE_07 0xFF
From: Tushar Patel tushar.patel@amd.com
[ Upstream commit b7dfbd2e601f3fee545bc158feceba4f340fe7cf ]
Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix this by passing correct number of VMIDs to HWS
v2: squash in warning fix (Alex)
Signed-off-by: Tushar Patel tushar.patel@amd.com Reviewed-by: Felix Kuehling felix.kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 11 +++-------- 2 files changed, 4 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index ed13a2f76884..30659c1776e8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -632,7 +632,7 @@ MODULE_PARM_DESC(sched_policy, * Maximum number of processes that HWS can schedule concurrently. The maximum is the * number of VMIDs assigned to the HWS, which is also the default. */ -int hws_max_conc_proc = 8; +int hws_max_conc_proc = -1; module_param(hws_max_conc_proc, int, 0444); MODULE_PARM_DESC(hws_max_conc_proc, "Max # processes HWS can execute concurrently when sched_policy=0 (0 = no concurrency, #VMIDs for KFD = Maximum(default))"); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 84313135c2ea..148e43dee657 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -664,15 +664,10 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, - kfd->vm_info.first_vmid_kfd + 1;
/* Verify module parameters regarding mapped process number*/ - if ((hws_max_conc_proc < 0) - || (hws_max_conc_proc > kfd->vm_info.vmid_num_kfd)) { - dev_err(kfd_device, - "hws_max_conc_proc %d must be between 0 and %d, use %d instead\n", - hws_max_conc_proc, kfd->vm_info.vmid_num_kfd, - kfd->vm_info.vmid_num_kfd); + if (hws_max_conc_proc >= 0) + kfd->max_proc_per_quantum = min((u32)hws_max_conc_proc, kfd->vm_info.vmid_num_kfd); + else kfd->max_proc_per_quantum = kfd->vm_info.vmid_num_kfd; - } else - kfd->max_proc_per_quantum = hws_max_conc_proc;
/* calculate max size of mqds needed for queues */ size = max_num_of_queues_per_device *
From: Tianci Yin tianci.yin@amd.com
[ Upstream commit 6ea239adc2a712eb318f04f5c29b018ba65ea38a ]
Prior to disabling dpg, VCN need unpausing dpg mode, or VCN will hang in S3 resuming.
Reviewed-by: James Zhu James.Zhu@amd.com Signed-off-by: Tianci Yin tianci.yin@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c index 2099f6ebd833..bdb8e596bda6 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c @@ -1429,8 +1429,11 @@ static int vcn_v3_0_start_sriov(struct amdgpu_device *adev)
static int vcn_v3_0_stop_dpg_mode(struct amdgpu_device *adev, int inst_idx) { + struct dpg_pause_state state = {.fw_based = VCN_DPG_STATE__UNPAUSE}; uint32_t tmp;
+ vcn_v3_0_pause_dpg_mode(adev, 0, &state); + /* Wait for power status to be 1 */ SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1, UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
From: QintaoShen unSimple1993@163.com
[ Upstream commit ebbb7bb9e80305820dc2328a371c1b35679f2667 ]
As the kmalloc_array() may return null, the 'event_waiters[i].wait' would lead to null-pointer dereference. Therefore, it is better to check the return value of kmalloc_array() to avoid this confusion.
Signed-off-by: QintaoShen unSimple1993@163.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index ba2c2ce0c55a..159be13ef20b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -531,6 +531,8 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events) event_waiters = kmalloc_array(num_events, sizeof(struct kfd_event_waiter), GFP_KERNEL); + if (!event_waiters) + return NULL;
for (i = 0; (event_waiters) && (i < num_events) ; i++) { init_wait(&event_waiters[i].wait);
From: Michael Kelley mikelley@microsoft.com
[ Upstream commit b6cae15b5710c8097aad26a2e5e752c323ee5348 ]
When reading a packet from a host-to-guest ring buffer, there is no memory barrier between reading the write index (to see if there is a packet to read) and reading the contents of the packet. The Hyper-V host uses store-release when updating the write index to ensure that writes of the packet data are completed first. On the guest side, the processor can reorder and read the packet data before the write index, and sometimes get stale packet data. Getting such stale packet data has been observed in a reproducible case in a VM on ARM64.
Fix this by using virt_load_acquire() to read the write index, ensuring that reads of the packet data cannot be reordered before it. Preventing such reordering is logically correct, and with this change, getting stale data can no longer be reproduced.
Signed-off-by: Michael Kelley mikelley@microsoft.com Reviewed-by: Andrea Parri (Microsoft) parri.andrea@gmail.com Link: https://lore.kernel.org/r/1648394710-33480-1-git-send-email-mikelley@microso... Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hv/ring_buffer.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c index 356e22159e83..769851b6e74c 100644 --- a/drivers/hv/ring_buffer.c +++ b/drivers/hv/ring_buffer.c @@ -378,7 +378,16 @@ int hv_ringbuffer_read(struct vmbus_channel *channel, static u32 hv_pkt_iter_avail(const struct hv_ring_buffer_info *rbi) { u32 priv_read_loc = rbi->priv_read_index; - u32 write_loc = READ_ONCE(rbi->ring_buffer->write_index); + u32 write_loc; + + /* + * The Hyper-V host writes the packet data, then uses + * store_release() to update the write_index. Use load_acquire() + * here to prevent loads of the packet data from being re-ordered + * before the read of the write_index and potentially getting + * stale data. + */ + write_loc = virt_load_acquire(&rbi->ring_buffer->write_index);
if (write_loc >= priv_read_loc) return write_loc - priv_read_loc;
From: Xiaoguang Wang xiaoguang.wang@linux.alibaba.com
[ Upstream commit a6968f7a367f128d120447360734344d5a3d5336 ]
tcmu_try_get_data_page() looks up pages under cmdr_lock, but it does not take refcount properly and just returns page pointer. When tcmu_try_get_data_page() returns, the returned page may have been freed by tcmu_blocks_release().
We need to get_page() under cmdr_lock to avoid concurrent tcmu_blocks_release().
Link: https://lore.kernel.org/r/20220311132206.24515-1-xiaoguang.wang@linux.alibab... Reviewed-by: Bodo Stroesser bostroesser@gmail.com Signed-off-by: Xiaoguang Wang xiaoguang.wang@linux.alibaba.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/target/target_core_user.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c index c6950f157b99..c283e45ac300 100644 --- a/drivers/target/target_core_user.c +++ b/drivers/target/target_core_user.c @@ -1676,6 +1676,7 @@ static struct page *tcmu_try_get_block_page(struct tcmu_dev *udev, uint32_t dbi) mutex_lock(&udev->cmdr_lock); page = tcmu_get_block_page(udev, dbi); if (likely(page)) { + get_page(page); mutex_unlock(&udev->cmdr_lock); return page; } @@ -1714,6 +1715,7 @@ static vm_fault_t tcmu_vma_fault(struct vm_fault *vmf) /* For the vmalloc()ed cmd area pages */ addr = (void *)(unsigned long)info->mem[mi].addr + offset; page = vmalloc_to_page(addr); + get_page(page); } else { uint32_t dbi;
@@ -1724,7 +1726,6 @@ static vm_fault_t tcmu_vma_fault(struct vm_fault *vmf) return VM_FAULT_SIGBUS; }
- get_page(page); vmf->page = page; return 0; }
From: James Smart jsmart2021@gmail.com
[ Upstream commit df0101197c4d9596682901631f3ee193ed354873 ]
When recovering from a pci-parity error the driver is failing to re-create queues, causing recovery to fail. Looking deeper, it was found that the interrupt vector count allocated on the recovery was fewer than the vectors originally allocated. This disparity resulted in CPU map entries with stale information. When the driver tries to re-create the queues, it attempts to use the stale information which indicates an eq/interrupt vector that was no longer created.
Fix by clearng the cpup map array before enabling and requesting the IRQs in the lpfc_sli_reset_slot_s4 routine().
Link: https://lore.kernel.org/r/20220317032737.45308-4-jsmart2021@gmail.com Co-developed-by: Justin Tee justin.tee@broadcom.com Signed-off-by: Justin Tee justin.tee@broadcom.com Signed-off-by: James Smart jsmart2021@gmail.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/lpfc/lpfc_init.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index 1149bfc42fe6..134e4ee5dc48 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -13614,6 +13614,8 @@ lpfc_io_slot_reset_s4(struct pci_dev *pdev) psli->sli_flag &= ~LPFC_SLI_ACTIVE; spin_unlock_irq(&phba->hbalock);
+ /* Init cpu_map array */ + lpfc_cpu_map_array_init(phba); /* Configure and enable interrupt */ intr_mode = lpfc_sli4_enable_intr(phba, phba->intr_mode); if (intr_mode == LPFC_INTR_ERROR) {
From: Tyrel Datwyler tyreld@linux.ibm.com
[ Upstream commit 0bade8e53279157c7cc9dd95d573b7e82223d78a ]
The adapter request_limit is hardcoded to be INITIAL_SRP_LIMIT which is currently an arbitrary value of 800. Increase this value to 1024 which better matches the characteristics of the typical IBMi Initiator that supports 32 LUNs and a queue depth of 32.
This change also has the secondary benefit of being a power of two as required by the kfifo API. Since, Commit ab9bb6318b09 ("Partially revert "kfifo: fix kfifo_alloc() and kfifo_init()"") the size of IU pool for each target has been rounded down to 512 when attempting to kfifo_init() those pools with the current request_limit size of 800.
Link: https://lore.kernel.org/r/20220322194443.678433-1-tyreld@linux.ibm.com Signed-off-by: Tyrel Datwyler tyreld@linux.ibm.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c index cc3908c2d2f9..a3431485def8 100644 --- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c +++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c @@ -35,7 +35,7 @@
#define IBMVSCSIS_VERSION "v0.2"
-#define INITIAL_SRP_LIMIT 800 +#define INITIAL_SRP_LIMIT 1024 #define DEFAULT_MAX_SECTORS 256 #define MAX_TXU 1024 * 1024
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit c3efcedd272aa6dd5929e20cf902a52ddaa1197a ]
KS8851_MLL selects MICREL_PHY, which depends on PTP_1588_CLOCK_OPTIONAL, so make KS8851_MLL also depend on PTP_1588_CLOCK_OPTIONAL since 'select' does not follow any dependency chains.
Fixes kconfig warning and build errors:
WARNING: unmet direct dependencies detected for MICREL_PHY Depends on [m]: NETDEVICES [=y] && PHYLIB [=y] && PTP_1588_CLOCK_OPTIONAL [=m] Selected by [y]: - KS8851_MLL [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICREL [=y] && HAS_IOMEM [=y]
ld: drivers/net/phy/micrel.o: in function `lan8814_ts_info': micrel.c:(.text+0xb35): undefined reference to `ptp_clock_index' ld: drivers/net/phy/micrel.o: in function `lan8814_probe': micrel.c:(.text+0x2586): undefined reference to `ptp_clock_register'
Signed-off-by: Randy Dunlap rdunlap@infradead.org Cc: "David S. Miller" davem@davemloft.net Cc: Jakub Kicinski kuba@kernel.org Cc: Paolo Abeni pabeni@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/micrel/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/micrel/Kconfig b/drivers/net/ethernet/micrel/Kconfig index 42bc014136fe..9ceb7e1fb169 100644 --- a/drivers/net/ethernet/micrel/Kconfig +++ b/drivers/net/ethernet/micrel/Kconfig @@ -37,6 +37,7 @@ config KS8851 config KS8851_MLL tristate "Micrel KS8851 MLL" depends on HAS_IOMEM + depends on PTP_1588_CLOCK_OPTIONAL select MII select CRC32 select EEPROM_93CX6
From: Christian Lamparter chunkeey@gmail.com
[ Upstream commit 5399752299396a3c9df6617f4b3c907d7aa4ded8 ]
Samsung' 840 EVO with the latest firmware (EXT0DB6Q) locks up with the a message: "READ LOG DMA EXT failed, trying PIO" during boot.
Initially this was discovered because it caused a crash with the sata_dwc_460ex controller on a WD MyBook Live DUO.
The reporter "Tice Rex" which has the unique opportunity that he has two Samsung 840 EVO SSD! One with the older firmware "EXT0BB0Q" which booted fine and didn't expose "READ LOG DMA EXT". But the newer/latest firmware "EXT0DB6Q" caused the headaches.
BugLink: https://github.com/openwrt/openwrt/issues/9505 Signed-off-by: Christian Lamparter chunkeey@gmail.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/libata-core.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index d2b544bdc7b5..f963a0a7da46 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -3974,6 +3974,9 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = { ATA_HORKAGE_ZERO_AFTER_TRIM, }, { "Crucial_CT*MX100*", "MU01", ATA_HORKAGE_NO_NCQ_TRIM | ATA_HORKAGE_ZERO_AFTER_TRIM, }, + { "Samsung SSD 840 EVO*", NULL, ATA_HORKAGE_NO_NCQ_TRIM | + ATA_HORKAGE_NO_DMA_LOG | + ATA_HORKAGE_ZERO_AFTER_TRIM, }, { "Samsung SSD 840*", NULL, ATA_HORKAGE_NO_NCQ_TRIM | ATA_HORKAGE_ZERO_AFTER_TRIM, }, { "Samsung SSD 850*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
From: Leo Ruan tingquan.ruan@cn.bosch.com
[ Upstream commit 070a88fd4a03f921b73a2059e97d55faaa447dab ]
This commit corrects the printing of the IPU clock error percentage if it is between -0.1% to -0.9%. For example, if the pixel clock requested is 27.2 MHz but only 27.0 MHz can be achieved the deviation is -0.8%. But the fixed point math had a flaw and calculated error of 0.2%.
Before: Clocks: IPU 270000000Hz DI 24716667Hz Needed 27200000Hz IPU clock can give 27000000 with divider 10, error 0.2% Want 27200000Hz IPU 270000000Hz DI 24716667Hz using IPU, 27000000Hz
After: Clocks: IPU 270000000Hz DI 24716667Hz Needed 27200000Hz IPU clock can give 27000000 with divider 10, error -0.8% Want 27200000Hz IPU 270000000Hz DI 24716667Hz using IPU, 27000000Hz
Signed-off-by: Leo Ruan tingquan.ruan@cn.bosch.com Signed-off-by: Mark Jonas mark.jonas@de.bosch.com Reviewed-by: Philipp Zabel p.zabel@pengutronix.de Signed-off-by: Philipp Zabel p.zabel@pengutronix.de Link: https://lore.kernel.org/r/20220207151411.5009-1-mark.jonas@de.bosch.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/ipu-v3/ipu-di.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/ipu-v3/ipu-di.c b/drivers/gpu/ipu-v3/ipu-di.c index b4a31d506fcc..74eca68891ad 100644 --- a/drivers/gpu/ipu-v3/ipu-di.c +++ b/drivers/gpu/ipu-v3/ipu-di.c @@ -451,8 +451,9 @@ static void ipu_di_config_clock(struct ipu_di *di,
error = rate / (sig->mode.pixelclock / 1000);
- dev_dbg(di->ipu->dev, " IPU clock can give %lu with divider %u, error %d.%u%%\n", - rate, div, (signed)(error - 1000) / 10, error % 10); + dev_dbg(di->ipu->dev, " IPU clock can give %lu with divider %u, error %c%d.%d%%\n", + rate, div, error < 1000 ? '-' : '+', + abs(error - 1000) / 10, abs(error - 1000) % 10);
/* Allow a 1% error */ if (error < 1010 && error >= 990) {
From: Jonathan Bakker xc-racer2@live.ca
[ Upstream commit 92d96b603738ec4f35cde7198c303ae264dd47cb ]
As per Table 130 of the wm8994 datasheet at [1], there is an off-on delay for LDO1 and LDO2. In the wm8958 datasheet [2], I could not find any reference to it. I could not find a wm1811 datasheet to double-check there, but as no one has complained presumably it works without it.
This solves the issue on Samsung Aries boards with a wm8994 where register writes fail when the device is powered off and back-on quickly.
[1] https://statics.cirrus.com/pubs/proDatasheet/WM8994_Rev4.6.pdf [2] https://statics.cirrus.com/pubs/proDatasheet/WM8958_v3.5.pdf
Signed-off-by: Jonathan Bakker xc-racer2@live.ca Acked-by: Charles Keepax ckeepax@opensource.cirrus.com Link: https://lore.kernel.org/r/CY4PR04MB056771CFB80DC447C30D5A31CB1D9@CY4PR04MB05... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/wm8994-regulator.c | 42 ++++++++++++++++++++++++++-- 1 file changed, 39 insertions(+), 3 deletions(-)
diff --git a/drivers/regulator/wm8994-regulator.c b/drivers/regulator/wm8994-regulator.c index cadea0344486..40befdd9dfa9 100644 --- a/drivers/regulator/wm8994-regulator.c +++ b/drivers/regulator/wm8994-regulator.c @@ -71,6 +71,35 @@ static const struct regulator_ops wm8994_ldo2_ops = { };
static const struct regulator_desc wm8994_ldo_desc[] = { + { + .name = "LDO1", + .id = 1, + .type = REGULATOR_VOLTAGE, + .n_voltages = WM8994_LDO1_MAX_SELECTOR + 1, + .vsel_reg = WM8994_LDO_1, + .vsel_mask = WM8994_LDO1_VSEL_MASK, + .ops = &wm8994_ldo1_ops, + .min_uV = 2400000, + .uV_step = 100000, + .enable_time = 3000, + .off_on_delay = 36000, + .owner = THIS_MODULE, + }, + { + .name = "LDO2", + .id = 2, + .type = REGULATOR_VOLTAGE, + .n_voltages = WM8994_LDO2_MAX_SELECTOR + 1, + .vsel_reg = WM8994_LDO_2, + .vsel_mask = WM8994_LDO2_VSEL_MASK, + .ops = &wm8994_ldo2_ops, + .enable_time = 3000, + .off_on_delay = 36000, + .owner = THIS_MODULE, + }, +}; + +static const struct regulator_desc wm8958_ldo_desc[] = { { .name = "LDO1", .id = 1, @@ -172,9 +201,16 @@ static int wm8994_ldo_probe(struct platform_device *pdev) * regulator core and we need not worry about it on the * error path. */ - ldo->regulator = devm_regulator_register(&pdev->dev, - &wm8994_ldo_desc[id], - &config); + if (ldo->wm8994->type == WM8994) { + ldo->regulator = devm_regulator_register(&pdev->dev, + &wm8994_ldo_desc[id], + &config); + } else { + ldo->regulator = devm_regulator_register(&pdev->dev, + &wm8958_ldo_desc[id], + &config); + } + if (IS_ERR(ldo->regulator)) { ret = PTR_ERR(ldo->regulator); dev_err(wm8994->dev, "Failed to register LDO%d: %d\n",
From: Joey Gouly joey.gouly@arm.com
[ Upstream commit a2c0b0fbe01419f8f5d1c0b9c581631f34ffce8b ]
The alternatives code must be `noinstr` such that it does not patch itself, as the cache invalidation is only performed after all the alternatives have been applied.
Mark patch_alternative() as `noinstr`. Mark branch_insn_requires_update() and get_alt_insn() with `__always_inline` since they are both only called through patch_alternative().
Booting a kernel in QEMU TCG with KCSAN=y and ARM64_USE_LSE_ATOMICS=y caused a boot hang: [ 0.241121] CPU: All CPU(s) started at EL2
The alternatives code was patching the atomics in __tsan_read4() from LL/SC atomics to LSE atomics.
The following fragment is using LL/SC atomics in the .text section: | <__tsan_unaligned_read4+304>: ldxr x6, [x2] | <__tsan_unaligned_read4+308>: add x6, x6, x5 | <__tsan_unaligned_read4+312>: stxr w7, x6, [x2] | <__tsan_unaligned_read4+316>: cbnz w7, <__tsan_unaligned_read4+304>
This LL/SC atomic sequence was to be replaced with LSE atomics. However since the alternatives code was instrumentable, __tsan_read4() was being called after only the first instruction was replaced, which led to the following code in memory: | <__tsan_unaligned_read4+304>: ldadd x5, x6, [x2] | <__tsan_unaligned_read4+308>: add x6, x6, x5 | <__tsan_unaligned_read4+312>: stxr w7, x6, [x2] | <__tsan_unaligned_read4+316>: cbnz w7, <__tsan_unaligned_read4+304>
This caused an infinite loop as the `stxr` instruction never completed successfully, so `w7` was always 0.
Signed-off-by: Joey Gouly joey.gouly@arm.com Cc: Mark Rutland mark.rutland@arm.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will@kernel.org Link: https://lore.kernel.org/r/20220405104733.11476-1-joey.gouly@arm.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/kernel/alternative.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c index 73039949b5ce..5f8e4c2df53c 100644 --- a/arch/arm64/kernel/alternative.c +++ b/arch/arm64/kernel/alternative.c @@ -41,7 +41,7 @@ bool alternative_is_applied(u16 cpufeature) /* * Check if the target PC is within an alternative block. */ -static bool branch_insn_requires_update(struct alt_instr *alt, unsigned long pc) +static __always_inline bool branch_insn_requires_update(struct alt_instr *alt, unsigned long pc) { unsigned long replptr = (unsigned long)ALT_REPL_PTR(alt); return !(pc >= replptr && pc <= (replptr + alt->alt_len)); @@ -49,7 +49,7 @@ static bool branch_insn_requires_update(struct alt_instr *alt, unsigned long pc)
#define align_down(x, a) ((unsigned long)(x) & ~(((unsigned long)(a)) - 1))
-static u32 get_alt_insn(struct alt_instr *alt, __le32 *insnptr, __le32 *altinsnptr) +static __always_inline u32 get_alt_insn(struct alt_instr *alt, __le32 *insnptr, __le32 *altinsnptr) { u32 insn;
@@ -94,7 +94,7 @@ static u32 get_alt_insn(struct alt_instr *alt, __le32 *insnptr, __le32 *altinsnp return insn; }
-static void patch_alternative(struct alt_instr *alt, +static noinstr void patch_alternative(struct alt_instr *alt, __le32 *origptr, __le32 *updptr, int nr_inst) { __le32 *replptr;
From: Steve Capper steve.capper@arm.com
[ Upstream commit 697a1d44af8ba0477ee729e632f4ade37999249a ]
tlb_remove_huge_tlb_entry only considers PMD_SIZE and PUD_SIZE when updating the mmu_gather structure.
Unfortunately on arm64 there are two additional huge page sizes that need to be covered: CONT_PTE_SIZE and CONT_PMD_SIZE. Where an end-user attempts to employ contiguous huge pages, a VM_BUG_ON can be experienced due to the fact that the tlb structure hasn't been correctly updated by the relevant tlb_flush_p.._range() call from tlb_remove_huge_tlb_entry.
This patch adds inequality logic to the generic implementation of tlb_remove_huge_tlb_entry s.t. CONT_PTE_SIZE and CONT_PMD_SIZE are effectively covered on arm64. Also, as well as ptes, pmds and puds; p4ds are now considered too.
Reported-by: David Hildenbrand david@redhat.com Suggested-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will@kernel.org Link: https://lore.kernel.org/linux-mm/811c5c8e-b3a2-85d2-049c-717f17c3a03a@redhat... Signed-off-by: Steve Capper steve.capper@arm.com Acked-by: David Hildenbrand david@redhat.com Reviewed-by: Anshuman Khandual anshuman.khandual@arm.com Reviewed-by: Catalin Marinas catalin.marinas@arm.com Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lore.kernel.org/r/20220330112543.863-1-steve.capper@arm.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/asm-generic/tlb.h | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index 6661ee1cff47..a0c4b99d2899 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -563,10 +563,14 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb, #define tlb_remove_huge_tlb_entry(h, tlb, ptep, address) \ do { \ unsigned long _sz = huge_page_size(h); \ - if (_sz == PMD_SIZE) \ - tlb_flush_pmd_range(tlb, address, _sz); \ - else if (_sz == PUD_SIZE) \ + if (_sz >= P4D_SIZE) \ + tlb_flush_p4d_range(tlb, address, _sz); \ + else if (_sz >= PUD_SIZE) \ tlb_flush_pud_range(tlb, address, _sz); \ + else if (_sz >= PMD_SIZE) \ + tlb_flush_pmd_range(tlb, address, _sz); \ + else \ + tlb_flush_pte_range(tlb, address, _sz); \ __tlb_remove_tlb_entry(tlb, ptep, address); \ } while (0)
From: Andy Chiu andy.chiu@sifive.com
[ Upstream commit d1c4f93e3f0a023024a6f022a61528c06cf1daa9 ]
The call to axienet_mdio_setup should not depend on whether "phy-node" pressents on the DT. Besides, since `lp->phy_node` is used if PHY is in SGMII or 100Base-X modes, move it into the if statement. And the next patch will remove `lp->phy_node` from driver's private structure and do an of_node_put on it right away after use since it is not used elsewhere.
Signed-off-by: Andy Chiu andy.chiu@sifive.com Reviewed-by: Greentime Hu greentime.hu@sifive.com Reviewed-by: Robert Hancock robert.hancock@calian.com Reviewed-by: Radhey Shyam Pandey radhey.shyam.pandey@xilinx.com Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c index bbdcba88c021..3d91baf2e55a 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c @@ -2060,15 +2060,14 @@ static int axienet_probe(struct platform_device *pdev) if (ret) goto cleanup_clk;
- lp->phy_node = of_parse_phandle(pdev->dev.of_node, "phy-handle", 0); - if (lp->phy_node) { - ret = axienet_mdio_setup(lp); - if (ret) - dev_warn(&pdev->dev, - "error registering MDIO bus: %d\n", ret); - } + ret = axienet_mdio_setup(lp); + if (ret) + dev_warn(&pdev->dev, + "error registering MDIO bus: %d\n", ret); + if (lp->phy_mode == PHY_INTERFACE_MODE_SGMII || lp->phy_mode == PHY_INTERFACE_MODE_1000BASEX) { + lp->phy_node = of_parse_phandle(pdev->dev.of_node, "phy-handle", 0); if (!lp->phy_node) { dev_err(&pdev->dev, "phy-handle required for 1000BaseX/SGMII\n"); ret = -EINVAL;
From: Marcin Kozlowski marcinguy@gmail.com
[ Upstream commit afb8e246527536848b9b4025b40e613edf776a9d ]
aqc111_rx_fixup() contains several out-of-bounds accesses that can be triggered by a malicious (or defective) USB device, in particular:
- The metadata array (desc_offset..desc_offset+2*pkt_count) can be out of bounds, causing OOB reads and (on big-endian systems) OOB endianness flips. - A packet can overlap the metadata array, causing a later OOB endianness flip to corrupt data used by a cloned SKB that has already been handed off into the network stack. - A packet SKB can be constructed whose tail is far beyond its end, causing out-of-bounds heap data to be considered part of the SKB's data.
Found doing variant analysis. Tested it with another driver (ax88179_178a), since I don't have a aqc111 device to test it, but the code looks very similar.
Signed-off-by: Marcin Kozlowski marcinguy@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/aqc111.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c index 0717c18015c9..c9c409518174 100644 --- a/drivers/net/usb/aqc111.c +++ b/drivers/net/usb/aqc111.c @@ -1102,10 +1102,15 @@ static int aqc111_rx_fixup(struct usbnet *dev, struct sk_buff *skb) if (start_of_descs != desc_offset) goto err;
- /* self check desc_offset from header*/ - if (desc_offset >= skb_len) + /* self check desc_offset from header and make sure that the + * bounds of the metadata array are inside the SKB + */ + if (pkt_count * 2 + desc_offset >= skb_len) goto err;
+ /* Packets must not overlap the metadata array */ + skb_trim(skb, desc_offset); + if (pkt_count == 0) goto err;
From: Xiaomeng Tong xiam0nd.tong@gmail.com
[ Upstream commit b423e54ba965b4469b48e46fd16941f1e1701697 ]
All remaining skbs should be released when myri10ge_xmit fails to transmit a packet. Fix it within another skb_list_walk_safe.
Signed-off-by: Xiaomeng Tong xiam0nd.tong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c index fc99ad8e4a38..1664e9184c9c 100644 --- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c +++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c @@ -2895,11 +2895,9 @@ static netdev_tx_t myri10ge_sw_tso(struct sk_buff *skb, status = myri10ge_xmit(curr, dev); if (status != 0) { dev_kfree_skb_any(curr); - if (segs != NULL) { - curr = segs; - segs = next; + skb_list_walk_safe(next, curr, next) { curr->next = NULL; - dev_kfree_skb_any(segs); + dev_kfree_skb_any(curr); } goto drop; }
From: Martin Leung Martin.Leung@amd.com
[ Upstream commit b2075fce104b88b789c15ef1ed2b91dc94198e26 ]
why and how: causes failure on install on certain machines
Reviewed-by: George Shen George.Shen@amd.com Acked-by: Alex Hung alex.hung@amd.com Signed-off-by: Martin Leung Martin.Leung@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/core/dc.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index ac5323596c65..93f5229c303e 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -1173,10 +1173,6 @@ bool dc_validate_seamless_boot_timing(const struct dc *dc, if (!link->link_enc->funcs->is_dig_enabled(link->link_enc)) return false;
- /* Check for FEC status*/ - if (link->link_enc->funcs->fec_is_active(link->link_enc)) - return false; - enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc);
if (enc_inst == ENGINE_ID_UNKNOWN)
From: Roman Li Roman.Li@amd.com
[ Upstream commit f4346fb3edf7720db3f7f5e1cab1f667cd024280 ]
[Why] On resume we do link detection for all non-MST connectors. MST is handled separately. However the condition for telling if connector is on mst branch is not enough for mst hub case. Link detection for mst branch link leads to mst topology reset. That causes assert in dc_link_allocate_mst_payload()
[How] Use link type as indicator for mst link.
Reviewed-by: Wayne Lin Wayne.Lin@amd.com Acked-by: Alex Hung alex.hung@amd.com Signed-off-by: Roman Li Roman.Li@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index e828f9414ba2..7bb151283f44 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -2022,7 +2022,8 @@ static int dm_resume(void *handle) * this is the case when traversing through already created * MST connectors, should be skipped */ - if (aconnector->mst_port) + if (aconnector->dc_link && + aconnector->dc_link->type == dc_connection_mst_branch) continue;
mutex_lock(&aconnector->hpd_lock);
From: Alexey Galakhov agalakhov@gmail.com
[ Upstream commit 5f2bce1e222028dc1c15f130109a17aa654ae6e8 ]
The HighPoint RocketRaid 2640 is a low-cost SAS controller based on Marvell chip. The chip in question was already supported by the kernel, just the PCI ID of this particular board was missing.
Link: https://lore.kernel.org/r/20220309212535.402987-1-agalakhov@gmail.com Signed-off-by: Alexey Galakhov agalakhov@gmail.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/mvsas/mv_init.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c index 0cfea7b2ab13..85ca8421fb86 100644 --- a/drivers/scsi/mvsas/mv_init.c +++ b/drivers/scsi/mvsas/mv_init.c @@ -646,6 +646,7 @@ static struct pci_device_id mvs_pci_table[] = { { PCI_VDEVICE(ARECA, PCI_DEVICE_ID_ARECA_1300), chip_1300 }, { PCI_VDEVICE(ARECA, PCI_DEVICE_ID_ARECA_1320), chip_1320 }, { PCI_VDEVICE(ADAPTEC2, 0x0450), chip_6440 }, + { PCI_VDEVICE(TTI, 0x2640), chip_6440 }, { PCI_VDEVICE(TTI, 0x2710), chip_9480 }, { PCI_VDEVICE(TTI, 0x2720), chip_9480 }, { PCI_VDEVICE(TTI, 0x2721), chip_9480 },
From: Chandrakanth patil chandrakanth.patil@broadcom.com
[ Upstream commit 56495f295d8e021f77d065b890fc0100e3f9f6d8 ]
The megaraid_sas driver supports single LUN for RAID devices. That is LUN 0. All other LUNs are unsupported. When a device scan on a logical target with invalid LUN number is invoked through sysfs, that target ends up getting removed.
Add LUN ID validation in the slave destroy function to avoid the target deletion.
Link: https://lore.kernel.org/r/20220324094711.48833-1-chandrakanth.patil@broadcom... Signed-off-by: Chandrakanth patil chandrakanth.patil@broadcom.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/megaraid/megaraid_sas.h | 3 +++ drivers/scsi/megaraid/megaraid_sas_base.c | 7 +++++++ 2 files changed, 10 insertions(+)
diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h index 6b8ec57e8bdf..c088a848776e 100644 --- a/drivers/scsi/megaraid/megaraid_sas.h +++ b/drivers/scsi/megaraid/megaraid_sas.h @@ -2554,6 +2554,9 @@ struct megasas_instance_template { #define MEGASAS_IS_LOGICAL(sdev) \ ((sdev->channel < MEGASAS_MAX_PD_CHANNELS) ? 0 : 1)
+#define MEGASAS_IS_LUN_VALID(sdev) \ + (((sdev)->lun == 0) ? 1 : 0) + #define MEGASAS_DEV_INDEX(scp) \ (((scp->device->channel % 2) * MEGASAS_MAX_DEV_PER_CHANNEL) + \ scp->device->id) diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c index 1a70cc995c28..84a2e9292fd0 100644 --- a/drivers/scsi/megaraid/megaraid_sas_base.c +++ b/drivers/scsi/megaraid/megaraid_sas_base.c @@ -2111,6 +2111,9 @@ static int megasas_slave_alloc(struct scsi_device *sdev) goto scan_target; } return -ENXIO; + } else if (!MEGASAS_IS_LUN_VALID(sdev)) { + sdev_printk(KERN_INFO, sdev, "%s: invalid LUN\n", __func__); + return -ENXIO; }
scan_target: @@ -2141,6 +2144,10 @@ static void megasas_slave_destroy(struct scsi_device *sdev) instance = megasas_lookup_instance(sdev->host->host_no);
if (MEGASAS_IS_LOGICAL(sdev)) { + if (!MEGASAS_IS_LUN_VALID(sdev)) { + sdev_printk(KERN_INFO, sdev, "%s: invalid LUN\n", __func__); + return; + } ld_tgt_id = MEGASAS_TARGET_ID(sdev); instance->ld_tgtid_status[ld_tgt_id] = LD_TARGET_ID_DELETED; if (megasas_dbg_lvl & LD_PD_DEBUG)
From: Duoming Zhou duoming@zju.edu.cn
[ Upstream commit ec4eb8a86ade4d22633e1da2a7d85a846b7d1798 ]
When a slip driver is detaching, the slip_close() will act to cleanup necessary resources and sl->tty is set to NULL in slip_close(). Meanwhile, the packet we transmit is blocked, sl_tx_timeout() will be called. Although slip_close() and sl_tx_timeout() use sl->lock to synchronize, we don`t judge whether sl->tty equals to NULL in sl_tx_timeout() and the null pointer dereference bug will happen.
(Thread 1) | (Thread 2) | slip_close() | spin_lock_bh(&sl->lock) | ... ... | sl->tty = NULL //(1) sl_tx_timeout() | spin_unlock_bh(&sl->lock) spin_lock(&sl->lock); | ... | ... tty_chars_in_buffer(sl->tty)| if (tty->ops->..) //(2) | ... | synchronize_rcu()
We set NULL to sl->tty in position (1) and dereference sl->tty in position (2).
This patch adds check in sl_tx_timeout(). If sl->tty equals to NULL, sl_tx_timeout() will goto out.
Signed-off-by: Duoming Zhou duoming@zju.edu.cn Reviewed-by: Jiri Slaby jirislaby@kernel.org Link: https://lore.kernel.org/r/20220405132206.55291-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/slip/slip.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c index f81fb0b13a94..369bd30fed35 100644 --- a/drivers/net/slip/slip.c +++ b/drivers/net/slip/slip.c @@ -468,7 +468,7 @@ static void sl_tx_timeout(struct net_device *dev, unsigned int txqueue) spin_lock(&sl->lock);
if (netif_queue_stopped(dev)) { - if (!netif_running(dev)) + if (!netif_running(dev) || !sl->tty) goto out;
/* May be we must check transmitter timeout here ?
From: Borislav Petkov bp@suse.de
[ Upstream commit d02b4dd84e1a90f7f1444d027c0289bf355b0d5a ]
Fix:
In file included from <command-line>:0:0: In function ‘ddr_perf_counter_enable’, inlined from ‘ddr_perf_irq_handler’ at drivers/perf/fsl_imx8_ddr_perf.c:651:2: ././include/linux/compiler_types.h:352:38: error: call to ‘__compiletime_assert_729’ \ declared with attribute error: FIELD_PREP: mask is not constant _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ...
See https://lore.kernel.org/r/YkwQ6%2BtIH8GQpuct@zn.tnic for the gory details as to why it triggers with older gccs only.
Signed-off-by: Borislav Petkov bp@suse.de Cc: Frank Li Frank.li@nxp.com Cc: Will Deacon will@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Shawn Guo shawnguo@kernel.org Cc: Sascha Hauer s.hauer@pengutronix.de Cc: Pengutronix Kernel Team kernel@pengutronix.de Cc: Fabio Estevam festevam@gmail.com Cc: NXP Linux Team linux-imx@nxp.com Cc: linux-arm-kernel@lists.infradead.org Acked-by: Will Deacon will@kernel.org Link: https://lore.kernel.org/r/20220405151517.29753-10-bp@alien8.de Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/perf/fsl_imx8_ddr_perf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c index 7f7bc0993670..e09bbf3890c4 100644 --- a/drivers/perf/fsl_imx8_ddr_perf.c +++ b/drivers/perf/fsl_imx8_ddr_perf.c @@ -29,7 +29,7 @@ #define CNTL_OVER_MASK 0xFFFFFFFE
#define CNTL_CSV_SHIFT 24 -#define CNTL_CSV_MASK (0xFF << CNTL_CSV_SHIFT) +#define CNTL_CSV_MASK (0xFFU << CNTL_CSV_SHIFT)
#define EVENT_CYCLES_ID 0 #define EVENT_CYCLES_COUNTER 0
From: Juergen Gross jgross@suse.com
commit e553f62f10d93551eb883eca227ac54d1a4fad84 upstream.
Since commit 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator") only zones with free memory are included in a built zonelist. This is problematic when e.g. all memory of a zone has been ballooned out when zonelists are being rebuilt.
The decision whether to rebuild the zonelists when onlining new memory is done based on populated_zone() returning 0 for the zone the memory will be added to. The new zone is added to the zonelists only, if it has free memory pages (managed_zone() returns a non-zero value) after the memory has been onlined. This implies, that onlining memory will always free the added pages to the allocator immediately, but this is not true in all cases: when e.g. running as a Xen guest the onlined new memory will be added only to the ballooned memory list, it will be freed only when the guest is being ballooned up afterwards.
Another problem with using managed_zone() for the decision whether a zone is being added to the zonelists is, that a zone with all memory used will in fact be removed from all zonelists in case the zonelists happen to be rebuilt.
Use populated_zone() when building a zonelist as it has been done before that commit.
There was a report that QubesOS (based on Xen) is hitting this problem. Xen has switched to use the zone device functionality in kernel 5.9 and QubesOS wants to use memory hotplugging for guests in order to be able to start a guest with minimal memory and expand it as needed. This was the report leading to the patch.
Link: https://lkml.kernel.org/r/20220407120637.9035-1-jgross@suse.com Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator") Signed-off-by: Juergen Gross jgross@suse.com Reported-by: Marek Marczykowski-Górecki marmarek@invisiblethingslab.com Acked-by: Michal Hocko mhocko@suse.com Acked-by: David Hildenbrand david@redhat.com Cc: Marek Marczykowski-Górecki marmarek@invisiblethingslab.com Reviewed-by: Wei Yang richard.weiyang@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/page_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5653,7 +5653,7 @@ static int build_zonerefs_node(pg_data_t do { zone_type--; zone = pgdat->node_zones + zone_type; - if (managed_zone(zone)) { + if (populated_zone(zone)) { zoneref_set_zone(zone, &zonerefs[nr_zones++]); check_highest_zone(zone_type); }
From: Minchan Kim minchan@kernel.org
commit e914d8f00391520ecc4495dd0ca0124538ab7119 upstream.
Two processes under CLONE_VM cloning, user process can be corrupted by seeing zeroed page unexpectedly.
CPU A CPU B
do_swap_page do_swap_page SWP_SYNCHRONOUS_IO path SWP_SYNCHRONOUS_IO path swap_readpage valid data swap_slot_free_notify delete zram entry swap_readpage zeroed(invalid) data pte_lock map the *zero data* to userspace pte_unlock pte_lock if (!pte_same) goto out_nomap; pte_unlock return and next refault will read zeroed data
The swap_slot_free_notify is bogus for CLONE_VM case since it doesn't increase the refcount of swap slot at copy_mm so it couldn't catch up whether it's safe or not to discard data from backing device. In the case, only the lock it could rely on to synchronize swap slot freeing is page table lock. Thus, this patch gets rid of the swap_slot_free_notify function. With this patch, CPU A will see correct data.
CPU A CPU B
do_swap_page do_swap_page SWP_SYNCHRONOUS_IO path SWP_SYNCHRONOUS_IO path swap_readpage original data pte_lock map the original data swap_free swap_range_free bd_disk->fops->swap_slot_free_notify swap_readpage read zeroed data pte_unlock pte_lock if (!pte_same) goto out_nomap; pte_unlock return on next refault will see mapped data by CPU B
The concern of the patch would increase memory consumption since it could keep wasted memory with compressed form in zram as well as uncompressed form in address space. However, most of cases of zram uses no readahead and do_swap_page is followed by swap_free so it will free the compressed form from in zram quickly.
Link: https://lkml.kernel.org/r/YjTVVxIAsnKAXjTd@google.com Fixes: 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of synchronous device") Reported-by: Ivan Babrou ivan@cloudflare.com Tested-by: Ivan Babrou ivan@cloudflare.com Signed-off-by: Minchan Kim minchan@kernel.org Cc: Nitin Gupta ngupta@vflare.org Cc: Sergey Senozhatsky senozhatsky@chromium.org Cc: Jens Axboe axboe@kernel.dk Cc: David Hildenbrand david@redhat.com Cc: stable@vger.kernel.org [4.14+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/page_io.c | 54 ------------------------------------------------------ 1 file changed, 54 deletions(-)
--- a/mm/page_io.c +++ b/mm/page_io.c @@ -69,54 +69,6 @@ void end_swap_bio_write(struct bio *bio) bio_put(bio); }
-static void swap_slot_free_notify(struct page *page) -{ - struct swap_info_struct *sis; - struct gendisk *disk; - swp_entry_t entry; - - /* - * There is no guarantee that the page is in swap cache - the software - * suspend code (at least) uses end_swap_bio_read() against a non- - * swapcache page. So we must check PG_swapcache before proceeding with - * this optimization. - */ - if (unlikely(!PageSwapCache(page))) - return; - - sis = page_swap_info(page); - if (data_race(!(sis->flags & SWP_BLKDEV))) - return; - - /* - * The swap subsystem performs lazy swap slot freeing, - * expecting that the page will be swapped out again. - * So we can avoid an unnecessary write if the page - * isn't redirtied. - * This is good for real swap storage because we can - * reduce unnecessary I/O and enhance wear-leveling - * if an SSD is used as the as swap device. - * But if in-memory swap device (eg zram) is used, - * this causes a duplicated copy between uncompressed - * data in VM-owned memory and compressed data in - * zram-owned memory. So let's free zram-owned memory - * and make the VM-owned decompressed page *dirty*, - * so the page should be swapped out somewhere again if - * we again wish to reclaim it. - */ - disk = sis->bdev->bd_disk; - entry.val = page_private(page); - if (disk->fops->swap_slot_free_notify && __swap_count(entry) == 1) { - unsigned long offset; - - offset = swp_offset(entry); - - SetPageDirty(page); - disk->fops->swap_slot_free_notify(sis->bdev, - offset); - } -} - static void end_swap_bio_read(struct bio *bio) { struct page *page = bio_first_page_all(bio); @@ -132,7 +84,6 @@ static void end_swap_bio_read(struct bio }
SetPageUptodate(page); - swap_slot_free_notify(page); out: unlock_page(page); WRITE_ONCE(bio->bi_private, NULL); @@ -409,11 +360,6 @@ int swap_readpage(struct page *page, boo if (sis->flags & SWP_SYNCHRONOUS_IO) { ret = bdev_read_page(sis->bdev, swap_page_sector(page), page); if (!ret) { - if (trylock_page(page)) { - swap_slot_free_notify(page); - unlock_page(page); - } - count_vm_event(PSWPIN); goto out; }
From: Patrick Wang patrick.wang.shcn@gmail.com
commit 23c2d497de21f25898fbea70aeb292ab8acc8c94 upstream.
The kmemleak_*_phys() apis do not check the address for lowmem's min boundary, while the caller may pass an address below lowmem, which will trigger an oops:
# echo scan > /sys/kernel/debug/kmemleak Unable to handle kernel paging request at virtual address ff5fffffffe00000 Oops [#1] Modules linked in: CPU: 2 PID: 134 Comm: bash Not tainted 5.18.0-rc1-next-20220407 #33 Hardware name: riscv-virtio,qemu (DT) epc : scan_block+0x74/0x15c ra : scan_block+0x72/0x15c epc : ffffffff801e5806 ra : ffffffff801e5804 sp : ff200000104abc30 gp : ffffffff815cd4e8 tp : ff60000004cfa340 t0 : 0000000000000200 t1 : 00aaaaaac23954cc t2 : 00000000000003ff s0 : ff200000104abc90 s1 : ffffffff81b0ff28 a0 : 0000000000000000 a1 : ff5fffffffe01000 a2 : ffffffff81b0ff28 a3 : 0000000000000002 a4 : 0000000000000001 a5 : 0000000000000000 a6 : ff200000104abd7c a7 : 0000000000000005 s2 : ff5fffffffe00ff9 s3 : ffffffff815cd998 s4 : ffffffff815d0e90 s5 : ffffffff81b0ff28 s6 : 0000000000000020 s7 : ffffffff815d0eb0 s8 : ffffffffffffffff s9 : ff5fffffffe00000 s10: ff5fffffffe01000 s11: 0000000000000022 t3 : 00ffffffaa17db4c t4 : 000000000000000f t5 : 0000000000000001 t6 : 0000000000000000 status: 0000000000000100 badaddr: ff5fffffffe00000 cause: 000000000000000d scan_gray_list+0x12e/0x1a6 kmemleak_scan+0x2aa/0x57e kmemleak_write+0x32a/0x40c full_proxy_write+0x56/0x82 vfs_write+0xa6/0x2a6 ksys_write+0x6c/0xe2 sys_write+0x22/0x2a ret_from_syscall+0x0/0x2
The callers may not quite know the actual address they pass(e.g. from devicetree). So the kmemleak_*_phys() apis should guarantee the address they finally use is in lowmem range, so check the address for lowmem's min boundary.
Link: https://lkml.kernel.org/r/20220413122925.33856-1-patrick.wang.shcn@gmail.com Signed-off-by: Patrick Wang patrick.wang.shcn@gmail.com Acked-by: Catalin Marinas catalin.marinas@arm.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/kmemleak.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -1123,7 +1123,7 @@ EXPORT_SYMBOL(kmemleak_no_scan); void __ref kmemleak_alloc_phys(phys_addr_t phys, size_t size, int min_count, gfp_t gfp) { - if (!IS_ENABLED(CONFIG_HIGHMEM) || PHYS_PFN(phys) < max_low_pfn) + if (PHYS_PFN(phys) >= min_low_pfn && PHYS_PFN(phys) < max_low_pfn) kmemleak_alloc(__va(phys), size, min_count, gfp); } EXPORT_SYMBOL(kmemleak_alloc_phys); @@ -1137,7 +1137,7 @@ EXPORT_SYMBOL(kmemleak_alloc_phys); */ void __ref kmemleak_free_part_phys(phys_addr_t phys, size_t size) { - if (!IS_ENABLED(CONFIG_HIGHMEM) || PHYS_PFN(phys) < max_low_pfn) + if (PHYS_PFN(phys) >= min_low_pfn && PHYS_PFN(phys) < max_low_pfn) kmemleak_free_part(__va(phys), size); } EXPORT_SYMBOL(kmemleak_free_part_phys); @@ -1149,7 +1149,7 @@ EXPORT_SYMBOL(kmemleak_free_part_phys); */ void __ref kmemleak_not_leak_phys(phys_addr_t phys) { - if (!IS_ENABLED(CONFIG_HIGHMEM) || PHYS_PFN(phys) < max_low_pfn) + if (PHYS_PFN(phys) >= min_low_pfn && PHYS_PFN(phys) < max_low_pfn) kmemleak_not_leak(__va(phys)); } EXPORT_SYMBOL(kmemleak_not_leak_phys); @@ -1161,7 +1161,7 @@ EXPORT_SYMBOL(kmemleak_not_leak_phys); */ void __ref kmemleak_ignore_phys(phys_addr_t phys) { - if (!IS_ENABLED(CONFIG_HIGHMEM) || PHYS_PFN(phys) < max_low_pfn) + if (PHYS_PFN(phys) >= min_low_pfn && PHYS_PFN(phys) < max_low_pfn) kmemleak_ignore(__va(phys)); } EXPORT_SYMBOL(kmemleak_ignore_phys);
From: Sean Christopherson seanjc@google.com
commit 1d0e84806047f38027d7572adb4702ef7c09b317 upstream.
Resolve nx_huge_pages to true/false when kvm.ko is loaded, leaving it as -1 is technically undefined behavior when its value is read out by param_get_bool(), as boolean values are supposed to be '0' or '1'.
Alternatively, KVM could define a custom getter for the param, but the auto value doesn't depend on the vendor module in any way, and printing "auto" would be unnecessarily unfriendly to the user.
In addition to fixing the undefined behavior, resolving the auto value also fixes the scenario where the auto value resolves to N and no vendor module is loaded. Previously, -1 would result in Y being printed even though KVM would ultimately disable the mitigation.
Rename the existing MMU module init/exit helpers to clarify that they're invoked with respect to the vendor module, and add comments to document why KVM has two separate "module init" flows.
========================================================================= UBSAN: invalid-load in kernel/params.c:320:33 load of value 255 is not a valid value for type '_Bool' CPU: 6 PID: 892 Comm: tail Not tainted 5.17.0-rc3+ #799 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 ubsan_epilogue+0x5/0x40 __ubsan_handle_load_invalid_value.cold+0x43/0x48 param_get_bool.cold+0xf/0x14 param_attr_show+0x55/0x80 module_attr_show+0x1c/0x30 sysfs_kf_seq_show+0x93/0xc0 seq_read_iter+0x11c/0x450 new_sync_read+0x11b/0x1a0 vfs_read+0xf0/0x190 ksys_read+0x5f/0xe0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> =========================================================================
Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation") Cc: stable@vger.kernel.org Reported-by: Bruno Goncalves bgoncalv@redhat.com Reported-by: Jan Stancek jstancek@redhat.com Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20220331221359.3912754-1-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/kvm_host.h | 5 +++-- arch/x86/kvm/mmu/mmu.c | 20 ++++++++++++++++---- arch/x86/kvm/x86.c | 20 ++++++++++++++++++-- 3 files changed, 37 insertions(+), 8 deletions(-)
--- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1340,8 +1340,9 @@ static inline int kvm_arch_flush_remote_ return -ENOTSUPP; }
-int kvm_mmu_module_init(void); -void kvm_mmu_module_exit(void); +void kvm_mmu_x86_module_init(void); +int kvm_mmu_vendor_module_init(void); +void kvm_mmu_vendor_module_exit(void);
void kvm_mmu_destroy(struct kvm_vcpu *vcpu); int kvm_mmu_create(struct kvm_vcpu *vcpu); --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5876,12 +5876,24 @@ static int set_nx_huge_pages(const char return 0; }
-int kvm_mmu_module_init(void) +/* + * nx_huge_pages needs to be resolved to true/false when kvm.ko is loaded, as + * its default value of -1 is technically undefined behavior for a boolean. + */ +void kvm_mmu_x86_module_init(void) { - int ret = -ENOMEM; - if (nx_huge_pages == -1) __set_nx_huge_pages(get_nx_auto_mode()); +} + +/* + * The bulk of the MMU initialization is deferred until the vendor module is + * loaded as many of the masks/values may be modified by VMX or SVM, i.e. need + * to be reset when a potentially different vendor module is loaded. + */ +int kvm_mmu_vendor_module_init(void) +{ + int ret = -ENOMEM;
/* * MMU roles use union aliasing which is, generally speaking, an @@ -5955,7 +5967,7 @@ void kvm_mmu_destroy(struct kvm_vcpu *vc mmu_free_memory_caches(vcpu); }
-void kvm_mmu_module_exit(void) +void kvm_mmu_vendor_module_exit(void) { mmu_destroy_caches(); percpu_counter_destroy(&kvm_total_used_mmu_pages); --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8005,7 +8005,7 @@ int kvm_arch_init(void *opaque) goto out_free_x86_emulator_cache; }
- r = kvm_mmu_module_init(); + r = kvm_mmu_vendor_module_init(); if (r) goto out_free_percpu;
@@ -8065,7 +8065,7 @@ void kvm_arch_exit(void) cancel_work_sync(&pvclock_gtod_work); #endif kvm_x86_ops.hardware_enable = NULL; - kvm_mmu_module_exit(); + kvm_mmu_vendor_module_exit(); free_percpu(user_return_msrs); kmem_cache_destroy(x86_emulator_cache); kmem_cache_destroy(x86_fpu_cache); @@ -11426,3 +11426,19 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_un EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_ga_log); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_update_request); + +static int __init kvm_x86_init(void) +{ + kvm_mmu_x86_module_init(); + return 0; +} +module_init(kvm_x86_init); + +static void __exit kvm_x86_exit(void) +{ + /* + * If module_init() is implemented, module_exit() must also be + * implemented to allow module unload. + */ +} +module_exit(kvm_x86_exit);
From: Oliver Upton oupton@google.com
commit a44a4cc1c969afec97dbb2aedaf6f38eaa6253bb upstream.
Unfortunately, there is no guarantee that KVM was able to instantiate a debugfs directory for a particular VM. To that end, KVM shouldn't even attempt to create new debugfs files in this case. If the specified parent dentry is NULL, debugfs_create_file() will instantiate files at the root of debugfs.
For arm64, it is possible to create the vgic-state file outside of a VM directory, the file is not cleaned up when a VM is destroyed. Nonetheless, the corresponding struct kvm is freed when the VM is destroyed.
Nip the problem in the bud for all possible errant debugfs file creations by initializing kvm->debugfs_dentry to -ENOENT. In so doing, debugfs_create_file() will fail instead of creating the file in the root directory.
Cc: stable@kernel.org Fixes: 929f45e32499 ("kvm: no need to check return value of debugfs_create functions") Signed-off-by: Oliver Upton oupton@google.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20220406235615.1447180-2-oupton@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- virt/kvm/kvm_main.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -673,7 +673,7 @@ static void kvm_destroy_vm_debugfs(struc { int i;
- if (!kvm->debugfs_dentry) + if (IS_ERR(kvm->debugfs_dentry)) return;
debugfs_remove_recursive(kvm->debugfs_dentry); @@ -693,6 +693,12 @@ static int kvm_create_vm_debugfs(struct struct kvm_stat_data *stat_data; struct kvm_stats_debugfs_item *p;
+ /* + * Force subsequent debugfs file creations to fail if the VM directory + * is not created. + */ + kvm->debugfs_dentry = ERR_PTR(-ENOENT); + if (!debugfs_initialized()) return 0;
@@ -4731,7 +4737,7 @@ static void kvm_uevent_notify_change(uns } add_uevent_var(env, "PID=%d", kvm->userspace_pid);
- if (kvm->debugfs_dentry) { + if (!IS_ERR(kvm->debugfs_dentry)) { char *tmp, *p = kmalloc(PATH_MAX, GFP_KERNEL_ACCOUNT);
if (p) {
From: Johan Hovold johan@kernel.org
commit b452dbf24d7d9a990d70118462925f6ee287d135 upstream.
Make sure to free the flash platform device in the event that registration fails during probe.
Fixes: ca7d8b980b67 ("memory: add Renesas RPC-IF driver") Cc: stable@vger.kernel.org # 5.8 Cc: Sergei Shtylyov sergei.shtylyov@cogentembedded.com Signed-off-by: Johan Hovold johan@kernel.org Link: https://lore.kernel.org/r/20220303180632.3194-1-johan@kernel.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/memory/renesas-rpc-if.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/memory/renesas-rpc-if.c +++ b/drivers/memory/renesas-rpc-if.c @@ -592,6 +592,7 @@ static int rpcif_probe(struct platform_d struct platform_device *vdev; struct device_node *flash; const char *name; + int ret;
flash = of_get_next_child(pdev->dev.of_node, NULL); if (!flash) { @@ -615,7 +616,14 @@ static int rpcif_probe(struct platform_d return -ENOMEM; vdev->dev.parent = &pdev->dev; platform_set_drvdata(pdev, vdev); - return platform_device_add(vdev); + + ret = platform_device_add(vdev); + if (ret) { + platform_device_put(vdev); + return ret; + } + + return 0; }
static int rpcif_remove(struct platform_device *pdev)
From: Jason A. Donenfeld Jason@zx2c4.com
commit c40160f2998c897231f8454bf797558d30a20375 upstream.
While the latent entropy plugin mostly doesn't derive entropy from get_random_const() for measuring the call graph, when __latent_entropy is applied to a constant, then it's initialized statically to output from get_random_const(). In that case, this data is derived from a 64-bit seed, which means a buffer of 512 bits doesn't really have that amount of compile-time entropy.
This patch fixes that shortcoming by just buffering chunks of /dev/urandom output and doling it out as requested.
At the same time, it's important that we don't break the use of -frandom-seed, for people who want the runtime benefits of the latent entropy plugin, while still having compile-time determinism. In that case, we detect whether gcc's set_random_seed() has been called by making a call to get_random_seed(noinit=true) in the plugin init function, which is called after set_random_seed() is called but before anything that calls get_random_seed(noinit=false), and seeing if it's zero or not. If it's not zero, we're in deterministic mode, and so we just generate numbers with a basic xorshift prng.
Note that we don't detect if -frandom-seed is being used using the documented local_tick variable, because it's assigned via: local_tick = (unsigned) tv.tv_sec * 1000 + tv.tv_usec / 1000; which may well overflow and become -1 on its own, and so isn't reliable: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105171
[kees: The 256 byte rnd_buf size was chosen based on average (250), median (64), and std deviation (575) bytes of used entropy for a defconfig x86_64 build]
Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin") Cc: stable@vger.kernel.org Cc: PaX Team pageexec@freemail.hu Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/20220405222815.21155-1-Jason@zx2c4.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- scripts/gcc-plugins/latent_entropy_plugin.c | 44 +++++++++++++++++----------- 1 file changed, 27 insertions(+), 17 deletions(-)
--- a/scripts/gcc-plugins/latent_entropy_plugin.c +++ b/scripts/gcc-plugins/latent_entropy_plugin.c @@ -86,25 +86,31 @@ static struct plugin_info latent_entropy .help = "disable\tturn off latent entropy instrumentation\n", };
-static unsigned HOST_WIDE_INT seed; -/* - * get_random_seed() (this is a GCC function) generates the seed. - * This is a simple random generator without any cryptographic security because - * the entropy doesn't come from here. - */ +static unsigned HOST_WIDE_INT deterministic_seed; +static unsigned HOST_WIDE_INT rnd_buf[32]; +static size_t rnd_idx = ARRAY_SIZE(rnd_buf); +static int urandom_fd = -1; + static unsigned HOST_WIDE_INT get_random_const(void) { - unsigned int i; - unsigned HOST_WIDE_INT ret = 0; - - for (i = 0; i < 8 * sizeof(ret); i++) { - ret = (ret << 1) | (seed & 1); - seed >>= 1; - if (ret & 1) - seed ^= 0xD800000000000000ULL; + if (deterministic_seed) { + unsigned HOST_WIDE_INT w = deterministic_seed; + w ^= w << 13; + w ^= w >> 7; + w ^= w << 17; + deterministic_seed = w; + return deterministic_seed; }
- return ret; + if (urandom_fd < 0) { + urandom_fd = open("/dev/urandom", O_RDONLY); + gcc_assert(urandom_fd >= 0); + } + if (rnd_idx >= ARRAY_SIZE(rnd_buf)) { + gcc_assert(read(urandom_fd, rnd_buf, sizeof(rnd_buf)) == sizeof(rnd_buf)); + rnd_idx = 0; + } + return rnd_buf[rnd_idx++]; }
static tree tree_get_random_const(tree type) @@ -549,8 +555,6 @@ static void latent_entropy_start_unit(vo tree type, id; int quals;
- seed = get_random_seed(false); - if (in_lto_p) return;
@@ -585,6 +589,12 @@ __visible int plugin_init(struct plugin_ const struct plugin_argument * const argv = plugin_info->argv; int i;
+ /* + * Call get_random_seed() with noinit=true, so that this returns + * 0 in the case where no seed has been passed via -frandom-seed. + */ + deterministic_seed = get_random_seed(true); + static const struct ggc_root_tab gt_ggc_r_gt_latent_entropy[] = { { .base = &latent_entropy_decl,
From: Toke Høiland-Jørgensen toke@toke.dk
commit 037250f0a45cf9ecf5b52d4b9ff8eadeb609c800 upstream.
The ath9k driver was not properly clearing the status area in the ieee80211_tx_info struct before reporting TX status to mac80211. Instead, it was manually filling in fields, which meant that fields introduced later were left as-is.
Conveniently, mac80211 actually provides a helper to zero out the status area, so use that to make sure we zero everything.
The last commit touching the driver function writing the status information seems to have actually been fixing an issue that was also caused by the area being uninitialised; but it only added clearing of a single field instead of the whole struct. That is now redundant, though, so revert that commit and use it as a convenient Fixes tag.
Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status") Reported-by: Bagas Sanjaya bagasdotme@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Toke Høiland-Jørgensen toke@toke.dk Tested-by: Bagas Sanjaya bagasdotme@gmail.com Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://lore.kernel.org/r/20220330164409.16645-1-toke@toke.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/ath/ath9k/xmit.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/drivers/net/wireless/ath/ath9k/xmit.c +++ b/drivers/net/wireless/ath/ath9k/xmit.c @@ -2512,6 +2512,8 @@ static void ath_tx_rc_status(struct ath_ struct ath_hw *ah = sc->sc_ah; u8 i, tx_rateindex;
+ ieee80211_tx_info_clear_status(tx_info); + if (txok) tx_info->status.ack_signal = ts->ts_rssi;
@@ -2554,9 +2556,6 @@ static void ath_tx_rc_status(struct ath_ }
tx_info->status.rates[tx_rateindex].count = ts->ts_longretry + 1; - - /* we report airtime in ath_tx_count_airtime(), don't report twice */ - tx_info->status.tx_time = 0; }
static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)
From: Toke Høiland-Jørgensen toke@redhat.com
commit 5a6b06f5927c940fa44026695779c30b7536474c upstream.
The ieee80211_tx_info_clear_status() helper also clears the rate counts and the driver-private part of struct ieee80211_tx_info, so using it breaks quite a few other things. So back out of using it, and instead define a ath-internal helper that only clears the area between the status_driver_data and the rates info. Combined with moving the ath_frame_info struct to status_driver_data, this avoids clearing anything we shouldn't be, and so we can keep the existing code for handling the rate information.
While fixing this I also noticed that the setting of tx_info->status.rates[tx_rateindex].count on hardware underrun errors was always immediately overridden by the normal setting of the same fields, so rearrange the code so that the underrun detection actually takes effect.
The new helper could be generalised to a 'memset_between()' helper, but leave it as a driver-internal helper for now since this needs to go to stable.
Cc: stable@vger.kernel.org Reported-by: Peter Seiderer ps.report@gmx.net Fixes: 037250f0a45c ("ath9k: Properly clear TX status area before reporting to mac80211") Signed-off-by: Toke Høiland-Jørgensen toke@redhat.com Reviewed-by: Peter Seiderer ps.report@gmx.net Tested-by: Peter Seiderer ps.report@gmx.net Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://lore.kernel.org/r/20220404204800.2681133-1-toke@toke.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/ath/ath9k/main.c | 2 +- drivers/net/wireless/ath/ath9k/xmit.c | 30 ++++++++++++++++++++---------- 2 files changed, 21 insertions(+), 11 deletions(-)
--- a/drivers/net/wireless/ath/ath9k/main.c +++ b/drivers/net/wireless/ath/ath9k/main.c @@ -839,7 +839,7 @@ static bool ath9k_txq_list_has_key(struc continue;
txinfo = IEEE80211_SKB_CB(bf->bf_mpdu); - fi = (struct ath_frame_info *)&txinfo->rate_driver_data[0]; + fi = (struct ath_frame_info *)&txinfo->status.status_driver_data[0]; if (fi->keyix == keyix) return true; } --- a/drivers/net/wireless/ath/ath9k/xmit.c +++ b/drivers/net/wireless/ath/ath9k/xmit.c @@ -141,8 +141,8 @@ static struct ath_frame_info *get_frame_ { struct ieee80211_tx_info *tx_info = IEEE80211_SKB_CB(skb); BUILD_BUG_ON(sizeof(struct ath_frame_info) > - sizeof(tx_info->rate_driver_data)); - return (struct ath_frame_info *) &tx_info->rate_driver_data[0]; + sizeof(tx_info->status.status_driver_data)); + return (struct ath_frame_info *) &tx_info->status.status_driver_data[0]; }
static void ath_send_bar(struct ath_atx_tid *tid, u16 seqno) @@ -2501,6 +2501,16 @@ skip_tx_complete: spin_unlock_irqrestore(&sc->tx.txbuflock, flags); }
+static void ath_clear_tx_status(struct ieee80211_tx_info *tx_info) +{ + void *ptr = &tx_info->status; + + memset(ptr + sizeof(tx_info->status.rates), 0, + sizeof(tx_info->status) - + sizeof(tx_info->status.rates) - + sizeof(tx_info->status.status_driver_data)); +} + static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf, struct ath_tx_status *ts, int nframes, int nbad, int txok) @@ -2512,7 +2522,7 @@ static void ath_tx_rc_status(struct ath_ struct ath_hw *ah = sc->sc_ah; u8 i, tx_rateindex;
- ieee80211_tx_info_clear_status(tx_info); + ath_clear_tx_status(tx_info);
if (txok) tx_info->status.ack_signal = ts->ts_rssi; @@ -2528,6 +2538,13 @@ static void ath_tx_rc_status(struct ath_ tx_info->status.ampdu_len = nframes; tx_info->status.ampdu_ack_len = nframes - nbad;
+ tx_info->status.rates[tx_rateindex].count = ts->ts_longretry + 1; + + for (i = tx_rateindex + 1; i < hw->max_rates; i++) { + tx_info->status.rates[i].count = 0; + tx_info->status.rates[i].idx = -1; + } + if ((ts->ts_status & ATH9K_TXERR_FILT) == 0 && (tx_info->flags & IEEE80211_TX_CTL_NO_ACK) == 0) { /* @@ -2549,13 +2566,6 @@ static void ath_tx_rc_status(struct ath_ tx_info->status.rates[tx_rateindex].count = hw->max_rate_tries; } - - for (i = tx_rateindex + 1; i < hw->max_rates; i++) { - tx_info->status.rates[i].count = 0; - tx_info->status.rates[i].idx = -1; - } - - tx_info->status.rates[tx_rateindex].count = ts->ts_longretry + 1; }
static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)
From: Jia-Ju Bai baijiaju1990@gmail.com
commit 168a2f776b9762f4021421008512dd7ab7474df1 upstream.
In btrfs_get_root_ref(), when btrfs_insert_fs_root() fails, btrfs_put_root() can happen for two reasons:
- the root already exists in the tree, in that case it returns the reference obtained in btrfs_lookup_fs_root()
- another error so the cleanup is done in the fail label
Calling btrfs_put_root() unconditionally would lead to double decrement of the root reference possibly freeing it in the second case.
Reported-by: TOTE Robot oslab@tsinghua.edu.cn Fixes: bc44d7c4b2b1 ("btrfs: push btrfs_grab_fs_root into btrfs_get_fs_root") CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Jia-Ju Bai baijiaju1990@gmail.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/disk-io.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1596,9 +1596,10 @@ again:
ret = btrfs_insert_fs_root(fs_info, root); if (ret) { - btrfs_put_root(root); - if (ret == -EEXIST) + if (ret == -EEXIST) { + btrfs_put_root(root); goto again; + } goto fail; } return root;
From: Naohiro Aota naohiro.aota@wdc.com
commit a690e5f2db4d1dca742ce734aaff9f3112d63764 upstream.
When btrfs balance is interrupted with umount, the background balance resumes on the next mount. There is a potential deadlock with FS freezing here like as described in commit 26559780b953 ("btrfs: zoned: mark relocation as writing"). Mark the process as sb_writing to avoid it.
Reviewed-by: Filipe Manana fdmanana@suse.com CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Naohiro Aota naohiro.aota@wdc.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/volumes.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4220,10 +4220,12 @@ static int balance_kthread(void *data) struct btrfs_fs_info *fs_info = data; int ret = 0;
+ sb_start_write(fs_info->sb); mutex_lock(&fs_info->balance_mutex); if (fs_info->balance_ctl) ret = btrfs_balance(fs_info, fs_info->balance_ctl, NULL); mutex_unlock(&fs_info->balance_mutex); + sb_end_write(fs_info->sb);
return ret; }
From: Tim Crawford tcrawford@system76.com
commit 9eb6f5c388060d8cef3c8b616cc31b765e022359 upstream.
Fixes speaker output and headset detection on Clevo PD50PNT.
Signed-off-by: Tim Crawford tcrawford@system76.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220405182029.27431-1-tcrawford@system76.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -2626,6 +2626,7 @@ static const struct snd_pci_quirk alc882 SND_PCI_QUIRK(0x1558, 0x65e1, "Clevo PB51[ED][DF]", ALC1220_FIXUP_CLEVO_PB51ED_PINS), SND_PCI_QUIRK(0x1558, 0x65e5, "Clevo PC50D[PRS](?:-D|-G)?", ALC1220_FIXUP_CLEVO_PB51ED_PINS), SND_PCI_QUIRK(0x1558, 0x65f1, "Clevo PC50HS", ALC1220_FIXUP_CLEVO_PB51ED_PINS), + SND_PCI_QUIRK(0x1558, 0x65f5, "Clevo PD50PN[NRT]", ALC1220_FIXUP_CLEVO_PB51ED_PINS), SND_PCI_QUIRK(0x1558, 0x67d1, "Clevo PB71[ER][CDF]", ALC1220_FIXUP_CLEVO_PB51ED_PINS), SND_PCI_QUIRK(0x1558, 0x67e1, "Clevo PB71[DE][CDF]", ALC1220_FIXUP_CLEVO_PB51ED_PINS), SND_PCI_QUIRK(0x1558, 0x67e5, "Clevo PC70D[PRS](?:-D|-G)?", ALC1220_FIXUP_CLEVO_PB51ED_PINS),
From: Tao Jin tao-j@outlook.com
commit 264fb03497ec1c7841bba872571bcd11beed57a7 upstream.
For this specific device on Lenovo Thinkpad X12 tablet, the verbs were dumped by qemu running a guest OS that init this codec properly. After studying the dump, it turns out that the same quirk used by the other Lenovo devices can be reused.
The patch was tested working against the mainline kernel.
Cc: stable@vger.kernel.org Signed-off-by: Tao Jin tao-j@outlook.com Link: https://lore.kernel.org/r/CO6PR03MB6241CD73310B37858FE64C85E1E89@CO6PR03MB62... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -8995,6 +8995,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x17aa, 0x505d, "Thinkpad", ALC298_FIXUP_TPT470_DOCK), SND_PCI_QUIRK(0x17aa, 0x505f, "Thinkpad", ALC298_FIXUP_TPT470_DOCK), SND_PCI_QUIRK(0x17aa, 0x5062, "Thinkpad", ALC298_FIXUP_TPT470_DOCK), + SND_PCI_QUIRK(0x17aa, 0x508b, "Thinkpad X12 Gen 1", ALC287_FIXUP_LEGION_15IMHG05_SPEAKERS), SND_PCI_QUIRK(0x17aa, 0x5109, "Thinkpad", ALC269_FIXUP_LIMIT_INT_MIC_BOOST), SND_PCI_QUIRK(0x17aa, 0x511e, "Thinkpad", ALC298_FIXUP_TPT470_DOCK), SND_PCI_QUIRK(0x17aa, 0x511f, "Thinkpad", ALC298_FIXUP_TPT470_DOCK),
From: Fabio M. De Francesco fmdefrancesco@gmail.com
commit 2f7a26abb8241a0208c68d22815aa247c5ddacab upstream.
Syzbot reports "KASAN: null-ptr-deref Write in snd_pcm_format_set_silence".[1]
It is due to missing validation of the "silence" field of struct "pcm_format_data" in "pcm_formats" array.
Add a test for valid "pat" and, if it is not so, return -EINVAL.
[1] https://lore.kernel.org/lkml/000000000000d188ef05dc2c7279@google.com/
Reported-and-tested-by: syzbot+205eb15961852c2c5974@syzkaller.appspotmail.com Signed-off-by: Fabio M. De Francesco fmdefrancesco@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220409012655.9399-1-fmdefrancesco@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/core/pcm_misc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/core/pcm_misc.c +++ b/sound/core/pcm_misc.c @@ -429,7 +429,7 @@ int snd_pcm_format_set_silence(snd_pcm_f return 0; width = pcm_formats[(INT)format].phys; /* physical width */ pat = pcm_formats[(INT)format].silence; - if (! width) + if (!width || !pat) return -EINVAL; /* signed or 1 byte data */ if (pcm_formats[(INT)format].signd == 1 || width <= 8) {
From: Johannes Berg johannes.berg@intel.com
commit 6624bb34b4eb19f715db9908cca00122748765d7 upstream.
We need this to be at least two bytes, so we can access alpha2[0] and alpha2[1]. It may be three in case some userspace used NUL-termination since it was NLA_STRING (and we also push it out with NUL-termination).
Cc: stable@vger.kernel.org Reported-by: Lee Jones lee.jones@linaro.org Link: https://lore.kernel.org/r/20220411114201.fd4a31f06541.Ie7ff4be2cf348d8cc28ed... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/wireless/nl80211.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -475,7 +475,8 @@ static const struct nla_policy nl80211_p .len = IEEE80211_MAX_MESH_ID_LEN }, [NL80211_ATTR_MPATH_NEXT_HOP] = NLA_POLICY_ETH_ADDR_COMPAT,
- [NL80211_ATTR_REG_ALPHA2] = { .type = NLA_STRING, .len = 2 }, + /* allow 3 for NUL-termination, we used to declare this NLA_STRING */ + [NL80211_ATTR_REG_ALPHA2] = NLA_POLICY_RANGE(NLA_BINARY, 2, 3), [NL80211_ATTR_REG_RULES] = { .type = NLA_NESTED },
[NL80211_ATTR_BSS_CTS_PROT] = { .type = NLA_U8 },
From: Nicolas Dichtel nicolas.dichtel@6wind.com
commit e3fa461d8b0e185b7da8a101fe94dfe6dd500ac0 upstream.
kongweibin reported a kernel panic in ip6_forward() when input interface has no in6 dev associated.
The following tc commands were used to reproduce this panic: tc qdisc del dev vxlan100 root tc qdisc add dev vxlan100 root netem corrupt 5%
CC: stable@vger.kernel.org Fixes: ccd27f05ae7b ("ipv6: fix 'disable_policy' for fwd packets") Reported-by: kongweibin kongweibin2@huawei.com Signed-off-by: Nicolas Dichtel nicolas.dichtel@6wind.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -508,7 +508,7 @@ int ip6_forward(struct sk_buff *skb) goto drop;
if (!net->ipv6.devconf_all->disable_policy && - !idev->cnf.disable_policy && + (!idev || !idev->cnf.disable_policy) && !xfrm6_policy_check(NULL, XFRM_POLICY_FWD, skb)) { __IP6_INC_STATS(net, idev, IPSTATS_MIB_INDISCARDS); goto drop;
From: Melissa Wen mwen@igalia.com
commit e4f1541caf60fcbe5a59e9d25805c0b5865e546a upstream.
"Pre-multiplied" is the default pixel blend mode for KMS/DRM, as documented in supported_modes of drm_plane_create_blend_mode_property(): https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/drm_blend.c
In this mode, both 'pixel alpha' and 'plane alpha' participate in the calculation, as described by the pixel blend mode formula in KMS/DRM documentation:
out.rgb = plane_alpha * fg.rgb + (1 - (plane_alpha * fg.alpha)) * bg.rgb
Considering the blend config mechanisms we have in the driver so far, the alpha mode that better fits this blend mode is the _PER_PIXEL_ALPHA_COMBINED_GLOBAL_GAIN, where the value for global_gain is the plane alpha (global_alpha).
With this change, alpha property stops to be ignored. It also addresses Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1734
v2: * keep the 8-bit value for global_alpha_value (Nicholas) * correct the logical ordering for combined global gain (Nicholas) * apply to dcn10 too (Nicholas)
Signed-off-by: Melissa Wen mwen@igalia.com Tested-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Reviewed-by: Harry Wentland harry.wentland@amd.com Tested-by: Simon Ser contact@emersion.fr Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 14 +++++++++----- drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 14 +++++++++----- 2 files changed, 18 insertions(+), 10 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c @@ -2387,14 +2387,18 @@ void dcn10_update_mpcc(struct dc *dc, st &blnd_cfg.black_color); }
- if (per_pixel_alpha) - blnd_cfg.alpha_mode = MPCC_ALPHA_BLEND_MODE_PER_PIXEL_ALPHA; - else - blnd_cfg.alpha_mode = MPCC_ALPHA_BLEND_MODE_GLOBAL_ALPHA; - blnd_cfg.overlap_only = false; blnd_cfg.global_gain = 0xff;
+ if (per_pixel_alpha && pipe_ctx->plane_state->global_alpha) { + blnd_cfg.alpha_mode = MPCC_ALPHA_BLEND_MODE_PER_PIXEL_ALPHA_COMBINED_GLOBAL_GAIN; + blnd_cfg.global_gain = pipe_ctx->plane_state->global_alpha_value; + } else if (per_pixel_alpha) { + blnd_cfg.alpha_mode = MPCC_ALPHA_BLEND_MODE_PER_PIXEL_ALPHA; + } else { + blnd_cfg.alpha_mode = MPCC_ALPHA_BLEND_MODE_GLOBAL_ALPHA; + } + if (pipe_ctx->plane_state->global_alpha) blnd_cfg.global_alpha = pipe_ctx->plane_state->global_alpha_value; else --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c @@ -2270,14 +2270,18 @@ void dcn20_update_mpcc(struct dc *dc, st pipe_ctx, &blnd_cfg.black_color); }
- if (per_pixel_alpha) - blnd_cfg.alpha_mode = MPCC_ALPHA_BLEND_MODE_PER_PIXEL_ALPHA; - else - blnd_cfg.alpha_mode = MPCC_ALPHA_BLEND_MODE_GLOBAL_ALPHA; - blnd_cfg.overlap_only = false; blnd_cfg.global_gain = 0xff;
+ if (per_pixel_alpha && pipe_ctx->plane_state->global_alpha) { + blnd_cfg.alpha_mode = MPCC_ALPHA_BLEND_MODE_PER_PIXEL_ALPHA_COMBINED_GLOBAL_GAIN; + blnd_cfg.global_gain = pipe_ctx->plane_state->global_alpha_value; + } else if (per_pixel_alpha) { + blnd_cfg.alpha_mode = MPCC_ALPHA_BLEND_MODE_PER_PIXEL_ALPHA; + } else { + blnd_cfg.alpha_mode = MPCC_ALPHA_BLEND_MODE_GLOBAL_ALPHA; + } + if (pipe_ctx->plane_state->global_alpha) blnd_cfg.global_alpha = pipe_ctx->plane_state->global_alpha_value; else
From: Tomasz Moń desowin@gmail.com
commit 4593c1b6d159f1e5c35c07a7f125e79e5a864302 upstream.
Enabling gfxoff quirk results in perfectly usable graphical user interface on MacBook Pro (15-inch, 2019) with Radeon Pro Vega 20 4 GB.
Without the quirk, X server is completely unusable as every few seconds there is gpu reset due to ring gfx timeout.
Signed-off-by: Tomasz Moń desowin@gmail.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -1248,6 +1248,8 @@ static const struct amdgpu_gfxoff_quirk { 0x1002, 0x15dd, 0x103c, 0x83e7, 0xd3 }, /* GFXOFF is unstable on C6 parts with a VBIOS 113-RAVEN-114 */ { 0x1002, 0x15dd, 0x1002, 0x15dd, 0xc6 }, + /* Apple MacBook Pro (15-inch, 2019) Radeon Pro Vega 20 4 GB */ + { 0x1002, 0x69af, 0x106b, 0x019a, 0xc0 }, { 0, 0, 0, 0, 0 }, };
From: Rei Yamamoto yamamoto.rei@jp.fujitsu.com
commit 08d835dff916bfe8f45acc7b92c7af6c4081c8a7 upstream.
If CPUs on a node are offline at boot time, the number of nodes is different when building affinity masks for present cpus and when building affinity masks for possible cpus. This causes the following problem:
In the case that the number of vectors is less than the number of nodes there are cases where bits of masks for present cpus are overwritten when building masks for possible cpus.
Fix this by excluding CPUs, which are not part of the current build mask (present/possible).
[ tglx: Massaged changelog and added comment ]
Fixes: b82592199032 ("genirq/affinity: Spread IRQs to all available NUMA nodes") Signed-off-by: Rei Yamamoto yamamoto.rei@jp.fujitsu.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ming Lei ming.lei@redhat.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220331003309.10891-1-yamamoto.rei@jp.fujitsu.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/irq/affinity.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/kernel/irq/affinity.c +++ b/kernel/irq/affinity.c @@ -269,8 +269,9 @@ static int __irq_build_affinity_masks(un */ if (numvecs <= nodes) { for_each_node_mask(n, nodemsk) { - cpumask_or(&masks[curvec].mask, &masks[curvec].mask, - node_to_cpumask[n]); + /* Ensure that only CPUs which are in both masks are set */ + cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]); + cpumask_or(&masks[curvec].mask, &masks[curvec].mask, nmsk); if (++curvec == last_affv) curvec = firstvec; }
From: Paul Gortmaker paul.gortmaker@windriver.com
commit 40e97e42961f8c6cc7bd5fe67cc18417e02d78f1 upstream.
While running some testing on code that happened to allow the variable tick_nohz_full_running to get set but with no "possible" NOHZ cores to back up that setting, this warning triggered:
if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) WARN_ON(tick_nohz_full_running);
The console was overwhemled with an endless stream of one WARN per tick per core and there was no way to even see what was going on w/o using a serial console to capture it and then trace it back to this.
Change it to WARN_ON_ONCE().
Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full") Signed-off-by: Paul Gortmaker paul.gortmaker@windriver.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211206145950.10927-3-paul.gortmaker@windriver.co... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/time/tick-sched.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -136,7 +136,7 @@ static void tick_sched_do_timer(struct t */ if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) { #ifdef CONFIG_NO_HZ_FULL - WARN_ON(tick_nohz_full_running); + WARN_ON_ONCE(tick_nohz_full_running); #endif tick_do_timer_cpu = cpu; }
From: Nathan Chancellor nathan@kernel.org
commit 83a1cde5c74bfb44b49cb2a940d044bb2380f4ea upstream.
With newer versions of GCC, there is a panic in da850_evm_config_emac() when booting multi_v5_defconfig in QEMU under the palmetto-bmc machine:
Unable to handle kernel NULL pointer dereference at virtual address 00000020 pgd = (ptrval) [00000020] *pgd=00000000 Internal error: Oops: 5 [#1] PREEMPT ARM Modules linked in: CPU: 0 PID: 1 Comm: swapper Not tainted 5.15.0 #1 Hardware name: Generic DT based system PC is at da850_evm_config_emac+0x1c/0x120 LR is at do_one_initcall+0x50/0x1e0
The emac_pdata pointer in soc_info is NULL because davinci_soc_info only gets populated on davinci machines but da850_evm_config_emac() is called on all machines via device_initcall().
Move the rmii_en assignment below the machine check so that it is only dereferenced when running on a supported SoC.
Fixes: bae105879f2f ("davinci: DA850/OMAP-L138 EVM: implement autodetect of RMII PHY") Signed-off-by: Nathan Chancellor nathan@kernel.org Reviewed-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Bartosz Golaszewski brgl@bgdev.pl Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/YcS4xVWs6bQlQSPC@archlinux-ax161/ Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/mach-davinci/board-da850-evm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/arm/mach-davinci/board-da850-evm.c +++ b/arch/arm/mach-davinci/board-da850-evm.c @@ -1101,11 +1101,13 @@ static int __init da850_evm_config_emac( int ret; u32 val; struct davinci_soc_info *soc_info = &davinci_soc_info; - u8 rmii_en = soc_info->emac_pdata->rmii_en; + u8 rmii_en;
if (!machine_is_davinci_da850_evm()) return 0;
+ rmii_en = soc_info->emac_pdata->rmii_en; + cfg_chip3_base = DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP3_REG);
val = __raw_readl(cfg_chip3_base);
From: Mikulas Patocka mpatocka@redhat.com
commit 08c1af8f1c13bbf210f1760132f4df24d0ed46d6 upstream.
It is possible to set up dm-integrity in such a way that the "tag_size" parameter is less than the actual digest size. In this situation, a part of the digest beyond tag_size is ignored.
In this case, dm-integrity would write beyond the end of the ic->recalc_tags array and corrupt memory. The corruption happened in integrity_recalc->integrity_sector_checksum->crypto_shash_final.
Fix this corruption by increasing the tags array so that it has enough padding at the end to accomodate the loop in integrity_recalc() being able to write a full digest size for the last member of the tags array.
Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-integrity.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/drivers/md/dm-integrity.c +++ b/drivers/md/dm-integrity.c @@ -4232,6 +4232,7 @@ try_smaller_buffer: }
if (ic->internal_hash) { + size_t recalc_tags_size; ic->recalc_wq = alloc_workqueue("dm-integrity-recalc", WQ_MEM_RECLAIM, 1); if (!ic->recalc_wq ) { ti->error = "Cannot allocate workqueue"; @@ -4245,8 +4246,10 @@ try_smaller_buffer: r = -ENOMEM; goto bad; } - ic->recalc_tags = kvmalloc_array(RECALC_SECTORS >> ic->sb->log2_sectors_per_block, - ic->tag_size, GFP_KERNEL); + recalc_tags_size = (RECALC_SECTORS >> ic->sb->log2_sectors_per_block) * ic->tag_size; + if (crypto_shash_digestsize(ic->internal_hash) > ic->tag_size) + recalc_tags_size += crypto_shash_digestsize(ic->internal_hash) - ic->tag_size; + ic->recalc_tags = kvmalloc(recalc_tags_size, GFP_KERNEL); if (!ic->recalc_tags) { ti->error = "Cannot allocate tags for recalculating"; r = -ENOMEM;
From: Nadav Amit namit@vmware.com
commit 9e949a3886356fe9112c6f6f34a6e23d1d35407f upstream.
The check in flush_smp_call_function_queue() for callbacks that are sent to offline CPUs currently checks whether the queue is empty.
However, flush_smp_call_function_queue() has just deleted all the callbacks from the queue and moved all the entries into a local list. This checks would only be positive if some callbacks were added in the short time after llist_del_all() was called. This does not seem to be the intention of this check.
Change the check to look at the local list to which the entries were moved instead of the queue from which all the callbacks were just removed.
Fixes: 8d056c48e4862 ("CPU hotplug, smp: flush any pending IPI callbacks before CPU offline") Signed-off-by: Nadav Amit namit@vmware.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/r/20220319072015.1495036-1-namit@vmware.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/smp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/smp.c +++ b/kernel/smp.c @@ -346,7 +346,7 @@ static void flush_smp_call_function_queu
/* There shouldn't be any pending callbacks on an offline CPU. */ if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) && - !warned && !llist_empty(head))) { + !warned && entry != NULL)) { warned = true; WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
From: Martin Povišer povik+lin@cutebit.org
commit bd8963e602c77adc76dbbbfc3417c3cf14fed76b upstream.
Wait for completion of write transfers before returning from the driver. At first sight it may seem advantageous to leave write transfers queued for the controller to carry out on its own time, but there's a couple of issues with it:
* Driver doesn't check for FIFO space.
* The queued writes can complete while the driver is in its I2C read transfer path which means it will get confused by the raising of XEN (the 'transaction ended' signal). This can cause a spurious ENODATA error due to premature reading of the MRXFIFO register.
Adding the wait fixes some unreliability issues with the driver. There's some efficiency cost to it (especially with pasemi_smb_waitready doing its polling), but that will be alleviated once the driver receives interrupt support.
Fixes: beb58aa39e6e ("i2c: PA Semi SMBus driver") Signed-off-by: Martin Povišer povik+lin@cutebit.org Reviewed-by: Sven Peter sven@svenpeter.dev Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i2c/busses/i2c-pasemi.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/i2c/busses/i2c-pasemi.c +++ b/drivers/i2c/busses/i2c-pasemi.c @@ -137,6 +137,12 @@ static int pasemi_i2c_xfer_msg(struct i2
TXFIFO_WR(smbus, msg->buf[msg->len-1] | (stop ? MTXFIFO_STOP : 0)); + + if (stop) { + err = pasemi_smb_waitready(smbus); + if (err) + goto reset_out; + } }
return 0;
From: Anna-Maria Behnsen anna-maria@linutronix.de
commit c54bc0fc84214b203f7a0ebfd1bd308ce2abe920 upstream.
When the timer base is empty, base::next_expiry is set to base::clk + NEXT_TIMER_MAX_DELTA and base::next_expiry_recalc is false. When no timer is queued until jiffies reaches base::next_expiry value, the warning for not finding any expired timer and base::next_expiry_recalc is false in __run_timers() triggers.
To prevent triggering the warning in this valid scenario base::timers_pending needs to be added to the warning condition.
Fixes: 31cd0e119d50 ("timers: Recalculate next timer interrupt only when necessary") Reported-by: Johannes Berg johannes@sipsolutions.net Signed-off-by: Anna-Maria Behnsen anna-maria@linutronix.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Frederic Weisbecker frederic@kernel.org Link: https://lore.kernel.org/r/20220405191732.7438-3-anna-maria@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/time/timer.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1738,11 +1738,14 @@ static inline void __run_timers(struct t time_after_eq(jiffies, base->next_expiry)) { levels = collect_expired_timers(base, heads); /* - * The only possible reason for not finding any expired - * timer at this clk is that all matching timers have been - * dequeued. + * The two possible reasons for not finding any expired + * timer at this clk are that all matching timers have been + * dequeued or no timer has been queued since + * base::next_expiry was set to base::clk + + * NEXT_TIMER_MAX_DELTA. */ - WARN_ON_ONCE(!levels && !base->next_expiry_recalc); + WARN_ON_ONCE(!levels && !base->next_expiry_recalc + && base->timers_pending); base->clk++; base->next_expiry = __next_timer_interrupt(base);
From: Chao Gao chao.gao@intel.com
commit 9e02977bfad006af328add9434c8bffa40e053bb upstream.
When we looked into FIO performance with swiotlb enabled in VM, we found swiotlb_bounce() is always called one more time than expected for each DMA read request.
It turns out that the bounce buffer is copied to original DMA buffer twice after the completion of a DMA request (one is done by in dma_direct_sync_single_for_cpu(), the other by swiotlb_tbl_unmap_single()). But the content in bounce buffer actually doesn't change between the two rounds of copy. So, one round of copy is redundant.
Pass DMA_ATTR_SKIP_CPU_SYNC flag to swiotlb_tbl_unmap_single() to skip the memory copy in it.
This fix increases FIO 64KB sequential read throughput in a guest with swiotlb=force by 5.6%.
Fixes: 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code") Reported-by: Wang Zhaoyang1 zhaoyang1.wang@intel.com Reported-by: Gao Liang liang.gao@intel.com Signed-off-by: Chao Gao chao.gao@intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/dma/direct.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/dma/direct.h +++ b/kernel/dma/direct.h @@ -114,6 +114,7 @@ static inline void dma_direct_unmap_page dma_direct_sync_single_for_cpu(dev, addr, size, dir);
if (unlikely(is_swiotlb_buffer(phys))) - swiotlb_tbl_unmap_single(dev, phys, size, size, dir, attrs); + swiotlb_tbl_unmap_single(dev, phys, size, size, dir, + attrs | DMA_ATTR_SKIP_CPU_SYNC); } #endif /* _KERNEL_DMA_DIRECT_H */
From: Mike Christie michael.christie@oracle.com
commit 0aadafb5c34403a7cced1a8d61877048dc059f70 upstream.
This patch fixes a bug where when using iSCSI offload we can free an endpoint while userspace still thinks it's active. That then causes the endpoint ID to be reused for a new connection's endpoint while userspace still thinks the ID is for the original connection. Userspace will then end up disconnecting a running connection's endpoint or trying to bind to another connection's endpoint.
This bug is a regression added in:
Commit 23d6fefbb3f6 ("scsi: iscsi: Fix in-kernel conn failure handling")
where we added a in kernel ep_disconnect call to fix a bug in:
Commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space")
where we would call stop_conn without having done ep_disconnect. This early ep_disconnect call will then free the endpoint and it's ID while userspace still thinks the ID is valid.
Fix the early release of the ID by having the in kernel recovery code keep a reference to the endpoint until userspace has called into the kernel to finish cleaning up the endpoint/connection. It requires the previous commit "scsi: iscsi: Release endpoint ID when its freed" which moved the freeing of the ID until when the endpoint is released.
Link: https://lore.kernel.org/r/20220408001314.5014-5-michael.christie@oracle.com Fixes: 23d6fefbb3f6 ("scsi: iscsi: Fix in-kernel conn failure handling") Tested-by: Manish Rangankar mrangankar@marvell.com Reviewed-by: Lee Duncan lduncan@suse.com Reviewed-by: Chris Leech cleech@redhat.com Signed-off-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/scsi_transport_iscsi.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/drivers/scsi/scsi_transport_iscsi.c +++ b/drivers/scsi/scsi_transport_iscsi.c @@ -2273,7 +2273,11 @@ static void iscsi_if_disconnect_bound_ep mutex_unlock(&conn->ep_mutex);
flush_work(&conn->cleanup_work); - + /* + * Userspace is now done with the EP so we can release the ref + * iscsi_cleanup_conn_work_fn took. + */ + iscsi_put_endpoint(ep); mutex_lock(&conn->ep_mutex); } } @@ -2355,6 +2359,12 @@ static void iscsi_cleanup_conn_work_fn(s return; }
+ /* + * Get a ref to the ep, so we don't release its ID until after + * userspace is done referencing it in iscsi_if_disconnect_bound_ep. + */ + if (conn->ep) + get_device(&conn->ep->dev); iscsi_ep_disconnect(conn, false);
if (system_state != SYSTEM_RUNNING) {
From: Mike Christie michael.christie@oracle.com
commit 03690d81974535f228e892a14f0d2d44404fe555 upstream.
If a driver raises a connection error before the connection is bound, we can leave a cleanup_work queued that can later run and disconnect/stop a connection that is logged in. The problem is that drivers can call iscsi_conn_error_event for endpoints that are connected but not yet bound when something like the network port they are using is brought down. iscsi_cleanup_conn_work_fn will check for this and exit early, but if the cleanup_work is stuck behind other works, it might not get run until after userspace has done ep_disconnect. Because the endpoint is not yet bound there was no way for ep_disconnect to flush the work.
The bug of leaving stop_conns queued was added in:
Commit 23d6fefbb3f6 ("scsi: iscsi: Fix in-kernel conn failure handling")
and:
Commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space")
was supposed to fix it, but left this case.
This patch moves the conn state check to before we even queue the work so we can avoid queueing.
Link: https://lore.kernel.org/r/20220408001314.5014-7-michael.christie@oracle.com Fixes: 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") Tested-by: Manish Rangankar mrangankar@marvell.com Reviewed-by: Lee Duncan <lduncan@@suse.com> Reviewed-by: Chris Leech cleech@redhat.com Signed-off-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/scsi_transport_iscsi.c | 65 +++++++++++++++++++----------------- 1 file changed, 36 insertions(+), 29 deletions(-)
--- a/drivers/scsi/scsi_transport_iscsi.c +++ b/drivers/scsi/scsi_transport_iscsi.c @@ -2224,10 +2224,10 @@ static void iscsi_stop_conn(struct iscsi
switch (flag) { case STOP_CONN_RECOVER: - conn->state = ISCSI_CONN_FAILED; + WRITE_ONCE(conn->state, ISCSI_CONN_FAILED); break; case STOP_CONN_TERM: - conn->state = ISCSI_CONN_DOWN; + WRITE_ONCE(conn->state, ISCSI_CONN_DOWN); break; default: iscsi_cls_conn_printk(KERN_ERR, conn, "invalid stop flag %d\n", @@ -2245,7 +2245,7 @@ static void iscsi_ep_disconnect(struct i struct iscsi_endpoint *ep;
ISCSI_DBG_TRANS_CONN(conn, "disconnect ep.\n"); - conn->state = ISCSI_CONN_FAILED; + WRITE_ONCE(conn->state, ISCSI_CONN_FAILED);
if (!conn->ep || !session->transport->ep_disconnect) return; @@ -2345,21 +2345,6 @@ static void iscsi_cleanup_conn_work_fn(s
mutex_lock(&conn->ep_mutex); /* - * If we are not at least bound there is nothing for us to do. Userspace - * will do a ep_disconnect call if offload is used, but will not be - * doing a stop since there is nothing to clean up, so we have to clear - * the cleanup bit here. - */ - if (conn->state != ISCSI_CONN_BOUND && conn->state != ISCSI_CONN_UP) { - ISCSI_DBG_TRANS_CONN(conn, "Got error while conn is already failed. Ignoring.\n"); - spin_lock_irq(&conn->lock); - clear_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags); - spin_unlock_irq(&conn->lock); - mutex_unlock(&conn->ep_mutex); - return; - } - - /* * Get a ref to the ep, so we don't release its ID until after * userspace is done referencing it in iscsi_if_disconnect_bound_ep. */ @@ -2425,7 +2410,7 @@ iscsi_create_conn(struct iscsi_cls_sessi INIT_WORK(&conn->cleanup_work, iscsi_cleanup_conn_work_fn); conn->transport = transport; conn->cid = cid; - conn->state = ISCSI_CONN_DOWN; + WRITE_ONCE(conn->state, ISCSI_CONN_DOWN);
/* this is released in the dev's release function */ if (!get_device(&session->dev)) @@ -2613,10 +2598,30 @@ void iscsi_conn_error_event(struct iscsi struct iscsi_internal *priv; int len = nlmsg_total_size(sizeof(*ev)); unsigned long flags; + int state;
spin_lock_irqsave(&conn->lock, flags); - if (!test_and_set_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) - queue_work(iscsi_conn_cleanup_workq, &conn->cleanup_work); + /* + * Userspace will only do a stop call if we are at least bound. And, we + * only need to do the in kernel cleanup if in the UP state so cmds can + * be released to upper layers. If in other states just wait for + * userspace to avoid races that can leave the cleanup_work queued. + */ + state = READ_ONCE(conn->state); + switch (state) { + case ISCSI_CONN_BOUND: + case ISCSI_CONN_UP: + if (!test_and_set_bit(ISCSI_CLS_CONN_BIT_CLEANUP, + &conn->flags)) { + queue_work(iscsi_conn_cleanup_workq, + &conn->cleanup_work); + } + break; + default: + ISCSI_DBG_TRANS_CONN(conn, "Got conn error in state %d\n", + state); + break; + } spin_unlock_irqrestore(&conn->lock, flags);
priv = iscsi_if_transport_lookup(conn->transport); @@ -2967,7 +2972,7 @@ iscsi_set_param(struct iscsi_transport * char *data = (char*)ev + sizeof(*ev); struct iscsi_cls_conn *conn; struct iscsi_cls_session *session; - int err = 0, value = 0; + int err = 0, value = 0, state;
if (ev->u.set_param.len > PAGE_SIZE) return -EINVAL; @@ -2984,8 +2989,8 @@ iscsi_set_param(struct iscsi_transport * session->recovery_tmo = value; break; default: - if ((conn->state == ISCSI_CONN_BOUND) || - (conn->state == ISCSI_CONN_UP)) { + state = READ_ONCE(conn->state); + if (state == ISCSI_CONN_BOUND || state == ISCSI_CONN_UP) { err = transport->set_param(conn, ev->u.set_param.param, data, ev->u.set_param.len); } else { @@ -3781,7 +3786,7 @@ static int iscsi_if_transport_conn(struc ev->u.b_conn.transport_eph, ev->u.b_conn.is_leading); if (!ev->r.retcode) - conn->state = ISCSI_CONN_BOUND; + WRITE_ONCE(conn->state, ISCSI_CONN_BOUND);
if (ev->r.retcode || !transport->ep_connect) break; @@ -3800,7 +3805,8 @@ static int iscsi_if_transport_conn(struc case ISCSI_UEVENT_START_CONN: ev->r.retcode = transport->start_conn(conn); if (!ev->r.retcode) - conn->state = ISCSI_CONN_UP; + WRITE_ONCE(conn->state, ISCSI_CONN_UP); + break; case ISCSI_UEVENT_SEND_PDU: pdu_len = nlh->nlmsg_len - sizeof(*nlh) - sizeof(*ev); @@ -4108,10 +4114,11 @@ static ssize_t show_conn_state(struct de { struct iscsi_cls_conn *conn = iscsi_dev_to_conn(dev->parent); const char *state = "unknown"; + int conn_state = READ_ONCE(conn->state);
- if (conn->state >= 0 && - conn->state < ARRAY_SIZE(connection_state_names)) - state = connection_state_names[conn->state]; + if (conn_state >= 0 && + conn_state < ARRAY_SIZE(connection_state_names)) + state = connection_state_names[conn_state];
return sysfs_emit(buf, "%s\n", state); }
From: Duoming Zhou duoming@zju.edu.cn
commit d01ffb9eee4af165d83b08dd73ebdf9fe94a519b upstream.
If we dereference ax25_dev after we call kfree(ax25_dev) in ax25_dev_device_down(), it will lead to concurrency UAF bugs. There are eight syscall functions suffer from UAF bugs, include ax25_bind(), ax25_release(), ax25_connect(), ax25_ioctl(), ax25_getname(), ax25_sendmsg(), ax25_getsockopt() and ax25_info_show().
One of the concurrency UAF can be shown as below:
(USE) | (FREE) | ax25_device_event | ax25_dev_device_down ax25_bind | ... ... | kfree(ax25_dev) ax25_fillin_cb() | ... ax25_fillin_cb_from_dev() | ... |
The root cause of UAF bugs is that kfree(ax25_dev) in ax25_dev_device_down() is not protected by any locks. When ax25_dev, which there are still pointers point to, is released, the concurrency UAF bug will happen.
This patch introduces refcount into ax25_dev in order to guarantee that there are no pointers point to it when ax25_dev is released.
Signed-off-by: Duoming Zhou duoming@zju.edu.cn Signed-off-by: David S. Miller davem@davemloft.net [OP: backport to 5.10: adjusted context] Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/ax25.h | 10 ++++++++++ net/ax25/af_ax25.c | 2 ++ net/ax25/ax25_dev.c | 12 ++++++++++-- net/ax25/ax25_route.c | 3 +++ 4 files changed, 25 insertions(+), 2 deletions(-)
--- a/include/net/ax25.h +++ b/include/net/ax25.h @@ -236,6 +236,7 @@ typedef struct ax25_dev { #if defined(CONFIG_AX25_DAMA_SLAVE) || defined(CONFIG_AX25_DAMA_MASTER) ax25_dama_info dama; #endif + refcount_t refcount; } ax25_dev;
typedef struct ax25_cb { @@ -290,6 +291,15 @@ static __inline__ void ax25_cb_put(ax25_ } }
+#define ax25_dev_hold(__ax25_dev) \ + refcount_inc(&((__ax25_dev)->refcount)) + +static __inline__ void ax25_dev_put(ax25_dev *ax25_dev) +{ + if (refcount_dec_and_test(&ax25_dev->refcount)) { + kfree(ax25_dev); + } +} static inline __be16 ax25_type_trans(struct sk_buff *skb, struct net_device *dev) { skb->dev = dev; --- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -98,6 +98,7 @@ again: spin_unlock_bh(&ax25_list_lock); lock_sock(sk); s->ax25_dev = NULL; + ax25_dev_put(ax25_dev); release_sock(sk); ax25_disconnect(s, ENETUNREACH); spin_lock_bh(&ax25_list_lock); @@ -446,6 +447,7 @@ static int ax25_ctl_ioctl(const unsigned }
out_put: + ax25_dev_put(ax25_dev); ax25_cb_put(ax25); return ret;
--- a/net/ax25/ax25_dev.c +++ b/net/ax25/ax25_dev.c @@ -37,6 +37,7 @@ ax25_dev *ax25_addr_ax25dev(ax25_address for (ax25_dev = ax25_dev_list; ax25_dev != NULL; ax25_dev = ax25_dev->next) if (ax25cmp(addr, (ax25_address *)ax25_dev->dev->dev_addr) == 0) { res = ax25_dev; + ax25_dev_hold(ax25_dev); } spin_unlock_bh(&ax25_dev_lock);
@@ -56,6 +57,7 @@ void ax25_dev_device_up(struct net_devic return; }
+ refcount_set(&ax25_dev->refcount, 1); dev->ax25_ptr = ax25_dev; ax25_dev->dev = dev; dev_hold(dev); @@ -83,6 +85,7 @@ void ax25_dev_device_up(struct net_devic spin_lock_bh(&ax25_dev_lock); ax25_dev->next = ax25_dev_list; ax25_dev_list = ax25_dev; + ax25_dev_hold(ax25_dev); spin_unlock_bh(&ax25_dev_lock);
ax25_register_dev_sysctl(ax25_dev); @@ -112,20 +115,22 @@ void ax25_dev_device_down(struct net_dev
if ((s = ax25_dev_list) == ax25_dev) { ax25_dev_list = s->next; + ax25_dev_put(ax25_dev); spin_unlock_bh(&ax25_dev_lock); dev->ax25_ptr = NULL; dev_put(dev); - kfree(ax25_dev); + ax25_dev_put(ax25_dev); return; }
while (s != NULL && s->next != NULL) { if (s->next == ax25_dev) { s->next = ax25_dev->next; + ax25_dev_put(ax25_dev); spin_unlock_bh(&ax25_dev_lock); dev->ax25_ptr = NULL; dev_put(dev); - kfree(ax25_dev); + ax25_dev_put(ax25_dev); return; }
@@ -133,6 +138,7 @@ void ax25_dev_device_down(struct net_dev } spin_unlock_bh(&ax25_dev_lock); dev->ax25_ptr = NULL; + ax25_dev_put(ax25_dev); }
int ax25_fwd_ioctl(unsigned int cmd, struct ax25_fwd_struct *fwd) @@ -149,6 +155,7 @@ int ax25_fwd_ioctl(unsigned int cmd, str if (ax25_dev->forward != NULL) return -EINVAL; ax25_dev->forward = fwd_dev->dev; + ax25_dev_put(fwd_dev); break;
case SIOCAX25DELFWD: @@ -161,6 +168,7 @@ int ax25_fwd_ioctl(unsigned int cmd, str return -EINVAL; }
+ ax25_dev_put(ax25_dev); return 0; }
--- a/net/ax25/ax25_route.c +++ b/net/ax25/ax25_route.c @@ -116,6 +116,7 @@ static int __must_check ax25_rt_add(stru ax25_rt->dev = ax25_dev->dev; ax25_rt->digipeat = NULL; ax25_rt->ip_mode = ' '; + ax25_dev_put(ax25_dev); if (route->digi_count != 0) { if ((ax25_rt->digipeat = kmalloc(sizeof(ax25_digi), GFP_ATOMIC)) == NULL) { write_unlock_bh(&ax25_route_lock); @@ -172,6 +173,7 @@ static int ax25_rt_del(struct ax25_route } } } + ax25_dev_put(ax25_dev); write_unlock_bh(&ax25_route_lock);
return 0; @@ -214,6 +216,7 @@ static int ax25_rt_opt(struct ax25_route }
out: + ax25_dev_put(ax25_dev); write_unlock_bh(&ax25_route_lock); return err; }
From: Duoming Zhou duoming@zju.edu.cn
commit 87563a043cef044fed5db7967a75741cc16ad2b1 upstream.
The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") introduces refcount into ax25_dev, but there are reference leak paths in ax25_ctl_ioctl(), ax25_fwd_ioctl(), ax25_rt_add(), ax25_rt_del() and ax25_rt_opt().
This patch uses ax25_dev_put() and adjusts the position of ax25_addr_ax25dev() to fix reference cout leaks of ax25_dev.
Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") Signed-off-by: Duoming Zhou duoming@zju.edu.cn Reviewed-by: Dan Carpenter dan.carpenter@oracle.com Link: https://lore.kernel.org/r/20220203150811.42256-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski kuba@kernel.org [OP: backport to 5.10: adjust context] Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/ax25.h | 8 +++++--- net/ax25/af_ax25.c | 12 ++++++++---- net/ax25/ax25_dev.c | 24 +++++++++++++++++------- net/ax25/ax25_route.c | 16 +++++++++++----- 4 files changed, 41 insertions(+), 19 deletions(-)
--- a/include/net/ax25.h +++ b/include/net/ax25.h @@ -291,10 +291,12 @@ static __inline__ void ax25_cb_put(ax25_ } }
-#define ax25_dev_hold(__ax25_dev) \ - refcount_inc(&((__ax25_dev)->refcount)) +static inline void ax25_dev_hold(ax25_dev *ax25_dev) +{ + refcount_inc(&ax25_dev->refcount); +}
-static __inline__ void ax25_dev_put(ax25_dev *ax25_dev) +static inline void ax25_dev_put(ax25_dev *ax25_dev) { if (refcount_dec_and_test(&ax25_dev->refcount)) { kfree(ax25_dev); --- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -366,21 +366,25 @@ static int ax25_ctl_ioctl(const unsigned if (copy_from_user(&ax25_ctl, arg, sizeof(ax25_ctl))) return -EFAULT;
- if ((ax25_dev = ax25_addr_ax25dev(&ax25_ctl.port_addr)) == NULL) - return -ENODEV; - if (ax25_ctl.digi_count > AX25_MAX_DIGIS) return -EINVAL;
if (ax25_ctl.arg > ULONG_MAX / HZ && ax25_ctl.cmd != AX25_KILL) return -EINVAL;
+ ax25_dev = ax25_addr_ax25dev(&ax25_ctl.port_addr); + if (!ax25_dev) + return -ENODEV; + digi.ndigi = ax25_ctl.digi_count; for (k = 0; k < digi.ndigi; k++) digi.calls[k] = ax25_ctl.digi_addr[k];
- if ((ax25 = ax25_find_cb(&ax25_ctl.source_addr, &ax25_ctl.dest_addr, &digi, ax25_dev->dev)) == NULL) + ax25 = ax25_find_cb(&ax25_ctl.source_addr, &ax25_ctl.dest_addr, &digi, ax25_dev->dev); + if (!ax25) { + ax25_dev_put(ax25_dev); return -ENOTCONN; + }
switch (ax25_ctl.cmd) { case AX25_KILL: --- a/net/ax25/ax25_dev.c +++ b/net/ax25/ax25_dev.c @@ -85,8 +85,8 @@ void ax25_dev_device_up(struct net_devic spin_lock_bh(&ax25_dev_lock); ax25_dev->next = ax25_dev_list; ax25_dev_list = ax25_dev; - ax25_dev_hold(ax25_dev); spin_unlock_bh(&ax25_dev_lock); + ax25_dev_hold(ax25_dev);
ax25_register_dev_sysctl(ax25_dev); } @@ -115,8 +115,8 @@ void ax25_dev_device_down(struct net_dev
if ((s = ax25_dev_list) == ax25_dev) { ax25_dev_list = s->next; - ax25_dev_put(ax25_dev); spin_unlock_bh(&ax25_dev_lock); + ax25_dev_put(ax25_dev); dev->ax25_ptr = NULL; dev_put(dev); ax25_dev_put(ax25_dev); @@ -126,8 +126,8 @@ void ax25_dev_device_down(struct net_dev while (s != NULL && s->next != NULL) { if (s->next == ax25_dev) { s->next = ax25_dev->next; - ax25_dev_put(ax25_dev); spin_unlock_bh(&ax25_dev_lock); + ax25_dev_put(ax25_dev); dev->ax25_ptr = NULL; dev_put(dev); ax25_dev_put(ax25_dev); @@ -150,25 +150,35 @@ int ax25_fwd_ioctl(unsigned int cmd, str
switch (cmd) { case SIOCAX25ADDFWD: - if ((fwd_dev = ax25_addr_ax25dev(&fwd->port_to)) == NULL) + fwd_dev = ax25_addr_ax25dev(&fwd->port_to); + if (!fwd_dev) { + ax25_dev_put(ax25_dev); return -EINVAL; - if (ax25_dev->forward != NULL) + } + if (ax25_dev->forward) { + ax25_dev_put(fwd_dev); + ax25_dev_put(ax25_dev); return -EINVAL; + } ax25_dev->forward = fwd_dev->dev; ax25_dev_put(fwd_dev); + ax25_dev_put(ax25_dev); break;
case SIOCAX25DELFWD: - if (ax25_dev->forward == NULL) + if (!ax25_dev->forward) { + ax25_dev_put(ax25_dev); return -EINVAL; + } ax25_dev->forward = NULL; + ax25_dev_put(ax25_dev); break;
default: + ax25_dev_put(ax25_dev); return -EINVAL; }
- ax25_dev_put(ax25_dev); return 0; }
--- a/net/ax25/ax25_route.c +++ b/net/ax25/ax25_route.c @@ -75,11 +75,13 @@ static int __must_check ax25_rt_add(stru ax25_dev *ax25_dev; int i;
- if ((ax25_dev = ax25_addr_ax25dev(&route->port_addr)) == NULL) - return -EINVAL; if (route->digi_count > AX25_MAX_DIGIS) return -EINVAL;
+ ax25_dev = ax25_addr_ax25dev(&route->port_addr); + if (!ax25_dev) + return -EINVAL; + write_lock_bh(&ax25_route_lock);
ax25_rt = ax25_route_list; @@ -91,6 +93,7 @@ static int __must_check ax25_rt_add(stru if (route->digi_count != 0) { if ((ax25_rt->digipeat = kmalloc(sizeof(ax25_digi), GFP_ATOMIC)) == NULL) { write_unlock_bh(&ax25_route_lock); + ax25_dev_put(ax25_dev); return -ENOMEM; } ax25_rt->digipeat->lastrepeat = -1; @@ -101,6 +104,7 @@ static int __must_check ax25_rt_add(stru } } write_unlock_bh(&ax25_route_lock); + ax25_dev_put(ax25_dev); return 0; } ax25_rt = ax25_rt->next; @@ -108,6 +112,7 @@ static int __must_check ax25_rt_add(stru
if ((ax25_rt = kmalloc(sizeof(ax25_route), GFP_ATOMIC)) == NULL) { write_unlock_bh(&ax25_route_lock); + ax25_dev_put(ax25_dev); return -ENOMEM; }
@@ -116,11 +121,11 @@ static int __must_check ax25_rt_add(stru ax25_rt->dev = ax25_dev->dev; ax25_rt->digipeat = NULL; ax25_rt->ip_mode = ' '; - ax25_dev_put(ax25_dev); if (route->digi_count != 0) { if ((ax25_rt->digipeat = kmalloc(sizeof(ax25_digi), GFP_ATOMIC)) == NULL) { write_unlock_bh(&ax25_route_lock); kfree(ax25_rt); + ax25_dev_put(ax25_dev); return -ENOMEM; } ax25_rt->digipeat->lastrepeat = -1; @@ -133,6 +138,7 @@ static int __must_check ax25_rt_add(stru ax25_rt->next = ax25_route_list; ax25_route_list = ax25_rt; write_unlock_bh(&ax25_route_lock); + ax25_dev_put(ax25_dev);
return 0; } @@ -173,8 +179,8 @@ static int ax25_rt_del(struct ax25_route } } } - ax25_dev_put(ax25_dev); write_unlock_bh(&ax25_route_lock); + ax25_dev_put(ax25_dev);
return 0; } @@ -216,8 +222,8 @@ static int ax25_rt_opt(struct ax25_route }
out: - ax25_dev_put(ax25_dev); write_unlock_bh(&ax25_route_lock); + ax25_dev_put(ax25_dev); return err; }
From: Duoming Zhou duoming@zju.edu.cn
commit feef318c855a361a1eccd880f33e88c460eb63b4 upstream.
The ax25_kill_by_device() will set s->ax25_dev = NULL and call ax25_disconnect() to change states of ax25_cb and sock, if we call ax25_bind() before ax25_kill_by_device().
However, if we call ax25_bind() again between the window of ax25_kill_by_device() and ax25_dev_device_down(), the values and states changed by ax25_kill_by_device() will be reassigned.
Finally, ax25_dev_device_down() will deallocate net_device. If we dereference net_device in syscall functions such as ax25_release(), ax25_sendmsg(), ax25_getsockopt(), ax25_getname() and ax25_info_show(), a UAF bug will occur.
One of the possible race conditions is shown below:
(USE) | (FREE) ax25_bind() | | ax25_kill_by_device() ax25_bind() | ax25_connect() | ... | ax25_dev_device_down() | ... | dev_put_track(dev, ...) //FREE ax25_release() | ... ax25_send_control() | alloc_skb() //USE |
the corresponding fail log is shown below: =============================================================== BUG: KASAN: use-after-free in ax25_send_control+0x43/0x210 ... Call Trace: ... ax25_send_control+0x43/0x210 ax25_release+0x2db/0x3b0 __sock_release+0x6d/0x120 sock_close+0xf/0x20 __fput+0x11f/0x420 ... Allocated by task 1283: ... __kasan_kmalloc+0x81/0xa0 alloc_netdev_mqs+0x5a/0x680 mkiss_open+0x6c/0x380 tty_ldisc_open+0x55/0x90 ... Freed by task 1969: ... kfree+0xa3/0x2c0 device_release+0x54/0xe0 kobject_put+0xa5/0x120 tty_ldisc_kill+0x3e/0x80 ...
In order to fix these UAF bugs caused by rebinding operation, this patch adds dev_hold_track() into ax25_bind() and corresponding dev_put_track() into ax25_kill_by_device().
Signed-off-by: Duoming Zhou duoming@zju.edu.cn Signed-off-by: David S. Miller davem@davemloft.net [OP: backport to 5.10: adjust dev_put_track()->dev_put() and dev_hold_track()->dev_hold()] Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ax25/af_ax25.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -98,6 +98,7 @@ again: spin_unlock_bh(&ax25_list_lock); lock_sock(sk); s->ax25_dev = NULL; + dev_put(ax25_dev->dev); ax25_dev_put(ax25_dev); release_sock(sk); ax25_disconnect(s, ENETUNREACH); @@ -1122,8 +1123,10 @@ static int ax25_bind(struct socket *sock } }
- if (ax25_dev != NULL) + if (ax25_dev) { ax25_fillin_cb(ax25, ax25_dev); + dev_hold(ax25_dev->dev); + }
done: ax25_cb_add(ax25);
From: Duoming Zhou duoming@zju.edu.cn
commit 9fd75b66b8f68498454d685dc4ba13192ae069b0 upstream.
The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") and commit feef318c855a ("ax25: fix UAF bugs of net_device caused by rebinding operation") increase the refcounts of ax25_dev and net_device in ax25_bind() and decrease the matching refcounts in ax25_kill_by_device() in order to prevent UAF bugs, but there are reference count leaks.
The root cause of refcount leaks is shown below:
(Thread 1) | (Thread 2) ax25_bind() | ... | ax25_addr_ax25dev() | ax25_dev_hold() //(1) | ... | dev_hold_track() //(2) | ... | ax25_destroy_socket() | ax25_cb_del() | ... | hlist_del_init() //(3) | | (Thread 3) | ax25_kill_by_device() | ... | ax25_for_each(s, &ax25_list) { | if (s->ax25_dev == ax25_dev) //(4) | ... |
Firstly, we use ax25_bind() to increase the refcount of ax25_dev in position (1) and increase the refcount of net_device in position (2). Then, we use ax25_cb_del() invoked by ax25_destroy_socket() to delete ax25_cb in hlist in position (3) before calling ax25_kill_by_device(). Finally, the decrements of refcounts in ax25_kill_by_device() will not be executed, because no s->ax25_dev equals to ax25_dev in position (4).
This patch adds decrements of refcounts in ax25_release() and use lock_sock() to do synchronization. If refcounts decrease in ax25_release(), the decrements of refcounts in ax25_kill_by_device() will not be executed and vice versa.
Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") Fixes: 87563a043cef ("ax25: fix reference count leaks of ax25_dev") Fixes: feef318c855a ("ax25: fix UAF bugs of net_device caused by rebinding operation") Reported-by: Thomas Osterried thomas@osterried.de Signed-off-by: Duoming Zhou duoming@zju.edu.cn Signed-off-by: David S. Miller davem@davemloft.net [OP: backport to 5.10: adjust dev_put_track()->dev_put()] Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ax25/af_ax25.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
--- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -98,8 +98,10 @@ again: spin_unlock_bh(&ax25_list_lock); lock_sock(sk); s->ax25_dev = NULL; - dev_put(ax25_dev->dev); - ax25_dev_put(ax25_dev); + if (sk->sk_socket) { + dev_put(ax25_dev->dev); + ax25_dev_put(ax25_dev); + } release_sock(sk); ax25_disconnect(s, ENETUNREACH); spin_lock_bh(&ax25_list_lock); @@ -978,14 +980,20 @@ static int ax25_release(struct socket *s { struct sock *sk = sock->sk; ax25_cb *ax25; + ax25_dev *ax25_dev;
if (sk == NULL) return 0;
sock_hold(sk); - sock_orphan(sk); lock_sock(sk); + sock_orphan(sk); ax25 = sk_to_ax25(sk); + ax25_dev = ax25->ax25_dev; + if (ax25_dev) { + dev_put(ax25_dev->dev); + ax25_dev_put(ax25_dev); + }
if (sk->sk_type == SOCK_SEQPACKET) { switch (ax25->state) {
From: Duoming Zhou duoming@zju.edu.cn
commit 5352a761308397a0e6250fdc629bb3f615b94747 upstream.
There are UAF bugs in ax25_send_control(), when we call ax25_release() to deallocate ax25_dev. The possible race condition is shown below:
(Thread 1) | (Thread 2) ax25_dev_device_up() //(1) | | ax25_kill_by_device() ax25_bind() //(2) | ax25_connect() | ... ax25->state = AX25_STATE_1 | ... | ax25_dev_device_down() //(3)
(Thread 3) ax25_release() | ax25_dev_put() //(4) FREE | case AX25_STATE_1: | ax25_send_control() | alloc_skb() //USE |
The refcount of ax25_dev increases in position (1) and (2), and decreases in position (3) and (4). The ax25_dev will be freed before dereference sites in ax25_send_control().
The following is part of the report:
[ 102.297448] BUG: KASAN: use-after-free in ax25_send_control+0x33/0x210 [ 102.297448] Read of size 8 at addr ffff888009e6e408 by task ax25_close/602 [ 102.297448] Call Trace: [ 102.303751] ax25_send_control+0x33/0x210 [ 102.303751] ax25_release+0x356/0x450 [ 102.305431] __sock_release+0x6d/0x120 [ 102.305431] sock_close+0xf/0x20 [ 102.305431] __fput+0x11f/0x420 [ 102.305431] task_work_run+0x86/0xd0 [ 102.307130] get_signal+0x1075/0x1220 [ 102.308253] arch_do_signal_or_restart+0x1df/0xc00 [ 102.308253] exit_to_user_mode_prepare+0x150/0x1e0 [ 102.308253] syscall_exit_to_user_mode+0x19/0x50 [ 102.308253] do_syscall_64+0x48/0x90 [ 102.308253] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 102.308253] RIP: 0033:0x405ae7
This patch defers the free operation of ax25_dev and net_device after all corresponding dereference sites in ax25_release() to avoid UAF.
Fixes: 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()") Signed-off-by: Duoming Zhou duoming@zju.edu.cn Signed-off-by: Paolo Abeni pabeni@redhat.com [OP: backport to 5.10: adjust dev_put_track()->dev_put()] Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ax25/af_ax25.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -990,10 +990,6 @@ static int ax25_release(struct socket *s sock_orphan(sk); ax25 = sk_to_ax25(sk); ax25_dev = ax25->ax25_dev; - if (ax25_dev) { - dev_put(ax25_dev->dev); - ax25_dev_put(ax25_dev); - }
if (sk->sk_type == SOCK_SEQPACKET) { switch (ax25->state) { @@ -1055,6 +1051,10 @@ static int ax25_release(struct socket *s sk->sk_state_change(sk); ax25_destroy_socket(ax25); } + if (ax25_dev) { + dev_put(ax25_dev->dev); + ax25_dev_put(ax25_dev); + }
sock->sk = NULL; release_sock(sk);
From: Duoming Zhou duoming@zju.edu.cn
commit 7ec02f5ac8a5be5a3f20611731243dc5e1d9ba10 upstream.
The ax25_disconnect() in ax25_kill_by_device() is not protected by any locks, thus there is a race condition between ax25_disconnect() and ax25_destroy_socket(). when ax25->sk is assigned as NULL by ax25_destroy_socket(), a NULL pointer dereference bug will occur if site (1) or (2) dereferences ax25->sk.
ax25_kill_by_device() | ax25_release() ax25_disconnect() | ax25_destroy_socket() ... | if(ax25->sk != NULL) | ... ... | ax25->sk = NULL; bh_lock_sock(ax25->sk); //(1) | ... ... | bh_unlock_sock(ax25->sk); //(2)|
This patch moves ax25_disconnect() into lock_sock(), which can synchronize with ax25_destroy_socket() in ax25_release().
Fail log: =============================================================== BUG: kernel NULL pointer dereference, address: 0000000000000088 ... RIP: 0010:_raw_spin_lock+0x7e/0xd0 ... Call Trace: ax25_disconnect+0xf6/0x220 ax25_device_event+0x187/0x250 raw_notifier_call_chain+0x5e/0x70 dev_close_many+0x17d/0x230 rollback_registered_many+0x1f1/0x950 unregister_netdevice_queue+0x133/0x200 unregister_netdev+0x13/0x20 ...
Signed-off-by: Duoming Zhou duoming@zju.edu.cn Signed-off-by: David S. Miller davem@davemloft.net [OP: backport to 5.10: adjust context] Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ax25/af_ax25.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -102,8 +102,8 @@ again: dev_put(ax25_dev->dev); ax25_dev_put(ax25_dev); } - release_sock(sk); ax25_disconnect(s, ENETUNREACH); + release_sock(sk); spin_lock_bh(&ax25_list_lock); sock_put(sk); /* The entry could have been deleted from the
From: Duoming Zhou duoming@zju.edu.cn
commit fc6d01ff9ef03b66d4a3a23b46fc3c3d8cf92009 upstream.
The previous commit 7ec02f5ac8a5 ("ax25: fix NPD bug in ax25_disconnect") move ax25_disconnect into lock_sock() in order to prevent NPD bugs. But there are race conditions that may lead to null pointer dereferences in ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(), ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we use ax25_kill_by_device() to detach the ax25 device.
One of the race conditions that cause null pointer dereferences can be shown as below:
(Thread 1) | (Thread 2) ax25_connect() | ax25_std_establish_data_link() | ax25_start_t1timer() | mod_timer(&ax25->t1timer,..) | | ax25_kill_by_device() (wait a time) | ... | s->ax25_dev = NULL; //(1) ax25_t1timer_expiry() | ax25->ax25_dev->values[..] //(2)| ... ... |
We set null to ax25_cb->ax25_dev in position (1) and dereference the null pointer in position (2).
The corresponding fail log is shown below: =============================================================== BUG: kernel NULL pointer dereference, address: 0000000000000050 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.17.0-rc6-00794-g45690b7d0 RIP: 0010:ax25_t1timer_expiry+0x12/0x40 ... Call Trace: call_timer_fn+0x21/0x120 __run_timers.part.0+0x1ca/0x250 run_timer_softirq+0x2c/0x60 __do_softirq+0xef/0x2f3 irq_exit_rcu+0xb6/0x100 sysvec_apic_timer_interrupt+0xa2/0xd0 ...
This patch moves ax25_disconnect() before s->ax25_dev = NULL and uses del_timer_sync() to delete timers in ax25_disconnect(). If ax25_disconnect() is called by ax25_kill_by_device() or ax25->ax25_dev is NULL, the reason in ax25_disconnect() will be equal to ENETUNREACH, it will wait all timers to stop before we set null to s->ax25_dev in ax25_kill_by_device().
Fixes: 7ec02f5ac8a5 ("ax25: fix NPD bug in ax25_disconnect") Signed-off-by: Duoming Zhou duoming@zju.edu.cn Signed-off-by: David S. Miller davem@davemloft.net [OP: backport to 5.10: adjust context] Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ax25/af_ax25.c | 4 ++-- net/ax25/ax25_subr.c | 20 ++++++++++++++------ 2 files changed, 16 insertions(+), 8 deletions(-)
--- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -89,20 +89,20 @@ again: sk = s->sk; if (!sk) { spin_unlock_bh(&ax25_list_lock); - s->ax25_dev = NULL; ax25_disconnect(s, ENETUNREACH); + s->ax25_dev = NULL; spin_lock_bh(&ax25_list_lock); goto again; } sock_hold(sk); spin_unlock_bh(&ax25_list_lock); lock_sock(sk); + ax25_disconnect(s, ENETUNREACH); s->ax25_dev = NULL; if (sk->sk_socket) { dev_put(ax25_dev->dev); ax25_dev_put(ax25_dev); } - ax25_disconnect(s, ENETUNREACH); release_sock(sk); spin_lock_bh(&ax25_list_lock); sock_put(sk); --- a/net/ax25/ax25_subr.c +++ b/net/ax25/ax25_subr.c @@ -261,12 +261,20 @@ void ax25_disconnect(ax25_cb *ax25, int { ax25_clear_queues(ax25);
- if (!ax25->sk || !sock_flag(ax25->sk, SOCK_DESTROY)) - ax25_stop_heartbeat(ax25); - ax25_stop_t1timer(ax25); - ax25_stop_t2timer(ax25); - ax25_stop_t3timer(ax25); - ax25_stop_idletimer(ax25); + if (reason == ENETUNREACH) { + del_timer_sync(&ax25->timer); + del_timer_sync(&ax25->t1timer); + del_timer_sync(&ax25->t2timer); + del_timer_sync(&ax25->t3timer); + del_timer_sync(&ax25->idletimer); + } else { + if (!ax25->sk || !sock_flag(ax25->sk, SOCK_DESTROY)) + ax25_stop_heartbeat(ax25); + ax25_stop_t1timer(ax25); + ax25_stop_t2timer(ax25); + ax25_stop_t3timer(ax25); + ax25_stop_idletimer(ax25); + }
ax25->state = AX25_STATE_0;
From: Duoming Zhou duoming@zju.edu.cn
commit 82e31755e55fbcea6a9dfaae5fe4860ade17cbc0 upstream.
There are race conditions that may lead to UAF bugs in ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(), ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we call ax25_release() to deallocate ax25_dev.
One of the UAF bugs caused by ax25_release() is shown below:
(Thread 1) | (Thread 2) ax25_dev_device_up() //(1) | ... | ax25_kill_by_device() ax25_bind() //(2) | ax25_connect() | ... ax25_std_establish_data_link() | ax25_start_t1timer() | ax25_dev_device_down() //(3) mod_timer(&ax25->t1timer,..) | | ax25_release() (wait a time) | ... | ax25_dev_put(ax25_dev) //(4)FREE ax25_t1timer_expiry() | ax25->ax25_dev->values[..] //USE| ... ... |
We increase the refcount of ax25_dev in position (1) and (2), and decrease the refcount of ax25_dev in position (3) and (4). The ax25_dev will be freed in position (4) and be used in ax25_t1timer_expiry().
The fail log is shown below: ==============================================================
[ 106.116942] BUG: KASAN: use-after-free in ax25_t1timer_expiry+0x1c/0x60 [ 106.116942] Read of size 8 at addr ffff88800bda9028 by task swapper/0/0 [ 106.116942] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-06123-g0905eec574 [ 106.116942] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-14 [ 106.116942] Call Trace: ... [ 106.116942] ax25_t1timer_expiry+0x1c/0x60 [ 106.116942] call_timer_fn+0x122/0x3d0 [ 106.116942] __run_timers.part.0+0x3f6/0x520 [ 106.116942] run_timer_softirq+0x4f/0xb0 [ 106.116942] __do_softirq+0x1c2/0x651 ...
This patch adds del_timer_sync() in ax25_release(), which could ensure that all timers stop before we deallocate ax25_dev.
Signed-off-by: Duoming Zhou duoming@zju.edu.cn Signed-off-by: Paolo Abeni pabeni@redhat.com [OP: backport to 5.10: adjust context] Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ax25/af_ax25.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -1052,6 +1052,11 @@ static int ax25_release(struct socket *s ax25_destroy_socket(ax25); } if (ax25_dev) { + del_timer_sync(&ax25->timer); + del_timer_sync(&ax25->t1timer); + del_timer_sync(&ax25->t2timer); + del_timer_sync(&ax25->t3timer); + del_timer_sync(&ax25->idletimer); dev_put(ax25_dev->dev); ax25_dev_put(ax25_dev); }
On 4/18/2022 5:12 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.112 release. There are 105 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 20 Apr 2022 12:11:14 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.112-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels:
Tested-by: Florian Fainelli f.fainelli@gmail.com
On Mon, Apr 18, 2022 at 02:12:02PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.112 release. There are 105 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 20 Apr 2022 12:11:14 +0000. Anything received after that time might be too late.
Build results: total: 161 pass: 161 fail: 0 Qemu test results: total: 477 pass: 477 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On 4/18/22 6:12 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.112 release. There are 105 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 20 Apr 2022 12:11:14 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.112-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On Mon, 18 Apr 2022 at 18:06, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.10.112 release. There are 105 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 20 Apr 2022 12:11:14 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.112-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 5.10.112-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-5.10.y * git commit: d5c581fe77b5122ed284c7739724414ca5059b0e * git describe: v5.10.111-106-gd5c581fe77b5 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10....
## Test Regressions (compared to v5.10.110-171-g6c8e5cb264df) No test regressions found.
## Metric Regressions (compared to v5.10.110-171-g6c8e5cb264df) No metric regressions found.
## Test Fixes (compared to v5.10.110-171-g6c8e5cb264df) No metric fixes found.
## Metric Fixes (compared to v5.10.110-171-g6c8e5cb264df) No metric fixes found.
## Test result summary total: 95928, pass: 81529, fail: 646, skip: 13045, xfail: 708
## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 296 total, 296 passed, 0 failed * arm64: 47 total, 47 passed, 0 failed * i386: 44 total, 41 passed, 3 failed * mips: 41 total, 38 passed, 3 failed * parisc: 14 total, 14 passed, 0 failed * powerpc: 65 total, 53 passed, 12 failed * riscv: 32 total, 29 passed, 3 failed * s390: 26 total, 26 passed, 0 failed * sh: 26 total, 24 passed, 2 failed * sparc: 14 total, 14 passed, 0 failed * x86_64: 47 total, 47 passed, 0 failed
## Test suites summary * fwts * igt-gpu-tools * kselftest-android * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * packetdrill * perf * perf/Zstd-perf.data-compression * rcutorture * ssuite * v4l2-compliance * vdso
-- Linaro LKFT https://lkft.linaro.org
Hi Greg,
On Mon, Apr 18, 2022 at 02:12:02PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.112 release. There are 105 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 20 Apr 2022 12:11:14 +0000. Anything received after that time might be too late.
Build test: mips (gcc version 11.2.1 20220314): 63 configs -> no failure arm (gcc version 11.2.1 20220314): 105 configs -> no new failure arm64 (gcc version 11.2.1 20220314): 3 configs -> no failure x86_64 (gcc version 11.2.1 20220314): 4 configs -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression. [1] arm64: Booted on rpi4b (4GB model). No regression. [2]
[1]. https://openqa.qa.codethink.co.uk/tests/1036 [2]. https://openqa.qa.codethink.co.uk/tests/1037
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
-- Regards Sudip
On Mon, 18 Apr 2022 14:12:02 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.112 release. There are 105 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 20 Apr 2022 12:11:14 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.112-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.10: 10 builds: 10 pass, 0 fail 28 boots: 28 pass, 0 fail 75 tests: 75 pass, 0 fail
Linux version: 5.10.112-rc1-gd5c581fe77b5 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On 2022/4/18 20:12, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.112 release. There are 105 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 20 Apr 2022 12:11:14 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.112-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Tested on arm64 and x86 for 5.10.112-rc1,
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Branch: linux-5.10.y Version: 5.10.112-rc1 Commit: d5c581fe77b5122ed284c7739724414ca5059b0e Compiler: gcc version 7.3.0 (GCC)
arm64: -------------------------------------------------------------------- Testcase Result Summary: total: 9031 passed: 9031 failed: 0 timeout: 0 --------------------------------------------------------------------
x86: -------------------------------------------------------------------- Testcase Result Summary: total: 9031 passed: 9031 failed: 0 timeout: 0 --------------------------------------------------------------------
Tested-by: Hulk Robot hulkrobot@huawei.com
linux-stable-mirror@lists.linaro.org