This is the start of the stable review cycle for the 5.15.39 release. There are 135 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 12 May 2022 13:07:16 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.39-rc1...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.15.39-rc1
Marek Behún kabel@kernel.org PCI: aardvark: Update comment about link going down after link-up
Marek Behún kabel@kernel.org PCI: aardvark: Drop __maybe_unused from advk_pcie_disable_phy()
Pali Rohár pali@kernel.org PCI: aardvark: Don't mask irq when mapping
Pali Rohár pali@kernel.org PCI: aardvark: Remove irq_mask_ack() callback for INTx interrupts
Pali Rohár pali@kernel.org PCI: aardvark: Use separate INTA interrupt for emulated root bridge
Pali Rohár pali@kernel.org PCI: aardvark: Fix support for PME requester on emulated bridge
Pali Rohár pali@kernel.org PCI: aardvark: Add support for PME interrupts
Pali Rohár pali@kernel.org PCI: aardvark: Optimize writing PCI_EXP_RTCTL_PMEIE and PCI_EXP_RTSTA_PME on emulated bridge
Pali Rohár pali@kernel.org PCI: aardvark: Add support for ERR interrupt on emulated bridge
Pali Rohár pali@kernel.org PCI: aardvark: Enable MSI-X support
Pali Rohár pali@kernel.org PCI: aardvark: Fix setting MSI address
Pali Rohár pali@kernel.org PCI: aardvark: Add support for masking MSI interrupts
Pali Rohár pali@kernel.org PCI: aardvark: Refactor unmasking summary MSI interrupt
Marek Behún kabel@kernel.org PCI: aardvark: Use dev_fwnode() instead of of_node_to_fwnode(dev->of_node)
Marek Behún kabel@kernel.org PCI: aardvark: Make msi_domain_info structure a static driver structure
Marek Behún kabel@kernel.org PCI: aardvark: Make MSI irq_chip structures static driver structures
Pali Rohár pali@kernel.org PCI: aardvark: Check return value of generic_handle_domain_irq() when processing INTx IRQ
Pali Rohár pali@kernel.org PCI: aardvark: Rewrite IRQ code to chained IRQ handler
Pali Rohár pali@kernel.org PCI: aardvark: Replace custom PCIE_CORE_INT_* macros with PCI_INTERRUPT_*
Pali Rohár pali@kernel.org PCI: aardvark: Disable common PHY when unbinding driver
Pali Rohár pali@kernel.org PCI: aardvark: Disable link training when unbinding driver
Pali Rohár pali@kernel.org PCI: aardvark: Assert PERST# when unbinding driver
Pali Rohár pali@kernel.org PCI: aardvark: Fix memory leak in driver unbind
Pali Rohár pali@kernel.org PCI: aardvark: Mask all interrupts when unbinding driver
Pali Rohár pali@kernel.org PCI: aardvark: Disable bus mastering when unbinding driver
Pali Rohár pali@kernel.org PCI: aardvark: Comment actions in driver remove method
Pali Rohár pali@kernel.org PCI: aardvark: Clear all MSIs at setup
Pali Rohár pali@kernel.org PCI: aardvark: Add support for DEVCAP2, DEVCTL2, LNKCAP2 and LNKCTL2 registers on emulated bridge
Pali Rohár pali@kernel.org PCI: pci-bridge-emul: Add definitions for missing capabilities registers
Pali Rohár pali@kernel.org PCI: pci-bridge-emul: Add description for class_revision field
Frederic Weisbecker frederic@kernel.org rcu: Apply callbacks processing time limit only on softirq
Frederic Weisbecker frederic@kernel.org rcu: Fix callbacks processing time limit retaining cond_resched()
Helge Deller deller@gmx.de Revert "parisc: Mark sched_clock unstable only if clocks are not syncronized"
Ricky WU ricky_wu@realtek.com mmc: rtsx: add 74 Clocks in power on flow
Sidhartha Kumar sidhartha.kumar@oracle.com selftest/vm: verify remap destination address in mremap_test
Sidhartha Kumar sidhartha.kumar@oracle.com selftest/vm: verify mmap addr in mremap_test
Wanpeng Li wanpengli@tencent.com KVM: LAPIC: Enable timer posted-interrupt only when mwait/hlt is advertised
Paolo Bonzini pbonzini@redhat.com KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs
Paolo Bonzini pbonzini@redhat.com KVM: x86: Do not change ICR on write to APIC_SELF_IPI
Wanpeng Li wanpengli@tencent.com x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
Thomas Huth thuth@redhat.com KVM: selftests: Silence compiler warning in the kvm_page_table_test
Paolo Bonzini pbonzini@redhat.com kvm: selftests: do not use bitfields larger than 32-bits for PTEs
Hector Martin marcan@marcan.st iommu/dart: Add missing module owner to ops structure
Vlad Buslov vladbu@nvidia.com net/mlx5e: Lag, Don't skip fib events on current dst
Vlad Buslov vladbu@nvidia.com net/mlx5e: Lag, Fix fib_info pointer assignment
Vlad Buslov vladbu@nvidia.com net/mlx5e: Lag, Fix use-after-free in fib event handler
Aya Levin ayal@nvidia.com net/mlx5: Fix slab-out-of-bounds while reading resource dump menu
Javier Martinez Canillas javierm@redhat.com fbdev: Make fb_release() return -ENODEV if fbdev was unregistered
Sandipan Das sandipan.das@amd.com kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMU
Baruch Siach baruch@tkos.co.il gpio: mvebu: drop pwm base assignment
Kai-Heng Feng kai.heng.feng@canonical.com drm/amdgpu: Ensure HDA function is suspended before ASIC reset
Mario Limonciello mario.limonciello@amd.com drm/amdgpu: don't set s3 and s0ix at the same time
Mario Limonciello mario.limonciello@amd.com drm/amdgpu: explicitly check for s0ix when evicting resources
Nirmoy Das nirmoy.das@amd.com drm/amdgpu: unify BO evicting method in amdgpu_ttm
Filipe Manana fdmanana@suse.com btrfs: always log symlinks in full mode
Qu Wenruo wqu@suse.com btrfs: force v2 space cache usage for subpage mount
Sergey Shtylyov s.shtylyov@omp.ru smsc911x: allow using IRQ0
Vladimir Oltean vladimir.oltean@nxp.com selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer
Michael Chan michael.chan@broadcom.com bnxt_en: Fix unnecessary dropping of RX packets
Somnath Kotur somnath.kotur@broadcom.com bnxt_en: Fix possible bnxt_open() failure caused by wrong RFS flag
Ido Schimmel idosch@nvidia.com selftests: mirror_gre_bridge_1q: Avoid changing PVID while interface is operational
David Howells dhowells@redhat.com rxrpc: Enable IPv6 checksums on transport socket
Eric Dumazet edumazet@google.com mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter()
Qiao Ma mqaio@linux.alibaba.com hinic: fix bug of wq out of bound access
Filipe Manana fdmanana@suse.com btrfs: do not BUG_ON() on failure to update inode when setting xattr
Kuogee Hsieh quic_khsieh@quicinc.com drm/msm/dp: remove fail safe mode related code
Marc Kleine-Budde mkl@pengutronix.de selftests/net: so_txtime: usage(): fix documentation of default clock
Marc Kleine-Budde mkl@pengutronix.de selftests/net: so_txtime: fix parsing of start time stamp on 32 bit systems
Shravya Kumbham shravya.kumbham@xilinx.com net: emaclite: Add error handling for of_address_to_resource()
Eric Dumazet edumazet@google.com net: igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter()
Yang Yingliang yangyingliang@huawei.com net: cpsw: add missing of_node_put() in cpsw_probe_dt()
Niels Dossche dossche.niels@gmail.com net: mdio: Fix ENOMEM return value in BCM6368 mux bus controller
Yang Yingliang yangyingliang@huawei.com net: stmmac: dwmac-sun8i: add missing of_node_put() in sun8i_dwmac_register_mdio_mux()
Yang Yingliang yangyingliang@huawei.com net: dsa: mt7530: add missing of_node_put() in mt7530_setup()
Yang Yingliang yangyingliang@huawei.com net: ethernet: mediatek: add missing of_node_put() in mtk_sgmii_init()
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: Don't invalidate inode attributes on delegation return
Mustafa Ismail mustafa.ismail@intel.com RDMA/irdma: Fix possible crash due to NULL netdev in notifier
Shiraz Saleem shiraz.saleem@intel.com RDMA/irdma: Reduce iWARP QP destroy time
Tatyana Nikolova tatyana.e.nikolova@intel.com RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state
Cheng Xu chengyou@linux.alibaba.com RDMA/siw: Fix a condition race issue in MPA request processing
Olga Kornievskaia kolga@netapp.com SUNRPC release the transport of a relocated task with an assigned transport
Jann Horn jannh@google.com selftests/seccomp: Don't call read() on TTY from background pgrp
Moshe Shemesh moshe@nvidia.com net/mlx5: Fix deadlock in sync reset flow
Moshe Shemesh moshe@nvidia.com net/mlx5: Avoid double clear or set of sync reset requested
Mark Zhang markzhang@nvidia.com net/mlx5e: Fix the calling of update_buffer_lossy() API
Paul Blakey paulb@nvidia.com net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release
Vlad Buslov vladbu@nvidia.com net/mlx5e: Don't match double-vlan packets if cvlan is not set
Moshe Tal moshet@nvidia.com net/mlx5e: Fix trust state reset in reload
Yang Yingliang yangyingliang@huawei.com iommu/dart: check return value after calling platform_get_resource()
Lu Baolu baolu.lu@linux.intel.com iommu/vt-d: Drop stop marker messages
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com ASoC: soc-ops: fix error handling
Codrin Ciubotariu codrin.ciubotariu@microchip.com ASoC: dmaengine: Restore NULL prepare_slave_config() callback
Adam Wujek dev_public@wujek.eu hwmon: (pmbus) disable PEC if not enabled
Armin Wolf W_Armin@gmx.de hwmon: (adt7470) Fix warning on module removal
Puyou Lu puyou.lu@gmail.com gpio: pca953x: fix irq_stat not updated when irq is disabled (irq_mask not set)
Nobuhiro Iwamatsu nobuhiro1.iwamatsu@toshiba.co.jp gpio: visconti: Fix fwnode of GPIO IRQ
Duoming Zhou duoming@zju.edu.cn NFC: netlink: fix sleep in atomic bug when firmware download timeout
Duoming Zhou duoming@zju.edu.cn nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
Duoming Zhou duoming@zju.edu.cn nfc: replace improper check device_is_registered() in netlink related functions
Andreas Larsson andreas@gaisler.com can: grcan: only use the NAPI poll budget for RX
Andreas Larsson andreas@gaisler.com can: grcan: grcan_probe(): fix broken system id check for errata workaround needs
Daniel Hellstrom daniel@gaisler.com can: grcan: use ofdev->dev when allocating DMA memory
Oliver Hartkopp socketcan@hartkopp.net can: isotp: remove re-binding of bound socket
Duoming Zhou duoming@zju.edu.cn can: grcan: grcan_close(): fix deadlock
Jan Höppner hoeppner@linux.ibm.com s390/dasd: Fix read inconsistency for ESE DASD devices
Jan Höppner hoeppner@linux.ibm.com s390/dasd: Fix read for ESE with blksize < 4k
Stefan Haberland sth@linux.ibm.com s390/dasd: prevent double format of tracks for ESE devices
Stefan Haberland sth@linux.ibm.com s390/dasd: fix data corruption for ESE devices
Mark Brown broonie@kernel.org ASoC: meson: Fix event generation for AUI CODEC mux
Mark Brown broonie@kernel.org ASoC: meson: Fix event generation for G12A tohdmi mux
Mark Brown broonie@kernel.org ASoC: meson: Fix event generation for AUI ACODEC mux
Mark Brown broonie@kernel.org ASoC: wm8958: Fix change notifications for DSP controls
Mark Brown broonie@kernel.org ASoC: da7219: Fix change notifications for tone generator frequency
Thomas Pfaff tpfaff@pcs.com genirq: Synchronize interrupt thread startup
Tan Tee Min tee.min.tan@linux.intel.com net: stmmac: disable Split Header (SPH) for Intel platforms
Niels Dossche dossche.niels@gmail.com firewire: core: extend card->lock in fw_core_handle_bus_reset
Jakob Koschel jakobkoschel@gmail.com firewire: remove check of list iterator against head past the loop body
Chengfeng Ye cyeaa@connect.ust.hk firewire: fix potential uaf in outbound_phy_packet_callback()
Kurt Kanzenbach kurt@linutronix.de timekeeping: Mark NMI safe time accessors as notrace
Trond Myklebust trond.myklebust@hammerspace.com Revert "SUNRPC: attempt AF_LOCAL connect on setup"
Nick Kossifidis mick@ics.forth.gr RISC-V: relocate DTB if it's outside memory region
Marek Marczykowski-Górecki marmarek@invisiblethingslab.com drm/amdgpu: do not use passthrough mode in Xen dom0
Harry Wentland harry.wentland@amd.com drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNT
Nicolin Chen nicolinc@nvidia.com iommu/arm-smmu-v3: Fix size calculation in arm_smmu_mm_invalidate_range()
David Stevens stevensd@chromium.org iommu/vt-d: Calculate mask for non-aligned flushes
Kyle Huey me@kylehuey.com KVM: x86/svm: Account for family 17h event renumberings in amd_pmc_perf_hw_id
Thomas Gleixner tglx@linutronix.de x86/fpu: Prevent FPU state corruption
Andrei Lalaev andrei.lalaev@emlid.com gpiolib: of: fix bounds check for 'gpio-reserved-ranges'
Brian Norris briannorris@chromium.org mmc: core: Set HS clock speed before sending HS CMD13
Samuel Holland samuel@sholland.org mmc: sunxi-mmc: Fix DMA descriptors allocated above 32 bits
Shaik Sajida Bhanu quic_c_sbhanu@quicinc.com mmc: sdhci-msm: Reset GCC_SDCC_BCR register for SDHC
Takashi Sakamoto o-takashi@sakamocchi.jp ALSA: fireworks: fix wrong return count shorter than expected by 4 bytes
Zihao Wang wzhd@ustc.edu ALSA: hda/realtek: Add quirk for Yoga Duet 7 13ITL6 speakers
Helge Deller deller@gmx.de parisc: Merge model and model name into one line in /proc/cpuinfo
Maciej W. Rozycki macro@orcam.me.uk MIPS: Fix CP0 counter erratum detection for R4k CPUs
-------------
Diffstat:
 Makefile | 4 +-
 arch/mips/include/asm/timex.h | 8 +-
 arch/mips/kernel/time.c | 11 +-
 arch/parisc/kernel/processor.c | 3 +-
 arch/parisc/kernel/setup.c | 2 +
 arch/parisc/kernel/time.c | 6 +-
 arch/riscv/mm/init.c | 21 +-
 arch/x86/kernel/fpu/core.c | 67 ++--
 arch/x86/kernel/kvm.c | 13 +
 arch/x86/kvm/cpuid.c | 5 +
 arch/x86/kvm/lapic.c | 10 +-
 arch/x86/kvm/mmu/mmu.c | 2 +
 arch/x86/kvm/svm/pmu.c | 28 +-
 drivers/firewire/core-card.c | 3 +
 drivers/firewire/core-cdev.c | 4 +-
 drivers/firewire/core-topology.c | 9 +-
 drivers/firewire/core-transaction.c | 30 +-
 drivers/firewire/sbp2.c | 13 +-
 drivers/gpio/gpio-mvebu.c | 7 -
 drivers/gpio/gpio-pca953x.c | 4 +-
 drivers/gpio/gpio-visconti.c | 7 +-
 drivers/gpio/gpiolib-of.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 30 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 24 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 23 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 30 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 4 +-
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 +-
 drivers/gpu/drm/msm/dp/dp_display.c | 6 -
 drivers/gpu/drm/msm/dp/dp_panel.c | 11 -
 drivers/gpu/drm/msm/dp/dp_panel.h | 1 -
 drivers/hwmon/adt7470.c | 4 +-
 drivers/hwmon/pmbus/pmbus_core.c | 3 +
 drivers/infiniband/hw/irdma/cm.c | 26 +-
 drivers/infiniband/hw/irdma/utils.c | 21 +-
 drivers/infiniband/hw/irdma/verbs.c | 4 +-
 drivers/infiniband/sw/siw/siw_cm.c | 7 +-
 drivers/iommu/apple-dart.c | 10 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 9 +-
 drivers/iommu/intel/iommu.c | 27 +-
 drivers/iommu/intel/svm.c | 4 +
 drivers/mmc/core/mmc.c | 23 +-
 drivers/mmc/host/rtsx_pci_sdmmc.c | 29 +-
 drivers/mmc/host/sdhci-msm.c | 42 ++
 drivers/mmc/host/sunxi-mmc.c | 5 +-
 drivers/net/can/grcan.c | 46 +--
 drivers/net/dsa/mt7530.c | 1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 13 +-
 drivers/net/ethernet/huawei/hinic/hinic_hw_wq.c | 7 +-
 drivers/net/ethernet/mediatek/mtk_sgmii.c | 1 +
 .../ethernet/mellanox/mlx5/core/diag/rsc_dump.c | 31 +-
 .../ethernet/mellanox/mlx5/core/en/port_buffer.c | 4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 4 +
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 10 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 11 +
 drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c | 60 +--
 drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c | 38 +-
 drivers/net/ethernet/mellanox/mlx5/core/lag_mp.h | 7 +-
 drivers/net/ethernet/smsc/smsc911x.c | 2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c | 1 +
 drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
 drivers/net/ethernet/ti/cpsw_new.c | 5 +-
 drivers/net/ethernet/xilinx/xilinx_emaclite.c | 15 +-
 drivers/net/mdio/mdio-mux-bcm6368.c | 2 +-
 drivers/nfc/nfcmrvl/main.c | 2 +-
 drivers/pci/controller/pci-aardvark.c | 428 ++++++++++++++++-----
 drivers/pci/pci-bridge-emul.c | 49 ++-
 drivers/s390/block/dasd.c | 18 +-
 drivers/s390/block/dasd_eckd.c | 28 +-
 drivers/s390/block/dasd_int.h | 14 +
 drivers/video/fbdev/core/fbmem.c | 5 +-
 fs/btrfs/disk-io.c | 11 +
 fs/btrfs/tree-log.c | 14 +-
 fs/btrfs/xattr.c | 6 +-
 fs/nfs/nfs4proc.c | 12 +-
 include/linux/stmmac.h | 1 +
 kernel/irq/internals.h | 2 +
 kernel/irq/irqdesc.c | 2 +
 kernel/irq/manage.c | 39 +-
 kernel/rcu/tree.c | 31 +-
 kernel/time/timekeeping.c | 4 +-
 net/can/isotp.c | 22 +-
 net/ipv4/igmp.c | 9 +-
 net/ipv6/mcast.c | 8 +-
 net/nfc/core.c | 29 +-
 net/nfc/netlink.c | 4 +-
 net/rxrpc/local_object.c | 3 +
 net/sunrpc/clnt.c | 11 +-
 net/sunrpc/xprtsock.c | 3 -
 sound/firewire/fireworks/fireworks_hwdep.c | 1 +
 sound/pci/hda/patch_realtek.c | 1 +
 sound/soc/codecs/da7219.c | 14 +-
 sound/soc/codecs/wm8958-dsp2.c | 8 +-
 sound/soc/meson/aiu-acodec-ctrl.c | 2 +-
 sound/soc/meson/aiu-codec-ctrl.c | 2 +-
 sound/soc/meson/g12a-tohdmitx.c | 2 +-
 sound/soc/soc-generic-dmaengine-pcm.c | 6 +-
 sound/soc/soc-ops.c | 2 +-
 .../drivers/net/ocelot/tc_flower_chains.sh | 2 +-
 .../selftests/kvm/include/x86_64/processor.h | 15 +
 tools/testing/selftests/kvm/kvm_page_table_test.c | 2 +-
 tools/testing/selftests/kvm/lib/x86_64/processor.c | 192 ++++-----
 .../net/forwarding/mirror_gre_bridge_1q.sh | 3 +
 tools/testing/selftests/net/so_txtime.c | 4 +-
 tools/testing/selftests/seccomp/seccomp_bpf.c | 10 +-
 tools/testing/selftests/vm/mremap_test.c | 53 +++
 110 files changed, 1293 insertions(+), 656 deletions(-)
From: Maciej W. Rozycki macro@orcam.me.uk
commit f0a6c68f69981214cb7858738dd2bc81475111f7 upstream.
Fix the discrepancy between the two places where we check for the CP0 counter erratum, along with the incorrect comparison of the R4400 revision number against 0x30, which matches none, and consistently consider all R4000 and R4400 processors affected, as documented in processor errata publications[1][2][3], following the mapping between CP0 PRId register values and processor models:

  PRId   |  Processor Model
---------+--------------------
00000422 | R4000 Revision 2.2
00000430 | R4000 Revision 3.0
00000440 | R4400 Revision 1.0
00000450 | R4400 Revision 2.0
00000460 | R4400 Revision 3.0
No other revision of either processor has ever been spotted.
Contrary to what has been stated in commit ce202cbb9e0b ("[MIPS] Assume R4000/R4400 newer than 3.0 don't have the mfc0 count bug") marking the CP0 counter as buggy does not preclude it from being used as either a clock event or a clock source device. It just cannot be used as both at a time, because in that case clock event interrupts will be occasionally lost, and the use as a clock event device takes precedence.
Compare against 0x4ff in `can_use_mips_counter' so that a single machine instruction is produced.
[1] "MIPS R4000PC/SC Errata, Processor Revision 2.2 and 3.0", MIPS Technologies Inc., May 10, 1994, Erratum 53, p.13
[2] "MIPS R4400PC/SC Errata, Processor Revision 1.0", MIPS Technologies Inc., February 9, 1994, Erratum 21, p.4
[3] "MIPS R4400PC/SC Errata, Processor Revision 2.0 & 3.0", MIPS Technologies Inc., January 24, 1995, Erratum 14, p.3
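As a rough illustration of the 0x4ff comparison described above, the following is a minimal sketch only. The two macros are local copies assumed to mirror the kernel's MIPS definitions, and `counter_usable()` is a hypothetical stand-in for the PRId test in `can_use_mips_counter()` that leaves out the `cpu_has_counter` check.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror arch/mips/include/asm/cpu.h; verify against the tree. */
#define PRID_IMP_R4000			0x0400
#define PRID_REV_ENCODE_44(ver, rev)	((ver) << 4 | (rev))

/* The threshold named in the commit message: implementation 0x04, revision 0xff. */
static_assert((PRID_IMP_R4000 | PRID_REV_ENCODE_44(15, 15)) == 0x4ff,
	      "threshold should be 0x4ff");

/*
 * Hypothetical stand-in for the PRId test in can_use_mips_counter().
 * Every R4000/R4400 PRId listed in the table (0x422 to 0x460) is at or
 * below 0x4ff, so all of them are treated as affected by the erratum.
 */
static inline bool counter_usable(unsigned int prid)
{
	return prid > (PRID_IMP_R4000 | PRID_REV_ENCODE_44(15, 15));
}
```

With this sketch, `counter_usable(0x460)` is false for the newest listed R4400, while any PRId above 0x4ff passes with a single unsigned comparison.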
Signed-off-by: Maciej W. Rozycki macro@orcam.me.uk
Fixes: ce202cbb9e0b ("[MIPS] Assume R4000/R4400 newer than 3.0 don't have the mfc0 count bug")
Cc: stable@vger.kernel.org # v2.6.24+
Reviewed-by: Philippe Mathieu-Daudé f4bug@amsat.org
Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/mips/include/asm/timex.h | 8 ++++----
 arch/mips/kernel/time.c | 11 +++--------
 2 files changed, 7 insertions(+), 12 deletions(-)

--- a/arch/mips/include/asm/timex.h
+++ b/arch/mips/include/asm/timex.h
@@ -40,9 +40,9 @@ typedef unsigned int cycles_t;
 
 /*
- * On R4000/R4400 before version 5.0 an erratum exists such that if the
- * cycle counter is read in the exact moment that it is matching the
- * compare register, no interrupt will be generated.
+ * On R4000/R4400 an erratum exists such that if the cycle counter is
+ * read in the exact moment that it is matching the compare register,
+ * no interrupt will be generated.
  *
  * There is a suggested workaround and also the erratum can't strike if
  * the compare interrupt isn't being used as the clock source device.
@@ -63,7 +63,7 @@ static inline int can_use_mips_counter(u
 	if (!__builtin_constant_p(cpu_has_counter))
 		asm volatile("" : "=m" (cpu_data[0].options));
 	if (likely(cpu_has_counter &&
-		   prid >= (PRID_IMP_R4000 | PRID_REV_ENCODE_44(5, 0))))
+		   prid > (PRID_IMP_R4000 | PRID_REV_ENCODE_44(15, 15))))
 		return 1;
 	else
 		return 0;
--- a/arch/mips/kernel/time.c
+++ b/arch/mips/kernel/time.c
@@ -141,15 +141,10 @@ static __init int cpu_has_mfc0_count_bug
 	case CPU_R4400MC:
 		/*
 		 * The published errata for the R4400 up to 3.0 say the CPU
-		 * has the mfc0 from count bug.
+		 * has the mfc0 from count bug. This seems the last version
+		 * produced.
 		 */
-		if ((current_cpu_data.processor_id & 0xff) <= 0x30)
-			return 1;
-
-		/*
-		 * we assume newer revisions are ok
-		 */
-		return 0;
+		return 1;
 	}
 
 	return 0;
From: Helge Deller deller@gmx.de
commit 5b89966bc96a06f6ad65f64ae4b0461918fcc9d3 upstream.
The Linux tool "lscpu" shows double the number of CPUs if "model" and "model name" appear on two different lines in /proc/cpuinfo. This change combines the model and the model name into one line.

Signed-off-by: Helge Deller deller@gmx.de
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/parisc/kernel/processor.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/arch/parisc/kernel/processor.c
+++ b/arch/parisc/kernel/processor.c
@@ -418,8 +418,7 @@ show_cpuinfo (struct seq_file *m, void *
 		}
 		seq_printf(m, " (0x%02lx)\n", boot_cpu_data.pdc.capabilities);
 
-		seq_printf(m, "model\t\t: %s\n"
-				"model name\t: %s\n",
+		seq_printf(m, "model\t\t: %s - %s\n",
 			 boot_cpu_data.pdc.sys_model_name,
 			 cpuinfo->dev ?
 			 cpuinfo->dev->name : "Unknown");
From: Zihao Wang wzhd@ustc.edu
commit 3b79954fd00d540677c97a560622b73f3a1f4e28 upstream.
Lenovo Yoga Duet 7 13ITL6 has Realtek ALC287 and built-in speakers do not work out of the box. The fix developed for Yoga 7i 14ITL5 also enables speaker output for this model.
Signed-off-by: Zihao Wang wzhd@ustc.edu
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220424084120.74125-1-wzhd@ustc.edu
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 sound/pci/hda/patch_realtek.c | 1 +
 1 file changed, 1 insertion(+)

--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -9034,6 +9034,7 @@ static const struct snd_pci_quirk alc269
 	SND_PCI_QUIRK(0x17aa, 0x3813, "Legion 7i 15IMHG05", ALC287_FIXUP_LEGION_15IMHG05_SPEAKERS),
 	SND_PCI_QUIRK(0x17aa, 0x3818, "Lenovo C940", ALC298_FIXUP_LENOVO_SPK_VOLUME),
 	SND_PCI_QUIRK(0x17aa, 0x3819, "Lenovo 13s Gen2 ITL", ALC287_FIXUP_13S_GEN2_SPEAKERS),
+	SND_PCI_QUIRK(0x17aa, 0x3820, "Yoga Duet 7 13ITL6", ALC287_FIXUP_YOGA7_14ITL_SPEAKERS),
 	SND_PCI_QUIRK(0x17aa, 0x3824, "Legion Y9000X 2020", ALC285_FIXUP_LEGION_Y9000X_SPEAKERS),
 	SND_PCI_QUIRK(0x17aa, 0x3827, "Ideapad S740", ALC285_FIXUP_IDEAPAD_S740_COEF),
 	SND_PCI_QUIRK(0x17aa, 0x3834, "Lenovo IdeaPad Slim 9i 14ITL5", ALC287_FIXUP_YOGA7_14ITL_SPEAKERS),
From: Takashi Sakamoto o-takashi@sakamocchi.jp
commit eb9d84b0ffe39893cb23b0b6712bbe3637fa25fa upstream.
The ALSA fireworks driver has had a bug since its initial version: it returns a count 4 bytes shorter than expected to userspace applications when handling a response frame for an Echo Audio Fireworks transaction. This is because the size of the event type is not added to the count in the ALSA firewire stack.
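The accounting problem can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the driver code: `model_read_resp()` and its parameters are hypothetical, and `memcpy()` stands in for `copy_to_user()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace model of the read path: a 4-byte event type is
 * written first, followed by the response payload.  The returned count
 * has to cover both parts.
 */
static inline long model_read_resp(uint8_t *buf, size_t remained,
				   uint32_t type,
				   const uint8_t *resp, size_t resp_len)
{
	long count = 0;

	assert(buf != NULL && resp != NULL);

	if (remained < sizeof(type))
		return -1;

	memcpy(buf, &type, sizeof(type));
	count += (long)sizeof(type);	/* the addition the fix introduces */
	remained -= sizeof(type);
	buf += sizeof(type);

	/* Copy as much of the payload as still fits. */
	if (resp_len > remained)
		resp_len = remained;
	memcpy(buf, resp, resp_len);
	count += (long)resp_len;

	return count;
}
```

Without the marked line, an 8-byte response would be reported as 8 bytes even though 12 bytes were written to the caller's buffer.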
Fixes: 555e8a8f7f14 ("ALSA: fireworks: Add command/response functionality into hwdep interface")
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp
Link: https://lore.kernel.org/r/20220424102428.21109-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 sound/firewire/fireworks/fireworks_hwdep.c | 1 +
 1 file changed, 1 insertion(+)

--- a/sound/firewire/fireworks/fireworks_hwdep.c
+++ b/sound/firewire/fireworks/fireworks_hwdep.c
@@ -34,6 +34,7 @@ hwdep_read_resp_buf(struct snd_efw *efw,
 	type = SNDRV_FIREWIRE_EVENT_EFW_RESPONSE;
 	if (copy_to_user(buf, &type, sizeof(type)))
 		return -EFAULT;
+	count += sizeof(type);
 	remained -= sizeof(type);
 	buf += sizeof(type);
From: Shaik Sajida Bhanu quic_c_sbhanu@quicinc.com
commit 3e5a8e8494a8122fe4eb3f167662f406cab753b9 upstream.
Reset the GCC_SDCC_BCR register before every fresh initialization. This resets the whole SDHC-msm controller, clears the previous power control states, and avoids software reset timeout issues such as the one below.
[ 5.458061][ T262] mmc1: Reset 0x1 never completed.
[ 5.462454][ T262] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 5.469065][ T262] mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00007202
[ 5.475688][ T262] mmc1: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000
[ 5.482315][ T262] mmc1: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000
[ 5.488927][ T262] mmc1: sdhci: Present: 0x01f800f0 | Host ctl: 0x00000000
[ 5.495539][ T262] mmc1: sdhci: Power: 0x00000000 | Blk gap: 0x00000000
[ 5.502162][ T262] mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x00000003
[ 5.508768][ T262] mmc1: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000
[ 5.515381][ T262] mmc1: sdhci: Int enab: 0x00000000 | Sig enab: 0x00000000
[ 5.521996][ T262] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[ 5.528607][ T262] mmc1: sdhci: Caps: 0x362dc8b2 | Caps_1: 0x0000808f
[ 5.535227][ T262] mmc1: sdhci: Cmd: 0x00000000 | Max curr: 0x00000000
[ 5.541841][ T262] mmc1: sdhci: Resp[0]: 0x00000000 | Resp[1]: 0x00000000
[ 5.548454][ T262] mmc1: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
[ 5.555079][ T262] mmc1: sdhci: Host ctl2: 0x00000000
[ 5.559651][ T262] mmc1: sdhci_msm: ----------- VENDOR REGISTER DUMP-----------
[ 5.566621][ T262] mmc1: sdhci_msm: DLL sts: 0x00000000 | DLL cfg: 0x6000642c | DLL cfg2: 0x0020a000
[ 5.575465][ T262] mmc1: sdhci_msm: DLL cfg3: 0x00000000 | DLL usr ctl: 0x00010800 | DDR cfg: 0x80040873
[ 5.584658][ T262] mmc1: sdhci_msm: Vndr func: 0x00018a9c | Vndr func2 : 0xf88218a8 Vndr func3: 0x02626040
Fixes: 0eb0d9f4de34 ("mmc: sdhci-msm: Initial support for Qualcomm chipsets")
Signed-off-by: Shaik Sajida Bhanu quic_c_sbhanu@quicinc.com
Acked-by: Adrian Hunter adrian.hunter@intel.com
Reviewed-by: Philipp Zabel p.zabel@pengutronix.de
Tested-by: Konrad Dybcio konrad.dybcio@somainline.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1650816153-23797-1-git-send-email-quic_c_sbhanu@qu...
Signed-off-by: Ulf Hansson ulf.hansson@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/mmc/host/sdhci-msm.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

--- a/drivers/mmc/host/sdhci-msm.c
+++ b/drivers/mmc/host/sdhci-msm.c
@@ -17,6 +17,7 @@
 #include <linux/regulator/consumer.h>
 #include <linux/interconnect.h>
 #include <linux/pinctrl/consumer.h>
+#include <linux/reset.h>
 
 #include "sdhci-pltfm.h"
 #include "cqhci.h"
@@ -2482,6 +2483,43 @@ static inline void sdhci_msm_get_of_prop
 	of_property_read_u32(node, "qcom,dll-config", &msm_host->dll_config);
 }
 
+static int sdhci_msm_gcc_reset(struct device *dev, struct sdhci_host *host)
+{
+	struct reset_control *reset;
+	int ret = 0;
+
+	reset = reset_control_get_optional_exclusive(dev, NULL);
+	if (IS_ERR(reset))
+		return dev_err_probe(dev, PTR_ERR(reset),
+				"unable to acquire core_reset\n");
+
+	if (!reset)
+		return ret;
+
+	ret = reset_control_assert(reset);
+	if (ret) {
+		reset_control_put(reset);
+		return dev_err_probe(dev, ret, "core_reset assert failed\n");
+	}
+
+	/*
+	 * The hardware requirement for delay between assert/deassert
+	 * is at least 3-4 sleep clock (32.7KHz) cycles, which comes to
+	 * ~125us (4/32768). To be on the safe side add 200us delay.
+	 */
+	usleep_range(200, 210);
+
+	ret = reset_control_deassert(reset);
+	if (ret) {
+		reset_control_put(reset);
+		return dev_err_probe(dev, ret, "core_reset deassert failed\n");
+	}
+
+	usleep_range(200, 210);
+	reset_control_put(reset);
+
+	return ret;
+}
 
 static int sdhci_msm_probe(struct platform_device *pdev)
 {
@@ -2529,6 +2567,10 @@ static int sdhci_msm_probe(struct platfo
 
 	msm_host->saved_tuning_phase = INVALID_TUNING_PHASE;
 
+	ret = sdhci_msm_gcc_reset(&pdev->dev, host);
+	if (ret)
+		goto pltfm_free;
+
 	/* Setup SDCC bus voter clock. */
 	msm_host->bus_clk = devm_clk_get(&pdev->dev, "bus");
 	if (!IS_ERR(msm_host->bus_clk)) {
From: Samuel Holland samuel@sholland.org
commit e9f3fb523dbf476dc86beea23f5b5ca8f9687c93 upstream.
Newer variants of the MMC controller support a 34-bit physical address space by using word addresses instead of byte addresses. However, the code truncates the DMA descriptor address to 32 bits before applying the shift. This breaks DMA for descriptors allocated above the 32-bit limit.
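The effect of truncating before shifting can be seen in a small standalone sketch. Both helper names are hypothetical, `uint64_t` stands in for the DMA address type, and the shift of 2 used in the checks is an assumption taken from the 34-bit word-addressing description above.

```c
#include <assert.h>
#include <stdint.h>

/* A 34-bit byte address fits in 32 bits once converted to a word address. */
static_assert((uint32_t)(0x100000000ULL >> 2) == 0x40000000u,
	      "shifted 34-bit address should fit in 32 bits");

/* Old behaviour: truncate to 32 bits first, then shift (high bits are lost). */
static inline uint32_t desc_word_truncating(uint64_t addr, unsigned int shift)
{
	return (uint32_t)addr >> shift;
}

/* Fixed behaviour: shift the full address, then narrow to 32 bits. */
static inline uint32_t desc_word_fixed(uint64_t addr, unsigned int shift)
{
	return (uint32_t)(addr >> shift);
}
```

For a descriptor at 0x100001000 the truncating variant yields 0x400, which points into low memory, while the fixed variant yields 0x40000400. Below 4 GiB the two agree, which is presumably why the bug only shows up for descriptors allocated above the 32-bit limit.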
Fixes: 3536b82e5853 ("mmc: sunxi: add support for A100 mmc controller")
Signed-off-by: Samuel Holland samuel@sholland.org
Reviewed-by: Andre Przywara andre.przywara@arm.com
Reviewed-by: Jernej Skrabec jernej.skrabec@gmail.com
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220424231751.32053-1-samuel@sholland.org
Signed-off-by: Ulf Hansson ulf.hansson@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/mmc/host/sunxi-mmc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/drivers/mmc/host/sunxi-mmc.c
+++ b/drivers/mmc/host/sunxi-mmc.c
@@ -377,8 +377,9 @@ static void sunxi_mmc_init_idma_des(stru
 		pdes[i].buf_addr_ptr1 =
 			cpu_to_le32(sg_dma_address(&data->sg[i]) >>
 				    host->cfg->idma_des_shift);
-		pdes[i].buf_addr_ptr2 = cpu_to_le32((u32)next_desc >>
-						    host->cfg->idma_des_shift);
+		pdes[i].buf_addr_ptr2 =
+			cpu_to_le32(next_desc >>
+				    host->cfg->idma_des_shift);
 	}
 
 	pdes[0].config |= cpu_to_le32(SDXC_IDMAC_DES0_FD);
From: Brian Norris briannorris@chromium.org
commit 4bc31edebde51fcf8ad0794763b8679a7ecb5ec0 upstream.
Way back in commit 4f25580fb84d ("mmc: core: changes frequency to hs_max_dtr when selecting hs400es"), Rockchip engineers noticed that some eMMC don't respond to SEND_STATUS commands very reliably if they're still running at a low initial frequency. As mentioned in that commit, JESD84-B51 P49 suggests a sequence in which the host:
1. sets HS_TIMING
2. bumps the clock ("<= 52 MHz")
3. sends further commands
It doesn't exactly require that we don't use a lower-than-52MHz frequency, but in practice, these eMMC don't like it.
The aforementioned commit tried to get that right for HS400ES, although it's unclear whether this ever truly worked as committed into mainline, as other changes/refactoring adjusted the sequence in conflicting ways:
08573eaf1a70 ("mmc: mmc: do not use CMD13 to get status after speed mode switch")
53e60650f74e ("mmc: core: Allow CMD13 polling when switching to HS mode for mmc")
In any case, today we do step 3 before step 2. Let's fix that, and also apply the same logic to HS200/400, where this eMMC has problems too.
Resolves errors like this seen when booting some RK3399 Gru/Scarlet systems:
[ 2.058881] mmc1: CQHCI version 5.10 [ 2.097545] mmc1: SDHCI controller on fe330000.mmc [fe330000.mmc] using ADMA [ 2.209804] mmc1: mmc_select_hs400es failed, error -84 [ 2.215597] mmc1: error -84 whilst initialising MMC card [ 2.417514] mmc1: mmc_select_hs400es failed, error -110 [ 2.423373] mmc1: error -110 whilst initialising MMC card [ 2.605052] mmc1: mmc_select_hs400es failed, error -110 [ 2.617944] mmc1: error -110 whilst initialising MMC card [ 2.835884] mmc1: mmc_select_hs400es failed, error -110 [ 2.841751] mmc1: error -110 whilst initialising MMC card
Earlier versions of this patch bumped to 200MHz/HS200 speeds too early, which caused issues on, e.g., qcom-msm8974-fairphone-fp2. (Thanks for the report Luca!) After a second look, it appears that aligns with JESD84 / page 45 / table 28, so we need to keep to lower (HS / 52 MHz) rates first.
Fixes: 08573eaf1a70 ("mmc: mmc: do not use CMD13 to get status after speed mode switch") Fixes: 53e60650f74e ("mmc: core: Allow CMD13 polling when switching to HS mode for mmc") Fixes: 4f25580fb84d ("mmc: core: changes frequency to hs_max_dtr when selecting hs400es") Cc: Shawn Lin shawn.lin@rock-chips.com Link: https://lore.kernel.org/linux-mmc/11962455.O9o76ZdvQC@g550jk/ Reported-by: Luca Weiss luca@z3ntu.xyz Signed-off-by: Brian Norris briannorris@chromium.org Tested-by: Luca Weiss luca@z3ntu.xyz Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220422100824.v4.1.I484f4ee35609f78b932bd50feed63... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/core/mmc.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-)
--- a/drivers/mmc/core/mmc.c +++ b/drivers/mmc/core/mmc.c @@ -1381,13 +1381,17 @@ static int mmc_select_hs400es(struct mmc goto out_err; }
+ /* + * Bump to HS timing and frequency. Some cards don't handle + * SEND_STATUS reliably at the initial frequency. + */ mmc_set_timing(host, MMC_TIMING_MMC_HS); + mmc_set_bus_speed(card); + err = mmc_switch_status(card, true); if (err) goto out_err;
- mmc_set_clock(host, card->ext_csd.hs_max_dtr); - /* Switch card to DDR with strobe bit */ val = EXT_CSD_DDR_BUS_WIDTH_8 | EXT_CSD_BUS_WIDTH_STROBE; err = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL, @@ -1445,7 +1449,7 @@ out_err: static int mmc_select_hs200(struct mmc_card *card) { struct mmc_host *host = card->host; - unsigned int old_timing, old_signal_voltage; + unsigned int old_timing, old_signal_voltage, old_clock; int err = -EINVAL; u8 val;
@@ -1476,8 +1480,17 @@ static int mmc_select_hs200(struct mmc_c false, true, MMC_CMD_RETRIES); if (err) goto err; + + /* + * Bump to HS timing and frequency. Some cards don't handle + * SEND_STATUS reliably at the initial frequency. + * NB: We can't move to full (HS200) speeds until after we've + * successfully switched over. + */ old_timing = host->ios.timing; + old_clock = host->ios.clock; mmc_set_timing(host, MMC_TIMING_MMC_HS200); + mmc_set_clock(card->host, card->ext_csd.hs_max_dtr);
/* * For HS200, CRC errors are not a reliable way to know the @@ -1490,8 +1503,10 @@ static int mmc_select_hs200(struct mmc_c * mmc_select_timing() assumes timing has not changed if * it is a switch error. */ - if (err == -EBADMSG) + if (err == -EBADMSG) { + mmc_set_clock(host, old_clock); mmc_set_timing(host, old_timing); + } } err: if (err) {
From: Andrei Lalaev andrei.lalaev@emlid.com
commit e75f88efac05bf4e107e4171d8db6d8c3937252d upstream.
Gpiolib interprets the elements of "gpio-reserved-ranges" as "start,size" because it clears "size" bits starting from the "start" bit in the corresponding bitmap. So it has to use "greater" instead of "greater or equal" when performing the bounds check that makes sure the GPIOs are in the available range. The previous implementation skipped ranges that include the last GPIO in the range.
I wrote the mail to the maintainers (https://lore.kernel.org/linux-gpio/20220412115554.159435-1-andrei.lalaev@eml...) of the questioned DTSes (because I couldn't understand how the maintainers interpreted this property), but I haven't received a response. Since the questioned DTSes use "gpio-reserved-ranges = <0 4>" (i.e., the beginning of the range), this patch doesn't affect these DTSes at all. TBH this patch doesn't break any existing DTSes because none of them reserve gpios at the end of range.
Fixes: 726cb3ba4969 ("gpiolib: Support 'gpio-reserved-ranges' property") Signed-off-by: Andrei Lalaev andrei.lalaev@emlid.com Reviewed-by: Andy Shevchenko andy.shevchenko@gmail.com Reviewed-by: Linus Walleij linus.walleij@linaro.org Cc: stable@vger.kernel.org Signed-off-by: Bartosz Golaszewski brgl@bgdev.pl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpiolib-of.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpio/gpiolib-of.c
+++ b/drivers/gpio/gpiolib-of.c
@@ -912,7 +912,7 @@ static void of_gpiochip_init_valid_mask(
 					   i, &start);
 		of_property_read_u32_index(np, "gpio-reserved-ranges",
 					   i + 1, &count);
-		if (start >= chip->ngpio || start + count >= chip->ngpio)
+		if (start >= chip->ngpio || start + count > chip->ngpio)
 			continue;
 
 		bitmap_clear(chip->valid_mask, start, count);
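For reviewers who want to see the off-by-one in isolation, here is a minimal userspace sketch of the two comparisons. The helper names are hypothetical and are not part of gpiolib; only the comparison expressions are taken from the diff above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not a gpiolib function): true when the reserved
 * range [start, start + count) fits on a chip with ngpio lines.
 * Uses the comparison from the fixed code.
 */
static bool reserved_range_fits(unsigned int start, unsigned int count,
				unsigned int ngpio)
{
	return !(start >= ngpio || start + count > ngpio);
}

/* Same helper with the pre-fix ">=" comparison, kept for contrast. */
static bool reserved_range_fits_old(unsigned int start, unsigned int count,
				    unsigned int ngpio)
{
	return !(start >= ngpio || start + count >= ngpio);
}

/* Self-check on an assumed 32-line chip. */
static void reserved_range_selfcheck(void)
{
	/* <0 4> at the beginning of the range is accepted by both. */
	assert(reserved_range_fits(0, 4, 32));
	assert(reserved_range_fits_old(0, 4, 32));
	/* <28 4> ends exactly at the last GPIO: only the fix accepts it. */
	assert(reserved_range_fits(28, 4, 32));
	assert(!reserved_range_fits_old(28, 4, 32));
	/* <30 4> runs past the end and is skipped either way. */
	assert(!reserved_range_fits(30, 4, 32));
}
```

The second case is the one the commit message describes: a range that includes the last GPIO was skipped before the fix.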
From: Thomas Gleixner tglx@linutronix.de
commit 59f5ede3bc0f00eb856425f636dab0c10feb06d8 upstream.
The FPU usage related to task FPU management is either protected by disabling interrupts (switch_to, return to user) or via fpregs_lock() which is a wrapper around local_bh_disable(). When kernel code wants to use the FPU then it has to check whether it is possible by calling irq_fpu_usable().
But the condition in irq_fpu_usable() is wrong. It allows FPU to be used when:
!in_interrupt() || interrupted_user_mode() || interrupted_kernel_fpu_idle()
The latter is checking whether some other context already uses FPU in the kernel, but if that's not the case then it allows FPU to be used unconditionally even if the calling context interrupted a fpregs_lock() critical region. If that happens then the FPU state of the interrupted context becomes corrupted.
Allow in kernel FPU usage only when no other context has in kernel FPU usage and either the calling context is not hard interrupt context or the hard interrupt did not interrupt a local bottomhalf disabled region.
It's hard to find a proper Fixes tag as the condition was broken in one way or the other for a very long time and the eager/lazy FPU changes caused a lot of churn. Picked something remotely connected from the history.
This survived undetected for quite some time as FPU usage in interrupt context is rare, but the recent changes to the random code unearthed it at least on a kernel which had FPU debugging enabled. There is probably a higher rate of silent corruption as not all issues can be detected by the FPU debugging code. This will be addressed in a subsequent change.
Fixes: 5d2bd7009f30 ("x86, fpu: decouple non-lazy/eager fpu restore from xsave") Reported-by: Filipe Manana fdmanana@suse.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Filipe Manana fdmanana@suse.com Reviewed-by: Borislav Petkov bp@suse.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220501193102.588689270@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/fpu/core.c | 67 +++++++++++++++++---------------------------- 1 file changed, 26 insertions(+), 41 deletions(-)
--- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -25,17 +25,7 @@ */ union fpregs_state init_fpstate __ro_after_init;
-/* - * Track whether the kernel is using the FPU state - * currently. - * - * This flag is used: - * - * - by IRQ context code to potentially use the FPU - * if it's unused. - * - * - to debug kernel_fpu_begin()/end() correctness - */ +/* Track in-kernel FPU usage */ static DEFINE_PER_CPU(bool, in_kernel_fpu);
/* @@ -43,42 +33,37 @@ static DEFINE_PER_CPU(bool, in_kernel_fp */ DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
-static bool kernel_fpu_disabled(void) -{ - return this_cpu_read(in_kernel_fpu); -} - -static bool interrupted_kernel_fpu_idle(void) -{ - return !kernel_fpu_disabled(); -} - -/* - * Were we in user mode (or vm86 mode) when we were - * interrupted? - * - * Doing kernel_fpu_begin/end() is ok if we are running - * in an interrupt context from user mode - we'll just - * save the FPU state as required. - */ -static bool interrupted_user_mode(void) -{ - struct pt_regs *regs = get_irq_regs(); - return regs && user_mode(regs); -} - /* * Can we use the FPU in kernel mode with the * whole "kernel_fpu_begin/end()" sequence? - * - * It's always ok in process context (ie "not interrupt") - * but it is sometimes ok even from an irq. */ bool irq_fpu_usable(void) { - return !in_interrupt() || - interrupted_user_mode() || - interrupted_kernel_fpu_idle(); + if (WARN_ON_ONCE(in_nmi())) + return false; + + /* In kernel FPU usage already active? */ + if (this_cpu_read(in_kernel_fpu)) + return false; + + /* + * When not in NMI or hard interrupt context, FPU can be used in: + * + * - Task context except from within fpregs_lock()'ed critical + * regions. + * + * - Soft interrupt processing context which cannot happen + * while in a fpregs_lock()'ed critical region. + */ + if (!in_hardirq()) + return true; + + /* + * In hard interrupt context it's safe when soft interrupts + * are enabled, which means the interrupt did not hit in + * a fpregs_lock()'ed critical region. + */ + return !softirq_count(); } EXPORT_SYMBOL(irq_fpu_usable);
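The decision order in the new irq_fpu_usable() can be modelled as a pure function over a snapshot of the context state. This is only an illustrative userspace sketch: struct fpu_ctx and fpu_usable_model() are hypothetical names, and the fields stand in for in_nmi(), this_cpu_read(in_kernel_fpu), in_hardirq() and softirq_count() respectively.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the real function queries. */
struct fpu_ctx {
	bool in_nmi;			/* stands in for in_nmi() */
	bool in_kernel_fpu;		/* stands in for the per-CPU in_kernel_fpu flag */
	bool in_hardirq;		/* stands in for in_hardirq() */
	unsigned int softirq_count;	/* nonzero: BH disabled or softirq running */
};

/* Mirrors the order of checks in the patched function. */
static bool fpu_usable_model(const struct fpu_ctx *c)
{
	if (c->in_nmi)
		return false;		/* never from NMI */
	if (c->in_kernel_fpu)
		return false;		/* another context already uses the FPU */
	if (!c->in_hardirq)
		return true;		/* task or softirq context */
	/* Hard interrupt: only safe if it did not hit a BH-disabled region. */
	return c->softirq_count == 0;
}

static void fpu_usable_selfcheck(void)
{
	struct fpu_ctx task = { false, false, false, 0 };
	struct fpu_ctx irq_in_bh_off = { false, false, true, 1 };
	struct fpu_ctx irq_plain = { false, false, true, 0 };
	struct fpu_ctx nested = { false, true, false, 0 };

	assert(fpu_usable_model(&task));
	assert(!fpu_usable_model(&irq_in_bh_off));
	assert(fpu_usable_model(&irq_plain));
	assert(!fpu_usable_model(&nested));
}
```

The irq_in_bh_off case is the one the old condition got wrong: with no in-kernel FPU user active, it allowed FPU use even though the interrupt had landed inside a fpregs_lock() region.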
From: Kyle Huey me@kylehuey.com
commit 5eb849322d7f7ae9d5c587c7bc3b4f7c6872cd2f upstream.
Zen renumbered some of the performance counters that correspond to the well known events in perf_hw_id. This code in KVM was never updated for that, so guests that attempt to use counters on Zen that correspond to the pre-Zen perf_hw_id values will silently receive the wrong values.
This has been observed in the wild with rr[0] when running in Zen 3 guests. rr uses the retired conditional branch counter 00d1 which is incorrectly recognized by KVM as PERF_COUNT_HW_STALLED_CYCLES_BACKEND.
Signed-off-by: Kyle Huey me@kylehuey.com Message-Id: 20220503050136.86298-1-khuey@kylehuey.com Cc: stable@vger.kernel.org [Check guest family, not host. - Paolo] Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/pmu.c | 28 +++++++++++++++++++++++++--- 1 file changed, 25 insertions(+), 3 deletions(-)
--- a/arch/x86/kvm/svm/pmu.c +++ b/arch/x86/kvm/svm/pmu.c @@ -44,6 +44,22 @@ static struct kvm_event_hw_type_mapping [7] = { 0xd1, 0x00, PERF_COUNT_HW_STALLED_CYCLES_BACKEND }, };
+/* duplicated from amd_f17h_perfmon_event_map. */ +static struct kvm_event_hw_type_mapping amd_f17h_event_mapping[] = { + [0] = { 0x76, 0x00, PERF_COUNT_HW_CPU_CYCLES }, + [1] = { 0xc0, 0x00, PERF_COUNT_HW_INSTRUCTIONS }, + [2] = { 0x60, 0xff, PERF_COUNT_HW_CACHE_REFERENCES }, + [3] = { 0x64, 0x09, PERF_COUNT_HW_CACHE_MISSES }, + [4] = { 0xc2, 0x00, PERF_COUNT_HW_BRANCH_INSTRUCTIONS }, + [5] = { 0xc3, 0x00, PERF_COUNT_HW_BRANCH_MISSES }, + [6] = { 0x87, 0x02, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND }, + [7] = { 0x87, 0x01, PERF_COUNT_HW_STALLED_CYCLES_BACKEND }, +}; + +/* amd_pmc_perf_hw_id depends on these being the same size */ +static_assert(ARRAY_SIZE(amd_event_mapping) == + ARRAY_SIZE(amd_f17h_event_mapping)); + static unsigned int get_msr_base(struct kvm_pmu *pmu, enum pmu_type type) { struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu); @@ -136,19 +152,25 @@ static inline struct kvm_pmc *get_gp_pmc
static unsigned int amd_pmc_perf_hw_id(struct kvm_pmc *pmc) { + struct kvm_event_hw_type_mapping *event_mapping; u8 event_select = pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT; u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8; int i;
+ if (guest_cpuid_family(pmc->vcpu) >= 0x17) + event_mapping = amd_f17h_event_mapping; + else + event_mapping = amd_event_mapping; + for (i = 0; i < ARRAY_SIZE(amd_event_mapping); i++) - if (amd_event_mapping[i].eventsel == event_select - && amd_event_mapping[i].unit_mask == unit_mask) + if (event_mapping[i].eventsel == event_select + && event_mapping[i].unit_mask == unit_mask) break;
if (i == ARRAY_SIZE(amd_event_mapping)) return PERF_COUNT_HW_MAX;
- return amd_event_mapping[i].event_type; + return event_mapping[i].event_type; }
/* return PERF_COUNT_HW_MAX as AMD doesn't have fixed events */
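The effect of picking the table by guest family can be tried out with a small userspace sketch. The f17h entries below are copied from the diff above; the legacy table is deliberately reduced to the single entry visible in the diff context, and the enum, struct and function names are hypothetical stand-ins for the kernel ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PERF_COUNT_HW_* ids. */
enum hw_event {
	HW_CPU_CYCLES, HW_INSTRUCTIONS, HW_CACHE_REFERENCES, HW_CACHE_MISSES,
	HW_BRANCH_INSTRUCTIONS, HW_BRANCH_MISSES,
	HW_STALLED_CYCLES_FRONTEND, HW_STALLED_CYCLES_BACKEND, HW_EVENT_MAX
};

struct event_mapping {
	uint8_t eventsel;
	uint8_t unit_mask;
	enum hw_event type;
};

/* Reduced legacy table: only the entry shown in the diff context. */
static const struct event_mapping legacy_map[] = {
	{ 0xd1, 0x00, HW_STALLED_CYCLES_BACKEND },
};

/* Family 17h and later, as added by the patch. */
static const struct event_mapping f17h_map[] = {
	{ 0x76, 0x00, HW_CPU_CYCLES },
	{ 0xc0, 0x00, HW_INSTRUCTIONS },
	{ 0x60, 0xff, HW_CACHE_REFERENCES },
	{ 0x64, 0x09, HW_CACHE_MISSES },
	{ 0xc2, 0x00, HW_BRANCH_INSTRUCTIONS },
	{ 0xc3, 0x00, HW_BRANCH_MISSES },
	{ 0x87, 0x02, HW_STALLED_CYCLES_FRONTEND },
	{ 0x87, 0x01, HW_STALLED_CYCLES_BACKEND },
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Select the table by guest family, then match event select and unit mask. */
static enum hw_event lookup_hw_event(unsigned int guest_family,
				     uint8_t eventsel, uint8_t unit_mask)
{
	const struct event_mapping *map = legacy_map;
	size_t n = ARRAY_LEN(legacy_map), i;

	if (guest_family >= 0x17) {
		map = f17h_map;
		n = ARRAY_LEN(f17h_map);
	}
	for (i = 0; i < n; i++)
		if (map[i].eventsel == eventsel && map[i].unit_mask == unit_mask)
			return map[i].type;
	return HW_EVENT_MAX;	/* no generic event: caller falls back to raw */
}

static void lookup_selfcheck(void)
{
	/* Pre-Zen guest: 0xd1/0x00 is "stalled cycles backend". */
	assert(lookup_hw_event(0x15, 0xd1, 0x00) == HW_STALLED_CYCLES_BACKEND);
	/* Zen 3 guest (family 0x19): 0xd1/0x00 no longer maps to it. */
	assert(lookup_hw_event(0x19, 0xd1, 0x00) == HW_EVENT_MAX);
	assert(lookup_hw_event(0x19, 0x87, 0x01) == HW_STALLED_CYCLES_BACKEND);
}
```

Unlike the kernel code, the sketch carries an explicit length per table, so it does not need the static_assert that both tables have the same size.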
From: David Stevens stevensd@chromium.org
commit 59bf3557cf2f8a469a554aea1e3d2c8e72a579f7 upstream.
Calculate the appropriate mask for non-size-aligned page selective invalidation. Since psi uses the mask value to mask out the lower order bits of the target address, properly flushing the iotlb requires using a mask value such that [pfn, pfn+pages) all lie within the flushed size-aligned region. This is not normally an issue because iova.c always allocates iovas that are aligned to their size. However, iovas which come from other sources (e.g. userspace via VFIO) may not be aligned.
To properly flush the IOTLB, both the start and end pfns need to be equal after applying the mask. That means that the most efficient mask to use is the index of the lowest bit that is equal where all higher bits are also equal. For example, if pfn=0x17f and pages=3, then end_pfn=0x181, so the smallest mask we can use is 8. Any differences above the highest bit of pages are due to carrying, so by xnor'ing pfn and end_pfn and then masking out the lower order bits based on pages, we get 0xffffff00, where the first set bit is the mask we want to use.
Fixes: 6fe1010d6d9c ("vfio/type1: DMA unmap chunking") Cc: stable@vger.kernel.org Signed-off-by: David Stevens stevensd@chromium.org Reviewed-by: Kevin Tian kevin.tian@intel.com Link: https://lore.kernel.org/r/20220401022430.1262215-1-stevensd@google.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20220410013533.3959168-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iommu/intel/iommu.c | 27 ++++++++++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-)
--- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1637,7 +1637,8 @@ static void iommu_flush_iotlb_psi(struct unsigned long pfn, unsigned int pages, int ih, int map) { - unsigned int mask = ilog2(__roundup_pow_of_two(pages)); + unsigned int aligned_pages = __roundup_pow_of_two(pages); + unsigned int mask = ilog2(aligned_pages); uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT; u16 did = domain->iommu_did[iommu->seq_id];
@@ -1649,10 +1650,30 @@ static void iommu_flush_iotlb_psi(struct if (domain_use_first_level(domain)) { domain_flush_piotlb(iommu, domain, addr, pages, ih); } else { + unsigned long bitmask = aligned_pages - 1; + + /* + * PSI masks the low order bits of the base address. If the + * address isn't aligned to the mask, then compute a mask value + * needed to ensure the target range is flushed. + */ + if (unlikely(bitmask & pfn)) { + unsigned long end_pfn = pfn + pages - 1, shared_bits; + + /* + * Since end_pfn <= pfn + bitmask, the only way bits + * higher than bitmask can differ in pfn and end_pfn is + * by carrying. This means after masking out bitmask, + * high bits starting with the first set bit in + * shared_bits are all equal in both pfn and end_pfn. + */ + shared_bits = ~(pfn ^ end_pfn) & ~bitmask; + mask = shared_bits ? __ffs(shared_bits) : BITS_PER_LONG; + } + /* * Fallback to domain selective flush if no PSI support or - * the size is too big. PSI requires page size to be 2 ^ x, - * and the base address is naturally aligned to the size. + * the size is too big. */ if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
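The mask computation can be checked against the worked example from the commit message (pfn=0x17f, pages=3) with a userspace sketch. The helpers roundup_pow2() and lowest_set_bit() are simple hypothetical replacements for __roundup_pow_of_two(), ilog2() and __ffs(); psi_mask() follows the arithmetic in the diff but is not the kernel function.

```c
#include <assert.h>
#include <limits.h>

/* Round n (n >= 1) up to the next power of two. */
static unsigned long roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Index of the lowest set bit; x must be nonzero. */
static unsigned int lowest_set_bit(unsigned long x)
{
	unsigned int i = 0;

	while (!(x & 1UL)) {
		x >>= 1;
		i++;
	}
	return i;
}

/* Address-mask order needed so that [pfn, pfn + pages) is fully covered. */
static unsigned int psi_mask(unsigned long pfn, unsigned long pages)
{
	unsigned long aligned_pages = roundup_pow2(pages);
	unsigned long bitmask = aligned_pages - 1;
	/* For a power of two, the lowest set bit index equals log2. */
	unsigned int mask = lowest_set_bit(aligned_pages);

	if (bitmask & pfn) {
		unsigned long end_pfn = pfn + pages - 1;
		/* Bits above bitmask that are equal in pfn and end_pfn. */
		unsigned long shared_bits = ~(pfn ^ end_pfn) & ~bitmask;

		mask = shared_bits ? lowest_set_bit(shared_bits)
				   : (unsigned int)(sizeof(unsigned long) * CHAR_BIT);
	}
	return mask;
}

static void psi_mask_selfcheck(void)
{
	assert(psi_mask(0x17f, 3) == 8);	/* example from the commit message */
	assert(psi_mask(0x100, 4) == 2);	/* size-aligned: plain log2(pages) */
	assert(psi_mask(0x101, 2) == 2);	/* unaligned: needs a 4-page region */
}
```

For 0x17f with 3 pages, the range ends at 0x181, so the two ends first agree from bit 8 upward and a 256-page region is the smallest size-aligned region that covers both.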
From: Nicolin Chen nicolinc@nvidia.com
commit 95d4782c34a60800ccf91d9f0703137d4367a2fc upstream.
The arm_smmu_mm_invalidate_range function is designed to be called by mm core for Shared Virtual Addressing purpose between IOMMU and CPU MMU. However, the ways of two subsystems defining their "end" addresses are slightly different. IOMMU defines its "end" address using the last address of an address range, while mm core defines that using the following address of an address range:
include/linux/mm_types.h: unsigned long vm_end; /* The first byte after our end address ...
This mismatch resulted in an incorrect calculation for size so it failed to be page-size aligned. Further, it caused a dead loop at "while (iova < end)" check in __arm_smmu_tlb_inv_range function.
This patch fixes the issue by doing the calculation correctly.
Fixes: 2f7e8c553e98 ("iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops") Cc: stable@vger.kernel.org Signed-off-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Robin Murphy robin.murphy@arm.com Reviewed-by: Jean-Philippe Brucker jean-philippe@linaro.org Link: https://lore.kernel.org/r/20220419210158.21320-1-nicolinc@nvidia.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -183,7 +183,14 @@ static void arm_smmu_mm_invalidate_range { struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn); struct arm_smmu_domain *smmu_domain = smmu_mn->domain; - size_t size = end - start + 1; + size_t size; + + /* + * The mm_types defines vm_end as the first byte after the end address, + * different from IOMMU subsystem using the last address of an address + * range. So do a simple translation here by calculating size correctly. + */ + size = end - start;
if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM)) arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
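The two "end" conventions are easy to mix up, so here is a small sketch of both. The function names are hypothetical and a 4 KiB page size is assumed purely for the alignment check.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 0x1000UL	/* assumed 4 KiB pages, illustration only */

/* mm-style range: 'end' is the first byte after the range. */
static size_t size_from_exclusive_end(unsigned long start, unsigned long end)
{
	return end - start;
}

/* IOMMU-style range: 'last' is the final byte inside the range. */
static size_t size_from_inclusive_end(unsigned long start, unsigned long last)
{
	return last - start + 1;
}

static void range_size_selfcheck(void)
{
	/* Two pages, expressed in each convention. */
	assert(size_from_exclusive_end(0x1000, 0x3000) == 0x2000);
	assert(size_from_inclusive_end(0x1000, 0x2fff) == 0x2000);

	/* The bug: inclusive formula applied to an mm-style end. */
	assert(size_from_inclusive_end(0x1000, 0x3000) % SKETCH_PAGE_SIZE != 0);
}
```

The last assertion shows the symptom from the commit message: the size comes out one byte too large and is no longer page-size aligned.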
From: Harry Wentland harry.wentland@amd.com
commit 3dfe85fa87b2a26bdbd292b66653bba065cf9941 upstream.
A faulty receiver might report an erroneous channel count. We should guard against reading beyond AUDIO_CHANNELS_COUNT as that would overflow the dpcd_pattern_period array.
Signed-off-by: Harry Wentland harry.wentland@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c @@ -3118,7 +3118,7 @@ static void dp_test_get_audio_test_data( &dpcd_pattern_type.value, sizeof(dpcd_pattern_type));
- channel_count = dpcd_test_mode.bits.channel_count + 1; + channel_count = min(dpcd_test_mode.bits.channel_count + 1, AUDIO_CHANNELS_COUNT);
// read pattern periods for requested channels when sawTooth pattern is requested if (dpcd_pattern_type.value == AUDIO_TEST_PATTERN_SAWTOOTH ||
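As a standalone illustration of the clamp, the sketch below bounds a receiver-reported channel field before it is used as a loop limit over a fixed-size array. SKETCH_CHANNELS_MAX is a placeholder value chosen for the example, not the driver's AUDIO_CHANNELS_COUNT, and the function name is hypothetical.

```c
#include <assert.h>

#define SKETCH_CHANNELS_MAX 8u	/* placeholder array size, not the driver's value */

/*
 * The reported field encodes "channels - 1". Clamp the decoded count so
 * that a faulty receiver cannot push indexing past the array.
 */
static unsigned int clamp_channel_count(unsigned int reported_field)
{
	unsigned int requested = reported_field + 1;

	return requested < SKETCH_CHANNELS_MAX ? requested : SKETCH_CHANNELS_MAX;
}

static void clamp_selfcheck(void)
{
	assert(clamp_channel_count(1) == 2);	/* sane value passes through */
	assert(clamp_channel_count(7) == 8);	/* exactly at the limit */
	assert(clamp_channel_count(15) == 8);	/* bogus value is capped */
}
```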
From: Marek Marczykowski-Górecki marmarek@invisiblethingslab.com
commit 19965d8259fdabc6806da92adda49684f5bcbec5 upstream.
While technically Xen dom0 is a virtual machine too, it does have access to most of the hardware so it doesn't need to be considered a "passthrough". Commit b818a5d37454 ("drm/amdgpu/gmc: use PCI BARs for APUs in passthrough") changed how FB is accessed based on passthrough mode. This breaks amdgpu in Xen dom0 with message like this:
[drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
While the reason for this failure is unclear, the passthrough mode is not really necessary in Xen dom0 anyway. So, to unbreak booting affected kernels, disable passthrough mode in this case.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1985 Fixes: b818a5d37454 ("drm/amdgpu/gmc: use PCI BARs for APUs in passthrough") Signed-off-by: Marek Marczykowski-Górecki marmarek@invisiblethingslab.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c @@ -24,6 +24,7 @@ #include <linux/module.h>
#include <drm/drm_drv.h> +#include <xen/xen.h>
#include "amdgpu.h" #include "amdgpu_ras.h" @@ -694,7 +695,8 @@ void amdgpu_detect_virtualization(struct adev->virt.caps |= AMDGPU_SRIOV_CAPS_ENABLE_IOV;
if (!reg) { - if (is_virtual_machine()) /* passthrough mode exclus sriov mod */ + /* passthrough mode exclus sriov mod */ + if (is_virtual_machine() && !xen_initial_domain()) adev->virt.caps |= AMDGPU_PASSTHROUGH_MODE; }
From: Nick Kossifidis mick@ics.forth.gr
commit c6fe81191bd74f7e6ae9ce96a4837df9485f3ab8 upstream.
In case the DTB provided by the bootloader/BootROM is before the kernel image or outside /memory, we won't be able to access it through the linear mapping, and get a segfault on setup_arch(). Currently OpenSBI relocates DTB but that's not always the case (e.g. if FW_JUMP_FDT_ADDR is not specified), and it's also not the most portable approach since the default FW_JUMP_FDT_ADDR of the generic platform relocates the DTB at a specific offset that may not be available. To avoid this situation copy DTB so that it's visible through the linear mapping.
Signed-off-by: Nick Kossifidis mick@ics.forth.gr Link: https://lore.kernel.org/r/20220322132839.3653682-1-mick@ics.forth.gr Tested-by: Conor Dooley conor.dooley@microchip.com Fixes: f105aa940e78 ("riscv: add BUILTIN_DTB support for MMU-enabled targets") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/mm/init.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-)
--- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -218,8 +218,25 @@ static void __init setup_bootmem(void) * early_init_fdt_reserve_self() since __pa() does * not work for DTB pointers that are fixmap addresses */ - if (!IS_ENABLED(CONFIG_BUILTIN_DTB)) - memblock_reserve(dtb_early_pa, fdt_totalsize(dtb_early_va)); + if (!IS_ENABLED(CONFIG_BUILTIN_DTB)) { + /* + * In case the DTB is not located in a memory region we won't + * be able to locate it later on via the linear mapping and + * get a segfault when accessing it via __va(dtb_early_pa). + * To avoid this situation copy DTB to a memory region. + * Note that memblock_phys_alloc will also reserve DTB region. + */ + if (!memblock_is_memory(dtb_early_pa)) { + size_t fdt_size = fdt_totalsize(dtb_early_va); + phys_addr_t new_dtb_early_pa = memblock_phys_alloc(fdt_size, PAGE_SIZE); + void *new_dtb_early_va = early_memremap(new_dtb_early_pa, fdt_size); + + memcpy(new_dtb_early_va, dtb_early_va, fdt_size); + early_memunmap(new_dtb_early_va, fdt_size); + _dtb_early_pa = new_dtb_early_pa; + } else + memblock_reserve(dtb_early_pa, fdt_totalsize(dtb_early_va)); + }
early_init_fdt_scan_reserved_mem(); dma_contiguous_reserve(dma32_phys_limit);
From: Trond Myklebust trond.myklebust@hammerspace.com
commit a3d0562d4dc039bca39445e1cddde7951662e17d upstream.
This reverts commit 7073ea8799a8cf73db60270986f14e4aae20fa80.
We must not try to connect the socket while the transport is under construction, because the mechanisms to safely tear it down are not in place. As the code stands, we end up leaking the sockets on a connection error.
Reported-by: wanghai (M) wanghai38@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sunrpc/xprtsock.c | 3 --- 1 file changed, 3 deletions(-)
--- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -2848,9 +2848,6 @@ static struct rpc_xprt *xs_setup_local(s } xprt_set_bound(xprt); xs_format_peer_addresses(xprt, "local", RPCBIND_NETID_LOCAL); - ret = ERR_PTR(xs_local_setup_socket(transport)); - if (ret) - goto out_err; break; default: ret = ERR_PTR(-EAFNOSUPPORT);
From: Kurt Kanzenbach kurt@linutronix.de
commit 2c33d775ef4c25c0e1e1cc0fd5496d02f76bfa20 upstream.
Mark the CLOCK_MONOTONIC fast time accessors as notrace. These functions are used in tracing to retrieve timestamps, so they should not recurse.
Fixes: 4498e7467e9e ("time: Parametrize all tk_fast_mono users") Fixes: f09cb9a1808e ("time: Introduce tk_fast_raw") Reported-by: Steven Rostedt rostedt@goodmis.org Signed-off-by: Kurt Kanzenbach kurt@linutronix.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220426175338.3807ca4f@gandalf.local.home/ Link: https://lore.kernel.org/r/20220428062432.61063-1-kurt@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/time/timekeeping.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -482,7 +482,7 @@ static __always_inline u64 __ktime_get_f * of the following timestamps. Callers need to be aware of that and * deal with it. */ -u64 ktime_get_mono_fast_ns(void) +u64 notrace ktime_get_mono_fast_ns(void) { return __ktime_get_fast_ns(&tk_fast_mono); } @@ -494,7 +494,7 @@ EXPORT_SYMBOL_GPL(ktime_get_mono_fast_ns * Contrary to ktime_get_mono_fast_ns() this is always correct because the * conversion factor is not affected by NTP/PTP correction. */ -u64 ktime_get_raw_fast_ns(void) +u64 notrace ktime_get_raw_fast_ns(void) { return __ktime_get_fast_ns(&tk_fast_raw); }
From: Chengfeng Ye cyeaa@connect.ust.hk
commit b7c81f80246fac44077166f3e07103affe6db8ff upstream.
&e->event and e point to the same address, and &e->event could be freed in queue_event. So there is a potential uaf issue if we dereference e after calling queue_event(). Fix this by adding a temporary variable to maintain e->client in advance, this can avoid the potential uaf issue.
Cc: stable@vger.kernel.org Signed-off-by: Chengfeng Ye cyeaa@connect.ust.hk Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp Link: https://lore.kernel.org/r/20220409041243.603210-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firewire/core-cdev.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/firewire/core-cdev.c +++ b/drivers/firewire/core-cdev.c @@ -1480,6 +1480,7 @@ static void outbound_phy_packet_callback { struct outbound_phy_packet_event *e = container_of(packet, struct outbound_phy_packet_event, p); + struct client *e_client;
switch (status) { /* expected: */ @@ -1496,9 +1497,10 @@ static void outbound_phy_packet_callback } e->phy_packet.data[0] = packet->timestamp;
+ e_client = e->client; queue_event(e->client, &e->event, &e->phy_packet, sizeof(e->phy_packet) + e->phy_packet.length, NULL, 0); - client_put(e->client); + client_put(e_client); }
static int ioctl_send_phy_packet(struct client *client, union ioctl_arg *arg)
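The pattern behind this fix (copy a field out of an object before handing the object to a function that may free it) can be shown in a few lines of userspace C. All types and functions here are simplified, hypothetical stand-ins; in particular this queue_event() always frees the event, which is the worst case the patch guards against.

```c
#include <assert.h>
#include <stdlib.h>

struct client {
	int refcount;
};

struct event {
	struct client *client;
	int payload;
};

/* Stand-in for queue_event(): takes ownership and may free the event. */
static void queue_event(struct client *c, struct event *e)
{
	(void)c;
	free(e);
}

/* Stand-in for client_put(). */
static void client_put(struct client *c)
{
	c->refcount--;
}

static void complete_event(struct event *e)
{
	/* Save the pointer first: 'e' must not be touched after queue_event(). */
	struct client *e_client = e->client;

	queue_event(e_client, e);
	client_put(e_client);
}

static void complete_event_selfcheck(void)
{
	struct client c = { 2 };
	struct event *e = malloc(sizeof(*e));

	assert(e != NULL);
	e->client = &c;
	e->payload = 0;
	complete_event(e);
	assert(c.refcount == 1);
}
```

Writing client_put(e->client) after queue_event() in this sketch would read freed memory, which is the use-after-free the commit message describes.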
From: Jakob Koschel jakobkoschel@gmail.com
commit 9423973869bd4632ffe669f950510c49296656e0 upstream.
When list_for_each_entry() completes the iteration over the whole list without breaking the loop, the iterator value will be a bogus pointer computed based on the head element.
While it is safe to use the pointer to determine if it was computed based on the head element, either with list_entry_is_head() or &pos->member == head, using the iterator variable after the loop should be avoided.
In preparation to limit the scope of a list iterator to the list traversal loop, use a dedicated pointer to point to the found element [1].
Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWX... [1] Cc: stable@vger.kernel.org Signed-off-by: Jakob Koschel jakobkoschel@gmail.com Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp Link: https://lore.kernel.org/r/20220409041243.603210-3-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firewire/core-transaction.c | 30 ++++++++++++++++-------------- drivers/firewire/sbp2.c | 13 +++++++------ 2 files changed, 23 insertions(+), 20 deletions(-)
--- a/drivers/firewire/core-transaction.c +++ b/drivers/firewire/core-transaction.c @@ -73,24 +73,25 @@ static int try_cancel_split_timeout(stru static int close_transaction(struct fw_transaction *transaction, struct fw_card *card, int rcode) { - struct fw_transaction *t; + struct fw_transaction *t = NULL, *iter; unsigned long flags;
spin_lock_irqsave(&card->lock, flags); - list_for_each_entry(t, &card->transaction_list, link) { - if (t == transaction) { - if (!try_cancel_split_timeout(t)) { + list_for_each_entry(iter, &card->transaction_list, link) { + if (iter == transaction) { + if (!try_cancel_split_timeout(iter)) { spin_unlock_irqrestore(&card->lock, flags); goto timed_out; } - list_del_init(&t->link); - card->tlabel_mask &= ~(1ULL << t->tlabel); + list_del_init(&iter->link); + card->tlabel_mask &= ~(1ULL << iter->tlabel); + t = iter; break; } } spin_unlock_irqrestore(&card->lock, flags);
- if (&t->link != &card->transaction_list) { + if (t) { t->callback(card, rcode, NULL, 0, t->callback_data); return 0; } @@ -935,7 +936,7 @@ EXPORT_SYMBOL(fw_core_handle_request);
void fw_core_handle_response(struct fw_card *card, struct fw_packet *p) { - struct fw_transaction *t; + struct fw_transaction *t = NULL, *iter; unsigned long flags; u32 *data; size_t data_length; @@ -947,20 +948,21 @@ void fw_core_handle_response(struct fw_c rcode = HEADER_GET_RCODE(p->header[1]);
spin_lock_irqsave(&card->lock, flags); - list_for_each_entry(t, &card->transaction_list, link) { - if (t->node_id == source && t->tlabel == tlabel) { - if (!try_cancel_split_timeout(t)) { + list_for_each_entry(iter, &card->transaction_list, link) { + if (iter->node_id == source && iter->tlabel == tlabel) { + if (!try_cancel_split_timeout(iter)) { spin_unlock_irqrestore(&card->lock, flags); goto timed_out; } - list_del_init(&t->link); - card->tlabel_mask &= ~(1ULL << t->tlabel); + list_del_init(&iter->link); + card->tlabel_mask &= ~(1ULL << iter->tlabel); + t = iter; break; } } spin_unlock_irqrestore(&card->lock, flags);
- if (&t->link == &card->transaction_list) { + if (!t) { timed_out: fw_notice(card, "unsolicited response (source %x, tlabel %x)\n", source, tlabel); --- a/drivers/firewire/sbp2.c +++ b/drivers/firewire/sbp2.c @@ -408,7 +408,7 @@ static void sbp2_status_write(struct fw_ void *payload, size_t length, void *callback_data) { struct sbp2_logical_unit *lu = callback_data; - struct sbp2_orb *orb; + struct sbp2_orb *orb = NULL, *iter; struct sbp2_status status; unsigned long flags;
@@ -433,17 +433,18 @@ static void sbp2_status_write(struct fw_
/* Lookup the orb corresponding to this status write. */ spin_lock_irqsave(&lu->tgt->lock, flags); - list_for_each_entry(orb, &lu->orb_list, link) { + list_for_each_entry(iter, &lu->orb_list, link) { if (STATUS_GET_ORB_HIGH(status) == 0 && - STATUS_GET_ORB_LOW(status) == orb->request_bus) { - orb->rcode = RCODE_COMPLETE; - list_del(&orb->link); + STATUS_GET_ORB_LOW(status) == iter->request_bus) { + iter->rcode = RCODE_COMPLETE; + list_del(&iter->link); + orb = iter; break; } } spin_unlock_irqrestore(&lu->tgt->lock, flags);
- if (&orb->link != &lu->orb_list) { + if (orb) { orb->callback(orb, &status); kref_put(&orb->kref, free_orb); /* orb callback reference */ } else {
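The "dedicated found pointer" idiom used throughout this patch can be sketched outside the kernel with a minimal circular list. The list helpers below are simplified re-implementations for illustration, not the kernel's list.h, and struct item and find_item() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct item {
	int id;
	struct list_head link;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/*
 * 'found' is only ever set to a real element. The iterator stays local
 * to the loop, so nothing after the loop can see a pointer that was
 * computed from the list head.
 */
static struct item *find_item(struct list_head *head, int id)
{
	struct item *found = NULL;
	struct list_head *pos;

	for (pos = head->next; pos != head; pos = pos->next) {
		struct item *iter = container_of(pos, struct item, link);

		if (iter->id == id) {
			found = iter;
			break;
		}
	}
	return found;
}

static void find_item_selfcheck(void)
{
	struct list_head head;
	struct item a = { 1, { NULL, NULL } }, b = { 2, { NULL, NULL } };

	list_init(&head);
	list_add_tail(&a.link, &head);
	list_add_tail(&b.link, &head);

	assert(find_item(&head, 2) == &b);
	assert(find_item(&head, 9) == NULL);	/* not found: plain NULL check */
}
```

Callers then test the result against NULL instead of comparing &pos->member with the list head.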
From: Niels Dossche dossche.niels@gmail.com
commit a7ecbe92b9243edbe94772f6f2c854e4142a3345 upstream.
card->local_node and card->bm_retries are both always accessed under card->lock. fw_core_handle_bus_reset has a check whose condition depends on card->local_node and whose body writes to card->bm_retries. Both of these accesses are not under card->lock. Move the lock acquiring of card->lock to before this check such that these accesses do happen when card->lock is held. fw_destroy_nodes is called inside the check. Since fw_destroy_nodes already acquires card->lock inside its function body, move this out to the callsites of fw_destroy_nodes. Also add a comment to indicate which locking is necessary when calling fw_destroy_nodes.
Cc: stable@vger.kernel.org Signed-off-by: Niels Dossche dossche.niels@gmail.com Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp Link: https://lore.kernel.org/r/20220409041243.603210-4-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firewire/core-card.c | 3 +++ drivers/firewire/core-topology.c | 9 +++------ 2 files changed, 6 insertions(+), 6 deletions(-)
--- a/drivers/firewire/core-card.c +++ b/drivers/firewire/core-card.c @@ -668,6 +668,7 @@ EXPORT_SYMBOL_GPL(fw_card_release); void fw_core_remove_card(struct fw_card *card) { struct fw_card_driver dummy_driver = dummy_driver_template; + unsigned long flags;
card->driver->update_phy_reg(card, 4, PHY_LINK_ACTIVE | PHY_CONTENDER, 0); @@ -682,7 +683,9 @@ void fw_core_remove_card(struct fw_card dummy_driver.stop_iso = card->driver->stop_iso; card->driver = &dummy_driver;
+ spin_lock_irqsave(&card->lock, flags); fw_destroy_nodes(card); + spin_unlock_irqrestore(&card->lock, flags);
/* Wait for all users, especially device workqueue jobs, to finish. */ fw_card_put(card); --- a/drivers/firewire/core-topology.c +++ b/drivers/firewire/core-topology.c @@ -375,16 +375,13 @@ static void report_found_node(struct fw_ card->bm_retries = 0; }
+/* Must be called with card->lock held */ void fw_destroy_nodes(struct fw_card *card) { - unsigned long flags; - - spin_lock_irqsave(&card->lock, flags); card->color++; if (card->local_node != NULL) for_each_fw_node(card, card->local_node, report_lost_node); card->local_node = NULL; - spin_unlock_irqrestore(&card->lock, flags); }
static void move_tree(struct fw_node *node0, struct fw_node *node1, int port) @@ -510,6 +507,8 @@ void fw_core_handle_bus_reset(struct fw_ struct fw_node *local_node; unsigned long flags;
+ spin_lock_irqsave(&card->lock, flags); + /* * If the selfID buffer is not the immediate successor of the * previously processed one, we cannot reliably compare the @@ -521,8 +520,6 @@ void fw_core_handle_bus_reset(struct fw_ card->bm_retries = 0; }
- spin_lock_irqsave(&card->lock, flags); - card->broadcast_channel_allocated = card->broadcast_channel_auto_allocated; card->node_id = node_id; /*
From: Tan Tee Min tee.min.tan@linux.intel.com
commit 47f753c1108e287edb3e27fad8a7511a9d55578e upstream.
According to the DesignWare Ethernet QoS datasheet, the Split Header (SPH) feature is not supported for IPv4 fragmented packets. This limitation causes ping failures when the packet size exceeds the MTU size: once a basic ping packet is larger than the configured MTU, the data inside the fragmented packet is lost and replaced by zeros or corrupted values, and the ping fails.
So, disable the Split Header for Intel platforms.
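As a minimal sketch of the resulting gating logic: the feature is used only when the hardware reports the capability and the platform has not opted out. The struct and function names below are hypothetical stand-ins for the driver's own data structures.

```c
#include <stdbool.h>

/* Hypothetical, reduced versions of the capability and platform data. */
struct dma_cap_model { bool sphen; };        /* hardware reports SPH support */
struct plat_model    { bool sph_disable; };  /* platform opts out of SPH */

/* SPH is enabled only if the hardware has it and the platform allows it. */
static bool sph_enabled(const struct dma_cap_model *cap,
                        const struct plat_model *plat)
{
    return cap->sphen && !plat->sph_disable;
}
```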
v2: Add fixes tag in commit message.
Fixes: 67afd6d1cfdf("net: stmmac: Add Split Header support and enable it in XGMAC cores") Cc: stable@vger.kernel.org # 5.10.x Suggested-by: Ong, Boon Leong boon.leong.ong@intel.com Signed-off-by: Mohammad Athari Bin Ismail mohammad.athari.ismail@intel.com Signed-off-by: Wong Vee Khee vee.khee.wong@linux.intel.com Signed-off-by: Tan Tee Min tee.min.tan@linux.intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c | 1 + drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +- include/linux/stmmac.h | 1 + 3 files changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c @@ -454,6 +454,7 @@ static int intel_mgbe_common_data(struct plat->has_gmac4 = 1; plat->force_sf_dma_mode = 0; plat->tso_en = 1; + plat->sph_disable = 1;
/* Multiplying factor to the clk_eee_i clock time * period to make it closer to 100 ns. This value --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -7081,7 +7081,7 @@ int stmmac_dvr_probe(struct device *devi dev_info(priv->device, "TSO feature enabled\n"); }
- if (priv->dma_cap.sphen) { + if (priv->dma_cap.sphen && !priv->plat->sph_disable) { ndev->hw_features |= NETIF_F_GRO; priv->sph_cap = true; priv->sph = priv->sph_cap; --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -269,5 +269,6 @@ struct plat_stmmacenet_data { int msi_rx_base_vec; int msi_tx_base_vec; bool use_phy_wol; + bool sph_disable; }; #endif
From: Thomas Pfaff tpfaff@pcs.com
commit 8707898e22fd665bc1d7b18b809be4b56ce25bdd upstream.
A kernel hang can be observed when running setserial in a loop on a kernel with force threaded interrupts. The sequence of events is:
   setserial
     open("/dev/ttyXXX")
       request_irq()
     do_stuff()
      -> serial interrupt
         -> wake(irq_thread)
              desc->threads_active++;
     close()
       free_irq()
         kthread_stop(irq_thread)
     synchronize_irq() <- hangs because desc->threads_active != 0
The thread is created in request_irq() and woken up, but does not get on a CPU to reach the actual thread function, which would handle the pending wake-up. kthread_stop() sets the should stop condition which makes the thread immediately exit, which in turn leaves the stale threads_active count around.
This problem was introduced with commit 519cc8652b3a, which addressed an interrupt sharing issue in the PCIe code.
Before that commit free_irq() invoked synchronize_irq(), which waits for the hard interrupt handler and also for associated threads to complete.
To address the PCIe issue synchronize_irq() was replaced with __synchronize_hardirq(), which only waits for the hard interrupt handler to complete, but not for threaded handlers.
This was done under the assumption that the interrupt thread had already reached the thread function and was waiting for a wake-up, which is guaranteed to be handled before acting on the stop condition. The problematic case, in which the thread never reaches the thread function, was overlooked.
Make sure that the interrupt thread is really started and reaches thread_fn() before returning from __setup_irq().
This utilizes the existing wait queue in the interrupt descriptor. The wait queue is unused for non-shared interrupts. For shared interrupts the usage might cause a spurious wake-up of a waiter in synchronize_irq() or the completion of a threaded handler might cause a spurious wake-up of the waiter for the ready flag. Both are harmless and have no functional impact.
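The race and the effect of waiting for the thread can be illustrated with a deliberately simplified, single-threaded model. It is a sketch under stated assumptions, not the genirq implementation: all names are hypothetical, "scheduling" the thread is an explicit function call, and the real fix uses the wait queue and the IRQTF_READY flag rather than a direct call.

```c
#include <stdbool.h>

struct irq_model {
    bool reached_thread_fn; /* thread got on a CPU and entered its loop */
    bool should_stop;       /* set by the kthread_stop() analogue */
    int threads_active;     /* wake-ups that have not been handled yet */
};

/* Give the modelled thread a chance to run. */
static void model_schedule_thread(struct irq_model *m)
{
    if (!m->reached_thread_fn) {
        if (m->should_stop)
            return;             /* exits without ever handling wake-ups */
        m->reached_thread_fn = true;
    }
    m->threads_active = 0;      /* pending wake-ups are handled first */
}

static void model_setup_irq(struct irq_model *m, bool wait_for_ready)
{
    m->reached_thread_fn = false;
    m->should_stop = false;
    m->threads_active = 0;
    if (wait_for_ready)
        model_schedule_thread(m);   /* the fix: return only once ready */
}

static void model_interrupt(struct irq_model *m)
{
    m->threads_active++;
}

/* Returns the leftover count; non-zero is where synchronize_irq() hangs. */
static int model_free_irq(struct irq_model *m)
{
    m->should_stop = true;
    model_schedule_thread(m);
    return m->threads_active;
}
```

Without the wait, an interrupt followed by the free path leaves a count of 1 behind; with the wait, the thread is already in its loop and drains the count before it stops.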
[ tglx: Amended changelog ]
Fixes: 519cc8652b3a ("genirq: Synchronize only with single thread on free_irq()") Signed-off-by: Thomas Pfaff tpfaff@pcs.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/552fe7b4-9224-b183-bb87-a8f36d335690@pcs.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/irq/internals.h | 2 ++ kernel/irq/irqdesc.c | 2 ++ kernel/irq/manage.c | 39 +++++++++++++++++++++++++++++---------- 3 files changed, 33 insertions(+), 10 deletions(-)
--- a/kernel/irq/internals.h +++ b/kernel/irq/internals.h @@ -29,12 +29,14 @@ extern struct irqaction chained_action; * IRQTF_WARNED - warning "IRQ_WAKE_THREAD w/o thread_fn" has been printed * IRQTF_AFFINITY - irq thread is requested to adjust affinity * IRQTF_FORCED_THREAD - irq action is force threaded + * IRQTF_READY - signals that irq thread is ready */ enum { IRQTF_RUNTHREAD, IRQTF_WARNED, IRQTF_AFFINITY, IRQTF_FORCED_THREAD, + IRQTF_READY, };
/* --- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -407,6 +407,7 @@ static struct irq_desc *alloc_desc(int i lockdep_set_class(&desc->lock, &irq_desc_lock_class); mutex_init(&desc->request_mutex); init_rcu_head(&desc->rcu); + init_waitqueue_head(&desc->wait_for_threads);
desc_set_defaults(irq, desc, node, affinity, owner); irqd_set(&desc->irq_data, flags); @@ -575,6 +576,7 @@ int __init early_irq_init(void) raw_spin_lock_init(&desc[i].lock); lockdep_set_class(&desc[i].lock, &irq_desc_lock_class); mutex_init(&desc[i].request_mutex); + init_waitqueue_head(&desc[i].wait_for_threads); desc_set_defaults(i, &desc[i], node, NULL, NULL); } return arch_early_irq_init(); --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -1249,6 +1249,31 @@ static void irq_wake_secondary(struct ir }
/* + * Internal function to notify that a interrupt thread is ready. + */ +static void irq_thread_set_ready(struct irq_desc *desc, + struct irqaction *action) +{ + set_bit(IRQTF_READY, &action->thread_flags); + wake_up(&desc->wait_for_threads); +} + +/* + * Internal function to wake up a interrupt thread and wait until it is + * ready. + */ +static void wake_up_and_wait_for_irq_thread_ready(struct irq_desc *desc, + struct irqaction *action) +{ + if (!action || !action->thread) + return; + + wake_up_process(action->thread); + wait_event(desc->wait_for_threads, + test_bit(IRQTF_READY, &action->thread_flags)); +} + +/* * Interrupt handler thread */ static int irq_thread(void *data) @@ -1259,6 +1284,8 @@ static int irq_thread(void *data) irqreturn_t (*handler_fn)(struct irq_desc *desc, struct irqaction *action);
+ irq_thread_set_ready(desc, action); + if (force_irqthreads() && test_bit(IRQTF_FORCED_THREAD, &action->thread_flags)) handler_fn = irq_forced_thread_fn; @@ -1683,8 +1710,6 @@ __setup_irq(unsigned int irq, struct irq }
if (!shared) { - init_waitqueue_head(&desc->wait_for_threads); - /* Setup the type (level, edge polarity) if configured: */ if (new->flags & IRQF_TRIGGER_MASK) { ret = __irq_set_trigger(desc, @@ -1780,14 +1805,8 @@ __setup_irq(unsigned int irq, struct irq
irq_setup_timings(desc, new);
- /* - * Strictly no need to wake it up, but hung_task complains - * when no hard interrupt wakes the thread up. - */ - if (new->thread) - wake_up_process(new->thread); - if (new->secondary) - wake_up_process(new->secondary->thread); + wake_up_and_wait_for_irq_thread_ready(desc, new); + wake_up_and_wait_for_irq_thread_ready(desc, new->secondary);
register_irq_proc(irq, desc); new->dir = NULL;
From: Mark Brown broonie@kernel.org
commit 08ef48404965cfef99343d6bbbcf75b88c74aa0e upstream.
The tone generator frequency control just returns 0 on successful write, not a boolean value indicating if there was a change or not. Compare what was written with the value that was there previously so that notifications are generated appropriately when the value changes.
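The return convention for a control's put() operation (negative on error, 0 when nothing changed, 1 when the value changed) can be sketched as follows. `struct fake_reg` and `model_ctl_put()` are hypothetical and only mimic the read, compare and conditional write sequence; they do not use regmap.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register model; "fail" simulates a bus error. */
struct fake_reg {
    uint16_t value;
    bool fail;
};

/* Returns a negative errno on failure, 0 if unchanged, 1 if changed. */
static int model_ctl_put(struct fake_reg *reg, uint16_t val_new)
{
    uint16_t val_old;

    if (reg->fail)
        return -EIO;

    val_old = reg->value;           /* read the current value first */
    if (val_old != val_new)
        reg->value = val_new;       /* write only when it differs */

    return val_old != val_new;      /* tells the caller to notify userspace */
}
```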
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Adam Thomson Adam.Thomson.Opensource@diasemi.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220420133437.569229-1-broonie@kernel.org Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/codecs/da7219.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
--- a/sound/soc/codecs/da7219.c +++ b/sound/soc/codecs/da7219.c @@ -446,7 +446,7 @@ static int da7219_tonegen_freq_put(struc struct soc_mixer_control *mixer_ctrl = (struct soc_mixer_control *) kcontrol->private_value; unsigned int reg = mixer_ctrl->reg; - __le16 val; + __le16 val_new, val_old; int ret;
/* @@ -454,13 +454,19 @@ static int da7219_tonegen_freq_put(struc * Therefore we need to convert to little endian here to align with * HW registers. */ - val = cpu_to_le16(ucontrol->value.integer.value[0]); + val_new = cpu_to_le16(ucontrol->value.integer.value[0]);
mutex_lock(&da7219->ctrl_lock); - ret = regmap_raw_write(da7219->regmap, reg, &val, sizeof(val)); + ret = regmap_raw_read(da7219->regmap, reg, &val_old, sizeof(val_old)); + if (ret == 0 && (val_old != val_new)) + ret = regmap_raw_write(da7219->regmap, reg, + &val_new, sizeof(val_new)); mutex_unlock(&da7219->ctrl_lock);
- return ret; + if (ret < 0) + return ret; + + return val_old != val_new; }
From: Mark Brown broonie@kernel.org
commit b4f5c6b2e52b27462c0599e64e96e53b58438de1 upstream.
The WM8958 DSP controls all return 0 on successful write, not a boolean value indicating if the write changed the value of the control. Fix this by returning 1 after a change, there is already a check at the start of each put() that skips the function in the case that there is no change.
Signed-off-by: Mark Brown broonie@kernel.org Acked-by: Charles Keepax ckeepax@opensource.cirrus.com Link: https://lore.kernel.org/r/20220416125408.197440-1-broonie@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/codecs/wm8958-dsp2.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/sound/soc/codecs/wm8958-dsp2.c +++ b/sound/soc/codecs/wm8958-dsp2.c @@ -530,7 +530,7 @@ static int wm8958_mbc_put(struct snd_kco
wm8958_dsp_apply(component, mbc, wm8994->mbc_ena[mbc]);
- return 0; + return 1; }
#define WM8958_MBC_SWITCH(xname, xval) {\ @@ -656,7 +656,7 @@ static int wm8958_vss_put(struct snd_kco
wm8958_dsp_apply(component, vss, wm8994->vss_ena[vss]);
- return 0; + return 1; }
@@ -730,7 +730,7 @@ static int wm8958_hpf_put(struct snd_kco
wm8958_dsp_apply(component, hpf % 3, ucontrol->value.integer.value[0]);
- return 0; + return 1; }
#define WM8958_HPF_SWITCH(xname, xval) {\ @@ -824,7 +824,7 @@ static int wm8958_enh_eq_put(struct snd_
wm8958_dsp_apply(component, eq, ucontrol->value.integer.value[0]);
- return 0; + return 1; }
#define WM8958_ENH_EQ_SWITCH(xname, xval) {\
From: Mark Brown broonie@kernel.org
commit 2e3a0d1bfa95b54333f7add3e50e288769373873 upstream.
The AIU ACODEC has a custom put() operation which returns 0 when the value of the mux changes, meaning that events are not generated for userspace. Change to return 1 in this case, the function returns early in the case where there is no change.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Jerome Brunet jbrunet@baylibre.com Link: https://lore.kernel.org/r/20220421123803.292063-2-broonie@kernel.org Signed-off-by: Mark Brown broonie@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/meson/aiu-acodec-ctrl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/soc/meson/aiu-acodec-ctrl.c +++ b/sound/soc/meson/aiu-acodec-ctrl.c @@ -58,7 +58,7 @@ static int aiu_acodec_ctrl_mux_put_enum(
snd_soc_dapm_mux_update_power(dapm, kcontrol, mux, e, NULL);
- return 0; + return 1; }
static SOC_ENUM_SINGLE_DECL(aiu_acodec_ctrl_mux_enum, AIU_ACODEC_CTRL,
From: Mark Brown broonie@kernel.org
commit 12131008fc13ff7f7690d170b7a8f72d24fd7d1e upstream.
The G12A tohdmi has a custom put() operation which returns 0 when the value of the mux changes, meaning that events are not generated for userspace. Change to return 1 in this case, the function returns early in the case where there is no change.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Jerome Brunet jbrunet@baylibre.com Link: https://lore.kernel.org/r/20220421123803.292063-4-broonie@kernel.org Signed-off-by: Mark Brown broonie@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/meson/g12a-tohdmitx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/soc/meson/g12a-tohdmitx.c +++ b/sound/soc/meson/g12a-tohdmitx.c @@ -67,7 +67,7 @@ static int g12a_tohdmitx_i2s_mux_put_enu
snd_soc_dapm_mux_update_power(dapm, kcontrol, mux, e, NULL);
- return 0; + return 1; }
static SOC_ENUM_SINGLE_DECL(g12a_tohdmitx_i2s_mux_enum, TOHDMITX_CTRL0,
From: Mark Brown broonie@kernel.org
commit fce49921a22262736cdc3cc74fa67915b75e9363 upstream.
The AIU CODEC has a custom put() operation which returns 0 when the value of the mux changes, meaning that events are not generated for userspace. Change to return 1 in this case, the function returns early in the case where there is no change.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Jerome Brunet jbrunet@baylibre.com Link: https://lore.kernel.org/r/20220421123803.292063-3-broonie@kernel.org Signed-off-by: Mark Brown broonie@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/meson/aiu-codec-ctrl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/soc/meson/aiu-codec-ctrl.c +++ b/sound/soc/meson/aiu-codec-ctrl.c @@ -57,7 +57,7 @@ static int aiu_codec_ctrl_mux_put_enum(s
snd_soc_dapm_mux_update_power(dapm, kcontrol, mux, e, NULL);
- return 0; + return 1; }
static SOC_ENUM_SINGLE_DECL(aiu_hdmi_ctrl_mux_enum, AIU_HDMI_CLK_DATA_CTRL,
From: Stefan Haberland sth@linux.ibm.com
commit 5b53a405e4658580e1faf7c217db3f55a21ba849 upstream.
For ESE devices we get an error when accessing an unformatted track. The handling of this error will return zero data for read requests and format the track on demand before writing to it. To do this the code needs to distinguish between read and write requests. This is done with data from the blocklayer request. A pointer to the blocklayer request is stored in the CQR.
If there is an error on the device an ERP request is built to do error recovery. While the ERP request is mostly a copy of the original CQR the pointer to the blocklayer request is not copied to not accidentally pass it back to the blocklayer without cleanup.
This leads to the error that during ESE handling after an ERP request was built it is not possible to determine the IO direction. This leads to the formatting of a track for read requests which might in turn lead to data corruption.
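A small sketch of how the original request's data can still be found from an ERP request: follow the chain of `refers` pointers back to the first request. `struct req_model` is a hypothetical, reduced stand-in for the CQR structure, and the callback data is just an opaque pointer here.

```c
#include <stddef.h>

/* Hypothetical, reduced request: ERP requests point at what they recover. */
struct req_model {
    struct req_model *refers;   /* NULL for the original request */
    void *callback_data;        /* only set on the original request */
};

/* Walk back to the original request and return its callback data. */
static void *model_get_callback_data(struct req_model *cqr)
{
    while (cqr->refers)
        cqr = cqr->refers;

    return cqr->callback_data;
}
```

With two ERP requests stacked on one original request, the lookup from the topmost ERP request yields the original's pointer even though the ERP requests themselves carry NULL.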
Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes") Cc: stable@vger.kernel.org # 5.3+ Signed-off-by: Stefan Haberland sth@linux.ibm.com Reviewed-by: Jan Hoeppner hoeppner@linux.ibm.com Link: https://lore.kernel.org/r/20220505141733.1989450-2-sth@linux.ibm.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/block/dasd.c | 8 +++++++- drivers/s390/block/dasd_eckd.c | 2 +- drivers/s390/block/dasd_int.h | 12 ++++++++++++ 3 files changed, 20 insertions(+), 2 deletions(-)
--- a/drivers/s390/block/dasd.c +++ b/drivers/s390/block/dasd.c @@ -1639,6 +1639,7 @@ void dasd_int_handler(struct ccw_device unsigned long now; int nrf_suppressed = 0; int fp_suppressed = 0; + struct request *req; u8 *sense = NULL; int expires;
@@ -1739,7 +1740,12 @@ void dasd_int_handler(struct ccw_device }
if (dasd_ese_needs_format(cqr->block, irb)) { - if (rq_data_dir((struct request *)cqr->callback_data) == READ) { + req = dasd_get_callback_data(cqr); + if (!req) { + cqr->status = DASD_CQR_ERROR; + return; + } + if (rq_data_dir(req) == READ) { device->discipline->ese_read(cqr, irb); cqr->status = DASD_CQR_SUCCESS; cqr->stopclk = now; --- a/drivers/s390/block/dasd_eckd.c +++ b/drivers/s390/block/dasd_eckd.c @@ -3157,7 +3157,7 @@ dasd_eckd_ese_format(struct dasd_device sector_t curr_trk; int rc;
- req = cqr->callback_data; + req = dasd_get_callback_data(cqr); block = cqr->block; base = block->base; private = base->private; --- a/drivers/s390/block/dasd_int.h +++ b/drivers/s390/block/dasd_int.h @@ -757,6 +757,18 @@ dasd_check_blocksize(int bsize) return 0; }
+/* + * return the callback data of the original request in case there are + * ERP requests build on top of it + */ +static inline void *dasd_get_callback_data(struct dasd_ccw_req *cqr) +{ + while (cqr->refers) + cqr = cqr->refers; + + return cqr->callback_data; +} + /* externals in dasd.c */ #define DASD_PROFILE_OFF 0 #define DASD_PROFILE_ON 1
From: Stefan Haberland sth@linux.ibm.com
commit 71f3871657370dbbaf942a1c758f64e49a36c70f upstream.
For ESE devices we get an error for write operations on an unformatted track. Afterwards the track will be formatted and the IO operation restarted. When using alias devices a track might be accessed by multiple requests simultaneously and there is a race window that a track gets formatted twice resulting in data loss.
Prevent this by remembering the amount of formatted tracks when starting a request and comparing this number before actually formatting a track on the fly. If the number has changed there is a chance that the current track was finally formatted in between. As a result do not format the track and restart the current IO to check.
The number of formatted tracks does not match the overall number of formatted tracks on the device and it might wrap around but this is no problem. It is only needed to recognize that a track has been formatted at all in between.
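The counter comparison can be sketched like this, assuming a plain unsigned counter instead of the kernel's atomic type; all names are hypothetical. Because only inequality is tested, unsigned wrap-around does not matter.

```c
#include <stdbool.h>

struct block_model { unsigned int trkcount; };  /* bumped per formatted track */
struct io_model    { unsigned int trkcount; };  /* snapshot taken at I/O start */

/* Remember the counter value when the request is started. */
static void model_start_io(struct io_model *io, const struct block_model *b)
{
    io->trkcount = b->trkcount;
}

/* Called when some request has finished formatting a track. */
static void model_format_done(struct block_model *b)
{
    b->trkcount++;      /* unsigned arithmetic wraps, which is fine here */
}

/* Formatting is only safe if no track was formatted since the I/O started. */
static bool model_may_format(const struct io_model *io,
                             const struct block_model *b)
{
    return io->trkcount == b->trkcount;
}
```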
Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes") Cc: stable@vger.kernel.org # 5.3+ Signed-off-by: Stefan Haberland sth@linux.ibm.com Reviewed-by: Jan Hoeppner hoeppner@linux.ibm.com Link: https://lore.kernel.org/r/20220505141733.1989450-3-sth@linux.ibm.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/block/dasd.c | 7 +++++++ drivers/s390/block/dasd_eckd.c | 19 +++++++++++++++++-- drivers/s390/block/dasd_int.h | 2 ++ 3 files changed, 26 insertions(+), 2 deletions(-)
--- a/drivers/s390/block/dasd.c +++ b/drivers/s390/block/dasd.c @@ -1422,6 +1422,13 @@ int dasd_start_IO(struct dasd_ccw_req *c if (!cqr->lpm) cqr->lpm = dasd_path_get_opm(device); } + /* + * remember the amount of formatted tracks to prevent double format on + * ESE devices + */ + if (cqr->block) + cqr->trkcount = atomic_read(&cqr->block->trkcount); + if (cqr->cpmode == 1) { rc = ccw_device_tm_start(device->cdev, cqr->cpaddr, (long) cqr, cqr->lpm); --- a/drivers/s390/block/dasd_eckd.c +++ b/drivers/s390/block/dasd_eckd.c @@ -3095,13 +3095,24 @@ static int dasd_eckd_format_device(struc }
static bool test_and_set_format_track(struct dasd_format_entry *to_format, - struct dasd_block *block) + struct dasd_ccw_req *cqr) { + struct dasd_block *block = cqr->block; struct dasd_format_entry *format; unsigned long flags; bool rc = false;
spin_lock_irqsave(&block->format_lock, flags); + if (cqr->trkcount != atomic_read(&block->trkcount)) { + /* + * The number of formatted tracks has changed after request + * start and we can not tell if the current track was involved. + * To avoid data corruption treat it as if the current track is + * involved + */ + rc = true; + goto out; + } list_for_each_entry(format, &block->format_list, list) { if (format->track == to_format->track) { rc = true; @@ -3121,6 +3132,7 @@ static void clear_format_track(struct da unsigned long flags;
spin_lock_irqsave(&block->format_lock, flags); + atomic_inc(&block->trkcount); list_del_init(&format->list); spin_unlock_irqrestore(&block->format_lock, flags); } @@ -3182,8 +3194,11 @@ dasd_eckd_ese_format(struct dasd_device } format->track = curr_trk; /* test if track is already in formatting by another thread */ - if (test_and_set_format_track(format, block)) + if (test_and_set_format_track(format, cqr)) { + /* this is no real error so do not count down retries */ + cqr->retries++; return ERR_PTR(-EEXIST); + }
fdata.start_unit = curr_trk; fdata.stop_unit = curr_trk; --- a/drivers/s390/block/dasd_int.h +++ b/drivers/s390/block/dasd_int.h @@ -188,6 +188,7 @@ struct dasd_ccw_req { void (*callback)(struct dasd_ccw_req *, void *data); void *callback_data; unsigned int proc_bytes; /* bytes for partial completion */ + unsigned int trkcount; /* count formatted tracks */ };
/* @@ -611,6 +612,7 @@ struct dasd_block {
struct list_head format_list; spinlock_t format_lock; + atomic_t trkcount; };
struct dasd_attention_data {
From: Jan Höppner hoeppner@linux.ibm.com
commit cd68c48ea15c85f1577a442dc4c285e112ff1b37 upstream.
When reading unformatted tracks on ESE devices, the corresponding memory areas are simply set to zero for each segment. This is done incorrectly for blocksizes < 4096.
There are two problems. First, the increment of dst is done using the counter of the loop (off), which is increased by blksize every iteration. This leads to a much bigger increment for dst than actually intended. Second, the increment of dst is done before the memory area is set to 0, skipping a significant number of bytes of memory.
This leads to illegal overwriting of memory and ultimately to a kernel panic.
This is not a problem with 4k blocksize because blk_queue_max_segment_size is set to PAGE_SIZE, always resulting in a single iteration for the inner segment loop (bv.bv_len == blksize). The incorrectly used 'off' value to increment dst is 0 and the correct memory area is used.
In order to fix this for blksize < 4k, increment dst correctly using the blksize and only do it at the end of the loop.
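To make the arithmetic concrete, the sketch below records the byte offset that each loop iteration would zero within one segment, once with the old increment and once with the corrected one. The function names are hypothetical and the skip_block handling is left out for brevity.

```c
#include <stddef.h>

/* Old behaviour: dst grows by the loop counter before each memset. */
static size_t offsets_before_fix(size_t seg_len, size_t blksize, size_t *out)
{
    size_t dst = 0, n = 0;

    for (size_t off = 0; off < seg_len; off += blksize) {
        dst += off;         /* cumulative, so it overshoots quickly */
        out[n++] = dst;     /* offset that would be zeroed */
    }
    return n;
}

/* Fixed behaviour: zero at dst, then advance by exactly one block. */
static size_t offsets_after_fix(size_t seg_len, size_t blksize, size_t *out)
{
    size_t dst = 0, n = 0;

    for (size_t off = 0; off < seg_len; off += blksize) {
        out[n++] = dst;     /* offset that is zeroed */
        dst += blksize;
    }
    return n;
}
```

For a 4096-byte segment with a 1024-byte block size, the old loop touches offsets 0, 1024, 3072 and 6144, the last of which lies outside the segment, while the fixed loop touches 0, 1024, 2048 and 3072. With a 4096-byte block size both variants run once at offset 0, which is why the 4k case was unaffected.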
Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes") Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Jan Höppner hoeppner@linux.ibm.com Reviewed-by: Stefan Haberland sth@linux.ibm.com Link: https://lore.kernel.org/r/20220505141733.1989450-4-sth@linux.ibm.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/block/dasd_eckd.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/drivers/s390/block/dasd_eckd.c +++ b/drivers/s390/block/dasd_eckd.c @@ -3297,12 +3297,11 @@ static int dasd_eckd_ese_read(struct das cqr->proc_bytes = blk_count * blksize; return 0; } - if (dst && !skip_block) { - dst += off; + if (dst && !skip_block) memset(dst, 0, blksize); - } else { + else skip_block--; - } + dst += blksize; blk_count++; } }
From: Jan Höppner hoeppner@linux.ibm.com
commit b9c10f68e23c13f56685559a0d6fdaca9f838324 upstream.
Read requests that return with NRF error are partially completed in dasd_eckd_ese_read(). The function keeps track of the amount of processed bytes and the driver will eventually return this information back to the block layer for further processing via __dasd_cleanup_cqr() when the request is in the final stage of processing (from the driver's perspective).
For this, blk_update_request() is used which requires the number of bytes to complete the request. As per documentation the nr_bytes parameter is described as follows: "number of bytes to complete for @req".
This was mistakenly interpreted as "number of bytes _left_ for @req", leading to new requests with incorrect data length. The consequences are inconsistent and completely wrong read requests, as data from random memory areas is read back.
Fix this by correctly specifying the amount of bytes that should be used to complete the request.
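A short worked example of the difference, using a hypothetical helper that only models how many bytes remain after a partial completion (it does not call the block layer):

```c
/* Bytes still outstanding after completing nr_bytes of a request. */
static unsigned int model_bytes_left(unsigned int total, unsigned int nr_bytes)
{
    return nr_bytes >= total ? 0 : total - nr_bytes;
}

/* Old call: passed the bytes left, so that many bytes were completed. */
static unsigned int left_before_fix(unsigned int total, unsigned int proc_bytes)
{
    return model_bytes_left(total, total - proc_bytes);
}

/* Fixed call: completes exactly the bytes that were processed. */
static unsigned int left_after_fix(unsigned int total, unsigned int proc_bytes)
{
    return model_bytes_left(total, proc_bytes);
}
```

For an 8192-byte request of which 2048 bytes were processed, the old call leaves 2048 bytes to be requeued, whereas the fixed call leaves the expected 6144 bytes.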
Fixes: 5e6bdd37c552 ("s390/dasd: fix data corruption for thin provisioned devices") Cc: stable@vger.kernel.org # 5.3+ Signed-off-by: Jan Höppner hoeppner@linux.ibm.com Reviewed-by: Stefan Haberland sth@linux.ibm.com Link: https://lore.kernel.org/r/20220505141733.1989450-5-sth@linux.ibm.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/block/dasd.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/s390/block/dasd.c +++ b/drivers/s390/block/dasd.c @@ -2775,8 +2775,7 @@ static void __dasd_cleanup_cqr(struct da * complete a request partially. */ if (proc_bytes) { - blk_update_request(req, BLK_STS_OK, - blk_rq_bytes(req) - proc_bytes); + blk_update_request(req, BLK_STS_OK, proc_bytes); blk_mq_requeue_request(req, true); } else if (likely(!blk_should_fake_timeout(req->q))) { blk_mq_complete_request(req);
From: Duoming Zhou duoming@zju.edu.cn
commit 47f070a63e735bcc8d481de31be1b5a1aa62b31c upstream.
There are deadlocks caused by del_timer_sync(&priv->hang_timer) and del_timer_sync(&priv->rr_timer) in grcan_close(); one of the deadlocks is shown below:
   (Thread 1)              |        (Thread 2)
                           | grcan_reset_timer()
grcan_close()              |  mod_timer()
 spin_lock_irqsave() //(1) |  (wait a time)
 ...                       | grcan_initiate_running_reset()
 del_timer_sync()          |  spin_lock_irqsave() //(2)
 (wait timer to stop)      |  ...
We hold priv->lock at position (1) in thread 1 and use del_timer_sync() to wait for the timer to stop, but the timer handler also needs priv->lock at position (2) in thread 2. As a result, grcan_close() will block forever.
This patch moves del_timer_sync() out of the region protected by spin_lock_irqsave(), so that the timer handler can obtain the lock it needs.
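The ordering problem can be modelled in a few lines of single-threaded C. This is a sketch with hypothetical names, not the driver code: the lock is a boolean, and the del_timer_sync() analogue returns false at the point where the real call would wait forever because the handler can never take the lock.

```c
#include <assert.h>
#include <stdbool.h>

struct can_model {
    bool locked;            /* models priv->lock */
    bool timer_pending;     /* a timer handler still has to run */
};

static void model_lock(struct can_model *p)   { assert(!p->locked); p->locked = true; }
static void model_unlock(struct can_model *p) { assert(p->locked);  p->locked = false; }

/* Waits for the handler; false means the wait could never finish. */
static bool model_del_timer_sync(struct can_model *p)
{
    if (p->timer_pending) {
        if (p->locked)
            return false;       /* handler needs the lock the caller holds */
        model_lock(p);          /* handler runs under the lock */
        p->timer_pending = false;
        model_unlock(p);
    }
    return true;
}

/* Old ordering: wait for the timer while holding the lock. */
static bool model_close_before_fix(struct can_model *p)
{
    bool ok;

    model_lock(p);
    ok = model_del_timer_sync(p);
    model_unlock(p);
    return ok;
}

/* New ordering: drop the lock around the wait, then take it again. */
static bool model_close_after_fix(struct can_model *p)
{
    bool ok;

    model_lock(p);
    model_unlock(p);
    ok = model_del_timer_sync(p);
    model_lock(p);
    /* ... stop queue and hardware under the lock ... */
    model_unlock(p);
    return ok;
}
```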
Link: https://lore.kernel.org/all/20220425042400.66517-1-duoming@zju.edu.cn Fixes: 6cec9b07fe6a ("can: grcan: Add device driver for GRCAN and GRHCAN cores") Cc: stable@vger.kernel.org Signed-off-by: Duoming Zhou duoming@zju.edu.cn Reviewed-by: Andreas Larsson andreas@gaisler.com Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/can/grcan.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/can/grcan.c +++ b/drivers/net/can/grcan.c @@ -1113,8 +1113,10 @@ static int grcan_close(struct net_device
priv->closing = true; if (priv->need_txbug_workaround) { + spin_unlock_irqrestore(&priv->lock, flags); del_timer_sync(&priv->hang_timer); del_timer_sync(&priv->rr_timer); + spin_lock_irqsave(&priv->lock, flags); } netif_stop_queue(dev); grcan_stop_hardware(dev);
From: Oliver Hartkopp socketcan@hartkopp.net
commit 72ed3ee9fa0b461ad086403a8b5336154bd82234 upstream.
As a carry-over from the CAN_RAW socket (which allows changing the CAN interface while maintaining the filter setup), the re-binding of the CAN_ISOTP socket needs to take care of CAN ID address information and subscriptions. It turned out that this feature is so limited (e.g. the sockopts remain fixed) that it has never actually been needed or used.
In contrast to the stateless CAN_RAW socket, switching the CAN ID subscriptions might additionally interrupt an ongoing PDU reception. So better remove this unneeded complexity.
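The resulting behaviour, where a second bind on the same socket is rejected instead of switching subscriptions, can be sketched as follows. `struct isotp_model` and `model_bind()` are hypothetical and leave out device lookup and filter registration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced socket state. */
struct isotp_model {
    bool bound;
    int ifindex;
    unsigned int rxid;
    unsigned int txid;
};

/* The first bind succeeds; any later bind on the same socket is refused. */
static int model_bind(struct isotp_model *so, int ifindex,
                      unsigned int rxid, unsigned int txid)
{
    if (so->bound)
        return -EINVAL;

    so->ifindex = ifindex;
    so->rxid = rxid;
    so->txid = txid;
    so->bound = true;
    return 0;
}
```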
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://lore.kernel.org/all/20220422082337.1676-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org Signed-off-by: Oliver Hartkopp socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/can/isotp.c | 22 +++++----------------- 1 file changed, 5 insertions(+), 17 deletions(-)
--- a/net/can/isotp.c +++ b/net/can/isotp.c @@ -1146,6 +1146,11 @@ static int isotp_bind(struct socket *soc
lock_sock(sk);
+ if (so->bound) { + err = -EINVAL; + goto out; + } + /* do not register frame reception for functional addressing */ if (so->opt.flags & CAN_ISOTP_SF_BROADCAST) do_rx_reg = 0; @@ -1156,10 +1161,6 @@ static int isotp_bind(struct socket *soc goto out; }
- if (so->bound && addr->can_ifindex == so->ifindex && - rx_id == so->rxid && tx_id == so->txid) - goto out; - dev = dev_get_by_index(net, addr->can_ifindex); if (!dev) { err = -ENODEV; @@ -1186,19 +1187,6 @@ static int isotp_bind(struct socket *soc
dev_put(dev);
- if (so->bound && do_rx_reg) { - /* unregister old filter */ - if (so->ifindex) { - dev = dev_get_by_index(net, so->ifindex); - if (dev) { - can_rx_unregister(net, dev, so->rxid, - SINGLE_MASK(so->rxid), - isotp_rcv, sk); - dev_put(dev); - } - } - } - /* switch to new settings */ so->ifindex = ifindex; so->rxid = rx_id;
From: Daniel Hellstrom daniel@gaisler.com
commit 101da4268626b00d16356a6bf284d66e44c46ff9 upstream.
The device of the device tree node should be used rather than the device of the struct net_device when allocating DMA buffers.
The driver got away with it on sparc32 until commit 53b7670e5735 ("sparc: factor the dma coherent mapping into helper") after which the driver oopses.
Fixes: 6cec9b07fe6a ("can: grcan: Add device driver for GRCAN and GRHCAN cores") Link: https://lore.kernel.org/all/20220429084656.29788-2-andreas@gaisler.com Cc: stable@vger.kernel.org Signed-off-by: Daniel Hellstrom daniel@gaisler.com Signed-off-by: Andreas Larsson andreas@gaisler.com Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/can/grcan.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/net/can/grcan.c +++ b/drivers/net/can/grcan.c @@ -248,6 +248,7 @@ struct grcan_device_config { struct grcan_priv { struct can_priv can; /* must be the first member */ struct net_device *dev; + struct device *ofdev_dev; struct napi_struct napi;
struct grcan_registers __iomem *regs; /* ioremap'ed registers */ @@ -924,7 +925,7 @@ static void grcan_free_dma_buffers(struc struct grcan_priv *priv = netdev_priv(dev); struct grcan_dma *dma = &priv->dma;
- dma_free_coherent(&dev->dev, dma->base_size, dma->base_buf, + dma_free_coherent(priv->ofdev_dev, dma->base_size, dma->base_buf, dma->base_handle); memset(dma, 0, sizeof(*dma)); } @@ -949,7 +950,7 @@ static int grcan_allocate_dma_buffers(st
/* Extra GRCAN_BUFFER_ALIGNMENT to allow for alignment */ dma->base_size = lsize + ssize + GRCAN_BUFFER_ALIGNMENT; - dma->base_buf = dma_alloc_coherent(&dev->dev, + dma->base_buf = dma_alloc_coherent(priv->ofdev_dev, dma->base_size, &dma->base_handle, GFP_KERNEL); @@ -1602,6 +1603,7 @@ static int grcan_setup_netdev(struct pla memcpy(&priv->config, &grcan_module_config, sizeof(struct grcan_device_config)); priv->dev = dev; + priv->ofdev_dev = &ofdev->dev; priv->regs = base; priv->can.bittiming_const = &grcan_bittiming_const; priv->can.do_set_bittiming = grcan_set_bittiming;
From: Andreas Larsson andreas@gaisler.com
commit 1e93ed26acf03fe6c97c6d573a10178596aadd43 upstream.
The systemid property was checked for in the wrong place of the device tree and compared to the wrong value.
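To illustrate the "wrong value" half of the fix, here is a minimal userspace sketch, not driver code. It assumes, as the diff suggests, that the low 16 bits of systemid carry the GRLIB version as a plain number, so the threshold has to be decimal 4100 rather than 0x4100 (16640). The helper name and the sample values are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define GRLIB_VERSION_MASK 0xffffu
#define TXBUG_SAFE_OLD 0x4100u /* old threshold, 16640 in decimal */
#define TXBUG_SAFE_NEW 4100u   /* threshold after the fix */

/* Hypothetical helper: true if the version field of sysid reaches threshold. */
static bool grlib_version_at_least(uint32_t sysid, uint32_t threshold)
{
	return (sysid & GRLIB_VERSION_MASK) >= threshold;
}
```

With a made-up version field of 4104, the old threshold would still treat the core as affected by the tx bug, while the new one does not.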
Fixes: 6cec9b07fe6a ("can: grcan: Add device driver for GRCAN and GRHCAN cores") Link: https://lore.kernel.org/all/20220429084656.29788-3-andreas@gaisler.com Cc: stable@vger.kernel.org Signed-off-by: Andreas Larsson andreas@gaisler.com Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/can/grcan.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-)
--- a/drivers/net/can/grcan.c +++ b/drivers/net/can/grcan.c @@ -241,7 +241,7 @@ struct grcan_device_config { .rxsize = GRCAN_DEFAULT_BUFFER_SIZE, \ }
-#define GRCAN_TXBUG_SAFE_GRLIB_VERSION 0x4100 +#define GRCAN_TXBUG_SAFE_GRLIB_VERSION 4100 #define GRLIB_VERSION_MASK 0xffff
/* GRCAN private data structure */ @@ -1656,6 +1656,7 @@ exit_free_candev: static int grcan_probe(struct platform_device *ofdev) { struct device_node *np = ofdev->dev.of_node; + struct device_node *sysid_parent; u32 sysid, ambafreq; int irq, err; void __iomem *base; @@ -1664,10 +1665,15 @@ static int grcan_probe(struct platform_d /* Compare GRLIB version number with the first that does not * have the tx bug (see start_xmit) */ - err = of_property_read_u32(np, "systemid", &sysid); - if (!err && ((sysid & GRLIB_VERSION_MASK) - >= GRCAN_TXBUG_SAFE_GRLIB_VERSION)) - txbug = false; + sysid_parent = of_find_node_by_path("/ambapp0"); + if (sysid_parent) { + of_node_get(sysid_parent); + err = of_property_read_u32(sysid_parent, "systemid", &sysid); + if (!err && ((sysid & GRLIB_VERSION_MASK) >= + GRCAN_TXBUG_SAFE_GRLIB_VERSION)) + txbug = false; + of_node_put(sysid_parent); + }
err = of_property_read_u32(np, "freq", &ambafreq); if (err) {
From: Andreas Larsson andreas@gaisler.com
commit 2873d4d52f7c52d60b316ba6c47bd7122b5a9861 upstream.
The previous split of the budget between TX and RX could make the poll function return without having used the entire budget and yet without having called napi_complete. This sometimes led to the poll not being called again while TX and RX interrupts were still disabled, leaving the driver stuck.
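The failure mode can be modelled in plain userspace C. This is only a sketch under simplifying assumptions: poll_result, poll_split() and poll_rx_only() are hypothetical names, and "completed" stands for napi_complete having been called and interrupts re-enabled.

```c
#include <stdbool.h>

/* Hypothetical model of what one poll invocation reports. */
struct poll_result {
	int work_done;  /* value returned to the NAPI core */
	bool completed; /* true if polling was ended for now */
};

/* Old scheme: the budget is split in half between RX and TX. */
static struct poll_result poll_split(int rx_pending, int tx_pending, int budget)
{
	int rx_budget = budget / 2;
	int tx_budget = budget - rx_budget;
	int rx = rx_pending < rx_budget ? rx_pending : rx_budget;
	int tx = tx_pending < tx_budget ? tx_pending : tx_budget;
	struct poll_result r = { rx + tx, rx < rx_budget && tx < tx_budget };

	return r;
}

/* New scheme: only RX counts against the budget. */
static struct poll_result poll_rx_only(int rx_pending, int budget)
{
	int rx = rx_pending < budget ? rx_pending : budget;
	struct poll_result r = { rx, rx < budget };

	return r;
}
```

With a budget of 64, 40 pending RX frames and no TX work, the split version returns 32 (less than the budget) without completing, which is the inconsistent combination described above. The RX-only version returns 40 and completes.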
Fixes: 6cec9b07fe6a ("can: grcan: Add device driver for GRCAN and GRHCAN cores") Link: https://lore.kernel.org/all/20220429084656.29788-4-andreas@gaisler.com Cc: stable@vger.kernel.org Signed-off-by: Andreas Larsson andreas@gaisler.com Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/can/grcan.c | 22 +++++++--------------- 1 file changed, 7 insertions(+), 15 deletions(-)
--- a/drivers/net/can/grcan.c +++ b/drivers/net/can/grcan.c @@ -1137,7 +1137,7 @@ static int grcan_close(struct net_device return 0; }
-static int grcan_transmit_catch_up(struct net_device *dev, int budget) +static void grcan_transmit_catch_up(struct net_device *dev) { struct grcan_priv *priv = netdev_priv(dev); unsigned long flags; @@ -1145,7 +1145,7 @@ static int grcan_transmit_catch_up(struc
spin_lock_irqsave(&priv->lock, flags);
- work_done = catch_up_echo_skb(dev, budget, true); + work_done = catch_up_echo_skb(dev, -1, true); if (work_done) { if (!priv->resetting && !priv->closing && !(priv->can.ctrlmode & CAN_CTRLMODE_LISTENONLY)) @@ -1159,8 +1159,6 @@ static int grcan_transmit_catch_up(struc }
spin_unlock_irqrestore(&priv->lock, flags); - - return work_done; }
static int grcan_receive(struct net_device *dev, int budget) @@ -1242,19 +1240,13 @@ static int grcan_poll(struct napi_struct struct net_device *dev = priv->dev; struct grcan_registers __iomem *regs = priv->regs; unsigned long flags; - int tx_work_done, rx_work_done; - int rx_budget = budget / 2; - int tx_budget = budget - rx_budget; + int work_done;
- /* Half of the budget for receiving messages */ - rx_work_done = grcan_receive(dev, rx_budget); + work_done = grcan_receive(dev, budget);
- /* Half of the budget for transmitting messages as that can trigger echo - * frames being received - */ - tx_work_done = grcan_transmit_catch_up(dev, tx_budget); + grcan_transmit_catch_up(dev);
- if (rx_work_done < rx_budget && tx_work_done < tx_budget) { + if (work_done < budget) { napi_complete(napi);
/* Guarantee no interference with a running reset that otherwise @@ -1271,7 +1263,7 @@ static int grcan_poll(struct napi_struct spin_unlock_irqrestore(&priv->lock, flags); }
- return rx_work_done + tx_work_done; + return work_done; }
/* Work tx bug by waiting while for the risky situation to clear. If that fails,
From: Duoming Zhou duoming@zju.edu.cn
commit da5c0f119203ad9728920456a0f52a6d850c01cd upstream.
The device_is_registered() in nfc core is used to check whether nfc device is registered in netlink related functions such as nfc_fw_download(), nfc_dev_up() and so on. Although device_is_registered() is protected by device_lock, there is still a race condition between device_del() and device_is_registered(). The root cause is that kobject_del() in device_del() is not protected by device_lock.
   (cleanup task)         |     (netlink task)
                          |
nfc_unregister_device     | nfc_fw_download
 device_del               |  device_lock
  ...                     |   if (!device_is_registered)//(1)
  kobject_del//(2)        |   ...
 ...                      |  device_unlock
device_is_registered() returns the value of state_in_sysfs, and state_in_sysfs is set to zero in kobject_del(). If the check at position (1) passes and state_in_sysfs is then set to zero at position (2), the check at position (1) is useless.
This patch uses the bool variable shutting_down instead of device_is_registered() to judge whether the nfc device is registered. The variable is set and read under device_lock, so it is well synchronized.
Fixes: 3e256b8f8dfa ("NFC: add nfc subsystem core") Signed-off-by: Duoming Zhou duoming@zju.edu.cn Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/nfc/core.c | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-)
--- a/net/nfc/core.c +++ b/net/nfc/core.c @@ -38,7 +38,7 @@ int nfc_fw_download(struct nfc_dev *dev,
device_lock(&dev->dev);
- if (!device_is_registered(&dev->dev)) { + if (dev->shutting_down) { rc = -ENODEV; goto error; } @@ -94,7 +94,7 @@ int nfc_dev_up(struct nfc_dev *dev)
device_lock(&dev->dev);
- if (!device_is_registered(&dev->dev)) { + if (dev->shutting_down) { rc = -ENODEV; goto error; } @@ -142,7 +142,7 @@ int nfc_dev_down(struct nfc_dev *dev)
device_lock(&dev->dev);
- if (!device_is_registered(&dev->dev)) { + if (dev->shutting_down) { rc = -ENODEV; goto error; } @@ -207,7 +207,7 @@ int nfc_start_poll(struct nfc_dev *dev,
device_lock(&dev->dev);
- if (!device_is_registered(&dev->dev)) { + if (dev->shutting_down) { rc = -ENODEV; goto error; } @@ -246,7 +246,7 @@ int nfc_stop_poll(struct nfc_dev *dev)
device_lock(&dev->dev);
- if (!device_is_registered(&dev->dev)) { + if (dev->shutting_down) { rc = -ENODEV; goto error; } @@ -291,7 +291,7 @@ int nfc_dep_link_up(struct nfc_dev *dev,
device_lock(&dev->dev);
- if (!device_is_registered(&dev->dev)) { + if (dev->shutting_down) { rc = -ENODEV; goto error; } @@ -335,7 +335,7 @@ int nfc_dep_link_down(struct nfc_dev *de
device_lock(&dev->dev);
- if (!device_is_registered(&dev->dev)) { + if (dev->shutting_down) { rc = -ENODEV; goto error; } @@ -401,7 +401,7 @@ int nfc_activate_target(struct nfc_dev *
device_lock(&dev->dev);
- if (!device_is_registered(&dev->dev)) { + if (dev->shutting_down) { rc = -ENODEV; goto error; } @@ -448,7 +448,7 @@ int nfc_deactivate_target(struct nfc_dev
device_lock(&dev->dev);
- if (!device_is_registered(&dev->dev)) { + if (dev->shutting_down) { rc = -ENODEV; goto error; } @@ -495,7 +495,7 @@ int nfc_data_exchange(struct nfc_dev *de
device_lock(&dev->dev);
- if (!device_is_registered(&dev->dev)) { + if (dev->shutting_down) { rc = -ENODEV; kfree_skb(skb); goto error; @@ -552,7 +552,7 @@ int nfc_enable_se(struct nfc_dev *dev, u
device_lock(&dev->dev);
- if (!device_is_registered(&dev->dev)) { + if (dev->shutting_down) { rc = -ENODEV; goto error; } @@ -601,7 +601,7 @@ int nfc_disable_se(struct nfc_dev *dev,
device_lock(&dev->dev);
- if (!device_is_registered(&dev->dev)) { + if (dev->shutting_down) { rc = -ENODEV; goto error; } @@ -1134,6 +1134,7 @@ int nfc_register_device(struct nfc_dev * dev->rfkill = NULL; } } + dev->shutting_down = false; device_unlock(&dev->dev);
rc = nfc_genl_device_added(dev); @@ -1166,12 +1167,10 @@ void nfc_unregister_device(struct nfc_de rfkill_unregister(dev->rfkill); rfkill_destroy(dev->rfkill); } + dev->shutting_down = true; device_unlock(&dev->dev);
if (dev->ops->check_presence) { - device_lock(&dev->dev); - dev->shutting_down = true; - device_unlock(&dev->dev); del_timer_sync(&dev->check_pres_timer); cancel_work_sync(&dev->check_pres_work); }
From: Duoming Zhou duoming@zju.edu.cn
commit d270453a0d9ec10bb8a802a142fb1b3601a83098 upstream.
There are destructive operations such as nfcmrvl_fw_dnld_abort and gpio_free in nfcmrvl_nci_unregister_dev. Resources such as the firmware and the gpio could be destroyed while upper layer functions such as nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame are executing, which leads to double-free, use-after-free and null-ptr-deref bugs.
There are three situations that could lead to double-free bugs.
The first situation is shown below:
   (Thread 1)                 |      (Thread 2)
nfcmrvl_fw_dnld_start         |
 ...                          |  nfcmrvl_nci_unregister_dev
 release_firmware()           |   nfcmrvl_fw_dnld_abort
  kfree(fw) //(1)             |    fw_dnld_over
                              |     release_firmware
  ...                         |      kfree(fw) //(2)
                              |     ...
The second situation is shown below:
   (Thread 1)                 |      (Thread 2)
nfcmrvl_fw_dnld_start         |
 ...                          |
 mod_timer                    |
 (wait a time)                |
 fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
   fw_dnld_over               |   nfcmrvl_fw_dnld_abort
    release_firmware          |    fw_dnld_over
     kfree(fw) //(1)          |     release_firmware
     ...                      |      kfree(fw) //(2)
The third situation is shown below:
   (Thread 1)                     |      (Thread 2)
nfcmrvl_nci_recv_frame            |
 if(..->fw_download_in_progress)  |
  nfcmrvl_fw_dnld_recv_frame      |
   queue_work                     |
                                  |
fw_dnld_rx_work                   | nfcmrvl_nci_unregister_dev
 fw_dnld_over                     |  nfcmrvl_fw_dnld_abort
  release_firmware                |   fw_dnld_over
   kfree(fw) //(1)                |    release_firmware
                                  |     kfree(fw) //(2)
The firmware struct is deallocated in position (1) and deallocated in position (2) again.
The crash trace triggered by POC is like below:
BUG: KASAN: double-free or invalid-free in fw_dnld_over
Call Trace:
  kfree
  fw_dnld_over
  nfcmrvl_nci_unregister_dev
  nci_uart_tty_close
  tty_ldisc_kill
  tty_ldisc_hangup
  __tty_hangup.part.0
  tty_release
  ...
What's more, there are also use-after-free and null-ptr-deref bugs in nfcmrvl_fw_dnld_start. If nfcmrvl_nci_unregister_dev deallocates the firmware struct or the gpio, or sets members of priv->fw_dnld to NULL, and nfcmrvl_fw_dnld_start then dereferences the firmware, the gpio or those members, a UAF or NPD bug will happen.
This patch reorders destructive operations after nci_unregister_device in order to synchronize between cleanup routine and firmware download routine.
The nci_unregister_device is well synchronized. If the device is detaching, the firmware download routine will goto error. If firmware download routine is executing, nci_unregister_device will wait until firmware download routine is finished.
Fixes: 3194c6870158 ("NFC: nfcmrvl: add firmware download support") Signed-off-by: Duoming Zhou duoming@zju.edu.cn Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/nfc/nfcmrvl/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/nfc/nfcmrvl/main.c +++ b/drivers/nfc/nfcmrvl/main.c @@ -183,6 +183,7 @@ void nfcmrvl_nci_unregister_dev(struct n { struct nci_dev *ndev = priv->ndev;
+ nci_unregister_device(ndev); if (priv->ndev->nfc_dev->fw_download_in_progress) nfcmrvl_fw_dnld_abort(priv);
@@ -191,7 +192,6 @@ void nfcmrvl_nci_unregister_dev(struct n if (gpio_is_valid(priv->config.reset_n_io)) gpio_free(priv->config.reset_n_io);
- nci_unregister_device(ndev); nci_free_device(ndev); kfree(priv); }
From: Duoming Zhou duoming@zju.edu.cn
commit 4071bf121d59944d5cd2238de0642f3d7995a997 upstream.
There is a sleep-in-atomic bug that could cause a kernel panic during the firmware download process. The root cause is that nlmsg_new with the GFP_KERNEL parameter is called in fw_dnld_timeout, which is a timer handler. The call trace is shown below:
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:265
Call Trace:
kmem_cache_alloc_node
__alloc_skb
nfc_genl_fw_download_done
call_timer_fn
__run_timers.part.0
run_timer_softirq
__do_softirq
...
nlmsg_new with the GFP_KERNEL parameter may sleep during memory allocation, and the timer handler runs as the result of a "software interrupt", which must not call any function that could sleep.
This patch changes the allocation mode of the netlink message from GFP_KERNEL to GFP_ATOMIC in order to prevent the sleep-in-atomic bug. With the GFP_ATOMIC flag the allocation does not sleep, so it can be used in atomic context.
Fixes: 9674da8759df ("NFC: Add firmware upload netlink command") Fixes: 9ea7187c53f6 ("NFC: netlink: Rename CMD_FW_UPLOAD to CMD_FW_DOWNLOAD") Signed-off-by: Duoming Zhou duoming@zju.edu.cn Reviewed-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Link: https://lore.kernel.org/r/20220504055847.38026-1-duoming@zju.edu.cn Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/nfc/netlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/nfc/netlink.c +++ b/net/nfc/netlink.c @@ -1244,7 +1244,7 @@ int nfc_genl_fw_download_done(struct nfc struct sk_buff *msg; void *hdr;
- msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC); if (!msg) return -ENOMEM;
@@ -1260,7 +1260,7 @@ int nfc_genl_fw_download_done(struct nfc
genlmsg_end(msg, hdr);
- genlmsg_multicast(&nfc_genl_family, msg, 0, 0, GFP_KERNEL); + genlmsg_multicast(&nfc_genl_family, msg, 0, 0, GFP_ATOMIC);
return 0;
From: Nobuhiro Iwamatsu nobuhiro1.iwamatsu@toshiba.co.jp
commit 171865dab096da1ab980a32eeea5d1b88cd7bc50 upstream.
The fwnode of the GPIO IRQ must be set to its own fwnode, not the fwnode of the parent IRQ. Therefore, set the GPIO IRQ's fwnode to the device's own fwnode instead of the parent IRQ's fwnode.
Fixes: 2ad74f40dacc ("gpio: visconti: Add Toshiba Visconti GPIO support") Signed-off-by: Nobuhiro Iwamatsu nobuhiro1.iwamatsu@toshiba.co.jp Reviewed-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Bartosz Golaszewski brgl@bgdev.pl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpio-visconti.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
--- a/drivers/gpio/gpio-visconti.c +++ b/drivers/gpio/gpio-visconti.c @@ -130,7 +130,6 @@ static int visconti_gpio_probe(struct pl struct gpio_irq_chip *girq; struct irq_domain *parent; struct device_node *irq_parent; - struct fwnode_handle *fwnode; int ret;
priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL); @@ -150,14 +149,12 @@ static int visconti_gpio_probe(struct pl }
parent = irq_find_host(irq_parent); + of_node_put(irq_parent); if (!parent) { dev_err(dev, "No IRQ parent domain\n"); return -ENODEV; }
- fwnode = of_node_to_fwnode(irq_parent); - of_node_put(irq_parent); - ret = bgpio_init(&priv->gpio_chip, dev, 4, priv->base + GPIO_IDATA, priv->base + GPIO_OSET, @@ -180,7 +177,7 @@ static int visconti_gpio_probe(struct pl
girq = &priv->gpio_chip.irq; girq->chip = irq_chip; - girq->fwnode = fwnode; + girq->fwnode = of_node_to_fwnode(dev->of_node); girq->parent_domain = parent; girq->child_to_parent_hwirq = visconti_gpio_child_to_parent_hwirq; girq->populate_parent_alloc_arg = visconti_gpio_populate_parent_fwspec;
From: Puyou Lu puyou.lu@gmail.com
commit dba785798526a3282cc4d0f0ea751883715dbbb4 upstream.
When a port's input state gets inverted (e.g. from low to high) after pca953x_irq_setup but before irq_mask is set (by some other driver such as "gpio-keys"), the next inversion of this port (e.g. from high to low) will no longer trigger an interrupt, because irq_stat was not updated the first time. This commit fixes the issue by updating irq_stat before the early return.
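The effect of the reordering can be sketched with a one-word model of the chip state. This is a simplified illustration, not the driver: irq_model, pending_old() and pending_new() are hypothetical, a single uint32_t replaces the driver's bitmaps, and the rising/falling edge filtering is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-word model of the chip state. */
struct irq_model {
	uint32_t irq_stat; /* last input state recorded */
	uint32_t irq_mask; /* lines with interrupts enabled */
};

/* Old order: irq_stat is refreshed only if an enabled line changed. */
static bool pending_old(struct irq_model *m, uint32_t new_stat)
{
	uint32_t trigger = (new_stat ^ m->irq_stat) & m->irq_mask;

	if (!trigger)
		return false;
	m->irq_stat = new_stat;
	return true;
}

/* New order: irq_stat is refreshed before the early return. */
static bool pending_new(struct irq_model *m, uint32_t new_stat)
{
	uint32_t trigger = (new_stat ^ m->irq_stat) & m->irq_mask;

	m->irq_stat = new_stat;
	return trigger != 0;
}
```

If the line goes high while it is still masked and is unmasked afterwards, the old order compares the later low level against a stale low level and reports nothing. The new order has recorded the high level and reports the falling edge.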
Fixes: 89ea8bbe9c3e ("gpio: pca953x.c: add interrupt handling capability") Signed-off-by: Puyou Lu puyou.lu@gmail.com Signed-off-by: Bartosz Golaszewski brgl@bgdev.pl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpio-pca953x.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpio/gpio-pca953x.c +++ b/drivers/gpio/gpio-pca953x.c @@ -762,11 +762,11 @@ static bool pca953x_irq_pending(struct p bitmap_xor(cur_stat, new_stat, old_stat, gc->ngpio); bitmap_and(trigger, cur_stat, chip->irq_mask, gc->ngpio);
+ bitmap_copy(chip->irq_stat, new_stat, gc->ngpio); + if (bitmap_empty(trigger, gc->ngpio)) return false;
- bitmap_copy(chip->irq_stat, new_stat, gc->ngpio); - bitmap_and(cur_stat, chip->irq_trig_fall, old_stat, gc->ngpio); bitmap_and(old_stat, chip->irq_trig_raise, new_stat, gc->ngpio); bitmap_or(new_stat, old_stat, cur_stat, gc->ngpio);
From: Armin Wolf W_Armin@gmx.de
commit 7b2666ce445c700b8dcee994da44ddcf050a0842 upstream.
When removing the adt7470 module, a warning might be printed:
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffffa006052b>] adt7470_update_thread+0x7b/0x130 [adt7470]
This happens because adt7470_update_thread() can leave the kthread in TASK_INTERRUPTIBLE state when the kthread is being stopped before the call of set_current_state(). Since kthread_exit() might sleep in exit_signals(), the warning is printed. Fix that by using schedule_timeout_interruptible() and removing the call of set_current_state(). This causes TASK_INTERRUPTIBLE to be set after kthread_should_stop() which might cause the kthread to exit.
Reported-by: Zheyu Ma zheyuma97@gmail.com Fixes: 93cacfd41f82 (hwmon: (adt7470) Allow faster removal) Signed-off-by: Armin Wolf W_Armin@gmx.de Tested-by: Zheyu Ma zheyuma97@gmail.com Link: https://lore.kernel.org/r/20220407101312.13331-1-W_Armin@gmx.de Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hwmon/adt7470.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/hwmon/adt7470.c +++ b/drivers/hwmon/adt7470.c @@ -19,6 +19,7 @@ #include <linux/log2.h> #include <linux/kthread.h> #include <linux/regmap.h> +#include <linux/sched.h> #include <linux/slab.h> #include <linux/util_macros.h>
@@ -294,11 +295,10 @@ static int adt7470_update_thread(void *p adt7470_read_temperatures(data); mutex_unlock(&data->lock);
- set_current_state(TASK_INTERRUPTIBLE); if (kthread_should_stop()) break;
- schedule_timeout(msecs_to_jiffies(data->auto_update_interval)); + schedule_timeout_interruptible(msecs_to_jiffies(data->auto_update_interval)); }
return 0;
From: Adam Wujek dev_public@wujek.eu
commit 75d2b2b06bd8407d03a3f126bc8b95eb356906c7 upstream.
Explicitly disable PEC when the client does not support it. The problematic scenario is the following. A device with PEC support enabled is up and running and a kernel driver is loaded. Then the driver is unloaded (or the device unbound) and the HW device is reconfigured externally (e.g. by i2cset) to advertise itself as not supporting PEC. Without the new code, at the second load of the driver (or bind) the "flags" variable is not updated to avoid PEC usage. As a consequence, further communication with the device is done with PEC enabled, which is wrong and may fail.
The implementation first clears the I2C_CLIENT_PEC flag, then the old code sets it again if needed.
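The clear-then-set pattern can be shown in isolation. A minimal sketch, assuming a hypothetical helper init_pec_flag() and a stand-in flag value; the real code operates on client->flags and queries the device's capability register.

```c
#include <stdbool.h>

#define CLIENT_PEC 0x04u /* stand-in for I2C_CLIENT_PEC, value illustrative */

/* Hypothetical probe step: recompute the PEC flag from scratch on each bind. */
static unsigned int init_pec_flag(unsigned int flags, bool device_supports_pec)
{
	flags &= ~CLIENT_PEC;        /* drop any stale setting first */
	if (device_supports_pec)
		flags |= CLIENT_PEC; /* enable only if currently advertised */
	return flags;
}
```

Because the flag is cleared unconditionally, a value left over from an earlier bind cannot survive a rebind to a device that no longer advertises PEC.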
Fixes: 4e5418f787ec ("hwmon: (pmbus_core) Check adapter PEC support") Signed-off-by: Adam Wujek dev_public@wujek.eu Link: https://lore.kernel.org/r/20220420145059.431061-1-dev_public@wujek.eu Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hwmon/pmbus/pmbus_core.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/hwmon/pmbus/pmbus_core.c b/drivers/hwmon/pmbus/pmbus_core.c index b2618b1d529e..d93574d6a1fb 100644 --- a/drivers/hwmon/pmbus/pmbus_core.c +++ b/drivers/hwmon/pmbus/pmbus_core.c @@ -2326,6 +2326,9 @@ static int pmbus_init_common(struct i2c_client *client, struct pmbus_data *data, data->has_status_word = true; }
+ /* Make sure PEC is disabled, will be enabled later if needed */ + client->flags &= ~I2C_CLIENT_PEC; + /* Enable PEC if the controller and bus supports it */ if (!(data->flags & PMBUS_NO_CAPABILITY)) { ret = i2c_smbus_read_byte_data(client, PMBUS_CAPABILITY);
From: Codrin Ciubotariu codrin.ciubotariu@microchip.com
commit 660564fc9a92a893a14f255be434f7ea0b967901 upstream.
As pointed out by Sascha Hauer, the commit named in the Fixes tag changes:
  if (pcm->config && !pcm->config->prepare_slave_config)
          <do nothing>
to:
  if (pcm->config && !pcm->config->prepare_slave_config)
          snd_dmaengine_pcm_prepare_slave_config()
This breaks the drivers that do not need a call to dmaengine_slave_config(). Drivers that still need to call snd_dmaengine_pcm_prepare_slave_config(), but have a NULL pcm->config->prepare_slave_config should use snd_dmaengine_pcm_prepare_slave_config() as their prepare_slave_config callback.
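The selection logic after this patch can be sketched as follows. This is a standalone model, not the ASoC code: pcm_config, default_prepare() and select_prepare() are hypothetical stand-ins, and the callback signature is reduced to int (*)(void).

```c
#include <stddef.h>

typedef int (*prepare_fn)(void);

/* Hypothetical stand-in for the driver-provided config. */
struct pcm_config {
	prepare_fn prepare_slave_config;
};

/* Stand-in for snd_dmaengine_pcm_prepare_slave_config(). */
static int default_prepare(void)
{
	return 0;
}

/* Use the default only when there is no config at all. */
static prepare_fn select_prepare(const struct pcm_config *config)
{
	if (!config)
		return default_prepare;
	/* May be NULL: the caller then skips the slave config step. */
	return config->prepare_slave_config;
}
```

A driver that supplies a config but leaves the callback NULL gets no slave config call, which restores the earlier behaviour.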
Fixes: 9a1e13440a4f ("ASoC: dmaengine: do not use a NULL prepare_slave_config() callback") Reported-by: Sascha Hauer sha@pengutronix.de Signed-off-by: Codrin Ciubotariu codrin.ciubotariu@microchip.com Link: https://lore.kernel.org/r/20220421125403.2180824-1-codrin.ciubotariu@microch... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/soc-generic-dmaengine-pcm.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/sound/soc/soc-generic-dmaengine-pcm.c +++ b/sound/soc/soc-generic-dmaengine-pcm.c @@ -82,10 +82,10 @@ static int dmaengine_pcm_hw_params(struc
memset(&slave_config, 0, sizeof(slave_config));
- if (pcm->config && pcm->config->prepare_slave_config) - prepare_slave_config = pcm->config->prepare_slave_config; - else + if (!pcm->config) prepare_slave_config = snd_dmaengine_pcm_prepare_slave_config; + else + prepare_slave_config = pcm->config->prepare_slave_config;
if (prepare_slave_config) { int ret = prepare_slave_config(substream, params, &slave_config);
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
commit eb5773201b1c5d603424bd21f161c8c2d1075b42 upstream.
cppcheck throws the following warning:
sound/soc/soc-ops.c:461:8: style: Variable 'ret' is assigned a value that is never used. [unreadVariable] ret = err; ^
This seems to be a missing change in the return value.
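Why the returned variable matters can be shown with a small model of a two-channel update. This is a sketch under an assumption the hunk does not fully show: that each update yields a negative error, 0 for "no change" or 1 for "changed", and that ret accumulates the first result. The function name is hypothetical.

```c
/*
 * Combine the results of two register updates.
 * Negative: error, 0: value unchanged, 1: value changed.
 */
static int combine_update_results(int first, int second)
{
	int ret, err;

	err = first;
	if (err < 0)
		return err;
	ret = err;

	err = second;
	/* Keep an earlier change flag, but never hide an error. */
	if (ret == 0 || err < 0)
		ret = err;

	/* Returning err here would drop a change seen only in the first update. */
	return ret;
}
```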
Fixes: 7f3d90a351968 ("ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw_sx()") Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Reviewed-by: Bard Liao yung-chuan.liao@linux.intel.com Reviewed-by: Rander Wang rander.wang@intel.com Reviewed-by: Péter Ujfalusi peter.ujfalusi@linux.intel.com Link: https://lore.kernel.org/r/20220421162328.302017-1-pierre-louis.bossart@linux... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/soc-ops.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/soc/soc-ops.c +++ b/sound/soc/soc-ops.c @@ -461,7 +461,7 @@ int snd_soc_put_volsw_sx(struct snd_kcon ret = err; } } - return err; + return ret; } EXPORT_SYMBOL_GPL(snd_soc_put_volsw_sx);
From: Lu Baolu baolu.lu@linux.intel.com
commit da8669ff41fa31573375c9a4180f5c080677204b upstream.
The page fault handling framework in the IOMMU core explicitly states that it doesn't handle PCI PASID Stop Marker and the IOMMU drivers must discard them before reporting faults. This handles Stop Marker messages in prq_event_thread() before reporting events to the core.
The VT-d driver explicitly drains the pending page requests when a CPU page table (represented by a mm struct) is unbound from a PASID according to the procedures defined in the VT-d spec. The Stop Marker messages do not need a response. Hence, it is safe to drop the Stop Marker messages silently if any of them is found in the page request queue.
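The check added by the hunk below can be read as a small predicate. A minimal sketch: the field names lpig, rd_req and wr_req come from the diff, while the struct page_req and the helper is_stop_marker() are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical subset of a page request descriptor. */
struct page_req {
	bool lpig;   /* last page in group */
	bool rd_req; /* read access requested */
	bool wr_req; /* write access requested */
};

/* A Stop Marker is a last-page-in-group request with no access bits set. */
static bool is_stop_marker(const struct page_req *req)
{
	return req->lpig && !req->rd_req && !req->wr_req;
}
```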
Fixes: d5b9e4bfe0d88 ("iommu/vt-d: Report prq to io-pgfault framework") Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Jacob Pan jacob.jun.pan@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Link: https://lore.kernel.org/r/20220421113558.3504874-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20220423082330.3897867-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iommu/intel/svm.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -978,6 +978,10 @@ bad_req: goto bad_req; }
+ /* Drop Stop Marker message. No need for a response. */ + if (unlikely(req->lpig && !req->rd_req && !req->wr_req)) + goto prq_advance; + if (!svm || svm->pasid != req->pasid) { /* * It can't go away, because the driver is not permitted
From: Yang Yingliang yangyingliang@huawei.com
commit a15932f4377062364d22096afe25bc579134a1c3 upstream.
If platform_get_resource() returns NULL, resource_size() dereferences a NULL pointer. Move the resource_size() call after devm_ioremap_resource(), which checks 'res', to avoid the null-ptr-deref. Also use devm_platform_get_and_ioremap_resource() to simplify the code.
Fixes: 46d1fb072e76 ("iommu/dart: Add DART iommu driver") Signed-off-by: Yang Yingliang yangyingliang@huawei.com Reviewed-by: Sven Peter sven@svenpeter.dev Link: https://lore.kernel.org/r/20220425090826.2532165-1-yangyingliang@huawei.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iommu/apple-dart.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
--- a/drivers/iommu/apple-dart.c +++ b/drivers/iommu/apple-dart.c @@ -832,16 +832,15 @@ static int apple_dart_probe(struct platf dart->dev = dev; spin_lock_init(&dart->lock);
- res = platform_get_resource(pdev, IORESOURCE_MEM, 0); + dart->regs = devm_platform_get_and_ioremap_resource(pdev, 0, &res); + if (IS_ERR(dart->regs)) + return PTR_ERR(dart->regs); + if (resource_size(res) < 0x4000) { dev_err(dev, "MMIO region too small (%pr)\n", res); return -EINVAL; }
- dart->regs = devm_ioremap_resource(dev, res); - if (IS_ERR(dart->regs)) - return PTR_ERR(dart->regs); - dart->irq = platform_get_irq(pdev, 0); if (dart->irq < 0) return -ENODEV;
From: Moshe Tal moshet@nvidia.com
commit b781bff882d16175277ca129c382886cb4c74a2c upstream.
Setting dscp2prio during a driver reload can leave the dcb ieee app list non-empty after the reload finishes. This results in a conflict between the priority trust state reported by the app and the state in the device register.
Reset the dcb ieee app list on initialization in case this is conflicting with the register status.
Fixes: 2a5e7a1344f4 ("net/mlx5e: Add dcbnl dscp to priority support") Signed-off-by: Moshe Tal moshet@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c @@ -1198,6 +1198,16 @@ static int mlx5e_trust_initialize(struct if (err) return err;
+ if (priv->dcbx_dp.trust_state == MLX5_QPTS_TRUST_PCP && priv->dcbx.dscp_app_cnt) { + /* + * Align the driver state with the register state. + * Temporary state change is required to enable the app list reset. + */ + priv->dcbx_dp.trust_state = MLX5_QPTS_TRUST_DSCP; + mlx5e_dcbnl_delete_app(priv); + priv->dcbx_dp.trust_state = MLX5_QPTS_TRUST_PCP; + } + mlx5e_params_calc_trust_tx_min_inline_mode(priv->mdev, &priv->channels.params, priv->dcbx_dp.trust_state);
From: Vlad Buslov vladbu@nvidia.com
commit ada09af92e621ab500dd80a16d1d0299a18a1180 upstream.
Currently, match VLAN rule also matches packets that have multiple VLAN headers. This behavior is similar to buggy flower classifier behavior that has recently been fixed. Fix the issue by matching on outer_second_cvlan_tag with value 0 which will cause the HW to verify the packet doesn't contain second vlan header.
Fixes: 699e96ddf47f ("net/mlx5e: Support offloading tc double vlan headers match") Signed-off-by: Vlad Buslov vladbu@nvidia.com Reviewed-by: Maor Dickman maord@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -2291,6 +2291,17 @@ static int __parse_cls_flower(struct mlx match.key->vlan_priority);
*match_level = MLX5_MATCH_L2; + + if (!flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_CVLAN) && + match.mask->vlan_eth_type && + MLX5_CAP_FLOWTABLE_TYPE(priv->mdev, + ft_field_support.outer_second_vid, + fs_type)) { + MLX5_SET(fte_match_set_misc, misc_c, + outer_second_cvlan_tag, 1); + spec->match_criteria_enable |= + MLX5_MATCH_MISC_PARAMETERS; + } } } else if (*match_level != MLX5_MATCH_NONE) { /* cvlan_tag enabled in match criteria and
From: Paul Blakey paulb@nvidia.com
commit b069e14fff46c8da9fcc79957f8acaa3e2dfdb6b upstream.
__mlx5_tc_ct_entry_put() queues the release of a tuple related to some ct FT. If that is the last reference to the tuple, the actual deletion of the tuple can happen after the FT has already been destroyed and freed.
Flush the used workqueue before destroying the ct FT.
Fixes: a2173131526d ("net/mlx5e: CT: manage the lifetime of the ct entry object") Reviewed-by: Oz Shlomo ozsh@nvidia.com Signed-off-by: Paul Blakey paulb@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c @@ -1699,6 +1699,8 @@ mlx5_tc_ct_flush_ft_entry(void *ptr, voi static void mlx5_tc_ct_del_ft_cb(struct mlx5_tc_ct_priv *ct_priv, struct mlx5_ct_ft *ft) { + struct mlx5e_priv *priv; + if (!refcount_dec_and_test(&ft->refcount)) return;
@@ -1708,6 +1710,8 @@ mlx5_tc_ct_del_ft_cb(struct mlx5_tc_ct_p rhashtable_free_and_destroy(&ft->ct_entries_ht, mlx5_tc_ct_flush_ft_entry, ct_priv); + priv = netdev_priv(ct_priv->netdev); + flush_workqueue(priv->wq); mlx5_tc_ct_free_pre_ct_tables(ft); mapping_remove(ct_priv->zone_mapping, ft->zone_restore_id); kfree(ft);
From: Mark Zhang markzhang@nvidia.com
commit c4d963a588a6e7c4ef31160e80697ae8e5a47746 upstream.
The arguments of update_buffer_lossy() are in the wrong order. Fix it.
Fixes: 88b3d5c90e96 ("net/mlx5e: Fix port buffers cell size value") Signed-off-by: Mark Zhang markzhang@nvidia.com Reviewed-by: Maor Gottlieb maorg@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c @@ -309,8 +309,8 @@ int mlx5e_port_manual_buffer_config(stru if (err) return err;
- err = update_buffer_lossy(max_mtu, curr_pfc_en, prio2buffer, port_buff_cell_sz, - xoff, &port_buffer, &update_buffer); + err = update_buffer_lossy(max_mtu, curr_pfc_en, prio2buffer, xoff, + port_buff_cell_sz, &port_buffer, &update_buffer); if (err) return err; }
From: Moshe Shemesh moshe@nvidia.com
commit fc3d3db07b35885f238e1fa06b9f04a8fa7a62d0 upstream.
Clearing the reset requested state twice can lead to a NULL pointer dereference, as the second clear tries to delete the timer again. This can happen, for example, in a race between an abort from FW and a PCI error or reset. Avoid this by using test_and_clear_bit() so that the clear flow runs only once. Similarly, use test_and_set_bit() so that the set flow runs only once.
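The "only one caller wins" property can be sketched in userspace with C11 atomics. These helpers are hypothetical stand-ins for the kernel's test_and_set_bit() and test_and_clear_bit(), and the timer teardown is reduced to a counter.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RESET_REQUESTED (1u << 0) /* illustrative flag bit */

/* Sets the bit and reports whether it was already set. */
static bool test_and_set(atomic_uint *flags, unsigned int bit)
{
	return (atomic_fetch_or(flags, bit) & bit) != 0;
}

/* Clears the bit and reports whether it was set before. */
static bool test_and_clear(atomic_uint *flags, unsigned int bit)
{
	return (atomic_fetch_and(flags, ~bit) & bit) != 0;
}

/* Returns 0 if this caller did the teardown, -1 if it was already done. */
static int clear_reset_requested(atomic_uint *flags, int *timer_deletes)
{
	if (!test_and_clear(flags, RESET_REQUESTED))
		return -1;
	(*timer_deletes)++; /* runs once per set/clear pair */
	return 0;
}
```

Since the test and the modification are a single atomic step, two racing callers cannot both see the bit set, so the teardown cannot run twice.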
Fixes: 7dd6df329d4c ("net/mlx5: Handle sync reset abort event") Signed-off-by: Moshe Shemesh moshe@nvidia.com Reviewed-by: Maher Sanalla msanalla@nvidia.com Reviewed-by: Shay Drory shayd@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c | 28 ++++++++++++++------- 1 file changed, 19 insertions(+), 9 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c @@ -134,14 +134,19 @@ static void mlx5_stop_sync_reset_poll(st del_timer_sync(&fw_reset->timer); }
-static void mlx5_sync_reset_clear_reset_requested(struct mlx5_core_dev *dev, bool poll_health) +static int mlx5_sync_reset_clear_reset_requested(struct mlx5_core_dev *dev, bool poll_health) { struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+ if (!test_and_clear_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags)) { + mlx5_core_warn(dev, "Reset request was already cleared\n"); + return -EALREADY; + } + mlx5_stop_sync_reset_poll(dev); - clear_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags); if (poll_health) mlx5_start_health_poll(dev); + return 0; }
#define MLX5_RESET_POLL_INTERVAL (HZ / 10) @@ -185,13 +190,17 @@ static int mlx5_fw_reset_set_reset_sync_ return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL3, 0, 2, false); }
-static void mlx5_sync_reset_set_reset_requested(struct mlx5_core_dev *dev) +static int mlx5_sync_reset_set_reset_requested(struct mlx5_core_dev *dev) { struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+ if (test_and_set_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags)) { + mlx5_core_warn(dev, "Reset request was already set\n"); + return -EALREADY; + } mlx5_stop_health_poll(dev, true); - set_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags); mlx5_start_sync_reset_poll(dev); + return 0; }
static void mlx5_fw_live_patch_event(struct work_struct *work) @@ -220,7 +229,9 @@ static void mlx5_sync_reset_request_even err ? "Failed" : "Sent"); return; } - mlx5_sync_reset_set_reset_requested(dev); + if (mlx5_sync_reset_set_reset_requested(dev)) + return; + err = mlx5_fw_reset_set_reset_sync_ack(dev); if (err) mlx5_core_warn(dev, "PCI Sync FW Update Reset Ack Failed. Error code: %d\n", err); @@ -320,7 +331,8 @@ static void mlx5_sync_reset_now_event(st struct mlx5_core_dev *dev = fw_reset->dev; int err;
- mlx5_sync_reset_clear_reset_requested(dev, false); + if (mlx5_sync_reset_clear_reset_requested(dev, false)) + return;
mlx5_core_warn(dev, "Sync Reset now. Device is going to reset.\n");
@@ -349,10 +361,8 @@ static void mlx5_sync_reset_abort_event( reset_abort_work); struct mlx5_core_dev *dev = fw_reset->dev;
- if (!test_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags)) + if (mlx5_sync_reset_clear_reset_requested(dev, true)) return; - - mlx5_sync_reset_clear_reset_requested(dev, true); mlx5_core_warn(dev, "PCI Sync FW Update Reset Aborted.\n"); }
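The run-once idea behind the patch above can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the kernel implementation: `test_and_clear()` and `test_and_set()` are hypothetical stand-ins for the kernel's test_and_clear_bit() and test_and_set_bit(), and `teardown_runs` stands in for stopping the poll timer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit, standing in for MLX5_FW_RESET_FLAGS_RESET_REQUESTED. */
#define RESET_REQUESTED (1UL << 0)

/* Atomically clear the bit; return true only if it was set beforehand. */
static bool test_and_clear(atomic_ulong *flags, unsigned long bit)
{
    unsigned long old = atomic_fetch_and(flags, ~bit);
    return (old & bit) != 0;
}

/* Atomically set the bit; return true if it was already set. */
static bool test_and_set(atomic_ulong *flags, unsigned long bit)
{
    unsigned long old = atomic_fetch_or(flags, bit);
    return (old & bit) != 0;
}

/*
 * Returns 0 for the single caller that observes the bit set and therefore
 * owns the teardown; every other caller gets -1 and must not touch the timer.
 */
static int clear_reset_requested(atomic_ulong *flags, int *teardown_runs)
{
    assert(flags != NULL && teardown_runs != NULL);
    if (!test_and_clear(flags, RESET_REQUESTED))
        return -1;          /* already cleared by someone else */
    (*teardown_runs)++;     /* stands in for stopping the poll timer */
    return 0;
}
```

Because the test and the clear happen in one atomic operation, two racing callers cannot both see the bit set. A separate test_bit() followed by clear_bit() does not give that guarantee.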
From: Moshe Shemesh moshe@nvidia.com
commit cb7786a76ea39f394f0a059787fe24fa8e340fb6 upstream.
The sync reset flow can deadlock when poll_sync_reset() is called from the timer softirq and then waits in del_timer_sync() on the same timer. Fix that by moving the part of the flow that waits for the timer to reset_reload_work.
It fixes the following kernel trace:

RIP: 0010:del_timer_sync+0x32/0x40
...
Call Trace:
 <IRQ>
 mlx5_sync_reset_clear_reset_requested+0x26/0x50 [mlx5_core]
 poll_sync_reset.cold+0x36/0x52 [mlx5_core]
 call_timer_fn+0x32/0x130
 __run_timers.part.0+0x180/0x280
 ? tick_sched_handle+0x33/0x60
 ? tick_sched_timer+0x3d/0x80
 ? ktime_get+0x3e/0xa0
 run_timer_softirq+0x2a/0x50
 __do_softirq+0xe1/0x2d6
 ? hrtimer_interrupt+0x136/0x220
 irq_exit+0xae/0xb0
 smp_apic_timer_interrupt+0x7b/0x140
 apic_timer_interrupt+0xf/0x20
 </IRQ>
Fixes: 3c5193a87b0f ("net/mlx5: Use del_timer_sync in fw reset flow of halting poll") Signed-off-by: Moshe Shemesh moshe@nvidia.com Reviewed-by: Maher Sanalla msanalla@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c | 34 ++++++++++----------- 1 file changed, 17 insertions(+), 17 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c @@ -111,22 +111,6 @@ static void mlx5_fw_reset_complete_reloa } }
-static void mlx5_sync_reset_reload_work(struct work_struct *work) -{ - struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset, - reset_reload_work); - struct mlx5_core_dev *dev = fw_reset->dev; - int err; - - mlx5_enter_error_state(dev, true); - mlx5_unload_one(dev); - err = mlx5_health_wait_pci_up(dev); - if (err) - mlx5_core_err(dev, "reset reload flow aborted, PCI reads still not working\n"); - fw_reset->ret = err; - mlx5_fw_reset_complete_reload(dev); -} - static void mlx5_stop_sync_reset_poll(struct mlx5_core_dev *dev) { struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset; @@ -149,6 +133,23 @@ static int mlx5_sync_reset_clear_reset_r return 0; }
+static void mlx5_sync_reset_reload_work(struct work_struct *work) +{ + struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset, + reset_reload_work); + struct mlx5_core_dev *dev = fw_reset->dev; + int err; + + mlx5_sync_reset_clear_reset_requested(dev, false); + mlx5_enter_error_state(dev, true); + mlx5_unload_one(dev); + err = mlx5_health_wait_pci_up(dev); + if (err) + mlx5_core_err(dev, "reset reload flow aborted, PCI reads still not working\n"); + fw_reset->ret = err; + mlx5_fw_reset_complete_reload(dev); +} + #define MLX5_RESET_POLL_INTERVAL (HZ / 10) static void poll_sync_reset(struct timer_list *t) { @@ -163,7 +164,6 @@ static void poll_sync_reset(struct timer
if (fatal_error) { mlx5_core_warn(dev, "Got Device Reset\n"); - mlx5_sync_reset_clear_reset_requested(dev, false); queue_work(fw_reset->wq, &fw_reset->reset_reload_work); return; }
From: Jann Horn jannh@google.com
commit 2bfed7d2ffa5d86c462d3e2067f2832eaf8c04c7 upstream.
Since commit 92d25637a3a4 ("kselftest: signal all child processes"), tests are executed in background process groups. This means that trying to read from stdin now throws SIGTTIN when stdin is a TTY, which breaks some seccomp selftests that try to use read(0, NULL, 0) as a dummy syscall.
The simplest way to fix that is probably to just use -1 instead of 0 as the dummy read()'s FD.
Fixes: 92d25637a3a4 ("kselftest: signal all child processes") Signed-off-by: Jann Horn jannh@google.com Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/20220319010011.1374622-1-jannh@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/seccomp/seccomp_bpf.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -955,7 +955,7 @@ TEST(ERRNO_valid) ASSERT_EQ(0, ret);
EXPECT_EQ(parent, syscall(__NR_getppid)); - EXPECT_EQ(-1, read(0, NULL, 0)); + EXPECT_EQ(-1, read(-1, NULL, 0)); EXPECT_EQ(E2BIG, errno); }
@@ -974,7 +974,7 @@ TEST(ERRNO_zero)
EXPECT_EQ(parent, syscall(__NR_getppid)); /* "errno" of 0 is ok. */ - EXPECT_EQ(0, read(0, NULL, 0)); + EXPECT_EQ(0, read(-1, NULL, 0)); }
/* @@ -995,7 +995,7 @@ TEST(ERRNO_capped) ASSERT_EQ(0, ret);
EXPECT_EQ(parent, syscall(__NR_getppid)); - EXPECT_EQ(-1, read(0, NULL, 0)); + EXPECT_EQ(-1, read(-1, NULL, 0)); EXPECT_EQ(4095, errno); }
@@ -1026,7 +1026,7 @@ TEST(ERRNO_order) ASSERT_EQ(0, ret);
EXPECT_EQ(parent, syscall(__NR_getppid)); - EXPECT_EQ(-1, read(0, NULL, 0)); + EXPECT_EQ(-1, read(-1, NULL, 0)); EXPECT_EQ(12, errno); }
@@ -2579,7 +2579,7 @@ void *tsync_sibling(void *data) ret = prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0); if (!ret) return (void *)SIBLING_EXIT_NEWPRIVS; - read(0, NULL, 0); + read(-1, NULL, 0); return (void *)SIBLING_EXIT_UNKILLED; }
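A small userspace sketch of why fd -1 makes a safe dummy syscall: with no seccomp filter installed, the kernel rejects the invalid descriptor with EBADF before any terminal handling is involved, so no SIGTTIN can be raised. The helper name `dummy_read_errno()` is hypothetical. In the selftests themselves, the installed filter replaces the result with the errno under test.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Issue a zero-length read on an invalid descriptor and report errno.
 * Assumes no seccomp filter is intercepting read() in this process.
 */
static int dummy_read_errno(void)
{
    errno = 0;
    ssize_t n = read(-1, NULL, 0);
    assert(n == -1);   /* an invalid fd always fails */
    return errno;
}
```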
From: Olga Kornievskaia kolga@netapp.com
commit e13433b4416fa31a24e621cbbbb39227a3d651dd upstream.
A relocated task must release its previous transport.
Fixes: 82ee41b85cef1 ("SUNRPC don't resend a task on an offlined transport") Signed-off-by: Olga Kornievskaia kolga@netapp.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sunrpc/clnt.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1065,10 +1065,13 @@ rpc_task_get_next_xprt(struct rpc_clnt * static void rpc_task_set_transport(struct rpc_task *task, struct rpc_clnt *clnt) { - if (task->tk_xprt && - !(test_bit(XPRT_OFFLINE, &task->tk_xprt->state) && - (task->tk_flags & RPC_TASK_MOVEABLE))) - return; + if (task->tk_xprt) { + if (!(test_bit(XPRT_OFFLINE, &task->tk_xprt->state) && + (task->tk_flags & RPC_TASK_MOVEABLE))) + return; + xprt_release(task); + xprt_put(task->tk_xprt); + } if (task->tk_flags & RPC_TASK_NO_ROUND_ROBIN) task->tk_xprt = rpc_task_get_first_xprt(clnt); else
From: Cheng Xu chengyou@linux.alibaba.com
commit ef91271c65c12d36e4c2b61c61d4849fb6d11aa0 upstream.
The call to siw_cm_upcall and the detaching of new_cep from its listen_cep should be atomic. Otherwise siw_reject may be called in a temporary state, e.g., siw_cm_upcall has been called but new_cep->listen_cep has not yet been cleared.
This fixes a WARN:
WARNING: CPU: 7 PID: 201 at drivers/infiniband/sw/siw/siw_cm.c:255 siw_cep_put+0x125/0x130 [siw]
CPU: 2 PID: 201 Comm: kworker/u16:22 Kdump: loaded Tainted: G E 5.17.0-rc7 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
RIP: 0010:siw_cep_put+0x125/0x130 [siw]
Call Trace:
 <TASK>
 siw_reject+0xac/0x180 [siw]
 iw_cm_reject+0x68/0xc0 [iw_cm]
 cm_work_handler+0x59d/0xe20 [iw_cm]
 process_one_work+0x1e2/0x3b0
 worker_thread+0x50/0x3a0
 ? rescuer_thread+0x390/0x390
 kthread+0xe5/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>
Fixes: 6c52fdc244b5 ("rdma/siw: connection management") Link: https://lore.kernel.org/r/d528d83466c44687f3872eadcb8c184528b2e2d4.165052655... Reported-by: Luis Chamberlain mcgrof@kernel.org Reviewed-by: Bernard Metzler bmt@zurich.ibm.com Signed-off-by: Cheng Xu chengyou@linux.alibaba.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/infiniband/sw/siw/siw_cm.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
--- a/drivers/infiniband/sw/siw/siw_cm.c +++ b/drivers/infiniband/sw/siw/siw_cm.c @@ -968,14 +968,15 @@ static void siw_accept_newconn(struct si
siw_cep_set_inuse(new_cep); rv = siw_proc_mpareq(new_cep); - siw_cep_set_free(new_cep); - if (rv != -EAGAIN) { siw_cep_put(cep); new_cep->listen_cep = NULL; - if (rv) + if (rv) { + siw_cep_set_free(new_cep); goto error; + } } + siw_cep_set_free(new_cep); } return;
From: Tatyana Nikolova tatyana.e.nikolova@intel.com
commit 7b8943b821bafab492f43aafbd006b57c6b65845 upstream.
When connection establishment fails in iWARP mode, an app can drain the QPs and hang because flush isn't issued when the QP is modified from RTR state to error. Issue a flush in this case using function irdma_cm_disconn().
Update irdma_cm_disconn() to do flush when cm_id is NULL, which is the case when the QP is in RTR state and there is an error in the connection establishment.
Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Link: https://lore.kernel.org/r/20220425181703.1634-2-shiraz.saleem@intel.com Signed-off-by: Tatyana Nikolova tatyana.e.nikolova@intel.com Signed-off-by: Shiraz Saleem shiraz.saleem@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/infiniband/hw/irdma/cm.c | 16 +++++----------- drivers/infiniband/hw/irdma/verbs.c | 4 ++-- 2 files changed, 7 insertions(+), 13 deletions(-)
--- a/drivers/infiniband/hw/irdma/cm.c +++ b/drivers/infiniband/hw/irdma/cm.c @@ -3465,12 +3465,6 @@ static void irdma_cm_disconn_true(struct }
cm_id = iwqp->cm_id; - /* make sure we havent already closed this connection */ - if (!cm_id) { - spin_unlock_irqrestore(&iwqp->lock, flags); - return; - } - original_hw_tcp_state = iwqp->hw_tcp_state; original_ibqp_state = iwqp->ibqp_state; last_ae = iwqp->last_aeq; @@ -3492,11 +3486,11 @@ static void irdma_cm_disconn_true(struct disconn_status = -ECONNRESET; }
- if ((original_hw_tcp_state == IRDMA_TCP_STATE_CLOSED || - original_hw_tcp_state == IRDMA_TCP_STATE_TIME_WAIT || - last_ae == IRDMA_AE_RDMAP_ROE_BAD_LLP_CLOSE || - last_ae == IRDMA_AE_BAD_CLOSE || - last_ae == IRDMA_AE_LLP_CONNECTION_RESET || iwdev->rf->reset)) { + if (original_hw_tcp_state == IRDMA_TCP_STATE_CLOSED || + original_hw_tcp_state == IRDMA_TCP_STATE_TIME_WAIT || + last_ae == IRDMA_AE_RDMAP_ROE_BAD_LLP_CLOSE || + last_ae == IRDMA_AE_BAD_CLOSE || + last_ae == IRDMA_AE_LLP_CONNECTION_RESET || iwdev->rf->reset || !cm_id) { issue_close = 1; iwqp->cm_id = NULL; qp->term_flags = 0; --- a/drivers/infiniband/hw/irdma/verbs.c +++ b/drivers/infiniband/hw/irdma/verbs.c @@ -1617,13 +1617,13 @@ int irdma_modify_qp(struct ib_qp *ibqp,
if (issue_modify_qp && iwqp->ibqp_state > IB_QPS_RTS) { if (dont_wait) { - if (iwqp->cm_id && iwqp->hw_tcp_state) { + if (iwqp->hw_tcp_state) { spin_lock_irqsave(&iwqp->lock, flags); iwqp->hw_tcp_state = IRDMA_TCP_STATE_CLOSED; iwqp->last_aeq = IRDMA_AE_RESET_SENT; spin_unlock_irqrestore(&iwqp->lock, flags); - irdma_cm_disconn(iwqp); } + irdma_cm_disconn(iwqp); } else { int close_timer_started;
From: Shiraz Saleem shiraz.saleem@intel.com
commit 2df6d895907b2f5dfbc558cbff7801bba82cb3cc upstream.
QP destroy is synchronous and waits for its refcnt to be decremented in irdma_cm_node_free_cb (for iWARP) which fires after the RCU grace period elapses.
Applications running a large number of connections are exposed to high wait times on destroy QP for events like SIGABRT.
The long pole for this wait time is the firing of the call_rcu callback during a CM node destroy which can be slow. It holds the QP reference count and blocks the destroy QP from completing.
call_rcu only needs to make sure that list walkers have a reference to the cm_node object before freeing it, and thus needs to wait for the grace period to elapse. The rest of the connection teardown in irdma_cm_node_free_cb is moved out of the grace period wait into irdma_destroy_connection. Also, replace call_rcu with a simple kfree_rcu, as it just needs to do a kfree on the cm_node.
Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Link: https://lore.kernel.org/r/20220425181703.1634-3-shiraz.saleem@intel.com Signed-off-by: Shiraz Saleem shiraz.saleem@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/infiniband/hw/irdma/cm.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
--- a/drivers/infiniband/hw/irdma/cm.c +++ b/drivers/infiniband/hw/irdma/cm.c @@ -2305,10 +2305,8 @@ err: return NULL; }
-static void irdma_cm_node_free_cb(struct rcu_head *rcu_head) +static void irdma_destroy_connection(struct irdma_cm_node *cm_node) { - struct irdma_cm_node *cm_node = - container_of(rcu_head, struct irdma_cm_node, rcu_head); struct irdma_cm_core *cm_core = cm_node->cm_core; struct irdma_qp *iwqp; struct irdma_cm_info nfo; @@ -2356,7 +2354,6 @@ static void irdma_cm_node_free_cb(struct }
cm_core->cm_free_ah(cm_node); - kfree(cm_node); }
/** @@ -2384,8 +2381,9 @@ void irdma_rem_ref_cm_node(struct irdma_
spin_unlock_irqrestore(&cm_core->ht_lock, flags);
- /* wait for all list walkers to exit their grace period */ - call_rcu(&cm_node->rcu_head, irdma_cm_node_free_cb); + irdma_destroy_connection(cm_node); + + kfree_rcu(cm_node, rcu_head); }
/**
From: Mustafa Ismail mustafa.ismail@intel.com
commit 1c9043ae0667a43bd87beeebbdd4bed674713629 upstream.
For some net events in irdma_net_event notifier, the netdev can be NULL which will cause a crash in rdma_vlan_dev_real_dev. Fix this by moving all processing to the NETEVENT_NEIGH_UPDATE case where the netdev is guaranteed to not be NULL.
Fixes: 6702bc147448 ("RDMA/irdma: Fix netdev notifications for vlan's") Link: https://lore.kernel.org/r/20220425181703.1634-4-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail mustafa.ismail@intel.com Signed-off-by: Shiraz Saleem shiraz.saleem@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/infiniband/hw/irdma/utils.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-)
--- a/drivers/infiniband/hw/irdma/utils.c +++ b/drivers/infiniband/hw/irdma/utils.c @@ -258,18 +258,16 @@ int irdma_net_event(struct notifier_bloc u32 local_ipaddr[4] = {}; bool ipv4 = true;
- real_dev = rdma_vlan_dev_real_dev(netdev); - if (!real_dev) - real_dev = netdev; - - ibdev = ib_device_get_by_netdev(real_dev, RDMA_DRIVER_IRDMA); - if (!ibdev) - return NOTIFY_DONE; - - iwdev = to_iwdev(ibdev); - switch (event) { case NETEVENT_NEIGH_UPDATE: + real_dev = rdma_vlan_dev_real_dev(netdev); + if (!real_dev) + real_dev = netdev; + ibdev = ib_device_get_by_netdev(real_dev, RDMA_DRIVER_IRDMA); + if (!ibdev) + return NOTIFY_DONE; + + iwdev = to_iwdev(ibdev); p = (__be32 *)neigh->primary_key; if (neigh->tbl->family == AF_INET6) { ipv4 = false; @@ -290,13 +288,12 @@ int irdma_net_event(struct notifier_bloc irdma_manage_arp_cache(iwdev->rf, neigh->ha, local_ipaddr, ipv4, IRDMA_ARP_DELETE); + ib_device_put(ibdev); break; default: break; }
- ib_device_put(ibdev); - return NOTIFY_DONE; }
From: Trond Myklebust trond.myklebust@hammerspace.com
commit 00c94ebec5925593c0377b941289224469e72ac7 upstream.
There is no need to declare attributes such as the ctime, mtime and block size invalid when we're just returning a delegation, so it is inappropriate to call nfs_post_op_update_inode_force_wcc(). Instead, just call nfs_refresh_inode() after faking up the change attribute. We know that the GETATTR op occurs before the DELEGRETURN, so we are safe when doing this.
Fixes: 0bc2c9b4dca9 ("NFSv4: Don't discard the attributes returned by asynchronous DELEGRETURN") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfs/nfs4proc.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -366,6 +366,14 @@ static void nfs4_setup_readdir(u64 cooki kunmap_atomic(start); }
+static void nfs4_fattr_set_prechange(struct nfs_fattr *fattr, u64 version) +{ + if (!(fattr->valid & NFS_ATTR_FATTR_PRECHANGE)) { + fattr->pre_change_attr = version; + fattr->valid |= NFS_ATTR_FATTR_PRECHANGE; + } +} + static void nfs4_test_and_free_stateid(struct nfs_server *server, nfs4_stateid *stateid, const struct cred *cred) @@ -6558,7 +6566,9 @@ static void nfs4_delegreturn_release(voi pnfs_roc_release(&data->lr.arg, &data->lr.res, data->res.lr_ret); if (inode) { - nfs_post_op_update_inode_force_wcc(inode, &data->fattr); + nfs4_fattr_set_prechange(&data->fattr, + inode_peek_iversion_raw(inode)); + nfs_refresh_inode(inode, &data->fattr); nfs_iput_and_deactive(inode); } kfree(calldata);
From: Yang Yingliang yangyingliang@huawei.com
commit ff5265d45345d01fefc98fcb9ae891b59633c919 upstream.
The node pointer is returned by of_parse_phandle() with its refcount incremented, so add of_node_put() after using it in mtk_sgmii_init().
Fixes: 9ffee4a8276c ("net: ethernet: mediatek: Extend SGMII related functions") Signed-off-by: Yang Yingliang yangyingliang@huawei.com Link: https://lore.kernel.org/r/20220428062543.64883-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mediatek/mtk_sgmii.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c +++ b/drivers/net/ethernet/mediatek/mtk_sgmii.c @@ -26,6 +26,7 @@ int mtk_sgmii_init(struct mtk_sgmii *ss, break;
ss->regmap[i] = syscon_node_to_regmap(np); + of_node_put(np); if (IS_ERR(ss->regmap[i])) return PTR_ERR(ss->regmap[i]); }
From: Yang Yingliang yangyingliang@huawei.com
commit a9e9b091a1c14ecd8bd9d3214a62142a1786fe30 upstream.
Add the missing of_node_put() for phy_node if of_get_phy_mode() fails in mt7530_setup().
Fixes: 0c65b2b90d13 ("net: of_get_phy_mode: Change API to solve int/unit warnings") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com Link: https://lore.kernel.org/r/20220428095317.538829-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/dsa/mt7530.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -2216,6 +2216,7 @@ mt7530_setup(struct dsa_switch *ds) ret = of_get_phy_mode(mac_np, &interface); if (ret && ret != -ENODEV) { of_node_put(mac_np); + of_node_put(phy_node); return ret; } id = of_mdio_parse_addr(ds->dev, phy_node);
From: Yang Yingliang yangyingliang@huawei.com
commit 1a15267b7be77e0792cf0c7b36ca65c8eb2df0d8 upstream.
The node pointer is returned by of_get_child_by_name() with its refcount incremented, so add of_node_put() after using it.
Fixes: 634db83b8265 ("net: stmmac: dwmac-sun8i: Handle integrated/external MDIOs") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com Link: https://lore.kernel.org/r/20220428095716.540452-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c @@ -916,6 +916,7 @@ static int sun8i_dwmac_register_mdio_mux
ret = mdio_mux_init(priv->device, mdio_mux, mdio_mux_syscon_switch_fn, &gmac->mux_handle, priv, priv->mii); + of_node_put(mdio_mux); return ret; }
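The get/put pairing that this patch and the neighbouring of_node_put() fixes restore can be sketched with a plain counter. Everything here is hypothetical: `struct node`, `node_get()`, `node_put()` and `use_node()` are simplified stand-ins for struct device_node, a lookup that returns a held reference, of_node_put(), and the init call that only borrows the node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_node, with a plain counter. */
struct node {
    int refcount;
};

/* Models a lookup helper that returns the node with a reference held. */
static struct node *node_get(struct node *n)
{
    if (n)
        n->refcount++;
    return n;
}

/* Models of_node_put(): drop one reference; a NULL node is ignored. */
static void node_put(struct node *n)
{
    if (n) {
        assert(n->refcount > 0);
        n->refcount--;
    }
}

/* Hypothetical init step that only borrows the node. */
static int use_node(const struct node *n)
{
    return n ? 0 : -1;
}

static int register_mux(struct node *parent)
{
    struct node *child = node_get(parent);  /* reference taken by the lookup */
    int ret = use_node(child);

    node_put(child);                        /* dropped on success and on error */
    return ret;
}
```

After `register_mux()` returns, the counter is back at its starting value. Without the `node_put()` call it would grow by one on every call, which is the leak these patches close.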
From: Niels Dossche dossche.niels@gmail.com
commit e87f66b38e66dffdec9daa9f8f0eb044e9a62e3b upstream.
Error values inside the probe function must be < 0. The ENOMEM return value has the wrong sign: it is positive instead of negative. Add a minus sign.
Fixes: e239756717b5 ("net: mdio: Add BCM6368 MDIO mux bus controller") Signed-off-by: Niels Dossche dossche.niels@gmail.com Reviewed-by: Andrew Lunn andrew@lunn.ch Reviewed-by: Florian Fainelli f.fainelli@gmail.com Link: https://lore.kernel.org/r/20220428211931.8130-1-dossche.niels@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/mdio/mdio-mux-bcm6368.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/mdio/mdio-mux-bcm6368.c +++ b/drivers/net/mdio/mdio-mux-bcm6368.c @@ -115,7 +115,7 @@ static int bcm6368_mdiomux_probe(struct md->mii_bus = devm_mdiobus_alloc(&pdev->dev); if (!md->mii_bus) { dev_err(&pdev->dev, "mdiomux bus alloc failed\n"); - return ENOMEM; + return -ENOMEM; }
bus = md->mii_bus;
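A short sketch of why the sign matters, assuming a caller that uses the common `ret < 0` error check. The function names are hypothetical and the probe path itself is not modeled.

```c
#include <errno.h>
#include <stddef.h>

/* Buggy variant: positive errno, which looks like success to "ret < 0". */
static int alloc_check_buggy(const void *p)
{
    if (!p)
        return ENOMEM;
    return 0;
}

/* Fixed variant: negative errno, as kernel convention expects. */
static int alloc_check_fixed(const void *p)
{
    if (!p)
        return -ENOMEM;
    return 0;
}

/* Hypothetical caller-side check following the "negative means error" idiom. */
static int caller_sees_failure(int ret)
{
    return ret < 0;
}
```

With a NULL argument, only the fixed variant is recognized as a failure by `caller_sees_failure()`.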
From: Yang Yingliang yangyingliang@huawei.com
commit 95098d5ac2551769807031444e55a0da5d4f0952 upstream.
'tmp_node' needs to be put before returning from cpsw_probe_dt(), so add the missing of_node_put() in the error paths.
Fixes: ed3525eda4c4 ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac") Signed-off-by: Yang Yingliang yangyingliang@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/ti/cpsw_new.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/ti/cpsw_new.c +++ b/drivers/net/ethernet/ti/cpsw_new.c @@ -1246,8 +1246,10 @@ static int cpsw_probe_dt(struct cpsw_com data->slave_data = devm_kcalloc(dev, CPSW_SLAVE_PORTS_NUM, sizeof(struct cpsw_slave_data), GFP_KERNEL); - if (!data->slave_data) + if (!data->slave_data) { + of_node_put(tmp_node); return -ENOMEM; + }
/* Populate all the child nodes here... */ @@ -1341,6 +1343,7 @@ static int cpsw_probe_dt(struct cpsw_com
err_node_put: of_node_put(port_np); + of_node_put(tmp_node); return ret; }
From: Eric Dumazet edumazet@google.com
commit dba5bdd57bea587ea4f0b79b03c71135f84a7e8b upstream.
syzbot reported an UAF in ip_mc_sf_allow() [1]
Whenever an RCU-protected list replaces an object, the pointer to the new object needs to be updated _before_ the call to kfree_rcu() or call_rcu().
Because kfree_rcu(ptr, rcu) got support for NULL ptr only recently in commit 12edff045bc6 ("rcu: Make kfree_rcu() ignore NULL pointers"), I chose to use the conditional to make sure stable backports won't miss this detail.
	if (psl)
		kfree_rcu(psl, rcu);
net/ipv6/mcast.c has similar issues, addressed in a separate patch.
[1] BUG: KASAN: use-after-free in ip_mc_sf_allow+0x6bb/0x6d0 net/ipv4/igmp.c:2655 Read of size 4 at addr ffff88807d37b904 by task syz-executor.5/908
CPU: 0 PID: 908 Comm: syz-executor.5 Not tainted 5.18.0-rc4-syzkaller-00064-g8f4dd16603ce #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0xeb/0x467 mm/kasan/report.c:313 print_report mm/kasan/report.c:429 [inline] kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491 ip_mc_sf_allow+0x6bb/0x6d0 net/ipv4/igmp.c:2655 raw_v4_input net/ipv4/raw.c:190 [inline] raw_local_deliver+0x4d1/0xbe0 net/ipv4/raw.c:218 ip_protocol_deliver_rcu+0xcf/0xb30 net/ipv4/ip_input.c:193 ip_local_deliver_finish+0x2ee/0x4c0 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip_local_deliver+0x1b3/0x200 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:461 [inline] ip_rcv_finish+0x1cb/0x2f0 net/ipv4/ip_input.c:437 NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip_rcv+0xaa/0xd0 net/ipv4/ip_input.c:556 __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5405 __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5519 netif_receive_skb_internal net/core/dev.c:5605 [inline] netif_receive_skb+0x13e/0x8e0 net/core/dev.c:5664 tun_rx_batched.isra.0+0x460/0x720 drivers/net/tun.c:1534 tun_get_user+0x28b7/0x3e30 drivers/net/tun.c:1985 tun_chr_write_iter+0xdb/0x200 drivers/net/tun.c:2015 call_write_iter include/linux/fs.h:2050 [inline] new_sync_write+0x38a/0x560 fs/read_write.c:504 vfs_write+0x7c0/0xac0 fs/read_write.c:591 ksys_write+0x127/0x250 fs/read_write.c:644 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f3f12c3bbff Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 99 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 
08 e8 cc fd ff ff 48 RSP: 002b:00007f3f13ea9130 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007f3f12d9bf60 RCX: 00007f3f12c3bbff RDX: 0000000000000036 RSI: 0000000020002ac0 RDI: 00000000000000c8 RBP: 00007f3f12ce308d R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000036 R11: 0000000000000293 R12: 0000000000000000 R13: 00007fffb68dd79f R14: 00007f3f13ea9300 R15: 0000000000022000 </TASK>
Allocated by task 908: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:45 [inline] set_alloc_info mm/kasan/common.c:436 [inline] ____kasan_kmalloc mm/kasan/common.c:515 [inline] ____kasan_kmalloc mm/kasan/common.c:474 [inline] __kasan_kmalloc+0xa6/0xd0 mm/kasan/common.c:524 kasan_kmalloc include/linux/kasan.h:234 [inline] __do_kmalloc mm/slab.c:3710 [inline] __kmalloc+0x209/0x4d0 mm/slab.c:3719 kmalloc include/linux/slab.h:586 [inline] sock_kmalloc net/core/sock.c:2501 [inline] sock_kmalloc+0xb5/0x100 net/core/sock.c:2492 ip_mc_source+0xba2/0x1100 net/ipv4/igmp.c:2392 do_ip_setsockopt net/ipv4/ip_sockglue.c:1296 [inline] ip_setsockopt+0x2312/0x3ab0 net/ipv4/ip_sockglue.c:1432 raw_setsockopt+0x274/0x2c0 net/ipv4/raw.c:861 __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180 __do_sys_setsockopt net/socket.c:2191 [inline] __se_sys_setsockopt net/socket.c:2188 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
Freed by task 753: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track+0x21/0x30 mm/kasan/common.c:45 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370 ____kasan_slab_free mm/kasan/common.c:366 [inline] ____kasan_slab_free+0x13d/0x180 mm/kasan/common.c:328 kasan_slab_free include/linux/kasan.h:200 [inline] __cache_free mm/slab.c:3439 [inline] kmem_cache_free_bulk+0x69/0x460 mm/slab.c:3774 kfree_bulk include/linux/slab.h:437 [inline] kfree_rcu_work+0x51c/0xa10 kernel/rcu/tree.c:3318 process_one_work+0x996/0x1610 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e9/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
Last potentially related work creation: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 __kasan_record_aux_stack+0x7e/0x90 mm/kasan/generic.c:348 kvfree_call_rcu+0x74/0x990 kernel/rcu/tree.c:3595 ip_mc_msfilter+0x712/0xb60 net/ipv4/igmp.c:2510 do_ip_setsockopt net/ipv4/ip_sockglue.c:1257 [inline] ip_setsockopt+0x32e1/0x3ab0 net/ipv4/ip_sockglue.c:1432 raw_setsockopt+0x274/0x2c0 net/ipv4/raw.c:861 __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180 __do_sys_setsockopt net/socket.c:2191 [inline] __se_sys_setsockopt net/socket.c:2188 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
Second to last potentially related work creation: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 __kasan_record_aux_stack+0x7e/0x90 mm/kasan/generic.c:348 call_rcu+0x99/0x790 kernel/rcu/tree.c:3074 mpls_dev_notify+0x552/0x8a0 net/mpls/af_mpls.c:1656 notifier_call_chain+0xb5/0x200 kernel/notifier.c:84 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1938 call_netdevice_notifiers_extack net/core/dev.c:1976 [inline] call_netdevice_notifiers net/core/dev.c:1990 [inline] unregister_netdevice_many+0x92e/0x1890 net/core/dev.c:10751 default_device_exit_batch+0x449/0x590 net/core/dev.c:11245 ops_exit_list+0x125/0x170 net/core/net_namespace.c:167 cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:594 process_one_work+0x996/0x1610 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e9/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
The buggy address belongs to the object at ffff88807d37b900
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 4 bytes inside of
 64-byte region [ffff88807d37b900, ffff88807d37b940)

The buggy address belongs to the physical page:
page:ffffea0001f4dec0 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88807d37b180 pfn:0x7d37b
flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000200 ffff888010c41340 ffffea0001c795c8 ffff888010c40200
raw: ffff88807d37b180 ffff88807d37b000 000000010000001f 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x342040(__GFP_IO|__GFP_NOWARN|__GFP_COMP|__GFP_HARDWALL|__GFP_THISNODE), pid 2963, tgid 2963 (udevd), ts 139732238007, free_ts 139730893262
 prep_new_page mm/page_alloc.c:2441 [inline]
 get_page_from_freelist+0xba2/0x3e00 mm/page_alloc.c:4182
 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5408
 __alloc_pages_node include/linux/gfp.h:587 [inline]
 kmem_getpages mm/slab.c:1378 [inline]
 cache_grow_begin+0x75/0x350 mm/slab.c:2584
 cache_alloc_refill+0x27f/0x380 mm/slab.c:2957
 ____cache_alloc mm/slab.c:3040 [inline]
 ____cache_alloc mm/slab.c:3023 [inline]
 __do_cache_alloc mm/slab.c:3267 [inline]
 slab_alloc mm/slab.c:3309 [inline]
 __do_kmalloc mm/slab.c:3708 [inline]
 __kmalloc+0x3b3/0x4d0 mm/slab.c:3719
 kmalloc include/linux/slab.h:586 [inline]
 kzalloc include/linux/slab.h:714 [inline]
 tomoyo_encode2.part.0+0xe9/0x3a0 security/tomoyo/realpath.c:45
 tomoyo_encode2 security/tomoyo/realpath.c:31 [inline]
 tomoyo_encode+0x28/0x50 security/tomoyo/realpath.c:80
 tomoyo_realpath_from_path+0x186/0x620 security/tomoyo/realpath.c:288
 tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
 tomoyo_path_perm+0x21b/0x400 security/tomoyo/file.c:822
 security_inode_getattr+0xcf/0x140 security/security.c:1350
 vfs_getattr fs/stat.c:157 [inline]
 vfs_statx+0x16a/0x390 fs/stat.c:232
 vfs_fstatat+0x8c/0xb0 fs/stat.c:255
 __do_sys_newfstatat+0x91/0x110 fs/stat.c:425
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1356 [inline]
 free_pcp_prepare+0x549/0xd20 mm/page_alloc.c:1406
 free_unref_page_prepare mm/page_alloc.c:3328 [inline]
 free_unref_page+0x19/0x6a0 mm/page_alloc.c:3423
 __vunmap+0x85d/0xd30 mm/vmalloc.c:2667
 __vfree+0x3c/0xd0 mm/vmalloc.c:2715
 vfree+0x5a/0x90 mm/vmalloc.c:2746
 __do_replace+0x16b/0x890 net/ipv6/netfilter/ip6_tables.c:1117
 do_replace net/ipv6/netfilter/ip6_tables.c:1157 [inline]
 do_ip6t_set_ctl+0x90d/0xb90 net/ipv6/netfilter/ip6_tables.c:1639
 nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101
 ipv6_setsockopt+0x122/0x180 net/ipv6/ipv6_sockglue.c:1026
 tcp_setsockopt+0x136/0x2520 net/ipv4/tcp.c:3696
 __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180
 __do_sys_setsockopt net/socket.c:2191 [inline]
 __se_sys_setsockopt net/socket.c:2188 [inline]
 __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Memory state around the buggy address:
 ffff88807d37b800: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
 ffff88807d37b880: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
 ffff88807d37b900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                   ^
 ffff88807d37b980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff88807d37ba00: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
Fixes: c85bb41e9318 ("igmp: fix ip_mc_sf_allow race [v5]")
Signed-off-by: Eric Dumazet edumazet@google.com
Reported-by: syzbot syzkaller@googlegroups.com
Cc: Flavio Leitner fbl@sysclose.org
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ipv4/igmp.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -2403,9 +2403,10 @@ int ip_mc_source(int add, int omode, str
 			/* decrease mem now to avoid the memleak warning */
 			atomic_sub(struct_size(psl, sl_addr, psl->sl_max),
 				   &sk->sk_omem_alloc);
-			kfree_rcu(psl, rcu);
 		}
 		rcu_assign_pointer(pmc->sflist, newpsl);
+		if (psl)
+			kfree_rcu(psl, rcu);
 		psl = newpsl;
 	}
 	rv = 1;	/* > 0 for insert logic below if sl_count is 0 */
@@ -2507,11 +2508,13 @@ int ip_mc_msfilter(struct sock *sk, stru
 		/* decrease mem now to avoid the memleak warning */
 		atomic_sub(struct_size(psl, sl_addr, psl->sl_max),
 			   &sk->sk_omem_alloc);
-		kfree_rcu(psl, rcu);
-	} else
+	} else {
 		(void) ip_mc_del_src(in_dev, &msf->imsf_multiaddr, pmc->sfmode,
 			0, NULL, 0);
+	}
 	rcu_assign_pointer(pmc->sflist, newpsl);
+	if (psl)
+		kfree_rcu(psl, rcu);
 	pmc->sfmode = msf->imsf_fmode;
 	err = 0;
 done:
From: Shravya Kumbham shravya.kumbham@xilinx.com
commit 7a6bc33ab54923d325d9a1747ec9652c4361ebd1 upstream.
Check the return value of of_address_to_resource() and also add the missing of_node_put() calls for the np and npp nodes.
Fixes: e0a3bc65448c ("net: emaclite: Support multiple phys connected to one MDIO bus")
Addresses-Coverity: Event check_return value.
Signed-off-by: Shravya Kumbham shravya.kumbham@xilinx.com
Signed-off-by: Radhey Shyam Pandey radhey.shyam.pandey@xilinx.com
Signed-off-by: Paolo Abeni pabeni@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/xilinx/xilinx_emaclite.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

--- a/drivers/net/ethernet/xilinx/xilinx_emaclite.c
+++ b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
@@ -822,10 +822,10 @@ static int xemaclite_mdio_write(struct m
 static int xemaclite_mdio_setup(struct net_local *lp, struct device *dev)
 {
 	struct mii_bus *bus;
-	int rc;
 	struct resource res;
 	struct device_node *np = of_get_parent(lp->phy_node);
 	struct device_node *npp;
+	int rc, ret;

 	/* Don't register the MDIO bus if the phy_node or its parent node
 	 * can't be found.
@@ -835,8 +835,14 @@ static int xemaclite_mdio_setup(struct n
 		return -ENODEV;
 	}
 	npp = of_get_parent(np);
-
-	of_address_to_resource(npp, 0, &res);
+	ret = of_address_to_resource(npp, 0, &res);
+	of_node_put(npp);
+	if (ret) {
+		dev_err(dev, "%s resource error!\n",
+			dev->of_node->full_name);
+		of_node_put(np);
+		return ret;
+	}
 	if (lp->ndev->mem_start != res.start) {
 		struct phy_device *phydev;
 		phydev = of_phy_find_device(lp->phy_node);
@@ -845,6 +851,7 @@ static int xemaclite_mdio_setup(struct n
 				 "MDIO of the phy is not registered yet\n");
 		else
 			put_device(&phydev->mdio.dev);
+		of_node_put(np);
 		return 0;
 	}

@@ -857,6 +864,7 @@ static int xemaclite_mdio_setup(struct n
 	bus = mdiobus_alloc();
 	if (!bus) {
 		dev_err(dev, "Failed to allocate mdiobus\n");
+		of_node_put(np);
 		return -ENOMEM;
 	}

@@ -869,6 +877,7 @@ static int xemaclite_mdio_setup(struct n
 	bus->parent = dev;

 	rc = of_mdiobus_register(bus, np);
+	of_node_put(np);
 	if (rc) {
 		dev_err(dev, "Failed to register mdio bus.\n");
 		goto err_register;
From: Marc Kleine-Budde mkl@pengutronix.de
commit 97926d5a847ca1758ad8702ce591e3b05a701e0d upstream.
This patch fixes the parsing of the start time supplied on the command line on 32-bit systems. A "long" on 32-bit systems is only 32 bits wide and cannot hold a timestamp in nanosecond resolution.
Fixes: 040806343bb4 ("selftests/net: so_txtime multi-host support")
Cc: Carlos Llamas cmllamas@google.com
Cc: Willem de Bruijn willemb@google.com
Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de
Acked-by: Willem de Bruijn willemb@google.com
Reviewed-by: Carlos Llamas cmllamas@google.com
Link: https://lore.kernel.org/r/20220502094638.1921702-2-mkl@pengutronix.de
Signed-off-by: Paolo Abeni pabeni@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 tools/testing/selftests/net/so_txtime.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/tools/testing/selftests/net/so_txtime.c
+++ b/tools/testing/selftests/net/so_txtime.c
@@ -475,7 +475,7 @@ static void parse_opts(int argc, char **
 			cfg_rx = true;
 			break;
 		case 't':
-			cfg_start_time_ns = strtol(optarg, NULL, 0);
+			cfg_start_time_ns = strtoll(optarg, NULL, 0);
 			break;
 		case 'm':
 			cfg_mark = strtol(optarg, NULL, 0);
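The truncation addressed by the so_txtime patch above can be illustrated with a minimal userspace sketch. It is not part of the patch: the helper names parse_start_time_ns() and fits_in_32bit_long() are hypothetical, and a 32-bit "long" is modelled with the int32_t limits so the sketch behaves the same on 64-bit hosts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a start time with strtoll(), as the fixed
 * selftest does, so the result is at least 64 bits wide on any ABI.
 */
int64_t parse_start_time_ns(const char *arg)
{
	assert(arg != NULL);
	return (int64_t)strtoll(arg, NULL, 0);
}

/* Hypothetical helper: would this value survive in a 32-bit "long"?
 * The 32-bit type is modelled with the int32_t limits (an assumption
 * made so the check is host-independent).
 */
bool fits_in_32bit_long(int64_t value)
{
	return value >= INT32_MIN && value <= INT32_MAX;
}
```

A start time such as 1651500000000000000 ns (a CLOCK_TAI-style timestamp from May 2022) parses intact through strtoll() but lies far outside the range a 32-bit long can represent, which is why strtol() was not sufficient on 32-bit systems.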
From: Marc Kleine-Budde mkl@pengutronix.de
commit f5c2174a3775491e890ce285df52f5715fbef875 upstream.
The program has used CLOCK_TAI as its default clock since it was added to the Linux repo. Commit 040806343bb4 ("selftests/net: so_txtime multi-host support") added a help text that states the wrong default clock.
This patch fixes the help text.
Fixes: 040806343bb4 ("selftests/net: so_txtime multi-host support")
Cc: Carlos Llamas cmllamas@google.com
Cc: Willem de Bruijn willemb@google.com
Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de
Acked-by: Willem de Bruijn willemb@google.com
Reviewed-by: Carlos Llamas cmllamas@google.com
Link: https://lore.kernel.org/r/20220502094638.1921702-3-mkl@pengutronix.de
Signed-off-by: Paolo Abeni pabeni@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 tools/testing/selftests/net/so_txtime.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/tools/testing/selftests/net/so_txtime.c
+++ b/tools/testing/selftests/net/so_txtime.c
@@ -421,7 +421,7 @@ static void usage(const char *progname)
 			"Options:\n"
 			" -4 only IPv4\n"
 			" -6 only IPv6\n"
-			" -c <clock> monotonic (default) or tai\n"
+			" -c <clock> monotonic or tai (default)\n"
 			" -D <addr> destination IP address (server)\n"
 			" -S <addr> source IP address (client)\n"
 			" -r run rx mode\n"
From: Kuogee Hsieh quic_khsieh@quicinc.com
commit 3f65b1e2f424f44585bd701024a3bfd0b1e0ade2 upstream.
The current DP driver implementation adds the fail-safe mode in dp_hpd_plug_handle(), which is expected to be executed in the event thread context.
However, a circular locking dependency is possible (see the stack trace below) after the eDP driver calls dp_hpd_plug_handle() from dp_bridge_enable(), which is executed in the drm_thread context.
After reviewing all possible methods, and as discussed at https://patchwork.freedesktop.org/patch/483155/, supporting EDID compliance tests in the driver is quite hacky. As seen with other vendor drivers, supporting these will be much easier with IGT. Hence remove all the related fail-safe code so that the circular locking can no longer happen.
Reviewed-by: Stephen Boyd swboyd@chromium.org
Reviewed-by: Douglas Anderson dianders@chromium.org
Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org
======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at: dp_panel_add_fail_safe_mode+0x4c/0xa0
but task is already holding lock: ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
       __mutex_lock_common+0x174/0x1a64
       mutex_lock_nested+0x98/0xac
       lock_crtcs+0xb4/0x124
       msm_atomic_commit_tail+0x330/0x748
       commit_tail+0x19c/0x278
       drm_atomic_helper_commit+0x1dc/0x1f0
       drm_atomic_commit+0xc0/0xd8
       drm_atomic_helper_set_config+0xb4/0x134
       drm_mode_setcrtc+0x688/0x1248
       drm_ioctl_kernel+0x1e4/0x338
       drm_ioctl+0x3a4/0x684
       __arm64_sys_ioctl+0x118/0x154
       invoke_syscall+0x78/0x224
       el0_svc_common+0x178/0x200
       do_el0_svc+0x94/0x13c
       el0_svc+0x5c/0xec
       el0t_64_sync_handler+0x78/0x108
       el0t_64_sync+0x1a4/0x1a8
-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
       __mutex_lock_common+0x174/0x1a64
       ww_mutex_lock+0xb8/0x278
       modeset_lock+0x304/0x4ac
       drm_modeset_lock+0x4c/0x7c
       drmm_mode_config_init+0x4a8/0xc50
       msm_drm_init+0x274/0xac0
       msm_drm_bind+0x20/0x2c
       try_to_bring_up_master+0x3dc/0x470
       __component_add+0x18c/0x3c0
       component_add+0x1c/0x28
       dp_display_probe+0x954/0xa98
       platform_probe+0x124/0x15c
       really_probe+0x1b0/0x5f8
       __driver_probe_device+0x174/0x20c
       driver_probe_device+0x70/0x134
       __device_attach_driver+0x130/0x1d0
       bus_for_each_drv+0xfc/0x14c
       __device_attach+0x1bc/0x2bc
       device_initial_probe+0x1c/0x28
       bus_probe_device+0x94/0x178
       deferred_probe_work_func+0x1a4/0x1f0
       process_one_work+0x5d4/0x9dc
       worker_thread+0x898/0xccc
       kthread+0x2d4/0x3d4
       ret_from_fork+0x10/0x20
-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
       ww_acquire_init+0x1c4/0x2c8
       drm_modeset_acquire_init+0x44/0xc8
       drm_helper_probe_single_connector_modes+0xb0/0x12dc
       drm_mode_getconnector+0x5dc/0xfe8
       drm_ioctl_kernel+0x1e4/0x338
       drm_ioctl+0x3a4/0x684
       __arm64_sys_ioctl+0x118/0x154
       invoke_syscall+0x78/0x224
       el0_svc_common+0x178/0x200
       do_el0_svc+0x94/0x13c
       el0_svc+0x5c/0xec
       el0t_64_sync_handler+0x78/0x108
       el0t_64_sync+0x1a4/0x1a8
-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
       __lock_acquire+0x2650/0x672c
       lock_acquire+0x1b4/0x4ac
       __mutex_lock_common+0x174/0x1a64
       mutex_lock_nested+0x98/0xac
       dp_panel_add_fail_safe_mode+0x4c/0xa0
       dp_hpd_plug_handle+0x1f0/0x280
       dp_bridge_enable+0x94/0x2b8
       drm_atomic_bridge_chain_enable+0x11c/0x168
       drm_atomic_helper_commit_modeset_enables+0x500/0x740
       msm_atomic_commit_tail+0x3e4/0x748
       commit_tail+0x19c/0x278
       drm_atomic_helper_commit+0x1dc/0x1f0
       drm_atomic_commit+0xc0/0xd8
       drm_atomic_helper_set_config+0xb4/0x134
       drm_mode_setcrtc+0x688/0x1248
       drm_ioctl_kernel+0x1e4/0x338
       drm_ioctl+0x3a4/0x684
       __arm64_sys_ioctl+0x118/0x154
       invoke_syscall+0x78/0x224
       el0_svc_common+0x178/0x200
       do_el0_svc+0x94/0x13c
       el0_svc+0x5c/0xec
       el0t_64_sync_handler+0x78/0x108
       el0t_64_sync+0x1a4/0x1a8
Changes in v2:
-- re text commit title
-- remove all fail safe mode
Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes
Changes in v5:
-- to=dianders@chromium.org
Changes in v6:
-- fix Fixes commit ID
Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson dianders@chromium.org
Signed-off-by: Kuogee Hsieh quic_khsieh@quicinc.com
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quic...
Signed-off-by: Rob Clark robdclark@chromium.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/msm/dp/dp_display.c | 6 ------
 drivers/gpu/drm/msm/dp/dp_panel.c | 11 -----------
 drivers/gpu/drm/msm/dp/dp_panel.h | 1 -
 3 files changed, 18 deletions(-)

--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -551,12 +551,6 @@ static int dp_hpd_plug_handle(struct dp_

 	mutex_unlock(&dp->event_mutex);

-	/*
-	 * add fail safe mode outside event_mutex scope
-	 * to avoid potiential circular lock with drm thread
-	 */
-	dp_panel_add_fail_safe_mode(dp->dp_display.connector);
-
 	/* uevent will complete connection part */
 	return 0;
 };
--- a/drivers/gpu/drm/msm/dp/dp_panel.c
+++ b/drivers/gpu/drm/msm/dp/dp_panel.c
@@ -151,15 +151,6 @@ static int dp_panel_update_modes(struct
 	return rc;
 }

-void dp_panel_add_fail_safe_mode(struct drm_connector *connector)
-{
-	/* fail safe edid */
-	mutex_lock(&connector->dev->mode_config.mutex);
-	if (drm_add_modes_noedid(connector, 640, 480))
-		drm_set_preferred_mode(connector, 640, 480);
-	mutex_unlock(&connector->dev->mode_config.mutex);
-}
-
 int dp_panel_read_sink_caps(struct dp_panel *dp_panel,
 	struct drm_connector *connector)
 {
@@ -215,8 +206,6 @@ int dp_panel_read_sink_caps(struct dp_pa
 			rc = -ETIMEDOUT;
 			goto end;
 		}
-
-		dp_panel_add_fail_safe_mode(connector);
 	}

 	if (panel->aux_cfg_update_done) {
--- a/drivers/gpu/drm/msm/dp/dp_panel.h
+++ b/drivers/gpu/drm/msm/dp/dp_panel.h
@@ -59,7 +59,6 @@ int dp_panel_init_panel_info(struct dp_p
 int dp_panel_deinit(struct dp_panel *dp_panel);
 int dp_panel_timing_cfg(struct dp_panel *dp_panel);
 void dp_panel_dump_regs(struct dp_panel *dp_panel);
-void dp_panel_add_fail_safe_mode(struct drm_connector *connector);
 int dp_panel_read_sink_caps(struct dp_panel *dp_panel,
 		struct drm_connector *connector);
 u32 dp_panel_get_mode_bpp(struct dp_panel *dp_panel, u32 mode_max_bpp,
From: Filipe Manana fdmanana@suse.com
commit 193b4e83986d7ee6caa8ceefb5ee9f58240fbee0 upstream.
We are doing a BUG_ON() if we fail to update an inode after setting (or clearing) an xattr, but there's really no reason not to simply abort the transaction and return the error to the caller instead. This should be a rare error because we have previously reserved enough metadata space to update the inode and the delayed inode should have already been set up, so -ENOSPC or -ENOMEM, which are the possible errors, are very unlikely to happen.
So replace the BUG_ON()s with a transaction abort.
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Qu Wenruo wqu@suse.com
Reviewed-by: Anand Jain anand.jain@oracle.com
Signed-off-by: Filipe Manana fdmanana@suse.com
Reviewed-by: David Sterba dsterba@suse.com
Signed-off-by: David Sterba dsterba@suse.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/btrfs/xattr.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -264,7 +264,8 @@ int btrfs_setxattr_trans(struct inode *i
 	inode_inc_iversion(inode);
 	inode->i_ctime = current_time(inode);
 	ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
-	BUG_ON(ret);
+	if (ret)
+		btrfs_abort_transaction(trans, ret);
 out:
 	if (start_trans)
 		btrfs_end_transaction(trans);
@@ -418,7 +419,8 @@ static int btrfs_xattr_handler_set_prop(
 		inode_inc_iversion(inode);
 		inode->i_ctime = current_time(inode);
 		ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
-		BUG_ON(ret);
+		if (ret)
+			btrfs_abort_transaction(trans, ret);
 	}

 	btrfs_end_transaction(trans);
From: Qiao Ma mqaio@linux.alibaba.com
commit 52b2abef450a78e25d485ac61e32f4ce86a87701 upstream.
If the wq has only one page, we need to check whether a wqe rolls over the end of the page by comparing end_idx and curr_idx, and then copy the wqe to the shadow wqe to avoid an out-of-bounds access. This is already done in hinic_get_wqe(), but was missed in hinic_read_wqe(). This patch fixes that, and removes an unnecessary MASKED_WQE_IDX().
Fixes: 7dd29ee12865 ("hinic: add sriov feature support")
Signed-off-by: Qiao Ma mqaio@linux.alibaba.com
Reviewed-by: Xunlei Pang xlpang@linux.alibaba.com
Link: https://lore.kernel.org/r/282817b0e1ae2e28fdf3ed8271a04e77f57bf42e.165114858...
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/huawei/hinic/hinic_hw_wq.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_wq.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_wq.c
@@ -772,7 +772,7 @@ struct hinic_hw_wqe *hinic_get_wqe(struc
 	/* If we only have one page, still need to get shadown wqe when
 	 * wqe rolling-over page
 	 */
-	if (curr_pg != end_pg || MASKED_WQE_IDX(wq, end_prod_idx) < *prod_idx) {
+	if (curr_pg != end_pg || end_prod_idx < *prod_idx) {
 		void *shadow_addr = &wq->shadow_wqe[curr_pg * wq->max_wqe_size];

 		copy_wqe_to_shadow(wq, shadow_addr, num_wqebbs, *prod_idx);
@@ -842,7 +842,10 @@ struct hinic_hw_wqe *hinic_read_wqe(stru

 	*cons_idx = curr_cons_idx;

-	if (curr_pg != end_pg) {
+	/* If we only have one page, still need to get shadown wqe when
+	 * wqe rolling-over page
+	 */
+	if (curr_pg != end_pg || end_cons_idx < curr_cons_idx) {
 		void *shadow_addr = &wq->shadow_wqe[curr_pg * wq->max_wqe_size];

 		copy_wqe_to_shadow(wq, shadow_addr, num_wqebbs, *cons_idx);
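The roll-over condition used in the hinic patch above can be sketched in isolation. This is a simplified model, not driver code: the function names wqe_end_idx() and wqe_needs_shadow() are hypothetical, and the queue depth is assumed to be a power of two so that indices can be masked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masked index of the last block of an entry that starts at curr_idx and
 * spans num_wqebbs blocks. q_depth is assumed to be a power of two.
 */
uint16_t wqe_end_idx(uint16_t curr_idx, uint16_t num_wqebbs, uint16_t q_depth)
{
	assert(q_depth != 0 && (q_depth & (q_depth - 1)) == 0);
	assert(num_wqebbs >= 1 && num_wqebbs <= q_depth);
	return (uint16_t)((curr_idx + num_wqebbs - 1) & (q_depth - 1));
}

/* An entry must be copied to the shadow area if it crosses a page
 * boundary, or if it wraps within a single page, which shows up as the
 * masked end index being smaller than the start index.
 */
bool wqe_needs_shadow(unsigned int curr_pg, unsigned int end_pg,
		      uint16_t curr_idx, uint16_t end_idx)
{
	return curr_pg != end_pg || end_idx < curr_idx;
}
```

With a depth of 8, an entry of 4 blocks starting at index 6 ends at masked index 1, so the single-page case is only caught by the second half of the condition.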
From: Eric Dumazet edumazet@google.com
commit a9384a4c1d250cb40cebf50e41459426d160b08e upstream.
Whenever an RCU-protected list replaces an object, the pointer to the new object needs to be updated _before_ the call to kfree_rcu() or call_rcu().
Also ip6_mc_msfilter() needs to update the pointer before releasing the mc_lock mutex.
Note that linux-5.13 was supporting kfree_rcu(NULL, rcu), so this fix does not need the conditional test I was forced to use in the equivalent patch for IPv4.
Fixes: 882ba1f73c06 ("mld: convert ipv6_mc_socklist->sflist to RCU")
Signed-off-by: Eric Dumazet edumazet@google.com
Cc: Taehee Yoo ap420073@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ipv6/mcast.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 909f937befd7..7f695c39d9a8 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -460,10 +460,10 @@ int ip6_mc_source(int add, int omode, struct sock *sk,
 				newpsl->sl_addr[i] = psl->sl_addr[i];
 			atomic_sub(struct_size(psl, sl_addr, psl->sl_max),
 				   &sk->sk_omem_alloc);
-			kfree_rcu(psl, rcu);
 		}
+		rcu_assign_pointer(pmc->sflist, newpsl);
+		kfree_rcu(psl, rcu);
 		psl = newpsl;
-		rcu_assign_pointer(pmc->sflist, psl);
 	}
 	rv = 1;	/* > 0 for insert logic below if sl_count is 0 */
 	for (i = 0; i < psl->sl_count; i++) {
@@ -565,12 +565,12 @@ int ip6_mc_msfilter(struct sock *sk, struct group_filter *gsf,
 				     psl->sl_count, psl->sl_addr, 0);
 		atomic_sub(struct_size(psl, sl_addr, psl->sl_max),
 			   &sk->sk_omem_alloc);
-		kfree_rcu(psl, rcu);
 	} else {
 		ip6_mc_del_src(idev, group, pmc->sfmode, 0, NULL, 0);
 	}
-	mutex_unlock(&idev->mc_lock);
 	rcu_assign_pointer(pmc->sflist, newpsl);
+	mutex_unlock(&idev->mc_lock);
+	kfree_rcu(psl, rcu);
 	pmc->sfmode = gsf->gf_fmode;
 	err = 0;
 done:
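The ordering rule stated in the mcast patch above (publish the new pointer first, then hand the old object to deferred freeing) can be modelled in userspace with C11 atomics. This is only a sketch of the ordering, not of RCU itself: retire_after_readers() is a hypothetical one-slot stand-in for kfree_rcu(), there is no grace-period tracking, and the struct names are simplified inventions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct src_list {
	int sl_count;
};

struct mc_entry {
	_Atomic(struct src_list *) sflist;	/* read by lock-free readers */
};

/* One-slot stand-in for kfree_rcu(): the object is only parked here.
 * Real RCU would free it after every pre-existing reader has finished.
 */
struct src_list *retired;

void retire_after_readers(struct src_list *old)
{
	assert(retired == NULL);	/* the sketch holds one pending object */
	retired = old;
}

void replace_sflist(struct mc_entry *pmc, struct src_list *newpsl)
{
	struct src_list *old =
		atomic_load_explicit(&pmc->sflist, memory_order_relaxed);

	/* 1. Publish the new list so new readers can no longer find "old". */
	atomic_store_explicit(&pmc->sflist, newpsl, memory_order_release);

	/* 2. Only now queue the old list for deferred freeing. */
	if (old)
		retire_after_readers(old);
}
```

If the two steps were swapped, a reader arriving after the old object was queued could still load the stale pointer, and the deferred free would no longer be guaranteed to wait for it.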
From: David Howells dhowells@redhat.com
commit 39cb9faa5d46d0d0694f4b594ef905f517600c8e upstream.
AF_RXRPC doesn't currently enable IPv6 UDP Tx checksums on the transport socket it opens and the checksums in the packets it generates end up 0.
It probably should also enable IPv6 UDP Rx checksums and IPv4 UDP checksums. The latter only seem to be applied if the socket family is AF_INET and don't seem to apply if it's AF_INET6. IPv4 packets from an IPv6 socket seem to have checksums anyway.
What seems to have happened is that the inet_inv_convert_csum() call didn't get converted to the appropriate udp_port_cfg parameters - and udp_sock_create() disables checksums unless explicitly told not to.
Fix this by enabling the three udp_port_cfg checksum options.
Fixes: 1a9b86c9fd95 ("rxrpc: use udp tunnel APIs instead of open code in rxrpc_open_socket")
Reported-by: Marc Dionne marc.dionne@auristor.com
Signed-off-by: David Howells dhowells@redhat.com
Reviewed-by: Xin Long lucien.xin@gmail.com
Reviewed-by: Marc Dionne marc.dionne@auristor.com
cc: Vadim Fedorenko vfedorenko@novek.ru
cc: David S. Miller davem@davemloft.net
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/rxrpc/local_object.c | 3 +++
 1 file changed, 3 insertions(+)

--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -117,6 +117,7 @@ static int rxrpc_open_socket(struct rxrp
 	       local, srx->transport_type, srx->transport.family);

 	udp_conf.family = srx->transport.family;
+	udp_conf.use_udp_checksums = true;
 	if (udp_conf.family == AF_INET) {
 		udp_conf.local_ip = srx->transport.sin.sin_addr;
 		udp_conf.local_udp_port = srx->transport.sin.sin_port;
@@ -124,6 +125,8 @@ static int rxrpc_open_socket(struct rxrp
 	} else {
 		udp_conf.local_ip6 = srx->transport.sin6.sin6_addr;
 		udp_conf.local_udp_port = srx->transport.sin6.sin6_port;
+		udp_conf.use_udp6_tx_checksums = true;
+		udp_conf.use_udp6_rx_checksums = true;
 #endif
 	}
 	ret = udp_sock_create(net, &udp_conf, &local->socket);
From: Ido Schimmel idosch@nvidia.com
commit 3122257c02afd9f199a8fc84ae981e1fc4958532 upstream.
In emulated environments, the bridge ports enslaved to br1 get a carrier before changing br1's PVID. This means that by the time the PVID is changed, br1 is already operational and configured with an IPv6 link-local address.
When the test is run with netdevs registered by mlxsw, changing the PVID is vetoed, as changing the VID associated with an existing L3 interface is forbidden. This restriction is similar to the 8021q driver's restriction of changing the VID of an existing interface.
Fix this by taking br1 down and bringing it back up when it is fully configured.
With this fix, the test reliably passes on top of both the SW and HW data paths (emulated or not).
Fixes: 239e754af854 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q")
Signed-off-by: Ido Schimmel idosch@nvidia.com
Reviewed-by: Petr Machata petrm@nvidia.com
Link: https://lore.kernel.org/r/20220502084507.364774-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni pabeni@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh | 3 +++
 1 file changed, 3 insertions(+)

--- a/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh
@@ -61,9 +61,12 @@ setup_prepare()

 	vrf_prepare
 	mirror_gre_topo_create
+	# Avoid changing br1's PVID while it is operational as a L3 interface.
+	ip link set dev br1 down

 	ip link set dev $swp3 master br1
 	bridge vlan add dev br1 vid 555 pvid untagged self
+	ip link set dev br1 up
 	ip address add dev br1 192.0.2.129/28
 	ip address add dev br1 2001:db8:2::1/64
From: Somnath Kotur somnath.kotur@broadcom.com
commit 13ba794397e45e52893cfc21d7a69cb5f341b407 upstream.
bnxt_open() can fail in this code path, especially on a VF when it fails to reserve default rings:
bnxt_open() __bnxt_open_nic() bnxt_clear_int_mode() bnxt_init_dflt_ring_mode()
RX rings would be set to 0 when we hit this error path.
It is possible for a subsequent bnxt_open() call to potentially succeed with a code path like this:
bnxt_open() bnxt_hwrm_if_change() bnxt_fw_init_one() bnxt_fw_init_one_p3() bnxt_set_dflt_rfs() bnxt_rfs_capable() bnxt_hwrm_reserve_rings()
On older chips, RFS is capable if we can reserve the number of vnics that is equal to RX rings + 1. But since RX rings is still set to 0 in this code path, we may mistakenly think that RFS is supported for 0 RX rings.
Later, when the default RX rings are reserved and we try to enable RFS, it would fail and cause bnxt_open() to fail unnecessarily.
We fix this in 2 places. bnxt_rfs_capable() will always return false if RX rings is not yet set. bnxt_init_dflt_ring_mode() will call bnxt_set_dflt_rfs() which will always clear the RFS flags if RFS is not supported.
Fixes: 20d7d1c5c9b1 ("bnxt_en: reliably allocate IRQ table on reset to avoid crash")
Signed-off-by: Somnath Kotur somnath.kotur@broadcom.com
Signed-off-by: Michael Chan michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -10881,7 +10881,7 @@ static bool bnxt_rfs_capable(struct bnxt

 	if (bp->flags & BNXT_FLAG_CHIP_P5)
 		return bnxt_rfs_supported(bp);
-	if (!(bp->flags & BNXT_FLAG_MSIX_CAP) || !bnxt_can_reserve_rings(bp))
+	if (!(bp->flags & BNXT_FLAG_MSIX_CAP) || !bnxt_can_reserve_rings(bp) || !bp->rx_nr_rings)
 		return false;

 	vnics = 1 + bp->rx_nr_rings;
@@ -13087,10 +13087,9 @@ static int bnxt_init_dflt_ring_mode(stru
 		goto init_dflt_ring_err;

 	bp->tx_nr_rings_per_tc = bp->tx_nr_rings;
-	if (bnxt_rfs_supported(bp) && bnxt_rfs_capable(bp)) {
-		bp->flags |= BNXT_FLAG_RFS;
-		bp->dev->features |= NETIF_F_NTUPLE;
-	}
+
+	bnxt_set_dflt_rfs(bp);
+
 init_dflt_ring_err:
 	bnxt_ulp_irq_restart(bp, rc);
 	return rc;
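The false positive described in the bnxt RFS patch above can be reproduced with a toy model of the vnic calculation. The function rfs_capable_model() and its max_vnics parameter are hypothetical simplifications; the real driver asks the firmware to reserve resources rather than comparing against a fixed limit.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Toy model of the older-chip check: RFS needs one vnic per RX ring plus
 * one default vnic. max_vnics stands in for what could be reserved.
 * guard_zero_rings selects the behaviour after the fix.
 */
bool rfs_capable_model(unsigned int rx_nr_rings, unsigned int max_vnics,
		       bool guard_zero_rings)
{
	unsigned int vnics;

	assert(rx_nr_rings < UINT_MAX);	/* keep 1 + rx_nr_rings from wrapping */

	/* After the fix: no RX rings set yet means no basis for a decision. */
	if (guard_zero_rings && rx_nr_rings == 0)
		return false;

	vnics = 1 + rx_nr_rings;
	return vnics <= max_vnics;
}
```

With rx_nr_rings still 0 from the earlier failure and only one vnic available, the unguarded check reports RFS as capable, while the same device with 8 RX rings is correctly rejected. The guard removes that inconsistency.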
From: Michael Chan michael.chan@broadcom.com
commit 195af57914d15229186658ed26dab24b9ada4122 upstream.
In bnxt_poll_p5(), we first check cpr->has_more_work. If it is true, we are in NAPI polling mode and we will call __bnxt_poll_cqs() to continue polling. It is possible to exhaust the budget again when __bnxt_poll_cqs() returns.
We then enter the main while loop to check for new entries in the NQ. If we had previously exhausted the NAPI budget, we may call __bnxt_poll_work() to process an RX entry with zero budget. This will cause packets to be dropped unnecessarily, thinking that we are in the netpoll path. Fix it by breaking out of the while loop if we need to process an RX NQ entry with no budget left. We will then exit NAPI and stay in polling mode.
Fixes: 389a877a3b20 ("bnxt_en: Process the NQ under NAPI continuous polling.")
Reviewed-by: Andy Gospodarek andrew.gospodarek@broadcom.com
Signed-off-by: Michael Chan michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 ++++
 1 file changed, 4 insertions(+)

--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2699,6 +2699,10 @@ static int bnxt_poll_p5(struct napi_stru
 			u32 idx = le32_to_cpu(nqcmp->cq_handle_low);
 			struct bnxt_cp_ring_info *cpr2;

+			/* No more budget for RX work */
+			if (budget && work_done >= budget && idx == BNXT_RX_HDL)
+				break;
+
 			cpr2 = cpr->cp_ring_arr[idx];
 			work_done += __bnxt_poll_work(bp, cpr2,
 						      budget - work_done);
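The budget check added in the bnxt_poll_p5() patch above can be modelled as a small loop. This is a sketch under simplifying assumptions, not the driver's logic: the enum nq_handle values are hypothetical stand-ins for the NQ handle types, each RX entry is assumed to cost exactly one unit of budget, and TX entries are assumed to cost none.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the handle type carried by an NQ entry. */
enum nq_handle { NQ_RX, NQ_TX };

/* Walk the pending entries and return how many were consumed. An RX entry
 * is left for the next poll once the budget is used up, instead of being
 * processed with zero budget. budget == 0 models the netpoll case, where
 * the check is skipped.
 */
size_t poll_nq_model(const enum nq_handle *entries, size_t n,
		     int budget, int work_done)
{
	size_t i;

	assert(budget >= 0 && work_done >= 0);

	for (i = 0; i < n; i++) {
		/* No more budget for RX work */
		if (budget && work_done >= budget && entries[i] == NQ_RX)
			break;
		if (entries[i] == NQ_RX)
			work_done++;	/* assumed cost of one RX entry */
	}
	return i;
}
```

Starting with the budget already exhausted, a leading TX entry is still handled, but the loop stops at the first RX entry, which matches the intent of breaking out and staying in polling mode.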
From: Vladimir Oltean vladimir.oltean@nxp.com
commit 5a7c5f70c743c6cf32b44b05bd6b19d4ad82f49d upstream.
As discussed here with Ido Schimmel: https://patchwork.kernel.org/project/netdevbpf/patch/20220224102908.5255-2-j...
the default conform-exceed action is "reclassify", for a reason we don't really understand.
The point is that hardware can't offload that police action, so not specifying "conform-exceed" was always wrong, even though the command used to work in hardware (but not in software) until the kernel started adding validation for it.
Fix the command used by the selftest by making the policer drop on exceed, and pass the packet to the next action (goto) on conform.
Fixes: 8cd6b020b644 ("selftests: ocelot: add some example VCAP IS1, IS2 and ES0 tc offloads")
Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com
Reviewed-by: Ido Schimmel idosch@nvidia.com
Link: https://lore.kernel.org/r/20220503121428.842906-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 tools/testing/selftests/drivers/net/ocelot/tc_flower_chains.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/tools/testing/selftests/drivers/net/ocelot/tc_flower_chains.sh
+++ b/tools/testing/selftests/drivers/net/ocelot/tc_flower_chains.sh
@@ -185,7 +185,7 @@ setup_prepare()

 	tc filter add dev $eth0 ingress chain $(IS2 0 0) pref 1 \
 		protocol ipv4 flower skip_sw ip_proto udp dst_port 5201 \
-		action police rate 50mbit burst 64k \
+		action police rate 50mbit burst 64k conform-exceed drop/pipe \
 		action goto chain $(IS2 1 0)
 }
From: Sergey Shtylyov s.shtylyov@omp.ru
commit 5ef9b803a4af0f5e42012176889b40bb2a978b18 upstream.
The AlphaProject AP-SH4A-3A/AP-SH4AD-0A SH boards use IRQ0 for their SMSC LAN911x Ethernet chip, so the networking on them must have been broken by commit 965b2aa78fbc ("net/smsc911x: fix irq resource allocation failure") which filtered out 0 as well as the negative error codes -- it was kinda correct at the time, as platform_get_irq() could return 0 on of_irq_get() failure and on the actual 0 in an IRQ resource. This issue was fixed by me (back in 2016!), so we should be able to fix this driver to allow IRQ0 usage again...
When merging this to the stable kernels, make sure you also merge commit e330b9a6bb35 ("platform: don't return 0 from platform_get_irq[_byname]() on error") -- that's my fix to platform_get_irq() for the DT platforms...
Fixes: 965b2aa78fbc ("net/smsc911x: fix irq resource allocation failure")
Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru
Link: https://lore.kernel.org/r/656036e4-6387-38df-b8a7-6ba683b16e63@omp.ru
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/smsc/smsc911x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -2429,7 +2429,7 @@ static int smsc911x_drv_probe(struct pla
 	if (irq == -EPROBE_DEFER) {
 		retval = -EPROBE_DEFER;
 		goto out_0;
-	} else if (irq <= 0) {
+	} else if (irq < 0) {
 		pr_warn("Could not allocate irq resource\n");
 		retval = -ENODEV;
 		goto out_0;
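The effect of the one-character change in the smsc911x patch above can be shown with a minimal sketch. The function check_irq() and its treat_zero_as_error flag are hypothetical; the flag only exists to contrast the old and the new comparison, and the sketch assumes the IRQ lookup reports failures as negative errno values, as the commit message describes for the fixed platform_get_irq().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Returns 0 if the IRQ number is usable, -ENODEV otherwise.
 * treat_zero_as_error = true models the old "irq <= 0" test,
 * false models the fixed "irq < 0" test.
 */
int check_irq(int irq, bool treat_zero_as_error)
{
	/* Kernel-style error codes lie in the range [-4095, -1]. */
	assert(irq >= -4095);

	if (irq < 0 || (treat_zero_as_error && irq == 0))
		return -ENODEV;
	return 0;
}
```

Under the old test a board that legitimately uses IRQ0 is rejected with -ENODEV; under the new test only negative values are treated as failures.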
From: Qu Wenruo wqu@suse.com
commit 9f73f1aef98b2fa7252c0a89be64840271ce8ea0 upstream.
[BUG] If a btrfs filesystem with a 4K sector size and the v1 cache enabled has only ever been mounted on systems with a 4K page size, mounting it on a subpage (64K page size) system can trigger the following warning about the v1 space cache:
BTRFS error (device dm-1): csum mismatch on free space cache
BTRFS warning (device dm-1): failed to load free space cache for block group 84082688, rebuilding it now
Although this is not a big deal, as the kernel can rebuild the cache without problems, such a warning will bother end users, especially if they want to switch the same btrfs seamlessly between systems with different page sizes.
[CAUSE] The v1 free space cache still uses a fixed PAGE_SIZE for various bitmaps, like BITS_PER_BITMAP.
Such hard-coded PAGE_SIZE usage causes various mismatches, from the v1 cache size to the checksum.
Thus the kernel will always reject a v1 cache written with a different PAGE_SIZE, reporting a csum mismatch.
[FIX] Although we should fix the v1 cache, it is already going to be marked deprecated soon.
And we have the v2 cache based on metadata (which is already fully subpage compatible), and it is superior to the v1 cache in almost every respect.
So just force subpage mounts to use the v2 cache.
Reported-by: Matt Corallo blnxfsl@bluematt.me
CC: stable@vger.kernel.org # 5.15+
Link: https://lore.kernel.org/linux-btrfs/61aa27d1-30fc-c1a9-f0f4-9df544395ec3@blu...
Reviewed-by: Josef Bacik josef@toxicpanda.com
Signed-off-by: Qu Wenruo wqu@suse.com
Signed-off-by: David Sterba dsterba@suse.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/btrfs/disk-io.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3415,6 +3415,17 @@ int __cold open_ctree(struct super_block
 	}

 	if (sectorsize != PAGE_SIZE) {
+		/*
+		 * V1 space cache has some hardcoded PAGE_SIZE usage, and is
+		 * going to be deprecated.
+		 *
+		 * Force to use v2 cache for subpage case.
+		 */
+		btrfs_clear_opt(fs_info->mount_opt, SPACE_CACHE);
+		btrfs_set_and_info(fs_info, FREE_SPACE_TREE,
+			"forcing free space tree for sector size %u with page size %lu",
+			sectorsize, PAGE_SIZE);
+
 		btrfs_warn(fs_info,
 		"read-write for sector size %u with page size %lu is experimental",
 			   sectorsize, PAGE_SIZE);
From: Filipe Manana fdmanana@suse.com
commit d0e64a981fd841cb0f28fcd6afcac55e6f1e6994 upstream.
On Linux, empty symlinks are invalid, and attempting to create one with the system call symlink(2) results in an -ENOENT error and this is explicitly documented in the man page.
If we rename a symlink that was created in the current transaction and its parent directory was logged before, we actually end up logging the symlink without logging its content, which is stored in an inline extent. That means that after a power failure we can end up with an empty symlink, having no content and an i_size of 0 bytes.
It can be easily reproduced like this:
  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt

  $ mkdir /mnt/testdir
  $ sync

  # Create a file inside the directory and fsync the directory.
  $ touch /mnt/testdir/foo
  $ xfs_io -c "fsync" /mnt/testdir

  # Create a symlink inside the directory and then rename the symlink.
  $ ln -s /mnt/testdir/foo /mnt/testdir/bar
  $ mv /mnt/testdir/bar /mnt/testdir/baz

  # Now fsync the directory again, this persists the log tree.
  $ xfs_io -c "fsync" /mnt/testdir

  <power failure>

  $ mount /dev/sdc /mnt
  $ stat -c %s /mnt/testdir/baz
  0
  $ readlink /mnt/testdir/baz
  $
Fix this by always logging symlinks in full mode (LOG_INODE_ALL), so that their content is also logged.
A test case for fstests will follow.
CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Filipe Manana fdmanana@suse.com
Signed-off-by: David Sterba dsterba@suse.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/btrfs/tree-log.c |   14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5484,6 +5484,18 @@ static int btrfs_log_inode(struct btrfs_
 	}
 
 	/*
+	 * For symlinks, we must always log their content, which is stored in an
+	 * inline extent, otherwise we could end up with an empty symlink after
+	 * log replay, which is invalid on linux (symlink(2) returns -ENOENT if
+	 * one attempts to create an empty symlink).
+	 * We don't need to worry about flushing delalloc, because when we create
+	 * the inline extent when the symlink is created (we never have delalloc
+	 * for symlinks).
+	 */
+	if (S_ISLNK(inode->vfs_inode.i_mode))
+		inode_only = LOG_INODE_ALL;
+
+	/*
 	 * This is for cases where logging a directory could result in losing a
 	 * a file after replaying the log. For example, if we move a file from a
 	 * directory A to a directory B, then fsync directory A, we have no way
@@ -5853,7 +5865,7 @@ process_leaf:
 		}
 
 		ctx->log_new_dentries = false;
-		if (type == BTRFS_FT_DIR || type == BTRFS_FT_SYMLINK)
+		if (type == BTRFS_FT_DIR)
 			log_mode = LOG_INODE_ALL;
 		ret = btrfs_log_inode(trans, root, BTRFS_I(di_inode),
 				      log_mode, ctx);
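The symlink(2) behaviour the commit message relies on can be checked from user space. The following is a minimal sketch, assuming a Linux system; the helper name try_empty_symlink() is made up for illustration.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Try to create a symlink at linkpath whose target is the empty string.
 * Returns 0 if the call unexpectedly succeeded, otherwise the errno
 * value reported by symlink(2).
 */
static int try_empty_symlink(const char *linkpath)
{
	if (symlink("", linkpath) == 0) {
		unlink(linkpath);	/* clean up the unexpected link */
		return 0;
	}
	return errno;
}
```

On Linux this is expected to return ENOENT, which is why a symlink that comes back empty after log replay is a state user space could never have created on its own.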
From: Nirmoy Das nirmoy.das@amd.com
commit 58144d283712c9e80e528e001af6ac5aeee71af2 upstream.
Unify BO evicting functionality for possible memory types in amdgpu_ttm.c.
Signed-off-by: Nirmoy Das nirmoy.das@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
Cc: "Limonciello, Mario" Mario.Limonciello@amd.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |    8 ++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |   30 ++++++++++++++++++++++------
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |   23 ---------------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |    1
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c     |   30 ++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h     |    1
 6 files changed, 58 insertions(+), 35 deletions(-)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1176,7 +1176,7 @@ static int amdgpu_debugfs_evict_vram(voi
 		return r;
 	}
 
-	*val = amdgpu_bo_evict_vram(adev);
+	*val = amdgpu_ttm_evict_resources(adev, TTM_PL_VRAM);
 
 	pm_runtime_mark_last_busy(dev->dev);
 	pm_runtime_put_autosuspend(dev->dev);
@@ -1189,17 +1189,15 @@ static int amdgpu_debugfs_evict_gtt(void
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)data;
 	struct drm_device *dev = adev_to_drm(adev);
-	struct ttm_resource_manager *man;
 	int r;
 
 	r = pm_runtime_get_sync(dev->dev);
 	if (r < 0) {
-		pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
+		pm_runtime_put_autosuspend(dev->dev);
 		return r;
 	}
 
-	man = ttm_manager_type(&adev->mman.bdev, TTM_PL_TT);
-	*val = ttm_resource_manager_evict_all(&adev->mman.bdev, man);
+	*val = amdgpu_ttm_evict_resources(adev, TTM_PL_TT);
 
 	pm_runtime_mark_last_busy(dev->dev);
 	pm_runtime_put_autosuspend(dev->dev);
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3928,6 +3928,25 @@ void amdgpu_device_fini_sw(struct amdgpu
 
 }
 
+/**
+ * amdgpu_device_evict_resources - evict device resources
+ * @adev: amdgpu device object
+ *
+ * Evicts all ttm device resources(vram BOs, gart table) from the lru list
+ * of the vram memory type. Mainly used for evicting device resources
+ * at suspend time.
+ *
+ */
+static void amdgpu_device_evict_resources(struct amdgpu_device *adev)
+{
+	/* No need to evict vram on APUs for suspend to ram */
+	if (adev->in_s3 && (adev->flags & AMD_IS_APU))
+		return;
+
+	if (amdgpu_ttm_evict_resources(adev, TTM_PL_VRAM))
+		DRM_WARN("evicting device resources failed\n");
+
+}
 
 /*
  * Suspend & resume.
@@ -3968,17 +3987,16 @@ int amdgpu_device_suspend(struct drm_dev
 	if (!adev->in_s0ix)
 		amdgpu_amdkfd_suspend(adev, adev->in_runpm);
 
-	/* evict vram memory */
-	amdgpu_bo_evict_vram(adev);
+	/* First evict vram memory */
+	amdgpu_device_evict_resources(adev);
 
 	amdgpu_fence_driver_hw_fini(adev);
 
 	amdgpu_device_ip_suspend_phase2(adev);
-	/* evict remaining vram memory
-	 * This second call to evict vram is to evict the gart page table
-	 * using the CPU.
+	/* This second call to evict device resources is to evict
+	 * the gart page table using the CPU.
 	 */
-	amdgpu_bo_evict_vram(adev);
+	amdgpu_device_evict_resources(adev);
 
 	return 0;
 }
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1038,29 +1038,6 @@ void amdgpu_bo_unpin(struct amdgpu_bo *b
 	}
 }
 
-/**
- * amdgpu_bo_evict_vram - evict VRAM buffers
- * @adev: amdgpu device object
- *
- * Evicts all VRAM buffers on the lru list of the memory type.
- * Mainly used for evicting vram at suspend time.
- *
- * Returns:
- * 0 for success or a negative error code on failure.
- */
-int amdgpu_bo_evict_vram(struct amdgpu_device *adev)
-{
-	struct ttm_resource_manager *man;
-
-	if (adev->in_s3 && (adev->flags & AMD_IS_APU)) {
-		/* No need to evict vram on APUs for suspend to ram */
-		return 0;
-	}
-
-	man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
-	return ttm_resource_manager_evict_all(&adev->mman.bdev, man);
-}
-
 static const char *amdgpu_vram_names[] = {
 	"UNKNOWN",
 	"GDDR1",
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -304,7 +304,6 @@ int amdgpu_bo_pin(struct amdgpu_bo *bo,
 int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
 			     u64 min_offset, u64 max_offset);
 void amdgpu_bo_unpin(struct amdgpu_bo *bo);
-int amdgpu_bo_evict_vram(struct amdgpu_device *adev);
 int amdgpu_bo_init(struct amdgpu_device *adev);
 void amdgpu_bo_fini(struct amdgpu_device *adev);
 int amdgpu_bo_set_tiling_flags(struct amdgpu_bo *bo, u64 tiling_flags);
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2036,6 +2036,36 @@ error_free:
 	return r;
 }
 
+/**
+ * amdgpu_ttm_evict_resources - evict memory buffers
+ * @adev: amdgpu device object
+ * @mem_type: evicted BO's memory type
+ *
+ * Evicts all @mem_type buffers on the lru list of the memory type.
+ *
+ * Returns:
+ * 0 for success or a negative error code on failure.
+ */
+int amdgpu_ttm_evict_resources(struct amdgpu_device *adev, int mem_type)
+{
+	struct ttm_resource_manager *man;
+
+	switch (mem_type) {
+	case TTM_PL_VRAM:
+	case TTM_PL_TT:
+	case AMDGPU_PL_GWS:
+	case AMDGPU_PL_GDS:
+	case AMDGPU_PL_OA:
+		man = ttm_manager_type(&adev->mman.bdev, mem_type);
+		break;
+	default:
+		DRM_ERROR("Trying to evict invalid memory type\n");
+		return -EINVAL;
+	}
+
+	return ttm_resource_manager_evict_all(&adev->mman.bdev, man);
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 static int amdgpu_mm_vram_table_show(struct seq_file *m, void *unused)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -190,6 +190,7 @@ bool amdgpu_ttm_tt_is_readonly(struct tt
 uint64_t amdgpu_ttm_tt_pde_flags(struct ttm_tt *ttm, struct ttm_resource *mem);
 uint64_t amdgpu_ttm_tt_pte_flags(struct amdgpu_device *adev, struct ttm_tt *ttm,
 				 struct ttm_resource *mem);
+int amdgpu_ttm_evict_resources(struct amdgpu_device *adev, int mem_type);
 
 void amdgpu_ttm_debugfs_init(struct amdgpu_device *adev);
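The shape of the unified helper, one entry point that validates the memory type and then evicts everything from the matching manager, can be sketched outside the kernel. Everything prefixed mock_ or MOCK_ below is a stand-in invented for this illustration; the enum values and the manager structure do not reflect the real TTM definitions.

```c
#include <errno.h>

/* Stand-in placement IDs; the numeric values are illustrative only. */
enum mock_placement {
	MOCK_PL_SYSTEM,
	MOCK_PL_TT,
	MOCK_PL_VRAM,
	MOCK_PL_GDS,
	MOCK_PL_GWS,
	MOCK_PL_OA,
	MOCK_PL_COUNT
};

/* Stand-in for a resource manager: just counts resident buffers. */
struct mock_manager {
	int resident;
};

static struct mock_manager mock_managers[MOCK_PL_COUNT];

/* Stand-in for "evict all": drop every resident buffer. */
static int mock_evict_all(struct mock_manager *man)
{
	man->resident = 0;
	return 0;
}

/*
 * Same control flow as the helper added by the patch: accept only the
 * memory types that can be evicted, reject anything else with -EINVAL.
 */
static int mock_evict_resources(int mem_type)
{
	struct mock_manager *man;

	switch (mem_type) {
	case MOCK_PL_VRAM:
	case MOCK_PL_TT:
	case MOCK_PL_GWS:
	case MOCK_PL_GDS:
	case MOCK_PL_OA:
		man = &mock_managers[mem_type];
		break;
	default:
		return -EINVAL;
	}

	return mock_evict_all(man);
}
```

Both debugfs callers and the suspend path then differ only in the memory type they pass, which is what lets the patch delete the VRAM-only amdgpu_bo_evict_vram().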
Hi Greg,
sorry only noticing this now. Why is that patch backported?
I mean it probably doesn't hurt, but that is just a code cleanup without much function difference and not a bug fix.
Regards, Christian.
[AMD Official Use Only - General]
> -----Original Message-----
> From: Koenig, Christian Christian.Koenig@amd.com
> Sent: Tuesday, May 10, 2022 10:15
> To: Greg Kroah-Hartman gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org; Nirmoy Das nirmoy.das@amd.com; Deucher, Alexander Alexander.Deucher@amd.com; Limonciello, Mario Mario.Limonciello@amd.com
> Subject: Re: [PATCH 5.15 082/135] drm/amdgpu: unify BO evicting method in amdgpu_ttm
>
> Hi Greg,
>
> sorry only noticing this now. Why is that patch backported?
>
> I mean it probably doesn't hurt, but that is just a code cleanup without much function difference and not a bug fix.
Christian,
It was for supporting a backport of some other fixes. See: https://lore.kernel.org/stable/BL1PR12MB5157776D00DAA747EF550CF1E2C69@BL1PR1...
Technically it could have been a hand-modified e53d9665ab00, but distros are already doing it as a straight backport, so my thought was that it's better to align with what they're doing.
On 10.05.22 at 17:17, Limonciello, Mario wrote:
>> -----Original Message-----
>> From: Koenig, Christian Christian.Koenig@amd.com
>> Sent: Tuesday, May 10, 2022 10:15
>> To: Greg Kroah-Hartman gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org
>> Cc: stable@vger.kernel.org; Nirmoy Das nirmoy.das@amd.com; Deucher, Alexander Alexander.Deucher@amd.com; Limonciello, Mario Mario.Limonciello@amd.com
>> Subject: Re: [PATCH 5.15 082/135] drm/amdgpu: unify BO evicting method in amdgpu_ttm
>>
>> Hi Greg,
>>
>> sorry only noticing this now. Why is that patch backported?
>>
>> I mean it probably doesn't hurt, but that is just a code cleanup without much function difference and not a bug fix.
> Christian,
>
> It was for supporting a backport of some other fixes. See: https://lore.kernel.org/stable/BL1PR12MB5157776D00DAA747EF550CF1E2C69@BL1PR1...
>
> Technically it could have been a hand-modified e53d9665ab00, but distros are already doing it as a straight backport, so my thought was that it's better to align with what they're doing.
Ah! That makes more sense now, yes feel free to go ahead with that.
It's just that the backmerge notice of this patch alone ended up in my inbox and that didn't make too much sense.
Thanks, Christian.
From: Mario Limonciello mario.limonciello@amd.com
commit e53d9665ab003df0ece8f869fcd3c2bbbecf7190 upstream.
This code path should run in both s0ix and s3, but currently it does so only because s3 and s0ix are both set in the s0ix case.
Signed-off-by: Mario Limonciello mario.limonciello@amd.com
Acked-by: Evan Quan evan.quan@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
Cc: "Limonciello, Mario" Mario.Limonciello@amd.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3939,8 +3939,8 @@ void amdgpu_device_fini_sw(struct amdgpu
  */
 static void amdgpu_device_evict_resources(struct amdgpu_device *adev)
 {
-	/* No need to evict vram on APUs for suspend to ram */
-	if (adev->in_s3 && (adev->flags & AMD_IS_APU))
+	/* No need to evict vram on APUs for suspend to ram or s2idle */
+	if ((adev->in_s3 || adev->in_s0ix) && (adev->flags & AMD_IS_APU))
 		return;
 
 	if (amdgpu_ttm_evict_resources(adev, TTM_PL_VRAM))
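The effect of the changed condition can be seen by comparing the old and new predicate on a few flag combinations. This is a sketch only: struct mock_adev and both helper names are invented here, and is_apu stands in for the (adev->flags & AMD_IS_APU) test.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three inputs the check looks at. */
struct mock_adev {
	bool in_s3;
	bool in_s0ix;
	bool is_apu;
};

/* Old condition: skip VRAM eviction only when in_s3 is set on an APU. */
static bool skip_evict_old(const struct mock_adev *a)
{
	return a->in_s3 && a->is_apu;
}

/* New condition: skip on an APU for either suspend flavour. */
static bool skip_evict_new(const struct mock_adev *a)
{
	return (a->in_s3 || a->in_s0ix) && a->is_apu;
}
```

The two predicates agree as long as in_s3 is also set during s0ix. They diverge for an APU with only in_s0ix set, which is the combination the old check would no longer cover once the flags become mutually exclusive.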
From: Mario Limonciello mario.limonciello@amd.com
commit eac4c54bf7f17fb4681b85e5fe383b74d6261a2b upstream.
This makes it clearer which codepaths are in use specifically in one state or the other.
Signed-off-by: Mario Limonciello mario.limonciello@amd.com
Acked-by: Evan Quan evan.quan@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2250,9 +2250,9 @@ static int amdgpu_pmops_suspend(struct d
 
 	if (amdgpu_acpi_is_s0ix_active(adev))
 		adev->in_s0ix = true;
-	adev->in_s3 = true;
+	else
+		adev->in_s3 = true;
 	r = amdgpu_device_suspend(drm_dev, true);
-	adev->in_s3 = false;
 	if (r)
 		return r;
 	if (!adev->in_s0ix)
@@ -2269,6 +2269,8 @@ static int amdgpu_pmops_resume(struct de
 	r = amdgpu_device_resume(drm_dev, true);
 	if (amdgpu_acpi_is_s0ix_active(adev))
 		adev->in_s0ix = false;
+	else
+		adev->in_s3 = false;
 	return r;
 }
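How the two flags are set on suspend entry before and after this change can be modelled in a few lines. This is a sketch under stated assumptions: struct mock_pm_state and the enter_suspend_* helpers are hypothetical, and s0ix_active stands in for the result of amdgpu_acpi_is_s0ix_active().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two suspend-state flags. */
struct mock_pm_state {
	bool in_s3;
	bool in_s0ix;
};

/* Before the patch: in_s3 is set unconditionally, even for s0ix. */
static void enter_suspend_old(struct mock_pm_state *st, bool s0ix_active)
{
	if (s0ix_active)
		st->in_s0ix = true;
	st->in_s3 = true;
}

/* After the patch: exactly one of the two flags is set. */
static void enter_suspend_new(struct mock_pm_state *st, bool s0ix_active)
{
	if (s0ix_active)
		st->in_s0ix = true;
	else
		st->in_s3 = true;
}
```

With the flags mutually exclusive, a check on in_s3 alone no longer matches the s0ix case, so code that must run in both states has to test both flags explicitly.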
From: Kai-Heng Feng kai.heng.feng@canonical.com
commit 887f75cfd0da44c19dda93b2ff9e70ca8792cdc1 upstream.
DP/HDMI audio on AMD PRO VII stops working after S3: [ 149.450391] amdgpu 0000:63:00.0: amdgpu: MODE1 reset [ 149.450395] amdgpu 0000:63:00.0: amdgpu: GPU mode1 reset [ 149.450494] amdgpu 0000:63:00.0: amdgpu: GPU psp mode1 reset [ 149.983693] snd_hda_intel 0000:63:00.1: refused to change power state from D0 to D3hot [ 150.003439] amdgpu 0000:63:00.0: refused to change power state from D0 to D3hot ... [ 155.432975] snd_hda_intel 0000:63:00.1: CORB reset timeout#2, CORBRP = 65535
The offending commit is daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)"). Commit 34452ac3038a7 ("drm/amdgpu: don't use BACO for reset in S3 ") doesn't help, so the issue is something different.
The working assumption is that the HDA function can only resume fully to D0 if it was first put into D3 successfully. That guess proves correct: moving amdgpu_asic_reset() to the noirq callback, so that it is called after the HDA function is in D3, makes the problem go away.
Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)") Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Cc: "Limonciello, Mario" Mario.Limonciello@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2246,18 +2246,23 @@ static int amdgpu_pmops_suspend(struct d { struct drm_device *drm_dev = dev_get_drvdata(dev); struct amdgpu_device *adev = drm_to_adev(drm_dev); - int r;
if (amdgpu_acpi_is_s0ix_active(adev)) adev->in_s0ix = true; else adev->in_s3 = true; - r = amdgpu_device_suspend(drm_dev, true); - if (r) - return r; + return amdgpu_device_suspend(drm_dev, true); +} + +static int amdgpu_pmops_suspend_noirq(struct device *dev) +{ + struct drm_device *drm_dev = dev_get_drvdata(dev); + struct amdgpu_device *adev = drm_to_adev(drm_dev); + if (!adev->in_s0ix) - r = amdgpu_asic_reset(adev); - return r; + return amdgpu_asic_reset(adev); + + return 0; }
static int amdgpu_pmops_resume(struct device *dev) @@ -2494,6 +2499,7 @@ static const struct dev_pm_ops amdgpu_pm .prepare = amdgpu_pmops_prepare, .complete = amdgpu_pmops_complete, .suspend = amdgpu_pmops_suspend, + .suspend_noirq = amdgpu_pmops_suspend_noirq, .resume = amdgpu_pmops_resume, .freeze = amdgpu_pmops_freeze, .thaw = amdgpu_pmops_thaw,
From: Baruch Siach baruch@tkos.co.il
[ Upstream commit e5f6e5d554ac274f9c8ba60078103d0425b93c19 ]
pwmchip_add() unconditionally assigns the base ID dynamically. Commit f9a8ee8c8bcd1 ("pwm: Always allocate PWM chip base ID dynamically") dropped all base assignment from drivers under drivers/pwm/. It missed this driver. Fix that.
Fixes: f9a8ee8c8bcd1 ("pwm: Always allocate PWM chip base ID dynamically") Signed-off-by: Baruch Siach baruch@tkos.co.il Reviewed-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Acked-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Bartosz Golaszewski brgl@bgdev.pl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-mvebu.c | 7 ------- 1 file changed, 7 deletions(-)
diff --git a/drivers/gpio/gpio-mvebu.c b/drivers/gpio/gpio-mvebu.c index 8f429d9f3661..ad8822da7c27 100644 --- a/drivers/gpio/gpio-mvebu.c +++ b/drivers/gpio/gpio-mvebu.c @@ -871,13 +871,6 @@ static int mvebu_pwm_probe(struct platform_device *pdev, mvpwm->chip.dev = dev; mvpwm->chip.ops = &mvebu_pwm_ops; mvpwm->chip.npwm = mvchip->chip.ngpio; - /* - * There may already be some PWM allocated, so we can't force - * mvpwm->chip.base to a fixed point like mvchip->chip.base. - * So, we let pwmchip_add() do the numbering and take the next free - * region. - */ - mvpwm->chip.base = -1;
spin_lock_init(&mvpwm->lock);
From: Sandipan Das sandipan.das@amd.com
[ Upstream commit 5a1bde46f98b893cda6122b00e94c0c40a6ead3c ]
On some x86 processors, CPUID leaf 0xA provides information on Architectural Performance Monitoring features. It advertises a PMU version which Qemu uses to determine the availability of additional MSRs to manage the PMCs.
Upon receiving a KVM_GET_SUPPORTED_CPUID ioctl request for the same, the kernel constructs return values based on the x86_pmu_capability irrespective of the vendor.
This leaf and the additional MSRs are not supported on AMD and Hygon processors. If AMD PerfMonV2 is detected, the PMU version is set to 2 and guest startup breaks because of an attempt to access a non-existent MSR. Return zeros to avoid this.
Fixes: a6c06ed1a60a ("KVM: Expose the architectural performance monitoring CPUID leaf") Reported-by: Vasant Hegde vasant.hegde@amd.com Signed-off-by: Sandipan Das sandipan.das@amd.com Message-Id: 3fef83d9c2b2f7516e8ff50d60851f29a4bcb716.1651058600.git.sandipan.das@amd.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/cpuid.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 5f1d4a5aa871..b17c9b00669e 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -725,6 +725,11 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) union cpuid10_eax eax; union cpuid10_edx edx;
+ if (!static_cpu_has(X86_FEATURE_ARCH_PERFMON)) { + entry->eax = entry->ebx = entry->ecx = entry->edx = 0; + break; + } + perf_get_x86_pmu_capability(&cap);
/*
From: Javier Martinez Canillas javierm@redhat.com
[ Upstream commit aafa025c76dcc7d1a8c8f0bdefcbe4eb480b2f6a ]
A reference to the framebuffer device struct fb_info is stored in the file private data, but this reference may no longer be valid and must not be accessed directly. Instead, the file_fb_info() accessor function must be used, since it does sanity checking to make sure that the fb_info is valid.
This can happen for example if the registered framebuffer device is for a driver that just uses a framebuffer provided by the system firmware. In that case, the fbdev core would unregister the framebuffer device when a real video driver is probed and ask to remove conflicting framebuffers.
The bug has been present for a long time, but commit 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal") unmasked it, since the fbdev core started unregistering the associated framebuffer devices.
Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal") Reported-by: Maxime Ripard maxime@cerno.tech Reported-by: Junxiao Chang junxiao.chang@intel.com Signed-off-by: Javier Martinez Canillas javierm@redhat.com Reviewed-by: Thomas Zimmermann tzimmermann@suse.de Link: https://patchwork.freedesktop.org/patch/msgid/20220502135014.377945-1-javier... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/video/fbdev/core/fbmem.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c index 0371ad233fdf..8e38a7a5cf2f 100644 --- a/drivers/video/fbdev/core/fbmem.c +++ b/drivers/video/fbdev/core/fbmem.c @@ -1436,7 +1436,10 @@ fb_release(struct inode *inode, struct file *file) __acquires(&info->lock) __releases(&info->lock) { - struct fb_info * const info = file->private_data; + struct fb_info * const info = file_fb_info(file); + + if (!info) + return -ENODEV;
lock_fb_info(info); if (info->fbops->fb_release)
From: Aya Levin ayal@nvidia.com
[ Upstream commit 7ba2d9d8de96696c1451fee1b01da11f45bdc2b9 ]
A resource dump menu may span more than a single page; support that. Otherwise, a menu read may result in a memory access violation: reading outside of the allocated page. Note that the first menu page contains the menu headers, while the subsequent menu pages contain only records.
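The page-by-page reading described above can be sketched roughly as follows. This is a simplified illustration, not the mlx5 code: the header size, record size, toy page layout (record count in the first header byte) and the names menu_state and parse_menu_page() are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define HEADER_SIZE 16 /* hypothetical size of the first-page menu headers */
#define RECORD_SIZE 8  /* hypothetical size of one menu record */

struct menu_state {
	unsigned int total_records; /* learned from the first page only */
	unsigned int seen;          /* records consumed so far */
};

/*
 * Parse one page of the menu. Only the first page (start_idx == 0)
 * starts with headers; later pages contain records only. Returns the
 * index of the next record still to be read, so the caller can pass it
 * back in for the following page.
 */
static unsigned int parse_menu_page(struct menu_state *st,
				    const unsigned char *page,
				    size_t read_size,
				    unsigned int start_idx)
{
	size_t off = 0;
	unsigned int i;

	assert(st && page);

	if (start_idx == 0) {
		st->total_records = page[0]; /* toy layout: count in byte 0 */
		off = HEADER_SIZE;
	}

	for (i = start_idx; i < st->total_records; i++) {
		/* Never read a record that does not fit in this page. */
		if (off + RECORD_SIZE > read_size)
			break;
		st->seen++;
		off += RECORD_SIZE;
	}

	return i;
}
```

A caller would invoke parse_menu_page() once per page, starting with start_idx 0 and feeding each return value into the next call until it equals total_records.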
The KASAN logs are as follows: BUG: KASAN: slab-out-of-bounds in strcmp+0x9b/0xb0 Read of size 1 at addr ffff88812b2e1fd0 by task systemd-udevd/496
CPU: 5 PID: 496 Comm: systemd-udevd Tainted: G B 5.16.0_for_upstream_debug_2022_01_10_23_12 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x57/0x7d print_address_description.constprop.0+0x1f/0x140 ? strcmp+0x9b/0xb0 ? strcmp+0x9b/0xb0 kasan_report.cold+0x83/0xdf ? strcmp+0x9b/0xb0 strcmp+0x9b/0xb0 mlx5_rsc_dump_init+0x4ab/0x780 [mlx5_core] ? mlx5_rsc_dump_destroy+0x80/0x80 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x286/0x400 ? raw_spin_unlock_irqrestore+0x47/0x50 ? aomic_notifier_chain_register+0x32/0x40 mlx5_load+0x104/0x2e0 [mlx5_core] mlx5_init_one+0x41b/0x610 [mlx5_core] .... The buggy address belongs to the object at ffff88812b2e0000 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 4048 bytes to the right of 4096-byte region [ffff88812b2e0000, ffff88812b2e1000) The buggy address belongs to the page: page:000000009d69807a refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88812b2e6000 pfn:0x12b2e0 head:000000009d69807a order:3 compound_mapcount:0 compound_pincount:0 flags: 0x8000000000010200(slab|head|zone=2) raw: 8000000000010200 0000000000000000 dead000000000001 ffff888100043040 raw: ffff88812b2e6000 0000000080040000 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff88812b2e1e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88812b2e1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88812b2e1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff88812b2e2000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88812b2e2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ==================================================================
Fixes: 12206b17235a ("net/mlx5: Add support for resource dump") Signed-off-by: Aya Levin ayal@nvidia.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../mellanox/mlx5/core/diag/rsc_dump.c | 31 +++++++++++++++---- 1 file changed, 25 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.c index ed4fb79b4db7..75b6060f7a9a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.c @@ -31,6 +31,7 @@ static const char *const mlx5_rsc_sgmt_name[] = { struct mlx5_rsc_dump { u32 pdn; struct mlx5_core_mkey mkey; + u32 number_of_menu_items; u16 fw_segment_type[MLX5_SGMT_TYPE_NUM]; };
@@ -50,21 +51,37 @@ static int mlx5_rsc_dump_sgmt_get_by_name(char *name) return -EINVAL; }
-static void mlx5_rsc_dump_read_menu_sgmt(struct mlx5_rsc_dump *rsc_dump, struct page *page) +#define MLX5_RSC_DUMP_MENU_HEADER_SIZE (MLX5_ST_SZ_BYTES(resource_dump_info_segment) + \ + MLX5_ST_SZ_BYTES(resource_dump_command_segment) + \ + MLX5_ST_SZ_BYTES(resource_dump_menu_segment)) + +static int mlx5_rsc_dump_read_menu_sgmt(struct mlx5_rsc_dump *rsc_dump, struct page *page, + int read_size, int start_idx) { void *data = page_address(page); enum mlx5_sgmt_type sgmt_idx; int num_of_items; char *sgmt_name; void *member; + int size = 0; void *menu; int i;
- menu = MLX5_ADDR_OF(menu_resource_dump_response, data, menu); - num_of_items = MLX5_GET(resource_dump_menu_segment, menu, num_of_records); + if (!start_idx) { + menu = MLX5_ADDR_OF(menu_resource_dump_response, data, menu); + rsc_dump->number_of_menu_items = MLX5_GET(resource_dump_menu_segment, menu, + num_of_records); + size = MLX5_RSC_DUMP_MENU_HEADER_SIZE; + data += size; + } + num_of_items = rsc_dump->number_of_menu_items; + + for (i = 0; start_idx + i < num_of_items; i++) { + size += MLX5_ST_SZ_BYTES(resource_dump_menu_record); + if (size >= read_size) + return start_idx + i;
- for (i = 0; i < num_of_items; i++) { - member = MLX5_ADDR_OF(resource_dump_menu_segment, menu, record[i]); + member = data + MLX5_ST_SZ_BYTES(resource_dump_menu_record) * i; sgmt_name = MLX5_ADDR_OF(resource_dump_menu_record, member, segment_name); sgmt_idx = mlx5_rsc_dump_sgmt_get_by_name(sgmt_name); if (sgmt_idx == -EINVAL) @@ -72,6 +89,7 @@ static void mlx5_rsc_dump_read_menu_sgmt(struct mlx5_rsc_dump *rsc_dump, struct rsc_dump->fw_segment_type[sgmt_idx] = MLX5_GET(resource_dump_menu_record, member, segment_type); } + return 0; }
static int mlx5_rsc_dump_trigger(struct mlx5_core_dev *dev, struct mlx5_rsc_dump_cmd *cmd, @@ -168,6 +186,7 @@ static int mlx5_rsc_dump_menu(struct mlx5_core_dev *dev) struct mlx5_rsc_dump_cmd *cmd = NULL; struct mlx5_rsc_key key = {}; struct page *page; + int start_idx = 0; int size; int err;
@@ -189,7 +208,7 @@ static int mlx5_rsc_dump_menu(struct mlx5_core_dev *dev) if (err < 0) goto destroy_cmd;
- mlx5_rsc_dump_read_menu_sgmt(dev->rsc_dump, page); + start_idx = mlx5_rsc_dump_read_menu_sgmt(dev->rsc_dump, page, size, start_idx);
} while (err > 0);
From: Vlad Buslov vladbu@nvidia.com
[ Upstream commit 27b0420fd959e38e3500e60b637d39dfab065645 ]
A recent commit that modified the fib route event handler to handle events according to their priority introduced a use-after-free[0] in mp->mfi pointer usage. The pointer is now not just cached in order to be compared to following fib_info instances, but is also dereferenced to obtain fib_priority. However, since the mlx5 lag code doesn't hold a reference to fib_info during the whole mp->mfi lifetime, it could be used after the fib_info instance has already been freed by kernel infrastructure code.
Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*' type and cache fib_info priority in dedicated integer. Group fib_info-related data into dedicated 'fib' structure that will be further extended by following patches in the series.
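The idea of keeping the pointer only as an opaque identity and caching the priority next to it can be sketched as below. This is a minimal stand-alone illustration under assumed names: fake_fib_info, fib_cache, fib_cache_set() and fib_cache_should_skip() are hypothetical stand-ins, not the kernel structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct fib_info. */
struct fake_fib_info {
	uint32_t fib_priority;
};

/* The pointer is kept as 'const void *' so it can only be compared. */
struct fib_cache {
	const void *mfi;
	uint32_t priority; /* copied while the object is known to be alive */
};

static void fib_cache_set(struct fib_cache *c, const struct fake_fib_info *fi)
{
	assert(c && fi);
	c->mfi = fi;
	c->priority = fi->fib_priority;
}

/*
 * Decide whether an event for 'fi' should be ignored. Only the incoming
 * 'fi' (valid for the duration of the event) is dereferenced; the cached
 * pointer is used for identity comparison alone.
 */
static int fib_cache_should_skip(const struct fib_cache *c,
				 const struct fake_fib_info *fi)
{
	return c->mfi && c->mfi != fi && fi->fib_priority >= c->priority;
}
```

Because the cached pointer has no complete type, any later attempt to dereference it fails at compile time, which is the property the 'const void *' change relies on.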
[0]:
[ 203.588029] ================================================================== [ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core] [ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138
[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6 [ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core] [ 203.600053] Call Trace: [ 203.600608] <TASK> [ 203.601110] dump_stack_lvl+0x48/0x5e [ 203.601860] print_address_description.constprop.0+0x1f/0x160 [ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core] [ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core] [ 203.605177] kasan_report.cold+0x83/0xdf [ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core] [ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core] [ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core] [ 203.609382] ? read_word_at_a_time+0xe/0x20 [ 203.610463] ? strscpy+0xa0/0x2a0 [ 203.611463] process_one_work+0x722/0x1270 [ 203.612344] worker_thread+0x540/0x11e0 [ 203.613136] ? rescuer_thread+0xd50/0xd50 [ 203.613949] kthread+0x26e/0x300 [ 203.614627] ? kthread_complete_and_exit+0x20/0x20 [ 203.615542] ret_from_fork+0x1f/0x30 [ 203.616273] </TASK>
[ 203.617174] Allocated by task 3746: [ 203.617874] kasan_save_stack+0x1e/0x40 [ 203.618644] __kasan_kmalloc+0x81/0xa0 [ 203.619394] fib_create_info+0xb41/0x3c50 [ 203.620213] fib_table_insert+0x190/0x1ff0 [ 203.621020] fib_magic.isra.0+0x246/0x2e0 [ 203.621803] fib_add_ifaddr+0x19f/0x670 [ 203.622563] fib_inetaddr_event+0x13f/0x270 [ 203.623377] blocking_notifier_call_chain+0xd4/0x130 [ 203.624355] __inet_insert_ifa+0x641/0xb20 [ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0 [ 203.626009] rtnetlink_rcv_msg+0x309/0x880 [ 203.626826] netlink_rcv_skb+0x11d/0x340 [ 203.627626] netlink_unicast+0x4cc/0x790 [ 203.628430] netlink_sendmsg+0x762/0xc00 [ 203.629230] sock_sendmsg+0xb2/0xe0 [ 203.629955] ____sys_sendmsg+0x58a/0x770 [ 203.630756] ___sys_sendmsg+0xd8/0x160 [ 203.631523] __sys_sendmsg+0xb7/0x140 [ 203.632294] do_syscall_64+0x35/0x80 [ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 203.634427] Freed by task 0: [ 203.635063] kasan_save_stack+0x1e/0x40 [ 203.635844] kasan_set_track+0x21/0x30 [ 203.636618] kasan_set_free_info+0x20/0x30 [ 203.637450] __kasan_slab_free+0xfc/0x140 [ 203.638271] kfree+0x94/0x3b0 [ 203.638903] rcu_core+0x5e4/0x1990 [ 203.639640] __do_softirq+0x1ba/0x5d3
[ 203.640828] Last potentially related work creation: [ 203.641785] kasan_save_stack+0x1e/0x40 [ 203.642571] __kasan_record_aux_stack+0x9f/0xb0 [ 203.643478] call_rcu+0x88/0x9c0 [ 203.644178] fib_release_info+0x539/0x750 [ 203.644997] fib_table_delete+0x659/0xb80 [ 203.645809] fib_magic.isra.0+0x1a3/0x2e0 [ 203.646617] fib_del_ifaddr+0x93f/0x1300 [ 203.647415] fib_inetaddr_event+0x9f/0x270 [ 203.648251] blocking_notifier_call_chain+0xd4/0x130 [ 203.649225] __inet_del_ifa+0x474/0xc10 [ 203.650016] devinet_ioctl+0x781/0x17f0 [ 203.650788] inet_ioctl+0x1ad/0x290 [ 203.651533] sock_do_ioctl+0xce/0x1c0 [ 203.652315] sock_ioctl+0x27b/0x4f0 [ 203.653058] __x64_sys_ioctl+0x124/0x190 [ 203.653850] do_syscall_64+0x35/0x80 [ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 203.666952] The buggy address belongs to the object at ffff888144df2000 which belongs to the cache kmalloc-256 of size 256 [ 203.669250] The buggy address is located 80 bytes inside of 256-byte region [ffff888144df2000, ffff888144df2100) [ 203.671332] The buggy address belongs to the page: [ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0 [ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0 [ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) [ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40 [ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 [ 203.679928] page dumped because: kasan: bad access detected
[ 203.681455] Memory state around the buggy address: [ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 203.686701] ^ [ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 203.690620] ==================================================================
Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry") Signed-off-by: Vlad Buslov vladbu@nvidia.com Reviewed-by: Maor Dickman maord@nvidia.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/lag_mp.c | 26 ++++++++++++------- .../net/ethernet/mellanox/mlx5/core/lag_mp.h | 5 +++- 2 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c index cb0a48d374a3..8d278c45e7cc 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c @@ -100,6 +100,12 @@ static void mlx5_lag_fib_event_flush(struct notifier_block *nb) flush_workqueue(mp->wq); }
+static void mlx5_lag_fib_set(struct lag_mp *mp, struct fib_info *fi) +{ + mp->fib.mfi = fi; + mp->fib.priority = fi->fib_priority; +} + struct mlx5_fib_event_work { struct work_struct work; struct mlx5_lag *ldev; @@ -121,13 +127,13 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev, /* Handle delete event */ if (event == FIB_EVENT_ENTRY_DEL) { /* stop track */ - if (mp->mfi == fi) - mp->mfi = NULL; + if (mp->fib.mfi == fi) + mp->fib.mfi = NULL; return; }
/* Handle multipath entry with lower priority value */ - if (mp->mfi && mp->mfi != fi && fi->fib_priority >= mp->mfi->fib_priority) + if (mp->fib.mfi && mp->fib.mfi != fi && fi->fib_priority >= mp->fib.priority) return;
/* Handle add/replace event */ @@ -145,7 +151,7 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev, mlx5_lag_set_port_affinity(ldev, i); }
- mp->mfi = fi; + mlx5_lag_fib_set(mp, fi); return; }
@@ -165,7 +171,7 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev, }
/* First time we see multipath route */ - if (!mp->mfi && !__mlx5_lag_is_active(ldev)) { + if (!mp->fib.mfi && !__mlx5_lag_is_active(ldev)) { struct lag_tracker tracker;
tracker = ldev->tracker; @@ -173,7 +179,7 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev, }
mlx5_lag_set_port_affinity(ldev, MLX5_LAG_NORMAL_AFFINITY); - mp->mfi = fi; + mlx5_lag_fib_set(mp, fi); }
static void mlx5_lag_fib_nexthop_event(struct mlx5_lag *ldev, @@ -184,7 +190,7 @@ static void mlx5_lag_fib_nexthop_event(struct mlx5_lag *ldev, struct lag_mp *mp = &ldev->lag_mp;
/* Check the nh event is related to the route */ - if (!mp->mfi || mp->mfi != fi) + if (!mp->fib.mfi || mp->fib.mfi != fi) return;
/* nh added/removed */ @@ -313,7 +319,7 @@ void mlx5_lag_mp_reset(struct mlx5_lag *ldev) /* Clear mfi, as it might become stale when a route delete event * has been missed, see mlx5_lag_fib_route_event(). */ - ldev->lag_mp.mfi = NULL; + ldev->lag_mp.fib.mfi = NULL; }
int mlx5_lag_mp_init(struct mlx5_lag *ldev) @@ -324,7 +330,7 @@ int mlx5_lag_mp_init(struct mlx5_lag *ldev) /* always clear mfi, as it might become stale when a route delete event * has been missed */ - mp->mfi = NULL; + mp->fib.mfi = NULL;
if (mp->fib_nb.notifier_call) return 0; @@ -354,5 +360,5 @@ void mlx5_lag_mp_cleanup(struct mlx5_lag *ldev) unregister_fib_notifier(&init_net, &mp->fib_nb); destroy_workqueue(mp->wq); mp->fib_nb.notifier_call = NULL; - mp->mfi = NULL; + mp->fib.mfi = NULL; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.h b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.h index dea199e79bed..e8380eb0dd6a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.h @@ -15,7 +15,10 @@ enum mlx5_lag_port_affinity {
struct lag_mp { struct notifier_block fib_nb; - struct fib_info *mfi; /* used in tracking fib events */ + struct { + const void *mfi; /* used in tracking fib events */ + u32 priority; + } fib; struct workqueue_struct *wq; };
From: Vlad Buslov vladbu@nvidia.com
[ Upstream commit a6589155ec9847918e00e7279b8aa6d4c272bea7 ]
Referenced change incorrectly sets single path fib_info even when LAG is not active. Fix it by moving call to mlx5_lag_fib_set() into conditional that verifies LAG state.
Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry") Signed-off-by: Vlad Buslov vladbu@nvidia.com Reviewed-by: Maor Dickman maord@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c index 8d278c45e7cc..9d50b9c2db5e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c @@ -149,9 +149,9 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev,
i++; mlx5_lag_set_port_affinity(ldev, i); + mlx5_lag_fib_set(mp, fi); }
- mlx5_lag_fib_set(mp, fi); return; }
From: Vlad Buslov vladbu@nvidia.com
[ Upstream commit 4a2a664ed87962c4ddb806a84b5c9634820bcf55 ]
The referenced change added a check to skip updating fib when the new fib instance has the same or lower priority. However, a new fib instance can be an update on the same dst address as the existing one, even though the structure is another instance that has a different address. Ignoring events on such instances causes the multipath LAG state to not be correctly updated.
Track the 'dst' and 'dst_len' fields of the fib event fib_entry_notifier_info structure and don't skip events that have the same value of those fields.
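The resulting skip condition can be sketched in isolation like this. The struct and function names (tracked_route, route_event, should_skip_event) are hypothetical and only mirror the fields named above; this is an illustration, not the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the currently tracked route. */
struct tracked_route {
	const void *mfi; /* identity only, never dereferenced */
	uint32_t priority;
	uint32_t dst;
	int dst_len;
};

/* Hypothetical view of an incoming fib route event. */
struct route_event {
	const void *fi;
	uint32_t priority;
	uint32_t dst;
	int dst_len;
};

/*
 * An event is skipped only if it is for a different fib instance, for a
 * different prefix (dst/dst_len), and does not have a better (lower)
 * priority value than the tracked route.
 */
static bool should_skip_event(const struct tracked_route *t,
			      const struct route_event *ev)
{
	bool same_prefix;

	assert(t && ev);
	same_prefix = t->dst == ev->dst && t->dst_len == ev->dst_len;

	return t->mfi && t->mfi != ev->fi && !same_prefix &&
	       ev->priority >= t->priority;
}
```

With this shape, an update to the already tracked prefix is always processed, even when it arrives as a new fib instance with an equal or higher priority value.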
Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry") Signed-off-by: Vlad Buslov vladbu@nvidia.com Reviewed-by: Maor Dickman maord@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/lag_mp.c | 20 +++++++++++-------- .../net/ethernet/mellanox/mlx5/core/lag_mp.h | 2 ++ 2 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c index 9d50b9c2db5e..81786a9a424c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c @@ -100,10 +100,12 @@ static void mlx5_lag_fib_event_flush(struct notifier_block *nb) flush_workqueue(mp->wq); }
-static void mlx5_lag_fib_set(struct lag_mp *mp, struct fib_info *fi) +static void mlx5_lag_fib_set(struct lag_mp *mp, struct fib_info *fi, u32 dst, int dst_len) { mp->fib.mfi = fi; mp->fib.priority = fi->fib_priority; + mp->fib.dst = dst; + mp->fib.dst_len = dst_len; }
struct mlx5_fib_event_work { @@ -116,10 +118,10 @@ struct mlx5_fib_event_work { }; };
-static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev, - unsigned long event, - struct fib_info *fi) +static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev, unsigned long event, + struct fib_entry_notifier_info *fen_info) { + struct fib_info *fi = fen_info->fi; struct lag_mp *mp = &ldev->lag_mp; struct fib_nh *fib_nh0, *fib_nh1; unsigned int nhs; @@ -133,7 +135,9 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev, }
/* Handle multipath entry with lower priority value */ - if (mp->fib.mfi && mp->fib.mfi != fi && fi->fib_priority >= mp->fib.priority) + if (mp->fib.mfi && mp->fib.mfi != fi && + (mp->fib.dst != fen_info->dst || mp->fib.dst_len != fen_info->dst_len) && + fi->fib_priority >= mp->fib.priority) return;
/* Handle add/replace event */ @@ -149,7 +153,7 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev,
i++; mlx5_lag_set_port_affinity(ldev, i); - mlx5_lag_fib_set(mp, fi); + mlx5_lag_fib_set(mp, fi, fen_info->dst, fen_info->dst_len); }
return; @@ -179,7 +183,7 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev, }
mlx5_lag_set_port_affinity(ldev, MLX5_LAG_NORMAL_AFFINITY); - mlx5_lag_fib_set(mp, fi); + mlx5_lag_fib_set(mp, fi, fen_info->dst, fen_info->dst_len); }
static void mlx5_lag_fib_nexthop_event(struct mlx5_lag *ldev, @@ -220,7 +224,7 @@ static void mlx5_lag_fib_update(struct work_struct *work) case FIB_EVENT_ENTRY_REPLACE: case FIB_EVENT_ENTRY_DEL: mlx5_lag_fib_route_event(ldev, fib_work->event, - fib_work->fen_info.fi); + &fib_work->fen_info); fib_info_put(fib_work->fen_info.fi); break; case FIB_EVENT_NH_ADD: diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.h b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.h index e8380eb0dd6a..b3a7f18b9e30 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.h @@ -18,6 +18,8 @@ struct lag_mp { struct { const void *mfi; /* used in tracking fib events */ u32 priority; + u32 dst; + int dst_len; } fib; struct workqueue_struct *wq; };
From: Hector Martin marcan@marcan.st
[ Upstream commit 2ac2fab52917ae82cbca97cf6e5d2993530257ed ]
This is required to make loading this as a module work.
Signed-off-by: Hector Martin marcan@marcan.st Fixes: 46d1fb072e76 ("iommu/dart: Add DART iommu driver") Reviewed-by: Sven Peter sven@svenpeter.dev Link: https://lore.kernel.org/r/20220502092238.30486-1-marcan@marcan.st Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/apple-dart.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c index 9c9bbccc00bd..baba4571c815 100644 --- a/drivers/iommu/apple-dart.c +++ b/drivers/iommu/apple-dart.c @@ -757,6 +757,7 @@ static const struct iommu_ops apple_dart_iommu_ops = { .of_xlate = apple_dart_of_xlate, .def_domain_type = apple_dart_def_domain_type, .pgsize_bitmap = -1UL, /* Restricted during dart probe */ + .owner = THIS_MODULE, };
static irqreturn_t apple_dart_irq(int irq, void *dev)
From: Paolo Bonzini pbonzini@redhat.com
[ Upstream commit f18b4aebe107d092e384b1ae680b1e1de7a0196d ]
Red Hat's QE team reported a test failure in access_tracking_perf_test:
Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages guest physical test memory offset: 0x3fffbffff000
Populating memory : 0.684014577s Writing to populated memory : 0.006230175s Reading from populated memory : 0.004557805s ==== Test Assertion Failure ==== lib/kvm_util.c:1411: false pid=125806 tid=125809 errno=4 - Interrupted system call 1 0x0000000000402f7c: addr_gpa2hva at kvm_util.c:1411 2 (inlined by) addr_gpa2hva at kvm_util.c:1405 3 0x0000000000401f52: lookup_pfn at access_tracking_perf_test.c:98 4 (inlined by) mark_vcpu_memory_idle at access_tracking_perf_test.c:152 5 (inlined by) vcpu_thread_main at access_tracking_perf_test.c:232 6 0x00007fefe9ff81ce: ?? ??:0 7 0x00007fefe9c64d82: ?? ??:0 No vm physical memory at 0xffbffff000
I can easily reproduce it with an Intel(R) Xeon(R) CPU E5-2630 with 46 bits PA.
It turns out that the address translation for clearing idle page tracking returned a wrong result; addr_gva2gpa()'s last step, which is based on "pte[index[0]].pfn", did the calculation with 40 bits length and the high 12 bits got truncated. In the above case the GPA address to be returned should be 0x3fffbffff000 for GVA 0xc0000000, but it got truncated into 0xffbffff000 and the subsequent gpa2hva lookup failed.
The width of operations on bit fields greater than 32-bit is implementation defined, and differs between GCC (which uses the bitfield precision) and clang (which uses 64-bit arithmetic), so this is a potential minefield. Remove the bit fields and use manual masking instead.
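The truncation can be reproduced with plain integer arithmetic, as in the sketch below. It deliberately avoids real bit fields (whose behaviour is the implementation-defined part) and instead emulates 40-bit arithmetic with an explicit mask. PHYSICAL_PAGE_MASK is written out as a literal equivalent to GENMASK_ULL(51, 12); the helper names gpa_from_mask(), gpa_truncated_40bit() and demo_truncation() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT		12
/* Bits 51:12, the same range as GENMASK_ULL(51, 12). */
#define PHYSICAL_PAGE_MASK	0x000ffffffffff000ULL
#define PTE_GET_PFN(pte)	(((pte) & PHYSICAL_PAGE_MASK) >> PAGE_SHIFT)

/* Manual masking: everything is done in full 64-bit arithmetic. */
static uint64_t gpa_from_mask(uint64_t pte)
{
	return PTE_GET_PFN(pte) << PAGE_SHIFT;
}

/*
 * Emulation of shifting a 40-bit wide pfn field in 40-bit precision:
 * bits shifted past bit 39 are lost.
 */
static uint64_t gpa_truncated_40bit(uint64_t pte)
{
	uint64_t pfn = PTE_GET_PFN(pte);

	return (pfn << PAGE_SHIFT) & ((1ULL << 40) - 1);
}

/* Values taken from the failure described above. */
static void demo_truncation(void)
{
	uint64_t pte = 0x3fffbffff000ULL | 0x3; /* present | writable */

	assert(gpa_from_mask(pte) == 0x3fffbffff000ULL);
	assert(gpa_truncated_40bit(pte) == 0xffbffff000ULL);
}
```

The second helper yields exactly the bad address from the test log, 0xffbffff000, which is why the fix keeps page table entries as plain uint64_t values and extracts the pfn with a mask.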
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2075036 Reported-by: Nana Liu nanliu@redhat.com Reviewed-by: Peter Xu peterx@redhat.com Tested-by: Peter Xu peterx@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../selftests/kvm/include/x86_64/processor.h | 15 ++ .../selftests/kvm/lib/x86_64/processor.c | 192 +++++++----------- 2 files changed, 92 insertions(+), 115 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h index 05e65ca1c30c..23861c8faa61 100644 --- a/tools/testing/selftests/kvm/include/x86_64/processor.h +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h @@ -58,6 +58,21 @@ /* CPUID.0x8000_0001.EDX */ #define CPUID_GBPAGES (1ul << 26)
+/* Page table bitfield declarations */ +#define PTE_PRESENT_MASK BIT_ULL(0) +#define PTE_WRITABLE_MASK BIT_ULL(1) +#define PTE_USER_MASK BIT_ULL(2) +#define PTE_ACCESSED_MASK BIT_ULL(5) +#define PTE_DIRTY_MASK BIT_ULL(6) +#define PTE_LARGE_MASK BIT_ULL(7) +#define PTE_GLOBAL_MASK BIT_ULL(8) +#define PTE_NX_MASK BIT_ULL(63) + +#define PAGE_SHIFT 12 + +#define PHYSICAL_PAGE_MASK GENMASK_ULL(51, 12) +#define PTE_GET_PFN(pte) (((pte) & PHYSICAL_PAGE_MASK) >> PAGE_SHIFT) + /* General Registers in 64-Bit Mode */ struct gpr64_regs { u64 rax; diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c index da73b97e1e6d..46057079d8bb 100644 --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c @@ -19,38 +19,6 @@
vm_vaddr_t exception_handlers;
-/* Virtual translation table structure declarations */ -struct pageUpperEntry { - uint64_t present:1; - uint64_t writable:1; - uint64_t user:1; - uint64_t write_through:1; - uint64_t cache_disable:1; - uint64_t accessed:1; - uint64_t ignored_06:1; - uint64_t page_size:1; - uint64_t ignored_11_08:4; - uint64_t pfn:40; - uint64_t ignored_62_52:11; - uint64_t execute_disable:1; -}; - -struct pageTableEntry { - uint64_t present:1; - uint64_t writable:1; - uint64_t user:1; - uint64_t write_through:1; - uint64_t cache_disable:1; - uint64_t accessed:1; - uint64_t dirty:1; - uint64_t reserved_07:1; - uint64_t global:1; - uint64_t ignored_11_09:3; - uint64_t pfn:40; - uint64_t ignored_62_52:11; - uint64_t execute_disable:1; -}; - void regs_dump(FILE *stream, struct kvm_regs *regs, uint8_t indent) { @@ -195,23 +163,21 @@ static void *virt_get_pte(struct kvm_vm *vm, uint64_t pt_pfn, uint64_t vaddr, return &page_table[index]; }
-static struct pageUpperEntry *virt_create_upper_pte(struct kvm_vm *vm, - uint64_t pt_pfn, - uint64_t vaddr, - uint64_t paddr, - int level, - enum x86_page_size page_size) +static uint64_t *virt_create_upper_pte(struct kvm_vm *vm, + uint64_t pt_pfn, + uint64_t vaddr, + uint64_t paddr, + int level, + enum x86_page_size page_size) { - struct pageUpperEntry *pte = virt_get_pte(vm, pt_pfn, vaddr, level); - - if (!pte->present) { - pte->writable = true; - pte->present = true; - pte->page_size = (level == page_size); - if (pte->page_size) - pte->pfn = paddr >> vm->page_shift; + uint64_t *pte = virt_get_pte(vm, pt_pfn, vaddr, level); + + if (!(*pte & PTE_PRESENT_MASK)) { + *pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK; + if (level == page_size) + *pte |= PTE_LARGE_MASK | (paddr & PHYSICAL_PAGE_MASK); else - pte->pfn = vm_alloc_page_table(vm) >> vm->page_shift; + *pte |= vm_alloc_page_table(vm) & PHYSICAL_PAGE_MASK; } else { /* * Entry already present. Assert that the caller doesn't want @@ -221,7 +187,7 @@ static struct pageUpperEntry *virt_create_upper_pte(struct kvm_vm *vm, TEST_ASSERT(level != page_size, "Cannot create hugepage at level: %u, vaddr: 0x%lx\n", page_size, vaddr); - TEST_ASSERT(!pte->page_size, + TEST_ASSERT(!(*pte & PTE_LARGE_MASK), "Cannot create page table at level: %u, vaddr: 0x%lx\n", level, vaddr); } @@ -232,8 +198,8 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, enum x86_page_size page_size) { const uint64_t pg_size = 1ull << ((page_size * 9) + 12); - struct pageUpperEntry *pml4e, *pdpe, *pde; - struct pageTableEntry *pte; + uint64_t *pml4e, *pdpe, *pde; + uint64_t *pte;
TEST_ASSERT(vm->mode == VM_MODE_PXXV48_4K, "Unknown or unsupported guest mode, mode: 0x%x", vm->mode); @@ -257,24 +223,22 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, */ pml4e = virt_create_upper_pte(vm, vm->pgd >> vm->page_shift, vaddr, paddr, 3, page_size); - if (pml4e->page_size) + if (*pml4e & PTE_LARGE_MASK) return;
- pdpe = virt_create_upper_pte(vm, pml4e->pfn, vaddr, paddr, 2, page_size); - if (pdpe->page_size) + pdpe = virt_create_upper_pte(vm, PTE_GET_PFN(*pml4e), vaddr, paddr, 2, page_size); + if (*pdpe & PTE_LARGE_MASK) return;
- pde = virt_create_upper_pte(vm, pdpe->pfn, vaddr, paddr, 1, page_size); - if (pde->page_size) + pde = virt_create_upper_pte(vm, PTE_GET_PFN(*pdpe), vaddr, paddr, 1, page_size); + if (*pde & PTE_LARGE_MASK) return;
/* Fill in page table entry. */ - pte = virt_get_pte(vm, pde->pfn, vaddr, 0); - TEST_ASSERT(!pte->present, + pte = virt_get_pte(vm, PTE_GET_PFN(*pde), vaddr, 0); + TEST_ASSERT(!(*pte & PTE_PRESENT_MASK), "PTE already present for 4k page at vaddr: 0x%lx\n", vaddr); - pte->pfn = paddr >> vm->page_shift; - pte->writable = true; - pte->present = 1; + *pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK | (paddr & PHYSICAL_PAGE_MASK); }
void virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr) @@ -282,12 +246,12 @@ void virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr) __virt_pg_map(vm, vaddr, paddr, X86_PAGE_SIZE_4K); }
-static struct pageTableEntry *_vm_get_page_table_entry(struct kvm_vm *vm, int vcpuid, +static uint64_t *_vm_get_page_table_entry(struct kvm_vm *vm, int vcpuid, uint64_t vaddr) { uint16_t index[4]; - struct pageUpperEntry *pml4e, *pdpe, *pde; - struct pageTableEntry *pte; + uint64_t *pml4e, *pdpe, *pde; + uint64_t *pte; struct kvm_cpuid_entry2 *entry; struct kvm_sregs sregs; int max_phy_addr; @@ -329,30 +293,29 @@ static struct pageTableEntry *_vm_get_page_table_entry(struct kvm_vm *vm, int vc index[3] = (vaddr >> 39) & 0x1ffu;
pml4e = addr_gpa2hva(vm, vm->pgd); - TEST_ASSERT(pml4e[index[3]].present, + TEST_ASSERT(pml4e[index[3]] & PTE_PRESENT_MASK, "Expected pml4e to be present for gva: 0x%08lx", vaddr); - TEST_ASSERT((*(uint64_t*)(&pml4e[index[3]]) & - (rsvd_mask | (1ull << 7))) == 0, + TEST_ASSERT((pml4e[index[3]] & (rsvd_mask | PTE_LARGE_MASK)) == 0, "Unexpected reserved bits set.");
- pdpe = addr_gpa2hva(vm, pml4e[index[3]].pfn * vm->page_size); - TEST_ASSERT(pdpe[index[2]].present, + pdpe = addr_gpa2hva(vm, PTE_GET_PFN(pml4e[index[3]]) * vm->page_size); + TEST_ASSERT(pdpe[index[2]] & PTE_PRESENT_MASK, "Expected pdpe to be present for gva: 0x%08lx", vaddr); - TEST_ASSERT(pdpe[index[2]].page_size == 0, + TEST_ASSERT(!(pdpe[index[2]] & PTE_LARGE_MASK), "Expected pdpe to map a pde not a 1-GByte page."); - TEST_ASSERT((*(uint64_t*)(&pdpe[index[2]]) & rsvd_mask) == 0, + TEST_ASSERT((pdpe[index[2]] & rsvd_mask) == 0, "Unexpected reserved bits set.");
- pde = addr_gpa2hva(vm, pdpe[index[2]].pfn * vm->page_size); - TEST_ASSERT(pde[index[1]].present, + pde = addr_gpa2hva(vm, PTE_GET_PFN(pdpe[index[2]]) * vm->page_size); + TEST_ASSERT(pde[index[1]] & PTE_PRESENT_MASK, "Expected pde to be present for gva: 0x%08lx", vaddr); - TEST_ASSERT(pde[index[1]].page_size == 0, + TEST_ASSERT(!(pde[index[1]] & PTE_LARGE_MASK), "Expected pde to map a pte not a 2-MByte page."); - TEST_ASSERT((*(uint64_t*)(&pde[index[1]]) & rsvd_mask) == 0, + TEST_ASSERT((pde[index[1]] & rsvd_mask) == 0, "Unexpected reserved bits set.");
- pte = addr_gpa2hva(vm, pde[index[1]].pfn * vm->page_size); - TEST_ASSERT(pte[index[0]].present, + pte = addr_gpa2hva(vm, PTE_GET_PFN(pde[index[1]]) * vm->page_size); + TEST_ASSERT(pte[index[0]] & PTE_PRESENT_MASK, "Expected pte to be present for gva: 0x%08lx", vaddr);
return &pte[index[0]]; @@ -360,7 +323,7 @@ static struct pageTableEntry *_vm_get_page_table_entry(struct kvm_vm *vm, int vc
uint64_t vm_get_page_table_entry(struct kvm_vm *vm, int vcpuid, uint64_t vaddr) { - struct pageTableEntry *pte = _vm_get_page_table_entry(vm, vcpuid, vaddr); + uint64_t *pte = _vm_get_page_table_entry(vm, vcpuid, vaddr);
return *(uint64_t *)pte; } @@ -368,18 +331,17 @@ uint64_t vm_get_page_table_entry(struct kvm_vm *vm, int vcpuid, uint64_t vaddr) void vm_set_page_table_entry(struct kvm_vm *vm, int vcpuid, uint64_t vaddr, uint64_t pte) { - struct pageTableEntry *new_pte = _vm_get_page_table_entry(vm, vcpuid, - vaddr); + uint64_t *new_pte = _vm_get_page_table_entry(vm, vcpuid, vaddr);
*(uint64_t *)new_pte = pte; }
void virt_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent) { - struct pageUpperEntry *pml4e, *pml4e_start; - struct pageUpperEntry *pdpe, *pdpe_start; - struct pageUpperEntry *pde, *pde_start; - struct pageTableEntry *pte, *pte_start; + uint64_t *pml4e, *pml4e_start; + uint64_t *pdpe, *pdpe_start; + uint64_t *pde, *pde_start; + uint64_t *pte, *pte_start;
if (!vm->pgd_created) return; @@ -389,58 +351,58 @@ void virt_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent) fprintf(stream, "%*s index hvaddr gpaddr " "addr w exec dirty\n", indent, ""); - pml4e_start = (struct pageUpperEntry *) addr_gpa2hva(vm, vm->pgd); + pml4e_start = (uint64_t *) addr_gpa2hva(vm, vm->pgd); for (uint16_t n1 = 0; n1 <= 0x1ffu; n1++) { pml4e = &pml4e_start[n1]; - if (!pml4e->present) + if (!(*pml4e & PTE_PRESENT_MASK)) continue; - fprintf(stream, "%*spml4e 0x%-3zx %p 0x%-12lx 0x%-10lx %u " + fprintf(stream, "%*spml4e 0x%-3zx %p 0x%-12lx 0x%-10llx %u " " %u\n", indent, "", pml4e - pml4e_start, pml4e, - addr_hva2gpa(vm, pml4e), (uint64_t) pml4e->pfn, - pml4e->writable, pml4e->execute_disable); + addr_hva2gpa(vm, pml4e), PTE_GET_PFN(*pml4e), + !!(*pml4e & PTE_WRITABLE_MASK), !!(*pml4e & PTE_NX_MASK));
- pdpe_start = addr_gpa2hva(vm, pml4e->pfn * vm->page_size); + pdpe_start = addr_gpa2hva(vm, *pml4e & PHYSICAL_PAGE_MASK); for (uint16_t n2 = 0; n2 <= 0x1ffu; n2++) { pdpe = &pdpe_start[n2]; - if (!pdpe->present) + if (!(*pdpe & PTE_PRESENT_MASK)) continue; - fprintf(stream, "%*spdpe 0x%-3zx %p 0x%-12lx 0x%-10lx " + fprintf(stream, "%*spdpe 0x%-3zx %p 0x%-12lx 0x%-10llx " "%u %u\n", indent, "", pdpe - pdpe_start, pdpe, addr_hva2gpa(vm, pdpe), - (uint64_t) pdpe->pfn, pdpe->writable, - pdpe->execute_disable); + PTE_GET_PFN(*pdpe), !!(*pdpe & PTE_WRITABLE_MASK), + !!(*pdpe & PTE_NX_MASK));
- pde_start = addr_gpa2hva(vm, pdpe->pfn * vm->page_size); + pde_start = addr_gpa2hva(vm, *pdpe & PHYSICAL_PAGE_MASK); for (uint16_t n3 = 0; n3 <= 0x1ffu; n3++) { pde = &pde_start[n3]; - if (!pde->present) + if (!(*pde & PTE_PRESENT_MASK)) continue; fprintf(stream, "%*spde 0x%-3zx %p " - "0x%-12lx 0x%-10lx %u %u\n", + "0x%-12lx 0x%-10llx %u %u\n", indent, "", pde - pde_start, pde, addr_hva2gpa(vm, pde), - (uint64_t) pde->pfn, pde->writable, - pde->execute_disable); + PTE_GET_PFN(*pde), !!(*pde & PTE_WRITABLE_MASK), + !!(*pde & PTE_NX_MASK));
- pte_start = addr_gpa2hva(vm, pde->pfn * vm->page_size); + pte_start = addr_gpa2hva(vm, *pde & PHYSICAL_PAGE_MASK); for (uint16_t n4 = 0; n4 <= 0x1ffu; n4++) { pte = &pte_start[n4]; - if (!pte->present) + if (!(*pte & PTE_PRESENT_MASK)) continue; fprintf(stream, "%*spte 0x%-3zx %p " - "0x%-12lx 0x%-10lx %u %u " + "0x%-12lx 0x%-10llx %u %u " " %u 0x%-10lx\n", indent, "", pte - pte_start, pte, addr_hva2gpa(vm, pte), - (uint64_t) pte->pfn, - pte->writable, - pte->execute_disable, - pte->dirty, + PTE_GET_PFN(*pte), + !!(*pte & PTE_WRITABLE_MASK), + !!(*pte & PTE_NX_MASK), + !!(*pte & PTE_DIRTY_MASK), ((uint64_t) n1 << 27) | ((uint64_t) n2 << 18) | ((uint64_t) n3 << 9) @@ -558,8 +520,8 @@ static void kvm_seg_set_kernel_data_64bit(struct kvm_vm *vm, uint16_t selector, vm_paddr_t addr_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva) { uint16_t index[4]; - struct pageUpperEntry *pml4e, *pdpe, *pde; - struct pageTableEntry *pte; + uint64_t *pml4e, *pdpe, *pde; + uint64_t *pte;
TEST_ASSERT(vm->mode == VM_MODE_PXXV48_4K, "Attempt to use " "unknown or unsupported guest mode, mode: 0x%x", vm->mode); @@ -572,22 +534,22 @@ vm_paddr_t addr_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva) if (!vm->pgd_created) goto unmapped_gva; pml4e = addr_gpa2hva(vm, vm->pgd); - if (!pml4e[index[3]].present) + if (!(pml4e[index[3]] & PTE_PRESENT_MASK)) goto unmapped_gva;
- pdpe = addr_gpa2hva(vm, pml4e[index[3]].pfn * vm->page_size); - if (!pdpe[index[2]].present) + pdpe = addr_gpa2hva(vm, PTE_GET_PFN(pml4e[index[3]]) * vm->page_size); + if (!(pdpe[index[2]] & PTE_PRESENT_MASK)) goto unmapped_gva;
- pde = addr_gpa2hva(vm, pdpe[index[2]].pfn * vm->page_size); - if (!pde[index[1]].present) + pde = addr_gpa2hva(vm, PTE_GET_PFN(pdpe[index[2]]) * vm->page_size); + if (!(pde[index[1]] & PTE_PRESENT_MASK)) goto unmapped_gva;
- pte = addr_gpa2hva(vm, pde[index[1]].pfn * vm->page_size); - if (!pte[index[0]].present) + pte = addr_gpa2hva(vm, PTE_GET_PFN(pde[index[1]]) * vm->page_size); + if (!(pte[index[0]] & PTE_PRESENT_MASK)) goto unmapped_gva;
- return (pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu); + return (PTE_GET_PFN(pte[index[0]]) * vm->page_size) + (gva & 0xfffu);
unmapped_gva: TEST_FAIL("No mapping for vm virtual address, gva: 0x%lx", gva);
From: Thomas Huth thuth@redhat.com
[ Upstream commit 266a19a0bc4fbfab4d981a47640ca98972a01865 ]
When compiling kvm_page_table_test.c, I get this compiler warning with gcc 11.2:
kvm_page_table_test.c: In function 'pre_init_before_test':
../../../../tools/include/linux/kernel.h:44:24: warning: comparison of distinct pointer types lacks a cast
   44 |         (void) (&_max1 == &_max2);              \
      |                        ^~
kvm_page_table_test.c:281:21: note: in expansion of macro 'max'
  281 |         alignment = max(0x100000, alignment);
      |                     ^~~
Fix it by adjusting the type of the absolute value.
Signed-off-by: Thomas Huth thuth@redhat.com Reviewed-by: Claudio Imbrenda imbrenda@linux.ibm.com Message-Id: 20220414103031.565037-1-thuth@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/kvm/kvm_page_table_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/kvm_page_table_test.c b/tools/testing/selftests/kvm/kvm_page_table_test.c
index 36407cb0ec85..f1ddfe4c4a03 100644
--- a/tools/testing/selftests/kvm/kvm_page_table_test.c
+++ b/tools/testing/selftests/kvm/kvm_page_table_test.c
@@ -278,7 +278,7 @@ static struct kvm_vm *pre_init_before_test(enum vm_guest_mode mode, void *arg)
 	else
 		guest_test_phys_mem = p->phys_offset;
 #ifdef __s390x__
-	alignment = max(0x100000, alignment);
+	alignment = max(0x100000UL, alignment);
 #endif
 	guest_test_phys_mem &= ~(alignment - 1);
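The warning above comes from the type check built into max(). A minimal sketch of that mechanism follows, assuming a simplified stand-in macro (max_checked is a hypothetical name, and the real macro in tools/include/linux/kernel.h may differ in detail); align_down_min_1m is likewise a hypothetical helper that mirrors the fixed call site. It relies on GNU C statement expressions, so it needs gcc or clang.

```c
#include <assert.h>

/*
 * Simplified stand-in for a type-checking max() macro (assumption: not the
 * exact kernel definition). The pointer comparison has no runtime effect;
 * it only makes the compiler warn when x and y have different types.
 */
#define max_checked(x, y) ({			\
	__typeof__(x) _max1 = (x);		\
	__typeof__(y) _max2 = (y);		\
	(void) (&_max1 == &_max2);		\
	_max1 > _max2 ? _max1 : _max2; })

/* Hypothetical helper: align phys_mem down, using at least 1 MiB alignment. */
static unsigned long align_down_min_1m(unsigned long phys_mem,
				       unsigned long alignment)
{
	/* 0x100000UL is unsigned long, so both arguments share one type. */
	alignment = max_checked(0x100000UL, alignment);
	return phys_mem & ~(alignment - 1);
}
```

With a plain 0x100000 the first argument would be an int while alignment is an unsigned long, which is the mismatch the compiler reported.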
From: Wanpeng Li wanpengli@tencent.com
[ Upstream commit 0361bdfddca20c8855ea3bdbbbc9c999912b10ff ]
MSR_KVM_POLL_CONTROL is cleared on reset, thus reverting guests to host-side polling after suspend/resume. Non-bootstrap CPUs are restored correctly by the haltpoll driver because they are hot-unplugged during suspend and hot-plugged during resume; however, the BSP is not hotpluggable and remains in host-side polling mode after the guest resumes. This makes the guest pay the cost of vmexits every time it enters idle.
Fix it by recording BSP's haltpoll state and resuming it during guest resume.
Cc: Marcelo Tosatti mtosatti@redhat.com Signed-off-by: Wanpeng Li wanpengli@tencent.com Message-Id: 1650267752-46796-1-git-send-email-wanpengli@tencent.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/kvm.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index bd7b65081eb0..d36b58e705b6 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -66,6 +66,7 @@ static DEFINE_PER_CPU_DECRYPTED(struct kvm_vcpu_pv_apf_data, apf_reason) __align
 DEFINE_PER_CPU_DECRYPTED(struct kvm_steal_time, steal_time) __aligned(64) __visible;
 static int has_steal_clock = 0;
 
+static int has_guest_poll = 0;
 /*
  * No need for any "IO delay" on KVM
  */
@@ -650,14 +651,26 @@ static int kvm_cpu_down_prepare(unsigned int cpu)
 
 static int kvm_suspend(void)
 {
+	u64 val = 0;
+
 	kvm_guest_cpu_offline(false);
 
+#ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
+	if (kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL))
+		rdmsrl(MSR_KVM_POLL_CONTROL, val);
+	has_guest_poll = !(val & 1);
+#endif
 	return 0;
 }
 
 static void kvm_resume(void)
 {
 	kvm_cpu_online(raw_smp_processor_id());
+
+#ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
+	if (kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL) && has_guest_poll)
+		wrmsrl(MSR_KVM_POLL_CONTROL, 0);
+#endif
 }
 
 static struct syscore_ops kvm_syscore_ops = {
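The save/restore pattern in this patch can be tried out in user space. The following is only a sketch under stated assumptions: the MSR is modelled as a plain variable, bit 0 set is taken to mean host-side polling is active (as the patch's `!(val & 1)` test suggests), and all sim_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated MSR_KVM_POLL_CONTROL; bit 0 set = host-side polling (assumed). */
static uint64_t fake_poll_control = 1;

/* Mirrors has_guest_poll from the patch: the state recorded at suspend. */
static bool has_guest_poll;

/* The guest haltpoll driver takes over polling by clearing bit 0. */
static void sim_guest_enable_haltpoll(void) { fake_poll_control = 0; }

/* Suspend: remember whether the guest had taken over polling. */
static void sim_suspend(void) { has_guest_poll = !(fake_poll_control & 1); }

/* Reset across suspend/resume puts the MSR back to host-side polling. */
static void sim_reset(void) { fake_poll_control = 1; }

/* Resume: restore guest-side polling only if it was active before. */
static void sim_resume(void)
{
	if (has_guest_poll)
		fake_poll_control = 0;
}

static bool sim_host_polls(void) { return fake_poll_control & 1; }
```

Calling sim_guest_enable_haltpoll(), sim_suspend(), sim_reset() and sim_resume() in that order leaves the simulated BSP in guest-side polling mode; without the recorded flag it would stay in host-side polling after the reset.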
From: Paolo Bonzini pbonzini@redhat.com
[ Upstream commit d22a81b304a27fca6124174a8e842e826c193466 ]
Emulating writes to SELF_IPI with a write to ICR has an unwanted side effect: the value of ICR in the vAPIC page gets changed. The SDM lists SELF_IPI as write-only, with no associated MMIO offset, so any write should have no visible side effect in the vAPIC page.
Reported-by: Chao Gao chao.gao@intel.com Reviewed-by: Sean Christopherson seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/lapic.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4d92fb4fdf69..83d1743a1dd0 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2125,10 +2125,9 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		break;
 
 	case APIC_SELF_IPI:
-		if (apic_x2apic_mode(apic)) {
-			kvm_lapic_reg_write(apic, APIC_ICR,
-					    APIC_DEST_SELF | (val & APIC_VECTOR_MASK));
-		} else
+		if (apic_x2apic_mode(apic))
+			kvm_apic_send_ipi(apic, APIC_DEST_SELF | (val & APIC_VECTOR_MASK), 0);
+		else
 			ret = 1;
 		break;
 	default:
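To make the "no visible side effect" point concrete, here is a small user-space model. It is a sketch only: struct sim_apic and the sim_* functions are hypothetical stand-ins, not KVM's data structures, and delivery is reduced to recording the vector.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_VECTOR_MASK 0xffu

struct sim_apic {
	uint32_t icr;		/* stand-in for the ICR slot in the vAPIC page */
	int last_self_vector;	/* last vector delivered to self, -1 if none */
	int x2apic;		/* nonzero when the APIC is in x2APIC mode */
};

/* Deliver the IPI directly; the stored ICR value is not touched. */
static void sim_send_self_ipi(struct sim_apic *apic, uint32_t vector)
{
	apic->last_self_vector = (int)vector;
}

/* Returns 0 on success, 1 when the write is invalid outside x2APIC mode. */
static int sim_write_self_ipi(struct sim_apic *apic, uint32_t val)
{
	if (!apic->x2apic)
		return 1;
	sim_send_self_ipi(apic, val & SIM_VECTOR_MASK);
	return 0;
}
```

The design point is that the write handler calls the send path rather than re-entering the register-write path, so nothing is stored into the simulated ICR.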
From: Paolo Bonzini pbonzini@redhat.com
[ Upstream commit 9191b8f0745e63edf519e4a54a4aaae1d3d46fbd ]
WARN and bail if KVM attempts to free a root that isn't backed by a shadow page. KVM allocates a bare page for "special" roots, e.g. when using PAE paging or shadowing 2/3/4-level page tables with 4/5-level, and so root_hpa will be valid but won't be backed by a shadow page. It's all too easy to blindly call mmu_free_root_page() on root_hpa, so be nice and WARN instead of crashing KVM and possibly the kernel.
Reviewed-by: Sean Christopherson seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/mmu/mmu.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 34e828badc51..806f9d42bcce 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3314,6 +3314,8 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 		return;
 
 	sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK);
+	if (WARN_ON(!sp))
+		return;
 
 	if (is_tdp_mmu_page(sp))
 		kvm_tdp_mmu_put_root(kvm, sp, false);
From: Wanpeng Li wanpengli@tencent.com
[ Upstream commit 1714a4eb6fb0cb79f182873cd011a8ed60ac65e8 ]
As commit 0c5f81dad46 ("KVM: LAPIC: Inject timer interrupt via posted interrupt") mentioned, the host admin should tune the guest setup so that vCPUs are placed on isolated pCPUs, with several pCPUs left over for *busy* housekeeping. In this setup, it is preferable to disable mwait/hlt/pause vmexits to keep the vCPUs in non-root mode.
However, if only some guests are isolated and others are not, the latter get no benefit from posted timer interrupts and at the same time lose the VMX preemption timer fast paths, because kvm_can_post_timer_interrupt() returns true and therefore forces kvm_can_use_hv_timer() to false.
By guaranteeing that posted-interrupt timer is only used if MWAIT or HLT are done without vmexit, KVM can make a better choice and use the VMX preemption timer and the corresponding fast paths.
Reported-by: Aili Yao yaoaili@kingsoft.com Reviewed-by: Sean Christopherson seanjc@google.com Cc: Aili Yao yaoaili@kingsoft.com Cc: Sean Christopherson seanjc@google.com Signed-off-by: Wanpeng Li wanpengli@tencent.com Message-Id: 1643112538-36743-1-git-send-email-wanpengli@tencent.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/lapic.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 83d1743a1dd0..493d636e6231 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -113,7 +113,8 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
 
 static bool kvm_can_post_timer_interrupt(struct kvm_vcpu *vcpu)
 {
-	return pi_inject_timer && kvm_vcpu_apicv_active(vcpu);
+	return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
+		(kvm_mwait_in_guest(vcpu->kvm) || kvm_hlt_in_guest(vcpu->kvm));
 }
 
 bool kvm_can_use_hv_timer(struct kvm_vcpu *vcpu)
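The effect of the tightened predicate can be sketched as a small truth-table model. Everything here is an assumption made for illustration: struct sim_vcpu_cfg and the sim_* functions are hypothetical, and the real kvm_can_use_hv_timer() checks more than the single condition modelled below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of the inputs the predicate looks at. */
struct sim_vcpu_cfg {
	bool pi_inject_timer;	/* module parameter enabling posted timer irqs */
	bool apicv_active;	/* APICv is active for this vCPU */
	bool mwait_in_guest;	/* MWAIT runs without a vmexit */
	bool hlt_in_guest;	/* HLT runs without a vmexit */
};

/* New rule: post timer interrupts only if MWAIT or HLT stays in the guest. */
static bool sim_can_post_timer_interrupt(const struct sim_vcpu_cfg *c)
{
	return c->pi_inject_timer && c->apicv_active &&
	       (c->mwait_in_guest || c->hlt_in_guest);
}

/* Simplified: the preemption timer is usable when posting is not chosen. */
static bool sim_can_use_hv_timer(const struct sim_vcpu_cfg *c)
{
	return !sim_can_post_timer_interrupt(c);
}
```

For a non-isolated guest (both *_in_guest flags false) the model now falls back to the preemption timer even when pi_inject_timer and APICv are on.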
From: Sidhartha Kumar sidhartha.kumar@oracle.com
[ Upstream commit 9c85a9bae267f6b5e5e374d0d023bbbe9db096d3 ]
Avoid calling mmap with requested addresses that are less than the system's mmap_min_addr. When run as root, mmap returns EACCES when trying to map addresses < mmap_min_addr. This is not one of the error codes for the condition to retry the mmap in the test.
Rather than arbitrarily retrying on EACCES, don't attempt an mmap until addr > vm.mmap_min_addr.
Add a munmap call after an alignment check as the mappings are retained after the retry and can reach the vm.max_map_count sysctl.
Link: https://lkml.kernel.org/r/20220420215721.4868-1-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar sidhartha.kumar@oracle.com Reviewed-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/vm/mremap_test.c | 29 ++++++++++++++++++++++++ 1 file changed, 29 insertions(+)
diff --git a/tools/testing/selftests/vm/mremap_test.c b/tools/testing/selftests/vm/mremap_test.c index e3ce33a9954e..efcbf537b3d5 100644 --- a/tools/testing/selftests/vm/mremap_test.c +++ b/tools/testing/selftests/vm/mremap_test.c @@ -66,6 +66,35 @@ enum { .expect_failure = should_fail \ }
+/* Returns mmap_min_addr sysctl tunable from procfs */ +static unsigned long long get_mmap_min_addr(void) +{ + FILE *fp; + int n_matched; + static unsigned long long addr; + + if (addr) + return addr; + + fp = fopen("/proc/sys/vm/mmap_min_addr", "r"); + if (fp == NULL) { + ksft_print_msg("Failed to open /proc/sys/vm/mmap_min_addr: %s\n", + strerror(errno)); + exit(KSFT_SKIP); + } + + n_matched = fscanf(fp, "%llu", &addr); + if (n_matched != 1) { + ksft_print_msg("Failed to read /proc/sys/vm/mmap_min_addr: %s\n", + strerror(errno)); + fclose(fp); + exit(KSFT_SKIP); + } + + fclose(fp); + return addr; +} + /* * Returns false if the requested remap region overlaps with an * existing mapping (e.g text, stack) else returns true.
From: Sidhartha Kumar sidhartha.kumar@oracle.com
[ Upstream commit 18d609daa546c919fd36b62a7b510c18de4b4af8 ]
Because mremap does not have a MAP_FIXED_NOREPLACE flag, it can destroy existing mappings. This causes a segfault when regions such as text are remapped and the permissions are changed.
Verify the requested mremap destination address does not overlap any existing mappings by using mmap's MAP_FIXED_NOREPLACE flag. Keep incrementing the destination address until a valid mapping is found or fail the current test once the max address is reached.
Link: https://lkml.kernel.org/r/20220420215721.4868-2-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar sidhartha.kumar@oracle.com Reviewed-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/vm/mremap_test.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/tools/testing/selftests/vm/mremap_test.c b/tools/testing/selftests/vm/mremap_test.c index efcbf537b3d5..8f4dbbd60c09 100644 --- a/tools/testing/selftests/vm/mremap_test.c +++ b/tools/testing/selftests/vm/mremap_test.c @@ -66,6 +66,30 @@ enum { .expect_failure = should_fail \ }
+/* + * Returns false if the requested remap region overlaps with an + * existing mapping (e.g text, stack) else returns true. + */ +static bool is_remap_region_valid(void *addr, unsigned long long size) +{ + void *remap_addr = NULL; + bool ret = true; + + /* Use MAP_FIXED_NOREPLACE flag to ensure region is not mapped */ + remap_addr = mmap(addr, size, PROT_READ | PROT_WRITE, + MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED, + -1, 0); + + if (remap_addr == MAP_FAILED) { + if (errno == EEXIST) + ret = false; + } else { + munmap(remap_addr, size); + } + + return ret; +} + /* Returns mmap_min_addr sysctl tunable from procfs */ static unsigned long long get_mmap_min_addr(void) {
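The description says the test keeps incrementing the destination address until a valid region is found. A minimal sketch of such a search loop follows, under stated assumptions: region_is_free and find_free_region are hypothetical names, the probe is stricter than the patch's helper (any mmap failure counts as "not usable"), and the fallback value for MAP_FIXED_NOREPLACE is the x86 Linux one, used only when the libc headers lack the flag.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000	/* x86 Linux value; assumption elsewhere */
#endif

/* Probe: true if [addr, addr + size) can be mapped without replacing anything. */
static bool region_is_free(void *addr, size_t size)
{
	void *p = mmap(addr, size, PROT_READ | PROT_WRITE,
		       MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED, -1, 0);

	if (p == MAP_FAILED)
		return false;	/* EEXIST or any other failure: do not use it */
	munmap(p, size);
	/* Kernels that ignore the flag treat addr as a hint only. */
	return p == addr;
}

/* Step forward from start until a free region is found or limit is reached. */
static void *find_free_region(char *start, char *limit, size_t size, size_t step)
{
	for (char *addr = start; addr + size <= limit; addr += step)
		if (region_is_free(addr, size))
			return addr;
	return NULL;
}
```

The probe mapping is released immediately, so the caller still has to tolerate the address being taken between the probe and the later mremap call; in a single-threaded test that window is harmless.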
Hello,
This commit is the second backport of the upstream commit 18d609da ("selftest/vm: verify remap destination address in mremap_test"). It re-introduces function is_remap_region_valid and breaks vm selftest target build with the following diagnostics:
mremap_test.c:126:13: error: redefinition of ‘is_remap_region_valid’
The original backport to 5.15 was done as commit 2688d967. This one (0b4e1609) should be reverted.
The same happened with the upstream commit 9c85a9ba ("selftest/vm: verify mmap addr in mremap_test"). Original backport: a17404fc, repeated backport with just the new function added: e8b99895. Build error message:
mremap_test.c:147:27: error: redefinition of ‘get_mmap_min_addr’
Thank you
On Thu, Jul 14, 2022 at 10:43:51PM +0000, Oleksandr Tymoshenko wrote:
> Hello,
>
> This commit is the second backport of the upstream commit 18d609da ("selftest/vm: verify remap destination address in mremap_test"). It re-introduces function is_remap_region_valid and breaks vm selftest target build with the following diagnostics:
>
> mremap_test.c:126:13: error: redefinition of ‘is_remap_region_valid’
>
> The original backport to 5.15 was done as commit 2688d967. This one (0b4e1609) should be reverted.
>
> The same happened with the upstream commit 9c85a9ba ("selftest/vm: verify mmap addr in mremap_test"). Original backport: a17404fc, repeated backport with just the new function added: e8b99895. Build error message:
>
> mremap_test.c:147:27: error: redefinition of ‘get_mmap_min_addr’
This is already released, so can you send a revert for this?
thanks,
greg k-h
From: Ricky WU ricky_wu@realtek.com
commit 1f311c94aabdb419c28e3147bcc8ab89269f1a7e upstream.
The SD specification states: "Host provides at least 74 Clocks before issuing first command". Wait 1 ms for the voltage to stabilize, then start issuing the clock signal.
On the MMC_POWER_OFF to MMC_POWER_UP transition, start issuing the clock signal to the card; on the MMC_POWER_UP to MMC_POWER_ON transition, stop issuing it.
Signed-off-by: Ricky Wu ricky_wu@realtek.com Link: https://lore.kernel.org/r/1badf10aba764191a1a752edcbf90389@realtek.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Cc: Christian Löhle CLoehle@hyperstone.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/rtsx_pci_sdmmc.c | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-)
--- a/drivers/mmc/host/rtsx_pci_sdmmc.c +++ b/drivers/mmc/host/rtsx_pci_sdmmc.c @@ -38,10 +38,7 @@ struct realtek_pci_sdmmc { bool double_clk; bool eject; bool initial_mode; - int power_state; -#define SDMMC_POWER_ON 1 -#define SDMMC_POWER_OFF 0 - + int prev_power_state; int sg_count; s32 cookie; int cookie_sg_count; @@ -905,7 +902,7 @@ static int sd_set_bus_width(struct realt return err; }
-static int sd_power_on(struct realtek_pci_sdmmc *host) +static int sd_power_on(struct realtek_pci_sdmmc *host, unsigned char power_mode) { struct rtsx_pcr *pcr = host->pcr; struct mmc_host *mmc = host->mmc; @@ -913,9 +910,14 @@ static int sd_power_on(struct realtek_pc u32 val; u8 test_mode;
- if (host->power_state == SDMMC_POWER_ON) + if (host->prev_power_state == MMC_POWER_ON) return 0;
+ if (host->prev_power_state == MMC_POWER_UP) { + rtsx_pci_write_register(pcr, SD_BUS_STAT, SD_CLK_TOGGLE_EN, 0); + goto finish; + } + msleep(100);
rtsx_pci_init_cmd(pcr); @@ -936,10 +938,15 @@ static int sd_power_on(struct realtek_pc if (err < 0) return err;
+ mdelay(1); + err = rtsx_pci_write_register(pcr, CARD_OE, SD_OUTPUT_EN, SD_OUTPUT_EN); if (err < 0) return err;
+ /* send at least 74 clocks */ + rtsx_pci_write_register(pcr, SD_BUS_STAT, SD_CLK_TOGGLE_EN, SD_CLK_TOGGLE_EN); + if (PCI_PID(pcr) == PID_5261) { /* * If test mode is set switch to SD Express mandatorily, @@ -964,7 +971,8 @@ static int sd_power_on(struct realtek_pc } }
- host->power_state = SDMMC_POWER_ON; +finish: + host->prev_power_state = power_mode; return 0; }
@@ -973,7 +981,7 @@ static int sd_power_off(struct realtek_p struct rtsx_pcr *pcr = host->pcr; int err;
- host->power_state = SDMMC_POWER_OFF; + host->prev_power_state = MMC_POWER_OFF;
rtsx_pci_init_cmd(pcr);
@@ -999,7 +1007,7 @@ static int sd_set_power_mode(struct real if (power_mode == MMC_POWER_OFF) err = sd_power_off(host); else - err = sd_power_on(host); + err = sd_power_on(host, power_mode);
return err; } @@ -1482,10 +1490,11 @@ static int rtsx_pci_sdmmc_drv_probe(stru
host = mmc_priv(mmc); host->pcr = pcr; + mmc->ios.power_delay_ms = 5; host->mmc = mmc; host->pdev = pdev; host->cookie = -1; - host->power_state = SDMMC_POWER_OFF; + host->prev_power_state = MMC_POWER_OFF; INIT_WORK(&host->work, sd_request); platform_set_drvdata(pdev, host); pcr->slots[RTSX_SD_CARD].p_dev = pdev;
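The power sequencing described in this patch amounts to a three-state machine. The sketch below models it in user space under stated assumptions: the sim_* names are hypothetical, clk_toggle stands in for the SD_CLK_TOGGLE_EN bit, register writes and delays are omitted, and clearing the toggle on power-off is an assumption of the model rather than something the diff shows.

```c
#include <assert.h>
#include <stdbool.h>

enum sim_power { SIM_POWER_OFF, SIM_POWER_UP, SIM_POWER_ON };

struct sim_host {
	enum sim_power prev_power_state;
	bool clk_toggle;	/* models SD_CLK_TOGGLE_EN */
	int power_up_count;	/* how many times the full power-up sequence ran */
};

static void sim_power_on(struct sim_host *h, enum sim_power mode)
{
	if (h->prev_power_state == SIM_POWER_ON)
		return;			/* already on, nothing to do */

	if (h->prev_power_state == SIM_POWER_UP) {
		h->clk_toggle = false;	/* UP -> ON: stop the free-running clock */
	} else {
		h->power_up_count++;	/* from OFF: apply power, wait for it to settle */
		h->clk_toggle = true;	/* then clock the card (at least 74 cycles) */
	}
	h->prev_power_state = mode;
}

static void sim_power_off(struct sim_host *h)
{
	h->prev_power_state = SIM_POWER_OFF;
	h->clk_toggle = false;		/* assumption: clock is off when unpowered */
}
```

Tracking the previous state rather than a plain on/off flag is what lets the second call (MMC_POWER_UP to MMC_POWER_ON) skip the full power-up and only stop the clock.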
From: Helge Deller deller@gmx.de
commit 7962c0896429af2a0e00ec6bc15d992536453b2d upstream.
This reverts commit d97180ad68bdb7ee10f327205a649bc2f558741d.
It triggers RCU stalls at boot with a 32-bit kernel.
Signed-off-by: Helge Deller deller@gmx.de Noticed-by: John David Anglin dave.anglin@bell.net Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/kernel/setup.c | 2 ++ arch/parisc/kernel/time.c | 6 +----- 2 files changed, 3 insertions(+), 5 deletions(-)
--- a/arch/parisc/kernel/setup.c +++ b/arch/parisc/kernel/setup.c @@ -150,6 +150,8 @@ void __init setup_arch(char **cmdline_p) #ifdef CONFIG_PA11 dma_ops_init(); #endif + + clear_sched_clock_stable(); }
/* --- a/arch/parisc/kernel/time.c +++ b/arch/parisc/kernel/time.c @@ -249,13 +249,9 @@ void __init time_init(void) static int __init init_cr16_clocksource(void) { /* - * The cr16 interval timers are not syncronized across CPUs, even if - * they share the same socket. + * The cr16 interval timers are not synchronized across CPUs. */ if (num_online_cpus() > 1 && !running_on_qemu) { - /* mark sched_clock unstable */ - clear_sched_clock_stable(); - clocksource_cr16.name = "cr16_unstable"; clocksource_cr16.flags = CLOCK_SOURCE_UNSTABLE; clocksource_cr16.rating = 0;
From: Frederic Weisbecker frederic@kernel.org
commit 3e61e95e2d095e308616cba4ffb640f95a480e01 upstream.
The callbacks processing time limit makes sure we are not exceeding a given amount of time executing the queue.
However its "continue" clause bypasses the cond_resched() call on rcuc and NOCB kthreads, delaying it until we reach the limit, which can be very long...
Make sure the scheduler has a higher priority than the time limit.
Reviewed-by: Valentin Schneider valentin.schneider@arm.com Tested-by: Valentin Schneider valentin.schneider@arm.com Tested-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Frederic Weisbecker frederic@kernel.org Cc: Valentin Schneider valentin.schneider@arm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Sebastian Andrzej Siewior bigeasy@linutronix.de Cc: Josh Triplett josh@joshtriplett.org Cc: Joel Fernandes joel@joelfernandes.org Cc: Boqun Feng boqun.feng@gmail.com Cc: Neeraj Upadhyay neeraju@codeaurora.org Cc: Uladzislau Rezki urezki@gmail.com Cc: Thomas Gleixner tglx@linutronix.de Signed-off-by: Paul E. McKenney paulmck@kernel.org [UR: backport to 5.15-stable + commit update] Signed-off-by: Uladzislau Rezki (Sony) urezki@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/rcu/tree.c | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-)
--- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2513,10 +2513,22 @@ static void rcu_do_batch(struct rcu_data /* * Stop only if limit reached and CPU has something to do. */ - if (count >= bl && !offloaded && - (need_resched() || - (!is_idle_task(current) && !rcu_is_callbacks_kthread()))) - break; + if (in_serving_softirq()) { + if (count >= bl && (need_resched() || + (!is_idle_task(current) && !rcu_is_callbacks_kthread()))) + break; + } else { + local_bh_enable(); + lockdep_assert_irqs_enabled(); + cond_resched_tasks_rcu_qs(); + lockdep_assert_irqs_enabled(); + local_bh_disable(); + } + + /* + * Make sure we don't spend too much time here and deprive other + * softirq vectors of CPU cycles. + */ if (unlikely(tlimit)) { /* only call local_clock() every 32 callbacks */ if (likely((count & 31) || local_clock() < tlimit)) @@ -2524,13 +2536,6 @@ static void rcu_do_batch(struct rcu_data /* Exceeded the time limit, so leave. */ break; } - if (!in_serving_softirq()) { - local_bh_enable(); - lockdep_assert_irqs_enabled(); - cond_resched_tasks_rcu_qs(); - lockdep_assert_irqs_enabled(); - local_bh_disable(); - } }
local_irq_save(flags);
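Why the ordering matters can be shown with a toy loop. This is a sketch, not the RCU code: sim_batch and struct sim_result are hypothetical, the clock is replaced by a callback count (limit_at) at which the time limit counts as exceeded, and a counter stands in for cond_resched_tasks_rcu_qs().

```c
#include <assert.h>
#include <stdbool.h>

struct sim_result {
	int processed;	/* callbacks invoked */
	int yields;	/* times the loop reached its yield point */
};

static struct sim_result sim_batch(int ncallbacks, int limit_at,
				   bool time_limited, bool yield_first)
{
	struct sim_result r = { 0, 0 };

	for (int count = 1; count <= ncallbacks; count++) {
		r.processed++;			/* invoke one callback */

		if (yield_first)
			r.yields++;		/* new order: yield before the check */

		if (time_limited) {
			/* The clock is consulted only every 32 callbacks. */
			if ((count & 31) || count < limit_at)
				continue;	/* skips everything below */
			break;			/* time limit exceeded */
		}

		if (!yield_first)
			r.yields++;		/* old order: unreachable when limited */
	}
	return r;
}
```

With a time limit active, the old ordering never reaches its yield point, while the new ordering yields once per callback until the limit ends the batch.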
From: Frederic Weisbecker frederic@kernel.org
commit a554ba288845fd3f6f12311fd76a51694233458a upstream.
Time limit only makes sense when callbacks are serviced in softirq mode because:
* In case we need to get back to the scheduler, cond_resched_tasks_rcu_qs() is called after each callback.
* In case some other softirq vector needs the CPU, the call to local_bh_enable() before cond_resched_tasks_rcu_qs() takes care of them via a call to do_softirq().
Therefore, make sure the time limit only applies to softirq mode.
Reviewed-by: Valentin Schneider valentin.schneider@arm.com Tested-by: Valentin Schneider valentin.schneider@arm.com Tested-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Frederic Weisbecker frederic@kernel.org Cc: Valentin Schneider valentin.schneider@arm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Sebastian Andrzej Siewior bigeasy@linutronix.de Cc: Josh Triplett josh@joshtriplett.org Cc: Joel Fernandes joel@joelfernandes.org Cc: Boqun Feng boqun.feng@gmail.com Cc: Neeraj Upadhyay neeraju@codeaurora.org Cc: Uladzislau Rezki urezki@gmail.com Cc: Thomas Gleixner tglx@linutronix.de Signed-off-by: Paul E. McKenney paulmck@kernel.org [UR: backport to 5.15-stable] Signed-off-by: Uladzislau Rezki (Sony) urezki@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/rcu/tree.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-)
--- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2476,7 +2476,7 @@ static void rcu_do_batch(struct rcu_data div = READ_ONCE(rcu_divisor); div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div; bl = max(rdp->blimit, pending >> div); - if (unlikely(bl > 100)) { + if (in_serving_softirq() && unlikely(bl > 100)) { long rrn = READ_ONCE(rcu_resched_ns);
rrn = rrn < NSEC_PER_MSEC ? NSEC_PER_MSEC : rrn > NSEC_PER_SEC ? NSEC_PER_SEC : rrn; @@ -2517,6 +2517,18 @@ static void rcu_do_batch(struct rcu_data if (count >= bl && (need_resched() || (!is_idle_task(current) && !rcu_is_callbacks_kthread()))) break; + + /* + * Make sure we don't spend too much time here and deprive other + * softirq vectors of CPU cycles. + */ + if (unlikely(tlimit)) { + /* only call local_clock() every 32 callbacks */ + if (likely((count & 31) || local_clock() < tlimit)) + continue; + /* Exceeded the time limit, so leave. */ + break; + } } else { local_bh_enable(); lockdep_assert_irqs_enabled(); @@ -2524,18 +2536,6 @@ static void rcu_do_batch(struct rcu_data lockdep_assert_irqs_enabled(); local_bh_disable(); } - - /* - * Make sure we don't spend too much time here and deprive other - * softirq vectors of CPU cycles. - */ - if (unlikely(tlimit)) { - /* only call local_clock() every 32 callbacks */ - if (likely((count & 31) || local_clock() < tlimit)) - continue; - /* Exceeded the time limit, so leave. */ - break; - } }
local_irq_save(flags);
From: Pali Rohár pali@kernel.org
commit 9319230ac147067652b58fe849ffe0ceec098665 upstream.
The current assignment to the class_revision member
class_revision |= cpu_to_le32(PCI_CLASS_BRIDGE_PCI << 16);
can make the reader think that the class occupies the high 16 bits of the member and the revision the low 16 bits.
In reality, the class occupies the high 24 bits and the revision the low 8 bits; the 24-bit class value for PCI Bridge Normal Decode is PCI_CLASS_BRIDGE_PCI << 8.
Change the assignment and add a comment to make this clearer.
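The layout described above can be checked with a small user-space sketch. The helper names below are hypothetical and not part of the patch, the value 0x0604 mirrors PCI_CLASS_BRIDGE_PCI from the kernel headers, and the cpu_to_le32() conversion is left out for simplicity.

```c
#include <stdint.h>

/* Mirrors PCI_CLASS_BRIDGE_PCI: base class 0x06, subclass 0x04. */
#define PCI_CLASS_BRIDGE_PCI 0x0604u

/* Hypothetical helper: pack a 24-bit class code and an 8-bit revision. */
static uint32_t pack_class_revision(uint32_t class24, uint8_t revision)
{
	return (class24 << 8) | revision;
}

/* Hypothetical helper: the class code is the high 24 bits. */
static uint32_t class_of(uint32_t class_revision)
{
	return class_revision >> 8;
}

/* Hypothetical helper: the revision is the low 8 bits. */
static uint8_t revision_of(uint32_t class_revision)
{
	return (uint8_t)(class_revision & 0xff);
}
```

With these helpers, pack_class_revision(PCI_CLASS_BRIDGE_PCI << 8, 0) gives 0x06040000, which is the same value the old expression PCI_CLASS_BRIDGE_PCI << 16 produced; the patch changes readability, not the stored value.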
Link: https://lore.kernel.org/r/20211130172913.9727-2-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/pci-bridge-emul.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/pci/pci-bridge-emul.c +++ b/drivers/pci/pci-bridge-emul.c @@ -284,7 +284,11 @@ int pci_bridge_emul_init(struct pci_brid { BUILD_BUG_ON(sizeof(bridge->conf) != PCI_BRIDGE_CONF_END);
- bridge->conf.class_revision |= cpu_to_le32(PCI_CLASS_BRIDGE_PCI << 16); + /* + * class_revision: Class is high 24 bits and revision is low 8 bit of this member, + * while class for PCI Bridge Normal Decode has the 24-bit value: PCI_CLASS_BRIDGE_PCI << 8 + */ + bridge->conf.class_revision |= cpu_to_le32((PCI_CLASS_BRIDGE_PCI << 8) << 8); bridge->conf.header_type = PCI_HEADER_TYPE_BRIDGE; bridge->conf.cache_line_size = 0x10; bridge->conf.status = cpu_to_le16(PCI_STATUS_CAP_LIST);
From: Pali Rohár pali@kernel.org
commit 8ea673a8b30b4a32516b8adabb15e2a68ff02ec8 upstream.
The pci-bridge-emul driver already allocates a buffer for capabilities up to the PCI_EXP_SLTSTA2 register, but does not define bit access behavior for these registers. Add these missing definitions.
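To illustrate what "bit access behavior" means here, the following is a simplified user-space sketch of how a write could be filtered through ro/rw/w1c masks. It is not the kernel's pci_bridge_emul_conf_write(); the struct, macros and emul_write() are stand-ins written for this example, and only the LNKCTL2/LNKSTA2 masks are copied from the patch.

```c
#include <stdint.h>

#define BIT(n)        (UINT32_C(1) << (n))
#define GENMASK(h, l) ((~UINT32_C(0) >> (31 - (h))) & (~UINT32_C(0) << (l)))

/* Stand-in for struct pci_bridge_reg_behavior (illustration only). */
struct reg_behavior {
	uint32_t ro;  /* read-only bits */
	uint32_t rw;  /* read-write bits */
	uint32_t w1c; /* write-1-to-clear bits */
};

/* Masks for the combined Link Control 2 / Link Status 2 dword, as in the patch. */
static const struct reg_behavior lnkctl2_behavior = {
	.rw  = GENMASK(15, 0),
	.w1c = (BIT(15) | BIT(5)) << 16,
	.ro  = (GENMASK(14, 12) | GENMASK(9, 6) | GENMASK(4, 0)) << 16,
};

/* Hypothetical write filter: returns the new register content. */
static uint32_t emul_write(const struct reg_behavior *b, uint32_t old, uint32_t val)
{
	uint32_t new = old & ~b->rw; /* keep everything that is not writable */

	new |= val & b->rw;          /* RW bits take the written value */
	new &= ~(val & b->w1c);      /* writing 1 to a W1C bit clears it */
	return new;
}
```

Bits that appear in none of the three masks are treated as reserved in this sketch: they are never changed by a write, which matches the empty SLTCAP2 and SLTCTL2 entries in spirit.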
Link: https://lore.kernel.org/r/20211130172913.9727-3-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/pci-bridge-emul.c | 43 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+)
--- a/drivers/pci/pci-bridge-emul.c +++ b/drivers/pci/pci-bridge-emul.c @@ -270,6 +270,49 @@ struct pci_bridge_reg_behavior pcie_cap_ .ro = GENMASK(15, 0) | PCI_EXP_RTSTA_PENDING, .w1c = PCI_EXP_RTSTA_PME, }, + + [PCI_EXP_DEVCAP2 / 4] = { + /* + * Device capabilities 2 register has reserved bits [30:27]. + * Also bits [26:24] are reserved for non-upstream ports. + */ + .ro = BIT(31) | GENMASK(23, 0), + }, + + [PCI_EXP_DEVCTL2 / 4] = { + /* + * Device control 2 register is RW. Bit 11 is reserved for + * non-upstream ports. + * + * Device status 2 register is reserved. + */ + .rw = GENMASK(15, 12) | GENMASK(10, 0), + }, + + [PCI_EXP_LNKCAP2 / 4] = { + /* Link capabilities 2 register has reserved bits [30:25] and 0. */ + .ro = BIT(31) | GENMASK(24, 1), + }, + + [PCI_EXP_LNKCTL2 / 4] = { + /* + * Link control 2 register is RW. + * + * Link status 2 register has bits 5, 15 W1C; + * bits 10, 11 reserved and others are RO. + */ + .rw = GENMASK(15, 0), + .w1c = (BIT(15) | BIT(5)) << 16, + .ro = (GENMASK(14, 12) | GENMASK(9, 6) | GENMASK(4, 0)) << 16, + }, + + [PCI_EXP_SLTCAP2 / 4] = { + /* Slot capabilities 2 register is reserved. */ + }, + + [PCI_EXP_SLTCTL2 / 4] = { + /* Both Slot control 2 and Slot status 2 registers are reserved. */ + }, };
/*
From: Pali Rohár pali@kernel.org
commit 1d3e170344dff2cef8827db6c09909b78cbc11d7 upstream.
PCI aardvark hardware supports access to DEVCAP2, DEVCTL2, LNKCAP2 and LNKCTL2 configuration registers of PCIe core via PCIE_CORE_PCIEXP_CAP. Export them via emulated software root bridge.
Link: https://lore.kernel.org/r/20211130172913.9727-4-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -876,8 +876,13 @@ advk_pci_bridge_emul_pcie_conf_read(stru
case PCI_EXP_DEVCAP: case PCI_EXP_DEVCTL: + case PCI_EXP_DEVCAP2: + case PCI_EXP_DEVCTL2: + case PCI_EXP_LNKCAP2: + case PCI_EXP_LNKCTL2: *value = advk_readl(pcie, PCIE_CORE_PCIEXP_CAP + reg); return PCI_BRIDGE_EMUL_HANDLED; + default: return PCI_BRIDGE_EMUL_NOT_HANDLED; } @@ -891,10 +896,6 @@ advk_pci_bridge_emul_pcie_conf_write(str struct advk_pcie *pcie = bridge->data;
switch (reg) { - case PCI_EXP_DEVCTL: - advk_writel(pcie, new, PCIE_CORE_PCIEXP_CAP + reg); - break; - case PCI_EXP_LNKCTL: advk_writel(pcie, new, PCIE_CORE_PCIEXP_CAP + reg); if (new & PCI_EXP_LNKCTL_RL) @@ -916,6 +917,12 @@ advk_pci_bridge_emul_pcie_conf_write(str advk_writel(pcie, new, PCIE_ISR0_REG); break;
+ case PCI_EXP_DEVCTL: + case PCI_EXP_DEVCTL2: + case PCI_EXP_LNKCTL2: + advk_writel(pcie, new, PCIE_CORE_PCIEXP_CAP + reg); + break; + default: break; }
From: Pali Rohár pali@kernel.org
commit 7d8dc1f7cd007a7ce94c5b4c20d63a8b8d6d7751 upstream.
We already clear all the other interrupts (ISR0, ISR1, HOST_CTRL_INT).
Define a new macro PCIE_MSI_ALL_MASK and do the same clearing for MSIs, to ensure that we don't start receiving spurious interrupts.
Use this new mask in advk_pcie_handle_msi().
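The status computation that advk_pcie_handle_msi() performs with the new mask can be sketched in user space as follows. MSI_IRQ_NUM is assumed to be 32 here, matching the 32-bit status register, and both function names are hypothetical.

```c
#include <stdint.h>

#define MSI_IRQ_NUM       32                    /* assumption: one MSI per status bit */
#define PCIE_MSI_ALL_MASK UINT32_C(0xffffffff)  /* GENMASK(31, 0) */

/* Pending MSIs are those raised in the status register and not masked. */
static uint32_t msi_pending(uint32_t status, uint32_t mask)
{
	return status & (~mask & PCIE_MSI_ALL_MASK);
}

/* Hypothetical helper: store the indices of pending MSIs, return their count. */
static int collect_pending(uint32_t status, uint32_t mask, int out[MSI_IRQ_NUM])
{
	uint32_t pending = msi_pending(status, mask);
	int n = 0;

	for (int i = 0; i < MSI_IRQ_NUM; i++)
		if (pending & (UINT32_C(1) << i))
			out[n++] = i;
	return n;
}
```

Because the mask covers all 32 bits, the extra AND does not change the result today; it documents which bits of the register are meaningful.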
Link: https://lore.kernel.org/r/20211130172913.9727-5-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -115,6 +115,7 @@ #define PCIE_MSI_ADDR_HIGH_REG (CONTROL_BASE_ADDR + 0x54) #define PCIE_MSI_STATUS_REG (CONTROL_BASE_ADDR + 0x58) #define PCIE_MSI_MASK_REG (CONTROL_BASE_ADDR + 0x5C) +#define PCIE_MSI_ALL_MASK GENMASK(31, 0) #define PCIE_MSI_PAYLOAD_REG (CONTROL_BASE_ADDR + 0x9C) #define PCIE_MSI_DATA_MASK GENMASK(15, 0)
@@ -570,6 +571,7 @@ static void advk_pcie_setup_hw(struct ad advk_writel(pcie, reg, PCIE_CORE_CTRL2_REG);
/* Clear all interrupts */ + advk_writel(pcie, PCIE_MSI_ALL_MASK, PCIE_MSI_STATUS_REG); advk_writel(pcie, PCIE_ISR0_ALL_MASK, PCIE_ISR0_REG); advk_writel(pcie, PCIE_ISR1_ALL_MASK, PCIE_ISR1_REG); advk_writel(pcie, PCIE_IRQ_ALL_MASK, HOST_CTRL_INT_STATUS_REG); @@ -582,7 +584,7 @@ static void advk_pcie_setup_hw(struct ad advk_writel(pcie, PCIE_ISR1_ALL_MASK, PCIE_ISR1_MASK_REG);
/* Unmask all MSIs */ - advk_writel(pcie, 0, PCIE_MSI_MASK_REG); + advk_writel(pcie, ~(u32)PCIE_MSI_ALL_MASK, PCIE_MSI_MASK_REG);
/* Enable summary interrupt for GIC SPI source */ reg = PCIE_IRQ_ALL_MASK & (~PCIE_IRQ_ENABLE_INTS_MASK); @@ -1389,7 +1391,7 @@ static void advk_pcie_handle_msi(struct
msi_mask = advk_readl(pcie, PCIE_MSI_MASK_REG); msi_val = advk_readl(pcie, PCIE_MSI_STATUS_REG); - msi_status = msi_val & ~msi_mask; + msi_status = msi_val & ((~msi_mask) & PCIE_MSI_ALL_MASK);
for (msi_idx = 0; msi_idx < MSI_IRQ_NUM; msi_idx++) { if (!(BIT(msi_idx) & msi_status))
From: Pali Rohár pali@kernel.org
commit a4ca7948e1d47275f8f3e5023243440c40561916 upstream.
Add two more comments into the advk_pcie_remove() method.
Link: https://lore.kernel.org/r/20211130172913.9727-6-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -1681,11 +1681,13 @@ static int advk_pcie_remove(struct platf struct pci_host_bridge *bridge = pci_host_bridge_from_priv(pcie); int i;
+ /* Remove PCI bus with all devices */ pci_lock_rescan_remove(); pci_stop_root_bus(bridge->bus); pci_remove_root_bus(bridge->bus); pci_unlock_rescan_remove();
+ /* Remove IRQ domains */ advk_pcie_remove_msi_irq_domain(pcie); advk_pcie_remove_irq_domain(pcie);
From: Pali Rohár pali@kernel.org
commit a46f2f6dd4093438d9615dfbf5c0fea2a9835dba upstream.
Ensure that after driver unbind PCIe cards are not able to forward memory and I/O requests in the upstream direction.
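As a rough illustration of the read-modify-write this patch performs on the command register, here is a user-space sketch. The three bit values mirror PCI_COMMAND_IO, PCI_COMMAND_MEMORY and PCI_COMMAND_MASTER from the PCI register definitions; disable_forwarding() is a hypothetical name and operates on a plain integer instead of the hardware register.

```c
#include <stdint.h>

#define PCI_COMMAND_IO     0x1 /* respond to / forward I/O space */
#define PCI_COMMAND_MEMORY 0x2 /* respond to / forward memory space */
#define PCI_COMMAND_MASTER 0x4 /* bus mastering */

/* Hypothetical helper: clear the three enable bits, keep all other bits. */
static uint32_t disable_forwarding(uint32_t cmd_status)
{
	return cmd_status & ~(uint32_t)(PCI_COMMAND_IO |
					PCI_COMMAND_MEMORY |
					PCI_COMMAND_MASTER);
}
```

For example, disable_forwarding(0x00100147) returns 0x00100140: only the low three command bits are dropped, and the status half of the dword is untouched.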
Link: https://lore.kernel.org/r/20211130172913.9727-7-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -1679,6 +1679,7 @@ static int advk_pcie_remove(struct platf { struct advk_pcie *pcie = platform_get_drvdata(pdev); struct pci_host_bridge *bridge = pci_host_bridge_from_priv(pcie); + u32 val; int i;
/* Remove PCI bus with all devices */ @@ -1687,6 +1688,11 @@ static int advk_pcie_remove(struct platf pci_remove_root_bus(bridge->bus); pci_unlock_rescan_remove();
+ /* Disable Root Bridge I/O space, memory space and bus mastering */ + val = advk_readl(pcie, PCIE_CORE_CMD_STATUS_REG); + val &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER); + advk_writel(pcie, val, PCIE_CORE_CMD_STATUS_REG); + /* Remove IRQ domains */ advk_pcie_remove_msi_irq_domain(pcie); advk_pcie_remove_irq_domain(pcie);
From: Pali Rohár pali@kernel.org
commit 13bcdf07cb2ecff5d45d2c141df2539b15211448 upstream.
Ensure that no interrupt can be triggered after driver unbind.
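The "mask everything, then clear everything" sequence can be modelled in user space as below. This is only a sketch: it assumes the status register is write-1-to-clear and that a set mask bit suppresses the corresponding source, which is how the patch uses these registers; all names are hypothetical.

```c
#include <stdint.h>

/* Simulated status/mask register pair (illustration only). */
struct irq_regs {
	uint32_t status; /* write-1-to-clear */
	uint32_t mask;   /* 1 = source masked */
};

static void write_status(struct irq_regs *r, uint32_t val)
{
	r->status &= ~val; /* every written 1 clears that status bit */
}

static void write_mask(struct irq_regs *r, uint32_t val)
{
	r->mask = val;
}

/* The summary line is asserted while any unmasked status bit is set. */
static int irq_line_asserted(const struct irq_regs *r)
{
	return (r->status & ~r->mask) != 0;
}

/* Mask first so nothing new gets through, then clear what is latched. */
static void quiesce(struct irq_regs *r)
{
	write_mask(r, UINT32_C(0xffffffff));
	write_status(r, UINT32_C(0xffffffff));
}
```

Masking before clearing avoids a window in which a source that fires between the two writes could still raise the line.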
Link: https://lore.kernel.org/r/20211130172913.9727-8-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -1693,6 +1693,27 @@ static int advk_pcie_remove(struct platf val &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER); advk_writel(pcie, val, PCIE_CORE_CMD_STATUS_REG);
+ /* Disable MSI */ + val = advk_readl(pcie, PCIE_CORE_CTRL2_REG); + val &= ~PCIE_CORE_CTRL2_MSI_ENABLE; + advk_writel(pcie, val, PCIE_CORE_CTRL2_REG); + + /* Clear MSI address */ + advk_writel(pcie, 0, PCIE_MSI_ADDR_LOW_REG); + advk_writel(pcie, 0, PCIE_MSI_ADDR_HIGH_REG); + + /* Mask all interrupts */ + advk_writel(pcie, PCIE_MSI_ALL_MASK, PCIE_MSI_MASK_REG); + advk_writel(pcie, PCIE_ISR0_ALL_MASK, PCIE_ISR0_MASK_REG); + advk_writel(pcie, PCIE_ISR1_ALL_MASK, PCIE_ISR1_MASK_REG); + advk_writel(pcie, PCIE_IRQ_ALL_MASK, HOST_CTRL_INT_MASK_REG); + + /* Clear all interrupts */ + advk_writel(pcie, PCIE_MSI_ALL_MASK, PCIE_MSI_STATUS_REG); + advk_writel(pcie, PCIE_ISR0_ALL_MASK, PCIE_ISR0_REG); + advk_writel(pcie, PCIE_ISR1_ALL_MASK, PCIE_ISR1_REG); + advk_writel(pcie, PCIE_IRQ_ALL_MASK, HOST_CTRL_INT_STATUS_REG); + /* Remove IRQ domains */ advk_pcie_remove_msi_irq_domain(pcie); advk_pcie_remove_irq_domain(pcie);
From: Pali Rohár pali@kernel.org
commit 2f040a17f5061457ae95035326d3159eddc1e5cc upstream.
Free config space for emulated root bridge when unbinding driver to fix memory leak. Do it after disabling and masking all interrupts, since aardvark interrupt handler accesses config space of emulated root bridge.
Link: https://lore.kernel.org/r/20211130172913.9727-9-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -1718,6 +1718,9 @@ static int advk_pcie_remove(struct platf advk_pcie_remove_msi_irq_domain(pcie); advk_pcie_remove_irq_domain(pcie);
+ /* Free config space for emulated root bridge */ + pci_bridge_emul_cleanup(&pcie->bridge); + /* Disable outbound address windows mapping */ for (i = 0; i < OB_WIN_COUNT; i++) advk_pcie_disable_ob_win(pcie, i);
From: Pali Rohár pali@kernel.org
commit 1f54391be8ce0c981d312cb93acdc5608def576a upstream.
Put the PCIe card into reset by asserting PERST# signal when unbinding driver. It doesn't make sense to leave the card working if it can't communicate with the host. This should also save some power.
Link: https://lore.kernel.org/r/20211130172913.9727-10-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -1721,6 +1721,10 @@ static int advk_pcie_remove(struct platf /* Free config space for emulated root bridge */ pci_bridge_emul_cleanup(&pcie->bridge);
+ /* Assert PERST# signal which prepares PCIe card for power down */ + if (pcie->reset_gpio) + gpiod_set_value_cansleep(pcie->reset_gpio, 1); + /* Disable outbound address windows mapping */ for (i = 0; i < OB_WIN_COUNT; i++) advk_pcie_disable_ob_win(pcie, i);
From: Pali Rohár pali@kernel.org
commit 759dec2e3dfdbd261c41d2279f04f2351c971a49 upstream.
Disable link training circuit in driver unbind sequence. We want to leave link training in the same state as it was before the driver was probed.
Link: https://lore.kernel.org/r/20211130172913.9727-11-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -1725,6 +1725,11 @@ static int advk_pcie_remove(struct platf if (pcie->reset_gpio) gpiod_set_value_cansleep(pcie->reset_gpio, 1);
+ /* Disable link training */ + val = advk_readl(pcie, PCIE_CORE_CTRL0_REG); + val &= ~LINK_TRAINING_EN; + advk_writel(pcie, val, PCIE_CORE_CTRL0_REG); + /* Disable outbound address windows mapping */ for (i = 0; i < OB_WIN_COUNT; i++) advk_pcie_disable_ob_win(pcie, i);
From: Pali Rohár pali@kernel.org
commit fdbbe242c15a8f2cd0e3ad8a56cd0a447b771d0d upstream.
Disable the PCIe PHY when unbinding driver. This should save some power.
Link: https://lore.kernel.org/r/20211130172913.9727-12-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -1734,6 +1734,9 @@ static int advk_pcie_remove(struct platf for (i = 0; i < OB_WIN_COUNT; i++) advk_pcie_disable_ob_win(pcie, i);
+ /* Disable phy */ + advk_pcie_disable_phy(pcie); + return 0; }
From: Pali Rohár pali@kernel.org
commit 1d86abf1f89672a70f2ab65f6000299feb1f1781 upstream.
Header file linux/pci.h defines enum pci_interrupt_pin with corresponding PCI_INTERRUPT_* values.
Link: https://lore.kernel.org/r/20220110015018.26359-2-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -38,10 +38,6 @@ #define PCIE_CORE_ERR_CAPCTL_ECRC_CHK_TX_EN BIT(6) #define PCIE_CORE_ERR_CAPCTL_ECRC_CHCK BIT(7) #define PCIE_CORE_ERR_CAPCTL_ECRC_CHCK_RCV BIT(8) -#define PCIE_CORE_INT_A_ASSERT_ENABLE 1 -#define PCIE_CORE_INT_B_ASSERT_ENABLE 2 -#define PCIE_CORE_INT_C_ASSERT_ENABLE 3 -#define PCIE_CORE_INT_D_ASSERT_ENABLE 4 /* PIO registers base address and register offsets */ #define PIO_BASE_ADDR 0x4000 #define PIO_CTRL (PIO_BASE_ADDR + 0x0) @@ -961,7 +957,7 @@ static int advk_sw_pci_bridge_init(struc bridge->conf.pref_mem_limit = cpu_to_le16(PCI_PREF_RANGE_TYPE_64);
/* Support interrupt A for MSI feature */ - bridge->conf.intpin = PCIE_CORE_INT_A_ASSERT_ENABLE; + bridge->conf.intpin = PCI_INTERRUPT_INTA;
/* Aardvark HW provides PCIe Capability structure in version 2 */ bridge->pcie_conf.cap = cpu_to_le16(2);
From: Pali Rohár pali@kernel.org
commit 1571d67dc190e50c6c56e8f88cdc39f7cc53166e upstream.
Rewrite the code to use irq_set_chained_handler_and_data() handler with chained_irq_enter() and chained_irq_exit() processing instead of using devm_request_irq().
advk_pcie_irq_handler() reads IRQ status bits and calls other functions based on which bits are set. These functions then read their own IRQ status bits and call other aardvark functions based on these bits. Finally, generic_handle_domain_irq() is called with the translated Linux IRQ numbers.
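The top level of this cascade can be modelled in user space as a sketch. The struct and function names are hypothetical, the bit position used for PCIE_IRQ_CORE_INT is an assumed value for illustration, and the kernel's chained_irq_enter()/chained_irq_exit() bracketing is not modelled.

```c
#include <stdint.h>

#define PCIE_IRQ_ALL_MASK UINT32_C(0xffffffff)
#define PCIE_IRQ_CORE_INT (UINT32_C(1) << 16) /* assumed bit position */

/* Simulated host-control interrupt registers (illustration only). */
struct fake_hw {
	uint32_t host_status; /* write-1-to-clear in real hardware */
	uint32_t host_mask;   /* 1 = masked */
	int core_handled;     /* counts calls into the second level */
};

/* Stand-in for advk_pcie_handle_int(): the second level of the cascade. */
static void handle_core(struct fake_hw *hw)
{
	hw->core_handled++;
}

/* Stand-in for the chained handler body. */
static void top_handler(struct fake_hw *hw)
{
	uint32_t status = hw->host_status &
			  (~hw->host_mask & PCIE_IRQ_ALL_MASK);

	if (status & PCIE_IRQ_CORE_INT) {
		handle_core(hw);
		hw->host_status &= ~PCIE_IRQ_CORE_INT; /* acknowledge */
	}
}
```

Unlike the old handler, which returned IRQ_NONE when the core bit was not set, a chained handler has no return value, so a masked or absent source simply falls through.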
Link: https://lore.kernel.org/r/20220110015018.26359-5-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 48 ++++++++++++++++++---------------- 1 file changed, 26 insertions(+), 22 deletions(-)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -268,6 +268,7 @@ struct advk_pcie { u32 actions; } wins[OB_WIN_COUNT]; u8 wins_count; + int irq; struct irq_domain *irq_domain; struct irq_chip irq_chip; raw_spinlock_t irq_lock; @@ -1432,21 +1433,26 @@ static void advk_pcie_handle_int(struct } }
-static irqreturn_t advk_pcie_irq_handler(int irq, void *arg) +static void advk_pcie_irq_handler(struct irq_desc *desc) { - struct advk_pcie *pcie = arg; - u32 status; + struct advk_pcie *pcie = irq_desc_get_handler_data(desc); + struct irq_chip *chip = irq_desc_get_chip(desc); + u32 val, mask, status;
- status = advk_readl(pcie, HOST_CTRL_INT_STATUS_REG); - if (!(status & PCIE_IRQ_CORE_INT)) - return IRQ_NONE; + chained_irq_enter(chip, desc);
- advk_pcie_handle_int(pcie); + val = advk_readl(pcie, HOST_CTRL_INT_STATUS_REG); + mask = advk_readl(pcie, HOST_CTRL_INT_MASK_REG); + status = val & ((~mask) & PCIE_IRQ_ALL_MASK);
- /* Clear interrupt */ - advk_writel(pcie, PCIE_IRQ_CORE_INT, HOST_CTRL_INT_STATUS_REG); + if (status & PCIE_IRQ_CORE_INT) { + advk_pcie_handle_int(pcie);
- return IRQ_HANDLED; + /* Clear interrupt */ + advk_writel(pcie, PCIE_IRQ_CORE_INT, HOST_CTRL_INT_STATUS_REG); + } + + chained_irq_exit(chip, desc); }
static void __maybe_unused advk_pcie_disable_phy(struct advk_pcie *pcie) @@ -1513,7 +1519,7 @@ static int advk_pcie_probe(struct platfo struct advk_pcie *pcie; struct pci_host_bridge *bridge; struct resource_entry *entry; - int ret, irq; + int ret;
bridge = devm_pci_alloc_host_bridge(dev, sizeof(struct advk_pcie)); if (!bridge) @@ -1599,17 +1605,9 @@ static int advk_pcie_probe(struct platfo if (IS_ERR(pcie->base)) return PTR_ERR(pcie->base);
- irq = platform_get_irq(pdev, 0); - if (irq < 0) - return irq; - - ret = devm_request_irq(dev, irq, advk_pcie_irq_handler, - IRQF_SHARED | IRQF_NO_THREAD, "advk-pcie", - pcie); - if (ret) { - dev_err(dev, "Failed to register interrupt\n"); - return ret; - } + pcie->irq = platform_get_irq(pdev, 0); + if (pcie->irq < 0) + return pcie->irq;
pcie->reset_gpio = devm_gpiod_get_from_of_node(dev, dev->of_node, "reset-gpios", 0, @@ -1658,11 +1656,14 @@ static int advk_pcie_probe(struct platfo return ret; }
+ irq_set_chained_handler_and_data(pcie->irq, advk_pcie_irq_handler, pcie); + bridge->sysdata = pcie; bridge->ops = &advk_pcie_ops;
ret = pci_host_probe(bridge); if (ret < 0) { + irq_set_chained_handler_and_data(pcie->irq, NULL, NULL); advk_pcie_remove_msi_irq_domain(pcie); advk_pcie_remove_irq_domain(pcie); return ret; @@ -1710,6 +1711,9 @@ static int advk_pcie_remove(struct platf advk_writel(pcie, PCIE_ISR1_ALL_MASK, PCIE_ISR1_REG); advk_writel(pcie, PCIE_IRQ_ALL_MASK, HOST_CTRL_INT_STATUS_REG);
+ /* Remove IRQ handler */ + irq_set_chained_handler_and_data(pcie->irq, NULL, NULL); + /* Remove IRQ domains */ advk_pcie_remove_msi_irq_domain(pcie); advk_pcie_remove_irq_domain(pcie);
From: Pali Rohár pali@kernel.org
commit 51f96e287c6f003d3bb29672811c757c5fbf0028 upstream.
It is possible that we receive a spurious INTx interrupt. Check the return value of generic_handle_domain_irq() when processing an INTx IRQ.
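A user-space sketch of the check follows. fake_handle_domain_irq() is a hypothetical stand-in that returns -EINVAL when no mapping exists for the hardware IRQ number, which is the condition the patch tests for; the message format is copied from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in: bit i of mapped_mask says whether INTx line i has a mapping. */
static int fake_handle_domain_irq(unsigned int mapped_mask, int hwirq)
{
	return ((mapped_mask >> hwirq) & 1u) ? 0 : -EINVAL;
}

/* INTx index 0..3 maps to the letters A..D. */
static char intx_name(int i)
{
	return (char)(i + 'A');
}

/* Returns 1 and fills buf when the interrupt was unexpected, else 0. */
static int report_if_spurious(unsigned int mapped_mask, int i,
			      char *buf, size_t len)
{
	if (fake_handle_domain_irq(mapped_mask, i) == -EINVAL) {
		snprintf(buf, len, "unexpected INT%c IRQ", intx_name(i));
		return 1;
	}
	return 0;
}
```

The real driver uses dev_err_ratelimited() for the message, so a stuck spurious source cannot flood the log.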
Link: https://lore.kernel.org/r/20220110015018.26359-6-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -1429,7 +1429,9 @@ static void advk_pcie_handle_int(struct advk_writel(pcie, PCIE_ISR1_INTX_ASSERT(i), PCIE_ISR1_REG);
- generic_handle_domain_irq(pcie->irq_domain, i); + if (generic_handle_domain_irq(pcie->irq_domain, i) == -EINVAL) + dev_err_ratelimited(&pcie->pdev->dev, "unexpected INT%c IRQ\n", + (char)i + 'A'); } }
From: "Marek Behún" kabel@kernel.org
commit c3cb8e51839adc0aaef478c47665443d02f5aa07 upstream.
In [1] it was agreed that we should use struct irq_chip as a global static struct in the driver. Even though the structure currently contains a dynamic member (parent_device), in [2] the plans to kill it and make the structure completely static were set out.
Convert Aardvark's priv->msi_bottom_irq_chip and priv->msi_irq_chip to static driver structure.
[1] https://lore.kernel.org/linux-pci/877dbcvngf.wl-maz@kernel.org/ [2] https://lore.kernel.org/linux-pci/874k6gvkhz.wl-maz@kernel.org/
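The difference between a per-instance structure and a static driver structure can be sketched in plain C. The types below are simplified stand-ins, not the kernel's struct irq_chip or struct advk_pcie.

```c
#include <stddef.h>

/* Simplified stand-in for struct irq_chip. */
struct irq_chip_sketch {
	const char *name;
};

/* One static instance, shared by every controller the driver binds to. */
static struct irq_chip_sketch msi_chip = {
	.name = "advk-MSI",
};

/* Simplified stand-in for the per-controller private data. */
struct controller {
	int id;
	struct irq_chip_sketch *chip; /* points at the shared static struct */
};

static void controller_init(struct controller *c, int id)
{
	c->id = id;
	c->chip = &msi_chip; /* no per-instance copy to fill in at probe time */
}
```

Two controllers initialised this way end up with the same chip pointer, so the probe path no longer has to populate callbacks by hand for each instance.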
Link: https://lore.kernel.org/r/20220110015018.26359-7-kabel@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 26 ++++++++++++-------------- 1 file changed, 12 insertions(+), 14 deletions(-)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -274,8 +274,6 @@ struct advk_pcie { raw_spinlock_t irq_lock; struct irq_domain *msi_domain; struct irq_domain *msi_inner_domain; - struct irq_chip msi_bottom_irq_chip; - struct irq_chip msi_irq_chip; struct msi_domain_info msi_domain_info; DECLARE_BITMAP(msi_used, MSI_IRQ_NUM); struct mutex msi_used_lock; @@ -1194,6 +1192,12 @@ static int advk_msi_set_affinity(struct return -EINVAL; }
+static struct irq_chip advk_msi_bottom_irq_chip = { + .name = "MSI", + .irq_compose_msi_msg = advk_msi_irq_compose_msi_msg, + .irq_set_affinity = advk_msi_set_affinity, +}; + static int advk_msi_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, unsigned int nr_irqs, void *args) @@ -1210,7 +1214,7 @@ static int advk_msi_irq_domain_alloc(str
for (i = 0; i < nr_irqs; i++) irq_domain_set_info(domain, virq + i, hwirq + i, - &pcie->msi_bottom_irq_chip, + &advk_msi_bottom_irq_chip, domain->host_data, handle_simple_irq, NULL, NULL);
@@ -1280,29 +1284,23 @@ static const struct irq_domain_ops advk_ .xlate = irq_domain_xlate_onecell, };
+static struct irq_chip advk_msi_irq_chip = { + .name = "advk-MSI", +}; + static int advk_pcie_init_msi_irq_domain(struct advk_pcie *pcie) { struct device *dev = &pcie->pdev->dev; struct device_node *node = dev->of_node; - struct irq_chip *bottom_ic, *msi_ic; struct msi_domain_info *msi_di; phys_addr_t msi_msg_phys;
mutex_init(&pcie->msi_used_lock);
- bottom_ic = &pcie->msi_bottom_irq_chip; - - bottom_ic->name = "MSI"; - bottom_ic->irq_compose_msi_msg = advk_msi_irq_compose_msi_msg; - bottom_ic->irq_set_affinity = advk_msi_set_affinity; - - msi_ic = &pcie->msi_irq_chip; - msi_ic->name = "advk-MSI"; - msi_di = &pcie->msi_domain_info; msi_di->flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS | MSI_FLAG_MULTI_PCI_MSI; - msi_di->chip = msi_ic; + msi_di->chip = &advk_msi_irq_chip;
msi_msg_phys = virt_to_phys(&pcie->msi_msg);
From: "Marek Behún" kabel@kernel.org
commit 26bcd54e4a5cd51ec12d06fdc30e22863ed4c422 upstream.
Make Aardvark's msi_domain_info structure into a private driver structure. Domain info is the same for every potential instantiation of a controller.
Link: https://lore.kernel.org/r/20220110015018.26359-8-kabel@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -274,7 +274,6 @@ struct advk_pcie { raw_spinlock_t irq_lock; struct irq_domain *msi_domain; struct irq_domain *msi_inner_domain; - struct msi_domain_info msi_domain_info; DECLARE_BITMAP(msi_used, MSI_IRQ_NUM); struct mutex msi_used_lock; u16 msi_msg; @@ -1288,20 +1287,20 @@ static struct irq_chip advk_msi_irq_chip .name = "advk-MSI", };
+static struct msi_domain_info advk_msi_domain_info = { + .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS | + MSI_FLAG_MULTI_PCI_MSI, + .chip = &advk_msi_irq_chip, +}; + static int advk_pcie_init_msi_irq_domain(struct advk_pcie *pcie) { struct device *dev = &pcie->pdev->dev; struct device_node *node = dev->of_node; - struct msi_domain_info *msi_di; phys_addr_t msi_msg_phys;
mutex_init(&pcie->msi_used_lock);
- msi_di = &pcie->msi_domain_info; - msi_di->flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS | - MSI_FLAG_MULTI_PCI_MSI; - msi_di->chip = &advk_msi_irq_chip; - msi_msg_phys = virt_to_phys(&pcie->msi_msg);
advk_writel(pcie, lower_32_bits(msi_msg_phys), @@ -1317,7 +1316,8 @@ static int advk_pcie_init_msi_irq_domain
pcie->msi_domain = pci_msi_create_irq_domain(of_node_to_fwnode(node), - msi_di, pcie->msi_inner_domain); + &advk_msi_domain_info, + pcie->msi_inner_domain); if (!pcie->msi_domain) { irq_domain_remove(pcie->msi_inner_domain); return -ENOMEM;
From: "Marek Behún" kabel@kernel.org
commit 222af78532fa299cd9b1008e49c347b7f5a45c17 upstream.
Use simple
  dev_fwnode(dev)
instead of
  struct device_node *node = dev->of_node;
  of_node_to_fwnode(node)
especially since the node variable is not used elsewhere in the function.
Link: https://lore.kernel.org/r/20220110015018.26359-9-kabel@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -1296,7 +1296,6 @@ static struct msi_domain_info advk_msi_d static int advk_pcie_init_msi_irq_domain(struct advk_pcie *pcie) { struct device *dev = &pcie->pdev->dev; - struct device_node *node = dev->of_node; phys_addr_t msi_msg_phys;
mutex_init(&pcie->msi_used_lock); @@ -1315,7 +1314,7 @@ static int advk_pcie_init_msi_irq_domain return -ENOMEM;
pcie->msi_domain = - pci_msi_create_irq_domain(of_node_to_fwnode(node), + pci_msi_create_irq_domain(dev_fwnode(dev), &advk_msi_domain_info, pcie->msi_inner_domain); if (!pcie->msi_domain) {
From: Pali Rohár pali@kernel.org
commit 4689c0916320f112a8a33f2689d3addc3262f02c upstream.
Refactor the masking of ISR0/1 Sources and unmasking of summary MSI interrupt so that it corresponds to the comments:
- first mask all ISR0/1
- then unmask all MSIs
- then unmask summary MSI interrupt
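The three steps can be traced on simulated registers with the sketch below. The register struct and setup_masks() are hypothetical, and the bit position used for PCIE_ISR0_MSI_INT_PENDING is an assumed value for illustration; at this point in the series "unmask all MSIs" still means writing 0 to the MSI mask register.

```c
#include <stdint.h>

#define PCIE_ISR0_ALL_MASK        UINT32_C(0xffffffff)
#define PCIE_ISR1_ALL_MASK        UINT32_C(0xffffffff)
#define PCIE_MSI_ALL_MASK         UINT32_C(0xffffffff)
#define PCIE_ISR0_MSI_INT_PENDING (UINT32_C(1) << 24) /* assumed bit position */

/* Simulated mask registers; 1 = source masked (illustration only). */
struct mask_regs {
	uint32_t isr0_mask;
	uint32_t isr1_mask;
	uint32_t msi_mask;
};

static void setup_masks(struct mask_regs *r)
{
	/* Step 1: disable all ISR0/1 sources. */
	r->isr0_mask = PCIE_ISR0_ALL_MASK;
	r->isr1_mask = PCIE_ISR1_ALL_MASK;

	/* Step 2: unmask all MSIs. */
	r->msi_mask = ~PCIE_MSI_ALL_MASK;

	/* Step 3: unmask only the summary MSI bit in ISR0 (read-modify-write). */
	r->isr0_mask &= ~PCIE_ISR0_MSI_INT_PENDING;
}
```

The end state is the same as before the refactor; what changes is that each write now matches exactly one comment in the code.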
Link: https://lore.kernel.org/r/20220110015018.26359-10-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -571,15 +571,17 @@ static void advk_pcie_setup_hw(struct ad advk_writel(pcie, PCIE_IRQ_ALL_MASK, HOST_CTRL_INT_STATUS_REG);
/* Disable All ISR0/1 Sources */ - reg = PCIE_ISR0_ALL_MASK; - reg &= ~PCIE_ISR0_MSI_INT_PENDING; - advk_writel(pcie, reg, PCIE_ISR0_MASK_REG); - + advk_writel(pcie, PCIE_ISR0_ALL_MASK, PCIE_ISR0_MASK_REG); advk_writel(pcie, PCIE_ISR1_ALL_MASK, PCIE_ISR1_MASK_REG);
/* Unmask all MSIs */ advk_writel(pcie, ~(u32)PCIE_MSI_ALL_MASK, PCIE_MSI_MASK_REG);
+ /* Unmask summary MSI interrupt */ + reg = advk_readl(pcie, PCIE_ISR0_MASK_REG); + reg &= ~PCIE_ISR0_MSI_INT_PENDING; + advk_writel(pcie, reg, PCIE_ISR0_MASK_REG); + /* Enable summary interrupt for GIC SPI source */ reg = PCIE_IRQ_ALL_MASK & (~PCIE_IRQ_ENABLE_INTS_MASK); advk_writel(pcie, reg, HOST_CTRL_INT_MASK_REG);
From: Pali Rohár pali@kernel.org
commit e77d9c90691071769cd2b86ef097f7d07167dc3b upstream.
We should not unmask MSIs at setup, but only when the kernel asks for them to be unmasked.
At setup, mask all MSIs, and implement IRQ chip callbacks for masking and unmasking particular MSIs.
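The per-MSI mask and unmask operations amount to setting or clearing one bit in the mask register. A minimal user-space sketch, with hypothetical function names and without the raw spinlock the driver takes around the read-modify-write, could look like this:

```c
#include <stdint.h>

#define BIT(n)            (UINT32_C(1) << (n))
#define PCIE_MSI_ALL_MASK UINT32_C(0xffffffff) /* setup state: all masked */

/* Set the mask bit for one MSI; other bits are preserved. */
static uint32_t msi_mask(uint32_t mask_reg, unsigned int hwirq)
{
	return mask_reg | BIT(hwirq);
}

/* Clear the mask bit for one MSI; other bits are preserved. */
static uint32_t msi_unmask(uint32_t mask_reg, unsigned int hwirq)
{
	return mask_reg & ~BIT(hwirq);
}

/* An MSI can be delivered only while its mask bit is clear. */
static int msi_deliverable(uint32_t mask_reg, unsigned int hwirq)
{
	return (mask_reg & BIT(hwirq)) == 0;
}
```

Starting from PCIE_MSI_ALL_MASK, unmasking hwirq 3 leaves 0xfffffff7 in the register, and masking it again restores all ones. In the driver the lock matters because two CPUs may mask or unmask different MSIs concurrently and would otherwise lose each other's update.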
Link: https://lore.kernel.org/r/20220110015018.26359-11-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 54 ++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 49 insertions(+), 5 deletions(-)
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -274,6 +274,7 @@ struct advk_pcie {
 	raw_spinlock_t irq_lock;
 	struct irq_domain *msi_domain;
 	struct irq_domain *msi_inner_domain;
+	raw_spinlock_t msi_irq_lock;
 	DECLARE_BITMAP(msi_used, MSI_IRQ_NUM);
 	struct mutex msi_used_lock;
 	u16 msi_msg;
@@ -570,12 +571,10 @@ static void advk_pcie_setup_hw(struct ad
 	advk_writel(pcie, PCIE_ISR1_ALL_MASK, PCIE_ISR1_REG);
 	advk_writel(pcie, PCIE_IRQ_ALL_MASK, HOST_CTRL_INT_STATUS_REG);

-	/* Disable All ISR0/1 Sources */
+	/* Disable All ISR0/1 and MSI Sources */
 	advk_writel(pcie, PCIE_ISR0_ALL_MASK, PCIE_ISR0_MASK_REG);
 	advk_writel(pcie, PCIE_ISR1_ALL_MASK, PCIE_ISR1_MASK_REG);
-
-	/* Unmask all MSIs */
-	advk_writel(pcie, ~(u32)PCIE_MSI_ALL_MASK, PCIE_MSI_MASK_REG);
+	advk_writel(pcie, PCIE_MSI_ALL_MASK, PCIE_MSI_MASK_REG);

 	/* Unmask summary MSI interrupt */
 	reg = advk_readl(pcie, PCIE_ISR0_MASK_REG);
@@ -1193,10 +1192,52 @@ static int advk_msi_set_affinity(struct
 	return -EINVAL;
 }

+static void advk_msi_irq_mask(struct irq_data *d)
+{
+	struct advk_pcie *pcie = d->domain->host_data;
+	irq_hw_number_t hwirq = irqd_to_hwirq(d);
+	unsigned long flags;
+	u32 mask;
+
+	raw_spin_lock_irqsave(&pcie->msi_irq_lock, flags);
+	mask = advk_readl(pcie, PCIE_MSI_MASK_REG);
+	mask |= BIT(hwirq);
+	advk_writel(pcie, mask, PCIE_MSI_MASK_REG);
+	raw_spin_unlock_irqrestore(&pcie->msi_irq_lock, flags);
+}
+
+static void advk_msi_irq_unmask(struct irq_data *d)
+{
+	struct advk_pcie *pcie = d->domain->host_data;
+	irq_hw_number_t hwirq = irqd_to_hwirq(d);
+	unsigned long flags;
+	u32 mask;
+
+	raw_spin_lock_irqsave(&pcie->msi_irq_lock, flags);
+	mask = advk_readl(pcie, PCIE_MSI_MASK_REG);
+	mask &= ~BIT(hwirq);
+	advk_writel(pcie, mask, PCIE_MSI_MASK_REG);
+	raw_spin_unlock_irqrestore(&pcie->msi_irq_lock, flags);
+}
+
+static void advk_msi_top_irq_mask(struct irq_data *d)
+{
+	pci_msi_mask_irq(d);
+	irq_chip_mask_parent(d);
+}
+
+static void advk_msi_top_irq_unmask(struct irq_data *d)
+{
+	pci_msi_unmask_irq(d);
+	irq_chip_unmask_parent(d);
+}
+
 static struct irq_chip advk_msi_bottom_irq_chip = {
 	.name			= "MSI",
 	.irq_compose_msi_msg	= advk_msi_irq_compose_msi_msg,
 	.irq_set_affinity	= advk_msi_set_affinity,
+	.irq_mask		= advk_msi_irq_mask,
+	.irq_unmask		= advk_msi_irq_unmask,
 };

 static int advk_msi_irq_domain_alloc(struct irq_domain *domain,
@@ -1286,7 +1327,9 @@ static const struct irq_domain_ops advk_
 };

 static struct irq_chip advk_msi_irq_chip = {
-	.name = "advk-MSI",
+	.name		= "advk-MSI",
+	.irq_mask	= advk_msi_top_irq_mask,
+	.irq_unmask	= advk_msi_top_irq_unmask,
 };

 static struct msi_domain_info advk_msi_domain_info = {
@@ -1300,6 +1343,7 @@ static int advk_pcie_init_msi_irq_domain
 	struct device *dev = &pcie->pdev->dev;
 	phys_addr_t msi_msg_phys;

+	raw_spin_lock_init(&pcie->msi_irq_lock);
 	mutex_init(&pcie->msi_used_lock);

 	msi_msg_phys = virt_to_phys(&pcie->msi_msg);
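The masking scheme in the patch above can be sketched in userspace roughly as follows. This is only an illustration, not driver code: `msi_mask_reg` is a hypothetical variable standing in for PCIE_MSI_MASK_REG, the value 32 for `MSI_IRQ_NUM` is an assumption, and the spinlock that the driver takes around the read-modify-write is left out.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_IRQ_NUM 32u /* assumption: one mask bit per MSI, hwirq 0..31 */

/* Hypothetical stand-in for the hardware PCIE_MSI_MASK_REG register. */
static uint32_t msi_mask_reg;

/* Setup: mask every MSI. A set bit means "masked". */
static void msi_setup(void)
{
	msi_mask_reg = UINT32_MAX;
}

/* Mask one MSI by setting its bit (read-modify-write). */
static void msi_mask(unsigned int hwirq)
{
	assert(hwirq < MSI_IRQ_NUM);
	msi_mask_reg |= UINT32_C(1) << hwirq;
}

/* Unmask one MSI by clearing its bit (read-modify-write). */
static void msi_unmask(unsigned int hwirq)
{
	assert(hwirq < MSI_IRQ_NUM);
	msi_mask_reg &= ~(UINT32_C(1) << hwirq);
}
```

After `msi_setup()`, nothing is delivered until something explicitly calls `msi_unmask()` for a given hwirq, which is the behaviour the commit message asks for.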
From: Pali Rohár pali@kernel.org
commit 46ad3dc4171b5ee1d12267d70112563d5760210a upstream.
MSI address for receiving MSI interrupts needs to be correctly set before enabling processing of MSI interrupts.
Move code for setting PCIE_MSI_ADDR_LOW_REG and PCIE_MSI_ADDR_HIGH_REG from advk_pcie_init_msi_irq_domain() to advk_pcie_setup_hw(), before enabling PCIE_CORE_CTRL2_MSI_ENABLE.
After this we can remove the now unused member msi_msg, which was used only for MSI doorbell address. MSI address can be any address which cannot be used to DMA to. So change it to the address of the main struct advk_pcie.
Link: https://lore.kernel.org/r/20220110015018.26359-12-kabel@kernel.org
Fixes: 8c39d710363c ("PCI: aardvark: Add Aardvark PCI host controller driver")
Signed-off-by: Pali Rohár pali@kernel.org
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Acked-by: Marc Zyngier maz@kernel.org
Cc: stable@vger.kernel.org # f21a8b1b6837 ("PCI: aardvark: Move to MSI handling using generic MSI support")
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/controller/pci-aardvark.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -277,7 +277,6 @@ struct advk_pcie {
 	raw_spinlock_t msi_irq_lock;
 	DECLARE_BITMAP(msi_used, MSI_IRQ_NUM);
 	struct mutex msi_used_lock;
-	u16 msi_msg;
 	int link_gen;
 	struct pci_bridge_emul bridge;
 	struct gpio_desc *reset_gpio;
@@ -472,6 +471,7 @@ static void advk_pcie_disable_ob_win(str

 static void advk_pcie_setup_hw(struct advk_pcie *pcie)
 {
+	phys_addr_t msi_addr;
 	u32 reg;
 	int i;

@@ -560,6 +560,11 @@ static void advk_pcie_setup_hw(struct ad
 	reg |= LANE_COUNT_1;
 	advk_writel(pcie, reg, PCIE_CORE_CTRL0_REG);

+	/* Set MSI address */
+	msi_addr = virt_to_phys(pcie);
+	advk_writel(pcie, lower_32_bits(msi_addr), PCIE_MSI_ADDR_LOW_REG);
+	advk_writel(pcie, upper_32_bits(msi_addr), PCIE_MSI_ADDR_HIGH_REG);
+
 	/* Enable MSI */
 	reg = advk_readl(pcie, PCIE_CORE_CTRL2_REG);
 	reg |= PCIE_CORE_CTRL2_MSI_ENABLE;
@@ -1179,10 +1184,10 @@ static void advk_msi_irq_compose_msi_msg
 					 struct msi_msg *msg)
 {
 	struct advk_pcie *pcie = irq_data_get_irq_chip_data(data);
-	phys_addr_t msi_msg = virt_to_phys(&pcie->msi_msg);
+	phys_addr_t msi_addr = virt_to_phys(pcie);

-	msg->address_lo = lower_32_bits(msi_msg);
-	msg->address_hi = upper_32_bits(msi_msg);
+	msg->address_lo = lower_32_bits(msi_addr);
+	msg->address_hi = upper_32_bits(msi_addr);
 	msg->data = data->hwirq;
 }

@@ -1341,18 +1346,10 @@ static struct msi_domain_info advk_msi_d
 static int advk_pcie_init_msi_irq_domain(struct advk_pcie *pcie)
 {
 	struct device *dev = &pcie->pdev->dev;
-	phys_addr_t msi_msg_phys;

 	raw_spin_lock_init(&pcie->msi_irq_lock);
 	mutex_init(&pcie->msi_used_lock);

-	msi_msg_phys = virt_to_phys(&pcie->msi_msg);
-
-	advk_writel(pcie, lower_32_bits(msi_msg_phys),
-		    PCIE_MSI_ADDR_LOW_REG);
-	advk_writel(pcie, upper_32_bits(msi_msg_phys),
-		    PCIE_MSI_ADDR_HIGH_REG);
-
 	pcie->msi_inner_domain =
 		irq_domain_add_linear(NULL, MSI_IRQ_NUM,
 				      &advk_msi_domain_ops, pcie);
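The way the doorbell address is split across the low and high registers, and reused when composing the MSI message, might be sketched in userspace like this. The struct and helper names are hypothetical stand-ins (the kernel uses `struct msi_msg`, `lower_32_bits()` and `upper_32_bits()`), and the address in the usage note is an arbitrary example value.

```c
#include <stdint.h>

/* Hypothetical userspace mirror of the struct msi_msg fields used by the driver. */
struct msi_msg_sketch {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Stand-ins for the kernel's lower_32_bits() / upper_32_bits() helpers. */
static uint32_t lower_32(uint64_t v) { return (uint32_t)(v & 0xFFFFFFFFu); }
static uint32_t upper_32(uint64_t v) { return (uint32_t)(v >> 32); }

/* Compose the message a device would write: doorbell address plus hwirq as data. */
static struct msi_msg_sketch compose_msi_msg(uint64_t doorbell, uint32_t hwirq)
{
	struct msi_msg_sketch msg = {
		.address_lo = lower_32(doorbell),
		.address_hi = upper_32(doorbell),
		.data = hwirq,
	};
	return msg;
}
```

For example, `compose_msi_msg(0x0000000123456780ULL, 5)` yields `address_lo` 0x23456780, `address_hi` 0x1 and `data` 5. The same two halves are what setup would program into the address registers before MSI processing is enabled.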
From: Pali Rohár pali@kernel.org
commit 754e449889b22fc3c34235e8836f08f51121d307 upstream.
According to PCI 3.0 specification, sending both MSI and MSI-X interrupts is done by DWORD memory write operation to doorbell message address. The write operation for MSI has zero upper 16 bits and the MSI interrupt number in the lower 16 bits, while the write operation for MSI-X contains a 32-bit value from MSI-X table.
Since the driver only uses interrupt numbers from range 0..31, the upper 16 bits of the DWORD memory write operation to doorbell message address are zero even for MSI-X interrupts. Thus we can enable MSI-X interrupts.
Testing proves that the kernel can correctly receive MSI-X interrupts from PCIe cards which support both MSI and MSI-X interrupts.
Link: https://lore.kernel.org/r/20220110015018.26359-13-kabel@kernel.org
Signed-off-by: Pali Rohár pali@kernel.org
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/controller/pci-aardvark.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -1339,7 +1339,7 @@ static struct irq_chip advk_msi_irq_chip

 static struct msi_domain_info advk_msi_domain_info = {
 	.flags	= MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
-		  MSI_FLAG_MULTI_PCI_MSI,
+		  MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX,
 	.chip	= &advk_msi_irq_chip,
 };
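The argument in the commit message above (data values 0..31 always have zero upper 16 bits, so an MSI-X write is indistinguishable from an MSI write at the doorbell) can be checked with a small userspace sketch. The helper names are hypothetical, and `MSI_IRQ_NUM` being 32 is an assumption taken from the "range 0..31" wording.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSI_IRQ_NUM 32u /* assumption: driver hands out hwirq 0..31 only */

/* An MSI doorbell write carries zero in the upper 16 bits of the DWORD. */
static bool upper_half_is_zero(uint32_t data)
{
	return (data >> 16) == 0;
}

/* Check that every data value the driver can assign looks like an MSI write. */
static bool all_hwirqs_fit_msi_format(void)
{
	for (uint32_t hwirq = 0; hwirq < MSI_IRQ_NUM; hwirq++)
		if (!upper_half_is_zero(hwirq))
			return false;
	return true;
}
```

A full 32-bit MSI-X table value such as 0x00010000 would fail `upper_half_is_zero()`, but the driver never programs one, which is why adding MSI_FLAG_PCI_MSIX is enough.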
From: Pali Rohár pali@kernel.org
commit 3ebfefa396ebee21061fd5fa36073368ed2cd467 upstream.
ERR interrupt is triggered when corresponding bit is unmasked in both ISR0 and PCI_EXP_DEVCTL registers. Unmasking ERR bits in PCI_EXP_DEVCTL register is not enough. This means that currently the ERR interrupt is never triggered.
Unmask ERR bits in ISR0 register at driver probe time. ERR interrupt is not triggered until ERR bits are unmasked also in PCI_EXP_DEVCTL register, which is done by AER driver. So it is safe to unconditionally unmask all ERR bits in aardvark probe.
Aardvark HW sets PCI_ERR_ROOT_AER_IRQ to zero and when corresponding bits in ISR0 and PCI_EXP_DEVCTL are enabled, the HW triggers a generic interrupt on GIC. Chain this interrupt to PCIe interrupt 0 with generic_handle_domain_irq() to allow processing of ERR interrupts.
Link: https://lore.kernel.org/r/20220110015018.26359-14-kabel@kernel.org
Signed-off-by: Pali Rohár pali@kernel.org
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/controller/pci-aardvark.c | 35 +++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -98,6 +98,10 @@
 #define PCIE_MSG_PM_PME_MASK BIT(7)
 #define PCIE_ISR0_MASK_REG (CONTROL_BASE_ADDR + 0x44)
 #define PCIE_ISR0_MSI_INT_PENDING BIT(24)
+#define PCIE_ISR0_CORR_ERR BIT(11)
+#define PCIE_ISR0_NFAT_ERR BIT(12)
+#define PCIE_ISR0_FAT_ERR BIT(13)
+#define PCIE_ISR0_ERR_MASK GENMASK(13, 11)
 #define PCIE_ISR0_INTX_ASSERT(val) BIT(16 + (val))
 #define PCIE_ISR0_INTX_DEASSERT(val) BIT(20 + (val))
 #define PCIE_ISR0_ALL_MASK GENMASK(31, 0)
@@ -778,11 +782,15 @@ advk_pci_bridge_emul_base_conf_read(stru
 	case PCI_INTERRUPT_LINE: {
 		/*
 		 * From the whole 32bit register we support reading from HW only
-		 * one bit: PCI_BRIDGE_CTL_BUS_RESET.
+		 * two bits: PCI_BRIDGE_CTL_BUS_RESET and PCI_BRIDGE_CTL_SERR.
 		 * Other bits are retrieved only from emulated config buffer.
 		 */
 		__le32 *cfgspace = (__le32 *)&bridge->conf;
 		u32 val = le32_to_cpu(cfgspace[PCI_INTERRUPT_LINE / 4]);
+		if (advk_readl(pcie, PCIE_ISR0_MASK_REG) & PCIE_ISR0_ERR_MASK)
+			val &= ~(PCI_BRIDGE_CTL_SERR << 16);
+		else
+			val |= PCI_BRIDGE_CTL_SERR << 16;
 		if (advk_readl(pcie, PCIE_CORE_CTRL1_REG) & HOT_RESET_GEN)
 			val |= PCI_BRIDGE_CTL_BUS_RESET << 16;
 		else
@@ -808,6 +816,19 @@ advk_pci_bridge_emul_base_conf_write(str
 		break;

 	case PCI_INTERRUPT_LINE:
+		/*
+		 * According to Figure 6-3: Pseudo Logic Diagram for Error
+		 * Message Controls in PCIe base specification, SERR# Enable bit
+		 * in Bridge Control register enable receiving of ERR_* messages
+		 */
+		if (mask & (PCI_BRIDGE_CTL_SERR << 16)) {
+			u32 val = advk_readl(pcie, PCIE_ISR0_MASK_REG);
+			if (new & (PCI_BRIDGE_CTL_SERR << 16))
+				val &= ~PCIE_ISR0_ERR_MASK;
+			else
+				val |= PCIE_ISR0_ERR_MASK;
+			advk_writel(pcie, val, PCIE_ISR0_MASK_REG);
+		}
 		if (mask & (PCI_BRIDGE_CTL_BUS_RESET << 16)) {
 			u32 val = advk_readl(pcie, PCIE_CORE_CTRL1_REG);
 			if (new & (PCI_BRIDGE_CTL_BUS_RESET << 16))
@@ -1457,6 +1478,18 @@ static void advk_pcie_handle_int(struct
 	isr1_mask = advk_readl(pcie, PCIE_ISR1_MASK_REG);
 	isr1_status = isr1_val & ((~isr1_mask) & PCIE_ISR1_ALL_MASK);

+	/* Process ERR interrupt */
+	if (isr0_status & PCIE_ISR0_ERR_MASK) {
+		advk_writel(pcie, PCIE_ISR0_ERR_MASK, PCIE_ISR0_REG);
+
+		/*
+		 * Aardvark HW returns zero for PCI_ERR_ROOT_AER_IRQ, so use
+		 * PCIe interrupt 0
+		 */
+		if (generic_handle_domain_irq(pcie->irq_domain, 0) == -EINVAL)
+			dev_err_ratelimited(&pcie->pdev->dev, "unhandled ERR IRQ\n");
+	}
+
 	/* Process MSI interrupts */
 	if (isr0_status & PCIE_ISR0_MSI_INT_PENDING)
 		advk_pcie_handle_msi(pcie);
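How the emulated SERR# Enable bit is tied to the three ERR mask bits in ISR0 in the patch above can be modelled in userspace as below. This is a sketch only: `isr0_mask_reg` is a hypothetical variable standing in for PCIE_ISR0_MASK_REG, the 16-bit shift into the combined register is dropped for brevity, and the value 0x02 for PCI_BRIDGE_CTL_SERR is assumed from the Linux pci_regs.h header.

```c
#include <stdint.h>

#define ISR0_ERR_MASK   UINT32_C(0x3800) /* GENMASK(13, 11): CORR, NFAT, FAT */
#define BRIDGE_CTL_SERR UINT16_C(0x02)   /* assumed value of PCI_BRIDGE_CTL_SERR */

/* Hypothetical stand-in for PCIE_ISR0_MASK_REG; everything masked at start. */
static uint32_t isr0_mask_reg = UINT32_MAX;

/* Write path: touch the ERR mask bits only if the write covers the SERR bit. */
static void bridge_ctl_write(uint16_t new_val, uint16_t mask)
{
	if (!(mask & BRIDGE_CTL_SERR))
		return;
	if (new_val & BRIDGE_CTL_SERR)
		isr0_mask_reg &= ~ISR0_ERR_MASK; /* unmask ERR interrupts */
	else
		isr0_mask_reg |= ISR0_ERR_MASK;  /* mask ERR interrupts */
}

/* Read path: SERR reads as set only when no ERR bit is masked. */
static uint16_t bridge_ctl_read(void)
{
	return (isr0_mask_reg & ISR0_ERR_MASK) ? 0 : BRIDGE_CTL_SERR;
}
```

In this model a write with `mask` not covering the SERR bit leaves the ERR mask bits untouched, mirroring the `if (mask & ...)` guard in the patch.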
From: Pali Rohár pali@kernel.org
commit 7122bcb33295228c882c0aa32a04b2547beba2c3 upstream.
To optimize advk_pci_bridge_emul_pcie_conf_write() code, touch PCIE_ISR0_REG and PCIE_ISR0_MASK_REG registers only when it is really needed, when processing PCI_EXP_RTCTL_PMEIE and PCI_EXP_RTSTA_PME bits.
Link: https://lore.kernel.org/r/20220110015018.26359-16-kabel@kernel.org
Signed-off-by: Pali Rohár pali@kernel.org
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/controller/pci-aardvark.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -925,19 +925,21 @@ advk_pci_bridge_emul_pcie_conf_write(str
 			advk_pcie_wait_for_retrain(pcie);
 		break;

-	case PCI_EXP_RTCTL: {
+	case PCI_EXP_RTCTL:
 		/* Only mask/unmask PME interrupt */
-		u32 val = advk_readl(pcie, PCIE_ISR0_MASK_REG) &
-			~PCIE_MSG_PM_PME_MASK;
-		if ((new & PCI_EXP_RTCTL_PMEIE) == 0)
-			val |= PCIE_MSG_PM_PME_MASK;
-		advk_writel(pcie, val, PCIE_ISR0_MASK_REG);
+		if (mask & PCI_EXP_RTCTL_PMEIE) {
+			u32 val = advk_readl(pcie, PCIE_ISR0_MASK_REG);
+			if (new & PCI_EXP_RTCTL_PMEIE)
+				val &= ~PCIE_MSG_PM_PME_MASK;
+			else
+				val |= PCIE_MSG_PM_PME_MASK;
+			advk_writel(pcie, val, PCIE_ISR0_MASK_REG);
+		}
 		break;
-	}

 	case PCI_EXP_RTSTA:
-		new = (new & PCI_EXP_RTSTA_PME) >> 9;
-		advk_writel(pcie, new, PCIE_ISR0_REG);
+		if (new & PCI_EXP_RTSTA_PME)
+			advk_writel(pcie, PCIE_MSG_PM_PME_MASK, PCIE_ISR0_REG);
 		break;

 	case PCI_EXP_DEVCTL:
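The PCI_EXP_RTSTA change above replaces a shift by 9 with an explicit constant. The small sketch below illustrates why the two produce the same value when the PME bit is set, and how the new form also skips the register write when it is not. The helper names are hypothetical, and the value of PCI_EXP_RTSTA_PME (bit 16) is assumed from the Linux pci_regs.h header.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTSTA_PME   UINT32_C(0x00010000) /* assumed PCI_EXP_RTSTA_PME, bit 16 */
#define PM_PME_MASK UINT32_C(0x00000080) /* PCIE_MSG_PM_PME_MASK, BIT(7) */

/* Old behaviour: always compute a value to write to ISR0, possibly zero. */
static uint32_t old_isr0_value(uint32_t new_val)
{
	return (new_val & RTSTA_PME) >> 9; /* bit 16 lands on bit 7 */
}

/* New behaviour: decide whether a write to ISR0 is needed at all. */
static bool new_needs_write(uint32_t new_val)
{
	return (new_val & RTSTA_PME) != 0;
}
```

When `new_needs_write()` is true the patched code writes PM_PME_MASK, which equals what `old_isr0_value()` would have produced; when it is false, the old code wrote zero and the new code writes nothing.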
From: Pali Rohár pali@kernel.org
commit 0fc75d87454195885bd1a81fc7e6ce92572b6109 upstream.
Currently enabling PCI_EXP_RTSTA_PME bit in PCI_EXP_RTCTL register does nothing. This is because PCIe PME driver expects to receive PCIe interrupt defined in PCI_EXP_FLAGS_IRQ register, but aardvark hardware does not trigger PCIe INTx/MSI interrupt for PME event, rather it triggers custom aardvark interrupt which this driver is not processing yet.
Fix this issue by handling PME interrupt in advk_pcie_handle_int() and chaining it to PCIe interrupt 0 with generic_handle_domain_irq() (since aardvark sets PCI_EXP_FLAGS_IRQ to zero). With this change PCIe PME driver finally starts receiving PME interrupt.
Link: https://lore.kernel.org/r/20220110015018.26359-17-kabel@kernel.org
Signed-off-by: Pali Rohár pali@kernel.org
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/controller/pci-aardvark.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -1480,6 +1480,18 @@ static void advk_pcie_handle_int(struct
 	isr1_mask = advk_readl(pcie, PCIE_ISR1_MASK_REG);
 	isr1_status = isr1_val & ((~isr1_mask) & PCIE_ISR1_ALL_MASK);

+	/* Process PME interrupt */
+	if (isr0_status & PCIE_MSG_PM_PME_MASK) {
+		/*
+		 * Do not clear PME interrupt bit in ISR0, it is cleared by IRQ
+		 * receiver by writing to the PCI_EXP_RTSTA register of emulated
+		 * root bridge. Aardvark HW returns zero for PCI_EXP_FLAGS_IRQ,
+		 * so use PCIe interrupt 0.
+		 */
+		if (generic_handle_domain_irq(pcie->irq_domain, 0) == -EINVAL)
+			dev_err_ratelimited(&pcie->pdev->dev, "unhandled PME IRQ\n");
+	}
+
 	/* Process ERR interrupt */
 	if (isr0_status & PCIE_ISR0_ERR_MASK) {
 		advk_writel(pcie, PCIE_ISR0_ERR_MASK, PCIE_ISR0_REG);
From: Pali Rohár pali@kernel.org
commit 273ddd86d67694e3639e3bfe337a96d8861798b8 upstream.
Enable aardvark PME interrupt unconditionally by unmasking it and read PME requester ID to emulated bridge config space immediately after receiving interrupt.
PME requester ID is stored in the PCIE_MSG_LOG_REG register, which contains the last inbound message. So when new inbound message is received by HW (including non-PM), the content in PCIE_MSG_LOG_REG register is replaced by a new value.
PCIe specification mandates that subsequent PMEs are kept pending until the PME Status Register bit is cleared by software by writing a 1b.
Support for masking/unmasking PME interrupt on emulated bridge via PCI_EXP_RTCTL_PMEIE bit is now implemented only in emulated bridge config space, to ensure that we do not miss any aardvark PME interrupt.
Reading of PCI_EXP_RTCAP and PCI_EXP_RTSTA registers is simplified as final value is now always stored into emulated bridge config space by the interrupt handler, so there is no need to implement support for these registers in read_pcie callback.
Clearing of W1C bit PCI_EXP_RTSTA_PME is now also simplified as it is done by pci-bridge-emul.c code for emulated bridge config space. So there is no need to implement support for clearing this bit in write_pcie callback.
Link: https://lore.kernel.org/r/20220110015018.26359-18-kabel@kernel.org
Signed-off-by: Pali Rohár pali@kernel.org
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/controller/pci-aardvark.c | 91 ++++++++++++++++++----------------
 1 file changed, 50 insertions(+), 41 deletions(-)
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -590,6 +590,11 @@ static void advk_pcie_setup_hw(struct ad
 	reg &= ~PCIE_ISR0_MSI_INT_PENDING;
 	advk_writel(pcie, reg, PCIE_ISR0_MASK_REG);

+	/* Unmask PME interrupt for processing of PME requester */
+	reg = advk_readl(pcie, PCIE_ISR0_MASK_REG);
+	reg &= ~PCIE_MSG_PM_PME_MASK;
+	advk_writel(pcie, reg, PCIE_ISR0_MASK_REG);
+
 	/* Enable summary interrupt for GIC SPI source */
 	reg = PCIE_IRQ_ALL_MASK & (~PCIE_IRQ_ENABLE_INTS_MASK);
 	advk_writel(pcie, reg, HOST_CTRL_INT_MASK_REG);
@@ -856,22 +861,11 @@ advk_pci_bridge_emul_pcie_conf_read(stru
 		*value = PCI_EXP_SLTSTA_PDS << 16;
 		return PCI_BRIDGE_EMUL_HANDLED;

-	case PCI_EXP_RTCTL: {
-		u32 val = advk_readl(pcie, PCIE_ISR0_MASK_REG);
-		*value = (val & PCIE_MSG_PM_PME_MASK) ? 0 : PCI_EXP_RTCTL_PMEIE;
-		*value |= le16_to_cpu(bridge->pcie_conf.rootctl) & PCI_EXP_RTCTL_CRSSVE;
-		*value |= PCI_EXP_RTCAP_CRSVIS << 16;
-		return PCI_BRIDGE_EMUL_HANDLED;
-	}
-
-	case PCI_EXP_RTSTA: {
-		u32 isr0 = advk_readl(pcie, PCIE_ISR0_REG);
-		u32 msglog = advk_readl(pcie, PCIE_MSG_LOG_REG);
-		*value = msglog >> 16;
-		if (isr0 & PCIE_MSG_PM_PME_MASK)
-			*value |= PCI_EXP_RTSTA_PME;
-		return PCI_BRIDGE_EMUL_HANDLED;
-	}
+	/*
+	 * PCI_EXP_RTCTL and PCI_EXP_RTSTA are also supported, but do not need
+	 * to be handled here, because their values are stored in emulated
+	 * config space buffer, and we read them from there when needed.
+	 */

 	case PCI_EXP_LNKCAP: {
 		u32 val = advk_readl(pcie, PCIE_CORE_PCIEXP_CAP + reg);
@@ -925,22 +919,19 @@ advk_pci_bridge_emul_pcie_conf_write(str
 			advk_pcie_wait_for_retrain(pcie);
 		break;

-	case PCI_EXP_RTCTL:
-		/* Only mask/unmask PME interrupt */
-		if (mask & PCI_EXP_RTCTL_PMEIE) {
-			u32 val = advk_readl(pcie, PCIE_ISR0_MASK_REG);
-			if (new & PCI_EXP_RTCTL_PMEIE)
-				val &= ~PCIE_MSG_PM_PME_MASK;
-			else
-				val |= PCIE_MSG_PM_PME_MASK;
-			advk_writel(pcie, val, PCIE_ISR0_MASK_REG);
-		}
+	case PCI_EXP_RTCTL: {
+		u16 rootctl = le16_to_cpu(bridge->pcie_conf.rootctl);
+		/* Only emulation of PMEIE and CRSSVE bits is provided */
+		rootctl &= PCI_EXP_RTCTL_PMEIE | PCI_EXP_RTCTL_CRSSVE;
+		bridge->pcie_conf.rootctl = cpu_to_le16(rootctl);
 		break;
+	}

-	case PCI_EXP_RTSTA:
-		if (new & PCI_EXP_RTSTA_PME)
-			advk_writel(pcie, PCIE_MSG_PM_PME_MASK, PCIE_ISR0_REG);
-		break;
+	/*
+	 * PCI_EXP_RTSTA is also supported, but does not need to be handled
+	 * here, because its value is stored in emulated config space buffer,
+	 * and we write it there when needed.
+	 */

 	case PCI_EXP_DEVCTL:
 	case PCI_EXP_DEVCTL2:
@@ -1445,6 +1436,32 @@ static void advk_pcie_remove_irq_domain(
 	irq_domain_remove(pcie->irq_domain);
 }

+static void advk_pcie_handle_pme(struct advk_pcie *pcie)
+{
+	u32 requester = advk_readl(pcie, PCIE_MSG_LOG_REG) >> 16;
+
+	advk_writel(pcie, PCIE_MSG_PM_PME_MASK, PCIE_ISR0_REG);
+
+	/*
+	 * PCIE_MSG_LOG_REG contains the last inbound message, so store
+	 * the requester ID only when PME was not asserted yet.
+	 * Also do not trigger PME interrupt when PME is still asserted.
+	 */
+	if (!(le32_to_cpu(pcie->bridge.pcie_conf.rootsta) & PCI_EXP_RTSTA_PME)) {
+		pcie->bridge.pcie_conf.rootsta = cpu_to_le32(requester | PCI_EXP_RTSTA_PME);
+
+		/*
+		 * Trigger PME interrupt only if PMEIE bit in Root Control is set.
+		 * Aardvark HW returns zero for PCI_EXP_FLAGS_IRQ, so use PCIe interrupt 0.
+		 */
+		if (!(le16_to_cpu(pcie->bridge.pcie_conf.rootctl) & PCI_EXP_RTCTL_PMEIE))
+			return;
+
+		if (generic_handle_domain_irq(pcie->irq_domain, 0) == -EINVAL)
+			dev_err_ratelimited(&pcie->pdev->dev, "unhandled PME IRQ\n");
+	}
+}
+
 static void advk_pcie_handle_msi(struct advk_pcie *pcie)
 {
 	u32 msi_val, msi_mask, msi_status, msi_idx;
@@ -1480,17 +1497,9 @@ static void advk_pcie_handle_int(struct
 	isr1_mask = advk_readl(pcie, PCIE_ISR1_MASK_REG);
 	isr1_status = isr1_val & ((~isr1_mask) & PCIE_ISR1_ALL_MASK);

-	/* Process PME interrupt */
-	if (isr0_status & PCIE_MSG_PM_PME_MASK) {
-		/*
-		 * Do not clear PME interrupt bit in ISR0, it is cleared by IRQ
-		 * receiver by writing to the PCI_EXP_RTSTA register of emulated
-		 * root bridge. Aardvark HW returns zero for PCI_EXP_FLAGS_IRQ,
-		 * so use PCIe interrupt 0.
-		 */
-		if (generic_handle_domain_irq(pcie->irq_domain, 0) == -EINVAL)
-			dev_err_ratelimited(&pcie->pdev->dev, "unhandled PME IRQ\n");
-	}
+	/* Process PME interrupt as the first one to do not miss PME requester id */
+	if (isr0_status & PCIE_MSG_PM_PME_MASK)
+		advk_pcie_handle_pme(pcie);

 	/* Process ERR interrupt */
 	if (isr0_status & PCIE_ISR0_ERR_MASK) {
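The latching rule implemented by advk_pcie_handle_pme() in the patch above (store the requester ID only while PME Status is clear, and forward the interrupt only when PMEIE is set) might be modelled in userspace as follows. The struct and function names are hypothetical, endianness conversions are omitted, and the bit values for PCI_EXP_RTSTA_PME and PCI_EXP_RTCTL_PMEIE are assumed from the Linux pci_regs.h header.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTSTA_PME   UINT32_C(0x00010000) /* assumed PCI_EXP_RTSTA_PME */
#define RTCTL_PMEIE UINT16_C(0x0008)     /* assumed PCI_EXP_RTCTL_PMEIE */

/* Hypothetical model of the two emulated Root Control/Status fields. */
struct emul_bridge {
	uint16_t rootctl;
	uint32_t rootsta;
};

/*
 * Handle one PME event. msg_log_reg models PCIE_MSG_LOG_REG, whose upper
 * 16 bits hold the requester ID of the last inbound message.
 * Returns true when the PME interrupt should be forwarded.
 */
static bool handle_pme(struct emul_bridge *b, uint32_t msg_log_reg)
{
	uint32_t requester = msg_log_reg >> 16;

	/* A PME is still pending: keep the first requester, raise nothing. */
	if (b->rootsta & RTSTA_PME)
		return false;

	b->rootsta = requester | RTSTA_PME;

	/* Status is latched either way; the interrupt needs PMEIE. */
	return (b->rootctl & RTCTL_PMEIE) != 0;
}
```

In this model, a second event arriving before software clears PME Status (modelled by resetting `rootsta`) neither overwrites the stored requester ID nor raises another interrupt, and an event with PMEIE clear is latched without being forwarded.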
From: Pali Rohár pali@kernel.org
commit 815bc313686783e3a1823ec0efc332c70e6bd976 upstream.
Emulated root bridge currently provides only one Legacy INTA interrupt which is used for reporting PCIe PME and ERR events and handled by kernel PCIe PME and AER drivers.
Aardvark HW reports these PME and ERR events separately, so there is no need to mix real INTA interrupt and emulated INTA interrupt for PCIe PME and AER drivers.
Register a new advk-RP (as in Root Port) irq chip and a new irq domain for emulated root bridge and use this new separate irq domain for providing INTA interrupt from emulated root bridge for PME and ERR events.
The real INTA interrupt from real devices is now separate.
A custom map_irq callback function on PCI host bridge structure is used to allocate IRQ mapping for emulated root bridge from new irq domain. Original callback of_irq_parse_and_map_pci() is used for all other devices as before.
Link: https://lore.kernel.org/r/20220110015018.26359-19-kabel@kernel.org
Signed-off-by: Pali Rohár pali@kernel.org
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/controller/pci-aardvark.c | 69 +++++++++++++++++++++++++++++++++-
 1 file changed, 67 insertions(+), 2 deletions(-)
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -273,6 +273,7 @@ struct advk_pcie {
 	} wins[OB_WIN_COUNT];
 	u8 wins_count;
 	int irq;
+	struct irq_domain *rp_irq_domain;
 	struct irq_domain *irq_domain;
 	struct irq_chip irq_chip;
 	raw_spinlock_t irq_lock;
@@ -1436,6 +1437,44 @@ static void advk_pcie_remove_irq_domain(
 	irq_domain_remove(pcie->irq_domain);
 }

+static struct irq_chip advk_rp_irq_chip = {
+	.name = "advk-RP",
+};
+
+static int advk_pcie_rp_irq_map(struct irq_domain *h,
+				unsigned int virq, irq_hw_number_t hwirq)
+{
+	struct advk_pcie *pcie = h->host_data;
+
+	irq_set_chip_and_handler(virq, &advk_rp_irq_chip, handle_simple_irq);
+	irq_set_chip_data(virq, pcie);
+
+	return 0;
+}
+
+static const struct irq_domain_ops advk_pcie_rp_irq_domain_ops = {
+	.map = advk_pcie_rp_irq_map,
+	.xlate = irq_domain_xlate_onecell,
+};
+
+static int advk_pcie_init_rp_irq_domain(struct advk_pcie *pcie)
+{
+	pcie->rp_irq_domain = irq_domain_add_linear(NULL, 1,
+						    &advk_pcie_rp_irq_domain_ops,
+						    pcie);
+	if (!pcie->rp_irq_domain) {
+		dev_err(&pcie->pdev->dev, "Failed to add Root Port IRQ domain\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void advk_pcie_remove_rp_irq_domain(struct advk_pcie *pcie)
+{
+	irq_domain_remove(pcie->rp_irq_domain);
+}
+
 static void advk_pcie_handle_pme(struct advk_pcie *pcie)
 {
 	u32 requester = advk_readl(pcie, PCIE_MSG_LOG_REG) >> 16;
@@ -1457,7 +1496,7 @@ static void advk_pcie_handle_pme(struct
 		if (!(le16_to_cpu(pcie->bridge.pcie_conf.rootctl) & PCI_EXP_RTCTL_PMEIE))
 			return;

-		if (generic_handle_domain_irq(pcie->irq_domain, 0) == -EINVAL)
+		if (generic_handle_domain_irq(pcie->rp_irq_domain, 0) == -EINVAL)
 			dev_err_ratelimited(&pcie->pdev->dev, "unhandled PME IRQ\n");
 	}
 }
@@ -1509,7 +1548,7 @@ static void advk_pcie_handle_int(struct
 		 * Aardvark HW returns zero for PCI_ERR_ROOT_AER_IRQ, so use
 		 * PCIe interrupt 0
 		 */
-		if (generic_handle_domain_irq(pcie->irq_domain, 0) == -EINVAL)
+		if (generic_handle_domain_irq(pcie->rp_irq_domain, 0) == -EINVAL)
 			dev_err_ratelimited(&pcie->pdev->dev, "unhandled ERR IRQ\n");
 	}

@@ -1553,6 +1592,21 @@ static void advk_pcie_irq_handler(struct
 	chained_irq_exit(chip, desc);
 }

+static int advk_pcie_map_irq(const struct pci_dev *dev, u8 slot, u8 pin)
+{
+	struct advk_pcie *pcie = dev->bus->sysdata;
+
+	/*
+	 * Emulated root bridge has its own emulated irq chip and irq domain.
+	 * Argument pin is the INTx pin (1=INTA, 2=INTB, 3=INTC, 4=INTD) and
+	 * hwirq for irq_create_mapping() is indexed from zero.
+	 */
+	if (pci_is_root_bus(dev->bus))
+		return irq_create_mapping(pcie->rp_irq_domain, pin - 1);
+	else
+		return of_irq_parse_and_map_pci(dev, slot, pin);
+}
+
 static void __maybe_unused advk_pcie_disable_phy(struct advk_pcie *pcie)
 {
 	phy_power_off(pcie->phy);
@@ -1754,14 +1808,24 @@ static int advk_pcie_probe(struct platfo
 		return ret;
 	}

+	ret = advk_pcie_init_rp_irq_domain(pcie);
+	if (ret) {
+		dev_err(dev, "Failed to initialize irq\n");
+		advk_pcie_remove_msi_irq_domain(pcie);
+		advk_pcie_remove_irq_domain(pcie);
+		return ret;
+	}
+
 	irq_set_chained_handler_and_data(pcie->irq, advk_pcie_irq_handler, pcie);

 	bridge->sysdata = pcie;
 	bridge->ops = &advk_pcie_ops;
+	bridge->map_irq = advk_pcie_map_irq;

 	ret = pci_host_probe(bridge);
 	if (ret < 0) {
 		irq_set_chained_handler_and_data(pcie->irq, NULL, NULL);
+		advk_pcie_remove_rp_irq_domain(pcie);
 		advk_pcie_remove_msi_irq_domain(pcie);
 		advk_pcie_remove_irq_domain(pcie);
 		return ret;
@@ -1813,6 +1877,7 @@ static int advk_pcie_remove(struct platf
 	irq_set_chained_handler_and_data(pcie->irq, NULL, NULL);

 	/* Remove IRQ domains */
+	advk_pcie_remove_rp_irq_domain(pcie);
 	advk_pcie_remove_msi_irq_domain(pcie);
 	advk_pcie_remove_irq_domain(pcie);

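The routing decision made by the custom map_irq callback in the patch above can be sketched in userspace like this. The enum, struct and function names are hypothetical, and the sketch assumes the caller only passes a valid INTx pin (1..4).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the two mapping paths described in the patch. */
enum irq_route { ROUTE_RP_DOMAIN, ROUTE_DEVICE_TREE };

struct irq_choice {
	enum irq_route route;
	unsigned int hwirq; /* meaningful only for ROUTE_RP_DOMAIN here */
};

/*
 * Devices on the root bus (the emulated root bridge) use the separate
 * Root Port domain; pin is 1-based (1=INTA) while hwirq is 0-based.
 * Everything else goes through the usual device tree lookup.
 */
static struct irq_choice choose_irq(bool on_root_bus, uint8_t pin)
{
	struct irq_choice c;

	assert(pin >= 1 && pin <= 4); /* assumption: caller passes a valid pin */

	if (on_root_bus) {
		c.route = ROUTE_RP_DOMAIN;
		c.hwirq = pin - 1u;
	} else {
		c.route = ROUTE_DEVICE_TREE;
		c.hwirq = 0; /* unused in this sketch */
	}
	return c;
}
```

With the one-entry Root Port domain from the patch, only INTA (pin 1, hwirq 0) has a slot on the emulated bridge, which matches the PME and ERR handlers raising interrupt 0 in that domain.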
From: Pali Rohár pali@kernel.org
commit b08e5b53d17be58eb2311d6790a84fe2c200ee47 upstream.
Callback for irq_mask_ack() is the same as for irq_mask(). As there is no special handling for irq_ack(), there is no need to define irq_mask_ack() too.
Link: https://lore.kernel.org/r/20220110015018.26359-20-kabel@kernel.org
Signed-off-by: Pali Rohár pali@kernel.org
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Acked-by: Marc Zyngier maz@kernel.org
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/controller/pci-aardvark.c | 1 -
 1 file changed, 1 deletion(-)
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -1415,7 +1415,6 @@ static int advk_pcie_init_irq_domain(str
 	}

 	irq_chip->irq_mask = advk_pcie_irq_mask;
-	irq_chip->irq_mask_ack = advk_pcie_irq_mask;
 	irq_chip->irq_unmask = advk_pcie_irq_unmask;

 	pcie->irq_domain =
From: Pali Rohár pali@kernel.org
commit befa71000160b39c1bf6cdfca6837bb5e9d372d7 upstream.
By default, all Legacy INTx interrupts are masked, so there is no need to mask this interrupt during irq_map() callback.
Link: https://lore.kernel.org/r/20220110015018.26359-21-kabel@kernel.org
Signed-off-by: Pali Rohár pali@kernel.org
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/controller/pci-aardvark.c | 1 -
 1 file changed, 1 deletion(-)
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -1332,7 +1332,6 @@ static int advk_pcie_irq_map(struct irq_
 {
 	struct advk_pcie *pcie = h->host_data;

-	advk_pcie_irq_mask(irq_get_irq_data(virq));
 	irq_set_status_flags(virq, IRQ_LEVEL);
 	irq_set_chip_and_handler(virq, &pcie->irq_chip,
 				 handle_level_irq);
From: "Marek Behún" kabel@kernel.org
commit 0c36ab437e1d94b6628b006a1d48f05ea3b0b222 upstream.
This function is now always used in driver remove method, drop the __maybe_unused attribute.
Link: https://lore.kernel.org/r/20220110015018.26359-22-kabel@kernel.org
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/controller/pci-aardvark.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -1605,7 +1605,7 @@ static int advk_pcie_map_irq(const struc
 		return of_irq_parse_and_map_pci(dev, slot, pin);
 }

-static void __maybe_unused advk_pcie_disable_phy(struct advk_pcie *pcie)
+static void advk_pcie_disable_phy(struct advk_pcie *pcie)
 {
 	phy_power_off(pcie->phy);
 	phy_exit(pcie->phy);
From: "Marek Behún" kabel@kernel.org
commit 92f4ffecc4170ce29e67a1f8d51c168c3de95fb2 upstream.
Update the comment about what happens when link goes down after we have checked for link-up. If a PIO request is done while link-down, we have a serious problem.
Link: https://lore.kernel.org/r/20220110015018.26359-23-kabel@kernel.org
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Signed-off-by: Marek Behún kabel@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/controller/pci-aardvark.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -998,8 +998,12 @@ static bool advk_pcie_valid_device(struc
 		return false;

 	/*
-	 * If the link goes down after we check for link-up, nothing bad
-	 * happens but the config access times out.
+	 * If the link goes down after we check for link-up, we have a problem:
+	 * if a PIO request is executed while link-down, the whole controller
+	 * gets stuck in a non-functional state, and even after link comes up
+	 * again, PIO requests won't work anymore, and a reset of the whole PCIe
+	 * controller is needed. Therefore we need to prevent sending PIO
+	 * requests while the link is down.
 	 */
 	if (!pci_is_root_bus(bus) && !advk_pcie_link_up(pcie))
 		return false;
On Tue, 10 May 2022 15:06:22 +0200, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
5.15.39-rc1 Successfully Compiled and booted on my Raspberry PI 4b (8g) (bcm2711)
Tested-by: Fox Chen foxhlchen@gmail.com
On 5/10/22 06:06, Greg Kroah-Hartman wrote:
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels:
Tested-by: Florian Fainelli f.fainelli@gmail.com
On Tue, May 10, 2022, at 9:06 AM, Greg Kroah-Hartman wrote:
5.15.39-rc1 compiled and booted with no errors or regressions on my x86_64 test system.
Tested-by: Slade Watkins slade@sladewatkins.com
Cheers,
-srw
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
Build results:
	total: 156 pass: 156 fail: 0
Qemu test results:
	total: 488 pass: 488 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build
* kernel: 5.15.39-rc1
* git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
* git branch: linux-5.15.y
* git commit: 60041d0985244b5cda37d857a7807f2d572b3762
* git describe: v5.15.37-314-g60041d098524
* test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.15.y/build/v5.15....

## Test Regressions (compared to v5.15.37-259-gab77581473a3)
No test regressions found.

## Metric Regressions (compared to v5.15.37-259-gab77581473a3)
No metric regressions found.

## Test Fixes (compared to v5.15.37-259-gab77581473a3)
No test fixes found.

## Metric Fixes (compared to v5.15.37-259-gab77581473a3)
No metric fixes found.

## Test result summary
total: 103063, pass: 87539, fail: 626, skip: 13925, xfail: 973

## Build Summary
* arc: 10 total, 10 passed, 0 failed
* arm: 291 total, 291 passed, 0 failed
* arm64: 41 total, 41 passed, 0 failed
* i386: 39 total, 39 passed, 0 failed
* mips: 37 total, 37 passed, 0 failed
* parisc: 12 total, 12 passed, 0 failed
* powerpc: 60 total, 54 passed, 6 failed
* riscv: 27 total, 22 passed, 5 failed
* s390: 21 total, 21 passed, 0 failed
* sh: 24 total, 24 passed, 0 failed
* sparc: 12 total, 12 passed, 0 failed
* x86_64: 41 total, 41 passed, 0 failed
## Test suites summary
* fwts
* igt-gpu-tools
* kselftest-android
* kselftest-arm64
* kselftest-breakpoints
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-drivers
* kselftest-efivarfs
* kselftest-filesystems
* kselftest-firmware
* kselftest-fpu
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-kexec
* kselftest-kvm
* kselftest-lib
* kselftest-livepatch
* kselftest-membarrier
* kselftest-memfd
* kselftest-memory-hotplug
* kselftest-mincore
* kselftest-mount
* kselftest-mqueue
* kselftest-net
* kselftest-netfilter
* kselftest-nsfs
* kselftest-openat2
* kselftest-pid_namespace
* kselftest-pidfd
* kselftest-proc
* kselftest-pstore
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-splice
* kselftest-static_keys
* kselftest-sync
* kselftest-sysctl
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-vm
* kselftest-x86
* kselftest-zram
* kunit
* kvm-unit-tests
* libgpiod
* libhugetlbfs
* linux-log-parser
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-controllers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-open-posix-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-tracing-tests
* network-basic-tests
* packetdrill
* perf
* perf/Zstd-perf.data-compression
* rcutorture
* ssuite
* timesync-off
* v4l2-compliance
* vdso
-- Linaro LKFT https://lkft.linaro.org
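For anyone who wants to cross-check the LKFT numbers above, here is a minimal Python sketch that tallies the per-architecture build summary and verifies that the test result summary adds up. The helper name `tally` and the variable names are made up for illustration; only the figures are taken from the report.

```python
import re

# Per-architecture build results, copied from the LKFT "Build Summary" above.
BUILD_SUMMARY = """\
arc: 10 total, 10 passed, 0 failed
arm: 291 total, 291 passed, 0 failed
arm64: 41 total, 41 passed, 0 failed
i386: 39 total, 39 passed, 0 failed
mips: 37 total, 37 passed, 0 failed
parisc: 12 total, 12 passed, 0 failed
powerpc: 60 total, 54 passed, 6 failed
riscv: 27 total, 22 passed, 5 failed
s390: 21 total, 21 passed, 0 failed
sh: 24 total, 24 passed, 0 failed
sparc: 12 total, 12 passed, 0 failed
x86_64: 41 total, 41 passed, 0 failed
"""

LINE_RE = re.compile(r"^(\S+): (\d+) total, (\d+) passed, (\d+) failed$")


def tally(summary):
    """Return {arch: (total, passed, failed)} for each summary line."""
    rows = {}
    for line in summary.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows[match.group(1)] = tuple(int(n) for n in match.group(2, 3, 4))
    return rows


rows = tally(BUILD_SUMMARY)
total = sum(r[0] for r in rows.values())
passed = sum(r[1] for r in rows.values())
failed = sum(r[2] for r in rows.values())
failing = {arch: r[2] for arch, r in rows.items() if r[2]}

print(total, passed, failed)  # 615 604 11
print(failing)                # {'powerpc': 6, 'riscv': 5}

# "Test result summary": pass + fail + skip + xfail should equal the total.
tests_total = 87539 + 626 + 13925 + 973
print(tests_total == 103063)  # True
```

The tally only restates what the report lists; it does not say whether the powerpc and riscv build failures are new in this release candidate.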
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
All tests passing for Tegra ...
Test results for stable-v5.15:
    10 builds:	10 pass, 0 fail
    28 boots:	28 pass, 0 fail
    114 tests:	114 pass, 0 fail

Linux version:	5.15.39-rc1-g60041d098524
Boards tested:	tegra124-jetson-tk1, tegra186-p2771-0000,
                tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000,
                tegra20-ventana, tegra210-p2371-2180,
                tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
Hi Greg,
Build test (gcc-11):
mips (gcc version 11.2.1 20220408): 62 configs -> no failure
arm (gcc version 11.2.1 20220408): 100 configs -> no new failure
arm64 (gcc version 11.2.1 20220408): 3 configs -> no failure
x86_64 (gcc version 11.2.1 20220408): 4 configs -> no failure

Build test (gcc-12):
All the allmodconfig builds failed. Will check later what is needed.

Boot test:
x86_64: Booted on my test laptop. No regression.
x86_64: Booted on qemu. No regression. [1]
arm64: Booted on rpi4b (4GB model). No regression. [2]
mips: Booted on ci20 board. No regression. [3]

[1]. https://openqa.qa.codethink.co.uk/tests/1122
[2]. https://openqa.qa.codethink.co.uk/tests/1126
[3]. https://openqa.qa.codethink.co.uk/tests/1127
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
-- Regards Sudip
Compiled and booted on x86 & arm64 systems. No dmesg regressions.
Tested-by: Allen Pais apais@linux.microsoft.com