This is the start of the stable review cycle for the 6.16.9 release. There are 149 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 24 Sep 2025 19:23:52 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.16.9-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.16.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.16.9-rc1
SeongJae Park sj@kernel.org samples/damon/prcl: avoid starting DAMON before initialization
Chen-Yu Tsai wens@csie.org clk: sunxi-ng: mp: Fix dual-divider clock rate readback
SeongJae Park sj@kernel.org samples/damon/mtier: avoid starting DAMON before initialization
Honggyu Kim honggyu.kim@sk.com samples/damon: change enable parameters to enabled
SeongJae Park sj@kernel.org samples/damon/prcl: fix boot time enable crash
Alex Elder elder@riscstar.com dt-bindings: serial: 8250: move a constraint
Yixun Lan dlan@gentoo.org dt-bindings: serial: 8250: spacemit: set clocks property as required
Frank Li Frank.Li@nxp.com dt-bindings: serial: 8250: allow clock 'uartclk' and 'reg' for nxp,lpc1850-uart
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: nl: announce deny-join-id0 flag
Antheas Kapenekakis lkml@antheas.dev platform/x86: asus-wmi: Re-add extra keys to ignore_key_wlan quirk
Antheas Kapenekakis lkml@antheas.dev platform/x86: asus-wmi: Fix ROG button mapping, tablet mode on ASUS ROG Z13
Yang Xiuwei yangxiuwei@kylinos.cn io_uring: fix incorrect io_kiocb reference in io_link_skb
Stefan Metzmacher metze@samba.org smb: client: fix smbdirect_recv_io leak in smbd_negotiate() error path
Paulo Alcantara pc@manguebit.org smb: client: fix file open check in __cifs_unlink()
Jens Axboe axboe@kernel.dk io_uring/msg_ring: kill alloc_cache for io_kiocb allocations
Herbert Xu herbert@gondor.apana.org.au crypto: af_alg - Set merge to zero early in af_alg_sendmsg
Stefan Metzmacher metze@samba.org smb: client: let smbd_destroy() call disable_work_sync(&info->post_send_credits_work)
Stefan Metzmacher metze@samba.org smb: client: use disable[_delayed]_work_sync in smbdirect.c
Paulo Alcantara pc@manguebit.org smb: client: fix filename matching of deferred files
Stefan Metzmacher metze@samba.org smb: client: let recv_done verify data_offset, data_length and remaining_data_length
Stefan Metzmacher metze@samba.org smb: client: make use of struct smbdirect_recv_io
Stefan Metzmacher metze@samba.org smb: smbdirect: introduce struct smbdirect_recv_io
Stefan Metzmacher metze@samba.org smb: client: make use of smbdirect_socket->recv_io.expected
Stefan Metzmacher metze@samba.org smb: smbdirect: introduce smbdirect_socket.recv_io.expected
Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com drm/xe/guc: Set RCS/CCS yield policy
Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com drm/xe/guc: Enable extended CAT error reporting
Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com drm/xe: Fix error handling if PXP fails to start
Takashi Iwai tiwai@suse.de ALSA: usb: qcom: Fix false-positive address space check
Dan Carpenter dan.carpenter@linaro.org drm/xe: Fix a NULL vs IS_ERR() in xe_vm_add_compute_exec_queue()
Qi Xi xiqi2@huawei.com drm: bridge: cdns-mhdp8546: Fix missing mutex unlock on error path
Loic Poulain loic.poulain@oss.qualcomm.com drm: bridge: anx7625: Fix NULL pointer dereference with early IRQ
Michal Wajdeczko michal.wajdeczko@intel.com drm/xe/pf: Drop rounddown_pow_of_two fair LMEM limitation
Shuicheng Lin shuicheng.lin@intel.com drm/xe/tile: Release kobject for the failure path
Venkata Prasad Potturu venkataprasad.potturu@amd.com ASoC: amd: acp: Fix incorrect retrival of acp_chip_info
Vasant Hegde vasant.hegde@amd.com iommu/amd: Fix alias device DTE setting
Amadeusz Sławiński amadeuszx.slawinski@linux.intel.com ASoC: Intel: catpt: Expose correct bit depth to userspace
Charles Keepax ckeepax@opensource.cirrus.com ASoC: SDCA: Fix return value in sdca_regmap_mbq_size()
Colin Ian King colin.i.king@gmail.com ASoC: SOF: Intel: hda-stream: Fix incorrect variable used in error message
Dan Carpenter dan.carpenter@linaro.org ASoC: codec: sma1307: Fix memory corruption in sma1307_setting_loaded()
Charles Keepax ckeepax@opensource.cirrus.com ASoC: wm8974: Correct PLL rate rounding
Charles Keepax ckeepax@opensource.cirrus.com ASoC: wm8940: Correct typo in control name
Charles Keepax ckeepax@opensource.cirrus.com ASoC: wm8940: Correct PLL rate rounding
Praful Adiga praful.adiga@gmail.com ALSA: hda/realtek: Fix mute led for HP Laptop 15-dw4xx
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: avoid spurious errors on TCP disconnect
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: connect: catch IO errors on listen side
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: propagate shutdown to subflows when possible
Håkon Bugge haakon.bugge@oracle.com rds: ib: Increment i_fastreg_wrs before bailing out
Borislav Petkov (AMD) bp@alien8.de crypto: ccp - Always pass in an error pointer to __sev_platform_shutdown_locked()
Sébastien Szymanski sebastien.szymanski@armadeus.com gpiolib: acpi: initialize acpi_gpio_info struct
Hans de Goede hansg@kernel.org net: rfkill: gpio: Fix crash due to dereferencering uninitialized pointer
Jens Axboe axboe@kernel.dk io_uring: include dying ring in task_work "should cancel" state
Max Kellermann max.kellermann@ionos.com io_uring/io-wq: fix `max_workers` breakage and `nr_workers` underflow
Mario Limonciello mario.limonciello@amd.com drm/amd: Only restore cached manual clock settings in restore if OD enabled
Ivan Lipski ivan.lipski@amd.com drm/amd/display: Allow RX6xxx & RX7700 to invoke amdgpu_irq_get/put
Alex Deucher alexander.deucher@amd.com drm/amdgpu: suspend KFD and KGD user queues for S0ix
Alex Deucher alexander.deucher@amd.com drm/amdkfd: add proper handling for S0ix
Maciej S. Szmigiero maciej.szmigiero@oracle.com KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active
Tom Lendacky thomas.lendacky@amd.com x86/sev: Guard sev_evict_cache() with CONFIG_AMD_MEM_ENCRYPT
Ben Chuang ben.chuang@genesyslogic.com.tw mmc: sdhci-uhs2: Fix calling incorrect sdhci_set_clock() function
Ben Chuang ben.chuang@genesyslogic.com.tw mmc: sdhci-pci-gli: GL9767: Fix initializing the UHS-II interface during a power-on
Ben Chuang ben.chuang@genesyslogic.com.tw mmc: sdhci: Move the code related to setting the clock from sdhci_set_ios_common() into sdhci_set_ios()
Thomas Fourier fourier.thomas@gmail.com mmc: mvsdio: Fix dma_unmap_sg() nents value
Mohammad Rafi Shaik mohammad.rafi.shaik@oss.qualcomm.com ASoC: qcom: q6apm-lpass-dais: Fix missing set_fmt DAI op for I2S
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org ASoC: qcom: q6apm-lpass-dais: Fix NULL pointer dereference if source graph failed
Mohammad Rafi Shaik mohammad.rafi.shaik@oss.qualcomm.com ASoC: qcom: audioreach: Fix lpaif_type configuration for the I2S interface
Maciej Strozek mstrozek@opensource.cirrus.com ASoC: SDCA: Add quirk for incorrect function types for 3 systems
Qu Wenruo wqu@suse.com btrfs: tree-checker: fix the incorrect inode ref size check
Niklas Schnelle schnelle@linux.ibm.com iommu/s390: Make attach succeed when the device was surprise removed
Matthew Rosato mjrosato@linux.ibm.com iommu/s390: Fix memory corruption when using identity domain
Vasant Hegde vasant.hegde@amd.com iommu/amd/pgtbl: Fix possible race while increase page table level
Zhen Ni zhen.ni@easystack.cn iommu/amd: Fix ivrs_base memleak in early_amd_iommu_init()
Eugene Koira eugkoira@amazon.com iommu/vt-d: Fix __domain_mapping()'s usage of switch_to_super_page()
Bibo Mao maobibo@loongson.cn LoongArch: KVM: Fix VM migration failure with PTW enabled
Bibo Mao maobibo@loongson.cn LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_pch_pic_regs_access()
Bibo Mao maobibo@loongson.cn LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_sw_status_access()
Bibo Mao maobibo@loongson.cn LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_regs_access()
Bibo Mao maobibo@loongson.cn LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_ctrl_access()
Tiezhu Yang yangtiezhu@loongson.cn LoongArch: Handle jump tables options for RUST
Tiezhu Yang yangtiezhu@loongson.cn LoongArch: Make LTO case independent in Makefile
Tao Cui cuitao@kylinos.cn LoongArch: Check the return value when creating kobj
Huacai Chen chenhuacai@kernel.org LoongArch: Align ACPI structures if ARCH_STRICT_ALIGN enabled
Guangshuo Li 202321181@mail.sdu.edu.cn LoongArch: vDSO: Check kcalloc() result in init_vdso()
Tiezhu Yang yangtiezhu@loongson.cn LoongArch: Fix unreliable stack for live patching
Tiezhu Yang yangtiezhu@loongson.cn objtool/LoongArch: Mark special atomic instruction as INSN_BUG type
Tiezhu Yang yangtiezhu@loongson.cn objtool/LoongArch: Mark types based on break immediate code
Tiezhu Yang yangtiezhu@loongson.cn LoongArch: Update help info of ARCH_STRICT_ALIGN
Hugh Dickins hughd@google.com mm: folio_may_be_lru_cached() unless folio_test_large()
Hugh Dickins hughd@google.com mm: revert "mm: vmscan.c: fix OOM on swap stress test"
Hugh Dickins hughd@google.com mm/gup: local lru_add_drain() to avoid lru_add_drain_all()
Li Zhe lizhe.67@bytedance.com gup: optimize longterm pin_user_pages() for large folio
Hugh Dickins hughd@google.com mm: revert "mm/gup: clear the LRU flag of a page before adding to LRU batch"
Hugh Dickins hughd@google.com mm/gup: check ref_count instead of lru before migration
Mikulas Patocka mpatocka@redhat.com dm-stripe: fix a possible integer overflow
Mikulas Patocka mpatocka@redhat.com dm-raid: don't set io_min and io_opt for raid1
austinchang austinchang@synology.com btrfs: initialize inode::file_extent_tree after i_mode has been set
Andrea Righi arighi@nvidia.com Revert "sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()"
H. Nikolaus Schaller hns@goldelico.com power: supply: bq27xxx: restrict no-battery detection to bq27000
H. Nikolaus Schaller hns@goldelico.com power: supply: bq27xxx: fix error return in case of no bq27000 hdq battery
Herbert Xu herbert@gondor.apana.org.au crypto: af_alg - Disallow concurrent writes in af_alg_sendmsg
Nathan Chancellor nathan@kernel.org nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/*
Sergey Senozhatsky senozhatsky@chromium.org zram: fix slot write race condition
Stefan Metzmacher metze@samba.org ksmbd: smbdirect: verify remaining_data_length respects max_fragmented_recv_size
Namjae Jeon linkinjeon@kernel.org ksmbd: smbdirect: validate data_offset and data_length field of smb_direct_data_transfer
Duoming Zhou duoming@zju.edu.cn octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()
Duoming Zhou duoming@zju.edu.cn cnic: Fix use-after-free bugs in cnic_delete_task
Alexey Nepomnyashih sdl@nppct.ru net: liquidio: fix overflow in octeon_init_instr_queue()
Eric Dumazet edumazet@google.com net: clear sk->sk_ino in sk_set_socket(sk, NULL)
Tariq Toukan tariqt@nvidia.com Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"
Jakub Kicinski kuba@kernel.org tls: make sure to abort the stream if headers are bogus
Kuniyuki Iwashima kuniyu@google.com tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().
Sathesh B Edara sedara@marvell.com octeon_ep: fix VF MAC address lifecycle handling
Hangbin Liu liuhangbin@gmail.com bonding: don't set oif to bond dev when getting NS target destination
Lama Kayal lkayal@nvidia.com net/mlx5e: Add a miss level for ipsec crypto offload
Jianbo Liu jianbol@nvidia.com net/mlx5e: Harden uplink netdev access against device unbind
Remy D. Farley one-d-wide@protonmail.com doc/netlink: Fix typos in operation attributes
Kohei Enju enjuk@amazon.com igc: don't fail igc_probe() on LED setup error
Jedrzej Jagielski jedrzej.jagielski@intel.com ixgbe: destroy aci.lock later within ixgbe_remove path
Jedrzej Jagielski jedrzej.jagielski@intel.com ixgbe: initialize aci.lock before it's used
Maciej Fijalkowski maciej.fijalkowski@intel.com i40e: remove redundant memory barrier when cleaning Tx descs
Jacob Keller jacob.e.keller@intel.com ice: fix Rx page leak on multi-buffer frames
Yeounsu Moon yyyynoom@gmail.com net: natsemi: fix `rx_dropped` double accounting on `netif_rx()` failure
Geliang Tang geliang@kernel.org selftests: mptcp: sockopt: fix error messages
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: tfo: record 'deny join id0' info
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: userspace pm: validate deny-join-id0 flag
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: set remote_deny_join_id0 on SYN recv
Hangbin Liu liuhangbin@gmail.com bonding: set random address only when slaves already exist
Ilya Maximets i.maximets@ovn.org net: dst_metadata: fix IP_DF bit not extracted from tunnel headers
Jamie Bainbridge jamie.bainbridge@gmail.com qed: Don't collect too many protection override GRC elements
Kamal Heib kheib@redhat.com octeon_ep: Validate the VF ID
David Howells dhowells@redhat.com rxrpc: Fix untrusted unsigned subtract
David Howells dhowells@redhat.com rxrpc: Fix unhandled errors in rxgk_verify_packet_integrity()
Ivan Vecera ivecera@redhat.com dpll: fix clock quality level reporting
Anderson Nascimento anderson@allelesecurity.com net/tcp: Fix a NULL pointer dereference when using TCP-AO with TCP_REPAIR
Ioana Ciornei ioana.ciornei@nxp.com dpaa2-switch: fix buffer pool seeding for control traffic
Li Tian litian@redhat.com net/mlx5: Not returning mlx5_link_info table when speed is unknown
Tiwei Bie tiwei.btw@antgroup.com um: Fix FD copy size in os_rcv_fd_msg()
Miaoqian Lin linmq006@gmail.com um: virtio_uml: Fix use-after-free after put_device in probe
Stefan Metzmacher metze@samba.org smb: server: let smb_direct_writev() respect SMB_DIRECT_MAX_SEND_SGES
Geert Uytterhoeven geert+renesas@glider.be pcmcia: omap_cf: Mark driver struct with __refdata to prevent section mismatch
Liao Yuanhong liaoyuanhong@vivo.com wifi: mac80211: fix incorrect type for ret
Lachlan Hodges lachlan.hodges@morsemicro.com wifi: mac80211: increase scan_ies_len for S1G
Felix Fietkau nbd@nbd.name wifi: mt76: do not add non-sta wcid entries to the poll list
Takashi Sakamoto o-takashi@sakamocchi.jp ALSA: firewire-motu: drop EPOLLOUT from poll return values as write is not supported
Christoph Hellwig hch@lst.de nvme: fix PI insert on write
Ajay.Kathat@microchip.com Ajay.Kathat@microchip.com wifi: wilc1000: avoid buffer overflow in WID string configuration
Ian Rogers irogers@google.com perf maps: Ensure kmap is set up for all inserts
Johannes Thumshirn johannes.thumshirn@wdc.com btrfs: zoned: fix incorrect ASSERT in btrfs_zoned_reserve_data_reloc_bg()
Filipe Manana fdmanana@suse.com btrfs: fix invalid extref key setup when replaying dentry
Chen Ridong chenridong@huawei.com cgroup: split cgroup_destroy_wq into 3 workqueues
-------------
Diffstat:
Documentation/devicetree/bindings/serial/8250.yaml | 77 ++++++--- Documentation/netlink/specs/conntrack.yaml | 9 +- Documentation/netlink/specs/mptcp_pm.yaml | 4 +- Makefile | 4 +- arch/loongarch/Kconfig | 12 +- arch/loongarch/Makefile | 15 +- arch/loongarch/include/asm/acenv.h | 7 +- arch/loongarch/include/asm/kvm_mmu.h | 20 ++- arch/loongarch/kernel/env.c | 2 + arch/loongarch/kernel/stacktrace.c | 3 +- arch/loongarch/kernel/vdso.c | 3 + arch/loongarch/kvm/intc/eiointc.c | 87 ++++++---- arch/loongarch/kvm/intc/pch_pic.c | 21 ++- arch/loongarch/kvm/mmu.c | 8 +- arch/s390/include/asm/pci_insn.h | 10 +- arch/um/drivers/virtio_uml.c | 6 +- arch/um/os-Linux/file.c | 2 +- arch/x86/include/asm/sev.h | 38 ++--- arch/x86/kvm/svm/svm.c | 3 +- crypto/af_alg.c | 10 +- drivers/block/zram/zram_drv.c | 8 +- drivers/clk/sunxi-ng/ccu_mp.c | 2 +- drivers/crypto/ccp/sev-dev.c | 2 +- drivers/dpll/dpll_netlink.c | 4 +- drivers/gpio/gpiolib-acpi-core.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 16 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 12 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 36 ++++ drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 39 ++++- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 2 +- drivers/gpu/drm/bridge/analogix/anx7625.c | 6 +- .../gpu/drm/bridge/cadence/cdns-mhdp8546-core.c | 6 +- drivers/gpu/drm/xe/abi/guc_actions_abi.h | 5 + drivers/gpu/drm/xe/abi/guc_klvs_abi.h | 40 +++++ drivers/gpu/drm/xe/xe_exec_queue.c | 22 ++- drivers/gpu/drm/xe/xe_exec_queue_types.h | 8 +- drivers/gpu/drm/xe/xe_execlist.c | 25 ++- drivers/gpu/drm/xe/xe_execlist_types.h | 2 +- drivers/gpu/drm/xe/xe_gt.c | 3 +- drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c | 1 - drivers/gpu/drm/xe/xe_guc.c | 62 ++++++- drivers/gpu/drm/xe/xe_guc.h | 1 + drivers/gpu/drm/xe/xe_guc_exec_queue_types.h | 4 +- drivers/gpu/drm/xe/xe_guc_submit.c | 141 +++++++++++++--- drivers/gpu/drm/xe/xe_guc_submit.h | 2 + drivers/gpu/drm/xe/xe_tile_sysfs.c | 12 +- drivers/gpu/drm/xe/xe_uc.c | 4 + drivers/gpu/drm/xe/xe_vm.c | 4 +- drivers/iommu/amd/amd_iommu_types.h | 1 + drivers/iommu/amd/init.c | 9 +- drivers/iommu/amd/io_pgtable.c | 25 ++- drivers/iommu/intel/iommu.c | 7 +- drivers/iommu/s390-iommu.c | 29 +++- drivers/md/dm-raid.c | 6 +- drivers/md/dm-stripe.c | 10 +- drivers/mmc/host/mvsdio.c | 2 +- drivers/mmc/host/sdhci-pci-gli.c | 68 +++++++- drivers/mmc/host/sdhci-uhs2.c | 3 +- drivers/mmc/host/sdhci.c | 34 ++-- drivers/net/bonding/bond_main.c | 2 +- drivers/net/ethernet/broadcom/cnic.c | 3 +- .../net/ethernet/cavium/liquidio/request_manager.c | 2 +- .../net/ethernet/freescale/dpaa2/dpaa2-switch.c | 2 +- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 3 - drivers/net/ethernet/intel/ice/ice_txrx.c | 80 ++++----- drivers/net/ethernet/intel/ice/ice_txrx.h | 1 - drivers/net/ethernet/intel/igc/igc.h | 1 + drivers/net/ethernet/intel/igc/igc_main.c | 12 +- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 22 +-- .../net/ethernet/marvell/octeon_ep/octep_main.c | 16 ++ .../ethernet/marvell/octeon_ep/octep_pfvf_mbox.c | 3 + .../net/ethernet/marvell/octeontx2/nic/otx2_ptp.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en/fs.h | 1 + .../ethernet/mellanox/mlx5/core/en_accel/ipsec.h | 1 + .../mellanox/mlx5/core/en_accel/ipsec_fs.c | 3 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 - drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 27 ++- drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 4 +- drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h | 15 +- drivers/net/ethernet/mellanox/mlx5/core/port.c | 6 +- drivers/net/ethernet/natsemi/ns83820.c | 13 +- drivers/net/ethernet/qlogic/qed/qed_debug.c | 7 +- drivers/net/wireless/mediatek/mt76/mac80211.c | 2 +- drivers/net/wireless/microchip/wilc1000/wlan_cfg.c | 39 +++-- drivers/net/wireless/microchip/wilc1000/wlan_cfg.h | 5 +- drivers/nvme/host/core.c | 18 +- drivers/pcmcia/omap_cf.c | 8 +- drivers/platform/x86/asus-nb-wmi.c | 25 ++- drivers/platform/x86/asus-wmi.h | 3 +- drivers/power/supply/bq27xxx_battery.c | 4 +- fs/btrfs/delayed-inode.c | 3 - fs/btrfs/inode.c | 11 +- fs/btrfs/tree-checker.c | 4 +- fs/btrfs/tree-log.c | 2 +- fs/btrfs/zoned.c | 2 +- fs/nilfs2/sysfs.c | 4 +- fs/nilfs2/sysfs.h | 8 +- fs/smb/client/cifsproto.h | 4 +- fs/smb/client/inode.c | 23 ++- fs/smb/client/misc.c | 36 ++-- fs/smb/client/smbdirect.c | 132 +++++++++------ fs/smb/client/smbdirect.h | 23 --- fs/smb/common/smbdirect/smbdirect_socket.h | 29 ++++ fs/smb/server/transport_rdma.c | 185 ++++++++++++++------- include/crypto/if_alg.h | 10 +- include/linux/io_uring_types.h | 3 - include/linux/mlx5/driver.h | 1 + include/linux/swap.h | 10 ++ include/net/dst_metadata.h | 11 +- include/net/sock.h | 5 +- include/sound/sdca.h | 1 + include/uapi/linux/mptcp.h | 2 + include/uapi/linux/mptcp_pm.h | 4 +- io_uring/io-wq.c | 6 +- io_uring/io_uring.c | 10 +- io_uring/io_uring.h | 4 +- io_uring/msg_ring.c | 24 +-- io_uring/notif.c | 2 +- io_uring/poll.c | 2 +- io_uring/timeout.c | 2 +- io_uring/uring_cmd.c | 2 +- kernel/cgroup/cgroup.c | 43 ++++- kernel/sched/ext.c | 6 +- mm/gup.c | 52 ++++-- mm/mlock.c | 6 +- mm/swap.c | 50 +++--- mm/vmscan.c | 2 +- net/ipv4/tcp.c | 5 + net/ipv4/tcp_ao.c | 4 +- net/mac80211/driver-ops.h | 2 +- net/mac80211/main.c | 7 +- net/mptcp/options.c | 6 +- net/mptcp/pm_netlink.c | 7 + net/mptcp/protocol.c | 16 ++ net/mptcp/subflow.c | 4 + net/rds/ib_frmr.c | 20 ++- net/rfkill/rfkill-gpio.c | 4 +- net/rxrpc/rxgk.c | 18 +- net/rxrpc/rxgk_app.c | 29 +++- net/rxrpc/rxgk_common.h | 14 +- net/tls/tls.h | 1 + net/tls/tls_strp.c | 14 +- net/tls/tls_sw.c | 3 +- samples/damon/mtier.c | 25 +-- samples/damon/prcl.c | 31 +++- samples/damon/wsse.c | 22 +-- sound/firewire/motu/motu-hwdep.c | 2 +- sound/pci/hda/patch_realtek.c | 1 + sound/soc/amd/acp/acp-i2s.c | 11 +- sound/soc/codecs/sma1307.c | 7 +- sound/soc/codecs/wm8940.c | 9 +- sound/soc/codecs/wm8974.c | 8 +- sound/soc/intel/catpt/pcm.c | 23 ++- sound/soc/qcom/qdsp6/audioreach.c | 1 + sound/soc/qcom/qdsp6/q6apm-lpass-dais.c | 7 +- sound/soc/sdca/sdca_device.c | 20 +++ sound/soc/sdca/sdca_functions.c | 13 +- sound/soc/sdca/sdca_regmap.c | 2 +- sound/soc/sof/intel/hda-stream.c | 2 +- sound/usb/qcom/qc_audio_offload.c | 92 +++++----- tools/arch/loongarch/include/asm/inst.h | 12 ++ tools/objtool/arch/loongarch/decode.c | 33 +++- tools/perf/util/maps.c | 9 +- tools/testing/selftests/net/mptcp/mptcp_connect.c | 11 +- tools/testing/selftests/net/mptcp/mptcp_sockopt.c | 16 +- tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 7 + tools/testing/selftests/net/mptcp/userspace_pm.sh | 14 +- 169 files changed, 1797 insertions(+), 824 deletions(-)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chen Ridong chenridong@huawei.com
[ Upstream commit 79f919a89c9d06816dbdbbd168fa41d27411a7f9 ]
A hung task can occur during [1] LTP cgroup testing when repeatedly mounting/unmounting perf_event and net_prio controllers with systemd.unified_cgroup_hierarchy=1. The hang manifests in cgroup_lock_and_drain_offline() during root destruction.
Related case: cgroup_fj_function_perf_event cgroup_fj_function.sh perf_event cgroup_fj_function_net_prio cgroup_fj_function.sh net_prio
Call Trace: cgroup_lock_and_drain_offline+0x14c/0x1e8 cgroup_destroy_root+0x3c/0x2c0 css_free_rwork_fn+0x248/0x338 process_one_work+0x16c/0x3b8 worker_thread+0x22c/0x3b0 kthread+0xec/0x100 ret_from_fork+0x10/0x20
Root Cause:
CPU0 CPU1 mount perf_event umount net_prio cgroup1_get_tree cgroup_kill_sb rebind_subsystems // root destruction enqueues // cgroup_destroy_wq // kill all perf_event css // one perf_event css A is dying // css A offline enqueues cgroup_destroy_wq // root destruction will be executed first css_free_rwork_fn cgroup_destroy_root cgroup_lock_and_drain_offline // some perf descendants are dying // cgroup_destroy_wq max_active = 1 // waiting for css A to die
Problem scenario: 1. CPU0 mounts perf_event (rebind_subsystems) 2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work 3. A dying perf_event CSS gets queued for offline after root destruction 4. Root destruction waits for offline completion, but offline work is blocked behind root destruction in cgroup_destroy_wq (max_active=1)
Solution: Split cgroup_destroy_wq into three dedicated workqueues: cgroup_offline_wq – Handles CSS offline operations cgroup_release_wq – Manages resource release cgroup_free_wq – Performs final memory deallocation
This separation eliminates blocking in the CSS free path while waiting for offline operations to complete.
[1] https://github.com/linux-test-project/ltp/blob/master/runtest/controllers Fixes: 334c3679ec4b ("cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends") Reported-by: Gao Yingjie gaoyingjie@uniontech.com Signed-off-by: Chen Ridong chenridong@huawei.com Suggested-by: Teju Heo tj@kernel.org Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/cgroup/cgroup.c | 43 +++++++++++++++++++++++++++++++++++------- 1 file changed, 36 insertions(+), 7 deletions(-)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index a723b7dc6e4e2..20f76b2176501 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -126,8 +126,31 @@ DEFINE_PERCPU_RWSEM(cgroup_threadgroup_rwsem); * of concurrent destructions. Use a separate workqueue so that cgroup * destruction work items don't end up filling up max_active of system_wq * which may lead to deadlock. + * + * A cgroup destruction should enqueue work sequentially to: + * cgroup_offline_wq: use for css offline work + * cgroup_release_wq: use for css release work + * cgroup_free_wq: use for free work + * + * Rationale for using separate workqueues: + * The cgroup root free work may depend on completion of other css offline + * operations. If all tasks were enqueued to a single workqueue, this could + * create a deadlock scenario where: + * - Free work waits for other css offline work to complete. + * - But other css offline work is queued after free work in the same queue. + * + * Example deadlock scenario with single workqueue (cgroup_destroy_wq): + * 1. umount net_prio + * 2. net_prio root destruction enqueues work to cgroup_destroy_wq (CPUx) + * 3. perf_event CSS A offline enqueues work to same cgroup_destroy_wq (CPUx) + * 4. net_prio cgroup_destroy_root->cgroup_lock_and_drain_offline. + * 5. net_prio root destruction blocks waiting for perf_event CSS A offline, + * which can never complete as it's behind in the same queue and + * workqueue's max_active is 1. */ -static struct workqueue_struct *cgroup_destroy_wq; +static struct workqueue_struct *cgroup_offline_wq; +static struct workqueue_struct *cgroup_release_wq; +static struct workqueue_struct *cgroup_free_wq;
/* generate an array of cgroup subsystem pointers */ #define SUBSYS(_x) [_x ## _cgrp_id] = &_x ## _cgrp_subsys, @@ -5553,7 +5576,7 @@ static void css_release_work_fn(struct work_struct *work) cgroup_unlock();
INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn); - queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork); + queue_rcu_work(cgroup_free_wq, &css->destroy_rwork); }
static void css_release(struct percpu_ref *ref) @@ -5562,7 +5585,7 @@ static void css_release(struct percpu_ref *ref) container_of(ref, struct cgroup_subsys_state, refcnt);
INIT_WORK(&css->destroy_work, css_release_work_fn); - queue_work(cgroup_destroy_wq, &css->destroy_work); + queue_work(cgroup_release_wq, &css->destroy_work); }
static void init_and_link_css(struct cgroup_subsys_state *css, @@ -5696,7 +5719,7 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp, list_del_rcu(&css->sibling); err_free_css: INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn); - queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork); + queue_rcu_work(cgroup_free_wq, &css->destroy_rwork); return ERR_PTR(err); }
@@ -5934,7 +5957,7 @@ static void css_killed_ref_fn(struct percpu_ref *ref)
if (atomic_dec_and_test(&css->online_cnt)) { INIT_WORK(&css->destroy_work, css_killed_work_fn); - queue_work(cgroup_destroy_wq, &css->destroy_work); + queue_work(cgroup_offline_wq, &css->destroy_work); } }
@@ -6320,8 +6343,14 @@ static int __init cgroup_wq_init(void) * We would prefer to do this in cgroup_init() above, but that * is called before init_workqueues(): so leave this until after. */ - cgroup_destroy_wq = alloc_workqueue("cgroup_destroy", 0, 1); - BUG_ON(!cgroup_destroy_wq); + cgroup_offline_wq = alloc_workqueue("cgroup_offline", 0, 1); + BUG_ON(!cgroup_offline_wq); + + cgroup_release_wq = alloc_workqueue("cgroup_release", 0, 1); + BUG_ON(!cgroup_release_wq); + + cgroup_free_wq = alloc_workqueue("cgroup_free", 0, 1); + BUG_ON(!cgroup_free_wq); return 0; } core_initcall(cgroup_wq_init);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
[ Upstream commit b62fd63ade7cb573b114972ef8f9fa505be8d74a ]
The offset for an extref item's key is not the object ID of the parent dir, otherwise we would not need the extref item and would use plain ref items. Instead the offset is the result of a hash computation that uses the object ID of the parent dir and the name associated to the entry. So fix this by setting the key offset at replay_one_name() to be the result of calling btrfs_extref_hash().
Fixes: 725af92a6251 ("btrfs: Open-code name_in_log_ref in replay_one_name") Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tree-log.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 56d30ec0f52fc..5466a93a28f58 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -1933,7 +1933,7 @@ static noinline int replay_one_name(struct btrfs_trans_handle *trans,
search_key.objectid = log_key.objectid; search_key.type = BTRFS_INODE_EXTREF_KEY; - search_key.offset = key->objectid; + search_key.offset = btrfs_extref_hash(key->objectid, name.name, name.len); ret = backref_in_log(root->log_root, &search_key, key->objectid, &name); if (ret < 0) { goto out;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Thumshirn johannes.thumshirn@wdc.com
[ Upstream commit 5b8d2964754102323ca24495ba94892426284e3a ]
When moving a block-group to the dedicated data relocation space-info in btrfs_zoned_reserve_data_reloc_bg() it is asserted that the newly created block group for data relocation does not contain any zone_unusable bytes.
But on disks with zone_capacity < zone_size, the difference between zone_size and zone_capacity is accounted as zone_unusable.
Instead of asserting that the block-group does not contain any zone_unusable bytes, remove them from the block-groups total size.
Reported-by: Yi Zhang yi.zhang@redhat.com Link: https://lore.kernel.org/linux-block/CAHj4cs8-cS2E+-xQ-d2Bj6vMJZ+CwT_cbdWBTju... Fixes: daa0fde322350 ("btrfs: zoned: fix data relocation block group reservation") Reviewed-by: Naohiro Aota naohiro.aota@wdc.com Tested-by: Yi Zhang yi.zhang@redhat.com Signed-off-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/zoned.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index d7a1193332d94..60937127a0bc6 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -2577,9 +2577,9 @@ void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info) spin_lock(&space_info->lock); space_info->total_bytes -= bg->length; space_info->disk_total -= bg->length * factor; + space_info->disk_total -= bg->zone_unusable; /* There is no allocation ever happened. */ ASSERT(bg->used == 0); - ASSERT(bg->zone_unusable == 0); /* No super block in a block group on the zoned setup. */ ASSERT(bg->bytes_super == 0); spin_unlock(&space_info->lock);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ian Rogers irogers@google.com
[ Upstream commit 20c9ccffccd61b37325a0519fb6d485caeecf7fa ]
__maps__fixup_overlap_and_insert may split or directly insert a map, when doing this the map may need to have a kmap set up for the sake of the kmaps. The missing kmap set up fails the check_invariants test in maps, later "Internal error" reports from map__kmap and ultimately causes segfaults.
Similar fixes were added in commit e0e4e0b8b7fa ("perf maps: Add missing map__set_kmap_maps() when replacing a kernel map") and commit 25d9c0301d36 ("perf maps: Set the kmaps for newly created/added kernel maps") but they missed cases. To try to reduce the risk of this, update the kmap directly following any manual insert. This identified another problem in maps__copy_from.
Fixes: e0e4e0b8b7fa ("perf maps: Add missing map__set_kmap_maps() when replacing a kernel map") Fixes: 25d9c0301d36 ("perf maps: Set the kmaps for newly created/added kernel maps") Signed-off-by: Ian Rogers irogers@google.com Signed-off-by: Namhyung Kim namhyung@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/maps.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c index 85b2a93a59ac6..779f6230130af 100644 --- a/tools/perf/util/maps.c +++ b/tools/perf/util/maps.c @@ -477,6 +477,7 @@ static int __maps__insert(struct maps *maps, struct map *new) } /* Insert the value at the end. */ maps_by_address[nr_maps] = map__get(new); + map__set_kmap_maps(new, maps); if (maps_by_name) maps_by_name[nr_maps] = map__get(new);
@@ -502,8 +503,6 @@ static int __maps__insert(struct maps *maps, struct map *new) if (map__end(new) < map__start(new)) RC_CHK_ACCESS(maps)->ends_broken = true;
- map__set_kmap_maps(new, maps); - return 0; }
@@ -891,6 +890,7 @@ static int __maps__fixup_overlap_and_insert(struct maps *maps, struct map *new) if (before) { map__put(maps_by_address[i]); maps_by_address[i] = before; + map__set_kmap_maps(before, maps);
if (maps_by_name) { map__put(maps_by_name[ni]); @@ -918,6 +918,7 @@ static int __maps__fixup_overlap_and_insert(struct maps *maps, struct map *new) */ map__put(maps_by_address[i]); maps_by_address[i] = map__get(new); + map__set_kmap_maps(new, maps);
if (maps_by_name) { map__put(maps_by_name[ni]); @@ -942,14 +943,13 @@ static int __maps__fixup_overlap_and_insert(struct maps *maps, struct map *new) */ map__put(maps_by_address[i]); maps_by_address[i] = map__get(new); + map__set_kmap_maps(new, maps);
if (maps_by_name) { map__put(maps_by_name[ni]); maps_by_name[ni] = map__get(new); }
- map__set_kmap_maps(new, maps); - check_invariants(maps); return err; } @@ -1019,6 +1019,7 @@ int maps__copy_from(struct maps *dest, struct maps *parent) err = unwind__prepare_access(dest, new, NULL); if (!err) { dest_maps_by_address[i] = new; + map__set_kmap_maps(new, dest); if (dest_maps_by_name) dest_maps_by_name[i] = map__get(new); RC_CHK_ACCESS(dest)->nr_maps = i + 1;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ajay.Kathat@microchip.com Ajay.Kathat@microchip.com
[ Upstream commit fe9e4d0c39311d0f97b024147a0d155333f388b5 ]
Fix the following copy overflow warning identified by Smatch checker.
drivers/net/wireless/microchip/wilc1000/wlan_cfg.c:184 wilc_wlan_parse_response_frame() error: '__memcpy()' 'cfg->s[i]->str' copy overflow (512 vs 65537)
This patch introduces size check before accessing the memory buffer. The checks are base on the WID type of received data from the firmware. For WID string configuration, the size limit is determined by individual element size in 'struct wilc_cfg_str_vals' that is maintained in 'len' field of 'struct wilc_cfg_str'.
Reported-by: Dan Carpenter dan.carpenter@linaro.org Closes: https://lore.kernel.org/linux-wireless/aLFbr9Yu9j_TQTey@stanley.mountain Suggested-by: Dan Carpenter dan.carpenter@linaro.org Signed-off-by: Ajay Singh ajay.kathat@microchip.com Link: https://patch.msgid.link/20250829225829.5423-1-ajay.kathat@microchip.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../wireless/microchip/wilc1000/wlan_cfg.c | 37 ++++++++++++++----- .../wireless/microchip/wilc1000/wlan_cfg.h | 5 ++- 2 files changed, 30 insertions(+), 12 deletions(-)
diff --git a/drivers/net/wireless/microchip/wilc1000/wlan_cfg.c b/drivers/net/wireless/microchip/wilc1000/wlan_cfg.c index 131388886acbf..cfabd5aebb540 100644 --- a/drivers/net/wireless/microchip/wilc1000/wlan_cfg.c +++ b/drivers/net/wireless/microchip/wilc1000/wlan_cfg.c @@ -41,10 +41,10 @@ static const struct wilc_cfg_word g_cfg_word[] = { };
static const struct wilc_cfg_str g_cfg_str[] = { - {WID_FIRMWARE_VERSION, NULL}, - {WID_MAC_ADDR, NULL}, - {WID_ASSOC_RES_INFO, NULL}, - {WID_NIL, NULL} + {WID_FIRMWARE_VERSION, 0, NULL}, + {WID_MAC_ADDR, 0, NULL}, + {WID_ASSOC_RES_INFO, 0, NULL}, + {WID_NIL, 0, NULL} };
#define WILC_RESP_MSG_TYPE_CONFIG_REPLY 'R' @@ -147,44 +147,58 @@ static void wilc_wlan_parse_response_frame(struct wilc *wl, u8 *info, int size)
switch (FIELD_GET(WILC_WID_TYPE, wid)) { case WID_CHAR: + len = 3; + if (len + 2 > size) + return; + while (cfg->b[i].id != WID_NIL && cfg->b[i].id != wid) i++;
if (cfg->b[i].id == wid) cfg->b[i].val = info[4];
- len = 3; break;
case WID_SHORT: + len = 4; + if (len + 2 > size) + return; + while (cfg->hw[i].id != WID_NIL && cfg->hw[i].id != wid) i++;
if (cfg->hw[i].id == wid) cfg->hw[i].val = get_unaligned_le16(&info[4]);
- len = 4; break;
case WID_INT: + len = 6; + if (len + 2 > size) + return; + while (cfg->w[i].id != WID_NIL && cfg->w[i].id != wid) i++;
if (cfg->w[i].id == wid) cfg->w[i].val = get_unaligned_le32(&info[4]);
- len = 6; break;
case WID_STR: + len = 2 + get_unaligned_le16(&info[2]); + while (cfg->s[i].id != WID_NIL && cfg->s[i].id != wid) i++;
- if (cfg->s[i].id == wid) + if (cfg->s[i].id == wid) { + if (len > cfg->s[i].len || (len + 2 > size)) + return; + memcpy(cfg->s[i].str, &info[2], - get_unaligned_le16(&info[2]) + 2); + len); + }
- len = 2 + get_unaligned_le16(&info[2]); break;
default: @@ -384,12 +398,15 @@ int wilc_wlan_cfg_init(struct wilc *wl) /* store the string cfg parameters */ wl->cfg.s[i].id = WID_FIRMWARE_VERSION; wl->cfg.s[i].str = str_vals->firmware_version; + wl->cfg.s[i].len = sizeof(str_vals->firmware_version); i++; wl->cfg.s[i].id = WID_MAC_ADDR; wl->cfg.s[i].str = str_vals->mac_address; + wl->cfg.s[i].len = sizeof(str_vals->mac_address); i++; wl->cfg.s[i].id = WID_ASSOC_RES_INFO; wl->cfg.s[i].str = str_vals->assoc_rsp; + wl->cfg.s[i].len = sizeof(str_vals->assoc_rsp); i++; wl->cfg.s[i].id = WID_NIL; wl->cfg.s[i].str = NULL; diff --git a/drivers/net/wireless/microchip/wilc1000/wlan_cfg.h b/drivers/net/wireless/microchip/wilc1000/wlan_cfg.h index 7038b74f8e8ff..5ae74bced7d74 100644 --- a/drivers/net/wireless/microchip/wilc1000/wlan_cfg.h +++ b/drivers/net/wireless/microchip/wilc1000/wlan_cfg.h @@ -24,12 +24,13 @@ struct wilc_cfg_word {
struct wilc_cfg_str { u16 id; + u16 len; u8 *str; };
struct wilc_cfg_str_vals { - u8 mac_address[7]; - u8 firmware_version[129]; + u8 mac_address[8]; + u8 firmware_version[130]; u8 assoc_rsp[WILC_MAX_ASSOC_RESP_FRAME_SIZE]; };
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
[ Upstream commit 7ac3c2889bc060c3f67cf44df0dbb093a835c176 ]
I recently ran into an issue where the PI generated using the block layer integrity code differs from that from a kernel using the PRACT fallback when the block layer integrity code is disabled, and I tracked this down to us using PRACT incorrectly.
The NVM Command Set Specification (section 5.33 in 1.2, similar in older versions) specifies the PRACT insert behavior as:
Inserted protection information consists of the computed CRC for the protection information format (refer to section 5.3.1) in the Guard field, the LBAT field value in the Application Tag field, the LBST field value in the Storage Tag field, if defined, and the computed reference tag in the Logical Block Reference Tag.
Where the computed reference tag is defined as following for type 1 and type 2 using the text below that is duplicated in the respective bullet points:
the value of the computed reference tag for the first logical block of the command is the value contained in the Initial Logical Block Reference Tag (ILBRT) or Expected Initial Logical Block Reference Tag (EILBRT) field in the command, and the computed reference tag is incremented for each subsequent logical block.
So we need to set ILBRT field, but we currently don't. Interestingly this works fine on my older type 1 formatted SSD, but Qemu trips up on this. We already set ILBRT for Write Same since commit aeb7bb061be5 ("nvme: set the PRACT bit when using Write Zeroes with T10 PI").
To ease this, move the PI type check into nvme_set_ref_tag.
Reviewed-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/core.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 895fb163d48e6..5395623d2ba6a 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -903,6 +903,15 @@ static void nvme_set_ref_tag(struct nvme_ns *ns, struct nvme_command *cmnd, u32 upper, lower; u64 ref48;
+ /* only type1 and type 2 PI formats have a reftag */ + switch (ns->head->pi_type) { + case NVME_NS_DPS_PI_TYPE1: + case NVME_NS_DPS_PI_TYPE2: + break; + default: + return; + } + /* both rw and write zeroes share the same reftag format */ switch (ns->head->guard_type) { case NVME_NVM_NS_16B_GUARD: @@ -942,13 +951,7 @@ static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns,
if (nvme_ns_has_pi(ns->head)) { cmnd->write_zeroes.control |= cpu_to_le16(NVME_RW_PRINFO_PRACT); - - switch (ns->head->pi_type) { - case NVME_NS_DPS_PI_TYPE1: - case NVME_NS_DPS_PI_TYPE2: - nvme_set_ref_tag(ns, cmnd, req); - break; - } + nvme_set_ref_tag(ns, cmnd, req); }
return BLK_STS_OK; @@ -1039,6 +1042,7 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns, if (WARN_ON_ONCE(!nvme_ns_has_pi(ns->head))) return BLK_STS_NOTSUPP; control |= NVME_RW_PRINFO_PRACT; + nvme_set_ref_tag(ns, cmnd, req); }
if (bio_integrity_flagged(req->bio, BIP_CHECK_GUARD))
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Sakamoto o-takashi@sakamocchi.jp
[ Upstream commit aea3493246c474bc917d124d6fb627663ab6bef0 ]
The ALSA HwDep character device of the firewire-motu driver incorrectly returns EPOLLOUT in poll(2), even though the driver implements no operation for write(2). This misleads userspace applications to believe write() is allowed, potentially resulting in unnecessarily wakeups.
This issue dates back to the driver's initial code added by a commit 71c3797779d3 ("ALSA: firewire-motu: add hwdep interface"), and persisted when POLLOUT was updated to EPOLLOUT by a commit a9a08845e9ac ('vfs: do bulk POLL* -> EPOLL* replacement("").').
This commit fixes the bug.
Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp Link: https://patch.msgid.link/20250829233749.366222-1-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/firewire/motu/motu-hwdep.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/firewire/motu/motu-hwdep.c b/sound/firewire/motu/motu-hwdep.c index 88d1f4b56e4be..a220ac0c8eb83 100644 --- a/sound/firewire/motu/motu-hwdep.c +++ b/sound/firewire/motu/motu-hwdep.c @@ -111,7 +111,7 @@ static __poll_t hwdep_poll(struct snd_hwdep *hwdep, struct file *file, events = 0; spin_unlock_irq(&motu->lock);
- return events | EPOLLOUT; + return events; }
static int hwdep_get_info(struct snd_motu *motu, void __user *arg)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
[ Upstream commit a3c99ef88a084e1c2b99dd56bbfa7f89c9be3e92 ]
Polling and airtime reporting is valid for station entries only
Link: https://patch.msgid.link/20250827085352.51636-2-nbd@nbd.name Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/mediatek/mt76/mac80211.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/mediatek/mt76/mac80211.c b/drivers/net/wireless/mediatek/mt76/mac80211.c index 8e6ce16ab5b88..c9e2dca308312 100644 --- a/drivers/net/wireless/mediatek/mt76/mac80211.c +++ b/drivers/net/wireless/mediatek/mt76/mac80211.c @@ -1731,7 +1731,7 @@ EXPORT_SYMBOL_GPL(mt76_wcid_cleanup);
void mt76_wcid_add_poll(struct mt76_dev *dev, struct mt76_wcid *wcid) { - if (test_bit(MT76_MCU_RESET, &dev->phy.state)) + if (test_bit(MT76_MCU_RESET, &dev->phy.state) || !wcid->sta) return;
spin_lock_bh(&dev->sta_poll_lock);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lachlan Hodges lachlan.hodges@morsemicro.com
[ Upstream commit 7e2f3213e85eba00acb4cfe6d71647892d63c3a1 ]
Currently the S1G capability element is not taken into account for the scan_ies_len, which leads to a buffer length validation failure in ieee80211_prep_hw_scan() and subsequent WARN in __ieee80211_start_scan(). This prevents hw scanning from functioning. To fix ensure we accommodate for the S1G capability length.
Signed-off-by: Lachlan Hodges lachlan.hodges@morsemicro.com Link: https://patch.msgid.link/20250826085437.3493-1-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/main.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/mac80211/main.c b/net/mac80211/main.c index 1bad353d8a772..35c6755b817a8 100644 --- a/net/mac80211/main.c +++ b/net/mac80211/main.c @@ -1136,7 +1136,7 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) int result, i; enum nl80211_band band; int channels, max_bitrates; - bool supp_ht, supp_vht, supp_he, supp_eht; + bool supp_ht, supp_vht, supp_he, supp_eht, supp_s1g; struct cfg80211_chan_def dflt_chandef = {};
if (ieee80211_hw_check(hw, QUEUE_CONTROL) && @@ -1252,6 +1252,7 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) supp_vht = false; supp_he = false; supp_eht = false; + supp_s1g = false; for (band = 0; band < NUM_NL80211_BANDS; band++) { const struct ieee80211_sband_iftype_data *iftd; struct ieee80211_supported_band *sband; @@ -1299,6 +1300,7 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) max_bitrates = sband->n_bitrates; supp_ht = supp_ht || sband->ht_cap.ht_supported; supp_vht = supp_vht || sband->vht_cap.vht_supported; + supp_s1g = supp_s1g || sband->s1g_cap.s1g;
for_each_sband_iftype_data(sband, i, iftd) { u8 he_40_mhz_cap; @@ -1432,6 +1434,9 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) local->scan_ies_len += 2 + sizeof(struct ieee80211_vht_cap);
+ if (supp_s1g) + local->scan_ies_len += 2 + sizeof(struct ieee80211_s1g_cap); + /* * HE cap element is variable in size - set len to allow max size */ if (supp_he) {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Liao Yuanhong liaoyuanhong@vivo.com
[ Upstream commit a33b375ab5b3a9897a0ab76be8258d9f6b748628 ]
The variable ret is declared as a u32 type, but it is assigned a value of -EOPNOTSUPP. Since unsigned types cannot correctly represent negative values, the type of ret should be changed to int.
Signed-off-by: Liao Yuanhong liaoyuanhong@vivo.com Link: https://patch.msgid.link/20250825022911.139377-1-liaoyuanhong@vivo.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/driver-ops.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mac80211/driver-ops.h b/net/mac80211/driver-ops.h index 307587c8a0037..7964a7c5f0b2b 100644 --- a/net/mac80211/driver-ops.h +++ b/net/mac80211/driver-ops.h @@ -1389,7 +1389,7 @@ drv_get_ftm_responder_stats(struct ieee80211_local *local, struct ieee80211_sub_if_data *sdata, struct cfg80211_ftm_responder_stats *ftm_stats) { - u32 ret = -EOPNOTSUPP; + int ret = -EOPNOTSUPP;
might_sleep(); lockdep_assert_wiphy(local->hw.wiphy);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geert Uytterhoeven geert+renesas@glider.be
[ Upstream commit d1dfcdd30140c031ae091868fb5bed084132bca1 ]
As described in the added code comment, a reference to .exit.text is ok for drivers registered via platform_driver_probe(). Make this explicit to prevent the following section mismatch warning
WARNING: modpost: drivers/pcmcia/omap_cf: section mismatch in reference: omap_cf_driver+0x4 (section: .data) -> omap_cf_remove (section: .exit.text)
that triggers on an omap1_defconfig + CONFIG_OMAP_CF=m build.
Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Acked-by: Aaro Koskinen aaro.koskinen@iki.fi Reviewed-by: Uwe Kleine-König u.kleine-koenig@baylibre.com Signed-off-by: Dominik Brodowski linux@dominikbrodowski.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pcmcia/omap_cf.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/pcmcia/omap_cf.c b/drivers/pcmcia/omap_cf.c index 441cdf83f5a44..d6f24c7d15622 100644 --- a/drivers/pcmcia/omap_cf.c +++ b/drivers/pcmcia/omap_cf.c @@ -304,7 +304,13 @@ static void __exit omap_cf_remove(struct platform_device *pdev) kfree(cf); }
-static struct platform_driver omap_cf_driver = { +/* + * omap_cf_remove() lives in .exit.text. For drivers registered via + * platform_driver_probe() this is ok because they cannot get unbound at + * runtime. So mark the driver struct with __refdata to prevent modpost + * triggering a section mismatch warning. + */ +static struct platform_driver omap_cf_driver __refdata = { .driver = { .name = driver_name, },
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Metzmacher metze@samba.org
[ Upstream commit d162694037215fe25f1487999c58d70df809a2fd ]
We should not use more sges for ib_post_send() than we told the rdma device in rdma_create_qp()!
Otherwise ib_post_send() will return -EINVAL, so we disconnect the connection. Or with the current siw.ko we'll get 0 from ib_post_send(), but will never ever get a completion for the request. I've already sent a fix for siw.ko...
So we need to make sure smb_direct_writev() limits the number of vectors we pass to individual smb_direct_post_send_data() calls, so that we don't go over the queue pair limits.
Commit 621433b7e25d ("ksmbd: smbd: relax the count of sges required") was very strange and I guess only needed because SMB_DIRECT_MAX_SEND_SGES was 8 at that time. It basically removed the check that the rdma device is able to handle the number of sges we try to use.
While the real problem was added by commit ddbdc861e37c ("ksmbd: smbd: introduce read/write credits for RDMA read/write") as it used the minumun of device->attrs.max_send_sge and device->attrs.max_sge_rd, with the problem that device->attrs.max_sge_rd is always 1 for iWarp. And that limitation should only apply to RDMA Read operations. For now we keep that limitation for RDMA Write operations too, fixing that is a task for another day as it's not really required a bug fix.
Commit 2b4eeeaa9061 ("ksmbd: decrease the number of SMB3 smbdirect server SGEs") lowered SMB_DIRECT_MAX_SEND_SGES to 6, which is also used by our client code. And that client code enforces device->attrs.max_send_sge >= 6 since commit d2e81f92e5b7 ("Decrease the number of SMB3 smbdirect client SGEs") and (briefly looking) only the i40w driver provides only 3, see I40IW_MAX_WQ_FRAGMENT_COUNT. But currently we'd require 4 anyway, so that would not work anyway, but now it fails early.
Cc: Steve French smfrench@gmail.com Cc: Tom Talpey tom@talpey.com Cc: Hyunchul Lee hyc.lee@gmail.com Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: linux-rdma@vger.kernel.org Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Fixes: ddbdc861e37c ("ksmbd: smbd: introduce read/write credits for RDMA read/write") Fixes: 621433b7e25d ("ksmbd: smbd: relax the count of sges required") Fixes: 2b4eeeaa9061 ("ksmbd: decrease the number of SMB3 smbdirect server SGEs") Acked-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Stefan Metzmacher metze@samba.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/server/transport_rdma.c | 157 ++++++++++++++++++++++----------- 1 file changed, 107 insertions(+), 50 deletions(-)
diff --git a/fs/smb/server/transport_rdma.c b/fs/smb/server/transport_rdma.c index 5466aa8c39b1c..cc4322bfa1d61 100644 --- a/fs/smb/server/transport_rdma.c +++ b/fs/smb/server/transport_rdma.c @@ -1209,78 +1209,130 @@ static int smb_direct_writev(struct ksmbd_transport *t, bool need_invalidate, unsigned int remote_key) { struct smb_direct_transport *st = smb_trans_direct_transfort(t); - int remaining_data_length; - int start, i, j; - int max_iov_size = st->max_send_size - + size_t remaining_data_length; + size_t iov_idx; + size_t iov_ofs; + size_t max_iov_size = st->max_send_size - sizeof(struct smb_direct_data_transfer); int ret; - struct kvec vec; struct smb_direct_send_ctx send_ctx; + int error = 0;
if (st->status != SMB_DIRECT_CS_CONNECTED) return -ENOTCONN;
//FIXME: skip RFC1002 header.. + if (WARN_ON_ONCE(niovs <= 1 || iov[0].iov_len != 4)) + return -EINVAL; buflen -= 4; + iov_idx = 1; + iov_ofs = 0;
remaining_data_length = buflen; ksmbd_debug(RDMA, "Sending smb (RDMA): smb_len=%u\n", buflen);
smb_direct_send_ctx_init(st, &send_ctx, need_invalidate, remote_key); - start = i = 1; - buflen = 0; - while (true) { - buflen += iov[i].iov_len; - if (buflen > max_iov_size) { - if (i > start) { - remaining_data_length -= - (buflen - iov[i].iov_len); - ret = smb_direct_post_send_data(st, &send_ctx, - &iov[start], i - start, - remaining_data_length); - if (ret) + while (remaining_data_length) { + struct kvec vecs[SMB_DIRECT_MAX_SEND_SGES - 1]; /* minus smbdirect hdr */ + size_t possible_bytes = max_iov_size; + size_t possible_vecs; + size_t bytes = 0; + size_t nvecs = 0; + + /* + * For the last message remaining_data_length should be + * have been 0 already! + */ + if (WARN_ON_ONCE(iov_idx >= niovs)) { + error = -EINVAL; + goto done; + } + + /* + * We have 2 factors which limit the arguments we pass + * to smb_direct_post_send_data(): + * + * 1. The number of supported sges for the send, + * while one is reserved for the smbdirect header. + * And we currently need one SGE per page. + * 2. The number of negotiated payload bytes per send. + */ + possible_vecs = min_t(size_t, ARRAY_SIZE(vecs), niovs - iov_idx); + + while (iov_idx < niovs && possible_vecs && possible_bytes) { + struct kvec *v = &vecs[nvecs]; + int page_count; + + v->iov_base = ((u8 *)iov[iov_idx].iov_base) + iov_ofs; + v->iov_len = min_t(size_t, + iov[iov_idx].iov_len - iov_ofs, + possible_bytes); + page_count = get_buf_page_count(v->iov_base, v->iov_len); + if (page_count > possible_vecs) { + /* + * If the number of pages in the buffer + * is to much (because we currently require + * one SGE per page), we need to limit the + * length. + * + * We know possible_vecs is at least 1, + * so we always keep the first page. + * + * We need to calculate the number extra + * pages (epages) we can also keep. + * + * We calculate the number of bytes in the + * first page (fplen), this should never be + * larger than v->iov_len because page_count is + * at least 2, but adding a limitation feels + * better. + * + * Then we calculate the number of bytes (elen) + * we can keep for the extra pages. + */ + size_t epages = possible_vecs - 1; + size_t fpofs = offset_in_page(v->iov_base); + size_t fplen = min_t(size_t, PAGE_SIZE - fpofs, v->iov_len); + size_t elen = min_t(size_t, v->iov_len - fplen, epages*PAGE_SIZE); + + v->iov_len = fplen + elen; + page_count = get_buf_page_count(v->iov_base, v->iov_len); + if (WARN_ON_ONCE(page_count > possible_vecs)) { + /* + * Something went wrong in the above + * logic... + */ + error = -EINVAL; goto done; - } else { - /* iov[start] is too big, break it */ - int nvec = (buflen + max_iov_size - 1) / - max_iov_size; - - for (j = 0; j < nvec; j++) { - vec.iov_base = - (char *)iov[start].iov_base + - j * max_iov_size; - vec.iov_len = - min_t(int, max_iov_size, - buflen - max_iov_size * j); - remaining_data_length -= vec.iov_len; - ret = smb_direct_post_send_data(st, &send_ctx, &vec, 1, - remaining_data_length); - if (ret) - goto done; } - i++; - if (i == niovs) - break; } - start = i; - buflen = 0; - } else { - i++; - if (i == niovs) { - /* send out all remaining vecs */ - remaining_data_length -= buflen; - ret = smb_direct_post_send_data(st, &send_ctx, - &iov[start], i - start, - remaining_data_length); - if (ret) - goto done; - break; + possible_vecs -= page_count; + nvecs += 1; + possible_bytes -= v->iov_len; + bytes += v->iov_len; + + iov_ofs += v->iov_len; + if (iov_ofs >= iov[iov_idx].iov_len) { + iov_idx += 1; + iov_ofs = 0; } } + + remaining_data_length -= bytes; + + ret = smb_direct_post_send_data(st, &send_ctx, + vecs, nvecs, + remaining_data_length); + if (unlikely(ret)) { + error = ret; + goto done; + } }
done: ret = smb_direct_flush_send_list(st, &send_ctx, true); + if (unlikely(!ret && error)) + ret = error;
/* * As an optimization, we don't wait for individual I/O to finish @@ -1744,6 +1796,11 @@ static int smb_direct_init_params(struct smb_direct_transport *t, return -EINVAL; }
+ if (device->attrs.max_send_sge < SMB_DIRECT_MAX_SEND_SGES) { + pr_err("warning: device max_send_sge = %d too small\n", + device->attrs.max_send_sge); + return -EINVAL; + } if (device->attrs.max_recv_sge < SMB_DIRECT_MAX_RECV_SGES) { pr_err("warning: device max_recv_sge = %d too small\n", device->attrs.max_recv_sge); @@ -1767,7 +1824,7 @@ static int smb_direct_init_params(struct smb_direct_transport *t,
cap->max_send_wr = max_send_wrs; cap->max_recv_wr = t->recv_credit_max; - cap->max_send_sge = max_sge_per_wr; + cap->max_send_sge = SMB_DIRECT_MAX_SEND_SGES; cap->max_recv_sge = SMB_DIRECT_MAX_RECV_SGES; cap->max_inline_data = 0; cap->max_rdma_ctxs = t->max_rw_credits;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miaoqian Lin linmq006@gmail.com
[ Upstream commit 7ebf70cf181651fe3f2e44e95e7e5073d594c9c0 ]
When register_virtio_device() fails in virtio_uml_probe(), the code sets vu_dev->registered = 1 even though the device was not successfully registered. This can lead to use-after-free or other issues.
Fixes: 04e5b1fb0183 ("um: virtio: Remove device on disconnect") Signed-off-by: Miaoqian Lin linmq006@gmail.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/um/drivers/virtio_uml.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/um/drivers/virtio_uml.c b/arch/um/drivers/virtio_uml.c index ad8d78fb1d9aa..de7867ae220d0 100644 --- a/arch/um/drivers/virtio_uml.c +++ b/arch/um/drivers/virtio_uml.c @@ -1250,10 +1250,12 @@ static int virtio_uml_probe(struct platform_device *pdev) device_set_wakeup_capable(&vu_dev->vdev.dev, true);
rc = register_virtio_device(&vu_dev->vdev); - if (rc) + if (rc) { put_device(&vu_dev->vdev.dev); + return rc; + } vu_dev->registered = 1; - return rc; + return 0;
error_init: os_close_file(vu_dev->sock);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tiwei Bie tiwei.btw@antgroup.com
[ Upstream commit df447a3b4a4b961c9979b4b3ffb74317394b9b40 ]
When copying FDs, the copy size should not include the control message header (cmsghdr). Fix it.
Fixes: 5cde6096a4dd ("um: generalize os_rcv_fd") Signed-off-by: Tiwei Bie tiwei.btw@antgroup.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/um/os-Linux/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/um/os-Linux/file.c b/arch/um/os-Linux/file.c index 617886d1fb1e9..21f0e50fb1df9 100644 --- a/arch/um/os-Linux/file.c +++ b/arch/um/os-Linux/file.c @@ -535,7 +535,7 @@ ssize_t os_rcv_fd_msg(int fd, int *fds, unsigned int n_fds, cmsg->cmsg_type != SCM_RIGHTS) return n;
- memcpy(fds, CMSG_DATA(cmsg), cmsg->cmsg_len); + memcpy(fds, CMSG_DATA(cmsg), cmsg->cmsg_len - CMSG_LEN(0)); return n; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Li Tian litian@redhat.com
[ Upstream commit 5577352b55833d0f4350eb5d62eda2df09e84922 ]
Because mlx5e_link_info and mlx5e_ext_link_info have holes e.g. Azure mlx5 reports PTYS 19. Do not return it unless speed is retrieved successfully.
Fixes: 65a5d35571849 ("net/mlx5: Refactor link speed handling with mlx5_link_info struct") Suggested-by: Vitaly Kuznetsov vkuznets@redhat.com Signed-off-by: Li Tian litian@redhat.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Link: https://patch.msgid.link/20250910003732.5973-1-litian@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/port.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c index 2d7adf7444ba2..aa9f2b0a77d36 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c @@ -1170,7 +1170,11 @@ const struct mlx5_link_info *mlx5_port_ptys2info(struct mlx5_core_dev *mdev, mlx5e_port_get_link_mode_info_arr(mdev, &table, &max_size, force_legacy); i = find_first_bit(&temp, max_size); - if (i < max_size) + + /* mlx5e_link_info has holes. Check speed + * is not zero as indication of one. + */ + if (i < max_size && table[i].speed) return &table[i];
return NULL;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ioana Ciornei ioana.ciornei@nxp.com
[ Upstream commit 2690cb089502b80b905f2abdafd1bf2d54e1abef ]
Starting with commit c50e7475961c ("dpaa2-switch: Fix error checking in dpaa2_switch_seed_bp()"), the probing of a second DPSW object errors out like below.
fsl_dpaa2_switch dpsw.1: fsl_mc_driver_probe failed: -12 fsl_dpaa2_switch dpsw.1: probe with driver fsl_dpaa2_switch failed with error -12
The aforementioned commit brought to the surface the fact that seeding buffers into the buffer pool destined for control traffic is not successful and an access violation recoverable error can be seen in the MC firmware log:
[E, qbman_rec_isr:391, QBMAN] QBMAN recoverable event 0x1000000
This happens because the driver incorrectly used the ID of the DPBP object instead of the hardware buffer pool ID when trying to release buffers into it.
This is because any DPSW object uses two buffer pools, one managed by the Linux driver and destined for control traffic packet buffers and the other one managed by the MC firmware and destined only for offloaded traffic. And since the buffer pool managed by the MC firmware does not have an external facing DPBP equivalent, any subsequent DPBP objects created after the first DPSW will have a DPBP id different to the underlying hardware buffer ID.
The issue was not caught earlier because these two numbers can be identical when all DPBP objects are created before the DPSW objects are. This is the case when the DPL file is used to describe the entire DPAA2 object layout and objects are created at boot time and it's also true for the first DPSW being created dynamically using ls-addsw.
Fix this by using the buffer pool ID instead of the DPBP id when releasing buffers into the pool.
Fixes: 2877e4f7e189 ("staging: dpaa2-switch: setup buffer pool and RX path rings") Signed-off-by: Ioana Ciornei ioana.ciornei@nxp.com Link: https://patch.msgid.link/20250910144825.2416019-1-ioana.ciornei@nxp.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c index 4643a33806182..b1e1ad9e4b48e 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c @@ -2736,7 +2736,7 @@ static int dpaa2_switch_setup_dpbp(struct ethsw_core *ethsw) dev_err(dev, "dpsw_ctrl_if_set_pools() failed\n"); goto err_get_attr; } - ethsw->bpid = dpbp_attrs.id; + ethsw->bpid = dpbp_attrs.bpid;
return 0;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anderson Nascimento anderson@allelesecurity.com
[ Upstream commit 2e7bba08923ebc675b1f0e0e0959e68e53047838 ]
A NULL pointer dereference can occur in tcp_ao_finish_connect() during a connect() system call on a socket with a TCP-AO key added and TCP_REPAIR enabled.
The function is called with skb being NULL and attempts to dereference it on tcp_hdr(skb)->seq without a prior skb validation.
Fix this by checking if skb is NULL before dereferencing it.
The commentary is taken from bpf_skops_established(), which is also called in the same flow. Unlike the function being patched, bpf_skops_established() validates the skb before dereferencing it.
int main(void){ struct sockaddr_in sockaddr; struct tcp_ao_add tcp_ao; int sk; int one = 1;
memset(&sockaddr,'\0',sizeof(sockaddr)); memset(&tcp_ao,'\0',sizeof(tcp_ao));
sk = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
sockaddr.sin_family = AF_INET;
memcpy(tcp_ao.alg_name,"cmac(aes128)",12); memcpy(tcp_ao.key,"ABCDEFGHABCDEFGH",16); tcp_ao.keylen = 16;
memcpy(&tcp_ao.addr,&sockaddr,sizeof(sockaddr));
setsockopt(sk, IPPROTO_TCP, TCP_AO_ADD_KEY, &tcp_ao, sizeof(tcp_ao)); setsockopt(sk, IPPROTO_TCP, TCP_REPAIR, &one, sizeof(one));
sockaddr.sin_family = AF_INET; sockaddr.sin_port = htobe16(123);
inet_aton("127.0.0.1", &sockaddr.sin_addr);
connect(sk,(struct sockaddr *)&sockaddr,sizeof(sockaddr));
return 0; }
$ gcc tcp-ao-nullptr.c -o tcp-ao-nullptr -Wall $ unshare -Urn
BUG: kernel NULL pointer dereference, address: 00000000000000b6 PGD 1f648d067 P4D 1f648d067 PUD 1982e8067 PMD 0 Oops: Oops: 0000 [#1] SMP NOPTI Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:tcp_ao_finish_connect (net/ipv4/tcp_ao.c:1182)
Fixes: 7c2ffaf21bd6 ("net/tcp: Calculate TCP-AO traffic keys") Signed-off-by: Anderson Nascimento anderson@allelesecurity.com Reviewed-by: Dmitry Safonov 0x7f454c46@gmail.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://patch.msgid.link/20250911230743.2551-3-anderson@allelesecurity.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_ao.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c index bbb8d5f0eae7d..3338b6cc85c48 100644 --- a/net/ipv4/tcp_ao.c +++ b/net/ipv4/tcp_ao.c @@ -1178,7 +1178,9 @@ void tcp_ao_finish_connect(struct sock *sk, struct sk_buff *skb) if (!ao) return;
- WRITE_ONCE(ao->risn, tcp_hdr(skb)->seq); + /* sk with TCP_REPAIR_ON does not have skb in tcp_finish_connect */ + if (skb) + WRITE_ONCE(ao->risn, tcp_hdr(skb)->seq); ao->rcv_sne = 0;
hlist_for_each_entry_rcu(key, &ao->head, node, lockdep_sock_is_held(sk))
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit 70d99623d5c11e1a9bcc564b8fbad6fa916913d8 ]
The DPLL_CLOCK_QUALITY_LEVEL_ITU_OPT1_EPRC is not reported via netlink due to bug in dpll_msg_add_clock_quality_level(). The usage of DPLL_CLOCK_QUALITY_LEVEL_MAX for both DECLARE_BITMAP() and for_each_set_bit() is not correct because these macros requires bitmap size and not the highest valid bit in the bitmap.
Use correct bitmap size to fix this issue.
Fixes: a1afb959add1 ("dpll: add clock quality level attribute and op") Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com Link: https://patch.msgid.link/20250912093331.862333-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dpll/dpll_netlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/dpll/dpll_netlink.c b/drivers/dpll/dpll_netlink.c index c130f87147fa3..815de752bcd38 100644 --- a/drivers/dpll/dpll_netlink.c +++ b/drivers/dpll/dpll_netlink.c @@ -173,8 +173,8 @@ static int dpll_msg_add_clock_quality_level(struct sk_buff *msg, struct dpll_device *dpll, struct netlink_ext_ack *extack) { + DECLARE_BITMAP(qls, DPLL_CLOCK_QUALITY_LEVEL_MAX + 1) = { 0 }; const struct dpll_device_ops *ops = dpll_device_ops(dpll); - DECLARE_BITMAP(qls, DPLL_CLOCK_QUALITY_LEVEL_MAX) = { 0 }; enum dpll_clock_quality_level ql; int ret;
@@ -183,7 +183,7 @@ dpll_msg_add_clock_quality_level(struct sk_buff *msg, struct dpll_device *dpll, ret = ops->clock_quality_level_get(dpll, dpll_priv(dpll), qls, extack); if (ret) return ret; - for_each_set_bit(ql, qls, DPLL_CLOCK_QUALITY_LEVEL_MAX) + for_each_set_bit(ql, qls, DPLL_CLOCK_QUALITY_LEVEL_MAX + 1) if (nla_put_u32(msg, DPLL_A_CLOCK_QUALITY_LEVEL, ql)) return -EMSGSIZE;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
[ Upstream commit 64863f4ca4945bdb62ce2b30823f39ea9fe95415 ]
rxgk_verify_packet_integrity() may get more errors than just -EPROTO from rxgk_verify_mic_skb(). Pretty much anything other than -ENOMEM constitutes an unrecoverable error. In the case of -ENOMEM, we can just drop the packet and wait for a retransmission.
Similar happens with rxgk_decrypt_skb() and its callers.
Fix rxgk_decrypt_skb() or rxgk_verify_mic_skb() to return a greater variety of abort codes and fix their callers to abort the connection on any error apart from -ENOMEM.
Also preclear the variables used to hold the abort code returned from rxgk_decrypt_skb() or rxgk_verify_mic_skb() to eliminate uninitialised variable warnings.
Fixes: 9d1d2b59341f ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Reported-by: Dan Carpenter dan.carpenter@linaro.org Closes: https://lists.infradead.org/pipermail/linux-afs/2025-April/009739.html Closes: https://lists.infradead.org/pipermail/linux-afs/2025-April/009740.html Signed-off-by: David Howells dhowells@redhat.com cc: Marc Dionne marc.dionne@auristor.com cc: linux-afs@lists.infradead.org Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/2038804.1757631496@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/rxrpc/rxgk.c | 18 ++++++++++-------- net/rxrpc/rxgk_app.c | 10 ++++++---- net/rxrpc/rxgk_common.h | 14 ++++++++++++-- 3 files changed, 28 insertions(+), 14 deletions(-)
diff --git a/net/rxrpc/rxgk.c b/net/rxrpc/rxgk.c index 1e19c605bcc82..dce5a3d8a964f 100644 --- a/net/rxrpc/rxgk.c +++ b/net/rxrpc/rxgk.c @@ -475,7 +475,7 @@ static int rxgk_verify_packet_integrity(struct rxrpc_call *call, struct krb5_buffer metadata; unsigned int offset = sp->offset, len = sp->len; size_t data_offset = 0, data_len = len; - u32 ac; + u32 ac = 0; int ret = -ENOMEM;
_enter(""); @@ -499,9 +499,10 @@ static int rxgk_verify_packet_integrity(struct rxrpc_call *call, ret = rxgk_verify_mic_skb(gk->krb5, gk->rx_Kc, &metadata, skb, &offset, &len, &ac); kfree(hdr); - if (ret == -EPROTO) { - rxrpc_abort_eproto(call, skb, ac, - rxgk_abort_1_verify_mic_eproto); + if (ret < 0) { + if (ret != -ENOMEM) + rxrpc_abort_eproto(call, skb, ac, + rxgk_abort_1_verify_mic_eproto); } else { sp->offset = offset; sp->len = len; @@ -524,15 +525,16 @@ static int rxgk_verify_packet_encrypted(struct rxrpc_call *call, struct rxgk_header hdr; unsigned int offset = sp->offset, len = sp->len; int ret; - u32 ac; + u32 ac = 0;
_enter("");
ret = rxgk_decrypt_skb(gk->krb5, gk->rx_enc, skb, &offset, &len, &ac); - if (ret == -EPROTO) - rxrpc_abort_eproto(call, skb, ac, rxgk_abort_2_decrypt_eproto); - if (ret < 0) + if (ret < 0) { + if (ret != -ENOMEM) + rxrpc_abort_eproto(call, skb, ac, rxgk_abort_2_decrypt_eproto); goto error; + }
if (len < sizeof(hdr)) { ret = rxrpc_abort_eproto(call, skb, RXGK_PACKETSHORT, diff --git a/net/rxrpc/rxgk_app.c b/net/rxrpc/rxgk_app.c index b94b77a1c3178..df684b5a85314 100644 --- a/net/rxrpc/rxgk_app.c +++ b/net/rxrpc/rxgk_app.c @@ -187,7 +187,7 @@ int rxgk_extract_token(struct rxrpc_connection *conn, struct sk_buff *skb, struct key *server_key; unsigned int ticket_offset, ticket_len; u32 kvno, enctype; - int ret, ec; + int ret, ec = 0;
struct { __be32 kvno; @@ -236,9 +236,11 @@ int rxgk_extract_token(struct rxrpc_connection *conn, struct sk_buff *skb, &ticket_offset, &ticket_len, &ec); crypto_free_aead(token_enc); token_enc = NULL; - if (ret < 0) - return rxrpc_abort_conn(conn, skb, ec, ret, - rxgk_abort_resp_tok_dec); + if (ret < 0) { + if (ret != -ENOMEM) + return rxrpc_abort_conn(conn, skb, ec, ret, + rxgk_abort_resp_tok_dec); + }
ret = conn->security->default_decode_ticket(conn, skb, ticket_offset, ticket_len, _key); diff --git a/net/rxrpc/rxgk_common.h b/net/rxrpc/rxgk_common.h index 7370a56559853..80164d89e19c0 100644 --- a/net/rxrpc/rxgk_common.h +++ b/net/rxrpc/rxgk_common.h @@ -88,11 +88,16 @@ int rxgk_decrypt_skb(const struct krb5_enctype *krb5, *_offset += offset; *_len = len; break; + case -EBADMSG: /* Checksum mismatch. */ case -EPROTO: - case -EBADMSG: *_error_code = RXGK_SEALEDINCON; break; + case -EMSGSIZE: + *_error_code = RXGK_PACKETSHORT; + break; + case -ENOPKG: /* Would prefer RXGK_BADETYPE, but not available for YFS. */ default: + *_error_code = RXGK_INCONSISTENCY; break; }
@@ -127,11 +132,16 @@ int rxgk_verify_mic_skb(const struct krb5_enctype *krb5, *_offset += offset; *_len = len; break; + case -EBADMSG: /* Checksum mismatch */ case -EPROTO: - case -EBADMSG: *_error_code = RXGK_SEALEDINCON; break; + case -EMSGSIZE: + *_error_code = RXGK_PACKETSHORT; + break; + case -ENOPKG: /* Would prefer RXGK_BADETYPE, but not available for YFS. */ default: + *_error_code = RXGK_INCONSISTENCY; break; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
[ Upstream commit 2429a197648178cd4dc930a9d87c13c547460564 ]
Fix the following Smatch static checker warning:
net/rxrpc/rxgk_app.c:65 rxgk_yfs_decode_ticket() warn: untrusted unsigned subtract. 'ticket_len - 10 * 4'
by prechecking the length of what we're trying to extract in two places in the token and decoding for a response packet.
Also use sizeof() on the struct we're extracting rather specifying the size numerically to be consistent with the other related statements.
Fixes: 9d1d2b59341f ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Reported-by: Dan Carpenter dan.carpenter@linaro.org Closes: https://lists.infradead.org/pipermail/linux-afs/2025-September/010135.html Signed-off-by: David Howells dhowells@redhat.com cc: Marc Dionne marc.dionne@auristor.com cc: linux-afs@lists.infradead.org Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/2039268.1757631977@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/rxrpc/rxgk_app.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/net/rxrpc/rxgk_app.c b/net/rxrpc/rxgk_app.c index df684b5a85314..30275cb5ba3e2 100644 --- a/net/rxrpc/rxgk_app.c +++ b/net/rxrpc/rxgk_app.c @@ -54,6 +54,10 @@ int rxgk_yfs_decode_ticket(struct rxrpc_connection *conn, struct sk_buff *skb,
_enter("");
+ if (ticket_len < 10 * sizeof(__be32)) + return rxrpc_abort_conn(conn, skb, RXGK_INCONSISTENCY, -EPROTO, + rxgk_abort_resp_short_yfs_tkt); + /* Get the session key length */ ret = skb_copy_bits(skb, ticket_offset, tmp, sizeof(tmp)); if (ret < 0) @@ -195,22 +199,23 @@ int rxgk_extract_token(struct rxrpc_connection *conn, struct sk_buff *skb, __be32 token_len; } container;
+ if (token_len < sizeof(container)) + goto short_packet; + /* Decode the RXGK_TokenContainer object. This tells us which server * key we should be using. We can then fetch the key, get the secret * and set up the crypto to extract the token. */ if (skb_copy_bits(skb, token_offset, &container, sizeof(container)) < 0) - return rxrpc_abort_conn(conn, skb, RXGK_PACKETSHORT, -EPROTO, - rxgk_abort_resp_tok_short); + goto short_packet;
kvno = ntohl(container.kvno); enctype = ntohl(container.enctype); ticket_len = ntohl(container.token_len); ticket_offset = token_offset + sizeof(container);
- if (xdr_round_up(ticket_len) > token_len - 3 * 4) - return rxrpc_abort_conn(conn, skb, RXGK_PACKETSHORT, -EPROTO, - rxgk_abort_resp_tok_short); + if (xdr_round_up(ticket_len) > token_len - sizeof(container)) + goto short_packet;
_debug("KVNO %u", kvno); _debug("ENC %u", enctype); @@ -285,4 +290,8 @@ int rxgk_extract_token(struct rxrpc_connection *conn, struct sk_buff *skb, * also come out this way if the ticket decryption fails. */ return ret; + +short_packet: + return rxrpc_abort_conn(conn, skb, RXGK_PACKETSHORT, -EPROTO, + rxgk_abort_resp_tok_short); }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kamal Heib kheib@redhat.com
[ Upstream commit af82e857df5dd883a4867bcaf5dde041e57a4e33 ]
Add a helper to validate the VF ID and use it in the VF ndo ops to prevent accessing out-of-range entries.
Without this check, users can run commands such as:
# ip link show dev enp135s0 2: enp135s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 00:00:00:01:01:00 brd ff:ff:ff:ff:ff:ff vf 0 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state enable, trust off vf 1 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state enable, trust off # ip link set dev enp135s0 vf 4 mac 00:00:00:00:00:14 # echo $? 0
even though VF 4 does not exist, which results in silent success instead of returning an error.
Fixes: 8a241ef9b9b8 ("octeon_ep: add ndo ops for VFs in PF driver") Signed-off-by: Kamal Heib kheib@redhat.com Reviewed-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20250911223610.1803144-1-kheib@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/marvell/octeon_ep/octep_main.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c index 24499bb36c005..bcea3fc26a8c7 100644 --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c @@ -1124,11 +1124,24 @@ static int octep_set_features(struct net_device *dev, netdev_features_t features return err; }
+static bool octep_is_vf_valid(struct octep_device *oct, int vf) +{ + if (vf >= CFG_GET_ACTIVE_VFS(oct->conf)) { + netdev_err(oct->netdev, "Invalid VF ID %d\n", vf); + return false; + } + + return true; +} + static int octep_get_vf_config(struct net_device *dev, int vf, struct ifla_vf_info *ivi) { struct octep_device *oct = netdev_priv(dev);
+ if (!octep_is_vf_valid(oct, vf)) + return -EINVAL; + ivi->vf = vf; ether_addr_copy(ivi->mac, oct->vf_info[vf].mac_addr); ivi->spoofchk = true; @@ -1143,6 +1156,9 @@ static int octep_set_vf_mac(struct net_device *dev, int vf, u8 *mac) struct octep_device *oct = netdev_priv(dev); int err;
+ if (!octep_is_vf_valid(oct, vf)) + return -EINVAL; + if (!is_valid_ether_addr(mac)) { dev_err(&oct->pdev->dev, "Invalid MAC Address %pM\n", mac); return -EADDRNOTAVAIL;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jamie Bainbridge jamie.bainbridge@gmail.com
[ Upstream commit 56c0a2a9ddc2f5b5078c5fb0f81ab76bbc3d4c37 ]
In the protection override dump path, the firmware can return far too many GRC elements, resulting in attempting to write past the end of the previously-kmalloc'ed dump buffer.
This will result in a kernel panic with reason:
BUG: unable to handle kernel paging request at ADDRESS
where "ADDRESS" is just past the end of the protection override dump buffer. The start address of the buffer is: p_hwfn->cdev->dbg_features[DBG_FEATURE_PROTECTION_OVERRIDE].dump_buf and the size of the buffer is buf_size in the same data structure.
The panic can be arrived at from either the qede Ethernet driver path:
[exception RIP: qed_grc_dump_addr_range+0x108] qed_protection_override_dump at ffffffffc02662ed [qed] qed_dbg_protection_override_dump at ffffffffc0267792 [qed] qed_dbg_feature at ffffffffc026aa8f [qed] qed_dbg_all_data at ffffffffc026b211 [qed] qed_fw_fatal_reporter_dump at ffffffffc027298a [qed] devlink_health_do_dump at ffffffff82497f61 devlink_health_report at ffffffff8249cf29 qed_report_fatal_error at ffffffffc0272baf [qed] qede_sp_task at ffffffffc045ed32 [qede] process_one_work at ffffffff81d19783
or the qedf storage driver path:
[exception RIP: qed_grc_dump_addr_range+0x108] qed_protection_override_dump at ffffffffc068b2ed [qed] qed_dbg_protection_override_dump at ffffffffc068c792 [qed] qed_dbg_feature at ffffffffc068fa8f [qed] qed_dbg_all_data at ffffffffc0690211 [qed] qed_fw_fatal_reporter_dump at ffffffffc069798a [qed] devlink_health_do_dump at ffffffff8aa95e51 devlink_health_report at ffffffff8aa9ae19 qed_report_fatal_error at ffffffffc0697baf [qed] qed_hw_err_notify at ffffffffc06d32d7 [qed] qed_spq_post at ffffffffc06b1011 [qed] qed_fcoe_destroy_conn at ffffffffc06b2e91 [qed] qedf_cleanup_fcport at ffffffffc05e7597 [qedf] qedf_rport_event_handler at ffffffffc05e7bf7 [qedf] fc_rport_work at ffffffffc02da715 [libfc] process_one_work at ffffffff8a319663
Resolve this by clamping the firmware's return value to the maximum number of legal elements the firmware should return.
Fixes: d52c89f120de8 ("qed*: Utilize FW 8.37.2.0") Signed-off-by: Jamie Bainbridge jamie.bainbridge@gmail.com Link: https://patch.msgid.link/f8e1182934aa274c18d0682a12dbaf347595469c.1757485536... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qlogic/qed/qed_debug.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_debug.c b/drivers/net/ethernet/qlogic/qed/qed_debug.c index 9c3d3dd2f8475..1f0cea3cae92f 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_debug.c +++ b/drivers/net/ethernet/qlogic/qed/qed_debug.c @@ -4462,10 +4462,11 @@ static enum dbg_status qed_protection_override_dump(struct qed_hwfn *p_hwfn, goto out; }
- /* Add override window info to buffer */ + /* Add override window info to buffer, preventing buffer overflow */ override_window_dwords = - qed_rd(p_hwfn, p_ptt, GRC_REG_NUMBER_VALID_OVERRIDE_WINDOW) * - PROTECTION_OVERRIDE_ELEMENT_DWORDS; + min(qed_rd(p_hwfn, p_ptt, GRC_REG_NUMBER_VALID_OVERRIDE_WINDOW) * + PROTECTION_OVERRIDE_ELEMENT_DWORDS, + PROTECTION_OVERRIDE_DEPTH_DWORDS); if (override_window_dwords) { addr = BYTES_TO_DWORDS(GRC_REG_PROTECTION_OVERRIDE_WINDOW); offset += qed_grc_dump_addr_range(p_hwfn,
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilya Maximets i.maximets@ovn.org
[ Upstream commit a9888628cb2c768202a4530e2816da1889cc3165 ]
Both OVS and TC flower allow extracting and matching on the DF bit of the outer IP header via OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT in the OVS_KEY_ATTR_TUNNEL and TCA_FLOWER_KEY_FLAGS_TUNNEL_DONT_FRAGMENT in the TCA_FLOWER_KEY_ENC_FLAGS respectively. Flow dissector extracts this information as FLOW_DIS_F_TUNNEL_DONT_FRAGMENT from the tunnel info key.
However, the IP_TUNNEL_DONT_FRAGMENT_BIT in the tunnel key is never actually set, because the tunneling code doesn't actually extract it from the IP header. OAM and CRIT_OPT are extracted by the tunnel implementation code, same code also sets the KEY flag, if present. UDP tunnel core takes care of setting the CSUM flag if the checksum is present in the UDP header, but the DONT_FRAGMENT is not handled at any layer.
Fix that by checking the bit and setting the corresponding flag while populating the tunnel info in the IP layer where it belongs.
Not using __assign_bit as we don't really need to clear the bit in a just initialized field. It also doesn't seem like using __assign_bit will make the code look better.
Clearly, users didn't rely on this functionality for anything very important until now. The reason why this doesn't break OVS logic is that it only matches on what kernel previously parsed out and if kernel consistently reports this bit as zero, OVS will only match on it to be zero, which sort of works. But it is still a bug that the uAPI reports and allows matching on the field that is not actually checked in the packet. And this is causing misleading -df reporting in OVS datapath flows, while the tunnel traffic actually has the bit set in most cases.
This may also cause issues if a hardware properly implements support for tunnel flag matching as it will disagree with the implementation in a software path of TC flower.
Fixes: 7d5437c709de ("openvswitch: Add tunneling interface.") Fixes: 1d17568e74de ("net/sched: cls_flower: add support for matching tunnel control flags") Signed-off-by: Ilya Maximets i.maximets@ovn.org Reviewed-by: Ido Schimmel idosch@nvidia.com Link: https://patch.msgid.link/20250909165440.229890-2-i.maximets@ovn.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/dst_metadata.h | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h index 4160731dcb6e3..1fc2fb03ce3f9 100644 --- a/include/net/dst_metadata.h +++ b/include/net/dst_metadata.h @@ -3,6 +3,7 @@ #define __NET_DST_METADATA_H 1
#include <linux/skbuff.h> +#include <net/ip.h> #include <net/ip_tunnels.h> #include <net/macsec.h> #include <net/dst.h> @@ -220,9 +221,15 @@ static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb, int md_size) { const struct iphdr *iph = ip_hdr(skb); + struct metadata_dst *tun_dst; + + tun_dst = __ip_tun_set_dst(iph->saddr, iph->daddr, iph->tos, iph->ttl, + 0, flags, tunnel_id, md_size);
- return __ip_tun_set_dst(iph->saddr, iph->daddr, iph->tos, iph->ttl, - 0, flags, tunnel_id, md_size); + if (tun_dst && (iph->frag_off & htons(IP_DF))) + __set_bit(IP_TUNNEL_DONT_FRAGMENT_BIT, + tun_dst->u.tun_info.key.tun_flags); + return tun_dst; }
static inline struct metadata_dst *__ipv6_tun_set_dst(const struct in6_addr *saddr,
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 35ae4e86292ef7dfe4edbb9942955c884e984352 ]
After commit 5c3bf6cba791 ("bonding: assign random address if device address is same as bond"), bonding will erroneously randomize the MAC address of the first interface added to the bond if fail_over_mac = follow.
Correct this by additionally testing for the bond being empty before randomizing the MAC.
Fixes: 5c3bf6cba791 ("bonding: assign random address if device address is same as bond") Reported-by: Qiuling Ren qren@redhat.com Signed-off-by: Hangbin Liu liuhangbin@gmail.com Link: https://patch.msgid.link/20250910024336.400253-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index c4d53e8e7c152..e413340be2bff 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -2115,6 +2115,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, memcpy(ss.__data, bond_dev->dev_addr, bond_dev->addr_len); } else if (bond->params.fail_over_mac == BOND_FOM_FOLLOW && BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP && + bond_has_slaves(bond) && memcmp(slave_dev->dev_addr, bond_dev->dev_addr, bond_dev->addr_len) == 0) { /* Set slave to random address to avoid duplicate mac * address in later fail over.
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
[ Upstream commit 96939cec994070aa5df852c10fad5fc303a97ea3 ]
When a SYN containing the 'C' flag (deny join id0) was received, this piece of information was not propagated to the path-manager.
Even if this flag is mainly set on the server side, a client can also tell the server it cannot try to establish new subflows to the client's initial IP address and port. The server's PM should then record such info when received, and before sending events about the new connection.
Fixes: df377be38725 ("mptcp: add deny_join_id0 in mptcp_options_received") Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-1-401... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/subflow.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 1802bc5435a1a..d77a2e374a7ae 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -882,6 +882,10 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
ctx->subflow_id = 1; owner = mptcp_sk(ctx->conn); + + if (mp_opt.deny_join_id0) + WRITE_ONCE(owner->pm.remote_deny_join_id0, true); + mptcp_pm_new_connection(owner, child, 1);
/* with OoO packets we can reach here without ingress
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
[ Upstream commit 24733e193a0d68f20d220e86da0362460c9aa812 ]
The previous commit adds the MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 flag. Make sure it is correctly announced by the other peer when it has been received.
pm_nl_ctl will now display 'deny_join_id0:1' when monitoring the events, and when this flag was set by the other peer.
The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID.
Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment") Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-3-401... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 7 +++++++ tools/testing/selftests/net/mptcp/userspace_pm.sh | 14 +++++++++++--- 2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c index 994a556f46c15..93fea3442216c 100644 --- a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c +++ b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c @@ -188,6 +188,13 @@ static int capture_events(int fd, int event_group) fprintf(stderr, ",error:%u", *(__u8 *)RTA_DATA(attrs)); else if (attrs->rta_type == MPTCP_ATTR_SERVER_SIDE) fprintf(stderr, ",server_side:%u", *(__u8 *)RTA_DATA(attrs)); + else if (attrs->rta_type == MPTCP_ATTR_FLAGS) { + __u16 flags = *(__u16 *)RTA_DATA(attrs); + + /* only print when present, easier */ + if (flags & MPTCP_PM_EV_FLAG_DENY_JOIN_ID0) + fprintf(stderr, ",deny_join_id0:1"); + }
attrs = RTA_NEXT(attrs, msg_len); } diff --git a/tools/testing/selftests/net/mptcp/userspace_pm.sh b/tools/testing/selftests/net/mptcp/userspace_pm.sh index 333064b0b5ac0..97819e18578f4 100755 --- a/tools/testing/selftests/net/mptcp/userspace_pm.sh +++ b/tools/testing/selftests/net/mptcp/userspace_pm.sh @@ -201,6 +201,9 @@ make_connection() is_v6="v4" fi
+ # set this on the client side only: will not affect the rest + ip netns exec "$ns2" sysctl -q net.mptcp.allow_join_initial_addr_port=0 + :>"$client_evts" :>"$server_evts"
@@ -223,23 +226,28 @@ make_connection() local client_token local client_port local client_serverside + local client_nojoin local server_token local server_serverside + local server_nojoin
client_token=$(mptcp_lib_evts_get_info token "$client_evts") client_port=$(mptcp_lib_evts_get_info sport "$client_evts") client_serverside=$(mptcp_lib_evts_get_info server_side "$client_evts") + client_nojoin=$(mptcp_lib_evts_get_info deny_join_id0 "$client_evts") server_token=$(mptcp_lib_evts_get_info token "$server_evts") server_serverside=$(mptcp_lib_evts_get_info server_side "$server_evts") + server_nojoin=$(mptcp_lib_evts_get_info deny_join_id0 "$server_evts")
print_test "Established IP${is_v6} MPTCP Connection ns2 => ns1" - if [ "$client_token" != "" ] && [ "$server_token" != "" ] && [ "$client_serverside" = 0 ] && - [ "$server_serverside" = 1 ] + if [ "${client_token}" != "" ] && [ "${server_token}" != "" ] && + [ "${client_serverside}" = 0 ] && [ "${server_serverside}" = 1 ] && + [ "${client_nojoin:-0}" = 0 ] && [ "${server_nojoin:-0}" = 1 ] then test_pass print_title "Connection info: ${client_addr}:${client_port} -> ${connect_addr}:${app_port}" else - test_fail "Expected tokens (c:${client_token} - s:${server_token}) and server (c:${client_serverside} - s:${server_serverside})" + test_fail "Expected tokens (c:${client_token} - s:${server_token}), server (c:${client_serverside} - s:${server_serverside}), nojoin (c:${client_nojoin} - s:${server_nojoin})" mptcp_lib_result_print_all_tap exit ${KSFT_FAIL} fi
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
[ Upstream commit 92da495cb65719583aa06bc946aeb18a10e1e6e2 ]
When TFO is used, the check to see if the 'C' flag (deny join id0) was set was bypassed.
This flag can be set when TFO is used, so the check should also be done when TFO is used.
Note that the set_fully_established label is also used when a 4th ACK is received. In this case, deny_join_id0 will not be set.
Fixes: dfc8d0603033 ("mptcp: implement delayed seq generation for passive fastopen") Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-4-401... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/options.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c index c6983471dca55..bb4253aa675a6 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -984,13 +984,13 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk, return false; }
- if (mp_opt->deny_join_id0) - WRITE_ONCE(msk->pm.remote_deny_join_id0, true); - if (unlikely(!READ_ONCE(msk->pm.server_side))) pr_warn_once("bogus mpc option on established client sk");
set_fully_established: + if (mp_opt->deny_join_id0) + WRITE_ONCE(msk->pm.remote_deny_join_id0, true); + mptcp_data_lock((struct sock *)msk); __mptcp_subflow_fully_established(msk, subflow, mp_opt); mptcp_data_unlock((struct sock *)msk);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang tanggeliang@kylinos.cn
[ Upstream commit b86418beade11d45540a2d20c4ec1128849b6c27 ]
This patch fixes several issues in the error reporting of the MPTCP sockopt selftest:
1. Fix diff not printed: The error messages for counter mismatches had the actual difference ('diff') as argument, but it was missing in the format string. Displaying it makes the debugging easier.
2. Fix variable usage: The error check for 'mptcpi_bytes_acked' incorrectly used 'ret2' (sent bytes) for both the expected value and the difference calculation. It now correctly uses 'ret' (received bytes), which is the expected value for bytes_acked.
3. Fix off-by-one in diff: The calculation for the 'mptcpi_rcv_delta' diff was 's.mptcpi_rcv_delta - ret', which is off-by-one. It has been corrected to 's.mptcpi_rcv_delta - (ret + 1)' to match the expected value in the condition above it.
Fixes: 5dcff89e1455 ("selftests: mptcp: explicitly tests aggregate counters") Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-5-401... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../testing/selftests/net/mptcp/mptcp_sockopt.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c index e934dd26a59d9..112c07c4c37a3 100644 --- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c +++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c @@ -667,22 +667,26 @@ static void process_one_client(int fd, int pipefd)
do_getsockopts(&s, fd, ret, ret2); if (s.mptcpi_rcv_delta != (uint64_t)ret + 1) - xerror("mptcpi_rcv_delta %" PRIu64 ", expect %" PRIu64, s.mptcpi_rcv_delta, ret + 1, s.mptcpi_rcv_delta - ret); + xerror("mptcpi_rcv_delta %" PRIu64 ", expect %" PRIu64 ", diff %" PRId64, + s.mptcpi_rcv_delta, ret + 1, s.mptcpi_rcv_delta - (ret + 1));
/* be nice when running on top of older kernel */ if (s.pkt_stats_avail) { if (s.last_sample.mptcpi_bytes_sent != ret2) - xerror("mptcpi_bytes_sent %" PRIu64 ", expect %" PRIu64, + xerror("mptcpi_bytes_sent %" PRIu64 ", expect %" PRIu64 + ", diff %" PRId64, s.last_sample.mptcpi_bytes_sent, ret2, s.last_sample.mptcpi_bytes_sent - ret2); if (s.last_sample.mptcpi_bytes_received != ret) - xerror("mptcpi_bytes_received %" PRIu64 ", expect %" PRIu64, + xerror("mptcpi_bytes_received %" PRIu64 ", expect %" PRIu64 + ", diff %" PRId64, s.last_sample.mptcpi_bytes_received, ret, s.last_sample.mptcpi_bytes_received - ret); if (s.last_sample.mptcpi_bytes_acked != ret) - xerror("mptcpi_bytes_acked %" PRIu64 ", expect %" PRIu64, - s.last_sample.mptcpi_bytes_acked, ret2, - s.last_sample.mptcpi_bytes_acked - ret2); + xerror("mptcpi_bytes_acked %" PRIu64 ", expect %" PRIu64 + ", diff %" PRId64, + s.last_sample.mptcpi_bytes_acked, ret, + s.last_sample.mptcpi_bytes_acked - ret); }
close(fd);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yeounsu Moon yyyynoom@gmail.com
[ Upstream commit 93ab4881a4e2b9657bdce4b8940073bfb4ed5eab ]
`netif_rx()` already increments `rx_dropped` core stat when it fails. The driver was also updating `ndev->stats.rx_dropped` in the same path. Since both are reported together via `ip -s -s` command, this resulted in drops being counted twice in user-visible stats.
Keep the driver update on `if (unlikely(!skb))`, but skip it after `netif_rx()` errors.
Fixes: caf586e5f23c ("net: add a core netdev->rx_dropped counter") Signed-off-by: Yeounsu Moon yyyynoom@gmail.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20250913060135.35282-3-yyyynoom@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/natsemi/ns83820.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/natsemi/ns83820.c b/drivers/net/ethernet/natsemi/ns83820.c index 56d5464222d97..cdbf82affa7be 100644 --- a/drivers/net/ethernet/natsemi/ns83820.c +++ b/drivers/net/ethernet/natsemi/ns83820.c @@ -820,7 +820,7 @@ static void rx_irq(struct net_device *ndev) struct ns83820 *dev = PRIV(ndev); struct rx_info *info = &dev->rx_info; unsigned next_rx; - int rx_rc, len; + int len; u32 cmdsts; __le32 *desc; unsigned long flags; @@ -881,8 +881,10 @@ static void rx_irq(struct net_device *ndev) if (likely(CMDSTS_OK & cmdsts)) { #endif skb_put(skb, len); - if (unlikely(!skb)) + if (unlikely(!skb)) { + ndev->stats.rx_dropped++; goto netdev_mangle_me_harder_failed; + } if (cmdsts & CMDSTS_DEST_MULTI) ndev->stats.multicast++; ndev->stats.rx_packets++; @@ -901,15 +903,12 @@ static void rx_irq(struct net_device *ndev) __vlan_hwaccel_put_tag(skb, htons(ETH_P_IPV6), tag); } #endif - rx_rc = netif_rx(skb); - if (NET_RX_DROP == rx_rc) { -netdev_mangle_me_harder_failed: - ndev->stats.rx_dropped++; - } + netif_rx(skb); } else { dev_kfree_skb_irq(skb); }
+netdev_mangle_me_harder_failed: nr++; next_rx = info->next_rx; desc = info->descs + (DESC_SIZE * next_rx);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jacob Keller jacob.e.keller@intel.com
[ Upstream commit 84bf1ac85af84d354c7a2fdbdc0d4efc8aaec34b ]
The ice_put_rx_mbuf() function handles calling ice_put_rx_buf() for each buffer in the current frame. This function was introduced as part of handling multi-buffer XDP support in the ice driver.
It works by iterating over the buffers from first_desc up to 1 plus the total number of fragments in the frame, cached from before the XDP program was executed.
If the hardware posts a descriptor with a size of 0, the logic used in ice_put_rx_mbuf() breaks. Such descriptors get skipped and don't get added as fragments in ice_add_xdp_frag. Since the buffer isn't counted as a fragment, we do not iterate over it in ice_put_rx_mbuf(), and thus we don't call ice_put_rx_buf().
Because we don't call ice_put_rx_buf(), we don't attempt to re-use the page or free it. This leaves a stale page in the ring, as we don't increment next_to_alloc.
The ice_reuse_rx_page() assumes that the next_to_alloc has been incremented properly, and that it always points to a buffer with a NULL page. Since this function doesn't check, it will happily recycle a page over the top of the next_to_alloc buffer, losing track of the old page.
Note that this leak only occurs for multi-buffer frames. The ice_put_rx_mbuf() function always handles at least one buffer, so a single-buffer frame will always get handled correctly. It is not clear precisely why the hardware hands us descriptors with a size of 0 sometimes, but it happens somewhat regularly with "jumbo frames" used by 9K MTU.
To fix ice_put_rx_mbuf(), we need to make sure to call ice_put_rx_buf() on all buffers between first_desc and next_to_clean. Borrow the logic of a similar function in i40e used for this same purpose. Use the same logic also in ice_get_pgcnts().
Instead of iterating over just the number of fragments, use a loop which iterates until the current index reaches to the next_to_clean element just past the current frame. Unlike i40e, the ice_put_rx_mbuf() function does call ice_put_rx_buf() on the last buffer of the frame indicating the end of packet.
For non-linear (multi-buffer) frames, we need to take care when adjusting the pagecnt_bias. An XDP program might release fragments from the tail of the frame, in which case that fragment page is already released. Only update the pagecnt_bias for the first descriptor and fragments still remaining post-XDP program. Take care to only access the shared info for fragmented buffers, as this avoids a significant cache miss.
The xdp_xmit value only needs to be updated if an XDP program is run, and only once per packet. Drop the xdp_xmit pointer argument from ice_put_rx_mbuf(). Instead, set xdp_xmit in the ice_clean_rx_irq() function directly. This avoids needing to pass the argument and avoids an extra bit-wise OR for each buffer in the frame.
Move the increment of the ntc local variable to ensure its updated *before* all calls to ice_get_pgcnts() or ice_put_rx_mbuf(), as the loop logic requires the index of the element just after the current frame.
Now that we use an index pointer in the ring to identify the packet, we no longer need to track or cache the number of fragments in the rx_ring.
Cc: Christoph Petrausch christoph.petrausch@deepl.com Cc: Jesper Dangaard Brouer hawk@kernel.org Reported-by: Jaroslav Pulchart jaroslav.pulchart@gooddata.com Closes: https://lore.kernel.org/netdev/CAK8fFZ4hY6GUJNENz3wY9jaYLZXGfpr7dnZxzGMYoE44... Fixes: 743bbd93cf29 ("ice: put Rx buffers after being done with current frame") Tested-by: Michal Kubiak michal.kubiak@intel.com Signed-off-by: Jacob Keller jacob.e.keller@intel.com Acked-by: Jesper Dangaard Brouer hawk@kernel.org Tested-by: Priya Singh priyax.singh@intel.com Tested-by: Rinitha S sx.rinitha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_txrx.c | 80 ++++++++++------------- drivers/net/ethernet/intel/ice/ice_txrx.h | 1 - 2 files changed, 34 insertions(+), 47 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index c50cf3ad190e9..4766597ac5550 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -865,10 +865,6 @@ ice_add_xdp_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp, __skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, rx_buf->page, rx_buf->page_offset, size); sinfo->xdp_frags_size += size; - /* remember frag count before XDP prog execution; bpf_xdp_adjust_tail() - * can pop off frags but driver has to handle it on its own - */ - rx_ring->nr_frags = sinfo->nr_frags;
if (page_is_pfmemalloc(rx_buf->page)) xdp_buff_set_frag_pfmemalloc(xdp); @@ -939,20 +935,20 @@ ice_get_rx_buf(struct ice_rx_ring *rx_ring, const unsigned int size, /** * ice_get_pgcnts - grab page_count() for gathered fragments * @rx_ring: Rx descriptor ring to store the page counts on + * @ntc: the next to clean element (not included in this frame!) * * This function is intended to be called right before running XDP * program so that the page recycling mechanism will be able to take * a correct decision regarding underlying pages; this is done in such * way as XDP program can change the refcount of page */ -static void ice_get_pgcnts(struct ice_rx_ring *rx_ring) +static void ice_get_pgcnts(struct ice_rx_ring *rx_ring, unsigned int ntc) { - u32 nr_frags = rx_ring->nr_frags + 1; u32 idx = rx_ring->first_desc; struct ice_rx_buf *rx_buf; u32 cnt = rx_ring->count;
- for (int i = 0; i < nr_frags; i++) { + while (idx != ntc) { rx_buf = &rx_ring->rx_buf[idx]; rx_buf->pgcnt = page_count(rx_buf->page);
@@ -1125,62 +1121,51 @@ ice_put_rx_buf(struct ice_rx_ring *rx_ring, struct ice_rx_buf *rx_buf) }
/** - * ice_put_rx_mbuf - ice_put_rx_buf() caller, for all frame frags + * ice_put_rx_mbuf - ice_put_rx_buf() caller, for all buffers in frame * @rx_ring: Rx ring with all the auxiliary data * @xdp: XDP buffer carrying linear + frags part - * @xdp_xmit: XDP_TX/XDP_REDIRECT verdict storage - * @ntc: a current next_to_clean value to be stored at rx_ring + * @ntc: the next to clean element (not included in this frame!) * @verdict: return code from XDP program execution * - * Walk through gathered fragments and satisfy internal page - * recycle mechanism; we take here an action related to verdict - * returned by XDP program; + * Called after XDP program is completed, or on error with verdict set to + * ICE_XDP_CONSUMED. + * + * Walk through buffers from first_desc to the end of the frame, releasing + * buffers and satisfying internal page recycle mechanism. The action depends + * on verdict from XDP program. */ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp, - u32 *xdp_xmit, u32 ntc, u32 verdict) + u32 ntc, u32 verdict) { - u32 nr_frags = rx_ring->nr_frags + 1; u32 idx = rx_ring->first_desc; u32 cnt = rx_ring->count; - u32 post_xdp_frags = 1; struct ice_rx_buf *buf; - int i; + u32 xdp_frags = 0; + int i = 0;
if (unlikely(xdp_buff_has_frags(xdp))) - post_xdp_frags += xdp_get_shared_info_from_buff(xdp)->nr_frags; + xdp_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
- for (i = 0; i < post_xdp_frags; i++) { + while (idx != ntc) { buf = &rx_ring->rx_buf[idx]; + if (++idx == cnt) + idx = 0;
- if (verdict & (ICE_XDP_TX | ICE_XDP_REDIR)) { + /* An XDP program could release fragments from the end of the + * buffer. For these, we need to keep the pagecnt_bias as-is. + * To do this, only adjust pagecnt_bias for fragments up to + * the total remaining after the XDP program has run. + */ + if (verdict != ICE_XDP_CONSUMED) ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz); - *xdp_xmit |= verdict; - } else if (verdict & ICE_XDP_CONSUMED) { + else if (i++ <= xdp_frags) buf->pagecnt_bias++; - } else if (verdict == ICE_XDP_PASS) { - ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz); - }
ice_put_rx_buf(rx_ring, buf); - - if (++idx == cnt) - idx = 0; - } - /* handle buffers that represented frags released by XDP prog; - * for these we keep pagecnt_bias as-is; refcount from struct page - * has been decremented within XDP prog and we do not have to increase - * the biased refcnt - */ - for (; i < nr_frags; i++) { - buf = &rx_ring->rx_buf[idx]; - ice_put_rx_buf(rx_ring, buf); - if (++idx == cnt) - idx = 0; }
xdp->data = NULL; rx_ring->first_desc = ntc; - rx_ring->nr_frags = 0; }
/** @@ -1260,6 +1245,10 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) /* retrieve a buffer from the ring */ rx_buf = ice_get_rx_buf(rx_ring, size, ntc);
+ /* Increment ntc before calls to ice_put_rx_mbuf() */ + if (++ntc == cnt) + ntc = 0; + if (!xdp->data) { void *hard_start;
@@ -1268,24 +1257,23 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) xdp_prepare_buff(xdp, hard_start, offset, size, !!offset); xdp_buff_clear_frags_flag(xdp); } else if (ice_add_xdp_frag(rx_ring, xdp, rx_buf, size)) { - ice_put_rx_mbuf(rx_ring, xdp, NULL, ntc, ICE_XDP_CONSUMED); + ice_put_rx_mbuf(rx_ring, xdp, ntc, ICE_XDP_CONSUMED); break; } - if (++ntc == cnt) - ntc = 0;
/* skip if it is NOP desc */ if (ice_is_non_eop(rx_ring, rx_desc)) continue;
- ice_get_pgcnts(rx_ring); + ice_get_pgcnts(rx_ring, ntc); xdp_verdict = ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_desc); if (xdp_verdict == ICE_XDP_PASS) goto construct_skb; total_rx_bytes += xdp_get_buff_len(xdp); total_rx_pkts++;
- ice_put_rx_mbuf(rx_ring, xdp, &xdp_xmit, ntc, xdp_verdict); + ice_put_rx_mbuf(rx_ring, xdp, ntc, xdp_verdict); + xdp_xmit |= xdp_verdict & (ICE_XDP_TX | ICE_XDP_REDIR);
continue; construct_skb: @@ -1298,7 +1286,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) rx_ring->ring_stats->rx_stats.alloc_buf_failed++; xdp_verdict = ICE_XDP_CONSUMED; } - ice_put_rx_mbuf(rx_ring, xdp, &xdp_xmit, ntc, xdp_verdict); + ice_put_rx_mbuf(rx_ring, xdp, ntc, xdp_verdict);
if (!skb) break; diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h index a4b1e95146327..07155e615f75a 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h @@ -358,7 +358,6 @@ struct ice_rx_ring { struct ice_tx_ring *xdp_ring; struct ice_rx_ring *next; /* pointer to next ring in q_vector */ struct xsk_buff_pool *xsk_pool; - u32 nr_frags; u16 max_frame; u16 rx_buf_len; dma_addr_t dma; /* physical address of ring */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit e37084a26070c546ae7961ee135bbfb15fbe13fd ]
i40e has a feature which writes to memory location last descriptor successfully sent. Memory barrier in i40e_clean_tx_irq() was used to avoid forward-reading descriptor fields in case DD bit was not set. Having mentioned feature in place implies that such situation will not happen as we know in advance how many descriptors HW has dealt with.
Besides, this barrier placement was wrong. Idea is to have this protection *after* reading DD bit from HW descriptor, not before. Digging through git history showed me that indeed barrier was before DD bit check, anyways the commit introducing i40e_get_head() should have wiped it out altogether.
Also, there was one commit doing s/read_barrier_depends/smp_rmb when get head feature was already in place, but it was only theoretical based on ixgbe experiences, which is different in these terms as that driver has to read DD bit from HW descriptor.
Fixes: 1943d8ba9507 ("i40e/i40evf: enable hardware feature head write back") Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Tested-by: Rinitha S sx.rinitha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index c006f716a3bdb..ca7517a68a2c3 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -947,9 +947,6 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi, if (!eop_desc) break;
- /* prevent any other reads prior to eop_desc */ - smp_rmb(); - i40e_trace(clean_tx_irq, tx_ring, tx_desc, tx_buf); /* we have caught up to head, no work left to do */ if (tx_head == tx_desc)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jedrzej Jagielski jedrzej.jagielski@intel.com
[ Upstream commit b85936e95a4bd2a07e134af71e2c0750a69d2b8b ]
Currently aci.lock is initialized too late. A bunch of ACI callbacks using the lock are called prior it's initialized.
Commit 337369f8ce9e ("locking/mutex: Add MUTEX_WARN_ON() into fast path") highlights that issue what results in call trace.
[ 4.092899] DEBUG_LOCKS_WARN_ON(lock->magic != lock) [ 4.092910] WARNING: CPU: 0 PID: 578 at kernel/locking/mutex.c:154 mutex_lock+0x6d/0x80 [ 4.098757] Call Trace: [ 4.098847] <TASK> [ 4.098922] ixgbe_aci_send_cmd+0x8c/0x1e0 [ixgbe] [ 4.099108] ? hrtimer_try_to_cancel+0x18/0x110 [ 4.099277] ixgbe_aci_get_fw_ver+0x52/0xa0 [ixgbe] [ 4.099460] ixgbe_check_fw_error+0x1fc/0x2f0 [ixgbe] [ 4.099650] ? usleep_range_state+0x69/0xd0 [ 4.099811] ? usleep_range_state+0x8c/0xd0 [ 4.099964] ixgbe_probe+0x3b0/0x12d0 [ixgbe] [ 4.100132] local_pci_probe+0x43/0xa0 [ 4.100267] work_for_cpu_fn+0x13/0x20 [ 4.101647] </TASK>
Move aci.lock mutex initialization to ixgbe_sw_init() before any ACI command is sent. Along with that move also related SWFW semaphore in order to reduce size of ixgbe_probe() and that way all locks are initialized in ixgbe_sw_init().
Reviewed-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Fixes: 4600cdf9f5ac ("ixgbe: Enable link management in E610 device") Signed-off-by: Jedrzej Jagielski jedrzej.jagielski@intel.com Tested-by: Rinitha S sx.rinitha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index cba860f0e1f15..2a857037dd102 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -6801,6 +6801,13 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter, break; }
+ /* Make sure the SWFW semaphore is in a valid state */ + if (hw->mac.ops.init_swfw_sync) + hw->mac.ops.init_swfw_sync(hw); + + if (hw->mac.type == ixgbe_mac_e610) + mutex_init(&hw->aci.lock); + #ifdef IXGBE_FCOE /* FCoE support exists, always init the FCoE lock */ spin_lock_init(&adapter->fcoe.lock); @@ -11474,10 +11481,6 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent) if (err) goto err_sw_init;
- /* Make sure the SWFW semaphore is in a valid state */ - if (hw->mac.ops.init_swfw_sync) - hw->mac.ops.init_swfw_sync(hw); - if (ixgbe_check_fw_error(adapter)) return ixgbe_recovery_probe(adapter);
@@ -11681,8 +11684,6 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent) ether_addr_copy(hw->mac.addr, hw->mac.perm_addr); ixgbe_mac_set_default_filter(adapter);
- if (hw->mac.type == ixgbe_mac_e610) - mutex_init(&hw->aci.lock); timer_setup(&adapter->service_timer, ixgbe_service_timer, 0);
if (ixgbe_removed(hw->hw_addr)) { @@ -11838,9 +11839,9 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent) devl_unlock(adapter->devlink); ixgbe_release_hw_control(adapter); ixgbe_clear_interrupt_scheme(adapter); +err_sw_init: if (hw->mac.type == ixgbe_mac_e610) mutex_destroy(&adapter->hw.aci.lock); -err_sw_init: ixgbe_disable_sriov(adapter); adapter->flags2 &= ~IXGBE_FLAG2_SEARCH_FOR_SFP; iounmap(adapter->io_addr);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jedrzej Jagielski jedrzej.jagielski@intel.com
[ Upstream commit 316ba68175b04a9f6f75295764789ea94e31d48c ]
There's another issue with aci.lock and previous patch uncovers it. aci.lock is being destroyed during removing ixgbe while some of the ixgbe closing routines are still ongoing. These routines use Admin Command Interface which require taking aci.lock which has been already destroyed what leads to call trace.
[ +0.000004] DEBUG_LOCKS_WARN_ON(lock->magic != lock) [ +0.000007] WARNING: CPU: 12 PID: 10277 at kernel/locking/mutex.c:155 mutex_lock+0x5f/0x70 [ +0.000002] Call Trace: [ +0.000003] <TASK> [ +0.000006] ixgbe_aci_send_cmd+0xc8/0x220 [ixgbe] [ +0.000049] ? try_to_wake_up+0x29d/0x5d0 [ +0.000009] ixgbe_disable_rx_e610+0xc4/0x110 [ixgbe] [ +0.000032] ixgbe_disable_rx+0x3d/0x200 [ixgbe] [ +0.000027] ixgbe_down+0x102/0x3b0 [ixgbe] [ +0.000031] ixgbe_close_suspend+0x28/0x90 [ixgbe] [ +0.000028] ixgbe_close+0xfb/0x100 [ixgbe] [ +0.000025] __dev_close_many+0xae/0x220 [ +0.000005] dev_close_many+0xc2/0x1a0 [ +0.000004] ? kernfs_should_drain_open_files+0x2a/0x40 [ +0.000005] unregister_netdevice_many_notify+0x204/0xb00 [ +0.000006] ? __kernfs_remove.part.0+0x109/0x210 [ +0.000006] ? kobj_kset_leave+0x4b/0x70 [ +0.000008] unregister_netdevice_queue+0xf6/0x130 [ +0.000006] unregister_netdev+0x1c/0x40 [ +0.000005] ixgbe_remove+0x216/0x290 [ixgbe] [ +0.000021] pci_device_remove+0x42/0xb0 [ +0.000007] device_release_driver_internal+0x19c/0x200 [ +0.000008] driver_detach+0x48/0x90 [ +0.000003] bus_remove_driver+0x6d/0xf0 [ +0.000006] pci_unregister_driver+0x2e/0xb0 [ +0.000005] ixgbe_exit_module+0x1c/0xc80 [ixgbe]
Same as for the previous commit, the issue has been highlighted by the commit 337369f8ce9e ("locking/mutex: Add MUTEX_WARN_ON() into fast path").
Move destroying aci.lock to the end of ixgbe_remove(), as this simply fixes the issue.
Fixes: 4600cdf9f5ac ("ixgbe: Enable link management in E610 device") Signed-off-by: Jedrzej Jagielski jedrzej.jagielski@intel.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Reviewed-by: Simon Horman horms@kernel.org Tested-by: Rinitha S sx.rinitha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 2a857037dd102..d5c421451f319 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -11892,10 +11892,8 @@ static void ixgbe_remove(struct pci_dev *pdev) set_bit(__IXGBE_REMOVING, &adapter->state); cancel_work_sync(&adapter->service_task);
- if (adapter->hw.mac.type == ixgbe_mac_e610) { + if (adapter->hw.mac.type == ixgbe_mac_e610) ixgbe_disable_link_status_events(adapter); - mutex_destroy(&adapter->hw.aci.lock); - }
if (adapter->mii_bus) mdiobus_unregister(adapter->mii_bus); @@ -11955,6 +11953,9 @@ static void ixgbe_remove(struct pci_dev *pdev) disable_dev = !test_and_set_bit(__IXGBE_DISABLED, &adapter->state); free_netdev(netdev);
+ if (adapter->hw.mac.type == ixgbe_mac_e610) + mutex_destroy(&adapter->hw.aci.lock); + if (disable_dev) pci_disable_device(pdev); }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kohei Enju enjuk@amazon.com
[ Upstream commit 528eb4e19ec0df30d0c9ae4074ce945667dde919 ]
When igc_led_setup() fails, igc_probe() fails and triggers kernel panic in free_netdev() since unregister_netdev() is not called. [1] This behavior can be tested using fault-injection framework, especially the failslab feature. [2]
Since LED support is not mandatory, treat LED setup failures as non-fatal and continue probe with a warning message, consequently avoiding the kernel panic.
[1] kernel BUG at net/core/dev.c:12047! Oops: invalid opcode: 0000 [#1] SMP NOPTI CPU: 0 UID: 0 PID: 937 Comm: repro-igc-led-e Not tainted 6.17.0-rc4-enjuk-tnguy-00865-gc4940196ab02 #64 PREEMPT(voluntary) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:free_netdev+0x278/0x2b0 [...] Call Trace: <TASK> igc_probe+0x370/0x910 local_pci_probe+0x3a/0x80 pci_device_probe+0xd1/0x200 [...]
[2] #!/bin/bash -ex
FAILSLAB_PATH=/sys/kernel/debug/failslab/ DEVICE=0000:00:05.0 START_ADDR=$(grep " igc_led_setup" /proc/kallsyms \ | awk '{printf("0x%s", $1)}') END_ADDR=$(printf "0x%x" $((START_ADDR + 0x100)))
echo $START_ADDR > $FAILSLAB_PATH/require-start echo $END_ADDR > $FAILSLAB_PATH/require-end echo 1 > $FAILSLAB_PATH/times echo 100 > $FAILSLAB_PATH/probability echo N > $FAILSLAB_PATH/ignore-gfp-wait
echo $DEVICE > /sys/bus/pci/drivers/igc/bind
Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226") Signed-off-by: Kohei Enju enjuk@amazon.com Reviewed-by: Paul Menzel pmenzel@molgen.mpg.de Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Reviewed-by: Vitaly Lifshits vitaly.lifshits@intel.com Reviewed-by: Kurt Kanzenbach kurt@linutronix.de Tested-by: Mor Bar-Gabay morx.bar.gabay@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igc/igc.h | 1 + drivers/net/ethernet/intel/igc/igc_main.c | 12 +++++++++--- 2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h index 859a15e4ccbab..1bbe7f72757c0 100644 --- a/drivers/net/ethernet/intel/igc/igc.h +++ b/drivers/net/ethernet/intel/igc/igc.h @@ -343,6 +343,7 @@ struct igc_adapter { /* LEDs */ struct mutex led_mutex; struct igc_led_classdev *leds; + bool leds_available; };
void igc_up(struct igc_adapter *adapter); diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index 1b4465d6b2b72..5b8f9b5121489 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -7301,8 +7301,14 @@ static int igc_probe(struct pci_dev *pdev,
if (IS_ENABLED(CONFIG_IGC_LEDS)) { err = igc_led_setup(adapter); - if (err) - goto err_register; + if (err) { + netdev_warn_once(netdev, + "LED init failed (%d); continuing without LED support\n", + err); + adapter->leds_available = false; + } else { + adapter->leds_available = true; + } }
return 0; @@ -7358,7 +7364,7 @@ static void igc_remove(struct pci_dev *pdev) cancel_work_sync(&adapter->watchdog_task); hrtimer_cancel(&adapter->hrtimer);
- if (IS_ENABLED(CONFIG_IGC_LEDS)) + if (IS_ENABLED(CONFIG_IGC_LEDS) && adapter->leds_available) igc_led_free(adapter);
/* Release control of h/w to f/w. If f/w is AMT enabled, this
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Remy D. Farley one-d-wide@protonmail.com
[ Upstream commit 109f8b51543d106aee50dfe911f439e43fb30c7a ]
I'm trying to generate Rust bindings for netlink using the yaml spec.
It looks like there's a typo in conntrack spec: attribute set conntrack-attrs defines attributes "counters-{orig,reply}" (plural), while get operation references "counter-{orig,reply}" (singular). The latter should be fixed, as it denotes multiple counters (packet and byte). The corresonding C define is CTA_COUNTERS_ORIG.
Also, dump request references "nfgen-family" attribute, which neither exists in conntrack-attrs attrset nor ctattr_type enum. There's member of nfgenmsg struct with the same name, which is where family value is actually taken from.
static int ctnetlink_dump_exp_ct(struct net *net, struct sock *ctnl, struct sk_buff *skb, const struct nlmsghdr *nlh, const struct nlattr * const cda[], struct netlink_ext_ack *extack) { int err; struct nfgenmsg *nfmsg = nlmsg_data(nlh); u_int8_t u3 = nfmsg->nfgen_family;
^^^^^^^^^^^^
Signed-off-by: Remy D. Farley one-d-wide@protonmail.com Fixes: 23fc9311a526 ("netlink: specs: add conntrack dump and stats dump support") Link: https://patch.msgid.link/20250913140515.1132886-1-one-d-wide@protonmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/netlink/specs/conntrack.yaml | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/Documentation/netlink/specs/conntrack.yaml b/Documentation/netlink/specs/conntrack.yaml index 840dc4504216b..1865ddf01fb0f 100644 --- a/Documentation/netlink/specs/conntrack.yaml +++ b/Documentation/netlink/specs/conntrack.yaml @@ -575,8 +575,8 @@ operations: - nat-dst - timeout - mark - - counter-orig - - counter-reply + - counters-orig + - counters-reply - use - id - nat-dst @@ -591,7 +591,6 @@ operations: request: value: 0x101 attributes: - - nfgen-family - mark - filter - status @@ -608,8 +607,8 @@ operations: - nat-dst - timeout - mark - - counter-orig - - counter-reply + - counters-orig + - counters-reply - use - id - nat-dst
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jianbo Liu jianbol@nvidia.com
[ Upstream commit 6b4be64fd9fec16418f365c2d8e47a7566e9eba5 ]
The function mlx5_uplink_netdev_get() gets the uplink netdevice pointer from mdev->mlx5e_res.uplink_netdev. However, the netdevice can be removed and its pointer cleared when unbound from the mlx5_core.eth driver. This results in a NULL pointer, causing a kernel panic.
BUG: unable to handle page fault for address: 0000000000001300 at RIP: 0010:mlx5e_vport_rep_load+0x22a/0x270 [mlx5_core] Call Trace: <TASK> mlx5_esw_offloads_rep_load+0x68/0xe0 [mlx5_core] esw_offloads_enable+0x593/0x910 [mlx5_core] mlx5_eswitch_enable_locked+0x341/0x420 [mlx5_core] mlx5_devlink_eswitch_mode_set+0x17e/0x3a0 [mlx5_core] devlink_nl_eswitch_set_doit+0x60/0xd0 genl_family_rcv_msg_doit+0xe0/0x130 genl_rcv_msg+0x183/0x290 netlink_rcv_skb+0x4b/0xf0 genl_rcv+0x24/0x40 netlink_unicast+0x255/0x380 netlink_sendmsg+0x1f3/0x420 __sock_sendmsg+0x38/0x60 __sys_sendto+0x119/0x180 do_syscall_64+0x53/0x1d0 entry_SYSCALL_64_after_hwframe+0x4b/0x53
Ensure the pointer is valid before use by checking it for NULL. If it is valid, immediately call netdev_hold() to take a reference, and preventing the netdevice from being freed while it is in use.
Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode") Signed-off-by: Jianbo Liu jianbol@nvidia.com Reviewed-by: Cosmin Ratiu cratiu@nvidia.com Reviewed-by: Jiri Pirko jiri@nvidia.com Reviewed-by: Dragos Tatulea dtatulea@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://patch.msgid.link/1757939074-617281-2-git-send-email-tariqt@nvidia.co... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/en_rep.c | 27 +++++++++++++++---- .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 1 + .../ethernet/mellanox/mlx5/core/lib/mlx5.h | 15 ++++++++++- include/linux/mlx5/driver.h | 1 + 4 files changed, 38 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index 63a7a788fb0db..cd0242eb008c2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -1506,12 +1506,21 @@ static const struct mlx5e_profile mlx5e_uplink_rep_profile = { static int mlx5e_vport_uplink_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep) { - struct mlx5e_priv *priv = netdev_priv(mlx5_uplink_netdev_get(dev)); struct mlx5e_rep_priv *rpriv = mlx5e_rep_to_rep_priv(rep); + struct net_device *netdev; + struct mlx5e_priv *priv; + int err; + + netdev = mlx5_uplink_netdev_get(dev); + if (!netdev) + return 0;
+ priv = netdev_priv(netdev); rpriv->netdev = priv->netdev; - return mlx5e_netdev_change_profile(priv, &mlx5e_uplink_rep_profile, - rpriv); + err = mlx5e_netdev_change_profile(priv, &mlx5e_uplink_rep_profile, + rpriv); + mlx5_uplink_netdev_put(dev, netdev); + return err; }
static void @@ -1638,8 +1647,16 @@ mlx5e_vport_rep_unload(struct mlx5_eswitch_rep *rep) { struct mlx5e_rep_priv *rpriv = mlx5e_rep_to_rep_priv(rep); struct net_device *netdev = rpriv->netdev; - struct mlx5e_priv *priv = netdev_priv(netdev); - void *ppriv = priv->ppriv; + struct mlx5e_priv *priv; + void *ppriv; + + if (!netdev) { + ppriv = rpriv; + goto free_ppriv; + } + + priv = netdev_priv(netdev); + ppriv = priv->ppriv;
if (rep->vport == MLX5_VPORT_UPLINK) { mlx5e_vport_uplink_rep_unload(rpriv); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c index ad9f6fca9b6a2..c6476e943e98d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c @@ -743,6 +743,7 @@ static u32 mlx5_esw_qos_lag_link_speed_get_locked(struct mlx5_core_dev *mdev) speed = lksettings.base.speed;
out: + mlx5_uplink_netdev_put(mdev, slave); return speed; }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h index 37d5f445598c7..a7486e6d0d5ef 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h @@ -52,7 +52,20 @@ static inline struct net *mlx5_core_net(struct mlx5_core_dev *dev)
static inline struct net_device *mlx5_uplink_netdev_get(struct mlx5_core_dev *mdev) { - return mdev->mlx5e_res.uplink_netdev; + struct mlx5e_resources *mlx5e_res = &mdev->mlx5e_res; + struct net_device *netdev; + + mutex_lock(&mlx5e_res->uplink_netdev_lock); + netdev = mlx5e_res->uplink_netdev; + netdev_hold(netdev, &mlx5e_res->tracker, GFP_KERNEL); + mutex_unlock(&mlx5e_res->uplink_netdev_lock); + return netdev; +} + +static inline void mlx5_uplink_netdev_put(struct mlx5_core_dev *mdev, + struct net_device *netdev) +{ + netdev_put(netdev, &mdev->mlx5e_res.tracker); }
struct mlx5_sd; diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index e6ba8f4f4bd1f..27850ebb651b3 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -662,6 +662,7 @@ struct mlx5e_resources { bool tisn_valid; } hw_objs; struct net_device *uplink_netdev; + netdevice_tracker tracker; struct mutex uplink_netdev_lock; struct mlx5_crypto_dek_priv *dek_priv; };
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lama Kayal lkayal@nvidia.com
[ Upstream commit 7601a0a46216f4ba05adff2de75923b4e8e585c2 ]
The cited commit adds a miss table for switchdev mode. But it uses the same level as policy table. Will hit the following error when running command:
# ip xfrm state add src 192.168.1.22 dst 192.168.1.21 proto \ esp spi 1001 reqid 10001 aead 'rfc4106(gcm(aes))' \ 0x3a189a7f9374955d3817886c8587f1da3df387ff 128 \ mode tunnel offload dev enp8s0f0 dir in Error: mlx5_core: Device failed to offload this state.
The dmesg error is:
mlx5_core 0000:03:00.0: ipsec_miss_create:578:(pid 311797): fail to create IPsec miss_rule err=-22
Fix it by adding a new miss level to avoid the error.
Fixes: 7d9e292ecd67 ("net/mlx5e: Move IPSec policy check after decryption") Signed-off-by: Jianbo Liu jianbol@nvidia.com Signed-off-by: Chris Mi cmi@nvidia.com Signed-off-by: Lama Kayal lkayal@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://patch.msgid.link/1757939074-617281-4-git-send-email-tariqt@nvidia.co... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en/fs.h | 1 + drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h | 1 + drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c | 3 ++- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 4 ++-- 4 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h index 9560fcba643f5..ac65e31914802 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h @@ -92,6 +92,7 @@ enum { MLX5E_ACCEL_FS_ESP_FT_LEVEL = MLX5E_INNER_TTC_FT_LEVEL + 1, MLX5E_ACCEL_FS_ESP_FT_ERR_LEVEL, MLX5E_ACCEL_FS_POL_FT_LEVEL, + MLX5E_ACCEL_FS_POL_MISS_FT_LEVEL, MLX5E_ACCEL_FS_ESP_FT_ROCE_LEVEL, #endif }; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h index ffcd0cdeb7754..23703f28386ad 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h @@ -185,6 +185,7 @@ struct mlx5e_ipsec_rx_create_attr { u32 family; int prio; int pol_level; + int pol_miss_level; int sa_level; int status_level; enum mlx5_flow_namespace_type chains_ns; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c index 98b6a3a623f99..65dc3529283b6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c @@ -747,6 +747,7 @@ static void ipsec_rx_create_attr_set(struct mlx5e_ipsec *ipsec, attr->family = family; attr->prio = MLX5E_NIC_PRIO; attr->pol_level = MLX5E_ACCEL_FS_POL_FT_LEVEL; + attr->pol_miss_level = MLX5E_ACCEL_FS_POL_MISS_FT_LEVEL; attr->sa_level = MLX5E_ACCEL_FS_ESP_FT_LEVEL; attr->status_level = MLX5E_ACCEL_FS_ESP_FT_ERR_LEVEL; attr->chains_ns = MLX5_FLOW_NAMESPACE_KERNEL; @@ -833,7 +834,7 @@ static int ipsec_rx_chains_create_miss(struct mlx5e_ipsec *ipsec,
ft_attr.max_fte = 1; ft_attr.autogroup.max_num_groups = 1; - ft_attr.level = attr->pol_level; + ft_attr.level = attr->pol_miss_level; ft_attr.prio = attr->prio;
ft = mlx5_create_auto_grouped_flow_table(attr->ns, &ft_attr); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 29ce09af59aef..3b57ef6b3de38 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -114,9 +114,9 @@ #define ETHTOOL_NUM_PRIOS 11 #define ETHTOOL_MIN_LEVEL (KERNEL_MIN_LEVEL + ETHTOOL_NUM_PRIOS) /* Vlan, mac, ttc, inner ttc, {UDP/ANY/aRFS/accel/{esp, esp_err}}, IPsec policy, - * {IPsec RoCE MPV,Alias table},IPsec RoCE policy + * IPsec policy miss, {IPsec RoCE MPV,Alias table},IPsec RoCE policy */ -#define KERNEL_NIC_PRIO_NUM_LEVELS 10 +#define KERNEL_NIC_PRIO_NUM_LEVELS 11 #define KERNEL_NIC_NUM_PRIOS 1 /* One more level for tc, and one more for promisc */ #define KERNEL_MIN_LEVEL (KERNEL_NIC_PRIO_NUM_LEVELS + 2)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit a8ba87f04ca9cdec06776ce92dce1395026dc3bb ]
Unlike IPv4, IPv6 routing strictly requires the source address to be valid on the outgoing interface. If the NS target is set to a remote VLAN interface, and the source address is also configured on a VLAN over a bond interface, setting the oif to the bond device will fail to retrieve the correct destination route.
Fix this by not setting the oif to the bond device when retrieving the NS target destination. This allows the correct destination device (the VLAN interface) to be determined, so that bond_verify_device_path can return the proper VLAN tags for sending NS messages.
Reported-by: David Wilder wilder@us.ibm.com Closes: https://lore.kernel.org/netdev/aGOKggdfjv0cApTO@fedora/ Suggested-by: Jay Vosburgh jv@jvosburgh.net Tested-by: David Wilder wilder@us.ibm.com Acked-by: Jay Vosburgh jv@jvosburgh.net Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets") Signed-off-by: Hangbin Liu liuhangbin@gmail.com Link: https://patch.msgid.link/20250916080127.430626-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index e413340be2bff..e23195dd74776 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3339,7 +3339,6 @@ static void bond_ns_send_all(struct bonding *bond, struct slave *slave) /* Find out through which dev should the packet go */ memset(&fl6, 0, sizeof(struct flowi6)); fl6.daddr = targets[i]; - fl6.flowi6_oif = bond->dev->ifindex;
dst = ip6_route_output(dev_net(bond->dev), NULL, &fl6); if (dst->error) {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sathesh B Edara sedara@marvell.com
[ Upstream commit a72175c985132885573593222a7b088cf49b07ae ]
Currently, VF MAC address info is not updated when the MAC address is configured from VF, and it is not cleared when the VF is removed. This leads to stale or missing MAC information in the PF, which may cause incorrect state tracking or inconsistencies when VFs are hot-plugged or reassigned.
Fix this by: - storing the VF MAC address in the PF when it is set from VF - clearing the stored VF MAC address when the VF is removed
This ensures that the PF always has correct VF MAC state.
Fixes: cde29af9e68e ("octeon_ep: add PF-VF mailbox communication") Signed-off-by: Sathesh B Edara sedara@marvell.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20250916133207.21737-1-sedara@marvell.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/octeon_ep/octep_pfvf_mbox.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_pfvf_mbox.c b/drivers/net/ethernet/marvell/octeon_ep/octep_pfvf_mbox.c index ebecdd29f3bd0..0867fab61b190 100644 --- a/drivers/net/ethernet/marvell/octeon_ep/octep_pfvf_mbox.c +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_pfvf_mbox.c @@ -196,6 +196,7 @@ static void octep_pfvf_get_mac_addr(struct octep_device *oct, u32 vf_id, vf_id); return; } + ether_addr_copy(oct->vf_info[vf_id].mac_addr, rsp->s_set_mac.mac_addr); rsp->s_set_mac.type = OCTEP_PFVF_MBOX_TYPE_RSP_ACK; }
@@ -205,6 +206,8 @@ static void octep_pfvf_dev_remove(struct octep_device *oct, u32 vf_id, { int err;
+ /* Reset VF-specific information maintained by the PF */ + memset(&oct->vf_info[vf_id], 0, sizeof(struct octep_pfvf_info)); err = octep_ctrl_net_dev_remove(oct, vf_id); if (err) { rsp->s.type = OCTEP_PFVF_MBOX_TYPE_RSP_NACK;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@google.com
[ Upstream commit 45c8a6cc2bcd780e634a6ba8e46bffbdf1fc5c01 ]
syzbot reported the splat below where a socket had tcp_sk(sk)->fastopen_rsk in the TCP_ESTABLISHED state. [0]
syzbot reused the server-side TCP Fast Open socket as a new client before the TFO socket completes 3WHS:
1. accept() 2. connect(AF_UNSPEC) 3. connect() to another destination
As of accept(), sk->sk_state is TCP_SYN_RECV, and tcp_disconnect() changes it to TCP_CLOSE and makes connect() possible, which restarts timers.
Since tcp_disconnect() forgot to clear tcp_sk(sk)->fastopen_rsk, the retransmit timer triggered the warning and the intended packet was not retransmitted.
Let's call reqsk_fastopen_remove() in tcp_disconnect().
[0]: WARNING: CPU: 2 PID: 0 at net/ipv4/tcp_timer.c:542 tcp_retransmit_timer (net/ipv4/tcp_timer.c:542 (discriminator 7)) Modules linked in: CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Not tainted 6.17.0-rc5-g201825fb4278 #62 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:tcp_retransmit_timer (net/ipv4/tcp_timer.c:542 (discriminator 7)) Code: 41 55 41 54 55 53 48 8b af b8 08 00 00 48 89 fb 48 85 ed 0f 84 55 01 00 00 0f b6 47 12 3c 03 74 0c 0f b6 47 12 3c 04 74 04 90 <0f> 0b 90 48 8b 85 c0 00 00 00 48 89 ef 48 8b 40 30 e8 6a 4f 06 3e RSP: 0018:ffffc900002f8d40 EFLAGS: 00010293 RAX: 0000000000000002 RBX: ffff888106911400 RCX: 0000000000000017 RDX: 0000000002517619 RSI: ffffffff83764080 RDI: ffff888106911400 RBP: ffff888106d5c000 R08: 0000000000000001 R09: ffffc900002f8de8 R10: 00000000000000c2 R11: ffffc900002f8ff8 R12: ffff888106911540 R13: ffff888106911480 R14: ffff888106911840 R15: ffffc900002f8de0 FS: 0000000000000000(0000) GS:ffff88907b768000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8044d69d90 CR3: 0000000002c30003 CR4: 0000000000370ef0 Call Trace: <IRQ> tcp_write_timer (net/ipv4/tcp_timer.c:738) call_timer_fn (kernel/time/timer.c:1747) __run_timers (kernel/time/timer.c:1799 kernel/time/timer.c:2372) timer_expire_remote (kernel/time/timer.c:2385 kernel/time/timer.c:2376 kernel/time/timer.c:2135) tmigr_handle_remote_up (kernel/time/timer_migration.c:944 kernel/time/timer_migration.c:1035) __walk_groups.isra.0 (kernel/time/timer_migration.c:533 (discriminator 1)) tmigr_handle_remote (kernel/time/timer_migration.c:1096) handle_softirqs (./arch/x86/include/asm/jump_label.h:36 ./include/trace/events/irq.h:142 kernel/softirq.c:580) irq_exit_rcu (kernel/softirq.c:614 kernel/softirq.c:453 kernel/softirq.c:680 kernel/softirq.c:696) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1050 (discriminator 35) arch/x86/kernel/apic/apic.c:1050 (discriminator 35)) </IRQ>
Fixes: 8336886f786f ("tcp: TCP Fast Open Server - support TFO listeners") Reported-by: syzkaller syzkaller@googlegroups.com Signed-off-by: Kuniyuki Iwashima kuniyu@google.com Link: https://patch.msgid.link/20250915175800.118793-2-kuniyu@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 461a9ab540af0..98da33e0c308b 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3330,6 +3330,7 @@ int tcp_disconnect(struct sock *sk, int flags) struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); int old_state = sk->sk_state; + struct request_sock *req; u32 seq;
if (old_state != TCP_CLOSE) @@ -3445,6 +3446,10 @@ int tcp_disconnect(struct sock *sk, int flags)
/* Clean up fastopen related fields */ + req = rcu_dereference_protected(tp->fastopen_rsk, + lockdep_sock_is_held(sk)); + if (req) + reqsk_fastopen_remove(sk, req, false); tcp_free_fastopen_req(tp); inet_clear_bit(DEFER_CONNECT, sk); tp->fastopen_client_fail = 0;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 0aeb54ac4cd5cf8f60131b4d9ec0b6dc9c27b20d ]
Normally we wait for the socket to buffer up the whole record before we service it. If the socket has a tiny buffer, however, we read out the data sooner, to prevent connection stalls. Make sure that we abort the connection when we find out late that the record is actually invalid. Retrying the parsing is fine in itself but since we copy some more data each time before we parse we can overflow the allocated skb space.
Constructing a scenario in which we're under pressure without enough data in the socket to parse the length upfront is quite hard. syzbot figured out a way to do this by serving us the header in small OOB sends, and then filling in the recvbuf with a large normal send.
Make sure that tls_rx_msg_size() aborts strp, if we reach an invalid record there's really no way to recover.
Reported-by: Lee Jones lee@kernel.org Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Reviewed-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: Jakub Kicinski kuba@kernel.org Link: https://patch.msgid.link/20250917002814.1743558-1-kuba@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls.h | 1 + net/tls/tls_strp.c | 14 +++++++++----- net/tls/tls_sw.c | 3 +-- 3 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/net/tls/tls.h b/net/tls/tls.h index 4e077068e6d98..e4c42731ce39a 100644 --- a/net/tls/tls.h +++ b/net/tls/tls.h @@ -141,6 +141,7 @@ void update_sk_prot(struct sock *sk, struct tls_context *ctx);
int wait_on_pending_writer(struct sock *sk, long *timeo); void tls_err_abort(struct sock *sk, int err); +void tls_strp_abort_strp(struct tls_strparser *strp, int err);
int init_prot_info(struct tls_prot_info *prot, const struct tls_crypto_info *crypto_info, diff --git a/net/tls/tls_strp.c b/net/tls/tls_strp.c index d71643b494a1a..98e12f0ff57e5 100644 --- a/net/tls/tls_strp.c +++ b/net/tls/tls_strp.c @@ -13,7 +13,7 @@
static struct workqueue_struct *tls_strp_wq;
-static void tls_strp_abort_strp(struct tls_strparser *strp, int err) +void tls_strp_abort_strp(struct tls_strparser *strp, int err) { if (strp->stopped) return; @@ -211,11 +211,17 @@ static int tls_strp_copyin_frag(struct tls_strparser *strp, struct sk_buff *skb, struct sk_buff *in_skb, unsigned int offset, size_t in_len) { + unsigned int nfrag = skb->len / PAGE_SIZE; size_t len, chunk; skb_frag_t *frag; int sz;
- frag = &skb_shinfo(skb)->frags[skb->len / PAGE_SIZE]; + if (unlikely(nfrag >= skb_shinfo(skb)->nr_frags)) { + DEBUG_NET_WARN_ON_ONCE(1); + return -EMSGSIZE; + } + + frag = &skb_shinfo(skb)->frags[nfrag];
len = in_len; /* First make sure we got the header */ @@ -520,10 +526,8 @@ static int tls_strp_read_sock(struct tls_strparser *strp) tls_strp_load_anchor_with_queue(strp, inq); if (!strp->stm.full_len) { sz = tls_rx_msg_size(strp, strp->anchor); - if (sz < 0) { - tls_strp_abort_strp(strp, sz); + if (sz < 0) return sz; - }
strp->stm.full_len = sz;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index bac65d0d4e3e1..daac9fd4be7eb 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -2474,8 +2474,7 @@ int tls_rx_msg_size(struct tls_strparser *strp, struct sk_buff *skb) return data_len + TLS_HEADER_SIZE;
read_failure: - tls_err_abort(strp->sk, ret); - + tls_strp_abort_strp(strp, ret); return ret; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tariq Toukan tariqt@nvidia.com
[ Upstream commit 3fbfe251cc9f6d391944282cdb9bcf0bd02e01f8 ]
This reverts commit d24341740fe48add8a227a753e68b6eedf4b385a. It causes errors when trying to configure QoS, as well as loss of L2 connectivity (on multi-host devices).
Reported-by: Jakub Kicinski kuba@kernel.org Link: https://lore.kernel.org/20250910170011.70528106@kernel.org Fixes: d24341740fe4 ("net/mlx5e: Update and set Xon/Xoff upon port speed set") Signed-off-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index e39c51cfc8e6c..f0142d32b648f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -136,8 +136,6 @@ void mlx5e_update_carrier(struct mlx5e_priv *priv) if (up) { netdev_info(priv->netdev, "Link up\n"); netif_carrier_on(priv->netdev); - mlx5e_port_manual_buffer_config(priv, 0, priv->netdev->mtu, - NULL, NULL, NULL); } else { netdev_info(priv->netdev, "Link down\n"); netif_carrier_off(priv->netdev);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 87ebb628a5acb892eba41ef1d8989beb8f036034 ]
Andrei Vagin reported that blamed commit broke CRIU.
Indeed, while we want to keep sk_uid unchanged when a socket is cloned, we want to clear sk->sk_ino.
Otherwise, sock_diag might report multiple sockets sharing the same inode number.
Move the clearing part from sock_orphan() to sk_set_socket(sk, NULL), called both from sock_orphan() and sk_clone_lock().
Fixes: 5d6b58c932ec ("net: lockless sock_i_ino()") Closes: https://lore.kernel.org/netdev/aMhX-VnXkYDpKd9V@google.com/ Closes: https://github.com/checkpoint-restore/criu/issues/2744 Reported-by: Andrei Vagin avagin@google.com Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Andrei Vagin avagin@google.com Link: https://patch.msgid.link/20250917135337.1736101-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/sock.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h index a348ae145eda4..6e9f4c126672d 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2061,6 +2061,9 @@ static inline void sk_set_socket(struct sock *sk, struct socket *sock) if (sock) { WRITE_ONCE(sk->sk_uid, SOCK_INODE(sock)->i_uid); WRITE_ONCE(sk->sk_ino, SOCK_INODE(sock)->i_ino); + } else { + /* Note: sk_uid is unchanged. */ + WRITE_ONCE(sk->sk_ino, 0); } }
@@ -2082,8 +2085,6 @@ static inline void sock_orphan(struct sock *sk) sock_set_flag(sk, SOCK_DEAD); sk_set_socket(sk, NULL); sk->sk_wq = NULL; - /* Note: sk_uid is unchanged. */ - WRITE_ONCE(sk->sk_ino, 0); write_unlock_bh(&sk->sk_callback_lock); }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexey Nepomnyashih sdl@nppct.ru
[ Upstream commit cca7b1cfd7b8a0eff2a3510c5e0f10efe8fa3758 ]
The expression `(conf->instr_type == 64) << iq_no` can overflow because `iq_no` may be as high as 64 (`CN23XX_MAX_RINGS_PER_PF`). Casting the operand to `u64` ensures correct 64-bit arithmetic.
Fixes: f21fb3ed364b ("Add support of Cavium Liquidio ethernet adapters") Signed-off-by: Alexey Nepomnyashih sdl@nppct.ru Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/cavium/liquidio/request_manager.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cavium/liquidio/request_manager.c b/drivers/net/ethernet/cavium/liquidio/request_manager.c index de8a6ce86ad7e..12105ffb5dac6 100644 --- a/drivers/net/ethernet/cavium/liquidio/request_manager.c +++ b/drivers/net/ethernet/cavium/liquidio/request_manager.c @@ -126,7 +126,7 @@ int octeon_init_instr_queue(struct octeon_device *oct, oct->io_qmask.iq |= BIT_ULL(iq_no);
/* Set the 32B/64B mode for each input queue */ - oct->io_qmask.iq64B |= ((conf->instr_type == 64) << iq_no); + oct->io_qmask.iq64B |= ((u64)(conf->instr_type == 64) << iq_no); iq->iqcmd_64B = (conf->instr_type == 64);
oct->fn_list.setup_iq_regs(oct, iq_no);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Duoming Zhou duoming@zju.edu.cn
[ Upstream commit cfa7d9b1e3a8604afc84e9e51d789c29574fb216 ]
The original code uses cancel_delayed_work() in cnic_cm_stop_bnx2x_hw(), which does not guarantee that the delayed work item 'delete_task' has fully completed if it was already running. Additionally, the delayed work item is cyclic, the flush_workqueue() in cnic_cm_stop_bnx2x_hw() only blocks and waits for work items that were already queued to the workqueue prior to its invocation. Any work items submitted after flush_workqueue() is called are not included in the set of tasks that the flush operation awaits. This means that after the cyclic work items have finished executing, a delayed work item may still exist in the workqueue. This leads to use-after-free scenarios where the cnic_dev is deallocated by cnic_free_dev(), while delete_task remains active and attempt to dereference cnic_dev in cnic_delete_task().
A typical race condition is illustrated below:
CPU 0 (cleanup) | CPU 1 (delayed work callback) cnic_netdev_event() | cnic_stop_hw() | cnic_delete_task() cnic_cm_stop_bnx2x_hw() | ... cancel_delayed_work() | /* the queue_delayed_work() flush_workqueue() | executes after flush_workqueue()*/ | queue_delayed_work() cnic_free_dev(dev)//free | cnic_delete_task() //new instance | dev = cp->dev; //use
Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure that the cyclic delayed work item is properly canceled and that any ongoing execution of the work item completes before the cnic_dev is deallocated. Furthermore, since cancel_delayed_work_sync() uses __flush_work(work, true) to synchronously wait for any currently executing instance of the work item to finish, the flush_workqueue() becomes redundant and should be removed.
This bug was identified through static analysis. To reproduce the issue and validate the fix, I simulated the cnic PCI device in QEMU and introduced intentional delays — such as inserting calls to ssleep() within the cnic_delete_task() function — to increase the likelihood of triggering the bug.
Fixes: fdf24086f475 ("cnic: Defer iscsi connection cleanup") Signed-off-by: Duoming Zhou duoming@zju.edu.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/cnic.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c index a9040c42d2ff9..6e97a5a7daaf9 100644 --- a/drivers/net/ethernet/broadcom/cnic.c +++ b/drivers/net/ethernet/broadcom/cnic.c @@ -4230,8 +4230,7 @@ static void cnic_cm_stop_bnx2x_hw(struct cnic_dev *dev)
cnic_bnx2x_delete_wait(dev, 0);
- cancel_delayed_work(&cp->delete_task); - flush_workqueue(cnic_wq); + cancel_delayed_work_sync(&cp->delete_task);
if (atomic_read(&cp->iscsi_conn) != 0) netdev_warn(dev->netdev, "%d iSCSI connections not destroyed\n",
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Duoming Zhou duoming@zju.edu.cn
[ Upstream commit f8b4687151021db61841af983f1cb7be6915d4ef ]
The original code relies on cancel_delayed_work() in otx2_ptp_destroy(), which does not ensure that the delayed work item synctstamp_work has fully completed if it was already running. This leads to use-after-free scenarios where otx2_ptp is deallocated by otx2_ptp_destroy(), while synctstamp_work remains active and attempts to dereference otx2_ptp in otx2_sync_tstamp(). Furthermore, the synctstamp_work is cyclic, the likelihood of triggering the bug is nonnegligible.
A typical race condition is illustrated below:
CPU 0 (cleanup) | CPU 1 (delayed work callback) otx2_remove() | otx2_ptp_destroy() | otx2_sync_tstamp() cancel_delayed_work() | kfree(ptp) | | ptp = container_of(...); //UAF | ptp-> //UAF
This is confirmed by a KASAN report:
BUG: KASAN: slab-use-after-free in __run_timer_base.part.0+0x7d7/0x8c0 Write of size 8 at addr ffff88800aa09a18 by task bash/136 ... Call Trace: <IRQ> dump_stack_lvl+0x55/0x70 print_report+0xcf/0x610 ? __run_timer_base.part.0+0x7d7/0x8c0 kasan_report+0xb8/0xf0 ? __run_timer_base.part.0+0x7d7/0x8c0 __run_timer_base.part.0+0x7d7/0x8c0 ? __pfx___run_timer_base.part.0+0x10/0x10 ? __pfx_read_tsc+0x10/0x10 ? ktime_get+0x60/0x140 ? lapic_next_event+0x11/0x20 ? clockevents_program_event+0x1d4/0x2a0 run_timer_softirq+0xd1/0x190 handle_softirqs+0x16a/0x550 irq_exit_rcu+0xaf/0xe0 sysvec_apic_timer_interrupt+0x70/0x80 </IRQ> ... Allocated by task 1: kasan_save_stack+0x24/0x50 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x7f/0x90 otx2_ptp_init+0xb1/0x860 otx2_probe+0x4eb/0xc30 local_pci_probe+0xdc/0x190 pci_device_probe+0x2fe/0x470 really_probe+0x1ca/0x5c0 __driver_probe_device+0x248/0x310 driver_probe_device+0x44/0x120 __driver_attach+0xd2/0x310 bus_for_each_dev+0xed/0x170 bus_add_driver+0x208/0x500 driver_register+0x132/0x460 do_one_initcall+0x89/0x300 kernel_init_freeable+0x40d/0x720 kernel_init+0x1a/0x150 ret_from_fork+0x10c/0x1a0 ret_from_fork_asm+0x1a/0x30
Freed by task 136: kasan_save_stack+0x24/0x50 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3a/0x60 __kasan_slab_free+0x3f/0x50 kfree+0x137/0x370 otx2_ptp_destroy+0x38/0x80 otx2_remove+0x10d/0x4c0 pci_device_remove+0xa6/0x1d0 device_release_driver_internal+0xf8/0x210 pci_stop_bus_device+0x105/0x150 pci_stop_and_remove_bus_device_locked+0x15/0x30 remove_store+0xcc/0xe0 kernfs_fop_write_iter+0x2c3/0x440 vfs_write+0x871/0xd70 ksys_write+0xee/0x1c0 do_syscall_64+0xac/0x280 entry_SYSCALL_64_after_hwframe+0x77/0x7f ...
Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure that the delayed work item is properly canceled before the otx2_ptp is deallocated.
This bug was initially identified through static analysis. To reproduce and test it, I simulated the OcteonTX2 PCI device in QEMU and introduced artificial delays within the otx2_sync_tstamp() function to increase the likelihood of triggering the bug.
Fixes: 2958d17a8984 ("octeontx2-pf: Add support for ptp 1-step mode on CN10K silicon") Signed-off-by: Duoming Zhou duoming@zju.edu.cn Reviewed-by: Vadim Fedorenko vadim.fedorenko@linux.dev Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/octeontx2/nic/otx2_ptp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ptp.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ptp.c @@ -491,7 +491,7 @@ void otx2_ptp_destroy(struct otx2_nic *p if (!ptp) return;
- cancel_delayed_work(&pfvf->ptp->synctstamp_work); + cancel_delayed_work_sync(&pfvf->ptp->synctstamp_work);
ptp_clock_unregister(ptp->ptp_clock); kfree(ptp);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namjae Jeon linkinjeon@kernel.org
commit 5282491fc49d5614ac6ddcd012e5743eecb6a67c upstream.
If data_offset and data_length of smb_direct_data_transfer struct are invalid, out of bounds issue could happen. This patch validate data_offset and data_length field in recv_done.
Cc: stable@vger.kernel.org Fixes: 2ea086e35c3d ("ksmbd: add buffer validation for smb direct") Reviewed-by: Stefan Metzmacher metze@samba.org Reported-by: Luigino Camastra, Aisle Research luigino.camastra@aisle.com Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/transport_rdma.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-)
--- a/fs/smb/server/transport_rdma.c +++ b/fs/smb/server/transport_rdma.c @@ -554,7 +554,7 @@ static void recv_done(struct ib_cq *cq, case SMB_DIRECT_MSG_DATA_TRANSFER: { struct smb_direct_data_transfer *data_transfer = (struct smb_direct_data_transfer *)recvmsg->packet; - unsigned int data_length; + unsigned int data_offset, data_length; int avail_recvmsg_count, receive_credits;
if (wc->byte_len < @@ -565,14 +565,15 @@ static void recv_done(struct ib_cq *cq, }
data_length = le32_to_cpu(data_transfer->data_length); - if (data_length) { - if (wc->byte_len < sizeof(struct smb_direct_data_transfer) + - (u64)data_length) { - put_recvmsg(t, recvmsg); - smb_direct_disconnect_rdma_connection(t); - return; - } + data_offset = le32_to_cpu(data_transfer->data_offset); + if (wc->byte_len < data_offset || + wc->byte_len < (u64)data_offset + data_length) { + put_recvmsg(t, recvmsg); + smb_direct_disconnect_rdma_connection(t); + return; + }
+ if (data_length) { if (t->full_packet_received) recvmsg->first_segment = true;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Metzmacher metze@samba.org
commit e1868ba37fd27c6a68e31565402b154beaa65df0 upstream.
This is inspired by the check for data_offset + data_length.
Cc: Steve French smfrench@gmail.com Cc: Tom Talpey tom@talpey.com Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: stable@vger.kernel.org Fixes: 2ea086e35c3d ("ksmbd: add buffer validation for smb direct") Acked-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Stefan Metzmacher metze@samba.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/transport_rdma.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
--- a/fs/smb/server/transport_rdma.c +++ b/fs/smb/server/transport_rdma.c @@ -554,7 +554,7 @@ static void recv_done(struct ib_cq *cq, case SMB_DIRECT_MSG_DATA_TRANSFER: { struct smb_direct_data_transfer *data_transfer = (struct smb_direct_data_transfer *)recvmsg->packet; - unsigned int data_offset, data_length; + u32 remaining_data_length, data_offset, data_length; int avail_recvmsg_count, receive_credits;
if (wc->byte_len < @@ -564,6 +564,7 @@ static void recv_done(struct ib_cq *cq, return; }
+ remaining_data_length = le32_to_cpu(data_transfer->remaining_data_length); data_length = le32_to_cpu(data_transfer->data_length); data_offset = le32_to_cpu(data_transfer->data_offset); if (wc->byte_len < data_offset || @@ -571,6 +572,14 @@ static void recv_done(struct ib_cq *cq, put_recvmsg(t, recvmsg); smb_direct_disconnect_rdma_connection(t); return; + } + if (remaining_data_length > t->max_fragmented_recv_size || + data_length > t->max_fragmented_recv_size || + (u64)remaining_data_length + (u64)data_length > + (u64)t->max_fragmented_recv_size) { + put_recvmsg(t, recvmsg); + smb_direct_disconnect_rdma_connection(t); + return; }
if (data_length) {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sergey Senozhatsky senozhatsky@chromium.org
commit ce4be9e4307c5a60701ff6e0cafa74caffdc54ce upstream.
Parallel concurrent writes to the same zram index result in leaked zsmalloc handles. Schematically we can have something like this:
CPU0 CPU1 zram_slot_lock() zs_free(handle) zram_slot_lock() zram_slot_lock() zs_free(handle) zram_slot_lock()
compress compress handle = zs_malloc() handle = zs_malloc() zram_slot_lock zram_set_handle(handle) zram_slot_lock zram_slot_lock zram_set_handle(handle) zram_slot_lock
Either CPU0 or CPU1 zsmalloc handle will leak because zs_free() is done too early. In fact, we need to reset zram entry right before we set its new handle, all under the same slot lock scope.
Link: https://lkml.kernel.org/r/20250909045150.635345-1-senozhatsky@chromium.org Fixes: 71268035f5d7 ("zram: free slot memory early during write") Signed-off-by: Sergey Senozhatsky senozhatsky@chromium.org Reported-by: Changhui Zhong czhong@redhat.com Closes: https://lore.kernel.org/all/CAGVVp+UtpGoW5WEdEU7uVTtsSCjPN=ksN6EcvyypAtFDOUf... Tested-by: Changhui Zhong czhong@redhat.com Cc: Jens Axboe axboe@kernel.dk Cc: Minchan Kim minchan@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/zram/zram_drv.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
--- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -1794,6 +1794,7 @@ static int write_same_filled_page(struct u32 index) { zram_slot_lock(zram, index); + zram_free_page(zram, index); zram_set_flag(zram, index, ZRAM_SAME); zram_set_handle(zram, index, fill); zram_slot_unlock(zram, index); @@ -1831,6 +1832,7 @@ static int write_incompressible_page(str kunmap_local(src);
zram_slot_lock(zram, index); + zram_free_page(zram, index); zram_set_flag(zram, index, ZRAM_HUGE); zram_set_handle(zram, index, handle); zram_set_obj_size(zram, index, PAGE_SIZE); @@ -1854,11 +1856,6 @@ static int zram_write_page(struct zram * unsigned long element; bool same_filled;
- /* First, free memory allocated to this slot (if any) */ - zram_slot_lock(zram, index); - zram_free_page(zram, index); - zram_slot_unlock(zram, index); - mem = kmap_local_page(page); same_filled = page_same_filled(mem, &element); kunmap_local(mem); @@ -1900,6 +1897,7 @@ static int zram_write_page(struct zram * zcomp_stream_put(zstrm);
zram_slot_lock(zram, index); + zram_free_page(zram, index); zram_set_handle(zram, index, handle); zram_set_obj_size(zram, index, comp_len); zram_slot_unlock(zram, index);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Chancellor nathan@kernel.org
commit 025e87f8ea2ae3a28bf1fe2b052bfa412c27ed4a upstream.
When accessing one of the files under /sys/fs/nilfs2/features when CONFIG_CFI_CLANG is enabled, there is a CFI violation:
CFI failure at kobj_attr_show+0x59/0x80 (target: nilfs_feature_revision_show+0x0/0x30; expected type: 0xfc392c4d) ... Call Trace: <TASK> sysfs_kf_seq_show+0x2a6/0x390 ? __cfi_kobj_attr_show+0x10/0x10 kernfs_seq_show+0x104/0x15b seq_read_iter+0x580/0xe2b ...
When the kobject of the kset for /sys/fs/nilfs2 is initialized, its ktype is set to kset_ktype, which has a ->sysfs_ops of kobj_sysfs_ops. When nilfs_feature_attr_group is added to that kobject via sysfs_create_group(), the kernfs_ops of each files is sysfs_file_kfops_rw, which will call sysfs_kf_seq_show() when ->seq_show() is called. sysfs_kf_seq_show() in turn calls kobj_attr_show() through ->sysfs_ops->show(). kobj_attr_show() casts the provided attribute out to a 'struct kobj_attribute' via container_of() and calls ->show(), resulting in the CFI violation since neither nilfs_feature_revision_show() nor nilfs_feature_README_show() match the prototype of ->show() in 'struct kobj_attribute'.
Resolve the CFI violation by adjusting the second parameter in nilfs_feature_{revision,README}_show() from 'struct attribute' to 'struct kobj_attribute' to match the expected prototype.
Link: https://lkml.kernel.org/r/20250906144410.22511-1-konishi.ryusuke@gmail.com Fixes: aebe17f68444 ("nilfs2: add /sys/fs/nilfs2/features group") Signed-off-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reported-by: kernel test robot oliver.sang@intel.com Closes: https://lore.kernel.org/oe-lkp/202509021646.bc78d9ef-lkp@intel.com/ Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/sysfs.c | 4 ++-- fs/nilfs2/sysfs.h | 8 ++++---- 2 files changed, 6 insertions(+), 6 deletions(-)
--- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -1075,7 +1075,7 @@ void nilfs_sysfs_delete_device_group(str ************************************************************************/
static ssize_t nilfs_feature_revision_show(struct kobject *kobj, - struct attribute *attr, char *buf) + struct kobj_attribute *attr, char *buf) { return sysfs_emit(buf, "%d.%d\n", NILFS_CURRENT_REV, NILFS_MINOR_REV); @@ -1087,7 +1087,7 @@ static const char features_readme_str[] "(1) revision\n\tshow current revision of NILFS file system driver.\n";
static ssize_t nilfs_feature_README_show(struct kobject *kobj, - struct attribute *attr, + struct kobj_attribute *attr, char *buf) { return sysfs_emit(buf, features_readme_str); --- a/fs/nilfs2/sysfs.h +++ b/fs/nilfs2/sysfs.h @@ -50,16 +50,16 @@ struct nilfs_sysfs_dev_subgroups { struct completion sg_segments_kobj_unregister; };
-#define NILFS_COMMON_ATTR_STRUCT(name) \ +#define NILFS_KOBJ_ATTR_STRUCT(name) \ struct nilfs_##name##_attr { \ struct attribute attr; \ - ssize_t (*show)(struct kobject *, struct attribute *, \ + ssize_t (*show)(struct kobject *, struct kobj_attribute *, \ char *); \ - ssize_t (*store)(struct kobject *, struct attribute *, \ + ssize_t (*store)(struct kobject *, struct kobj_attribute *, \ const char *, size_t); \ }
-NILFS_COMMON_ATTR_STRUCT(feature); +NILFS_KOBJ_ATTR_STRUCT(feature);
#define NILFS_DEV_ATTR_STRUCT(name) \ struct nilfs_##name##_attr { \
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Herbert Xu herbert@gondor.apana.org.au
commit 1b34cbbf4f011a121ef7b2d7d6e6920a036d5285 upstream.
Issuing two writes to the same af_alg socket is bogus as the data will be interleaved in an unpredictable fashion. Furthermore, concurrent writes may create inconsistencies in the internal socket state.
Disallow this by adding a new ctx->write field that indiciates exclusive ownership for writing.
Fixes: 8ff590903d5 ("crypto: algif_skcipher - User-space interface for skcipher operations") Reported-by: Muhammad Alifa Ramdhan ramdhan@starlabs.sg Reported-by: Bing-Jhong Billy Jheng billy@starlabs.sg Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- crypto/af_alg.c | 7 +++++++ include/crypto/if_alg.h | 10 ++++++---- 2 files changed, 13 insertions(+), 4 deletions(-)
--- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -970,6 +970,12 @@ int af_alg_sendmsg(struct socket *sock, }
lock_sock(sk); + if (ctx->write) { + release_sock(sk); + return -EBUSY; + } + ctx->write = true; + if (ctx->init && !ctx->more) { if (ctx->used) { err = -EINVAL; @@ -1104,6 +1110,7 @@ int af_alg_sendmsg(struct socket *sock,
unlock: af_alg_data_wakeup(sk); + ctx->write = false; release_sock(sk);
return copied ?: err; --- a/include/crypto/if_alg.h +++ b/include/crypto/if_alg.h @@ -135,6 +135,7 @@ struct af_alg_async_req { * SG? * @enc: Cryptographic operation to be performed when * recvmsg is invoked. + * @write: True if we are in the middle of a write. * @init: True if metadata has been sent. * @len: Length of memory allocated for this data structure. * @inflight: Non-zero when AIO requests are in flight. @@ -151,10 +152,11 @@ struct af_alg_ctx { size_t used; atomic_t rcvused;
- bool more; - bool merge; - bool enc; - bool init; + u32 more:1, + merge:1, + enc:1, + write:1, + init:1;
unsigned int len;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: H. Nikolaus Schaller hns@goldelico.com
commit 2c334d038466ac509468fbe06905a32d202117db upstream.
Since commit
commit f16d9fb6cf03 ("power: supply: bq27xxx: Retrieve again when busy")
the console log of some devices with hdq enabled but no bq27000 battery (like e.g. the Pandaboard) is flooded with messages like:
[ 34.247833] power_supply bq27000-battery: driver failed to report 'status' property: -1
as soon as user-space is finding a /sys entry and trying to read the "status" property.
It turns out that the offending commit changes the logic to now return the value of cache.flags if it is <0. This is likely under the assumption that it is an error number. In normal errors from bq27xxx_read() this is indeed the case.
But there is special code to detect if no bq27000 is installed or accessible through hdq/1wire and wants to report this. In that case, the cache.flags are set historically by
commit 3dd843e1c26a ("bq27000: report missing device better.")
to constant -1 which did make reading properties return -ENODEV. So everything appeared to be fine before the return value was passed upwards.
Now the -1 is returned as -EPERM instead of -ENODEV, triggering the error condition in power_supply_format_property() which then floods the console log.
So we change the detection of missing bq27000 battery to simply set
cache.flags = -ENODEV
instead of -1.
Fixes: f16d9fb6cf03 ("power: supply: bq27xxx: Retrieve again when busy") Cc: Jerry Lv Jerry.Lv@axis.com Cc: stable@vger.kernel.org Signed-off-by: H. Nikolaus Schaller hns@goldelico.com Link: https://lore.kernel.org/r/692f79eb6fd541adb397038ea6e750d4de2deddf.175594529... Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/power/supply/bq27xxx_battery.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/power/supply/bq27xxx_battery.c +++ b/drivers/power/supply/bq27xxx_battery.c @@ -1920,7 +1920,7 @@ static void bq27xxx_battery_update_unloc
cache.flags = bq27xxx_read(di, BQ27XXX_REG_FLAGS, has_singe_flag); if ((cache.flags & 0xff) == 0xff) - cache.flags = -1; /* read error */ + cache.flags = -ENODEV; /* read error */ if (cache.flags >= 0) { cache.capacity = bq27xxx_battery_read_soc(di);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: H. Nikolaus Schaller hns@goldelico.com
commit 1e451977e1703b6db072719b37cd1b8e250b9cc9 upstream.
There are fuel gauges in the bq27xxx series (e.g. bq27z561) which may in some cases report 0xff as the value of BQ27XXX_REG_FLAGS that should not be interpreted as "no battery" like for a disconnected battery with some built in bq27000 chip.
So restrict the no-battery detection originally introduced by
commit 3dd843e1c26a ("bq27000: report missing device better.")
to the bq27000.
There is no need to backport further because this was hidden before
commit f16d9fb6cf03 ("power: supply: bq27xxx: Retrieve again when busy")
Fixes: f16d9fb6cf03 ("power: supply: bq27xxx: Retrieve again when busy") Suggested-by: Jerry Lv Jerry.Lv@axis.com Cc: stable@vger.kernel.org Signed-off-by: H. Nikolaus Schaller hns@goldelico.com Link: https://lore.kernel.org/r/dd979fa6855fd051ee5117016c58daaa05966e24.175594529... Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/power/supply/bq27xxx_battery.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/power/supply/bq27xxx_battery.c +++ b/drivers/power/supply/bq27xxx_battery.c @@ -1919,8 +1919,8 @@ static void bq27xxx_battery_update_unloc bool has_singe_flag = di->opts & BQ27XXX_O_ZERO;
cache.flags = bq27xxx_read(di, BQ27XXX_REG_FLAGS, has_singe_flag); - if ((cache.flags & 0xff) == 0xff) - cache.flags = -ENODEV; /* read error */ + if (di->chip == BQ27000 && (cache.flags & 0xff) == 0xff) + cache.flags = -ENODEV; /* bq27000 hdq read error */ if (cache.flags >= 0) { cache.capacity = bq27xxx_battery_read_soc(di);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrea Righi arighi@nvidia.com
commit 0b47b6c3543efd65f2e620e359b05f4938314fbd upstream.
scx_bpf_reenqueue_local() can be called from ops.cpu_release() when a CPU is taken by a higher scheduling class to give tasks queued to the CPU's local DSQ a chance to be migrated somewhere else, instead of waiting indefinitely for that CPU to become available again.
In doing so, we decided to skip migration-disabled tasks, under the assumption that they cannot be migrated anyway.
However, when a higher scheduling class preempts a CPU, the running task is always inserted at the head of the local DSQ as a migration-disabled task. This means it is always skipped by scx_bpf_reenqueue_local(), and ends up being confined to the same CPU even if that CPU is heavily contended by other higher scheduling class tasks.
As an example, let's consider the following scenario:
$ schedtool -a 0,1, -e yes > /dev/null $ sudo schedtool -F -p 99 -a 0, -e \ stress-ng -c 1 --cpu-load 99 --cpu-load-slice 1000
The first task (SCHED_EXT) can run on CPU0 or CPU1. The second task (SCHED_FIFO) is pinned to CPU0 and consumes ~99% of it. If the SCHED_EXT task initially runs on CPU0, it will remain there because it always sees CPU0 as "idle" in the short gaps left by the RT task, resulting in ~1% utilization while CPU1 stays idle:
0[||||||||||||||||||||||100.0%] 8[ 0.0%] 1[ 0.0%] 9[ 0.0%] 2[ 0.0%] 10[ 0.0%] 3[ 0.0%] 11[ 0.0%] 4[ 0.0%] 12[ 0.0%] 5[ 0.0%] 13[ 0.0%] 6[ 0.0%] 14[ 0.0%] 7[ 0.0%] 15[ 0.0%] PID USER PRI NI S CPU CPU%▽MEM% TIME+ Command 1067 root RT 0 R 0 99.0 0.2 0:31.16 stress-ng-cpu [run] 975 arighi 20 0 R 0 1.0 0.0 0:26.32 yes
By allowing scx_bpf_reenqueue_local() to re-enqueue migration-disabled tasks, the scheduler can choose to migrate them to other CPUs (CPU1 in this case) via ops.enqueue(), leading to better CPU utilization:
0[||||||||||||||||||||||100.0%] 8[ 0.0%] 1[||||||||||||||||||||||100.0%] 9[ 0.0%] 2[ 0.0%] 10[ 0.0%] 3[ 0.0%] 11[ 0.0%] 4[ 0.0%] 12[ 0.0%] 5[ 0.0%] 13[ 0.0%] 6[ 0.0%] 14[ 0.0%] 7[ 0.0%] 15[ 0.0%] PID USER PRI NI S CPU CPU%▽MEM% TIME+ Command 577 root RT 0 R 0 100.0 0.2 0:23.17 stress-ng-cpu [run] 555 arighi 20 0 R 1 100.0 0.0 0:28.67 yes
It's debatable whether per-CPU tasks should be re-enqueued as well, but doing so is probably safer: the scheduler can recognize re-enqueued tasks through the %SCX_ENQ_REENQ flag, reassess their placement, and either put them back at the head of the local DSQ or let another task attempt to take the CPU.
This also prevents giving per-CPU tasks an implicit priority boost, which would otherwise make them more likely to reclaim CPUs preempted by higher scheduling classes.
Fixes: 97e13ecb02668 ("sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()") Cc: stable@vger.kernel.org # v6.15+ Signed-off-by: Andrea Righi arighi@nvidia.com Acked-by: Changwoo Min changwoo@igalia.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/ext.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
--- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -6794,12 +6794,8 @@ __bpf_kfunc u32 scx_bpf_reenqueue_local( * CPUs disagree, they use %ENQUEUE_RESTORE which is bypassed to * the current local DSQ for running tasks and thus are not * visible to the BPF scheduler. - * - * Also skip re-enqueueing tasks that can only run on this - * CPU, as they would just be re-added to the same local - * DSQ without any benefit. */ - if (p->migration_pending || is_migration_disabled(p) || p->nr_cpus_allowed == 1) + if (p->migration_pending) continue;
dispatch_dequeue(rq, p);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: austinchang austinchang@synology.com
commit 8679d2687c351824d08cf1f0e86f3b65f22a00fe upstream.
btrfs_init_file_extent_tree() uses S_ISREG() to determine if the file is a regular file. In the beginning of btrfs_read_locked_inode(), the i_mode hasn't been read from inode item, then file_extent_tree won't be used at all in volumes without NO_HOLES.
Fix this by calling btrfs_init_file_extent_tree() after i_mode is initialized in btrfs_read_locked_inode().
Fixes: 3d7db6e8bd22e6 ("btrfs: don't allocate file extent tree for non regular files") CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: austinchang austinchang@synology.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/delayed-inode.c | 3 --- fs/btrfs/inode.c | 11 +++++------ 2 files changed, 5 insertions(+), 9 deletions(-)
--- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -1843,7 +1843,6 @@ static void fill_stack_inode_item(struct
int btrfs_fill_inode(struct btrfs_inode *inode, u32 *rdev) { - struct btrfs_fs_info *fs_info = inode->root->fs_info; struct btrfs_delayed_node *delayed_node; struct btrfs_inode_item *inode_item; struct inode *vfs_inode = &inode->vfs_inode; @@ -1864,8 +1863,6 @@ int btrfs_fill_inode(struct btrfs_inode i_uid_write(vfs_inode, btrfs_stack_inode_uid(inode_item)); i_gid_write(vfs_inode, btrfs_stack_inode_gid(inode_item)); btrfs_i_size_write(inode, btrfs_stack_inode_size(inode_item)); - btrfs_inode_set_file_extent_range(inode, 0, - round_up(i_size_read(vfs_inode), fs_info->sectorsize)); vfs_inode->i_mode = btrfs_stack_inode_mode(inode_item); set_nlink(vfs_inode, btrfs_stack_inode_nlink(inode_item)); inode_set_bytes(vfs_inode, btrfs_stack_inode_nbytes(inode_item)); --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3881,10 +3881,6 @@ static int btrfs_read_locked_inode(struc bool filled = false; int first_xattr_slot;
- ret = btrfs_init_file_extent_tree(inode); - if (ret) - goto out; - ret = btrfs_fill_inode(inode, &rdev); if (!ret) filled = true; @@ -3916,8 +3912,6 @@ static int btrfs_read_locked_inode(struc i_uid_write(vfs_inode, btrfs_inode_uid(leaf, inode_item)); i_gid_write(vfs_inode, btrfs_inode_gid(leaf, inode_item)); btrfs_i_size_write(inode, btrfs_inode_size(leaf, inode_item)); - btrfs_inode_set_file_extent_range(inode, 0, - round_up(i_size_read(vfs_inode), fs_info->sectorsize));
inode_set_atime(vfs_inode, btrfs_timespec_sec(leaf, &inode_item->atime), btrfs_timespec_nsec(leaf, &inode_item->atime)); @@ -3948,6 +3942,11 @@ static int btrfs_read_locked_inode(struc btrfs_update_inode_mapping_flags(inode);
cache_index: + ret = btrfs_init_file_extent_tree(inode); + if (ret) + goto out; + btrfs_inode_set_file_extent_range(inode, 0, + round_up(i_size_read(vfs_inode), fs_info->sectorsize)); /* * If we were modified in the current generation and evicted from memory * and then re-read we need to do a full sync since we don't have any
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mikulas Patocka mpatocka@redhat.com
commit a86556264696b797d94238d99d8284d0d34ed960 upstream.
These commands modprobe brd rd_size=1048576 vgcreate vg /dev/ram* lvcreate -m4 -L10 -n lv vg trigger the following warnings: device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency
The warnings are caused by the fact that io_min is 512 and physical block size is 4096.
If there's chunk-less raid, such as raid1, io_min shouldn't be set to zero because it would be raised to 512 and it would trigger the warning.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Reviewed-by: Martin K. Petersen martin.petersen@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-raid.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/md/dm-raid.c +++ b/drivers/md/dm-raid.c @@ -3810,8 +3810,10 @@ static void raid_io_hints(struct dm_targ struct raid_set *rs = ti->private; unsigned int chunk_size_bytes = to_bytes(rs->md.chunk_sectors);
- limits->io_min = chunk_size_bytes; - limits->io_opt = chunk_size_bytes * mddev_data_stripes(rs); + if (chunk_size_bytes) { + limits->io_min = chunk_size_bytes; + limits->io_opt = chunk_size_bytes * mddev_data_stripes(rs); + } }
static void raid_presuspend(struct dm_target *ti)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mikulas Patocka mpatocka@redhat.com
commit 1071d560afb4c245c2076494226df47db5a35708 upstream.
There's a possible integer overflow in stripe_io_hints if we have too large chunk size. Test if the overflow happened, and if it did, don't set limits->io_min and limits->io_opt;
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Reviewed-by: John Garry john.g.garry@oracle.com Suggested-by: Dongsheng Yang dongsheng.yang@linux.dev Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-stripe.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/drivers/md/dm-stripe.c +++ b/drivers/md/dm-stripe.c @@ -456,11 +456,15 @@ static void stripe_io_hints(struct dm_ta struct queue_limits *limits) { struct stripe_c *sc = ti->private; - unsigned int chunk_size = sc->chunk_size << SECTOR_SHIFT; + unsigned int io_min, io_opt;
limits->chunk_sectors = sc->chunk_size; - limits->io_min = chunk_size; - limits->io_opt = chunk_size * sc->stripes; + + if (!check_shl_overflow(sc->chunk_size, SECTOR_SHIFT, &io_min) && + !check_mul_overflow(io_min, sc->stripes, &io_opt)) { + limits->io_min = io_min; + limits->io_opt = io_opt; + } }
static struct target_type stripe_target = {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugh Dickins hughd@google.com
commit 98c6d259319ecf6e8d027abd3f14b81324b8c0ad upstream.
Patch series "mm: better GUP pin lru_add_drain_all()", v2.
Series of lru_add_drain_all()-related patches, arising from recent mm/gup migration report from Will Deacon.
This patch (of 5):
Will Deacon reports:-
When taking a longterm GUP pin via pin_user_pages(), __gup_longterm_locked() tries to migrate target folios that should not be longterm pinned, for example because they reside in a CMA region or movable zone. This is done by first pinning all of the target folios anyway, collecting all of the longterm-unpinnable target folios into a list, dropping the pins that were just taken and finally handing the list off to migrate_pages() for the actual migration.
It is critically important that no unexpected references are held on the folios being migrated, otherwise the migration will fail and pin_user_pages() will return -ENOMEM to its caller. Unfortunately, it is relatively easy to observe migration failures when running pKVM (which uses pin_user_pages() on crosvm's virtual address space to resolve stage-2 page faults from the guest) on a 6.15-based Pixel 6 device and this results in the VM terminating prematurely.
In the failure case, 'crosvm' has called mlock(MLOCK_ONFAULT) on its mapping of guest memory prior to the pinning. Subsequently, when pin_user_pages() walks the page-table, the relevant 'pte' is not present and so the faulting logic allocates a new folio, mlocks it with mlock_folio() and maps it in the page-table.
Since commit 2fbb0c10d1e8 ("mm/munlock: mlock_page() munlock_page() batch by pagevec"), mlock/munlock operations on a folio (formerly page), are deferred. For example, mlock_folio() takes an additional reference on the target folio before placing it into a per-cpu 'folio_batch' for later processing by mlock_folio_batch(), which drops the refcount once the operation is complete. Processing of the batches is coupled with the LRU batch logic and can be forcefully drained with lru_add_drain_all() but as long as a folio remains unprocessed on the batch, its refcount will be elevated.
This deferred batching therefore interacts poorly with the pKVM pinning scenario as we can find ourselves in a situation where the migration code fails to migrate a folio due to the elevated refcount from the pending mlock operation.
Hugh Dickins adds:-
!folio_test_lru() has never been a very reliable way to tell if an lru_add_drain_all() is worth calling, to remove LRU cache references to make the folio migratable: the LRU flag may be set even while the folio is held with an extra reference in a per-CPU LRU cache.
5.18 commit 2fbb0c10d1e8 may have made it more unreliable. Then 6.11 commit 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding to LRU batch") tried to make it reliable, by moving LRU flag clearing; but missed the mlock/munlock batches, so still unreliable as reported.
And it turns out to be difficult to extend 33dfe9204f29's LRU flag clearing to the mlock/munlock batches: if they do benefit from batching, mlock/munlock cannot be so effective when easily suppressed while !LRU.
Instead, switch to an expected ref_count check, which was more reliable all along: some more false positives (unhelpful drains) than before, and never a guarantee that the folio will prove migratable, but better.
Note on PG_private_2: ceph and nfs are still using the deprecated PG_private_2 flag, with the aid of netfs and filemap support functions. Although it is consistently matched by an increment of folio ref_count, folio_expected_ref_count() intentionally does not recognize it, and ceph folio migration currently depends on that for PG_private_2 folios to be rejected. New references to the deprecated flag are discouraged, so do not add it into the collect_longterm_unpinnable_folios() calculation: but longterm pinning of transiently PG_private_2 ceph and nfs folios (an uncommon case) may invoke a redundant lru_add_drain_all(). And this makes easy the backport to earlier releases: up to and including 6.12, btrfs also used PG_private_2, but without a ref_count increment.
Note for stable backports: requires 6.16 commit 86ebd50224c0 ("mm: add folio_expected_ref_count() for reference count calculation").
Link: https://lkml.kernel.org/r/41395944-b0e3-c3ac-d648-8ddd70451d28@google.com Link: https://lkml.kernel.org/r/bd1f314a-fca1-8f19-cac0-b936c9614557@google.com Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region") Signed-off-by: Hugh Dickins hughd@google.com Reported-by: Will Deacon will@kernel.org Closes: https://lore.kernel.org/linux-mm/20250815101858.24352-1-will@kernel.org/ Acked-by: Kiryl Shutsemau kas@kernel.org Acked-by: David Hildenbrand david@redhat.com Cc: "Aneesh Kumar K.V" aneesh.kumar@kernel.org Cc: Axel Rasmussen axelrasmussen@google.com Cc: Chris Li chrisl@kernel.org Cc: Christoph Hellwig hch@infradead.org Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Johannes Weiner hannes@cmpxchg.org Cc: John Hubbard jhubbard@nvidia.com Cc: Keir Fraser keirf@google.com Cc: Konstantin Khlebnikov koct9i@gmail.com Cc: Li Zhe lizhe.67@bytedance.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Rik van Riel riel@surriel.com Cc: Shivank Garg shivankg@amd.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Wei Xu weixugc@google.com Cc: yangge yangge1116@126.com Cc: Yuanchu Xie yuanchu@google.com Cc: Yu Zhao yuzhao@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/gup.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/gup.c +++ b/mm/gup.c @@ -2331,7 +2331,8 @@ static unsigned long collect_longterm_un continue; }
- if (!folio_test_lru(folio) && drain_allow) { + if (drain_allow && folio_ref_count(folio) != + folio_expected_ref_count(folio) + 1) { lru_add_drain_all(); drain_allow = false; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugh Dickins hughd@google.com
commit afb99e9f500485160f34b8cad6d3763ada3e80e8 upstream.
This reverts commit 33dfe9204f29: now that collect_longterm_unpinnable_folios() is checking ref_count instead of lru, and mlock/munlock do not participate in the revised LRU flag clearing, those changes are misleading, and enlarge the window during which mlock/munlock may miss an mlock_count update.
It is possible (I'd hesitate to claim probable) that the greater likelihood of missed mlock_count updates would explain the "Realtime threads delayed due to kcompactd0" observed on 6.12 in the Link below. If that is the case, this reversion will help; but a complete solution needs also a further patch, beyond the scope of this series.
Included some 80-column cleanup around folio_batch_add_and_move().
The role of folio_test_clear_lru() (before taking per-memcg lru_lock) is questionable since 6.13 removed mem_cgroup_move_account() etc; but perhaps there are still some races which need it - not examined here.
Link: https://lore.kernel.org/linux-mm/DU0PR01MB10385345F7153F334100981888259A@DU0... Link: https://lkml.kernel.org/r/05905d7b-ed14-68b1-79d8-bdec30367eba@google.com Signed-off-by: Hugh Dickins hughd@google.com Acked-by: David Hildenbrand david@redhat.com Cc: "Aneesh Kumar K.V" aneesh.kumar@kernel.org Cc: Axel Rasmussen axelrasmussen@google.com Cc: Chris Li chrisl@kernel.org Cc: Christoph Hellwig hch@infradead.org Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Johannes Weiner hannes@cmpxchg.org Cc: John Hubbard jhubbard@nvidia.com Cc: Keir Fraser keirf@google.com Cc: Konstantin Khlebnikov koct9i@gmail.com Cc: Li Zhe lizhe.67@bytedance.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Rik van Riel riel@surriel.com Cc: Shivank Garg shivankg@amd.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Wei Xu weixugc@google.com Cc: Will Deacon will@kernel.org Cc: yangge yangge1116@126.com Cc: Yuanchu Xie yuanchu@google.com Cc: Yu Zhao yuzhao@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/swap.c | 50 ++++++++++++++++++++++++++------------------------ 1 file changed, 26 insertions(+), 24 deletions(-)
--- a/mm/swap.c +++ b/mm/swap.c @@ -164,6 +164,10 @@ static void folio_batch_move_lru(struct for (i = 0; i < folio_batch_count(fbatch); i++) { struct folio *folio = fbatch->folios[i];
+ /* block memcg migration while the folio moves between lru */ + if (move_fn != lru_add && !folio_test_clear_lru(folio)) + continue; + folio_lruvec_relock_irqsave(folio, &lruvec, &flags); move_fn(lruvec, folio);
@@ -176,14 +180,10 @@ static void folio_batch_move_lru(struct }
static void __folio_batch_add_and_move(struct folio_batch __percpu *fbatch, - struct folio *folio, move_fn_t move_fn, - bool on_lru, bool disable_irq) + struct folio *folio, move_fn_t move_fn, bool disable_irq) { unsigned long flags;
- if (on_lru && !folio_test_clear_lru(folio)) - return; - folio_get(folio);
if (disable_irq) @@ -191,8 +191,8 @@ static void __folio_batch_add_and_move(s else local_lock(&cpu_fbatches.lock);
- if (!folio_batch_add(this_cpu_ptr(fbatch), folio) || folio_test_large(folio) || - lru_cache_disabled()) + if (!folio_batch_add(this_cpu_ptr(fbatch), folio) || + folio_test_large(folio) || lru_cache_disabled()) folio_batch_move_lru(this_cpu_ptr(fbatch), move_fn);
if (disable_irq) @@ -201,13 +201,13 @@ static void __folio_batch_add_and_move(s local_unlock(&cpu_fbatches.lock); }
-#define folio_batch_add_and_move(folio, op, on_lru) \ - __folio_batch_add_and_move( \ - &cpu_fbatches.op, \ - folio, \ - op, \ - on_lru, \ - offsetof(struct cpu_fbatches, op) >= offsetof(struct cpu_fbatches, lock_irq) \ +#define folio_batch_add_and_move(folio, op) \ + __folio_batch_add_and_move( \ + &cpu_fbatches.op, \ + folio, \ + op, \ + offsetof(struct cpu_fbatches, op) >= \ + offsetof(struct cpu_fbatches, lock_irq) \ )
static void lru_move_tail(struct lruvec *lruvec, struct folio *folio) @@ -231,10 +231,10 @@ static void lru_move_tail(struct lruvec void folio_rotate_reclaimable(struct folio *folio) { if (folio_test_locked(folio) || folio_test_dirty(folio) || - folio_test_unevictable(folio)) + folio_test_unevictable(folio) || !folio_test_lru(folio)) return;
- folio_batch_add_and_move(folio, lru_move_tail, true); + folio_batch_add_and_move(folio, lru_move_tail); }
void lru_note_cost(struct lruvec *lruvec, bool file, @@ -323,10 +323,11 @@ static void folio_activate_drain(int cpu
void folio_activate(struct folio *folio) { - if (folio_test_active(folio) || folio_test_unevictable(folio)) + if (folio_test_active(folio) || folio_test_unevictable(folio) || + !folio_test_lru(folio)) return;
- folio_batch_add_and_move(folio, lru_activate, true); + folio_batch_add_and_move(folio, lru_activate); }
#else @@ -502,7 +503,7 @@ void folio_add_lru(struct folio *folio) lru_gen_in_fault() && !(current->flags & PF_MEMALLOC)) folio_set_active(folio);
- folio_batch_add_and_move(folio, lru_add, false); + folio_batch_add_and_move(folio, lru_add); } EXPORT_SYMBOL(folio_add_lru);
@@ -680,13 +681,13 @@ void lru_add_drain_cpu(int cpu) void deactivate_file_folio(struct folio *folio) { /* Deactivating an unevictable folio will not accelerate reclaim */ - if (folio_test_unevictable(folio)) + if (folio_test_unevictable(folio) || !folio_test_lru(folio)) return;
if (lru_gen_enabled() && lru_gen_clear_refs(folio)) return;
- folio_batch_add_and_move(folio, lru_deactivate_file, true); + folio_batch_add_and_move(folio, lru_deactivate_file); }
/* @@ -699,13 +700,13 @@ void deactivate_file_folio(struct folio */ void folio_deactivate(struct folio *folio) { - if (folio_test_unevictable(folio)) + if (folio_test_unevictable(folio) || !folio_test_lru(folio)) return;
if (lru_gen_enabled() ? lru_gen_clear_refs(folio) : !folio_test_active(folio)) return;
- folio_batch_add_and_move(folio, lru_deactivate, true); + folio_batch_add_and_move(folio, lru_deactivate); }
/** @@ -718,10 +719,11 @@ void folio_deactivate(struct folio *foli void folio_mark_lazyfree(struct folio *folio) { if (!folio_test_anon(folio) || !folio_test_swapbacked(folio) || + !folio_test_lru(folio) || folio_test_swapcache(folio) || folio_test_unevictable(folio)) return;
- folio_batch_add_and_move(folio, lru_lazyfree, true); + folio_batch_add_and_move(folio, lru_lazyfree); }
void lru_add_drain(void)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Li Zhe lizhe.67@bytedance.com
commit a03db236aebfaeadf79396dbd570896b870bda01 upstream.
In the current implementation of longterm pin_user_pages(), we invoke collect_longterm_unpinnable_folios(). This function iterates through the list to check whether each folio belongs to the "longterm_unpinnabled" category. The folios in this list essentially correspond to a contiguous region of userspace addresses, with each folio representing a physical address in increments of PAGESIZE.
If this userspace address range is mapped with large folio, we can optimize the performance of function collect_longterm_unpinnable_folios() by reducing the using of READ_ONCE() invoked in pofs_get_folio()->page_folio()->_compound_head().
Also, we can simplify the logic of collect_longterm_unpinnable_folios(). Instead of comparing with prev_folio after calling pofs_get_folio(), we can check whether the next page is within the same folio.
The performance test results, based on v6.15, obtained through the gup_test tool from the kernel source tree are as follows. We achieve an improvement of over 66% for large folio with pagesize=2M. For small folio, we have only observed a very slight degradation in performance.
Without this patch:
[root@localhost ~] ./gup_test -HL -m 8192 -n 512 TAP version 13 1..1 # PIN_LONGTERM_BENCHMARK: Time: get:14391 put:10858 us# ok 1 ioctl status 0 # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0 [root@localhost ~]# ./gup_test -LT -m 8192 -n 512 TAP version 13 1..1 # PIN_LONGTERM_BENCHMARK: Time: get:130538 put:31676 us# ok 1 ioctl status 0 # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
With this patch:
[root@localhost ~] ./gup_test -HL -m 8192 -n 512 TAP version 13 1..1 # PIN_LONGTERM_BENCHMARK: Time: get:4867 put:10516 us# ok 1 ioctl status 0 # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0 [root@localhost ~]# ./gup_test -LT -m 8192 -n 512 TAP version 13 1..1 # PIN_LONGTERM_BENCHMARK: Time: get:131798 put:31328 us# ok 1 ioctl status 0 # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
[lizhe.67@bytedance.com: whitespace fix, per David] Link: https://lkml.kernel.org/r/20250606091917.91384-1-lizhe.67@bytedance.com Link: https://lkml.kernel.org/r/20250606023742.58344-1-lizhe.67@bytedance.com Signed-off-by: Li Zhe lizhe.67@bytedance.com Cc: David Hildenbrand david@redhat.com Cc: Dev Jain dev.jain@arm.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: John Hubbard jhubbard@nvidia.com Cc: Muchun Song muchun.song@linux.dev Cc: Peter Xu peterx@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/gup.c | 38 ++++++++++++++++++++++++++++++-------- 1 file changed, 30 insertions(+), 8 deletions(-)
--- a/mm/gup.c +++ b/mm/gup.c @@ -2300,6 +2300,31 @@ static void pofs_unpin(struct pages_or_f unpin_user_pages(pofs->pages, pofs->nr_entries); }
+static struct folio *pofs_next_folio(struct folio *folio, + struct pages_or_folios *pofs, long *index_ptr) +{ + long i = *index_ptr + 1; + + if (!pofs->has_folios && folio_test_large(folio)) { + const unsigned long start_pfn = folio_pfn(folio); + const unsigned long end_pfn = start_pfn + folio_nr_pages(folio); + + for (; i < pofs->nr_entries; i++) { + unsigned long pfn = page_to_pfn(pofs->pages[i]); + + /* Is this page part of this folio? */ + if (pfn < start_pfn || pfn >= end_pfn) + break; + } + } + + if (unlikely(i == pofs->nr_entries)) + return NULL; + *index_ptr = i; + + return pofs_get_folio(pofs, i); +} + /* * Returns the number of collected folios. Return value is always >= 0. */ @@ -2307,16 +2332,13 @@ static unsigned long collect_longterm_un struct list_head *movable_folio_list, struct pages_or_folios *pofs) { - unsigned long i, collected = 0; - struct folio *prev_folio = NULL; + unsigned long collected = 0; bool drain_allow = true; + struct folio *folio; + long i = 0;
- for (i = 0; i < pofs->nr_entries; i++) { - struct folio *folio = pofs_get_folio(pofs, i); - - if (folio == prev_folio) - continue; - prev_folio = folio; + for (folio = pofs_get_folio(pofs, i); folio; + folio = pofs_next_folio(folio, pofs, &i)) {
if (folio_is_longterm_pinnable(folio)) continue;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugh Dickins hughd@google.com
commit a09a8a1fbb374e0053b97306da9dbc05bd384685 upstream.
In many cases, if collect_longterm_unpinnable_folios() does need to drain the LRU cache to release a reference, the cache in question is on this same CPU, and much more efficiently drained by a preliminary local lru_add_drain(), than the later cross-CPU lru_add_drain_all().
Marked for stable, to counter the increase in lru_add_drain_all()s from "mm/gup: check ref_count instead of lru before migration". Note for clean backports: can take 6.16 commit a03db236aebf ("gup: optimize longterm pin_user_pages() for large folio") first.
Link: https://lkml.kernel.org/r/66f2751f-283e-816d-9530-765db7edc465@google.com Signed-off-by: Hugh Dickins hughd@google.com Acked-by: David Hildenbrand david@redhat.com Cc: "Aneesh Kumar K.V" aneesh.kumar@kernel.org Cc: Axel Rasmussen axelrasmussen@google.com Cc: Chris Li chrisl@kernel.org Cc: Christoph Hellwig hch@infradead.org Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Johannes Weiner hannes@cmpxchg.org Cc: John Hubbard jhubbard@nvidia.com Cc: Keir Fraser keirf@google.com Cc: Konstantin Khlebnikov koct9i@gmail.com Cc: Li Zhe lizhe.67@bytedance.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Rik van Riel riel@surriel.com Cc: Shivank Garg shivankg@amd.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Wei Xu weixugc@google.com Cc: Will Deacon will@kernel.org Cc: yangge yangge1116@126.com Cc: Yuanchu Xie yuanchu@google.com Cc: Yu Zhao yuzhao@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/gup.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
--- a/mm/gup.c +++ b/mm/gup.c @@ -2333,8 +2333,8 @@ static unsigned long collect_longterm_un struct pages_or_folios *pofs) { unsigned long collected = 0; - bool drain_allow = true; struct folio *folio; + int drained = 0; long i = 0;
for (folio = pofs_get_folio(pofs, i); folio; @@ -2353,10 +2353,17 @@ static unsigned long collect_longterm_un continue; }
- if (drain_allow && folio_ref_count(folio) != - folio_expected_ref_count(folio) + 1) { + if (drained == 0 && + folio_ref_count(folio) != + folio_expected_ref_count(folio) + 1) { + lru_add_drain(); + drained = 1; + } + if (drained == 1 && + folio_ref_count(folio) != + folio_expected_ref_count(folio) + 1) { lru_add_drain_all(); - drain_allow = false; + drained = 2; }
if (!folio_isolate_lru(folio))
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugh Dickins hughd@google.com
commit 8d79ed36bfc83d0583ab72216b7980340478cdfb upstream.
This reverts commit 0885ef470560: that was a fix to the reverted 33dfe9204f29b415bbc0abb1a50642d1ba94f5e9.
Link: https://lkml.kernel.org/r/aa0e9d67-fbcd-9d79-88a1-641dfbe1d9d1@google.com Signed-off-by: Hugh Dickins hughd@google.com Acked-by: David Hildenbrand david@redhat.com Cc: "Aneesh Kumar K.V" aneesh.kumar@kernel.org Cc: Axel Rasmussen axelrasmussen@google.com Cc: Chris Li chrisl@kernel.org Cc: Christoph Hellwig hch@infradead.org Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Johannes Weiner hannes@cmpxchg.org Cc: John Hubbard jhubbard@nvidia.com Cc: Keir Fraser keirf@google.com Cc: Konstantin Khlebnikov koct9i@gmail.com Cc: Li Zhe lizhe.67@bytedance.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Rik van Riel riel@surriel.com Cc: Shivank Garg shivankg@amd.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Wei Xu weixugc@google.com Cc: Will Deacon will@kernel.org Cc: yangge yangge1116@126.com Cc: Yuanchu Xie yuanchu@google.com Cc: Yu Zhao yuzhao@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/vmscan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4505,7 +4505,7 @@ static bool sort_folio(struct lruvec *lr }
/* ineligible */ - if (!folio_test_lru(folio) || zone > sc->reclaim_idx) { + if (zone > sc->reclaim_idx) { gen = folio_inc_gen(lruvec, folio, false); list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]); return true;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugh Dickins hughd@google.com
commit 2da6de30e60dd9bb14600eff1cc99df2fa2ddae3 upstream.
mm/swap.c and mm/mlock.c agree to drain any per-CPU batch as soon as a large folio is added: so collect_longterm_unpinnable_folios() just wastes effort when calling lru_add_drain[_all]() on a large folio.
But although there is good reason not to batch up PMD-sized folios, we might well benefit from batching a small number of low-order mTHPs (though unclear how that "small number" limitation will be implemented).
So ask if folio_may_be_lru_cached() rather than !folio_test_large(), to insulate those particular checks from future change. Name preferred to "folio_is_batchable" because large folios can well be put on a batch: it's just the per-CPU LRU caches, drained much later, which need care.
Marked for stable, to counter the increase in lru_add_drain_all()s from "mm/gup: check ref_count instead of lru before migration".
Link: https://lkml.kernel.org/r/57d2eaf8-3607-f318-e0c5-be02dce61ad0@google.com Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region") Signed-off-by: Hugh Dickins hughd@google.com Suggested-by: David Hildenbrand david@redhat.com Acked-by: David Hildenbrand david@redhat.com Cc: "Aneesh Kumar K.V" aneesh.kumar@kernel.org Cc: Axel Rasmussen axelrasmussen@google.com Cc: Chris Li chrisl@kernel.org Cc: Christoph Hellwig hch@infradead.org Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Johannes Weiner hannes@cmpxchg.org Cc: John Hubbard jhubbard@nvidia.com Cc: Keir Fraser keirf@google.com Cc: Konstantin Khlebnikov koct9i@gmail.com Cc: Li Zhe lizhe.67@bytedance.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Rik van Riel riel@surriel.com Cc: Shivank Garg shivankg@amd.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Wei Xu weixugc@google.com Cc: Will Deacon will@kernel.org Cc: yangge yangge1116@126.com Cc: Yuanchu Xie yuanchu@google.com Cc: Yu Zhao yuzhao@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/swap.h | 10 ++++++++++ mm/gup.c | 4 ++-- mm/mlock.c | 6 +++--- mm/swap.c | 2 +- 4 files changed, 16 insertions(+), 6 deletions(-)
--- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -384,6 +384,16 @@ void folio_add_lru_vma(struct folio *, s void mark_page_accessed(struct page *); void folio_mark_accessed(struct folio *);
+static inline bool folio_may_be_lru_cached(struct folio *folio) +{ + /* + * Holding PMD-sized folios in per-CPU LRU cache unbalances accounting. + * Holding small numbers of low-order mTHP folios in per-CPU LRU cache + * will be sensible, but nobody has implemented and tested that yet. + */ + return !folio_test_large(folio); +} + extern atomic_t lru_disable_count;
static inline bool lru_cache_disabled(void) --- a/mm/gup.c +++ b/mm/gup.c @@ -2353,13 +2353,13 @@ static unsigned long collect_longterm_un continue; }
- if (drained == 0 && + if (drained == 0 && folio_may_be_lru_cached(folio) && folio_ref_count(folio) != folio_expected_ref_count(folio) + 1) { lru_add_drain(); drained = 1; } - if (drained == 1 && + if (drained == 1 && folio_may_be_lru_cached(folio) && folio_ref_count(folio) != folio_expected_ref_count(folio) + 1) { lru_add_drain_all(); --- a/mm/mlock.c +++ b/mm/mlock.c @@ -255,7 +255,7 @@ void mlock_folio(struct folio *folio)
folio_get(folio); if (!folio_batch_add(fbatch, mlock_lru(folio)) || - folio_test_large(folio) || lru_cache_disabled()) + !folio_may_be_lru_cached(folio) || lru_cache_disabled()) mlock_folio_batch(fbatch); local_unlock(&mlock_fbatch.lock); } @@ -278,7 +278,7 @@ void mlock_new_folio(struct folio *folio
folio_get(folio); if (!folio_batch_add(fbatch, mlock_new(folio)) || - folio_test_large(folio) || lru_cache_disabled()) + !folio_may_be_lru_cached(folio) || lru_cache_disabled()) mlock_folio_batch(fbatch); local_unlock(&mlock_fbatch.lock); } @@ -299,7 +299,7 @@ void munlock_folio(struct folio *folio) */ folio_get(folio); if (!folio_batch_add(fbatch, folio) || - folio_test_large(folio) || lru_cache_disabled()) + !folio_may_be_lru_cached(folio) || lru_cache_disabled()) mlock_folio_batch(fbatch); local_unlock(&mlock_fbatch.lock); } --- a/mm/swap.c +++ b/mm/swap.c @@ -192,7 +192,7 @@ static void __folio_batch_add_and_move(s local_lock(&cpu_fbatches.lock);
if (!folio_batch_add(this_cpu_ptr(fbatch), folio) || - folio_test_large(folio) || lru_cache_disabled()) + !folio_may_be_lru_cached(folio) || lru_cache_disabled()) folio_batch_move_lru(this_cpu_ptr(fbatch), move_fn);
if (disable_irq)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tiezhu Yang yangtiezhu@loongson.cn
commit f5003098e2f337d8e8a87dc636250e3fa978d9ad upstream.
Loongson-3A6000 and 3C6000 CPUs also support unaligned memory access, so the current description is out of date to some extent.
Actually, all of Loongson-3 series processors based on LoongArch support unaligned memory access, this hardware capability is indicated by the bit 20 (UAL) of CPUCFG1 register, update the help info to reflect the reality.
Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang yangtiezhu@loongson.cn Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/Kconfig | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -566,10 +566,14 @@ config ARCH_STRICT_ALIGN -mstrict-align build parameter to prevent unaligned accesses.
CPUs with h/w unaligned access support: - Loongson-2K2000/2K3000/3A5000/3C5000/3D5000. + Loongson-2K2000/2K3000 and all of Loongson-3 series processors + based on LoongArch.
CPUs without h/w unaligned access support: - Loongson-2K500/2K1000. + Loongson-2K0300/2K0500/2K1000. + + If you want to make sure whether to support unaligned memory access + on your hardware, please read the bit 20 (UAL) of CPUCFG1 register.
This option is enabled by default to make the kernel be able to run on all LoongArch systems. But you can disable it manually if you want
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tiezhu Yang yangtiezhu@loongson.cn
commit baad7830ee9a56756b3857348452fe756cb0a702 upstream.
If the break immediate code is 0, it should mark the type as INSN_TRAP. If the break immediate code is 1, it should mark the type as INSN_BUG.
While at it, format the code style and add the code comment for nop.
Cc: stable@vger.kernel.org Suggested-by: WANG Rui wangrui@loongson.cn Signed-off-by: Tiezhu Yang yangtiezhu@loongson.cn Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/objtool/arch/loongarch/decode.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
--- a/tools/objtool/arch/loongarch/decode.c +++ b/tools/objtool/arch/loongarch/decode.c @@ -310,10 +310,16 @@ int arch_decode_instruction(struct objto if (decode_insn_reg2i16_fomat(inst, insn)) return 0;
- if (inst.word == 0) + if (inst.word == 0) { + /* andi $zero, $zero, 0x0 */ insn->type = INSN_NOP; - else if (inst.reg0i15_format.opcode == break_op) { - /* break */ + } else if (inst.reg0i15_format.opcode == break_op && + inst.reg0i15_format.immediate == 0x0) { + /* break 0x0 */ + insn->type = INSN_TRAP; + } else if (inst.reg0i15_format.opcode == break_op && + inst.reg0i15_format.immediate == 0x1) { + /* break 0x1 */ insn->type = INSN_BUG; } else if (inst.reg2_format.opcode == ertn_op) { /* ertn */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tiezhu Yang yangtiezhu@loongson.cn
commit 539d7344d4feaea37e05863e9aa86bd31f28e46f upstream.
When compiling with LLVM and CONFIG_RUST is set, there exists the following objtool warning:
rust/compiler_builtins.o: warning: objtool: __rust__unordsf2(): unexpected end of section .text.unlikely.
objdump shows that the end of section .text.unlikely is an atomic instruction:
amswap.w $zero, $ra, $zero
According to the LoongArch Reference Manual, if the amswap.w atomic memory access instruction has the same register number as rd and rj, the execution will trigger an Instruction Non-defined Exception, so mark the above instruction as INSN_BUG type to fix the warning.
Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang yangtiezhu@loongson.cn Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/arch/loongarch/include/asm/inst.h | 12 ++++++++++++ tools/objtool/arch/loongarch/decode.c | 21 +++++++++++++++++++++ 2 files changed, 33 insertions(+)
--- a/tools/arch/loongarch/include/asm/inst.h +++ b/tools/arch/loongarch/include/asm/inst.h @@ -51,6 +51,10 @@ enum reg2i16_op { bgeu_op = 0x1b, };
+enum reg3_op { + amswapw_op = 0x70c0, +}; + struct reg0i15_format { unsigned int immediate : 15; unsigned int opcode : 17; @@ -96,6 +100,13 @@ struct reg2i16_format { unsigned int opcode : 6; };
+struct reg3_format { + unsigned int rd : 5; + unsigned int rj : 5; + unsigned int rk : 5; + unsigned int opcode : 17; +}; + union loongarch_instruction { unsigned int word; struct reg0i15_format reg0i15_format; @@ -105,6 +116,7 @@ union loongarch_instruction { struct reg2i12_format reg2i12_format; struct reg2i14_format reg2i14_format; struct reg2i16_format reg2i16_format; + struct reg3_format reg3_format; };
#define LOONGARCH_INSN_SIZE sizeof(union loongarch_instruction) --- a/tools/objtool/arch/loongarch/decode.c +++ b/tools/objtool/arch/loongarch/decode.c @@ -278,6 +278,25 @@ static bool decode_insn_reg2i16_fomat(un return true; }
+static bool decode_insn_reg3_fomat(union loongarch_instruction inst, + struct instruction *insn) +{ + switch (inst.reg3_format.opcode) { + case amswapw_op: + if (inst.reg3_format.rd == LOONGARCH_GPR_ZERO && + inst.reg3_format.rk == LOONGARCH_GPR_RA && + inst.reg3_format.rj == LOONGARCH_GPR_ZERO) { + /* amswap.w $zero, $ra, $zero */ + insn->type = INSN_BUG; + } + break; + default: + return false; + } + + return true; +} + int arch_decode_instruction(struct objtool_file *file, const struct section *sec, unsigned long offset, unsigned int maxlen, struct instruction *insn) @@ -309,6 +328,8 @@ int arch_decode_instruction(struct objto return 0; if (decode_insn_reg2i16_fomat(inst, insn)) return 0; + if (decode_insn_reg3_fomat(inst, insn)) + return 0;
if (inst.word == 0) { /* andi $zero, $zero, 0x0 */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tiezhu Yang yangtiezhu@loongson.cn
commit 677d4a52d4dc4a147d5e84af9ff207832578be70 upstream.
When testing the kernel live patching with "modprobe livepatch-sample", there is a timeout over 15 seconds from "starting patching transition" to "patching complete". The dmesg command shows "unreliable stack" for user tasks in debug mode, here is one of the messages:
livepatch: klp_try_switch_task: bash:1193 has an unreliable stack
The "unreliable stack" is because it can not unwind from do_syscall() to its previous frame handle_syscall(). It should use fp to find the original stack top due to secondary stack in do_syscall(), but fp is not used for some other functions, then fp can not be restored by the next frame of do_syscall(), so it is necessary to save fp if task is not current, in order to get the stack top of do_syscall().
Here are the call chains:
klp_enable_patch() klp_try_complete_transition() klp_try_switch_task() klp_check_and_switch_task() klp_check_stack() stack_trace_save_tsk_reliable() arch_stack_walk_reliable()
When executing "rmmod livepatch-sample", there exists a similar issue. With this patch, it takes a short time for patching and unpatching.
Before:
# modprobe livepatch-sample # dmesg -T | tail -3 [Sat Sep 6 11:00:20 2025] livepatch: 'livepatch_sample': starting patching transition [Sat Sep 6 11:00:35 2025] livepatch: signaling remaining tasks [Sat Sep 6 11:00:36 2025] livepatch: 'livepatch_sample': patching complete
# echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled # rmmod livepatch_sample rmmod: ERROR: Module livepatch_sample is in use # rmmod livepatch_sample # dmesg -T | tail -3 [Sat Sep 6 11:06:05 2025] livepatch: 'livepatch_sample': starting unpatching transition [Sat Sep 6 11:06:20 2025] livepatch: signaling remaining tasks [Sat Sep 6 11:06:21 2025] livepatch: 'livepatch_sample': unpatching complete
After:
# modprobe livepatch-sample # dmesg -T | tail -2 [Tue Sep 16 16:19:30 2025] livepatch: 'livepatch_sample': starting patching transition [Tue Sep 16 16:19:31 2025] livepatch: 'livepatch_sample': patching complete
# echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled # rmmod livepatch_sample # dmesg -T | tail -2 [Tue Sep 16 16:19:36 2025] livepatch: 'livepatch_sample': starting unpatching transition [Tue Sep 16 16:19:37 2025] livepatch: 'livepatch_sample': unpatching complete
Cc: stable@vger.kernel.org # v6.9+ Fixes: 199cc14cb4f1 ("LoongArch: Add kernel livepatching support") Reported-by: Xi Zhang zhangxi@kylinos.cn Signed-off-by: Tiezhu Yang yangtiezhu@loongson.cn Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/kernel/stacktrace.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/loongarch/kernel/stacktrace.c +++ b/arch/loongarch/kernel/stacktrace.c @@ -51,12 +51,13 @@ int arch_stack_walk_reliable(stack_trace if (task == current) { regs->regs[3] = (unsigned long)__builtin_frame_address(0); regs->csr_era = (unsigned long)__builtin_return_address(0); + regs->regs[22] = 0; } else { regs->regs[3] = thread_saved_fp(task); regs->csr_era = thread_saved_ra(task); + regs->regs[22] = task->thread.reg22; } regs->regs[1] = 0; - regs->regs[22] = 0;
for (unwind_start(&state, task, regs); !unwind_done(&state) && !unwind_error(&state); unwind_next_frame(&state)) {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guangshuo Li 202321181@mail.sdu.edu.cn
commit ac398f570724c41e5e039d54e4075519f6af7408 upstream.
Add a NULL-pointer check after the kcalloc() call in init_vdso(). If allocation fails, return -ENOMEM to prevent a possible dereference of vdso_info.code_mapping.pages when it is NULL.
Cc: stable@vger.kernel.org Fixes: 2ed119aef60d ("LoongArch: Set correct size for vDSO code mapping") Signed-off-by: Guangshuo Li 202321181@mail.sdu.edu.cn Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/kernel/vdso.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/loongarch/kernel/vdso.c +++ b/arch/loongarch/kernel/vdso.c @@ -54,6 +54,9 @@ static int __init init_vdso(void) vdso_info.code_mapping.pages = kcalloc(vdso_info.size / PAGE_SIZE, sizeof(struct page *), GFP_KERNEL);
+ if (!vdso_info.code_mapping.pages) + return -ENOMEM; + pfn = __phys_to_pfn(__pa_symbol(vdso_info.vdso)); for (i = 0; i < vdso_info.size / PAGE_SIZE; i++) vdso_info.code_mapping.pages[i] = pfn_to_page(pfn + i);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huacai Chen chenhuacai@loongson.cn
commit a9d13433fe17be0e867e51e71a1acd2731fbef8d upstream.
ARCH_STRICT_ALIGN is used for hardware without UAL, now it only control the -mstrict-align flag. However, ACPI structures are packed by default so will cause unaligned accesses.
To avoid this, define ACPI_MISALIGNMENT_NOT_SUPPORTED in asm/acenv.h to align ACPI structures if ARCH_STRICT_ALIGN enabled.
Cc: stable@vger.kernel.org Reported-by: Binbin Zhou zhoubinbin@loongson.cn Suggested-by: Xi Ruoyao xry111@xry111.site Suggested-by: Jiaxun Yang jiaxun.yang@flygoat.com Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/include/asm/acenv.h | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/arch/loongarch/include/asm/acenv.h +++ b/arch/loongarch/include/asm/acenv.h @@ -10,9 +10,8 @@ #ifndef _ASM_LOONGARCH_ACENV_H #define _ASM_LOONGARCH_ACENV_H
-/* - * This header is required by ACPI core, but we have nothing to fill in - * right now. Will be updated later when needed. - */ +#ifdef CONFIG_ARCH_STRICT_ALIGN +#define ACPI_MISALIGNMENT_NOT_SUPPORTED +#endif /* CONFIG_ARCH_STRICT_ALIGN */
#endif /* _ASM_LOONGARCH_ACENV_H */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tao Cui cuitao@kylinos.cn
commit 51adb03e6b865c0c6790f29659ff52d56742de2e upstream.
Add a check for the return value of kobject_create_and_add(), to ensure that the kobj allocation succeeds for later use.
Cc: stable@vger.kernel.org Signed-off-by: Tao Cui cuitao@kylinos.cn Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/kernel/env.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/loongarch/kernel/env.c +++ b/arch/loongarch/kernel/env.c @@ -109,6 +109,8 @@ static int __init boardinfo_init(void) struct kobject *loongson_kobj;
loongson_kobj = kobject_create_and_add("loongson", firmware_kobj); + if (!loongson_kobj) + return -ENOMEM;
return sysfs_create_file(loongson_kobj, &boardinfo_attr.attr); }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tiezhu Yang yangtiezhu@loongson.cn
commit b15212824a01cb0b62f7b522f4ee334622cf982a upstream.
LTO is not only used for Clang, but maybe also used for Rust, make LTO case out of CONFIG_CC_HAS_ANNOTATE_TABLEJUMP in Makefile.
This is preparation for later patch, no function changes.
Cc: stable@vger.kernel.org Suggested-by: WANG Rui wangrui@loongson.cn Signed-off-by: Tiezhu Yang yangtiezhu@loongson.cn Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/Makefile | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/arch/loongarch/Makefile +++ b/arch/loongarch/Makefile @@ -102,16 +102,16 @@ KBUILD_CFLAGS += $(call cc-option,-mth
ifdef CONFIG_OBJTOOL ifdef CONFIG_CC_HAS_ANNOTATE_TABLEJUMP +KBUILD_CFLAGS += -mannotate-tablejump +else +KBUILD_CFLAGS += -fno-jump-tables # keep compatibility with older compilers +endif +ifdef CONFIG_LTO_CLANG # The annotate-tablejump option can not be passed to LLVM backend when LTO is enabled. # Ensure it is aware of linker with LTO, '--loongarch-annotate-tablejump' also needs to # be passed via '-mllvm' to ld.lld. -KBUILD_CFLAGS += -mannotate-tablejump -ifdef CONFIG_LTO_CLANG KBUILD_LDFLAGS += -mllvm --loongarch-annotate-tablejump endif -else -KBUILD_CFLAGS += -fno-jump-tables # keep compatibility with older compilers -endif endif
KBUILD_RUSTFLAGS += --target=loongarch64-unknown-none-softfloat -Ccode-model=small
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tiezhu Yang yangtiezhu@loongson.cn
commit 74f8295c6fb8436bec9995baf6ba463151b6fb68 upstream.
When compiling with LLVM and CONFIG_RUST is set, there exist objtool warnings in rust/core.o and rust/kernel.o, like this:
rust/core.o: warning: objtool: _RNvXs1_NtNtCs5QSdWC790r4_4core5ascii10ascii_charNtB5_9AsciiCharNtNtB9_3fmt5Debug3fmt+0x54: sibling call from callable instruction with modified stack frame
For this special case, the related object file shows that there is no generated relocation section '.rela.discard.tablejump_annotate' for the table jump instruction jirl, thus objtool can not know that what is the actual destination address.
If rustc has the option "-Cllvm-args=--loongarch-annotate-tablejump", pass the option to enable jump tables for objtool, otherwise it should pass "-Zno-jump-tables" to keep compatibility with older rustc.
How to test:
$ rustup component add rust-src $ make LLVM=1 rustavailable $ make ARCH=loongarch LLVM=1 clean defconfig $ scripts/config -d MODVERSIONS \ -e RUST -e SAMPLES -e SAMPLES_RUST \ -e SAMPLE_RUST_CONFIGFS -e SAMPLE_RUST_MINIMAL \ -e SAMPLE_RUST_MISC_DEVICE -e SAMPLE_RUST_PRINT \ -e SAMPLE_RUST_DMA -e SAMPLE_RUST_DRIVER_PCI \ -e SAMPLE_RUST_DRIVER_PLATFORM -e SAMPLE_RUST_DRIVER_FAUX \ -e SAMPLE_RUST_DRIVER_AUXILIARY -e SAMPLE_RUST_HOSTPROGS $ make ARCH=loongarch LLVM=1 olddefconfig all
Cc: stable@vger.kernel.org Acked-by: Miguel Ojeda ojeda@kernel.org Reported-by: Miguel Ojeda ojeda@kernel.org Closes: https://lore.kernel.org/rust-for-linux/CANiq72mNeCuPkCDrG2db3w=AX+O-zYrfpris... Suggested-by: WANG Rui wangrui@loongson.cn Signed-off-by: Tiezhu Yang yangtiezhu@loongson.cn Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/Kconfig | 4 ++++ arch/loongarch/Makefile | 5 +++++ 2 files changed, 9 insertions(+)
--- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -301,6 +301,10 @@ config AS_HAS_LVZ_EXTENSION config CC_HAS_ANNOTATE_TABLEJUMP def_bool $(cc-option,-mannotate-tablejump)
+config RUSTC_HAS_ANNOTATE_TABLEJUMP + depends on RUST + def_bool $(rustc-option,-Cllvm-args=--loongarch-annotate-tablejump) + menu "Kernel type and options"
source "kernel/Kconfig.hz" --- a/arch/loongarch/Makefile +++ b/arch/loongarch/Makefile @@ -106,6 +106,11 @@ KBUILD_CFLAGS += -mannotate-tablejump else KBUILD_CFLAGS += -fno-jump-tables # keep compatibility with older compilers endif +ifdef CONFIG_RUSTC_HAS_ANNOTATE_TABLEJUMP +KBUILD_RUSTFLAGS += -Cllvm-args=--loongarch-annotate-tablejump +else +KBUILD_RUSTFLAGS += -Zno-jump-tables # keep compatibility with older compilers +endif ifdef CONFIG_LTO_CLANG # The annotate-tablejump option can not be passed to LLVM backend when LTO is enabled. # Ensure it is aware of linker with LTO, '--loongarch-annotate-tablejump' also needs to
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bibo Mao maobibo@loongson.cn
commit 47256c4c8b1bfbc63223a0da2d4fa90b6ede5cbb upstream.
Function copy_from_user() and copy_to_user() may sleep because of page fault, and they cannot be called in spin_lock hold context. Here move function calling of copy_from_user() and copy_to_user() before spinlock context in function kvm_eiointc_ctrl_access().
Otherwise there will be possible warning such as:
BUG: sleeping function called from invalid context at include/linux/uaccess.h:192 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 INFO: lockdep is turned off. irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40 softirqs last enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full) Tainted: [W]=WARN Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000 9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8 9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001 0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880 00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe 000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0 0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000 0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0 0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40 00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d Call Trace: [<9000000004c2827c>] show_stack+0x5c/0x180 [<9000000004c20fac>] dump_stack_lvl+0x94/0xe4 [<9000000004c99c7c>] __might_resched+0x26c/0x290 [<9000000004f68968>] __might_fault+0x20/0x88 [<ffff800002311de0>] kvm_eiointc_ctrl_access.isra.0+0x88/0x380 [kvm] [<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm] [<900000000506b0d8>] sys_ioctl+0x388/0x1010 [<90000000063ed210>] do_syscall+0xb0/0x2d8 [<9000000004c25ef8>] handle_syscall+0xb8/0x158
Cc: stable@vger.kernel.org Fixes: 1ad7efa552fd5 ("LoongArch: KVM: Add EIOINTC user mode read and write functions") Signed-off-by: Bibo Mao maobibo@loongson.cn Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/kvm/intc/eiointc.c | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-)
--- a/arch/loongarch/kvm/intc/eiointc.c +++ b/arch/loongarch/kvm/intc/eiointc.c @@ -810,21 +810,26 @@ static int kvm_eiointc_ctrl_access(struc struct loongarch_eiointc *s = dev->kvm->arch.eiointc;
data = (void __user *)attr->addr; - spin_lock_irqsave(&s->lock, flags); switch (type) { case KVM_DEV_LOONGARCH_EXTIOI_CTRL_INIT_NUM_CPU: + case KVM_DEV_LOONGARCH_EXTIOI_CTRL_INIT_FEATURE: if (copy_from_user(&val, data, 4)) - ret = -EFAULT; - else { - if (val >= EIOINTC_ROUTE_MAX_VCPUS) - ret = -EINVAL; - else - s->num_cpu = val; - } + return -EFAULT; + break; + default: + break; + } + + spin_lock_irqsave(&s->lock, flags); + switch (type) { + case KVM_DEV_LOONGARCH_EXTIOI_CTRL_INIT_NUM_CPU: + if (val >= EIOINTC_ROUTE_MAX_VCPUS) + ret = -EINVAL; + else + s->num_cpu = val; break; case KVM_DEV_LOONGARCH_EXTIOI_CTRL_INIT_FEATURE: - if (copy_from_user(&s->features, data, 4)) - ret = -EFAULT; + s->features = val; if (!(s->features & BIT(EIOINTC_HAS_VIRT_EXTENSION))) s->status |= BIT(EIOINTC_ENABLE); break;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bibo Mao maobibo@loongson.cn
commit 62f11796a0dfa1a2ef5f50a2d1bc81c81628fb8e upstream.
Function copy_from_user() and copy_to_user() may sleep because of page fault, and they cannot be called in spin_lock hold context. Here move function calling of copy_from_user() and copy_to_user() before spinlock context in function kvm_eiointc_ctrl_access().
Otherwise there will be possible warning such as:
BUG: sleeping function called from invalid context at include/linux/uaccess.h:192 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 INFO: lockdep is turned off. irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40 softirqs last enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full) Tainted: [W]=WARN Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000 9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8 9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001 0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880 00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe 000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0 0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000 0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0 0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40 00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d Call Trace: [<9000000004c2827c>] show_stack+0x5c/0x180 [<9000000004c20fac>] dump_stack_lvl+0x94/0xe4 [<9000000004c99c7c>] __might_resched+0x26c/0x290 [<9000000004f68968>] __might_fault+0x20/0x88 [<ffff800002311de0>] kvm_eiointc_regs_access.isra.0+0x88/0x380 [kvm] [<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm] [<900000000506b0d8>] sys_ioctl+0x388/0x1010 [<90000000063ed210>] do_syscall+0xb0/0x2d8 [<9000000004c25ef8>] handle_syscall+0xb8/0x158
Cc: stable@vger.kernel.org Fixes: 1ad7efa552fd5 ("LoongArch: KVM: Add EIOINTC user mode read and write functions") Signed-off-by: Bibo Mao maobibo@loongson.cn Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/kvm/intc/eiointc.c | 33 +++++++++++++++++++++------------ 1 file changed, 21 insertions(+), 12 deletions(-)
--- a/arch/loongarch/kvm/intc/eiointc.c +++ b/arch/loongarch/kvm/intc/eiointc.c @@ -851,19 +851,17 @@ static int kvm_eiointc_ctrl_access(struc
static int kvm_eiointc_regs_access(struct kvm_device *dev, struct kvm_device_attr *attr, - bool is_write) + bool is_write, int *data) { int addr, cpu, offset, ret = 0; unsigned long flags; void *p = NULL; - void __user *data; struct loongarch_eiointc *s;
s = dev->kvm->arch.eiointc; addr = attr->attr; cpu = addr >> 16; addr &= 0xffff; - data = (void __user *)attr->addr; switch (addr) { case EIOINTC_NODETYPE_START ... EIOINTC_NODETYPE_END: offset = (addr - EIOINTC_NODETYPE_START) / 4; @@ -902,13 +900,10 @@ static int kvm_eiointc_regs_access(struc }
spin_lock_irqsave(&s->lock, flags); - if (is_write) { - if (copy_from_user(p, data, 4)) - ret = -EFAULT; - } else { - if (copy_to_user(data, p, 4)) - ret = -EFAULT; - } + if (is_write) + memcpy(p, data, 4); + else + memcpy(data, p, 4); spin_unlock_irqrestore(&s->lock, flags);
return ret; @@ -965,9 +960,18 @@ static int kvm_eiointc_sw_status_access( static int kvm_eiointc_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr) { + int ret, data; + switch (attr->group) { case KVM_DEV_LOONGARCH_EXTIOI_GRP_REGS: - return kvm_eiointc_regs_access(dev, attr, false); + ret = kvm_eiointc_regs_access(dev, attr, false, &data); + if (ret) + return ret; + + if (copy_to_user((void __user *)attr->addr, &data, 4)) + ret = -EFAULT; + + return ret; case KVM_DEV_LOONGARCH_EXTIOI_GRP_SW_STATUS: return kvm_eiointc_sw_status_access(dev, attr, false); default: @@ -978,11 +982,16 @@ static int kvm_eiointc_get_attr(struct k static int kvm_eiointc_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr) { + int data; + switch (attr->group) { case KVM_DEV_LOONGARCH_EXTIOI_GRP_CTRL: return kvm_eiointc_ctrl_access(dev, attr); case KVM_DEV_LOONGARCH_EXTIOI_GRP_REGS: - return kvm_eiointc_regs_access(dev, attr, true); + if (copy_from_user(&data, (void __user *)attr->addr, 4)) + return -EFAULT; + + return kvm_eiointc_regs_access(dev, attr, true, &data); case KVM_DEV_LOONGARCH_EXTIOI_GRP_SW_STATUS: return kvm_eiointc_sw_status_access(dev, attr, true); default:
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bibo Mao maobibo@loongson.cn
commit 01a8e68396a6d51f5ba92021ad1a4b8eaabdd0e7 upstream.
Function copy_from_user() and copy_to_user() may sleep because of page fault, and they cannot be called in spin_lock hold context. Here move funtcion calling of copy_from_user() and copy_to_user() out of function kvm_eiointc_sw_status_access().
Otherwise there will be possible warning such as:
BUG: sleeping function called from invalid context at include/linux/uaccess.h:192 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 INFO: lockdep is turned off. irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40 softirqs last enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full) Tainted: [W]=WARN Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000 9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8 9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001 0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880 00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe 000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0 0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000 0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0 0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40 00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d Call Trace: [<9000000004c2827c>] show_stack+0x5c/0x180 [<9000000004c20fac>] dump_stack_lvl+0x94/0xe4 [<9000000004c99c7c>] __might_resched+0x26c/0x290 [<9000000004f68968>] __might_fault+0x20/0x88 [<ffff800002311de0>] kvm_eiointc_sw_status_access.isra.0+0x88/0x380 [kvm] [<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm] [<900000000506b0d8>] sys_ioctl+0x388/0x1010 [<90000000063ed210>] do_syscall+0xb0/0x2d8 [<9000000004c25ef8>] handle_syscall+0xb8/0x158
Cc: stable@vger.kernel.org Fixes: 1ad7efa552fd5 ("LoongArch: KVM: Add EIOINTC user mode read and write functions") Signed-off-by: Bibo Mao maobibo@loongson.cn Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/kvm/intc/eiointc.c | 29 +++++++++++++++++------------ 1 file changed, 17 insertions(+), 12 deletions(-)
--- a/arch/loongarch/kvm/intc/eiointc.c +++ b/arch/loongarch/kvm/intc/eiointc.c @@ -911,19 +911,17 @@ static int kvm_eiointc_regs_access(struc
static int kvm_eiointc_sw_status_access(struct kvm_device *dev, struct kvm_device_attr *attr, - bool is_write) + bool is_write, int *data) { int addr, ret = 0; unsigned long flags; void *p = NULL; - void __user *data; struct loongarch_eiointc *s;
s = dev->kvm->arch.eiointc; addr = attr->attr; addr &= 0xffff;
- data = (void __user *)attr->addr; switch (addr) { case KVM_DEV_LOONGARCH_EXTIOI_SW_STATUS_NUM_CPU: if (is_write) @@ -945,13 +943,10 @@ static int kvm_eiointc_sw_status_access( return -EINVAL; } spin_lock_irqsave(&s->lock, flags); - if (is_write) { - if (copy_from_user(p, data, 4)) - ret = -EFAULT; - } else { - if (copy_to_user(data, p, 4)) - ret = -EFAULT; - } + if (is_write) + memcpy(p, data, 4); + else + memcpy(data, p, 4); spin_unlock_irqrestore(&s->lock, flags);
return ret; @@ -973,7 +968,14 @@ static int kvm_eiointc_get_attr(struct k
return ret; case KVM_DEV_LOONGARCH_EXTIOI_GRP_SW_STATUS: - return kvm_eiointc_sw_status_access(dev, attr, false); + ret = kvm_eiointc_sw_status_access(dev, attr, false, &data); + if (ret) + return ret; + + if (copy_to_user((void __user *)attr->addr, &data, 4)) + ret = -EFAULT; + + return ret; default: return -EINVAL; } @@ -993,7 +995,10 @@ static int kvm_eiointc_set_attr(struct k
return kvm_eiointc_regs_access(dev, attr, true, &data); case KVM_DEV_LOONGARCH_EXTIOI_GRP_SW_STATUS: - return kvm_eiointc_sw_status_access(dev, attr, true); + if (copy_from_user(&data, (void __user *)attr->addr, 4)) + return -EFAULT; + + return kvm_eiointc_sw_status_access(dev, attr, true, &data); default: return -EINVAL; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bibo Mao maobibo@loongson.cn
commit 8dc5245673cf7f33743e5c0d2a4207c0b8df3067 upstream.
Function copy_from_user() and copy_to_user() may sleep because of page fault, and they cannot be called in spin_lock hold context. Here move function calling of copy_from_user() and copy_to_user() out of spinlock context in function kvm_pch_pic_regs_access().
Otherwise there will be possible warning such as:
BUG: sleeping function called from invalid context at include/linux/uaccess.h:192 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 INFO: lockdep is turned off. irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40 softirqs last enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full) Tainted: [W]=WARN Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000 9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8 9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001 0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880 00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe 000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0 0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000 0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0 0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40 00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d Call Trace: [<9000000004c2827c>] show_stack+0x5c/0x180 [<9000000004c20fac>] dump_stack_lvl+0x94/0xe4 [<9000000004c99c7c>] __might_resched+0x26c/0x290 [<9000000004f68968>] __might_fault+0x20/0x88 [<ffff800002311de0>] kvm_pch_pic_regs_access.isra.0+0x88/0x380 [kvm] [<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm] [<900000000506b0d8>] sys_ioctl+0x388/0x1010 [<90000000063ed210>] do_syscall+0xb0/0x2d8 [<9000000004c25ef8>] handle_syscall+0xb8/0x158
Cc: stable@vger.kernel.org Fixes: d206d95148732 ("LoongArch: KVM: Add PCHPIC user mode read and write functions") Signed-off-by: Bibo Mao maobibo@loongson.cn Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/kvm/intc/pch_pic.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-)
--- a/arch/loongarch/kvm/intc/pch_pic.c +++ b/arch/loongarch/kvm/intc/pch_pic.c @@ -348,6 +348,7 @@ static int kvm_pch_pic_regs_access(struc struct kvm_device_attr *attr, bool is_write) { + char buf[8]; int addr, offset, len = 8, ret = 0; void __user *data; void *p = NULL; @@ -397,17 +398,23 @@ static int kvm_pch_pic_regs_access(struc return -EINVAL; }
- spin_lock(&s->lock); - /* write or read value according to is_write */ if (is_write) { - if (copy_from_user(p, data, len)) - ret = -EFAULT; - } else { - if (copy_to_user(data, p, len)) - ret = -EFAULT; + if (copy_from_user(buf, data, len)) + return -EFAULT; } + + spin_lock(&s->lock); + if (is_write) + memcpy(p, buf, len); + else + memcpy(buf, p, len); spin_unlock(&s->lock);
+ if (!is_write) { + if (copy_to_user(data, buf, len)) + return -EFAULT; + } + return ret; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bibo Mao maobibo@loongson.cn
commit f58c9aa1065f73d243904b267c71f6a9d1e9f90e upstream.
With PTW disabled system, bit _PAGE_DIRTY is a HW bit for page writing. However with PTW enabled system, bit _PAGE_WRITE is also a "HW bit" for page writing, because hardware synchronizes _PAGE_WRITE to _PAGE_DIRTY automatically. Previously, _PAGE_WRITE is treated as a SW bit to record the page writeable attribute for the fast page fault handling in the secondary MMU, however with PTW enabled machine, this bit is used by HW already (so setting it will silence the TLB modify exception).
Here define KVM_PAGE_WRITEABLE with the SW bit _PAGE_MODIFIED, so that it can work on both PTW disabled and enabled machines. And for HW write bits, both _PAGE_DIRTY and _PAGE_WRITE are set or clear together.
Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao maobibo@loongson.cn Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/include/asm/kvm_mmu.h | 20 ++++++++++++++++---- arch/loongarch/kvm/mmu.c | 8 ++++---- 2 files changed, 20 insertions(+), 8 deletions(-)
--- a/arch/loongarch/include/asm/kvm_mmu.h +++ b/arch/loongarch/include/asm/kvm_mmu.h @@ -16,6 +16,13 @@ */ #define KVM_MMU_CACHE_MIN_PAGES (CONFIG_PGTABLE_LEVELS - 1)
+/* + * _PAGE_MODIFIED is a SW pte bit, it records page ever written on host + * kernel, on secondary MMU it records the page writeable attribute, in + * order for fast path handling. + */ +#define KVM_PAGE_WRITEABLE _PAGE_MODIFIED + #define _KVM_FLUSH_PGTABLE 0x1 #define _KVM_HAS_PGMASK 0x2 #define kvm_pfn_pte(pfn, prot) (((pfn) << PFN_PTE_SHIFT) | pgprot_val(prot)) @@ -52,10 +59,10 @@ static inline void kvm_set_pte(kvm_pte_t WRITE_ONCE(*ptep, val); }
-static inline int kvm_pte_write(kvm_pte_t pte) { return pte & _PAGE_WRITE; } -static inline int kvm_pte_dirty(kvm_pte_t pte) { return pte & _PAGE_DIRTY; } static inline int kvm_pte_young(kvm_pte_t pte) { return pte & _PAGE_ACCESSED; } static inline int kvm_pte_huge(kvm_pte_t pte) { return pte & _PAGE_HUGE; } +static inline int kvm_pte_dirty(kvm_pte_t pte) { return pte & __WRITEABLE; } +static inline int kvm_pte_writeable(kvm_pte_t pte) { return pte & KVM_PAGE_WRITEABLE; }
static inline kvm_pte_t kvm_pte_mkyoung(kvm_pte_t pte) { @@ -69,12 +76,12 @@ static inline kvm_pte_t kvm_pte_mkold(kv
static inline kvm_pte_t kvm_pte_mkdirty(kvm_pte_t pte) { - return pte | _PAGE_DIRTY; + return pte | __WRITEABLE; }
static inline kvm_pte_t kvm_pte_mkclean(kvm_pte_t pte) { - return pte & ~_PAGE_DIRTY; + return pte & ~__WRITEABLE; }
static inline kvm_pte_t kvm_pte_mkhuge(kvm_pte_t pte) @@ -87,6 +94,11 @@ static inline kvm_pte_t kvm_pte_mksmall( return pte & ~_PAGE_HUGE; }
+static inline kvm_pte_t kvm_pte_mkwriteable(kvm_pte_t pte) +{ + return pte | KVM_PAGE_WRITEABLE; +} + static inline int kvm_need_flush(kvm_ptw_ctx *ctx) { return ctx->flag & _KVM_FLUSH_PGTABLE; --- a/arch/loongarch/kvm/mmu.c +++ b/arch/loongarch/kvm/mmu.c @@ -569,7 +569,7 @@ static int kvm_map_page_fast(struct kvm_ /* Track access to pages marked old */ new = kvm_pte_mkyoung(*ptep); if (write && !kvm_pte_dirty(new)) { - if (!kvm_pte_write(new)) { + if (!kvm_pte_writeable(new)) { ret = -EFAULT; goto out; } @@ -856,9 +856,9 @@ retry: prot_bits |= _CACHE_SUC;
if (writeable) { - prot_bits |= _PAGE_WRITE; + prot_bits = kvm_pte_mkwriteable(prot_bits); if (write) - prot_bits |= __WRITEABLE; + prot_bits = kvm_pte_mkdirty(prot_bits); }
/* Disable dirty logging on HugePages */ @@ -904,7 +904,7 @@ retry: kvm_release_faultin_page(kvm, page, false, writeable); spin_unlock(&kvm->mmu_lock);
- if (prot_bits & _PAGE_DIRTY) + if (kvm_pte_dirty(prot_bits)) mark_page_dirty_in_slot(kvm, memslot, gfn);
out:
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eugene Koira eugkoira@amazon.com
commit dce043c07ca1ac19cfbe2844a6dc71e35c322353 upstream.
switch_to_super_page() assumes the memory range it's working on is aligned to the target large page level. Unfortunately, __domain_mapping() doesn't take this into account when using it, and will pass unaligned ranges ultimately freeing a PTE range larger than expected.
Take for example a mapping with the following iov_pfn range [0x3fe400, 0x4c0600), which should be backed by the following mappings:
iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff] to switch_to_super_page() at a 1 GiB granularity, which will in turn free PTEs all the way to iov_pfn 0x4fffff.
Mitigate this by rounding down the iov_pfn range passed to switch_to_super_page() in __domain_mapping() to the target large page level.
Additionally add range alignment checks to switch_to_super_page.
Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()") Signed-off-by: Eugene Koira eugkoira@amazon.com Cc: stable@vger.kernel.org Reviewed-by: Nicolas Saenz Julienne nsaenz@amazon.com Reviewed-by: David Woodhouse dwmw@amazon.co.uk Link: https://lore.kernel.org/r/20250826143816.38686-1-eugkoira@amazon.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel joerg.roedel@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iommu/intel/iommu.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1592,6 +1592,10 @@ static void switch_to_super_page(struct unsigned long lvl_pages = lvl_to_nr_pages(level); struct dma_pte *pte = NULL;
+ if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) || + !IS_ALIGNED(end_pfn + 1, lvl_pages))) + return; + while (start_pfn <= end_pfn) { if (!pte) pte = pfn_to_dma_pte(domain, start_pfn, &level, @@ -1667,7 +1671,8 @@ __domain_mapping(struct dmar_domain *dom unsigned long pages_to_remove;
pteval |= DMA_PTE_LARGE_PAGE; - pages_to_remove = min_t(unsigned long, nr_pages, + pages_to_remove = min_t(unsigned long, + round_down(nr_pages, lvl_pages), nr_pte_to_next_page(pte) * lvl_pages); end_pfn = iov_pfn + pages_to_remove - 1; switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhen Ni zhen.ni@easystack.cn
commit 923b70581cb6acede90f8aaf4afe5d1c58c67b71 upstream.
Fix a permanent ACPI table memory leak in early_amd_iommu_init() when CMPXCHG16B feature is not supported
Fixes: 82582f85ed22 ("iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported") Cc: stable@vger.kernel.org Signed-off-by: Zhen Ni zhen.ni@easystack.cn Reviewed-by: Suravee Suthikulpanit suravee.suthikulpanit@amd.com Link: https://lore.kernel.org/r/20250822024915.673427-1-zhen.ni@easystack.cn Signed-off-by: Joerg Roedel joerg.roedel@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iommu/amd/init.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -3048,7 +3048,8 @@ static int __init early_amd_iommu_init(v
if (!boot_cpu_has(X86_FEATURE_CX16)) { pr_err("Failed to initialize. The CMPXCHG16B feature is required.\n"); - return -EINVAL; + ret = -EINVAL; + goto out; }
/*
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasant Hegde vasant.hegde@amd.com
commit 1e56310b40fd2e7e0b9493da9ff488af145bdd0c upstream.
The AMD IOMMU host page table implementation supports dynamic page table levels (up to 6 levels), starting with a 3-level configuration that expands based on IOVA address. The kernel maintains a root pointer and current page table level to enable proper page table walks in alloc_pte()/fetch_pte() operations.
The IOMMU IOVA allocator initially starts with 32-bit address and onces its exhuasted it switches to 64-bit address (max address is determined based on IOMMU and device DMA capability). To support larger IOVA, AMD IOMMU driver increases page table level.
But in unmap path (iommu_v1_unmap_pages()), fetch_pte() reads pgtable->[root/mode] without lock. So its possible that in exteme corner case, when increase_address_space() is updating pgtable->[root/mode], fetch_pte() reads wrong page table level (pgtable->mode). It does compare the value with level encoded in page table and returns NULL. This will result is iommu_unmap ops to fail and upper layer may retry/log WARN_ON.
CPU 0 CPU 1 ------ ------ map pages unmap pages alloc_pte() -> increase_address_space() iommu_v1_unmap_pages() -> fetch_pte() pgtable->root = pte (new root value) READ pgtable->[mode/root] Reads new root, old mode Updates mode (pgtable->mode += 1)
Since Page table level updates are infrequent and already synchronized with a spinlock, implement seqcount to enable lock-free read operations on the read path.
Fixes: 754265bcab7 ("iommu/amd: Fix race in increase_address_space()") Reported-by: Alejandro Jimenez alejandro.j.jimenez@oracle.com Cc: stable@vger.kernel.org Cc: Joao Martins joao.m.martins@oracle.com Cc: Suravee Suthikulpanit suravee.suthikulpanit@amd.com Signed-off-by: Vasant Hegde vasant.hegde@amd.com Signed-off-by: Joerg Roedel joerg.roedel@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iommu/amd/amd_iommu_types.h | 1 + drivers/iommu/amd/io_pgtable.c | 25 +++++++++++++++++++++---- 2 files changed, 22 insertions(+), 4 deletions(-)
--- a/drivers/iommu/amd/amd_iommu_types.h +++ b/drivers/iommu/amd/amd_iommu_types.h @@ -551,6 +551,7 @@ struct gcr3_tbl_info { };
struct amd_io_pgtable { + seqcount_t seqcount; /* Protects root/mode update */ struct io_pgtable pgtbl; int mode; u64 *root; --- a/drivers/iommu/amd/io_pgtable.c +++ b/drivers/iommu/amd/io_pgtable.c @@ -17,6 +17,7 @@ #include <linux/slab.h> #include <linux/types.h> #include <linux/dma-mapping.h> +#include <linux/seqlock.h>
#include <asm/barrier.h>
@@ -130,8 +131,11 @@ static bool increase_address_space(struc
*pte = PM_LEVEL_PDE(pgtable->mode, iommu_virt_to_phys(pgtable->root));
+ write_seqcount_begin(&pgtable->seqcount); pgtable->root = pte; pgtable->mode += 1; + write_seqcount_end(&pgtable->seqcount); + amd_iommu_update_and_flush_device_table(domain);
pte = NULL; @@ -153,6 +157,7 @@ static u64 *alloc_pte(struct amd_io_pgta { unsigned long last_addr = address + (page_size - 1); struct io_pgtable_cfg *cfg = &pgtable->pgtbl.cfg; + unsigned int seqcount; int level, end_lvl; u64 *pte, *page;
@@ -170,8 +175,14 @@ static u64 *alloc_pte(struct amd_io_pgta }
- level = pgtable->mode - 1; - pte = &pgtable->root[PM_LEVEL_INDEX(level, address)]; + do { + seqcount = read_seqcount_begin(&pgtable->seqcount); + + level = pgtable->mode - 1; + pte = &pgtable->root[PM_LEVEL_INDEX(level, address)]; + } while (read_seqcount_retry(&pgtable->seqcount, seqcount)); + + address = PAGE_SIZE_ALIGN(address, page_size); end_lvl = PAGE_SIZE_LEVEL(page_size);
@@ -249,6 +260,7 @@ static u64 *fetch_pte(struct amd_io_pgta unsigned long *page_size) { int level; + unsigned int seqcount; u64 *pte;
*page_size = 0; @@ -256,8 +268,12 @@ static u64 *fetch_pte(struct amd_io_pgta if (address > PM_LEVEL_SIZE(pgtable->mode)) return NULL;
- level = pgtable->mode - 1; - pte = &pgtable->root[PM_LEVEL_INDEX(level, address)]; + do { + seqcount = read_seqcount_begin(&pgtable->seqcount); + level = pgtable->mode - 1; + pte = &pgtable->root[PM_LEVEL_INDEX(level, address)]; + } while (read_seqcount_retry(&pgtable->seqcount, seqcount)); + *page_size = PTE_LEVEL_PAGE_SIZE(level);
while (level > 0) { @@ -541,6 +557,7 @@ static struct io_pgtable *v1_alloc_pgtab if (!pgtable->root) return NULL; pgtable->mode = PAGE_MODE_3_LEVEL; + seqcount_init(&pgtable->seqcount);
cfg->pgsize_bitmap = amd_iommu_pgsize_bitmap; cfg->ias = IOMMU_IN_ADDR_BIT_SIZE;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Rosato mjrosato@linux.ibm.com
commit b3506e9bcc777ed6af2ab631c86a9990ed97b474 upstream.
zpci_get_iommu_ctrs() returns counter information to be reported as part of device statistics; these counters are stored as part of the s390_domain. The problem, however, is that the identity domain is not backed by an s390_domain and so the conversion via to_s390_domain() yields a bad address that is zero'd initially and read on-demand later via a sysfs read. These counters aren't necessary for the identity domain; just return NULL in this case.
This issue was discovered via KASAN with reports that look like: BUG: KASAN: global-out-of-bounds in zpci_fmb_enable_device when using the identity domain for a device on s390.
Cc: stable@vger.kernel.org Fixes: 64af12c6ec3a ("iommu/s390: implement iommu passthrough via identity domain") Reported-by: Cam Miller cam@linux.ibm.com Signed-off-by: Matthew Rosato mjrosato@linux.ibm.com Tested-by: Cam Miller cam@linux.ibm.com Reviewed-by: Farhan Ali alifm@linux.ibm.com Reviewed-by: Niklas Schnelle schnelle@linux.ibm.com Link: https://lore.kernel.org/r/20250827210828.274527-1-mjrosato@linux.ibm.com Signed-off-by: Joerg Roedel joerg.roedel@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iommu/s390-iommu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/iommu/s390-iommu.c +++ b/drivers/iommu/s390-iommu.c @@ -1031,7 +1031,8 @@ struct zpci_iommu_ctrs *zpci_get_iommu_c
lockdep_assert_held(&zdev->dom_lock);
- if (zdev->s390_domain->type == IOMMU_DOMAIN_BLOCKED) + if (zdev->s390_domain->type == IOMMU_DOMAIN_BLOCKED || + zdev->s390_domain->type == IOMMU_DOMAIN_IDENTITY) return NULL;
s390_domain = to_s390_domain(zdev->s390_domain);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Niklas Schnelle schnelle@linux.ibm.com
commit 9ffaf5229055fcfbb3b3d6f1c7e58d63715c3f73 upstream.
When a PCI device is removed with surprise hotplug, there may still be attempts to attach the device to the default domain as part of tear down via (__iommu_release_dma_ownership()), or because the removal happens during probe (__iommu_probe_device()). In both cases zpci_register_ioat() fails with a cc value indicating that the device handle is invalid. This is because the device is no longer part of the instance as far as the hypervisor is concerned.
Currently this leads to an error return and s390_iommu_attach_device() fails. This triggers the WARN_ON() in __iommu_group_set_domain_nofail() because attaching to the default domain must never fail.
With the device fenced by the hypervisor no DMAs to or from memory are possible and the IOMMU translations have no effect. Proceed as if the registration was successful and let the hotplug event handling clean up the device.
This is similar to how devices in the error state are handled since commit 59bbf596791b ("iommu/s390: Make attach succeed even if the device is in error state") except that for removal the domain will not be registered later. This approach was also previously discussed at the link.
Handle both cases, error state and removal, in a helper which checks if the error needs to be propagated or ignored. Avoid magic number condition codes by using the pre-existing, but never used, defines for PCI load/store condition codes and rename them to reflect that they apply to all PCI instructions.
Cc: stable@vger.kernel.org # v6.2 Link: https://lore.kernel.org/linux-iommu/20240808194155.GD1985367@ziepe.ca/ Suggested-by: Jason Gunthorpe jgg@ziepe.ca Signed-off-by: Niklas Schnelle schnelle@linux.ibm.com Reviewed-by: Matthew Rosato mjrosato@linux.ibm.com Reviewed-by: Benjamin Block bblock@linux.ibm.com Link: https://lore.kernel.org/r/20250904-iommu_succeed_attach_removed-v1-1-e7f333d... Signed-off-by: Joerg Roedel joerg.roedel@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/include/asm/pci_insn.h | 10 +++++----- drivers/iommu/s390-iommu.c | 26 +++++++++++++++++++------- 2 files changed, 24 insertions(+), 12 deletions(-)
--- a/arch/s390/include/asm/pci_insn.h +++ b/arch/s390/include/asm/pci_insn.h @@ -16,11 +16,11 @@ #define ZPCI_PCI_ST_FUNC_NOT_AVAIL 40 #define ZPCI_PCI_ST_ALREADY_IN_RQ_STATE 44
-/* Load/Store return codes */ -#define ZPCI_PCI_LS_OK 0 -#define ZPCI_PCI_LS_ERR 1 -#define ZPCI_PCI_LS_BUSY 2 -#define ZPCI_PCI_LS_INVAL_HANDLE 3 +/* PCI instruction condition codes */ +#define ZPCI_CC_OK 0 +#define ZPCI_CC_ERR 1 +#define ZPCI_CC_BUSY 2 +#define ZPCI_CC_INVAL_HANDLE 3
/* Load/Store address space identifiers */ #define ZPCI_PCIAS_MEMIO_0 0 --- a/drivers/iommu/s390-iommu.c +++ b/drivers/iommu/s390-iommu.c @@ -611,6 +611,23 @@ static u64 get_iota_region_flag(struct s } }
+static bool reg_ioat_propagate_error(int cc, u8 status) +{ + /* + * If the device is in the error state the reset routine + * will register the IOAT of the newly set domain on re-enable + */ + if (cc == ZPCI_CC_ERR && status == ZPCI_PCI_ST_FUNC_NOT_AVAIL) + return false; + /* + * If the device was removed treat registration as success + * and let the subsequent error event trigger tear down. + */ + if (cc == ZPCI_CC_INVAL_HANDLE) + return false; + return cc != ZPCI_CC_OK; +} + static int s390_iommu_domain_reg_ioat(struct zpci_dev *zdev, struct iommu_domain *domain, u8 *status) { @@ -695,7 +712,7 @@ static int s390_iommu_attach_device(stru
/* If we fail now DMA remains blocked via blocking domain */ cc = s390_iommu_domain_reg_ioat(zdev, domain, &status); - if (cc && status != ZPCI_PCI_ST_FUNC_NOT_AVAIL) + if (reg_ioat_propagate_error(cc, status)) return -EIO; zdev->dma_table = s390_domain->dma_table; zdev_s390_domain_update(zdev, domain); @@ -1123,12 +1140,7 @@ static int s390_attach_dev_identity(stru
/* If we fail now DMA remains blocked via blocking domain */ cc = s390_iommu_domain_reg_ioat(zdev, domain, &status); - - /* - * If the device is undergoing error recovery the reset code - * will re-establish the new domain. - */ - if (cc && status != ZPCI_PCI_ST_FUNC_NOT_AVAIL) + if (reg_ioat_propagate_error(cc, status)) return -EIO;
zdev_s390_domain_update(zdev, domain);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
commit 96fa515e70f3e4b98685ef8cac9d737fc62f10e1 upstream.
[BUG] Inside check_inode_ref(), we need to make sure every structure, including the btrfs_inode_extref header, is covered by the item. But our code is incorrectly using "sizeof(iref)", where @iref is just a pointer.
This means "sizeof(iref)" will always be "sizeof(void *)", which is much smaller than "sizeof(struct btrfs_inode_extref)".
This will allow some bad inode extrefs to sneak in, defeating tree-checker.
[FIX] Fix the typo by calling "sizeof(*iref)", which is the same as "sizeof(struct btrfs_inode_extref)", and will be the correct behavior we want.
Fixes: 71bf92a9b877 ("btrfs: tree-checker: Add check for INODE_REF") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/tree-checker.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -1756,10 +1756,10 @@ static int check_inode_ref(struct extent while (ptr < end) { u16 namelen;
- if (unlikely(ptr + sizeof(iref) > end)) { + if (unlikely(ptr + sizeof(*iref) > end)) { inode_ref_err(leaf, slot, "inode ref overflow, ptr %lu end %lu inode_ref_size %zu", - ptr, end, sizeof(iref)); + ptr, end, sizeof(*iref)); return -EUCLEAN; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Strozek mstrozek@opensource.cirrus.com
commit 28edfaa10ca1b370b1a27fde632000d35c43402c upstream.
Certain systems have CS42L43 DisCo that claims to conform to version 0.6.28 but uses the function types from the 1.0 spec. Add a quirk as a workaround.
Closes: https://github.com/thesofproject/linux/issues/5515 Cc: stable@vger.kernel.org Signed-off-by: Maciej Strozek mstrozek@opensource.cirrus.com Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.dev Link: https://patch.msgid.link/20250901151518.3197941-1-mstrozek@opensource.cirrus... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/sound/sdca.h | 1 + sound/soc/sdca/sdca_device.c | 20 ++++++++++++++++++++ sound/soc/sdca/sdca_functions.c | 13 ++++++++----- 3 files changed, 29 insertions(+), 5 deletions(-)
--- a/include/sound/sdca.h +++ b/include/sound/sdca.h @@ -46,6 +46,7 @@ struct sdca_device_data {
enum sdca_quirk { SDCA_QUIRKS_RT712_VB, + SDCA_QUIRKS_SKIP_FUNC_TYPE_PATCHING, };
#if IS_ENABLED(CONFIG_ACPI) && IS_ENABLED(CONFIG_SND_SOC_SDCA) --- a/sound/soc/sdca/sdca_device.c +++ b/sound/soc/sdca/sdca_device.c @@ -7,6 +7,7 @@ */
#include <linux/acpi.h> +#include <linux/dmi.h> #include <linux/module.h> #include <linux/property.h> #include <linux/soundwire/sdw.h> @@ -55,11 +56,30 @@ static bool sdca_device_quirk_rt712_vb(s return false; }
+static bool sdca_device_quirk_skip_func_type_patching(struct sdw_slave *slave) +{ + const char *vendor, *sku; + + vendor = dmi_get_system_info(DMI_SYS_VENDOR); + sku = dmi_get_system_info(DMI_PRODUCT_SKU); + + if (vendor && sku && + !strcmp(vendor, "Dell Inc.") && + (!strcmp(sku, "0C62") || !strcmp(sku, "0C63") || !strcmp(sku, "0C6B")) && + slave->sdca_data.interface_revision == 0x061c && + slave->id.mfg_id == 0x01fa && slave->id.part_id == 0x4243) + return true; + + return false; +} + bool sdca_device_quirk_match(struct sdw_slave *slave, enum sdca_quirk quirk) { switch (quirk) { case SDCA_QUIRKS_RT712_VB: return sdca_device_quirk_rt712_vb(slave); + case SDCA_QUIRKS_SKIP_FUNC_TYPE_PATCHING: + return sdca_device_quirk_skip_func_type_patching(slave); default: break; } --- a/sound/soc/sdca/sdca_functions.c +++ b/sound/soc/sdca/sdca_functions.c @@ -89,6 +89,7 @@ static int find_sdca_function(struct acp { struct fwnode_handle *function_node = acpi_fwnode_handle(adev); struct sdca_device_data *sdca_data = data; + struct sdw_slave *slave = container_of(sdca_data, struct sdw_slave, sdca_data); struct device *dev = &adev->dev; struct fwnode_handle *control5; /* used to identify function type */ const char *function_name; @@ -136,11 +137,13 @@ static int find_sdca_function(struct acp return ret; }
- ret = patch_sdca_function_type(sdca_data->interface_revision, &function_type); - if (ret < 0) { - dev_err(dev, "SDCA version %#x invalid function type %d\n", - sdca_data->interface_revision, function_type); - return ret; + if (!sdca_device_quirk_match(slave, SDCA_QUIRKS_SKIP_FUNC_TYPE_PATCHING)) { + ret = patch_sdca_function_type(sdca_data->interface_revision, &function_type); + if (ret < 0) { + dev_err(dev, "SDCA version %#x invalid function type %d\n", + sdca_data->interface_revision, function_type); + return ret; + } }
function_name = get_sdca_function_name(function_type);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mohammad Rafi Shaik mohammad.rafi.shaik@oss.qualcomm.com
commit 5f1af203ef964e7f7bf9d32716dfa5f332cc6f09 upstream.
Fix missing lpaif_type configuration for the I2S interface. The proper lpaif interface type required to allow DSP to vote appropriate clock setting for I2S interface.
Fixes: 25ab80db6b133 ("ASoC: qdsp6: audioreach: add module configuration command helpers") Cc: stable@vger.kernel.org Reviewed-by: Srinivas Kandagatla srinivas.kandagatla@oss.qualcomm.com Signed-off-by: Mohammad Rafi Shaik mohammad.rafi.shaik@oss.qualcomm.com Message-ID: 20250908053631.70978-2-mohammad.rafi.shaik@oss.qualcomm.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/qcom/qdsp6/audioreach.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/soc/qcom/qdsp6/audioreach.c +++ b/sound/soc/qcom/qdsp6/audioreach.c @@ -971,6 +971,7 @@ static int audioreach_i2s_set_media_form param_data->param_id = PARAM_ID_I2S_INTF_CFG; param_data->param_size = ic_sz - APM_MODULE_PARAM_DATA_SIZE;
+ intf_cfg->cfg.lpaif_type = module->hw_interface_type; intf_cfg->cfg.intf_idx = module->hw_interface_idx; intf_cfg->cfg.sd_line_idx = module->sd_line_idx;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit 68f27f7c7708183e7873c585ded2f1b057ac5b97 upstream.
If earlier opening of source graph fails (e.g. ADSP rejects due to incorrect audioreach topology), the graph is closed and "dai_data->graph[dai->id]" is assigned NULL. Preparing the DAI for sink graph continues though and next call to q6apm_lpass_dai_prepare() receives dai_data->graph[dai->id]=NULL leading to NULL pointer exception:
qcom-apm gprsvc:service:2:1: Error (1) Processing 0x01001002 cmd qcom-apm gprsvc:service:2:1: DSP returned error[1001002] 1 q6apm-lpass-dais 30000000.remoteproc:glink-edge:gpr:service@1:bedais: fail to start APM port 78 q6apm-lpass-dais 30000000.remoteproc:glink-edge:gpr:service@1:bedais: ASoC: error at snd_soc_pcm_dai_prepare on TX_CODEC_DMA_TX_3: -22 Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a8 ... Call trace: q6apm_graph_media_format_pcm+0x48/0x120 (P) q6apm_lpass_dai_prepare+0x110/0x1b4 snd_soc_pcm_dai_prepare+0x74/0x108 __soc_pcm_prepare+0x44/0x160 dpcm_be_dai_prepare+0x124/0x1c0
Fixes: 30ad723b93ad ("ASoC: qdsp6: audioreach: add q6apm lpass dai support") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Reviewed-by: Srinivas Kandagatla srinivas.kandagatla@oss.qualcomm.com Message-ID: 20250904101849.121503-2-krzysztof.kozlowski@linaro.org Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/qcom/qdsp6/q6apm-lpass-dais.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/sound/soc/qcom/qdsp6/q6apm-lpass-dais.c +++ b/sound/soc/qcom/qdsp6/q6apm-lpass-dais.c @@ -213,8 +213,10 @@ static int q6apm_lpass_dai_prepare(struc
return 0; err: - q6apm_graph_close(dai_data->graph[dai->id]); - dai_data->graph[dai->id] = NULL; + if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) { + q6apm_graph_close(dai_data->graph[dai->id]); + dai_data->graph[dai->id] = NULL; + } return rc; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mohammad Rafi Shaik mohammad.rafi.shaik@oss.qualcomm.com
commit 33b55b94bca904ca25a9585e3cd43d15f0467969 upstream.
The q6i2s_set_fmt() function was defined but never linked into the I2S DAI operations, resulting DAI format settings is being ignored during stream setup. This change fixes the issue by properly linking the .set_fmt handler within the DAI ops.
Fixes: 30ad723b93ade ("ASoC: qdsp6: audioreach: add q6apm lpass dai support") Cc: stable@vger.kernel.org Reviewed-by: Srinivas Kandagatla srinivas.kandagatla@oss.qualcomm.com Signed-off-by: Mohammad Rafi Shaik mohammad.rafi.shaik@oss.qualcomm.com Message-ID: 20250908053631.70978-3-mohammad.rafi.shaik@oss.qualcomm.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/qcom/qdsp6/q6apm-lpass-dais.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/soc/qcom/qdsp6/q6apm-lpass-dais.c +++ b/sound/soc/qcom/qdsp6/q6apm-lpass-dais.c @@ -262,6 +262,7 @@ static const struct snd_soc_dai_ops q6i2 .shutdown = q6apm_lpass_dai_shutdown, .set_channel_map = q6dma_set_channel_map, .hw_params = q6dma_hw_params, + .set_fmt = q6i2s_set_fmt, };
static const struct snd_soc_dai_ops q6hdmi_ops = {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Fourier fourier.thomas@gmail.com
commit 8ab2f1c35669bff7d7ed1bb16bf5cc989b3e2e17 upstream.
The dma_unmap_sg() functions should be called with the same nents as the dma_map_sg(), not the value the map function returned.
Fixes: 236caa7cc351 ("mmc: SDIO driver for Marvell SoCs") Signed-off-by: Thomas Fourier fourier.thomas@gmail.com Reviewed-by: Linus Walleij linus.walleij@linaro.org Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/mvsdio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/mmc/host/mvsdio.c +++ b/drivers/mmc/host/mvsdio.c @@ -292,7 +292,7 @@ static u32 mvsd_finish_data(struct mvsd_ host->pio_ptr = NULL; host->pio_size = 0; } else { - dma_unmap_sg(mmc_dev(host->mmc), data->sg, host->sg_frags, + dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len, mmc_get_dma_dir(data)); }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ben Chuang ben.chuang@genesyslogic.com.tw
commit 7b7e71683b4ccbe0dbd7d434707623327e852f20 upstream.
The sdhci_set_clock() is called in sdhci_set_ios_common() and __sdhci_uhs2_set_ios(). According to Section 3.13.2 "Card Interface Detection Sequence" of the SD Host Controller Standard Specification Version 7.00, the SD clock is supplied after power is supplied, so we only need one in __sdhci_uhs2_set_ios(). Let's move the code related to setting the clock from sdhci_set_ios_common() into sdhci_set_ios() and modify the parameters passed to sdhci_set_clock() in __sdhci_uhs2_set_ios().
Fixes: 10c8298a052b ("mmc: sdhci-uhs2: add set_ios()") Cc: stable@vger.kernel.org # v6.13+ Signed-off-by: Ben Chuang ben.chuang@genesyslogic.com.tw Acked-by: Adrian Hunter adrian.hunter@intel.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/sdhci-uhs2.c | 3 ++- drivers/mmc/host/sdhci.c | 34 +++++++++++++++++----------------- 2 files changed, 19 insertions(+), 18 deletions(-)
--- a/drivers/mmc/host/sdhci-uhs2.c +++ b/drivers/mmc/host/sdhci-uhs2.c @@ -295,7 +295,8 @@ static void __sdhci_uhs2_set_ios(struct else sdhci_uhs2_set_power(host, ios->power_mode, ios->vdd);
- sdhci_set_clock(host, host->clock); + sdhci_set_clock(host, ios->clock); + host->clock = ios->clock; }
static int sdhci_uhs2_set_ios(struct mmc_host *mmc, struct mmc_ios *ios) --- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -2367,23 +2367,6 @@ void sdhci_set_ios_common(struct mmc_hos (ios->power_mode == MMC_POWER_UP) && !(host->quirks2 & SDHCI_QUIRK2_PRESET_VALUE_BROKEN)) sdhci_enable_preset_value(host, false); - - if (!ios->clock || ios->clock != host->clock) { - host->ops->set_clock(host, ios->clock); - host->clock = ios->clock; - - if (host->quirks & SDHCI_QUIRK_DATA_TIMEOUT_USES_SDCLK && - host->clock) { - host->timeout_clk = mmc->actual_clock ? - mmc->actual_clock / 1000 : - host->clock / 1000; - mmc->max_busy_timeout = - host->ops->get_max_timeout_count ? - host->ops->get_max_timeout_count(host) : - 1 << 27; - mmc->max_busy_timeout /= host->timeout_clk; - } - } } EXPORT_SYMBOL_GPL(sdhci_set_ios_common);
@@ -2410,6 +2393,23 @@ void sdhci_set_ios(struct mmc_host *mmc,
sdhci_set_ios_common(mmc, ios);
+ if (!ios->clock || ios->clock != host->clock) { + host->ops->set_clock(host, ios->clock); + host->clock = ios->clock; + + if (host->quirks & SDHCI_QUIRK_DATA_TIMEOUT_USES_SDCLK && + host->clock) { + host->timeout_clk = mmc->actual_clock ? + mmc->actual_clock / 1000 : + host->clock / 1000; + mmc->max_busy_timeout = + host->ops->get_max_timeout_count ? + host->ops->get_max_timeout_count(host) : + 1 << 27; + mmc->max_busy_timeout /= host->timeout_clk; + } + } + if (host->ops->set_power) host->ops->set_power(host, ios->power_mode, ios->vdd); else
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ben Chuang ben.chuang@genesyslogic.com.tw
commit 77a436c93d10d68201bfd4941d1ca3230dfd1f40 upstream.
According to the power structure of IC hardware design for UHS-II interface, reset control and timing must be added to the initialization process of powering on the UHS-II interface.
Fixes: 27dd3b82557a ("mmc: sdhci-pci-gli: enable UHS-II mode for GL9767") Cc: stable@vger.kernel.org # v6.13+ Signed-off-by: Ben Chuang ben.chuang@genesyslogic.com.tw Acked-by: Adrian Hunter adrian.hunter@intel.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/sdhci-pci-gli.c | 68 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 67 insertions(+), 1 deletion(-)
--- a/drivers/mmc/host/sdhci-pci-gli.c +++ b/drivers/mmc/host/sdhci-pci-gli.c @@ -283,6 +283,8 @@ #define PCIE_GLI_9767_UHS2_CTL2_ZC_VALUE 0xb #define PCIE_GLI_9767_UHS2_CTL2_ZC_CTL BIT(6) #define PCIE_GLI_9767_UHS2_CTL2_ZC_CTL_VALUE 0x1 +#define PCIE_GLI_9767_UHS2_CTL2_FORCE_PHY_RESETN BIT(13) +#define PCIE_GLI_9767_UHS2_CTL2_FORCE_RESETN_VALUE BIT(14)
#define GLI_MAX_TUNING_LOOP 40
@@ -1179,6 +1181,65 @@ static void gl9767_set_low_power_negotia gl9767_vhs_read(pdev); }
+static void sdhci_gl9767_uhs2_phy_reset(struct sdhci_host *host, bool assert) +{ + struct sdhci_pci_slot *slot = sdhci_priv(host); + struct pci_dev *pdev = slot->chip->pdev; + u32 value, set, clr; + + if (assert) { + /* Assert reset, set RESETN and clean RESETN_VALUE */ + set = PCIE_GLI_9767_UHS2_CTL2_FORCE_PHY_RESETN; + clr = PCIE_GLI_9767_UHS2_CTL2_FORCE_RESETN_VALUE; + } else { + /* De-assert reset, clean RESETN and set RESETN_VALUE */ + set = PCIE_GLI_9767_UHS2_CTL2_FORCE_RESETN_VALUE; + clr = PCIE_GLI_9767_UHS2_CTL2_FORCE_PHY_RESETN; + } + + gl9767_vhs_write(pdev); + pci_read_config_dword(pdev, PCIE_GLI_9767_UHS2_CTL2, &value); + value |= set; + pci_write_config_dword(pdev, PCIE_GLI_9767_UHS2_CTL2, value); + value &= ~clr; + pci_write_config_dword(pdev, PCIE_GLI_9767_UHS2_CTL2, value); + gl9767_vhs_read(pdev); +} + +static void __gl9767_uhs2_set_power(struct sdhci_host *host, unsigned char mode, unsigned short vdd) +{ + u8 pwr = 0; + + if (mode != MMC_POWER_OFF) { + pwr = sdhci_get_vdd_value(vdd); + if (!pwr) + WARN(1, "%s: Invalid vdd %#x\n", + mmc_hostname(host->mmc), vdd); + pwr |= SDHCI_VDD2_POWER_180; + } + + if (host->pwr == pwr) + return; + + host->pwr = pwr; + + if (pwr == 0) { + sdhci_writeb(host, 0, SDHCI_POWER_CONTROL); + } else { + sdhci_writeb(host, 0, SDHCI_POWER_CONTROL); + + pwr |= SDHCI_POWER_ON; + sdhci_writeb(host, pwr & 0xf, SDHCI_POWER_CONTROL); + usleep_range(5000, 6250); + + /* Assert reset */ + sdhci_gl9767_uhs2_phy_reset(host, true); + pwr |= SDHCI_VDD2_POWER_ON; + sdhci_writeb(host, pwr, SDHCI_POWER_CONTROL); + usleep_range(5000, 6250); + } +} + static void sdhci_gl9767_set_clock(struct sdhci_host *host, unsigned int clock) { struct sdhci_pci_slot *slot = sdhci_priv(host); @@ -1205,6 +1266,11 @@ static void sdhci_gl9767_set_clock(struc }
sdhci_enable_clk(host, clk); + + if (mmc_card_uhs2(host->mmc)) + /* De-assert reset */ + sdhci_gl9767_uhs2_phy_reset(host, false); + gl9767_set_low_power_negotiation(pdev, true); }
@@ -1476,7 +1542,7 @@ static void sdhci_gl9767_set_power(struc gl9767_vhs_read(pdev);
sdhci_gli_overcurrent_event_enable(host, false); - sdhci_uhs2_set_power(host, mode, vdd); + __gl9767_uhs2_set_power(host, mode, vdd); sdhci_gli_overcurrent_event_enable(host, true); } else { gl9767_vhs_write(pdev);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ben Chuang ben.chuang@genesyslogic.com.tw
commit 09c2b628f6403ad467fc73326a50020590603871 upstream.
Fix calling incorrect sdhci_set_clock() in __sdhci_uhs2_set_ios() when the vendor defines its own sdhci_set_clock().
Fixes: 10c8298a052b ("mmc: sdhci-uhs2: add set_ios()") Cc: stable@vger.kernel.org # v6.13+ Signed-off-by: Ben Chuang ben.chuang@genesyslogic.com.tw Acked-by: Adrian Hunter adrian.hunter@intel.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/sdhci-uhs2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/mmc/host/sdhci-uhs2.c +++ b/drivers/mmc/host/sdhci-uhs2.c @@ -295,7 +295,7 @@ static void __sdhci_uhs2_set_ios(struct else sdhci_uhs2_set_power(host, ios->power_mode, ios->vdd);
- sdhci_set_clock(host, ios->clock); + host->ops->set_clock(host, ios->clock); host->clock = ios->clock; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Lendacky thomas.lendacky@amd.com
commit 7f830e126dc357fc086905ce9730140fd4528d66 upstream.
The sev_evict_cache() is guest-related code and should be guarded by CONFIG_AMD_MEM_ENCRYPT, not CONFIG_KVM_AMD_SEV.
CONFIG_AMD_MEM_ENCRYPT=y is required for a guest to run properly as an SEV-SNP guest, but a guest kernel built with CONFIG_KVM_AMD_SEV=n would get the stub function of sev_evict_cache() instead of the version that performs the actual eviction. Move the function declarations under the appropriate #ifdef.
Fixes: 7b306dfa326f ("x86/sev: Evict cache lines during SNP memory validation") Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Cc: stable@kernel.org # 6.16.x Link: https://lore.kernel.org/r/70e38f2c4a549063de54052c9f64929705313526.175770895... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/sev.h | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-)
--- a/arch/x86/include/asm/sev.h +++ b/arch/x86/include/asm/sev.h @@ -564,6 +564,24 @@ enum es_result sev_es_ghcb_hv_call(struc
extern struct ghcb *boot_ghcb;
+static inline void sev_evict_cache(void *va, int npages) +{ + volatile u8 val __always_unused; + u8 *bytes = va; + int page_idx; + + /* + * For SEV guests, a read from the first/last cache-lines of a 4K page + * using the guest key is sufficient to cause a flush of all cache-lines + * associated with that 4K page without incurring all the overhead of a + * full CLFLUSH sequence. + */ + for (page_idx = 0; page_idx < npages; page_idx++) { + val = bytes[page_idx * PAGE_SIZE]; + val = bytes[page_idx * PAGE_SIZE + PAGE_SIZE - 1]; + } +} + #else /* !CONFIG_AMD_MEM_ENCRYPT */
#define snp_vmpl 0 @@ -607,6 +625,7 @@ static inline int snp_send_guest_request static inline int snp_svsm_vtpm_send_command(u8 *buffer) { return -ENODEV; } static inline void __init snp_secure_tsc_prepare(void) { } static inline void __init snp_secure_tsc_init(void) { } +static inline void sev_evict_cache(void *va, int npages) {}
#endif /* CONFIG_AMD_MEM_ENCRYPT */
@@ -621,24 +640,6 @@ int rmp_make_shared(u64 pfn, enum pg_lev void snp_leak_pages(u64 pfn, unsigned int npages); void kdump_sev_callback(void); void snp_fixup_e820_tables(void); - -static inline void sev_evict_cache(void *va, int npages) -{ - volatile u8 val __always_unused; - u8 *bytes = va; - int page_idx; - - /* - * For SEV guests, a read from the first/last cache-lines of a 4K page - * using the guest key is sufficient to cause a flush of all cache-lines - * associated with that 4K page without incurring all the overhead of a - * full CLFLUSH sequence. - */ - for (page_idx = 0; page_idx < npages; page_idx++) { - val = bytes[page_idx * PAGE_SIZE]; - val = bytes[page_idx * PAGE_SIZE + PAGE_SIZE - 1]; - } -} #else static inline bool snp_probe_rmptable_info(void) { return false; } static inline int snp_rmptable_init(void) { return -ENOSYS; } @@ -654,7 +655,6 @@ static inline int rmp_make_shared(u64 pf static inline void snp_leak_pages(u64 pfn, unsigned int npages) {} static inline void kdump_sev_callback(void) { } static inline void snp_fixup_e820_tables(void) {} -static inline void sev_evict_cache(void *va, int npages) {} #endif
#endif
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej S. Szmigiero maciej.szmigiero@oracle.com
commit d02e48830e3fce9701265f6c5a58d9bdaf906a76 upstream.
Commit 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC") inhibited pre-VMRUN sync of TPR from LAPIC into VMCB::V_TPR in sync_lapic_to_cr8() when AVIC is active.
AVIC does automatically sync between these two fields, however it does so only on explicit guest writes to one of these fields, not on a bare VMRUN.
This meant that when AVIC is enabled host changes to TPR in the LAPIC state might not get automatically copied into the V_TPR field of VMCB.
This is especially true when it is the userspace setting LAPIC state via KVM_SET_LAPIC ioctl() since userspace does not have access to the guest VMCB.
Practice shows that it is the V_TPR that is actually used by the AVIC to decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI), so any leftover value in V_TPR will cause serious interrupt delivery issues in the guest when AVIC is enabled.
Fix this issue by doing pre-VMRUN TPR sync from LAPIC into VMCB::V_TPR even when AVIC is enabled.
Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC") Cc: stable@vger.kernel.org Signed-off-by: Maciej S. Szmigiero maciej.szmigiero@oracle.com Reviewed-by: Naveen N Rao (AMD) naveen@kernel.org Link: https://lore.kernel.org/r/c231be64280b1461e854e1ce3595d70cde3a2e9d.175613967... [sean: tag for stable@] Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/svm.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4204,8 +4204,7 @@ static inline void sync_lapic_to_cr8(str struct vcpu_svm *svm = to_svm(vcpu); u64 cr8;
- if (nested_svm_virtualize_tpr(vcpu) || - kvm_vcpu_apicv_active(vcpu)) + if (nested_svm_virtualize_tpr(vcpu)) return;
cr8 = kvm_get_cr8(vcpu);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
commit 2ade36eaa9ac05e4913e9785df19c2cde8f912fb upstream.
When in S0i3, the GFX state is retained, so all we need to do is stop the runlist so GFX can enter gfxoff.
Reviewed-by: Mario Limonciello (AMD) superm1@kernel.org Tested-by: David Perry david.perry@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 4bfa8609934dbf39bbe6e75b4f971469384b50b1) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 16 +++++++++--- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 12 +++++++++ drivers/gpu/drm/amd/amdkfd/kfd_device.c | 36 +++++++++++++++++++++++++++++ 3 files changed, 60 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -250,16 +250,24 @@ void amdgpu_amdkfd_interrupt(struct amdg
void amdgpu_amdkfd_suspend(struct amdgpu_device *adev, bool suspend_proc) { - if (adev->kfd.dev) - kgd2kfd_suspend(adev->kfd.dev, suspend_proc); + if (adev->kfd.dev) { + if (adev->in_s0ix) + kgd2kfd_stop_sched_all_nodes(adev->kfd.dev); + else + kgd2kfd_suspend(adev->kfd.dev, suspend_proc); + } }
int amdgpu_amdkfd_resume(struct amdgpu_device *adev, bool resume_proc) { int r = 0;
- if (adev->kfd.dev) - r = kgd2kfd_resume(adev->kfd.dev, resume_proc); + if (adev->kfd.dev) { + if (adev->in_s0ix) + r = kgd2kfd_start_sched_all_nodes(adev->kfd.dev); + else + r = kgd2kfd_resume(adev->kfd.dev, resume_proc); + }
return r; } --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -426,7 +426,9 @@ void kgd2kfd_smi_event_throttle(struct k int kgd2kfd_check_and_lock_kfd(void); void kgd2kfd_unlock_kfd(void); int kgd2kfd_start_sched(struct kfd_dev *kfd, uint32_t node_id); +int kgd2kfd_start_sched_all_nodes(struct kfd_dev *kfd); int kgd2kfd_stop_sched(struct kfd_dev *kfd, uint32_t node_id); +int kgd2kfd_stop_sched_all_nodes(struct kfd_dev *kfd); bool kgd2kfd_compute_active(struct kfd_dev *kfd, uint32_t node_id); bool kgd2kfd_vmfault_fast_path(struct amdgpu_device *adev, struct amdgpu_iv_entry *entry, bool retry_fault); @@ -516,10 +518,20 @@ static inline int kgd2kfd_start_sched(st return 0; }
+static inline int kgd2kfd_start_sched_all_nodes(struct kfd_dev *kfd) +{ + return 0; +} + static inline int kgd2kfd_stop_sched(struct kfd_dev *kfd, uint32_t node_id) { return 0; } + +static inline int kgd2kfd_stop_sched_all_nodes(struct kfd_dev *kfd) +{ + return 0; +}
static inline bool kgd2kfd_compute_active(struct kfd_dev *kfd, uint32_t node_id) { --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -1501,6 +1501,25 @@ int kgd2kfd_start_sched(struct kfd_dev * return ret; }
+int kgd2kfd_start_sched_all_nodes(struct kfd_dev *kfd) +{ + struct kfd_node *node; + int i, r; + + if (!kfd->init_complete) + return 0; + + for (i = 0; i < kfd->num_nodes; i++) { + node = kfd->nodes[i]; + r = node->dqm->ops.unhalt(node->dqm); + if (r) { + dev_err(kfd_device, "Error in starting scheduler\n"); + return r; + } + } + return 0; +} + int kgd2kfd_stop_sched(struct kfd_dev *kfd, uint32_t node_id) { struct kfd_node *node; @@ -1518,6 +1537,23 @@ int kgd2kfd_stop_sched(struct kfd_dev *k return node->dqm->ops.halt(node->dqm); }
+int kgd2kfd_stop_sched_all_nodes(struct kfd_dev *kfd) +{ + struct kfd_node *node; + int i, r; + + if (!kfd->init_complete) + return 0; + + for (i = 0; i < kfd->num_nodes; i++) { + node = kfd->nodes[i]; + r = node->dqm->ops.halt(node->dqm); + if (r) + return r; + } + return 0; +} + bool kgd2kfd_compute_active(struct kfd_dev *kfd, uint32_t node_id) { struct kfd_node *node;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
commit 9272bb34b066993f5f468b219b4a26ba3f2b25a1 upstream.
We need to make sure the user queues are preempted so GFX can enter gfxoff.
Reviewed-by: Mario Limonciello (AMD) superm1@kernel.org Tested-by: David Perry david.perry@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit f8b367e6fa1716cab7cc232b9e3dff29187fc99d) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5055,7 +5055,7 @@ int amdgpu_device_suspend(struct drm_dev adev->in_suspend = true;
if (amdgpu_sriov_vf(adev)) { - if (!adev->in_s0ix && !adev->in_runpm) + if (!adev->in_runpm) amdgpu_amdkfd_suspend_process(adev); amdgpu_virt_fini_data_exchange(adev); r = amdgpu_virt_request_full_gpu(adev, false); @@ -5075,10 +5075,8 @@ int amdgpu_device_suspend(struct drm_dev
amdgpu_device_ip_suspend_phase1(adev);
- if (!adev->in_s0ix) { - amdgpu_amdkfd_suspend(adev, !amdgpu_sriov_vf(adev) && !adev->in_runpm); - amdgpu_userq_suspend(adev); - } + amdgpu_amdkfd_suspend(adev, !amdgpu_sriov_vf(adev) && !adev->in_runpm); + amdgpu_userq_suspend(adev);
r = amdgpu_device_evict_resources(adev); if (r) @@ -5141,15 +5139,13 @@ int amdgpu_device_resume(struct drm_devi goto exit; }
- if (!adev->in_s0ix) { - r = amdgpu_amdkfd_resume(adev, !amdgpu_sriov_vf(adev) && !adev->in_runpm); - if (r) - goto exit; + r = amdgpu_amdkfd_resume(adev, !amdgpu_sriov_vf(adev) && !adev->in_runpm); + if (r) + goto exit;
- r = amdgpu_userq_resume(adev); - if (r) - goto exit; - } + r = amdgpu_userq_resume(adev); + if (r) + goto exit;
r = amdgpu_device_ip_late_init(adev); if (r) @@ -5162,7 +5158,7 @@ exit: amdgpu_virt_init_data_exchange(adev); amdgpu_virt_release_full_gpu(adev, true);
- if (!adev->in_s0ix && !r && !adev->in_runpm) + if (!r && !adev->in_runpm) r = amdgpu_amdkfd_resume_process(adev); }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Lipski ivan.lipski@amd.com
commit 29a2f430475357f760679b249f33e7282688e292 upstream.
[Why&How] As reported on https://gitlab.freedesktop.org/drm/amd/-/issues/3936, SMU hang can occur if the interrupts are not enabled appropriately, causing a vblank timeout.
This patch reverts commit 5009628d8509 ("drm/amd/display: Remove unnecessary amdgpu_irq_get/put"), but only for RX6xxx & RX7700 GPUs, on which the issue was observed.
This will re-enable interrupts regardless of whether the user space needed it or not.
Fixes: 5009628d8509 ("drm/amd/display: Remove unnecessary amdgpu_irq_get/put") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3936 Suggested-by: Sun peng Li sunpeng.li@amd.com Reviewed-by: Sun peng Li sunpeng.li@amd.com Signed-off-by: Ivan Lipski ivan.lipski@amd.com Signed-off-by: Ray Wu ray.wu@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 95d168b367aa28a59f94fc690ff76ebf69312c6d) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 39 +++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -8689,7 +8689,16 @@ static int amdgpu_dm_encoder_init(struct static void manage_dm_interrupts(struct amdgpu_device *adev, struct amdgpu_crtc *acrtc, struct dm_crtc_state *acrtc_state) -{ +{ /* + * We cannot be sure that the frontend index maps to the same + * backend index - some even map to more than one. + * So we have to go through the CRTC to find the right IRQ. + */ + int irq_type = amdgpu_display_crtc_idx_to_irq_type( + adev, + acrtc->crtc_id); + struct drm_device *dev = adev_to_drm(adev); + struct drm_vblank_crtc_config config = {0}; struct dc_crtc_timing *timing; int offdelay; @@ -8742,7 +8751,35 @@ static void manage_dm_interrupts(struct
drm_crtc_vblank_on_config(&acrtc->base, &config); + /* Allow RX6xxx, RX7700, RX7800 GPUs to call amdgpu_irq_get.*/ + switch (amdgpu_ip_version(adev, DCE_HWIP, 0)) { + case IP_VERSION(3, 0, 0): + case IP_VERSION(3, 0, 2): + case IP_VERSION(3, 0, 3): + case IP_VERSION(3, 2, 0): + if (amdgpu_irq_get(adev, &adev->pageflip_irq, irq_type)) + drm_err(dev, "DM_IRQ: Cannot get pageflip irq!\n"); +#if defined(CONFIG_DRM_AMD_SECURE_DISPLAY) + if (amdgpu_irq_get(adev, &adev->vline0_irq, irq_type)) + drm_err(dev, "DM_IRQ: Cannot get vline0 irq!\n"); +#endif + } + } else { + /* Allow RX6xxx, RX7700, RX7800 GPUs to call amdgpu_irq_put.*/ + switch (amdgpu_ip_version(adev, DCE_HWIP, 0)) { + case IP_VERSION(3, 0, 0): + case IP_VERSION(3, 0, 2): + case IP_VERSION(3, 0, 3): + case IP_VERSION(3, 2, 0): +#if defined(CONFIG_DRM_AMD_SECURE_DISPLAY) + if (amdgpu_irq_put(adev, &adev->vline0_irq, irq_type)) + drm_err(dev, "DM_IRQ: Cannot put vline0 irq!\n"); +#endif + if (amdgpu_irq_put(adev, &adev->pageflip_irq, irq_type)) + drm_err(dev, "DM_IRQ: Cannot put pageflip irq!\n"); + } + drm_crtc_vblank_off(&acrtc->base); } }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
commit f9b80514a7227c589291792cb6743b0ddf41c2bc upstream.
If OD is not enabled then restoring cached clock settings doesn't make sense and actually leads to errors in resume.
Check if enabled before restoring settings.
Fixes: 4e9526924d09 ("drm/amd: Restore cached manual clock settings during resume") Reported-by: Jérôme Lécuyer jerome.4a4c@gmail.com Closes: https://lore.kernel.org/amd-gfx/0ffe2692-7bfa-4821-856e-dd0f18e2c32b@amd.com... Suggested-by: Jérôme Lécuyer jerome.4a4c@gmail.com Acked-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 1a4dd33cc6e1baaa81efdbe68227a19f51c50f20) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -2185,7 +2185,7 @@ static int smu_resume(struct amdgpu_ip_b return ret; }
- if (smu_dpm_ctx->dpm_level == AMD_DPM_FORCED_LEVEL_MANUAL) { + if (smu_dpm_ctx->dpm_level == AMD_DPM_FORCED_LEVEL_MANUAL && smu->od_enabled) { ret = smu_od_edit_dpm_table(smu, PP_OD_COMMIT_DPM_TABLE, NULL, 0); if (ret) return ret;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Max Kellermann max.kellermann@ionos.com
commit cd4ea81be3eb94047ad023c631afd9bd6c295400 upstream.
Commit 88e6c42e40de ("io_uring/io-wq: add check free worker before create new worker") reused the variable `do_create` for something else, abusing it for the free worker check.
This caused the value to effectively always be `true` at the time `nr_workers < max_workers` was checked, but it should really be `false`. This means the `max_workers` setting was ignored, and worse: if the limit had already been reached, incrementing `nr_workers` was skipped even though another worker would be created.
When later lots of workers exit, the `nr_workers` field could easily underflow, making the problem worse because more and more workers would be created without incrementing `nr_workers`.
The simple solution is to use a different variable for the free worker check instead of using one variable for two different things.
Cc: stable@vger.kernel.org Fixes: 88e6c42e40de ("io_uring/io-wq: add check free worker before create new worker") Signed-off-by: Max Kellermann max.kellermann@ionos.com Reviewed-by: Fengnan Chang changfengnan@bytedance.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/io-wq.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c index 17dfaa0395c4..1d03b2fc4b25 100644 --- a/io_uring/io-wq.c +++ b/io_uring/io-wq.c @@ -352,16 +352,16 @@ static void create_worker_cb(struct callback_head *cb) struct io_wq *wq;
struct io_wq_acct *acct; - bool do_create = false; + bool activated_free_worker, do_create = false;
worker = container_of(cb, struct io_worker, create_work); wq = worker->wq; acct = worker->acct;
rcu_read_lock(); - do_create = !io_acct_activate_free_worker(acct); + activated_free_worker = io_acct_activate_free_worker(acct); rcu_read_unlock(); - if (!do_create) + if (activated_free_worker) goto no_need_create;
raw_spin_lock(&acct->workers_lock);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
commit 3539b1467e94336d5854ebf976d9627bfb65d6c3 upstream.
When running task_work for an exiting task, rather than perform the issue retry attempt, the task_work is canceled. However, this isn't done for a ring that has been closed. This can lead to requests being successfully completed post the ring being closed, which is somewhat confusing and surprising to an application.
Rather than just check the task exit state, also include the ring ref state in deciding whether or not to terminate a given request when run from task_work.
Cc: stable@vger.kernel.org # 6.1+ Link: https://github.com/axboe/liburing/discussions/1459 Reported-by: Benedek Thaler thaler@thaler.hu Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/io_uring.c | 6 ++++-- io_uring/io_uring.h | 4 ++-- io_uring/poll.c | 2 +- io_uring/timeout.c | 2 +- io_uring/uring_cmd.c | 2 +- 5 files changed, 9 insertions(+), 7 deletions(-)
--- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -1371,8 +1371,10 @@ static void io_req_task_cancel(struct io
void io_req_task_submit(struct io_kiocb *req, io_tw_token_t tw) { - io_tw_lock(req->ctx, tw); - if (unlikely(io_should_terminate_tw())) + struct io_ring_ctx *ctx = req->ctx; + + io_tw_lock(ctx, tw); + if (unlikely(io_should_terminate_tw(ctx))) io_req_defer_failed(req, -EFAULT); else if (req->flags & REQ_F_FORCE_ASYNC) io_queue_iowq(req); --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -470,9 +470,9 @@ static inline bool io_allowed_run_tw(str * 2) PF_KTHREAD is set, in which case the invoker of the task_work is * our fallback task_work. */ -static inline bool io_should_terminate_tw(void) +static inline bool io_should_terminate_tw(struct io_ring_ctx *ctx) { - return current->flags & (PF_KTHREAD | PF_EXITING); + return (current->flags & (PF_KTHREAD | PF_EXITING)) || percpu_ref_is_dying(&ctx->refs); }
static inline void io_req_queue_tw_complete(struct io_kiocb *req, s32 res) --- a/io_uring/poll.c +++ b/io_uring/poll.c @@ -224,7 +224,7 @@ static int io_poll_check_events(struct i { int v;
- if (unlikely(io_should_terminate_tw())) + if (unlikely(io_should_terminate_tw(req->ctx))) return -ECANCELED;
do { --- a/io_uring/timeout.c +++ b/io_uring/timeout.c @@ -324,7 +324,7 @@ static void io_req_task_link_timeout(str int ret;
if (prev) { - if (!io_should_terminate_tw()) { + if (!io_should_terminate_tw(req->ctx)) { struct io_cancel_data cd = { .ctx = req->ctx, .data = prev->cqe.user_data, --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -123,7 +123,7 @@ static void io_uring_cmd_work(struct io_ struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd); unsigned int flags = IO_URING_F_COMPLETE_DEFER;
- if (io_should_terminate_tw()) + if (io_should_terminate_tw(req->ctx)) flags |= IO_URING_F_TASK_DEAD;
/* task_work executor checks the deffered list completion */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hansg@kernel.org
commit b6f56a44e4c1014b08859dcf04ed246500e310e5 upstream.
Since commit 7d5e9737efda ("net: rfkill: gpio: get the name and type from device property") rfkill_find_type() gets called with the possibly uninitialized "const char *type_name;" local variable.
On x86 systems when rfkill-gpio binds to a "BCM4752" or "LNV4752" acpi_device, the rfkill->type is set based on the ACPI acpi_device_id:
rfkill->type = (unsigned)id->driver_data;
and there is no "type" property so device_property_read_string() will fail and leave type_name uninitialized, leading to a potential crash.
rfkill_find_type() does accept a NULL pointer, fix the potential crash by initializing type_name to NULL.
Note likely sofar this has not been caught because:
1. Not many x86 machines actually have a "BCM4752"/"LNV4752" acpi_device 2. The stack happened to contain NULL where type_name is stored
Fixes: 7d5e9737efda ("net: rfkill: gpio: get the name and type from device property") Cc: stable@vger.kernel.org Cc: Heikki Krogerus heikki.krogerus@linux.intel.com Signed-off-by: Hans de Goede hansg@kernel.org Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://patch.msgid.link/20250913113515.21698-1-hansg@kernel.org Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/rfkill/rfkill-gpio.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/rfkill/rfkill-gpio.c +++ b/net/rfkill/rfkill-gpio.c @@ -94,10 +94,10 @@ static const struct dmi_system_id rfkill static int rfkill_gpio_probe(struct platform_device *pdev) { struct rfkill_gpio_data *rfkill; - struct gpio_desc *gpio; + const char *type_name = NULL; const char *name_property; const char *type_property; - const char *type_name; + struct gpio_desc *gpio; int ret;
if (dmi_check_system(rfkill_gpio_deny_table))
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sébastien Szymanski sebastien.szymanski@armadeus.com
commit 19c839a98c731169f06d32e7c9e00c78a0086ebe upstream.
Since commit 7c010d463372 ("gpiolib: acpi: Make sure we fill struct acpi_gpio_info"), uninitialized acpi_gpio_info struct are passed to __acpi_find_gpio() and later in the call stack info->quirks is used in acpi_populate_gpio_lookup. This breaks the i2c_hid_cpi driver:
[ 58.122916] i2c_hid_acpi i2c-UNIW0001:00: HID over i2c has not been provided an Int IRQ [ 58.123097] i2c_hid_acpi i2c-UNIW0001:00: probe with driver i2c_hid_acpi failed with error -22
Fix this by initializing the acpi_gpio_info pass to __acpi_find_gpio()
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220388 Fixes: 7c010d463372 ("gpiolib: acpi: Make sure we fill struct acpi_gpio_info") Signed-off-by: Sébastien Szymanski sebastien.szymanski@armadeus.com Tested-by: Hans de Goede hansg@kernel.org Reviewed-by: Hans de Goede hansg@kernel.org Acked-by: Mika Westerberg mika.westerberg@linux.intel.com Tested-By: Calvin Owens calvin@wbinvd.org Cc: stable@vger.kernel.org Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpiolib-acpi-core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpio/gpiolib-acpi-core.c +++ b/drivers/gpio/gpiolib-acpi-core.c @@ -942,7 +942,7 @@ struct gpio_desc *acpi_find_gpio(struct { struct acpi_device *adev = to_acpi_device_node(fwnode); bool can_fallback = acpi_can_fallback_to_crs(adev, con_id); - struct acpi_gpio_info info; + struct acpi_gpio_info info = {}; struct gpio_desc *desc;
desc = __acpi_find_gpio(fwnode, con_id, idx, can_fallback, &info); @@ -992,7 +992,7 @@ int acpi_dev_gpio_irq_wake_get_by(struct int ret;
for (i = 0, idx = 0; idx <= index; i++) { - struct acpi_gpio_info info; + struct acpi_gpio_info info = {}; struct gpio_desc *desc;
/* Ignore -EPROBE_DEFER, it only matters if idx matches */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov (AMD) bp@alien8.de
commit 46834d90a9a13549264b9581067d8f746b4b36cc upstream.
When
9770b428b1a2 ("crypto: ccp - Move dev_info/err messages for SEV/SNP init and shutdown")
moved the error messages dumping so that they don't need to be issued by the callers, it missed the case where __sev_firmware_shutdown() calls __sev_platform_shutdown_locked() with a NULL argument which leads to a NULL ptr deref on the shutdown path, during suspend to disk:
#PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 0 UID: 0 PID: 983 Comm: hib.sh Not tainted 6.17.0-rc4+ #1 PREEMPT(voluntary) Hardware name: Supermicro Super Server/H12SSL-i, BIOS 2.5 09/08/2022 RIP: 0010:__sev_platform_shutdown_locked.cold+0x0/0x21 [ccp]
That rIP is:
00000000000006fd <__sev_platform_shutdown_locked.cold>: 6fd: 8b 13 mov (%rbx),%edx 6ff: 48 8b 7d 00 mov 0x0(%rbp),%rdi 703: 89 c1 mov %eax,%ecx
Code: 74 05 31 ff 41 89 3f 49 8b 3e 89 ea 48 c7 c6 a0 8e 54 a0 41 bf 92 ff ff ff e8 e5 2e 09 e1 c6 05 2a d4 38 00 01 e9 26 af ff ff <8b> 13 48 8b 7d 00 89 c1 48 c7 c6 18 90 54 a0 89 44 24 04 e8 c1 2e RSP: 0018:ffffc90005467d00 EFLAGS: 00010282 RAX: 00000000ffffff92 RBX: 0000000000000000 RCX: 0000000000000000 ^^^^^^^^^^^^^^^^ and %rbx is nice and clean.
Call Trace: <TASK> __sev_firmware_shutdown.isra.0 sev_dev_destroy psp_dev_destroy sp_destroy pci_device_shutdown device_shutdown kernel_power_off hibernate.cold state_store kernfs_fop_write_iter vfs_write ksys_write do_syscall_64 entry_SYSCALL_64_after_hwframe
Pass in a pointer to the function-local error var in the caller.
With that addressed, suspending the ccp shows the error properly at least:
ccp 0000:47:00.1: sev command 0x2 timed out, disabling PSP ccp 0000:47:00.1: SEV: failed to SHUTDOWN error 0x0, rc -110 SEV-SNP: Leaking PFN range 0x146800-0x146a00 SEV-SNP: PFN 0x146800 unassigned, dumping non-zero entries in 2M PFN region: [0x146800 - 0x146a00] ... ccp 0000:47:00.1: SEV-SNP firmware shutdown failed, rc -16, error 0x0 ACPI: PM: Preparing to enter system sleep state S5 kvm: exiting hardware virtualization reboot: Power down
Btw, this driver is crying to be cleaned up to pass in a proper I/O struct which can be used to store information between the different functions, otherwise stuff like that will happen in the future again.
Fixes: 9770b428b1a2 ("crypto: ccp - Move dev_info/err messages for SEV/SNP init and shutdown") Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Cc: stable@kernel.org Reviewed-by: Ashish Kalra ashish.kalra@amd.com Acked-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/crypto/ccp/sev-dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c index e058ba027792..9f5ccc1720cb 100644 --- a/drivers/crypto/ccp/sev-dev.c +++ b/drivers/crypto/ccp/sev-dev.c @@ -2430,7 +2430,7 @@ static void __sev_firmware_shutdown(struct sev_device *sev, bool panic) { int error;
- __sev_platform_shutdown_locked(NULL); + __sev_platform_shutdown_locked(&error);
if (sev_es_tmr) { /*
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Håkon Bugge haakon.bugge@oracle.com
commit 4351ca3fcb3ffecf12631b4996bf085a2dad0db6 upstream.
We need to increment i_fastreg_wrs before we bail out from rds_ib_post_reg_frmr().
We have a fixed budget of how many FRWR operations that can be outstanding using the dedicated QP used for memory registrations and de-registrations. This budget is enforced by the atomic_t i_fastreg_wrs. If we bail out early in rds_ib_post_reg_frmr(), we will "leak" the possibility of posting an FRWR operation, and if that accumulates, no FRWR operation can be carried out.
Fixes: 1659185fb4d0 ("RDS: IB: Support Fastreg MR (FRMR) memory registration mode") Fixes: 3a2886cca703 ("net/rds: Keep track of and wait for FRWR segments in use upon shutdown") Cc: stable@vger.kernel.org Signed-off-by: Håkon Bugge haakon.bugge@oracle.com Reviewed-by: Allison Henderson allison.henderson@oracle.com Link: https://patch.msgid.link/20250911133336.451212-1-haakon.bugge@oracle.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/rds/ib_frmr.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-)
--- a/net/rds/ib_frmr.c +++ b/net/rds/ib_frmr.c @@ -133,12 +133,15 @@ static int rds_ib_post_reg_frmr(struct r
ret = ib_map_mr_sg_zbva(frmr->mr, ibmr->sg, ibmr->sg_dma_len, &off, PAGE_SIZE); - if (unlikely(ret != ibmr->sg_dma_len)) - return ret < 0 ? ret : -EINVAL; + if (unlikely(ret != ibmr->sg_dma_len)) { + ret = ret < 0 ? ret : -EINVAL; + goto out_inc; + }
- if (cmpxchg(&frmr->fr_state, - FRMR_IS_FREE, FRMR_IS_INUSE) != FRMR_IS_FREE) - return -EBUSY; + if (cmpxchg(&frmr->fr_state, FRMR_IS_FREE, FRMR_IS_INUSE) != FRMR_IS_FREE) { + ret = -EBUSY; + goto out_inc; + }
atomic_inc(&ibmr->ic->i_fastreg_inuse_count);
@@ -166,11 +169,10 @@ static int rds_ib_post_reg_frmr(struct r /* Failure here can be because of -ENOMEM as well */ rds_transition_frwr_state(ibmr, FRMR_IS_INUSE, FRMR_IS_STALE);
- atomic_inc(&ibmr->ic->i_fastreg_wrs); if (printk_ratelimit()) pr_warn("RDS/IB: %s returned error(%d)\n", __func__, ret); - goto out; + goto out_inc; }
/* Wait for the registration to complete in order to prevent an invalid @@ -179,8 +181,10 @@ static int rds_ib_post_reg_frmr(struct r */ wait_event(frmr->fr_reg_done, !frmr->fr_reg);
-out: + return ret;
+out_inc: + atomic_inc(&ibmr->ic->i_fastreg_wrs); return ret; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit f755be0b1ff429a2ecf709beeb1bcd7abc111c2b upstream.
When the MPTCP DATA FIN have been ACKed, there is no more MPTCP related metadata to exchange, and all subflows can be safely shutdown.
Before this patch, the subflows were actually terminated at 'close()' time. That's certainly fine most of the time, but not when the userspace 'shutdown()' a connection, without close()ing it. When doing so, the subflows were staying in LAST_ACK state on one side -- and consequently in FIN_WAIT2 on the other side -- until the 'close()' of the MPTCP socket.
Now, when the DATA FIN have been ACKed, all subflows are shutdown. A consequence of this is that the TCP 'FIN' flag can be set earlier now, but the end result is the same. This affects the packetdrill tests looking at the end of the MPTCP connections, but for a good reason.
Note that tcp_shutdown() will check the subflow state, so no need to do that again before calling it.
Fixes: 3721b9b64676 ("mptcp: Track received DATA_FIN sequence number and add related helpers") Cc: stable@vger.kernel.org Fixes: 16a9a9da1723 ("mptcp: Add helper to process acks of DATA_FIN") Reviewed-by: Mat Martineau martineau@kernel.org Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-1-d40e77cbbf0... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -350,6 +350,20 @@ static void mptcp_close_wake_up(struct s sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN); }
+static void mptcp_shutdown_subflows(struct mptcp_sock *msk) +{ + struct mptcp_subflow_context *subflow; + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + bool slow; + + slow = lock_sock_fast(ssk); + tcp_shutdown(ssk, SEND_SHUTDOWN); + unlock_sock_fast(ssk, slow); + } +} + /* called under the msk socket lock */ static bool mptcp_pending_data_fin_ack(struct sock *sk) { @@ -374,6 +388,7 @@ static void mptcp_check_data_fin_ack(str break; case TCP_CLOSING: case TCP_LAST_ACK: + mptcp_shutdown_subflows(msk); mptcp_set_state(sk, TCP_CLOSE); break; } @@ -542,6 +557,7 @@ static bool mptcp_check_data_fin(struct mptcp_set_state(sk, TCP_CLOSING); break; case TCP_FIN_WAIT2: + mptcp_shutdown_subflows(msk); mptcp_set_state(sk, TCP_CLOSE); break; default:
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 14e22b43df25dbd4301351b882486ea38892ae4f upstream.
IO errors were correctly printed to stderr, and propagated up to the main loop for the server side, but the returned value was ignored. As a consequence, the program for the listener side was no longer exiting with an error code in case of IO issues.
Because of that, some issues might not have been seen. But very likely, most issues either had an effect on the client side, or the file transfer was not the expected one, e.g. the connection got reset before the end. Still, it is better to fix this.
The main consequence of this issue is the error that was reported by the selftests: the received and sent files were different, and the MIB counters were not printed. Also, when such errors happened during the 'disconnect' tests, the program tried to continue until the timeout.
Now when an IO error is detected, the program exits directly with an error.
Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-2-d40e77cbbf0... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_connect.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c @@ -1093,6 +1093,7 @@ int main_loop_s(int listensock) struct pollfd polls; socklen_t salen; int remotesock; + int err = 0; int fd = 0;
again: @@ -1125,7 +1126,7 @@ again: SOCK_TEST_TCPULP(remotesock, 0);
memset(&winfo, 0, sizeof(winfo)); - copyfd_io(fd, remotesock, 1, true, &winfo); + err = copyfd_io(fd, remotesock, 1, true, &winfo); } else { perror("accept"); return 1; @@ -1134,10 +1135,10 @@ again: if (cfg_input) close(fd);
- if (--cfg_repeat > 0) + if (!err && --cfg_repeat > 0) goto again;
- return 0; + return err; }
static void init_rng(void)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 8708c5d8b3fb3f6d5d3b9e6bfe01a505819f519a upstream.
The disconnect test-case, with 'plain' TCP sockets generates spurious errors, e.g.
07 ns1 TCP -> ns1 (dead:beef:1::1:10006) MPTCP read: Connection reset by peer read: Connection reset by peer (duration 155ms) [FAIL] client exit code 3, server 3
netns ns1-FloSdv (listener) socket stat for 10006: TcpActiveOpens 2 0.0 TcpPassiveOpens 2 0.0 TcpEstabResets 2 0.0 TcpInSegs 274 0.0 TcpOutSegs 276 0.0 TcpOutRsts 3 0.0 TcpExtPruneCalled 2 0.0 TcpExtRcvPruned 1 0.0 TcpExtTCPPureAcks 104 0.0 TcpExtTCPRcvCollapsed 2 0.0 TcpExtTCPBacklogCoalesce 42 0.0 TcpExtTCPRcvCoalesce 43 0.0 TcpExtTCPChallengeACK 1 0.0 TcpExtTCPFromZeroWindowAdv 42 0.0 TcpExtTCPToZeroWindowAdv 41 0.0 TcpExtTCPWantZeroWindowAdv 13 0.0 TcpExtTCPOrigDataSent 164 0.0 TcpExtTCPDelivered 165 0.0 TcpExtTCPRcvQDrop 1 0.0
In the failing scenarios (TCP -> MPTCP), the involved sockets are actually plain TCP ones, as fallbacks for passive sockets at 2WHS time cause the MPTCP listeners to actually create 'plain' TCP sockets.
Similar to commit 218cc166321f ("selftests: mptcp: avoid spurious errors on disconnect"), the root cause is in the user-space bits: the test program tries to disconnect as soon as all the pending data has been spooled, generating an RST. If such option reaches the peer before the connection has reached the closed status, the TCP socket will report an error to the user-space, as per protocol specification, causing the above failure. Note that it looks like this issue got more visible since the "tcp: receiver changes" series from commit 06baf9bfa6ca ("Merge branch 'tcp-receiver-changes'").
Address the issue by explicitly waiting for the TCP sockets (-t) to reach a closed status before performing the disconnect. More precisely, the test program now waits for plain TCP sockets or TCP subflows in addition to the MPTCP sockets that were already monitored.
While at it, use 'ss' with '-n' to avoid resolving service names, which is not needed here.
Fixes: 218cc166321f ("selftests: mptcp: avoid spurious errors on disconnect") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-3-d40e77cbbf0... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_connect.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c @@ -1248,7 +1248,7 @@ void xdisconnect(int fd) else xerror("bad family");
- strcpy(cmd, "ss -M | grep -q "); + strcpy(cmd, "ss -Mnt | grep -q "); cmdlen = strlen(cmd); if (!inet_ntop(addr.ss_family, raw_addr, &cmd[cmdlen], sizeof(cmd) - cmdlen)) @@ -1258,7 +1258,7 @@ void xdisconnect(int fd)
/* * wait until the pending data is completely flushed and all - * the MPTCP sockets reached the closed status. + * the sockets reached the closed status. * disconnect will bypass/ignore/drop any pending data. */ for (i = 0; ; i += msec_sleep) {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Praful Adiga praful.adiga@gmail.com
commit d33c3471047fc54966621d19329e6a23ebc8ec50 upstream.
This laptop uses the ALC236 codec with COEF 0x7 and idx 1 to control the mute LED. Enable the existing quirk for this device.
Signed-off-by: Praful Adiga praful.adiga@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10752,6 +10752,7 @@ static const struct hda_quirk alc269_fix SND_PCI_QUIRK(0x103c, 0x8992, "HP EliteBook 845 G9", ALC287_FIXUP_CS35L41_I2C_2), SND_PCI_QUIRK(0x103c, 0x8994, "HP EliteBook 855 G9", ALC287_FIXUP_CS35L41_I2C_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8995, "HP EliteBook 855 G9", ALC287_FIXUP_CS35L41_I2C_2), + SND_PCI_QUIRK(0x103c, 0x89a0, "HP Laptop 15-dw4xxx", ALC236_FIXUP_HP_MUTE_LED_COEFBIT2), SND_PCI_QUIRK(0x103c, 0x89a4, "HP ProBook 440 G9", ALC236_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x89a6, "HP ProBook 450 G9", ALC236_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x89aa, "HP EliteBook 630 G9", ALC236_FIXUP_HP_GPIO_LED),
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Charles Keepax ckeepax@opensource.cirrus.com
[ Upstream commit d05afb53c683ef7ed1228b593c3360f4d3126c58 ]
Using a single value of 22500000 for both 48000Hz and 44100Hz audio will sometimes result in returning wrong dividers due to rounding. Update the code to use the actual value for both.
Fixes: 294833fc9eb4 ("ASoC: wm8940: Rewrite code to set proper clocks") Reported-by: Ankur Tyagi ankur.tyagi85@gmail.com Signed-off-by: Charles Keepax ckeepax@opensource.cirrus.com Tested-by: Ankur Tyagi ankur.tyagi85@gmail.com Link: https://patch.msgid.link/20250821082639.1301453-2-ckeepax@opensource.cirrus.... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/wm8940.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/sound/soc/codecs/wm8940.c b/sound/soc/codecs/wm8940.c index 401ee20897b1b..46c16c9bc17a8 100644 --- a/sound/soc/codecs/wm8940.c +++ b/sound/soc/codecs/wm8940.c @@ -693,7 +693,12 @@ static int wm8940_update_clocks(struct snd_soc_dai *dai) f = wm8940_get_mclkdiv(priv->mclk, fs256, &mclkdiv); if (f != priv->mclk) { /* The PLL performs best around 90MHz */ - fpll = wm8940_get_mclkdiv(22500000, fs256, &mclkdiv); + if (fs256 % 8000) + f = 22579200; + else + f = 24576000; + + fpll = wm8940_get_mclkdiv(f, fs256, &mclkdiv); }
wm8940_set_dai_pll(dai, 0, 0, priv->mclk, fpll);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Charles Keepax ckeepax@opensource.cirrus.com
[ Upstream commit b4799520dcd6fe1e14495cecbbe9975d847cd482 ]
Fixes: 0b5e92c5e020 ("ASoC WM8940 Driver") Reported-by: Ankur Tyagi ankur.tyagi85@gmail.com Signed-off-by: Charles Keepax ckeepax@opensource.cirrus.com Tested-by: Ankur Tyagi ankur.tyagi85@gmail.com Link: https://patch.msgid.link/20250821082639.1301453-3-ckeepax@opensource.cirrus.... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/wm8940.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/codecs/wm8940.c b/sound/soc/codecs/wm8940.c index 46c16c9bc17a8..94873ea630146 100644 --- a/sound/soc/codecs/wm8940.c +++ b/sound/soc/codecs/wm8940.c @@ -220,7 +220,7 @@ static const struct snd_kcontrol_new wm8940_snd_controls[] = { SOC_SINGLE_TLV("Digital Capture Volume", WM8940_ADCVOL, 0, 255, 0, wm8940_adc_tlv), SOC_ENUM("Mic Bias Level", wm8940_mic_bias_level_enum), - SOC_SINGLE_TLV("Capture Boost Volue", WM8940_ADCBOOST, + SOC_SINGLE_TLV("Capture Boost Volume", WM8940_ADCBOOST, 8, 1, 0, wm8940_capture_boost_vol_tlv), SOC_SINGLE_TLV("Speaker Playback Volume", WM8940_SPKVOL, 0, 63, 0, wm8940_spk_vol_tlv),
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Charles Keepax ckeepax@opensource.cirrus.com
[ Upstream commit 9b17d3724df55ecc2bc67978822585f2b023be48 ]
Using a single value of 22500000 for both 48000Hz and 44100Hz audio will sometimes result in returning wrong dividers due to rounding. Update the code to use the actual value for both.
Fixes: 51b2bb3f2568 ("ASoC: wm8974: configure pll and mclk divider automatically") Signed-off-by: Charles Keepax ckeepax@opensource.cirrus.com Link: https://patch.msgid.link/20250821082639.1301453-4-ckeepax@opensource.cirrus.... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/wm8974.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/sound/soc/codecs/wm8974.c b/sound/soc/codecs/wm8974.c index bdf437a5403fe..db16d893a2351 100644 --- a/sound/soc/codecs/wm8974.c +++ b/sound/soc/codecs/wm8974.c @@ -419,10 +419,14 @@ static int wm8974_update_clocks(struct snd_soc_dai *dai) fs256 = 256 * priv->fs;
f = wm8974_get_mclkdiv(priv->mclk, fs256, &mclkdiv); - if (f != priv->mclk) { /* The PLL performs best around 90MHz */ - fpll = wm8974_get_mclkdiv(22500000, fs256, &mclkdiv); + if (fs256 % 8000) + f = 22579200; + else + f = 24576000; + + fpll = wm8974_get_mclkdiv(f, fs256, &mclkdiv); }
wm8974_set_dai_pll(dai, 0, 0, priv->mclk, fpll);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit 78338108b5a856dc98223a335f147846a8a18c51 ]
The sma1307->set.header_size is how many integers are in the header (there are 8 of them) but instead of allocating space of 8 integers we allocate 8 bytes. This leads to memory corruption when we copy data it on the next line:
memcpy(sma1307->set.header, data, sma1307->set.header_size * sizeof(int));
Also since we're immediately copying over the memory in ->set.header, there is no need to zero it in the allocator. Use devm_kmalloc_array() to allocate the memory instead.
Fixes: 576c57e6b4c1 ("ASoC: sma1307: Add driver for Iron Device SMA1307") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Link: https://patch.msgid.link/aLGjvjpueVstekXP@stanley.mountain Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/sma1307.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/sound/soc/codecs/sma1307.c b/sound/soc/codecs/sma1307.c index b3d401ada1760..2d993428f87e3 100644 --- a/sound/soc/codecs/sma1307.c +++ b/sound/soc/codecs/sma1307.c @@ -1737,9 +1737,10 @@ static void sma1307_setting_loaded(struct sma1307_priv *sma1307, const char *fil sma1307->set.checksum = data[sma1307->set.header_size - 2]; sma1307->set.num_mode = data[sma1307->set.header_size - 1]; num_mode = sma1307->set.num_mode; - sma1307->set.header = devm_kzalloc(sma1307->dev, - sma1307->set.header_size, - GFP_KERNEL); + sma1307->set.header = devm_kmalloc_array(sma1307->dev, + sma1307->set.header_size, + sizeof(int), + GFP_KERNEL); if (!sma1307->set.header) { sma1307->set.status = false; return;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Colin Ian King colin.i.king@gmail.com
[ Upstream commit 35fc531a59694f24a2456569cf7d1a9c6436841c ]
The dev_err message is reporting an error about capture streams however it is using the incorrect variable num_playback instead of num_capture. Fix this by using the correct variable num_capture.
Fixes: a1d1e266b445 ("ASoC: SOF: Intel: Add Intel specific HDA stream operations") Signed-off-by: Colin Ian King colin.i.king@gmail.com Acked-by: Peter Ujfalusi peter.ujfalusi@linux.intel.com Link: https://patch.msgid.link/20250902120639.2626861-1-colin.i.king@gmail.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/sof/intel/hda-stream.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/sof/intel/hda-stream.c b/sound/soc/sof/intel/hda-stream.c index aa6b0247d5c99..a34f472ef1751 100644 --- a/sound/soc/sof/intel/hda-stream.c +++ b/sound/soc/sof/intel/hda-stream.c @@ -890,7 +890,7 @@ int hda_dsp_stream_init(struct snd_sof_dev *sdev)
if (num_capture >= SOF_HDA_CAPTURE_STREAMS) { dev_err(sdev->dev, "error: too many capture streams %d\n", - num_playback); + num_capture); return -EINVAL; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Charles Keepax ckeepax@opensource.cirrus.com
[ Upstream commit f81e63047600d023cbfda372b6de8f2821ff6839 ]
The MBQ size function returns an integer representing the size of a Control. Currently if the Control is not found the function will return false which makes little sense. Correct this typo to return -EINVAL.
Fixes: e3f7caf74b79 ("ASoC: SDCA: Add generic regmap SDCA helpers") Signed-off-by: Charles Keepax ckeepax@opensource.cirrus.com Message-ID: 20250820163717.1095846-2-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/sdca/sdca_regmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/sdca/sdca_regmap.c b/sound/soc/sdca/sdca_regmap.c index c41c67c2204a4..ff1f8fe2a39bb 100644 --- a/sound/soc/sdca/sdca_regmap.c +++ b/sound/soc/sdca/sdca_regmap.c @@ -196,7 +196,7 @@ int sdca_regmap_mbq_size(struct sdca_function_data *function, unsigned int reg)
control = function_find_control(function, reg); if (!control) - return false; + return -EINVAL;
return clamp_val(control->nbits / BITS_PER_BYTE, sizeof(u8), sizeof(u32)); }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amadeusz Sławiński amadeuszx.slawinski@linux.intel.com
[ Upstream commit 690aa09b1845c0d5c3c29dabd50a9d0488c97c48 ]
Currently wrong bit depth is exposed in hw params, causing clipped volume during playback. Expose correct parameters.
Fixes: a126750fc865 ("ASoC: Intel: catpt: PCM operations") Reported-by: Andy Shevchenko andriy.shevchenko@intel.com Tested-by: Andy Shevchenko andriy.shevchenko@intel.com Reviewed-by: Cezary Rojewski cezary.rojewski@intel.com Signed-off-by: Amadeusz Sławiński amadeuszx.slawinski@linux.intel.com Message-ID: 20250909092829.375953-1-amadeuszx.slawinski@linux.intel.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/intel/catpt/pcm.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/sound/soc/intel/catpt/pcm.c b/sound/soc/intel/catpt/pcm.c index 81a2f0339e055..ff1fa01acb85b 100644 --- a/sound/soc/intel/catpt/pcm.c +++ b/sound/soc/intel/catpt/pcm.c @@ -568,8 +568,9 @@ static const struct snd_pcm_hardware catpt_pcm_hardware = { SNDRV_PCM_INFO_RESUME | SNDRV_PCM_INFO_NO_PERIOD_WAKEUP, .formats = SNDRV_PCM_FMTBIT_S16_LE | - SNDRV_PCM_FMTBIT_S24_LE | SNDRV_PCM_FMTBIT_S32_LE, + .subformats = SNDRV_PCM_SUBFMTBIT_MSBITS_24 | + SNDRV_PCM_SUBFMTBIT_MSBITS_MAX, .period_bytes_min = PAGE_SIZE, .period_bytes_max = CATPT_BUFFER_MAX_SIZE / CATPT_PCM_PERIODS_MIN, .periods_min = CATPT_PCM_PERIODS_MIN, @@ -699,14 +700,18 @@ static struct snd_soc_dai_driver dai_drivers[] = { .channels_min = 2, .channels_max = 2, .rates = SNDRV_PCM_RATE_48000, - .formats = SNDRV_PCM_FMTBIT_S16_LE | SNDRV_PCM_FMTBIT_S24_LE, + .formats = SNDRV_PCM_FMTBIT_S16_LE | SNDRV_PCM_FMTBIT_S32_LE, + .subformats = SNDRV_PCM_SUBFMTBIT_MSBITS_24 | + SNDRV_PCM_SUBFMTBIT_MSBITS_MAX, }, .capture = { .stream_name = "Analog Capture", .channels_min = 2, .channels_max = 4, .rates = SNDRV_PCM_RATE_48000, - .formats = SNDRV_PCM_FMTBIT_S16_LE | SNDRV_PCM_FMTBIT_S24_LE, + .formats = SNDRV_PCM_FMTBIT_S16_LE | SNDRV_PCM_FMTBIT_S32_LE, + .subformats = SNDRV_PCM_SUBFMTBIT_MSBITS_24 | + SNDRV_PCM_SUBFMTBIT_MSBITS_MAX, }, }, { @@ -718,7 +723,9 @@ static struct snd_soc_dai_driver dai_drivers[] = { .channels_min = 2, .channels_max = 2, .rates = SNDRV_PCM_RATE_8000_192000, - .formats = SNDRV_PCM_FMTBIT_S16_LE | SNDRV_PCM_FMTBIT_S24_LE, + .formats = SNDRV_PCM_FMTBIT_S16_LE | SNDRV_PCM_FMTBIT_S32_LE, + .subformats = SNDRV_PCM_SUBFMTBIT_MSBITS_24 | + SNDRV_PCM_SUBFMTBIT_MSBITS_MAX, }, }, { @@ -730,7 +737,9 @@ static struct snd_soc_dai_driver dai_drivers[] = { .channels_min = 2, .channels_max = 2, .rates = SNDRV_PCM_RATE_8000_192000, - .formats = SNDRV_PCM_FMTBIT_S16_LE | SNDRV_PCM_FMTBIT_S24_LE, + .formats = SNDRV_PCM_FMTBIT_S16_LE | SNDRV_PCM_FMTBIT_S32_LE, + .subformats = SNDRV_PCM_SUBFMTBIT_MSBITS_24 | + SNDRV_PCM_SUBFMTBIT_MSBITS_MAX, }, }, { @@ -742,7 +751,9 @@ static struct snd_soc_dai_driver dai_drivers[] = { .channels_min = 2, .channels_max = 2, .rates = SNDRV_PCM_RATE_48000, - .formats = SNDRV_PCM_FMTBIT_S16_LE | SNDRV_PCM_FMTBIT_S24_LE, + .formats = SNDRV_PCM_FMTBIT_S16_LE | SNDRV_PCM_FMTBIT_S32_LE, + .subformats = SNDRV_PCM_SUBFMTBIT_MSBITS_24 | + SNDRV_PCM_SUBFMTBIT_MSBITS_MAX, }, }, {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasant Hegde vasant.hegde@amd.com
[ Upstream commit a0c17ed907ac3326cf3c9d6007ea222a746f5cc2 ]
Commit 7bea695ada0 restructured DTE flag handling but inadvertently changed the alias device configuration logic. This may cause incorrect DTE settings for certain devices.
Add alias flag check before calling set_dev_entry_from_acpi(). Also move the device iteration loop inside the alias check to restrict execution to cases where alias devices are present.
Fixes: 7bea695ada0 ("iommu/amd: Introduce struct ivhd_dte_flags to store persistent DTE flags") Cc: Suravee Suthikulpanit suravee.suthikulpanit@amd.com Signed-off-by: Vasant Hegde vasant.hegde@amd.com Signed-off-by: Joerg Roedel joerg.roedel@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/amd/init.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 3a97cc667943f..eef55aa4143c1 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -1450,12 +1450,12 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, PCI_FUNC(e->devid));
devid = e->devid; - for (dev_i = devid_start; dev_i <= devid; ++dev_i) { - if (alias) + if (alias) { + for (dev_i = devid_start; dev_i <= devid; ++dev_i) pci_seg->alias_table[dev_i] = devid_to; + set_dev_entry_from_acpi(iommu, devid_to, flags, ext_flags); } set_dev_entry_from_acpi_range(iommu, devid_start, devid, flags, ext_flags); - set_dev_entry_from_acpi(iommu, devid_to, flags, ext_flags); break; case IVHD_DEV_SPECIAL: { u8 handle, type;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Venkata Prasad Potturu venkataprasad.potturu@amd.com
[ Upstream commit d7871f400cad1da376f1d7724209a1c49226c456 ]
Use dev_get_drvdata(dev->parent) instead of dev_get_platdata(dev) to correctly obtain acp_chip_info members in the acp I2S driver. Previously, some members were not updated properly due to incorrect data access, which could potentially lead to null pointer dereferences.
This issue was missed in the earlier commit ("ASoC: amd: acp: Fix NULL pointer deref in acp_i2s_set_tdm_slot"), which only addressed set_tdm_slot(). This change ensures that all relevant functions correctly retrieve acp_chip_info, preventing further null pointer dereference issues.
Fixes: e3933683b25e ("ASoC: amd: acp: Remove redundant acp_dev_data structure")
Signed-off-by: Venkata Prasad Potturu venkataprasad.potturu@amd.com Reviewed-by: Cezary Rojewski cezary.rojewski@intel.com Link: https://patch.msgid.link/20250910171419.3682468-1-venkataprasad.potturu@amd.... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/amd/acp/acp-i2s.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/sound/soc/amd/acp/acp-i2s.c b/sound/soc/amd/acp/acp-i2s.c index 70fa54d568ef6..4d9589b67099e 100644 --- a/sound/soc/amd/acp/acp-i2s.c +++ b/sound/soc/amd/acp/acp-i2s.c @@ -72,7 +72,7 @@ static int acp_i2s_set_fmt(struct snd_soc_dai *cpu_dai, unsigned int fmt) { struct device *dev = cpu_dai->component->dev; - struct acp_chip_info *chip = dev_get_platdata(dev); + struct acp_chip_info *chip = dev_get_drvdata(dev->parent); int mode;
mode = fmt & SND_SOC_DAIFMT_FORMAT_MASK; @@ -196,7 +196,7 @@ static int acp_i2s_hwparams(struct snd_pcm_substream *substream, struct snd_pcm_ u32 reg_val, fmt_reg, tdm_fmt; u32 lrclk_div_val, bclk_div_val;
- chip = dev_get_platdata(dev); + chip = dev_get_drvdata(dev->parent); rsrc = chip->rsrc;
/* These values are as per Hardware Spec */ @@ -383,7 +383,7 @@ static int acp_i2s_trigger(struct snd_pcm_substream *substream, int cmd, struct { struct acp_stream *stream = substream->runtime->private_data; struct device *dev = dai->component->dev; - struct acp_chip_info *chip = dev_get_platdata(dev); + struct acp_chip_info *chip = dev_get_drvdata(dev->parent); struct acp_resource *rsrc = chip->rsrc; u32 val, period_bytes, reg_val, ier_val, water_val, buf_size, buf_reg;
@@ -513,14 +513,13 @@ static int acp_i2s_trigger(struct snd_pcm_substream *substream, int cmd, struct static int acp_i2s_prepare(struct snd_pcm_substream *substream, struct snd_soc_dai *dai) { struct device *dev = dai->component->dev; - struct acp_chip_info *chip = dev_get_platdata(dev); + struct acp_chip_info *chip = dev_get_drvdata(dev->parent); struct acp_resource *rsrc = chip->rsrc; struct acp_stream *stream = substream->runtime->private_data; u32 reg_dma_size = 0, reg_fifo_size = 0, reg_fifo_addr = 0; u32 phy_addr = 0, acp_fifo_addr = 0, ext_int_ctrl; unsigned int dir = substream->stream;
- chip = dev_get_platdata(dev); switch (dai->driver->id) { case I2S_SP_INSTANCE: if (dir == SNDRV_PCM_STREAM_PLAYBACK) { @@ -629,7 +628,7 @@ static int acp_i2s_startup(struct snd_pcm_substream *substream, struct snd_soc_d { struct acp_stream *stream = substream->runtime->private_data; struct device *dev = dai->component->dev; - struct acp_chip_info *chip = dev_get_platdata(dev); + struct acp_chip_info *chip = dev_get_drvdata(dev->parent); struct acp_resource *rsrc = chip->rsrc; unsigned int dir = substream->stream; unsigned int irq_bit = 0;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuicheng Lin shuicheng.lin@intel.com
[ Upstream commit 013e484dbd687a9174acf8f4450217bdb86ad788 ]
Call kobject_put() for the failure path to release the kobject
v2: remove extra newline. (Matt)
Fixes: e3d0839aa501 ("drm/xe/tile: Abort driver load for sysfs creation failure") Cc: Himal Prasad Ghimiray himal.prasad.ghimiray@intel.com Reviewed-by: Matthew Brost matthew.brost@intel.com Signed-off-by: Shuicheng Lin shuicheng.lin@intel.com Link: https://lore.kernel.org/r/20250819153950.2973344-2-shuicheng.lin@intel.com Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com (cherry picked from commit b98775bca99511cc22ab459a2de646cd2fa7241f) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_tile_sysfs.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_tile_sysfs.c b/drivers/gpu/drm/xe/xe_tile_sysfs.c index b804234a65516..9e1236a9ec673 100644 --- a/drivers/gpu/drm/xe/xe_tile_sysfs.c +++ b/drivers/gpu/drm/xe/xe_tile_sysfs.c @@ -44,16 +44,18 @@ int xe_tile_sysfs_init(struct xe_tile *tile) kt->tile = tile;
err = kobject_add(&kt->base, &dev->kobj, "tile%d", tile->id); - if (err) { - kobject_put(&kt->base); - return err; - } + if (err) + goto err_object;
tile->sysfs = &kt->base;
err = xe_vram_freq_sysfs_init(tile); if (err) - return err; + goto err_object;
return devm_add_action_or_reset(xe->drm.dev, tile_sysfs_fini, tile); + +err_object: + kobject_put(&kt->base); + return err; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Wajdeczko michal.wajdeczko@intel.com
[ Upstream commit fef8b64e48e836344574b85132a1c317f4260022 ]
This effectively reverts commit 4c3fe5eae46b ("drm/xe/pf: Limit fair VF LMEM provisioning") since we don't need it any more after non-contig VRAM allocations were fixed. This allows larger LMEM auto-provisioning for VFs, so instead:
[ ] GT0: PF: LMEM available(14096M) fair(1 x 8192M) [ ] GT0: PF: VF1 provisioned with 8589934592 (8.00 GiB) LMEM or [ ] GT0: PF: LMEM available(14096M) fair(2 x 4096M) [ ] GT0: PF: VF1..VF2 provisioned with 4294967296 (4.00 GiB) LMEM
we may get:
[ ] GT0: PF: LMEM available(14096M) fair(1 x 14096M) [ ] GT0: PF: VF1 provisioned with 14780727296 (13.8 GiB) LMEM and [ ] GT0: PF: LMEM available(14096M) fair(2 x 7048M) [ ] GT0: PF: VF1..VF2 provisioned with 7390363648 (6.88 GiB) LMEM
Fixes: 1e32ffbc9dc8 ("drm/xe/sriov: support non-contig VRAM provisioning") Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com Reviewed-by: Piotr Piórkowski piotr.piorkowski@intel.com Link: https://lore.kernel.org/r/20250910222439.32869-1-michal.wajdeczko@intel.com (cherry picked from commit 95c1cfa306087142989bff34ea0e05dcd95ddc58) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c index 53a44702c04af..c15dc600dcae7 100644 --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c @@ -1600,7 +1600,6 @@ static u64 pf_estimate_fair_lmem(struct xe_gt *gt, unsigned int num_vfs) u64 fair;
fair = div_u64(available, num_vfs); - fair = rounddown_pow_of_two(fair); /* XXX: ttm_vram_mgr & drm_buddy limitation */ fair = ALIGN_DOWN(fair, alignment); #ifdef MAX_FAIR_LMEM fair = min_t(u64, MAX_FAIR_LMEM, fair);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Loic Poulain loic.poulain@oss.qualcomm.com
[ Upstream commit a10f910c77f280327b481e77eab909934ec508f0 ]
If the interrupt occurs before resource initialization is complete, the interrupt handler/worker may access uninitialized data such as the I2C tcpc_client device, potentially leading to NULL pointer dereference.
Signed-off-by: Loic Poulain loic.poulain@oss.qualcomm.com Fixes: 8bdfc5dae4e3 ("drm/bridge: anx7625: Add anx7625 MIPI DSI/DPI to DP") Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@oss.qualcomm.com Link: https://lore.kernel.org/r/20250709085438.56188-1-loic.poulain@oss.qualcomm.c... Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@oss.qualcomm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/bridge/analogix/anx7625.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/bridge/analogix/anx7625.c b/drivers/gpu/drm/bridge/analogix/anx7625.c index 8a9079c2ed5c2..8257132a8ee9d 100644 --- a/drivers/gpu/drm/bridge/analogix/anx7625.c +++ b/drivers/gpu/drm/bridge/analogix/anx7625.c @@ -2678,7 +2678,7 @@ static int anx7625_i2c_probe(struct i2c_client *client) ret = devm_request_threaded_irq(dev, platform->pdata.intp_irq, NULL, anx7625_intr_hpd_isr, IRQF_TRIGGER_FALLING | - IRQF_ONESHOT, + IRQF_ONESHOT | IRQF_NO_AUTOEN, "anx7625-intp", platform); if (ret) { DRM_DEV_ERROR(dev, "fail to request irq\n"); @@ -2747,8 +2747,10 @@ static int anx7625_i2c_probe(struct i2c_client *client) }
/* Add work function */ - if (platform->pdata.intp_irq) + if (platform->pdata.intp_irq) { + enable_irq(platform->pdata.intp_irq); queue_work(platform->workqueue, &platform->work); + }
if (platform->pdata.audio_en) anx7625_register_audio(dev, platform);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qi Xi xiqi2@huawei.com
[ Upstream commit 288dac9fb6084330d968459c750c838fd06e10e6 ]
Add missing mutex unlock before returning from the error path in cdns_mhdp_atomic_enable().
Fixes: 935a92a1c400 ("drm: bridge: cdns-mhdp8546: Fix possible null pointer dereference") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Qi Xi xiqi2@huawei.com Reviewed-by: Luca Ceresoli luca.ceresoli@bootlin.com Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Link: https://lore.kernel.org/r/20250904034447.665427-1-xiqi2@huawei.com Signed-off-by: Luca Ceresoli luca.ceresoli@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c index b431e7efd1f0d..dbef0ca1a22a3 100644 --- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c +++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c @@ -1984,8 +1984,10 @@ static void cdns_mhdp_atomic_enable(struct drm_bridge *bridge, mhdp_state = to_cdns_mhdp_bridge_state(new_state);
mhdp_state->current_mode = drm_mode_duplicate(bridge->dev, mode); - if (!mhdp_state->current_mode) - return; + if (!mhdp_state->current_mode) { + ret = -EINVAL; + goto out; + }
drm_mode_set_name(mhdp_state->current_mode);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit cbc7f3b4f6ca19320e2eacf8fc1403d6f331ce14 ]
The xe_preempt_fence_create() function returns error pointers. It never returns NULL. Update the error checking to match.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: Matthew Brost matthew.brost@intel.com Link: https://lore.kernel.org/r/aJTMBdX97cof_009@stanley.mountain Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com (cherry picked from commit 75cc23ffe5b422bc3cbd5cf0956b8b86e4b0e162) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_vm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 84052b98002d1..92ce7374a79ce 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -240,8 +240,8 @@ int xe_vm_add_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
pfence = xe_preempt_fence_create(q, q->lr.context, ++q->lr.seqno); - if (!pfence) { - err = -ENOMEM; + if (IS_ERR(pfence)) { + err = PTR_ERR(pfence); goto out_fini; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
[ Upstream commit 44499ecb4f2817743c37d861bdb3e95f37d3d9cd ]
The sanity check previously added to uaudio_transfer_buffer_setup() assumed the allocated buffer being linear-mapped. But the buffer allocated via usb_alloc_coherent() isn't always so, rather to be used with (SG-)DMA API. This leaded to a false-positive warning and the driver failed to work.
Actually uaudio_transfer_buffer_setup() deals only with the DMA-API addresses for MEM_XFER_BUF type, while other callers of uaudio_iommu_map() are with pages with physical addresses for MEM_EVENT_RING and MEM_XFER_RING types. So this patch splits the mapping helper function to two different ones, uaudio_iommu_map() for the DMA pages and uaudio_iommu_map_pa() for the latter, in order to handle mapping differently for each type. Along with it, the unnecessary address check that caused probe error is dropped, too.
Fixes: 3335a1bbd624 ("ALSA: qc_audio_offload: try to reduce address space confusion") Suggested-by: Arnd Bergmann arnd@arndb.de Acked-by: Arnd Bergmann arnd@arndb.de Reported-and-tested-by: Luca Weiss luca.weiss@fairphone.com Closes: https://lore.kernel.org/DBR2363A95M1.L9XBNC003490@fairphone.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/usb/qcom/qc_audio_offload.c | 92 ++++++++++++++++--------------- 1 file changed, 48 insertions(+), 44 deletions(-)
diff --git a/sound/usb/qcom/qc_audio_offload.c b/sound/usb/qcom/qc_audio_offload.c index a25c5a5316901..9ad76fff741b8 100644 --- a/sound/usb/qcom/qc_audio_offload.c +++ b/sound/usb/qcom/qc_audio_offload.c @@ -538,38 +538,33 @@ static void uaudio_iommu_unmap(enum mem_type mtype, unsigned long iova, umap_size, iova, mapped_iova_size); }
+static int uaudio_iommu_map_prot(bool dma_coherent) +{ + int prot = IOMMU_READ | IOMMU_WRITE; + + if (dma_coherent) + prot |= IOMMU_CACHE; + return prot; +} + /** - * uaudio_iommu_map() - maps iommu memory for adsp + * uaudio_iommu_map_pa() - maps iommu memory for adsp * @mtype: ring type * @dma_coherent: dma coherent * @pa: physical address for ring/buffer * @size: size of memory region - * @sgt: sg table for memory region * * Maps the XHCI related resources to a memory region that is assigned to be * used by the adsp. This will be mapped to the domain, which is created by * the ASoC USB backend driver. * */ -static unsigned long uaudio_iommu_map(enum mem_type mtype, bool dma_coherent, - phys_addr_t pa, size_t size, - struct sg_table *sgt) +static unsigned long uaudio_iommu_map_pa(enum mem_type mtype, bool dma_coherent, + phys_addr_t pa, size_t size) { - struct scatterlist *sg; unsigned long iova = 0; - size_t total_len = 0; - unsigned long iova_sg; - phys_addr_t pa_sg; bool map = true; - size_t sg_len; - int prot; - int ret; - int i; - - prot = IOMMU_READ | IOMMU_WRITE; - - if (dma_coherent) - prot |= IOMMU_CACHE; + int prot = uaudio_iommu_map_prot(dma_coherent);
switch (mtype) { case MEM_EVENT_RING: @@ -583,20 +578,41 @@ static unsigned long uaudio_iommu_map(enum mem_type mtype, bool dma_coherent, &uaudio_qdev->xfer_ring_iova_size, &uaudio_qdev->xfer_ring_list, size); break; - case MEM_XFER_BUF: - iova = uaudio_get_iova(&uaudio_qdev->curr_xfer_buf_iova, - &uaudio_qdev->xfer_buf_iova_size, - &uaudio_qdev->xfer_buf_list, size); - break; default: dev_err(uaudio_qdev->data->dev, "unknown mem type %d\n", mtype); }
if (!iova || !map) - goto done; + return 0; + + iommu_map(uaudio_qdev->data->domain, iova, pa, size, prot, GFP_KERNEL);
- if (!sgt) - goto skip_sgt_map; + return iova; +} + +static unsigned long uaudio_iommu_map_xfer_buf(bool dma_coherent, size_t size, + struct sg_table *sgt) +{ + struct scatterlist *sg; + unsigned long iova = 0; + size_t total_len = 0; + unsigned long iova_sg; + phys_addr_t pa_sg; + size_t sg_len; + int prot = uaudio_iommu_map_prot(dma_coherent); + int ret; + int i; + + prot = IOMMU_READ | IOMMU_WRITE; + + if (dma_coherent) + prot |= IOMMU_CACHE; + + iova = uaudio_get_iova(&uaudio_qdev->curr_xfer_buf_iova, + &uaudio_qdev->xfer_buf_iova_size, + &uaudio_qdev->xfer_buf_list, size); + if (!iova) + goto done;
iova_sg = iova; for_each_sg(sgt->sgl, sg, sgt->nents, i) { @@ -618,11 +634,6 @@ static unsigned long uaudio_iommu_map(enum mem_type mtype, bool dma_coherent, uaudio_iommu_unmap(MEM_XFER_BUF, iova, size, total_len); iova = 0; } - return iova; - -skip_sgt_map: - iommu_map(uaudio_qdev->data->domain, iova, pa, size, prot, GFP_KERNEL); - done: return iova; } @@ -1020,7 +1031,6 @@ static int uaudio_transfer_buffer_setup(struct snd_usb_substream *subs, struct sg_table xfer_buf_sgt; dma_addr_t xfer_buf_dma; void *xfer_buf; - phys_addr_t xfer_buf_pa; u32 len = xfer_buf_len; bool dma_coherent; dma_addr_t xfer_buf_dma_sysdev; @@ -1051,18 +1061,12 @@ static int uaudio_transfer_buffer_setup(struct snd_usb_substream *subs, if (!xfer_buf) return -ENOMEM;
- /* Remapping is not possible if xfer_buf is outside of linear map */ - xfer_buf_pa = virt_to_phys(xfer_buf); - if (WARN_ON(!page_is_ram(PFN_DOWN(xfer_buf_pa)))) { - ret = -ENXIO; - goto unmap_sync; - } dma_get_sgtable(subs->dev->bus->sysdev, &xfer_buf_sgt, xfer_buf, xfer_buf_dma, len);
/* map the physical buffer into sysdev as well */ - xfer_buf_dma_sysdev = uaudio_iommu_map(MEM_XFER_BUF, dma_coherent, - xfer_buf_pa, len, &xfer_buf_sgt); + xfer_buf_dma_sysdev = uaudio_iommu_map_xfer_buf(dma_coherent, + len, &xfer_buf_sgt); if (!xfer_buf_dma_sysdev) { ret = -ENOMEM; goto unmap_sync; @@ -1143,8 +1147,8 @@ uaudio_endpoint_setup(struct snd_usb_substream *subs, sg_free_table(sgt);
/* data transfer ring */ - iova = uaudio_iommu_map(MEM_XFER_RING, dma_coherent, tr_pa, - PAGE_SIZE, NULL); + iova = uaudio_iommu_map_pa(MEM_XFER_RING, dma_coherent, tr_pa, + PAGE_SIZE); if (!iova) { ret = -ENOMEM; goto clear_pa; @@ -1207,8 +1211,8 @@ static int uaudio_event_ring_setup(struct snd_usb_substream *subs, mem_info->dma = sg_dma_address(sgt->sgl); sg_free_table(sgt);
- iova = uaudio_iommu_map(MEM_EVENT_RING, dma_coherent, er_pa, - PAGE_SIZE, NULL); + iova = uaudio_iommu_map_pa(MEM_EVENT_RING, dma_coherent, er_pa, + PAGE_SIZE); if (!iova) { ret = -ENOMEM; goto clear_pa;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
[ Upstream commit ae5fbbda341f92e605a9508a0fb18456155517f0 ]
Since the PXP start comes after __xe_exec_queue_init() has completed, we need to cleanup what was done in that function in case of a PXP start error. __xe_exec_queue_init calls the submission backend init() function, so we need to introduce an opposite for that. Unfortunately, while we already have a fini() function pointer, it performs other operations in addition to cleaning up what was done by the init(). Therefore, for clarity, the existing fini() has been renamed to destroy(), while a new fini() has been added to only clean up what was done by the init(), with the latter being called by the former (via xe_exec_queue_fini).
Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues") Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: John Harrison John.C.Harrison@Intel.com Cc: Matthew Brost matthew.brost@intel.com Reviewed-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: John Harrison John.C.Harrison@Intel.com Link: https://lore.kernel.org/r/20250909221240.3711023-3-daniele.ceraolospurio@int... (cherry picked from commit 626667321deb4c7a294725406faa3dd71c3d445d) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_exec_queue.c | 22 ++++++--- drivers/gpu/drm/xe/xe_exec_queue_types.h | 8 ++- drivers/gpu/drm/xe/xe_execlist.c | 25 ++++++---- drivers/gpu/drm/xe/xe_execlist_types.h | 2 +- drivers/gpu/drm/xe/xe_guc_exec_queue_types.h | 4 +- drivers/gpu/drm/xe/xe_guc_submit.c | 52 ++++++++++++-------- 6 files changed, 72 insertions(+), 41 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c index fee22358cc09b..e511697596146 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.c +++ b/drivers/gpu/drm/xe/xe_exec_queue.c @@ -151,6 +151,16 @@ static int __xe_exec_queue_init(struct xe_exec_queue *q) return err; }
+static void __xe_exec_queue_fini(struct xe_exec_queue *q) +{ + int i; + + q->ops->fini(q); + + for (i = 0; i < q->width; ++i) + xe_lrc_put(q->lrc[i]); +} + struct xe_exec_queue *xe_exec_queue_create(struct xe_device *xe, struct xe_vm *vm, u32 logical_mask, u16 width, struct xe_hw_engine *hwe, u32 flags, @@ -181,11 +191,13 @@ struct xe_exec_queue *xe_exec_queue_create(struct xe_device *xe, struct xe_vm *v if (xe_exec_queue_uses_pxp(q)) { err = xe_pxp_exec_queue_add(xe->pxp, q); if (err) - goto err_post_alloc; + goto err_post_init; }
return q;
+err_post_init: + __xe_exec_queue_fini(q); err_post_alloc: __xe_exec_queue_free(q); return ERR_PTR(err); @@ -283,13 +295,11 @@ void xe_exec_queue_destroy(struct kref *ref) xe_exec_queue_put(eq); }
- q->ops->fini(q); + q->ops->destroy(q); }
void xe_exec_queue_fini(struct xe_exec_queue *q) { - int i; - /* * Before releasing our ref to lrc and xef, accumulate our run ticks * and wakeup any waiters. @@ -298,9 +308,7 @@ void xe_exec_queue_fini(struct xe_exec_queue *q) if (q->xef && atomic_dec_and_test(&q->xef->exec_queue.pending_removal)) wake_up_var(&q->xef->exec_queue.pending_removal);
- for (i = 0; i < q->width; ++i) - xe_lrc_put(q->lrc[i]); - + __xe_exec_queue_fini(q); __xe_exec_queue_free(q); }
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h index cc1cffb5c87f1..1c9d03f2a3e5d 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h @@ -166,8 +166,14 @@ struct xe_exec_queue_ops { int (*init)(struct xe_exec_queue *q); /** @kill: Kill inflight submissions for backend */ void (*kill)(struct xe_exec_queue *q); - /** @fini: Fini exec queue for submission backend */ + /** @fini: Undoes the init() for submission backend */ void (*fini)(struct xe_exec_queue *q); + /** + * @destroy: Destroy exec queue for submission backend. The backend + * function must call xe_exec_queue_fini() (which will in turn call the + * fini() backend function) to ensure the queue is properly cleaned up. + */ + void (*destroy)(struct xe_exec_queue *q); /** @set_priority: Set priority for exec queue */ int (*set_priority)(struct xe_exec_queue *q, enum xe_exec_queue_priority priority); diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c index 788f56b066b6a..f83d421ac9d3d 100644 --- a/drivers/gpu/drm/xe/xe_execlist.c +++ b/drivers/gpu/drm/xe/xe_execlist.c @@ -385,10 +385,20 @@ static int execlist_exec_queue_init(struct xe_exec_queue *q) return err; }
-static void execlist_exec_queue_fini_async(struct work_struct *w) +static void execlist_exec_queue_fini(struct xe_exec_queue *q) +{ + struct xe_execlist_exec_queue *exl = q->execlist; + + drm_sched_entity_fini(&exl->entity); + drm_sched_fini(&exl->sched); + + kfree(exl); +} + +static void execlist_exec_queue_destroy_async(struct work_struct *w) { struct xe_execlist_exec_queue *ee = - container_of(w, struct xe_execlist_exec_queue, fini_async); + container_of(w, struct xe_execlist_exec_queue, destroy_async); struct xe_exec_queue *q = ee->q; struct xe_execlist_exec_queue *exl = q->execlist; struct xe_device *xe = gt_to_xe(q->gt); @@ -401,10 +411,6 @@ static void execlist_exec_queue_fini_async(struct work_struct *w) list_del(&exl->active_link); spin_unlock_irqrestore(&exl->port->lock, flags);
- drm_sched_entity_fini(&exl->entity); - drm_sched_fini(&exl->sched); - kfree(exl); - xe_exec_queue_fini(q); }
@@ -413,10 +419,10 @@ static void execlist_exec_queue_kill(struct xe_exec_queue *q) /* NIY */ }
-static void execlist_exec_queue_fini(struct xe_exec_queue *q) +static void execlist_exec_queue_destroy(struct xe_exec_queue *q) { - INIT_WORK(&q->execlist->fini_async, execlist_exec_queue_fini_async); - queue_work(system_unbound_wq, &q->execlist->fini_async); + INIT_WORK(&q->execlist->destroy_async, execlist_exec_queue_destroy_async); + queue_work(system_unbound_wq, &q->execlist->destroy_async); }
static int execlist_exec_queue_set_priority(struct xe_exec_queue *q, @@ -467,6 +473,7 @@ static const struct xe_exec_queue_ops execlist_exec_queue_ops = { .init = execlist_exec_queue_init, .kill = execlist_exec_queue_kill, .fini = execlist_exec_queue_fini, + .destroy = execlist_exec_queue_destroy, .set_priority = execlist_exec_queue_set_priority, .set_timeslice = execlist_exec_queue_set_timeslice, .set_preempt_timeout = execlist_exec_queue_set_preempt_timeout, diff --git a/drivers/gpu/drm/xe/xe_execlist_types.h b/drivers/gpu/drm/xe/xe_execlist_types.h index 415140936f11d..92c4ba52db0cb 100644 --- a/drivers/gpu/drm/xe/xe_execlist_types.h +++ b/drivers/gpu/drm/xe/xe_execlist_types.h @@ -42,7 +42,7 @@ struct xe_execlist_exec_queue {
bool has_run;
- struct work_struct fini_async; + struct work_struct destroy_async;
enum xe_exec_queue_priority active_priority; struct list_head active_link; diff --git a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h index a3f421e2adc03..c30c0e3ccbbb9 100644 --- a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h +++ b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h @@ -35,8 +35,8 @@ struct xe_guc_exec_queue { struct xe_sched_msg static_msgs[MAX_STATIC_MSG_TYPE]; /** @lr_tdr: long running TDR worker */ struct work_struct lr_tdr; - /** @fini_async: do final fini async from this worker */ - struct work_struct fini_async; + /** @destroy_async: do final destroy async from this worker */ + struct work_struct destroy_async; /** @resume_time: time of last resume */ u64 resume_time; /** @state: GuC specific state for this xe_exec_queue */ diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index ef3e9e1588f7c..45a21af126927 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -1269,48 +1269,57 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job) return DRM_GPU_SCHED_STAT_NOMINAL; }
-static void __guc_exec_queue_fini_async(struct work_struct *w) +static void guc_exec_queue_fini(struct xe_exec_queue *q) +{ + struct xe_guc_exec_queue *ge = q->guc; + struct xe_guc *guc = exec_queue_to_guc(q); + + release_guc_id(guc, q); + xe_sched_entity_fini(&ge->entity); + xe_sched_fini(&ge->sched); + + /* + * RCU free due sched being exported via DRM scheduler fences + * (timeline name). + */ + kfree_rcu(ge, rcu); +} + +static void __guc_exec_queue_destroy_async(struct work_struct *w) { struct xe_guc_exec_queue *ge = - container_of(w, struct xe_guc_exec_queue, fini_async); + container_of(w, struct xe_guc_exec_queue, destroy_async); struct xe_exec_queue *q = ge->q; struct xe_guc *guc = exec_queue_to_guc(q);
xe_pm_runtime_get(guc_to_xe(guc)); trace_xe_exec_queue_destroy(q);
- release_guc_id(guc, q); if (xe_exec_queue_is_lr(q)) cancel_work_sync(&ge->lr_tdr); /* Confirm no work left behind accessing device structures */ cancel_delayed_work_sync(&ge->sched.base.work_tdr); - xe_sched_entity_fini(&ge->entity); - xe_sched_fini(&ge->sched);
- /* - * RCU free due sched being exported via DRM scheduler fences - * (timeline name). - */ - kfree_rcu(ge, rcu); xe_exec_queue_fini(q); + xe_pm_runtime_put(guc_to_xe(guc)); }
-static void guc_exec_queue_fini_async(struct xe_exec_queue *q) +static void guc_exec_queue_destroy_async(struct xe_exec_queue *q) { struct xe_guc *guc = exec_queue_to_guc(q); struct xe_device *xe = guc_to_xe(guc);
- INIT_WORK(&q->guc->fini_async, __guc_exec_queue_fini_async); + INIT_WORK(&q->guc->destroy_async, __guc_exec_queue_destroy_async);
/* We must block on kernel engines so slabs are empty on driver unload */ if (q->flags & EXEC_QUEUE_FLAG_PERMANENT || exec_queue_wedged(q)) - __guc_exec_queue_fini_async(&q->guc->fini_async); + __guc_exec_queue_destroy_async(&q->guc->destroy_async); else - queue_work(xe->destroy_wq, &q->guc->fini_async); + queue_work(xe->destroy_wq, &q->guc->destroy_async); }
-static void __guc_exec_queue_fini(struct xe_guc *guc, struct xe_exec_queue *q) +static void __guc_exec_queue_destroy(struct xe_guc *guc, struct xe_exec_queue *q) { /* * Might be done from within the GPU scheduler, need to do async as we @@ -1319,7 +1328,7 @@ static void __guc_exec_queue_fini(struct xe_guc *guc, struct xe_exec_queue *q) * this we and don't really care when everything is fini'd, just that it * is. */ - guc_exec_queue_fini_async(q); + guc_exec_queue_destroy_async(q); }
static void __guc_exec_queue_process_msg_cleanup(struct xe_sched_msg *msg) @@ -1333,7 +1342,7 @@ static void __guc_exec_queue_process_msg_cleanup(struct xe_sched_msg *msg) if (exec_queue_registered(q)) disable_scheduling_deregister(guc, q); else - __guc_exec_queue_fini(guc, q); + __guc_exec_queue_destroy(guc, q); }
static bool guc_exec_queue_allowed_to_change_state(struct xe_exec_queue *q) @@ -1566,14 +1575,14 @@ static bool guc_exec_queue_try_add_msg(struct xe_exec_queue *q, #define STATIC_MSG_CLEANUP 0 #define STATIC_MSG_SUSPEND 1 #define STATIC_MSG_RESUME 2 -static void guc_exec_queue_fini(struct xe_exec_queue *q) +static void guc_exec_queue_destroy(struct xe_exec_queue *q) { struct xe_sched_msg *msg = q->guc->static_msgs + STATIC_MSG_CLEANUP;
if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && !exec_queue_wedged(q)) guc_exec_queue_add_msg(q, msg, CLEANUP); else - __guc_exec_queue_fini(exec_queue_to_guc(q), q); + __guc_exec_queue_destroy(exec_queue_to_guc(q), q); }
static int guc_exec_queue_set_priority(struct xe_exec_queue *q, @@ -1703,6 +1712,7 @@ static const struct xe_exec_queue_ops guc_exec_queue_ops = { .init = guc_exec_queue_init, .kill = guc_exec_queue_kill, .fini = guc_exec_queue_fini, + .destroy = guc_exec_queue_destroy, .set_priority = guc_exec_queue_set_priority, .set_timeslice = guc_exec_queue_set_timeslice, .set_preempt_timeout = guc_exec_queue_set_preempt_timeout, @@ -1724,7 +1734,7 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q) if (exec_queue_extra_ref(q) || xe_exec_queue_is_lr(q)) xe_exec_queue_put(q); else if (exec_queue_destroyed(q)) - __guc_exec_queue_fini(guc, q); + __guc_exec_queue_destroy(guc, q); } if (q->guc->suspend_pending) { set_exec_queue_suspended(q); @@ -1981,7 +1991,7 @@ static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q) if (exec_queue_extra_ref(q) || xe_exec_queue_is_lr(q)) xe_exec_queue_put(q); else - __guc_exec_queue_fini(guc, q); + __guc_exec_queue_destroy(guc, q); }
int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
[ Upstream commit a7ffcea8631af91479cab10aa7fbfd0722f01d9a ]
On newer HW (Xe2 onwards + PVC) it is possible to get extra information when a CAT error occurs, specifically a dword reporting the error type. To enable this extra reporting, we need to opt-in with the GuC, which is done via a specific per-VF feature opt-in H2G.
On platforms where the HW does not support the extra reporting, the GuC will set the type to 0xdeadbeef, so we can keep the code simple and opt-in to the feature on every platform and then just discard the data if it is invalid.
Note that on native/PF we're guaranteed that the opt in is available because we don't support any GuC old enough to not have it, but if we're a VF we might be running on a non-XE PF with an older GuC, so we need to handle that case. We can re-use the invalid type above to handle this scenario the same way as if the feature was not supported in HW.
Given that this patch is the first user of the guc_buf_cache on native and VF, it also extends that feature to non-PF use-cases.
v2: simpler print for the error type (John), rebase v3: use guc_buf_cache instead of new alloc, simpler doc (Michal)
Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: Nirmoy Das nirmoy.das@intel.com Cc: John Harrison John.C.Harrison@Intel.com Cc: Michal Wajdeczko michal.wajdeczko@intel.com Reviewed-by: Nirmoy Das nirmoy.das@intel.com #v1 Reviewed-by: Michal Wajdeczko michal.wajdeczko@intel.com Reviewed-by: John Harrison John.C.Harrison@Intel.com Link: https://lore.kernel.org/r/20250625205405.1653212-3-daniele.ceraolospurio@int... Stable-dep-of: 26caeae9fb48 ("drm/xe/guc: Set RCS/CCS yield policy") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/abi/guc_actions_abi.h | 4 ++ drivers/gpu/drm/xe/abi/guc_klvs_abi.h | 15 +++++++ drivers/gpu/drm/xe/xe_guc.c | 56 ++++++++++++++++++++++++ drivers/gpu/drm/xe/xe_guc.h | 1 + drivers/gpu/drm/xe/xe_guc_submit.c | 21 +++++++-- drivers/gpu/drm/xe/xe_uc.c | 4 ++ 6 files changed, 98 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h index 448afb86e05c7..b55d4cfb483a1 100644 --- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h @@ -142,6 +142,7 @@ enum xe_guc_action { XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A, XE_GUC_ACTION_SET_DEVICE_ENGINE_ACTIVITY_BUFFER = 0x550C, XE_GUC_ACTION_SET_FUNCTION_ENGINE_ACTIVITY_BUFFER = 0x550D, + XE_GUC_ACTION_OPT_IN_FEATURE_KLV = 0x550E, XE_GUC_ACTION_NOTIFY_MEMORY_CAT_ERROR = 0x6000, XE_GUC_ACTION_REPORT_PAGE_FAULT_REQ_DESC = 0x6002, XE_GUC_ACTION_PAGE_FAULT_RES_DESC = 0x6003, @@ -240,4 +241,7 @@ enum xe_guc_g2g_type { #define XE_G2G_DEREGISTER_TILE REG_GENMASK(15, 12) #define XE_G2G_DEREGISTER_TYPE REG_GENMASK(11, 8)
+/* invalid type for XE_GUC_ACTION_NOTIFY_MEMORY_CAT_ERROR */ +#define XE_GUC_CAT_ERR_TYPE_INVALID 0xdeadbeef + #endif diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h index 7de8f827281fc..5b2502bec2dcc 100644 --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h @@ -16,6 +16,7 @@ * +===+=======+==============================================================+ * | 0 | 31:16 | **KEY** - KLV key identifier | * | | | - `GuC Self Config KLVs`_ | + * | | | - `GuC Opt In Feature KLVs`_ | * | | | - `GuC VGT Policy KLVs`_ | * | | | - `GuC VF Configuration KLVs`_ | * | | | | @@ -124,6 +125,20 @@ enum { GUC_CONTEXT_POLICIES_KLV_NUM_IDS = 5, };
+/** + * DOC: GuC Opt In Feature KLVs + * + * `GuC KLV`_ keys available for use with OPT_IN_FEATURE_KLV + * + * _`GUC_KLV_OPT_IN_FEATURE_EXT_CAT_ERR_TYPE` : 0x4001 + * Adds an extra dword to the XE_GUC_ACTION_NOTIFY_MEMORY_CAT_ERROR G2H + * containing the type of the CAT error. On HW that does not support + * reporting the CAT error type, the extra dword is set to 0xdeadbeef. + */ + +#define GUC_KLV_OPT_IN_FEATURE_EXT_CAT_ERR_TYPE_KEY 0x4001 +#define GUC_KLV_OPT_IN_FEATURE_EXT_CAT_ERR_TYPE_LEN 0u + /** * DOC: GuC VGT Policy KLVs * diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index bac5471a1a780..2efc0298e1a4c 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -29,6 +29,7 @@ #include "xe_guc_db_mgr.h" #include "xe_guc_engine_activity.h" #include "xe_guc_hwconfig.h" +#include "xe_guc_klv_helpers.h" #include "xe_guc_log.h" #include "xe_guc_pc.h" #include "xe_guc_relay.h" @@ -570,6 +571,57 @@ static int guc_g2g_start(struct xe_guc *guc) return err; }
+static int __guc_opt_in_features_enable(struct xe_guc *guc, u64 addr, u32 num_dwords) +{ + u32 action[] = { + XE_GUC_ACTION_OPT_IN_FEATURE_KLV, + lower_32_bits(addr), + upper_32_bits(addr), + num_dwords + }; + + return xe_guc_ct_send_block(&guc->ct, action, ARRAY_SIZE(action)); +} + +#define OPT_IN_MAX_DWORDS 16 +int xe_guc_opt_in_features_enable(struct xe_guc *guc) +{ + struct xe_device *xe = guc_to_xe(guc); + CLASS(xe_guc_buf, buf)(&guc->buf, OPT_IN_MAX_DWORDS); + u32 count = 0; + u32 *klvs; + int ret; + + if (!xe_guc_buf_is_valid(buf)) + return -ENOBUFS; + + klvs = xe_guc_buf_cpu_ptr(buf); + + /* + * The extra CAT error type opt-in was added in GuC v70.17.0, which maps + * to compatibility version v1.7.0. + * Note that the GuC allows enabling this KLV even on platforms that do + * not support the extra type; in such case the returned type variable + * will be set to a known invalid value which we can check against. + */ + if (GUC_SUBMIT_VER(guc) >= MAKE_GUC_VER(1, 7, 0)) + klvs[count++] = PREP_GUC_KLV_TAG(OPT_IN_FEATURE_EXT_CAT_ERR_TYPE); + + if (count) { + xe_assert(xe, count <= OPT_IN_MAX_DWORDS); + + ret = __guc_opt_in_features_enable(guc, xe_guc_buf_flush(buf), count); + if (ret < 0) { + xe_gt_err(guc_to_gt(guc), + "failed to enable GuC opt-in features: %pe\n", + ERR_PTR(ret)); + return ret; + } + } + + return 0; +} + static void guc_fini_hw(void *arg) { struct xe_guc *guc = arg; @@ -763,6 +815,10 @@ int xe_guc_post_load_init(struct xe_guc *guc)
xe_guc_ads_populate_post_load(&guc->ads);
+ ret = xe_guc_opt_in_features_enable(guc); + if (ret) + return ret; + if (xe_guc_g2g_wanted(guc_to_xe(guc))) { ret = guc_g2g_start(guc); if (ret) diff --git a/drivers/gpu/drm/xe/xe_guc.h b/drivers/gpu/drm/xe/xe_guc.h index 58338be445585..4a66575f017d2 100644 --- a/drivers/gpu/drm/xe/xe_guc.h +++ b/drivers/gpu/drm/xe/xe_guc.h @@ -33,6 +33,7 @@ int xe_guc_reset(struct xe_guc *guc); int xe_guc_upload(struct xe_guc *guc); int xe_guc_min_load_for_hwconfig(struct xe_guc *guc); int xe_guc_enable_communication(struct xe_guc *guc); +int xe_guc_opt_in_features_enable(struct xe_guc *guc); int xe_guc_suspend(struct xe_guc *guc); void xe_guc_notify(struct xe_guc *guc); int xe_guc_auth_huc(struct xe_guc *guc, u32 rsa_addr); diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index 45a21af126927..e670dcb0f0932 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -2088,12 +2088,16 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg, struct xe_gt *gt = guc_to_gt(guc); struct xe_exec_queue *q; u32 guc_id; + u32 type = XE_GUC_CAT_ERR_TYPE_INVALID;
- if (unlikely(len < 1)) + if (unlikely(!len || len > 2)) return -EPROTO;
guc_id = msg[0];
+ if (len == 2) + type = msg[1]; + if (guc_id == GUC_ID_UNKNOWN) { /* * GuC uses GUC_ID_UNKNOWN if it can not map the CAT fault to any PF/VF @@ -2107,8 +2111,19 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg, if (unlikely(!q)) return -EPROTO;
- xe_gt_dbg(gt, "Engine memory cat error: engine_class=%s, logical_mask: 0x%x, guc_id=%d", - xe_hw_engine_class_to_str(q->class), q->logical_mask, guc_id); + /* + * The type is HW-defined and changes based on platform, so we don't + * decode it in the kernel and only check if it is valid. + * See bspec 54047 and 72187 for details. + */ + if (type != XE_GUC_CAT_ERR_TYPE_INVALID) + xe_gt_dbg(gt, + "Engine memory CAT error [%u]: class=%s, logical_mask: 0x%x, guc_id=%d", + type, xe_hw_engine_class_to_str(q->class), q->logical_mask, guc_id); + else + xe_gt_dbg(gt, + "Engine memory CAT error: class=%s, logical_mask: 0x%x, guc_id=%d", + xe_hw_engine_class_to_str(q->class), q->logical_mask, guc_id);
trace_xe_exec_queue_memory_cat_error(q);
diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c index 3a8751a8b92dd..5c45b0f072a4c 100644 --- a/drivers/gpu/drm/xe/xe_uc.c +++ b/drivers/gpu/drm/xe/xe_uc.c @@ -165,6 +165,10 @@ static int vf_uc_init_hw(struct xe_uc *uc)
uc->guc.submission_state.enabled = true;
+ err = xe_guc_opt_in_features_enable(&uc->guc); + if (err) + return err; + err = xe_gt_record_default_lrcs(uc_to_gt(uc)); if (err) return err;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
[ Upstream commit 26caeae9fb482ec443753b4e3307e5122b60b850 ]
All recent platforms (including all the ones officially supported by the Xe driver) do not allow concurrent execution of RCS and CCS workloads from different address spaces, with the HW blocking the context switch when it detects such a scenario. The DUAL_QUEUE flag helps with this, by causing the GuC to not submit a context it knows will not be able to execute. This, however, causes a new problem: if RCS and CCS queues have pending workloads from different address spaces, the GuC needs to choose from which of the 2 queues to pick the next workload to execute. By default, the GuC prioritizes RCS submissions over CCS ones, which can lead to CCS workloads being significantly (or completely) starved of execution time. The driver can tune this by setting a dedicated scheduling policy KLV; this KLV allows the driver to specify a quantum (in ms) and a ratio (percentage value between 0 and 100), and the GuC will prioritize the CCS for that percentage of each quantum. Given that we want to guarantee enough RCS throughput to avoid missing frames, we set the yield policy to 20% of each 80ms interval.
v2: updated quantum and ratio, improved comment, use xe_guc_submit_disable in gt_sanitize
Fixes: d9a1ae0d17bd ("drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms") Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: Matthew Brost matthew.brost@intel.com Cc: John Harrison John.C.Harrison@Intel.com Cc: Vinay Belgaumkar vinay.belgaumkar@intel.com Reviewed-by: John Harrison John.C.Harrison@Intel.com Tested-by: Vinay Belgaumkar vinay.belgaumkar@intel.com Link: https://lore.kernel.org/r/20250905235632.3333247-2-daniele.ceraolospurio@int... (cherry picked from commit 88434448438e4302e272b2a2b810b42e05ea024b) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com [Rodrigo added #include "xe_guc_submit.h" while backporting] Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/abi/guc_actions_abi.h | 1 + drivers/gpu/drm/xe/abi/guc_klvs_abi.h | 25 +++++++++ drivers/gpu/drm/xe/xe_gt.c | 3 +- drivers/gpu/drm/xe/xe_guc.c | 6 +-- drivers/gpu/drm/xe/xe_guc_submit.c | 66 ++++++++++++++++++++++++ drivers/gpu/drm/xe/xe_guc_submit.h | 2 + 6 files changed, 98 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h index b55d4cfb483a1..4d9896e14649c 100644 --- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h @@ -117,6 +117,7 @@ enum xe_guc_action { XE_GUC_ACTION_ENTER_S_STATE = 0x501, XE_GUC_ACTION_EXIT_S_STATE = 0x502, XE_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE = 0x506, + XE_GUC_ACTION_UPDATE_SCHEDULING_POLICIES_KLV = 0x509, XE_GUC_ACTION_SCHED_CONTEXT = 0x1000, XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET = 0x1001, XE_GUC_ACTION_SCHED_CONTEXT_MODE_DONE = 0x1002, diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h index 5b2502bec2dcc..89034bc97ec5a 100644 --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h @@ -17,6 +17,7 @@ * | 0 | 31:16 | **KEY** - KLV key identifier | * | | | - `GuC Self Config KLVs`_ | * | | | - `GuC Opt In Feature KLVs`_ | + * | | | - `GuC Scheduling Policies KLVs`_ | * | | | - `GuC VGT Policy KLVs`_ | * | | | - `GuC VF Configuration KLVs`_ | * | | | | @@ -139,6 +140,30 @@ enum { #define GUC_KLV_OPT_IN_FEATURE_EXT_CAT_ERR_TYPE_KEY 0x4001 #define GUC_KLV_OPT_IN_FEATURE_EXT_CAT_ERR_TYPE_LEN 0u
+/** + * DOC: GuC Scheduling Policies KLVs + * + * `GuC KLV`_ keys available for use with UPDATE_SCHEDULING_POLICIES_KLV. + * + * _`GUC_KLV_SCHEDULING_POLICIES_RENDER_COMPUTE_YIELD` : 0x1001 + * Some platforms do not allow concurrent execution of RCS and CCS + * workloads from different address spaces. By default, the GuC prioritizes + * RCS submissions over CCS ones, which can lead to CCS workloads being + * significantly (or completely) starved of execution time. This KLV allows + * the driver to specify a quantum (in ms) and a ratio (percentage value + * between 0 and 100), and the GuC will prioritize the CCS for that + * percentage of each quantum. For example, specifying 100ms and 30% will + * make the GuC prioritize the CCS for 30ms of every 100ms. + * Note that this does not necessarly mean that RCS and CCS engines will + * only be active for their percentage of the quantum, as the restriction + * only kicks in if both classes are fully busy with non-compatible address + * spaces; i.e., if one engine is idle or running the same address space, + * a pending job on the other engine will still be submitted to the HW no + * matter what the ratio is + */ +#define GUC_KLV_SCHEDULING_POLICIES_RENDER_COMPUTE_YIELD_KEY 0x1001 +#define GUC_KLV_SCHEDULING_POLICIES_RENDER_COMPUTE_YIELD_LEN 2u + /** * DOC: GuC VGT Policy KLVs * diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c index e3517ce2e18c1..eaf7569a7c1d1 100644 --- a/drivers/gpu/drm/xe/xe_gt.c +++ b/drivers/gpu/drm/xe/xe_gt.c @@ -41,6 +41,7 @@ #include "xe_gt_topology.h" #include "xe_guc_exec_queue_types.h" #include "xe_guc_pc.h" +#include "xe_guc_submit.h" #include "xe_hw_fence.h" #include "xe_hw_engine_class_sysfs.h" #include "xe_irq.h" @@ -97,7 +98,7 @@ void xe_gt_sanitize(struct xe_gt *gt) * FIXME: if xe_uc_sanitize is called here, on TGL driver will not * reload */ - gt->uc.guc.submission_state.enabled = false; + xe_guc_submit_disable(>->uc.guc); }
static void xe_gt_enable_host_l2_vram(struct xe_gt *gt) diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 2efc0298e1a4c..b9d21fdaad48b 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -825,9 +825,7 @@ int xe_guc_post_load_init(struct xe_guc *guc) return ret; }
- guc->submission_state.enabled = true; - - return 0; + return xe_guc_submit_enable(guc); }
int xe_guc_reset(struct xe_guc *guc) @@ -1521,7 +1519,7 @@ void xe_guc_sanitize(struct xe_guc *guc) { xe_uc_fw_sanitize(&guc->fw); xe_guc_ct_disable(&guc->ct); - guc->submission_state.enabled = false; + xe_guc_submit_disable(guc); }
int xe_guc_reset_prepare(struct xe_guc *guc) diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index e670dcb0f0932..18ddbb7b98a15 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -32,6 +32,7 @@ #include "xe_guc_ct.h" #include "xe_guc_exec_queue_types.h" #include "xe_guc_id_mgr.h" +#include "xe_guc_klv_helpers.h" #include "xe_guc_submit_types.h" #include "xe_hw_engine.h" #include "xe_hw_fence.h" @@ -316,6 +317,71 @@ int xe_guc_submit_init(struct xe_guc *guc, unsigned int num_ids) return drmm_add_action_or_reset(&xe->drm, guc_submit_fini, guc); }
+/* + * Given that we want to guarantee enough RCS throughput to avoid missing + * frames, we set the yield policy to 20% of each 80ms interval. + */ +#define RC_YIELD_DURATION 80 /* in ms */ +#define RC_YIELD_RATIO 20 /* in percent */ +static u32 *emit_render_compute_yield_klv(u32 *emit) +{ + *emit++ = PREP_GUC_KLV_TAG(SCHEDULING_POLICIES_RENDER_COMPUTE_YIELD); + *emit++ = RC_YIELD_DURATION; + *emit++ = RC_YIELD_RATIO; + + return emit; +} + +#define SCHEDULING_POLICY_MAX_DWORDS 16 +static int guc_init_global_schedule_policy(struct xe_guc *guc) +{ + u32 data[SCHEDULING_POLICY_MAX_DWORDS]; + u32 *emit = data; + u32 count = 0; + int ret; + + if (GUC_SUBMIT_VER(guc) < MAKE_GUC_VER(1, 1, 0)) + return 0; + + *emit++ = XE_GUC_ACTION_UPDATE_SCHEDULING_POLICIES_KLV; + + if (CCS_MASK(guc_to_gt(guc))) + emit = emit_render_compute_yield_klv(emit); + + count = emit - data; + if (count > 1) { + xe_assert(guc_to_xe(guc), count <= SCHEDULING_POLICY_MAX_DWORDS); + + ret = xe_guc_ct_send_block(&guc->ct, data, count); + if (ret < 0) { + xe_gt_err(guc_to_gt(guc), + "failed to enable GuC sheduling policies: %pe\n", + ERR_PTR(ret)); + return ret; + } + } + + return 0; +} + +int xe_guc_submit_enable(struct xe_guc *guc) +{ + int ret; + + ret = guc_init_global_schedule_policy(guc); + if (ret) + return ret; + + guc->submission_state.enabled = true; + + return 0; +} + +void xe_guc_submit_disable(struct xe_guc *guc) +{ + guc->submission_state.enabled = false; +} + static void __release_guc_id(struct xe_guc *guc, struct xe_exec_queue *q, u32 xa_count) { int i; diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h index 9b71a986c6ca6..0d126b807c104 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.h +++ b/drivers/gpu/drm/xe/xe_guc_submit.h @@ -13,6 +13,8 @@ struct xe_exec_queue; struct xe_guc;
int xe_guc_submit_init(struct xe_guc *guc, unsigned int num_ids); +int xe_guc_submit_enable(struct xe_guc *guc); +void xe_guc_submit_disable(struct xe_guc *guc);
int xe_guc_submit_reset_prepare(struct xe_guc *guc); void xe_guc_submit_reset_wait(struct xe_guc *guc);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Metzmacher metze@samba.org
[ Upstream commit 33dd53a90e3419ea260e9ff2b4aa107385cdf7fa ]
The expected message type can be global as they never change during the after negotiation process.
This will replace smbd_response->type and smb_direct_recvmsg->type in future.
Cc: Steve French smfrench@gmail.com Cc: Tom Talpey tom@talpey.com Cc: Long Li longli@microsoft.com Cc: Namjae Jeon linkinjeon@kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher metze@samba.org Signed-off-by: Steve French stfrench@microsoft.com Stable-dep-of: f57e53ea2523 ("smb: client: let recv_done verify data_offset, data_length and remaining_data_length") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/common/smbdirect/smbdirect_socket.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/fs/smb/common/smbdirect/smbdirect_socket.h b/fs/smb/common/smbdirect/smbdirect_socket.h index e5b15cc44a7ba..5db7815b614f8 100644 --- a/fs/smb/common/smbdirect/smbdirect_socket.h +++ b/fs/smb/common/smbdirect/smbdirect_socket.h @@ -38,6 +38,20 @@ struct smbdirect_socket { } ib;
struct smbdirect_socket_parameters parameters; + + /* + * The state for posted receive buffers + */ + struct { + /* + * The type of PDU we are expecting + */ + enum { + SMBDIRECT_EXPECT_NEGOTIATE_REQ = 1, + SMBDIRECT_EXPECT_NEGOTIATE_REP = 2, + SMBDIRECT_EXPECT_DATA_TRANSFER = 3, + } expected; + } recv_io; };
#endif /* __FS_SMB_COMMON_SMBDIRECT_SMBDIRECT_SOCKET_H__ */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Metzmacher metze@samba.org
[ Upstream commit bbdbd9ae47155da65aa0c1641698a44d85c2faa2 ]
The expected incoming message type can be per connection.
Cc: Steve French smfrench@gmail.com Cc: Tom Talpey tom@talpey.com Cc: Long Li longli@microsoft.com Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher metze@samba.org Signed-off-by: Steve French stfrench@microsoft.com Stable-dep-of: f57e53ea2523 ("smb: client: let recv_done verify data_offset, data_length and remaining_data_length") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/smbdirect.c | 22 ++++++++++++++-------- fs/smb/client/smbdirect.h | 7 ------- 2 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c index b9bb531717a65..a6aa2c609dc3b 100644 --- a/fs/smb/client/smbdirect.c +++ b/fs/smb/client/smbdirect.c @@ -383,6 +383,7 @@ static bool process_negotiation_response( info->max_frmr_depth * PAGE_SIZE); info->max_frmr_depth = sp->max_read_write_size / PAGE_SIZE;
+ sc->recv_io.expected = SMBDIRECT_EXPECT_DATA_TRANSFER; return true; }
@@ -408,7 +409,6 @@ static void smbd_post_send_credits(struct work_struct *work) if (!response) break;
- response->type = SMBD_TRANSFER_DATA; response->first_segment = false; rc = smbd_post_recv(info, response); if (rc) { @@ -445,10 +445,11 @@ static void recv_done(struct ib_cq *cq, struct ib_wc *wc) struct smbd_response *response = container_of(wc->wr_cqe, struct smbd_response, cqe); struct smbd_connection *info = response->info; + struct smbdirect_socket *sc = &info->socket; int data_length = 0;
log_rdma_recv(INFO, "response=0x%p type=%d wc status=%d wc opcode %d byte_len=%d pkey_index=%u\n", - response, response->type, wc->status, wc->opcode, + response, sc->recv_io.expected, wc->status, wc->opcode, wc->byte_len, wc->pkey_index);
if (wc->status != IB_WC_SUCCESS || wc->opcode != IB_WC_RECV) { @@ -463,9 +464,9 @@ static void recv_done(struct ib_cq *cq, struct ib_wc *wc) response->sge.length, DMA_FROM_DEVICE);
- switch (response->type) { + switch (sc->recv_io.expected) { /* SMBD negotiation response */ - case SMBD_NEGOTIATE_RESP: + case SMBDIRECT_EXPECT_NEGOTIATE_REP: dump_smbdirect_negotiate_resp(smbd_response_payload(response)); info->full_packet_received = true; info->negotiate_done = @@ -475,7 +476,7 @@ static void recv_done(struct ib_cq *cq, struct ib_wc *wc) return;
/* SMBD data transfer packet */ - case SMBD_TRANSFER_DATA: + case SMBDIRECT_EXPECT_DATA_TRANSFER: data_transfer = smbd_response_payload(response); data_length = le32_to_cpu(data_transfer->data_length);
@@ -526,13 +527,17 @@ static void recv_done(struct ib_cq *cq, struct ib_wc *wc) put_receive_buffer(info, response);
return; + + case SMBDIRECT_EXPECT_NEGOTIATE_REQ: + /* Only server... */ + break; }
/* * This is an internal error! */ - log_rdma_recv(ERR, "unexpected response type=%d\n", response->type); - WARN_ON_ONCE(response->type != SMBD_TRANSFER_DATA); + log_rdma_recv(ERR, "unexpected response type=%d\n", sc->recv_io.expected); + WARN_ON_ONCE(sc->recv_io.expected != SMBDIRECT_EXPECT_DATA_TRANSFER); error: put_receive_buffer(info, response); smbd_disconnect_rdma_connection(info); @@ -1067,10 +1072,11 @@ static int smbd_post_recv( /* Perform SMBD negotiate according to [MS-SMBD] 3.1.5.2 */ static int smbd_negotiate(struct smbd_connection *info) { + struct smbdirect_socket *sc = &info->socket; int rc; struct smbd_response *response = get_receive_buffer(info);
- response->type = SMBD_NEGOTIATE_RESP; + sc->recv_io.expected = SMBDIRECT_EXPECT_NEGOTIATE_REP; rc = smbd_post_recv(info, response); log_rdma_event(INFO, "smbd_post_recv rc=%d iov.addr=0x%llx iov.length=%u iov.lkey=0x%x\n", rc, response->sge.addr, diff --git a/fs/smb/client/smbdirect.h b/fs/smb/client/smbdirect.h index ea04ce8a9763a..bf50544eaf02d 100644 --- a/fs/smb/client/smbdirect.h +++ b/fs/smb/client/smbdirect.h @@ -157,11 +157,6 @@ struct smbd_connection { unsigned int count_send_empty; };
-enum smbd_message_type { - SMBD_NEGOTIATE_RESP, - SMBD_TRANSFER_DATA, -}; - /* Maximum number of SGEs used by smbdirect.c in any send work request */ #define SMBDIRECT_MAX_SEND_SGE 6
@@ -187,8 +182,6 @@ struct smbd_response { struct ib_cqe cqe; struct ib_sge sge;
- enum smbd_message_type type; - /* Link to receive queue or reassembly queue */ struct list_head list;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Metzmacher metze@samba.org
[ Upstream commit 60812d20da82606f0620904c281579a9af0ab452 ]
This will be used in client and server soon in order to replace smbd_response/smb_direct_recvmsg.
Cc: Steve French smfrench@gmail.com Cc: Tom Talpey tom@talpey.com Cc: Long Li longli@microsoft.com Cc: Namjae Jeon linkinjeon@kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher metze@samba.org Signed-off-by: Steve French stfrench@microsoft.com Stable-dep-of: f57e53ea2523 ("smb: client: let recv_done verify data_offset, data_length and remaining_data_length") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/common/smbdirect/smbdirect_socket.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
diff --git a/fs/smb/common/smbdirect/smbdirect_socket.h b/fs/smb/common/smbdirect/smbdirect_socket.h index 5db7815b614f8..a7ad31c471a7b 100644 --- a/fs/smb/common/smbdirect/smbdirect_socket.h +++ b/fs/smb/common/smbdirect/smbdirect_socket.h @@ -54,4 +54,19 @@ struct smbdirect_socket { } recv_io; };
+struct smbdirect_recv_io { + struct smbdirect_socket *socket; + struct ib_cqe cqe; + struct ib_sge sge; + + /* Link to free or reassembly list */ + struct list_head list; + + /* Indicate if this is the 1st packet of a payload */ + bool first_segment; + + /* SMBD packet header and payload follows this structure */ + u8 packet[]; +}; + #endif /* __FS_SMB_COMMON_SMBDIRECT_SMBDIRECT_SOCKET_H__ */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Metzmacher metze@samba.org
[ Upstream commit 5dddf0497445d247e995306daf3b76dd0633831c ]
This is the shared structure that will be used in the server too and will allow us to move helper functions into common code soon.
Cc: Steve French smfrench@gmail.com Cc: Tom Talpey tom@talpey.com Cc: Long Li longli@microsoft.com Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher metze@samba.org Signed-off-by: Steve French stfrench@microsoft.com Stable-dep-of: f57e53ea2523 ("smb: client: let recv_done verify data_offset, data_length and remaining_data_length") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/smbdirect.c | 79 ++++++++++++++++++++------------------- fs/smb/client/smbdirect.h | 16 -------- 2 files changed, 41 insertions(+), 54 deletions(-)
diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c index a6aa2c609dc3b..18702c67c8484 100644 --- a/fs/smb/client/smbdirect.c +++ b/fs/smb/client/smbdirect.c @@ -13,23 +13,23 @@ #include "cifsproto.h" #include "smb2proto.h"
-static struct smbd_response *get_receive_buffer( +static struct smbdirect_recv_io *get_receive_buffer( struct smbd_connection *info); static void put_receive_buffer( struct smbd_connection *info, - struct smbd_response *response); + struct smbdirect_recv_io *response); static int allocate_receive_buffers(struct smbd_connection *info, int num_buf); static void destroy_receive_buffers(struct smbd_connection *info);
static void enqueue_reassembly( struct smbd_connection *info, - struct smbd_response *response, int data_length); -static struct smbd_response *_get_first_reassembly( + struct smbdirect_recv_io *response, int data_length); +static struct smbdirect_recv_io *_get_first_reassembly( struct smbd_connection *info);
static int smbd_post_recv( struct smbd_connection *info, - struct smbd_response *response); + struct smbdirect_recv_io *response);
static int smbd_post_send_empty(struct smbd_connection *info);
@@ -260,7 +260,7 @@ static inline void *smbd_request_payload(struct smbd_request *request) return (void *)request->packet; }
-static inline void *smbd_response_payload(struct smbd_response *response) +static inline void *smbdirect_recv_io_payload(struct smbdirect_recv_io *response) { return (void *)response->packet; } @@ -315,12 +315,13 @@ static void dump_smbdirect_negotiate_resp(struct smbdirect_negotiate_resp *resp) * return value: true if negotiation is a success, false if failed */ static bool process_negotiation_response( - struct smbd_response *response, int packet_length) + struct smbdirect_recv_io *response, int packet_length) { - struct smbd_connection *info = response->info; - struct smbdirect_socket *sc = &info->socket; + struct smbdirect_socket *sc = response->socket; + struct smbd_connection *info = + container_of(sc, struct smbd_connection, socket); struct smbdirect_socket_parameters *sp = &sc->parameters; - struct smbdirect_negotiate_resp *packet = smbd_response_payload(response); + struct smbdirect_negotiate_resp *packet = smbdirect_recv_io_payload(response);
if (packet_length < sizeof(struct smbdirect_negotiate_resp)) { log_rdma_event(ERR, @@ -391,7 +392,7 @@ static void smbd_post_send_credits(struct work_struct *work) { int ret = 0; int rc; - struct smbd_response *response; + struct smbdirect_recv_io *response; struct smbd_connection *info = container_of(work, struct smbd_connection, post_send_credits_work); @@ -442,10 +443,11 @@ static void smbd_post_send_credits(struct work_struct *work) static void recv_done(struct ib_cq *cq, struct ib_wc *wc) { struct smbdirect_data_transfer *data_transfer; - struct smbd_response *response = - container_of(wc->wr_cqe, struct smbd_response, cqe); - struct smbd_connection *info = response->info; - struct smbdirect_socket *sc = &info->socket; + struct smbdirect_recv_io *response = + container_of(wc->wr_cqe, struct smbdirect_recv_io, cqe); + struct smbdirect_socket *sc = response->socket; + struct smbd_connection *info = + container_of(sc, struct smbd_connection, socket); int data_length = 0;
log_rdma_recv(INFO, "response=0x%p type=%d wc status=%d wc opcode %d byte_len=%d pkey_index=%u\n", @@ -467,7 +469,7 @@ static void recv_done(struct ib_cq *cq, struct ib_wc *wc) switch (sc->recv_io.expected) { /* SMBD negotiation response */ case SMBDIRECT_EXPECT_NEGOTIATE_REP: - dump_smbdirect_negotiate_resp(smbd_response_payload(response)); + dump_smbdirect_negotiate_resp(smbdirect_recv_io_payload(response)); info->full_packet_received = true; info->negotiate_done = process_negotiation_response(response, wc->byte_len); @@ -477,7 +479,7 @@ static void recv_done(struct ib_cq *cq, struct ib_wc *wc)
/* SMBD data transfer packet */ case SMBDIRECT_EXPECT_DATA_TRANSFER: - data_transfer = smbd_response_payload(response); + data_transfer = smbdirect_recv_io_payload(response); data_length = le32_to_cpu(data_transfer->data_length);
if (data_length) { @@ -1034,7 +1036,7 @@ static int smbd_post_send_full_iter(struct smbd_connection *info, * The interaction is controlled by send/receive credit system */ static int smbd_post_recv( - struct smbd_connection *info, struct smbd_response *response) + struct smbd_connection *info, struct smbdirect_recv_io *response) { struct smbdirect_socket *sc = &info->socket; struct smbdirect_socket_parameters *sp = &sc->parameters; @@ -1074,7 +1076,7 @@ static int smbd_negotiate(struct smbd_connection *info) { struct smbdirect_socket *sc = &info->socket; int rc; - struct smbd_response *response = get_receive_buffer(info); + struct smbdirect_recv_io *response = get_receive_buffer(info);
sc->recv_io.expected = SMBDIRECT_EXPECT_NEGOTIATE_REP; rc = smbd_post_recv(info, response); @@ -1119,7 +1121,7 @@ static int smbd_negotiate(struct smbd_connection *info) */ static void enqueue_reassembly( struct smbd_connection *info, - struct smbd_response *response, + struct smbdirect_recv_io *response, int data_length) { spin_lock(&info->reassembly_queue_lock); @@ -1143,14 +1145,14 @@ static void enqueue_reassembly( * Caller is responsible for locking * return value: the first entry if any, NULL if queue is empty */ -static struct smbd_response *_get_first_reassembly(struct smbd_connection *info) +static struct smbdirect_recv_io *_get_first_reassembly(struct smbd_connection *info) { - struct smbd_response *ret = NULL; + struct smbdirect_recv_io *ret = NULL;
if (!list_empty(&info->reassembly_queue)) { ret = list_first_entry( &info->reassembly_queue, - struct smbd_response, list); + struct smbdirect_recv_io, list); } return ret; } @@ -1161,16 +1163,16 @@ static struct smbd_response *_get_first_reassembly(struct smbd_connection *info) * pre-allocated in advance. * return value: the receive buffer, NULL if none is available */ -static struct smbd_response *get_receive_buffer(struct smbd_connection *info) +static struct smbdirect_recv_io *get_receive_buffer(struct smbd_connection *info) { - struct smbd_response *ret = NULL; + struct smbdirect_recv_io *ret = NULL; unsigned long flags;
spin_lock_irqsave(&info->receive_queue_lock, flags); if (!list_empty(&info->receive_queue)) { ret = list_first_entry( &info->receive_queue, - struct smbd_response, list); + struct smbdirect_recv_io, list); list_del(&ret->list); info->count_receive_queue--; info->count_get_receive_buffer++; @@ -1187,7 +1189,7 @@ static struct smbd_response *get_receive_buffer(struct smbd_connection *info) * receive buffer is returned. */ static void put_receive_buffer( - struct smbd_connection *info, struct smbd_response *response) + struct smbd_connection *info, struct smbdirect_recv_io *response) { struct smbdirect_socket *sc = &info->socket; unsigned long flags; @@ -1212,8 +1214,9 @@ static void put_receive_buffer( /* Preallocate all receive buffer on transport establishment */ static int allocate_receive_buffers(struct smbd_connection *info, int num_buf) { + struct smbdirect_socket *sc = &info->socket; + struct smbdirect_recv_io *response; int i; - struct smbd_response *response;
INIT_LIST_HEAD(&info->reassembly_queue); spin_lock_init(&info->reassembly_queue_lock); @@ -1231,7 +1234,7 @@ static int allocate_receive_buffers(struct smbd_connection *info, int num_buf) if (!response) goto allocate_failed;
- response->info = info; + response->socket = sc; response->sge.length = 0; list_add_tail(&response->list, &info->receive_queue); info->count_receive_queue++; @@ -1243,7 +1246,7 @@ static int allocate_receive_buffers(struct smbd_connection *info, int num_buf) while (!list_empty(&info->receive_queue)) { response = list_first_entry( &info->receive_queue, - struct smbd_response, list); + struct smbdirect_recv_io, list); list_del(&response->list); info->count_receive_queue--;
@@ -1254,7 +1257,7 @@ static int allocate_receive_buffers(struct smbd_connection *info, int num_buf)
static void destroy_receive_buffers(struct smbd_connection *info) { - struct smbd_response *response; + struct smbdirect_recv_io *response;
while ((response = get_receive_buffer(info))) mempool_free(response, info->response_mempool); @@ -1295,7 +1298,7 @@ void smbd_destroy(struct TCP_Server_Info *server) struct smbd_connection *info = server->smbd_conn; struct smbdirect_socket *sc; struct smbdirect_socket_parameters *sp; - struct smbd_response *response; + struct smbdirect_recv_io *response; unsigned long flags;
if (!info) { @@ -1452,17 +1455,17 @@ static int allocate_caches_and_workqueue(struct smbd_connection *info) if (!info->request_mempool) goto out1;
- scnprintf(name, MAX_NAME_LEN, "smbd_response_%p", info); + scnprintf(name, MAX_NAME_LEN, "smbdirect_recv_io_%p", info);
struct kmem_cache_args response_args = { - .align = __alignof__(struct smbd_response), - .useroffset = (offsetof(struct smbd_response, packet) + + .align = __alignof__(struct smbdirect_recv_io), + .useroffset = (offsetof(struct smbdirect_recv_io, packet) + sizeof(struct smbdirect_data_transfer)), .usersize = sp->max_recv_size - sizeof(struct smbdirect_data_transfer), }; info->response_cache = kmem_cache_create(name, - sizeof(struct smbd_response) + sp->max_recv_size, + sizeof(struct smbdirect_recv_io) + sp->max_recv_size, &response_args, SLAB_HWCACHE_ALIGN); if (!info->response_cache) goto out2; @@ -1753,7 +1756,7 @@ struct smbd_connection *smbd_get_connection( int smbd_recv(struct smbd_connection *info, struct msghdr *msg) { struct smbdirect_socket *sc = &info->socket; - struct smbd_response *response; + struct smbdirect_recv_io *response; struct smbdirect_data_transfer *data_transfer; size_t size = iov_iter_count(&msg->msg_iter); int to_copy, to_read, data_read, offset; @@ -1789,7 +1792,7 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg) offset = info->first_entry_offset; while (data_read < size) { response = _get_first_reassembly(info); - data_transfer = smbd_response_payload(response); + data_transfer = smbdirect_recv_io_payload(response); data_length = le32_to_cpu(data_transfer->data_length); remaining_data_length = le32_to_cpu( diff --git a/fs/smb/client/smbdirect.h b/fs/smb/client/smbdirect.h index bf50544eaf02d..d60e445da2256 100644 --- a/fs/smb/client/smbdirect.h +++ b/fs/smb/client/smbdirect.h @@ -176,22 +176,6 @@ struct smbd_request { /* Maximum number of SGEs used by smbdirect.c in any receive work request */ #define SMBDIRECT_MAX_RECV_SGE 1
-/* The context for a SMBD response */ -struct smbd_response { - struct smbd_connection *info; - struct ib_cqe cqe; - struct ib_sge sge; - - /* Link to receive queue or reassembly queue */ - struct list_head list; - - /* Indicate if this is the 1st packet of a payload */ - bool first_segment; - - /* SMBD packet header and payload follows this structure */ - u8 packet[]; -}; - /* Create a SMBDirect session */ struct smbd_connection *smbd_get_connection( struct TCP_Server_Info *server, struct sockaddr *dstaddr);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Metzmacher metze@samba.org
[ Upstream commit f57e53ea252363234f86674db475839e5b87102e ]
This is inspired by the related server fixes.
Cc: Tom Talpey tom@talpey.com Cc: Long Li longli@microsoft.com Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Reviewed-by: Namjae Jeon linkinjeon@kernel.org Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher metze@samba.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/smbdirect.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c index 18702c67c8484..65175ac3d8418 100644 --- a/fs/smb/client/smbdirect.c +++ b/fs/smb/client/smbdirect.c @@ -446,9 +446,12 @@ static void recv_done(struct ib_cq *cq, struct ib_wc *wc) struct smbdirect_recv_io *response = container_of(wc->wr_cqe, struct smbdirect_recv_io, cqe); struct smbdirect_socket *sc = response->socket; + struct smbdirect_socket_parameters *sp = &sc->parameters; struct smbd_connection *info = container_of(sc, struct smbd_connection, socket); - int data_length = 0; + u32 data_offset = 0; + u32 data_length = 0; + u32 remaining_data_length = 0;
log_rdma_recv(INFO, "response=0x%p type=%d wc status=%d wc opcode %d byte_len=%d pkey_index=%u\n", response, sc->recv_io.expected, wc->status, wc->opcode, @@ -480,7 +483,22 @@ static void recv_done(struct ib_cq *cq, struct ib_wc *wc) /* SMBD data transfer packet */ case SMBDIRECT_EXPECT_DATA_TRANSFER: data_transfer = smbdirect_recv_io_payload(response); + + if (wc->byte_len < + offsetof(struct smbdirect_data_transfer, padding)) + goto error; + + remaining_data_length = le32_to_cpu(data_transfer->remaining_data_length); + data_offset = le32_to_cpu(data_transfer->data_offset); data_length = le32_to_cpu(data_transfer->data_length); + if (wc->byte_len < data_offset || + (u64)wc->byte_len < (u64)data_offset + data_length) + goto error; + + if (remaining_data_length > sp->max_fragmented_recv_size || + data_length > sp->max_fragmented_recv_size || + (u64)remaining_data_length + (u64)data_length > (u64)sp->max_fragmented_recv_size) + goto error;
if (data_length) { if (info->full_packet_received)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.org
[ Upstream commit 93ed9a2951308db374cba4562533dde97bac70d3 ]
Fix the following case where the client would end up closing both deferred files (foo.tmp & foo) after unlink(foo) due to strstr() call in cifs_close_deferred_file_under_dentry():
fd1 = openat(AT_FDCWD, "foo", O_WRONLY|O_CREAT|O_TRUNC, 0666); fd2 = openat(AT_FDCWD, "foo.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0666); close(fd1); close(fd2); unlink("foo");
Fixes: e3fc065682eb ("cifs: Deferred close performance improvements") Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.org Reviewed-by: Enzo Matsumiya ematsumiya@suse.de Cc: Frank Sorenson sorenson@redhat.com Cc: David Howells dhowells@redhat.com Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/cifsproto.h | 4 ++-- fs/smb/client/inode.c | 6 +++--- fs/smb/client/misc.c | 38 ++++++++++++++++---------------------- 3 files changed, 21 insertions(+), 27 deletions(-)
diff --git a/fs/smb/client/cifsproto.h b/fs/smb/client/cifsproto.h index 045227ed4efc9..0dcea9acca544 100644 --- a/fs/smb/client/cifsproto.h +++ b/fs/smb/client/cifsproto.h @@ -297,8 +297,8 @@ extern void cifs_close_deferred_file(struct cifsInodeInfo *cifs_inode);
extern void cifs_close_all_deferred_files(struct cifs_tcon *cifs_tcon);
-extern void cifs_close_deferred_file_under_dentry(struct cifs_tcon *cifs_tcon, - const char *path); +void cifs_close_deferred_file_under_dentry(struct cifs_tcon *cifs_tcon, + struct dentry *dentry);
extern void cifs_mark_open_handles_for_deleted_file(struct inode *inode, const char *path); diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c index 11d442e8b3d62..1703f1285d36d 100644 --- a/fs/smb/client/inode.c +++ b/fs/smb/client/inode.c @@ -1984,7 +1984,7 @@ static int __cifs_unlink(struct inode *dir, struct dentry *dentry, bool sillyren }
netfs_wait_for_outstanding_io(inode); - cifs_close_deferred_file_under_dentry(tcon, full_path); + cifs_close_deferred_file_under_dentry(tcon, dentry); #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY if (cap_unix(tcon->ses) && (CIFS_UNIX_POSIX_PATH_OPS_CAP & le64_to_cpu(tcon->fsUnixInfo.Capability))) { @@ -2538,10 +2538,10 @@ cifs_rename2(struct mnt_idmap *idmap, struct inode *source_dir, goto cifs_rename_exit; }
- cifs_close_deferred_file_under_dentry(tcon, from_name); + cifs_close_deferred_file_under_dentry(tcon, source_dentry); if (d_inode(target_dentry) != NULL) { netfs_wait_for_outstanding_io(d_inode(target_dentry)); - cifs_close_deferred_file_under_dentry(tcon, to_name); + cifs_close_deferred_file_under_dentry(tcon, target_dentry); }
rc = cifs_do_rename(xid, source_dentry, from_name, target_dentry, diff --git a/fs/smb/client/misc.c b/fs/smb/client/misc.c index da23cc12a52ca..dda6dece802ad 100644 --- a/fs/smb/client/misc.c +++ b/fs/smb/client/misc.c @@ -832,33 +832,28 @@ cifs_close_all_deferred_files(struct cifs_tcon *tcon) kfree(tmp_list); } } -void -cifs_close_deferred_file_under_dentry(struct cifs_tcon *tcon, const char *path) + +void cifs_close_deferred_file_under_dentry(struct cifs_tcon *tcon, + struct dentry *dentry) { - struct cifsFileInfo *cfile; struct file_list *tmp_list, *tmp_next_list; - void *page; - const char *full_path; + struct cifsFileInfo *cfile; LIST_HEAD(file_head);
- page = alloc_dentry_path(); spin_lock(&tcon->open_file_lock); list_for_each_entry(cfile, &tcon->openFileList, tlist) { - full_path = build_path_from_dentry(cfile->dentry, page); - if (strstr(full_path, path)) { - if (delayed_work_pending(&cfile->deferred)) { - if (cancel_delayed_work(&cfile->deferred)) { - spin_lock(&CIFS_I(d_inode(cfile->dentry))->deferred_lock); - cifs_del_deferred_close(cfile); - spin_unlock(&CIFS_I(d_inode(cfile->dentry))->deferred_lock); - - tmp_list = kmalloc(sizeof(struct file_list), GFP_ATOMIC); - if (tmp_list == NULL) - break; - tmp_list->cfile = cfile; - list_add_tail(&tmp_list->list, &file_head); - } - } + if ((cfile->dentry == dentry) && + delayed_work_pending(&cfile->deferred) && + cancel_delayed_work(&cfile->deferred)) { + spin_lock(&CIFS_I(d_inode(cfile->dentry))->deferred_lock); + cifs_del_deferred_close(cfile); + spin_unlock(&CIFS_I(d_inode(cfile->dentry))->deferred_lock); + + tmp_list = kmalloc(sizeof(struct file_list), GFP_ATOMIC); + if (tmp_list == NULL) + break; + tmp_list->cfile = cfile; + list_add_tail(&tmp_list->list, &file_head); } } spin_unlock(&tcon->open_file_lock); @@ -868,7 +863,6 @@ cifs_close_deferred_file_under_dentry(struct cifs_tcon *tcon, const char *path) list_del(&tmp_list->list); kfree(tmp_list); } - free_dentry_path(page); }
/*
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Metzmacher metze@samba.org
[ Upstream commit bac28f604c7699727b2fecf14c3a54668bbe458e ]
This makes it safer during the disconnect and avoids requeueing.
It's ok to call disable[delayed_]work[_sync]() more than once.
Cc: Steve French smfrench@gmail.com Cc: Tom Talpey tom@talpey.com Cc: Long Li longli@microsoft.com Cc: Namjae Jeon linkinjeon@kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: 050b8c374019 ("smbd: Make upper layer decide when to destroy the transport") Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Fixes: c7398583340a ("CIFS: SMBD: Implement RDMA memory registration") Signed-off-by: Stefan Metzmacher metze@samba.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/smbdirect.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c index 65175ac3d8418..8c6e766078e14 100644 --- a/fs/smb/client/smbdirect.c +++ b/fs/smb/client/smbdirect.c @@ -1341,7 +1341,7 @@ void smbd_destroy(struct TCP_Server_Info *server) sc->ib.qp = NULL;
log_rdma_event(INFO, "cancelling idle timer\n"); - cancel_delayed_work_sync(&info->idle_timer_work); + disable_delayed_work_sync(&info->idle_timer_work);
/* It's not possible for upper layer to get to reassembly */ log_rdma_event(INFO, "drain the reassembly queue\n"); @@ -1713,7 +1713,7 @@ static struct smbd_connection *_smbd_get_connection( return NULL;
negotiation_failed: - cancel_delayed_work_sync(&info->idle_timer_work); + disable_delayed_work_sync(&info->idle_timer_work); destroy_caches_and_workqueue(info); sc->status = SMBDIRECT_SOCKET_NEGOTIATE_FAILED; rdma_disconnect(sc->rdma.cm_id); @@ -2072,7 +2072,7 @@ static void destroy_mr_list(struct smbd_connection *info) struct smbdirect_socket *sc = &info->socket; struct smbd_mr *mr, *tmp;
- cancel_work_sync(&info->mr_recovery_work); + disable_work_sync(&info->mr_recovery_work); list_for_each_entry_safe(mr, tmp, &info->mr_list, list) { if (mr->state == MR_INVALIDATED) ib_dma_unmap_sg(sc->ib.dev, mr->sgt.sgl,
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Metzmacher metze@samba.org
[ Upstream commit d9dcbbcf9145b68aa85c40947311a6907277e097 ]
In smbd_destroy() we may destroy the memory so we better wait until post_send_credits_work is no longer pending and will never be started again.
I actually just hit the case using rxe:
WARNING: CPU: 0 PID: 138 at drivers/infiniband/sw/rxe/rxe_verbs.c:1032 rxe_post_recv+0x1ee/0x480 [rdma_rxe] ... [ 5305.686979] [ T138] smbd_post_recv+0x445/0xc10 [cifs] [ 5305.687135] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687149] [ T138] ? __kasan_check_write+0x14/0x30 [ 5305.687185] [ T138] ? __pfx_smbd_post_recv+0x10/0x10 [cifs] [ 5305.687329] [ T138] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 5305.687356] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687368] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687378] [ T138] ? _raw_spin_unlock_irqrestore+0x11/0x60 [ 5305.687389] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687399] [ T138] ? get_receive_buffer+0x168/0x210 [cifs] [ 5305.687555] [ T138] smbd_post_send_credits+0x382/0x4b0 [cifs] [ 5305.687701] [ T138] ? __pfx_smbd_post_send_credits+0x10/0x10 [cifs] [ 5305.687855] [ T138] ? __pfx___schedule+0x10/0x10 [ 5305.687865] [ T138] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 5305.687875] [ T138] ? queue_delayed_work_on+0x8e/0xa0 [ 5305.687889] [ T138] process_one_work+0x629/0xf80 [ 5305.687908] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687917] [ T138] ? __kasan_check_write+0x14/0x30 [ 5305.687933] [ T138] worker_thread+0x87f/0x1570 ...
It means rxe_post_recv was called after rdma_destroy_qp(). This happened because put_receive_buffer() was triggered by ib_drain_qp() and called: queue_work(info->workqueue, &info->post_send_credits_work);
Cc: Steve French smfrench@gmail.com Cc: Tom Talpey tom@talpey.com Cc: Long Li longli@microsoft.com Cc: Namjae Jeon linkinjeon@kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher metze@samba.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/smbdirect.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c index 8c6e766078e14..8b920410cd2fe 100644 --- a/fs/smb/client/smbdirect.c +++ b/fs/smb/client/smbdirect.c @@ -1335,6 +1335,9 @@ void smbd_destroy(struct TCP_Server_Info *server) sc->status == SMBDIRECT_SOCKET_DISCONNECTED); }
+ log_rdma_event(INFO, "cancelling post_send_credits_work\n"); + disable_work_sync(&info->post_send_credits_work); + log_rdma_event(INFO, "destroying qp\n"); ib_drain_qp(sc->ib.qp); rdma_destroy_qp(sc->rdma.cm_id);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Herbert Xu herbert@gondor.apana.org.au
[ Upstream commit 9574b2330dbd2b5459b74d3b5e9619d39299fc6f ]
If an error causes af_alg_sendmsg to abort, ctx->merge may contain a garbage value from the previous loop. This may then trigger a crash on the next entry into af_alg_sendmsg when it attempts to do a merge that can't be done.
Fix this by setting ctx->merge to zero near the start of the loop.
Fixes: 8ff590903d5 ("crypto: algif_skcipher - User-space interface for skcipher operations") Reported-by: Muhammad Alifa Ramdhan ramdhan@starlabs.sg Reported-by: Bing-Jhong Billy Jheng billy@starlabs.sg Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Sasha Levin sashal@kernel.org --- crypto/af_alg.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/crypto/af_alg.c b/crypto/af_alg.c index f02c5586a8ab3..ca6fdcc6c54ac 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -1025,6 +1025,8 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size, continue; }
+ ctx->merge = 0; + if (!af_alg_writable(sk)) { err = af_alg_wait_for_wmem(sk, msg->msg_flags); if (err) @@ -1064,7 +1066,6 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size, ctx->used += plen; copied += plen; size -= plen; - ctx->merge = 0; } else { do { struct page *pg;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
[ Upstream commit df8922afc37aa2111ca79a216653a629146763ad ]
A recent commit:
fc582cd26e88 ("io_uring/msg_ring: ensure io_kiocb freeing is deferred for RCU")
fixed an issue with not deferring freeing of io_kiocb structs that msg_ring allocates to after the current RCU grace period. But this only covers requests that don't end up in the allocation cache. If a request goes into the alloc cache, it can get reused before it is sane to do so. A recent syzbot report would seem to indicate that there's something there, however it may very well just be because of the KASAN poisoning that the alloc_cache handles manually.
Rather than attempt to make the alloc_cache sane for that use case, just drop the usage of the alloc_cache for msg_ring request payload data.
Fixes: 50cf5f3842af ("io_uring/msg_ring: add an alloc cache for io_kiocb entries") Link: https://lore.kernel.org/io-uring/68cc2687.050a0220.139b6.0005.GAE@google.com... Reported-by: syzbot+baa2e0f4e02df602583e@syzkaller.appspotmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/io_uring_types.h | 3 --- io_uring/io_uring.c | 4 ---- io_uring/msg_ring.c | 24 ++---------------------- 3 files changed, 2 insertions(+), 29 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index a7efcec2e3d08..215ff20affa33 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -418,9 +418,6 @@ struct io_ring_ctx { struct list_head defer_list; unsigned nr_drained;
- struct io_alloc_cache msg_cache; - spinlock_t msg_lock; - #ifdef CONFIG_NET_RX_BUSY_POLL struct list_head napi_list; /* track busy poll napi_id */ spinlock_t napi_lock; /* napi_list lock */ diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index aa8787777f29a..eaa5410e5a70a 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -290,7 +290,6 @@ static void io_free_alloc_caches(struct io_ring_ctx *ctx) io_alloc_cache_free(&ctx->netmsg_cache, io_netmsg_cache_free); io_alloc_cache_free(&ctx->rw_cache, io_rw_cache_free); io_alloc_cache_free(&ctx->cmd_cache, io_cmd_cache_free); - io_alloc_cache_free(&ctx->msg_cache, kfree); io_futex_cache_free(ctx); io_rsrc_cache_free(ctx); } @@ -337,9 +336,6 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p) ret |= io_alloc_cache_init(&ctx->cmd_cache, IO_ALLOC_CACHE_MAX, sizeof(struct io_async_cmd), sizeof(struct io_async_cmd)); - spin_lock_init(&ctx->msg_lock); - ret |= io_alloc_cache_init(&ctx->msg_cache, IO_ALLOC_CACHE_MAX, - sizeof(struct io_kiocb), 0); ret |= io_futex_cache_init(ctx); ret |= io_rsrc_cache_init(ctx); if (ret) diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c index 4c2578f2efcb0..5e5b94236d720 100644 --- a/io_uring/msg_ring.c +++ b/io_uring/msg_ring.c @@ -11,7 +11,6 @@ #include "io_uring.h" #include "rsrc.h" #include "filetable.h" -#include "alloc_cache.h" #include "msg_ring.h"
/* All valid masks for MSG_RING */ @@ -76,13 +75,7 @@ static void io_msg_tw_complete(struct io_kiocb *req, io_tw_token_t tw) struct io_ring_ctx *ctx = req->ctx;
io_add_aux_cqe(ctx, req->cqe.user_data, req->cqe.res, req->cqe.flags); - if (spin_trylock(&ctx->msg_lock)) { - if (io_alloc_cache_put(&ctx->msg_cache, req)) - req = NULL; - spin_unlock(&ctx->msg_lock); - } - if (req) - kfree_rcu(req, rcu_head); + kfree_rcu(req, rcu_head); percpu_ref_put(&ctx->refs); }
@@ -104,26 +97,13 @@ static int io_msg_remote_post(struct io_ring_ctx *ctx, struct io_kiocb *req, return 0; }
-static struct io_kiocb *io_msg_get_kiocb(struct io_ring_ctx *ctx) -{ - struct io_kiocb *req = NULL; - - if (spin_trylock(&ctx->msg_lock)) { - req = io_alloc_cache_get(&ctx->msg_cache); - spin_unlock(&ctx->msg_lock); - if (req) - return req; - } - return kmem_cache_alloc(req_cachep, GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO); -} - static int io_msg_data_remote(struct io_ring_ctx *target_ctx, struct io_msg *msg) { struct io_kiocb *target; u32 flags = 0;
- target = io_msg_get_kiocb(target_ctx); + target = kmem_cache_alloc(req_cachep, GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO) ; if (unlikely(!target)) return -ENOMEM;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.org
[ Upstream commit 251090e2c2c1be60607d1c521af2c993f04d4f61 ]
Fix the file open check to decide whether or not silly-rename the file in SMB2+.
Fixes: c5ea3065586d ("smb: client: fix data loss due to broken rename(2)") Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.org Cc: Frank Sorenson sorenson@redhat.com Reviewed-by: David Howells dhowells@redhat.com Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/inode.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c index 1703f1285d36d..0f0d2dae6283a 100644 --- a/fs/smb/client/inode.c +++ b/fs/smb/client/inode.c @@ -2003,8 +2003,21 @@ static int __cifs_unlink(struct inode *dir, struct dentry *dentry, bool sillyren goto psx_del_no_retry; }
- if (sillyrename || (server->vals->protocol_id > SMB10_PROT_ID && - d_is_positive(dentry) && d_count(dentry) > 2)) + /* For SMB2+, if the file is open, we always perform a silly rename. + * + * We check for d_count() right after calling + * cifs_close_deferred_file_under_dentry() to make sure that the + * dentry's refcount gets dropped in case the file had any deferred + * close. + */ + if (!sillyrename && server->vals->protocol_id > SMB10_PROT_ID) { + spin_lock(&dentry->d_lock); + if (d_count(dentry) > 1) + sillyrename = true; + spin_unlock(&dentry->d_lock); + } + + if (sillyrename) rc = -EBUSY; else rc = server->ops->unlink(xid, tcon, full_path, cifs_sb, dentry);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Metzmacher metze@samba.org
[ Upstream commit daac51c7032036a0ca5f1aa419ad1b0471d1c6e0 ]
During tests of another unrelated patch I was able to trigger this error: Objects remaining on __kmem_cache_shutdown()
Cc: Steve French smfrench@gmail.com Cc: Tom Talpey tom@talpey.com Cc: Long Li longli@microsoft.com Cc: Namjae Jeon linkinjeon@kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher metze@samba.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/smbdirect.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c index 8b920410cd2fe..6dd2a1c66df3d 100644 --- a/fs/smb/client/smbdirect.c +++ b/fs/smb/client/smbdirect.c @@ -1101,8 +1101,10 @@ static int smbd_negotiate(struct smbd_connection *info) log_rdma_event(INFO, "smbd_post_recv rc=%d iov.addr=0x%llx iov.length=%u iov.lkey=0x%x\n", rc, response->sge.addr, response->sge.length, response->sge.lkey); - if (rc) + if (rc) { + put_receive_buffer(info, response); return rc; + }
init_completion(&info->negotiate_completion); info->negotiate_done = false;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yang Xiuwei yangxiuwei@kylinos.cn
[ Upstream commit 2c139a47eff8de24e3350dadb4c9d5e3426db826 ]
In io_link_skb function, there is a bug where prev_notif is incorrectly assigned using 'nd' instead of 'prev_nd'. This causes the context validation check to compare the current notification with itself instead of comparing it with the previous notification.
Fix by using the correct prev_nd parameter when obtaining prev_notif.
Signed-off-by: Yang Xiuwei yangxiuwei@kylinos.cn Reviewed-by: Pavel Begunkov asml.silence@gmail.com Fixes: 6fe4220912d19 ("io_uring/notif: implement notification stacking") Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- io_uring/notif.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/notif.c b/io_uring/notif.c index 9a6f6e92d7424..ea9c0116cec2d 100644 --- a/io_uring/notif.c +++ b/io_uring/notif.c @@ -85,7 +85,7 @@ static int io_link_skb(struct sk_buff *skb, struct ubuf_info *uarg) return -EEXIST;
prev_nd = container_of(prev_uarg, struct io_notif_data, uarg); - prev_notif = cmd_to_io_kiocb(nd); + prev_notif = cmd_to_io_kiocb(prev_nd);
/* make sure all noifications can be finished in the same task_work */ if (unlikely(notif->ctx != prev_notif->ctx ||
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Antheas Kapenekakis lkml@antheas.dev
commit 132bfcd24925d4d4531a19b87acb8474be82a017 upstream.
On commit 9286dfd5735b ("platform/x86: asus-wmi: Fix spurious rfkill on UX8406MA"), Mathieu adds a quirk for the Zenbook Duo to ignore the code 0x5f (WLAN button disable). On that laptop, this code is triggered when the device keyboard is attached.
On the ASUS ROG Z13 2025, this code is triggered when pressing the side button of the device, which is used to open Armoury Crate in Windows.
As this is becoming a pattern, where newer Asus laptops use this keycode for emitting events, let's convert the wlan ignore quirk to instead allow emitting codes, so that userspace programs can listen to it and so that it does not interfere with the rfkill state.
With this patch, the Z13 wil emit KEY_PROG3 and the Duo will remain unchanged and emit no event. While at it, add a quirk for the Z13 to switch into tablet mode when removing the keyboard.
Signed-off-by: Antheas Kapenekakis lkml@antheas.dev Link: https://lore.kernel.org/r/20250808154710.8981-2-lkml@antheas.dev Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/x86/asus-nb-wmi.c | 23 +++++++++++++++++++---- drivers/platform/x86/asus-wmi.h | 3 ++- 2 files changed, 21 insertions(+), 5 deletions(-)
--- a/drivers/platform/x86/asus-nb-wmi.c +++ b/drivers/platform/x86/asus-nb-wmi.c @@ -147,7 +147,12 @@ static struct quirk_entry quirk_asus_ign };
static struct quirk_entry quirk_asus_zenbook_duo_kbd = { - .ignore_key_wlan = true, + .key_wlan_event = ASUS_WMI_KEY_IGNORE, +}; + +static struct quirk_entry quirk_asus_z13 = { + .key_wlan_event = ASUS_WMI_KEY_ARMOURY, + .tablet_switch_mode = asus_wmi_kbd_dock_devid, };
static int dmi_matched(const struct dmi_system_id *dmi) @@ -539,6 +544,15 @@ static const struct dmi_system_id asus_q }, .driver_data = &quirk_asus_zenbook_duo_kbd, }, + { + .callback = dmi_matched, + .ident = "ASUS ROG Z13", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_PRODUCT_NAME, "ROG Flow Z13"), + }, + .driver_data = &quirk_asus_z13, + }, {}, };
@@ -636,6 +650,7 @@ static const struct key_entry asus_nb_wm { KE_IGNORE, 0xCF, }, /* AC mode */ { KE_KEY, 0xFA, { KEY_PROG2 } }, /* Lid flip action */ { KE_KEY, 0xBD, { KEY_PROG2 } }, /* Lid flip action on ROG xflow laptops */ + { KE_KEY, ASUS_WMI_KEY_ARMOURY, { KEY_PROG3 } }, { KE_END, 0}, };
@@ -655,9 +670,9 @@ static void asus_nb_wmi_key_filter(struc if (atkbd_reports_vol_keys) *code = ASUS_WMI_KEY_IGNORE; break; - case 0x5F: /* Wireless console Disable */ - if (quirks->ignore_key_wlan) - *code = ASUS_WMI_KEY_IGNORE; + case 0x5F: /* Wireless console Disable / Special Key */ + if (quirks->key_wlan_event) + *code = quirks->key_wlan_event; break; } } --- a/drivers/platform/x86/asus-wmi.h +++ b/drivers/platform/x86/asus-wmi.h @@ -18,6 +18,7 @@ #include <linux/i8042.h>
#define ASUS_WMI_KEY_IGNORE (-1) +#define ASUS_WMI_KEY_ARMOURY 0xffff01 #define ASUS_WMI_BRN_DOWN 0x2e #define ASUS_WMI_BRN_UP 0x2f
@@ -40,7 +41,7 @@ struct quirk_entry { bool wmi_force_als_set; bool wmi_ignore_fan; bool filter_i8042_e1_extended_codes; - bool ignore_key_wlan; + int key_wlan_event; enum asus_wmi_tablet_switch_mode tablet_switch_mode; int wapf; /*
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Antheas Kapenekakis lkml@antheas.dev
commit 225d1ee0f5ba3218d1814d36564fdb5f37b50474 upstream.
It turns out that the dual screen models use 0x5E for attaching and detaching the keyboard instead of 0x5F. So, re-add the codes by reverting commit cf3940ac737d ("platform/x86: asus-wmi: Remove extra keys from ignore_key_wlan quirk"). For our future reference, add a comment next to 0x5E indicating that it is used for that purpose.
Fixes: cf3940ac737d ("platform/x86: asus-wmi: Remove extra keys from ignore_key_wlan quirk") Reported-by: Rahul Chandra rahul@chandra.net Closes: https://lore.kernel.org/all/10020-68c90c80-d-4ac6c580@106290038/ Cc: stable@kernel.org Signed-off-by: Antheas Kapenekakis lkml@antheas.dev Link: https://patch.msgid.link/20250916072818.196462-1-lkml@antheas.dev Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/x86/asus-nb-wmi.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/platform/x86/asus-nb-wmi.c +++ b/drivers/platform/x86/asus-nb-wmi.c @@ -670,6 +670,8 @@ static void asus_nb_wmi_key_filter(struc if (atkbd_reports_vol_keys) *code = ASUS_WMI_KEY_IGNORE; break; + case 0x5D: /* Wireless console Toggle */ + case 0x5E: /* Wireless console Enable / Keyboard Attach, Detach */ case 0x5F: /* Wireless console Disable / Special Key */ if (quirks->key_wlan_event) *code = quirks->key_wlan_event;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Matthieu Baerts (NGI0)" matttbe@kernel.org
commit 2293c57484ae64c9a3c847c8807db8c26a3a4d41 upstream.
During the connection establishment, a peer can tell the other one that it cannot establish new subflows to the initial IP address and port by setting the 'C' flag [1]. Doing so makes sense when the sender is behind a strict NAT, operating behind a legacy Layer 4 load balancer, or using anycast IP address for example.
When this 'C' flag is set, the path-managers must then not try to establish new subflows to the other peer's initial IP address and port. The in-kernel PM has access to this info, but the userspace PM didn't.
The RFC8684 [1] is strict about that:
(...) therefore the receiver MUST NOT try to open any additional subflows toward this address and port.
So it is important to tell the userspace about that as it is responsible for the respect of this flag.
When a new connection is created and established, the Netlink events now contain the existing but not currently used 'flags' attribute. When MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 is set, it means no other subflows to the initial IP address and port -- info that are also part of the event -- can be established.
Link: https://datatracker.ietf.org/doc/html/rfc8684#section-3.1-20.6 [1] Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment") Reported-by: Marek Majkowski marek@cloudflare.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/532 Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-2-401... Signed-off-by: Jakub Kicinski kuba@kernel.org [ Conflicts in mptcp_pm.yaml, because the indentation has been modified in commit ec362192aa9e ("netlink: specs: fix up indentation errors"), which is not in this version. Applying the same modifications, but at a different level. ] Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/netlink/specs/mptcp_pm.yaml | 4 ++-- include/uapi/linux/mptcp.h | 2 ++ include/uapi/linux/mptcp_pm.h | 4 ++-- net/mptcp/pm_netlink.c | 7 +++++++ 4 files changed, 13 insertions(+), 4 deletions(-)
--- a/Documentation/netlink/specs/mptcp_pm.yaml +++ b/Documentation/netlink/specs/mptcp_pm.yaml @@ -28,13 +28,13 @@ definitions: traffic-patterns it can take a long time until the MPTCP_EVENT_ESTABLISHED is sent. Attributes: token, family, saddr4 | saddr6, daddr4 | daddr6, sport, - dport, server-side. + dport, server-side, [flags]. - name: established doc: >- A MPTCP connection is established (can start new subflows). Attributes: token, family, saddr4 | saddr6, daddr4 | daddr6, sport, - dport, server-side. + dport, server-side, [flags]. - name: closed doc: >- --- a/include/uapi/linux/mptcp.h +++ b/include/uapi/linux/mptcp.h @@ -31,6 +31,8 @@ #define MPTCP_INFO_FLAG_FALLBACK _BITUL(0) #define MPTCP_INFO_FLAG_REMOTE_KEY_RECEIVED _BITUL(1)
+#define MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 _BITUL(0) + #define MPTCP_PM_ADDR_FLAG_SIGNAL (1 << 0) #define MPTCP_PM_ADDR_FLAG_SUBFLOW (1 << 1) #define MPTCP_PM_ADDR_FLAG_BACKUP (1 << 2) --- a/include/uapi/linux/mptcp_pm.h +++ b/include/uapi/linux/mptcp_pm.h @@ -16,10 +16,10 @@ * good time to allocate memory and send ADD_ADDR if needed. Depending on the * traffic-patterns it can take a long time until the MPTCP_EVENT_ESTABLISHED * is sent. Attributes: token, family, saddr4 | saddr6, daddr4 | daddr6, - * sport, dport, server-side. + * sport, dport, server-side, [flags]. * @MPTCP_EVENT_ESTABLISHED: A MPTCP connection is established (can start new * subflows). Attributes: token, family, saddr4 | saddr6, daddr4 | daddr6, - * sport, dport, server-side. + * sport, dport, server-side, [flags]. * @MPTCP_EVENT_CLOSED: A MPTCP connection has stopped. Attribute: token. * @MPTCP_EVENT_ANNOUNCED: A new address has been announced by the peer. * Attributes: token, rem_id, family, daddr4 | daddr6 [, dport]. --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -408,6 +408,7 @@ static int mptcp_event_created(struct sk const struct sock *ssk) { int err = nla_put_u32(skb, MPTCP_ATTR_TOKEN, READ_ONCE(msk->token)); + u16 flags = 0;
if (err) return err; @@ -415,6 +416,12 @@ static int mptcp_event_created(struct sk if (nla_put_u8(skb, MPTCP_ATTR_SERVER_SIDE, READ_ONCE(msk->pm.server_side))) return -EMSGSIZE;
+ if (READ_ONCE(msk->pm.remote_deny_join_id0)) + flags |= MPTCP_PM_EV_FLAG_DENY_JOIN_ID0; + + if (flags && nla_put_u16(skb, MPTCP_ATTR_FLAGS, flags)) + return -EMSGSIZE; + return mptcp_event_add_subflow(skb, ssk); }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Li Frank.Li@nxp.com
[ Upstream commit d2db0d78154442fb89165edf8836bf2644c6c58d ]
Allow clock 'uartclk' and 'reg' for nxp,lpc1850-uart to align existed driver and dts. It is really old platform. Keep the same restriction for others.
Allow dmas and dma-names property, which allow maxItems 4 because very old platform (arch/arm/boot/dts/nxp/lpc/lpc18xx.dtsi) use duplicate "tx", "rx", "tx", "rx" as dma-names.
Fix below CHECK_DTB warnings: arch/arm/boot/dts/nxp/lpc/lpc4337-ciaa.dtb: serial@40081000 (nxp,lpc1850-uart): clock-names: ['uartclk', 'reg'] is too long
Signed-off-by: Frank Li Frank.Li@nxp.com Acked-by: "Rob Herring (Arm)" robh@kernel.org Link: https://lore.kernel.org/r/20250602142745.942568-1-Frank.Li@nxp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Stable-dep-of: 387d00028ccc ("dt-bindings: serial: 8250: move a constraint") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/devicetree/bindings/serial/8250.yaml | 41 ++++++++++++++++++--- 1 file changed, 37 insertions(+), 4 deletions(-)
--- a/Documentation/devicetree/bindings/serial/8250.yaml +++ b/Documentation/devicetree/bindings/serial/8250.yaml @@ -49,6 +49,24 @@ allOf: - required: [ clock-frequency ] - required: [ clocks ]
+ - if: + properties: + compatible: + contains: + const: nxp,lpc1850-uart + then: + properties: + clock-names: + items: + - const: uartclk + - const: reg + else: + properties: + clock-names: + items: + - const: core + - const: bus + properties: compatible: oneOf: @@ -142,9 +160,22 @@ properties:
clock-names: minItems: 1 - items: - - const: core - - const: bus + maxItems: 2 + oneOf: + - items: + - const: core + - const: bus + - items: + - const: uartclk + - const: reg + + dmas: + minItems: 1 + maxItems: 4 + + dma-names: + minItems: 1 + maxItems: 4
resets: maxItems: 1 @@ -237,7 +268,9 @@ if: properties: compatible: contains: - const: spacemit,k1-uart + enum: + - spacemit,k1-uart + - nxp,lpc1850-uart then: required: [clock-names] properties:
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yixun Lan dlan@gentoo.org
[ Upstream commit 48f9034e024a4c6e279b0d040e1f5589bb544806 ]
In SpacemiT's K1 SoC, the clocks for UART are mandatory needed, so for DT, both clocks and clock-names property should be set as required.
Fixes: 2c0594f9f062 ("dt-bindings: serial: 8250: support an optional second clock") Signed-off-by: Yixun Lan dlan@gentoo.org Acked-by: "Rob Herring (Arm)" robh@kernel.org Link: https://lore.kernel.org/r/20250718-01-k1-uart-binding-v1-1-a92e1e14c836@gent... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Stable-dep-of: 387d00028ccc ("dt-bindings: serial: 8250: move a constraint") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/devicetree/bindings/serial/8250.yaml | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/Documentation/devicetree/bindings/serial/8250.yaml +++ b/Documentation/devicetree/bindings/serial/8250.yaml @@ -272,7 +272,9 @@ if: - spacemit,k1-uart - nxp,lpc1850-uart then: - required: [clock-names] + required: + - clocks + - clock-names properties: clocks: minItems: 2
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Elder elder@riscstar.com
[ Upstream commit 387d00028cccee7575f6416953bef62f849d83e3 ]
A block that required a "spacemit,k1-uart" compatible node to specify two clocks was placed in the wrong spot in the binding. Conor Dooley pointed out it belongs earlier in the file, as part of the initial "allOf".
Fixes: 2c0594f9f0629 ("dt-bindings: serial: 8250: support an optional second clock") Cc: stable stable@kernel.org Reported-by: Conor Dooley conor@kernel.org Closes: https://lore.kernel.org/lkml/20250729-reshuffle-contented-e6def76b540b@spud/ Signed-off-by: Alex Elder elder@riscstar.com Acked-by: Conor Dooley conor.dooley@microchip.com Link: https://lore.kernel.org/r/20250813032151.2330616-1-elder@riscstar.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/devicetree/bindings/serial/8250.yaml | 46 ++++++++++----------- 1 file changed, 22 insertions(+), 24 deletions(-)
--- a/Documentation/devicetree/bindings/serial/8250.yaml +++ b/Documentation/devicetree/bindings/serial/8250.yaml @@ -48,7 +48,6 @@ allOf: oneOf: - required: [ clock-frequency ] - required: [ clocks ] - - if: properties: compatible: @@ -66,6 +65,28 @@ allOf: items: - const: core - const: bus + - if: + properties: + compatible: + contains: + enum: + - spacemit,k1-uart + - nxp,lpc1850-uart + then: + required: + - clocks + - clock-names + properties: + clocks: + minItems: 2 + clock-names: + minItems: 2 + else: + properties: + clocks: + maxItems: 1 + clock-names: + maxItems: 1
properties: compatible: @@ -264,29 +285,6 @@ required: - reg - interrupts
-if: - properties: - compatible: - contains: - enum: - - spacemit,k1-uart - - nxp,lpc1850-uart -then: - required: - - clocks - - clock-names - properties: - clocks: - minItems: 2 - clock-names: - minItems: 2 -else: - properties: - clocks: - maxItems: 1 - clock-names: - maxItems: 1 - unevaluatedProperties: false
examples:
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: SeongJae Park sj@kernel.org
[ Upstream commit 2780505ec2b42c07853b34640bc63279ac2bb53b ]
If 'enable' parameter of the 'prcl' DAMON sample module is set at boot time via the kernel command line, memory allocation is tried before the slab is initialized. As a result kernel NULL pointer dereference BUG can happen. Fix it by checking the initialization status.
Link: https://lkml.kernel.org/r/20250706193207.39810-3-sj@kernel.org Fixes: 2aca254620a8 ("samples/damon: introduce a skeleton of a smaple DAMON module for proactive reclamation") Signed-off-by: SeongJae Park sj@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Stable-dep-of: c62cff40481c ("samples/damon/mtier: avoid starting DAMON before initialization") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- samples/damon/prcl.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/samples/damon/prcl.c +++ b/samples/damon/prcl.c @@ -109,6 +109,8 @@ static void damon_sample_prcl_stop(void) put_pid(target_pidp); }
+static bool init_called; + static int damon_sample_prcl_enable_store( const char *val, const struct kernel_param *kp) { @@ -134,6 +136,14 @@ static int damon_sample_prcl_enable_stor
static int __init damon_sample_prcl_init(void) { + int err = 0; + + init_called = true; + if (enable) { + err = damon_sample_prcl_start(); + if (err) + enable = false; + } return 0; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Honggyu Kim honggyu.kim@sk.com
[ Upstream commit 793020545cea0c9e2a79de6ad5c9746ec4f5bd7e ]
The damon_{lru_sort,reclaim,stat} kernel modules use "enabled" parameter knobs as follows.
/sys/module/damon_lru_sort/parameters/enabled /sys/module/damon_reclaim/parameters/enabled /sys/module/damon_stat/parameters/enabled
However, other sample modules of damon use "enable" parameter knobs so it'd be better to rename them from "enable" to "enabled" to keep the consistency with other damon modules.
Before: /sys/module/damon_sample_wsse/parameters/enable /sys/module/damon_sample_prcl/parameters/enable /sys/module/damon_sample_mtier/parameters/enable
After: /sys/module/damon_sample_wsse/parameters/enabled /sys/module/damon_sample_prcl/parameters/enabled /sys/module/damon_sample_mtier/parameters/enabled
There is no functional changes in this patch.
Link: https://lkml.kernel.org/r/20250707024548.1964-1-honggyu.kim@sk.com Signed-off-by: Honggyu Kim honggyu.kim@sk.com Reviewed-by: SeongJae Park sj@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Stable-dep-of: c62cff40481c ("samples/damon/mtier: avoid starting DAMON before initialization") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- samples/damon/mtier.c | 22 +++++++++++----------- samples/damon/prcl.c | 22 +++++++++++----------- samples/damon/wsse.c | 22 +++++++++++----------- 3 files changed, 33 insertions(+), 33 deletions(-)
--- a/samples/damon/mtier.c +++ b/samples/damon/mtier.c @@ -27,14 +27,14 @@ module_param(node1_end_addr, ulong, 0600 static int damon_sample_mtier_enable_store( const char *val, const struct kernel_param *kp);
-static const struct kernel_param_ops enable_param_ops = { +static const struct kernel_param_ops enabled_param_ops = { .set = damon_sample_mtier_enable_store, .get = param_get_bool, };
-static bool enable __read_mostly; -module_param_cb(enable, &enable_param_ops, &enable, 0600); -MODULE_PARM_DESC(enable, "Enable of disable DAMON_SAMPLE_MTIER"); +static bool enabled __read_mostly; +module_param_cb(enabled, &enabled_param_ops, &enabled, 0600); +MODULE_PARM_DESC(enabled, "Enable or disable DAMON_SAMPLE_MTIER");
static struct damon_ctx *ctxs[2];
@@ -156,20 +156,20 @@ static bool init_called; static int damon_sample_mtier_enable_store( const char *val, const struct kernel_param *kp) { - bool enabled = enable; + bool is_enabled = enabled; int err;
- err = kstrtobool(val, &enable); + err = kstrtobool(val, &enabled); if (err) return err;
- if (enable == enabled) + if (enabled == is_enabled) return 0;
- if (enable) { + if (enabled) { err = damon_sample_mtier_start(); if (err) - enable = false; + enabled = false; return err; } damon_sample_mtier_stop(); @@ -181,10 +181,10 @@ static int __init damon_sample_mtier_ini int err = 0;
init_called = true; - if (enable) { + if (enabled) { err = damon_sample_mtier_start(); if (err) - enable = false; + enabled = false; } return 0; } --- a/samples/damon/prcl.c +++ b/samples/damon/prcl.c @@ -17,14 +17,14 @@ module_param(target_pid, int, 0600); static int damon_sample_prcl_enable_store( const char *val, const struct kernel_param *kp);
-static const struct kernel_param_ops enable_param_ops = { +static const struct kernel_param_ops enabled_param_ops = { .set = damon_sample_prcl_enable_store, .get = param_get_bool, };
-static bool enable __read_mostly; -module_param_cb(enable, &enable_param_ops, &enable, 0600); -MODULE_PARM_DESC(enable, "Enable of disable DAMON_SAMPLE_WSSE"); +static bool enabled __read_mostly; +module_param_cb(enabled, &enabled_param_ops, &enabled, 0600); +MODULE_PARM_DESC(enabled, "Enable or disable DAMON_SAMPLE_PRCL");
static struct damon_ctx *ctx; static struct pid *target_pidp; @@ -114,20 +114,20 @@ static bool init_called; static int damon_sample_prcl_enable_store( const char *val, const struct kernel_param *kp) { - bool enabled = enable; + bool is_enabled = enabled; int err;
- err = kstrtobool(val, &enable); + err = kstrtobool(val, &enabled); if (err) return err;
- if (enable == enabled) + if (enabled == is_enabled) return 0;
- if (enable) { + if (enabled) { err = damon_sample_prcl_start(); if (err) - enable = false; + enabled = false; return err; } damon_sample_prcl_stop(); @@ -139,10 +139,10 @@ static int __init damon_sample_prcl_init int err = 0;
init_called = true; - if (enable) { + if (enabled) { err = damon_sample_prcl_start(); if (err) - enable = false; + enabled = false; } return 0; } --- a/samples/damon/wsse.c +++ b/samples/damon/wsse.c @@ -18,14 +18,14 @@ module_param(target_pid, int, 0600); static int damon_sample_wsse_enable_store( const char *val, const struct kernel_param *kp);
-static const struct kernel_param_ops enable_param_ops = { +static const struct kernel_param_ops enabled_param_ops = { .set = damon_sample_wsse_enable_store, .get = param_get_bool, };
-static bool enable __read_mostly; -module_param_cb(enable, &enable_param_ops, &enable, 0600); -MODULE_PARM_DESC(enable, "Enable or disable DAMON_SAMPLE_WSSE"); +static bool enabled __read_mostly; +module_param_cb(enabled, &enabled_param_ops, &enabled, 0600); +MODULE_PARM_DESC(enabled, "Enable or disable DAMON_SAMPLE_WSSE");
static struct damon_ctx *ctx; static struct pid *target_pidp; @@ -94,20 +94,20 @@ static bool init_called; static int damon_sample_wsse_enable_store( const char *val, const struct kernel_param *kp) { - bool enabled = enable; + bool is_enabled = enabled; int err;
- err = kstrtobool(val, &enable); + err = kstrtobool(val, &enabled); if (err) return err;
- if (enable == enabled) + if (enabled == is_enabled) return 0;
- if (enable) { + if (enabled) { err = damon_sample_wsse_start(); if (err) - enable = false; + enabled = false; return err; } damon_sample_wsse_stop(); @@ -119,10 +119,10 @@ static int __init damon_sample_wsse_init int err = 0;
init_called = true; - if (enable) { + if (enabled) { err = damon_sample_wsse_start(); if (err) - enable = false; + enabled = false; } return err; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: SeongJae Park sj@kernel.org
[ Upstream commit c62cff40481c037307a13becbda795f7afdcfebd ]
Commit 964314344eab ("samples/damon/mtier: support boot time enable setup") is somehow incompletely applying the origin patch [1]. It is missing the part that avoids starting DAMON before module initialization. Probably a mistake during a merge has happened. Fix it by applying the missed part again.
Link: https://lkml.kernel.org/r/20250909022238.2989-4-sj@kernel.org Link: https://lore.kernel.org/20250706193207.39810-4-sj@kernel.org [1] Fixes: 964314344eab ("samples/damon/mtier: support boot time enable setup") Signed-off-by: SeongJae Park sj@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- samples/damon/mtier.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/samples/damon/mtier.c +++ b/samples/damon/mtier.c @@ -166,6 +166,9 @@ static int damon_sample_mtier_enable_sto if (enabled == is_enabled) return 0;
+ if (!init_called) + return 0; + if (enabled) { err = damon_sample_mtier_start(); if (err)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chen-Yu Tsai wens@csie.org
commit 25fbbaf515acd13399589bd5ee6de5f35740cef2 upstream.
When dual-divider clock support was introduced, the P divider offset was left out of the .recalc_rate readback function. This causes the clock rate to become bogus or even zero (possibly due to the P divider being 1, leading to a divide-by-zero).
Fix this by incorporating the P divider offset into the calculation.
Fixes: 45717804b75e ("clk: sunxi-ng: mp: introduce dual-divider clock") Reviewed-by: Andre Przywara andre.przywara@arm.com Link: https://patch.msgid.link/20250830170901.1996227-4-wens@kernel.org Signed-off-by: Chen-Yu Tsai wens@csie.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/clk/sunxi-ng/ccu_mp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/clk/sunxi-ng/ccu_mp.c b/drivers/clk/sunxi-ng/ccu_mp.c index 354c981943b6..4221b1888b38 100644 --- a/drivers/clk/sunxi-ng/ccu_mp.c +++ b/drivers/clk/sunxi-ng/ccu_mp.c @@ -185,7 +185,7 @@ static unsigned long ccu_mp_recalc_rate(struct clk_hw *hw, p &= (1 << cmp->p.width) - 1;
if (cmp->common.features & CCU_FEATURE_DUAL_DIV) - rate = (parent_rate / p) / m; + rate = (parent_rate / (p + cmp->p.offset)) / m; else rate = (parent_rate >> p) / m;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: SeongJae Park sj@kernel.org
commit e6b733ca2f99e968d696c2e812c8eb8e090bf37b upstream.
Commit 2780505ec2b4 ("samples/damon/prcl: fix boot time enable crash") is somehow incompletely applying the origin patch [1]. It is missing the part that avoids starting DAMON before module initialization. Probably a mistake during a merge has happened. Fix it by applying the missed part again.
Link: https://lkml.kernel.org/r/20250909022238.2989-3-sj@kernel.org Link: https://lore.kernel.org/20250706193207.39810-3-sj@kernel.org [1] Fixes: 2780505ec2b4 ("samples/damon/prcl: fix boot time enable crash") Signed-off-by: SeongJae Park sj@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- samples/damon/prcl.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/samples/damon/prcl.c +++ b/samples/damon/prcl.c @@ -124,6 +124,9 @@ static int damon_sample_prcl_enable_stor if (enabled == is_enabled) return 0;
+ if (!init_called) + return 0; + if (enabled) { err = damon_sample_prcl_start(); if (err)
On 9/22/2025 12:28 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.16.9 release. There are 149 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 24 Sep 2025 19:23:52 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.16.9-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.16.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
On Mon, Sep 22, 2025 at 09:28:20PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.16.9 release. There are 149 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 24 Sep 2025 19:23:52 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.16.9-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.16.y and the diffstat can be found below.
thanks,
greg k-h
Tested rc1 against the Fedora build system (aarch64, ppc64le, s390x, x86_64), and boot tested x86_64. No regressions noted.
Tested-by: Justin M. Forbes jforbes@fedoraproject.org
On Tue, 23 Sept 2025 at 01:11, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.16.9 release. There are 149 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 24 Sep 2025 19:23:52 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.16.9-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.16.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 6.16.9-rc1 * git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git * git commit: fef8d1e3eca6557cae4f0149eb2071123c473c26 * git describe: v6.16.8-150-gfef8d1e3eca6 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.16.y/build/v6.16....
## Test Regressions (compared to v6.16.6-198-gfb25a6be32b3)
## Metric Regressions (compared to v6.16.6-198-gfb25a6be32b3)
## Test Fixes (compared to v6.16.6-198-gfb25a6be32b3)
## Metric Fixes (compared to v6.16.6-198-gfb25a6be32b3)
## Test result summary total: 349618, pass: 322571, fail: 6726, skip: 20321, xfail: 0
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 139 total, 138 passed, 1 failed * arm64: 57 total, 56 passed, 1 failed * i386: 18 total, 18 passed, 0 failed * mips: 34 total, 27 passed, 7 failed * parisc: 4 total, 4 passed, 0 failed * powerpc: 40 total, 39 passed, 1 failed * riscv: 25 total, 25 passed, 0 failed * s390: 22 total, 22 passed, 0 failed * sh: 5 total, 5 passed, 0 failed * sparc: 4 total, 3 passed, 1 failed * x86_64: 49 total, 47 passed, 2 failed
## Test suites summary * boot * commands * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-efivarfs * kselftest-exec * kselftest-fpu * kselftest-ftrace * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-kcmp * kselftest-kvm * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-mincore * kselftest-mm * kselftest-mqueue * kselftest-net * kselftest-net-mptcp * kselftest-openat2 * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-rust * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-tc-testing * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user_events * kselftest-vDSO * kselftest-x86 * kunit * kvm-unit-tests * lava * libgpiod * libhugetlbfs * log-parser-boot * log-parser-build-clang * log-parser-build-gcc * log-parser-test * ltp-capability * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-hugetlb * ltp-math * ltp-mm * ltp-nptl * ltp-pty * ltp-sched * ltp-smoke * ltp-syscalls * ltp-tracing * modules * perf * rcutorture * rt-tests-cyclicdeadline * rt-tests-pi-stress * rt-tests-pmqtest * rt-tests-rt-migrate-test * rt-tests-signaltest
-- Linaro LKFT https://lkft.linaro.org
Hi Greg
On Tue, Sep 23, 2025 at 4:42 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.16.9 release. There are 149 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 24 Sep 2025 19:23:52 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.16.9-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.16.y and the diffstat can be found below.
thanks,
greg k-h
6.16.9-rc1 tested.
Build successfully completed. Boot successfully completed. No dmesg regressions. Video output normal. Sound output normal.
Lenovo ThinkPad X1 Carbon Gen10(Intel i7-1260P(x86_64) arch linux)
[ 0.000000] Linux version 6.16.9-rc1rv-gfef8d1e3eca6 (takeshi@ThinkPadX1Gen10J0764) (gcc (GCC) 15.2.1 20250813, GNU ld (GNU Binutils) 2.45.0) #1 SMP PREEMPT_DYNAMIC Tue Sep 23 15:36:05 JST 2025
Thanks
Tested-by: Takeshi Ogasawara takeshi.ogasawara@futuring-girl.com
# Librecast Test Results
010/010 [ OK ] libmld 120/120 [ OK ] liblibrecast
CPU/kernel: Linux auntie 6.16.9-rc1-gfef8d1e3eca6 #87 SMP PREEMPT_DYNAMIC Tue Sep 23 07:06:54 -00 2025 x86_64 AMD Ryzen 9 9950X 16-Core Processor AuthenticAMD GNU/Linux
Tested-by: Brett A C Sheffield bacs@librecast.net
[2025-09-22 21:28] Greg Kroah-Hartman:
This is the start of the stable review cycle for the 6.16.9 release. There are 149 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 24 Sep 2025 19:23:52 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.16.9-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.16.y and the diffstat can be found below.
Kernel 6.16.9-rc1 compiles fine for x86_64 using GCC 15.2.1+r22+gc4e96a094636 and binutils 2.45+r29+g2b2e51a31ec7, and the resulting kernel boots and runs without any discernible issues on various physical machines of mine (Ivy Bridge, Haswell, Kaby Lake, Coffee Lake) and on a Zen 2 VM and a bunch of Kaby Lake VMs.
Tested-by: Pascal Ernster git@hardfalcon.net
On Mon, 22 Sep 2025 21:28:20 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.16.9 release. There are 149 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 24 Sep 2025 19:23:52 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.16.9-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.16.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v6.16: 10 builds: 10 pass, 0 fail 28 boots: 28 pass, 0 fail 120 tests: 120 pass, 0 fail
Linux version: 6.16.9-rc1-gfef8d1e3eca6 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra186-p3509-0000+p3636-0001, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On Mon, Sep 22, 2025 at 09:28:20PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.16.9 release. There are 149 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Tested-by: Mark Brown broonie@kernel.org
Am 22.09.2025 um 21:28 schrieb Greg Kroah-Hartman:
This is the start of the stable review cycle for the 6.16.9 release. There are 149 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Builds, boots and works on my 2-socket Ivy Bridge Xeon E5-2697 v2 server. No dmesg oddities or regressions found.
Tested-by: Peter Schneider pschneider1968@googlemail.com
Beste Grüße, Peter Schneider
On 9/22/25 12:28, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.16.9 release. There are 149 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 24 Sep 2025 19:23:52 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.16.9-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.16.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
Hi
no regressions here on x86_64 (RKL, Intel 11th Gen. CPU)
Thanks
Tested-by: Ronald Warsow rwarsow@gmx.de
On Mon, 22 Sep 2025 21:28:20 +0200 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.16.9 release. There are 149 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 24 Sep 2025 19:23:52 +0000. Anything received after that time might be too late.
Boot-tested under QEMU for Rust x86_64, arm64 and riscv64; built-tested for arm and loongarch64:
Tested-by: Miguel Ojeda ojeda@kernel.org
Thanks!
Cheers, Miguel
On 9/22/25 13:28, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.16.9 release. There are 149 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 24 Sep 2025 19:23:52 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.16.9-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.16.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
The kernel, bpf tool, perf tool, and kselftest builds fine for v6.16.9-rc1 on x86 and arm64 Azure VM.
Tested-by: Hardik Garg hargar@linux.microsoft.com
Thanks, Hardik
On Tue, Sep 23, 2025 at 1:12 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.16.9 release. There are 149 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 24 Sep 2025 19:23:52 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.16.9-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.16.y and the diffstat can be found below.
thanks,
greg k-h
Build and boot tested 6.16.9-rc1 using qemu-x86_64. The kernel was successfully built and booted in a virtualized environment without issues.
Build kernel: 6.16.9-rc1 git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git commit: fef8d1e3eca6557cae4f0149eb2071123c473c26
Tested-by: Dileep Malepu dileep.debian@gmail.com
Best regards Dileep Malepu.
On Mon Sep 22, 2025 at 9:28 PM CEST, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.16.9 release. There are 149 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 24 Sep 2025 19:23:52 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.16.9-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.16.y and the diffstat can be found below.
thanks,
greg k-h
Thanks! Build-tested on all Alpine architectures & boot-tested on x86_64.
Tested-By: Achill Gilgenast achill@achill.org
linux-stable-mirror@lists.linaro.org