This is the start of the stable review cycle for the 6.12.17 release. There are 154 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 26 Feb 2025 14:25:29 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.17-rc1...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.12.17-rc1
Alex Deucher alexander.deucher@amd.com drm/amdgpu: bump version for RV/PCO compute fix
Alex Deucher alexander.deucher@amd.com drm/amdgpu/gfx9: manually control gfxoff for CS on RV
Tianling Shen cnsztl@gmail.com arm64: dts: rockchip: change eth phy mode to rgmii-id for orangepi r1 plus lts
Kevin Brodsky kevin.brodsky@arm.com selftests/mm: build with -O2
Tejun Heo tj@kernel.org sched_ext: Fix incorrect assumption about migration disabled tasks in task_can_run_on_remote_rq()
Kory Maincent kory.maincent@bootlin.com net: pse-pd: Fix deadlock in current limit functions
Steven Rostedt rostedt@goodmis.org tracing: Fix using ret variable in tracing_set_tracer()
Steven Rostedt rostedt@goodmis.org ftrace: Do not add duplicate entries in subops manager ops
Steven Rostedt rostedt@goodmis.org ftrace: Fix accounting of adding subops to a manager ops
Sebastian Andrzej Siewior bigeasy@linutronix.de ftrace: Correct preemption accounting for function tracing.
Komal Bajaj quic_kbajaj@quicinc.com EDAC/qcom: Correct interrupt enable register configuration
Haoxiang Li haoxiang_li2024@163.com smb: client: Add check for next_buffer in receive_encrypted_standard()
Marc Zyngier maz@kernel.org irqchip/gic-v3: Fix rk3399 workaround when secure interrupts are enabled
Kan Liang kan.liang@linux.intel.com perf/x86/intel: Fix event constraints for LNC
Niravkumar L Rabara niravkumar.l.rabara@intel.com mtd: rawnand: cadence: fix incorrect device in dma_unmap_single
Niravkumar L Rabara niravkumar.l.rabara@intel.com mtd: rawnand: cadence: use dma_map_resource for sdma address
Niravkumar L Rabara niravkumar.l.rabara@intel.com mtd: rawnand: cadence: fix error code in cadence_nand_init()
Amit Kumar Mahapatra amit.kumar-mahapatra@amd.com mtd: spi-nor: sst: Fix SST write failure
Ricardo Cañuelo Navarro rcn@igalia.com mm,madvise,hugetlb: check for 0-length range after end address adjustment
Christian Brauner brauner@kernel.org acct: block access to kernel internal filesystems
Christian Brauner brauner@kernel.org acct: perform last write from workqueue
Peter Ujfalusi peter.ujfalusi@linux.intel.com ASoC: SOF: pcm: Clear the susbstream pointer to NULL on close
John Veness john-linux@pelago.org.uk ALSA: hda/conexant: Add quirk for HP ProBook 450 G4 mute LED
Wentao Liang vulab@iscas.ac.cn ALSA: hda: Add error check for snd_ctl_rename_id() in snd_hda_create_dig_out_ctls()
Nikita Zhandarovich n.zhandarovich@fintech.ru ASoC: fsl_micfil: Enable default case in micfil_set_quality()
Peter Ujfalusi peter.ujfalusi@linux.intel.com ASoC: SOF: stream-ipc: Check for cstream nullity in sof_ipc_msg_data()
Joshua Washington joshwash@google.com gve: set xdp redirect target only when it is available
Haoxiang Li haoxiang_li2024@163.com nfp: bpf: Add check for nfp_app_ctrl_msg_alloc()
Paulo Alcantara pc@manguebit.com smb: client: fix chmod(2) regression with ATTR_READONLY
Pavel Begunkov asml.silence@gmail.com lib/iov_iter: fix import_iovec_ubuf iovec management
Darrick J. Wong djwong@kernel.org xfs: fix online repair probing when CONFIG_XFS_ONLINE_REPAIR=n
Heiko Carstens hca@linux.ibm.com s390/boot: Fix ESSA detection
Haoxiang Li haoxiang_li2024@163.com soc: loongson: loongson2_guts: Add check for devm_kstrdup()
Lukasz Czechowski lukasz.czechowski@thaumatec.com arm64: dts: rockchip: Disable DMA for uart5 on px30-ringneck
Lukasz Czechowski lukasz.czechowski@thaumatec.com arm64: dts: rockchip: Move uart5 pin configuration to px30 ringneck SoM
Alexander Shiyan eagle.alexander923@gmail.com arm64: dts: rockchip: Fix broken tsadc pinctrl names for rk3588
David Hildenbrand david@redhat.com mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize()
Gavrilov Ilia Ilia.Gavrilov@infotecs.ru drop_monitor: fix incorrect initialization order
Sumit Garg sumit.garg@linaro.org tee: optee: Fix supplicant wait loop
Bartosz Golaszewski bartosz.golaszewski@linaro.org gpiolib: protect gpio_chip with SRCU in array_info paths in multi get/set
Bartosz Golaszewski bartosz.golaszewski@linaro.org gpiolib: check the return value of gpio_chip::get_direction()
Pavel Begunkov asml.silence@gmail.com io_uring: prevent opcode speculation
Pavel Begunkov asml.silence@gmail.com io_uring/rw: forbid multishot async reads
Krzysztof Karas krzysztof.karas@intel.com drm/i915/gt: Use spin_lock_irqsave() in interruptible context
Imre Deak imre.deak@intel.com drm/i915/ddi: Fix HDMI port width programming in DDI_BUF_CTL
Imre Deak imre.deak@intel.com drm/i915/dp: Fix error handling during 128b/132b link training
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915: Make sure all planes in use by the joiner have their crtc included
Jessica Zhang quic_jesszhan@quicinc.com drm/msm/dpu: Disable dither in phys encoder cleanup
Hugo Villeneuve hvilleneuve@dimonoff.com drm: panel: jd9365da-h3: fix reset signal polarity
Artur Rojek contact@artur-rojek.eu irqchip/jcore-aic, clocksource/drivers/jcore: Fix jcore-pit interrupt request
Aaron Kling webgeek1234@gmail.com drm/nouveau/pmu: Fix gp10b firmware guard
Yan Zhai yan@cloudflare.com bpf: skip non exist keys in generic_map_lookup_batch
Caleb Sander Mateos csander@purestorage.com nvme/ioctl: add missing space in err message
Caleb Sander Mateos csander@purestorage.com nvme-tcp: fix connect failure on receiving partial ICResp PDU
Damien Le Moal dlemoal@kernel.org nvme: tcp: Fix compilation warning with W=1
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org drm/msm/dsi/phy: Do not overwite PHY_CMN_CLK_CFG1 when choosing bitclk source
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org drm/msm/dsi/phy: Protect PHY_CMN_CLK_CFG1 against clock driver
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org drm/msm/dsi/phy: Protect PHY_CMN_CLK_CFG0 updated from driver side
Marijn Suijten marijn.suijten@somainline.org drm/msm/dpu: Don't leak bits_per_component into random DSC_ENC fields
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: enable DPU_WB_INPUT_CTRL for DPU 5.x
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: skip watchdog timer programming through TOP on >= SM8450
Rob Clark robdclark@chromium.org drm/msm: Avoid rounding up to one jiffy
David Hildenbrand david@redhat.com nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range()
Geert Uytterhoeven geert+renesas@glider.be platform: cznic: CZNIC_PLATFORMS should depend on ARCH_MVEBU
Geert Uytterhoeven geert+renesas@glider.be firmware: imx: IMX_SCMI_MISC_DRV should depend on ARCH_MXC
Bart Van Assche bvanassche@acm.org md/raid*: Fix the set_queue_limits implementations
Peng Fan peng.fan@nxp.com firmware: arm_scmi: imx: Correct tx size of scmi_imx_misc_ctrl_set
Patrick Wildt patrick@blueri.se arm64: dts: rockchip: adjust SMMU interrupt type on rk3588
Alan Maguire alan.maguire@oracle.com bpf: Fix softlockup in arena_map_free on 64k page kernel
Kuniyuki Iwashima kuniyu@amazon.com net: Add rx_skb of kfree_skb to raw_tp_null_args[].
Kumar Kartikeya Dwivedi memxor@gmail.com selftests/bpf: Add tests for raw_tp null handling
Chris Morgan macromorgan@hotmail.com power: supply: axp20x_battery: Fix fault handling for AXP717
Andrey Vatoropin a.vatoropin@crpt.ru power: supply: da9150-fg: fix potential overflow
Andy Yan andyshrk@163.com arm64: dts: rockchip: Fix lcdpwr_en pin for Cool Pi GenBook
Abel Wu wuyun.abel@bytedance.com bpf: Fix deadlock when freeing cgroup storage
Jiayuan Chen mrpre@163.com bpf: Disable non stream socket for strparser
Jiayuan Chen mrpre@163.com bpf: Fix wrong copied_seq calculation
Jiayuan Chen mrpre@163.com strparser: Add read_sock callback
Andrii Nakryiko andrii@kernel.org bpf: avoid holding freeze_mutex during mmap operation
Andrii Nakryiko andrii@kernel.org bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic
Shigeru Yoshida syoshida@redhat.com bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type()
Paolo Abeni pabeni@redhat.com net: allow small head cache usage with large MAX_SKB_FRAGS values
Sabrina Dubroca sd@queasysnail.net tcp: drop secpath at the same time as we currently drop dst
Nick Hu nick.hu@sifive.com net: axienet: Set mac_managed_pm
Breno Leitao leitao@debian.org arp: switch to dev_getbyhwaddr() in arp_req_set_public()
Breno Leitao leitao@debian.org net: Add non-RCU dev_getbyhwaddr() helper
Cong Wang xiyou.wangcong@gmail.com flow_dissector: Fix port range key handling in BPF conversion
Cong Wang xiyou.wangcong@gmail.com flow_dissector: Fix handling of mixed port and port-range keys
Kuniyuki Iwashima kuniyu@amazon.com geneve: Suppress list corruption splat in geneve_destroy_tunnels().
Kuniyuki Iwashima kuniyu@amazon.com gtp: Suppress list corruption splat in gtp_net_exit_batch_rtnl().
Kory Maincent kory.maincent@bootlin.com net: pse-pd: pd692x0: Fix power limit retrieval
Kory Maincent kory.maincent@bootlin.com net: pse-pd: Use power limit at driver side instead of current limit
Kory Maincent kory.maincent@bootlin.com net: pse-pd: Avoid setting max_uA in regulator constraints
Jakub Kicinski kuba@kernel.org tcp: adjust rcvq_space after updating scaling ratio
Michal Luczaj mhal@rbox.co vsock/bpf: Warn on socket without transport
Michal Luczaj mhal@rbox.co sockmap, vsock: For connectible sockets allow only connected
Nick Child nnac123@linux.ibm.com ibmvnic: Don't reference skb after sending to VIOS
Nick Child nnac123@linux.ibm.com ibmvnic: Add stat for tx direct vs tx batched
Julian Ruess julianr@linux.ibm.com s390/ism: add release function for struct device
Takashi Iwai tiwai@suse.de ALSA: seq: Drop UMP events when no UMP-conversion is set
Pierre Riteau pierre@stackhpc.com net/sched: cls_api: fix error handling causing NULL dereference
Vitaly Rodionov vitalyr@opensource.cirrus.com ALSA: hda/cirrus: Correct the full scale volume set logic
Kuniyuki Iwashima kuniyu@amazon.com geneve: Fix use-after-free in geneve_find_dev().
Junnan Wu junnan01.wu@samsung.com vsock/virtio: fix variables initialization during resuming
Shengjiu Wang shengjiu.wang@nxp.com ASoC: imx-audmix: remove cpu_mclk which is from cpu dai device
Christophe Leroy christophe.leroy@csgroup.eu powerpc/code-patching: Fix KASAN hit by not flagging text patching area as VM_ALLOC
Kailang Yang kailang@realtek.com ALSA: hda/realtek: Fixup ALC225 depop procedure
Christophe Leroy christophe.leroy@csgroup.eu powerpc/64s: Rewrite __real_pte() and __rpte_to_hidx() as static inline
Christophe Leroy christophe.leroy@csgroup.eu powerpc/code-patching: Disable KASAN report during patching via temporary mm
Peter Ujfalusi peter.ujfalusi@linux.intel.com ASoC: SOF: ipc4-topology: Harden loops for looking up ALH copiers
John Keeping jkeeping@inmusicbrands.com ASoC: rockchip: i2s-tdm: fix shift config for SND_SOC_DAIFMT_DSP_[AB]
Tejun Heo tj@kernel.org sched_ext: Fix migration disabled handling in targeted dispatches
Tejun Heo tj@kernel.org sched_ext: Factor out move_task_between_dsqs() from scx_dispatch_from_dsq()
Jill Donahue jilliandonahue58@gmail.com USB: gadget: f_midi: f_midi_complete to call queue_work
Steven Rostedt rostedt@goodmis.org tracing: Have the error of __tracing_resize_ring_buffer() passed to user
Steven Rostedt rostedt@goodmis.org tracing: Switch trace.c code over to use guard()
Lancelot SIX lancelot.six@amd.com drm/amdkfd: Ensure consistent barrier state saved in gfx12 trap handler
Jay Cornwall jay.cornwall@amd.com drm/amdkfd: Move gfx12 trap handler to separate file
Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com accel/ivpu: Fix error handling in recovery/reset
Tomasz Rusinowicz tomasz.rusinowicz@intel.com accel/ivpu: Add FW state dump on TDR
Karol Wachowski karol.wachowski@intel.com accel/ivpu: Add coredump support
Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com accel/ivpu: Limit FW version string length
Chen-Yu Tsai wenst@chromium.org arm64: dts: mediatek: mt8183: Disable DSI display output by default
Fabien Parent fparent@baylibre.com arm64: dts: mediatek: mt8183-pumpkin: add HDMI support
Takashi Iwai tiwai@suse.de PCI: Restore original INTX_DISABLE bit by pcim_intx()
Philipp Stanner pstanner@redhat.com PCI: Remove devres from pci_intx()
Philipp Stanner pstanner@redhat.com PCI: Export pci_intx_unmanaged() and pcim_intx()
Philipp Stanner pstanner@redhat.com PCI: Make pcim_request_all_regions() a public function
Dan Carpenter dan.carpenter@linaro.org ASoC: renesas: rz-ssi: Add a check for negative sample_space
Claudiu Beznea claudiu.beznea.uj@bp.renesas.com ASoC: renesas: rz-ssi: Terminate all the DMA transactions
Dmitry Torokhov dmitry.torokhov@gmail.com Input: synaptics - fix crash when enabling pass-through port
Dmitry Torokhov dmitry.torokhov@gmail.com Input: serio - define serio_pause_rx guard to pause and resume serio ports
Zijun Hu quic_zijuhu@quicinc.com Bluetooth: qca: Fix poor RF performance for WCN6855
Cheng Jiang quic_chejiang@quicinc.com Bluetooth: qca: Update firmware-name to support board specific nvm
loanchen lo-an.chen@amd.com drm/amd/display: Correct register address in dcn35
Charlene Liu Charlene.Liu@amd.com drm/amd/display: update dcn351 used clock offset
Lohita Mudimela lohita.mudimela@amd.com drm/amd/display: Refactoring if and endif statements to enable DC_LOGGER
Chao Gao chao.gao@intel.com KVM: nVMX: Defer SVI update to vmcs01 on EOI when L2 is active w/o VID
Sean Christopherson seanjc@google.com KVM: x86: Inline kvm_get_apic_mode() in lapic.h
Sean Christopherson seanjc@google.com KVM: x86: Get vcpu->arch.apic_base directly and drop kvm_get_apic_base()
Qu Wenruo wqu@suse.com btrfs: fix double accounting race when extent_writepage_io() failed
Qu Wenruo wqu@suse.com btrfs: fix double accounting race when btrfs_run_delalloc_range() failed
David Sterba dsterba@suse.com btrfs: use btrfs_inode in extent_writepage()
Qu Wenruo wqu@suse.com btrfs: rename btrfs_folio_(set|start|end)_writer_lock()
Qu Wenruo wqu@suse.com btrfs: unify to use writer locks for subpage locking
Qu Wenruo wqu@suse.com btrfs: remove unused btrfs_folio_start_writer_lock()
Qu Wenruo wqu@suse.com btrfs: mark all dirty sectors as locked inside writepage_delalloc()
Qu Wenruo wqu@suse.com btrfs: move the delalloc range bitmap search into extent_io.c
Qu Wenruo wqu@suse.com btrfs: do not assume the full page range is not dirty in extent_writepage_io()
Umesh Nerlige Ramappa umesh.nerlige.ramappa@intel.com xe/oa: Fix query mode of operation for OAR/OAC
Ashutosh Dixit ashutosh.dixit@intel.com drm/xe/oa: Add input fence dependencies
Ashutosh Dixit ashutosh.dixit@intel.com drm/xe/oa/uapi: Define and parse OA sync properties
Ashutosh Dixit ashutosh.dixit@intel.com drm/xe/oa: Separate batch submission from waiting for completion
Catalin Marinas catalin.marinas@arm.com arm64: mte: Do not allow PROT_MTE on MAP_HUGETLB user mappings
-------------
Diffstat:
 Documentation/networking/strparser.rst             |    9 +-
 Makefile                                           |    4 +-
 arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts    |  123 ++-
 arch/arm64/boot/dts/mediatek/mt8183.dtsi           |    1 +
 .../boot/dts/rockchip/px30-ringneck-haikou.dts     |    1 -
 arch/arm64/boot/dts/rockchip/px30-ringneck.dtsi    |    6 +
 .../dts/rockchip/rk3328-orangepi-r1-plus-lts.dts   |    6 +-
 arch/arm64/boot/dts/rockchip/rk3588-base.dtsi      |   22 +-
 .../dts/rockchip/rk3588-coolpi-cm5-genbook.dts     |    4 +-
 arch/arm64/include/asm/mman.h                      |    9 +-
 arch/powerpc/include/asm/book3s/64/hash-4k.h       |   12 +-
 arch/powerpc/lib/code-patching.c                   |    4 +-
 arch/s390/boot/startup.c                           |    2 +-
 arch/x86/events/intel/core.c                       |   20 +-
 arch/x86/events/intel/ds.c                         |    2 +-
 arch/x86/kvm/lapic.c                               |   11 +
 arch/x86/kvm/lapic.h                               |    8 +-
 arch/x86/kvm/vmx/nested.c                          |    5 +
 arch/x86/kvm/vmx/vmx.c                             |   21 +
 arch/x86/kvm/vmx/vmx.h                             |    1 +
 arch/x86/kvm/x86.c                                 |   17 +-
 drivers/accel/ivpu/Kconfig                         |    1 +
 drivers/accel/ivpu/Makefile                        |    1 +
 drivers/accel/ivpu/ivpu_coredump.c                 |   39 +
 drivers/accel/ivpu/ivpu_coredump.h                 |   25 +
 drivers/accel/ivpu/ivpu_drv.c                      |    5 +-
 drivers/accel/ivpu/ivpu_drv.h                      |    1 +
 drivers/accel/ivpu/ivpu_fw.c                       |    7 +-
 drivers/accel/ivpu/ivpu_fw.h                       |    6 +-
 drivers/accel/ivpu/ivpu_fw_log.h                   |    8 -
 drivers/accel/ivpu/ivpu_hw.c                       |    3 +
 drivers/accel/ivpu/ivpu_ipc.c                      |   26 +
 drivers/accel/ivpu/ivpu_ipc.h                      |    2 +
 drivers/accel/ivpu/ivpu_jsm_msg.c                  |    8 +
 drivers/accel/ivpu/ivpu_jsm_msg.h                  |    2 +
 drivers/accel/ivpu/ivpu_pm.c                       |   85 +-
 drivers/bluetooth/btqca.c                          |  118 +-
 drivers/clocksource/jcore-pit.c                    |   15 +-
 drivers/edac/qcom_edac.c                           |    4 +-
 .../firmware/arm_scmi/vendors/imx/imx-sm-misc.c    |    4 +-
 drivers/firmware/imx/Kconfig                       |    1 +
 drivers/gpio/gpiolib.c                             |   92 +-
 drivers/gpio/gpiolib.h                             |    4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c            |    3 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c              |   32 +-
 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h     |    3 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm |  202 +---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx12.asm | 1130 ++++++++++++++++++++
 drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile    |    2 +-
 drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c   |    5 +-
 .../amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c   |    5 +-
 .../amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c |    6 +-
 .../amd/display/dc/clk_mgr/dcn35/dcn351_clk_mgr.c  |  140 +++
 .../amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c   |  131 ++-
 .../amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.h   |    4 +
 .../drm/amd/display/dc/inc/hw/clk_mgr_internal.h   |   59 +
 .../display/dc/link/protocols/link_dp_capability.c |    3 +-
 drivers/gpu/drm/i915/display/intel_ddi.c           |    2 +-
 drivers/gpu/drm/i915/display/intel_display.c       |   18 +
 .../gpu/drm/i915/display/intel_dp_link_training.c  |   15 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c  |    4 +-
 drivers/gpu/drm/i915/i915_reg.h                    |    2 +-
 .../gpu/drm/msm/disp/dpu1/catalog/dpu_5_0_sm8150.h |    2 +-
 .../drm/msm/disp/dpu1/catalog/dpu_5_1_sc8180x.h    |    2 +-
 .../gpu/drm/msm/disp/dpu1/catalog/dpu_5_4_sm6125.h |    2 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c        |    3 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c         |    3 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_top.c         |    2 +-
 drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c          |   53 +-
 drivers/gpu/drm/msm/msm_drv.h                      |   11 +-
 .../gpu/drm/msm/registers/display/dsi_phy_7nm.xml  |   11 +-
 drivers/gpu/drm/nouveau/nouveau_svm.c              |    9 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c    |    2 +-
 drivers/gpu/drm/panel/panel-jadard-jd9365da-h3.c   |    8 +-
 drivers/gpu/drm/xe/xe_oa.c                         |  265 +++--
 drivers/gpu/drm/xe/xe_oa_types.h                   |    6 +
 drivers/gpu/drm/xe/xe_query.c                      |    2 +-
 drivers/gpu/drm/xe/xe_ring_ops.c                   |    5 +-
 drivers/gpu/drm/xe/xe_sched_job_types.h            |    2 +
 drivers/input/mouse/synaptics.c                    |   56 +-
 drivers/input/mouse/synaptics.h                    |    1 +
 drivers/irqchip/irq-gic-v3.c                       |   49 +-
 drivers/irqchip/irq-jcore-aic.c                    |    2 +-
 drivers/md/raid0.c                                 |    4 +-
 drivers/md/raid1.c                                 |    4 +-
 drivers/md/raid10.c                                |    4 +-
 drivers/mtd/nand/raw/cadence-nand-controller.c     |   42 +-
 drivers/mtd/spi-nor/sst.c                          |    2 +-
 drivers/net/ethernet/google/gve/gve.h              |   10 +
 drivers/net/ethernet/google/gve/gve_main.c         |    6 +-
 drivers/net/ethernet/ibm/ibmvnic.c                 |   27 +-
 drivers/net/ethernet/ibm/ibmvnic.h                 |    3 +-
 drivers/net/ethernet/netronome/nfp/bpf/cmsg.c      |    2 +
 drivers/net/ethernet/xilinx/xilinx_axienet_main.c  |    1 +
 drivers/net/geneve.c                               |   16 +-
 drivers/net/gtp.c                                  |    5 -
 drivers/net/pse-pd/pd692x0.c                       |   45 +-
 drivers/net/pse-pd/pse_core.c                      |   98 +-
 drivers/nvme/host/ioctl.c                          |    3 +-
 drivers/nvme/host/tcp.c                            |    7 +-
 drivers/pci/devres.c                               |   61 +-
 drivers/pci/pci.c                                  |   16 +-
 drivers/platform/cznic/Kconfig                     |    1 +
 drivers/power/supply/axp20x_battery.c              |   31 +-
 drivers/power/supply/da9150-fg.c                   |    4 +-
 drivers/s390/net/ism_drv.c                         |   14 +-
 drivers/soc/loongson/loongson2_guts.c              |    5 +-
 drivers/tee/optee/supp.c                           |   35 +-
 drivers/usb/gadget/function/f_midi.c               |    2 +-
 fs/btrfs/compression.c                             |    3 +-
 fs/btrfs/extent_io.c                               |  174 ++-
 fs/btrfs/inode.c                                   |    3 +-
 fs/btrfs/subpage.c                                 |  202 +---
 fs/btrfs/subpage.h                                 |   39 +-
 fs/smb/client/inode.c                              |    4 +-
 fs/smb/client/smb2ops.c                            |    4 +
 fs/xfs/scrub/common.h                              |    5 -
 fs/xfs/scrub/repair.h                              |   11 +-
 fs/xfs/scrub/scrub.c                               |   12 +
 include/linux/netdevice.h                          |    2 +
 include/linux/pci.h                                |    2 +
 include/linux/pse-pd/pse.h                         |   16 +-
 include/linux/serio.h                              |    3 +
 include/linux/skmsg.h                              |    2 +
 include/net/gro.h                                  |    3 +
 include/net/strparser.h                            |    2 +
 include/net/tcp.h                                  |   22 +
 include/uapi/drm/xe_drm.h                          |   17 +
 io_uring/io_uring.c                                |    2 +
 io_uring/rw.c                                      |   13 +-
 kernel/acct.c                                      |  134 ++-
 kernel/bpf/arena.c                                 |    2 +-
 kernel/bpf/bpf_cgrp_storage.c                      |    2 +-
 kernel/bpf/btf.c                                   |    2 +
 kernel/bpf/ringbuf.c                               |    4 -
 kernel/bpf/syscall.c                               |   43 +-
 kernel/sched/ext.c                                 |  150 ++-
 kernel/trace/ftrace.c                              |   36 +-
 kernel/trace/trace.c                               |  277 ++---
 kernel/trace/trace_functions.c                     |    6 +-
 lib/iov_iter.c                                     |    3 +-
 mm/madvise.c                                       |   11 +-
 mm/migrate_device.c                                |   13 +-
 net/bpf/test_run.c                                 |    5 +-
 net/core/dev.c                                     |   37 +-
 net/core/drop_monitor.c                            |   39 +-
 net/core/flow_dissector.c                          |   49 +-
 net/core/gro.c                                     |    3 -
 net/core/skbuff.c                                  |   10 +-
 net/core/skmsg.c                                   |    7 +
 net/core/sock_map.c                                |    8 +-
 net/ipv4/arp.c                                     |    2 +-
 net/ipv4/tcp.c                                     |   29 +-
 net/ipv4/tcp_bpf.c                                 |   36 +
 net/ipv4/tcp_fastopen.c                            |    4 +-
 net/ipv4/tcp_input.c                               |   20 +-
 net/ipv4/tcp_ipv4.c                                |    2 +-
 net/sched/cls_api.c                                |    2 +-
 net/strparser/strparser.c                          |   11 +-
 net/vmw_vsock/af_vsock.c                           |    3 +
 net/vmw_vsock/virtio_transport.c                   |   10 +-
 net/vmw_vsock/vsock_bpf.c                          |    2 +-
 sound/core/seq/seq_clientmgr.c                     |   12 +-
 sound/pci/hda/hda_codec.c                          |    4 +-
 sound/pci/hda/patch_conexant.c                     |    1 +
 sound/pci/hda/patch_cs8409-tables.c                |    6 +-
 sound/pci/hda/patch_cs8409.c                       |   20 +-
 sound/pci/hda/patch_cs8409.h                       |    5 +-
 sound/pci/hda/patch_realtek.c                      |    1 +
 sound/soc/fsl/fsl_micfil.c                         |    2 +
 sound/soc/fsl/imx-audmix.c                         |   31 -
 sound/soc/rockchip/rockchip_i2s_tdm.c              |    4 +-
 sound/soc/sh/rz-ssi.c                              |   10 +-
 sound/soc/sof/ipc4-topology.c                      |   12 +-
 sound/soc/sof/pcm.c                                |    2 +
 sound/soc/sof/stream-ipc.c                         |    6 +-
 .../selftests/bpf/bpf_testmod/bpf_testmod-events.h |    8 +
 .../selftests/bpf/bpf_testmod/bpf_testmod.c        |    2 +
 .../testing/selftests/bpf/prog_tests/raw_tp_null.c |   25 +
 tools/testing/selftests/bpf/progs/raw_tp_null.c    |   32 +
 tools/testing/selftests/mm/Makefile                |    9 +-
 181 files changed, 3590 insertions(+), 1560 deletions(-)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Catalin Marinas catalin.marinas@arm.com
PROT_MTE (memory tagging extensions) is not supported on all user mmap() types for various reasons (memory attributes, backing storage, CoW handling). The arm64 arch_validate_flags() function checks whether the VM_MTE_ALLOWED flag has been set for a vma during mmap(), usually by arch_calc_vm_flag_bits().
Linux prior to 6.13 does not support PROT_MTE hugetlb mappings. This was added by commit 25c17c4b55de ("hugetlb: arm64: add mte support"). However, earlier kernels inadvertently set VM_MTE_ALLOWED on (MAP_ANONYMOUS | MAP_HUGETLB) mappings by only checking for MAP_ANONYMOUS.
Explicitly check MAP_HUGETLB in arch_calc_vm_flag_bits() and avoid setting VM_MTE_ALLOWED for such mappings.
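For illustration, a minimal userspace sketch (an assumption-laden demo, not part of the patch: it assumes an arm64 machine with MTE and preallocated hugepages, and the PROT_MTE fallback define is only for older uapi headers):

#include <stdio.h>
#include <sys/mman.h>

#ifndef PROT_MTE
#define PROT_MTE 0x20	/* arm64 PROT_MTE, for older uapi headers */
#endif

int main(void)
{
	/* 2M hugetlb mapping with PROT_MTE requested */
	void *p = mmap(NULL, 2 * 1024 * 1024,
		       PROT_READ | PROT_WRITE | PROT_MTE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	/* With this fix, mmap() fails with EINVAL because VM_MTE_ALLOWED
	 * is never set for hugetlb mappings, so arch_validate_flags()
	 * rejects the vma. */
	printf("mmap: %s\n", p == MAP_FAILED ? "failed (expected)" : "succeeded");
	if (p != MAP_FAILED)
		munmap(p, 2 * 1024 * 1024);
	return 0;
}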
Fixes: 9f3419315f3c ("arm64: mte: Add PROT_MTE support to mmap() and mprotect()")
Cc: stable@vger.kernel.org # 5.10.x-6.12.x
Reported-by: Naresh Kamboju naresh.kamboju@linaro.org
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/include/asm/mman.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -41,9 +41,12 @@ static inline unsigned long arch_calc_vm
	 * backed by tags-capable memory. The vm_flags may be overridden by a
	 * filesystem supporting MTE (RAM-based).
	 */
-	if (system_supports_mte() &&
-	    ((flags & MAP_ANONYMOUS) || shmem_file(file)))
-		return VM_MTE_ALLOWED;
+	if (system_supports_mte()) {
+		if ((flags & MAP_ANONYMOUS) && !(flags & MAP_HUGETLB))
+			return VM_MTE_ALLOWED;
+		if (shmem_file(file))
+			return VM_MTE_ALLOWED;
+	}
 
 	return 0;
 }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ashutosh Dixit ashutosh.dixit@intel.com
[ Upstream commit dddcb19ad4d4bbe943a72a1fb3266c6e8aa8d541 ]
When we introduce xe_syncs, we don't wait for internal OA programming batches to complete. That is, xe_syncs are signaled asynchronously. In anticipation of this, separate out batch submission from waiting for completion of those batches.
v2: Change return type of xe_oa_submit_bb to "struct dma_fence *" (Matt B)
v3: Retain init "int err = 0;" in xe_oa_submit_bb (Jose)
Reviewed-by: Jonathan Cavitt jonathan.cavitt@intel.com
Signed-off-by: Ashutosh Dixit ashutosh.dixit@intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20241022200352.1192560-2-ashut...
Stable-dep-of: f0ed39830e60 ("xe/oa: Fix query mode of operation for OAR/OAC")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/xe/xe_oa.c | 57 +++++++++++++++++++++++++++++---------
 1 file changed, 44 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index 6fc00d63b2857..3328529774cb7 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -567,11 +567,10 @@ static __poll_t xe_oa_poll(struct file *file, poll_table *wait)
 	return ret;
 }
 
-static int xe_oa_submit_bb(struct xe_oa_stream *stream, struct xe_bb *bb)
+static struct dma_fence *xe_oa_submit_bb(struct xe_oa_stream *stream, struct xe_bb *bb)
 {
 	struct xe_sched_job *job;
 	struct dma_fence *fence;
-	long timeout;
 	int err = 0;
 
 	/* Kernel configuration is issued on stream->k_exec_q, not stream->exec_q */
@@ -585,14 +584,9 @@ static int xe_oa_submit_bb(struct xe_oa_stream *stream, struct xe_bb *bb)
 	fence = dma_fence_get(&job->drm.s_fence->finished);
 	xe_sched_job_push(job);
 
-	timeout = dma_fence_wait_timeout(fence, false, HZ);
-	dma_fence_put(fence);
-	if (timeout < 0)
-		err = timeout;
-	else if (!timeout)
-		err = -ETIME;
+	return fence;
 exit:
-	return err;
+	return ERR_PTR(err);
 }
 
 static void write_cs_mi_lri(struct xe_bb *bb, const struct xe_oa_reg *reg_data, u32 n_regs)
@@ -656,6 +650,7 @@ static void xe_oa_store_flex(struct xe_oa_stream *stream, struct xe_lrc *lrc,
 static int xe_oa_modify_ctx_image(struct xe_oa_stream *stream, struct xe_lrc *lrc,
 				  const struct flex *flex, u32 count)
 {
+	struct dma_fence *fence;
 	struct xe_bb *bb;
 	int err;
 
@@ -667,7 +662,16 @@ static int xe_oa_modify_ctx_image(struct xe_oa_stream *stream, struct xe_lrc *lr
 
 	xe_oa_store_flex(stream, lrc, bb, flex, count);
 
-	err = xe_oa_submit_bb(stream, bb);
+	fence = xe_oa_submit_bb(stream, bb);
+	if (IS_ERR(fence)) {
+		err = PTR_ERR(fence);
+		goto free_bb;
+	}
+	xe_bb_free(bb, fence);
+	dma_fence_put(fence);
+
+	return 0;
+free_bb:
 	xe_bb_free(bb, NULL);
 exit:
 	return err;
@@ -675,6 +679,7 @@ static int xe_oa_modify_ctx_image(struct xe_oa_stream *stream, struct xe_lrc *lr
 
 static int xe_oa_load_with_lri(struct xe_oa_stream *stream, struct xe_oa_reg *reg_lri)
 {
+	struct dma_fence *fence;
 	struct xe_bb *bb;
 	int err;
 
@@ -686,7 +691,16 @@ static int xe_oa_load_with_lri(struct xe_oa_stream *stream, struct xe_oa_reg *re
 
 	write_cs_mi_lri(bb, reg_lri, 1);
 
-	err = xe_oa_submit_bb(stream, bb);
+	fence = xe_oa_submit_bb(stream, bb);
+	if (IS_ERR(fence)) {
+		err = PTR_ERR(fence);
+		goto free_bb;
+	}
+	xe_bb_free(bb, fence);
+	dma_fence_put(fence);
+
+	return 0;
+free_bb:
 	xe_bb_free(bb, NULL);
 exit:
 	return err;
@@ -914,15 +928,32 @@ static int xe_oa_emit_oa_config(struct xe_oa_stream *stream, struct xe_oa_config
 {
 #define NOA_PROGRAM_ADDITIONAL_DELAY_US 500
 	struct xe_oa_config_bo *oa_bo;
-	int err, us = NOA_PROGRAM_ADDITIONAL_DELAY_US;
+	int err = 0, us = NOA_PROGRAM_ADDITIONAL_DELAY_US;
+	struct dma_fence *fence;
+	long timeout;
 
+	/* Emit OA configuration batch */
 	oa_bo = xe_oa_alloc_config_buffer(stream, config);
 	if (IS_ERR(oa_bo)) {
 		err = PTR_ERR(oa_bo);
 		goto exit;
 	}
 
-	err = xe_oa_submit_bb(stream, oa_bo->bb);
+	fence = xe_oa_submit_bb(stream, oa_bo->bb);
+	if (IS_ERR(fence)) {
+		err = PTR_ERR(fence);
+		goto exit;
+	}
+
+	/* Wait till all previous batches have executed */
+	timeout = dma_fence_wait_timeout(fence, false, 5 * HZ);
+	dma_fence_put(fence);
+	if (timeout < 0)
+		err = timeout;
+	else if (!timeout)
+		err = -ETIME;
+	if (err)
+		drm_dbg(&stream->oa->xe->drm, "dma_fence_wait_timeout err %d\n", err);
 
 	/* Additional empirical delay needed for NOA programming after registers are written */
 	usleep_range(us, 2 * us);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ashutosh Dixit ashutosh.dixit@intel.com
[ Upstream commit c8507a25cebd179db935dd266a33c51bef1b1e80 ]
Now that we have laid the groundwork, introduce OA sync properties in the uapi and parse the input xe_sync array as is done elsewhere in the driver. Also add DRM_XE_OA_CAPS_SYNCS bit in OA capabilities for userspace.
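As a rough userspace sketch of the resulting uapi (assumptions: the drm_xe_ext_set_property and drm_xe_sync layouts from xe_drm.h and the usual DRM_XE_OA_EXTENSION_SET_PROPERTY chaining; the stream-open ioctl plumbing itself is elided):

#include <stdint.h>
#include <string.h>
#include <drm/xe_drm.h>

/* Chain the two new properties so the OA stream open call receives a
 * user-supplied sync array. */
static void chain_oa_syncs(struct drm_xe_ext_set_property ext[2],
			   struct drm_xe_sync *syncs, uint32_t num_syncs)
{
	memset(ext, 0, 2 * sizeof(*ext));

	ext[0].base.name = DRM_XE_OA_EXTENSION_SET_PROPERTY;
	ext[0].base.next_extension = (uint64_t)(uintptr_t)&ext[1];
	ext[0].property = DRM_XE_OA_PROPERTY_NUM_SYNCS;
	ext[0].value = num_syncs;

	ext[1].base.name = DRM_XE_OA_EXTENSION_SET_PROPERTY;
	ext[1].property = DRM_XE_OA_PROPERTY_SYNCS;
	ext[1].value = (uint64_t)(uintptr_t)syncs;	/* parsed by xe_oa_parse_syncs() */
}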
v2: Fix and document DRM_XE_SYNC_TYPE_USER_FENCE for OA (Matt B)
    Add DRM_XE_OA_CAPS_SYNCS bit to OA capabilities (Jose)
Acked-by: José Roberto de Souza jose.souza@intel.com
Reviewed-by: Jonathan Cavitt jonathan.cavitt@intel.com
Signed-off-by: Ashutosh Dixit ashutosh.dixit@intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20241022200352.1192560-3-ashut...
Stable-dep-of: f0ed39830e60 ("xe/oa: Fix query mode of operation for OAR/OAC")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/xe/xe_oa.c       | 83 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_oa_types.h |  6 +++
 drivers/gpu/drm/xe/xe_query.c    |  2 +-
 include/uapi/drm/xe_drm.h        | 17 +++++++
 4 files changed, 106 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index 3328529774cb7..20d279ed3c382 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -36,6 +36,7 @@
 #include "xe_pm.h"
 #include "xe_sched_job.h"
 #include "xe_sriov.h"
+#include "xe_sync.h"
 
 #define DEFAULT_POLL_FREQUENCY_HZ 200
 #define DEFAULT_POLL_PERIOD_NS (NSEC_PER_SEC / DEFAULT_POLL_FREQUENCY_HZ)
@@ -70,6 +71,7 @@ struct flex {
 };
 
 struct xe_oa_open_param {
+	struct xe_file *xef;
 	u32 oa_unit_id;
 	bool sample;
 	u32 metric_set;
@@ -81,6 +83,9 @@ struct xe_oa_open_param {
 	struct xe_exec_queue *exec_q;
 	struct xe_hw_engine *hwe;
 	bool no_preempt;
+	struct drm_xe_sync __user *syncs_user;
+	int num_syncs;
+	struct xe_sync_entry *syncs;
 };
 
 struct xe_oa_config_bo {
@@ -1393,6 +1398,9 @@ static int xe_oa_stream_init(struct xe_oa_stream *stream,
 	stream->period_exponent = param->period_exponent;
 	stream->no_preempt = param->no_preempt;
 
+	stream->num_syncs = param->num_syncs;
+	stream->syncs = param->syncs;
+
 	/*
 	 * For Xe2+, when overrun mode is enabled, there are no partial reports at the end
 	 * of buffer, making the OA buffer effectively a non-power-of-2 size circular
@@ -1743,6 +1751,20 @@ static int xe_oa_set_no_preempt(struct xe_oa *oa, u64 value,
 	return 0;
 }
 
+static int xe_oa_set_prop_num_syncs(struct xe_oa *oa, u64 value,
+				    struct xe_oa_open_param *param)
+{
+	param->num_syncs = value;
+	return 0;
+}
+
+static int xe_oa_set_prop_syncs_user(struct xe_oa *oa, u64 value,
+				     struct xe_oa_open_param *param)
+{
+	param->syncs_user = u64_to_user_ptr(value);
+	return 0;
+}
+
 typedef int (*xe_oa_set_property_fn)(struct xe_oa *oa, u64 value,
 				     struct xe_oa_open_param *param);
 static const xe_oa_set_property_fn xe_oa_set_property_funcs[] = {
@@ -1755,6 +1777,8 @@ static const xe_oa_set_property_fn xe_oa_set_property_funcs[] = {
 	[DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID] = xe_oa_set_prop_exec_queue_id,
 	[DRM_XE_OA_PROPERTY_OA_ENGINE_INSTANCE] = xe_oa_set_prop_engine_instance,
 	[DRM_XE_OA_PROPERTY_NO_PREEMPT] = xe_oa_set_no_preempt,
+	[DRM_XE_OA_PROPERTY_NUM_SYNCS] = xe_oa_set_prop_num_syncs,
+	[DRM_XE_OA_PROPERTY_SYNCS] = xe_oa_set_prop_syncs_user,
 };
 
 static int xe_oa_user_ext_set_property(struct xe_oa *oa, u64 extension,
@@ -1814,6 +1838,49 @@ static int xe_oa_user_extensions(struct xe_oa *oa, u64 extension, int ext_number
 	return 0;
 }
 
+static int xe_oa_parse_syncs(struct xe_oa *oa, struct xe_oa_open_param *param)
+{
+	int ret, num_syncs, num_ufence = 0;
+
+	if (param->num_syncs && !param->syncs_user) {
+		drm_dbg(&oa->xe->drm, "num_syncs specified without sync array\n");
+		ret = -EINVAL;
+		goto exit;
+	}
+
+	if (param->num_syncs) {
+		param->syncs = kcalloc(param->num_syncs, sizeof(*param->syncs), GFP_KERNEL);
+		if (!param->syncs) {
+			ret = -ENOMEM;
+			goto exit;
+		}
+	}
+
+	for (num_syncs = 0; num_syncs < param->num_syncs; num_syncs++) {
+		ret = xe_sync_entry_parse(oa->xe, param->xef, &param->syncs[num_syncs],
+					  &param->syncs_user[num_syncs], 0);
+		if (ret)
+			goto err_syncs;
+
+		if (xe_sync_is_ufence(&param->syncs[num_syncs]))
+			num_ufence++;
+	}
+
+	if (XE_IOCTL_DBG(oa->xe, num_ufence > 1)) {
+		ret = -EINVAL;
+		goto err_syncs;
+	}
+
+	return 0;
+
+err_syncs:
+	while (num_syncs--)
+		xe_sync_entry_cleanup(&param->syncs[num_syncs]);
+	kfree(param->syncs);
+exit:
+	return ret;
+}
+
 /**
  * xe_oa_stream_open_ioctl - Opens an OA stream
  * @dev: @drm_device
@@ -1839,6 +1906,7 @@ int xe_oa_stream_open_ioctl(struct drm_device *dev, u64 data, struct drm_file *f
 		return -ENODEV;
 	}
 
+	param.xef = xef;
 	ret = xe_oa_user_extensions(oa, data, 0, &param);
 	if (ret)
 		return ret;
@@ -1907,11 +1975,24 @@ int xe_oa_stream_open_ioctl(struct drm_device *dev, u64 data, struct drm_file *f
 			drm_dbg(&oa->xe->drm, "Using periodic sampling freq %lld Hz\n", oa_freq_hz);
 	}
 
+	ret = xe_oa_parse_syncs(oa, &param);
+	if (ret)
+		goto err_exec_q;
+
 	mutex_lock(&param.hwe->gt->oa.gt_lock);
 	ret = xe_oa_stream_open_ioctl_locked(oa, &param);
 	mutex_unlock(&param.hwe->gt->oa.gt_lock);
+	if (ret < 0)
+		goto err_sync_cleanup;
+
+	return ret;
+
+err_sync_cleanup:
+	while (param.num_syncs--)
+		xe_sync_entry_cleanup(&param.syncs[param.num_syncs]);
+	kfree(param.syncs);
 err_exec_q:
-	if (ret < 0 && param.exec_q)
+	if (param.exec_q)
 		xe_exec_queue_put(param.exec_q);
 	return ret;
 }
diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h
index 8862eca73fbe3..99f4b2d4bdcf6 100644
--- a/drivers/gpu/drm/xe/xe_oa_types.h
+++ b/drivers/gpu/drm/xe/xe_oa_types.h
@@ -238,5 +238,11 @@ struct xe_oa_stream {
 
 	/** @no_preempt: Whether preemption and timeslicing is disabled for stream exec_q */
 	u32 no_preempt;
+
+	/** @num_syncs: size of @syncs array */
+	u32 num_syncs;
+
+	/** @syncs: syncs to wait on and to signal */
+	struct xe_sync_entry *syncs;
 };
 #endif
diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c
index 1c96375bd7df7..6fec5d1a1eb44 100644
--- a/drivers/gpu/drm/xe/xe_query.c
+++ b/drivers/gpu/drm/xe/xe_query.c
@@ -679,7 +679,7 @@ static int query_oa_units(struct xe_device *xe,
 		du->oa_unit_id = u->oa_unit_id;
 		du->oa_unit_type = u->type;
 		du->oa_timestamp_freq = xe_oa_timestamp_frequency(gt);
-		du->capabilities = DRM_XE_OA_CAPS_BASE;
+		du->capabilities = DRM_XE_OA_CAPS_BASE | DRM_XE_OA_CAPS_SYNCS;
 
 		j = 0;
 		for_each_hw_engine(hwe, gt, hwe_id) {
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index c4182e95a6195..4a8a4a63e99ca 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1485,6 +1485,7 @@ struct drm_xe_oa_unit {
 	/** @capabilities: OA capabilities bit-mask */
 	__u64 capabilities;
 #define DRM_XE_OA_CAPS_BASE		(1 << 0)
+#define DRM_XE_OA_CAPS_SYNCS		(1 << 1)
 
 	/** @oa_timestamp_freq: OA timestamp freq */
 	__u64 oa_timestamp_freq;
@@ -1634,6 +1635,22 @@ enum drm_xe_oa_property_id {
 	 * to be disabled for the stream exec queue.
 	 */
 	DRM_XE_OA_PROPERTY_NO_PREEMPT,
+
+	/**
+	 * @DRM_XE_OA_PROPERTY_NUM_SYNCS: Number of syncs in the sync array
+	 * specified in @DRM_XE_OA_PROPERTY_SYNCS
+	 */
+	DRM_XE_OA_PROPERTY_NUM_SYNCS,
+
+	/**
+	 * @DRM_XE_OA_PROPERTY_SYNCS: Pointer to struct @drm_xe_sync array
+	 * with array size specified via @DRM_XE_OA_PROPERTY_NUM_SYNCS. OA
+	 * configuration will wait till input fences signal. Output fences
+	 * will signal after the new OA configuration takes effect. For
+	 * @DRM_XE_SYNC_TYPE_USER_FENCE, @addr is a user pointer, similar
+	 * to the VM bind case.
+	 */
+	DRM_XE_OA_PROPERTY_SYNCS,
 };
 
 /**
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ashutosh Dixit ashutosh.dixit@intel.com
[ Upstream commit 2fb4350a283af03a5ee34ba765783a941f942b82 ]
Add input fence dependencies which will make OA configuration wait till these dependencies are met (till input fences signal).
v2: Change add_deps arg to xe_oa_submit_bb from bool to enum (Matt Brost)
Reviewed-by: Jonathan Cavitt jonathan.cavitt@intel.com
Signed-off-by: Ashutosh Dixit ashutosh.dixit@intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20241022200352.1192560-4-ashut...
Stable-dep-of: f0ed39830e60 ("xe/oa: Fix query mode of operation for OAR/OAC")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/xe/xe_oa.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index 20d279ed3c382..1bfc4b58b5c17 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -42,6 +42,11 @@
 #define DEFAULT_POLL_PERIOD_NS (NSEC_PER_SEC / DEFAULT_POLL_FREQUENCY_HZ)
 #define XE_OA_UNIT_INVALID U32_MAX
 
+enum xe_oa_submit_deps {
+	XE_OA_SUBMIT_NO_DEPS,
+	XE_OA_SUBMIT_ADD_DEPS,
+};
+
 struct xe_oa_reg {
 	struct xe_reg addr;
 	u32 value;
@@ -572,7 +577,8 @@ static __poll_t xe_oa_poll(struct file *file, poll_table *wait)
 	return ret;
 }
 
-static struct dma_fence *xe_oa_submit_bb(struct xe_oa_stream *stream, struct xe_bb *bb)
+static struct dma_fence *xe_oa_submit_bb(struct xe_oa_stream *stream, enum xe_oa_submit_deps deps,
+					 struct xe_bb *bb)
 {
 	struct xe_sched_job *job;
 	struct dma_fence *fence;
@@ -585,11 +591,22 @@ static struct dma_fence *xe_oa_submit_bb(struct xe_oa_stream *stream, struct xe_
 		goto exit;
 	}
 
+	if (deps == XE_OA_SUBMIT_ADD_DEPS) {
+		for (int i = 0; i < stream->num_syncs && !err; i++)
+			err = xe_sync_entry_add_deps(&stream->syncs[i], job);
+		if (err) {
+			drm_dbg(&stream->oa->xe->drm, "xe_sync_entry_add_deps err %d\n", err);
+			goto err_put_job;
+		}
+	}
+
 	xe_sched_job_arm(job);
 	fence = dma_fence_get(&job->drm.s_fence->finished);
 	xe_sched_job_push(job);
 
 	return fence;
+err_put_job:
+	xe_sched_job_put(job);
 exit:
 	return ERR_PTR(err);
 }
@@ -667,7 +684,7 @@ static int xe_oa_modify_ctx_image(struct xe_oa_stream *stream, struct xe_lrc *lr
 
 	xe_oa_store_flex(stream, lrc, bb, flex, count);
 
-	fence = xe_oa_submit_bb(stream, bb);
+	fence = xe_oa_submit_bb(stream, XE_OA_SUBMIT_NO_DEPS, bb);
 	if (IS_ERR(fence)) {
 		err = PTR_ERR(fence);
 		goto free_bb;
@@ -696,7 +713,7 @@ static int xe_oa_load_with_lri(struct xe_oa_stream *stream, struct xe_oa_reg *re
 
 	write_cs_mi_lri(bb, reg_lri, 1);
 
-	fence = xe_oa_submit_bb(stream, bb);
+	fence = xe_oa_submit_bb(stream, XE_OA_SUBMIT_NO_DEPS, bb);
 	if (IS_ERR(fence)) {
 		err = PTR_ERR(fence);
 		goto free_bb;
@@ -944,7 +961,7 @@ static int xe_oa_emit_oa_config(struct xe_oa_stream *stream, struct xe_oa_config
 		goto exit;
 	}
 
-	fence = xe_oa_submit_bb(stream, oa_bo->bb);
+	fence = xe_oa_submit_bb(stream, XE_OA_SUBMIT_ADD_DEPS, oa_bo->bb);
 	if (IS_ERR(fence)) {
 		err = PTR_ERR(fence);
 		goto exit;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Umesh Nerlige Ramappa umesh.nerlige.ramappa@intel.com
[ Upstream commit 55039832f98c7e05f1cf9e0d8c12b2490abd0f16 ]
This is a set of squashed commits to facilitate smooth application to stable. Each commit message is retained for reference.
1) Allow a GGTT mapped batch to be submitted to user exec queue
For an OA use case, one of the HW registers needs to be modified by submitting an MI_LOAD_REGISTER_IMM command to the user's exec queue, so that the register is modified in the user's hardware context. In order to do this, a batch that is mapped in GGTT needs to be submitted to the user exec queue. Since all user submissions use q->vm and hence PPGTT, add some plumbing to enable submission of batches mapped in GGTT.
v2: ggtt is zero-initialized, so no need to set it false (Matt Brost)
2) xe/oa: Use MI_LOAD_REGISTER_IMMEDIATE to enable OAR/OAC
To enable OAR/OAC, a bit in RING_CONTEXT_CONTROL needs to be set. Setting this bit causes the context image size to change, and if not done correctly, it can cause undesired hangs.
Current code uses a separate exec_queue to modify this bit and is error-prone. As per HW recommendation, submit MI_LOAD_REGISTER_IMM to the target hardware context to modify the relevant bit.
In the v2 version, an attempt was made to submit everything to the user queue, but it failed the unprivileged-single-ctx-counters test. It appears that OACTXCONTROL must be modified from a remote context.

In the v3 version, all context-specific register configuration was moved to LOAD_REGISTER_IMMEDIATE, and that seems to work well. This is a cleaner way, since we can now submit all configuration to the user exec_queue, and the fence handling is simplified.
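For reference, a simplified sketch of what such an LRI batch contains (illustrative only; the driver's actual encoding lives in write_cs_mi_lri() and the xe MI command headers, and the opcode macro below is an assumed stand-in):

/* One MI_LOAD_REGISTER_IMM header dword followed by (offset, value)
 * pairs; hence the "2 * count + 1" batch sizing seen in the diff. */
#define MI_LRI_SKETCH(n)	((0x22u << 23) | (2u * (n) - 1))	/* assumed encoding */

static void emit_lri_sketch(unsigned int *cs, unsigned int *len,
			    const struct xe_oa_reg *regs, unsigned int n)
{
	unsigned int i;

	cs[(*len)++] = MI_LRI_SKETCH(n);
	for (i = 0; i < n; i++) {
		cs[(*len)++] = regs[i].addr.addr;	/* register offset */
		cs[(*len)++] = regs[i].value;		/* immediate value */
	}
}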
v2: (Matt)
- set job->ggtt to true if create job is successful
- unlock vm on job error

(Ashutosh)
- don't wait on job submission
- use kernel exec queue where possible

v3: (Ashutosh)
- Fix checkpatch issues
- Remove extra spaces/new-lines
- Add Fixes: and Cc: tags
- Reset context control bit when OA stream is closed
- Submit all config via MI_LOAD_REGISTER_IMMEDIATE

(Umesh)
- Update commit message for v3 experiment
- Squash patches for easier port to stable

v4: (Ashutosh)
- No need to pass q to xe_oa_submit_bb
- Do not support exec queues with width > 1
- Fix disabling of CTX_CTRL_OAC_CONTEXT_ENABLE

v5: (Ashutosh)
- Drop reg_lri related comments
- Use XE_OA_SUBMIT_NO_DEPS in xe_oa_load_with_lri
Fixes: 8135f1c09dd2 ("drm/xe/oa: Don't reset OAC_CONTEXT_ENABLE on OA stream close")
Signed-off-by: Umesh Nerlige Ramappa umesh.nerlige.ramappa@intel.com
Reviewed-by: Matthew Brost matthew.brost@intel.com # commit 1
Reviewed-by: Ashutosh Dixit ashutosh.dixit@intel.com
Cc: stable@vger.kernel.org
Reviewed-by: Jonathan Cavitt jonathan.cavitt@intel.com
Signed-off-by: Ashutosh Dixit ashutosh.dixit@intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20241220171919.571528-2-umesh....
Stable-dep-of: f0ed39830e60 ("xe/oa: Fix query mode of operation for OAR/OAC")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/xe/xe_oa.c              | 134 ++++++++----------------
 drivers/gpu/drm/xe/xe_ring_ops.c        |   5 +-
 drivers/gpu/drm/xe/xe_sched_job_types.h |   2 +
 3 files changed, 51 insertions(+), 90 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index 1bfc4b58b5c17..e6744422dee49 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -69,12 +69,6 @@ struct xe_oa_config {
 	struct rcu_head rcu;
 };
 
-struct flex {
-	struct xe_reg reg;
-	u32 offset;
-	u32 value;
-};
-
 struct xe_oa_open_param {
 	struct xe_file *xef;
 	u32 oa_unit_id;
@@ -577,19 +571,38 @@ static __poll_t xe_oa_poll(struct file *file, poll_table *wait)
 	return ret;
 }
 
+static void xe_oa_lock_vma(struct xe_exec_queue *q)
+{
+	if (q->vm) {
+		down_read(&q->vm->lock);
+		xe_vm_lock(q->vm, false);
+	}
+}
+
+static void xe_oa_unlock_vma(struct xe_exec_queue *q)
+{
+	if (q->vm) {
+		xe_vm_unlock(q->vm);
+		up_read(&q->vm->lock);
+	}
+}
+
 static struct dma_fence *xe_oa_submit_bb(struct xe_oa_stream *stream, enum xe_oa_submit_deps deps,
 					 struct xe_bb *bb)
 {
+	struct xe_exec_queue *q = stream->exec_q ?: stream->k_exec_q;
 	struct xe_sched_job *job;
 	struct dma_fence *fence;
 	int err = 0;
 
-	/* Kernel configuration is issued on stream->k_exec_q, not stream->exec_q */
-	job = xe_bb_create_job(stream->k_exec_q, bb);
+	xe_oa_lock_vma(q);
+
+	job = xe_bb_create_job(q, bb);
 	if (IS_ERR(job)) {
 		err = PTR_ERR(job);
 		goto exit;
 	}
+	job->ggtt = true;
 
 	if (deps == XE_OA_SUBMIT_ADD_DEPS) {
 		for (int i = 0; i < stream->num_syncs && !err; i++)
@@ -604,10 +617,13 @@ static struct dma_fence *xe_oa_submit_bb(struct xe_oa_stream *stream, enum xe_oa
 	fence = dma_fence_get(&job->drm.s_fence->finished);
 	xe_sched_job_push(job);
 
+	xe_oa_unlock_vma(q);
+
 	return fence;
 err_put_job:
 	xe_sched_job_put(job);
 exit:
+	xe_oa_unlock_vma(q);
 	return ERR_PTR(err);
 }
 
@@ -655,63 +671,19 @@ static void xe_oa_free_configs(struct xe_oa_stream *stream)
 		free_oa_config_bo(oa_bo);
 }
 
-static void xe_oa_store_flex(struct xe_oa_stream *stream, struct xe_lrc *lrc,
-			     struct xe_bb *bb, const struct flex *flex, u32 count)
-{
-	u32 offset = xe_bo_ggtt_addr(lrc->bo);
-
-	do {
-		bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_GGTT | MI_SDI_NUM_DW(1);
-		bb->cs[bb->len++] = offset + flex->offset * sizeof(u32);
-		bb->cs[bb->len++] = 0;
-		bb->cs[bb->len++] = flex->value;
-
-	} while (flex++, --count);
-}
-
-static int xe_oa_modify_ctx_image(struct xe_oa_stream *stream, struct xe_lrc *lrc,
-				  const struct flex *flex, u32 count)
-{
-	struct dma_fence *fence;
-	struct xe_bb *bb;
-	int err;
-
-	bb = xe_bb_new(stream->gt, 4 * count, false);
-	if (IS_ERR(bb)) {
-		err = PTR_ERR(bb);
-		goto exit;
-	}
-
-	xe_oa_store_flex(stream, lrc, bb, flex, count);
-
-	fence = xe_oa_submit_bb(stream, XE_OA_SUBMIT_NO_DEPS, bb);
-	if (IS_ERR(fence)) {
-		err = PTR_ERR(fence);
-		goto free_bb;
-	}
-	xe_bb_free(bb, fence);
-	dma_fence_put(fence);
-
-	return 0;
-free_bb:
-	xe_bb_free(bb, NULL);
-exit:
-	return err;
-}
-
-static int xe_oa_load_with_lri(struct xe_oa_stream *stream, struct xe_oa_reg *reg_lri)
+static int xe_oa_load_with_lri(struct xe_oa_stream *stream, struct xe_oa_reg *reg_lri, u32 count)
 {
 	struct dma_fence *fence;
 	struct xe_bb *bb;
 	int err;
 
-	bb = xe_bb_new(stream->gt, 3, false);
+	bb = xe_bb_new(stream->gt, 2 * count + 1, false);
 	if (IS_ERR(bb)) {
 		err = PTR_ERR(bb);
 		goto exit;
 	}
 
-	write_cs_mi_lri(bb, reg_lri, 1);
+	write_cs_mi_lri(bb, reg_lri, count);
 
 	fence = xe_oa_submit_bb(stream, XE_OA_SUBMIT_NO_DEPS, bb);
 	if (IS_ERR(fence)) {
@@ -731,70 +703,54 @@ static int xe_oa_load_with_lri(struct xe_oa_stream *stream, struct xe_oa_reg *re
 static int xe_oa_configure_oar_context(struct xe_oa_stream *stream, bool enable)
 {
 	const struct xe_oa_format *format = stream->oa_buffer.format;
-	struct xe_lrc *lrc = stream->exec_q->lrc[0];
-	u32 regs_offset = xe_lrc_regs_offset(lrc) / sizeof(u32);
 	u32 oacontrol = __format_to_oactrl(format, OAR_OACONTROL_COUNTER_SEL_MASK) |
 		(enable ? OAR_OACONTROL_COUNTER_ENABLE : 0);
-	struct flex regs_context[] = {
+	struct xe_oa_reg reg_lri[] = {
 		{
 			OACTXCONTROL(stream->hwe->mmio_base),
-			stream->oa->ctx_oactxctrl_offset[stream->hwe->class] + 1,
 			enable ? OA_COUNTER_RESUME : 0,
 		},
+		{
+			OAR_OACONTROL,
+			oacontrol,
+		},
 		{
 			RING_CONTEXT_CONTROL(stream->hwe->mmio_base),
-			regs_offset + CTX_CONTEXT_CONTROL,
-			_MASKED_BIT_ENABLE(CTX_CTRL_OAC_CONTEXT_ENABLE),
+			_MASKED_FIELD(CTX_CTRL_OAC_CONTEXT_ENABLE,
+				      enable ? CTX_CTRL_OAC_CONTEXT_ENABLE : 0)
 		},
 	};
-	struct xe_oa_reg reg_lri = { OAR_OACONTROL, oacontrol };
-	int err;
-
-	/* Modify stream hwe context image with regs_context */
-	err = xe_oa_modify_ctx_image(stream, stream->exec_q->lrc[0],
-				     regs_context, ARRAY_SIZE(regs_context));
-	if (err)
-		return err;
 
-	/* Apply reg_lri using LRI */
-	return xe_oa_load_with_lri(stream, &reg_lri);
+	return xe_oa_load_with_lri(stream, reg_lri, ARRAY_SIZE(reg_lri));
 }
 
 static int xe_oa_configure_oac_context(struct xe_oa_stream *stream, bool enable)
 {
 	const struct xe_oa_format *format = stream->oa_buffer.format;
-	struct xe_lrc *lrc = stream->exec_q->lrc[0];
-	u32 regs_offset = xe_lrc_regs_offset(lrc) / sizeof(u32);
 	u32 oacontrol = __format_to_oactrl(format, OAR_OACONTROL_COUNTER_SEL_MASK) |
 		(enable ? OAR_OACONTROL_COUNTER_ENABLE : 0);
-	struct flex regs_context[] = {
+	struct xe_oa_reg reg_lri[] = {
 		{
 			OACTXCONTROL(stream->hwe->mmio_base),
-			stream->oa->ctx_oactxctrl_offset[stream->hwe->class] + 1,
 			enable ? OA_COUNTER_RESUME : 0,
 		},
+		{
+			OAC_OACONTROL,
+			oacontrol
+		},
 		{
 			RING_CONTEXT_CONTROL(stream->hwe->mmio_base),
-			regs_offset + CTX_CONTEXT_CONTROL,
-			_MASKED_BIT_ENABLE(CTX_CTRL_OAC_CONTEXT_ENABLE) |
+			_MASKED_FIELD(CTX_CTRL_OAC_CONTEXT_ENABLE,
+				      enable ? CTX_CTRL_OAC_CONTEXT_ENABLE : 0) |
 			_MASKED_FIELD(CTX_CTRL_RUN_ALONE, enable ? CTX_CTRL_RUN_ALONE : 0),
 		},
 	};
-	struct xe_oa_reg reg_lri = { OAC_OACONTROL, oacontrol };
-	int err;
 
 	/* Set ccs select to enable programming of OAC_OACONTROL */
 	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_ctrl, __oa_ccs_select(stream));
 
-	/* Modify stream hwe context image with regs_context */
-	err = xe_oa_modify_ctx_image(stream, stream->exec_q->lrc[0],
-				     regs_context, ARRAY_SIZE(regs_context));
-	if (err)
-		return err;
-
-	/* Apply reg_lri using LRI */
-	return xe_oa_load_with_lri(stream, &reg_lri);
+	return xe_oa_load_with_lri(stream, reg_lri, ARRAY_SIZE(reg_lri));
 }
 
 static int xe_oa_configure_oa_context(struct xe_oa_stream *stream, bool enable)
@@ -1933,8 +1889,8 @@ int xe_oa_stream_open_ioctl(struct drm_device *dev, u64 data, struct drm_file *f
 		if (XE_IOCTL_DBG(oa->xe, !param.exec_q))
 			return -ENOENT;
 
-		if (param.exec_q->width > 1)
-			drm_dbg(&oa->xe->drm, "exec_q->width > 1, programming only exec_q->lrc[0]\n");
+		if (XE_IOCTL_DBG(oa->xe, param.exec_q->width > 1))
+			return -EOPNOTSUPP;
 	}
 
 	/*
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 0be4f489d3e12..9f327f27c0726 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -221,7 +221,10 @@ static int emit_pipe_imm_ggtt(u32 addr, u32 value, bool stall_only, u32 *dw,
 
 static u32 get_ppgtt_flag(struct xe_sched_job *job)
 {
-	return job->q->vm ? BIT(8) : 0;
+	if (job->q->vm && !job->ggtt)
+		return BIT(8);
+
+	return 0;
 }
 
 static int emit_copy_timestamp(struct xe_lrc *lrc, u32 *dw, int i)
diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
index 0d3f76fb05cea..c207361bf43e1 100644
--- a/drivers/gpu/drm/xe/xe_sched_job_types.h
+++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
@@ -57,6 +57,8 @@ struct xe_sched_job {
 	u32 migrate_flush_flags;
 	/** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */
 	bool ring_ops_flush_tlb;
+	/** @ggtt: mapped in ggtt. */
+	bool ggtt;
 	/** @ptrs: per instance pointers. */
 	struct xe_job_ptrs ptrs[];
 };
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
[ Upstream commit 928b4de66ed3b0d9a6f201ce41ab2eed6ea2e7ef ]
The function extent_writepage_io() will submit the dirty sectors inside the page for the write.
But recently, to cooperate with the incoming subpage compression enhancement, a new bitmap was introduced as btrfs_bio_ctrl::submit_bitmap, so that only a subset of the dirty range is submitted.
This is because we can have the following cases with 64K page size:
    0        16K        32K        48K        64K
    |        |//////////|          |/|
                                    52K
For the range [16K, 32K), we queue the dirty range for compression, which is run in a delayed workqueue. Then for the range [48K, 52K), we go through the regular submission path.
In that case, our btrfs_bio_ctrl::submit_bitmap will exclude the range [16K, 32K).
The dirty flags for the range [16K, 32K) are only cleared when the compression is done, by the extent_clear_unlock_delalloc() call inside submit_one_async_extent().
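To make the bit math concrete, a small sketch (simplified userspace types, mirroring how the subpage helpers map a file range inside the folio to bitmap bits):

/* With 4K sectors in a 64K page, one bit covers one sector. */
static inline unsigned int range_to_start_bit(unsigned long long folio_start,
					      unsigned long long start,
					      unsigned int sectorsize_bits)
{
	return (unsigned int)((start - folio_start) >> sectorsize_bits);
}

/* e.g. folio_start = 0, sectorsize_bits = 12 (4K sectors): the compressed
 * range [16K, 32K) covers bits 4..7 (excluded from submit_bitmap), while
 * the regular range [48K, 52K) covers bit 12. */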
Fix the false alert by removing the btrfs_folio_assert_not_dirty() check, since it is no longer correct for subpage compression cases.
Signed-off-by: Qu Wenruo wqu@suse.com
Signed-off-by: David Sterba dsterba@suse.com
Stable-dep-of: 8bf334beb349 ("btrfs: fix double accounting race when extent_writepage_io() failed")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/btrfs/extent_io.c | 2 --
 1 file changed, 2 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fe08c983d5bb4..9ff72a5a13eb3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1394,8 +1394,6 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
 			goto out;
 		submitted_io = true;
 	}
-
-	btrfs_folio_assert_not_dirty(fs_info, folio, start, len);
 out:
 	/*
 	 * If we didn't submitted any sector (>= i_size), folio dirty get
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
[ Upstream commit 2bca8eb0774d271b1077b72f1be135073e0a898f ]
Currently for subpage (sector size < page size) cases, we reuse the subpage locked bitmap to find all the delalloc ranges we have locked, and then run all of those found ranges.
However such reuse is not perfect, e.g.:
    0       32K      64K      96K      128K
    |       |////////||///////|    |////|
                                   120K
For the above range, writepage_delalloc() for page 0 will handle the range [32K, 96K); note that a delalloc range can extend beyond the page boundary.

But writepage_delalloc() for page 64K will only handle the range [120K, 128K), as the previous run on page 0 has already handled [64K, 96K). Meanwhile the writeback should expect the range [64K, 96K) to also be locked; this leads to a mismatch between the locked bitmap and the delalloc range.

This is not causing problems yet, but it is still inconsistent behavior.

So instead of relying on the subpage locked bitmap, move the delalloc range search to use a local @delalloc_bitmap, so that we can remove the existing btrfs_subpage_find_writer_locked().
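To show the search idiom in isolation, a toy userspace model (next_bit() below is a stand-in for the kernel's find_next_bit()/find_next_zero_bit(); values are made up):

#include <stdio.h>

static unsigned int next_bit(unsigned long map, unsigned int size,
			     unsigned int off, int want_set)
{
	for (; off < size; off++)
		if (!!(map & (1UL << off)) == !!want_set)
			return off;
	return size;
}

int main(void)
{
	unsigned long delalloc = 0xf0f0;	/* sectors 4-7 and 12-15 dirty */
	unsigned int size = 16, start = 0, first_set, first_zero;

	/* Walk the bitmap and print each contiguous delalloc range. */
	while ((first_set = next_bit(delalloc, size, start, 1)) < size) {
		first_zero = next_bit(delalloc, size, first_set, 0);
		printf("delalloc range: sectors [%u, %u)\n", first_set, first_zero);
		start = first_zero;
	}
	return 0;
}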
Signed-off-by: Qu Wenruo wqu@suse.com
Signed-off-by: David Sterba dsterba@suse.com
Stable-dep-of: 8bf334beb349 ("btrfs: fix double accounting race when extent_writepage_io() failed")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/btrfs/extent_io.c | 44 ++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/subpage.c   | 47 --------------------------------------------
 fs/btrfs/subpage.h   |  4 ----
 3 files changed, 43 insertions(+), 52 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9ff72a5a13eb3..ba34b92d48c2f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1105,6 +1105,45 @@ int btrfs_read_folio(struct file *file, struct folio *folio)
 	return ret;
 }
 
+static void set_delalloc_bitmap(struct folio *folio, unsigned long *delalloc_bitmap,
+				u64 start, u32 len)
+{
+	struct btrfs_fs_info *fs_info = folio_to_fs_info(folio);
+	const u64 folio_start = folio_pos(folio);
+	unsigned int start_bit;
+	unsigned int nbits;
+
+	ASSERT(start >= folio_start && start + len <= folio_start + PAGE_SIZE);
+	start_bit = (start - folio_start) >> fs_info->sectorsize_bits;
+	nbits = len >> fs_info->sectorsize_bits;
+	ASSERT(bitmap_test_range_all_zero(delalloc_bitmap, start_bit, nbits));
+	bitmap_set(delalloc_bitmap, start_bit, nbits);
+}
+
+static bool find_next_delalloc_bitmap(struct folio *folio,
+				      unsigned long *delalloc_bitmap, u64 start,
+				      u64 *found_start, u32 *found_len)
+{
+	struct btrfs_fs_info *fs_info = folio_to_fs_info(folio);
+	const u64 folio_start = folio_pos(folio);
+	const unsigned int bitmap_size = fs_info->sectors_per_page;
+	unsigned int start_bit;
+	unsigned int first_zero;
+	unsigned int first_set;
+
+	ASSERT(start >= folio_start && start < folio_start + PAGE_SIZE);
+
+	start_bit = (start - folio_start) >> fs_info->sectorsize_bits;
+	first_set = find_next_bit(delalloc_bitmap, bitmap_size, start_bit);
+	if (first_set >= bitmap_size)
+		return false;
+
+	*found_start = folio_start + (first_set << fs_info->sectorsize_bits);
+	first_zero = find_next_zero_bit(delalloc_bitmap, bitmap_size, first_set);
+	*found_len = (first_zero - first_set) << fs_info->sectorsize_bits;
+	return true;
+}
+
 /*
  * helper for extent_writepage(), doing all of the delayed allocation setup.
  *
@@ -1124,6 +1163,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 	const bool is_subpage = btrfs_is_subpage(fs_info, folio->mapping);
 	const u64 page_start = folio_pos(folio);
 	const u64 page_end = page_start + folio_size(folio) - 1;
+	unsigned long delalloc_bitmap = 0;
 	/*
 	 * Save the last found delalloc end. As the delalloc end can go beyond
 	 * page boundary, thus we cannot rely on subpage bitmap to locate the
@@ -1151,6 +1191,8 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 			delalloc_start = delalloc_end + 1;
 			continue;
 		}
+		set_delalloc_bitmap(folio, &delalloc_bitmap, delalloc_start,
+				    min(delalloc_end, page_end) + 1 - delalloc_start);
 		btrfs_folio_set_writer_lock(fs_info, folio, delalloc_start,
 					    min(delalloc_end, page_end) + 1 -
 					    delalloc_start);
@@ -1178,7 +1220,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 			found_len = last_delalloc_end + 1 - found_start;
 			found = true;
 		} else {
-			found = btrfs_subpage_find_writer_locked(fs_info, folio,
+			found = find_next_delalloc_bitmap(folio, &delalloc_bitmap,
 					delalloc_start, &found_start, &found_len);
 		}
 		if (!found)
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index ec7328a6bfd75..c4950e04f481a 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -801,53 +801,6 @@ void btrfs_folio_set_writer_lock(const struct btrfs_fs_info *fs_info,
 	spin_unlock_irqrestore(&subpage->lock, flags);
 }
 
-/*
- * Find any subpage writer locked range inside @folio, starting at file offset
- * @search_start. The caller should ensure the folio is locked.
- *
- * Return true and update @found_start_ret and @found_len_ret to the first
- * writer locked range.
- * Return false if there is no writer locked range.
- */
-bool btrfs_subpage_find_writer_locked(const struct btrfs_fs_info *fs_info,
-				      struct folio *folio, u64 search_start,
-				      u64 *found_start_ret, u32 *found_len_ret)
-{
-	struct btrfs_subpage *subpage = folio_get_private(folio);
-	const u32 sectors_per_page = fs_info->sectors_per_page;
-	const unsigned int len = PAGE_SIZE - offset_in_page(search_start);
-	const unsigned int start_bit = subpage_calc_start_bit(fs_info, folio,
-							      locked, search_start, len);
-	const unsigned int locked_bitmap_start = sectors_per_page * btrfs_bitmap_nr_locked;
-	const unsigned int locked_bitmap_end = locked_bitmap_start + sectors_per_page;
-	unsigned long flags;
-	int first_zero;
-	int first_set;
-	bool found = false;
-
-	ASSERT(folio_test_locked(folio));
-	spin_lock_irqsave(&subpage->lock, flags);
-	first_set = find_next_bit(subpage->bitmaps, locked_bitmap_end, start_bit);
-	if (first_set >= locked_bitmap_end)
-		goto out;
-
-	found = true;
-
-	*found_start_ret = folio_pos(folio) +
-		((first_set - locked_bitmap_start) << fs_info->sectorsize_bits);
-	/*
-	 * Since @first_set is ensured to be smaller than locked_bitmap_end
-	 * here, @found_start_ret should be inside the folio.
-	 */
-	ASSERT(*found_start_ret < folio_pos(folio) + PAGE_SIZE);
-
-	first_zero = find_next_zero_bit(subpage->bitmaps, locked_bitmap_end, first_set);
-	*found_len_ret = (first_zero - first_set) << fs_info->sectorsize_bits;
-out:
-	spin_unlock_irqrestore(&subpage->lock, flags);
-	return found;
-}
-
 #define GET_SUBPAGE_BITMAP(subpage, fs_info, name, dst)			\
 {									\
 	const int sectors_per_page = fs_info->sectors_per_page;	\
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index cdb554e0d215e..197ec6c0b07b2 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -108,10 +108,6 @@ void btrfs_folio_set_writer_lock(const struct btrfs_fs_info *fs_info,
 				 struct folio *folio, u64 start, u32 len);
 void btrfs_folio_end_writer_lock_bitmap(const struct btrfs_fs_info *fs_info,
 					struct folio *folio, unsigned long bitmap);
-bool btrfs_subpage_find_writer_locked(const struct btrfs_fs_info *fs_info,
-				      struct folio *folio, u64 search_start,
-				      u64 *found_start_ret, u32 *found_len_ret);
-
 /*
  * Template for subpage related operations.
  *
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
[ Upstream commit c96d0e3921419bd3e5d8a1f355970c8ae3047ef4 ]
Currently we only mark sectors as locked if there is a *NEW* delalloc range covering them.
But a NEW delalloc range is not the same as the set of dirty sectors we want to submit, e.g.:
        0       32K      64K      96K       128K
        |       |////////||///////|    |////|
                                       120K
For the above 64K page size case, writepage_delalloc() for page 0 will find and lock the delalloc range [32K, 96K), which goes beyond the page boundary.
Then when writepage_delalloc() is called for the page 64K, since [64K, 96K) is already locked, only [120K, 128K) will be locked.
This means, although range [64K, 96K) is dirty and will be submitted later by extent_writepage_io(), it will not be marked as locked.
This is fine for now, as we call btrfs_folio_end_writer_lock_bitmap() to free every non-compressed sector, and compression is only allowed for full page range.
But this is not safe for future sector perfect compression support, as this can lead to double folio unlock:
           Thread A                    |           Thread B
---------------------------------------+--------------------------------
                                       | submit_one_async_extent()
                                       | |- extent_clear_unlock_delalloc()
extent_writepage()                     |    |- btrfs_folio_end_writer_lock()
|- btrfs_folio_end_writer_lock_bitmap()|       |- btrfs_subpage_end_and_test_writer()
   |                                   |          |- atomic_sub_and_test()
   |                                   |          /* Now the atomic value is 0 */
   |- if (atomic_read() == 0)          |
   |  |- folio_unlock()                |             |- folio_unlock()
The root cause is the above range [64K, 96K) is dirtied and should also be locked but it isn't.
So to make everything more consistent and prepare for the incoming sector perfect compression, mark all dirty sectors as locked.
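The core of the change is small; as a sketch (condensed from the hunk below, after bio_ctrl->submit_bitmap has been populated with the dirty bitmap), every dirty block gets the writer lock up front instead of only the blocks inside NEW delalloc ranges:

        int bit;

        /* Lock every dirty block, not just blocks inside NEW delalloc ranges. */
        for_each_set_bit(bit, &bio_ctrl->submit_bitmap, fs_info->sectors_per_page) {
                u64 start = page_start + (bit << fs_info->sectorsize_bits);

                btrfs_folio_set_writer_lock(fs_info, folio, start,
                                            fs_info->sectorsize);
        }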
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 8bf334beb349 ("btrfs: fix double accounting race when extent_writepage_io() failed")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/extent_io.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ba34b92d48c2f..8222ae6f29af5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1174,6 +1174,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
         u64 delalloc_end = page_end;
         u64 delalloc_to_write = 0;
         int ret = 0;
+        int bit;

         /* Save the dirty bitmap as our submission bitmap will be a subset of it. */
         if (btrfs_is_subpage(fs_info, inode->vfs_inode.i_mapping)) {
@@ -1183,6 +1184,12 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
                 bio_ctrl->submit_bitmap = 1;
         }

+        for_each_set_bit(bit, &bio_ctrl->submit_bitmap, fs_info->sectors_per_page) {
+                u64 start = page_start + (bit << fs_info->sectorsize_bits);
+
+                btrfs_folio_set_writer_lock(fs_info, folio, start, fs_info->sectorsize);
+        }
+
         /* Lock all (subpage) delalloc ranges inside the folio first. */
         while (delalloc_start < page_end) {
                 delalloc_end = page_end;
@@ -1193,9 +1200,6 @@
                 }
                 set_delalloc_bitmap(folio, &delalloc_bitmap, delalloc_start,
                                     min(delalloc_end, page_end) + 1 - delalloc_start);
-                btrfs_folio_set_writer_lock(fs_info, folio, delalloc_start,
-                                            min(delalloc_end, page_end) + 1 -
-                                            delalloc_start);
                 last_delalloc_end = delalloc_end;
                 delalloc_start = delalloc_end + 1;
         }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
[ Upstream commit 8511074c42b6255e03eceb09396338572572f1c7 ]
This function is not really suitable for locking a folio, as it lacks the proper mapping checks, so the locked folio may not even belong to btrfs.
And due to the above reason, its last user inside lock_delalloc_folios() has already been removed, so we can remove this function too.
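For context, a safe subpage folio-locking sequence has to look roughly like the following illustrative sketch (not the exact lock_delalloc_folios() code); the mapping check is what the removed helper never performed:

        folio_lock(folio);
        /* The folio can be invalidated/migrated while it was unlocked. */
        if (folio->mapping != mapping || !folio_test_private(folio)) {
                folio_unlock(folio);
                return -EAGAIN;
        }
        /* Only now is it safe to update the btrfs subpage locked bitmap. */
        btrfs_folio_set_writer_lock(fs_info, folio, start, len);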
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 8bf334beb349 ("btrfs: fix double accounting race when extent_writepage_io() failed")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/subpage.c | 47 ----------------------------------------------
 fs/btrfs/subpage.h |  2 --
 2 files changed, 49 deletions(-)
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index c4950e04f481a..99341e98bbcc7 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -295,26 +295,6 @@ static void btrfs_subpage_clamp_range(struct folio *folio, u64 *start, u32 *len) orig_start + orig_len) - *start; }
-static void btrfs_subpage_start_writer(const struct btrfs_fs_info *fs_info, - struct folio *folio, u64 start, u32 len) -{ - struct btrfs_subpage *subpage = folio_get_private(folio); - const int start_bit = subpage_calc_start_bit(fs_info, folio, locked, start, len); - const int nbits = (len >> fs_info->sectorsize_bits); - unsigned long flags; - int ret; - - btrfs_subpage_assert(fs_info, folio, start, len); - - spin_lock_irqsave(&subpage->lock, flags); - ASSERT(atomic_read(&subpage->readers) == 0); - ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits)); - bitmap_set(subpage->bitmaps, start_bit, nbits); - ret = atomic_add_return(nbits, &subpage->writers); - ASSERT(ret == nbits); - spin_unlock_irqrestore(&subpage->lock, flags); -} - static bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { @@ -351,33 +331,6 @@ static bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_inf return last; }
-/* - * Lock a folio for delalloc page writeback. - * - * Return -EAGAIN if the page is not properly initialized. - * Return 0 with the page locked, and writer counter updated. - * - * Even with 0 returned, the page still need extra check to make sure - * it's really the correct page, as the caller is using - * filemap_get_folios_contig(), which can race with page invalidating. - */ -int btrfs_folio_start_writer_lock(const struct btrfs_fs_info *fs_info, - struct folio *folio, u64 start, u32 len) -{ - if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, folio->mapping)) { - folio_lock(folio); - return 0; - } - folio_lock(folio); - if (!folio_test_private(folio) || !folio_get_private(folio)) { - folio_unlock(folio); - return -EAGAIN; - } - btrfs_subpage_clamp_range(folio, &start, &len); - btrfs_subpage_start_writer(fs_info, folio, start, len); - return 0; -} - /* * Handle different locked folios: * diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h index 197ec6c0b07b2..6289d6f65b87d 100644 --- a/fs/btrfs/subpage.h +++ b/fs/btrfs/subpage.h @@ -100,8 +100,6 @@ void btrfs_subpage_start_reader(const struct btrfs_fs_info *fs_info, void btrfs_subpage_end_reader(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len);
-int btrfs_folio_start_writer_lock(const struct btrfs_fs_info *fs_info, - struct folio *folio, u64 start, u32 len); void btrfs_folio_end_writer_lock(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len); void btrfs_folio_set_writer_lock(const struct btrfs_fs_info *fs_info,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
[ Upstream commit 336e69f3025fb70db9d0dfb7f36ac79887bf5341 ]
Since commit d7172f52e993 ("btrfs: use per-buffer locking for extent_buffer reading"), metadata read no longer relies on the subpage reader locking.
This means we do not need to maintain a different metadata/data split for locking, so we can convert the existing reader lock users by:
- add_ra_bio_pages() Convert to btrfs_folio_set_writer_lock()
- end_folio_read() Convert to btrfs_folio_end_writer_lock()
- begin_folio_read() Convert to btrfs_folio_set_writer_lock()
- folio_range_has_eb() Remove the subpage->readers checks, since it is always 0.
- Remove btrfs_subpage_start_reader() and btrfs_subpage_end_reader()
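With these conversions in place, a subpage data read ends up taking the same per-sector lock that writeback uses; a sketch of the resulting flow, condensed from the begin_folio_read()/end_folio_read() hunks below:

        /* begin_folio_read(): account the whole folio range as locked. */
        btrfs_folio_set_writer_lock(fs_info, folio, folio_pos(folio), PAGE_SIZE);

        /* ... the read bio is submitted and completes ... */

        /* end_folio_read(): drop the range; unlocks the folio on the last sector. */
        btrfs_folio_end_writer_lock(fs_info, folio, start, len);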
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 8bf334beb349 ("btrfs: fix double accounting race when extent_writepage_io() failed")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/compression.c |  3 +-
 fs/btrfs/extent_io.c   | 10 ++-----
 fs/btrfs/subpage.c     | 62 ++----------------------------------------
 fs/btrfs/subpage.h     | 13 ---------
 4 files changed, 5 insertions(+), 83 deletions(-)
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 90aef2627ca27..64eaf74fbebc8 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -545,8 +545,7 @@ static noinline int add_ra_bio_pages(struct inode *inode, * subpage::readers and to unlock the page. */ if (fs_info->sectorsize < PAGE_SIZE) - btrfs_subpage_start_reader(fs_info, folio, cur, - add_size); + btrfs_folio_set_writer_lock(fs_info, folio, cur, add_size); folio_put(folio); cur += add_size; } diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 8222ae6f29af5..5d6b3b812593d 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -438,7 +438,7 @@ static void end_folio_read(struct folio *folio, bool uptodate, u64 start, u32 le if (!btrfs_is_subpage(fs_info, folio->mapping)) folio_unlock(folio); else - btrfs_subpage_end_reader(fs_info, folio, start, len); + btrfs_folio_end_writer_lock(fs_info, folio, start, len); }
/* @@ -495,7 +495,7 @@ static void begin_folio_read(struct btrfs_fs_info *fs_info, struct folio *folio) return;
ASSERT(folio_test_private(folio)); - btrfs_subpage_start_reader(fs_info, folio, folio_pos(folio), PAGE_SIZE); + btrfs_folio_set_writer_lock(fs_info, folio, folio_pos(folio), PAGE_SIZE); }
/* @@ -2507,12 +2507,6 @@ static bool folio_range_has_eb(struct btrfs_fs_info *fs_info, struct folio *foli subpage = folio_get_private(folio); if (atomic_read(&subpage->eb_refs)) return true; - /* - * Even there is no eb refs here, we may still have - * end_folio_read() call relying on page::private. - */ - if (atomic_read(&subpage->readers)) - return true; } return false; } diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index 99341e98bbcc7..0587a7d7b5e81 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -140,12 +140,10 @@ struct btrfs_subpage *btrfs_alloc_subpage(const struct btrfs_fs_info *fs_info, return ERR_PTR(-ENOMEM);
spin_lock_init(&ret->lock); - if (type == BTRFS_SUBPAGE_METADATA) { + if (type == BTRFS_SUBPAGE_METADATA) atomic_set(&ret->eb_refs, 0); - } else { - atomic_set(&ret->readers, 0); + else atomic_set(&ret->writers, 0); - } return ret; }
@@ -221,62 +219,6 @@ static void btrfs_subpage_assert(const struct btrfs_fs_info *fs_info, __start_bit; \ })
-void btrfs_subpage_start_reader(const struct btrfs_fs_info *fs_info, - struct folio *folio, u64 start, u32 len) -{ - struct btrfs_subpage *subpage = folio_get_private(folio); - const int start_bit = subpage_calc_start_bit(fs_info, folio, locked, start, len); - const int nbits = len >> fs_info->sectorsize_bits; - unsigned long flags; - - - btrfs_subpage_assert(fs_info, folio, start, len); - - spin_lock_irqsave(&subpage->lock, flags); - /* - * Even though it's just for reading the page, no one should have - * locked the subpage range. - */ - ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits)); - bitmap_set(subpage->bitmaps, start_bit, nbits); - atomic_add(nbits, &subpage->readers); - spin_unlock_irqrestore(&subpage->lock, flags); -} - -void btrfs_subpage_end_reader(const struct btrfs_fs_info *fs_info, - struct folio *folio, u64 start, u32 len) -{ - struct btrfs_subpage *subpage = folio_get_private(folio); - const int start_bit = subpage_calc_start_bit(fs_info, folio, locked, start, len); - const int nbits = len >> fs_info->sectorsize_bits; - unsigned long flags; - bool is_data; - bool last; - - btrfs_subpage_assert(fs_info, folio, start, len); - is_data = is_data_inode(BTRFS_I(folio->mapping->host)); - - spin_lock_irqsave(&subpage->lock, flags); - - /* The range should have already been locked. */ - ASSERT(bitmap_test_range_all_set(subpage->bitmaps, start_bit, nbits)); - ASSERT(atomic_read(&subpage->readers) >= nbits); - - bitmap_clear(subpage->bitmaps, start_bit, nbits); - last = atomic_sub_and_test(nbits, &subpage->readers); - - /* - * For data we need to unlock the page if the last read has finished. - * - * And please don't replace @last with atomic_sub_and_test() call - * inside if () condition. - * As we want the atomic_sub_and_test() to be always executed. - */ - if (is_data && last) - folio_unlock(folio); - spin_unlock_irqrestore(&subpage->lock, flags); -} - static void btrfs_subpage_clamp_range(struct folio *folio, u64 *start, u32 *len) { u64 orig_start = *start; diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h index 6289d6f65b87d..8488ea057b30b 100644 --- a/fs/btrfs/subpage.h +++ b/fs/btrfs/subpage.h @@ -45,14 +45,6 @@ enum { struct btrfs_subpage { /* Common members for both data and metadata pages */ spinlock_t lock; - /* - * Both data and metadata needs to track how many readers are for the - * page. - * Data relies on @readers to unlock the page when last reader finished. - * While metadata doesn't need page unlock, it needs to prevent - * page::private get cleared before the last end_page_read(). - */ - atomic_t readers; union { /* * Structures only used by metadata @@ -95,11 +87,6 @@ void btrfs_free_subpage(struct btrfs_subpage *subpage); void btrfs_folio_inc_eb_refs(const struct btrfs_fs_info *fs_info, struct folio *folio); void btrfs_folio_dec_eb_refs(const struct btrfs_fs_info *fs_info, struct folio *folio);
-void btrfs_subpage_start_reader(const struct btrfs_fs_info *fs_info, - struct folio *folio, u64 start, u32 len); -void btrfs_subpage_end_reader(const struct btrfs_fs_info *fs_info, - struct folio *folio, u64 start, u32 len); - void btrfs_folio_end_writer_lock(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len); void btrfs_folio_set_writer_lock(const struct btrfs_fs_info *fs_info,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
[ Upstream commit 0f7120266584490616f031873e7148495d77dd68 ]
Since there are no users of the reader locks left, rename the writer locks to a more generic name by removing the "_writer" part from the name.
And also rename btrfs_subpage::writers into btrfs_subpage::nr_locked.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 8bf334beb349 ("btrfs: fix double accounting race when extent_writepage_io() failed")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/compression.c |  2 +-
 fs/btrfs/extent_io.c   | 14 ++++++------
 fs/btrfs/subpage.c     | 46 +++++++++++++++++++++---------------------
 fs/btrfs/subpage.h     | 20 ++++++++++--------
 4 files changed, 43 insertions(+), 39 deletions(-)
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 64eaf74fbebc8..40332ab62f101 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -545,7 +545,7 @@ static noinline int add_ra_bio_pages(struct inode *inode, * subpage::readers and to unlock the page. */ if (fs_info->sectorsize < PAGE_SIZE) - btrfs_folio_set_writer_lock(fs_info, folio, cur, add_size); + btrfs_folio_set_lock(fs_info, folio, cur, add_size); folio_put(folio); cur += add_size; } diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 5d6b3b812593d..5a1bde8cc8b64 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -190,7 +190,7 @@ static void process_one_folio(struct btrfs_fs_info *fs_info, btrfs_folio_clamp_clear_writeback(fs_info, folio, start, len);
if (folio != locked_folio && (page_ops & PAGE_UNLOCK)) - btrfs_folio_end_writer_lock(fs_info, folio, start, len); + btrfs_folio_end_lock(fs_info, folio, start, len); }
static void __process_folios_contig(struct address_space *mapping, @@ -276,7 +276,7 @@ static noinline int lock_delalloc_folios(struct inode *inode, range_start = max_t(u64, folio_pos(folio), start); range_len = min_t(u64, folio_pos(folio) + folio_size(folio), end + 1) - range_start; - btrfs_folio_set_writer_lock(fs_info, folio, range_start, range_len); + btrfs_folio_set_lock(fs_info, folio, range_start, range_len);
processed_end = range_start + range_len - 1; } @@ -438,7 +438,7 @@ static void end_folio_read(struct folio *folio, bool uptodate, u64 start, u32 le if (!btrfs_is_subpage(fs_info, folio->mapping)) folio_unlock(folio); else - btrfs_folio_end_writer_lock(fs_info, folio, start, len); + btrfs_folio_end_lock(fs_info, folio, start, len); }
/* @@ -495,7 +495,7 @@ static void begin_folio_read(struct btrfs_fs_info *fs_info, struct folio *folio) return;
ASSERT(folio_test_private(folio)); - btrfs_folio_set_writer_lock(fs_info, folio, folio_pos(folio), PAGE_SIZE); + btrfs_folio_set_lock(fs_info, folio, folio_pos(folio), PAGE_SIZE); }
/* @@ -1187,7 +1187,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode, for_each_set_bit(bit, &bio_ctrl->submit_bitmap, fs_info->sectors_per_page) { u64 start = page_start + (bit << fs_info->sectorsize_bits);
- btrfs_folio_set_writer_lock(fs_info, folio, start, fs_info->sectorsize); + btrfs_folio_set_lock(fs_info, folio, start, fs_info->sectorsize); }
/* Lock all (subpage) delalloc ranges inside the folio first. */ @@ -1523,7 +1523,7 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl * Only unlock ranges that are submitted. As there can be some async * submitted ranges inside the folio. */ - btrfs_folio_end_writer_lock_bitmap(fs_info, folio, bio_ctrl->submit_bitmap); + btrfs_folio_end_lock_bitmap(fs_info, folio, bio_ctrl->submit_bitmap); ASSERT(ret <= 0); return ret; } @@ -2280,7 +2280,7 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f cur, cur_len, !ret); mapping_set_error(mapping, ret); } - btrfs_folio_end_writer_lock(fs_info, folio, cur, cur_len); + btrfs_folio_end_lock(fs_info, folio, cur, cur_len); if (ret < 0) found_error = true; next_page: diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c index 0587a7d7b5e81..88a01d51ab11f 100644 --- a/fs/btrfs/subpage.c +++ b/fs/btrfs/subpage.c @@ -143,7 +143,7 @@ struct btrfs_subpage *btrfs_alloc_subpage(const struct btrfs_fs_info *fs_info, if (type == BTRFS_SUBPAGE_METADATA) atomic_set(&ret->eb_refs, 0); else - atomic_set(&ret->writers, 0); + atomic_set(&ret->nr_locked, 0); return ret; }
@@ -237,8 +237,8 @@ static void btrfs_subpage_clamp_range(struct folio *folio, u64 *start, u32 *len) orig_start + orig_len) - *start; }
-static bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_info, - struct folio *folio, u64 start, u32 len) +static bool btrfs_subpage_end_and_test_lock(const struct btrfs_fs_info *fs_info, + struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage = folio_get_private(folio); const int start_bit = subpage_calc_start_bit(fs_info, folio, locked, start, len); @@ -256,9 +256,9 @@ static bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_inf * extent_clear_unlock_delalloc() for compression path. * * This @locked_page is locked by plain lock_page(), thus its - * subpage::writers is 0. Handle them in a special way. + * subpage::locked is 0. Handle them in a special way. */ - if (atomic_read(&subpage->writers) == 0) { + if (atomic_read(&subpage->nr_locked) == 0) { spin_unlock_irqrestore(&subpage->lock, flags); return true; } @@ -267,8 +267,8 @@ static bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_inf clear_bit(bit, subpage->bitmaps); cleared++; } - ASSERT(atomic_read(&subpage->writers) >= cleared); - last = atomic_sub_and_test(cleared, &subpage->writers); + ASSERT(atomic_read(&subpage->nr_locked) >= cleared); + last = atomic_sub_and_test(cleared, &subpage->nr_locked); spin_unlock_irqrestore(&subpage->lock, flags); return last; } @@ -289,8 +289,8 @@ static bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_inf * bitmap, reduce the writer lock number, and unlock the page if that's * the last locked range. */ -void btrfs_folio_end_writer_lock(const struct btrfs_fs_info *fs_info, - struct folio *folio, u64 start, u32 len) +void btrfs_folio_end_lock(const struct btrfs_fs_info *fs_info, + struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage = folio_get_private(folio);
@@ -303,24 +303,24 @@ void btrfs_folio_end_writer_lock(const struct btrfs_fs_info *fs_info,
/* * For subpage case, there are two types of locked page. With or - * without writers number. + * without locked number. * - * Since we own the page lock, no one else could touch subpage::writers + * Since we own the page lock, no one else could touch subpage::locked * and we are safe to do several atomic operations without spinlock. */ - if (atomic_read(&subpage->writers) == 0) { - /* No writers, locked by plain lock_page(). */ + if (atomic_read(&subpage->nr_locked) == 0) { + /* No subpage lock, locked by plain lock_page(). */ folio_unlock(folio); return; }
btrfs_subpage_clamp_range(folio, &start, &len); - if (btrfs_subpage_end_and_test_writer(fs_info, folio, start, len)) + if (btrfs_subpage_end_and_test_lock(fs_info, folio, start, len)) folio_unlock(folio); }
-void btrfs_folio_end_writer_lock_bitmap(const struct btrfs_fs_info *fs_info, - struct folio *folio, unsigned long bitmap) +void btrfs_folio_end_lock_bitmap(const struct btrfs_fs_info *fs_info, + struct folio *folio, unsigned long bitmap) { struct btrfs_subpage *subpage = folio_get_private(folio); const int start_bit = fs_info->sectors_per_page * btrfs_bitmap_nr_locked; @@ -334,8 +334,8 @@ void btrfs_folio_end_writer_lock_bitmap(const struct btrfs_fs_info *fs_info, return; }
- if (atomic_read(&subpage->writers) == 0) { - /* No writers, locked by plain lock_page(). */ + if (atomic_read(&subpage->nr_locked) == 0) { + /* No subpage lock, locked by plain lock_page(). */ folio_unlock(folio); return; } @@ -345,8 +345,8 @@ void btrfs_folio_end_writer_lock_bitmap(const struct btrfs_fs_info *fs_info, if (test_and_clear_bit(bit + start_bit, subpage->bitmaps)) cleared++; } - ASSERT(atomic_read(&subpage->writers) >= cleared); - last = atomic_sub_and_test(cleared, &subpage->writers); + ASSERT(atomic_read(&subpage->nr_locked) >= cleared); + last = atomic_sub_and_test(cleared, &subpage->nr_locked); spin_unlock_irqrestore(&subpage->lock, flags); if (last) folio_unlock(folio); @@ -671,8 +671,8 @@ void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info, * This populates the involved subpage ranges so that subpage helpers can * properly unlock them. */ -void btrfs_folio_set_writer_lock(const struct btrfs_fs_info *fs_info, - struct folio *folio, u64 start, u32 len) +void btrfs_folio_set_lock(const struct btrfs_fs_info *fs_info, + struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage; unsigned long flags; @@ -691,7 +691,7 @@ void btrfs_folio_set_writer_lock(const struct btrfs_fs_info *fs_info, /* Target range should not yet be locked. */ ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits)); bitmap_set(subpage->bitmaps, start_bit, nbits); - ret = atomic_add_return(nbits, &subpage->writers); + ret = atomic_add_return(nbits, &subpage->nr_locked); ASSERT(ret <= fs_info->sectors_per_page); spin_unlock_irqrestore(&subpage->lock, flags); } diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h index 8488ea057b30b..44fff1f4eac48 100644 --- a/fs/btrfs/subpage.h +++ b/fs/btrfs/subpage.h @@ -54,8 +54,12 @@ struct btrfs_subpage { */ atomic_t eb_refs;
- /* Structures only used by data */ - atomic_t writers; + /* + * Structures only used by data, + * + * How many sectors inside the page is locked. + */ + atomic_t nr_locked; }; unsigned long bitmaps[]; }; @@ -87,12 +91,12 @@ void btrfs_free_subpage(struct btrfs_subpage *subpage); void btrfs_folio_inc_eb_refs(const struct btrfs_fs_info *fs_info, struct folio *folio); void btrfs_folio_dec_eb_refs(const struct btrfs_fs_info *fs_info, struct folio *folio);
-void btrfs_folio_end_writer_lock(const struct btrfs_fs_info *fs_info, - struct folio *folio, u64 start, u32 len); -void btrfs_folio_set_writer_lock(const struct btrfs_fs_info *fs_info, - struct folio *folio, u64 start, u32 len); -void btrfs_folio_end_writer_lock_bitmap(const struct btrfs_fs_info *fs_info, - struct folio *folio, unsigned long bitmap); +void btrfs_folio_end_lock(const struct btrfs_fs_info *fs_info, + struct folio *folio, u64 start, u32 len); +void btrfs_folio_set_lock(const struct btrfs_fs_info *fs_info, + struct folio *folio, u64 start, u32 len); +void btrfs_folio_end_lock_bitmap(const struct btrfs_fs_info *fs_info, + struct folio *folio, unsigned long bitmap); /* * Template for subpage related operations. *
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Sterba dsterba@suse.com
[ Upstream commit 011a9a1f244656cc3cbde47edba2b250f794d440 ]
As extent_writepage() is an internal helper, we should use our btrfs inode type, so change the parameter from struct inode to struct btrfs_inode.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 8bf334beb349 ("btrfs: fix double accounting race when extent_writepage_io() failed")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/extent_io.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 5a1bde8cc8b64..e8f882f949051 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1467,15 +1467,15 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode, */ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl) { - struct inode *inode = folio->mapping->host; - struct btrfs_fs_info *fs_info = inode_to_fs_info(inode); + struct btrfs_inode *inode = BTRFS_I(folio->mapping->host); + struct btrfs_fs_info *fs_info = inode->root->fs_info; const u64 page_start = folio_pos(folio); int ret; size_t pg_offset; - loff_t i_size = i_size_read(inode); + loff_t i_size = i_size_read(&inode->vfs_inode); unsigned long end_index = i_size >> PAGE_SHIFT;
- trace_extent_writepage(folio, inode, bio_ctrl->wbc); + trace_extent_writepage(folio, &inode->vfs_inode, bio_ctrl->wbc);
WARN_ON(!folio_test_locked(folio));
@@ -1499,13 +1499,13 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl if (ret < 0) goto done;
- ret = writepage_delalloc(BTRFS_I(inode), folio, bio_ctrl); + ret = writepage_delalloc(inode, folio, bio_ctrl); if (ret == 1) return 0; if (ret) goto done;
- ret = extent_writepage_io(BTRFS_I(inode), folio, folio_pos(folio), + ret = extent_writepage_io(inode, folio, folio_pos(folio), PAGE_SIZE, bio_ctrl, i_size); if (ret == 1) return 0; @@ -1514,7 +1514,7 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
done: if (ret) { - btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio, + btrfs_mark_ordered_io_finished(inode, folio, page_start, PAGE_SIZE, !ret); mapping_set_error(folio->mapping, ret); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
[ Upstream commit 72dad8e377afa50435940adfb697e070d3556670 ]
[BUG]
When running btrfs with block size (4K) smaller than page size (64K, aarch64), there is a very high chance of crashing the kernel at generic/750, with the following messages (before the call traces, there are 3 extra debug messages added):
  BTRFS warning (device dm-3): read-write for sector size 4096 with page size 65536 is experimental
  BTRFS info (device dm-3): checking UUID tree
  hrtimer: interrupt took 5451385 ns
  BTRFS error (device dm-3): cow_file_range failed, root=4957 inode=257 start=1605632 len=69632: -28
  BTRFS error (device dm-3): run_delalloc_nocow failed, root=4957 inode=257 start=1605632 len=69632: -28
  BTRFS error (device dm-3): failed to run delalloc range, root=4957 ino=257 folio=1572864 submit_bitmap=8-15 start=1605632 len=69632: -28
  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 3020984 at ordered-data.c:360 can_finish_ordered_extent+0x370/0x3b8 [btrfs]
  CPU: 2 UID: 0 PID: 3020984 Comm: kworker/u24:1 Tainted: G OE 6.13.0-rc1-custom+ #89
  Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
  Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
  Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
  pc : can_finish_ordered_extent+0x370/0x3b8 [btrfs]
  lr : can_finish_ordered_extent+0x1ec/0x3b8 [btrfs]
  Call trace:
   can_finish_ordered_extent+0x370/0x3b8 [btrfs] (P)
   can_finish_ordered_extent+0x1ec/0x3b8 [btrfs] (L)
   btrfs_mark_ordered_io_finished+0x130/0x2b8 [btrfs]
   extent_writepage+0x10c/0x3b8 [btrfs]
   extent_write_cache_pages+0x21c/0x4e8 [btrfs]
   btrfs_writepages+0x94/0x160 [btrfs]
   do_writepages+0x74/0x190
   filemap_fdatawrite_wbc+0x74/0xa0
   start_delalloc_inodes+0x17c/0x3b0 [btrfs]
   btrfs_start_delalloc_roots+0x17c/0x288 [btrfs]
   shrink_delalloc+0x11c/0x280 [btrfs]
   flush_space+0x288/0x328 [btrfs]
   btrfs_async_reclaim_data_space+0x180/0x228 [btrfs]
   process_one_work+0x228/0x680
   worker_thread+0x1bc/0x360
   kthread+0x100/0x118
   ret_from_fork+0x10/0x20
  ---[ end trace 0000000000000000 ]---
  BTRFS critical (device dm-3): bad ordered extent accounting, root=4957 ino=257 OE offset=1605632 OE len=16384 to_dec=16384 left=0
  BTRFS critical (device dm-3): bad ordered extent accounting, root=4957 ino=257 OE offset=1622016 OE len=12288 to_dec=12288 left=0
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
  BTRFS critical (device dm-3): bad ordered extent accounting, root=4957 ino=257 OE offset=1634304 OE len=8192 to_dec=4096 left=0
  CPU: 1 UID: 0 PID: 3286940 Comm: kworker/u24:3 Tainted: G W OE 6.13.0-rc1-custom+ #89
  Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
  Workqueue: btrfs_work_helper [btrfs] (btrfs-endio-write)
  pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : process_one_work+0x110/0x680
  lr : worker_thread+0x1bc/0x360
  Call trace:
   process_one_work+0x110/0x680 (P)
   worker_thread+0x1bc/0x360 (L)
   worker_thread+0x1bc/0x360
   kthread+0x100/0x118
   ret_from_fork+0x10/0x20
  Code: f84086a1 f9000fe1 53041c21 b9003361 (f9400661)
  ---[ end trace 0000000000000000 ]---
  Kernel panic - not syncing: Oops: Fatal exception
  SMP: stopping secondary CPUs
  SMP: failed to stop secondary CPUs 2-3
  Dumping ftrace buffer:
     (ftrace buffer empty)
  Kernel Offset: 0x275bb9540000 from 0xffff800080000000
  PHYS_OFFSET: 0xffff8fbba0000000
  CPU features: 0x100,00000070,00801250,8201720b
[CAUSE]
The above warning is triggered immediately after the delalloc range failure; it happens in the following sequence:
- Range [1568K, 1636K) is dirty
  1536K   1568K       1600K      1636K   1664K
  |       |/////////|////////|           |
Where 1536K, 1600K and 1664K are page boundaries (64K page size)
- Enter extent_writepage() for page 1536K
- Enter run_delalloc_nocow() with locked page 1536K and range [1568K, 1636K)
  This is due to the inode having preallocated extents.
- Enter cow_file_range() with locked page 1536K and range [1568K, 1636K)
- btrfs_reserve_extent() only reserved two extents
  The main loop of cow_file_range() only reserved two data extents.
Now we have:
  1536K   1568K       1600K      1636K   1664K
  |       |<-->|<--->|/|///////|         |
             1584K  1596K

  Range [1568K, 1596K) has an ordered extent reserved.
- btrfs_reserve_extent() failed inside cow_file_range() for file offset 1596K
  This is already a bug in our space reservation code, but for now let's
  focus on the error handling path.
Now cow_file_range() returned -ENOSPC.
- btrfs_run_delalloc_range() does error cleanup   <<< ROOT CAUSE
  Call btrfs_cleanup_ordered_extents() with locked folio 1536K and
  range [1568K, 1636K)

Function btrfs_cleanup_ordered_extents() normally needs to skip the ranges inside the folio, as they will normally be cleaned up by extent_writepage().
Such split error handling is already problematic in the first place.
What's worse is the folio range skipping itself: the skipping is not taking subpage cases into consideration at all, and it will only skip the range if the page start >= the range start. In our case the page start (1536K) is smaller than the range start (1568K), since for subpage cases we can have delalloc ranges inside the folio but not covering the folio.
So it doesn't skip the page range at all. This means all the ordered extents, both [1568K, 1584K) and [1584K, 1596K), will be marked as IOERR.
And since these two ordered extents have no more pending IOs, they are marked finished and *QUEUED* to be deleted from the io tree.
- extent_writepage() does error cleanup
  Call btrfs_mark_ordered_io_finished() for the range [1536K, 1600K).

Although ranges [1568K, 1584K) and [1584K, 1596K) are finished, their deletion from the io tree is async; it may or may not have happened at this time.
If the ranges have not yet been removed, we will do double cleaning on those ranges, triggering the above ordered extent warnings.
In theory there are other bugs; e.g. the cleanup in extent_writepage() can cause double accounting on ranges that are submitted asynchronously (compression for example).
But that's much harder to trigger because normally we do not mix regular and compression delalloc ranges.
[FIX]
The folio range split is already buggy and not subpage compatible; it was introduced a long time ago, when subpage support was not even considered.
So instead of splitting the ordered extents cleanup into the folio range and the out-of-folio range, do all the cleanup inside writepage_delalloc().
- Pass @NULL as locked_folio for btrfs_cleanup_ordered_extents() in btrfs_run_delalloc_range()
- Skip the btrfs_cleanup_ordered_extents() if writepage_delalloc() failed
So all ordered extents are only cleaned up by btrfs_run_delalloc_range().
- Handle the ranges that already have ordered extents allocated
  If part of the folio already has an ordered extent allocated, and
  btrfs_run_delalloc_range() failed, we also need to clean up that range.

Now we have concentrated error handling for ordered extents during btrfs_run_delalloc_range().
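Condensed from the writepage_delalloc() hunk below, the concentrated cleanup is essentially the following loop, which finishes (as IOERR) every block up to the end of the last successfully created ordered extent:

        if (unlikely(ret < 0)) {
                unsigned int bitmap_size = min(
                        (last_finished_delalloc_end - page_start) >>
                        fs_info->sectorsize_bits,
                        fs_info->sectors_per_page);

                /* Finish every block that already got an ordered extent. */
                for_each_set_bit(bit, &bio_ctrl->submit_bitmap, bitmap_size)
                        btrfs_mark_ordered_io_finished(inode, folio,
                                page_start + (bit << fs_info->sectorsize_bits),
                                fs_info->sectorsize, false);
                return ret;
        }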
Fixes: d1051d6ebf8e ("btrfs: Fix error handling in btrfs_cleanup_ordered_extents")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 8bf334beb349 ("btrfs: fix double accounting race when extent_writepage_io() failed")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/extent_io.c | 59 +++++++++++++++++++++++++++++++++++---------
 fs/btrfs/inode.c     |  3 +--
 2 files changed, 49 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index e8f882f949051..15cbb2a865e5e 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1145,14 +1145,19 @@ static bool find_next_delalloc_bitmap(struct folio *folio, }
/* - * helper for extent_writepage(), doing all of the delayed allocation setup. + * Do all of the delayed allocation setup. * - * This returns 1 if btrfs_run_delalloc_range function did all the work required - * to write the page (copy into inline extent). In this case the IO has - * been started and the page is already unlocked. + * Return >0 if all the dirty blocks are submitted async (compression) or inlined. + * The @folio should no longer be touched (treat it as already unlocked). * - * This returns 0 if all went well (page still locked) - * This returns < 0 if there were errors (page still locked) + * Return 0 if there is still dirty block that needs to be submitted through + * extent_writepage_io(). + * bio_ctrl->submit_bitmap will indicate which blocks of the folio should be + * submitted, and @folio is still kept locked. + * + * Return <0 if there is any error hit. + * Any allocated ordered extent range covering this folio will be marked + * finished (IOERR), and @folio is still kept locked. */ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode, struct folio *folio, @@ -1170,6 +1175,16 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode, * last delalloc end. */ u64 last_delalloc_end = 0; + /* + * The range end (exclusive) of the last successfully finished delalloc + * range. + * Any range covered by ordered extent must either be manually marked + * finished (error handling), or has IO submitted (and finish the + * ordered extent normally). + * + * This records the end of ordered extent cleanup if we hit an error. + */ + u64 last_finished_delalloc_end = page_start; u64 delalloc_start = page_start; u64 delalloc_end = page_end; u64 delalloc_to_write = 0; @@ -1238,11 +1253,19 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode, found_len = last_delalloc_end + 1 - found_start;
if (ret >= 0) { + /* + * Some delalloc range may be created by previous folios. + * Thus we still need to clean up this range during error + * handling. + */ + last_finished_delalloc_end = found_start; /* No errors hit so far, run the current delalloc range. */ ret = btrfs_run_delalloc_range(inode, folio, found_start, found_start + found_len - 1, wbc); + if (ret >= 0) + last_finished_delalloc_end = found_start + found_len; } else { /* * We've hit an error during previous delalloc range, @@ -1277,8 +1300,22 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
delalloc_start = found_start + found_len; } - if (ret < 0) + /* + * It's possible we had some ordered extents created before we hit + * an error, cleanup non-async successfully created delalloc ranges. + */ + if (unlikely(ret < 0)) { + unsigned int bitmap_size = min( + (last_finished_delalloc_end - page_start) >> + fs_info->sectorsize_bits, + fs_info->sectors_per_page); + + for_each_set_bit(bit, &bio_ctrl->submit_bitmap, bitmap_size) + btrfs_mark_ordered_io_finished(inode, folio, + page_start + (bit << fs_info->sectorsize_bits), + fs_info->sectorsize, false); return ret; + } out: if (last_delalloc_end) delalloc_end = last_delalloc_end; @@ -1512,13 +1549,13 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
bio_ctrl->wbc->nr_to_write--;
-done: - if (ret) { + if (ret) btrfs_mark_ordered_io_finished(inode, folio, page_start, PAGE_SIZE, !ret); - mapping_set_error(folio->mapping, ret); - }
+done: + if (ret < 0) + mapping_set_error(folio->mapping, ret); /* * Only unlock ranges that are submitted. As there can be some async * submitted ranges inside the folio. diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f7e7d864f4144..5b842276573e8 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2419,8 +2419,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
out: if (ret < 0) - btrfs_cleanup_ordered_extents(inode, locked_folio, start, - end - start + 1); + btrfs_cleanup_ordered_extents(inode, NULL, start, end - start + 1); return ret; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
[ Upstream commit 8bf334beb3496da3c3fbf3daf3856f7eec70dacc ]
[BUG]
If submit_one_sector() failed inside extent_writepage_io() for sector size < page size cases (e.g. 4K sector size and 64K page size), then we can hit a double ordered extent accounting error.
This should be very rare, as submit_one_sector() only fails when we fail to grab the extent map, and such an extent map should exist in memory and already be pinned.
[CAUSE]
For example we have the following folio layout:

  0  4K        32K    48K   60K 64K
  |//|         |//////|     |///|
Where |///| is the dirty range we need to writeback. The 3 different dirty ranges are submitted for regular COW.
Now we hit the following sequence:
- submit_one_sector() returned 0 for [0, 4K)
- submit_one_sector() returned 0 for [32K, 48K)
- submit_one_sector() returned error for [60K, 64K)
- btrfs_mark_ordered_io_finished() called for the whole folio
  This will mark the following ranges as finished:
  * [0, 4K)
  * [32K, 48K)
    Both ranges have their IO already submitted, this cleanup will
    lead to double accounting.
  * [60K, 64K)
    That's the correct cleanup.
The only good news is that this error is only theoretical, as the target extent map is always pinned, thus we should grab it directly from memory rather than reading it from disk.

[FIX]
Instead of calling btrfs_mark_ordered_io_finished() for the whole folio range, which can touch ranges we should not touch, move the error handling inside extent_writepage_io().
That way we can clean up exactly the sectors that ought to be submitted but failed.
This provides much more accurate cleanup, avoiding the double accounting.
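Condensed from the hunk below, the per-sector error handling inside extent_writepage_io() now has the following shape, so only the sector that failed gets its ordered extent finished manually:

        ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
        if (unlikely(ret < 0)) {
                /* Flush the partially built bio so it can still finish normally. */
                submit_one_bio(bio_ctrl);
                /* No bio will cover this sector, finish its ordered extent here. */
                btrfs_mark_ordered_io_finished(inode, folio, cur,
                                               fs_info->sectorsize, false);
                error = true;
                continue;
        }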
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/extent_io.c | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 15cbb2a865e5e..660a5b9c08e9e 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1431,6 +1431,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode, struct btrfs_fs_info *fs_info = inode->root->fs_info; unsigned long range_bitmap = 0; bool submitted_io = false; + bool error = false; const u64 folio_start = folio_pos(folio); u64 cur; int bit; @@ -1473,11 +1474,26 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode, break; } ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size); - if (ret < 0) - goto out; + if (unlikely(ret < 0)) { + /* + * bio_ctrl may contain a bio crossing several folios. + * Submit it immediately so that the bio has a chance + * to finish normally, other than marked as error. + */ + submit_one_bio(bio_ctrl); + /* + * Failed to grab the extent map which should be very rare. + * Since there is no bio submitted to finish the ordered + * extent, we have to manually finish this sector. + */ + btrfs_mark_ordered_io_finished(inode, folio, cur, + fs_info->sectorsize, false); + error = true; + continue; + } submitted_io = true; } -out: + /* * If we didn't submitted any sector (>= i_size), folio dirty get * cleared but PAGECACHE_TAG_DIRTY is not cleared (only cleared @@ -1485,8 +1501,11 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode, * * Here we set writeback and clear for the range. If the full folio * is no longer dirty then we clear the PAGECACHE_TAG_DIRTY tag. + * + * If we hit any error, the corresponding sector will still be dirty + * thus no need to clear PAGECACHE_TAG_DIRTY. */ - if (!submitted_io) { + if (!submitted_io && !error) { btrfs_folio_set_writeback(fs_info, folio, start, len); btrfs_folio_clear_writeback(fs_info, folio, start, len); } @@ -1506,7 +1525,6 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl { struct btrfs_inode *inode = BTRFS_I(folio->mapping->host); struct btrfs_fs_info *fs_info = inode->root->fs_info; - const u64 page_start = folio_pos(folio); int ret; size_t pg_offset; loff_t i_size = i_size_read(&inode->vfs_inode); @@ -1549,10 +1567,6 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
bio_ctrl->wbc->nr_to_write--;
- if (ret) - btrfs_mark_ordered_io_finished(inode, folio, - page_start, PAGE_SIZE, !ret); - done: if (ret < 0) mapping_set_error(folio->mapping, ret); @@ -2312,11 +2326,8 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f if (ret == 1) goto next_page;
- if (ret) { - btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio, - cur, cur_len, !ret); + if (ret) mapping_set_error(mapping, ret); - } btrfs_folio_end_lock(fs_info, folio, cur, cur_len); if (ret < 0) found_error = true;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
[ Upstream commit d91060e342a66b52d9bd64f0b123b9c306293b76 ]
Access KVM's emulated APIC base MSR value directly instead of bouncing through a helper, as there is no reason to add a layer of indirection, and there are other MSRs with a "set" but no "get", e.g. EFER.
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20241009181742.1128779-4-seanjc@google.com
Link: https://lore.kernel.org/r/20241101183555.1794700-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Stable-dep-of: 04bc93cf49d1 ("KVM: nVMX: Defer SVI update to vmcs01 on EOI when L2 is active w/o VID")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/lapic.h |  1 -
 arch/x86/kvm/x86.c   | 13 ++++---------
 2 files changed, 4 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 1b8ef9856422a..441abc4f4afd9 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -117,7 +117,6 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src, struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map); void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high);
-u64 kvm_get_apic_base(struct kvm_vcpu *vcpu); int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info); int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s); int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0846e3af5f6c5..36bedf235340c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -667,14 +667,9 @@ static void drop_user_return_notifiers(void) kvm_on_user_return(&msrs->urn); }
-u64 kvm_get_apic_base(struct kvm_vcpu *vcpu) -{ - return vcpu->arch.apic_base; -} - enum lapic_mode kvm_get_apic_mode(struct kvm_vcpu *vcpu) { - return kvm_apic_mode(kvm_get_apic_base(vcpu)); + return kvm_apic_mode(vcpu->arch.apic_base); } EXPORT_SYMBOL_GPL(kvm_get_apic_mode);
@@ -4314,7 +4309,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) msr_info->data = 1 << 24; break; case MSR_IA32_APICBASE: - msr_info->data = kvm_get_apic_base(vcpu); + msr_info->data = vcpu->arch.apic_base; break; case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff: return kvm_x2apic_msr_read(vcpu, msr_info->index, &msr_info->data); @@ -10159,7 +10154,7 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu)
kvm_run->if_flag = kvm_x86_call(get_if_flag)(vcpu); kvm_run->cr8 = kvm_get_cr8(vcpu); - kvm_run->apic_base = kvm_get_apic_base(vcpu); + kvm_run->apic_base = vcpu->arch.apic_base;
kvm_run->ready_for_interrupt_injection = pic_in_kernel(vcpu->kvm) || @@ -11718,7 +11713,7 @@ static void __get_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs) sregs->cr4 = kvm_read_cr4(vcpu); sregs->cr8 = kvm_get_cr8(vcpu); sregs->efer = vcpu->arch.efer; - sregs->apic_base = kvm_get_apic_base(vcpu); + sregs->apic_base = vcpu->arch.apic_base; }
static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
[ Upstream commit adfec1f4591cf8c69664104eaf41e06b2e7b767e ]
Inline kvm_get_apic_mode() in lapic.h to avoid a CALL+RET as well as an export. The underlying kvm_apic_mode() helper is public information, i.e. there is no state/information that needs to be hidden from vendor modules.
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20241009181742.1128779-5-seanjc@google.com
Link: https://lore.kernel.org/r/20241101183555.1794700-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Stable-dep-of: 04bc93cf49d1 ("KVM: nVMX: Defer SVI update to vmcs01 on EOI when L2 is active w/o VID")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/lapic.h | 6 +++++-
 arch/x86/kvm/x86.c   | 6 ------
 2 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 441abc4f4afd9..fc4bd36d44cfc 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -120,7 +120,6 @@ void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high); int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info); int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s); int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s); -enum lapic_mode kvm_get_apic_mode(struct kvm_vcpu *vcpu); int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu); @@ -270,6 +269,11 @@ static inline enum lapic_mode kvm_apic_mode(u64 apic_base) return apic_base & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE); }
+static inline enum lapic_mode kvm_get_apic_mode(struct kvm_vcpu *vcpu) +{ + return kvm_apic_mode(vcpu->arch.apic_base); +} + static inline u8 kvm_xapic_id(struct kvm_lapic *apic) { return kvm_lapic_get_reg(apic, APIC_ID) >> 24; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 36bedf235340c..b67a2f46e40b0 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -667,12 +667,6 @@ static void drop_user_return_notifiers(void) kvm_on_user_return(&msrs->urn); }
-enum lapic_mode kvm_get_apic_mode(struct kvm_vcpu *vcpu) -{ - return kvm_apic_mode(vcpu->arch.apic_base); -} -EXPORT_SYMBOL_GPL(kvm_get_apic_mode); - int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { enum lapic_mode old_mode = kvm_get_apic_mode(vcpu);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chao Gao chao.gao@intel.com
[ Upstream commit 04bc93cf49d16d01753b95ddb5d4f230b809a991 ]
If KVM emulates an EOI for L1's virtual APIC while L2 is active, defer updating GUEST_INTERRUPT_STATUS.SVI, i.e. the VMCS's cache of the highest in-service IRQ, until L1 is active, as vmcs01, not vmcs02, needs to track vISR. The missed SVI update for vmcs01 can result in L1 interrupts being incorrectly blocked, e.g. if there is a pending interrupt with lower priority than the interrupt that was EOI'd.
This bug only affects use cases where L1's vAPIC is effectively passed through to L2, e.g. in a pKVM scenario where L2 is L1's deprivileged host, as KVM will only emulate an EOI for L1's vAPIC if Virtual Interrupt Delivery (VID) is disabled in vmcs12, and L1 isn't intercepting L2 accesses to its (virtual) APIC page (or, if x2APIC is enabled, the EOI MSR).
WARN() if KVM updates L1's ISR while L2 is active with VID enabled, as an EOI from L2 is supposed to affect L2's vAPIC, but still defer the update, to try to keep L1 alive. Specifically, KVM forwards all APICv-related VM-Exits to L1 via nested_vmx_l1_wants_exit():
        case EXIT_REASON_APIC_ACCESS:
        case EXIT_REASON_APIC_WRITE:
        case EXIT_REASON_EOI_INDUCED:
                /*
                 * The controls for "virtualize APIC accesses," "APIC-
                 * register virtualization," and "virtual-interrupt
                 * delivery" only come from vmcs12.
                 */
                return true;
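The fix follows the deferred-update pattern vmx already uses for other vmcs01 state; condensed from the hunks below:

        /* vmx_hwapic_isr_update(): while L2 is active, only record the update. */
        if (is_guest_mode(vcpu)) {
                to_vmx(vcpu)->nested.update_vmcs01_hwapic_isr = true;
                return;
        }

        /* nested_vmx_vmexit(): replay the update once vmcs01 is loaded again. */
        if (vmx->nested.update_vmcs01_hwapic_isr) {
                vmx->nested.update_vmcs01_hwapic_isr = false;
                kvm_apic_update_hwapic_isr(vcpu);
        }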
Fixes: c7c9c56ca26f ("x86, apicv: add virtual interrupt delivery support")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/kvm/20230312180048.1778187-1-jason.cj.chen@intel.com
Reported-by: Markku Ahvenjärvi <mankku@gmail.com>
Closes: https://lore.kernel.org/all/20240920080012.74405-1-mankku@gmail.com
Cc: Janne Karhunen <janne.karhunen@gmail.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: drop request, handle in VMX, write changelog]
Tested-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20241128000010.4051275-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/lapic.c      | 11 +++++++++++
 arch/x86/kvm/lapic.h      |  1 +
 arch/x86/kvm/vmx/nested.c |  5 +++++
 arch/x86/kvm/vmx/vmx.c    | 21 +++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h    |  1 +
 5 files changed, 39 insertions(+)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 375bbb9600d3c..1a8148dec4afe 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -816,6 +816,17 @@ static inline void apic_clear_isr(int vec, struct kvm_lapic *apic) } }
+void kvm_apic_update_hwapic_isr(struct kvm_vcpu *vcpu) +{ + struct kvm_lapic *apic = vcpu->arch.apic; + + if (WARN_ON_ONCE(!lapic_in_kernel(vcpu)) || !apic->apicv_active) + return; + + kvm_x86_call(hwapic_isr_update)(vcpu, apic_find_highest_isr(apic)); +} +EXPORT_SYMBOL_GPL(kvm_apic_update_hwapic_isr); + int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu) { /* This may race with setting of irr in __apic_accept_irq() and diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index fc4bd36d44cfc..3aa599db77968 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -120,6 +120,7 @@ void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high); int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info); int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s); int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s); +void kvm_apic_update_hwapic_isr(struct kvm_vcpu *vcpu); int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 931a7361c30f2..22bee8a711442 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -5043,6 +5043,11 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason, kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu); }
+ if (vmx->nested.update_vmcs01_hwapic_isr) { + vmx->nested.update_vmcs01_hwapic_isr = false; + kvm_apic_update_hwapic_isr(vcpu); + } + if ((vm_exit_reason != -1) && (enable_shadow_vmcs || nested_vmx_is_evmptr12_valid(vmx))) vmx->nested.need_vmcs12_to_shadow_sync = true; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index f06d443ec3c68..1af30e3472cdd 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6858,6 +6858,27 @@ void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr) u16 status; u8 old;
+ /* + * If L2 is active, defer the SVI update until vmcs01 is loaded, as SVI + * is only relevant for if and only if Virtual Interrupt Delivery is + * enabled in vmcs12, and if VID is enabled then L2 EOIs affect L2's + * vAPIC, not L1's vAPIC. KVM must update vmcs01 on the next nested + * VM-Exit, otherwise L1 with run with a stale SVI. + */ + if (is_guest_mode(vcpu)) { + /* + * KVM is supposed to forward intercepted L2 EOIs to L1 if VID + * is enabled in vmcs12; as above, the EOIs affect L2's vAPIC. + * Note, userspace can stuff state while L2 is active; assert + * that VID is disabled if and only if the vCPU is in KVM_RUN + * to avoid false positives if userspace is setting APIC state. + */ + WARN_ON_ONCE(vcpu->wants_to_run && + nested_cpu_has_vid(get_vmcs12(vcpu))); + to_vmx(vcpu)->nested.update_vmcs01_hwapic_isr = true; + return; + } + if (max_isr == -1) max_isr = 0;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 2325f773a20be..41bf59bbc6426 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -176,6 +176,7 @@ struct nested_vmx { bool reload_vmcs01_apic_access_page; bool update_vmcs01_cpu_dirty_logging; bool update_vmcs01_apicv_status; + bool update_vmcs01_hwapic_isr;
/* * Enlightened VMCS has been enabled. It does not mean that L1 has to
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lohita Mudimela lohita.mudimela@amd.com
[ Upstream commit b04200432c4730c9bb730a66be46551c83d60263 ]
[Why]
Header-related changes for core.

[How]
Refactor the if and endif statements to enable DC_LOGGER.
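The resulting ordering can be sketched as follows (taken from the dcn31 hunk below); the point is that any header which might define DC_LOGGER is included before the file-local definition:

        #include "yellow_carp_offset.h"

        /* Redefine DC_LOGGER only after all includes are done. */
        #undef DC_LOGGER
        #define DC_LOGGER \
                clk_mgr->base.base.ctx->logger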
Reviewed-by: Mounika Adhuri <mounika.adhuri@amd.com>
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Lohita Mudimela <lohita.mudimela@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: f88192d2335b ("drm/amd/display: Correct register address in dcn35")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c   | 5 +++--
 .../gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c | 6 +++---
 .../gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c   | 1 +
 .../drm/amd/display/dc/link/protocols/link_dp_capability.c | 3 ++-
 4 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c index e93df3d6222e6..bc123f1884da3 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c @@ -50,12 +50,13 @@ #include "link.h"
#include "logger_types.h" + + +#include "yellow_carp_offset.h" #undef DC_LOGGER #define DC_LOGGER \ clk_mgr->base.base.ctx->logger
-#include "yellow_carp_offset.h" - #define regCLK1_CLK_PLL_REQ 0x0237 #define regCLK1_CLK_PLL_REQ_BASE_IDX 0
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c index 29eff386505ab..91d872d6d392b 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c @@ -53,9 +53,6 @@
#include "logger_types.h" -#undef DC_LOGGER -#define DC_LOGGER \ - clk_mgr->base.base.ctx->logger
#define MAX_INSTANCE 7 @@ -77,6 +74,9 @@ static const struct IP_BASE CLK_BASE = { { { { 0x00016C00, 0x02401800, 0, 0, 0, { { 0x0001B200, 0x0242DC00, 0, 0, 0, 0, 0, 0 } }, { { 0x0001B400, 0x0242E000, 0, 0, 0, 0, 0, 0 } } } };
+#undef DC_LOGGER +#define DC_LOGGER \ + clk_mgr->base.base.ctx->logger #define regCLK1_CLK_PLL_REQ 0x0237 #define regCLK1_CLK_PLL_REQ_BASE_IDX 0
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c index 3bd0d46c17010..bbdc39ae57b9d 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c @@ -55,6 +55,7 @@ #define DC_LOGGER \ clk_mgr->base.base.ctx->logger
+ #define regCLK1_CLK_PLL_REQ 0x0237 #define regCLK1_CLK_PLL_REQ_BASE_IDX 0
diff --git a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c index d78c8ec4de79e..885e749cdc6e9 100644 --- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c +++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c @@ -51,9 +51,10 @@ #include "dc_dmub_srv.h" #include "gpio_service_interface.h"
+#define DC_TRACE_LEVEL_MESSAGE(...) /* do nothing */ + #define DC_LOGGER \ link->ctx->logger -#define DC_TRACE_LEVEL_MESSAGE(...) /* do nothing */
#ifndef MAX #define MAX(X, Y) ((X) > (Y) ? (X) : (Y))
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Charlene Liu Charlene.Liu@amd.com
[ Upstream commit a1fc2837f4960e84e9375e12292584ad2ae472da ]
[why] HW register offset delta between DCN3.5 and DCN3.51.
Reviewed-by: Martin Leung martin.leung@amd.com Signed-off-by: Charlene Liu Charlene.Liu@amd.com Signed-off-by: Aurabindo Pillai aurabindo.pillai@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Stable-dep-of: f88192d2335b ("drm/amd/display: Correct register address in dcn35") Signed-off-by: Sasha Levin sashal@kernel.org --- .../gpu/drm/amd/display/dc/clk_mgr/Makefile | 2 +- .../gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c | 5 +- .../display/dc/clk_mgr/dcn35/dcn351_clk_mgr.c | 140 ++++++++++++++++++ .../display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c | 132 +++++++++++++---- .../display/dc/clk_mgr/dcn35/dcn35_clk_mgr.h | 4 + .../amd/display/dc/inc/hw/clk_mgr_internal.h | 59 ++++++++ 6 files changed, 308 insertions(+), 34 deletions(-) create mode 100644 drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn351_clk_mgr.c
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile b/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile index ab1132bc896a3..d9955c5d2e5ed 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile @@ -174,7 +174,7 @@ AMD_DISPLAY_FILES += $(AMD_DAL_CLK_MGR_DCN32) ############################################################################### # DCN35 ############################################################################### -CLK_MGR_DCN35 = dcn35_smu.o dcn35_clk_mgr.o +CLK_MGR_DCN35 = dcn35_smu.o dcn351_clk_mgr.o dcn35_clk_mgr.o
AMD_DAL_CLK_MGR_DCN35 = $(addprefix $(AMDDALPATH)/dc/clk_mgr/dcn35/,$(CLK_MGR_DCN35))
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c index 0e243f4344d05..4c3e58c730b11 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c @@ -355,8 +355,11 @@ struct clk_mgr *dc_clk_mgr_create(struct dc_context *ctx, struct pp_smu_funcs *p BREAK_TO_DEBUGGER(); return NULL; } + if (ctx->dce_version == DCN_VERSION_3_51) + dcn351_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg); + else + dcn35_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg);
- dcn35_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg); return &clk_mgr->base.base; } break; diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn351_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn351_clk_mgr.c new file mode 100644 index 0000000000000..6a6ae618650b6 --- /dev/null +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn351_clk_mgr.c @@ -0,0 +1,140 @@ +/* + * Copyright 2024 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + * + * Authors: AMD + * + */ + +#include "core_types.h" +#include "dcn35_clk_mgr.h" + +#define DCN_BASE__INST0_SEG1 0x000000C0 +#define mmCLK1_CLK_PLL_REQ 0x16E37 + +#define mmCLK1_CLK0_DFS_CNTL 0x16E69 +#define mmCLK1_CLK1_DFS_CNTL 0x16E6C +#define mmCLK1_CLK2_DFS_CNTL 0x16E6F +#define mmCLK1_CLK3_DFS_CNTL 0x16E72 +#define mmCLK1_CLK4_DFS_CNTL 0x16E75 +#define mmCLK1_CLK5_DFS_CNTL 0x16E78 + +#define mmCLK1_CLK0_CURRENT_CNT 0x16EFC +#define mmCLK1_CLK1_CURRENT_CNT 0x16EFD +#define mmCLK1_CLK2_CURRENT_CNT 0x16EFE +#define mmCLK1_CLK3_CURRENT_CNT 0x16EFF +#define mmCLK1_CLK4_CURRENT_CNT 0x16F00 +#define mmCLK1_CLK5_CURRENT_CNT 0x16F01 + +#define mmCLK1_CLK0_BYPASS_CNTL 0x16E8A +#define mmCLK1_CLK1_BYPASS_CNTL 0x16E93 +#define mmCLK1_CLK2_BYPASS_CNTL 0x16E9C +#define mmCLK1_CLK3_BYPASS_CNTL 0x16EA5 +#define mmCLK1_CLK4_BYPASS_CNTL 0x16EAE +#define mmCLK1_CLK5_BYPASS_CNTL 0x16EB7 + +#define mmCLK1_CLK0_DS_CNTL 0x16E83 +#define mmCLK1_CLK1_DS_CNTL 0x16E8C +#define mmCLK1_CLK2_DS_CNTL 0x16E95 +#define mmCLK1_CLK3_DS_CNTL 0x16E9E +#define mmCLK1_CLK4_DS_CNTL 0x16EA7 +#define mmCLK1_CLK5_DS_CNTL 0x16EB0 + +#define mmCLK1_CLK0_ALLOW_DS 0x16E84 +#define mmCLK1_CLK1_ALLOW_DS 0x16E8D +#define mmCLK1_CLK2_ALLOW_DS 0x16E96 +#define mmCLK1_CLK3_ALLOW_DS 0x16E9F +#define mmCLK1_CLK4_ALLOW_DS 0x16EA8 +#define mmCLK1_CLK5_ALLOW_DS 0x16EB1 + +#define mmCLK5_spll_field_8 0x1B04B +#define mmDENTIST_DISPCLK_CNTL 0x0124 +#define regDENTIST_DISPCLK_CNTL 0x0064 +#define regDENTIST_DISPCLK_CNTL_BASE_IDX 1 + +#define CLK1_CLK_PLL_REQ__FbMult_int__SHIFT 0x0 +#define CLK1_CLK_PLL_REQ__PllSpineDiv__SHIFT 0xc +#define CLK1_CLK_PLL_REQ__FbMult_frac__SHIFT 0x10 +#define CLK1_CLK_PLL_REQ__FbMult_int_MASK 0x000001FFL +#define CLK1_CLK_PLL_REQ__PllSpineDiv_MASK 0x0000F000L +#define CLK1_CLK_PLL_REQ__FbMult_frac_MASK 0xFFFF0000L + +#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL_MASK 0x00000007L + +// DENTIST_DISPCLK_CNTL +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_WDIVIDER__SHIFT 0x0
+#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_RDIVIDER__SHIFT 0x8 +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_CHG_DONE__SHIFT 0x13 +#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_CHG_DONE__SHIFT 0x14 +#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_WDIVIDER__SHIFT 0x18 +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_WDIVIDER_MASK 0x0000007FL +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_RDIVIDER_MASK 0x00007F00L +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_CHG_DONE_MASK 0x00080000L +#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_CHG_DONE_MASK 0x00100000L +#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_WDIVIDER_MASK 0x7F000000L + +#define CLK5_spll_field_8__spll_ssc_en_MASK 0x00002000L + +#define REG(reg) \ + (clk_mgr->regs->reg) + +#define BASE_INNER(seg) DCN_BASE__INST0_SEG ## seg + +#define BASE(seg) BASE_INNER(seg) + +#define SR(reg_name)\ + .reg_name = BASE(reg ## reg_name ## _BASE_IDX) + \ + reg ## reg_name + +#define CLK_SR_DCN35(reg_name)\ + .reg_name = mm ## reg_name + +static const struct clk_mgr_registers clk_mgr_regs_dcn351 = { + CLK_REG_LIST_DCN35() }; + +static const struct clk_mgr_shift clk_mgr_shift_dcn351 = { + CLK_COMMON_MASK_SH_LIST_DCN32(__SHIFT) +}; + +static const struct clk_mgr_mask clk_mgr_mask_dcn351 = { + CLK_COMMON_MASK_SH_LIST_DCN32(_MASK) +}; + +#define TO_CLK_MGR_DCN35(clk_mgr)\ + container_of(clk_mgr, struct clk_mgr_dcn35, base) + + +void dcn351_clk_mgr_construct( + struct dc_context *ctx, + struct clk_mgr_dcn35 *clk_mgr, + struct pp_smu_funcs *pp_smu, + struct dccg *dccg) +{ + /*register offset changed*/ + clk_mgr->base.regs = &clk_mgr_regs_dcn351; + clk_mgr->base.clk_mgr_shift = &clk_mgr_shift_dcn351; + clk_mgr->base.clk_mgr_mask = &clk_mgr_mask_dcn351; + + dcn35_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg); + +} + + diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c index bbdc39ae57b9d..d8a4cdbb5495d 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c @@ -36,15 +36,11 @@ #include "dcn20/dcn20_clk_mgr.h"
- - #include "reg_helper.h" #include "core_types.h" #include "dcn35_smu.h" #include "dm_helpers.h"
-/* TODO: remove this include once we ported over remaining clk mgr functions*/ -#include "dcn30/dcn30_clk_mgr.h" #include "dcn31/dcn31_clk_mgr.h"
#include "dc_dmub_srv.h" @@ -55,35 +51,102 @@ #define DC_LOGGER \ clk_mgr->base.base.ctx->logger
+#define DCN_BASE__INST0_SEG1 0x000000C0 +#define mmCLK1_CLK_PLL_REQ 0x16E37 + +#define mmCLK1_CLK0_DFS_CNTL 0x16E69 +#define mmCLK1_CLK1_DFS_CNTL 0x16E6C +#define mmCLK1_CLK2_DFS_CNTL 0x16E6F +#define mmCLK1_CLK3_DFS_CNTL 0x16E72 +#define mmCLK1_CLK4_DFS_CNTL 0x16E75 +#define mmCLK1_CLK5_DFS_CNTL 0x16E78 + +#define mmCLK1_CLK0_CURRENT_CNT 0x16EFB +#define mmCLK1_CLK1_CURRENT_CNT 0x16EFC +#define mmCLK1_CLK2_CURRENT_CNT 0x16EFD +#define mmCLK1_CLK3_CURRENT_CNT 0x16EFE +#define mmCLK1_CLK4_CURRENT_CNT 0x16EFF +#define mmCLK1_CLK5_CURRENT_CNT 0x16F00 + +#define mmCLK1_CLK0_BYPASS_CNTL 0x16E8A +#define mmCLK1_CLK1_BYPASS_CNTL 0x16E93 +#define mmCLK1_CLK2_BYPASS_CNTL 0x16E9C +#define mmCLK1_CLK3_BYPASS_CNTL 0x16EA5 +#define mmCLK1_CLK4_BYPASS_CNTL 0x16EAE +#define mmCLK1_CLK5_BYPASS_CNTL 0x16EB7 + +#define mmCLK1_CLK0_DS_CNTL 0x16E83 +#define mmCLK1_CLK1_DS_CNTL 0x16E8C +#define mmCLK1_CLK2_DS_CNTL 0x16E95 +#define mmCLK1_CLK3_DS_CNTL 0x16E9E +#define mmCLK1_CLK4_DS_CNTL 0x16EA7 +#define mmCLK1_CLK5_DS_CNTL 0x16EB0 + +#define mmCLK1_CLK0_ALLOW_DS 0x16E84 +#define mmCLK1_CLK1_ALLOW_DS 0x16E8D +#define mmCLK1_CLK2_ALLOW_DS 0x16E96 +#define mmCLK1_CLK3_ALLOW_DS 0x16E9F +#define mmCLK1_CLK4_ALLOW_DS 0x16EA8 +#define mmCLK1_CLK5_ALLOW_DS 0x16EB1 + +#define mmCLK5_spll_field_8 0x1B04B +#define mmDENTIST_DISPCLK_CNTL 0x0124 +#define regDENTIST_DISPCLK_CNTL 0x0064 +#define regDENTIST_DISPCLK_CNTL_BASE_IDX 1 + +#define CLK1_CLK_PLL_REQ__FbMult_int__SHIFT 0x0 +#define CLK1_CLK_PLL_REQ__PllSpineDiv__SHIFT 0xc +#define CLK1_CLK_PLL_REQ__FbMult_frac__SHIFT 0x10 +#define CLK1_CLK_PLL_REQ__FbMult_int_MASK 0x000001FFL +#define CLK1_CLK_PLL_REQ__PllSpineDiv_MASK 0x0000F000L +#define CLK1_CLK_PLL_REQ__FbMult_frac_MASK 0xFFFF0000L + +#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL_MASK 0x00000007L +#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_DIV_MASK 0x000F0000L +// DENTIST_DISPCLK_CNTL +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_WDIVIDER__SHIFT 0x0 +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_RDIVIDER__SHIFT 0x8 +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_CHG_DONE__SHIFT 0x13 +#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_CHG_DONE__SHIFT 0x14 +#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_WDIVIDER__SHIFT 0x18 +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_WDIVIDER_MASK 0x0000007FL +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_RDIVIDER_MASK 0x00007F00L +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_CHG_DONE_MASK 0x00080000L +#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_CHG_DONE_MASK 0x00100000L +#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_WDIVIDER_MASK 0x7F000000L + +#define CLK5_spll_field_8__spll_ssc_en_MASK 0x00002000L + +#define SMU_VER_THRESHOLD 0x5D4A00 //93.74.0 +#undef FN +#define FN(reg_name, field_name) \ + clk_mgr->clk_mgr_shift->field_name, clk_mgr->clk_mgr_mask->field_name
-#define regCLK1_CLK_PLL_REQ 0x0237 -#define regCLK1_CLK_PLL_REQ_BASE_IDX 0 +#define REG(reg) \ + (clk_mgr->regs->reg)
-#define CLK1_CLK_PLL_REQ__FbMult_int__SHIFT 0x0 -#define CLK1_CLK_PLL_REQ__PllSpineDiv__SHIFT 0xc -#define CLK1_CLK_PLL_REQ__FbMult_frac__SHIFT 0x10 -#define CLK1_CLK_PLL_REQ__FbMult_int_MASK 0x000001FFL -#define CLK1_CLK_PLL_REQ__PllSpineDiv_MASK 0x0000F000L -#define CLK1_CLK_PLL_REQ__FbMult_frac_MASK 0xFFFF0000L +#define BASE_INNER(seg) DCN_BASE__INST0_SEG ## seg
-#define regCLK1_CLK2_BYPASS_CNTL 0x029c -#define regCLK1_CLK2_BYPASS_CNTL_BASE_IDX 0 +#define BASE(seg) BASE_INNER(seg)
-#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL__SHIFT 0x0 -#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_DIV__SHIFT 0x10 -#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL_MASK 0x00000007L -#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_DIV_MASK 0x000F0000L +#define SR(reg_name)\ + .reg_name = BASE(reg ## reg_name ## _BASE_IDX) + \ + reg ## reg_name
-#define regCLK5_0_CLK5_spll_field_8 0x464b -#define regCLK5_0_CLK5_spll_field_8_BASE_IDX 0 +#define CLK_SR_DCN35(reg_name)\ + .reg_name = mm ## reg_name
-#define CLK5_0_CLK5_spll_field_8__spll_ssc_en__SHIFT 0xd -#define CLK5_0_CLK5_spll_field_8__spll_ssc_en_MASK 0x00002000L +static const struct clk_mgr_registers clk_mgr_regs_dcn35 = { + CLK_REG_LIST_DCN35() +};
-#define SMU_VER_THRESHOLD 0x5D4A00 //93.74.0 +static const struct clk_mgr_shift clk_mgr_shift_dcn35 = { + CLK_COMMON_MASK_SH_LIST_DCN32(__SHIFT) +};
-#define REG(reg_name) \ - (ctx->clk_reg_offsets[reg ## reg_name ## _BASE_IDX] + reg ## reg_name) +static const struct clk_mgr_mask clk_mgr_mask_dcn35 = { + CLK_COMMON_MASK_SH_LIST_DCN32(_MASK) +};
#define TO_CLK_MGR_DCN35(clk_mgr)\ container_of(clk_mgr, struct clk_mgr_dcn35, base) @@ -444,7 +507,6 @@ static int get_vco_frequency_from_reg(struct clk_mgr_internal *clk_mgr) struct fixed31_32 pll_req; unsigned int fbmult_frac_val = 0; unsigned int fbmult_int_val = 0; - struct dc_context *ctx = clk_mgr->base.ctx;
/* * Register value of fbmult is in 8.16 format, we are converting to 314.32 @@ -504,12 +566,12 @@ static void dcn35_dump_clk_registers(struct clk_state_registers_and_bypass *regs static bool dcn35_is_spll_ssc_enabled(struct clk_mgr *clk_mgr_base) { struct clk_mgr_internal *clk_mgr = TO_CLK_MGR_INTERNAL(clk_mgr_base); - struct dc_context *ctx = clk_mgr->base.ctx; + uint32_t ssc_enable;
- REG_GET(CLK5_0_CLK5_spll_field_8, spll_ssc_en, &ssc_enable); + ssc_enable = REG_READ(CLK5_spll_field_8) & CLK5_spll_field_8__spll_ssc_en_MASK;
- return ssc_enable == 1; + return ssc_enable != 0; }
static void init_clk_states(struct clk_mgr *clk_mgr) @@ -634,10 +696,10 @@ static struct dcn35_ss_info_table ss_info_table = {
static void dcn35_read_ss_info_from_lut(struct clk_mgr_internal *clk_mgr) { - struct dc_context *ctx = clk_mgr->base.ctx; - uint32_t clock_source; + uint32_t clock_source = 0; + + clock_source = REG_READ(CLK1_CLK2_BYPASS_CNTL) & CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL_MASK;
- REG_GET(CLK1_CLK2_BYPASS_CNTL, CLK2_BYPASS_SEL, &clock_source); // If it's DFS mode, clock_source is 0. if (dcn35_is_spll_ssc_enabled(&clk_mgr->base) && (clock_source < ARRAY_SIZE(ss_info_table.ss_percentage))) { clk_mgr->dprefclk_ss_percentage = ss_info_table.ss_percentage[clock_source]; @@ -1107,6 +1169,12 @@ void dcn35_clk_mgr_construct( clk_mgr->base.dprefclk_ss_divider = 1000; clk_mgr->base.ss_on_dprefclk = false; clk_mgr->base.dfs_ref_freq_khz = 48000; + if (ctx->dce_version == DCN_VERSION_3_5) { + clk_mgr->base.regs = &clk_mgr_regs_dcn35; + clk_mgr->base.clk_mgr_shift = &clk_mgr_shift_dcn35; + clk_mgr->base.clk_mgr_mask = &clk_mgr_mask_dcn35; + } +
clk_mgr->smu_wm_set.wm_set = (struct dcn35_watermarks *)dm_helpers_allocate_gpu_mem( clk_mgr->base.base.ctx, diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.h b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.h index 1203dc605b12c..a12a9bf90806e 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.h +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.h @@ -60,4 +60,8 @@ void dcn35_clk_mgr_construct(struct dc_context *ctx,
void dcn35_clk_mgr_destroy(struct clk_mgr_internal *clk_mgr_int);
+void dcn351_clk_mgr_construct(struct dc_context *ctx, + struct clk_mgr_dcn35 *clk_mgr, + struct pp_smu_funcs *pp_smu, + struct dccg *dccg); #endif //__DCN35_CLK_MGR_H__ diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/clk_mgr_internal.h b/drivers/gpu/drm/amd/display/dc/inc/hw/clk_mgr_internal.h index c2dd061892f4d..7a1ca1e98059b 100644 --- a/drivers/gpu/drm/amd/display/dc/inc/hw/clk_mgr_internal.h +++ b/drivers/gpu/drm/amd/display/dc/inc/hw/clk_mgr_internal.h @@ -166,6 +166,41 @@ enum dentist_divider_range { CLK_SR_DCN32(CLK1_CLK4_CURRENT_CNT), \ CLK_SR_DCN32(CLK4_CLK0_CURRENT_CNT)
+#define CLK_REG_LIST_DCN35() \ + CLK_SR_DCN35(CLK1_CLK_PLL_REQ), \ + CLK_SR_DCN35(CLK1_CLK0_DFS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK1_DFS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK2_DFS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK3_DFS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK4_DFS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK5_DFS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK0_CURRENT_CNT), \ + CLK_SR_DCN35(CLK1_CLK1_CURRENT_CNT), \ + CLK_SR_DCN35(CLK1_CLK2_CURRENT_CNT), \ + CLK_SR_DCN35(CLK1_CLK3_CURRENT_CNT), \ + CLK_SR_DCN35(CLK1_CLK4_CURRENT_CNT), \ + CLK_SR_DCN35(CLK1_CLK5_CURRENT_CNT), \ + CLK_SR_DCN35(CLK1_CLK0_BYPASS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK1_BYPASS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK2_BYPASS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK3_BYPASS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK4_BYPASS_CNTL),\ + CLK_SR_DCN35(CLK1_CLK5_BYPASS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK0_DS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK1_DS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK2_DS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK3_DS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK4_DS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK5_DS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK0_ALLOW_DS), \ + CLK_SR_DCN35(CLK1_CLK1_ALLOW_DS), \ + CLK_SR_DCN35(CLK1_CLK2_ALLOW_DS), \ + CLK_SR_DCN35(CLK1_CLK3_ALLOW_DS), \ + CLK_SR_DCN35(CLK1_CLK4_ALLOW_DS), \ + CLK_SR_DCN35(CLK1_CLK5_ALLOW_DS), \ + CLK_SR_DCN35(CLK5_spll_field_8), \ + SR(DENTIST_DISPCLK_CNTL), \ + #define CLK_COMMON_MASK_SH_LIST_DCN32(mask_sh) \ CLK_COMMON_MASK_SH_LIST_DCN20_BASE(mask_sh),\ CLK_SF(CLK1_CLK_PLL_REQ, FbMult_int, mask_sh),\ @@ -236,6 +271,7 @@ struct clk_mgr_registers { uint32_t CLK1_CLK2_DFS_CNTL; uint32_t CLK1_CLK3_DFS_CNTL; uint32_t CLK1_CLK4_DFS_CNTL; + uint32_t CLK1_CLK5_DFS_CNTL; uint32_t CLK2_CLK2_DFS_CNTL;
uint32_t CLK1_CLK0_CURRENT_CNT; @@ -243,11 +279,34 @@ struct clk_mgr_registers { uint32_t CLK1_CLK2_CURRENT_CNT; uint32_t CLK1_CLK3_CURRENT_CNT; uint32_t CLK1_CLK4_CURRENT_CNT; + uint32_t CLK1_CLK5_CURRENT_CNT;
uint32_t CLK0_CLK0_DFS_CNTL; uint32_t CLK0_CLK1_DFS_CNTL; uint32_t CLK0_CLK3_DFS_CNTL; uint32_t CLK0_CLK4_DFS_CNTL; + uint32_t CLK1_CLK0_BYPASS_CNTL; + uint32_t CLK1_CLK1_BYPASS_CNTL; + uint32_t CLK1_CLK2_BYPASS_CNTL; + uint32_t CLK1_CLK3_BYPASS_CNTL; + uint32_t CLK1_CLK4_BYPASS_CNTL; + uint32_t CLK1_CLK5_BYPASS_CNTL; + + uint32_t CLK1_CLK0_DS_CNTL; + uint32_t CLK1_CLK1_DS_CNTL; + uint32_t CLK1_CLK2_DS_CNTL; + uint32_t CLK1_CLK3_DS_CNTL; + uint32_t CLK1_CLK4_DS_CNTL; + uint32_t CLK1_CLK5_DS_CNTL; + + uint32_t CLK1_CLK0_ALLOW_DS; + uint32_t CLK1_CLK1_ALLOW_DS; + uint32_t CLK1_CLK2_ALLOW_DS; + uint32_t CLK1_CLK3_ALLOW_DS; + uint32_t CLK1_CLK4_ALLOW_DS; + uint32_t CLK1_CLK5_ALLOW_DS; + uint32_t CLK5_spll_field_8; + };
struct clk_mgr_shift {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: loanchen lo-an.chen@amd.com
[ Upstream commit f88192d2335b5a911fcfa09338cc00624571ec5e ]
[Why] The offset address of mmCLK5_spll_field_8 was incorrect for dcn35, which caused SSC not to be enabled.
Reviewed-by: Charlene Liu charlene.liu@amd.com Signed-off-by: Lo-An Chen lo-an.chen@amd.com Signed-off-by: Zaeem Mohamed zaeem.mohamed@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c index d8a4cdbb5495d..7d0d8852ce8d2 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c @@ -89,7 +89,7 @@ #define mmCLK1_CLK4_ALLOW_DS 0x16EA8 #define mmCLK1_CLK5_ALLOW_DS 0x16EB1
-#define mmCLK5_spll_field_8 0x1B04B +#define mmCLK5_spll_field_8 0x1B24B #define mmDENTIST_DISPCLK_CNTL 0x0124 #define regDENTIST_DISPCLK_CNTL 0x0064 #define regDENTIST_DISPCLK_CNTL_BASE_IDX 1
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cheng Jiang quic_chejiang@quicinc.com
[ Upstream commit a4c5a468c6329bde7dfd46bacff2cbf5f8a8152e ]
Different connectivity boards may be attached to the same platform. For example, QCA6698-based boards can support either a two-antenna or three-antenna solution, both of which work on the sa8775p-ride platform. Due to differences in connectivity boards and variations in RF performance from different foundries, different NVM configurations are used based on the board ID.
Therefore, if the NVM file named in the firmware-name property has an extension, that NVM file will be used directly. Otherwise, the system will first try the .bNN (board ID) file and, if that fails, fall back to the .bin file.
Possible configurations:
firmware-name = "QCA6698/hpnv21";
firmware-name = "QCA6698/hpnv21.bin";
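A minimal sketch of the resulting lookup order (the helper names are from this patch; fwname_board_id is a hypothetical buffer holding the board-ID name, e.g. "hpnv21.b8c"):

	if (qca_filename_has_extension(fwname)) {
		/* "hpnv21.bin": request exactly that file */
		ret = request_firmware(&fw, fwname, dev);
	} else {
		/* "hpnv21": try the board-ID file first... */
		ret = request_firmware(&fw, fwname_board_id, dev);
		/* ...then rewrite the name to "hpnv21.bin" and retry */
		if (ret && qca_get_alt_nvm_file(fwname_board_id, sizeof(fwname_board_id)))
			ret = request_firmware(&fw, fwname_board_id, dev);
	}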
Signed-off-by: Cheng Jiang quic_chejiang@quicinc.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Stable-dep-of: a2fad248947d ("Bluetooth: qca: Fix poor RF performance for WCN6855") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btqca.c | 113 ++++++++++++++++++++++++++++---------- 1 file changed, 85 insertions(+), 28 deletions(-)
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c index dfbbac92242a8..5cb1fd1a0c7b5 100644 --- a/drivers/bluetooth/btqca.c +++ b/drivers/bluetooth/btqca.c @@ -272,6 +272,39 @@ int qca_send_pre_shutdown_cmd(struct hci_dev *hdev) } EXPORT_SYMBOL_GPL(qca_send_pre_shutdown_cmd);
+static bool qca_filename_has_extension(const char *filename) +{ + const char *suffix = strrchr(filename, '.'); + + /* File extensions require a dot, but not as the first or last character */ + if (!suffix || suffix == filename || *(suffix + 1) == '\0') + return false; + + /* Avoid matching directories with names that look like files with extensions */ + return !strchr(suffix, '/'); +} + +static bool qca_get_alt_nvm_file(char *filename, size_t max_size) +{ + char fwname[64]; + const char *suffix; + + /* nvm file name has an extension, replace with .bin */ + if (qca_filename_has_extension(filename)) { + suffix = strrchr(filename, '.'); + strscpy(fwname, filename, suffix - filename + 1); + snprintf(fwname + (suffix - filename), + sizeof(fwname) - (suffix - filename), ".bin"); + /* If nvm file is already the default one, return false to skip the retry. */ + if (strcmp(fwname, filename) == 0) + return false; + + snprintf(filename, max_size, "%s", fwname); + return true; + } + return false; +} + static int qca_tlv_check_data(struct hci_dev *hdev, struct qca_fw_config *config, u8 *fw_data, size_t fw_size, @@ -564,6 +597,19 @@ static int qca_download_firmware(struct hci_dev *hdev, config->fwname, ret); return ret; } + } + /* If the board-specific file is missing, try loading the default + * one, unless that was attempted already. + */ + else if (config->type == TLV_TYPE_NVM && + qca_get_alt_nvm_file(config->fwname, sizeof(config->fwname))) { + bt_dev_info(hdev, "QCA Downloading %s", config->fwname); + ret = request_firmware(&fw, config->fwname, &hdev->dev); + if (ret) { + bt_dev_err(hdev, "QCA Failed to request file: %s (%d)", + config->fwname, ret); + return ret; + } } else { bt_dev_err(hdev, "QCA Failed to request file: %s (%d)", config->fwname, ret); @@ -700,34 +746,38 @@ static int qca_check_bdaddr(struct hci_dev *hdev, const struct qca_fw_config *co return 0; }
-static void qca_generate_hsp_nvm_name(char *fwname, size_t max_size, +static void qca_get_nvm_name_by_board(char *fwname, size_t max_size, + const char *stem, enum qca_btsoc_type soc_type, struct qca_btsoc_version ver, u8 rom_ver, u16 bid) { const char *variant; + const char *prefix;
- /* hsp gf chip */ - if ((le32_to_cpu(ver.soc_id) & QCA_HSP_GF_SOC_MASK) == QCA_HSP_GF_SOC_ID) - variant = "g"; - else - variant = ""; + /* Set the default values for variant and prefix */ + variant = ""; + prefix = "b";
- if (bid == 0x0) - snprintf(fwname, max_size, "qca/hpnv%02x%s.bin", rom_ver, variant); - else - snprintf(fwname, max_size, "qca/hpnv%02x%s.%x", rom_ver, variant, bid); -} + if (soc_type == QCA_QCA2066) + prefix = "";
-static inline void qca_get_nvm_name_generic(struct qca_fw_config *cfg, - const char *stem, u8 rom_ver, u16 bid) -{ - if (bid == 0x0) - snprintf(cfg->fwname, sizeof(cfg->fwname), "qca/%snv%02x.bin", stem, rom_ver); - else if (bid & 0xff00) - snprintf(cfg->fwname, sizeof(cfg->fwname), - "qca/%snv%02x.b%x", stem, rom_ver, bid); - else - snprintf(cfg->fwname, sizeof(cfg->fwname), - "qca/%snv%02x.b%02x", stem, rom_ver, bid); + if (soc_type == QCA_WCN6855 || soc_type == QCA_QCA2066) { + /* If the chip is manufactured by GlobalFoundries */ + if ((le32_to_cpu(ver.soc_id) & QCA_HSP_GF_SOC_MASK) == QCA_HSP_GF_SOC_ID) + variant = "g"; + } + + if (rom_ver != 0) { + if (bid == 0x0 || bid == 0xffff) + snprintf(fwname, max_size, "qca/%s%02x%s.bin", stem, rom_ver, variant); + else + snprintf(fwname, max_size, "qca/%s%02x%s.%s%02x", stem, rom_ver, + variant, prefix, bid); + } else { + if (bid == 0x0 || bid == 0xffff) + snprintf(fwname, max_size, "qca/%s%s.bin", stem, variant); + else + snprintf(fwname, max_size, "qca/%s%s.%s%02x", stem, variant, prefix, bid); + } }
int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate, @@ -816,8 +866,14 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate, /* Download NVM configuration */ config.type = TLV_TYPE_NVM; if (firmware_name) { - snprintf(config.fwname, sizeof(config.fwname), - "qca/%s", firmware_name); + /* The firmware name has an extension, use it directly */ + if (qca_filename_has_extension(firmware_name)) { + snprintf(config.fwname, sizeof(config.fwname), "qca/%s", firmware_name); + } else { + qca_read_fw_board_id(hdev, &boardid); + qca_get_nvm_name_by_board(config.fwname, sizeof(config.fwname), + firmware_name, soc_type, ver, 0, boardid); + } } else { switch (soc_type) { case QCA_WCN3990: @@ -836,8 +892,9 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate, "qca/apnv%02x.bin", rom_ver); break; case QCA_QCA2066: - qca_generate_hsp_nvm_name(config.fwname, - sizeof(config.fwname), ver, rom_ver, boardid); + qca_get_nvm_name_by_board(config.fwname, + sizeof(config.fwname), "hpnv", soc_type, ver, + rom_ver, boardid); break; case QCA_QCA6390: snprintf(config.fwname, sizeof(config.fwname), @@ -852,9 +909,9 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate, "qca/hpnv%02x.bin", rom_ver); break; case QCA_WCN7850: - qca_get_nvm_name_generic(&config, "hmt", rom_ver, boardid); + qca_get_nvm_name_by_board(config.fwname, sizeof(config.fwname), + "hmtnv", soc_type, ver, rom_ver, boardid); break; - default: snprintf(config.fwname, sizeof(config.fwname), "qca/nvm_%08x.bin", soc_ver);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zijun Hu quic_zijuhu@quicinc.com
[ Upstream commit a2fad248947d702ed3dcb52b8377c1a3ae201e44 ]
For WCN6855, the board ID specific NVM needs to be downloaded once the board ID is available, but currently the default NVM is always downloaded.
The wrong NVM causes poor RF performance and affects the user experience for several types of laptops with WCN6855 on the market.
Fix by downloading board ID specific NVM if board ID is available.
Fixes: 095327fede00 ("Bluetooth: hci_qca: Add support for QTI Bluetooth chip wcn6855") Cc: stable@vger.kernel.org # 6.4 Signed-off-by: Zijun Hu quic_zijuhu@quicinc.com Tested-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Johan Hovold johan+linaro@kernel.org Tested-by: Steev Klimaszewski steev@kali.org #Thinkpad X13s Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btqca.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c index 5cb1fd1a0c7b5..04d02c746ec0f 100644 --- a/drivers/bluetooth/btqca.c +++ b/drivers/bluetooth/btqca.c @@ -905,8 +905,9 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate, "qca/msnv%02x.bin", rom_ver); break; case QCA_WCN6855: - snprintf(config.fwname, sizeof(config.fwname), - "qca/hpnv%02x.bin", rom_ver); + qca_read_fw_board_id(hdev, &boardid); + qca_get_nvm_name_by_board(config.fwname, sizeof(config.fwname), + "hpnv", soc_type, ver, rom_ver, boardid); break; case QCA_WCN7850: qca_get_nvm_name_by_board(config.fwname, sizeof(config.fwname),
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Torokhov dmitry.torokhov@gmail.com
[ Upstream commit 0e45a09a1da0872786885c505467aab8fb29b5b4 ]
serio_pause_rx() and serio_continue_rx() are usually used together to temporarily stop receiving interrupts/data for a given serio port. Define a "serio_pause_rx" guard for this so that the port is always resumed once the critical section is over.
Example:
scoped_guard(serio_pause_rx, elo->serio) {
	elo->expected_packet = toupper(packet[0]);
	init_completion(&elo->cmd_done);
}
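For reference, the guard is equivalent to the following open-coded pairing, except that the guard also guarantees serio_continue_rx() on every scope exit, including early returns:

	serio_pause_rx(elo->serio);
	elo->expected_packet = toupper(packet[0]);
	init_completion(&elo->cmd_done);
	serio_continue_rx(elo->serio);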
Link: https://lore.kernel.org/r/20240905041732.2034348-2-dmitry.torokhov@gmail.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Stable-dep-of: 08bd5b7c9a24 ("Input: synaptics - fix crash when enabling pass-through port") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/serio.h | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/include/linux/serio.h b/include/linux/serio.h index bf2191f253509..69a47674af653 100644 --- a/include/linux/serio.h +++ b/include/linux/serio.h @@ -6,6 +6,7 @@ #define _SERIO_H
+#include <linux/cleanup.h> #include <linux/types.h> #include <linux/interrupt.h> #include <linux/list.h> @@ -161,4 +162,6 @@ static inline void serio_continue_rx(struct serio *serio) spin_unlock_irq(&serio->lock); }
+DEFINE_GUARD(serio_pause_rx, struct serio *, serio_pause_rx(_T), serio_continue_rx(_T)) + #endif
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Torokhov dmitry.torokhov@gmail.com
[ Upstream commit 08bd5b7c9a2401faabdaa1472d45c7de0755fd7e ]
When enabling a pass-through port an interrupt might come before the psmouse driver binds to the pass-through port. However, the synaptics sub-driver tries to access the psmouse instance presumably associated with the pass-through port to figure out whether only 1 byte of the response or the entire protocol packet needs to be forwarded to the pass-through port, and it may crash if no psmouse instance has been attached to the port yet.
Fix the crash by introducing open() and close() methods for the port and checking whether the port is open before trying to access the psmouse instance. Because psmouse calls serio_open() only after attaching the psmouse instance to the serio port, this prevents the potential crash.
Reported-by: Takashi Iwai tiwai@suse.de Fixes: 100e16959c3c ("Input: libps2 - attach ps2dev instances as serio port's drvdata") Link: https://bugzilla.suse.com/show_bug.cgi?id=1219522 Cc: stable@vger.kernel.org Reviewed-by: Takashi Iwai tiwai@suse.de Link: https://lore.kernel.org/r/Z4qSHORvPn7EU2j1@google.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/input/mouse/synaptics.c | 56 ++++++++++++++++++++++++--------- drivers/input/mouse/synaptics.h | 1 + 2 files changed, 43 insertions(+), 14 deletions(-)
diff --git a/drivers/input/mouse/synaptics.c b/drivers/input/mouse/synaptics.c index 380aa1614442f..3d1459b551bb2 100644 --- a/drivers/input/mouse/synaptics.c +++ b/drivers/input/mouse/synaptics.c @@ -667,23 +667,50 @@ static void synaptics_pt_stop(struct serio *serio) serio_continue_rx(parent->ps2dev.serio); }
+static int synaptics_pt_open(struct serio *serio) +{ + struct psmouse *parent = psmouse_from_serio(serio->parent); + struct synaptics_data *priv = parent->private; + + guard(serio_pause_rx)(parent->ps2dev.serio); + priv->pt_port_open = true; + + return 0; +} + +static void synaptics_pt_close(struct serio *serio) +{ + struct psmouse *parent = psmouse_from_serio(serio->parent); + struct synaptics_data *priv = parent->private; + + guard(serio_pause_rx)(parent->ps2dev.serio); + priv->pt_port_open = false; +} + static int synaptics_is_pt_packet(u8 *buf) { return (buf[0] & 0xFC) == 0x84 && (buf[3] & 0xCC) == 0xC4; }
-static void synaptics_pass_pt_packet(struct serio *ptport, u8 *packet) +static void synaptics_pass_pt_packet(struct synaptics_data *priv, u8 *packet) { - struct psmouse *child = psmouse_from_serio(ptport); + struct serio *ptport;
- if (child && child->state == PSMOUSE_ACTIVATED) { - serio_interrupt(ptport, packet[1], 0); - serio_interrupt(ptport, packet[4], 0); - serio_interrupt(ptport, packet[5], 0); - if (child->pktsize == 4) - serio_interrupt(ptport, packet[2], 0); - } else { - serio_interrupt(ptport, packet[1], 0); + ptport = priv->pt_port; + if (!ptport) + return; + + serio_interrupt(ptport, packet[1], 0); + + if (priv->pt_port_open) { + struct psmouse *child = psmouse_from_serio(ptport); + + if (child->state == PSMOUSE_ACTIVATED) { + serio_interrupt(ptport, packet[4], 0); + serio_interrupt(ptport, packet[5], 0); + if (child->pktsize == 4) + serio_interrupt(ptport, packet[2], 0); + } } }
@@ -722,6 +749,8 @@ static void synaptics_pt_create(struct psmouse *psmouse) serio->write = synaptics_pt_write; serio->start = synaptics_pt_start; serio->stop = synaptics_pt_stop; + serio->open = synaptics_pt_open; + serio->close = synaptics_pt_close; serio->parent = psmouse->ps2dev.serio;
psmouse->pt_activate = synaptics_pt_activate; @@ -1218,11 +1247,10 @@ static psmouse_ret_t synaptics_process_byte(struct psmouse *psmouse)
if (SYN_CAP_PASS_THROUGH(priv->info.capabilities) && synaptics_is_pt_packet(psmouse->packet)) { - if (priv->pt_port) - synaptics_pass_pt_packet(priv->pt_port, - psmouse->packet); - } else + synaptics_pass_pt_packet(priv, psmouse->packet); + } else { synaptics_process_packet(psmouse); + }
return PSMOUSE_FULL_PACKET; } diff --git a/drivers/input/mouse/synaptics.h b/drivers/input/mouse/synaptics.h index 08533d1b1b16f..4b34f13b9f761 100644 --- a/drivers/input/mouse/synaptics.h +++ b/drivers/input/mouse/synaptics.h @@ -188,6 +188,7 @@ struct synaptics_data { bool disable_gesture; /* disable gestures */
struct serio *pt_port; /* Pass-through serio port */ + bool pt_port_open;
/* * Last received Advanced Gesture Mode (AGM) packet. An AGM packet
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Claudiu Beznea claudiu.beznea.uj@bp.renesas.com
[ Upstream commit 541011dc2d7c4c82523706f726f422a5e23cc86f ]
The stop trigger invokes rz_ssi_stop() and rz_ssi_stream_quit(). - The purpose of rz_ssi_stop() is to disable TX/RX, terminate DMA transactions, and set the controller to idle. - The purpose of rz_ssi_stream_quit() is to reset the substream-specific software data by setting strm->running and strm->substream appropriately.
The function rz_ssi_is_stream_running() checks if both strm->substream and strm->running are valid and returns true if so. Its implementation is as follows:
static inline bool rz_ssi_is_stream_running(struct rz_ssi_stream *strm)
{
	return strm->substream && strm->running;
}
When the controller is configured in full-duplex mode (with both playback and capture active), the rz_ssi_stop() function does not modify the controller settings when called for the first substream in the full-duplex setup. Instead, it simply sets strm->running = 0 and returns if the companion substream is still running. The following code illustrates this:
static int rz_ssi_stop(struct rz_ssi_priv *ssi, struct rz_ssi_stream *strm)
{
	strm->running = 0;

	if (rz_ssi_is_stream_running(&ssi->playback) ||
	    rz_ssi_is_stream_running(&ssi->capture))
		return 0;

	// ...
}
The controller settings, along with the DMA termination (for the last stopped substream), are only applied when the last substream in the full-duplex setup is stopped.
While applying the controller settings only when the last substream stops is not problematic, terminating the DMA operations for only one substream causes failures when starting and stopping full-duplex operations multiple times in a loop.
To address this issue, call dmaengine_terminate_async() for both substreams involved in the full-duplex setup when the last substream in the setup is stopped.
Fixes: 4f8cd05a4305 ("ASoC: sh: rz-ssi: Add full duplex support") Cc: stable@vger.kernel.org Reviewed-by: Biju Das biju.das.jz@bp.renesas.com Signed-off-by: Claudiu Beznea claudiu.beznea.uj@bp.renesas.com Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://patch.msgid.link/20241210170953.2936724-5-claudiu.beznea.uj@bp.renes... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/sh/rz-ssi.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/sound/soc/sh/rz-ssi.c b/sound/soc/sh/rz-ssi.c index 32db2cead8a4e..1b74dc1137958 100644 --- a/sound/soc/sh/rz-ssi.c +++ b/sound/soc/sh/rz-ssi.c @@ -416,8 +416,12 @@ static int rz_ssi_stop(struct rz_ssi_priv *ssi, struct rz_ssi_stream *strm) rz_ssi_reg_mask_setl(ssi, SSICR, SSICR_TEN | SSICR_REN, 0);
/* Cancel all remaining DMA transactions */ - if (rz_ssi_is_dma_enabled(ssi)) - dmaengine_terminate_async(strm->dma_ch); + if (rz_ssi_is_dma_enabled(ssi)) { + if (ssi->playback.dma_ch) + dmaengine_terminate_async(ssi->playback.dma_ch); + if (ssi->capture.dma_ch) + dmaengine_terminate_async(ssi->capture.dma_ch); + }
rz_ssi_set_idle(ssi);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit 82a0a3e6f8c02b3236b55e784a083fa4ee07c321 ]
My static checker rule complains about this code. The concern is that if "sample_space" is negative then the "sample_space >= runtime->channels" condition will not work as intended, because "sample_space" will be type-promoted to a high unsigned int value.
strm->fifo_sample_size is SSI_FIFO_DEPTH (32). The SSIFSR_TDC_MASK is 0x3f. Without any further context, it does seem like a reasonable warning, and it can't hurt to add a check for negatives.
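A standalone illustration of the promotion hazard (sketch, not driver code): when a negative signed int is compared against an unsigned int, the usual arithmetic conversions turn the signed value into a huge unsigned one:

	int sample_space = -1;
	unsigned int channels = 2;	/* runtime->channels is unsigned */

	if (sample_space >= channels) {
		/* reached: -1 converts to UINT_MAX, so the comparison is
		 * true and the copy loop would run with bogus space */
	}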
Cc: stable@vger.kernel.org Fixes: 03e786bd4341 ("ASoC: sh: Add RZ/G2L SSIF-2 driver") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://patch.msgid.link/e07c3dc5-d885-4b04-a742-71f42243f4fd@stanley.mounta... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/sh/rz-ssi.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/sound/soc/sh/rz-ssi.c b/sound/soc/sh/rz-ssi.c index 1b74dc1137958..4f483bfa584f5 100644 --- a/sound/soc/sh/rz-ssi.c +++ b/sound/soc/sh/rz-ssi.c @@ -528,6 +528,8 @@ static int rz_ssi_pio_send(struct rz_ssi_priv *ssi, struct rz_ssi_stream *strm) sample_space = strm->fifo_sample_size; ssifsr = rz_ssi_reg_readl(ssi, SSIFSR); sample_space -= (ssifsr >> SSIFSR_TDC_SHIFT) & SSIFSR_TDC_MASK; + if (sample_space < 0) + return -EINVAL;
/* Only add full frames at a time */ while (frames_left && (sample_space >= runtime->channels)) {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Philipp Stanner pstanner@redhat.com
[ Upstream commit d9d959c36bec59f11c0eb6b5308729e3c4901b5e ]
In order to remove the deprecated function pcim_iomap_regions_request_all(), a few drivers need an interface to request all BARs a PCI device offers.
Make pcim_request_all_regions() a public interface.
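A hedged usage sketch for the now-public interface (demo_probe() and the region name are hypothetical; only pcim_request_all_regions() itself comes from this patch):

	static int demo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
	{
		int ret;

		ret = pcim_enable_device(pdev);
		if (ret)
			return ret;

		/* request every BAR; regions are released automatically via devres */
		return pcim_request_all_regions(pdev, "demo-driver");
	}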
Link: https://lore.kernel.org/r/20241030112743.104395-2-pstanner@redhat.com Signed-off-by: Philipp Stanner pstanner@redhat.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Damien Le Moal dlemoal@kernel.org Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Stable-dep-of: d555ed45a5a1 ("PCI: Restore original INTX_DISABLE bit by pcim_intx()") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/devres.c | 3 ++- include/linux/pci.h | 1 + 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c index b133967faef84..2a64da5c91fb9 100644 --- a/drivers/pci/devres.c +++ b/drivers/pci/devres.c @@ -939,7 +939,7 @@ static void pcim_release_all_regions(struct pci_dev *pdev) * desired, release individual regions with pcim_release_region() or all of * them at once with pcim_release_all_regions(). */ -static int pcim_request_all_regions(struct pci_dev *pdev, const char *name) +int pcim_request_all_regions(struct pci_dev *pdev, const char *name) { int ret; int bar; @@ -957,6 +957,7 @@ static int pcim_request_all_regions(struct pci_dev *pdev, const char *name)
return ret; } +EXPORT_SYMBOL(pcim_request_all_regions);
/** * pcim_iomap_regions_request_all - Request all BARs and iomap specified ones diff --git a/include/linux/pci.h b/include/linux/pci.h index 4e77c4230c0a1..000965a713edf 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2293,6 +2293,7 @@ static inline void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev) { } #endif
+int pcim_request_all_regions(struct pci_dev *pdev, const char *name); void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen); void __iomem *pcim_iomap_region(struct pci_dev *pdev, int bar, const char *name);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Philipp Stanner pstanner@redhat.com
[ Upstream commit f546e8033d8f3e45d49622f04ca2fde650b80f6d ]
pci_intx() is a hybrid function which sometimes performs devres operations, depending on whether pcim_enable_device() has been used to enable the pci_dev. This sometimes-managed nature of the function is problematic. Notably, it causes the function to allocate under some circumstances which makes it unusable from interrupt context.
Export pcim_intx() (which is always managed) and rename __pcim_intx() (which is never managed) to pci_intx_unmanaged() and export it as well.
Then all callers of pci_intx() can be ported to the version they need, depending whether they use pci_enable_device() or pcim_enable_device().
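The porting rule as a sketch (illustrative calls, not from a specific driver):

	int ret;

	/* driver enabled the device with pci_enable_device(): */
	pci_intx_unmanaged(pdev, 1);	/* never managed, no allocation */

	/* driver enabled the device with pcim_enable_device(): */
	ret = pcim_intx(pdev, 1);	/* always managed, restored via devres */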
Link: https://lore.kernel.org/r/20241209130632.132074-3-pstanner@redhat.com Signed-off-by: Philipp Stanner pstanner@redhat.com [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Damien Le Moal dlemoal@kernel.org Stable-dep-of: d555ed45a5a1 ("PCI: Restore original INTX_DISABLE bit by pcim_intx()") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/devres.c | 24 +++--------------------- drivers/pci/pci.c | 29 +++++++++++++++++++++++++++++ include/linux/pci.h | 2 ++ 3 files changed, 34 insertions(+), 21 deletions(-)
diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c index 2a64da5c91fb9..c3699105656a7 100644 --- a/drivers/pci/devres.c +++ b/drivers/pci/devres.c @@ -411,31 +411,12 @@ static inline bool mask_contains_bar(int mask, int bar) return mask & BIT(bar); }
-/* - * This is a copy of pci_intx() used to bypass the problem of recursive - * function calls due to the hybrid nature of pci_intx(). - */ -static void __pcim_intx(struct pci_dev *pdev, int enable) -{ - u16 pci_command, new; - - pci_read_config_word(pdev, PCI_COMMAND, &pci_command); - - if (enable) - new = pci_command & ~PCI_COMMAND_INTX_DISABLE; - else - new = pci_command | PCI_COMMAND_INTX_DISABLE; - - if (new != pci_command) - pci_write_config_word(pdev, PCI_COMMAND, new); -} - static void pcim_intx_restore(struct device *dev, void *data) { struct pci_dev *pdev = to_pci_dev(dev); struct pcim_intx_devres *res = data;
- __pcim_intx(pdev, res->orig_intx); + pci_intx_unmanaged(pdev, res->orig_intx); }
static struct pcim_intx_devres *get_or_create_intx_devres(struct device *dev) @@ -472,10 +453,11 @@ int pcim_intx(struct pci_dev *pdev, int enable) return -ENOMEM;
res->orig_intx = !enable; - __pcim_intx(pdev, enable); + pci_intx_unmanaged(pdev, enable);
return 0; } +EXPORT_SYMBOL_GPL(pcim_intx);
static void pcim_disable_device(void *pdev_raw) { diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index dd3c6dcb47ae4..3916e0b23cdaf 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4480,6 +4480,35 @@ void pci_disable_parity(struct pci_dev *dev) } }
+/** + * pci_intx_unmanaged - enables/disables PCI INTx for device dev, + * unmanaged version + * @pdev: the PCI device to operate on + * @enable: boolean: whether to enable or disable PCI INTx + * + * Enables/disables PCI INTx for device @pdev + * + * This function behaves identically to pci_intx(), but is never managed with + * devres. + */ +void pci_intx_unmanaged(struct pci_dev *pdev, int enable) +{ + u16 pci_command, new; + + pci_read_config_word(pdev, PCI_COMMAND, &pci_command); + + if (enable) + new = pci_command & ~PCI_COMMAND_INTX_DISABLE; + else + new = pci_command | PCI_COMMAND_INTX_DISABLE; + + if (new == pci_command) + return; + + pci_write_config_word(pdev, PCI_COMMAND, new); +} +EXPORT_SYMBOL_GPL(pci_intx_unmanaged); + /** * pci_intx - enables/disables PCI INTx for device dev * @pdev: the PCI device to operate on diff --git a/include/linux/pci.h b/include/linux/pci.h index 000965a713edf..6ef32a8d146b1 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1353,6 +1353,7 @@ int __must_check pcim_set_mwi(struct pci_dev *dev); int pci_try_set_mwi(struct pci_dev *dev); void pci_clear_mwi(struct pci_dev *dev); void pci_disable_parity(struct pci_dev *dev); +void pci_intx_unmanaged(struct pci_dev *pdev, int enable); void pci_intx(struct pci_dev *dev, int enable); bool pci_check_and_mask_intx(struct pci_dev *dev); bool pci_check_and_unmask_intx(struct pci_dev *dev); @@ -2293,6 +2294,7 @@ static inline void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev) { } #endif
+int pcim_intx(struct pci_dev *pdev, int enabled); int pcim_request_all_regions(struct pci_dev *pdev, const char *name); void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen); void __iomem *pcim_iomap_region(struct pci_dev *pdev, int bar,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Philipp Stanner pstanner@redhat.com
[ Upstream commit dfa2f4d5f9e5d757700cefa8ee480099889f1c69 ]
pci_intx() is a hybrid function which can sometimes be managed through devres. This hybrid nature is undesirable.
Since all users of pci_intx() have by now been ported either to always-managed pcim_intx() or never-managed pci_intx_unmanaged(), the devres functionality can be removed from pci_intx().
Consequently, pci_intx_unmanaged() is now redundant, because pci_intx() itself is now unmanaged.
Remove the devres functionality from pci_intx(). Have all users of pci_intx_unmanaged() call pci_intx(). Remove pci_intx_unmanaged().
Link: https://lore.kernel.org/r/20241209130632.132074-13-pstanner@redhat.com Signed-off-by: Philipp Stanner pstanner@redhat.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Acked-by: Paolo Abeni pabeni@redhat.com Stable-dep-of: d555ed45a5a1 ("PCI: Restore original INTX_DISABLE bit by pcim_intx()") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/devres.c | 4 ++-- drivers/pci/pci.c | 43 ++----------------------------------------- include/linux/pci.h | 1 - 3 files changed, 4 insertions(+), 44 deletions(-)
diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c index c3699105656a7..70f1a46d07c5e 100644 --- a/drivers/pci/devres.c +++ b/drivers/pci/devres.c @@ -416,7 +416,7 @@ static void pcim_intx_restore(struct device *dev, void *data) struct pci_dev *pdev = to_pci_dev(dev); struct pcim_intx_devres *res = data;
- pci_intx_unmanaged(pdev, res->orig_intx); + pci_intx(pdev, res->orig_intx); }
static struct pcim_intx_devres *get_or_create_intx_devres(struct device *dev) @@ -453,7 +453,7 @@ int pcim_intx(struct pci_dev *pdev, int enable) return -ENOMEM;
res->orig_intx = !enable; - pci_intx_unmanaged(pdev, enable); + pci_intx(pdev, enable);
return 0; } diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 3916e0b23cdaf..1aa5d6f98ebda 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4481,17 +4481,13 @@ void pci_disable_parity(struct pci_dev *dev) }
/** - * pci_intx_unmanaged - enables/disables PCI INTx for device dev, - * unmanaged version + * pci_intx - enables/disables PCI INTx for device dev * @pdev: the PCI device to operate on * @enable: boolean: whether to enable or disable PCI INTx * * Enables/disables PCI INTx for device @pdev - * - * This function behaves identically to pci_intx(), but is never managed with - * devres. */ -void pci_intx_unmanaged(struct pci_dev *pdev, int enable) +void pci_intx(struct pci_dev *pdev, int enable) { u16 pci_command, new;
@@ -4507,41 +4503,6 @@ void pci_intx_unmanaged(struct pci_dev *pdev, int enable)
pci_write_config_word(pdev, PCI_COMMAND, new); } -EXPORT_SYMBOL_GPL(pci_intx_unmanaged); - -/** - * pci_intx - enables/disables PCI INTx for device dev - * @pdev: the PCI device to operate on - * @enable: boolean: whether to enable or disable PCI INTx - * - * Enables/disables PCI INTx for device @pdev - * - * NOTE: - * This is a "hybrid" function: It's normally unmanaged, but becomes managed - * when pcim_enable_device() has been called in advance. This hybrid feature is - * DEPRECATED! If you want managed cleanup, use pcim_intx() instead. - */ -void pci_intx(struct pci_dev *pdev, int enable) -{ - u16 pci_command, new; - - pci_read_config_word(pdev, PCI_COMMAND, &pci_command); - - if (enable) - new = pci_command & ~PCI_COMMAND_INTX_DISABLE; - else - new = pci_command | PCI_COMMAND_INTX_DISABLE; - - if (new != pci_command) { - /* Preserve the "hybrid" behavior for backwards compatibility */ - if (pci_is_managed(pdev)) { - WARN_ON_ONCE(pcim_intx(pdev, enable) != 0); - return; - } - - pci_write_config_word(pdev, PCI_COMMAND, new); - } -} EXPORT_SYMBOL_GPL(pci_intx);
/** diff --git a/include/linux/pci.h b/include/linux/pci.h index 6ef32a8d146b1..74114acbb07fb 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1353,7 +1353,6 @@ int __must_check pcim_set_mwi(struct pci_dev *dev); int pci_try_set_mwi(struct pci_dev *dev); void pci_clear_mwi(struct pci_dev *dev); void pci_disable_parity(struct pci_dev *dev); -void pci_intx_unmanaged(struct pci_dev *pdev, int enable); void pci_intx(struct pci_dev *dev, int enable); bool pci_check_and_mask_intx(struct pci_dev *dev); bool pci_check_and_unmask_intx(struct pci_dev *dev);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
[ Upstream commit d555ed45a5a10a813528c7685f432369d536ae3d ]
pcim_intx() tries to restore the INTx bit at removal via devres, but there is a chance that it restores a wrong value.
Because the value to be restored is blindly assumed to be the logical negation of the enable argument, when a driver calls pcim_intx() unnecessarily for the already-enabled state, it'll restore the disabled state in turn. That is, the function assumes a case like:
// INTx == 1
pcim_intx(pdev, 0); // old INTx value assumed to be 1 -> correct
but it might be like the following, too:
// INTx == 0
pcim_intx(pdev, 0); // old INTx value assumed to be 1 -> wrong
Also, when a driver calls pcim_intx() multiple times with different enable argument values, the last one will win no matter what value it is. This can lead to inconsistency, e.g.
// INTx == 1
pcim_intx(pdev, 0); // OK
...
pcim_intx(pdev, 1); // now old INTx wrongly assumed to be 0
This patch addresses those inconsistencies by saving the original INTx state at the first pcim_intx() call. For that, get_or_create_intx_devres() is folded into its only caller, pcim_intx(); this allows us to simply check for an already allocated devres and record the original INTx along with the devres_alloc() call.
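The underlying save-once idiom, sketched with the names from this patch (save_orig_intx() records the current PCI_COMMAND INTx bit; error handling trimmed):

	res = devres_find(dev, pcim_intx_restore, NULL, NULL);
	if (!res) {			/* only on the first pcim_intx() call */
		res = devres_alloc(pcim_intx_restore, sizeof(*res), GFP_KERNEL);
		if (!res)
			return -ENOMEM;
		save_orig_intx(pdev, res);	/* capture the true original state */
		devres_add(dev, res);
	}
	/* subsequent calls leave the recorded original value untouched */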
Link: https://lore.kernel.org/r/20241031134300.10296-1-tiwai@suse.de Fixes: 25216afc9db5 ("PCI: Add managed pcim_intx()") Link: https://lore.kernel.org/87v7xk2ps5.wl-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Philipp Stanner pstanner@redhat.com Cc: stable@vger.kernel.org # v6.11+ Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/devres.c | 34 +++++++++++++++++++--------------- 1 file changed, 19 insertions(+), 15 deletions(-)
diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c index 70f1a46d07c5e..643f85849ef64 100644 --- a/drivers/pci/devres.c +++ b/drivers/pci/devres.c @@ -419,19 +419,12 @@ static void pcim_intx_restore(struct device *dev, void *data) pci_intx(pdev, res->orig_intx); }
-static struct pcim_intx_devres *get_or_create_intx_devres(struct device *dev) +static void save_orig_intx(struct pci_dev *pdev, struct pcim_intx_devres *res) { - struct pcim_intx_devres *res; - - res = devres_find(dev, pcim_intx_restore, NULL, NULL); - if (res) - return res; + u16 pci_command;
- res = devres_alloc(pcim_intx_restore, sizeof(*res), GFP_KERNEL); - if (res) - devres_add(dev, res); - - return res; + pci_read_config_word(pdev, PCI_COMMAND, &pci_command); + res->orig_intx = !(pci_command & PCI_COMMAND_INTX_DISABLE); }
/** @@ -447,12 +440,23 @@ static struct pcim_intx_devres *get_or_create_intx_devres(struct device *dev) int pcim_intx(struct pci_dev *pdev, int enable) { struct pcim_intx_devres *res; + struct device *dev = &pdev->dev;
- res = get_or_create_intx_devres(&pdev->dev); - if (!res) - return -ENOMEM; + /* + * pcim_intx() must only restore the INTx value that existed before the + * driver was loaded, i.e., before it called pcim_intx() for the + * first time. + */ + res = devres_find(dev, pcim_intx_restore, NULL, NULL); + if (!res) { + res = devres_alloc(pcim_intx_restore, sizeof(*res), GFP_KERNEL); + if (!res) + return -ENOMEM; + + save_orig_intx(pdev, res); + devres_add(dev, res); + }
- res->orig_intx = !enable; pci_intx(pdev, enable);
return 0;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fabien Parent fparent@baylibre.com
[ Upstream commit 72f3e3d68cfda508ec4b6c8927c50814229cd04e ]
The MT8183 Pumpkin board has a micro-HDMI connector. HDMI support is provided by an IT66121 DPI <-> HDMI bridge.
Enable the DPI and add the node for the IT66121 bridge.
Signed-off-by: Fabien Parent fparent@baylibre.com Co-developed-by: Pin-yen Lin treapking@chromium.org Signed-off-by: Pin-yen Lin treapking@chromium.org Reviewed-by: Nícolas F. R. A. Prado nfraprado@collabora.com Link: https://lore.kernel.org/r/20240919094212.1902073-1-treapking@chromium.org Signed-off-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Stable-dep-of: 26f6e91fa29a ("arm64: dts: mediatek: mt8183: Disable DSI display output by default") Signed-off-by: Sasha Levin sashal@kernel.org --- .../boot/dts/mediatek/mt8183-pumpkin.dts | 123 ++++++++++++++++++ 1 file changed, 123 insertions(+)
diff --git a/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts b/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts index 1aa668c3ccf92..61a6f66914b86 100644 --- a/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts +++ b/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts @@ -63,6 +63,18 @@ pulldown-ohm = <0>; io-channels = <&auxadc 0>; }; + + connector { + compatible = "hdmi-connector"; + label = "hdmi"; + type = "d"; + + port { + hdmi_connector_in: endpoint { + remote-endpoint = <&hdmi_connector_out>; + }; + }; + }; };
&auxadc { @@ -120,6 +132,43 @@ pinctrl-0 = <&i2c6_pins>; status = "okay"; clock-frequency = <100000>; + #address-cells = <1>; + #size-cells = <0>; + + it66121hdmitx: hdmitx@4c { + compatible = "ite,it66121"; + reg = <0x4c>; + pinctrl-names = "default"; + pinctrl-0 = <&ite_pins>; + reset-gpios = <&pio 160 GPIO_ACTIVE_LOW>; + interrupt-parent = <&pio>; + interrupts = <4 IRQ_TYPE_LEVEL_LOW>; + vcn33-supply = <&mt6358_vcn33_reg>; + vcn18-supply = <&mt6358_vcn18_reg>; + vrf12-supply = <&mt6358_vrf12_reg>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + + it66121_in: endpoint { + bus-width = <12>; + remote-endpoint = <&dpi_out>; + }; + }; + + port@1 { + reg = <1>; + + hdmi_connector_out: endpoint { + remote-endpoint = <&hdmi_connector_in>; + }; + }; + }; + }; };
&keyboard { @@ -362,6 +411,67 @@ input-enable; }; }; + + ite_pins: ite-pins { + pins-irq { + pinmux = <PINMUX_GPIO4__FUNC_GPIO4>; + input-enable; + bias-pull-up; + }; + + pins-rst { + pinmux = <PINMUX_GPIO160__FUNC_GPIO160>; + output-high; + }; + }; + + dpi_func_pins: dpi-func-pins { + pins-dpi { + pinmux = <PINMUX_GPIO12__FUNC_I2S5_BCK>, + <PINMUX_GPIO46__FUNC_I2S5_LRCK>, + <PINMUX_GPIO47__FUNC_I2S5_DO>, + <PINMUX_GPIO13__FUNC_DBPI_D0>, + <PINMUX_GPIO14__FUNC_DBPI_D1>, + <PINMUX_GPIO15__FUNC_DBPI_D2>, + <PINMUX_GPIO16__FUNC_DBPI_D3>, + <PINMUX_GPIO17__FUNC_DBPI_D4>, + <PINMUX_GPIO18__FUNC_DBPI_D5>, + <PINMUX_GPIO19__FUNC_DBPI_D6>, + <PINMUX_GPIO20__FUNC_DBPI_D7>, + <PINMUX_GPIO21__FUNC_DBPI_D8>, + <PINMUX_GPIO22__FUNC_DBPI_D9>, + <PINMUX_GPIO23__FUNC_DBPI_D10>, + <PINMUX_GPIO24__FUNC_DBPI_D11>, + <PINMUX_GPIO25__FUNC_DBPI_HSYNC>, + <PINMUX_GPIO26__FUNC_DBPI_VSYNC>, + <PINMUX_GPIO27__FUNC_DBPI_DE>, + <PINMUX_GPIO28__FUNC_DBPI_CK>; + }; + }; + + dpi_idle_pins: dpi-idle-pins { + pins-idle { + pinmux = <PINMUX_GPIO12__FUNC_GPIO12>, + <PINMUX_GPIO46__FUNC_GPIO46>, + <PINMUX_GPIO47__FUNC_GPIO47>, + <PINMUX_GPIO13__FUNC_GPIO13>, + <PINMUX_GPIO14__FUNC_GPIO14>, + <PINMUX_GPIO15__FUNC_GPIO15>, + <PINMUX_GPIO16__FUNC_GPIO16>, + <PINMUX_GPIO17__FUNC_GPIO17>, + <PINMUX_GPIO18__FUNC_GPIO18>, + <PINMUX_GPIO19__FUNC_GPIO19>, + <PINMUX_GPIO20__FUNC_GPIO20>, + <PINMUX_GPIO21__FUNC_GPIO21>, + <PINMUX_GPIO22__FUNC_GPIO22>, + <PINMUX_GPIO23__FUNC_GPIO23>, + <PINMUX_GPIO24__FUNC_GPIO24>, + <PINMUX_GPIO25__FUNC_GPIO25>, + <PINMUX_GPIO26__FUNC_GPIO26>, + <PINMUX_GPIO27__FUNC_GPIO27>, + <PINMUX_GPIO28__FUNC_GPIO28>; + }; + }; };
&pmic { @@ -415,3 +525,16 @@ &dsi0 { status = "disabled"; }; + +&dpi0 { + pinctrl-names = "default", "sleep"; + pinctrl-0 = <&dpi_func_pins>; + pinctrl-1 = <&dpi_idle_pins>; + status = "okay"; + + port { + dpi_out: endpoint { + remote-endpoint = <&it66121_in>; + }; + }; +};
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chen-Yu Tsai wenst@chromium.org
[ Upstream commit 26f6e91fa29a58fdc76b47f94f8f6027944a490c ]
Most SoC dtsi files have the display output interfaces disabled by default, and only enabled on boards that utilize them. The MT8183 has it backwards: the display outputs are left enabled by default, and only disabled at the board level.
Reverse the situation for the DSI output so that it follows the normal scheme. For ease of backporting, the DPI output is handled in a separate patch.
Fixes: 88ec840270e6 ("arm64: dts: mt8183: Add dsi node") Fixes: 19b6403f1e2a ("arm64: dts: mt8183: add mt8183 pumpkin board") Cc: stable@vger.kernel.org Signed-off-by: Chen-Yu Tsai wenst@chromium.org Reviewed-by: Fei Shao fshao@chromium.org Link: https://lore.kernel.org/r/20241025075630.3917458-2-wenst@chromium.org Signed-off-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts | 4 ---- arch/arm64/boot/dts/mediatek/mt8183.dtsi | 1 + 2 files changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts b/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts index 61a6f66914b86..dbdee604edab4 100644 --- a/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts +++ b/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts @@ -522,10 +522,6 @@ status = "okay"; };
-&dsi0 { - status = "disabled"; -}; - &dpi0 { pinctrl-names = "default", "sleep"; pinctrl-0 = <&dpi_func_pins>; diff --git a/arch/arm64/boot/dts/mediatek/mt8183.dtsi b/arch/arm64/boot/dts/mediatek/mt8183.dtsi index 5cb6bd3c5acbb..92c41463d10e3 100644 --- a/arch/arm64/boot/dts/mediatek/mt8183.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt8183.dtsi @@ -1835,6 +1835,7 @@ resets = <&mmsys MT8183_MMSYS_SW0_RST_B_DISP_DSI0>; phys = <&mipi_tx0>; phy-names = "dphy"; + status = "disabled"; };
dpi0: dpi@14015000 {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com
[ Upstream commit 990b1e3d150104249115a0ad81ea77c53b28f0f8 ]
Limit the FW version string, when parsing the FW binary, to 256 bytes and always NUL-terminate it.
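A minimal sketch of the bounded-copy pattern used here (struct and function names are illustrative, not from the driver): scnprintf() truncates to the destination size, always NUL-terminates, and returns 0 for an empty source, which is what lets the parser detect a missing version string.

#include <linux/kernel.h>	/* scnprintf() */
#include <linux/sizes.h>	/* SZ_256 */

#define EXAMPLE_VERSION_STR_SIZE	SZ_256

struct example_fw_info {	/* illustrative stand-in for ivpu_fw_info */
	char version[EXAMPLE_VERSION_STR_SIZE];
};

/*
 * Copy the version string embedded in the FW binary into a fixed-size
 * buffer. scnprintf() never writes past sizeof(version) and always adds
 * a terminating NUL; a zero return means the source string was empty.
 */
static bool example_copy_fw_version(struct example_fw_info *fw,
				    const u8 *blob, size_t version_offset)
{
	return scnprintf(fw->version, sizeof(fw->version), "%s",
			 blob + version_offset) != 0;
}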
Reviewed-by: Karol Wachowski karol.wachowski@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-7-jacek.... Signed-off-by: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com Stable-dep-of: 41a2d8286c90 ("accel/ivpu: Fix error handling in recovery/reset") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/accel/ivpu/ivpu_fw.c | 7 ++++--- drivers/accel/ivpu/ivpu_fw.h | 6 +++++- 2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/accel/ivpu/ivpu_fw.c b/drivers/accel/ivpu/ivpu_fw.c index ede6165e09d90..b2b6d89f06537 100644 --- a/drivers/accel/ivpu/ivpu_fw.c +++ b/drivers/accel/ivpu/ivpu_fw.c @@ -25,7 +25,6 @@ #define FW_SHAVE_NN_MAX_SIZE SZ_2M #define FW_RUNTIME_MIN_ADDR (FW_GLOBAL_MEM_START) #define FW_RUNTIME_MAX_ADDR (FW_GLOBAL_MEM_END - FW_SHARED_MEM_SIZE) -#define FW_VERSION_HEADER_SIZE SZ_4K #define FW_FILE_IMAGE_OFFSET (VPU_FW_HEADER_SIZE + FW_VERSION_HEADER_SIZE)
#define WATCHDOG_MSS_REDIRECT 32 @@ -191,8 +190,10 @@ static int ivpu_fw_parse(struct ivpu_device *vdev) ivpu_dbg(vdev, FW_BOOT, "Header version: 0x%x, format 0x%x\n", fw_hdr->header_version, fw_hdr->image_format);
- ivpu_info(vdev, "Firmware: %s, version: %s", fw->name, - (const char *)fw_hdr + VPU_FW_HEADER_SIZE); + if (!scnprintf(fw->version, sizeof(fw->version), "%s", fw->file->data + VPU_FW_HEADER_SIZE)) + ivpu_warn(vdev, "Missing firmware version\n"); + + ivpu_info(vdev, "Firmware: %s, version: %s\n", fw->name, fw->version);
if (IVPU_FW_CHECK_API_COMPAT(vdev, fw_hdr, BOOT, 3)) return -EINVAL; diff --git a/drivers/accel/ivpu/ivpu_fw.h b/drivers/accel/ivpu/ivpu_fw.h index 40d9d17be3f52..5e8eb608b70f1 100644 --- a/drivers/accel/ivpu/ivpu_fw.h +++ b/drivers/accel/ivpu/ivpu_fw.h @@ -1,11 +1,14 @@ /* SPDX-License-Identifier: GPL-2.0-only */ /* - * Copyright (C) 2020-2023 Intel Corporation + * Copyright (C) 2020-2024 Intel Corporation */
#ifndef __IVPU_FW_H__ #define __IVPU_FW_H__
+#define FW_VERSION_HEADER_SIZE SZ_4K +#define FW_VERSION_STR_SIZE SZ_256 + struct ivpu_device; struct ivpu_bo; struct vpu_boot_params; @@ -13,6 +16,7 @@ struct vpu_boot_params; struct ivpu_fw_info { const struct firmware *file; const char *name; + char version[FW_VERSION_STR_SIZE]; struct ivpu_bo *mem; struct ivpu_bo *mem_shave_nn; struct ivpu_bo *mem_log_crit;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Karol Wachowski karol.wachowski@intel.com
[ Upstream commit bade0340526827d03d9c293450c0422beba77f04 ]
Use coredump (if available) to collect FW logs in case of a FW crash. This makes dmesg more readable and allows more log data to be collected.
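Condensed from the diff below, the devcoredump flow is: format text through a drm_coredump_printer into a vmalloc'ed buffer, then hand the buffer to dev_coredumpv(), which exposes it via /sys/class/devcoredump and vfree()s it once userspace reads or discards it. A minimal sketch (buffer size and header string are arbitrary):

#include <linux/devcoredump.h>
#include <linux/sizes.h>
#include <linux/vmalloc.h>
#include <drm/drm_print.h>

static void example_dev_coredump(struct device *dev)
{
	struct drm_print_iterator pi = {};
	struct drm_printer p;
	char *buf;

	buf = vmalloc(SZ_64K);
	if (!buf)
		return;

	pi.data = buf;
	pi.remain = SZ_64K;
	p = drm_coredump_printer(&pi);

	drm_printf(&p, "Example crash dump\n");
	/* ... drm_printf() whatever FW/device state should be captured ... */

	/* Ownership of buf passes to devcoredump; it is vfree()d later. */
	dev_coredumpv(dev, buf, pi.offset, GFP_KERNEL);
}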
Signed-off-by: Karol Wachowski karol.wachowski@intel.com Reviewed-by: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com Reviewed-by: Jeffrey Hugo quic_jhugo@quicinc.com Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-8-jacek.... Signed-off-by: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com Stable-dep-of: 41a2d8286c90 ("accel/ivpu: Fix error handling in recovery/reset") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/accel/ivpu/Kconfig | 1 + drivers/accel/ivpu/Makefile | 1 + drivers/accel/ivpu/ivpu_coredump.c | 39 ++++++++++++++++++++++++++++++ drivers/accel/ivpu/ivpu_coredump.h | 25 +++++++++++++++++++ drivers/accel/ivpu/ivpu_drv.c | 5 ++-- drivers/accel/ivpu/ivpu_fw_log.h | 8 ------ drivers/accel/ivpu/ivpu_pm.c | 9 ++++--- 7 files changed, 74 insertions(+), 14 deletions(-) create mode 100644 drivers/accel/ivpu/ivpu_coredump.c create mode 100644 drivers/accel/ivpu/ivpu_coredump.h
diff --git a/drivers/accel/ivpu/Kconfig b/drivers/accel/ivpu/Kconfig index 682c532452863..e4d418b44626e 100644 --- a/drivers/accel/ivpu/Kconfig +++ b/drivers/accel/ivpu/Kconfig @@ -8,6 +8,7 @@ config DRM_ACCEL_IVPU select FW_LOADER select DRM_GEM_SHMEM_HELPER select GENERIC_ALLOCATOR + select WANT_DEV_COREDUMP help Choose this option if you have a system with an 14th generation Intel CPU (Meteor Lake) or newer. Intel NPU (formerly called Intel VPU) diff --git a/drivers/accel/ivpu/Makefile b/drivers/accel/ivpu/Makefile index ebd682a42eb12..232ea6d28c6e2 100644 --- a/drivers/accel/ivpu/Makefile +++ b/drivers/accel/ivpu/Makefile @@ -19,5 +19,6 @@ intel_vpu-y := \ ivpu_sysfs.o
intel_vpu-$(CONFIG_DEBUG_FS) += ivpu_debugfs.o +intel_vpu-$(CONFIG_DEV_COREDUMP) += ivpu_coredump.o
obj-$(CONFIG_DRM_ACCEL_IVPU) += intel_vpu.o diff --git a/drivers/accel/ivpu/ivpu_coredump.c b/drivers/accel/ivpu/ivpu_coredump.c new file mode 100644 index 0000000000000..16ad0c30818cc --- /dev/null +++ b/drivers/accel/ivpu/ivpu_coredump.c @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2020-2024 Intel Corporation + */ + +#include <linux/devcoredump.h> +#include <linux/firmware.h> + +#include "ivpu_coredump.h" +#include "ivpu_fw.h" +#include "ivpu_gem.h" +#include "vpu_boot_api.h" + +#define CRASH_DUMP_HEADER "Intel NPU crash dump" +#define CRASH_DUMP_HEADERS_SIZE SZ_4K + +void ivpu_dev_coredump(struct ivpu_device *vdev) +{ + struct drm_print_iterator pi = {}; + struct drm_printer p; + size_t coredump_size; + char *coredump; + + coredump_size = CRASH_DUMP_HEADERS_SIZE + FW_VERSION_HEADER_SIZE + + ivpu_bo_size(vdev->fw->mem_log_crit) + ivpu_bo_size(vdev->fw->mem_log_verb); + coredump = vmalloc(coredump_size); + if (!coredump) + return; + + pi.data = coredump; + pi.remain = coredump_size; + p = drm_coredump_printer(&pi); + + drm_printf(&p, "%s\n", CRASH_DUMP_HEADER); + drm_printf(&p, "FW version: %s\n", vdev->fw->version); + ivpu_fw_log_print(vdev, false, &p); + + dev_coredumpv(vdev->drm.dev, coredump, pi.offset, GFP_KERNEL); +} diff --git a/drivers/accel/ivpu/ivpu_coredump.h b/drivers/accel/ivpu/ivpu_coredump.h new file mode 100644 index 0000000000000..8efb09d024411 --- /dev/null +++ b/drivers/accel/ivpu/ivpu_coredump.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2020-2024 Intel Corporation + */ + +#ifndef __IVPU_COREDUMP_H__ +#define __IVPU_COREDUMP_H__ + +#include <drm/drm_print.h> + +#include "ivpu_drv.h" +#include "ivpu_fw_log.h" + +#ifdef CONFIG_DEV_COREDUMP +void ivpu_dev_coredump(struct ivpu_device *vdev); +#else +static inline void ivpu_dev_coredump(struct ivpu_device *vdev) +{ + struct drm_printer p = drm_info_printer(vdev->drm.dev); + + ivpu_fw_log_print(vdev, false, &p); +} +#endif + +#endif /* __IVPU_COREDUMP_H__ */ diff --git a/drivers/accel/ivpu/ivpu_drv.c b/drivers/accel/ivpu/ivpu_drv.c index c91400ecf9265..38b4158f52784 100644 --- a/drivers/accel/ivpu/ivpu_drv.c +++ b/drivers/accel/ivpu/ivpu_drv.c @@ -14,7 +14,7 @@ #include <drm/drm_ioctl.h> #include <drm/drm_prime.h>
-#include "vpu_boot_api.h" +#include "ivpu_coredump.h" #include "ivpu_debugfs.h" #include "ivpu_drv.h" #include "ivpu_fw.h" @@ -29,6 +29,7 @@ #include "ivpu_ms.h" #include "ivpu_pm.h" #include "ivpu_sysfs.h" +#include "vpu_boot_api.h"
#ifndef DRIVER_VERSION_STR #define DRIVER_VERSION_STR __stringify(DRM_IVPU_DRIVER_MAJOR) "." \ @@ -382,7 +383,7 @@ int ivpu_boot(struct ivpu_device *vdev) ivpu_err(vdev, "Failed to boot the firmware: %d\n", ret); ivpu_hw_diagnose_failure(vdev); ivpu_mmu_evtq_dump(vdev); - ivpu_fw_log_dump(vdev); + ivpu_dev_coredump(vdev); return ret; }
diff --git a/drivers/accel/ivpu/ivpu_fw_log.h b/drivers/accel/ivpu/ivpu_fw_log.h index 0b2573f6f3151..4b390a99699d6 100644 --- a/drivers/accel/ivpu/ivpu_fw_log.h +++ b/drivers/accel/ivpu/ivpu_fw_log.h @@ -8,8 +8,6 @@
#include <linux/types.h>
-#include <drm/drm_print.h> - #include "ivpu_drv.h"
#define IVPU_FW_LOG_DEFAULT 0 @@ -28,11 +26,5 @@ extern unsigned int ivpu_log_level; void ivpu_fw_log_print(struct ivpu_device *vdev, bool only_new_msgs, struct drm_printer *p); void ivpu_fw_log_clear(struct ivpu_device *vdev);
-static inline void ivpu_fw_log_dump(struct ivpu_device *vdev) -{ - struct drm_printer p = drm_info_printer(vdev->drm.dev); - - ivpu_fw_log_print(vdev, false, &p); -}
#endif /* __IVPU_FW_LOG_H__ */ diff --git a/drivers/accel/ivpu/ivpu_pm.c b/drivers/accel/ivpu/ivpu_pm.c index ef9a4ba18cb8a..0110f5ee7d069 100644 --- a/drivers/accel/ivpu/ivpu_pm.c +++ b/drivers/accel/ivpu/ivpu_pm.c @@ -9,17 +9,18 @@ #include <linux/pm_runtime.h> #include <linux/reboot.h>
-#include "vpu_boot_api.h" +#include "ivpu_coredump.h" #include "ivpu_drv.h" -#include "ivpu_hw.h" #include "ivpu_fw.h" #include "ivpu_fw_log.h" +#include "ivpu_hw.h" #include "ivpu_ipc.h" #include "ivpu_job.h" #include "ivpu_jsm_msg.h" #include "ivpu_mmu.h" #include "ivpu_ms.h" #include "ivpu_pm.h" +#include "vpu_boot_api.h"
static bool ivpu_disable_recovery; module_param_named_unsafe(disable_recovery, ivpu_disable_recovery, bool, 0644); @@ -123,7 +124,7 @@ static void ivpu_pm_recovery_work(struct work_struct *work) if (ret) ivpu_err(vdev, "Failed to resume NPU: %d\n", ret);
- ivpu_fw_log_dump(vdev); + ivpu_dev_coredump(vdev);
atomic_inc(&vdev->pm->reset_counter); atomic_set(&vdev->pm->reset_pending, 1); @@ -262,7 +263,7 @@ int ivpu_pm_runtime_suspend_cb(struct device *dev) if (!is_idle || ret_d0i3) { ivpu_err(vdev, "Forcing cold boot due to previous errors\n"); atomic_inc(&vdev->pm->reset_counter); - ivpu_fw_log_dump(vdev); + ivpu_dev_coredump(vdev); ivpu_pm_prepare_cold_boot(vdev); } else { ivpu_pm_prepare_warm_boot(vdev);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tomasz Rusinowicz tomasz.rusinowicz@intel.com
[ Upstream commit 5e162f872d7af8f041b143536617ab2563ea7de5 ]
Send a JSM state dump message at the beginning of the TDR handler. This allows the FW to collect debug info in the FW log before the state of the NPU is lost, making it possible to analyze the cause of the TDR.
Wait a predefined timeout (10 ms) so the FW has a chance to write its debug logs. We cannot wait for a JSM response at this point because IRQs are already disabled before the TDR handler is invoked.
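A minimal sketch of this fire-and-sleep pattern (the send hook is hypothetical): since the response interrupt can never be delivered once IRQs are masked, the sender posts the request and sleeps for a fixed grace period instead of blocking on a completion that would never fire.

#include <linux/delay.h>	/* msleep() */

int example_post_message(void *msg);	/* hypothetical IPC send */

/*
 * Send a message and give the receiver a fixed window to act on it.
 * Used when the response path is unavailable (IRQs already disabled),
 * where waiting on a completion would block forever.
 */
static int example_send_and_wait(void *msg, unsigned int timeout_ms)
{
	int ret = example_post_message(msg);

	if (ret)
		return ret;

	msleep(timeout_ms);	/* grace period, not a response wait */
	return 0;
}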
Signed-off-by: Tomasz Rusinowicz tomasz.rusinowicz@intel.com Reviewed-by: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-9-jacek.... Signed-off-by: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com Stable-dep-of: 41a2d8286c90 ("accel/ivpu: Fix error handling in recovery/reset") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/accel/ivpu/ivpu_drv.h | 1 + drivers/accel/ivpu/ivpu_hw.c | 3 +++ drivers/accel/ivpu/ivpu_ipc.c | 26 ++++++++++++++++++++++++++ drivers/accel/ivpu/ivpu_ipc.h | 2 ++ drivers/accel/ivpu/ivpu_jsm_msg.c | 8 ++++++++ drivers/accel/ivpu/ivpu_jsm_msg.h | 2 ++ drivers/accel/ivpu/ivpu_pm.c | 1 + 7 files changed, 43 insertions(+)
diff --git a/drivers/accel/ivpu/ivpu_drv.h b/drivers/accel/ivpu/ivpu_drv.h index 63f13b697eed7..2b30cc2e9272e 100644 --- a/drivers/accel/ivpu/ivpu_drv.h +++ b/drivers/accel/ivpu/ivpu_drv.h @@ -152,6 +152,7 @@ struct ivpu_device { int tdr; int autosuspend; int d0i3_entry_msg; + int state_dump_msg; } timeout; };
diff --git a/drivers/accel/ivpu/ivpu_hw.c b/drivers/accel/ivpu/ivpu_hw.c index e69c0613513f1..08b3cef58fd2d 100644 --- a/drivers/accel/ivpu/ivpu_hw.c +++ b/drivers/accel/ivpu/ivpu_hw.c @@ -89,12 +89,14 @@ static void timeouts_init(struct ivpu_device *vdev) vdev->timeout.tdr = 2000000; vdev->timeout.autosuspend = -1; vdev->timeout.d0i3_entry_msg = 500; + vdev->timeout.state_dump_msg = 10; } else if (ivpu_is_simics(vdev)) { vdev->timeout.boot = 50; vdev->timeout.jsm = 500; vdev->timeout.tdr = 10000; vdev->timeout.autosuspend = -1; vdev->timeout.d0i3_entry_msg = 100; + vdev->timeout.state_dump_msg = 10; } else { vdev->timeout.boot = 1000; vdev->timeout.jsm = 500; @@ -104,6 +106,7 @@ static void timeouts_init(struct ivpu_device *vdev) else vdev->timeout.autosuspend = 100; vdev->timeout.d0i3_entry_msg = 5; + vdev->timeout.state_dump_msg = 10; } }
diff --git a/drivers/accel/ivpu/ivpu_ipc.c b/drivers/accel/ivpu/ivpu_ipc.c index 29b723039a345..13c8a12162e89 100644 --- a/drivers/accel/ivpu/ivpu_ipc.c +++ b/drivers/accel/ivpu/ivpu_ipc.c @@ -353,6 +353,32 @@ int ivpu_ipc_send_receive(struct ivpu_device *vdev, struct vpu_jsm_msg *req, return ret; }
+int ivpu_ipc_send_and_wait(struct ivpu_device *vdev, struct vpu_jsm_msg *req, + u32 channel, unsigned long timeout_ms) +{ + struct ivpu_ipc_consumer cons; + int ret; + + ret = ivpu_rpm_get(vdev); + if (ret < 0) + return ret; + + ivpu_ipc_consumer_add(vdev, &cons, channel, NULL); + + ret = ivpu_ipc_send(vdev, &cons, req); + if (ret) { + ivpu_warn_ratelimited(vdev, "IPC send failed: %d\n", ret); + goto consumer_del; + } + + msleep(timeout_ms); + +consumer_del: + ivpu_ipc_consumer_del(vdev, &cons); + ivpu_rpm_put(vdev); + return ret; +} + static bool ivpu_ipc_match_consumer(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons, struct ivpu_ipc_hdr *ipc_hdr, struct vpu_jsm_msg *jsm_msg) diff --git a/drivers/accel/ivpu/ivpu_ipc.h b/drivers/accel/ivpu/ivpu_ipc.h index fb4de7fb8210e..b4dfb504679ba 100644 --- a/drivers/accel/ivpu/ivpu_ipc.h +++ b/drivers/accel/ivpu/ivpu_ipc.h @@ -107,5 +107,7 @@ int ivpu_ipc_send_receive_internal(struct ivpu_device *vdev, struct vpu_jsm_msg int ivpu_ipc_send_receive(struct ivpu_device *vdev, struct vpu_jsm_msg *req, enum vpu_ipc_msg_type expected_resp, struct vpu_jsm_msg *resp, u32 channel, unsigned long timeout_ms); +int ivpu_ipc_send_and_wait(struct ivpu_device *vdev, struct vpu_jsm_msg *req, + u32 channel, unsigned long timeout_ms);
#endif /* __IVPU_IPC_H__ */ diff --git a/drivers/accel/ivpu/ivpu_jsm_msg.c b/drivers/accel/ivpu/ivpu_jsm_msg.c index 88105963c1b28..f7618b605f021 100644 --- a/drivers/accel/ivpu/ivpu_jsm_msg.c +++ b/drivers/accel/ivpu/ivpu_jsm_msg.c @@ -555,3 +555,11 @@ int ivpu_jsm_dct_disable(struct ivpu_device *vdev) return ivpu_ipc_send_receive_internal(vdev, &req, VPU_JSM_MSG_DCT_DISABLE_DONE, &resp, VPU_IPC_CHAN_ASYNC_CMD, vdev->timeout.jsm); } + +int ivpu_jsm_state_dump(struct ivpu_device *vdev) +{ + struct vpu_jsm_msg req = { .type = VPU_JSM_MSG_STATE_DUMP }; + + return ivpu_ipc_send_and_wait(vdev, &req, VPU_IPC_CHAN_ASYNC_CMD, + vdev->timeout.state_dump_msg); +} diff --git a/drivers/accel/ivpu/ivpu_jsm_msg.h b/drivers/accel/ivpu/ivpu_jsm_msg.h index e4e42c0ff6e65..9e84d3526a146 100644 --- a/drivers/accel/ivpu/ivpu_jsm_msg.h +++ b/drivers/accel/ivpu/ivpu_jsm_msg.h @@ -43,4 +43,6 @@ int ivpu_jsm_metric_streamer_info(struct ivpu_device *vdev, u64 metric_group_mas u64 buffer_size, u32 *sample_size, u64 *info_size); int ivpu_jsm_dct_enable(struct ivpu_device *vdev, u32 active_us, u32 inactive_us); int ivpu_jsm_dct_disable(struct ivpu_device *vdev); +int ivpu_jsm_state_dump(struct ivpu_device *vdev); + #endif diff --git a/drivers/accel/ivpu/ivpu_pm.c b/drivers/accel/ivpu/ivpu_pm.c index 0110f5ee7d069..848d7468d48ce 100644 --- a/drivers/accel/ivpu/ivpu_pm.c +++ b/drivers/accel/ivpu/ivpu_pm.c @@ -124,6 +124,7 @@ static void ivpu_pm_recovery_work(struct work_struct *work) if (ret) ivpu_err(vdev, "Failed to resume NPU: %d\n", ret);
+ ivpu_jsm_state_dump(vdev); ivpu_dev_coredump(vdev);
atomic_inc(&vdev->pm->reset_counter);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com
[ Upstream commit 41a2d8286c905614f29007f1bc8e652d54654b82 ]
Disable runtime PM for the duration of reset/recovery so that the correct runtime PM state can be set depending on the outcome of ivpu_resume(). Don't suspend or reset the HW if the NPU is suspended when the reset/recovery is requested. Also, move common reset/recovery code into separate functions for better code readability.
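A minimal sketch of the disable/enable bracketing described above (the resume helper is a placeholder for ivpu_resume()): with runtime PM disabled across the reset, the final PM state can be set explicitly from the resume result rather than being assumed by the PM core mid-reset.

#include <linux/pm_runtime.h>

int example_resume(struct device *dev);	/* placeholder for ivpu_resume() */

static void example_reset_with_pm_bracket(struct device *dev)
{
	pm_runtime_disable(dev);

	/* ... hardware reset / recovery steps go here ... */

	if (example_resume(dev))
		pm_runtime_set_suspended(dev);	/* resume failed: stay suspended */
	else
		pm_runtime_set_active(dev);	/* resume succeeded: mark active */

	pm_runtime_mark_last_busy(dev);
	pm_runtime_enable(dev);
}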
Fixes: 27d19268cf39 ("accel/ivpu: Improve recovery and reset support") Cc: stable@vger.kernel.org # v6.8+ Reviewed-by: Maciej Falkowski maciej.falkowski@linux.intel.com Reviewed-by: Jeffrey Hugo quic_jhugo@quicinc.com Signed-off-by: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20250129124009.1039982-4-jacek... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/accel/ivpu/ivpu_pm.c | 79 ++++++++++++++++++++---------------- 1 file changed, 43 insertions(+), 36 deletions(-)
diff --git a/drivers/accel/ivpu/ivpu_pm.c b/drivers/accel/ivpu/ivpu_pm.c index 848d7468d48ce..fbb61a2c3b19c 100644 --- a/drivers/accel/ivpu/ivpu_pm.c +++ b/drivers/accel/ivpu/ivpu_pm.c @@ -111,41 +111,57 @@ static int ivpu_resume(struct ivpu_device *vdev) return ret; }
-static void ivpu_pm_recovery_work(struct work_struct *work) +static void ivpu_pm_reset_begin(struct ivpu_device *vdev) { - struct ivpu_pm_info *pm = container_of(work, struct ivpu_pm_info, recovery_work); - struct ivpu_device *vdev = pm->vdev; - char *evt[2] = {"IVPU_PM_EVENT=IVPU_RECOVER", NULL}; - int ret; - - ivpu_err(vdev, "Recovering the NPU (reset #%d)\n", atomic_read(&vdev->pm->reset_counter)); - - ret = pm_runtime_resume_and_get(vdev->drm.dev); - if (ret) - ivpu_err(vdev, "Failed to resume NPU: %d\n", ret); - - ivpu_jsm_state_dump(vdev); - ivpu_dev_coredump(vdev); + pm_runtime_disable(vdev->drm.dev);
atomic_inc(&vdev->pm->reset_counter); atomic_set(&vdev->pm->reset_pending, 1); down_write(&vdev->pm->reset_lock); +} + +static void ivpu_pm_reset_complete(struct ivpu_device *vdev) +{ + int ret;
- ivpu_suspend(vdev); ivpu_pm_prepare_cold_boot(vdev); ivpu_jobs_abort_all(vdev); ivpu_ms_cleanup_all(vdev);
ret = ivpu_resume(vdev); - if (ret) + if (ret) { ivpu_err(vdev, "Failed to resume NPU: %d\n", ret); + pm_runtime_set_suspended(vdev->drm.dev); + } else { + pm_runtime_set_active(vdev->drm.dev); + }
up_write(&vdev->pm->reset_lock); atomic_set(&vdev->pm->reset_pending, 0);
- kobject_uevent_env(&vdev->drm.dev->kobj, KOBJ_CHANGE, evt); pm_runtime_mark_last_busy(vdev->drm.dev); - pm_runtime_put_autosuspend(vdev->drm.dev); + pm_runtime_enable(vdev->drm.dev); +} + +static void ivpu_pm_recovery_work(struct work_struct *work) +{ + struct ivpu_pm_info *pm = container_of(work, struct ivpu_pm_info, recovery_work); + struct ivpu_device *vdev = pm->vdev; + char *evt[2] = {"IVPU_PM_EVENT=IVPU_RECOVER", NULL}; + + ivpu_err(vdev, "Recovering the NPU (reset #%d)\n", atomic_read(&vdev->pm->reset_counter)); + + ivpu_pm_reset_begin(vdev); + + if (!pm_runtime_status_suspended(vdev->drm.dev)) { + ivpu_jsm_state_dump(vdev); + ivpu_dev_coredump(vdev); + ivpu_suspend(vdev); + } + + ivpu_pm_reset_complete(vdev); + + kobject_uevent_env(&vdev->drm.dev->kobj, KOBJ_CHANGE, evt); }
void ivpu_pm_trigger_recovery(struct ivpu_device *vdev, const char *reason) @@ -316,16 +332,13 @@ void ivpu_pm_reset_prepare_cb(struct pci_dev *pdev) struct ivpu_device *vdev = pci_get_drvdata(pdev);
ivpu_dbg(vdev, PM, "Pre-reset..\n"); - atomic_inc(&vdev->pm->reset_counter); - atomic_set(&vdev->pm->reset_pending, 1);
- pm_runtime_get_sync(vdev->drm.dev); - down_write(&vdev->pm->reset_lock); - ivpu_prepare_for_reset(vdev); - ivpu_hw_reset(vdev); - ivpu_pm_prepare_cold_boot(vdev); - ivpu_jobs_abort_all(vdev); - ivpu_ms_cleanup_all(vdev); + ivpu_pm_reset_begin(vdev); + + if (!pm_runtime_status_suspended(vdev->drm.dev)) { + ivpu_prepare_for_reset(vdev); + ivpu_hw_reset(vdev); + }
ivpu_dbg(vdev, PM, "Pre-reset done.\n"); } @@ -333,18 +346,12 @@ void ivpu_pm_reset_prepare_cb(struct pci_dev *pdev) void ivpu_pm_reset_done_cb(struct pci_dev *pdev) { struct ivpu_device *vdev = pci_get_drvdata(pdev); - int ret;
ivpu_dbg(vdev, PM, "Post-reset..\n"); - ret = ivpu_resume(vdev); - if (ret) - ivpu_err(vdev, "Failed to set RESUME state: %d\n", ret); - up_write(&vdev->pm->reset_lock); - atomic_set(&vdev->pm->reset_pending, 0); - ivpu_dbg(vdev, PM, "Post-reset done.\n");
- pm_runtime_mark_last_busy(vdev->drm.dev); - pm_runtime_put_autosuspend(vdev->drm.dev); + ivpu_pm_reset_complete(vdev); + + ivpu_dbg(vdev, PM, "Post-reset done.\n"); }
void ivpu_pm_init(struct ivpu_device *vdev)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jay Cornwall jay.cornwall@amd.com
[ Upstream commit 62498e797aeb2bfa92a823ee1a8253f96d1cbe3f ]
gfx12 derivatives will have substantially different trap handler implementations from gfx10/gfx11. Add a separate source file for gfx12+ and remove unneeded conditional code.
No functional change.
v2: Revert copyright date to 2018, minor comment fixes
Signed-off-by: Jay Cornwall jay.cornwall@amd.com Reviewed-by: Lancelot Six lancelot.six@amd.com Cc: Jonathan Kim jonathan.kim@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Stable-dep-of: d584198a6fe4 ("drm/amdkfd: Ensure consistent barrier state saved in gfx12 trap handler") Signed-off-by: Sasha Levin sashal@kernel.org --- .../amd/amdkfd/cwsr_trap_handler_gfx10.asm | 202 +-- .../amd/amdkfd/cwsr_trap_handler_gfx12.asm | 1126 +++++++++++++++++ 2 files changed, 1127 insertions(+), 201 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx12.asm
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm index 44772eec9ef4d..96fbb16ceb216 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm @@ -34,41 +34,24 @@ * cpp -DASIC_FAMILY=CHIP_PLUM_BONITO cwsr_trap_handler_gfx10.asm -P -o gfx11.sp3 * sp3 gfx11.sp3 -hex gfx11.hex * - * gfx12: - * cpp -DASIC_FAMILY=CHIP_GFX12 cwsr_trap_handler_gfx10.asm -P -o gfx12.sp3 - * sp3 gfx12.sp3 -hex gfx12.hex */
#define CHIP_NAVI10 26 #define CHIP_SIENNA_CICHLID 30 #define CHIP_PLUM_BONITO 36 -#define CHIP_GFX12 37
#define NO_SQC_STORE (ASIC_FAMILY >= CHIP_SIENNA_CICHLID) #define HAVE_XNACK (ASIC_FAMILY < CHIP_SIENNA_CICHLID) #define HAVE_SENDMSG_RTN (ASIC_FAMILY >= CHIP_PLUM_BONITO) #define HAVE_BUFFER_LDS_LOAD (ASIC_FAMILY < CHIP_PLUM_BONITO) -#define SW_SA_TRAP (ASIC_FAMILY >= CHIP_PLUM_BONITO && ASIC_FAMILY < CHIP_GFX12) +#define SW_SA_TRAP (ASIC_FAMILY == CHIP_PLUM_BONITO) #define SAVE_AFTER_XNACK_ERROR (HAVE_XNACK && !NO_SQC_STORE) // workaround for TCP store failure after XNACK error when ALLOW_REPLAY=0, for debugger #define SINGLE_STEP_MISSED_WORKAROUND 1 //workaround for lost MODE.DEBUG_EN exception when SAVECTX raised
-#if ASIC_FAMILY < CHIP_GFX12 #define S_COHERENCE glc:1 #define V_COHERENCE slc:1 glc:1 #define S_WAITCNT_0 s_waitcnt 0 -#else -#define S_COHERENCE scope:SCOPE_SYS -#define V_COHERENCE scope:SCOPE_SYS -#define S_WAITCNT_0 s_wait_idle - -#define HW_REG_SHADER_FLAT_SCRATCH_LO HW_REG_WAVE_SCRATCH_BASE_LO -#define HW_REG_SHADER_FLAT_SCRATCH_HI HW_REG_WAVE_SCRATCH_BASE_HI -#define HW_REG_GPR_ALLOC HW_REG_WAVE_GPR_ALLOC -#define HW_REG_LDS_ALLOC HW_REG_WAVE_LDS_ALLOC -#define HW_REG_MODE HW_REG_WAVE_MODE -#endif
-#if ASIC_FAMILY < CHIP_GFX12 var SQ_WAVE_STATUS_SPI_PRIO_MASK = 0x00000006 var SQ_WAVE_STATUS_HALT_MASK = 0x2000 var SQ_WAVE_STATUS_ECC_ERR_MASK = 0x20000 @@ -81,21 +64,6 @@ var S_STATUS_ALWAYS_CLEAR_MASK = SQ_WAVE_STATUS_SPI_PRIO_MASK|SQ_WAVE_STATUS_E var S_STATUS_HALT_MASK = SQ_WAVE_STATUS_HALT_MASK var S_SAVE_PC_HI_TRAP_ID_MASK = 0x00FF0000 var S_SAVE_PC_HI_HT_MASK = 0x01000000 -#else -var SQ_WAVE_STATE_PRIV_BARRIER_COMPLETE_MASK = 0x4 -var SQ_WAVE_STATE_PRIV_SCC_SHIFT = 9 -var SQ_WAVE_STATE_PRIV_SYS_PRIO_MASK = 0xC00 -var SQ_WAVE_STATE_PRIV_HALT_MASK = 0x4000 -var SQ_WAVE_STATE_PRIV_POISON_ERR_MASK = 0x8000 -var SQ_WAVE_STATE_PRIV_POISON_ERR_SHIFT = 15 -var SQ_WAVE_STATUS_WAVE64_SHIFT = 29 -var SQ_WAVE_STATUS_WAVE64_SIZE = 1 -var SQ_WAVE_LDS_ALLOC_GRANULARITY = 9 -var S_STATUS_HWREG = HW_REG_WAVE_STATE_PRIV -var S_STATUS_ALWAYS_CLEAR_MASK = SQ_WAVE_STATE_PRIV_SYS_PRIO_MASK|SQ_WAVE_STATE_PRIV_POISON_ERR_MASK -var S_STATUS_HALT_MASK = SQ_WAVE_STATE_PRIV_HALT_MASK -var S_SAVE_PC_HI_TRAP_ID_MASK = 0xF0000000 -#endif
var SQ_WAVE_STATUS_NO_VGPRS_SHIFT = 24 var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SHIFT = 12 @@ -110,7 +78,6 @@ var SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SHIFT = 8 var SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SHIFT = 12 #endif
-#if ASIC_FAMILY < CHIP_GFX12 var SQ_WAVE_TRAPSTS_SAVECTX_MASK = 0x400 var SQ_WAVE_TRAPSTS_EXCP_MASK = 0x1FF var SQ_WAVE_TRAPSTS_SAVECTX_SHIFT = 10 @@ -161,39 +128,6 @@ var S_TRAPSTS_RESTORE_PART_3_SIZE = 32 - S_TRAPSTS_RESTORE_PART_3_SHIFT var S_TRAPSTS_HWREG = HW_REG_TRAPSTS var S_TRAPSTS_SAVE_CONTEXT_MASK = SQ_WAVE_TRAPSTS_SAVECTX_MASK var S_TRAPSTS_SAVE_CONTEXT_SHIFT = SQ_WAVE_TRAPSTS_SAVECTX_SHIFT -#else -var SQ_WAVE_EXCP_FLAG_PRIV_ADDR_WATCH_MASK = 0xF -var SQ_WAVE_EXCP_FLAG_PRIV_MEM_VIOL_MASK = 0x10 -var SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_SHIFT = 5 -var SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_MASK = 0x20 -var SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_MASK = 0x40 -var SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_SHIFT = 6 -var SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_MASK = 0x80 -var SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_SHIFT = 7 -var SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_MASK = 0x100 -var SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_SHIFT = 8 -var SQ_WAVE_EXCP_FLAG_PRIV_WAVE_END_MASK = 0x200 -var SQ_WAVE_EXCP_FLAG_PRIV_TRAP_AFTER_INST_MASK = 0x800 -var SQ_WAVE_TRAP_CTRL_ADDR_WATCH_MASK = 0x80 -var SQ_WAVE_TRAP_CTRL_TRAP_AFTER_INST_MASK = 0x200 - -var S_TRAPSTS_HWREG = HW_REG_WAVE_EXCP_FLAG_PRIV -var S_TRAPSTS_SAVE_CONTEXT_MASK = SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_MASK -var S_TRAPSTS_SAVE_CONTEXT_SHIFT = SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_SHIFT -var S_TRAPSTS_NON_MASKABLE_EXCP_MASK = SQ_WAVE_EXCP_FLAG_PRIV_MEM_VIOL_MASK |\ - SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_MASK |\ - SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_MASK |\ - SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_MASK |\ - SQ_WAVE_EXCP_FLAG_PRIV_WAVE_END_MASK |\ - SQ_WAVE_EXCP_FLAG_PRIV_TRAP_AFTER_INST_MASK -var S_TRAPSTS_RESTORE_PART_1_SIZE = SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_SHIFT -var S_TRAPSTS_RESTORE_PART_2_SHIFT = SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_SHIFT -var S_TRAPSTS_RESTORE_PART_2_SIZE = SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_SHIFT - SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_SHIFT -var S_TRAPSTS_RESTORE_PART_3_SHIFT = SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_SHIFT -var S_TRAPSTS_RESTORE_PART_3_SIZE = 32 - S_TRAPSTS_RESTORE_PART_3_SHIFT -var BARRIER_STATE_SIGNAL_OFFSET = 16 -var BARRIER_STATE_VALID_OFFSET = 0 -#endif
// bits [31:24] unused by SPI debug data var TTMP11_SAVE_REPLAY_W64H_SHIFT = 31 @@ -305,11 +239,7 @@ L_TRAP_NO_BARRIER:
L_HALTED: // Host trap may occur while wave is halted. -#if ASIC_FAMILY < CHIP_GFX12 s_and_b32 ttmp2, s_save_pc_hi, S_SAVE_PC_HI_TRAP_ID_MASK -#else - s_and_b32 ttmp2, s_save_trapsts, SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_MASK -#endif s_cbranch_scc1 L_FETCH_2ND_TRAP
L_CHECK_SAVE: @@ -336,7 +266,6 @@ L_NOT_HALTED: // Check for maskable exceptions in trapsts.excp and trapsts.excp_hi. // Maskable exceptions only cause the wave to enter the trap handler if // their respective bit in mode.excp_en is set. -#if ASIC_FAMILY < CHIP_GFX12 s_and_b32 ttmp2, s_save_trapsts, SQ_WAVE_TRAPSTS_EXCP_MASK|SQ_WAVE_TRAPSTS_EXCP_HI_MASK s_cbranch_scc0 L_CHECK_TRAP_ID
@@ -349,17 +278,6 @@ L_NOT_ADDR_WATCH: s_lshl_b32 ttmp2, ttmp2, SQ_WAVE_MODE_EXCP_EN_SHIFT s_and_b32 ttmp2, ttmp2, ttmp3 s_cbranch_scc1 L_FETCH_2ND_TRAP -#else - s_getreg_b32 ttmp2, hwreg(HW_REG_WAVE_EXCP_FLAG_USER) - s_and_b32 ttmp3, s_save_trapsts, SQ_WAVE_EXCP_FLAG_PRIV_ADDR_WATCH_MASK - s_cbranch_scc0 L_NOT_ADDR_WATCH - s_or_b32 ttmp2, ttmp2, SQ_WAVE_TRAP_CTRL_ADDR_WATCH_MASK - -L_NOT_ADDR_WATCH: - s_getreg_b32 ttmp3, hwreg(HW_REG_WAVE_TRAP_CTRL) - s_and_b32 ttmp2, ttmp3, ttmp2 - s_cbranch_scc1 L_FETCH_2ND_TRAP -#endif
L_CHECK_TRAP_ID: // Check trap_id != 0 @@ -369,13 +287,8 @@ L_CHECK_TRAP_ID: #if SINGLE_STEP_MISSED_WORKAROUND // Prioritize single step exception over context save. // Second-level trap will halt wave and RFE, re-entering for SAVECTX. -#if ASIC_FAMILY < CHIP_GFX12 s_getreg_b32 ttmp2, hwreg(HW_REG_MODE) s_and_b32 ttmp2, ttmp2, SQ_WAVE_MODE_DEBUG_EN_MASK -#else - // WAVE_TRAP_CTRL is already in ttmp3. - s_and_b32 ttmp3, ttmp3, SQ_WAVE_TRAP_CTRL_TRAP_AFTER_INST_MASK -#endif s_cbranch_scc1 L_FETCH_2ND_TRAP #endif
@@ -425,12 +338,7 @@ L_NO_NEXT_TRAP: s_cbranch_scc1 L_TRAP_CASE
// Host trap will not cause trap re-entry. -#if ASIC_FAMILY < CHIP_GFX12 s_and_b32 ttmp2, s_save_pc_hi, S_SAVE_PC_HI_HT_MASK -#else - s_getreg_b32 ttmp2, hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV) - s_and_b32 ttmp2, ttmp2, SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_MASK -#endif s_cbranch_scc1 L_EXIT_TRAP s_or_b32 s_save_status, s_save_status, S_STATUS_HALT_MASK
@@ -457,16 +365,7 @@ L_EXIT_TRAP: s_and_b64 exec, exec, exec // Restore STATUS.EXECZ, not writable by s_setreg_b32 s_and_b64 vcc, vcc, vcc // Restore STATUS.VCCZ, not writable by s_setreg_b32
-#if ASIC_FAMILY < CHIP_GFX12 s_setreg_b32 hwreg(S_STATUS_HWREG), s_save_status -#else - // STATE_PRIV.BARRIER_COMPLETE may have changed since we read it. - // Only restore fields which the trap handler changes. - s_lshr_b32 s_save_status, s_save_status, SQ_WAVE_STATE_PRIV_SCC_SHIFT - s_setreg_b32 hwreg(S_STATUS_HWREG, SQ_WAVE_STATE_PRIV_SCC_SHIFT, \ - SQ_WAVE_STATE_PRIV_POISON_ERR_SHIFT - SQ_WAVE_STATE_PRIV_SCC_SHIFT + 1), s_save_status -#endif - s_rfe_b64 [ttmp0, ttmp1]
L_SAVE: @@ -478,14 +377,6 @@ L_SAVE: s_endpgm L_HAVE_VGPRS: #endif -#if ASIC_FAMILY >= CHIP_GFX12 - s_getreg_b32 s_save_tmp, hwreg(HW_REG_WAVE_STATUS) - s_bitcmp1_b32 s_save_tmp, SQ_WAVE_STATUS_NO_VGPRS_SHIFT - s_cbranch_scc0 L_HAVE_VGPRS - s_endpgm -L_HAVE_VGPRS: -#endif - s_and_b32 s_save_pc_hi, s_save_pc_hi, 0x0000ffff //pc[47:32] s_mov_b32 s_save_tmp, 0 s_setreg_b32 hwreg(S_TRAPSTS_HWREG, S_TRAPSTS_SAVE_CONTEXT_SHIFT, 1), s_save_tmp //clear saveCtx bit @@ -671,19 +562,6 @@ L_SAVE_HWREG: s_mov_b32 m0, 0x0 //Next lane of v2 to write to #endif
-#if ASIC_FAMILY >= CHIP_GFX12 - // Ensure no further changes to barrier or LDS state. - // STATE_PRIV.BARRIER_COMPLETE may change up to this point. - s_barrier_signal -2 - s_barrier_wait -2 - - // Re-read final state of BARRIER_COMPLETE field for save. - s_getreg_b32 s_save_tmp, hwreg(S_STATUS_HWREG) - s_and_b32 s_save_tmp, s_save_tmp, SQ_WAVE_STATE_PRIV_BARRIER_COMPLETE_MASK - s_andn2_b32 s_save_status, s_save_status, SQ_WAVE_STATE_PRIV_BARRIER_COMPLETE_MASK - s_or_b32 s_save_status, s_save_status, s_save_tmp -#endif - write_hwreg_to_mem(s_save_m0, s_save_buf_rsrc0, s_save_mem_offset) write_hwreg_to_mem(s_save_pc_lo, s_save_buf_rsrc0, s_save_mem_offset) s_andn2_b32 s_save_tmp, s_save_pc_hi, S_SAVE_PC_HI_FIRST_WAVE_MASK @@ -707,21 +585,6 @@ L_SAVE_HWREG: s_getreg_b32 s_save_m0, hwreg(HW_REG_SHADER_FLAT_SCRATCH_HI) write_hwreg_to_mem(s_save_m0, s_save_buf_rsrc0, s_save_mem_offset)
-#if ASIC_FAMILY >= CHIP_GFX12 - s_getreg_b32 s_save_m0, hwreg(HW_REG_WAVE_EXCP_FLAG_USER) - write_hwreg_to_mem(s_save_m0, s_save_buf_rsrc0, s_save_mem_offset) - - s_getreg_b32 s_save_m0, hwreg(HW_REG_WAVE_TRAP_CTRL) - write_hwreg_to_mem(s_save_m0, s_save_buf_rsrc0, s_save_mem_offset) - - s_getreg_b32 s_save_tmp, hwreg(HW_REG_WAVE_STATUS) - write_hwreg_to_mem(s_save_tmp, s_save_buf_rsrc0, s_save_mem_offset) - - s_get_barrier_state s_save_tmp, -1 - s_wait_kmcnt (0) - write_hwreg_to_mem(s_save_tmp, s_save_buf_rsrc0, s_save_mem_offset) -#endif - #if NO_SQC_STORE // Write HWREGs with 16 VGPR lanes. TTMPs occupy space after this. s_mov_b32 exec_lo, 0xFFFF @@ -814,9 +677,7 @@ L_SAVE_LDS_NORMAL: s_and_b32 s_save_alloc_size, s_save_alloc_size, 0xFFFFFFFF //lds_size is zero? s_cbranch_scc0 L_SAVE_LDS_DONE //no lds used? jump to L_SAVE_DONE
-#if ASIC_FAMILY < CHIP_GFX12 s_barrier //LDS is used? wait for other waves in the same TG -#endif s_and_b32 s_save_tmp, s_save_pc_hi, S_SAVE_PC_HI_FIRST_WAVE_MASK s_cbranch_scc0 L_SAVE_LDS_DONE
@@ -1081,11 +942,6 @@ L_RESTORE: s_mov_b32 s_restore_buf_rsrc2, 0 //NUM_RECORDS initial value = 0 (in bytes) s_mov_b32 s_restore_buf_rsrc3, S_RESTORE_BUF_RSRC_WORD3_MISC
-#if ASIC_FAMILY >= CHIP_GFX12 - // Save s_restore_spi_init_hi for later use. - s_mov_b32 s_restore_spi_init_hi_save, s_restore_spi_init_hi -#endif - //determine it is wave32 or wave64 get_wave_size2(s_restore_size)
@@ -1320,9 +1176,7 @@ L_RESTORE_SGPR: // s_barrier with MODE.DEBUG_EN=1, STATUS.PRIV=1 incorrectly asserts debug exception. // Clear DEBUG_EN before and restore MODE after the barrier. s_setreg_imm32_b32 hwreg(HW_REG_MODE), 0 -#if ASIC_FAMILY < CHIP_GFX12 s_barrier //barrier to ensure the readiness of LDS before access attemps from any other wave in the same TG -#endif
/* restore HW registers */ L_RESTORE_HWREG: @@ -1334,11 +1188,6 @@ L_RESTORE_HWREG:
s_mov_b32 s_restore_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes
-#if ASIC_FAMILY >= CHIP_GFX12 - // Restore s_restore_spi_init_hi before the saved value gets clobbered. - s_mov_b32 s_restore_spi_init_hi, s_restore_spi_init_hi_save -#endif - read_hwreg_from_mem(s_restore_m0, s_restore_buf_rsrc0, s_restore_mem_offset) read_hwreg_from_mem(s_restore_pc_lo, s_restore_buf_rsrc0, s_restore_mem_offset) read_hwreg_from_mem(s_restore_pc_hi, s_restore_buf_rsrc0, s_restore_mem_offset) @@ -1358,44 +1207,6 @@ L_RESTORE_HWREG:
s_setreg_b32 hwreg(HW_REG_SHADER_FLAT_SCRATCH_HI), s_restore_flat_scratch
-#if ASIC_FAMILY >= CHIP_GFX12 - read_hwreg_from_mem(s_restore_tmp, s_restore_buf_rsrc0, s_restore_mem_offset) - S_WAITCNT_0 - s_setreg_b32 hwreg(HW_REG_WAVE_EXCP_FLAG_USER), s_restore_tmp - - read_hwreg_from_mem(s_restore_tmp, s_restore_buf_rsrc0, s_restore_mem_offset) - S_WAITCNT_0 - s_setreg_b32 hwreg(HW_REG_WAVE_TRAP_CTRL), s_restore_tmp - - // Only the first wave needs to restore the workgroup barrier. - s_and_b32 s_restore_tmp, s_restore_spi_init_hi, S_RESTORE_SPI_INIT_FIRST_WAVE_MASK - s_cbranch_scc0 L_SKIP_BARRIER_RESTORE - - // Skip over WAVE_STATUS, since there is no state to restore from it - s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 4 - - read_hwreg_from_mem(s_restore_tmp, s_restore_buf_rsrc0, s_restore_mem_offset) - S_WAITCNT_0 - - s_bitcmp1_b32 s_restore_tmp, BARRIER_STATE_VALID_OFFSET - s_cbranch_scc0 L_SKIP_BARRIER_RESTORE - - // extract the saved signal count from s_restore_tmp - s_lshr_b32 s_restore_tmp, s_restore_tmp, BARRIER_STATE_SIGNAL_OFFSET - - // We need to call s_barrier_signal repeatedly to restore the signal - // count of the work group barrier. The member count is already - // initialized with the number of waves in the work group. -L_BARRIER_RESTORE_LOOP: - s_and_b32 s_restore_tmp, s_restore_tmp, s_restore_tmp - s_cbranch_scc0 L_SKIP_BARRIER_RESTORE - s_barrier_signal -1 - s_add_i32 s_restore_tmp, s_restore_tmp, -1 - s_branch L_BARRIER_RESTORE_LOOP - -L_SKIP_BARRIER_RESTORE: -#endif - s_mov_b32 m0, s_restore_m0 s_mov_b32 exec_lo, s_restore_exec_lo s_mov_b32 exec_hi, s_restore_exec_hi @@ -1453,13 +1264,6 @@ L_RETURN_WITHOUT_PRIV:
s_setreg_b32 hwreg(S_STATUS_HWREG), s_restore_status // SCC is included, which is changed by previous salu
-#if ASIC_FAMILY >= CHIP_GFX12 - // Make barrier and LDS state visible to all waves in the group. - // STATE_PRIV.BARRIER_COMPLETE may change after this point. - s_barrier_signal -2 - s_barrier_wait -2 -#endif - s_rfe_b64 s_restore_pc_lo //Return to the main shader program and resume execution
L_END_PGM: @@ -1598,11 +1402,7 @@ function get_hwreg_size_bytes end
function get_wave_size2(s_reg) -#if ASIC_FAMILY < CHIP_GFX12 s_getreg_b32 s_reg, hwreg(HW_REG_IB_STS2,SQ_WAVE_IB_STS2_WAVE64_SHIFT,SQ_WAVE_IB_STS2_WAVE64_SIZE) -#else - s_getreg_b32 s_reg, hwreg(HW_REG_WAVE_STATUS,SQ_WAVE_STATUS_WAVE64_SHIFT,SQ_WAVE_STATUS_WAVE64_SIZE) -#endif s_lshl_b32 s_reg, s_reg, S_WAVE_SIZE end
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx12.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx12.asm new file mode 100644 index 0000000000000..1740e98c6719d --- /dev/null +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx12.asm @@ -0,0 +1,1126 @@ +/* + * Copyright 2018 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + */ + +/* To compile this assembly code: + * + * gfx12: + * cpp -DASIC_FAMILY=CHIP_GFX12 cwsr_trap_handler_gfx12.asm -P -o gfx12.sp3 + * sp3 gfx12.sp3 -hex gfx12.hex + */ + +#define CHIP_GFX12 37 + +#define SINGLE_STEP_MISSED_WORKAROUND 1 //workaround for lost TRAP_AFTER_INST exception when SAVECTX raised + +var SQ_WAVE_STATE_PRIV_BARRIER_COMPLETE_MASK = 0x4 +var SQ_WAVE_STATE_PRIV_SCC_SHIFT = 9 +var SQ_WAVE_STATE_PRIV_SYS_PRIO_MASK = 0xC00 +var SQ_WAVE_STATE_PRIV_HALT_MASK = 0x4000 +var SQ_WAVE_STATE_PRIV_POISON_ERR_MASK = 0x8000 +var SQ_WAVE_STATE_PRIV_POISON_ERR_SHIFT = 15 +var SQ_WAVE_STATUS_WAVE64_SHIFT = 29 +var SQ_WAVE_STATUS_WAVE64_SIZE = 1 +var SQ_WAVE_STATUS_NO_VGPRS_SHIFT = 24 +var SQ_WAVE_STATE_PRIV_ALWAYS_CLEAR_MASK = SQ_WAVE_STATE_PRIV_SYS_PRIO_MASK|SQ_WAVE_STATE_PRIV_POISON_ERR_MASK +var S_SAVE_PC_HI_TRAP_ID_MASK = 0xF0000000 + +var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SHIFT = 12 +var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SIZE = 9 +var SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SIZE = 8 +var SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SHIFT = 12 +var SQ_WAVE_LDS_ALLOC_VGPR_SHARED_SIZE_SHIFT = 24 +var SQ_WAVE_LDS_ALLOC_VGPR_SHARED_SIZE_SIZE = 4 +var SQ_WAVE_LDS_ALLOC_GRANULARITY = 9 + +var SQ_WAVE_EXCP_FLAG_PRIV_ADDR_WATCH_MASK = 0xF +var SQ_WAVE_EXCP_FLAG_PRIV_MEM_VIOL_MASK = 0x10 +var SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_SHIFT = 5 +var SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_MASK = 0x20 +var SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_MASK = 0x40 +var SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_SHIFT = 6 +var SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_MASK = 0x80 +var SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_SHIFT = 7 +var SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_MASK = 0x100 +var SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_SHIFT = 8 +var SQ_WAVE_EXCP_FLAG_PRIV_WAVE_END_MASK = 0x200 +var SQ_WAVE_EXCP_FLAG_PRIV_TRAP_AFTER_INST_MASK = 0x800 +var SQ_WAVE_TRAP_CTRL_ADDR_WATCH_MASK = 0x80 +var SQ_WAVE_TRAP_CTRL_TRAP_AFTER_INST_MASK = 0x200 + +var SQ_WAVE_EXCP_FLAG_PRIV_NON_MASKABLE_EXCP_MASK= SQ_WAVE_EXCP_FLAG_PRIV_MEM_VIOL_MASK |\ + SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_MASK |\ + SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_MASK |\ + SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_MASK |\ + 
SQ_WAVE_EXCP_FLAG_PRIV_WAVE_END_MASK |\ + SQ_WAVE_EXCP_FLAG_PRIV_TRAP_AFTER_INST_MASK +var SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_1_SIZE = SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_SHIFT +var SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_2_SHIFT = SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_SHIFT +var SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_2_SIZE = SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_SHIFT - SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_SHIFT +var SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_3_SHIFT = SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_SHIFT +var SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_3_SIZE = 32 - SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_3_SHIFT +var BARRIER_STATE_SIGNAL_OFFSET = 16 +var BARRIER_STATE_VALID_OFFSET = 0 + +var TTMP11_DEBUG_TRAP_ENABLED_SHIFT = 23 +var TTMP11_DEBUG_TRAP_ENABLED_MASK = 0x800000 + +// SQ_SEL_X/Y/Z/W, BUF_NUM_FORMAT_FLOAT, (0 for MUBUF stride[17:14] +// when ADD_TID_ENABLE and BUF_DATA_FORMAT_32 for MTBUF), ADD_TID_ENABLE +var S_SAVE_BUF_RSRC_WORD1_STRIDE = 0x00040000 +var S_SAVE_BUF_RSRC_WORD3_MISC = 0x10807FAC +var S_SAVE_SPI_INIT_FIRST_WAVE_MASK = 0x04000000 +var S_SAVE_SPI_INIT_FIRST_WAVE_SHIFT = 26 + +var S_SAVE_PC_HI_FIRST_WAVE_MASK = 0x80000000 +var S_SAVE_PC_HI_FIRST_WAVE_SHIFT = 31 + +var s_sgpr_save_num = 108 + +var s_save_spi_init_lo = exec_lo +var s_save_spi_init_hi = exec_hi +var s_save_pc_lo = ttmp0 +var s_save_pc_hi = ttmp1 +var s_save_exec_lo = ttmp2 +var s_save_exec_hi = ttmp3 +var s_save_state_priv = ttmp12 +var s_save_excp_flag_priv = ttmp15 +var s_save_xnack_mask = s_save_excp_flag_priv +var s_wave_size = ttmp7 +var s_save_buf_rsrc0 = ttmp8 +var s_save_buf_rsrc1 = ttmp9 +var s_save_buf_rsrc2 = ttmp10 +var s_save_buf_rsrc3 = ttmp11 +var s_save_mem_offset = ttmp4 +var s_save_alloc_size = s_save_excp_flag_priv +var s_save_tmp = ttmp14 +var s_save_m0 = ttmp5 +var s_save_ttmps_lo = s_save_tmp +var s_save_ttmps_hi = s_save_excp_flag_priv + +var S_RESTORE_BUF_RSRC_WORD1_STRIDE = S_SAVE_BUF_RSRC_WORD1_STRIDE +var S_RESTORE_BUF_RSRC_WORD3_MISC = S_SAVE_BUF_RSRC_WORD3_MISC + +var S_RESTORE_SPI_INIT_FIRST_WAVE_MASK = 0x04000000 +var S_RESTORE_SPI_INIT_FIRST_WAVE_SHIFT = 26 +var S_WAVE_SIZE = 25 + +var s_restore_spi_init_lo = exec_lo +var s_restore_spi_init_hi = exec_hi +var s_restore_mem_offset = ttmp12 +var s_restore_alloc_size = ttmp3 +var s_restore_tmp = ttmp2 +var s_restore_mem_offset_save = s_restore_tmp +var s_restore_m0 = s_restore_alloc_size +var s_restore_mode = ttmp7 +var s_restore_flat_scratch = s_restore_tmp +var s_restore_pc_lo = ttmp0 +var s_restore_pc_hi = ttmp1 +var s_restore_exec_lo = ttmp4 +var s_restore_exec_hi = ttmp5 +var s_restore_state_priv = ttmp14 +var s_restore_excp_flag_priv = ttmp15 +var s_restore_xnack_mask = ttmp13 +var s_restore_buf_rsrc0 = ttmp8 +var s_restore_buf_rsrc1 = ttmp9 +var s_restore_buf_rsrc2 = ttmp10 +var s_restore_buf_rsrc3 = ttmp11 +var s_restore_size = ttmp6 +var s_restore_ttmps_lo = s_restore_tmp +var s_restore_ttmps_hi = s_restore_alloc_size +var s_restore_spi_init_hi_save = s_restore_exec_hi + +shader main + asic(DEFAULT) + type(CS) + wave_size(32) + + s_branch L_SKIP_RESTORE //NOT restore. might be a regular trap or save + +L_JUMP_TO_RESTORE: + s_branch L_RESTORE + +L_SKIP_RESTORE: + s_getreg_b32 s_save_state_priv, hwreg(HW_REG_WAVE_STATE_PRIV) //save STATUS since we will change SCC + + // Clear SPI_PRIO: do not save with elevated priority. + // Clear ECC_ERR: prevents SQC store and triggers FATAL_HALT if setreg'd. 
+ s_andn2_b32 s_save_state_priv, s_save_state_priv, SQ_WAVE_STATE_PRIV_ALWAYS_CLEAR_MASK + + s_getreg_b32 s_save_excp_flag_priv, hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV) + + s_and_b32 ttmp2, s_save_state_priv, SQ_WAVE_STATE_PRIV_HALT_MASK + s_cbranch_scc0 L_NOT_HALTED + +L_HALTED: + // Host trap may occur while wave is halted. + s_and_b32 ttmp2, s_save_excp_flag_priv, SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_MASK + s_cbranch_scc1 L_FETCH_2ND_TRAP + +L_CHECK_SAVE: + s_and_b32 ttmp2, s_save_excp_flag_priv, SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_MASK + s_cbranch_scc1 L_SAVE + + // Wave is halted but neither host trap nor SAVECTX is raised. + // Caused by instruction fetch memory violation. + // Spin wait until context saved to prevent interrupt storm. + s_sleep 0x10 + s_getreg_b32 s_save_excp_flag_priv, hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV) + s_branch L_CHECK_SAVE + +L_NOT_HALTED: + // Let second-level handle non-SAVECTX exception or trap. + // Any concurrent SAVECTX will be handled upon re-entry once halted. + + // Check non-maskable exceptions. memory_violation, illegal_instruction + // and xnack_error exceptions always cause the wave to enter the trap + // handler. + s_and_b32 ttmp2, s_save_excp_flag_priv, SQ_WAVE_EXCP_FLAG_PRIV_NON_MASKABLE_EXCP_MASK + s_cbranch_scc1 L_FETCH_2ND_TRAP + + // Check for maskable exceptions in trapsts.excp and trapsts.excp_hi. + // Maskable exceptions only cause the wave to enter the trap handler if + // their respective bit in mode.excp_en is set. + s_getreg_b32 ttmp2, hwreg(HW_REG_WAVE_EXCP_FLAG_USER) + s_and_b32 ttmp3, s_save_excp_flag_priv, SQ_WAVE_EXCP_FLAG_PRIV_ADDR_WATCH_MASK + s_cbranch_scc0 L_NOT_ADDR_WATCH + s_or_b32 ttmp2, ttmp2, SQ_WAVE_TRAP_CTRL_ADDR_WATCH_MASK + +L_NOT_ADDR_WATCH: + s_getreg_b32 ttmp3, hwreg(HW_REG_WAVE_TRAP_CTRL) + s_and_b32 ttmp2, ttmp3, ttmp2 + s_cbranch_scc1 L_FETCH_2ND_TRAP + +L_CHECK_TRAP_ID: + // Check trap_id != 0 + s_and_b32 ttmp2, s_save_pc_hi, S_SAVE_PC_HI_TRAP_ID_MASK + s_cbranch_scc1 L_FETCH_2ND_TRAP + +#if SINGLE_STEP_MISSED_WORKAROUND + // Prioritize single step exception over context save. + // Second-level trap will halt wave and RFE, re-entering for SAVECTX. + // WAVE_TRAP_CTRL is already in ttmp3. + s_and_b32 ttmp3, ttmp3, SQ_WAVE_TRAP_CTRL_TRAP_AFTER_INST_MASK + s_cbranch_scc1 L_FETCH_2ND_TRAP +#endif + + s_and_b32 ttmp2, s_save_excp_flag_priv, SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_MASK + s_cbranch_scc1 L_SAVE + +L_FETCH_2ND_TRAP: + // Read second-level TBA/TMA from first-level TMA and jump if available. 
+ // ttmp[2:5] and ttmp12 can be used (others hold SPI-initialized debug data) + // ttmp12 holds SQ_WAVE_STATUS + s_sendmsg_rtn_b64 [ttmp14, ttmp15], sendmsg(MSG_RTN_GET_TMA) + s_wait_idle + s_lshl_b64 [ttmp14, ttmp15], [ttmp14, ttmp15], 0x8 + + s_bitcmp1_b32 ttmp15, 0xF + s_cbranch_scc0 L_NO_SIGN_EXTEND_TMA + s_or_b32 ttmp15, ttmp15, 0xFFFF0000 +L_NO_SIGN_EXTEND_TMA: + + s_load_dword ttmp2, [ttmp14, ttmp15], 0x10 scope:SCOPE_SYS // debug trap enabled flag + s_wait_idle + s_lshl_b32 ttmp2, ttmp2, TTMP11_DEBUG_TRAP_ENABLED_SHIFT + s_andn2_b32 ttmp11, ttmp11, TTMP11_DEBUG_TRAP_ENABLED_MASK + s_or_b32 ttmp11, ttmp11, ttmp2 + + s_load_dwordx2 [ttmp2, ttmp3], [ttmp14, ttmp15], 0x0 scope:SCOPE_SYS // second-level TBA + s_wait_idle + s_load_dwordx2 [ttmp14, ttmp15], [ttmp14, ttmp15], 0x8 scope:SCOPE_SYS // second-level TMA + s_wait_idle + + s_and_b64 [ttmp2, ttmp3], [ttmp2, ttmp3], [ttmp2, ttmp3] + s_cbranch_scc0 L_NO_NEXT_TRAP // second-level trap handler not been set + s_setpc_b64 [ttmp2, ttmp3] // jump to second-level trap handler + +L_NO_NEXT_TRAP: + // If not caused by trap then halt wave to prevent re-entry. + s_and_b32 ttmp2, s_save_pc_hi, S_SAVE_PC_HI_TRAP_ID_MASK + s_cbranch_scc1 L_TRAP_CASE + + // Host trap will not cause trap re-entry. + s_getreg_b32 ttmp2, hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV) + s_and_b32 ttmp2, ttmp2, SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_MASK + s_cbranch_scc1 L_EXIT_TRAP + s_or_b32 s_save_state_priv, s_save_state_priv, SQ_WAVE_STATE_PRIV_HALT_MASK + + // If the PC points to S_ENDPGM then context save will fail if STATE_PRIV.HALT is set. + // Rewind the PC to prevent this from occurring. + s_sub_u32 ttmp0, ttmp0, 0x8 + s_subb_u32 ttmp1, ttmp1, 0x0 + + s_branch L_EXIT_TRAP + +L_TRAP_CASE: + // Advance past trap instruction to prevent re-entry. + s_add_u32 ttmp0, ttmp0, 0x4 + s_addc_u32 ttmp1, ttmp1, 0x0 + +L_EXIT_TRAP: + s_and_b32 ttmp1, ttmp1, 0xFFFF + + // Restore SQ_WAVE_STATUS. + s_and_b64 exec, exec, exec // Restore STATUS.EXECZ, not writable by s_setreg_b32 + s_and_b64 vcc, vcc, vcc // Restore STATUS.VCCZ, not writable by s_setreg_b32 + + // STATE_PRIV.BARRIER_COMPLETE may have changed since we read it. + // Only restore fields which the trap handler changes. + s_lshr_b32 s_save_state_priv, s_save_state_priv, SQ_WAVE_STATE_PRIV_SCC_SHIFT + s_setreg_b32 hwreg(HW_REG_WAVE_STATE_PRIV, SQ_WAVE_STATE_PRIV_SCC_SHIFT, \ + SQ_WAVE_STATE_PRIV_POISON_ERR_SHIFT - SQ_WAVE_STATE_PRIV_SCC_SHIFT + 1), s_save_state_priv + + s_rfe_b64 [ttmp0, ttmp1] + +L_SAVE: + // If VGPRs have been deallocated then terminate the wavefront. + // It has no remaining program to run and cannot save without VGPRs. + s_getreg_b32 s_save_tmp, hwreg(HW_REG_WAVE_STATUS) + s_bitcmp1_b32 s_save_tmp, SQ_WAVE_STATUS_NO_VGPRS_SHIFT + s_cbranch_scc0 L_HAVE_VGPRS + s_endpgm +L_HAVE_VGPRS: + + s_and_b32 s_save_pc_hi, s_save_pc_hi, 0x0000ffff //pc[47:32] + s_mov_b32 s_save_tmp, 0 + s_setreg_b32 hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV, SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_SHIFT, 1), s_save_tmp //clear saveCtx bit + + /* inform SPI the readiness and wait for SPI's go signal */ + s_mov_b32 s_save_exec_lo, exec_lo //save EXEC and use EXEC for the go signal from SPI + s_mov_b32 s_save_exec_hi, exec_hi + s_mov_b64 exec, 0x0 //clear EXEC to get ready to receive + + s_sendmsg_rtn_b64 [exec_lo, exec_hi], sendmsg(MSG_RTN_SAVE_WAVE) + s_wait_idle + + // Save first_wave flag so we can clear high bits of save address. 
+ s_and_b32 s_save_tmp, s_save_spi_init_hi, S_SAVE_SPI_INIT_FIRST_WAVE_MASK + s_lshl_b32 s_save_tmp, s_save_tmp, (S_SAVE_PC_HI_FIRST_WAVE_SHIFT - S_SAVE_SPI_INIT_FIRST_WAVE_SHIFT) + s_or_b32 s_save_pc_hi, s_save_pc_hi, s_save_tmp + + // Trap temporaries must be saved via VGPR but all VGPRs are in use. + // There is no ttmp space to hold the resource constant for VGPR save. + // Save v0 by itself since it requires only two SGPRs. + s_mov_b32 s_save_ttmps_lo, exec_lo + s_and_b32 s_save_ttmps_hi, exec_hi, 0xFFFF + s_mov_b32 exec_lo, 0xFFFFFFFF + s_mov_b32 exec_hi, 0xFFFFFFFF + global_store_dword_addtid v0, [s_save_ttmps_lo, s_save_ttmps_hi] scope:SCOPE_SYS + v_mov_b32 v0, 0x0 + s_mov_b32 exec_lo, s_save_ttmps_lo + s_mov_b32 exec_hi, s_save_ttmps_hi + + // Save trap temporaries 4-11, 13 initialized by SPI debug dispatch logic + // ttmp SR memory offset : size(VGPR)+size(SVGPR)+size(SGPR)+0x40 + get_wave_size2(s_save_ttmps_hi) + get_vgpr_size_bytes(s_save_ttmps_lo, s_save_ttmps_hi) + get_svgpr_size_bytes(s_save_ttmps_hi) + s_add_u32 s_save_ttmps_lo, s_save_ttmps_lo, s_save_ttmps_hi + s_and_b32 s_save_ttmps_hi, s_save_spi_init_hi, 0xFFFF + s_add_u32 s_save_ttmps_lo, s_save_ttmps_lo, get_sgpr_size_bytes() + s_add_u32 s_save_ttmps_lo, s_save_ttmps_lo, s_save_spi_init_lo + s_addc_u32 s_save_ttmps_hi, s_save_ttmps_hi, 0x0 + + v_writelane_b32 v0, ttmp4, 0x4 + v_writelane_b32 v0, ttmp5, 0x5 + v_writelane_b32 v0, ttmp6, 0x6 + v_writelane_b32 v0, ttmp7, 0x7 + v_writelane_b32 v0, ttmp8, 0x8 + v_writelane_b32 v0, ttmp9, 0x9 + v_writelane_b32 v0, ttmp10, 0xA + v_writelane_b32 v0, ttmp11, 0xB + v_writelane_b32 v0, ttmp13, 0xD + v_writelane_b32 v0, exec_lo, 0xE + v_writelane_b32 v0, exec_hi, 0xF + + s_mov_b32 exec_lo, 0x3FFF + s_mov_b32 exec_hi, 0x0 + global_store_dword_addtid v0, [s_save_ttmps_lo, s_save_ttmps_hi] offset:0x40 scope:SCOPE_SYS + v_readlane_b32 ttmp14, v0, 0xE + v_readlane_b32 ttmp15, v0, 0xF + s_mov_b32 exec_lo, ttmp14 + s_mov_b32 exec_hi, ttmp15 + + /* setup Resource Contants */ + s_mov_b32 s_save_buf_rsrc0, s_save_spi_init_lo //base_addr_lo + s_and_b32 s_save_buf_rsrc1, s_save_spi_init_hi, 0x0000FFFF //base_addr_hi + s_or_b32 s_save_buf_rsrc1, s_save_buf_rsrc1, S_SAVE_BUF_RSRC_WORD1_STRIDE + s_mov_b32 s_save_buf_rsrc2, 0 //NUM_RECORDS initial value = 0 (in bytes) although not neccessarily inited + s_mov_b32 s_save_buf_rsrc3, S_SAVE_BUF_RSRC_WORD3_MISC + + s_mov_b32 s_save_m0, m0 + + /* global mem offset */ + s_mov_b32 s_save_mem_offset, 0x0 + get_wave_size2(s_wave_size) + + /* save first 4 VGPRs, needed for SGPR save */ + s_mov_b32 exec_lo, 0xFFFFFFFF //need every thread from now on + s_lshr_b32 m0, s_wave_size, S_WAVE_SIZE + s_and_b32 m0, m0, 1 + s_cmp_eq_u32 m0, 1 + s_cbranch_scc1 L_ENABLE_SAVE_4VGPR_EXEC_HI + s_mov_b32 exec_hi, 0x00000000 + s_branch L_SAVE_4VGPR_WAVE32 +L_ENABLE_SAVE_4VGPR_EXEC_HI: + s_mov_b32 exec_hi, 0xFFFFFFFF + s_branch L_SAVE_4VGPR_WAVE64 +L_SAVE_4VGPR_WAVE32: + s_mov_b32 s_save_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes + + // VGPR Allocated in 4-GPR granularity + + buffer_store_dword v1, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:128 + buffer_store_dword v2, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:128*2 + buffer_store_dword v3, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:128*3 + s_branch L_SAVE_HWREG + +L_SAVE_4VGPR_WAVE64: + s_mov_b32 s_save_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes + + // VGPR Allocated in 4-GPR granularity + + buffer_store_dword v1, v0, s_save_buf_rsrc0, s_save_mem_offset 
scope:SCOPE_SYS offset:256 + buffer_store_dword v2, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:256*2 + buffer_store_dword v3, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:256*3 + + /* save HW registers */ + +L_SAVE_HWREG: + // HWREG SR memory offset : size(VGPR)+size(SVGPR)+size(SGPR) + get_vgpr_size_bytes(s_save_mem_offset, s_wave_size) + get_svgpr_size_bytes(s_save_tmp) + s_add_u32 s_save_mem_offset, s_save_mem_offset, s_save_tmp + s_add_u32 s_save_mem_offset, s_save_mem_offset, get_sgpr_size_bytes() + + s_mov_b32 s_save_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes + + v_mov_b32 v0, 0x0 //Offset[31:0] from buffer resource + v_mov_b32 v1, 0x0 //Offset[63:32] from buffer resource + v_mov_b32 v2, 0x0 //Set of SGPRs for TCP store + s_mov_b32 m0, 0x0 //Next lane of v2 to write to + + // Ensure no further changes to barrier or LDS state. + // STATE_PRIV.BARRIER_COMPLETE may change up to this point. + s_barrier_signal -2 + s_barrier_wait -2 + + // Re-read final state of BARRIER_COMPLETE field for save. + s_getreg_b32 s_save_tmp, hwreg(HW_REG_WAVE_STATE_PRIV) + s_and_b32 s_save_tmp, s_save_tmp, SQ_WAVE_STATE_PRIV_BARRIER_COMPLETE_MASK + s_andn2_b32 s_save_state_priv, s_save_state_priv, SQ_WAVE_STATE_PRIV_BARRIER_COMPLETE_MASK + s_or_b32 s_save_state_priv, s_save_state_priv, s_save_tmp + + write_hwreg_to_v2(s_save_m0) + write_hwreg_to_v2(s_save_pc_lo) + s_andn2_b32 s_save_tmp, s_save_pc_hi, S_SAVE_PC_HI_FIRST_WAVE_MASK + write_hwreg_to_v2(s_save_tmp) + write_hwreg_to_v2(s_save_exec_lo) + write_hwreg_to_v2(s_save_exec_hi) + write_hwreg_to_v2(s_save_state_priv) + + s_getreg_b32 s_save_tmp, hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV) + write_hwreg_to_v2(s_save_tmp) + + write_hwreg_to_v2(s_save_xnack_mask) + + s_getreg_b32 s_save_m0, hwreg(HW_REG_WAVE_MODE) + write_hwreg_to_v2(s_save_m0) + + s_getreg_b32 s_save_m0, hwreg(HW_REG_WAVE_SCRATCH_BASE_LO) + write_hwreg_to_v2(s_save_m0) + + s_getreg_b32 s_save_m0, hwreg(HW_REG_WAVE_SCRATCH_BASE_HI) + write_hwreg_to_v2(s_save_m0) + + s_getreg_b32 s_save_m0, hwreg(HW_REG_WAVE_EXCP_FLAG_USER) + write_hwreg_to_v2(s_save_m0) + + s_getreg_b32 s_save_m0, hwreg(HW_REG_WAVE_TRAP_CTRL) + write_hwreg_to_v2(s_save_m0) + + s_getreg_b32 s_save_tmp, hwreg(HW_REG_WAVE_STATUS) + write_hwreg_to_v2(s_save_tmp) + + s_get_barrier_state s_save_tmp, -1 + s_wait_kmcnt (0) + write_hwreg_to_v2(s_save_tmp) + + // Write HWREGs with 16 VGPR lanes. TTMPs occupy space after this. + s_mov_b32 exec_lo, 0xFFFF + s_mov_b32 exec_hi, 0x0 + buffer_store_dword v2, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS + + // Write SGPRs with 32 VGPR lanes. This works in wave32 and wave64 mode. + s_mov_b32 exec_lo, 0xFFFFFFFF + + /* save SGPRs */ + // Save SGPR before LDS save, then the s0 to s4 can be used during LDS save... 
+ + // SGPR SR memory offset : size(VGPR)+size(SVGPR) + get_vgpr_size_bytes(s_save_mem_offset, s_wave_size) + get_svgpr_size_bytes(s_save_tmp) + s_add_u32 s_save_mem_offset, s_save_mem_offset, s_save_tmp + s_mov_b32 s_save_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes + + s_mov_b32 ttmp13, 0x0 //next VGPR lane to copy SGPR into + + s_mov_b32 m0, 0x0 //SGPR initial index value =0 + s_nop 0x0 //Manually inserted wait states +L_SAVE_SGPR_LOOP: + // SGPR is allocated in 16 SGPR granularity + s_movrels_b64 s0, s0 //s0 = s[0+m0], s1 = s[1+m0] + s_movrels_b64 s2, s2 //s2 = s[2+m0], s3 = s[3+m0] + s_movrels_b64 s4, s4 //s4 = s[4+m0], s5 = s[5+m0] + s_movrels_b64 s6, s6 //s6 = s[6+m0], s7 = s[7+m0] + s_movrels_b64 s8, s8 //s8 = s[8+m0], s9 = s[9+m0] + s_movrels_b64 s10, s10 //s10 = s[10+m0], s11 = s[11+m0] + s_movrels_b64 s12, s12 //s12 = s[12+m0], s13 = s[13+m0] + s_movrels_b64 s14, s14 //s14 = s[14+m0], s15 = s[15+m0] + + write_16sgpr_to_v2(s0) + + s_cmp_eq_u32 ttmp13, 0x20 //have 32 VGPR lanes filled? + s_cbranch_scc0 L_SAVE_SGPR_SKIP_TCP_STORE + + buffer_store_dword v2, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS + s_add_u32 s_save_mem_offset, s_save_mem_offset, 0x80 + s_mov_b32 ttmp13, 0x0 + v_mov_b32 v2, 0x0 +L_SAVE_SGPR_SKIP_TCP_STORE: + + s_add_u32 m0, m0, 16 //next sgpr index + s_cmp_lt_u32 m0, 96 //scc = (m0 < first 96 SGPR) ? 1 : 0 + s_cbranch_scc1 L_SAVE_SGPR_LOOP //first 96 SGPR save is complete? + + //save the rest 12 SGPR + s_movrels_b64 s0, s0 //s0 = s[0+m0], s1 = s[1+m0] + s_movrels_b64 s2, s2 //s2 = s[2+m0], s3 = s[3+m0] + s_movrels_b64 s4, s4 //s4 = s[4+m0], s5 = s[5+m0] + s_movrels_b64 s6, s6 //s6 = s[6+m0], s7 = s[7+m0] + s_movrels_b64 s8, s8 //s8 = s[8+m0], s9 = s[9+m0] + s_movrels_b64 s10, s10 //s10 = s[10+m0], s11 = s[11+m0] + write_12sgpr_to_v2(s0) + + buffer_store_dword v2, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS + + /* save LDS */ + +L_SAVE_LDS: + // Change EXEC to all threads... + s_mov_b32 exec_lo, 0xFFFFFFFF //need every thread from now on + s_lshr_b32 m0, s_wave_size, S_WAVE_SIZE + s_and_b32 m0, m0, 1 + s_cmp_eq_u32 m0, 1 + s_cbranch_scc1 L_ENABLE_SAVE_LDS_EXEC_HI + s_mov_b32 exec_hi, 0x00000000 + s_branch L_SAVE_LDS_NORMAL +L_ENABLE_SAVE_LDS_EXEC_HI: + s_mov_b32 exec_hi, 0xFFFFFFFF +L_SAVE_LDS_NORMAL: + s_getreg_b32 s_save_alloc_size, hwreg(HW_REG_WAVE_LDS_ALLOC,SQ_WAVE_LDS_ALLOC_LDS_SIZE_SHIFT,SQ_WAVE_LDS_ALLOC_LDS_SIZE_SIZE) + s_and_b32 s_save_alloc_size, s_save_alloc_size, 0xFFFFFFFF //lds_size is zero? + s_cbranch_scc0 L_SAVE_LDS_DONE //no lds used? 
jump to L_SAVE_DONE + + s_and_b32 s_save_tmp, s_save_pc_hi, S_SAVE_PC_HI_FIRST_WAVE_MASK + s_cbranch_scc0 L_SAVE_LDS_DONE + + // first wave do LDS save; + + s_lshl_b32 s_save_alloc_size, s_save_alloc_size, SQ_WAVE_LDS_ALLOC_GRANULARITY + s_mov_b32 s_save_buf_rsrc2, s_save_alloc_size //NUM_RECORDS in bytes + + // LDS at offset: size(VGPR)+size(SVGPR)+SIZE(SGPR)+SIZE(HWREG) + // + get_vgpr_size_bytes(s_save_mem_offset, s_wave_size) + get_svgpr_size_bytes(s_save_tmp) + s_add_u32 s_save_mem_offset, s_save_mem_offset, s_save_tmp + s_add_u32 s_save_mem_offset, s_save_mem_offset, get_sgpr_size_bytes() + s_add_u32 s_save_mem_offset, s_save_mem_offset, get_hwreg_size_bytes() + + s_mov_b32 s_save_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes + + //load 0~63*4(byte address) to vgpr v0 + v_mbcnt_lo_u32_b32 v0, -1, 0 + v_mbcnt_hi_u32_b32 v0, -1, v0 + v_mul_u32_u24 v0, 4, v0 + + s_lshr_b32 m0, s_wave_size, S_WAVE_SIZE + s_and_b32 m0, m0, 1 + s_cmp_eq_u32 m0, 1 + s_mov_b32 m0, 0x0 + s_cbranch_scc1 L_SAVE_LDS_W64 + +L_SAVE_LDS_W32: + s_mov_b32 s3, 128 + s_nop 0 + s_nop 0 + s_nop 0 +L_SAVE_LDS_LOOP_W32: + ds_read_b32 v1, v0 + s_wait_idle + buffer_store_dword v1, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS + + s_add_u32 m0, m0, s3 //every buffer_store_lds does 128 bytes + s_add_u32 s_save_mem_offset, s_save_mem_offset, s3 + v_add_nc_u32 v0, v0, 128 //mem offset increased by 128 bytes + s_cmp_lt_u32 m0, s_save_alloc_size //scc=(m0 < s_save_alloc_size) ? 1 : 0 + s_cbranch_scc1 L_SAVE_LDS_LOOP_W32 //LDS save is complete? + + s_branch L_SAVE_LDS_DONE + +L_SAVE_LDS_W64: + s_mov_b32 s3, 256 + s_nop 0 + s_nop 0 + s_nop 0 +L_SAVE_LDS_LOOP_W64: + ds_read_b32 v1, v0 + s_wait_idle + buffer_store_dword v1, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS + + s_add_u32 m0, m0, s3 //every buffer_store_lds does 256 bytes + s_add_u32 s_save_mem_offset, s_save_mem_offset, s3 + v_add_nc_u32 v0, v0, 256 //mem offset increased by 256 bytes + s_cmp_lt_u32 m0, s_save_alloc_size //scc=(m0 < s_save_alloc_size) ? 1 : 0 + s_cbranch_scc1 L_SAVE_LDS_LOOP_W64 //LDS save is complete? 
+ +L_SAVE_LDS_DONE: + /* save VGPRs - set the Rest VGPRs */ +L_SAVE_VGPR: + // VGPR SR memory offset: 0 + s_mov_b32 exec_lo, 0xFFFFFFFF //need every thread from now on + s_lshr_b32 m0, s_wave_size, S_WAVE_SIZE + s_and_b32 m0, m0, 1 + s_cmp_eq_u32 m0, 1 + s_cbranch_scc1 L_ENABLE_SAVE_VGPR_EXEC_HI + s_mov_b32 s_save_mem_offset, (0+128*4) // for the rest VGPRs + s_mov_b32 exec_hi, 0x00000000 + s_branch L_SAVE_VGPR_NORMAL +L_ENABLE_SAVE_VGPR_EXEC_HI: + s_mov_b32 s_save_mem_offset, (0+256*4) // for the rest VGPRs + s_mov_b32 exec_hi, 0xFFFFFFFF +L_SAVE_VGPR_NORMAL: + s_getreg_b32 s_save_alloc_size, hwreg(HW_REG_WAVE_GPR_ALLOC,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SHIFT,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SIZE) + s_add_u32 s_save_alloc_size, s_save_alloc_size, 1 + s_lshl_b32 s_save_alloc_size, s_save_alloc_size, 2 //Number of VGPRs = (vgpr_size + 1) * 4 (non-zero value) + //determine it is wave32 or wave64 + s_lshr_b32 m0, s_wave_size, S_WAVE_SIZE + s_and_b32 m0, m0, 1 + s_cmp_eq_u32 m0, 1 + s_cbranch_scc1 L_SAVE_VGPR_WAVE64 + + s_mov_b32 s_save_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes + + // VGPR Allocated in 4-GPR granularity + + // VGPR store using dw burst + s_mov_b32 m0, 0x4 //VGPR initial index value =4 + s_cmp_lt_u32 m0, s_save_alloc_size + s_cbranch_scc0 L_SAVE_VGPR_END + +L_SAVE_VGPR_W32_LOOP: + v_movrels_b32 v0, v0 //v0 = v[0+m0] + v_movrels_b32 v1, v1 //v1 = v[1+m0] + v_movrels_b32 v2, v2 //v2 = v[2+m0] + v_movrels_b32 v3, v3 //v3 = v[3+m0] + + buffer_store_dword v0, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS + buffer_store_dword v1, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:128 + buffer_store_dword v2, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:128*2 + buffer_store_dword v3, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:128*3 + + s_add_u32 m0, m0, 4 //next vgpr index + s_add_u32 s_save_mem_offset, s_save_mem_offset, 128*4 //every buffer_store_dword does 128 bytes + s_cmp_lt_u32 m0, s_save_alloc_size //scc = (m0 < s_save_alloc_size) ? 1 : 0 + s_cbranch_scc1 L_SAVE_VGPR_W32_LOOP //VGPR save is complete? + + s_branch L_SAVE_VGPR_END + +L_SAVE_VGPR_WAVE64: + s_mov_b32 s_save_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes + + // VGPR store using dw burst + s_mov_b32 m0, 0x4 //VGPR initial index value =4 + s_cmp_lt_u32 m0, s_save_alloc_size + s_cbranch_scc0 L_SAVE_SHARED_VGPR + +L_SAVE_VGPR_W64_LOOP: + v_movrels_b32 v0, v0 //v0 = v[0+m0] + v_movrels_b32 v1, v1 //v1 = v[1+m0] + v_movrels_b32 v2, v2 //v2 = v[2+m0] + v_movrels_b32 v3, v3 //v3 = v[3+m0] + + buffer_store_dword v0, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS + buffer_store_dword v1, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:256 + buffer_store_dword v2, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:256*2 + buffer_store_dword v3, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:256*3 + + s_add_u32 m0, m0, 4 //next vgpr index + s_add_u32 s_save_mem_offset, s_save_mem_offset, 256*4 //every buffer_store_dword does 256 bytes + s_cmp_lt_u32 m0, s_save_alloc_size //scc = (m0 < s_save_alloc_size) ? 1 : 0 + s_cbranch_scc1 L_SAVE_VGPR_W64_LOOP //VGPR save is complete? + +L_SAVE_SHARED_VGPR: + s_getreg_b32 s_save_alloc_size, hwreg(HW_REG_WAVE_LDS_ALLOC,SQ_WAVE_LDS_ALLOC_VGPR_SHARED_SIZE_SHIFT,SQ_WAVE_LDS_ALLOC_VGPR_SHARED_SIZE_SIZE) + s_and_b32 s_save_alloc_size, s_save_alloc_size, 0xFFFFFFFF //shared_vgpr_size is zero? + s_cbranch_scc0 L_SAVE_VGPR_END //no shared_vgpr used? 
jump to L_SAVE_LDS + s_lshl_b32 s_save_alloc_size, s_save_alloc_size, 3 //Number of SHARED_VGPRs = shared_vgpr_size * 8 (non-zero value) + //m0 now has the value of normal vgpr count, just add the m0 with shared_vgpr count to get the total count. + //save shared_vgpr will start from the index of m0 + s_add_u32 s_save_alloc_size, s_save_alloc_size, m0 + s_mov_b32 exec_lo, 0xFFFFFFFF + s_mov_b32 exec_hi, 0x00000000 + +L_SAVE_SHARED_VGPR_WAVE64_LOOP: + v_movrels_b32 v0, v0 //v0 = v[0+m0] + buffer_store_dword v0, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS + s_add_u32 m0, m0, 1 //next vgpr index + s_add_u32 s_save_mem_offset, s_save_mem_offset, 128 + s_cmp_lt_u32 m0, s_save_alloc_size //scc = (m0 < s_save_alloc_size) ? 1 : 0 + s_cbranch_scc1 L_SAVE_SHARED_VGPR_WAVE64_LOOP //SHARED_VGPR save is complete? + +L_SAVE_VGPR_END: + s_branch L_END_PGM + +L_RESTORE: + /* Setup Resource Contants */ + s_mov_b32 s_restore_buf_rsrc0, s_restore_spi_init_lo //base_addr_lo + s_and_b32 s_restore_buf_rsrc1, s_restore_spi_init_hi, 0x0000FFFF //base_addr_hi + s_or_b32 s_restore_buf_rsrc1, s_restore_buf_rsrc1, S_RESTORE_BUF_RSRC_WORD1_STRIDE + s_mov_b32 s_restore_buf_rsrc2, 0 //NUM_RECORDS initial value = 0 (in bytes) + s_mov_b32 s_restore_buf_rsrc3, S_RESTORE_BUF_RSRC_WORD3_MISC + + // Save s_restore_spi_init_hi for later use. + s_mov_b32 s_restore_spi_init_hi_save, s_restore_spi_init_hi + + //determine it is wave32 or wave64 + get_wave_size2(s_restore_size) + + s_and_b32 s_restore_tmp, s_restore_spi_init_hi, S_RESTORE_SPI_INIT_FIRST_WAVE_MASK + s_cbranch_scc0 L_RESTORE_VGPR + + /* restore LDS */ +L_RESTORE_LDS: + s_mov_b32 exec_lo, 0xFFFFFFFF //need every thread from now on + s_lshr_b32 m0, s_restore_size, S_WAVE_SIZE + s_and_b32 m0, m0, 1 + s_cmp_eq_u32 m0, 1 + s_cbranch_scc1 L_ENABLE_RESTORE_LDS_EXEC_HI + s_mov_b32 exec_hi, 0x00000000 + s_branch L_RESTORE_LDS_NORMAL +L_ENABLE_RESTORE_LDS_EXEC_HI: + s_mov_b32 exec_hi, 0xFFFFFFFF +L_RESTORE_LDS_NORMAL: + s_getreg_b32 s_restore_alloc_size, hwreg(HW_REG_WAVE_LDS_ALLOC,SQ_WAVE_LDS_ALLOC_LDS_SIZE_SHIFT,SQ_WAVE_LDS_ALLOC_LDS_SIZE_SIZE) + s_and_b32 s_restore_alloc_size, s_restore_alloc_size, 0xFFFFFFFF //lds_size is zero? + s_cbranch_scc0 L_RESTORE_VGPR //no lds used? jump to L_RESTORE_VGPR + s_lshl_b32 s_restore_alloc_size, s_restore_alloc_size, SQ_WAVE_LDS_ALLOC_GRANULARITY + s_mov_b32 s_restore_buf_rsrc2, s_restore_alloc_size //NUM_RECORDS in bytes + + // LDS at offset: size(VGPR)+size(SVGPR)+SIZE(SGPR)+SIZE(HWREG) + // + get_vgpr_size_bytes(s_restore_mem_offset, s_restore_size) + get_svgpr_size_bytes(s_restore_tmp) + s_add_u32 s_restore_mem_offset, s_restore_mem_offset, s_restore_tmp + s_add_u32 s_restore_mem_offset, s_restore_mem_offset, get_sgpr_size_bytes() + s_add_u32 s_restore_mem_offset, s_restore_mem_offset, get_hwreg_size_bytes() + + s_mov_b32 s_restore_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes + + s_lshr_b32 m0, s_restore_size, S_WAVE_SIZE + s_and_b32 m0, m0, 1 + s_cmp_eq_u32 m0, 1 + s_mov_b32 m0, 0x0 + s_cbranch_scc1 L_RESTORE_LDS_LOOP_W64 + +L_RESTORE_LDS_LOOP_W32: + buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset + s_wait_idle + ds_store_addtid_b32 v0 + s_add_u32 m0, m0, 128 // 128 DW + s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 128 //mem offset increased by 128DW + s_cmp_lt_u32 m0, s_restore_alloc_size //scc=(m0 < s_restore_alloc_size) ? 1 : 0 + s_cbranch_scc1 L_RESTORE_LDS_LOOP_W32 //LDS restore is complete? 
+ s_branch L_RESTORE_VGPR + +L_RESTORE_LDS_LOOP_W64: + buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset + s_wait_idle + ds_store_addtid_b32 v0 + s_add_u32 m0, m0, 256 // 256 DW + s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 256 //mem offset increased by 256DW + s_cmp_lt_u32 m0, s_restore_alloc_size //scc=(m0 < s_restore_alloc_size) ? 1 : 0 + s_cbranch_scc1 L_RESTORE_LDS_LOOP_W64 //LDS restore is complete? + + /* restore VGPRs */ +L_RESTORE_VGPR: + // VGPR SR memory offset : 0 + s_mov_b32 s_restore_mem_offset, 0x0 + s_mov_b32 exec_lo, 0xFFFFFFFF //need every thread from now on + s_lshr_b32 m0, s_restore_size, S_WAVE_SIZE + s_and_b32 m0, m0, 1 + s_cmp_eq_u32 m0, 1 + s_cbranch_scc1 L_ENABLE_RESTORE_VGPR_EXEC_HI + s_mov_b32 exec_hi, 0x00000000 + s_branch L_RESTORE_VGPR_NORMAL +L_ENABLE_RESTORE_VGPR_EXEC_HI: + s_mov_b32 exec_hi, 0xFFFFFFFF +L_RESTORE_VGPR_NORMAL: + s_getreg_b32 s_restore_alloc_size, hwreg(HW_REG_WAVE_GPR_ALLOC,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SHIFT,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SIZE) + s_add_u32 s_restore_alloc_size, s_restore_alloc_size, 1 + s_lshl_b32 s_restore_alloc_size, s_restore_alloc_size, 2 //Number of VGPRs = (vgpr_size + 1) * 4 (non-zero value) + //determine it is wave32 or wave64 + s_lshr_b32 m0, s_restore_size, S_WAVE_SIZE + s_and_b32 m0, m0, 1 + s_cmp_eq_u32 m0, 1 + s_cbranch_scc1 L_RESTORE_VGPR_WAVE64 + + s_mov_b32 s_restore_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes + + // VGPR load using dw burst + s_mov_b32 s_restore_mem_offset_save, s_restore_mem_offset // restore start with v1, v0 will be the last + s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 128*4 + s_mov_b32 m0, 4 //VGPR initial index value = 4 + s_cmp_lt_u32 m0, s_restore_alloc_size + s_cbranch_scc0 L_RESTORE_SGPR + +L_RESTORE_VGPR_WAVE32_LOOP: + buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS + buffer_load_dword v1, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS offset:128 + buffer_load_dword v2, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS offset:128*2 + buffer_load_dword v3, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS offset:128*3 + s_wait_idle + v_movreld_b32 v0, v0 //v[0+m0] = v0 + v_movreld_b32 v1, v1 + v_movreld_b32 v2, v2 + v_movreld_b32 v3, v3 + s_add_u32 m0, m0, 4 //next vgpr index + s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 128*4 //every buffer_load_dword does 128 bytes + s_cmp_lt_u32 m0, s_restore_alloc_size //scc = (m0 < s_restore_alloc_size) ? 1 : 0 + s_cbranch_scc1 L_RESTORE_VGPR_WAVE32_LOOP //VGPR restore (except v0) is complete? 
+ + /* VGPR restore on v0 */ + buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save scope:SCOPE_SYS + buffer_load_dword v1, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save scope:SCOPE_SYS offset:128 + buffer_load_dword v2, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save scope:SCOPE_SYS offset:128*2 + buffer_load_dword v3, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save scope:SCOPE_SYS offset:128*3 + s_wait_idle + + s_branch L_RESTORE_SGPR + +L_RESTORE_VGPR_WAVE64: + s_mov_b32 s_restore_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes + + // VGPR load using dw burst + s_mov_b32 s_restore_mem_offset_save, s_restore_mem_offset // restore start with v4, v0 will be the last + s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 256*4 + s_mov_b32 m0, 4 //VGPR initial index value = 4 + s_cmp_lt_u32 m0, s_restore_alloc_size + s_cbranch_scc0 L_RESTORE_SHARED_VGPR + +L_RESTORE_VGPR_WAVE64_LOOP: + buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS + buffer_load_dword v1, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS offset:256 + buffer_load_dword v2, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS offset:256*2 + buffer_load_dword v3, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS offset:256*3 + s_wait_idle + v_movreld_b32 v0, v0 //v[0+m0] = v0 + v_movreld_b32 v1, v1 + v_movreld_b32 v2, v2 + v_movreld_b32 v3, v3 + s_add_u32 m0, m0, 4 //next vgpr index + s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 256*4 //every buffer_load_dword does 256 bytes + s_cmp_lt_u32 m0, s_restore_alloc_size //scc = (m0 < s_restore_alloc_size) ? 1 : 0 + s_cbranch_scc1 L_RESTORE_VGPR_WAVE64_LOOP //VGPR restore (except v0) is complete? + +L_RESTORE_SHARED_VGPR: + s_getreg_b32 s_restore_alloc_size, hwreg(HW_REG_WAVE_LDS_ALLOC,SQ_WAVE_LDS_ALLOC_VGPR_SHARED_SIZE_SHIFT,SQ_WAVE_LDS_ALLOC_VGPR_SHARED_SIZE_SIZE) //shared_vgpr_size + s_and_b32 s_restore_alloc_size, s_restore_alloc_size, 0xFFFFFFFF //shared_vgpr_size is zero? + s_cbranch_scc0 L_RESTORE_V0 //no shared_vgpr used? + s_lshl_b32 s_restore_alloc_size, s_restore_alloc_size, 3 //Number of SHARED_VGPRs = shared_vgpr_size * 8 (non-zero value) + //m0 now has the value of normal vgpr count, just add the m0 with shared_vgpr count to get the total count. + //restore shared_vgpr will start from the index of m0 + s_add_u32 s_restore_alloc_size, s_restore_alloc_size, m0 + s_mov_b32 exec_lo, 0xFFFFFFFF + s_mov_b32 exec_hi, 0x00000000 +L_RESTORE_SHARED_VGPR_WAVE64_LOOP: + buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS + s_wait_idle + v_movreld_b32 v0, v0 //v[0+m0] = v0 + s_add_u32 m0, m0, 1 //next vgpr index + s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 128 + s_cmp_lt_u32 m0, s_restore_alloc_size //scc = (m0 < s_restore_alloc_size) ? 1 : 0 + s_cbranch_scc1 L_RESTORE_SHARED_VGPR_WAVE64_LOOP //VGPR restore (except v0) is complete? + + s_mov_b32 exec_hi, 0xFFFFFFFF //restore back exec_hi before restoring V0!! 
+ + /* VGPR restore on v0 */ +L_RESTORE_V0: + buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save scope:SCOPE_SYS + buffer_load_dword v1, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save scope:SCOPE_SYS offset:256 + buffer_load_dword v2, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save scope:SCOPE_SYS offset:256*2 + buffer_load_dword v3, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save scope:SCOPE_SYS offset:256*3 + s_wait_idle + + /* restore SGPRs */ + //will be 2+8+16*6 + // SGPR SR memory offset : size(VGPR)+size(SVGPR) +L_RESTORE_SGPR: + get_vgpr_size_bytes(s_restore_mem_offset, s_restore_size) + get_svgpr_size_bytes(s_restore_tmp) + s_add_u32 s_restore_mem_offset, s_restore_mem_offset, s_restore_tmp + s_add_u32 s_restore_mem_offset, s_restore_mem_offset, get_sgpr_size_bytes() + s_sub_u32 s_restore_mem_offset, s_restore_mem_offset, 20*4 //s108~s127 is not saved + + s_mov_b32 s_restore_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes + + s_mov_b32 m0, s_sgpr_save_num + + read_4sgpr_from_mem(s0, s_restore_buf_rsrc0, s_restore_mem_offset) + s_wait_idle + + s_sub_u32 m0, m0, 4 // Restore from S[0] to S[104] + s_nop 0 // hazard SALU M0=> S_MOVREL + + s_movreld_b64 s0, s0 //s[0+m0] = s0 + s_movreld_b64 s2, s2 + + read_8sgpr_from_mem(s0, s_restore_buf_rsrc0, s_restore_mem_offset) + s_wait_idle + + s_sub_u32 m0, m0, 8 // Restore from S[0] to S[96] + s_nop 0 // hazard SALU M0=> S_MOVREL + + s_movreld_b64 s0, s0 //s[0+m0] = s0 + s_movreld_b64 s2, s2 + s_movreld_b64 s4, s4 + s_movreld_b64 s6, s6 + + L_RESTORE_SGPR_LOOP: + read_16sgpr_from_mem(s0, s_restore_buf_rsrc0, s_restore_mem_offset) + s_wait_idle + + s_sub_u32 m0, m0, 16 // Restore from S[n] to S[0] + s_nop 0 // hazard SALU M0=> S_MOVREL + + s_movreld_b64 s0, s0 //s[0+m0] = s0 + s_movreld_b64 s2, s2 + s_movreld_b64 s4, s4 + s_movreld_b64 s6, s6 + s_movreld_b64 s8, s8 + s_movreld_b64 s10, s10 + s_movreld_b64 s12, s12 + s_movreld_b64 s14, s14 + + s_cmp_eq_u32 m0, 0 //scc = (m0 < s_sgpr_save_num) ? 1 : 0 + s_cbranch_scc0 L_RESTORE_SGPR_LOOP + + // s_barrier with STATE_PRIV.TRAP_AFTER_INST=1, STATUS.PRIV=1 incorrectly asserts debug exception. + // Clear DEBUG_EN before and restore MODE after the barrier. + s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE), 0 + + /* restore HW registers */ +L_RESTORE_HWREG: + // HWREG SR memory offset : size(VGPR)+size(SVGPR)+size(SGPR) + get_vgpr_size_bytes(s_restore_mem_offset, s_restore_size) + get_svgpr_size_bytes(s_restore_tmp) + s_add_u32 s_restore_mem_offset, s_restore_mem_offset, s_restore_tmp + s_add_u32 s_restore_mem_offset, s_restore_mem_offset, get_sgpr_size_bytes() + + s_mov_b32 s_restore_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes + + // Restore s_restore_spi_init_hi before the saved value gets clobbered. 
+ s_mov_b32 s_restore_spi_init_hi, s_restore_spi_init_hi_save + + read_hwreg_from_mem(s_restore_m0, s_restore_buf_rsrc0, s_restore_mem_offset) + read_hwreg_from_mem(s_restore_pc_lo, s_restore_buf_rsrc0, s_restore_mem_offset) + read_hwreg_from_mem(s_restore_pc_hi, s_restore_buf_rsrc0, s_restore_mem_offset) + read_hwreg_from_mem(s_restore_exec_lo, s_restore_buf_rsrc0, s_restore_mem_offset) + read_hwreg_from_mem(s_restore_exec_hi, s_restore_buf_rsrc0, s_restore_mem_offset) + read_hwreg_from_mem(s_restore_state_priv, s_restore_buf_rsrc0, s_restore_mem_offset) + read_hwreg_from_mem(s_restore_excp_flag_priv, s_restore_buf_rsrc0, s_restore_mem_offset) + read_hwreg_from_mem(s_restore_xnack_mask, s_restore_buf_rsrc0, s_restore_mem_offset) + read_hwreg_from_mem(s_restore_mode, s_restore_buf_rsrc0, s_restore_mem_offset) + read_hwreg_from_mem(s_restore_flat_scratch, s_restore_buf_rsrc0, s_restore_mem_offset) + s_wait_idle + + s_setreg_b32 hwreg(HW_REG_WAVE_SCRATCH_BASE_LO), s_restore_flat_scratch + + read_hwreg_from_mem(s_restore_flat_scratch, s_restore_buf_rsrc0, s_restore_mem_offset) + s_wait_idle + + s_setreg_b32 hwreg(HW_REG_WAVE_SCRATCH_BASE_HI), s_restore_flat_scratch + + read_hwreg_from_mem(s_restore_tmp, s_restore_buf_rsrc0, s_restore_mem_offset) + s_wait_idle + s_setreg_b32 hwreg(HW_REG_WAVE_EXCP_FLAG_USER), s_restore_tmp + + read_hwreg_from_mem(s_restore_tmp, s_restore_buf_rsrc0, s_restore_mem_offset) + s_wait_idle + s_setreg_b32 hwreg(HW_REG_WAVE_TRAP_CTRL), s_restore_tmp + + // Only the first wave needs to restore the workgroup barrier. + s_and_b32 s_restore_tmp, s_restore_spi_init_hi, S_RESTORE_SPI_INIT_FIRST_WAVE_MASK + s_cbranch_scc0 L_SKIP_BARRIER_RESTORE + + // Skip over WAVE_STATUS, since there is no state to restore from it + s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 4 + + read_hwreg_from_mem(s_restore_tmp, s_restore_buf_rsrc0, s_restore_mem_offset) + s_wait_idle + + s_bitcmp1_b32 s_restore_tmp, BARRIER_STATE_VALID_OFFSET + s_cbranch_scc0 L_SKIP_BARRIER_RESTORE + + // extract the saved signal count from s_restore_tmp + s_lshr_b32 s_restore_tmp, s_restore_tmp, BARRIER_STATE_SIGNAL_OFFSET + + // We need to call s_barrier_signal repeatedly to restore the signal + // count of the work group barrier. The member count is already + // initialized with the number of waves in the work group. +L_BARRIER_RESTORE_LOOP: + s_and_b32 s_restore_tmp, s_restore_tmp, s_restore_tmp + s_cbranch_scc0 L_SKIP_BARRIER_RESTORE + s_barrier_signal -1 + s_add_i32 s_restore_tmp, s_restore_tmp, -1 + s_branch L_BARRIER_RESTORE_LOOP + +L_SKIP_BARRIER_RESTORE: + + s_mov_b32 m0, s_restore_m0 + s_mov_b32 exec_lo, s_restore_exec_lo + s_mov_b32 exec_hi, s_restore_exec_hi + + // EXCP_FLAG_PRIV.SAVE_CONTEXT and HOST_TRAP may have changed. + // Only restore the other fields to avoid clobbering them. 
+ s_setreg_b32 hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV, 0, SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_1_SIZE), s_restore_excp_flag_priv + s_lshr_b32 s_restore_excp_flag_priv, s_restore_excp_flag_priv, SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_2_SHIFT + s_setreg_b32 hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV, SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_2_SHIFT, SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_2_SIZE), s_restore_excp_flag_priv + s_lshr_b32 s_restore_excp_flag_priv, s_restore_excp_flag_priv, SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_3_SHIFT - SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_2_SHIFT + s_setreg_b32 hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV, SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_3_SHIFT, SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_3_SIZE), s_restore_excp_flag_priv + + s_setreg_b32 hwreg(HW_REG_WAVE_MODE), s_restore_mode + + // Restore trap temporaries 4-11, 13 initialized by SPI debug dispatch logic + // ttmp SR memory offset : size(VGPR)+size(SVGPR)+size(SGPR)+0x40 + get_vgpr_size_bytes(s_restore_ttmps_lo, s_restore_size) + get_svgpr_size_bytes(s_restore_ttmps_hi) + s_add_u32 s_restore_ttmps_lo, s_restore_ttmps_lo, s_restore_ttmps_hi + s_add_u32 s_restore_ttmps_lo, s_restore_ttmps_lo, get_sgpr_size_bytes() + s_add_u32 s_restore_ttmps_lo, s_restore_ttmps_lo, s_restore_buf_rsrc0 + s_addc_u32 s_restore_ttmps_hi, s_restore_buf_rsrc1, 0x0 + s_and_b32 s_restore_ttmps_hi, s_restore_ttmps_hi, 0xFFFF + s_load_dwordx4 [ttmp4, ttmp5, ttmp6, ttmp7], [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x50 scope:SCOPE_SYS + s_load_dwordx4 [ttmp8, ttmp9, ttmp10, ttmp11], [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x60 scope:SCOPE_SYS + s_load_dword ttmp13, [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x74 scope:SCOPE_SYS + s_wait_idle + + s_and_b32 s_restore_pc_hi, s_restore_pc_hi, 0x0000ffff //pc[47:32] //Do it here in order not to affect STATUS + s_and_b64 exec, exec, exec // Restore STATUS.EXECZ, not writable by s_setreg_b32 + s_and_b64 vcc, vcc, vcc // Restore STATUS.VCCZ, not writable by s_setreg_b32 + + s_setreg_b32 hwreg(HW_REG_WAVE_STATE_PRIV), s_restore_state_priv // SCC is included, which is changed by previous salu + + // Make barrier and LDS state visible to all waves in the group. + // STATE_PRIV.BARRIER_COMPLETE may change after this point. + s_barrier_signal -2 + s_barrier_wait -2 + + s_rfe_b64 s_restore_pc_lo //Return to the main shader program and resume execution + +L_END_PGM: + s_endpgm_saved +end + +function write_hwreg_to_v2(s) + // Copy into VGPR for later TCP store. + v_writelane_b32 v2, s, m0 + s_add_u32 m0, m0, 0x1 +end + + +function write_16sgpr_to_v2(s) + // Copy into VGPR for later TCP store. + for var sgpr_idx = 0; sgpr_idx < 16; sgpr_idx ++ + v_writelane_b32 v2, s[sgpr_idx], ttmp13 + s_add_u32 ttmp13, ttmp13, 0x1 + end +end + +function write_12sgpr_to_v2(s) + // Copy into VGPR for later TCP store. 
+ for var sgpr_idx = 0; sgpr_idx < 12; sgpr_idx ++ + v_writelane_b32 v2, s[sgpr_idx], ttmp13 + s_add_u32 ttmp13, ttmp13, 0x1 + end +end + +function read_hwreg_from_mem(s, s_rsrc, s_mem_offset) + s_buffer_load_dword s, s_rsrc, s_mem_offset scope:SCOPE_SYS + s_add_u32 s_mem_offset, s_mem_offset, 4 +end + +function read_16sgpr_from_mem(s, s_rsrc, s_mem_offset) + s_sub_u32 s_mem_offset, s_mem_offset, 4*16 + s_buffer_load_dwordx16 s, s_rsrc, s_mem_offset scope:SCOPE_SYS +end + +function read_8sgpr_from_mem(s, s_rsrc, s_mem_offset) + s_sub_u32 s_mem_offset, s_mem_offset, 4*8 + s_buffer_load_dwordx8 s, s_rsrc, s_mem_offset scope:SCOPE_SYS +end + +function read_4sgpr_from_mem(s, s_rsrc, s_mem_offset) + s_sub_u32 s_mem_offset, s_mem_offset, 4*4 + s_buffer_load_dwordx4 s, s_rsrc, s_mem_offset scope:SCOPE_SYS +end + +function get_vgpr_size_bytes(s_vgpr_size_byte, s_size) + s_getreg_b32 s_vgpr_size_byte, hwreg(HW_REG_WAVE_GPR_ALLOC,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SHIFT,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SIZE) + s_add_u32 s_vgpr_size_byte, s_vgpr_size_byte, 1 + s_bitcmp1_b32 s_size, S_WAVE_SIZE + s_cbranch_scc1 L_ENABLE_SHIFT_W64 + s_lshl_b32 s_vgpr_size_byte, s_vgpr_size_byte, (2+7) //Number of VGPRs = (vgpr_size + 1) * 4 * 32 * 4 (non-zero value) + s_branch L_SHIFT_DONE +L_ENABLE_SHIFT_W64: + s_lshl_b32 s_vgpr_size_byte, s_vgpr_size_byte, (2+8) //Number of VGPRs = (vgpr_size + 1) * 4 * 64 * 4 (non-zero value) +L_SHIFT_DONE: +end + +function get_svgpr_size_bytes(s_svgpr_size_byte) + s_getreg_b32 s_svgpr_size_byte, hwreg(HW_REG_WAVE_LDS_ALLOC,SQ_WAVE_LDS_ALLOC_VGPR_SHARED_SIZE_SHIFT,SQ_WAVE_LDS_ALLOC_VGPR_SHARED_SIZE_SIZE) + s_lshl_b32 s_svgpr_size_byte, s_svgpr_size_byte, (3+7) +end + +function get_sgpr_size_bytes + return 512 +end + +function get_hwreg_size_bytes + return 128 +end + +function get_wave_size2(s_reg) + s_getreg_b32 s_reg, hwreg(HW_REG_WAVE_STATUS,SQ_WAVE_STATUS_WAVE64_SHIFT,SQ_WAVE_STATUS_WAVE64_SIZE) + s_lshl_b32 s_reg, s_reg, S_WAVE_SIZE +end
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lancelot SIX lancelot.six@amd.com
[ Upstream commit d584198a6fe4c51f4aa88ad72f258f8961a0f11c ]
It is possible for some waves in a workgroup to finish their save sequence before the group leader has had time to capture the workgroup barrier state. When this happens, those waves exiting alters the barrier state. As a consequence, the state captured by the group leader is invalid, and is eventually restored incorrectly.
This patch proposes to have all waves in a workgroup wait for each other at the end of their save sequence (just before calling s_endpgm_saved).
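As a host-side analogy only (a hedged sketch in C, not the GPU trap handler; all names are invented for the example): the bug is equivalent to workers tearing down shared state before the leader has snapshotted it, and the fix is a final rendezvous before anyone exits, which orders the leader's read before every exit.

	/* Analogy sketch: the barrier guarantees the leader's snapshot
	 * happens-before any "wave" exits and perturbs the state. */
	#include <pthread.h>
	#include <stdio.h>

	#define NWAVES 4

	static pthread_barrier_t exit_barrier;
	static int member_count = NWAVES;	/* state perturbed by exiting waves */
	static int leader_snapshot;

	static void *wave(void *arg)
	{
		long id = (long)arg;

		if (id == 0)	/* group leader captures the shared state */
			leader_snapshot = __atomic_load_n(&member_count, __ATOMIC_SEQ_CST);

		/* The fix: nobody may exit before everyone arrives here. */
		pthread_barrier_wait(&exit_barrier);

		__atomic_sub_fetch(&member_count, 1, __ATOMIC_SEQ_CST);	/* "exit" */
		return NULL;
	}

	int main(void)
	{
		pthread_t waves[NWAVES];
		long i;

		pthread_barrier_init(&exit_barrier, NULL, NWAVES);
		for (i = 0; i < NWAVES; i++)
			pthread_create(&waves[i], NULL, wave, (void *)i);
		for (i = 0; i < NWAVES; i++)
			pthread_join(waves[i], NULL);
		printf("leader saved member count = %d\n", leader_snapshot);	/* always 4 */
		pthread_barrier_destroy(&exit_barrier);
		return 0;
	}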
Signed-off-by: Lancelot SIX lancelot.six@amd.com Reviewed-by: Jay Cornwall jay.cornwall@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h | 3 ++- drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx12.asm | 4 ++++ 2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h index 02f7ba8c93cd4..7062f12b5b751 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h @@ -4117,7 +4117,8 @@ static const uint32_t cwsr_trap_gfx12_hex[] = { 0x0000ffff, 0x8bfe7e7e, 0x8bea6a6a, 0xb97af804, 0xbe804ec2, 0xbf94fffe, - 0xbe804a6c, 0xbfb10000, + 0xbe804a6c, 0xbe804ec2, + 0xbf94fffe, 0xbfb10000, 0xbf9f0000, 0xbf9f0000, 0xbf9f0000, 0xbf9f0000, 0xbf9f0000, 0x00000000, diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx12.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx12.asm index 1740e98c6719d..7b9d36e5fa437 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx12.asm +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx12.asm @@ -1049,6 +1049,10 @@ L_SKIP_BARRIER_RESTORE: s_rfe_b64 s_restore_pc_lo //Return to the main shader program and resume execution
L_END_PGM: + // Make sure that no wave of the workgroup can exit the trap handler + // before the workgroup barrier state is saved. + s_barrier_signal -2 + s_barrier_wait -2 s_endpgm_saved end
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt rostedt@goodmis.org
[ Upstream commit d33b10c0c73adca00f72bf4a153a07b7f5f34715 ]
There are several functions in trace.c that use "goto out;" or equivalent on error paths in order to release locks or free values that were allocated. This can be error-prone, or simply make the code more complex.
Switch every location that ends with unlocking a mutex or freeing on error over to using the guard(mutex)() and __free() infrastructure to let the compiler worry about releasing locks. This makes the code easier to read and understand.
There's one place that should probably return an error but instead returns 0. This patch does not change that return value, as these changes only perform the conversion without altering the logic. Fixing that location will have to come later.
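For reference, a minimal sketch of the pattern being applied (the lock, struct and function names below are illustrative, not from trace.c): guard(mutex)() drops the lock automatically at every return, and __free(kfree) paired with no_free_ptr() frees an allocation on any early exit while letting the success path keep ownership.

	/* Minimal sketch of the <linux/cleanup.h> pattern; every name here
	 * is an illustrative example, not trace.c code. */
	#include <linux/cleanup.h>
	#include <linux/mutex.h>
	#include <linux/slab.h>

	static DEFINE_MUTEX(example_lock);

	struct example_item {
		int val;
	};

	static struct example_item *installed;

	static int example_install(int val)
	{
		/* kfree()d automatically on any return unless ownership is taken */
		struct example_item *item __free(kfree) =
			kzalloc(sizeof(*item), GFP_KERNEL);

		if (!item)
			return -ENOMEM;

		/* unlocked automatically when the function returns */
		guard(mutex)(&example_lock);

		if (installed)
			return -EBUSY;	/* no goto: item freed, mutex dropped */

		item->val = val;
		installed = no_free_ptr(item);	/* success: keep the allocation */
		return 0;
	}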
Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Peter Zijlstra peterz@infradead.org Cc: Andrew Morton akpm@linux-foundation.org Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Link: https://lore.kernel.org/20241224221413.7b8c68c3@batman.local.home Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Stable-dep-of: 60b8f711143d ("tracing: Have the error of __tracing_resize_ring_buffer() passed to user") Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace.c | 266 +++++++++++++++---------------------------- 1 file changed, 94 insertions(+), 172 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index bfc4ac265c2c3..f03eef90de54c 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -26,6 +26,7 @@ #include <linux/hardirq.h> #include <linux/linkage.h> #include <linux/uaccess.h> +#include <linux/cleanup.h> #include <linux/vmalloc.h> #include <linux/ftrace.h> #include <linux/module.h> @@ -535,19 +536,16 @@ LIST_HEAD(ftrace_trace_arrays); int trace_array_get(struct trace_array *this_tr) { struct trace_array *tr; - int ret = -ENODEV;
- mutex_lock(&trace_types_lock); + guard(mutex)(&trace_types_lock); list_for_each_entry(tr, &ftrace_trace_arrays, list) { if (tr == this_tr) { tr->ref++; - ret = 0; - break; + return 0; } } - mutex_unlock(&trace_types_lock);
- return ret; + return -ENODEV; }
static void __trace_array_put(struct trace_array *this_tr) @@ -1456,22 +1454,20 @@ EXPORT_SYMBOL_GPL(tracing_snapshot_alloc); int tracing_snapshot_cond_enable(struct trace_array *tr, void *cond_data, cond_update_fn_t update) { - struct cond_snapshot *cond_snapshot; - int ret = 0; + struct cond_snapshot *cond_snapshot __free(kfree) = + kzalloc(sizeof(*cond_snapshot), GFP_KERNEL); + int ret;
- cond_snapshot = kzalloc(sizeof(*cond_snapshot), GFP_KERNEL); if (!cond_snapshot) return -ENOMEM;
cond_snapshot->cond_data = cond_data; cond_snapshot->update = update;
- mutex_lock(&trace_types_lock); + guard(mutex)(&trace_types_lock);
- if (tr->current_trace->use_max_tr) { - ret = -EBUSY; - goto fail_unlock; - } + if (tr->current_trace->use_max_tr) + return -EBUSY;
/* * The cond_snapshot can only change to NULL without the @@ -1481,29 +1477,20 @@ int tracing_snapshot_cond_enable(struct trace_array *tr, void *cond_data, * do safely with only holding the trace_types_lock and not * having to take the max_lock. */ - if (tr->cond_snapshot) { - ret = -EBUSY; - goto fail_unlock; - } + if (tr->cond_snapshot) + return -EBUSY;
ret = tracing_arm_snapshot_locked(tr); if (ret) - goto fail_unlock; + return ret;
local_irq_disable(); arch_spin_lock(&tr->max_lock); - tr->cond_snapshot = cond_snapshot; + tr->cond_snapshot = no_free_ptr(cond_snapshot); arch_spin_unlock(&tr->max_lock); local_irq_enable();
- mutex_unlock(&trace_types_lock); - - return ret; - - fail_unlock: - mutex_unlock(&trace_types_lock); - kfree(cond_snapshot); - return ret; + return 0; } EXPORT_SYMBOL_GPL(tracing_snapshot_cond_enable);
@@ -2216,10 +2203,10 @@ static __init int init_trace_selftests(void)
selftests_can_run = true;
- mutex_lock(&trace_types_lock); + guard(mutex)(&trace_types_lock);
if (list_empty(&postponed_selftests)) - goto out; + return 0;
pr_info("Running postponed tracer tests:\n");
@@ -2248,9 +2235,6 @@ static __init int init_trace_selftests(void) } tracing_selftest_running = false;
- out: - mutex_unlock(&trace_types_lock); - return 0; } core_initcall(init_trace_selftests); @@ -2818,7 +2802,7 @@ int tracepoint_printk_sysctl(const struct ctl_table *table, int write, int save_tracepoint_printk; int ret;
- mutex_lock(&tracepoint_printk_mutex); + guard(mutex)(&tracepoint_printk_mutex); save_tracepoint_printk = tracepoint_printk;
ret = proc_dointvec(table, write, buffer, lenp, ppos); @@ -2831,16 +2815,13 @@ int tracepoint_printk_sysctl(const struct ctl_table *table, int write, tracepoint_printk = 0;
if (save_tracepoint_printk == tracepoint_printk) - goto out; + return ret;
if (tracepoint_printk) static_key_enable(&tracepoint_printk_key.key); else static_key_disable(&tracepoint_printk_key.key);
- out: - mutex_unlock(&tracepoint_printk_mutex); - return ret; }
@@ -5150,7 +5131,8 @@ static int tracing_trace_options_show(struct seq_file *m, void *v) u32 tracer_flags; int i;
- mutex_lock(&trace_types_lock); + guard(mutex)(&trace_types_lock); + tracer_flags = tr->current_trace->flags->val; trace_opts = tr->current_trace->flags->opts;
@@ -5167,7 +5149,6 @@ static int tracing_trace_options_show(struct seq_file *m, void *v) else seq_printf(m, "no%s\n", trace_opts[i].name); } - mutex_unlock(&trace_types_lock);
return 0; } @@ -5832,7 +5813,7 @@ trace_insert_eval_map_file(struct module *mod, struct trace_eval_map **start, return; }
- mutex_lock(&trace_eval_mutex); + guard(mutex)(&trace_eval_mutex);
if (!trace_eval_maps) trace_eval_maps = map_array; @@ -5856,8 +5837,6 @@ trace_insert_eval_map_file(struct module *mod, struct trace_eval_map **start, map_array++; } memset(map_array, 0, sizeof(*map_array)); - - mutex_unlock(&trace_eval_mutex); }
static void trace_create_eval_file(struct dentry *d_tracer) @@ -6021,23 +6000,18 @@ ssize_t tracing_resize_ring_buffer(struct trace_array *tr, { int ret;
- mutex_lock(&trace_types_lock); + guard(mutex)(&trace_types_lock);
if (cpu_id != RING_BUFFER_ALL_CPUS) { /* make sure, this cpu is enabled in the mask */ - if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) { - ret = -EINVAL; - goto out; - } + if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) + return -EINVAL; }
ret = __tracing_resize_ring_buffer(tr, size, cpu_id); if (ret < 0) ret = -ENOMEM;
-out: - mutex_unlock(&trace_types_lock); - return ret; }
@@ -6129,9 +6103,9 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf) #ifdef CONFIG_TRACER_MAX_TRACE bool had_max_tr; #endif - int ret = 0; + int ret;
- mutex_lock(&trace_types_lock); + guard(mutex)(&trace_types_lock);
update_last_data(tr);
@@ -6139,7 +6113,7 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf) ret = __tracing_resize_ring_buffer(tr, trace_buf_size, RING_BUFFER_ALL_CPUS); if (ret < 0) - goto out; + return ret; ret = 0; }
@@ -6147,12 +6121,11 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf) if (strcmp(t->name, buf) == 0) break; } - if (!t) { - ret = -EINVAL; - goto out; - } + if (!t) + return -EINVAL; + if (t == tr->current_trace) - goto out; + return 0;
#ifdef CONFIG_TRACER_SNAPSHOT if (t->use_max_tr) { @@ -6163,27 +6136,23 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf) arch_spin_unlock(&tr->max_lock); local_irq_enable(); if (ret) - goto out; + return ret; } #endif /* Some tracers won't work on kernel command line */ if (system_state < SYSTEM_RUNNING && t->noboot) { pr_warn("Tracer '%s' is not allowed on command line, ignored\n", t->name); - goto out; + return 0; }
/* Some tracers are only allowed for the top level buffer */ - if (!trace_ok_for_array(t, tr)) { - ret = -EINVAL; - goto out; - } + if (!trace_ok_for_array(t, tr)) + return -EINVAL;
/* If trace pipe files are being read, we can't change the tracer */ - if (tr->trace_ref) { - ret = -EBUSY; - goto out; - } + if (tr->trace_ref) + return -EBUSY;
trace_branch_disable();
@@ -6214,7 +6183,7 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf) if (!had_max_tr && t->use_max_tr) { ret = tracing_arm_snapshot_locked(tr); if (ret) - goto out; + return ret; } #else tr->current_trace = &nop_trace; @@ -6227,17 +6196,15 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf) if (t->use_max_tr) tracing_disarm_snapshot(tr); #endif - goto out; + return ret; } }
tr->current_trace = t; tr->current_trace->enabled++; trace_branch_enable(tr); - out: - mutex_unlock(&trace_types_lock);
- return ret; + return 0; }
static ssize_t @@ -6315,22 +6282,18 @@ tracing_thresh_write(struct file *filp, const char __user *ubuf, struct trace_array *tr = filp->private_data; int ret;
- mutex_lock(&trace_types_lock); + guard(mutex)(&trace_types_lock); ret = tracing_nsecs_write(&tracing_thresh, ubuf, cnt, ppos); if (ret < 0) - goto out; + return ret;
if (tr->current_trace->update_thresh) { ret = tr->current_trace->update_thresh(tr); if (ret < 0) - goto out; + return ret; }
- ret = cnt; -out: - mutex_unlock(&trace_types_lock); - - return ret; + return cnt; }
#ifdef CONFIG_TRACER_MAX_TRACE @@ -6549,31 +6512,29 @@ tracing_read_pipe(struct file *filp, char __user *ubuf, * This is just a matter of traces coherency, the ring buffer itself * is protected. */ - mutex_lock(&iter->mutex); + guard(mutex)(&iter->mutex);
/* return any leftover data */ sret = trace_seq_to_user(&iter->seq, ubuf, cnt); if (sret != -EBUSY) - goto out; + return sret;
trace_seq_init(&iter->seq);
if (iter->trace->read) { sret = iter->trace->read(iter, filp, ubuf, cnt, ppos); if (sret) - goto out; + return sret; }
waitagain: sret = tracing_wait_pipe(filp); if (sret <= 0) - goto out; + return sret;
/* stop when tracing is finished */ - if (trace_empty(iter)) { - sret = 0; - goto out; - } + if (trace_empty(iter)) + return 0;
if (cnt >= TRACE_SEQ_BUFFER_SIZE) cnt = TRACE_SEQ_BUFFER_SIZE - 1; @@ -6637,9 +6598,6 @@ tracing_read_pipe(struct file *filp, char __user *ubuf, if (sret == -EBUSY) goto waitagain;
-out: - mutex_unlock(&iter->mutex); - return sret; }
@@ -7231,25 +7189,19 @@ u64 tracing_event_time_stamp(struct trace_buffer *buffer, struct ring_buffer_eve */ int tracing_set_filter_buffering(struct trace_array *tr, bool set) { - int ret = 0; - - mutex_lock(&trace_types_lock); + guard(mutex)(&trace_types_lock);
if (set && tr->no_filter_buffering_ref++) - goto out; + return 0;
if (!set) { - if (WARN_ON_ONCE(!tr->no_filter_buffering_ref)) { - ret = -EINVAL; - goto out; - } + if (WARN_ON_ONCE(!tr->no_filter_buffering_ref)) + return -EINVAL;
--tr->no_filter_buffering_ref; } - out: - mutex_unlock(&trace_types_lock);
- return ret; + return 0; }
struct ftrace_buffer_info { @@ -7325,12 +7277,10 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt, if (ret) return ret;
- mutex_lock(&trace_types_lock); + guard(mutex)(&trace_types_lock);
- if (tr->current_trace->use_max_tr) { - ret = -EBUSY; - goto out; - } + if (tr->current_trace->use_max_tr) + return -EBUSY;
local_irq_disable(); arch_spin_lock(&tr->max_lock); @@ -7339,24 +7289,20 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt, arch_spin_unlock(&tr->max_lock); local_irq_enable(); if (ret) - goto out; + return ret;
switch (val) { case 0: - if (iter->cpu_file != RING_BUFFER_ALL_CPUS) { - ret = -EINVAL; - break; - } + if (iter->cpu_file != RING_BUFFER_ALL_CPUS) + return -EINVAL; if (tr->allocated_snapshot) free_snapshot(tr); break; case 1: /* Only allow per-cpu swap if the ring buffer supports it */ #ifndef CONFIG_RING_BUFFER_ALLOW_SWAP - if (iter->cpu_file != RING_BUFFER_ALL_CPUS) { - ret = -EINVAL; - break; - } + if (iter->cpu_file != RING_BUFFER_ALL_CPUS) + return -EINVAL; #endif if (tr->allocated_snapshot) ret = resize_buffer_duplicate_size(&tr->max_buffer, @@ -7364,7 +7310,7 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt,
ret = tracing_arm_snapshot_locked(tr); if (ret) - break; + return ret;
/* Now, we're going to swap */ if (iter->cpu_file == RING_BUFFER_ALL_CPUS) { @@ -7391,8 +7337,7 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt, *ppos += cnt; ret = cnt; } -out: - mutex_unlock(&trace_types_lock); + return ret; }
@@ -7778,12 +7723,11 @@ void tracing_log_err(struct trace_array *tr,
len += sizeof(CMD_PREFIX) + 2 * sizeof("\n") + strlen(cmd) + 1;
- mutex_lock(&tracing_err_log_lock); + guard(mutex)(&tracing_err_log_lock); + err = get_tracing_log_err(tr, len); - if (PTR_ERR(err) == -ENOMEM) { - mutex_unlock(&tracing_err_log_lock); + if (PTR_ERR(err) == -ENOMEM) return; - }
snprintf(err->loc, TRACING_LOG_LOC_MAX, "%s: error: ", loc); snprintf(err->cmd, len, "\n" CMD_PREFIX "%s\n", cmd); @@ -7794,7 +7738,6 @@ void tracing_log_err(struct trace_array *tr, err->info.ts = local_clock();
list_add_tail(&err->list, &tr->err_log); - mutex_unlock(&tracing_err_log_lock); }
static void clear_tracing_err_log(struct trace_array *tr) @@ -9535,20 +9478,17 @@ static int instance_mkdir(const char *name) struct trace_array *tr; int ret;
- mutex_lock(&event_mutex); - mutex_lock(&trace_types_lock); + guard(mutex)(&event_mutex); + guard(mutex)(&trace_types_lock);
ret = -EEXIST; if (trace_array_find(name)) - goto out_unlock; + return -EEXIST;
tr = trace_array_create(name);
ret = PTR_ERR_OR_ZERO(tr);
-out_unlock: - mutex_unlock(&trace_types_lock); - mutex_unlock(&event_mutex); return ret; }
@@ -9598,24 +9538,23 @@ struct trace_array *trace_array_get_by_name(const char *name, const char *system { struct trace_array *tr;
- mutex_lock(&event_mutex); - mutex_lock(&trace_types_lock); + guard(mutex)(&event_mutex); + guard(mutex)(&trace_types_lock);
list_for_each_entry(tr, &ftrace_trace_arrays, list) { - if (tr->name && strcmp(tr->name, name) == 0) - goto out_unlock; + if (tr->name && strcmp(tr->name, name) == 0) { + tr->ref++; + return tr; + } }
tr = trace_array_create_systems(name, systems, 0, 0);
if (IS_ERR(tr)) tr = NULL; -out_unlock: - if (tr) + else tr->ref++;
- mutex_unlock(&trace_types_lock); - mutex_unlock(&event_mutex); return tr; } EXPORT_SYMBOL_GPL(trace_array_get_by_name); @@ -9666,48 +9605,36 @@ static int __remove_instance(struct trace_array *tr) int trace_array_destroy(struct trace_array *this_tr) { struct trace_array *tr; - int ret;
if (!this_tr) return -EINVAL;
- mutex_lock(&event_mutex); - mutex_lock(&trace_types_lock); + guard(mutex)(&event_mutex); + guard(mutex)(&trace_types_lock);
- ret = -ENODEV;
/* Making sure trace array exists before destroying it. */ list_for_each_entry(tr, &ftrace_trace_arrays, list) { - if (tr == this_tr) { - ret = __remove_instance(tr); - break; - } + if (tr == this_tr) + return __remove_instance(tr); }
- mutex_unlock(&trace_types_lock); - mutex_unlock(&event_mutex); - - return ret; + return -ENODEV; } EXPORT_SYMBOL_GPL(trace_array_destroy);
static int instance_rmdir(const char *name) { struct trace_array *tr; - int ret;
- mutex_lock(&event_mutex); - mutex_lock(&trace_types_lock); + guard(mutex)(&event_mutex); + guard(mutex)(&trace_types_lock);
- ret = -ENODEV; tr = trace_array_find(name); - if (tr) - ret = __remove_instance(tr); - - mutex_unlock(&trace_types_lock); - mutex_unlock(&event_mutex); + if (!tr) + return -ENODEV;
- return ret; + return __remove_instance(tr); }
static __init void create_trace_instances(struct dentry *d_tracer) @@ -9720,19 +9647,16 @@ static __init void create_trace_instances(struct dentry *d_tracer) if (MEM_FAIL(!trace_instance_dir, "Failed to create instances directory\n")) return;
- mutex_lock(&event_mutex); - mutex_lock(&trace_types_lock); + guard(mutex)(&event_mutex); + guard(mutex)(&trace_types_lock);
list_for_each_entry(tr, &ftrace_trace_arrays, list) { if (!tr->name) continue; if (MEM_FAIL(trace_array_create_dir(tr) < 0, "Failed to create instance directory\n")) - break; + return; } - - mutex_unlock(&trace_types_lock); - mutex_unlock(&event_mutex); }
static void @@ -9946,7 +9870,7 @@ static void trace_module_remove_evals(struct module *mod) if (!mod->num_trace_evals) return;
- mutex_lock(&trace_eval_mutex); + guard(mutex)(&trace_eval_mutex);
map = trace_eval_maps;
@@ -9958,12 +9882,10 @@ static void trace_module_remove_evals(struct module *mod) map = map->tail.next; } if (!map) - goto out; + return;
*last = trace_eval_jmp_to_tail(map)->tail.next; kfree(map); - out: - mutex_unlock(&trace_eval_mutex); } #else static inline void trace_module_remove_evals(struct module *mod) { }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt rostedt@goodmis.org
[ Upstream commit 60b8f711143de7cd9c0f55be0fe7eb94b19eb5c7 ]
Currently, if __tracing_resize_ring_buffer() returns an error, tracing_resize_ring_buffer() returns -ENOMEM. But it may not be a memory issue that caused the function to fail. If the ring buffer is memory mapped, then resizing of the ring buffer is disabled. But if the user tries to resize the buffer, they get -ENOMEM returned, which is confusing because there is plenty of memory. The actual error returned was -EBUSY, which would make much more sense to the user.
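In other words (a simplified, illustrative sketch, not the actual trace.c code; the helper names here are invented for the example): the fix is to stop rewriting the callee's error and pass it through unchanged.

	/* Simplified sketch of error passthrough; every name is an
	 * illustrative stand-in, not a real kernel symbol. */
	#include <errno.h>
	#include <stdbool.h>
	#include <stdio.h>

	static bool buffer_is_mapped(void) { return true; }	/* stub */

	static int inner_resize(unsigned long size)
	{
		if (buffer_is_mapped())
			return -EBUSY;	/* resizing disabled while mapped */
		(void)size;
		return 0;
	}

	static int user_resize(unsigned long size)
	{
		/* before: ret = inner_resize(size); if (ret < 0) ret = -ENOMEM;
		 * after: the real cause (-EBUSY here) reaches the caller */
		return inner_resize(size);
	}

	int main(void)
	{
		printf("resize: %d\n", user_resize(4096));	/* -EBUSY, not -ENOMEM */
		return 0;
	}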
Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Vincent Donnefort vdonnefort@google.com Link: https://lore.kernel.org/20250213134132.7e4505d7@gandalf.local.home Fixes: 117c39200d9d7 ("ring-buffer: Introducing ring-buffer mapping functions") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index f03eef90de54c..1142a7802bb60 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -5998,8 +5998,6 @@ static int __tracing_resize_ring_buffer(struct trace_array *tr, ssize_t tracing_resize_ring_buffer(struct trace_array *tr, unsigned long size, int cpu_id) { - int ret; - guard(mutex)(&trace_types_lock);
if (cpu_id != RING_BUFFER_ALL_CPUS) { @@ -6008,11 +6006,7 @@ ssize_t tracing_resize_ring_buffer(struct trace_array *tr, return -EINVAL; }
- ret = __tracing_resize_ring_buffer(tr, size, cpu_id); - if (ret < 0) - ret = -ENOMEM; - - return ret; + return __tracing_resize_ring_buffer(tr, size, cpu_id); }
static void update_last_data(struct trace_array *tr)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jill Donahue jilliandonahue58@gmail.com
[ Upstream commit 4ab37fcb42832cdd3e9d5e50653285ca84d6686f ]
When using USB MIDI, the same lock can end up being acquired twice through a re-entrant call to f_midi_transmit, causing a deadlock.
Fix it by using queue_work() to schedule the inner f_midi_transmit() via a high-priority workqueue from the completion handler.
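As a hedged sketch of the general pattern (the types and names below are illustrative, not the f_midi internals): the completion handler stops calling the transmit path directly and instead schedules a work item, so the lock is only ever taken from the work function running in process context.

	/* Illustrative sketch of breaking a re-entrant locking cycle by
	 * deferring to a workqueue; names are examples, not f_midi code. */
	#include <linux/spinlock.h>
	#include <linux/workqueue.h>

	struct example_midi {
		spinlock_t lock;
		struct work_struct work;	/* INIT_WORK()ed at probe time */
	};

	static void example_transmit(struct work_struct *work)
	{
		struct example_midi *midi =
			container_of(work, struct example_midi, work);

		spin_lock_irq(&midi->lock);
		/* queue the next USB request; it may complete synchronously,
		 * but the completion handler below no longer re-takes the lock */
		spin_unlock_irq(&midi->lock);
	}

	static void example_complete(struct example_midi *midi)
	{
		/* may run in the middle of example_transmit() with the lock
		 * held: just reschedule instead of calling transmit directly */
		queue_work(system_highpri_wq, &midi->work);
	}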
Link: https://lore.kernel.org/all/CAArt=LjxU0fUZOj06X+5tkeGT+6RbXzpWg1h4t4Fwa_KGVA... Fixes: d5daf49b58661 ("USB: gadget: midi: add midi function driver") Cc: stable stable@kernel.org Signed-off-by: Jill Donahue jilliandonahue58@gmail.com Link: https://lore.kernel.org/r/20250211174805.1369265-1-jdonahue@fender.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/function/f_midi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/f_midi.c b/drivers/usb/gadget/function/f_midi.c index 4153643c67dce..1f18f15dba277 100644 --- a/drivers/usb/gadget/function/f_midi.c +++ b/drivers/usb/gadget/function/f_midi.c @@ -283,7 +283,7 @@ f_midi_complete(struct usb_ep *ep, struct usb_request *req) /* Our transmit completed. See if there's more to go. * f_midi_transmit eats req, don't queue it again. */ req->length = 0; - f_midi_transmit(midi); + queue_work(system_highpri_wq, &midi->work); return; } break;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tejun Heo tj@kernel.org
[ Upstream commit 8427acb6b5861d205abca7afa656a897bbae34b7 ]
Pure reorganization. No functional changes.
Signed-off-by: Tejun Heo tj@kernel.org Stable-dep-of: 32966821574c ("sched_ext: Fix migration disabled handling in targeted dispatches") Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/ext.c | 116 +++++++++++++++++++++++++++++---------------- 1 file changed, 75 insertions(+), 41 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c index 689f7e8f69f54..97076748dee0e 100644 --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -2397,6 +2397,73 @@ static inline bool task_can_run_on_remote_rq(struct task_struct *p, struct rq *r static inline bool consume_remote_task(struct rq *this_rq, struct task_struct *p, struct scx_dispatch_q *dsq, struct rq *task_rq) { return false; } #endif /* CONFIG_SMP */
+/** + * move_task_between_dsqs() - Move a task from one DSQ to another + * @p: target task + * @enq_flags: %SCX_ENQ_* + * @src_dsq: DSQ @p is currently on, must not be a local DSQ + * @dst_dsq: DSQ @p is being moved to, can be any DSQ + * + * Must be called with @p's task_rq and @src_dsq locked. If @dst_dsq is a local + * DSQ and @p is on a different CPU, @p will be migrated and thus its task_rq + * will change. As @p's task_rq is locked, this function doesn't need to use the + * holding_cpu mechanism. + * + * On return, @src_dsq is unlocked and only @p's new task_rq, which is the + * return value, is locked. + */ +static struct rq *move_task_between_dsqs(struct task_struct *p, u64 enq_flags, + struct scx_dispatch_q *src_dsq, + struct scx_dispatch_q *dst_dsq) +{ + struct rq *src_rq = task_rq(p), *dst_rq; + + BUG_ON(src_dsq->id == SCX_DSQ_LOCAL); + lockdep_assert_held(&src_dsq->lock); + lockdep_assert_rq_held(src_rq); + + if (dst_dsq->id == SCX_DSQ_LOCAL) { + dst_rq = container_of(dst_dsq, struct rq, scx.local_dsq); + if (!task_can_run_on_remote_rq(p, dst_rq, true)) { + dst_dsq = find_global_dsq(p); + dst_rq = src_rq; + } + } else { + /* no need to migrate if destination is a non-local DSQ */ + dst_rq = src_rq; + } + + /* + * Move @p into $dst_dsq. If $dst_dsq is the local DSQ of a different + * CPU, @p will be migrated. + */ + if (dst_dsq->id == SCX_DSQ_LOCAL) { + /* @p is going from a non-local DSQ to a local DSQ */ + if (src_rq == dst_rq) { + task_unlink_from_dsq(p, src_dsq); + move_local_task_to_local_dsq(p, enq_flags, + src_dsq, dst_rq); + raw_spin_unlock(&src_dsq->lock); + } else { + raw_spin_unlock(&src_dsq->lock); + move_remote_task_to_local_dsq(p, enq_flags, + src_rq, dst_rq); + } + } else { + /* + * @p is going from a non-local DSQ to a non-local DSQ. As + * $src_dsq is already locked, do an abbreviated dequeue. + */ + task_unlink_from_dsq(p, src_dsq); + p->scx.dsq = NULL; + raw_spin_unlock(&src_dsq->lock); + + dispatch_enqueue(dst_dsq, p, enq_flags); + } + + return dst_rq; +} + static bool consume_dispatch_q(struct rq *rq, struct scx_dispatch_q *dsq) { struct task_struct *p; @@ -6134,7 +6201,7 @@ static bool scx_dispatch_from_dsq(struct bpf_iter_scx_dsq_kern *kit, u64 enq_flags) { struct scx_dispatch_q *src_dsq = kit->dsq, *dst_dsq; - struct rq *this_rq, *src_rq, *dst_rq, *locked_rq; + struct rq *this_rq, *src_rq, *locked_rq; bool dispatched = false; bool in_balance; unsigned long flags; @@ -6180,51 +6247,18 @@ static bool scx_dispatch_from_dsq(struct bpf_iter_scx_dsq_kern *kit, /* @p is still on $src_dsq and stable, determine the destination */ dst_dsq = find_dsq_for_dispatch(this_rq, dsq_id, p);
- if (dst_dsq->id == SCX_DSQ_LOCAL) { - dst_rq = container_of(dst_dsq, struct rq, scx.local_dsq); - if (!task_can_run_on_remote_rq(p, dst_rq, true)) { - dst_dsq = find_global_dsq(p); - dst_rq = src_rq; - } - } else { - /* no need to migrate if destination is a non-local DSQ */ - dst_rq = src_rq; - } - /* - * Move @p into $dst_dsq. If $dst_dsq is the local DSQ of a different - * CPU, @p will be migrated. + * Apply vtime and slice updates before moving so that the new time is + * visible before inserting into $dst_dsq. @p is still on $src_dsq but + * this is safe as we're locking it. */ - if (dst_dsq->id == SCX_DSQ_LOCAL) { - /* @p is going from a non-local DSQ to a local DSQ */ - if (src_rq == dst_rq) { - task_unlink_from_dsq(p, src_dsq); - move_local_task_to_local_dsq(p, enq_flags, - src_dsq, dst_rq); - raw_spin_unlock(&src_dsq->lock); - } else { - raw_spin_unlock(&src_dsq->lock); - move_remote_task_to_local_dsq(p, enq_flags, - src_rq, dst_rq); - locked_rq = dst_rq; - } - } else { - /* - * @p is going from a non-local DSQ to a non-local DSQ. As - * $src_dsq is already locked, do an abbreviated dequeue. - */ - task_unlink_from_dsq(p, src_dsq); - p->scx.dsq = NULL; - raw_spin_unlock(&src_dsq->lock); - - if (kit->cursor.flags & __SCX_DSQ_ITER_HAS_VTIME) - p->scx.dsq_vtime = kit->vtime; - dispatch_enqueue(dst_dsq, p, enq_flags); - } - + if (kit->cursor.flags & __SCX_DSQ_ITER_HAS_VTIME) + p->scx.dsq_vtime = kit->vtime; if (kit->cursor.flags & __SCX_DSQ_ITER_HAS_SLICE) p->scx.slice = kit->slice;
+ /* execute move */ + locked_rq = move_task_between_dsqs(p, enq_flags, src_dsq, dst_dsq); dispatched = true; out: if (in_balance) {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tejun Heo tj@kernel.org
[ Upstream commit 32966821574cd2917bd60f2554f435fe527f4702 ]
A dispatch operation that can target a specific local DSQ - scx_bpf_dsq_move_to_local() or scx_bpf_dsq_move() - checks whether the task can be migrated to the target CPU using task_can_run_on_remote_rq(). If the task can't be migrated to the targeted CPU, it is bounced through a global DSQ.
task_can_run_on_remote_rq() assumes that the task is on a CPU that's different from the targeted CPU, but the callers don't uphold the assumption and may call the function when the task is already on the target CPU. When such a task has migration disabled, task_can_run_on_remote_rq() ends up incorrectly returning %false, unnecessarily bouncing the task to a global DSQ.
Fix it by updating the callers to only call task_can_run_on_remote_rq() when the task is on a different CPU than the target CPU. As this is a bit subtle, for clarity and documentation:
- Make task_can_run_on_remote_rq() trigger SCHED_WARN_ON() if the task is on the same CPU as the target CPU.
- The is_migration_disabled() test in task_can_run_on_remote_rq() cannot trigger if the task is on a different CPU than the target CPU, as the preceding task_allowed_on_cpu() test should fail beforehand. Convert the test into SCHED_WARN_ON().
Signed-off-by: Tejun Heo tj@kernel.org Fixes: 4c30f5ce4f7a ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()") Fixes: 0366017e0973 ("sched_ext: Use task_can_run_on_remote_rq() test in dispatch_to_local_dsq()") Cc: stable@vger.kernel.org # v6.12+ Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/ext.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c index 97076748dee0e..0fc18fc5753b0 100644 --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -2300,12 +2300,16 @@ static void move_remote_task_to_local_dsq(struct task_struct *p, u64 enq_flags, * * - The BPF scheduler is bypassed while the rq is offline and we can always say * no to the BPF scheduler initiated migrations while offline. + * + * The caller must ensure that @p and @rq are on different CPUs. */ static bool task_can_run_on_remote_rq(struct task_struct *p, struct rq *rq, bool trigger_error) { int cpu = cpu_of(rq);
+ SCHED_WARN_ON(task_cpu(p) == cpu); + /* * We don't require the BPF scheduler to avoid dispatching to offline * CPUs mostly for convenience but also because CPUs can go offline @@ -2319,8 +2323,11 @@ static bool task_can_run_on_remote_rq(struct task_struct *p, struct rq *rq, return false; }
- if (unlikely(is_migration_disabled(p))) - return false; + /* + * If @p has migration disabled, @p->cpus_ptr only contains its current + * CPU and the above task_allowed_on_cpu() test should have failed. + */ + SCHED_WARN_ON(is_migration_disabled(p));
if (!scx_rq_online(rq)) return false; @@ -2424,7 +2431,8 @@ static struct rq *move_task_between_dsqs(struct task_struct *p, u64 enq_flags,
if (dst_dsq->id == SCX_DSQ_LOCAL) { dst_rq = container_of(dst_dsq, struct rq, scx.local_dsq); - if (!task_can_run_on_remote_rq(p, dst_rq, true)) { + if (src_rq != dst_rq && + unlikely(!task_can_run_on_remote_rq(p, dst_rq, true))) { dst_dsq = find_global_dsq(p); dst_rq = src_rq; } @@ -2541,7 +2549,8 @@ static void dispatch_to_local_dsq(struct rq *rq, struct scx_dispatch_q *dst_dsq, }
#ifdef CONFIG_SMP - if (unlikely(!task_can_run_on_remote_rq(p, dst_rq, true))) { + if (src_rq != dst_rq && + unlikely(!task_can_run_on_remote_rq(p, dst_rq, true))) { dispatch_enqueue(find_global_dsq(p), p, enq_flags | SCX_ENQ_CLEAR_OPSS); return;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Keeping jkeeping@inmusicbrands.com
[ Upstream commit 6b24e67b4056ba83b1e95e005b7e50fdb1cc6cf4 ]
Commit 2f45a4e289779 ("ASoC: rockchip: i2s_tdm: Fixup config for SND_SOC_DAIFMT_DSP_A/B") applied a partial change to fix the configuration for DSP A and DSP B formats.
The shift control also needs updating to set the correct offset of the frame data relative to LRCK. Set the correct values.
Fixes: 081068fd64140 ("ASoC: rockchip: add support for i2s-tdm controller") Signed-off-by: John Keeping jkeeping@inmusicbrands.com Link: https://patch.msgid.link/20250204161311.2117240-1-jkeeping@inmusicbrands.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/rockchip/rockchip_i2s_tdm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/soc/rockchip/rockchip_i2s_tdm.c b/sound/soc/rockchip/rockchip_i2s_tdm.c index acd75e48851fc..7feefeb6b876d 100644 --- a/sound/soc/rockchip/rockchip_i2s_tdm.c +++ b/sound/soc/rockchip/rockchip_i2s_tdm.c @@ -451,11 +451,11 @@ static int rockchip_i2s_tdm_set_fmt(struct snd_soc_dai *cpu_dai, break; case SND_SOC_DAIFMT_DSP_A: val = I2S_TXCR_TFS_TDM_PCM; - tdm_val = TDM_SHIFT_CTRL(0); + tdm_val = TDM_SHIFT_CTRL(2); break; case SND_SOC_DAIFMT_DSP_B: val = I2S_TXCR_TFS_TDM_PCM; - tdm_val = TDM_SHIFT_CTRL(2); + tdm_val = TDM_SHIFT_CTRL(4); break; default: ret = -EINVAL;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Ujfalusi peter.ujfalusi@linux.intel.com
[ Upstream commit 6fd60136d256b3b948333ebdb3835f41a95ab7ef ]
Other, non-DAI copier widgets could have the same stream name (sname) as the ALH copier, in which case copier->data is NULL and no alh_data is attached, which could lead to a NULL pointer dereference. We could check for this NULL pointer in sof_ipc4_prepare_copier_module() and avoid the crash, but a similar loop in sof_ipc4_widget_setup_comp_dai() will miscalculate the ALH device count, causing broken audio.
The correct fix is to harden the matching logic by making sure that:
1. the widget is a DAI widget - so dai = w->private is valid
2. the dai (and thus the copier) is an ALH copier
Fixes: a150345aa758 ("ASoC: SOF: ipc4-topology: add SoundWire/ALH aggregation support") Reported-by: Seppo Ingalsuo seppo.ingalsuo@linux.intel.com Link: https://github.com/thesofproject/sof/pull/9652 Signed-off-by: Peter Ujfalusi peter.ujfalusi@linux.intel.com Reviewed-by: Liam Girdwood liam.r.girdwood@intel.com Reviewed-by: Ranjani Sridharan ranjani.sridharan@linux.intel.com Reviewed-by: Bard Liao yung-chuan.liao@linux.intel.com Link: https://patch.msgid.link/20250206084642.14988-1-peter.ujfalusi@linux.intel.c... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/sof/ipc4-topology.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/sound/soc/sof/ipc4-topology.c b/sound/soc/sof/ipc4-topology.c index 240fee2166d12..f82db7f2a6b7e 100644 --- a/sound/soc/sof/ipc4-topology.c +++ b/sound/soc/sof/ipc4-topology.c @@ -671,10 +671,16 @@ static int sof_ipc4_widget_setup_comp_dai(struct snd_sof_widget *swidget) }
list_for_each_entry(w, &sdev->widget_list, list) { - if (w->widget->sname && + struct snd_sof_dai *alh_dai; + + if (!WIDGET_IS_DAI(w->id) || !w->widget->sname || strcmp(w->widget->sname, swidget->widget->sname)) continue;
+ alh_dai = w->private; + if (alh_dai->type != SOF_DAI_INTEL_ALH) + continue; + blob->alh_cfg.device_count++; }
@@ -1973,11 +1979,13 @@ sof_ipc4_prepare_copier_module(struct snd_sof_widget *swidget, list_for_each_entry(w, &sdev->widget_list, list) { u32 node_type;
- if (w->widget->sname && + if (!WIDGET_IS_DAI(w->id) || !w->widget->sname || strcmp(w->widget->sname, swidget->widget->sname)) continue;
dai = w->private; + if (dai->type != SOF_DAI_INTEL_ALH) + continue; alh_copier = (struct sof_ipc4_copier *)dai->private; alh_data = &alh_copier->data; node_type = SOF_IPC4_GET_NODE_TYPE(alh_data->gtw_cfg.node_id);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe Leroy christophe.leroy@csgroup.eu
[ Upstream commit dc9c5166c3cb044f8a001e397195242fd6796eee ]
Erhard reports the following KASAN hit on Talos II (power9) with kernel 6.13:
[ 12.028126] ================================================================== [ 12.028198] BUG: KASAN: user-memory-access in copy_to_kernel_nofault+0x8c/0x1a0 [ 12.028260] Write of size 8 at addr 0000187e458f2000 by task systemd/1
[ 12.028346] CPU: 87 UID: 0 PID: 1 Comm: systemd Tainted: G T 6.13.0-P9-dirty #3 [ 12.028408] Tainted: [T]=RANDSTRUCT [ 12.028446] Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV [ 12.028500] Call Trace: [ 12.028536] [c000000008dbf3b0] [c000000001656a48] dump_stack_lvl+0xbc/0x110 (unreliable) [ 12.028609] [c000000008dbf3f0] [c0000000006e2fc8] print_report+0x6b0/0x708 [ 12.028666] [c000000008dbf4e0] [c0000000006e2454] kasan_report+0x164/0x300 [ 12.028725] [c000000008dbf600] [c0000000006e54d4] kasan_check_range+0x314/0x370 [ 12.028784] [c000000008dbf640] [c0000000006e6310] __kasan_check_write+0x20/0x40 [ 12.028842] [c000000008dbf660] [c000000000578e8c] copy_to_kernel_nofault+0x8c/0x1a0 [ 12.028902] [c000000008dbf6a0] [c0000000000acfe4] __patch_instructions+0x194/0x210 [ 12.028965] [c000000008dbf6e0] [c0000000000ade80] patch_instructions+0x150/0x590 [ 12.029026] [c000000008dbf7c0] [c0000000001159bc] bpf_arch_text_copy+0x6c/0xe0 [ 12.029085] [c000000008dbf800] [c000000000424250] bpf_jit_binary_pack_finalize+0x40/0xc0 [ 12.029147] [c000000008dbf830] [c000000000115dec] bpf_int_jit_compile+0x3bc/0x930 [ 12.029206] [c000000008dbf990] [c000000000423720] bpf_prog_select_runtime+0x1f0/0x280 [ 12.029266] [c000000008dbfa00] [c000000000434b18] bpf_prog_load+0xbb8/0x1370 [ 12.029324] [c000000008dbfb70] [c000000000436ebc] __sys_bpf+0x5ac/0x2e00 [ 12.029379] [c000000008dbfd00] [c00000000043a228] sys_bpf+0x28/0x40 [ 12.029435] [c000000008dbfd20] [c000000000038eb4] system_call_exception+0x334/0x610 [ 12.029497] [c000000008dbfe50] [c00000000000c270] system_call_vectored_common+0xf0/0x280 [ 12.029561] --- interrupt: 3000 at 0x3fff82f5cfa8 [ 12.029608] NIP: 00003fff82f5cfa8 LR: 00003fff82f5cfa8 CTR: 0000000000000000 [ 12.029660] REGS: c000000008dbfe80 TRAP: 3000 Tainted: G T (6.13.0-P9-dirty) [ 12.029735] MSR: 900000000280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI> CR: 42004848 XER: 00000000 [ 12.029855] IRQMASK: 0 GPR00: 0000000000000169 00003fffdcf789a0 00003fff83067100 0000000000000005 GPR04: 00003fffdcf78a98 0000000000000090 0000000000000000 0000000000000008 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00003fff836ff7e0 c000000000010678 0000000000000000 GPR16: 0000000000000000 0000000000000000 00003fffdcf78f28 00003fffdcf78f90 GPR20: 0000000000000000 0000000000000000 0000000000000000 00003fffdcf78f80 GPR24: 00003fffdcf78f70 00003fffdcf78d10 00003fff835c7239 00003fffdcf78bd8 GPR28: 00003fffdcf78a98 0000000000000000 0000000000000000 000000011f547580 [ 12.030316] NIP [00003fff82f5cfa8] 0x3fff82f5cfa8 [ 12.030361] LR [00003fff82f5cfa8] 0x3fff82f5cfa8 [ 12.030405] --- interrupt: 3000 [ 12.030444] ==================================================================
Commit c28c15b6d28a ("powerpc/code-patching: Use temporary mm for Radix MMU") is inspired by x86, but unlike x86 it doesn't disable KASAN reports during patching. This wasn't a problem at the beginning because __patch_mem() is not instrumented.
Commit 465cabc97b42 ("powerpc/code-patching: introduce patch_instructions()") uses copy_to_kernel_nofault() to copy several instructions at once. But when using a temporary mm, the destination is not regular kernel memory but a kind of kernel-like memory located in the user address space. Because it is not in the kernel address space, it is not covered by KASAN shadow memory. Since commit e4137f08816b ("mm, kasan, kmsan: instrument copy_from/to_kernel_nofault"), KASAN reports bad accesses from copy_to_kernel_nofault(). Here a bad access to user memory is reported because KASAN detects the lack of shadow memory and the address is below TASK_SIZE.
Do like x86 in commit b3fd8e83ada0 ("x86/alternatives: Use temporary mm for text poking") and disable KASAN reports during patching when using temporary mm.
Reported-by: Erhard Furtner erhard_f@mailbox.org Closes: https://lore.kernel.org/all/20250201151435.48400261@yea/ Fixes: 465cabc97b42 ("powerpc/code-patching: introduce patch_instructions()") Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Acked-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Madhavan Srinivasan maddy@linux.ibm.com Link: https://patch.msgid.link/1c05b2a1b02ad75b981cfc45927e0b4a90441046.1738577687... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/lib/code-patching.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index acdab294b340a..2685d7efea511 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -493,7 +493,9 @@ static int __do_patch_instructions_mm(u32 *addr, u32 *code, size_t len, bool rep
orig_mm = start_using_temp_mm(patching_mm);
+ kasan_disable_current(); err = __patch_instructions(patch_addr, code, len, repeat_instr); + kasan_enable_current();
/* context synchronisation performed by __patch_instructions */ stop_using_temp_mm(patching_mm, orig_mm);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe Leroy christophe.leroy@csgroup.eu
[ Upstream commit 61bcc752d1b81fde3cae454ff20c1d3c359df500 ]
Rewrite __real_pte() and __rpte_to_hidx() as static inlines in order to avoid the following warnings/errors when building with a 4k page size:
CC arch/powerpc/mm/book3s64/hash_tlb.o arch/powerpc/mm/book3s64/hash_tlb.c: In function 'hpte_need_flush': arch/powerpc/mm/book3s64/hash_tlb.c:49:16: error: variable 'offset' set but not used [-Werror=unused-but-set-variable] 49 | int i, offset; | ^~~~~~
CC arch/powerpc/mm/book3s64/hash_native.o arch/powerpc/mm/book3s64/hash_native.c: In function 'native_flush_hash_range': arch/powerpc/mm/book3s64/hash_native.c:782:29: error: variable 'index' set but not used [-Werror=unused-but-set-variable] 782 | unsigned long hash, index, hidx, shift, slot; | ^~~~~
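To see why the conversion helps, consider a minimal standalone illustration (hypothetical types, not the kernel definitions): a macro discards its unused argument during preprocessing, so the caller's variable is "set but not used", whereas passing the variable to an inline function's parameter counts as a use.

    typedef struct { unsigned long pte; } pte_t;
    typedef struct { pte_t pte; } real_pte_t;

    /* Macro form: 'o' vanishes at preprocessing time, so a variable the
     * caller computes and passes here is never "used" as far as GCC's
     * -Wunused-but-set-variable is concerned.
     */
    #define __real_pte_macro(e, p, o)  ((real_pte_t){(e)})

    /* Inline form: 'offset' is consumed as a function argument, which
     * counts as a use even though the body ignores it.
     */
    static inline real_pte_t __real_pte_inline(pte_t pte, pte_t *ptep, int offset)
    {
            return (real_pte_t){pte};
    }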
Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202501081741.AYFwybsq-lkp@intel.com/ Fixes: ff31e105464d ("powerpc/mm/hash64: Store the slot information at the right offset for hugetlb") Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Reviewed-by: Ritesh Harjani (IBM) ritesh.list@gmail.com Signed-off-by: Madhavan Srinivasan maddy@linux.ibm.com Link: https://patch.msgid.link/e0d340a5b7bd478ecbf245d826e6ab2778b74e06.1736706263... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/book3s/64/hash-4k.h | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h index c3efacab4b941..aa90a048f319a 100644 --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h @@ -77,9 +77,17 @@ /* * With 4K page size the real_pte machinery is all nops. */ -#define __real_pte(e, p, o) ((real_pte_t){(e)}) +static inline real_pte_t __real_pte(pte_t pte, pte_t *ptep, int offset) +{ + return (real_pte_t){pte}; +} + #define __rpte_to_pte(r) ((r).pte) -#define __rpte_to_hidx(r,index) (pte_val(__rpte_to_pte(r)) >> H_PAGE_F_GIX_SHIFT) + +static inline unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long index) +{ + return pte_val(__rpte_to_pte(rpte)) >> H_PAGE_F_GIX_SHIFT; +}
#define pte_iterate_hashed_subpages(rpte, psize, va, index, shift) \ do { \
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kailang Yang kailang@realtek.com
[ Upstream commit 174448badb4409491bfba2e6b46f7aa078741c5e ]
The headset MIC will not function when power_save=0.
Fixes: 1fd50509fe14 ("ALSA: hda/realtek: Update ALC225 depop procedure") Link: https://bugzilla.kernel.org/show_bug.cgi?id=219743 Signed-off-by: Kailang Yang kailang@realtek.com Link: https://lore.kernel.org/0474a095ab0044d0939ec4bf4362423d@realtek.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index f3f849b96402d..9bf99fe6cd34d 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -3790,6 +3790,7 @@ static void alc225_init(struct hda_codec *codec) AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_UNMUTE);
msleep(75); + alc_update_coef_idx(codec, 0x4a, 3 << 10, 0); alc_update_coefex_idx(codec, 0x57, 0x04, 0x0007, 0x4); /* Hight power */ } }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe Leroy christophe.leroy@csgroup.eu
[ Upstream commit d262a192d38e527faa5984629aabda2e0d1c4f54 ]
Erhard reported the following KASAN hit while booting his PowerMac G4 with a KASAN-enabled kernel 6.13-rc6:
BUG: KASAN: vmalloc-out-of-bounds in copy_to_kernel_nofault+0xd8/0x1c8 Write of size 8 at addr f1000000 by task chronyd/1293
CPU: 0 UID: 123 PID: 1293 Comm: chronyd Tainted: G W 6.13.0-rc6-PMacG4 #2 Tainted: [W]=WARN Hardware name: PowerMac3,6 7455 0x80010303 PowerMac Call Trace: [c2437590] [c1631a84] dump_stack_lvl+0x70/0x8c (unreliable) [c24375b0] [c0504998] print_report+0xdc/0x504 [c2437610] [c050475c] kasan_report+0xf8/0x108 [c2437690] [c0505a3c] kasan_check_range+0x24/0x18c [c24376a0] [c03fb5e4] copy_to_kernel_nofault+0xd8/0x1c8 [c24376c0] [c004c014] patch_instructions+0x15c/0x16c [c2437710] [c00731a8] bpf_arch_text_copy+0x60/0x7c [c2437730] [c0281168] bpf_jit_binary_pack_finalize+0x50/0xac [c2437750] [c0073cf4] bpf_int_jit_compile+0xb30/0xdec [c2437880] [c0280394] bpf_prog_select_runtime+0x15c/0x478 [c24378d0] [c1263428] bpf_prepare_filter+0xbf8/0xc14 [c2437990] [c12677ec] bpf_prog_create_from_user+0x258/0x2b4 [c24379d0] [c027111c] do_seccomp+0x3dc/0x1890 [c2437ac0] [c001d8e0] system_call_exception+0x2dc/0x420 [c2437f30] [c00281ac] ret_from_syscall+0x0/0x2c --- interrupt: c00 at 0x5a1274 NIP: 005a1274 LR: 006a3b3c CTR: 005296c8 REGS: c2437f40 TRAP: 0c00 Tainted: G W (6.13.0-rc6-PMacG4) MSR: 0200f932 <VEC,EE,PR,FP,ME,IR,DR,RI> CR: 24004422 XER: 00000000
GPR00: 00000166 af8f3fa0 a7ee3540 00000001 00000000 013b6500 005a5858 0200f932 GPR08: 00000000 00001fe9 013d5fc8 005296c8 2822244c 00b2fcd8 00000000 af8f4b57 GPR16: 00000000 00000001 00000000 00000000 00000000 00000001 00000000 00000002 GPR24: 00afdbb0 00000000 00000000 00000000 006e0004 013ce060 006e7c1c 00000001 NIP [005a1274] 0x5a1274 LR [006a3b3c] 0x6a3b3c --- interrupt: c00
The buggy address belongs to the virtual mapping at [f1000000, f1002000) created by: text_area_cpu_up+0x20/0x190
The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0x76e30 flags: 0x80000000(zone=2) raw: 80000000 00000000 00000122 00000000 00000000 00000000 ffffffff 00000001 raw: 00000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: f0ffff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0ffff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f1000000: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
^ f1000080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f1000100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ==================================================================
f8 corresponds to KASAN_VMALLOC_INVALID, which means the area is not initialised and hence not supposed to be used yet.
The powerpc text patching infrastructure allocates a virtual memory area using get_vm_area() and flags it as VM_ALLOC. But that flag is meant to be used for vmalloc(), and vmalloc()-allocated memory is not supposed to be used before a call to __vmalloc_node_range(), which is never called for that area.
That went undetected until commit e4137f08816b ("mm, kasan, kmsan: instrument copy_from/to_kernel_nofault")
The area allocated by text_area_cpu_up() is not vmalloc memory; it is mapped directly on demand when needed by map_kernel_page(). There is no VM flag corresponding to such usage, so just pass no flag. That way the area will be unpoisoned and usable immediately.
Reported-by: Erhard Furtner erhard_f@mailbox.org Closes: https://lore.kernel.org/all/20250112135832.57c92322@yea/ Fixes: 37bc3e5fd764 ("powerpc/lib/code-patching: Use alternate map for patch_instruction()") Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Madhavan Srinivasan maddy@linux.ibm.com Link: https://patch.msgid.link/06621423da339b374f48c0886e3a5db18e896be8.1739342693... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/lib/code-patching.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 2685d7efea511..c1d9b031f0d57 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -108,7 +108,7 @@ static int text_area_cpu_up(unsigned int cpu) unsigned long addr; int err;
- area = get_vm_area(PAGE_SIZE, VM_ALLOC); + area = get_vm_area(PAGE_SIZE, 0); if (!area) { WARN_ONCE(1, "Failed to create text area for cpu %d\n", cpu);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shengjiu Wang shengjiu.wang@nxp.com
[ Upstream commit 571b69f2f9b1ec7cf7d0e9b79e52115a87a869c4 ]
When a deferred probe happens, there may be the following error:
platform 59820000.sai: Resources present before probing
The cpu_mclk clock comes from the CPU DAI device; if it is not released, the CPU DAI device's probe will fail the second time.
The cpu_mclk is used to get the rate for the rate constraint. The rate constraint may be specific to each platform and is not necessary for the machine driver, so remove it.
Fixes: b86ef5367761 ("ASoC: fsl: Add Audio Mixer machine driver") Signed-off-by: Shengjiu Wang shengjiu.wang@nxp.com Link: https://patch.msgid.link/20250213070518.547375-1-shengjiu.wang@nxp.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/fsl/imx-audmix.c | 31 ------------------------------- 1 file changed, 31 deletions(-)
diff --git a/sound/soc/fsl/imx-audmix.c b/sound/soc/fsl/imx-audmix.c index 8e7b75cf64db4..ff3671226306b 100644 --- a/sound/soc/fsl/imx-audmix.c +++ b/sound/soc/fsl/imx-audmix.c @@ -23,7 +23,6 @@ struct imx_audmix { struct snd_soc_card card; struct platform_device *audmix_pdev; struct platform_device *out_pdev; - struct clk *cpu_mclk; int num_dai; struct snd_soc_dai_link *dai; int num_dai_conf; @@ -32,34 +31,11 @@ struct imx_audmix { struct snd_soc_dapm_route *dapm_routes; };
-static const u32 imx_audmix_rates[] = { - 8000, 12000, 16000, 24000, 32000, 48000, 64000, 96000, -}; - -static const struct snd_pcm_hw_constraint_list imx_audmix_rate_constraints = { - .count = ARRAY_SIZE(imx_audmix_rates), - .list = imx_audmix_rates, -}; - static int imx_audmix_fe_startup(struct snd_pcm_substream *substream) { - struct snd_soc_pcm_runtime *rtd = snd_soc_substream_to_rtd(substream); - struct imx_audmix *priv = snd_soc_card_get_drvdata(rtd->card); struct snd_pcm_runtime *runtime = substream->runtime; - struct device *dev = rtd->card->dev; - unsigned long clk_rate = clk_get_rate(priv->cpu_mclk); int ret;
- if (clk_rate % 24576000 == 0) { - ret = snd_pcm_hw_constraint_list(runtime, 0, - SNDRV_PCM_HW_PARAM_RATE, - &imx_audmix_rate_constraints); - if (ret < 0) - return ret; - } else { - dev_warn(dev, "mclk may be not supported %lu\n", clk_rate); - } - ret = snd_pcm_hw_constraint_minmax(runtime, SNDRV_PCM_HW_PARAM_CHANNELS, 1, 8); if (ret < 0) @@ -325,13 +301,6 @@ static int imx_audmix_probe(struct platform_device *pdev) } put_device(&cpu_pdev->dev);
- priv->cpu_mclk = devm_clk_get(&cpu_pdev->dev, "mclk1"); - if (IS_ERR(priv->cpu_mclk)) { - ret = PTR_ERR(priv->cpu_mclk); - dev_err(&cpu_pdev->dev, "failed to get DAI mclk1: %d\n", ret); - return ret; - } - priv->audmix_pdev = audmix_pdev; priv->out_pdev = cpu_pdev;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Junnan Wu junnan01.wu@samsung.com
[ Upstream commit 55eff109e76a14e5ed10c8c3c3978d20a35e2a4d ]
When executing suspend-to-RAM twice in a row, `rx_buf_nr` and `rx_buf_max_nr` increase to three times vq->num_free. Then, after virtqueue_get_buf() and the `rx_buf_nr` decrement in virtio_transport_rx_work(), the condition to refill the rx buffers (rx_buf_nr < rx_buf_max_nr / 2) will never be met.
This is because `rx_buf_nr` and `rx_buf_max_nr` are initialized only in virtio_vsock_probe(), but they should be reset whenever the virtqueues are recreated, such as after a suspend/resume.
Move the `rx_buf_nr` and `rx_buf_max_nr` initialization into virtio_vsock_vqs_init(), so we are sure that they are properly initialized every time we initialize the virtqueues, whether when we load the driver or after a suspend/resume.
To prevent erroneous atomic load operations on `queued_replies` in the virtio_transport_send_pkt_work() function, which may disrupt the scheduling of vsock->rx_work when transmitting reply-required socket packets, this atomic variable must undergo the same synchronized initialization alongside the preceding two variables after a suspend/resume.
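In sketch form, the (re)initialization that now runs on every virtqueue setup, at probe time and after resume alike (mirroring the hunk below):

    mutex_lock(&vsock->rx_lock);
    vsock->rx_buf_nr = 0;           /* fresh vqs have no rx buffers queued */
    vsock->rx_buf_max_nr = 0;
    mutex_unlock(&vsock->rx_lock);

    atomic_set(&vsock->queued_replies, 0);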
Fixes: bd50c5dc182b ("vsock/virtio: add support for device suspend/resume") Link: https://lore.kernel.org/virtualization/20250207052033.2222629-1-junnan01.wu@... Co-developed-by: Ying Gao ying01.gao@samsung.com Signed-off-by: Ying Gao ying01.gao@samsung.com Signed-off-by: Junnan Wu junnan01.wu@samsung.com Reviewed-by: Luigi Leonardi leonardi@redhat.com Acked-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Stefano Garzarella sgarzare@redhat.com Link: https://patch.msgid.link/20250214012200.1883896-1-junnan01.wu@samsung.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/vmw_vsock/virtio_transport.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index b58c3818f284f..f0e48e6911fc4 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -670,6 +670,13 @@ static int virtio_vsock_vqs_init(struct virtio_vsock *vsock) }; int ret;
+ mutex_lock(&vsock->rx_lock); + vsock->rx_buf_nr = 0; + vsock->rx_buf_max_nr = 0; + mutex_unlock(&vsock->rx_lock); + + atomic_set(&vsock->queued_replies, 0); + ret = virtio_find_vqs(vdev, VSOCK_VQ_MAX, vsock->vqs, vqs_info, NULL); if (ret < 0) return ret; @@ -779,9 +786,6 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
vsock->vdev = vdev;
- vsock->rx_buf_nr = 0; - vsock->rx_buf_max_nr = 0; - atomic_set(&vsock->queued_replies, 0);
mutex_init(&vsock->tx_lock); mutex_init(&vsock->rx_lock);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 9593172d93b9f91c362baec4643003dc29802929 ]
syzkaller reported a use-after-free in geneve_find_dev() [0] without repro.
geneve_configure() links struct geneve_dev.next to net_generic(net, geneve_net_id)->geneve_list.
The net here could differ from dev_net(dev) if IFLA_NET_NS_PID, IFLA_NET_NS_FD, or IFLA_TARGET_NETNSID is set.
When dev_net(dev) is dismantled, geneve_exit_batch_rtnl() finally calls unregister_netdevice_queue() for each dev in the netns, and later the dev is freed.
However, its geneve_dev.next is still linked to the backend UDP socket netns.
Then, use-after-free will occur when another geneve dev is created in the netns.
Let's call geneve_dellink() instead in geneve_destroy_tunnels().
[0]: BUG: KASAN: slab-use-after-free in geneve_find_dev drivers/net/geneve.c:1295 [inline] BUG: KASAN: slab-use-after-free in geneve_configure+0x234/0x858 drivers/net/geneve.c:1343 Read of size 2 at addr ffff000054d6ee24 by task syz.1.4029/13441
CPU: 1 UID: 0 PID: 13441 Comm: syz.1.4029 Not tainted 6.13.0-g0ad9617c78ac #24 dc35ca22c79fb82e8e7bc5c9c9adafea898b1e3d Hardware name: linux,dummy-virt (DT) Call trace: show_stack+0x38/0x50 arch/arm64/kernel/stacktrace.c:466 (C) __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0xbc/0x108 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0x16c/0x6f0 mm/kasan/report.c:489 kasan_report+0xc0/0x120 mm/kasan/report.c:602 __asan_report_load2_noabort+0x20/0x30 mm/kasan/report_generic.c:379 geneve_find_dev drivers/net/geneve.c:1295 [inline] geneve_configure+0x234/0x858 drivers/net/geneve.c:1343 geneve_newlink+0xb8/0x128 drivers/net/geneve.c:1634 rtnl_newlink_create+0x23c/0x868 net/core/rtnetlink.c:3795 __rtnl_newlink net/core/rtnetlink.c:3906 [inline] rtnl_newlink+0x1054/0x1630 net/core/rtnetlink.c:4021 rtnetlink_rcv_msg+0x61c/0x918 net/core/rtnetlink.c:6911 netlink_rcv_skb+0x1dc/0x398 net/netlink/af_netlink.c:2543 rtnetlink_rcv+0x34/0x50 net/core/rtnetlink.c:6938 netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline] netlink_unicast+0x618/0x838 net/netlink/af_netlink.c:1348 netlink_sendmsg+0x5fc/0x8b0 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:713 [inline] __sock_sendmsg net/socket.c:728 [inline] ____sys_sendmsg+0x410/0x6f8 net/socket.c:2568 ___sys_sendmsg+0x178/0x1d8 net/socket.c:2622 __sys_sendmsg net/socket.c:2654 [inline] __do_sys_sendmsg net/socket.c:2659 [inline] __se_sys_sendmsg net/socket.c:2657 [inline] __arm64_sys_sendmsg+0x12c/0x1c8 net/socket.c:2657 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x90/0x278 arch/arm64/kernel/syscall.c:49 el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132 do_el0_svc+0x54/0x70 arch/arm64/kernel/syscall.c:151 el0_svc+0x4c/0xa8 arch/arm64/kernel/entry-common.c:744 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:762 el0t_64_sync+0x198/0x1a0 arch/arm64/kernel/entry.S:600
Allocated by task 13247: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x30/0x68 mm/kasan/common.c:68 kasan_save_alloc_info+0x44/0x58 mm/kasan/generic.c:568 poison_kmalloc_redzone mm/kasan/common.c:377 [inline] __kasan_kmalloc+0x84/0xa0 mm/kasan/common.c:394 kasan_kmalloc include/linux/kasan.h:260 [inline] __do_kmalloc_node mm/slub.c:4298 [inline] __kmalloc_node_noprof+0x2a0/0x560 mm/slub.c:4304 __kvmalloc_node_noprof+0x9c/0x230 mm/util.c:645 alloc_netdev_mqs+0xb8/0x11a0 net/core/dev.c:11470 rtnl_create_link+0x2b8/0xb50 net/core/rtnetlink.c:3604 rtnl_newlink_create+0x19c/0x868 net/core/rtnetlink.c:3780 __rtnl_newlink net/core/rtnetlink.c:3906 [inline] rtnl_newlink+0x1054/0x1630 net/core/rtnetlink.c:4021 rtnetlink_rcv_msg+0x61c/0x918 net/core/rtnetlink.c:6911 netlink_rcv_skb+0x1dc/0x398 net/netlink/af_netlink.c:2543 rtnetlink_rcv+0x34/0x50 net/core/rtnetlink.c:6938 netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline] netlink_unicast+0x618/0x838 net/netlink/af_netlink.c:1348 netlink_sendmsg+0x5fc/0x8b0 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:713 [inline] __sock_sendmsg net/socket.c:728 [inline] ____sys_sendmsg+0x410/0x6f8 net/socket.c:2568 ___sys_sendmsg+0x178/0x1d8 net/socket.c:2622 __sys_sendmsg net/socket.c:2654 [inline] __do_sys_sendmsg net/socket.c:2659 [inline] __se_sys_sendmsg net/socket.c:2657 [inline] __arm64_sys_sendmsg+0x12c/0x1c8 net/socket.c:2657 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x90/0x278 arch/arm64/kernel/syscall.c:49 el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132 do_el0_svc+0x54/0x70 arch/arm64/kernel/syscall.c:151 el0_svc+0x4c/0xa8 arch/arm64/kernel/entry-common.c:744 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:762 el0t_64_sync+0x198/0x1a0 arch/arm64/kernel/entry.S:600
Freed by task 45: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x30/0x68 mm/kasan/common.c:68 kasan_save_free_info+0x58/0x70 mm/kasan/generic.c:582 poison_slab_object mm/kasan/common.c:247 [inline] __kasan_slab_free+0x48/0x68 mm/kasan/common.c:264 kasan_slab_free include/linux/kasan.h:233 [inline] slab_free_hook mm/slub.c:2353 [inline] slab_free mm/slub.c:4613 [inline] kfree+0x140/0x420 mm/slub.c:4761 kvfree+0x4c/0x68 mm/util.c:688 netdev_release+0x94/0xc8 net/core/net-sysfs.c:2065 device_release+0x98/0x1c0 kobject_cleanup lib/kobject.c:689 [inline] kobject_release lib/kobject.c:720 [inline] kref_put include/linux/kref.h:65 [inline] kobject_put+0x2b0/0x438 lib/kobject.c:737 netdev_run_todo+0xe5c/0xfc8 net/core/dev.c:11185 rtnl_unlock+0x20/0x38 net/core/rtnetlink.c:151 cleanup_net+0x4fc/0x8c0 net/core/net_namespace.c:648 process_one_work+0x700/0x1398 kernel/workqueue.c:3236 process_scheduled_works kernel/workqueue.c:3317 [inline] worker_thread+0x8c4/0xe10 kernel/workqueue.c:3398 kthread+0x4bc/0x608 kernel/kthread.c:464 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:862
The buggy address belongs to the object at ffff000054d6e000 which belongs to the cache kmalloc-cg-4k of size 4096 The buggy address is located 3620 bytes inside of freed 4096-byte region [ffff000054d6e000, ffff000054d6f000)
The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x94d68 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 memcg:ffff000016276181 flags: 0x3fffe0000000040(head|node=0|zone=0|lastcpupid=0x1ffff) page_type: f5(slab) raw: 03fffe0000000040 ffff0000c000f500 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000040004 00000001f5000000 ffff000016276181 head: 03fffe0000000040 ffff0000c000f500 dead000000000122 0000000000000000 head: 0000000000000000 0000000000040004 00000001f5000000 ffff000016276181 head: 03fffe0000000003 fffffdffc1535a01 ffffffffffffffff 0000000000000000 head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff000054d6ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff000054d6ed80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff000054d6ee00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff000054d6ee80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff000054d6ef00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
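For reference, geneve_dellink() performs both halves of the teardown: it unlinks the per-netns list entry before queueing the unregister, so no stale geneve_dev.next pointer can survive. A sketch, approximately as in drivers/net/geneve.c:

    static void geneve_dellink(struct net_device *dev, struct list_head *head)
    {
            struct geneve_dev *geneve = netdev_priv(dev);

            /* drop from gn->geneve_list before the netdev goes away */
            list_del(&geneve->next);
            unregister_netdevice_queue(dev, head);
    }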
Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels") Reported-by: syzkaller syzkaller@googlegroups.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://patch.msgid.link/20250213043354.91368-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/geneve.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index ba15a0a4ce629..9c53b0bbb4c57 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -1907,16 +1907,11 @@ static void geneve_destroy_tunnels(struct net *net, struct list_head *head) /* gather any geneve devices that were moved into this ns */ for_each_netdev_safe(net, dev, aux) if (dev->rtnl_link_ops == &geneve_link_ops) - unregister_netdevice_queue(dev, head); + geneve_dellink(dev, head);
/* now gather any other geneve devices that were created in this ns */ - list_for_each_entry_safe(geneve, next, &gn->geneve_list, next) { - /* If geneve->dev is in the same netns, it was already added - * to the list by the previous loop. - */ - if (!net_eq(dev_net(geneve->dev), net)) - unregister_netdevice_queue(geneve->dev, head); - } + list_for_each_entry_safe(geneve, next, &gn->geneve_list, next) + geneve_dellink(geneve->dev, head); }
static void __net_exit geneve_exit_batch_rtnl(struct list_head *net_list,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vitaly Rodionov vitalyr@opensource.cirrus.com
[ Upstream commit 08b613b9e2ba431db3bd15cb68ca72472a50ef5c ]
This patch corrects the full-scale volume setting logic. On certain platforms, the full-scale volume bit is required. The current logic mistakenly sets this bit and incorrectly clears reserved bit 0, causing the headphone output to be muted.
Fixes: 342b6b610ae2 ("ALSA: hda/cs8409: Fix Full Scale Volume setting for all variants") Signed-off-by: Vitaly Rodionov vitalyr@opensource.cirrus.com Link: https://patch.msgid.link/20250214210736.30814-1-vitalyr@opensource.cirrus.co... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/patch_cs8409-tables.c | 6 +++--- sound/pci/hda/patch_cs8409.c | 20 +++++++++++--------- sound/pci/hda/patch_cs8409.h | 5 +++-- 3 files changed, 17 insertions(+), 14 deletions(-)
diff --git a/sound/pci/hda/patch_cs8409-tables.c b/sound/pci/hda/patch_cs8409-tables.c index 759f48038273d..621f947e38174 100644 --- a/sound/pci/hda/patch_cs8409-tables.c +++ b/sound/pci/hda/patch_cs8409-tables.c @@ -121,7 +121,7 @@ static const struct cs8409_i2c_param cs42l42_init_reg_seq[] = { { CS42L42_MIXER_CHA_VOL, 0x3F }, { CS42L42_MIXER_CHB_VOL, 0x3F }, { CS42L42_MIXER_ADC_VOL, 0x3f }, - { CS42L42_HP_CTL, 0x03 }, + { CS42L42_HP_CTL, 0x0D }, { CS42L42_MIC_DET_CTL1, 0xB6 }, { CS42L42_TIPSENSE_CTL, 0xC2 }, { CS42L42_HS_CLAMP_DISABLE, 0x01 }, @@ -315,7 +315,7 @@ static const struct cs8409_i2c_param dolphin_c0_init_reg_seq[] = { { CS42L42_ASP_TX_SZ_EN, 0x01 }, { CS42L42_PWR_CTL1, 0x0A }, { CS42L42_PWR_CTL2, 0x84 }, - { CS42L42_HP_CTL, 0x03 }, + { CS42L42_HP_CTL, 0x0D }, { CS42L42_MIXER_CHA_VOL, 0x3F }, { CS42L42_MIXER_CHB_VOL, 0x3F }, { CS42L42_MIXER_ADC_VOL, 0x3f }, @@ -371,7 +371,7 @@ static const struct cs8409_i2c_param dolphin_c1_init_reg_seq[] = { { CS42L42_ASP_TX_SZ_EN, 0x00 }, { CS42L42_PWR_CTL1, 0x0E }, { CS42L42_PWR_CTL2, 0x84 }, - { CS42L42_HP_CTL, 0x01 }, + { CS42L42_HP_CTL, 0x0D }, { CS42L42_MIXER_CHA_VOL, 0x3F }, { CS42L42_MIXER_CHB_VOL, 0x3F }, { CS42L42_MIXER_ADC_VOL, 0x3f }, diff --git a/sound/pci/hda/patch_cs8409.c b/sound/pci/hda/patch_cs8409.c index 614327218634c..b760332a4e357 100644 --- a/sound/pci/hda/patch_cs8409.c +++ b/sound/pci/hda/patch_cs8409.c @@ -876,7 +876,7 @@ static void cs42l42_resume(struct sub_codec *cs42l42) { CS42L42_DET_INT_STATUS2, 0x00 }, { CS42L42_TSRS_PLUG_STATUS, 0x00 }, }; - int fsv_old, fsv_new; + unsigned int fsv;
/* Bring CS42L42 out of Reset */ spec->gpio_data = snd_hda_codec_read(codec, CS8409_PIN_AFG, 0, AC_VERB_GET_GPIO_DATA, 0); @@ -893,13 +893,15 @@ static void cs42l42_resume(struct sub_codec *cs42l42) /* Clear interrupts, by reading interrupt status registers */ cs8409_i2c_bulk_read(cs42l42, irq_regs, ARRAY_SIZE(irq_regs));
- fsv_old = cs8409_i2c_read(cs42l42, CS42L42_HP_CTL); - if (cs42l42->full_scale_vol == CS42L42_FULL_SCALE_VOL_0DB) - fsv_new = fsv_old & ~CS42L42_FULL_SCALE_VOL_MASK; - else - fsv_new = fsv_old & CS42L42_FULL_SCALE_VOL_MASK; - if (fsv_new != fsv_old) - cs8409_i2c_write(cs42l42, CS42L42_HP_CTL, fsv_new); + fsv = cs8409_i2c_read(cs42l42, CS42L42_HP_CTL); + if (cs42l42->full_scale_vol) { + // Set the full scale volume bit + fsv |= CS42L42_FULL_SCALE_VOL_MASK; + cs8409_i2c_write(cs42l42, CS42L42_HP_CTL, fsv); + } + // Unmute analog channels A and B + fsv = (fsv & ~CS42L42_ANA_MUTE_AB); + cs8409_i2c_write(cs42l42, CS42L42_HP_CTL, fsv);
/* we have to explicitly allow unsol event handling even during the * resume phase so that the jack event is processed properly @@ -920,7 +922,7 @@ static void cs42l42_suspend(struct sub_codec *cs42l42) { CS42L42_MIXER_CHA_VOL, 0x3F }, { CS42L42_MIXER_ADC_VOL, 0x3F }, { CS42L42_MIXER_CHB_VOL, 0x3F }, - { CS42L42_HP_CTL, 0x0F }, + { CS42L42_HP_CTL, 0x0D }, { CS42L42_ASP_RX_DAI0_EN, 0x00 }, { CS42L42_ASP_CLK_CFG, 0x00 }, { CS42L42_PWR_CTL1, 0xFE }, diff --git a/sound/pci/hda/patch_cs8409.h b/sound/pci/hda/patch_cs8409.h index 5e48115caf096..14645d25e70fd 100644 --- a/sound/pci/hda/patch_cs8409.h +++ b/sound/pci/hda/patch_cs8409.h @@ -230,9 +230,10 @@ enum cs8409_coefficient_index_registers { #define CS42L42_PDN_TIMEOUT_US (250000) #define CS42L42_PDN_SLEEP_US (2000) #define CS42L42_INIT_TIMEOUT_MS (45) +#define CS42L42_ANA_MUTE_AB (0x0C) #define CS42L42_FULL_SCALE_VOL_MASK (2) -#define CS42L42_FULL_SCALE_VOL_0DB (1) -#define CS42L42_FULL_SCALE_VOL_MINUS6DB (0) +#define CS42L42_FULL_SCALE_VOL_0DB (0) +#define CS42L42_FULL_SCALE_VOL_MINUS6DB (1)
/* Dell BULLSEYE / WARLOCK / CYBORG Specific Definitions */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pierre Riteau pierre@stackhpc.com
[ Upstream commit 071ed42cff4fcdd89025d966d48eabef59913bf2 ]
tcf_exts_miss_cookie_base_alloc() calls xa_alloc_cyclic(), which can return 1 if the allocation succeeded after wrapping. This was treated as an error, with the value 1 returned to the caller tcf_exts_init_ex(), which sets exts->actions to NULL and returns 1 to its caller fl_change().
fl_change() treats err == 1 as success, calling tcf_exts_validate_ex() which calls tcf_action_init() with exts->actions as argument, where it is dereferenced.
Example trace:
BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 114 PID: 16151 Comm: handler114 Kdump: loaded Not tainted 5.14.0-503.16.1.el9_5.x86_64 #1 RIP: 0010:tcf_action_init+0x1f8/0x2c0 Call Trace: tcf_action_init+0x1f8/0x2c0 tcf_exts_validate_ex+0x175/0x190 fl_change+0x537/0x1120 [cls_flower]
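The key detail is the xa_alloc_cyclic() contract: it returns 0 on success, 1 if the allocation succeeded only after the counter wrapped, and a negative errno on failure. Only negative values are errors, so the check must be err < 0, as in the hunk below:

    err = xa_alloc_cyclic(&tcf_exts_miss_cookies_xa, &n->miss_cookie_base,
                          n, xa_limit_32b, &next, GFP_KERNEL);
    if (err < 0)            /* err == 1 (wrapped) is still success */
            goto err_xa_alloc;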
Fixes: 80cd22c35c90 ("net/sched: cls_api: Support hardware miss to tc action") Signed-off-by: Pierre Riteau pierre@stackhpc.com Reviewed-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Link: https://patch.msgid.link/20250213223610.320278-1-pierre@stackhpc.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/cls_api.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index dfa3067084948..998ea3b5badfc 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -97,7 +97,7 @@ tcf_exts_miss_cookie_base_alloc(struct tcf_exts *exts, struct tcf_proto *tp,
err = xa_alloc_cyclic(&tcf_exts_miss_cookies_xa, &n->miss_cookie_base, n, xa_limit_32b, &next, GFP_KERNEL); - if (err) + if (err < 0) goto err_xa_alloc;
exts->miss_cookie_node = n;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
[ Upstream commit e77aa4b2eaa7fb31b2a7a50214ecb946b2a8b0f6 ]
When a destination client is a user client in the legacy MIDI mode and it sets the no-UMP-conversion flag, currently all UMP events are still passed as-is. But this may confuse user space, because the event packet size differs from the legacy mode.
Since user clients cannot handle UMP events unless they are running in the UMP client mode, we should filter out those events instead of blindly accepting them. This patch addresses this by slightly adjusting the conditions for UMP event handling at event delivery time.
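In sketch form, the adjusted delivery decision looks as follows; the deliver_*/drop helpers are placeholders for the real calls shown in the diff below:

    if (snd_seq_ev_is_ump(event)) {
            if (!(dest->filter & SNDRV_SEQ_FILTER_NO_CONVERT))
                    deliver_from_ump();     /* convert UMP for the destination */
            else if (dest->type == USER_CLIENT && !snd_seq_client_is_ump(dest))
                    drop_event();           /* legacy user client: no raw UMP */
    } else if (snd_seq_client_is_ump(dest)) {
            if (!(dest->filter & SNDRV_SEQ_FILTER_NO_CONVERT))
                    deliver_to_ump();       /* convert legacy event to UMP */
    }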
Fixes: 329ffe11a014 ("ALSA: seq: Allow suppressing UMP conversions") Link: https://lore.kernel.org/b77a2cd6-7b59-4eb0-a8db-22d507d3af5f@gmail.com Link: https://patch.msgid.link/20250217170034.21930-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/core/seq/seq_clientmgr.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/sound/core/seq/seq_clientmgr.c b/sound/core/seq/seq_clientmgr.c index 77b6ac9b5c11b..9955c4d54e42a 100644 --- a/sound/core/seq/seq_clientmgr.c +++ b/sound/core/seq/seq_clientmgr.c @@ -678,12 +678,18 @@ static int snd_seq_deliver_single_event(struct snd_seq_client *client, dest_port->time_real);
#if IS_ENABLED(CONFIG_SND_SEQ_UMP) - if (!(dest->filter & SNDRV_SEQ_FILTER_NO_CONVERT)) { - if (snd_seq_ev_is_ump(event)) { + if (snd_seq_ev_is_ump(event)) { + if (!(dest->filter & SNDRV_SEQ_FILTER_NO_CONVERT)) { result = snd_seq_deliver_from_ump(client, dest, dest_port, event, atomic, hop); goto __skip; - } else if (snd_seq_client_is_ump(dest)) { + } else if (dest->type == USER_CLIENT && + !snd_seq_client_is_ump(dest)) { + result = 0; // drop the event + goto __skip; + } + } else if (snd_seq_client_is_ump(dest)) { + if (!(dest->filter & SNDRV_SEQ_FILTER_NO_CONVERT)) { result = snd_seq_deliver_to_ump(client, dest, dest_port, event, atomic, hop); goto __skip;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Ruess julianr@linux.ibm.com
[ Upstream commit 915e34d5ad35a6a9e56113f852ade4a730fb88f0 ]
According to device_release() in drivers/base/core.c, a device without a release function is a broken device and must be fixed.
The current code directly frees the device after calling device_add() without waiting for other kernel parts to release their references. Thus, a reference could still be held to a struct device, e.g., by sysfs, leading to potential use-after-free issues if a proper release function is not set.
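The canonical pattern, which the diff below applies to ism_dev, is to free the containing structure from the ->release() callback and drop the last reference with put_device() instead of calling kfree() directly; roughly:

    static void ism_dev_release(struct device *dev)
    {
            struct ism_dev *ism = container_of(dev, struct ism_dev, dev);

            kfree(ism);     /* safe: all references are gone */
    }

    ...
    ism->dev.release = ism_dev_release;
    device_initialize(&ism->dev);
    ...
    put_device(&ism->dev);  /* last put invokes ism_dev_release() */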
Fixes: 8c81ba20349d ("net/smc: De-tangle ism and smc device initialization") Reviewed-by: Alexandra Winter wintera@linux.ibm.com Reviewed-by: Wenjia Zhang wenjia@linux.ibm.com Signed-off-by: Julian Ruess julianr@linux.ibm.com Signed-off-by: Alexandra Winter wintera@linux.ibm.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20250214120137.563409-1-wintera@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/s390/net/ism_drv.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c index e36e3ea165d3b..2f34761e64135 100644 --- a/drivers/s390/net/ism_drv.c +++ b/drivers/s390/net/ism_drv.c @@ -588,6 +588,15 @@ static int ism_dev_init(struct ism_dev *ism) return ret; }
+static void ism_dev_release(struct device *dev) +{ + struct ism_dev *ism; + + ism = container_of(dev, struct ism_dev, dev); + + kfree(ism); +} + static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct ism_dev *ism; @@ -601,6 +610,7 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id) dev_set_drvdata(&pdev->dev, ism); ism->pdev = pdev; ism->dev.parent = &pdev->dev; + ism->dev.release = ism_dev_release; device_initialize(&ism->dev); dev_set_name(&ism->dev, dev_name(&pdev->dev)); ret = device_add(&ism->dev); @@ -637,7 +647,7 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id) device_del(&ism->dev); err_dev: dev_set_drvdata(&pdev->dev, NULL); - kfree(ism); + put_device(&ism->dev);
return ret; } @@ -682,7 +692,7 @@ static void ism_remove(struct pci_dev *pdev) pci_disable_device(pdev); device_del(&ism->dev); dev_set_drvdata(&pdev->dev, NULL); - kfree(ism); + put_device(&ism->dev); }
static struct pci_driver ism_driver = {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nick Child nnac123@linux.ibm.com
[ Upstream commit 2ee73c54a615b74d2e7ee6f20844fd3ba63fc485 ]
Allow tracking of packets sent with send_subcrq direct vs indirect. `ethtool -S <dev>` will now provide a counter of the number of uses of each xmit method. This metric will be useful in performance debugging.
Signed-off-by: Nick Child nnac123@linux.ibm.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20241001163531.1803152-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: bdf5d13aa05e ("ibmvnic: Don't reference skb after sending to VIOS") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ibm/ibmvnic.c | 23 ++++++++++++++++------- drivers/net/ethernet/ibm/ibmvnic.h | 3 ++- 2 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 97425c06e1ed7..cca2ed6ad2899 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2310,7 +2310,7 @@ static void ibmvnic_tx_scrq_clean_buffer(struct ibmvnic_adapter *adapter, tx_buff = &tx_pool->tx_buff[index]; adapter->netdev->stats.tx_packets--; adapter->netdev->stats.tx_bytes -= tx_buff->skb->len; - adapter->tx_stats_buffers[queue_num].packets--; + adapter->tx_stats_buffers[queue_num].batched_packets--; adapter->tx_stats_buffers[queue_num].bytes -= tx_buff->skb->len; dev_kfree_skb_any(tx_buff->skb); @@ -2402,7 +2402,8 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) unsigned int tx_map_failed = 0; union sub_crq indir_arr[16]; unsigned int tx_dropped = 0; - unsigned int tx_packets = 0; + unsigned int tx_dpackets = 0; + unsigned int tx_bpackets = 0; unsigned int tx_bytes = 0; dma_addr_t data_dma_addr; struct netdev_queue *txq; @@ -2575,6 +2576,7 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) if (lpar_rc != H_SUCCESS) goto tx_err;
+ tx_dpackets++; goto early_exit; }
@@ -2603,6 +2605,8 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) goto tx_err; }
+ tx_bpackets++; + early_exit: if (atomic_add_return(num_entries, &tx_scrq->used) >= adapter->req_tx_entries_per_subcrq) { @@ -2610,7 +2614,6 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) netif_stop_subqueue(netdev, queue_num); }
- tx_packets++; tx_bytes += skb->len; txq_trans_cond_update(txq); ret = NETDEV_TX_OK; @@ -2640,10 +2643,11 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) rcu_read_unlock(); netdev->stats.tx_dropped += tx_dropped; netdev->stats.tx_bytes += tx_bytes; - netdev->stats.tx_packets += tx_packets; + netdev->stats.tx_packets += tx_bpackets + tx_dpackets; adapter->tx_send_failed += tx_send_failed; adapter->tx_map_failed += tx_map_failed; - adapter->tx_stats_buffers[queue_num].packets += tx_packets; + adapter->tx_stats_buffers[queue_num].batched_packets += tx_bpackets; + adapter->tx_stats_buffers[queue_num].direct_packets += tx_dpackets; adapter->tx_stats_buffers[queue_num].bytes += tx_bytes; adapter->tx_stats_buffers[queue_num].dropped_packets += tx_dropped;
@@ -3808,7 +3812,10 @@ static void ibmvnic_get_strings(struct net_device *dev, u32 stringset, u8 *data) memcpy(data, ibmvnic_stats[i].name, ETH_GSTRING_LEN);
for (i = 0; i < adapter->req_tx_queues; i++) { - snprintf(data, ETH_GSTRING_LEN, "tx%d_packets", i); + snprintf(data, ETH_GSTRING_LEN, "tx%d_batched_packets", i); + data += ETH_GSTRING_LEN; + + snprintf(data, ETH_GSTRING_LEN, "tx%d_direct_packets", i); data += ETH_GSTRING_LEN;
snprintf(data, ETH_GSTRING_LEN, "tx%d_bytes", i); @@ -3873,7 +3880,9 @@ static void ibmvnic_get_ethtool_stats(struct net_device *dev, (adapter, ibmvnic_stats[i].offset));
for (j = 0; j < adapter->req_tx_queues; j++) { - data[i] = adapter->tx_stats_buffers[j].packets; + data[i] = adapter->tx_stats_buffers[j].batched_packets; + i++; + data[i] = adapter->tx_stats_buffers[j].direct_packets; i++; data[i] = adapter->tx_stats_buffers[j].bytes; i++; diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h index 94ac36b1408be..a189038d88df0 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.h +++ b/drivers/net/ethernet/ibm/ibmvnic.h @@ -213,7 +213,8 @@ struct ibmvnic_statistics {
#define NUM_TX_STATS 3 struct ibmvnic_tx_queue_stats { - u64 packets; + u64 batched_packets; + u64 direct_packets; u64 bytes; u64 dropped_packets; };
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nick Child nnac123@linux.ibm.com
[ Upstream commit bdf5d13aa05ec314d4385b31ac974d6c7e0997c9 ]
Previously, after successfully flushing the xmit buffer to VIOS, the tx_bytes stat was incremented by the length of the skb.
It is invalid to access the skb memory after sending the buffer to the VIOS because, at any point after sending, the VIOS can trigger an interrupt to free this memory. A race between reading skb->len and freeing the skb is possible (especially during LPM) and will result in use-after-free: ================================================================== BUG: KASAN: slab-use-after-free in ibmvnic_xmit+0x75c/0x1808 [ibmvnic] Read of size 4 at addr c00000024eb48a70 by task hxecom/14495 <...> Call Trace: [c000000118f66cf0] [c0000000018cba6c] dump_stack_lvl+0x84/0xe8 (unreliable) [c000000118f66d20] [c0000000006f0080] print_report+0x1a8/0x7f0 [c000000118f66df0] [c0000000006f08f0] kasan_report+0x128/0x1f8 [c000000118f66f00] [c0000000006f2868] __asan_load4+0xac/0xe0 [c000000118f66f20] [c0080000046eac84] ibmvnic_xmit+0x75c/0x1808 [ibmvnic] [c000000118f67340] [c0000000014be168] dev_hard_start_xmit+0x150/0x358 <...> Freed by task 0: kasan_save_stack+0x34/0x68 kasan_save_track+0x2c/0x50 kasan_save_free_info+0x64/0x108 __kasan_mempool_poison_object+0x148/0x2d4 napi_skb_cache_put+0x5c/0x194 net_tx_action+0x154/0x5b8 handle_softirqs+0x20c/0x60c do_softirq_own_stack+0x6c/0x88 <...> The buggy address belongs to the object at c00000024eb48a00 which belongs to the cache skbuff_head_cache of size 224 ==================================================================
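The general rule the fix below follows: any skb field needed after transmission must be cached beforehand, because ownership passes to the VIOS at the send call. In sketch form (condensed from the diff, not the full function):

    unsigned int skblen = skb->len; /* read before ownership passes */

    /* ... descriptor handed to the hypervisor here; from this point the
     * VIOS may complete the send and free the skb at any time ... */

    tx_bytes += skblen;             /* never skb->len after the handoff */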
Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Nick Child nnac123@linux.ibm.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20250214155233.235559-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ibm/ibmvnic.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index cca2ed6ad2899..61db00b2b33e4 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2408,6 +2408,7 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) dma_addr_t data_dma_addr; struct netdev_queue *txq; unsigned long lpar_rc; + unsigned int skblen; union sub_crq tx_crq; unsigned int offset; bool use_scrq_send_direct = false; @@ -2522,6 +2523,7 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) tx_buff->skb = skb; tx_buff->index = bufidx; tx_buff->pool_index = queue_num; + skblen = skb->len;
memset(&tx_crq, 0, sizeof(tx_crq)); tx_crq.v1.first = IBMVNIC_CRQ_CMD; @@ -2614,7 +2616,7 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) netif_stop_subqueue(netdev, queue_num); }
- tx_bytes += skb->len; + tx_bytes += skblen; txq_trans_cond_update(txq); ret = NETDEV_TX_OK; goto out;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Luczaj mhal@rbox.co
[ Upstream commit 8fb5bb169d17cdd12c2dcc2e96830ed487d77a0f ]
sockmap expects all vsocks to have a transport assigned, which is expressed in vsock_proto::psock_update_sk_prot(). However, there is an edge case where an unconnected (connectible) socket may lose its previously assigned transport. This is handled with a NULL check in the vsock/BPF recv path.
Another design detail is that listening vsocks are not supposed to have any transport assigned at all, which implies they are not supported by sockmap. But this is complicated by the fact that a socket, before switching to TCP_LISTEN, may have had some transport assigned during a failed connect() attempt. Hence, we may end up with a listening vsock in a sockmap, which blows up quickly:
KASAN: null-ptr-deref in range [0x0000000000000120-0x0000000000000127] CPU: 7 UID: 0 PID: 56 Comm: kworker/7:0 Not tainted 6.14.0-rc1+ Workqueue: vsock-loopback vsock_loopback_work RIP: 0010:vsock_read_skb+0x4b/0x90 Call Trace: sk_psock_verdict_data_ready+0xa4/0x2e0 virtio_transport_recv_pkt+0x1ca8/0x2acc vsock_loopback_work+0x27d/0x3f0 process_one_work+0x846/0x1420 worker_thread+0x5b3/0xf80 kthread+0x35a/0x700 ret_from_fork+0x2d/0x70 ret_from_fork_asm+0x1a/0x30
For connectible sockets, instead of relying solely on the state of vsk->transport, tell sockmap to only allow those representing established connections. This aligns with the behaviour for AF_INET and AF_UNIX.
Fixes: 634f1a7110b4 ("vsock: support sockmap") Signed-off-by: Michal Luczaj mhal@rbox.co Acked-by: Stefano Garzarella sgarzare@redhat.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock_map.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c index f1b9b3958792c..2f1be9baad057 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -541,6 +541,9 @@ static bool sock_map_sk_state_allowed(const struct sock *sk) return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_LISTEN); if (sk_is_stream_unix(sk)) return (1 << sk->sk_state) & TCPF_ESTABLISHED; + if (sk_is_vsock(sk) && + (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET)) + return (1 << sk->sk_state) & TCPF_ESTABLISHED; return true; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Luczaj mhal@rbox.co
[ Upstream commit 857ae05549ee2542317e7084ecaa5f8536634dd9 ]
In the spirit of commit 91751e248256 ("vsock: prevent null-ptr-deref in vsock_*[has_data|has_space]"), armorize the "impossible" cases with a warning.
Fixes: 634f1a7110b4 ("vsock: support sockmap") Signed-off-by: Michal Luczaj mhal@rbox.co Reviewed-by: Stefano Garzarella sgarzare@redhat.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/vmw_vsock/af_vsock.c | 3 +++ net/vmw_vsock/vsock_bpf.c | 2 +- 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 37299a7ca1876..eb6ea26b390ee 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1189,6 +1189,9 @@ static int vsock_read_skb(struct sock *sk, skb_read_actor_t read_actor) { struct vsock_sock *vsk = vsock_sk(sk);
+ if (WARN_ON_ONCE(!vsk->transport)) + return -ENODEV; + return vsk->transport->read_skb(vsk, read_actor); }
diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c index f201d9eca1df2..07b96d56f3a57 100644 --- a/net/vmw_vsock/vsock_bpf.c +++ b/net/vmw_vsock/vsock_bpf.c @@ -87,7 +87,7 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, lock_sock(sk); vsk = vsock_sk(sk);
- if (!vsk->transport) { + if (WARN_ON_ONCE(!vsk->transport)) { copied = -ENODEV; goto out; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit f5da7c45188eea71394bf445655cae2df88a7788 ]
Since the commit under Fixes, we set the window clamp in accordance with the newly measured rcvbuf scaling_ratio. If the scaling_ratio decreases significantly, we may put ourselves in a situation where windows become smaller than rcvq_space, preventing tcp_rcv_space_adjust() from increasing rcvbuf.
The significant decrease of scaling_ratio is far more likely since commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"), which increased the "default" scaling ratio from ~30% to 50%.
Hitting the bad condition depends a lot on TCP tuning and the drivers at play. One of Meta's workloads hits it reliably under the following conditions:
- default rcvbuf of 125k
- sender MTU 1500, receiver MTU 5000
- driver settles on scaling_ratio of 78 for the config above.
Initial rcvq_space gets calculated as TCP_INIT_CWND * tp->advmss (10 * 5k = 50k). Once we find out the true scaling ratio and MSS, we clamp the windows to 38k. Triggering the condition also depends on the message sequence of this workload. I can't repro the problem with simple iperf or TCP_RR-style tests.
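Working through those numbers (assuming tcp_win_from_space() reduces to space * scaling_ratio >> 8): 125000 * 78 / 256 is approximately 38085 bytes, i.e. the ~38k clamp above, which is below the initial rcvq_space of 10 * 5000 = 50000 bytes, so tcp_rcv_space_adjust() can never grow rcvbuf.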
Fixes: a2cbb1603943 ("tcp: Update window clamping condition")
Reviewed-by: Eric Dumazet edumazet@google.com
Reviewed-by: Neal Cardwell ncardwell@google.com
Link: https://patch.msgid.link/20250217232905.3162187-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/ipv4/tcp_input.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2d43b29da15e2..bb17add6e4a78 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -243,9 +243,15 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 		do_div(val, skb->truesize);
 		tcp_sk(sk)->scaling_ratio = val ? val : 1;
 
-		if (old_ratio != tcp_sk(sk)->scaling_ratio)
-			WRITE_ONCE(tcp_sk(sk)->window_clamp,
-				   tcp_win_from_space(sk, sk->sk_rcvbuf));
+		if (old_ratio != tcp_sk(sk)->scaling_ratio) {
+			struct tcp_sock *tp = tcp_sk(sk);
+
+			val = tcp_win_from_space(sk, sk->sk_rcvbuf);
+			tcp_set_window_clamp(sk, val);
+
+			if (tp->window_clamp < tp->rcvq_space.space)
+				tp->rcvq_space.space = tp->window_clamp;
+		}
 	}
 	icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
 				       tcp_sk(sk)->advmss);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kory Maincent kory.maincent@bootlin.com
[ Upstream commit 675d0e3cacc3ae7c29294a5f6a820187f862ad8b ]
Setting the max_uA constraint in the regulator API imposes a current limit during the regulator registration process. This behavior conflicts with preserving the maximum PI power budget configuration across reboots.
Instead, compare the desired current limit to MAX_PI_CURRENT in the pse_pi_set_current_limit() function to ensure proper handling of the power budget.
Acked-by: Oleksij Rempel o.rempel@pengutronix.de
Signed-off-by: Kory Maincent kory.maincent@bootlin.com
Signed-off-by: Paolo Abeni pabeni@redhat.com
Stable-dep-of: f6093c5ec74d ("net: pse-pd: pd692x0: Fix power limit retrieval")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/pse-pd/pse_core.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/pse-pd/pse_core.c b/drivers/net/pse-pd/pse_core.c
index 2906ce173f66c..9fee4dd53515a 100644
--- a/drivers/net/pse-pd/pse_core.c
+++ b/drivers/net/pse-pd/pse_core.c
@@ -357,6 +357,9 @@ static int pse_pi_set_current_limit(struct regulator_dev *rdev, int min_uA,
 	if (!ops->pi_set_current_limit)
 		return -EOPNOTSUPP;
 
+	if (max_uA > MAX_PI_CURRENT)
+		return -ERANGE;
+
 	id = rdev_get_id(rdev);
 	mutex_lock(&pcdev->lock);
 	ret = ops->pi_set_current_limit(pcdev, id, max_uA);
@@ -403,11 +406,9 @@ devm_pse_pi_regulator_register(struct pse_controller_dev *pcdev,
 
 	rinit_data->constraints.valid_ops_mask = REGULATOR_CHANGE_STATUS;
 
-	if (pcdev->ops->pi_set_current_limit) {
+	if (pcdev->ops->pi_set_current_limit)
 		rinit_data->constraints.valid_ops_mask |=
 			REGULATOR_CHANGE_CURRENT;
-		rinit_data->constraints.max_uA = MAX_PI_CURRENT;
-	}
 
 	rinit_data->supply_regulator = "vpwr";
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kory Maincent kory.maincent@bootlin.com
[ Upstream commit e0a5e2bba38aa61a900934b45d6e846e0a6d7524 ]
The regulator framework uses current limits, but the PSE standard and known PSE controllers rely on power limits. Instead of converting current to power within each driver, perform the conversion in the PSE core. This avoids redundancy in driver implementation and aligns better with the standard, simplifying driver development.
At the same time, remove the _pse_ethtool_get_status() function, which is no longer needed.
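As a rough standalone sketch of the unit conversions the core now centralizes (the helper names are invented for illustration; DIV_ROUND_CLOSEST_ULL is approximated by rounded integer division):

#include <stdio.h>
#include <stdint.h>

/* mW = uA * uV / 1000000000, with a 64-bit intermediate as in the diff */
static int ua_to_mw(int64_t max_ua, int64_t uv)
{
	return (int)((max_ua * uv + 500000000) / 1000000000);
}

/* uA = mW * 1000000000 / uV */
static int mw_to_ua(int64_t mw, int64_t uv)
{
	return (int)((mw * 1000000000LL + uv / 2) / uv);
}

int main(void)
{
	/* e.g. a 30 W port power limit at 54 V */
	printf("30000 mW -> %d uA\n", mw_to_ua(30000, 54000000));
	printf("555556 uA -> %d mW\n", ua_to_mw(555556, 54000000));
	return 0;
}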
Acked-by: Oleksij Rempel o.rempel@pengutronix.de
Signed-off-by: Kory Maincent kory.maincent@bootlin.com
Signed-off-by: Paolo Abeni pabeni@redhat.com
Stable-dep-of: f6093c5ec74d ("net: pse-pd: pd692x0: Fix power limit retrieval")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/pse-pd/pd692x0.c  | 45 ++++-------------
 drivers/net/pse-pd/pse_core.c | 91 ++++++++++++++++-------------------
 include/linux/pse-pd/pse.h    | 16 +++---
 3 files changed, 57 insertions(+), 95 deletions(-)
diff --git a/drivers/net/pse-pd/pd692x0.c b/drivers/net/pse-pd/pd692x0.c
index 0af7db80b2f88..9f00538f7e450 100644
--- a/drivers/net/pse-pd/pd692x0.c
+++ b/drivers/net/pse-pd/pd692x0.c
@@ -999,13 +999,12 @@ static int pd692x0_pi_get_voltage(struct pse_controller_dev *pcdev, int id)
 	return (buf.sub[0] << 8 | buf.sub[1]) * 100000;
 }
 
-static int pd692x0_pi_get_current_limit(struct pse_controller_dev *pcdev,
-					int id)
+static int pd692x0_pi_get_pw_limit(struct pse_controller_dev *pcdev,
+				   int id)
 {
 	struct pd692x0_priv *priv = to_pd692x0_priv(pcdev);
 	struct pd692x0_msg msg, buf = {0};
-	int mW, uV, uA, ret;
-	s64 tmp_64;
+	int ret;
 
 	msg = pd692x0_msg_template_list[PD692X0_MSG_GET_PORT_PARAM];
 	msg.sub[2] = id;
@@ -1013,48 +1012,24 @@ static int pd692x0_pi_get_current_limit(struct pse_controller_dev *pcdev,
 	if (ret < 0)
 		return ret;
 
-	ret = pd692x0_pi_get_pw_from_table(buf.data[2], buf.data[3]);
-	if (ret < 0)
-		return ret;
-	mW = ret;
-
-	ret = pd692x0_pi_get_voltage(pcdev, id);
-	if (ret < 0)
-		return ret;
-	uV = ret;
-
-	tmp_64 = mW;
-	tmp_64 *= 1000000000ull;
-	/* uA = mW * 1000000000 / uV */
-	uA = DIV_ROUND_CLOSEST_ULL(tmp_64, uV);
-	return uA;
+	return pd692x0_pi_get_pw_from_table(buf.data[2], buf.data[3]);
 }
 
-static int pd692x0_pi_set_current_limit(struct pse_controller_dev *pcdev,
-					int id, int max_uA)
+static int pd692x0_pi_set_pw_limit(struct pse_controller_dev *pcdev,
+				   int id, int max_mW)
 {
 	struct pd692x0_priv *priv = to_pd692x0_priv(pcdev);
 	struct device *dev = &priv->client->dev;
 	struct pd692x0_msg msg, buf = {0};
-	int uV, ret, mW;
-	s64 tmp_64;
+	int ret;
 
 	ret = pd692x0_fw_unavailable(priv);
 	if (ret)
 		return ret;
 
-	ret = pd692x0_pi_get_voltage(pcdev, id);
-	if (ret < 0)
-		return ret;
-	uV = ret;
-
 	msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM];
 	msg.sub[2] = id;
-	tmp_64 = uV;
-	tmp_64 *= max_uA;
-	/* mW = uV * uA / 1000000000 */
-	mW = DIV_ROUND_CLOSEST_ULL(tmp_64, 1000000000);
-	ret = pd692x0_pi_set_pw_from_table(dev, &msg, mW);
+	ret = pd692x0_pi_set_pw_from_table(dev, &msg, max_mW);
 	if (ret)
 		return ret;
 
@@ -1068,8 +1043,8 @@ static const struct pse_controller_ops pd692x0_ops = {
 	.pi_disable = pd692x0_pi_disable,
 	.pi_is_enabled = pd692x0_pi_is_enabled,
 	.pi_get_voltage = pd692x0_pi_get_voltage,
-	.pi_get_current_limit = pd692x0_pi_get_current_limit,
-	.pi_set_current_limit = pd692x0_pi_set_current_limit,
+	.pi_get_pw_limit = pd692x0_pi_get_pw_limit,
+	.pi_set_pw_limit = pd692x0_pi_set_pw_limit,
 };
 
 #define PD692X0_FW_LINE_MAX_SZ 0xff
diff --git a/drivers/net/pse-pd/pse_core.c b/drivers/net/pse-pd/pse_core.c
index 9fee4dd53515a..4c5abef9e94ee 100644
--- a/drivers/net/pse-pd/pse_core.c
+++ b/drivers/net/pse-pd/pse_core.c
@@ -291,33 +291,25 @@ static int pse_pi_get_voltage(struct regulator_dev *rdev)
 	return ret;
 }
 
-static int _pse_ethtool_get_status(struct pse_controller_dev *pcdev,
-				   int id,
-				   struct netlink_ext_ack *extack,
-				   struct pse_control_status *status);
-
 static int pse_pi_get_current_limit(struct regulator_dev *rdev)
 {
 	struct pse_controller_dev *pcdev = rdev_get_drvdata(rdev);
 	const struct pse_controller_ops *ops;
-	struct netlink_ext_ack extack = {};
-	struct pse_control_status st = {};
-	int id, uV, ret;
+	int id, uV, mW, ret;
 	s64 tmp_64;
 
 	ops = pcdev->ops;
 	id = rdev_get_id(rdev);
+	if (!ops->pi_get_pw_limit || !ops->pi_get_voltage)
+		return -EOPNOTSUPP;
+
 	mutex_lock(&pcdev->lock);
-	if (ops->pi_get_current_limit) {
-		ret = ops->pi_get_current_limit(pcdev, id);
+	ret = ops->pi_get_pw_limit(pcdev, id);
+	if (ret < 0)
 		goto out;
-	}
+	mW = ret;
 
-	/* If pi_get_current_limit() callback not populated get voltage
-	 * from pi_get_voltage() and power limit from ethtool_get_status()
-	 * to calculate current limit.
-	 */
-	ret = _pse_pi_get_voltage(rdev);
+	ret = pse_pi_get_voltage(rdev);
 	if (!ret) {
 		dev_err(pcdev->dev, "Voltage null\n");
 		ret = -ERANGE;
@@ -327,16 +319,7 @@ static int pse_pi_get_current_limit(struct regulator_dev *rdev)
 		goto out;
 	uV = ret;
 
-	ret = _pse_ethtool_get_status(pcdev, id, &extack, &st);
-	if (ret)
-		goto out;
-
-	if (!st.c33_avail_pw_limit) {
-		ret = -ENODATA;
-		goto out;
-	}
-
-	tmp_64 = st.c33_avail_pw_limit;
+	tmp_64 = mW;
 	tmp_64 *= 1000000000ull;
 	/* uA = mW * 1000000000 / uV */
 	ret = DIV_ROUND_CLOSEST_ULL(tmp_64, uV);
@@ -351,10 +334,11 @@ static int pse_pi_set_current_limit(struct regulator_dev *rdev, int min_uA,
 {
 	struct pse_controller_dev *pcdev = rdev_get_drvdata(rdev);
 	const struct pse_controller_ops *ops;
-	int id, ret;
+	int id, mW, ret;
+	s64 tmp_64;
 
 	ops = pcdev->ops;
-	if (!ops->pi_set_current_limit)
+	if (!ops->pi_set_pw_limit || !ops->pi_get_voltage)
 		return -EOPNOTSUPP;
 
 	if (max_uA > MAX_PI_CURRENT)
@@ -362,7 +346,21 @@ static int pse_pi_set_current_limit(struct regulator_dev *rdev, int min_uA,
 
 	id = rdev_get_id(rdev);
 	mutex_lock(&pcdev->lock);
-	ret = ops->pi_set_current_limit(pcdev, id, max_uA);
+	ret = pse_pi_get_voltage(rdev);
+	if (!ret) {
+		dev_err(pcdev->dev, "Voltage null\n");
+		ret = -ERANGE;
+		goto out;
+	}
+	if (ret < 0)
+		goto out;
+
+	tmp_64 = ret;
+	tmp_64 *= max_uA;
+	/* mW = uA * uV / 1000000000 */
+	mW = DIV_ROUND_CLOSEST_ULL(tmp_64, 1000000000);
+	ret = ops->pi_set_pw_limit(pcdev, id, mW);
+out:
 	mutex_unlock(&pcdev->lock);
 
 	return ret;
@@ -406,7 +404,7 @@ devm_pse_pi_regulator_register(struct pse_controller_dev *pcdev,
 
 	rinit_data->constraints.valid_ops_mask = REGULATOR_CHANGE_STATUS;
 
-	if (pcdev->ops->pi_set_current_limit)
+	if (pcdev->ops->pi_set_pw_limit)
 		rinit_data->constraints.valid_ops_mask |=
 			REGULATOR_CHANGE_CURRENT;
 
@@ -737,23 +735,6 @@ struct pse_control *of_pse_control_get(struct device_node *node)
 }
 EXPORT_SYMBOL_GPL(of_pse_control_get);
 
-static int _pse_ethtool_get_status(struct pse_controller_dev *pcdev,
-				   int id,
-				   struct netlink_ext_ack *extack,
-				   struct pse_control_status *status)
-{
-	const struct pse_controller_ops *ops;
-
-	ops = pcdev->ops;
-	if (!ops->ethtool_get_status) {
-		NL_SET_ERR_MSG(extack,
-			       "PSE driver does not support status report");
-		return -EOPNOTSUPP;
-	}
-
-	return ops->ethtool_get_status(pcdev, id, extack, status);
-}
-
 /**
  * pse_ethtool_get_status - get status of PSE control
  * @psec: PSE control pointer
@@ -766,11 +747,21 @@ int pse_ethtool_get_status(struct pse_control *psec,
			   struct netlink_ext_ack *extack,
			   struct pse_control_status *status)
 {
+	const struct pse_controller_ops *ops;
+	struct pse_controller_dev *pcdev;
 	int err;
 
-	mutex_lock(&psec->pcdev->lock);
-	err = _pse_ethtool_get_status(psec->pcdev, psec->id, extack, status);
-	mutex_unlock(&psec->pcdev->lock);
+	pcdev = psec->pcdev;
+	ops = pcdev->ops;
+	if (!ops->ethtool_get_status) {
+		NL_SET_ERR_MSG(extack,
+			       "PSE driver does not support status report");
+		return -EOPNOTSUPP;
+	}
+
+	mutex_lock(&pcdev->lock);
+	err = ops->ethtool_get_status(pcdev, psec->id, extack, status);
+	mutex_unlock(&pcdev->lock);
 
 	return err;
 }
diff --git a/include/linux/pse-pd/pse.h b/include/linux/pse-pd/pse.h
index 591a53e082e65..df1592022d938 100644
--- a/include/linux/pse-pd/pse.h
+++ b/include/linux/pse-pd/pse.h
@@ -75,12 +75,8 @@ struct pse_control_status {
  * @pi_disable: Configure the PSE PI as disabled.
  * @pi_get_voltage: Return voltage similarly to get_voltage regulator
  *		    callback.
- * @pi_get_current_limit: Get the configured current limit similarly to
- *			  get_current_limit regulator callback.
- * @pi_set_current_limit: Configure the current limit similarly to
- *			  set_current_limit regulator callback.
- *			  Should not return an error in case of MAX_PI_CURRENT
- *			  current value set.
+ * @pi_get_pw_limit: Get the configured power limit of the PSE PI.
+ * @pi_set_pw_limit: Configure the power limit of the PSE PI.
  */
 struct pse_controller_ops {
 	int (*ethtool_get_status)(struct pse_controller_dev *pcdev,
@@ -91,10 +87,10 @@ struct pse_controller_ops {
 	int (*pi_enable)(struct pse_controller_dev *pcdev, int id);
 	int (*pi_disable)(struct pse_controller_dev *pcdev, int id);
 	int (*pi_get_voltage)(struct pse_controller_dev *pcdev, int id);
-	int (*pi_get_current_limit)(struct pse_controller_dev *pcdev,
-				    int id);
-	int (*pi_set_current_limit)(struct pse_controller_dev *pcdev,
-				    int id, int max_uA);
+	int (*pi_get_pw_limit)(struct pse_controller_dev *pcdev,
+			       int id);
+	int (*pi_set_pw_limit)(struct pse_controller_dev *pcdev,
+			       int id, int max_mW);
 };
 
 struct module;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kory Maincent kory.maincent@bootlin.com
[ Upstream commit f6093c5ec74d5cc495f89bd359253d9c738d04d9 ]
Fix incorrect data offset read in the pd692x0_pi_get_pw_limit callback. The issue was previously unnoticed as it was only used by the regulator API and not thoroughly tested, since the PSE is mainly controlled via ethtool.
The function became actively used by ethtool after commit 3e9dbfec4998 ("net: pse-pd: Split ethtool_get_status into multiple callbacks"), which led to the discovery of this issue.
Fix it by using the correct data offset.
Fixes: a87e699c9d33 ("net: pse-pd: pd692x0: Enhance with new current limit and voltage read callbacks")
Signed-off-by: Kory Maincent kory.maincent@bootlin.com
Link: https://patch.msgid.link/20250217134812.1925345-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/pse-pd/pd692x0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/pse-pd/pd692x0.c b/drivers/net/pse-pd/pd692x0.c
index 9f00538f7e450..7cfc36cadb576 100644
--- a/drivers/net/pse-pd/pd692x0.c
+++ b/drivers/net/pse-pd/pd692x0.c
@@ -1012,7 +1012,7 @@ static int pd692x0_pi_get_pw_limit(struct pse_controller_dev *pcdev,
 	if (ret < 0)
 		return ret;
 
-	return pd692x0_pi_get_pw_from_table(buf.data[2], buf.data[3]);
+	return pd692x0_pi_get_pw_from_table(buf.data[0], buf.data[1]);
 }
 
 static int pd692x0_pi_set_pw_limit(struct pse_controller_dev *pcdev,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 4ccacf86491d33d2486b62d4d44864d7101b299d ]
Brad Spengler reported the list_del() corruption splat in gtp_net_exit_batch_rtnl(). [0]
Commit eb28fd76c0a0 ("gtp: Destroy device along with udp socket's netns dismantle.") added the for_each_netdev() loop in gtp_net_exit_batch_rtnl() to destroy devices in each netns as done in geneve and ip tunnels.
However, this could trigger ->dellink() twice for the same device during ->exit_batch_rtnl().
Say we have two netns A & B and gtp device B that resides in netns B but whose UDP socket is in netns A.
1. cleanup_net() processes netns A and then B.
2. gtp_net_exit_batch_rtnl() finds the device B while iterating netns A's gn->gtp_dev_list and calls ->dellink().
[ device B is not yet unlinked from netns B as unregister_netdevice_many() has not been called. ]
3. gtp_net_exit_batch_rtnl() finds the device B while iterating netns B's for_each_netdev() and calls ->dellink().
gtp_dellink() cleans up the device's hash table, unlinks the dev from gn->gtp_dev_list, and calls unregister_netdevice_queue().
Basically, calling gtp_dellink() multiple times is fine unless CONFIG_DEBUG_LIST is enabled.
Let's remove for_each_netdev() in gtp_net_exit_batch_rtnl() and delegate the destruction to default_device_exit_batch() as done in bareudp.
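To make the failure mode concrete, here is a self-contained toy model of the double deletion (the poison values and checks are simplified stand-ins for include/linux/poison.h and lib/list_debug.c; this is not the gtp code):

#include <stdio.h>

#define LIST_POISON1 ((void *)0x100)	/* simplified poison values */
#define LIST_POISON2 ((void *)0x122)

struct list_head { struct list_head *next, *prev; };

static void list_del(struct list_head *entry)
{
	/* with CONFIG_DEBUG_LIST the kernel BUGs here on re-deletion */
	if (entry->next == LIST_POISON1 || entry->prev == LIST_POISON2) {
		printf("list_del corruption: entry already deleted\n");
		return;
	}
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

int main(void)
{
	struct list_head head = { &head, &head };
	struct list_head dev = { &head, &head };

	head.next = head.prev = &dev;
	list_del(&dev);	/* first ->dellink(): fine */
	list_del(&dev);	/* second ->dellink(): poisoned pointers detected */
	return 0;
}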
[0]:
list_del corruption, ffff8880aaa62c00->next (autoslab_size_M_dev_P_net_core_dev_11127_8_1328_8_S_4096_A_64_n_139+0xc00/0x1000 [slab object]) is LIST_POISON1 (ffffffffffffff02) (prev is 0xffffffffffffff04)
kernel BUG at lib/list_debug.c:58!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 UID: 0 PID: 1804 Comm: kworker/u8:7 Tainted: G T 6.12.13-grsec-full-20250211091339 #1
Tainted: [T]=RANDSTRUCT
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: netns cleanup_net
RIP: 0010:[<ffffffff84947381>] __list_del_entry_valid_or_report+0x141/0x200 lib/list_debug.c:58
Code: c2 76 91 31 c0 e8 9f b1 f7 fc 0f 0b 4d 89 f0 48 c7 c1 02 ff ff ff 48 89 ea 48 89 ee 48 c7 c7 e0 c2 76 91 31 c0 e8 7f b1 f7 fc <0f> 0b 4d 89 e8 48 c7 c1 04 ff ff ff 48 89 ea 48 89 ee 48 c7 c7 60
RSP: 0018:fffffe8040b4fbd0 EFLAGS: 00010283
RAX: 00000000000000cc RBX: dffffc0000000000 RCX: ffffffff818c4054
RDX: ffffffff84947381 RSI: ffffffff818d1512 RDI: 0000000000000000
RBP: ffff8880aaa62c00 R08: 0000000000000001 R09: fffffbd008169f32
R10: fffffe8040b4f997 R11: 0000000000000001 R12: a1988d84f24943e4
R13: ffffffffffffff02 R14: ffffffffffffff04 R15: ffff8880aaa62c08
RBX: kasan shadow of 0x0
RCX: __wake_up_klogd.part.0+0x74/0xe0 kernel/printk/printk.c:4554
RDX: __list_del_entry_valid_or_report+0x141/0x200 lib/list_debug.c:58
RSI: vprintk+0x72/0x100 kernel/printk/printk_safe.c:71
RBP: autoslab_size_M_dev_P_net_core_dev_11127_8_1328_8_S_4096_A_64_n_139+0xc00/0x1000 [slab object]
RSP: process kstack fffffe8040b4fbd0+0x7bd0/0x8000 [kworker/u8:7+netns 1804 ]
R09: kasan shadow of process kstack fffffe8040b4f990+0x7990/0x8000 [kworker/u8:7+netns 1804 ]
R10: process kstack fffffe8040b4f997+0x7997/0x8000 [kworker/u8:7+netns 1804 ]
R15: autoslab_size_M_dev_P_net_core_dev_11127_8_1328_8_S_4096_A_64_n_139+0xc08/0x1000 [slab object]
FS: 0000000000000000(0000) GS:ffff888116000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000748f5372c000 CR3: 0000000015408000 CR4: 00000000003406f0
shadow CR4: 00000000003406f0
Stack:
 0000000000000000 ffffffff8a0c35e7 ffffffff8a0c3603 ffff8880aaa62c00
 ffff8880aaa62c00 0000000000000004 ffff88811145311c 0000000000000005
 0000000000000001 ffff8880aaa62000 fffffe8040b4fd40 ffffffff8a0c360d
Call Trace:
 <TASK>
 [<ffffffff8a0c360d>] __list_del_entry_valid include/linux/list.h:131 [inline]
 fffffe8040b4fc28 [<ffffffff8a0c360d>] __list_del_entry include/linux/list.h:248 [inline]
 fffffe8040b4fc28 [<ffffffff8a0c360d>] list_del include/linux/list.h:262 [inline]
 fffffe8040b4fc28 [<ffffffff8a0c360d>] gtp_dellink+0x16d/0x360 drivers/net/gtp.c:1557
 fffffe8040b4fc28 [<ffffffff8a0d0404>] gtp_net_exit_batch_rtnl+0x124/0x2c0 drivers/net/gtp.c:2495
 fffffe8040b4fc88 [<ffffffff8e705b24>] cleanup_net+0x5a4/0xbe0 net/core/net_namespace.c:635
 fffffe8040b4fcd0 [<ffffffff81754c97>] process_one_work+0xbd7/0x2160 kernel/workqueue.c:3326
 fffffe8040b4fd88 [<ffffffff81757195>] process_scheduled_works kernel/workqueue.c:3407 [inline]
 fffffe8040b4fec0 [<ffffffff81757195>] worker_thread+0x6b5/0xfa0 kernel/workqueue.c:3488
 fffffe8040b4fec0 [<ffffffff817782a0>] kthread+0x360/0x4c0 kernel/kthread.c:397
 fffffe8040b4ff78 [<ffffffff814d8594>] ret_from_fork+0x74/0xe0 arch/x86/kernel/process.c:172
 fffffe8040b4ffb8 [<ffffffff8110f509>] ret_from_fork_asm+0x29/0xc0 arch/x86/entry/entry_64.S:399
 fffffe8040b4ffe8 </TASK>
Modules linked in:
Fixes: eb28fd76c0a0 ("gtp: Destroy device along with udp socket's netns dismantle.")
Reported-by: Brad Spengler spender@grsecurity.net
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com
Link: https://patch.msgid.link/20250217203705.40342-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/gtp.c | 5 -----
 1 file changed, 5 deletions(-)
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 47406ce990161..33b78b4007fe7 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -2487,11 +2487,6 @@ static void __net_exit gtp_net_exit_batch_rtnl(struct list_head *net_list,
 	list_for_each_entry(net, net_list, exit_list) {
 		struct gtp_net *gn = net_generic(net, gtp_net_id);
 		struct gtp_dev *gtp, *gtp_next;
-		struct net_device *dev;
-
-		for_each_netdev(net, dev)
-			if (dev->rtnl_link_ops == &gtp_link_ops)
-				gtp_dellink(dev, dev_to_kill);
 
 		list_for_each_entry_safe(gtp, gtp_next, &gn->gtp_dev_list, list)
 			gtp_dellink(gtp->dev, dev_to_kill);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 62fab6eef61f245dc8797e3a6a5b890ef40e8628 ]
As explained in the previous patch, iterating for_each_netdev() and gn->geneve_list during ->exit_batch_rtnl() could trigger ->dellink() twice for the same device.
If CONFIG_DEBUG_LIST is enabled, we will see a list_del() corruption splat in the 2nd call of geneve_dellink().
Let's remove for_each_netdev() in geneve_destroy_tunnels() and delegate that part to default_device_exit_batch().
Fixes: 9593172d93b9 ("geneve: Fix use-after-free in geneve_find_dev().")
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com
Link: https://patch.msgid.link/20250217203705.40342-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/geneve.c | 7 -------
 1 file changed, 7 deletions(-)
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 9c53b0bbb4c57..963fb9261f017 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1902,14 +1902,7 @@ static void geneve_destroy_tunnels(struct net *net, struct list_head *head)
 {
 	struct geneve_net *gn = net_generic(net, geneve_net_id);
 	struct geneve_dev *geneve, *next;
-	struct net_device *dev, *aux;
 
-	/* gather any geneve devices that were moved into this ns */
-	for_each_netdev_safe(net, dev, aux)
-		if (dev->rtnl_link_ops == &geneve_link_ops)
-			geneve_dellink(dev, head);
-
-	/* now gather any other geneve devices that were created in this ns */
 	list_for_each_entry_safe(geneve, next, &gn->geneve_list, next)
 		geneve_dellink(geneve->dev, head);
 }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 3e5796862c692ea608d96f0a1437f9290f44953a ]
This patch fixes a bug in TC flower filter where rules combining a specific destination port with a source port range weren't working correctly.
The specific case was when users tried to configure rules like:
tc filter add dev ens38 ingress protocol ip flower ip_proto udp \
    dst_port 5000 src_port 2000-3000 action drop
The root cause was in the flow dissector code. While both FLOW_DISSECTOR_KEY_PORTS and FLOW_DISSECTOR_KEY_PORTS_RANGE flags were being set correctly in the classifier, the __skb_flow_dissect_ports() function was only populating one of them: whichever came first in the enum check. This meant that when the code needed both a specific port and a port range, one of them would be left as 0, causing the filter to not match packets as expected.
Fix it by removing the either/or logic and instead checking and populating both key types independently when they're in use.
Fixes: 8ffb055beae5 ("cls_flower: Fix the behavior using port ranges with hw-offload")
Reported-by: Qiang Zhang dtzq01@gmail.com
Closes: https://lore.kernel.org/netdev/CAPx+-5uvFxkhkz4=j_Xuwkezjn9U6kzKTD5jz4tZ9msS...
Cc: Yoshiki Komachi komachi.yoshiki@gmail.com
Cc: Jamal Hadi Salim jhs@mojatatu.com
Cc: Jiri Pirko jiri@resnulli.us
Signed-off-by: Cong Wang xiyou.wangcong@gmail.com
Reviewed-by: Ido Schimmel idosch@nvidia.com
Link: https://patch.msgid.link/20250218043210.732959-2-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/core/flow_dissector.c | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 5db41bf2ed93e..c33af3ef0b790 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -853,23 +853,30 @@ __skb_flow_dissect_ports(const struct sk_buff *skb,
			 void *target_container, const void *data,
			 int nhoff, u8 ip_proto, int hlen)
 {
-	enum flow_dissector_key_id dissector_ports = FLOW_DISSECTOR_KEY_MAX;
-	struct flow_dissector_key_ports *key_ports;
+	struct flow_dissector_key_ports_range *key_ports_range = NULL;
+	struct flow_dissector_key_ports *key_ports = NULL;
+	__be32 ports;
 
 	if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_PORTS))
-		dissector_ports = FLOW_DISSECTOR_KEY_PORTS;
-	else if (dissector_uses_key(flow_dissector,
-				    FLOW_DISSECTOR_KEY_PORTS_RANGE))
-		dissector_ports = FLOW_DISSECTOR_KEY_PORTS_RANGE;
+		key_ports = skb_flow_dissector_target(flow_dissector,
+						      FLOW_DISSECTOR_KEY_PORTS,
+						      target_container);
 
-	if (dissector_ports == FLOW_DISSECTOR_KEY_MAX)
+	if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_PORTS_RANGE))
+		key_ports_range = skb_flow_dissector_target(flow_dissector,
+							    FLOW_DISSECTOR_KEY_PORTS_RANGE,
+							    target_container);
+
+	if (!key_ports && !key_ports_range)
 		return;
 
-	key_ports = skb_flow_dissector_target(flow_dissector,
-					      dissector_ports,
-					      target_container);
-	key_ports->ports = __skb_flow_get_ports(skb, nhoff, ip_proto,
-						data, hlen);
+	ports = __skb_flow_get_ports(skb, nhoff, ip_proto, data, hlen);
+
+	if (key_ports)
+		key_ports->ports = ports;
+
+	if (key_ports_range)
+		key_ports_range->tp.ports = ports;
 }
static void
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 69ab34f705fbfabcace64b5d53bb7a4450fac875 ]
Fix how port range keys are handled in __skb_flow_bpf_to_target() by:
- Separating PORTS and PORTS_RANGE key handling
- Using correct key_ports_range structure for range keys
- Properly initializing both key types independently
This ensures port range information is correctly stored in its dedicated structure rather than incorrectly using the regular ports key structure.
Fixes: 59fb9b62fb6c ("flow_dissector: Fix to use new variables for port ranges in bpf hook")
Reported-by: Qiang Zhang dtzq01@gmail.com
Closes: https://lore.kernel.org/netdev/CAPx+-5uvFxkhkz4=j_Xuwkezjn9U6kzKTD5jz4tZ9msS...
Cc: Yoshiki Komachi komachi.yoshiki@gmail.com
Cc: Jamal Hadi Salim jhs@mojatatu.com
Cc: Jiri Pirko jiri@resnulli.us
Signed-off-by: Cong Wang xiyou.wangcong@gmail.com
Link: https://patch.msgid.link/20250218043210.732959-4-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/core/flow_dissector.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index c33af3ef0b790..9cd8de6bebb54 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -931,6 +931,7 @@ static void __skb_flow_bpf_to_target(const struct bpf_flow_keys *flow_keys,
				     struct flow_dissector *flow_dissector,
				     void *target_container)
 {
+	struct flow_dissector_key_ports_range *key_ports_range = NULL;
 	struct flow_dissector_key_ports *key_ports = NULL;
 	struct flow_dissector_key_control *key_control;
 	struct flow_dissector_key_basic *key_basic;
@@ -975,20 +976,21 @@ static void __skb_flow_bpf_to_target(const struct bpf_flow_keys *flow_keys,
 		key_control->addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
 	}
 
-	if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_PORTS))
+	if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_PORTS)) {
 		key_ports = skb_flow_dissector_target(flow_dissector,
 						      FLOW_DISSECTOR_KEY_PORTS,
 						      target_container);
-	else if (dissector_uses_key(flow_dissector,
-				    FLOW_DISSECTOR_KEY_PORTS_RANGE))
-		key_ports = skb_flow_dissector_target(flow_dissector,
-						      FLOW_DISSECTOR_KEY_PORTS_RANGE,
-						      target_container);
-
-	if (key_ports) {
 		key_ports->src = flow_keys->sport;
 		key_ports->dst = flow_keys->dport;
 	}
+	if (dissector_uses_key(flow_dissector,
+			       FLOW_DISSECTOR_KEY_PORTS_RANGE)) {
+		key_ports_range = skb_flow_dissector_target(flow_dissector,
+							    FLOW_DISSECTOR_KEY_PORTS_RANGE,
+							    target_container);
+		key_ports_range->tp.src = flow_keys->sport;
+		key_ports_range->tp.dst = flow_keys->dport;
+	}
if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_FLOW_LABEL)) {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Breno Leitao leitao@debian.org
[ Upstream commit 4b5a28b38c4a0106c64416a1b2042405166b26ce ]
Add dedicated helper for finding devices by hardware address when holding rtnl_lock, similar to existing dev_getbyhwaddr_rcu(). This prevents PROVE_LOCKING warnings when rtnl_lock is held but RCU read lock is not.
Extract common address comparison logic into dev_addr_cmp().
The context about this change could be found in the following discussion:
Link: https://lore.kernel.org/all/20250206-scarlet-ermine-of-improvement-1fcac5@le...
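A hypothetical caller, sketched only to show the locking contract the new helper assumes (the function and log message are made up):

#include <linux/if_arp.h>
#include <linux/netdevice.h>
#include <linux/rtnetlink.h>

static void log_dev_by_hwaddr(struct net *net, const char *ha)
{
	struct net_device *dev;

	rtnl_lock();
	dev = dev_getbyhwaddr(net, ARPHRD_ETHER, ha);
	if (dev)
		netdev_info(dev, "found by hardware address\n");
	/* no reference was taken: the pointer is only valid under RTNL */
	rtnl_unlock();
}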
Cc: kuniyu@amazon.com
Cc: ushankar@purestorage.com
Suggested-by: Eric Dumazet edumazet@google.com
Signed-off-by: Breno Leitao leitao@debian.org
Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com
Link: https://patch.msgid.link/20250218-arm_fix_selftest-v5-1-d3d6892db9e1@debian....
Signed-off-by: Jakub Kicinski kuba@kernel.org
Stable-dep-of: 4eae0ee0f1e6 ("arp: switch to dev_getbyhwaddr() in arp_req_set_public()")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            | 37 ++++++++++++++++++++++++++++++++++---
 2 files changed, 36 insertions(+), 3 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4f17b786828af..35b886385f329 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3064,6 +3064,8 @@ static inline struct net_device *first_net_device_rcu(struct net *net)
 }
 
 int netdev_boot_setup_check(struct net_device *dev);
+struct net_device *dev_getbyhwaddr(struct net *net, unsigned short type,
+				   const char *hwaddr);
 struct net_device *dev_getbyhwaddr_rcu(struct net *net, unsigned short type,
 				       const char *hwaddr);
 struct net_device *dev_getfirstbyhwtype(struct net *net, unsigned short type);
diff --git a/net/core/dev.c b/net/core/dev.c
index 2e0fe38d0e877..c761f862bc5a2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1012,6 +1012,12 @@ int netdev_get_name(struct net *net, char *name, int ifindex)
 	return ret;
 }
 
+static bool dev_addr_cmp(struct net_device *dev, unsigned short type,
+			 const char *ha)
+{
+	return dev->type == type && !memcmp(dev->dev_addr, ha, dev->addr_len);
+}
+
 /**
  * dev_getbyhwaddr_rcu - find a device by its hardware address
  * @net: the applicable net namespace
@@ -1020,7 +1026,7 @@ int netdev_get_name(struct net *net, char *name, int ifindex)
  *
  * Search for an interface by MAC address. Returns NULL if the device
  * is not found or a pointer to the device.
- * The caller must hold RCU or RTNL.
+ * The caller must hold RCU.
  * The returned device has not had its ref count increased
  * and the caller must therefore be careful about locking
 *
@@ -1032,14 +1038,39 @@ struct net_device *dev_getbyhwaddr_rcu(struct net *net, unsigned short type,
 	struct net_device *dev;
 
 	for_each_netdev_rcu(net, dev)
-		if (dev->type == type &&
-		    !memcmp(dev->dev_addr, ha, dev->addr_len))
+		if (dev_addr_cmp(dev, type, ha))
 			return dev;
 
 	return NULL;
 }
 EXPORT_SYMBOL(dev_getbyhwaddr_rcu);
 
+/**
+ * dev_getbyhwaddr() - find a device by its hardware address
+ * @net: the applicable net namespace
+ * @type: media type of device
+ * @ha: hardware address
+ *
+ * Similar to dev_getbyhwaddr_rcu(), but the owner needs to hold
+ * rtnl_lock.
+ *
+ * Context: rtnl_lock() must be held.
+ * Return: pointer to the net_device, or NULL if not found
+ */
+struct net_device *dev_getbyhwaddr(struct net *net, unsigned short type,
+				   const char *ha)
+{
+	struct net_device *dev;
+
+	ASSERT_RTNL();
+	for_each_netdev(net, dev)
+		if (dev_addr_cmp(dev, type, ha))
+			return dev;
+
+	return NULL;
+}
+EXPORT_SYMBOL(dev_getbyhwaddr);
+
 struct net_device *dev_getfirstbyhwtype(struct net *net, unsigned short type)
 {
 	struct net_device *dev, *ret = NULL;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Breno Leitao leitao@debian.org
[ Upstream commit 4eae0ee0f1e6256d0b0b9dd6e72f1d9cf8f72e08 ]
The arp_req_set_public() function is called with the rtnl lock held, which provides enough synchronization protection. This makes the RCU variant of dev_getbyhwaddr() unnecessary. Switch to using the simpler dev_getbyhwaddr() function since we already have the required rtnl locking.
This change helps maintain consistency in the networking code by using the appropriate helper function for the existing locking context. Since we're not holding the RCU read lock in arp_req_set_public(), the existing code could trigger false positive locking warnings.
Fixes: 941666c2e3e0 ("net: RCU conversion of dev_getbyhwaddr() and arp_ioctl()")
Suggested-by: Kuniyuki Iwashima kuniyu@amazon.com
Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com
Signed-off-by: Breno Leitao leitao@debian.org
Link: https://patch.msgid.link/20250218-arm_fix_selftest-v5-2-d3d6892db9e1@debian....
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/ipv4/arp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 59ffaa89d7b05..8fb48f42581ce 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1077,7 +1077,7 @@ static int arp_req_set_public(struct net *net, struct arpreq *r,
 	__be32 mask = ((struct sockaddr_in *)&r->arp_netmask)->sin_addr.s_addr;
 
 	if (!dev && (r->arp_flags & ATF_COM)) {
-		dev = dev_getbyhwaddr_rcu(net, r->arp_ha.sa_family,
+		dev = dev_getbyhwaddr(net, r->arp_ha.sa_family,
 				      r->arp_ha.sa_data);
 		if (!dev)
 			return -ENODEV;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nick Hu nick.hu@sifive.com
[ Upstream commit a370295367b55662a32a4be92565fe72a5aa79bb ]
The external PHY will undergo a soft reset twice during the resume process when it wakes up from suspend. The first reset occurs when the axienet driver calls phylink_of_phy_connect(), and the second occurs when mdio_bus_phy_resume() invokes phy_init_hw(). The second soft reset of the external PHY does not reinitialize the internal PHY, which causes issues with the internal PHY and results in the PHY link being down. To prevent this, set the mac_managed_pm flag so that mdio_bus_phy_resume() is skipped.
Fixes: a129b41fe0a8 ("Revert "net: phy: dp83867: perform soft reset and retain established link"")
Signed-off-by: Nick Hu nick.hu@sifive.com
Reviewed-by: Jacob Keller jacob.e.keller@intel.com
Link: https://patch.msgid.link/20250217055843.19799-1-nick.hu@sifive.com
Signed-off-by: Paolo Abeni pabeni@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index de10a2d08c428..fe3438abcd253 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -2888,6 +2888,7 @@ static int axienet_probe(struct platform_device *pdev)
 
 	lp->phylink_config.dev = &ndev->dev;
 	lp->phylink_config.type = PHYLINK_NETDEV;
+	lp->phylink_config.mac_managed_pm = true;
 	lp->phylink_config.mac_capabilities = MAC_SYM_PAUSE | MAC_ASYM_PAUSE |
 		MAC_10FD | MAC_100FD | MAC_1000FD;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit 9b6412e6979f6f9e0632075f8f008937b5cd4efd ]
Xiumei reported hitting the WARN in xfrm6_tunnel_net_exit while running tests that boil down to:
- create a pair of netns
- run a basic TCP test over ipcomp6
- delete the pair of netns
The xfrm_state found on spi_byaddr was not deleted at the time we delete the netns, because we still have a reference on it. This lingering reference comes from a secpath (which holds a ref on the xfrm_state), which is still attached to an skb. This skb is not leaked, it ends up on sk_receive_queue and then gets defer-free'd by skb_attempt_defer_free.
The problem happens when we defer freeing an skb (push it on one CPU's defer_list), and don't flush that list before the netns is deleted. In that case, we still have a reference on the xfrm_state that we don't expect at this point.
We already drop the skb's dst in the TCP receive path when it's no longer needed, so let's also drop the secpath. At this point, tcp_filter has already called into the LSM hooks that may require the secpath, so it should not be needed anymore. However, in some of those places, the MPTCP extension has just been attached to the skb, so we cannot simply drop all extensions.
Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
Reported-by: Xiumei Mu xmu@redhat.com
Signed-off-by: Sabrina Dubroca sd@queasysnail.net
Reviewed-by: Eric Dumazet edumazet@google.com
Link: https://patch.msgid.link/5055ba8f8f72bdcb602faa299faca73c280b7735.1739743613...
Signed-off-by: Paolo Abeni pabeni@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/net/tcp.h       | 14 ++++++++++++++
 net/ipv4/tcp_fastopen.c |  4 ++--
 net/ipv4/tcp_input.c    |  8 ++++----
 net/ipv4/tcp_ipv4.c     |  2 +-
 4 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index d1948d357dade..6cd0fde806519 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -41,6 +41,7 @@
 #include <net/inet_ecn.h>
 #include <net/dst.h>
 #include <net/mptcp.h>
+#include <net/xfrm.h>
 
 #include <linux/seq_file.h>
 #include <linux/memcontrol.h>
@@ -683,6 +684,19 @@ void tcp_fin(struct sock *sk);
 void tcp_check_space(struct sock *sk);
 void tcp_sack_compress_send_ack(struct sock *sk);
 
+static inline void tcp_cleanup_skb(struct sk_buff *skb)
+{
+	skb_dst_drop(skb);
+	secpath_reset(skb);
+}
+
+static inline void tcp_add_receive_queue(struct sock *sk, struct sk_buff *skb)
+{
+	DEBUG_NET_WARN_ON_ONCE(skb_dst(skb));
+	DEBUG_NET_WARN_ON_ONCE(secpath_exists(skb));
+	__skb_queue_tail(&sk->sk_receive_queue, skb);
+}
+
 /* tcp_timer.c */
 void tcp_init_xmit_timers(struct sock *);
 static inline void tcp_clear_xmit_timers(struct sock *sk)
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 0f523cbfe329e..32b28fc21b63c 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -178,7 +178,7 @@ void tcp_fastopen_add_skb(struct sock *sk, struct sk_buff *skb)
 	if (!skb)
 		return;
 
-	skb_dst_drop(skb);
+	tcp_cleanup_skb(skb);
 	/* segs_in has been initialized to 1 in tcp_create_openreq_child().
 	 * Hence, reset segs_in to 0 before calling tcp_segs_in()
 	 * to avoid double counting. Also, tcp_segs_in() expects
@@ -195,7 +195,7 @@ void tcp_fastopen_add_skb(struct sock *sk, struct sk_buff *skb)
 	TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_SYN;
 
 	tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
-	__skb_queue_tail(&sk->sk_receive_queue, skb);
+	tcp_add_receive_queue(sk, skb);
 	tp->syn_data_acked = 1;
 
 	/* u64_stats_update_begin(&tp->syncp) not needed here,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bb17add6e4a78..d93a5a89c5692 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4970,7 +4970,7 @@ static void tcp_ofo_queue(struct sock *sk)
 		tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq);
 		fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN;
 		if (!eaten)
-			__skb_queue_tail(&sk->sk_receive_queue, skb);
+			tcp_add_receive_queue(sk, skb);
 		else
 			kfree_skb_partial(skb, fragstolen);
 
@@ -5162,7 +5162,7 @@ static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb,
				      skb, fragstolen)) ? 1 : 0;
 	tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq);
 	if (!eaten) {
-		__skb_queue_tail(&sk->sk_receive_queue, skb);
+		tcp_add_receive_queue(sk, skb);
 		skb_set_owner_r(skb, sk);
 	}
 	return eaten;
@@ -5245,7 +5245,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 			__kfree_skb(skb);
 			return;
 		}
-		skb_dst_drop(skb);
+		tcp_cleanup_skb(skb);
 		__skb_pull(skb, tcp_hdr(skb)->doff * 4);
 
 	reason = SKB_DROP_REASON_NOT_SPECIFIED;
@@ -6214,7 +6214,7 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb)
 			NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPHPHITS);
 
 			/* Bulk data transfer: receiver */
-			skb_dst_drop(skb);
+			tcp_cleanup_skb(skb);
 			__skb_pull(skb, tcp_header_len);
 			eaten = tcp_queue_rcv(sk, skb, &fragstolen);
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index bcc2f1e090c7d..824048679e1b8 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2025,7 +2025,7 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb,
 	 */
 	skb_condense(skb);
 
-	skb_dst_drop(skb);
+	tcp_cleanup_skb(skb);
 
 	if (unlikely(tcp_checksum_complete(skb))) {
 		bh_unlock_sock(sk);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 14ad6ed30a10afbe91b0749d6378285f4225d482 ]
Sabrina reported the following splat:
WARNING: CPU: 0 PID: 1 at net/core/dev.c:6935 netif_napi_add_weight_locked+0x8f2/0xba0
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc1-net-00092-g011b03359038 #996
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
RIP: 0010:netif_napi_add_weight_locked+0x8f2/0xba0
Code: e8 c3 e6 6a fe 48 83 c4 28 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc c7 44 24 10 ff ff ff ff e9 8f fb ff ff e8 9e e6 6a fe <0f> 0b e9 d3 fe ff ff e8 92 e6 6a fe 48 8b 04 24 be ff ff ff ff 48
RSP: 0000:ffffc9000001fc60 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88806ce48128 RCX: 1ffff11001664b9e
RDX: ffff888008f00040 RSI: ffffffff8317ca42 RDI: ffff88800b325cb6
RBP: ffff88800b325c40 R08: 0000000000000001 R09: ffffed100167502c
R10: ffff88800b3a8163 R11: 0000000000000000 R12: ffff88800ac1c168
R13: ffff88800ac1c168 R14: ffff88800ac1c168 R15: 0000000000000007
FS: 0000000000000000(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888008201000 CR3: 0000000004c94001 CR4: 0000000000370ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 gro_cells_init+0x1ba/0x270
 xfrm_input_init+0x4b/0x2a0
 xfrm_init+0x38/0x50
 ip_rt_init+0x2d7/0x350
 ip_init+0xf/0x20
 inet_init+0x406/0x590
 do_one_initcall+0x9d/0x2e0
 do_initcalls+0x23b/0x280
 kernel_init_freeable+0x445/0x490
 kernel_init+0x20/0x1d0
 ret_from_fork+0x46/0x80
 ret_from_fork_asm+0x1a/0x30
 </TASK>
irq event stamp: 584330
hardirqs last enabled at (584338): [<ffffffff8168bf87>] __up_console_sem+0x77/0xb0
hardirqs last disabled at (584345): [<ffffffff8168bf6c>] __up_console_sem+0x5c/0xb0
softirqs last enabled at (583242): [<ffffffff833ee96d>] netlink_insert+0x14d/0x470
softirqs last disabled at (583754): [<ffffffff8317c8cd>] netif_napi_add_weight_locked+0x77d/0xba0
on a kernel built with MAX_SKB_FRAGS=45, where SKB_WITH_OVERHEAD(1024) is smaller than GRO_MAX_HEAD.
Such a build additionally contains the revert of the single page frag cache, so that napi_get_frags() ends up using the page frag allocator, triggering the splat.
Note that the underlying issue is independent from the mentioned revert; address it by ensuring that the small head cache will fit both TCP and GRO allocations, and by updating napi_alloc_skb() and __netdev_alloc_skb() to select kmalloc() for any allocation fitting such cache.
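A back-of-the-envelope sketch of the sizing rule (every constant below is config-dependent and assumed here purely for illustration):

#include <stdio.h>

#define NET_SKB_PAD	64
#define NET_IP_ALIGN	2		/* 0 on x86; assumed here */
#define MAX_HEADER	128		/* config-dependent; assumed */
#define MAX_TCP_HEADER	(128 + 160)	/* assumed for illustration */

#define GRO_MAX_HEAD	 (MAX_HEADER + 128)
#define GRO_MAX_HEAD_PAD (GRO_MAX_HEAD + NET_SKB_PAD + NET_IP_ALIGN)

int main(void)
{
	unsigned int tcp = MAX_TCP_HEADER, gro = GRO_MAX_HEAD_PAD;

	/* the small head cache must fit the larger of its two users */
	printf("small head size >= %u\n", tcp > gro ? tcp : gro);
	return 0;
}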
Reported-by: Sabrina Dubroca sd@queasysnail.net
Suggested-by: Eric Dumazet edumazet@google.com
Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS")
Reviewed-by: Eric Dumazet edumazet@google.com
Signed-off-by: Paolo Abeni pabeni@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/net/gro.h | 3 +++
 net/core/gro.c    | 3 ---
 net/core/skbuff.c | 10 +++++++---
 3 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/include/net/gro.h b/include/net/gro.h
index b9b58c1f8d190..7b548f91754bf 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -11,6 +11,9 @@
 #include <net/udp.h>
 #include <net/hotdata.h>
 
+/* This should be increased if a protocol with a bigger head is added. */
+#define GRO_MAX_HEAD (MAX_HEADER + 128)
+
 struct napi_gro_cb {
 	union {
 		struct {
diff --git a/net/core/gro.c b/net/core/gro.c
index d1f44084e978f..78b320b631744 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -7,9 +7,6 @@
 
 #define MAX_GRO_SKBS 8
 
-/* This should be increased if a protocol with a bigger head is added. */
-#define GRO_MAX_HEAD (MAX_HEADER + 128)
-
 static DEFINE_SPINLOCK(offload_lock);
 
 /**
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 74149dc4ee318..61a950f13a91c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -69,6 +69,7 @@
 #include <net/dst.h>
 #include <net/sock.h>
 #include <net/checksum.h>
+#include <net/gro.h>
 #include <net/gso.h>
 #include <net/hotdata.h>
 #include <net/ip6_checksum.h>
@@ -95,7 +96,9 @@ static struct kmem_cache *skbuff_ext_cache __ro_after_init;
 #endif
 
-#define SKB_SMALL_HEAD_SIZE SKB_HEAD_ALIGN(MAX_TCP_HEADER)
+#define GRO_MAX_HEAD_PAD (GRO_MAX_HEAD + NET_SKB_PAD + NET_IP_ALIGN)
+#define SKB_SMALL_HEAD_SIZE SKB_HEAD_ALIGN(max(MAX_TCP_HEADER, \
+					       GRO_MAX_HEAD_PAD))
 
 /* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two.
  * This should ensure that SKB_SMALL_HEAD_HEADROOM is a unique
@@ -736,7 +739,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
 	/* If requested length is either too small or too big,
 	 * we use kmalloc() for skb->head allocation.
 	 */
-	if (len <= SKB_WITH_OVERHEAD(1024) ||
+	if (len <= SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE) ||
 	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
 	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
 		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
@@ -816,7 +819,8 @@ struct sk_buff *napi_alloc_skb(struct napi_struct *napi, unsigned int len)
 	 * When the small frag allocator is available, prefer it over kmalloc
 	 * for small fragments
 	 */
-	if ((!NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_WITH_OVERHEAD(1024)) ||
+	if ((!NAPI_HAS_SMALL_PAGE_FRAG &&
+	     len <= SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE)) ||
 	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
 	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
 		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX | SKB_ALLOC_NAPI,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shigeru Yoshida syoshida@redhat.com
[ Upstream commit 6b3d638ca897e099fa99bd6d02189d3176f80a47 ]
KMSAN reported a use-after-free issue in eth_skb_pkt_type()[1]. The cause of the issue was that eth_skb_pkt_type() accessed skb's data that didn't contain an Ethernet header. This occurs when bpf_prog_test_run_xdp() passes an invalid value as the user_data argument to bpf_test_init().
Fix this by returning an error when user_size is less than ETH_HLEN in bpf_test_init(). Additionally, remove the check for "if (user_size > size)" as it is unnecessary.
[1]
BUG: KMSAN: use-after-free in eth_skb_pkt_type include/linux/etherdevice.h:627 [inline]
BUG: KMSAN: use-after-free in eth_type_trans+0x4ee/0x980 net/ethernet/eth.c:165
 eth_skb_pkt_type include/linux/etherdevice.h:627 [inline]
 eth_type_trans+0x4ee/0x980 net/ethernet/eth.c:165
 __xdp_build_skb_from_frame+0x5a8/0xa50 net/core/xdp.c:635
 xdp_recv_frames net/bpf/test_run.c:272 [inline]
 xdp_test_run_batch net/bpf/test_run.c:361 [inline]
 bpf_test_run_xdp_live+0x2954/0x3330 net/bpf/test_run.c:390
 bpf_prog_test_run_xdp+0x148e/0x1b10 net/bpf/test_run.c:1318
 bpf_prog_test_run+0x5b7/0xa30 kernel/bpf/syscall.c:4371
 __sys_bpf+0x6a6/0xe20 kernel/bpf/syscall.c:5777
 __do_sys_bpf kernel/bpf/syscall.c:5866 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:5864 [inline]
 __x64_sys_bpf+0xa4/0xf0 kernel/bpf/syscall.c:5864
 x64_sys_call+0x2ea0/0x3d90 arch/x86/include/generated/asm/syscalls_64.h:322
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xd9/0x1d0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was created at:
 free_pages_prepare mm/page_alloc.c:1056 [inline]
 free_unref_page+0x156/0x1320 mm/page_alloc.c:2657
 __free_pages+0xa3/0x1b0 mm/page_alloc.c:4838
 bpf_ringbuf_free kernel/bpf/ringbuf.c:226 [inline]
 ringbuf_map_free+0xff/0x1e0 kernel/bpf/ringbuf.c:235
 bpf_map_free kernel/bpf/syscall.c:838 [inline]
 bpf_map_free_deferred+0x17c/0x310 kernel/bpf/syscall.c:862
 process_one_work kernel/workqueue.c:3229 [inline]
 process_scheduled_works+0xa2b/0x1b60 kernel/workqueue.c:3310
 worker_thread+0xedf/0x1550 kernel/workqueue.c:3391
 kthread+0x535/0x6b0 kernel/kthread.c:389
 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
CPU: 1 UID: 0 PID: 17276 Comm: syz.1.16450 Not tainted 6.12.0-05490-g9bb88c659673 #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
Fixes: be3d72a2896c ("bpf: move user_size out of bpf_test_init")
Reported-by: syzkaller syzkaller@googlegroups.com
Suggested-by: Martin KaFai Lau martin.lau@linux.dev
Signed-off-by: Shigeru Yoshida syoshida@redhat.com
Signed-off-by: Martin KaFai Lau martin.lau@kernel.org
Acked-by: Stanislav Fomichev sdf@fomichev.me
Acked-by: Daniel Borkmann daniel@iogearbox.net
Link: https://patch.msgid.link/20250121150643.671650-1-syoshida@redhat.com
Signed-off-by: Alexei Starovoitov ast@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/bpf/test_run.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 501ec4249fedc..8612023bec60d 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -660,12 +660,9 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
 	void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
 	void *data;
 
-	if (size < ETH_HLEN || size > PAGE_SIZE - headroom - tailroom)
+	if (user_size < ETH_HLEN || user_size > PAGE_SIZE - headroom - tailroom)
 		return ERR_PTR(-EINVAL);
 
-	if (user_size > size)
-		return ERR_PTR(-EMSGSIZE);
-
 	size = SKB_DATA_ALIGN(size);
 	data = kzalloc(size + headroom + tailroom, GFP_USER);
 	if (!data)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrii Nakryiko andrii@kernel.org
[ Upstream commit 98671a0fd1f14e4a518ee06b19037c20014900eb ]
For all BPF maps we ensure that VM_MAYWRITE is cleared when memory-mapping BPF map contents as initially read-only VMA. This is because in some cases BPF verifier relies on the underlying data to not be modified afterwards by user space, so once something is mapped read-only, it shouldn't be re-mmap'ed as read-write.
As such, it's not necessary to check VM_MAYWRITE in bpf_map_mmap() and map->ops->map_mmap() callbacks: VM_WRITE should be consistently set for read-write mappings, and if VM_WRITE is not set, there is no way for user space to upgrade read-only mapping to read-write one.
This patch cleans up this VM_WRITE vs VM_MAYWRITE handling within bpf_map_mmap(), which is an entry point for any BPF map mmap()-ing logic. We also drop unnecessary sanitization of VM_MAYWRITE in BPF ringbuf's map_mmap() callback implementation, as it is already performed by common code in bpf_map_mmap().
Note, though, that in bpf_map_mmap_{open,close}() callbacks we can't drop VM_MAYWRITE use, because it's possible (and is outside of subsystem's control) to have initially read-write memory mapping, which is subsequently dropped to read-only by user space through mprotect(). In such case, from BPF verifier POV it's read-write data throughout the lifetime of BPF map, and is counted as "active writer".
But its VMAs will start out as VM_WRITE|VM_MAYWRITE, then mprotect() can change it to just VM_MAYWRITE (and no VM_WRITE), so when it's finally munmap()'ed and bpf_map_mmap_close() is called, vm_flags will be just VM_MAYWRITE, but we still need to decrement the active writer count with bpf_map_write_active_dec() as it's still considered to be a read-write mapping by the rest of the BPF subsystem.
Similar reasoning applies to bpf_map_mmap_open(), which is called whenever mmap(), munmap(), and/or mprotect() forces mm subsystem to split original VMA into multiple discontiguous VMAs.
Memory-mapping handling is a bit tricky, yes.
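A hypothetical userspace sequence that produces exactly this lifecycle (map creation is elided and error handling omitted; map_fd is assumed to be an mmap-able BPF array map):

#include <stddef.h>
#include <sys/mman.h>

void vma_lifecycle(int map_fd, size_t len)
{
	/* initially read-write: VM_WRITE|VM_MAYWRITE, counted as a writer */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
		       map_fd, 0);

	/* drop write permission: VM_WRITE goes away, VM_MAYWRITE stays;
	 * from the BPF verifier's POV this is still a writable mapping
	 */
	mprotect(p, len, PROT_READ);

	/* bpf_map_mmap_close() must check VM_MAYWRITE here, not VM_WRITE,
	 * to balance the "active writer" count taken at mmap() time
	 */
	munmap(p, len);
}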
Cc: Jann Horn jannh@google.com
Cc: Suren Baghdasaryan surenb@google.com
Cc: Shakeel Butt shakeel.butt@linux.dev
Signed-off-by: Andrii Nakryiko andrii@kernel.org
Link: https://lore.kernel.org/r/20250129012246.1515826-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov ast@kernel.org
Stable-dep-of: bc27c52eea18 ("bpf: avoid holding freeze_mutex during mmap operation")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/bpf/ringbuf.c |  4 ----
 kernel/bpf/syscall.c | 10 ++++++++--
 2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index e1cfe890e0be6..1499d8caa9a35 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -268,8 +268,6 @@ static int ringbuf_map_mmap_kern(struct bpf_map *map, struct vm_area_struct *vma)
 		/* allow writable mapping for the consumer_pos only */
 		if (vma->vm_pgoff != 0 || vma->vm_end - vma->vm_start != PAGE_SIZE)
 			return -EPERM;
-	} else {
-		vm_flags_clear(vma, VM_MAYWRITE);
 	}
 	/* remap_vmalloc_range() checks size and offset constraints */
 	return remap_vmalloc_range(vma, rb_map->rb,
@@ -289,8 +287,6 @@ static int ringbuf_map_mmap_user(struct bpf_map *map, struct vm_area_struct *vma)
 		 * position, and the ring buffer data itself.
 		 */
 		return -EPERM;
-	} else {
-		vm_flags_clear(vma, VM_MAYWRITE);
 	}
 	/* remap_vmalloc_range() checks size and offset constraints */
 	return remap_vmalloc_range(vma, rb_map->rb, vma->vm_pgoff + RINGBUF_PGOFF);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 368ae8d231d41..fa43f26ce0dac 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -966,15 +966,21 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 	vma->vm_ops = &bpf_map_default_vmops;
 	vma->vm_private_data = map;
 	vm_flags_clear(vma, VM_MAYEXEC);
+	/* If mapping is read-only, then disallow potentially re-mapping with
+	 * PROT_WRITE by dropping VM_MAYWRITE flag. This VM_MAYWRITE clearing
+	 * means that as far as BPF map's memory-mapped VMAs are concerned,
+	 * VM_WRITE and VM_MAYWRITE and equivalent, if one of them is set,
+	 * both should be set, so we can forget about VM_MAYWRITE and always
+	 * check just VM_WRITE
+	 */
 	if (!(vma->vm_flags & VM_WRITE))
-		/* disallow re-mapping with PROT_WRITE */
 		vm_flags_clear(vma, VM_MAYWRITE);
 
 	err = map->ops->map_mmap(map, vma);
 	if (err)
 		goto out;
 
-	if (vma->vm_flags & VM_MAYWRITE)
+	if (vma->vm_flags & VM_WRITE)
 		bpf_map_write_active_inc(map);
 out:
 	mutex_unlock(&map->freeze_mutex);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrii Nakryiko andrii@kernel.org
[ Upstream commit bc27c52eea189e8f7492d40739b7746d67b65beb ]
We use map->freeze_mutex to prevent races between map_freeze() and memory mapping BPF map contents with writable permissions. The way we naively do this means we'll hold freeze_mutex for the entire duration of all the mm and VMA manipulations, which is completely unnecessary. This can potentially also lead to deadlocks, as reported by syzbot in [0].
So, instead, hold freeze_mutex only during writeability checks, bump (proactively) "write active" count for the map, unlock the mutex and proceed with mmap logic. And only if something went wrong during mmap logic, then undo that "write active" counter increment.
[0] https://lore.kernel.org/bpf/678dcbc9.050a0220.303755.0066.GAE@google.com/
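Schematically, the new locking shape looks like the standalone model below (all names are invented; pthread and C11 atomics stand in for the kernel primitives):

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct fake_map {
	pthread_mutex_t freeze_mutex;
	bool frozen;
	atomic_int write_active;
};

static int heavy_mmap_work(struct fake_map *map)
{
	(void)map;
	return 0;	/* stand-in for the mm and VMA manipulations */
}

static int mmap_like_op(struct fake_map *map, bool writable)
{
	int err = 0;

	/* hold the mutex only for the check plus the counter bump */
	pthread_mutex_lock(&map->freeze_mutex);
	if (writable) {
		if (map->frozen)
			err = -1;	/* -EACCES in the kernel */
		else
			atomic_fetch_add(&map->write_active, 1);
	}
	pthread_mutex_unlock(&map->freeze_mutex);
	if (err)
		return err;

	err = heavy_mmap_work(map);	/* runs without freeze_mutex held */
	if (err && writable)
		atomic_fetch_sub(&map->write_active, 1);	/* undo */
	return err;
}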
Fixes: fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Reported-by: syzbot+4dc041c686b7c816a71e@syzkaller.appspotmail.com
Signed-off-by: Andrii Nakryiko andrii@kernel.org
Link: https://lore.kernel.org/r/20250129012246.1515826-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov ast@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/bpf/syscall.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index fa43f26ce0dac..3200372ea28ce 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -936,7 +936,7 @@ static const struct vm_operations_struct bpf_map_default_vmops = {
 static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 {
 	struct bpf_map *map = filp->private_data;
-	int err;
+	int err = 0;
 
 	if (!map->ops->map_mmap || !IS_ERR_OR_NULL(map->record))
 		return -ENOTSUPP;
@@ -960,7 +960,12 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 			err = -EACCES;
 			goto out;
 		}
+		bpf_map_write_active_inc(map);
 	}
+out:
+	mutex_unlock(&map->freeze_mutex);
+	if (err)
+		return err;
 
 	/* set default open/close callbacks */
 	vma->vm_ops = &bpf_map_default_vmops;
@@ -977,13 +982,11 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 		vm_flags_clear(vma, VM_MAYWRITE);
 
 	err = map->ops->map_mmap(map, vma);
-	if (err)
-		goto out;
+	if (err) {
+		if (vma->vm_flags & VM_WRITE)
+			bpf_map_write_active_dec(map);
+	}
 
-	if (vma->vm_flags & VM_WRITE)
-		bpf_map_write_active_inc(map);
-out:
-	mutex_unlock(&map->freeze_mutex);
 	return err;
 }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiayuan Chen mrpre@163.com
[ Upstream commit 0532a79efd68a4d9686b0385e4993af4b130ff82 ]
Add a new read_sock handler, allowing users to customize read operations instead of relying on the native socket's read_sock.
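A minimal kernel-style sketch of wiring up the new callback; my_read_sock(), my_parse() and my_rcv() are hypothetical, only the strp_callbacks/strp_init() usage reflects the patch below:

#include <net/strparser.h>

static int my_read_sock(struct strparser *strp, read_descriptor_t *desc,
			sk_read_actor_t recv_actor)
{
	/* custom read loop feeding skbs to recv_actor(), used by
	 * strparser instead of sock->ops->read_sock
	 */
	return 0;
}

static int my_parse(struct strparser *strp, struct sk_buff *skb)
{
	return skb->len;	/* treat each skb as one complete message */
}

static void my_rcv(struct strparser *strp, struct sk_buff *skb)
{
	kfree_skb(skb);
}

static int attach_strp(struct strparser *strp, struct sock *sk)
{
	struct strp_callbacks cb = {
		.parse_msg = my_parse,
		.rcv_msg   = my_rcv,
		.read_sock = my_read_sock,	/* new in this patch */
	};

	return strp_init(strp, sk, &cb);
}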
Signed-off-by: Jiayuan Chen mrpre@163.com Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Reviewed-by: Jakub Sitnicki jakub@cloudflare.com Acked-by: John Fastabend john.fastabend@gmail.com Link: https://patch.msgid.link/20250122100917.49845-2-mrpre@163.com Stable-dep-of: 36b62df5683c ("bpf: Fix wrong copied_seq calculation") Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/networking/strparser.rst | 9 ++++++++- include/net/strparser.h | 2 ++ net/strparser/strparser.c | 11 +++++++++-- 3 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/Documentation/networking/strparser.rst b/Documentation/networking/strparser.rst index 6cab1f74ae05a..7f623d1db72aa 100644 --- a/Documentation/networking/strparser.rst +++ b/Documentation/networking/strparser.rst @@ -112,7 +112,7 @@ Functions Callbacks =========
-There are six callbacks: +There are seven callbacks:
::
@@ -182,6 +182,13 @@ There are six callbacks: the length of the message. skb->len - offset may be greater then full_len since strparser does not trim the skb.
+ :: + + int (*read_sock)(struct strparser *strp, read_descriptor_t *desc, + sk_read_actor_t recv_actor); + + The read_sock callback is used by strparser instead of + sock->ops->read_sock, if provided. ::
int (*read_sock_done)(struct strparser *strp, int err); diff --git a/include/net/strparser.h b/include/net/strparser.h index 41e2ce9e9e10f..0a83010b3a64a 100644 --- a/include/net/strparser.h +++ b/include/net/strparser.h @@ -43,6 +43,8 @@ struct strparser; struct strp_callbacks { int (*parse_msg)(struct strparser *strp, struct sk_buff *skb); void (*rcv_msg)(struct strparser *strp, struct sk_buff *skb); + int (*read_sock)(struct strparser *strp, read_descriptor_t *desc, + sk_read_actor_t recv_actor); int (*read_sock_done)(struct strparser *strp, int err); void (*abort_parser)(struct strparser *strp, int err); void (*lock)(struct strparser *strp); diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c index 8299ceb3e3739..95696f42647ec 100644 --- a/net/strparser/strparser.c +++ b/net/strparser/strparser.c @@ -347,7 +347,10 @@ static int strp_read_sock(struct strparser *strp) struct socket *sock = strp->sk->sk_socket; read_descriptor_t desc;
- if (unlikely(!sock || !sock->ops || !sock->ops->read_sock)) + if (unlikely(!sock || !sock->ops)) + return -EBUSY; + + if (unlikely(!strp->cb.read_sock && !sock->ops->read_sock)) return -EBUSY;
desc.arg.data = strp; @@ -355,7 +358,10 @@ static int strp_read_sock(struct strparser *strp) desc.count = 1; /* give more than one skb per call */
/* sk should be locked here, so okay to do read_sock */ - sock->ops->read_sock(strp->sk, &desc, strp_recv); + if (strp->cb.read_sock) + strp->cb.read_sock(strp, &desc, strp_recv); + else + sock->ops->read_sock(strp->sk, &desc, strp_recv);
desc.error = strp->cb.read_sock_done(strp, desc.error);
@@ -468,6 +474,7 @@ int strp_init(struct strparser *strp, struct sock *sk, strp->cb.unlock = cb->unlock ? : strp_sock_unlock; strp->cb.rcv_msg = cb->rcv_msg; strp->cb.parse_msg = cb->parse_msg; + strp->cb.read_sock = cb->read_sock; strp->cb.read_sock_done = cb->read_sock_done ? : default_read_sock_done; strp->cb.abort_parser = cb->abort_parser ? : strp_abort_strp;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiayuan Chen mrpre@163.com
[ Upstream commit 36b62df5683c315ba58c950f1a9c771c796c30ec ]
'sk->copied_seq' was updated in the tcp_eat_skb() function when the action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS, the update logic for 'sk->copied_seq' was moved to tcp_bpf_recvmsg_parser() to ensure the accuracy of the 'fionread' feature.
It works for a single stream_verdict scenario, as that change also modified sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb to remove the update of 'sk->copied_seq'.

However, for programs where both stream_parser and stream_verdict are active (the strparser case), tcp_read_sock() is used instead of tcp_read_skb() (sk_data_ready->strp_data_ready->tcp_read_sock), and tcp_read_sock() still updates 'sk->copied_seq', leading to duplicate updates.
In summary, for strparser + SK_PASS, copied_seq is redundantly calculated in both tcp_read_sock() and tcp_bpf_recvmsg_parser().
The issue causes incorrect copied_seq calculations, which prevent correct data reads from the recv() interface in user-land.
We do not want to add new proto_ops to implement a new version of tcp_read_sock, as this would introduce code complexity [1].
We could have added noack and copied_seq to desc, and then called ops->read_sock. However, unfortunately, other modules didn’t fully initialize desc to zero. So, for now, we are directly calling tcp_read_sock_noack() in tcp_bpf.c.
[1]: https://lore.kernel.org/bpf/20241218053408.437295-1-mrpre@163.com
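To illustrate the resulting accounting with hypothetical numbers: suppose the strparser read loop consumes 100 bytes, of which 60 are redirected (SK_REDIRECT) and 40 land in the local ingress queue (SK_PASS). The noack read advances psock->copied_seq by 100 while leaving tp->copied_seq untouched; the handler then rewinds by the ingress bytes, as in the tcp_bpf.c hunk below:

    /* after tcp_read_sock_noack(): psock->copied_seq advanced by 100 */
    tp->copied_seq = psock->copied_seq - psock->ingress_bytes; /* +60 only */
    tcp_rcv_space_adjust(sk);
    __tcp_cleanup_rbuf(sk, copied - psock->ingress_bytes);     /* ack 60 now */
    /* the remaining 40 are accounted later, in tcp_bpf_recvmsg_parser() */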
Fixes: e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq") Suggested-by: Jakub Sitnicki jakub@cloudflare.com Signed-off-by: Jiayuan Chen mrpre@163.com Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Reviewed-by: Jakub Sitnicki jakub@cloudflare.com Acked-by: John Fastabend john.fastabend@gmail.com Link: https://patch.msgid.link/20250122100917.49845-3-mrpre@163.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/skmsg.h | 2 ++ include/net/tcp.h | 8 ++++++++ net/core/skmsg.c | 7 +++++++ net/ipv4/tcp.c | 29 ++++++++++++++++++++++++----- net/ipv4/tcp_bpf.c | 36 ++++++++++++++++++++++++++++++++++++ 5 files changed, 77 insertions(+), 5 deletions(-)
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 2cbe0c22a32f3..0b9095a281b89 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -91,6 +91,8 @@ struct sk_psock { struct sk_psock_progs progs; #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER) struct strparser strp; + u32 copied_seq; + u32 ingress_bytes; #endif struct sk_buff_head ingress_skb; struct list_head ingress_msg; diff --git a/include/net/tcp.h b/include/net/tcp.h index 6cd0fde806519..3255a199ef60d 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -743,6 +743,9 @@ void tcp_get_info(struct sock *, struct tcp_info *); /* Read 'sendfile()'-style from a TCP socket */ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, sk_read_actor_t recv_actor); +int tcp_read_sock_noack(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor, bool noack, + u32 *copied_seq); int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor); struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off); void tcp_read_done(struct sock *sk, size_t len); @@ -2609,6 +2612,11 @@ struct sk_psock; #ifdef CONFIG_BPF_SYSCALL int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore); void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); +#ifdef CONFIG_BPF_STREAM_PARSER +struct strparser; +int tcp_bpf_strp_read_sock(struct strparser *strp, read_descriptor_t *desc, + sk_read_actor_t recv_actor); +#endif /* CONFIG_BPF_STREAM_PARSER */ #endif /* CONFIG_BPF_SYSCALL */
#ifdef CONFIG_INET diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 8ad7e6755fd64..f76cbf49c68c8 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -548,6 +548,9 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb, return num_sge; }
+#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER) + psock->ingress_bytes += len; +#endif copied = len; msg->sg.start = 0; msg->sg.size = copied; @@ -1143,6 +1146,10 @@ int sk_psock_init_strp(struct sock *sk, struct sk_psock *psock) if (!ret) sk_psock_set_state(psock, SK_PSOCK_RX_STRP_ENABLED);
+ if (sk_is_tcp(sk)) { + psock->strp.cb.read_sock = tcp_bpf_strp_read_sock; + psock->copied_seq = tcp_sk(sk)->copied_seq; + } return ret; }
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 4f77bd862e957..68cb6a966b18b 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1564,12 +1564,13 @@ EXPORT_SYMBOL(tcp_recv_skb); * or for 'peeking' the socket using this routine * (although both would be easy to implement). */ -int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, - sk_read_actor_t recv_actor) +static int __tcp_read_sock(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor, bool noack, + u32 *copied_seq) { struct sk_buff *skb; struct tcp_sock *tp = tcp_sk(sk); - u32 seq = tp->copied_seq; + u32 seq = *copied_seq; u32 offset; int copied = 0;
@@ -1623,9 +1624,12 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, tcp_eat_recv_skb(sk, skb); if (!desc->count) break; - WRITE_ONCE(tp->copied_seq, seq); + WRITE_ONCE(*copied_seq, seq); } - WRITE_ONCE(tp->copied_seq, seq); + WRITE_ONCE(*copied_seq, seq); + + if (noack) + goto out;
tcp_rcv_space_adjust(sk);
@@ -1634,10 +1638,25 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, tcp_recv_skb(sk, seq, &offset); tcp_cleanup_rbuf(sk, copied); } +out: return copied; } + +int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor) +{ + return __tcp_read_sock(sk, desc, recv_actor, false, + &tcp_sk(sk)->copied_seq); +} EXPORT_SYMBOL(tcp_read_sock);
+int tcp_read_sock_noack(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor, bool noack, + u32 *copied_seq) +{ + return __tcp_read_sock(sk, desc, recv_actor, noack, copied_seq); +} + int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor) { struct sk_buff *skb; diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 392678ae80f4e..22e8a2af5dd8b 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -646,6 +646,42 @@ static int tcp_bpf_assert_proto_ops(struct proto *ops) ops->sendmsg == tcp_sendmsg ? 0 : -ENOTSUPP; }
+#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER) +int tcp_bpf_strp_read_sock(struct strparser *strp, read_descriptor_t *desc, + sk_read_actor_t recv_actor) +{ + struct sock *sk = strp->sk; + struct sk_psock *psock; + struct tcp_sock *tp; + int copied = 0; + + tp = tcp_sk(sk); + rcu_read_lock(); + psock = sk_psock(sk); + if (WARN_ON_ONCE(!psock)) { + desc->error = -EINVAL; + goto out; + } + + psock->ingress_bytes = 0; + copied = tcp_read_sock_noack(sk, desc, recv_actor, true, + &psock->copied_seq); + if (copied < 0) + goto out; + /* recv_actor may redirect skb to another socket (SK_REDIRECT) or + * just put skb into ingress queue of current socket (SK_PASS). + * For SK_REDIRECT, we need to ack the frame immediately but for + * SK_PASS, we want to delay the ack until tcp_bpf_recvmsg_parser(). + */ + tp->copied_seq = psock->copied_seq - psock->ingress_bytes; + tcp_rcv_space_adjust(sk); + __tcp_cleanup_rbuf(sk, copied - psock->ingress_bytes); +out: + rcu_read_unlock(); + return copied; +} +#endif /* CONFIG_BPF_STREAM_PARSER */ + int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore) { int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiayuan Chen mrpre@163.com
[ Upstream commit 5459cce6bf49e72ee29be21865869c2ac42419f5 ]
Currently, only TCP supports strparser, but sockmap doesn't intercept non-TCP connections to attach strparser. For example, with UDP, although the read/write handlers are replaced, strparser is not executed due to the lack of a read_sock operation.
Furthermore, udp_bpf_recvmsg() checks whether the psock has data; if not, it falls back to the native UDP read interface, making UDP + strparser appear to read correctly. According to its commit history, this behavior is unexpected.
Moreover, since UDP lacks the concept of streams, we reject the attach directly.
Fixes: 1fa1fe8ff161 ("bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0") Signed-off-by: Jiayuan Chen mrpre@163.com Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Acked-by: Jakub Sitnicki jakub@cloudflare.com Acked-by: John Fastabend john.fastabend@gmail.com Link: https://patch.msgid.link/20250122100917.49845-4-mrpre@163.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock_map.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 2f1be9baad057..82a14f131d00c 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -303,7 +303,10 @@ static int sock_map_link(struct bpf_map *map, struct sock *sk)
write_lock_bh(&sk->sk_callback_lock); if (stream_parser && stream_verdict && !psock->saved_data_ready) { - ret = sk_psock_init_strp(sk, psock); + if (sk_is_tcp(sk)) + ret = sk_psock_init_strp(sk, psock); + else + ret = -EOPNOTSUPP; if (ret) { write_unlock_bh(&sk->sk_callback_lock); sk_psock_put(sk, psock);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Abel Wu wuyun.abel@bytedance.com
[ Upstream commit c78f4afbd962f43a3989f45f3ca04300252b19b5 ]
Commit bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]") first introduced deadlock prevention for fentry/fexit programs attaching to the bpf_task_storage helpers. That commit also applied the same logic to the map free path in its v6 version.

Later, bpf_cgrp_storage was introduced in c4bcfb38a95e ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs"), which faces the same issue as bpf_task_storage. However, instead of its busy counter, NULL was passed to bpf_local_storage_map_free(), which opened a window to cause deadlock:
<TASK>
(acquiring local_storage->lock)
_raw_spin_lock_irqsave+0x3d/0x50
bpf_local_storage_update+0xd1/0x460
bpf_cgrp_storage_get+0x109/0x130
bpf_prog_a4d4a370ba857314_cgrp_ptr+0x139/0x170
? __bpf_prog_enter_recur+0x16/0x80
bpf_trampoline_6442485186+0x43/0xa4
cgroup_storage_ptr+0x9/0x20
(holding local_storage->lock)
bpf_selem_unlink_storage_nolock.constprop.0+0x135/0x160
bpf_selem_unlink_storage+0x6f/0x110
bpf_local_storage_map_free+0xa2/0x110
bpf_map_free_deferred+0x5b/0x90
process_one_work+0x17c/0x390
worker_thread+0x251/0x360
kthread+0xd2/0x100
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Progs:
- A: SEC("fentry/cgroup_storage_ptr")
  - cgid (BPF_MAP_TYPE_HASH): Record the id of the cgroup the current task
    belongs to in this hash map, using the address of the cgroup as the map
    key.
  - cgrpa (BPF_MAP_TYPE_CGRP_STORAGE): If the current task is a kworker,
    look up the above hash map using function parameter @owner as the key
    to get its corresponding cgroup id, which is then used to get a trusted
    pointer to the cgroup through bpf_cgroup_from_id(). This trusted
    pointer can then be passed to bpf_cgrp_storage_get() to finally
    trigger the deadlock issue.
- B: SEC("tp_btf/sys_enter")
  - cgrpb (BPF_MAP_TYPE_CGRP_STORAGE): The only purpose of this prog is to
    fill Prog A's hash map by calling bpf_cgrp_storage_get() for as many
    userspace tasks as possible.
Steps to reproduce:
- Run A;
- while (true) { Run B; Destroy B; }
Fix this issue by passing its busy counter to the free procedure so it can be properly incremented before storage/smap locking.
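A minimal sketch of the busy-counter guard this relies on (simplified from the real helpers; migrate_disable()/enable() and error handling omitted):

    /* free path: mark this CPU busy before taking local_storage->lock */
    this_cpu_inc(bpf_cgrp_storage_busy);
    /* ... unlink selems under local_storage->lock ... */
    this_cpu_dec(bpf_cgrp_storage_busy);

    /* helper path: back off instead of recursing into the lock */
    if (this_cpu_inc_return(bpf_cgrp_storage_busy) != 1) {
            this_cpu_dec(bpf_cgrp_storage_busy);
            return -EBUSY;
    }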
Fixes: c4bcfb38a95e ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs") Signed-off-by: Abel Wu wuyun.abel@bytedance.com Acked-by: Martin KaFai Lau martin.lau@kernel.org Link: https://lore.kernel.org/r/20241221061018.37717-1-wuyun.abel@bytedance.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/bpf_cgrp_storage.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c index 28efd0a3f2200..6547fb7ac0dcb 100644 --- a/kernel/bpf/bpf_cgrp_storage.c +++ b/kernel/bpf/bpf_cgrp_storage.c @@ -154,7 +154,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
static void cgroup_storage_map_free(struct bpf_map *map) { - bpf_local_storage_map_free(map, &cgroup_cache, NULL); + bpf_local_storage_map_free(map, &cgroup_cache, &bpf_cgrp_storage_busy); }
/* *gfp_flags* is a hidden argument provided by the verifier */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Yan andyshrk@163.com
[ Upstream commit a1d939055a22be06d8c12bf53afb258b9d38575f ]
According to the schematic, the lcdpwr_en pin is GPIO0_C4, not GPIO1_C4.
Fixes: 4a8c1161b843 ("arm64: dts: rockchip: Add support for rk3588 based Cool Pi CM5 GenBook") Signed-off-by: Andy Yan andyshrk@163.com Link: https://lore.kernel.org/r/20250113104825.2390427-1-andyshrk@163.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-genbook.dts | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-genbook.dts b/arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-genbook.dts index 6418286efe40d..762d36ad733ab 100644 --- a/arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-genbook.dts +++ b/arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-genbook.dts @@ -101,7 +101,7 @@ vcc3v3_lcd: vcc3v3-lcd-regulator { compatible = "regulator-fixed"; regulator-name = "vcc3v3_lcd"; enable-active-high; - gpio = <&gpio1 RK_PC4 GPIO_ACTIVE_HIGH>; + gpio = <&gpio0 RK_PC4 GPIO_ACTIVE_HIGH>; pinctrl-names = "default"; pinctrl-0 = <&lcdpwr_en>; vin-supply = <&vcc3v3_sys>; @@ -207,7 +207,7 @@ &pcie3x4 { &pinctrl { lcd { lcdpwr_en: lcdpwr-en { - rockchip,pins = <1 RK_PC4 RK_FUNC_GPIO &pcfg_pull_down>; + rockchip,pins = <0 RK_PC4 RK_FUNC_GPIO &pcfg_pull_down>; };
bl_en: bl-en {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrey Vatoropin a.vatoropin@crpt.ru
[ Upstream commit 3fb3cb4350befc4f901c54e0cb4a2a47b1302e08 ]
Size of variable sd_gain equals four bytes - DA9150_QIF_SD_GAIN_SIZE. Size of variable shunt_val equals two bytes - DA9150_QIF_SHUNT_VAL_SIZE.
The expression sd_gain * shunt_val is currently being evaluated using 32-bit arithmetic. So during the multiplication an overflow may occur.
As the value of type 'u64' is used as storage for the eventual result, put a ULL constant at the first position of each expression in order to give the compiler complete information about the proper arithmetic to use. According to C99, the guaranteed width of a variable of type 'unsigned long long' is >= 64 bits.
Remove the explicit cast to u64 as it is meaningless.
Just for the sake of consistency, apply the same trick to the other expression, which concerns 'iavg'.
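A minimal example of the difference (values chosen only to demonstrate the wrap):

    u32 sd_gain = 70000, shunt_val = 80000;
    /* (sd_gain * shunt_val) is evaluated first, in 32 bits: 5.6e9 wraps */
    u64 bad  = (u64)(sd_gain * shunt_val * 65536ULL);
    /* a leading ULL constant makes the whole chain 64-bit, left to right */
    u64 good = 65536ULL * sd_gain * shunt_val;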
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: a419b4fd9138 ("power: Add support for DA9150 Fuel-Gauge") Signed-off-by: Andrey Vatoropin a.vatoropin@crpt.ru Link: https://lore.kernel.org/r/20250130090030.53422-1-a.vatoropin@crpt.ru Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/power/supply/da9150-fg.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/power/supply/da9150-fg.c b/drivers/power/supply/da9150-fg.c index 652c1f213af1c..4f28ef1bba1a3 100644 --- a/drivers/power/supply/da9150-fg.c +++ b/drivers/power/supply/da9150-fg.c @@ -247,9 +247,9 @@ static int da9150_fg_current_avg(struct da9150_fg *fg, DA9150_QIF_SD_GAIN_SIZE); da9150_fg_read_sync_end(fg);
- div = (u64) (sd_gain * shunt_val * 65536ULL); + div = 65536ULL * sd_gain * shunt_val; do_div(div, 1000000); - res = (u64) (iavg * 1000000ULL); + res = 1000000ULL * iavg; do_div(res, div);
val->intval = (int) res;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chris Morgan macromorgan@hotmail.com
[ Upstream commit 98380110bd48fbfd6a798ee11fffff893d36062c ]
Correct the fault handling for the AXP717 by changing the i2c write from regmap_update_bits() to regmap_write_bits(). The update-bits function does not work properly on an RW1C register, where we must write a 1 back to a set bit to clear it.
Additionally, as part of this testing I confirmed the behavior of errors reappearing, so remove the comment about assumptions.
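The distinction matters because regmap_update_bits() does a read-modify-write and skips the i2c write when the computed value equals what it just read - which is exactly the case for an already-set RW1C fault bit, so the clearing write never reaches the hardware. regmap_write_bits() forces the write unconditionally:

    /* RW1C: the fault bit reads back as 1; writing 1 clears it */
    regmap_write_bits(axp20x_batt->regmap, AXP717_PMU_FAULT,
                      AXP717_BATT_UVLO_2_5V, AXP717_BATT_UVLO_2_5V);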
Fixes: 6625767049c2 ("power: supply: axp20x_battery: add support for AXP717") Signed-off-by: Chris Morgan macromorgan@hotmail.com Reviewed-by: Chen-Yu Tsai wens@csie.org Link: https://lore.kernel.org/r/20250131231455.153447-2-macroalpha82@gmail.com Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/power/supply/axp20x_battery.c | 31 +++++++++++++-------------- 1 file changed, 15 insertions(+), 16 deletions(-)
diff --git a/drivers/power/supply/axp20x_battery.c b/drivers/power/supply/axp20x_battery.c index f71cc90fea127..57eba1ddb17ba 100644 --- a/drivers/power/supply/axp20x_battery.c +++ b/drivers/power/supply/axp20x_battery.c @@ -466,10 +466,9 @@ static int axp717_battery_get_prop(struct power_supply *psy,
/* * If a fault is detected it must also be cleared; if the - * condition persists it should reappear (This is an - * assumption, it's actually not documented). A restart was - * not sufficient to clear the bit in testing despite the - * register listed as POR. + * condition persists it should reappear. A restart was not + * sufficient to clear the bit in testing despite the register + * listed as POR. */ case POWER_SUPPLY_PROP_HEALTH: ret = regmap_read(axp20x_batt->regmap, AXP717_PMU_FAULT, @@ -480,26 +479,26 @@ static int axp717_battery_get_prop(struct power_supply *psy, switch (reg & AXP717_BATT_PMU_FAULT_MASK) { case AXP717_BATT_UVLO_2_5V: val->intval = POWER_SUPPLY_HEALTH_DEAD; - regmap_update_bits(axp20x_batt->regmap, - AXP717_PMU_FAULT, - AXP717_BATT_UVLO_2_5V, - AXP717_BATT_UVLO_2_5V); + regmap_write_bits(axp20x_batt->regmap, + AXP717_PMU_FAULT, + AXP717_BATT_UVLO_2_5V, + AXP717_BATT_UVLO_2_5V); return 0;
case AXP717_BATT_OVER_TEMP: val->intval = POWER_SUPPLY_HEALTH_HOT; - regmap_update_bits(axp20x_batt->regmap, - AXP717_PMU_FAULT, - AXP717_BATT_OVER_TEMP, - AXP717_BATT_OVER_TEMP); + regmap_write_bits(axp20x_batt->regmap, + AXP717_PMU_FAULT, + AXP717_BATT_OVER_TEMP, + AXP717_BATT_OVER_TEMP); return 0;
case AXP717_BATT_UNDER_TEMP: val->intval = POWER_SUPPLY_HEALTH_COLD; - regmap_update_bits(axp20x_batt->regmap, - AXP717_PMU_FAULT, - AXP717_BATT_UNDER_TEMP, - AXP717_BATT_UNDER_TEMP); + regmap_write_bits(axp20x_batt->regmap, + AXP717_PMU_FAULT, + AXP717_BATT_UNDER_TEMP, + AXP717_BATT_UNDER_TEMP); return 0;
default:
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kumar Kartikeya Dwivedi memxor@gmail.com
[ Upstream commit d798ce3f4cab1b0d886b19ec5cc8e6b3d7e35081 ]
Ensure that trusted PTR_TO_BTF_ID accesses perform PROBE_MEM handling in raw_tp program. Without the previous fix, this selftest crashes the kernel due to a NULL-pointer dereference. Also ensure that dead code elimination does not kick in for checks on the pointer.
Reviewed-by: Jiri Olsa jolsa@kernel.org Signed-off-by: Kumar Kartikeya Dwivedi memxor@gmail.com Link: https://lore.kernel.org/r/20241104171959.2938862-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Stable-dep-of: 5da7e15fb5a1 ("net: Add rx_skb of kfree_skb to raw_tp_null_args[].") Signed-off-by: Sasha Levin sashal@kernel.org --- .../bpf/bpf_testmod/bpf_testmod-events.h | 8 +++++ .../selftests/bpf/bpf_testmod/bpf_testmod.c | 2 ++ .../selftests/bpf/prog_tests/raw_tp_null.c | 25 +++++++++++++++ .../testing/selftests/bpf/progs/raw_tp_null.c | 32 +++++++++++++++++++ 4 files changed, 67 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_null.c create mode 100644 tools/testing/selftests/bpf/progs/raw_tp_null.c
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h index 6c3b4d4f173ac..aeef86b3da747 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h @@ -40,6 +40,14 @@ DECLARE_TRACE(bpf_testmod_test_nullable_bare, TP_ARGS(ctx__nullable) );
+struct sk_buff; + +DECLARE_TRACE(bpf_testmod_test_raw_tp_null, + TP_PROTO(struct sk_buff *skb), + TP_ARGS(skb) +); + + #undef BPF_TESTMOD_DECLARE_TRACE #ifdef DECLARE_TRACE_WRITABLE #define BPF_TESTMOD_DECLARE_TRACE(call, proto, args, size) \ diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c index 8835761d9a126..4e6a9e9c03687 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c @@ -380,6 +380,8 @@ bpf_testmod_test_read(struct file *file, struct kobject *kobj,
(void)bpf_testmod_test_arg_ptr_to_struct(&struct_arg1_2);
+ (void)trace_bpf_testmod_test_raw_tp_null(NULL); + struct_arg3 = kmalloc((sizeof(struct bpf_testmod_struct_arg_3) + sizeof(int)), GFP_KERNEL); if (struct_arg3 != NULL) { diff --git a/tools/testing/selftests/bpf/prog_tests/raw_tp_null.c b/tools/testing/selftests/bpf/prog_tests/raw_tp_null.c new file mode 100644 index 0000000000000..6fa19449297e9 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/raw_tp_null.c @@ -0,0 +1,25 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ + +#include <test_progs.h> +#include "raw_tp_null.skel.h" + +void test_raw_tp_null(void) +{ + struct raw_tp_null *skel; + + skel = raw_tp_null__open_and_load(); + if (!ASSERT_OK_PTR(skel, "raw_tp_null__open_and_load")) + return; + + skel->bss->tid = sys_gettid(); + + if (!ASSERT_OK(raw_tp_null__attach(skel), "raw_tp_null__attach")) + goto end; + + ASSERT_OK(trigger_module_test_read(2), "trigger testmod read"); + ASSERT_EQ(skel->bss->i, 3, "invocations"); + +end: + raw_tp_null__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/raw_tp_null.c b/tools/testing/selftests/bpf/progs/raw_tp_null.c new file mode 100644 index 0000000000000..457f34c151e32 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/raw_tp_null.c @@ -0,0 +1,32 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ + +#include <vmlinux.h> +#include <bpf/bpf_tracing.h> + +char _license[] SEC("license") = "GPL"; + +int tid; +int i; + +SEC("tp_btf/bpf_testmod_test_raw_tp_null") +int BPF_PROG(test_raw_tp_null, struct sk_buff *skb) +{ + struct task_struct *task = bpf_get_current_task_btf(); + + if (task->pid != tid) + return 0; + + i = i + skb->mark + 1; + /* The compiler may move the NULL check before this deref, which causes + * the load to fail as deref of scalar. Prevent that by using a barrier. + */ + barrier(); + /* If dead code elimination kicks in, the increment below will + * be removed. For raw_tp programs, we mark input arguments as + * PTR_MAYBE_NULL, so branch prediction should never kick in. + */ + if (!skb) + i += 2; + return 0; +}
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 5da7e15fb5a12e78de974d8908f348e279922ce9 ]
Yan Zhai reported a BPF prog could trigger a null-ptr-deref [0] in trace_kfree_skb if the prog does not check if rx_sk is NULL.
Commit c53795d48ee8 ("net: add rx_sk to trace_kfree_skb") added rx_sk to trace_kfree_skb, but rx_sk is optional and could be NULL.
Let's add kfree_skb to raw_tp_null_args[] to let the BPF verifier validate such a prog and prevent the issue.
Now we fail to load such a prog:
libbpf: prog 'drop': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
; int BPF_PROG(drop, struct sk_buff *skb, void *location, @ kfree_skb_sk_null.bpf.c:21
0: (79) r3 = *(u64 *)(r1 +24)
func 'kfree_skb' arg3 has btf_id 5253 type STRUCT 'sock'
1: R1=ctx() R3_w=trusted_ptr_or_null_sock(id=1)
; bpf_printk("sk: %d, %d\n", sk, sk->__sk_common.skc_family); @ kfree_skb_sk_null.bpf.c:24
1: (69) r4 = *(u16 *)(r3 +16)
R3 invalid mem access 'trusted_ptr_or_null_'
processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
Note this fix requires commit 838a10bd2ebf ("bpf: Augment raw_tp arguments with PTR_MAYBE_NULL").
[0]:
BUG: kernel NULL pointer dereference, address: 0000000000000010
PF: supervisor read access in kernel mode
PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
PREEMPT SMP
RIP: 0010:bpf_prog_5e21a6db8fcff1aa_drop+0x10/0x2d
Call Trace:
<TASK>
? __die+0x1f/0x60
? page_fault_oops+0x148/0x420
? search_bpf_extables+0x5b/0x70
? fixup_exception+0x27/0x2c0
? exc_page_fault+0x75/0x170
? asm_exc_page_fault+0x22/0x30
? bpf_prog_5e21a6db8fcff1aa_drop+0x10/0x2d
bpf_trace_run4+0x68/0xd0
? unix_stream_connect+0x1f4/0x6f0
sk_skb_reason_drop+0x90/0x120
unix_stream_connect+0x1f4/0x6f0
__sys_connect+0x7f/0xb0
__x64_sys_connect+0x14/0x20
do_syscall_64+0x47/0xc30
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Fixes: c53795d48ee8 ("net: add rx_sk to trace_kfree_skb") Reported-by: Yan Zhai yan@cloudflare.com Closes: https://lore.kernel.org/netdev/Z50zebTRzI962e6X@debian.debian/ Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Tested-by: Yan Zhai yan@cloudflare.com Acked-by: Jiri Olsa jolsa@kernel.org Link: https://lore.kernel.org/r/20250201030142.62703-1-kuniyu@amazon.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/btf.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index a44f4be592be7..2c54c148a94f3 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -6483,6 +6483,8 @@ static const struct bpf_raw_tp_null_args raw_tp_null_args[] = { /* rxrpc */ { "rxrpc_recvdata", 0x1 }, { "rxrpc_resend", 0x10 }, + /* skb */ + {"kfree_skb", 0x1000}, /* sunrpc */ { "xs_stream_read_data", 0x1 }, /* ... from xprt_cong_event event class */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alan Maguire alan.maguire@oracle.com
[ Upstream commit 517e8a7835e8cfb398a0aeb0133de50e31cae32b ]
On an aarch64 kernel with CONFIG_PAGE_SIZE_64KB=y, arena_htab tests cause a segmentation fault and soft lockup. The same failure is not observed with 4k pages on aarch64.
It turns out arena_map_free() is calling apply_to_existing_page_range() with the address returned by bpf_arena_get_kern_vm_start(). If this address is not page-aligned, the code ends up calling apply_to_pte_range() with that unaligned address, causing the soft lockup.
Fix it by rounding GUARD_SZ up to PAGE_SIZE << 1 so that the division by 2 in bpf_arena_get_kern_vm_start() returns a page-aligned value.
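Working through the arithmetic for CONFIG_PAGE_SIZE_64KB (PAGE_SIZE == 65536):

    /* old: GUARD_SZ = 1ull << 16 = 65536
     *      GUARD_SZ / 2 = 32768, not a multiple of PAGE_SIZE
     * new: GUARD_SZ = round_up(65536, PAGE_SIZE << 1) = 131072
     *      GUARD_SZ / 2 = 65536 = PAGE_SIZE, page-aligned
     */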
Fixes: 317460317a02 ("bpf: Introduce bpf_arena.") Reported-by: Colm Harrington colm.harrington@oracle.com Suggested-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Alan Maguire alan.maguire@oracle.com Link: https://lore.kernel.org/r/20250205170059.427458-1-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/arena.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c index 93e48c7cad4ef..8c775a1401d3e 100644 --- a/kernel/bpf/arena.c +++ b/kernel/bpf/arena.c @@ -37,7 +37,7 @@ */
/* number of bytes addressable by LDX/STX insn with 16-bit 'off' field */ -#define GUARD_SZ (1ull << sizeof_field(struct bpf_insn, off) * 8) +#define GUARD_SZ round_up(1ull << sizeof_field(struct bpf_insn, off) * 8, PAGE_SIZE << 1) #define KERN_VM_SZ (SZ_4G + GUARD_SZ)
struct bpf_arena {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Patrick Wildt patrick@blueri.se
[ Upstream commit 8546cfd08aa4b982acd2357403a1f15495d622ec ]
The SMMU architecture requires wired interrupts to be edge triggered, which does not align with the DT description for the RK3588. This leads to interrupt storms, as the SMMU continues to hold the pin high and only pulls it down for a short time when issuing an IRQ. Update the DT description to be in line with the spec and perceived reality.
Signed-off-by: Patrick Wildt patrick@blueri.se Fixes: cd81d3a0695c ("arm64: dts: rockchip: add rk3588 pcie and php IOMMUs") Reviewed-by: Niklas Cassel cassel@kernel.org Link: https://lore.kernel.org/r/Z6pxme2Chmf3d3uK@windev.fritz.box Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/rockchip/rk3588-base.dtsi | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi index fc67585b64b7b..1fd8093f2124c 100644 --- a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi @@ -549,10 +549,10 @@ usb_host2_xhci: usb@fcd00000 { mmu600_pcie: iommu@fc900000 { compatible = "arm,smmu-v3"; reg = <0x0 0xfc900000 0x0 0x200000>; - interrupts = <GIC_SPI 369 IRQ_TYPE_LEVEL_HIGH 0>, - <GIC_SPI 371 IRQ_TYPE_LEVEL_HIGH 0>, - <GIC_SPI 374 IRQ_TYPE_LEVEL_HIGH 0>, - <GIC_SPI 367 IRQ_TYPE_LEVEL_HIGH 0>; + interrupts = <GIC_SPI 369 IRQ_TYPE_EDGE_RISING 0>, + <GIC_SPI 371 IRQ_TYPE_EDGE_RISING 0>, + <GIC_SPI 374 IRQ_TYPE_EDGE_RISING 0>, + <GIC_SPI 367 IRQ_TYPE_EDGE_RISING 0>; interrupt-names = "eventq", "gerror", "priq", "cmdq-sync"; #iommu-cells = <1>; status = "disabled"; @@ -561,10 +561,10 @@ mmu600_pcie: iommu@fc900000 { mmu600_php: iommu@fcb00000 { compatible = "arm,smmu-v3"; reg = <0x0 0xfcb00000 0x0 0x200000>; - interrupts = <GIC_SPI 381 IRQ_TYPE_LEVEL_HIGH 0>, - <GIC_SPI 383 IRQ_TYPE_LEVEL_HIGH 0>, - <GIC_SPI 386 IRQ_TYPE_LEVEL_HIGH 0>, - <GIC_SPI 379 IRQ_TYPE_LEVEL_HIGH 0>; + interrupts = <GIC_SPI 381 IRQ_TYPE_EDGE_RISING 0>, + <GIC_SPI 383 IRQ_TYPE_EDGE_RISING 0>, + <GIC_SPI 386 IRQ_TYPE_EDGE_RISING 0>, + <GIC_SPI 379 IRQ_TYPE_EDGE_RISING 0>; interrupt-names = "eventq", "gerror", "priq", "cmdq-sync"; #iommu-cells = <1>; status = "disabled";
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peng Fan peng.fan@nxp.com
[ Upstream commit ab027c488fc4a1fff0a5b712d4bdb2d2d324e8f8 ]
'struct scmi_imx_misc_ctrl_set_in' has a zero-length array at the end. sizeof() will not count 'value[]', hence the Tx size will be smaller than the actual size required for Tx, and the SCMI firmware will flag this as a protocol error.

Fix this by enlarging the Tx size by 'num * sizeof(__le32)' to account for the size of the data.
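The underlying C rule, sketched here with an assumed layout for illustration: sizeof() on a struct with a flexible array member does not include that array, so the message size has to add it explicitly (struct_size() from linux/overflow.h expresses the same thing):

    struct scmi_imx_misc_ctrl_set_in {
            __le32 id;
            __le32 num;
            __le32 value[];  /* flexible array: not counted by sizeof(*in) */
    };

    size_t tx_size = sizeof(struct scmi_imx_misc_ctrl_set_in) +
                     num * sizeof(__le32);
    /* equivalently: struct_size(in, value, num) */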
Fixes: 61c9f03e22fc ("firmware: arm_scmi: Add initial support for i.MX MISC protocol") Reviewed-by: Jacky Bai ping.bai@nxp.com Tested-by: Shengjiu Wang shengjiu.wang@nxp.com Acked-by: Jason Liu jason.hui.liu@nxp.com Signed-off-by: Peng Fan peng.fan@nxp.com Message-Id: 20250123063441.392555-1-peng.fan@oss.nxp.com (sudeep.holla: Commit rewording and replace hardcoded sizeof(__le32) value) Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/arm_scmi/vendors/imx/imx-sm-misc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/arm_scmi/vendors/imx/imx-sm-misc.c b/drivers/firmware/arm_scmi/vendors/imx/imx-sm-misc.c index a86ab9b35953f..2641faa329cdd 100644 --- a/drivers/firmware/arm_scmi/vendors/imx/imx-sm-misc.c +++ b/drivers/firmware/arm_scmi/vendors/imx/imx-sm-misc.c @@ -254,8 +254,8 @@ static int scmi_imx_misc_ctrl_set(const struct scmi_protocol_handle *ph, if (num > max_num) return -EINVAL;
- ret = ph->xops->xfer_get_init(ph, SCMI_IMX_MISC_CTRL_SET, sizeof(*in), - 0, &t); + ret = ph->xops->xfer_get_init(ph, SCMI_IMX_MISC_CTRL_SET, + sizeof(*in) + num * sizeof(__le32), 0, &t); if (ret) return ret;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bart Van Assche bvanassche@acm.org
[ Upstream commit fbe8f2fa971c537571994a0df532c511c4fb5537 ]
queue_limits_cancel_update() must only be called if queue_limits_start_update() is called first. Remove the queue_limits_cancel_update() calls from the raid*_set_limits() functions because there is no corresponding queue_limits_start_update() call.
Cc: Christoph Hellwig hch@lst.de Fixes: c6e56cf6b2e7 ("block: move integrity information into queue_limits") Signed-off-by: Bart Van Assche bvanassche@acm.org Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/linux-raid/20250212171108.3483150-1-bvanassche@acm.o... Signed-off-by: Yu Kuai yukuai@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/raid0.c | 4 +--- drivers/md/raid1.c | 4 +--- drivers/md/raid10.c | 4 +--- 3 files changed, 3 insertions(+), 9 deletions(-)
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c index 32d5875247784..31bea72bcb01a 100644 --- a/drivers/md/raid0.c +++ b/drivers/md/raid0.c @@ -385,10 +385,8 @@ static int raid0_set_limits(struct mddev *mddev) lim.io_min = mddev->chunk_sectors << 9; lim.io_opt = lim.io_min * mddev->raid_disks; err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY); - if (err) { - queue_limits_cancel_update(mddev->gendisk->queue); + if (err) return err; - } return queue_limits_set(mddev->gendisk->queue, &lim); }
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index d83fe3b3abc00..8a994a1975ca7 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -3171,10 +3171,8 @@ static int raid1_set_limits(struct mddev *mddev) md_init_stacking_limits(&lim); lim.max_write_zeroes_sectors = 0; err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY); - if (err) { - queue_limits_cancel_update(mddev->gendisk->queue); + if (err) return err; - } return queue_limits_set(mddev->gendisk->queue, &lim); }
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index daf42acc4fb6f..a214fed4f1622 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -3963,10 +3963,8 @@ static int raid10_set_queue_limits(struct mddev *mddev) lim.io_min = mddev->chunk_sectors << 9; lim.io_opt = lim.io_min * raid10_nr_stripes(conf); err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY); - if (err) { - queue_limits_cancel_update(mddev->gendisk->queue); + if (err) return err; - } return queue_limits_set(mddev->gendisk->queue, &lim); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geert Uytterhoeven geert+renesas@glider.be
[ Upstream commit be6686b823b30a69b1f71bde228ce042c78a1941 ]
The i.MX System Controller Management Interface firmware is only present on Freescale i.MX SoCs. Hence add a dependency on ARCH_MXC, to prevent asking the user about this driver when configuring a kernel without Freescale i.MX platform support.
Fixes: 514b2262ade48a05 ("firmware: arm_scmi: Fix i.MX build dependency") Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Reviewed-by: Fabio Estevam festevam@gmail.com Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/imx/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/firmware/imx/Kconfig b/drivers/firmware/imx/Kconfig index 907cd149c40a8..c964f4924359f 100644 --- a/drivers/firmware/imx/Kconfig +++ b/drivers/firmware/imx/Kconfig @@ -25,6 +25,7 @@ config IMX_SCU
config IMX_SCMI_MISC_DRV tristate "IMX SCMI MISC Protocol driver" + depends on ARCH_MXC || COMPILE_TEST default y if ARCH_MXC help The System Controller Management Interface firmware (SCMI FW) is
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geert Uytterhoeven geert+renesas@glider.be
[ Upstream commit dd0f05b98925111f4530d7dab774398cdb32e9e3 ]
CZ.NIC's Turris devices are based on Marvell EBU SoCs. Hence add a dependency on ARCH_MVEBU, to prevent asking the user about these drivers when configuring a kernel that cannot run on an affected CZ.NIC Turris system.
Fixes: 992f1a3d4e88498d ("platform: cznic: Add preliminary support for Turris Omnia MCU") Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/cznic/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/platform/cznic/Kconfig b/drivers/platform/cznic/Kconfig index 49c383eb67854..13e37b49d9d01 100644 --- a/drivers/platform/cznic/Kconfig +++ b/drivers/platform/cznic/Kconfig @@ -6,6 +6,7 @@
menuconfig CZNIC_PLATFORMS bool "Platform support for CZ.NIC's Turris hardware" + depends on ARCH_MVEBU || COMPILE_TEST help Say Y here to be able to choose driver support for CZ.NIC's Turris devices. This option alone does not add any kernel code.
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Hildenbrand david@redhat.com
[ Upstream commit b3fefbb30a1691533cb905006b69b2a474660744 ]
In case we have to retry the loop, we fail to unlock+put the folio. In that case, we will keep failing make_device_exclusive_range() because we cannot grab the folio lock, and we even return from the function with the folio locked and referenced, so make_device_exclusive_range() effectively never succeeds.
While at it, convert the other unlock+put to use a folio as well.
This was found by code inspection.
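The shape of the fixed retry loop, condensed from the diff below:

    for (;;) {
            /* ... fault in and lock the page, then: */
            folio = page_folio(page);

            mutex_lock(&svmm->mutex);
            if (!mmu_interval_read_retry(&notifier->notifier, notifier_seq))
                    break;  /* success: leave with folio locked + referenced */
            mutex_unlock(&svmm->mutex);

            folio_unlock(folio);  /* previously leaked on every retry */
            folio_put(folio);
    }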
Fixes: 8f187163eb89 ("nouveau/svm: implement atomic SVM access") Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Alistair Popple apopple@nvidia.com Tested-by: Alistair Popple apopple@nvidia.com Signed-off-by: Danilo Krummrich dakr@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20250124181524.3584236-2-david... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/nouveau/nouveau_svm.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c index b4da82ddbb6b2..8ea98f06d39af 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.c +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c @@ -590,6 +590,7 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, unsigned long timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT); struct mm_struct *mm = svmm->notifier.mm; + struct folio *folio; struct page *page; unsigned long start = args->p.addr; unsigned long notifier_seq; @@ -616,12 +617,16 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, ret = -EINVAL; goto out; } + folio = page_folio(page);
mutex_lock(&svmm->mutex); if (!mmu_interval_read_retry(¬ifier->notifier, notifier_seq)) break; mutex_unlock(&svmm->mutex); + + folio_unlock(folio); + folio_put(folio); }
/* Map the page on the GPU. */ @@ -637,8 +642,8 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, ret = nvif_object_ioctl(&svmm->vmm->vmm.object, args, size, NULL); mutex_unlock(&svmm->mutex);
- unlock_page(page); - put_page(page); + folio_unlock(folio); + folio_put(folio);
out: mmu_interval_notifier_remove(¬ifier->notifier);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Clark robdclark@chromium.org
[ Upstream commit 669c285620231786fffe9d87ab432e08a6ed922b ]
If userspace is trying to achieve a timeout of zero, let 'em have it. Only round up if the timeout is greater than zero.
Fixes: 4969bccd5f4e ("drm/msm: Avoid rounding down to zero jiffies") Signed-off-by: Rob Clark robdclark@chromium.org Reviewed-by: Akhil P Oommen quic_akhilpo@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/632264/ Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/msm_drv.h | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h index 2e28a13446366..9526b22038ab8 100644 --- a/drivers/gpu/drm/msm/msm_drv.h +++ b/drivers/gpu/drm/msm/msm_drv.h @@ -543,15 +543,12 @@ static inline int align_pitch(int width, int bpp) static inline unsigned long timeout_to_jiffies(const ktime_t *timeout) { ktime_t now = ktime_get(); - s64 remaining_jiffies;
- if (ktime_compare(*timeout, now) < 0) { - remaining_jiffies = 0; - } else { - ktime_t rem = ktime_sub(*timeout, now); - remaining_jiffies = ktime_divns(rem, NSEC_PER_SEC / HZ); - } + if (ktime_compare(*timeout, now) <= 0) + return 0;
+ ktime_t rem = ktime_sub(*timeout, now); + s64 remaining_jiffies = ktime_divns(rem, NSEC_PER_SEC / HZ); return clamp(remaining_jiffies, 1LL, (s64)INT_MAX); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Baryshkov dmitry.baryshkov@linaro.org
[ Upstream commit 2f69e54584475ac85ea0e3407c9198ac7c6ea8ad ]
The SM8450 and later chips have the DPU_MDP_PERIPH_0_REMOVED feature bit set, which means that those platforms have dropped some of the registers, including the WD TIMER-related ones. Stop providing the callback to program the WD timer on those platforms.
Fixes: 100d7ef6995d ("drm/msm/dpu: add support for SM8450") Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/628874/ Link: https://lore.kernel.org/r/20241214-dpu-drop-features-v1-1-988f0662cb7e@linar... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_top.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_top.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_top.c index 0f40eea7f5e24..2040bee8d512f 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_top.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_top.c @@ -272,7 +272,7 @@ static void _setup_mdp_ops(struct dpu_hw_mdp_ops *ops,
if (cap & BIT(DPU_MDP_VSYNC_SEL)) ops->setup_vsync_source = dpu_hw_setup_vsync_sel; - else + else if (!(cap & BIT(DPU_MDP_PERIPH_0_REMOVED))) ops->setup_vsync_source = dpu_hw_setup_wd_timer;
ops->get_safe_status = dpu_hw_get_safe_status;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Baryshkov dmitry.baryshkov@linaro.org
[ Upstream commit af0a4a2090cce732c70ad6c5f4145b43f39e3fe9 ]
Several DPU 5.x platforms are supposed to use DPU_WB_INPUT_CTRL to bind WB and PINGPONG blocks, but they do not. Change those platforms to use WB_SM8250_MASK, which includes that bit.
Fixes: 1f5bcc4316b3 ("drm/msm/dpu: enable writeback on SC8108X") Fixes: ab2b03d73a66 ("drm/msm/dpu: enable writeback on SM6125") Fixes: 47cebb740a83 ("drm/msm/dpu: enable writeback on SM8150") Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/628876/ Link: https://lore.kernel.org/r/20241214-dpu-drop-features-v1-2-988f0662cb7e@linar... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_0_sm8150.h | 2 +- drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_1_sc8180x.h | 2 +- drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_4_sm6125.h | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_0_sm8150.h b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_0_sm8150.h index 421afacb72480..36cc9dbc00b5c 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_0_sm8150.h +++ b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_0_sm8150.h @@ -297,7 +297,7 @@ static const struct dpu_wb_cfg sm8150_wb[] = { { .name = "wb_2", .id = WB_2, .base = 0x65000, .len = 0x2c8, - .features = WB_SDM845_MASK, + .features = WB_SM8250_MASK, .format_list = wb2_formats_rgb, .num_formats = ARRAY_SIZE(wb2_formats_rgb), .clk_ctrl = DPU_CLK_CTRL_WB2, diff --git a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_1_sc8180x.h b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_1_sc8180x.h index 641023b102bf5..e8eacdb47967a 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_1_sc8180x.h +++ b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_1_sc8180x.h @@ -304,7 +304,7 @@ static const struct dpu_wb_cfg sc8180x_wb[] = { { .name = "wb_2", .id = WB_2, .base = 0x65000, .len = 0x2c8, - .features = WB_SDM845_MASK, + .features = WB_SM8250_MASK, .format_list = wb2_formats_rgb, .num_formats = ARRAY_SIZE(wb2_formats_rgb), .clk_ctrl = DPU_CLK_CTRL_WB2, diff --git a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_4_sm6125.h b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_4_sm6125.h index d039b96beb97c..76f60a2df7a89 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_4_sm6125.h +++ b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_4_sm6125.h @@ -144,7 +144,7 @@ static const struct dpu_wb_cfg sm6125_wb[] = { { .name = "wb_2", .id = WB_2, .base = 0x65000, .len = 0x2c8, - .features = WB_SDM845_MASK, + .features = WB_SM8250_MASK, .format_list = wb2_formats_rgb, .num_formats = ARRAY_SIZE(wb2_formats_rgb), .clk_ctrl = DPU_CLK_CTRL_WB2,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marijn Suijten marijn.suijten@somainline.org
[ Upstream commit 144429831f447223253a0e4376489f84ff37d1a7 ]
What used to be the input_10_bits boolean - feeding into the lowest bit of DSC_ENC - on MSM downstream turned into an accidental OR with the full bits_per_component number when it was ported to the upstream kernel.
On typical bpc=8 setups we don't notice this because line_buf_depth is always an odd value (it contains bpc+1) and will also set the 4th bit after left-shifting by 3 (hence this |= bits_per_component is a no-op).
Now that guards are being removed to allow more bits_per_component values besides 8 (possible since commit 49fd30a7153b ("drm/msm/dsi: use DRM DSC helpers for DSC setup")), a bpc of 10 will instead clash with the 5th bit which is convert_rgb. This is "fortunately" also always set to true by MSM's dsi_populate_dsc_params() already, but once a bpc of 12 starts being used it'll write into simple_422 which is normally false.
To solve all these overlaps, simply replicate downstream code and only set this lowest bit if bits_per_component is equal to 10. It is unclear why DSC requires this only for bpc=10 but not bpc=12, and also notice that this lowest bit wasn't set previously despite having a panel and patch on the list using it without any mentioned issues.
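The resulting DSC_ENC low-bit layout, summarized from the code below:

    /* DSC_ENC, low bits:
     *   bit 0:   input_10_bits (set only when bits_per_component == 10)
     *   bit 1:   convert_rgb   (hit by the old OR when bpc == 10)
     *   bit 2:   simple_422    (hit by the old OR when bpc == 12)
     *   bit 3..: line_buf_depth
     */
    data |= dsc->line_buf_depth << 3;
    data |= dsc->simple_422 << 2;
    data |= dsc->convert_rgb << 1;
    data |= input_10_bits;  /* was: data |= dsc->bits_per_component */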
Fixes: c110cfd1753e ("drm/msm/disp/dpu1: Add support for DSC") Signed-off-by: Marijn Suijten marijn.suijten@somainline.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Reviewed-by: Konrad Dybcio konrad.dybcio@oss.qualcomm.com Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Patchwork: https://patchwork.freedesktop.org/patch/636311/ Link: https://lore.kernel.org/r/20250211-dsc-10-bit-v1-1-1c85a9430d9a@somainline.o... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c index 5e9aad1b2aa28..d1e0fb2139765 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c @@ -52,6 +52,7 @@ static void dpu_hw_dsc_config(struct dpu_hw_dsc *hw_dsc, u32 slice_last_group_size; u32 det_thresh_flatness; bool is_cmd_mode = !(mode & DSC_MODE_VIDEO); + bool input_10_bits = dsc->bits_per_component == 10;
DPU_REG_WRITE(c, DSC_COMMON_MODE, mode);
@@ -68,7 +69,7 @@ static void dpu_hw_dsc_config(struct dpu_hw_dsc *hw_dsc, data |= (dsc->line_buf_depth << 3); data |= (dsc->simple_422 << 2); data |= (dsc->convert_rgb << 1); - data |= dsc->bits_per_component; + data |= input_10_bits;
DPU_REG_WRITE(c, DSC_ENC, data);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
[ Upstream commit 588257897058a0b1aa47912db4fe93c6ff5e3887 ]
The PHY_CMN_CLK_CFG0 register is updated by the PHY driver and by two divider clocks from the Common Clock Framework: devm_clk_hw_register_divider_parent_hw(). Concurrent access from the clock side is protected with a spinlock; however, the driver's side, when restoring state, is not. Restoring state is called from msm_dsi_phy_enable(), so there could be a path leading to concurrent and conflicting updates with the clock framework.

Add the missing lock usage on the PHY driver side, encapsulated in its own function so the code remains readable.
While shuffling the code, define and use PHY_CMN_CLK_CFG0 bitfields to make the code more readable and obvious.
Fixes: 1ef7c99d145c ("drm/msm/dsi: add support for 7nm DSI PHY/PLL") Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Patchwork: https://patchwork.freedesktop.org/patch/637376/ Link: https://lore.kernel.org/r/20250214-drm-msm-phy-pll-cfg-reg-v3-1-0943b850722c... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c | 14 ++++++++++++-- .../gpu/drm/msm/registers/display/dsi_phy_7nm.xml | 5 ++++- 2 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c b/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c index 031446c87daec..25ca649de717e 100644 --- a/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c +++ b/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c @@ -372,6 +372,15 @@ static void dsi_pll_enable_pll_bias(struct dsi_pll_7nm *pll) ndelay(250); }
+static void dsi_pll_cmn_clk_cfg0_write(struct dsi_pll_7nm *pll, u32 val) +{ + unsigned long flags; + + spin_lock_irqsave(&pll->postdiv_lock, flags); + writel(val, pll->phy->base + REG_DSI_7nm_PHY_CMN_CLK_CFG0); + spin_unlock_irqrestore(&pll->postdiv_lock, flags); +} + static void dsi_pll_disable_global_clk(struct dsi_pll_7nm *pll) { u32 data; @@ -574,8 +583,9 @@ static int dsi_7nm_pll_restore_state(struct msm_dsi_phy *phy) val |= cached->pll_out_div; writel(val, pll_7nm->phy->pll_base + REG_DSI_7nm_PHY_PLL_PLL_OUTDIV_RATE);
- writel(cached->bit_clk_div | (cached->pix_clk_div << 4), - phy_base + REG_DSI_7nm_PHY_CMN_CLK_CFG0); + dsi_pll_cmn_clk_cfg0_write(pll_7nm, + DSI_7nm_PHY_CMN_CLK_CFG0_DIV_CTRL_3_0(cached->bit_clk_div) | + DSI_7nm_PHY_CMN_CLK_CFG0_DIV_CTRL_7_4(cached->pix_clk_div));
val = readl(phy_base + REG_DSI_7nm_PHY_CMN_CLK_CFG1); val &= ~0x3; diff --git a/drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml b/drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml index d54b72f924493..e0bf6e016b4ce 100644 --- a/drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml +++ b/drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml @@ -9,7 +9,10 @@ xsi:schemaLocation="https://gitlab.freedesktop.org/freedreno/ rules-fd.xsd"> <reg32 offset="0x00004" name="REVISION_ID1"/> <reg32 offset="0x00008" name="REVISION_ID2"/> <reg32 offset="0x0000c" name="REVISION_ID3"/> - <reg32 offset="0x00010" name="CLK_CFG0"/> + <reg32 offset="0x00010" name="CLK_CFG0"> + <bitfield name="DIV_CTRL_3_0" low="0" high="3" type="uint"/> + <bitfield name="DIV_CTRL_7_4" low="4" high="7" type="uint"/> + </reg32> <reg32 offset="0x00014" name="CLK_CFG1"/> <reg32 offset="0x00018" name="GLBL_CTRL"/> <reg32 offset="0x0001c" name="RBUF_CTRL"/>
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
[ Upstream commit 5a97bc924ae0804b8dbf627e357acaa5ef761483 ]
The PHY_CMN_CLK_CFG1 register is updated by the PHY driver and by a mux clock from the Common Clock Framework: devm_clk_hw_register_mux_parent_hws(). There could be a path leading to concurrent and conflicting updates between the PHY driver and the clock framework, e.g. changing the mux and enabling the PLL clocks.

Add a dedicated spinlock to make sure all PHY_CMN_CLK_CFG1 updates are synchronized.
While shuffling the code, define and use PHY_CMN_CLK_CFG1 bitfields to make the code more readable and obvious.
Fixes: 1ef7c99d145c ("drm/msm/dsi: add support for 7nm DSI PHY/PLL") Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/637378/ Link: https://lore.kernel.org/r/20250214-drm-msm-phy-pll-cfg-reg-v3-2-0943b850722c... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c | 35 ++++++++++++------- .../drm/msm/registers/display/dsi_phy_7nm.xml | 5 ++- 2 files changed, 26 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c b/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c index 25ca649de717e..388017db45d80 100644 --- a/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c +++ b/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c @@ -83,6 +83,9 @@ struct dsi_pll_7nm { /* protects REG_DSI_7nm_PHY_CMN_CLK_CFG0 register */ spinlock_t postdiv_lock;
+ /* protects REG_DSI_7nm_PHY_CMN_CLK_CFG1 register */ + spinlock_t pclk_mux_lock; + struct pll_7nm_cached_state cached_state;
struct dsi_pll_7nm *slave; @@ -381,22 +384,32 @@ static void dsi_pll_cmn_clk_cfg0_write(struct dsi_pll_7nm *pll, u32 val) spin_unlock_irqrestore(&pll->postdiv_lock, flags); }
-static void dsi_pll_disable_global_clk(struct dsi_pll_7nm *pll) +static void dsi_pll_cmn_clk_cfg1_update(struct dsi_pll_7nm *pll, u32 mask, + u32 val) { + unsigned long flags; u32 data;
+ spin_lock_irqsave(&pll->pclk_mux_lock, flags); data = readl(pll->phy->base + REG_DSI_7nm_PHY_CMN_CLK_CFG1); - writel(data & ~BIT(5), pll->phy->base + REG_DSI_7nm_PHY_CMN_CLK_CFG1); + data &= ~mask; + data |= val & mask; + + writel(data, pll->phy->base + REG_DSI_7nm_PHY_CMN_CLK_CFG1); + spin_unlock_irqrestore(&pll->pclk_mux_lock, flags); +} + +static void dsi_pll_disable_global_clk(struct dsi_pll_7nm *pll) +{ + dsi_pll_cmn_clk_cfg1_update(pll, DSI_7nm_PHY_CMN_CLK_CFG1_CLK_EN, 0); }
static void dsi_pll_enable_global_clk(struct dsi_pll_7nm *pll) { - u32 data; + u32 cfg_1 = DSI_7nm_PHY_CMN_CLK_CFG1_CLK_EN | DSI_7nm_PHY_CMN_CLK_CFG1_CLK_EN_SEL;
writel(0x04, pll->phy->base + REG_DSI_7nm_PHY_CMN_CTRL_3); - - data = readl(pll->phy->base + REG_DSI_7nm_PHY_CMN_CLK_CFG1); - writel(data | BIT(5) | BIT(4), pll->phy->base + REG_DSI_7nm_PHY_CMN_CLK_CFG1); + dsi_pll_cmn_clk_cfg1_update(pll, cfg_1, cfg_1); }
static void dsi_pll_phy_dig_reset(struct dsi_pll_7nm *pll) @@ -574,7 +587,6 @@ static int dsi_7nm_pll_restore_state(struct msm_dsi_phy *phy) { struct dsi_pll_7nm *pll_7nm = to_pll_7nm(phy->vco_hw); struct pll_7nm_cached_state *cached = &pll_7nm->cached_state; - void __iomem *phy_base = pll_7nm->phy->base; u32 val; int ret;
@@ -586,11 +598,7 @@ static int dsi_7nm_pll_restore_state(struct msm_dsi_phy *phy) dsi_pll_cmn_clk_cfg0_write(pll_7nm, DSI_7nm_PHY_CMN_CLK_CFG0_DIV_CTRL_3_0(cached->bit_clk_div) | DSI_7nm_PHY_CMN_CLK_CFG0_DIV_CTRL_7_4(cached->pix_clk_div)); - - val = readl(phy_base + REG_DSI_7nm_PHY_CMN_CLK_CFG1); - val &= ~0x3; - val |= cached->pll_mux; - writel(val, phy_base + REG_DSI_7nm_PHY_CMN_CLK_CFG1); + dsi_pll_cmn_clk_cfg1_update(pll_7nm, 0x3, cached->pll_mux);
ret = dsi_pll_7nm_vco_set_rate(phy->vco_hw, pll_7nm->vco_current_rate, @@ -743,7 +751,7 @@ static int pll_7nm_register(struct dsi_pll_7nm *pll_7nm, struct clk_hw **provide pll_by_2_bit, }), 2, 0, pll_7nm->phy->base + REG_DSI_7nm_PHY_CMN_CLK_CFG1, - 0, 1, 0, NULL); + 0, 1, 0, &pll_7nm->pclk_mux_lock); if (IS_ERR(hw)) { ret = PTR_ERR(hw); goto fail; @@ -788,6 +796,7 @@ static int dsi_pll_7nm_init(struct msm_dsi_phy *phy) pll_7nm_list[phy->id] = pll_7nm;
spin_lock_init(&pll_7nm->postdiv_lock); + spin_lock_init(&pll_7nm->pclk_mux_lock);
pll_7nm->phy = phy;
diff --git a/drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml b/drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml index e0bf6e016b4ce..cfaf78c028b13 100644 --- a/drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml +++ b/drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml @@ -13,7 +13,10 @@ xsi:schemaLocation="https://gitlab.freedesktop.org/freedreno/ rules-fd.xsd"> <bitfield name="DIV_CTRL_3_0" low="0" high="3" type="uint"/> <bitfield name="DIV_CTRL_7_4" low="4" high="7" type="uint"/> </reg32> - <reg32 offset="0x00014" name="CLK_CFG1"/> + <reg32 offset="0x00014" name="CLK_CFG1"> + <bitfield name="CLK_EN" pos="5" type="boolean"/> + <bitfield name="CLK_EN_SEL" pos="4" type="boolean"/> + </reg32> <reg32 offset="0x00018" name="GLBL_CTRL"/> <reg32 offset="0x0001c" name="RBUF_CTRL"/> <reg32 offset="0x00020" name="VREG_CTRL_0"/>
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
[ Upstream commit 73f69c6be2a9f22c31c775ec03c6c286bfe12cfa ]
PHY_CMN_CLK_CFG1 register has four fields being used in the driver: DSI clock divider, source of bitclk and two for enabling the DSI PHY PLL clocks.
dsi_7nm_set_usecase() sets only the source of bitclk, so it should leave all other bits untouched. Use the newly introduced dsi_pll_cmn_clk_cfg1_update() to update the respective bits without overwriting the rest.
While shuffling the code, define and use PHY_CMN_CLK_CFG1 bitfields to make the code more readable and obvious.
Fixes: 1ef7c99d145c ("drm/msm/dsi: add support for 7nm DSI PHY/PLL") Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/637380/ Link: https://lore.kernel.org/r/20250214-drm-msm-phy-pll-cfg-reg-v3-3-0943b850722c... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c | 4 ++-- drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml | 1 + 2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c b/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c index 388017db45d80..798168180c1ab 100644 --- a/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c +++ b/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c @@ -617,7 +617,6 @@ static int dsi_7nm_pll_restore_state(struct msm_dsi_phy *phy) static int dsi_7nm_set_usecase(struct msm_dsi_phy *phy) { struct dsi_pll_7nm *pll_7nm = to_pll_7nm(phy->vco_hw); - void __iomem *base = phy->base; u32 data = 0x0; /* internal PLL */
DBG("DSI PLL%d", pll_7nm->phy->id); @@ -636,7 +635,8 @@ static int dsi_7nm_set_usecase(struct msm_dsi_phy *phy) }
/* set PLL src */ - writel(data << 2, base + REG_DSI_7nm_PHY_CMN_CLK_CFG1); + dsi_pll_cmn_clk_cfg1_update(pll_7nm, DSI_7nm_PHY_CMN_CLK_CFG1_BITCLK_SEL__MASK, + DSI_7nm_PHY_CMN_CLK_CFG1_BITCLK_SEL(data));
return 0; } diff --git a/drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml b/drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml index cfaf78c028b13..35f7f40e405b7 100644 --- a/drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml +++ b/drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml @@ -16,6 +16,7 @@ xsi:schemaLocation="https://gitlab.freedesktop.org/freedreno/ rules-fd.xsd"> <reg32 offset="0x00014" name="CLK_CFG1"> <bitfield name="CLK_EN" pos="5" type="boolean"/> <bitfield name="CLK_EN_SEL" pos="4" type="boolean"/> + <bitfield name="BITCLK_SEL" low="2" high="3" type="uint"/> </reg32> <reg32 offset="0x00018" name="GLBL_CTRL"/> <reg32 offset="0x0001c" name="RBUF_CTRL"/>
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal dlemoal@kernel.org
[ Upstream commit cd513e0434c3e736c549bc99bf7982658b25114d ]
When compiling with W=1, a warning is generated for the function nvme_tcp_set_queue_io_cpu():
host/tcp.c:1578: warning: Function parameter or struct member 'queue' not described in 'nvme_tcp_set_queue_io_cpu' host/tcp.c:1578: warning: expecting prototype for Track the number of queues assigned to each cpu using a global per(). Prototype was for nvme_tcp_set_queue_io_cpu() instead
Avoid this warning by using the regular comment format for the function nvme_tcp_set_queue_io_cpu() instead of the kdoc comment format.
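For context, a "/**" opener marks a comment as kernel-doc, which must name every parameter. Had the comment been kept as kernel-doc, a conforming version would have looked roughly like this sketch (not the actual fix, which simply demotes the comment to a plain "/*"):

    /**
     * nvme_tcp_set_queue_io_cpu() - select the io_cpu for a queue
     * @queue: nvme_tcp queue to assign an io_cpu to
     *
     * Track the number of queues assigned to each cpu using a global
     * per-cpu counter and select the least used cpu from the mq_map.
     */
    static void nvme_tcp_set_queue_io_cpu(struct nvme_tcp_queue *queue)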
Fixes: 32193789878c ("nvme-tcp: Fix I/O queue cpu spreading for multiple controllers") Signed-off-by: Damien Le Moal dlemoal@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 8305d3c128074..34eb3dabdc8a6 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -1565,7 +1565,7 @@ static bool nvme_tcp_poll_queue(struct nvme_tcp_queue *queue) ctrl->io_queues[HCTX_TYPE_POLL]; }
-/** +/* * Track the number of queues assigned to each cpu using a global per-cpu * counter and select the least used cpu from the mq_map. Our goal is to spread * different controllers I/O threads across different cpu cores.
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Caleb Sander Mateos csander@purestorage.com
[ Upstream commit 578539e0969028f711c34d9a4565931edfe1d730 ]
nvme_tcp_init_connection() attempts to receive an ICResp PDU but only checks that the return value from recvmsg() is non-negative. If the sender closes the TCP connection or sends fewer than 128 bytes, this check will pass even though the full PDU wasn't received.
Ensure the full ICResp PDU is received by checking that recvmsg() returns the expected 128 bytes.
Additionally set the MSG_WAITALL flag for recvmsg(), as a sender could split the ICResp over multiple TCP frames. Without MSG_WAITALL, recvmsg() could return prematurely with only part of the PDU.
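As a minimal sketch of the resulting receive pattern (variable names taken from the diff below, not the verbatim driver code):

    struct kvec iov = { .iov_base = icresp, .iov_len = sizeof(*icresp) };
    struct msghdr msg = { .msg_flags = MSG_WAITALL };
    int ret;

    /* MSG_WAITALL blocks until the full length arrives, but recvmsg()
     * can still return short on EOF or a signal, so the returned length
     * must be checked as well. */
    ret = kernel_recvmsg(queue->sock, &msg, &iov, 1, iov.iov_len,
                         msg.msg_flags);
    if (ret < (int)sizeof(*icresp))
            return ret < 0 ? ret : -ECONNRESET;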
Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver") Signed-off-by: Caleb Sander Mateos csander@purestorage.com Reviewed-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/tcp.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 34eb3dabdc8a6..840ae475074d0 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -1449,11 +1449,14 @@ static int nvme_tcp_init_connection(struct nvme_tcp_queue *queue) msg.msg_control = cbuf; msg.msg_controllen = sizeof(cbuf); } + msg.msg_flags = MSG_WAITALL; ret = kernel_recvmsg(queue->sock, &msg, &iov, 1, iov.iov_len, msg.msg_flags); - if (ret < 0) { + if (ret < sizeof(*icresp)) { pr_warn("queue %d: failed to receive icresp, error %d\n", nvme_tcp_queue_id(queue), ret); + if (ret >= 0) + ret = -ECONNRESET; goto free_icresp; } ret = -ENOTCONN;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Caleb Sander Mateos csander@purestorage.com
[ Upstream commit 487a3ea7b1b8ba2ca7d2c2bb3c3594dc360d6261 ]
nvme_validate_passthru_nsid() logs an err message whose format string is split over 2 lines. There is a missing space between the two pieces, resulting in log lines like "... does not match nsid (1)of namespace". Add the missing space between ")" and "of". Also combine the format string pieces onto a single line to make the err message easier to grep.
Fixes: e7d4b5493a2d ("nvme: factor out a nvme_validate_passthru_nsid helper") Signed-off-by: Caleb Sander Mateos csander@purestorage.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/ioctl.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c index a96976b22fa79..61af1583356c2 100644 --- a/drivers/nvme/host/ioctl.c +++ b/drivers/nvme/host/ioctl.c @@ -276,8 +276,7 @@ static bool nvme_validate_passthru_nsid(struct nvme_ctrl *ctrl, { if (ns && nsid != ns->head->ns_id) { dev_err(ctrl->device, - "%s: nsid (%u) in cmd does not match nsid (%u)" - "of namespace\n", + "%s: nsid (%u) in cmd does not match nsid (%u) of namespace\n", current->comm, nsid, ns->head->ns_id); return false; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yan Zhai yan@cloudflare.com
[ Upstream commit 5644c6b50ffee0a56c1e01430a8c88e34decb120 ]
generic_map_lookup_batch() currently returns EINTR if it fails with ENOENT and retries several times on bpf_map_copy_value(). The next batch would start from the same location, presuming it's a transient issue. This is incorrect if a map can actually have "holes", i.e. "get_next_key" can return a key that does not point to a valid value. At least the array-of-maps type may legitimately contain such holes. Right now, once these holes show up, the generic batch lookup cannot proceed any more; it will always fail with EINTR errors.
Rather, do not retry in generic_map_lookup_batch(). If it finds a non-existing element, skip to the next key. This simple solution comes at a price: transient errors may not be recovered, and the iteration might cycle back to the first key under parallel deletion. For example, Hou Tao houtao@huaweicloud.com pointed out the following scenario:
For LPM trie map: (1) ->map_get_next_key(map, prev_key, key) returns a valid key
(2) bpf_map_copy_value() returns -ENOENT. It means the key must have been deleted concurrently.
(3) goto next_key, which swaps prev_key and key
(4) ->map_get_next_key(map, prev_key, key) again. prev_key now points to a non-existing key; for the LPM trie this is treated just like the prev_key=NULL case, so the returned key will be duplicated.
With the retry logic, the iteration can continue to the key next to the deleted one. But if we directly skip to the next key, the iteration loop would restart from the first key for the lpm_trie type.
However, not all races can be recovered from. For example, if the current key is deleted after instead of before bpf_map_copy_value(), or if prev_key also gets deleted, then the loop will still restart from the first key for lpm_trie anyway. For generic lookup it might be better to stay simple, i.e. just skip to the next key. To guarantee that the output keys are not duplicated, it is better to implement map-type-specific batch operations, which can properly lock the trie and synchronize with concurrent mutators.
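For illustration, this is roughly how user space drives the batched lookup via libbpf, and why a persistent EINTR stalls the whole iteration. A hedged sketch assuming a map with 4-byte keys, with error handling simplified; dump_map and its buffers are hypothetical:

    #include <errno.h>
    #include <bpf/bpf.h>

    /* keys/vals must hold batch_sz entries each */
    static int dump_map(int map_fd, void *keys, void *vals, __u32 batch_sz)
    {
            __u32 out_batch, count;
            void *in_batch = NULL;  /* NULL selects the first batch */
            int err;

            do {
                    count = batch_sz;
                    err = bpf_map_lookup_batch(map_fd, in_batch, &out_batch,
                                               keys, vals, &count, NULL);
                    if (err && errno != ENOENT)
                            return -errno;  /* before the fix: could be EINTR */
                    /* consume 'count' key/value pairs here ... */
                    in_batch = &out_batch; /* resume from the last position */
            } while (!err);             /* ENOENT signals end of iteration */

            return 0;
    }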
Fixes: cb4d03ab499d ("bpf: Add generic support for lookup batch op") Closes: https://lore.kernel.org/bpf/Z6JXtA1M5jAZx8xD@debian.debian/ Signed-off-by: Yan Zhai yan@cloudflare.com Acked-by: Hou Tao houtao1@huawei.com Link: https://lore.kernel.org/r/85618439eea75930630685c467ccefeac0942e2b.173917159... Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/syscall.c | 18 +++++------------- 1 file changed, 5 insertions(+), 13 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 3200372ea28ce..696e5a2cbea2e 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1872,8 +1872,6 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file, return err; }
-#define MAP_LOOKUP_RETRIES 3 - int generic_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) @@ -1883,8 +1881,8 @@ int generic_map_lookup_batch(struct bpf_map *map, void __user *values = u64_to_user_ptr(attr->batch.values); void __user *keys = u64_to_user_ptr(attr->batch.keys); void *buf, *buf_prevkey, *prev_key, *key, *value; - int err, retry = MAP_LOOKUP_RETRIES; u32 value_size, cp, max_count; + int err;
if (attr->batch.elem_flags & ~BPF_F_LOCK) return -EINVAL; @@ -1930,14 +1928,8 @@ int generic_map_lookup_batch(struct bpf_map *map, err = bpf_map_copy_value(map, key, value, attr->batch.elem_flags);
- if (err == -ENOENT) { - if (retry) { - retry--; - continue; - } - err = -EINTR; - break; - } + if (err == -ENOENT) + goto next_key;
if (err) goto free_buf; @@ -1952,12 +1944,12 @@ int generic_map_lookup_batch(struct bpf_map *map, goto free_buf; }
+ cp++; +next_key: if (!prev_key) prev_key = buf_prevkey;
swap(prev_key, key); - retry = MAP_LOOKUP_RETRIES; - cp++; cond_resched(); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aaron Kling webgeek1234@gmail.com
[ Upstream commit 3dbc0215e3c502a9f3221576da0fdc9847fb9721 ]
Most kernel configs enable multiple Tegra SoC generations, causing this typo to go unnoticed. But in the case where a kernel config is strictly for Tegra186, this is a problem.
Fixes: 989863d7cbe5 ("drm/nouveau/pmu: select implementation based on available firmware") Signed-off-by: Aaron Kling webgeek1234@gmail.com Signed-off-by: Danilo Krummrich dakr@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20250218-nouveau-gm10b-guard-v... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c index a6f410ba60bc9..d393bc540f862 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c @@ -75,7 +75,7 @@ gp10b_pmu_acr = { .bootstrap_multiple_falcons = gp10b_pmu_acr_bootstrap_multiple_falcons, };
-#if IS_ENABLED(CONFIG_ARCH_TEGRA_210_SOC) +#if IS_ENABLED(CONFIG_ARCH_TEGRA_186_SOC) MODULE_FIRMWARE("nvidia/gp10b/pmu/desc.bin"); MODULE_FIRMWARE("nvidia/gp10b/pmu/image.bin"); MODULE_FIRMWARE("nvidia/gp10b/pmu/sig.bin");
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Artur Rojek contact@artur-rojek.eu
[ Upstream commit d7e3fd658248f257006227285095d190e70ee73a ]
The jcore-aic irqchip does not have separate interrupt numbers reserved for cpu-local vs global interrupts. Therefore the device drivers need to request the given interrupt as a per-CPU interrupt.
69a9dcbd2d65 ("clocksource/drivers/jcore: Use request_percpu_irq()") converted the clocksource driver over to request_percpu_irq(), but failed to do add all the required changes, resulting in a failure to register PIT interrupts.
Fix this by the steps below (a generic sketch of the resulting pattern follows the list):
1) Explicitly mark the interrupt via irq_set_percpu_devid() in jcore_pit_init().
2) Enable and disable the per CPU interrupt in the CPU hotplug callbacks.
3) Pass the correct per-cpu cookie to the irq handler by using handle_percpu_devid_irq() instead of handle_percpu_irq() in handle_jcore_irq().
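A generic sketch of the resulting per-CPU IRQ pattern (not the driver code; "handler", "percpu_cookie" and the irq number are placeholders):

    irq_set_percpu_devid(irq);                      /* step 1 */
    err = request_percpu_irq(irq, handler, "my-pit", percpu_cookie);

    /* step 2: in the CPUHP "starting" callback, run on each CPU */
    enable_percpu_irq(irq, IRQ_TYPE_NONE);

    /* step 2: in the matching teardown callback */
    disable_percpu_irq(irq);

    /* step 3: the irqchip flow handler must use handle_percpu_devid_irq()
     * so the per-CPU cookie actually reaches the handler */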
[ tglx: Massage change log ]
Fixes: 69a9dcbd2d65 ("clocksource/drivers/jcore: Use request_percpu_irq()") Signed-off-by: Artur Rojek contact@artur-rojek.eu Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Uros Bizjak ubizjak@gmail.com Link: https://lore.kernel.org/all/20250216175545.35079-3-contact@artur-rojek.eu Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clocksource/jcore-pit.c | 15 ++++++++++++++- drivers/irqchip/irq-jcore-aic.c | 2 +- 2 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/drivers/clocksource/jcore-pit.c b/drivers/clocksource/jcore-pit.c index a3fe98cd38382..82815428f8f92 100644 --- a/drivers/clocksource/jcore-pit.c +++ b/drivers/clocksource/jcore-pit.c @@ -114,6 +114,18 @@ static int jcore_pit_local_init(unsigned cpu) pit->periodic_delta = DIV_ROUND_CLOSEST(NSEC_PER_SEC, HZ * buspd);
clockevents_config_and_register(&pit->ced, freq, 1, ULONG_MAX); + enable_percpu_irq(pit->ced.irq, IRQ_TYPE_NONE); + + return 0; +} + +static int jcore_pit_local_teardown(unsigned cpu) +{ + struct jcore_pit *pit = this_cpu_ptr(jcore_pit_percpu); + + pr_info("Local J-Core PIT teardown on cpu %u\n", cpu); + + disable_percpu_irq(pit->ced.irq);
return 0; } @@ -168,6 +180,7 @@ static int __init jcore_pit_init(struct device_node *node) return -ENOMEM; }
+ irq_set_percpu_devid(pit_irq); err = request_percpu_irq(pit_irq, jcore_timer_interrupt, "jcore_pit", jcore_pit_percpu); if (err) { @@ -237,7 +250,7 @@ static int __init jcore_pit_init(struct device_node *node)
cpuhp_setup_state(CPUHP_AP_JCORE_TIMER_STARTING, "clockevents/jcore:starting", - jcore_pit_local_init, NULL); + jcore_pit_local_init, jcore_pit_local_teardown);
return 0; } diff --git a/drivers/irqchip/irq-jcore-aic.c b/drivers/irqchip/irq-jcore-aic.c index b9dcc8e78c750..1f613eb7b7f03 100644 --- a/drivers/irqchip/irq-jcore-aic.c +++ b/drivers/irqchip/irq-jcore-aic.c @@ -38,7 +38,7 @@ static struct irq_chip jcore_aic; static void handle_jcore_irq(struct irq_desc *desc) { if (irqd_is_per_cpu(irq_desc_get_irq_data(desc))) - handle_percpu_irq(desc); + handle_percpu_devid_irq(desc); else handle_simple_irq(desc); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
commit a8972d5a49b408248294b5ecbdd0a085e4726349 upstream.
In jadard_prepare() a reset pulse is generated with the following statements (delays omitted for clarity):
gpiod_set_value(jadard->reset, 1); --> Deassert reset gpiod_set_value(jadard->reset, 0); --> Assert reset for 10ms gpiod_set_value(jadard->reset, 1); --> Deassert reset
However, specifying a second argument of "0" to gpiod_set_value() means deasserting the GPIO, and "1" means asserting it. If the reset signal is defined as GPIO_ACTIVE_LOW in the DTS, the above statements will incorrectly generate the reset pulse (inverted) and leave it asserted (LOW) at the end of jadard_prepare().
Fix the reset behavior by inverting the second argument to gpiod_set_value() in jadard_prepare(). Also modify the second argument to devm_gpiod_get() in jadard_dsi_probe() to assert the reset when probing.
Do not modify it in jadard_unprepare() as it is already properly asserted with "1", which seems to be the intended behavior.
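To illustrate the logical-value convention (a sketch, not the patched driver code):

    /* gpiod_set_value() takes the logical line state: 1 = assert,
     * 0 = deassert. With the line flagged GPIO_ACTIVE_LOW in the DTS,
     * asserting drives the physical pin low. */
    gpiod_set_value(jadard->reset, 1);      /* assert reset  (pin low)  */
    msleep(10);
    gpiod_set_value(jadard->reset, 0);      /* release reset (pin high) */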
Fixes: 6b818c533dd8 ("drm: panel: Add Jadard JD9365DA-H3 DSI panel") Cc: stable@vger.kernel.org Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Reviewed-by: Linus Walleij linus.walleij@linaro.org Link: https://lore.kernel.org/r/20240927135306.857617-1-hugo@hugovil.com Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20240927135306.857617-1-hugo@h... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/panel/panel-jadard-jd9365da-h3.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/panel/panel-jadard-jd9365da-h3.c +++ b/drivers/gpu/drm/panel/panel-jadard-jd9365da-h3.c @@ -109,13 +109,13 @@ static int jadard_prepare(struct drm_pan if (jadard->desc->lp11_to_reset_delay_ms) msleep(jadard->desc->lp11_to_reset_delay_ms);
- gpiod_set_value(jadard->reset, 1); + gpiod_set_value(jadard->reset, 0); msleep(5);
- gpiod_set_value(jadard->reset, 0); + gpiod_set_value(jadard->reset, 1); msleep(10);
- gpiod_set_value(jadard->reset, 1); + gpiod_set_value(jadard->reset, 0); msleep(130);
ret = jadard->desc->init(jadard); @@ -1130,7 +1130,7 @@ static int jadard_dsi_probe(struct mipi_ dsi->format = desc->format; dsi->lanes = desc->lanes;
- jadard->reset = devm_gpiod_get(dev, "reset", GPIOD_OUT_LOW); + jadard->reset = devm_gpiod_get(dev, "reset", GPIOD_OUT_HIGH); if (IS_ERR(jadard->reset)) { DRM_DEV_ERROR(&dsi->dev, "failed to get our reset GPIO\n"); return PTR_ERR(jadard->reset);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jessica Zhang quic_jesszhan@quicinc.com
commit f063ac6b55df03ed25996bdc84d9e1c50147cfa1 upstream.
Disable pingpong dither in dpu_encoder_helper_phys_cleanup().
This avoids the issue where an encoder unknowingly uses dither after reserving a pingpong block that was previously bound to an encoder that had enabled dither.
Cc: stable@vger.kernel.org Reported-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Closes: https://lore.kernel.org/all/jr7zbj5w7iq4apg3gofuvcwf4r2swzqjk7sshwcdjll4mn6c... Signed-off-by: Jessica Zhang quic_jesszhan@quicinc.com Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Fixes: 3c128638a07d ("drm/msm/dpu: add support for dither block in display") Patchwork: https://patchwork.freedesktop.org/patch/636517/ Link: https://lore.kernel.org/r/20250211-dither-disable-v1-1-ac2cb455f6b9@quicinc.... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c @@ -2125,6 +2125,9 @@ void dpu_encoder_helper_phys_cleanup(str } }
+ if (phys_enc->hw_pp && phys_enc->hw_pp->ops.setup_dither) + phys_enc->hw_pp->ops.setup_dither(phys_enc->hw_pp, NULL); + /* reset the merge 3D HW block */ if (phys_enc->hw_pp && phys_enc->hw_pp->merge_3d) { phys_enc->hw_pp->merge_3d->ops.setup_3d_mode(phys_enc->hw_pp->merge_3d,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit 07fb70d82e0df085980246bf17bc12537588795f upstream.
Any active plane needs to have its crtc included in the atomic state. For planes enabled via uapi that is all handled in the core. But when we use a plane for joiner, the uapi code thinks the plane is disabled and therefore doesn't have a crtc. So we need to pull those in by hand. We do it first thing in intel_joiner_add_affected_crtcs() so that any newly added crtc will subsequently pull in all of its joined crtcs as well.
The symptoms from failing to do this are: - duct tape in the form of commit 1d5b09f8daf8 ("drm/i915: Fix NULL ptr deref by checking new_crtc_state") - the plane's hw state will get overwritten by the disabled uapi state if it can't find the uapi counterpart plane in the atomic state from where it should copy the correct state
Cc: stable@vger.kernel.org Reviewed-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20250212164330.16891-2-ville.s... (cherry picked from commit 91077d1deb5374eb8be00fb391710f00e751dc4b) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_display.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
--- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -6369,12 +6369,30 @@ static int intel_async_flip_check_hw(str static int intel_joiner_add_affected_crtcs(struct intel_atomic_state *state) { struct drm_i915_private *i915 = to_i915(state->base.dev); + const struct intel_plane_state *plane_state; struct intel_crtc_state *crtc_state; + struct intel_plane *plane; struct intel_crtc *crtc; u8 affected_pipes = 0; u8 modeset_pipes = 0; int i;
+ /* + * Any plane which is in use by the joiner needs its crtc. + * Pull those in first as this will not have happened yet + * if the plane remains disabled according to uapi. + */ + for_each_new_intel_plane_in_state(state, plane, plane_state, i) { + crtc = to_intel_crtc(plane_state->hw.crtc); + if (!crtc) + continue; + + crtc_state = intel_atomic_get_crtc_state(&state->base, crtc); + if (IS_ERR(crtc_state)) + return PTR_ERR(crtc_state); + } + + /* Now pull in all joined crtcs */ for_each_new_intel_crtc_in_state(state, crtc, crtc_state, i) { affected_pipes |= crtc_state->joiner_pipes; if (intel_crtc_needs_modeset(crtc_state))
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Imre Deak imre.deak@intel.com
commit b9275eabe31e6679ae12c46a4a0a18d622db4570 upstream.
At the end of a 128b/132b link training sequence, the HW expects the transcoder training pattern to be set to TPS2 and from that to normal mode (disabling the training pattern). Transitioning from TPS1 directly to normal mode leaves the transcoder in a stuck state, resulting in page-flip timeouts later in the modeset sequence.
Atm, in case of a failure during link training, the transcoder may still be set to output the TPS1 pattern. Later the transcoder is then set from TPS1 directly to normal mode in intel_dp_stop_link_train(), leading to modeset failures later as described above. Fix this by setting the training pattern to TPS2, if the link training failed at any point.
The clue in the specification about the above HW behavior is the explicit mention that TPS2 must be set after the link training sequence (and there isn't a similar requirement specified for the 8b/10b link training), see the Bspec links below.
v2: Add bspec aspect/link to the commit log. (Jani)
Bspec: 54128, 65448, 68849 Cc: stable@vger.kernel.org # v5.18+ Cc: Jani Nikula jani.nikula@intel.com Signed-off-by: Imre Deak imre.deak@intel.com Acked-by: Jani Nikula jani.nikula@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20250217223828.1166093-2-imre.... Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com (cherry picked from commit 8b4bbaf8ddc1f68f3ee96a706f65fdb1bcd9d355) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_dp_link_training.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/i915/display/intel_dp_link_training.c +++ b/drivers/gpu/drm/i915/display/intel_dp_link_training.c @@ -1561,7 +1561,7 @@ intel_dp_128b132b_link_train(struct inte
if (wait_for(intel_dp_128b132b_intra_hop(intel_dp, crtc_state) == 0, 500)) { lt_err(intel_dp, DP_PHY_DPRX, "128b/132b intra-hop not clear\n"); - return false; + goto out; }
if (intel_dp_128b132b_lane_eq(intel_dp, crtc_state) && @@ -1573,6 +1573,19 @@ intel_dp_128b132b_link_train(struct inte passed ? "passed" : "failed", crtc_state->port_clock, crtc_state->lane_count);
+out: + /* + * Ensure that the training pattern does get set to TPS2 even in case + * of a failure, as is the case at the end of a passing link training + * and what is expected by the transcoder. Leaving TPS1 set (and + * disabling the link train mode in DP_TP_CTL later from TPS1 directly) + * would result in a stuck transcoder HW state and flip-done timeouts + * later in the modeset sequence. + */ + if (!passed) + intel_dp_program_link_training_pattern(intel_dp, crtc_state, + DP_PHY_DPRX, DP_TRAINING_PATTERN_2); + return passed; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Imre Deak imre.deak@intel.com
commit 166ce267ae3f96e439d8ccc838e8ec4d8b4dab73 upstream.
Fix the port width programming in the DDI_BUF_CTL register on MTLP+, where this had an off-by-one error.
Cc: stable@vger.kernel.org # v6.5+ Fixes: b66a8abaa48a ("drm/i915/display/mtl: Fill port width in DDI_BUF_/TRANS_DDI_FUNC_/PORT_BUF_CTL for HDMI") Reviewed-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Imre Deak imre.deak@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20250214142001.552916-3-imre.d... (cherry picked from commit b2ecdabe46d23db275f94cd7c46ca414a144818b) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_ddi.c | 2 +- drivers/gpu/drm/i915/i915_reg.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_ddi.c +++ b/drivers/gpu/drm/i915/display/intel_ddi.c @@ -3363,7 +3363,7 @@ static void intel_enable_ddi_hdmi(struct intel_de_rmw(dev_priv, XELPDP_PORT_BUF_CTL1(dev_priv, port), XELPDP_PORT_WIDTH_MASK | XELPDP_PORT_REVERSAL, port_buf);
- buf_ctl |= DDI_PORT_WIDTH(lane_count); + buf_ctl |= DDI_PORT_WIDTH(crtc_state->lane_count);
if (DISPLAY_VER(dev_priv) >= 20) buf_ctl |= XE2LPD_DDI_BUF_D2D_LINK_ENABLE; --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -3869,7 +3869,7 @@ enum skl_power_gate { #define DDI_BUF_IS_IDLE (1 << 7) #define DDI_BUF_CTL_TC_PHY_OWNERSHIP REG_BIT(6) #define DDI_A_4_LANES (1 << 4) -#define DDI_PORT_WIDTH(width) (((width) - 1) << 1) +#define DDI_PORT_WIDTH(width) (((width) == 3 ? 4 : ((width) - 1)) << 1) #define DDI_PORT_WIDTH_MASK (7 << 1) #define DDI_PORT_WIDTH_SHIFT 1 #define DDI_INIT_DISPLAY_DETECTED (1 << 0)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Karas krzysztof.karas@intel.com
commit e49477f7f78598295551d486ecc7f020d796432e upstream.
spin_lock/unlock() functions used in interrupt contexts could result in a deadlock, as seen in GitLab issue #13399, which occurs when an interrupt comes in while the lock is already held.
Try to remedy the problem by saving irq state before spin lock acquisition.
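The failure mode being avoided is the classic same-CPU self-deadlock; a generic sketch:

    /*
     *   spin_lock(&lock);        // process context, IRQs still enabled
     *      <interrupt on same CPU>
     *         spin_lock(&lock);  // IRQ context spins forever: deadlock
     *
     * Saving and disabling local interrupts around the critical section
     * prevents an interrupt handler from preempting the lock holder on
     * this CPU:
     */
    unsigned long flags;

    spin_lock_irqsave(&lock, flags);
    /* ... critical section also reachable from IRQ context ... */
    spin_unlock_irqrestore(&lock, flags);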
v2: add irqs' state save/restore calls to all locks/unlocks in signal_irq_work() execution (Maciej)
v3: use with spin_lock_irqsave() in guc_lrc_desc_unpin() instead of other lock/unlock calls and add Fixes and Cc tags (Tvrtko); change title and commit message
Fixes: 2f2cc53b5fe7 ("drm/i915/guc: Close deregister-context race against CT-loss") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13399 Signed-off-by: Krzysztof Karas krzysztof.karas@intel.com Cc: stable@vger.kernel.org # v6.9+ Reviewed-by: Maciej Patelczyk maciej.patelczyk@intel.com Reviewed-by: Andi Shyti andi.shyti@linux.intel.com Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/pusppq5ybyszau2oocboj3mtj5x574... (cherry picked from commit c088387ddd6482b40f21ccf23db1125e8fa4af7e) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -3425,10 +3425,10 @@ static inline int guc_lrc_desc_unpin(str */ ret = deregister_context(ce, ce->guc_id.id); if (ret) { - spin_lock(&ce->guc_state.lock); + spin_lock_irqsave(&ce->guc_state.lock, flags); set_context_registered(ce); clr_context_destroyed(ce); - spin_unlock(&ce->guc_state.lock); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); /* * As gt-pm is awake at function entry, intel_wakeref_put_async merely decrements * the wakeref immediately but per function spec usage call this after unlock.
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Begunkov asml.silence@gmail.com
commit 67b0025d19f99fb9fbb8b62e6975553c183f3a16 upstream.
At the moment we can't sanely handle queuing an async request from a multishot context, so disable them. It shouldn't matter as pollable files / sockets don't normally do async.
Patching it in __io_read() is not the cleanest way, but it's simpler than other options, so let's fix it there and clean up on top.
Cc: stable@vger.kernel.org Reported-by: chase xd sl1589472800@gmail.com Fixes: fc68fcda04910 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT") Signed-off-by: Pavel Begunkov asml.silence@gmail.com Link: https://lore.kernel.org/r/7d51732c125159d17db4fe16f51ec41b936973f8.173991903... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/rw.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
--- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -862,7 +862,15 @@ static int __io_read(struct io_kiocb *re if (unlikely(ret)) return ret;
- ret = io_iter_do_read(rw, &io->iter); + if (unlikely(req->opcode == IORING_OP_READ_MULTISHOT)) { + void *cb_copy = rw->kiocb.ki_complete; + + rw->kiocb.ki_complete = NULL; + ret = io_iter_do_read(rw, &io->iter); + rw->kiocb.ki_complete = cb_copy; + } else { + ret = io_iter_do_read(rw, &io->iter); + }
/* * Some file systems like to return -EOPNOTSUPP for an IOCB_NOWAIT @@ -887,7 +895,8 @@ static int __io_read(struct io_kiocb *re } else if (ret == -EIOCBQUEUED) { return IOU_ISSUE_SKIP_COMPLETE; } else if (ret == req->cqe.res || ret <= 0 || !force_nonblock || - (req->flags & REQ_F_NOWAIT) || !need_complete_io(req)) { + (req->flags & REQ_F_NOWAIT) || !need_complete_io(req) || + (issue_flags & IO_URING_F_MULTISHOT)) { /* read all, failed, already did sync or don't want to retry */ goto done; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Begunkov asml.silence@gmail.com
commit 1e988c3fe1264708f4f92109203ac5b1d65de50b upstream.
sqe->opcode is used to index different tables, so make sure we sanitise it against speculation.
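This is the standard Spectre-v1 hardening pattern, as applied in the diff below (sketch):

    #include <linux/nospec.h>

    if (opcode >= IORING_OP_LAST)
            return -EINVAL;         /* architectural bounds check... */
    opcode = array_index_nospec(opcode, IORING_OP_LAST);
                                    /* ...then clamp under speculation */
    def = &io_issue_defs[opcode];   /* safe to index the table */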
Cc: stable@vger.kernel.org Fixes: d3656344fea03 ("io_uring: add lookup table for various opcode needs") Signed-off-by: Pavel Begunkov asml.silence@gmail.com Reviewed-by: Li Zetao lizetao1@huawei.com Link: https://lore.kernel.org/r/7eddbf31c8ca0a3947f8ed98271acc2b4349c016.173956840... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/io_uring.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -2053,6 +2053,8 @@ static int io_init_req(struct io_ring_ct req->opcode = 0; return io_init_fail_req(req, -EINVAL); } + opcode = array_index_nospec(opcode, IORING_OP_LAST); + def = &io_issue_defs[opcode]; if (unlikely(sqe_flags & ~SQE_COMMON_FLAGS)) { /* enforce forwards compatibility on users */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bartosz Golaszewski bartosz.golaszewski@linaro.org
commit 9d846b1aebbe488f245f1aa463802ff9c34cc078 upstream.
As per the API contract, gpio_chip::get_direction() may fail and return a negative error number. However, we treat it as if it always returned 0 or 1. Check the return value of the callback and propagate the error number up the stack.
Cc: stable@vger.kernel.org Reviewed-by: Linus Walleij linus.walleij@linaro.org Link: https://lore.kernel.org/r/20250210-gpio-sanitize-retvals-v1-1-12ea88506cb2@l... Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpiolib.c | 44 +++++++++++++++++++++++++++++--------------- 1 file changed, 29 insertions(+), 15 deletions(-)
--- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -1059,8 +1059,11 @@ int gpiochip_add_data_with_key(struct gp struct gpio_desc *desc = &gdev->descs[desc_index];
if (gc->get_direction && gpiochip_line_is_valid(gc, desc_index)) { - assign_bit(FLAG_IS_OUT, - &desc->flags, !gc->get_direction(gc, desc_index)); + ret = gc->get_direction(gc, desc_index); + if (ret < 0) + goto err_cleanup_desc_srcu; + + assign_bit(FLAG_IS_OUT, &desc->flags, !ret); } else { assign_bit(FLAG_IS_OUT, &desc->flags, !gc->direction_input); @@ -2702,13 +2705,18 @@ int gpiod_direction_input(struct gpio_de if (guard.gc->direction_input) { ret = guard.gc->direction_input(guard.gc, gpio_chip_hwgpio(desc)); - } else if (guard.gc->get_direction && - (guard.gc->get_direction(guard.gc, - gpio_chip_hwgpio(desc)) != 1)) { - gpiod_warn(desc, - "%s: missing direction_input() operation and line is output\n", - __func__); - return -EIO; + } else if (guard.gc->get_direction) { + ret = guard.gc->get_direction(guard.gc, + gpio_chip_hwgpio(desc)); + if (ret < 0) + return ret; + + if (ret != GPIO_LINE_DIRECTION_IN) { + gpiod_warn(desc, + "%s: missing direction_input() operation and line is output\n", + __func__); + return -EIO; + } } if (ret == 0) { clear_bit(FLAG_IS_OUT, &desc->flags); @@ -2746,12 +2754,18 @@ static int gpiod_direction_output_raw_co gpio_chip_hwgpio(desc), val); } else { /* Check that we are in output mode if we can */ - if (guard.gc->get_direction && - guard.gc->get_direction(guard.gc, gpio_chip_hwgpio(desc))) { - gpiod_warn(desc, - "%s: missing direction_output() operation\n", - __func__); - return -EIO; + if (guard.gc->get_direction) { + ret = guard.gc->get_direction(guard.gc, + gpio_chip_hwgpio(desc)); + if (ret < 0) + return ret; + + if (ret != GPIO_LINE_DIRECTION_OUT) { + gpiod_warn(desc, + "%s: missing direction_output() operation\n", + __func__); + return -EIO; + } } /* * If we can't actively set the direction, we are some
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bartosz Golaszewski bartosz.golaszewski@linaro.org
commit 81570d6a7ad37033c7895811551a5a9023706eda upstream.
During the locking rework in GPIOLIB, we omitted one important use-case, namely: setting and getting values for GPIO descriptor arrays with array_info present.
This patch does two things: first it makes struct gpio_array store the address of the underlying GPIO device and not chip. Next: it protects the chip with SRCU from removal in gpiod_get_array_value_complex() and gpiod_set_array_value_complex().
Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250215095655.23152-1-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpiolib.c | 48 +++++++++++++++++++++++++++++++++--------------- drivers/gpio/gpiolib.h | 4 ++-- 2 files changed, 35 insertions(+), 17 deletions(-)
--- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -3082,6 +3082,8 @@ static int gpiod_get_raw_value_commit(co static int gpio_chip_get_multiple(struct gpio_chip *gc, unsigned long *mask, unsigned long *bits) { + lockdep_assert_held(&gc->gpiodev->srcu); + if (gc->get_multiple) return gc->get_multiple(gc, mask, bits); if (gc->get) { @@ -3112,6 +3114,7 @@ int gpiod_get_array_value_complex(bool r struct gpio_array *array_info, unsigned long *value_bitmap) { + struct gpio_chip *gc; int ret, i = 0;
/* @@ -3123,10 +3126,15 @@ int gpiod_get_array_value_complex(bool r array_size <= array_info->size && (void *)array_info == desc_array + array_info->size) { if (!can_sleep) - WARN_ON(array_info->chip->can_sleep); + WARN_ON(array_info->gdev->can_sleep);
- ret = gpio_chip_get_multiple(array_info->chip, - array_info->get_mask, + guard(srcu)(&array_info->gdev->srcu); + gc = srcu_dereference(array_info->gdev->chip, + &array_info->gdev->srcu); + if (!gc) + return -ENODEV; + + ret = gpio_chip_get_multiple(gc, array_info->get_mask, value_bitmap); if (ret) return ret; @@ -3407,6 +3415,8 @@ static void gpiod_set_raw_value_commit(s static void gpio_chip_set_multiple(struct gpio_chip *gc, unsigned long *mask, unsigned long *bits) { + lockdep_assert_held(&gc->gpiodev->srcu); + if (gc->set_multiple) { gc->set_multiple(gc, mask, bits); } else { @@ -3424,6 +3434,7 @@ int gpiod_set_array_value_complex(bool r struct gpio_array *array_info, unsigned long *value_bitmap) { + struct gpio_chip *gc; int i = 0;
/* @@ -3435,14 +3446,19 @@ int gpiod_set_array_value_complex(bool r array_size <= array_info->size && (void *)array_info == desc_array + array_info->size) { if (!can_sleep) - WARN_ON(array_info->chip->can_sleep); + WARN_ON(array_info->gdev->can_sleep); + + guard(srcu)(&array_info->gdev->srcu); + gc = srcu_dereference(array_info->gdev->chip, + &array_info->gdev->srcu); + if (!gc) + return -ENODEV;
if (!raw && !bitmap_empty(array_info->invert_mask, array_size)) bitmap_xor(value_bitmap, value_bitmap, array_info->invert_mask, array_size);
- gpio_chip_set_multiple(array_info->chip, array_info->set_mask, - value_bitmap); + gpio_chip_set_multiple(gc, array_info->set_mask, value_bitmap);
i = find_first_zero_bit(array_info->set_mask, array_size); if (i == array_size) @@ -4698,9 +4714,10 @@ struct gpio_descs *__must_check gpiod_ge { struct gpio_desc *desc; struct gpio_descs *descs; + struct gpio_device *gdev; struct gpio_array *array_info = NULL; - struct gpio_chip *gc; int count, bitmap_size; + unsigned long dflags; size_t descs_size;
count = gpiod_count(dev, con_id); @@ -4721,7 +4738,7 @@ struct gpio_descs *__must_check gpiod_ge
descs->desc[descs->ndescs] = desc;
- gc = gpiod_to_chip(desc); + gdev = gpiod_to_gpio_device(desc); /* * If pin hardware number of array member 0 is also 0, select * its chip as a candidate for fast bitmap processing path. @@ -4729,8 +4746,8 @@ struct gpio_descs *__must_check gpiod_ge if (descs->ndescs == 0 && gpio_chip_hwgpio(desc) == 0) { struct gpio_descs *array;
- bitmap_size = BITS_TO_LONGS(gc->ngpio > count ? - gc->ngpio : count); + bitmap_size = BITS_TO_LONGS(gdev->ngpio > count ? + gdev->ngpio : count);
array = krealloc(descs, descs_size + struct_size(array_info, invert_mask, 3 * bitmap_size), @@ -4750,7 +4767,7 @@ struct gpio_descs *__must_check gpiod_ge
array_info->desc = descs->desc; array_info->size = count; - array_info->chip = gc; + array_info->gdev = gdev; bitmap_set(array_info->get_mask, descs->ndescs, count - descs->ndescs); bitmap_set(array_info->set_mask, descs->ndescs, @@ -4763,7 +4780,7 @@ struct gpio_descs *__must_check gpiod_ge continue;
/* Unmark array members which don't belong to the 'fast' chip */ - if (array_info->chip != gc) { + if (array_info->gdev != gdev) { __clear_bit(descs->ndescs, array_info->get_mask); __clear_bit(descs->ndescs, array_info->set_mask); } @@ -4786,9 +4803,10 @@ struct gpio_descs *__must_check gpiod_ge array_info->set_mask); } } else { + dflags = READ_ONCE(desc->flags); /* Exclude open drain or open source from fast output */ - if (gpiochip_line_is_open_drain(gc, descs->ndescs) || - gpiochip_line_is_open_source(gc, descs->ndescs)) + if (test_bit(FLAG_OPEN_DRAIN, &dflags) || + test_bit(FLAG_OPEN_SOURCE, &dflags)) __clear_bit(descs->ndescs, array_info->set_mask); /* Identify 'fast' pins which require invertion */ @@ -4800,7 +4818,7 @@ struct gpio_descs *__must_check gpiod_ge if (array_info) dev_dbg(dev, "GPIO array info: chip=%s, size=%d, get_mask=%lx, set_mask=%lx, invert_mask=%lx\n", - array_info->chip->label, array_info->size, + array_info->gdev->label, array_info->size, *array_info->get_mask, *array_info->set_mask, *array_info->invert_mask); return descs; --- a/drivers/gpio/gpiolib.h +++ b/drivers/gpio/gpiolib.h @@ -110,7 +110,7 @@ extern const char *const gpio_suffixes[] * * @desc: Array of pointers to the GPIO descriptors * @size: Number of elements in desc - * @chip: Parent GPIO chip + * @gdev: Parent GPIO device * @get_mask: Get mask used in fastpath * @set_mask: Set mask used in fastpath * @invert_mask: Invert mask used in fastpath @@ -122,7 +122,7 @@ extern const char *const gpio_suffixes[] struct gpio_array { struct gpio_desc **desc; unsigned int size; - struct gpio_chip *chip; + struct gpio_device *gdev; unsigned long *get_mask; unsigned long *set_mask; unsigned long invert_mask[];
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sumit Garg sumit.garg@linaro.org
commit 70b0d6b0a199c5a3ee6c72f5e61681ed6f759612 upstream.
OP-TEE supplicant is a user-space daemon and it's possible for it to hang, crash or be killed in the middle of processing an OP-TEE RPC call. It becomes more complicated when there is incorrect shutdown ordering of the supplicant process vs the OP-TEE client application, which can eventually lead to a system hang-up waiting for the closure of the client application.
Allow the client process waiting in the kernel for a supplicant response to be killed, rather than waiting indefinitely in an unkillable state. Also, a normal uninterruptible wait should not have resulted in the hung-task watchdog getting triggered, but the endless loop would have.
This fixes issues observed during system reboot/shutdown when the supplicant got hung for some reason or got crashed/killed, which led to the client getting hung in an unkillable state. That in turn led to the system being in a hung-up state, requiring a hard power off/on to recover.
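The distinction the fix relies on (a sketch): wait_for_completion_interruptible() bails out on any signal, whereas wait_for_completion_killable() only returns early for fatal signals, so the process can still be killed without the wait turning into an endless retry loop:

    if (wait_for_completion_killable(&req->c)) {
            /* fatal signal (e.g. SIGKILL): dequeue the request and
             * report a communication error instead of waiting forever */
            req->ret = TEEC_ERROR_COMMUNICATION;
    }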
Fixes: 4fb0a5eb364d ("tee: add OP-TEE driver") Suggested-by: Arnd Bergmann arnd@arndb.de Cc: stable@vger.kernel.org Signed-off-by: Sumit Garg sumit.garg@linaro.org Reviewed-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Jens Wiklander jens.wiklander@linaro.org Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tee/optee/supp.c | 35 ++++++++--------------------------- 1 file changed, 8 insertions(+), 27 deletions(-)
--- a/drivers/tee/optee/supp.c +++ b/drivers/tee/optee/supp.c @@ -80,7 +80,6 @@ u32 optee_supp_thrd_req(struct tee_conte struct optee *optee = tee_get_drvdata(ctx->teedev); struct optee_supp *supp = &optee->supp; struct optee_supp_req *req; - bool interruptable; u32 ret;
/* @@ -111,36 +110,18 @@ u32 optee_supp_thrd_req(struct tee_conte /* * Wait for supplicant to process and return result, once we've * returned from wait_for_completion(&req->c) successfully we have - * exclusive access again. + * exclusive access again. Allow the wait to be killable such that + * the wait doesn't turn into an indefinite state if the supplicant + * gets hung for some reason. */ - while (wait_for_completion_interruptible(&req->c)) { + if (wait_for_completion_killable(&req->c)) { mutex_lock(&supp->mutex); - interruptable = !supp->ctx; - if (interruptable) { - /* - * There's no supplicant available and since the - * supp->mutex currently is held none can - * become available until the mutex released - * again. - * - * Interrupting an RPC to supplicant is only - * allowed as a way of slightly improving the user - * experience in case the supplicant hasn't been - * started yet. During normal operation the supplicant - * will serve all requests in a timely manner and - * interrupting then wouldn't make sense. - */ - if (req->in_queue) { - list_del(&req->link); - req->in_queue = false; - } + if (req->in_queue) { + list_del(&req->link); + req->in_queue = false; } mutex_unlock(&supp->mutex); - - if (interruptable) { - req->ret = TEEC_ERROR_COMMUNICATION; - break; - } + req->ret = TEEC_ERROR_COMMUNICATION; }
ret = req->ret;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gavrilov Ilia Ilia.Gavrilov@infotecs.ru
commit 07b598c0e6f06a0f254c88dafb4ad50f8a8c6eea upstream.
Syzkaller reports the following bug:
BUG: spinlock bad magic on CPU#1, syz-executor.0/7995 lock: 0xffff88805303f3e0, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 CPU: 1 PID: 7995 Comm: syz-executor.0 Tainted: G E 5.10.209+ #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x119/0x179 lib/dump_stack.c:118 debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline] do_raw_spin_lock+0x1f6/0x270 kernel/locking/spinlock_debug.c:112 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:117 [inline] _raw_spin_lock_irqsave+0x50/0x70 kernel/locking/spinlock.c:159 reset_per_cpu_data+0xe6/0x240 [drop_monitor] net_dm_cmd_trace+0x43d/0x17a0 [drop_monitor] genl_family_rcv_msg_doit+0x22f/0x330 net/netlink/genetlink.c:739 genl_family_rcv_msg net/netlink/genetlink.c:783 [inline] genl_rcv_msg+0x341/0x5a0 net/netlink/genetlink.c:800 netlink_rcv_skb+0x14d/0x440 net/netlink/af_netlink.c:2497 genl_rcv+0x29/0x40 net/netlink/genetlink.c:811 netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline] netlink_unicast+0x54b/0x800 net/netlink/af_netlink.c:1348 netlink_sendmsg+0x914/0xe00 net/netlink/af_netlink.c:1916 sock_sendmsg_nosec net/socket.c:651 [inline] __sock_sendmsg+0x157/0x190 net/socket.c:663 ____sys_sendmsg+0x712/0x870 net/socket.c:2378 ___sys_sendmsg+0xf8/0x170 net/socket.c:2432 __sys_sendmsg+0xea/0x1b0 net/socket.c:2461 do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x62/0xc7 RIP: 0033:0x7f3f9815aee9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f3f972bf0c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f3f9826d050 RCX: 00007f3f9815aee9 RDX: 0000000020000000 RSI: 0000000020001300 RDI: 0000000000000007 RBP: 00007f3f981b63bd R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000006e R14: 00007f3f9826d050 R15: 00007ffe01ee6768
If drop_monitor is built as a kernel module, syzkaller may have time to send a netlink NET_DM_CMD_START message during module loading. This will call the net_dm_monitor_start() function, which uses a spinlock that has not yet been initialized.
To fix this, let's place resource initialization above the registration of the generic netlink family.
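The general ordering rule being applied (a generic sketch with hypothetical names; any resource reachable from a netlink handler must be valid before genl_register_family() exposes the family to user space):

    static int __init my_module_init(void)      /* hypothetical init */
    {
            int cpu, rc;

            /* 1) make all state a handler can touch valid first */
            for_each_possible_cpu(cpu)
                    per_cpu_state_init(cpu);    /* hypothetical helper */

            rc = register_netdevice_notifier(&my_notifier);
            if (rc)
                    return rc;

            /* 2) only now let user space call in */
            rc = genl_register_family(&my_family);
            if (rc)
                    unregister_netdevice_notifier(&my_notifier);

            return rc;
    }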
Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 9a8afc8d3962 ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol") Cc: stable@vger.kernel.org Signed-off-by: Ilia Gavrilov Ilia.Gavrilov@infotecs.ru Reviewed-by: Ido Schimmel idosch@nvidia.com Link: https://patch.msgid.link/20250213152054.2785669-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/drop_monitor.c | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-)
--- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -1734,30 +1734,30 @@ static int __init init_net_drop_monitor( return -ENOSPC; }
- rc = genl_register_family(&net_drop_monitor_family); - if (rc) { - pr_err("Could not create drop monitor netlink family\n"); - return rc; + for_each_possible_cpu(cpu) { + net_dm_cpu_data_init(cpu); + net_dm_hw_cpu_data_init(cpu); } - WARN_ON(net_drop_monitor_family.mcgrp_offset != NET_DM_GRP_ALERT);
rc = register_netdevice_notifier(&dropmon_net_notifier); if (rc < 0) { pr_crit("Failed to register netdevice notifier\n"); + return rc; + } + + rc = genl_register_family(&net_drop_monitor_family); + if (rc) { + pr_err("Could not create drop monitor netlink family\n"); goto out_unreg; } + WARN_ON(net_drop_monitor_family.mcgrp_offset != NET_DM_GRP_ALERT);
rc = 0;
- for_each_possible_cpu(cpu) { - net_dm_cpu_data_init(cpu); - net_dm_hw_cpu_data_init(cpu); - } - goto out;
out_unreg: - genl_unregister_family(&net_drop_monitor_family); + WARN_ON(unregister_netdevice_notifier(&dropmon_net_notifier)); out: return rc; } @@ -1766,19 +1766,18 @@ static void exit_net_drop_monitor(void) { int cpu;
- BUG_ON(unregister_netdevice_notifier(&dropmon_net_notifier)); - /* * Because of the module_get/put we do in the trace state change path * we are guaranteed not to have any current users when we get here */ + BUG_ON(genl_unregister_family(&net_drop_monitor_family)); + + BUG_ON(unregister_netdevice_notifier(&dropmon_net_notifier));
for_each_possible_cpu(cpu) { net_dm_hw_cpu_data_fini(cpu); net_dm_cpu_data_fini(cpu); } - - BUG_ON(genl_unregister_family(&net_drop_monitor_family)); }
module_init(init_net_drop_monitor);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Hildenbrand david@redhat.com
commit 41cddf83d8b00f29fd105e7a0777366edc69a5cf upstream.
If migration succeeded, we called folio_migrate_flags()->mem_cgroup_migrate() to migrate the memcg from the old to the new folio. This will set memcg_data of the old folio to 0.
Similarly, if migration failed, memcg_data of the dst folio is left unset.
If we call folio_putback_lru() on such folios (memcg_data == 0), we will add the folio to be freed to the LRU, making memcg code unhappy. Running the hmm selftests:
# ./hmm-tests ... # RUN hmm.hmm_device_private.migrate ... [ 102.078007][T14893] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x7ff27d200 pfn:0x13cc00 [ 102.079974][T14893] anon flags: 0x17ff00000020018(uptodate|dirty|swapbacked|node=0|zone=2|lastcpupid=0x7ff) [ 102.082037][T14893] raw: 017ff00000020018 dead000000000100 dead000000000122 ffff8881353896c9 [ 102.083687][T14893] raw: 00000007ff27d200 0000000000000000 00000001ffffffff 0000000000000000 [ 102.085331][T14893] page dumped because: VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled()) [ 102.087230][T14893] ------------[ cut here ]------------ [ 102.088279][T14893] WARNING: CPU: 0 PID: 14893 at ./include/linux/memcontrol.h:726 folio_lruvec_lock_irqsave+0x10e/0x170 [ 102.090478][T14893] Modules linked in: [ 102.091244][T14893] CPU: 0 UID: 0 PID: 14893 Comm: hmm-tests Not tainted 6.13.0-09623-g6c216bc522fd #151 [ 102.093089][T14893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 [ 102.094848][T14893] RIP: 0010:folio_lruvec_lock_irqsave+0x10e/0x170 [ 102.096104][T14893] Code: ... [ 102.099908][T14893] RSP: 0018:ffffc900236c37b0 EFLAGS: 00010293 [ 102.101152][T14893] RAX: 0000000000000000 RBX: ffffea0004f30000 RCX: ffffffff8183f426 [ 102.102684][T14893] RDX: ffff8881063cb880 RSI: ffffffff81b8117f RDI: ffff8881063cb880 [ 102.104227][T14893] RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000 [ 102.105757][T14893] R10: 0000000000000001 R11: 0000000000000002 R12: ffffc900236c37d8 [ 102.107296][T14893] R13: ffff888277a2bcb0 R14: 000000000000001f R15: 0000000000000000 [ 102.108830][T14893] FS: 00007ff27dbdd740(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000 [ 102.110643][T14893] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 102.111924][T14893] CR2: 00007ff27d400000 CR3: 000000010866e000 CR4: 0000000000750ef0 [ 102.113478][T14893] PKRU: 55555554 [ 102.114172][T14893] Call Trace: [ 102.114805][T14893] <TASK> [ 102.115397][T14893] ? folio_lruvec_lock_irqsave+0x10e/0x170 [ 102.116547][T14893] ? __warn.cold+0x110/0x210 [ 102.117461][T14893] ? folio_lruvec_lock_irqsave+0x10e/0x170 [ 102.118667][T14893] ? report_bug+0x1b9/0x320 [ 102.119571][T14893] ? handle_bug+0x54/0x90 [ 102.120494][T14893] ? exc_invalid_op+0x17/0x50 [ 102.121433][T14893] ? asm_exc_invalid_op+0x1a/0x20 [ 102.122435][T14893] ? __wake_up_klogd.part.0+0x76/0xd0 [ 102.123506][T14893] ? dump_page+0x4f/0x60 [ 102.124352][T14893] ? folio_lruvec_lock_irqsave+0x10e/0x170 [ 102.125500][T14893] folio_batch_move_lru+0xd4/0x200 [ 102.126577][T14893] ? __pfx_lru_add+0x10/0x10 [ 102.127505][T14893] __folio_batch_add_and_move+0x391/0x720 [ 102.128633][T14893] ? __pfx_lru_add+0x10/0x10 [ 102.129550][T14893] folio_putback_lru+0x16/0x80 [ 102.130564][T14893] migrate_device_finalize+0x9b/0x530 [ 102.131640][T14893] dmirror_migrate_to_device.constprop.0+0x7c5/0xad0 [ 102.133047][T14893] dmirror_fops_unlocked_ioctl+0x89b/0xc80
Likely, nothing else goes wrong: putting the last folio reference will remove the folio from the LRU again. So besides memcg complaining, adding the folio to be freed to the LRU is just an unnecessary step.
The new flow resembles what we have in migrate_folio_move(): add the dst to the lru, remove migration ptes, unlock and unref dst.
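Condensed, the per-folio teardown after this change looks like the following (a sketch matching the diff below, not a verbatim copy):

    if (!folio_is_zone_device(dst))
            folio_add_lru(dst);             /* dst goes onto the LRU up front */
    remove_migration_ptes(src, dst, 0);
    folio_unlock(src);
    folio_put(src);                         /* plain ref drop, no folio_putback_lru() */

    if (dst != src) {
            folio_unlock(dst);
            folio_put(dst);                 /* likewise for the dst folio */
    }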
Link: https://lkml.kernel.org/r/20250210161317.717936-1-david@redhat.com
Fixes: 8763cb45ab96 ("mm/migrate: new memory migration helper for use with device memory")
Signed-off-by: David Hildenbrand david@redhat.com
Cc: Jérôme Glisse jglisse@redhat.com
Cc: John Hubbard jhubbard@nvidia.com
Cc: Alistair Popple apopple@nvidia.com
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 mm/migrate_device.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -840,20 +840,15 @@ void migrate_device_finalize(unsigned lo
 			dst = src;
 		}
 
+		if (!folio_is_zone_device(dst))
+			folio_add_lru(dst);
 		remove_migration_ptes(src, dst, 0);
 		folio_unlock(src);
-
-		if (folio_is_zone_device(src))
-			folio_put(src);
-		else
-			folio_putback_lru(src);
+		folio_put(src);
 
 		if (dst != src) {
 			folio_unlock(dst);
-			if (folio_is_zone_device(dst))
-				folio_put(dst);
-			else
-				folio_putback_lru(dst);
+			folio_put(dst);
 		}
 	}
 }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Shiyan eagle.alexander923@gmail.com
commit 5c8f9a05336cf5cadbd57ad461621b386aadb762 upstream.
The tsadc driver does not handle pinctrl "gpio" and "otpout". Let's use the correct pinctrl names "default" and "sleep". Additionally, Alexey Charkov's testing [1] has established that it is necessary for pinctrl state to reference the &tsadc_shut_org configuration rather than &tsadc_shut for the driver to function correctly.
[1] https://lkml.org/lkml/2025/1/24/966
Fixes: 32641b8ab1a5 ("arm64: dts: rockchip: add rk3588 thermal sensor")
Cc: stable@vger.kernel.org
Reviewed-by: Dragan Simic dsimic@manjaro.org
Signed-off-by: Alexander Shiyan eagle.alexander923@gmail.com
Link: https://lore.kernel.org/r/20250130053849.4902-1-eagle.alexander923@gmail.com
Signed-off-by: Heiko Stuebner heiko@sntech.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/boot/dts/rockchip/rk3588-base.dtsi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
@@ -2626,9 +2626,9 @@
 		rockchip,hw-tshut-temp = <120000>;
 		rockchip,hw-tshut-mode = <0>; /* tshut mode 0:CRU 1:GPIO */
 		rockchip,hw-tshut-polarity = <0>; /* tshut polarity 0:LOW 1:HIGH */
-		pinctrl-0 = <&tsadc_gpio_func>;
-		pinctrl-1 = <&tsadc_shut>;
-		pinctrl-names = "gpio", "otpout";
+		pinctrl-0 = <&tsadc_shut_org>;
+		pinctrl-1 = <&tsadc_gpio_func>;
+		pinctrl-names = "default", "sleep";
 		#thermal-sensor-cells = <1>;
 		status = "disabled";
 	};
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lukasz Czechowski lukasz.czechowski@thaumatec.com
commit 4eee627ea59304cdd66c5d4194ef13486a6c44fc upstream.
In the PX30-uQ7 (Ringneck) SoM, the hardware CTS and RTS pins for uart5 cannot be used for UART CTS/RTS because they are already allocated for different purposes: the CTS pin is routed to the SUS_S3# signal, while the RTS pin is used internally and is not available on the Q7 connector. Move the definition of the pinctrl-0 property from px30-ringneck-haikou.dts to px30-ringneck.dtsi.
This commit is a dependency of the next commit in the patch series, which disables DMA for uart5.
Cc: stable@vger.kernel.org
Reviewed-by: Quentin Schulz quentin.schulz@cherry.de
Signed-off-by: Lukasz Czechowski lukasz.czechowski@thaumatec.com
Link: https://lore.kernel.org/r/20250121125604.3115235-2-lukasz.czechowski@thaumat...
Signed-off-by: Heiko Stuebner heiko@sntech.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/boot/dts/rockchip/px30-ringneck-haikou.dts | 1 -
 arch/arm64/boot/dts/rockchip/px30-ringneck.dtsi       | 4 ++++
 2 files changed, 4 insertions(+), 1 deletion(-)

--- a/arch/arm64/boot/dts/rockchip/px30-ringneck-haikou.dts
+++ b/arch/arm64/boot/dts/rockchip/px30-ringneck-haikou.dts
@@ -226,7 +226,6 @@
 };
 
 &uart5 {
-	pinctrl-0 = <&uart5_xfer>;
 	rts-gpios = <&gpio0 RK_PB5 GPIO_ACTIVE_HIGH>;
 	status = "okay";
 };
--- a/arch/arm64/boot/dts/rockchip/px30-ringneck.dtsi
+++ b/arch/arm64/boot/dts/rockchip/px30-ringneck.dtsi
@@ -373,6 +373,10 @@
 	status = "okay";
 };
 
+&uart5 {
+	pinctrl-0 = <&uart5_xfer>;
+};
+
 /* Mule UCAN */
 &usb_host0_ehci {
 	status = "okay";
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lukasz Czechowski lukasz.czechowski@thaumatec.com
commit 5ae4dca718eacd0a56173a687a3736eb7e627c77 upstream.
UART controllers without flow control seem to behave unstably when DMA is enabled. The issues were indicated in the message: https://lore.kernel.org/linux-arm-kernel/CAMdYzYpXtMocCtCpZLU_xuWmOp2Ja_v0Aj... In the case of the PX30-uQ7 Ringneck SoM, it was noticed that after a couple of hours of UART communication, a CPU stall occurred, leaving the system unresponsive. After disabling DMA, extensive UART communication tests were performed for up to two weeks, and no further issues were observed. The flow control pins for uart5 are not available on the PX30-uQ7 Ringneck, as configured by pinctrl-0, so the DMA nodes were removed in the SoM dtsi.
Cc: stable@vger.kernel.org
Fixes: c484cf93f61b ("arm64: dts: rockchip: add PX30-µQ7 (Ringneck) SoM with Haikou baseboard")
Reviewed-by: Quentin Schulz quentin.schulz@cherry.de
Signed-off-by: Lukasz Czechowski lukasz.czechowski@thaumatec.com
Link: https://lore.kernel.org/r/20250121125604.3115235-3-lukasz.czechowski@thaumat...
Signed-off-by: Heiko Stuebner heiko@sntech.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/boot/dts/rockchip/px30-ringneck.dtsi | 2 ++
 1 file changed, 2 insertions(+)

--- a/arch/arm64/boot/dts/rockchip/px30-ringneck.dtsi
+++ b/arch/arm64/boot/dts/rockchip/px30-ringneck.dtsi
@@ -374,6 +374,8 @@
 };
 
 &uart5 {
+	/delete-property/ dmas;
+	/delete-property/ dma-names;
 	pinctrl-0 = <&uart5_xfer>;
 };
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Haoxiang Li haoxiang_li2024@163.com
commit e31e3f6c0ce473f7ce1e70d54ac8e3ed190509f8 upstream.
Add a check for the return value of devm_kstrdup() in loongson2_guts_probe() to catch a potential allocation failure.
Fixes: b82621ac8450 ("soc: loongson: add GUTS driver for loongson-2 platforms")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li haoxiang_li2024@163.com
Link: https://lore.kernel.org/r/20250220081714.2676828-1-haoxiang_li2024@163.com
Signed-off-by: Arnd Bergmann arnd@arndb.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/soc/loongson/loongson2_guts.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/soc/loongson/loongson2_guts.c
+++ b/drivers/soc/loongson/loongson2_guts.c
@@ -114,8 +114,11 @@ static int loongson2_guts_probe(struct p
 	if (of_property_read_string(root, "model", &machine))
 		of_property_read_string_index(root, "compatible", 0, &machine);
 	of_node_put(root);
-	if (machine)
+	if (machine) {
 		soc_dev_attr.machine = devm_kstrdup(dev, machine, GFP_KERNEL);
+		if (!soc_dev_attr.machine)
+			return -ENOMEM;
+	}
 
 	svr = loongson2_guts_get_svr();
 	soc_die = loongson2_soc_die_match(svr, loongson2_soc_die);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiko Carstens hca@linux.ibm.com
commit c3a589fd9fcbf295a7402a4b188dc9277d505f4f upstream.
The cmma_test_essa() inline assembly uses tmp as input and output, however tmp is specified as output only, which allows the compiler to optimize the initialization of tmp away.
Therefore the ESSA detection may or may not work depending on previous contents of the register that the compiler selected for tmp.
Fix this by using the correct constraint modifier.
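For readers less familiar with gcc inline-asm constraint modifiers, a minimal standalone sketch (not the kernel code; "d" is the s390 general-register constraint) of the difference:

    int tmp = 0;

    /* "=": output-only. The compiler may assume nothing reads tmp before
     * the asm writes it, so the initialization above can be elided. */
    asm volatile("" : [tmp] "=&d" (tmp));

    /* "+": read-write. The incoming value is considered live, so the
     * initialization is preserved. */
    asm volatile("" : [tmp] "+&d" (tmp));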
Fixes: 468a3bc2b7b9 ("s390/cmma: move parsing of cmma kernel parameter to early boot code")
Cc: stable@vger.kernel.org
Signed-off-by: Heiko Carstens hca@linux.ibm.com
Reviewed-by: Vasily Gorbik gor@linux.ibm.com
Signed-off-by: Vasily Gorbik gor@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/s390/boot/startup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/s390/boot/startup.c
+++ b/arch/s390/boot/startup.c
@@ -75,7 +75,7 @@ static int cmma_test_essa(void)
 		: [reg1] "=&d" (reg1),
 		  [reg2] "=&a" (reg2),
 		  [rc] "+&d" (rc),
-		  [tmp] "=&d" (tmp),
+		  [tmp] "+&d" (tmp),
 		  "+Q" (get_lowcore()->program_new_psw),
 		  "=Q" (old)
 		: [psw_old] "a" (&old),
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit 66314e9a57a050f95cb0ebac904f5ab047a8926e upstream.
I received a report from the release engineering side of the house that xfs_scrub without the -n flag (aka fix it mode) would try to fix a broken filesystem even on a kernel that doesn't have online repair built into it:
# xfs_scrub -dTvn /mnt/test
EXPERIMENTAL xfs_scrub program in use! Use at your own risk!
Phase 1: Find filesystem geometry.
/mnt/test: using 1 threads to scrub.
Phase 1: Memory used: 132k/0k (108k/25k), time: 0.00/ 0.00/ 0.00s
<snip>
Phase 4: Repair filesystem.
<snip>
Info: /mnt/test/some/victimdir directory entries: Attempting repair. (repair.c line 351)
Corruption: /mnt/test/some/victimdir directory entries: Repair unsuccessful; offline repair required. (repair.c line 204)
Source: https://blogs.oracle.com/linux/post/xfs-online-filesystem-repair
It is strange that xfs_scrub doesn't refuse to run, because the kernel is supposed to return EOPNOTSUPP if we actually needed to run a repair, and xfs_io's repair subcommand will perror that. And yet:
# xfs_io -x -c 'repair probe' /mnt/test
#
The first problem is commit dcb660f9222fd9 (4.15) which should have had xchk_probe set the CORRUPT OFLAG so that any of the repair machinery will get called at all.
It turns out that some refactoring that happened in the 6.6-6.8 era broke the operation of this corner case. What we *really* want to happen is that all the predicates that would steer xfs_scrub_metadata() towards calling xrep_attempt() should function the same way that they do when repair is compiled in; and then xrep_attempt gets to return the fatal EOPNOTSUPP error code that causes the probe to fail.
Instead, commit 8336a64eb75cba (6.6) started the failwhale swimming by hoisting OFLAG checking logic into a helper whose non-repair stub always returns false, causing scrub to return "repair not needed" when in fact the repair is not supported. Prior to that commit, the oflag checking that was open-coded in scrub.c worked correctly.
Similarly, in commit 4bdfd7d15747b1 (6.8) we hoisted the IFLAG_REPAIR and ALREADY_FIXED logic into a helper whose non-repair stub always returns false, so we never enter the if test body that would have called xrep_attempt, let alone fail to decode the OFLAGs correctly.
The final insult (yes, we're doing The Naked Gun now) is commit 48a72f60861f79 (6.8), in which we hoisted the "are we going to try a repair?" predicate into yet another helper whose non-repair stub always returns false.
Fix xchk_probe to trigger xrep_probe if repair is enabled, or return EOPNOTSUPP directly if it is not. For all the other scrub types, we need to fix the header predicates so that the ->repair functions (which are all xrep_notsupported) get called to return EOPNOTSUPP. Commit 48a72 is tagged here because the scrub code prior to LTS 6.12 is incomplete and not worth patching.
Reported-by: David Flynn david.flynn@oracle.com
Cc: stable@vger.kernel.org # v6.8
Fixes: 8336a64eb75c ("xfs: don't complain about unfixed metadata when repairs were injected")
Signed-off-by: "Darrick J. Wong" djwong@kernel.org
Reviewed-by: Christoph Hellwig hch@lst.de
Signed-off-by: Carlos Maiolino cem@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/scrub/common.h |  5 -----
 fs/xfs/scrub/repair.h | 11 ++++++++++-
 fs/xfs/scrub/scrub.c  | 12 ++++++++++++
 3 files changed, 22 insertions(+), 6 deletions(-)

--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -179,7 +179,6 @@ static inline bool xchk_skip_xref(struct
 bool xchk_dir_looks_zapped(struct xfs_inode *dp);
 bool xchk_pptr_looks_zapped(struct xfs_inode *ip);
 
-#ifdef CONFIG_XFS_ONLINE_REPAIR
 /* Decide if a repair is required. */
 static inline bool xchk_needs_repair(const struct xfs_scrub_metadata *sm)
 {
@@ -199,10 +198,6 @@ static inline bool xchk_could_repair(con
 	return (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) &&
 		!(sc->flags & XREP_ALREADY_FIXED);
 }
-#else
-# define xchk_needs_repair(sc)		(false)
-# define xchk_could_repair(sc)		(false)
-#endif /* CONFIG_XFS_ONLINE_REPAIR */
 
 int xchk_metadata_inode_forks(struct xfs_scrub *sc);
 
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -163,7 +163,16 @@ bool xrep_buf_verify_struct(struct xfs_b
 #else
 
 #define xrep_ino_dqattach(sc)	(0)
-#define xrep_will_attempt(sc)	(false)
+
+/*
+ * When online repair is not built into the kernel, we still want to attempt
+ * the repair so that the stub xrep_attempt below will return EOPNOTSUPP.
+ */
+static inline bool xrep_will_attempt(const struct xfs_scrub *sc)
+{
+	return (sc->sm->sm_flags & XFS_SCRUB_IFLAG_FORCE_REBUILD) ||
+	       xchk_needs_repair(sc->sm);
+}
 
 static inline int
 xrep_attempt(
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -149,6 +149,18 @@ xchk_probe(
 	if (xchk_should_terminate(sc, &error))
 		return error;
 
+	/*
+	 * If the caller is probing to see if repair works but repair isn't
+	 * built into the kernel, return EOPNOTSUPP because that's the signal
+	 * that userspace expects.  If online repair is built in, set the
+	 * CORRUPT flag (without any of the usual tracing/logging) to force us
+	 * into xrep_probe.
+	 */
+	if (xchk_could_repair(sc)) {
+		if (!IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR))
+			return -EOPNOTSUPP;
+		sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+	}
 	return 0;
 }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Begunkov asml.silence@gmail.com
commit f4b78260fc678ccd7169f32dc9f3bfa3b93931c7 upstream.
import_iovec() says that it should always be fine to kfree the iovec returned in @iovp regardless of the error code. __import_iovec_ubuf() never reallocates it and thus should clear the pointer even in cases when copy_iovec_*() fail.
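A sketch of the caller-side contract being fixed here (illustrative only; uvec and nr_segs are placeholders for the usual userspace-supplied arguments):

    struct iovec fast[UIO_FASTIOV];
    struct iovec *iov = fast;
    struct iov_iter iter;
    ssize_t ret;

    ret = import_iovec(ITER_SOURCE, uvec, nr_segs, UIO_FASTIOV, &iov, &iter);
    /* ... use iter on success ... */
    kfree(iov);     /* documented to be safe on success and error alike:
                     * *iovp must be NULL whenever nothing was allocated */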
Link: https://lkml.kernel.org/r/378ae26923ffc20fd5e41b4360d673bf47b1775b.173833246...
Fixes: 3b2deb0e46da ("iov_iter: import single vector iovecs as ITER_UBUF")
Signed-off-by: Pavel Begunkov asml.silence@gmail.com
Reviewed-by: Jens Axboe axboe@kernel.dk
Cc: Al Viro viro@zeniv.linux.org.uk
Cc: Christian Brauner brauner@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 lib/iov_iter.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1428,6 +1428,8 @@ static ssize_t __import_iovec_ubuf(int t
 	struct iovec *iov = *iovp;
 	ssize_t ret;
 
+	*iovp = NULL;
+
 	if (compat)
 		ret = copy_compat_iovec_from_user(iov, uvec, 1);
 	else
@@ -1438,7 +1440,6 @@ static ssize_t __import_iovec_ubuf(int t
 	ret = import_ubuf(type, iov->iov_base, iov->iov_len, i);
 	if (unlikely(ret))
 		return ret;
-	*iovp = NULL;
 	return i->count;
 }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
commit 654292a0b264e9b8c51b98394146218a21612aa1 upstream.
When the user sets a file or directory as read-only (e.g. ~S_IWUGO), the client will set the ATTR_READONLY attribute by sending an SMB2_SET_INFO request to the server in cifs_setattr_{,nounix}(), but cifsInodeInfo::cifsAttrs will be left unchanged as the client will only update the new file attributes in the next call to {smb311_posix,cifs}_get_inode_info() with the new metadata filled in @data parameter.
Commit a18280e7fdea ("smb: cilent: set reparse mount points as automounts") mistakenly removed the @data NULL check when calling is_inode_cache_good(), which broke the above case as the new ATTR_READONLY attribute would end up not being updated on files with a read lease.
Fix this by updating the inode whenever we have cached metadata in @data parameter.
Reported-by: Horst Reiterer horst.reiterer@fabasoft.com
Closes: https://lore.kernel.org/r/85a16504e09147a195ac0aac1c801280@fabasoft.com
Fixes: a18280e7fdea ("smb: cilent: set reparse mount points as automounts")
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.com
Signed-off-by: Steve French stfrench@microsoft.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/smb/client/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/smb/client/inode.c
+++ b/fs/smb/client/inode.c
@@ -1381,7 +1381,7 @@ int cifs_get_inode_info(struct inode **i
 	struct cifs_fattr fattr = {};
 	int rc;
 
-	if (is_inode_cache_good(*inode)) {
+	if (!data && is_inode_cache_good(*inode)) {
 		cifs_dbg(FYI, "No need to revalidate cached inode sizes\n");
 		return 0;
 	}
@@ -1480,7 +1480,7 @@ int smb311_posix_get_inode_info(struct i
 	struct cifs_fattr fattr = {};
 	int rc;
 
-	if (is_inode_cache_good(*inode)) {
+	if (!data && is_inode_cache_good(*inode)) {
 		cifs_dbg(FYI, "No need to revalidate cached inode sizes\n");
 		return 0;
 	}
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Haoxiang Li haoxiang_li2024@163.com
commit 878e7b11736e062514e58f3b445ff343e6705537 upstream.
Add check for the return value of nfp_app_ctrl_msg_alloc() in nfp_bpf_cmsg_alloc() to prevent null pointer dereference.
Fixes: ff3d43f7568c ("nfp: bpf: implement helpers for FW map ops")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li haoxiang_li2024@163.com
Link: https://patch.msgid.link/20250218030409.2425798-1-haoxiang_li2024@163.com
Signed-off-by: Paolo Abeni pabeni@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/netronome/nfp/bpf/cmsg.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
@@ -20,6 +20,8 @@ nfp_bpf_cmsg_alloc(struct nfp_app_bpf *b
 	struct sk_buff *skb;
 
 	skb = nfp_app_ctrl_msg_alloc(bpf->app, size, GFP_KERNEL);
+	if (!skb)
+		return NULL;
 	skb_put(skb, size);
 
 	return skb;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Joshua Washington joshwash@google.com
commit 415cadd505464d9a11ff5e0f6e0329c127849da5 upstream.
Before this patch the NETDEV_XDP_ACT_NDO_XMIT XDP feature flag is set by default as part of driver initialization, and is never cleared. However, this flag differs from others in that it is used as an indicator for whether the driver is ready to perform the ndo_xdp_xmit operation as part of an XDP_REDIRECT. Kernel helpers xdp_features_(set|clear)_redirect_target exist to convey this meaning.
This patch ensures that the netdev is only reported as a redirect target when XDP queues exist to forward traffic.
Fixes: 39a7f4aa3e4a ("gve: Add XDP REDIRECT support for GQI-QPL format")
Cc: stable@vger.kernel.org
Reviewed-by: Praveen Kaligineedi pkaligineedi@google.com
Reviewed-by: Jeroen de Borst jeroendb@google.com
Signed-off-by: Joshua Washington joshwash@google.com
Link: https://patch.msgid.link/20250214224417.1237818-1-joshwash@google.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/google/gve/gve.h      | 10 ++++++++++
 drivers/net/ethernet/google/gve/gve_main.c |  6 +++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -1110,6 +1110,16 @@ static inline u32 gve_xdp_tx_start_queue
 	return gve_xdp_tx_queue_id(priv, 0);
 }
 
+static inline bool gve_supports_xdp_xmit(struct gve_priv *priv)
+{
+	switch (priv->queue_format) {
+	case GVE_GQI_QPL_FORMAT:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /* gqi napi handler defined in gve_main.c */
 int gve_napi_poll(struct napi_struct *napi, int budget);
 
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -1895,6 +1895,8 @@ static void gve_turndown(struct gve_priv
 	/* Stop tx queues */
 	netif_tx_disable(priv->dev);
 
+	xdp_features_clear_redirect_target(priv->dev);
+
 	gve_clear_napi_enabled(priv);
 	gve_clear_report_stats(priv);
 
@@ -1955,6 +1957,9 @@ static void gve_turnup(struct gve_priv *
 			napi_schedule(&block->napi);
 	}
 
+	if (priv->num_xdp_queues && gve_supports_xdp_xmit(priv))
+		xdp_features_set_redirect_target(priv->dev, false);
+
 	gve_set_napi_enabled(priv);
 }
 
@@ -2229,7 +2234,6 @@ static void gve_set_netdev_xdp_features(
 	if (priv->queue_format == GVE_GQI_QPL_FORMAT) {
 		xdp_features = NETDEV_XDP_ACT_BASIC;
 		xdp_features |= NETDEV_XDP_ACT_REDIRECT;
-		xdp_features |= NETDEV_XDP_ACT_NDO_XMIT;
 		xdp_features |= NETDEV_XDP_ACT_XSK_ZEROCOPY;
 	} else {
 		xdp_features = 0;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Ujfalusi peter.ujfalusi@linux.intel.com
commit d8d99c3b5c485f339864aeaa29f76269cc0ea975 upstream.
The nullity of sps->cstream should be checked similarly to how it is done in the sof_set_stream_data_offset() function. Assuming that it is not NULL when sps->stream is NULL is incorrect and can lead to a NULL pointer dereference.
Fixes: 090349a9feba ("ASoC: SOF: Add support for compress API for stream data/offset")
Cc: stable@vger.kernel.org
Reported-by: Curtis Malainey cujomalainey@chromium.org
Closes: https://github.com/thesofproject/linux/pull/5214
Signed-off-by: Peter Ujfalusi peter.ujfalusi@linux.intel.com
Reviewed-by: Daniel Baluta daniel.baluta@nxp.com
Reviewed-by: Ranjani Sridharan ranjani.sridharan@linux.intel.com
Reviewed-by: Bard Liao yung-chuan.liao@linux.intel.com
Reviewed-by: Curtis Malainey cujomalainey@chromium.org
Link: https://patch.msgid.link/20250205135232.19762-2-peter.ujfalusi@linux.intel.c...
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 sound/soc/sof/stream-ipc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/sound/soc/sof/stream-ipc.c
+++ b/sound/soc/sof/stream-ipc.c
@@ -43,7 +43,7 @@ int sof_ipc_msg_data(struct snd_sof_dev
 			return -ESTRPIPE;
 
 		posn_offset = stream->posn_offset;
-	} else {
+	} else if (sps->cstream) {
 		struct sof_compr_stream *sstream = sps->cstream->runtime->private_data;
 
@@ -51,6 +51,10 @@ int sof_ipc_msg_data(struct snd_sof_dev
 			return -ESTRPIPE;
 
 		posn_offset = sstream->posn_offset;
+
+	} else {
+		dev_err(sdev->dev, "%s: No stream opened\n", __func__);
+		return -EINVAL;
 	}
 
 	snd_sof_dsp_mailbox_read(sdev, posn_offset, p, sz);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikita Zhandarovich n.zhandarovich@fintech.ru
commit a8c9a453387640dbe45761970f41301a6985e7fa upstream.
If 'micfil->quality' received from micfil_quality_set() somehow ends up with an unexpected value, the switch() statement will fail to initialize the local variable qsel before regmap_update_bits() tries to use it.
While it is unlikely, play it safe and enable a default case that returns -EINVAL error.
Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE.
Fixes: bea1d61d5892 ("ASoC: fsl_micfil: rework quality setting")
Cc: stable@vger.kernel.org
Signed-off-by: Nikita Zhandarovich n.zhandarovich@fintech.ru
Link: https://patch.msgid.link/20250116142436.22389-1-n.zhandarovich@fintech.ru
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 sound/soc/fsl/fsl_micfil.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/sound/soc/fsl/fsl_micfil.c
+++ b/sound/soc/fsl/fsl_micfil.c
@@ -156,6 +156,8 @@ static int micfil_set_quality(struct fsl
 	case QUALITY_VLOW2:
 		qsel = MICFIL_QSEL_VLOW2_QUALITY;
 		break;
+	default:
+		return -EINVAL;
 	}
 
 	return regmap_update_bits(micfil->regmap, REG_MICFIL_CTRL2,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wentao Liang vulab@iscas.ac.cn
commit 822b7ec657e99b44b874e052d8540d8b54fe8569 upstream.
Check the return value of snd_ctl_rename_id() in snd_hda_create_dig_out_ctls(). Ensure that failures are properly handled.
[ Note: the error cannot happen practically because the only error condition in snd_ctl_rename_id() is the missing ID, but this is a rename, hence it must be present. But for the code consistency, it's safer to have always the proper return check -- tiwai ]
Fixes: 5c219a340850 ("ALSA: hda: Fix kctl->id initialization")
Cc: stable@vger.kernel.org # 6.4+
Signed-off-by: Wentao Liang vulab@iscas.ac.cn
Link: https://patch.msgid.link/20250213074543.1620-1-vulab@iscas.ac.cn
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 sound/pci/hda/hda_codec.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/sound/pci/hda/hda_codec.c
+++ b/sound/pci/hda/hda_codec.c
@@ -2470,7 +2470,9 @@ int snd_hda_create_dig_out_ctls(struct h
 				break;
 			id = kctl->id;
 			id.index = spdif_index;
-			snd_ctl_rename_id(codec->card, &kctl->id, &id);
+			err = snd_ctl_rename_id(codec->card, &kctl->id, &id);
+			if (err < 0)
+				return err;
 		}
 		bus->primary_dig_out_type = HDA_PCM_TYPE_HDMI;
 	}
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Veness john-linux@pelago.org.uk
commit 6d1f86610f23b0bc334d6506a186f21a98f51392 upstream.
Allows the LED on the dedicated mute button on the HP ProBook 450 G4 laptop to change colour correctly.
Signed-off-by: John Veness john-linux@pelago.org.uk
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/2fb55d48-6991-4a42-b591-4c78f2fad8d7@pelago.org.uk
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 sound/pci/hda/patch_conexant.c | 1 +
 1 file changed, 1 insertion(+)

--- a/sound/pci/hda/patch_conexant.c
+++ b/sound/pci/hda/patch_conexant.c
@@ -1080,6 +1080,7 @@ static const struct hda_quirk cxt5066_fi
 	SND_PCI_QUIRK(0x103c, 0x814f, "HP ZBook 15u G3", CXT_FIXUP_MUTE_LED_GPIO),
 	SND_PCI_QUIRK(0x103c, 0x8174, "HP Spectre x360", CXT_FIXUP_HP_SPECTRE),
 	SND_PCI_QUIRK(0x103c, 0x822e, "HP ProBook 440 G4", CXT_FIXUP_MUTE_LED_GPIO),
+	SND_PCI_QUIRK(0x103c, 0x8231, "HP ProBook 450 G4", CXT_FIXUP_MUTE_LED_GPIO),
 	SND_PCI_QUIRK(0x103c, 0x828c, "HP EliteBook 840 G4", CXT_FIXUP_HP_DOCK),
 	SND_PCI_QUIRK(0x103c, 0x8299, "HP 800 G3 SFF", CXT_FIXUP_HP_MIC_NO_PRESENCE),
 	SND_PCI_QUIRK(0x103c, 0x829a, "HP 800 G3 DM", CXT_FIXUP_HP_MIC_NO_PRESENCE),
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Ujfalusi peter.ujfalusi@linux.intel.com
commit 46c7b901e2a03536df5a3cb40b3b26e2be505df6 upstream.
The spcm->stream[substream->stream].substream is set during open and was left untouched afterwards. After the first PCM stream it will never be NULL, yet we have code that treats a NULL substream pointer as the indication that the stream is not active. The same has already been done for the compressed cstream pointer; this change corrects the handling of PCM streams.
Fixes: 090349a9feba ("ASoC: SOF: Add support for compress API for stream data/offset")
Cc: stable@vger.kernel.org
Reported-by: Curtis Malainey cujomalainey@chromium.org
Closes: https://github.com/thesofproject/linux/pull/5214
Signed-off-by: Peter Ujfalusi peter.ujfalusi@linux.intel.com
Reviewed-by: Daniel Baluta daniel.baluta@nxp.com
Reviewed-by: Ranjani Sridharan ranjani.sridharan@linux.intel.com
Reviewed-by: Bard Liao yung-chuan.liao@linux.intel.com
Reviewed-by: Curtis Malainey cujomalainey@chromium.org
Link: https://patch.msgid.link/20250205135232.19762-3-peter.ujfalusi@linux.intel.c...
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 sound/soc/sof/pcm.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/sound/soc/sof/pcm.c
+++ b/sound/soc/sof/pcm.c
@@ -511,6 +511,8 @@ static int sof_pcm_close(struct snd_soc_
 	 */
 	}
 
+	spcm->stream[substream->stream].substream = NULL;
+
 	return 0;
 }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Brauner brauner@kernel.org
commit 56d5f3eba3f5de0efdd556de4ef381e109b973a9 upstream.
In [1] it was reported that the acct(2) system call can be used to trigger a NULL deref in cases where it is set to write to a file that triggers an internal lookup. This can e.g., happen when pointing acct(2) to /sys/power/resume. At the point where the write to this file happens, the calling task has already exited and called exit_fs(). A lookup will thus trigger a NULL deref when accessing current->fs.
Reorganize the code so that the final write happens from the workqueue but with the caller's credentials. This preserves the (strange) permission model and has almost no regression risk.
This API should cease to exist though.
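The shape of the fix, roughly (a sketch of the two halves, reduced from the diff below):

    /* exiting task, in acct_pin_kill() under acct->lock: snapshot
     * everything that needs current's context */
    fill_ac(acct);
    schedule_work(&acct->work);

    /* workqueue, in close_work(): do the write with the credentials
     * of whoever enabled accounting */
    acct_write_process(acct);   /* override_creds(file->f_cred) + __kernel_write() */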
Link: https://lore.kernel.org/r/20250127091811.3183623-1-quzicheng@huawei.com [1]
Link: https://lore.kernel.org/r/20250211-work-acct-v1-1-1c16aecab8b3@kernel.org
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Zicheng Qu quzicheng@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner brauner@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/acct.c | 120 +++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 70 insertions(+), 50 deletions(-)

--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -103,48 +103,50 @@ struct bsd_acct_struct {
 	atomic_long_t		count;
 	struct rcu_head		rcu;
 	struct mutex		lock;
-	int			active;
+	bool			active;
+	bool			check_space;
 	unsigned long		needcheck;
 	struct file		*file;
 	struct pid_namespace	*ns;
 	struct work_struct	work;
 	struct completion	done;
+	acct_t			ac;
 };
 
-static void do_acct_process(struct bsd_acct_struct *acct);
+static void fill_ac(struct bsd_acct_struct *acct);
+static void acct_write_process(struct bsd_acct_struct *acct);
 
 /*
  * Check the amount of free space and suspend/resume accordingly.
  */
-static int check_free_space(struct bsd_acct_struct *acct)
+static bool check_free_space(struct bsd_acct_struct *acct)
 {
 	struct kstatfs sbuf;
 
-	if (time_is_after_jiffies(acct->needcheck))
-		goto out;
+	if (!acct->check_space)
+		return acct->active;
 
 	/* May block */
 	if (vfs_statfs(&acct->file->f_path, &sbuf))
-		goto out;
+		return acct->active;
 
 	if (acct->active) {
 		u64 suspend = sbuf.f_blocks * SUSPEND;
 		do_div(suspend, 100);
 		if (sbuf.f_bavail <= suspend) {
-			acct->active = 0;
+			acct->active = false;
 			pr_info("Process accounting paused\n");
 		}
 	} else {
 		u64 resume = sbuf.f_blocks * RESUME;
 		do_div(resume, 100);
 		if (sbuf.f_bavail >= resume) {
-			acct->active = 1;
+			acct->active = true;
 			pr_info("Process accounting resumed\n");
 		}
 	}
 
 	acct->needcheck = jiffies + ACCT_TIMEOUT*HZ;
-out:
 	return acct->active;
 }
 
@@ -189,7 +191,11 @@ static void acct_pin_kill(struct fs_pin
 {
 	struct bsd_acct_struct *acct = to_acct(pin);
 	mutex_lock(&acct->lock);
-	do_acct_process(acct);
+	/*
+	 * Fill the accounting struct with the exiting task's info
+	 * before punting to the workqueue.
+	 */
+	fill_ac(acct);
 	schedule_work(&acct->work);
 	wait_for_completion(&acct->done);
 	cmpxchg(&acct->ns->bacct, pin, NULL);
@@ -202,6 +208,9 @@ static void close_work(struct work_struc
 {
 	struct bsd_acct_struct *acct = container_of(work, struct bsd_acct_struct, work);
 	struct file *file = acct->file;
+
+	/* We were fired by acct_pin_kill() which holds acct->lock. */
+	acct_write_process(acct);
 	if (file->f_op->flush)
 		file->f_op->flush(file, NULL);
 	__fput_sync(file);
@@ -430,13 +439,27 @@ static u32 encode_float(u64 value)
  * do_exit() or when switching to a different output file.
  */
 
-static void fill_ac(acct_t *ac)
+static void fill_ac(struct bsd_acct_struct *acct)
 {
 	struct pacct_struct *pacct = &current->signal->pacct;
+	struct file *file = acct->file;
+	acct_t *ac = &acct->ac;
 	u64 elapsed, run_time;
 	time64_t btime;
 	struct tty_struct *tty;
 
+	lockdep_assert_held(&acct->lock);
+
+	if (time_is_after_jiffies(acct->needcheck)) {
+		acct->check_space = false;
+
+		/* Don't fill in @ac if nothing will be written. */
+		if (!acct->active)
+			return;
+	} else {
+		acct->check_space = true;
+	}
+
 	/*
 	 * Fill the accounting struct with the needed info as recorded
 	 * by the different kernel functions.
@@ -484,64 +507,61 @@ static void fill_ac(acct_t *ac)
 	ac->ac_majflt = encode_comp_t(pacct->ac_majflt);
 	ac->ac_exitcode = pacct->ac_exitcode;
 	spin_unlock_irq(&current->sighand->siglock);
-}
 
-/*
- * do_acct_process does all actual work. Caller holds the reference to file.
- */
-static void do_acct_process(struct bsd_acct_struct *acct)
-{
-	acct_t ac;
-	unsigned long flim;
-	const struct cred *orig_cred;
-	struct file *file = acct->file;
-
-	/*
-	 * Accounting records are not subject to resource limits.
-	 */
-	flim = rlimit(RLIMIT_FSIZE);
-	current->signal->rlim[RLIMIT_FSIZE].rlim_cur = RLIM_INFINITY;
-	/* Perform file operations on behalf of whoever enabled accounting */
-	orig_cred = override_creds(file->f_cred);
-
-	/*
-	 * First check to see if there is enough free_space to continue
-	 * the process accounting system.
-	 */
-	if (!check_free_space(acct))
-		goto out;
-
-	fill_ac(&ac);
 	/* we really need to bite the bullet and change layout */
-	ac.ac_uid = from_kuid_munged(file->f_cred->user_ns, orig_cred->uid);
-	ac.ac_gid = from_kgid_munged(file->f_cred->user_ns, orig_cred->gid);
+	ac->ac_uid = from_kuid_munged(file->f_cred->user_ns, current_uid());
+	ac->ac_gid = from_kgid_munged(file->f_cred->user_ns, current_gid());
 #if ACCT_VERSION == 1 || ACCT_VERSION == 2
 	/* backward-compatible 16 bit fields */
-	ac.ac_uid16 = ac.ac_uid;
-	ac.ac_gid16 = ac.ac_gid;
+	ac->ac_uid16 = ac->ac_uid;
+	ac->ac_gid16 = ac->ac_gid;
 #elif ACCT_VERSION == 3
 	{
 		struct pid_namespace *ns = acct->ns;
 
-		ac.ac_pid = task_tgid_nr_ns(current, ns);
+		ac->ac_pid = task_tgid_nr_ns(current, ns);
 		rcu_read_lock();
-		ac.ac_ppid = task_tgid_nr_ns(rcu_dereference(current->real_parent),
-					     ns);
+		ac->ac_ppid = task_tgid_nr_ns(rcu_dereference(current->real_parent), ns);
 		rcu_read_unlock();
 	}
 #endif
+}
+
+static void acct_write_process(struct bsd_acct_struct *acct)
+{
+	struct file *file = acct->file;
+	const struct cred *cred;
+	acct_t *ac = &acct->ac;
+
+	/* Perform file operations on behalf of whoever enabled accounting */
+	cred = override_creds(file->f_cred);
+
 	/*
-	 * Get freeze protection. If the fs is frozen, just skip the write
-	 * as we could deadlock the system otherwise.
+	 * First check to see if there is enough free_space to continue
+	 * the process accounting system. Then get freeze protection. If
+	 * the fs is frozen, just skip the write as we could deadlock
+	 * the system otherwise.
	 */
-	if (file_start_write_trylock(file)) {
+	if (check_free_space(acct) && file_start_write_trylock(file)) {
 		/* it's been opened O_APPEND, so position is irrelevant */
 		loff_t pos = 0;
-		__kernel_write(file, &ac, sizeof(acct_t), &pos);
+		__kernel_write(file, ac, sizeof(acct_t), &pos);
 		file_end_write(file);
 	}
-out:
+
+	revert_creds(cred);
+}
+
+static void do_acct_process(struct bsd_acct_struct *acct)
+{
+	unsigned long flim;
+
+	/* Accounting records are not subject to resource limits. */
+	flim = rlimit(RLIMIT_FSIZE);
+	current->signal->rlim[RLIMIT_FSIZE].rlim_cur = RLIM_INFINITY;
+	fill_ac(acct);
+	acct_write_process(acct);
 	current->signal->rlim[RLIMIT_FSIZE].rlim_cur = flim;
-	revert_creds(orig_cred);
 }
 
 /**
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Brauner brauner@kernel.org
commit 890ed45bde808c422c3c27d3285fc45affa0f930 upstream.
There's no point in allowing anything kernel internal nor procfs or sysfs.
Link: https://lore.kernel.org/r/20250127091811.3183623-1-quzicheng@huawei.com
Link: https://lore.kernel.org/r/20250211-work-acct-v1-2-1c16aecab8b3@kernel.org
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Amir Goldstein amir73il@gmail.com
Reported-by: Zicheng Qu quzicheng@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner brauner@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/acct.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -243,6 +243,20 @@ static int acct_on(struct filename *path
 		return -EACCES;
 	}
 
+	/* Exclude kernel kernel internal filesystems. */
+	if (file_inode(file)->i_sb->s_flags & (SB_NOUSER | SB_KERNMOUNT)) {
+		kfree(acct);
+		filp_close(file, NULL);
+		return -EINVAL;
+	}
+
+	/* Exclude procfs and sysfs. */
+	if (file_inode(file)->i_sb->s_iflags & SB_I_USERNS_VISIBLE) {
+		kfree(acct);
+		filp_close(file, NULL);
+		return -EINVAL;
+	}
+
 	if (!(file->f_mode & FMODE_CAN_WRITE)) {
 		kfree(acct);
 		filp_close(file, NULL);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ricardo Cañuelo Navarro rcn@igalia.com
commit 2ede647a6fde3e54a6bfda7cf01c716649655900 upstream.
Add a sanity check to madvise_dontneed_free() to address a corner case in madvise where a race condition causes the current vma being processed to be backed by a different page size.
During a madvise(MADV_DONTNEED) call on a memory region registered with a userfaultfd, there's a period of time where the process mm lock is temporarily released in order to send a UFFD_EVENT_REMOVE and let userspace handle the event. During this time, the vma covering the current address range may change due to an explicit mmap done concurrently by another thread.
If, after that change, the memory region, which was originally backed by 4KB pages, is now backed by hugepages, the end address is rounded down to a hugepage boundary to avoid data loss (see "Fixes" below). This rounding may cause the end address to be truncated to the same address as the start. For instance (hypothetical addresses), an end address of 0x201000 in a vma now backed by 2MB hugepages would be rounded down to 0x200000, which may be exactly where the range started.
Make this corner case follow the same semantics as in other similar cases where the requested region has zero length (ie. return 0).
This will make madvise_walk_vmas() continue to the next vma in the range (this time holding the process mm lock) which, due to the prev pointer becoming stale because of the vma change, will be the same hugepage-backed vma that was just checked before. The next time madvise_dontneed_free() runs for this vma, if the start address isn't aligned to a hugepage boundary, it'll return -EINVAL, which is also in line with the madvise api.
From a userspace perspective, madvise() will return EINVAL because the start address isn't aligned according to the new vma alignment requirements (hugepage), even though it was correctly page-aligned when the call was issued.
Link: https://lkml.kernel.org/r/20250203075206.1452208-1-rcn@igalia.com
Fixes: 8ebe0a5eaaeb ("mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs")
Signed-off-by: Ricardo Cañuelo Navarro rcn@igalia.com
Reviewed-by: Oscar Salvador osalvador@suse.de
Cc: Florent Revest revest@google.com
Cc: Rik van Riel riel@surriel.com
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 mm/madvise.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -920,7 +920,16 @@ static long madvise_dontneed_free(struct
 			 */
 			end = vma->vm_end;
 		}
-		VM_WARN_ON(start >= end);
+		/*
+		 * If the memory region between start and end was
+		 * originally backed by 4kB pages and then remapped to
+		 * be backed by hugepages while mmap_lock was dropped,
+		 * the adjustment for hugetlb vma above may have rounded
+		 * end down to the start address.
+		 */
+		if (start == end)
+			return 0;
+		VM_WARN_ON(start > end);
 	}
 
 	if (behavior == MADV_DONTNEED || behavior == MADV_DONTNEED_LOCKED)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amit Kumar Mahapatra amit.kumar-mahapatra@amd.com
commit 539bd20352832b9244238a055eb169ccf1c41ff6 upstream.
'commit 18bcb4aa54ea ("mtd: spi-nor: sst: Factor out common write operation to `sst_nor_write_data()`")' introduced a bug where only one byte of data is written, regardless of the number of bytes passed to sst_nor_write_data(), causing a kernel crash during the write operation. Ensure the correct number of bytes are written as passed to sst_nor_write_data().
Call trace:
[   57.400180] ------------[ cut here ]------------
[   57.404842] While writing 2 byte written 1 bytes
[   57.409493] WARNING: CPU: 0 PID: 737 at drivers/mtd/spi-nor/sst.c:187 sst_nor_write_data+0x6c/0x74
[   57.418464] Modules linked in:
[   57.421517] CPU: 0 UID: 0 PID: 737 Comm: mtd_debug Not tainted 6.12.0-g5ad04afd91f9 #30
[   57.429517] Hardware name: Xilinx Versal A2197 Processor board revA - x-prc-02 revA (DT)
[   57.437600] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   57.444557] pc : sst_nor_write_data+0x6c/0x74
[   57.448911] lr : sst_nor_write_data+0x6c/0x74
[   57.453264] sp : ffff80008232bb40
[   57.456570] x29: ffff80008232bb40 x28: 0000000000010000 x27: 0000000000000001
[   57.463708] x26: 000000000000ffff x25: 0000000000000000 x24: 0000000000000000
[   57.470843] x23: 0000000000010000 x22: ffff80008232bbf0 x21: ffff000816230000
[   57.477978] x20: ffff0008056c0080 x19: 0000000000000002 x18: 0000000000000006
[   57.485112] x17: 0000000000000000 x16: 0000000000000000 x15: ffff80008232b580
[   57.492246] x14: 0000000000000000 x13: ffff8000816d1530 x12: 00000000000004a4
[   57.499380] x11: 000000000000018c x10: ffff8000816fd530 x9 : ffff8000816d1530
[   57.506515] x8 : 00000000fffff7ff x7 : ffff8000816fd530 x6 : 0000000000000001
[   57.513649] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[   57.520782] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0008049b0000
[   57.527916] Call trace:
[   57.530354]  sst_nor_write_data+0x6c/0x74
[   57.534361]  sst_nor_write+0xb4/0x18c
[   57.538019]  mtd_write_oob_std+0x7c/0x88
[   57.541941]  mtd_write_oob+0x70/0xbc
[   57.545511]  mtd_write+0x68/0xa8
[   57.548733]  mtdchar_write+0x10c/0x290
[   57.552477]  vfs_write+0xb4/0x3a8
[   57.555791]  ksys_write+0x74/0x10c
[   57.559189]  __arm64_sys_write+0x1c/0x28
[   57.563109]  invoke_syscall+0x54/0x11c
[   57.566856]  el0_svc_common.constprop.0+0xc0/0xe0
[   57.571557]  do_el0_svc+0x1c/0x28
[   57.574868]  el0_svc+0x30/0xcc
[   57.577921]  el0t_64_sync_handler+0x120/0x12c
[   57.582276]  el0t_64_sync+0x190/0x194
[   57.585933] ---[ end trace 0000000000000000 ]---
Cc: stable@vger.kernel.org
Fixes: 18bcb4aa54ea ("mtd: spi-nor: sst: Factor out common write operation to `sst_nor_write_data()`")
Signed-off-by: Amit Kumar Mahapatra amit.kumar-mahapatra@amd.com
Reviewed-by: Pratyush Yadav pratyush@kernel.org
Reviewed-by: Tudor Ambarus tudor.ambarus@linaro.org
Reviewed-by: Bence Csókás csokas.bence@prolan.hu
[pratyush@kernel.org: add Cc stable tag]
Signed-off-by: Pratyush Yadav pratyush@kernel.org
Link: https://lore.kernel.org/r/20250213054546.2078121-1-amit.kumar-mahapatra@amd....
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/mtd/spi-nor/sst.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mtd/spi-nor/sst.c b/drivers/mtd/spi-nor/sst.c
index b5ad7118c49a..175211fe6a5e 100644
--- a/drivers/mtd/spi-nor/sst.c
+++ b/drivers/mtd/spi-nor/sst.c
@@ -174,7 +174,7 @@ static int sst_nor_write_data(struct spi_nor *nor, loff_t to, size_t len,
 	int ret;
 
 	nor->program_opcode = op;
-	ret = spi_nor_write_data(nor, to, 1, buf);
+	ret = spi_nor_write_data(nor, to, len, buf);
 	if (ret < 0)
 		return ret;
 	WARN(ret != len, "While writing %zu byte written %i bytes\n", len, ret);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Niravkumar L Rabara niravkumar.l.rabara@intel.com
commit 2b9df00cded911e2ca2cfae5c45082166b24f8aa upstream.
Replace dma_request_channel() with dma_request_chan_by_mask() and use helper functions to return proper error code instead of fixed -EBUSY.
Fixes: ec4ba01e894d ("mtd: rawnand: Add new Cadence NAND driver to MTD subsystem")
Cc: stable@vger.kernel.org
Signed-off-by: Niravkumar L Rabara niravkumar.l.rabara@intel.com
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/mtd/nand/raw/cadence-nand-controller.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

--- a/drivers/mtd/nand/raw/cadence-nand-controller.c
+++ b/drivers/mtd/nand/raw/cadence-nand-controller.c
@@ -2904,11 +2904,10 @@ static int cadence_nand_init(struct cdns
 	dma_cap_set(DMA_MEMCPY, mask);
 
 	if (cdns_ctrl->caps1->has_dma) {
-		cdns_ctrl->dmac = dma_request_channel(mask, NULL, NULL);
-		if (!cdns_ctrl->dmac) {
-			dev_err(cdns_ctrl->dev,
-				"Unable to get a DMA channel\n");
-			ret = -EBUSY;
+		cdns_ctrl->dmac = dma_request_chan_by_mask(&mask);
+		if (IS_ERR(cdns_ctrl->dmac)) {
+			ret = dev_err_probe(cdns_ctrl->dev, PTR_ERR(cdns_ctrl->dmac),
+					    "%d: Failed to get a DMA channel\n", ret);
 			goto disable_irq;
 		}
 	}
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Niravkumar L Rabara niravkumar.l.rabara@intel.com
commit d76d22b5096c5b05208fd982b153b3f182350b19 upstream.
Remap the slave DMA I/O resources to enhance driver portability. Using a physical address causes DMA translation failure when the ARM SMMU is enabled.
Fixes: ec4ba01e894d ("mtd: rawnand: Add new Cadence NAND driver to MTD subsystem")
Cc: stable@vger.kernel.org
Signed-off-by: Niravkumar L Rabara niravkumar.l.rabara@intel.com
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/mtd/nand/raw/cadence-nand-controller.c | 29 +++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

--- a/drivers/mtd/nand/raw/cadence-nand-controller.c
+++ b/drivers/mtd/nand/raw/cadence-nand-controller.c
@@ -471,6 +471,8 @@ struct cdns_nand_ctrl {
 	struct {
 		void __iomem *virt;
 		dma_addr_t dma;
+		dma_addr_t iova_dma;
+		u32 size;
 	} io;
 
 	int irq;
@@ -1835,11 +1837,11 @@ static int cadence_nand_slave_dma_transf
 	}
 
 	if (dir == DMA_FROM_DEVICE) {
-		src_dma = cdns_ctrl->io.dma;
+		src_dma = cdns_ctrl->io.iova_dma;
 		dst_dma = buf_dma;
 	} else {
 		src_dma = buf_dma;
-		dst_dma = cdns_ctrl->io.dma;
+		dst_dma = cdns_ctrl->io.iova_dma;
 	}
 
 	tx = dmaengine_prep_dma_memcpy(cdns_ctrl->dmac, dst_dma, src_dma, len,
@@ -2869,6 +2871,7 @@ cadence_nand_irq_cleanup(int irqnum, str
 static int cadence_nand_init(struct cdns_nand_ctrl *cdns_ctrl)
 {
 	dma_cap_mask_t mask;
+	struct dma_device *dma_dev = cdns_ctrl->dmac->device;
 	int ret;
 
 	cdns_ctrl->cdma_desc = dma_alloc_coherent(cdns_ctrl->dev,
@@ -2912,6 +2915,16 @@ static int cadence_nand_init(struct cdns
 		}
 	}
 
+	cdns_ctrl->io.iova_dma = dma_map_resource(dma_dev->dev, cdns_ctrl->io.dma,
+						  cdns_ctrl->io.size,
+						  DMA_BIDIRECTIONAL, 0);
+
+	ret = dma_mapping_error(dma_dev->dev, cdns_ctrl->io.iova_dma);
+	if (ret) {
+		dev_err(cdns_ctrl->dev, "Failed to map I/O resource to DMA\n");
+		goto dma_release_chnl;
+	}
+
 	nand_controller_init(&cdns_ctrl->controller);
 	INIT_LIST_HEAD(&cdns_ctrl->chips);
 
@@ -2922,18 +2935,22 @@ static int cadence_nand_init(struct cdns
 	if (ret) {
 		dev_err(cdns_ctrl->dev,
 			"Failed to register MTD: %d\n", ret);
-		goto dma_release_chnl;
+		goto unmap_dma_resource;
 	}
 
 	kfree(cdns_ctrl->buf);
 	cdns_ctrl->buf = kzalloc(cdns_ctrl->buf_size, GFP_KERNEL);
 	if (!cdns_ctrl->buf) {
 		ret = -ENOMEM;
-		goto dma_release_chnl;
+		goto unmap_dma_resource;
 	}
 
 	return 0;
 
+unmap_dma_resource:
+	dma_unmap_resource(dma_dev->dev, cdns_ctrl->io.iova_dma,
+			   cdns_ctrl->io.size, DMA_BIDIRECTIONAL, 0);
+
 dma_release_chnl:
 	if (cdns_ctrl->dmac)
 		dma_release_channel(cdns_ctrl->dmac);
@@ -2955,6 +2972,8 @@ free_buf_desc:
 static void cadence_nand_remove(struct cdns_nand_ctrl *cdns_ctrl)
 {
 	cadence_nand_chips_cleanup(cdns_ctrl);
+	dma_unmap_resource(cdns_ctrl->dmac->device->dev, cdns_ctrl->io.iova_dma,
+			   cdns_ctrl->io.size, DMA_BIDIRECTIONAL, 0);
 	cadence_nand_irq_cleanup(cdns_ctrl->irq, cdns_ctrl);
 	kfree(cdns_ctrl->buf);
 	dma_free_coherent(cdns_ctrl->dev, sizeof(struct cadence_nand_cdma_desc),
@@ -3019,7 +3038,9 @@ static int cadence_nand_dt_probe(struct
 	cdns_ctrl->io.virt = devm_platform_get_and_ioremap_resource(ofdev, 1, &res);
 	if (IS_ERR(cdns_ctrl->io.virt))
 		return PTR_ERR(cdns_ctrl->io.virt);
+
 	cdns_ctrl->io.dma = res->start;
+	cdns_ctrl->io.size = resource_size(res);
 
 	dt->clk = devm_clk_get(cdns_ctrl->dev, "nf_clk");
 	if (IS_ERR(dt->clk))
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Niravkumar L Rabara niravkumar.l.rabara@intel.com
commit f37d135b42cb484bdecee93f56b9f483214ede78 upstream.
dma_map_single() is called on the physical/bus device (the DMA engine), but dma_unmap_single() was called on the framework device (the NAND controller), which is incorrect. Fix dma_unmap_single() to use the correct physical/bus device.
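The invariant being restored, as a minimal sketch (error handling reduced; -EIO is illustrative): map and unmap must use the same struct device, because that device's dma_ops and IOMMU mappings created the handle.

    struct device *dev = dma_dev->dev;   /* the DMA engine's device, not the controller's */

    buf_dma = dma_map_single(dev, buf, len, dir);
    if (dma_mapping_error(dev, buf_dma))
            return -EIO;
    /* ... perform the transfer ... */
    dma_unmap_single(dev, buf_dma, len, dir);   /* same device as the map */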
Fixes: ec4ba01e894d ("mtd: rawnand: Add new Cadence NAND driver to MTD subsystem")
Cc: stable@vger.kernel.org
Signed-off-by: Niravkumar L Rabara niravkumar.l.rabara@intel.com
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/mtd/nand/raw/cadence-nand-controller.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/mtd/nand/raw/cadence-nand-controller.c
+++ b/drivers/mtd/nand/raw/cadence-nand-controller.c
@@ -1863,12 +1863,12 @@ static int cadence_nand_slave_dma_transf
 	dma_async_issue_pending(cdns_ctrl->dmac);
 	wait_for_completion(&finished);
 
-	dma_unmap_single(cdns_ctrl->dev, buf_dma, len, dir);
+	dma_unmap_single(dma_dev->dev, buf_dma, len, dir);
 
 	return 0;
 
 err_unmap:
-	dma_unmap_single(cdns_ctrl->dev, buf_dma, len, dir);
+	dma_unmap_single(dma_dev->dev, buf_dma, len, dir);
 
 err:
 	dev_dbg(cdns_ctrl->dev, "Fall back to CPU I/O\n");
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kan Liang kan.liang@linux.intel.com
commit 782cffeec9ad96daa64ffb2d527b2a052fb02552 upstream.
According to the latest event list, update the event constraint tables for Lion Cove core.
The general rule (the event codes < 0x90 are restricted to counters 0-3.) has been removed. There is no restriction for most of the performance monitoring events.
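For reading the constraint tables in the diff below: the second argument of the constraint macros is a bitmask of allowed counters (bit n set means counter n is allowed). A hedged illustration, not part of the patch:

    /* sketch: how the constraint macros in the table decode */
    static struct event_constraint example[] = {
            INTEL_EVENT_CONSTRAINT(0x2e, 0x3ff),   /* event 0x2e: any of counters 0-9 */
            INTEL_UEVENT_CONSTRAINT(0x0148, 0x4),  /* event 0x48, umask 0x01: counter 2 only */
            EVENT_CONSTRAINT_END
    };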
Fixes: a932aa0e868f ("perf/x86: Add Lunar Lake and Arrow Lake support")
Reported-by: Amiri Khalil amiri.khalil@intel.com
Signed-off-by: Kan Liang kan.liang@linux.intel.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20250219141005.2446823-1-kan.liang@linux.intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/events/intel/core.c | 20 +++++++-------------
 arch/x86/events/intel/ds.c   |  2 +-
 2 files changed, 8 insertions(+), 14 deletions(-)

--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -397,34 +397,28 @@ static struct event_constraint intel_lnc
 	METRIC_EVENT_CONSTRAINT(INTEL_TD_METRIC_FETCH_LAT, 6),
 	METRIC_EVENT_CONSTRAINT(INTEL_TD_METRIC_MEM_BOUND, 7),
 
+	INTEL_EVENT_CONSTRAINT(0x20, 0xf),
+
+	INTEL_UEVENT_CONSTRAINT(0x012a, 0xf),
+	INTEL_UEVENT_CONSTRAINT(0x012b, 0xf),
 	INTEL_UEVENT_CONSTRAINT(0x0148, 0x4),
 	INTEL_UEVENT_CONSTRAINT(0x0175, 0x4),
 
 	INTEL_EVENT_CONSTRAINT(0x2e, 0x3ff),
 	INTEL_EVENT_CONSTRAINT(0x3c, 0x3ff),
-	/*
-	 * Generally event codes < 0x90 are restricted to counters 0-3.
-	 * The 0x2E and 0x3C are exception, which has no restriction.
-	 */
-	INTEL_EVENT_CONSTRAINT_RANGE(0x01, 0x8f, 0xf),
 
-	INTEL_UEVENT_CONSTRAINT(0x01a3, 0xf),
-	INTEL_UEVENT_CONSTRAINT(0x02a3, 0xf),
 	INTEL_UEVENT_CONSTRAINT(0x08a3, 0x4),
 	INTEL_UEVENT_CONSTRAINT(0x0ca3, 0x4),
 	INTEL_UEVENT_CONSTRAINT(0x04a4, 0x1),
 	INTEL_UEVENT_CONSTRAINT(0x08a4, 0x1),
 	INTEL_UEVENT_CONSTRAINT(0x10a4, 0x1),
 	INTEL_UEVENT_CONSTRAINT(0x01b1, 0x8),
+	INTEL_UEVENT_CONSTRAINT(0x01cd, 0x3fc),
 	INTEL_UEVENT_CONSTRAINT(0x02cd, 0x3),
-	INTEL_EVENT_CONSTRAINT(0xce, 0x1),
 
 	INTEL_EVENT_CONSTRAINT_RANGE(0xd0, 0xdf, 0xf),
-	/*
-	 * Generally event codes >= 0x90 are likely to have no restrictions.
-	 * The exception are defined as above.
-	 */
-	INTEL_EVENT_CONSTRAINT_RANGE(0x90, 0xfe, 0x3ff),
+
+	INTEL_UEVENT_CONSTRAINT(0x00e0, 0xf),
 
 	EVENT_CONSTRAINT_END
 };
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1178,7 +1178,7 @@ struct event_constraint intel_lnc_pebs_e
 	INTEL_FLAGS_UEVENT_CONSTRAINT(0x100, 0x100000000ULL),	/* INST_RETIRED.PREC_DIST */
 	INTEL_FLAGS_UEVENT_CONSTRAINT(0x0400, 0x800000000ULL),
 
-	INTEL_HYBRID_LDLAT_CONSTRAINT(0x1cd, 0x3ff),
+	INTEL_HYBRID_LDLAT_CONSTRAINT(0x1cd, 0x3fc),
 	INTEL_HYBRID_STLAT_CONSTRAINT(0x2cd, 0x3),
 	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x11d0, 0xf),	/* MEM_INST_RETIRED.STLB_MISS_LOADS */
 	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x12d0, 0xf),	/* MEM_INST_RETIRED.STLB_MISS_STORES */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier maz@kernel.org
commit 4cb77793842a351b39a030f77caebace3524840e upstream.
Christoph reports that their rk3399 system dies since commit 773c05f417fa1 ("irqchip/gic-v3: Work around insecure GIC integrations").
It appears that some rk3399 have secure payloads, and that the firmware sets SCR_EL3.FIQ==1. Obviously, disabling security in that configuration leads to even more problems.
Revisit the workaround by:
- making it rk3399 specific

- checking whether Group-0 is available, which is a good proxy for SCR_EL3.FIQ being 0

- either apply the workaround if Group-0 is available, or disable pseudo-NMIs if not
Note that this doesn't mean that the secure side is able to receive interrupts, as all interrupts are made non-secure anyway.
Clearly, nobody ever tested secure interrupts on this platform.
Fixes: 773c05f417fa1 ("irqchip/gic-v3: Work around insecure GIC integrations")
Reported-by: Christoph Fritz chf.fritz@googlemail.com
Signed-off-by: Marc Zyngier maz@kernel.org
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Tested-by: Christoph Fritz chf.fritz@googlemail.com
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250215185241.3768218-1-maz@kernel.org
Closes: https://lore.kernel.org/r/b1266652fb64857246e8babdf268d0df8f0c36d9.camel@goo...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/irqchip/irq-gic-v3.c | 49 ++++++++++++++++++++++++++++--------
 1 file changed, 38 insertions(+), 11 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 76dce0aac246..270d7a4d85a6 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -44,6 +44,7 @@ static u8 dist_prio_nmi __ro_after_init = GICV3_PRIO_NMI;
 #define FLAGS_WORKAROUND_GICR_WAKER_MSM8996	(1ULL << 0)
 #define FLAGS_WORKAROUND_CAVIUM_ERRATUM_38539	(1ULL << 1)
 #define FLAGS_WORKAROUND_ASR_ERRATUM_8601001	(1ULL << 2)
+#define FLAGS_WORKAROUND_INSECURE		(1ULL << 3)
 
 #define GIC_IRQ_TYPE_PARTITION	(GIC_IRQ_TYPE_LPI + 1)
 
@@ -83,6 +84,8 @@ static DEFINE_STATIC_KEY_TRUE(supports_deactivate_key);
 #define GIC_LINE_NR	min(GICD_TYPER_SPIS(gic_data.rdists.gicd_typer), 1020U)
 #define GIC_ESPI_NR	GICD_TYPER_ESPIS(gic_data.rdists.gicd_typer)
 
+static bool nmi_support_forbidden;
+
 /*
  * There are 16 SGIs, though we only actually use 8 in Linux. The other 8 SGIs
  * are potentially stolen by the secure side. Some code, especially code dealing
@@ -163,21 +166,27 @@ static void __init gic_prio_init(void)
 {
 	bool ds;
 
+	cpus_have_group0 = gic_has_group0();
+
 	ds = gic_dist_security_disabled();
-	if (!ds) {
-		u32 val;
+	if ((gic_data.flags & FLAGS_WORKAROUND_INSECURE) && !ds) {
+		if (cpus_have_group0) {
+			u32 val;
 
-		val = readl_relaxed(gic_data.dist_base + GICD_CTLR);
-		val |= GICD_CTLR_DS;
-		writel_relaxed(val, gic_data.dist_base + GICD_CTLR);
+			val = readl_relaxed(gic_data.dist_base + GICD_CTLR);
+			val |= GICD_CTLR_DS;
+			writel_relaxed(val, gic_data.dist_base + GICD_CTLR);
 
-		ds = gic_dist_security_disabled();
-		if (ds)
-			pr_warn("Broken GIC integration, security disabled");
+			ds = gic_dist_security_disabled();
+			if (ds)
+				pr_warn("Broken GIC integration, security disabled\n");
+		} else {
+			pr_warn("Broken GIC integration, pNMI forbidden\n");
+			nmi_support_forbidden = true;
+		}
 	}
 
 	cpus_have_security_disabled = ds;
-	cpus_have_group0 = gic_has_group0();
 
 	/*
 	 * How priority values are used by the GIC depends on two things:
@@ -209,7 +218,7 @@ static void __init gic_prio_init(void)
 	 * be in the non-secure range, we program the non-secure values into
 	 * the distributor to match the PMR values we want.
 	 */
-	if (cpus_have_group0 & !cpus_have_security_disabled) {
+	if (cpus_have_group0 && !cpus_have_security_disabled) {
 		dist_prio_irq = __gicv3_prio_to_ns(dist_prio_irq);
 		dist_prio_nmi = __gicv3_prio_to_ns(dist_prio_nmi);
 	}
@@ -1922,6 +1931,18 @@ static bool gic_enable_quirk_arm64_2941627(void *data)
 	return true;
 }
 
+static bool gic_enable_quirk_rk3399(void *data)
+{
+	struct gic_chip_data *d = data;
+
+	if (of_machine_is_compatible("rockchip,rk3399")) {
+		d->flags |= FLAGS_WORKAROUND_INSECURE;
+		return true;
+	}
+
+	return false;
+}
+
 static bool rd_set_non_coherent(void *data)
 {
 	struct gic_chip_data *d = data;
@@ -1996,6 +2017,12 @@ static const struct gic_quirk gic_quirks[] = {
 		.property = "dma-noncoherent",
 		.init   = rd_set_non_coherent,
 	},
+	{
+		.desc	= "GICv3: Insecure RK3399 integration",
+		.iidr	= 0x0000043b,
+		.mask	= 0xff000fff,
+		.init	= gic_enable_quirk_rk3399,
+	},
 	{
 	}
 };
@@ -2004,7 +2031,7 @@ static void gic_enable_nmi_support(void)
 {
 	int i;
 
-	if (!gic_prio_masking_enabled())
+	if (!gic_prio_masking_enabled() || nmi_support_forbidden)
 		return;
 
 	rdist_nmi_refs = kcalloc(gic_data.ppi_nr + SGI_NR,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Haoxiang Li haoxiang_li2024@163.com
commit 860ca5e50f73c2a1cef7eefc9d39d04e275417f7 upstream.
Add a check for the return value of cifs_buf_get() and cifs_small_buf_get() in receive_encrypted_standard() to prevent a null pointer dereference.
Fixes: eec04ea11969 ("smb: client: fix OOB in receive_encrypted_standard()")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li haoxiang_li2024@163.com
Signed-off-by: Steve French stfrench@microsoft.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/smb/client/smb2ops.c | 4 ++++
 1 file changed, 4 insertions(+)
--- a/fs/smb/client/smb2ops.c
+++ b/fs/smb/client/smb2ops.c
@@ -4991,6 +4991,10 @@ one_more:
 			next_buffer = (char *)cifs_buf_get();
 		else
 			next_buffer = (char *)cifs_small_buf_get();
+		if (!next_buffer) {
+			cifs_server_dbg(VFS, "No memory for (large) SMB response\n");
+			return -1;
+		}
 		memcpy(next_buffer, buf + next_cmd, pdu_length - next_cmd);
 	}
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Komal Bajaj quic_kbajaj@quicinc.com
commit c158647c107358bf1be579f98e4bb705c1953292 upstream.
The previous implementation incorrectly configured the cmn_interrupt_2_enable register for interrupt handling. Using cmn_interrupt_2_enable to configure Tag, Data RAM ECC interrupts would lead to issues like double handling of the interrupts (EL1 and EL3), as cmn_interrupt_2_enable is meant to be configured for interrupts which need to be handled by EL3.
EL1 LLCC EDAC driver needs to use cmn_interrupt_0_enable register to configure Tag, Data RAM ECC interrupts instead of cmn_interrupt_2_enable.
Fixes: 27450653f1db ("drivers: edac: Add EDAC driver support for QCOM SoCs")
Signed-off-by: Komal Bajaj quic_kbajaj@quicinc.com
Signed-off-by: Borislav Petkov (AMD) bp@alien8.de
Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20241119064608.12326-1-quic_kbajaj@quicinc.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/edac/qcom_edac.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/edac/qcom_edac.c
+++ b/drivers/edac/qcom_edac.c
@@ -95,7 +95,7 @@ static int qcom_llcc_core_setup(struct l
 	 * Configure interrupt enable registers such that Tag, Data RAM related
 	 * interrupts are propagated to interrupt controller for servicing
 	 */
-	ret = regmap_update_bits(llcc_bcast_regmap, drv->edac_reg_offset->cmn_interrupt_2_enable,
+	ret = regmap_update_bits(llcc_bcast_regmap, drv->edac_reg_offset->cmn_interrupt_0_enable,
 				 TRP0_INTERRUPT_ENABLE,
 				 TRP0_INTERRUPT_ENABLE);
 	if (ret)
@@ -113,7 +113,7 @@ static int qcom_llcc_core_setup(struct l
 	if (ret)
 		return ret;
 
-	ret = regmap_update_bits(llcc_bcast_regmap, drv->edac_reg_offset->cmn_interrupt_2_enable,
+	ret = regmap_update_bits(llcc_bcast_regmap, drv->edac_reg_offset->cmn_interrupt_0_enable,
 				 DRP0_INTERRUPT_ENABLE,
 				 DRP0_INTERRUPT_ENABLE);
 	if (ret)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
commit 57b76bedc5c52c66968183b5ef57234894c25ce7 upstream.
The function tracer should record the preemption level at the point when the function is invoked. If the tracing subsystem decrements the preemption counter, it needs to correct this before feeding the data into the trace buffer. This was broken in the commit cited below while shifting the preempt-disabled section.
Use tracing_gen_ctx_dec() which properly subtracts one from the preemption counter on a preemptible kernel.
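As a rough illustration of the off-by-one being corrected, consider this sketch (simplified; not the actual tracing_gen_ctx_dec() implementation):

static unsigned int sketch_gen_ctx_dec(void)
{
	/* By the time the callback runs, the ftrace recursion protection
	 * has already done one preempt_disable(), so the raw count is one
	 * higher than it was at the instrumented call site. */
	unsigned int pc = preempt_count();

	if (IS_ENABLED(CONFIG_PREEMPTION))
		pc--;		/* report the traced function's real level */

	return pc;
}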
Cc: stable@vger.kernel.org
Cc: Wander Lairson Costa wander@redhat.com
Cc: Masami Hiramatsu mhiramat@kernel.org
Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com
Cc: Thomas Gleixner tglx@linutronix.de
Link: https://lore.kernel.org/20250220140749.pfw8qoNZ@linutronix.de
Fixes: ce5e48036c9e7 ("ftrace: disable preemption when recursion locked")
Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de
Tested-by: Wander Lairson Costa wander@redhat.com
Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/trace/trace_functions.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -193,7 +193,7 @@ function_trace_call(unsigned long ip, un
 	if (bit < 0)
 		return;
 
-	trace_ctx = tracing_gen_ctx();
+	trace_ctx = tracing_gen_ctx_dec();
 
 	cpu = smp_processor_id();
 	data = per_cpu_ptr(tr->array_buffer.data, cpu);
@@ -298,7 +298,6 @@ function_no_repeats_trace_call(unsigned
 	struct trace_array *tr = op->private;
 	struct trace_array_cpu *data;
 	unsigned int trace_ctx;
-	unsigned long flags;
 	int bit;
 	int cpu;
 
@@ -325,8 +324,7 @@ function_no_repeats_trace_call(unsigned
 	if (is_repeat_check(tr, last_info, ip, parent_ip))
 		goto out;
 
-	local_save_flags(flags);
-	trace_ctx = tracing_gen_ctx_flags(flags);
+	trace_ctx = tracing_gen_ctx_dec();
 	process_repeats(tr, ip, parent_ip, last_info, trace_ctx);
 
 	trace_function(tr, ip, parent_ip, trace_ctx);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt rostedt@goodmis.org
commit 38b14061947fa546491656e3f5e388d4fedf8dba upstream.
Function graph uses a subops and manager ops mechanism to attach to ftrace. The manager ops connects to ftrace, and the functions it connects to are defined by a list of subops that it manages.
The function hash that defines what the above ops attaches to limits the functions to attach if the hash has any content. If the hash is empty, it means to trace all functions.
The creation of the manager ops hash is done by iterating over all the subops hashes. If any of the subops hashes is empty, it means that the manager ops hash must trace all functions as well.
The issue is in the creation of the manager ops. When a second subops is attached, a new hash is created by starting it as NULL and adding the subops one at a time. But the NULL hash is mistaken for an empty hash, and once an empty hash is found, it stops the loop of subops and just enables all functions.
# echo "f:myevent1 kernel_clone" >> /sys/kernel/tracing/dynamic_events # cat /sys/kernel/tracing/enabled_functions kernel_clone (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
# echo "f:myevent2 schedule_timeout" >> /sys/kernel/tracing/dynamic_events # cat /sys/kernel/tracing/enabled_functions trace_initcall_start_cb (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 run_init_process (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 try_to_run_init_process (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 x86_pmu_show_pmu_cap (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 cleanup_rapl_pmus (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 uncore_free_pcibus_map (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 uncore_types_exit (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 uncore_pci_exit.part.0 (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 kvm_shutdown (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 vmx_dump_msrs (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 vmx_cleanup_l1d_flush (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 [..]
Fix this by initializing the new hash to NULL. If the hash is NULL, do not treat it as an empty hash but instead allocate it by copying the content of the first subops. Then, on subsequent iterations, the new hash will not be NULL but will hold the content of the previous subops. If that first subops attached to all functions, then the new hash may assume that the manager ops also needs to attach to all functions.
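The key distinction, condensed from the patch below (helpers simplified): NULL now means "not built yet", while an allocated-but-empty hash still means "trace everything".

	struct ftrace_hash *new_hash = NULL;	/* not built yet, not "empty" */

	list_for_each_entry(subops, &ops->subop_list, list) {
		/* append_hash() allocates on first use instead of mistaking
		 * the initial NULL for an all-functions hash */
		if (append_hash(&new_hash, subops->func_hash->filter_hash,
				size_bits) < 0)
			return NULL;
		if (ftrace_hash_empty(new_hash))
			break;	/* a truly empty hash does trace everything */
	}
	return new_hash ? : EMPTY_HASH;		/* NULL would signal failure */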
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu mhiramat@kernel.org
Cc: Mark Rutland mark.rutland@arm.com
Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com
Cc: Andrew Morton akpm@linux-foundation.org
Cc: Heiko Carstens hca@linux.ibm.com
Cc: Sven Schnelle svens@linux.ibm.com
Cc: Vasily Gorbik gor@linux.ibm.com
Cc: Alexander Gordeev agordeev@linux.ibm.com
Link: https://lore.kernel.org/20250220202055.060300046@goodmis.org
Fixes: 5fccc7552ccbc ("ftrace: Add subops logic to allow one ops to manage many")
Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org
Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/trace/ftrace.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3219,15 +3219,22 @@ static struct ftrace_hash *copy_hash(str
  * The filter_hash updates uses just the append_hash() function
  * and the notrace_hash does not.
  */
-static int append_hash(struct ftrace_hash **hash, struct ftrace_hash *new_hash)
+static int append_hash(struct ftrace_hash **hash, struct ftrace_hash *new_hash,
+		       int size_bits)
 {
 	struct ftrace_func_entry *entry;
 	int size;
 	int i;
 
-	/* An empty hash does everything */
-	if (ftrace_hash_empty(*hash))
-		return 0;
+	if (*hash) {
+		/* An empty hash does everything */
+		if (ftrace_hash_empty(*hash))
+			return 0;
+	} else {
+		*hash = alloc_ftrace_hash(size_bits);
+		if (!*hash)
+			return -ENOMEM;
+	}
 
 	/* If new_hash has everything make hash have everything */
 	if (ftrace_hash_empty(new_hash)) {
@@ -3291,16 +3298,18 @@ static int intersect_hash(struct ftrace_
 /* Return a new hash that has a union of all @ops->filter_hash entries */
 static struct ftrace_hash *append_hashes(struct ftrace_ops *ops)
 {
-	struct ftrace_hash *new_hash;
+	struct ftrace_hash *new_hash = NULL;
 	struct ftrace_ops *subops;
+	int size_bits;
 	int ret;
 
-	new_hash = alloc_ftrace_hash(ops->func_hash->filter_hash->size_bits);
-	if (!new_hash)
-		return NULL;
+	if (ops->func_hash->filter_hash)
+		size_bits = ops->func_hash->filter_hash->size_bits;
+	else
+		size_bits = FTRACE_HASH_DEFAULT_BITS;
 
 	list_for_each_entry(subops, &ops->subop_list, list) {
-		ret = append_hash(&new_hash, subops->func_hash->filter_hash);
+		ret = append_hash(&new_hash, subops->func_hash->filter_hash, size_bits);
 		if (ret < 0) {
 			free_ftrace_hash(new_hash);
 			return NULL;
@@ -3309,7 +3318,8 @@ static struct ftrace_hash *append_hashes
 		if (ftrace_hash_empty(new_hash))
 			break;
 	}
-	return new_hash;
+	/* Can't return NULL as that means this failed */
+	return new_hash ? : EMPTY_HASH;
 }
 
 /* Make @ops trace evenything except what all its subops do not trace */
@@ -3504,7 +3514,8 @@ int ftrace_startup_subops(struct ftrace_
 	filter_hash = alloc_and_copy_ftrace_hash(size_bits, ops->func_hash->filter_hash);
 	if (!filter_hash)
 		return -ENOMEM;
-	ret = append_hash(&filter_hash, subops->func_hash->filter_hash);
+	ret = append_hash(&filter_hash, subops->func_hash->filter_hash,
+			  size_bits);
 	if (ret < 0) {
 		free_ftrace_hash(filter_hash);
 		return ret;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt rostedt@goodmis.org
commit 8eb4b09e0bbd30981305643229fe7640ad41b667 upstream.
Check if a function is already in the manager ops of a subops. A manager ops contains multiple subops, and if two or more subops are tracing the same function, the manager ops only needs a single entry in its hash.
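In outline, the added guard looks like this (condensed from the patch below):

	/* Two subops may trace the same function; the manager ops hash
	 * needs only one entry per ip. */
	if (__ftrace_lookup_ip(hash, ip) != NULL)
		return 0;	/* already present, nothing to add */

	entry = add_hash_entry(hash, ip);
	return entry ? 0 : -ENOMEM;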
Cc: stable@vger.kernel.org
Cc: Mark Rutland mark.rutland@arm.com
Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com
Cc: Andrew Morton akpm@linux-foundation.org
Cc: Sven Schnelle svens@linux.ibm.com
Cc: Vasily Gorbik gor@linux.ibm.com
Cc: Alexander Gordeev agordeev@linux.ibm.com
Link: https://lore.kernel.org/20250220202055.226762894@goodmis.org
Fixes: 4f554e955614f ("ftrace: Add ftrace_set_filter_ips function")
Tested-by: Heiko Carstens hca@linux.ibm.com
Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org
Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/trace/ftrace.c | 3 +++
 1 file changed, 3 insertions(+)
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5758,6 +5758,9 @@ __ftrace_match_addr(struct ftrace_hash *
 			return -ENOENT;
 		free_hash_entry(hash, entry);
 		return 0;
+	} else if (__ftrace_lookup_ip(hash, ip) != NULL) {
+		/* Already exists */
+		return 0;
 	}
 
 	entry = add_hash_entry(hash, ip);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt rostedt@goodmis.org
commit 22bec11a569983f39c6061cb82279e7de9e3bdfc upstream.
When the function tracing_set_tracer() switched over to using the guard() infrastructure, it no longer needed to save the 'ret' variable and would just return the value when an error arose, instead of setting ret and jumping to an out label.
When CONFIG_TRACER_SNAPSHOT is enabled, it had code that expected the 'ret' variable to be initialized to zero and had set 'ret' while holding an arch_spin_lock() (not used by guard), and then upon releasing the lock it would check 'ret' and exit if set. But because ret was only set when an error occurred while holding the locks, 'ret' would be used uninitialized if there was no error. The code in the CONFIG_TRACER_SNAPSHOT block should be self-contained. Make sure 'ret' is also set when no error occurred.
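A reduced sketch of the problematic pattern (not the full tracing_set_tracer() body):

	int ret;	/* note: no initializer */

	guard(mutex)(&trace_types_lock);	/* error paths now return directly */

	if (t->use_max_tr) {
		arch_spin_lock(&tr->max_lock);
		ret = tr->cond_snapshot ? -EBUSY : 0;	/* fix: set in both cases */
		arch_spin_unlock(&tr->max_lock);
		if (ret)	/* previously read uninitialized when no error */
			return ret;
	}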
Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com
Link: https://lore.kernel.org/20250106111143.2f90ff65@gandalf.local.home
Reported-by: kernel test robot lkp@intel.com
Reported-by: Dan Carpenter dan.carpenter@linaro.org
Closes: https://lore.kernel.org/r/202412271654.nJVBuwmF-lkp@intel.com/
Fixes: d33b10c0c73ad ("tracing: Switch trace.c code over to use guard()")
Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org
Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/trace/trace.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6125,8 +6125,7 @@ int tracing_set_tracer(struct trace_arra
 	if (t->use_max_tr) {
 		local_irq_disable();
 		arch_spin_lock(&tr->max_lock);
-		if (tr->cond_snapshot)
-			ret = -EBUSY;
+		ret = tr->cond_snapshot ? -EBUSY : 0;
 		arch_spin_unlock(&tr->max_lock);
 		local_irq_enable();
 		if (ret)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kory Maincent kory.maincent@bootlin.com
commit 488fb6effe03e20f38d34da7425de77bbd3e2665 upstream.
Fix a deadlock in pse_pi_get_current_limit() and pse_pi_set_current_limit() caused by consecutive mutex_lock() calls: one in the function itself and another in pse_pi_get_voltage().
Resolve the issue by using the unlocked version of pse_pi_get_voltage instead.
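The locked/unlocked split used by the fix follows a common kernel pattern; a minimal sketch (the rdev-to-pcdev plumbing and error handling are elided, and pse_pi_do_read_voltage() is hypothetical):

static int _pse_pi_get_voltage(struct regulator_dev *rdev)
{
	/* caller must already hold pcdev->lock */
	return pse_pi_do_read_voltage(rdev);
}

static int pse_pi_get_voltage(struct regulator_dev *rdev)
{
	struct pse_controller_dev *pcdev = rdev_get_drvdata(rdev);
	int ret;

	mutex_lock(&pcdev->lock);
	ret = _pse_pi_get_voltage(rdev);
	mutex_unlock(&pcdev->lock);
	return ret;
}

A function that already holds pcdev->lock must call the leading-underscore variant, since kernel mutexes are not recursive.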
Fixes: e0a5e2bba38a ("net: pse-pd: Use power limit at driver side instead of current limit")
Signed-off-by: Kory Maincent kory.maincent@bootlin.com
Link: https://patch.msgid.link/20250212151751.1515008-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/pse-pd/pse_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/pse-pd/pse_core.c
+++ b/drivers/net/pse-pd/pse_core.c
@@ -309,7 +309,7 @@ static int pse_pi_get_current_limit(stru
 		goto out;
 	mW = ret;
 
-	ret = pse_pi_get_voltage(rdev);
+	ret = _pse_pi_get_voltage(rdev);
 	if (!ret) {
 		dev_err(pcdev->dev, "Voltage null\n");
 		ret = -ERANGE;
@@ -346,7 +346,7 @@ static int pse_pi_set_current_limit(stru
 
 	id = rdev_get_id(rdev);
 	mutex_lock(&pcdev->lock);
-	ret = pse_pi_get_voltage(rdev);
+	ret = _pse_pi_get_voltage(rdev);
 	if (!ret) {
 		dev_err(pcdev->dev, "Voltage null\n");
 		ret = -ERANGE;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tejun Heo tj@kernel.org
commit f3f08c3acfb8860e07a22814a344e83c99ad7398 upstream.
While fixing migration disabled task handling, 32966821574c ("sched_ext: Fix migration disabled handling in targeted dispatches") assumed that a migration disabled task's ->cpus_ptr would only have the pinned CPU. While this is eventually true for migration disabled tasks that are switched out, the ->cpus_ptr update is performed by migrate_disable_switch() which is called right before context_switch() in __schedule(). However, the task is enqueued earlier during pick_next_task() via put_prev_task_scx(), so there is a race window where another CPU can see the task on a DSQ.
If the CPU tries to dispatch the migration disabled task while in that window, task_allowed_on_cpu() will succeed and task_can_run_on_remote_rq() will subsequently trigger SCHED_WARN(is_migration_disabled()).
WARNING: CPU: 8 PID: 1837 at kernel/sched/ext.c:2466 task_can_run_on_remote_rq+0x12e/0x140
Sched_ext: layered (enabled+all), task: runnable_at=-10ms
RIP: 0010:task_can_run_on_remote_rq+0x12e/0x140
...
<TASK>
 consume_dispatch_q+0xab/0x220
 scx_bpf_dsq_move_to_local+0x58/0xd0
 bpf_prog_84dd17b0654b6cf0_layered_dispatch+0x290/0x1cfa
 bpf__sched_ext_ops_dispatch+0x4b/0xab
 balance_one+0x1fe/0x3b0
 balance_scx+0x61/0x1d0
 prev_balance+0x46/0xc0
 __pick_next_task+0x73/0x1c0
 __schedule+0x206/0x1730
 schedule+0x3a/0x160
 __do_sys_sched_yield+0xe/0x20
 do_syscall_64+0xbb/0x1e0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fix it by converting the SCHED_WARN() back to a regular failure path. Also, perform the migration disabled test before the task_allowed_on_cpu() test so that BPF schedulers which fail to handle migration disabled tasks can be noticed easily.
While at it, adjust scx_ops_error() message for !task_allowed_on_cpu() case for brevity and consistency.
Signed-off-by: Tejun Heo tj@kernel.org
Fixes: 32966821574c ("sched_ext: Fix migration disabled handling in targeted dispatches")
Acked-by: Andrea Righi arighi@nvidia.com
Reported-by: Jake Hillion jakehillion@meta.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/sched/ext.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2311,6 +2311,25 @@ static bool task_can_run_on_remote_rq(st
 	SCHED_WARN_ON(task_cpu(p) == cpu);
 
 	/*
+	 * If @p has migration disabled, @p->cpus_ptr is updated to contain only
+	 * the pinned CPU in migrate_disable_switch() while @p is being switched
+	 * out. However, put_prev_task_scx() is called before @p->cpus_ptr is
+	 * updated and thus another CPU may see @p on a DSQ inbetween leading to
+	 * @p passing the below task_allowed_on_cpu() check while migration is
+	 * disabled.
+	 *
+	 * Test the migration disabled state first as the race window is narrow
+	 * and the BPF scheduler failing to check migration disabled state can
+	 * easily be masked if task_allowed_on_cpu() is done first.
+	 */
+	if (unlikely(is_migration_disabled(p))) {
+		if (trigger_error)
+			scx_ops_error("SCX_DSQ_LOCAL[_ON] cannot move migration disabled %s[%d] from CPU %d to %d",
+				      p->comm, p->pid, task_cpu(p), cpu);
+		return false;
+	}
+
+	/*
 	 * We don't require the BPF scheduler to avoid dispatching to offline
 	 * CPUs mostly for convenience but also because CPUs can go offline
 	 * between scx_bpf_dispatch() calls and here. Trigger error iff the
@@ -2318,17 +2337,11 @@ static bool task_can_run_on_remote_rq(st
 	 */
 	if (!task_allowed_on_cpu(p, cpu)) {
 		if (trigger_error)
-			scx_ops_error("SCX_DSQ_LOCAL[_ON] verdict target cpu %d not allowed for %s[%d]",
-				      cpu_of(rq), p->comm, p->pid);
+			scx_ops_error("SCX_DSQ_LOCAL[_ON] target CPU %d not allowed for %s[%d]",
+				      cpu, p->comm, p->pid);
 		return false;
 	}
 
-	/*
-	 * If @p has migration disabled, @p->cpus_ptr only contains its current
-	 * CPU and the above task_allowed_on_cpu() test should have failed.
-	 */
-	SCHED_WARN_ON(is_migration_disabled(p));
-
 	if (!scx_rq_online(rq))
 		return false;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kevin Brodsky kevin.brodsky@arm.com
commit 46036188ea1f5266df23a6149dea0df1c77cd1c7 upstream.
The mm kselftests are currently built with no optimisation (-O0). It's unclear why, and besides being obviously suboptimal, this also prevents the pkeys tests from working as intended. Let's build all the tests with -O2.
[kevin.brodsky@arm.com: silence unused-result warnings]
Link: https://lkml.kernel.org/r/20250107170110.2819685-1-kevin.brodsky@arm.com
Link: https://lkml.kernel.org/r/20241209095019.1732120-6-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com
Cc: Aruna Ramakrishna aruna.ramakrishna@oracle.com
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Dave Hansen dave.hansen@linux.intel.com
Cc: Joey Gouly joey.gouly@arm.com
Cc: Keith Lucas keith.lucas@oracle.com
Cc: Ryan Roberts ryan.roberts@arm.com
Cc: Shuah Khan shuah@kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
[ Yifei: This commit also fix the failure of pkey_sighandler_tests_64, which is also in linux-6.12.y, thus backport this commit. ]
Signed-off-by: Yifei Liu yifei.l.liu@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 tools/testing/selftests/mm/Makefile | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -33,9 +33,16 @@ endif
 # LDLIBS.
 MAKEFLAGS += --no-builtin-rules
 
-CFLAGS = -Wall -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
+CFLAGS = -Wall -O2 -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
 LDLIBS = -lrt -lpthread -lm
 
+# Some distributions (such as Ubuntu) configure GCC so that _FORTIFY_SOURCE is
+# automatically enabled at -O1 or above. This triggers various unused-result
+# warnings where functions such as read() or write() are called and their
+# return value is not checked. Disable _FORTIFY_SOURCE to silence those
+# warnings.
+CFLAGS += -U_FORTIFY_SOURCE
+
 TEST_GEN_FILES = cow
 TEST_GEN_FILES += compaction_test
 TEST_GEN_FILES += gup_longterm
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tianling Shen cnsztl@gmail.com
commit a6a7cba17c544fb95d5a29ab9d9ed4503029cb29 upstream.
In general the delay should be added by the PHY instead of the MAC, and this improves network stability on some boards which seem to need a different delay.
Fixes: 387b3bbac5ea ("arm64: dts: rockchip: Add Xunlong OrangePi R1 Plus LTS")
Cc: stable@vger.kernel.org # 6.6+
Signed-off-by: Tianling Shen cnsztl@gmail.com
Link: https://lore.kernel.org/r/20250119091154.1110762-1-cnsztl@gmail.com
Signed-off-by: Heiko Stuebner heiko@sntech.de
[Fix conflicts due to missing dtsi conversion]
Signed-off-by: Tianling Shen cnsztl@gmail.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus-lts.dts | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus-lts.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus-lts.dts
@@ -15,9 +15,11 @@
 };
 
 &gmac2io {
+	/delete-property/ tx_delay;
+	/delete-property/ rx_delay;
+
 	phy-handle = <&yt8531c>;
-	tx_delay = <0x19>;
-	rx_delay = <0x05>;
+	phy-mode = "rgmii-id";
 
 	mdio {
 		/delete-node/ ethernet-phy@1;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
commit b35eb9128ebeec534eed1cefd6b9b1b7282cf5ba upstream.
When mesa started using compute queues more often we started seeing additional hangs with compute queues. Disabling gfxoff seems to mitigate that. Manually control gfxoff and gfx pg with command submissions to avoid any issues related to gfxoff. KFD already does the same thing for these chips.
v2: limit to compute
v3: limit to APUs
v4: limit to Raven/PCO
v5: only update the compute ring_funcs
v6: Disable GFX PG
v7: adjust order
Reviewed-by: Lijo Lazar lijo.lazar@amd.com
Suggested-by: Błażej Szczygieł mumei6102@gmail.com
Suggested-by: Sergey Kovalenko seryoga.engineering@gmail.com
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3861
Link: https://lists.freedesktop.org/archives/amd-gfx/2025-January/119116.html
Signed-off-by: Alex Deucher alexander.deucher@amd.com
Cc: stable@vger.kernel.org # 6.12.x
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -7415,6 +7415,34 @@ static void gfx_v9_0_ring_emit_cleaner_s
 	amdgpu_ring_write(ring, 0);  /* RESERVED field, programmed to zero */
 }
 
+static void gfx_v9_0_ring_begin_use_compute(struct amdgpu_ring *ring)
+{
+	struct amdgpu_device *adev = ring->adev;
+
+	amdgpu_gfx_enforce_isolation_ring_begin_use(ring);
+
+	/* Raven and PCO APUs seem to have stability issues
+	 * with compute and gfxoff and gfx pg. Disable gfx pg during
+	 * submission and allow again afterwards.
+	 */
+	if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 1, 0))
+		gfx_v9_0_set_powergating_state(adev, AMD_PG_STATE_UNGATE);
+}
+
+static void gfx_v9_0_ring_end_use_compute(struct amdgpu_ring *ring)
+{
+	struct amdgpu_device *adev = ring->adev;
+
+	/* Raven and PCO APUs seem to have stability issues
+	 * with compute and gfxoff and gfx pg. Disable gfx pg during
+	 * submission and allow again afterwards.
+	 */
+	if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 1, 0))
+		gfx_v9_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
+
+	amdgpu_gfx_enforce_isolation_ring_end_use(ring);
+}
+
 static const struct amd_ip_funcs gfx_v9_0_ip_funcs = {
 	.name = "gfx_v9_0",
 	.early_init = gfx_v9_0_early_init,
@@ -7591,8 +7619,8 @@ static const struct amdgpu_ring_funcs gf
 	.emit_wave_limit = gfx_v9_0_emit_wave_limit,
 	.reset = gfx_v9_0_reset_kcq,
 	.emit_cleaner_shader = gfx_v9_0_ring_emit_cleaner_shader,
-	.begin_use = amdgpu_gfx_enforce_isolation_ring_begin_use,
-	.end_use = amdgpu_gfx_enforce_isolation_ring_end_use,
+	.begin_use = gfx_v9_0_ring_begin_use_compute,
+	.end_use = gfx_v9_0_ring_end_use_compute,
 };
 
 static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
commit 55ed2b1b50d029dd7e49a35f6628ca64db6d75d8 upstream.
Bump the driver version for RV/PCO compute stability fix so mesa can use this check to enable compute queues on RV/PCO.
Reviewed-by: Lijo Lazar lijo.lazar@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
Cc: stable@vger.kernel.org # 6.12.x
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -119,9 +119,10 @@
  * - 3.58.0 - Add GFX12 DCC support
  * - 3.59.0 - Cleared VRAM
  * - 3.60.0 - Add AMDGPU_TILING_GFX12_DCC_WRITE_COMPRESS_DISABLE (Vulkan requirement)
+ * - 3.61.0 - Contains fix for RV/PCO compute queues
  */
 #define KMS_DRIVER_MAJOR	3
-#define KMS_DRIVER_MINOR	60
+#define KMS_DRIVER_MINOR	61
 #define KMS_DRIVER_PATCHLEVEL	0
 
 /*
On 2/24/2025 6:33 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.12.17 release. There are 154 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
> Responses should be made by Wed, 26 Feb 2025 14:25:29 +0000. Anything received after that time might be too late.
> The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.17-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
> thanks,
> greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
On 24.02.2025 at 15:33, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.12.17 release. There are 154 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Builds, boots and works on my 2-socket Ivy Bridge Xeon E5-2697 v2 server. No dmesg oddities or regressions found.
Tested-by: Peter Schneider pschneider1968@googlemail.com
Best regards, Peter Schneider