This is the start of the stable review cycle for the 6.6.80 release. There are 140 patches in this series; all of them will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 26 Feb 2025 14:25:29 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.80-rc1....
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.6.80-rc1
Patrick Bellasi derkling@google.com x86/cpu/kvm: SRSO: Fix possible missing IBPB on VM-Exit
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: handle errors that nilfs_prepare_chunk() may return
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: eliminate staggered calls to kunmap in nilfs_rename
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: move page release outside of nilfs_delete_entry and nilfs_set_link
Kan Liang kan.liang@linux.intel.com perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF
Tianling Shen cnsztl@gmail.com arm64: dts: rockchip: change eth phy mode to rgmii-id for orangepi r1 plus lts
Yu Kuai yukuai3@huawei.com md: Fix md_seq_ops() regressions
Yu Kuai yukuai3@huawei.com md: fix missing flush of sync_work
Cosmin Ratiu cratiu@nvidia.com net/mlx5e: Don't call cleanup on profile rollback failure
Steven Rostedt rostedt@goodmis.org ftrace: Do not add duplicate entries in subops manager ops
Sebastian Andrzej Siewior bigeasy@linutronix.de ftrace: Correct preemption accounting for function tracing.
Komal Bajaj quic_kbajaj@quicinc.com EDAC/qcom: Correct interrupt enable register configuration
Haoxiang Li haoxiang_li2024@163.com smb: client: Add check for next_buffer in receive_encrypted_standard()
Niravkumar L Rabara niravkumar.l.rabara@intel.com mtd: rawnand: cadence: fix incorrect device in dma_unmap_single
Niravkumar L Rabara niravkumar.l.rabara@intel.com mtd: rawnand: cadence: use dma_map_resource for sdma address
Niravkumar L Rabara niravkumar.l.rabara@intel.com mtd: rawnand: cadence: fix error code in cadence_nand_init()
Ricardo Cañuelo Navarro rcn@igalia.com mm,madvise,hugetlb: check for 0-length range after end address adjustment
Christian Brauner brauner@kernel.org acct: block access to kernel internal filesystems
Christian Brauner brauner@kernel.org acct: perform last write from workqueue
Peter Ujfalusi peter.ujfalusi@linux.intel.com ASoC: SOF: pcm: Clear the susbstream pointer to NULL on close
John Veness john-linux@pelago.org.uk ALSA: hda/conexant: Add quirk for HP ProBook 450 G4 mute LED
Wentao Liang vulab@iscas.ac.cn ALSA: hda: Add error check for snd_ctl_rename_id() in snd_hda_create_dig_out_ctls()
Nikita Zhandarovich n.zhandarovich@fintech.ru ASoC: fsl_micfil: Enable default case in micfil_set_quality()
Peter Ujfalusi peter.ujfalusi@linux.intel.com ASoC: SOF: stream-ipc: Check for cstream nullity in sof_ipc_msg_data()
Haoxiang Li haoxiang_li2024@163.com nfp: bpf: Add check for nfp_app_ctrl_msg_alloc()
Pavel Begunkov asml.silence@gmail.com lib/iov_iter: fix import_iovec_ubuf iovec management
Haoxiang Li haoxiang_li2024@163.com soc: loongson: loongson2_guts: Add check for devm_kstrdup()
Gavrilov Ilia Ilia.Gavrilov@infotecs.ru drop_monitor: fix incorrect initialization order
Sumit Garg sumit.garg@linaro.org tee: optee: Fix supplicant wait loop
Pavel Begunkov asml.silence@gmail.com io_uring: prevent opcode speculation
Imre Deak imre.deak@intel.com drm/i915/dp: Fix error handling during 128b/132b link training
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915: Make sure all planes in use by the joiner have their crtc included
Jessica Zhang quic_jesszhan@quicinc.com drm/msm/dpu: Disable dither in phys encoder cleanup
Chen-Yu Tsai wenst@chromium.org arm64: dts: mediatek: mt8183: Disable DSI display output by default
Aaron Kling webgeek1234@gmail.com drm/nouveau/pmu: Fix gp10b firmware guard
Yan Zhai yan@cloudflare.com bpf: skip non exist keys in generic_map_lookup_batch
Caleb Sander Mateos csander@purestorage.com nvme/ioctl: add missing space in err message
Rob Clark robdclark@chromium.org drm/msm: Avoid rounding up to one jiffy
David Hildenbrand david@redhat.com nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range()
Andrey Vatoropin a.vatoropin@crpt.ru power: supply: da9150-fg: fix potential overflow
Abel Wu wuyun.abel@bytedance.com bpf: Fix deadlock when freeing cgroup storage
Jiayuan Chen mrpre@163.com bpf: Disable non stream socket for strparser
Jiayuan Chen mrpre@163.com bpf: Fix wrong copied_seq calculation
Jiayuan Chen mrpre@163.com strparser: Add read_sock callback
Andrii Nakryiko andrii@kernel.org bpf: avoid holding freeze_mutex during mmap operation
Andrii Nakryiko andrii@kernel.org bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic
Shigeru Yoshida syoshida@redhat.com bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type()
Dan Carpenter dan.carpenter@linaro.org drm/msm/gem: prevent integer overflow in msm_ioctl_gem_submit()
Rob Clark robdclark@chromium.org drm/msm/gem: Demote userspace errors to DRM_UT_DRIVER
Devarsh Thakkar devarsht@ti.com drm/tidss: Fix race condition while handling interrupt registers
Tomi Valkeinen tomi.valkeinen@ideasonboard.com drm/tidss: Add simple K2G manual reset
Sabrina Dubroca sd@queasysnail.net tcp: drop secpath at the same time as we currently drop dst
Nick Hu nick.hu@sifive.com net: axienet: Set mac_managed_pm
Breno Leitao leitao@debian.org arp: switch to dev_getbyhwaddr() in arp_req_set_public()
Breno Leitao leitao@debian.org net: Add non-RCU dev_getbyhwaddr() helper
Cong Wang xiyou.wangcong@gmail.com flow_dissector: Fix port range key handling in BPF conversion
Cong Wang xiyou.wangcong@gmail.com flow_dissector: Fix handling of mixed port and port-range keys
Kuniyuki Iwashima kuniyu@amazon.com geneve: Suppress list corruption splat in geneve_destroy_tunnels().
Kuniyuki Iwashima kuniyu@amazon.com gtp: Suppress list corruption splat in gtp_net_exit_batch_rtnl().
Jakub Kicinski kuba@kernel.org tcp: adjust rcvq_space after updating scaling ratio
Michal Luczaj mhal@rbox.co vsock/bpf: Warn on socket without transport
Michal Luczaj mhal@rbox.co sockmap, vsock: For connectible sockets allow only connected
Nick Child nnac123@linux.ibm.com ibmvnic: Don't reference skb after sending to VIOS
Nick Child nnac123@linux.ibm.com ibmvnic: Add stat for tx direct vs tx batched
Nick Child nnac123@linux.ibm.com ibmvnic: Introduce send sub-crq direct
Nick Child nnac123@linux.ibm.com ibmvnic: Return error code on TX scrq flush fail
Julian Ruess julianr@linux.ibm.com s390/ism: add release function for struct device
Takashi Iwai tiwai@suse.de ALSA: seq: Drop UMP events when no UMP-conversion is set
Pierre Riteau pierre@stackhpc.com net/sched: cls_api: fix error handling causing NULL dereference
Vitaly Rodionov vitalyr@opensource.cirrus.com ALSA: hda/cirrus: Correct the full scale volume set logic
Kuniyuki Iwashima kuniyu@amazon.com geneve: Fix use-after-free in geneve_find_dev().
Christophe Leroy christophe.leroy@csgroup.eu powerpc/code-patching: Fix KASAN hit by not flagging text patching area as VM_ALLOC
Kailang Yang kailang@realtek.com ALSA: hda/realtek: Fixup ALC225 depop procedure
Christophe Leroy christophe.leroy@csgroup.eu powerpc/64s: Rewrite __real_pte() and __rpte_to_hidx() as static inline
Michael Ellerman mpe@ellerman.id.au powerpc/64s/mm: Move __real_pte stubs into hash-4k.h
John Keeping jkeeping@inmusicbrands.com ASoC: rockchip: i2s-tdm: fix shift config for SND_SOC_DAIFMT_DSP_[AB]
Jill Donahue jilliandonahue58@gmail.com USB: gadget: f_midi: f_midi_complete to call queue_work
Roy Luo royluo@google.com usb: gadget: core: flush gadget workqueue after device removal
Roy Luo royluo@google.com USB: gadget: core: create sysfs link between udc and gadget
Sascha Hauer s.hauer@pengutronix.de nvmem: imx-ocotp-ele: fix MAC address byte order
Miquel Raynal miquel.raynal@bootlin.com nvmem: Move and rename ->fixup_cell_info()
Miquel Raynal miquel.raynal@bootlin.com nvmem: Simplify the ->add_cells() hook
Miquel Raynal miquel.raynal@bootlin.com nvmem: Create a header for internal sharing
Ricardo Ribalda ribalda@chromium.org media: uvcvideo: Remove dangling pointers
Ricardo Ribalda ribalda@chromium.org media: uvcvideo: Only save async fh if success
Ricardo Ribalda ribalda@chromium.org media: uvcvideo: Refactor iterators
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org soc: mediatek: mtk-devapc: Fix leaking IO map on driver remove
Uwe Kleine-König u.kleine-koenig@pengutronix.de soc/mediatek: mtk-devapc: Convert to platform remove callback returning void
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org arm64: dts: qcom: sm8550: Fix ADSP memory base and length
Neil Armstrong neil.armstrong@linaro.org arm64: dts: qcom: sm8550: add missing qcom,non-secure-domain property
Ling Xu quic_lxu5@quicinc.com arm64: dts: qcom: sm8550: Add dma-coherent property
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org arm64: dts: qcom: sm8450: Fix ADSP memory base and length
Neil Armstrong neil.armstrong@linaro.org arm64: dts: qcom: sm8450: add missing qcom,non-secure-domain property
Igor Pylypiv ipylypiv@google.com scsi: core: Do not retry I/Os during depopulation
Douglas Gilbert dgilbert@interlog.com scsi: core: Handle depopulation and restoration in progress
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org firmware: qcom: scm: Fix missing read barrier in qcom_scm_is_available()
Dan Carpenter dan.carpenter@linaro.org ASoC: renesas: rz-ssi: Add a check for negative sample_space
Dmitry Torokhov dmitry.torokhov@gmail.com Input: synaptics - fix crash when enabling pass-through port
Dmitry Torokhov dmitry.torokhov@gmail.com Input: serio - define serio_pause_rx guard to pause and resume serio ports
Zijun Hu quic_zijuhu@quicinc.com Bluetooth: qca: Fix poor RF performance for WCN6855
Cheng Jiang quic_chejiang@quicinc.com Bluetooth: qca: Update firmware-name to support board specific nvm
Zijun Hu quic_zijuhu@quicinc.com Bluetooth: qca: Support downloading board id specific NVM for WCN7850
Andreas Kemnade andreas@kemnade.info cpufreq: fix using cpufreq-dt as module
Jeff Johnson quic_jjohnson@quicinc.com cpufreq: dt-platdev: add missing MODULE_DESCRIPTION() macro
Chen Ridong chenridong@huawei.com memcg: fix soft lockup in the OOM process
Carlos Galo carlosgalo@google.com mm: update mark_victim tracepoints fields
Yu Kuai yukuai3@huawei.com md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime
Yu Kuai yukuai3@huawei.com md/md-bitmap: add 'sync_size' into struct md_bitmap_stats
Yu Kuai yukuai3@huawei.com md/md-cluster: fix spares warnings for __le64
Yu Kuai yukuai3@huawei.com md/md-bitmap: replace md_bitmap_status() with a new helper md_bitmap_get_stats()
Yu Kuai yukuai3@huawei.com md: simplify md_seq_ops
Yu Kuai yukuai3@huawei.com md: factor out a helper from mddev_put()
Yu Kuai yukuai3@huawei.com md: use separate work_struct for md_start_sync()
Darrick J. Wong djwong@kernel.org xfs: don't over-report free space or inodes in statvfs
Darrick J. Wong djwong@kernel.org xfs: report realtime block quota limits on realtime directories
Ojaswin Mujoo ojaswin@linux.ibm.com xfs: Check for delayed allocations before setting extsize
Christoph Hellwig hch@lst.de xfs: streamline xfs_filestream_pick_ag
Chi Zhiling chizhiling@kylinos.cn xfs: Reduce unnecessary searches when searching for the best extents
Christoph Hellwig hch@lst.de xfs: update the pag for the last AG at recovery time
Christoph Hellwig hch@lst.de xfs: don't use __GFP_RETRY_MAYFAIL in xfs_initialize_perag
Christoph Hellwig hch@lst.de xfs: error out when a superblock buffer update reduces the agcount
Christoph Hellwig hch@lst.de xfs: update the file system geometry after recoverying superblock buffers
Christoph Hellwig hch@lst.de xfs: pass the exact range to initialize to xfs_initialize_perag
Zhang Zekun zhangzekun11@huawei.com xfs: Remove empty declartion in header file
Uros Bizjak ubizjak@gmail.com xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate()
Christoph Hellwig hch@lst.de xfs: support lowmode allocations in xfs_bmap_exact_minlen_extent_alloc
Christoph Hellwig hch@lst.de xfs: call xfs_bmap_exact_minlen_extent_alloc from xfs_bmap_btalloc
Christoph Hellwig hch@lst.de xfs: don't ifdef around the exact minlen allocations
Christoph Hellwig hch@lst.de xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocate
Christoph Hellwig hch@lst.de xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addname
Christoph Hellwig hch@lst.de xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split
Christoph Hellwig hch@lst.de xfs: return bool from xfs_attr3_leaf_add
Christoph Hellwig hch@lst.de xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addname
Brian Foster bfoster@redhat.com xfs: don't free cowblocks from under dirty pagecache on unshare
Brian Foster bfoster@redhat.com xfs: skip background cowblock trims on inodes open for write
Andrew Kreimer algonell@gmail.com xfs: fix a typo
Darrick J. Wong djwong@kernel.org xfs: fix a sloppy memory handling bug in xfs_iroot_realloc
Darrick J. Wong djwong@kernel.org xfs: validate inumber in xfs_iget
Christoph Hellwig hch@lst.de xfs: assert a valid limit in xfs_rtfind_forw
Catalin Marinas catalin.marinas@arm.com arm64: mte: Do not allow PROT_MTE on MAP_HUGETLB user mappings
-------------
Diffstat:
 Documentation/networking/strparser.rst | 9 +-
 Makefile | 4 +-
 arch/arm64/boot/dts/mediatek/mt8183.dtsi | 1 +
 arch/arm64/boot/dts/qcom/sm8450.dtsi | 213 +++++++++--------
 arch/arm64/boot/dts/qcom/sm8550.dtsi | 265 +++++++++++----------
 .../dts/rockchip/rk3328-orangepi-r1-plus-lts.dts | 6 +-
 arch/arm64/include/asm/mman.h | 9 +-
 arch/powerpc/include/asm/book3s/64/hash-4k.h | 28 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h | 26 --
 arch/powerpc/lib/code-patching.c | 2 +-
 arch/x86/Kconfig | 3 +-
 arch/x86/events/intel/core.c | 17 +-
 arch/x86/include/asm/perf_event.h | 26 +-
 arch/x86/kernel/cpu/bugs.c | 21 +-
 drivers/bluetooth/btqca.c | 110 +++++++--
 drivers/cpufreq/Kconfig | 2 +-
 drivers/cpufreq/cpufreq-dt-platdev.c | 1 -
 drivers/edac/qcom_edac.c | 4 +-
 drivers/firmware/qcom_scm.c | 5 +-
 drivers/gpu/drm/i915/display/intel_display.c | 18 ++
 .../gpu/drm/i915/display/intel_dp_link_training.c | 15 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 3 +
 drivers/gpu/drm/msm/msm_drv.h | 11 +-
 drivers/gpu/drm/msm/msm_gem.c | 6 +-
 drivers/gpu/drm/msm/msm_gem_submit.c | 39 +--
 drivers/gpu/drm/nouveau/nouveau_svm.c | 9 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c | 2 +-
 drivers/gpu/drm/tidss/tidss_dispc.c | 22 +-
 drivers/gpu/drm/tidss/tidss_irq.c | 2 +
 drivers/input/mouse/synaptics.c | 56 +++--
 drivers/input/mouse/synaptics.h | 1 +
 drivers/md/md-bitmap.c | 34 ++-
 drivers/md/md-bitmap.h | 9 +-
 drivers/md/md-cluster.c | 34 +--
 drivers/md/md.c | 191 ++++++++-------
 drivers/md/md.h | 5 +-
 drivers/media/usb/uvc/uvc_ctrl.c | 99 ++++++--
 drivers/media/usb/uvc/uvc_v4l2.c | 2 +
 drivers/media/usb/uvc/uvcvideo.h | 9 +-
 drivers/mtd/nand/raw/cadence-nand-controller.c | 42 +++-
 drivers/net/ethernet/ibm/ibmvnic.c | 85 +++++--
 drivers/net/ethernet/ibm/ibmvnic.h | 3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 +-
 drivers/net/ethernet/netronome/nfp/bpf/cmsg.c | 2 +
 drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 1 +
 drivers/net/geneve.c | 16 +-
 drivers/net/gtp.c | 5 -
 drivers/nvme/host/ioctl.c | 3 +-
 drivers/nvmem/core.c | 32 +--
 drivers/nvmem/imx-ocotp-ele.c | 22 ++
 drivers/nvmem/imx-ocotp.c | 11 +-
 drivers/nvmem/internals.h | 37 +++
 drivers/nvmem/layouts/onie-tlv.c | 3 +-
 drivers/nvmem/layouts/sl28vpd.c | 3 +-
 drivers/nvmem/mtk-efuse.c | 11 +-
 drivers/power/supply/da9150-fg.c | 4 +-
 drivers/s390/net/ism_drv.c | 14 +-
 drivers/scsi/scsi_lib.c | 8 +-
 drivers/scsi/sd.c | 4 +
 drivers/soc/loongson/loongson2_guts.c | 5 +-
 drivers/soc/mediatek/mtk-devapc.c | 7 +-
 drivers/tee/optee/supp.c | 35 +--
 drivers/usb/gadget/function/f_midi.c | 2 +-
 drivers/usb/gadget/udc/core.c | 11 +-
 fs/nilfs2/dir.c | 24 +-
 fs/nilfs2/namei.c | 37 +--
 fs/nilfs2/nilfs.h | 10 +-
 fs/smb/client/smb2ops.c | 4 +
 fs/xfs/libxfs/xfs_ag.c | 47 ++--
 fs/xfs/libxfs/xfs_ag.h | 6 +-
 fs/xfs/libxfs/xfs_alloc.c | 9 +-
 fs/xfs/libxfs/xfs_alloc.h | 4 +-
 fs/xfs/libxfs/xfs_attr.c | 198 +++++++--------
 fs/xfs/libxfs/xfs_attr_leaf.c | 40 ++--
 fs/xfs/libxfs/xfs_attr_leaf.h | 2 +-
 fs/xfs/libxfs/xfs_bmap.c | 140 ++++-------
 fs/xfs/libxfs/xfs_da_btree.c | 5 +-
 fs/xfs/libxfs/xfs_inode_fork.c | 10 +-
 fs/xfs/libxfs/xfs_rtbitmap.c | 2 +
 fs/xfs/xfs_buf_item_recover.c | 70 ++++++
 fs/xfs/xfs_filestream.c | 102 ++++----
 fs/xfs/xfs_fsops.c | 18 +-
 fs/xfs/xfs_icache.c | 39 +--
 fs/xfs/xfs_inode.c | 2 +-
 fs/xfs/xfs_inode.h | 5 +
 fs/xfs/xfs_ioctl.c | 4 +-
 fs/xfs/xfs_log.h | 1 -
 fs/xfs/xfs_log_cil.c | 11 +-
 fs/xfs/xfs_log_recover.c | 9 +-
 fs/xfs/xfs_mount.c | 4 +-
 fs/xfs/xfs_qm_bhv.c | 41 ++--
 fs/xfs/xfs_reflink.c | 3 +
 fs/xfs/xfs_reflink.h | 19 ++
 fs/xfs/xfs_super.c | 11 +-
 include/linux/netdevice.h | 2 +
 include/linux/nvmem-provider.h | 17 +-
 include/linux/serio.h | 3 +
 include/linux/skmsg.h | 2 +
 include/net/strparser.h | 2 +
 include/net/tcp.h | 22 ++
 include/trace/events/oom.h | 36 ++-
 io_uring/io_uring.c | 2 +
 kernel/acct.c | 134 +++++++----
 kernel/bpf/bpf_cgrp_storage.c | 2 +-
 kernel/bpf/ringbuf.c | 4 -
 kernel/bpf/syscall.c | 43 ++--
 kernel/trace/ftrace.c | 3 +
 kernel/trace/trace_functions.c | 6 +-
 lib/iov_iter.c | 3 +-
 mm/madvise.c | 11 +-
 mm/memcontrol.c | 7 +-
 mm/oom_kill.c | 14 +-
 net/bpf/test_run.c | 5 +-
 net/core/dev.c | 37 ++-
 net/core/drop_monitor.c | 39 ++-
 net/core/flow_dissector.c | 49 ++--
 net/core/skmsg.c | 7 +
 net/core/sock_map.c | 8 +-
 net/ipv4/arp.c | 2 +-
 net/ipv4/tcp.c | 29 ++-
 net/ipv4/tcp_bpf.c | 36 +++
 net/ipv4/tcp_fastopen.c | 4 +-
 net/ipv4/tcp_input.c | 20 +-
 net/ipv4/tcp_ipv4.c | 2 +-
 net/sched/cls_api.c | 2 +-
 net/strparser/strparser.c | 11 +-
 net/vmw_vsock/af_vsock.c | 3 +
 net/vmw_vsock/vsock_bpf.c | 2 +-
 sound/core/seq/seq_clientmgr.c | 12 +-
 sound/pci/hda/hda_codec.c | 4 +-
 sound/pci/hda/patch_conexant.c | 1 +
 sound/pci/hda/patch_cs8409-tables.c | 6 +-
 sound/pci/hda/patch_cs8409.c | 20 +-
 sound/pci/hda/patch_cs8409.h | 5 +-
 sound/pci/hda/patch_realtek.c | 1 +
 sound/soc/fsl/fsl_micfil.c | 2 +
 sound/soc/rockchip/rockchip_i2s_tdm.c | 4 +-
 sound/soc/sh/rz-ssi.c | 2 +
 sound/soc/sof/pcm.c | 2 +
 sound/soc/sof/stream-ipc.c | 6 +-
 140 files changed, 1951 insertions(+), 1249 deletions(-)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Catalin Marinas catalin.marinas@arm.com
PROT_MTE (memory tagging extensions) is not supported on all user mmap() types for various reasons (memory attributes, backing storage, CoW handling). The arm64 arch_validate_flags() function checks whether the VM_MTE_ALLOWED flag has been set for a vma during mmap(), usually by arch_calc_vm_flag_bits().
Linux prior to 6.13 does not support PROT_MTE hugetlb mappings. This was added by commit 25c17c4b55de ("hugetlb: arm64: add mte support"). However, earlier kernels inadvertently set VM_MTE_ALLOWED on (MAP_ANONYMOUS | MAP_HUGETLB) mappings by only checking for MAP_ANONYMOUS.
Explicitly check MAP_HUGETLB in arch_calc_vm_flag_bits() and avoid setting VM_MTE_ALLOWED for such mappings.
Fixes: 9f3419315f3c ("arm64: mte: Add PROT_MTE support to mmap() and mprotect()")
Cc: stable@vger.kernel.org # 5.10.x-6.12.x
Reported-by: Naresh Kamboju naresh.kamboju@linaro.org
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/include/asm/mman.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -31,9 +31,12 @@ static inline unsigned long arch_calc_vm
 	 * backed by tags-capable memory. The vm_flags may be overridden by a
 	 * filesystem supporting MTE (RAM-based).
 	 */
-	if (system_supports_mte() &&
-	    ((flags & MAP_ANONYMOUS) || shmem_file(file)))
-		return VM_MTE_ALLOWED;
+	if (system_supports_mte()) {
+		if ((flags & MAP_ANONYMOUS) && !(flags & MAP_HUGETLB))
+			return VM_MTE_ALLOWED;
+		if (shmem_file(file))
+			return VM_MTE_ALLOWED;
+	}
 
 	return 0;
 }
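For reviewers who want to see the behavioural change in isolation, the check in arch_calc_vm_flag_bits() can be modelled in user space. This is only a sketch: the SK_* constants and the sk_* function names are hypothetical stand-ins, the flag values are illustrative rather than the kernel's, and system_supports_mte() and shmem_file() are reduced to plain booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the real constants come from kernel headers. */
#define SK_MAP_ANONYMOUS	0x20UL
#define SK_MAP_HUGETLB		0x40000UL
#define SK_VM_MTE_ALLOWED	0x1UL

/* Model of the old check: every anonymous mapping qualifies. */
unsigned long sk_calc_flags_old(bool has_mte, unsigned long flags, bool is_shmem)
{
	if (has_mte && ((flags & SK_MAP_ANONYMOUS) || is_shmem))
		return SK_VM_MTE_ALLOWED;
	return 0;
}

/* Model of the patched check: anonymous hugetlb mappings are excluded. */
unsigned long sk_calc_flags_new(bool has_mte, unsigned long flags, bool is_shmem)
{
	if (has_mte) {
		if ((flags & SK_MAP_ANONYMOUS) && !(flags & SK_MAP_HUGETLB))
			return SK_VM_MTE_ALLOWED;
		if (is_shmem)
			return SK_VM_MTE_ALLOWED;
	}
	return 0;
}

/* Exercises the one case whose result differs between the two models. */
void sk_mte_self_check(void)
{
	unsigned long huge_anon = SK_MAP_ANONYMOUS | SK_MAP_HUGETLB;

	assert(sk_calc_flags_old(true, huge_anon, false) == SK_VM_MTE_ALLOWED);
	assert(sk_calc_flags_new(true, huge_anon, false) == 0);
}
```

In this model the two versions differ only for (MAP_ANONYMOUS | MAP_HUGETLB) requests; plain anonymous and shmem-backed mappings are treated the same before and after.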
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit 6d2db12d56a389b3e8efa236976f8dc3a8ae00f0 upstream.
Protect against developers passing stupid limits when refactoring the RT code once again.
Signed-off-by: Christoph Hellwig hch@lst.de
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Catherine Hoang catherine.hoang@oracle.com
Acked-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/libxfs/xfs_rtbitmap.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -288,6 +288,8 @@ xfs_rtfind_forw(
 	xfs_rtword_t	wdiff;		/* difference from wanted value */
 	int		word;		/* word number in the buffer */
 
+	ASSERT(start <= limit);
+
 	/*
 	 * Compute and read in starting bitmap block for starting block.
 	 */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Darrick J. Wong" djwong@kernel.org
commit 05aba1953f4a6e2b48e13c610e8a4545ba4ef509 upstream.
Actually use the inumber validator to check the argument passed in here.
Signed-off-by: Darrick J. Wong djwong@kernel.org
Reviewed-by: Christoph Hellwig hch@lst.de
Reviewed-by: Dave Chinner dchinner@redhat.com
Signed-off-by: Catherine Hoang catherine.hoang@oracle.com
Acked-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/xfs_icache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -748,7 +748,7 @@ xfs_iget(
 	ASSERT((lock_flags & (XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED)) == 0);
 
 	/* reject inode numbers outside existing AGs */
-	if (!ino || XFS_INO_TO_AGNO(mp, ino) >= mp->m_sb.sb_agcount)
+	if (!xfs_verify_ino(mp, ino))
 		return -EINVAL;
 
 	XFS_STATS_INC(mp, xs_ig_attempts);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Darrick J. Wong" djwong@kernel.org
commit de55149b6639e903c4d06eb0474ab2c05060e61d upstream.
While refactoring code, I noticed that when xfs_iroot_realloc tries to shrink a bmbt root block, it allocates a smaller new block and then copies "records" and pointers to the new block. However, bmbt root blocks cannot ever be leaves, which means that it's not technically correct to copy records. We /should/ be copying keys.
Note that this has never resulted in actual memory corruption because sizeof(bmbt_rec) == (sizeof(bmbt_key) + sizeof(bmbt_ptr)). However, this will no longer be true when we start adding realtime rmap stuff, so fix this now.
Signed-off-by: Darrick J. Wong djwong@kernel.org
Reviewed-by: Christoph Hellwig hch@lst.de
Signed-off-by: Catherine Hoang catherine.hoang@oracle.com
Acked-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/libxfs/xfs_inode_fork.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -449,15 +449,15 @@ xfs_iroot_realloc(
 	}
 
 	/*
-	 * Only copy the records and pointers if there are any.
+	 * Only copy the keys and pointers if there are any.
 	 */
 	if (new_max > 0) {
 		/*
-		 * First copy the records.
+		 * First copy the keys.
 		 */
-		op = (char *)XFS_BMBT_REC_ADDR(mp, ifp->if_broot, 1);
-		np = (char *)XFS_BMBT_REC_ADDR(mp, new_broot, 1);
-		memcpy(np, op, new_max * (uint)sizeof(xfs_bmbt_rec_t));
+		op = (char *)XFS_BMBT_KEY_ADDR(mp, ifp->if_broot, 1);
+		np = (char *)XFS_BMBT_KEY_ADDR(mp, new_broot, 1);
+		memcpy(np, op, new_max * (uint)sizeof(xfs_bmbt_key_t));
 
 		/*
 		 * Then copy the pointers.
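The size coincidence that the commit message relies on can be checked with a small standalone sketch. The sk_* types below are simplified, hypothetical stand-ins for the bmbt record, key and pointer types (assumed here to be two 64-bit words, one 64-bit word and one 64-bit word respectively), and the reading that the old record-sized copy exactly fills the key-plus-pointer area of the new root is an interpretation of the message, not something the patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; field layout is an assumption for illustration. */
struct sk_bmbt_rec { uint64_t l0, l1; };	/* leaf record */
struct sk_bmbt_key { uint64_t br_startoff; };	/* node key */
typedef uint64_t sk_bmbt_ptr;			/* node pointer */

/* The equality quoted in the commit message, checked at compile time. */
_Static_assert(sizeof(struct sk_bmbt_rec) ==
	       sizeof(struct sk_bmbt_key) + sizeof(sk_bmbt_ptr),
	       "record size equals key size plus pointer size");

/* Bytes available for keys and pointers in a root sized for nrecs entries. */
size_t sk_payload_bytes(size_t nrecs)
{
	return nrecs * (sizeof(struct sk_bmbt_key) + sizeof(sk_bmbt_ptr));
}

/* Bytes moved by the old memcpy(), which used the record size. */
size_t sk_old_copy_bytes(size_t nrecs)
{
	return nrecs * sizeof(struct sk_bmbt_rec);
}

/* Bytes moved by the patched memcpy(), which uses the key size. */
size_t sk_new_copy_bytes(size_t nrecs)
{
	return nrecs * sizeof(struct sk_bmbt_key);
}

/* The old copy fills the payload exactly; the new copy covers only the keys. */
void sk_bmbt_self_check(void)
{
	assert(sk_old_copy_bytes(4) == sk_payload_bytes(4));
	assert(sk_new_copy_bytes(4) < sk_payload_bytes(4));
}
```

If a future format made the record larger than key plus pointer, the compile-time check above would fail, which is the situation the patch is guarding against.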
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrew Kreimer algonell@gmail.com
commit 77bfe1b11ea0c0c4b0ce19b742cd1aa82f60e45d upstream.
Fix a typo in comments.
Signed-off-by: Andrew Kreimer algonell@gmail.com
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Carlos Maiolino cem@kernel.org
Signed-off-by: Catherine Hoang catherine.hoang@oracle.com
Acked-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/xfs_log_recover.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1820,7 +1820,7 @@ xlog_find_item_ops(
  *    from the transaction. However, we can't do that until after we've
  *    replayed all the other items because they may be dependent on the
  *    cancelled buffer and replaying the cancelled buffer can remove it
- *    form the cancelled buffer table. Hence they have tobe done last.
+ *    form the cancelled buffer table. Hence they have to be done last.
  *
  * 3. Inode allocation buffers must be replayed before inode items that
  *    read the buffer and replay changes into it. For filesystems using the
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Brian Foster bfoster@redhat.com
commit 90a71daaf73f5d39bb0cbb3c7ab6af942fe6233e upstream.
The background blockgc scanner runs on a 5m interval by default and trims preallocation (post-eof and cow fork) from inodes that are otherwise idle. Idle effectively means that iolock can be acquired without blocking and that the inode has no dirty pagecache or I/O in flight.
This simple mechanism and heuristic has worked fairly well for post-eof speculative preallocations. Support for reflink and COW fork preallocations came sometime later and plugged into the same mechanism, with similar heuristics. Some recent testing has shown that COW fork preallocation may be notably more sensitive to blockgc processing than post-eof preallocation, however.
For example, consider an 8GB reflinked file with a COW extent size hint of 1MB. A worst case fully randomized overwrite of this file results in ~8k extents of an average size of ~1MB. If the same workload is interrupted a couple times for blockgc processing (assuming the file goes idle), the resulting extent count explodes to over 100k extents with an average size <100kB. This is significantly worse than ideal and essentially defeats the COW extent size hint mechanism.
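The figures in this example can be reproduced with a few lines of arithmetic. The sketch below assumes that "8GB" and "1MB" mean 8 GiB and 1 MiB; the sk_* helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of extents if the file is covered by extents of the given size. */
uint64_t sk_extent_count(uint64_t file_bytes, uint64_t avg_extent_bytes)
{
	return file_bytes / avg_extent_bytes;
}

/* Average extent size (rounded down) for a given extent count. */
uint64_t sk_avg_extent_bytes(uint64_t file_bytes, uint64_t extents)
{
	return file_bytes / extents;
}

/* Checks the two figures quoted in the commit message. */
void sk_extent_self_check(void)
{
	uint64_t file_bytes = 8ULL << 30;	/* 8 GiB, assumed */
	uint64_t hint_bytes = 1ULL << 20;	/* 1 MiB COW extent size hint */

	/* About 8k extents when every extent matches the hint. */
	assert(sk_extent_count(file_bytes, hint_bytes) == 8192);

	/* At 100k extents the average drops below 100 kB. */
	assert(sk_avg_extent_bytes(file_bytes, 100000) < 100 * 1000);
}
```

Going from roughly 8k to over 100k extents is more than a twelvefold increase in extent count for the same amount of data.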
While this particular test is instrumented, it reflects a fairly reasonable pattern in practice where random I/Os might spread out over a large period of time with varying periods of (in)activity. For example, consider a cloned disk image file for a VM or container with long uptime and variable and bursty usage. A background blockgc scan that races and processes the image file when it happens to be clean and idle can have a significant effect on the future fragmentation level of the file, even when still in use.
To help combat this, update the heuristic to skip cowblocks inodes that are currently opened for write access during non-sync blockgc scans. This allows COW fork preallocations to persist for as long as possible unless otherwise needed for functional purposes (i.e. a sync scan), the file is idle and closed, or the inode is being evicted from cache. While here, update the comments to help distinguish performance oriented heuristics from the logic that exists to maintain functional correctness.
Suggested-by: Darrick Wong djwong@kernel.org
Signed-off-by: Brian Foster bfoster@redhat.com
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Carlos Maiolino cem@kernel.org
Signed-off-by: Catherine Hoang catherine.hoang@oracle.com
Acked-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/xfs_icache.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1234,14 +1234,17 @@ xfs_inode_clear_eofblocks_tag(
 }
 
 /*
- * Set ourselves up to free CoW blocks from this file. If it's already clean
- * then we can bail out quickly, but otherwise we must back off if the file
- * is undergoing some kind of write.
+ * Prepare to free COW fork blocks from an inode.
  */
 static bool
 xfs_prep_free_cowblocks(
-	struct xfs_inode	*ip)
+	struct xfs_inode	*ip,
+	struct xfs_icwalk	*icw)
 {
+	bool			sync;
+
+	sync = icw && (icw->icw_flags & XFS_ICWALK_FLAG_SYNC);
+
 	/*
 	 * Just clear the tag if we have an empty cow fork or none at all. It's
 	 * possible the inode was fully unshared since it was originally tagged.
@@ -1253,9 +1256,21 @@ xfs_prep_free_cowblocks(
 	}
 
 	/*
-	 * If the mapping is dirty or under writeback we cannot touch the
-	 * CoW fork. Leave it alone if we're in the midst of a directio.
+	 * A cowblocks trim of an inode can have a significant effect on
+	 * fragmentation even when a reasonable COW extent size hint is set.
+	 * Therefore, we prefer to not process cowblocks unless they are clean
+	 * and idle. We can never process a cowblocks inode that is dirty or has
+	 * in-flight I/O under any circumstances, because outstanding writeback
+	 * or dio expects targeted COW fork blocks exist through write
+	 * completion where they can be remapped into the data fork.
+	 *
+	 * Therefore, the heuristic used here is to never process inodes
+	 * currently opened for write from background (i.e. non-sync) scans. For
+	 * sync scans, use the pagecache/dio state of the inode to ensure we
+	 * never free COW fork blocks out from under pending I/O.
 	 */
+	if (!sync && inode_is_open_for_write(VFS_I(ip)))
+		return false;
 	if ((VFS_I(ip)->i_state & I_DIRTY_PAGES) ||
 	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_DIRTY) ||
 	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_WRITEBACK) ||
@@ -1291,7 +1306,7 @@ xfs_inode_free_cowblocks(
 	if (!xfs_iflags_test(ip, XFS_ICOWBLOCKS))
 		return 0;
 
-	if (!xfs_prep_free_cowblocks(ip))
+	if (!xfs_prep_free_cowblocks(ip, icw))
 		return 0;
 
 	if (!xfs_icwalk_match(ip, icw))
@@ -1320,7 +1335,7 @@ xfs_inode_free_cowblocks(
 	 * Check again, nobody else should be able to dirty blocks or change
 	 * the reflink iflag now that we have the first two locks held.
 	 */
-	if (xfs_prep_free_cowblocks(ip))
+	if (xfs_prep_free_cowblocks(ip, icw))
 		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
 	return ret;
 }
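The updated heuristic in xfs_prep_free_cowblocks() boils down to a two-stage decision, which can be modelled in user space as below. This is a sketch under stated assumptions: struct sk_inode_state and sk_prep_free_cowblocks() are hypothetical names, the inode's pagecache and direct I/O state is collapsed into booleans, and the early exit that clears the tag for an empty COW fork is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the inode state the real function inspects. */
struct sk_inode_state {
	bool open_for_write;		/* some file descriptor has it open for write */
	bool dirty_or_writeback;	/* dirty pagecache or writeback in progress */
	bool dio_in_flight;		/* direct I/O outstanding */
};

/* Returns true if a scan of the given kind may trim the COW fork. */
bool sk_prep_free_cowblocks(const struct sk_inode_state *st, bool sync_scan)
{
	/* Performance heuristic: background scans skip inodes open for write. */
	if (!sync_scan && st->open_for_write)
		return false;

	/* Correctness rule: never trim under dirty cache or in-flight I/O. */
	if (st->dirty_or_writeback || st->dio_in_flight)
		return false;

	return true;
}

/* A clean, idle file that is still open for write: only sync scans trim it. */
void sk_cowblocks_self_check(void)
{
	struct sk_inode_state vm_image = { .open_for_write = true };

	assert(!sk_prep_free_cowblocks(&vm_image, false));
	assert(sk_prep_free_cowblocks(&vm_image, true));
}
```

The first test is the only part that depends on the scan type; the second applies to every caller, which matches the comment's distinction between the performance heuristic and the checks needed for correctness.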
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Brian Foster bfoster@redhat.com
commit 4390f019ad7866c3791c3d768d2ff185d89e8ebe upstream.
fallocate unshare mode explicitly breaks extent sharing. When a command completes, it checks the data fork for any remaining shared extents to determine whether the reflink inode flag and COW fork preallocation can be removed. This logic doesn't consider in-core pagecache and I/O state, however, which means we can unsafely remove COW fork blocks that are still needed under certain conditions.
For example, consider the following command sequence:
  xfs_io -fc "pwrite 0 1k" -c "reflink <file> 0 256k 1k" \
	-c "pwrite 0 32k" -c "funshare 0 1k" <file>
This allocates a data block at offset 0, shares it, and then overwrites it with a larger buffered write. The overwrite triggers COW fork preallocation, 32 blocks by default, which maps the entire 32k write to delalloc in the COW fork. All but the shared block at offset 0 remains hole mapped in the data fork. The unshare command redirties and flushes the folio at offset 0, removing the only shared extent from the inode. Since the inode no longer maps shared extents, unshare purges the COW fork before the remaining 28k may have written back.
This leaves dirty pagecache backed by holes, which writeback quietly skips, thus leaving clean, non-zeroed pagecache over holes in the file. To verify, fiemap shows holes in the first 32k of the file and reads return different data across a remount:
  $ xfs_io -c "fiemap -v" <file>
  <file>:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   ...
     1: [8..511]:        hole               504
   ...
  $ xfs_io -c "pread -v 4k 8" <file>
  00001000:  cd cd cd cd cd cd cd cd  ........
  $ umount <mnt>; mount <dev> <mnt>
  $ xfs_io -c "pread -v 4k 8" <file>
  00001000:  00 00 00 00 00 00 00 00  ........
To avoid this problem, make unshare follow the same rules used for background cowblock scanning and never purge the COW fork for inodes with dirty pagecache or in-flight I/O.
Fixes: 46afb0628b86347 ("xfs: only flush the unshared range in xfs_reflink_unshare")
Signed-off-by: Brian Foster bfoster@redhat.com
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Carlos Maiolino cem@kernel.org
Signed-off-by: Catherine Hoang catherine.hoang@oracle.com
Acked-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/xfs_icache.c  |    8 +-------
 fs/xfs/xfs_reflink.c |    3 +++
 fs/xfs/xfs_reflink.h |   19 +++++++++++++++++++
 3 files changed, 23 insertions(+), 7 deletions(-)
--- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -1271,13 +1271,7 @@ xfs_prep_free_cowblocks( */ if (!sync && inode_is_open_for_write(VFS_I(ip))) return false; - if ((VFS_I(ip)->i_state & I_DIRTY_PAGES) || - mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_DIRTY) || - mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_WRITEBACK) || - atomic_read(&VFS_I(ip)->i_dio_count)) - return false; - - return true; + return xfs_can_free_cowblocks(ip); }
/* --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1600,6 +1600,9 @@ xfs_reflink_clear_inode_flag(
ASSERT(xfs_is_reflink_inode(ip));
+ if (!xfs_can_free_cowblocks(ip)) + return 0; + error = xfs_reflink_inode_has_shared_extents(*tpp, ip, &needs_flag); if (error || needs_flag) return error; --- a/fs/xfs/xfs_reflink.h +++ b/fs/xfs/xfs_reflink.h @@ -16,6 +16,25 @@ static inline bool xfs_is_cow_inode(stru return xfs_is_reflink_inode(ip) || xfs_is_always_cow_inode(ip); }
+/* + * Check whether it is safe to free COW fork blocks from an inode. It is unsafe + * to do so when an inode has dirty cache or I/O in-flight, even if no shared + * extents exist in the data fork, because outstanding I/O may target blocks + * that were speculatively allocated to the COW fork. + */ +static inline bool +xfs_can_free_cowblocks(struct xfs_inode *ip) +{ + struct inode *inode = VFS_I(ip); + + if ((inode->i_state & I_DIRTY_PAGES) || + mapping_tagged(inode->i_mapping, PAGECACHE_TAG_DIRTY) || + mapping_tagged(inode->i_mapping, PAGECACHE_TAG_WRITEBACK) || + atomic_read(&inode->i_dio_count)) + return false; + return true; +} + extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip, struct xfs_bmbt_irec *irec, bool *shared); int xfs_bmap_trim_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
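The safety check that this patch factors out can be modelled in userspace as a minimal sketch. The `mock_inode` structure and both `mock_*` functions are hypothetical stand-ins for illustration only; the real helper reads `i_state`, the pagecache tags, and `i_dio_count` from the VFS inode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the in-core state the real helper inspects. */
struct mock_inode {
	bool dirty_pages;       /* models I_DIRTY_PAGES set in i_state */
	bool tagged_dirty;      /* models PAGECACHE_TAG_DIRTY on the mapping */
	bool tagged_writeback;  /* models PAGECACHE_TAG_WRITEBACK on the mapping */
	int  dio_count;         /* models i_dio_count */
};

/* COW fork blocks may only be freed when the inode is clean and idle. */
static bool mock_can_free_cowblocks(const struct mock_inode *inode)
{
	assert(inode != NULL);
	if (inode->dirty_pages || inode->tagged_dirty ||
	    inode->tagged_writeback || inode->dio_count != 0)
		return false;
	return true;
}

/*
 * Models the early return added to the unshare path: the COW fork is only
 * purged when the inode is idle AND no shared extents remain.
 */
static bool mock_unshare_may_purge(const struct mock_inode *inode,
				   bool has_shared_extents)
{
	if (!mock_can_free_cowblocks(inode))
		return false;
	return !has_shared_extents;
}
```

In the reproducer from the commit message, the inode still has dirty pagecache when unshare finishes, so under this model the purge is skipped even though no shared extents remain.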
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit b1c649da15c2e4c86344c8e5af69c8afa215efec upstream.
[backport: dependency of a5f7334 and b3f4e84]
xfs_attr_leaf_try_add is only called by xfs_attr_leaf_addname, and merging the two will simplify a following error handling fix.
To facilitate this, move the remote block state save/restore helpers up in the file so that they no longer need forward declarations.

Signed-off-by: Christoph Hellwig hch@lst.de
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Carlos Maiolino cem@kernel.org
Signed-off-by: Catherine Hoang catherine.hoang@oracle.com
Acked-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/libxfs/xfs_attr.c |  176 +++++++++++++++++++----------------------------
 1 file changed, 74 insertions(+), 102 deletions(-)
--- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -50,7 +50,6 @@ STATIC int xfs_attr_shortform_addname(xf STATIC int xfs_attr_leaf_get(xfs_da_args_t *args); STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args); STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp); -STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args);
/* * Internal routines when attribute list is more than one block. @@ -401,6 +400,33 @@ out: return error; }
+/* Save the current remote block info and clear the current pointers. */ +static void +xfs_attr_save_rmt_blk( + struct xfs_da_args *args) +{ + args->blkno2 = args->blkno; + args->index2 = args->index; + args->rmtblkno2 = args->rmtblkno; + args->rmtblkcnt2 = args->rmtblkcnt; + args->rmtvaluelen2 = args->rmtvaluelen; + args->rmtblkno = 0; + args->rmtblkcnt = 0; + args->rmtvaluelen = 0; +} + +/* Set stored info about a remote block */ +static void +xfs_attr_restore_rmt_blk( + struct xfs_da_args *args) +{ + args->blkno = args->blkno2; + args->index = args->index2; + args->rmtblkno = args->rmtblkno2; + args->rmtblkcnt = args->rmtblkcnt2; + args->rmtvaluelen = args->rmtvaluelen2; +} + /* * Handle the state change on completion of a multi-state attr operation. * @@ -428,49 +454,77 @@ xfs_attr_complete_op( return XFS_DAS_DONE; }
+/* + * Try to add an attribute to an inode in leaf form. + */ static int xfs_attr_leaf_addname( struct xfs_attr_intent *attr) { struct xfs_da_args *args = attr->xattri_da_args; + struct xfs_buf *bp; int error;
ASSERT(xfs_attr_is_leaf(args->dp));
+ error = xfs_attr3_leaf_read(args->trans, args->dp, 0, &bp); + if (error) + return error; + /* - * Use the leaf buffer we may already hold locked as a result of - * a sf-to-leaf conversion. + * Look up the xattr name to set the insertion point for the new xattr. */ - error = xfs_attr_leaf_try_add(args); - - if (error == -ENOSPC) { - error = xfs_attr3_leaf_to_node(args); - if (error) - return error; + error = xfs_attr3_leaf_lookup_int(bp, args); + switch (error) { + case -ENOATTR: + if (args->op_flags & XFS_DA_OP_REPLACE) + goto out_brelse; + break; + case -EEXIST: + if (!(args->op_flags & XFS_DA_OP_REPLACE)) + goto out_brelse;
+ trace_xfs_attr_leaf_replace(args); /* - * We're not in leaf format anymore, so roll the transaction and - * retry the add to the newly allocated node block. + * Save the existing remote attr state so that the current + * values reflect the state of the new attribute we are about to + * add, not the attribute we just found and will remove later. */ - attr->xattri_dela_state = XFS_DAS_NODE_ADD; - goto out; + xfs_attr_save_rmt_blk(args); + break; + case 0: + break; + default: + goto out_brelse; } - if (error) - return error;
/* * We need to commit and roll if we need to allocate remote xattr blocks * or perform more xattr manipulations. Otherwise there is nothing more * to do and we can return success. */ - if (args->rmtblkno) + error = xfs_attr3_leaf_add(bp, args); + if (error) { + if (error != -ENOSPC) + return error; + error = xfs_attr3_leaf_to_node(args); + if (error) + return error; + + attr->xattri_dela_state = XFS_DAS_NODE_ADD; + } else if (args->rmtblkno) { attr->xattri_dela_state = XFS_DAS_LEAF_SET_RMT; - else - attr->xattri_dela_state = xfs_attr_complete_op(attr, - XFS_DAS_LEAF_REPLACE); -out: + } else { + attr->xattri_dela_state = + xfs_attr_complete_op(attr, XFS_DAS_LEAF_REPLACE); + } + trace_xfs_attr_leaf_addname_return(attr->xattri_dela_state, args->dp); return error; + +out_brelse: + xfs_trans_brelse(args->trans, bp); + return error; }
/* @@ -1164,88 +1218,6 @@ xfs_attr_shortform_addname( * External routines when attribute list is one block *========================================================================*/
-/* Save the current remote block info and clear the current pointers. */ -static void -xfs_attr_save_rmt_blk( - struct xfs_da_args *args) -{ - args->blkno2 = args->blkno; - args->index2 = args->index; - args->rmtblkno2 = args->rmtblkno; - args->rmtblkcnt2 = args->rmtblkcnt; - args->rmtvaluelen2 = args->rmtvaluelen; - args->rmtblkno = 0; - args->rmtblkcnt = 0; - args->rmtvaluelen = 0; -} - -/* Set stored info about a remote block */ -static void -xfs_attr_restore_rmt_blk( - struct xfs_da_args *args) -{ - args->blkno = args->blkno2; - args->index = args->index2; - args->rmtblkno = args->rmtblkno2; - args->rmtblkcnt = args->rmtblkcnt2; - args->rmtvaluelen = args->rmtvaluelen2; -} - -/* - * Tries to add an attribute to an inode in leaf form - * - * This function is meant to execute as part of a delayed operation and leaves - * the transaction handling to the caller. On success the attribute is added - * and the inode and transaction are left dirty. If there is not enough space, - * the attr data is converted to node format and -ENOSPC is returned. Caller is - * responsible for handling the dirty inode and transaction or adding the attr - * in node format. - */ -STATIC int -xfs_attr_leaf_try_add( - struct xfs_da_args *args) -{ - struct xfs_buf *bp; - int error; - - error = xfs_attr3_leaf_read(args->trans, args->dp, 0, &bp); - if (error) - return error; - - /* - * Look up the xattr name to set the insertion point for the new xattr. - */ - error = xfs_attr3_leaf_lookup_int(bp, args); - switch (error) { - case -ENOATTR: - if (args->op_flags & XFS_DA_OP_REPLACE) - goto out_brelse; - break; - case -EEXIST: - if (!(args->op_flags & XFS_DA_OP_REPLACE)) - goto out_brelse; - - trace_xfs_attr_leaf_replace(args); - /* - * Save the existing remote attr state so that the current - * values reflect the state of the new attribute we are about to - * add, not the attribute we just found and will remove later. 
- */ - xfs_attr_save_rmt_blk(args); - break; - case 0: - break; - default: - goto out_brelse; - } - - return xfs_attr3_leaf_add(bp, args); - -out_brelse: - xfs_trans_brelse(args->trans, bp); - return error; -} - /* * Return EEXIST if attr is found, or ENOATTR if not */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit 346c1d46d4c631c0c88592d371f585214d714da4 upstream.
[backport: dependency of a5f7334 and b3f4e84]
xfs_attr3_leaf_add only has two potential return values, indicating if the entry could be added or not. Replace the errno return with a bool so that ENOSPC from it can't easily be confused with a real ENOSPC.
Remove the return value from the xfs_attr3_leaf_add_work helper entirely, as it always returns 0.

Signed-off-by: Christoph Hellwig hch@lst.de
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Carlos Maiolino cem@kernel.org
Signed-off-by: Catherine Hoang catherine.hoang@oracle.com
Acked-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/libxfs/xfs_attr.c      |   13 +++++--------
 fs/xfs/libxfs/xfs_attr_leaf.c |   37 +++++++++++++++++++------------------
 fs/xfs/libxfs/xfs_attr_leaf.h |    2 +-
 3 files changed, 25 insertions(+), 27 deletions(-)
--- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -503,10 +503,7 @@ xfs_attr_leaf_addname( * or perform more xattr manipulations. Otherwise there is nothing more * to do and we can return success. */ - error = xfs_attr3_leaf_add(bp, args); - if (error) { - if (error != -ENOSPC) - return error; + if (!xfs_attr3_leaf_add(bp, args)) { error = xfs_attr3_leaf_to_node(args); if (error) return error; @@ -520,7 +517,7 @@ xfs_attr_leaf_addname( }
trace_xfs_attr_leaf_addname_return(attr->xattri_dela_state, args->dp); - return error; + return 0;
out_brelse: xfs_trans_brelse(args->trans, bp); @@ -1393,21 +1390,21 @@ xfs_attr_node_try_addname( { struct xfs_da_state *state = attr->xattri_da_state; struct xfs_da_state_blk *blk; - int error; + int error = 0;
trace_xfs_attr_node_addname(state->args);
blk = &state->path.blk[state->path.active-1]; ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
- error = xfs_attr3_leaf_add(blk->bp, state->args); - if (error == -ENOSPC) { + if (!xfs_attr3_leaf_add(blk->bp, state->args)) { if (state->path.active == 1) { /* * Its really a single leaf node, but it had * out-of-line values so it looked like it *might* * have been a b-tree. Let the caller deal with this. */ + error = -ENOSPC; goto out; }
--- a/fs/xfs/libxfs/xfs_attr_leaf.c +++ b/fs/xfs/libxfs/xfs_attr_leaf.c @@ -46,7 +46,7 @@ */ STATIC int xfs_attr3_leaf_create(struct xfs_da_args *args, xfs_dablk_t which_block, struct xfs_buf **bpp); -STATIC int xfs_attr3_leaf_add_work(struct xfs_buf *leaf_buffer, +STATIC void xfs_attr3_leaf_add_work(struct xfs_buf *leaf_buffer, struct xfs_attr3_icleaf_hdr *ichdr, struct xfs_da_args *args, int freemap_index); STATIC void xfs_attr3_leaf_compact(struct xfs_da_args *args, @@ -990,10 +990,8 @@ xfs_attr_shortform_to_leaf( } error = xfs_attr3_leaf_lookup_int(bp, &nargs); /* set a->index */ ASSERT(error == -ENOATTR); - error = xfs_attr3_leaf_add(bp, &nargs); - ASSERT(error != -ENOSPC); - if (error) - goto out; + if (!xfs_attr3_leaf_add(bp, &nargs)) + ASSERT(0); sfe = xfs_attr_sf_nextentry(sfe); } error = 0; @@ -1349,8 +1347,9 @@ xfs_attr3_leaf_split( struct xfs_da_state_blk *oldblk, struct xfs_da_state_blk *newblk) { - xfs_dablk_t blkno; - int error; + bool added; + xfs_dablk_t blkno; + int error;
trace_xfs_attr_leaf_split(state->args);
@@ -1385,10 +1384,10 @@ xfs_attr3_leaf_split( */ if (state->inleaf) { trace_xfs_attr_leaf_add_old(state->args); - error = xfs_attr3_leaf_add(oldblk->bp, state->args); + added = xfs_attr3_leaf_add(oldblk->bp, state->args); } else { trace_xfs_attr_leaf_add_new(state->args); - error = xfs_attr3_leaf_add(newblk->bp, state->args); + added = xfs_attr3_leaf_add(newblk->bp, state->args); }
/* @@ -1396,13 +1395,15 @@ xfs_attr3_leaf_split( */ oldblk->hashval = xfs_attr_leaf_lasthash(oldblk->bp, NULL); newblk->hashval = xfs_attr_leaf_lasthash(newblk->bp, NULL); - return error; + if (!added) + return -ENOSPC; + return 0; }
/* * Add a name to the leaf attribute list structure. */ -int +bool xfs_attr3_leaf_add( struct xfs_buf *bp, struct xfs_da_args *args) @@ -1411,6 +1412,7 @@ xfs_attr3_leaf_add( struct xfs_attr3_icleaf_hdr ichdr; int tablesize; int entsize; + bool added = true; int sum; int tmp; int i; @@ -1439,7 +1441,7 @@ xfs_attr3_leaf_add( if (ichdr.freemap[i].base < ichdr.firstused) tmp += sizeof(xfs_attr_leaf_entry_t); if (ichdr.freemap[i].size >= tmp) { - tmp = xfs_attr3_leaf_add_work(bp, &ichdr, args, i); + xfs_attr3_leaf_add_work(bp, &ichdr, args, i); goto out_log_hdr; } sum += ichdr.freemap[i].size; @@ -1451,7 +1453,7 @@ xfs_attr3_leaf_add( * no good and we should just give up. */ if (!ichdr.holes && sum < entsize) - return -ENOSPC; + return false;
/* * Compact the entries to coalesce free space. @@ -1464,24 +1466,24 @@ xfs_attr3_leaf_add( * free region, in freemap[0]. If it is not big enough, give up. */ if (ichdr.freemap[0].size < (entsize + sizeof(xfs_attr_leaf_entry_t))) { - tmp = -ENOSPC; + added = false; goto out_log_hdr; }
- tmp = xfs_attr3_leaf_add_work(bp, &ichdr, args, 0); + xfs_attr3_leaf_add_work(bp, &ichdr, args, 0);
out_log_hdr: xfs_attr3_leaf_hdr_to_disk(args->geo, leaf, &ichdr); xfs_trans_log_buf(args->trans, bp, XFS_DA_LOGRANGE(leaf, &leaf->hdr, xfs_attr3_leaf_hdr_size(leaf))); - return tmp; + return added; }
/* * Add a name to a leaf attribute list structure. */ -STATIC int +STATIC void xfs_attr3_leaf_add_work( struct xfs_buf *bp, struct xfs_attr3_icleaf_hdr *ichdr, @@ -1599,7 +1601,6 @@ xfs_attr3_leaf_add_work( } } ichdr->usedbytes += xfs_attr_leaf_entsize(leaf, args->index); - return 0; }
/* --- a/fs/xfs/libxfs/xfs_attr_leaf.h +++ b/fs/xfs/libxfs/xfs_attr_leaf.h @@ -78,7 +78,7 @@ int xfs_attr3_leaf_split(struct xfs_da_s int xfs_attr3_leaf_lookup_int(struct xfs_buf *leaf, struct xfs_da_args *args); int xfs_attr3_leaf_getvalue(struct xfs_buf *bp, struct xfs_da_args *args); -int xfs_attr3_leaf_add(struct xfs_buf *leaf_buffer, +bool xfs_attr3_leaf_add(struct xfs_buf *leaf_buffer, struct xfs_da_args *args); int xfs_attr3_leaf_remove(struct xfs_buf *leaf_buffer, struct xfs_da_args *args);
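The motivation for the bool conversion can be illustrated with a small userspace sketch. All `mock_*` names are hypothetical and the byte accounting is deliberately simplified; the point is only that "the leaf is full" and "the disk is full" now travel on different channels.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical leaf block that only tracks its free bytes. */
struct mock_leaf {
	int free_bytes;
};

enum mock_next_state { MOCK_DONE, MOCK_NODE_ADD };

/* Returns true if the entry was added, false if the leaf has no room. */
static bool mock_leaf_add(struct mock_leaf *leaf, int entsize)
{
	assert(entsize > 0);
	if (leaf->free_bytes < entsize)
		return false;
	leaf->free_bytes -= entsize;
	return true;
}

/* Hypothetical format conversion that can fail with a real -ENOSPC. */
static int mock_leaf_to_node(bool disk_full)
{
	return disk_full ? -ENOSPC : 0;
}

/*
 * Caller modelled on the patched xfs_attr_leaf_addname(): a false return
 * from the add means "convert and retry", while any errno seen here comes
 * from the conversion step and is a genuine failure.
 */
static int mock_leaf_addname(struct mock_leaf *leaf, int entsize,
			     bool disk_full, enum mock_next_state *next)
{
	if (!mock_leaf_add(leaf, entsize)) {
		int error = mock_leaf_to_node(disk_full);

		if (error)
			return error;
		*next = MOCK_NODE_ADD;
		return 0;
	}
	*next = MOCK_DONE;
	return 0;
}
```

With an errno return, the caller could not tell a full leaf from a failed block allocation; with the bool, -ENOSPC in this model can only mean the latter.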
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit a5f73342abe1f796140f6585e43e2aa7bc1b7975 upstream.
xfs_attr3_leaf_split propagates the need for an extra btree split as -ENOSPC to its only caller, but the same value can also be returned from xfs_da_grow_inode when it fails to find free space.
Distinguish the two cases by returning 1 for the extra split case instead of overloading -ENOSPC.
This can be triggered relatively easily with the pending realtime group support and a file system with a lot of small zones that use metadata space on the main device. In this case roughly every 5th to 10th run of xfs/538 hits the following assert:
ASSERT(oldblk->magic == XFS_ATTR_LEAF_MAGIC);
in xfs_attr3_leaf_split caused by an allocation failure. Note that the allocation failure is caused by another bug that will be fixed subsequently, but this commit at least sorts out the error handling.
Signed-off-by: Christoph Hellwig hch@lst.de
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Carlos Maiolino cem@kernel.org
Signed-off-by: Catherine Hoang catherine.hoang@oracle.com
Acked-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/libxfs/xfs_attr_leaf.c |    5 ++++-
 fs/xfs/libxfs/xfs_da_btree.c  |    5 +++--
 2 files changed, 7 insertions(+), 3 deletions(-)
--- a/fs/xfs/libxfs/xfs_attr_leaf.c +++ b/fs/xfs/libxfs/xfs_attr_leaf.c @@ -1340,6 +1340,9 @@ xfs_attr3_leaf_create(
/* * Split the leaf node, rebalance, then add the new entry. + * + * Returns 0 if the entry was added, 1 if a further split is needed or a + * negative error number otherwise. */ int xfs_attr3_leaf_split( @@ -1396,7 +1399,7 @@ xfs_attr3_leaf_split( oldblk->hashval = xfs_attr_leaf_lasthash(oldblk->bp, NULL); newblk->hashval = xfs_attr_leaf_lasthash(newblk->bp, NULL); if (!added) - return -ENOSPC; + return 1; return 0; }
--- a/fs/xfs/libxfs/xfs_da_btree.c +++ b/fs/xfs/libxfs/xfs_da_btree.c @@ -522,9 +522,8 @@ xfs_da3_split( switch (oldblk->magic) { case XFS_ATTR_LEAF_MAGIC: error = xfs_attr3_leaf_split(state, oldblk, newblk); - if ((error != 0) && (error != -ENOSPC)) { + if (error < 0) return error; /* GROT: attr is inconsistent */ - } if (!error) { addblk = newblk; break; @@ -546,6 +545,8 @@ xfs_da3_split( error = xfs_attr3_leaf_split(state, newblk, &state->extrablk); } + if (error == 1) + return -ENOSPC; if (error) return error; /* GROT: attr inconsistent */ addblk = newblk;
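The three-way return convention introduced here (0 for added, 1 for "split again", negative errno for failure) can be sketched as follows. The `mock_*` functions and their parameters are hypothetical; in particular, `grow_error` stands in for whatever a block allocation step might return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical split step. Returns 0 if the entry was added, 1 if a
 * further split is needed, or a negative errno if growing the fork failed.
 */
static int mock_leaf_split(int grow_error, bool entry_fits)
{
	assert(grow_error <= 0);
	if (grow_error)
		return grow_error;
	return entry_fits ? 0 : 1;
}

/*
 * Caller modelled on the patched xfs_da3_split(): negative values are
 * propagated untouched, a first "1" triggers a second split, and only a
 * second "1" is translated to -ENOSPC.
 */
static int mock_da_split(int grow_error1, bool fits1,
			 int grow_error2, bool fits2)
{
	int error = mock_leaf_split(grow_error1, fits1);

	if (error < 0)
		return error;	/* real failure from the first split */
	if (error == 0)
		return 0;	/* entry added, nothing more to do */

	error = mock_leaf_split(grow_error2, fits2);
	if (error == 1)
		return -ENOSPC;	/* still does not fit after two splits */
	return error;		/* 0 on success or a real errno */
}
```

Before the patch, an allocation failure in the first split looked identical to "needs another split", which is how the caller in this model would have gone on to split a block that was never set up.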
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit b3f4e84e2f438a119b7ca8684a25452b3e57c0f0 upstream.
Just like xfs_attr3_leaf_split, xfs_attr_node_try_addname can return -ENOSPC both for an actual failure to allocate a disk block and to signal the caller to convert the format of the attr fork. Use magic 1 to ask for the conversion here as well.
Note that unlike the similar issue in xfs_attr3_leaf_split, this one was only found by code review.
Signed-off-by: Christoph Hellwig hch@lst.de
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Carlos Maiolino cem@kernel.org
Signed-off-by: Catherine Hoang catherine.hoang@oracle.com
Acked-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/libxfs/xfs_attr.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -543,7 +543,7 @@ xfs_attr_node_addname( return error;
error = xfs_attr_node_try_addname(attr); - if (error == -ENOSPC) { + if (error == 1) { error = xfs_attr3_leaf_to_node(args); if (error) return error; @@ -1380,9 +1380,12 @@ error: /* * Add a name to a Btree-format attribute list. * - * This will involve walking down the Btree, and may involve splitting - * leaf nodes and even splitting intermediate nodes up to and including - * the root node (a special case of an intermediate node). + * This will involve walking down the Btree, and may involve splitting leaf + * nodes and even splitting intermediate nodes up to and including the root + * node (a special case of an intermediate node). + * + * If the tree was still in single leaf format and needs to converted to + * real node format return 1 and let the caller handle that. */ static int xfs_attr_node_try_addname( @@ -1404,7 +1407,7 @@ xfs_attr_node_try_addname( * out-of-line values so it looked like it *might* * have been a b-tree. Let the caller deal with this. */ - error = -ENOSPC; + error = 1; goto out; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit 865469cd41bce2b04bef9539cbf70676878bc8df upstream.
[backport: dependency of 6aac770]
Userdata and metadata allocations end up in the same allocation helpers. Remove the separate xfs_bmap_alloc_userdata function to make this more clear.
Signed-off-by: Christoph Hellwig hch@lst.de
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Carlos Maiolino cem@kernel.org
Signed-off-by: Catherine Hoang catherine.hoang@oracle.com
Acked-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/libxfs/xfs_bmap.c |   73 ++++++++++++++++++-----------------------------
 1 file changed, 28 insertions(+), 45 deletions(-)
--- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -4078,43 +4078,6 @@ out: }
static int -xfs_bmap_alloc_userdata( - struct xfs_bmalloca *bma) -{ - struct xfs_mount *mp = bma->ip->i_mount; - int whichfork = xfs_bmapi_whichfork(bma->flags); - int error; - - /* - * Set the data type being allocated. For the data fork, the first data - * in the file is treated differently to all other allocations. For the - * attribute fork, we only need to ensure the allocated range is not on - * the busy list. - */ - bma->datatype = XFS_ALLOC_NOBUSY; - if (whichfork == XFS_DATA_FORK || whichfork == XFS_COW_FORK) { - bma->datatype |= XFS_ALLOC_USERDATA; - if (bma->offset == 0) - bma->datatype |= XFS_ALLOC_INITIAL_USER_DATA; - - if (mp->m_dalign && bma->length >= mp->m_dalign) { - error = xfs_bmap_isaeof(bma, whichfork); - if (error) - return error; - } - - if (XFS_IS_REALTIME_INODE(bma->ip)) - return xfs_bmap_rtalloc(bma); - } - - if (unlikely(XFS_TEST_ERROR(false, mp, - XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT))) - return xfs_bmap_exact_minlen_extent_alloc(bma); - - return xfs_bmap_btalloc(bma); -} - -static int xfs_bmapi_allocate( struct xfs_bmalloca *bma) { @@ -4147,15 +4110,35 @@ xfs_bmapi_allocate( else bma->minlen = 1;
- if (bma->flags & XFS_BMAPI_METADATA) { - if (unlikely(XFS_TEST_ERROR(false, mp, - XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT))) - error = xfs_bmap_exact_minlen_extent_alloc(bma); - else - error = xfs_bmap_btalloc(bma); - } else { - error = xfs_bmap_alloc_userdata(bma); + if (!(bma->flags & XFS_BMAPI_METADATA)) { + /* + * For the data and COW fork, the first data in the file is + * treated differently to all other allocations. For the + * attribute fork, we only need to ensure the allocated range + * is not on the busy list. + */ + bma->datatype = XFS_ALLOC_NOBUSY; + if (whichfork == XFS_DATA_FORK || whichfork == XFS_COW_FORK) { + bma->datatype |= XFS_ALLOC_USERDATA; + if (bma->offset == 0) + bma->datatype |= XFS_ALLOC_INITIAL_USER_DATA; + + if (mp->m_dalign && bma->length >= mp->m_dalign) { + error = xfs_bmap_isaeof(bma, whichfork); + if (error) + return error; + } + } } + + if ((bma->datatype & XFS_ALLOC_USERDATA) && + XFS_IS_REALTIME_INODE(bma->ip)) + error = xfs_bmap_rtalloc(bma); + else if (unlikely(XFS_TEST_ERROR(false, mp, + XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT))) + error = xfs_bmap_exact_minlen_extent_alloc(bma); + else + error = xfs_bmap_btalloc(bma); if (error) return error; if (bma->blkno == NULLFSBLOCK)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit b611fddc0435738e64453bbf1dadd4b12a801858 upstream.
Exact minlen allocations only exist as an error injection tool for debug builds. Currently this is implemented using ifdefs, which means the code isn't even compiled for non-XFS_DEBUG builds. Enhance the compile test coverage by always building the code and using the compiler's dead code elimination to remove it from the generated binary instead.
The only downside is that the alloc_minlen_only field is unconditionally added to struct xfs_alloc_args now, but by moving it around and packing it tightly this doesn't actually increase the size of the structure.
Signed-off-by: Christoph Hellwig hch@lst.de
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Carlos Maiolino cem@kernel.org
Signed-off-by: Catherine Hoang catherine.hoang@oracle.com
Acked-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/libxfs/xfs_alloc.c |    7 ++-----
 fs/xfs/libxfs/xfs_alloc.h |    4 +---
 fs/xfs/libxfs/xfs_bmap.c  |    6 ------
 3 files changed, 3 insertions(+), 14 deletions(-)
--- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2581,7 +2581,6 @@ __xfs_free_extent_later( return 0; }
-#ifdef DEBUG /* * Check if an AGF has a free extent record whose length is equal to * args->minlen. @@ -2620,7 +2619,6 @@ out:
return error; } -#endif
/* * Decide whether to use this allocation group for this allocation. @@ -2694,15 +2692,14 @@ xfs_alloc_fix_freelist( if (!xfs_alloc_space_available(args, need, alloc_flags)) goto out_agbp_relse;
-#ifdef DEBUG - if (args->alloc_minlen_only) { + if (IS_ENABLED(CONFIG_XFS_DEBUG) && args->alloc_minlen_only) { int stat;
error = xfs_exact_minlen_extent_available(args, agbp, &stat); if (error || !stat) goto out_agbp_relse; } -#endif + /* * Make the freelist shorter if it's too long. * --- a/fs/xfs/libxfs/xfs_alloc.h +++ b/fs/xfs/libxfs/xfs_alloc.h @@ -53,11 +53,9 @@ typedef struct xfs_alloc_arg { int datatype; /* mask defining data type treatment */ char wasdel; /* set if allocation was prev delayed */ char wasfromfl; /* set if allocation is from freelist */ + bool alloc_minlen_only; /* allocate exact minlen extent */ struct xfs_owner_info oinfo; /* owner of blocks being allocated */ enum xfs_ag_resv_type resv; /* block reservation to use */ -#ifdef DEBUG - bool alloc_minlen_only; /* allocate exact minlen extent */ -#endif } xfs_alloc_arg_t;
/* --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -3388,7 +3388,6 @@ xfs_bmap_process_allocated_extent( xfs_bmap_btalloc_accounting(ap, args); }
-#ifdef DEBUG static int xfs_bmap_exact_minlen_extent_alloc( struct xfs_bmalloca *ap) @@ -3450,11 +3449,6 @@ xfs_bmap_exact_minlen_extent_alloc(
return 0; } -#else - -#define xfs_bmap_exact_minlen_extent_alloc(bma) (-EFSCORRUPTED) - -#endif
/* * If we are not low on available data blocks and we are allocating at
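The ifdef-to-constant-condition pattern used by this patch can be sketched in plain C. `MOCK_XFS_DEBUG` is a hypothetical compile-time constant standing in for `IS_ENABLED(CONFIG_XFS_DEBUG)`, and the `mock_*` helpers are illustrative only. The debug-only branch is always parsed and type-checked, and an optimizing compiler is expected to drop it when the constant is 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_XFS_DEBUG). */
#ifndef MOCK_XFS_DEBUG
#define MOCK_XFS_DEBUG 0
#endif

struct mock_alloc_arg {
	int  minlen;
	bool alloc_minlen_only;	/* now present in every build */
};

/* Debug-only check: is there a free extent of exactly minlen blocks? */
static bool mock_exact_minlen_available(const struct mock_alloc_arg *args,
					int free_extent_len)
{
	return free_extent_len == args->minlen;
}

/* Decide whether a (single-extent) allocation group is usable. */
static bool mock_ag_usable(const struct mock_alloc_arg *args,
			   int free_extent_len)
{
	assert(args->minlen > 0);

	/* Compiled in all builds, but dead code when MOCK_XFS_DEBUG is 0. */
	if (MOCK_XFS_DEBUG && args->alloc_minlen_only) {
		if (!mock_exact_minlen_available(args, free_extent_len))
			return false;
	}
	return free_extent_len >= args->minlen;
}
```

Because the helper is referenced from ordinary C rather than hidden behind `#ifdef`, a change that breaks its signature fails to compile in every configuration, which is the extra coverage the commit message describes.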
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit 405ee87c6938f67e6ab62a3f8f85b3c60a093886 upstream.
[backport: dependency of 6aac770]
xfs_bmap_exact_minlen_extent_alloc duplicates the args setup in xfs_bmap_btalloc. Switch to call it from xfs_bmap_btalloc after doing the basic setup.
Signed-off-by: Christoph Hellwig hch@lst.de
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Carlos Maiolino cem@kernel.org
Signed-off-by: Catherine Hoang catherine.hoang@oracle.com
Acked-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/libxfs/xfs_bmap.c |   61 ++++++++++-------------------------------------
 1 file changed, 13 insertions(+), 48 deletions(-)
--- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -3390,28 +3390,17 @@ xfs_bmap_process_allocated_extent(
static int xfs_bmap_exact_minlen_extent_alloc( - struct xfs_bmalloca *ap) + struct xfs_bmalloca *ap, + struct xfs_alloc_arg *args) { - struct xfs_mount *mp = ap->ip->i_mount; - struct xfs_alloc_arg args = { .tp = ap->tp, .mp = mp }; - xfs_fileoff_t orig_offset; - xfs_extlen_t orig_length; - int error; - - ASSERT(ap->length); - if (ap->minlen != 1) { - ap->blkno = NULLFSBLOCK; - ap->length = 0; + args->fsbno = NULLFSBLOCK; return 0; }
- orig_offset = ap->offset; - orig_length = ap->length; - - args.alloc_minlen_only = 1; - - xfs_bmap_compute_alignments(ap, &args); + args->alloc_minlen_only = 1; + args->minlen = args->maxlen = ap->minlen; + args->total = ap->total;
/* * Unlike the longest extent available in an AG, we don't track @@ -3421,33 +3410,9 @@ xfs_bmap_exact_minlen_extent_alloc( * we need not be concerned about a drop in performance in * "debug only" code paths. */ - ap->blkno = XFS_AGB_TO_FSB(mp, 0, 0); - - args.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE; - args.minlen = args.maxlen = ap->minlen; - args.total = ap->total; + ap->blkno = XFS_AGB_TO_FSB(ap->ip->i_mount, 0, 0);
- args.alignment = 1; - args.minalignslop = 0; - - args.minleft = ap->minleft; - args.wasdel = ap->wasdel; - args.resv = XFS_AG_RESV_NONE; - args.datatype = ap->datatype; - - error = xfs_alloc_vextent_first_ag(&args, ap->blkno); - if (error) - return error; - - if (args.fsbno != NULLFSBLOCK) { - xfs_bmap_process_allocated_extent(ap, &args, orig_offset, - orig_length); - } else { - ap->blkno = NULLFSBLOCK; - ap->length = 0; - } - - return 0; + return xfs_alloc_vextent_first_ag(args, ap->blkno); }
/* @@ -3706,8 +3671,11 @@ xfs_bmap_btalloc( /* Trim the allocation back to the maximum an AG can fit. */ args.maxlen = min(ap->length, mp->m_ag_max_usable);
- if ((ap->datatype & XFS_ALLOC_USERDATA) && - xfs_inode_is_filestream(ap->ip)) + if (unlikely(XFS_TEST_ERROR(false, mp, + XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT))) + error = xfs_bmap_exact_minlen_extent_alloc(ap, &args); + else if ((ap->datatype & XFS_ALLOC_USERDATA) && + xfs_inode_is_filestream(ap->ip)) error = xfs_bmap_btalloc_filestreams(ap, &args, stripe_align); else error = xfs_bmap_btalloc_best_length(ap, &args, stripe_align); @@ -4128,9 +4096,6 @@ xfs_bmapi_allocate( if ((bma->datatype & XFS_ALLOC_USERDATA) && XFS_IS_REALTIME_INODE(bma->ip)) error = xfs_bmap_rtalloc(bma); - else if (unlikely(XFS_TEST_ERROR(false, mp, - XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT))) - error = xfs_bmap_exact_minlen_extent_alloc(bma); else error = xfs_bmap_btalloc(bma); if (error)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit 6aac77059881e4419df499392c995bf02fb9630b upstream.
Currently the debug-only xfs_bmap_exact_minlen_extent_alloc allocation variant fails to drop into the lowmode last resort allocator, and thus can sometimes fail allocations for which the caller has a transaction block reservation.
Fix this by using xfs_bmap_btalloc_low_space to do the actual allocation.
Signed-off-by: Christoph Hellwig hch@lst.de
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Carlos Maiolino cem@kernel.org
Signed-off-by: Catherine Hoang catherine.hoang@oracle.com
Acked-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/libxfs/xfs_bmap.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -3412,7 +3412,13 @@ xfs_bmap_exact_minlen_extent_alloc( */ ap->blkno = XFS_AGB_TO_FSB(ap->ip->i_mount, 0, 0);
- return xfs_alloc_vextent_first_ag(args, ap->blkno); + /* + * Call xfs_bmap_btalloc_low_space here as it first does a "normal" AG + * iteration and then drops args->total to args->minlen, which might be + * required to find an allocation for the transaction reservation when + * the file system is very full. + */ + return xfs_bmap_btalloc_low_space(ap, args); }
/*
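The two-pass behaviour described in the new comment can be sketched roughly as below. This is a heavily simplified model, not the real allocator: `mock_scan_ags` and `mock_alloc_low_space` are hypothetical, each AG is reduced to a single free-block count, and the only rule modelled is "retry with total dropped to minlen".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the allocation arguments. */
struct mock_alloc_args {
	unsigned int minlen;	/* smallest acceptable extent */
	unsigned int total;	/* blocks the transaction expects to need */
};

/* Return the first AG with at least args->total free blocks, or -1. */
static int mock_scan_ags(const unsigned int *ag_free, size_t nags,
			 const struct mock_alloc_args *args)
{
	for (size_t i = 0; i < nags; i++)
		if (ag_free[i] >= args->total)
			return (int)i;
	return -1;
}

/*
 * First try a normal scan; if that finds nothing, drop total to minlen
 * and scan again as a last resort.
 */
static int mock_alloc_low_space(const unsigned int *ag_free, size_t nags,
				struct mock_alloc_args *args)
{
	int ag;

	assert(args->minlen <= args->total);

	ag = mock_scan_ags(ag_free, nags, args);
	if (ag >= 0)
		return ag;

	args->total = args->minlen;
	return mock_scan_ags(ag_free, nags, args);
}
```

On a nearly full file system in this model, the first pass fails because no AG can satisfy the full `total`, while the second pass still finds room for the `minlen` extent that the reservation was meant to cover.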
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Uros Bizjak ubizjak@gmail.com
commit 20195d011c840b01fa91a85ebcd099ca95fbf8fc upstream.
Use !try_cmpxchg instead of cmpxchg(*ptr, old, new) != old in xlog_cil_insert_pcp_aggregate(). The x86 CMPXCHG instruction returns success in the ZF flag, so this change saves a compare after cmpxchg.

Also, try_cmpxchg implicitly assigns the old *ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop.
Note that the value from *ptr should be read using READ_ONCE to prevent the compiler from merging, refetching or reordering the read.
No functional change intended.
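For readers who want to try the pattern outside the kernel, here is a minimal userspace sketch. It assumes C11 atomic_compare_exchange_strong() as a stand-in for try_cmpxchg(); both return a success flag and write the observed value back into the "expected" argument on failure. drain_counter() is a hypothetical name, not a kernel function.

```c
#include <stdatomic.h>

/*
 * Userspace analogue of the loop in xlog_cil_insert_pcp_aggregate().
 * On failure, atomic_compare_exchange_strong() stores the value it
 * observed into 'old', so the loop body does not need to re-read the
 * counter before retrying.
 */
static int drain_counter(atomic_int *counter)
{
	/* Read once up front (the kernel code uses READ_ONCE() here). */
	int old = atomic_load_explicit(counter, memory_order_relaxed);

	/* Swap in 0; on failure 'old' is refreshed and we simply retry. */
	while (!atomic_compare_exchange_strong(counter, &old, 0))
		;

	return old;	/* the amount that was drained */
}
```

The caller adds the returned value to its running total, which mirrors "count += old" in the patch.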
Signed-off-by: Uros Bizjak ubizjak@gmail.com Reviewed-by: Christoph Hellwig hch@infradead.org Cc: Chandan Babu R chandan.babu@oracle.com Cc: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Carlos Maiolino cem@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_log_cil.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
--- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -156,7 +156,6 @@ xlog_cil_insert_pcp_aggregate( struct xfs_cil *cil, struct xfs_cil_ctx *ctx) { - struct xlog_cil_pcp *cilpcp; int cpu; int count = 0;
@@ -171,13 +170,11 @@ xlog_cil_insert_pcp_aggregate( * structures that could have a nonzero space_used. */ for_each_cpu(cpu, &ctx->cil_pcpmask) { - int old, prev; + struct xlog_cil_pcp *cilpcp = per_cpu_ptr(cil->xc_pcp, cpu); + int old = READ_ONCE(cilpcp->space_used);
- cilpcp = per_cpu_ptr(cil->xc_pcp, cpu); - do { - old = cilpcp->space_used; - prev = cmpxchg(&cilpcp->space_used, old, 0); - } while (old != prev); + while (!try_cmpxchg(&cilpcp->space_used, &old, 0)) + ; count += old; } atomic_add(count, &ctx->space_used);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Zekun zhangzekun11@huawei.com
commit f6225eebd76f371dab98b4d1c1a7c1e255190aef upstream.
The definition of xfs_attr_use_log_assist() was removed by commit d9c61ccb3b09 ("xfs: move xfs_attr_use_log_assist out of xfs_log.c"). So remove the stale declaration from the header file.
Signed-off-by: Zhang Zekun zhangzekun11@huawei.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Carlos Maiolino cem@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_log.h | 1 - 1 file changed, 1 deletion(-)
--- a/fs/xfs/xfs_log.h +++ b/fs/xfs/xfs_log.h @@ -161,6 +161,5 @@ bool xlog_force_shutdown(struct xlog *
void xlog_use_incompat_feat(struct xlog *log); void xlog_drop_incompat_feat(struct xlog *log); -int xfs_attr_use_log_assist(struct xfs_mount *mp);
#endif /* __XFS_LOG_H__ */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit 82742f8c3f1a93787a05a00aca50c2a565231f84 upstream.
[backport: dependency of 6a18765b]
Currently only the new agcount is passed to xfs_initialize_perag, which requires lookups of existing AGs to skip them and complicates error handling. Also pass the previous agcount so that the range that xfs_initialize_perag operates on is exactly defined. That way the extra lookups can be avoided, and error handling can clean up the exact range from the old count to the last added perag structure.
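The range-based interface can be illustrated with a small standalone model. This is only a sketch under stated assumptions: ag_table, init_ag_range(), free_ag_range() and the fail_at test hook are hypothetical and stand in for the perag lookup structure, xfs_initialize_perag() and xfs_free_unused_perag_range().

```c
#include <errno.h>
#include <stdlib.h>

#define MAX_AGS 16

struct ag {
	unsigned int index;
};

/* Hypothetical table standing in for the per-AG lookup structure. */
static struct ag *ag_table[MAX_AGS];

/* Free the entries in the half-open range [start, end). */
static void free_ag_range(unsigned int start, unsigned int end)
{
	for (unsigned int i = start; i < end; i++) {
		free(ag_table[i]);
		ag_table[i] = NULL;
	}
}

/*
 * Create entries for exactly [old_count, new_count).  Existing entries
 * below old_count are never looked up or touched.  fail_at simulates an
 * allocation failure at that index; pass MAX_AGS to disable it.
 */
static int init_ag_range(unsigned int old_count, unsigned int new_count,
			 unsigned int fail_at)
{
	unsigned int index;

	if (new_count > MAX_AGS)
		return -EINVAL;

	for (index = old_count; index < new_count; index++) {
		struct ag *ag = NULL;

		if (index != fail_at)
			ag = calloc(1, sizeof(*ag));
		if (!ag) {
			/* Unwind only what this call added so far. */
			free_ag_range(old_count, index);
			return -ENOMEM;
		}
		ag->index = index;
		ag_table[index] = ag;
	}
	return 0;
}
```

Mount corresponds to old_count == 0, growfs to old_count == the current count; when both counts are equal the loop body never runs.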
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Brian Foster bfoster@redhat.com Signed-off-by: Carlos Maiolino cem@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_ag.c | 28 ++++++---------------------- fs/xfs/libxfs/xfs_ag.h | 5 +++-- fs/xfs/xfs_fsops.c | 18 ++++++++---------- fs/xfs/xfs_log_recover.c | 5 +++-- fs/xfs/xfs_mount.c | 4 ++-- 5 files changed, 22 insertions(+), 38 deletions(-)
--- a/fs/xfs/libxfs/xfs_ag.c +++ b/fs/xfs/libxfs/xfs_ag.c @@ -360,27 +360,16 @@ xfs_free_unused_perag_range( int xfs_initialize_perag( struct xfs_mount *mp, - xfs_agnumber_t agcount, + xfs_agnumber_t old_agcount, + xfs_agnumber_t new_agcount, xfs_rfsblock_t dblocks, xfs_agnumber_t *maxagi) { struct xfs_perag *pag; xfs_agnumber_t index; - xfs_agnumber_t first_initialised = NULLAGNUMBER; int error;
- /* - * Walk the current per-ag tree so we don't try to initialise AGs - * that already exist (growfs case). Allocate and insert all the - * AGs we don't find ready for initialisation. - */ - for (index = 0; index < agcount; index++) { - pag = xfs_perag_get(mp, index); - if (pag) { - xfs_perag_put(pag); - continue; - } - + for (index = old_agcount; index < new_agcount; index++) { pag = kmem_zalloc(sizeof(*pag), KM_MAYFAIL); if (!pag) { error = -ENOMEM; @@ -425,21 +414,17 @@ xfs_initialize_perag( /* Active ref owned by mount indicates AG is online. */ atomic_set(&pag->pag_active_ref, 1);
- /* first new pag is fully initialized */ - if (first_initialised == NULLAGNUMBER) - first_initialised = index; - /* * Pre-calculated geometry */ - pag->block_count = __xfs_ag_block_count(mp, index, agcount, + pag->block_count = __xfs_ag_block_count(mp, index, new_agcount, dblocks); pag->min_block = XFS_AGFL_BLOCK(mp); __xfs_agino_range(mp, pag->block_count, &pag->agino_min, &pag->agino_max); }
- index = xfs_set_inode_alloc(mp, agcount); + index = xfs_set_inode_alloc(mp, new_agcount);
if (maxagi) *maxagi = index; @@ -455,8 +440,7 @@ out_remove_pag: out_free_pag: kmem_free(pag); out_unwind_new_pags: - /* unwind any prior newly initialized pags */ - xfs_free_unused_perag_range(mp, first_initialised, agcount); + xfs_free_unused_perag_range(mp, old_agcount, index); return error; }
--- a/fs/xfs/libxfs/xfs_ag.h +++ b/fs/xfs/libxfs/xfs_ag.h @@ -135,8 +135,9 @@ __XFS_AG_OPSTATE(agfl_needs_reset, AGFL_
void xfs_free_unused_perag_range(struct xfs_mount *mp, xfs_agnumber_t agstart, xfs_agnumber_t agend); -int xfs_initialize_perag(struct xfs_mount *mp, xfs_agnumber_t agcount, - xfs_rfsblock_t dcount, xfs_agnumber_t *maxagi); +int xfs_initialize_perag(struct xfs_mount *mp, xfs_agnumber_t old_agcount, + xfs_agnumber_t agcount, xfs_rfsblock_t dcount, + xfs_agnumber_t *maxagi); int xfs_initialize_perag_data(struct xfs_mount *mp, xfs_agnumber_t agno); void xfs_free_perag(struct xfs_mount *mp);
--- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -87,6 +87,7 @@ xfs_growfs_data_private( struct xfs_mount *mp, /* mount point for filesystem */ struct xfs_growfs_data *in) /* growfs data input struct */ { + xfs_agnumber_t oagcount = mp->m_sb.sb_agcount; struct xfs_buf *bp; int error; xfs_agnumber_t nagcount; @@ -94,7 +95,6 @@ xfs_growfs_data_private( xfs_rfsblock_t nb, nb_div, nb_mod; int64_t delta; bool lastag_extended = false; - xfs_agnumber_t oagcount; struct xfs_trans *tp; struct aghdr_init_data id = {}; struct xfs_perag *last_pag; @@ -138,16 +138,14 @@ xfs_growfs_data_private( if (delta == 0) return 0;
- oagcount = mp->m_sb.sb_agcount; - /* allocate the new per-ag structures */ - if (nagcount > oagcount) { - error = xfs_initialize_perag(mp, nagcount, nb, &nagimax); - if (error) - return error; - } else if (nagcount < oagcount) { - /* TODO: shrinking the entire AGs hasn't yet completed */ + /* TODO: shrinking the entire AGs hasn't yet completed */ + if (nagcount < oagcount) return -EINVAL; - } + + /* allocate the new per-ag structures */ + error = xfs_initialize_perag(mp, oagcount, nagcount, nb, &nagimax); + if (error) + return error;
if (delta > 0) error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata, --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -3317,6 +3317,7 @@ xlog_do_recover( struct xfs_mount *mp = log->l_mp; struct xfs_buf *bp = mp->m_sb_bp; struct xfs_sb *sbp = &mp->m_sb; + xfs_agnumber_t orig_agcount = sbp->sb_agcount; int error;
trace_xfs_log_recover(log, head_blk, tail_blk); @@ -3365,8 +3366,8 @@ xlog_do_recover( /* re-initialise in-core superblock and geometry structures */ mp->m_features |= xfs_sb_version_to_features(sbp); xfs_reinit_percpu_counters(mp); - error = xfs_initialize_perag(mp, sbp->sb_agcount, sbp->sb_dblocks, - &mp->m_maxagi); + error = xfs_initialize_perag(mp, orig_agcount, sbp->sb_agcount, + sbp->sb_dblocks, &mp->m_maxagi); if (error) { xfs_warn(mp, "Failed post-recovery per-ag init: %d", error); return error; --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -797,8 +797,8 @@ xfs_mountfs( /* * Allocate and initialize the per-ag data. */ - error = xfs_initialize_perag(mp, sbp->sb_agcount, mp->m_sb.sb_dblocks, - &mp->m_maxagi); + error = xfs_initialize_perag(mp, 0, sbp->sb_agcount, + mp->m_sb.sb_dblocks, &mp->m_maxagi); if (error) { xfs_warn(mp, "Failed per-ag init: %d", error); goto out_free_dir;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit 6a18765b54e2e52aebcdb84c3b4f4d1f7cb2c0ca upstream.
Primary superblock buffers that change the file system geometry after a growfs operation can affect the operation of later CIL checkpoints that make use of the newly added space and allocation groups.
Apply the changes to the in-memory structures as part of recovery pass 2, to ensure recovery works fine for such cases.
In the future we should apply the logic to other updates such as features bits as well.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Carlos Maiolino cem@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_buf_item_recover.c | 52 ++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_log_recover.c | 8 ------ 2 files changed, 52 insertions(+), 8 deletions(-)
--- a/fs/xfs/xfs_buf_item_recover.c +++ b/fs/xfs/xfs_buf_item_recover.c @@ -22,6 +22,9 @@ #include "xfs_inode.h" #include "xfs_dir2.h" #include "xfs_quota.h" +#include "xfs_alloc.h" +#include "xfs_ag.h" +#include "xfs_sb.h"
/* * This is the number of entries in the l_buf_cancel_table used during @@ -685,6 +688,49 @@ xlog_recover_do_inode_buffer( }
/* + * Update the in-memory superblock and perag structures from the primary SB + * buffer. + * + * This is required because transactions running after growfs may require the + * updated values to be set in a previous fully commit transaction. + */ +static int +xlog_recover_do_primary_sb_buffer( + struct xfs_mount *mp, + struct xlog_recover_item *item, + struct xfs_buf *bp, + struct xfs_buf_log_format *buf_f, + xfs_lsn_t current_lsn) +{ + struct xfs_dsb *dsb = bp->b_addr; + xfs_agnumber_t orig_agcount = mp->m_sb.sb_agcount; + int error; + + xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn); + + /* + * Update the in-core super block from the freshly recovered on-disk one. + */ + xfs_sb_from_disk(&mp->m_sb, dsb); + + /* + * Initialize the new perags, and also update various block and inode + * allocator setting based off the number of AGs or total blocks. + * Because of the latter this also needs to happen if the agcount did + * not change. + */ + error = xfs_initialize_perag(mp, orig_agcount, + mp->m_sb.sb_agcount, mp->m_sb.sb_dblocks, + &mp->m_maxagi); + if (error) { + xfs_warn(mp, "Failed recovery per-ag init: %d", error); + return error; + } + mp->m_alloc_set_aside = xfs_alloc_set_aside(mp); + return 0; +} + +/* * V5 filesystems know the age of the buffer on disk being recovered. 
We can * have newer objects on disk than we are replaying, and so for these cases we * don't want to replay the current change as that will make the buffer contents @@ -967,6 +1013,12 @@ xlog_recover_buf_commit_pass2( dirty = xlog_recover_do_dquot_buffer(mp, log, item, bp, buf_f); if (!dirty) goto out_release; + } else if ((xfs_blft_from_flags(buf_f) & XFS_BLFT_SB_BUF) && + xfs_buf_daddr(bp) == 0) { + error = xlog_recover_do_primary_sb_buffer(mp, item, bp, buf_f, + current_lsn); + if (error) + goto out_release; } else { xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn); } --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -3317,7 +3317,6 @@ xlog_do_recover( struct xfs_mount *mp = log->l_mp; struct xfs_buf *bp = mp->m_sb_bp; struct xfs_sb *sbp = &mp->m_sb; - xfs_agnumber_t orig_agcount = sbp->sb_agcount; int error;
trace_xfs_log_recover(log, head_blk, tail_blk); @@ -3366,13 +3365,6 @@ xlog_do_recover( /* re-initialise in-core superblock and geometry structures */ mp->m_features |= xfs_sb_version_to_features(sbp); xfs_reinit_percpu_counters(mp); - error = xfs_initialize_perag(mp, orig_agcount, sbp->sb_agcount, - sbp->sb_dblocks, &mp->m_maxagi); - if (error) { - xfs_warn(mp, "Failed post-recovery per-ag init: %d", error); - return error; - } - mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
/* Normal transactions can now occur */ clear_bit(XLOG_ACTIVE_RECOVERY, &log->l_opstate);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit b882b0f8138ffa935834e775953f1630f89bbb62 upstream.
XFS currently does not support reducing the agcount, so error out if a logged sb buffer tries to shrink the agcount.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Carlos Maiolino cem@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_buf_item_recover.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/fs/xfs/xfs_buf_item_recover.c +++ b/fs/xfs/xfs_buf_item_recover.c @@ -713,6 +713,11 @@ xlog_recover_do_primary_sb_buffer( */ xfs_sb_from_disk(&mp->m_sb, dsb);
+ if (mp->m_sb.sb_agcount < orig_agcount) { + xfs_alert(mp, "Shrinking AG count in log recovery not supported"); + return -EFSCORRUPTED; + } + /* * Initialize the new perags, and also update various block and inode * allocator setting based off the number of AGs or total blocks.
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit 069cf5e32b700f94c6ac60f6171662bdfb04f325 upstream.
[backport: uses kmem_zalloc instead of kzalloc]
__GFP_RETRY_MAYFAIL increases the likelihood of allocation failures, which isn't really helpful during log recovery. Remove the flag and stick to the default GFP_KERNEL policies.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Carlos Maiolino cem@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_ag.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/xfs/libxfs/xfs_ag.c +++ b/fs/xfs/libxfs/xfs_ag.c @@ -370,7 +370,7 @@ xfs_initialize_perag( int error;
for (index = old_agcount; index < new_agcount; index++) { - pag = kmem_zalloc(sizeof(*pag), KM_MAYFAIL); + pag = kmem_zalloc(sizeof(*pag), 0); if (!pag) { error = -ENOMEM; goto out_unwind_new_pags;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit 4a201dcfa1ff0dcfe4348c40f3ad8bd68b97eb6c upstream.
Currently log recovery never updates the in-core perag values for the last allocation group when they were grown by growfs. This leads to btree record validation failures for the alloc, ialloc or finobt trees if a transaction references this new space.
Found by Brian's new growfs recovery stress test.
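To see why the last existing AG needs refreshing, consider how its length depends on the total block count. The sketch below is a simplified model of that calculation, not the kernel helper itself; ag_block_count() and its flat parameter list are assumptions made for illustration, and agcount is assumed to be at least 1.

```c
#include <stdint.h>

/*
 * Every AG except the last one is agblocks long; the last AG gets
 * whatever remains of dblocks.  Growing the filesystem can therefore
 * change the length of the AG that used to be last.
 */
static uint32_t ag_block_count(uint32_t agno, uint32_t agcount,
			       uint32_t agblocks, uint64_t dblocks)
{
	if (agno < agcount - 1)
		return agblocks;
	return (uint32_t)(dblocks - (uint64_t)agno * agblocks);
}
```

With agblocks = 1000, a 4-AG filesystem of 3500 blocks has a 500-block AG 3. After growing to 5200 blocks and 6 AGs, AG 3 is 1000 blocks long, so a cached length of 500 would reject valid records in the new space.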
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Signed-off-by: Carlos Maiolino cem@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_ag.c | 17 +++++++++++++++++ fs/xfs/libxfs/xfs_ag.h | 1 + fs/xfs/xfs_buf_item_recover.c | 19 ++++++++++++++++--- 3 files changed, 34 insertions(+), 3 deletions(-)
--- a/fs/xfs/libxfs/xfs_ag.c +++ b/fs/xfs/libxfs/xfs_ag.c @@ -358,6 +358,23 @@ xfs_free_unused_perag_range( }
int +xfs_update_last_ag_size( + struct xfs_mount *mp, + xfs_agnumber_t prev_agcount) +{ + struct xfs_perag *pag = xfs_perag_grab(mp, prev_agcount - 1); + + if (!pag) + return -EFSCORRUPTED; + pag->block_count = __xfs_ag_block_count(mp, prev_agcount - 1, + mp->m_sb.sb_agcount, mp->m_sb.sb_dblocks); + __xfs_agino_range(mp, pag->block_count, &pag->agino_min, + &pag->agino_max); + xfs_perag_rele(pag); + return 0; +} + +int xfs_initialize_perag( struct xfs_mount *mp, xfs_agnumber_t old_agcount, --- a/fs/xfs/libxfs/xfs_ag.h +++ b/fs/xfs/libxfs/xfs_ag.h @@ -140,6 +140,7 @@ int xfs_initialize_perag(struct xfs_moun xfs_agnumber_t *maxagi); int xfs_initialize_perag_data(struct xfs_mount *mp, xfs_agnumber_t agno); void xfs_free_perag(struct xfs_mount *mp); +int xfs_update_last_ag_size(struct xfs_mount *mp, xfs_agnumber_t prev_agcount);
/* Passive AG references */ struct xfs_perag *xfs_perag_get(struct xfs_mount *mp, xfs_agnumber_t agno); --- a/fs/xfs/xfs_buf_item_recover.c +++ b/fs/xfs/xfs_buf_item_recover.c @@ -708,6 +708,11 @@ xlog_recover_do_primary_sb_buffer(
xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
+ if (orig_agcount == 0) { + xfs_alert(mp, "Trying to grow file system without AGs"); + return -EFSCORRUPTED; + } + /* * Update the in-core super block from the freshly recovered on-disk one. */ @@ -719,14 +724,22 @@ xlog_recover_do_primary_sb_buffer( }
/* + * Growfs can also grow the last existing AG. In this case we also need + * to update the length in the in-core perag structure and values + * depending on it. + */ + error = xfs_update_last_ag_size(mp, orig_agcount); + if (error) + return error; + + /* * Initialize the new perags, and also update various block and inode * allocator setting based off the number of AGs or total blocks. * Because of the latter this also needs to happen if the agcount did * not change. */ - error = xfs_initialize_perag(mp, orig_agcount, - mp->m_sb.sb_agcount, mp->m_sb.sb_dblocks, - &mp->m_maxagi); + error = xfs_initialize_perag(mp, orig_agcount, mp->m_sb.sb_agcount, + mp->m_sb.sb_dblocks, &mp->m_maxagi); if (error) { xfs_warn(mp, "Failed recovery per-ag init: %d", error); return error;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chi Zhiling chizhiling@kylinos.cn
commit 3ef22684038aa577c10972ee9c6a2455f5fac941 upstream.
Recently, we found that the CPU spent a lot of time in xfs_alloc_ag_vextent_size when the filesystem has millions of fragmented spaces.
The reason is that we conducted much extra searching for extents that could not yield a better result, and these searches would cost a lot of time when there were millions of extents to search through. Even if we get the same result length, we don't switch our choice to the new one, so we can definitely terminate the search early.
Since the result length cannot exceed the found length, when the found length equals the best result length we already have, we can conclude the search.
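The argument can be checked with a small standalone model. This is a sketch under assumptions, not the allocator code: the extents are assumed to be visited from longest to shortest, alignment is modelled by rounding the start block up to a nonzero 'align', and all names are hypothetical.

```c
#include <stddef.h>

struct extent {
	unsigned int start;
	unsigned int len;
};

/* Usable length once the start is rounded up to 'align' (align != 0). */
static unsigned int aligned_len(const struct extent *e, unsigned int align)
{
	unsigned int pad = (align - e->start % align) % align;

	return e->len > pad ? e->len - pad : 0;
}

/*
 * Walk extents sorted by descending length and return the best usable
 * length.  The usable length can never exceed the raw length, so once
 * the raw length is <= the best result so far, no later extent can
 * improve on it and the walk stops.  *visited reports how many extents
 * were actually examined.
 */
static unsigned int best_aligned_len(const struct extent *ext, size_t n,
				     unsigned int align, size_t *visited)
{
	unsigned int best = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int rlen;

		if (ext[i].len <= best)	/* '<' here would keep scanning ties */
			break;
		rlen = aligned_len(&ext[i], align);
		if (rlen > best)
			best = rlen;
	}
	*visited = i;
	return best;
}
```

With millions of equally sized extents, the "<=" comparison ends the walk right after the first fully usable one, while "<" would visit every tie.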
We did a test in that filesystem:
[root@localhost ~]# xfs_db -c freesp /dev/vdb
   from      to extents  blocks    pct
      1       1     215     215   0.01
      2       3  994476 1988952  99.99

Before this patch:
 0)               |  xfs_alloc_ag_vextent_size [xfs]() {
 0) * 15597.94 us |  }

After this patch:
 0)               |  xfs_alloc_ag_vextent_size [xfs]() {
 0)   19.176 us   |  }
Signed-off-by: Chi Zhiling chizhiling@kylinos.cn Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Carlos Maiolino cem@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -1783,7 +1783,7 @@ restart: error = -EFSCORRUPTED; goto error0; } - if (flen < bestrlen) + if (flen <= bestrlen) break; busy = xfs_alloc_compute_aligned(args, fbno, flen, &rbno, &rlen, &busy_gen);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit 81a1e1c32ef474c20ccb9f730afe1ac25b1c62a4 upstream.
Directly return the error from xfs_bmap_longest_free_extent instead of breaking from the loop and handling it there, and use a done label to directly jump to the exit when we found a suitable perag structure to reduce the indentation level and pag/max_pag check complexity in the tail of the function.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Carlos Maiolino cem@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_filestream.c | 96 +++++++++++++++++++++++------------------------- 1 file changed, 46 insertions(+), 50 deletions(-)
--- a/fs/xfs/xfs_filestream.c +++ b/fs/xfs/xfs_filestream.c @@ -67,22 +67,28 @@ xfs_filestream_pick_ag( xfs_extlen_t minfree, maxfree = 0; xfs_agnumber_t agno; bool first_pass = true; - int err;
/* 2% of an AG's blocks must be free for it to be chosen. */ minfree = mp->m_sb.sb_agblocks / 50;
restart: for_each_perag_wrap(mp, start_agno, agno, pag) { + int err; + trace_xfs_filestream_scan(pag, pino); + *longest = 0; err = xfs_bmap_longest_free_extent(pag, NULL, longest); if (err) { - if (err != -EAGAIN) - break; - /* Couldn't lock the AGF, skip this AG. */ - err = 0; - continue; + if (err == -EAGAIN) { + /* Couldn't lock the AGF, skip this AG. */ + err = 0; + continue; + } + xfs_perag_rele(pag); + if (max_pag) + xfs_perag_rele(max_pag); + return err; }
/* Keep track of the AG with the most free blocks. */ @@ -107,7 +113,9 @@ restart: !(flags & XFS_PICK_USERDATA) || (flags & XFS_PICK_LOWSPACE))) { /* Break out, retaining the reference on the AG. */ - break; + if (max_pag) + xfs_perag_rele(max_pag); + goto done; } }
@@ -115,56 +123,44 @@ restart: atomic_dec(&pag->pagf_fstrms); }
- if (err) { - xfs_perag_rele(pag); - if (max_pag) - xfs_perag_rele(max_pag); - return err; + /* + * Allow a second pass to give xfs_bmap_longest_free_extent() another + * attempt at locking AGFs that it might have skipped over before we + * fail. + */ + if (first_pass) { + first_pass = false; + goto restart; }
- if (!pag) { - /* - * Allow a second pass to give xfs_bmap_longest_free_extent() - * another attempt at locking AGFs that it might have skipped - * over before we fail. - */ - if (first_pass) { - first_pass = false; - goto restart; - } - - /* - * We must be low on data space, so run a final lowspace - * optimised selection pass if we haven't already. - */ - if (!(flags & XFS_PICK_LOWSPACE)) { - flags |= XFS_PICK_LOWSPACE; - goto restart; - } - - /* - * No unassociated AGs are available, so select the AG with the - * most free space, regardless of whether it's already in use by - * another filestream. It none suit, just use whatever AG we can - * grab. - */ - if (!max_pag) { - for_each_perag_wrap(args->mp, 0, start_agno, pag) { - max_pag = pag; - break; - } + /* + * We must be low on data space, so run a final lowspace optimised + * selection pass if we haven't already. + */ + if (!(flags & XFS_PICK_LOWSPACE)) { + flags |= XFS_PICK_LOWSPACE; + goto restart; + }
- /* Bail if there are no AGs at all to select from. */ - if (!max_pag) - return -ENOSPC; + /* + * No unassociated AGs are available, so select the AG with the most + * free space, regardless of whether it's already in use by another + * filestream. It none suit, just use whatever AG we can grab. + */ + if (!max_pag) { + for_each_perag_wrap(args->mp, 0, start_agno, pag) { + max_pag = pag; + break; }
- pag = max_pag; - atomic_inc(&pag->pagf_fstrms); - } else if (max_pag) { - xfs_perag_rele(max_pag); + /* Bail if there are no AGs at all to select from. */ + if (!max_pag) + return -ENOSPC; }
+ pag = max_pag; + atomic_inc(&pag->pagf_fstrms); +done: trace_xfs_filestream_pick(pag, pino); args->pag = pag; return 0;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ojaswin Mujoo ojaswin@linux.ibm.com
commit 2a492ff66673c38a77d0815d67b9a8cce2ef57f8 upstream.
Extsize should only be allowed to be set on files with no data in them. For this, we check whether the file has extents but fail to check whether delayed extents are present. This patch adds that check.
While we are at it, also refactor this check into a helper, since it is used in other places as well, such as xfs_inactive() and xfs_ioctl_setattr_xflags().
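The effect of the extra condition can be modelled in a few lines. This is only an illustrative sketch: struct inode_model and both functions are hypothetical stand-ins for the inode fields and the extsize check touched by the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced model of the inode fields the check looks at. */
struct inode_model {
	unsigned int nextents;		/* extents already mapped on disk */
	unsigned int delayed_blks;	/* delalloc blocks not yet mapped */
	unsigned int extsize;		/* current extent size hint */
};

/* A file has data if it has real extents or pending delayed blocks. */
static bool inode_has_filedata(const struct inode_model *ip)
{
	return ip->nextents > 0 || ip->delayed_blks > 0;
}

/* Changing the hint is only allowed while the file holds no data. */
static int check_extsize_change(const struct inode_model *ip,
				unsigned int new_extsize)
{
	if (inode_has_filedata(ip) && ip->extsize != new_extsize)
		return -EINVAL;
	return 0;
}
```

A buffered write that has not been written back yet leaves nextents at 0 but delayed_blks nonzero, which is the case the old check missed.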
**Without the patch (SUCCEEDS)**
$ xfs_io -c 'open -f testfile' -c 'pwrite 0 1024' -c 'extsize 65536'
wrote 1024/1024 bytes at offset 0
1 KiB, 1 ops; 0.0002 sec (4.628 MiB/sec and 4739.3365 ops/sec)
**With the patch (FAILS as expected)**
$ xfs_io -c 'open -f testfile' -c 'pwrite 0 1024' -c 'extsize 65536'
wrote 1024/1024 bytes at offset 0
1 KiB, 1 ops; 0.0002 sec (4.628 MiB/sec and 4739.3365 ops/sec)
xfs_io: FS_IOC_FSSETXATTR testfile: Invalid argument
Fixes: e94af02a9cd7 ("[XFS] fix old xfs_setattr mis-merge from irix; mostly harmless esp if not using xfs rt") Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Reviewed-by: John Garry john.g.garry@oracle.com Signed-off-by: Ojaswin Mujoo ojaswin@linux.ibm.com Signed-off-by: Carlos Maiolino cem@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_inode.c | 2 +- fs/xfs/xfs_inode.h | 5 +++++ fs/xfs/xfs_ioctl.c | 4 ++-- 3 files changed, 8 insertions(+), 3 deletions(-)
--- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1758,7 +1758,7 @@ xfs_inactive(
if (S_ISREG(VFS_I(ip)->i_mode) && (ip->i_disk_size != 0 || XFS_ISIZE(ip) != 0 || - ip->i_df.if_nextents > 0 || ip->i_delayed_blks > 0)) + xfs_inode_has_filedata(ip))) truncate = 1;
if (xfs_iflags_test(ip, XFS_IQUOTAUNCHECKED)) { --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -286,6 +286,11 @@ static inline bool xfs_is_metadata_inode xfs_is_quota_inode(&mp->m_sb, ip->i_ino); }
+static inline bool xfs_inode_has_filedata(const struct xfs_inode *ip) +{ + return ip->i_df.if_nextents > 0 || ip->i_delayed_blks > 0; +} + /* * Check if an inode has any data in the COW fork. This might be often false * even for inodes with the reflink flag when there is no pending COW operation. --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1126,7 +1126,7 @@ xfs_ioctl_setattr_xflags(
if (rtflag != XFS_IS_REALTIME_INODE(ip)) { /* Can't change realtime flag if any extents are allocated. */ - if (ip->i_df.if_nextents || ip->i_delayed_blks) + if (xfs_inode_has_filedata(ip)) return -EINVAL;
/* @@ -1247,7 +1247,7 @@ xfs_ioctl_setattr_check_extsize( if (!fa->fsx_valid) return 0;
- if (S_ISREG(VFS_I(ip)->i_mode) && ip->i_df.if_nextents && + if (S_ISREG(VFS_I(ip)->i_mode) && xfs_inode_has_filedata(ip) && XFS_FSB_TO_B(mp, ip->i_extsize) != fa->fsx_extsize) return -EINVAL;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
[ Upstream commit 9a17ebfea9d0c7e0bb7409dcf655bf982a5d6e52 ]
On the data device, calling statvfs on a projinherit directory results in the block and avail counts being curtailed to the project quota block limits, if any are set. Do the same for realtime files or directories, but use the project quota rt block limits instead.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Stable-dep-of: 4b8d867ca6e2 ("xfs: don't over-report free space or inodes in statvfs") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/xfs_qm_bhv.c | 18 ++++++++++++------ fs/xfs/xfs_super.c | 11 +++++------ 2 files changed, 17 insertions(+), 12 deletions(-)
diff --git a/fs/xfs/xfs_qm_bhv.c b/fs/xfs/xfs_qm_bhv.c index b77673dd05581..268a07218c777 100644 --- a/fs/xfs/xfs_qm_bhv.c +++ b/fs/xfs/xfs_qm_bhv.c @@ -19,18 +19,24 @@ STATIC void xfs_fill_statvfs_from_dquot( struct kstatfs *statp, + struct xfs_inode *ip, struct xfs_dquot *dqp) { + struct xfs_dquot_res *blkres = &dqp->q_blk; uint64_t limit;
- limit = dqp->q_blk.softlimit ? - dqp->q_blk.softlimit : - dqp->q_blk.hardlimit; + if (XFS_IS_REALTIME_MOUNT(ip->i_mount) && + (ip->i_diflags & (XFS_DIFLAG_RTINHERIT | XFS_DIFLAG_REALTIME))) + blkres = &dqp->q_rtb; + + limit = blkres->softlimit ? + blkres->softlimit : + blkres->hardlimit; if (limit && statp->f_blocks > limit) { statp->f_blocks = limit; statp->f_bfree = statp->f_bavail = - (statp->f_blocks > dqp->q_blk.reserved) ? - (statp->f_blocks - dqp->q_blk.reserved) : 0; + (statp->f_blocks > blkres->reserved) ? + (statp->f_blocks - blkres->reserved) : 0; }
limit = dqp->q_ino.softlimit ? @@ -61,7 +67,7 @@ xfs_qm_statvfs( struct xfs_dquot *dqp;
if (!xfs_qm_dqget(mp, ip->i_projid, XFS_DQTYPE_PROJ, false, &dqp)) { - xfs_fill_statvfs_from_dquot(statp, dqp); + xfs_fill_statvfs_from_dquot(statp, ip, dqp); xfs_qm_dqput(dqp); } } diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 13007b6bc9f33..a726fbba49e40 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -878,12 +878,6 @@ xfs_fs_statfs( ffree = statp->f_files - (icount - ifree); statp->f_ffree = max_t(int64_t, ffree, 0);
- - if ((ip->i_diflags & XFS_DIFLAG_PROJINHERIT) && - ((mp->m_qflags & (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))) == - (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD)) - xfs_qm_statvfs(ip, statp); - if (XFS_IS_REALTIME_MOUNT(mp) && (ip->i_diflags & (XFS_DIFLAG_RTINHERIT | XFS_DIFLAG_REALTIME))) { s64 freertx; @@ -893,6 +887,11 @@ xfs_fs_statfs( statp->f_bavail = statp->f_bfree = freertx * sbp->sb_rextsize; }
+ if ((ip->i_diflags & XFS_DIFLAG_PROJINHERIT) && + ((mp->m_qflags & (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))) == + (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD)) + xfs_qm_statvfs(ip, statp); + return 0; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
[ Upstream commit 4b8d867ca6e2fc6d152f629fdaf027053b81765a ]
Emmanuel Florac reports a strange occurrence when project quota limits are enabled, free space is lower than the remaining quota, and someone runs statvfs:
  # mkfs.xfs -f /dev/sda
  # mount /dev/sda /mnt -o prjquota
  # xfs_quota -x -c 'limit -p bhard=2G 55' /mnt
  # mkdir /mnt/dir
  # xfs_io -c 'chproj 55' -c 'chattr +P' -c 'stat -vvvv' /mnt/dir
  # fallocate -l 19g /mnt/a
  # df /mnt /mnt/dir
  Filesystem      Size  Used Avail Use% Mounted on
  /dev/sda         20G   20G  345M  99% /mnt
  /dev/sda        2.0G     0  2.0G   0% /mnt
I think the bug here is that xfs_fill_statvfs_from_dquot unconditionally assigns to f_bfree without checking that the filesystem has enough free space to fill the remaining project quota. However, this is a longstanding behavior of xfs so it's unclear what to do here.
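The difference between assigning and clamping can be shown with a standalone model. This is a sketch, assuming a reduced statfs structure and units of MiB for the example values; struct statfs_model, min_u64() and clamp_to_quota() are hypothetical names.

```c
#include <stdint.h>

/* Reduced model of the block fields reported by statvfs. */
struct statfs_model {
	uint64_t f_blocks;
	uint64_t f_bfree;
	uint64_t f_bavail;
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Clamp the reported counts to the quota.  The free counts are only
 * ever lowered, so a nearly full filesystem cannot report more free
 * space than it really has.  A limit of 0 means no limit is set.
 */
static void clamp_to_quota(struct statfs_model *st, uint64_t limit,
			   uint64_t reserved)
{
	uint64_t remaining = 0;

	if (!limit)
		return;
	if (limit > reserved)
		remaining = limit - reserved;

	st->f_blocks = min_u64(st->f_blocks, limit);
	st->f_bfree = min_u64(st->f_bfree, remaining);
	st->f_bavail = min_u64(st->f_bavail, remaining);
}
```

For the report above (20G filesystem, 345M free, 2G limit, nothing used by the project) this yields a 2G size with 345M available instead of 2.0G.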
Cc: stable@vger.kernel.org # v2.6.18 Fixes: 932f2c323196c2 ("[XFS] statvfs component of directory/project quota support, code originally by Glen.") Reported-by: Emmanuel Florac eflorac@intellique.com Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/xfs_qm_bhv.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/fs/xfs/xfs_qm_bhv.c b/fs/xfs/xfs_qm_bhv.c index 268a07218c777..26b2c449f3c66 100644 --- a/fs/xfs/xfs_qm_bhv.c +++ b/fs/xfs/xfs_qm_bhv.c @@ -32,21 +32,28 @@ xfs_fill_statvfs_from_dquot( limit = blkres->softlimit ? blkres->softlimit : blkres->hardlimit; - if (limit && statp->f_blocks > limit) { - statp->f_blocks = limit; - statp->f_bfree = statp->f_bavail = - (statp->f_blocks > blkres->reserved) ? - (statp->f_blocks - blkres->reserved) : 0; + if (limit) { + uint64_t remaining = 0; + + if (limit > blkres->reserved) + remaining = limit - blkres->reserved; + + statp->f_blocks = min(statp->f_blocks, limit); + statp->f_bfree = min(statp->f_bfree, remaining); + statp->f_bavail = min(statp->f_bavail, remaining); }
limit = dqp->q_ino.softlimit ? dqp->q_ino.softlimit : dqp->q_ino.hardlimit; - if (limit && statp->f_files > limit) { - statp->f_files = limit; - statp->f_ffree = - (statp->f_files > dqp->q_ino.reserved) ? - (statp->f_files - dqp->q_ino.reserved) : 0; + if (limit) { + uint64_t remaining = 0; + + if (limit > dqp->q_ino.reserved) + remaining = limit - dqp->q_ino.reserved; + + statp->f_files = min(statp->f_files, limit); + statp->f_ffree = min(statp->f_ffree, remaining); } }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit ac619781967bd5663c29606246b50dbebd8b3473 ]
It's a little weird to borrow 'del_work' for md_start_sync(); declare a new work_struct 'sync_work' for md_start_sync() instead.
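Why a dedicated member helps can be illustrated with a small userspace sketch. All names here (`demo_work`, `demo_dev`, `demo_container_of`, `demo_start_sync`) are hypothetical and only mimic the pattern; this is not the md code and does not use the kernel workqueue API. Each work item has one fixed handler set at init time, and the handler recovers its owner from the member it was embedded as.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a work item: just a handler pointer. */
struct demo_work {
	void (*fn)(struct demo_work *w);
};

/* Hypothetical device with one work item per purpose. */
struct demo_dev {
	int id;
	int started;
	struct demo_work del_work;  /* delayed removal */
	struct demo_work sync_work; /* start the sync thread */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void demo_start_sync(struct demo_work *w)
{
	/* The member name must match the member that was queued. */
	struct demo_dev *dev = demo_container_of(w, struct demo_dev, sync_work);

	dev->started = 1;
}

/* Handlers are bound once here, not re-bound at every queueing site. */
static void demo_dev_init(struct demo_dev *dev, int id)
{
	assert(dev != NULL);
	dev->id = id;
	dev->started = 0;
	dev->del_work.fn = NULL;
	dev->sync_work.fn = demo_start_sync;
}

/* Stand-in for the worker that eventually runs a queued item. */
static void demo_run(struct demo_work *w)
{
	if (w->fn)
		w->fn(w);
}
```

Because each member keeps a single handler for its whole lifetime, there is no window in which one purpose's handler can overwrite the other's.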
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Xiao Ni xni@redhat.com Signed-off-by: Song Liu song@kernel.org Link: https://lore.kernel.org/r/20230825031622.1530464-2-yukuai1@huaweicloud.com Stable-dep-of: 8d28d0ddb986 ("md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/md.c | 10 ++++++---- drivers/md/md.h | 5 ++++- 2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c index 9bc19a5a4119b..342407ea87d83 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -648,13 +648,13 @@ void mddev_put(struct mddev *mddev) * flush_workqueue() after mddev_find will succeed in waiting * for the work to be done. */ - INIT_WORK(&mddev->del_work, mddev_delayed_delete); queue_work(md_misc_wq, &mddev->del_work); } spin_unlock(&all_mddevs_lock); }
static void md_safemode_timeout(struct timer_list *t); +static void md_start_sync(struct work_struct *ws);
void mddev_init(struct mddev *mddev) { @@ -679,6 +679,9 @@ void mddev_init(struct mddev *mddev) mddev->resync_min = 0; mddev->resync_max = MaxSector; mddev->level = LEVEL_NONE; + + INIT_WORK(&mddev->sync_work, md_start_sync); + INIT_WORK(&mddev->del_work, mddev_delayed_delete); } EXPORT_SYMBOL_GPL(mddev_init);
@@ -9333,7 +9336,7 @@ static int remove_and_add_spares(struct mddev *mddev,
static void md_start_sync(struct work_struct *ws) { - struct mddev *mddev = container_of(ws, struct mddev, del_work); + struct mddev *mddev = container_of(ws, struct mddev, sync_work);
rcu_assign_pointer(mddev->sync_thread, md_register_thread(md_do_sync, mddev, "resync")); @@ -9546,8 +9549,7 @@ void md_check_recovery(struct mddev *mddev) */ md_bitmap_write_all(mddev->bitmap); } - INIT_WORK(&mddev->del_work, md_start_sync); - queue_work(md_misc_wq, &mddev->del_work); + queue_work(md_misc_wq, &mddev->sync_work); goto unlock; } not_running: diff --git a/drivers/md/md.h b/drivers/md/md.h index f29fa8650cd0f..46995558d3bd9 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -453,7 +453,10 @@ struct mddev { struct kernfs_node *sysfs_degraded; /*handle for 'degraded' */ struct kernfs_node *sysfs_level; /*handle for 'level' */
- struct work_struct del_work; /* used for delayed sysfs removal */ + /* used for delayed sysfs removal */ + struct work_struct del_work; + /* used for register new sync thread */ + struct work_struct sync_work;
/* "lock" protects: * flush_bio transition from NULL to !NULL
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit 3d8d32873c7b6d9cec5b40c2ddb8c7c55961694f ]
There are no functional changes; this prepares to simplify md_seq_ops in the next patch.
Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Song Liu song@kernel.org Link: https://lore.kernel.org/r/20230927061241.1552837-2-yukuai1@huaweicloud.com Stable-dep-of: 8d28d0ddb986 ("md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/md.c | 29 +++++++++++++++++------------ 1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c index 342407ea87d83..836e8ed58c8ad 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -633,23 +633,28 @@ static inline struct mddev *mddev_get(struct mddev *mddev)
static void mddev_delayed_delete(struct work_struct *ws);
+static void __mddev_put(struct mddev *mddev) +{ + if (mddev->raid_disks || !list_empty(&mddev->disks) || + mddev->ctime || mddev->hold_active) + return; + + /* Array is not configured at all, and not held active, so destroy it */ + set_bit(MD_DELETED, &mddev->flags); + + /* + * Call queue_work inside the spinlock so that flush_workqueue() after + * mddev_find will succeed in waiting for the work to be done. + */ + queue_work(md_misc_wq, &mddev->del_work); +} + void mddev_put(struct mddev *mddev) { if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock)) return; - if (!mddev->raid_disks && list_empty(&mddev->disks) && - mddev->ctime == 0 && !mddev->hold_active) { - /* Array is not configured at all, and not held active, - * so destroy it */ - set_bit(MD_DELETED, &mddev->flags);
- /* - * Call queue_work inside the spinlock so that - * flush_workqueue() after mddev_find will succeed in waiting - * for the work to be done. - */ - queue_work(md_misc_wq, &mddev->del_work); - } + __mddev_put(mddev); spin_unlock(&all_mddevs_lock); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit cf1b6d4441fffd0ba8ae4ced6a12f578c95ca049 ]
Before this patch, the implementation is hacky and hard to understand:
1) md_seq_start sets pos to 1;
2) md_seq_show finds pos is 1, then prints Personalities;
3) md_seq_next finds pos is 1, then updates pos to the first mddev;
4) md_seq_show finds pos is not 1 or 2, and shows the mddev;
5) md_seq_next finds pos is not 1 or 2, and updates pos to the next mddev;
6) loop 4-5 until the last mddev, then md_seq_next updates pos to 2;
7) md_seq_show finds pos is 2, then prints unused devices;
8) md_seq_next finds pos is 2, and stops;
This patch removes the magic values and uses seq_list_start() and seq_list_next() directly, and moves printing "Personalities" to md_seq_start() and "unused devices" to md_seq_stop():
1) md_seq_start prints Personalities, and then sets pos to the first mddev;
2) md_seq_show shows the mddev;
3) md_seq_next updates pos to the next mddev;
4) loop 2-3 until the last mddev;
5) md_seq_stop prints unused devices;
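The new flow can be sketched as a userspace simulation of one uninterrupted start/show/next/stop pass. This is a minimal sketch under assumptions: the `demo_*` functions, `struct out`, the fixed `devs` array and the printed strings are all hypothetical, and the real seq_file core is not involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the all_mddevs list: a fixed array of names. */
static const char *const devs[] = { "md0", "md1", "md2" };
#define NDEVS (sizeof(devs) / sizeof(devs[0]))

/* Hypothetical output buffer standing in for the seq_file buffer. */
struct out {
	char buf[256];
	size_t len;
};

static void emit(struct out *o, const char *s)
{
	size_t n = strlen(s);

	/* The demo buffer is sized generously; overflow would be a bug here. */
	assert(o->len + n < sizeof(o->buf));
	memcpy(o->buf + o->len, s, n + 1);
	o->len += n;
}

/* start: print the header, then return the element at *pos (or NULL). */
static const char *const *demo_start(struct out *o, size_t *pos)
{
	emit(o, "Personalities : [raid1]\n");
	return *pos < NDEVS ? &devs[*pos] : NULL;
}

/* next: advance pos and return the following element (NULL at the end). */
static const char *const *demo_next(size_t *pos)
{
	++*pos;
	return *pos < NDEVS ? &devs[*pos] : NULL;
}

/* show: print one element; no magic position values are needed. */
static void demo_show(struct out *o, const char *const *v)
{
	emit(o, *v);
	emit(o, " : active\n");
}

/* stop: print the trailer. */
static void demo_stop(struct out *o)
{
	emit(o, "unused devices: <none>\n");
}

/* Drive one full pass: start, then show/next until NULL, then stop. */
static void demo_read_all(struct out *o)
{
	size_t pos = 0;
	const char *const *v;

	o->len = 0;
	o->buf[0] = '\0';
	for (v = demo_start(o, &pos); v; v = demo_next(&pos))
		demo_show(o, v);
	demo_stop(o);
}
```

The sketch covers only a single pass from position 0; it makes no claim about how the header and trailer behave when a read is split across several start/stop cycles.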
Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Song Liu song@kernel.org Link: https://lore.kernel.org/r/20230927061241.1552837-3-yukuai1@huaweicloud.com Stable-dep-of: 8d28d0ddb986 ("md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/md.c | 100 +++++++++++------------------------------------- 1 file changed, 22 insertions(+), 78 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c index 836e8ed58c8ad..27a6a11b71ee4 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -8260,105 +8260,46 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev) }
static void *md_seq_start(struct seq_file *seq, loff_t *pos) + __acquires(&all_mddevs_lock) { - struct list_head *tmp; - loff_t l = *pos; - struct mddev *mddev; + struct md_personality *pers;
- if (l == 0x10000) { - ++*pos; - return (void *)2; - } - if (l > 0x10000) - return NULL; - if (!l--) - /* header */ - return (void*)1; + seq_puts(seq, "Personalities : "); + spin_lock(&pers_lock); + list_for_each_entry(pers, &pers_list, list) + seq_printf(seq, "[%s] ", pers->name); + + spin_unlock(&pers_lock); + seq_puts(seq, "\n"); + seq->poll_event = atomic_read(&md_event_count);
spin_lock(&all_mddevs_lock); - list_for_each(tmp,&all_mddevs) - if (!l--) { - mddev = list_entry(tmp, struct mddev, all_mddevs); - if (!mddev_get(mddev)) - continue; - spin_unlock(&all_mddevs_lock); - return mddev; - } - spin_unlock(&all_mddevs_lock); - if (!l--) - return (void*)2;/* tail */ - return NULL; + + return seq_list_start(&all_mddevs, *pos); }
static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos) { - struct list_head *tmp; - struct mddev *next_mddev, *mddev = v; - struct mddev *to_put = NULL; - - ++*pos; - if (v == (void*)2) - return NULL; - - spin_lock(&all_mddevs_lock); - if (v == (void*)1) { - tmp = all_mddevs.next; - } else { - to_put = mddev; - tmp = mddev->all_mddevs.next; - } - - for (;;) { - if (tmp == &all_mddevs) { - next_mddev = (void*)2; - *pos = 0x10000; - break; - } - next_mddev = list_entry(tmp, struct mddev, all_mddevs); - if (mddev_get(next_mddev)) - break; - mddev = next_mddev; - tmp = mddev->all_mddevs.next; - } - spin_unlock(&all_mddevs_lock); - - if (to_put) - mddev_put(to_put); - return next_mddev; - + return seq_list_next(v, &all_mddevs, pos); }
static void md_seq_stop(struct seq_file *seq, void *v) + __releases(&all_mddevs_lock) { - struct mddev *mddev = v; - - if (mddev && v != (void*)1 && v != (void*)2) - mddev_put(mddev); + status_unused(seq); + spin_unlock(&all_mddevs_lock); }
static int md_seq_show(struct seq_file *seq, void *v) { - struct mddev *mddev = v; + struct mddev *mddev = list_entry(v, struct mddev, all_mddevs); sector_t sectors; struct md_rdev *rdev;
- if (v == (void*)1) { - struct md_personality *pers; - seq_printf(seq, "Personalities : "); - spin_lock(&pers_lock); - list_for_each_entry(pers, &pers_list, list) - seq_printf(seq, "[%s] ", pers->name); - - spin_unlock(&pers_lock); - seq_printf(seq, "\n"); - seq->poll_event = atomic_read(&md_event_count); + if (!mddev_get(mddev)) return 0; - } - if (v == (void*)2) { - status_unused(seq); - return 0; - }
+ spin_unlock(&all_mddevs_lock); spin_lock(&mddev->lock); if (mddev->pers || mddev->raid_disks || !list_empty(&mddev->disks)) { seq_printf(seq, "%s : %sactive", mdname(mddev), @@ -8429,6 +8370,9 @@ static int md_seq_show(struct seq_file *seq, void *v) seq_printf(seq, "\n"); } spin_unlock(&mddev->lock); + spin_lock(&all_mddevs_lock); + if (atomic_dec_and_test(&mddev->active)) + __mddev_put(mddev);
return 0; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit 38f287d7e495ae00d4481702f44ff7ca79f5c9bc ]
There are no functional changes, and the new helper will be used in multiple places in the following patches to avoid dereferencing the bitmap directly.
Signed-off-by: Yu Kuai yukuai3@huawei.com Link: https://lore.kernel.org/r/20240826074452.1490072-3-yukuai1@huaweicloud.com Signed-off-by: Song Liu song@kernel.org Stable-dep-of: 8d28d0ddb986 ("md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/md-bitmap.c | 25 ++++++------------------- drivers/md/md-bitmap.h | 8 +++++++- drivers/md/md.c | 29 ++++++++++++++++++++++++++++- 3 files changed, 41 insertions(+), 21 deletions(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c index 2085b1705f144..f7b02d87a6da7 100644 --- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -2112,32 +2112,19 @@ int md_bitmap_copy_from_slot(struct mddev *mddev, int slot, } EXPORT_SYMBOL_GPL(md_bitmap_copy_from_slot);
- -void md_bitmap_status(struct seq_file *seq, struct bitmap *bitmap) +int md_bitmap_get_stats(struct bitmap *bitmap, struct md_bitmap_stats *stats) { - unsigned long chunk_kb; struct bitmap_counts *counts;
if (!bitmap) - return; + return -ENOENT;
counts = &bitmap->counts; + stats->missing_pages = counts->missing_pages; + stats->pages = counts->pages; + stats->file = bitmap->storage.file;
- chunk_kb = bitmap->mddev->bitmap_info.chunksize >> 10; - seq_printf(seq, "bitmap: %lu/%lu pages [%luKB], " - "%lu%s chunk", - counts->pages - counts->missing_pages, - counts->pages, - (counts->pages - counts->missing_pages) - << (PAGE_SHIFT - 10), - chunk_kb ? chunk_kb : bitmap->mddev->bitmap_info.chunksize, - chunk_kb ? "KB" : "B"); - if (bitmap->storage.file) { - seq_printf(seq, ", file: "); - seq_file_path(seq, bitmap->storage.file, " \t\n"); - } - - seq_printf(seq, "\n"); + return 0; }
int md_bitmap_resize(struct bitmap *bitmap, sector_t blocks, diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h index 8b89e260a93b7..60b86ee939081 100644 --- a/drivers/md/md-bitmap.h +++ b/drivers/md/md-bitmap.h @@ -234,6 +234,12 @@ struct bitmap { int cluster_slot; /* Slot offset for clustered env */ };
+struct md_bitmap_stats { + unsigned long missing_pages; + unsigned long pages; + struct file *file; +}; + /* the bitmap API */
/* these are used only by md/bitmap */ @@ -244,7 +250,7 @@ void md_bitmap_destroy(struct mddev *mddev);
void md_bitmap_print_sb(struct bitmap *bitmap); void md_bitmap_update_sb(struct bitmap *bitmap); -void md_bitmap_status(struct seq_file *seq, struct bitmap *bitmap); +int md_bitmap_get_stats(struct bitmap *bitmap, struct md_bitmap_stats *stats);
int md_bitmap_setallbits(struct bitmap *bitmap); void md_bitmap_write_all(struct bitmap *bitmap); diff --git a/drivers/md/md.c b/drivers/md/md.c index 27a6a11b71ee4..b73649fd8e039 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -8290,6 +8290,33 @@ static void md_seq_stop(struct seq_file *seq, void *v) spin_unlock(&all_mddevs_lock); }
+static void md_bitmap_status(struct seq_file *seq, struct mddev *mddev) +{ + struct md_bitmap_stats stats; + unsigned long used_pages; + unsigned long chunk_kb; + int err; + + err = md_bitmap_get_stats(mddev->bitmap, &stats); + if (err) + return; + + chunk_kb = mddev->bitmap_info.chunksize >> 10; + used_pages = stats.pages - stats.missing_pages; + + seq_printf(seq, "bitmap: %lu/%lu pages [%luKB], %lu%s chunk", + used_pages, stats.pages, used_pages << (PAGE_SHIFT - 10), + chunk_kb ? chunk_kb : mddev->bitmap_info.chunksize, + chunk_kb ? "KB" : "B"); + + if (stats.file) { + seq_puts(seq, ", file: "); + seq_file_path(seq, stats.file, " \t\n"); + } + + seq_putc(seq, '\n'); +} + static int md_seq_show(struct seq_file *seq, void *v) { struct mddev *mddev = list_entry(v, struct mddev, all_mddevs); @@ -8365,7 +8392,7 @@ static int md_seq_show(struct seq_file *seq, void *v) } else seq_printf(seq, "\n ");
- md_bitmap_status(seq, mddev->bitmap); + md_bitmap_status(seq, mddev);
seq_printf(seq, "\n"); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit 82697ccf7e495c1ba81e315c2886d6220ff84c2c ]
drivers/md/md-cluster.c:1220:22: warning: incorrect type in assignment (different base types)
drivers/md/md-cluster.c:1220:22: expected unsigned long my_sync_size
drivers/md/md-cluster.c:1220:22: got restricted __le64 [usertype] sync_size
drivers/md/md-cluster.c:1252:35: warning: incorrect type in assignment (different base types)
drivers/md/md-cluster.c:1252:35: expected unsigned long sync_size
drivers/md/md-cluster.c:1252:35: got restricted __le64 [usertype] sync_size
drivers/md/md-cluster.c:1253:41: warning: restricted __le64 degrades to integer
Fix the warnings by using le64_to_cpu() to convert __le64 to an integer.
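What such a conversion does can be sketched in portable userspace C. The helper name `demo_le64_to_cpu()` is hypothetical and takes raw bytes rather than a typed field; it is an illustration of the idea, not the kernel macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a 64-bit little-endian value from raw bytes. The result is the
 * same on little-endian and big-endian hosts, which is the point of an
 * explicit conversion for on-disk fields.
 */
static uint64_t demo_le64_to_cpu(const unsigned char *p)
{
	uint64_t v = 0;
	size_t i;

	assert(p != NULL);
	for (i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (8 * i); /* byte 0 is least significant */
	return v;
}
```

On a little-endian host a plain assignment happens to give the same value, which is why a missing conversion of this kind tends to be caught by a static checker rather than by testing.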
Signed-off-by: Yu Kuai yukuai3@huawei.com
Link: https://lore.kernel.org/r/20240826074452.1490072-6-yukuai1@huaweicloud.com
Signed-off-by: Song Liu song@kernel.org
Stable-dep-of: 8d28d0ddb986 ("md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/md/md-cluster.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 1e26eb2233495..ca4d3a8d5dd76 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -1200,7 +1200,7 @@ static int cluster_check_sync_size(struct mddev *mddev)
 	struct dlm_lock_resource *bm_lockres;
 
 	sb = kmap_atomic(bitmap->storage.sb_page);
-	my_sync_size = sb->sync_size;
+	my_sync_size = le64_to_cpu(sb->sync_size);
 	kunmap_atomic(sb);
 
 	for (i = 0; i < node_num; i++) {
@@ -1232,8 +1232,8 @@ static int cluster_check_sync_size(struct mddev *mddev)
 
 		sb = kmap_atomic(bitmap->storage.sb_page);
 		if (sync_size == 0)
-			sync_size = sb->sync_size;
-		else if (sync_size != sb->sync_size) {
+			sync_size = le64_to_cpu(sb->sync_size);
+		else if (sync_size != le64_to_cpu(sb->sync_size)) {
 			kunmap_atomic(sb);
 			md_bitmap_free(bitmap);
 			return -1;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit ec6bb299c7c3dd4ca1724d13d5f5fae3ee54fc65 ]
Avoid dereferencing the bitmap directly in md-cluster, in preparation for inventing a new bitmap.
BTW, also fix the following checkpatch warnings:
WARNING: Deprecated use of 'kmap_atomic', prefer 'kmap_local_page' instead
WARNING: Deprecated use of 'kunmap_atomic', prefer 'kunmap_local' instead
Signed-off-by: Yu Kuai yukuai3@huawei.com Link: https://lore.kernel.org/r/20240826074452.1490072-7-yukuai1@huaweicloud.com Signed-off-by: Song Liu song@kernel.org Stable-dep-of: 8d28d0ddb986 ("md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/md-bitmap.c | 6 ++++++ drivers/md/md-bitmap.h | 1 + drivers/md/md-cluster.c | 34 ++++++++++++++++++++-------------- 3 files changed, 27 insertions(+), 14 deletions(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c index f7b02d87a6da7..80b0cd7b88995 100644 --- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -2115,10 +2115,15 @@ EXPORT_SYMBOL_GPL(md_bitmap_copy_from_slot); int md_bitmap_get_stats(struct bitmap *bitmap, struct md_bitmap_stats *stats) { struct bitmap_counts *counts; + bitmap_super_t *sb;
if (!bitmap) return -ENOENT;
+ sb = kmap_local_page(bitmap->storage.sb_page); + stats->sync_size = le64_to_cpu(sb->sync_size); + kunmap_local(sb); + counts = &bitmap->counts; stats->missing_pages = counts->missing_pages; stats->pages = counts->pages; @@ -2126,6 +2131,7 @@ int md_bitmap_get_stats(struct bitmap *bitmap, struct md_bitmap_stats *stats)
return 0; } +EXPORT_SYMBOL_GPL(md_bitmap_get_stats);
int md_bitmap_resize(struct bitmap *bitmap, sector_t blocks, int chunksize, int init) diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h index 60b86ee939081..840efd1b8a01c 100644 --- a/drivers/md/md-bitmap.h +++ b/drivers/md/md-bitmap.h @@ -236,6 +236,7 @@ struct bitmap {
struct md_bitmap_stats { unsigned long missing_pages; + unsigned long sync_size; unsigned long pages; struct file *file; }; diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c index ca4d3a8d5dd76..6a89f6b5d64f9 100644 --- a/drivers/md/md-cluster.c +++ b/drivers/md/md-cluster.c @@ -1190,18 +1190,21 @@ static int resize_bitmaps(struct mddev *mddev, sector_t newsize, sector_t oldsiz */ static int cluster_check_sync_size(struct mddev *mddev) { - int i, rv; - bitmap_super_t *sb; - unsigned long my_sync_size, sync_size = 0; - int node_num = mddev->bitmap_info.nodes; int current_slot = md_cluster_ops->slot_number(mddev); + int node_num = mddev->bitmap_info.nodes; struct bitmap *bitmap = mddev->bitmap; - char str[64]; struct dlm_lock_resource *bm_lockres; + struct md_bitmap_stats stats; + unsigned long sync_size = 0; + unsigned long my_sync_size; + char str[64]; + int i, rv;
- sb = kmap_atomic(bitmap->storage.sb_page); - my_sync_size = le64_to_cpu(sb->sync_size); - kunmap_atomic(sb); + rv = md_bitmap_get_stats(bitmap, &stats); + if (rv) + return rv; + + my_sync_size = stats.sync_size;
for (i = 0; i < node_num; i++) { if (i == current_slot) @@ -1230,15 +1233,18 @@ static int cluster_check_sync_size(struct mddev *mddev) md_bitmap_update_sb(bitmap); lockres_free(bm_lockres);
- sb = kmap_atomic(bitmap->storage.sb_page); - if (sync_size == 0) - sync_size = le64_to_cpu(sb->sync_size); - else if (sync_size != le64_to_cpu(sb->sync_size)) { - kunmap_atomic(sb); + rv = md_bitmap_get_stats(bitmap, &stats); + if (rv) { + md_bitmap_free(bitmap); + return rv; + } + + if (sync_size == 0) { + sync_size = stats.sync_size; + } else if (sync_size != stats.sync_size) { md_bitmap_free(bitmap); return -1; } - kunmap_atomic(sb); md_bitmap_free(bitmap); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit 8d28d0ddb986f56920ac97ae704cc3340a699a30 ]
After commit ec6bb299c7c3 ("md/md-bitmap: add 'sync_size' into struct md_bitmap_stats"), the following panic is reported:
Oops: general protection fault, probably for non-canonical address
RIP: 0010:bitmap_get_stats+0x2b/0xa0
Call Trace:
 <TASK>
 md_seq_show+0x2d2/0x5b0
 seq_read_iter+0x2b9/0x470
 seq_read+0x12f/0x180
 proc_reg_read+0x57/0xb0
 vfs_read+0xf6/0x380
 ksys_read+0x6c/0xf0
 do_syscall_64+0x82/0x170
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
The root cause is that bitmap_get_stats() can be called at any time while the mddev still exists, even if the bitmap has been destroyed or is not fully initialized. Dereferencing the bitmap in this case can crash the kernel. Meanwhile, the above commit started dereferencing bitmap->storage, which makes the problem easier to trigger.
Fix the problem by protecting bitmap_get_stats() with bitmap_info.mutex.
Cc: stable@vger.kernel.org # v6.12+ Fixes: 32a7627cf3a3 ("[PATCH] md: optimised resync using Bitmap based intent logging") Reported-and-tested-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com Closes: https://lore.kernel.org/linux-raid/ca3a91a2-50ae-4f68-b317-abd9889f3907@orac... Signed-off-by: Yu Kuai yukuai3@huawei.com Link: https://lore.kernel.org/r/20250124092055.4050195-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu song@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/md-bitmap.c | 5 ++++- drivers/md/md.c | 5 +++++ 2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c index 80b0cd7b88995..deb40a8ba3999 100644 --- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -2119,7 +2119,10 @@ int md_bitmap_get_stats(struct bitmap *bitmap, struct md_bitmap_stats *stats)
if (!bitmap) return -ENOENT; - + if (bitmap->mddev->bitmap_info.external) + return -ENOENT; + if (!bitmap->storage.sb_page) /* no superblock */ + return -EINVAL; sb = kmap_local_page(bitmap->storage.sb_page); stats->sync_size = le64_to_cpu(sb->sync_size); kunmap_local(sb); diff --git a/drivers/md/md.c b/drivers/md/md.c index b73649fd8e039..534c4efd935f6 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -8327,6 +8327,10 @@ static int md_seq_show(struct seq_file *seq, void *v) return 0;
spin_unlock(&all_mddevs_lock); + + /* prevent bitmap to be freed after checking */ + mutex_lock(&mddev->bitmap_info.mutex); + spin_lock(&mddev->lock); if (mddev->pers || mddev->raid_disks || !list_empty(&mddev->disks)) { seq_printf(seq, "%s : %sactive", mdname(mddev), @@ -8397,6 +8401,7 @@ static int md_seq_show(struct seq_file *seq, void *v) seq_printf(seq, "\n"); } spin_unlock(&mddev->lock); + mutex_unlock(&mddev->bitmap_info.mutex); spin_lock(&all_mddevs_lock); if (atomic_dec_and_test(&mddev->active)) __mddev_put(mddev);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Carlos Galo carlosgalo@google.com
[ Upstream commit 72ba14deb40a9e9668ec5e66a341ed657e5215c2 ]
The current implementation of the mark_victim tracepoint provides only the process ID (pid) of the victim process. This limitation poses challenges for userspace tools requiring real-time OOM analysis and intervention. Although this information is available from the kernel logs, it’s not the appropriate format to provide OOM notifications. In Android, BPF programs are used with the mark_victim trace events to notify userspace of an OOM kill. For consistency, update the trace event to include the same information about the OOMed victim as the kernel logs.
- UID
  In Android each installed application has a unique UID. Including the `uid` assists in correlating OOM events with specific apps.
- Process Name (comm)
  Enables identification of the affected process.
- OOM Score
  Will allow userspace to get additional insight into the relative kill priority of the OOM victim. In Android, the oom_score_adj is used to categorize app state (foreground, background, etc.), which aids in analyzing user-perceptible impacts of OOM events [1].
- Total VM, RSS Stats, and pgtables
  Amount of memory used by the victim that will, potentially, be freed up by killing it.
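The memory fields in the trace event are reported in kB, so page counts and byte counts need converting. A minimal sketch of that arithmetic, assuming hypothetical helpers `pages_to_kb()` and `bytes_to_kb()` that take the page shift as an explicit parameter instead of using the kernel's PAGE_SHIFT:

```c
#include <assert.h>

/* Convert a page count to kB; page_shift 12 means 4096-byte pages. */
static unsigned long pages_to_kb(unsigned long pages, unsigned int page_shift)
{
	/* A page must be at least 1 kB for the shift below to make sense. */
	assert(page_shift >= 10 && page_shift < 8 * sizeof(unsigned long));
	return pages << (page_shift - 10);
}

/* Convert a byte count to kB, as done for the page-table size. */
static unsigned long bytes_to_kb(unsigned long bytes)
{
	return bytes >> 10;
}
```

For example, with 4096-byte pages each page contributes 4 kB, so a victim with 3 anonymous pages would be reported as anon-rss=12kB.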
[1] https://cs.android.com/android/platform/superproject/main/+/246dc8fc95b6d93a... Signed-off-by: Carlos Galo carlosgalo@google.com Reviewed-by: Steven Rostedt rostedt@goodmis.org Cc: Suren Baghdasaryan surenb@google.com Cc: Michal Hocko mhocko@suse.com Cc: "Masami Hiramatsu (Google)" mhiramat@kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Stable-dep-of: ade81479c7dd ("memcg: fix soft lockup in the OOM process") Signed-off-by: Sasha Levin sashal@kernel.org --- include/trace/events/oom.h | 36 ++++++++++++++++++++++++++++++++---- mm/oom_kill.c | 6 +++++- 2 files changed, 37 insertions(+), 5 deletions(-)
diff --git a/include/trace/events/oom.h b/include/trace/events/oom.h index 26a11e4a2c361..b799f3bcba823 100644 --- a/include/trace/events/oom.h +++ b/include/trace/events/oom.h @@ -7,6 +7,8 @@ #include <linux/tracepoint.h> #include <trace/events/mmflags.h>
+#define PG_COUNT_TO_KB(x) ((x) << (PAGE_SHIFT - 10)) + TRACE_EVENT(oom_score_adj_update,
TP_PROTO(struct task_struct *task), @@ -72,19 +74,45 @@ TRACE_EVENT(reclaim_retry_zone, );
TRACE_EVENT(mark_victim, - TP_PROTO(int pid), + TP_PROTO(struct task_struct *task, uid_t uid),
- TP_ARGS(pid), + TP_ARGS(task, uid),
TP_STRUCT__entry( __field(int, pid) + __string(comm, task->comm) + __field(unsigned long, total_vm) + __field(unsigned long, anon_rss) + __field(unsigned long, file_rss) + __field(unsigned long, shmem_rss) + __field(uid_t, uid) + __field(unsigned long, pgtables) + __field(short, oom_score_adj) ),
TP_fast_assign( - __entry->pid = pid; + __entry->pid = task->pid; + __assign_str(comm, task->comm); + __entry->total_vm = PG_COUNT_TO_KB(task->mm->total_vm); + __entry->anon_rss = PG_COUNT_TO_KB(get_mm_counter(task->mm, MM_ANONPAGES)); + __entry->file_rss = PG_COUNT_TO_KB(get_mm_counter(task->mm, MM_FILEPAGES)); + __entry->shmem_rss = PG_COUNT_TO_KB(get_mm_counter(task->mm, MM_SHMEMPAGES)); + __entry->uid = uid; + __entry->pgtables = mm_pgtables_bytes(task->mm) >> 10; + __entry->oom_score_adj = task->signal->oom_score_adj; ),
- TP_printk("pid=%d", __entry->pid) + TP_printk("pid=%d comm=%s total-vm=%lukB anon-rss=%lukB file-rss:%lukB shmem-rss:%lukB uid=%u pgtables=%lukB oom_score_adj=%hd", + __entry->pid, + __get_str(comm), + __entry->total_vm, + __entry->anon_rss, + __entry->file_rss, + __entry->shmem_rss, + __entry->uid, + __entry->pgtables, + __entry->oom_score_adj + ) );
TRACE_EVENT(wake_reaper, diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 44bde56ecd025..22b99f835c8c4 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -44,6 +44,7 @@ #include <linux/kthread.h> #include <linux/init.h> #include <linux/mmu_notifier.h> +#include <linux/cred.h>
#include <asm/tlb.h> #include "internal.h" @@ -755,6 +756,7 @@ static inline void queue_oom_reaper(struct task_struct *tsk) */ static void mark_oom_victim(struct task_struct *tsk) { + const struct cred *cred; struct mm_struct *mm = tsk->mm;
WARN_ON(oom_killer_disabled); @@ -774,7 +776,9 @@ static void mark_oom_victim(struct task_struct *tsk) */ __thaw_task(tsk); atomic_inc(&oom_victims); - trace_mark_victim(tsk->pid); + cred = get_task_cred(tsk); + trace_mark_victim(tsk, cred->uid.val); + put_cred(cred); }
/**
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chen Ridong chenridong@huawei.com
[ Upstream commit ade81479c7dda1ce3eedb215c78bc615bbd04f06 ]
A soft lockup was found in a product with about 56,000 tasks in the OOM cgroup; the kernel was traversing them when the soft lockup was triggered.
watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
Hardware name: Huawei Cloud OpenStack Nova, BIOS
RIP: 0010:console_unlock+0x343/0x540
RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 vprintk_emit+0x193/0x280
 printk+0x52/0x6e
 dump_task+0x114/0x130
 mem_cgroup_scan_tasks+0x76/0x100
 dump_header+0x1fe/0x210
 oom_kill_process+0xd1/0x100
 out_of_memory+0x125/0x570
 mem_cgroup_out_of_memory+0xb5/0xd0
 try_charge+0x720/0x770
 mem_cgroup_try_charge+0x86/0x180
 mem_cgroup_try_charge_delay+0x1c/0x40
 do_anonymous_page+0xb5/0x390
 handle_mm_fault+0xc4/0x1f0
This is because thousands of processes are in the OOM cgroup, and it takes a long time to traverse all of them. As a result, this leads to a soft lockup in the OOM process.
To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks' function about every 1000 iterations (every 1024 in the patch, which tests '(++i & 1023) == 0'). For global OOM, call 'touch_softlockup_watchdog' at the same interval to avoid this issue.
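The periodic-yield pattern can be sketched in userspace C. This is an illustration under assumptions: `scan_with_yield()`, `yield_fn` and `count_yield()` are hypothetical names, and the callback merely stands in for whatever the caller does to give up the CPU or reset a watchdog.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback standing in for the yield or watchdog touch. */
typedef void (*yield_fn)(void *ctx);

/* Example callback: count how many times the loop yielded. */
static void count_yield(void *ctx)
{
	++*(size_t *)ctx;
}

/* Visit n items, calling yield() once every 1024 items. */
static void scan_with_yield(size_t n, yield_fn yield, void *ctx)
{
	unsigned int i = 0;
	size_t item;

	assert(yield != NULL);
	for (item = 0; item < n; item++) {
		/* True whenever the low 10 bits of the counter are all zero. */
		if ((++i & 1023) == 0)
			yield(ctx);
		/* per-item work would go here */
	}
}
```

Masking with 1023 avoids a division in the loop; for the roughly 56,000 tasks mentioned in the report this yields 54 times over the whole scan.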
Link: https://lkml.kernel.org/r/20241224025238.3768787-1-chenridong@huaweicloud.co... Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads") Signed-off-by: Chen Ridong chenridong@huawei.com Acked-by: Michal Hocko mhocko@suse.com Cc: Roman Gushchin roman.gushchin@linux.dev Cc: Johannes Weiner hannes@cmpxchg.org Cc: Shakeel Butt shakeelb@google.com Cc: Muchun Song songmuchun@bytedance.com Cc: Michal Koutný mkoutny@suse.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/memcontrol.c | 7 ++++++- mm/oom_kill.c | 8 +++++++- 2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index d2ceadd11b100..9bf5a69e20d87 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1266,6 +1266,7 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg, { struct mem_cgroup *iter; int ret = 0; + int i = 0;
BUG_ON(mem_cgroup_is_root(memcg));
@@ -1274,8 +1275,12 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg, struct task_struct *task;
css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it); - while (!ret && (task = css_task_iter_next(&it))) + while (!ret && (task = css_task_iter_next(&it))) { + /* Avoid potential softlockup warning */ + if ((++i & 1023) == 0) + cond_resched(); ret = fn(task, arg); + } css_task_iter_end(&it); if (ret) { mem_cgroup_iter_break(memcg, iter); diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 22b99f835c8c4..17a2ef9f93d3d 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -45,6 +45,7 @@ #include <linux/init.h> #include <linux/mmu_notifier.h> #include <linux/cred.h> +#include <linux/nmi.h>
#include <asm/tlb.h> #include "internal.h" @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc) mem_cgroup_scan_tasks(oc->memcg, dump_task, oc); else { struct task_struct *p; + int i = 0;
rcu_read_lock(); - for_each_process(p) + for_each_process(p) { + /* Avoid potential softlockup warning */ + if ((++i & 1023) == 0) + touch_softlockup_watchdog(); dump_task(p, oc); + } rcu_read_unlock(); } }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Johnson quic_jjohnson@quicinc.com
[ Upstream commit 64e018d7a8990c11734704a0767c47fd8efd5388 ]
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/cpufreq/cpufreq-dt-platdev.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Signed-off-by: Jeff Johnson quic_jjohnson@quicinc.com
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Stable-dep-of: f1f010c9d9c6 ("cpufreq: fix using cpufreq-dt as module")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/cpufreq/cpufreq-dt-platdev.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-dt-platdev.c
index fb2875ce1fdd5..99c31837084c0 100644
--- a/drivers/cpufreq/cpufreq-dt-platdev.c
+++ b/drivers/cpufreq/cpufreq-dt-platdev.c
@@ -225,4 +225,5 @@ static int __init cpufreq_dt_platdev_init(void)
 			       sizeof(struct cpufreq_dt_platform_data)));
 }
 core_initcall(cpufreq_dt_platdev_init);
+MODULE_DESCRIPTION("Generic DT based cpufreq platdev driver");
 MODULE_LICENSE("GPL");
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Kemnade andreas@kemnade.info
[ Upstream commit f1f010c9d9c62c865d9f54e94075800ba764b4d9 ]
This driver can be built as a module since commit 3b062a086984 ("cpufreq: dt-platdev: Support building as module"), but unfortunately this caused a regression because the cpufreq-dt-platdev.ko module does not autoload.
Usually, this is solved by just using the MODULE_DEVICE_TABLE() macro to export all the device IDs as module aliases. But this driver is special due to how it matches devices and decides which platforms it supports.
There are two of_device_id lists: an allow list for CPU devices that always match and a deny list for devices that must not match.
The driver registers a cpufreq-dt platform device for all the CPU device nodes that either are in the allow list or contain an operating-points-v2 property and are not in the deny list.
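The registration rule described above can be sketched in plain user-space C. This is only an illustration of the decision, not the driver code: the compatible strings, the list contents and the function names are made up, and the real driver works on of_device_id tables and device tree nodes rather than strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical compatible strings, not entries from the real driver tables. */
static const char *const allowlist[] = { "vendor,legacy-soc", NULL };
static const char *const denylist[]  = { "vendor,own-cpufreq-soc", NULL };

/* Linear search of a NULL-terminated string list. */
static bool in_list(const char *const *list, const char *compatible)
{
	for (; *list; list++)
		if (strcmp(*list, compatible) == 0)
			return true;
	return false;
}

/*
 * Register when the platform is on the allow list, or when it has an
 * operating-points-v2 property and is not on the deny list.
 */
static bool should_register_cpufreq_dt(const char *compatible, bool has_opp_v2)
{
	assert(compatible != NULL);

	if (in_list(allowlist, compatible))
		return true;

	return has_opp_v2 && !in_list(denylist, compatible);
}
```

The second branch depends on a property check plus a negative match, which is the part a flat table of module aliases cannot express.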
Enforce builtin compile of cpufreq-dt-platdev to make autoload work.
Fixes: 3b062a086984 ("cpufreq: dt-platdev: Support building as module") Link: https://lore.kernel.org/all/20241104201424.2a42efdd@akair/ Link: https://lore.kernel.org/all/20241119111918.1732531-1-javierm@redhat.com/ Cc: stable@vger.kernel.org Signed-off-by: Andreas Kemnade andreas@kemnade.info Reported-by: Radu Rendec rrendec@redhat.com Reported-by: Javier Martinez Canillas javierm@redhat.com [ Viresh: Picked commit log from Javier, updated tags ] Signed-off-by: Viresh Kumar viresh.kumar@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cpufreq/Kconfig | 2 +- drivers/cpufreq/cpufreq-dt-platdev.c | 2 -- 2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig index f429b9b37b76c..7e773c47a4fcd 100644 --- a/drivers/cpufreq/Kconfig +++ b/drivers/cpufreq/Kconfig @@ -218,7 +218,7 @@ config CPUFREQ_DT If in doubt, say N.
config CPUFREQ_DT_PLATDEV - tristate "Generic DT based cpufreq platdev driver" + bool "Generic DT based cpufreq platdev driver" depends on OF help This adds a generic DT based cpufreq platdev driver for frequency diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-dt-platdev.c index 99c31837084c0..09becf14653b5 100644 --- a/drivers/cpufreq/cpufreq-dt-platdev.c +++ b/drivers/cpufreq/cpufreq-dt-platdev.c @@ -225,5 +225,3 @@ static int __init cpufreq_dt_platdev_init(void) sizeof(struct cpufreq_dt_platform_data))); } core_initcall(cpufreq_dt_platdev_init); -MODULE_DESCRIPTION("Generic DT based cpufreq platdev driver"); -MODULE_LICENSE("GPL");
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zijun Hu quic_zijuhu@quicinc.com
[ Upstream commit e41137d8bd1a8e8bab8dcbfe3ec056418db3df18 ]
Download board id specific NVM instead of default for WCN7850 if board id is available.
Signed-off-by: Zijun Hu quic_zijuhu@quicinc.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Stable-dep-of: a2fad248947d ("Bluetooth: qca: Fix poor RF performance for WCN6855") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btqca.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c index 35fb26cbf2294..513ff87a7a049 100644 --- a/drivers/bluetooth/btqca.c +++ b/drivers/bluetooth/btqca.c @@ -739,6 +739,19 @@ static void qca_generate_hsp_nvm_name(char *fwname, size_t max_size, snprintf(fwname, max_size, "qca/hpnv%02x%s.%x", rom_ver, variant, bid); }
+static inline void qca_get_nvm_name_generic(struct qca_fw_config *cfg, + const char *stem, u8 rom_ver, u16 bid) +{ + if (bid == 0x0) + snprintf(cfg->fwname, sizeof(cfg->fwname), "qca/%snv%02x.bin", stem, rom_ver); + else if (bid & 0xff00) + snprintf(cfg->fwname, sizeof(cfg->fwname), + "qca/%snv%02x.b%x", stem, rom_ver, bid); + else + snprintf(cfg->fwname, sizeof(cfg->fwname), + "qca/%snv%02x.b%02x", stem, rom_ver, bid); +} + int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate, enum qca_btsoc_type soc_type, struct qca_btsoc_version ver, const char *firmware_name) @@ -819,7 +832,7 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate, /* Give the controller some time to get ready to receive the NVM */ msleep(10);
- if (soc_type == QCA_QCA2066) + if (soc_type == QCA_QCA2066 || soc_type == QCA_WCN7850) qca_read_fw_board_id(hdev, &boardid);
/* Download NVM configuration */ @@ -861,8 +874,7 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate, "qca/hpnv%02x.bin", rom_ver); break; case QCA_WCN7850: - snprintf(config.fwname, sizeof(config.fwname), - "qca/hmtnv%02x.bin", rom_ver); + qca_get_nvm_name_generic(&config, "hmt", rom_ver, boardid); break;
default:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cheng Jiang quic_chejiang@quicinc.com
[ Upstream commit a4c5a468c6329bde7dfd46bacff2cbf5f8a8152e ]
Different connectivity boards may be attached to the same platform. For example, QCA6698-based boards can support either a two-antenna or three-antenna solution, both of which work on the sa8775p-ride platform. Due to differences in connectivity boards and variations in RF performance from different foundries, different NVM configurations are used based on the board ID.
Therefore, in the firmware-name property, if the NVM file has an extension, the NVM file will be used. Otherwise, the system will first try the .bNN (board ID) file, and if that fails, it will fall back to the .bin file.
Possible configurations: firmware-name = "QCA6698/hpnv21"; firmware-name = "QCA6698/hpnv21.bin";
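The name selection described above can be sketched as a small user-space C program fragment. It is a simplified illustration under stated assumptions: the helper names are hypothetical, the GlobalFoundries variant suffix is left out, and the board ID prefix is always "b" (the patch uses an empty prefix for QCA2066).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if the last path component has a dot that is neither its first nor last character. */
static bool has_extension(const char *name)
{
	const char *dot;

	assert(name != NULL);
	dot = strrchr(name, '.');

	if (!dot || dot == name || dot[1] == '\0')
		return false;

	/* A '/' after the dot means the dot belongs to a directory name. */
	return strchr(dot, '/') == NULL;
}

/*
 * Build the first NVM file name to try. A board ID of 0 or 0xffff is
 * treated as "unknown", matching the checks in the patch.
 */
static void nvm_first_candidate(char *out, size_t len, const char *fw_name,
				unsigned int board_id)
{
	if (has_extension(fw_name))
		snprintf(out, len, "qca/%s", fw_name);            /* use as given */
	else if (board_id == 0 || board_id == 0xffff)
		snprintf(out, len, "qca/%s.bin", fw_name);        /* default file */
	else
		snprintf(out, len, "qca/%s.b%02x", fw_name, board_id);
}
```

With firmware-name = "QCA6698/hpnv21" and a board ID of 0x0a, the first candidate is "qca/QCA6698/hpnv21.b0a"; if that file is missing, the driver falls back to the same stem with ".bin".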
Signed-off-by: Cheng Jiang quic_chejiang@quicinc.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Stable-dep-of: a2fad248947d ("Bluetooth: qca: Fix poor RF performance for WCN6855") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btqca.c | 113 ++++++++++++++++++++++++++++---------- 1 file changed, 85 insertions(+), 28 deletions(-)
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c index 513ff87a7a049..484a860785fde 100644 --- a/drivers/bluetooth/btqca.c +++ b/drivers/bluetooth/btqca.c @@ -289,6 +289,39 @@ int qca_send_pre_shutdown_cmd(struct hci_dev *hdev) } EXPORT_SYMBOL_GPL(qca_send_pre_shutdown_cmd);
+static bool qca_filename_has_extension(const char *filename) +{ + const char *suffix = strrchr(filename, '.'); + + /* File extensions require a dot, but not as the first or last character */ + if (!suffix || suffix == filename || *(suffix + 1) == '\0') + return 0; + + /* Avoid matching directories with names that look like files with extensions */ + return !strchr(suffix, '/'); +} + +static bool qca_get_alt_nvm_file(char *filename, size_t max_size) +{ + char fwname[64]; + const char *suffix; + + /* nvm file name has an extension, replace with .bin */ + if (qca_filename_has_extension(filename)) { + suffix = strrchr(filename, '.'); + strscpy(fwname, filename, suffix - filename + 1); + snprintf(fwname + (suffix - filename), + sizeof(fwname) - (suffix - filename), ".bin"); + /* If nvm file is already the default one, return false to skip the retry. */ + if (strcmp(fwname, filename) == 0) + return false; + + snprintf(filename, max_size, "%s", fwname); + return true; + } + return false; +} + static int qca_tlv_check_data(struct hci_dev *hdev, struct qca_fw_config *config, u8 *fw_data, size_t fw_size, @@ -586,6 +619,19 @@ static int qca_download_firmware(struct hci_dev *hdev, config->fwname, ret); return ret; } + } + /* If the board-specific file is missing, try loading the default + * one, unless that was attempted already. + */ + else if (config->type == TLV_TYPE_NVM && + qca_get_alt_nvm_file(config->fwname, sizeof(config->fwname))) { + bt_dev_info(hdev, "QCA Downloading %s", config->fwname); + ret = request_firmware(&fw, config->fwname, &hdev->dev); + if (ret) { + bt_dev_err(hdev, "QCA Failed to request file: %s (%d)", + config->fwname, ret); + return ret; + } } else { bt_dev_err(hdev, "QCA Failed to request file: %s (%d)", config->fwname, ret); @@ -722,34 +768,38 @@ static int qca_check_bdaddr(struct hci_dev *hdev, const struct qca_fw_config *co return 0; }
-static void qca_generate_hsp_nvm_name(char *fwname, size_t max_size, +static void qca_get_nvm_name_by_board(char *fwname, size_t max_size, + const char *stem, enum qca_btsoc_type soc_type, struct qca_btsoc_version ver, u8 rom_ver, u16 bid) { const char *variant; + const char *prefix;
- /* hsp gf chip */ - if ((le32_to_cpu(ver.soc_id) & QCA_HSP_GF_SOC_MASK) == QCA_HSP_GF_SOC_ID) - variant = "g"; - else - variant = ""; + /* Set the default value to variant and prefix */ + variant = ""; + prefix = "b";
- if (bid == 0x0) - snprintf(fwname, max_size, "qca/hpnv%02x%s.bin", rom_ver, variant); - else - snprintf(fwname, max_size, "qca/hpnv%02x%s.%x", rom_ver, variant, bid); -} + if (soc_type == QCA_QCA2066) + prefix = "";
-static inline void qca_get_nvm_name_generic(struct qca_fw_config *cfg, - const char *stem, u8 rom_ver, u16 bid) -{ - if (bid == 0x0) - snprintf(cfg->fwname, sizeof(cfg->fwname), "qca/%snv%02x.bin", stem, rom_ver); - else if (bid & 0xff00) - snprintf(cfg->fwname, sizeof(cfg->fwname), - "qca/%snv%02x.b%x", stem, rom_ver, bid); - else - snprintf(cfg->fwname, sizeof(cfg->fwname), - "qca/%snv%02x.b%02x", stem, rom_ver, bid); + if (soc_type == QCA_WCN6855 || soc_type == QCA_QCA2066) { + /* If the chip is manufactured by GlobalFoundries */ + if ((le32_to_cpu(ver.soc_id) & QCA_HSP_GF_SOC_MASK) == QCA_HSP_GF_SOC_ID) + variant = "g"; + } + + if (rom_ver != 0) { + if (bid == 0x0 || bid == 0xffff) + snprintf(fwname, max_size, "qca/%s%02x%s.bin", stem, rom_ver, variant); + else + snprintf(fwname, max_size, "qca/%s%02x%s.%s%02x", stem, rom_ver, + variant, prefix, bid); + } else { + if (bid == 0x0 || bid == 0xffff) + snprintf(fwname, max_size, "qca/%s%s.bin", stem, variant); + else + snprintf(fwname, max_size, "qca/%s%s.%s%02x", stem, variant, prefix, bid); + } }
int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate, @@ -838,8 +888,14 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate, /* Download NVM configuration */ config.type = TLV_TYPE_NVM; if (firmware_name) { - snprintf(config.fwname, sizeof(config.fwname), - "qca/%s", firmware_name); + /* The firmware name has an extension, use it directly */ + if (qca_filename_has_extension(firmware_name)) { + snprintf(config.fwname, sizeof(config.fwname), "qca/%s", firmware_name); + } else { + qca_read_fw_board_id(hdev, &boardid); + qca_get_nvm_name_by_board(config.fwname, sizeof(config.fwname), + firmware_name, soc_type, ver, 0, boardid); + } } else { switch (soc_type) { case QCA_WCN3990: @@ -858,8 +914,9 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate, "qca/apnv%02x.bin", rom_ver); break; case QCA_QCA2066: - qca_generate_hsp_nvm_name(config.fwname, - sizeof(config.fwname), ver, rom_ver, boardid); + qca_get_nvm_name_by_board(config.fwname, + sizeof(config.fwname), "hpnv", soc_type, ver, + rom_ver, boardid); break; case QCA_QCA6390: snprintf(config.fwname, sizeof(config.fwname), @@ -874,9 +931,9 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate, "qca/hpnv%02x.bin", rom_ver); break; case QCA_WCN7850: - qca_get_nvm_name_generic(&config, "hmt", rom_ver, boardid); + qca_get_nvm_name_by_board(config.fwname, sizeof(config.fwname), + "hmtnv", soc_type, ver, rom_ver, boardid); break; - default: snprintf(config.fwname, sizeof(config.fwname), "qca/nvm_%08x.bin", soc_ver);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zijun Hu quic_zijuhu@quicinc.com
[ Upstream commit a2fad248947d702ed3dcb52b8377c1a3ae201e44 ]
For WCN6855, board ID specific NVM needs to be downloaded once board ID is available, but the default NVM is always downloaded currently.
The wrong NVM causes poor RF performance, and affects user experience for several types of laptop with WCN6855 on the market.
Fix by downloading board ID specific NVM if board ID is available.
Fixes: 095327fede00 ("Bluetooth: hci_qca: Add support for QTI Bluetooth chip wcn6855") Cc: stable@vger.kernel.org # 6.4 Signed-off-by: Zijun Hu quic_zijuhu@quicinc.com Tested-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Johan Hovold johan+linaro@kernel.org Tested-by: Steev Klimaszewski steev@kali.org #Thinkpad X13s Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btqca.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c index 484a860785fde..892e2540f008a 100644 --- a/drivers/bluetooth/btqca.c +++ b/drivers/bluetooth/btqca.c @@ -927,8 +927,9 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate, "qca/msnv%02x.bin", rom_ver); break; case QCA_WCN6855: - snprintf(config.fwname, sizeof(config.fwname), - "qca/hpnv%02x.bin", rom_ver); + qca_read_fw_board_id(hdev, &boardid); + qca_get_nvm_name_by_board(config.fwname, sizeof(config.fwname), + "hpnv", soc_type, ver, rom_ver, boardid); break; case QCA_WCN7850: qca_get_nvm_name_by_board(config.fwname, sizeof(config.fwname),
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Torokhov dmitry.torokhov@gmail.com
[ Upstream commit 0e45a09a1da0872786885c505467aab8fb29b5b4 ]
serio_pause_rx() and serio_continue_rx() are usually used together to temporarily stop receiving interrupts/data for a given serio port. Define a "serio_pause_rx" guard for this so that the port is always resumed once the critical section is over.
Example:
scoped_guard(serio_pause_rx, elo->serio) { elo->expected_packet = toupper(packet[0]); init_completion(&elo->cmd_done); }
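For readers unfamiliar with guards, the mechanism can be approximated in user space with the GCC/Clang cleanup attribute. The sketch below is an assumption-laden illustration, not the kernel's cleanup.h: struct port, PORT_GUARD and the helper functions are hypothetical stand-ins for a serio port and its pause/continue pair.

```c
#include <assert.h>

/* Hypothetical stand-in for a port whose reception can be paused. */
struct port {
	int paused;
	int pause_count;
};

static void port_pause(struct port *p)
{
	p->paused = 1;
	p->pause_count++;
}

static void port_continue(struct port *p)
{
	p->paused = 0;
}

/* Called automatically when the guard variable goes out of scope. */
static void port_guard_release(struct port **pp)
{
	port_continue(*pp);
}

/* Pause now, resume on every exit path from the enclosing scope. */
#define PORT_GUARD(p)							\
	struct port *port_guard_					\
		__attribute__((unused, cleanup(port_guard_release))) =	\
		(port_pause(p), (p))

static int update_while_paused(struct port *p, int value)
{
	PORT_GUARD(p);

	assert(p->paused);
	if (value < 0)
		return -1;	/* early return: the port is still resumed */

	return value * 2;
}
```

The point of the pattern is that the early return needs no explicit resume call, which is what makes it hard to leave the port paused by accident.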
Link: https://lore.kernel.org/r/20240905041732.2034348-2-dmitry.torokhov@gmail.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Stable-dep-of: 08bd5b7c9a24 ("Input: synaptics - fix crash when enabling pass-through port") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/serio.h | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/include/linux/serio.h b/include/linux/serio.h index 6c27d413da921..e105ff2ee651a 100644 --- a/include/linux/serio.h +++ b/include/linux/serio.h @@ -6,6 +6,7 @@ #define _SERIO_H
+#include <linux/cleanup.h> #include <linux/types.h> #include <linux/interrupt.h> #include <linux/list.h> @@ -161,4 +162,6 @@ static inline void serio_continue_rx(struct serio *serio) spin_unlock_irq(&serio->lock); }
+DEFINE_GUARD(serio_pause_rx, struct serio *, serio_pause_rx(_T), serio_continue_rx(_T)) + #endif
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Torokhov dmitry.torokhov@gmail.com
[ Upstream commit 08bd5b7c9a2401faabdaa1472d45c7de0755fd7e ]
When enabling a pass-through port, an interrupt might come before the psmouse driver binds to the pass-through port. However, the synaptics sub-driver tries to access the psmouse instance presumably associated with the pass-through port to figure out whether only 1 byte of the response or the entire protocol packet needs to be forwarded to the pass-through port, and may crash if the psmouse instance has not been attached to the port yet.
Fix the crash by introducing open() and close() methods for the port and checking whether the port is open before trying to access the psmouse instance. Because psmouse calls serio_open() only after attaching the psmouse instance to the serio port instance, this prevents the potential crash.
Reported-by: Takashi Iwai tiwai@suse.de Fixes: 100e16959c3c ("Input: libps2 - attach ps2dev instances as serio port's drvdata") Link: https://bugzilla.suse.com/show_bug.cgi?id=1219522 Cc: stable@vger.kernel.org Reviewed-by: Takashi Iwai tiwai@suse.de Link: https://lore.kernel.org/r/Z4qSHORvPn7EU2j1@google.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/input/mouse/synaptics.c | 56 ++++++++++++++++++++++++--------- drivers/input/mouse/synaptics.h | 1 + 2 files changed, 43 insertions(+), 14 deletions(-)
diff --git a/drivers/input/mouse/synaptics.c b/drivers/input/mouse/synaptics.c index cff3393f0dd00..26677432ac836 100644 --- a/drivers/input/mouse/synaptics.c +++ b/drivers/input/mouse/synaptics.c @@ -667,23 +667,50 @@ static void synaptics_pt_stop(struct serio *serio) serio_continue_rx(parent->ps2dev.serio); }
+static int synaptics_pt_open(struct serio *serio) +{ + struct psmouse *parent = psmouse_from_serio(serio->parent); + struct synaptics_data *priv = parent->private; + + guard(serio_pause_rx)(parent->ps2dev.serio); + priv->pt_port_open = true; + + return 0; +} + +static void synaptics_pt_close(struct serio *serio) +{ + struct psmouse *parent = psmouse_from_serio(serio->parent); + struct synaptics_data *priv = parent->private; + + guard(serio_pause_rx)(parent->ps2dev.serio); + priv->pt_port_open = false; +} + static int synaptics_is_pt_packet(u8 *buf) { return (buf[0] & 0xFC) == 0x84 && (buf[3] & 0xCC) == 0xC4; }
-static void synaptics_pass_pt_packet(struct serio *ptport, u8 *packet) +static void synaptics_pass_pt_packet(struct synaptics_data *priv, u8 *packet) { - struct psmouse *child = psmouse_from_serio(ptport); + struct serio *ptport;
- if (child && child->state == PSMOUSE_ACTIVATED) { - serio_interrupt(ptport, packet[1], 0); - serio_interrupt(ptport, packet[4], 0); - serio_interrupt(ptport, packet[5], 0); - if (child->pktsize == 4) - serio_interrupt(ptport, packet[2], 0); - } else { - serio_interrupt(ptport, packet[1], 0); + ptport = priv->pt_port; + if (!ptport) + return; + + serio_interrupt(ptport, packet[1], 0); + + if (priv->pt_port_open) { + struct psmouse *child = psmouse_from_serio(ptport); + + if (child->state == PSMOUSE_ACTIVATED) { + serio_interrupt(ptport, packet[4], 0); + serio_interrupt(ptport, packet[5], 0); + if (child->pktsize == 4) + serio_interrupt(ptport, packet[2], 0); + } } }
@@ -722,6 +749,8 @@ static void synaptics_pt_create(struct psmouse *psmouse) serio->write = synaptics_pt_write; serio->start = synaptics_pt_start; serio->stop = synaptics_pt_stop; + serio->open = synaptics_pt_open; + serio->close = synaptics_pt_close; serio->parent = psmouse->ps2dev.serio;
psmouse->pt_activate = synaptics_pt_activate; @@ -1218,11 +1247,10 @@ static psmouse_ret_t synaptics_process_byte(struct psmouse *psmouse)
if (SYN_CAP_PASS_THROUGH(priv->info.capabilities) && synaptics_is_pt_packet(psmouse->packet)) { - if (priv->pt_port) - synaptics_pass_pt_packet(priv->pt_port, - psmouse->packet); - } else + synaptics_pass_pt_packet(priv, psmouse->packet); + } else { synaptics_process_packet(psmouse); + }
return PSMOUSE_FULL_PACKET; } diff --git a/drivers/input/mouse/synaptics.h b/drivers/input/mouse/synaptics.h index 08533d1b1b16f..4b34f13b9f761 100644 --- a/drivers/input/mouse/synaptics.h +++ b/drivers/input/mouse/synaptics.h @@ -188,6 +188,7 @@ struct synaptics_data { bool disable_gesture; /* disable gestures */
struct serio *pt_port; /* Pass-through serio port */ + bool pt_port_open;
/* * Last received Advanced Gesture Mode (AGM) packet. An AGM packet
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit 82a0a3e6f8c02b3236b55e784a083fa4ee07c321 ]
My static checker rule complains about this code. The concern is that if "sample_space" is negative then the "sample_space >= runtime->channels" condition will not work as intended because it will be type promoted to a high unsigned int value.
strm->fifo_sample_size is SSI_FIFO_DEPTH (32). The SSIFSR_TDC_MASK is 0x3f. Without any further context it does seem like a reasonable warning and it can't hurt to add a check for negatives.
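The promotion problem can be reproduced in a few lines of standalone C. This is a sketch with hypothetical function names, assuming the channel count is an unsigned int; the explicit cast spells out the conversion the compiler would otherwise apply implicitly in the comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Same shape as the loop condition: a signed count compared with an unsigned one. */
static bool has_room_unchecked(int sample_space, unsigned int channels)
{
	assert(channels > 0);
	/* A negative sample_space becomes a very large unsigned value here. */
	return (unsigned int)sample_space >= channels;
}

/* Reject negative values before the comparison, as the patch does. */
static bool has_room_checked(int sample_space, unsigned int channels)
{
	assert(channels > 0);
	if (sample_space < 0)
		return false;

	return (unsigned int)sample_space >= channels;
}
```

With a FIFO depth of 32 and a 6-bit count field (mask 0x3f), the subtraction could in principle yield a value as low as 32 - 63 = -31, which the unchecked comparison would treat as plenty of room.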
Cc: stable@vger.kernel.org Fixes: 03e786bd4341 ("ASoC: sh: Add RZ/G2L SSIF-2 driver") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://patch.msgid.link/e07c3dc5-d885-4b04-a742-71f42243f4fd@stanley.mounta... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/sh/rz-ssi.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/sound/soc/sh/rz-ssi.c b/sound/soc/sh/rz-ssi.c index 353863f49b313..54f096bdc7ee2 100644 --- a/sound/soc/sh/rz-ssi.c +++ b/sound/soc/sh/rz-ssi.c @@ -484,6 +484,8 @@ static int rz_ssi_pio_send(struct rz_ssi_priv *ssi, struct rz_ssi_stream *strm) sample_space = strm->fifo_sample_size; ssifsr = rz_ssi_reg_readl(ssi, SSIFSR); sample_space -= (ssifsr >> SSIFSR_TDC_SHIFT) & SSIFSR_TDC_MASK; + if (sample_space < 0) + return -EINVAL;
/* Only add full frames at a time */ while (frames_left && (sample_space >= runtime->channels)) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
[ Upstream commit 0a744cceebd0480cb39587b3b1339d66a9d14063 ]
Commit 2e4955167ec5 ("firmware: qcom: scm: Fix __scm and waitq completion variable initialization") introduced a write barrier in the probe function to store the global '__scm' variable. It also claimed that it added a read barrier, because as we all know, barriers are paired (see memory-barriers.txt: "Note that write barriers should normally be paired with read or address-dependency barriers"), however it did not really add it.
The offending commit used READ_ONCE() to access '__scm' global which is not a barrier.
The barrier is needed so the store to '__scm' will be properly visible. This is most likely not fatal in the current driver design, because a missing read barrier would mean qcom_scm_is_available() callers see the old value, NULL. The driver does not support unbinding and does not correctly handle probe failures, thus there is no risk of a stale or old pointer in the '__scm' variable.
However, for code correctness, readability and to be sure that we did not mess up something in this tricky topic of SMP barriers, add a read barrier for accessing '__scm'. Also change the comment from a useless/obvious description of what the barrier does to what is expected: which other parts of the code are involved here.
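The release/acquire pairing can be illustrated outside the kernel with C11 atomics, which offer roughly analogous semantics to smp_store_release() and smp_load_acquire(). The sketch below is an analogy only: struct scm_state, scm_ptr and the function names are hypothetical, and C11 memory orders are not a drop-in description of the kernel memory model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state that must be fully initialised before it is published. */
struct scm_state {
	int ready_value;
};

/* Global pointer, NULL until the "probe" step publishes it. */
static _Atomic(struct scm_state *) scm_ptr;

/* Writer side: initialise first, then publish with release semantics. */
static void publish(struct scm_state *s)
{
	assert(s != NULL);
	s->ready_value = 42;
	atomic_store_explicit(&scm_ptr, s, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above. */
static bool is_available(void)
{
	return atomic_load_explicit(&scm_ptr, memory_order_acquire) != NULL;
}
```

A reader that sees a non-NULL pointer through the acquire load is also guaranteed to see the initialisation done before the release store; a plain or relaxed load gives no such ordering.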
Fixes: 2e4955167ec5 ("firmware: qcom: scm: Fix __scm and waitq completion variable initialization") Cc: stable@vger.kernel.org Reviewed-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Link: https://lore.kernel.org/r/20241209-qcom-scm-missing-barriers-and-all-sort-of... Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/qcom_scm.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c index 7af59985f1c1f..4c5c2b73d42c2 100644 --- a/drivers/firmware/qcom_scm.c +++ b/drivers/firmware/qcom_scm.c @@ -1339,7 +1339,8 @@ static int qcom_scm_find_dload_address(struct device *dev, u64 *addr) */ bool qcom_scm_is_available(void) { - return !!READ_ONCE(__scm); + /* Paired with smp_store_release() in qcom_scm_probe */ + return !!smp_load_acquire(&__scm); } EXPORT_SYMBOL_GPL(qcom_scm_is_available);
@@ -1457,7 +1458,7 @@ static int qcom_scm_probe(struct platform_device *pdev) if (ret) return ret;
- /* Let all above stores be available after this */ + /* Paired with smp_load_acquire() in qcom_scm_is_available(). */ smp_store_release(&__scm, scm);
irq = platform_get_irq_optional(pdev, 0);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Douglas Gilbert dgilbert@interlog.com
[ Upstream commit 2bbeb8d12404cf0603f513fc33269ef9abfbb396 ]
The default handling of the NOT READY sense key is to wait for the device to become ready. The "wait" is assumed to be relatively short. However, there is a sub-class of NOT READY conditions that have the "... in progress" phrase in their additional sense code, and these can take much longer. Following on from commit 505aa4b6a883 ("scsi: sd: Defer spinning up drive while SANITIZE is in progress") we now have element depopulation and restoration that can take a long time. For example, over 24 hours for a 20 TB, 7200 rpm hard disk to depopulate 1 of its 20 elements.
Add handling of ASC/ASCQ: 0x4,0x24 (depopulation in progress) and ASC/ASCQ: 0x4,0x25 (depopulation restoration in progress) to sd.c. scsi_lib.c has incomplete handling of these two conditions, so complete it.
Signed-off-by: Douglas Gilbert dgilbert@interlog.com Link: https://lore.kernel.org/r/20231015050650.131145-1-dgilbert@interlog.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Stable-dep-of: 9ff7c383b8ac ("scsi: core: Do not retry I/Os during depopulation") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_lib.c | 1 + drivers/scsi/sd.c | 4 ++++ 2 files changed, 5 insertions(+)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 97def2619ecf2..2f54e1a853099 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -775,6 +775,7 @@ static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result) case 0x1b: /* sanitize in progress */ case 0x1d: /* configuration in progress */ case 0x24: /* depopulation in progress */ + case 0x25: /* depopulation restore in progress */ action = ACTION_DELAYED_RETRY; break; case 0x0a: /* ALUA state transition */ diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 2c627deedc1fa..fe694fec16b51 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -2309,6 +2309,10 @@ sd_spinup_disk(struct scsi_disk *sdkp) break; /* unavailable */ if (sshdr.asc == 4 && sshdr.ascq == 0x1b) break; /* sanitize in progress */ + if (sshdr.asc == 4 && sshdr.ascq == 0x24) + break; /* depopulation in progress */ + if (sshdr.asc == 4 && sshdr.ascq == 0x25) + break; /* depopulation restoration in progress */ /* * Issue command to spin up drive when not ready */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Igor Pylypiv ipylypiv@google.com
[ Upstream commit 9ff7c383b8ac0c482a1da7989f703406d78445c6 ]
Fail I/Os instead of retrying them to prevent user space processes from being blocked on the I/O completion for several minutes.
Retrying I/Os during "depopulation in progress" or "depopulation restore in progress" results in a continuous retry loop until the depopulation completes or until the I/O retry loop is aborted due to a timeout detected by scsi_cmd_runtime_exceeded().
Depopulation is slow and can take 24+ hours to complete on 20+ TB HDDs. Most I/Os in the depopulation retry loop end up taking several minutes before returning the failure to user space.
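The resulting policy can be summarised as a small classification function. This is a sketch, not the scsi_lib.c code: the enum and function names are hypothetical, it assumes the NOT READY sense key with ASC 0x04 has already been matched, and it lists only the ASCQ values visible in this patch.

```c
#include <assert.h>

/* Hypothetical action names standing in for the kernel's ACTION_* values. */
enum io_action {
	IO_DELAYED_RETRY,
	IO_DELAYED_REPREP,
	IO_FAIL,
};

/* Map an ASCQ (for NOT READY, ASC 0x04) to the action taken after this patch. */
static enum io_action not_ready_action(unsigned int ascq)
{
	assert(ascq <= 0xff);

	switch (ascq) {
	case 0x1a:	/* start stop unit in progress */
	case 0x1b:	/* sanitize in progress */
	case 0x1d:	/* configuration in progress */
		return IO_DELAYED_RETRY;
	case 0x0a:	/* ALUA state transition */
		return IO_DELAYED_REPREP;
	case 0x24:	/* depopulation in progress */
	case 0x25:	/* depopulation restore in progress */
	default:
		/* Depopulation can take many hours, so fail instead of retrying. */
		return IO_FAIL;
	}
}
```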
Cc: stable@vger.kernel.org # 4.18.x: 2bbeb8d scsi: core: Handle depopulation and restoration in progress Cc: stable@vger.kernel.org # 4.18.x Fixes: e37c7d9a0341 ("scsi: core: sanitize++ in progress") Signed-off-by: Igor Pylypiv ipylypiv@google.com Link: https://lore.kernel.org/r/20250131184408.859579-1-ipylypiv@google.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_lib.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 2f54e1a853099..f026377f1cf1c 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -774,13 +774,18 @@ static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result) case 0x1a: /* start stop unit in progress */ case 0x1b: /* sanitize in progress */ case 0x1d: /* configuration in progress */ - case 0x24: /* depopulation in progress */ - case 0x25: /* depopulation restore in progress */ action = ACTION_DELAYED_RETRY; break; case 0x0a: /* ALUA state transition */ action = ACTION_DELAYED_REPREP; break; + /* + * Depopulation might take many hours, + * thus it is not worthwhile to retry. + */ + case 0x24: /* depopulation in progress */ + case 0x25: /* depopulation restore in progress */ + fallthrough; default: action = ACTION_FAIL; break;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Neil Armstrong neil.armstrong@linaro.org
[ Upstream commit 033fbfa0eb60e519f50e97ef93baec270cd28a88 ]
By default the DSP domains are non-secure; add the missing qcom,non-secure-domain property to mark them as such.
Fixes: 91d70eb70867 ("arm64: dts: qcom: sm8450: add fastrpc nodes") Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Link: https://lore.kernel.org/r/20240227-topic-sm8x50-upstream-fastrpc-non-secure-... Signed-off-by: Bjorn Andersson andersson@kernel.org Stable-dep-of: 13c96bee5d5e ("arm64: dts: qcom: sm8450: Fix ADSP memory base and length") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/qcom/sm8450.dtsi | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/sm8450.dtsi b/arch/arm64/boot/dts/qcom/sm8450.dtsi index 2a49a29713752..fb0162e65a38c 100644 --- a/arch/arm64/boot/dts/qcom/sm8450.dtsi +++ b/arch/arm64/boot/dts/qcom/sm8450.dtsi @@ -2135,6 +2135,7 @@ compatible = "qcom,fastrpc"; qcom,glink-channels = "fastrpcglink-apps-dsp"; label = "sdsp"; + qcom,non-secure-domain; #address-cells = <1>; #size-cells = <0>;
@@ -2449,6 +2450,7 @@ compatible = "qcom,fastrpc"; qcom,glink-channels = "fastrpcglink-apps-dsp"; label = "adsp"; + qcom,non-secure-domain; #address-cells = <1>; #size-cells = <0>;
@@ -2515,6 +2517,7 @@ compatible = "qcom,fastrpc"; qcom,glink-channels = "fastrpcglink-apps-dsp"; label = "cdsp"; + qcom,non-secure-domain; #address-cells = <1>; #size-cells = <0>;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
[ Upstream commit 13c96bee5d5e5b61a9d8d000c9bb37bb9a2a0551 ]
The address space in the ADSP PAS (Peripheral Authentication Service) remoteproc node should point to the QDSP PUB address space (QDSP6...SS_PUB): 0x0300_0000 with a length of 0x10000, which also matches the downstream DTS. According to the datasheet, 0x3000_0000, the value used so far, is the region of the CDSP.
Correct the base address and length, which also moves the node to a different place to keep things sorted by unit address. The diff looks big, but only the unit address and the "reg" property were changed. This should have no functional impact on Linux users, because the PAS loader does not use this address space at all.
Fixes: 1172729576fb ("arm64: dts: qcom: sm8450: Add remoteproc enablers and instances") Cc: stable@vger.kernel.org Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Link: https://lore.kernel.org/r/20241213-dts-qcom-cdsp-mpss-base-address-v3-4-2e00... Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/qcom/sm8450.dtsi | 212 +++++++++++++-------------- 1 file changed, 106 insertions(+), 106 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sm8450.dtsi b/arch/arm64/boot/dts/qcom/sm8450.dtsi index fb0162e65a38c..3b4d788230089 100644 --- a/arch/arm64/boot/dts/qcom/sm8450.dtsi +++ b/arch/arm64/boot/dts/qcom/sm8450.dtsi @@ -2161,6 +2161,112 @@ }; };
+ remoteproc_adsp: remoteproc@3000000 { + compatible = "qcom,sm8450-adsp-pas"; + reg = <0x0 0x03000000 0x0 0x10000>; + + interrupts-extended = <&pdc 6 IRQ_TYPE_EDGE_RISING>, + <&smp2p_adsp_in 0 IRQ_TYPE_EDGE_RISING>, + <&smp2p_adsp_in 1 IRQ_TYPE_EDGE_RISING>, + <&smp2p_adsp_in 2 IRQ_TYPE_EDGE_RISING>, + <&smp2p_adsp_in 3 IRQ_TYPE_EDGE_RISING>; + interrupt-names = "wdog", "fatal", "ready", + "handover", "stop-ack"; + + clocks = <&rpmhcc RPMH_CXO_CLK>; + clock-names = "xo"; + + power-domains = <&rpmhpd RPMHPD_LCX>, + <&rpmhpd RPMHPD_LMX>; + power-domain-names = "lcx", "lmx"; + + memory-region = <&adsp_mem>; + + qcom,qmp = <&aoss_qmp>; + + qcom,smem-states = <&smp2p_adsp_out 0>; + qcom,smem-state-names = "stop"; + + status = "disabled"; + + remoteproc_adsp_glink: glink-edge { + interrupts-extended = <&ipcc IPCC_CLIENT_LPASS + IPCC_MPROC_SIGNAL_GLINK_QMP + IRQ_TYPE_EDGE_RISING>; + mboxes = <&ipcc IPCC_CLIENT_LPASS + IPCC_MPROC_SIGNAL_GLINK_QMP>; + + label = "lpass"; + qcom,remote-pid = <2>; + + gpr { + compatible = "qcom,gpr"; + qcom,glink-channels = "adsp_apps"; + qcom,domain = <GPR_DOMAIN_ID_ADSP>; + qcom,intents = <512 20>; + #address-cells = <1>; + #size-cells = <0>; + + q6apm: service@1 { + compatible = "qcom,q6apm"; + reg = <GPR_APM_MODULE_IID>; + #sound-dai-cells = <0>; + qcom,protection-domain = "avs/audio", + "msm/adsp/audio_pd"; + + q6apmdai: dais { + compatible = "qcom,q6apm-dais"; + iommus = <&apps_smmu 0x1801 0x0>; + }; + + q6apmbedai: bedais { + compatible = "qcom,q6apm-lpass-dais"; + #sound-dai-cells = <1>; + }; + }; + + q6prm: service@2 { + compatible = "qcom,q6prm"; + reg = <GPR_PRM_MODULE_IID>; + qcom,protection-domain = "avs/audio", + "msm/adsp/audio_pd"; + + q6prmcc: clock-controller { + compatible = "qcom,q6prm-lpass-clocks"; + #clock-cells = <2>; + }; + }; + }; + + fastrpc { + compatible = "qcom,fastrpc"; + qcom,glink-channels = "fastrpcglink-apps-dsp"; + label = "adsp"; + qcom,non-secure-domain; + #address-cells = <1>; + #size-cells = <0>; + + 
compute-cb@3 { + compatible = "qcom,fastrpc-compute-cb"; + reg = <3>; + iommus = <&apps_smmu 0x1803 0x0>; + }; + + compute-cb@4 { + compatible = "qcom,fastrpc-compute-cb"; + reg = <4>; + iommus = <&apps_smmu 0x1804 0x0>; + }; + + compute-cb@5 { + compatible = "qcom,fastrpc-compute-cb"; + reg = <5>; + iommus = <&apps_smmu 0x1805 0x0>; + }; + }; + }; + }; + wsa2macro: codec@31e0000 { compatible = "qcom,sm8450-lpass-wsa-macro"; reg = <0 0x031e0000 0 0x1000>; @@ -2369,112 +2475,6 @@ status = "disabled"; };
- remoteproc_adsp: remoteproc@30000000 { - compatible = "qcom,sm8450-adsp-pas"; - reg = <0 0x30000000 0 0x100>; - - interrupts-extended = <&pdc 6 IRQ_TYPE_EDGE_RISING>, - <&smp2p_adsp_in 0 IRQ_TYPE_EDGE_RISING>, - <&smp2p_adsp_in 1 IRQ_TYPE_EDGE_RISING>, - <&smp2p_adsp_in 2 IRQ_TYPE_EDGE_RISING>, - <&smp2p_adsp_in 3 IRQ_TYPE_EDGE_RISING>; - interrupt-names = "wdog", "fatal", "ready", - "handover", "stop-ack"; - - clocks = <&rpmhcc RPMH_CXO_CLK>; - clock-names = "xo"; - - power-domains = <&rpmhpd RPMHPD_LCX>, - <&rpmhpd RPMHPD_LMX>; - power-domain-names = "lcx", "lmx"; - - memory-region = <&adsp_mem>; - - qcom,qmp = <&aoss_qmp>; - - qcom,smem-states = <&smp2p_adsp_out 0>; - qcom,smem-state-names = "stop"; - - status = "disabled"; - - remoteproc_adsp_glink: glink-edge { - interrupts-extended = <&ipcc IPCC_CLIENT_LPASS - IPCC_MPROC_SIGNAL_GLINK_QMP - IRQ_TYPE_EDGE_RISING>; - mboxes = <&ipcc IPCC_CLIENT_LPASS - IPCC_MPROC_SIGNAL_GLINK_QMP>; - - label = "lpass"; - qcom,remote-pid = <2>; - - gpr { - compatible = "qcom,gpr"; - qcom,glink-channels = "adsp_apps"; - qcom,domain = <GPR_DOMAIN_ID_ADSP>; - qcom,intents = <512 20>; - #address-cells = <1>; - #size-cells = <0>; - - q6apm: service@1 { - compatible = "qcom,q6apm"; - reg = <GPR_APM_MODULE_IID>; - #sound-dai-cells = <0>; - qcom,protection-domain = "avs/audio", - "msm/adsp/audio_pd"; - - q6apmdai: dais { - compatible = "qcom,q6apm-dais"; - iommus = <&apps_smmu 0x1801 0x0>; - }; - - q6apmbedai: bedais { - compatible = "qcom,q6apm-lpass-dais"; - #sound-dai-cells = <1>; - }; - }; - - q6prm: service@2 { - compatible = "qcom,q6prm"; - reg = <GPR_PRM_MODULE_IID>; - qcom,protection-domain = "avs/audio", - "msm/adsp/audio_pd"; - - q6prmcc: clock-controller { - compatible = "qcom,q6prm-lpass-clocks"; - #clock-cells = <2>; - }; - }; - }; - - fastrpc { - compatible = "qcom,fastrpc"; - qcom,glink-channels = "fastrpcglink-apps-dsp"; - label = "adsp"; - qcom,non-secure-domain; - #address-cells = <1>; - #size-cells = <0>; - - 
compute-cb@3 { - compatible = "qcom,fastrpc-compute-cb"; - reg = <3>; - iommus = <&apps_smmu 0x1803 0x0>; - }; - - compute-cb@4 { - compatible = "qcom,fastrpc-compute-cb"; - reg = <4>; - iommus = <&apps_smmu 0x1804 0x0>; - }; - - compute-cb@5 { - compatible = "qcom,fastrpc-compute-cb"; - reg = <5>; - iommus = <&apps_smmu 0x1805 0x0>; - }; - }; - }; - }; - remoteproc_cdsp: remoteproc@32300000 { compatible = "qcom,sm8450-cdsp-pas"; reg = <0 0x32300000 0 0x10000>;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ling Xu quic_lxu5@quicinc.com
[ Upstream commit 4a03b85b8491d8bfe84a26ff979507b6ae7122c1 ]
Add the dma-coherent property to the fastRPC context bank nodes so that the DMA sequence test in the fastrpc sanity test passes, ensuring that data integrity is maintained during DMA operations.
Signed-off-by: Ling Xu quic_lxu5@quicinc.com Link: https://lore.kernel.org/r/20240125102413.3016-2-quic_lxu5@quicinc.com Signed-off-by: Bjorn Andersson andersson@kernel.org Stable-dep-of: a6a8f54bc2af ("arm64: dts: qcom: sm8550: Fix ADSP memory base and length") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/qcom/sm8550.dtsi | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/sm8550.dtsi b/arch/arm64/boot/dts/qcom/sm8550.dtsi index f3a0e1fe333c4..51407c482b51f 100644 --- a/arch/arm64/boot/dts/qcom/sm8550.dtsi +++ b/arch/arm64/boot/dts/qcom/sm8550.dtsi @@ -4006,6 +4006,7 @@ reg = <3>; iommus = <&apps_smmu 0x1003 0x80>, <&apps_smmu 0x1063 0x0>; + dma-coherent; };
compute-cb@4 { @@ -4013,6 +4014,7 @@ reg = <4>; iommus = <&apps_smmu 0x1004 0x80>, <&apps_smmu 0x1064 0x0>; + dma-coherent; };
compute-cb@5 { @@ -4020,6 +4022,7 @@ reg = <5>; iommus = <&apps_smmu 0x1005 0x80>, <&apps_smmu 0x1065 0x0>; + dma-coherent; };
compute-cb@6 { @@ -4027,6 +4030,7 @@ reg = <6>; iommus = <&apps_smmu 0x1006 0x80>, <&apps_smmu 0x1066 0x0>; + dma-coherent; };
compute-cb@7 { @@ -4034,6 +4038,7 @@ reg = <7>; iommus = <&apps_smmu 0x1007 0x80>, <&apps_smmu 0x1067 0x0>; + dma-coherent; }; };
@@ -4140,6 +4145,7 @@ iommus = <&apps_smmu 0x1961 0x0>, <&apps_smmu 0x0c01 0x20>, <&apps_smmu 0x19c1 0x10>; + dma-coherent; };
compute-cb@2 { @@ -4148,6 +4154,7 @@ iommus = <&apps_smmu 0x1962 0x0>, <&apps_smmu 0x0c02 0x20>, <&apps_smmu 0x19c2 0x10>; + dma-coherent; };
compute-cb@3 { @@ -4156,6 +4163,7 @@ iommus = <&apps_smmu 0x1963 0x0>, <&apps_smmu 0x0c03 0x20>, <&apps_smmu 0x19c3 0x10>; + dma-coherent; };
compute-cb@4 { @@ -4164,6 +4172,7 @@ iommus = <&apps_smmu 0x1964 0x0>, <&apps_smmu 0x0c04 0x20>, <&apps_smmu 0x19c4 0x10>; + dma-coherent; };
compute-cb@5 { @@ -4172,6 +4181,7 @@ iommus = <&apps_smmu 0x1965 0x0>, <&apps_smmu 0x0c05 0x20>, <&apps_smmu 0x19c5 0x10>; + dma-coherent; };
compute-cb@6 { @@ -4180,6 +4190,7 @@ iommus = <&apps_smmu 0x1966 0x0>, <&apps_smmu 0x0c06 0x20>, <&apps_smmu 0x19c6 0x10>; + dma-coherent; };
compute-cb@7 { @@ -4188,6 +4199,7 @@ iommus = <&apps_smmu 0x1967 0x0>, <&apps_smmu 0x0c07 0x20>, <&apps_smmu 0x19c7 0x10>; + dma-coherent; };
compute-cb@8 { @@ -4196,6 +4208,7 @@ iommus = <&apps_smmu 0x1968 0x0>, <&apps_smmu 0x0c08 0x20>, <&apps_smmu 0x19c8 0x10>; + dma-coherent; };
/* note: secure cb9 in downstream */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Neil Armstrong neil.armstrong@linaro.org
[ Upstream commit 49c50ad9e6cbaa6a3da59cdd85d4ffb354ef65f4 ]
By default the DSP domains are non-secure; add the missing qcom,non-secure-domain property to mark them as non-secure.
Fixes: d0c061e366ed ("arm64: dts: qcom: sm8550: add adsp, cdsp & mdss nodes") Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Link: https://lore.kernel.org/r/20240227-topic-sm8x50-upstream-fastrpc-non-secure-... Signed-off-by: Bjorn Andersson andersson@kernel.org Stable-dep-of: a6a8f54bc2af ("arm64: dts: qcom: sm8550: Fix ADSP memory base and length") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/qcom/sm8550.dtsi | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/sm8550.dtsi b/arch/arm64/boot/dts/qcom/sm8550.dtsi index 51407c482b51f..500dfbd79fb69 100644 --- a/arch/arm64/boot/dts/qcom/sm8550.dtsi +++ b/arch/arm64/boot/dts/qcom/sm8550.dtsi @@ -3998,6 +3998,7 @@ compatible = "qcom,fastrpc"; qcom,glink-channels = "fastrpcglink-apps-dsp"; label = "adsp"; + qcom,non-secure-domain; #address-cells = <1>; #size-cells = <0>;
@@ -4136,6 +4137,7 @@ compatible = "qcom,fastrpc"; qcom,glink-channels = "fastrpcglink-apps-dsp"; label = "cdsp"; + qcom,non-secure-domain; #address-cells = <1>; #size-cells = <0>;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
[ Upstream commit a6a8f54bc2af555738322783ba1e990c2ae7f443 ]
The address space in ADSP PAS (Peripheral Authentication Service) remoteproc node should point to the QDSP PUB address space (QDSP6...SS_PUB): 0x0680_0000 with length of 0x10000.
0x3000_0000, the value used so far, is the main region of CDSP. Downstream DTS uses 0x0300_0000, which is oddly similar to 0x3000_0000, yet quite different, and points to an unused area.
Correct the base address and length, which also moves the node to a different place to keep things sorted by unit address. The diff looks big, but only the unit address and "reg" property were changed. This should have no functional impact on Linux users, because the PAS loader does not use this address space at all.
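For readers comparing the old and new "reg" values, here is a minimal userspace sketch of how the four cells fold into a base and a length. It assumes the parent bus uses #address-cells = <2> and #size-cells = <2>, which the four-cell "reg" suggests; decode_reg() and cells_to_u64() are hypothetical helpers for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): fold two 32-bit devicetree
 * cells, most significant first, into one 64-bit value.
 */
static uint64_t cells_to_u64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

struct reg_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Decode one "reg" entry laid out as <addr_hi addr_lo size_hi size_lo>.
 * Assumption: #address-cells = <2> and #size-cells = <2> on the parent.
 */
static struct reg_range decode_reg(const uint32_t cells[4])
{
	struct reg_range r = {
		.base = cells_to_u64(cells[0], cells[1]),
		.size = cells_to_u64(cells[2], cells[3]),
	};

	return r;
}
```

Feeding it the cells <0x0 0x30000000 0x0 0x100> and <0x0 0x06800000 0x0 0x10000> gives the old and new ranges, which makes it easy to see that both the base and the length changed.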
Fixes: d0c061e366ed ("arm64: dts: qcom: sm8550: add adsp, cdsp & mdss nodes") Cc: stable@vger.kernel.org Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Reviewed-by: Konrad Dybcio konrad.dybcio@oss.qualcomm.com Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Link: https://lore.kernel.org/r/20241213-dts-qcom-cdsp-mpss-base-address-v3-7-2e00... Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/qcom/sm8550.dtsi | 262 +++++++++++++-------------- 1 file changed, 131 insertions(+), 131 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sm8550.dtsi b/arch/arm64/boot/dts/qcom/sm8550.dtsi index 500dfbd79fb69..bc9a1fca2db3a 100644 --- a/arch/arm64/boot/dts/qcom/sm8550.dtsi +++ b/arch/arm64/boot/dts/qcom/sm8550.dtsi @@ -2026,6 +2026,137 @@ }; };
+ remoteproc_adsp: remoteproc@6800000 { + compatible = "qcom,sm8550-adsp-pas"; + reg = <0x0 0x06800000 0x0 0x10000>; + + interrupts-extended = <&pdc 6 IRQ_TYPE_EDGE_RISING>, + <&smp2p_adsp_in 0 IRQ_TYPE_EDGE_RISING>, + <&smp2p_adsp_in 1 IRQ_TYPE_EDGE_RISING>, + <&smp2p_adsp_in 2 IRQ_TYPE_EDGE_RISING>, + <&smp2p_adsp_in 3 IRQ_TYPE_EDGE_RISING>; + interrupt-names = "wdog", "fatal", "ready", + "handover", "stop-ack"; + + clocks = <&rpmhcc RPMH_CXO_CLK>; + clock-names = "xo"; + + power-domains = <&rpmhpd RPMHPD_LCX>, + <&rpmhpd RPMHPD_LMX>; + power-domain-names = "lcx", "lmx"; + + interconnects = <&lpass_lpicx_noc MASTER_LPASS_PROC 0 &mc_virt SLAVE_EBI1 0>; + + memory-region = <&adspslpi_mem>, <&q6_adsp_dtb_mem>; + + qcom,qmp = <&aoss_qmp>; + + qcom,smem-states = <&smp2p_adsp_out 0>; + qcom,smem-state-names = "stop"; + + status = "disabled"; + + remoteproc_adsp_glink: glink-edge { + interrupts-extended = <&ipcc IPCC_CLIENT_LPASS + IPCC_MPROC_SIGNAL_GLINK_QMP + IRQ_TYPE_EDGE_RISING>; + mboxes = <&ipcc IPCC_CLIENT_LPASS + IPCC_MPROC_SIGNAL_GLINK_QMP>; + + label = "lpass"; + qcom,remote-pid = <2>; + + fastrpc { + compatible = "qcom,fastrpc"; + qcom,glink-channels = "fastrpcglink-apps-dsp"; + label = "adsp"; + qcom,non-secure-domain; + #address-cells = <1>; + #size-cells = <0>; + + compute-cb@3 { + compatible = "qcom,fastrpc-compute-cb"; + reg = <3>; + iommus = <&apps_smmu 0x1003 0x80>, + <&apps_smmu 0x1063 0x0>; + dma-coherent; + }; + + compute-cb@4 { + compatible = "qcom,fastrpc-compute-cb"; + reg = <4>; + iommus = <&apps_smmu 0x1004 0x80>, + <&apps_smmu 0x1064 0x0>; + dma-coherent; + }; + + compute-cb@5 { + compatible = "qcom,fastrpc-compute-cb"; + reg = <5>; + iommus = <&apps_smmu 0x1005 0x80>, + <&apps_smmu 0x1065 0x0>; + dma-coherent; + }; + + compute-cb@6 { + compatible = "qcom,fastrpc-compute-cb"; + reg = <6>; + iommus = <&apps_smmu 0x1006 0x80>, + <&apps_smmu 0x1066 0x0>; + dma-coherent; + }; + + compute-cb@7 { + compatible = "qcom,fastrpc-compute-cb"; + reg = 
<7>; + iommus = <&apps_smmu 0x1007 0x80>, + <&apps_smmu 0x1067 0x0>; + dma-coherent; + }; + }; + + gpr { + compatible = "qcom,gpr"; + qcom,glink-channels = "adsp_apps"; + qcom,domain = <GPR_DOMAIN_ID_ADSP>; + qcom,intents = <512 20>; + #address-cells = <1>; + #size-cells = <0>; + + q6apm: service@1 { + compatible = "qcom,q6apm"; + reg = <GPR_APM_MODULE_IID>; + #sound-dai-cells = <0>; + qcom,protection-domain = "avs/audio", + "msm/adsp/audio_pd"; + + q6apmdai: dais { + compatible = "qcom,q6apm-dais"; + iommus = <&apps_smmu 0x1001 0x80>, + <&apps_smmu 0x1061 0x0>; + }; + + q6apmbedai: bedais { + compatible = "qcom,q6apm-lpass-dais"; + #sound-dai-cells = <1>; + }; + }; + + q6prm: service@2 { + compatible = "qcom,q6prm"; + reg = <GPR_PRM_MODULE_IID>; + qcom,protection-domain = "avs/audio", + "msm/adsp/audio_pd"; + + q6prmcc: clock-controller { + compatible = "qcom,q6prm-lpass-clocks"; + #clock-cells = <2>; + }; + }; + }; + }; + }; + lpass_wsa2macro: codec@6aa0000 { compatible = "qcom,sm8550-lpass-wsa-macro"; reg = <0 0x06aa0000 0 0x1000>; @@ -3954,137 +4085,6 @@ interrupts = <GIC_SPI 266 IRQ_TYPE_LEVEL_HIGH>; };
- remoteproc_adsp: remoteproc@30000000 { - compatible = "qcom,sm8550-adsp-pas"; - reg = <0x0 0x30000000 0x0 0x100>; - - interrupts-extended = <&pdc 6 IRQ_TYPE_EDGE_RISING>, - <&smp2p_adsp_in 0 IRQ_TYPE_EDGE_RISING>, - <&smp2p_adsp_in 1 IRQ_TYPE_EDGE_RISING>, - <&smp2p_adsp_in 2 IRQ_TYPE_EDGE_RISING>, - <&smp2p_adsp_in 3 IRQ_TYPE_EDGE_RISING>; - interrupt-names = "wdog", "fatal", "ready", - "handover", "stop-ack"; - - clocks = <&rpmhcc RPMH_CXO_CLK>; - clock-names = "xo"; - - power-domains = <&rpmhpd RPMHPD_LCX>, - <&rpmhpd RPMHPD_LMX>; - power-domain-names = "lcx", "lmx"; - - interconnects = <&lpass_lpicx_noc MASTER_LPASS_PROC 0 &mc_virt SLAVE_EBI1 0>; - - memory-region = <&adspslpi_mem>, <&q6_adsp_dtb_mem>; - - qcom,qmp = <&aoss_qmp>; - - qcom,smem-states = <&smp2p_adsp_out 0>; - qcom,smem-state-names = "stop"; - - status = "disabled"; - - remoteproc_adsp_glink: glink-edge { - interrupts-extended = <&ipcc IPCC_CLIENT_LPASS - IPCC_MPROC_SIGNAL_GLINK_QMP - IRQ_TYPE_EDGE_RISING>; - mboxes = <&ipcc IPCC_CLIENT_LPASS - IPCC_MPROC_SIGNAL_GLINK_QMP>; - - label = "lpass"; - qcom,remote-pid = <2>; - - fastrpc { - compatible = "qcom,fastrpc"; - qcom,glink-channels = "fastrpcglink-apps-dsp"; - label = "adsp"; - qcom,non-secure-domain; - #address-cells = <1>; - #size-cells = <0>; - - compute-cb@3 { - compatible = "qcom,fastrpc-compute-cb"; - reg = <3>; - iommus = <&apps_smmu 0x1003 0x80>, - <&apps_smmu 0x1063 0x0>; - dma-coherent; - }; - - compute-cb@4 { - compatible = "qcom,fastrpc-compute-cb"; - reg = <4>; - iommus = <&apps_smmu 0x1004 0x80>, - <&apps_smmu 0x1064 0x0>; - dma-coherent; - }; - - compute-cb@5 { - compatible = "qcom,fastrpc-compute-cb"; - reg = <5>; - iommus = <&apps_smmu 0x1005 0x80>, - <&apps_smmu 0x1065 0x0>; - dma-coherent; - }; - - compute-cb@6 { - compatible = "qcom,fastrpc-compute-cb"; - reg = <6>; - iommus = <&apps_smmu 0x1006 0x80>, - <&apps_smmu 0x1066 0x0>; - dma-coherent; - }; - - compute-cb@7 { - compatible = "qcom,fastrpc-compute-cb"; - reg = <7>; 
- iommus = <&apps_smmu 0x1007 0x80>, - <&apps_smmu 0x1067 0x0>; - dma-coherent; - }; - }; - - gpr { - compatible = "qcom,gpr"; - qcom,glink-channels = "adsp_apps"; - qcom,domain = <GPR_DOMAIN_ID_ADSP>; - qcom,intents = <512 20>; - #address-cells = <1>; - #size-cells = <0>; - - q6apm: service@1 { - compatible = "qcom,q6apm"; - reg = <GPR_APM_MODULE_IID>; - #sound-dai-cells = <0>; - qcom,protection-domain = "avs/audio", - "msm/adsp/audio_pd"; - - q6apmdai: dais { - compatible = "qcom,q6apm-dais"; - iommus = <&apps_smmu 0x1001 0x80>, - <&apps_smmu 0x1061 0x0>; - }; - - q6apmbedai: bedais { - compatible = "qcom,q6apm-lpass-dais"; - #sound-dai-cells = <1>; - }; - }; - - q6prm: service@2 { - compatible = "qcom,q6prm"; - reg = <GPR_PRM_MODULE_IID>; - qcom,protection-domain = "avs/audio", - "msm/adsp/audio_pd"; - - q6prmcc: clock-controller { - compatible = "qcom,q6prm-lpass-clocks"; - #clock-cells = <2>; - }; - }; - }; - }; - }; - nsp_noc: interconnect@320c0000 { compatible = "qcom,sm8550-nsp-noc"; reg = <0 0x320c0000 0 0xe080>;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
[ Upstream commit a129ac3555c0dca6f04ae404dc0f0790656587fb ]
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new() which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove callback to the void returning variant.
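As an illustration of why an int-returning remove callback is misleading, here is a userspace mock of the unbind path. It is a sketch under stated assumptions: fake_device, fake_driver and fake_unbind() are made-up names that only mimic the behaviour described above (the return value is warned about and then ignored), not the actual platform bus code.

```c
#include <assert.h>
#include <stdio.h>

/* Made-up stand-ins for the real device and driver structures. */
struct fake_device {
	int resources_held;
};

struct fake_driver {
	int (*remove)(struct fake_device *dev);		/* legacy, returns int */
	void (*remove_new)(struct fake_device *dev);	/* void variant */
};

/*
 * Mock unbind path: a non-zero return from the legacy callback is only
 * reported; the device is unbound either way.
 */
static void fake_unbind(const struct fake_driver *drv, struct fake_device *dev)
{
	if (drv->remove_new) {
		drv->remove_new(dev);
	} else if (drv->remove) {
		int ret = drv->remove(dev);

		if (ret)
			fprintf(stderr, "remove returned %d, ignored\n", ret);
	}
}

/* Tries to "fail" removal: the error is dropped and the resource leaks. */
static int leaky_remove(struct fake_device *dev)
{
	(void)dev;
	return -16;	/* stands in for -EBUSY */
}

/* The void variant has no way to bail out, so it must clean up fully. */
static void clean_remove(struct fake_device *dev)
{
	dev->resources_held = 0;
}
```

With the legacy callback, the caller cannot act on the error, so anything the driver skipped stays allocated. The void signature makes that contract visible in the type.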
Reviewed-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Link: https://lore.kernel.org/r/20230925095532.1984344-15-u.kleine-koenig@pengutro... Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Stable-dep-of: c9c0036c1990 ("soc: mediatek: mtk-devapc: Fix leaking IO map on driver remove") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/soc/mediatek/mtk-devapc.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/soc/mediatek/mtk-devapc.c b/drivers/soc/mediatek/mtk-devapc.c
index 0dfc1da9471cb..eb8f92f585882 100644
--- a/drivers/soc/mediatek/mtk-devapc.c
+++ b/drivers/soc/mediatek/mtk-devapc.c
@@ -300,18 +300,16 @@ static int mtk_devapc_probe(struct platform_device *pdev)
 	return ret;
 }
 
-static int mtk_devapc_remove(struct platform_device *pdev)
+static void mtk_devapc_remove(struct platform_device *pdev)
 {
 	struct mtk_devapc_context *ctx = platform_get_drvdata(pdev);
 
 	stop_devapc(ctx);
-
-	return 0;
 }
 
 static struct platform_driver mtk_devapc_driver = {
 	.probe = mtk_devapc_probe,
-	.remove = mtk_devapc_remove,
+	.remove_new = mtk_devapc_remove,
 	.driver = {
 		.name = "mtk-devapc",
 		.of_match_table = mtk_devapc_dt_match,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
[ Upstream commit c9c0036c1990da8d2dd33563e327e05a775fcf10 ]
Driver removal should fully clean up - unmap the memory.
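A small userspace sketch of the pairing this fix restores: whatever probe maps, remove must unmap. The names fake_iomap(), fake_iounmap(), fake_probe() and fake_remove() are hypothetical, and malloc()/free() merely stand in for the real mapping calls so that the leak can be counted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of mappings currently alive; a leak leaves this above zero. */
static int live_mappings;

/* Stand-in for an I/O mapping call (assumption: malloc models the map). */
static void *fake_iomap(size_t len)
{
	void *p = malloc(len);

	if (p)
		live_mappings++;
	return p;
}

/* Stand-in for the matching unmap call. */
static void fake_iounmap(void *p)
{
	if (p) {
		free(p);
		live_mappings--;
	}
}

struct fake_ctx {
	void *infra_base;
};

static int fake_probe(struct fake_ctx *ctx)
{
	ctx->infra_base = fake_iomap(0x1000);
	return ctx->infra_base ? 0 : -1;
}

/* Remove undoes everything probe did, including the mapping. */
static void fake_remove(struct fake_ctx *ctx)
{
	/* Stopping the hardware would happen here, before the unmap. */
	fake_iounmap(ctx->infra_base);
	ctx->infra_base = NULL;
}
```

Dropping the fake_iounmap() call from fake_remove() leaves live_mappings at 1 after every bind/unbind cycle, which is the leak the patch closes.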
Fixes: 0890beb22618 ("soc: mediatek: add mt6779 devapc driver") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Link: https://lore.kernel.org/r/20250104142012.115974-2-krzysztof.kozlowski@linaro... Signed-off-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/soc/mediatek/mtk-devapc.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/soc/mediatek/mtk-devapc.c b/drivers/soc/mediatek/mtk-devapc.c
index eb8f92f585882..d83a46334adbb 100644
--- a/drivers/soc/mediatek/mtk-devapc.c
+++ b/drivers/soc/mediatek/mtk-devapc.c
@@ -305,6 +305,7 @@ static void mtk_devapc_remove(struct platform_device *pdev)
 	struct mtk_devapc_context *ctx = platform_get_drvdata(pdev);
 
 	stop_devapc(ctx);
+	iounmap(ctx->infra_base);
 }
 
 static struct platform_driver mtk_devapc_driver = {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ricardo Ribalda ribalda@chromium.org
[ Upstream commit 64627daf0c5f7838111f52bbbd1a597cb5d6871a ]
Avoid using the iterators after the list_for_each() constructs. This patch should be a NOP, but makes cocci happier:
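The pattern can be sketched in plain C as follows. This is a simplified illustration, not the driver code: struct entity and find_entity() are made up, and the list is a NULL-terminated singly linked list rather than the kernel's circular list_head, where a cursor that runs off the end does not point at a valid entry.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical entity on a NULL-terminated list. */
struct entity {
	int id;
	struct entity *next;
};

/*
 * Look up an entity by id. The loop cursor "iter" is never used after
 * the loop; the result travels through "found", which stays NULL unless
 * a match was seen.
 */
static struct entity *find_entity(struct entity *head, int id)
{
	struct entity *found = NULL;
	struct entity *iter;

	for (iter = head; iter; iter = iter->next) {
		if (iter->id == id) {
			found = iter;
			break;
		}
	}

	return found;
}
```

Keeping the result in a separate pointer also removes the need for a boolean flag, since a NULL result already means "not found".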
drivers/media/usb/uvc/uvc_ctrl.c:1861:44-50: ERROR: invalid reference to the index variable of the iterator on line 1850 drivers/media/usb/uvc/uvc_ctrl.c:2195:17-23: ERROR: invalid reference to the index variable of the iterator on line 2179
Reviewed-by: Sergey Senozhatsky senozhatsky@chromium.org Reviewed-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Signed-off-by: Ricardo Ribalda ribalda@chromium.org Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Stable-dep-of: d9fecd096f67 ("media: uvcvideo: Only save async fh if success") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/usb/uvc/uvc_ctrl.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c index ce70e96b8fb52..f78e0c02b3379 100644 --- a/drivers/media/usb/uvc/uvc_ctrl.c +++ b/drivers/media/usb/uvc/uvc_ctrl.c @@ -1848,16 +1848,18 @@ int __uvc_ctrl_commit(struct uvc_fh *handle, int rollback, list_for_each_entry(entity, &chain->entities, chain) { ret = uvc_ctrl_commit_entity(chain->dev, entity, rollback, &err_ctrl); - if (ret < 0) + if (ret < 0) { + if (ctrls) + ctrls->error_idx = + uvc_ctrl_find_ctrl_idx(entity, ctrls, + err_ctrl); goto done; + } }
if (!rollback) uvc_ctrl_send_events(handle, ctrls->controls, ctrls->count); done: - if (ret < 0 && ctrls) - ctrls->error_idx = uvc_ctrl_find_ctrl_idx(entity, ctrls, - err_ctrl); mutex_unlock(&chain->ctrl_mutex); return ret; } @@ -2170,7 +2172,7 @@ static int uvc_ctrl_init_xu_ctrl(struct uvc_device *dev, int uvc_xu_ctrl_query(struct uvc_video_chain *chain, struct uvc_xu_control_query *xqry) { - struct uvc_entity *entity; + struct uvc_entity *entity, *iter; struct uvc_control *ctrl; unsigned int i; bool found; @@ -2180,16 +2182,16 @@ int uvc_xu_ctrl_query(struct uvc_video_chain *chain, int ret;
/* Find the extension unit. */ - found = false; - list_for_each_entry(entity, &chain->entities, chain) { - if (UVC_ENTITY_TYPE(entity) == UVC_VC_EXTENSION_UNIT && - entity->id == xqry->unit) { - found = true; + entity = NULL; + list_for_each_entry(iter, &chain->entities, chain) { + if (UVC_ENTITY_TYPE(iter) == UVC_VC_EXTENSION_UNIT && + iter->id == xqry->unit) { + entity = iter; break; } }
- if (!found) { + if (!entity) { uvc_dbg(chain->dev, CONTROL, "Extension unit %u not found\n", xqry->unit); return -ENOENT;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ricardo Ribalda ribalda@chromium.org
[ Upstream commit d9fecd096f67a4469536e040a8a10bbfb665918b ]
Now we keep a reference to the active fh for any call to uvc_ctrl_set, regardless of whether it is an actual set, just a try, or an operation that the device refused.
We should only keep the file handle if the device actually accepted applying the operation.
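A minimal sketch of the intended rule, with made-up types: the handle is remembered only when the operation is a real commit (not a rollback) and the device accepted it. Here commit_ctrl() and its device_err parameter are hypothetical; device_err merely models the result of sending the value to the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the driver structures. */
struct fh {
	int id;
};

struct ctrl {
	int async;		/* control completes asynchronously */
	struct fh *handle;	/* file handle awaiting the completion */
};

/*
 * Commit one control. A rollback ("try") never reaches the device, and a
 * refused commit returns early; in both cases the handle is not stored.
 */
static int commit_ctrl(struct ctrl *c, struct fh *handle, int rollback,
		       int device_err)
{
	if (!rollback && device_err)
		return device_err;

	if (!rollback && handle && c->async)
		c->handle = handle;

	return 0;
}
```

Storing the handle at commit time rather than at set time is the design point: only at commit is it known whether the device took the value.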
Cc: stable@vger.kernel.org Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives") Suggested-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Signed-off-by: Ricardo Ribalda ribalda@chromium.org Link: https://lore.kernel.org/r/20241203-uvc-fix-async-v6-1-26c867231118@chromium.... Signed-off-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Signed-off-by: Mauro Carvalho Chehab mchehab+huawei@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/usb/uvc/uvc_ctrl.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c index f78e0c02b3379..478e2c1fdf0fd 100644 --- a/drivers/media/usb/uvc/uvc_ctrl.c +++ b/drivers/media/usb/uvc/uvc_ctrl.c @@ -1762,7 +1762,10 @@ int uvc_ctrl_begin(struct uvc_video_chain *chain) }
static int uvc_ctrl_commit_entity(struct uvc_device *dev, - struct uvc_entity *entity, int rollback, struct uvc_control **err_ctrl) + struct uvc_fh *handle, + struct uvc_entity *entity, + int rollback, + struct uvc_control **err_ctrl) { struct uvc_control *ctrl; unsigned int i; @@ -1810,6 +1813,10 @@ static int uvc_ctrl_commit_entity(struct uvc_device *dev, *err_ctrl = ctrl; return ret; } + + if (!rollback && handle && + ctrl->info.flags & UVC_CTRL_FLAG_ASYNCHRONOUS) + ctrl->handle = handle; }
return 0; @@ -1846,8 +1853,8 @@ int __uvc_ctrl_commit(struct uvc_fh *handle, int rollback,
/* Find the control. */ list_for_each_entry(entity, &chain->entities, chain) { - ret = uvc_ctrl_commit_entity(chain->dev, entity, rollback, - &err_ctrl); + ret = uvc_ctrl_commit_entity(chain->dev, handle, entity, + rollback, &err_ctrl); if (ret < 0) { if (ctrls) ctrls->error_idx = @@ -1997,9 +2004,6 @@ int uvc_ctrl_set(struct uvc_fh *handle, mapping->set(mapping, value, uvc_ctrl_data(ctrl, UVC_CTRL_DATA_CURRENT));
- if (ctrl->info.flags & UVC_CTRL_FLAG_ASYNCHRONOUS) - ctrl->handle = handle; - ctrl->dirty = 1; ctrl->modified = 1; return 0; @@ -2328,7 +2332,7 @@ int uvc_ctrl_restore_values(struct uvc_device *dev) ctrl->dirty = 1; }
- ret = uvc_ctrl_commit_entity(dev, entity, 0, NULL); + ret = uvc_ctrl_commit_entity(dev, NULL, entity, 0, NULL); if (ret < 0) return ret; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ricardo Ribalda ribalda@chromium.org
[ Upstream commit 221cd51efe4565501a3dbf04cc011b537dcce7fb ]
When an async control is written, we copy a pointer to the file handle that started the operation. That pointer will be used when the device is done, which could be anytime in the future.
If the user closes that file descriptor, its structure will be freed, and there will be one dangling pointer per pending async control that the driver will try to use.
Clean all the dangling pointers during release().
To avoid adding a performance penalty in the most common case (no async operation), a counter has been introduced with some logic to make sure that it is properly handled.
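The counter idea can be sketched in userspace as below. This is a simplified model, not the patch itself: locking, warnings and the entity walk are left out, and ctrl_set_handle() and cleanup_fh() are illustrative names that take a flat array of controls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the driver structures. */
struct fh {
	unsigned int pending_async_ctrls;
};

struct ctrl {
	struct fh *handle;
};

/* Move ownership of a control's pending operation, keeping counts exact. */
static void ctrl_set_handle(struct ctrl *c, struct fh *new_handle)
{
	if (c->handle == new_handle)
		return;

	if (c->handle && c->handle->pending_async_ctrls)
		c->handle->pending_async_ctrls--;

	c->handle = new_handle;
	if (new_handle)
		new_handle->pending_async_ctrls++;
}

/*
 * Called when a file handle is released: clear every control that still
 * points at it. The counter makes the common case (nothing pending) cheap.
 */
static void cleanup_fh(struct fh *h, struct ctrl *ctrls, size_t n)
{
	if (!h->pending_async_ctrls)
		return;

	for (size_t i = 0; i < n; i++) {
		if (ctrls[i].handle == h)
			ctrl_set_handle(&ctrls[i], NULL);
	}
}
```

After cleanup_fh() returns, no control refers to the released handle and its counter is back at zero, so a later completion cannot dereference freed memory.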
Cc: stable@vger.kernel.org Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives") Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Ricardo Ribalda ribalda@chromium.org Reviewed-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Link: https://lore.kernel.org/r/20241203-uvc-fix-async-v6-3-26c867231118@chromium.... Signed-off-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Signed-off-by: Mauro Carvalho Chehab mchehab+huawei@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/usb/uvc/uvc_ctrl.c | 59 ++++++++++++++++++++++++++++++-- drivers/media/usb/uvc/uvc_v4l2.c | 2 ++ drivers/media/usb/uvc/uvcvideo.h | 9 ++++- 3 files changed, 67 insertions(+), 3 deletions(-)
diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c index 478e2c1fdf0fd..028c4a5049af9 100644 --- a/drivers/media/usb/uvc/uvc_ctrl.c +++ b/drivers/media/usb/uvc/uvc_ctrl.c @@ -1532,6 +1532,40 @@ static void uvc_ctrl_send_slave_event(struct uvc_video_chain *chain, uvc_ctrl_send_event(chain, handle, ctrl, mapping, val, changes); }
+static void uvc_ctrl_set_handle(struct uvc_fh *handle, struct uvc_control *ctrl, + struct uvc_fh *new_handle) +{ + lockdep_assert_held(&handle->chain->ctrl_mutex); + + if (new_handle) { + if (ctrl->handle) + dev_warn_ratelimited(&handle->stream->dev->udev->dev, + "UVC non compliance: Setting an async control with a pending operation."); + + if (new_handle == ctrl->handle) + return; + + if (ctrl->handle) { + WARN_ON(!ctrl->handle->pending_async_ctrls); + if (ctrl->handle->pending_async_ctrls) + ctrl->handle->pending_async_ctrls--; + } + + ctrl->handle = new_handle; + handle->pending_async_ctrls++; + return; + } + + /* Cannot clear the handle for a control not owned by us.*/ + if (WARN_ON(ctrl->handle != handle)) + return; + + ctrl->handle = NULL; + if (WARN_ON(!handle->pending_async_ctrls)) + return; + handle->pending_async_ctrls--; +} + void uvc_ctrl_status_event(struct uvc_video_chain *chain, struct uvc_control *ctrl, const u8 *data) { @@ -1542,7 +1576,8 @@ void uvc_ctrl_status_event(struct uvc_video_chain *chain, mutex_lock(&chain->ctrl_mutex);
handle = ctrl->handle; - ctrl->handle = NULL; + if (handle) + uvc_ctrl_set_handle(handle, ctrl, NULL);
list_for_each_entry(mapping, &ctrl->info.mappings, list) { s32 value = __uvc_ctrl_get_value(mapping, data); @@ -1816,7 +1851,7 @@ static int uvc_ctrl_commit_entity(struct uvc_device *dev,
if (!rollback && handle && ctrl->info.flags & UVC_CTRL_FLAG_ASYNCHRONOUS) - ctrl->handle = handle; + uvc_ctrl_set_handle(handle, ctrl, handle); }
return 0; @@ -2754,6 +2789,26 @@ int uvc_ctrl_init_device(struct uvc_device *dev) return 0; }
+void uvc_ctrl_cleanup_fh(struct uvc_fh *handle) +{ + struct uvc_entity *entity; + + guard(mutex)(&handle->chain->ctrl_mutex); + + if (!handle->pending_async_ctrls) + return; + + list_for_each_entry(entity, &handle->chain->dev->entities, list) { + for (unsigned int i = 0; i < entity->ncontrols; ++i) { + if (entity->controls[i].handle != handle) + continue; + uvc_ctrl_set_handle(handle, &entity->controls[i], NULL); + } + } + + WARN_ON(handle->pending_async_ctrls); +} + /* * Cleanup device controls. */ diff --git a/drivers/media/usb/uvc/uvc_v4l2.c b/drivers/media/usb/uvc/uvc_v4l2.c index f4988f03640ae..7bcd706281daf 100644 --- a/drivers/media/usb/uvc/uvc_v4l2.c +++ b/drivers/media/usb/uvc/uvc_v4l2.c @@ -659,6 +659,8 @@ static int uvc_v4l2_release(struct file *file)
uvc_dbg(stream->dev, CALLS, "%s\n", __func__);
+ uvc_ctrl_cleanup_fh(handle); + /* Only free resources if this is a privileged handle. */ if (uvc_has_privileges(handle)) uvc_queue_release(&stream->queue); diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h index 30fd056b2aec9..e99bfaa622669 100644 --- a/drivers/media/usb/uvc/uvcvideo.h +++ b/drivers/media/usb/uvc/uvcvideo.h @@ -334,7 +334,11 @@ struct uvc_video_chain { struct uvc_entity *processing; /* Processing unit */ struct uvc_entity *selector; /* Selector unit */
- struct mutex ctrl_mutex; /* Protects ctrl.info */ + struct mutex ctrl_mutex; /* + * Protects ctrl.info, + * ctrl.handle and + * uvc_fh.pending_async_ctrls + */
struct v4l2_prio_state prio; /* V4L2 priority state */ u32 caps; /* V4L2 chain-wide caps */ @@ -609,6 +613,7 @@ struct uvc_fh { struct uvc_video_chain *chain; struct uvc_streaming *stream; enum uvc_handle_state state; + unsigned int pending_async_ctrls; };
struct uvc_driver { @@ -794,6 +799,8 @@ int uvc_ctrl_is_accessible(struct uvc_video_chain *chain, u32 v4l2_id, int uvc_xu_ctrl_query(struct uvc_video_chain *chain, struct uvc_xu_control_query *xqry);
+void uvc_ctrl_cleanup_fh(struct uvc_fh *handle); + /* Utility functions */ struct usb_host_endpoint *uvc_find_endpoint(struct usb_host_interface *alts, u8 epaddr);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miquel Raynal miquel.raynal@bootlin.com
[ Upstream commit ec9c08a1cb8dc5e8e003f95f5f62de41dde235bb ]
Before adding all the NVMEM layout bus infrastructure to the core, let's move the main nvmem_device structure into an internal header, only available to the core. This way all the additional code can be added in a dedicated file in order to keep the current core file tidy.
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Link: https://lore.kernel.org/r/20231215111536.316972-4-srinivas.kandagatla@linaro... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Stable-dep-of: 391b06ecb63e ("nvmem: imx-ocotp-ele: fix MAC address byte order") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvmem/core.c | 24 +----------------------- drivers/nvmem/internals.h | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 36 insertions(+), 23 deletions(-) create mode 100644 drivers/nvmem/internals.h
diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c index fd11d3825cf85..ec35886e921a8 100644 --- a/drivers/nvmem/core.c +++ b/drivers/nvmem/core.c @@ -19,29 +19,7 @@ #include <linux/of.h> #include <linux/slab.h>
-struct nvmem_device { - struct module *owner; - struct device dev; - int stride; - int word_size; - int id; - struct kref refcnt; - size_t size; - bool read_only; - bool root_only; - int flags; - enum nvmem_type type; - struct bin_attribute eeprom; - struct device *base_dev; - struct list_head cells; - const struct nvmem_keepout *keepout; - unsigned int nkeepout; - nvmem_reg_read_t reg_read; - nvmem_reg_write_t reg_write; - struct gpio_desc *wp_gpio; - struct nvmem_layout *layout; - void *priv; -}; +#include "internals.h"
#define to_nvmem_device(d) container_of(d, struct nvmem_device, dev)
diff --git a/drivers/nvmem/internals.h b/drivers/nvmem/internals.h new file mode 100644 index 0000000000000..ce353831cd655 --- /dev/null +++ b/drivers/nvmem/internals.h @@ -0,0 +1,35 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _LINUX_NVMEM_INTERNALS_H +#define _LINUX_NVMEM_INTERNALS_H + +#include <linux/device.h> +#include <linux/nvmem-consumer.h> +#include <linux/nvmem-provider.h> + +struct nvmem_device { + struct module *owner; + struct device dev; + struct list_head node; + int stride; + int word_size; + int id; + struct kref refcnt; + size_t size; + bool read_only; + bool root_only; + int flags; + enum nvmem_type type; + struct bin_attribute eeprom; + struct device *base_dev; + struct list_head cells; + const struct nvmem_keepout *keepout; + unsigned int nkeepout; + nvmem_reg_read_t reg_read; + nvmem_reg_write_t reg_write; + struct gpio_desc *wp_gpio; + struct nvmem_layout *layout; + void *priv; +}; + +#endif /* ifndef _LINUX_NVMEM_INTERNALS_H */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miquel Raynal miquel.raynal@bootlin.com
[ Upstream commit 1b7c298a4ecbc28cc6ee94005734bff55eb83d22 ]
The layout entry is not used and will be made useless anyway by the new layout bus infrastructure coming next, so drop it. While at it, clarify the kdoc entry.
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Link: https://lore.kernel.org/r/20231215111536.316972-5-srinivas.kandagatla@linaro... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Stable-dep-of: 391b06ecb63e ("nvmem: imx-ocotp-ele: fix MAC address byte order") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvmem/core.c | 2 +- drivers/nvmem/layouts/onie-tlv.c | 3 +-- drivers/nvmem/layouts/sl28vpd.c | 3 +-- include/linux/nvmem-provider.h | 8 +++----- 4 files changed, 6 insertions(+), 10 deletions(-)
diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c index ec35886e921a8..ed8a1cba361e2 100644 --- a/drivers/nvmem/core.c +++ b/drivers/nvmem/core.c @@ -815,7 +815,7 @@ static int nvmem_add_cells_from_layout(struct nvmem_device *nvmem) int ret;
if (layout && layout->add_cells) { - ret = layout->add_cells(&nvmem->dev, nvmem, layout); + ret = layout->add_cells(&nvmem->dev, nvmem); if (ret) return ret; } diff --git a/drivers/nvmem/layouts/onie-tlv.c b/drivers/nvmem/layouts/onie-tlv.c index 59fc87ccfcffe..defd42d4375cc 100644 --- a/drivers/nvmem/layouts/onie-tlv.c +++ b/drivers/nvmem/layouts/onie-tlv.c @@ -182,8 +182,7 @@ static bool onie_tlv_crc_is_valid(struct device *dev, size_t table_len, u8 *tabl return true; }
-static int onie_tlv_parse_table(struct device *dev, struct nvmem_device *nvmem, - struct nvmem_layout *layout) +static int onie_tlv_parse_table(struct device *dev, struct nvmem_device *nvmem) { struct onie_tlv_hdr hdr; size_t table_len, data_len, hdr_len; diff --git a/drivers/nvmem/layouts/sl28vpd.c b/drivers/nvmem/layouts/sl28vpd.c index 05671371f6316..26c7cf21b5233 100644 --- a/drivers/nvmem/layouts/sl28vpd.c +++ b/drivers/nvmem/layouts/sl28vpd.c @@ -80,8 +80,7 @@ static int sl28vpd_v1_check_crc(struct device *dev, struct nvmem_device *nvmem) return 0; }
-static int sl28vpd_add_cells(struct device *dev, struct nvmem_device *nvmem, - struct nvmem_layout *layout) +static int sl28vpd_add_cells(struct device *dev, struct nvmem_device *nvmem) { const struct nvmem_cell_info *pinfo; struct nvmem_cell_info info = {0}; diff --git a/include/linux/nvmem-provider.h b/include/linux/nvmem-provider.h index 1b81adebdb8be..ecd580ee84db9 100644 --- a/include/linux/nvmem-provider.h +++ b/include/linux/nvmem-provider.h @@ -158,9 +158,8 @@ struct nvmem_cell_table { * * @name: Layout name. * @of_match_table: Open firmware match table. - * @add_cells: Will be called if a nvmem device is found which - * has this layout. The function will add layout - * specific cells with nvmem_add_one_cell(). + * @add_cells: Called to populate the layout using + * nvmem_add_one_cell(). * @fixup_cell_info: Will be called before a cell is added. Can be * used to modify the nvmem_cell_info. * @owner: Pointer to struct module. @@ -174,8 +173,7 @@ struct nvmem_cell_table { struct nvmem_layout { const char *name; const struct of_device_id *of_match_table; - int (*add_cells)(struct device *dev, struct nvmem_device *nvmem, - struct nvmem_layout *layout); + int (*add_cells)(struct device *dev, struct nvmem_device *nvmem); void (*fixup_cell_info)(struct nvmem_device *nvmem, struct nvmem_layout *layout, struct nvmem_cell_info *cell);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miquel Raynal miquel.raynal@bootlin.com
[ Upstream commit 1172460e716784ac7e1049a537bdca8edbf97360 ]
This hook is meant to be used by any provider and instantiating a layout just for this is useless. Let's instead move this hook to the nvmem device and add it to the config structure to be easily shared by the providers.
While moving this hook, rename it to ->fixup_dt_cell_info() to clarify its main intended purpose.
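For reviewers who want the shape of the change without reading the whole diff, here is a minimal user-space sketch of the pattern: the hook lives in the config, is copied to the device at registration, and is called (if set) before a DT cell is added. All model_* names are hypothetical stand-ins, not the kernel structures or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_cell_info {
	const char *name;
	int post_processed;	/* set by the fixup hook in this model */
};

struct model_device;

struct model_config {
	void (*fixup_dt_cell_info)(struct model_device *dev,
				   struct model_cell_info *cell);
};

struct model_device {
	void (*fixup_dt_cell_info)(struct model_device *dev,
				   struct model_cell_info *cell);
};

/* Models the assignment the patch adds to nvmem_register(). */
static void model_register(struct model_device *dev,
			   const struct model_config *cfg)
{
	assert(dev != NULL && cfg != NULL);
	dev->fixup_dt_cell_info = cfg->fixup_dt_cell_info;
}

/* Models the call site in nvmem_add_cells_from_dt(). */
static void model_add_cell(struct model_device *dev,
			   struct model_cell_info *cell)
{
	if (dev->fixup_dt_cell_info)
		dev->fixup_dt_cell_info(dev, cell);
}

/* A provider's hook; here it only marks the cell. */
static void model_fixup(struct model_device *dev,
			struct model_cell_info *cell)
{
	(void)dev;
	cell->post_processed = 1;
}
```

A provider that does not set the hook leaves the pointer NULL, and the cell is added unmodified; no dummy layout object is needed either way.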
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Link: https://lore.kernel.org/r/20231215111536.316972-6-srinivas.kandagatla@linaro... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Stable-dep-of: 391b06ecb63e ("nvmem: imx-ocotp-ele: fix MAC address byte order") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvmem/core.c | 6 +++--- drivers/nvmem/imx-ocotp.c | 11 +++-------- drivers/nvmem/internals.h | 2 ++ drivers/nvmem/mtk-efuse.c | 11 +++-------- include/linux/nvmem-provider.h | 9 ++++----- 5 files changed, 15 insertions(+), 24 deletions(-)
diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c index ed8a1cba361e2..3ea94bc26e800 100644 --- a/drivers/nvmem/core.c +++ b/drivers/nvmem/core.c @@ -674,7 +674,6 @@ static int nvmem_validate_keepouts(struct nvmem_device *nvmem)
static int nvmem_add_cells_from_dt(struct nvmem_device *nvmem, struct device_node *np) { - struct nvmem_layout *layout = nvmem->layout; struct device *dev = &nvmem->dev; struct device_node *child; const __be32 *addr; @@ -704,8 +703,8 @@ static int nvmem_add_cells_from_dt(struct nvmem_device *nvmem, struct device_nod
info.np = of_node_get(child);
- if (layout && layout->fixup_cell_info) - layout->fixup_cell_info(nvmem, layout, &info); + if (nvmem->fixup_dt_cell_info) + nvmem->fixup_dt_cell_info(nvmem, &info);
ret = nvmem_add_one_cell(nvmem, &info); kfree(info.name); @@ -902,6 +901,7 @@ struct nvmem_device *nvmem_register(const struct nvmem_config *config)
kref_init(&nvmem->refcnt); INIT_LIST_HEAD(&nvmem->cells); + nvmem->fixup_dt_cell_info = config->fixup_dt_cell_info;
nvmem->owner = config->owner; if (!nvmem->owner && config->dev->driver) diff --git a/drivers/nvmem/imx-ocotp.c b/drivers/nvmem/imx-ocotp.c index f1e202efaa497..79dd4fda03295 100644 --- a/drivers/nvmem/imx-ocotp.c +++ b/drivers/nvmem/imx-ocotp.c @@ -583,17 +583,12 @@ static const struct of_device_id imx_ocotp_dt_ids[] = { }; MODULE_DEVICE_TABLE(of, imx_ocotp_dt_ids);
-static void imx_ocotp_fixup_cell_info(struct nvmem_device *nvmem, - struct nvmem_layout *layout, - struct nvmem_cell_info *cell) +static void imx_ocotp_fixup_dt_cell_info(struct nvmem_device *nvmem, + struct nvmem_cell_info *cell) { cell->read_post_process = imx_ocotp_cell_pp; }
-static struct nvmem_layout imx_ocotp_layout = { - .fixup_cell_info = imx_ocotp_fixup_cell_info, -}; - static int imx_ocotp_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; @@ -619,7 +614,7 @@ static int imx_ocotp_probe(struct platform_device *pdev) imx_ocotp_nvmem_config.size = 4 * priv->params->nregs; imx_ocotp_nvmem_config.dev = dev; imx_ocotp_nvmem_config.priv = priv; - imx_ocotp_nvmem_config.layout = &imx_ocotp_layout; + imx_ocotp_nvmem_config.fixup_dt_cell_info = &imx_ocotp_fixup_dt_cell_info;
priv->config = &imx_ocotp_nvmem_config;
diff --git a/drivers/nvmem/internals.h b/drivers/nvmem/internals.h index ce353831cd655..893553fbdf51a 100644 --- a/drivers/nvmem/internals.h +++ b/drivers/nvmem/internals.h @@ -23,6 +23,8 @@ struct nvmem_device { struct bin_attribute eeprom; struct device *base_dev; struct list_head cells; + void (*fixup_dt_cell_info)(struct nvmem_device *nvmem, + struct nvmem_cell_info *cell); const struct nvmem_keepout *keepout; unsigned int nkeepout; nvmem_reg_read_t reg_read; diff --git a/drivers/nvmem/mtk-efuse.c b/drivers/nvmem/mtk-efuse.c index 87c94686cfd21..84f05b40a4112 100644 --- a/drivers/nvmem/mtk-efuse.c +++ b/drivers/nvmem/mtk-efuse.c @@ -45,9 +45,8 @@ static int mtk_efuse_gpu_speedbin_pp(void *context, const char *id, int index, return 0; }
-static void mtk_efuse_fixup_cell_info(struct nvmem_device *nvmem, - struct nvmem_layout *layout, - struct nvmem_cell_info *cell) +static void mtk_efuse_fixup_dt_cell_info(struct nvmem_device *nvmem, + struct nvmem_cell_info *cell) { size_t sz = strlen(cell->name);
@@ -61,10 +60,6 @@ static void mtk_efuse_fixup_cell_info(struct nvmem_device *nvmem, cell->read_post_process = mtk_efuse_gpu_speedbin_pp; }
-static struct nvmem_layout mtk_efuse_layout = { - .fixup_cell_info = mtk_efuse_fixup_cell_info, -}; - static int mtk_efuse_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; @@ -91,7 +86,7 @@ static int mtk_efuse_probe(struct platform_device *pdev) econfig.priv = priv; econfig.dev = dev; if (pdata->uses_post_processing) - econfig.layout = &mtk_efuse_layout; + econfig.fixup_dt_cell_info = &mtk_efuse_fixup_dt_cell_info; nvmem = devm_nvmem_register(dev, &econfig);
return PTR_ERR_OR_ZERO(nvmem); diff --git a/include/linux/nvmem-provider.h b/include/linux/nvmem-provider.h index ecd580ee84db9..9a015e4d428cc 100644 --- a/include/linux/nvmem-provider.h +++ b/include/linux/nvmem-provider.h @@ -83,6 +83,8 @@ struct nvmem_cell_info { * @cells: Optional array of pre-defined NVMEM cells. * @ncells: Number of elements in cells. * @add_legacy_fixed_of_cells: Read fixed NVMEM cells from old OF syntax. + * @fixup_dt_cell_info: Will be called before a cell is added. Can be + * used to modify the nvmem_cell_info. * @keepout: Optional array of keepout ranges (sorted ascending by start). * @nkeepout: Number of elements in the keepout array. * @type: Type of the nvmem storage @@ -114,6 +116,8 @@ struct nvmem_config { const struct nvmem_cell_info *cells; int ncells; bool add_legacy_fixed_of_cells; + void (*fixup_dt_cell_info)(struct nvmem_device *nvmem, + struct nvmem_cell_info *cell); const struct nvmem_keepout *keepout; unsigned int nkeepout; enum nvmem_type type; @@ -160,8 +164,6 @@ struct nvmem_cell_table { * @of_match_table: Open firmware match table. * @add_cells: Called to populate the layout using * nvmem_add_one_cell(). - * @fixup_cell_info: Will be called before a cell is added. Can be - * used to modify the nvmem_cell_info. * @owner: Pointer to struct module. * @node: List node. * @@ -174,9 +176,6 @@ struct nvmem_layout { const char *name; const struct of_device_id *of_match_table; int (*add_cells)(struct device *dev, struct nvmem_device *nvmem); - void (*fixup_cell_info)(struct nvmem_device *nvmem, - struct nvmem_layout *layout, - struct nvmem_cell_info *cell);
/* private */ struct module *owner;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sascha Hauer s.hauer@pengutronix.de
[ Upstream commit 391b06ecb63e6eacd054582cb4eb738dfbf5eb77 ]
According to the i.MX93 Fusemap the two MAC addresses are stored in words 315 to 317 like this:
    315 MAC1_ADDR_31_0[31:0]
    316 MAC1_ADDR_47_32[47:32] MAC2_ADDR_15_0[15:0]
    317 MAC2_ADDR_47_16[31:0]
This means the MAC addresses are stored in reverse byte order. We have to swap the bytes before passing them to the upper layers. The storage format is consistent with the one used on i.MX6 by the imx-ocotp driver, which does the same byte swapping as introduced here.
With this patch the MAC address on my i.MX93 TQ board correctly reads as 00:d0:93:6b:27:b8 instead of b8:27:6b:93:d0:00.
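As a quick sanity check of the numbers above, the byte reversal can be reproduced in user space. This is only a sketch of the same loop as in imx_ocotp_cell_pp(); reverse_bytes() is a hypothetical helper name, and the kernel code uses swap() rather than a temporary.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse a buffer in place, mirroring the loop in imx_ocotp_cell_pp(). */
static void reverse_bytes(unsigned char *buf, size_t bytes)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < bytes / 2; i++) {
		unsigned char tmp = buf[i];

		buf[i] = buf[bytes - i - 1];
		buf[bytes - i - 1] = tmp;
	}
}
```

Feeding it the six bytes b8 27 6b 93 d0 00 yields 00 d0 93 6b 27 b8, matching the address reported for the TQ board.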
Fixes: 22e9e6fcfb50 ("nvmem: imx: support i.MX93 OCOTP") Signed-off-by: Sascha Hauer s.hauer@pengutronix.de Cc: stable stable@kernel.org Reviewed-by: Peng Fan peng.fan@nxp.com Signed-off-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Link: https://lore.kernel.org/r/20241230141901.263976-4-srinivas.kandagatla@linaro... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvmem/imx-ocotp-ele.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/drivers/nvmem/imx-ocotp-ele.c b/drivers/nvmem/imx-ocotp-ele.c index dfc925edfc83e..1356ec93bfd00 100644 --- a/drivers/nvmem/imx-ocotp-ele.c +++ b/drivers/nvmem/imx-ocotp-ele.c @@ -107,6 +107,26 @@ static int imx_ocotp_reg_read(void *context, unsigned int offset, void *val, siz return 0; };
+static int imx_ocotp_cell_pp(void *context, const char *id, int index, + unsigned int offset, void *data, size_t bytes) +{ + u8 *buf = data; + int i; + + /* Deal with some post processing of nvmem cell data */ + if (id && !strcmp(id, "mac-address")) + for (i = 0; i < bytes / 2; i++) + swap(buf[i], buf[bytes - i - 1]); + + return 0; +} + +static void imx_ocotp_fixup_dt_cell_info(struct nvmem_device *nvmem, + struct nvmem_cell_info *cell) +{ + cell->read_post_process = imx_ocotp_cell_pp; +} + static int imx_ele_ocotp_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; @@ -133,6 +153,8 @@ static int imx_ele_ocotp_probe(struct platform_device *pdev) priv->config.stride = 1; priv->config.priv = priv; priv->config.read_only = true; + priv->config.add_legacy_fixed_of_cells = true; + priv->config.fixup_dt_cell_info = imx_ocotp_fixup_dt_cell_info; mutex_init(&priv->lock);
nvmem = devm_nvmem_register(dev, &priv->config);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Roy Luo royluo@google.com
[ Upstream commit 0ef40f399aa2be8c04aee9b7430705612c104ce5 ]
The udc device and the gadget device are tightly coupled, yet there's no good way to correlate the two. Add a sysfs link in udc that points to the corresponding gadget device. An example use case: userspace configures a f_midi configfs driver and binds the udc device, then tries to locate the corresponding midi device, which is a child device of the gadget device. The gadget device that's associated with the udc device has to be identified in order to index the midi device. Having a sysfs link would make things much easier.
Signed-off-by: Roy Luo royluo@google.com Link: https://lore.kernel.org/r/20240307030922.3573161-1-royluo@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Stable-dep-of: 399a45e5237c ("usb: gadget: core: flush gadget workqueue after device removal") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/udc/core.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c index 33979f61dc4dd..1d58adc597a7e 100644 --- a/drivers/usb/gadget/udc/core.c +++ b/drivers/usb/gadget/udc/core.c @@ -1419,8 +1419,16 @@ int usb_add_gadget(struct usb_gadget *gadget) if (ret) goto err_free_id;
+ ret = sysfs_create_link(&udc->dev.kobj, + &gadget->dev.kobj, "gadget"); + if (ret) + goto err_del_gadget; + return 0;
+ err_del_gadget: + device_del(&gadget->dev); + err_free_id: ida_free(&gadget_id_numbers, gadget->id_number);
@@ -1529,6 +1537,7 @@ void usb_del_gadget(struct usb_gadget *gadget) mutex_unlock(&udc_lock);
kobject_uevent(&udc->dev.kobj, KOBJ_REMOVE); + sysfs_remove_link(&udc->dev.kobj, "gadget"); flush_work(&gadget->work); device_del(&gadget->dev); ida_free(&gadget_id_numbers, gadget->id_number);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Roy Luo royluo@google.com
[ Upstream commit 399a45e5237ca14037120b1b895bd38a3b4492ea ]
device_del() can lead to new work being scheduled in gadget->work workqueue. This is observed, for example, with the dwc3 driver with the following call stack:
  device_del()
    gadget_unbind_driver()
      usb_gadget_disconnect_locked()
        dwc3_gadget_pullup()
          dwc3_gadget_soft_disconnect()
            usb_gadget_set_state()
              schedule_work(&gadget->work)
Move flush_work() after device_del() to ensure the workqueue is cleaned up.
Fixes: 5702f75375aa9 ("usb: gadget: udc-core: move sysfs_notify() to a workqueue") Cc: stable stable@kernel.org Signed-off-by: Roy Luo royluo@google.com Reviewed-by: Alan Stern stern@rowland.harvard.edu Reviewed-by: Thinh Nguyen Thinh.Nguyen@synopsys.com Link: https://lore.kernel.org/r/20250204233642.666991-1-royluo@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/udc/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c index 1d58adc597a7e..a4120a25428e5 100644 --- a/drivers/usb/gadget/udc/core.c +++ b/drivers/usb/gadget/udc/core.c @@ -1538,8 +1538,8 @@ void usb_del_gadget(struct usb_gadget *gadget)
kobject_uevent(&udc->dev.kobj, KOBJ_REMOVE); sysfs_remove_link(&udc->dev.kobj, "gadget"); - flush_work(&gadget->work); device_del(&gadget->dev); + flush_work(&gadget->work); ida_free(&gadget_id_numbers, gadget->id_number); cancel_work_sync(&udc->vbus_work); device_unregister(&udc->dev);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jill Donahue jilliandonahue58@gmail.com
[ Upstream commit 4ab37fcb42832cdd3e9d5e50653285ca84d6686f ]
When using USB MIDI, a re-entrant call to f_midi_transmit() attempts to acquire the same lock twice, causing a deadlock.
Fix it by using queue_work() to schedule the inner f_midi_transmit() via a high priority work queue from the completion handler.
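The idea can be illustrated with a small user-space model, under the assumption that the completion handler may run while the transmit path still holds its (non-recursive) lock. All model_* names are hypothetical; the flag-based "lock" and one-slot "work queue" only stand in for the real primitives.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a non-recursive lock and a one-slot work queue. */
struct model_midi {
	bool locked;		/* stands in for the transmit lock */
	bool work_pending;	/* stands in for a queued work item */
	int remaining;		/* requests still to send */
	int transmitted;	/* requests sent so far */
};

/* Completion handler: may run while the lock is still held. */
static void model_complete(struct model_midi *m)
{
	/* Defer the next transmit instead of calling it directly. */
	m->work_pending = true;
}

static void model_transmit(struct model_midi *m)
{
	assert(!m->locked);	/* re-entry here would be the deadlock */
	m->locked = true;
	if (m->remaining > 0) {
		m->remaining--;
		m->transmitted++;
		model_complete(m);	/* completion fires under the lock */
	}
	m->locked = false;
}

/* Worker: runs later, after the lock has been dropped. */
static void model_run_work(struct model_midi *m)
{
	if (m->work_pending) {
		m->work_pending = false;
		model_transmit(m);
	}
}
```

Had model_complete() called model_transmit() directly, the assertion on the lock flag would trip, which is the model's equivalent of the reported deadlock.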
Link: https://lore.kernel.org/all/CAArt=LjxU0fUZOj06X+5tkeGT+6RbXzpWg1h4t4Fwa_KGVA... Fixes: d5daf49b58661 ("USB: gadget: midi: add midi function driver") Cc: stable stable@kernel.org Signed-off-by: Jill Donahue jilliandonahue58@gmail.com Link: https://lore.kernel.org/r/20250211174805.1369265-1-jdonahue@fender.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/function/f_midi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/f_midi.c b/drivers/usb/gadget/function/f_midi.c index 49946af11a905..6d91d7d7a23f8 100644 --- a/drivers/usb/gadget/function/f_midi.c +++ b/drivers/usb/gadget/function/f_midi.c @@ -282,7 +282,7 @@ f_midi_complete(struct usb_ep *ep, struct usb_request *req) /* Our transmit completed. See if there's more to go. * f_midi_transmit eats req, don't queue it again. */ req->length = 0; - f_midi_transmit(midi); + queue_work(system_highpri_wq, &midi->work); return; } break;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Keeping jkeeping@inmusicbrands.com
[ Upstream commit 6b24e67b4056ba83b1e95e005b7e50fdb1cc6cf4 ]
Commit 2f45a4e289779 ("ASoC: rockchip: i2s_tdm: Fixup config for SND_SOC_DAIFMT_DSP_A/B") applied a partial change to fix the configuration for DSP A and DSP B formats.
The shift control also needs updating to set the correct offset for frame data compared to LRCK. Set the correct values.
Fixes: 081068fd64140 ("ASoC: rockchip: add support for i2s-tdm controller") Signed-off-by: John Keeping jkeeping@inmusicbrands.com Link: https://patch.msgid.link/20250204161311.2117240-1-jkeeping@inmusicbrands.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/rockchip/rockchip_i2s_tdm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/soc/rockchip/rockchip_i2s_tdm.c b/sound/soc/rockchip/rockchip_i2s_tdm.c index 14e5c53e697b0..7ae93cbaea9a7 100644 --- a/sound/soc/rockchip/rockchip_i2s_tdm.c +++ b/sound/soc/rockchip/rockchip_i2s_tdm.c @@ -453,11 +453,11 @@ static int rockchip_i2s_tdm_set_fmt(struct snd_soc_dai *cpu_dai, break; case SND_SOC_DAIFMT_DSP_A: val = I2S_TXCR_TFS_TDM_PCM; - tdm_val = TDM_SHIFT_CTRL(0); + tdm_val = TDM_SHIFT_CTRL(2); break; case SND_SOC_DAIFMT_DSP_B: val = I2S_TXCR_TFS_TDM_PCM; - tdm_val = TDM_SHIFT_CTRL(2); + tdm_val = TDM_SHIFT_CTRL(4); break; default: ret = -EINVAL;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Ellerman mpe@ellerman.id.au
[ Upstream commit 8ae4f16f7d7b59cca55aeca6db7c9636ffe7fbaa ]
The stub versions of __real_pte() etc are only used with HPT & 4K pages, so move them into the hash-4k.h header.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20240821080729.872034-1-mpe@ellerman.id.au Stable-dep-of: 61bcc752d1b8 ("powerpc/64s: Rewrite __real_pte() and __rpte_to_hidx() as static inline") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/book3s/64/hash-4k.h | 20 +++++++++++++++ arch/powerpc/include/asm/book3s/64/pgtable.h | 26 -------------------- 2 files changed, 20 insertions(+), 26 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h index 6472b08fa1b0c..57ebbacf1709c 100644 --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h @@ -89,6 +89,26 @@ static inline int hash__hugepd_ok(hugepd_t hpd) } #endif
+/* + * With 4K page size the real_pte machinery is all nops. + */ +#define __real_pte(e, p, o) ((real_pte_t){(e)}) +#define __rpte_to_pte(r) ((r).pte) +#define __rpte_to_hidx(r,index) (pte_val(__rpte_to_pte(r)) >> H_PAGE_F_GIX_SHIFT) + +#define pte_iterate_hashed_subpages(rpte, psize, va, index, shift) \ + do { \ + index = 0; \ + shift = mmu_psize_defs[psize].shift; \ + +#define pte_iterate_hashed_end() } while(0) + +/* + * We expect this to be called only for user addresses or kernel virtual + * addresses other than the linear mapping. + */ +#define pte_pagesize_index(mm, addr, pte) MMU_PAGE_4K + /* * 4K PTE format is different from 64K PTE format. Saving the hash_slot is just * a matter of returning the PTE bits that need to be modified. On 64K PTE, diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index 5c497c862d757..8a6e6b6daa906 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -319,32 +319,6 @@ extern unsigned long pci_io_base;
#ifndef __ASSEMBLY__
-/* - * This is the default implementation of various PTE accessors, it's - * used in all cases except Book3S with 64K pages where we have a - * concept of sub-pages - */ -#ifndef __real_pte - -#define __real_pte(e, p, o) ((real_pte_t){(e)}) -#define __rpte_to_pte(r) ((r).pte) -#define __rpte_to_hidx(r,index) (pte_val(__rpte_to_pte(r)) >> H_PAGE_F_GIX_SHIFT) - -#define pte_iterate_hashed_subpages(rpte, psize, va, index, shift) \ - do { \ - index = 0; \ - shift = mmu_psize_defs[psize].shift; \ - -#define pte_iterate_hashed_end() } while(0) - -/* - * We expect this to be called only for user addresses or kernel virtual - * addresses other than the linear mapping. - */ -#define pte_pagesize_index(mm, addr, pte) MMU_PAGE_4K - -#endif /* __real_pte */ - static inline unsigned long pte_update(struct mm_struct *mm, unsigned long addr, pte_t *ptep, unsigned long clr, unsigned long set, int huge)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe Leroy christophe.leroy@csgroup.eu
[ Upstream commit 61bcc752d1b81fde3cae454ff20c1d3c359df500 ]
Rewrite __real_pte() and __rpte_to_hidx() as static inline in order to avoid following warnings/errors when building with 4k page size:
  CC      arch/powerpc/mm/book3s64/hash_tlb.o
arch/powerpc/mm/book3s64/hash_tlb.c: In function 'hpte_need_flush':
arch/powerpc/mm/book3s64/hash_tlb.c:49:16: error: variable 'offset' set but not used [-Werror=unused-but-set-variable]
   49 |         int i, offset;
      |                ^~~~~~
  CC      arch/powerpc/mm/book3s64/hash_native.o
arch/powerpc/mm/book3s64/hash_native.c: In function 'native_flush_hash_range':
arch/powerpc/mm/book3s64/hash_native.c:782:29: error: variable 'index' set but not used [-Werror=unused-but-set-variable]
  782 |         unsigned long hash, index, hidx, shift, slot;
      |                             ^~~~~
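The difference between the two forms can be sketched in user space. A function-like macro that drops an argument never evaluates it, so a variable that is only assigned and then handed to the macro has no reads left after preprocessing; passing the same variable to a static inline function is a read. The model_* names below are hypothetical, and plain unsigned long stands in for pte_t and real_pte_t.

```c
#include <assert.h>
#include <stddef.h>

/* Macro form: the second and third arguments vanish after expansion. */
#define model_real_pte_macro(e, p, o)	(e)

/* Inline form: every argument is a real parameter of a real call. */
static inline unsigned long model_real_pte_inline(unsigned long e,
						  unsigned long *p, int o)
{
	(void)p;	/* quiet -Wunused-parameter in user-space builds */
	(void)o;
	return e;
}

static unsigned long model_caller(unsigned long pte)
{
	int offset;

	offset = 4;	/* assigned, then read by the call below */

	/* Both forms yield the same value; only the diagnostics differ. */
	assert(model_real_pte_macro(pte, NULL, 0) == pte);
	return model_real_pte_inline(pte, NULL, offset);
}
```

If model_caller() passed offset to the macro instead, the variable would be set but never read, which is the situation the errors above describe.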
Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202501081741.AYFwybsq-lkp@intel.com/ Fixes: ff31e105464d ("powerpc/mm/hash64: Store the slot information at the right offset for hugetlb") Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Reviewed-by: Ritesh Harjani (IBM) ritesh.list@gmail.com Signed-off-by: Madhavan Srinivasan maddy@linux.ibm.com Link: https://patch.msgid.link/e0d340a5b7bd478ecbf245d826e6ab2778b74e06.1736706263... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/book3s/64/hash-4k.h | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h index 57ebbacf1709c..2a2649e0f91df 100644 --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h @@ -92,9 +92,17 @@ static inline int hash__hugepd_ok(hugepd_t hpd) /* * With 4K page size the real_pte machinery is all nops. */ -#define __real_pte(e, p, o) ((real_pte_t){(e)}) +static inline real_pte_t __real_pte(pte_t pte, pte_t *ptep, int offset) +{ + return (real_pte_t){pte}; +} + #define __rpte_to_pte(r) ((r).pte) -#define __rpte_to_hidx(r,index) (pte_val(__rpte_to_pte(r)) >> H_PAGE_F_GIX_SHIFT) + +static inline unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long index) +{ + return pte_val(__rpte_to_pte(rpte)) >> H_PAGE_F_GIX_SHIFT; +}
#define pte_iterate_hashed_subpages(rpte, psize, va, index, shift) \ do { \
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kailang Yang kailang@realtek.com
[ Upstream commit 174448badb4409491bfba2e6b46f7aa078741c5e ]
Headset MIC will not function when power_save=0.
Fixes: 1fd50509fe14 ("ALSA: hda/realtek: Update ALC225 depop procedure") Link: https://bugzilla.kernel.org/show_bug.cgi?id=219743 Signed-off-by: Kailang Yang kailang@realtek.com Link: https://lore.kernel.org/0474a095ab0044d0939ec4bf4362423d@realtek.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index abe3d5b9b84b3..75162e5f712b4 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -3779,6 +3779,7 @@ static void alc225_init(struct hda_codec *codec) AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_UNMUTE);
msleep(75); + alc_update_coef_idx(codec, 0x4a, 3 << 10, 0); alc_update_coefex_idx(codec, 0x57, 0x04, 0x0007, 0x4); /* Hight power */ } }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe Leroy christophe.leroy@csgroup.eu
[ Upstream commit d262a192d38e527faa5984629aabda2e0d1c4f54 ]
Erhard reported the following KASAN hit while booting his PowerMac G4 with a KASAN-enabled kernel 6.13-rc6:
BUG: KASAN: vmalloc-out-of-bounds in copy_to_kernel_nofault+0xd8/0x1c8 Write of size 8 at addr f1000000 by task chronyd/1293
CPU: 0 UID: 123 PID: 1293 Comm: chronyd Tainted: G W 6.13.0-rc6-PMacG4 #2 Tainted: [W]=WARN Hardware name: PowerMac3,6 7455 0x80010303 PowerMac Call Trace: [c2437590] [c1631a84] dump_stack_lvl+0x70/0x8c (unreliable) [c24375b0] [c0504998] print_report+0xdc/0x504 [c2437610] [c050475c] kasan_report+0xf8/0x108 [c2437690] [c0505a3c] kasan_check_range+0x24/0x18c [c24376a0] [c03fb5e4] copy_to_kernel_nofault+0xd8/0x1c8 [c24376c0] [c004c014] patch_instructions+0x15c/0x16c [c2437710] [c00731a8] bpf_arch_text_copy+0x60/0x7c [c2437730] [c0281168] bpf_jit_binary_pack_finalize+0x50/0xac [c2437750] [c0073cf4] bpf_int_jit_compile+0xb30/0xdec [c2437880] [c0280394] bpf_prog_select_runtime+0x15c/0x478 [c24378d0] [c1263428] bpf_prepare_filter+0xbf8/0xc14 [c2437990] [c12677ec] bpf_prog_create_from_user+0x258/0x2b4 [c24379d0] [c027111c] do_seccomp+0x3dc/0x1890 [c2437ac0] [c001d8e0] system_call_exception+0x2dc/0x420 [c2437f30] [c00281ac] ret_from_syscall+0x0/0x2c --- interrupt: c00 at 0x5a1274 NIP: 005a1274 LR: 006a3b3c CTR: 005296c8 REGS: c2437f40 TRAP: 0c00 Tainted: G W (6.13.0-rc6-PMacG4) MSR: 0200f932 <VEC,EE,PR,FP,ME,IR,DR,RI> CR: 24004422 XER: 00000000
GPR00: 00000166 af8f3fa0 a7ee3540 00000001 00000000 013b6500 005a5858 0200f932 GPR08: 00000000 00001fe9 013d5fc8 005296c8 2822244c 00b2fcd8 00000000 af8f4b57 GPR16: 00000000 00000001 00000000 00000000 00000000 00000001 00000000 00000002 GPR24: 00afdbb0 00000000 00000000 00000000 006e0004 013ce060 006e7c1c 00000001 NIP [005a1274] 0x5a1274 LR [006a3b3c] 0x6a3b3c --- interrupt: c00
The buggy address belongs to the virtual mapping at [f1000000, f1002000) created by: text_area_cpu_up+0x20/0x190
The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0x76e30 flags: 0x80000000(zone=2) raw: 80000000 00000000 00000122 00000000 00000000 00000000 ffffffff 00000001 raw: 00000000 page dumped because: kasan: bad access detected
Memory state around the buggy address:
 f0ffff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 f0ffff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>f1000000: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
           ^
 f1000080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
 f1000100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
==================================================================
f8 corresponds to KASAN_VMALLOC_INVALID, which means the area is not initialised and hence not supposed to be used yet.
The powerpc text patching infrastructure allocates a virtual memory area using get_vm_area() and flags it as VM_ALLOC. But that flag is meant to be used for vmalloc(), and vmalloc()-allocated memory is not supposed to be used before a call to __vmalloc_node_range(), which is never called for that area.
That went undetected until commit e4137f08816b ("mm, kasan, kmsan: instrument copy_from/to_kernel_nofault").
The area allocated by text_area_cpu_up() is not vmalloc memory; it is mapped directly on demand when needed by map_kernel_page(). There is no VM flag corresponding to such usage, so just pass no flag. That way the area will be unpoisoned and usable immediately.
Reported-by: Erhard Furtner erhard_f@mailbox.org Closes: https://lore.kernel.org/all/20250112135832.57c92322@yea/ Fixes: 37bc3e5fd764 ("powerpc/lib/code-patching: Use alternate map for patch_instruction()") Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Madhavan Srinivasan maddy@linux.ibm.com Link: https://patch.msgid.link/06621423da339b374f48c0886e3a5db18e896be8.1739342693... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/lib/code-patching.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index b00112d7ad467..4426a77c8f063 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -105,7 +105,7 @@ static int text_area_cpu_up(unsigned int cpu) unsigned long addr; int err;
- area = get_vm_area(PAGE_SIZE, VM_ALLOC); + area = get_vm_area(PAGE_SIZE, 0); if (!area) { WARN_ONCE(1, "Failed to create text area for cpu %d\n", cpu);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 9593172d93b9f91c362baec4643003dc29802929 ]
syzkaller reported a use-after-free in geneve_find_dev() [0] without repro.
geneve_configure() links struct geneve_dev.next to net_generic(net, geneve_net_id)->geneve_list.
The net here could differ from dev_net(dev) if IFLA_NET_NS_PID, IFLA_NET_NS_FD, or IFLA_TARGET_NETNSID is set.
When dev_net(dev) is dismantled, geneve_exit_batch_rtnl() finally calls unregister_netdevice_queue() for each dev in the netns, and later the dev is freed.
However, its geneve_dev.next is still linked to the backend UDP socket netns.
Then, use-after-free will occur when another geneve dev is created in the netns.
Let's call geneve_dellink() instead in geneve_destroy_tunnels().
[0]: BUG: KASAN: slab-use-after-free in geneve_find_dev drivers/net/geneve.c:1295 [inline] BUG: KASAN: slab-use-after-free in geneve_configure+0x234/0x858 drivers/net/geneve.c:1343 Read of size 2 at addr ffff000054d6ee24 by task syz.1.4029/13441
CPU: 1 UID: 0 PID: 13441 Comm: syz.1.4029 Not tainted 6.13.0-g0ad9617c78ac #24 dc35ca22c79fb82e8e7bc5c9c9adafea898b1e3d Hardware name: linux,dummy-virt (DT) Call trace: show_stack+0x38/0x50 arch/arm64/kernel/stacktrace.c:466 (C) __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0xbc/0x108 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0x16c/0x6f0 mm/kasan/report.c:489 kasan_report+0xc0/0x120 mm/kasan/report.c:602 __asan_report_load2_noabort+0x20/0x30 mm/kasan/report_generic.c:379 geneve_find_dev drivers/net/geneve.c:1295 [inline] geneve_configure+0x234/0x858 drivers/net/geneve.c:1343 geneve_newlink+0xb8/0x128 drivers/net/geneve.c:1634 rtnl_newlink_create+0x23c/0x868 net/core/rtnetlink.c:3795 __rtnl_newlink net/core/rtnetlink.c:3906 [inline] rtnl_newlink+0x1054/0x1630 net/core/rtnetlink.c:4021 rtnetlink_rcv_msg+0x61c/0x918 net/core/rtnetlink.c:6911 netlink_rcv_skb+0x1dc/0x398 net/netlink/af_netlink.c:2543 rtnetlink_rcv+0x34/0x50 net/core/rtnetlink.c:6938 netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline] netlink_unicast+0x618/0x838 net/netlink/af_netlink.c:1348 netlink_sendmsg+0x5fc/0x8b0 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:713 [inline] __sock_sendmsg net/socket.c:728 [inline] ____sys_sendmsg+0x410/0x6f8 net/socket.c:2568 ___sys_sendmsg+0x178/0x1d8 net/socket.c:2622 __sys_sendmsg net/socket.c:2654 [inline] __do_sys_sendmsg net/socket.c:2659 [inline] __se_sys_sendmsg net/socket.c:2657 [inline] __arm64_sys_sendmsg+0x12c/0x1c8 net/socket.c:2657 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x90/0x278 arch/arm64/kernel/syscall.c:49 el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132 do_el0_svc+0x54/0x70 arch/arm64/kernel/syscall.c:151 el0_svc+0x4c/0xa8 arch/arm64/kernel/entry-common.c:744 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:762 el0t_64_sync+0x198/0x1a0 arch/arm64/kernel/entry.S:600
Allocated by task 13247: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x30/0x68 mm/kasan/common.c:68 kasan_save_alloc_info+0x44/0x58 mm/kasan/generic.c:568 poison_kmalloc_redzone mm/kasan/common.c:377 [inline] __kasan_kmalloc+0x84/0xa0 mm/kasan/common.c:394 kasan_kmalloc include/linux/kasan.h:260 [inline] __do_kmalloc_node mm/slub.c:4298 [inline] __kmalloc_node_noprof+0x2a0/0x560 mm/slub.c:4304 __kvmalloc_node_noprof+0x9c/0x230 mm/util.c:645 alloc_netdev_mqs+0xb8/0x11a0 net/core/dev.c:11470 rtnl_create_link+0x2b8/0xb50 net/core/rtnetlink.c:3604 rtnl_newlink_create+0x19c/0x868 net/core/rtnetlink.c:3780 __rtnl_newlink net/core/rtnetlink.c:3906 [inline] rtnl_newlink+0x1054/0x1630 net/core/rtnetlink.c:4021 rtnetlink_rcv_msg+0x61c/0x918 net/core/rtnetlink.c:6911 netlink_rcv_skb+0x1dc/0x398 net/netlink/af_netlink.c:2543 rtnetlink_rcv+0x34/0x50 net/core/rtnetlink.c:6938 netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline] netlink_unicast+0x618/0x838 net/netlink/af_netlink.c:1348 netlink_sendmsg+0x5fc/0x8b0 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:713 [inline] __sock_sendmsg net/socket.c:728 [inline] ____sys_sendmsg+0x410/0x6f8 net/socket.c:2568 ___sys_sendmsg+0x178/0x1d8 net/socket.c:2622 __sys_sendmsg net/socket.c:2654 [inline] __do_sys_sendmsg net/socket.c:2659 [inline] __se_sys_sendmsg net/socket.c:2657 [inline] __arm64_sys_sendmsg+0x12c/0x1c8 net/socket.c:2657 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x90/0x278 arch/arm64/kernel/syscall.c:49 el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132 do_el0_svc+0x54/0x70 arch/arm64/kernel/syscall.c:151 el0_svc+0x4c/0xa8 arch/arm64/kernel/entry-common.c:744 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:762 el0t_64_sync+0x198/0x1a0 arch/arm64/kernel/entry.S:600
Freed by task 45: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x30/0x68 mm/kasan/common.c:68 kasan_save_free_info+0x58/0x70 mm/kasan/generic.c:582 poison_slab_object mm/kasan/common.c:247 [inline] __kasan_slab_free+0x48/0x68 mm/kasan/common.c:264 kasan_slab_free include/linux/kasan.h:233 [inline] slab_free_hook mm/slub.c:2353 [inline] slab_free mm/slub.c:4613 [inline] kfree+0x140/0x420 mm/slub.c:4761 kvfree+0x4c/0x68 mm/util.c:688 netdev_release+0x94/0xc8 net/core/net-sysfs.c:2065 device_release+0x98/0x1c0 kobject_cleanup lib/kobject.c:689 [inline] kobject_release lib/kobject.c:720 [inline] kref_put include/linux/kref.h:65 [inline] kobject_put+0x2b0/0x438 lib/kobject.c:737 netdev_run_todo+0xe5c/0xfc8 net/core/dev.c:11185 rtnl_unlock+0x20/0x38 net/core/rtnetlink.c:151 cleanup_net+0x4fc/0x8c0 net/core/net_namespace.c:648 process_one_work+0x700/0x1398 kernel/workqueue.c:3236 process_scheduled_works kernel/workqueue.c:3317 [inline] worker_thread+0x8c4/0xe10 kernel/workqueue.c:3398 kthread+0x4bc/0x608 kernel/kthread.c:464 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:862
The buggy address belongs to the object at ffff000054d6e000 which belongs to the cache kmalloc-cg-4k of size 4096 The buggy address is located 3620 bytes inside of freed 4096-byte region [ffff000054d6e000, ffff000054d6f000)
The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x94d68 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 memcg:ffff000016276181 flags: 0x3fffe0000000040(head|node=0|zone=0|lastcpupid=0x1ffff) page_type: f5(slab) raw: 03fffe0000000040 ffff0000c000f500 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000040004 00000001f5000000 ffff000016276181 head: 03fffe0000000040 ffff0000c000f500 dead000000000122 0000000000000000 head: 0000000000000000 0000000000040004 00000001f5000000 ffff000016276181 head: 03fffe0000000003 fffffdffc1535a01 ffffffffffffffff 0000000000000000 head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address:
 ffff000054d6ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff000054d6ed80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff000054d6ee00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                               ^
 ffff000054d6ee80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff000054d6ef00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels")
Reported-by: syzkaller syzkaller@googlegroups.com
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com
Link: https://patch.msgid.link/20250213043354.91368-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/geneve.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index b939d4711c59b..c2066f19295d4 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1966,16 +1966,11 @@ static void geneve_destroy_tunnels(struct net *net, struct list_head *head)
 	/* gather any geneve devices that were moved into this ns */
 	for_each_netdev_safe(net, dev, aux)
 		if (dev->rtnl_link_ops == &geneve_link_ops)
-			unregister_netdevice_queue(dev, head);
+			geneve_dellink(dev, head);
 
 	/* now gather any other geneve devices that were created in this ns */
-	list_for_each_entry_safe(geneve, next, &gn->geneve_list, next) {
-		/* If geneve->dev is in the same netns, it was already added
-		 * to the list by the previous loop.
-		 */
-		if (!net_eq(dev_net(geneve->dev), net))
-			unregister_netdevice_queue(geneve->dev, head);
-	}
+	list_for_each_entry_safe(geneve, next, &gn->geneve_list, next)
+		geneve_dellink(geneve->dev, head);
 }
 
 static void __net_exit geneve_exit_batch_net(struct list_head *net_list)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vitaly Rodionov vitalyr@opensource.cirrus.com
[ Upstream commit 08b613b9e2ba431db3bd15cb68ca72472a50ef5c ]
This patch corrects the full-scale volume setting logic. On certain platforms, the full-scale volume bit is required. The current logic mistakenly sets this bit and incorrectly clears reserved bit 0, causing the headphone output to be muted.
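As a rough illustration of the bit handling described above, the following user-space sketch contrasts the pre-patch "else" branch with the post-patch logic. The two mask values are taken from the patch; the function names and the use of uint8_t are assumptions made for this example and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

#define FULL_SCALE_VOL_MASK 0x02u /* CS42L42_FULL_SCALE_VOL_MASK in the patch */
#define ANA_MUTE_AB         0x0Cu /* CS42L42_ANA_MUTE_AB in the patch */

/*
 * Pre-patch "else" branch (hypothetical helper name): ANDing with the
 * mask keeps only bit 1 and throws away every other bit, including
 * reserved bit 0.
 */
static uint8_t hp_ctl_old_else_branch(uint8_t fsv_old)
{
	return fsv_old & FULL_SCALE_VOL_MASK;
}

/*
 * Post-patch logic (hypothetical helper name): optionally set the
 * full-scale volume bit, then clear only the analog mute bits.
 */
static uint8_t hp_ctl_new(uint8_t fsv, int full_scale_vol)
{
	if (full_scale_vol)
		fsv |= FULL_SCALE_VOL_MASK;
	return fsv & (uint8_t)~ANA_MUTE_AB;
}
```

Starting from the 0x0D value that the patch writes in the init sequences, the old branch yields 0x00, while the new logic yields 0x03 or 0x01 depending on whether the full-scale volume bit is requested, so bit 0 survives in both cases.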
Fixes: 342b6b610ae2 ("ALSA: hda/cs8409: Fix Full Scale Volume setting for all variants")
Signed-off-by: Vitaly Rodionov vitalyr@opensource.cirrus.com
Link: https://patch.msgid.link/20250214210736.30814-1-vitalyr@opensource.cirrus.co...
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 sound/pci/hda/patch_cs8409-tables.c | 6 +++---
 sound/pci/hda/patch_cs8409.c | 20 +++++++++++---------
 sound/pci/hda/patch_cs8409.h | 5 +++--
 3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/sound/pci/hda/patch_cs8409-tables.c b/sound/pci/hda/patch_cs8409-tables.c
index 759f48038273d..621f947e38174 100644
--- a/sound/pci/hda/patch_cs8409-tables.c
+++ b/sound/pci/hda/patch_cs8409-tables.c
@@ -121,7 +121,7 @@ static const struct cs8409_i2c_param cs42l42_init_reg_seq[] = {
 	{ CS42L42_MIXER_CHA_VOL, 0x3F },
 	{ CS42L42_MIXER_CHB_VOL, 0x3F },
 	{ CS42L42_MIXER_ADC_VOL, 0x3f },
-	{ CS42L42_HP_CTL, 0x03 },
+	{ CS42L42_HP_CTL, 0x0D },
 	{ CS42L42_MIC_DET_CTL1, 0xB6 },
 	{ CS42L42_TIPSENSE_CTL, 0xC2 },
 	{ CS42L42_HS_CLAMP_DISABLE, 0x01 },
@@ -315,7 +315,7 @@ static const struct cs8409_i2c_param dolphin_c0_init_reg_seq[] = {
 	{ CS42L42_ASP_TX_SZ_EN, 0x01 },
 	{ CS42L42_PWR_CTL1, 0x0A },
 	{ CS42L42_PWR_CTL2, 0x84 },
-	{ CS42L42_HP_CTL, 0x03 },
+	{ CS42L42_HP_CTL, 0x0D },
 	{ CS42L42_MIXER_CHA_VOL, 0x3F },
 	{ CS42L42_MIXER_CHB_VOL, 0x3F },
 	{ CS42L42_MIXER_ADC_VOL, 0x3f },
@@ -371,7 +371,7 @@ static const struct cs8409_i2c_param dolphin_c1_init_reg_seq[] = {
 	{ CS42L42_ASP_TX_SZ_EN, 0x00 },
 	{ CS42L42_PWR_CTL1, 0x0E },
 	{ CS42L42_PWR_CTL2, 0x84 },
-	{ CS42L42_HP_CTL, 0x01 },
+	{ CS42L42_HP_CTL, 0x0D },
 	{ CS42L42_MIXER_CHA_VOL, 0x3F },
 	{ CS42L42_MIXER_CHB_VOL, 0x3F },
 	{ CS42L42_MIXER_ADC_VOL, 0x3f },
diff --git a/sound/pci/hda/patch_cs8409.c b/sound/pci/hda/patch_cs8409.c
index 892223d9e64ab..b003ac1990ba8 100644
--- a/sound/pci/hda/patch_cs8409.c
+++ b/sound/pci/hda/patch_cs8409.c
@@ -876,7 +876,7 @@ static void cs42l42_resume(struct sub_codec *cs42l42)
 		{ CS42L42_DET_INT_STATUS2, 0x00 },
 		{ CS42L42_TSRS_PLUG_STATUS, 0x00 },
 	};
-	int fsv_old, fsv_new;
+	unsigned int fsv;
 
 	/* Bring CS42L42 out of Reset */
 	spec->gpio_data = snd_hda_codec_read(codec, CS8409_PIN_AFG, 0, AC_VERB_GET_GPIO_DATA, 0);
@@ -893,13 +893,15 @@ static void cs42l42_resume(struct sub_codec *cs42l42)
 	/* Clear interrupts, by reading interrupt status registers */
 	cs8409_i2c_bulk_read(cs42l42, irq_regs, ARRAY_SIZE(irq_regs));
 
-	fsv_old = cs8409_i2c_read(cs42l42, CS42L42_HP_CTL);
-	if (cs42l42->full_scale_vol == CS42L42_FULL_SCALE_VOL_0DB)
-		fsv_new = fsv_old & ~CS42L42_FULL_SCALE_VOL_MASK;
-	else
-		fsv_new = fsv_old & CS42L42_FULL_SCALE_VOL_MASK;
-	if (fsv_new != fsv_old)
-		cs8409_i2c_write(cs42l42, CS42L42_HP_CTL, fsv_new);
+	fsv = cs8409_i2c_read(cs42l42, CS42L42_HP_CTL);
+	if (cs42l42->full_scale_vol) {
+		// Set the full scale volume bit
+		fsv |= CS42L42_FULL_SCALE_VOL_MASK;
+		cs8409_i2c_write(cs42l42, CS42L42_HP_CTL, fsv);
+	}
+	// Unmute analog channels A and B
+	fsv = (fsv & ~CS42L42_ANA_MUTE_AB);
+	cs8409_i2c_write(cs42l42, CS42L42_HP_CTL, fsv);
 
 	/* we have to explicitly allow unsol event handling even during the
 	 * resume phase so that the jack event is processed properly
@@ -921,7 +923,7 @@ static void cs42l42_suspend(struct sub_codec *cs42l42)
 		{ CS42L42_MIXER_CHA_VOL, 0x3F },
 		{ CS42L42_MIXER_ADC_VOL, 0x3F },
 		{ CS42L42_MIXER_CHB_VOL, 0x3F },
-		{ CS42L42_HP_CTL, 0x0F },
+		{ CS42L42_HP_CTL, 0x0D },
 		{ CS42L42_ASP_RX_DAI0_EN, 0x00 },
 		{ CS42L42_ASP_CLK_CFG, 0x00 },
 		{ CS42L42_PWR_CTL1, 0xFE },
diff --git a/sound/pci/hda/patch_cs8409.h b/sound/pci/hda/patch_cs8409.h
index 5e48115caf096..14645d25e70fd 100644
--- a/sound/pci/hda/patch_cs8409.h
+++ b/sound/pci/hda/patch_cs8409.h
@@ -230,9 +230,10 @@ enum cs8409_coefficient_index_registers {
 #define CS42L42_PDN_TIMEOUT_US (250000)
 #define CS42L42_PDN_SLEEP_US (2000)
 #define CS42L42_INIT_TIMEOUT_MS (45)
+#define CS42L42_ANA_MUTE_AB (0x0C)
 #define CS42L42_FULL_SCALE_VOL_MASK (2)
-#define CS42L42_FULL_SCALE_VOL_0DB (1)
-#define CS42L42_FULL_SCALE_VOL_MINUS6DB (0)
+#define CS42L42_FULL_SCALE_VOL_0DB (0)
+#define CS42L42_FULL_SCALE_VOL_MINUS6DB (1)
/* Dell BULLSEYE / WARLOCK / CYBORG Specific Definitions */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pierre Riteau pierre@stackhpc.com
[ Upstream commit 071ed42cff4fcdd89025d966d48eabef59913bf2 ]
tcf_exts_miss_cookie_base_alloc() calls xa_alloc_cyclic() which can return 1 if the allocation succeeded after wrapping. This was treated as an error, with value 1 returned to caller tcf_exts_init_ex() which sets exts->actions to NULL and returns 1 to caller fl_change().
fl_change() treats err == 1 as success, calling tcf_exts_validate_ex() which calls tcf_action_init() with exts->actions as argument, where it is dereferenced.
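To make the three-way return convention concrete, here is a small user-space sketch. It is not the XArray implementation: toy_alloc_cyclic(), struct toy_ids and the two check helpers are hypothetical stand-ins that only mimic the convention described above (0 on success, 1 on success after wrapping, negative errno on failure).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_SLOTS 4 /* hypothetical, deliberately tiny ID space */

struct toy_ids {
	bool used[TOY_SLOTS];
	unsigned int next; /* where the next search starts, 0..TOY_SLOTS */
};

/*
 * Hypothetical cyclic allocator: pass 0 searches [next, TOY_SLOTS),
 * pass 1 wraps around and searches [0, next).
 * Returns 0, 1 (allocated after wrapping) or -EBUSY (no free slot).
 */
static int toy_alloc_cyclic(struct toy_ids *ids, unsigned int *id)
{
	for (unsigned int pass = 0; pass < 2; pass++) {
		unsigned int lo = pass == 0 ? ids->next : 0;
		unsigned int hi = pass == 0 ? TOY_SLOTS : ids->next;

		for (unsigned int slot = lo; slot < hi; slot++) {
			if (ids->used[slot])
				continue;
			ids->used[slot] = true;
			ids->next = slot + 1;
			*id = slot;
			return (int)pass;
		}
	}
	return -EBUSY;
}

/* Pre-patch style check: a successful wrap (1) is mistaken for an error. */
static bool toy_failed_old(int err)
{
	return err != 0;
}

/* Post-patch style check: only negative values are errors. */
static bool toy_failed_new(int err)
{
	return err < 0;
}
```

Once the ID space has been walked to its end and an earlier slot is freed, the next allocation succeeds with a return value of 1. Only the "err < 0" style check lets that allocation through.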
Example trace:
BUG: kernel NULL pointer dereference, address: 0000000000000000
CPU: 114 PID: 16151 Comm: handler114 Kdump: loaded Not tainted 5.14.0-503.16.1.el9_5.x86_64 #1
RIP: 0010:tcf_action_init+0x1f8/0x2c0
Call Trace:
 tcf_action_init+0x1f8/0x2c0
 tcf_exts_validate_ex+0x175/0x190
 fl_change+0x537/0x1120 [cls_flower]
Fixes: 80cd22c35c90 ("net/sched: cls_api: Support hardware miss to tc action")
Signed-off-by: Pierre Riteau pierre@stackhpc.com
Reviewed-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com
Link: https://patch.msgid.link/20250213223610.320278-1-pierre@stackhpc.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/sched/cls_api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 84e18b5f72a30..96c39e9a873c7 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -97,7 +97,7 @@ tcf_exts_miss_cookie_base_alloc(struct tcf_exts *exts, struct tcf_proto *tp,
 
 	err = xa_alloc_cyclic(&tcf_exts_miss_cookies_xa, &n->miss_cookie_base,
 			      n, xa_limit_32b, &next, GFP_KERNEL);
-	if (err)
+	if (err < 0)
 		goto err_xa_alloc;
 
 	exts->miss_cookie_node = n;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
[ Upstream commit e77aa4b2eaa7fb31b2a7a50214ecb946b2a8b0f6 ]
When a destination client is a user client in the legacy MIDI mode and it sets the no-UMP-conversion flag, currently all UMP events are still passed as-is. But this may confuse user space, because the event packet size is different from the legacy mode.
Since we cannot handle UMP events in user clients unless they are running in the UMP client mode, we should filter out those events instead of accepting them blindly. This patch addresses it by slightly adjusting the conditions for UMP event handling at the event delivery time.
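The adjusted conditions can be read as a small decision function. The sketch below is a user-space model of the branch structure in the diff, not the ALSA code itself: enum toy_action, toy_route() and its boolean parameters are hypothetical names standing in for the event, client and filter checks.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_action {
	TOY_FROM_UMP_PATH, /* handed to the from-UMP delivery path */
	TOY_TO_UMP_PATH,   /* handed to the to-UMP delivery path */
	TOY_DROP,          /* discarded, reported as result 0 */
	TOY_PASS_AS_IS,    /* falls through to regular delivery */
};

/*
 * Model of the post-patch branch order:
 *   ev_is_ump    - the event is a UMP event
 *   dest_is_ump  - the destination client runs in UMP mode
 *   dest_is_user - the destination is a user-space client
 *   no_convert   - the destination set the no-UMP-conversion filter
 */
static enum toy_action toy_route(bool ev_is_ump, bool dest_is_ump,
				 bool dest_is_user, bool no_convert)
{
	if (ev_is_ump) {
		if (!no_convert)
			return TOY_FROM_UMP_PATH;
		if (dest_is_user && !dest_is_ump)
			return TOY_DROP;
	} else if (dest_is_ump) {
		if (!no_convert)
			return TOY_TO_UMP_PATH;
	}
	return TOY_PASS_AS_IS;
}
```

The case the patch targets is toy_route(true, false, true, true): a UMP event aimed at a legacy user client that disabled conversion is now dropped instead of being passed through.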
Fixes: 329ffe11a014 ("ALSA: seq: Allow suppressing UMP conversions")
Link: https://lore.kernel.org/b77a2cd6-7b59-4eb0-a8db-22d507d3af5f@gmail.com
Link: https://patch.msgid.link/20250217170034.21930-1-tiwai@suse.de
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 sound/core/seq/seq_clientmgr.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/sound/core/seq/seq_clientmgr.c b/sound/core/seq/seq_clientmgr.c
index 8b7dfbc8e8207..54931ad0dc990 100644
--- a/sound/core/seq/seq_clientmgr.c
+++ b/sound/core/seq/seq_clientmgr.c
@@ -682,12 +682,18 @@ static int snd_seq_deliver_single_event(struct snd_seq_client *client,
 					  dest_port->time_real);
 
 #if IS_ENABLED(CONFIG_SND_SEQ_UMP)
-	if (!(dest->filter & SNDRV_SEQ_FILTER_NO_CONVERT)) {
-		if (snd_seq_ev_is_ump(event)) {
+	if (snd_seq_ev_is_ump(event)) {
+		if (!(dest->filter & SNDRV_SEQ_FILTER_NO_CONVERT)) {
 			result = snd_seq_deliver_from_ump(client, dest, dest_port,
 							  event, atomic, hop);
 			goto __skip;
-		} else if (snd_seq_client_is_ump(dest)) {
+		} else if (dest->type == USER_CLIENT &&
+			   !snd_seq_client_is_ump(dest)) {
+			result = 0; // drop the event
+			goto __skip;
+		}
+	} else if (snd_seq_client_is_ump(dest)) {
+		if (!(dest->filter & SNDRV_SEQ_FILTER_NO_CONVERT)) {
 			result = snd_seq_deliver_to_ump(client, dest, dest_port,
 							event, atomic, hop);
 			goto __skip;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Ruess julianr@linux.ibm.com
[ Upstream commit 915e34d5ad35a6a9e56113f852ade4a730fb88f0 ]
According to device_release() in /drivers/base/core.c, a device without a release function is a broken device and must be fixed.
The current code directly frees the device after calling device_add() without waiting for other kernel parts to release their references. Thus, a reference could still be held to a struct device, e.g., by sysfs, leading to potential use-after-free issues if a proper release function is not set.
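The ownership rule can be sketched in user space as follows. This is a simplified, single-threaded model and not the driver core: struct toy_device, toy_get(), toy_put() and the toy_released counter are hypothetical names, and a plain int stands in for the kernel's atomic reference count.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_device {
	int refcount;
	void (*release)(struct toy_device *dev);
};

static int toy_released; /* counts release callbacks, for demonstration only */

/* Release callback: the only place where the memory is freed. */
static void toy_release(struct toy_device *dev)
{
	toy_released++;
	free(dev);
}

/* Allocate a device that starts with one reference held by the caller. */
static struct toy_device *toy_device_alloc(void)
{
	struct toy_device *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->refcount = 1;
	dev->release = toy_release;
	return dev;
}

static void toy_get(struct toy_device *dev)
{
	dev->refcount++;
}

/* Drop one reference; the last holder triggers the release callback. */
static void toy_put(struct toy_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}
```

If a second holder took a reference with toy_get(), the owner's toy_put() leaves the object alive, and it is freed only when the second holder also drops its reference. Freeing the object directly from the owner's teardown path, as the pre-patch code did with kfree(), would leave that second holder with a dangling pointer.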
Fixes: 8c81ba20349d ("net/smc: De-tangle ism and smc device initialization")
Reviewed-by: Alexandra Winter wintera@linux.ibm.com
Reviewed-by: Wenjia Zhang wenjia@linux.ibm.com
Signed-off-by: Julian Ruess julianr@linux.ibm.com
Signed-off-by: Alexandra Winter wintera@linux.ibm.com
Reviewed-by: Simon Horman horms@kernel.org
Link: https://patch.msgid.link/20250214120137.563409-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/s390/net/ism_drv.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
index f6a0626a6b3ec..af0d90beba638 100644
--- a/drivers/s390/net/ism_drv.c
+++ b/drivers/s390/net/ism_drv.c
@@ -611,6 +611,15 @@ static int ism_dev_init(struct ism_dev *ism)
 	return ret;
 }
 
+static void ism_dev_release(struct device *dev)
+{
+	struct ism_dev *ism;
+
+	ism = container_of(dev, struct ism_dev, dev);
+
+	kfree(ism);
+}
+
 static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct ism_dev *ism;
@@ -624,6 +633,7 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	dev_set_drvdata(&pdev->dev, ism);
 	ism->pdev = pdev;
 	ism->dev.parent = &pdev->dev;
+	ism->dev.release = ism_dev_release;
 	device_initialize(&ism->dev);
 	dev_set_name(&ism->dev, dev_name(&pdev->dev));
 	ret = device_add(&ism->dev);
@@ -660,7 +670,7 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	device_del(&ism->dev);
 err_dev:
 	dev_set_drvdata(&pdev->dev, NULL);
-	kfree(ism);
+	put_device(&ism->dev);
 
 	return ret;
 }
@@ -706,7 +716,7 @@ static void ism_remove(struct pci_dev *pdev)
 	pci_disable_device(pdev);
 	device_del(&ism->dev);
 	dev_set_drvdata(&pdev->dev, NULL);
-	kfree(ism);
+	put_device(&ism->dev);
 }
 
 static struct pci_driver ism_driver = {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nick Child nnac123@linux.ibm.com
[ Upstream commit 5cb431dcf8048572e9ffc6c30cdbd8832cbe502d ]
In ibmvnic_xmit() if ibmvnic_tx_scrq_flush() returns H_CLOSED then it will inform upper level networking functions to disable tx queues. H_CLOSED signals that the connection with the vnic server is down and a transport event is expected to recover the device.
Previously, ibmvnic_tx_scrq_flush() was hard-coded to return success. Therefore, the queues would remain active until ibmvnic_cleanup() is called within do_reset().
The problem is that do_reset() depends on the RTNL lock. If several ibmvnic devices are resetting then there can be a long wait time until the last device can grab the lock. During this time the tx/rx queues still appear active to upper level functions.
FYI, we do make a call to netif_carrier_off() outside the RTNL lock but its calls to dev_deactivate() are also dependent on the RTNL lock.
As a result, large amounts of retransmissions were observed in a short period of time, eventually leading to ETIMEDOUT. This was specifically seen with HNV devices, likely because of even more RTNL dependencies.
Therefore, ensure the return code of ibmvnic_tx_scrq_flush() is propagated to the xmit function to allow for an earlier (and lock-less) response to a transport event.
Signed-off-by: Nick Child nnac123@linux.ibm.com
Link: https://lore.kernel.org/r/20240416164128.387920-1-nnac123@linux.ibm.com
Signed-off-by: Paolo Abeni pabeni@redhat.com
Stable-dep-of: bdf5d13aa05e ("ibmvnic: Don't reference skb after sending to VIOS")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/ibm/ibmvnic.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 61685c3053ad7..e1e4dc81ad309 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2371,7 +2371,7 @@ static int ibmvnic_tx_scrq_flush(struct ibmvnic_adapter *adapter,
 		ibmvnic_tx_scrq_clean_buffer(adapter, tx_scrq);
 	else
 		ind_bufp->index = 0;
-	return 0;
+	return rc;
 }
 
 static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
@@ -2424,7 +2424,9 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
 		tx_dropped++;
 		tx_send_failed++;
 		ret = NETDEV_TX_OK;
-		ibmvnic_tx_scrq_flush(adapter, tx_scrq);
+		lpar_rc = ibmvnic_tx_scrq_flush(adapter, tx_scrq);
+		if (lpar_rc != H_SUCCESS)
+			goto tx_err;
 		goto out;
 	}
 
@@ -2439,8 +2441,10 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
 		dev_kfree_skb_any(skb);
 		tx_send_failed++;
 		tx_dropped++;
-		ibmvnic_tx_scrq_flush(adapter, tx_scrq);
 		ret = NETDEV_TX_OK;
+		lpar_rc = ibmvnic_tx_scrq_flush(adapter, tx_scrq);
+		if (lpar_rc != H_SUCCESS)
+			goto tx_err;
 		goto out;
 	}
 
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nick Child nnac123@linux.ibm.com
[ Upstream commit 74839f7a82689bf5a21a5447cae8e3a7b7a606d2 ]
Firmware supports two hcalls to send a sub-crq request: H_SEND_SUB_CRQ_INDIRECT and H_SEND_SUB_CRQ. The indirect hcall allows for submission of batched messages while the other hcall is limited to only one message. This protocol is defined in PAPR section 17.2.3.3.
Previously, the ibmvnic xmit function only used the indirect hcall. This allowed the driver to batch its skbs. A single skb can occupy a few entries per hcall depending on whether FW requires skb header information or not. The FW only needs header information if the packet is segmented.
By this logic, if an skb is not GSO then it can fit in one sub-crq message and therefore is a candidate for H_SEND_SUB_CRQ. Batching skb transmission is only useful when there are more packets coming down the line (i.e. netdev_xmit_more() is true).
As it turns out, H_SEND_SUB_CRQ induces less latency than H_SEND_SUB_CRQ_INDIRECT. Therefore, use H_SEND_SUB_CRQ where appropriate.
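The selection rule can be summarized as a predicate. The sketch below is a user-space paraphrase of the condition added to the xmit path, under the assumption that the surrounding branch tests whether the skb is GSO; toy_use_direct_hcall() and its parameters are hypothetical names, not driver symbols.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether a packet may take the single-message (direct) path:
 *   skb_is_gso     - the packet is segmented, so header entries are needed
 *   queued_entries - descriptors already waiting in the indirect buffer
 *   xmit_more      - the stack has signalled that more packets follow
 */
static bool toy_use_direct_hcall(bool skb_is_gso, unsigned int queued_entries,
				 bool xmit_more)
{
	return !skb_is_gso && queued_entries == 0 && !xmit_more;
}
```

Only a lone, non-GSO packet with an empty batch buffer qualifies. Anything else keeps using the batched indirect path.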
Small latency gains seen when doing TCP_RR_150 (request/response workload). Ftrace results (graph-time=1):
  Previous:
     ibmvnic_xmit = 29618270.83 us / 8860058.0 hits = AVG 3.34
     ibmvnic_tx_scrq_flush = 21972231.02 us / 6553972.0 hits = AVG 3.35
  Now:
     ibmvnic_xmit = 22153350.96 us / 8438942.0 hits = AVG 2.63
     ibmvnic_tx_scrq_flush = 15858922.4 us / 6244076.0 hits = AVG 2.54
Signed-off-by: Nick Child nnac123@linux.ibm.com Link: https://patch.msgid.link/20240807211809.1259563-6-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: bdf5d13aa05e ("ibmvnic: Don't reference skb after sending to VIOS") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ibm/ibmvnic.c | 52 ++++++++++++++++++++++++++---- 1 file changed, 46 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index e1e4dc81ad309..e2e4a1a2fa74f 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -117,6 +117,7 @@ static void free_long_term_buff(struct ibmvnic_adapter *adapter, struct ibmvnic_long_term_buff *ltb); static void ibmvnic_disable_irqs(struct ibmvnic_adapter *adapter); static void flush_reset_queue(struct ibmvnic_adapter *adapter); +static void print_subcrq_error(struct device *dev, int rc, const char *func);
struct ibmvnic_stat { char name[ETH_GSTRING_LEN]; @@ -2350,8 +2351,29 @@ static void ibmvnic_tx_scrq_clean_buffer(struct ibmvnic_adapter *adapter, } }
+static int send_subcrq_direct(struct ibmvnic_adapter *adapter, + u64 remote_handle, u64 *entry) +{ + unsigned int ua = adapter->vdev->unit_address; + struct device *dev = &adapter->vdev->dev; + int rc; + + /* Make sure the hypervisor sees the complete request */ + dma_wmb(); + rc = plpar_hcall_norets(H_SEND_SUB_CRQ, ua, + cpu_to_be64(remote_handle), + cpu_to_be64(entry[0]), cpu_to_be64(entry[1]), + cpu_to_be64(entry[2]), cpu_to_be64(entry[3])); + + if (rc) + print_subcrq_error(dev, rc, __func__); + + return rc; +} + static int ibmvnic_tx_scrq_flush(struct ibmvnic_adapter *adapter, - struct ibmvnic_sub_crq_queue *tx_scrq) + struct ibmvnic_sub_crq_queue *tx_scrq, + bool indirect) { struct ibmvnic_ind_xmit_queue *ind_bufp; u64 dma_addr; @@ -2366,7 +2388,13 @@ static int ibmvnic_tx_scrq_flush(struct ibmvnic_adapter *adapter,
if (!entries) return 0; - rc = send_subcrq_indirect(adapter, handle, dma_addr, entries); + + if (indirect) + rc = send_subcrq_indirect(adapter, handle, dma_addr, entries); + else + rc = send_subcrq_direct(adapter, handle, + (u64 *)ind_bufp->indir_arr); + if (rc) ibmvnic_tx_scrq_clean_buffer(adapter, tx_scrq); else @@ -2424,7 +2452,7 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) tx_dropped++; tx_send_failed++; ret = NETDEV_TX_OK; - lpar_rc = ibmvnic_tx_scrq_flush(adapter, tx_scrq); + lpar_rc = ibmvnic_tx_scrq_flush(adapter, tx_scrq, true); if (lpar_rc != H_SUCCESS) goto tx_err; goto out; @@ -2442,7 +2470,7 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) tx_send_failed++; tx_dropped++; ret = NETDEV_TX_OK; - lpar_rc = ibmvnic_tx_scrq_flush(adapter, tx_scrq); + lpar_rc = ibmvnic_tx_scrq_flush(adapter, tx_scrq, true); if (lpar_rc != H_SUCCESS) goto tx_err; goto out; @@ -2540,6 +2568,16 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) tx_crq.v1.flags1 |= IBMVNIC_TX_LSO; tx_crq.v1.mss = cpu_to_be16(skb_shinfo(skb)->gso_size); hdrs += 2; + } else if (!ind_bufp->index && !netdev_xmit_more()) { + ind_bufp->indir_arr[0] = tx_crq; + ind_bufp->index = 1; + tx_buff->num_entries = 1; + netdev_tx_sent_queue(txq, skb->len); + lpar_rc = ibmvnic_tx_scrq_flush(adapter, tx_scrq, false); + if (lpar_rc != H_SUCCESS) + goto tx_err; + + goto early_exit; }
if ((*hdrs >> 7) & 1) @@ -2549,7 +2587,7 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) tx_buff->num_entries = num_entries; /* flush buffer if current entry can not fit */ if (num_entries + ind_bufp->index > IBMVNIC_MAX_IND_DESCS) { - lpar_rc = ibmvnic_tx_scrq_flush(adapter, tx_scrq); + lpar_rc = ibmvnic_tx_scrq_flush(adapter, tx_scrq, true); if (lpar_rc != H_SUCCESS) goto tx_flush_err; } @@ -2557,15 +2595,17 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) indir_arr[0] = tx_crq; memcpy(&ind_bufp->indir_arr[ind_bufp->index], &indir_arr[0], num_entries * sizeof(struct ibmvnic_generic_scrq)); + ind_bufp->index += num_entries; if (__netdev_tx_sent_queue(txq, skb->len, netdev_xmit_more() && ind_bufp->index < IBMVNIC_MAX_IND_DESCS)) { - lpar_rc = ibmvnic_tx_scrq_flush(adapter, tx_scrq); + lpar_rc = ibmvnic_tx_scrq_flush(adapter, tx_scrq, true); if (lpar_rc != H_SUCCESS) goto tx_err; }
+early_exit: if (atomic_add_return(num_entries, &tx_scrq->used) >= adapter->req_tx_entries_per_subcrq) { netdev_dbg(netdev, "Stopping queue %d\n", queue_num);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nick Child nnac123@linux.ibm.com
[ Upstream commit 2ee73c54a615b74d2e7ee6f20844fd3ba63fc485 ]
Allow tracking of packets sent with send_subcrq direct vs indirect. `ethtool -S <dev>` will now provide a counter of the number of uses of each xmit method. This metric will be useful in performance debugging.
Signed-off-by: Nick Child nnac123@linux.ibm.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20241001163531.1803152-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: bdf5d13aa05e ("ibmvnic: Don't reference skb after sending to VIOS") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ibm/ibmvnic.c | 23 ++++++++++++++++------- drivers/net/ethernet/ibm/ibmvnic.h | 3 ++- 2 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index e2e4a1a2fa74f..cd5224f6c42a3 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2326,7 +2326,7 @@ static void ibmvnic_tx_scrq_clean_buffer(struct ibmvnic_adapter *adapter, tx_buff = &tx_pool->tx_buff[index]; adapter->netdev->stats.tx_packets--; adapter->netdev->stats.tx_bytes -= tx_buff->skb->len; - adapter->tx_stats_buffers[queue_num].packets--; + adapter->tx_stats_buffers[queue_num].batched_packets--; adapter->tx_stats_buffers[queue_num].bytes -= tx_buff->skb->len; dev_kfree_skb_any(tx_buff->skb); @@ -2418,7 +2418,8 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) unsigned int tx_map_failed = 0; union sub_crq indir_arr[16]; unsigned int tx_dropped = 0; - unsigned int tx_packets = 0; + unsigned int tx_dpackets = 0; + unsigned int tx_bpackets = 0; unsigned int tx_bytes = 0; dma_addr_t data_dma_addr; struct netdev_queue *txq; @@ -2577,6 +2578,7 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) if (lpar_rc != H_SUCCESS) goto tx_err;
+ tx_dpackets++; goto early_exit; }
@@ -2605,6 +2607,8 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) goto tx_err; }
+ tx_bpackets++; + early_exit: if (atomic_add_return(num_entries, &tx_scrq->used) >= adapter->req_tx_entries_per_subcrq) { @@ -2612,7 +2616,6 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) netif_stop_subqueue(netdev, queue_num); }
- tx_packets++; tx_bytes += skb->len; txq_trans_cond_update(txq); ret = NETDEV_TX_OK; @@ -2642,10 +2645,11 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) rcu_read_unlock(); netdev->stats.tx_dropped += tx_dropped; netdev->stats.tx_bytes += tx_bytes; - netdev->stats.tx_packets += tx_packets; + netdev->stats.tx_packets += tx_bpackets + tx_dpackets; adapter->tx_send_failed += tx_send_failed; adapter->tx_map_failed += tx_map_failed; - adapter->tx_stats_buffers[queue_num].packets += tx_packets; + adapter->tx_stats_buffers[queue_num].batched_packets += tx_bpackets; + adapter->tx_stats_buffers[queue_num].direct_packets += tx_dpackets; adapter->tx_stats_buffers[queue_num].bytes += tx_bytes; adapter->tx_stats_buffers[queue_num].dropped_packets += tx_dropped;
@@ -3811,7 +3815,10 @@ static void ibmvnic_get_strings(struct net_device *dev, u32 stringset, u8 *data) memcpy(data, ibmvnic_stats[i].name, ETH_GSTRING_LEN);
for (i = 0; i < adapter->req_tx_queues; i++) { - snprintf(data, ETH_GSTRING_LEN, "tx%d_packets", i); + snprintf(data, ETH_GSTRING_LEN, "tx%d_batched_packets", i); + data += ETH_GSTRING_LEN; + + snprintf(data, ETH_GSTRING_LEN, "tx%d_direct_packets", i); data += ETH_GSTRING_LEN;
snprintf(data, ETH_GSTRING_LEN, "tx%d_bytes", i); @@ -3876,7 +3883,9 @@ static void ibmvnic_get_ethtool_stats(struct net_device *dev, (adapter, ibmvnic_stats[i].offset));
for (j = 0; j < adapter->req_tx_queues; j++) { - data[i] = adapter->tx_stats_buffers[j].packets; + data[i] = adapter->tx_stats_buffers[j].batched_packets; + i++; + data[i] = adapter->tx_stats_buffers[j].direct_packets; i++; data[i] = adapter->tx_stats_buffers[j].bytes; i++; diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h index 4e18b4cefa972..b3fc18db4f4c3 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.h +++ b/drivers/net/ethernet/ibm/ibmvnic.h @@ -213,7 +213,8 @@ struct ibmvnic_statistics {
#define NUM_TX_STATS 3 struct ibmvnic_tx_queue_stats { - u64 packets; + u64 batched_packets; + u64 direct_packets; u64 bytes; u64 dropped_packets; };
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nick Child nnac123@linux.ibm.com
[ Upstream commit bdf5d13aa05ec314d4385b31ac974d6c7e0997c9 ]
Previously, after successfully flushing the xmit buffer to VIOS, the tx_bytes stat was incremented by the length of the skb.
It is invalid to access the skb memory after sending the buffer to the VIOS because, at any point after sending, the VIOS can trigger an interrupt to free this memory. A race between reading skb->len and freeing the skb is possible (especially during LPM) and will result in use-after-free:
 ==================================================================
 BUG: KASAN: slab-use-after-free in ibmvnic_xmit+0x75c/0x1808 [ibmvnic]
 Read of size 4 at addr c00000024eb48a70 by task hxecom/14495
 <...>
 Call Trace:
 [c000000118f66cf0] [c0000000018cba6c] dump_stack_lvl+0x84/0xe8 (unreliable)
 [c000000118f66d20] [c0000000006f0080] print_report+0x1a8/0x7f0
 [c000000118f66df0] [c0000000006f08f0] kasan_report+0x128/0x1f8
 [c000000118f66f00] [c0000000006f2868] __asan_load4+0xac/0xe0
 [c000000118f66f20] [c0080000046eac84] ibmvnic_xmit+0x75c/0x1808 [ibmvnic]
 [c000000118f67340] [c0000000014be168] dev_hard_start_xmit+0x150/0x358
 <...>
 Freed by task 0:
 kasan_save_stack+0x34/0x68
 kasan_save_track+0x2c/0x50
 kasan_save_free_info+0x64/0x108
 __kasan_mempool_poison_object+0x148/0x2d4
 napi_skb_cache_put+0x5c/0x194
 net_tx_action+0x154/0x5b8
 handle_softirqs+0x20c/0x60c
 do_softirq_own_stack+0x6c/0x88
 <...>
 The buggy address belongs to the object at c00000024eb48a00
  which belongs to the cache skbuff_head_cache of size 224
 ==================================================================
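The remedy is the usual "snapshot before hand-off" pattern. Here is a minimal user-space sketch of it, with hypothetical names throughout (struct toy_pkt, toy_send(), toy_xmit()). To model the worst case, the toy send routine frees the packet immediately, whereas in the driver the free can happen at any later point.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_pkt {
	unsigned int len;
};

/*
 * Hypothetical send routine: ownership of pkt passes to the other side,
 * which is free to release it whenever it likes. Freeing right away
 * models the earliest possible moment.
 */
static void toy_send(struct toy_pkt *pkt)
{
	free(pkt);
}

/* Transmit pkt and return the updated byte counter. */
static unsigned int toy_xmit(struct toy_pkt *pkt, unsigned int tx_bytes)
{
	unsigned int len = pkt->len; /* snapshot while pkt is still ours */

	toy_send(pkt);
	/* pkt must not be dereferenced from here on */
	return tx_bytes + len;
}
```

Reading pkt->len after toy_send() would be a use-after-free in this model, which is the class of bug the KASAN report above shows for skb->len.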
Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Nick Child nnac123@linux.ibm.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20250214155233.235559-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ibm/ibmvnic.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index cd5224f6c42a3..4f18addc191b8 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2424,6 +2424,7 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) dma_addr_t data_dma_addr; struct netdev_queue *txq; unsigned long lpar_rc; + unsigned int skblen; union sub_crq tx_crq; unsigned int offset; int num_entries = 1; @@ -2526,6 +2527,7 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) tx_buff->skb = skb; tx_buff->index = bufidx; tx_buff->pool_index = queue_num; + skblen = skb->len;
memset(&tx_crq, 0, sizeof(tx_crq)); tx_crq.v1.first = IBMVNIC_CRQ_CMD; @@ -2616,7 +2618,7 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) netif_stop_subqueue(netdev, queue_num); }
- tx_bytes += skb->len; + tx_bytes += skblen; txq_trans_cond_update(txq); ret = NETDEV_TX_OK; goto out;
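The pattern behind this fix can be sketched in userspace C. This is only an illustration under stated assumptions, not the driver code: struct fake_skb, fake_send_to_device(), fake_xmit() and fake_xmit_len() are hypothetical names, and the "device" frees the buffer immediately to model the worst case of the race described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff; only the length matters here. */
struct fake_skb {
	unsigned int len;
};

/*
 * Hypothetical stand-in for handing the buffer to the device. Once this
 * returns, the buffer may already be gone; it is freed right away here
 * to model the worst case.
 */
static void fake_send_to_device(struct fake_skb *skb)
{
	free(skb);
}

/* Transmit one buffer and return the updated byte counter. */
static unsigned long fake_xmit(struct fake_skb *skb, unsigned long tx_bytes)
{
	unsigned int skblen = skb->len;	/* snapshot while we still own skb */

	fake_send_to_device(skb);	/* ownership is transferred here */

	/* skb must not be dereferenced past this point */
	return tx_bytes + skblen;
}

/* Allocate a buffer of the given length, transmit it, return the counter. */
static unsigned long fake_xmit_len(unsigned int len, unsigned long tx_bytes)
{
	struct fake_skb *skb = malloc(sizeof(*skb));

	assert(skb != NULL);
	skb->len = len;
	return fake_xmit(skb, tx_bytes);
}
```

Reading skb->len after fake_send_to_device() would be the same class of bug that KASAN reported; copying the value first keeps the statistics update independent of the buffer's lifetime.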
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Luczaj mhal@rbox.co
[ Upstream commit 8fb5bb169d17cdd12c2dcc2e96830ed487d77a0f ]
sockmap expects all vsocks to have a transport assigned, which is expressed in vsock_proto::psock_update_sk_prot(). However, there is an edge case where an unconnected (connectible) socket may lose its previously assigned transport. This is handled with a NULL check in the vsock/BPF recv path.
Another design detail is that listening vsocks are not supposed to have any transport assigned at all. Which implies they are not supported by the sockmap. But this is complicated by the fact that a socket, before switching to TCP_LISTEN, may have had some transport assigned during a failed connect() attempt. Hence, we may end up with a listening vsock in a sockmap, which blows up quickly:
KASAN: null-ptr-deref in range [0x0000000000000120-0x0000000000000127]
CPU: 7 UID: 0 PID: 56 Comm: kworker/7:0 Not tainted 6.14.0-rc1+
Workqueue: vsock-loopback vsock_loopback_work
RIP: 0010:vsock_read_skb+0x4b/0x90
Call Trace:
 sk_psock_verdict_data_ready+0xa4/0x2e0
 virtio_transport_recv_pkt+0x1ca8/0x2acc
 vsock_loopback_work+0x27d/0x3f0
 process_one_work+0x846/0x1420
 worker_thread+0x5b3/0xf80
 kthread+0x35a/0x700
 ret_from_fork+0x2d/0x70
 ret_from_fork_asm+0x1a/0x30
For connectible sockets, instead of relying solely on the state of vsk->transport, tell sockmap to only allow those representing established connections. This aligns with the behaviour for AF_INET and AF_UNIX.
Fixes: 634f1a7110b4 ("vsock: support sockmap") Signed-off-by: Michal Luczaj mhal@rbox.co Acked-by: Stefano Garzarella sgarzare@redhat.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock_map.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c index f37a26efdd8ab..dcc0f31a17a8d 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -538,6 +538,9 @@ static bool sock_map_sk_state_allowed(const struct sock *sk) return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_LISTEN); if (sk_is_stream_unix(sk)) return (1 << sk->sk_state) & TCPF_ESTABLISHED; + if (sk_is_vsock(sk) && + (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET)) + return (1 << sk->sk_state) & TCPF_ESTABLISHED; return true; }
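The state test added above is a bitmask check: the socket state is turned into a single bit and compared against the set of allowed states. A minimal userspace sketch follows, assuming the usual numeric values of the TCP state enum (established = 1, close = 7, listen = 10); enum sock_kind and state_allowed() are hypothetical names that only mirror the shape of sock_map_sk_state_allowed().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's TCP state numbering. */
enum { ST_ESTABLISHED = 1, ST_CLOSE = 7, ST_LISTEN = 10 };

#define STF_ESTABLISHED	(1 << ST_ESTABLISHED)
#define STF_LISTEN	(1 << ST_LISTEN)

/* Hypothetical classification of the socket being added to the map. */
enum sock_kind {
	KIND_TCP,
	KIND_UNIX_STREAM,
	KIND_VSOCK_CONNECTIBLE,	/* SOCK_STREAM or SOCK_SEQPACKET vsock */
	KIND_OTHER,
};

/* Return true if a socket of this kind may enter the map in this state. */
static bool state_allowed(enum sock_kind kind, int state)
{
	assert(state >= 0 && state < 31);	/* keep the shift well defined */

	switch (kind) {
	case KIND_TCP:
		return ((1 << state) & (STF_ESTABLISHED | STF_LISTEN)) != 0;
	case KIND_UNIX_STREAM:
	case KIND_VSOCK_CONNECTIBLE:
		return ((1 << state) & STF_ESTABLISHED) != 0;
	default:
		return true;
	}
}
```

With this shape, a listening connectible vsock is rejected up front, so it never reaches the read path with a NULL transport.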
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Luczaj mhal@rbox.co
[ Upstream commit 857ae05549ee2542317e7084ecaa5f8536634dd9 ]
In the spirit of commit 91751e248256 ("vsock: prevent null-ptr-deref in vsock_*[has_data|has_space]"), armorize the "impossible" cases with a warning.
Fixes: 634f1a7110b4 ("vsock: support sockmap") Signed-off-by: Michal Luczaj mhal@rbox.co Reviewed-by: Stefano Garzarella sgarzare@redhat.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/vmw_vsock/af_vsock.c | 3 +++ net/vmw_vsock/vsock_bpf.c | 2 +- 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 618b18e80cea0..622875a6f787c 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1185,6 +1185,9 @@ static int vsock_read_skb(struct sock *sk, skb_read_actor_t read_actor) { struct vsock_sock *vsk = vsock_sk(sk);
+ if (WARN_ON_ONCE(!vsk->transport)) + return -ENODEV; + return vsk->transport->read_skb(vsk, read_actor); }
diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c index f201d9eca1df2..07b96d56f3a57 100644 --- a/net/vmw_vsock/vsock_bpf.c +++ b/net/vmw_vsock/vsock_bpf.c @@ -87,7 +87,7 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, lock_sock(sk); vsk = vsock_sk(sk);
- if (!vsk->transport) { + if (WARN_ON_ONCE(!vsk->transport)) { copied = -ENODEV; goto out; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit f5da7c45188eea71394bf445655cae2df88a7788 ]
Since the commit under Fixes, we set the window clamp in accordance with the newly measured rcvbuf scaling_ratio. If the scaling_ratio decreased significantly we may put ourselves in a situation where windows become smaller than rcvq_space, preventing tcp_rcv_space_adjust() from increasing rcvbuf.
The significant decrease of scaling_ratio is far more likely since commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"), which increased the "default" scaling ratio from ~30% to 50%.
Hitting the bad condition depends a lot on TCP tuning, and drivers at play. One of Meta's workloads hits it reliably under the following conditions:
 - default rcvbuf of 125k
 - sender MTU 1500, receiver MTU 5000
 - driver settles on scaling_ratio of 78 for the config above.
Initial rcvq_space gets calculated as TCP_INIT_CWND * tp->advmss (10 * 5k = 50k). Once we find out the true scaling ratio and MSS we clamp the windows to 38k. Triggering the condition also depends on the message sequence of this workload. I can't repro the problem with simple iperf or TCP_RR-style tests.
Fixes: a2cbb1603943 ("tcp: Update window clamping condition") Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: Neal Cardwell ncardwell@google.com Link: https://patch.msgid.link/20250217232905.3162187-1-kuba@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_input.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index f6a213bae5ccc..6074b4c3ab940 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -248,9 +248,15 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb) do_div(val, skb->truesize); tcp_sk(sk)->scaling_ratio = val ? val : 1;
- if (old_ratio != tcp_sk(sk)->scaling_ratio) - WRITE_ONCE(tcp_sk(sk)->window_clamp, - tcp_win_from_space(sk, sk->sk_rcvbuf)); + if (old_ratio != tcp_sk(sk)->scaling_ratio) { + struct tcp_sock *tp = tcp_sk(sk); + + val = tcp_win_from_space(sk, sk->sk_rcvbuf); + tcp_set_window_clamp(sk, val); + + if (tp->window_clamp < tp->rcvq_space.space) + tp->rcvq_space.space = tp->window_clamp; + } } icsk->icsk_ack.rcv_mss = min_t(unsigned int, len, tcp_sk(sk)->advmss);
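The numbers in the commit message can be reproduced with a small userspace sketch. It rests on two assumptions that the patch itself does not state: that the window is derived as space * scaling_ratio / 256 (a shift by 8), and that "125k" means 125000 bytes. win_from_space() and clamp_rcvq_space() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the ratio is an 8-bit fixed-point fraction (ratio / 256). */
#define RMEM_TO_WIN_SCALE 8

/* Usable window for a given amount of receive buffer space. */
static int64_t win_from_space(int64_t space, uint8_t scaling_ratio)
{
	return (space * scaling_ratio) >> RMEM_TO_WIN_SCALE;
}

/*
 * Mirror of the fix: rcvq_space is never left above the new clamp,
 * so later growth checks compare against a reachable value.
 */
static int64_t clamp_rcvq_space(int64_t rcvq_space, int64_t window_clamp)
{
	return window_clamp < rcvq_space ? window_clamp : rcvq_space;
}
```

Under these assumptions, win_from_space(125000, 78) gives 38085, which matches the "38k" clamp quoted above, and an initial rcvq_space of 50000 is pulled down to that clamp instead of staying above it.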
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 4ccacf86491d33d2486b62d4d44864d7101b299d ]
Brad Spengler reported the list_del() corruption splat in gtp_net_exit_batch_rtnl(). [0]
Commit eb28fd76c0a0 ("gtp: Destroy device along with udp socket's netns dismantle.") added the for_each_netdev() loop in gtp_net_exit_batch_rtnl() to destroy devices in each netns as done in geneve and ip tunnels.
However, this could trigger ->dellink() twice for the same device during ->exit_batch_rtnl().
Say we have two netns A & B and gtp device B that resides in netns B but whose UDP socket is in netns A.
1. cleanup_net() processes netns A and then B.
2. gtp_net_exit_batch_rtnl() finds the device B while iterating netns A's gn->gtp_dev_list and calls ->dellink().
[ device B is not yet unlinked from netns B as unregister_netdevice_many() has not been called. ]
3. gtp_net_exit_batch_rtnl() finds the device B while iterating netns B's for_each_netdev() and calls ->dellink().
gtp_dellink() cleans up the device's hash table, unlinks the dev from gn->gtp_dev_list, and calls unregister_netdevice_queue().
Basically, calling gtp_dellink() multiple times is fine unless CONFIG_DEBUG_LIST is enabled.
Let's remove for_each_netdev() in gtp_net_exit_batch_rtnl() and delegate the destruction to default_device_exit_batch() as done in bareudp.
[0]: list_del corruption, ffff8880aaa62c00->next (autoslab_size_M_dev_P_net_core_dev_11127_8_1328_8_S_4096_A_64_n_139+0xc00/0x1000 [slab object]) is LIST_POISON1 (ffffffffffffff02) (prev is 0xffffffffffffff04) kernel BUG at lib/list_debug.c:58! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 UID: 0 PID: 1804 Comm: kworker/u8:7 Tainted: G T 6.12.13-grsec-full-20250211091339 #1 Tainted: [T]=RANDSTRUCT Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: netns cleanup_net RIP: 0010:[<ffffffff84947381>] __list_del_entry_valid_or_report+0x141/0x200 lib/list_debug.c:58 Code: c2 76 91 31 c0 e8 9f b1 f7 fc 0f 0b 4d 89 f0 48 c7 c1 02 ff ff ff 48 89 ea 48 89 ee 48 c7 c7 e0 c2 76 91 31 c0 e8 7f b1 f7 fc <0f> 0b 4d 89 e8 48 c7 c1 04 ff ff ff 48 89 ea 48 89 ee 48 c7 c7 60 RSP: 0018:fffffe8040b4fbd0 EFLAGS: 00010283 RAX: 00000000000000cc RBX: dffffc0000000000 RCX: ffffffff818c4054 RDX: ffffffff84947381 RSI: ffffffff818d1512 RDI: 0000000000000000 RBP: ffff8880aaa62c00 R08: 0000000000000001 R09: fffffbd008169f32 R10: fffffe8040b4f997 R11: 0000000000000001 R12: a1988d84f24943e4 R13: ffffffffffffff02 R14: ffffffffffffff04 R15: ffff8880aaa62c08 RBX: kasan shadow of 0x0 RCX: __wake_up_klogd.part.0+0x74/0xe0 kernel/printk/printk.c:4554 RDX: __list_del_entry_valid_or_report+0x141/0x200 lib/list_debug.c:58 RSI: vprintk+0x72/0x100 kernel/printk/printk_safe.c:71 RBP: autoslab_size_M_dev_P_net_core_dev_11127_8_1328_8_S_4096_A_64_n_139+0xc00/0x1000 [slab object] RSP: process kstack fffffe8040b4fbd0+0x7bd0/0x8000 [kworker/u8:7+netns 1804 ] R09: kasan shadow of process kstack fffffe8040b4f990+0x7990/0x8000 [kworker/u8:7+netns 1804 ] R10: process kstack fffffe8040b4f997+0x7997/0x8000 [kworker/u8:7+netns 1804 ] R15: autoslab_size_M_dev_P_net_core_dev_11127_8_1328_8_S_4096_A_64_n_139+0xc08/0x1000 [slab object] FS: 0000000000000000(0000) GS:ffff888116000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 
0000748f5372c000 CR3: 0000000015408000 CR4: 00000000003406f0 shadow CR4: 00000000003406f0 Stack: 0000000000000000 ffffffff8a0c35e7 ffffffff8a0c3603 ffff8880aaa62c00 ffff8880aaa62c00 0000000000000004 ffff88811145311c 0000000000000005 0000000000000001 ffff8880aaa62000 fffffe8040b4fd40 ffffffff8a0c360d Call Trace: <TASK> [<ffffffff8a0c360d>] __list_del_entry_valid include/linux/list.h:131 [inline] fffffe8040b4fc28 [<ffffffff8a0c360d>] __list_del_entry include/linux/list.h:248 [inline] fffffe8040b4fc28 [<ffffffff8a0c360d>] list_del include/linux/list.h:262 [inline] fffffe8040b4fc28 [<ffffffff8a0c360d>] gtp_dellink+0x16d/0x360 drivers/net/gtp.c:1557 fffffe8040b4fc28 [<ffffffff8a0d0404>] gtp_net_exit_batch_rtnl+0x124/0x2c0 drivers/net/gtp.c:2495 fffffe8040b4fc88 [<ffffffff8e705b24>] cleanup_net+0x5a4/0xbe0 net/core/net_namespace.c:635 fffffe8040b4fcd0 [<ffffffff81754c97>] process_one_work+0xbd7/0x2160 kernel/workqueue.c:3326 fffffe8040b4fd88 [<ffffffff81757195>] process_scheduled_works kernel/workqueue.c:3407 [inline] fffffe8040b4fec0 [<ffffffff81757195>] worker_thread+0x6b5/0xfa0 kernel/workqueue.c:3488 fffffe8040b4fec0 [<ffffffff817782a0>] kthread+0x360/0x4c0 kernel/kthread.c:397 fffffe8040b4ff78 [<ffffffff814d8594>] ret_from_fork+0x74/0xe0 arch/x86/kernel/process.c:172 fffffe8040b4ffb8 [<ffffffff8110f509>] ret_from_fork_asm+0x29/0xc0 arch/x86/entry/entry_64.S:399 fffffe8040b4ffe8 </TASK> Modules linked in:
Fixes: eb28fd76c0a0 ("gtp: Destroy device along with udp socket's netns dismantle.") Reported-by: Brad Spengler spender@grsecurity.net Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://patch.msgid.link/20250217203705.40342-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/gtp.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c index 47238c3ec82e7..55160a5fc90fc 100644 --- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -1895,11 +1895,6 @@ static void __net_exit gtp_net_exit_batch_rtnl(struct list_head *net_list, list_for_each_entry(net, net_list, exit_list) { struct gtp_net *gn = net_generic(net, gtp_net_id); struct gtp_dev *gtp, *gtp_next; - struct net_device *dev; - - for_each_netdev(net, dev) - if (dev->rtnl_link_ops == &gtp_link_ops) - gtp_dellink(dev, dev_to_kill);
list_for_each_entry_safe(gtp, gtp_next, &gn->gtp_dev_list, list) gtp_dellink(gtp->dev, dev_to_kill);
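The two-netns scenario from the commit message can be modelled with a toy userspace sketch. Everything here is hypothetical (struct fake_dev, fake_dellink(), count_dellink_calls()); netns A is 0 and netns B is 1, they are processed in that order, and the walk over a netns's devices still sees a device after dellink because unregistration is deferred.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one tunnel device. */
struct fake_dev {
	int home_ns;		/* netns the device resides in */
	int sock_ns;		/* netns of its UDP socket (owns the per-netns list) */
	bool on_list;		/* still linked on the per-netns device list */
	int dellink_calls;	/* how many times dellink ran for this device */
};

/* Model of ->dellink(): unlink from the per-netns list and count the call. */
static void fake_dellink(struct fake_dev *dev)
{
	dev->on_list = false;
	dev->dellink_calls++;
}

/*
 * Tear down netns 0 and then netns 1. With walk_netdevs set, the old
 * behaviour is modelled: every device residing in the netns is visited
 * first, then the per-netns list. Without it, only the list is walked.
 */
static int count_dellink_calls(int home_ns, int sock_ns, bool walk_netdevs)
{
	struct fake_dev dev = { home_ns, sock_ns, true, 0 };

	for (int ns = 0; ns < 2; ns++) {
		if (walk_netdevs && dev.home_ns == ns)
			fake_dellink(&dev);	/* for_each_netdev() pass */
		if (dev.on_list && dev.sock_ns == ns)
			fake_dellink(&dev);	/* per-netns list pass */
	}
	return dev.dellink_calls;
}
```

A device residing in netns B with its socket in netns A gets two calls in the old model and one in the new model; in the real driver the second call is what performs list_del() on an already poisoned entry.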
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 62fab6eef61f245dc8797e3a6a5b890ef40e8628 ]
As explained in the previous patch, iterating for_each_netdev() and gn->geneve_list during ->exit_batch_rtnl() could trigger ->dellink() twice for the same device.
If CONFIG_DEBUG_LIST is enabled, we will see a list_del() corruption splat in the 2nd call of geneve_dellink().
Let's remove for_each_netdev() in geneve_destroy_tunnels() and delegate that part to default_device_exit_batch().
Fixes: 9593172d93b9 ("geneve: Fix use-after-free in geneve_find_dev().") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://patch.msgid.link/20250217203705.40342-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/geneve.c | 7 ------- 1 file changed, 7 deletions(-)
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index c2066f19295d4..27761334e1bff 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -1961,14 +1961,7 @@ static void geneve_destroy_tunnels(struct net *net, struct list_head *head) { struct geneve_net *gn = net_generic(net, geneve_net_id); struct geneve_dev *geneve, *next; - struct net_device *dev, *aux;
- /* gather any geneve devices that were moved into this ns */ - for_each_netdev_safe(net, dev, aux) - if (dev->rtnl_link_ops == &geneve_link_ops) - geneve_dellink(dev, head); - - /* now gather any other geneve devices that were created in this ns */ list_for_each_entry_safe(geneve, next, &gn->geneve_list, next) geneve_dellink(geneve->dev, head); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 3e5796862c692ea608d96f0a1437f9290f44953a ]
This patch fixes a bug in TC flower filter where rules combining a specific destination port with a source port range weren't working correctly.
The specific case was when users tried to configure rules like:
tc filter add dev ens38 ingress protocol ip flower ip_proto udp \ dst_port 5000 src_port 2000-3000 action drop
The root cause was in the flow dissector code. While both FLOW_DISSECTOR_KEY_PORTS and FLOW_DISSECTOR_KEY_PORTS_RANGE flags were being set correctly in the classifier, the __skb_flow_dissect_ports() function was only populating one of them: whichever came first in the enum check. This meant that when the code needed both a specific port and a port range, one of them would be left as 0, causing the filter to not match packets as expected.
Fix it by removing the either/or logic and instead checking and populating both key types independently when they're in use.
Fixes: 8ffb055beae5 ("cls_flower: Fix the behavior using port ranges with hw-offload") Reported-by: Qiang Zhang dtzq01@gmail.com Closes: https://lore.kernel.org/netdev/CAPx+-5uvFxkhkz4=j_Xuwkezjn9U6kzKTD5jz4tZ9msS... Cc: Yoshiki Komachi komachi.yoshiki@gmail.com Cc: Jamal Hadi Salim jhs@mojatatu.com Cc: Jiri Pirko jiri@resnulli.us Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Reviewed-by: Ido Schimmel idosch@nvidia.com Link: https://patch.msgid.link/20250218043210.732959-2-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/flow_dissector.c | 31 +++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 00a5c41c1831d..842da66034158 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -829,23 +829,30 @@ __skb_flow_dissect_ports(const struct sk_buff *skb, void *target_container, const void *data, int nhoff, u8 ip_proto, int hlen) { - enum flow_dissector_key_id dissector_ports = FLOW_DISSECTOR_KEY_MAX; - struct flow_dissector_key_ports *key_ports; + struct flow_dissector_key_ports_range *key_ports_range = NULL; + struct flow_dissector_key_ports *key_ports = NULL; + __be32 ports;
if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_PORTS)) - dissector_ports = FLOW_DISSECTOR_KEY_PORTS; - else if (dissector_uses_key(flow_dissector, - FLOW_DISSECTOR_KEY_PORTS_RANGE)) - dissector_ports = FLOW_DISSECTOR_KEY_PORTS_RANGE; + key_ports = skb_flow_dissector_target(flow_dissector, + FLOW_DISSECTOR_KEY_PORTS, + target_container);
- if (dissector_ports == FLOW_DISSECTOR_KEY_MAX) + if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_PORTS_RANGE)) + key_ports_range = skb_flow_dissector_target(flow_dissector, + FLOW_DISSECTOR_KEY_PORTS_RANGE, + target_container); + + if (!key_ports && !key_ports_range) return;
- key_ports = skb_flow_dissector_target(flow_dissector, - dissector_ports, - target_container); - key_ports->ports = __skb_flow_get_ports(skb, nhoff, ip_proto, - data, hlen); + ports = __skb_flow_get_ports(skb, nhoff, ip_proto, data, hlen); + + if (key_ports) + key_ports->ports = ports; + + if (key_ports_range) + key_ports_range->tp.ports = ports; }
static void
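The difference between the old either/or logic and the new independent population can be shown with a simplified userspace sketch. struct fake_keys and the helper names are hypothetical and stand in for the two dissector targets; the port value is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two port targets of one dissector. */
struct fake_keys {
	bool uses_ports;	/* exact-match key in use */
	bool uses_ports_range;	/* range key in use */
	uint32_t ports;		/* exact-match target */
	uint32_t range_ports;	/* range target */
};

/* Old behaviour: only the first key that is in use gets populated. */
static void fill_either(struct fake_keys *k, uint32_t ports)
{
	if (k->uses_ports)
		k->ports = ports;
	else if (k->uses_ports_range)
		k->range_ports = ports;
}

/* New behaviour: each key in use is populated independently. */
static void fill_both(struct fake_keys *k, uint32_t ports)
{
	if (!k->uses_ports && !k->uses_ports_range)
		return;
	if (k->uses_ports)
		k->ports = ports;
	if (k->uses_ports_range)
		k->range_ports = ports;
}

/* Range target contents when both keys are in use. */
static uint32_t range_key_after(bool fixed, uint32_t ports)
{
	struct fake_keys k = { true, true, 0, 0 };

	if (fixed)
		fill_both(&k, ports);
	else
		fill_either(&k, ports);
	return k.range_ports;
}
```

With both keys in use, the old path leaves the range target at 0, which is why a rule combining dst_port with a src_port range did not match.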
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 69ab34f705fbfabcace64b5d53bb7a4450fac875 ]
Fix how port range keys are handled in __skb_flow_bpf_to_target() by:
- Separating PORTS and PORTS_RANGE key handling
- Using correct key_ports_range structure for range keys
- Properly initializing both key types independently
This ensures port range information is correctly stored in its dedicated structure rather than incorrectly using the regular ports key structure.
Fixes: 59fb9b62fb6c ("flow_dissector: Fix to use new variables for port ranges in bpf hook") Reported-by: Qiang Zhang dtzq01@gmail.com Closes: https://lore.kernel.org/netdev/CAPx+-5uvFxkhkz4=j_Xuwkezjn9U6kzKTD5jz4tZ9msS... Cc: Yoshiki Komachi komachi.yoshiki@gmail.com Cc: Jamal Hadi Salim jhs@mojatatu.com Cc: Jiri Pirko jiri@resnulli.us Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Link: https://patch.msgid.link/20250218043210.732959-4-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/flow_dissector.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 842da66034158..aafa754b6cbab 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -907,6 +907,7 @@ static void __skb_flow_bpf_to_target(const struct bpf_flow_keys *flow_keys, struct flow_dissector *flow_dissector, void *target_container) { + struct flow_dissector_key_ports_range *key_ports_range = NULL; struct flow_dissector_key_ports *key_ports = NULL; struct flow_dissector_key_control *key_control; struct flow_dissector_key_basic *key_basic; @@ -951,20 +952,21 @@ static void __skb_flow_bpf_to_target(const struct bpf_flow_keys *flow_keys, key_control->addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS; }
- if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_PORTS)) + if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_PORTS)) { key_ports = skb_flow_dissector_target(flow_dissector, FLOW_DISSECTOR_KEY_PORTS, target_container); - else if (dissector_uses_key(flow_dissector, - FLOW_DISSECTOR_KEY_PORTS_RANGE)) - key_ports = skb_flow_dissector_target(flow_dissector, - FLOW_DISSECTOR_KEY_PORTS_RANGE, - target_container); - - if (key_ports) { key_ports->src = flow_keys->sport; key_ports->dst = flow_keys->dport; } + if (dissector_uses_key(flow_dissector, + FLOW_DISSECTOR_KEY_PORTS_RANGE)) { + key_ports_range = skb_flow_dissector_target(flow_dissector, + FLOW_DISSECTOR_KEY_PORTS_RANGE, + target_container); + key_ports_range->tp.src = flow_keys->sport; + key_ports_range->tp.dst = flow_keys->dport; + }
if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_FLOW_LABEL)) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Breno Leitao leitao@debian.org
[ Upstream commit 4b5a28b38c4a0106c64416a1b2042405166b26ce ]
Add dedicated helper for finding devices by hardware address when holding rtnl_lock, similar to existing dev_getbyhwaddr_rcu(). This prevents PROVE_LOCKING warnings when rtnl_lock is held but RCU read lock is not.
Extract common address comparison logic into dev_addr_cmp().
The context for this change can be found in the following discussion:
Link: https://lore.kernel.org/all/20250206-scarlet-ermine-of-improvement-1fcac5@le...
Cc: kuniyu@amazon.com Cc: ushankar@purestorage.com Suggested-by: Eric Dumazet edumazet@google.com Signed-off-by: Breno Leitao leitao@debian.org Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://patch.msgid.link/20250218-arm_fix_selftest-v5-1-d3d6892db9e1@debian.... Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 4eae0ee0f1e6 ("arp: switch to dev_getbyhwaddr() in arp_req_set_public()") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/netdevice.h | 2 ++ net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++--- 2 files changed, 36 insertions(+), 3 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 95ee88dfe0b9c..337a9d1c558f3 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3072,6 +3072,8 @@ static inline struct net_device *first_net_device_rcu(struct net *net) }
int netdev_boot_setup_check(struct net_device *dev); +struct net_device *dev_getbyhwaddr(struct net *net, unsigned short type, + const char *hwaddr); struct net_device *dev_getbyhwaddr_rcu(struct net *net, unsigned short type, const char *hwaddr); struct net_device *dev_getfirstbyhwtype(struct net *net, unsigned short type); diff --git a/net/core/dev.c b/net/core/dev.c index 479a3892f98c3..8c30cdcf05d4b 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -954,6 +954,12 @@ int netdev_get_name(struct net *net, char *name, int ifindex) return ret; }
+static bool dev_addr_cmp(struct net_device *dev, unsigned short type, + const char *ha) +{ + return dev->type == type && !memcmp(dev->dev_addr, ha, dev->addr_len); +} + /** * dev_getbyhwaddr_rcu - find a device by its hardware address * @net: the applicable net namespace @@ -962,7 +968,7 @@ int netdev_get_name(struct net *net, char *name, int ifindex) * * Search for an interface by MAC address. Returns NULL if the device * is not found or a pointer to the device. - * The caller must hold RCU or RTNL. + * The caller must hold RCU. * The returned device has not had its ref count increased * and the caller must therefore be careful about locking * @@ -974,14 +980,39 @@ struct net_device *dev_getbyhwaddr_rcu(struct net *net, unsigned short type, struct net_device *dev;
for_each_netdev_rcu(net, dev) - if (dev->type == type && - !memcmp(dev->dev_addr, ha, dev->addr_len)) + if (dev_addr_cmp(dev, type, ha)) return dev;
return NULL; } EXPORT_SYMBOL(dev_getbyhwaddr_rcu);
+/** + * dev_getbyhwaddr() - find a device by its hardware address + * @net: the applicable net namespace + * @type: media type of device + * @ha: hardware address + * + * Similar to dev_getbyhwaddr_rcu(), but the owner needs to hold + * rtnl_lock. + * + * Context: rtnl_lock() must be held. + * Return: pointer to the net_device, or NULL if not found + */ +struct net_device *dev_getbyhwaddr(struct net *net, unsigned short type, + const char *ha) +{ + struct net_device *dev; + + ASSERT_RTNL(); + for_each_netdev(net, dev) + if (dev_addr_cmp(dev, type, ha)) + return dev; + + return NULL; +} +EXPORT_SYMBOL(dev_getbyhwaddr); + struct net_device *dev_getfirstbyhwtype(struct net *net, unsigned short type) { struct net_device *dev, *ret = NULL;
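The lookup that both helpers share can be sketched in userspace C. This is a simplified model, not the kernel implementation: struct fake_netdev, the fake_devs table and fake_getbyhwaddr() are hypothetical, the function returns a table index instead of a pointer, and no locking is modelled (the kernel variants differ precisely in which lock protects the walk).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAKE_ADDR_LEN 6

/* Hypothetical stand-in for the fields of struct net_device used here. */
struct fake_netdev {
	unsigned short type;
	unsigned char dev_addr[FAKE_ADDR_LEN];
	unsigned char addr_len;
};

/* Hypothetical device table standing in for the per-netns device list. */
static const struct fake_netdev fake_devs[] = {
	{ 1,  { 0x02, 0, 0, 0, 0, 0x01 }, FAKE_ADDR_LEN },
	{ 1,  { 0x02, 0, 0, 0, 0, 0x02 }, FAKE_ADDR_LEN },
	{ 32, { 0x02, 0, 0, 0, 0, 0x02 }, FAKE_ADDR_LEN },
};

/* Same test as dev_addr_cmp(): the type must match, then the address bytes. */
static int fake_addr_cmp(const struct fake_netdev *dev, unsigned short type,
			 const unsigned char *ha)
{
	return dev->type == type && !memcmp(dev->dev_addr, ha, dev->addr_len);
}

/* Return the index of the first matching device, or -1 if none matches. */
static int fake_getbyhwaddr(unsigned short type, const unsigned char *ha)
{
	size_t n = sizeof(fake_devs) / sizeof(fake_devs[0]);

	assert(ha != NULL);
	for (size_t i = 0; i < n; i++)
		if (fake_addr_cmp(&fake_devs[i], type, ha))
			return (int)i;
	return -1;
}
```

Factoring the comparison into one helper is what lets the RCU and RTNL variants differ only in the iteration macro and the locking assertion.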
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Breno Leitao leitao@debian.org
[ Upstream commit 4eae0ee0f1e6256d0b0b9dd6e72f1d9cf8f72e08 ]
The arp_req_set_public() function is called with the rtnl lock held, which provides enough synchronization protection. This makes the RCU variant of dev_getbyhwaddr() unnecessary. Switch to using the simpler dev_getbyhwaddr() function since we already have the required rtnl locking.
This change helps maintain consistency in the networking code by using the appropriate helper function for the existing locking context. Since we're not holding the RCU read lock in arp_req_set_public(), the existing code could trigger false positive locking warnings.
Fixes: 941666c2e3e0 ("net: RCU conversion of dev_getbyhwaddr() and arp_ioctl()") Suggested-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Breno Leitao leitao@debian.org Link: https://patch.msgid.link/20250218-arm_fix_selftest-v5-2-d3d6892db9e1@debian.... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/arp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index 02776453bf97a..784dc8b37be5a 100644 --- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -1030,7 +1030,7 @@ static int arp_req_set_public(struct net *net, struct arpreq *r, if (mask && mask != htonl(0xFFFFFFFF)) return -EINVAL; if (!dev && (r->arp_flags & ATF_COM)) { - dev = dev_getbyhwaddr_rcu(net, r->arp_ha.sa_family, + dev = dev_getbyhwaddr(net, r->arp_ha.sa_family, r->arp_ha.sa_data); if (!dev) return -ENODEV;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nick Hu nick.hu@sifive.com
[ Upstream commit a370295367b55662a32a4be92565fe72a5aa79bb ]
The external PHY will undergo a soft reset twice during the resume process when it wakes up from suspend. The first reset occurs when the axienet driver calls phylink_of_phy_connect(), and the second occurs when mdio_bus_phy_resume() invokes phy_init_hw(). The second soft reset of the external PHY does not reinitialize the internal PHY, which causes issues with the internal PHY, resulting in the PHY link being down. To prevent this, setting the mac_managed_pm flag skips the mdio_bus_phy_resume() function.
Fixes: a129b41fe0a8 ("Revert "net: phy: dp83867: perform soft reset and retain established link"") Signed-off-by: Nick Hu nick.hu@sifive.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Link: https://patch.msgid.link/20250217055843.19799-1-nick.hu@sifive.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c index 02e11827440b5..3517a2275821f 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c @@ -2161,6 +2161,7 @@ static int axienet_probe(struct platform_device *pdev)
lp->phylink_config.dev = &ndev->dev; lp->phylink_config.type = PHYLINK_NETDEV; + lp->phylink_config.mac_managed_pm = true; lp->phylink_config.mac_capabilities = MAC_SYM_PAUSE | MAC_ASYM_PAUSE | MAC_10FD | MAC_100FD | MAC_1000FD;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit 9b6412e6979f6f9e0632075f8f008937b5cd4efd ]
Xiumei reported hitting the WARN in xfrm6_tunnel_net_exit while running tests that boil down to:
 - create a pair of netns
 - run a basic TCP test over ipcomp6
 - delete the pair of netns
The xfrm_state found on spi_byaddr was not deleted at the time we delete the netns, because we still have a reference on it. This lingering reference comes from a secpath (which holds a ref on the xfrm_state), which is still attached to an skb. This skb is not leaked, it ends up on sk_receive_queue and then gets defer-free'd by skb_attempt_defer_free.
The problem happens when we defer freeing an skb (push it on one CPU's defer_list), and don't flush that list before the netns is deleted. In that case, we still have a reference on the xfrm_state that we don't expect at this point.
We already drop the skb's dst in the TCP receive path when it's no longer needed, so let's also drop the secpath. At this point, tcp_filter has already called into the LSM hooks that may require the secpath, so it should not be needed anymore. However, in some of those places, the MPTCP extension has just been attached to the skb, so we cannot simply drop all extensions.
Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists") Reported-by: Xiumei Mu xmu@redhat.com Signed-off-by: Sabrina Dubroca sd@queasysnail.net Reviewed-by: Eric Dumazet edumazet@google.com Link: https://patch.msgid.link/5055ba8f8f72bdcb602faa299faca73c280b7735.1739743613... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/tcp.h | 14 ++++++++++++++ net/ipv4/tcp_fastopen.c | 4 ++-- net/ipv4/tcp_input.c | 8 ++++---- net/ipv4/tcp_ipv4.c | 2 +- 4 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index b3917af309e0f..78c755414fa87 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -40,6 +40,7 @@
 #include <net/inet_ecn.h>
 #include <net/dst.h>
 #include <net/mptcp.h>
+#include <net/xfrm.h>
 
 #include <linux/seq_file.h>
 #include <linux/memcontrol.h>
@@ -630,6 +631,19 @@ void tcp_fin(struct sock *sk);
 void tcp_check_space(struct sock *sk);
 void tcp_sack_compress_send_ack(struct sock *sk);
 
+static inline void tcp_cleanup_skb(struct sk_buff *skb)
+{
+	skb_dst_drop(skb);
+	secpath_reset(skb);
+}
+
+static inline void tcp_add_receive_queue(struct sock *sk, struct sk_buff *skb)
+{
+	DEBUG_NET_WARN_ON_ONCE(skb_dst(skb));
+	DEBUG_NET_WARN_ON_ONCE(secpath_exists(skb));
+	__skb_queue_tail(&sk->sk_receive_queue, skb);
+}
+
 /* tcp_timer.c */
 void tcp_init_xmit_timers(struct sock *);
 static inline void tcp_clear_xmit_timers(struct sock *sk)
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 0f523cbfe329e..32b28fc21b63c 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -178,7 +178,7 @@ void tcp_fastopen_add_skb(struct sock *sk, struct sk_buff *skb)
 	if (!skb)
 		return;
 
-	skb_dst_drop(skb);
+	tcp_cleanup_skb(skb);
 	/* segs_in has been initialized to 1 in tcp_create_openreq_child().
 	 * Hence, reset segs_in to 0 before calling tcp_segs_in()
 	 * to avoid double counting. Also, tcp_segs_in() expects
@@ -195,7 +195,7 @@ void tcp_fastopen_add_skb(struct sock *sk, struct sk_buff *skb)
 	TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_SYN;
 
 	tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
-	__skb_queue_tail(&sk->sk_receive_queue, skb);
+	tcp_add_receive_queue(sk, skb);
 	tp->syn_data_acked = 1;
 
 	/* u64_stats_update_begin(&tp->syncp) not needed here,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6074b4c3ab940..10d38ec0ff5ac 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4874,7 +4874,7 @@ static void tcp_ofo_queue(struct sock *sk)
 		tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq);
 		fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN;
 		if (!eaten)
-			__skb_queue_tail(&sk->sk_receive_queue, skb);
+			tcp_add_receive_queue(sk, skb);
 		else
 			kfree_skb_partial(skb, fragstolen);
 
@@ -5065,7 +5065,7 @@ static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb,
 				  skb, fragstolen)) ? 1 : 0;
 	tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq);
 	if (!eaten) {
-		__skb_queue_tail(&sk->sk_receive_queue, skb);
+		tcp_add_receive_queue(sk, skb);
 		skb_set_owner_r(skb, sk);
 	}
 	return eaten;
@@ -5148,7 +5148,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 		__kfree_skb(skb);
 		return;
 	}
-	skb_dst_drop(skb);
+	tcp_cleanup_skb(skb);
 	__skb_pull(skb, tcp_hdr(skb)->doff * 4);
 
 	reason = SKB_DROP_REASON_NOT_SPECIFIED;
@@ -6098,7 +6098,7 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb)
 			NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPHPHITS);
 
 			/* Bulk data transfer: receiver */
-			skb_dst_drop(skb);
+			tcp_cleanup_skb(skb);
 			__skb_pull(skb, tcp_header_len);
 			eaten = tcp_queue_rcv(sk, skb, &fragstolen);
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 705320f160ac8..2f49a504c9d3e 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1842,7 +1842,7 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb,
 	 */
 	skb_condense(skb);
 
-	skb_dst_drop(skb);
+	tcp_cleanup_skb(skb);
 
 	if (unlikely(tcp_checksum_complete(skb))) {
 		bh_unlock_sock(sk);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tomi Valkeinen tomi.valkeinen@ideasonboard.com
[ Upstream commit 576d96c5c896221b5bc8feae473739469a92e144 ]
K2G display controller does not support soft reset, but we can do the most important steps manually: mask the IRQs and disable the VPs.
Reviewed-by: Aradhya Bhatia a-bhatia1@ti.com
Link: https://lore.kernel.org/r/20231109-tidss-probe-v2-7-ac91b5ea35c0@ideasonboar...
Signed-off-by: Tomi Valkeinen tomi.valkeinen@ideasonboard.com
Stable-dep-of: a9a73f2661e6 ("drm/tidss: Fix race condition while handling interrupt registers")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/tidss/tidss_dispc.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
index ee3531bbccd7d..4327e1203c565 100644
--- a/drivers/gpu/drm/tidss/tidss_dispc.c
+++ b/drivers/gpu/drm/tidss/tidss_dispc.c
@@ -2704,14 +2704,28 @@ static void dispc_init_errata(struct dispc_device *dispc)
 	}
 }
 
+/*
+ * K2G display controller does not support soft reset, so we do a basic manual
+ * reset here: make sure the IRQs are masked and VPs are disabled.
+ */
+static void dispc_softreset_k2g(struct dispc_device *dispc)
+{
+	dispc_set_irqenable(dispc, 0);
+	dispc_read_and_clear_irqstatus(dispc);
+
+	for (unsigned int vp_idx = 0; vp_idx < dispc->feat->num_vps; ++vp_idx)
+		VP_REG_FLD_MOD(dispc, vp_idx, DISPC_VP_CONTROL, 0, 0, 0);
+}
+
 static int dispc_softreset(struct dispc_device *dispc)
 {
 	u32 val;
 	int ret = 0;
 
-	/* K2G display controller does not support soft reset */
-	if (dispc->feat->subrev == DISPC_K2G)
+	if (dispc->feat->subrev == DISPC_K2G) {
+		dispc_softreset_k2g(dispc);
 		return 0;
+	}
 
 	/* Soft reset */
 	REG_FLD_MOD(dispc, DSS_SYSCONFIG, 1, 1, 1);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Devarsh Thakkar devarsht@ti.com
[ Upstream commit a9a73f2661e6f625d306c9b0ef082e4593f45a21 ]
The driver has a spinlock for protecting the irq_masks field and irq enable registers. However, the driver misses protecting the irq status registers which can lead to races.
Take the spinlock when accessing irqstatus too.
Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem")
Cc: stable@vger.kernel.org
Signed-off-by: Devarsh Thakkar devarsht@ti.com
[Tomi: updated the desc]
Reviewed-by: Jonathan Cormier jcormier@criticallink.com
Tested-by: Jonathan Cormier jcormier@criticallink.com
Reviewed-by: Aradhya Bhatia aradhya.bhatia@linux.dev
Signed-off-by: Tomi Valkeinen tomi.valkeinen@ideasonboard.com
Link: https://patchwork.freedesktop.org/patch/msgid/20241021-tidss-irq-fix-v1-6-82...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/tidss/tidss_dispc.c | 4 ++++
 drivers/gpu/drm/tidss/tidss_irq.c   | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
index 4327e1203c565..355c64bafb82b 100644
--- a/drivers/gpu/drm/tidss/tidss_dispc.c
+++ b/drivers/gpu/drm/tidss/tidss_dispc.c
@@ -2710,8 +2710,12 @@ static void dispc_init_errata(struct dispc_device *dispc)
  */
 static void dispc_softreset_k2g(struct dispc_device *dispc)
 {
+	unsigned long flags;
+
+	spin_lock_irqsave(&dispc->tidss->wait_lock, flags);
 	dispc_set_irqenable(dispc, 0);
 	dispc_read_and_clear_irqstatus(dispc);
+	spin_unlock_irqrestore(&dispc->tidss->wait_lock, flags);
 
 	for (unsigned int vp_idx = 0; vp_idx < dispc->feat->num_vps; ++vp_idx)
 		VP_REG_FLD_MOD(dispc, vp_idx, DISPC_VP_CONTROL, 0, 0, 0);
diff --git a/drivers/gpu/drm/tidss/tidss_irq.c b/drivers/gpu/drm/tidss/tidss_irq.c
index 0c681c7600bcb..f13c7e434f8ed 100644
--- a/drivers/gpu/drm/tidss/tidss_irq.c
+++ b/drivers/gpu/drm/tidss/tidss_irq.c
@@ -60,7 +60,9 @@ static irqreturn_t tidss_irq_handler(int irq, void *arg)
 	unsigned int id;
 	dispc_irq_t irqstatus;
 
+	spin_lock(&tidss->wait_lock);
 	irqstatus = dispc_read_and_clear_irqstatus(tidss->dispc);
+	spin_unlock(&tidss->wait_lock);
 
 	for (id = 0; id < tidss->num_crtcs; id++) {
 		struct drm_crtc *crtc = tidss->crtcs[id];
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Clark robdclark@chromium.org
[ Upstream commit b2acb89af1a400be721bcb14f137aa22b509caba ]
Error messages resulting from incorrect usage of the kernel uabi should not spam dmesg by default. But it is useful to enable them to debug userspace. So demote to DRM_UT_DRIVER.
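The effect of the demotion can be pictured with a small userspace sketch. Everything in it is hypothetical and only mimics the pattern: `debug_driver_enabled` stands in for the DRM_UT_DRIVER debug category being switched on, `SUBMIT_ERROR_SKETCH` for the macro this patch introduces, and `check_bo_flags()` for one of the validation sites. It is not the driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the DRM_UT_DRIVER debug category being enabled. */
static bool debug_driver_enabled;
/* Counts emitted messages so the gating can be observed. */
static int messages_emitted;

/* Log only when driver debugging is on; silent by default. */
#define SUBMIT_ERROR_SKETCH(fmt, ...) \
	do { \
		if (debug_driver_enabled) { \
			fprintf(stderr, fmt, ##__VA_ARGS__); \
			messages_emitted++; \
		} \
	} while (0)

#define VALID_FLAGS_SKETCH 0x3u	/* hypothetical mask of accepted flags */

/* The error code is returned either way; only the logging is gated. */
static int check_bo_flags(unsigned int flags)
{
	if (flags & ~VALID_FLAGS_SKETCH) {
		SUBMIT_ERROR_SKETCH("invalid flags: %x\n", flags);
		return -EINVAL;
	}
	return 0;
}
```

Userspace still sees -EINVAL for a bad submit in both modes; the difference is only whether the log is written.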
Signed-off-by: Rob Clark robdclark@chromium.org
Patchwork: https://patchwork.freedesktop.org/patch/564189/
Stable-dep-of: 3a47f4b439be ("drm/msm/gem: prevent integer overflow in msm_ioctl_gem_submit()")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/msm/msm_gem.c        |  6 ++---
 drivers/gpu/drm/msm/msm_gem_submit.c | 36 ++++++++++++++++------------
 2 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index db1e748daa753..1113e6b2ec8ec 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -226,9 +226,9 @@ static struct page **msm_gem_pin_pages_locked(struct drm_gem_object *obj,
 
 	msm_gem_assert_locked(obj);
 
-	if (GEM_WARN_ON(msm_obj->madv > madv)) {
-		DRM_DEV_ERROR(obj->dev->dev, "Invalid madv state: %u vs %u\n",
-			msm_obj->madv, madv);
+	if (msm_obj->madv > madv) {
+		DRM_DEV_DEBUG_DRIVER(obj->dev->dev, "Invalid madv state: %u vs %u\n",
+				     msm_obj->madv, madv);
 		return ERR_PTR(-EBUSY);
 	}
 
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 99744de6c05a1..207b6ba1565d8 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -17,6 +17,12 @@
 #include "msm_gem.h"
 #include "msm_gpu_trace.h"
 
+/* For userspace errors, use DRM_UT_DRIVER.. so that userspace can enable
+ * error msgs for debugging, but we don't spam dmesg by default
+ */
+#define SUBMIT_ERROR(submit, fmt, ...) \
+	DRM_DEV_DEBUG_DRIVER((submit)->dev->dev, fmt, ##__VA_ARGS__)
+
 /*
  * Cmdstream submission:
  */
@@ -136,7 +142,7 @@ static int submit_lookup_objects(struct msm_gem_submit *submit,
 
 		if ((submit_bo.flags & ~MSM_SUBMIT_BO_FLAGS) ||
 			!(submit_bo.flags & MANDATORY_FLAGS)) {
-			DRM_ERROR("invalid flags: %x\n", submit_bo.flags);
+			SUBMIT_ERROR(submit, "invalid flags: %x\n", submit_bo.flags);
 			ret = -EINVAL;
 			i = 0;
 			goto out;
@@ -158,7 +164,7 @@ static int submit_lookup_objects(struct msm_gem_submit *submit,
 		 */
 		obj = idr_find(&file->object_idr, submit->bos[i].handle);
 		if (!obj) {
-			DRM_ERROR("invalid handle %u at index %u\n", submit->bos[i].handle, i);
+			SUBMIT_ERROR(submit, "invalid handle %u at index %u\n", submit->bos[i].handle, i);
 			ret = -EINVAL;
 			goto out_unlock;
 		}
@@ -202,13 +208,13 @@ static int submit_lookup_cmds(struct msm_gem_submit *submit,
 		case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
 			break;
 		default:
-			DRM_ERROR("invalid type: %08x\n", submit_cmd.type);
+			SUBMIT_ERROR(submit, "invalid type: %08x\n", submit_cmd.type);
 			return -EINVAL;
 		}
 
 		if (submit_cmd.size % 4) {
-			DRM_ERROR("non-aligned cmdstream buffer size: %u\n",
-					submit_cmd.size);
+			SUBMIT_ERROR(submit, "non-aligned cmdstream buffer size: %u\n",
+				     submit_cmd.size);
 			ret = -EINVAL;
 			goto out;
 		}
@@ -306,8 +312,8 @@ static int submit_lock_objects(struct msm_gem_submit *submit)
 
 fail:
 	if (ret == -EALREADY) {
-		DRM_ERROR("handle %u at index %u already on submit list\n",
-				submit->bos[i].handle, i);
+		SUBMIT_ERROR(submit, "handle %u at index %u already on submit list\n",
+			     submit->bos[i].handle, i);
 		ret = -EINVAL;
 	}
 
@@ -448,8 +454,8 @@ static int submit_bo(struct msm_gem_submit *submit, uint32_t idx,
 		struct drm_gem_object **obj, uint64_t *iova, bool *valid)
 {
 	if (idx >= submit->nr_bos) {
-		DRM_ERROR("invalid buffer index: %u (out of %u)\n",
-				idx, submit->nr_bos);
+		SUBMIT_ERROR(submit, "invalid buffer index: %u (out of %u)\n",
+			     idx, submit->nr_bos);
 		return -EINVAL;
 	}
 
@@ -475,7 +481,7 @@ static int submit_reloc(struct msm_gem_submit *submit, struct drm_gem_object *ob
 		return 0;
 
 	if (offset % 4) {
-		DRM_ERROR("non-aligned cmdstream buffer: %u\n", offset);
+		SUBMIT_ERROR(submit, "non-aligned cmdstream buffer: %u\n", offset);
 		return -EINVAL;
 	}
 
@@ -497,8 +503,8 @@ static int submit_reloc(struct msm_gem_submit *submit, struct drm_gem_object *ob
 		bool valid;
 
 		if (submit_reloc.submit_offset % 4) {
-			DRM_ERROR("non-aligned reloc offset: %u\n",
-					submit_reloc.submit_offset);
+			SUBMIT_ERROR(submit, "non-aligned reloc offset: %u\n",
+				     submit_reloc.submit_offset);
 			ret = -EINVAL;
 			goto out;
 		}
@@ -508,7 +514,7 @@ static int submit_reloc(struct msm_gem_submit *submit, struct drm_gem_object *ob
 
 		if ((off >= (obj->size / 4)) ||
 				(off < last_offset)) {
-			DRM_ERROR("invalid offset %u at reloc %u\n", off, i);
+			SUBMIT_ERROR(submit, "invalid offset %u at reloc %u\n", off, i);
 			ret = -EINVAL;
 			goto out;
 		}
@@ -881,7 +887,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 		if (!submit->cmd[i].size ||
 			((submit->cmd[i].size + submit->cmd[i].offset) >
 				obj->size / 4)) {
-			DRM_ERROR("invalid cmdstream size: %u\n", submit->cmd[i].size * 4);
+			SUBMIT_ERROR(submit, "invalid cmdstream size: %u\n", submit->cmd[i].size * 4);
 			ret = -EINVAL;
 			goto out;
 		}
@@ -893,7 +899,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 
 		if (!gpu->allow_relocs) {
 			if (submit->cmd[i].nr_relocs) {
-				DRM_ERROR("relocs not allowed\n");
+				SUBMIT_ERROR(submit, "relocs not allowed\n");
 				ret = -EINVAL;
 				goto out;
 			}
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit 3a47f4b439beb98e955d501c609dfd12b7836d61 ]
The "submit->cmd[i].size" and "submit->cmd[i].offset" variables are u32 values that come from the user via the submit_lookup_cmds() function. This addition could lead to an integer wrapping bug so use size_add() to prevent that.
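To illustrate the wrap, here is a minimal userspace sketch. It assumes a GCC/Clang toolchain (for `__builtin_add_overflow`); `size_add_sketch()` only imitates the saturating behaviour of the kernel's size_add() and is not the kernel helper, and the two `cmd_in_bounds_*` functions are hypothetical reductions of the check in msm_ioctl_gem_submit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating size_t addition: returns SIZE_MAX instead of wrapping. */
static size_t size_add_sketch(size_t a, size_t b)
{
	size_t sum;

	if (__builtin_add_overflow(a, b, &sum))
		return SIZE_MAX;
	return sum;
}

/* Old form: the u32 + u32 sum wraps before it is compared. */
static bool cmd_in_bounds_wrapping(uint32_t size, uint32_t offset,
				   size_t obj_words)
{
	return size != 0 && (uint32_t)(size + offset) <= obj_words;
}

/* New form: operands are widened to size_t and the sum saturates. */
static bool cmd_in_bounds_checked(uint32_t size, uint32_t offset,
				  size_t obj_words)
{
	return size != 0 && size_add_sketch(size, offset) <= obj_words;
}
```

With size = 0xFFFFFFF0 and offset = 0x20, the u32 sum wraps to 0x10 and slips past a 1024-word limit in the old form, while the checked form rejects it on both 32-bit and 64-bit size_t.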
Fixes: 198725337ef1 ("drm/msm: fix cmdstream size check")
Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter dan.carpenter@linaro.org
Patchwork: https://patchwork.freedesktop.org/patch/624696/
Signed-off-by: Rob Clark robdclark@chromium.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 207b6ba1565d8..018b39546fc1d 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -885,8 +885,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 			goto out;
 
 		if (!submit->cmd[i].size ||
-			((submit->cmd[i].size + submit->cmd[i].offset) >
-				obj->size / 4)) {
+		    (size_add(submit->cmd[i].size, submit->cmd[i].offset) > obj->size / 4)) {
 			SUBMIT_ERROR(submit, "invalid cmdstream size: %u\n", submit->cmd[i].size * 4);
 			ret = -EINVAL;
 			goto out;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shigeru Yoshida syoshida@redhat.com
[ Upstream commit 6b3d638ca897e099fa99bd6d02189d3176f80a47 ]
KMSAN reported a use-after-free issue in eth_skb_pkt_type()[1]. The cause of the issue was that eth_skb_pkt_type() accessed skb's data that didn't contain an Ethernet header. This occurs when bpf_prog_test_run_xdp() passes an invalid value as the user_size argument to bpf_test_init().
Fix this by returning an error when user_size is less than ETH_HLEN in bpf_test_init(). Additionally, remove the check for "if (user_size > size)" as it is unnecessary.
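The bounds check described above can be sketched in userspace as follows. The constants and the helper name `check_user_size()` are assumptions made for illustration (a 14-byte Ethernet header and a 4096-byte page); the real check lives inside bpf_test_init() and returns an ERR_PTR.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define ETH_HLEN_SKETCH   14u	/* Ethernet header length */
#define PAGE_SIZE_SKETCH  4096u	/* assumed page size for the sketch */

/*
 * Reject user buffers that cannot hold an Ethernet header, or that do not
 * fit in one page together with the requested headroom and tailroom.
 */
static int check_user_size(uint32_t user_size, uint32_t headroom,
			   uint32_t tailroom)
{
	if (user_size < ETH_HLEN_SKETCH ||
	    user_size > PAGE_SIZE_SKETCH - headroom - tailroom)
		return -EINVAL;
	return 0;
}
```

For example, with a hypothetical headroom of 256 and tailroom of 320, sizes from 14 to 3520 bytes pass and anything outside that range is refused before any skb is built.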
[1]
BUG: KMSAN: use-after-free in eth_skb_pkt_type include/linux/etherdevice.h:627 [inline]
BUG: KMSAN: use-after-free in eth_type_trans+0x4ee/0x980 net/ethernet/eth.c:165
 eth_skb_pkt_type include/linux/etherdevice.h:627 [inline]
 eth_type_trans+0x4ee/0x980 net/ethernet/eth.c:165
 __xdp_build_skb_from_frame+0x5a8/0xa50 net/core/xdp.c:635
 xdp_recv_frames net/bpf/test_run.c:272 [inline]
 xdp_test_run_batch net/bpf/test_run.c:361 [inline]
 bpf_test_run_xdp_live+0x2954/0x3330 net/bpf/test_run.c:390
 bpf_prog_test_run_xdp+0x148e/0x1b10 net/bpf/test_run.c:1318
 bpf_prog_test_run+0x5b7/0xa30 kernel/bpf/syscall.c:4371
 __sys_bpf+0x6a6/0xe20 kernel/bpf/syscall.c:5777
 __do_sys_bpf kernel/bpf/syscall.c:5866 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:5864 [inline]
 __x64_sys_bpf+0xa4/0xf0 kernel/bpf/syscall.c:5864
 x64_sys_call+0x2ea0/0x3d90 arch/x86/include/generated/asm/syscalls_64.h:322
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xd9/0x1d0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was created at:
 free_pages_prepare mm/page_alloc.c:1056 [inline]
 free_unref_page+0x156/0x1320 mm/page_alloc.c:2657
 __free_pages+0xa3/0x1b0 mm/page_alloc.c:4838
 bpf_ringbuf_free kernel/bpf/ringbuf.c:226 [inline]
 ringbuf_map_free+0xff/0x1e0 kernel/bpf/ringbuf.c:235
 bpf_map_free kernel/bpf/syscall.c:838 [inline]
 bpf_map_free_deferred+0x17c/0x310 kernel/bpf/syscall.c:862
 process_one_work kernel/workqueue.c:3229 [inline]
 process_scheduled_works+0xa2b/0x1b60 kernel/workqueue.c:3310
 worker_thread+0xedf/0x1550 kernel/workqueue.c:3391
 kthread+0x535/0x6b0 kernel/kthread.c:389
 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

CPU: 1 UID: 0 PID: 17276 Comm: syz.1.16450 Not tainted 6.12.0-05490-g9bb88c659673 #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
Fixes: be3d72a2896c ("bpf: move user_size out of bpf_test_init")
Reported-by: syzkaller syzkaller@googlegroups.com
Suggested-by: Martin KaFai Lau martin.lau@linux.dev
Signed-off-by: Shigeru Yoshida syoshida@redhat.com
Signed-off-by: Martin KaFai Lau martin.lau@kernel.org
Acked-by: Stanislav Fomichev sdf@fomichev.me
Acked-by: Daniel Borkmann daniel@iogearbox.net
Link: https://patch.msgid.link/20250121150643.671650-1-syoshida@redhat.com
Signed-off-by: Alexei Starovoitov ast@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/bpf/test_run.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 905de361f8623..73fb9db55798c 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -630,12 +630,9 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
 	void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
 	void *data;
 
-	if (size < ETH_HLEN || size > PAGE_SIZE - headroom - tailroom)
+	if (user_size < ETH_HLEN || user_size > PAGE_SIZE - headroom - tailroom)
 		return ERR_PTR(-EINVAL);
 
-	if (user_size > size)
-		return ERR_PTR(-EMSGSIZE);
-
 	size = SKB_DATA_ALIGN(size);
 	data = kzalloc(size + headroom + tailroom, GFP_USER);
 	if (!data)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrii Nakryiko andrii@kernel.org
[ Upstream commit 98671a0fd1f14e4a518ee06b19037c20014900eb ]
For all BPF maps we ensure that VM_MAYWRITE is cleared when memory-mapping BPF map contents as initially read-only VMA. This is because in some cases BPF verifier relies on the underlying data to not be modified afterwards by user space, so once something is mapped read-only, it shouldn't be re-mmap'ed as read-write.
As such, it's not necessary to check VM_MAYWRITE in bpf_map_mmap() and map->ops->map_mmap() callbacks: VM_WRITE should be consistently set for read-write mappings, and if VM_WRITE is not set, there is no way for user space to upgrade read-only mapping to read-write one.
This patch cleans up this VM_WRITE vs VM_MAYWRITE handling within bpf_map_mmap(), which is an entry point for any BPF map mmap()-ing logic. We also drop unnecessary sanitization of VM_MAYWRITE in BPF ringbuf's map_mmap() callback implementation, as it is already performed by common code in bpf_map_mmap().
Note, though, that in bpf_map_mmap_{open,close}() callbacks we can't drop VM_MAYWRITE use, because it's possible (and is outside of subsystem's control) to have initially read-write memory mapping, which is subsequently dropped to read-only by user space through mprotect(). In such case, from BPF verifier POV it's read-write data throughout the lifetime of BPF map, and is counted as "active writer".
But its VMAs will start out as VM_WRITE|VM_MAYWRITE, then mprotect() can change it to just VM_MAYWRITE (and no VM_WRITE), so when it's finally munmap()'ed and bpf_map_mmap_close() is called, vm_flags will be just VM_MAYWRITE, but we still need to decrement active writer count with bpf_map_write_active_dec() as it's still considered to be a read-write mapping by the rest of BPF subsystem.
Similar reasoning applies to bpf_map_mmap_open(), which is called whenever mmap(), munmap(), and/or mprotect() forces mm subsystem to split original VMA into multiple discontiguous VMAs.
Memory-mapping handling is a bit tricky, yes.
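The flag lifecycle described above can be traced with a toy model. This is a sketch under stated assumptions, not kernel code: the `*_SK` flag values, `struct map_sketch`, and the three helpers are hypothetical stand-ins for vm_flags handling in bpf_map_mmap(), mprotect(), and bpf_map_mmap_close().

```c
#include <assert.h>

#define VM_WRITE_SK     0x02u	/* sketch value for VM_WRITE */
#define VM_MAYWRITE_SK  0x20u	/* sketch value for VM_MAYWRITE */

struct map_sketch {
	int write_active;	/* models the "active writer" count */
};

/* mmap time: a read-only mapping loses VM_MAYWRITE; count writers by VM_WRITE. */
static unsigned int mmap_sketch(struct map_sketch *m, unsigned int flags)
{
	if (!(flags & VM_WRITE_SK))
		flags &= ~VM_MAYWRITE_SK;
	if (flags & VM_WRITE_SK)
		m->write_active++;
	return flags;
}

/* mprotect(PROT_READ): drops VM_WRITE but leaves VM_MAYWRITE untouched. */
static unsigned int mprotect_readonly_sketch(unsigned int flags)
{
	return flags & ~VM_WRITE_SK;
}

/* close time: VM_MAYWRITE is the only reliable trace of a writable origin. */
static void close_sketch(struct map_sketch *m, unsigned int flags)
{
	if (flags & VM_MAYWRITE_SK)
		m->write_active--;
}
```

A mapping that starts read-write, is downgraded by mprotect(), and is then closed still balances the counter, which is why the open/close callbacks keep testing VM_MAYWRITE while the mmap path can test VM_WRITE alone.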
Cc: Jann Horn jannh@google.com
Cc: Suren Baghdasaryan surenb@google.com
Cc: Shakeel Butt shakeel.butt@linux.dev
Signed-off-by: Andrii Nakryiko andrii@kernel.org
Link: https://lore.kernel.org/r/20250129012246.1515826-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov ast@kernel.org
Stable-dep-of: bc27c52eea18 ("bpf: avoid holding freeze_mutex during mmap operation")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/bpf/ringbuf.c |  4 ----
 kernel/bpf/syscall.c | 10 ++++++++--
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 246559c3e93d0..528f4d6342262 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -268,8 +268,6 @@ static int ringbuf_map_mmap_kern(struct bpf_map *map, struct vm_area_struct *vma
 		/* allow writable mapping for the consumer_pos only */
 		if (vma->vm_pgoff != 0 || vma->vm_end - vma->vm_start != PAGE_SIZE)
 			return -EPERM;
-	} else {
-		vm_flags_clear(vma, VM_MAYWRITE);
 	}
 	/* remap_vmalloc_range() checks size and offset constraints */
 	return remap_vmalloc_range(vma, rb_map->rb,
@@ -289,8 +287,6 @@ static int ringbuf_map_mmap_user(struct bpf_map *map, struct vm_area_struct *vma
 			 * position, and the ring buffer data itself.
 			 */
 			return -EPERM;
-	} else {
-		vm_flags_clear(vma, VM_MAYWRITE);
 	}
 	/* remap_vmalloc_range() checks size and offset constraints */
 	return remap_vmalloc_range(vma, rb_map->rb, vma->vm_pgoff + RINGBUF_PGOFF);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ba38c08a9a059..98d7558e2f2be 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -912,15 +912,21 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 	vma->vm_ops = &bpf_map_default_vmops;
 	vma->vm_private_data = map;
 	vm_flags_clear(vma, VM_MAYEXEC);
+	/* If mapping is read-only, then disallow potentially re-mapping with
+	 * PROT_WRITE by dropping VM_MAYWRITE flag. This VM_MAYWRITE clearing
+	 * means that as far as BPF map's memory-mapped VMAs are concerned,
+	 * VM_WRITE and VM_MAYWRITE and equivalent, if one of them is set,
+	 * both should be set, so we can forget about VM_MAYWRITE and always
+	 * check just VM_WRITE
+	 */
 	if (!(vma->vm_flags & VM_WRITE))
-		/* disallow re-mapping with PROT_WRITE */
 		vm_flags_clear(vma, VM_MAYWRITE);
 
 	err = map->ops->map_mmap(map, vma);
 	if (err)
 		goto out;
 
-	if (vma->vm_flags & VM_MAYWRITE)
+	if (vma->vm_flags & VM_WRITE)
 		bpf_map_write_active_inc(map);
 out:
 	mutex_unlock(&map->freeze_mutex);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrii Nakryiko andrii@kernel.org
[ Upstream commit bc27c52eea189e8f7492d40739b7746d67b65beb ]
We use map->freeze_mutex to prevent races between map_freeze() and memory mapping BPF map contents with writable permissions. The way we naively do this means we'll hold freeze_mutex for entire duration of all the mm and VMA manipulations, which is completely unnecessary. This can potentially also lead to deadlocks, as reported by syzbot in [0].
So, instead, hold freeze_mutex only during writeability checks, bump (proactively) "write active" count for the map, unlock the mutex and proceed with mmap logic. And only if something went wrong during mmap logic, then undo that "write active" counter increment.
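The locking pattern described above (check and count under the mutex, do the slow work unlocked, undo on failure) can be sketched in userspace. This is a simplified illustration, assuming POSIX threads and C11 atomics; `struct map_sketch`, `map_mmap_sketch()`, and the `mmap_ok`/`mmap_fail` callbacks are hypothetical, and only the "frozen" check is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct map_sketch {
	pthread_mutex_t freeze_mutex;
	bool frozen;
	atomic_int write_active;
};

static void map_sketch_init(struct map_sketch *m)
{
	pthread_mutex_init(&m->freeze_mutex, NULL);
	m->frozen = false;
	atomic_init(&m->write_active, 0);
}

/* Hypothetical stand-ins for the map's own mmap callback. */
static int mmap_ok(void) { return 0; }
static int mmap_fail(void) { return -ENOMEM; }

static int map_mmap_sketch(struct map_sketch *m, bool writable,
			   int (*do_mmap)(void))
{
	int err = 0;

	/* Hold the mutex only for the writeability check and the increment. */
	pthread_mutex_lock(&m->freeze_mutex);
	if (writable) {
		if (m->frozen)
			err = -EPERM;
		else
			atomic_fetch_add(&m->write_active, 1);
	}
	pthread_mutex_unlock(&m->freeze_mutex);
	if (err)
		return err;

	/* The slow mapping work runs without the mutex held. */
	err = do_mmap();
	if (err && writable)
		atomic_fetch_sub(&m->write_active, 1);	/* undo the early bump */
	return err;
}
```

Because the counter is raised before the mutex is released, a concurrent freeze attempt that takes the mutex afterwards already sees the pending writer, which is what allows the lock to be dropped early.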
[0] https://lore.kernel.org/bpf/678dcbc9.050a0220.303755.0066.GAE@google.com/
Fixes: fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Reported-by: syzbot+4dc041c686b7c816a71e@syzkaller.appspotmail.com
Signed-off-by: Andrii Nakryiko andrii@kernel.org
Link: https://lore.kernel.org/r/20250129012246.1515826-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov ast@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/bpf/syscall.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 98d7558e2f2be..9f791b6b09edc 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -882,7 +882,7 @@ static const struct vm_operations_struct bpf_map_default_vmops = {
 static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 {
 	struct bpf_map *map = filp->private_data;
-	int err;
+	int err = 0;
 
 	if (!map->ops->map_mmap || !IS_ERR_OR_NULL(map->record))
 		return -ENOTSUPP;
@@ -906,7 +906,12 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 			err = -EACCES;
 			goto out;
 		}
+		bpf_map_write_active_inc(map);
 	}
+out:
+	mutex_unlock(&map->freeze_mutex);
+	if (err)
+		return err;
 
 	/* set default open/close callbacks */
 	vma->vm_ops = &bpf_map_default_vmops;
@@ -923,13 +928,11 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 		vm_flags_clear(vma, VM_MAYWRITE);
 
 	err = map->ops->map_mmap(map, vma);
-	if (err)
-		goto out;
+	if (err) {
+		if (vma->vm_flags & VM_WRITE)
+			bpf_map_write_active_dec(map);
+	}
 
-	if (vma->vm_flags & VM_WRITE)
-		bpf_map_write_active_inc(map);
-out:
-	mutex_unlock(&map->freeze_mutex);
 	return err;
 }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiayuan Chen mrpre@163.com
[ Upstream commit 0532a79efd68a4d9686b0385e4993af4b130ff82 ]
Added a new read_sock handler, allowing users to customize read operations instead of relying on the native socket's read_sock.
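The dispatch rule this adds (prefer the user-supplied callback, fall back to the socket's own read_sock, fail if neither exists) can be sketched with plain function pointers. The types and names below are hypothetical simplifications; the real callback takes a strparser, a read descriptor, and a recv actor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*read_fn_sketch)(void);

struct strp_sketch {
	read_fn_sketch cb_read_sock;	/* models strp->cb.read_sock, may be NULL */
	read_fn_sketch sock_read_sock;	/* models sock->ops->read_sock, may be NULL */
};

/* Hypothetical readers that just identify themselves. */
static int custom_reader(void) { return 1; }
static int native_reader(void) { return 2; }

static int strp_read_sock_sketch(const struct strp_sketch *s)
{
	/* No way to read at all: same error the patch returns. */
	if (!s->cb_read_sock && !s->sock_read_sock)
		return -EBUSY;

	/* A custom handler takes precedence over the native one. */
	if (s->cb_read_sock)
		return s->cb_read_sock();
	return s->sock_read_sock();
}
```

Note that with this rule a parser can work on a socket whose ops lack read_sock, as long as it supplies its own handler.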
Signed-off-by: Jiayuan Chen mrpre@163.com
Signed-off-by: Martin KaFai Lau martin.lau@kernel.org
Reviewed-by: Jakub Sitnicki jakub@cloudflare.com
Acked-by: John Fastabend john.fastabend@gmail.com
Link: https://patch.msgid.link/20250122100917.49845-2-mrpre@163.com
Stable-dep-of: 36b62df5683c ("bpf: Fix wrong copied_seq calculation")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 Documentation/networking/strparser.rst |  9 ++++++++-
 include/net/strparser.h                |  2 ++
 net/strparser/strparser.c              | 11 +++++++++--
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/strparser.rst b/Documentation/networking/strparser.rst
index 6cab1f74ae05a..7f623d1db72aa 100644
--- a/Documentation/networking/strparser.rst
+++ b/Documentation/networking/strparser.rst
@@ -112,7 +112,7 @@ Functions
 Callbacks
 =========
 
-There are six callbacks:
+There are seven callbacks:
 
     ::
 
@@ -182,6 +182,13 @@ There are six callbacks:
     the length of the message. skb->len - offset may be greater
     then full_len since strparser does not trim the skb.
 
+    ::
+
+	int (*read_sock)(struct strparser *strp, read_descriptor_t *desc,
+		     sk_read_actor_t recv_actor);
+
+    The read_sock callback is used by strparser instead of
+    sock->ops->read_sock, if provided.
     ::
 
 	int (*read_sock_done)(struct strparser *strp, int err);
diff --git a/include/net/strparser.h b/include/net/strparser.h
index 41e2ce9e9e10f..0a83010b3a64a 100644
--- a/include/net/strparser.h
+++ b/include/net/strparser.h
@@ -43,6 +43,8 @@ struct strparser;
 struct strp_callbacks {
 	int (*parse_msg)(struct strparser *strp, struct sk_buff *skb);
 	void (*rcv_msg)(struct strparser *strp, struct sk_buff *skb);
+	int (*read_sock)(struct strparser *strp, read_descriptor_t *desc,
+			 sk_read_actor_t recv_actor);
 	int (*read_sock_done)(struct strparser *strp, int err);
 	void (*abort_parser)(struct strparser *strp, int err);
 	void (*lock)(struct strparser *strp);
diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index 8299ceb3e3739..95696f42647ec 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -347,7 +347,10 @@ static int strp_read_sock(struct strparser *strp)
 	struct socket *sock = strp->sk->sk_socket;
 	read_descriptor_t desc;
 
-	if (unlikely(!sock || !sock->ops || !sock->ops->read_sock))
+	if (unlikely(!sock || !sock->ops))
+		return -EBUSY;
+
+	if (unlikely(!strp->cb.read_sock && !sock->ops->read_sock))
 		return -EBUSY;
 
 	desc.arg.data = strp;
@@ -355,7 +358,10 @@ static int strp_read_sock(struct strparser *strp)
 	desc.count = 1; /* give more than one skb per call */
 
 	/* sk should be locked here, so okay to do read_sock */
-	sock->ops->read_sock(strp->sk, &desc, strp_recv);
+	if (strp->cb.read_sock)
+		strp->cb.read_sock(strp, &desc, strp_recv);
+	else
+		sock->ops->read_sock(strp->sk, &desc, strp_recv);
 
 	desc.error = strp->cb.read_sock_done(strp, desc.error);
 
@@ -468,6 +474,7 @@ int strp_init(struct strparser *strp, struct sock *sk,
 	strp->cb.unlock = cb->unlock ? : strp_sock_unlock;
 	strp->cb.rcv_msg = cb->rcv_msg;
 	strp->cb.parse_msg = cb->parse_msg;
+	strp->cb.read_sock = cb->read_sock;
 	strp->cb.read_sock_done = cb->read_sock_done ? : default_read_sock_done;
 	strp->cb.abort_parser = cb->abort_parser ? : strp_abort_strp;
 
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiayuan Chen mrpre@163.com
[ Upstream commit 36b62df5683c315ba58c950f1a9c771c796c30ec ]
'sk->copied_seq' was updated in the tcp_eat_skb() function when the action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS, the update logic for 'sk->copied_seq' was moved to tcp_bpf_recvmsg_parser() to ensure the accuracy of the 'fionread' feature.
It works for a single stream_verdict scenario, as it also modified sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb to remove updating 'sk->copied_seq'.
However, for programs where both stream_parser and stream_verdict are active (strparser purpose), tcp_read_sock() was used instead of tcp_read_skb() (sk_data_ready->strp_data_ready->tcp_read_sock). tcp_read_sock() now still updates 'sk->copied_seq', leading to duplicate updates.
In summary, for strparser + SK_PASS, copied_seq is redundantly calculated in both tcp_read_sock() and tcp_bpf_recvmsg_parser().
The issue causes incorrect copied_seq calculations, which prevent correct data reads from the recv() interface in user-land.
We do not want to add new proto_ops to implement a new version of tcp_read_sock, as this would introduce code complexity [1].
We could have added noack and copied_seq to desc, and then called ops->read_sock. However, unfortunately, other modules didn’t fully initialize desc to zero. So, for now, we are directly calling tcp_read_sock_noack() in tcp_bpf.c.
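The sequence arithmetic that tcp_bpf_strp_read_sock() performs in this patch can be sketched with plain 32-bit counters. The struct and function names below are hypothetical, and the split of consumed bytes into "redirected" and "passed to the ingress queue" is supplied by the caller here rather than by a BPF verdict.

```c
#include <assert.h>
#include <stdint.h>

struct psock_sketch {
	uint32_t copied_seq;	/* parser's private read cursor */
	uint32_t ingress_bytes;	/* bytes queued locally (SK_PASS) in this read */
};

struct read_result_sketch {
	uint32_t tp_copied_seq;	/* value the TCP socket's copied_seq receives */
	uint32_t bytes_to_ack;	/* bytes released to TCP right away */
};

/*
 * One read pass: "consumed" bytes were parsed, of which "passed" were kept
 * on the local ingress queue and must not be counted until recvmsg().
 */
static struct read_result_sketch
strp_read_sketch(struct psock_sketch *psock, uint32_t consumed, uint32_t passed)
{
	struct read_result_sketch r;

	psock->ingress_bytes = 0;		/* reset per read, as in the patch */
	psock->copied_seq += consumed;		/* advance the private cursor */
	psock->ingress_bytes += passed;		/* what the enqueue path adds */

	r.tp_copied_seq = psock->copied_seq - psock->ingress_bytes;
	r.bytes_to_ack = consumed - psock->ingress_bytes;
	return r;
}
```

Unsigned 32-bit arithmetic keeps the subtraction correct across sequence-number wraparound, so only redirected bytes advance the socket's copied_seq while passed bytes wait for tcp_bpf_recvmsg_parser().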
[1]: https://lore.kernel.org/bpf/20241218053408.437295-1-mrpre@163.com
Fixes: e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq")
Suggested-by: Jakub Sitnicki jakub@cloudflare.com
Signed-off-by: Jiayuan Chen mrpre@163.com
Signed-off-by: Martin KaFai Lau martin.lau@kernel.org
Reviewed-by: Jakub Sitnicki jakub@cloudflare.com
Acked-by: John Fastabend john.fastabend@gmail.com
Link: https://patch.msgid.link/20250122100917.49845-3-mrpre@163.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/linux/skmsg.h |  2 ++
 include/net/tcp.h     |  8 ++++++++
 net/core/skmsg.c      |  7 +++++++
 net/ipv4/tcp.c        | 29 ++++++++++++++++++++++++-----
 net/ipv4/tcp_bpf.c    | 36 ++++++++++++++++++++++++++++++++++++
 5 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 6ccfd9236387c..32bbebf5b71e3 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -87,6 +87,8 @@ struct sk_psock {
 	struct sk_psock_progs		progs;
 #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
 	struct strparser		strp;
+	u32				copied_seq;
+	u32				ingress_bytes;
 #endif
 	struct sk_buff_head		ingress_skb;
 	struct list_head		ingress_msg;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 78c755414fa87..a6def0aab3ed3 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -690,6 +690,9 @@ void tcp_get_info(struct sock *, struct tcp_info *);
 /* Read 'sendfile()'-style from a TCP socket */
 int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 		  sk_read_actor_t recv_actor);
+int tcp_read_sock_noack(struct sock *sk, read_descriptor_t *desc,
+			sk_read_actor_t recv_actor, bool noack,
+			u32 *copied_seq);
 int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor);
 struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off);
 void tcp_read_done(struct sock *sk, size_t len);
@@ -2404,6 +2407,11 @@ struct sk_psock;
 #ifdef CONFIG_BPF_SYSCALL
 int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore);
 void tcp_bpf_clone(const struct sock *sk, struct sock *newsk);
+#ifdef CONFIG_BPF_STREAM_PARSER
+struct strparser;
+int tcp_bpf_strp_read_sock(struct strparser *strp, read_descriptor_t *desc,
+			   sk_read_actor_t recv_actor);
+#endif /* CONFIG_BPF_STREAM_PARSER */
 #endif /* CONFIG_BPF_SYSCALL */
 
 #ifdef CONFIG_INET
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 902098e221b39..b9b941c487c8a 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -548,6 +548,9 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
 		return num_sge;
 	}
 
+#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
+	psock->ingress_bytes += len;
+#endif
 	copied = len;
 	msg->sg.start = 0;
 	msg->sg.size = copied;
@@ -1143,6 +1146,10 @@ int sk_psock_init_strp(struct sock *sk, struct sk_psock *psock)
 	if (!ret)
 		sk_psock_set_state(psock, SK_PSOCK_RX_STRP_ENABLED);
 
+	if (sk_is_tcp(sk)) {
+		psock->strp.cb.read_sock = tcp_bpf_strp_read_sock;
+		psock->copied_seq = tcp_sk(sk)->copied_seq;
+	}
 	return ret;
 }
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5e6615f69f175..7ad82be40f348 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1553,12 +1553,13 @@ EXPORT_SYMBOL(tcp_recv_skb);
  * or for 'peeking' the socket using this routine
  * (although both would be easy to implement).
  */
-int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
-		  sk_read_actor_t recv_actor)
+static int __tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
+			   sk_read_actor_t recv_actor, bool noack,
+			   u32 *copied_seq)
 {
 	struct sk_buff *skb;
 	struct tcp_sock *tp = tcp_sk(sk);
-	u32 seq = tp->copied_seq;
+	u32 seq = *copied_seq;
 	u32 offset;
 	int copied = 0;
 
@@ -1612,9 +1613,12 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 		tcp_eat_recv_skb(sk, skb);
 		if (!desc->count)
 			break;
-		WRITE_ONCE(tp->copied_seq, seq);
+		WRITE_ONCE(*copied_seq, seq);
 	}
-	WRITE_ONCE(tp->copied_seq, seq);
+	WRITE_ONCE(*copied_seq, seq);
+
+	if (noack)
+		goto out;
 
 	tcp_rcv_space_adjust(sk);
 
@@ -1623,10 +1627,25 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 		tcp_recv_skb(sk, seq, &offset);
 		tcp_cleanup_rbuf(sk, copied);
 	}
+out:
 	return copied;
 }
+
+int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
+		  sk_read_actor_t recv_actor)
+{
+	return __tcp_read_sock(sk, desc, recv_actor, false,
+			       &tcp_sk(sk)->copied_seq);
+}
 EXPORT_SYMBOL(tcp_read_sock);
 
+int tcp_read_sock_noack(struct sock *sk, read_descriptor_t *desc,
+			sk_read_actor_t recv_actor, bool noack,
+			u32 *copied_seq)
+{
+	return __tcp_read_sock(sk, desc, recv_actor, noack, copied_seq);
+}
+
 int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 {
 	struct sk_buff *skb;
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index f882054fae5ee..5312237e80409 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -646,6 +646,42 @@ static int tcp_bpf_assert_proto_ops(struct proto *ops)
 	       ops->sendmsg == tcp_sendmsg ? 0 : -ENOTSUPP;
 }
 
+#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
+int tcp_bpf_strp_read_sock(struct strparser *strp, read_descriptor_t *desc,
+			   sk_read_actor_t recv_actor)
+{
+	struct sock *sk = strp->sk;
+	struct sk_psock *psock;
+	struct tcp_sock *tp;
+	int copied = 0;
+
+	tp = tcp_sk(sk);
+	rcu_read_lock();
+	psock = sk_psock(sk);
+	if (WARN_ON_ONCE(!psock)) {
+		desc->error = -EINVAL;
+		goto out;
+	}
+
+	psock->ingress_bytes = 0;
+	copied = tcp_read_sock_noack(sk, desc, recv_actor, true,
+				     &psock->copied_seq);
+	if (copied < 0)
+		goto out;
+	/* recv_actor may redirect skb to another socket (SK_REDIRECT) or
+	 * just put skb into ingress queue of current socket (SK_PASS).
+	 * For SK_REDIRECT, we need to ack the frame immediately but for
+	 * SK_PASS, we want to delay the ack until tcp_bpf_recvmsg_parser().
+	 */
+	tp->copied_seq = psock->copied_seq - psock->ingress_bytes;
+	tcp_rcv_space_adjust(sk);
+	__tcp_cleanup_rbuf(sk, copied - psock->ingress_bytes);
+out:
+	rcu_read_unlock();
+	return copied;
+}
+#endif /* CONFIG_BPF_STREAM_PARSER */
+
 int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
 {
 	int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiayuan Chen mrpre@163.com
[ Upstream commit 5459cce6bf49e72ee29be21865869c2ac42419f5 ]
Currently, only TCP supports strparser, but sockmap doesn't intercept non-TCP connections to attach strparser. For example, with UDP, although the read/write handlers are replaced, strparser is not executed due to the lack of a read_sock operation.
Furthermore, in udp_bpf_recvmsg(), it checks whether the psock has data, and if not, it falls back to the native UDP read interface, making UDP + strparser appear to read correctly. According to its commit history, this behavior is unexpected.
Moreover, since UDP lacks the concept of streams, we intercept it directly.
Fixes: 1fa1fe8ff161 ("bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0") Signed-off-by: Jiayuan Chen mrpre@163.com Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Acked-by: Jakub Sitnicki jakub@cloudflare.com Acked-by: John Fastabend john.fastabend@gmail.com Link: https://patch.msgid.link/20250122100917.49845-4-mrpre@163.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock_map.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c index dcc0f31a17a8d..3a53b6a0e76e2 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -300,7 +300,10 @@ static int sock_map_link(struct bpf_map *map, struct sock *sk)
write_lock_bh(&sk->sk_callback_lock); if (stream_parser && stream_verdict && !psock->saved_data_ready) { - ret = sk_psock_init_strp(sk, psock); + if (sk_is_tcp(sk)) + ret = sk_psock_init_strp(sk, psock); + else + ret = -EOPNOTSUPP; if (ret) { write_unlock_bh(&sk->sk_callback_lock); sk_psock_put(sk, psock);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Abel Wu wuyun.abel@bytedance.com
[ Upstream commit c78f4afbd962f43a3989f45f3ca04300252b19b5 ]
The following commit bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]") first introduced deadlock prevention for fentry/fexit programs attaching on bpf_task_storage helpers. That commit also employed the logic in map free path in its v6 version.
Later bpf_cgrp_storage was first introduced in c4bcfb38a95e ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs") which faces the same issue as bpf_task_storage, instead of its busy counter, NULL was passed to bpf_local_storage_map_free() which opened a window to cause deadlock:
<TASK> (acquiring local_storage->lock) _raw_spin_lock_irqsave+0x3d/0x50 bpf_local_storage_update+0xd1/0x460 bpf_cgrp_storage_get+0x109/0x130 bpf_prog_a4d4a370ba857314_cgrp_ptr+0x139/0x170 ? __bpf_prog_enter_recur+0x16/0x80 bpf_trampoline_6442485186+0x43/0xa4 cgroup_storage_ptr+0x9/0x20 (holding local_storage->lock) bpf_selem_unlink_storage_nolock.constprop.0+0x135/0x160 bpf_selem_unlink_storage+0x6f/0x110 bpf_local_storage_map_free+0xa2/0x110 bpf_map_free_deferred+0x5b/0x90 process_one_work+0x17c/0x390 worker_thread+0x251/0x360 kthread+0xd2/0x100 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1a/0x30 </TASK>
Progs: - A: SEC("fentry/cgroup_storage_ptr") - cgid (BPF_MAP_TYPE_HASH) Record the id of the cgroup the current task belonging to in this hash map, using the address of the cgroup as the map key. - cgrpa (BPF_MAP_TYPE_CGRP_STORAGE) If current task is a kworker, lookup the above hash map using function parameter @owner as the key to get its corresponding cgroup id which is then used to get a trusted pointer to the cgroup through bpf_cgroup_from_id(). This trusted pointer can then be passed to bpf_cgrp_storage_get() to finally trigger the deadlock issue. - B: SEC("tp_btf/sys_enter") - cgrpb (BPF_MAP_TYPE_CGRP_STORAGE) The only purpose of this prog is to fill Prog A's hash map by calling bpf_cgrp_storage_get() for as many userspace tasks as possible.
Steps to reproduce: - Run A; - while (true) { Run B; Destroy B; }
Fix this issue by passing its busy counter to the free procedure so it can be properly incremented before storage/smap locking.
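For illustration only (not part of the patch): a minimal single-threaded sketch of why bumping the busy counter before taking the lock avoids the recursion. All names here (busy_counter, lock_held, helper_get, free_would_deadlock) are hypothetical stand-ins, and a plain int stands in for the per-CPU counter; the real helpers and locking are more involved.

```c
#include <stdbool.h>

static int busy_counter;        /* stand-in for the per-CPU busy counter */
static bool lock_held;          /* stand-in for local_storage->lock */
static bool deadlock_detected;  /* set if the lock would be taken twice */

/* Models a storage helper called from a tracing program. */
static bool helper_get(void)
{
	if (++busy_counter != 1) {
		/* Someone on this CPU is already inside the storage code. */
		busy_counter--;
		return false;
	}
	if (lock_held)
		deadlock_detected = true;  /* real code would spin forever here */
	busy_counter--;
	return !deadlock_detected;
}

/* Models the map free path; the tracing program fires while the lock is held. */
bool free_would_deadlock(bool use_busy_counter)
{
	busy_counter = 0;
	lock_held = false;
	deadlock_detected = false;

	if (use_busy_counter)
		busy_counter++;
	lock_held = true;
	(void)helper_get();
	lock_held = false;
	if (use_busy_counter)
		busy_counter--;

	return deadlock_detected;
}
```

In this model, free_would_deadlock(false) corresponds to passing NULL for the counter, and free_would_deadlock(true) to passing the busy counter as the fix does.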
Fixes: c4bcfb38a95e ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs") Signed-off-by: Abel Wu wuyun.abel@bytedance.com Acked-by: Martin KaFai Lau martin.lau@kernel.org Link: https://lore.kernel.org/r/20241221061018.37717-1-wuyun.abel@bytedance.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/bpf_cgrp_storage.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c index d44fe8dd97329..ee1c7b77096e7 100644 --- a/kernel/bpf/bpf_cgrp_storage.c +++ b/kernel/bpf/bpf_cgrp_storage.c @@ -154,7 +154,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
static void cgroup_storage_map_free(struct bpf_map *map) { - bpf_local_storage_map_free(map, &cgroup_cache, NULL); + bpf_local_storage_map_free(map, &cgroup_cache, &bpf_cgrp_storage_busy); }
/* *gfp_flags* is a hidden argument provided by the verifier */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrey Vatoropin a.vatoropin@crpt.ru
[ Upstream commit 3fb3cb4350befc4f901c54e0cb4a2a47b1302e08 ]
Size of variable sd_gain equals four bytes - DA9150_QIF_SD_GAIN_SIZE. Size of variable shunt_val equals two bytes - DA9150_QIF_SHUNT_VAL_SIZE.
The expression sd_gain * shunt_val is currently being evaluated using 32-bit arithmetic. So during the multiplication an overflow may occur.
As the value of type 'u64' is used as storage for the eventual result, put ULL variable at the first position of each expression in order to give the compiler complete information about the proper arithmetic to use. According to C99 the guaranteed width for a variable of type 'unsigned long long' >= 64 bits.
Remove the explicit cast to u64 as it is meaningless.
Just for the sake of consistency, perform the similar trick with another expression concerning 'iavg'.
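For illustration only (not part of the patch): a small userspace sketch of the arithmetic issue, assuming a 32-bit int and using uint32_t/uint16_t as stand-ins for the driver's variable sizes. The function names are hypothetical.

```c
#include <stdint.h>

/* Old form: sd_gain * shunt_val is evaluated in 32 bits and may wrap
 * before the result is widened by the multiplication with 65536ULL. */
uint64_t div_narrow(uint32_t sd_gain, uint16_t shunt_val)
{
	return (uint64_t)(sd_gain * shunt_val * 65536ULL);
}

/* New form: the ULL operand comes first, so every multiplication
 * is carried out in (at least) 64 bits. */
uint64_t div_wide(uint32_t sd_gain, uint16_t shunt_val)
{
	return 65536ULL * sd_gain * shunt_val;
}
```

With sd_gain = 100000 and shunt_val = 50000 the 32-bit product wraps (5000000000 does not fit in 32 bits), so the two functions disagree; for small inputs they return the same value.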
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: a419b4fd9138 ("power: Add support for DA9150 Fuel-Gauge") Signed-off-by: Andrey Vatoropin a.vatoropin@crpt.ru Link: https://lore.kernel.org/r/20250130090030.53422-1-a.vatoropin@crpt.ru Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/power/supply/da9150-fg.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/power/supply/da9150-fg.c b/drivers/power/supply/da9150-fg.c index 652c1f213af1c..4f28ef1bba1a3 100644 --- a/drivers/power/supply/da9150-fg.c +++ b/drivers/power/supply/da9150-fg.c @@ -247,9 +247,9 @@ static int da9150_fg_current_avg(struct da9150_fg *fg, DA9150_QIF_SD_GAIN_SIZE); da9150_fg_read_sync_end(fg);
- div = (u64) (sd_gain * shunt_val * 65536ULL); + div = 65536ULL * sd_gain * shunt_val; do_div(div, 1000000); - res = (u64) (iavg * 1000000ULL); + res = 1000000ULL * iavg; do_div(res, div);
val->intval = (int) res;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Hildenbrand david@redhat.com
[ Upstream commit b3fefbb30a1691533cb905006b69b2a474660744 ]
If we have to retry the loop, we fail to unlock+put the folio. In that case, make_device_exclusive_range() will keep failing because it cannot grab the folio lock, and we may even return from the function with the folio still locked and referenced, so make_device_exclusive_range() effectively never succeeds.
While at it, convert the other unlock+put to use a folio as well.
This was found by code inspection.
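For illustration only (not part of the patch): a toy userspace model of the retry loop, where struct fake_folio, grab(), release() and retry_loop() are hypothetical stand-ins for the folio lock/reference handling. It only shows why the resource must be dropped before looping again.

```c
struct fake_folio {
	int locked;
	int refs;
};

/* Fails if the folio is already locked, like a trylock. */
static int grab(struct fake_folio *f)
{
	if (f->locked)
		return -1;
	f->locked = 1;
	f->refs++;
	return 0;
}

static void release(struct fake_folio *f)
{
	f->locked = 0;
	f->refs--;
}

/* retries_needed models how often the notifier sequence check fails. */
int retry_loop(struct fake_folio *f, int retries_needed, int release_on_retry)
{
	for (int attempt = 0; ; attempt++) {
		if (grab(f))
			return -1;      /* cannot lock: previous attempt leaked it */
		if (attempt >= retries_needed)
			break;          /* no concurrent invalidation, proceed */
		if (release_on_retry)
			release(f);     /* the step the fix adds */
	}
	/* ... mapping would happen here ... */
	release(f);
	return 0;
}
```

Without the release on retry, the second grab() fails and the function returns with the folio still locked and referenced, which mirrors the behaviour described above.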
Fixes: 8f187163eb89 ("nouveau/svm: implement atomic SVM access") Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Alistair Popple apopple@nvidia.com Tested-by: Alistair Popple apopple@nvidia.com Signed-off-by: Danilo Krummrich dakr@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20250124181524.3584236-2-david... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/nouveau/nouveau_svm.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c index ec9f307370fa8..6c71f6738ca51 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.c +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c @@ -593,6 +593,7 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, unsigned long timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT); struct mm_struct *mm = svmm->notifier.mm; + struct folio *folio; struct page *page; unsigned long start = args->p.addr; unsigned long notifier_seq; @@ -619,12 +620,16 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, ret = -EINVAL; goto out; } + folio = page_folio(page);
mutex_lock(&svmm->mutex); if (!mmu_interval_read_retry(&notifier->notifier, notifier_seq)) break; mutex_unlock(&svmm->mutex); + + folio_unlock(folio); + folio_put(folio); }
/* Map the page on the GPU. */ @@ -640,8 +645,8 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, ret = nvif_object_ioctl(&svmm->vmm->vmm.object, args, size, NULL); mutex_unlock(&svmm->mutex);
- unlock_page(page); - put_page(page); + folio_unlock(folio); + folio_put(folio);
out: mmu_interval_notifier_remove(&notifier->notifier);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Clark robdclark@chromium.org
[ Upstream commit 669c285620231786fffe9d87ab432e08a6ed922b ]
If userspace is trying to achieve a timeout of zero, let 'em have it. Only round up if the timeout is greater than zero.
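For illustration only (not part of the patch): a userspace sketch of the intended rounding, with plain nanosecond integers standing in for ktime_t and a hypothetical ns_per_tick parameter standing in for NSEC_PER_SEC / HZ.

```c
#include <limits.h>
#include <stdint.h>

/* Expired or zero timeouts map to 0 ticks; anything still in the future
 * is rounded up to at least one tick and capped at INT_MAX. */
unsigned long timeout_to_ticks(int64_t timeout_ns, int64_t now_ns,
			       int64_t ns_per_tick)
{
	if (timeout_ns <= now_ns)
		return 0;

	int64_t ticks = (timeout_ns - now_ns) / ns_per_tick;

	if (ticks < 1)
		ticks = 1;
	if (ticks > INT_MAX)
		ticks = INT_MAX;
	return (unsigned long)ticks;
}
```

The early return is what lets a zero timeout stay zero; only a strictly positive remainder goes through the clamp.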
Fixes: 4969bccd5f4e ("drm/msm: Avoid rounding down to zero jiffies") Signed-off-by: Rob Clark robdclark@chromium.org Reviewed-by: Akhil P Oommen quic_akhilpo@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/632264/ Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/msm_drv.h | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h index 48e1a8c6942c9..223bf904235a8 100644 --- a/drivers/gpu/drm/msm/msm_drv.h +++ b/drivers/gpu/drm/msm/msm_drv.h @@ -533,15 +533,12 @@ static inline int align_pitch(int width, int bpp) static inline unsigned long timeout_to_jiffies(const ktime_t *timeout) { ktime_t now = ktime_get(); - s64 remaining_jiffies;
- if (ktime_compare(*timeout, now) < 0) { - remaining_jiffies = 0; - } else { - ktime_t rem = ktime_sub(*timeout, now); - remaining_jiffies = ktime_divns(rem, NSEC_PER_SEC / HZ); - } + if (ktime_compare(*timeout, now) <= 0) + return 0;
+ ktime_t rem = ktime_sub(*timeout, now); + s64 remaining_jiffies = ktime_divns(rem, NSEC_PER_SEC / HZ); return clamp(remaining_jiffies, 1LL, (s64)INT_MAX); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Caleb Sander Mateos csander@purestorage.com
[ Upstream commit 487a3ea7b1b8ba2ca7d2c2bb3c3594dc360d6261 ]
nvme_validate_passthru_nsid() logs an err message whose format string is split over 2 lines. There is a missing space between the two pieces, resulting in log lines like "... does not match nsid (1)of namespace". Add the missing space between ")" and "of". Also combine the format string pieces onto a single line to make the err message easier to grep.
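For illustration only (not part of the patch): adjacent string literals in C are concatenated with nothing in between, which is how the space went missing. The helper names below are hypothetical and use snprintf() instead of dev_err().

```c
#include <stdio.h>
#include <string.h>

/* Split literal without a separating space, as in the old code. */
int format_split(char *buf, size_t len, unsigned int cmd_nsid,
		 unsigned int ns_nsid)
{
	return snprintf(buf, len,
			"nsid (%u) in cmd does not match nsid (%u)"
			"of namespace", cmd_nsid, ns_nsid);
}

/* Single literal, as in the fixed code; also easier to grep for. */
int format_joined(char *buf, size_t len, unsigned int cmd_nsid,
		  unsigned int ns_nsid)
{
	return snprintf(buf, len,
			"nsid (%u) in cmd does not match nsid (%u) of namespace",
			cmd_nsid, ns_nsid);
}

/* Returns nonzero if the message shows the ")of" symptom. */
int has_missing_space(const char *msg)
{
	return strstr(msg, ")of") != NULL;
}
```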
Fixes: e7d4b5493a2d ("nvme: factor out a nvme_validate_passthru_nsid helper") Signed-off-by: Caleb Sander Mateos csander@purestorage.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/ioctl.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c index 19a7f0160618d..4ce31f9f06947 100644 --- a/drivers/nvme/host/ioctl.c +++ b/drivers/nvme/host/ioctl.c @@ -336,8 +336,7 @@ static bool nvme_validate_passthru_nsid(struct nvme_ctrl *ctrl, { if (ns && nsid != ns->head->ns_id) { dev_err(ctrl->device, - "%s: nsid (%u) in cmd does not match nsid (%u)" - "of namespace\n", + "%s: nsid (%u) in cmd does not match nsid (%u) of namespace\n", current->comm, nsid, ns->head->ns_id); return false; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yan Zhai yan@cloudflare.com
[ Upstream commit 5644c6b50ffee0a56c1e01430a8c88e34decb120 ]
The generic_map_lookup_batch currently returns EINTR if it fails with ENOENT and retries several times on bpf_map_copy_value. The next batch would start from the same location, presuming it's a transient issue. This is incorrect if a map can actually have "holes", i.e. "get_next_key" can return a key that does not point to a valid value. At least the array of maps type may legitimately contain such holes. Right now, when these holes show up, generic batch lookup cannot proceed any more. It will always fail with EINTR errors.
Rather, do not retry in generic_map_lookup_batch. If it finds a non existing element, skip to the next key. This simple solution comes with a price that transient errors may not be recovered, and the iteration might cycle back to the first key under parallel deletion. For example, Hou Tao houtao@huaweicloud.com pointed out the following scenario:
For LPM trie map: (1) ->map_get_next_key(map, prev_key, key) returns a valid key
(2) bpf_map_copy_value() returns -ENOENT It means the key must be deleted concurrently.
(3) goto next_key It swaps the prev_key and key
(4) ->map_get_next_key(map, prev_key, key) again prev_key points to a non-existing key, for LPM trie it will treat just like prev_key=NULL case, the returned key will be duplicated.
With the retry logic, the iteration can continue to the key next to the deleted one. But if we directly skip to the next key, the iteration loop would restart from the first key for the lpm_trie type.
However, not all races may be recovered. For example, if current key is deleted after instead of before bpf_map_copy_value, or if the prev_key also gets deleted, then the loop will still restart from the first key for lpm_trie anyway. For generic lookup it might be better to stay simple, i.e. just skip to the next key. To guarantee that the output keys are not duplicated, it is better to implement map type specific batch operations, which can properly lock the trie and synchronize with concurrent mutators.
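For illustration only (not part of the patch): a toy userspace model of a batch lookup over a map with holes, where a missing element is skipped instead of retried. struct slot, copy_value() and lookup_batch_skip() are hypothetical names; the return value -1 stands in for -ENOENT.

```c
#include <stdbool.h>
#include <stddef.h>

struct slot {
	bool present;   /* false models a "hole" in the map */
	int value;
};

/* Returns 0 and stores the value, or -1 if the key has no value. */
static int copy_value(const struct slot *map, size_t key, int *value)
{
	if (!map[key].present)
		return -1;
	*value = map[key].value;
	return 0;
}

/* Copies up to max_count elements, skipping holes; returns the count. */
size_t lookup_batch_skip(const struct slot *map, size_t nslots,
			 size_t *keys_out, int *vals_out, size_t max_count)
{
	size_t cp = 0;

	for (size_t key = 0; key < nslots && cp < max_count; key++) {
		int v;

		if (copy_value(map, key, &v) != 0)
			continue;       /* hole: go to the next key, no retry */
		keys_out[cp] = key;
		vals_out[cp] = v;
		cp++;
	}
	return cp;
}
```

As in the patch, the copied-element counter only advances for keys that actually had a value, so holes do not consume space in the output buffers.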
Fixes: cb4d03ab499d ("bpf: Add generic support for lookup batch op") Closes: https://lore.kernel.org/bpf/Z6JXtA1M5jAZx8xD@debian.debian/ Signed-off-by: Yan Zhai yan@cloudflare.com Acked-by: Hou Tao houtao1@huawei.com Link: https://lore.kernel.org/r/85618439eea75930630685c467ccefeac0942e2b.173917159... Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/syscall.c | 18 +++++------------- 1 file changed, 5 insertions(+), 13 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 9f791b6b09edc..f089a61630111 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1807,8 +1807,6 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file, return err; }
-#define MAP_LOOKUP_RETRIES 3 - int generic_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) @@ -1818,8 +1816,8 @@ int generic_map_lookup_batch(struct bpf_map *map, void __user *values = u64_to_user_ptr(attr->batch.values); void __user *keys = u64_to_user_ptr(attr->batch.keys); void *buf, *buf_prevkey, *prev_key, *key, *value; - int err, retry = MAP_LOOKUP_RETRIES; u32 value_size, cp, max_count; + int err;
if (attr->batch.elem_flags & ~BPF_F_LOCK) return -EINVAL; @@ -1865,14 +1863,8 @@ int generic_map_lookup_batch(struct bpf_map *map, err = bpf_map_copy_value(map, key, value, attr->batch.elem_flags);
- if (err == -ENOENT) { - if (retry) { - retry--; - continue; - } - err = -EINTR; - break; - } + if (err == -ENOENT) + goto next_key;
if (err) goto free_buf; @@ -1887,12 +1879,12 @@ int generic_map_lookup_batch(struct bpf_map *map, goto free_buf; }
+ cp++; +next_key: if (!prev_key) prev_key = buf_prevkey;
swap(prev_key, key); - retry = MAP_LOOKUP_RETRIES; - cp++; cond_resched(); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aaron Kling webgeek1234@gmail.com
[ Upstream commit 3dbc0215e3c502a9f3221576da0fdc9847fb9721 ]
Most kernel configs enable multiple Tegra SoC generations, causing this typo to go unnoticed. But in the case where a kernel config is strictly for Tegra186, this is a problem.
Fixes: 989863d7cbe5 ("drm/nouveau/pmu: select implementation based on available firmware") Signed-off-by: Aaron Kling webgeek1234@gmail.com Signed-off-by: Danilo Krummrich dakr@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20250218-nouveau-gm10b-guard-v... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c index a6f410ba60bc9..d393bc540f862 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c @@ -75,7 +75,7 @@ gp10b_pmu_acr = { .bootstrap_multiple_falcons = gp10b_pmu_acr_bootstrap_multiple_falcons, };
-#if IS_ENABLED(CONFIG_ARCH_TEGRA_210_SOC) +#if IS_ENABLED(CONFIG_ARCH_TEGRA_186_SOC) MODULE_FIRMWARE("nvidia/gp10b/pmu/desc.bin"); MODULE_FIRMWARE("nvidia/gp10b/pmu/image.bin"); MODULE_FIRMWARE("nvidia/gp10b/pmu/sig.bin");
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chen-Yu Tsai wenst@chromium.org
[ Upstream commit 26f6e91fa29a58fdc76b47f94f8f6027944a490c ]
Most SoC dtsi files have the display output interfaces disabled by default, and only enabled on boards that utilize them. The MT8183 has it backwards: the display outputs are left enabled by default, and only disabled at the board level.
Reverse the situation for the DSI output so that it follows the normal scheme. For ease of backporting the DPI output is handled in a separate patch.
Fixes: 88ec840270e6 ("arm64: dts: mt8183: Add dsi node") Fixes: 19b6403f1e2a ("arm64: dts: mt8183: add mt8183 pumpkin board") Cc: stable@vger.kernel.org Signed-off-by: Chen-Yu Tsai wenst@chromium.org Reviewed-by: Fei Shao fshao@chromium.org Link: https://lore.kernel.org/r/20241025075630.3917458-2-wenst@chromium.org Signed-off-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/mediatek/mt8183.dtsi | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/arm64/boot/dts/mediatek/mt8183.dtsi b/arch/arm64/boot/dts/mediatek/mt8183.dtsi index d1b6355148620..f3a1b96f1ee4d 100644 --- a/arch/arm64/boot/dts/mediatek/mt8183.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt8183.dtsi @@ -1828,6 +1828,7 @@ dsi0: dsi@14014000 { resets = <&mmsys MT8183_MMSYS_SW0_RST_B_DISP_DSI0>; phys = <&mipi_tx0>; phy-names = "dphy"; + status = "disabled"; };
mutex: mutex@14016000 {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jessica Zhang quic_jesszhan@quicinc.com
commit f063ac6b55df03ed25996bdc84d9e1c50147cfa1 upstream.
Disable pingpong dither in dpu_encoder_helper_phys_cleanup().
This avoids the issue where an encoder unknowingly uses dither after reserving a pingpong block that was previously bound to an encoder that had enabled dither.
Cc: stable@vger.kernel.org Reported-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Closes: https://lore.kernel.org/all/jr7zbj5w7iq4apg3gofuvcwf4r2swzqjk7sshwcdjll4mn6c... Signed-off-by: Jessica Zhang quic_jesszhan@quicinc.com Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Fixes: 3c128638a07d ("drm/msm/dpu: add support for dither block in display") Patchwork: https://patchwork.freedesktop.org/patch/636517/ Link: https://lore.kernel.org/r/20250211-dither-disable-v1-1-ac2cb455f6b9@quicinc.... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c @@ -2075,6 +2075,9 @@ void dpu_encoder_helper_phys_cleanup(str } }
+ if (phys_enc->hw_pp && phys_enc->hw_pp->ops.setup_dither) + phys_enc->hw_pp->ops.setup_dither(phys_enc->hw_pp, NULL); + /* reset the merge 3D HW block */ if (phys_enc->hw_pp && phys_enc->hw_pp->merge_3d) { phys_enc->hw_pp->merge_3d->ops.setup_3d_mode(phys_enc->hw_pp->merge_3d,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit 07fb70d82e0df085980246bf17bc12537588795f upstream.
Any active plane needs to have its crtc included in the atomic state. For planes enabled via uapi that is all handled in the core. But when we use a plane for joiner the uapi code thinks the plane is disabled and therefore doesn't have a crtc. So we need to pull those in by hand. We do it first thing in intel_joiner_add_affected_crtcs() so that any newly added crtc will subsequently pull in all of its joined crtcs as well.
The symptoms from failing to do this are: - duct tape in the form of commit 1d5b09f8daf8 ("drm/i915: Fix NULL ptr deref by checking new_crtc_state") - the plane's hw state will get overwritten by the disabled uapi state if it can't find the uapi counterpart plane in the atomic state from where it should copy the correct state
Cc: stable@vger.kernel.org Reviewed-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20250212164330.16891-2-ville.s... (cherry picked from commit 91077d1deb5374eb8be00fb391710f00e751dc4b) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_display.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
--- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -6141,12 +6141,30 @@ static int intel_async_flip_check_hw(str static int intel_bigjoiner_add_affected_crtcs(struct intel_atomic_state *state) { struct drm_i915_private *i915 = to_i915(state->base.dev); + const struct intel_plane_state *plane_state; struct intel_crtc_state *crtc_state; + struct intel_plane *plane; struct intel_crtc *crtc; u8 affected_pipes = 0; u8 modeset_pipes = 0; int i;
+ /* + * Any plane which is in use by the joiner needs its crtc. + * Pull those in first as this will not have happened yet + * if the plane remains disabled according to uapi. + */ + for_each_new_intel_plane_in_state(state, plane, plane_state, i) { + crtc = to_intel_crtc(plane_state->hw.crtc); + if (!crtc) + continue; + + crtc_state = intel_atomic_get_crtc_state(&state->base, crtc); + if (IS_ERR(crtc_state)) + return PTR_ERR(crtc_state); + } + + /* Now pull in all joined crtcs */ for_each_new_intel_crtc_in_state(state, crtc, crtc_state, i) { affected_pipes |= crtc_state->bigjoiner_pipes; if (intel_crtc_needs_modeset(crtc_state))
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Imre Deak imre.deak@intel.com
commit b9275eabe31e6679ae12c46a4a0a18d622db4570 upstream.
At the end of a 128b/132b link training sequence, the HW expects the transcoder training pattern to be set to TPS2 and from that to normal mode (disabling the training pattern). Transitioning from TPS1 directly to normal mode leaves the transcoder in a stuck state, resulting in page-flip timeouts later in the modeset sequence.
At the moment, in case of a failure during link training, the transcoder may be still set to output the TPS1 pattern. Later the transcoder is then set from TPS1 directly to normal mode in intel_dp_stop_link_train(), leading to modeset failures later as described above. Fix this by setting the training pattern to TPS2, if the link training failed at any point.
The clue in the specification about the above HW behavior is the explicit mention that TPS2 must be set after the link training sequence (and there isn't a similar requirement specified for the 8b/10b link training), see the Bspec links below.
v2: Add bspec aspect/link to the commit log. (Jani)
Bspec: 54128, 65448, 68849 Cc: stable@vger.kernel.org # v5.18+ Cc: Jani Nikula jani.nikula@intel.com Signed-off-by: Imre Deak imre.deak@intel.com Acked-by: Jani Nikula jani.nikula@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20250217223828.1166093-2-imre.... Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com (cherry picked from commit 8b4bbaf8ddc1f68f3ee96a706f65fdb1bcd9d355) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_dp_link_training.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/i915/display/intel_dp_link_training.c +++ b/drivers/gpu/drm/i915/display/intel_dp_link_training.c @@ -1364,7 +1364,7 @@ intel_dp_128b132b_link_train(struct inte
if (wait_for(intel_dp_128b132b_intra_hop(intel_dp, crtc_state) == 0, 500)) { lt_err(intel_dp, DP_PHY_DPRX, "128b/132b intra-hop not clear\n"); - return false; + goto out; }
if (intel_dp_128b132b_lane_eq(intel_dp, crtc_state) && @@ -1376,6 +1376,19 @@ intel_dp_128b132b_link_train(struct inte passed ? "passed" : "failed", crtc_state->port_clock, crtc_state->lane_count);
+out: + /* + * Ensure that the training pattern does get set to TPS2 even in case + * of a failure, as is the case at the end of a passing link training + * and what is expected by the transcoder. Leaving TPS1 set (and + * disabling the link train mode in DP_TP_CTL later from TPS1 directly) + * would result in a stuck transcoder HW state and flip-done timeouts + * later in the modeset sequence. + */ + if (!passed) + intel_dp_program_link_training_pattern(intel_dp, crtc_state, + DP_PHY_DPRX, DP_TRAINING_PATTERN_2); + return passed; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Begunkov asml.silence@gmail.com
commit 1e988c3fe1264708f4f92109203ac5b1d65de50b upstream.
sqe->opcode is used for different tables, make sure we sanitise it against speculations.
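For illustration only (not part of the patch): a userspace sketch of the branch-free masking idea behind array_index_nospec(), modelled on the kernel's generic fallback. It assumes two's-complement longs and an arithmetic right shift of negative values (as on GCC and Clang); the function names are hypothetical, and a plain userspace build gives no actual speculation guarantees.

```c
#include <limits.h>

/* All ones if index < size, zero otherwise, computed without a branch. */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* In-range indexes pass through unchanged; out-of-range ones become 0. */
unsigned long clamp_index(unsigned long index, unsigned long size)
{
	return index & index_mask(index, size);
}
```

The point of the mask is that even if a bounds check is mispredicted, the value used to index the table stays inside the table.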
Cc: stable@vger.kernel.org Fixes: d3656344fea03 ("io_uring: add lookup table for various opcode needs") Signed-off-by: Pavel Begunkov asml.silence@gmail.com Reviewed-by: Li Zetao lizetao1@huawei.com Link: https://lore.kernel.org/r/7eddbf31c8ca0a3947f8ed98271acc2b4349c016.173956840... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/io_uring.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -2176,6 +2176,8 @@ static int io_init_req(struct io_ring_ct req->opcode = 0; return io_init_fail_req(req, -EINVAL); } + opcode = array_index_nospec(opcode, IORING_OP_LAST); + def = &io_issue_defs[opcode]; if (unlikely(sqe_flags & ~SQE_COMMON_FLAGS)) { /* enforce forwards compatibility on users */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sumit Garg sumit.garg@linaro.org
commit 70b0d6b0a199c5a3ee6c72f5e61681ed6f759612 upstream.
OP-TEE supplicant is a user-space daemon and it's possible for it to be hung or crashed or killed in the middle of processing an OP-TEE RPC call. It becomes more complicated when there is incorrect shutdown ordering of the supplicant process vs the OP-TEE client application which can eventually lead to system hang-up waiting for the closure of the client application.
Allow the client process waiting in kernel for supplicant response to be killed rather than indefinitely waiting in an unkillable state. Also, a normal uninterruptible wait should not have resulted in the hung-task watchdog getting triggered, but the endless loop would.
This fixes issues observed during system reboot/shutdown when the supplicant hung for some reason or was crashed/killed, which led to the client getting hung in an unkillable state. That in turn left the system in a hung state requiring a hard power off/on to recover.
Fixes: 4fb0a5eb364d ("tee: add OP-TEE driver") Suggested-by: Arnd Bergmann arnd@arndb.de Cc: stable@vger.kernel.org Signed-off-by: Sumit Garg sumit.garg@linaro.org Reviewed-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Jens Wiklander jens.wiklander@linaro.org Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tee/optee/supp.c | 35 ++++++++--------------------------- 1 file changed, 8 insertions(+), 27 deletions(-)
--- a/drivers/tee/optee/supp.c +++ b/drivers/tee/optee/supp.c @@ -80,7 +80,6 @@ u32 optee_supp_thrd_req(struct tee_conte struct optee *optee = tee_get_drvdata(ctx->teedev); struct optee_supp *supp = &optee->supp; struct optee_supp_req *req; - bool interruptable; u32 ret;
/* @@ -111,36 +110,18 @@ u32 optee_supp_thrd_req(struct tee_conte /* * Wait for supplicant to process and return result, once we've * returned from wait_for_completion(&req->c) successfully we have - * exclusive access again. + * exclusive access again. Allow the wait to be killable such that + * the wait doesn't turn into an indefinite state if the supplicant + * gets hung for some reason. */ - while (wait_for_completion_interruptible(&req->c)) { + if (wait_for_completion_killable(&req->c)) { mutex_lock(&supp->mutex); - interruptable = !supp->ctx; - if (interruptable) { - /* - * There's no supplicant available and since the - * supp->mutex currently is held none can - * become available until the mutex released - * again. - * - * Interrupting an RPC to supplicant is only - * allowed as a way of slightly improving the user - * experience in case the supplicant hasn't been - * started yet. During normal operation the supplicant - * will serve all requests in a timely manner and - * interrupting then wouldn't make sense. - */ - if (req->in_queue) { - list_del(&req->link); - req->in_queue = false; - } + if (req->in_queue) { + list_del(&req->link); + req->in_queue = false; } mutex_unlock(&supp->mutex); - - if (interruptable) { - req->ret = TEEC_ERROR_COMMUNICATION; - break; - } + req->ret = TEEC_ERROR_COMMUNICATION; }
ret = req->ret;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gavrilov Ilia Ilia.Gavrilov@infotecs.ru
commit 07b598c0e6f06a0f254c88dafb4ad50f8a8c6eea upstream.
Syzkaller reports the following bug:
BUG: spinlock bad magic on CPU#1, syz-executor.0/7995 lock: 0xffff88805303f3e0, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 CPU: 1 PID: 7995 Comm: syz-executor.0 Tainted: G E 5.10.209+ #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x119/0x179 lib/dump_stack.c:118 debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline] do_raw_spin_lock+0x1f6/0x270 kernel/locking/spinlock_debug.c:112 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:117 [inline] _raw_spin_lock_irqsave+0x50/0x70 kernel/locking/spinlock.c:159 reset_per_cpu_data+0xe6/0x240 [drop_monitor] net_dm_cmd_trace+0x43d/0x17a0 [drop_monitor] genl_family_rcv_msg_doit+0x22f/0x330 net/netlink/genetlink.c:739 genl_family_rcv_msg net/netlink/genetlink.c:783 [inline] genl_rcv_msg+0x341/0x5a0 net/netlink/genetlink.c:800 netlink_rcv_skb+0x14d/0x440 net/netlink/af_netlink.c:2497 genl_rcv+0x29/0x40 net/netlink/genetlink.c:811 netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline] netlink_unicast+0x54b/0x800 net/netlink/af_netlink.c:1348 netlink_sendmsg+0x914/0xe00 net/netlink/af_netlink.c:1916 sock_sendmsg_nosec net/socket.c:651 [inline] __sock_sendmsg+0x157/0x190 net/socket.c:663 ____sys_sendmsg+0x712/0x870 net/socket.c:2378 ___sys_sendmsg+0xf8/0x170 net/socket.c:2432 __sys_sendmsg+0xea/0x1b0 net/socket.c:2461 do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x62/0xc7 RIP: 0033:0x7f3f9815aee9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f3f972bf0c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f3f9826d050 RCX: 00007f3f9815aee9 RDX: 0000000020000000 RSI: 0000000020001300 RDI: 0000000000000007 RBP: 00007f3f981b63bd R08: 0000000000000000 
R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000006e R14: 00007f3f9826d050 R15: 00007ffe01ee6768
If drop_monitor is built as a kernel module, syzkaller may have time to send a netlink NET_DM_CMD_START message during the module loading. This will call the net_dm_monitor_start() function that uses a spinlock that has not yet been initialized.
To fix this, let's place resource initialization above the registration of a generic netlink family.
Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 9a8afc8d3962 ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol") Cc: stable@vger.kernel.org Signed-off-by: Ilia Gavrilov Ilia.Gavrilov@infotecs.ru Reviewed-by: Ido Schimmel idosch@nvidia.com Link: https://patch.msgid.link/20250213152054.2785669-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/drop_monitor.c | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-)
--- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -1731,30 +1731,30 @@ static int __init init_net_drop_monitor( return -ENOSPC; }
- rc = genl_register_family(&net_drop_monitor_family); - if (rc) { - pr_err("Could not create drop monitor netlink family\n"); - return rc; + for_each_possible_cpu(cpu) { + net_dm_cpu_data_init(cpu); + net_dm_hw_cpu_data_init(cpu); } - WARN_ON(net_drop_monitor_family.mcgrp_offset != NET_DM_GRP_ALERT);
rc = register_netdevice_notifier(&dropmon_net_notifier); if (rc < 0) { pr_crit("Failed to register netdevice notifier\n"); + return rc; + } + + rc = genl_register_family(&net_drop_monitor_family); + if (rc) { + pr_err("Could not create drop monitor netlink family\n"); goto out_unreg; } + WARN_ON(net_drop_monitor_family.mcgrp_offset != NET_DM_GRP_ALERT);
rc = 0;
- for_each_possible_cpu(cpu) { - net_dm_cpu_data_init(cpu); - net_dm_hw_cpu_data_init(cpu); - } - goto out;
out_unreg: - genl_unregister_family(&net_drop_monitor_family); + WARN_ON(unregister_netdevice_notifier(&dropmon_net_notifier)); out: return rc; } @@ -1763,19 +1763,18 @@ static void exit_net_drop_monitor(void) { int cpu;
- BUG_ON(unregister_netdevice_notifier(&dropmon_net_notifier)); - /* * Because of the module_get/put we do in the trace state change path * we are guaranteed not to have any current users when we get here */ + BUG_ON(genl_unregister_family(&net_drop_monitor_family)); + + BUG_ON(unregister_netdevice_notifier(&dropmon_net_notifier));
for_each_possible_cpu(cpu) { net_dm_hw_cpu_data_fini(cpu); net_dm_cpu_data_fini(cpu); } - - BUG_ON(genl_unregister_family(&net_drop_monitor_family)); }
module_init(init_net_drop_monitor);
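As a rough userspace illustration of the ordering the drop_monitor fix above relies on: all state a handler may touch is initialized before the handler becomes reachable. Everything here (struct cpu_data, registered_handler, module_init_fixed) is a hypothetical stand-in, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_FAKE_CPUS 4

/* Hypothetical stand-in for per-CPU data protected by a spinlock. */
struct cpu_data {
	bool lock_initialized;
	int drops;
};

struct cpu_data per_cpu[NR_FAKE_CPUS];

/* Hypothetical published entry point; NULL means "not reachable yet". */
int (*registered_handler)(void);

/* Touches every per-CPU lock, as a start command would. */
int handle_start(void)
{
	for (size_t cpu = 0; cpu < NR_FAKE_CPUS; cpu++) {
		if (!per_cpu[cpu].lock_initialized)
			return -1; /* the kernel would report a bad spinlock here */
		per_cpu[cpu].drops = 0;
	}
	return 0;
}

/* Fixed ordering: initialize resources first, publish the handler last. */
int module_init_fixed(void)
{
	for (size_t cpu = 0; cpu < NR_FAKE_CPUS; cpu++)
		per_cpu[cpu].lock_initialized = true;

	registered_handler = handle_start;
	return 0;
}
```

Calling handle_start() before module_init_fixed() fails in this sketch, which mirrors the window the patch closes.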
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Haoxiang Li haoxiang_li2024@163.com
commit e31e3f6c0ce473f7ce1e70d54ac8e3ed190509f8 upstream.
Add a check for the return value of devm_kstrdup() in loongson2_guts_probe() to catch a potential allocation failure.
Fixes: b82621ac8450 ("soc: loongson: add GUTS driver for loongson-2 platforms") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li haoxiang_li2024@163.com Link: https://lore.kernel.org/r/20250220081714.2676828-1-haoxiang_li2024@163.com Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/soc/loongson/loongson2_guts.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/soc/loongson/loongson2_guts.c +++ b/drivers/soc/loongson/loongson2_guts.c @@ -114,8 +114,11 @@ static int loongson2_guts_probe(struct p if (of_property_read_string(root, "model", &machine)) of_property_read_string_index(root, "compatible", 0, &machine); of_node_put(root); - if (machine) + if (machine) { soc_dev_attr.machine = devm_kstrdup(dev, machine, GFP_KERNEL); + if (!soc_dev_attr.machine) + return -ENOMEM; + }
svr = loongson2_guts_get_svr(); soc_die = loongson2_soc_die_match(svr, loongson2_soc_die);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Begunkov asml.silence@gmail.com
commit f4b78260fc678ccd7169f32dc9f3bfa3b93931c7 upstream.
import_iovec() says that it should always be fine to kfree the iovec returned in @iovp regardless of the error code. __import_iovec_ubuf() never reallocates it and thus should clear the pointer even in cases when copy_iovec_*() fail.
Link: https://lkml.kernel.org/r/378ae26923ffc20fd5e41b4360d673bf47b1775b.173833246... Fixes: 3b2deb0e46da ("iov_iter: import single vector iovecs as ITER_UBUF") Signed-off-by: Pavel Begunkov asml.silence@gmail.com Reviewed-by: Jens Axboe axboe@kernel.dk Cc: Al Viro viro@zeniv.linux.org.uk Cc: Christian Brauner brauner@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- lib/iov_iter.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -1441,6 +1441,8 @@ static ssize_t __import_iovec_ubuf(int t struct iovec *iov = *iovp; ssize_t ret;
+ *iovp = NULL; + if (compat) ret = copy_compat_iovec_from_user(iov, uvec, 1); else @@ -1451,7 +1453,6 @@ static ssize_t __import_iovec_ubuf(int t ret = import_ubuf(type, iov->iov_base, iov->iov_len, i); if (unlikely(ret)) return ret; - *iovp = NULL; return i->count; }
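The contract behind the iov_iter fix above can be sketched in plain C: a helper that only ever uses the caller's buffer clears the caller's pointer before any step that can fail, so the caller may free it unconditionally. The names struct vec, copy_vec and import_single are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct vec {
	void *base;
	size_t len;
};

/* Hypothetical copy step that fails when there is no source. */
int copy_vec(struct vec *dst, const struct vec *src)
{
	if (!src)
		return -1;
	memcpy(dst, src, sizeof(*dst));
	return 0;
}

/*
 * Uses the caller-provided buffer in *vecp and never allocates.
 * Clearing *vecp up front makes every return path safe for a
 * caller that frees *vecp regardless of the result.
 */
int import_single(const struct vec *src, struct vec **vecp)
{
	struct vec *v = *vecp;

	*vecp = NULL;
	if (copy_vec(v, src))
		return -1;
	return 0;
}
```

With the clearing done last instead, the failure path would leave *vecp pointing at the caller's stack buffer, and a later free() of it would be invalid.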
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Haoxiang Li haoxiang_li2024@163.com
commit 878e7b11736e062514e58f3b445ff343e6705537 upstream.
Add check for the return value of nfp_app_ctrl_msg_alloc() in nfp_bpf_cmsg_alloc() to prevent null pointer dereference.
Fixes: ff3d43f7568c ("nfp: bpf: implement helpers for FW map ops") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li haoxiang_li2024@163.com Link: https://patch.msgid.link/20250218030409.2425798-1-haoxiang_li2024@163.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/netronome/nfp/bpf/cmsg.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c @@ -20,6 +20,8 @@ nfp_bpf_cmsg_alloc(struct nfp_app_bpf *b struct sk_buff *skb;
skb = nfp_app_ctrl_msg_alloc(bpf->app, size, GFP_KERNEL); + if (!skb) + return NULL; skb_put(skb, size);
return skb;
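A minimal sketch of the check added in the nfp patch above, using hypothetical userspace types (struct msg, msg_alloc, cmsg_alloc) in place of sk_buff and the driver helpers: the wrapper returns NULL instead of touching a buffer that was never allocated.

```c
#include <assert.h>
#include <stdlib.h>

struct msg {
	size_t len;
	unsigned char *data;
};

/* Hypothetical allocator; a size of 0 simulates an allocation failure. */
struct msg *msg_alloc(size_t size)
{
	struct msg *m;

	if (size == 0)
		return NULL;
	m = calloc(1, sizeof(*m));
	if (!m)
		return NULL;
	m->data = calloc(1, size);
	if (!m->data) {
		free(m);
		return NULL;
	}
	return m;
}

/* Wrapper that checks the allocation before using the result. */
struct msg *cmsg_alloc(size_t size)
{
	struct msg *m = msg_alloc(size);

	if (!m)
		return NULL;
	m->len = size; /* loosely analogous to skb_put() */
	return m;
}
```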
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Ujfalusi peter.ujfalusi@linux.intel.com
commit d8d99c3b5c485f339864aeaa29f76269cc0ea975 upstream.
sps->cstream should be checked for NULL in the same way as is done in the sof_set_stream_data_offset() function. Assuming that it is not NULL whenever sps->stream is NULL is incorrect and can lead to a NULL pointer dereference.
Fixes: 090349a9feba ("ASoC: SOF: Add support for compress API for stream data/offset") Cc: stable@vger.kernel.org Reported-by: Curtis Malainey cujomalainey@chromium.org Closes: https://github.com/thesofproject/linux/pull/5214 Signed-off-by: Peter Ujfalusi peter.ujfalusi@linux.intel.com Reviewed-by: Daniel Baluta daniel.baluta@nxp.com Reviewed-by: Ranjani Sridharan ranjani.sridharan@linux.intel.com Reviewed-by: Bard Liao yung-chuan.liao@linux.intel.com Reviewed-by: Curtis Malainey cujomalainey@chromium.org Link: https://patch.msgid.link/20250205135232.19762-2-peter.ujfalusi@linux.intel.c... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/sof/stream-ipc.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/sound/soc/sof/stream-ipc.c b/sound/soc/sof/stream-ipc.c index 794c7bbccbaf..8262443ac89a 100644 --- a/sound/soc/sof/stream-ipc.c +++ b/sound/soc/sof/stream-ipc.c @@ -43,7 +43,7 @@ int sof_ipc_msg_data(struct snd_sof_dev *sdev, return -ESTRPIPE;
posn_offset = stream->posn_offset; - } else { + } else if (sps->cstream) {
struct sof_compr_stream *sstream = sps->cstream->runtime->private_data;
@@ -51,6 +51,10 @@ int sof_ipc_msg_data(struct snd_sof_dev *sdev, return -ESTRPIPE;
posn_offset = sstream->posn_offset; + + } else { + dev_err(sdev->dev, "%s: No stream opened\n", __func__); + return -EINVAL; }
snd_sof_dsp_mailbox_read(sdev, posn_offset, p, sz);
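The branch structure introduced by the stream-ipc patch above can be sketched as follows. The types are simplified, hypothetical stand-ins for the SOF structures; the point is only that neither pointer is assumed to be valid because the other one is NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct pcm_stream {
	int posn_offset;
};

struct compr_stream {
	int posn_offset;
};

/* Hypothetical holder with at most one of the two streams set. */
struct stream_holder {
	struct pcm_stream *stream;
	struct compr_stream *cstream;
};

int get_posn_offset(const struct stream_holder *sps, int *offset)
{
	if (sps->stream) {
		*offset = sps->stream->posn_offset;
	} else if (sps->cstream) {
		*offset = sps->cstream->posn_offset;
	} else {
		/* Neither stream is open: report it instead of dereferencing NULL. */
		return -EINVAL;
	}
	return 0;
}
```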
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikita Zhandarovich n.zhandarovich@fintech.ru
commit a8c9a453387640dbe45761970f41301a6985e7fa upstream.
If 'micfil->quality' received from micfil_quality_set() somehow ends up with an unexpected value, the switch statement will fail to initialize the local variable qsel before regmap_update_bits() tries to use it.
While it is unlikely, play it safe and enable a default case that returns -EINVAL error.
Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE.
Fixes: bea1d61d5892 ("ASoC: fsl_micfil: rework quality setting") Cc: stable@vger.kernel.org Signed-off-by: Nikita Zhandarovich n.zhandarovich@fintech.ru Link: https://patch.msgid.link/20250116142436.22389-1-n.zhandarovich@fintech.ru Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/fsl/fsl_micfil.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/soc/fsl/fsl_micfil.c +++ b/sound/soc/fsl/fsl_micfil.c @@ -156,6 +156,8 @@ static int micfil_set_quality(struct fsl case QUALITY_VLOW2: qsel = MICFIL_QSEL_VLOW2_QUALITY; break; + default: + return -EINVAL; }
return regmap_update_bits(micfil->regmap, REG_MICFIL_CTRL2,
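A small sketch of the pattern used in the fsl_micfil patch above: a default case that rejects unknown values, so the output variable is never consumed uninitialized. The enum and the selector values are made up for illustration and do not match the driver's register encoding.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical quality levels; not the driver's actual enum. */
enum quality {
	QUALITY_HIGH,
	QUALITY_MEDIUM,
	QUALITY_LOW,
};

int quality_to_sel(int quality, int *qsel)
{
	switch (quality) {
	case QUALITY_HIGH:
		*qsel = 0x1;
		break;
	case QUALITY_MEDIUM:
		*qsel = 0x2;
		break;
	case QUALITY_LOW:
		*qsel = 0x3;
		break;
	default:
		/* Unknown value: fail early and leave *qsel untouched. */
		return -EINVAL;
	}
	return 0;
}
```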
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wentao Liang vulab@iscas.ac.cn
commit 822b7ec657e99b44b874e052d8540d8b54fe8569 upstream.
Check the return value of snd_ctl_rename_id() in snd_hda_create_dig_out_ctls(). Ensure that failures are properly handled.
[ Note: the error cannot happen in practice because the only error condition in snd_ctl_rename_id() is the missing ID, but this is a rename, hence it must be present. But for code consistency, it's safer to always have the proper return check -- tiwai ]
Fixes: 5c219a340850 ("ALSA: hda: Fix kctl->id initialization") Cc: stable@vger.kernel.org # 6.4+ Signed-off-by: Wentao Liang vulab@iscas.ac.cn Link: https://patch.msgid.link/20250213074543.1620-1-vulab@iscas.ac.cn Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/hda_codec.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/sound/pci/hda/hda_codec.c +++ b/sound/pci/hda/hda_codec.c @@ -2463,7 +2463,9 @@ int snd_hda_create_dig_out_ctls(struct h break; id = kctl->id; id.index = spdif_index; - snd_ctl_rename_id(codec->card, &kctl->id, &id); + err = snd_ctl_rename_id(codec->card, &kctl->id, &id); + if (err < 0) + return err; } bus->primary_dig_out_type = HDA_PCM_TYPE_HDMI; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Veness john-linux@pelago.org.uk
commit 6d1f86610f23b0bc334d6506a186f21a98f51392 upstream.
Allows the LED on the dedicated mute button on the HP ProBook 450 G4 laptop to change colour correctly.
Signed-off-by: John Veness john-linux@pelago.org.uk Cc: stable@vger.kernel.org Link: https://patch.msgid.link/2fb55d48-6991-4a42-b591-4c78f2fad8d7@pelago.org.uk Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_conexant.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_conexant.c +++ b/sound/pci/hda/patch_conexant.c @@ -1084,6 +1084,7 @@ static const struct hda_quirk cxt5066_fi SND_PCI_QUIRK(0x103c, 0x814f, "HP ZBook 15u G3", CXT_FIXUP_MUTE_LED_GPIO), SND_PCI_QUIRK(0x103c, 0x8174, "HP Spectre x360", CXT_FIXUP_HP_SPECTRE), SND_PCI_QUIRK(0x103c, 0x822e, "HP ProBook 440 G4", CXT_FIXUP_MUTE_LED_GPIO), + SND_PCI_QUIRK(0x103c, 0x8231, "HP ProBook 450 G4", CXT_FIXUP_MUTE_LED_GPIO), SND_PCI_QUIRK(0x103c, 0x828c, "HP EliteBook 840 G4", CXT_FIXUP_HP_DOCK), SND_PCI_QUIRK(0x103c, 0x8299, "HP 800 G3 SFF", CXT_FIXUP_HP_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x103c, 0x829a, "HP 800 G3 DM", CXT_FIXUP_HP_MIC_NO_PRESENCE),
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Ujfalusi peter.ujfalusi@linux.intel.com
commit 46c7b901e2a03536df5a3cb40b3b26e2be505df6 upstream.
The spcm->stream[substream->stream].substream pointer is set during open and was left untouched afterwards. After the first PCM stream it will never be NULL, yet we have code which checks whether substream is NULL as an indication of whether the stream is active or not. For the compressed cstream pointer the same has already been done; this change corrects the handling of PCM streams.
Fixes: 090349a9feba ("ASoC: SOF: Add support for compress API for stream data/offset") Cc: stable@vger.kernel.org Reported-by: Curtis Malainey cujomalainey@chromium.org Closes: https://github.com/thesofproject/linux/pull/5214 Signed-off-by: Peter Ujfalusi peter.ujfalusi@linux.intel.com Reviewed-by: Daniel Baluta daniel.baluta@nxp.com Reviewed-by: Ranjani Sridharan ranjani.sridharan@linux.intel.com Reviewed-by: Bard Liao yung-chuan.liao@linux.intel.com Reviewed-by: Curtis Malainey cujomalainey@chromium.org Link: https://patch.msgid.link/20250205135232.19762-3-peter.ujfalusi@linux.intel.c... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/sof/pcm.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/soc/sof/pcm.c +++ b/sound/soc/sof/pcm.c @@ -507,6 +507,8 @@ static int sof_pcm_close(struct snd_soc_ */ }
+ spcm->stream[substream->stream].substream = NULL; + return 0; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Brauner brauner@kernel.org
commit 56d5f3eba3f5de0efdd556de4ef381e109b973a9 upstream.
In [1] it was reported that the acct(2) system call can be used to trigger a NULL deref in cases where it is set to write to a file that triggers an internal lookup. This can happen, e.g., when pointing acct(2) to /sys/power/resume. At the point where the write to this file happens, the calling task has already exited and called exit_fs(). A lookup will thus trigger a NULL deref when accessing current->fs.
Reorganize the code so that the final write happens from the workqueue but with the caller's credentials. This preserves the (strange) permission model and has almost no regression risk.
This API should cease to exist, though.
Link: https://lore.kernel.org/r/20250127091811.3183623-1-quzicheng@huawei.com [1] Link: https://lore.kernel.org/r/20250211-work-acct-v1-1-1c16aecab8b3@kernel.org Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Zicheng Qu quzicheng@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/acct.c | 120 +++++++++++++++++++++++++++++++++------------------------- 1 file changed, 70 insertions(+), 50 deletions(-)
--- a/kernel/acct.c +++ b/kernel/acct.c @@ -104,48 +104,50 @@ struct bsd_acct_struct { atomic_long_t count; struct rcu_head rcu; struct mutex lock; - int active; + bool active; + bool check_space; unsigned long needcheck; struct file *file; struct pid_namespace *ns; struct work_struct work; struct completion done; + acct_t ac; };
-static void do_acct_process(struct bsd_acct_struct *acct); +static void fill_ac(struct bsd_acct_struct *acct); +static void acct_write_process(struct bsd_acct_struct *acct);
/* * Check the amount of free space and suspend/resume accordingly. */ -static int check_free_space(struct bsd_acct_struct *acct) +static bool check_free_space(struct bsd_acct_struct *acct) { struct kstatfs sbuf;
- if (time_is_after_jiffies(acct->needcheck)) - goto out; + if (!acct->check_space) + return acct->active;
/* May block */ if (vfs_statfs(&acct->file->f_path, &sbuf)) - goto out; + return acct->active;
if (acct->active) { u64 suspend = sbuf.f_blocks * SUSPEND; do_div(suspend, 100); if (sbuf.f_bavail <= suspend) { - acct->active = 0; + acct->active = false; pr_info("Process accounting paused\n"); } } else { u64 resume = sbuf.f_blocks * RESUME; do_div(resume, 100); if (sbuf.f_bavail >= resume) { - acct->active = 1; + acct->active = true; pr_info("Process accounting resumed\n"); } }
acct->needcheck = jiffies + ACCT_TIMEOUT*HZ; -out: return acct->active; }
@@ -190,7 +192,11 @@ static void acct_pin_kill(struct fs_pin { struct bsd_acct_struct *acct = to_acct(pin); mutex_lock(&acct->lock); - do_acct_process(acct); + /* + * Fill the accounting struct with the exiting task's info + * before punting to the workqueue. + */ + fill_ac(acct); schedule_work(&acct->work); wait_for_completion(&acct->done); cmpxchg(&acct->ns->bacct, pin, NULL); @@ -203,6 +209,9 @@ static void close_work(struct work_struc { struct bsd_acct_struct *acct = container_of(work, struct bsd_acct_struct, work); struct file *file = acct->file; + + /* We were fired by acct_pin_kill() which holds acct->lock. */ + acct_write_process(acct); if (file->f_op->flush) file->f_op->flush(file, NULL); __fput_sync(file); @@ -431,13 +440,27 @@ static u32 encode_float(u64 value) * do_exit() or when switching to a different output file. */
-static void fill_ac(acct_t *ac) +static void fill_ac(struct bsd_acct_struct *acct) { struct pacct_struct *pacct = &current->signal->pacct; + struct file *file = acct->file; + acct_t *ac = &acct->ac; u64 elapsed, run_time; time64_t btime; struct tty_struct *tty;
+ lockdep_assert_held(&acct->lock); + + if (time_is_after_jiffies(acct->needcheck)) { + acct->check_space = false; + + /* Don't fill in @ac if nothing will be written. */ + if (!acct->active) + return; + } else { + acct->check_space = true; + } + /* * Fill the accounting struct with the needed info as recorded * by the different kernel functions. @@ -485,64 +508,61 @@ static void fill_ac(acct_t *ac) ac->ac_majflt = encode_comp_t(pacct->ac_majflt); ac->ac_exitcode = pacct->ac_exitcode; spin_unlock_irq(&current->sighand->siglock); -} -/* - * do_acct_process does all actual work. Caller holds the reference to file. - */ -static void do_acct_process(struct bsd_acct_struct *acct) -{ - acct_t ac; - unsigned long flim; - const struct cred *orig_cred; - struct file *file = acct->file; - - /* - * Accounting records are not subject to resource limits. - */ - flim = rlimit(RLIMIT_FSIZE); - current->signal->rlim[RLIMIT_FSIZE].rlim_cur = RLIM_INFINITY; - /* Perform file operations on behalf of whoever enabled accounting */ - orig_cred = override_creds(file->f_cred);
- /* - * First check to see if there is enough free_space to continue - * the process accounting system. - */ - if (!check_free_space(acct)) - goto out; - - fill_ac(&ac); /* we really need to bite the bullet and change layout */ - ac.ac_uid = from_kuid_munged(file->f_cred->user_ns, orig_cred->uid); - ac.ac_gid = from_kgid_munged(file->f_cred->user_ns, orig_cred->gid); + ac->ac_uid = from_kuid_munged(file->f_cred->user_ns, current_uid()); + ac->ac_gid = from_kgid_munged(file->f_cred->user_ns, current_gid()); #if ACCT_VERSION == 1 || ACCT_VERSION == 2 /* backward-compatible 16 bit fields */ - ac.ac_uid16 = ac.ac_uid; - ac.ac_gid16 = ac.ac_gid; + ac->ac_uid16 = ac->ac_uid; + ac->ac_gid16 = ac->ac_gid; #elif ACCT_VERSION == 3 { struct pid_namespace *ns = acct->ns;
- ac.ac_pid = task_tgid_nr_ns(current, ns); + ac->ac_pid = task_tgid_nr_ns(current, ns); rcu_read_lock(); - ac.ac_ppid = task_tgid_nr_ns(rcu_dereference(current->real_parent), - ns); + ac->ac_ppid = task_tgid_nr_ns(rcu_dereference(current->real_parent), ns); rcu_read_unlock(); } #endif +} + +static void acct_write_process(struct bsd_acct_struct *acct) +{ + struct file *file = acct->file; + const struct cred *cred; + acct_t *ac = &acct->ac; + + /* Perform file operations on behalf of whoever enabled accounting */ + cred = override_creds(file->f_cred); + /* - * Get freeze protection. If the fs is frozen, just skip the write - * as we could deadlock the system otherwise. + * First check to see if there is enough free_space to continue + * the process accounting system. Then get freeze protection. If + * the fs is frozen, just skip the write as we could deadlock + * the system otherwise. */ - if (file_start_write_trylock(file)) { + if (check_free_space(acct) && file_start_write_trylock(file)) { /* it's been opened O_APPEND, so position is irrelevant */ loff_t pos = 0; - __kernel_write(file, &ac, sizeof(acct_t), &pos); + __kernel_write(file, ac, sizeof(acct_t), &pos); file_end_write(file); } -out: + + revert_creds(cred); +} + +static void do_acct_process(struct bsd_acct_struct *acct) +{ + unsigned long flim; + + /* Accounting records are not subject to resource limits. */ + flim = rlimit(RLIMIT_FSIZE); + current->signal->rlim[RLIMIT_FSIZE].rlim_cur = RLIM_INFINITY; + fill_ac(acct); + acct_write_process(acct); current->signal->rlim[RLIMIT_FSIZE].rlim_cur = flim; - revert_creds(orig_cred); }
/**
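The split made by the acct patch above (fill the record in the exiting task, write it later from a worker) can be illustrated with a very reduced userspace sketch. All names here are hypothetical: current_ctx stands in for task state that is gone by the time the write happens, and write_record() stands in for the workqueue side, which only reads the stored snapshot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task state; NULL once the task has torn it down. */
struct task_ctx {
	int uid;
	int exit_code;
};

struct task_ctx *current_ctx;

struct record {
	int uid;
	int exit_code;
};

struct acct_obj {
	struct record rec; /* snapshot kept in the accounting object */
	int written_uid;   /* stand-in for the file contents */
	int have_output;
};

/* Phase 1: runs in the exiting task while current_ctx is still valid. */
void fill_record(struct acct_obj *a)
{
	a->rec.uid = current_ctx->uid;
	a->rec.exit_code = current_ctx->exit_code;
}

/* Phase 2: runs later and never touches current_ctx. */
void write_record(struct acct_obj *a)
{
	a->written_uid = a->rec.uid;
	a->have_output = 1;
}
```

In the real patch the second phase additionally runs under the credentials of whoever enabled accounting; that part is not modeled here.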
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Brauner brauner@kernel.org
commit 890ed45bde808c422c3c27d3285fc45affa0f930 upstream.
There's no point in allowing anything kernel internal nor procfs or sysfs.
Link: https://lore.kernel.org/r/20250127091811.3183623-1-quzicheng@huawei.com Link: https://lore.kernel.org/r/20250211-work-acct-v1-2-1c16aecab8b3@kernel.org Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reviewed-by: Amir Goldstein amir73il@gmail.com Reported-by: Zicheng Qu quzicheng@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/acct.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
--- a/kernel/acct.c +++ b/kernel/acct.c @@ -244,6 +244,20 @@ static int acct_on(struct filename *path return -EACCES; }
+ /* Exclude kernel kernel internal filesystems. */ + if (file_inode(file)->i_sb->s_flags & (SB_NOUSER | SB_KERNMOUNT)) { + kfree(acct); + filp_close(file, NULL); + return -EINVAL; + } + + /* Exclude procfs and sysfs. */ + if (file_inode(file)->i_sb->s_iflags & SB_I_USERNS_VISIBLE) { + kfree(acct); + filp_close(file, NULL); + return -EINVAL; + } + if (!(file->f_mode & FMODE_CAN_WRITE)) { kfree(acct); filp_close(file, NULL);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ricardo Cañuelo Navarro rcn@igalia.com
commit 2ede647a6fde3e54a6bfda7cf01c716649655900 upstream.
Add a sanity check to madvise_dontneed_free() to address a corner case in madvise where a race condition causes the current vma being processed to be backed by a different page size.
During a madvise(MADV_DONTNEED) call on a memory region registered with a userfaultfd, there's a period of time where the process mm lock is temporarily released in order to send a UFFD_EVENT_REMOVE and let userspace handle the event. During this time, the vma covering the current address range may change due to an explicit mmap done concurrently by another thread.
If, after that change, the memory region, which was originally backed by 4KB pages, is now backed by hugepages, the end address is rounded down to a hugepage boundary to avoid data loss (see "Fixes" below). This rounding may cause the end address to be truncated to the same address as the start.
Make this corner case follow the same semantics as in other similar cases where the requested region has zero length (ie. return 0).
This will make madvise_walk_vmas() continue to the next vma in the range (this time holding the process mm lock) which, due to the prev pointer becoming stale because of the vma change, will be the same hugepage-backed vma that was just checked before. The next time madvise_dontneed_free() runs for this vma, if the start address isn't aligned to a hugepage boundary, it'll return -EINVAL, which is also in line with the madvise api.
From userspace perspective, madvise() will return EINVAL because the start address isn't aligned according to the new vma alignment requirements (hugepage), even though it was correctly page-aligned when the call was issued.
Link: https://lkml.kernel.org/r/20250203075206.1452208-1-rcn@igalia.com Fixes: 8ebe0a5eaaeb ("mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs") Signed-off-by: Ricardo Cañuelo Navarro rcn@igalia.com Reviewed-by: Oscar Salvador osalvador@suse.de Cc: Florent Revest revest@google.com Cc: Rik van Riel riel@surriel.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/madvise.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
--- a/mm/madvise.c +++ b/mm/madvise.c @@ -899,7 +899,16 @@ static long madvise_dontneed_free(struct */ end = vma->vm_end; } - VM_WARN_ON(start >= end); + /* + * If the memory region between start and end was + * originally backed by 4kB pages and then remapped to + * be backed by hugepages while mmap_lock was dropped, + * the adjustment for hugetlb vma above may have rounded + * end down to the start address. + */ + if (start == end) + return 0; + VM_WARN_ON(start > end); }
if (behavior == MADV_DONTNEED || behavior == MADV_DONTNEED_LOCKED)
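The arithmetic behind the madvise corner case above can be worked through in a few lines. This is only a sketch: it assumes a 2 MiB huge page size, ignores the vma bounds the kernel also considers, and adjusted_len() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

#define HPAGE_2M 0x200000UL /* assumed huge page size for this example */

/*
 * Round end down to a huge page boundary and return the remaining
 * length. An empty (or, in this sketch, inverted) range yields 0,
 * matching the "zero length means nothing to do" semantics.
 */
unsigned long adjusted_len(unsigned long start, unsigned long end,
			   unsigned long hpage)
{
	unsigned long new_end = end & ~(hpage - 1);

	if (new_end <= start)
		return 0;
	return new_end - start;
}
```

For a single 4 KiB page at 0x200000 (end 0x201000), rounding end down gives 0x200000, which equals start, so the range becomes empty. This is the case that used to trip VM_WARN_ON(start >= end).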
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Niravkumar L Rabara niravkumar.l.rabara@intel.com
commit 2b9df00cded911e2ca2cfae5c45082166b24f8aa upstream.
Replace dma_request_channel() with dma_request_chan_by_mask() and use helper functions to return proper error code instead of fixed -EBUSY.
Fixes: ec4ba01e894d ("mtd: rawnand: Add new Cadence NAND driver to MTD subsystem") Cc: stable@vger.kernel.org Signed-off-by: Niravkumar L Rabara niravkumar.l.rabara@intel.com Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/cadence-nand-controller.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
--- a/drivers/mtd/nand/raw/cadence-nand-controller.c +++ b/drivers/mtd/nand/raw/cadence-nand-controller.c @@ -2909,11 +2909,10 @@ static int cadence_nand_init(struct cdns dma_cap_set(DMA_MEMCPY, mask);
if (cdns_ctrl->caps1->has_dma) { - cdns_ctrl->dmac = dma_request_channel(mask, NULL, NULL); - if (!cdns_ctrl->dmac) { - dev_err(cdns_ctrl->dev, - "Unable to get a DMA channel\n"); - ret = -EBUSY; + cdns_ctrl->dmac = dma_request_chan_by_mask(&mask); + if (IS_ERR(cdns_ctrl->dmac)) { + ret = dev_err_probe(cdns_ctrl->dev, PTR_ERR(cdns_ctrl->dmac), + "%d: Failed to get a DMA channel\n", ret); goto disable_irq; } }
6.6-stable review patch. If anyone has any objections, please let me know.
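The error-pointer convention used by the dma_request_chan_by_mask() patch above can be sketched in userspace as follows. The err_ptr(), ptr_err() and is_err() helpers are simplified stand-ins for the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and request_channel() is a hypothetical function, so treat this as an illustration of the idea rather than the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
void *err_ptr(intptr_t err)
{
	return (void *)err;
}

/* Recover the errno value from an error pointer. */
intptr_t ptr_err(const void *p)
{
	return (intptr_t)p;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct channel {
	int id;
};

struct channel the_channel = { 1 };

/* Hypothetical request function that reports why it failed. */
struct channel *request_channel(int available)
{
	if (!available)
		return err_ptr(-ENODEV);
	return &the_channel;
}

/* Propagate the real error code instead of a fixed -EBUSY. */
int init_dma(int available, struct channel **out)
{
	struct channel *ch = request_channel(available);

	if (is_err(ch))
		return (int)ptr_err(ch);
	*out = ch;
	return 0;
}
```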
------------------
From: Niravkumar L Rabara niravkumar.l.rabara@intel.com
commit d76d22b5096c5b05208fd982b153b3f182350b19 upstream.
Remap the slave DMA I/O resources to enhance driver portability. Using a physical address causes DMA translation failure when the ARM SMMU is enabled.
Fixes: ec4ba01e894d ("mtd: rawnand: Add new Cadence NAND driver to MTD subsystem") Cc: stable@vger.kernel.org Signed-off-by: Niravkumar L Rabara niravkumar.l.rabara@intel.com Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/cadence-nand-controller.c | 29 +++++++++++++++++++++---- 1 file changed, 25 insertions(+), 4 deletions(-)
--- a/drivers/mtd/nand/raw/cadence-nand-controller.c +++ b/drivers/mtd/nand/raw/cadence-nand-controller.c @@ -469,6 +469,8 @@ struct cdns_nand_ctrl { struct { void __iomem *virt; dma_addr_t dma; + dma_addr_t iova_dma; + u32 size; } io;
int irq; @@ -1838,11 +1840,11 @@ static int cadence_nand_slave_dma_transf }
if (dir == DMA_FROM_DEVICE) { - src_dma = cdns_ctrl->io.dma; + src_dma = cdns_ctrl->io.iova_dma; dst_dma = buf_dma; } else { src_dma = buf_dma; - dst_dma = cdns_ctrl->io.dma; + dst_dma = cdns_ctrl->io.iova_dma; }
tx = dmaengine_prep_dma_memcpy(cdns_ctrl->dmac, dst_dma, src_dma, len, @@ -2874,6 +2876,7 @@ cadence_nand_irq_cleanup(int irqnum, str static int cadence_nand_init(struct cdns_nand_ctrl *cdns_ctrl) { dma_cap_mask_t mask; + struct dma_device *dma_dev = cdns_ctrl->dmac->device; int ret;
cdns_ctrl->cdma_desc = dma_alloc_coherent(cdns_ctrl->dev, @@ -2917,6 +2920,16 @@ static int cadence_nand_init(struct cdns } }
+ cdns_ctrl->io.iova_dma = dma_map_resource(dma_dev->dev, cdns_ctrl->io.dma, + cdns_ctrl->io.size, + DMA_BIDIRECTIONAL, 0); + + ret = dma_mapping_error(dma_dev->dev, cdns_ctrl->io.iova_dma); + if (ret) { + dev_err(cdns_ctrl->dev, "Failed to map I/O resource to DMA\n"); + goto dma_release_chnl; + } + nand_controller_init(&cdns_ctrl->controller); INIT_LIST_HEAD(&cdns_ctrl->chips);
@@ -2927,18 +2940,22 @@ static int cadence_nand_init(struct cdns if (ret) { dev_err(cdns_ctrl->dev, "Failed to register MTD: %d\n", ret); - goto dma_release_chnl; + goto unmap_dma_resource; }
kfree(cdns_ctrl->buf); cdns_ctrl->buf = kzalloc(cdns_ctrl->buf_size, GFP_KERNEL); if (!cdns_ctrl->buf) { ret = -ENOMEM; - goto dma_release_chnl; + goto unmap_dma_resource; }
return 0;
+unmap_dma_resource: + dma_unmap_resource(dma_dev->dev, cdns_ctrl->io.iova_dma, + cdns_ctrl->io.size, DMA_BIDIRECTIONAL, 0); + dma_release_chnl: if (cdns_ctrl->dmac) dma_release_channel(cdns_ctrl->dmac); @@ -2960,6 +2977,8 @@ free_buf_desc: static void cadence_nand_remove(struct cdns_nand_ctrl *cdns_ctrl) { cadence_nand_chips_cleanup(cdns_ctrl); + dma_unmap_resource(cdns_ctrl->dmac->device->dev, cdns_ctrl->io.iova_dma, + cdns_ctrl->io.size, DMA_BIDIRECTIONAL, 0); cadence_nand_irq_cleanup(cdns_ctrl->irq, cdns_ctrl); kfree(cdns_ctrl->buf); dma_free_coherent(cdns_ctrl->dev, sizeof(struct cadence_nand_cdma_desc), @@ -3028,7 +3047,9 @@ static int cadence_nand_dt_probe(struct cdns_ctrl->io.virt = devm_platform_get_and_ioremap_resource(ofdev, 1, &res); if (IS_ERR(cdns_ctrl->io.virt)) return PTR_ERR(cdns_ctrl->io.virt); + cdns_ctrl->io.dma = res->start; + cdns_ctrl->io.size = resource_size(res);
dt->clk = devm_clk_get(cdns_ctrl->dev, "nf_clk"); if (IS_ERR(dt->clk))
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Niravkumar L Rabara niravkumar.l.rabara@intel.com
commit f37d135b42cb484bdecee93f56b9f483214ede78 upstream.
dma_map_single() maps the buffer using the physical/bus device (the DMA engine's device), but dma_unmap_single() was called with the framework device (the NAND controller), which is incorrect. Fix dma_unmap_single() to use the same physical/bus device that was used for mapping.
Fixes: ec4ba01e894d ("mtd: rawnand: Add new Cadence NAND driver to MTD subsystem") Cc: stable@vger.kernel.org Signed-off-by: Niravkumar L Rabara niravkumar.l.rabara@intel.com Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/cadence-nand-controller.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/mtd/nand/raw/cadence-nand-controller.c +++ b/drivers/mtd/nand/raw/cadence-nand-controller.c @@ -1866,12 +1866,12 @@ static int cadence_nand_slave_dma_transf dma_async_issue_pending(cdns_ctrl->dmac); wait_for_completion(&finished);
- dma_unmap_single(cdns_ctrl->dev, buf_dma, len, dir); + dma_unmap_single(dma_dev->dev, buf_dma, len, dir);
return 0;
err_unmap: - dma_unmap_single(cdns_ctrl->dev, buf_dma, len, dir); + dma_unmap_single(dma_dev->dev, buf_dma, len, dir);
err: dev_dbg(cdns_ctrl->dev, "Fall back to CPU I/O\n");
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Haoxiang Li haoxiang_li2024@163.com
commit 860ca5e50f73c2a1cef7eefc9d39d04e275417f7 upstream.
Add check for the return value of cifs_buf_get() and cifs_small_buf_get() in receive_encrypted_standard() to prevent null pointer dereference.
Fixes: eec04ea11969 ("smb: client: fix OOB in receive_encrypted_standard()") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li haoxiang_li2024@163.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/smb2ops.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/smb/client/smb2ops.c +++ b/fs/smb/client/smb2ops.c @@ -4905,6 +4905,10 @@ one_more: next_buffer = (char *)cifs_buf_get(); else next_buffer = (char *)cifs_small_buf_get(); + if (!next_buffer) { + cifs_server_dbg(VFS, "No memory for (large) SMB response\n"); + return -1; + } memcpy(next_buffer, buf + next_cmd, pdu_length - next_cmd); }
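The check added above can be illustrated with a small userspace sketch. This is not the CIFS code: get_buffer() and copy_next_pdu() are hypothetical stand-ins for cifs_buf_get()/cifs_small_buf_get() and the copy step in receive_encrypted_standard(), and the simulate_failure flag exists only to exercise the error path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a buffer allocator that may return NULL. */
static char *get_buffer(size_t size, int simulate_failure)
{
	if (simulate_failure)
		return NULL;
	return malloc(size);
}

/*
 * Copy the tail of a response (everything from next_cmd onwards) into a
 * freshly obtained buffer. Returns 0 on success, -1 if no buffer was
 * available; *out is only written on success.
 */
static int copy_next_pdu(const char *buf, size_t pdu_length, size_t next_cmd,
			 int simulate_failure, char **out)
{
	char *next_buffer;

	assert(next_cmd < pdu_length);
	next_buffer = get_buffer(pdu_length - next_cmd, simulate_failure);
	if (!next_buffer)
		return -1;	/* bail out before memcpy() touches NULL */

	memcpy(next_buffer, buf + next_cmd, pdu_length - next_cmd);
	*out = next_buffer;
	return 0;
}
```

The point of the ordering is simply that the allocation result is tested before the first use; the caller owns the buffer only when 0 is returned.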
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Komal Bajaj quic_kbajaj@quicinc.com
commit c158647c107358bf1be579f98e4bb705c1953292 upstream.
The previous implementation incorrectly configured the cmn_interrupt_2_enable register for interrupt handling. Using cmn_interrupt_2_enable to configure Tag, Data RAM ECC interrupts would lead to issues like double handling of the interrupts (EL1 and EL3), as cmn_interrupt_2_enable is meant to be configured for interrupts which need to be handled by EL3.
EL1 LLCC EDAC driver needs to use cmn_interrupt_0_enable register to configure Tag, Data RAM ECC interrupts instead of cmn_interrupt_2_enable.
Fixes: 27450653f1db ("drivers: edac: Add EDAC driver support for QCOM SoCs") Signed-off-by: Komal Bajaj quic_kbajaj@quicinc.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Cc: stable@kernel.org Link: https://lore.kernel.org/r/20241119064608.12326-1-quic_kbajaj@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/edac/qcom_edac.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/edac/qcom_edac.c +++ b/drivers/edac/qcom_edac.c @@ -95,7 +95,7 @@ static int qcom_llcc_core_setup(struct l * Configure interrupt enable registers such that Tag, Data RAM related * interrupts are propagated to interrupt controller for servicing */ - ret = regmap_update_bits(llcc_bcast_regmap, drv->edac_reg_offset->cmn_interrupt_2_enable, + ret = regmap_update_bits(llcc_bcast_regmap, drv->edac_reg_offset->cmn_interrupt_0_enable, TRP0_INTERRUPT_ENABLE, TRP0_INTERRUPT_ENABLE); if (ret) @@ -113,7 +113,7 @@ static int qcom_llcc_core_setup(struct l if (ret) return ret;
- ret = regmap_update_bits(llcc_bcast_regmap, drv->edac_reg_offset->cmn_interrupt_2_enable, + ret = regmap_update_bits(llcc_bcast_regmap, drv->edac_reg_offset->cmn_interrupt_0_enable, DRP0_INTERRUPT_ENABLE, DRP0_INTERRUPT_ENABLE); if (ret)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
commit 57b76bedc5c52c66968183b5ef57234894c25ce7 upstream.
The function tracer should record the preemption level at the point when the function is invoked. If the tracing subsystem decrements the preemption counter, it needs to correct this before feeding the data into the trace buffer. This was broken in the commit cited below while shifting the preempt-disabled section.
Use tracing_gen_ctx_dec() which properly subtracts one from the preemption counter on a preemptible kernel.
Cc: stable@vger.kernel.org Cc: Wander Lairson Costa wander@redhat.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/20250220140749.pfw8qoNZ@linutronix.de Fixes: ce5e48036c9e7 ("ftrace: disable preemption when recursion locked") Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Tested-by: Wander Lairson Costa wander@redhat.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_functions.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/kernel/trace/trace_functions.c +++ b/kernel/trace/trace_functions.c @@ -185,7 +185,7 @@ function_trace_call(unsigned long ip, un if (bit < 0) return;
- trace_ctx = tracing_gen_ctx(); + trace_ctx = tracing_gen_ctx_dec();
cpu = smp_processor_id(); data = per_cpu_ptr(tr->array_buffer.data, cpu); @@ -285,7 +285,6 @@ function_no_repeats_trace_call(unsigned struct trace_array *tr = op->private; struct trace_array_cpu *data; unsigned int trace_ctx; - unsigned long flags; int bit; int cpu;
@@ -312,8 +311,7 @@ function_no_repeats_trace_call(unsigned if (is_repeat_check(tr, last_info, ip, parent_ip)) goto out;
- local_save_flags(flags); - trace_ctx = tracing_gen_ctx_flags(flags); + trace_ctx = tracing_gen_ctx_dec(); process_repeats(tr, ip, parent_ip, last_info, trace_ctx);
trace_function(tr, ip, parent_ip, trace_ctx);
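The accounting issue can be modelled in userspace as follows. This is only a toy sketch under the assumption that the tracer raises the preemption depth by exactly one around the recording step; preempt_depth and all toy_* helpers are hypothetical and do not reflect the kernel's actual trace_ctx encoding.

```c
#include <assert.h>

/* Toy stand-in for the per-task preemption depth (not kernel state). */
static unsigned int preempt_depth;

static void toy_preempt_disable(void)
{
	preempt_depth++;
}

static void toy_preempt_enable(void)
{
	assert(preempt_depth > 0);
	preempt_depth--;
}

/* Records the depth exactly as it is at the time of the call. */
static unsigned int toy_gen_ctx(void)
{
	return preempt_depth;
}

/* Records the depth minus the one level added by the tracer itself. */
static unsigned int toy_gen_ctx_dec(void)
{
	return preempt_depth ? preempt_depth - 1 : 0;
}

/*
 * Returns the depth that ends up in the "trace buffer" for a function
 * that was invoked at caller_depth.
 */
static unsigned int trace_one_call(unsigned int caller_depth, int corrected)
{
	unsigned int recorded;

	preempt_depth = caller_depth;
	toy_preempt_disable();		/* tracer's own preempt-off section */
	recorded = corrected ? toy_gen_ctx_dec() : toy_gen_ctx();
	toy_preempt_enable();
	return recorded;
}
```

With the uncorrected variant, a function called from fully preemptible context is recorded at depth 1; the corrected variant reports the caller's depth.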
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt rostedt@goodmis.org
commit 8eb4b09e0bbd30981305643229fe7640ad41b667 upstream.
Check if a function is already in the manager ops of a subops. A manager ops contains multiple subops, and if two or more subops are tracing the same function, the manager ops only needs a single entry in its hash.
Cc: stable@vger.kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Sven Schnelle svens@linux.ibm.com Cc: Vasily Gorbik gor@linux.ibm.com Cc: Alexander Gordeev agordeev@linux.ibm.com Link: https://lore.kernel.org/20250220202055.226762894@goodmis.org Fixes: 4f554e955614f ("ftrace: Add ftrace_set_filter_ips function") Tested-by: Heiko Carstens hca@linux.ibm.com Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/ftrace.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -5233,6 +5233,9 @@ __ftrace_match_addr(struct ftrace_hash * return -ENOENT; free_hash_entry(hash, entry); return 0; + } else if (__ftrace_lookup_ip(hash, ip) != NULL) { + /* Already exists */ + return 0; }
entry = add_hash_entry(hash, ip);
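The lookup-before-add idea can be sketched in userspace like this. The toy_set type is a hypothetical flat array rather than the ftrace hash, and toy_lookup_ip()/toy_add_ip() merely play the roles of __ftrace_lookup_ip() and add_hash_entry().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SET_SIZE 16

/* Hypothetical stand-in for the manager ops hash: a small flat set. */
struct toy_set {
	unsigned long ips[TOY_SET_SIZE];
	size_t count;
};

/* Returns 1 if ip is already recorded, 0 otherwise. */
static int toy_lookup_ip(const struct toy_set *set, unsigned long ip)
{
	for (size_t i = 0; i < set->count; i++)
		if (set->ips[i] == ip)
			return 1;
	return 0;
}

/*
 * Add ip unless it is already present. Returns 0 on success (including
 * the "already exists" case) and -1 when the set is full.
 */
static int toy_add_ip(struct toy_set *set, unsigned long ip)
{
	assert(set != NULL);
	if (toy_lookup_ip(set, ip))
		return 0;		/* already exists: nothing to do */
	if (set->count == TOY_SET_SIZE)
		return -1;
	set->ips[set->count++] = ip;
	return 0;
}
```

Two subops tracing the same function then leave a single entry behind, which is the behaviour the patch description asks for.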
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cosmin Ratiu cratiu@nvidia.com
commit 4dbc1d1a9f39c3711ad2a40addca04d07d9ab5d0 upstream.
When profile rollback fails in mlx5e_netdev_change_profile, the netdev profile var is left set to NULL. Avoid a crash when unloading the driver by not calling profile->cleanup in such a case.
This was encountered while testing. The original trigger was that the wq rescuer thread creation got interrupted (presumably due to Ctrl+C-ing modprobe), which gets converted to ENOMEM (-12) by mlx5e_priv_init. The profile rollback also fails for the same reason (signal still active), so the profile is left as NULL, leading to a crash later in _mlx5e_remove.
[ 732.473932] mlx5_core 0000:08:00.1: E-Switch: Unload vfs: mode(OFFLOADS), nvfs(2), necvfs(0), active vports(2)
[ 734.525513] workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
[ 734.557372] mlx5_core 0000:08:00.1: mlx5e_netdev_init_profile:6235:(pid 6086): mlx5e_priv_init failed, err=-12
[ 734.559187] mlx5_core 0000:08:00.1 eth3: mlx5e_netdev_change_profile: new profile init failed, -12
[ 734.560153] workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
[ 734.589378] mlx5_core 0000:08:00.1: mlx5e_netdev_init_profile:6235:(pid 6086): mlx5e_priv_init failed, err=-12
[ 734.591136] mlx5_core 0000:08:00.1 eth3: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12
[ 745.537492] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 745.538222] #PF: supervisor read access in kernel mode
<snipped>
[ 745.551290] Call Trace:
[ 745.551590] <TASK>
[ 745.551866] ? __die+0x20/0x60
[ 745.552218] ? page_fault_oops+0x150/0x400
[ 745.555307] ? exc_page_fault+0x79/0x240
[ 745.555729] ? asm_exc_page_fault+0x22/0x30
[ 745.556166] ? mlx5e_remove+0x6b/0xb0 [mlx5_core]
[ 745.556698] auxiliary_bus_remove+0x18/0x30
[ 745.557134] device_release_driver_internal+0x1df/0x240
[ 745.557654] bus_remove_device+0xd7/0x140
[ 745.558075] device_del+0x15b/0x3c0
[ 745.558456] mlx5_rescan_drivers_locked.part.0+0xb1/0x2f0 [mlx5_core]
[ 745.559112] mlx5_unregister_device+0x34/0x50 [mlx5_core]
[ 745.559686] mlx5_uninit_one+0x46/0xf0 [mlx5_core]
[ 745.560203] remove_one+0x4e/0xd0 [mlx5_core]
[ 745.560694] pci_device_remove+0x39/0xa0
[ 745.561112] device_release_driver_internal+0x1df/0x240
[ 745.561631] driver_detach+0x47/0x90
[ 745.562022] bus_remove_driver+0x84/0x100
[ 745.562444] pci_unregister_driver+0x3b/0x90
[ 745.562890] mlx5_cleanup+0xc/0x1b [mlx5_core]
[ 745.563415] __x64_sys_delete_module+0x14d/0x2f0
[ 745.563886] ? kmem_cache_free+0x1b0/0x460
[ 745.564313] ? lockdep_hardirqs_on_prepare+0xe2/0x190
[ 745.564825] do_syscall_64+0x6d/0x140
[ 745.565223] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 745.565725] RIP: 0033:0x7f1579b1288b
Fixes: 3ef14e463f6e ("net/mlx5e: Separate between netdev objects and mlx5e profiles initialization") Signed-off-by: Cosmin Ratiu cratiu@nvidia.com Reviewed-by: Dragos Tatulea dtatulea@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Jianqi Ren jianqi.ren.cn@windriver.com Signed-off-by: He Zhe zhe.he@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -6110,7 +6110,9 @@ static void mlx5e_remove(struct auxiliar mlx5e_dcbnl_delete_app(priv); unregister_netdev(priv->netdev); mlx5e_suspend(adev, state); - priv->profile->cleanup(priv); + /* Avoid cleanup if profile rollback failed. */ + if (priv->profile) + priv->profile->cleanup(priv); mlx5e_destroy_netdev(priv); mlx5e_devlink_port_unregister(mlx5e_dev); mlx5e_destroy_devlink(mlx5e_dev);
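The guard in the hunk above can be reduced to the following userspace sketch. The toy_priv and toy_profile types are hypothetical and only mirror the shape of the problem: a callback table pointer that may have been left NULL by a failed rollback.

```c
#include <assert.h>
#include <stddef.h>

struct toy_priv;

/* Hypothetical callback table, playing the role of a netdev profile. */
struct toy_profile {
	void (*cleanup)(struct toy_priv *priv);
};

struct toy_priv {
	const struct toy_profile *profile;	/* NULL after a failed rollback */
	int cleaned;
};

static void toy_cleanup(struct toy_priv *priv)
{
	priv->cleaned++;
}

static const struct toy_profile toy_default_profile = {
	.cleanup = toy_cleanup,
};

/* Teardown path: only call into the profile if one is still attached. */
static void toy_remove(struct toy_priv *priv)
{
	assert(priv != NULL);
	if (priv->profile)
		priv->profile->cleanup(priv);
	/* the remaining teardown steps would continue here regardless */
}
```

The rest of the teardown still runs when the profile is missing; only the profile-specific cleanup is skipped.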
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Kuai yukuai3@huawei.com
commit f2d87a759f6841a132e845e2fafdad37385ddd30 upstream.
Commit ac619781967b ("md: use separate work_struct for md_start_sync()") uses a new sync_work to replace del_work. However, stop_sync_thread() and __md_stop_writes() were trying to wait for sync_thread to be done, hence they should switch to use sync_work as well.
Note that md_start_sync() from sync_work will grab 'reconfig_mutex', hence other contexts can't hold the same lock to flush work, and this will be fixed in later patches.
Fixes: ac619781967b ("md: use separate work_struct for md_start_sync()") Signed-off-by: Yu Kuai yukuai3@huawei.com Acked-by: Xiao Ni xni@redhat.com Signed-off-by: Song Liu song@kernel.org Link: https://lore.kernel.org/r/20231205094215.1824240-2-yukuai1@huaweicloud.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/md.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -4836,7 +4836,7 @@ static void stop_sync_thread(struct mdde return; }
- if (work_pending(&mddev->del_work)) + if (work_pending(&mddev->sync_work)) flush_workqueue(md_misc_wq);
set_bit(MD_RECOVERY_INTR, &mddev->recovery); @@ -6293,7 +6293,7 @@ static void md_clean(struct mddev *mddev static void __md_stop_writes(struct mddev *mddev) { set_bit(MD_RECOVERY_FROZEN, &mddev->recovery); - if (work_pending(&mddev->del_work)) + if (work_pending(&mddev->sync_work)) flush_workqueue(md_misc_wq); if (mddev->sync_thread) { set_bit(MD_RECOVERY_INTR, &mddev->recovery);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Kuai yukuai3@huawei.com
commit f9cfe7e7f96a9414a17d596e288693c4f2325d49 upstream.
Commit cf1b6d4441ff ("md: simplify md_seq_ops") introduced the following regressions:
1) If list all_mddevs is empty, personalities and unused devices won't be shown to the user anymore.
2) If the seq_file buffer overflows from md_seq_show(), then md_seq_start() will be called again, hence personalities will be shown to the user again.
3) If the seq_file buffer overflows from md_seq_stop(), seq_read_iter() doesn't handle this, hence unused devices won't be shown to the user.
Fix the above problems by printing personalities and unused devices in md_seq_show().
Fixes: cf1b6d4441ff ("md: simplify md_seq_ops") Cc: stable@vger.kernel.org # v6.7+ Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Song Liu song@kernel.org Link: https://lore.kernel.org/r/20240109133957.2975272-1-yukuai1@huaweicloud.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/md.c | 40 +++++++++++++++++++++++++++------------- 1 file changed, 27 insertions(+), 13 deletions(-)
--- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -8121,6 +8121,19 @@ static void status_unused(struct seq_fil seq_printf(seq, "\n"); }
+static void status_personalities(struct seq_file *seq) +{ + struct md_personality *pers; + + seq_puts(seq, "Personalities : "); + spin_lock(&pers_lock); + list_for_each_entry(pers, &pers_list, list) + seq_printf(seq, "[%s] ", pers->name); + + spin_unlock(&pers_lock); + seq_puts(seq, "\n"); +} + static int status_resync(struct seq_file *seq, struct mddev *mddev) { sector_t max_sectors, resync, res; @@ -8262,20 +8275,10 @@ static int status_resync(struct seq_file static void *md_seq_start(struct seq_file *seq, loff_t *pos) __acquires(&all_mddevs_lock) { - struct md_personality *pers; - - seq_puts(seq, "Personalities : "); - spin_lock(&pers_lock); - list_for_each_entry(pers, &pers_list, list) - seq_printf(seq, "[%s] ", pers->name); - - spin_unlock(&pers_lock); - seq_puts(seq, "\n"); seq->poll_event = atomic_read(&md_event_count); - spin_lock(&all_mddevs_lock);
- return seq_list_start(&all_mddevs, *pos); + return seq_list_start_head(&all_mddevs, *pos); }
static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos) @@ -8286,7 +8289,6 @@ static void *md_seq_next(struct seq_file static void md_seq_stop(struct seq_file *seq, void *v) __releases(&all_mddevs_lock) { - status_unused(seq); spin_unlock(&all_mddevs_lock); }
@@ -8319,10 +8321,18 @@ static void md_bitmap_status(struct seq_
static int md_seq_show(struct seq_file *seq, void *v) { - struct mddev *mddev = list_entry(v, struct mddev, all_mddevs); + struct mddev *mddev; sector_t sectors; struct md_rdev *rdev;
+ if (v == &all_mddevs) { + status_personalities(seq); + if (list_empty(&all_mddevs)) + status_unused(seq); + return 0; + } + + mddev = list_entry(v, struct mddev, all_mddevs); if (!mddev_get(mddev)) return 0;
@@ -8403,6 +8413,10 @@ static int md_seq_show(struct seq_file * spin_unlock(&mddev->lock); mutex_unlock(&mddev->bitmap_info.mutex); spin_lock(&all_mddevs_lock); + + if (mddev == list_last_entry(&all_mddevs, struct mddev, all_mddevs)) + status_unused(seq); + if (atomic_dec_and_test(&mddev->active)) __mddev_put(mddev);
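The resulting output logic can be modelled in userspace as below. This is a rough sketch, not the seq_file API: position 0 stands for the list head returned by seq_list_start_head(), positions 1..ndevs stand for the devices, and the strings and toy_* helpers are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one line to the NUL-terminated buffer 'out' of capacity 'outsz'. */
static void toy_append(char *out, size_t outsz, const char *line)
{
	size_t used = strlen(out);

	assert(used + strlen(line) < outsz);
	strcpy(out + used, line);
}

/* Emit the record at 'pos'; everything is printed from the show step. */
static void toy_show(char *out, size_t outsz, size_t pos, size_t ndevs)
{
	char line[32];

	if (pos == 0) {			/* the list head itself */
		toy_append(out, outsz, "Personalities\n");
		if (ndevs == 0)		/* empty list: footer goes here */
			toy_append(out, outsz, "unused devices\n");
		return;
	}

	snprintf(line, sizeof(line), "md%zu\n", pos - 1);
	toy_append(out, outsz, line);
	if (pos == ndevs)		/* last real entry prints the footer */
		toy_append(out, outsz, "unused devices\n");
}

/* Walk head plus all devices, as the iterator would. */
static void toy_render(char *out, size_t outsz, size_t ndevs)
{
	assert(outsz > 0);
	out[0] = '\0';
	for (size_t pos = 0; pos <= ndevs; pos++)
		toy_show(out, outsz, pos, ndevs);
}
```

Because header and footer are tied to specific records rather than to the start and stop callbacks, each appears exactly once, including for an empty device list.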
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tianling Shen cnsztl@gmail.com
commit a6a7cba17c544fb95d5a29ab9d9ed4503029cb29 upstream.
In general the delay should be added by the PHY instead of the MAC, and this improves network stability on some boards which seem to need a different delay.
Fixes: 387b3bbac5ea ("arm64: dts: rockchip: Add Xunlong OrangePi R1 Plus LTS") Cc: stable@vger.kernel.org # 6.6+ Signed-off-by: Tianling Shen cnsztl@gmail.com Link: https://lore.kernel.org/r/20250119091154.1110762-1-cnsztl@gmail.com Signed-off-by: Heiko Stuebner heiko@sntech.de [Fix conflicts due to missing dtsi conversion] Signed-off-by: Tianling Shen cnsztl@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus-lts.dts | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus-lts.dts +++ b/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus-lts.dts @@ -15,9 +15,11 @@ };
&gmac2io { + /delete-property/ tx_delay; + /delete-property/ rx_delay; + phy-handle = <&yt8531c>; - tx_delay = <0x19>; - rx_delay = <0x05>; + phy-mode = "rgmii-id";
mdio { /delete-node/ ethernet-phy@1;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kan Liang kan.liang@linux.intel.com
commit 47a973fd75639fe80d59f9e1860113bb2a0b112b upstream.
The EAX of the CPUID Leaf 023H enumerates the mask of valid sub-leaves. To tell the availability of the sub-leaf 1 (enumerate the counter mask), perf should check the bit 1 (0x2) of EAX, rather than bit 0 (0x1).
The error is not user-visible on bare metal, because the sub-leaf 0 and the sub-leaf 1 are always available. However, it may bring issues in a virtualization environment when a VMM only enumerates the sub-leaf 0.
Introduce the cpuid35_e?x to replace the macros, which makes the implementation style consistent.
Fixes: eb467aaac21e ("perf/x86/intel: Support Architectural PerfMon Extension leaf") Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20250129154820.3755948-3-kan.liang@linux.intel.com [ The patch is not exactly the same as the upstream patch. Because in the 6.6 stable kernel, the umask2/eq enumeration is not supported. The number of counters is used rather than the counter mask. But the change is straightforward, which utilizes the structured union to replace the macros when parsing the CPUID enumeration. It also fixed a wrong macros. ] Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/intel/core.c | 17 ++++++++++------- arch/x86/include/asm/perf_event.h | 26 +++++++++++++++++++++++++- 2 files changed, 35 insertions(+), 8 deletions(-)
--- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -4643,16 +4643,19 @@ static void intel_pmu_check_num_counters
static void update_pmu_cap(struct x86_hybrid_pmu *pmu) { - unsigned int sub_bitmaps = cpuid_eax(ARCH_PERFMON_EXT_LEAF); - unsigned int eax, ebx, ecx, edx; + unsigned int cntr, fixed_cntr, ecx, edx; + union cpuid35_eax eax; + union cpuid35_ebx ebx;
- if (sub_bitmaps & ARCH_PERFMON_NUM_COUNTER_LEAF_BIT) { + cpuid(ARCH_PERFMON_EXT_LEAF, &eax.full, &ebx.full, &ecx, &edx); + + if (eax.split.cntr_subleaf) { cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_NUM_COUNTER_LEAF, - &eax, &ebx, &ecx, &edx); - pmu->num_counters = fls(eax); - pmu->num_counters_fixed = fls(ebx); + &cntr, &fixed_cntr, &ecx, &edx); + pmu->num_counters = fls(cntr); + pmu->num_counters_fixed = fls(fixed_cntr); intel_pmu_check_num_counters(&pmu->num_counters, &pmu->num_counters_fixed, - &pmu->intel_ctrl, ebx); + &pmu->intel_ctrl, fixed_cntr); } }
--- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -177,9 +177,33 @@ union cpuid10_edx { * detection/enumeration details: */ #define ARCH_PERFMON_EXT_LEAF 0x00000023 -#define ARCH_PERFMON_NUM_COUNTER_LEAF_BIT 0x1 #define ARCH_PERFMON_NUM_COUNTER_LEAF 0x1
+union cpuid35_eax { + struct { + unsigned int leaf0:1; + /* Counters Sub-Leaf */ + unsigned int cntr_subleaf:1; + /* Auto Counter Reload Sub-Leaf */ + unsigned int acr_subleaf:1; + /* Events Sub-Leaf */ + unsigned int events_subleaf:1; + unsigned int reserved:28; + } split; + unsigned int full; +}; + +union cpuid35_ebx { + struct { + /* UnitMask2 Supported */ + unsigned int umask2:1; + /* EQ-bit Supported */ + unsigned int eq:1; + unsigned int reserved:30; + } split; + unsigned int full; +}; + /* * Intel Architectural LBR CPUID detection/enumeration details: */
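The difference between the old and new check can be shown with plain bit masks. This is a sketch only: the TOY_* macros and helper names are hypothetical, and explicit masks are used instead of the bitfield union from the patch so that the example does not depend on compiler bitfield layout.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of the EAX sub-leaf mask as described in the commit message. */
#define TOY_LEAF0_BIT		(1u << 0)	/* sub-leaf 0 is valid */
#define TOY_CNTR_SUBLEAF_BIT	(1u << 1)	/* sub-leaf 1 (counters) is valid */

/* Old check: tests bit 0, i.e. the wrong sub-leaf. */
static int has_cntr_subleaf_old(uint32_t eax)
{
	return (eax & TOY_LEAF0_BIT) != 0;
}

/* Fixed check: tests bit 1, which describes sub-leaf 1. */
static int has_cntr_subleaf_new(uint32_t eax)
{
	return (eax & TOY_CNTR_SUBLEAF_BIT) != 0;
}
```

For a value of 0x3 (both sub-leaves present, as on bare metal) the two checks agree; for 0x1 (a VMM exposing only sub-leaf 0) the old check wrongly reports the counters sub-leaf as available.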
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit 584db20c181f5e28c0386d7987406ace7fbd3e49 upstream.
Patch series "nilfs2: Folio conversions for directory paths".
This series applies page->folio conversions to nilfs2 directory operations. This reduces hidden compound_head() calls and also converts deprecated kmap calls to kmap_local in the directory code.
Although nilfs2 does not yet support large folios, Matthew has done his best here to include support for large folios, which will be needed for devices with large block sizes.
This series corresponds to the second half of the original post [1], but with two complementary patches inserted at the beginning and some adjustments, to prevent a kmap_local constraint violation found during testing with highmem mapping.
[1] https://lkml.kernel.org/r/20231106173903.1734114-1-willy@infradead.org
I have reviewed all changes and tested this for regular and small block sizes, both on machines with and without highmem mapping. No issues found.
This patch (of 17):
In a few directory operations, the call to nilfs_put_page() for a page obtained using nilfs_find_entry() or nilfs_dotdot() is hidden in nilfs_set_link() and nilfs_delete_entry(), making it difficult to track page release and preventing change of its call position.
By moving nilfs_put_page() out of these functions, this makes the page get/put correspondence clearer and makes it easier to swap nilfs_put_page() calls (and kunmap calls within them) when modifying multiple directory entries simultaneously in nilfs_rename().
Also, update comments for nilfs_set_link() and nilfs_delete_entry() to reflect changes in their behavior.
To make nilfs_put_page() visible from namei.c, this moves its definition to nilfs.h and replaces existing equivalents to use it, but the exposure of that definition is temporary and will be removed on a later kmap -> kmap_local conversion.
Link: https://lkml.kernel.org/r/20231127143036.2425-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20231127143036.2425-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Stable-dep-of: ee70999a988b ("nilfs2: handle errors that nilfs_prepare_chunk() may return") Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/dir.c | 11 +---------- fs/nilfs2/namei.c | 13 +++++++------ fs/nilfs2/nilfs.h | 6 ++++++ 3 files changed, 14 insertions(+), 16 deletions(-)
--- a/fs/nilfs2/dir.c +++ b/fs/nilfs2/dir.c @@ -64,12 +64,6 @@ static inline unsigned int nilfs_chunk_s return inode->i_sb->s_blocksize; }
-static inline void nilfs_put_page(struct page *page) -{ - kunmap(page); - put_page(page); -} - /* * Return the offset into page `page_nr' of the last valid * byte in that page, plus one. @@ -450,7 +444,6 @@ int nilfs_inode_by_name(struct inode *di return 0; }
-/* Releases the page */ void nilfs_set_link(struct inode *dir, struct nilfs_dir_entry *de, struct page *page, struct inode *inode) { @@ -465,7 +458,6 @@ void nilfs_set_link(struct inode *dir, s de->inode = cpu_to_le64(inode->i_ino); nilfs_set_de_type(de, inode); nilfs_commit_chunk(page, mapping, from, to); - nilfs_put_page(page); dir->i_mtime = inode_set_ctime_current(dir); }
@@ -569,7 +561,7 @@ out_unlock:
/* * nilfs_delete_entry deletes a directory entry by merging it with the - * previous entry. Page is up-to-date. Releases the page. + * previous entry. Page is up-to-date. */ int nilfs_delete_entry(struct nilfs_dir_entry *dir, struct page *page) { @@ -605,7 +597,6 @@ int nilfs_delete_entry(struct nilfs_dir_ nilfs_commit_chunk(page, mapping, from, to); inode->i_mtime = inode_set_ctime_current(inode); out: - nilfs_put_page(page); return err; }
--- a/fs/nilfs2/namei.c +++ b/fs/nilfs2/namei.c @@ -297,6 +297,7 @@ static int nilfs_do_unlink(struct inode set_nlink(inode, 1); } err = nilfs_delete_entry(de, page); + nilfs_put_page(page); if (err) goto out;
@@ -406,6 +407,7 @@ static int nilfs_rename(struct mnt_idmap goto out_dir; } nilfs_set_link(new_dir, new_de, new_page, old_inode); + nilfs_put_page(new_page); nilfs_mark_inode_dirty(new_dir); inode_set_ctime_current(new_inode); if (dir_de) @@ -429,9 +431,11 @@ static int nilfs_rename(struct mnt_idmap inode_set_ctime_current(old_inode);
nilfs_delete_entry(old_de, old_page); + nilfs_put_page(old_page);
if (dir_de) { nilfs_set_link(old_inode, dir_de, dir_page, new_dir); + nilfs_put_page(dir_page); drop_nlink(old_dir); } nilfs_mark_inode_dirty(old_dir); @@ -441,13 +445,10 @@ static int nilfs_rename(struct mnt_idmap return err;
out_dir: - if (dir_de) { - kunmap(dir_page); - put_page(dir_page); - } + if (dir_de) + nilfs_put_page(dir_page); out_old: - kunmap(old_page); - put_page(old_page); + nilfs_put_page(old_page); out: nilfs_transaction_abort(old_dir->i_sb); return err; --- a/fs/nilfs2/nilfs.h +++ b/fs/nilfs2/nilfs.h @@ -243,6 +243,12 @@ extern struct nilfs_dir_entry *nilfs_dot extern void nilfs_set_link(struct inode *, struct nilfs_dir_entry *, struct page *, struct inode *);
+static inline void nilfs_put_page(struct page *page) +{ + kunmap(page); + put_page(page); +} + /* file.c */ extern int nilfs_sync_file(struct file *, loff_t, loff_t, int);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit 8cf57c6df818f58fdad16a909506be213623a88e upstream.
In nilfs_rename(), calls to nilfs_put_page() to release pages obtained with nilfs_find_entry() or nilfs_dotdot() are alternated in the normal path.
When replacing the kernel memory mapping method from kmap to kmap_local_{page,folio}, this violates the constraint on the calling order of kunmap_local().
Swap the order of nilfs_put_page calls where the kmap sections of multiple pages overlap so that they are nested, allowing direct replacement of nilfs_put_page() -> unmap_and_put_page().
Without this reordering, that replacement will cause a kernel WARNING in kunmap_local_indexed() on architectures with high memory mapping.
Link: https://lkml.kernel.org/r/20231127143036.2425-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Stable-dep-of: ee70999a988b ("nilfs2: handle errors that nilfs_prepare_chunk() may return") Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/namei.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/nilfs2/namei.c +++ b/fs/nilfs2/namei.c @@ -431,13 +431,14 @@ static int nilfs_rename(struct mnt_idmap inode_set_ctime_current(old_inode);
nilfs_delete_entry(old_de, old_page); - nilfs_put_page(old_page);
if (dir_de) { nilfs_set_link(old_inode, dir_de, dir_page, new_dir); nilfs_put_page(dir_page); drop_nlink(old_dir); } + nilfs_put_page(old_page); + nilfs_mark_inode_dirty(old_dir); nilfs_mark_inode_dirty(old_inode);
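The ordering constraint behind this reordering can be modelled as a stack. The sketch below is an assumption-laden toy, not the kmap_local implementation: toy_map()/toy_unmap() are hypothetical and only enforce that the most recent mapping is released first.

```c
#include <assert.h>

#define TOY_MAX_MAPS 8

/* Hypothetical model of stack-like local mappings. */
struct toy_kmap_stack {
	int slots[TOY_MAX_MAPS];
	int depth;
};

static void toy_map(struct toy_kmap_stack *s, int page_id)
{
	assert(s->depth < TOY_MAX_MAPS);
	s->slots[s->depth++] = page_id;
}

/*
 * Returns 0 if page_id is the most recent mapping and releases it,
 * or -1 for an out-of-order unmap (where the kernel would warn).
 */
static int toy_unmap(struct toy_kmap_stack *s, int page_id)
{
	if (s->depth == 0 || s->slots[s->depth - 1] != page_id)
		return -1;
	s->depth--;
	return 0;
}
```

In terms of the patch: old_page is mapped before dir_page, so dir_page has to be released first and old_page afterwards for the sections to nest.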
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit ee70999a988b8abc3490609142f50ebaa8344432 upstream.
Patch series "nilfs2: fix issues with rename operations".
This series fixes BUG_ON check failures reported by syzbot around rename operations, and a minor behavioral issue where the mtime of a child directory changes when it is renamed instead of moved.
This patch (of 2):
The directory manipulation routines nilfs_set_link() and nilfs_delete_entry() rewrite the directory entry in the folio/page previously read by nilfs_find_entry(), so error handling is omitted on the assumption that nilfs_prepare_chunk(), which prepares the buffer for rewriting, will always succeed for these. And if an error is returned, it triggers the legacy BUG_ON() checks in each routine.
This assumption is wrong, as proven by syzbot: the buffer layer called by nilfs_prepare_chunk() may call nilfs_get_block() if necessary, which may fail due to metadata corruption or other reasons. This has been there all along, but improved sanity checks and error handling may have made it more reproducible in fuzzing tests.
Fix this issue by adding missing error paths in nilfs_set_link(), nilfs_delete_entry(), and their caller nilfs_rename().
[konishi.ryusuke@gmail.com: adjusted for page/folio conversion] Link: https://lkml.kernel.org/r/20250111143518.7901-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20250111143518.7901-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reported-by: syzbot+32c3706ebf5d95046ea1@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=32c3706ebf5d95046ea1 Reported-by: syzbot+1097e95f134f37d9395c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1097e95f134f37d9395c Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations") Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/dir.c | 13 ++++++++++--- fs/nilfs2/namei.c | 29 +++++++++++++++-------------- fs/nilfs2/nilfs.h | 4 ++-- 3 files changed, 27 insertions(+), 19 deletions(-)
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -444,7 +444,7 @@ int nilfs_inode_by_name(struct inode *di
 	return 0;
 }

-void nilfs_set_link(struct inode *dir, struct nilfs_dir_entry *de,
+int nilfs_set_link(struct inode *dir, struct nilfs_dir_entry *de,
 		    struct page *page, struct inode *inode)
 {
 	unsigned int from = (char *)de - (char *)page_address(page);
@@ -454,11 +454,15 @@ void nilfs_set_link(struct inode *dir, s

 	lock_page(page);
 	err = nilfs_prepare_chunk(page, from, to);
-	BUG_ON(err);
+	if (unlikely(err)) {
+		unlock_page(page);
+		return err;
+	}
 	de->inode = cpu_to_le64(inode->i_ino);
 	nilfs_set_de_type(de, inode);
 	nilfs_commit_chunk(page, mapping, from, to);
 	dir->i_mtime = inode_set_ctime_current(dir);
+	return 0;
 }

 /*
@@ -590,7 +594,10 @@ int nilfs_delete_entry(struct nilfs_dir_
 		from = (char *)pde - (char *)page_address(page);
 	lock_page(page);
 	err = nilfs_prepare_chunk(page, from, to);
-	BUG_ON(err);
+	if (unlikely(err)) {
+		unlock_page(page);
+		goto out;
+	}
 	if (pde)
 		pde->rec_len = nilfs_rec_len_to_disk(to - from);
 	dir->inode = 0;
--- a/fs/nilfs2/namei.c
+++ b/fs/nilfs2/namei.c
@@ -406,8 +406,10 @@ static int nilfs_rename(struct mnt_idmap
 			err = PTR_ERR(new_de);
 			goto out_dir;
 		}
-		nilfs_set_link(new_dir, new_de, new_page, old_inode);
+		err = nilfs_set_link(new_dir, new_de, new_page, old_inode);
 		nilfs_put_page(new_page);
+		if (unlikely(err))
+			goto out_dir;
 		nilfs_mark_inode_dirty(new_dir);
 		inode_set_ctime_current(new_inode);
 		if (dir_de)
@@ -430,28 +432,27 @@ static int nilfs_rename(struct mnt_idmap
 	 */
 	inode_set_ctime_current(old_inode);

-	nilfs_delete_entry(old_de, old_page);
-
-	if (dir_de) {
-		nilfs_set_link(old_inode, dir_de, dir_page, new_dir);
-		nilfs_put_page(dir_page);
-		drop_nlink(old_dir);
+	err = nilfs_delete_entry(old_de, old_page);
+	if (likely(!err)) {
+		if (dir_de) {
+			err = nilfs_set_link(old_inode, dir_de, dir_page,
+					     new_dir);
+			drop_nlink(old_dir);
+		}
+		nilfs_mark_inode_dirty(old_dir);
 	}
-	nilfs_put_page(old_page);
-
-	nilfs_mark_inode_dirty(old_dir);
 	nilfs_mark_inode_dirty(old_inode);

-	err = nilfs_transaction_commit(old_dir->i_sb);
-	return err;
-
 out_dir:
 	if (dir_de)
 		nilfs_put_page(dir_page);
 out_old:
 	nilfs_put_page(old_page);
 out:
-	nilfs_transaction_abort(old_dir->i_sb);
+	if (likely(!err))
+		err = nilfs_transaction_commit(old_dir->i_sb);
+	else
+		nilfs_transaction_abort(old_dir->i_sb);
 	return err;
 }

--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -240,8 +240,8 @@ nilfs_find_entry(struct inode *, const s
 extern int nilfs_delete_entry(struct nilfs_dir_entry *, struct page *);
 extern int nilfs_empty_dir(struct inode *);
 extern struct nilfs_dir_entry *nilfs_dotdot(struct inode *, struct page **);
-extern void nilfs_set_link(struct inode *, struct nilfs_dir_entry *,
-			   struct page *, struct inode *);
+int nilfs_set_link(struct inode *dir, struct nilfs_dir_entry *de,
+		   struct page *page, struct inode *inode);

 static inline void nilfs_put_page(struct page *page)
 {
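
For readers skimming the nilfs2 patch above, the control-flow change in the tail of nilfs_rename() can be modelled in user space. This is only a rough sketch under stated assumptions: struct txn, delete_entry(), set_link() and rename_tail() are hypothetical stand-ins, not the nilfs2 API, and the page handling is reduced to a comment. It shows the single exit path that commits the transaction on success and aborts it on error, instead of crashing via BUG_ON().

```c
#include <stdio.h>

/* Hypothetical transaction state; not a nilfs2 structure. */
struct txn {
	int committed;
	int aborted;
};

/* Hypothetical stand-ins that can be told to fail with an -EIO-like code. */
static int delete_entry(int fail)
{
	return fail ? -5 : 0;
}

static int set_link(int fail)
{
	return fail ? -5 : 0;
}

/*
 * Model of the reworked tail of the rename path: errors are propagated,
 * and every path reaches one exit that either commits or aborts.
 */
static int rename_tail(struct txn *t, int has_dir_de,
		       int fail_delete, int fail_set_link)
{
	int err;

	err = delete_entry(fail_delete);
	if (!err) {
		if (has_dir_de)
			err = set_link(fail_set_link);
		/* the real code marks old_dir dirty at this point */
	}

	/* pages would be released here, on success and on failure alike */

	if (!err)
		t->committed = 1;
	else
		t->aborted = 1;
	return err;
}
```

Calling rename_tail() with fail_delete or fail_set_link set to a non-zero value leaves the model transaction aborted and returns the error to the caller, which is the behaviour the patch substitutes for the former BUG_ON().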
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Patrick Bellasi derkling@google.com
commit 318e8c339c9a0891c389298bb328ed0762a9935e upstream.
In [1] the meaning of the synthetic IBPB flags has been redefined for a better separation of concerns:
 - ENTRY_IBPB     -- issue IBPB on entry only
 - IBPB_ON_VMEXIT -- issue IBPB on VM-Exit only
and the Retbleed mitigations have been updated to match these new semantics.
Commit [2] was merged shortly before [1], and their interaction was not handled properly. As a result, IBPB was not triggered on VM-Exit in any of the SRSO mitigation configs that request an IBPB there.
Specifically, an IBPB on VM-Exit is triggered only when X86_FEATURE_IBPB_ON_VMEXIT is set. However:
- X86_FEATURE_IBPB_ON_VMEXIT is not set for "spec_rstack_overflow=ibpb", because before [1] having X86_FEATURE_ENTRY_IBPB was enough. Hence, an IBPB is triggered on entry but the expected IBPB on VM-exit is not.
- X86_FEATURE_IBPB_ON_VMEXIT is also not set for "spec_rstack_overflow=ibpb-vmexit" if X86_FEATURE_ENTRY_IBPB is already set.
That's because before [1] this was effectively redundant. Hence, e.g. a "retbleed=ibpb spec_rstack_overflow=ibpb-vmexit" config mistakenly reports the machine as still vulnerable to SRSO, even though the selected Retbleed mitigation triggers an IBPB on both entry and VM-Exit.
- UNTRAIN_RET_VM still won't actually do anything unless CONFIG_MITIGATION_IBPB_ENTRY is set.
For "spec_rstack_overflow=ibpb", enable IBPB on both entry and VM-Exit and clear X86_FEATURE_RSB_VMEXIT, which is made superfluous by X86_FEATURE_IBPB_ON_VMEXIT. This effectively makes this mitigation option similar to the one for 'retbleed=ibpb', so re-order the code for the RETBLEED_MITIGATION_IBPB option to be less confusing: all features are enabled first, and the unneeded ones are disabled afterwards.
For "spec_rstack_overflow=ibpb-vmexit", guard this mitigation setting with CONFIG_MITIGATION_IBPB_ENTRY to ensure the UNTRAIN_RET_VM sequence is actually compiled in. Drop the CONFIG_MITIGATION_SRSO guard instead, since none of the SRSO compile cruft is required in this configuration. Also, check only that the required microcode is present before enabling the IBPB on VM-Exit.
Finally, update the Kconfig description for CONFIG_MITIGATION_IBPB_ENTRY to also list all SRSO config settings enabled by this guard.
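
The flag selection described above can be compared before and after the fix with a small user-space model. This is a rough sketch, not the code in bugs.c: the F_* bit values, enum srso_cmd, select_old(), select_new() and ibpb_on_vmexit() are hypothetical names chosen for illustration, and the Kconfig guards and the UNRET/RETHUNK/RSB_VMEXIT clearing are left out on purpose.

```c
#include <stdbool.h>

/* Hypothetical bit values; the real X86_FEATURE_* numbers differ. */
enum {
	F_ENTRY_IBPB     = 1 << 0,
	F_IBPB_ON_VMEXIT = 1 << 1,
};

/* Models spec_rstack_overflow=ibpb and spec_rstack_overflow=ibpb-vmexit. */
enum srso_cmd {
	CMD_IBPB,
	CMD_IBPB_ON_VMEXIT,
};

/* Selection before the fix, reduced to the two flags discussed above. */
static unsigned int select_old(enum srso_cmd cmd, unsigned int flags,
			       bool has_microcode)
{
	if (!has_microcode)
		return flags;
	if (cmd == CMD_IBPB)
		flags |= F_ENTRY_IBPB;		/* VM-Exit flag never set */
	else if (!(flags & F_ENTRY_IBPB))
		flags |= F_IBPB_ON_VMEXIT;	/* skipped if ENTRY_IBPB is set */
	return flags;
}

/* Selection after the fix: both options set the VM-Exit flag. */
static unsigned int select_new(enum srso_cmd cmd, unsigned int flags,
			       bool has_microcode)
{
	if (!has_microcode)
		return flags;
	if (cmd == CMD_IBPB)
		flags |= F_ENTRY_IBPB | F_IBPB_ON_VMEXIT;
	else
		flags |= F_IBPB_ON_VMEXIT;
	return flags;
}

/* The VM-Exit path issues an IBPB only when this flag is set. */
static bool ibpb_on_vmexit(unsigned int flags)
{
	return (flags & F_IBPB_ON_VMEXIT) != 0;
}
```

In this model, ibpb_on_vmexit(select_old(CMD_IBPB, 0, true)) is false while the same call with select_new() is true, which mirrors the missing IBPB on VM-Exit that the patch addresses.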
Fixes: 864bcaa38ee4 ("x86/cpu/kvm: Provide UNTRAIN_RET_VM") [1]
Fixes: d893832d0e1e ("x86/srso: Add IBPB on VMEXIT") [2]
Reported-by: Yosry Ahmed yosryahmed@google.com
Signed-off-by: Patrick Bellasi derkling@google.com
Reviewed-by: Borislav Petkov (AMD) bp@alien8.de
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/Kconfig           |    3 ++-
 arch/x86/kernel/cpu/bugs.c |   21 ++++++++++++++-------
 2 files changed, 16 insertions(+), 8 deletions(-)
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2514,7 +2514,8 @@ config CPU_IBPB_ENTRY
 	depends on CPU_SUP_AMD && X86_64
 	default y
 	help
-	  Compile the kernel with support for the retbleed=ibpb mitigation.
+	  Compile the kernel with support for the retbleed=ibpb and
+	  spec_rstack_overflow={ibpb,ibpb-vmexit} mitigations.

 config CPU_IBRS_ENTRY
 	bool "Enable IBRS on kernel entry"
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1113,6 +1113,8 @@ do_cmd_auto:

 	case RETBLEED_MITIGATION_IBPB:
 		setup_force_cpu_cap(X86_FEATURE_ENTRY_IBPB);
+		setup_force_cpu_cap(X86_FEATURE_IBPB_ON_VMEXIT);
+		mitigate_smt = true;

 		/*
 		 * IBPB on entry already obviates the need for
@@ -1122,9 +1124,6 @@ do_cmd_auto:
 		setup_clear_cpu_cap(X86_FEATURE_UNRET);
 		setup_clear_cpu_cap(X86_FEATURE_RETHUNK);

-		setup_force_cpu_cap(X86_FEATURE_IBPB_ON_VMEXIT);
-		mitigate_smt = true;
-
 		/*
 		 * There is no need for RSB filling: entry_ibpb() ensures
 		 * all predictions, including the RSB, are invalidated,
@@ -2626,6 +2625,7 @@ static void __init srso_select_mitigatio
 		if (IS_ENABLED(CONFIG_CPU_IBPB_ENTRY)) {
 			if (has_microcode) {
 				setup_force_cpu_cap(X86_FEATURE_ENTRY_IBPB);
+				setup_force_cpu_cap(X86_FEATURE_IBPB_ON_VMEXIT);
 				srso_mitigation = SRSO_MITIGATION_IBPB;

 				/*
@@ -2635,6 +2635,13 @@ static void __init srso_select_mitigatio
 				 */
 				setup_clear_cpu_cap(X86_FEATURE_UNRET);
 				setup_clear_cpu_cap(X86_FEATURE_RETHUNK);
+
+				/*
+				 * There is no need for RSB filling: entry_ibpb() ensures
+				 * all predictions, including the RSB, are invalidated,
+				 * regardless of IBPB implementation.
+				 */
+				setup_clear_cpu_cap(X86_FEATURE_RSB_VMEXIT);
 			}
 		} else {
 			pr_err("WARNING: kernel not compiled with CPU_IBPB_ENTRY.\n");
@@ -2643,8 +2650,8 @@ static void __init srso_select_mitigatio
 		break;

 	case SRSO_CMD_IBPB_ON_VMEXIT:
-		if (IS_ENABLED(CONFIG_CPU_SRSO)) {
-			if (!boot_cpu_has(X86_FEATURE_ENTRY_IBPB) && has_microcode) {
+		if (IS_ENABLED(CONFIG_CPU_IBPB_ENTRY)) {
+			if (has_microcode) {
 				setup_force_cpu_cap(X86_FEATURE_IBPB_ON_VMEXIT);
 				srso_mitigation = SRSO_MITIGATION_IBPB_ON_VMEXIT;

@@ -2656,9 +2663,9 @@ static void __init srso_select_mitigatio
 				setup_clear_cpu_cap(X86_FEATURE_RSB_VMEXIT);
 			}
 		} else {
-			pr_err("WARNING: kernel not compiled with CPU_SRSO.\n");
+			pr_err("WARNING: kernel not compiled with CPU_IBPB_ENTRY.\n");
 			goto pred_cmd;
-                }
+		}
 		break;

 	default:
On 2/24/2025 6:33 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.80 release. There are 140 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 26 Feb 2025 14:25:29 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.80-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com