This is the start of the stable review cycle for the 6.0.8 release. There are 197 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.0.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.0.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.0.8-rc1
Dokyung Song dokyung.song@gmail.com wifi: brcmfmac: Fix potential buffer overflow in brcmf_fweh_event_worker()
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915/sdvo: Setup DDC fully before output init
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915/sdvo: Filter out invalid outputs more sensibly
Leo Chen sancchen@amd.com drm/amd/display: Update DSC capabilitie for DCN314
Dillon Varone Dillon.Varone@amd.com drm/amd/display: Update latencies on DCN321
Graham Sider Graham.Sider@amd.com drm/amdgpu: disable GFXOFF during compute for GFX11
Brian Norris briannorris@chromium.org drm/rockchip: dsi: Force synchronous probe
Brian Norris briannorris@chromium.org drm/rockchip: dsi: Clean up 'usage_mode' when failing to attach
Ronnie Sahlberg lsahlber@redhat.com cifs: fix regression in very old smb1 mounts
Matthew Wilcox (Oracle) willy@infradead.org ext4,f2fs: fix readahead of verity data
Maxim Levitsky mlevitsk@redhat.com KVM: x86: emulator: update the emulation mode after CR0 write
Maxim Levitsky mlevitsk@redhat.com KVM: x86: emulator: update the emulation mode after rsm
Maxim Levitsky mlevitsk@redhat.com KVM: x86: emulator: introduce emulator_recalc_and_set_mode
Maxim Levitsky mlevitsk@redhat.com KVM: x86: emulator: em_sysexit should update ctxt->mode
Maxim Levitsky mlevitsk@redhat.com KVM: x86: smm: number of GPRs in the SMRAM image depends on the image format
Marc Zyngier maz@kernel.org KVM: arm64: Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE
Ryan Roberts ryan.roberts@arm.com KVM: arm64: Fix bad dereference on MTE-enabled systems
Sean Christopherson seanjc@google.com KVM: Reject attempts to consume or refresh inactive gfn_to_pfn_cache
Michal Luczaj mhal@rbox.co KVM: Initialize gfn_to_pfn_cache locks in dedicated helper
Emanuele Giuseppe Esposito eesposit@redhat.com KVM: VMX: fully disable SGX if SECONDARY_EXEC_ENCLS_EXITING unavailable
Sean Christopherson seanjc@google.com KVM: VMX: Ignore guest CPUID for host userspace writes to DEBUGCTL
Sean Christopherson seanjc@google.com KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()
Sean Christopherson seanjc@google.com KVM: VMX: Advertise PMU LBRs if and only if perf supports LBRs
Jim Mattson jmattson@google.com KVM: x86: Mask off reserved bits in CPUID.8000001FH
Jim Mattson jmattson@google.com KVM: x86: Mask off reserved bits in CPUID.80000001H
Jim Mattson jmattson@google.com KVM: x86: Mask off reserved bits in CPUID.80000008H
Jim Mattson jmattson@google.com KVM: x86: Mask off reserved bits in CPUID.8000001AH
Jim Mattson jmattson@google.com KVM: x86: Mask off reserved bits in CPUID.80000006H
Jiri Olsa olsajiri@gmail.com x86/syscall: Include asm/ptrace.h in syscall_wrapper header
Kirill A. Shutemov kirill.shutemov@linux.intel.com x86/tdx: Panic on bad configs that #VE on "private" memory access
Dave Hansen dave.hansen@linux.intel.com x86/tdx: Prepare for using "INFO" call for a second purpose
Theodore Ts'o tytso@mit.edu ext4: update the backup superblock's at the end of the online resize
Luís Henriques lhenriques@suse.de ext4: fix BUG_ON() when directory entry has invalid rec_len
Ye Bin yebin10@huawei.com ext4: fix warning in 'ext4_da_release_space'
Helge Deller deller@gmx.de parisc: Avoid printing the hardware path twice
Helge Deller deller@gmx.de parisc: Export iosapic_serial_irq() symbol for serial port driver
Helge Deller deller@gmx.de parisc: Make 8250_gsc driver dependend on CONFIG_PARISC
Stefan Metzmacher metze@samba.org net: also flag accepted sockets supporting msghdr originated zerocopy
Pavel Begunkov asml.silence@gmail.com net: remove SOCK_SUPPORT_ZC from sockmap
Kan Liang kan.liang@linux.intel.com perf/x86/intel: Fix pebs event constraints for SPR
Kan Liang kan.liang@linux.intel.com perf/x86/intel: Add Cooper Lake stepping to isolation_ucodes[]
Kan Liang kan.liang@linux.intel.com perf/x86/intel: Fix pebs event constraints for ICL
Petr Benes petr.benes@ysoft.com ARM: dts: imx6dl-yapp4: Do not allow PM to switch PU regulator off on Q/QP
Mark Rutland mark.rutland@arm.com arm64: entry: avoid kprobe recursion
Pavel Begunkov asml.silence@gmail.com net/ulp: remove SOCK_SUPPORT_ZC from tls sockets
Ard Biesheuvel ardb@kernel.org efi: efivars: Fix variable writes with unsupported query_variable_store()
Ard Biesheuvel ardb@kernel.org efi: random: Use 'ACPI reclaim' memory for random seed
Ard Biesheuvel ardb@kernel.org efi: random: reduce seed size to 32 bytes
Mickaël Salaün mic@digikod.net selftests/landlock: Build without static libraries
Miklos Szeredi mszeredi@redhat.com fuse: fix readdir cache race
Miklos Szeredi mszeredi@redhat.com fuse: add file_modified() to fallocate
Gaosheng Cui cuigaosheng1@huawei.com capabilities: fix potential memleak on error path from vfs_getxattr_alloc()
Zheng Yejian zhengyejian1@huawei.com tracing/histogram: Update document for KEYS_MAX size
Rasmus Villemoes linux@rasmusvillemoes.dk tools/nolibc/string: Fix memcmp() implementation
Steven Rostedt (Google) rostedt@goodmis.org ring-buffer: Check for NULL cpu_buffer in ring_buffer_wake_waiters()
Li Qiang liq3ea@163.com kprobe: reverse kp->flags when arm_kprobe failed
Shang XiaoJing shangxiaojing@huawei.com tracing: kprobe: Fix memory leak in test_gen_kprobe/kretprobe_cmd()
Rafael Mendonca rafaelmendsr@gmail.com fprobe: Check rethook_alloc() return in rethook initialization
Masami Hiramatsu (Google) mhiramat@kernel.org tracing/fprobe: Fix to check whether fprobe is registered correctly
Li Huafei lihuafei1@huawei.com ftrace: Fix use-after-free for dynamic ftrace_ops
Dan Williams dan.j.williams@intel.com cxl/region: Fix 'distance' calculation with passthrough ports
Dan Williams dan.j.williams@intel.com cxl/region: Fix cxl_region leak, cleanup targets at region delete
Dan Williams dan.j.williams@intel.com cxl/region: Fix region HPA ordering validation
Vishal Verma vishal.l.verma@intel.com cxl/region: Fix decoder allocation crash
Dan Williams dan.j.williams@intel.com cxl/pmem: Fix cxl_pmem_region and cxl_memdev leak
Dan Williams dan.j.williams@intel.com ACPI: NUMA: Add CXL CFMWS 'nodes' to the possible nodes set
Christophe JAILLET christophe.jaillet@wanadoo.fr btrfs: fix a memory allocation failure test in btrfs_submit_direct
Qu Wenruo wqu@suse.com btrfs: don't use btrfs_chunk::sub_stripes from disk
David Sterba dsterba@suse.com btrfs: fix type of parameter generation in btrfs_get_dentry
Josef Bacik josef@toxicpanda.com btrfs: fix tree mod log mishandling of reallocated nodes
Filipe Manana fdmanana@suse.com btrfs: fix lost file sync on direct IO write with nowait and dsync iocb
Geert Uytterhoeven geert+renesas@glider.be clk: renesas: r8a779g0: Add SASYNCPER clocks
Eric Biggers ebiggers@google.com fscrypt: fix keyring memory leak on mount failure
Eric Biggers ebiggers@google.com fscrypt: stop using keyrings subsystem for fscrypt_master_key
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: L2CAP: Fix attempting to access uninitialized memory
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM
Chen Zhongjin chenzhongjin@huawei.com i2c: piix4: Fix adapter not be removed in piix4_remove()
Cristian Marussi cristian.marussi@arm.com arm64: dts: juno: Add thermal critical trip points
Cristian Marussi cristian.marussi@arm.com firmware: arm_scmi: Fix deferred_tx_wq release on error paths
Cristian Marussi cristian.marussi@arm.com firmware: arm_scmi: Fix devres allocation device in virtio transport
Cristian Marussi cristian.marussi@arm.com firmware: arm_scmi: Make Rx chan_setup fail on memory errors
Cristian Marussi cristian.marussi@arm.com firmware: arm_scmi: Suppress the driver's bind attributes
Linus Walleij linus.walleij@linaro.org ARM: dts: ux500: Add trips to battery thermal zones
Chen Jun chenjun102@huawei.com blk-mq: Fix kmemleak in blk_mq_init_allocated_queue
Chen Zhongjin chenzhongjin@huawei.com block: Fix possible memory leak for rq_wb on add_disk failure
Ming Lei ming.lei@redhat.com ublk_drv: return flag of UBLK_F_URING_CMD_COMP_IN_TASK in case of module
Robert Beckett bob.beckett@collabora.com drm/i915: stop abusing swiotlb_max_segment
John Keeping john@metanate.com drm/rockchip: fix fbdev on non-IOMMU devices
Aurelien Jarno aurelien@aurel32.net drm/rockchip: dw_hdmi: filter regulator -EPROBE_DEFER error messages
Ioana Ciornei ioana.ciornei@nxp.com arm64: dts: ls208xa: specify clock frequencies for the MDIO controllers
Ioana Ciornei ioana.ciornei@nxp.com arm64: dts: ls1088a: specify clock frequencies for the MDIO controllers
Ioana Ciornei ioana.ciornei@nxp.com arm64: dts: lx2160a: specify clock frequencies for the MDIO controllers
Peng Fan peng.fan@nxp.com arm64: dts: imx93: correct gpio-ranges
Peng Fan peng.fan@nxp.com arm64: dts: imx93: add gpio clk
Peng Fan peng.fan@nxp.com arm64: dts: imx8: correct clock order
Tim Harvey tharvey@gateworks.com ARM: dts: imx6qdl-gw59{10,13}: fix user pushbutton GPIO offset
Li Jun jun.li@nxp.com arm64: dts: imx8mn: Correct the usb power domain
Li Jun jun.li@nxp.com arm64: dts: imx8mn: remove otg1 power domain dependency on hsio
Li Jun jun.li@nxp.com arm64: dts: imx8mm: correct usb power domains
Li Jun jun.li@nxp.com arm64: dts: imx8mm: remove otg1/2 power domain dependency on hsio
Max Krummenacher max.krummenacher@toradex.com arm64: dts: verdin-imx8mp: fix ctrl_sleep_moci
Taniya Das quic_tdas@quicinc.com clk: qcom: Update the force mem core bit for GPU clocks
Geert Uytterhoeven geert+renesas@glider.be clk: renesas: r8a779g0: Fix HSCIF parent clocks
Jerry Snitselaar jsnitsel@redhat.com efi/tpm: Pass correct address to memblock_reserve
Marek Vasut marex@denx.de arm64: dts: imx8mm: Enable CPLD_Dn pull down resistor on MX8Menlo
Marek Vasut marex@denx.de clk: rs9: Fix I2C accessors
Pavel Begunkov asml.silence@gmail.com bio: safeguard REQ_ALLOC_CACHE bio put
Martin Tůma martin.tuma@digiteqautomotive.com i2c: xiic: Add platform module alias
Xander Li xander_li@kingston.com.tw nvme-pci: disable write zeroes on various Kingston SSD
YuBiao Wang YuBiao.Wang@amd.com drm/amdgpu: dequeue mes scheduler during fini
Yifan Zha Yifan.Zha@amd.com drm/amdgpu: Program GC registers through RLCG interface in gfx_v11/gmc_v11
Nathan Chancellor nathan@kernel.org drm/amdkfd: Fix type of reset_type parameter in hqd_destroy() callback
Kenneth Feng kenneth.feng@amd.com drm/amd/pm: skip loading pptable from driver on secure board for smu_v13_0_10
Danijel Slivka danijel.slivka@amd.com drm/amdgpu: set vm_update_mode=0 as default for Sienna Cichlid in SRIOV case
Samuel Bailey samuel.bailey1@gmail.com HID: saitek: add madcatz variant of MMO7 mouse device ID
Uday Shankar ushankar@purestorage.com scsi: core: Restrict legal sdev_state transitions via sysfs
Pavel Begunkov asml.silence@gmail.com io_uring: don't iopoll from io_ring_ctx_wait_and_kill()
Jason A. Donenfeld Jason@zx2c4.com hwrng: bcm2835 - use hwrng_msleep() instead of cpu_relax()
Ashish Kalra ashish.kalra@amd.com ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
Maxime Ripard maxime@cerno.tech drm/vc4: hdmi: Check the HSM rate at runtime_resume
Sakari Ailus sakari.ailus@linux.intel.com media: v4l: subdev: Fail graciously when getting try data for NULL state
Benjamin Gaignard benjamin.gaignard@collabora.com media: hantro: HEVC: Fix chroma offset computation
Benjamin Gaignard benjamin.gaignard@collabora.com media: hantro: HEVC: Fix auxilary buffer size calculation
Benjamin Gaignard benjamin.gaignard@collabora.com media: hantro: Store HEVC bit depth in context
Hangyu Hua hbh25y@gmail.com media: meson: vdec: fix possible refcount leak in vdec_probe()
Rory Liu hellojacky0226@hotmail.com media: platform: cros-ec: Add Kuldax to the match table
Hans Verkuil hverkuil-cisco@xs4all.nl media: dvb-frontends/drxk: initialize err to 0
Hans Verkuil hverkuil-cisco@xs4all.nl media: cros-ec-cec: limit msg.len to CEC_MAX_MSG_SIZE
Hans Verkuil hverkuil-cisco@xs4all.nl media: s5p_cec: limit msg.len to CEC_MAX_MSG_SIZE
Laurent Pinchart laurent.pinchart@ideasonboard.com media: rkisp1: Zero v4l2_subdev_format fields in when validating links
Laurent Pinchart laurent.pinchart@ideasonboard.com media: rkisp1: Use correct macro for gradient registers
Laurent Pinchart laurent.pinchart@ideasonboard.com media: rkisp1: Initialize color space on resizer sink and source pads
Laurent Pinchart laurent.pinchart@ideasonboard.com media: rkisp1: Don't pass the quantization to rkisp1_csm_config()
Laurent Pinchart laurent.pinchart@ideasonboard.com media: rkisp1: Fix source pad format configuration
Olivier Moysan olivier.moysan@foss.st.com iio: adc: stm32-adc: fix channel sampling time init
Dexuan Cui decui@microsoft.com vsock: fix possible infinite sleep in vsock_connectible_wait_data()
Zhengchao Shao shaozhengchao@huawei.com ipv6: fix WARNING in ip6_route_net_exit_late()
Ido Schimmel idosch@nvidia.com bridge: Fix flushing of dynamic FDB entries
Chen Zhongjin chenzhongjin@huawei.com net, neigh: Fix null-ptr-deref in neigh_table_clear()
Chen Zhongjin chenzhongjin@huawei.com net/smc: Fix possible leaked pernet namespace in smc_init()
Liu Peibao liupeibao@loongson.cn stmmac: dwmac-loongson: fix invalid mdio_node
Nick Child nnac123@linux.ibm.com ibmvnic: Free rwi on reset success
Gaosheng Cui cuigaosheng1@huawei.com net: mdio: fix undefined behavior in bit shift for __mdiobus_register
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_conn: Fix not restoring ISO buffer count on disconnect
Hawkins Jiawei yin31149@gmail.com Bluetooth: L2CAP: Fix memory leak in vhci_write
Zhengchao Shao shaozhengchao@huawei.com Bluetooth: L2CAP: fix use-after-free in l2cap_conn_del()
Soenke Huster soenke.huster@eknoes.de Bluetooth: virtio_bt: Use skb_put to set length
Pauli Virtanen pav@iki.fi Bluetooth: hci_conn: Fix CIS connection dst_type handling
Maxim Mikityanskiy maxtram95@gmail.com Bluetooth: L2CAP: Fix use-after-free caused by l2cap_reassemble_sdu
Jozsef Kadlecsik kadlec@netfilter.org netfilter: ipset: enforce documented limit to prevent allocating huge memory
Filipe Manana fdmanana@suse.com btrfs: fix ulist leaks in error paths of qgroup self tests
Filipe Manana fdmanana@suse.com btrfs: fix inode list leak during backref walking at find_parent_nodes()
Filipe Manana fdmanana@suse.com btrfs: fix inode list leak during backref walking at resolve_indirect_refs()
Yang Yingliang yangyingliang@huawei.com isdn: mISDN: netjet: fix wrong check of device registration
Yang Yingliang yangyingliang@huawei.com mISDN: fix possible memory leak in mISDN_register_device()
Zhang Qilong zhangqilong3@huawei.com rose: Fix NULL pointer dereference in rose_send_frame()
Zhengchao Shao shaozhengchao@huawei.com ipvs: fix WARNING in ip_vs_app_net_cleanup()
Zhengchao Shao shaozhengchao@huawei.com ipvs: fix WARNING in __ip_vs_cleanup_batch()
Jason A. Donenfeld Jason@zx2c4.com ipvs: use explicitly signed chars
Horatiu Vultur horatiu.vultur@microchip.com net: lan966x: Fix unmapping of received frames using FDMA
Horatiu Vultur horatiu.vultur@microchip.com net: lan966x: Fix FDMA when MTU is changed
Horatiu Vultur horatiu.vultur@microchip.com net: lan966x: Adjust maximum frame size when vlan is enabled/disabled
Horatiu Vultur horatiu.vultur@microchip.com net: lan966x: Fix the MTU calculation
Jeff Layton jlayton@kernel.org nfsd: fix net-namespace logic in __nfsd_file_cache_purge
Jeff Layton jlayton@kernel.org nfsd: fix nfsd_file_unhash_and_dispose
Christophe JAILLET christophe.jaillet@wanadoo.fr sfc: Fix an error handling path in efx_pci_probe()
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: release flow rule object from commit path
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: netlink notifier might race to release objects
Ziyang Xuan william.xuanziyang@huawei.com net: tun: fix bugs for oversize packet when napi frags enabled
Dan Carpenter dan.carpenter@oracle.com net: sched: Fix use after free in red_enqueue()
Yang Yingliang yangyingliang@huawei.com ata: palmld: fix return value check in palmld_pata_probe()
Sergey Shtylyov s.shtylyov@omp.ru ata: pata_legacy: fix pdc20230_set_piomode()
Zhang Changzhong zhangchangzhong@huawei.com net: fec: fix improper use of NETDEV_TX_BUSY
Shang XiaoJing shangxiaojing@huawei.com nfc: nfcmrvl: Fix potential memory leak in nfcmrvl_i2c_nci_send()
Shang XiaoJing shangxiaojing@huawei.com nfc: s3fwrn5: Fix potential memory leak in s3fwrn5_nci_send()
Shang XiaoJing shangxiaojing@huawei.com nfc: nxp-nci: Fix potential memory leak in nxp_nci_send()
Shang XiaoJing shangxiaojing@huawei.com nfc: fdp: Fix potential memory leak in fdp_nci_send()
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: fall back to default tagger if we can't load the one from DT
Willy Tarreau w@1wt.eu tools/nolibc: Fix missing strlen() definition and infinite loop with gcc-12
Dan Carpenter dan.carpenter@oracle.com RDMA/qedr: clean up work queue on failure in qedr_alloc_resources()
Chen Zhongjin chenzhongjin@huawei.com RDMA/core: Fix null-ptr-deref in ib_core_cleanup()
Chen Zhongjin chenzhongjin@huawei.com net: dsa: Fix possible memory leaks in dsa_loop_init()
Zhang Xiaoxu zhangxiaoxu5@huawei.com nfs4: Fix kmemleak when allocate slot failed
Benjamin Coddington bcodding@redhat.com NFSv4.2: Fixup CLONE dest file size for zero-length count
Zhang Xiaoxu zhangxiaoxu5@huawei.com SUNRPC: Fix null-ptr-deref when xps sysfs alloc failed
Trond Myklebust trond.myklebust@hammerspace.com NFSv4.1: We must always send RECLAIM_COMPLETE after a reboot
Trond Myklebust trond.myklebust@hammerspace.com NFSv4.1: Handle RECLAIM_COMPLETE trunking errors
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: Fix a potential state reclaim deadlock
Li Zhijian lizhijian@fujitsu.com RDMA/rxe: Fix mr leak in RESPST_ERR_RNR
Akira Yokosawa akiyks@gmail.com docs/process/howto: Replace C89 with C11
Yixing Liu liuyixing1@huawei.com RDMA/hns: Fix NULL pointer problem in free_mr_init()
Yangyang Li liyangyang20@huawei.com RDMA/hns: Disable local invalidate operation
Dean Luick dean.luick@cornelisnetworks.com IB/hfi1: Correctly move list in sc_disable()
Håkon Bugge haakon.bugge@oracle.com RDMA/cma: Use output interface for net_dev check
Jason Gunthorpe jgg@ziepe.ca drm/i915/gvt: Add missing vfio_unregister_group_dev() call
Thinh Nguyen Thinh.Nguyen@synopsys.com usb: dwc3: gadget: Don't delay End Transfer on delayed_status
Wesley Cheng quic_wcheng@quicinc.com usb: dwc3: gadget: Force sending delayed status during soft disconnect
-------------
Diffstat:
Documentation/process/howto.rst | 2 +- Documentation/trace/histogram.rst | 2 +- Documentation/translations/it_IT/process/howto.rst | 2 +- Documentation/translations/ja_JP/howto.rst | 2 +- Documentation/translations/ko_KR/howto.rst | 2 +- Documentation/translations/zh_CN/process/howto.rst | 2 +- Documentation/translations/zh_TW/process/howto.rst | 2 +- Makefile | 4 +- arch/arm/boot/dts/imx6q-yapp4-crux.dts | 4 + arch/arm/boot/dts/imx6qdl-gw5910.dtsi | 2 +- arch/arm/boot/dts/imx6qdl-gw5913.dtsi | 2 +- arch/arm/boot/dts/imx6qp-yapp4-crux-plus.dts | 4 + arch/arm/boot/dts/ste-href.dtsi | 8 + arch/arm/boot/dts/ste-snowball.dts | 8 + arch/arm/boot/dts/ste-ux500-samsung-codina-tmo.dts | 8 + arch/arm/boot/dts/ste-ux500-samsung-codina.dts | 8 + arch/arm/boot/dts/ste-ux500-samsung-gavini.dts | 8 + arch/arm/boot/dts/ste-ux500-samsung-golden.dts | 8 + arch/arm/boot/dts/ste-ux500-samsung-janice.dts | 8 + arch/arm/boot/dts/ste-ux500-samsung-kyle.dts | 8 + arch/arm/boot/dts/ste-ux500-samsung-skomer.dts | 8 + arch/arm64/boot/dts/arm/juno-base.dtsi | 14 + arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 6 + arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi | 6 + arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 6 + arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi | 18 +- arch/arm64/boot/dts/freescale/imx8mm-mx8menlo.dts | 16 +- arch/arm64/boot/dts/freescale/imx8mm.dtsi | 8 +- arch/arm64/boot/dts/freescale/imx8mn.dtsi | 4 +- arch/arm64/boot/dts/freescale/imx8mp-verdin.dtsi | 20 +- arch/arm64/boot/dts/freescale/imx93.dtsi | 21 +- arch/arm64/kernel/entry-common.c | 3 +- arch/arm64/kvm/hyp/exception.c | 3 +- arch/arm64/kvm/hyp/include/hyp/switch.h | 20 + arch/arm64/kvm/hyp/nvhe/switch.c | 26 -- arch/arm64/kvm/hyp/vhe/switch.c | 8 - arch/parisc/include/asm/hardware.h | 12 +- arch/parisc/kernel/drivers.c | 14 +- arch/x86/coco/tdx/tdx.c | 25 +- arch/x86/events/intel/core.c | 1 + arch/x86/events/intel/ds.c | 18 +- arch/x86/include/asm/syscall_wrapper.h | 2 +- arch/x86/kvm/cpuid.c | 11 +- arch/x86/kvm/emulate.c | 108 +++-- arch/x86/kvm/vmx/capabilities.h | 19 +- arch/x86/kvm/vmx/vmx.c | 23 +- arch/x86/kvm/x86.c | 12 +- arch/x86/kvm/xen.c | 57 +-- block/bio.c | 2 +- block/blk-mq.c | 4 +- block/genhd.c | 1 + drivers/acpi/apei/ghes.c | 2 +- drivers/acpi/numa/srat.c | 1 + drivers/ata/pata_legacy.c | 5 +- drivers/ata/pata_palmld.c | 4 +- drivers/block/ublk_drv.c | 3 + drivers/bluetooth/virtio_bt.c | 2 +- drivers/char/hw_random/bcm2835-rng.c | 2 +- drivers/clk/clk-renesas-pcie.c | 65 ++- drivers/clk/qcom/gcc-sc7280.c | 1 + drivers/clk/qcom/gpucc-sc7280.c | 1 + drivers/clk/renesas/r8a779g0-cpg-mssr.c | 13 +- drivers/cxl/core/pmem.c | 2 + drivers/cxl/core/port.c | 11 +- drivers/cxl/core/region.c | 90 ++-- drivers/cxl/cxl.h | 4 +- drivers/cxl/pmem.c | 101 +++-- drivers/firmware/arm_scmi/driver.c | 9 +- drivers/firmware/arm_scmi/virtio.c | 26 +- drivers/firmware/efi/efi.c | 2 +- drivers/firmware/efi/libstub/random.c | 7 +- drivers/firmware/efi/tpm.c | 2 +- drivers/firmware/efi/vars.c | 68 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 7 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 6 + drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 4 + drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 +- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 18 +- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 42 +- .../drm/amd/display/dc/dcn314/dcn314_resource.c | 2 +- .../gpu/drm/amd/display/dc/dml/dcn321/dcn321_fpu.c | 10 +- drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 5 +- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 3 +- drivers/gpu/drm/i915/display/intel_sdvo.c | 58 ++- drivers/gpu/drm/i915/gem/i915_gem_internal.c | 19 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 4 +- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 2 +- drivers/gpu/drm/i915/gvt/kvmgt.c | 1 + drivers/gpu/drm/i915/i915_scatterlist.h | 34 +- drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c | 22 +- drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c | 3 +- drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 5 +- drivers/gpu/drm/vc4/vc4_hdmi.c | 20 + drivers/hid/hid-ids.h | 1 + drivers/hid/hid-quirks.c | 1 + drivers/hid/hid-saitek.c | 2 + drivers/i2c/busses/i2c-piix4.c | 1 + drivers/i2c/busses/i2c-xiic.c | 1 + drivers/iio/adc/stm32-adc.c | 11 +- drivers/infiniband/core/cma.c | 2 +- drivers/infiniband/core/device.c | 10 +- drivers/infiniband/core/nldev.c | 2 +- drivers/infiniband/hw/hfi1/pio.c | 3 +- drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 15 +- drivers/infiniband/hw/hns/hns_roce_hw_v2.h | 2 - drivers/infiniband/hw/qedr/main.c | 9 +- drivers/infiniband/sw/rxe/rxe_resp.c | 4 +- drivers/isdn/hardware/mISDN/netjet.c | 2 +- drivers/isdn/mISDN/core.c | 5 +- drivers/media/cec/platform/cros-ec/cros-ec-cec.c | 4 + drivers/media/cec/platform/s5p/s5p_cec.c | 2 + drivers/media/dvb-frontends/drxk_hard.c | 2 +- .../platform/rockchip/rkisp1/rkisp1-capture.c | 7 +- .../media/platform/rockchip/rkisp1/rkisp1-isp.c | 40 +- .../media/platform/rockchip/rkisp1/rkisp1-params.c | 14 +- .../media/platform/rockchip/rkisp1/rkisp1-regs.h | 2 +- .../platform/rockchip/rkisp1/rkisp1-resizer.c | 4 + drivers/net/dsa/dsa_loop.c | 25 +- drivers/net/ethernet/freescale/fec_main.c | 4 +- drivers/net/ethernet/ibm/ibmvnic.c | 16 +- .../net/ethernet/microchip/lan966x/lan966x_fdma.c | 26 +- .../net/ethernet/microchip/lan966x/lan966x_main.c | 4 +- .../net/ethernet/microchip/lan966x/lan966x_main.h | 2 + .../net/ethernet/microchip/lan966x/lan966x_regs.h | 15 + .../net/ethernet/microchip/lan966x/lan966x_vlan.c | 6 + drivers/net/ethernet/sfc/efx.c | 8 +- .../net/ethernet/stmicro/stmmac/dwmac-loongson.c | 7 +- drivers/net/phy/mdio_bus.c | 2 +- drivers/net/tun.c | 3 +- .../wireless/broadcom/brcm80211/brcmfmac/fweh.c | 4 + drivers/nfc/fdp/fdp.c | 10 +- drivers/nfc/nfcmrvl/i2c.c | 7 +- drivers/nfc/nxp-nci/core.c | 7 +- drivers/nfc/s3fwrn5/core.c | 8 +- drivers/nvme/host/pci.c | 10 + drivers/parisc/iosapic.c | 1 + drivers/scsi/scsi_sysfs.c | 8 + drivers/staging/media/hantro/hantro_drv.c | 7 + drivers/staging/media/hantro/hantro_g2_hevc_dec.c | 2 +- drivers/staging/media/hantro/hantro_hevc.c | 4 +- drivers/staging/media/meson/vdec/vdec.c | 2 + drivers/tty/serial/8250/Kconfig | 2 +- fs/btrfs/backref.c | 54 ++- fs/btrfs/ctree.h | 5 +- fs/btrfs/export.c | 2 +- fs/btrfs/export.h | 2 +- fs/btrfs/extent-tree.c | 25 +- fs/btrfs/file.c | 22 +- fs/btrfs/inode.c | 16 +- fs/btrfs/tests/qgroup-tests.c | 20 +- fs/btrfs/volumes.c | 12 +- fs/cifs/connect.c | 11 +- fs/crypto/fscrypt_private.h | 71 ++- fs/crypto/hooks.c | 10 +- fs/crypto/keyring.c | 491 +++++++++++---------- fs/crypto/keysetup.c | 81 ++-- fs/crypto/policy.c | 8 +- fs/ext4/ioctl.c | 3 +- fs/ext4/migrate.c | 3 +- fs/ext4/namei.c | 10 +- fs/ext4/resize.c | 5 + fs/ext4/verity.c | 3 +- fs/f2fs/verity.c | 3 +- fs/fuse/file.c | 4 + fs/fuse/readdir.c | 10 +- fs/nfs/delegation.c | 36 +- fs/nfs/nfs42proc.c | 3 + fs/nfs/nfs4client.c | 1 + fs/nfs/nfs4state.c | 2 + fs/nfsd/filecache.c | 39 +- fs/super.c | 3 +- include/acpi/ghes.h | 2 +- include/linux/efi.h | 2 +- include/linux/fs.h | 2 +- include/linux/fscrypt.h | 4 +- include/linux/kvm_host.h | 24 +- include/media/v4l2-subdev.h | 6 + include/net/sock.h | 7 + io_uring/io_uring.c | 13 +- kernel/kprobes.c | 5 +- kernel/trace/fprobe.c | 5 +- kernel/trace/ftrace.c | 16 +- kernel/trace/kprobe_event_gen_test.c | 18 +- kernel/trace/ring_buffer.c | 11 + net/bluetooth/hci_conn.c | 18 +- net/bluetooth/iso.c | 14 +- net/bluetooth/l2cap_core.c | 84 +++- net/bridge/br_netlink.c | 2 +- net/bridge/br_sysfs_br.c | 2 +- net/core/neighbour.c | 2 +- net/dsa/dsa2.c | 13 +- net/ipv4/af_inet.c | 2 + net/ipv4/tcp_bpf.c | 4 +- net/ipv4/tcp_ulp.c | 3 + net/ipv4/udp_bpf.c | 4 +- net/ipv6/route.c | 14 +- net/netfilter/ipset/ip_set_hash_gen.h | 30 +- net/netfilter/ipvs/ip_vs_app.c | 10 +- net/netfilter/ipvs/ip_vs_conn.c | 30 +- net/netfilter/nf_tables_api.c | 8 +- net/rose/rose_link.c | 3 + net/sched/sch_red.c | 4 +- net/smc/af_smc.c | 6 +- net/sunrpc/sysfs.c | 12 +- net/unix/unix_bpf.c | 8 +- net/vmw_vsock/af_vsock.c | 5 +- security/commoncap.c | 6 +- tools/include/nolibc/string.h | 17 +- tools/testing/selftests/landlock/Makefile | 7 +- virt/kvm/pfncache.c | 62 ++- 213 files changed, 1962 insertions(+), 1102 deletions(-)
From: Wesley Cheng quic_wcheng@quicinc.com
commit e1ee843488d58099a89979627ef85d5bd6c5cacd upstream.
If any function drivers request for a delayed status phase, this leads to a SETUP transfer timeout error, since the function may take longer to process the DATA stage. This eventually results in end transfer timeouts, as there is a pending SETUP transaction.
In addition, allow the DWC3_EP_DELAY_STOP to be set for if there is a delayed status requested. Ocasionally, a host may abort the current SETUP transaction, by issuing a subsequent SETUP token. In those situations, it would result in an endxfer timeout as well.
Reviewed-by: Thinh Nguyen Thinh.Nguyen@synopsys.com Signed-off-by: Wesley Cheng quic_wcheng@quicinc.com Link: https://lore.kernel.org/r/20220817182359.13550-3-quic_wcheng@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/dwc3/gadget.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 0ed9826a4c47..530ef3232418 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -3712,7 +3712,7 @@ void dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force, * timeout. Delay issuing the End Transfer command until the Setup TRB is * prepared. */ - if (dwc->ep0state != EP0_SETUP_PHASE && !dwc->delayed_status) { + if (dwc->ep0state != EP0_SETUP_PHASE) { dep->flags |= DWC3_EP_DELAY_STOP; return; }
From: Thinh Nguyen Thinh.Nguyen@synopsys.com
commit 4db0fbb601361767144e712beb96704b966339f5 upstream.
The gadget driver may wait on the request completion when it sets the USB_GADGET_DELAYED_STATUS. Make sure that the End Transfer command can go through if the dwc->delayed_status is set so that the request can complete. When the delayed_status is set, the Setup packet is already processed, and the next phase should be either Data or Status. It's unlikely that the host would cancel the control transfer and send a new Setup packet during End Transfer command. But if that's the case, we can try again when ep0state returns to EP0_SETUP_PHASE.
Fixes: e1ee843488d5 ("usb: dwc3: gadget: Force sending delayed status during soft disconnect") Cc: stable@vger.kernel.org Signed-off-by: Thinh Nguyen Thinh.Nguyen@synopsys.com Link: https://lore.kernel.org/r/3f9f59e5d74efcbaee444cf4b30ef639cc7b124e.166614695... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/dwc3/gadget.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 530ef3232418..0ed9826a4c47 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -3712,7 +3712,7 @@ void dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force, * timeout. Delay issuing the End Transfer command until the Setup TRB is * prepared. */ - if (dwc->ep0state != EP0_SETUP_PHASE) { + if (dwc->ep0state != EP0_SETUP_PHASE && !dwc->delayed_status) { dep->flags |= DWC3_EP_DELAY_STOP; return; }
From: Jason Gunthorpe jgg@nvidia.com
[ Upstream commit f423fa1bc9fe1978e6b9f54927411b62cb43eb04 ]
When converting to directly create the vfio_device the mdev driver has to put a vfio_register_emulated_iommu_dev() in the probe() and a pairing vfio_unregister_group_dev() in the remove.
This was missed for gvt, add it.
Cc: stable@vger.kernel.org Fixes: 978cf586ac35 ("drm/i915/gvt: convert to use vfio_register_emulated_iommu_dev") Reported-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/0-v1-013609965fe8+9d-vfio_gvt_unregister_jgg@nvidi... Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gvt/kvmgt.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index e3cd58946477..dacd57732dbe 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -1595,6 +1595,7 @@ static void intel_vgpu_remove(struct mdev_device *mdev)
if (WARN_ON_ONCE(vgpu->attached)) return; + vfio_unregister_group_dev(&vgpu->vfio_device); intel_gvt_destroy_vgpu(vgpu); }
On Tue, 8 Nov 2022 14:37:21 +0100 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
From: Jason Gunthorpe jgg@nvidia.com
[ Upstream commit f423fa1bc9fe1978e6b9f54927411b62cb43eb04 ]
When converting to directly create the vfio_device the mdev driver has to put a vfio_register_emulated_iommu_dev() in the probe() and a pairing vfio_unregister_group_dev() in the remove.
This was missed for gvt, add it.
Cc: stable@vger.kernel.org Fixes: 978cf586ac35 ("drm/i915/gvt: convert to use vfio_register_emulated_iommu_dev") Reported-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/0-v1-013609965fe8+9d-vfio_gvt_unregister_jgg@nvidi... Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org
drivers/gpu/drm/i915/gvt/kvmgt.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index e3cd58946477..dacd57732dbe 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -1595,6 +1595,7 @@ static void intel_vgpu_remove(struct mdev_device *mdev) if (WARN_ON_ONCE(vgpu->attached)) return;
- vfio_unregister_group_dev(&vgpu->vfio_device); intel_gvt_destroy_vgpu(vgpu);
}
Nak, the v6.0 backport for this also needs to call vfio_uninit_group_dev(). kvmgt had missed both calls, but at the time of f423fa1bc9fe this latter missing call had already been replaced by vfio_put_device() in a5ddd2a99a7a, where cb9ff3f3b84c had implemented a device release function with this call. The correct backport should be:
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index e3cd58946477..2404d856f764 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -1595,6 +1595,8 @@ static void intel_vgpu_remove(struct mdev_device *mdev)
if (WARN_ON_ONCE(vgpu->attached)) return; + vfio_unregister_group_dev(&vgpu->vfio_device); + vfio_uninit_group_dev(&vgpu->vfio_device); intel_gvt_destroy_vgpu(vgpu); }
Kevin, Yi, please confirm. Thanks,
Alex
On Tue, Nov 08, 2022 at 09:16:51AM -0700, Alex Williamson wrote:
On Tue, 8 Nov 2022 14:37:21 +0100 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
From: Jason Gunthorpe jgg@nvidia.com
[ Upstream commit f423fa1bc9fe1978e6b9f54927411b62cb43eb04 ]
When converting to directly create the vfio_device the mdev driver has to put a vfio_register_emulated_iommu_dev() in the probe() and a pairing vfio_unregister_group_dev() in the remove.
This was missed for gvt, add it.
Cc: stable@vger.kernel.org Fixes: 978cf586ac35 ("drm/i915/gvt: convert to use vfio_register_emulated_iommu_dev") Reported-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/0-v1-013609965fe8+9d-vfio_gvt_unregister_jgg@nvidi... Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org
drivers/gpu/drm/i915/gvt/kvmgt.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index e3cd58946477..dacd57732dbe 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -1595,6 +1595,7 @@ static void intel_vgpu_remove(struct mdev_device *mdev) if (WARN_ON_ONCE(vgpu->attached)) return;
- vfio_unregister_group_dev(&vgpu->vfio_device); intel_gvt_destroy_vgpu(vgpu);
}
Nak, the v6.0 backport for this also needs to call vfio_uninit_group_dev(). kvmgt had missed both calls, but at the time of f423fa1bc9fe this latter missing call had already been replaced by vfio_put_device() in a5ddd2a99a7a, where cb9ff3f3b84c had implemented a device release function with this call. The correct backport should be:
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index e3cd58946477..2404d856f764 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -1595,6 +1595,8 @@ static void intel_vgpu_remove(struct mdev_device *mdev) if (WARN_ON_ONCE(vgpu->attached)) return;
- vfio_unregister_group_dev(&vgpu->vfio_device);
- vfio_uninit_group_dev(&vgpu->vfio_device); intel_gvt_destroy_vgpu(vgpu);
} Kevin, Yi, please confirm. Thanks,
Can you submit a real backport for this? I'll go drop this one for now.
thanks,
greg k-h
From: Alex Williamson alex.williamson@redhat.com Sent: Wednesday, November 9, 2022 12:17 AM
On Tue, 8 Nov 2022 14:37:21 +0100 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
From: Jason Gunthorpe jgg@nvidia.com
[ Upstream commit f423fa1bc9fe1978e6b9f54927411b62cb43eb04 ]
When converting to directly create the vfio_device the mdev driver has to put a vfio_register_emulated_iommu_dev() in the probe() and a pairing vfio_unregister_group_dev() in the remove.
This was missed for gvt, add it.
Cc: stable@vger.kernel.org Fixes: 978cf586ac35 ("drm/i915/gvt: convert to use
vfio_register_emulated_iommu_dev")
Reported-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/0-v1-013609965fe8+9d-
vfio_gvt_unregister_jgg@nvidia.com
Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org
drivers/gpu/drm/i915/gvt/kvmgt.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c
b/drivers/gpu/drm/i915/gvt/kvmgt.c
index e3cd58946477..dacd57732dbe 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -1595,6 +1595,7 @@ static void intel_vgpu_remove(struct
mdev_device *mdev)
if (WARN_ON_ONCE(vgpu->attached)) return;
- vfio_unregister_group_dev(&vgpu->vfio_device); intel_gvt_destroy_vgpu(vgpu);
}
Nak, the v6.0 backport for this also needs to call vfio_uninit_group_dev(). kvmgt had missed both calls, but at the time of f423fa1bc9fe this latter missing call had already been replaced by vfio_put_device() in a5ddd2a99a7a, where cb9ff3f3b84c had implemented a device release function with this call. The correct backport should be:
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index e3cd58946477..2404d856f764 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -1595,6 +1595,8 @@ static void intel_vgpu_remove(struct mdev_device *mdev)
if (WARN_ON_ONCE(vgpu->attached)) return;
- vfio_unregister_group_dev(&vgpu->vfio_device);
- vfio_uninit_group_dev(&vgpu->vfio_device); intel_gvt_destroy_vgpu(vgpu);
}
Kevin, Yi, please confirm. Thanks,
Reviewed-by: Kevin Tian kevin.tian@intel.com
From: Håkon Bugge haakon.bugge@oracle.com
[ Upstream commit eb83f502adb036cd56c27e13b9ca3b2aabfa790b ]
Commit 27cfde795a96 ("RDMA/cma: Fix arguments order in net device validation") swapped the src and dst addresses in the call to validate_net_dev().
As a consequence, the test in validate_ipv4_net_dev() to see if the net_dev is the right one, is incorrect for port 1 <-> 2 communication when the ports are on the same sub-net. This is fixed by denoting the flowi4_oif as the device instead of the incoming one.
The bug has not been observed using IPv6 addresses.
Fixes: 27cfde795a96 ("RDMA/cma: Fix arguments order in net device validation") Signed-off-by: Håkon Bugge haakon.bugge@oracle.com Link: https://lore.kernel.org/r/20221012141542.16925-1-haakon.bugge@oracle.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/core/cma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index be317f2665a9..ff8821f79fec 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1556,7 +1556,7 @@ static bool validate_ipv4_net_dev(struct net_device *net_dev, return false;
memset(&fl4, 0, sizeof(fl4)); - fl4.flowi4_iif = net_dev->ifindex; + fl4.flowi4_oif = net_dev->ifindex; fl4.daddr = daddr; fl4.saddr = saddr;
From: Dean Luick dean.luick@cornelisnetworks.com
[ Upstream commit 1afac08b39d85437187bb2a92d89a741b1078f55 ]
Commit 13bac861952a ("IB/hfi1: Fix abba locking issue with sc_disable()") incorrectly tries to move a list from one list head to another. The result is a kernel crash.
The crash is triggered when a link goes down and there are waiters for a send to complete. The following signature is seen:
BUG: kernel NULL pointer dereference, address: 0000000000000030 [...] Call Trace: sc_disable+0x1ba/0x240 [hfi1] pio_freeze+0x3d/0x60 [hfi1] handle_freeze+0x27/0x1b0 [hfi1] process_one_work+0x1b0/0x380 ? process_one_work+0x380/0x380 worker_thread+0x30/0x360 ? process_one_work+0x380/0x380 kthread+0xd7/0x100 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30
The fix is to use the correct call to move the list.
Fixes: 13bac861952a ("IB/hfi1: Fix abba locking issue with sc_disable()") Signed-off-by: Dean Luick dean.luick@cornelisnetworks.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@cornelisnetworks.com Link: https://lore.kernel.org/r/166610327042.674422.6146908799669288976.stgit@awfm... Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/hfi1/pio.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/pio.c b/drivers/infiniband/hw/hfi1/pio.c index 3d42bd2b36bd..51ae58c02b15 100644 --- a/drivers/infiniband/hw/hfi1/pio.c +++ b/drivers/infiniband/hw/hfi1/pio.c @@ -913,8 +913,7 @@ void sc_disable(struct send_context *sc) spin_unlock(&sc->release_lock);
write_seqlock(&sc->waitlock); - if (!list_empty(&sc->piowait)) - list_move(&sc->piowait, &wake_list); + list_splice_init(&sc->piowait, &wake_list); write_sequnlock(&sc->waitlock); while (!list_empty(&wake_list)) { struct iowait *wait;
From: Yangyang Li liyangyang20@huawei.com
[ Upstream commit 9e272ed69ad6f6952fafd0599d6993575512408e ]
When function reset and local invalidate are mixed, HNS RoCEE may hang. Before introducing the cause of the problem, two hardware internal concepts need to be introduced:
1. Execution queue: The queue of hardware execution instructions, function reset and local invalidate are queued for execution in this queue.
2.Local queue: A queue that stores local operation instructions. The instructions in the local queue will be sent to the execution queue for execution. The instructions in the local queue will not be removed until the execution is completed.
The reason for the problem is as follows:
1. There is a function reset instruction in the execution queue, which is currently being executed. A necessary condition for the successful execution of function reset is: the hardware pipeline needs to empty the instructions that were not completed before;
2. A local invalidate instruction at the head of the local queue is sent to the execution queue. Now there are two instructions in the execution queue, the first is the function reset instruction, and the second is the local invalidate instruction, which will be executed in se quence;
3. The user has issued many local invalidate operations, causing the local queue to be filled up.
4. The user still has a new local operation command and is queuing to enter the local queue. But the local queue is full and cannot receive new instructions, this instruction is temporarily stored at the hardware pipeline.
5. The function reset has been waiting for the instruction before the hardware pipeline stage is drained. The hardware pipeline stage also caches a local invalidate instruction, so the function reset cannot be completed, and the instructions after it cannot be executed.
These factors together cause the execution logic deadlock of the hardware, and the consequence is that RoCEE will not have any response. Considering that the local operation command may potentially cause RoCEE to hang, this feature is no longer supported.
Fixes: e93df0108579 ("RDMA/hns: Support local invalidate for hip08 in kernel space") Signed-off-by: Yangyang Li liyangyang20@huawei.com Signed-off-by: Wenpeng Liang liangwenpeng@huawei.com Signed-off-by: Haoyue Xu xuhaoyue1@hisilicon.com Link: https://lore.kernel.org/r/20221024083814.1089722-2-xuhaoyue1@hisilicon.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 11 ----------- drivers/infiniband/hw/hns/hns_roce_hw_v2.h | 2 -- 2 files changed, 13 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c index c780646bd60a..8507d788cc94 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c @@ -118,7 +118,6 @@ static const u32 hns_roce_op_code[] = { HR_OPC_MAP(ATOMIC_CMP_AND_SWP, ATOM_CMP_AND_SWAP), HR_OPC_MAP(ATOMIC_FETCH_AND_ADD, ATOM_FETCH_AND_ADD), HR_OPC_MAP(SEND_WITH_INV, SEND_WITH_INV), - HR_OPC_MAP(LOCAL_INV, LOCAL_INV), HR_OPC_MAP(MASKED_ATOMIC_CMP_AND_SWP, ATOM_MSK_CMP_AND_SWAP), HR_OPC_MAP(MASKED_ATOMIC_FETCH_AND_ADD, ATOM_MSK_FETCH_AND_ADD), HR_OPC_MAP(REG_MR, FAST_REG_PMR), @@ -560,9 +559,6 @@ static int set_rc_opcode(struct hns_roce_dev *hr_dev, else ret = -EOPNOTSUPP; break; - case IB_WR_LOCAL_INV: - hr_reg_enable(rc_sq_wqe, RC_SEND_WQE_SO); - fallthrough; case IB_WR_SEND_WITH_INV: rc_sq_wqe->inv_key = cpu_to_le32(wr->ex.invalidate_rkey); break; @@ -3226,7 +3222,6 @@ static int hns_roce_v2_write_mtpt(struct hns_roce_dev *hr_dev,
hr_reg_write(mpt_entry, MPT_ST, V2_MPT_ST_VALID); hr_reg_write(mpt_entry, MPT_PD, mr->pd); - hr_reg_enable(mpt_entry, MPT_L_INV_EN);
hr_reg_write_bool(mpt_entry, MPT_BIND_EN, mr->access & IB_ACCESS_MW_BIND); @@ -3317,7 +3312,6 @@ static int hns_roce_v2_frmr_write_mtpt(struct hns_roce_dev *hr_dev,
hr_reg_enable(mpt_entry, MPT_RA_EN); hr_reg_enable(mpt_entry, MPT_R_INV_EN); - hr_reg_enable(mpt_entry, MPT_L_INV_EN);
hr_reg_enable(mpt_entry, MPT_FRE); hr_reg_clear(mpt_entry, MPT_MR_MW); @@ -3349,7 +3343,6 @@ static int hns_roce_v2_mw_write_mtpt(void *mb_buf, struct hns_roce_mw *mw) hr_reg_write(mpt_entry, MPT_PD, mw->pdn);
hr_reg_enable(mpt_entry, MPT_R_INV_EN); - hr_reg_enable(mpt_entry, MPT_L_INV_EN); hr_reg_enable(mpt_entry, MPT_LW_EN);
hr_reg_enable(mpt_entry, MPT_MR_MW); @@ -3798,7 +3791,6 @@ static const u32 wc_send_op_map[] = { HR_WC_OP_MAP(RDMA_READ, RDMA_READ), HR_WC_OP_MAP(RDMA_WRITE, RDMA_WRITE), HR_WC_OP_MAP(RDMA_WRITE_WITH_IMM, RDMA_WRITE), - HR_WC_OP_MAP(LOCAL_INV, LOCAL_INV), HR_WC_OP_MAP(ATOM_CMP_AND_SWAP, COMP_SWAP), HR_WC_OP_MAP(ATOM_FETCH_AND_ADD, FETCH_ADD), HR_WC_OP_MAP(ATOM_MSK_CMP_AND_SWAP, MASKED_COMP_SWAP), @@ -3848,9 +3840,6 @@ static void fill_send_wc(struct ib_wc *wc, struct hns_roce_v2_cqe *cqe) case HNS_ROCE_V2_WQE_OP_RDMA_WRITE_WITH_IMM: wc->wc_flags |= IB_WC_WITH_IMM; break; - case HNS_ROCE_V2_WQE_OP_LOCAL_INV: - wc->wc_flags |= IB_WC_WITH_INVALIDATE; - break; case HNS_ROCE_V2_WQE_OP_ATOM_CMP_AND_SWAP: case HNS_ROCE_V2_WQE_OP_ATOM_FETCH_AND_ADD: case HNS_ROCE_V2_WQE_OP_ATOM_MSK_CMP_AND_SWAP: diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h index 64797109bab6..4544a8775ce5 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h @@ -182,7 +182,6 @@ enum { HNS_ROCE_V2_WQE_OP_ATOM_MSK_CMP_AND_SWAP = 0x8, HNS_ROCE_V2_WQE_OP_ATOM_MSK_FETCH_AND_ADD = 0x9, HNS_ROCE_V2_WQE_OP_FAST_REG_PMR = 0xa, - HNS_ROCE_V2_WQE_OP_LOCAL_INV = 0xb, HNS_ROCE_V2_WQE_OP_BIND_MW = 0xc, HNS_ROCE_V2_WQE_OP_MASK = 0x1f, }; @@ -916,7 +915,6 @@ struct hns_roce_v2_rc_send_wqe { #define RC_SEND_WQE_OWNER RC_SEND_WQE_FIELD_LOC(7, 7) #define RC_SEND_WQE_CQE RC_SEND_WQE_FIELD_LOC(8, 8) #define RC_SEND_WQE_FENCE RC_SEND_WQE_FIELD_LOC(9, 9) -#define RC_SEND_WQE_SO RC_SEND_WQE_FIELD_LOC(10, 10) #define RC_SEND_WQE_SE RC_SEND_WQE_FIELD_LOC(11, 11) #define RC_SEND_WQE_INLINE RC_SEND_WQE_FIELD_LOC(12, 12) #define RC_SEND_WQE_WQE_INDEX RC_SEND_WQE_FIELD_LOC(30, 15)
From: Yixing Liu liuyixing1@huawei.com
[ Upstream commit 12bcaf87d8b66d8cd812479c8a6349dcb245375c ]
Lock grab occurs in a concurrent scenario, resulting in stepping on a NULL pointer. It should be init mutex_init() first before use the lock.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: __mutex_lock.constprop.0+0xd0/0x5c0 __mutex_lock_slowpath+0x1c/0x2c mutex_lock+0x44/0x50 free_mr_send_cmd_to_hw+0x7c/0x1c0 [hns_roce_hw_v2] hns_roce_v2_dereg_mr+0x30/0x40 [hns_roce_hw_v2] hns_roce_dereg_mr+0x4c/0x130 [hns_roce_hw_v2] ib_dereg_mr_user+0x54/0x124 uverbs_free_mr+0x24/0x30 destroy_hw_idr_uobject+0x38/0x74 uverbs_destroy_uobject+0x48/0x1c4 uobj_destroy+0x74/0xcc ib_uverbs_cmd_verbs+0x368/0xbb0 ib_uverbs_ioctl+0xec/0x1a4 __arm64_sys_ioctl+0xb4/0x100 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0x58/0x190 do_el0_svc+0x30/0x90 el0_svc+0x2c/0xb4 el0t_64_sync_handler+0x1a4/0x1b0 el0t_64_sync+0x19c/0x1a0
Fixes: 70f92521584f ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Signed-off-by: Yixing Liu liuyixing1@huawei.com Signed-off-by: Haoyue Xu xuhaoyue1@hisilicon.com Link: https://lore.kernel.org/r/20221024083814.1089722-3-xuhaoyue1@hisilicon.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c index 8507d788cc94..105888c6ccb7 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c @@ -2805,8 +2805,12 @@ static int free_mr_modify_qp(struct hns_roce_dev *hr_dev)
static int free_mr_init(struct hns_roce_dev *hr_dev) { + struct hns_roce_v2_priv *priv = hr_dev->priv; + struct hns_roce_v2_free_mr *free_mr = &priv->free_mr; int ret;
+ mutex_init(&free_mr->mutex); + ret = free_mr_alloc_res(hr_dev); if (ret) return ret;
From: Akira Yokosawa akiyks@gmail.com
[ Upstream commit 2f3f53d62307262f0086804ea7cea99b0e085450 ]
Commit e8c07082a810 ("Kbuild: move to -std=gnu11") updated process/programming-language.rst, but failed to update process/howto.rst.
Update howto.rst and resolve the inconsistency.
Fixes: e8c07082a810 ("Kbuild: move to -std=gnu11") Signed-off-by: Akira Yokosawa akiyks@gmail.com Cc: Arnd Bergmann arnd@arndb.de Cc: Federico Vaga federico.vaga@vaga.pv.it Cc: Alex Shi alexs@kernel.org Cc: Hu Haowen src.res@email.cn Cc: Tsugikazu Shibata shibata@linuxfoundation.org Link: https://lore.kernel.org/r/20221015092201.32099-1-akiyks@gmail.com Signed-off-by: Jonathan Corbet corbet@lwn.net Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/process/howto.rst | 2 +- Documentation/translations/it_IT/process/howto.rst | 2 +- Documentation/translations/ja_JP/howto.rst | 2 +- Documentation/translations/ko_KR/howto.rst | 2 +- Documentation/translations/zh_CN/process/howto.rst | 2 +- Documentation/translations/zh_TW/process/howto.rst | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/Documentation/process/howto.rst b/Documentation/process/howto.rst index cd6997a9d203..8fc5398c732b 100644 --- a/Documentation/process/howto.rst +++ b/Documentation/process/howto.rst @@ -36,7 +36,7 @@ experience, the following books are good for, if anything, reference: - "C: A Reference Manual" by Harbison and Steele [Prentice Hall]
The kernel is written using GNU C and the GNU toolchain. While it -adheres to the ISO C89 standard, it uses a number of extensions that are +adheres to the ISO C11 standard, it uses a number of extensions that are not featured in the standard. The kernel is a freestanding C environment, with no reliance on the standard C library, so some portions of the C standard are not supported. Arbitrary long long diff --git a/Documentation/translations/it_IT/process/howto.rst b/Documentation/translations/it_IT/process/howto.rst index 16ad5622d549..67b84f015da8 100644 --- a/Documentation/translations/it_IT/process/howto.rst +++ b/Documentation/translations/it_IT/process/howto.rst @@ -44,7 +44,7 @@ altro, utili riferimenti: - "C: A Reference Manual" di Harbison and Steele [Prentice Hall]
Il kernel è stato scritto usando GNU C e la toolchain GNU. -Sebbene si attenga allo standard ISO C89, esso utilizza una serie di +Sebbene si attenga allo standard ISO C11, esso utilizza una serie di estensioni che non sono previste in questo standard. Il kernel è un ambiente C indipendente, che non ha alcuna dipendenza dalle librerie C standard, così alcune parti del C standard non sono supportate. diff --git a/Documentation/translations/ja_JP/howto.rst b/Documentation/translations/ja_JP/howto.rst index 649e2ff2a407..e2e946a4298a 100644 --- a/Documentation/translations/ja_JP/howto.rst +++ b/Documentation/translations/ja_JP/howto.rst @@ -65,7 +65,7 @@ Linux カーネル開発のやり方 - 『新・詳説 C 言語 H&S リファレンス』 (サミュエル P ハービソン/ガイ L スティール共著 斉藤 信男監訳)[ソフトバンク]
カーネルは GNU C と GNU ツールチェインを使って書かれています。カーネル -は ISO C89 仕様に準拠して書く一方で、標準には無い言語拡張を多く使って +は ISO C11 仕様に準拠して書く一方で、標準には無い言語拡張を多く使って います。カーネルは標準 C ライブラリに依存しない、C 言語非依存環境です。 そのため、C の標準の中で使えないものもあります。特に任意の long long の除算や浮動小数点は使えません。カーネルがツールチェインや C 言語拡張 diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst index e43970584ca4..2a7ab4257e44 100644 --- a/Documentation/translations/ko_KR/howto.rst +++ b/Documentation/translations/ko_KR/howto.rst @@ -62,7 +62,7 @@ Documentation/process/howto.rst - "Practical C Programming" by Steve Oualline [O'Reilly] - "C: A Reference Manual" by Harbison and Steele [Prentice Hall]
-커널은 GNU C와 GNU 툴체인을 사용하여 작성되었다. 이 툴들은 ISO C89 표준을 +커널은 GNU C와 GNU 툴체인을 사용하여 작성되었다. 이 툴들은 ISO C11 표준을 따르는 반면 표준에 있지 않은 많은 확장기능도 가지고 있다. 커널은 표준 C 라이브러리와는 관계없이 freestanding C 환경이어서 C 표준의 일부는 지원되지 않는다. 임의의 long long 나누기나 floating point는 지원되지 않는다. diff --git a/Documentation/translations/zh_CN/process/howto.rst b/Documentation/translations/zh_CN/process/howto.rst index 1455190dc087..e7e4bb000b8a 100644 --- a/Documentation/translations/zh_CN/process/howto.rst +++ b/Documentation/translations/zh_CN/process/howto.rst @@ -45,7 +45,7 @@ Linux内核大部分是由C语言写成的,一些体系结构相关的代码 - "C: A Reference Manual" by Harbison and Steele [Prentice Hall] 《C语言参考手册(原书第5版)》(邱仲潘 等译)[机械工业出版社]
-Linux内核使用GNU C和GNU工具链开发。虽然它遵循ISO C89标准,但也用到了一些 +Linux内核使用GNU C和GNU工具链开发。虽然它遵循ISO C11标准,但也用到了一些 标准中没有定义的扩展。内核是自给自足的C环境,不依赖于标准C库的支持,所以 并不支持C标准中的部分定义。比如long long类型的大数除法和浮点运算就不允许 使用。有时候确实很难弄清楚内核对工具链的要求和它所使用的扩展,不幸的是目 diff --git a/Documentation/translations/zh_TW/process/howto.rst b/Documentation/translations/zh_TW/process/howto.rst index 68ae4411285b..e335789d7e26 100644 --- a/Documentation/translations/zh_TW/process/howto.rst +++ b/Documentation/translations/zh_TW/process/howto.rst @@ -48,7 +48,7 @@ Linux內核大部分是由C語言寫成的,一些體系結構相關的代碼 - "C: A Reference Manual" by Harbison and Steele [Prentice Hall] 《C語言參考手冊(原書第5版)》(邱仲潘 等譯)[機械工業出版社]
-Linux內核使用GNU C和GNU工具鏈開發。雖然它遵循ISO C89標準,但也用到了一些 +Linux內核使用GNU C和GNU工具鏈開發。雖然它遵循ISO C11標準,但也用到了一些 標準中沒有定義的擴展。內核是自給自足的C環境,不依賴於標準C庫的支持,所以 並不支持C標準中的部分定義。比如long long類型的大數除法和浮點運算就不允許 使用。有時候確實很難弄清楚內核對工具鏈的要求和它所使用的擴展,不幸的是目
From: Li Zhijian lizhijian@fujitsu.com
[ Upstream commit b5f9a01fae42684648c2ee3cd9985f80c67ab9f7 ]
rxe_recheck_mr() will increase mr's ref_cnt, so we should call rxe_put(mr) to drop mr's ref_cnt in RESPST_ERR_RNR to avoid below warning:
WARNING: CPU: 0 PID: 4156 at drivers/infiniband/sw/rxe/rxe_pool.c:259 __rxe_cleanup+0x1df/0x240 [rdma_rxe] ... Call Trace: rxe_dereg_mr+0x4c/0x60 [rdma_rxe] ib_dereg_mr_user+0xa8/0x200 [ib_core] ib_mr_pool_destroy+0x77/0xb0 [ib_core] nvme_rdma_destroy_queue_ib+0x89/0x240 [nvme_rdma] nvme_rdma_free_queue+0x40/0x50 [nvme_rdma] nvme_rdma_teardown_io_queues.part.0+0xc3/0x120 [nvme_rdma] nvme_rdma_error_recovery_work+0x4d/0xf0 [nvme_rdma] process_one_work+0x582/0xa40 ? pwq_dec_nr_in_flight+0x100/0x100 ? rwlock_bug.part.0+0x60/0x60 worker_thread+0x2a9/0x700 ? process_one_work+0xa40/0xa40 kthread+0x168/0x1a0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30
Link: https://lore.kernel.org/r/20221024052049.20577-1-lizhijian@fujitsu.com Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources") Signed-off-by: Li Zhijian lizhijian@fujitsu.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/sw/rxe/rxe_resp.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index 7c336db5cb54..f8b1d9fa0494 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -806,8 +806,10 @@ static enum resp_states read_reply(struct rxe_qp *qp,
skb = prepare_ack_packet(qp, &ack_pkt, opcode, payload, res->cur_psn, AETH_ACK_UNLIMITED); - if (!skb) + if (!skb) { + rxe_put(mr); return RESPST_ERR_RNR; + }
rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt), payload, RXE_FROM_MR_OBJ);
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 1ba04394e028ea8b45d92685cc0d6ab582cf7647 ]
If the server reboots while we are engaged in a delegation return, and there is a pNFS layout with return-on-close set, then the current code can end up deadlocking in pnfs_roc() when nfs_inode_set_delegation() tries to return the old delegation. Now that delegreturn actually uses its own copy of the stateid, it should be safe to just always update the delegation stateid in place.
Fixes: 078000d02d57 ("pNFS: We want return-on-close to complete when evicting the inode") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/delegation.c | 36 +++++++++++++++++------------------- 1 file changed, 17 insertions(+), 19 deletions(-)
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c index 5c97cad741a7..ead8a0e06abf 100644 --- a/fs/nfs/delegation.c +++ b/fs/nfs/delegation.c @@ -228,8 +228,7 @@ static int nfs_delegation_claim_opens(struct inode *inode, * */ void nfs_inode_reclaim_delegation(struct inode *inode, const struct cred *cred, - fmode_t type, - const nfs4_stateid *stateid, + fmode_t type, const nfs4_stateid *stateid, unsigned long pagemod_limit) { struct nfs_delegation *delegation; @@ -239,25 +238,24 @@ void nfs_inode_reclaim_delegation(struct inode *inode, const struct cred *cred, delegation = rcu_dereference(NFS_I(inode)->delegation); if (delegation != NULL) { spin_lock(&delegation->lock); - if (nfs4_is_valid_delegation(delegation, 0)) { - nfs4_stateid_copy(&delegation->stateid, stateid); - delegation->type = type; - delegation->pagemod_limit = pagemod_limit; - oldcred = delegation->cred; - delegation->cred = get_cred(cred); - clear_bit(NFS_DELEGATION_NEED_RECLAIM, - &delegation->flags); - spin_unlock(&delegation->lock); - rcu_read_unlock(); - put_cred(oldcred); - trace_nfs4_reclaim_delegation(inode, type); - return; - } - /* We appear to have raced with a delegation return. */ + nfs4_stateid_copy(&delegation->stateid, stateid); + delegation->type = type; + delegation->pagemod_limit = pagemod_limit; + oldcred = delegation->cred; + delegation->cred = get_cred(cred); + clear_bit(NFS_DELEGATION_NEED_RECLAIM, &delegation->flags); + if (test_and_clear_bit(NFS_DELEGATION_REVOKED, + &delegation->flags)) + atomic_long_inc(&nfs_active_delegations); spin_unlock(&delegation->lock); + rcu_read_unlock(); + put_cred(oldcred); + trace_nfs4_reclaim_delegation(inode, type); + } else { + rcu_read_unlock(); + nfs_inode_set_delegation(inode, cred, type, stateid, + pagemod_limit); } - rcu_read_unlock(); - nfs_inode_set_delegation(inode, cred, type, stateid, pagemod_limit); }
static int nfs_do_return_delegation(struct inode *inode, struct nfs_delegation *delegation, int issync)
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 5d917cba3201e5c25059df96c29252fd99c4f6a7 ]
If RECLAIM_COMPLETE sets the NFS4CLNT_BIND_CONN_TO_SESSION flag, then we need to loop back in order to handle it.
Fixes: 0048fdd06614 ("NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4state.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 9bab3e9c702a..0b6bd6336c98 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -2671,6 +2671,7 @@ static void nfs4_state_manager(struct nfs_client *clp) if (status < 0) goto out_error; nfs4_state_end_reclaim_reboot(clp); + continue; }
/* Detect expired delegations... */
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit e59679f2b7e522ecad99974e5636291ffd47c184 ]
Currently, we are only guaranteed to send RECLAIM_COMPLETE if we have open state to recover. Fix the client to always send RECLAIM_COMPLETE after setting up the lease.
Fixes: fce5c838e133 ("nfs41: RECLAIM_COMPLETE functionality") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4state.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 0b6bd6336c98..a629d7db9420 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -1787,6 +1787,7 @@ static void nfs4_state_mark_reclaim_helper(struct nfs_client *clp,
static void nfs4_state_start_reclaim_reboot(struct nfs_client *clp) { + set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state); /* Mark all delegations for reclaim */ nfs_delegation_mark_reclaim(clp); nfs4_state_mark_reclaim_helper(clp, nfs4_state_mark_reclaim_reboot);
From: Zhang Xiaoxu zhangxiaoxu5@huawei.com
[ Upstream commit cbdeaee94a415800c65a8c3fa04d9664a8b8fb3a ]
There is a null-ptr-deref when xps sysfs alloc failed: BUG: KASAN: null-ptr-deref in sysfs_do_create_link_sd+0x40/0xd0 Read of size 8 at addr 0000000000000030 by task gssproxy/457
CPU: 5 PID: 457 Comm: gssproxy Not tainted 6.0.0-09040-g02357b27ee03 #9 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 kasan_report+0xa3/0x120 sysfs_do_create_link_sd+0x40/0xd0 rpc_sysfs_client_setup+0x161/0x1b0 rpc_new_client+0x3fc/0x6e0 rpc_create_xprt+0x71/0x220 rpc_create+0x1d4/0x350 gssp_rpc_create+0xc3/0x160 set_gssp_clnt+0xbc/0x140 write_gssp+0x116/0x1a0 proc_reg_write+0xd6/0x130 vfs_write+0x177/0x690 ksys_write+0xb9/0x150 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0
When the xprt_switch sysfs alloc failed, should not add xprt and switch sysfs to it, otherwise, maybe null-ptr-deref; also initialize the 'xps_sysfs' to NULL to avoid oops when destroy it.
Fixes: 2a338a543163 ("sunrpc: add a symlink from rpc-client directory to the xprt_switch") Fixes: d408ebe04ac5 ("sunrpc: add add sysfs directory per xprt under each xprt_switch") Fixes: baea99445dd4 ("sunrpc: add xprt_switch direcotry to sunrpc's sysfs") Signed-off-by: Zhang Xiaoxu zhangxiaoxu5@huawei.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/sysfs.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/sysfs.c b/net/sunrpc/sysfs.c index c65c90ad626a..c1f559892ae8 100644 --- a/net/sunrpc/sysfs.c +++ b/net/sunrpc/sysfs.c @@ -518,13 +518,16 @@ void rpc_sysfs_client_setup(struct rpc_clnt *clnt, struct net *net) { struct rpc_sysfs_client *rpc_client; + struct rpc_sysfs_xprt_switch *xswitch = + (struct rpc_sysfs_xprt_switch *)xprt_switch->xps_sysfs; + + if (!xswitch) + return;
rpc_client = rpc_sysfs_client_alloc(rpc_sunrpc_client_kobj, net, clnt->cl_clid); if (rpc_client) { char name[] = "switch"; - struct rpc_sysfs_xprt_switch *xswitch = - (struct rpc_sysfs_xprt_switch *)xprt_switch->xps_sysfs; int ret;
clnt->cl_sysfs = rpc_client; @@ -558,6 +561,8 @@ void rpc_sysfs_xprt_switch_setup(struct rpc_xprt_switch *xprt_switch, rpc_xprt_switch->xprt_switch = xprt_switch; rpc_xprt_switch->xprt = xprt; kobject_uevent(&rpc_xprt_switch->kobject, KOBJ_ADD); + } else { + xprt_switch->xps_sysfs = NULL; } }
@@ -569,6 +574,9 @@ void rpc_sysfs_xprt_setup(struct rpc_xprt_switch *xprt_switch, struct rpc_sysfs_xprt_switch *switch_obj = (struct rpc_sysfs_xprt_switch *)xprt_switch->xps_sysfs;
+ if (!switch_obj) + return; + rpc_xprt = rpc_sysfs_xprt_alloc(&switch_obj->kobject, xprt, gfp_flags); if (rpc_xprt) { xprt->xprt_sysfs = rpc_xprt;
From: Benjamin Coddington bcodding@redhat.com
[ Upstream commit 038efb6348ce96228f6828354cb809c22a661681 ]
When holding a delegation, the NFS client optimizes away setting the attributes of a file from the GETATTR in the compound after CLONE, and for a zero-length CLONE we will end up setting the inode's size to zero in nfs42_copy_dest_done(). Handle this case by computing the resulting count from the server's reported size after CLONE's GETATTR.
Suggested-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Benjamin Coddington bcodding@redhat.com Fixes: 94d202d5ca39 ("NFSv42: Copy offload should update the file size when appropriate") Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs42proc.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index 6dab9e408372..21c9e97c3ba3 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -1093,6 +1093,9 @@ static int _nfs42_proc_clone(struct rpc_message *msg, struct file *src_f, &args.seq_args, &res.seq_res, 0); trace_nfs4_clone(src_inode, dst_inode, &args, status); if (status == 0) { + /* a zero-length count means clone to EOF in src */ + if (count == 0 && res.dst_fattr->valid & NFS_ATTR_FATTR_SIZE) + count = nfs_size_to_loff_t(res.dst_fattr->size) - dst_offset; nfs42_copy_dest_done(dst_inode, dst_offset, count); status = nfs_post_op_update_inode(dst_inode, res.dst_fattr); }
From: Zhang Xiaoxu zhangxiaoxu5@huawei.com
[ Upstream commit 7e8436728e22181c3f12a5dbabd35ed3a8b8c593 ]
If one of the slot allocate failed, should cleanup all the other allocated slots, otherwise, the allocated slots will leak:
unreferenced object 0xffff8881115aa100 (size 64): comm ""mount.nfs"", pid 679, jiffies 4294744957 (age 115.037s) hex dump (first 32 bytes): 00 cc 19 73 81 88 ff ff 00 a0 5a 11 81 88 ff ff ...s......Z..... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000007a4c434a>] nfs4_find_or_create_slot+0x8e/0x130 [<000000005472a39c>] nfs4_realloc_slot_table+0x23f/0x270 [<00000000cd8ca0eb>] nfs40_init_client+0x4a/0x90 [<00000000128486db>] nfs4_init_client+0xce/0x270 [<000000008d2cacad>] nfs4_set_client+0x1a2/0x2b0 [<000000000e593b52>] nfs4_create_server+0x300/0x5f0 [<00000000e4425dd2>] nfs4_try_get_tree+0x65/0x110 [<00000000d3a6176f>] vfs_get_tree+0x41/0xf0 [<0000000016b5ad4c>] path_mount+0x9b3/0xdd0 [<00000000494cae71>] __x64_sys_mount+0x190/0x1d0 [<000000005d56bdec>] do_syscall_64+0x35/0x80 [<00000000687c9ae4>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: abf79bb341bf ("NFS: Add a slot table to struct nfs_client for NFSv4.0 transport blocking") Signed-off-by: Zhang Xiaoxu zhangxiaoxu5@huawei.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4client.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c index 3c5678aec006..8ae2827da28d 100644 --- a/fs/nfs/nfs4client.c +++ b/fs/nfs/nfs4client.c @@ -346,6 +346,7 @@ int nfs40_init_client(struct nfs_client *clp) ret = nfs4_setup_slot_table(tbl, NFS4_MAX_SLOT_TABLE, "NFSv4.0 transport Slot table"); if (ret) { + nfs4_shutdown_slot_table(tbl); kfree(tbl); return ret; }
From: Chen Zhongjin chenzhongjin@huawei.com
[ Upstream commit 633efc8b3dc96f56f5a57f2a49764853a2fa3f50 ]
kmemleak reported memory leaks in dsa_loop_init():
kmemleak: 12 new suspected memory leaks
unreferenced object 0xffff8880138ce000 (size 2048): comm "modprobe", pid 390, jiffies 4295040478 (age 238.976s) backtrace: [<000000006a94f1d5>] kmalloc_trace+0x26/0x60 [<00000000a9c44622>] phy_device_create+0x5d/0x970 [<00000000d0ee2afc>] get_phy_device+0xf3/0x2b0 [<00000000dca0c71f>] __fixed_phy_register.part.0+0x92/0x4e0 [<000000008a834798>] fixed_phy_register+0x84/0xb0 [<0000000055223fcb>] dsa_loop_init+0xa9/0x116 [dsa_loop] ...
There are two reasons for memleak in dsa_loop_init().
First, fixed_phy_register() create and register phy_device:
fixed_phy_register() get_phy_device() phy_device_create() # freed by phy_device_free() phy_device_register() # freed by phy_device_remove()
But fixed_phy_unregister() only calls phy_device_remove(). So the memory allocated in phy_device_create() is leaked.
Second, when mdio_driver_register() fail in dsa_loop_init(), it just returns and there is no cleanup for phydevs.
Fix the problems by catching the error of mdio_driver_register() in dsa_loop_init(), then calling both fixed_phy_unregister() and phy_device_free() to release phydevs. Also add a function for phydevs cleanup to avoid duplacate.
Fixes: 98cd1552ea27 ("net: dsa: Mock-up driver") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/dsa_loop.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/net/dsa/dsa_loop.c b/drivers/net/dsa/dsa_loop.c index 263e41191c29..0d20ebafbd03 100644 --- a/drivers/net/dsa/dsa_loop.c +++ b/drivers/net/dsa/dsa_loop.c @@ -378,6 +378,17 @@ static struct mdio_driver dsa_loop_drv = {
#define NUM_FIXED_PHYS (DSA_LOOP_NUM_PORTS - 2)
+static void dsa_loop_phydevs_unregister(void) +{ + unsigned int i; + + for (i = 0; i < NUM_FIXED_PHYS; i++) + if (!IS_ERR(phydevs[i])) { + fixed_phy_unregister(phydevs[i]); + phy_device_free(phydevs[i]); + } +} + static int __init dsa_loop_init(void) { struct fixed_phy_status status = { @@ -385,23 +396,23 @@ static int __init dsa_loop_init(void) .speed = SPEED_100, .duplex = DUPLEX_FULL, }; - unsigned int i; + unsigned int i, ret;
for (i = 0; i < NUM_FIXED_PHYS; i++) phydevs[i] = fixed_phy_register(PHY_POLL, &status, NULL);
- return mdio_driver_register(&dsa_loop_drv); + ret = mdio_driver_register(&dsa_loop_drv); + if (ret) + dsa_loop_phydevs_unregister(); + + return ret; } module_init(dsa_loop_init);
static void __exit dsa_loop_exit(void) { - unsigned int i; - mdio_driver_unregister(&dsa_loop_drv); - for (i = 0; i < NUM_FIXED_PHYS; i++) - if (!IS_ERR(phydevs[i])) - fixed_phy_unregister(phydevs[i]); + dsa_loop_phydevs_unregister(); } module_exit(dsa_loop_exit);
From: Chen Zhongjin chenzhongjin@huawei.com
[ Upstream commit 07c0d131cc0fe1f3981a42958fc52d573d303d89 ]
KASAN reported a null-ptr-deref error:
KASAN: null-ptr-deref in range [0x0000000000000118-0x000000000000011f] CPU: 1 PID: 379 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:destroy_workqueue+0x2f/0x740 RSP: 0018:ffff888016137df8 EFLAGS: 00000202 ... Call Trace: ib_core_cleanup+0xa/0xa1 [ib_core] __do_sys_delete_module.constprop.0+0x34f/0x5b0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fa1a0d221b7 ...
It is because the fail of roce_gid_mgmt_init() is ignored:
ib_core_init() roce_gid_mgmt_init() gid_cache_wq = alloc_ordered_workqueue # fail ... ib_core_cleanup() roce_gid_mgmt_cleanup() destroy_workqueue(gid_cache_wq) # destroy an unallocated wq
Fix this by catching the fail of roce_gid_mgmt_init() in ib_core_init().
Fixes: 03db3a2d81e6 ("IB/core: Add RoCE GID table management") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Link: https://lore.kernel.org/r/20221025024146.109137-1-chenzhongjin@huawei.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/core/device.c | 10 +++++++++- drivers/infiniband/core/nldev.c | 2 +- 2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index d275db195f1a..4053a09b8d33 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -2815,10 +2815,18 @@ static int __init ib_core_init(void)
nldev_init(); rdma_nl_register(RDMA_NL_LS, ibnl_ls_cb_table); - roce_gid_mgmt_init(); + ret = roce_gid_mgmt_init(); + if (ret) { + pr_warn("Couldn't init RoCE GID management\n"); + goto err_parent; + }
return 0;
+err_parent: + rdma_nl_unregister(RDMA_NL_LS); + nldev_exit(); + unregister_pernet_device(&rdma_dev_net_ops); err_compat: unregister_blocking_lsm_notifier(&ibdev_lsm_nb); err_sa: diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c index b92358f606d0..12dc97067ed2 100644 --- a/drivers/infiniband/core/nldev.c +++ b/drivers/infiniband/core/nldev.c @@ -2537,7 +2537,7 @@ void __init nldev_init(void) rdma_nl_register(RDMA_NL_NLDEV, nldev_cb_table); }
-void __exit nldev_exit(void) +void nldev_exit(void) { rdma_nl_unregister(RDMA_NL_NLDEV); }
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit 7a47e077e503feb73d56e491ce89aa73b67a3972 ]
Add a check for if create_singlethread_workqueue() fails and also destroy the work queue on failure paths.
Fixes: e411e0587e0d ("RDMA/qedr: Add iWARP connection management functions") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Link: https://lore.kernel.org/r/Y1gBkDucQhhWj5YM@kili Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/qedr/main.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c index 5152f10d2e6d..ba0c3e4c07d8 100644 --- a/drivers/infiniband/hw/qedr/main.c +++ b/drivers/infiniband/hw/qedr/main.c @@ -344,6 +344,10 @@ static int qedr_alloc_resources(struct qedr_dev *dev) if (IS_IWARP(dev)) { xa_init(&dev->qps); dev->iwarp_wq = create_singlethread_workqueue("qedr_iwarpq"); + if (!dev->iwarp_wq) { + rc = -ENOMEM; + goto err1; + } }
/* Allocate Status blocks for CNQ */ @@ -351,7 +355,7 @@ static int qedr_alloc_resources(struct qedr_dev *dev) GFP_KERNEL); if (!dev->sb_array) { rc = -ENOMEM; - goto err1; + goto err_destroy_wq; }
dev->cnq_array = kcalloc(dev->num_cnq, @@ -402,6 +406,9 @@ static int qedr_alloc_resources(struct qedr_dev *dev) kfree(dev->cnq_array); err2: kfree(dev->sb_array); +err_destroy_wq: + if (IS_IWARP(dev)) + destroy_workqueue(dev->iwarp_wq); err1: kfree(dev->sgid_tbl); return rc;
From: Willy Tarreau w@1wt.eu
[ Upstream commit bfc3b0f05653a28c8d41067a2aa3875d1f982e3e ]
When built at -Os, gcc-12 recognizes an strlen() pattern in nolibc_strlen() and replaces it with a jump to strlen(), which is not defined as a symbol and breaks compilation. Worse, when the function is called strlen(), the function is simply replaced with a jump to itself, hence becomes an infinite loop.
One way to avoid this is to always set -ffreestanding, but the calling code doesn't know this and there's no way (either via attributes or pragmas) to globally enable it from include files, effectively leaving a painful situation for the caller.
Alexey suggested to place an empty asm() statement inside the loop to stop gcc from recognizing a well-known pattern, which happens to work pretty fine. At least it allows us to make sure our local definition is not replaced with a self jump.
The function only needs to be renamed back to strlen() so that the symbol exists, which implies that nolibc_strlen() which is used on variable strings has to be declared as a macro that points back to it before the strlen() macro is redifined.
It was verified to produce valid code with gcc 3.4 to 12.1 at different optimization levels, and both with constant and variable strings.
In case this problem surfaces again in the future, an alternate approach consisting in adding an optimize("no-tree-loop-distribute-patterns") function attribute for gcc>=12 worked as well but is less pretty.
Reported-by: kernel test robot yujie.liu@intel.com Link: https://lore.kernel.org/r/202210081618.754a77db-yujie.liu@intel.com Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc") Fixes: 96980b833a21 ("tools/nolibc/string: do not use __builtin_strlen() at -O0") Cc: "Paul E. McKenney" paulmck@kernel.org Cc: Alexey Dobriyan adobriyan@gmail.com Signed-off-by: Willy Tarreau w@1wt.eu Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/include/nolibc/string.h | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h index bef35bee9c44..718a405ffbc3 100644 --- a/tools/include/nolibc/string.h +++ b/tools/include/nolibc/string.h @@ -125,14 +125,18 @@ char *strcpy(char *dst, const char *src) }
/* this function is only used with arguments that are not constants or when - * it's not known because optimizations are disabled. + * it's not known because optimizations are disabled. Note that gcc 12 + * recognizes an strlen() pattern and replaces it with a jump to strlen(), + * thus itself, hence the asm() statement below that's meant to disable this + * confusing practice. */ static __attribute__((unused)) -size_t nolibc_strlen(const char *str) +size_t strlen(const char *str) { size_t len;
- for (len = 0; str[len]; len++); + for (len = 0; str[len]; len++) + asm(""); return len; }
@@ -140,13 +144,12 @@ size_t nolibc_strlen(const char *str) * the two branches, then will rely on an external definition of strlen(). */ #if defined(__OPTIMIZE__) +#define nolibc_strlen(x) strlen(x) #define strlen(str) ({ \ __builtin_constant_p((str)) ? \ __builtin_strlen((str)) : \ nolibc_strlen((str)); \ }) -#else -#define strlen(str) nolibc_strlen((str)) #endif
static __attribute__((unused))
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit a2c65a9d0568b6737c02b54f00b80716a53fac61 ]
DSA tagging protocol drivers can be changed at runtime through sysfs and at probe time through the device tree (support for the latter was added later).
When changing through sysfs, it is assumed that the module for the new tagging protocol was already loaded into the kernel (in fact this is only a concern for Ocelot/Felix switches, where we have tag_ocelot.ko and tag_ocelot_8021q.ko; for every other switch, the default and alternative protocols are compiled within the same .ko, so there is nothing for the user to load).
The kernel cannot currently call request_module(), because it has no way of constructing the modalias name of the tagging protocol driver ("dsa_tag-%d", where the number is one of DSA_TAG_PROTO_*_VALUE). The device tree only contains the string name of the tagging protocol ("ocelot-8021q"), and the only mapping between the string and the DSA_TAG_PROTO_OCELOT_8021Q_VALUE is present in tag_ocelot_8021q.ko. So this is a chicken-and-egg situation and dsa_core.ko has nothing based on which it can automatically request the insertion of the module.
As a consequence, if CONFIG_NET_DSA_TAG_OCELOT_8021Q is built as module, the switch will forever defer probing.
The long-term solution is to make DSA call request_module() somehow, but that probably needs some refactoring.
What we can do to keep operating with existing device tree blobs is to cancel the attempt to change the tagging protocol with the one specified there, and to remain operating with the default one. Depending on the situation, the default protocol might still allow some functionality (in the case of ocelot, it does), and it's better to have that than to fail to probe.
Fixes: deff710703d8 ("net: dsa: Allow default tag protocol to be overridden from DT") Link: https://lore.kernel.org/lkml/20221027113248.420216-1-michael@walle.cc/ Reported-by: Heiko Thiery heiko.thiery@gmail.com Reported-by: Michael Walle michael@walle.cc Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Tested-by: Michael Walle michael@walle.cc Reviewed-by: Florian Fainelli f.fainelli@gmail.com Link: https://lore.kernel.org/r/20221027145439.3086017-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/dsa/dsa2.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c index cac48a741f27..e537655e442b 100644 --- a/net/dsa/dsa2.c +++ b/net/dsa/dsa2.c @@ -1407,9 +1407,9 @@ static enum dsa_tag_protocol dsa_get_tag_protocol(struct dsa_port *dp, static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *master, const char *user_protocol) { + const struct dsa_device_ops *tag_ops = NULL; struct dsa_switch *ds = dp->ds; struct dsa_switch_tree *dst = ds->dst; - const struct dsa_device_ops *tag_ops; enum dsa_tag_protocol default_proto;
/* Find out which protocol the switch would prefer. */ @@ -1432,10 +1432,17 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *master, }
tag_ops = dsa_find_tagger_by_name(user_protocol); - } else { - tag_ops = dsa_tag_driver_get(default_proto); + if (IS_ERR(tag_ops)) { + dev_warn(ds->dev, + "Failed to find a tagging driver for protocol %s, using default\n", + user_protocol); + tag_ops = NULL; + } }
+ if (!tag_ops) + tag_ops = dsa_tag_driver_get(default_proto); + if (IS_ERR(tag_ops)) { if (PTR_ERR(tag_ops) == -ENOPROTOOPT) return -EPROBE_DEFER;
From: Shang XiaoJing shangxiaojing@huawei.com
[ Upstream commit 8e4aae6b8ca76afb1fb64dcb24be44ba814e7f8a ]
fdp_nci_send() will call fdp_nci_i2c_write that will not free skb in the function. As a result, when fdp_nci_i2c_write() finished, the skb will memleak. fdp_nci_send() should free skb after fdp_nci_i2c_write() finished.
Fixes: a06347c04c13 ("NFC: Add Intel Fields Peak NFC solution driver") Signed-off-by: Shang XiaoJing shangxiaojing@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/fdp/fdp.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/nfc/fdp/fdp.c b/drivers/nfc/fdp/fdp.c index c6b3334f24c9..f12f903a9dd1 100644 --- a/drivers/nfc/fdp/fdp.c +++ b/drivers/nfc/fdp/fdp.c @@ -249,11 +249,19 @@ static int fdp_nci_close(struct nci_dev *ndev) static int fdp_nci_send(struct nci_dev *ndev, struct sk_buff *skb) { struct fdp_nci_info *info = nci_get_drvdata(ndev); + int ret;
if (atomic_dec_and_test(&info->data_pkt_counter)) info->data_pkt_counter_cb(ndev);
- return info->phy_ops->write(info->phy, skb); + ret = info->phy_ops->write(info->phy, skb); + if (ret < 0) { + kfree_skb(skb); + return ret; + } + + consume_skb(skb); + return 0; }
static int fdp_nci_request_firmware(struct nci_dev *ndev)
From: Shang XiaoJing shangxiaojing@huawei.com
[ Upstream commit 7bf1ed6aff0f70434bd0cdd45495e83f1dffb551 ]
nxp_nci_send() will call nxp_nci_i2c_write(), and only free skb when nxp_nci_i2c_write() failed. However, even if the nxp_nci_i2c_write() run succeeds, the skb will not be freed in nxp_nci_i2c_write(). As the result, the skb will memleak. nxp_nci_send() should also free the skb when nxp_nci_i2c_write() succeeds.
Fixes: dece45855a8b ("NFC: nxp-nci: Add support for NXP NCI chips") Signed-off-by: Shang XiaoJing shangxiaojing@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/nxp-nci/core.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/nfc/nxp-nci/core.c b/drivers/nfc/nxp-nci/core.c index 7c93d484dc1b..580cb6ecffee 100644 --- a/drivers/nfc/nxp-nci/core.c +++ b/drivers/nfc/nxp-nci/core.c @@ -80,10 +80,13 @@ static int nxp_nci_send(struct nci_dev *ndev, struct sk_buff *skb) return -EINVAL;
r = info->phy_ops->write(info->phy_id, skb); - if (r < 0) + if (r < 0) { kfree_skb(skb); + return r; + }
- return r; + consume_skb(skb); + return 0; }
static int nxp_nci_rf_pll_unlocked_ntf(struct nci_dev *ndev,
From: Shang XiaoJing shangxiaojing@huawei.com
[ Upstream commit 3a146b7e3099dc7cf3114f627d9b79291e2d2203 ]
s3fwrn5_nci_send() will call s3fwrn5_i2c_write() or s3fwrn82_uart_write(), and free the skb if write() failed. However, even if the write() run succeeds, the skb will not be freed in write(). As the result, the skb will memleak. s3fwrn5_nci_send() should also free the skb when write() succeeds.
Fixes: c04c674fadeb ("nfc: s3fwrn5: Add driver for Samsung S3FWRN5 NFC Chip") Signed-off-by: Shang XiaoJing shangxiaojing@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/s3fwrn5/core.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/nfc/s3fwrn5/core.c b/drivers/nfc/s3fwrn5/core.c index 1c412007fabb..0270e05b68df 100644 --- a/drivers/nfc/s3fwrn5/core.c +++ b/drivers/nfc/s3fwrn5/core.c @@ -110,11 +110,15 @@ static int s3fwrn5_nci_send(struct nci_dev *ndev, struct sk_buff *skb) }
ret = s3fwrn5_write(info, skb); - if (ret < 0) + if (ret < 0) { kfree_skb(skb); + mutex_unlock(&info->mutex); + return ret; + }
+ consume_skb(skb); mutex_unlock(&info->mutex); - return ret; + return 0; }
static int s3fwrn5_nci_post_setup(struct nci_dev *ndev)
From: Shang XiaoJing shangxiaojing@huawei.com
[ Upstream commit 93d904a734a74c54d945a9884b4962977f1176cd ]
nfcmrvl_i2c_nci_send() will be called by nfcmrvl_nci_send(), and skb should be freed in nfcmrvl_i2c_nci_send(). However, nfcmrvl_nci_send() will only free skb when i2c_master_send() return >=0, which means skb will memleak when i2c_master_send() failed. Free skb no matter whether i2c_master_send() succeeds.
Fixes: b5b3e23e4cac ("NFC: nfcmrvl: add i2c driver") Signed-off-by: Shang XiaoJing shangxiaojing@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/nfcmrvl/i2c.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/nfc/nfcmrvl/i2c.c b/drivers/nfc/nfcmrvl/i2c.c index 01329b91d59d..a902720cd849 100644 --- a/drivers/nfc/nfcmrvl/i2c.c +++ b/drivers/nfc/nfcmrvl/i2c.c @@ -132,10 +132,15 @@ static int nfcmrvl_i2c_nci_send(struct nfcmrvl_private *priv, ret = -EREMOTEIO; } else ret = 0; + } + + if (ret) { kfree_skb(skb); + return ret; }
- return ret; + consume_skb(skb); + return 0; }
static void nfcmrvl_i2c_nci_update_config(struct nfcmrvl_private *priv,
From: Zhang Changzhong zhangchangzhong@huawei.com
[ Upstream commit 06a4df5863f73af193a4ff7abf7cb04058584f06 ]
The ndo_start_xmit() method must not free skb when returning NETDEV_TX_BUSY, since caller is going to requeue freed skb.
Fix it by returning NETDEV_TX_OK in case of dma_map_single() fails.
Fixes: 79f339125ea3 ("net: fec: Add software TSO support") Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/freescale/fec_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index a486435ceee2..5aa254eaa8d0 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -657,7 +657,7 @@ fec_enet_txq_put_data_tso(struct fec_enet_priv_tx_q *txq, struct sk_buff *skb, dev_kfree_skb_any(skb); if (net_ratelimit()) netdev_err(ndev, "Tx DMA memory map failed\n"); - return NETDEV_TX_BUSY; + return NETDEV_TX_OK; }
bdp->cbd_datlen = cpu_to_fec16(size); @@ -719,7 +719,7 @@ fec_enet_txq_put_hdr_tso(struct fec_enet_priv_tx_q *txq, dev_kfree_skb_any(skb); if (net_ratelimit()) netdev_err(ndev, "Tx DMA memory map failed\n"); - return NETDEV_TX_BUSY; + return NETDEV_TX_OK; } }
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit 171a93182eccd6e6835d2c86b40787f9f832efaa ]
Clang gives a warning when compiling pata_legacy.c with 'make W=1' about the 'rt' local variable in pdc20230_set_piomode() being set but unused. Quite obviously, there is an outb() call missing to write back the updated variable. Moreover, checking the docs by Petr Soucek revealed that bitwise AND should have been done with a negated timing mask and the master/slave timing masks were swapped while updating...
Fixes: 669a5db411d8 ("[libata] Add a bunch of PATA drivers.") Reported-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/pata_legacy.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/ata/pata_legacy.c b/drivers/ata/pata_legacy.c index 0a8bf09a5c19..03c580625c2c 100644 --- a/drivers/ata/pata_legacy.c +++ b/drivers/ata/pata_legacy.c @@ -315,9 +315,10 @@ static void pdc20230_set_piomode(struct ata_port *ap, struct ata_device *adev) outb(inb(0x1F4) & 0x07, 0x1F4);
rt = inb(0x1F3); - rt &= 0x07 << (3 * adev->devno); + rt &= ~(0x07 << (3 * !adev->devno)); if (pio) - rt |= (1 + 3 * pio) << (3 * adev->devno); + rt |= (1 + 3 * pio) << (3 * !adev->devno); + outb(rt, 0x1F3);
udelay(100); outb(inb(0x1F2) | 0x01, 0x1F2);
From: Yang Yingliang yangyingliang@huawei.com
[ Upstream commit 015618c3ec19584c83ff179fa631be8cec906aaf ]
If devm_platform_ioremap_resource() fails, it never return NULL pointer, replace the check with IS_ERR().
Fixes: 57bf0f5a162d ("ARM: pxa: use pdev resource for palmld mmio") Signed-off-by: Yang Yingliang yangyingliang@huawei.com Reviewed-by: Sergey Shtylyov s.shtylyov@omp.ru Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/pata_palmld.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/ata/pata_palmld.c b/drivers/ata/pata_palmld.c index 400e65190904..51caa2a427dd 100644 --- a/drivers/ata/pata_palmld.c +++ b/drivers/ata/pata_palmld.c @@ -63,8 +63,8 @@ static int palmld_pata_probe(struct platform_device *pdev)
/* remap drive's physical memory address */ mem = devm_platform_ioremap_resource(pdev, 0); - if (!mem) - return -ENOMEM; + if (IS_ERR(mem)) + return PTR_ERR(mem);
/* request and activate power and reset GPIOs */ lda->power = devm_gpiod_get(dev, "power", GPIOD_OUT_HIGH);
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit 8bdc2acd420c6f3dd1f1c78750ec989f02a1e2b9 ]
We can't use "skb" again after passing it to qdisc_enqueue(). This is basically identical to commit 2f09707d0c97 ("sch_sfb: Also store skb len before calling child enqueue").
Fixes: d7f4f332f082 ("sch_red: update backlog as well") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_red.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c index f1e013e3f04a..935d90874b1b 100644 --- a/net/sched/sch_red.c +++ b/net/sched/sch_red.c @@ -72,6 +72,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch, { struct red_sched_data *q = qdisc_priv(sch); struct Qdisc *child = q->qdisc; + unsigned int len; int ret;
q->vars.qavg = red_calc_qavg(&q->parms, @@ -126,9 +127,10 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch, break; }
+ len = qdisc_pkt_len(skb); ret = qdisc_enqueue(skb, child, to_free); if (likely(ret == NET_XMIT_SUCCESS)) { - qdisc_qstats_backlog_inc(sch, skb); + sch->qstats.backlog += len; sch->q.qlen++; } else if (net_xmit_drop_count(ret)) { q->stats.pdrop++;
From: Ziyang Xuan william.xuanziyang@huawei.com
[ Upstream commit 363a5328f4b0517e59572118ccfb7c626d81dca9 ]
Recently, we got two syzkaller problems because of oversize packet when napi frags enabled.
One of the problems is because the first seg size of the iov_iter from user space is very big, it is 2147479538 which is bigger than the threshold value for bail out early in __alloc_pages(). And skb->pfmemalloc is true, __kmalloc_reserve() would use pfmemalloc reserves without __GFP_NOWARN flag. Thus we got a warning as following:
======================================================== WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 ... Call trace: __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 __alloc_pages_node include/linux/gfp.h:550 [inline] alloc_pages_node include/linux/gfp.h:564 [inline] kmalloc_large_node+0x94/0x350 mm/slub.c:4038 __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545 __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151 pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654 __skb_grow include/linux/skbuff.h:2779 [inline] tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477 tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835 tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036
The other problem is because odd IPv6 packets without NEXTHDR_NONE extension header and have big packet length, it is 2127925 which is bigger than ETH_MAX_MTU(65535). After ipv6_gso_pull_exthdrs() in ipv6_gro_receive(), network_header offset and transport_header offset are all bigger than U16_MAX. That would trigger skb->network_header and skb->transport_header overflow error, because they are all '__u16' type. Eventually, it would affect the value for __skb_push(skb, value), and make it be a big value. After __skb_push() in ipv6_gro_receive(), skb->data would less than skb->head, an out of bounds memory bug occurred. That would trigger the problem as following:
================================================================== BUG: KASAN: use-after-free in eth_type_trans+0x100/0x260 ... Call trace: dump_backtrace+0xd8/0x130 show_stack+0x1c/0x50 dump_stack_lvl+0x64/0x7c print_address_description.constprop.0+0xbc/0x2e8 print_report+0x100/0x1e4 kasan_report+0x80/0x120 __asan_load8+0x78/0xa0 eth_type_trans+0x100/0x260 napi_gro_frags+0x164/0x550 tun_get_user+0xda4/0x1270 tun_chr_write_iter+0x74/0x130 do_iter_readv_writev+0x130/0x1ec do_iter_write+0xbc/0x1e0 vfs_writev+0x13c/0x26c
To fix the problems, restrict the packet size less than (ETH_MAX_MTU - NET_SKB_PAD - NET_IP_ALIGN) which has considered reserved skb space in napi_alloc_skb() because transport_header is an offset from skb->head. Add len check in tun_napi_alloc_frags() simply.
Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20221029094101.1653855-1-william.xuanziyang@huawei... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/tun.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index db736b944016..b02bd0a6c0a9 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1459,7 +1459,8 @@ static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile, int err; int i;
- if (it->nr_segs > MAX_SKB_FRAGS + 1) + if (it->nr_segs > MAX_SKB_FRAGS + 1 || + len > (ETH_MAX_MTU - NET_SKB_PAD - NET_IP_ALIGN)) return ERR_PTR(-EMSGSIZE);
local_bh_disable();
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit d4bc8271db21ea9f1c86a1ca4d64999f184d4aae ]
commit release path is invoked via call_rcu and it runs lockless to release the objects after rcu grace period. The netlink notifier handler might win race to remove objects that the transaction context is still referencing from the commit release path.
Call rcu_barrier() to ensure pending rcu callbacks run to completion if the list of transactions to be destroyed is not empty.
Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership") Reported-by: syzbot+8f747f62763bc6c32916@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 5897afd12466..cc598504bc10 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -10030,6 +10030,8 @@ static int nft_rcv_nl_event(struct notifier_block *this, unsigned long event, nft_net = nft_pernet(net); deleted = 0; mutex_lock(&nft_net->commit_mutex); + if (!list_empty(&nf_tables_destroy_list)) + rcu_barrier(); again: list_for_each_entry(table, &nft_net->tables, list) { if (nft_table_has_owner(table) &&
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 26b5934ff4194e13196bedcba373cd4915071d0e ]
No need to postpone this to the commit release path, since no packets are walking over this object, this is accessed from control plane only. This helped uncovered UAF triggered by races with the netlink notifier.
Fixes: 9dd732e0bdf5 ("netfilter: nf_tables: memleak flow rule from commit path") Reported-by: syzbot+8f747f62763bc6c32916@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index cc598504bc10..879f4a1a27d5 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -8465,9 +8465,6 @@ static void nft_commit_release(struct nft_trans *trans) nf_tables_chain_destroy(&trans->ctx); break; case NFT_MSG_DELRULE: - if (trans->ctx.chain->flags & NFT_CHAIN_HW_OFFLOAD) - nft_flow_rule_destroy(nft_trans_flow_rule(trans)); - nf_tables_rule_destroy(&trans->ctx, nft_trans_rule(trans)); break; case NFT_MSG_DELSET: @@ -8973,6 +8970,9 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) nft_rule_expr_deactivate(&trans->ctx, nft_trans_rule(trans), NFT_TRANS_COMMIT); + + if (trans->ctx.chain->flags & NFT_CHAIN_HW_OFFLOAD) + nft_flow_rule_destroy(nft_trans_flow_rule(trans)); break; case NFT_MSG_NEWSET: nft_clear(net, nft_trans_set(trans));
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 6c412da54c80a54b1a8b7f89677f6e82f0fabec4 ]
If an error occurs after the first kzalloc() the corresponding memory allocation is never freed.
Add the missing kfree() in the error handling path, as already done in the remove() function.
Fixes: 7e773594dada ("sfc: Separate efx_nic memory from net_device memory") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Acked-by: Martin Habets habetsm.xilinx@gmail.com Link: https://lore.kernel.org/r/dc114193121c52c8fa3779e49bdd99d4b41344a9.166707700... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/sfc/efx.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c index 153d68e29b8b..68a68477ab7b 100644 --- a/drivers/net/ethernet/sfc/efx.c +++ b/drivers/net/ethernet/sfc/efx.c @@ -1059,8 +1059,10 @@ static int efx_pci_probe(struct pci_dev *pci_dev,
/* Allocate and initialise a struct net_device */ net_dev = alloc_etherdev_mq(sizeof(probe_data), EFX_MAX_CORE_TX_QUEUES); - if (!net_dev) - return -ENOMEM; + if (!net_dev) { + rc = -ENOMEM; + goto fail0; + } probe_ptr = netdev_priv(net_dev); *probe_ptr = probe_data; efx->net_dev = net_dev; @@ -1132,6 +1134,8 @@ static int efx_pci_probe(struct pci_dev *pci_dev, WARN_ON(rc > 0); netif_dbg(efx, drv, efx->net_dev, "initialisation failed. rc=%d\n", rc); free_netdev(net_dev); + fail0: + kfree(probe_data); return rc; }
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 8d0d254b15cc5b7d46d85fb7ab8ecede9575e672 ]
nfsd_file_unhash_and_dispose() is called for two reasons:
We're either shutting down and purging the filecache, or we've gotten a notification about a file delete, so we want to go ahead and unhash it so that it'll get cleaned up when we close.
We're either walking the hashtable or doing a lookup in it and we don't take a reference in either case. What we want to do in both cases is to try and unhash the object and put it on the dispose list if that was successful. If it's no longer hashed, then we don't want to touch it, with the assumption being that something else is already cleaning up the sentinel reference.
Instead of trying to selectively decrement the refcount in this function, just unhash it, and if that was successful, move it to the dispose list. Then, the disposal routine will just clean that up as usual.
Also, just make this a void function, drop the WARN_ON_ONCE, and the comments about deadlocking since the nature of the purported deadlock is no longer clear.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Stable-dep-of: d3aefd2b29ff ("nfsd: fix net-namespace logic in __nfsd_file_cache_purge") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 36 +++++++----------------------------- 1 file changed, 7 insertions(+), 29 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index eeed4ae5b4ad..844c41832d50 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -405,22 +405,15 @@ nfsd_file_unhash(struct nfsd_file *nf) return false; }
-/* - * Return true if the file was unhashed. - */ -static bool +static void nfsd_file_unhash_and_dispose(struct nfsd_file *nf, struct list_head *dispose) { trace_nfsd_file_unhash_and_dispose(nf); - if (!nfsd_file_unhash(nf)) - return false; - /* keep final reference for nfsd_file_lru_dispose */ - if (refcount_dec_not_one(&nf->nf_ref)) - return true; - - nfsd_file_lru_remove(nf); - list_add(&nf->nf_lru, dispose); - return true; + if (nfsd_file_unhash(nf)) { + /* caller must call nfsd_file_dispose_list() later */ + nfsd_file_lru_remove(nf); + list_add(&nf->nf_lru, dispose); + } }
static void @@ -562,8 +555,6 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose) * @lock: LRU list lock (unused) * @arg: dispose list * - * Note this can deadlock with nfsd_file_cache_purge. - * * Return values: * %LRU_REMOVED: @item was removed from the LRU * %LRU_ROTATE: @item is to be moved to the LRU tail @@ -748,8 +739,6 @@ nfsd_file_close_inode(struct inode *inode) * * Walk the LRU list and close any entries that have not been used since * the last scan. - * - * Note this can deadlock with nfsd_file_cache_purge. */ static void nfsd_file_delayed_close(struct work_struct *work) @@ -891,16 +880,12 @@ nfsd_file_cache_init(void) goto out; }
-/* - * Note this can deadlock with nfsd_file_lru_cb. - */ static void __nfsd_file_cache_purge(struct net *net) { struct rhashtable_iter iter; struct nfsd_file *nf; LIST_HEAD(dispose); - bool del;
rhashtable_walk_enter(&nfsd_file_rhash_tbl, &iter); do { @@ -910,14 +895,7 @@ __nfsd_file_cache_purge(struct net *net) while (!IS_ERR_OR_NULL(nf)) { if (net && nf->nf_net != net) continue; - del = nfsd_file_unhash_and_dispose(nf, &dispose); - - /* - * Deadlock detected! Something marked this entry as - * unhased, but hasn't removed it from the hash list. - */ - WARN_ON_ONCE(!del); - + nfsd_file_unhash_and_dispose(nf, &dispose); nf = rhashtable_walk_next(&iter); }
From: Jeff Layton jlayton@kernel.org
[ Upstream commit d3aefd2b29ff5ffdeb5c06a7d3191a027a18cdb8 ]
If the namespace doesn't match the one in "net", then we'll continue, but that doesn't cause another rhashtable_walk_next call, so it will loop infinitely.
Fixes: ce502f81ba88 ("NFSD: Convert the filecache to use rhashtable") Reported-by: Petr Vorel pvorel@suse.cz Link: https://lore.kernel.org/ltp/Y1%2FP8gDAcWC%2F+VR3@pevik/ Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 844c41832d50..173b38ffa423 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -893,9 +893,8 @@ __nfsd_file_cache_purge(struct net *net)
nf = rhashtable_walk_next(&iter); while (!IS_ERR_OR_NULL(nf)) { - if (net && nf->nf_net != net) - continue; - nfsd_file_unhash_and_dispose(nf, &dispose); + if (!net || nf->nf_net == net) + nfsd_file_unhash_and_dispose(nf, &dispose); nf = rhashtable_walk_next(&iter); }
From: Horatiu Vultur horatiu.vultur@microchip.com
[ Upstream commit 486c292230166c2d61701d3c984bf9143588ea28 ]
When the MTU was changed, the lan966x didn't take in consideration the L2 header and the FCS. So the HW was configured with a smaller value than what was desired. Therefore the correct value to configure the HW would be new_mtu + ETH_HLEN + ETH_FCS_LEN. The vlan tag is not considered here, because at the time when the blamed commit was added, there was no vlan filtering support. The vlan fix will be part of the next patch.
Fixes: d28d6d2e37d1 ("net: lan966x: add port module support") Signed-off-by: Horatiu Vultur horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microchip/lan966x/lan966x_main.c | 2 +- drivers/net/ethernet/microchip/lan966x/lan966x_main.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c index d928b75f3780..989e5f045d7e 100644 --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c @@ -385,7 +385,7 @@ static int lan966x_port_change_mtu(struct net_device *dev, int new_mtu) int old_mtu = dev->mtu; int err;
- lan_wr(DEV_MAC_MAXLEN_CFG_MAX_LEN_SET(new_mtu), + lan_wr(DEV_MAC_MAXLEN_CFG_MAX_LEN_SET(LAN966X_HW_MTU(new_mtu)), lan966x, DEV_MAC_MAXLEN_CFG(port->chip_port)); dev->mtu = new_mtu;
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h index 2787055c1847..e316bfe186d7 100644 --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h @@ -24,6 +24,8 @@ #define LAN966X_BUFFER_MEMORY (160 * 1024) #define LAN966X_BUFFER_MIN_SZ 60
+#define LAN966X_HW_MTU(mtu) ((mtu) + ETH_HLEN + ETH_FCS_LEN) + #define PGID_AGGR 64 #define PGID_SRC 80 #define PGID_ENTRIES 89
From: Horatiu Vultur horatiu.vultur@microchip.com
[ Upstream commit 25f28bb1b4a7717a9df3aa574d210374ebb6bb23 ]
When vlan filtering is enabled/disabled, it is required to adjust the maximum received frame size that it can received. When vlan filtering is enabled, it would all to receive extra 4 bytes, that are the vlan tag. So the maximum frame size would be 1522 with a vlan tag. If vlan filtering is disabled then the maximum frame size would be 1518 regardless if there is or not a vlan tag.
Fixes: 6d2c186afa5d ("net: lan966x: Add vlan support.") Signed-off-by: Horatiu Vultur horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/microchip/lan966x/lan966x_regs.h | 15 +++++++++++++++ .../net/ethernet/microchip/lan966x/lan966x_vlan.c | 6 ++++++ 2 files changed, 21 insertions(+)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h b/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h index 8265ad89f0bc..357ecc2f1089 100644 --- a/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_regs.h @@ -444,6 +444,21 @@ enum lan966x_target { #define DEV_MAC_MAXLEN_CFG_MAX_LEN_GET(x)\ FIELD_GET(DEV_MAC_MAXLEN_CFG_MAX_LEN, x)
+/* DEV:MAC_CFG_STATUS:MAC_TAGS_CFG */ +#define DEV_MAC_TAGS_CFG(t) __REG(TARGET_DEV, t, 8, 28, 0, 1, 44, 12, 0, 1, 4) + +#define DEV_MAC_TAGS_CFG_VLAN_DBL_AWR_ENA BIT(1) +#define DEV_MAC_TAGS_CFG_VLAN_DBL_AWR_ENA_SET(x)\ + FIELD_PREP(DEV_MAC_TAGS_CFG_VLAN_DBL_AWR_ENA, x) +#define DEV_MAC_TAGS_CFG_VLAN_DBL_AWR_ENA_GET(x)\ + FIELD_GET(DEV_MAC_TAGS_CFG_VLAN_DBL_AWR_ENA, x) + +#define DEV_MAC_TAGS_CFG_VLAN_AWR_ENA BIT(0) +#define DEV_MAC_TAGS_CFG_VLAN_AWR_ENA_SET(x)\ + FIELD_PREP(DEV_MAC_TAGS_CFG_VLAN_AWR_ENA, x) +#define DEV_MAC_TAGS_CFG_VLAN_AWR_ENA_GET(x)\ + FIELD_GET(DEV_MAC_TAGS_CFG_VLAN_AWR_ENA, x) + /* DEV:MAC_CFG_STATUS:MAC_IFG_CFG */ #define DEV_MAC_IFG_CFG(t) __REG(TARGET_DEV, t, 8, 28, 0, 1, 44, 20, 0, 1, 4)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_vlan.c b/drivers/net/ethernet/microchip/lan966x/lan966x_vlan.c index 8d7260cd7da9..3c44660128da 100644 --- a/drivers/net/ethernet/microchip/lan966x/lan966x_vlan.c +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_vlan.c @@ -169,6 +169,12 @@ void lan966x_vlan_port_apply(struct lan966x_port *port) ANA_VLAN_CFG_VLAN_POP_CNT, lan966x, ANA_VLAN_CFG(port->chip_port));
+ lan_rmw(DEV_MAC_TAGS_CFG_VLAN_AWR_ENA_SET(port->vlan_aware) | + DEV_MAC_TAGS_CFG_VLAN_DBL_AWR_ENA_SET(port->vlan_aware), + DEV_MAC_TAGS_CFG_VLAN_AWR_ENA | + DEV_MAC_TAGS_CFG_VLAN_DBL_AWR_ENA, + lan966x, DEV_MAC_TAGS_CFG(port->chip_port)); + /* Drop frames with multicast source address */ val = ANA_DROP_CFG_DROP_MC_SMAC_ENA_SET(1); if (port->vlan_aware && !pvid)
From: Horatiu Vultur horatiu.vultur@microchip.com
[ Upstream commit 872ad758f9b7fb4eb42aebaf64e50c5b29b7ffe5 ]
When MTU is changed, FDMA is required to calculate what is the maximum size of the frame that it can received. So it can calculate what is the page order needed to allocate for the received frames. The first problem was that, when the max MTU was calculated it was reading the value from dev and not from HW, so in this way it was missing L2 header + the FCS. The other problem was that once the skb is created using __build_skb_around, it would reserve some space for skb_shared_info. So if we received a frame which size is at the limit of the page order then the creating will failed because it would not have space to put all the data.
Fixes: 2ea1cbac267e ("net: lan966x: Update FDMA to change MTU.") Signed-off-by: Horatiu Vultur horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c | 8 ++++++-- drivers/net/ethernet/microchip/lan966x/lan966x_main.c | 2 +- 2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c index 69f741db25b1..407a6c125515 100644 --- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c @@ -668,12 +668,14 @@ static int lan966x_fdma_get_max_mtu(struct lan966x *lan966x) int i;
for (i = 0; i < lan966x->num_phys_ports; ++i) { + struct lan966x_port *port; int mtu;
- if (!lan966x->ports[i]) + port = lan966x->ports[i]; + if (!port) continue;
- mtu = lan966x->ports[i]->dev->mtu; + mtu = lan_rd(lan966x, DEV_MAC_MAXLEN_CFG(port->chip_port)); if (mtu > max_mtu) max_mtu = mtu; } @@ -733,6 +735,8 @@ int lan966x_fdma_change_mtu(struct lan966x *lan966x)
max_mtu = lan966x_fdma_get_max_mtu(lan966x); max_mtu += IFH_LEN * sizeof(u32); + max_mtu += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + max_mtu += VLAN_HLEN * 2;
if (round_up(max_mtu, PAGE_SIZE) / PAGE_SIZE - 1 == lan966x->rx.page_order) diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c index 989e5f045d7e..4a3cb7579420 100644 --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c @@ -394,7 +394,7 @@ static int lan966x_port_change_mtu(struct net_device *dev, int new_mtu)
err = lan966x_fdma_change_mtu(lan966x); if (err) { - lan_wr(DEV_MAC_MAXLEN_CFG_MAX_LEN_SET(old_mtu), + lan_wr(DEV_MAC_MAXLEN_CFG_MAX_LEN_SET(LAN966X_HW_MTU(old_mtu)), lan966x, DEV_MAC_MAXLEN_CFG(port->chip_port)); dev->mtu = old_mtu; }
From: Horatiu Vultur horatiu.vultur@microchip.com
[ Upstream commit fc57062f98b0b0ae52bc584d8fd5ac77c50df607 ]
When lan966x was receiving a frame, then it was building the skb and after that it was calling dma_unmap_single with frame size as the length. This actually has 2 issues: 1. It is using a length to map and a different length to unmap. 2. When the unmap was happening, the data was sync for cpu but it could be that this will overwrite what build_skb was initializing.
The fix for these two problems is to change the order of operations. First to sync the frame for cpu, then to build the skb and in the end to unmap using the correct size but without sync the frame again for cpu.
Fixes: c8349639324a ("net: lan966x: Add FDMA functionality") Signed-off-by: Horatiu Vultur horatiu.vultur@microchip.com Link: https://lore.kernel.org/r/20221031133421.1283196-1-horatiu.vultur@microchip.... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/microchip/lan966x/lan966x_fdma.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c index 407a6c125515..3a1a0f9178c0 100644 --- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c @@ -414,13 +414,15 @@ static struct sk_buff *lan966x_fdma_rx_get_frame(struct lan966x_rx *rx) /* Get the received frame and unmap it */ db = &rx->dcbs[rx->dcb_index].db[rx->db_index]; page = rx->page[rx->dcb_index][rx->db_index]; + + dma_sync_single_for_cpu(lan966x->dev, (dma_addr_t)db->dataptr, + FDMA_DCB_STATUS_BLOCKL(db->status), + DMA_FROM_DEVICE); + skb = build_skb(page_address(page), PAGE_SIZE << rx->page_order); if (unlikely(!skb)) goto unmap_page;
- dma_unmap_single(lan966x->dev, (dma_addr_t)db->dataptr, - FDMA_DCB_STATUS_BLOCKL(db->status), - DMA_FROM_DEVICE); skb_put(skb, FDMA_DCB_STATUS_BLOCKL(db->status));
lan966x_ifh_get_src_port(skb->data, &src_port); @@ -429,6 +431,10 @@ static struct sk_buff *lan966x_fdma_rx_get_frame(struct lan966x_rx *rx) if (WARN_ON(src_port >= lan966x->num_phys_ports)) goto free_skb;
+ dma_unmap_single_attrs(lan966x->dev, (dma_addr_t)db->dataptr, + PAGE_SIZE << rx->page_order, DMA_FROM_DEVICE, + DMA_ATTR_SKIP_CPU_SYNC); + skb->dev = lan966x->ports[src_port]->dev; skb_pull(skb, IFH_LEN * sizeof(u32));
@@ -454,9 +460,9 @@ static struct sk_buff *lan966x_fdma_rx_get_frame(struct lan966x_rx *rx) free_skb: kfree_skb(skb); unmap_page: - dma_unmap_page(lan966x->dev, (dma_addr_t)db->dataptr, - FDMA_DCB_STATUS_BLOCKL(db->status), - DMA_FROM_DEVICE); + dma_unmap_single_attrs(lan966x->dev, (dma_addr_t)db->dataptr, + PAGE_SIZE << rx->page_order, DMA_FROM_DEVICE, + DMA_ATTR_SKIP_CPU_SYNC); __free_pages(page, rx->page_order);
return NULL;
From: Jason A. Donenfeld Jason@zx2c4.com
[ Upstream commit 5c26159c97b324dc5174a5713eafb8c855cf8106 ]
The `char` type with no explicit sign is sometimes signed and sometimes unsigned. This code will break on platforms such as arm, where char is unsigned. So mark it here as explicitly signed, so that the todrop_counter decrement and subsequent comparison is correct.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Acked-by: Julian Anastasov ja@ssi.bg Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/ipvs/ip_vs_conn.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c index fb67f1ca2495..db13288fddfa 100644 --- a/net/netfilter/ipvs/ip_vs_conn.c +++ b/net/netfilter/ipvs/ip_vs_conn.c @@ -1265,8 +1265,8 @@ static inline int todrop_entry(struct ip_vs_conn *cp) * The drop rate array needs tuning for real environments. * Called from timer bh only => no locking */ - static const char todrop_rate[9] = {0, 1, 2, 3, 4, 5, 6, 7, 8}; - static char todrop_counter[9] = {0}; + static const signed char todrop_rate[9] = {0, 1, 2, 3, 4, 5, 6, 7, 8}; + static signed char todrop_counter[9] = {0}; int i;
/* if the conn entry hasn't lasted for 60 seconds, don't drop it.
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit 3d00c6a0da8ddcf75213e004765e4a42acc71d5d ]
During the initialization of ip_vs_conn_net_init(), if file ip_vs_conn or ip_vs_conn_sync fails to be created, the initialization is successful by default. Therefore, the ip_vs_conn or ip_vs_conn_sync file doesn't be found during the remove.
The following is the stack information: name 'ip_vs_conn_sync' WARNING: CPU: 3 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460 Modules linked in: Workqueue: netns cleanup_net RIP: 0010:remove_proc_entry+0x389/0x460 Call Trace: <TASK> __ip_vs_cleanup_batch+0x7d/0x120 ops_exit_list+0x125/0x170 cleanup_net+0x4ea/0xb00 process_one_work+0x9bf/0x1710 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 </TASK>
Fixes: 61b1ab4583e2 ("IPVS: netns, add basic init per netns.") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Acked-by: Julian Anastasov ja@ssi.bg Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/ipvs/ip_vs_conn.c | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c index db13288fddfa..cb6d68220c26 100644 --- a/net/netfilter/ipvs/ip_vs_conn.c +++ b/net/netfilter/ipvs/ip_vs_conn.c @@ -1447,20 +1447,36 @@ int __net_init ip_vs_conn_net_init(struct netns_ipvs *ipvs) { atomic_set(&ipvs->conn_count, 0);
- proc_create_net("ip_vs_conn", 0, ipvs->net->proc_net, - &ip_vs_conn_seq_ops, sizeof(struct ip_vs_iter_state)); - proc_create_net("ip_vs_conn_sync", 0, ipvs->net->proc_net, - &ip_vs_conn_sync_seq_ops, - sizeof(struct ip_vs_iter_state)); +#ifdef CONFIG_PROC_FS + if (!proc_create_net("ip_vs_conn", 0, ipvs->net->proc_net, + &ip_vs_conn_seq_ops, + sizeof(struct ip_vs_iter_state))) + goto err_conn; + + if (!proc_create_net("ip_vs_conn_sync", 0, ipvs->net->proc_net, + &ip_vs_conn_sync_seq_ops, + sizeof(struct ip_vs_iter_state))) + goto err_conn_sync; +#endif + return 0; + +#ifdef CONFIG_PROC_FS +err_conn_sync: + remove_proc_entry("ip_vs_conn", ipvs->net->proc_net); +err_conn: + return -ENOMEM; +#endif }
void __net_exit ip_vs_conn_net_cleanup(struct netns_ipvs *ipvs) { /* flush all the connection entries first */ ip_vs_conn_flush(ipvs); +#ifdef CONFIG_PROC_FS remove_proc_entry("ip_vs_conn", ipvs->net->proc_net); remove_proc_entry("ip_vs_conn_sync", ipvs->net->proc_net); +#endif }
int __init ip_vs_conn_init(void)
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit 5663ed63adb9619c98ab7479aa4606fa9b7a548c ]
During the initialization of ip_vs_app_net_init(), if file ip_vs_app fails to be created, the initialization is successful by default. Therefore, the ip_vs_app file doesn't be found during the remove in ip_vs_app_net_cleanup(). It will cause WRNING.
The following is the stack information: name 'ip_vs_app' WARNING: CPU: 1 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460 Modules linked in: Workqueue: netns cleanup_net RIP: 0010:remove_proc_entry+0x389/0x460 Call Trace: <TASK> ops_exit_list+0x125/0x170 cleanup_net+0x4ea/0xb00 process_one_work+0x9bf/0x1710 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 </TASK>
Fixes: 457c4cbc5a3d ("[NET]: Make /proc/net per network namespace") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Acked-by: Julian Anastasov ja@ssi.bg Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/ipvs/ip_vs_app.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_app.c b/net/netfilter/ipvs/ip_vs_app.c index f9b16f2b2219..fdacbc3c15be 100644 --- a/net/netfilter/ipvs/ip_vs_app.c +++ b/net/netfilter/ipvs/ip_vs_app.c @@ -599,13 +599,19 @@ static const struct seq_operations ip_vs_app_seq_ops = { int __net_init ip_vs_app_net_init(struct netns_ipvs *ipvs) { INIT_LIST_HEAD(&ipvs->app_list); - proc_create_net("ip_vs_app", 0, ipvs->net->proc_net, &ip_vs_app_seq_ops, - sizeof(struct seq_net_private)); +#ifdef CONFIG_PROC_FS + if (!proc_create_net("ip_vs_app", 0, ipvs->net->proc_net, + &ip_vs_app_seq_ops, + sizeof(struct seq_net_private))) + return -ENOMEM; +#endif return 0; }
void __net_exit ip_vs_app_net_cleanup(struct netns_ipvs *ipvs) { unregister_ip_vs_app(ipvs, NULL /* all */); +#ifdef CONFIG_PROC_FS remove_proc_entry("ip_vs_app", ipvs->net->proc_net); +#endif }
From: Zhang Qilong zhangqilong3@huawei.com
[ Upstream commit e97c089d7a49f67027395ddf70bf327eeac2611e ]
The syzkaller reported an issue:
KASAN: null-ptr-deref in range [0x0000000000000380-0x0000000000000387] CPU: 0 PID: 4069 Comm: kworker/0:15 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Workqueue: rcu_gp srcu_invoke_callbacks RIP: 0010:rose_send_frame+0x1dd/0x2f0 net/rose/rose_link.c:101 Call Trace: <IRQ> rose_transmit_clear_request+0x1d5/0x290 net/rose/rose_link.c:255 rose_rx_call_request+0x4c0/0x1bc0 net/rose/af_rose.c:1009 rose_loopback_timer+0x19e/0x590 net/rose/rose_loopback.c:111 call_timer_fn+0x1a0/0x6b0 kernel/time/timer.c:1474 expire_timers kernel/time/timer.c:1519 [inline] __run_timers.part.0+0x674/0xa80 kernel/time/timer.c:1790 __run_timers kernel/time/timer.c:1768 [inline] run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1803 __do_softirq+0x1d0/0x9c8 kernel/softirq.c:571 [...] </IRQ>
It triggers NULL pointer dereference when 'neigh->dev->dev_addr' is called in the rose_send_frame(). It's the first occurrence of the `neigh` is in rose_loopback_timer() as `rose_loopback_neigh', and the 'dev' in 'rose_loopback_neigh' is initialized sa nullptr.
It had been fixed by commit 3b3fd068c56e3fbea30090859216a368398e39bf ("rose: Fix Null pointer dereference in rose_send_frame()") ever. But it's introduced by commit 3c53cd65dece47dd1f9d3a809f32e59d1d87b2b8 ("rose: check NULL rose_loopback_neigh->loopback") again.
We fix it by add NULL check in rose_transmit_clear_request(). When the 'dev' in 'neigh' is NULL, we don't reply the request and just clear it.
syzkaller don't provide repro, and I provide a syz repro like: r0 = syz_init_net_socket$bt_sco(0x1f, 0x5, 0x2) ioctl$sock_inet_SIOCSIFFLAGS(r0, 0x8914, &(0x7f0000000180)={'rose0\x00', 0x201}) r1 = syz_init_net_socket$rose(0xb, 0x5, 0x0) bind$rose(r1, &(0x7f00000000c0)=@full={0xb, @dev, @null, 0x0, [@null, @null, @netrom, @netrom, @default, @null]}, 0x40) connect$rose(r1, &(0x7f0000000240)=@short={0xb, @dev={0xbb, 0xbb, 0xbb, 0x1, 0x0}, @remote={0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0x1}, 0x1, @netrom={0xbb, 0xbb, 0xbb, 0xbb, 0xbb, 0x0, 0x0}}, 0x1c)
Fixes: 3c53cd65dece ("rose: check NULL rose_loopback_neigh->loopback") Signed-off-by: Zhang Qilong zhangqilong3@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/rose/rose_link.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/rose/rose_link.c b/net/rose/rose_link.c index 8b96a56d3a49..0f77ae8ef944 100644 --- a/net/rose/rose_link.c +++ b/net/rose/rose_link.c @@ -236,6 +236,9 @@ void rose_transmit_clear_request(struct rose_neigh *neigh, unsigned int lci, uns unsigned char *dptr; int len;
+ if (!neigh->dev) + return; + len = AX25_BPQ_HEADER_LEN + AX25_MAX_HEADER_LEN + ROSE_MIN_LEN + 3;
if ((skb = alloc_skb(len, GFP_ATOMIC)) == NULL)
From: Yang Yingliang yangyingliang@huawei.com
[ Upstream commit e7d1d4d9ac0dfa40be4c2c8abd0731659869b297 ]
Afer commit 1fa5ae857bb1 ("driver core: get rid of struct device's bus_id string array"), the name of device is allocated dynamically, add put_device() to give up the reference, so that the name can be freed in kobject_cleanup() when the refcount is 0.
Set device class before put_device() to avoid null release() function WARN message in device_release().
Fixes: 1fa5ae857bb1 ("driver core: get rid of struct device's bus_id string array") Signed-off-by: Yang Yingliang yangyingliang@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/isdn/mISDN/core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/isdn/mISDN/core.c b/drivers/isdn/mISDN/core.c index a41b4b264594..7ea0100f218a 100644 --- a/drivers/isdn/mISDN/core.c +++ b/drivers/isdn/mISDN/core.c @@ -233,11 +233,12 @@ mISDN_register_device(struct mISDNdevice *dev, if (debug & DEBUG_CORE) printk(KERN_DEBUG "mISDN_register %s %d\n", dev_name(&dev->dev), dev->id); + dev->dev.class = &mISDN_class; + err = create_stack(dev); if (err) goto error1;
- dev->dev.class = &mISDN_class; dev->dev.platform_data = dev; dev->dev.parent = parent; dev_set_drvdata(&dev->dev, dev); @@ -249,8 +250,8 @@ mISDN_register_device(struct mISDNdevice *dev,
error3: delete_stack(dev); - return err; error1: + put_device(&dev->dev); return err;
}
From: Yang Yingliang yangyingliang@huawei.com
[ Upstream commit bf00f5426074249058a106a6edbb89e4b25a4d79 ]
The class is set in mISDN_register_device(), but if device_add() returns error, it will lead to delete a device without added, fix this by using device_is_registered() to check if the device is registered.
Fixes: a900845e5661 ("mISDN: Add support for Traverse Technologies NETJet PCI cards") Signed-off-by: Yang Yingliang yangyingliang@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/isdn/hardware/mISDN/netjet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/isdn/hardware/mISDN/netjet.c b/drivers/isdn/hardware/mISDN/netjet.c index a52f275f8263..f8447135a902 100644 --- a/drivers/isdn/hardware/mISDN/netjet.c +++ b/drivers/isdn/hardware/mISDN/netjet.c @@ -956,7 +956,7 @@ nj_release(struct tiger_hw *card) } if (card->irq > 0) free_irq(card->irq, card); - if (card->isac.dch.dev.dev.class) + if (device_is_registered(&card->isac.dch.dev.dev)) mISDN_unregister_device(&card->isac.dch.dev);
for (i = 0; i < 2; i++) {
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 5614dc3a47e3310fbc77ea3b67eaadd1c6417bf1 ]
During backref walking, at resolve_indirect_refs(), if we get an error we jump to the 'out' label and call ulist_free() on the 'parents' ulist, which frees all the elements in the ulist - however that does not free any inode lists that may be attached to elements, through the 'aux' field of a ulist node, so we end up leaking lists if we have any attached to the unodes.
Fix this by calling free_leaf_list() instead of ulist_free() when we exit from resolve_indirect_refs(). The static function free_leaf_list() is moved up for this to be possible and it's slightly simplified by removing unnecessary code.
Fixes: 3301958b7c1d ("Btrfs: add inodes before dropping the extent lock in find_all_leafs") Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/backref.c | 36 +++++++++++++++++------------------- 1 file changed, 17 insertions(+), 19 deletions(-)
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index ccc818b40977..90702a74f90c 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -648,6 +648,18 @@ unode_aux_to_inode_list(struct ulist_node *node) return (struct extent_inode_elem *)(uintptr_t)node->aux; }
+static void free_leaf_list(struct ulist *ulist) +{ + struct ulist_node *node; + struct ulist_iterator uiter; + + ULIST_ITER_INIT(&uiter); + while ((node = ulist_next(ulist, &uiter))) + free_inode_elem_list(unode_aux_to_inode_list(node)); + + ulist_free(ulist); +} + /* * We maintain three separate rbtrees: one for direct refs, one for * indirect refs which have a key, and one for indirect refs which do not @@ -762,7 +774,11 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info, cond_resched(); } out: - ulist_free(parents); + /* + * We may have inode lists attached to refs in the parents ulist, so we + * must free them before freeing the ulist and its refs. + */ + free_leaf_list(parents); return ret; }
@@ -1409,24 +1425,6 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, return ret; }
-static void free_leaf_list(struct ulist *blocks) -{ - struct ulist_node *node = NULL; - struct extent_inode_elem *eie; - struct ulist_iterator uiter; - - ULIST_ITER_INIT(&uiter); - while ((node = ulist_next(blocks, &uiter))) { - if (!node->aux) - continue; - eie = unode_aux_to_inode_list(node); - free_inode_elem_list(eie); - node->aux = 0; - } - - ulist_free(blocks); -} - /* * Finds all leafs with a reference to the specified combination of bytenr and * offset. key_list_head will point to a list of corresponding keys (caller must
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 92876eec382a0f19f33d09d2c939e9ca49038ae5 ]
During backref walking, at find_parent_nodes(), if we are dealing with a data extent and we get an error while resolving the indirect backrefs, at resolve_indirect_refs(), or in the while loop that iterates over the refs in the direct refs rbtree, we end up leaking the inode lists attached to the direct refs we have in the direct refs rbtree that were not yet added to the refs ulist passed as argument to find_parent_nodes(). Since they were not yet added to the refs ulist and prelim_release() does not free the lists, on error the caller can only free the lists attached to the refs that were added to the refs ulist, all the remaining refs get their inode lists never freed, therefore leaking their memory.
Fix this by having prelim_release() always free any attached inode list to each ref found in the rbtree, and have find_parent_nodes() set the ref's inode list to NULL once it transfers ownership of the inode list to a ref added to the refs ulist passed to find_parent_nodes().
Fixes: 86d5f9944252 ("btrfs: convert prelimary reference tracking to use rbtrees") Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/backref.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 90702a74f90c..21c478df6aef 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -289,8 +289,10 @@ static void prelim_release(struct preftree *preftree) struct prelim_ref *ref, *next_ref;
rbtree_postorder_for_each_entry_safe(ref, next_ref, - &preftree->root.rb_root, rbnode) + &preftree->root.rb_root, rbnode) { + free_inode_elem_list(ref->inode_list); free_pref(ref); + }
preftree->root = RB_ROOT_CACHED; preftree->count = 0; @@ -1384,6 +1386,12 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, if (ret < 0) goto out; ref->inode_list = eie; + /* + * We transferred the list ownership to the ref, + * so set to NULL to avoid a double free in case + * an error happens after this. + */ + eie = NULL; } ret = ulist_add_merge_ptr(refs, ref->parent, ref->inode_list, @@ -1409,6 +1417,14 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, eie->next = ref->inode_list; } eie = NULL; + /* + * We have transferred the inode list ownership from + * this ref to the ref we added to the 'refs' ulist. + * So set this ref's inode list to NULL to avoid + * use-after-free when our caller uses it or double + * frees in case an error happens before we return. + */ + ref->inode_list = NULL; } cond_resched(); }
From: Filipe Manana fdmanana@suse.com
[ Upstream commit d37de92b38932d40e4a251e876cc388f9aee5f42 ]
In the test_no_shared_qgroup() and test_multiple_refs() qgroup self tests, if we fail to add the tree ref, remove the extent item or remove the extent ref, we are returning from the test function without freeing the "old_roots" ulist that was allocated by the previous calls to btrfs_find_all_roots(). Fix that by calling ulist_free() before returning.
Fixes: 442244c96332 ("btrfs: qgroup: Switch self test to extent-oriented qgroup mechanism.") Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tests/qgroup-tests.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/tests/qgroup-tests.c b/fs/btrfs/tests/qgroup-tests.c index eee1e4459541..843dd3d3adbe 100644 --- a/fs/btrfs/tests/qgroup-tests.c +++ b/fs/btrfs/tests/qgroup-tests.c @@ -232,8 +232,10 @@ static int test_no_shared_qgroup(struct btrfs_root *root,
ret = insert_normal_tree_ref(root, nodesize, nodesize, 0, BTRFS_FS_TREE_OBJECTID); - if (ret) + if (ret) { + ulist_free(old_roots); return ret; + }
ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { @@ -266,8 +268,10 @@ static int test_no_shared_qgroup(struct btrfs_root *root, }
ret = remove_extent_item(root, nodesize, nodesize); - if (ret) + if (ret) { + ulist_free(old_roots); return -EINVAL; + }
ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { @@ -329,8 +333,10 @@ static int test_multiple_refs(struct btrfs_root *root,
ret = insert_normal_tree_ref(root, nodesize, nodesize, 0, BTRFS_FS_TREE_OBJECTID); - if (ret) + if (ret) { + ulist_free(old_roots); return ret; + }
ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { @@ -362,8 +368,10 @@ static int test_multiple_refs(struct btrfs_root *root,
ret = add_tree_ref(root, nodesize, nodesize, 0, BTRFS_FIRST_FREE_OBJECTID); - if (ret) + if (ret) { + ulist_free(old_roots); return ret; + }
ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { @@ -401,8 +409,10 @@ static int test_multiple_refs(struct btrfs_root *root,
ret = remove_extent_ref(root, nodesize, nodesize, 0, BTRFS_FIRST_FREE_OBJECTID); - if (ret) + if (ret) { + ulist_free(old_roots); return ret; + }
ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) {
From: Jozsef Kadlecsik kadlec@netfilter.org
[ Upstream commit 510841da1fcc16f702440ab58ef0b4d82a9056b7 ]
Daniel Xu reported that the hash:net,iface type of the ipset subsystem does not limit adding the same network with different interfaces to a set, which can lead to huge memory usage or allocation failure.
The quick reproducer is
$ ipset create ACL.IN.ALL_PERMIT hash:net,iface hashsize 1048576 timeout 0 $ for i in $(seq 0 100); do /sbin/ipset add ACL.IN.ALL_PERMIT 0.0.0.0/0,kaf_$i timeout 0 -exist; done
The backtrace when vmalloc fails:
[Tue Oct 25 00:13:08 2022] ipset: vmalloc error: size 1073741848, exceeds total pages <...> [Tue Oct 25 00:13:08 2022] Call Trace: [Tue Oct 25 00:13:08 2022] <TASK> [Tue Oct 25 00:13:08 2022] dump_stack_lvl+0x48/0x60 [Tue Oct 25 00:13:08 2022] warn_alloc+0x155/0x180 [Tue Oct 25 00:13:08 2022] __vmalloc_node_range+0x72a/0x760 [Tue Oct 25 00:13:08 2022] ? hash_netiface4_add+0x7c0/0xb20 [Tue Oct 25 00:13:08 2022] ? __kmalloc_large_node+0x4a/0x90 [Tue Oct 25 00:13:08 2022] kvmalloc_node+0xa6/0xd0 [Tue Oct 25 00:13:08 2022] ? hash_netiface4_resize+0x99/0x710 <...>
The fix is to enforce the limit documented in the ipset(8) manpage:
The internal restriction of the hash:net,iface set type is that the same network prefix cannot be stored with more than 64 different interfaces in a single set.
Fixes: ccf0a4b7fc68 ("netfilter: ipset: Add bucketsize parameter to all hash types") Reported-by: Daniel Xu dxu@dxuuu.xyz Signed-off-by: Jozsef Kadlecsik kadlec@netfilter.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/ipset/ip_set_hash_gen.h | 30 ++++++--------------------- 1 file changed, 6 insertions(+), 24 deletions(-)
diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h index 6e391308431d..3adc291d9ce1 100644 --- a/net/netfilter/ipset/ip_set_hash_gen.h +++ b/net/netfilter/ipset/ip_set_hash_gen.h @@ -42,31 +42,8 @@ #define AHASH_MAX_SIZE (6 * AHASH_INIT_SIZE) /* Max muber of elements in the array block when tuned */ #define AHASH_MAX_TUNED 64 - #define AHASH_MAX(h) ((h)->bucketsize)
-/* Max number of elements can be tuned */ -#ifdef IP_SET_HASH_WITH_MULTI -static u8 -tune_bucketsize(u8 curr, u32 multi) -{ - u32 n; - - if (multi < curr) - return curr; - - n = curr + AHASH_INIT_SIZE; - /* Currently, at listing one hash bucket must fit into a message. - * Therefore we have a hard limit here. - */ - return n > curr && n <= AHASH_MAX_TUNED ? n : curr; -} -#define TUNE_BUCKETSIZE(h, multi) \ - ((h)->bucketsize = tune_bucketsize((h)->bucketsize, multi)) -#else -#define TUNE_BUCKETSIZE(h, multi) -#endif - /* A hash bucket */ struct hbucket { struct rcu_head rcu; /* for call_rcu */ @@ -936,7 +913,12 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext, goto set_full; /* Create a new slot */ if (n->pos >= n->size) { - TUNE_BUCKETSIZE(h, multi); +#ifdef IP_SET_HASH_WITH_MULTI + if (h->bucketsize >= AHASH_MAX_TUNED) + goto set_full; + else if (h->bucketsize < multi) + h->bucketsize += AHASH_INIT_SIZE; +#endif if (n->size >= AHASH_MAX(h)) { /* Trigger rehashing */ mtype_data_next(&h->next, d);
From: Maxim Mikityanskiy maxtram95@gmail.com
[ Upstream commit 3aff8aaca4e36dc8b17eaa011684881a80238966 ]
Fix the race condition between the following two flows that run in parallel:
1. l2cap_reassemble_sdu -> chan->ops->recv (l2cap_sock_recv_cb) -> __sock_queue_rcv_skb.
2. bt_sock_recvmsg -> skb_recv_datagram, skb_free_datagram.
An SKB can be queued by the first flow and immediately dequeued and freed by the second flow, therefore the callers of l2cap_reassemble_sdu can't use the SKB after that function returns. However, some places continue accessing struct l2cap_ctrl that resides in the SKB's CB for a short time after l2cap_reassemble_sdu returns, leading to a use-after-free condition (the stack trace is below, line numbers for kernel 5.19.8).
Fix it by keeping a local copy of struct l2cap_ctrl.
BUG: KASAN: use-after-free in l2cap_rx_state_recv (net/bluetooth/l2cap_core.c:6906) bluetooth Read of size 1 at addr ffff88812025f2f0 by task kworker/u17:3/43169
Workqueue: hci0 hci_rx_work [bluetooth] Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) print_report.cold (mm/kasan/report.c:314 mm/kasan/report.c:429) ? l2cap_rx_state_recv (net/bluetooth/l2cap_core.c:6906) bluetooth kasan_report (mm/kasan/report.c:162 mm/kasan/report.c:493) ? l2cap_rx_state_recv (net/bluetooth/l2cap_core.c:6906) bluetooth l2cap_rx_state_recv (net/bluetooth/l2cap_core.c:6906) bluetooth l2cap_rx (net/bluetooth/l2cap_core.c:7236 net/bluetooth/l2cap_core.c:7271) bluetooth ret_from_fork (arch/x86/entry/entry_64.S:306) </TASK>
Allocated by task 43169: kasan_save_stack (mm/kasan/common.c:39) __kasan_slab_alloc (mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:469) kmem_cache_alloc_node (mm/slab.h:750 mm/slub.c:3243 mm/slub.c:3293) __alloc_skb (net/core/skbuff.c:414) l2cap_recv_frag (./include/net/bluetooth/bluetooth.h:425 net/bluetooth/l2cap_core.c:8329) bluetooth l2cap_recv_acldata (net/bluetooth/l2cap_core.c:8442) bluetooth hci_rx_work (net/bluetooth/hci_core.c:3642 net/bluetooth/hci_core.c:3832) bluetooth process_one_work (kernel/workqueue.c:2289) worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2437) kthread (kernel/kthread.c:376) ret_from_fork (arch/x86/entry/entry_64.S:306)
Freed by task 27920: kasan_save_stack (mm/kasan/common.c:39) kasan_set_track (mm/kasan/common.c:45) kasan_set_free_info (mm/kasan/generic.c:372) ____kasan_slab_free (mm/kasan/common.c:368 mm/kasan/common.c:328) slab_free_freelist_hook (mm/slub.c:1780) kmem_cache_free (mm/slub.c:3536 mm/slub.c:3553) skb_free_datagram (./include/net/sock.h:1578 ./include/net/sock.h:1639 net/core/datagram.c:323) bt_sock_recvmsg (net/bluetooth/af_bluetooth.c:295) bluetooth l2cap_sock_recvmsg (net/bluetooth/l2cap_sock.c:1212) bluetooth sock_read_iter (net/socket.c:1087) new_sync_read (./include/linux/fs.h:2052 fs/read_write.c:401) vfs_read (fs/read_write.c:482) ksys_read (fs/read_write.c:620) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
Link: https://lore.kernel.org/linux-bluetooth/CAKErNvoqga1WcmoR3-0875esY6TVWFQDand... Fixes: d2a7ac5d5d3a ("Bluetooth: Add the ERTM receive state machine") Fixes: 4b51dae96731 ("Bluetooth: Add streaming mode receive and incoming packet classifier") Signed-off-by: Maxim Mikityanskiy maxtram95@gmail.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/l2cap_core.c | 48 ++++++++++++++++++++++++++++++++------ 1 file changed, 41 insertions(+), 7 deletions(-)
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 1f34b82ca0ec..2283871d3f01 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -6885,6 +6885,7 @@ static int l2cap_rx_state_recv(struct l2cap_chan *chan, struct l2cap_ctrl *control, struct sk_buff *skb, u8 event) { + struct l2cap_ctrl local_control; int err = 0; bool skb_in_use = false;
@@ -6909,15 +6910,32 @@ static int l2cap_rx_state_recv(struct l2cap_chan *chan, chan->buffer_seq = chan->expected_tx_seq; skb_in_use = true;
+ /* l2cap_reassemble_sdu may free skb, hence invalidate + * control, so make a copy in advance to use it after + * l2cap_reassemble_sdu returns and to avoid the race + * condition, for example: + * + * The current thread calls: + * l2cap_reassemble_sdu + * chan->ops->recv == l2cap_sock_recv_cb + * __sock_queue_rcv_skb + * Another thread calls: + * bt_sock_recvmsg + * skb_recv_datagram + * skb_free_datagram + * Then the current thread tries to access control, but + * it was freed by skb_free_datagram. + */ + local_control = *control; err = l2cap_reassemble_sdu(chan, skb, control); if (err) break;
- if (control->final) { + if (local_control.final) { if (!test_and_clear_bit(CONN_REJ_ACT, &chan->conn_state)) { - control->final = 0; - l2cap_retransmit_all(chan, control); + local_control.final = 0; + l2cap_retransmit_all(chan, &local_control); l2cap_ertm_send(chan); } } @@ -7297,11 +7315,27 @@ static int l2cap_rx(struct l2cap_chan *chan, struct l2cap_ctrl *control, static int l2cap_stream_rx(struct l2cap_chan *chan, struct l2cap_ctrl *control, struct sk_buff *skb) { + /* l2cap_reassemble_sdu may free skb, hence invalidate control, so store + * the txseq field in advance to use it after l2cap_reassemble_sdu + * returns and to avoid the race condition, for example: + * + * The current thread calls: + * l2cap_reassemble_sdu + * chan->ops->recv == l2cap_sock_recv_cb + * __sock_queue_rcv_skb + * Another thread calls: + * bt_sock_recvmsg + * skb_recv_datagram + * skb_free_datagram + * Then the current thread tries to access control, but it was freed by + * skb_free_datagram. + */ + u16 txseq = control->txseq; + BT_DBG("chan %p, control %p, skb %p, state %d", chan, control, skb, chan->rx_state);
- if (l2cap_classify_txseq(chan, control->txseq) == - L2CAP_TXSEQ_EXPECTED) { + if (l2cap_classify_txseq(chan, txseq) == L2CAP_TXSEQ_EXPECTED) { l2cap_pass_to_tx(chan, control);
BT_DBG("buffer_seq %u->%u", chan->buffer_seq, @@ -7324,8 +7358,8 @@ static int l2cap_stream_rx(struct l2cap_chan *chan, struct l2cap_ctrl *control, } }
- chan->last_acked_seq = control->txseq; - chan->expected_tx_seq = __next_seq(chan, control->txseq); + chan->last_acked_seq = txseq; + chan->expected_tx_seq = __next_seq(chan, txseq);
return 0; }
From: Pauli Virtanen pav@iki.fi
[ Upstream commit b36a234dc438cb6b76fc929a8df9a0e59c8acf23 ]
hci_connect_cis and iso_connect_cis call hci_bind_cis inconsistently with dst_type being either ISO socket address type or the HCI type, but these values cannot be mixed like this. Fix this by using only the HCI type.
CIS connection dst_type was also not initialized in hci_bind_cis, even though it is used in hci_conn_hash_lookup_cis to find existing connections. Set the value in hci_bind_cis, so that existing CIS connections are found e.g. when doing deferred socket connections, also when dst_type is not 0 (ADDR_LE_DEV_PUBLIC).
Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Pauli Virtanen pav@iki.fi Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_conn.c | 7 +------ net/bluetooth/iso.c | 14 ++++++++++++-- 2 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c index 9777e7b109ee..f1263cdd71dd 100644 --- a/net/bluetooth/hci_conn.c +++ b/net/bluetooth/hci_conn.c @@ -1697,6 +1697,7 @@ struct hci_conn *hci_bind_cis(struct hci_dev *hdev, bdaddr_t *dst, if (!cis) return ERR_PTR(-ENOMEM); cis->cleanup = cis_cleanup; + cis->dst_type = dst_type; }
if (cis->state == BT_CONNECTED) @@ -2076,12 +2077,6 @@ struct hci_conn *hci_connect_cis(struct hci_dev *hdev, bdaddr_t *dst, struct hci_conn *le; struct hci_conn *cis;
- /* Convert from ISO socket address type to HCI address type */ - if (dst_type == BDADDR_LE_PUBLIC) - dst_type = ADDR_LE_DEV_PUBLIC; - else - dst_type = ADDR_LE_DEV_RANDOM; - if (hci_dev_test_flag(hdev, HCI_ADVERTISING)) le = hci_connect_le(hdev, dst, dst_type, false, BT_SECURITY_LOW, diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c index 613039ba5dbf..f825857db6d0 100644 --- a/net/bluetooth/iso.c +++ b/net/bluetooth/iso.c @@ -235,6 +235,14 @@ static int iso_chan_add(struct iso_conn *conn, struct sock *sk, return err; }
+static inline u8 le_addr_type(u8 bdaddr_type) +{ + if (bdaddr_type == BDADDR_LE_PUBLIC) + return ADDR_LE_DEV_PUBLIC; + else + return ADDR_LE_DEV_RANDOM; +} + static int iso_connect_bis(struct sock *sk) { struct iso_conn *conn; @@ -328,14 +336,16 @@ static int iso_connect_cis(struct sock *sk) /* Just bind if DEFER_SETUP has been set */ if (test_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags)) { hcon = hci_bind_cis(hdev, &iso_pi(sk)->dst, - iso_pi(sk)->dst_type, &iso_pi(sk)->qos); + le_addr_type(iso_pi(sk)->dst_type), + &iso_pi(sk)->qos); if (IS_ERR(hcon)) { err = PTR_ERR(hcon); goto done; } } else { hcon = hci_connect_cis(hdev, &iso_pi(sk)->dst, - iso_pi(sk)->dst_type, &iso_pi(sk)->qos); + le_addr_type(iso_pi(sk)->dst_type), + &iso_pi(sk)->qos); if (IS_ERR(hcon)) { err = PTR_ERR(hcon); goto done;
From: Soenke Huster soenke.huster@eknoes.de
[ Upstream commit 160fbcf3bfb93c3c086427f9f4c8bc70f217e9be ]
By using skb_put we ensure that skb->tail is set correctly. Currently, skb->tail is always zero, which leads to errors, such as the following page fault in rfcomm_recv_frame:
BUG: unable to handle page fault for address: ffffed1021de29ff #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page RIP: 0010:rfcomm_run+0x831/0x4040 (net/bluetooth/rfcomm/core.c:1751)
Fixes: afd2daa26c7a ("Bluetooth: Add support for virtio transport driver") Signed-off-by: Soenke Huster soenke.huster@eknoes.de Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/virtio_bt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/bluetooth/virtio_bt.c b/drivers/bluetooth/virtio_bt.c index 67c21263f9e0..fd281d439505 100644 --- a/drivers/bluetooth/virtio_bt.c +++ b/drivers/bluetooth/virtio_bt.c @@ -219,7 +219,7 @@ static void virtbt_rx_work(struct work_struct *work) if (!skb) return;
- skb->len = len; + skb_put(skb, len); virtbt_rx_handle(vbt, skb);
if (virtbt_add_inbuf(vbt) < 0)
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit 0d0e2d032811280b927650ff3c15fe5020e82533 ]
When l2cap_recv_frame() is invoked to receive data, and the cid is L2CAP_CID_A2MP, if the channel does not exist, it will create a channel. However, after a channel is created, the hold operation of the channel is not performed. In this case, the value of channel reference counting is 1. As a result, after hci_error_reset() is triggered, l2cap_conn_del() invokes the close hook function of A2MP to release the channel. Then l2cap_chan_unlock(chan) will trigger UAF issue.
The process is as follows: Receive data: l2cap_data_channel() a2mp_channel_create() --->channel ref is 2 l2cap_chan_put() --->channel ref is 1
Triger event: hci_error_reset() hci_dev_do_close() ... l2cap_disconn_cfm() l2cap_conn_del() l2cap_chan_hold() --->channel ref is 2 l2cap_chan_del() --->channel ref is 1 a2mp_chan_close_cb() --->channel ref is 0, release channel l2cap_chan_unlock() --->UAF of channel
The detailed Call Trace is as follows: BUG: KASAN: use-after-free in __mutex_unlock_slowpath+0xa6/0x5e0 Read of size 8 at addr ffff8880160664b8 by task kworker/u11:1/7593 Workqueue: hci0 hci_error_reset Call Trace: <TASK> dump_stack_lvl+0xcd/0x134 print_report.cold+0x2ba/0x719 kasan_report+0xb1/0x1e0 kasan_check_range+0x140/0x190 __mutex_unlock_slowpath+0xa6/0x5e0 l2cap_conn_del+0x404/0x7b0 l2cap_disconn_cfm+0x8c/0xc0 hci_conn_hash_flush+0x11f/0x260 hci_dev_close_sync+0x5f5/0x11f0 hci_dev_do_close+0x2d/0x70 hci_error_reset+0x9e/0x140 process_one_work+0x98a/0x1620 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 </TASK>
Allocated by task 7593: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0xa9/0xd0 l2cap_chan_create+0x40/0x930 amp_mgr_create+0x96/0x990 a2mp_channel_create+0x7d/0x150 l2cap_recv_frame+0x51b8/0x9a70 l2cap_recv_acldata+0xaa3/0xc00 hci_rx_work+0x702/0x1220 process_one_work+0x98a/0x1620 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30
Freed by task 7593: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0x167/0x1c0 slab_free_freelist_hook+0x89/0x1c0 kfree+0xe2/0x580 l2cap_chan_put+0x22a/0x2d0 l2cap_conn_del+0x3fc/0x7b0 l2cap_disconn_cfm+0x8c/0xc0 hci_conn_hash_flush+0x11f/0x260 hci_dev_close_sync+0x5f5/0x11f0 hci_dev_do_close+0x2d/0x70 hci_error_reset+0x9e/0x140 process_one_work+0x98a/0x1620 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30
Last potentially related work creation: kasan_save_stack+0x1e/0x40 __kasan_record_aux_stack+0xbe/0xd0 call_rcu+0x99/0x740 netlink_release+0xe6a/0x1cf0 __sock_release+0xcd/0x280 sock_close+0x18/0x20 __fput+0x27c/0xa90 task_work_run+0xdd/0x1a0 exit_to_user_mode_prepare+0x23c/0x250 syscall_exit_to_user_mode+0x19/0x50 do_syscall_64+0x42/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Second to last potentially related work creation: kasan_save_stack+0x1e/0x40 __kasan_record_aux_stack+0xbe/0xd0 call_rcu+0x99/0x740 netlink_release+0xe6a/0x1cf0 __sock_release+0xcd/0x280 sock_close+0x18/0x20 __fput+0x27c/0xa90 task_work_run+0xdd/0x1a0 exit_to_user_mode_prepare+0x23c/0x250 syscall_exit_to_user_mode+0x19/0x50 do_syscall_64+0x42/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: d0be8347c623 ("Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/l2cap_core.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 2283871d3f01..9a32ce634919 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -7615,6 +7615,7 @@ static void l2cap_data_channel(struct l2cap_conn *conn, u16 cid, return; }
+ l2cap_chan_hold(chan); l2cap_chan_lock(chan); } else { BT_DBG("unknown cid 0x%4.4x", cid);
From: Hawkins Jiawei yin31149@gmail.com
[ Upstream commit 7c9524d929648935bac2bbb4c20437df8f9c3f42 ]
Syzkaller reports a memory leak as follows: ==================================== BUG: memory leak unreferenced object 0xffff88810d81ac00 (size 240): [...] hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff838733d9>] __alloc_skb+0x1f9/0x270 net/core/skbuff.c:418 [<ffffffff833f742f>] alloc_skb include/linux/skbuff.h:1257 [inline] [<ffffffff833f742f>] bt_skb_alloc include/net/bluetooth/bluetooth.h:469 [inline] [<ffffffff833f742f>] vhci_get_user drivers/bluetooth/hci_vhci.c:391 [inline] [<ffffffff833f742f>] vhci_write+0x5f/0x230 drivers/bluetooth/hci_vhci.c:511 [<ffffffff815e398d>] call_write_iter include/linux/fs.h:2192 [inline] [<ffffffff815e398d>] new_sync_write fs/read_write.c:491 [inline] [<ffffffff815e398d>] vfs_write+0x42d/0x540 fs/read_write.c:578 [<ffffffff815e3cdd>] ksys_write+0x9d/0x160 fs/read_write.c:631 [<ffffffff845e0645>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff845e0645>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff84600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd ====================================
HCI core will uses hci_rx_work() to process frame, which is queued to the hdev->rx_q tail in hci_recv_frame() by HCI driver.
Yet the problem is that, HCI core may not free the skb after handling ACL data packets. To be more specific, when start fragment does not contain the L2CAP length, HCI core just copies skb into conn->rx_skb and finishes frame process in l2cap_recv_acldata(), without freeing the skb, which triggers the above memory leak.
This patch solves it by releasing the relative skb, after processing the above case in l2cap_recv_acldata().
Fixes: 4d7ea8ee90e4 ("Bluetooth: L2CAP: Fix handling fragmented length") Link: https://lore.kernel.org/all/0000000000000d0b1905e6aaef64@google.com/ Reported-and-tested-by: syzbot+8f819e36e01022991cfa@syzkaller.appspotmail.com Signed-off-by: Hawkins Jiawei yin31149@gmail.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/l2cap_core.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 9a32ce634919..1fbe087d6ae4 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -8461,9 +8461,8 @@ void l2cap_recv_acldata(struct hci_conn *hcon, struct sk_buff *skb, u16 flags) * expected length. */ if (skb->len < L2CAP_LEN_SIZE) { - if (l2cap_recv_frag(conn, skb, conn->mtu) < 0) - goto drop; - return; + l2cap_recv_frag(conn, skb, conn->mtu); + break; }
len = get_unaligned_le16(skb->data) + L2CAP_HDR_SIZE; @@ -8507,7 +8506,7 @@ void l2cap_recv_acldata(struct hci_conn *hcon, struct sk_buff *skb, u16 flags)
/* Header still could not be read just continue */ if (conn->rx_skb->len < L2CAP_LEN_SIZE) - return; + break; }
if (skb->len > conn->rx_len) {
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 5638d9ea9c01c77fc11693d48cf719bc7e88f224 ]
When disconnecting an ISO link the controller may not generate HCI_EV_NUM_COMP_PKTS for unacked packets which needs to be restored in hci_conn_del otherwise the host would assume they are still in use and would not be able to use all the buffers available.
Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Tested-by: Frédéric Danis frederic.danis@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_conn.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c index f1263cdd71dd..f26ed278d9e3 100644 --- a/net/bluetooth/hci_conn.c +++ b/net/bluetooth/hci_conn.c @@ -1003,10 +1003,21 @@ int hci_conn_del(struct hci_conn *conn) hdev->acl_cnt += conn->sent; } else { struct hci_conn *acl = conn->link; + if (acl) { acl->link = NULL; hci_conn_drop(acl); } + + /* Unacked ISO frames */ + if (conn->type == ISO_LINK) { + if (hdev->iso_pkts) + hdev->iso_cnt += conn->sent; + else if (hdev->le_pkts) + hdev->le_cnt += conn->sent; + else + hdev->acl_cnt += conn->sent; + } }
if (conn->amp_mgr)
From: Gaosheng Cui cuigaosheng1@huawei.com
[ Upstream commit 40e4eb324c59e11fcb927aa46742d28aba6ecb8a ]
Shifting signed 32-bit value by 31 bits is undefined, so changing significant bit to unsigned. The UBSAN warning calltrace like below:
UBSAN: shift-out-of-bounds in drivers/net/phy/mdio_bus.c:586:27 left shift of 1 by 31 places cannot be represented in type 'int' Call Trace: <TASK> dump_stack_lvl+0x7d/0xa5 dump_stack+0x15/0x1b ubsan_epilogue+0xe/0x4e __ubsan_handle_shift_out_of_bounds+0x1e7/0x20c __mdiobus_register+0x49d/0x4e0 fixed_mdio_bus_init+0xd8/0x12d do_one_initcall+0x76/0x430 kernel_init_freeable+0x3b3/0x422 kernel_init+0x24/0x1e0 ret_from_fork+0x1f/0x30 </TASK>
Fixes: 4fd5f812c23c ("phylib: allow incremental scanning of an mii bus") Signed-off-by: Gaosheng Cui cuigaosheng1@huawei.com Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/20221031132645.168421-1-cuigaosheng1@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/mdio_bus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c index 8a2dbe849866..1bc988d9f2e8 100644 --- a/drivers/net/phy/mdio_bus.c +++ b/drivers/net/phy/mdio_bus.c @@ -583,7 +583,7 @@ int __mdiobus_register(struct mii_bus *bus, struct module *owner) }
for (i = 0; i < PHY_MAX_ADDR; i++) { - if ((bus->phy_mask & (1 << i)) == 0) { + if ((bus->phy_mask & BIT(i)) == 0) { struct phy_device *phydev;
phydev = mdiobus_scan(bus, i);
From: Nick Child nnac123@linux.ibm.com
[ Upstream commit d6dd2fe71153f0ff748bf188bd4af076fe09a0a6 ]
Free the rwi structure in the event that the last rwi in the list processed successfully. The logic in commit 4f408e1fa6e1 ("ibmvnic: retry reset if there are no other resets") introduces an issue that results in a 32 byte memory leak whenever the last rwi in the list gets processed.
Fixes: 4f408e1fa6e1 ("ibmvnic: retry reset if there are no other resets") Signed-off-by: Nick Child nnac123@linux.ibm.com Link: https://lore.kernel.org/r/20221031150642.13356-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ibm/ibmvnic.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 5ab7c0f81e9a..e6b141536879 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -3007,19 +3007,19 @@ static void __ibmvnic_reset(struct work_struct *work) rwi = get_next_rwi(adapter);
/* - * If there is another reset queued, free the previous rwi - * and process the new reset even if previous reset failed - * (the previous reset could have failed because of a fail - * over for instance, so process the fail over). - * * If there are no resets queued and the previous reset failed, * the adapter would be in an undefined state. So retry the * previous reset as a hard reset. + * + * Else, free the previous rwi and, if there is another reset + * queued, process the new reset even if previous reset failed + * (the previous reset could have failed because of a fail + * over for instance, so process the fail over). */ - if (rwi) - kfree(tmprwi); - else if (rc) + if (!rwi && rc) rwi = tmprwi; + else + kfree(tmprwi);
if (rwi && (rwi->reset_reason == VNIC_RESET_FAILOVER || rwi->reset_reason == VNIC_RESET_MOBILITY || rc))
From: Liu Peibao liupeibao@loongson.cn
[ Upstream commit 2ae34111fe4eebb69986f6490015b57c88804373 ]
In current code "plat->mdio_node" is always NULL, the mdio support is lost as there is no "mdio_bus_data". The original driver could work as the "mdio" variable is never set to false, which is described in commit <b0e03950dd71> ("stmmac: dwmac-loongson: fix uninitialized variable ......"). And after this commit merged, the "mdio" variable is always false, causing the mdio supoort logic lost.
Fixes: 30bba69d7db4 ("stmmac: pci: Add dwmac support for Loongson") Signed-off-by: Liu Peibao liupeibao@loongson.cn Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/20221101060218.16453-1-liupeibao@loongson.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c index 017dbbda0c1c..79fa7870563b 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c @@ -51,7 +51,6 @@ static int loongson_dwmac_probe(struct pci_dev *pdev, const struct pci_device_id struct stmmac_resources res; struct device_node *np; int ret, i, phy_mode; - bool mdio = false;
np = dev_of_node(&pdev->dev);
@@ -69,12 +68,10 @@ static int loongson_dwmac_probe(struct pci_dev *pdev, const struct pci_device_id if (!plat) return -ENOMEM;
+ plat->mdio_node = of_get_child_by_name(np, "mdio"); if (plat->mdio_node) { - dev_err(&pdev->dev, "Found MDIO subnode\n"); - mdio = true; - } + dev_info(&pdev->dev, "Found MDIO subnode\n");
- if (mdio) { plat->mdio_bus_data = devm_kzalloc(&pdev->dev, sizeof(*plat->mdio_bus_data), GFP_KERNEL);
From: Chen Zhongjin chenzhongjin@huawei.com
[ Upstream commit 62ff373da2534534c55debe6c724c7fe14adb97f ]
In smc_init(), register_pernet_subsys(&smc_net_stat_ops) is called without any error handling. If it fails, registering of &smc_net_ops won't be reverted. And if smc_nl_init() fails, &smc_net_stat_ops itself won't be reverted.
This leaves wild ops in subsystem linkedlist and when another module tries to call register_pernet_operations() it triggers page fault:
BUG: unable to handle page fault for address: fffffbfff81b964c RIP: 0010:register_pernet_operations+0x1b9/0x5f0 Call Trace: <TASK> register_pernet_subsys+0x29/0x40 ebtables_init+0x58/0x1000 [ebtables] ...
Fixes: 194730a9beb5 ("net/smc: Make SMC statistics network namespace aware") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Reviewed-by: Tony Lu tonylu@linux.alibaba.com Reviewed-by: Wenjia Zhang wenjia@linux.ibm.com Link: https://lore.kernel.org/r/20221101093722.127223-1-chenzhongjin@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/smc/af_smc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 0939cc3b915a..58ff5f33b370 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -3380,14 +3380,14 @@ static int __init smc_init(void)
rc = register_pernet_subsys(&smc_net_stat_ops); if (rc) - return rc; + goto out_pernet_subsys;
smc_ism_init(); smc_clc_init();
rc = smc_nl_init(); if (rc) - goto out_pernet_subsys; + goto out_pernet_subsys_stat;
rc = smc_pnet_init(); if (rc) @@ -3480,6 +3480,8 @@ static int __init smc_init(void) smc_pnet_exit(); out_nl: smc_nl_exit(); +out_pernet_subsys_stat: + unregister_pernet_subsys(&smc_net_stat_ops); out_pernet_subsys: unregister_pernet_subsys(&smc_net_ops);
From: Chen Zhongjin chenzhongjin@huawei.com
[ Upstream commit f8017317cb0b279b8ab98b0f3901a2e0ac880dad ]
When IPv6 module gets initialized but hits an error in the middle, kenel panic with:
KASAN: null-ptr-deref in range [0x0000000000000598-0x000000000000059f] CPU: 1 PID: 361 Comm: insmod Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:__neigh_ifdown.isra.0+0x24b/0x370 RSP: 0018:ffff888012677908 EFLAGS: 00000202 ... Call Trace: <TASK> neigh_table_clear+0x94/0x2d0 ndisc_cleanup+0x27/0x40 [ipv6] inet6_init+0x21c/0x2cb [ipv6] do_one_initcall+0xd3/0x4d0 do_init_module+0x1ae/0x670 ... Kernel panic - not syncing: Fatal exception
When ipv6 initialization fails, it will try to cleanup and calls:
neigh_table_clear() neigh_ifdown(tbl, NULL) pneigh_queue_purge(&tbl->proxy_queue, dev_net(dev == NULL)) # dev_net(NULL) triggers null-ptr-deref.
Fix it by passing NULL to pneigh_queue_purge() in neigh_ifdown() if dev is NULL, to make kernel not panic immediately.
Fixes: 66ba215cb513 ("neigh: fix possible DoS due to net iface start/stop loop") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: Denis V. Lunev den@openvz.org Link: https://lore.kernel.org/r/20221101121552.21890-1-chenzhongjin@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/neighbour.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 78cc8fb68814..84755db81e9d 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -409,7 +409,7 @@ static int __neigh_ifdown(struct neigh_table *tbl, struct net_device *dev, write_lock_bh(&tbl->lock); neigh_flush_dev(tbl, dev, skip_perm); pneigh_ifdown_and_unlock(tbl, dev); - pneigh_queue_purge(&tbl->proxy_queue, dev_net(dev)); + pneigh_queue_purge(&tbl->proxy_queue, dev ? dev_net(dev) : NULL); if (skb_queue_empty_lockless(&tbl->proxy_queue)) del_timer_sync(&tbl->proxy_timer); return 0;
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit 628ac04a75ed5ff13647e725f40192da22ef2be8 ]
The following commands should result in all the dynamic FDB entries being flushed, but instead all the non-local (non-permanent) entries are flushed:
# bridge fdb add 00:aa:bb:cc:dd:ee dev dummy1 master static # bridge fdb add 00:11:22:33:44:55 dev dummy1 master dynamic # ip link set dev br0 type bridge fdb_flush # bridge fdb show brport dummy1 00:00:00:00:00:01 master br0 permanent 33:33:00:00:00:01 self permanent 01:00:5e:00:00:01 self permanent
This is because br_fdb_flush() works with FDB flags and not the corresponding enumerator values. Fix by passing the FDB flag instead.
After the fix:
# bridge fdb add 00:aa:bb:cc:dd:ee dev dummy1 master static # bridge fdb add 00:11:22:33:44:55 dev dummy1 master dynamic # ip link set dev br0 type bridge fdb_flush # bridge fdb show brport dummy1 00:aa:bb:cc:dd:ee master br0 static 00:00:00:00:00:01 master br0 permanent 33:33:00:00:00:01 self permanent 01:00:5e:00:00:01 self permanent
Fixes: 1f78ee14eeac ("net: bridge: fdb: add support for fine-grained flushing") Signed-off-by: Ido Schimmel idosch@nvidia.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20221101185753.2120691-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/br_netlink.c | 2 +- net/bridge/br_sysfs_br.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index 5aeb3646e74c..d087fd4c784a 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -1332,7 +1332,7 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[],
if (data[IFLA_BR_FDB_FLUSH]) { struct net_bridge_fdb_flush_desc desc = { - .flags_mask = BR_FDB_STATIC + .flags_mask = BIT(BR_FDB_STATIC) };
br_fdb_flush(br, &desc); diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c index 612e367fff20..ea733542244c 100644 --- a/net/bridge/br_sysfs_br.c +++ b/net/bridge/br_sysfs_br.c @@ -345,7 +345,7 @@ static int set_flush(struct net_bridge *br, unsigned long val, struct netlink_ext_ack *extack) { struct net_bridge_fdb_flush_desc desc = { - .flags_mask = BR_FDB_STATIC + .flags_mask = BIT(BR_FDB_STATIC) };
br_fdb_flush(br, &desc);
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit 768b3c745fe5789f2430bdab02f35a9ad1148d97 ]
During the initialization of ip6_route_net_init_late(), if file ipv6_route or rt6_stats fails to be created, the initialization is successful by default. Therefore, the ipv6_route or rt6_stats file doesn't be found during the remove in ip6_route_net_exit_late(). It will cause WRNING.
The following is the stack information: name 'rt6_stats' WARNING: CPU: 0 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460 Modules linked in: Workqueue: netns cleanup_net RIP: 0010:remove_proc_entry+0x389/0x460 PKRU: 55555554 Call Trace: <TASK> ops_exit_list+0xb0/0x170 cleanup_net+0x4ea/0xb00 process_one_work+0x9bf/0x1710 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 </TASK>
Fixes: cdb1876192db ("[NETNS][IPV6] route6 - create route6 proc files for the namespace") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20221102020610.351330-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/route.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 69252eb462b2..2f355f0ec32a 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -6555,10 +6555,16 @@ static void __net_exit ip6_route_net_exit(struct net *net) static int __net_init ip6_route_net_init_late(struct net *net) { #ifdef CONFIG_PROC_FS - proc_create_net("ipv6_route", 0, net->proc_net, &ipv6_route_seq_ops, - sizeof(struct ipv6_route_iter)); - proc_create_net_single("rt6_stats", 0444, net->proc_net, - rt6_stats_seq_show, NULL); + if (!proc_create_net("ipv6_route", 0, net->proc_net, + &ipv6_route_seq_ops, + sizeof(struct ipv6_route_iter))) + return -ENOMEM; + + if (!proc_create_net_single("rt6_stats", 0444, net->proc_net, + rt6_stats_seq_show, NULL)) { + remove_proc_entry("ipv6_route", net->proc_net); + return -ENOMEM; + } #endif return 0; }
From: Dexuan Cui decui@microsoft.com
[ Upstream commit 466a85336fee6e3b35eb97b8405a28302fd25809 ]
Currently vsock_connectible_has_data() may miss a wakeup operation between vsock_connectible_has_data() == 0 and the prepare_to_wait().
Fix the race by adding the process to the wait queue before checking vsock_connectible_has_data().
Fixes: b3f7fd54881b ("af_vsock: separate wait data loop") Signed-off-by: Dexuan Cui decui@microsoft.com Reviewed-by: Stefano Garzarella sgarzare@redhat.com Reported-by: Frédéric Dalleau frederic.dalleau@docker.com Tested-by: Frédéric Dalleau frederic.dalleau@docker.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/vmw_vsock/af_vsock.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index b4ee163154a6..e5ab2418f9d6 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1894,8 +1894,11 @@ static int vsock_connectible_wait_data(struct sock *sk, err = 0; transport = vsk->transport;
- while ((data = vsock_connectible_has_data(vsk)) == 0) { + while (1) { prepare_to_wait(sk_sleep(sk), wait, TASK_INTERRUPTIBLE); + data = vsock_connectible_has_data(vsk); + if (data != 0) + break;
if (sk->sk_err != 0 || (sk->sk_shutdown & RCV_SHUTDOWN) ||
From: Olivier Moysan olivier.moysan@foss.st.com
[ Upstream commit 174dac5dc800e4e2e4552baf6340846a344d01a3 ]
Fix channel init for ADC generic channel bindings. In generic channel initialization, stm32_adc_smpr_init() is called to initialize channel sampling time. The "st,min-sample-time-ns" property is an optional property. If it is not defined, stm32_adc_smpr_init() is currently skipped. However stm32_adc_smpr_init() must always be called, to force a minimum sampling time for the internal channels, as the minimum sampling time is known. Make stm32_adc_smpr_init() call unconditional.
Fixes: 796e5d0b1e9b ("iio: adc: stm32-adc: use generic binding for sample-time") Signed-off-by: Olivier Moysan olivier.moysan@foss.st.com Reviewed-by: Andy Shevchenko andy.shevchenko@gmail.com Reviewed-by: Fabrice Gasnier fabrice.gasnier@foss.st.com Link: https://lore.kernel.org/r/20221012142205.13041-2-olivier.moysan@foss.st.com Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iio/adc/stm32-adc.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/iio/adc/stm32-adc.c b/drivers/iio/adc/stm32-adc.c index 130e8dd6f0c8..7719f7f93c3f 100644 --- a/drivers/iio/adc/stm32-adc.c +++ b/drivers/iio/adc/stm32-adc.c @@ -2064,18 +2064,19 @@ static int stm32_adc_generic_chan_init(struct iio_dev *indio_dev, stm32_adc_chan_init_one(indio_dev, &channels[scan_index], val, vin[1], scan_index, differential);
+ val = 0; ret = of_property_read_u32(child, "st,min-sample-time-ns", &val); /* st,min-sample-time-ns is optional */ - if (!ret) { - stm32_adc_smpr_init(adc, channels[scan_index].channel, val); - if (differential) - stm32_adc_smpr_init(adc, vin[1], val); - } else if (ret != -EINVAL) { + if (ret && ret != -EINVAL) { dev_err(&indio_dev->dev, "Invalid st,min-sample-time-ns property %d\n", ret); goto err; }
+ stm32_adc_smpr_init(adc, channels[scan_index].channel, val); + if (differential) + stm32_adc_smpr_init(adc, vin[1], val); + scan_index++; }
From: Laurent Pinchart laurent.pinchart@ideasonboard.com
[ Upstream commit cb00f3a4421d5c7d7155bd4bded7fb2ff8eec211 ]
The ISP converts Bayer data to YUV when operating normally, and can also operate in pass-through mode where the input and output formats must match. Converting from YUV to Bayer isn't possible. If such an invalid configuration is attempted, adjust it by copying the sink pad media bus code to the source pad.
Signed-off-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Reviewed-by: Dafna Hirschfeld dafna@fastmail.com Reviewed-by: Paul Elder paul.elder@ideasonboard.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../platform/rockchip/rkisp1/rkisp1-isp.c | 40 +++++++++++++++---- 1 file changed, 32 insertions(+), 8 deletions(-)
diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-isp.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-isp.c index 383a3ec83ca9..00032b849a07 100644 --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-isp.c +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-isp.c @@ -472,23 +472,43 @@ static void rkisp1_isp_set_src_fmt(struct rkisp1_isp *isp, struct v4l2_mbus_framefmt *format, unsigned int which) { - const struct rkisp1_mbus_info *mbus_info; + const struct rkisp1_mbus_info *sink_info; + const struct rkisp1_mbus_info *src_info; + struct v4l2_mbus_framefmt *sink_fmt; struct v4l2_mbus_framefmt *src_fmt; const struct v4l2_rect *src_crop;
+ sink_fmt = rkisp1_isp_get_pad_fmt(isp, sd_state, + RKISP1_ISP_PAD_SINK_VIDEO, which); src_fmt = rkisp1_isp_get_pad_fmt(isp, sd_state, RKISP1_ISP_PAD_SOURCE_VIDEO, which); src_crop = rkisp1_isp_get_pad_crop(isp, sd_state, RKISP1_ISP_PAD_SOURCE_VIDEO, which);
+ /* + * Media bus code. The ISP can operate in pass-through mode (Bayer in, + * Bayer out or YUV in, YUV out) or process Bayer data to YUV, but + * can't convert from YUV to Bayer. + */ + sink_info = rkisp1_mbus_info_get_by_code(sink_fmt->code); + src_fmt->code = format->code; - mbus_info = rkisp1_mbus_info_get_by_code(src_fmt->code); - if (!mbus_info || !(mbus_info->direction & RKISP1_ISP_SD_SRC)) { + src_info = rkisp1_mbus_info_get_by_code(src_fmt->code); + if (!src_info || !(src_info->direction & RKISP1_ISP_SD_SRC)) { src_fmt->code = RKISP1_DEF_SRC_PAD_FMT; - mbus_info = rkisp1_mbus_info_get_by_code(src_fmt->code); + src_info = rkisp1_mbus_info_get_by_code(src_fmt->code); } - if (which == V4L2_SUBDEV_FORMAT_ACTIVE) - isp->src_fmt = mbus_info; + + if (sink_info->pixel_enc == V4L2_PIXEL_ENC_YUV && + src_info->pixel_enc == V4L2_PIXEL_ENC_BAYER) { + src_fmt->code = sink_fmt->code; + src_info = sink_info; + } + + /* + * The source width and height must be identical to the source crop + * size. + */ src_fmt->width = src_crop->width; src_fmt->height = src_crop->height;
@@ -498,14 +518,18 @@ static void rkisp1_isp_set_src_fmt(struct rkisp1_isp *isp, */ if (format->flags & V4L2_MBUS_FRAMEFMT_SET_CSC && format->quantization == V4L2_QUANTIZATION_FULL_RANGE && - mbus_info->pixel_enc == V4L2_PIXEL_ENC_YUV) + src_info->pixel_enc == V4L2_PIXEL_ENC_YUV) src_fmt->quantization = V4L2_QUANTIZATION_FULL_RANGE; - else if (mbus_info->pixel_enc == V4L2_PIXEL_ENC_YUV) + else if (src_info->pixel_enc == V4L2_PIXEL_ENC_YUV) src_fmt->quantization = V4L2_QUANTIZATION_LIM_RANGE; else src_fmt->quantization = V4L2_QUANTIZATION_FULL_RANGE;
*format = *src_fmt; + + /* Store the source format info when setting the active format. */ + if (which == V4L2_SUBDEV_FORMAT_ACTIVE) + isp->src_fmt = src_info; }
static void rkisp1_isp_set_src_crop(struct rkisp1_isp *isp,
From: Laurent Pinchart laurent.pinchart@ideasonboard.com
[ Upstream commit 711d91497e203b058cf0a08c0f7d41c04efbde76 ]
The rkisp1_csm_config() function takes a pointer to the rkisp1_params structure which contains the quantization value. There's no need to pass it separately to the function. Drop it from the function parameters.
Signed-off-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Reviewed-by: Dafna Hirschfeld dafna@fastmail.com Reviewed-by: Paul Elder paul.elder@ideasonboard.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/platform/rockchip/rkisp1/rkisp1-params.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-params.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-params.c index 9da7dc1bc690..32485f7c79d5 100644 --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-params.c +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-params.c @@ -1066,7 +1066,7 @@ static void rkisp1_ie_enable(struct rkisp1_params *params, bool en) } }
-static void rkisp1_csm_config(struct rkisp1_params *params, bool full_range) +static void rkisp1_csm_config(struct rkisp1_params *params) { static const u16 full_range_coeff[] = { 0x0026, 0x004b, 0x000f, @@ -1080,7 +1080,7 @@ static void rkisp1_csm_config(struct rkisp1_params *params, bool full_range) }; unsigned int i;
- if (full_range) { + if (params->quantization == V4L2_QUANTIZATION_FULL_RANGE) { for (i = 0; i < ARRAY_SIZE(full_range_coeff); i++) rkisp1_write(params->rkisp1, RKISP1_CIF_ISP_CC_COEFF_0 + i * 4, @@ -1552,11 +1552,7 @@ static void rkisp1_params_config_parameter(struct rkisp1_params *params) rkisp1_param_set_bits(params, RKISP1_CIF_ISP_HIST_PROP_V10, rkisp1_hst_params_default_config.mode);
- /* set the range */ - if (params->quantization == V4L2_QUANTIZATION_FULL_RANGE) - rkisp1_csm_config(params, true); - else - rkisp1_csm_config(params, false); + rkisp1_csm_config(params);
spin_lock_irq(¶ms->config_lock);
From: Laurent Pinchart laurent.pinchart@ideasonboard.com
[ Upstream commit 83b9296e399367862845d3b19984444fc756bd61 ]
Initialize the four color space fields on the sink and source video pads of the resizer in the .init_cfg() operation. The resizer can't perform any color space conversion, so set the sink and source color spaces to the same defaults, which match the ISP source video pad default.
Signed-off-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Reviewed-by: Paul Elder paul.elder@ideasonboard.com Reviewed-by: Dafna Hirschfeld dafna@fastmail.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/platform/rockchip/rkisp1/rkisp1-resizer.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-resizer.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-resizer.c index f4caa8f684aa..a2dc6f60d9cf 100644 --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-resizer.c +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-resizer.c @@ -411,6 +411,10 @@ static int rkisp1_rsz_init_config(struct v4l2_subdev *sd, sink_fmt->height = RKISP1_DEFAULT_HEIGHT; sink_fmt->field = V4L2_FIELD_NONE; sink_fmt->code = RKISP1_DEF_FMT; + sink_fmt->colorspace = V4L2_COLORSPACE_SRGB; + sink_fmt->xfer_func = V4L2_XFER_FUNC_SRGB; + sink_fmt->ycbcr_enc = V4L2_YCBCR_ENC_601; + sink_fmt->quantization = V4L2_QUANTIZATION_LIM_RANGE;
sink_crop = v4l2_subdev_get_try_crop(sd, sd_state, RKISP1_RSZ_PAD_SINK);
From: Laurent Pinchart laurent.pinchart@ideasonboard.com
[ Upstream commit 4c3501f13e8e60f6e7e7308c77ac4404e1007c18 ]
The rkisp1_lsc_config() function incorrectly uses the RKISP1_CIF_ISP_LSC_SECT_SIZE() macro for the gradient registers. Replace it with the correct macro, and rename it from RKISP1_CIF_ISP_LSC_GRAD_SIZE() to RKISP1_CIF_ISP_LSC_SECT_GRAD() as the corresponding registers store the gradients for each sector, not a size. This doesn't cause any functional change as the two macros are defined identically (the size and gradient registers store fields in the same number of bits at the same positions).
Signed-off-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Reviewed-by: Dafna Hirschfeld dafna@fastmail.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/platform/rockchip/rkisp1/rkisp1-params.c | 4 ++-- drivers/media/platform/rockchip/rkisp1/rkisp1-regs.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-params.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-params.c index 32485f7c79d5..02ac3043badd 100644 --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-params.c +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-params.c @@ -343,7 +343,7 @@ static void rkisp1_lsc_config(struct rkisp1_params *params, RKISP1_CIF_ISP_LSC_XSIZE_01 + i * 4, data);
/* program x grad tables */ - data = RKISP1_CIF_ISP_LSC_SECT_SIZE(arg->x_grad_tbl[i * 2], + data = RKISP1_CIF_ISP_LSC_SECT_GRAD(arg->x_grad_tbl[i * 2], arg->x_grad_tbl[i * 2 + 1]); rkisp1_write(params->rkisp1, RKISP1_CIF_ISP_LSC_XGRAD_01 + i * 4, data); @@ -355,7 +355,7 @@ static void rkisp1_lsc_config(struct rkisp1_params *params, RKISP1_CIF_ISP_LSC_YSIZE_01 + i * 4, data);
/* program y grad tables */ - data = RKISP1_CIF_ISP_LSC_SECT_SIZE(arg->y_grad_tbl[i * 2], + data = RKISP1_CIF_ISP_LSC_SECT_GRAD(arg->y_grad_tbl[i * 2], arg->y_grad_tbl[i * 2 + 1]); rkisp1_write(params->rkisp1, RKISP1_CIF_ISP_LSC_YGRAD_01 + i * 4, data); diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-regs.h b/drivers/media/platform/rockchip/rkisp1/rkisp1-regs.h index dd3e6c38be67..025491f8793f 100644 --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-regs.h +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-regs.h @@ -576,7 +576,7 @@ (((v0) & 0x1FFF) | (((v1) & 0x1FFF) << 13)) #define RKISP1_CIF_ISP_LSC_SECT_SIZE(v0, v1) \ (((v0) & 0xFFF) | (((v1) & 0xFFF) << 16)) -#define RKISP1_CIF_ISP_LSC_GRAD_SIZE(v0, v1) \ +#define RKISP1_CIF_ISP_LSC_SECT_GRAD(v0, v1) \ (((v0) & 0xFFF) | (((v1) & 0xFFF) << 16))
/* LSC: ISP_LSC_TABLE_SEL */
From: Laurent Pinchart laurent.pinchart@ideasonboard.com
[ Upstream commit c53e3a049f35978a150526671587fd46b1ae7ca1 ]
The local sd_fmt variable in rkisp1_capture_link_validate() has uninitialized fields, which causes random failures when calling the subdev .get_fmt() operation. Fix it by initializing the variable when declaring it, which zeros all other fields.
Signed-off-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Reviewed-by: Paul Elder paul.elder@ideasonboard.com Reviewed-by: Dafna Hirschfeld dafna@fastmail.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c index d5904c96ff3f..c66963a2ccd9 100644 --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c @@ -1273,11 +1273,12 @@ static int rkisp1_capture_link_validate(struct media_link *link) struct rkisp1_capture *cap = video_get_drvdata(vdev); const struct rkisp1_capture_fmt_cfg *fmt = rkisp1_find_fmt_cfg(cap, cap->pix.fmt.pixelformat); - struct v4l2_subdev_format sd_fmt; + struct v4l2_subdev_format sd_fmt = { + .which = V4L2_SUBDEV_FORMAT_ACTIVE, + .pad = link->source->index, + }; int ret;
- sd_fmt.which = V4L2_SUBDEV_FORMAT_ACTIVE; - sd_fmt.pad = link->source->index; ret = v4l2_subdev_call(sd, pad, get_fmt, NULL, &sd_fmt); if (ret) return ret;
From: Hans Verkuil hverkuil-cisco@xs4all.nl
[ Upstream commit 93f65ce036863893c164ca410938e0968964b26c ]
I expect that the hardware will have limited this to 16, but just in case it hasn't, check for this corner case.
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/cec/platform/s5p/s5p_cec.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/media/cec/platform/s5p/s5p_cec.c b/drivers/media/cec/platform/s5p/s5p_cec.c index ce9a9d922f11..0a30e7acdc10 100644 --- a/drivers/media/cec/platform/s5p/s5p_cec.c +++ b/drivers/media/cec/platform/s5p/s5p_cec.c @@ -115,6 +115,8 @@ static irqreturn_t s5p_cec_irq_handler(int irq, void *priv) dev_dbg(cec->dev, "Buffer overrun (worker did not process previous message)\n"); cec->rx = STATE_BUSY; cec->msg.len = status >> 24; + if (cec->msg.len > CEC_MAX_MSG_SIZE) + cec->msg.len = CEC_MAX_MSG_SIZE; cec->msg.rx_status = CEC_RX_STATUS_OK; s5p_cec_get_rx_buf(cec, cec->msg.len, cec->msg.msg);
From: Hans Verkuil hverkuil-cisco@xs4all.nl
[ Upstream commit 2dc73b48665411a08c4e5f0f823dea8510761603 ]
I expect that the hardware will have limited this to 16, but just in case it hasn't, check for this corner case.
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/cec/platform/cros-ec/cros-ec-cec.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/media/cec/platform/cros-ec/cros-ec-cec.c b/drivers/media/cec/platform/cros-ec/cros-ec-cec.c index 3b583ed4da9d..e5ebaa58be45 100644 --- a/drivers/media/cec/platform/cros-ec/cros-ec-cec.c +++ b/drivers/media/cec/platform/cros-ec/cros-ec-cec.c @@ -44,6 +44,8 @@ static void handle_cec_message(struct cros_ec_cec *cros_ec_cec) uint8_t *cec_message = cros_ec->event_data.data.cec_message; unsigned int len = cros_ec->event_size;
+ if (len > CEC_MAX_MSG_SIZE) + len = CEC_MAX_MSG_SIZE; cros_ec_cec->rx_msg.len = len; memcpy(cros_ec_cec->rx_msg.msg, cec_message, len);
From: Hans Verkuil hverkuil-cisco@xs4all.nl
[ Upstream commit 20694e96ca089ce6693c2348f8f628ee621e4e74 ]
Fix a compiler warning:
drivers/media/dvb-frontends/drxk_hard.c: In function 'drxk_read_ucblocks': drivers/media/dvb-frontends/drxk_hard.c:6673:21: warning: 'err' may be used uninitialized [-Wmaybe-uninitialized] 6673 | *ucblocks = (u32) err; | ^~~~~~~~~ drivers/media/dvb-frontends/drxk_hard.c:6663:13: note: 'err' was declared here 6663 | u16 err; | ^~~
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/dvb-frontends/drxk_hard.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/media/dvb-frontends/drxk_hard.c b/drivers/media/dvb-frontends/drxk_hard.c index 9430295a8175..ef0d063ec352 100644 --- a/drivers/media/dvb-frontends/drxk_hard.c +++ b/drivers/media/dvb-frontends/drxk_hard.c @@ -6660,7 +6660,7 @@ static int drxk_read_snr(struct dvb_frontend *fe, u16 *snr) static int drxk_read_ucblocks(struct dvb_frontend *fe, u32 *ucblocks) { struct drxk_state *state = fe->demodulator_priv; - u16 err; + u16 err = 0;
dprintk(1, "\n");
From: Rory Liu hellojacky0226@hotmail.com
[ Upstream commit 594b6bdde2e7833a56413de5092b6e4188d33ff7 ]
The Google Kuldax device uses the same approach as the Google Brask which enables the HDMI CEC via the cros-ec-cec driver.
Signed-off-by: Rory Liu hellojacky0226@hotmail.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/cec/platform/cros-ec/cros-ec-cec.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/media/cec/platform/cros-ec/cros-ec-cec.c b/drivers/media/cec/platform/cros-ec/cros-ec-cec.c index e5ebaa58be45..6ebedc71d67d 100644 --- a/drivers/media/cec/platform/cros-ec/cros-ec-cec.c +++ b/drivers/media/cec/platform/cros-ec/cros-ec-cec.c @@ -223,6 +223,8 @@ static const struct cec_dmi_match cec_dmi_match_table[] = { { "Google", "Moli", "0000:00:02.0", "Port B" }, /* Google Kinox */ { "Google", "Kinox", "0000:00:02.0", "Port B" }, + /* Google Kuldax */ + { "Google", "Kuldax", "0000:00:02.0", "Port B" }, };
static struct device *cros_ec_cec_find_hdmi_dev(struct device *dev,
From: Hangyu Hua hbh25y@gmail.com
[ Upstream commit 7718999356234d9cc6a11b4641bb773928f1390f ]
v4l2_device_unregister need to be called to put the refcount got by v4l2_device_register when vdec_probe fails or vdec_remove is called.
Signed-off-by: Hangyu Hua hbh25y@gmail.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/media/meson/vdec/vdec.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/staging/media/meson/vdec/vdec.c b/drivers/staging/media/meson/vdec/vdec.c index 8549d95be0f2..52f224d8def1 100644 --- a/drivers/staging/media/meson/vdec/vdec.c +++ b/drivers/staging/media/meson/vdec/vdec.c @@ -1102,6 +1102,7 @@ static int vdec_probe(struct platform_device *pdev)
err_vdev_release: video_device_release(vdev); + v4l2_device_unregister(&core->v4l2_dev); return ret; }
@@ -1110,6 +1111,7 @@ static int vdec_remove(struct platform_device *pdev) struct amvdec_core *core = platform_get_drvdata(pdev);
video_unregister_device(core->vdev_dec); + v4l2_device_unregister(&core->v4l2_dev);
return 0; }
From: Benjamin Gaignard benjamin.gaignard@collabora.com
[ Upstream commit 4bec03301ecd81760c159402467dbb2cfd527684 ]
Store HEVC bit depth in context. Bit depth is equal to hevc sps bit_depth_luma_minus8 + 8.
Signed-off-by: Benjamin Gaignard benjamin.gaignard@collabora.com Reviewed-by: Ezequiel Garcia ezequiel@vanguardiasur.com.ar Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/media/hantro/hantro_drv.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/staging/media/hantro/hantro_drv.c b/drivers/staging/media/hantro/hantro_drv.c index 2036f72eeb4a..1dd8312d824c 100644 --- a/drivers/staging/media/hantro/hantro_drv.c +++ b/drivers/staging/media/hantro/hantro_drv.c @@ -251,6 +251,11 @@ queue_init(void *priv, struct vb2_queue *src_vq, struct vb2_queue *dst_vq)
static int hantro_try_ctrl(struct v4l2_ctrl *ctrl) { + struct hantro_ctx *ctx; + + ctx = container_of(ctrl->handler, + struct hantro_ctx, ctrl_handler); + if (ctrl->id == V4L2_CID_STATELESS_H264_SPS) { const struct v4l2_ctrl_h264_sps *sps = ctrl->p_new.p_h264_sps;
@@ -272,6 +277,8 @@ static int hantro_try_ctrl(struct v4l2_ctrl *ctrl) if (sps->bit_depth_luma_minus8 != 0) /* Only 8-bit is supported */ return -EINVAL; + + ctx->bit_depth = sps->bit_depth_luma_minus8 + 8; } else if (ctrl->id == V4L2_CID_STATELESS_VP9_FRAME) { const struct v4l2_ctrl_vp9_frame *dec_params = ctrl->p_new.p_vp9_frame;
From: Benjamin Gaignard benjamin.gaignard@collabora.com
[ Upstream commit 8a438580a09ecef78cd6c5825d628b4d5ae1c127 ]
SAO and FILTER buffers size depend of the bit depth. Make sure we have enough space for 10bit bitstreams.
Signed-off-by: Benjamin Gaignard benjamin.gaignard@collabora.com Reviewed-by: Ezequiel Garcia ezequiel@vanguardiasur.com.ar Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/media/hantro/hantro_hevc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/media/hantro/hantro_hevc.c b/drivers/staging/media/hantro/hantro_hevc.c index b990bc98164c..9383fb7081f6 100644 --- a/drivers/staging/media/hantro/hantro_hevc.c +++ b/drivers/staging/media/hantro/hantro_hevc.c @@ -104,7 +104,7 @@ static int tile_buffer_reallocate(struct hantro_ctx *ctx) hevc_dec->tile_bsd.cpu = NULL; }
- size = VERT_FILTER_RAM_SIZE * height64 * (num_tile_cols - 1); + size = (VERT_FILTER_RAM_SIZE * height64 * (num_tile_cols - 1) * ctx->bit_depth) / 8; hevc_dec->tile_filter.cpu = dma_alloc_coherent(vpu->dev, size, &hevc_dec->tile_filter.dma, GFP_KERNEL); @@ -112,7 +112,7 @@ static int tile_buffer_reallocate(struct hantro_ctx *ctx) goto err_free_tile_buffers; hevc_dec->tile_filter.size = size;
- size = VERT_SAO_RAM_SIZE * height64 * (num_tile_cols - 1); + size = (VERT_SAO_RAM_SIZE * height64 * (num_tile_cols - 1) * ctx->bit_depth) / 8; hevc_dec->tile_sao.cpu = dma_alloc_coherent(vpu->dev, size, &hevc_dec->tile_sao.dma, GFP_KERNEL);
From: Benjamin Gaignard benjamin.gaignard@collabora.com
[ Upstream commit f64853ad7f964b3bf7c1d63b27ca7ef972797a1c ]
The chroma offset depends of the bitstream depth. Make sure that ctx->bit_depth is used to compute it.
Signed-off-by: Benjamin Gaignard benjamin.gaignard@collabora.com Reviewed-by: Ezequiel Garcia ezequiel@vanguardiasur.com.ar Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/media/hantro/hantro_g2_hevc_dec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/media/hantro/hantro_g2_hevc_dec.c b/drivers/staging/media/hantro/hantro_g2_hevc_dec.c index 233ecd863d5f..a917079a6ed3 100644 --- a/drivers/staging/media/hantro/hantro_g2_hevc_dec.c +++ b/drivers/staging/media/hantro/hantro_g2_hevc_dec.c @@ -12,7 +12,7 @@
static size_t hantro_hevc_chroma_offset(struct hantro_ctx *ctx) { - return ctx->dst_fmt.width * ctx->dst_fmt.height; + return ctx->dst_fmt.width * ctx->dst_fmt.height * ctx->bit_depth / 8; }
static size_t hantro_hevc_motion_vectors_offset(struct hantro_ctx *ctx)
From: Sakari Ailus sakari.ailus@linux.intel.com
[ Upstream commit 2ba3e38517f5a4ebf9c997168079dca01b7f9fc6 ]
The state argument for the functions for obtaining various parts of the state is NULL if it is called by drivers for active state. Fail graciously in that case instead of dereferencing a NULL pointer.
Suggested-by: Bingbu Cao bingbu.cao@intel.com Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Reviewed-by: Tomi Valkeinen tomi.valkeinen@ideasonboard.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/media/v4l2-subdev.h | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/include/media/v4l2-subdev.h b/include/media/v4l2-subdev.h index 9689f38a0af1..ec1896886dbd 100644 --- a/include/media/v4l2-subdev.h +++ b/include/media/v4l2-subdev.h @@ -1046,6 +1046,8 @@ v4l2_subdev_get_pad_format(struct v4l2_subdev *sd, struct v4l2_subdev_state *state, unsigned int pad) { + if (WARN_ON(!state)) + return NULL; if (WARN_ON(pad >= sd->entity.num_pads)) pad = 0; return &state->pads[pad].try_fmt; @@ -1064,6 +1066,8 @@ v4l2_subdev_get_pad_crop(struct v4l2_subdev *sd, struct v4l2_subdev_state *state, unsigned int pad) { + if (WARN_ON(!state)) + return NULL; if (WARN_ON(pad >= sd->entity.num_pads)) pad = 0; return &state->pads[pad].try_crop; @@ -1082,6 +1086,8 @@ v4l2_subdev_get_pad_compose(struct v4l2_subdev *sd, struct v4l2_subdev_state *state, unsigned int pad) { + if (WARN_ON(!state)) + return NULL; if (WARN_ON(pad >= sd->entity.num_pads)) pad = 0; return &state->pads[pad].try_compose;
From: Maxime Ripard maxime@cerno.tech
[ Upstream commit 4190e8bbcbc77a9c36724681801cedc5229e7fc2 ]
If our HSM clock has not been properly initialized, any register access will silently lock up the system.
Let's check that this can't happen by adding a check for the rate before any register access, and error out otherwise.
Link: https://lore.kernel.org/dri-devel/20220922145448.w3xfywkn5ecak2et@pengutroni... Reviewed-by: Javier Martinez Canillas javierm@redhat.com Tested-by: Stefan Wahren stefan.wahren@i2se.com Signed-off-by: Maxime Ripard maxime@cerno.tech Link: https://patchwork.freedesktop.org/patch/msgid/20220929-rpi-pi3-unplugged-fix... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/vc4/vc4_hdmi.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c index 780a19a75c3f..874c6bd787c5 100644 --- a/drivers/gpu/drm/vc4/vc4_hdmi.c +++ b/drivers/gpu/drm/vc4/vc4_hdmi.c @@ -2869,6 +2869,7 @@ static int vc4_hdmi_runtime_resume(struct device *dev) struct vc4_hdmi *vc4_hdmi = dev_get_drvdata(dev); unsigned long __maybe_unused flags; u32 __maybe_unused value; + unsigned long rate; int ret;
/* @@ -2884,6 +2885,21 @@ static int vc4_hdmi_runtime_resume(struct device *dev) if (ret) return ret;
+ /* + * Whenever the RaspberryPi boots without an HDMI monitor + * plugged in, the firmware won't have initialized the HSM clock + * rate and it will be reported as 0. + * + * If we try to access a register of the controller in such a + * case, it will lead to a silent CPU stall. Let's make sure we + * prevent such a case. + */ + rate = clk_get_rate(vc4_hdmi->hsm_clock); + if (!rate) { + ret = -EINVAL; + goto err_disable_clk; + } + if (vc4_hdmi->variant->reset) vc4_hdmi->variant->reset(vc4_hdmi);
@@ -2905,6 +2921,10 @@ static int vc4_hdmi_runtime_resume(struct device *dev) #endif
return 0; + +err_disable_clk: + clk_disable_unprepare(vc4_hdmi->hsm_clock); + return ret; }
static int vc4_hdmi_bind(struct device *dev, struct device *master, void *data)
From: Ashish Kalra ashish.kalra@amd.com
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causing subsequent vmalloc() to fail.
The overflow happens in ghes_estatus_pool_init() when calculating len during execution of the statement below as both multiplication operands here are signed int:
len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);
The following call trace is observed because of this bug:
[ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1 [ 9.317131] Call Trace: [ 9.317134] <TASK> [ 9.317137] dump_stack_lvl+0x49/0x5f [ 9.317145] dump_stack+0x10/0x12 [ 9.317146] warn_alloc.cold+0x7b/0xdf [ 9.317150] ? __device_attach+0x16a/0x1b0 [ 9.317155] __vmalloc_node_range+0x702/0x740 [ 9.317160] ? device_add+0x17f/0x920 [ 9.317164] ? dev_set_name+0x53/0x70 [ 9.317166] ? platform_device_add+0xf9/0x240 [ 9.317168] __vmalloc_node+0x49/0x50 [ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0 [ 9.317176] vmalloc+0x21/0x30 [ 9.317177] ghes_estatus_pool_init+0x43/0xa0 [ 9.317179] acpi_hest_init+0x129/0x19c [ 9.317185] acpi_init+0x434/0x4a4 [ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a [ 9.317190] do_one_initcall+0x48/0x200 [ 9.317195] kernel_init_freeable+0x221/0x284 [ 9.317200] ? rest_init+0xe0/0xe0 [ 9.317204] kernel_init+0x1a/0x130 [ 9.317205] ret_from_fork+0x22/0x30 [ 9.317208] </TASK>
Signed-off-by: Ashish Kalra ashish.kalra@amd.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/apei/ghes.c | 2 +- include/acpi/ghes.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 80ad530583c9..9952f3a792ba 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -163,7 +163,7 @@ static void ghes_unmap(void __iomem *vaddr, enum fixed_addresses fixmap_idx) clear_fixmap(fixmap_idx); }
-int ghes_estatus_pool_init(int num_ghes) +int ghes_estatus_pool_init(unsigned int num_ghes) { unsigned long addr, len; int rc; diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h index 34fb3431a8f3..292a5c40bd0c 100644 --- a/include/acpi/ghes.h +++ b/include/acpi/ghes.h @@ -71,7 +71,7 @@ int ghes_register_vendor_record_notifier(struct notifier_block *nb); void ghes_unregister_vendor_record_notifier(struct notifier_block *nb); #endif
-int ghes_estatus_pool_init(int num_ghes); +int ghes_estatus_pool_init(unsigned int num_ghes);
/* From drivers/edac/ghes_edac.c */
From: Jason A. Donenfeld Jason@zx2c4.com
[ Upstream commit 96cb9d0554457086664d3bd10630b11193d863f1 ]
Rather than busy looping, yield back to the scheduler and sleep for a bit in the event that there's no data. This should hopefully prevent the stalls that Mark reported:
<6>[ 3.362859] Freeing initrd memory: 16196K <3>[ 23.160131] rcu: INFO: rcu_sched self-detected stall on CPU <3>[ 23.166057] rcu: 0-....: (2099 ticks this GP) idle=03b4/1/0x40000002 softirq=28/28 fqs=1050 <4>[ 23.174895] (t=2101 jiffies g=-1147 q=2353 ncpus=4) <4>[ 23.180203] CPU: 0 PID: 49 Comm: hwrng Not tainted 6.0.0 #1 <4>[ 23.186125] Hardware name: BCM2835 <4>[ 23.189837] PC is at bcm2835_rng_read+0x30/0x6c <4>[ 23.194709] LR is at hwrng_fillfn+0x71/0xf4 <4>[ 23.199218] pc : [<c07ccdc8>] lr : [<c07cb841>] psr: 40000033 <4>[ 23.205840] sp : f093df70 ip : 00000000 fp : 00000000 <4>[ 23.211404] r10: c3c7e800 r9 : 00000000 r8 : c17e6b20 <4>[ 23.216968] r7 : c17e6b64 r6 : c18b0a74 r5 : c07ccd99 r4 : c3f171c0 <4>[ 23.223855] r3 : 000fffff r2 : 00000040 r1 : c3c7e800 r0 : c3f171c0 <4>[ 23.230743] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment none <4>[ 23.238426] Control: 50c5387d Table: 0020406a DAC: 00000051 <4>[ 23.244519] CPU: 0 PID: 49 Comm: hwrng Not tainted 6.0.0 #1
Link: https://lore.kernel.org/all/Y0QJLauamRnCDUef@sirena.org.uk/ Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Acked-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/char/hw_random/bcm2835-rng.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/char/hw_random/bcm2835-rng.c b/drivers/char/hw_random/bcm2835-rng.c index e7dd457e9b22..e98fcac578d6 100644 --- a/drivers/char/hw_random/bcm2835-rng.c +++ b/drivers/char/hw_random/bcm2835-rng.c @@ -71,7 +71,7 @@ static int bcm2835_rng_read(struct hwrng *rng, void *buf, size_t max, while ((rng_readl(priv, RNG_STATUS) >> 24) == 0) { if (!wait) return 0; - cpu_relax(); + hwrng_msleep(rng, 1000); }
num_words = rng_readl(priv, RNG_STATUS) >> 24;
From: Pavel Begunkov asml.silence@gmail.com
[ Upstream commit 02bac94bd8efd75f615ac7515dd2def75b43e5b9 ]
We should not be completing requests from a task context that has already undergone io_uring cancellations, i.e. __io_uring_cancel(), as there are some assumptions, e.g. around cached task refs draining. Remove iopolling from io_ring_ctx_wait_and_kill() as it can be called later after PF_EXITING is set with the last task_work run.
Signed-off-by: Pavel Begunkov asml.silence@gmail.com Link: https://lore.kernel.org/r/7c03cc91455c4a1af49c6b9cbda4e57ea467aa11.166589118... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- io_uring/io_uring.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index c5dd483a7de2..d29f397f095e 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -2653,15 +2653,12 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx) io_poll_remove_all(ctx, NULL, true); mutex_unlock(&ctx->uring_lock);
- /* failed during ring init, it couldn't have issued any requests */ - if (ctx->rings) { + /* + * If we failed setting up the ctx, we might not have any rings + * and therefore did not submit any requests + */ + if (ctx->rings) io_kill_timeouts(ctx, NULL, true); - /* if we failed setting up the ctx, we might not have any rings */ - io_iopoll_try_reap_events(ctx); - /* drop cached put refs after potentially doing completions */ - if (current->io_uring) - io_uring_drop_tctx_refs(current); - }
INIT_WORK(&ctx->exit_work, io_ring_exit_work); /*
From: Uday Shankar ushankar@purestorage.com
[ Upstream commit 2331ce6126be8864b39490e705286b66e2344aac ]
Userspace can currently write to sysfs to transition sdev_state to RUNNING or OFFLINE from any source state. This causes issues because proper transitioning out of some states involves steps besides just changing sdev_state, so allowing userspace to change sdev_state regardless of the source state can result in inconsistencies; e.g. with ISCSI we can end up with sdev_state == SDEV_RUNNING while the device queue is quiesced. Any task attempting I/O on the device will then hang, and in more recent kernels, iscsid will hang as well.
More detail about this bug is provided in my first attempt:
https://groups.google.com/g/open-iscsi/c/PNKca4HgPDs/m/CXaDkntOAQAJ
Link: https://lore.kernel.org/r/20220924000241.2967323-1-ushankar@purestorage.com Signed-off-by: Uday Shankar ushankar@purestorage.com Suggested-by: Mike Christie michael.christie@oracle.com Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_sysfs.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 5d61f58399dc..dc41d7c6b9b1 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -828,6 +828,14 @@ store_state_field(struct device *dev, struct device_attribute *attr, }
mutex_lock(&sdev->state_mutex); + switch (sdev->sdev_state) { + case SDEV_RUNNING: + case SDEV_OFFLINE: + break; + default: + mutex_unlock(&sdev->state_mutex); + return -EINVAL; + } if (sdev->sdev_state == SDEV_RUNNING && state == SDEV_RUNNING) { ret = 0; } else {
From: Samuel Bailey samuel.bailey1@gmail.com
[ Upstream commit 79425b297f56bd481c6e97700a9a4e44c7bcfa35 ]
The MadCatz variant of the MMO7 mouse has the ID 0738:1713 and the same quirks as the Saitek variant.
Signed-off-by: Samuel Bailey samuel.bailey1@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-ids.h | 1 + drivers/hid/hid-quirks.c | 1 + drivers/hid/hid-saitek.c | 2 ++ 3 files changed, 4 insertions(+)
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index 043cf1cc8794..256795ed6247 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -867,6 +867,7 @@ #define USB_DEVICE_ID_MADCATZ_BEATPAD 0x4540 #define USB_DEVICE_ID_MADCATZ_RAT5 0x1705 #define USB_DEVICE_ID_MADCATZ_RAT9 0x1709 +#define USB_DEVICE_ID_MADCATZ_MMO7 0x1713
#define USB_VENDOR_ID_MCC 0x09db #define USB_DEVICE_ID_MCC_PMD1024LS 0x0076 diff --git a/drivers/hid/hid-quirks.c b/drivers/hid/hid-quirks.c index 70f602c64fd1..50e1c717fc0a 100644 --- a/drivers/hid/hid-quirks.c +++ b/drivers/hid/hid-quirks.c @@ -620,6 +620,7 @@ static const struct hid_device_id hid_have_special_driver[] = { { HID_USB_DEVICE(USB_VENDOR_ID_SAITEK, USB_DEVICE_ID_SAITEK_MMO7) }, { HID_USB_DEVICE(USB_VENDOR_ID_MADCATZ, USB_DEVICE_ID_MADCATZ_RAT5) }, { HID_USB_DEVICE(USB_VENDOR_ID_MADCATZ, USB_DEVICE_ID_MADCATZ_RAT9) }, + { HID_USB_DEVICE(USB_VENDOR_ID_MADCATZ, USB_DEVICE_ID_MADCATZ_MMO7) }, #endif #if IS_ENABLED(CONFIG_HID_SAMSUNG) { HID_USB_DEVICE(USB_VENDOR_ID_SAMSUNG, USB_DEVICE_ID_SAMSUNG_IR_REMOTE) }, diff --git a/drivers/hid/hid-saitek.c b/drivers/hid/hid-saitek.c index c7bf14c01960..b84e975977c4 100644 --- a/drivers/hid/hid-saitek.c +++ b/drivers/hid/hid-saitek.c @@ -187,6 +187,8 @@ static const struct hid_device_id saitek_devices[] = { .driver_data = SAITEK_RELEASE_MODE_RAT7 }, { HID_USB_DEVICE(USB_VENDOR_ID_SAITEK, USB_DEVICE_ID_SAITEK_MMO7), .driver_data = SAITEK_RELEASE_MODE_MMO7 }, + { HID_USB_DEVICE(USB_VENDOR_ID_MADCATZ, USB_DEVICE_ID_MADCATZ_MMO7), + .driver_data = SAITEK_RELEASE_MODE_MMO7 }, { } };
From: Danijel Slivka danijel.slivka@amd.com
[ Upstream commit 65f8682b9aaae20c2cdee993e6fe52374ad513c9 ]
For asic with VF MMIO access protection avoid using CPU for VM table updates. CPU pagetable updates have issues with HDP flush as VF MMIO access protection blocks write to mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register during sriov runtime.
v3: introduce virtualization capability flag AMDGPU_VF_MMIO_ACCESS_PROTECT which indicates that VF MMIO write access is not allowed in sriov runtime
Signed-off-by: Danijel Slivka danijel.slivka@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 6 ++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 4 ++++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 +++++- 3 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c index 9be57389301b..af5aeb0ec2e9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c @@ -726,6 +726,12 @@ void amdgpu_detect_virtualization(struct amdgpu_device *adev) adev->virt.caps |= AMDGPU_PASSTHROUGH_MODE; }
+ if (amdgpu_sriov_vf(adev) && adev->asic_type == CHIP_SIENNA_CICHLID) + /* VF MMIO access (except mailbox range) from CPU + * will be blocked during sriov runtime + */ + adev->virt.caps |= AMDGPU_VF_MMIO_ACCESS_PROTECT; + /* we have the ability to check now */ if (amdgpu_sriov_vf(adev)) { switch (adev->asic_type) { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h index 239f232f9c02..617d072275eb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h @@ -31,6 +31,7 @@ #define AMDGPU_SRIOV_CAPS_IS_VF (1 << 2) /* this GPU is a virtual function */ #define AMDGPU_PASSTHROUGH_MODE (1 << 3) /* thw whole GPU is pass through for VM */ #define AMDGPU_SRIOV_CAPS_RUNTIME (1 << 4) /* is out of full access mode */ +#define AMDGPU_VF_MMIO_ACCESS_PROTECT (1 << 5) /* MMIO write access is not allowed in sriov runtime */
/* flags for indirect register access path supported by rlcg for sriov */ #define AMDGPU_RLCG_GC_WRITE_LEGACY (0x8 << 28) @@ -294,6 +295,9 @@ struct amdgpu_video_codec_info; #define amdgpu_passthrough(adev) \ ((adev)->virt.caps & AMDGPU_PASSTHROUGH_MODE)
+#define amdgpu_sriov_vf_mmio_access_protection(adev) \ +((adev)->virt.caps & AMDGPU_VF_MMIO_ACCESS_PROTECT) + static inline bool is_virtual_machine(void) { #if defined(CONFIG_X86) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 690fd4f639f1..04130f8813ef 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2301,7 +2301,11 @@ void amdgpu_vm_manager_init(struct amdgpu_device *adev) */ #ifdef CONFIG_X86_64 if (amdgpu_vm_update_mode == -1) { - if (amdgpu_gmc_vram_full_visible(&adev->gmc)) + /* For asic with VF MMIO access protection + * avoid using CPU for VM table updates + */ + if (amdgpu_gmc_vram_full_visible(&adev->gmc) && + !amdgpu_sriov_vf_mmio_access_protection(adev)) adev->vm_manager.vm_update_mode = AMDGPU_VM_USE_CPU_FOR_COMPUTE; else
From: Kenneth Feng kenneth.feng@amd.com
[ Upstream commit f700486cd1f2bf381671d1c2c7dc9000db10c50e ]
skip loading pptable from driver on secure board since it's loaded from psp.
Signed-off-by: Kenneth Feng kenneth.feng@amd.com Reviewed-by: Guan Yu Guan.Yu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c index 93f9b8377539..750d8da84fac 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c @@ -210,7 +210,8 @@ int smu_v13_0_init_pptable_microcode(struct smu_context *smu) return 0;
if ((adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 7)) || - (adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 0))) + (adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 0)) || + (adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 10))) return 0;
/* override pptable_id from driver parameter */
From: Nathan Chancellor nathan@kernel.org
[ Upstream commit e688ba3e276422aa88eae7a54186a95320836081 ]
When booting a kernel compiled with CONFIG_CFI_CLANG on a machine with an RX 6700 XT, there is a CFI failure in kfd_destroy_mqd_cp():
[ 12.894543] CFI failure at kfd_destroy_mqd_cp+0x2a/0x40 [amdgpu] (target: hqd_destroy_v10_3+0x0/0x260 [amdgpu]; expected type: 0x8594d794)
Clang's kernel Control Flow Integrity (kCFI) makes sure that all indirect call targets have a type that exactly matches the function pointer prototype. In this case, hqd_destroy()'s third parameter, reset_type, should have a type of 'uint32_t' but every implementation of this callback has a third parameter type of 'enum kfd_preempt_type'.
Update the function pointer prototype to match reality so that there is no more CFI violation.
Link: https://github.com/ClangBuiltLinux/linux/issues/1738 Signed-off-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h index e85364dff4e0..5cb3e8634739 100644 --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h @@ -262,8 +262,9 @@ struct kfd2kgd_calls { uint32_t queue_id);
int (*hqd_destroy)(struct amdgpu_device *adev, void *mqd, - uint32_t reset_type, unsigned int timeout, - uint32_t pipe_id, uint32_t queue_id); + enum kfd_preempt_type reset_type, + unsigned int timeout, uint32_t pipe_id, + uint32_t queue_id);
bool (*hqd_sdma_is_occupied)(struct amdgpu_device *adev, void *mqd);
From: Yifan Zha Yifan.Zha@amd.com
[ Upstream commit 97a3d6090f5c2a165dc88bda05c1dcf9f08bf886 ]
[Why] L1 blocks most of GC registers accessing by MMIO.
[How] Use RLCG interface to program GC registers under SRIOV VF in full access time.
Signed-off-by: Yifan Zha Yifan.Zha@amd.com Reviewed-by: Hawking Zhang Hawking.Zhang@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c | 2 +- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 18 +++++++++++------- 3 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c index 0b0a72ca5695..7e80caa05060 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c @@ -111,7 +111,7 @@ static int init_interrupts_v11(struct amdgpu_device *adev, uint32_t pipe_id)
lock_srbm(adev, mec, pipe, 0, 0);
- WREG32(SOC15_REG_OFFSET(GC, 0, regCPC_INT_CNTL), + WREG32_SOC15(GC, 0, regCPC_INT_CNTL, CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK | CP_INT_CNTL_RING0__OPCODE_ERROR_INT_ENABLE_MASK);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index daf8ba8235cd..03775e0a8100 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -1729,7 +1729,7 @@ static void gfx_v11_0_init_compute_vmid(struct amdgpu_device *adev) WREG32_SOC15(GC, 0, regSH_MEM_BASES, sh_mem_bases);
/* Enable trap for each kfd vmid. */ - data = RREG32(SOC15_REG_OFFSET(GC, 0, regSPI_GDBG_PER_VMID_CNTL)); + data = RREG32_SOC15(GC, 0, regSPI_GDBG_PER_VMID_CNTL); data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 1); } soc21_grbm_select(adev, 0, 0, 0, 0); diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c index 1471bfb9ae38..2475fdbe8010 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c @@ -185,6 +185,10 @@ static void gmc_v11_0_flush_vm_hub(struct amdgpu_device *adev, uint32_t vmid, /* Use register 17 for GART */ const unsigned eng = 17; unsigned int i; + unsigned char hub_ip = 0; + + hub_ip = (vmhub == AMDGPU_GFXHUB_0) ? + GC_HWIP : MMHUB_HWIP;
spin_lock(&adev->gmc.invalidate_lock); /* @@ -198,8 +202,8 @@ static void gmc_v11_0_flush_vm_hub(struct amdgpu_device *adev, uint32_t vmid, if (use_semaphore) { for (i = 0; i < adev->usec_timeout; i++) { /* a read return value of 1 means semaphore acuqire */ - tmp = RREG32_NO_KIQ(hub->vm_inv_eng0_sem + - hub->eng_distance * eng); + tmp = RREG32_RLC_NO_KIQ(hub->vm_inv_eng0_sem + + hub->eng_distance * eng, hub_ip); if (tmp & 0x1) break; udelay(1); @@ -209,12 +213,12 @@ static void gmc_v11_0_flush_vm_hub(struct amdgpu_device *adev, uint32_t vmid, DRM_ERROR("Timeout waiting for sem acquire in VM flush!\n"); }
- WREG32_NO_KIQ(hub->vm_inv_eng0_req + hub->eng_distance * eng, inv_req); + WREG32_RLC_NO_KIQ(hub->vm_inv_eng0_req + hub->eng_distance * eng, inv_req, hub_ip);
/* Wait for ACK with a delay.*/ for (i = 0; i < adev->usec_timeout; i++) { - tmp = RREG32_NO_KIQ(hub->vm_inv_eng0_ack + - hub->eng_distance * eng); + tmp = RREG32_RLC_NO_KIQ(hub->vm_inv_eng0_ack + + hub->eng_distance * eng, hub_ip); tmp &= 1 << vmid; if (tmp) break; @@ -228,8 +232,8 @@ static void gmc_v11_0_flush_vm_hub(struct amdgpu_device *adev, uint32_t vmid, * add semaphore release after invalidation, * write with 0 means semaphore release */ - WREG32_NO_KIQ(hub->vm_inv_eng0_sem + - hub->eng_distance * eng, 0); + WREG32_RLC_NO_KIQ(hub->vm_inv_eng0_sem + + hub->eng_distance * eng, 0, hub_ip);
/* Issue additional private vm invalidation to MMHUB */ if ((vmhub != AMDGPU_GFXHUB_0) &&
From: YuBiao Wang YuBiao.Wang@amd.com
[ Upstream commit 2abe92c7adc9c0397ba51bf74909b85bc0fff84b ]
[Why] If mes is not dequeued during fini, mes will be in an uncleaned state during reload, then mes couldn't receive some commands which leads to reload failure.
[How] Perform MES dequeue via MMIO after all the unmap jobs are done by mes and before kiq fini.
v2: Move the dequeue operation inside kiq_hw_fini.
Signed-off-by: YuBiao Wang YuBiao.Wang@amd.com Reviewed-by: Jack Xiao Jack.Xiao@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 42 ++++++++++++++++++++++++-- 1 file changed, 39 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c index f92744b8d79d..2dd827472d6e 100644 --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c @@ -1145,6 +1145,42 @@ static int mes_v11_0_sw_fini(void *handle) return 0; }
+static void mes_v11_0_kiq_dequeue_sched(struct amdgpu_device *adev) +{ + uint32_t data; + int i; + + mutex_lock(&adev->srbm_mutex); + soc21_grbm_select(adev, 3, AMDGPU_MES_SCHED_PIPE, 0, 0); + + /* disable the queue if it's active */ + if (RREG32_SOC15(GC, 0, regCP_HQD_ACTIVE) & 1) { + WREG32_SOC15(GC, 0, regCP_HQD_DEQUEUE_REQUEST, 1); + for (i = 0; i < adev->usec_timeout; i++) { + if (!(RREG32_SOC15(GC, 0, regCP_HQD_ACTIVE) & 1)) + break; + udelay(1); + } + } + data = RREG32_SOC15(GC, 0, regCP_HQD_PQ_DOORBELL_CONTROL); + data = REG_SET_FIELD(data, CP_HQD_PQ_DOORBELL_CONTROL, + DOORBELL_EN, 0); + data = REG_SET_FIELD(data, CP_HQD_PQ_DOORBELL_CONTROL, + DOORBELL_HIT, 1); + WREG32_SOC15(GC, 0, regCP_HQD_PQ_DOORBELL_CONTROL, data); + + WREG32_SOC15(GC, 0, regCP_HQD_PQ_DOORBELL_CONTROL, 0); + + WREG32_SOC15(GC, 0, regCP_HQD_PQ_WPTR_LO, 0); + WREG32_SOC15(GC, 0, regCP_HQD_PQ_WPTR_HI, 0); + WREG32_SOC15(GC, 0, regCP_HQD_PQ_RPTR, 0); + + soc21_grbm_select(adev, 0, 0, 0, 0); + mutex_unlock(&adev->srbm_mutex); + + adev->mes.ring.sched.ready = false; +} + static void mes_v11_0_kiq_setting(struct amdgpu_ring *ring) { uint32_t tmp; @@ -1196,6 +1232,9 @@ static int mes_v11_0_kiq_hw_init(struct amdgpu_device *adev)
static int mes_v11_0_kiq_hw_fini(struct amdgpu_device *adev) { + if (adev->mes.ring.sched.ready) + mes_v11_0_kiq_dequeue_sched(adev); + mes_v11_0_enable(adev, false); return 0; } @@ -1251,9 +1290,6 @@ static int mes_v11_0_hw_init(void *handle)
static int mes_v11_0_hw_fini(void *handle) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; - - adev->mes.ring.sched.ready = false; return 0; }
From: Xander Li xander_li@kingston.com.tw
[ Upstream commit ac9b57d4e1e3ecf0122e915bbba1bd4c90ec3031 ]
Kingston SSDs do support NVMe Write_Zeroes cmd but take long time to process. The firmware version is locked by these SSDs, we can not expect firmware improvement, so disable Write_Zeroes cmd.
Signed-off-by: Xander Li xander_li@kingston.com.tw Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 57cc2bb5b1a2..554468ea5a2a 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3508,6 +3508,16 @@ static const struct pci_device_id nvme_id_table[] = { .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, { PCI_DEVICE(0x2646, 0x2263), /* KINGSTON A2000 NVMe SSD */ .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, + { PCI_DEVICE(0x2646, 0x5018), /* KINGSTON OM8SFP4xxxxP OS21012 NVMe SSD */ + .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, + { PCI_DEVICE(0x2646, 0x5016), /* KINGSTON OM3PGP4xxxxP OS21011 NVMe SSD */ + .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, + { PCI_DEVICE(0x2646, 0x501A), /* KINGSTON OM8PGP4xxxxP OS21005 NVMe SSD */ + .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, + { PCI_DEVICE(0x2646, 0x501B), /* KINGSTON OM8PGP4xxxxQ OS21005 NVMe SSD */ + .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, + { PCI_DEVICE(0x2646, 0x501E), /* KINGSTON OM3PGP4xxxxQ OS21011 NVMe SSD */ + .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, { PCI_DEVICE(0x1e4B, 0x1001), /* MAXIO MAP1001 */ .driver_data = NVME_QUIRK_BOGUS_NID, }, { PCI_DEVICE(0x1e4B, 0x1002), /* MAXIO MAP1002 */
From: Martin Tůma martin.tuma@digiteqautomotive.com
[ Upstream commit b8caf0a0e04583fb71e21495bef84509182227ea ]
The missing "platform" alias is required for the mgb4 v4l2 driver to load the i2c controller driver when probing the HW.
Signed-off-by: Martin Tůma martin.tuma@digiteqautomotive.com Acked-by: Michal Simek michal.simek@amd.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-xiic.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/i2c/busses/i2c-xiic.c b/drivers/i2c/busses/i2c-xiic.c index b3fe6b2aa3ca..277a02455cdd 100644 --- a/drivers/i2c/busses/i2c-xiic.c +++ b/drivers/i2c/busses/i2c-xiic.c @@ -920,6 +920,7 @@ static struct platform_driver xiic_i2c_driver = {
module_platform_driver(xiic_i2c_driver);
+MODULE_ALIAS("platform:" DRIVER_NAME); MODULE_AUTHOR("info@mocean-labs.com"); MODULE_DESCRIPTION("Xilinx I2C bus driver"); MODULE_LICENSE("GPL v2");
From: Pavel Begunkov asml.silence@gmail.com
[ Upstream commit d4347d50407daea6237872281ece64c4bdf1ec99 ]
bio_put() with REQ_ALLOC_CACHE assumes that it's executed not from an irq context. Let's add a warning if the invariant is not respected, especially since there is a couple of places removing REQ_POLLED by hand without also clearing REQ_ALLOC_CACHE.
Signed-off-by: Pavel Begunkov asml.silence@gmail.com Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/558d78313476c4e9c233902efa0092644c3d420a.166612246... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/bio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/bio.c b/block/bio.c index 77e3b764a078..fc2364cf1775 100644 --- a/block/bio.c +++ b/block/bio.c @@ -741,7 +741,7 @@ void bio_put(struct bio *bio) return; }
- if (bio->bi_opf & REQ_ALLOC_CACHE) { + if ((bio->bi_opf & REQ_ALLOC_CACHE) && !WARN_ON_ONCE(in_interrupt())) { struct bio_alloc_cache *cache;
bio_uninit(bio);
From: Marek Vasut marex@denx.de
[ Upstream commit 2ff4ba9e37024735f5cefc5ea2a73fc66addfe0e ]
Add custom I2C accessors to this driver, since the regular I2C regmap ones do not generate the exact I2C transfers required by the chip. On I2C write, it is mandatory to send transfer length first, on read the chip returns the transfer length in first byte. Instead of always reading back 8 bytes, which is the default and also the size of the entire register file, set BCP register to 1 to read out 1 byte which is less wasteful.
Fixes: 892e0ddea1aa ("clk: rs9: Add Renesas 9-series PCIe clock generator driver") Reported-by: Alexander Stein alexander.stein@ew.tq-group.com Signed-off-by: Marek Vasut marex@denx.de Link: https://lore.kernel.org/r/20220929195521.284497-1-marex@denx.de Reviewed-by: Alexander Stein alexander.stein@ew.tq-group.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/clk-renesas-pcie.c | 65 ++++++++++++++++++++++++++++++++-- 1 file changed, 62 insertions(+), 3 deletions(-)
diff --git a/drivers/clk/clk-renesas-pcie.c b/drivers/clk/clk-renesas-pcie.c index 4f5df1fc74b4..e6247141d0c0 100644 --- a/drivers/clk/clk-renesas-pcie.c +++ b/drivers/clk/clk-renesas-pcie.c @@ -90,13 +90,66 @@ static const struct regmap_access_table rs9_writeable_table = { .n_yes_ranges = ARRAY_SIZE(rs9_writeable_ranges), };
+static int rs9_regmap_i2c_write(void *context, + unsigned int reg, unsigned int val) +{ + struct i2c_client *i2c = context; + const u8 data[3] = { reg, 1, val }; + const int count = ARRAY_SIZE(data); + int ret; + + ret = i2c_master_send(i2c, data, count); + if (ret == count) + return 0; + else if (ret < 0) + return ret; + else + return -EIO; +} + +static int rs9_regmap_i2c_read(void *context, + unsigned int reg, unsigned int *val) +{ + struct i2c_client *i2c = context; + struct i2c_msg xfer[2]; + u8 txdata = reg; + u8 rxdata[2]; + int ret; + + xfer[0].addr = i2c->addr; + xfer[0].flags = 0; + xfer[0].len = 1; + xfer[0].buf = (void *)&txdata; + + xfer[1].addr = i2c->addr; + xfer[1].flags = I2C_M_RD; + xfer[1].len = 2; + xfer[1].buf = (void *)rxdata; + + ret = i2c_transfer(i2c->adapter, xfer, 2); + if (ret < 0) + return ret; + if (ret != 2) + return -EIO; + + /* + * Byte 0 is transfer length, which is always 1 due + * to BCP register programming to 1 in rs9_probe(), + * ignore it and use data from Byte 1. + */ + *val = rxdata[1]; + return 0; +} + static const struct regmap_config rs9_regmap_config = { .reg_bits = 8, .val_bits = 8, - .cache_type = REGCACHE_FLAT, - .max_register = 0x8, + .cache_type = REGCACHE_NONE, + .max_register = RS9_REG_BCP, .rd_table = &rs9_readable_table, .wr_table = &rs9_writeable_table, + .reg_write = rs9_regmap_i2c_write, + .reg_read = rs9_regmap_i2c_read, };
static int rs9_get_output_config(struct rs9_driver_data *rs9, int idx) @@ -242,11 +295,17 @@ static int rs9_probe(struct i2c_client *client) return ret; }
- rs9->regmap = devm_regmap_init_i2c(client, &rs9_regmap_config); + rs9->regmap = devm_regmap_init(&client->dev, NULL, + client, &rs9_regmap_config); if (IS_ERR(rs9->regmap)) return dev_err_probe(&client->dev, PTR_ERR(rs9->regmap), "Failed to allocate register map\n");
+ /* Always read back 1 Byte via I2C */ + ret = regmap_write(rs9->regmap, RS9_REG_BCP, 1); + if (ret < 0) + return ret; + /* Register clock */ for (i = 0; i < rs9->chip_info->num_clks; i++) { snprintf(name, 5, "DIF%d", i);
From: Marek Vasut marex@denx.de
[ Upstream commit f23f1a1e8437e38014fe34a2f12e37e861e5bcc7 ]
Enable CPLD_Dn pull down resistor instead of pull up to avoid intefering with CPLD power off functionality.
Fixes: 510c527b4ff57 ("arm64: dts: imx8mm: Add i.MX8M Mini Toradex Verdin based Menlo board") Signed-off-by: Marek Vasut marex@denx.de Reviewed-by: Fabio Estevam festevam@denx.de Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../arm64/boot/dts/freescale/imx8mm-mx8menlo.dts | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/boot/dts/freescale/imx8mm-mx8menlo.dts b/arch/arm64/boot/dts/freescale/imx8mm-mx8menlo.dts index 32f6f2f50c10..43e89859c044 100644 --- a/arch/arm64/boot/dts/freescale/imx8mm-mx8menlo.dts +++ b/arch/arm64/boot/dts/freescale/imx8mm-mx8menlo.dts @@ -250,21 +250,21 @@ MX8MM_IOMUXC_SAI1_RXD1_GPIO4_IO3 0x1c4 /* SODIMM 96 */ MX8MM_IOMUXC_SAI1_RXD2_GPIO4_IO4 0x1c4 /* CPLD_D[7] */ - MX8MM_IOMUXC_SAI1_RXD3_GPIO4_IO5 0x1c4 + MX8MM_IOMUXC_SAI1_RXD3_GPIO4_IO5 0x184 /* CPLD_D[6] */ - MX8MM_IOMUXC_SAI1_RXFS_GPIO4_IO0 0x1c4 + MX8MM_IOMUXC_SAI1_RXFS_GPIO4_IO0 0x184 /* CPLD_D[5] */ - MX8MM_IOMUXC_SAI1_TXC_GPIO4_IO11 0x1c4 + MX8MM_IOMUXC_SAI1_TXC_GPIO4_IO11 0x184 /* CPLD_D[4] */ - MX8MM_IOMUXC_SAI1_TXD0_GPIO4_IO12 0x1c4 + MX8MM_IOMUXC_SAI1_TXD0_GPIO4_IO12 0x184 /* CPLD_D[3] */ - MX8MM_IOMUXC_SAI1_TXD1_GPIO4_IO13 0x1c4 + MX8MM_IOMUXC_SAI1_TXD1_GPIO4_IO13 0x184 /* CPLD_D[2] */ - MX8MM_IOMUXC_SAI1_TXD2_GPIO4_IO14 0x1c4 + MX8MM_IOMUXC_SAI1_TXD2_GPIO4_IO14 0x184 /* CPLD_D[1] */ - MX8MM_IOMUXC_SAI1_TXD3_GPIO4_IO15 0x1c4 + MX8MM_IOMUXC_SAI1_TXD3_GPIO4_IO15 0x184 /* CPLD_D[0] */ - MX8MM_IOMUXC_SAI1_TXD4_GPIO4_IO16 0x1c4 + MX8MM_IOMUXC_SAI1_TXD4_GPIO4_IO16 0x184 /* KBD_intK */ MX8MM_IOMUXC_SAI2_MCLK_GPIO4_IO27 0x1c4 /* DISP_reset */
From: Jerry Snitselaar jsnitsel@redhat.com
[ Upstream commit f4cd18c5b2000df0c382f6530eeca9141ea41faf ]
memblock_reserve() expects a physical address, but the address being passed for the TPM final events log is what was returned from early_memremap(). This results in something like the following:
[ 0.000000] memblock_reserve: [0xffffffffff2c0000-0xffffffffff2c00e4] efi_tpm_eventlog_init+0x324/0x370
Pass the address from efi like what is done for the TPM events log.
Fixes: c46f3405692d ("tpm: Reserve the TPM final events table") Cc: Matthew Garrett mjg59@google.com Cc: Jarkko Sakkinen jarkko@kernel.org Cc: Bartosz Szczepanek bsz@semihalf.com Cc: Ard Biesheuvel ardb@kernel.org Signed-off-by: Jerry Snitselaar jsnitsel@redhat.com Acked-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/tpm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/tpm.c b/drivers/firmware/efi/tpm.c index 8f665678e9e3..e8d69bd548f3 100644 --- a/drivers/firmware/efi/tpm.c +++ b/drivers/firmware/efi/tpm.c @@ -97,7 +97,7 @@ int __init efi_tpm_eventlog_init(void) goto out_calc; }
- memblock_reserve((unsigned long)final_tbl, + memblock_reserve(efi.tpm_final_log, tbl_size + sizeof(*final_tbl)); efi_tpm_final_log_size = tbl_size;
From: Geert Uytterhoeven geert+renesas@glider.be
[ Upstream commit a9003f74f5a2f487e101f3aa1dd5c3d3a78c6999 ]
As serial communication requires a clean clock signal, the High Speed Serial Communication Interfaces with FIFO (HSCIF) is clocked by a clock that is not affected by Spread Spectrum or Fractional Multiplication.
Hence change the parent clocks for the HSCIF modules from the S0D3_PER clock to the SASYNCPERD1 clock (which has the same clock rate), cfr. R-Car V4H Hardware User's Manual rev. 0.54.
Fixes: 0ab55cf1834177a2 ("clk: renesas: cpg-mssr: Add support for R-Car V4H") Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Acked-by: Stephen Boyd sboyd@kernel.org Link: https://lore.kernel.org/r/b7928abc8b9f53d5b06ec8624342f449de3d24ec.166514749... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/renesas/r8a779g0-cpg-mssr.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/clk/renesas/r8a779g0-cpg-mssr.c b/drivers/clk/renesas/r8a779g0-cpg-mssr.c index 3fc4233b1ead..c9c59c6f7139 100644 --- a/drivers/clk/renesas/r8a779g0-cpg-mssr.c +++ b/drivers/clk/renesas/r8a779g0-cpg-mssr.c @@ -150,10 +150,10 @@ static const struct cpg_core_clk r8a779g0_core_clks[] __initconst = { };
static const struct mssr_mod_clk r8a779g0_mod_clks[] __initconst = { - DEF_MOD("hscif0", 514, R8A779G0_CLK_S0D3_PER), - DEF_MOD("hscif1", 515, R8A779G0_CLK_S0D3_PER), - DEF_MOD("hscif2", 516, R8A779G0_CLK_S0D3_PER), - DEF_MOD("hscif3", 517, R8A779G0_CLK_S0D3_PER), + DEF_MOD("hscif0", 514, R8A779G0_CLK_SASYNCPERD1), + DEF_MOD("hscif1", 515, R8A779G0_CLK_SASYNCPERD1), + DEF_MOD("hscif2", 516, R8A779G0_CLK_SASYNCPERD1), + DEF_MOD("hscif3", 517, R8A779G0_CLK_SASYNCPERD1), };
/*
From: Taniya Das quic_tdas@quicinc.com
[ Upstream commit ffa20aa581cf5377fc397b0d0ff9d67ea823629b ]
There are few GPU clocks which are powering up the memories and thus enable the FORCE_MEM_PERIPH always for these clocks to force the periph_on signal to remain active during halt state of the clock.
Fixes: a3cc092196ef ("clk: qcom: Add Global Clock controller (GCC) driver for SC7280") Fixes: 3e0f01d6c7e7 ("clk: qcom: Add graphics clock controller driver for SC7280") Signed-off-by: Taniya Das quic_tdas@quicinc.com Signed-off-by: Satya Priya quic_c_skakit@quicinc.com Link: https://lore.kernel.org/r/1666159535-6447-1-git-send-email-quic_c_skakit@qui... Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/qcom/gcc-sc7280.c | 1 + drivers/clk/qcom/gpucc-sc7280.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/drivers/clk/qcom/gcc-sc7280.c b/drivers/clk/qcom/gcc-sc7280.c index 7ff64d4d5920..2360f3f95618 100644 --- a/drivers/clk/qcom/gcc-sc7280.c +++ b/drivers/clk/qcom/gcc-sc7280.c @@ -3467,6 +3467,7 @@ static int gcc_sc7280_probe(struct platform_device *pdev) regmap_update_bits(regmap, 0x28004, BIT(0), BIT(0)); regmap_update_bits(regmap, 0x28014, BIT(0), BIT(0)); regmap_update_bits(regmap, 0x71004, BIT(0), BIT(0)); + regmap_update_bits(regmap, 0x7100C, BIT(13), BIT(13));
ret = qcom_cc_register_rcg_dfs(regmap, gcc_dfs_clocks, ARRAY_SIZE(gcc_dfs_clocks)); diff --git a/drivers/clk/qcom/gpucc-sc7280.c b/drivers/clk/qcom/gpucc-sc7280.c index 9a832f2bcf49..1490cd45a654 100644 --- a/drivers/clk/qcom/gpucc-sc7280.c +++ b/drivers/clk/qcom/gpucc-sc7280.c @@ -463,6 +463,7 @@ static int gpu_cc_sc7280_probe(struct platform_device *pdev) */ regmap_update_bits(regmap, 0x1170, BIT(0), BIT(0)); regmap_update_bits(regmap, 0x1098, BIT(0), BIT(0)); + regmap_update_bits(regmap, 0x1098, BIT(13), BIT(13));
return qcom_cc_really_probe(pdev, &gpu_cc_sc7280_desc, regmap); }
From: Max Krummenacher max.krummenacher@toradex.com
[ Upstream commit 2f321fd6d89ad1e9525f5aa1f2be9202c2f3e724 ]
The GPIO signaling ctrl_sleep_moci is currently handled as a gpio hog. But the gpio-hog node is made a child of the wrong gpio controller. Move it to the node representing gpio4 so that it actually works.
Without this carrier board components jumpered to use the signal are unconditionally switched off.
Fixes: a39ed23bdf6e ("arm64: dts: freescale: add initial support for verdin imx8m plus") Signed-off-by: Max Krummenacher max.krummenacher@toradex.com Signed-off-by: Marcel Ziswiler marcel.ziswiler@toradex.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../boot/dts/freescale/imx8mp-verdin.dtsi | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/boot/dts/freescale/imx8mp-verdin.dtsi b/arch/arm64/boot/dts/freescale/imx8mp-verdin.dtsi index 1c74c6a19449..360be51a3527 100644 --- a/arch/arm64/boot/dts/freescale/imx8mp-verdin.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mp-verdin.dtsi @@ -339,16 +339,6 @@ &gpio2 { "SODIMM_82", "SODIMM_70", "SODIMM_72"; - - ctrl-sleep-moci-hog { - gpio-hog; - /* Verdin CTRL_SLEEP_MOCI# (SODIMM 256) */ - gpios = <29 GPIO_ACTIVE_HIGH>; - line-name = "CTRL_SLEEP_MOCI#"; - output-high; - pinctrl-names = "default"; - pinctrl-0 = <&pinctrl_ctrl_sleep_moci>; - }; };
&gpio3 { @@ -417,6 +407,16 @@ &gpio4 { "SODIMM_256", "SODIMM_48", "SODIMM_44"; + + ctrl-sleep-moci-hog { + gpio-hog; + /* Verdin CTRL_SLEEP_MOCI# (SODIMM 256) */ + gpios = <29 GPIO_ACTIVE_HIGH>; + line-name = "CTRL_SLEEP_MOCI#"; + output-high; + pinctrl-names = "default"; + pinctrl-0 = <&pinctrl_ctrl_sleep_moci>; + }; };
/* On-module I2C */
From: Li Jun jun.li@nxp.com
[ Upstream commit e1ec45b9a8127d9d31bb9fc1d802571a2ba8dd89 ]
pgc_otg1/2 are independent power domain of hsio, they for usb phy, so remove hsio power domain dependency from its node.
Fixes: d39d4bb15310 ("arm64: dts: imx8mm: add GPC node") Signed-off-by: Li Jun jun.li@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/imx8mm.dtsi | 2 -- 1 file changed, 2 deletions(-)
diff --git a/arch/arm64/boot/dts/freescale/imx8mm.dtsi b/arch/arm64/boot/dts/freescale/imx8mm.dtsi index afb90f59c83c..41204b871f4f 100644 --- a/arch/arm64/boot/dts/freescale/imx8mm.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mm.dtsi @@ -674,13 +674,11 @@ pgc_pcie: power-domain@1 { pgc_otg1: power-domain@2 { #power-domain-cells = <0>; reg = <IMX8MM_POWER_DOMAIN_OTG1>; - power-domains = <&pgc_hsiomix>; };
pgc_otg2: power-domain@3 { #power-domain-cells = <0>; reg = <IMX8MM_POWER_DOMAIN_OTG2>; - power-domains = <&pgc_hsiomix>; };
pgc_gpumix: power-domain@4 {
From: Li Jun jun.li@nxp.com
[ Upstream commit 4585c79ff477f9517b7f384a4fce351417e8fa36 ]
pgc_otg1/2 is actual the power domain of usb PHY, usb controller is in hsio power domain, and pgc_otg1/2 is required to be powered up to detect usb remote wakeup, so move the pgc_otg1/2 power domain to the usb phy node.
Fixes: 01df28d80859 ("arm64: dts: imx8mm: put USB controllers into power-domains") Signed-off-by: Li Jun jun.li@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/imx8mm.dtsi | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/freescale/imx8mm.dtsi b/arch/arm64/boot/dts/freescale/imx8mm.dtsi index 41204b871f4f..dabd94dc30c4 100644 --- a/arch/arm64/boot/dts/freescale/imx8mm.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mm.dtsi @@ -276,6 +276,7 @@ usbphynop1: usbphynop1 { assigned-clocks = <&clk IMX8MM_CLK_USB_PHY_REF>; assigned-clock-parents = <&clk IMX8MM_SYS_PLL1_100M>; clock-names = "main_clk"; + power-domains = <&pgc_otg1>; };
usbphynop2: usbphynop2 { @@ -285,6 +286,7 @@ usbphynop2: usbphynop2 { assigned-clocks = <&clk IMX8MM_CLK_USB_PHY_REF>; assigned-clock-parents = <&clk IMX8MM_SYS_PLL1_100M>; clock-names = "main_clk"; + power-domains = <&pgc_otg2>; };
soc: soc@0 { @@ -1184,7 +1186,7 @@ usbotg1: usb@32e40000 { assigned-clock-parents = <&clk IMX8MM_SYS_PLL2_500M>; phys = <&usbphynop1>; fsl,usbmisc = <&usbmisc1 0>; - power-domains = <&pgc_otg1>; + power-domains = <&pgc_hsiomix>; status = "disabled"; };
@@ -1204,7 +1206,7 @@ usbotg2: usb@32e50000 { assigned-clock-parents = <&clk IMX8MM_SYS_PLL2_500M>; phys = <&usbphynop2>; fsl,usbmisc = <&usbmisc2 0>; - power-domains = <&pgc_otg2>; + power-domains = <&pgc_hsiomix>; status = "disabled"; };
From: Li Jun jun.li@nxp.com
[ Upstream commit 9e0bbb7a5218d856f1ccf8f1bf38c8869572b464 ]
pgc_otg1 is an independent power domain of hsio, it's for usb phy, so remove hsio power domain from its node.
Fixes: 8b8ebec67360 ("arm64: dts: imx8mn: add GPC node") Signed-off-by: Li Jun jun.li@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/imx8mn.dtsi | 1 - 1 file changed, 1 deletion(-)
diff --git a/arch/arm64/boot/dts/freescale/imx8mn.dtsi b/arch/arm64/boot/dts/freescale/imx8mn.dtsi index cb2836bfbd95..950f432627fe 100644 --- a/arch/arm64/boot/dts/freescale/imx8mn.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mn.dtsi @@ -662,7 +662,6 @@ pgc_hsiomix: power-domain@0 { pgc_otg1: power-domain@1 { #power-domain-cells = <0>; reg = <IMX8MN_POWER_DOMAIN_OTG1>; - power-domains = <&pgc_hsiomix>; };
pgc_gpumix: power-domain@2 {
From: Li Jun jun.li@nxp.com
[ Upstream commit ee895139a761bdb7869f9f5b9ccc19a064d0d740 ]
pgc_otg1 is actual the power domain of usb PHY, usb controller is in hsio power domain, and pgc_otg1 is required to be powered up to detect usb remote wakeup, so move the pgc_otg1 power domain to the usb phy node.
Fixes: ea2b5af58ab2 ("arm64: dts: imx8mn: put USB controller into power-domains") Signed-off-by: Li Jun jun.li@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/imx8mn.dtsi | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/freescale/imx8mn.dtsi b/arch/arm64/boot/dts/freescale/imx8mn.dtsi index 950f432627fe..ad0b99adf691 100644 --- a/arch/arm64/boot/dts/freescale/imx8mn.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mn.dtsi @@ -1075,7 +1075,7 @@ usbotg1: usb@32e40000 { assigned-clock-parents = <&clk IMX8MN_SYS_PLL2_500M>; phys = <&usbphynop1>; fsl,usbmisc = <&usbmisc1 0>; - power-domains = <&pgc_otg1>; + power-domains = <&pgc_hsiomix>; status = "disabled"; };
@@ -1174,5 +1174,6 @@ usbphynop1: usbphynop1 { assigned-clocks = <&clk IMX8MN_CLK_USB_PHY_REF>; assigned-clock-parents = <&clk IMX8MN_SYS_PLL1_100M>; clock-names = "main_clk"; + power-domains = <&pgc_otg1>; }; };
From: Tim Harvey tharvey@gateworks.com
[ Upstream commit bb5ad73941dc3f4e3c2241348f385da6501d50ea ]
The GW5910 and GW5913 have a user pushbutton that is tied to the Gateworks System Controller GPIO offset 2. Fix the invalid offset of 0.
Fixes: 64bf0a0af18d ("ARM: dts: imx6qdl-gw: add Gateworks System Controller support") Signed-off-by: Tim Harvey tharvey@gateworks.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/imx6qdl-gw5910.dtsi | 2 +- arch/arm/boot/dts/imx6qdl-gw5913.dtsi | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm/boot/dts/imx6qdl-gw5910.dtsi b/arch/arm/boot/dts/imx6qdl-gw5910.dtsi index 68e5ab2e27e2..6bb4855d13ce 100644 --- a/arch/arm/boot/dts/imx6qdl-gw5910.dtsi +++ b/arch/arm/boot/dts/imx6qdl-gw5910.dtsi @@ -29,7 +29,7 @@ gpio-keys {
user-pb { label = "user_pb"; - gpios = <&gsc_gpio 0 GPIO_ACTIVE_LOW>; + gpios = <&gsc_gpio 2 GPIO_ACTIVE_LOW>; linux,code = <BTN_0>; };
diff --git a/arch/arm/boot/dts/imx6qdl-gw5913.dtsi b/arch/arm/boot/dts/imx6qdl-gw5913.dtsi index 8e23cec7149e..696427b487f0 100644 --- a/arch/arm/boot/dts/imx6qdl-gw5913.dtsi +++ b/arch/arm/boot/dts/imx6qdl-gw5913.dtsi @@ -26,7 +26,7 @@ gpio-keys {
user-pb { label = "user_pb"; - gpios = <&gsc_gpio 0 GPIO_ACTIVE_LOW>; + gpios = <&gsc_gpio 2 GPIO_ACTIVE_LOW>; linux,code = <BTN_0>; };
From: Peng Fan peng.fan@nxp.com
[ Upstream commit 06acb824d7d00a30e9400f67eee481b218371b5a ]
Per bindings/mmc/fsl-imx-esdhc.yaml, the clock order is ipg, ahb, per, otherwise warning: " mmc@5b020000: clock-names:1: 'ahb' was expected mmc@5b020000: clock-names:2: 'per' was expected "
Fixes: 16c4ea7501b1 ("arm64: dts: imx8: switch to new lpcg clock binding") Signed-off-by: Peng Fan peng.fan@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../arm64/boot/dts/freescale/imx8-ss-conn.dtsi | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi b/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi index 82a1c4488378..10370d1a6c6d 100644 --- a/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi @@ -38,9 +38,9 @@ usdhc1: mmc@5b010000 { interrupts = <GIC_SPI 232 IRQ_TYPE_LEVEL_HIGH>; reg = <0x5b010000 0x10000>; clocks = <&sdhc0_lpcg IMX_LPCG_CLK_4>, - <&sdhc0_lpcg IMX_LPCG_CLK_5>, - <&sdhc0_lpcg IMX_LPCG_CLK_0>; - clock-names = "ipg", "per", "ahb"; + <&sdhc0_lpcg IMX_LPCG_CLK_0>, + <&sdhc0_lpcg IMX_LPCG_CLK_5>; + clock-names = "ipg", "ahb", "per"; power-domains = <&pd IMX_SC_R_SDHC_0>; status = "disabled"; }; @@ -49,9 +49,9 @@ usdhc2: mmc@5b020000 { interrupts = <GIC_SPI 233 IRQ_TYPE_LEVEL_HIGH>; reg = <0x5b020000 0x10000>; clocks = <&sdhc1_lpcg IMX_LPCG_CLK_4>, - <&sdhc1_lpcg IMX_LPCG_CLK_5>, - <&sdhc1_lpcg IMX_LPCG_CLK_0>; - clock-names = "ipg", "per", "ahb"; + <&sdhc1_lpcg IMX_LPCG_CLK_0>, + <&sdhc1_lpcg IMX_LPCG_CLK_5>; + clock-names = "ipg", "ahb", "per"; power-domains = <&pd IMX_SC_R_SDHC_1>; fsl,tuning-start-tap = <20>; fsl,tuning-step = <2>; @@ -62,9 +62,9 @@ usdhc3: mmc@5b030000 { interrupts = <GIC_SPI 234 IRQ_TYPE_LEVEL_HIGH>; reg = <0x5b030000 0x10000>; clocks = <&sdhc2_lpcg IMX_LPCG_CLK_4>, - <&sdhc2_lpcg IMX_LPCG_CLK_5>, - <&sdhc2_lpcg IMX_LPCG_CLK_0>; - clock-names = "ipg", "per", "ahb"; + <&sdhc2_lpcg IMX_LPCG_CLK_0>, + <&sdhc2_lpcg IMX_LPCG_CLK_5>; + clock-names = "ipg", "ahb", "per"; power-domains = <&pd IMX_SC_R_SDHC_2>; status = "disabled"; };
From: Peng Fan peng.fan@nxp.com
[ Upstream commit e41ba695713996444c224cdac869a2f36a8514c4 ]
Add the GPIO clk, otherwise GPIO may not work if clk driver disable the GPIO clk during kernel boot.
Reviewed-by: Jacky Bai ping.bai@nxp.com Signed-off-by: Peng Fan peng.fan@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Stable-dep-of: d92a110130d4 ("arm64: dts: imx93: correct gpio-ranges") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/imx93.dtsi | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/arch/arm64/boot/dts/freescale/imx93.dtsi b/arch/arm64/boot/dts/freescale/imx93.dtsi index f83a07c7c9b1..b04735004fdf 100644 --- a/arch/arm64/boot/dts/freescale/imx93.dtsi +++ b/arch/arm64/boot/dts/freescale/imx93.dtsi @@ -295,6 +295,9 @@ gpio2: gpio@43810080 { interrupts = <GIC_SPI 57 IRQ_TYPE_LEVEL_HIGH>; interrupt-controller; #interrupt-cells = <2>; + clocks = <&clk IMX93_CLK_GPIO2_GATE>, + <&clk IMX93_CLK_GPIO2_GATE>; + clock-names = "gpio", "port"; gpio-ranges = <&iomuxc 0 32 32>; };
@@ -306,6 +309,9 @@ gpio3: gpio@43820080 { interrupts = <GIC_SPI 59 IRQ_TYPE_LEVEL_HIGH>; interrupt-controller; #interrupt-cells = <2>; + clocks = <&clk IMX93_CLK_GPIO3_GATE>, + <&clk IMX93_CLK_GPIO3_GATE>; + clock-names = "gpio", "port"; gpio-ranges = <&iomuxc 0 64 32>; };
@@ -317,6 +323,9 @@ gpio4: gpio@43830080 { interrupts = <GIC_SPI 189 IRQ_TYPE_LEVEL_HIGH>; interrupt-controller; #interrupt-cells = <2>; + clocks = <&clk IMX93_CLK_GPIO4_GATE>, + <&clk IMX93_CLK_GPIO4_GATE>; + clock-names = "gpio", "port"; gpio-ranges = <&iomuxc 0 96 32>; };
@@ -328,6 +337,9 @@ gpio1: gpio@47400080 { interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>; interrupt-controller; #interrupt-cells = <2>; + clocks = <&clk IMX93_CLK_GPIO1_GATE>, + <&clk IMX93_CLK_GPIO1_GATE>; + clock-names = "gpio", "port"; gpio-ranges = <&iomuxc 0 0 32>; }; };
From: Peng Fan peng.fan@nxp.com
[ Upstream commit d92a110130d492bd5eab81827ce3730581dc933a ]
Per imx93-pinfunc.h and pinctrl-imx93.c, correct gpio-ranges.
Fixes: ec8b5b5058ea ("arm64: dts: freescale: Add i.MX93 dtsi support") Reported-by: David Wolfe david.wolfe@nxp.com Reviewed-by: Haibo Chen haibo.chen@nxp.com Reviewed-by: Jacky Bai ping.bai@nxp.com Signed-off-by: Peng Fan peng.fan@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/imx93.dtsi | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/boot/dts/freescale/imx93.dtsi b/arch/arm64/boot/dts/freescale/imx93.dtsi index b04735004fdf..6981d3b0e274 100644 --- a/arch/arm64/boot/dts/freescale/imx93.dtsi +++ b/arch/arm64/boot/dts/freescale/imx93.dtsi @@ -298,7 +298,7 @@ gpio2: gpio@43810080 { clocks = <&clk IMX93_CLK_GPIO2_GATE>, <&clk IMX93_CLK_GPIO2_GATE>; clock-names = "gpio", "port"; - gpio-ranges = <&iomuxc 0 32 32>; + gpio-ranges = <&iomuxc 0 4 30>; };
gpio3: gpio@43820080 { @@ -312,7 +312,8 @@ gpio3: gpio@43820080 { clocks = <&clk IMX93_CLK_GPIO3_GATE>, <&clk IMX93_CLK_GPIO3_GATE>; clock-names = "gpio", "port"; - gpio-ranges = <&iomuxc 0 64 32>; + gpio-ranges = <&iomuxc 0 84 8>, <&iomuxc 8 66 18>, + <&iomuxc 26 34 2>, <&iomuxc 28 0 4>; };
gpio4: gpio@43830080 { @@ -326,7 +327,7 @@ gpio4: gpio@43830080 { clocks = <&clk IMX93_CLK_GPIO4_GATE>, <&clk IMX93_CLK_GPIO4_GATE>; clock-names = "gpio", "port"; - gpio-ranges = <&iomuxc 0 96 32>; + gpio-ranges = <&iomuxc 0 38 28>, <&iomuxc 28 36 2>; };
gpio1: gpio@47400080 { @@ -340,7 +341,7 @@ gpio1: gpio@47400080 { clocks = <&clk IMX93_CLK_GPIO1_GATE>, <&clk IMX93_CLK_GPIO1_GATE>; clock-names = "gpio", "port"; - gpio-ranges = <&iomuxc 0 0 32>; + gpio-ranges = <&iomuxc 0 92 16>; }; }; };
From: Ioana Ciornei ioana.ciornei@nxp.com
[ Upstream commit c126a0abc5dadd7df236f20aae6d8c3d103f095c ]
Up until now, the external MDIO controller frequency values relied either on the default ones out of reset or on those setup by u-boot. Let's just properly specify the MDC frequency in the DTS so that even without u-boot's intervention Linux can drive the MDIO bus.
Fixes: 6e1b8fae892d ("arm64: dts: lx2160a: add emdio1 node") Fixes: 5705b9dcda57 ("arm64: dts: lx2160a: add emdio2 node") Signed-off-by: Ioana Ciornei ioana.ciornei@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi index 6680fb2a6dc9..8c76d86cb756 100644 --- a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi @@ -1385,6 +1385,9 @@ emdio1: mdio@8b96000 { #address-cells = <1>; #size-cells = <0>; little-endian; + clock-frequency = <2500000>; + clocks = <&clockgen QORIQ_CLK_PLATFORM_PLL + QORIQ_CLK_PLL_DIV(2)>; status = "disabled"; };
@@ -1395,6 +1398,9 @@ emdio2: mdio@8b97000 { little-endian; #address-cells = <1>; #size-cells = <0>; + clock-frequency = <2500000>; + clocks = <&clockgen QORIQ_CLK_PLATFORM_PLL + QORIQ_CLK_PLL_DIV(2)>; status = "disabled"; };
From: Ioana Ciornei ioana.ciornei@nxp.com
[ Upstream commit d78a57426e64fc4c61e6189e450a0432d24536ca ]
Up until now, the external MDIO controller frequency values relied either on the default ones out of reset or on those setup by u-boot. Let's just properly specify the MDC frequency in the DTS so that even without u-boot's intervention Linux can drive the MDIO bus.
Fixes: bbe75af7b092 ("arm64: dts: ls1088a: add external MDIO device nodes") Signed-off-by: Ioana Ciornei ioana.ciornei@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi index 421d879013d7..260d045dbd9a 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi @@ -779,6 +779,9 @@ emdio1: mdio@8b96000 { little-endian; #address-cells = <1>; #size-cells = <0>; + clock-frequency = <2500000>; + clocks = <&clockgen QORIQ_CLK_PLATFORM_PLL + QORIQ_CLK_PLL_DIV(1)>; status = "disabled"; };
@@ -788,6 +791,9 @@ emdio2: mdio@8b97000 { little-endian; #address-cells = <1>; #size-cells = <0>; + clock-frequency = <2500000>; + clocks = <&clockgen QORIQ_CLK_PLATFORM_PLL + QORIQ_CLK_PLL_DIV(1)>; status = "disabled"; };
From: Ioana Ciornei ioana.ciornei@nxp.com
[ Upstream commit d5c921a53c80dfa942f6dff36253db5a50775a5f ]
Up until now, the external MDIO controller frequency values relied either on the default ones out of reset or on those setup by u-boot. Let's just properly specify the MDC frequency in the DTS so that even without u-boot's intervention Linux can drive the MDIO bus.
Fixes: 0420dde30a90 ("arm64: dts: ls208xa: add the external MDIO nodes") Signed-off-by: Ioana Ciornei ioana.ciornei@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi index d76f1c42f3fa..7bb33933c2cb 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi @@ -533,6 +533,9 @@ emdio1: mdio@8b96000 { little-endian; #address-cells = <1>; #size-cells = <0>; + clock-frequency = <2500000>; + clocks = <&clockgen QORIQ_CLK_PLATFORM_PLL + QORIQ_CLK_PLL_DIV(2)>; status = "disabled"; };
@@ -542,6 +545,9 @@ emdio2: mdio@8b97000 { little-endian; #address-cells = <1>; #size-cells = <0>; + clock-frequency = <2500000>; + clocks = <&clockgen QORIQ_CLK_PLATFORM_PLL + QORIQ_CLK_PLL_DIV(2)>; status = "disabled"; };
From: Aurelien Jarno aurelien@aurel32.net
[ Upstream commit bfab00b94bd8569cdb84a6511d6615e6a8104e9c ]
When the avdd-0v9 or avdd-1v8 supply are not yet available, EPROBE_DEFER is returned by rockchip_hdmi_parse_dt(). This causes the following error message to be printed multiple times:
dwhdmi-rockchip fe0a0000.hdmi: [drm:dw_hdmi_rockchip_bind [rockchipdrm]] *ERROR* Unable to parse OF data
Fix that by not printing the message when rockchip_hdmi_parse_dt() returns -EPROBE_DEFER.
Fixes: ca80c4eb4b01 ("drm/rockchip: dw_hdmi: add regulator support") Signed-off-by: Aurelien Jarno aurelien@aurel32.net Reviewed-by: Dmitry Osipenko dmitry.osipenko@collabora.com Signed-off-by: Heiko Stuebner heiko@sntech.de Link: https://patchwork.freedesktop.org/patch/msgid/20220926203752.5430-1-aurelien... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c index c14f88893868..2f4b8f64cbad 100644 --- a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c +++ b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c @@ -565,7 +565,8 @@ static int dw_hdmi_rockchip_bind(struct device *dev, struct device *master,
ret = rockchip_hdmi_parse_dt(hdmi); if (ret) { - DRM_DEV_ERROR(hdmi->dev, "Unable to parse OF data\n"); + if (ret != -EPROBE_DEFER) + DRM_DEV_ERROR(hdmi->dev, "Unable to parse OF data\n"); return ret; }
From: John Keeping john@metanate.com
[ Upstream commit ab78c74cfc5a3caa2bbb7627cb8f3bca40bb5fb0 ]
When switching to the generic fbdev infrastructure, it was missed that framebuffers were created with the alloc_kmap parameter to rockchip_gem_create_object() set to true. The generic infrastructure calls this via the .dumb_create() driver operation and thus creates a buffer without an associated kmap.
alloc_kmap only makes a difference on devices without an IOMMU, but when it is missing rockchip_gem_prime_vmap() fails and the framebuffer cannot be used.
Detect the case where a buffer is being allocated for the framebuffer and ensure a kernel mapping is created in this case.
Fixes: 24af7c34b290 ("drm/rockchip: use generic fbdev setup") Reported-by: Johan Jonker jbx6244@gmail.com Cc: Thomas Zimmermann tzimmermann@suse.de Signed-off-by: John Keeping john@metanate.com Signed-off-by: Heiko Stuebner heiko@sntech.de Link: https://patchwork.freedesktop.org/patch/msgid/20221020181248.2497065-1-john@... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c index 985584147da1..cf8322c300bd 100644 --- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c +++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c @@ -364,9 +364,12 @@ rockchip_gem_create_with_handle(struct drm_file *file_priv, { struct rockchip_gem_object *rk_obj; struct drm_gem_object *obj; + bool is_framebuffer; int ret;
- rk_obj = rockchip_gem_create_object(drm, size, false); + is_framebuffer = drm->fb_helper && file_priv == drm->fb_helper->client.file; + + rk_obj = rockchip_gem_create_object(drm, size, is_framebuffer); if (IS_ERR(rk_obj)) return ERR_CAST(rk_obj);
From: Robert Beckett bob.beckett@collabora.com
[ Upstream commit d3f6bacfca86f6cf6bf85be1e8b54083d68d8195 ]
swiotlb_max_segment used to return either the maximum size that swiotlb could bounce, or for Xen PV PAGE_SIZE even if swiotlb could bounce buffer larger mappings. This made i915 on Xen PV work as it bypasses the coherency aspect of the DMA API and can't cope with bounce buffering and this avoided bounce buffering for the Xen/PV case.
So instead of adding this hack back, check for Xen/PV directly in i915 for the Xen case and otherwise use the proper DMA API helper to query the maximum mapping size.
Replace swiotlb_max_segment() calls with dma_max_mapping_size(). In i915_gem_object_get_pages_internal() no longer consider max_segment only if CONFIG_SWIOTLB is enabled. There can be other (iommu related) causes of specific max segment sizes.
Fixes: a2daa27c0c61 ("swiotlb: simplify swiotlb_max_segment") Reported-by: Marek Marczykowski-Górecki marmarek@invisiblethingslab.com Signed-off-by: Robert Beckett bob.beckett@collabora.com Signed-off-by: Christoph Hellwig hch@lst.de [hch: added the Xen hack, rewrote the changelog] Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20221020110308.1582518-1-hch@l... (cherry picked from commit 78a07fe777c42800bd1adaec12abe5dcee43919e) Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gem/i915_gem_internal.c | 19 +++-------- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 4 +-- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 2 +- drivers/gpu/drm/i915/i915_scatterlist.h | 34 ++++++++++++-------- 5 files changed, 29 insertions(+), 32 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c index c698f95af15f..629acb403a2c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c @@ -6,7 +6,6 @@
#include <linux/scatterlist.h> #include <linux/slab.h> -#include <linux/swiotlb.h>
#include "i915_drv.h" #include "i915_gem.h" @@ -38,22 +37,12 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj) struct scatterlist *sg; unsigned int sg_page_sizes; unsigned int npages; - int max_order; + int max_order = MAX_ORDER; + unsigned int max_segment; gfp_t gfp;
- max_order = MAX_ORDER; -#ifdef CONFIG_SWIOTLB - if (is_swiotlb_active(obj->base.dev->dev)) { - unsigned int max_segment; - - max_segment = swiotlb_max_segment(); - if (max_segment) { - max_segment = max_t(unsigned int, max_segment, - PAGE_SIZE) >> PAGE_SHIFT; - max_order = min(max_order, ilog2(max_segment)); - } - } -#endif + max_segment = i915_sg_segment_size(i915->drm.dev) >> PAGE_SHIFT; + max_order = min(max_order, get_order(max_segment));
gfp = GFP_KERNEL | __GFP_HIGHMEM | __GFP_RECLAIMABLE; if (IS_I965GM(i915) || IS_I965G(i915)) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 4eed3dd90ba8..34b9c76cd8e6 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -194,7 +194,7 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj) struct intel_memory_region *mem = obj->mm.region; struct address_space *mapping = obj->base.filp->f_mapping; const unsigned long page_count = obj->base.size / PAGE_SIZE; - unsigned int max_segment = i915_sg_segment_size(); + unsigned int max_segment = i915_sg_segment_size(i915->drm.dev); struct sg_table *st; struct sgt_iter sgt_iter; struct page *page; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 6f3ab7ade41a..e85cfc36359a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -189,7 +189,7 @@ static int i915_ttm_tt_shmem_populate(struct ttm_device *bdev, struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev); struct intel_memory_region *mr = i915->mm.regions[INTEL_MEMORY_SYSTEM]; struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm); - const unsigned int max_segment = i915_sg_segment_size(); + const unsigned int max_segment = i915_sg_segment_size(i915->drm.dev); const size_t size = (size_t)ttm->num_pages << PAGE_SHIFT; struct file *filp = i915_tt->filp; struct sgt_iter sgt_iter; @@ -568,7 +568,7 @@ static struct i915_refct_sgt *i915_ttm_tt_get_st(struct ttm_tt *ttm) ret = sg_alloc_table_from_pages_segment(st, ttm->pages, ttm->num_pages, 0, (unsigned long)ttm->num_pages << PAGE_SHIFT, - i915_sg_segment_size(), GFP_KERNEL); + i915_sg_segment_size(i915_tt->dev), GFP_KERNEL); if (ret) { st->sgl = NULL; return ERR_PTR(ret); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 8423df021b71..e4515d6acd43 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -129,7 +129,7 @@ static void i915_gem_object_userptr_drop_ref(struct drm_i915_gem_object *obj) static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj) { const unsigned long num_pages = obj->base.size >> PAGE_SHIFT; - unsigned int max_segment = i915_sg_segment_size(); + unsigned int max_segment = i915_sg_segment_size(obj->base.dev->dev); struct sg_table *st; unsigned int sg_page_sizes; struct page **pvec; diff --git a/drivers/gpu/drm/i915/i915_scatterlist.h b/drivers/gpu/drm/i915/i915_scatterlist.h index 9ddb3e743a3e..b0a1db44f895 100644 --- a/drivers/gpu/drm/i915/i915_scatterlist.h +++ b/drivers/gpu/drm/i915/i915_scatterlist.h @@ -9,7 +9,8 @@
#include <linux/pfn.h> #include <linux/scatterlist.h> -#include <linux/swiotlb.h> +#include <linux/dma-mapping.h> +#include <xen/xen.h>
#include "i915_gem.h"
@@ -127,19 +128,26 @@ static inline unsigned int i915_sg_dma_sizes(struct scatterlist *sg) return page_sizes; }
-static inline unsigned int i915_sg_segment_size(void) +static inline unsigned int i915_sg_segment_size(struct device *dev) { - unsigned int size = swiotlb_max_segment(); - - if (size == 0) - size = UINT_MAX; - - size = rounddown(size, PAGE_SIZE); - /* swiotlb_max_segment_size can return 1 byte when it means one page. */ - if (size < PAGE_SIZE) - size = PAGE_SIZE; - - return size; + size_t max = min_t(size_t, UINT_MAX, dma_max_mapping_size(dev)); + + /* + * For Xen PV guests pages aren't contiguous in DMA (machine) address + * space. The DMA API takes care of that both in dma_alloc_* (by + * calling into the hypervisor to make the pages contiguous) and in + * dma_map_* (by bounce buffering). But i915 abuses ignores the + * coherency aspects of the DMA API and thus can't cope with bounce + * buffering actually happening, so add a hack here to force small + * allocations and mappings when running in PV mode on Xen. + * + * Note this will still break if bounce buffering is required for other + * reasons, like confidential computing hypervisors or PCIe root ports + * with addressing limitations. + */ + if (xen_pv_domain()) + max = PAGE_SIZE; + return round_down(max, PAGE_SIZE); }
bool i915_sg_trim(struct sg_table *orig_st);
From: Ming Lei ming.lei@redhat.com
[ Upstream commit 224e858f215a3d6304f95a92357a1753475ca9cf ]
UBLK_F_URING_CMD_COMP_IN_TASK needs to be set and returned to userspace if ublk driver is built as module, otherwise userspace may get wrong flags shown.
Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Signed-off-by: Ming Lei ming.lei@redhat.com Reviewed-by: ZiyangZhang ZiyangZhang@linux.alibaba.com Link: https://lore.kernel.org/r/20221029010432.598367-2-ming.lei@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/block/ublk_drv.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 6a4a94b4cdf4..31a8715d3a4d 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -1507,6 +1507,9 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd) */ ub->dev_info.flags &= UBLK_F_ALL;
+ if (!IS_BUILTIN(CONFIG_BLK_DEV_UBLK)) + ub->dev_info.flags |= UBLK_F_URING_CMD_COMP_IN_TASK; + /* We are not ready to support zero copy */ ub->dev_info.flags &= ~UBLK_F_SUPPORT_ZERO_COPY;
From: Chen Zhongjin chenzhongjin@huawei.com
[ Upstream commit fa81cbafbf5764ad5053512152345fab37a1fe18 ]
kmemleak reported memory leaks in device_add_disk():
kmemleak: 3 new suspected memory leaks
unreferenced object 0xffff88800f420800 (size 512): comm "modprobe", pid 4275, jiffies 4295639067 (age 223.512s) hex dump (first 32 bytes): 04 00 00 00 08 00 00 00 01 00 00 00 00 00 00 00 ................ 00 e1 f5 05 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000d3662699>] kmalloc_trace+0x26/0x60 [<00000000edc7aadc>] wbt_init+0x50/0x6f0 [<0000000069601d16>] wbt_enable_default+0x157/0x1c0 [<0000000028fc393f>] blk_register_queue+0x2a4/0x420 [<000000007345a042>] device_add_disk+0x6fd/0xe40 [<0000000060e6aab0>] nbd_dev_add+0x828/0xbf0 [nbd] ...
It is because the memory allocated in wbt_enable_default() is not released in device_add_disk() error path. Normally, these memory are freed in:
del_gendisk() rq_qos_exit() rqos->ops->exit(rqos); wbt_exit()
So rq_qos_exit() is called to free the rq_wb memory for wbt_init(). However in the error path of device_add_disk(), only blk_unregister_queue() is called and make rq_wb memory leaked.
Add rq_qos_exit() to the error path to fix it.
Fixes: 83cbce957446 ("block: add error handling for device_add_disk / add_disk") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20221029071355.35462-1-chenzhongjin@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/genhd.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/block/genhd.c b/block/genhd.c index 988ba52fd331..044ff97381e3 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -519,6 +519,7 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk, bdi_unregister(disk->bdi); out_unregister_queue: blk_unregister_queue(disk); + rq_qos_exit(disk->queue); out_put_slave_dir: kobject_put(disk->slave_dir); out_put_holder_dir:
From: Chen Jun chenjun102@huawei.com
[ Upstream commit 943f45b9399ed8b2b5190cbc797995edaa97f58f ]
There is a kmemleak caused by modprobe null_blk.ko
unreferenced object 0xffff8881acb1f000 (size 1024): comm "modprobe", pid 836, jiffies 4294971190 (age 27.068s) hex dump (first 32 bytes): 00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N.......... ff ff ff ff ff ff ff ff 00 53 99 9e ff ff ff ff .........S...... backtrace: [<000000004a10c249>] kmalloc_node_trace+0x22/0x60 [<00000000648f7950>] blk_mq_alloc_and_init_hctx+0x289/0x350 [<00000000af06de0e>] blk_mq_realloc_hw_ctxs+0x2fe/0x3d0 [<00000000e00c1872>] blk_mq_init_allocated_queue+0x48c/0x1440 [<00000000d16b4e68>] __blk_mq_alloc_disk+0xc8/0x1c0 [<00000000d10c98c3>] 0xffffffffc450d69d [<00000000b9299f48>] 0xffffffffc4538392 [<0000000061c39ed6>] do_one_initcall+0xd0/0x4f0 [<00000000b389383b>] do_init_module+0x1a4/0x680 [<0000000087cf3542>] load_module+0x6249/0x7110 [<00000000beba61b8>] __do_sys_finit_module+0x140/0x200 [<00000000fdcfff51>] do_syscall_64+0x35/0x80 [<000000003c0f1f71>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
That is because q->ma_ops is set to NULL before blk_release_queue is called.
blk_mq_init_queue_data blk_mq_init_allocated_queue blk_mq_realloc_hw_ctxs for (i = 0; i < set->nr_hw_queues; i++) { old_hctx = xa_load(&q->hctx_table, i); if (!blk_mq_alloc_and_init_hctx(.., i, ..)) [1] if (!old_hctx) break;
xa_for_each_start(&q->hctx_table, j, hctx, j) blk_mq_exit_hctx(q, set, hctx, j); [2]
if (!q->nr_hw_queues) [3] goto err_hctxs;
err_exit: q->mq_ops = NULL; [4]
blk_put_queue blk_release_queue if (queue_is_mq(q)) [5] blk_mq_release(q);
[1]: blk_mq_alloc_and_init_hctx failed at i != 0. [2]: The hctxs allocated by [1] are moved to q->unused_hctx_list and will be cleaned up in blk_mq_release. [3]: q->nr_hw_queues is 0. [4]: Set q->mq_ops to NULL. [5]: queue_is_mq returns false due to [4]. And blk_mq_release will not be called. The hctxs in q->unused_hctx_list are leaked.
To fix it, call blk_release_queue in exception path.
Fixes: 2f8f1336a48b ("blk-mq: always free hctx after request queue is freed") Signed-off-by: Yuan Can yuancan@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com Reviewed-by: Ming Lei ming.lei@redhat.com Link: https://lore.kernel.org/r/20221031031242.94107-1-chenjun102@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-mq.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c index fe840536e6ac..edf41959a705 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -4104,9 +4104,7 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set, return 0;
err_hctxs: - xa_destroy(&q->hctx_table); - q->nr_hw_queues = 0; - blk_mq_sysfs_deinit(q); + blk_mq_release(q); err_poll: blk_stat_free_callback(q->poll_cb); q->poll_cb = NULL;
From: Linus Walleij linus.walleij@linaro.org
[ Upstream commit cd73adcdbad3d9e9923b045d3643409e9c148d17 ]
Recent changes to the thermal framework has made the trip points (trips) for thermal zones compulsory, which made the Ux500 DTS files break validation and also stopped probing because of similar changes to the code.
Fix this by adding an "outer bounding box": battery thermal zones should not get warmer than 70 degress, then we will shut down.
Fixes: 8c596324232d ("dt-bindings: thermal: Fix missing required property") Fixes: 3fd6d6e2b4e8 ("thermal/of: Rework the thermal device tree initialization") Signed-off-by: Linus Walleij linus.walleij@linaro.org Cc: Daniel Lezcano daniel.lezcano@linaro.org Cc: linux-pm@vger.kernel.org Link: https://lore.kernel.org/r/20221030210854.346662-1-linus.walleij@linaro.org' Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/ste-href.dtsi | 8 ++++++++ arch/arm/boot/dts/ste-snowball.dts | 8 ++++++++ arch/arm/boot/dts/ste-ux500-samsung-codina-tmo.dts | 8 ++++++++ arch/arm/boot/dts/ste-ux500-samsung-codina.dts | 8 ++++++++ arch/arm/boot/dts/ste-ux500-samsung-gavini.dts | 8 ++++++++ arch/arm/boot/dts/ste-ux500-samsung-golden.dts | 8 ++++++++ arch/arm/boot/dts/ste-ux500-samsung-janice.dts | 8 ++++++++ arch/arm/boot/dts/ste-ux500-samsung-kyle.dts | 8 ++++++++ arch/arm/boot/dts/ste-ux500-samsung-skomer.dts | 8 ++++++++ 9 files changed, 72 insertions(+)
diff --git a/arch/arm/boot/dts/ste-href.dtsi b/arch/arm/boot/dts/ste-href.dtsi index fbaa0ce46427..8f1bb78fc1e4 100644 --- a/arch/arm/boot/dts/ste-href.dtsi +++ b/arch/arm/boot/dts/ste-href.dtsi @@ -24,6 +24,14 @@ battery-thermal { polling-delay = <0>; polling-delay-passive = <0>; thermal-sensors = <&bat_therm>; + + trips { + battery-crit-hi { + temperature = <70000>; + hysteresis = <2000>; + type = "critical"; + }; + }; }; };
diff --git a/arch/arm/boot/dts/ste-snowball.dts b/arch/arm/boot/dts/ste-snowball.dts index 1c9094f24893..e2f0cdacba7d 100644 --- a/arch/arm/boot/dts/ste-snowball.dts +++ b/arch/arm/boot/dts/ste-snowball.dts @@ -28,6 +28,14 @@ battery-thermal { polling-delay = <0>; polling-delay-passive = <0>; thermal-sensors = <&bat_therm>; + + trips { + battery-crit-hi { + temperature = <70000>; + hysteresis = <2000>; + type = "critical"; + }; + }; }; };
diff --git a/arch/arm/boot/dts/ste-ux500-samsung-codina-tmo.dts b/arch/arm/boot/dts/ste-ux500-samsung-codina-tmo.dts index d6940e0afa86..27a3ab7e25e1 100644 --- a/arch/arm/boot/dts/ste-ux500-samsung-codina-tmo.dts +++ b/arch/arm/boot/dts/ste-ux500-samsung-codina-tmo.dts @@ -44,6 +44,14 @@ battery-thermal { polling-delay = <0>; polling-delay-passive = <0>; thermal-sensors = <&bat_therm>; + + trips { + battery-crit-hi { + temperature = <70000>; + hysteresis = <2000>; + type = "critical"; + }; + }; }; };
diff --git a/arch/arm/boot/dts/ste-ux500-samsung-codina.dts b/arch/arm/boot/dts/ste-ux500-samsung-codina.dts index 5f41256d7f4b..b88f0c07873d 100644 --- a/arch/arm/boot/dts/ste-ux500-samsung-codina.dts +++ b/arch/arm/boot/dts/ste-ux500-samsung-codina.dts @@ -57,6 +57,14 @@ battery-thermal { polling-delay = <0>; polling-delay-passive = <0>; thermal-sensors = <&bat_therm>; + + trips { + battery-crit-hi { + temperature = <70000>; + hysteresis = <2000>; + type = "critical"; + }; + }; }; };
diff --git a/arch/arm/boot/dts/ste-ux500-samsung-gavini.dts b/arch/arm/boot/dts/ste-ux500-samsung-gavini.dts index 806da3fc33cd..7231bc745200 100644 --- a/arch/arm/boot/dts/ste-ux500-samsung-gavini.dts +++ b/arch/arm/boot/dts/ste-ux500-samsung-gavini.dts @@ -30,6 +30,14 @@ battery-thermal { polling-delay = <0>; polling-delay-passive = <0>; thermal-sensors = <&bat_therm>; + + trips { + battery-crit-hi { + temperature = <70000>; + hysteresis = <2000>; + type = "critical"; + }; + }; }; };
diff --git a/arch/arm/boot/dts/ste-ux500-samsung-golden.dts b/arch/arm/boot/dts/ste-ux500-samsung-golden.dts index b0dce91aff4b..9604695edf53 100644 --- a/arch/arm/boot/dts/ste-ux500-samsung-golden.dts +++ b/arch/arm/boot/dts/ste-ux500-samsung-golden.dts @@ -35,6 +35,14 @@ battery-thermal { polling-delay = <0>; polling-delay-passive = <0>; thermal-sensors = <&bat_therm>; + + trips { + battery-crit-hi { + temperature = <70000>; + hysteresis = <2000>; + type = "critical"; + }; + }; }; };
diff --git a/arch/arm/boot/dts/ste-ux500-samsung-janice.dts b/arch/arm/boot/dts/ste-ux500-samsung-janice.dts index ed5c79c3d04b..69387e8754a9 100644 --- a/arch/arm/boot/dts/ste-ux500-samsung-janice.dts +++ b/arch/arm/boot/dts/ste-ux500-samsung-janice.dts @@ -30,6 +30,14 @@ battery-thermal { polling-delay = <0>; polling-delay-passive = <0>; thermal-sensors = <&bat_therm>; + + trips { + battery-crit-hi { + temperature = <70000>; + hysteresis = <2000>; + type = "critical"; + }; + }; }; };
diff --git a/arch/arm/boot/dts/ste-ux500-samsung-kyle.dts b/arch/arm/boot/dts/ste-ux500-samsung-kyle.dts index c57676faf181..167846df3104 100644 --- a/arch/arm/boot/dts/ste-ux500-samsung-kyle.dts +++ b/arch/arm/boot/dts/ste-ux500-samsung-kyle.dts @@ -34,6 +34,14 @@ battery-thermal { polling-delay = <0>; polling-delay-passive = <0>; thermal-sensors = <&bat_therm>; + + trips { + battery-crit-hi { + temperature = <70000>; + hysteresis = <2000>; + type = "critical"; + }; + }; }; };
diff --git a/arch/arm/boot/dts/ste-ux500-samsung-skomer.dts b/arch/arm/boot/dts/ste-ux500-samsung-skomer.dts index 81b341a5ae45..93e5f5ed888d 100644 --- a/arch/arm/boot/dts/ste-ux500-samsung-skomer.dts +++ b/arch/arm/boot/dts/ste-ux500-samsung-skomer.dts @@ -30,6 +30,14 @@ battery-thermal { polling-delay = <0>; polling-delay-passive = <0>; thermal-sensors = <&bat_therm>; + + trips { + battery-crit-hi { + temperature = <70000>; + hysteresis = <2000>; + type = "critical"; + }; + }; }; };
From: Cristian Marussi cristian.marussi@arm.com
[ Upstream commit fd96fbc8fad35d6b1872c90df8a2f5d721f14d91 ]
Suppress the capability to unbind the core SCMI driver since all the SCMI stack protocol drivers depend on it.
Fixes: aa4f886f3893 ("firmware: arm_scmi: add basic driver infrastructure for SCMI") Signed-off-by: Cristian Marussi cristian.marussi@arm.com Link: https://lore.kernel.org/r/20221028140833.280091-2-cristian.marussi@arm.com Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/arm_scmi/driver.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c index 609ebedee9cb..1b9aa34b8e47 100644 --- a/drivers/firmware/arm_scmi/driver.c +++ b/drivers/firmware/arm_scmi/driver.c @@ -2571,6 +2571,7 @@ MODULE_DEVICE_TABLE(of, scmi_of_match); static struct platform_driver scmi_driver = { .driver = { .name = "arm-scmi", + .suppress_bind_attrs = true, .of_match_table = scmi_of_match, .dev_groups = versions_groups, },
From: Cristian Marussi cristian.marussi@arm.com
[ Upstream commit be9ba1f7f9e0b565b19f4294f5871da9d654bc6d ]
SCMI Rx channels are optional and they can fail to be setup when not present but anyway channels setup routines must bail-out on memory errors.
Make channels setup, and related probing, fail when memory errors are reported on Rx channels.
Fixes: 5c8a47a5a91d ("firmware: arm_scmi: Make scmi core independent of the transport type") Signed-off-by: Cristian Marussi cristian.marussi@arm.com Link: https://lore.kernel.org/r/20221028140833.280091-4-cristian.marussi@arm.com Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/arm_scmi/driver.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c index 1b9aa34b8e47..9022f5ee29aa 100644 --- a/drivers/firmware/arm_scmi/driver.c +++ b/drivers/firmware/arm_scmi/driver.c @@ -2044,8 +2044,12 @@ scmi_txrx_setup(struct scmi_info *info, struct device *dev, int prot_id) { int ret = scmi_chan_setup(info, dev, prot_id, true);
- if (!ret) /* Rx is optional, hence no error check */ - scmi_chan_setup(info, dev, prot_id, false); + if (!ret) { + /* Rx is optional, report only memory errors */ + ret = scmi_chan_setup(info, dev, prot_id, false); + if (ret && ret != -ENOMEM) + ret = 0; + }
return ret; }
From: Cristian Marussi cristian.marussi@arm.com
[ Upstream commit 5ffc1c4cb896f8d2cf10309422da3633a616d60f ]
SCMI virtio transport device managed allocations must use the main platform device in devres operations instead of the channel devices.
Cc: Peter Hilber peter.hilber@opensynergy.com Fixes: 46abe13b5e3d ("firmware: arm_scmi: Add virtio transport") Signed-off-by: Cristian Marussi cristian.marussi@arm.com Link: https://lore.kernel.org/r/20221028140833.280091-5-cristian.marussi@arm.com Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/arm_scmi/virtio.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/firmware/arm_scmi/virtio.c b/drivers/firmware/arm_scmi/virtio.c index 14709dbc96a1..36b7686843a4 100644 --- a/drivers/firmware/arm_scmi/virtio.c +++ b/drivers/firmware/arm_scmi/virtio.c @@ -444,12 +444,12 @@ static int virtio_chan_setup(struct scmi_chan_info *cinfo, struct device *dev, for (i = 0; i < vioch->max_msg; i++) { struct scmi_vio_msg *msg;
- msg = devm_kzalloc(cinfo->dev, sizeof(*msg), GFP_KERNEL); + msg = devm_kzalloc(dev, sizeof(*msg), GFP_KERNEL); if (!msg) return -ENOMEM;
if (tx) { - msg->request = devm_kzalloc(cinfo->dev, + msg->request = devm_kzalloc(dev, VIRTIO_SCMI_MAX_PDU_SIZE, GFP_KERNEL); if (!msg->request) @@ -458,7 +458,7 @@ static int virtio_chan_setup(struct scmi_chan_info *cinfo, struct device *dev, refcount_set(&msg->users, 1); }
- msg->input = devm_kzalloc(cinfo->dev, VIRTIO_SCMI_MAX_PDU_SIZE, + msg->input = devm_kzalloc(dev, VIRTIO_SCMI_MAX_PDU_SIZE, GFP_KERNEL); if (!msg->input) return -ENOMEM;
From: Cristian Marussi cristian.marussi@arm.com
[ Upstream commit 1eff6929aff594fba3182660f7b6213ec0ceda0c ]
Use devres to allocate the dedicated deferred_tx_wq polling workqueue so as to automatically trigger the proper resource release on error path.
Reported-by: Dan Carpenter dan.carpenter@oracle.com Fixes: 5a3b7185c47c ("firmware: arm_scmi: Add atomic mode support to virtio transport") Signed-off-by: Cristian Marussi cristian.marussi@arm.com Link: https://lore.kernel.org/r/20221028140833.280091-6-cristian.marussi@arm.com Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/arm_scmi/virtio.c | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/drivers/firmware/arm_scmi/virtio.c b/drivers/firmware/arm_scmi/virtio.c index 36b7686843a4..33c9b81a55cd 100644 --- a/drivers/firmware/arm_scmi/virtio.c +++ b/drivers/firmware/arm_scmi/virtio.c @@ -148,7 +148,6 @@ static void scmi_vio_channel_cleanup_sync(struct scmi_vio_channel *vioch) { unsigned long flags; DECLARE_COMPLETION_ONSTACK(vioch_shutdown_done); - void *deferred_wq = NULL;
/* * Prepare to wait for the last release if not already released @@ -162,16 +161,11 @@ static void scmi_vio_channel_cleanup_sync(struct scmi_vio_channel *vioch)
vioch->shutdown_done = &vioch_shutdown_done; virtio_break_device(vioch->vqueue->vdev); - if (!vioch->is_rx && vioch->deferred_tx_wq) { - deferred_wq = vioch->deferred_tx_wq; + if (!vioch->is_rx && vioch->deferred_tx_wq) /* Cannot be kicked anymore after this...*/ vioch->deferred_tx_wq = NULL; - } spin_unlock_irqrestore(&vioch->lock, flags);
- if (deferred_wq) - destroy_workqueue(deferred_wq); - scmi_vio_channel_release(vioch);
/* Let any possibly concurrent RX path release the channel */ @@ -416,6 +410,11 @@ static bool virtio_chan_available(struct device *dev, int idx) return vioch && !vioch->cinfo; }
+static void scmi_destroy_tx_workqueue(void *deferred_tx_wq) +{ + destroy_workqueue(deferred_tx_wq); +} + static int virtio_chan_setup(struct scmi_chan_info *cinfo, struct device *dev, bool tx) { @@ -430,6 +429,8 @@ static int virtio_chan_setup(struct scmi_chan_info *cinfo, struct device *dev,
/* Setup a deferred worker for polling. */ if (tx && !vioch->deferred_tx_wq) { + int ret; + vioch->deferred_tx_wq = alloc_workqueue(dev_name(&scmi_vdev->dev), WQ_UNBOUND | WQ_FREEZABLE | WQ_SYSFS, @@ -437,6 +438,11 @@ static int virtio_chan_setup(struct scmi_chan_info *cinfo, struct device *dev, if (!vioch->deferred_tx_wq) return -ENOMEM;
+ ret = devm_add_action_or_reset(dev, scmi_destroy_tx_workqueue, + vioch->deferred_tx_wq); + if (ret) + return ret; + INIT_WORK(&vioch->deferred_tx_work, scmi_vio_deferred_tx_worker); }
From: Cristian Marussi cristian.marussi@arm.com
[ Upstream commit c4a7b9b587ca1bb4678d48d8be7132492b23a81c ]
When thermnal zones are defined, trip points definitions are mandatory. Define a couple of critical trip points for monitoring of existing PMIC and SOC thermal zones.
This was lost between txt to yaml conversion and was re-enforced recently via the commit 8c596324232d ("dt-bindings: thermal: Fix missing required property")
Cc: Rob Herring robh+dt@kernel.org Cc: Krzysztof Kozlowski krzysztof.kozlowski+dt@linaro.org Cc: devicetree@vger.kernel.org Signed-off-by: Cristian Marussi cristian.marussi@arm.com Fixes: f7b636a8d83c ("arm64: dts: juno: add thermal zones for scpi sensors") Link: https://lore.kernel.org/r/20221028140833.280091-8-cristian.marussi@arm.com Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/arm/juno-base.dtsi | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/arch/arm64/boot/dts/arm/juno-base.dtsi b/arch/arm64/boot/dts/arm/juno-base.dtsi index 2f27619d8abd..8b4d280b1e7e 100644 --- a/arch/arm64/boot/dts/arm/juno-base.dtsi +++ b/arch/arm64/boot/dts/arm/juno-base.dtsi @@ -751,12 +751,26 @@ pmic { polling-delay = <1000>; polling-delay-passive = <100>; thermal-sensors = <&scpi_sensors0 0>; + trips { + pmic_crit0: trip0 { + temperature = <90000>; + hysteresis = <2000>; + type = "critical"; + }; + }; };
soc { polling-delay = <1000>; polling-delay-passive = <100>; thermal-sensors = <&scpi_sensors0 3>; + trips { + soc_crit0: trip0 { + temperature = <80000>; + hysteresis = <2000>; + type = "critical"; + }; + }; };
big_cluster_thermal_zone: big-cluster {
From: Chen Zhongjin chenzhongjin@huawei.com
[ Upstream commit 569bea74c94d37785682b11bab76f557520477cd ]
In piix4_probe(), the piix4 adapter will be registered in:
piix4_probe() piix4_add_adapters_sb800() / piix4_add_adapter() i2c_add_adapter()
Based on the probed device type, piix4_add_adapters_sb800() or single piix4_add_adapter() will be called. For the former case, piix4_adapter_count is set as the number of adapters, while for antoher case it is not set and kept default *zero*.
When piix4 is removed, piix4_remove() removes the adapters added in piix4_probe(), basing on the piix4_adapter_count value. Because the count is zero for the single adapter case, the adapter won't be removed and makes the sources allocated for adapter leaked, such as the i2c client and device.
These sources can still be accessed by i2c or bus and cause problems. An easily reproduced case is that if a new adapter is registered, i2c will get the leaked adapter and try to call smbus_algorithm, which was already freed:
Triggered by: rmmod i2c_piix4 && modprobe max31730
BUG: unable to handle page fault for address: ffffffffc053d860 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 3752 Comm: modprobe Tainted: G Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:i2c_default_probe (drivers/i2c/i2c-core-base.c:2259) i2c_core RSP: 0018:ffff888107477710 EFLAGS: 00000246 ... <TASK> i2c_detect (drivers/i2c/i2c-core-base.c:2302) i2c_core __process_new_driver (drivers/i2c/i2c-core-base.c:1336) i2c_core bus_for_each_dev (drivers/base/bus.c:301) i2c_for_each_dev (drivers/i2c/i2c-core-base.c:1823) i2c_core i2c_register_driver (drivers/i2c/i2c-core-base.c:1861) i2c_core do_one_initcall (init/main.c:1296) do_init_module (kernel/module/main.c:2455) ... </TASK> ---[ end trace 0000000000000000 ]---
Fix this problem by correctly set piix4_adapter_count as 1 for the single adapter so it can be normally removed.
Fixes: 528d53a1592b ("i2c: piix4: Fix probing of reserved ports on AMD Family 16h Model 30h") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Reviewed-by: Jean Delvare jdelvare@suse.de Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-piix4.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/i2c/busses/i2c-piix4.c b/drivers/i2c/busses/i2c-piix4.c index 39cb1b7bb865..809fbd014cd6 100644 --- a/drivers/i2c/busses/i2c-piix4.c +++ b/drivers/i2c/busses/i2c-piix4.c @@ -1080,6 +1080,7 @@ static int piix4_probe(struct pci_dev *dev, const struct pci_device_id *id) "", &piix4_main_adapters[0]); if (retval < 0) return retval; + piix4_adapter_count = 1; }
/* Check for auxiliary SMBus on some AMD chipsets */
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
commit 711f8c3fb3db61897080468586b970c87c61d9e4 upstream.
The Bluetooth spec states that the valid range for SPSM is from 0x0001-0x00ff so it is invalid to accept values outside of this range:
BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A page 1059: Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges
CVE: CVE-2022-42896 CC: stable@vger.kernel.org Reported-by: Tamás Koczka poprdi@google.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Reviewed-by: Tedd Ho-Jeong An tedd.an@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/l2cap_core.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
--- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -5813,6 +5813,19 @@ static int l2cap_le_connect_req(struct l BT_DBG("psm 0x%2.2x scid 0x%4.4x mtu %u mps %u", __le16_to_cpu(psm), scid, mtu, mps);
+ /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A + * page 1059: + * + * Valid range: 0x0001-0x00ff + * + * Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges + */ + if (!psm || __le16_to_cpu(psm) > L2CAP_PSM_LE_DYN_END) { + result = L2CAP_CR_LE_BAD_PSM; + chan = NULL; + goto response; + } + /* Check if we have socket listening on psm */ pchan = l2cap_global_chan_by_psm(BT_LISTEN, psm, &conn->hcon->src, &conn->hcon->dst, LE_LINK); @@ -6001,6 +6014,18 @@ static inline int l2cap_ecred_conn_req(s
psm = req->psm;
+ /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A + * page 1059: + * + * Valid range: 0x0001-0x00ff + * + * Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges + */ + if (!psm || __le16_to_cpu(psm) > L2CAP_PSM_LE_DYN_END) { + result = L2CAP_CR_LE_BAD_PSM; + goto response; + } + BT_DBG("psm 0x%2.2x mtu %u mps %u", __le16_to_cpu(psm), mtu, mps);
memset(&pdu, 0, sizeof(pdu));
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
commit b1a2cd50c0357f243b7435a732b4e62ba3157a2e upstream.
On l2cap_parse_conf_req the variable efs is only initialized if remote_efs has been set.
CVE: CVE-2022-42895 CC: stable@vger.kernel.org Reported-by: Tamás Koczka poprdi@google.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Reviewed-by: Tedd Ho-Jeong An tedd.an@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/l2cap_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -3764,7 +3764,8 @@ done: l2cap_add_conf_opt(&ptr, L2CAP_CONF_RFC, sizeof(rfc), (unsigned long) &rfc, endptr - ptr);
- if (test_bit(FLAG_EFS_ENABLE, &chan->flags)) { + if (remote_efs && + test_bit(FLAG_EFS_ENABLE, &chan->flags)) { chan->remote_id = efs.id; chan->remote_stype = efs.stype; chan->remote_msdu = le16_to_cpu(efs.msdu);
From: Eric Biggers ebiggers@google.com
commit d7e7b9af104c7b389a0c21eb26532511bce4b510 upstream.
The approach of fs/crypto/ internally managing the fscrypt_master_key structs as the payloads of "struct key" objects contained in a "struct key" keyring has outlived its usefulness. The original idea was to simplify the code by reusing code from the keyrings subsystem. However, several issues have arisen that can't easily be resolved:
- When a master key struct is destroyed, blk_crypto_evict_key() must be called on any per-mode keys embedded in it. (This started being the case when inline encryption support was added.) Yet, the keyrings subsystem can arbitrarily delay the destruction of keys, even past the time the filesystem was unmounted. Therefore, currently there is no easy way to call blk_crypto_evict_key() when a master key is destroyed. Currently, this is worked around by holding an extra reference to the filesystem's request_queue(s). But it was overlooked that the request_queue reference is *not* guaranteed to pin the corresponding blk_crypto_profile too; for device-mapper devices that support inline crypto, it doesn't. This can cause a use-after-free.
- When the last inode that was using an incompletely-removed master key is evicted, the master key removal is completed by removing the key struct from the keyring. Currently this is done via key_invalidate(). Yet, key_invalidate() takes the key semaphore. This can deadlock when called from the shrinker, since in fscrypt_ioctl_add_key(), memory is allocated with GFP_KERNEL under the same semaphore.
- More generally, the fact that the keyrings subsystem can arbitrarily delay the destruction of keys (via garbage collection delay, or via random processes getting temporary key references) is undesirable, as it means we can't strictly guarantee that all secrets are ever wiped.
- Doing the master key lookups via the keyrings subsystem results in the key_permission LSM hook being called. fscrypt doesn't want this, as all access control for encrypted files is designed to happen via the files themselves, like any other files. The workaround which SELinux users are using is to change their SELinux policy to grant key search access to all domains. This works, but it is an odd extra step that shouldn't really have to be done.
The fix for all these issues is to change the implementation to what I should have done originally: don't use the keyrings subsystem to keep track of the filesystem's fscrypt_master_key structs. Instead, just store them in a regular kernel data structure, and rework the reference counting, locking, and lifetime accordingly. Retain support for RCU-mode key lookups by using a hash table. Replace fscrypt_sb_free() with fscrypt_sb_delete(), which releases the keys synchronously and runs a bit earlier during unmount, so that block devices are still available.
A side effect of this patch is that neither the master keys themselves nor the filesystem keyrings will be listed in /proc/keys anymore. ("Master key users" and the master key users keyrings will still be listed.) However, this was mostly an implementation detail, and it was intended just for debugging purposes. I don't know of anyone using it.
This patch does *not* change how "master key users" (->mk_users) works; that still uses the keyrings subsystem. That is still needed for key quotas, and changing that isn't necessary to solve the issues listed above. If we decide to change that too, it would be a separate patch.
I've marked this as fixing the original commit that added the fscrypt keyring, but as noted above the most important issue that this patch fixes wasn't introduced until the addition of inline encryption support.
Fixes: 22d94f493bfb ("fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl") Signed-off-by: Eric Biggers ebiggers@google.com Link: https://lore.kernel.org/r/20220901193208.138056-2-ebiggers@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/crypto/fscrypt_private.h | 71 ++++-- fs/crypto/hooks.c | 10 fs/crypto/keyring.c | 484 +++++++++++++++++++++++--------------------- fs/crypto/keysetup.c | 81 ++----- fs/crypto/policy.c | 8 fs/super.c | 2 include/linux/fs.h | 2 include/linux/fscrypt.h | 4 8 files changed, 352 insertions(+), 310 deletions(-)
--- a/fs/crypto/fscrypt_private.h +++ b/fs/crypto/fscrypt_private.h @@ -225,7 +225,7 @@ struct fscrypt_info { * will be NULL if the master key was found in a process-subscribed * keyring rather than in the filesystem-level keyring. */ - struct key *ci_master_key; + struct fscrypt_master_key *ci_master_key;
/* * Link in list of inodes that were unlocked with the master key. @@ -437,6 +437,40 @@ struct fscrypt_master_key_secret { struct fscrypt_master_key {
/* + * Back-pointer to the super_block of the filesystem to which this + * master key has been added. Only valid if ->mk_active_refs > 0. + */ + struct super_block *mk_sb; + + /* + * Link in ->mk_sb->s_master_keys->key_hashtable. + * Only valid if ->mk_active_refs > 0. + */ + struct hlist_node mk_node; + + /* Semaphore that protects ->mk_secret and ->mk_users */ + struct rw_semaphore mk_sem; + + /* + * Active and structural reference counts. An active ref guarantees + * that the struct continues to exist, continues to be in the keyring + * ->mk_sb->s_master_keys, and that any embedded subkeys (e.g. + * ->mk_direct_keys) that have been prepared continue to exist. + * A structural ref only guarantees that the struct continues to exist. + * + * There is one active ref associated with ->mk_secret being present, + * and one active ref for each inode in ->mk_decrypted_inodes. + * + * There is one structural ref associated with the active refcount being + * nonzero. Finding a key in the keyring also takes a structural ref, + * which is then held temporarily while the key is operated on. + */ + refcount_t mk_active_refs; + refcount_t mk_struct_refs; + + struct rcu_head mk_rcu_head; + + /* * The secret key material. After FS_IOC_REMOVE_ENCRYPTION_KEY is * executed, this is wiped and no new inodes can be unlocked with this * key; however, there may still be inodes in ->mk_decrypted_inodes @@ -444,7 +478,10 @@ struct fscrypt_master_key { * FS_IOC_REMOVE_ENCRYPTION_KEY can be retried, or * FS_IOC_ADD_ENCRYPTION_KEY can add the secret again. * - * Locking: protected by this master key's key->sem. + * While ->mk_secret is present, one ref in ->mk_active_refs is held. + * + * Locking: protected by ->mk_sem. The manipulation of ->mk_active_refs + * associated with this field is protected by ->mk_sem as well. */ struct fscrypt_master_key_secret mk_secret;
@@ -465,23 +502,13 @@ struct fscrypt_master_key { * * This is NULL for v1 policy keys; those can only be added by root. * - * Locking: in addition to this keyring's own semaphore, this is - * protected by this master key's key->sem, so we can do atomic - * search+insert. It can also be searched without taking any locks, but - * in that case the returned key may have already been removed. + * Locking: protected by ->mk_sem. (We don't just rely on the keyrings + * subsystem semaphore ->mk_users->sem, as we need support for atomic + * search+insert along with proper synchronization with ->mk_secret.) */ struct key *mk_users;
/* - * Length of ->mk_decrypted_inodes, plus one if mk_secret is present. - * Once this goes to 0, the master key is removed from ->s_master_keys. - * The 'struct fscrypt_master_key' will continue to live as long as the - * 'struct key' whose payload it is, but we won't let this reference - * count rise again. - */ - refcount_t mk_refcount; - - /* * List of inodes that were unlocked using this key. This allows the * inodes to be evicted efficiently if the key is removed. */ @@ -506,10 +533,10 @@ static inline bool is_master_key_secret_present(const struct fscrypt_master_key_secret *secret) { /* - * The READ_ONCE() is only necessary for fscrypt_drop_inode() and - * fscrypt_key_describe(). These run in atomic context, so they can't - * take the key semaphore and thus 'secret' can change concurrently - * which would be a data race. But they only need to know whether the + * The READ_ONCE() is only necessary for fscrypt_drop_inode(). + * fscrypt_drop_inode() runs in atomic context, so it can't take the key + * semaphore and thus 'secret' can change concurrently which would be a + * data race. But fscrypt_drop_inode() only need to know whether the * secret *was* present at the time of check, so READ_ONCE() suffices. */ return READ_ONCE(secret->size) != 0; @@ -538,7 +565,11 @@ static inline int master_key_spec_len(co return 0; }
-struct key * +void fscrypt_put_master_key(struct fscrypt_master_key *mk); + +void fscrypt_put_master_key_activeref(struct fscrypt_master_key *mk); + +struct fscrypt_master_key * fscrypt_find_master_key(struct super_block *sb, const struct fscrypt_key_specifier *mk_spec);
--- a/fs/crypto/hooks.c +++ b/fs/crypto/hooks.c @@ -5,8 +5,6 @@ * Encryption hooks for higher-level filesystem operations. */
-#include <linux/key.h> - #include "fscrypt_private.h"
/** @@ -142,7 +140,6 @@ int fscrypt_prepare_setflags(struct inod unsigned int oldflags, unsigned int flags) { struct fscrypt_info *ci; - struct key *key; struct fscrypt_master_key *mk; int err;
@@ -158,14 +155,13 @@ int fscrypt_prepare_setflags(struct inod ci = inode->i_crypt_info; if (ci->ci_policy.version != FSCRYPT_POLICY_V2) return -EINVAL; - key = ci->ci_master_key; - mk = key->payload.data[0]; - down_read(&key->sem); + mk = ci->ci_master_key; + down_read(&mk->mk_sem); if (is_master_key_secret_present(&mk->mk_secret)) err = fscrypt_derive_dirhash_key(ci, mk); else err = -ENOKEY; - up_read(&key->sem); + up_read(&mk->mk_sem); return err; } return 0; --- a/fs/crypto/keyring.c +++ b/fs/crypto/keyring.c @@ -18,6 +18,7 @@ * information about these ioctls. */
+#include <asm/unaligned.h> #include <crypto/skcipher.h> #include <linux/key-type.h> #include <linux/random.h> @@ -25,6 +26,18 @@
#include "fscrypt_private.h"
+/* The master encryption keys for a filesystem (->s_master_keys) */ +struct fscrypt_keyring { + /* + * Lock that protects ->key_hashtable. It does *not* protect the + * fscrypt_master_key structs themselves. + */ + spinlock_t lock; + + /* Hash table that maps fscrypt_key_specifier to fscrypt_master_key */ + struct hlist_head key_hashtable[128]; +}; + static void wipe_master_key_secret(struct fscrypt_master_key_secret *secret) { fscrypt_destroy_hkdf(&secret->hkdf); @@ -38,20 +51,70 @@ static void move_master_key_secret(struc memzero_explicit(src, sizeof(*src)); }
-static void free_master_key(struct fscrypt_master_key *mk) +static void fscrypt_free_master_key(struct rcu_head *head) +{ + struct fscrypt_master_key *mk = + container_of(head, struct fscrypt_master_key, mk_rcu_head); + /* + * The master key secret and any embedded subkeys should have already + * been wiped when the last active reference to the fscrypt_master_key + * struct was dropped; doing it here would be unnecessarily late. + * Nevertheless, use kfree_sensitive() in case anything was missed. + */ + kfree_sensitive(mk); +} + +void fscrypt_put_master_key(struct fscrypt_master_key *mk) +{ + if (!refcount_dec_and_test(&mk->mk_struct_refs)) + return; + /* + * No structural references left, so free ->mk_users, and also free the + * fscrypt_master_key struct itself after an RCU grace period ensures + * that concurrent keyring lookups can no longer find it. + */ + WARN_ON(refcount_read(&mk->mk_active_refs) != 0); + key_put(mk->mk_users); + mk->mk_users = NULL; + call_rcu(&mk->mk_rcu_head, fscrypt_free_master_key); +} + +void fscrypt_put_master_key_activeref(struct fscrypt_master_key *mk) { + struct super_block *sb = mk->mk_sb; + struct fscrypt_keyring *keyring = sb->s_master_keys; size_t i;
- wipe_master_key_secret(&mk->mk_secret); + if (!refcount_dec_and_test(&mk->mk_active_refs)) + return; + /* + * No active references left, so complete the full removal of this + * fscrypt_master_key struct by removing it from the keyring and + * destroying any subkeys embedded in it. + */ + + spin_lock(&keyring->lock); + hlist_del_rcu(&mk->mk_node); + spin_unlock(&keyring->lock); + + /* + * ->mk_active_refs == 0 implies that ->mk_secret is not present and + * that ->mk_decrypted_inodes is empty. + */ + WARN_ON(is_master_key_secret_present(&mk->mk_secret)); + WARN_ON(!list_empty(&mk->mk_decrypted_inodes));
for (i = 0; i <= FSCRYPT_MODE_MAX; i++) { fscrypt_destroy_prepared_key(&mk->mk_direct_keys[i]); fscrypt_destroy_prepared_key(&mk->mk_iv_ino_lblk_64_keys[i]); fscrypt_destroy_prepared_key(&mk->mk_iv_ino_lblk_32_keys[i]); } + memzero_explicit(&mk->mk_ino_hash_key, + sizeof(mk->mk_ino_hash_key)); + mk->mk_ino_hash_key_initialized = false;
- key_put(mk->mk_users); - kfree_sensitive(mk); + /* Drop the structural ref associated with the active refs. */ + fscrypt_put_master_key(mk); }
static inline bool valid_key_spec(const struct fscrypt_key_specifier *spec) @@ -61,44 +124,6 @@ static inline bool valid_key_spec(const return master_key_spec_len(spec) != 0; }
-static int fscrypt_key_instantiate(struct key *key, - struct key_preparsed_payload *prep) -{ - key->payload.data[0] = (struct fscrypt_master_key *)prep->data; - return 0; -} - -static void fscrypt_key_destroy(struct key *key) -{ - free_master_key(key->payload.data[0]); -} - -static void fscrypt_key_describe(const struct key *key, struct seq_file *m) -{ - seq_puts(m, key->description); - - if (key_is_positive(key)) { - const struct fscrypt_master_key *mk = key->payload.data[0]; - - if (!is_master_key_secret_present(&mk->mk_secret)) - seq_puts(m, ": secret removed"); - } -} - -/* - * Type of key in ->s_master_keys. Each key of this type represents a master - * key which has been added to the filesystem. Its payload is a - * 'struct fscrypt_master_key'. The "." prefix in the key type name prevents - * users from adding keys of this type via the keyrings syscalls rather than via - * the intended method of FS_IOC_ADD_ENCRYPTION_KEY. - */ -static struct key_type key_type_fscrypt = { - .name = "._fscrypt", - .instantiate = fscrypt_key_instantiate, - .destroy = fscrypt_key_destroy, - .describe = fscrypt_key_describe, -}; - static int fscrypt_user_key_instantiate(struct key *key, struct key_preparsed_payload *prep) { @@ -131,32 +156,6 @@ static struct key_type key_type_fscrypt_ .describe = fscrypt_user_key_describe, };
-/* Search ->s_master_keys or ->mk_users */ -static struct key *search_fscrypt_keyring(struct key *keyring, - struct key_type *type, - const char *description) -{ - /* - * We need to mark the keyring reference as "possessed" so that we - * acquire permission to search it, via the KEY_POS_SEARCH permission. - */ - key_ref_t keyref = make_key_ref(keyring, true /* possessed */); - - keyref = keyring_search(keyref, type, description, false); - if (IS_ERR(keyref)) { - if (PTR_ERR(keyref) == -EAGAIN || /* not found */ - PTR_ERR(keyref) == -EKEYREVOKED) /* recently invalidated */ - keyref = ERR_PTR(-ENOKEY); - return ERR_CAST(keyref); - } - return key_ref_to_ptr(keyref); -} - -#define FSCRYPT_FS_KEYRING_DESCRIPTION_SIZE \ - (CONST_STRLEN("fscrypt-") + sizeof_field(struct super_block, s_id)) - -#define FSCRYPT_MK_DESCRIPTION_SIZE (2 * FSCRYPT_KEY_IDENTIFIER_SIZE + 1) - #define FSCRYPT_MK_USERS_DESCRIPTION_SIZE \ (CONST_STRLEN("fscrypt-") + 2 * FSCRYPT_KEY_IDENTIFIER_SIZE + \ CONST_STRLEN("-users") + 1) @@ -164,21 +163,6 @@ static struct key *search_fscrypt_keyrin #define FSCRYPT_MK_USER_DESCRIPTION_SIZE \ (2 * FSCRYPT_KEY_IDENTIFIER_SIZE + CONST_STRLEN(".uid.") + 10 + 1)
-static void format_fs_keyring_description( - char description[FSCRYPT_FS_KEYRING_DESCRIPTION_SIZE], - const struct super_block *sb) -{ - sprintf(description, "fscrypt-%s", sb->s_id); -} - -static void format_mk_description( - char description[FSCRYPT_MK_DESCRIPTION_SIZE], - const struct fscrypt_key_specifier *mk_spec) -{ - sprintf(description, "%*phN", - master_key_spec_len(mk_spec), (u8 *)&mk_spec->u); -} - static void format_mk_users_keyring_description( char description[FSCRYPT_MK_USERS_DESCRIPTION_SIZE], const u8 mk_identifier[FSCRYPT_KEY_IDENTIFIER_SIZE]) @@ -199,20 +183,15 @@ static void format_mk_user_description( /* Create ->s_master_keys if needed. Synchronized by fscrypt_add_key_mutex. */ static int allocate_filesystem_keyring(struct super_block *sb) { - char description[FSCRYPT_FS_KEYRING_DESCRIPTION_SIZE]; - struct key *keyring; + struct fscrypt_keyring *keyring;
if (sb->s_master_keys) return 0;
- format_fs_keyring_description(description, sb); - keyring = keyring_alloc(description, GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, - current_cred(), KEY_POS_SEARCH | - KEY_USR_SEARCH | KEY_USR_READ | KEY_USR_VIEW, - KEY_ALLOC_NOT_IN_QUOTA, NULL, NULL); - if (IS_ERR(keyring)) - return PTR_ERR(keyring); - + keyring = kzalloc(sizeof(*keyring), GFP_KERNEL); + if (!keyring) + return -ENOMEM; + spin_lock_init(&keyring->lock); /* * Pairs with the smp_load_acquire() in fscrypt_find_master_key(). * I.e., here we publish ->s_master_keys with a RELEASE barrier so that @@ -222,21 +201,75 @@ static int allocate_filesystem_keyring(s return 0; }
-void fscrypt_sb_free(struct super_block *sb) +/* + * This is called at unmount time to release all encryption keys that have been + * added to the filesystem, along with the keyring that contains them. + * + * Note that besides clearing and freeing memory, this might need to evict keys + * from the keyslots of an inline crypto engine. Therefore, this must be called + * while the filesystem's underlying block device(s) are still available. + */ +void fscrypt_sb_delete(struct super_block *sb) { - key_put(sb->s_master_keys); + struct fscrypt_keyring *keyring = sb->s_master_keys; + size_t i; + + if (!keyring) + return; + + for (i = 0; i < ARRAY_SIZE(keyring->key_hashtable); i++) { + struct hlist_head *bucket = &keyring->key_hashtable[i]; + struct fscrypt_master_key *mk; + struct hlist_node *tmp; + + hlist_for_each_entry_safe(mk, tmp, bucket, mk_node) { + /* + * Since all inodes were already evicted, every key + * remaining in the keyring should have an empty inode + * list, and should only still be in the keyring due to + * the single active ref associated with ->mk_secret. + * There should be no structural refs beyond the one + * associated with the active ref. + */ + WARN_ON(refcount_read(&mk->mk_active_refs) != 1); + WARN_ON(refcount_read(&mk->mk_struct_refs) != 1); + WARN_ON(!is_master_key_secret_present(&mk->mk_secret)); + wipe_master_key_secret(&mk->mk_secret); + fscrypt_put_master_key_activeref(mk); + } + } + kfree_sensitive(keyring); sb->s_master_keys = NULL; }
+static struct hlist_head * +fscrypt_mk_hash_bucket(struct fscrypt_keyring *keyring, + const struct fscrypt_key_specifier *mk_spec) +{ + /* + * Since key specifiers should be "random" values, it is sufficient to + * use a trivial hash function that just takes the first several bits of + * the key specifier. + */ + unsigned long i = get_unaligned((unsigned long *)&mk_spec->u); + + return &keyring->key_hashtable[i % ARRAY_SIZE(keyring->key_hashtable)]; +} + /* - * Find the specified master key in ->s_master_keys. - * Returns ERR_PTR(-ENOKEY) if not found. + * Find the specified master key struct in ->s_master_keys and take a structural + * ref to it. The structural ref guarantees that the key struct continues to + * exist, but it does *not* guarantee that ->s_master_keys continues to contain + * the key struct. The structural ref needs to be dropped by + * fscrypt_put_master_key(). Returns NULL if the key struct is not found. */ -struct key *fscrypt_find_master_key(struct super_block *sb, - const struct fscrypt_key_specifier *mk_spec) +struct fscrypt_master_key * +fscrypt_find_master_key(struct super_block *sb, + const struct fscrypt_key_specifier *mk_spec) { - struct key *keyring; - char description[FSCRYPT_MK_DESCRIPTION_SIZE]; + struct fscrypt_keyring *keyring; + struct hlist_head *bucket; + struct fscrypt_master_key *mk;
/* * Pairs with the smp_store_release() in allocate_filesystem_keyring(). @@ -246,10 +279,38 @@ struct key *fscrypt_find_master_key(stru */ keyring = smp_load_acquire(&sb->s_master_keys); if (keyring == NULL) - return ERR_PTR(-ENOKEY); /* No keyring yet, so no keys yet. */ + return NULL; /* No keyring yet, so no keys yet. */
- format_mk_description(description, mk_spec); - return search_fscrypt_keyring(keyring, &key_type_fscrypt, description); + bucket = fscrypt_mk_hash_bucket(keyring, mk_spec); + rcu_read_lock(); + switch (mk_spec->type) { + case FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR: + hlist_for_each_entry_rcu(mk, bucket, mk_node) { + if (mk->mk_spec.type == + FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR && + memcmp(mk->mk_spec.u.descriptor, + mk_spec->u.descriptor, + FSCRYPT_KEY_DESCRIPTOR_SIZE) == 0 && + refcount_inc_not_zero(&mk->mk_struct_refs)) + goto out; + } + break; + case FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER: + hlist_for_each_entry_rcu(mk, bucket, mk_node) { + if (mk->mk_spec.type == + FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER && + memcmp(mk->mk_spec.u.identifier, + mk_spec->u.identifier, + FSCRYPT_KEY_IDENTIFIER_SIZE) == 0 && + refcount_inc_not_zero(&mk->mk_struct_refs)) + goto out; + } + break; + } + mk = NULL; +out: + rcu_read_unlock(); + return mk; }
static int allocate_master_key_users_keyring(struct fscrypt_master_key *mk) @@ -277,17 +338,30 @@ static int allocate_master_key_users_key static struct key *find_master_key_user(struct fscrypt_master_key *mk) { char description[FSCRYPT_MK_USER_DESCRIPTION_SIZE]; + key_ref_t keyref;
format_mk_user_description(description, mk->mk_spec.u.identifier); - return search_fscrypt_keyring(mk->mk_users, &key_type_fscrypt_user, - description); + + /* + * We need to mark the keyring reference as "possessed" so that we + * acquire permission to search it, via the KEY_POS_SEARCH permission. + */ + keyref = keyring_search(make_key_ref(mk->mk_users, true /*possessed*/), + &key_type_fscrypt_user, description, false); + if (IS_ERR(keyref)) { + if (PTR_ERR(keyref) == -EAGAIN || /* not found */ + PTR_ERR(keyref) == -EKEYREVOKED) /* recently invalidated */ + keyref = ERR_PTR(-ENOKEY); + return ERR_CAST(keyref); + } + return key_ref_to_ptr(keyref); }
/* * Give the current user a "key" in ->mk_users. This charges the user's quota * and marks the master key as added by the current user, so that it cannot be - * removed by another user with the key. Either the master key's key->sem must - * be held for write, or the master key must be still undergoing initialization. + * removed by another user with the key. Either ->mk_sem must be held for + * write, or the master key must be still undergoing initialization. */ static int add_master_key_user(struct fscrypt_master_key *mk) { @@ -309,7 +383,7 @@ static int add_master_key_user(struct fs
/* * Remove the current user's "key" from ->mk_users. - * The master key's key->sem must be held for write. + * ->mk_sem must be held for write. * * Returns 0 if removed, -ENOKEY if not found, or another -errno code. */ @@ -327,63 +401,49 @@ static int remove_master_key_user(struct }
/* - * Allocate a new fscrypt_master_key which contains the given secret, set it as - * the payload of a new 'struct key' of type fscrypt, and link the 'struct key' - * into the given keyring. Synchronized by fscrypt_add_key_mutex. + * Allocate a new fscrypt_master_key, transfer the given secret over to it, and + * insert it into sb->s_master_keys. */ -static int add_new_master_key(struct fscrypt_master_key_secret *secret, - const struct fscrypt_key_specifier *mk_spec, - struct key *keyring) +static int add_new_master_key(struct super_block *sb, + struct fscrypt_master_key_secret *secret, + const struct fscrypt_key_specifier *mk_spec) { + struct fscrypt_keyring *keyring = sb->s_master_keys; struct fscrypt_master_key *mk; - char description[FSCRYPT_MK_DESCRIPTION_SIZE]; - struct key *key; int err;
mk = kzalloc(sizeof(*mk), GFP_KERNEL); if (!mk) return -ENOMEM;
+ mk->mk_sb = sb; + init_rwsem(&mk->mk_sem); + refcount_set(&mk->mk_struct_refs, 1); mk->mk_spec = *mk_spec;
- move_master_key_secret(&mk->mk_secret, secret); - - refcount_set(&mk->mk_refcount, 1); /* secret is present */ INIT_LIST_HEAD(&mk->mk_decrypted_inodes); spin_lock_init(&mk->mk_decrypted_inodes_lock);
if (mk_spec->type == FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER) { err = allocate_master_key_users_keyring(mk); if (err) - goto out_free_mk; + goto out_put; err = add_master_key_user(mk); if (err) - goto out_free_mk; + goto out_put; }
- /* - * Note that we don't charge this key to anyone's quota, since when - * ->mk_users is in use those keys are charged instead, and otherwise - * (when ->mk_users isn't in use) only root can add these keys. - */ - format_mk_description(description, mk_spec); - key = key_alloc(&key_type_fscrypt, description, - GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, current_cred(), - KEY_POS_SEARCH | KEY_USR_SEARCH | KEY_USR_VIEW, - KEY_ALLOC_NOT_IN_QUOTA, NULL); - if (IS_ERR(key)) { - err = PTR_ERR(key); - goto out_free_mk; - } - err = key_instantiate_and_link(key, mk, sizeof(*mk), keyring, NULL); - key_put(key); - if (err) - goto out_free_mk; + move_master_key_secret(&mk->mk_secret, secret); + refcount_set(&mk->mk_active_refs, 1); /* ->mk_secret is present */
+ spin_lock(&keyring->lock); + hlist_add_head_rcu(&mk->mk_node, + fscrypt_mk_hash_bucket(keyring, mk_spec)); + spin_unlock(&keyring->lock); return 0;
-out_free_mk: - free_master_key(mk); +out_put: + fscrypt_put_master_key(mk); return err; }
@@ -392,42 +452,34 @@ out_free_mk: static int add_existing_master_key(struct fscrypt_master_key *mk, struct fscrypt_master_key_secret *secret) { - struct key *mk_user; - bool rekey; int err;
/* * If the current user is already in ->mk_users, then there's nothing to - * do. (Not applicable for v1 policy keys, which have NULL ->mk_users.) + * do. Otherwise, we need to add the user to ->mk_users. (Neither is + * applicable for v1 policy keys, which have NULL ->mk_users.) */ if (mk->mk_users) { - mk_user = find_master_key_user(mk); + struct key *mk_user = find_master_key_user(mk); + if (mk_user != ERR_PTR(-ENOKEY)) { if (IS_ERR(mk_user)) return PTR_ERR(mk_user); key_put(mk_user); return 0; } - } - - /* If we'll be re-adding ->mk_secret, try to take the reference. */ - rekey = !is_master_key_secret_present(&mk->mk_secret); - if (rekey && !refcount_inc_not_zero(&mk->mk_refcount)) - return KEY_DEAD; - - /* Add the current user to ->mk_users, if applicable. */ - if (mk->mk_users) { err = add_master_key_user(mk); - if (err) { - if (rekey && refcount_dec_and_test(&mk->mk_refcount)) - return KEY_DEAD; + if (err) return err; - } }
/* Re-add the secret if needed. */ - if (rekey) + if (!is_master_key_secret_present(&mk->mk_secret)) { + if (!refcount_inc_not_zero(&mk->mk_active_refs)) + return KEY_DEAD; move_master_key_secret(&mk->mk_secret, secret); + } + return 0; }
@@ -436,38 +488,36 @@ static int do_add_master_key(struct supe const struct fscrypt_key_specifier *mk_spec) { static DEFINE_MUTEX(fscrypt_add_key_mutex); - struct key *key; + struct fscrypt_master_key *mk; int err;
mutex_lock(&fscrypt_add_key_mutex); /* serialize find + link */ -retry: - key = fscrypt_find_master_key(sb, mk_spec); - if (IS_ERR(key)) { - err = PTR_ERR(key); - if (err != -ENOKEY) - goto out_unlock; + + mk = fscrypt_find_master_key(sb, mk_spec); + if (!mk) { /* Didn't find the key in ->s_master_keys. Add it. */ err = allocate_filesystem_keyring(sb); - if (err) - goto out_unlock; - err = add_new_master_key(secret, mk_spec, sb->s_master_keys); + if (!err) + err = add_new_master_key(sb, secret, mk_spec); } else { /* * Found the key in ->s_master_keys. Re-add the secret if * needed, and add the user to ->mk_users if needed. */ - down_write(&key->sem); - err = add_existing_master_key(key->payload.data[0], secret); - up_write(&key->sem); + down_write(&mk->mk_sem); + err = add_existing_master_key(mk, secret); + up_write(&mk->mk_sem); if (err == KEY_DEAD) { - /* Key being removed or needs to be removed */ - key_invalidate(key); - key_put(key); - goto retry; + /* + * We found a key struct, but it's already been fully + * removed. Ignore the old struct and add a new one. + * fscrypt_add_key_mutex means we don't need to worry + * about concurrent adds. + */ + err = add_new_master_key(sb, secret, mk_spec); } - key_put(key); + fscrypt_put_master_key(mk); } -out_unlock: mutex_unlock(&fscrypt_add_key_mutex); return err; } @@ -771,19 +821,19 @@ int fscrypt_verify_key_added(struct supe const u8 identifier[FSCRYPT_KEY_IDENTIFIER_SIZE]) { struct fscrypt_key_specifier mk_spec; - struct key *key, *mk_user; struct fscrypt_master_key *mk; + struct key *mk_user; int err;
mk_spec.type = FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER; memcpy(mk_spec.u.identifier, identifier, FSCRYPT_KEY_IDENTIFIER_SIZE);
- key = fscrypt_find_master_key(sb, &mk_spec); - if (IS_ERR(key)) { - err = PTR_ERR(key); + mk = fscrypt_find_master_key(sb, &mk_spec); + if (!mk) { + err = -ENOKEY; goto out; } - mk = key->payload.data[0]; + down_read(&mk->mk_sem); mk_user = find_master_key_user(mk); if (IS_ERR(mk_user)) { err = PTR_ERR(mk_user); @@ -791,7 +841,8 @@ int fscrypt_verify_key_added(struct supe key_put(mk_user); err = 0; } - key_put(key); + up_read(&mk->mk_sem); + fscrypt_put_master_key(mk); out: if (err == -ENOKEY && capable(CAP_FOWNER)) err = 0; @@ -953,11 +1004,10 @@ static int do_remove_key(struct file *fi struct super_block *sb = file_inode(filp)->i_sb; struct fscrypt_remove_key_arg __user *uarg = _uarg; struct fscrypt_remove_key_arg arg; - struct key *key; struct fscrypt_master_key *mk; u32 status_flags = 0; int err; - bool dead; + bool inodes_remain;
if (copy_from_user(&arg, uarg, sizeof(arg))) return -EFAULT; @@ -977,12 +1027,10 @@ static int do_remove_key(struct file *fi return -EACCES;
/* Find the key being removed. */ - key = fscrypt_find_master_key(sb, &arg.key_spec); - if (IS_ERR(key)) - return PTR_ERR(key); - mk = key->payload.data[0]; - - down_write(&key->sem); + mk = fscrypt_find_master_key(sb, &arg.key_spec); + if (!mk) + return -ENOKEY; + down_write(&mk->mk_sem);
/* If relevant, remove current user's (or all users) claim to the key */ if (mk->mk_users && mk->mk_users->keys.nr_leaves_on_tree != 0) { @@ -991,7 +1039,7 @@ static int do_remove_key(struct file *fi else err = remove_master_key_user(mk); if (err) { - up_write(&key->sem); + up_write(&mk->mk_sem); goto out_put_key; } if (mk->mk_users->keys.nr_leaves_on_tree != 0) { @@ -1003,26 +1051,22 @@ static int do_remove_key(struct file *fi status_flags |= FSCRYPT_KEY_REMOVAL_STATUS_FLAG_OTHER_USERS; err = 0; - up_write(&key->sem); + up_write(&mk->mk_sem); goto out_put_key; } }
/* No user claims remaining. Go ahead and wipe the secret. */ - dead = false; + err = -ENOKEY; if (is_master_key_secret_present(&mk->mk_secret)) { wipe_master_key_secret(&mk->mk_secret); - dead = refcount_dec_and_test(&mk->mk_refcount); - } - up_write(&key->sem); - if (dead) { - /* - * No inodes reference the key, and we wiped the secret, so the - * key object is free to be removed from the keyring. - */ - key_invalidate(key); + fscrypt_put_master_key_activeref(mk); err = 0; - } else { + } + inodes_remain = refcount_read(&mk->mk_active_refs) > 0; + up_write(&mk->mk_sem); + + if (inodes_remain) { /* Some inodes still reference this key; try to evict them. */ err = try_to_lock_encrypted_files(sb, mk); if (err == -EBUSY) { @@ -1038,7 +1082,7 @@ static int do_remove_key(struct file *fi * has been fully removed including all files locked. */ out_put_key: - key_put(key); + fscrypt_put_master_key(mk); if (err == 0) err = put_user(status_flags, &uarg->removal_status_flags); return err; @@ -1085,7 +1129,6 @@ int fscrypt_ioctl_get_key_status(struct { struct super_block *sb = file_inode(filp)->i_sb; struct fscrypt_get_key_status_arg arg; - struct key *key; struct fscrypt_master_key *mk; int err;
@@ -1102,19 +1145,18 @@ int fscrypt_ioctl_get_key_status(struct arg.user_count = 0; memset(arg.__out_reserved, 0, sizeof(arg.__out_reserved));
- key = fscrypt_find_master_key(sb, &arg.key_spec); - if (IS_ERR(key)) { - if (key != ERR_PTR(-ENOKEY)) - return PTR_ERR(key); + mk = fscrypt_find_master_key(sb, &arg.key_spec); + if (!mk) { arg.status = FSCRYPT_KEY_STATUS_ABSENT; err = 0; goto out; } - mk = key->payload.data[0]; - down_read(&key->sem); + down_read(&mk->mk_sem);
if (!is_master_key_secret_present(&mk->mk_secret)) { - arg.status = FSCRYPT_KEY_STATUS_INCOMPLETELY_REMOVED; + arg.status = refcount_read(&mk->mk_active_refs) > 0 ? + FSCRYPT_KEY_STATUS_INCOMPLETELY_REMOVED : + FSCRYPT_KEY_STATUS_ABSENT /* raced with full removal */; err = 0; goto out_release_key; } @@ -1136,8 +1178,8 @@ int fscrypt_ioctl_get_key_status(struct } err = 0; out_release_key: - up_read(&key->sem); - key_put(key); + up_read(&mk->mk_sem); + fscrypt_put_master_key(mk); out: if (!err && copy_to_user(uarg, &arg, sizeof(arg))) err = -EFAULT; @@ -1149,13 +1191,9 @@ int __init fscrypt_init_keyring(void) { int err;
- err = register_key_type(&key_type_fscrypt); - if (err) - return err; - err = register_key_type(&key_type_fscrypt_user); if (err) - goto err_unregister_fscrypt; + return err;
err = register_key_type(&key_type_fscrypt_provisioning); if (err) @@ -1165,7 +1203,5 @@ int __init fscrypt_init_keyring(void)
err_unregister_fscrypt_user: unregister_key_type(&key_type_fscrypt_user); -err_unregister_fscrypt: - unregister_key_type(&key_type_fscrypt); return err; } --- a/fs/crypto/keysetup.c +++ b/fs/crypto/keysetup.c @@ -9,7 +9,6 @@ */
#include <crypto/skcipher.h> -#include <linux/key.h> #include <linux/random.h>
#include "fscrypt_private.h" @@ -159,6 +158,7 @@ void fscrypt_destroy_prepared_key(struct { crypto_free_skcipher(prep_key->tfm); fscrypt_destroy_inline_crypt_key(prep_key); + memzero_explicit(prep_key, sizeof(*prep_key)); }
/* Given a per-file encryption key, set up the file's crypto transform object */ @@ -412,20 +412,18 @@ static bool fscrypt_valid_master_key_siz /* * Find the master key, then set up the inode's actual encryption key. * - * If the master key is found in the filesystem-level keyring, then the - * corresponding 'struct key' is returned in *master_key_ret with its semaphore - * read-locked. This is needed to ensure that only one task links the - * fscrypt_info into ->mk_decrypted_inodes (as multiple tasks may race to create - * an fscrypt_info for the same inode), and to synchronize the master key being - * removed with a new inode starting to use it. + * If the master key is found in the filesystem-level keyring, then it is + * returned in *mk_ret with its semaphore read-locked. This is needed to ensure + * that only one task links the fscrypt_info into ->mk_decrypted_inodes (as + * multiple tasks may race to create an fscrypt_info for the same inode), and to + * synchronize the master key being removed with a new inode starting to use it. */ static int setup_file_encryption_key(struct fscrypt_info *ci, bool need_dirhash_key, - struct key **master_key_ret) + struct fscrypt_master_key **mk_ret) { - struct key *key; - struct fscrypt_master_key *mk = NULL; struct fscrypt_key_specifier mk_spec; + struct fscrypt_master_key *mk; int err;
err = fscrypt_select_encryption_impl(ci); @@ -436,11 +434,10 @@ static int setup_file_encryption_key(str if (err) return err;
- key = fscrypt_find_master_key(ci->ci_inode->i_sb, &mk_spec); - if (IS_ERR(key)) { - if (key != ERR_PTR(-ENOKEY) || - ci->ci_policy.version != FSCRYPT_POLICY_V1) - return PTR_ERR(key); + mk = fscrypt_find_master_key(ci->ci_inode->i_sb, &mk_spec); + if (!mk) { + if (ci->ci_policy.version != FSCRYPT_POLICY_V1) + return -ENOKEY;
/* * As a legacy fallback for v1 policies, search for the key in @@ -450,9 +447,7 @@ static int setup_file_encryption_key(str */ return fscrypt_setup_v1_file_key_via_subscribed_keyrings(ci); } - - mk = key->payload.data[0]; - down_read(&key->sem); + down_read(&mk->mk_sem);
/* Has the secret been removed (via FS_IOC_REMOVE_ENCRYPTION_KEY)? */ if (!is_master_key_secret_present(&mk->mk_secret)) { @@ -480,18 +475,18 @@ static int setup_file_encryption_key(str if (err) goto out_release_key;
- *master_key_ret = key; + *mk_ret = mk; return 0;
out_release_key: - up_read(&key->sem); - key_put(key); + up_read(&mk->mk_sem); + fscrypt_put_master_key(mk); return err; }
static void put_crypt_info(struct fscrypt_info *ci) { - struct key *key; + struct fscrypt_master_key *mk;
if (!ci) return; @@ -501,24 +496,18 @@ static void put_crypt_info(struct fscryp else if (ci->ci_owns_key) fscrypt_destroy_prepared_key(&ci->ci_enc_key);
- key = ci->ci_master_key; - if (key) { - struct fscrypt_master_key *mk = key->payload.data[0]; - + mk = ci->ci_master_key; + if (mk) { /* * Remove this inode from the list of inodes that were unlocked - * with the master key. - * - * In addition, if we're removing the last inode from a key that - * already had its secret removed, invalidate the key so that it - * gets removed from ->s_master_keys. + * with the master key. In addition, if we're removing the last + * inode from a master key struct that already had its secret + * removed, then complete the full removal of the struct. */ spin_lock(&mk->mk_decrypted_inodes_lock); list_del(&ci->ci_master_key_link); spin_unlock(&mk->mk_decrypted_inodes_lock); - if (refcount_dec_and_test(&mk->mk_refcount)) - key_invalidate(key); - key_put(key); + fscrypt_put_master_key_activeref(mk); } memzero_explicit(ci, sizeof(*ci)); kmem_cache_free(fscrypt_info_cachep, ci); @@ -532,7 +521,7 @@ fscrypt_setup_encryption_info(struct ino { struct fscrypt_info *crypt_info; struct fscrypt_mode *mode; - struct key *master_key = NULL; + struct fscrypt_master_key *mk = NULL; int res;
res = fscrypt_initialize(inode->i_sb->s_cop->flags); @@ -555,8 +544,7 @@ fscrypt_setup_encryption_info(struct ino WARN_ON(mode->ivsize > FSCRYPT_MAX_IV_SIZE); crypt_info->ci_mode = mode;
- res = setup_file_encryption_key(crypt_info, need_dirhash_key, - &master_key); + res = setup_file_encryption_key(crypt_info, need_dirhash_key, &mk); if (res) goto out;
@@ -571,12 +559,9 @@ fscrypt_setup_encryption_info(struct ino * We won the race and set ->i_crypt_info to our crypt_info. * Now link it into the master key's inode list. */ - if (master_key) { - struct fscrypt_master_key *mk = - master_key->payload.data[0]; - - refcount_inc(&mk->mk_refcount); - crypt_info->ci_master_key = key_get(master_key); + if (mk) { + crypt_info->ci_master_key = mk; + refcount_inc(&mk->mk_active_refs); spin_lock(&mk->mk_decrypted_inodes_lock); list_add(&crypt_info->ci_master_key_link, &mk->mk_decrypted_inodes); @@ -586,9 +571,9 @@ fscrypt_setup_encryption_info(struct ino } res = 0; out: - if (master_key) { - up_read(&master_key->sem); - key_put(master_key); + if (mk) { + up_read(&mk->mk_sem); + fscrypt_put_master_key(mk); } put_crypt_info(crypt_info); return res; @@ -753,7 +738,6 @@ EXPORT_SYMBOL(fscrypt_free_inode); int fscrypt_drop_inode(struct inode *inode) { const struct fscrypt_info *ci = fscrypt_get_info(inode); - const struct fscrypt_master_key *mk;
/* * If ci is NULL, then the inode doesn't have an encryption key set up @@ -763,7 +747,6 @@ int fscrypt_drop_inode(struct inode *ino */ if (!ci || !ci->ci_master_key) return 0; - mk = ci->ci_master_key->payload.data[0];
/* * With proper, non-racy use of FS_IOC_REMOVE_ENCRYPTION_KEY, all inodes @@ -782,6 +765,6 @@ int fscrypt_drop_inode(struct inode *ino * then the thread removing the key will either evict the inode itself * or will correctly detect that it wasn't evicted due to the race. */ - return !is_master_key_secret_present(&mk->mk_secret); + return !is_master_key_secret_present(&ci->ci_master_key->mk_secret); } EXPORT_SYMBOL_GPL(fscrypt_drop_inode); --- a/fs/crypto/policy.c +++ b/fs/crypto/policy.c @@ -744,12 +744,8 @@ int fscrypt_set_context(struct inode *in * delayed key setup that requires the inode number. */ if (ci->ci_policy.version == FSCRYPT_POLICY_V2 && - (ci->ci_policy.v2.flags & FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32)) { - const struct fscrypt_master_key *mk = - ci->ci_master_key->payload.data[0]; - - fscrypt_hash_inode_number(ci, mk); - } + (ci->ci_policy.v2.flags & FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32)) + fscrypt_hash_inode_number(ci, ci->ci_master_key);
return inode->i_sb->s_cop->set_context(inode, &ctx, ctxsize, fs_data); } --- a/fs/super.c +++ b/fs/super.c @@ -291,7 +291,6 @@ static void __put_super(struct super_blo WARN_ON(s->s_inode_lru.node); WARN_ON(!list_empty(&s->s_mounts)); security_sb_free(s); - fscrypt_sb_free(s); put_user_ns(s->s_user_ns); kfree(s->s_subtype); call_rcu(&s->rcu, destroy_super_rcu); @@ -480,6 +479,7 @@ void generic_shutdown_super(struct super evict_inodes(sb); /* only nonzero refcount inodes can have marks */ fsnotify_sb_delete(sb); + fscrypt_sb_delete(sb); security_sb_delete(sb);
if (sb->s_dio_done_wq) { --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1472,7 +1472,7 @@ struct super_block { const struct xattr_handler **s_xattr; #ifdef CONFIG_FS_ENCRYPTION const struct fscrypt_operations *s_cop; - struct key *s_master_keys; /* master crypto keys in use */ + struct fscrypt_keyring *s_master_keys; /* master crypto keys in use */ #endif #ifdef CONFIG_FS_VERITY const struct fsverity_operations *s_vop; --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -312,7 +312,7 @@ fscrypt_free_dummy_policy(struct fscrypt }
/* keyring.c */ -void fscrypt_sb_free(struct super_block *sb); +void fscrypt_sb_delete(struct super_block *sb); int fscrypt_ioctl_add_key(struct file *filp, void __user *arg); int fscrypt_add_test_dummy_key(struct super_block *sb, const struct fscrypt_dummy_policy *dummy_policy); @@ -526,7 +526,7 @@ fscrypt_free_dummy_policy(struct fscrypt }
/* keyring.c */ -static inline void fscrypt_sb_free(struct super_block *sb) +static inline void fscrypt_sb_delete(struct super_block *sb) { }
From: Eric Biggers ebiggers@google.com
commit ccd30a476f8e864732de220bd50e6f372f5ebcab upstream.
Commit d7e7b9af104c ("fscrypt: stop using keyrings subsystem for fscrypt_master_key") moved the keyring destruction from __put_super() to generic_shutdown_super() so that the filesystem's block device(s) are still available. Unfortunately, this causes a memory leak in the case where a mount is attempted with the test_dummy_encryption mount option, but the mount fails after the option has already been processed.
To fix this, attempt the keyring destruction in both places.
Reported-by: syzbot+104c2a89561289cec13e@syzkaller.appspotmail.com Fixes: d7e7b9af104c ("fscrypt: stop using keyrings subsystem for fscrypt_master_key") Signed-off-by: Eric Biggers ebiggers@google.com Reviewed-by: Christian Brauner (Microsoft) brauner@kernel.org Link: https://lore.kernel.org/r/20221011213838.209879-1-ebiggers@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/crypto/keyring.c | 17 +++++++++++------ fs/super.c | 3 ++- include/linux/fscrypt.h | 4 ++-- 3 files changed, 15 insertions(+), 9 deletions(-)
--- a/fs/crypto/keyring.c +++ b/fs/crypto/keyring.c @@ -202,14 +202,19 @@ static int allocate_filesystem_keyring(s }
/* - * This is called at unmount time to release all encryption keys that have been - * added to the filesystem, along with the keyring that contains them. + * Release all encryption keys that have been added to the filesystem, along + * with the keyring that contains them. * - * Note that besides clearing and freeing memory, this might need to evict keys - * from the keyslots of an inline crypto engine. Therefore, this must be called - * while the filesystem's underlying block device(s) are still available. + * This is called at unmount time. The filesystem's underlying block device(s) + * are still available at this time; this is important because after user file + * accesses have been allowed, this function may need to evict keys from the + * keyslots of an inline crypto engine, which requires the block device(s). + * + * This is also called when the super_block is being freed. This is needed to + * avoid a memory leak if mounting fails after the "test_dummy_encryption" + * option was processed, as in that case the unmount-time call isn't made. */ -void fscrypt_sb_delete(struct super_block *sb) +void fscrypt_destroy_keyring(struct super_block *sb) { struct fscrypt_keyring *keyring = sb->s_master_keys; size_t i; --- a/fs/super.c +++ b/fs/super.c @@ -291,6 +291,7 @@ static void __put_super(struct super_blo WARN_ON(s->s_inode_lru.node); WARN_ON(!list_empty(&s->s_mounts)); security_sb_free(s); + fscrypt_destroy_keyring(s); put_user_ns(s->s_user_ns); kfree(s->s_subtype); call_rcu(&s->rcu, destroy_super_rcu); @@ -479,7 +480,7 @@ void generic_shutdown_super(struct super evict_inodes(sb); /* only nonzero refcount inodes can have marks */ fsnotify_sb_delete(sb); - fscrypt_sb_delete(sb); + fscrypt_destroy_keyring(sb); security_sb_delete(sb);
if (sb->s_dio_done_wq) { --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -312,7 +312,7 @@ fscrypt_free_dummy_policy(struct fscrypt }
/* keyring.c */ -void fscrypt_sb_delete(struct super_block *sb); +void fscrypt_destroy_keyring(struct super_block *sb); int fscrypt_ioctl_add_key(struct file *filp, void __user *arg); int fscrypt_add_test_dummy_key(struct super_block *sb, const struct fscrypt_dummy_policy *dummy_policy); @@ -526,7 +526,7 @@ fscrypt_free_dummy_policy(struct fscrypt }
/* keyring.c */ -static inline void fscrypt_sb_delete(struct super_block *sb) +static inline void fscrypt_destroy_keyring(struct super_block *sb) { }
From: Geert Uytterhoeven geert+renesas@glider.be
[ Upstream commit ba5284ebe497044f37c9bb9c7b1564932f4b6610 ]
On R-Car V4H, all PLLs except PLL5 support Spread Spectrum and/or Fractional Multiplication to reduce electromagnetic interference.
Add the SASYNCPER and SASYNCPERD[124] clocks, which are used as clock sources for modules that must not be affected by Spread Spectrum and/or Fractional Multiplication.
Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Acked-by: Stephen Boyd sboyd@kernel.org Link: https://lore.kernel.org/r/d0f35c35e1f96c5a649ab477e7ba5d8025957cd0.166514749... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/renesas/r8a779g0-cpg-mssr.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/clk/renesas/r8a779g0-cpg-mssr.c b/drivers/clk/renesas/r8a779g0-cpg-mssr.c index c9c59c6f7139..7beb0e3b1872 100644 --- a/drivers/clk/renesas/r8a779g0-cpg-mssr.c +++ b/drivers/clk/renesas/r8a779g0-cpg-mssr.c @@ -47,6 +47,7 @@ enum clk_ids { CLK_S0_VIO, CLK_S0_VC, CLK_S0_HSC, + CLK_SASYNCPER, CLK_SV_VIP, CLK_SV_IR, CLK_SDSRC, @@ -84,6 +85,7 @@ static const struct cpg_core_clk r8a779g0_core_clks[] __initconst = { DEF_FIXED(".s0_vio", CLK_S0_VIO, CLK_PLL1_DIV2, 2, 1), DEF_FIXED(".s0_vc", CLK_S0_VC, CLK_PLL1_DIV2, 2, 1), DEF_FIXED(".s0_hsc", CLK_S0_HSC, CLK_PLL1_DIV2, 2, 1), + DEF_FIXED(".sasyncper", CLK_SASYNCPER, CLK_PLL5_DIV4, 3, 1), DEF_FIXED(".sv_vip", CLK_SV_VIP, CLK_PLL1, 5, 1), DEF_FIXED(".sv_ir", CLK_SV_IR, CLK_PLL1, 5, 1), DEF_BASE(".sdsrc", CLK_SDSRC, CLK_TYPE_GEN4_SDSRC, CLK_PLL5), @@ -128,6 +130,9 @@ static const struct cpg_core_clk r8a779g0_core_clks[] __initconst = { DEF_FIXED("s0d4_hsc", R8A779G0_CLK_S0D4_HSC, CLK_S0_HSC, 4, 1), DEF_FIXED("cl16m_hsc", R8A779G0_CLK_CL16M_HSC, CLK_S0_HSC, 48, 1), DEF_FIXED("s0d2_cc", R8A779G0_CLK_S0D2_CC, CLK_S0, 2, 1), + DEF_FIXED("sasyncperd1",R8A779G0_CLK_SASYNCPERD1, CLK_SASYNCPER,1, 1), + DEF_FIXED("sasyncperd2",R8A779G0_CLK_SASYNCPERD2, CLK_SASYNCPER,2, 1), + DEF_FIXED("sasyncperd4",R8A779G0_CLK_SASYNCPERD4, CLK_SASYNCPER,4, 1), DEF_FIXED("svd1_ir", R8A779G0_CLK_SVD1_IR, CLK_SV_IR, 1, 1), DEF_FIXED("svd2_ir", R8A779G0_CLK_SVD2_IR, CLK_SV_IR, 2, 1), DEF_FIXED("svd1_vip", R8A779G0_CLK_SVD1_VIP, CLK_SV_VIP, 1, 1),
From: Filipe Manana fdmanana@suse.com
commit 8184620ae21213d51eaf2e0bd4186baacb928172 upstream.
When doing a direct IO write using a iocb with nowait and dsync set, we end up not syncing the file once the write completes.
This is because we tell iomap to not call generic_write_sync(), which would result in calling btrfs_sync_file(), in order to avoid a deadlock since iomap can call it while we are holding the inode's lock and btrfs_sync_file() needs to acquire the inode's lock. The deadlock happens only if the write happens synchronously, when iomap_dio_rw() calls iomap_dio_complete() before it returns. Instead we do the sync ourselves at btrfs_do_write_iter().
For a nowait write however we can end up not doing the sync ourselves at at btrfs_do_write_iter() because the write could have been queued, and therefore we get -EIOCBQUEUED returned from iomap in such case. That makes us skip the sync call at btrfs_do_write_iter(), as we don't do it for any error returned from btrfs_direct_write(). We can't simply do the call even if -EIOCBQUEUED is returned, since that would block the task waiting for IO, both for the data since there are bios still in progress as well as potentially blocking when joining a log transaction and when syncing the log (writing log trees, super blocks, etc).
So let iomap do the sync call itself and in order to avoid deadlocks for the case of synchronous writes (without nowait), use __iomap_dio_rw() and have ourselves call iomap_dio_complete() after unlocking the inode.
A test case will later be sent for fstests, after this is fixed in Linus' tree.
Fixes: 51bd9563b678 ("btrfs: fix deadlock due to page faults during direct IO reads and writes") Reported-by: Марк Коренберг socketpair@gmail.com Link: https://lore.kernel.org/linux-btrfs/CAEmTpZGRKbzc16fWPvxbr6AfFsQoLmz-Lcg-7Og... CC: stable@vger.kernel.org # 6.0+ Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/ctree.h | 5 ++++- fs/btrfs/file.c | 22 ++++++++++++++++------ fs/btrfs/inode.c | 14 +++++++++++--- 3 files changed, 31 insertions(+), 10 deletions(-)
--- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3407,7 +3407,10 @@ ssize_t btrfs_encoded_read(struct kiocb ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from, const struct btrfs_ioctl_encoded_io_args *encoded);
-ssize_t btrfs_dio_rw(struct kiocb *iocb, struct iov_iter *iter, size_t done_before); +ssize_t btrfs_dio_read(struct kiocb *iocb, struct iov_iter *iter, + size_t done_before); +struct iomap_dio *btrfs_dio_write(struct kiocb *iocb, struct iov_iter *iter, + size_t done_before);
extern const struct dentry_operations btrfs_dentry_operations;
--- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1889,6 +1889,7 @@ static ssize_t btrfs_direct_write(struct loff_t endbyte; ssize_t err; unsigned int ilock_flags = 0; + struct iomap_dio *dio;
if (iocb->ki_flags & IOCB_NOWAIT) ilock_flags |= BTRFS_ILOCK_TRY; @@ -1949,11 +1950,22 @@ relock: * So here we disable page faults in the iov_iter and then retry if we * got -EFAULT, faulting in the pages before the retry. */ -again: from->nofault = true; - err = btrfs_dio_rw(iocb, from, written); + dio = btrfs_dio_write(iocb, from, written); from->nofault = false;
+ /* + * iomap_dio_complete() will call btrfs_sync_file() if we have a dsync + * iocb, and that needs to lock the inode. So unlock it before calling + * iomap_dio_complete() to avoid a deadlock. + */ + btrfs_inode_unlock(inode, ilock_flags); + + if (IS_ERR_OR_NULL(dio)) + err = PTR_ERR_OR_ZERO(dio); + else + err = iomap_dio_complete(dio); + /* No increment (+=) because iomap returns a cumulative value. */ if (err > 0) written = err; @@ -1979,12 +1991,10 @@ again: } else { fault_in_iov_iter_readable(from, left); prev_left = left; - goto again; + goto relock; } }
- btrfs_inode_unlock(inode, ilock_flags); - /* * If 'err' is -ENOTBLK or we have not written all data, then it means * we must fallback to buffered IO. @@ -3787,7 +3797,7 @@ again: */ pagefault_disable(); to->nofault = true; - ret = btrfs_dio_rw(iocb, to, read); + ret = btrfs_dio_read(iocb, to, read); to->nofault = false; pagefault_enable();
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8241,13 +8241,21 @@ static const struct iomap_dio_ops btrfs_ .bio_set = &btrfs_dio_bioset, };
-ssize_t btrfs_dio_rw(struct kiocb *iocb, struct iov_iter *iter, size_t done_before) +ssize_t btrfs_dio_read(struct kiocb *iocb, struct iov_iter *iter, size_t done_before) { struct btrfs_dio_data data;
return iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dio_ops, - IOMAP_DIO_PARTIAL | IOMAP_DIO_NOSYNC, - &data, done_before); + IOMAP_DIO_PARTIAL, &data, done_before); +} + +struct iomap_dio *btrfs_dio_write(struct kiocb *iocb, struct iov_iter *iter, + size_t done_before) +{ + struct btrfs_dio_data data; + + return __iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dio_ops, + IOMAP_DIO_PARTIAL, &data, done_before); }
static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
From: Josef Bacik josef@toxicpanda.com
commit 968b71583130b6104c9f33ba60446d598e327a8b upstream.
We have been seeing the following panic in production
kernel BUG at fs/btrfs/tree-mod-log.c:677! invalid opcode: 0000 [#1] SMP RIP: 0010:tree_mod_log_rewind+0x1b4/0x200 RSP: 0000:ffffc9002c02f890 EFLAGS: 00010293 RAX: 0000000000000003 RBX: ffff8882b448c700 RCX: 0000000000000000 RDX: 0000000000008000 RSI: 00000000000000a7 RDI: ffff88877d831c00 RBP: 0000000000000002 R08: 000000000000009f R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000100c40 R12: 0000000000000001 R13: ffff8886c26d6a00 R14: ffff88829f5424f8 R15: ffff88877d831a00 FS: 00007fee1d80c780(0000) GS:ffff8890400c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fee1963a020 CR3: 0000000434f33002 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: btrfs_get_old_root+0x12b/0x420 btrfs_search_old_slot+0x64/0x2f0 ? tree_mod_log_oldest_root+0x3d/0xf0 resolve_indirect_ref+0xfd/0x660 ? ulist_alloc+0x31/0x60 ? kmem_cache_alloc_trace+0x114/0x2c0 find_parent_nodes+0x97a/0x17e0 ? ulist_alloc+0x30/0x60 btrfs_find_all_roots_safe+0x97/0x150 iterate_extent_inodes+0x154/0x370 ? btrfs_search_path_in_tree+0x240/0x240 iterate_inodes_from_logical+0x98/0xd0 ? btrfs_search_path_in_tree+0x240/0x240 btrfs_ioctl_logical_to_ino+0xd9/0x180 btrfs_ioctl+0xe2/0x2ec0 ? __mod_memcg_lruvec_state+0x3d/0x280 ? do_sys_openat2+0x6d/0x140 ? kretprobe_dispatcher+0x47/0x70 ? kretprobe_rethook_handler+0x38/0x50 ? rethook_trampoline_handler+0x82/0x140 ? arch_rethook_trampoline_callback+0x3b/0x50 ? kmem_cache_free+0xfb/0x270 ? do_sys_openat2+0xd5/0x140 __x64_sys_ioctl+0x71/0xb0 do_syscall_64+0x2d/0x40
Which is this code in tree_mod_log_rewind()
switch (tm->op) { case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING: BUG_ON(tm->slot < n);
This occurs because we replay the nodes in order that they happened, and when we do a REPLACE we will log a REMOVE_WHILE_FREEING for every slot, starting at 0. 'n' here is the number of items in this block, which in this case was 1, but we had 2 REMOVE_WHILE_FREEING operations.
The actual root cause of this was that we were replaying operations for a block that shouldn't have been replayed. Consider the following sequence of events
1. We have an already modified root, and we do a btrfs_get_tree_mod_seq(). 2. We begin removing items from this root, triggering KEY_REPLACE for it's child slots. 3. We remove one of the 2 children this root node points to, thus triggering the root node promotion of the remaining child, and freeing this node. 4. We modify a new root, and re-allocate the above node to the root node of this other root.
The tree mod log looks something like this
logical 0 op KEY_REPLACE (slot 1) seq 2 logical 0 op KEY_REMOVE (slot 1) seq 3 logical 0 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 4 logical 4096 op LOG_ROOT_REPLACE (old logical 0) seq 5 logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 1) seq 6 logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 7 logical 0 op LOG_ROOT_REPLACE (old logical 8192) seq 8
From here the bug is triggered by the following steps
1. Call btrfs_get_old_root() on the new_root. 2. We call tree_mod_log_oldest_root(btrfs_root_node(new_root)), which is currently logical 0. 3. tree_mod_log_oldest_root() calls tree_mod_log_search_oldest(), which gives us the KEY_REPLACE seq 2, and since that's not a LOG_ROOT_REPLACE we incorrectly believe that we don't have an old root, because we expect that the most recent change should be a LOG_ROOT_REPLACE. 4. Back in tree_mod_log_oldest_root() we don't have a LOG_ROOT_REPLACE, so we don't set old_root, we simply use our existing extent buffer. 5. Since we're using our existing extent buffer (logical 0) we call tree_mod_log_search(0) in order to get the newest change to start the rewind from, which ends up being the LOG_ROOT_REPLACE at seq 8. 6. Again since we didn't find an old_root we simply clone logical 0 at it's current state. 7. We call tree_mod_log_rewind() with the cloned extent buffer. 8. Set n = btrfs_header_nritems(logical 0), which would be whatever the original nritems was when we COWed the original root, say for this example it's 2. 9. We start from the newest operation and work our way forward, so we see LOG_ROOT_REPLACE which we ignore. 10. Next we see KEY_REMOVE_WHILE_FREEING for slot 0, which triggers the BUG_ON(tm->slot < n), because it expects if we've done this we have a completely empty extent buffer to replay completely.
The correct thing would be to find the first LOG_ROOT_REPLACE, and then get the old_root set to logical 8192. In fact making that change fixes this particular problem.
However consider the much more complicated case. We have a child node in this tree and the above situation. In the above case we freed one of the child blocks at the seq 3 operation. If this block was also re-allocated and got new tree mod log operations we would have a different problem. btrfs_search_old_slot(orig root) would get down to the logical 0 root that still pointed at that node. However in btrfs_search_old_slot() we call tree_mod_log_rewind(buf) directly. This is not context aware enough to know which operations we should be replaying. If the block was re-allocated multiple times we may only want to replay a range of operations, and determining what that range is isn't possible to determine.
We could maybe solve this by keeping track of which root the node belonged to at every tree mod log operation, and then passing this around to make sure we're only replaying operations that relate to the root we're trying to rewind.
However there's a simpler way to solve this problem, simply disallow reallocations if we have currently running tree mod log users. We already do this for leaf's, so we're simply expanding this to nodes as well. This is a relatively uncommon occurrence, and the problem is complicated enough I'm worried that we will still have corner cases in the reallocation case. So fix this in the most straightforward way possible.
Fixes: bd989ba359f2 ("Btrfs: add tree modification log functions") CC: stable@vger.kernel.org # 3.3+ Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Josef Bacik josef@toxicpanda.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent-tree.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-)
--- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3294,21 +3294,22 @@ void btrfs_free_tree_block(struct btrfs_ }
/* - * If this is a leaf and there are tree mod log users, we may - * have recorded mod log operations that point to this leaf. - * So we must make sure no one reuses this leaf's extent before - * mod log operations are applied to a node, otherwise after - * rewinding a node using the mod log operations we get an - * inconsistent btree, as the leaf's extent may now be used as - * a node or leaf for another different btree. + * If there are tree mod log users we may have recorded mod log + * operations for this node. If we re-allocate this node we + * could replay operations on this node that happened when it + * existed in a completely different root. For example if it + * was part of root A, then was reallocated to root B, and we + * are doing a btrfs_old_search_slot(root b), we could replay + * operations that happened when the block was part of root A, + * giving us an inconsistent view of the btree. + * * We are safe from races here because at this point no other * node or root points to this extent buffer, so if after this - * check a new tree mod log user joins, it will not be able to - * find a node pointing to this leaf and record operations that - * point to this leaf. + * check a new tree mod log user joins we will not have an + * existing log of operations on this node that we have to + * contend with. */ - if (btrfs_header_level(buf) == 0 && - test_bit(BTRFS_FS_TREE_MOD_LOG_USERS, &fs_info->flags)) + if (test_bit(BTRFS_FS_TREE_MOD_LOG_USERS, &fs_info->flags)) must_pin = true;
if (must_pin || btrfs_is_zoned(fs_info)) {
From: David Sterba dsterba@suse.com
commit 2398091f9c2c8e0040f4f9928666787a3e8108a7 upstream.
The type of parameter generation has been u32 since the beginning, however all callers pass a u64 generation, so unify the types to prevent potential loss.
CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/export.c | 2 +- fs/btrfs/export.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/fs/btrfs/export.c +++ b/fs/btrfs/export.c @@ -58,7 +58,7 @@ static int btrfs_encode_fh(struct inode }
struct dentry *btrfs_get_dentry(struct super_block *sb, u64 objectid, - u64 root_objectid, u32 generation, + u64 root_objectid, u64 generation, int check_generation) { struct btrfs_fs_info *fs_info = btrfs_sb(sb); --- a/fs/btrfs/export.h +++ b/fs/btrfs/export.h @@ -19,7 +19,7 @@ struct btrfs_fid { } __attribute__ ((packed));
struct dentry *btrfs_get_dentry(struct super_block *sb, u64 objectid, - u64 root_objectid, u32 generation, + u64 root_objectid, u64 generation, int check_generation); struct dentry *btrfs_get_parent(struct dentry *child);
From: Qu Wenruo wqu@suse.com
commit 76a66ba101329316a5d7f4275070be22eb85fdf2 upstream.
[BUG] There are two reports (the earliest one from LKP, a more recent one from kernel bugzilla) that we can have some chunks with 0 as sub_stripes.
This will cause divide-by-zero errors at btrfs_rmap_block, which is introduced by a recent kernel patch ac0677348f3c ("btrfs: merge calculations for simple striped profiles in btrfs_rmap_block"):
if (map->type & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID10)) { stripe_nr = stripe_nr * map->num_stripes + i; stripe_nr = div_u64(stripe_nr, map->sub_stripes); <<< }
[CAUSE]
From the more recent report, it has been proven that we have some chunks
with 0 as sub_stripes, mostly caused by older mkfs.
It turns out that the mkfs.btrfs fix is only introduced in 6718ab4d33aa ("btrfs-progs: Initialize sub_stripes to 1 in btrfs_alloc_data_chunk") which is included in v5.4 btrfs-progs release.
So there would be quite some old filesystems with such 0 sub_stripes.
[FIX] Just don't trust the sub_stripes values from disk.
We have a trusted btrfs_raid_array[] to fetch the correct sub_stripes numbers for each profile and that are fixed.
By this, we can keep the compatibility with older filesystems while still avoid divide-by-zero bugs.
Reported-by: kernel test robot oliver.sang@intel.com Reported-by: Viktor Kuzmin kvaster@gmail.com Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216559 Fixes: ac0677348f3c ("btrfs: merge calculations for simple striped profiles in btrfs_rmap_block") CC: stable@vger.kernel.org # 6.0 Reviewed-by: Su Yue glass@fydeos.io Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/volumes.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7029,6 +7029,7 @@ static int read_one_chunk(struct btrfs_k u64 devid; u64 type; u8 uuid[BTRFS_UUID_SIZE]; + int index; int num_stripes; int ret; int i; @@ -7036,6 +7037,7 @@ static int read_one_chunk(struct btrfs_k logical = key->offset; length = btrfs_chunk_length(leaf, chunk); type = btrfs_chunk_type(leaf, chunk); + index = btrfs_bg_flags_to_raid_index(type); num_stripes = btrfs_chunk_num_stripes(leaf, chunk);
#if BITS_PER_LONG == 32 @@ -7089,7 +7091,15 @@ static int read_one_chunk(struct btrfs_k map->io_align = btrfs_chunk_io_align(leaf, chunk); map->stripe_len = btrfs_chunk_stripe_len(leaf, chunk); map->type = type; - map->sub_stripes = btrfs_chunk_sub_stripes(leaf, chunk); + /* + * We can't use the sub_stripes value, as for profiles other than + * RAID10, they may have 0 as sub_stripes for filesystems created by + * older mkfs (<v5.4). + * In that case, it can cause divide-by-zero errors later. + * Since currently sub_stripes is fixed for each profile, let's + * use the trusted value instead. + */ + map->sub_stripes = btrfs_raid_array[index].sub_stripes; map->verified_stripes = 0; em->orig_block_len = btrfs_calc_stripe_length(em); for (i = 0; i < num_stripes; i++) {
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
commit 063b1f21cc9be07291a1f5e227436f353c6d1695 upstream.
After allocation 'dip' is tested instead of 'dip->csums'. Fix it.
Fixes: 642c5d34da53 ("btrfs: allocate the btrfs_dio_private as part of the iomap dio bio") CC: stable@vger.kernel.org # 5.19+ Reviewed-by: Nikolay Borisov nborisov@suse.com Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8142,7 +8142,7 @@ static void btrfs_submit_direct(const st */ status = BLK_STS_RESOURCE; dip->csums = kcalloc(nr_sectors, fs_info->csum_size, GFP_NOFS); - if (!dip) + if (!dip->csums) goto out_err;
status = btrfs_lookup_bio_sums(inode, dio_bio, dip->csums);
From: Dan Williams dan.j.williams@intel.com
commit 24f0692bfd41fd207d99c993a5785c3426762046 upstream.
The ACPI CEDT.CFMWS indicates a range of possible address where new CXL regions can appear. Each range is associated with a QTG id (QoS Throttling Group id). For each range + QTG pair that is not covered by a proximity domain in the SRAT, Linux creates a new NUMA node. However, the commit that added the new ranges missed updating the node_possible mask which causes memory_group_register() to fail. Add the new nodes to the nodes_possible mask.
Cc: stable@vger.kernel.org Fixes: fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT") Cc: Alison Schofield alison.schofield@intel.com Cc: Rafael J. Wysocki rafael.j.wysocki@intel.com Reported-by: Vishal Verma vishal.l.verma@intel.com Tested-by: Vishal Verma vishal.l.verma@intel.com Acked-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Reviewed-by: Vishal Verma vishal.l.verma@intel.com Link: https://lore.kernel.org/r/166631003537.1167078.9373680312035292395.stgit@dwi... Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/numa/srat.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/acpi/numa/srat.c +++ b/drivers/acpi/numa/srat.c @@ -327,6 +327,7 @@ static int __init acpi_parse_cfmws(union pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n", node, start, end); } + node_set(node, numa_nodes_parsed);
/* Set the next available fake_pxm value */ (*fake_pxm)++;
From: Dan Williams dan.j.williams@intel.com
commit 4d07ae22e79ebc2d7528bbc69daa53b86981cb3a upstream.
When a cxl_nvdimm object goes through a ->remove() event (device physically removed, nvdimm-bridge disabled, or nvdimm device disabled), then any associated regions must also be disabled. As highlighted by the cxl-create-region.sh test [1], a single device may host multiple regions, but the driver was only tracking one region at a time. This leads to a situation where only the last enabled region per nvdimm device is cleaned up properly. Other regions are leaked, and this also causes cxl_memdev reference leaks.
Fix the tracking by allowing cxl_nvdimm objects to track multiple region associations.
Cc: stable@vger.kernel.org Link: https://github.com/pmem/ndctl/blob/main/test/cxl-create-region.sh [1] Reported-by: Vishal Verma vishal.l.verma@intel.com Fixes: 04ad63f086d1 ("cxl/region: Introduce cxl_pmem_region objects") Reviewed-by: Dave Jiang dave.jiang@intel.com Reviewed-by: Vishal Verma vishal.l.verma@intel.com Link: https://lore.kernel.org/r/166752183647.947915.2045230911503793901.stgit@dwil... Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/cxl/core/pmem.c | 2 drivers/cxl/cxl.h | 2 drivers/cxl/pmem.c | 101 ++++++++++++++++++++++++++++++------------------ 3 files changed, 68 insertions(+), 37 deletions(-)
--- a/drivers/cxl/core/pmem.c +++ b/drivers/cxl/core/pmem.c @@ -188,6 +188,7 @@ static void cxl_nvdimm_release(struct de { struct cxl_nvdimm *cxl_nvd = to_cxl_nvdimm(dev);
+ xa_destroy(&cxl_nvd->pmem_regions); kfree(cxl_nvd); }
@@ -230,6 +231,7 @@ static struct cxl_nvdimm *cxl_nvdimm_all
dev = &cxl_nvd->dev; cxl_nvd->cxlmd = cxlmd; + xa_init(&cxl_nvd->pmem_regions); device_initialize(dev); lockdep_set_class(&dev->mutex, &cxl_nvdimm_key); device_set_pm_not_required(dev); --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -423,7 +423,7 @@ struct cxl_nvdimm { struct device dev; struct cxl_memdev *cxlmd; struct cxl_nvdimm_bridge *bridge; - struct cxl_pmem_region *region; + struct xarray pmem_regions; };
struct cxl_pmem_region_mapping { --- a/drivers/cxl/pmem.c +++ b/drivers/cxl/pmem.c @@ -30,17 +30,20 @@ static void unregister_nvdimm(void *nvdi struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm); struct cxl_nvdimm_bridge *cxl_nvb = cxl_nvd->bridge; struct cxl_pmem_region *cxlr_pmem; + unsigned long index;
device_lock(&cxl_nvb->dev); - cxlr_pmem = cxl_nvd->region; dev_set_drvdata(&cxl_nvd->dev, NULL); - cxl_nvd->region = NULL; - device_unlock(&cxl_nvb->dev); + xa_for_each(&cxl_nvd->pmem_regions, index, cxlr_pmem) { + get_device(&cxlr_pmem->dev); + device_unlock(&cxl_nvb->dev);
- if (cxlr_pmem) { device_release_driver(&cxlr_pmem->dev); put_device(&cxlr_pmem->dev); + + device_lock(&cxl_nvb->dev); } + device_unlock(&cxl_nvb->dev);
nvdimm_delete(nvdimm); cxl_nvd->bridge = NULL; @@ -366,25 +369,49 @@ static int match_cxl_nvdimm(struct devic
static void unregister_nvdimm_region(void *nd_region) { - struct cxl_nvdimm_bridge *cxl_nvb; - struct cxl_pmem_region *cxlr_pmem; + nvdimm_region_delete(nd_region); +} + +static int cxl_nvdimm_add_region(struct cxl_nvdimm *cxl_nvd, + struct cxl_pmem_region *cxlr_pmem) +{ + int rc; + + rc = xa_insert(&cxl_nvd->pmem_regions, (unsigned long)cxlr_pmem, + cxlr_pmem, GFP_KERNEL); + if (rc) + return rc; + + get_device(&cxlr_pmem->dev); + return 0; +} + +static void cxl_nvdimm_del_region(struct cxl_nvdimm *cxl_nvd, + struct cxl_pmem_region *cxlr_pmem) +{ + /* + * It is possible this is called without a corresponding + * cxl_nvdimm_add_region for @cxlr_pmem + */ + cxlr_pmem = xa_erase(&cxl_nvd->pmem_regions, (unsigned long)cxlr_pmem); + if (cxlr_pmem) + put_device(&cxlr_pmem->dev); +} + +static void release_mappings(void *data) +{ int i; + struct cxl_pmem_region *cxlr_pmem = data; + struct cxl_nvdimm_bridge *cxl_nvb = cxlr_pmem->bridge;
- cxlr_pmem = nd_region_provider_data(nd_region); - cxl_nvb = cxlr_pmem->bridge; device_lock(&cxl_nvb->dev); for (i = 0; i < cxlr_pmem->nr_mappings; i++) { struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i]; struct cxl_nvdimm *cxl_nvd = m->cxl_nvd;
- if (cxl_nvd->region) { - put_device(&cxlr_pmem->dev); - cxl_nvd->region = NULL; - } + cxl_nvdimm_del_region(cxl_nvd, cxlr_pmem); } device_unlock(&cxl_nvb->dev); - - nvdimm_region_delete(nd_region); }
static void cxlr_pmem_remove_resource(void *res) @@ -422,7 +449,7 @@ static int cxl_pmem_region_probe(struct if (!cxl_nvb->nvdimm_bus) { dev_dbg(dev, "nvdimm bus not found\n"); rc = -ENXIO; - goto err; + goto out_nvb; }
memset(&mappings, 0, sizeof(mappings)); @@ -431,7 +458,7 @@ static int cxl_pmem_region_probe(struct res = devm_kzalloc(dev, sizeof(*res), GFP_KERNEL); if (!res) { rc = -ENOMEM; - goto err; + goto out_nvb; }
res->name = "Persistent Memory"; @@ -442,11 +469,11 @@ static int cxl_pmem_region_probe(struct
rc = insert_resource(&iomem_resource, res); if (rc) - goto err; + goto out_nvb;
rc = devm_add_action_or_reset(dev, cxlr_pmem_remove_resource, res); if (rc) - goto err; + goto out_nvb;
ndr_desc.res = res; ndr_desc.provider_data = cxlr_pmem; @@ -462,7 +489,7 @@ static int cxl_pmem_region_probe(struct nd_set = devm_kzalloc(dev, sizeof(*nd_set), GFP_KERNEL); if (!nd_set) { rc = -ENOMEM; - goto err; + goto out_nvb; }
ndr_desc.memregion = cxlr->id; @@ -472,9 +499,13 @@ static int cxl_pmem_region_probe(struct info = kmalloc_array(cxlr_pmem->nr_mappings, sizeof(*info), GFP_KERNEL); if (!info) { rc = -ENOMEM; - goto err; + goto out_nvb; }
+ rc = devm_add_action_or_reset(dev, release_mappings, cxlr_pmem); + if (rc) + goto out_nvd; + for (i = 0; i < cxlr_pmem->nr_mappings; i++) { struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i]; struct cxl_memdev *cxlmd = m->cxlmd; @@ -486,7 +517,7 @@ static int cxl_pmem_region_probe(struct dev_dbg(dev, "[%d]: %s: no cxl_nvdimm found\n", i, dev_name(&cxlmd->dev)); rc = -ENODEV; - goto err; + goto out_nvd; }
/* safe to drop ref now with bridge lock held */ @@ -498,10 +529,17 @@ static int cxl_pmem_region_probe(struct dev_dbg(dev, "[%d]: %s: no nvdimm found\n", i, dev_name(&cxlmd->dev)); rc = -ENODEV; - goto err; + goto out_nvd; } - cxl_nvd->region = cxlr_pmem; - get_device(&cxlr_pmem->dev); + + /* + * Pin the region per nvdimm device as those may be released + * out-of-order with respect to the region, and a single nvdimm + * maybe associated with multiple regions + */ + rc = cxl_nvdimm_add_region(cxl_nvd, cxlr_pmem); + if (rc) + goto out_nvd; m->cxl_nvd = cxl_nvd; mappings[i] = (struct nd_mapping_desc) { .nvdimm = nvdimm, @@ -527,27 +565,18 @@ static int cxl_pmem_region_probe(struct nvdimm_pmem_region_create(cxl_nvb->nvdimm_bus, &ndr_desc); if (!cxlr_pmem->nd_region) { rc = -ENOMEM; - goto err; + goto out_nvd; }
rc = devm_add_action_or_reset(dev, unregister_nvdimm_region, cxlr_pmem->nd_region); -out: +out_nvd: kfree(info); +out_nvb: device_unlock(&cxl_nvb->dev); put_device(&cxl_nvb->dev);
return rc; - -err: - dev_dbg(dev, "failed to create nvdimm region\n"); - for (i--; i >= 0; i--) { - nvdimm = mappings[i].nvdimm; - cxl_nvd = nvdimm_provider_data(nvdimm); - put_device(&cxl_nvd->region->dev); - cxl_nvd->region = NULL; - } - goto out; }
static struct cxl_driver cxl_pmem_region_driver = {
From: Vishal Verma vishal.l.verma@intel.com
commit 71ee71d7adcba648077997a29a91158d20c40b09 upstream.
When an intermediate port's decoders have been exhausted by existing regions, and creating a new region with the port in question in it's hierarchical path is attempted, cxl_port_attach_region() fails to find a port decoder (as would be expected), and drops into the failure / cleanup path.
However, during cleanup of the region reference, a sanity check attempts to dereference the decoder, which in the above case didn't exist. This causes a NULL pointer dereference BUG.
To fix this, refactor the decoder allocation and de-allocation into helper routines, and in this 'free' routine, check that the decoder, @cxld, is valid before attempting any operations on it.
Cc: stable@vger.kernel.org Suggested-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Vishal Verma vishal.l.verma@intel.com Reviewed-by: Dave Jiang dave.jiang@intel.com Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Link: https://lore.kernel.org/r/20221101074100.1732003-1-vishal.l.verma@intel.com Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/cxl/core/region.c | 67 ++++++++++++++++++++++++++++------------------ 1 file changed, 41 insertions(+), 26 deletions(-)
--- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -686,18 +686,27 @@ static struct cxl_region_ref *alloc_regi return cxl_rr; }
-static void free_region_ref(struct cxl_region_ref *cxl_rr) +static void cxl_rr_free_decoder(struct cxl_region_ref *cxl_rr) { - struct cxl_port *port = cxl_rr->port; struct cxl_region *cxlr = cxl_rr->region; struct cxl_decoder *cxld = cxl_rr->decoder;
+ if (!cxld) + return; + dev_WARN_ONCE(&cxlr->dev, cxld->region != cxlr, "region mismatch\n"); if (cxld->region == cxlr) { cxld->region = NULL; put_device(&cxlr->dev); } +}
+static void free_region_ref(struct cxl_region_ref *cxl_rr) +{ + struct cxl_port *port = cxl_rr->port; + struct cxl_region *cxlr = cxl_rr->region; + + cxl_rr_free_decoder(cxl_rr); xa_erase(&port->regions, (unsigned long)cxlr); xa_destroy(&cxl_rr->endpoints); kfree(cxl_rr); @@ -728,6 +737,33 @@ static int cxl_rr_ep_add(struct cxl_regi return 0; }
+static int cxl_rr_alloc_decoder(struct cxl_port *port, struct cxl_region *cxlr, + struct cxl_endpoint_decoder *cxled, + struct cxl_region_ref *cxl_rr) +{ + struct cxl_decoder *cxld; + + if (port == cxled_to_port(cxled)) + cxld = &cxled->cxld; + else + cxld = cxl_region_find_decoder(port, cxlr); + if (!cxld) { + dev_dbg(&cxlr->dev, "%s: no decoder available\n", + dev_name(&port->dev)); + return -EBUSY; + } + + if (cxld->region) { + dev_dbg(&cxlr->dev, "%s: %s already attached to %s\n", + dev_name(&port->dev), dev_name(&cxld->dev), + dev_name(&cxld->region->dev)); + return -EBUSY; + } + + cxl_rr->decoder = cxld; + return 0; +} + /** * cxl_port_attach_region() - track a region's interest in a port by endpoint * @port: port to add a new region reference 'struct cxl_region_ref' @@ -794,12 +830,6 @@ static int cxl_port_attach_region(struct cxl_rr->nr_targets++; nr_targets_inc = true; } - - /* - * The decoder for @cxlr was allocated when the region was first - * attached to @port. - */ - cxld = cxl_rr->decoder; } else { cxl_rr = alloc_region_ref(port, cxlr); if (IS_ERR(cxl_rr)) { @@ -810,26 +840,11 @@ static int cxl_port_attach_region(struct } nr_targets_inc = true;
- if (port == cxled_to_port(cxled)) - cxld = &cxled->cxld; - else - cxld = cxl_region_find_decoder(port, cxlr); - if (!cxld) { - dev_dbg(&cxlr->dev, "%s: no decoder available\n", - dev_name(&port->dev)); - goto out_erase; - } - - if (cxld->region) { - dev_dbg(&cxlr->dev, "%s: %s already attached to %s\n", - dev_name(&port->dev), dev_name(&cxld->dev), - dev_name(&cxld->region->dev)); - rc = -EBUSY; + rc = cxl_rr_alloc_decoder(port, cxlr, cxled, cxl_rr); + if (rc) goto out_erase; - } - - cxl_rr->decoder = cxld; } + cxld = cxl_rr->decoder;
rc = cxl_rr_ep_add(cxl_rr, cxled); if (rc) {
From: Dan Williams dan.j.williams@intel.com
commit a90accb358ae33ea982a35595573f7a045993f8b upstream.
Some regions may not have any address space allocated. Skip them when validating HPA order otherwise a crash like the following may result:
devm_cxl_add_region: cxl_acpi cxl_acpi.0: decoder3.4: created region9 BUG: kernel NULL pointer dereference, address: 0000000000000000 [..] RIP: 0010:store_targetN+0x655/0x1740 [cxl_core] [..] Call Trace: <TASK> kernfs_fop_write_iter+0x144/0x200 vfs_write+0x24a/0x4d0 ksys_write+0x69/0xf0 do_syscall_64+0x3a/0x90
store_targetN+0x655/0x1740: alloc_region_ref at drivers/cxl/core/region.c:676 (inlined by) cxl_port_attach_region at drivers/cxl/core/region.c:850 (inlined by) cxl_region_attach at drivers/cxl/core/region.c:1290 (inlined by) attach_target at drivers/cxl/core/region.c:1410 (inlined by) store_targetN at drivers/cxl/core/region.c:1453
Cc: stable@vger.kernel.org Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Reviewed-by: Vishal Verma vishal.l.verma@intel.com Reviewed-by: Dave Jiang dave.jiang@intel.com Link: https://lore.kernel.org/r/166752182461.947915.497032805239915067.stgit@dwill... Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/cxl/core/region.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -657,6 +657,9 @@ static struct cxl_region_ref *alloc_regi xa_for_each(&port->regions, index, iter) { struct cxl_region_params *ip = &iter->region->params;
+ if (!ip->res) + continue; + if (ip->res->start > p->res->start) { dev_dbg(&cxlr->dev, "%s: HPA order violation %s:%pr vs %pr\n",
From: Dan Williams dan.j.williams@intel.com
commit 0d9e734018d70cecf79e2e4c6082167160a0f13f upstream.
When a region is deleted any targets that have been previously assigned to that region hold references to it. Trigger those references to drop by detaching all targets at unregister_region() time.
Otherwise that region object will leak as userspace has lost the ability to detach targets once region sysfs is torn down.
Cc: stable@vger.kernel.org Fixes: b9686e8c8e39 ("cxl/region: Enable the assignment of endpoint decoders to regions") Reviewed-by: Dave Jiang dave.jiang@intel.com Reviewed-by: Vishal Verma vishal.l.verma@intel.com Link: https://lore.kernel.org/r/166752183055.947915.17681995648556534844.stgit@dwi... Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/cxl/core/region.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1556,8 +1556,19 @@ static struct cxl_region *to_cxl_region( static void unregister_region(void *dev) { struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + int i;
device_del(dev); + + /* + * Now that region sysfs is shutdown, the parameter block is now + * read-only, so no need to hold the region rwsem to access the + * region parameters. + */ + for (i = 0; i < p->interleave_ways; i++) + detach_target(cxlr, i); + cxl_region_iomem_release(cxlr); put_device(dev); }
From: Dan Williams dan.j.williams@intel.com
commit e4f6dfa9ef756a3934a4caf618b1e86e9e8e21d0 upstream.
When programming port decode targets, the algorithm wants to ensure that two devices are compatible to be programmed as peers beneath a given port. A compatible peer is a target that shares the same dport, and where that target's interleave position also routes it to the same dport. Compatibility is determined by the device's interleave position being >= to distance. For example, if a given dport can only map every Nth position then positions less than N away from the last target programmed are incompatible.
The @distance for the host-bridge's cxl_port in a simple dual-ported host-bridge configuration with 2 direct-attached devices is 1, i.e. An x2 region divided by 2 dports to reach 2 region targets.
An x4 region under an x2 host-bridge would need 2 intervening switches where the @distance at the host bridge level is 2 (x4 region divided by 2 switches to reach 4 devices).
However, the distance between peers underneath a single ported host-bridge is always zero because there is no limit to the number of devices that can be mapped. In other words, there are no decoders to program in a passthrough, all descendants are mapped and distance only starts matters for the intervening descendant ports of the passthrough port.
Add tracking for the number of dports mapped to a port, and use that to detect the passthrough case for calculating @distance.
Cc: stable@vger.kernel.org Reported-by: Bobo WL lmw.bobo@gmail.com Reported-by: Jonathan Cameron Jonathan.Cameron@huawei.com Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Reviewed-by: Vishal Verma vishal.l.verma@intel.com Link: https://lore.kernel.org/r/166752185440.947915.6617495912508299445.stgit@dwil... Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/cxl/core/port.c | 11 +++++++++-- drivers/cxl/core/region.c | 9 ++++++++- drivers/cxl/cxl.h | 2 ++ 3 files changed, 19 insertions(+), 3 deletions(-)
--- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -811,6 +811,7 @@ static struct cxl_dport *find_dport(stru static int add_dport(struct cxl_port *port, struct cxl_dport *new) { struct cxl_dport *dup; + int rc;
device_lock_assert(&port->dev); dup = find_dport(port, new->port_id); @@ -821,8 +822,14 @@ static int add_dport(struct cxl_port *po dev_name(dup->dport)); return -EBUSY; } - return xa_insert(&port->dports, (unsigned long)new->dport, new, - GFP_KERNEL); + + rc = xa_insert(&port->dports, (unsigned long)new->dport, new, + GFP_KERNEL); + if (rc) + return rc; + + port->nr_dports++; + return 0; }
/* --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -989,7 +989,14 @@ static int cxl_port_setup_targets(struct if (cxl_rr->nr_targets_set) { int i, distance;
- distance = p->nr_targets / cxl_rr->nr_targets; + /* + * Passthrough ports impose no distance requirements between + * peers + */ + if (port->nr_dports == 1) + distance = 0; + else + distance = p->nr_targets / cxl_rr->nr_targets; for (i = 0; i < cxl_rr->nr_targets_set; i++) if (ep->dport == cxlsd->target[i]) { rc = check_last_peer(cxled, ep, cxl_rr, --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -457,6 +457,7 @@ struct cxl_pmem_region { * @regions: cxl_region_ref instances, regions mapped by this port * @parent_dport: dport that points to this port in the parent * @decoder_ida: allocator for decoder ids + * @nr_dports: number of entries in @dports * @hdm_end: track last allocated HDM decoder instance for allocation ordering * @commit_end: cursor to track highest committed decoder for commit ordering * @component_reg_phys: component register capability base address (optional) @@ -475,6 +476,7 @@ struct cxl_port { struct xarray regions; struct cxl_dport *parent_dport; struct ida decoder_ida; + int nr_dports; int hdm_end; int commit_end; resource_size_t component_reg_phys;
From: Li Huafei lihuafei1@huawei.com
commit 0e792b89e6800cd9cb4757a76a96f7ef3e8b6294 upstream.
KASAN reported a use-after-free with ftrace ops [1]. It was found from vmcore that perf had registered two ops with the same content successively, both dynamic. After unregistering the second ops, a use-after-free occurred.
In ftrace_shutdown(), when the second ops is unregistered, the FTRACE_UPDATE_CALLS command is not set because there is another enabled ops with the same content. Also, both ops are dynamic and the ftrace callback function is ftrace_ops_list_func, so the FTRACE_UPDATE_TRACE_FUNC command will not be set. Eventually the value of 'command' will be 0 and ftrace_shutdown() will skip the rcu synchronization.
However, ftrace may be activated. When the ops is released, another CPU may be accessing the ops. Add the missing synchronization to fix this problem.
[1] BUG: KASAN: use-after-free in __ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline] BUG: KASAN: use-after-free in ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049 Read of size 8 at addr ffff56551965bbc8 by task syz-executor.2/14468
CPU: 1 PID: 14468 Comm: syz-executor.2 Not tainted 5.10.0 #7 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x40c arch/arm64/kernel/stacktrace.c:132 show_stack+0x30/0x40 arch/arm64/kernel/stacktrace.c:196 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b4/0x248 lib/dump_stack.c:118 print_address_description.constprop.0+0x28/0x48c mm/kasan/report.c:387 __kasan_report mm/kasan/report.c:547 [inline] kasan_report+0x118/0x210 mm/kasan/report.c:564 check_memory_region_inline mm/kasan/generic.c:187 [inline] __asan_load8+0x98/0xc0 mm/kasan/generic.c:253 __ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline] ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049 ftrace_graph_call+0x0/0x4 __might_sleep+0x8/0x100 include/linux/perf_event.h:1170 __might_fault mm/memory.c:5183 [inline] __might_fault+0x58/0x70 mm/memory.c:5171 do_strncpy_from_user lib/strncpy_from_user.c:41 [inline] strncpy_from_user+0x1f4/0x4b0 lib/strncpy_from_user.c:139 getname_flags+0xb0/0x31c fs/namei.c:149 getname+0x2c/0x40 fs/namei.c:209 [...]
Allocated by task 14445: kasan_save_stack+0x24/0x50 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc mm/kasan/common.c:479 [inline] __kasan_kmalloc.constprop.0+0x110/0x13c mm/kasan/common.c:449 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:493 kmem_cache_alloc_trace+0x440/0x924 mm/slub.c:2950 kmalloc include/linux/slab.h:563 [inline] kzalloc include/linux/slab.h:675 [inline] perf_event_alloc.part.0+0xb4/0x1350 kernel/events/core.c:11230 perf_event_alloc kernel/events/core.c:11733 [inline] __do_sys_perf_event_open kernel/events/core.c:11831 [inline] __se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723 __arm64_sys_perf_event_open+0x6c/0x80 kernel/events/core.c:11723 [...]
Freed by task 14445: kasan_save_stack+0x24/0x50 mm/kasan/common.c:48 kasan_set_track+0x24/0x34 mm/kasan/common.c:56 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:358 __kasan_slab_free.part.0+0x11c/0x1b0 mm/kasan/common.c:437 __kasan_slab_free mm/kasan/common.c:445 [inline] kasan_slab_free+0x2c/0x40 mm/kasan/common.c:446 slab_free_hook mm/slub.c:1569 [inline] slab_free_freelist_hook mm/slub.c:1608 [inline] slab_free mm/slub.c:3179 [inline] kfree+0x12c/0xc10 mm/slub.c:4176 perf_event_alloc.part.0+0xa0c/0x1350 kernel/events/core.c:11434 perf_event_alloc kernel/events/core.c:11733 [inline] __do_sys_perf_event_open kernel/events/core.c:11831 [inline] __se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723 [...]
Link: https://lore.kernel.org/linux-trace-kernel/20221103031010.166498-1-lihuafei1...
Fixes: edb096e00724f ("ftrace: Fix memleak when unregistering dynamic ops when tracing disabled") Cc: stable@vger.kernel.org Suggested-by: Steven Rostedt rostedt@goodmis.org Signed-off-by: Li Huafei lihuafei1@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/ftrace.c | 16 +++------------- 1 file changed, 3 insertions(+), 13 deletions(-)
--- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -3031,18 +3031,8 @@ int ftrace_shutdown(struct ftrace_ops *o command |= FTRACE_UPDATE_TRACE_FUNC; }
- if (!command || !ftrace_enabled) { - /* - * If these are dynamic or per_cpu ops, they still - * need their data freed. Since, function tracing is - * not currently active, we can just free them - * without synchronizing all CPUs. - */ - if (ops->flags & FTRACE_OPS_FL_DYNAMIC) - goto free_ops; - - return 0; - } + if (!command || !ftrace_enabled) + goto out;
/* * If the ops uses a trampoline, then it needs to be @@ -3079,6 +3069,7 @@ int ftrace_shutdown(struct ftrace_ops *o removed_ops = NULL; ops->flags &= ~FTRACE_OPS_FL_REMOVING;
+out: /* * Dynamic ops may be freed, we must make sure that all * callers are done before leaving this function. @@ -3106,7 +3097,6 @@ int ftrace_shutdown(struct ftrace_ops *o if (IS_ENABLED(CONFIG_PREEMPTION)) synchronize_rcu_tasks();
- free_ops: ftrace_trampoline_free(ops); }
From: Masami Hiramatsu (Google) mhiramat@kernel.org
commit 61b304b73ab4b48b1cd7796efe42a570e2a0e0fc upstream.
Since commit ab51e15d535e ("fprobe: Introduce FPROBE_FL_KPROBE_SHARED flag for fprobe") introduced fprobe_kprobe_handler() for fprobe::ops::func, unregister_fprobe() fails to unregister the registered if user specifies FPROBE_FL_KPROBE_SHARED flag. Moreover, __register_ftrace_function() is possible to change the ftrace_ops::func, thus we have to check fprobe::ops::saved_func instead.
To check it correctly, it should confirm the fprobe::ops::saved_func is either fprobe_handler() or fprobe_kprobe_handler().
Link: https://lore.kernel.org/all/166677683946.1459107.15997653945538644683.stgit@...
Fixes: cad9931f64dc ("fprobe: Add ftrace based probe APIs") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/fprobe.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/trace/fprobe.c +++ b/kernel/trace/fprobe.c @@ -301,7 +301,8 @@ int unregister_fprobe(struct fprobe *fp) { int ret;
- if (!fp || fp->ops.func != fprobe_handler) + if (!fp || (fp->ops.saved_func != fprobe_handler && + fp->ops.saved_func != fprobe_kprobe_handler)) return -EINVAL;
/*
From: Rafael Mendonca rafaelmendsr@gmail.com
commit d05ea35e7eea14d32f29fd688d3daeb9089de1a5 upstream.
Check if fp->rethook succeeded to be allocated. Otherwise, if rethook_alloc() fails, then we end up dereferencing a NULL pointer in rethook_add_node().
Link: https://lore.kernel.org/all/20221025031209.954836-1-rafaelmendsr@gmail.com/
Fixes: 5b0ab78998e3 ("fprobe: Add exit_handler support") Cc: stable@vger.kernel.org Signed-off-by: Rafael Mendonca rafaelmendsr@gmail.com Acked-by: Steven Rostedt (Google) rostedt@goodmis.org Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/fprobe.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/kernel/trace/fprobe.c +++ b/kernel/trace/fprobe.c @@ -141,6 +141,8 @@ static int fprobe_init_rethook(struct fp return -E2BIG;
fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler); + if (!fp->rethook) + return -ENOMEM; for (i = 0; i < size; i++) { struct fprobe_rethook_node *node;
From: Shang XiaoJing shangxiaojing@huawei.com
commit 66f0919c953ef7b55e5ab94389a013da2ce80a2c upstream.
test_gen_kprobe_cmd() only free buf in fail path, hence buf will leak when there is no failure. Move kfree(buf) from fail path to common path to prevent the memleak. The same reason and solution in test_gen_kretprobe_cmd().
unreferenced object 0xffff888143b14000 (size 2048): comm "insmod", pid 52490, jiffies 4301890980 (age 40.553s) hex dump (first 32 bytes): 70 3a 6b 70 72 6f 62 65 73 2f 67 65 6e 5f 6b 70 p:kprobes/gen_kp 72 6f 62 65 5f 74 65 73 74 20 64 6f 5f 73 79 73 robe_test do_sys backtrace: [<000000006d7b836b>] kmalloc_trace+0x27/0xa0 [<0000000009528b5b>] 0xffffffffa059006f [<000000008408b580>] do_one_initcall+0x87/0x2a0 [<00000000c4980a7e>] do_init_module+0xdf/0x320 [<00000000d775aad0>] load_module+0x3006/0x3390 [<00000000e9a74b80>] __do_sys_finit_module+0x113/0x1b0 [<000000003726480d>] do_syscall_64+0x35/0x80 [<000000003441e93b>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Link: https://lore.kernel.org/all/20221102072954.26555-1-shangxiaojing@huawei.com/
Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module") Cc: stable@vger.kernel.org Signed-off-by: Shang XiaoJing shangxiaojing@huawei.com Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/kprobe_event_gen_test.c | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-)
--- a/kernel/trace/kprobe_event_gen_test.c +++ b/kernel/trace/kprobe_event_gen_test.c @@ -100,20 +100,20 @@ static int __init test_gen_kprobe_cmd(vo KPROBE_GEN_TEST_FUNC, KPROBE_GEN_TEST_ARG0, KPROBE_GEN_TEST_ARG1); if (ret) - goto free; + goto out;
/* Use kprobe_event_add_fields to add the rest of the fields */
ret = kprobe_event_add_fields(&cmd, KPROBE_GEN_TEST_ARG2, KPROBE_GEN_TEST_ARG3); if (ret) - goto free; + goto out;
/* * This actually creates the event. */ ret = kprobe_event_gen_cmd_end(&cmd); if (ret) - goto free; + goto out;
/* * Now get the gen_kprobe_test event file. We need to prevent @@ -136,13 +136,11 @@ static int __init test_gen_kprobe_cmd(vo goto delete; } out: + kfree(buf); return ret; delete: /* We got an error after creating the event, delete it */ ret = kprobe_event_delete("gen_kprobe_test"); - free: - kfree(buf); - goto out; }
@@ -170,14 +168,14 @@ static int __init test_gen_kretprobe_cmd KPROBE_GEN_TEST_FUNC, "$retval"); if (ret) - goto free; + goto out;
/* * This actually creates the event. */ ret = kretprobe_event_gen_cmd_end(&cmd); if (ret) - goto free; + goto out;
/* * Now get the gen_kretprobe_test event file. We need to @@ -201,13 +199,11 @@ static int __init test_gen_kretprobe_cmd goto delete; } out: + kfree(buf); return ret; delete: /* We got an error after creating the event, delete it */ ret = kprobe_event_delete("gen_kretprobe_test"); - free: - kfree(buf); - goto out; }
From: Li Qiang liq3ea@163.com
commit 4a6f316d6855a434f56dbbeba05e14c01acde8f8 upstream.
In aggregate kprobe case, when arm_kprobe failed, we need set the kp->flags with KPROBE_FLAG_DISABLED again. If not, the 'kp' kprobe will been considered as enabled but it actually not enabled.
Link: https://lore.kernel.org/all/20220902155820.34755-1-liq3ea@163.com/
Fixes: 12310e343755 ("kprobes: Propagate error from arm_kprobe_ftrace()") Cc: stable@vger.kernel.org Signed-off-by: Li Qiang liq3ea@163.com Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/kprobes.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -2425,8 +2425,11 @@ int enable_kprobe(struct kprobe *kp) if (!kprobes_all_disarmed && kprobe_disabled(p)) { p->flags &= ~KPROBE_FLAG_DISABLED; ret = arm_kprobe(p); - if (ret) + if (ret) { p->flags |= KPROBE_FLAG_DISABLED; + if (p != kp) + kp->flags |= KPROBE_FLAG_DISABLED; + } } out: mutex_unlock(&kprobe_mutex);
From: Steven Rostedt (Google) rostedt@goodmis.org
commit 7433632c9ff68a991bd0bc38cabf354e9d2de410 upstream.
On some machines the number of listed CPUs may be bigger than the actual CPUs that exist. The tracing subsystem allocates a per_cpu directory with access to the per CPU ring buffer via a cpuX file. But to save space, the ring buffer will only allocate buffers for online CPUs, even though the CPU array will be as big as the nr_cpu_ids.
With the addition of waking waiters on the ring buffer when closing the file, the ring_buffer_wake_waiters() now needs to make sure that the buffer is allocated (with the irq_work allocated with it) before trying to wake waiters, as it will cause a NULL pointer dereference.
While debugging this, I added a NULL check for the buffer itself (which is OK to do), and also NULL pointer checks against buffer->buffers (which is not fine, and will WARN) as well as making sure the CPU number passed in is within the nr_cpu_ids (which is also not fine if it isn't).
Link: https://lore.kernel.org/all/87h6zklb6n.wl-tiwai@suse.de/ Link: https://lore.kernel.org/all/CAM6Wdxc0KRJMXVAA0Y=u6Jh2V=uWB-_Fn6M4xRuNppfXzL1... Link: https://lkml.kernel.org/linux-trace-kernel/20221101191009.1e7378c8@rorschach...
Cc: stable@vger.kernel.org Cc: Steven Noonan steven.noonan@gmail.com Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1204705 Reported-by: Takashi Iwai tiwai@suse.de Reported-by: Roland Ruckerbauer roland.rucky@gmail.com Fixes: f3ddb74ad079 ("tracing: Wake up ring buffer waiters on closing of the file") Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/ring_buffer.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -937,6 +937,9 @@ void ring_buffer_wake_waiters(struct tra struct ring_buffer_per_cpu *cpu_buffer; struct rb_irq_work *rbwork;
+ if (!buffer) + return; + if (cpu == RING_BUFFER_ALL_CPUS) {
/* Wake up individual ones too. One level recursion */ @@ -945,7 +948,15 @@ void ring_buffer_wake_waiters(struct tra
rbwork = &buffer->irq_work; } else { + if (WARN_ON_ONCE(!buffer->buffers)) + return; + if (WARN_ON_ONCE(cpu >= nr_cpu_ids)) + return; + cpu_buffer = buffer->buffers[cpu]; + /* The CPU buffer may not have been initialized yet */ + if (!cpu_buffer) + return; rbwork = &cpu_buffer->irq_work; }
From: Rasmus Villemoes linux@rasmusvillemoes.dk
commit b3f4f51ea68a495f8a5956064c33dce711a2df91 upstream.
The C standard says that memcmp() must treat the buffers as consisting of "unsigned chars". If char happens to be unsigned, the casts are ok, but then obviously the c1 variable can never contain a negative value. And when char is signed, the casts are wrong, and there's still a problem with using an 8-bit quantity to hold the difference, because that can range from -255 to +255.
For example, assuming char is signed, comparing two 1-byte buffers, one containing 0x00 and another 0x80, the current implementation would return -128 for both memcmp(a, b, 1) and memcmp(b, a, 1), whereas one of those should of course return something positive.
Signed-off-by: Rasmus Villemoes linux@rasmusvillemoes.dk Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc") Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Willy Tarreau w@1wt.eu Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/include/nolibc/string.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/tools/include/nolibc/string.h +++ b/tools/include/nolibc/string.h @@ -19,9 +19,9 @@ static __attribute__((unused)) int memcmp(const void *s1, const void *s2, size_t n) { size_t ofs = 0; - char c1 = 0; + int c1 = 0;
- while (ofs < n && !(c1 = ((char *)s1)[ofs] - ((char *)s2)[ofs])) { + while (ofs < n && !(c1 = ((unsigned char *)s1)[ofs] - ((unsigned char *)s2)[ofs])) { ofs++; } return c1;
From: Zheng Yejian zhengyejian1@huawei.com
commit a635beeacc6d56d2b71c39e6c0103f85b53d108e upstream.
After commit 4f36c2d85ced ("tracing: Increase tracing map KEYS_MAX size"), 'keys' supports up to three fields.
Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Link: https://lore.kernel.org/r/20221017103806.2479139-1-zhengyejian1@huawei.com Signed-off-by: Jonathan Corbet corbet@lwn.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/trace/histogram.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/Documentation/trace/histogram.rst +++ b/Documentation/trace/histogram.rst @@ -39,7 +39,7 @@ Documentation written by Tom Zanussi will use the event's kernel stacktrace as the key. The keywords 'keys' or 'key' can be used to specify keys, and the keywords 'values', 'vals', or 'val' can be used to specify values. Compound - keys consisting of up to two fields can be specified by the 'keys' + keys consisting of up to three fields can be specified by the 'keys' keyword. Hashing a compound key produces a unique entry in the table for each unique combination of component keys, and can be useful for providing more fine-grained summaries of event data.
From: Gaosheng Cui cuigaosheng1@huawei.com
commit 8cf0a1bc12870d148ae830a4ba88cfdf0e879cee upstream.
In cap_inode_getsecurity(), we will use vfs_getxattr_alloc() to complete the memory allocation of tmpbuf, if we have completed the memory allocation of tmpbuf, but failed to call handler->get(...), there will be a memleak in below logic:
|-- ret = (int)vfs_getxattr_alloc(mnt_userns, ...) | /* ^^^ alloc for tmpbuf */ |-- value = krealloc(*xattr_value, error + 1, flags) | /* ^^^ alloc memory */ |-- error = handler->get(handler, ...) | /* error! */ |-- *xattr_value = value | /* xattr_value is &tmpbuf (memory leak!) */
So we will try to free(tmpbuf) after vfs_getxattr_alloc() fails to fix it.
Cc: stable@vger.kernel.org Fixes: 8db6c34f1dbc ("Introduce v3 namespaced file capabilities") Signed-off-by: Gaosheng Cui cuigaosheng1@huawei.com Acked-by: Serge Hallyn serge@hallyn.com [PM: subject line and backtrace tweaks] Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/commoncap.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/security/commoncap.c +++ b/security/commoncap.c @@ -401,8 +401,10 @@ int cap_inode_getsecurity(struct user_na &tmpbuf, size, GFP_NOFS); dput(dentry);
- if (ret < 0 || !tmpbuf) - return ret; + if (ret < 0 || !tmpbuf) { + size = ret; + goto out_free; + }
fs_ns = inode->i_sb->s_user_ns; cap = (struct vfs_cap_data *) tmpbuf;
From: Miklos Szeredi mszeredi@redhat.com
commit 4a6f278d4827b59ba26ceae0ff4529ee826aa258 upstream.
Add missing file_modified() call to fuse_file_fallocate(). Without this fallocate on fuse failed to clear privileges.
Fixes: 05ba1f082300 ("fuse: add FALLOCATE operation") Cc: stable@vger.kernel.org Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/fuse/file.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -3001,6 +3001,10 @@ static long fuse_file_fallocate(struct f goto out; }
+ err = file_modified(file); + if (err) + goto out; + if (!(mode & FALLOC_FL_KEEP_SIZE)) set_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
From: Miklos Szeredi mszeredi@redhat.com
commit 9fa248c65bdbf5af0a2f74dd38575acfc8dfd2bf upstream.
There's a race in fuse's readdir cache that can result in an uninitilized page being read. The page lock is supposed to prevent this from happening but in the following case it doesn't:
Two fuse_add_dirent_to_cache() start out and get the same parameters (size=0,offset=0). One of them wins the race to create and lock the page, after which it fills in data, sets rdc.size and unlocks the page.
In the meantime the page gets evicted from the cache before the other instance gets to run. That one also creates the page, but finds the size to be mismatched, bails out and leaves the uninitialized page in the cache.
Fix by marking a filled page uptodate and ignoring non-uptodate pages.
Reported-by: Frank Sorenson fsorenso@redhat.com Fixes: 5d7bc7e8680c ("fuse: allow using readdir cache") Cc: stable@vger.kernel.org # v4.20 Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/fuse/readdir.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/fs/fuse/readdir.c +++ b/fs/fuse/readdir.c @@ -77,8 +77,10 @@ static void fuse_add_dirent_to_cache(str goto unlock;
addr = kmap_local_page(page); - if (!offset) + if (!offset) { clear_page(addr); + SetPageUptodate(page); + } memcpy(addr + offset, dirent, reclen); kunmap_local(addr); fi->rdc.size = (index << PAGE_SHIFT) + offset + reclen; @@ -516,6 +518,12 @@ retry_locked:
page = find_get_page_flags(file->f_mapping, index, FGP_ACCESSED | FGP_LOCK); + /* Page gone missing, then re-added to cache, but not initialized? */ + if (page && !PageUptodate(page)) { + unlock_page(page); + put_page(page); + page = NULL; + } spin_lock(&fi->rdc.lock); if (!page) { /*
From: Mickaël Salaün mic@digikod.net
commit 091873e47ef700e935aa80079b63929af599a0b2 upstream.
The only (forced) static test binary doesn't depend on libcap. Because using -lcap on systems that don't have such static library would fail (e.g. on Arch Linux), let's be more specific and require only dynamic libcap linking.
Fixes: a52540522c95 ("selftests/landlock: Fix out-of-tree builds") Cc: Anders Roxell anders.roxell@linaro.org Cc: Guillaume Tucker guillaume.tucker@collabora.com Cc: Mark Brown broonie@kernel.org Cc: Shuah Khan skhan@linuxfoundation.org Cc: stable@vger.kernel.org Signed-off-by: Mickaël Salaün mic@digikod.net Link: https://lore.kernel.org/r/20221019200536.2771316-1-mic@digikod.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/landlock/Makefile | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/landlock/Makefile b/tools/testing/selftests/landlock/Makefile index 6632bfff486b..348e2dbdb4e0 100644 --- a/tools/testing/selftests/landlock/Makefile +++ b/tools/testing/selftests/landlock/Makefile @@ -3,7 +3,6 @@ # First run: make -C ../../../.. headers_install
CFLAGS += -Wall -O2 $(KHDR_INCLUDES) -LDLIBS += -lcap
LOCAL_HDRS += common.h
@@ -13,10 +12,12 @@ TEST_GEN_PROGS := $(src_test:.c=)
TEST_GEN_PROGS_EXTENDED := true
-# Static linking for short targets: +# Short targets: +$(TEST_GEN_PROGS): LDLIBS += -lcap $(TEST_GEN_PROGS_EXTENDED): LDFLAGS += -static
include ../lib.mk
-# Static linking for targets with $(OUTPUT)/ prefix: +# Targets with $(OUTPUT)/ prefix: +$(TEST_GEN_PROGS): LDLIBS += -lcap $(TEST_GEN_PROGS_EXTENDED): LDFLAGS += -static
From: Ard Biesheuvel ardb@kernel.org
commit 161a438d730dade2ba2b1bf8785f0759aba4ca5f upstream.
We no longer need at least 64 bytes of random seed to permit the early crng init to complete. The RNG is now based on Blake2s, so reduce the EFI seed size to the Blake2s hash size, which is sufficient for our purposes.
While at it, drop the READ_ONCE(), which was supposed to prevent size from being evaluated after seed was unmapped. However, this cannot actually happen, so READ_ONCE() is unnecessary here.
Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Ard Biesheuvel ardb@kernel.org Reviewed-by: Jason A. Donenfeld Jason@zx2c4.com Acked-by: Ilias Apalodimas ilias.apalodimas@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/efi/efi.c | 2 +- include/linux/efi.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -608,7 +608,7 @@ int __init efi_config_parse_tables(const
seed = early_memremap(efi_rng_seed, sizeof(*seed)); if (seed != NULL) { - size = READ_ONCE(seed->size); + size = min(seed->size, EFI_RANDOM_SEED_SIZE); early_memunmap(seed, sizeof(*seed)); } else { pr_err("Could not map UEFI random seed!\n"); --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1192,7 +1192,7 @@ efi_status_t efi_random_get_seed(void); arch_efi_call_virt_teardown(); \ })
-#define EFI_RANDOM_SEED_SIZE 64U +#define EFI_RANDOM_SEED_SIZE 32U // BLAKE2S_HASH_SIZE
struct linux_efi_random_seed { u32 size;
From: Ard Biesheuvel ardb@kernel.org
commit 7d866e38c7e9ece8a096d0d098fa9d92b9d4f97e upstream.
EFI runtime services data is guaranteed to be preserved by the OS, making it a suitable candidate for the EFI random seed table, which may be passed to kexec kernels as well (after refreshing the seed), and so we need to ensure that the memory is preserved without support from the OS itself.
However, runtime services data is intended for allocations that are relevant to the implementations of the runtime services themselves, and so they are unmapped from the kernel linear map, and mapped into the EFI page tables that are active while runtime service invocations are in progress. None of this is needed for the RNG seed.
So let's switch to EFI 'ACPI reclaim' memory: in spite of the name, there is nothing exclusively ACPI about it, it is simply a type of allocation that carries firmware provided data which may or may not be relevant to the OS, and it is left up to the OS to decide whether to reclaim it after having consumed its contents.
Given that in Linux, we never reclaim these allocations, it is a good choice for the EFI RNG seed, as the allocation is guaranteed to survive kexec reboots.
One additional reason for changing this now is to align it with the upcoming recommendation for EFI bootloader provided RNG seeds, which must not use EFI runtime services code/data allocations.
Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Ard Biesheuvel ardb@kernel.org Reviewed-by: Ilias Apalodimas ilias.apalodimas@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/efi/libstub/random.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/drivers/firmware/efi/libstub/random.c +++ b/drivers/firmware/efi/libstub/random.c @@ -75,7 +75,12 @@ efi_status_t efi_random_get_seed(void) if (status != EFI_SUCCESS) return status;
- status = efi_bs_call(allocate_pool, EFI_RUNTIME_SERVICES_DATA, + /* + * Use EFI_ACPI_RECLAIM_MEMORY here so that it is guaranteed that the + * allocation will survive a kexec reboot (although we refresh the seed + * beforehand) + */ + status = efi_bs_call(allocate_pool, EFI_ACPI_RECLAIM_MEMORY, sizeof(*seed) + EFI_RANDOM_SEED_SIZE, (void **)&seed); if (status != EFI_SUCCESS)
From: Ard Biesheuvel ardb@kernel.org
commit f11a74b45d330ad1ab986852b099747161052526 upstream.
Commit 8a254d90a775 ("efi: efivars: Fix variable writes without query_variable_store()") addressed an issue that was introduced during the EFI variable store refactor, where alternative implementations of the efivars layer that lacked query_variable_store() would no longer work.
Unfortunately, there is another case to consider here, which was missed: if the efivars layer is backed by the EFI runtime services as usual, but the EFI implementation predates the introduction of QueryVariableInfo(), we will return EFI_UNSUPPORTED, and this is no longer being dealt with correctly.
So let's fix this, and while at it, clean up the code a bit, by merging the check_var_size() routines as well as their callers.
Cc: stable@vger.kernel.org # v6.0 Fixes: bbc6d2c6ef22 ("efi: vars: Switch to new wrapper layer") Signed-off-by: Ard Biesheuvel ardb@kernel.org Tested-by: Aditya Garg gargaditya08@live.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/efi/vars.c | 68 +++++++++++-------------------------- 1 file changed, 20 insertions(+), 48 deletions(-)
diff --git a/drivers/firmware/efi/vars.c b/drivers/firmware/efi/vars.c index 433b61587139..0ba9f18312f5 100644 --- a/drivers/firmware/efi/vars.c +++ b/drivers/firmware/efi/vars.c @@ -21,29 +21,22 @@ static struct efivars *__efivars;
static DEFINE_SEMAPHORE(efivars_lock);
-static efi_status_t check_var_size(u32 attributes, unsigned long size) -{ - const struct efivar_operations *fops; - - fops = __efivars->ops; - - if (!fops->query_variable_store) - return (size <= SZ_64K) ? EFI_SUCCESS : EFI_OUT_OF_RESOURCES; - - return fops->query_variable_store(attributes, size, false); -} - -static -efi_status_t check_var_size_nonblocking(u32 attributes, unsigned long size) +static efi_status_t check_var_size(bool nonblocking, u32 attributes, + unsigned long size) { const struct efivar_operations *fops; + efi_status_t status;
fops = __efivars->ops;
if (!fops->query_variable_store) + status = EFI_UNSUPPORTED; + else + status = fops->query_variable_store(attributes, size, + nonblocking); + if (status == EFI_UNSUPPORTED) return (size <= SZ_64K) ? EFI_SUCCESS : EFI_OUT_OF_RESOURCES; - - return fops->query_variable_store(attributes, size, true); + return status; }
/** @@ -195,26 +188,6 @@ efi_status_t efivar_get_next_variable(unsigned long *name_size, } EXPORT_SYMBOL_NS_GPL(efivar_get_next_variable, EFIVAR);
-/* - * efivar_set_variable_blocking() - local helper function for set_variable - * - * Must be called with efivars_lock held. - */ -static efi_status_t -efivar_set_variable_blocking(efi_char16_t *name, efi_guid_t *vendor, - u32 attr, unsigned long data_size, void *data) -{ - efi_status_t status; - - if (data_size > 0) { - status = check_var_size(attr, data_size + - ucs2_strsize(name, 1024)); - if (status != EFI_SUCCESS) - return status; - } - return __efivars->ops->set_variable(name, vendor, attr, data_size, data); -} - /* * efivar_set_variable_locked() - set a variable identified by name/vendor * @@ -228,23 +201,21 @@ efi_status_t efivar_set_variable_locked(efi_char16_t *name, efi_guid_t *vendor, efi_set_variable_t *setvar; efi_status_t status;
- if (!nonblocking) - return efivar_set_variable_blocking(name, vendor, attr, - data_size, data); + if (data_size > 0) { + status = check_var_size(nonblocking, attr, + data_size + ucs2_strsize(name, 1024)); + if (status != EFI_SUCCESS) + return status; + }
/* * If no _nonblocking variant exists, the ordinary one * is assumed to be non-blocking. */ - setvar = __efivars->ops->set_variable_nonblocking ?: - __efivars->ops->set_variable; + setvar = __efivars->ops->set_variable_nonblocking; + if (!setvar || !nonblocking) + setvar = __efivars->ops->set_variable;
- if (data_size > 0) { - status = check_var_size_nonblocking(attr, data_size + - ucs2_strsize(name, 1024)); - if (status != EFI_SUCCESS) - return status; - } return setvar(name, vendor, attr, data_size, data); } EXPORT_SYMBOL_NS_GPL(efivar_set_variable_locked, EFIVAR); @@ -264,7 +235,8 @@ efi_status_t efivar_set_variable(efi_char16_t *name, efi_guid_t *vendor, if (efivar_lock()) return EFI_ABORTED;
- status = efivar_set_variable_blocking(name, vendor, attr, data_size, data); + status = efivar_set_variable_locked(name, vendor, attr, data_size, + data, false); efivar_unlock(); return status; }
From: Pavel Begunkov asml.silence@gmail.com
commit e276d62dcfdee6582486e8b8344dd869518e14be upstream.
Remove SOCK_SUPPORT_ZC when we're setting ulp as it might not support msghdr::ubuf_info, e.g. like TLS replacing ->sk_prot with a new set of handlers.
Cc: stable@vger.kernel.org # 6.0 Reported-by: Jakub Kicinski kuba@kernel.org Fixes: e993ffe3da4bc ("net: flag sockets supporting msghdr originated zerocopy") Signed-off-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_ulp.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c index 7c27aa629af1..9ae50b1bd844 100644 --- a/net/ipv4/tcp_ulp.c +++ b/net/ipv4/tcp_ulp.c @@ -136,6 +136,9 @@ static int __tcp_set_ulp(struct sock *sk, const struct tcp_ulp_ops *ulp_ops) if (icsk->icsk_ulp_ops) goto out_err;
+ if (sk->sk_socket) + clear_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags); + err = ulp_ops->init(sk); if (err) goto out_err;
From: Mark Rutland mark.rutland@arm.com
commit 024f4b2e1f874934943eb2d3d288ebc52c79f55c upstream.
The cortex_a76_erratum_1463225_debug_handler() function is called when handling debug exceptions (and synchronous exceptions from BRK instructions), and so is called when a probed function executes. If the compiler does not inline cortex_a76_erratum_1463225_debug_handler(), it can be probed.
If cortex_a76_erratum_1463225_debug_handler() is probed, any debug exception or software breakpoint exception will result in recursive exceptions leading to a stack overflow. This can be triggered with the ftrace multiple_probes selftest, and as per the example splat below.
This is a regression caused by commit:
6459b8469753e9fe ("arm64: entry: consolidate Cortex-A76 erratum 1463225 workaround")
... which removed the NOKPROBE_SYMBOL() annotation associated with the function.
My intent was that cortex_a76_erratum_1463225_debug_handler() would be inlined into its caller, el1_dbg(), which is marked noinstr and cannot be probed. Mark cortex_a76_erratum_1463225_debug_handler() as __always_inline to ensure this.
Example splat prior to this patch (with recursive entries elided):
| # echo p cortex_a76_erratum_1463225_debug_handler > /sys/kernel/debug/tracing/kprobe_events | # echo p do_el0_svc >> /sys/kernel/debug/tracing/kprobe_events | # echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable | Insufficient stack space to handle exception! | ESR: 0x0000000096000047 -- DABT (current EL) | FAR: 0xffff800009cefff0 | Task stack: [0xffff800009cf0000..0xffff800009cf4000] | IRQ stack: [0xffff800008000000..0xffff800008004000] | Overflow stack: [0xffff00007fbc00f0..0xffff00007fbc10f0] | CPU: 0 PID: 145 Comm: sh Not tainted 6.0.0 #2 | Hardware name: linux,dummy-virt (DT) | pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : arm64_enter_el1_dbg+0x4/0x20 | lr : el1_dbg+0x24/0x5c | sp : ffff800009cf0000 | x29: ffff800009cf0000 x28: ffff000002c74740 x27: 0000000000000000 | x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 | x23: 00000000604003c5 x22: ffff80000801745c x21: 0000aaaac95ac068 | x20: 00000000f2000004 x19: ffff800009cf0040 x18: 0000000000000000 | x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 | x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 | x11: 0000000000000010 x10: ffff800008c87190 x9 : ffff800008ca00d0 | x8 : 000000000000003c x7 : 0000000000000000 x6 : 0000000000000000 | x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000000043a4 | x2 : 00000000f2000004 x1 : 00000000f2000004 x0 : ffff800009cf0040 | Kernel panic - not syncing: kernel stack overflow | CPU: 0 PID: 145 Comm: sh Not tainted 6.0.0 #2 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0xe4/0x104 | show_stack+0x18/0x4c | dump_stack_lvl+0x64/0x7c | dump_stack+0x18/0x38 | panic+0x14c/0x338 | test_taint+0x0/0x2c | panic_bad_stack+0x104/0x118 | handle_bad_stack+0x34/0x48 | __bad_stack+0x78/0x7c | arm64_enter_el1_dbg+0x4/0x20 | el1h_64_sync_handler+0x40/0x98 | el1h_64_sync+0x64/0x68 | cortex_a76_erratum_1463225_debug_handler+0x0/0x34 ... | el1h_64_sync_handler+0x40/0x98 | el1h_64_sync+0x64/0x68 | cortex_a76_erratum_1463225_debug_handler+0x0/0x34 ... | el1h_64_sync_handler+0x40/0x98 | el1h_64_sync+0x64/0x68 | cortex_a76_erratum_1463225_debug_handler+0x0/0x34 | el1h_64_sync_handler+0x40/0x98 | el1h_64_sync+0x64/0x68 | do_el0_svc+0x0/0x28 | el0t_64_sync_handler+0x84/0xf0 | el0t_64_sync+0x18c/0x190 | Kernel Offset: disabled | CPU features: 0x0080,00005021,19001080 | Memory Limit: none | ---[ end Kernel panic - not syncing: kernel stack overflow ]---
With this patch, cortex_a76_erratum_1463225_debug_handler() is inlined into el1_dbg(), and el1_dbg() cannot be probed:
| # echo p cortex_a76_erratum_1463225_debug_handler > /sys/kernel/debug/tracing/kprobe_events | sh: write error: No such file or directory | # grep -w cortex_a76_erratum_1463225_debug_handler /proc/kallsyms | wc -l | 0 | # echo p el1_dbg > /sys/kernel/debug/tracing/kprobe_events | sh: write error: Invalid argument | # grep -w el1_dbg /proc/kallsyms | wc -l | 1
Fixes: 6459b8469753 ("arm64: entry: consolidate Cortex-A76 erratum 1463225 workaround") Cc: stable@vger.kernel.org # 5.12.x Signed-off-by: Mark Rutland mark.rutland@arm.com Cc: Will Deacon will@kernel.org Link: https://lore.kernel.org/r/20221017090157.2881408-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/entry-common.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/arm64/kernel/entry-common.c +++ b/arch/arm64/kernel/entry-common.c @@ -329,7 +329,8 @@ static void cortex_a76_erratum_1463225_s __this_cpu_write(__in_cortex_a76_erratum_1463225_wa, 0); }
-static bool cortex_a76_erratum_1463225_debug_handler(struct pt_regs *regs) +static __always_inline bool +cortex_a76_erratum_1463225_debug_handler(struct pt_regs *regs) { if (!__this_cpu_read(__in_cortex_a76_erratum_1463225_wa)) return false;
From: Petr Benes petr.benes@ysoft.com
commit 5e67d47d0b010f0704aca469d6d27637b1dcb2ce upstream.
Fix our design flaw in supply voltage distribution on the Quad and QuadPlus based boards.
The problem is that we supply the SoC cache (VDD_CACHE_CAP) from VDD_PU instead of VDD_SOC. The VDD_PU internal regulator can be disabled by PM if VPU or GPU is not used. If that happens the system freezes. To prevent that configure the reg_pu regulator to be always on.
Fixes: 0de4ab81ab26 ("ARM: dts: imx6dl-yapp4: Add Y Soft IOTA Crux/Crux+ board") Cc: petrben@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Petr Benes petr.benes@ysoft.com Signed-off-by: Michal Vokáč michal.vokac@ysoft.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/imx6q-yapp4-crux.dts | 4 ++++ arch/arm/boot/dts/imx6qp-yapp4-crux-plus.dts | 4 ++++ 2 files changed, 8 insertions(+)
diff --git a/arch/arm/boot/dts/imx6q-yapp4-crux.dts b/arch/arm/boot/dts/imx6q-yapp4-crux.dts index 15f4824a5142..bddf3822ebf7 100644 --- a/arch/arm/boot/dts/imx6q-yapp4-crux.dts +++ b/arch/arm/boot/dts/imx6q-yapp4-crux.dts @@ -33,6 +33,10 @@ &oled_1309 { status = "okay"; };
+®_pu { + regulator-always-on; +}; + ®_usb_h1_vbus { status = "okay"; }; diff --git a/arch/arm/boot/dts/imx6qp-yapp4-crux-plus.dts b/arch/arm/boot/dts/imx6qp-yapp4-crux-plus.dts index cea165f2161a..afaf4a6759d4 100644 --- a/arch/arm/boot/dts/imx6qp-yapp4-crux-plus.dts +++ b/arch/arm/boot/dts/imx6qp-yapp4-crux-plus.dts @@ -33,6 +33,10 @@ &oled_1309 { status = "okay"; };
+®_pu { + regulator-always-on; +}; + ®_usb_h1_vbus { status = "okay"; };
From: Kan Liang kan.liang@linux.intel.com
commit acc5568b90c19ac6375508a93b9676cd18a92a35 upstream.
According to the latest event list, update the MEM_INST_RETIRED events which support the DataLA facility.
Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support") Reported-by: Jannis Klinkenberg jannis.klinkenberg@rwth-aachen.de Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221031154119.571386-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/intel/ds.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -982,8 +982,13 @@ struct event_constraint intel_icl_pebs_e INTEL_FLAGS_UEVENT_CONSTRAINT(0x0400, 0x800000000ULL), /* SLOTS */
INTEL_PLD_CONSTRAINT(0x1cd, 0xff), /* MEM_TRANS_RETIRED.LOAD_LATENCY */ - INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x1d0, 0xf), /* MEM_INST_RETIRED.LOAD */ - INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x2d0, 0xf), /* MEM_INST_RETIRED.STORE */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x11d0, 0xf), /* MEM_INST_RETIRED.STLB_MISS_LOADS */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x12d0, 0xf), /* MEM_INST_RETIRED.STLB_MISS_STORES */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x21d0, 0xf), /* MEM_INST_RETIRED.LOCK_LOADS */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x41d0, 0xf), /* MEM_INST_RETIRED.SPLIT_LOADS */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x42d0, 0xf), /* MEM_INST_RETIRED.SPLIT_STORES */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x81d0, 0xf), /* MEM_INST_RETIRED.ALL_LOADS */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x82d0, 0xf), /* MEM_INST_RETIRED.ALL_STORES */
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD_RANGE(0xd1, 0xd4, 0xf), /* MEM_LOAD_*_RETIRED.* */
From: Kan Liang kan.liang@linux.intel.com
commit 6f8faf471446844bb9c318e0340221049d5c19f4 upstream.
The intel_pebs_isolation quirk checks both model number and stepping. Cooper Lake has a different stepping (11) than the other Skylake Xeon. It cannot benefit from the optimization in commit 9b545c04abd4f ("perf/x86/kvm: Avoid unnecessary work in guest filtering").
Add the stepping of Cooper Lake into the isolation_ucodes[] table.
Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221031154550.571663-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/intel/core.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -4891,6 +4891,7 @@ static const struct x86_cpu_desc isolati INTEL_CPU_DESC(INTEL_FAM6_SKYLAKE_X, 5, 0x00000000), INTEL_CPU_DESC(INTEL_FAM6_SKYLAKE_X, 6, 0x00000000), INTEL_CPU_DESC(INTEL_FAM6_SKYLAKE_X, 7, 0x00000000), + INTEL_CPU_DESC(INTEL_FAM6_SKYLAKE_X, 11, 0x00000000), INTEL_CPU_DESC(INTEL_FAM6_SKYLAKE_L, 3, 0x0000007c), INTEL_CPU_DESC(INTEL_FAM6_SKYLAKE, 3, 0x0000007c), INTEL_CPU_DESC(INTEL_FAM6_KABYLAKE, 9, 0x0000004e),
From: Kan Liang kan.liang@linux.intel.com
commit 0916886bb978e7eae1ca3955ba07f51c020da20c upstream.
According to the latest event list, update the MEM_INST_RETIRED events which support the DataLA facility for SPR.
Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids") Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221031154119.571386-2-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/intel/ds.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1009,8 +1009,13 @@ struct event_constraint intel_spr_pebs_e INTEL_FLAGS_EVENT_CONSTRAINT(0xc0, 0xfe), INTEL_PLD_CONSTRAINT(0x1cd, 0xfe), INTEL_PSD_CONSTRAINT(0x2cd, 0x1), - INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x1d0, 0xf), - INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x2d0, 0xf), + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x11d0, 0xf), /* MEM_INST_RETIRED.STLB_MISS_LOADS */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x12d0, 0xf), /* MEM_INST_RETIRED.STLB_MISS_STORES */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x21d0, 0xf), /* MEM_INST_RETIRED.LOCK_LOADS */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x41d0, 0xf), /* MEM_INST_RETIRED.SPLIT_LOADS */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x42d0, 0xf), /* MEM_INST_RETIRED.SPLIT_STORES */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x81d0, 0xf), /* MEM_INST_RETIRED.ALL_LOADS */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x82d0, 0xf), /* MEM_INST_RETIRED.ALL_STORES */
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD_RANGE(0xd1, 0xd4, 0xf),
From: Pavel Begunkov asml.silence@gmail.com
commit fee9ac06647e59a69fb7aec58f25267c134264b4 upstream.
sockmap replaces ->sk_prot with its own callbacks, we should remove SOCK_SUPPORT_ZC as the new proto doesn't support msghdr::ubuf_info.
Cc: stable@vger.kernel.org # 6.0 Reported-by: Jakub Kicinski kuba@kernel.org Fixes: e993ffe3da4bc ("net: flag sockets supporting msghdr originated zerocopy") Signed-off-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/sock.h | 7 +++++++ net/ipv4/tcp_bpf.c | 4 ++-- net/ipv4/udp_bpf.c | 4 ++-- net/unix/unix_bpf.c | 8 ++++---- 4 files changed, 15 insertions(+), 8 deletions(-)
--- a/include/net/sock.h +++ b/include/net/sock.h @@ -1871,6 +1871,13 @@ void sock_kfree_s(struct sock *sk, void void sock_kzfree_s(struct sock *sk, void *mem, int size); void sk_send_sigurg(struct sock *sk);
+static inline void sock_replace_proto(struct sock *sk, struct proto *proto) +{ + if (sk->sk_socket) + clear_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags); + WRITE_ONCE(sk->sk_prot, proto); +} + struct sockcm_cookie { u64 transmit_time; u32 mark; --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -607,7 +607,7 @@ int tcp_bpf_update_proto(struct sock *sk } else { sk->sk_write_space = psock->saved_write_space; /* Pairs with lockless read in sk_clone_lock() */ - WRITE_ONCE(sk->sk_prot, psock->sk_proto); + sock_replace_proto(sk, psock->sk_proto); } return 0; } @@ -620,7 +620,7 @@ int tcp_bpf_update_proto(struct sock *sk }
/* Pairs with lockless read in sk_clone_lock() */ - WRITE_ONCE(sk->sk_prot, &tcp_bpf_prots[family][config]); + sock_replace_proto(sk, &tcp_bpf_prots[family][config]); return 0; } EXPORT_SYMBOL_GPL(tcp_bpf_update_proto); --- a/net/ipv4/udp_bpf.c +++ b/net/ipv4/udp_bpf.c @@ -141,14 +141,14 @@ int udp_bpf_update_proto(struct sock *sk
if (restore) { sk->sk_write_space = psock->saved_write_space; - WRITE_ONCE(sk->sk_prot, psock->sk_proto); + sock_replace_proto(sk, psock->sk_proto); return 0; }
if (sk->sk_family == AF_INET6) udp_bpf_check_v6_needs_rebuild(psock->sk_proto);
- WRITE_ONCE(sk->sk_prot, &udp_bpf_prots[family]); + sock_replace_proto(sk, &udp_bpf_prots[family]); return 0; } EXPORT_SYMBOL_GPL(udp_bpf_update_proto); --- a/net/unix/unix_bpf.c +++ b/net/unix/unix_bpf.c @@ -145,12 +145,12 @@ int unix_dgram_bpf_update_proto(struct s
if (restore) { sk->sk_write_space = psock->saved_write_space; - WRITE_ONCE(sk->sk_prot, psock->sk_proto); + sock_replace_proto(sk, psock->sk_proto); return 0; }
unix_dgram_bpf_check_needs_rebuild(psock->sk_proto); - WRITE_ONCE(sk->sk_prot, &unix_dgram_bpf_prot); + sock_replace_proto(sk, &unix_dgram_bpf_prot); return 0; }
@@ -158,12 +158,12 @@ int unix_stream_bpf_update_proto(struct { if (restore) { sk->sk_write_space = psock->saved_write_space; - WRITE_ONCE(sk->sk_prot, psock->sk_proto); + sock_replace_proto(sk, psock->sk_proto); return 0; }
unix_stream_bpf_check_needs_rebuild(psock->sk_proto); - WRITE_ONCE(sk->sk_prot, &unix_stream_bpf_prot); + sock_replace_proto(sk, &unix_stream_bpf_prot); return 0; }
From: Stefan Metzmacher metze@samba.org
commit 71b7786ea478f3c4611deff4d2b9676b0c17c56b upstream.
Without this only the client initiated tcp sockets have SOCK_SUPPORT_ZC. The listening socket on the server also has it, but the accepted connections didn't, which meant IORING_OP_SEND[MSG]_ZC will always fails with -EOPNOTSUPP.
Fixes: e993ffe3da4b ("net: flag sockets supporting msghdr originated zerocopy") Cc: stable@vger.kernel.org # 6.0 CC: Jens Axboe axboe@kernel.dk Link: https://lore.kernel.org/io-uring/20221024141503.22b4e251@kernel.org/T/#m38aa... Signed-off-by: Stefan Metzmacher metze@samba.org Signed-off-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/af_inet.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -748,6 +748,8 @@ int inet_accept(struct socket *sock, str (TCPF_ESTABLISHED | TCPF_SYN_RECV | TCPF_CLOSE_WAIT | TCPF_CLOSE)));
+ if (test_bit(SOCK_SUPPORT_ZC, &sock->flags)) + set_bit(SOCK_SUPPORT_ZC, &newsock->flags); sock_graft(sk2, newsock);
newsock->state = SS_CONNECTED;
From: Helge Deller deller@gmx.de
commit e8a18e3f00f3ee8d07c17ab1ea3ad4df4a3b6fe0 upstream.
Although the name of the driver 8250_gsc.c suggests that it handles only serial ports on the GSC bus, it does handle serial ports listed in the parisc machine inventory as well, e.g. the serial ports in a C8000 PCI-only workstation.
Change the dependency to CONFIG_PARISC, so that the driver gets included in the kernel even if CONFIG_GSC isn't set.
Reported-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/8250/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/tty/serial/8250/Kconfig +++ b/drivers/tty/serial/8250/Kconfig @@ -118,7 +118,7 @@ config SERIAL_8250_CONSOLE
config SERIAL_8250_GSC tristate - depends on SERIAL_8250 && GSC + depends on SERIAL_8250 && PARISC default SERIAL_8250
config SERIAL_8250_DMA
From: Helge Deller deller@gmx.de
commit a0c9f1f2e53b8eb2ae43987a30e547ba56b4fa18 upstream.
The parisc serial port driver needs this symbol when it's compiled as module.
Signed-off-by: Helge Deller deller@gmx.de Reported-by: kernel test robot lkp@intel.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/parisc/iosapic.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/parisc/iosapic.c +++ b/drivers/parisc/iosapic.c @@ -866,6 +866,7 @@ int iosapic_serial_irq(struct parisc_dev
return vi->txn_irq; } +EXPORT_SYMBOL(iosapic_serial_irq); #endif
From: Helge Deller deller@gmx.de
commit 2b6ae0962b421103feb41a80406732944b0665b3 upstream.
Avoid that the hardware path is shown twice in the kernel log, and clean up the output of the version numbers to show up in the same order as they are listed in the hardware database in the hardware.c file. Additionally, optimize the memory footprint of the hardware database and mark some code as init code.
Fixes: cab56b51ec0e ("parisc: Fix device names in /proc/iomem") Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/include/asm/hardware.h | 12 ++++++------ arch/parisc/kernel/drivers.c | 14 ++++++-------- 2 files changed, 12 insertions(+), 14 deletions(-)
--- a/arch/parisc/include/asm/hardware.h +++ b/arch/parisc/include/asm/hardware.h @@ -10,12 +10,12 @@ #define SVERSION_ANY_ID PA_SVERSION_ANY_ID
struct hp_hardware { - unsigned short hw_type:5; /* HPHW_xxx */ - unsigned short hversion; - unsigned long sversion:28; - unsigned short opt; - const char name[80]; /* The hardware description */ -}; + unsigned int hw_type:8; /* HPHW_xxx */ + unsigned int hversion:12; + unsigned int sversion:12; + unsigned char opt; + unsigned char name[59]; /* The hardware description */ +} __packed;
struct parisc_device;
--- a/arch/parisc/kernel/drivers.c +++ b/arch/parisc/kernel/drivers.c @@ -882,15 +882,13 @@ void __init walk_central_bus(void) &root); }
-static void print_parisc_device(struct parisc_device *dev) +static __init void print_parisc_device(struct parisc_device *dev) { - char hw_path[64]; - static int count; + static int count __initdata;
- print_pa_hwpath(dev, hw_path); - pr_info("%d. %s at %pap [%s] { %d, 0x%x, 0x%.3x, 0x%.5x }", - ++count, dev->name, &(dev->hpa.start), hw_path, dev->id.hw_type, - dev->id.hversion_rev, dev->id.hversion, dev->id.sversion); + pr_info("%d. %s at %pap { type:%d, hv:%#x, sv:%#x, rev:%#x }", + ++count, dev->name, &(dev->hpa.start), dev->id.hw_type, + dev->id.hversion, dev->id.sversion, dev->id.hversion_rev);
if (dev->num_addrs) { int k; @@ -1079,7 +1077,7 @@ static __init int qemu_print_iodc_data(s
-static int print_one_device(struct device * dev, void * data) +static __init int print_one_device(struct device * dev, void * data) { struct parisc_device * pdev = to_parisc_device(dev);
From: Ye Bin yebin10@huawei.com
commit 1b8f787ef547230a3249bcf897221ef0cc78481b upstream.
Syzkaller report issue as follows: EXT4-fs (loop0): Free/Dirty block details EXT4-fs (loop0): free_blocks=0 EXT4-fs (loop0): dirty_blocks=0 EXT4-fs (loop0): Block reservation details EXT4-fs (loop0): i_reserved_data_blocks=0 EXT4-fs warning (device loop0): ext4_da_release_space:1527: ext4_da_release_space: ino 18, to_free 1 with only 0 reserved data blocks ------------[ cut here ]------------ WARNING: CPU: 0 PID: 92 at fs/ext4/inode.c:1528 ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1524 Modules linked in: CPU: 0 PID: 92 Comm: kworker/u4:4 Not tainted 6.0.0-syzkaller-09423-g493ffd6605b2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Workqueue: writeback wb_workfn (flush-7:0) RIP: 0010:ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1528 RSP: 0018:ffffc900015f6c90 EFLAGS: 00010296 RAX: 42215896cd52ea00 RBX: 0000000000000000 RCX: 42215896cd52ea00 RDX: 0000000000000000 RSI: 0000000080000001 RDI: 0000000000000000 RBP: 1ffff1100e907d96 R08: ffffffff816aa79d R09: fffff520002bece5 R10: fffff520002bece5 R11: 1ffff920002bece4 R12: ffff888021fd2000 R13: ffff88807483ecb0 R14: 0000000000000001 R15: ffff88807483e740 FS: 0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005555569ba628 CR3: 000000000c88e000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ext4_es_remove_extent+0x1ab/0x260 fs/ext4/extents_status.c:1461 mpage_release_unused_pages+0x24d/0xef0 fs/ext4/inode.c:1589 ext4_writepages+0x12eb/0x3be0 fs/ext4/inode.c:2852 do_writepages+0x3c3/0x680 mm/page-writeback.c:2469 __writeback_single_inode+0xd1/0x670 fs/fs-writeback.c:1587 writeback_sb_inodes+0xb3b/0x18f0 fs/fs-writeback.c:1870 wb_writeback+0x41f/0x7b0 fs/fs-writeback.c:2044 wb_do_writeback fs/fs-writeback.c:2187 [inline] wb_workfn+0x3cb/0xef0 fs/fs-writeback.c:2227 process_one_work+0x877/0xdb0 kernel/workqueue.c:2289 worker_thread+0xb14/0x1330 kernel/workqueue.c:2436 kthread+0x266/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 </TASK>
Above issue may happens as follows: ext4_da_write_begin ext4_create_inline_data ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS); ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA); __ext4_ioctl ext4_ext_migrate -> will lead to eh->eh_entries not zero, and set extent flag ext4_da_write_begin ext4_da_convert_inline_data_to_extent ext4_da_write_inline_data_begin ext4_da_map_blocks ext4_insert_delayed_block if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk)) if (!ext4_es_scan_clu(inode, &ext4_es_is_mapped, lblk)) ext4_clu_mapped(inode, EXT4_B2C(sbi, lblk)); -> will return 1 allocated = true; ext4_es_insert_delayed_block(inode, lblk, allocated); ext4_writepages mpage_map_and_submit_extent(handle, &mpd, &give_up_on_write); -> return -ENOSPC mpage_release_unused_pages(&mpd, give_up_on_write); -> give_up_on_write == 1 ext4_es_remove_extent ext4_da_release_space(inode, reserved); if (unlikely(to_free > ei->i_reserved_data_blocks)) -> to_free == 1 but ei->i_reserved_data_blocks == 0 -> then trigger warning as above
To solve above issue, forbid inode do migrate which has inline data.
Cc: stable@kernel.org Reported-by: syzbot+c740bb18df70ad00952e@syzkaller.appspotmail.com Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20221018022701.683489-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/migrate.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/ext4/migrate.c +++ b/fs/ext4/migrate.c @@ -425,7 +425,8 @@ int ext4_ext_migrate(struct inode *inode * already is extent-based, error out. */ if (!ext4_has_feature_extents(inode->i_sb) || - (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) + ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS) || + ext4_has_inline_data(inode)) return -EINVAL;
if (S_ISLNK(inode->i_mode) && inode->i_blocks == 0)
From: Luís Henriques lhenriques@suse.de
commit 17a0bc9bd697f75cfdf9b378d5eb2d7409c91340 upstream.
The rec_len field in the directory entry has to be a multiple of 4. A corrupted filesystem image can be used to hit a BUG() in ext4_rec_len_to_disk(), called from make_indexed_dir().
------------[ cut here ]------------ kernel BUG at fs/ext4/ext4.h:2413! ... RIP: 0010:make_indexed_dir+0x53f/0x5f0 ... Call Trace: <TASK> ? add_dirent_to_buf+0x1b2/0x200 ext4_add_entry+0x36e/0x480 ext4_add_nondir+0x2b/0xc0 ext4_create+0x163/0x200 path_openat+0x635/0xe90 do_filp_open+0xb4/0x160 ? __create_object.isra.0+0x1de/0x3b0 ? _raw_spin_unlock+0x12/0x30 do_sys_openat2+0x91/0x150 __x64_sys_open+0x6c/0xa0 do_syscall_64+0x3c/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0
The fix simply adds a call to ext4_check_dir_entry() to validate the directory entry, returning -EFSCORRUPTED if the entry is invalid.
CC: stable@kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=216540 Signed-off-by: Luís Henriques lhenriques@suse.de Link: https://lore.kernel.org/r/20221012131330.32456-1-lhenriques@suse.de Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/namei.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2259,8 +2259,16 @@ static int make_indexed_dir(handle_t *ha memset(de, 0, len); /* wipe old data */ de = (struct ext4_dir_entry_2 *) data2; top = data2 + len; - while ((char *)(de2 = ext4_next_entry(de, blocksize)) < top) + while ((char *)(de2 = ext4_next_entry(de, blocksize)) < top) { + if (ext4_check_dir_entry(dir, NULL, de, bh2, data2, len, + (data2 + (blocksize - csum_size) - + (char *) de))) { + brelse(bh2); + brelse(bh); + return -EFSCORRUPTED; + } de = de2; + } de->rec_len = ext4_rec_len_to_disk(data2 + (blocksize - csum_size) - (char *) de, blocksize);
From: Theodore Ts'o tytso@mit.edu
commit 9a8c5b0d061554fedd7dbe894e63aa34d0bac7c4 upstream.
When expanding a file system using online resize, various fields in the superblock (e.g., s_blocks_count, s_inodes_count, etc.) change. To update the backup superblocks, the online resize uses the function update_backups() in fs/ext4/resize.c. This function was not updating the checksum field in the backup superblocks. This wasn't a big deal previously, because e2fsck didn't care about the checksum field in the backup superblock. (And indeed, update_backups() goes all the way back to the ext3 days, well before we had support for metadata checksums.)
However, there is an alternate, more general way of updating superblock fields, ext4_update_primary_sb() in fs/ext4/ioctl.c. This function does check the checksum of the backup superblock, and if it doesn't match will mark the file system as corrupted. That was clearly not the intent, so avoid to aborting the resize when a bad superblock is found.
In addition, teach update_backups() to properly update the checksum in the backup superblocks. We will eventually want to unify updapte_backups() with the infrasture in ext4_update_primary_sb(), but that's for another day.
Note: The problem has been around for a while; it just didn't really matter until ext4_update_primary_sb() was added by commit bbc605cdb1e1 ("ext4: implement support for get/set fs label"). And it became trivially easy to reproduce after commit 827891a38acc ("ext4: update the s_overhead_clusters in the backup sb's when resizing") in v6.0.
Cc: stable@kernel.org # 5.17+ Fixes: bbc605cdb1e1 ("ext4: implement support for get/set fs label") Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/ioctl.c | 3 +-- fs/ext4/resize.c | 5 +++++ 2 files changed, 6 insertions(+), 2 deletions(-)
--- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -145,9 +145,8 @@ static int ext4_update_backup_sb(struct if (ext4_has_metadata_csum(sb) && es->s_checksum != ext4_superblock_csum(sb, es)) { ext4_msg(sb, KERN_ERR, "Invalid checksum for backup " - "superblock %llu\n", sb_block); + "superblock %llu", sb_block); unlock_buffer(bh); - err = -EFSBADCRC; goto out_bh; } func(es, arg); --- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -1158,6 +1158,7 @@ static void update_backups(struct super_ while (group < sbi->s_groups_count) { struct buffer_head *bh; ext4_fsblk_t backup_block; + struct ext4_super_block *es;
/* Out of journal space, and can't get more - abort - so sad */ err = ext4_resize_ensure_credits_batch(handle, 1); @@ -1186,6 +1187,10 @@ static void update_backups(struct super_ memcpy(bh->b_data, data, size); if (rest) memset(bh->b_data + size, 0, rest); + es = (struct ext4_super_block *) bh->b_data; + es->s_block_group_nr = cpu_to_le16(group); + if (ext4_has_metadata_csum(sb)) + es->s_checksum = ext4_superblock_csum(sb, es); set_buffer_uptodate(bh); unlock_buffer(bh); err = ext4_handle_dirty_metadata(handle, NULL, bh);
From: Dave Hansen dave.hansen@linux.intel.com
commit a6dd6f39008bb3ef7c73ef0a2acc2a4209555bd8 upstream.
The TDG.VP.INFO TDCALL provides the guest with various details about the TDX system that the guest needs to run. Only one field is currently used: 'gpa_width' which tells the guest which PTE bits mark pages shared or private.
A second field is now needed: the guest "TD attributes" to tell if virtualization exceptions are configured in a way that can harm the guest.
Make the naming and calling convention more generic and discrete from the mask-centric one.
Thanks to Sathya for the inspiration here, but there's no code, comments or changelogs left from where he started.
Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Tested-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/coco/tdx/tdx.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -98,7 +98,7 @@ static inline void tdx_module_call(u64 f panic("TDCALL %lld failed (Buggy TDX module!)\n", fn); }
-static u64 get_cc_mask(void) +static void tdx_parse_tdinfo(u64 *cc_mask) { struct tdx_module_output out; unsigned int gpa_width; @@ -121,7 +121,7 @@ static u64 get_cc_mask(void) * The highest bit of a guest physical address is the "sharing" bit. * Set it for shared pages and clear it for private pages. */ - return BIT_ULL(gpa_width - 1); + *cc_mask = BIT_ULL(gpa_width - 1); }
/* @@ -758,7 +758,7 @@ void __init tdx_early_init(void) setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
cc_set_vendor(CC_VENDOR_INTEL); - cc_mask = get_cc_mask(); + tdx_parse_tdinfo(&cc_mask); cc_set_mask(cc_mask);
/*
From: Kirill A. Shutemov kirill.shutemov@linux.intel.com
commit 373e715e31bf4e0f129befe87613a278fac228d3 upstream.
All normal kernel memory is "TDX private memory". This includes everything from kernel stacks to kernel text. Handling exceptions on arbitrary accesses to kernel memory is essentially impossible because they can happen in horribly nasty places like kernel entry/exit. But, TDX hardware can theoretically _deliver_ a virtualization exception (#VE) on any access to private memory.
But, it's not as bad as it sounds. TDX can be configured to never deliver these exceptions on private memory with a "TD attribute" called ATTR_SEPT_VE_DISABLE. The guest has no way to *set* this attribute, but it can check it.
Ensure ATTR_SEPT_VE_DISABLE is set in early boot. panic() if it is unset. There is no sane way for Linux to run with this attribute clear so a panic() is appropriate.
There's small window during boot before the check where kernel has an early #VE handler. But the handler is only for port I/O and will also panic() as soon as it sees any other #VE, such as a one generated by a private memory access.
[ dhansen: Rewrite changelog and rebase on new tdx_parse_tdinfo(). Add Kirill's tested-by because I made changes since he wrote this. ]
Fixes: 9a22bf6debbf ("x86/traps: Add #VE support for TDX guest") Reported-by: ruogui.ygr@alibaba-inc.com Signed-off-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Tested-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20221028141220.29217-3-kirill.shutemov%40linux.i... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/coco/tdx/tdx.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-)
--- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -34,6 +34,8 @@ #define VE_GET_PORT_NUM(e) ((e) >> 16) #define VE_IS_IO_STRING(e) ((e) & BIT(4))
+#define ATTR_SEPT_VE_DISABLE BIT(28) + /* * Wrapper for standard use of __tdx_hypercall with no output aside from * return code. @@ -102,6 +104,7 @@ static void tdx_parse_tdinfo(u64 *cc_mas { struct tdx_module_output out; unsigned int gpa_width; + u64 td_attr;
/* * TDINFO TDX module call is used to get the TD execution environment @@ -109,19 +112,27 @@ static void tdx_parse_tdinfo(u64 *cc_mas * information, etc. More details about the ABI can be found in TDX * Guest-Host-Communication Interface (GHCI), section 2.4.2 TDCALL * [TDG.VP.INFO]. - * - * The GPA width that comes out of this call is critical. TDX guests - * can not meaningfully run without it. */ tdx_module_call(TDX_GET_INFO, 0, 0, 0, 0, &out);
- gpa_width = out.rcx & GENMASK(5, 0); - /* * The highest bit of a guest physical address is the "sharing" bit. * Set it for shared pages and clear it for private pages. + * + * The GPA width that comes out of this call is critical. TDX guests + * can not meaningfully run without it. */ + gpa_width = out.rcx & GENMASK(5, 0); *cc_mask = BIT_ULL(gpa_width - 1); + + /* + * The kernel can not handle #VE's when accessing normal kernel + * memory. Ensure that no #VE will be delivered for accesses to + * TD-private memory. Only VMM-shared memory (MMIO) will #VE. + */ + td_attr = out.rdx; + if (!(td_attr & ATTR_SEPT_VE_DISABLE)) + panic("TD misconfiguration: SEPT_VE_DISABLE attibute must be set.\n"); }
/*
From: Jiri Olsa olsajiri@gmail.com
commit 9440c42941606af4c379afa3cf8624f0dc43a629 upstream.
With just the forward declaration of the 'struct pt_regs' in syscall_wrapper.h, the syscall stub functions:
__[x64|ia32]_sys_*(struct pt_regs *regs)
will have different definition of 'regs' argument in BTF data based on which object file they are defined in.
If the syscall's object includes 'struct pt_regs' definition, the BTF argument data will point to a 'struct pt_regs' record, like:
[226] STRUCT 'pt_regs' size=168 vlen=21 'r15' type_id=1 bits_offset=0 'r14' type_id=1 bits_offset=64 'r13' type_id=1 bits_offset=128 ...
If not, it will point to a fwd declaration record:
[15439] FWD 'pt_regs' fwd_kind=struct
and make bpf tracing program hooking on those functions unable to access fields from 'struct pt_regs'.
Include asm/ptrace.h directly in syscall_wrapper.h to make sure all syscalls see 'struct pt_regs' definition. This then results in BTF for '__*_sys_*(struct pt_regs *regs)' functions to point to the actual struct, not just the forward declaration.
[ bp: No Fixes tag as this is not really a bug fix but "adjustment" so that BTF is happy. ]
Reported-by: Akihiro HARAI jharai0815@gmail.com Signed-off-by: Jiri Olsa jolsa@kernel.org Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Andrii Nakryiko andrii@kernel.org Cc: stable@vger.kernel.org # this is needed only for BTF so kernels >= 5.15 Link: https://lore.kernel.org/r/20221018122708.823792-1-jolsa@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/syscall_wrapper.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/include/asm/syscall_wrapper.h +++ b/arch/x86/include/asm/syscall_wrapper.h @@ -6,7 +6,7 @@ #ifndef _ASM_X86_SYSCALL_WRAPPER_H #define _ASM_X86_SYSCALL_WRAPPER_H
-struct pt_regs; +#include <asm/ptrace.h>
extern long __x64_sys_ni_syscall(const struct pt_regs *regs); extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
From: Jim Mattson jmattson@google.com
commit eeb69eab57c6604ac90b3fd8e5ac43f24a5535b1 upstream.
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM actually supports. CPUID.80000006H:EDX[17:16] are reserved bits and should be masked off.
Fixes: 43d05de2bee7 ("KVM: pass through CPUID(0x80000006)") Signed-off-by: Jim Mattson jmattson@google.com Message-Id: 20220929225203.2234702-2-jmattson@google.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/cpuid.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -1121,7 +1121,8 @@ static inline int __do_cpuid_func(struct cpuid_entry_override(entry, CPUID_8000_0001_ECX); break; case 0x80000006: - /* L2 cache and TLB: pass through host info. */ + /* Drop reserved bits, pass host L2 cache and TLB info. */ + entry->edx &= ~GENMASK(17, 16); break; case 0x80000007: /* Advanced power management */ /* invariant TSC is CPUID.80000007H:EDX[8] */
From: Jim Mattson jmattson@google.com
commit 079f6889818dd07903fb36c252532ab47ebb6d48 upstream.
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM actually supports. In the case of CPUID.8000001AH, only three bits are currently defined. The 125 reserved bits should be masked off.
Fixes: 24c82e576b78 ("KVM: Sanitize cpuid") Signed-off-by: Jim Mattson jmattson@google.com Message-Id: 20220929225203.2234702-4-jmattson@google.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/cpuid.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -1171,6 +1171,9 @@ static inline int __do_cpuid_func(struct entry->ecx = entry->edx = 0; break; case 0x8000001a: + entry->eax &= GENMASK(2, 0); + entry->ebx = entry->ecx = entry->edx = 0; + break; case 0x8000001e: break; case 0x8000001F:
From: Jim Mattson jmattson@google.com
commit 7030d8530e533844e2f4b0e7476498afcd324634 upstream.
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM actually supports. The following ranges of CPUID.80000008H are reserved and should be masked off: ECX[31:18] ECX[11:8]
In addition, the PerfTscSize field at ECX[17:16] should also be zero because KVM does not set the PERFTSC bit at CPUID.80000001H.ECX[27].
Fixes: 24c82e576b78 ("KVM: Sanitize cpuid") Signed-off-by: Jim Mattson jmattson@google.com Message-Id: 20220929225203.2234702-3-jmattson@google.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/cpuid.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -1152,6 +1152,7 @@ static inline int __do_cpuid_func(struct g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8); + entry->ecx &= ~(GENMASK(31, 16) | GENMASK(11, 8)); entry->edx = 0; cpuid_entry_override(entry, CPUID_8000_0008_EBX); break;
From: Jim Mattson jmattson@google.com
commit 0469e56a14bf8cfb80507e51b7aeec0332cdbc13 upstream.
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM actually supports. CPUID.80000001:EBX[27:16] are reserved bits and should be masked off.
Fixes: 0771671749b5 ("KVM: Enhance guest cpuid management") Signed-off-by: Jim Mattson jmattson@google.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/cpuid.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -1117,6 +1117,7 @@ static inline int __do_cpuid_func(struct entry->eax = max(entry->eax, 0x80000021); break; case 0x80000001: + entry->ebx &= ~GENMASK(27, 16); cpuid_entry_override(entry, CPUID_8000_0001_EDX); cpuid_entry_override(entry, CPUID_8000_0001_ECX); break;
From: Jim Mattson jmattson@google.com
commit 86c4f0d547f6460d0426ebb3ba0614f1134b8cda upstream.
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM actually supports. CPUID.8000001FH:EBX[31:16] are reserved bits and should be masked off.
Fixes: 8765d75329a3 ("KVM: X86: Extend CPUID range to include new leaf") Signed-off-by: Jim Mattson jmattson@google.com Message-Id: 20220929225203.2234702-6-jmattson@google.com Cc: stable@vger.kernel.org [Clear NumVMPL too. - Paolo] Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/cpuid.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -1183,7 +1183,8 @@ static inline int __do_cpuid_func(struct entry->eax = entry->ebx = entry->ecx = entry->edx = 0; } else { cpuid_entry_override(entry, CPUID_8000_001F_EAX); - + /* Clear NumVMPL since KVM does not support VMPL. */ + entry->ebx &= ~GENMASK(31, 12); /* * Enumerate '0' for "PA bits reduction", the adjusted * MAXPHYADDR is enumerated directly (see 0x80000008).
From: Sean Christopherson seanjc@google.com
commit 145dfad998eac74abc59219d936e905766ba2d98 upstream.
Advertise LBR support to userspace via MSR_IA32_PERF_CAPABILITIES if and only if perf fully supports LBRs. Perf may disable LBRs (by zeroing the number of LBRs) even on platforms the allegedly support LBRs, e.g. if probing any LBR MSRs during setup fails.
Fixes: be635e34c284 ("KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES") Reported-by: Like Xu like.xu.linux@gmail.com Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20221006000314.73240-3-seanjc@google.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/capabilities.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -404,6 +404,7 @@ static inline bool vmx_pebs_supported(vo static inline u64 vmx_get_perf_capabilities(void) { u64 perf_cap = PMU_CAP_FW_WRITES; + struct x86_pmu_lbr lbr; u64 host_perf_cap = 0;
if (!enable_pmu) @@ -412,7 +413,8 @@ static inline u64 vmx_get_perf_capabilit if (boot_cpu_has(X86_FEATURE_PDCM)) rdmsrl(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);
- perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT; + if (x86_perf_get_lbr(&lbr) >= 0 && lbr.nr) + perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
if (vmx_pebs_supported()) { perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK;
From: Sean Christopherson seanjc@google.com
commit 18e897d213cb152c786abab14919196bd9dc3a9f upstream.
Fold vmx_supported_debugctl() into vcpu_supported_debugctl(), its only caller. Setting bits only to clear them a few instructions later is rather silly, and splitting the logic makes things seem more complicated than they actually are.
Opportunistically drop DEBUGCTLMSR_LBR_MASK now that there's a single reference to the pair of bits. The extra layer of indirection provides no meaningful value and makes it unnecessarily tedious to understand what KVM is doing.
No functional change.
Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20221006000314.73240-4-seanjc@google.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/capabilities.h | 15 --------------- arch/x86/kvm/vmx/vmx.c | 12 +++++++----- 2 files changed, 7 insertions(+), 20 deletions(-)
--- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -24,8 +24,6 @@ extern int __read_mostly pt_mode; #define PMU_CAP_FW_WRITES (1ULL << 13) #define PMU_CAP_LBR_FMT 0x3f
-#define DEBUGCTLMSR_LBR_MASK (DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) - struct nested_vmx_msrs { /* * We only store the "true" versions of the VMX capability MSRs. We @@ -425,19 +423,6 @@ static inline u64 vmx_get_perf_capabilit return perf_cap; }
-static inline u64 vmx_supported_debugctl(void) -{ - u64 debugctl = 0; - - if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT)) - debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT; - - if (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT) - debugctl |= DEBUGCTLMSR_LBR_MASK; - - return debugctl; -} - static inline bool cpu_has_notify_vmexit(void) { return vmcs_config.cpu_based_2nd_exec_ctrl & --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2018,13 +2018,15 @@ static u64 nested_vmx_truncate_sysenter_
static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu) { - u64 debugctl = vmx_supported_debugctl(); + u64 debugctl = 0;
- if (!intel_pmu_lbr_is_enabled(vcpu)) - debugctl &= ~DEBUGCTLMSR_LBR_MASK; + if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) && + guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT)) + debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT;
- if (!guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT)) - debugctl &= ~DEBUGCTLMSR_BUS_LOCK_DETECT; + if ((vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT) && + intel_pmu_lbr_is_enabled(vcpu)) + debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
return debugctl; }
From: Sean Christopherson seanjc@google.com
commit b333b8ebb85d62469f32b52fa03fd7d1522afc03 upstream.
Ignore guest CPUID for host userspace writes to the DEBUGCTL MSR, KVM's ABI is that setting CPUID vs. state can be done in any order, i.e. KVM allows userspace to stuff MSRs prior to setting the guest's CPUID that makes the new MSR "legal".
Keep the vmx_get_perf_capabilities() check for guest writes, even though it's technically unnecessary since the vCPU's PERF_CAPABILITIES is consulted when refreshing LBR support. A future patch will clean up vmx_get_perf_capabilities() to avoid the RDMSR on every call, at which point the paranoia will incur no meaningful overhead.
Note, prior to vmx_get_perf_capabilities() checking that the host fully supports LBRs via x86_perf_get_lbr(), KVM effectively relied on intel_pmu_lbr_is_enabled() to guard against host userspace enabling LBRs on platforms without full support.
Fixes: c646236344e9 ("KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabled") Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20221006000314.73240-5-seanjc@google.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/vmx.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2016,16 +2016,16 @@ static u64 nested_vmx_truncate_sysenter_ return (unsigned long)data; }
-static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu) +static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated) { u64 debugctl = 0;
if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) && - guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT)) + (host_initiated || guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))) debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT;
if ((vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT) && - intel_pmu_lbr_is_enabled(vcpu)) + (host_initiated || intel_pmu_lbr_is_enabled(vcpu))) debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
return debugctl; @@ -2100,7 +2100,9 @@ static int vmx_set_msr(struct kvm_vcpu * vmcs_writel(GUEST_SYSENTER_ESP, data); break; case MSR_IA32_DEBUGCTLMSR: { - u64 invalid = data & ~vcpu_supported_debugctl(vcpu); + u64 invalid; + + invalid = data & ~vmx_get_supported_debugctl(vcpu, msr_info->host_initiated); if (invalid & (DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR)) { if (report_ignored_msrs) vcpu_unimpl(vcpu, "%s: BTF|LBR in IA32_DEBUGCTLMSR 0x%llx, nop\n",
From: Emanuele Giuseppe Esposito eesposit@redhat.com
commit 1c1a41497ab879ac9608f3047f230af833eeef3d upstream.
Clear enable_sgx if ENCLS-exiting is not supported, i.e. if SGX cannot be virtualized. When KVM is loaded, adjust_vmx_controls checks that the bit is available before enabling the feature; however, other parts of the code check enable_sgx and not clearing the variable caused two different bugs, mostly affecting nested virtualization scenarios.
First, because enable_sgx remained true, SECONDARY_EXEC_ENCLS_EXITING would be marked available in the capability MSR that are accessed by a nested hypervisor. KVM would then propagate the control from vmcs12 to vmcs02 even if it isn't supported by the processor, thus causing an unexpected VM-Fail (exit code 0x7) in L1.
Second, vmx_set_cpu_caps() would not clear the SGX bits when hardware support is unavailable. This is a much less problematic bug as it only happens if SGX is soft-disabled (available in the processor but hidden in CPUID) or if SGX is supported for bare metal but not in the VMCS (will never happen when running on bare metal, but can theoertically happen when running in a VM).
Last but not least, this ensures that module params in sysfs reflect KVM's actual configuration.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=2127128 Fixes: 72add915fbd5 ("KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC") Cc: stable@vger.kernel.org Suggested-by: Sean Christopherson seanjc@google.com Suggested-by: Bandan Das bsd@redhat.com Signed-off-by: Emanuele Giuseppe Esposito eesposit@redhat.com Message-Id: 20221025123749.2201649-1-eesposit@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/vmx.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -8281,6 +8281,11 @@ static __init int hardware_setup(void) if (!cpu_has_virtual_nmis()) enable_vnmi = 0;
+#ifdef CONFIG_X86_SGX_KVM + if (!cpu_has_vmx_encls_vmexit()) + enable_sgx = false; +#endif + /* * set_apic_access_page_addr() is used to reload apic access * page upon invalidation. No need to do anything if not
From: Michal Luczaj mhal@rbox.co
commit 52491a38b2c2411f3f0229dc6ad610349c704a41 upstream.
Move the gfn_to_pfn_cache lock initialization to another helper and call the new helper during VM/vCPU creation. There are race conditions possible due to kvm_gfn_to_pfn_cache_init()'s ability to re-initialize the cache's locks.
For example: a race between ioctl(KVM_XEN_HVM_EVTCHN_SEND) and kvm_gfn_to_pfn_cache_init() leads to a corrupted shinfo gpc lock.
(thread 1) | (thread 2) | kvm_xen_set_evtchn_fast | read_lock_irqsave(&gpc->lock, ...) | | kvm_gfn_to_pfn_cache_init | rwlock_init(&gpc->lock) read_unlock_irqrestore(&gpc->lock, ...) |
Rename "cache_init" and "cache_destroy" to activate+deactivate to avoid implying that the cache really is destroyed/freed.
Note, there more races in the newly named kvm_gpc_activate() that will be addressed separately.
Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support") Cc: stable@vger.kernel.org Suggested-by: Sean Christopherson seanjc@google.com Signed-off-by: Michal Luczaj mhal@rbox.co [sean: call out that this is a bug fix] Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20221013211234.1318131-2-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/x86.c | 12 +++++---- arch/x86/kvm/xen.c | 57 ++++++++++++++++++++++++----------------------- include/linux/kvm_host.h | 24 ++++++++++++++----- virt/kvm/pfncache.c | 21 +++++++++-------- 4 files changed, 66 insertions(+), 48 deletions(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2304,11 +2304,11 @@ static void kvm_write_system_time(struct
/* we verify if the enable bit is set... */ if (system_time & 1) { - kvm_gfn_to_pfn_cache_init(vcpu->kvm, &vcpu->arch.pv_time, vcpu, - KVM_HOST_USES_PFN, system_time & ~1ULL, - sizeof(struct pvclock_vcpu_time_info)); + kvm_gpc_activate(vcpu->kvm, &vcpu->arch.pv_time, vcpu, + KVM_HOST_USES_PFN, system_time & ~1ULL, + sizeof(struct pvclock_vcpu_time_info)); } else { - kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, &vcpu->arch.pv_time); + kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.pv_time); }
return; @@ -3377,7 +3377,7 @@ static int kvm_pv_enable_async_pf_int(st
static void kvmclock_reset(struct kvm_vcpu *vcpu) { - kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, &vcpu->arch.pv_time); + kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.pv_time); vcpu->arch.time = 0; }
@@ -11629,6 +11629,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu vcpu->arch.regs_avail = ~0; vcpu->arch.regs_dirty = ~0;
+ kvm_gpc_init(&vcpu->arch.pv_time); + if (!irqchip_in_kernel(vcpu->kvm) || kvm_vcpu_is_reset_bsp(vcpu)) vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; else --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -42,13 +42,13 @@ static int kvm_xen_shared_info_init(stru int idx = srcu_read_lock(&kvm->srcu);
if (gfn == GPA_INVALID) { - kvm_gfn_to_pfn_cache_destroy(kvm, gpc); + kvm_gpc_deactivate(kvm, gpc); goto out; }
do { - ret = kvm_gfn_to_pfn_cache_init(kvm, gpc, NULL, KVM_HOST_USES_PFN, - gpa, PAGE_SIZE); + ret = kvm_gpc_activate(kvm, gpc, NULL, KVM_HOST_USES_PFN, gpa, + PAGE_SIZE); if (ret) goto out;
@@ -554,15 +554,15 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcp offsetof(struct compat_vcpu_info, time));
if (data->u.gpa == GPA_INVALID) { - kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, &vcpu->arch.xen.vcpu_info_cache); + kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.xen.vcpu_info_cache); r = 0; break; }
- r = kvm_gfn_to_pfn_cache_init(vcpu->kvm, - &vcpu->arch.xen.vcpu_info_cache, - NULL, KVM_HOST_USES_PFN, data->u.gpa, - sizeof(struct vcpu_info)); + r = kvm_gpc_activate(vcpu->kvm, + &vcpu->arch.xen.vcpu_info_cache, NULL, + KVM_HOST_USES_PFN, data->u.gpa, + sizeof(struct vcpu_info)); if (!r) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
@@ -570,16 +570,16 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcp
case KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO: if (data->u.gpa == GPA_INVALID) { - kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, - &vcpu->arch.xen.vcpu_time_info_cache); + kvm_gpc_deactivate(vcpu->kvm, + &vcpu->arch.xen.vcpu_time_info_cache); r = 0; break; }
- r = kvm_gfn_to_pfn_cache_init(vcpu->kvm, - &vcpu->arch.xen.vcpu_time_info_cache, - NULL, KVM_HOST_USES_PFN, data->u.gpa, - sizeof(struct pvclock_vcpu_time_info)); + r = kvm_gpc_activate(vcpu->kvm, + &vcpu->arch.xen.vcpu_time_info_cache, + NULL, KVM_HOST_USES_PFN, data->u.gpa, + sizeof(struct pvclock_vcpu_time_info)); if (!r) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); break; @@ -590,16 +590,15 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcp break; } if (data->u.gpa == GPA_INVALID) { - kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, - &vcpu->arch.xen.runstate_cache); + kvm_gpc_deactivate(vcpu->kvm, + &vcpu->arch.xen.runstate_cache); r = 0; break; }
- r = kvm_gfn_to_pfn_cache_init(vcpu->kvm, - &vcpu->arch.xen.runstate_cache, - NULL, KVM_HOST_USES_PFN, data->u.gpa, - sizeof(struct vcpu_runstate_info)); + r = kvm_gpc_activate(vcpu->kvm, &vcpu->arch.xen.runstate_cache, + NULL, KVM_HOST_USES_PFN, data->u.gpa, + sizeof(struct vcpu_runstate_info)); break;
case KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT: @@ -1817,7 +1816,12 @@ void kvm_xen_init_vcpu(struct kvm_vcpu * { vcpu->arch.xen.vcpu_id = vcpu->vcpu_idx; vcpu->arch.xen.poll_evtchn = 0; + timer_setup(&vcpu->arch.xen.poll_timer, cancel_evtchn_poll, 0); + + kvm_gpc_init(&vcpu->arch.xen.runstate_cache); + kvm_gpc_init(&vcpu->arch.xen.vcpu_info_cache); + kvm_gpc_init(&vcpu->arch.xen.vcpu_time_info_cache); }
void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu) @@ -1825,18 +1829,17 @@ void kvm_xen_destroy_vcpu(struct kvm_vcp if (kvm_xen_timer_enabled(vcpu)) kvm_xen_stop_timer(vcpu);
- kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, - &vcpu->arch.xen.runstate_cache); - kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, - &vcpu->arch.xen.vcpu_info_cache); - kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, - &vcpu->arch.xen.vcpu_time_info_cache); + kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.xen.runstate_cache); + kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.xen.vcpu_info_cache); + kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.xen.vcpu_time_info_cache); + del_timer_sync(&vcpu->arch.xen.poll_timer); }
void kvm_xen_init_vm(struct kvm *kvm) { idr_init(&kvm->arch.xen.evtchn_ports); + kvm_gpc_init(&kvm->arch.xen.shinfo_cache); }
void kvm_xen_destroy_vm(struct kvm *kvm) @@ -1844,7 +1847,7 @@ void kvm_xen_destroy_vm(struct kvm *kvm) struct evtchnfd *evtchnfd; int i;
- kvm_gfn_to_pfn_cache_destroy(kvm, &kvm->arch.xen.shinfo_cache); + kvm_gpc_deactivate(kvm, &kvm->arch.xen.shinfo_cache);
idr_for_each_entry(&kvm->arch.xen.evtchn_ports, evtchnfd, i) { if (!evtchnfd->deliver.port.port) --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1241,8 +1241,18 @@ int kvm_vcpu_write_guest(struct kvm_vcpu void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
/** - * kvm_gfn_to_pfn_cache_init - prepare a cached kernel mapping and HPA for a - * given guest physical address. + * kvm_gpc_init - initialize gfn_to_pfn_cache. + * + * @gpc: struct gfn_to_pfn_cache object. + * + * This sets up a gfn_to_pfn_cache by initializing locks. Note, the cache must + * be zero-allocated (or zeroed by the caller before init). + */ +void kvm_gpc_init(struct gfn_to_pfn_cache *gpc); + +/** + * kvm_gpc_activate - prepare a cached kernel mapping and HPA for a given guest + * physical address. * * @kvm: pointer to kvm instance. * @gpc: struct gfn_to_pfn_cache object. @@ -1266,9 +1276,9 @@ void kvm_vcpu_mark_page_dirty(struct kvm * kvm_gfn_to_pfn_cache_check() to ensure that the cache is valid before * accessing the target page. */ -int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, - struct kvm_vcpu *vcpu, enum pfn_cache_usage usage, - gpa_t gpa, unsigned long len); +int kvm_gpc_activate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, + struct kvm_vcpu *vcpu, enum pfn_cache_usage usage, + gpa_t gpa, unsigned long len);
/** * kvm_gfn_to_pfn_cache_check - check validity of a gfn_to_pfn_cache. @@ -1325,7 +1335,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);
/** - * kvm_gfn_to_pfn_cache_destroy - destroy and unlink a gfn_to_pfn_cache. + * kvm_gpc_deactivate - deactivate and unlink a gfn_to_pfn_cache. * * @kvm: pointer to kvm instance. * @gpc: struct gfn_to_pfn_cache object. @@ -1333,7 +1343,7 @@ void kvm_gfn_to_pfn_cache_unmap(struct k * This removes a cache from the @kvm's list to be processed on MMU notifier * invocation. */ -void kvm_gfn_to_pfn_cache_destroy(struct kvm *kvm, struct gfn_to_pfn_cache *gpc); +void kvm_gpc_deactivate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);
void kvm_sigset_activate(struct kvm_vcpu *vcpu); void kvm_sigset_deactivate(struct kvm_vcpu *vcpu); --- a/virt/kvm/pfncache.c +++ b/virt/kvm/pfncache.c @@ -346,17 +346,20 @@ void kvm_gfn_to_pfn_cache_unmap(struct k } EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_unmap);
+void kvm_gpc_init(struct gfn_to_pfn_cache *gpc) +{ + rwlock_init(&gpc->lock); + mutex_init(&gpc->refresh_lock); +} +EXPORT_SYMBOL_GPL(kvm_gpc_init);
-int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, - struct kvm_vcpu *vcpu, enum pfn_cache_usage usage, - gpa_t gpa, unsigned long len) +int kvm_gpc_activate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, + struct kvm_vcpu *vcpu, enum pfn_cache_usage usage, + gpa_t gpa, unsigned long len) { WARN_ON_ONCE(!usage || (usage & KVM_GUEST_AND_HOST_USE_PFN) != usage);
if (!gpc->active) { - rwlock_init(&gpc->lock); - mutex_init(&gpc->refresh_lock); - gpc->khva = NULL; gpc->pfn = KVM_PFN_ERR_FAULT; gpc->uhva = KVM_HVA_ERR_BAD; @@ -371,9 +374,9 @@ int kvm_gfn_to_pfn_cache_init(struct kvm } return kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpa, len); } -EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_init); +EXPORT_SYMBOL_GPL(kvm_gpc_activate);
-void kvm_gfn_to_pfn_cache_destroy(struct kvm *kvm, struct gfn_to_pfn_cache *gpc) +void kvm_gpc_deactivate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc) { if (gpc->active) { spin_lock(&kvm->gpc_lock); @@ -384,4 +387,4 @@ void kvm_gfn_to_pfn_cache_destroy(struct gpc->active = false; } } -EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_destroy); +EXPORT_SYMBOL_GPL(kvm_gpc_deactivate);
From: Sean Christopherson seanjc@google.com
commit ecbcf030b45666ad11bc98565e71dfbcb7be4393 upstream.
Reject kvm_gpc_check() and kvm_gpc_refresh() if the cache is inactive. Not checking the active flag during refresh is particularly egregious, as KVM can end up with a valid, inactive cache, which can lead to a variety of use-after-free bugs, e.g. consuming a NULL kernel pointer or missing an mmu_notifier invalidation due to the cache not being on the list of gfns to invalidate.
Note, "active" needs to be set if and only if the cache is on the list of caches, i.e. is reachable via mmu_notifier events. If a relevant mmu_notifier event occurs while the cache is "active" but not on the list, KVM will not acquire the cache's lock and so will not serailize the mmu_notifier event with active users and/or kvm_gpc_refresh().
A race between KVM_XEN_ATTR_TYPE_SHARED_INFO and KVM_XEN_HVM_EVTCHN_SEND can be exploited to trigger the bug.
1. Deactivate shinfo cache:
kvm_xen_hvm_set_attr case KVM_XEN_ATTR_TYPE_SHARED_INFO kvm_gpc_deactivate kvm_gpc_unmap gpc->valid = false gpc->khva = NULL gpc->active = false
Result: active = false, valid = false
2. Cause cache refresh:
kvm_arch_vm_ioctl case KVM_XEN_HVM_EVTCHN_SEND kvm_xen_hvm_evtchn_send kvm_xen_set_evtchn kvm_xen_set_evtchn_fast kvm_gpc_check return -EWOULDBLOCK because !gpc->valid kvm_xen_set_evtchn_fast return -EWOULDBLOCK kvm_gpc_refresh hva_to_pfn_retry gpc->valid = true gpc->khva = not NULL
Result: active = false, valid = true
3. Race ioctl KVM_XEN_HVM_EVTCHN_SEND against ioctl KVM_XEN_ATTR_TYPE_SHARED_INFO:
kvm_arch_vm_ioctl case KVM_XEN_HVM_EVTCHN_SEND kvm_xen_hvm_evtchn_send kvm_xen_set_evtchn kvm_xen_set_evtchn_fast read_lock gpc->lock kvm_xen_hvm_set_attr case KVM_XEN_ATTR_TYPE_SHARED_INFO mutex_lock kvm->lock kvm_xen_shared_info_init kvm_gpc_activate gpc->khva = NULL kvm_gpc_check [ Check passes because gpc->valid is still true, even though gpc->khva is already NULL. ] shinfo = gpc->khva pending_bits = shinfo->evtchn_pending CRASH: test_and_set_bit(..., pending_bits)
Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support") Cc: stable@vger.kernel.org Reported-by: : Michal Luczaj mhal@rbox.co Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20221013211234.1318131-3-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- virt/kvm/pfncache.c | 41 ++++++++++++++++++++++++++++++++++------- 1 file changed, 34 insertions(+), 7 deletions(-)
--- a/virt/kvm/pfncache.c +++ b/virt/kvm/pfncache.c @@ -81,6 +81,9 @@ bool kvm_gfn_to_pfn_cache_check(struct k { struct kvm_memslots *slots = kvm_memslots(kvm);
+ if (!gpc->active) + return false; + if ((gpa & ~PAGE_MASK) + len > PAGE_SIZE) return false;
@@ -240,10 +243,11 @@ int kvm_gfn_to_pfn_cache_refresh(struct { struct kvm_memslots *slots = kvm_memslots(kvm); unsigned long page_offset = gpa & ~PAGE_MASK; - kvm_pfn_t old_pfn, new_pfn; + bool unmap_old = false; unsigned long old_uhva; + kvm_pfn_t old_pfn; void *old_khva; - int ret = 0; + int ret;
/* * If must fit within a single page. The 'len' argument is @@ -261,6 +265,11 @@ int kvm_gfn_to_pfn_cache_refresh(struct
write_lock_irq(&gpc->lock);
+ if (!gpc->active) { + ret = -EINVAL; + goto out_unlock; + } + old_pfn = gpc->pfn; old_khva = gpc->khva - offset_in_page(gpc->khva); old_uhva = gpc->uhva; @@ -291,6 +300,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct /* If the HVA→PFN mapping was already valid, don't unmap it. */ old_pfn = KVM_PFN_ERR_FAULT; old_khva = NULL; + ret = 0; }
out: @@ -305,14 +315,15 @@ int kvm_gfn_to_pfn_cache_refresh(struct gpc->khva = NULL; }
- /* Snapshot the new pfn before dropping the lock! */ - new_pfn = gpc->pfn; + /* Detect a pfn change before dropping the lock! */ + unmap_old = (old_pfn != gpc->pfn);
+out_unlock: write_unlock_irq(&gpc->lock);
mutex_unlock(&gpc->refresh_lock);
- if (old_pfn != new_pfn) + if (unmap_old) gpc_unmap_khva(kvm, old_pfn, old_khva);
return ret; @@ -366,11 +377,19 @@ int kvm_gpc_activate(struct kvm *kvm, st gpc->vcpu = vcpu; gpc->usage = usage; gpc->valid = false; - gpc->active = true;
spin_lock(&kvm->gpc_lock); list_add(&gpc->list, &kvm->gpc_list); spin_unlock(&kvm->gpc_lock); + + /* + * Activate the cache after adding it to the list, a concurrent + * refresh must not establish a mapping until the cache is + * reachable by mmu_notifier events. + */ + write_lock_irq(&gpc->lock); + gpc->active = true; + write_unlock_irq(&gpc->lock); } return kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpa, len); } @@ -379,12 +398,20 @@ EXPORT_SYMBOL_GPL(kvm_gpc_activate); void kvm_gpc_deactivate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc) { if (gpc->active) { + /* + * Deactivate the cache before removing it from the list, KVM + * must stall mmu_notifier events until all users go away, i.e. + * until gpc->lock is dropped and refresh is guaranteed to fail. + */ + write_lock_irq(&gpc->lock); + gpc->active = false; + write_unlock_irq(&gpc->lock); + spin_lock(&kvm->gpc_lock); list_del(&gpc->list); spin_unlock(&kvm->gpc_lock);
kvm_gfn_to_pfn_cache_unmap(kvm, gpc); - gpc->active = false; } } EXPORT_SYMBOL_GPL(kvm_gpc_deactivate);
From: Ryan Roberts ryan.roberts@arm.com
commit b6bcdc9f6b8321e4471ff45413b6410e16762a8d upstream.
enter_exception64() performs an MTE check, which involves dereferencing vcpu->kvm. While vcpu has already been fixed up to be a HYP VA pointer, kvm is still a pointer in the kernel VA space.
This only affects nVHE configurations with MTE enabled, as in other cases, the pointer is either valid (VHE) or not dereferenced (!MTE).
Fix this by first converting kvm to a HYP VA pointer.
Fixes: ea7fc1bb1cd1 ("KVM: arm64: Introduce MTE VM feature") Signed-off-by: Ryan Roberts ryan.roberts@arm.com Reviewed-by: Steven Price steven.price@arm.com [maz: commit message tidy-up] Signed-off-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221027120945.29679-1-ryan.roberts@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/hyp/exception.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/arm64/kvm/hyp/exception.c +++ b/arch/arm64/kvm/hyp/exception.c @@ -13,6 +13,7 @@ #include <hyp/adjust_pc.h> #include <linux/kvm_host.h> #include <asm/kvm_emulate.h> +#include <asm/kvm_mmu.h>
#if !defined (__KVM_NVHE_HYPERVISOR__) && !defined (__KVM_VHE_HYPERVISOR__) #error Hypervisor code only! @@ -115,7 +116,7 @@ static void enter_exception64(struct kvm new |= (old & PSR_C_BIT); new |= (old & PSR_V_BIT);
- if (kvm_has_mte(vcpu->kvm)) + if (kvm_has_mte(kern_hyp_va(vcpu->kvm))) new |= PSR_TCO_BIT;
new |= (old & PSR_DIT_BIT);
From: Marc Zyngier maz@kernel.org
commit 4151bb636acf32bb2e6126cec8216b023117c0e9 upstream.
The trapping of SMPRI_EL1 and TPIDR2_EL0 currently only really work on nVHE, as only this mode uses the fine-grained trapping that controls these two registers.
Move the trapping enable/disable code into __{de,}activate_traps_common(), allowing it to be called when it actually matters on VHE, and remove the flipping of EL2 control for TPIDR2_EL0, which only affects the host access of this register.
Fixes: 861262ab8627 ("KVM: arm64: Handle SME host state when running guests") Reported-by: Mark Brown broonie@kernel.org Reviewed-by: Mark Brown broonie@kernel.org Signed-off-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/86bkpqer4z.wl-maz@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/hyp/include/hyp/switch.h | 20 +++++++++++++++++++ arch/arm64/kvm/hyp/nvhe/switch.c | 26 ------------------------- arch/arm64/kvm/hyp/vhe/switch.c | 8 -------- 3 files changed, 20 insertions(+), 34 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index 6cbbb6c02f66..3330d1b76bdd 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -87,6 +87,17 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
vcpu->arch.mdcr_el2_host = read_sysreg(mdcr_el2); write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2); + + if (cpus_have_final_cap(ARM64_SME)) { + sysreg_clear_set_s(SYS_HFGRTR_EL2, + HFGxTR_EL2_nSMPRI_EL1_MASK | + HFGxTR_EL2_nTPIDR2_EL0_MASK, + 0); + sysreg_clear_set_s(SYS_HFGWTR_EL2, + HFGxTR_EL2_nSMPRI_EL1_MASK | + HFGxTR_EL2_nTPIDR2_EL0_MASK, + 0); + } }
static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu) @@ -96,6 +107,15 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu) write_sysreg(0, hstr_el2); if (kvm_arm_support_pmu_v3()) write_sysreg(0, pmuserenr_el0); + + if (cpus_have_final_cap(ARM64_SME)) { + sysreg_clear_set_s(SYS_HFGRTR_EL2, 0, + HFGxTR_EL2_nSMPRI_EL1_MASK | + HFGxTR_EL2_nTPIDR2_EL0_MASK); + sysreg_clear_set_s(SYS_HFGWTR_EL2, 0, + HFGxTR_EL2_nSMPRI_EL1_MASK | + HFGxTR_EL2_nTPIDR2_EL0_MASK); + } }
static inline void ___activate_traps(struct kvm_vcpu *vcpu) diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c index 8e9d49a964be..c2cb46ca4fb6 100644 --- a/arch/arm64/kvm/hyp/nvhe/switch.c +++ b/arch/arm64/kvm/hyp/nvhe/switch.c @@ -55,18 +55,6 @@ static void __activate_traps(struct kvm_vcpu *vcpu) write_sysreg(val, cptr_el2); write_sysreg(__this_cpu_read(kvm_hyp_vector), vbar_el2);
- if (cpus_have_final_cap(ARM64_SME)) { - val = read_sysreg_s(SYS_HFGRTR_EL2); - val &= ~(HFGxTR_EL2_nTPIDR2_EL0_MASK | - HFGxTR_EL2_nSMPRI_EL1_MASK); - write_sysreg_s(val, SYS_HFGRTR_EL2); - - val = read_sysreg_s(SYS_HFGWTR_EL2); - val &= ~(HFGxTR_EL2_nTPIDR2_EL0_MASK | - HFGxTR_EL2_nSMPRI_EL1_MASK); - write_sysreg_s(val, SYS_HFGWTR_EL2); - } - if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) { struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
@@ -110,20 +98,6 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
write_sysreg(this_cpu_ptr(&kvm_init_params)->hcr_el2, hcr_el2);
- if (cpus_have_final_cap(ARM64_SME)) { - u64 val; - - val = read_sysreg_s(SYS_HFGRTR_EL2); - val |= HFGxTR_EL2_nTPIDR2_EL0_MASK | - HFGxTR_EL2_nSMPRI_EL1_MASK; - write_sysreg_s(val, SYS_HFGRTR_EL2); - - val = read_sysreg_s(SYS_HFGWTR_EL2); - val |= HFGxTR_EL2_nTPIDR2_EL0_MASK | - HFGxTR_EL2_nSMPRI_EL1_MASK; - write_sysreg_s(val, SYS_HFGWTR_EL2); - } - cptr = CPTR_EL2_DEFAULT; if (vcpu_has_sve(vcpu) && (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED)) cptr |= CPTR_EL2_TZ; diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c index 7acb87eaa092..1a97391fedd2 100644 --- a/arch/arm64/kvm/hyp/vhe/switch.c +++ b/arch/arm64/kvm/hyp/vhe/switch.c @@ -63,10 +63,6 @@ static void __activate_traps(struct kvm_vcpu *vcpu) __activate_traps_fpsimd32(vcpu); }
- if (cpus_have_final_cap(ARM64_SME)) - write_sysreg(read_sysreg(sctlr_el2) & ~SCTLR_ELx_ENTP2, - sctlr_el2); - write_sysreg(val, cpacr_el1);
write_sysreg(__this_cpu_read(kvm_hyp_vector), vbar_el1); @@ -88,10 +84,6 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu) */ asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT));
- if (cpus_have_final_cap(ARM64_SME)) - write_sysreg(read_sysreg(sctlr_el2) | SCTLR_ELx_ENTP2, - sctlr_el2); - write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
if (!arm64_kernel_unmapped_at_el0())
From: Maxim Levitsky mlevitsk@redhat.com
commit 696db303e54f7352623d9f640e6c51d8fa9d5588 upstream.
On 64 bit host, if the guest doesn't have X86_FEATURE_LM, KVM will access 16 gprs to 32-bit smram image, causing out-ouf-bound ram access.
On 32 bit host, the rsm_load_state_64/enter_smm_save_state_64 is compiled out, thus access overflow can't happen.
Fixes: b443183a25ab61 ("KVM: x86: Reduce the number of emulator GPRs to '8' for 32-bit KVM")
Signed-off-by: Maxim Levitsky mlevitsk@redhat.com Reviewed-by: Sean Christopherson seanjc@google.com Message-Id: 20221025124741.228045-15-mlevitsk@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/emulate.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2430,7 +2430,7 @@ static int rsm_load_state_32(struct x86_ ctxt->eflags = GET_SMSTATE(u32, smstate, 0x7ff4) | X86_EFLAGS_FIXED; ctxt->_eip = GET_SMSTATE(u32, smstate, 0x7ff0);
- for (i = 0; i < NR_EMULATOR_GPRS; i++) + for (i = 0; i < 8; i++) *reg_write(ctxt, i) = GET_SMSTATE(u32, smstate, 0x7fd0 + i * 4);
val = GET_SMSTATE(u32, smstate, 0x7fcc); @@ -2487,7 +2487,7 @@ static int rsm_load_state_64(struct x86_ u16 selector; int i, r;
- for (i = 0; i < NR_EMULATOR_GPRS; i++) + for (i = 0; i < 16; i++) *reg_write(ctxt, i) = GET_SMSTATE(u64, smstate, 0x7ff8 - i * 8);
ctxt->_eip = GET_SMSTATE(u64, smstate, 0x7f78);
From: Maxim Levitsky mlevitsk@redhat.com
commit 5015bb89b58225f97df6ac44383e7e8c8662c8c9 upstream.
SYSEXIT is one of the instructions that can change the processor mode, thus ctxt->mode should be updated after it.
Note that this is likely a benign bug, because the only problematic mode change is from 32 bit to 64 bit which can lead to truncation of RIP, and it is not possible to do with sysexit, since sysexit running in 32 bit mode will be limited to 32 bit version.
Signed-off-by: Maxim Levitsky mlevitsk@redhat.com Message-Id: 20221025124741.228045-11-mlevitsk@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/emulate.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2874,6 +2874,7 @@ static int em_sysexit(struct x86_emulate ops->set_segment(ctxt, ss_sel, &ss, 0, VCPU_SREG_SS);
ctxt->_eip = rdx; + ctxt->mode = usermode; *reg_write(ctxt, VCPU_REGS_RSP) = rcx;
return X86EMUL_CONTINUE;
From: Maxim Levitsky mlevitsk@redhat.com
commit d087e0f79fa0dd336a9a6b2f79ec23120f5eff73 upstream.
Some instructions update the cpu execution mode, which needs to update the emulation mode.
Extract this code, and make assign_eip_far use it.
assign_eip_far now reads CS, instead of getting it via a parameter, which is ok, because callers always assign CS to the same value before calling this function.
No functional change is intended.
Signed-off-by: Maxim Levitsky mlevitsk@redhat.com Message-Id: 20221025124741.228045-12-mlevitsk@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/emulate.c | 85 ++++++++++++++++++++++++++++++++----------------- 1 file changed, 57 insertions(+), 28 deletions(-)
--- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -791,8 +791,7 @@ static int linearize(struct x86_emulate_ ctxt->mode, linear); }
-static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst, - enum x86emul_mode mode) +static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst) { ulong linear; int rc; @@ -802,41 +801,71 @@ static inline int assign_eip(struct x86_
if (ctxt->op_bytes != sizeof(unsigned long)) addr.ea = dst & ((1UL << (ctxt->op_bytes << 3)) - 1); - rc = __linearize(ctxt, addr, &max_size, 1, false, true, mode, &linear); + rc = __linearize(ctxt, addr, &max_size, 1, false, true, ctxt->mode, &linear); if (rc == X86EMUL_CONTINUE) ctxt->_eip = addr.ea; return rc; }
+static inline int emulator_recalc_and_set_mode(struct x86_emulate_ctxt *ctxt) +{ + u64 efer; + struct desc_struct cs; + u16 selector; + u32 base3; + + ctxt->ops->get_msr(ctxt, MSR_EFER, &efer); + + if (!(ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PE)) { + /* Real mode. cpu must not have long mode active */ + if (efer & EFER_LMA) + return X86EMUL_UNHANDLEABLE; + ctxt->mode = X86EMUL_MODE_REAL; + return X86EMUL_CONTINUE; + } + + if (ctxt->eflags & X86_EFLAGS_VM) { + /* Protected/VM86 mode. cpu must not have long mode active */ + if (efer & EFER_LMA) + return X86EMUL_UNHANDLEABLE; + ctxt->mode = X86EMUL_MODE_VM86; + return X86EMUL_CONTINUE; + } + + if (!ctxt->ops->get_segment(ctxt, &selector, &cs, &base3, VCPU_SREG_CS)) + return X86EMUL_UNHANDLEABLE; + + if (efer & EFER_LMA) { + if (cs.l) { + /* Proper long mode */ + ctxt->mode = X86EMUL_MODE_PROT64; + } else if (cs.d) { + /* 32 bit compatibility mode*/ + ctxt->mode = X86EMUL_MODE_PROT32; + } else { + ctxt->mode = X86EMUL_MODE_PROT16; + } + } else { + /* Legacy 32 bit / 16 bit mode */ + ctxt->mode = cs.d ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16; + } + + return X86EMUL_CONTINUE; +} + static inline int assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst) { - return assign_eip(ctxt, dst, ctxt->mode); + return assign_eip(ctxt, dst); }
-static int assign_eip_far(struct x86_emulate_ctxt *ctxt, ulong dst, - const struct desc_struct *cs_desc) +static int assign_eip_far(struct x86_emulate_ctxt *ctxt, ulong dst) { - enum x86emul_mode mode = ctxt->mode; - int rc; + int rc = emulator_recalc_and_set_mode(ctxt);
-#ifdef CONFIG_X86_64 - if (ctxt->mode >= X86EMUL_MODE_PROT16) { - if (cs_desc->l) { - u64 efer = 0; + if (rc != X86EMUL_CONTINUE) + return rc;
- ctxt->ops->get_msr(ctxt, MSR_EFER, &efer); - if (efer & EFER_LMA) - mode = X86EMUL_MODE_PROT64; - } else - mode = X86EMUL_MODE_PROT32; /* temporary value */ - } -#endif - if (mode == X86EMUL_MODE_PROT16 || mode == X86EMUL_MODE_PROT32) - mode = cs_desc->d ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16; - rc = assign_eip(ctxt, dst, mode); - if (rc == X86EMUL_CONTINUE) - ctxt->mode = mode; - return rc; + return assign_eip(ctxt, dst); }
static inline int jmp_rel(struct x86_emulate_ctxt *ctxt, int rel) @@ -2170,7 +2199,7 @@ static int em_jmp_far(struct x86_emulate if (rc != X86EMUL_CONTINUE) return rc;
- rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc); + rc = assign_eip_far(ctxt, ctxt->src.val); /* Error handling is not implemented. */ if (rc != X86EMUL_CONTINUE) return X86EMUL_UNHANDLEABLE; @@ -2248,7 +2277,7 @@ static int em_ret_far(struct x86_emulate &new_desc); if (rc != X86EMUL_CONTINUE) return rc; - rc = assign_eip_far(ctxt, eip, &new_desc); + rc = assign_eip_far(ctxt, eip); /* Error handling is not implemented. */ if (rc != X86EMUL_CONTINUE) return X86EMUL_UNHANDLEABLE; @@ -3468,7 +3497,7 @@ static int em_call_far(struct x86_emulat if (rc != X86EMUL_CONTINUE) return rc;
- rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc); + rc = assign_eip_far(ctxt, ctxt->src.val); if (rc != X86EMUL_CONTINUE) goto fail;
From: Maxim Levitsky mlevitsk@redhat.com
commit 055f37f84e304e59c046d1accfd8f08462f52c4c upstream.
Update the emulation mode after RSM so that RIP will be correctly written back, because the RSM instruction can switch the CPU mode from 32 bit (or less) to 64 bit.
This fixes a guest crash in case the #SMI is received while the guest runs a code from an address > 32 bit.
Signed-off-by: Maxim Levitsky mlevitsk@redhat.com Message-Id: 20221025124741.228045-13-mlevitsk@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/emulate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2660,7 +2660,7 @@ static int em_rsm(struct x86_emulate_ctx * those side effects need to be explicitly handled for both success * and shutdown. */ - return X86EMUL_CONTINUE; + return emulator_recalc_and_set_mode(ctxt);
emulate_shutdown: ctxt->ops->triple_fault(ctxt);
From: Maxim Levitsky mlevitsk@redhat.com
commit ad8f9e69942c7db90758d9d774157e53bce94840 upstream.
Update the emulation mode when handling writes to CR0, because toggling CR0.PE switches between Real and Protected Mode, and toggling CR0.PG when EFER.LME=1 switches between Long and Protected Mode.
This is likely a benign bug because there is no writeback of state, other than the RIP increment, and when toggling CR0.PE, the CPU has to execute code from a very low memory address.
Signed-off-by: Maxim Levitsky mlevitsk@redhat.com Message-Id: 20221025124741.228045-14-mlevitsk@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/emulate.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3639,11 +3639,25 @@ static int em_movbe(struct x86_emulate_c
static int em_cr_write(struct x86_emulate_ctxt *ctxt) { - if (ctxt->ops->set_cr(ctxt, ctxt->modrm_reg, ctxt->src.val)) + int cr_num = ctxt->modrm_reg; + int r; + + if (ctxt->ops->set_cr(ctxt, cr_num, ctxt->src.val)) return emulate_gp(ctxt, 0);
/* Disable writeback. */ ctxt->dst.type = OP_NONE; + + if (cr_num == 0) { + /* + * CR0 write might have updated CR0.PE and/or CR0.PG + * which can affect the cpu's execution mode. + */ + r = emulator_recalc_and_set_mode(ctxt); + if (r != X86EMUL_CONTINUE) + return r; + } + return X86EMUL_CONTINUE; }
From: Matthew Wilcox (Oracle) willy@infradead.org
commit 4fa0e3ff217f775cb58d2d6d51820ec519243fb9 upstream.
The recent change of page_cache_ra_unbounded() arguments was buggy in the two callers, causing us to readahead the wrong pages. Move the definition of ractl down to after the index is set correctly. This affected performance on configurations that use fs-verity.
Link: https://lkml.kernel.org/r/20221012193419.1453558-1-willy@infradead.org Fixes: 73bb49da50cd ("mm/readahead: make page_cache_ra_unbounded take a readahead_control") Signed-off-by: Matthew Wilcox (Oracle) willy@infradead.org Reported-by: Jintao Yin nicememory@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Cc: Eric Biggers ebiggers@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/verity.c | 3 ++- fs/f2fs/verity.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-)
--- a/fs/ext4/verity.c +++ b/fs/ext4/verity.c @@ -365,13 +365,14 @@ static struct page *ext4_read_merkle_tre pgoff_t index, unsigned long num_ra_pages) { - DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, index); struct page *page;
index += ext4_verity_metadata_pos(inode) >> PAGE_SHIFT;
page = find_get_page_flags(inode->i_mapping, index, FGP_ACCESSED); if (!page || !PageUptodate(page)) { + DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, index); + if (page) put_page(page); else if (num_ra_pages > 1) --- a/fs/f2fs/verity.c +++ b/fs/f2fs/verity.c @@ -262,13 +262,14 @@ static struct page *f2fs_read_merkle_tre pgoff_t index, unsigned long num_ra_pages) { - DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, index); struct page *page;
index += f2fs_verity_metadata_pos(inode) >> PAGE_SHIFT;
page = find_get_page_flags(inode->i_mapping, index, FGP_ACCESSED); if (!page || !PageUptodate(page)) { + DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, index); + if (page) put_page(page); else if (num_ra_pages > 1)
From: Ronnie Sahlberg lsahlber@redhat.com
commit 2f6f19c7aaad5005dc75298a413eb0243c5d312d upstream.
BZ: 215375
Fixes: 76a3c92ec9e0 ("cifs: remove support for NTLM and weaker authentication algorithms") Reviewed-by: Paulo Alcantara (SUSE) pc@cjr.nz Signed-off-by: Ronnie Sahlberg lsahlber@redhat.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/connect.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
--- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -3921,12 +3921,11 @@ CIFSTCon(const unsigned int xid, struct pSMB->AndXCommand = 0xFF; pSMB->Flags = cpu_to_le16(TCON_EXTENDED_SECINFO); bcc_ptr = &pSMB->Password[0]; - if (tcon->pipe || (ses->server->sec_mode & SECMODE_USER)) { - pSMB->PasswordLength = cpu_to_le16(1); /* minimum */ - *bcc_ptr = 0; /* password is null byte */ - bcc_ptr++; /* skip password */ - /* already aligned so no need to do it below */ - } + + pSMB->PasswordLength = cpu_to_le16(1); /* minimum */ + *bcc_ptr = 0; /* password is null byte */ + bcc_ptr++; /* skip password */ + /* already aligned so no need to do it below */
if (ses->server->sign) smb_buffer->Flags2 |= SMBFLG2_SECURITY_SIGNATURE;
From: Brian Norris briannorris@chromium.org
commit 0be67e0556e469c57100ffe3c90df90abc796f3b upstream.
If we fail to attach the first time (especially: EPROBE_DEFER), we fail to clean up 'usage_mode', and thus will fail to attach on any subsequent attempts, with "dsi controller already in use".
Re-set to DW_DSI_USAGE_IDLE on attach failure.
This is especially common to hit when enabling asynchronous probe on a duel-DSI system (such as RK3399 Gru/Scarlet), such that we're more likely to fail dw_mipi_dsi_rockchip_find_second() the first time.
Fixes: 71f68fe7f121 ("drm/rockchip: dsi: add ability to work as a phy instead of full dsi") Cc: stable@vger.kernel.org Signed-off-by: Brian Norris briannorris@chromium.org Signed-off-by: Heiko Stuebner heiko@sntech.de Link: https://patchwork.freedesktop.org/patch/msgid/20221019170255.1.Ia68dfb27b835... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c +++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c @@ -1031,23 +1031,31 @@ static int dw_mipi_dsi_rockchip_host_att if (ret) { DRM_DEV_ERROR(dsi->dev, "Failed to register component: %d\n", ret); - return ret; + goto out; }
second = dw_mipi_dsi_rockchip_find_second(dsi); - if (IS_ERR(second)) - return PTR_ERR(second); + if (IS_ERR(second)) { + ret = PTR_ERR(second); + goto out; + } if (second) { ret = component_add(second, &dw_mipi_dsi_rockchip_ops); if (ret) { DRM_DEV_ERROR(second, "Failed to register component: %d\n", ret); - return ret; + goto out; } }
return 0; + +out: + mutex_lock(&dsi->usage_mutex); + dsi->usage_mode = DW_DSI_USAGE_IDLE; + mutex_unlock(&dsi->usage_mutex); + return ret; }
static int dw_mipi_dsi_rockchip_host_detach(void *priv_data,
From: Brian Norris briannorris@chromium.org
commit 81e592f86f7afdb76d655e7fbd7803d7b8f985d8 upstream.
We can't safely probe a dual-DSI display asynchronously (driver_async_probe='*' or driver_async_probe='dw-mipi-dsi-rockchip' cmdline), because dw_mipi_dsi_rockchip_find_second() pokes one DSI device's drvdata from the other device without any locking.
Request synchronous probe, at least until this driver learns some appropriate locking for dual-DSI initialization.
Cc: stable@vger.kernel.org Signed-off-by: Brian Norris briannorris@chromium.org Signed-off-by: Heiko Stuebner heiko@sntech.de Link: https://patchwork.freedesktop.org/patch/msgid/20221019170255.2.I6b985b0ca372... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c +++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c @@ -1642,5 +1642,11 @@ struct platform_driver dw_mipi_dsi_rockc .of_match_table = dw_mipi_dsi_rockchip_dt_ids, .pm = &dw_mipi_dsi_rockchip_pm_ops, .name = "dw-mipi-dsi-rockchip", + /* + * For dual-DSI display, one DSI pokes at the other DSI's + * drvdata in dw_mipi_dsi_rockchip_find_second(). This is not + * safe for asynchronous probe. + */ + .probe_type = PROBE_FORCE_SYNCHRONOUS, }, };
From: Graham Sider Graham.Sider@amd.com
commit a3e5ce56f3d260f2ec8e5242c33f57e60ae9eba7 upstream.
Temporary workaround to fix issues observed in some compute applications when GFXOFF is enabled on GFX11.
Signed-off-by: Graham Sider Graham.Sider@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Harish Kasiviswanathan Harish.Kasiviswanathan@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -703,6 +703,13 @@ err:
void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev, bool idle) { + /* Temporary workaround to fix issues observed in some + * compute applications when GFXOFF is enabled on GFX11. + */ + if (IP_VERSION_MAJ(adev->ip_versions[GC_HWIP][0]) == 11) { + pr_debug("GFXOFF is %s\n", idle ? "enabled" : "disabled"); + amdgpu_gfx_off_ctrl(adev, idle); + } amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_COMPUTE, !idle);
From: Dillon Varone Dillon.Varone@amd.com
commit c580d758ba1b79de9ea7a475d95a6278736ae462 upstream.
Update DF related latencies based on new measurements.
Tested-by: Mark Broadworth mark.broadworth@amd.com Reviewed-by: Jun Lei Jun.Lei@amd.com Acked-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Signed-off-by: Dillon Varone Dillon.Varone@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dml/dcn321/dcn321_fpu.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn321/dcn321_fpu.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn321/dcn321_fpu.c @@ -119,15 +119,15 @@ struct _vcs_dpi_soc_bounding_box_st dcn3 }, }, .num_states = 1, - .sr_exit_time_us = 12.36, - .sr_enter_plus_exit_time_us = 16.72, + .sr_exit_time_us = 19.95, + .sr_enter_plus_exit_time_us = 24.36, .sr_exit_z8_time_us = 285.0, .sr_enter_plus_exit_z8_time_us = 320, .writeback_latency_us = 12.0, .round_trip_ping_latency_dcfclk_cycles = 263, - .urgent_latency_pixel_data_only_us = 4.0, - .urgent_latency_pixel_mixed_with_vm_data_us = 4.0, - .urgent_latency_vm_data_only_us = 4.0, + .urgent_latency_pixel_data_only_us = 9.35, + .urgent_latency_pixel_mixed_with_vm_data_us = 9.35, + .urgent_latency_vm_data_only_us = 9.35, .fclk_change_latency_us = 20, .usr_retraining_latency_us = 2, .smn_latency_us = 2,
From: Leo Chen sancchen@amd.com
commit 341421084d705475817f7f0d68e130370d10b20d upstream.
dcn314 has 4 DSC - conflicted hardware document updated and confirmed.
Tested-by: Mark Broadworth mark.broadworth@amd.com Reviewed-by: Charlene Liu Charlene.Liu@amd.com Acked-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Signed-off-by: Leo Chen sancchen@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c @@ -847,7 +847,7 @@ static const struct resource_caps res_ca .num_ddc = 5, .num_vmid = 16, .num_mpc_3dlut = 2, - .num_dsc = 3, + .num_dsc = 4, };
static const struct dc_plane_cap plane_cap = {
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit 3e206b6aa6df7eed4297577e0cf8403169b800a2 upstream.
We try to filter out the corresponding xxx1 output if the xxx0 output is not present. But the way that is being done is pretty awkward. Make it less so.
Cc: stable@vger.kernel.org Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20221026101134.20865-2-ville.s... Reviewed-by: Jani Nikula jani.nikula@intel.com (cherry picked from commit cc1e66394daaa7e9f005e2487a84e34a39f9308b) Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_sdvo.c | 27 ++++++++++++++++++++++----- 1 file changed, 22 insertions(+), 5 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_sdvo.c +++ b/drivers/gpu/drm/i915/display/intel_sdvo.c @@ -2926,16 +2926,33 @@ err: return false; }
+static u16 intel_sdvo_filter_output_flags(u16 flags) +{ + flags &= SDVO_OUTPUT_MASK; + + /* SDVO requires XXX1 function may not exist unless it has XXX0 function.*/ + if (!(flags & SDVO_OUTPUT_TMDS0)) + flags &= ~SDVO_OUTPUT_TMDS1; + + if (!(flags & SDVO_OUTPUT_RGB0)) + flags &= ~SDVO_OUTPUT_RGB1; + + if (!(flags & SDVO_OUTPUT_LVDS0)) + flags &= ~SDVO_OUTPUT_LVDS1; + + return flags; +} + static bool intel_sdvo_output_setup(struct intel_sdvo *intel_sdvo, u16 flags) { - /* SDVO requires XXX1 function may not exist unless it has XXX0 function.*/ + flags = intel_sdvo_filter_output_flags(flags);
if (flags & SDVO_OUTPUT_TMDS0) if (!intel_sdvo_dvi_init(intel_sdvo, 0)) return false;
- if ((flags & SDVO_TMDS_MASK) == SDVO_TMDS_MASK) + if (flags & SDVO_OUTPUT_TMDS1) if (!intel_sdvo_dvi_init(intel_sdvo, 1)) return false;
@@ -2956,7 +2973,7 @@ intel_sdvo_output_setup(struct intel_sdv if (!intel_sdvo_analog_init(intel_sdvo, 0)) return false;
- if ((flags & SDVO_RGB_MASK) == SDVO_RGB_MASK) + if (flags & SDVO_OUTPUT_RGB1) if (!intel_sdvo_analog_init(intel_sdvo, 1)) return false;
@@ -2964,11 +2981,11 @@ intel_sdvo_output_setup(struct intel_sdv if (!intel_sdvo_lvds_init(intel_sdvo, 0)) return false;
- if ((flags & SDVO_LVDS_MASK) == SDVO_LVDS_MASK) + if (flags & SDVO_OUTPUT_LVDS1) if (!intel_sdvo_lvds_init(intel_sdvo, 1)) return false;
- if ((flags & SDVO_OUTPUT_MASK) == 0) { + if (flags == 0) { unsigned char bytes[2];
intel_sdvo->controlled_output = 0;
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit e79762512120f11c51317570519a1553c70805d8 upstream.
Call intel_sdvo_select_ddc_bus() before initializing any of the outputs. And before that is functional (assuming no VBT) we have to set up the controlled_outputs thing. Otherwise DDC won't be functional during the output init but LVDS really needs it for the fixed mode setup.
Note that the whole multi output support still looks very bogus, and more work will be needed to make it correct. But for now this should at least fix the LVDS EDID fixed mode setup.
Cc: stable@vger.kernel.org Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7301 Fixes: aa2b88074a56 ("drm/i915/sdvo: Fix multi function encoder stuff") Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20221026101134.20865-3-ville.s... Reviewed-by: Jani Nikula jani.nikula@intel.com (cherry picked from commit 64b7b557dc8a96d9cfed6aedbf81de2df80c025d) Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_sdvo.c | 31 +++++++++++------------------- 1 file changed, 12 insertions(+), 19 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_sdvo.c +++ b/drivers/gpu/drm/i915/display/intel_sdvo.c @@ -2747,13 +2747,10 @@ intel_sdvo_dvi_init(struct intel_sdvo *i if (!intel_sdvo_connector) return false;
- if (device == 0) { - intel_sdvo->controlled_output |= SDVO_OUTPUT_TMDS0; + if (device == 0) intel_sdvo_connector->output_flag = SDVO_OUTPUT_TMDS0; - } else if (device == 1) { - intel_sdvo->controlled_output |= SDVO_OUTPUT_TMDS1; + else if (device == 1) intel_sdvo_connector->output_flag = SDVO_OUTPUT_TMDS1; - }
intel_connector = &intel_sdvo_connector->base; connector = &intel_connector->base; @@ -2808,7 +2805,6 @@ intel_sdvo_tv_init(struct intel_sdvo *in encoder->encoder_type = DRM_MODE_ENCODER_TVDAC; connector->connector_type = DRM_MODE_CONNECTOR_SVIDEO;
- intel_sdvo->controlled_output |= type; intel_sdvo_connector->output_flag = type;
if (intel_sdvo_connector_init(intel_sdvo_connector, intel_sdvo) < 0) { @@ -2849,13 +2845,10 @@ intel_sdvo_analog_init(struct intel_sdvo encoder->encoder_type = DRM_MODE_ENCODER_DAC; connector->connector_type = DRM_MODE_CONNECTOR_VGA;
- if (device == 0) { - intel_sdvo->controlled_output |= SDVO_OUTPUT_RGB0; + if (device == 0) intel_sdvo_connector->output_flag = SDVO_OUTPUT_RGB0; - } else if (device == 1) { - intel_sdvo->controlled_output |= SDVO_OUTPUT_RGB1; + else if (device == 1) intel_sdvo_connector->output_flag = SDVO_OUTPUT_RGB1; - }
if (intel_sdvo_connector_init(intel_sdvo_connector, intel_sdvo) < 0) { kfree(intel_sdvo_connector); @@ -2885,13 +2878,10 @@ intel_sdvo_lvds_init(struct intel_sdvo * encoder->encoder_type = DRM_MODE_ENCODER_LVDS; connector->connector_type = DRM_MODE_CONNECTOR_LVDS;
- if (device == 0) { - intel_sdvo->controlled_output |= SDVO_OUTPUT_LVDS0; + if (device == 0) intel_sdvo_connector->output_flag = SDVO_OUTPUT_LVDS0; - } else if (device == 1) { - intel_sdvo->controlled_output |= SDVO_OUTPUT_LVDS1; + else if (device == 1) intel_sdvo_connector->output_flag = SDVO_OUTPUT_LVDS1; - }
if (intel_sdvo_connector_init(intel_sdvo_connector, intel_sdvo) < 0) { kfree(intel_sdvo_connector); @@ -2946,8 +2936,14 @@ static u16 intel_sdvo_filter_output_flag static bool intel_sdvo_output_setup(struct intel_sdvo *intel_sdvo, u16 flags) { + struct drm_i915_private *i915 = to_i915(intel_sdvo->base.base.dev); + flags = intel_sdvo_filter_output_flags(flags);
+ intel_sdvo->controlled_output = flags; + + intel_sdvo_select_ddc_bus(i915, intel_sdvo); + if (flags & SDVO_OUTPUT_TMDS0) if (!intel_sdvo_dvi_init(intel_sdvo, 0)) return false; @@ -2988,7 +2984,6 @@ intel_sdvo_output_setup(struct intel_sdv if (flags == 0) { unsigned char bytes[2];
- intel_sdvo->controlled_output = 0; memcpy(bytes, &intel_sdvo->caps.output_flags, 2); DRM_DEBUG_KMS("%s: Unknown SDVO output type (0x%02x%02x)\n", SDVO_NAME(intel_sdvo), @@ -3400,8 +3395,6 @@ bool intel_sdvo_init(struct drm_i915_pri */ intel_sdvo->base.cloneable = 0;
- intel_sdvo_select_ddc_bus(dev_priv, intel_sdvo); - /* Set the input timing to the screen. Assume always input 0. */ if (!intel_sdvo_set_target_input(intel_sdvo)) goto err_output;
From: Dokyung Song dokyung.song@gmail.com
commit 6788ba8aed4e28e90f72d68a9d794e34eac17295 upstream.
This patch fixes an intra-object buffer overflow in brcmfmac that occurs when the device provides a 'bsscfgidx' equal to or greater than the buffer size. The patch adds a check that leads to a safe failure if that is the case.
This fixes CVE-2022-3628.
UBSAN: array-index-out-of-bounds in drivers/net/wireless/broadcom/brcm80211/brcmfmac/fweh.c index 52 is out of range for type 'brcmf_if *[16]' CPU: 0 PID: 1898 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Workqueue: events brcmf_fweh_event_worker Call Trace: dump_stack_lvl+0x57/0x7d ubsan_epilogue+0x5/0x40 __ubsan_handle_out_of_bounds+0x69/0x80 ? memcpy+0x39/0x60 brcmf_fweh_event_worker+0xae1/0xc00 ? brcmf_fweh_call_event_handler.isra.0+0x100/0x100 ? rcu_read_lock_sched_held+0xa1/0xd0 ? rcu_read_lock_bh_held+0xb0/0xb0 ? lockdep_hardirqs_on_prepare+0x273/0x3e0 process_one_work+0x873/0x13e0 ? lock_release+0x640/0x640 ? pwq_dec_nr_in_flight+0x320/0x320 ? rwlock_bug.part.0+0x90/0x90 worker_thread+0x8b/0xd10 ? __kthread_parkme+0xd9/0x1d0 ? process_one_work+0x13e0/0x13e0 kthread+0x379/0x450 ? _raw_spin_unlock_irq+0x24/0x30 ? set_kthread_struct+0x100/0x100 ret_from_fork+0x1f/0x30 ================================================================================ general protection fault, probably for non-canonical address 0xe5601c0020023fff: 0000 [#1] SMP KASAN KASAN: maybe wild-memory-access in range [0x2b0100010011fff8-0x2b0100010011ffff] CPU: 0 PID: 1898 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Workqueue: events brcmf_fweh_event_worker RIP: 0010:brcmf_fweh_call_event_handler.isra.0+0x42/0x100 Code: 89 f5 53 48 89 fb 48 83 ec 08 e8 79 0b 38 fe 48 85 ed 74 7e e8 6f 0b 38 fe 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 8b 00 00 00 4c 8b 7d 00 44 89 e0 48 ba 00 00 00 RSP: 0018:ffffc9000259fbd8 EFLAGS: 00010207 RAX: dffffc0000000000 RBX: ffff888115d8cd50 RCX: 0000000000000000 RDX: 0560200020023fff RSI: ffffffff8304bc91 RDI: ffff888115d8cd50 RBP: 2b0100010011ffff R08: ffff888112340050 R09: ffffed1023549809 R10: ffff88811aa4c047 R11: ffffed1023549808 R12: 0000000000000045 R13: ffffc9000259fca0 R14: ffff888112340050 R15: ffff888112340000 FS: 0000000000000000(0000) GS:ffff88811aa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000004053ccc0 CR3: 0000000112740000 CR4: 0000000000750ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: brcmf_fweh_event_worker+0x117/0xc00 ? brcmf_fweh_call_event_handler.isra.0+0x100/0x100 ? rcu_read_lock_sched_held+0xa1/0xd0 ? rcu_read_lock_bh_held+0xb0/0xb0 ? lockdep_hardirqs_on_prepare+0x273/0x3e0 process_one_work+0x873/0x13e0 ? lock_release+0x640/0x640 ? pwq_dec_nr_in_flight+0x320/0x320 ? rwlock_bug.part.0+0x90/0x90 worker_thread+0x8b/0xd10 ? __kthread_parkme+0xd9/0x1d0 ? process_one_work+0x13e0/0x13e0 kthread+0x379/0x450 ? _raw_spin_unlock_irq+0x24/0x30 ? set_kthread_struct+0x100/0x100 ret_from_fork+0x1f/0x30 Modules linked in: 88XXau(O) 88x2bu(O) ---[ end trace 41d302138f3ff55a ]--- RIP: 0010:brcmf_fweh_call_event_handler.isra.0+0x42/0x100 Code: 89 f5 53 48 89 fb 48 83 ec 08 e8 79 0b 38 fe 48 85 ed 74 7e e8 6f 0b 38 fe 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 8b 00 00 00 4c 8b 7d 00 44 89 e0 48 ba 00 00 00 RSP: 0018:ffffc9000259fbd8 EFLAGS: 00010207 RAX: dffffc0000000000 RBX: ffff888115d8cd50 RCX: 0000000000000000 RDX: 0560200020023fff RSI: ffffffff8304bc91 RDI: ffff888115d8cd50 RBP: 2b0100010011ffff R08: ffff888112340050 R09: ffffed1023549809 R10: ffff88811aa4c047 R11: ffffed1023549808 R12: 0000000000000045 R13: ffffc9000259fca0 R14: ffff888112340050 R15: ffff888112340000 FS: 0000000000000000(0000) GS:ffff88811aa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000004053ccc0 CR3: 0000000112740000 CR4: 0000000000750ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Kernel panic - not syncing: Fatal exception
Reported-by: Dokyung Song dokyungs@yonsei.ac.kr Reported-by: Jisoo Jang jisoo.jang@yonsei.ac.kr Reported-by: Minsuk Kang linuxlovemin@yonsei.ac.kr Reviewed-by: Arend van Spriel aspriel@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Dokyung Song dokyung.song@gmail.com Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://lore.kernel.org/r/20221021061359.GA550858@laguna Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/fweh.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fweh.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fweh.c @@ -228,6 +228,10 @@ static void brcmf_fweh_event_worker(stru brcmf_fweh_event_name(event->code), event->code, event->emsg.ifidx, event->emsg.bsscfgidx, event->emsg.addr); + if (event->emsg.bsscfgidx >= BRCMF_MAX_IFS) { + bphy_err(drvr, "invalid bsscfg index: %u\n", event->emsg.bsscfgidx); + goto event_free; + }
/* convert event message */ emsg_be = &event->emsg;
On Tue, Nov 08, 2022 at 02:37:18PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.8 release. There are 197 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.0.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.0.y and the diffstat can be found below.
thanks,
greg k-h
Tested rc1 against the Fedora build system (aarch64, armv7, ppc64le, s390x, x86_64), and boot tested x86_64. No regressions noted.
Tested-by: Justin M. Forbes jforbes@fedoraproject.org
On 11/8/2022 5:37 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.8 release. There are 197 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.0.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.0.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli f.fainelli@gmail.com
This is the start of the stable review cycle for the 6.0.8 release. There are 197 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.0.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.0.y and the diffstat can be found below.
Compiled and booted on my x86_64 and ARM64 test systems. No errors or regressions.
Tested-by: Allen Pais apais@linux.microsoft.com
Thanks.
On 11/8/22 5:37 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.8 release. There are 197 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.0.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.0.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On Tue, Nov 08, 2022 at 02:37:18PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.8 release. There are 197 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
Build results: total: 152 pass: 152 fail: 0 Qemu test results: total: 500 pass: 500 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
Hey Greg,
Ran tests and boot tested on my system, no regressions found
Tested-by: Fenil Jain fkjainco@gmail.com
On Tue, 8 Nov 2022 at 19:37, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.0.8 release. There are 197 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.0.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.0.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro's test farm.
kselftest: net: ip_defrag.sh fails on x86_64 and i386. Our bisect scripts are running.
This failure is also noticed on recent mainline master branch.
Test log did not have details of failure. # selftests: net: ip_defrag.sh not ok 19 selftests: net: ip_defrag.sh # exit=1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Stable-rc Linux-6.0: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.0.y/build/v6.0.7-...
Mainline master: https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.1-rc4/test...
## Build * kernel: 6.0.8-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-6.0.y * git commit: 87175bf36da5ab43b42de327d6fdc5987b604269 * git describe: v6.0.7-198-g87175bf36da5 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.0.y/build/v6.0.7-...
## Test Regressions (compared to v6.0.6-241-g436175d0f780)
* qemu_i386, kselftest-net - net.ip_defrag.sh
* x86, kselftest-net - net.ip_defrag.sh
## Metric Regressions (compared to v6.0.6-241-g436175d0f780)
## Test Fixes (compared to v6.0.6-241-g436175d0f780)
## Metric Fixes (compared to v6.0.6-241-g436175d0f780)
## Test result summary total: 158003, pass: 133527, fail: 6163, skip: 17932, xfail: 381
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 149 total, 146 passed, 3 failed * arm64: 46 total, 46 passed, 0 failed * i386: 37 total, 36 passed, 1 failed * mips: 27 total, 26 passed, 1 failed * parisc: 6 total, 6 passed, 0 failed * powerpc: 34 total, 30 passed, 4 failed * riscv: 12 total, 12 passed, 0 failed * s390: 12 total, 12 passed, 0 failed * sh: 12 total, 12 passed, 0 failed * sparc: 6 total, 6 passed, 0 failed * x86_64: 40 total, 40 passed, 0 failed
## Test suites summary * fwts * igt-gpu-tools * kself[ * kselft[ * kselftest-android * kselftest-arm64 * kselftest-arm64/arm64.btitest.bti_c_func * kselftest-arm64/arm64.btitest.bti_j_func * kselftest-arm64/arm64.btitest.bti_jc_func * kselftest-arm64/arm64.btitest.bti_none_func * kselftest-arm64/arm64.btitest.nohint_func * kselftest-arm64/arm64.btitest.paciasp_func * kselftest-arm64/arm64.nobtitest.bti_c_func * kselftest-arm64/arm64.nobtitest.bti_j_func * kselftest-arm64/arm64.nobtitest.bti_jc_func * kselftest-arm64/arm64.nobtitest.bti_none_func * kselftest-arm64/arm64.nobtitest.nohint_func * kselftest-arm64/arm64.nobtitest.paciasp_func * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simpl * ltp-fs_perms_simple * ltp-fsx * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-math++ * ltp-mm * ltp-nptl * ltp-open-posix-tests * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-syscalls * ltp-tracing * network-basic-tests * perf * perf/Zstd-perf.data-compression * rcutorture * v4l2-compliance * vdso
-- Linaro LKFT https://lkft.linaro.org
On Tue, Nov 08, 2022 at 02:37:18PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.8 release. There are 197 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully cross-compiled for arm64 (bcm2711_defconfig, GCC 10.2.0) and powerpc (ps3_defconfig, GCC 12.2.0).
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
On Tue, Nov 08, 2022 at 02:37:18PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.8 release. There are 197 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
Hi Greg,
6.0.8-rc1 tested.
Run tested on: - Intel Alder Lake x86_64 (nuc12 i7-1260P)
In addition - build tested for: - Allwinner A64 - Allwinner H3 - Allwinner H5 - Allwinner H6 - NXP iMX6 - NXP iMX8 - Qualcomm Dragonboard - Rockchip RK3288 - Rockchip RK3328 - Rockchip RK3399pro - Samsung Exynos5422
Tested-by: Rudi Heitbaum rudi@heitbaum.com -- Rudi
On 11/8/22 06:37, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.8 release. There are 197 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.0.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.0.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
Hi Greg,
On Tue, Nov 08, 2022 at 02:37:18PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.8 release. There are 197 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
Build test (gcc version 12.2.1 20221016): mips: 52 configs -> no failure arm: 100 configs -> no failure arm64: 3 configs -> no failure x86_64: 4 configs -> no failure alpha allmodconfig -> no failure csky allmodconfig -> no failure powerpc allmodconfig -> no failure riscv allmodconfig -> no failure s390 allmodconfig -> no failure xtensa allmodconfig -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression. [1] arm64: Booted on rpi4b (4GB model). No regression. [2]
[1]. https://openqa.qa.codethink.co.uk/tests/2129 [2]. https://openqa.qa.codethink.co.uk/tests/2135
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
linux-stable-mirror@lists.linaro.org