This is the start of the stable review cycle for the 5.15.78 release. There are 144 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.78-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.15.78-rc1
Dokyung Song dokyung.song@gmail.com wifi: brcmfmac: Fix potential buffer overflow in brcmf_fweh_event_worker()
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915/sdvo: Setup DDC fully before output init
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915/sdvo: Filter out invalid outputs more sensibly
Brian Norris briannorris@chromium.org drm/rockchip: dsi: Force synchronous probe
Brian Norris briannorris@chromium.org drm/rockchip: dsi: Clean up 'usage_mode' when failing to attach
Ronnie Sahlberg lsahlber@redhat.com cifs: fix regression in very old smb1 mounts
Matthew Wilcox (Oracle) willy@infradead.org ext4,f2fs: fix readahead of verity data
Sumit Garg sumit.garg@linaro.org tee: Fix tee_shm_register() for kernel TEE drivers
Maxim Levitsky mlevitsk@redhat.com KVM: x86: emulator: update the emulation mode after CR0 write
Maxim Levitsky mlevitsk@redhat.com KVM: x86: emulator: update the emulation mode after rsm
Maxim Levitsky mlevitsk@redhat.com KVM: x86: emulator: introduce emulator_recalc_and_set_mode
Maxim Levitsky mlevitsk@redhat.com KVM: x86: emulator: em_sysexit should update ctxt->mode
Ryan Roberts ryan.roberts@arm.com KVM: arm64: Fix bad dereference on MTE-enabled systems
Emanuele Giuseppe Esposito eesposit@redhat.com KVM: VMX: fully disable SGX if SECONDARY_EXEC_ENCLS_EXITING unavailable
Jim Mattson jmattson@google.com KVM: x86: Mask off reserved bits in CPUID.8000001FH
Jim Mattson jmattson@google.com KVM: x86: Mask off reserved bits in CPUID.80000001H
Jim Mattson jmattson@google.com KVM: x86: Mask off reserved bits in CPUID.80000008H
Jim Mattson jmattson@google.com KVM: x86: Mask off reserved bits in CPUID.8000001AH
Jim Mattson jmattson@google.com KVM: x86: Mask off reserved bits in CPUID.80000006H
Jiri Olsa olsajiri@gmail.com x86/syscall: Include asm/ptrace.h in syscall_wrapper header
Luís Henriques lhenriques@suse.de ext4: fix BUG_ON() when directory entry has invalid rec_len
Ye Bin yebin10@huawei.com ext4: fix warning in 'ext4_da_release_space'
Helge Deller deller@gmx.de parisc: Avoid printing the hardware path twice
Helge Deller deller@gmx.de parisc: Export iosapic_serial_irq() symbol for serial port driver
Helge Deller deller@gmx.de parisc: Make 8250_gsc driver dependend on CONFIG_PARISC
Kan Liang kan.liang@linux.intel.com perf/x86/intel: Fix pebs event constraints for SPR
Kan Liang kan.liang@linux.intel.com perf/x86/intel: Add Cooper Lake stepping to isolation_ucodes[]
Kan Liang kan.liang@linux.intel.com perf/x86/intel: Fix pebs event constraints for ICL
Mark Rutland mark.rutland@arm.com arm64: entry: avoid kprobe recursion
Ard Biesheuvel ardb@kernel.org efi: random: Use 'ACPI reclaim' memory for random seed
Ard Biesheuvel ardb@kernel.org efi: random: reduce seed size to 32 bytes
Miklos Szeredi mszeredi@redhat.com fuse: add file_modified() to fallocate
Gaosheng Cui cuigaosheng1@huawei.com capabilities: fix potential memleak on error path from vfs_getxattr_alloc()
Zheng Yejian zhengyejian1@huawei.com tracing/histogram: Update document for KEYS_MAX size
Rasmus Villemoes linux@rasmusvillemoes.dk tools/nolibc/string: Fix memcmp() implementation
Steven Rostedt (Google) rostedt@goodmis.org ring-buffer: Check for NULL cpu_buffer in ring_buffer_wake_waiters()
Li Qiang liq3ea@163.com kprobe: reverse kp->flags when arm_kprobe failed
Shang XiaoJing shangxiaojing@huawei.com tracing: kprobe: Fix memory leak in test_gen_kprobe/kretprobe_cmd()
Kuniyuki Iwashima kuniyu@amazon.com tcp/udp: Make early_demux back namespacified.
Li Huafei lihuafei1@huawei.com ftrace: Fix use-after-free for dynamic ftrace_ops
David Sterba dsterba@suse.com btrfs: fix type of parameter generation in btrfs_get_dentry
Josef Bacik josef@toxicpanda.com btrfs: fix tree mod log mishandling of reallocated nodes
Filipe Manana fdmanana@suse.com btrfs: fix lost file sync on direct IO write with nowait and dsync iocb
Eric Biggers ebiggers@google.com fscrypt: fix keyring memory leak on mount failure
Eric Biggers ebiggers@google.com fscrypt: stop using keyrings subsystem for fscrypt_master_key
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Fix memory leaks of the whole sk due to OOB skb.
Yu Kuai yukuai3@huawei.com block, bfq: protect 'bfqd->queued' by 'bfqd->lock'
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: L2CAP: Fix attempting to access uninitialized memory
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM
Chen Zhongjin chenzhongjin@huawei.com i2c: piix4: Fix adapter not be removed in piix4_remove()
Cristian Marussi cristian.marussi@arm.com arm64: dts: juno: Add thermal critical trip points
Cristian Marussi cristian.marussi@arm.com firmware: arm_scmi: Fix devres allocation device in virtio transport
Cristian Marussi cristian.marussi@arm.com firmware: arm_scmi: Make Rx chan_setup fail on memory errors
Cristian Marussi cristian.marussi@arm.com firmware: arm_scmi: Suppress the driver's bind attributes
Chen Zhongjin chenzhongjin@huawei.com block: Fix possible memory leak for rq_wb on add_disk failure
Ioana Ciornei ioana.ciornei@nxp.com arm64: dts: ls208xa: specify clock frequencies for the MDIO controllers
Ioana Ciornei ioana.ciornei@nxp.com arm64: dts: ls1088a: specify clock frequencies for the MDIO controllers
Ioana Ciornei ioana.ciornei@nxp.com arm64: dts: lx2160a: specify clock frequencies for the MDIO controllers
Peng Fan peng.fan@nxp.com arm64: dts: imx8: correct clock order
Tim Harvey tharvey@gateworks.com ARM: dts: imx6qdl-gw59{10,13}: fix user pushbutton GPIO offset
Taniya Das quic_tdas@quicinc.com clk: qcom: Update the force mem core bit for GPU clocks
Jerry Snitselaar jsnitsel@redhat.com efi/tpm: Pass correct address to memblock_reserve
Martin Tůma martin.tuma@digiteqautomotive.com i2c: xiic: Add platform module alias
Danijel Slivka danijel.slivka@amd.com drm/amdgpu: set vm_update_mode=0 as default for Sienna Cichlid in SRIOV case
Samuel Bailey samuel.bailey1@gmail.com HID: saitek: add madcatz variant of MMO7 mouse device ID
Uday Shankar ushankar@purestorage.com scsi: core: Restrict legal sdev_state transitions via sysfs
Ashish Kalra ashish.kalra@amd.com ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
Sakari Ailus sakari.ailus@linux.intel.com media: v4l: subdev: Fail graciously when getting try data for NULL state
Hangyu Hua hbh25y@gmail.com media: meson: vdec: fix possible refcount leak in vdec_probe()
Hans Verkuil hverkuil-cisco@xs4all.nl media: dvb-frontends/drxk: initialize err to 0
Hans Verkuil hverkuil-cisco@xs4all.nl media: cros-ec-cec: limit msg.len to CEC_MAX_MSG_SIZE
Hans Verkuil hverkuil-cisco@xs4all.nl media: s5p_cec: limit msg.len to CEC_MAX_MSG_SIZE
Laurent Pinchart laurent.pinchart@ideasonboard.com media: rkisp1: Zero v4l2_subdev_format fields in when validating links
Laurent Pinchart laurent.pinchart@ideasonboard.com media: rkisp1: Use correct macro for gradient registers
Laurent Pinchart laurent.pinchart@ideasonboard.com media: rkisp1: Initialize color space on resizer sink and source pads
Laurent Pinchart laurent.pinchart@ideasonboard.com media: rkisp1: Don't pass the quantization to rkisp1_csm_config()
Peter Oberparleiter oberpar@linux.ibm.com s390/cio: fix out-of-bounds access on cio_ignore free
Vineeth Vijayan vneethv@linux.ibm.com s390/cio: derive cdev information only for IO-subchannels
Peter Oberparleiter oberpar@linux.ibm.com s390/boot: add secure boot trailer
Heiko Carstens hca@linux.ibm.com s390/uaccess: add missing EX_TABLE entries to __clear_user()
Linus Walleij linus.walleij@linaro.org mtd: parsers: bcm47xxpart: Fix halfblock reads
Rafał Miłecki rafal@milecki.pl mtd: parsers: bcm47xxpart: print correct offset on read error
Helge Deller deller@gmx.de fbdev: stifb: Fall back to cfb_fillrect() on 32-bit HCRX cards
Helge Deller deller@gmx.de video/fbdev/stifb: Implement the stifb_fillrect() function
Johan Hovold johan+linaro@kernel.org drm/msm/hdmi: fix IRQ lifetime
Daniel Thompson daniel.thompson@linaro.org drm/msm/hdmi: Remove spurious IRQF_ONESHOT flag
Dexuan Cui decui@microsoft.com vsock: fix possible infinite sleep in vsock_connectible_wait_data()
Zhengchao Shao shaozhengchao@huawei.com ipv6: fix WARNING in ip6_route_net_exit_late()
Chen Zhongjin chenzhongjin@huawei.com net, neigh: Fix null-ptr-deref in neigh_table_clear()
Chen Zhongjin chenzhongjin@huawei.com net/smc: Fix possible leaked pernet namespace in smc_init()
Liu Peibao liupeibao@loongson.cn stmmac: dwmac-loongson: fix invalid mdio_node
Nick Child nnac123@linux.ibm.com ibmvnic: Free rwi on reset success
Gaosheng Cui cuigaosheng1@huawei.com net: mdio: fix undefined behavior in bit shift for __mdiobus_register
Hawkins Jiawei yin31149@gmail.com Bluetooth: L2CAP: Fix memory leak in vhci_write
Zhengchao Shao shaozhengchao@huawei.com Bluetooth: L2CAP: fix use-after-free in l2cap_conn_del()
Soenke Huster soenke.huster@eknoes.de Bluetooth: virtio_bt: Use skb_put to set length
Maxim Mikityanskiy maxtram95@gmail.com Bluetooth: L2CAP: Fix use-after-free caused by l2cap_reassemble_sdu
Jozsef Kadlecsik kadlec@netfilter.org netfilter: ipset: enforce documented limit to prevent allocating huge memory
Filipe Manana fdmanana@suse.com btrfs: fix ulist leaks in error paths of qgroup self tests
Filipe Manana fdmanana@suse.com btrfs: fix inode list leak during backref walking at find_parent_nodes()
Filipe Manana fdmanana@suse.com btrfs: fix inode list leak during backref walking at resolve_indirect_refs()
Yang Yingliang yangyingliang@huawei.com isdn: mISDN: netjet: fix wrong check of device registration
Yang Yingliang yangyingliang@huawei.com mISDN: fix possible memory leak in mISDN_register_device()
Zhang Qilong zhangqilong3@huawei.com rose: Fix NULL pointer dereference in rose_send_frame()
Zhengchao Shao shaozhengchao@huawei.com ipvs: fix WARNING in ip_vs_app_net_cleanup()
Zhengchao Shao shaozhengchao@huawei.com ipvs: fix WARNING in __ip_vs_cleanup_batch()
Jason A. Donenfeld Jason@zx2c4.com ipvs: use explicitly signed chars
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: release flow rule object from commit path
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: netlink notifier might race to release objects
Ziyang Xuan william.xuanziyang@huawei.com net: tun: fix bugs for oversize packet when napi frags enabled
Dan Carpenter dan.carpenter@oracle.com net: sched: Fix use after free in red_enqueue()
Sergey Shtylyov s.shtylyov@omp.ru ata: pata_legacy: fix pdc20230_set_piomode()
Zhang Changzhong zhangchangzhong@huawei.com net: fec: fix improper use of NETDEV_TX_BUSY
Shang XiaoJing shangxiaojing@huawei.com nfc: nfcmrvl: Fix potential memory leak in nfcmrvl_i2c_nci_send()
Shang XiaoJing shangxiaojing@huawei.com nfc: s3fwrn5: Fix potential memory leak in s3fwrn5_nci_send()
Shang XiaoJing shangxiaojing@huawei.com nfc: nxp-nci: Fix potential memory leak in nxp_nci_send()
Shang XiaoJing shangxiaojing@huawei.com nfc: fdp: Fix potential memory leak in fdp_nci_send()
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: fall back to default tagger if we can't load the one from DT
Dan Carpenter dan.carpenter@oracle.com RDMA/qedr: clean up work queue on failure in qedr_alloc_resources()
Chen Zhongjin chenzhongjin@huawei.com RDMA/core: Fix null-ptr-deref in ib_core_cleanup()
Chen Zhongjin chenzhongjin@huawei.com net: dsa: Fix possible memory leaks in dsa_loop_init()
Zhang Xiaoxu zhangxiaoxu5@huawei.com nfs4: Fix kmemleak when allocate slot failed
Benjamin Coddington bcodding@redhat.com NFSv4.2: Fixup CLONE dest file size for zero-length count
Zhang Xiaoxu zhangxiaoxu5@huawei.com SUNRPC: Fix null-ptr-deref when xps sysfs alloc failed
Trond Myklebust trond.myklebust@hammerspace.com NFSv4.1: We must always send RECLAIM_COMPLETE after a reboot
Trond Myklebust trond.myklebust@hammerspace.com NFSv4.1: Handle RECLAIM_COMPLETE trunking errors
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: Fix a potential state reclaim deadlock
Yangyang Li liyangyang20@huawei.com RDMA/hns: Disable local invalidate operation
Wenpeng Liang liangwenpeng@huawei.com RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx()
Xinhao Liu liuxinhao5@hisilicon.com RDMA/hns: Remove magic number
Dean Luick dean.luick@cornelisnetworks.com IB/hfi1: Correctly move list in sc_disable()
Håkon Bugge haakon.bugge@oracle.com RDMA/cma: Use output interface for net_dev check
Alexander Graf graf@amazon.com KVM: x86: Add compat handler for KVM_X86_SET_MSR_FILTER
Alexander Graf graf@amazon.com KVM: x86: Copy filter arg outside kvm_vm_ioctl_set_msr_filter()
Aaron Lewis aaronlewis@google.com KVM: x86: Protect the unused bits in MSR exiting flags
Roderick Colenbrander roderick@gaikai.com HID: playstation: add initial DualSense Edge controller support
Baolin Wang baolin.wang@linux.alibaba.com mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
Shirish S shirish.s@amd.com drm/amd/display: explicitly disable psr_feature_enable appropriately
Sean Christopherson seanjc@google.com KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)
Sean Christopherson seanjc@google.com KVM: x86: Trace re-injected exceptions
Lukas Wunner lukas@wunner.de serial: ar933x: Deassert Transmit Enable on ->rs485_config()
James Smart jsmart2021@gmail.com scsi: lpfc: Rework MIB Rx Monitor debug info logic
James Smart jsmart2021@gmail.com scsi: lpfc: Adjust CMF total bytes and rxmonitor
James Smart jsmart2021@gmail.com scsi: lpfc: Adjust bytes received vales during cmf timer interval
-------------
Diffstat:
Documentation/trace/histogram.rst | 2 +- Makefile | 4 +- arch/arm/boot/dts/imx6qdl-gw5910.dtsi | 2 +- arch/arm/boot/dts/imx6qdl-gw5913.dtsi | 2 +- arch/arm64/boot/dts/arm/juno-base.dtsi | 14 + arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 6 + arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi | 6 + arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 6 + arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi | 18 +- arch/arm64/kernel/entry-common.c | 3 +- arch/arm64/kvm/hyp/exception.c | 3 +- arch/parisc/include/asm/hardware.h | 12 +- arch/parisc/kernel/drivers.c | 14 +- arch/s390/boot/compressed/vmlinux.lds.S | 13 +- arch/s390/lib/uaccess.c | 6 +- arch/x86/events/intel/core.c | 1 + arch/x86/events/intel/ds.c | 18 +- arch/x86/include/asm/syscall_wrapper.h | 2 +- arch/x86/kvm/cpuid.c | 11 +- arch/x86/kvm/emulate.c | 104 +++-- arch/x86/kvm/trace.h | 12 +- arch/x86/kvm/vmx/vmx.c | 5 + arch/x86/kvm/x86.c | 134 +++++- block/bfq-iosched.c | 4 +- block/genhd.c | 1 + drivers/acpi/apei/ghes.c | 2 +- drivers/ata/pata_legacy.c | 5 +- drivers/bluetooth/virtio_bt.c | 2 +- drivers/clk/qcom/gcc-sc7280.c | 1 + drivers/clk/qcom/gpucc-sc7280.c | 1 + drivers/firmware/arm_scmi/driver.c | 9 +- drivers/firmware/arm_scmi/virtio.c | 6 +- drivers/firmware/efi/efi.c | 2 +- drivers/firmware/efi/libstub/random.c | 7 +- drivers/firmware/efi/tpm.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 6 + drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 4 + drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 +- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c | 8 +- drivers/gpu/drm/i915/display/intel_sdvo.c | 58 ++- drivers/gpu/drm/msm/hdmi/hdmi.c | 4 +- drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c | 22 +- drivers/hid/hid-ids.h | 2 + drivers/hid/hid-playstation.c | 5 +- drivers/hid/hid-quirks.c | 1 + drivers/hid/hid-saitek.c | 2 + drivers/i2c/busses/i2c-piix4.c | 1 + drivers/i2c/busses/i2c-xiic.c | 1 + drivers/infiniband/core/cma.c | 2 +- drivers/infiniband/core/device.c | 10 +- drivers/infiniband/core/nldev.c | 2 +- drivers/infiniband/hw/hfi1/pio.c | 3 +- drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 266 ++++------- drivers/infiniband/hw/hns/hns_roce_hw_v2.h | 166 +++---- drivers/infiniband/hw/qedr/main.c | 9 +- drivers/isdn/hardware/mISDN/netjet.c | 2 +- drivers/isdn/mISDN/core.c | 5 +- drivers/media/cec/platform/cros-ec/cros-ec-cec.c | 2 + drivers/media/cec/platform/s5p/s5p_cec.c | 2 + drivers/media/dvb-frontends/drxk_hard.c | 2 +- .../platform/rockchip/rkisp1/rkisp1-capture.c | 7 +- .../media/platform/rockchip/rkisp1/rkisp1-params.c | 14 +- .../media/platform/rockchip/rkisp1/rkisp1-regs.h | 2 +- .../platform/rockchip/rkisp1/rkisp1-resizer.c | 4 + drivers/mtd/parsers/bcm47xxpart.c | 4 +- drivers/net/dsa/dsa_loop.c | 25 +- drivers/net/ethernet/freescale/fec_main.c | 4 +- drivers/net/ethernet/ibm/ibmvnic.c | 16 +- .../net/ethernet/stmicro/stmmac/dwmac-loongson.c | 7 +- drivers/net/phy/mdio_bus.c | 2 +- drivers/net/tun.c | 3 +- .../wireless/broadcom/brcm80211/brcmfmac/fweh.c | 4 + drivers/nfc/fdp/fdp.c | 10 +- drivers/nfc/nfcmrvl/i2c.c | 7 +- drivers/nfc/nxp-nci/core.c | 7 +- drivers/nfc/s3fwrn5/core.c | 8 +- drivers/parisc/iosapic.c | 1 + drivers/s390/cio/css.c | 3 +- drivers/scsi/lpfc/lpfc.h | 15 +- drivers/scsi/lpfc/lpfc_crtn.h | 8 + drivers/scsi/lpfc/lpfc_debugfs.c | 57 +-- drivers/scsi/lpfc/lpfc_debugfs.h | 2 +- drivers/scsi/lpfc/lpfc_init.c | 101 ++--- drivers/scsi/lpfc/lpfc_mem.c | 9 +- drivers/scsi/lpfc/lpfc_sli.c | 190 +++++++- drivers/scsi/scsi_sysfs.c | 8 + drivers/staging/media/meson/vdec/vdec.c | 2 + drivers/tee/tee_core.c | 3 + drivers/tee/tee_shm.c | 3 - drivers/tty/serial/8250/Kconfig | 2 +- drivers/tty/serial/ar933x_uart.c | 5 + drivers/video/fbdev/stifb.c | 46 +- fs/btrfs/backref.c | 54 ++- fs/btrfs/export.c | 2 +- fs/btrfs/export.h | 2 +- fs/btrfs/extent-tree.c | 25 +- fs/btrfs/file.c | 39 +- fs/btrfs/tests/qgroup-tests.c | 20 +- fs/cifs/connect.c | 11 +- fs/crypto/fscrypt_private.h | 71 ++- fs/crypto/hooks.c | 10 +- fs/crypto/keyring.c | 491 +++++++++++---------- fs/crypto/keysetup.c | 81 ++-- fs/crypto/policy.c | 8 +- fs/ext4/migrate.c | 3 +- fs/ext4/namei.c | 10 +- fs/ext4/verity.c | 3 +- fs/f2fs/verity.c | 3 +- fs/fuse/file.c | 4 + fs/nfs/delegation.c | 36 +- fs/nfs/nfs42proc.c | 3 + fs/nfs/nfs4client.c | 1 + fs/nfs/nfs4state.c | 2 + fs/super.c | 3 +- include/acpi/ghes.h | 2 +- include/linux/efi.h | 2 +- include/linux/fs.h | 2 +- include/linux/fscrypt.h | 4 +- include/linux/hugetlb.h | 8 +- include/media/v4l2-subdev.h | 6 + include/net/protocol.h | 4 - include/net/tcp.h | 2 +- include/net/udp.h | 2 +- kernel/kprobes.c | 5 +- kernel/trace/ftrace.c | 16 +- kernel/trace/kprobe_event_gen_test.c | 18 +- kernel/trace/ring_buffer.c | 11 + mm/gup.c | 14 +- mm/hugetlb.c | 27 +- net/bluetooth/l2cap_core.c | 84 +++- net/core/neighbour.c | 2 +- net/dsa/dsa2.c | 13 +- net/ipv4/af_inet.c | 14 +- net/ipv4/ip_input.c | 37 +- net/ipv4/sysctl_net_ipv4.c | 59 +-- net/ipv6/ip6_input.c | 23 +- net/ipv6/route.c | 14 +- net/ipv6/tcp_ipv6.c | 9 +- net/ipv6/udp.c | 9 +- net/netfilter/ipset/ip_set_hash_gen.h | 30 +- net/netfilter/ipvs/ip_vs_app.c | 10 +- net/netfilter/ipvs/ip_vs_conn.c | 30 +- net/netfilter/nf_tables_api.c | 8 +- net/rose/rose_link.c | 3 + net/sched/sch_red.c | 4 +- net/smc/af_smc.c | 6 +- net/sunrpc/sysfs.c | 12 +- net/unix/af_unix.c | 13 +- net/vmw_vsock/af_vsock.c | 5 +- security/commoncap.c | 6 +- tools/include/nolibc/nolibc.h | 4 +- 151 files changed, 1767 insertions(+), 1243 deletions(-)
From: James Smart jsmart2021@gmail.com
[ Upstream commit d5ac69b332d8859d1f8bd5d4dee31f3267f6b0d2 ]
The newly added congestion mgmt framework is seeing unexpected congestion FPINs and signals. In analysis, time values given to the adapter are not at hard time intervals. Thus the drift vs the transfer count seen is affecting how the framework manages things.
Adjust counters to cover the drift.
Link: https://lore.kernel.org/r/20210910233159.115896-11-jsmart2021@gmail.com Co-developed-by: Justin Tee justin.tee@broadcom.com Signed-off-by: Justin Tee justin.tee@broadcom.com Signed-off-by: James Smart jsmart2021@gmail.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Stable-dep-of: bd269188ea94 ("scsi: lpfc: Rework MIB Rx Monitor debug info logic") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/lpfc/lpfc_init.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index fe8b4258cdc0..38bd58fb2044 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -5870,7 +5870,7 @@ lpfc_cmf_timer(struct hrtimer *timer) uint32_t io_cnt; uint32_t head, tail; uint32_t busy, max_read; - uint64_t total, rcv, lat, mbpi; + uint64_t total, rcv, lat, mbpi, extra; int timer_interval = LPFC_CMF_INTERVAL; uint32_t ms; struct lpfc_cgn_stat *cgs; @@ -5937,7 +5937,19 @@ lpfc_cmf_timer(struct hrtimer *timer) phba->hba_flag & HBA_SETUP) { mbpi = phba->cmf_last_sync_bw; phba->cmf_last_sync_bw = 0; - lpfc_issue_cmf_sync_wqe(phba, LPFC_CMF_INTERVAL, total); + extra = 0; + + /* Calculate any extra bytes needed to account for the + * timer accuracy. If we are less than LPFC_CMF_INTERVAL + * add an extra 3% slop factor, equal to LPFC_CMF_INTERVAL + * add an extra 2%. The goal is to equalize total with a + * time > LPFC_CMF_INTERVAL or <= LPFC_CMF_INTERVAL + 1 + */ + if (ms == LPFC_CMF_INTERVAL) + extra = div_u64(total, 50); + else if (ms < LPFC_CMF_INTERVAL) + extra = div_u64(total, 33); + lpfc_issue_cmf_sync_wqe(phba, LPFC_CMF_INTERVAL, total + extra); } else { /* For Monitor mode or link down we want mbpi * to be the full link speed
From: James Smart jsmart2021@gmail.com
[ Upstream commit a6269f837045acb02904f31f05acde847ec8f8a7 ]
Calculate any extra bytes needed to account for timer accuracy. If we are less than LPFC_CMF_INTERVAL, then calculate the adjustment needed for total to reflect a full LPFC_CMF_INTERVAL.
Add additional info to rxmonitor, and adjust some log formatting.
Link: https://lore.kernel.org/r/20211204002644.116455-7-jsmart2021@gmail.com Co-developed-by: Justin Tee justin.tee@broadcom.com Signed-off-by: Justin Tee justin.tee@broadcom.com Signed-off-by: James Smart jsmart2021@gmail.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Stable-dep-of: bd269188ea94 ("scsi: lpfc: Rework MIB Rx Monitor debug info logic") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/lpfc/lpfc.h | 1 + drivers/scsi/lpfc/lpfc_debugfs.c | 14 ++++++++------ drivers/scsi/lpfc/lpfc_debugfs.h | 2 +- drivers/scsi/lpfc/lpfc_init.c | 20 ++++++++++++-------- 4 files changed, 22 insertions(+), 15 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h index b2508a00bafd..924483368037 100644 --- a/drivers/scsi/lpfc/lpfc.h +++ b/drivers/scsi/lpfc/lpfc.h @@ -1611,6 +1611,7 @@ struct lpfc_hba { #define LPFC_MAX_RXMONITOR_ENTRY 800 #define LPFC_MAX_RXMONITOR_DUMP 32 struct rxtable_entry { + uint64_t cmf_bytes; /* Total no of read bytes for CMF_SYNC_WQE */ uint64_t total_bytes; /* Total no of read bytes requested */ uint64_t rcv_bytes; /* Total no of read bytes completed */ uint64_t avg_io_size; diff --git a/drivers/scsi/lpfc/lpfc_debugfs.c b/drivers/scsi/lpfc/lpfc_debugfs.c index 79bc86ba59b3..61f8dcd072ac 100644 --- a/drivers/scsi/lpfc/lpfc_debugfs.c +++ b/drivers/scsi/lpfc/lpfc_debugfs.c @@ -5561,22 +5561,24 @@ lpfc_rx_monitor_read(struct file *file, char __user *buf, size_t nbytes, start = tail;
len += scnprintf(buffer + len, MAX_DEBUGFS_RX_TABLE_SIZE - len, - " MaxBPI\t Total Data Cmd Total Data Cmpl " - " Latency(us) Avg IO Size\tMax IO Size IO cnt " - "Info BWutil(ms)\n"); + " MaxBPI Tot_Data_CMF Tot_Data_Cmd " + "Tot_Data_Cmpl Lat(us) Avg_IO Max_IO " + "Bsy IO_cnt Info BWutil(ms)\n"); get_table: for (i = start; i < last; i++) { entry = &phba->rxtable[i]; len += scnprintf(buffer + len, MAX_DEBUGFS_RX_TABLE_SIZE - len, - "%3d:%12lld %12lld\t%12lld\t" - "%8lldus\t%8lld\t%10lld " - "%8d %2d %2d(%2d)\n", + "%3d:%12lld %12lld %12lld %12lld " + "%7lldus %8lld %7lld " + "%2d %4d %2d %2d(%2d)\n", i, entry->max_bytes_per_interval, + entry->cmf_bytes, entry->total_bytes, entry->rcv_bytes, entry->avg_io_latency, entry->avg_io_size, entry->max_read_cnt, + entry->cmf_busy, entry->io_cnt, entry->cmf_info, entry->timer_utilization, diff --git a/drivers/scsi/lpfc/lpfc_debugfs.h b/drivers/scsi/lpfc/lpfc_debugfs.h index a5bf71b34972..6dd361c1fd31 100644 --- a/drivers/scsi/lpfc/lpfc_debugfs.h +++ b/drivers/scsi/lpfc/lpfc_debugfs.h @@ -282,7 +282,7 @@ struct lpfc_idiag { void *ptr_private; };
-#define MAX_DEBUGFS_RX_TABLE_SIZE (100 * LPFC_MAX_RXMONITOR_ENTRY) +#define MAX_DEBUGFS_RX_TABLE_SIZE (128 * LPFC_MAX_RXMONITOR_ENTRY) struct lpfc_rx_monitor_debug { char *i_private; char *buffer; diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index 38bd58fb2044..5dad0b07953f 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -5870,7 +5870,7 @@ lpfc_cmf_timer(struct hrtimer *timer) uint32_t io_cnt; uint32_t head, tail; uint32_t busy, max_read; - uint64_t total, rcv, lat, mbpi, extra; + uint64_t total, rcv, lat, mbpi, extra, cnt; int timer_interval = LPFC_CMF_INTERVAL; uint32_t ms; struct lpfc_cgn_stat *cgs; @@ -5941,20 +5941,23 @@ lpfc_cmf_timer(struct hrtimer *timer)
/* Calculate any extra bytes needed to account for the * timer accuracy. If we are less than LPFC_CMF_INTERVAL - * add an extra 3% slop factor, equal to LPFC_CMF_INTERVAL - * add an extra 2%. The goal is to equalize total with a - * time > LPFC_CMF_INTERVAL or <= LPFC_CMF_INTERVAL + 1 + * calculate the adjustment needed for total to reflect + * a full LPFC_CMF_INTERVAL. */ - if (ms == LPFC_CMF_INTERVAL) - extra = div_u64(total, 50); - else if (ms < LPFC_CMF_INTERVAL) - extra = div_u64(total, 33); + if (ms && ms < LPFC_CMF_INTERVAL) { + cnt = div_u64(total, ms); /* bytes per ms */ + cnt *= LPFC_CMF_INTERVAL; /* what total should be */ + if (cnt > mbpi) + cnt = mbpi; + extra = cnt - total; + } lpfc_issue_cmf_sync_wqe(phba, LPFC_CMF_INTERVAL, total + extra); } else { /* For Monitor mode or link down we want mbpi * to be the full link speed */ mbpi = phba->cmf_link_byte_count; + extra = 0; } phba->cmf_timer_cnt++;
@@ -5985,6 +5988,7 @@ lpfc_cmf_timer(struct hrtimer *timer) LPFC_RXMONITOR_TABLE_IN_USE); entry = &phba->rxtable[head]; entry->total_bytes = total; + entry->cmf_bytes = total + extra; entry->rcv_bytes = rcv; entry->cmf_busy = busy; entry->cmf_info = phba->cmf_active_info;
From: James Smart jsmart2021@gmail.com
[ Upstream commit bd269188ea94e40ab002cad7b0df8f12b8f0de54 ]
The kernel test robot reported the following sparse warning:
arch/arm64/include/asm/cmpxchg.h:88:1: sparse: sparse: cast truncates bits from constant value (369 becomes 69)
On arm64, atomic_xchg only works on 8-bit byte fields. Thus, the macro usage of LPFC_RXMONITOR_TABLE_IN_USE can be unintentionally truncated leading to all logic involving the LPFC_RXMONITOR_TABLE_IN_USE macro to not work properly.
Replace the Rx Table atomic_t indexing logic with a new lpfc_rx_info_monitor structure that holds a circular ring buffer. For locking semantics, a spinlock_t is used.
Link: https://lore.kernel.org/r/20220819011736.14141-4-jsmart2021@gmail.com Fixes: 17b27ac59224 ("scsi: lpfc: Add rx monitoring statistics") Cc: stable@vger.kernel.org # v5.15+ Co-developed-by: Justin Tee justin.tee@broadcom.com Signed-off-by: Justin Tee justin.tee@broadcom.com Signed-off-by: James Smart jsmart2021@gmail.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/lpfc/lpfc.h | 14 ++- drivers/scsi/lpfc/lpfc_crtn.h | 8 ++ drivers/scsi/lpfc/lpfc_debugfs.c | 59 ++-------- drivers/scsi/lpfc/lpfc_debugfs.h | 2 +- drivers/scsi/lpfc/lpfc_init.c | 83 ++++---------- drivers/scsi/lpfc/lpfc_mem.c | 9 +- drivers/scsi/lpfc/lpfc_sli.c | 190 +++++++++++++++++++++++++++++-- 7 files changed, 240 insertions(+), 125 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h index 924483368037..457ff86e02b3 100644 --- a/drivers/scsi/lpfc/lpfc.h +++ b/drivers/scsi/lpfc/lpfc.h @@ -1558,10 +1558,7 @@ struct lpfc_hba { u32 cgn_acqe_cnt;
/* RX monitor handling for CMF */ - struct rxtable_entry *rxtable; /* RX_monitor information */ - atomic_t rxtable_idx_head; -#define LPFC_RXMONITOR_TABLE_IN_USE (LPFC_MAX_RXMONITOR_ENTRY + 73) - atomic_t rxtable_idx_tail; + struct lpfc_rx_info_monitor *rx_monitor; atomic_t rx_max_read_cnt; /* Maximum read bytes */ uint64_t rx_block_cnt;
@@ -1610,7 +1607,7 @@ struct lpfc_hba {
#define LPFC_MAX_RXMONITOR_ENTRY 800 #define LPFC_MAX_RXMONITOR_DUMP 32 -struct rxtable_entry { +struct rx_info_entry { uint64_t cmf_bytes; /* Total no of read bytes for CMF_SYNC_WQE */ uint64_t total_bytes; /* Total no of read bytes requested */ uint64_t rcv_bytes; /* Total no of read bytes completed */ @@ -1625,6 +1622,13 @@ struct rxtable_entry { uint32_t timer_interval; };
+struct lpfc_rx_info_monitor { + struct rx_info_entry *ring; /* info organized in a circular buffer */ + u32 head_idx, tail_idx; /* index to head/tail of ring */ + spinlock_t lock; /* spinlock for ring */ + u32 entries; /* storing number entries/size of ring */ +}; + static inline struct Scsi_Host * lpfc_shost_from_vport(struct lpfc_vport *vport) { diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h index c9770b1d2366..470239394e64 100644 --- a/drivers/scsi/lpfc/lpfc_crtn.h +++ b/drivers/scsi/lpfc/lpfc_crtn.h @@ -90,6 +90,14 @@ void lpfc_cgn_dump_rxmonitor(struct lpfc_hba *phba); void lpfc_cgn_update_stat(struct lpfc_hba *phba, uint32_t dtag); void lpfc_unblock_requests(struct lpfc_hba *phba); void lpfc_block_requests(struct lpfc_hba *phba); +int lpfc_rx_monitor_create_ring(struct lpfc_rx_info_monitor *rx_monitor, + u32 entries); +void lpfc_rx_monitor_destroy_ring(struct lpfc_rx_info_monitor *rx_monitor); +void lpfc_rx_monitor_record(struct lpfc_rx_info_monitor *rx_monitor, + struct rx_info_entry *entry); +u32 lpfc_rx_monitor_report(struct lpfc_hba *phba, + struct lpfc_rx_info_monitor *rx_monitor, char *buf, + u32 buf_len, u32 max_read_entries);
void lpfc_mbx_cmpl_local_config_link(struct lpfc_hba *, LPFC_MBOXQ_t *); void lpfc_mbx_cmpl_reg_login(struct lpfc_hba *, LPFC_MBOXQ_t *); diff --git a/drivers/scsi/lpfc/lpfc_debugfs.c b/drivers/scsi/lpfc/lpfc_debugfs.c index 61f8dcd072ac..8e8bbe734e87 100644 --- a/drivers/scsi/lpfc/lpfc_debugfs.c +++ b/drivers/scsi/lpfc/lpfc_debugfs.c @@ -5520,7 +5520,7 @@ lpfc_rx_monitor_open(struct inode *inode, struct file *file) if (!debug) goto out;
- debug->buffer = vmalloc(MAX_DEBUGFS_RX_TABLE_SIZE); + debug->buffer = vmalloc(MAX_DEBUGFS_RX_INFO_SIZE); if (!debug->buffer) { kfree(debug); goto out; @@ -5541,57 +5541,18 @@ lpfc_rx_monitor_read(struct file *file, char __user *buf, size_t nbytes, struct lpfc_rx_monitor_debug *debug = file->private_data; struct lpfc_hba *phba = (struct lpfc_hba *)debug->i_private; char *buffer = debug->buffer; - struct rxtable_entry *entry; - int i, len = 0, head, tail, last, start; - - head = atomic_read(&phba->rxtable_idx_head); - while (head == LPFC_RXMONITOR_TABLE_IN_USE) { - /* Table is getting updated */ - msleep(20); - head = atomic_read(&phba->rxtable_idx_head); - }
- tail = atomic_xchg(&phba->rxtable_idx_tail, head); - if (!phba->rxtable || head == tail) { - len += scnprintf(buffer + len, MAX_DEBUGFS_RX_TABLE_SIZE - len, - "Rxtable is empty\n"); - goto out; - } - last = (head > tail) ? head : LPFC_MAX_RXMONITOR_ENTRY; - start = tail; - - len += scnprintf(buffer + len, MAX_DEBUGFS_RX_TABLE_SIZE - len, - " MaxBPI Tot_Data_CMF Tot_Data_Cmd " - "Tot_Data_Cmpl Lat(us) Avg_IO Max_IO " - "Bsy IO_cnt Info BWutil(ms)\n"); -get_table: - for (i = start; i < last; i++) { - entry = &phba->rxtable[i]; - len += scnprintf(buffer + len, MAX_DEBUGFS_RX_TABLE_SIZE - len, - "%3d:%12lld %12lld %12lld %12lld " - "%7lldus %8lld %7lld " - "%2d %4d %2d %2d(%2d)\n", - i, entry->max_bytes_per_interval, - entry->cmf_bytes, - entry->total_bytes, - entry->rcv_bytes, - entry->avg_io_latency, - entry->avg_io_size, - entry->max_read_cnt, - entry->cmf_busy, - entry->io_cnt, - entry->cmf_info, - entry->timer_utilization, - entry->timer_interval); + if (!phba->rx_monitor) { + scnprintf(buffer, MAX_DEBUGFS_RX_INFO_SIZE, + "Rx Monitor Info is empty.\n"); + } else { + lpfc_rx_monitor_report(phba, phba->rx_monitor, buffer, + MAX_DEBUGFS_RX_INFO_SIZE, + LPFC_MAX_RXMONITOR_ENTRY); }
- if (head != last) { - start = 0; - last = head; - goto get_table; - } -out: - return simple_read_from_buffer(buf, nbytes, ppos, buffer, len); + return simple_read_from_buffer(buf, nbytes, ppos, buffer, + strlen(buffer)); }
static int diff --git a/drivers/scsi/lpfc/lpfc_debugfs.h b/drivers/scsi/lpfc/lpfc_debugfs.h index 6dd361c1fd31..f71e5b6311ac 100644 --- a/drivers/scsi/lpfc/lpfc_debugfs.h +++ b/drivers/scsi/lpfc/lpfc_debugfs.h @@ -282,7 +282,7 @@ struct lpfc_idiag { void *ptr_private; };
-#define MAX_DEBUGFS_RX_TABLE_SIZE (128 * LPFC_MAX_RXMONITOR_ENTRY) +#define MAX_DEBUGFS_RX_INFO_SIZE (128 * LPFC_MAX_RXMONITOR_ENTRY) struct lpfc_rx_monitor_debug { char *i_private; char *buffer; diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index 5dad0b07953f..855817f6fe67 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -5428,38 +5428,12 @@ lpfc_async_link_speed_to_read_top(struct lpfc_hba *phba, uint8_t speed_code) void lpfc_cgn_dump_rxmonitor(struct lpfc_hba *phba) { - struct rxtable_entry *entry; - int cnt = 0, head, tail, last, start; - - head = atomic_read(&phba->rxtable_idx_head); - tail = atomic_read(&phba->rxtable_idx_tail); - if (!phba->rxtable || head == tail) { - lpfc_printf_log(phba, KERN_ERR, LOG_CGN_MGMT, - "4411 Rxtable is empty\n"); - return; - } - last = tail; - start = head; - - /* Display the last LPFC_MAX_RXMONITOR_DUMP entries from the rxtable */ - while (start != last) { - if (start) - start--; - else - start = LPFC_MAX_RXMONITOR_ENTRY - 1; - entry = &phba->rxtable[start]; + if (!phba->rx_monitor) { lpfc_printf_log(phba, KERN_INFO, LOG_CGN_MGMT, - "4410 %02d: MBPI %lld Xmit %lld Cmpl %lld " - "Lat %lld ASz %lld Info %02d BWUtil %d " - "Int %d slot %d\n", - cnt, entry->max_bytes_per_interval, - entry->total_bytes, entry->rcv_bytes, - entry->avg_io_latency, entry->avg_io_size, - entry->cmf_info, entry->timer_utilization, - entry->timer_interval, start); - cnt++; - if (cnt >= LPFC_MAX_RXMONITOR_DUMP) - return; + "4411 Rx Monitor Info is empty.\n"); + } else { + lpfc_rx_monitor_report(phba, phba->rx_monitor, NULL, 0, + LPFC_MAX_RXMONITOR_DUMP); } }
@@ -5866,9 +5840,8 @@ lpfc_cmf_timer(struct hrtimer *timer) { struct lpfc_hba *phba = container_of(timer, struct lpfc_hba, cmf_timer); - struct rxtable_entry *entry; + struct rx_info_entry entry; uint32_t io_cnt; - uint32_t head, tail; uint32_t busy, max_read; uint64_t total, rcv, lat, mbpi, extra, cnt; int timer_interval = LPFC_CMF_INTERVAL; @@ -5983,40 +5956,30 @@ lpfc_cmf_timer(struct hrtimer *timer) }
/* Save rxmonitor information for debug */ - if (phba->rxtable) { - head = atomic_xchg(&phba->rxtable_idx_head, - LPFC_RXMONITOR_TABLE_IN_USE); - entry = &phba->rxtable[head]; - entry->total_bytes = total; - entry->cmf_bytes = total + extra; - entry->rcv_bytes = rcv; - entry->cmf_busy = busy; - entry->cmf_info = phba->cmf_active_info; + if (phba->rx_monitor) { + entry.total_bytes = total; + entry.cmf_bytes = total + extra; + entry.rcv_bytes = rcv; + entry.cmf_busy = busy; + entry.cmf_info = phba->cmf_active_info; if (io_cnt) { - entry->avg_io_latency = div_u64(lat, io_cnt); - entry->avg_io_size = div_u64(rcv, io_cnt); + entry.avg_io_latency = div_u64(lat, io_cnt); + entry.avg_io_size = div_u64(rcv, io_cnt); } else { - entry->avg_io_latency = 0; - entry->avg_io_size = 0; + entry.avg_io_latency = 0; + entry.avg_io_size = 0; } - entry->max_read_cnt = max_read; - entry->io_cnt = io_cnt; - entry->max_bytes_per_interval = mbpi; + entry.max_read_cnt = max_read; + entry.io_cnt = io_cnt; + entry.max_bytes_per_interval = mbpi; if (phba->cmf_active_mode == LPFC_CFG_MANAGED) - entry->timer_utilization = phba->cmf_last_ts; + entry.timer_utilization = phba->cmf_last_ts; else - entry->timer_utilization = ms; - entry->timer_interval = ms; + entry.timer_utilization = ms; + entry.timer_interval = ms; phba->cmf_last_ts = 0;
- /* Increment rxtable index */ - head = (head + 1) % LPFC_MAX_RXMONITOR_ENTRY; - tail = atomic_read(&phba->rxtable_idx_tail); - if (head == tail) { - tail = (tail + 1) % LPFC_MAX_RXMONITOR_ENTRY; - atomic_set(&phba->rxtable_idx_tail, tail); - } - atomic_set(&phba->rxtable_idx_head, head); + lpfc_rx_monitor_record(phba->rx_monitor, &entry); }
if (phba->cmf_active_mode == LPFC_CFG_MONITOR) { diff --git a/drivers/scsi/lpfc/lpfc_mem.c b/drivers/scsi/lpfc/lpfc_mem.c index 870e53b8f81d..5d36b3514864 100644 --- a/drivers/scsi/lpfc/lpfc_mem.c +++ b/drivers/scsi/lpfc/lpfc_mem.c @@ -344,9 +344,12 @@ lpfc_mem_free_all(struct lpfc_hba *phba) phba->cgn_i = NULL; }
- /* Free RX table */ - kfree(phba->rxtable); - phba->rxtable = NULL; + /* Free RX Monitor */ + if (phba->rx_monitor) { + lpfc_rx_monitor_destroy_ring(phba->rx_monitor); + kfree(phba->rx_monitor); + phba->rx_monitor = NULL; + }
/* Free the iocb lookup array */ kfree(psli->iocbq_lookup); diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c index 0024c0e0afd3..d6e761adf1f1 100644 --- a/drivers/scsi/lpfc/lpfc_sli.c +++ b/drivers/scsi/lpfc/lpfc_sli.c @@ -7878,6 +7878,172 @@ static void lpfc_sli4_dip(struct lpfc_hba *phba) } }
+/** + * lpfc_rx_monitor_create_ring - Initialize ring buffer for rx_monitor + * @rx_monitor: Pointer to lpfc_rx_info_monitor object + * @entries: Number of rx_info_entry objects to allocate in ring + * + * Return: + * 0 - Success + * ENOMEM - Failure to kmalloc + **/ +int lpfc_rx_monitor_create_ring(struct lpfc_rx_info_monitor *rx_monitor, + u32 entries) +{ + rx_monitor->ring = kmalloc_array(entries, sizeof(struct rx_info_entry), + GFP_KERNEL); + if (!rx_monitor->ring) + return -ENOMEM; + + rx_monitor->head_idx = 0; + rx_monitor->tail_idx = 0; + spin_lock_init(&rx_monitor->lock); + rx_monitor->entries = entries; + + return 0; +} + +/** + * lpfc_rx_monitor_destroy_ring - Free ring buffer for rx_monitor + * @rx_monitor: Pointer to lpfc_rx_info_monitor object + **/ +void lpfc_rx_monitor_destroy_ring(struct lpfc_rx_info_monitor *rx_monitor) +{ + spin_lock(&rx_monitor->lock); + kfree(rx_monitor->ring); + rx_monitor->ring = NULL; + rx_monitor->entries = 0; + rx_monitor->head_idx = 0; + rx_monitor->tail_idx = 0; + spin_unlock(&rx_monitor->lock); +} + +/** + * lpfc_rx_monitor_record - Insert an entry into rx_monitor's ring + * @rx_monitor: Pointer to lpfc_rx_info_monitor object + * @entry: Pointer to rx_info_entry + * + * Used to insert an rx_info_entry into rx_monitor's ring. Note that this is a + * deep copy of rx_info_entry not a shallow copy of the rx_info_entry ptr. + * + * This is called from lpfc_cmf_timer, which is in timer/softirq context. + * + * In cases of old data overflow, we do a best effort of FIFO order. + **/ +void lpfc_rx_monitor_record(struct lpfc_rx_info_monitor *rx_monitor, + struct rx_info_entry *entry) +{ + struct rx_info_entry *ring = rx_monitor->ring; + u32 *head_idx = &rx_monitor->head_idx; + u32 *tail_idx = &rx_monitor->tail_idx; + spinlock_t *ring_lock = &rx_monitor->lock; + u32 ring_size = rx_monitor->entries; + + spin_lock(ring_lock); + memcpy(&ring[*tail_idx], entry, sizeof(*entry)); + *tail_idx = (*tail_idx + 1) % ring_size; + + /* Best effort of FIFO saved data */ + if (*tail_idx == *head_idx) + *head_idx = (*head_idx + 1) % ring_size; + + spin_unlock(ring_lock); +} + +/** + * lpfc_rx_monitor_report - Read out rx_monitor's ring + * @phba: Pointer to lpfc_hba object + * @rx_monitor: Pointer to lpfc_rx_info_monitor object + * @buf: Pointer to char buffer that will contain rx monitor info data + * @buf_len: Length buf including null char + * @max_read_entries: Maximum number of entries to read out of ring + * + * Used to dump/read what's in rx_monitor's ring buffer. + * + * If buf is NULL || buf_len == 0, then it is implied that we want to log the + * information to kmsg instead of filling out buf. + * + * Return: + * Number of entries read out of the ring + **/ +u32 lpfc_rx_monitor_report(struct lpfc_hba *phba, + struct lpfc_rx_info_monitor *rx_monitor, char *buf, + u32 buf_len, u32 max_read_entries) +{ + struct rx_info_entry *ring = rx_monitor->ring; + struct rx_info_entry *entry; + u32 *head_idx = &rx_monitor->head_idx; + u32 *tail_idx = &rx_monitor->tail_idx; + spinlock_t *ring_lock = &rx_monitor->lock; + u32 ring_size = rx_monitor->entries; + u32 cnt = 0; + char tmp[DBG_LOG_STR_SZ] = {0}; + bool log_to_kmsg = (!buf || !buf_len) ? true : false; + + if (!log_to_kmsg) { + /* clear the buffer to be sure */ + memset(buf, 0, buf_len); + + scnprintf(buf, buf_len, "\t%-16s%-16s%-16s%-16s%-8s%-8s%-8s" + "%-8s%-8s%-8s%-16s\n", + "MaxBPI", "Tot_Data_CMF", + "Tot_Data_Cmd", "Tot_Data_Cmpl", + "Lat(us)", "Avg_IO", "Max_IO", "Bsy", + "IO_cnt", "Info", "BWutil(ms)"); + } + + /* Needs to be _bh because record is called from timer interrupt + * context + */ + spin_lock_bh(ring_lock); + while (*head_idx != *tail_idx) { + entry = &ring[*head_idx]; + + /* Read out this entry's data. */ + if (!log_to_kmsg) { + /* If !log_to_kmsg, then store to buf. */ + scnprintf(tmp, sizeof(tmp), + "%03d:\t%-16llu%-16llu%-16llu%-16llu%-8llu" + "%-8llu%-8llu%-8u%-8u%-8u%u(%u)\n", + *head_idx, entry->max_bytes_per_interval, + entry->cmf_bytes, entry->total_bytes, + entry->rcv_bytes, entry->avg_io_latency, + entry->avg_io_size, entry->max_read_cnt, + entry->cmf_busy, entry->io_cnt, + entry->cmf_info, entry->timer_utilization, + entry->timer_interval); + + /* Check for buffer overflow */ + if ((strlen(buf) + strlen(tmp)) >= buf_len) + break; + + /* Append entry's data to buffer */ + strlcat(buf, tmp, buf_len); + } else { + lpfc_printf_log(phba, KERN_INFO, LOG_CGN_MGMT, + "4410 %02u: MBPI %llu Xmit %llu " + "Cmpl %llu Lat %llu ASz %llu Info %02u " + "BWUtil %u Int %u slot %u\n", + cnt, entry->max_bytes_per_interval, + entry->total_bytes, entry->rcv_bytes, + entry->avg_io_latency, + entry->avg_io_size, entry->cmf_info, + entry->timer_utilization, + entry->timer_interval, *head_idx); + } + + *head_idx = (*head_idx + 1) % ring_size; + + /* Don't feed more than max_read_entries */ + cnt++; + if (cnt >= max_read_entries) + break; + } + spin_unlock_bh(ring_lock); + + return cnt; +} + /** * lpfc_cmf_setup - Initialize idle_stat tracking * @phba: Pointer to HBA context object. @@ -8069,19 +8235,29 @@ lpfc_cmf_setup(struct lpfc_hba *phba) phba->cmf_interval_rate = LPFC_CMF_INTERVAL;
/* Allocate RX Monitor Buffer */ - if (!phba->rxtable) { - phba->rxtable = kmalloc_array(LPFC_MAX_RXMONITOR_ENTRY, - sizeof(struct rxtable_entry), - GFP_KERNEL); - if (!phba->rxtable) { + if (!phba->rx_monitor) { + phba->rx_monitor = kzalloc(sizeof(*phba->rx_monitor), + GFP_KERNEL); + + if (!phba->rx_monitor) { lpfc_printf_log(phba, KERN_ERR, LOG_INIT, "2644 Failed to alloc memory " "for RX Monitor Buffer\n"); return -ENOMEM; } + + /* Instruct the rx_monitor object to instantiate its ring */ + if (lpfc_rx_monitor_create_ring(phba->rx_monitor, + LPFC_MAX_RXMONITOR_ENTRY)) { + kfree(phba->rx_monitor); + phba->rx_monitor = NULL; + lpfc_printf_log(phba, KERN_ERR, LOG_INIT, + "2645 Failed to alloc memory " + "for RX Monitor's Ring\n"); + return -ENOMEM; + } } - atomic_set(&phba->rxtable_idx_head, 0); - atomic_set(&phba->rxtable_idx_tail, 0); + return 0; }
From: Lukas Wunner lukas@wunner.de
commit 3a939433ddc1bab98be028903aaa286e5e7461d7 upstream.
The ar933x_uart driver neglects to deassert Transmit Enable when ->rs485_config() is invoked. Fix it.
Fixes: 9be1064fe524 ("serial: ar933x_uart: add RS485 support") Cc: stable@vger.kernel.org # v5.7+ Cc: Daniel Golle daniel@makrotopia.org Reviewed-by: Ilpo JÀrvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Lukas Wunner lukas@wunner.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/ar933x_uart.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/tty/serial/ar933x_uart.c b/drivers/tty/serial/ar933x_uart.c index 4379ca4842ae..0f2677695b52 100644 --- a/drivers/tty/serial/ar933x_uart.c +++ b/drivers/tty/serial/ar933x_uart.c @@ -591,6 +591,11 @@ static int ar933x_config_rs485(struct uart_port *port, dev_err(port->dev, "RS485 needs rts-gpio\n"); return 1; } + + if (rs485conf->flags & SER_RS485_ENABLED) + gpiod_set_value(up->rts_gpiod, + !!(rs485conf->flags & SER_RS485_RTS_AFTER_SEND)); + port->rs485 = *rs485conf; return 0; }
From: Sean Christopherson seanjc@google.com
[ Upstream commit a61d7c5432ac5a953bbcec17af031661c2bd201d ]
Trace exceptions that are re-injected, not just those that KVM is injecting for the first time. Debugging re-injection bugs is painful enough as is, not having visibility into what KVM is doing only makes things worse.
Delay propagating pending=>injected in the non-reinjection path so that the tracing can properly identify reinjected exceptions.
Signed-off-by: Sean Christopherson seanjc@google.com Reviewed-by: Maxim Levitsky mlevitsk@redhat.com Signed-off-by: Maciej S. Szmigiero maciej.szmigiero@oracle.com Message-Id: 25470690a38b4d2b32b6204875dd35676c65c9f2.1651440202.git.maciej.szmigiero@oracle.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Stable-dep-of: 5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/trace.h | 12 ++++++++---- arch/x86/kvm/x86.c | 16 +++++++++------- 2 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h index 03ebe368333e..c41506ed8c7d 100644 --- a/arch/x86/kvm/trace.h +++ b/arch/x86/kvm/trace.h @@ -355,25 +355,29 @@ TRACE_EVENT(kvm_inj_virq, * Tracepoint for kvm interrupt injection: */ TRACE_EVENT(kvm_inj_exception, - TP_PROTO(unsigned exception, bool has_error, unsigned error_code), - TP_ARGS(exception, has_error, error_code), + TP_PROTO(unsigned exception, bool has_error, unsigned error_code, + bool reinjected), + TP_ARGS(exception, has_error, error_code, reinjected),
TP_STRUCT__entry( __field( u8, exception ) __field( u8, has_error ) __field( u32, error_code ) + __field( bool, reinjected ) ),
TP_fast_assign( __entry->exception = exception; __entry->has_error = has_error; __entry->error_code = error_code; + __entry->reinjected = reinjected; ),
- TP_printk("%s (0x%x)", + TP_printk("%s (0x%x)%s", __print_symbolic(__entry->exception, kvm_trace_sym_exc), /* FIXME: don't print error_code if not present */ - __entry->has_error ? __entry->error_code : 0) + __entry->has_error ? __entry->error_code : 0, + __entry->reinjected ? " [reinjected]" : "") );
/* diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8648799d48f8..c171ca2bb85a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9010,6 +9010,11 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu)
static void kvm_inject_exception(struct kvm_vcpu *vcpu) { + trace_kvm_inj_exception(vcpu->arch.exception.nr, + vcpu->arch.exception.has_error_code, + vcpu->arch.exception.error_code, + vcpu->arch.exception.injected); + if (vcpu->arch.exception.error_code && !is_protmode(vcpu)) vcpu->arch.exception.error_code = false; static_call(kvm_x86_queue_exception)(vcpu); @@ -9067,13 +9072,6 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
/* try to inject new event if pending */ if (vcpu->arch.exception.pending) { - trace_kvm_inj_exception(vcpu->arch.exception.nr, - vcpu->arch.exception.has_error_code, - vcpu->arch.exception.error_code); - - vcpu->arch.exception.pending = false; - vcpu->arch.exception.injected = true; - if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT) __kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) | X86_EFLAGS_RF); @@ -9087,6 +9085,10 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit) }
kvm_inject_exception(vcpu); + + vcpu->arch.exception.pending = false; + vcpu->arch.exception.injected = true; + can_inject = false; }
From: Sean Christopherson seanjc@google.com
[ Upstream commit 5623f751bd9c438ed12840e086f33c4646440d19 ]
Add a dedicated "exception type" for #DBs, as #DBs can be fault-like or trap-like depending the sub-type of #DB, and effectively defer the decision of what to do with the #DB to the caller.
For the emulator's two calls to exception_type(), treat the #DB as fault-like, as the emulator handles only code breakpoint and general detect #DBs, both of which are fault-like.
For event injection, which uses exception_type() to determine whether to set EFLAGS.RF=1 on the stack, keep the current behavior of not setting RF=1 for #DBs. Intel and AMD explicitly state RF isn't set on code #DBs, so exempting by failing the "== EXCPT_FAULT" check is correct. The only other fault-like #DB is General Detect, and despite Intel and AMD both strongly implying (through omission) that General Detect #DBs should set RF=1, hardware (multiple generations of both Intel and AMD), in fact does not. Through insider knowledge, extreme foresight, sheer dumb luck, or some combination thereof, KVM correctly handled RF for General Detect #DBs.
Fixes: 38827dbd3fb8 ("KVM: x86: Do not update EFLAGS on faulting emulation") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson seanjc@google.com Reviewed-by: Maxim Levitsky mlevitsk@redhat.com Link: https://lore.kernel.org/r/20220830231614.3580124-9-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/x86.c | 27 +++++++++++++++++++++++++-- 1 file changed, 25 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c171ca2bb85a..cd22557e2645 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -525,6 +525,7 @@ static int exception_class(int vector) #define EXCPT_TRAP 1 #define EXCPT_ABORT 2 #define EXCPT_INTERRUPT 3 +#define EXCPT_DB 4
static int exception_type(int vector) { @@ -535,8 +536,14 @@ static int exception_type(int vector)
mask = 1 << vector;
- /* #DB is trap, as instruction watchpoints are handled elsewhere */ - if (mask & ((1 << DB_VECTOR) | (1 << BP_VECTOR) | (1 << OF_VECTOR))) + /* + * #DBs can be trap-like or fault-like, the caller must check other CPU + * state, e.g. DR6, to determine whether a #DB is a trap or fault. + */ + if (mask & (1 << DB_VECTOR)) + return EXCPT_DB; + + if (mask & ((1 << BP_VECTOR) | (1 << OF_VECTOR))) return EXCPT_TRAP;
if (mask & ((1 << DF_VECTOR) | (1 << MC_VECTOR))) @@ -8134,6 +8141,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu); toggle_interruptibility(vcpu, ctxt->interruptibility); vcpu->arch.emulate_regs_need_sync_to_vcpu = false; + + /* + * Note, EXCPT_DB is assumed to be fault-like as the emulator + * only supports code breakpoints and general detect #DB, both + * of which are fault-like. + */ if (!ctxt->have_exception || exception_type(ctxt->exception.vector) == EXCPT_TRAP) { kvm_rip_write(vcpu, ctxt->eip); @@ -9072,6 +9085,16 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
/* try to inject new event if pending */ if (vcpu->arch.exception.pending) { + /* + * Fault-class exceptions, except #DBs, set RF=1 in the RFLAGS + * value pushed on the stack. Trap-like exception and all #DBs + * leave RF as-is (KVM follows Intel's behavior in this regard; + * AMD states that code breakpoint #DBs excplitly clear RF=0). + * + * Note, most versions of Intel's SDM and AMD's APM incorrectly + * describe the behavior of General Detect #DBs, which are + * fault-like. They do _not_ set RF, a la code breakpoints. + */ if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT) __kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) | X86_EFLAGS_RF);
From: Shirish S shirish.s@amd.com
[ Upstream commit 6094b9136ca9038b61e9c4b5d25cd5512ce50b34 ]
[Why] If psr_feature_enable is set to true by default, it continues to be enabled for non capable links.
[How] explicitly disable the feature on links that are not capable of the same.
Fixes: 8c322309e48e9 ("drm/amd/display: Enable PSR") Signed-off-by: Shirish S shirish.s@amd.com Reviewed-by: Leo Li sunpeng.li@amd.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org # 5.15+ Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c index 7072fb2ec07f..278ff281a1bd 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c @@ -36,10 +36,14 @@ void amdgpu_dm_set_psr_caps(struct dc_link *link) { uint8_t dpcd_data[EDP_PSR_RECEIVER_CAP_SIZE];
- if (!(link->connector_signal & SIGNAL_TYPE_EDP)) + if (!(link->connector_signal & SIGNAL_TYPE_EDP)) { + link->psr_settings.psr_feature_enabled = false; return; - if (link->type == dc_connection_none) + } + if (link->type == dc_connection_none) { + link->psr_settings.psr_feature_enabled = false; return; + } if (dm_helpers_dp_read_dpcd(NULL, link, DP_PSR_SUPPORT, dpcd_data, sizeof(dpcd_data))) { link->dpcd_caps.psr_caps.psr_version = dpcd_data[0];
From: Baolin Wang baolin.wang@linux.alibaba.com
[ Upstream commit fac35ba763ed07ba93154c95ffc0c4a55023707f ]
On some architectures (like ARM64), it can support CONT-PTE/PMD size hugetlb, which means it can support not only PMD/PUD size hugetlb (2M and 1G), but also CONT-PTE/PMD size(64K and 32M) if a 4K page size specified.
So when looking up a CONT-PTE size hugetlb page by follow_page(), it will use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size hugetlb in follow_page_pte(). However this pte entry lock is incorrect for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get the correct lock, which is mm->page_table_lock.
That means the pte entry of the CONT-PTE size hugetlb under current pte lock is unstable in follow_page_pte(), we can continue to migrate or poison the pte entry of the CONT-PTE size hugetlb, which can cause some potential race issues, even though they are under the 'pte lock'.
For example, suppose thread A is trying to look up a CONT-PTE size hugetlb page by move_pages() syscall under the lock, however antoher thread B can migrate the CONT-PTE hugetlb page at the same time, which will cause thread A to get an incorrect page, if thread A also wants to do page migration, then data inconsistency error occurs.
Moreover we have the same issue for CONT-PMD size hugetlb in follow_huge_pmd().
To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte() to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to get the correct pte entry lock to make the pte entry stable.
Mike said:
Support for CONT_PMD/_PTE was added with bb9dd3df8ee9 ("arm64: hugetlb: refactor find_num_contig()"). Patch series "Support for contiguous pte hugepages", v4. However, I do not believe these code paths were executed until migration support was added with 5480280d3f2d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages") I would go with 5480280d3f2d for the Fixes: targe.
Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.166201756... Fixes: 5480280d3f2d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages") Signed-off-by: Baolin Wang baolin.wang@linux.alibaba.com Suggested-by: Mike Kravetz mike.kravetz@oracle.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Cc: David Hildenbrand david@redhat.com Cc: Muchun Song songmuchun@bytedance.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/hugetlb.h | 8 ++++---- mm/gup.c | 14 +++++++++++++- mm/hugetlb.c | 27 +++++++++++++-------------- 3 files changed, 30 insertions(+), 19 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 22c1d935e22d..af8e7045017f 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -198,8 +198,8 @@ struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address, struct page *follow_huge_pd(struct vm_area_struct *vma, unsigned long address, hugepd_t hpd, int flags, int pdshift); -struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, - pmd_t *pmd, int flags); +struct page *follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address, + int flags); struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address, pud_t *pud, int flags); struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address, @@ -286,8 +286,8 @@ static inline struct page *follow_huge_pd(struct vm_area_struct *vma, return NULL; }
-static inline struct page *follow_huge_pmd(struct mm_struct *mm, - unsigned long address, pmd_t *pmd, int flags) +static inline struct page *follow_huge_pmd_pte(struct vm_area_struct *vma, + unsigned long address, int flags) { return NULL; } diff --git a/mm/gup.c b/mm/gup.c index 1a23cd0b4fba..69e45cbe58f8 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -509,6 +509,18 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) == (FOLL_PIN | FOLL_GET))) return ERR_PTR(-EINVAL); + + /* + * Considering PTE level hugetlb, like continuous-PTE hugetlb on + * ARM64 architecture. + */ + if (is_vm_hugetlb_page(vma)) { + page = follow_huge_pmd_pte(vma, address, flags); + if (page) + return page; + return no_page_table(vma, flags); + } + retry: if (unlikely(pmd_bad(*pmd))) return no_page_table(vma, flags); @@ -652,7 +664,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, if (pmd_none(pmdval)) return no_page_table(vma, flags); if (pmd_huge(pmdval) && is_vm_hugetlb_page(vma)) { - page = follow_huge_pmd(mm, address, pmd, flags); + page = follow_huge_pmd_pte(vma, address, flags); if (page) return page; return no_page_table(vma, flags); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index f5f8929fad51..dbb63ec3b5fa 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6182,12 +6182,13 @@ follow_huge_pd(struct vm_area_struct *vma, }
struct page * __weak -follow_huge_pmd(struct mm_struct *mm, unsigned long address, - pmd_t *pmd, int flags) +follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address, int flags) { + struct hstate *h = hstate_vma(vma); + struct mm_struct *mm = vma->vm_mm; struct page *page = NULL; spinlock_t *ptl; - pte_t pte; + pte_t *ptep, pte;
/* FOLL_GET and FOLL_PIN are mutually exclusive. */ if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) == @@ -6195,17 +6196,15 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address, return NULL;
retry: - ptl = pmd_lockptr(mm, pmd); - spin_lock(ptl); - /* - * make sure that the address range covered by this pmd is not - * unmapped from other threads. - */ - if (!pmd_huge(*pmd)) - goto out; - pte = huge_ptep_get((pte_t *)pmd); + ptep = huge_pte_offset(mm, address, huge_page_size(h)); + if (!ptep) + return NULL; + + ptl = huge_pte_lock(h, mm, ptep); + pte = huge_ptep_get(ptep); if (pte_present(pte)) { - page = pmd_page(*pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT); + page = pte_page(pte) + + ((address & ~huge_page_mask(h)) >> PAGE_SHIFT); /* * try_grab_page() should always succeed here, because: a) we * hold the pmd (ptl) lock, and b) we've just checked that the @@ -6221,7 +6220,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address, } else { if (is_hugetlb_entry_migration(pte)) { spin_unlock(ptl); - __migration_entry_wait(mm, (pte_t *)pmd, ptl); + __migration_entry_wait(mm, ptep, ptl); goto retry; } /*
From: Roderick Colenbrander roderick@gaikai.com
[ Upstream commit b8a968efab301743fd659b5649c5d7d3e30e63a6 ]
Provide initial support for the DualSense Edge controller. The brings support up to the level of the original DualSense, but won't yet provide support for new features (e.g. reprogrammable buttons).
Signed-off-by: Roderick Colenbrander roderick.colenbrander@sony.com CC: stable@vger.kernel.org Signed-off-by: Benjamin Tissoires benjamin.tissoires@redhat.com Link: https://lore.kernel.org/r/20221010212313.78275-3-roderick.colenbrander@sony.... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-ids.h | 1 + drivers/hid/hid-playstation.c | 5 ++++- 2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index cb2b48d6915e..a5aac9cc2075 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -1109,6 +1109,7 @@ #define USB_DEVICE_ID_SONY_PS4_CONTROLLER_2 0x09cc #define USB_DEVICE_ID_SONY_PS4_CONTROLLER_DONGLE 0x0ba0 #define USB_DEVICE_ID_SONY_PS5_CONTROLLER 0x0ce6 +#define USB_DEVICE_ID_SONY_PS5_CONTROLLER_2 0x0df2 #define USB_DEVICE_ID_SONY_MOTION_CONTROLLER 0x03d5 #define USB_DEVICE_ID_SONY_NAVIGATION_CONTROLLER 0x042f #define USB_DEVICE_ID_SONY_BUZZ_CONTROLLER 0x0002 diff --git a/drivers/hid/hid-playstation.c b/drivers/hid/hid-playstation.c index ab7c82c2e886..bd0e0fe2f627 100644 --- a/drivers/hid/hid-playstation.c +++ b/drivers/hid/hid-playstation.c @@ -1282,7 +1282,8 @@ static int ps_probe(struct hid_device *hdev, const struct hid_device_id *id) goto err_stop; }
- if (hdev->product == USB_DEVICE_ID_SONY_PS5_CONTROLLER) { + if (hdev->product == USB_DEVICE_ID_SONY_PS5_CONTROLLER || + hdev->product == USB_DEVICE_ID_SONY_PS5_CONTROLLER_2) { dev = dualsense_create(hdev); if (IS_ERR(dev)) { hid_err(hdev, "Failed to create dualsense.\n"); @@ -1320,6 +1321,8 @@ static void ps_remove(struct hid_device *hdev) static const struct hid_device_id ps_devices[] = { { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS5_CONTROLLER) }, { HID_USB_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS5_CONTROLLER) }, + { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS5_CONTROLLER_2) }, + { HID_USB_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS5_CONTROLLER_2) }, { } }; MODULE_DEVICE_TABLE(hid, ps_devices);
From: Aaron Lewis aaronlewis@google.com
[ Upstream commit cf5029d5dd7cb0aaa53250fa9e389abd231606b3 ]
The flags for KVM_CAP_X86_USER_SPACE_MSR and KVM_X86_SET_MSR_FILTER have no protection for their unused bits. Without protection, future development for these features will be difficult. Add the protection needed to make it possible to extend these features in the future.
Signed-off-by: Aaron Lewis aaronlewis@google.com Message-Id: 20220714161314.1715227-1-aaronlewis@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Stable-dep-of: 2e3272bc1790 ("KVM: x86: Copy filter arg outside kvm_vm_ioctl_set_msr_filter()") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/x86.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index cd22557e2645..59c9eb55e6d1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5770,6 +5770,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, r = 0; break; case KVM_CAP_X86_USER_SPACE_MSR: + r = -EINVAL; + if (cap->args[0] & ~(KVM_MSR_EXIT_REASON_INVAL | + KVM_MSR_EXIT_REASON_UNKNOWN | + KVM_MSR_EXIT_REASON_FILTER)) + break; kvm->arch.user_space_msr_mask = cap->args[0]; r = 0; break; @@ -5903,6 +5908,9 @@ static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, void __user *argp) if (copy_from_user(&filter, user_msr_filter, sizeof(filter))) return -EFAULT;
+ if (filter.flags & ~KVM_MSR_FILTER_DEFAULT_DENY) + return -EINVAL; + for (i = 0; i < ARRAY_SIZE(filter.ranges); i++) empty &= !filter.ranges[i].nmsrs;
From: Alexander Graf graf@amazon.com
[ Upstream commit 2e3272bc1790825c43d2c39690bf2836b81c6d36 ]
In the next patch we want to introduce a second caller to set_msr_filter() which constructs its own filter list on the stack. Refactor the original function so it takes it as argument instead of reading it through copy_from_user().
Signed-off-by: Alexander Graf graf@amazon.com Message-Id: 20221017184541.2658-3-graf@amazon.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/x86.c | 31 +++++++++++++++++-------------- 1 file changed, 17 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 59c9eb55e6d1..5d165f793f44 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5895,26 +5895,22 @@ static int kvm_add_msr_filter(struct kvm_x86_msr_filter *msr_filter, return 0; }
-static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, void __user *argp) +static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, + struct kvm_msr_filter *filter) { - struct kvm_msr_filter __user *user_msr_filter = argp; struct kvm_x86_msr_filter *new_filter, *old_filter; - struct kvm_msr_filter filter; bool default_allow; bool empty = true; int r = 0; u32 i;
- if (copy_from_user(&filter, user_msr_filter, sizeof(filter))) - return -EFAULT; - - if (filter.flags & ~KVM_MSR_FILTER_DEFAULT_DENY) + if (filter->flags & ~KVM_MSR_FILTER_DEFAULT_DENY) return -EINVAL;
- for (i = 0; i < ARRAY_SIZE(filter.ranges); i++) - empty &= !filter.ranges[i].nmsrs; + for (i = 0; i < ARRAY_SIZE(filter->ranges); i++) + empty &= !filter->ranges[i].nmsrs;
- default_allow = !(filter.flags & KVM_MSR_FILTER_DEFAULT_DENY); + default_allow = !(filter->flags & KVM_MSR_FILTER_DEFAULT_DENY); if (empty && !default_allow) return -EINVAL;
@@ -5922,8 +5918,8 @@ static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, void __user *argp) if (!new_filter) return -ENOMEM;
- for (i = 0; i < ARRAY_SIZE(filter.ranges); i++) { - r = kvm_add_msr_filter(new_filter, &filter.ranges[i]); + for (i = 0; i < ARRAY_SIZE(filter->ranges); i++) { + r = kvm_add_msr_filter(new_filter, &filter->ranges[i]); if (r) { kvm_free_msr_filter(new_filter); return r; @@ -6320,9 +6316,16 @@ long kvm_arch_vm_ioctl(struct file *filp, case KVM_SET_PMU_EVENT_FILTER: r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp); break; - case KVM_X86_SET_MSR_FILTER: - r = kvm_vm_ioctl_set_msr_filter(kvm, argp); + case KVM_X86_SET_MSR_FILTER: { + struct kvm_msr_filter __user *user_msr_filter = argp; + struct kvm_msr_filter filter; + + if (copy_from_user(&filter, user_msr_filter, sizeof(filter))) + return -EFAULT; + + r = kvm_vm_ioctl_set_msr_filter(kvm, &filter); break; + } default: r = -ENOTTY; }
From: Alexander Graf graf@amazon.com
[ Upstream commit 1739c7017fb1d759965dcbab925ff5980a5318cb ]
The KVM_X86_SET_MSR_FILTER ioctls contains a pointer in the passed in struct which means it has a different struct size depending on whether it gets called from 32bit or 64bit code.
This patch introduces compat code that converts from the 32bit struct to its 64bit counterpart which then gets used going forward internally. With this applied, 32bit QEMU can successfully set MSR bitmaps when running on 64bit kernels.
Reported-by: Andrew Randrianasulu randrianasulu@gmail.com Fixes: 1a155254ff937 ("KVM: x86: Introduce MSR filtering") Signed-off-by: Alexander Graf graf@amazon.com Message-Id: 20221017184541.2658-4-graf@amazon.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/x86.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 56 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5d165f793f44..ec2a37ba07e6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5942,6 +5942,62 @@ static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, return 0; }
+#ifdef CONFIG_KVM_COMPAT +/* for KVM_X86_SET_MSR_FILTER */ +struct kvm_msr_filter_range_compat { + __u32 flags; + __u32 nmsrs; + __u32 base; + __u32 bitmap; +}; + +struct kvm_msr_filter_compat { + __u32 flags; + struct kvm_msr_filter_range_compat ranges[KVM_MSR_FILTER_MAX_RANGES]; +}; + +#define KVM_X86_SET_MSR_FILTER_COMPAT _IOW(KVMIO, 0xc6, struct kvm_msr_filter_compat) + +long kvm_arch_vm_compat_ioctl(struct file *filp, unsigned int ioctl, + unsigned long arg) +{ + void __user *argp = (void __user *)arg; + struct kvm *kvm = filp->private_data; + long r = -ENOTTY; + + switch (ioctl) { + case KVM_X86_SET_MSR_FILTER_COMPAT: { + struct kvm_msr_filter __user *user_msr_filter = argp; + struct kvm_msr_filter_compat filter_compat; + struct kvm_msr_filter filter; + int i; + + if (copy_from_user(&filter_compat, user_msr_filter, + sizeof(filter_compat))) + return -EFAULT; + + filter.flags = filter_compat.flags; + for (i = 0; i < ARRAY_SIZE(filter.ranges); i++) { + struct kvm_msr_filter_range_compat *cr; + + cr = &filter_compat.ranges[i]; + filter.ranges[i] = (struct kvm_msr_filter_range) { + .flags = cr->flags, + .nmsrs = cr->nmsrs, + .base = cr->base, + .bitmap = (__u8 *)(ulong)cr->bitmap, + }; + } + + r = kvm_vm_ioctl_set_msr_filter(kvm, &filter); + break; + } + } + + return r; +} +#endif + #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER static int kvm_arch_suspend_notifier(struct kvm *kvm) {
From: Håkon Bugge haakon.bugge@oracle.com
[ Upstream commit eb83f502adb036cd56c27e13b9ca3b2aabfa790b ]
Commit 27cfde795a96 ("RDMA/cma: Fix arguments order in net device validation") swapped the src and dst addresses in the call to validate_net_dev().
As a consequence, the test in validate_ipv4_net_dev() to see if the net_dev is the right one, is incorrect for port 1 <-> 2 communication when the ports are on the same sub-net. This is fixed by denoting the flowi4_oif as the device instead of the incoming one.
The bug has not been observed using IPv6 addresses.
Fixes: 27cfde795a96 ("RDMA/cma: Fix arguments order in net device validation") Signed-off-by: Håkon Bugge haakon.bugge@oracle.com Link: https://lore.kernel.org/r/20221012141542.16925-1-haakon.bugge@oracle.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/core/cma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 0da66dd40d6a..fd192104fd8d 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1433,7 +1433,7 @@ static bool validate_ipv4_net_dev(struct net_device *net_dev, return false;
memset(&fl4, 0, sizeof(fl4)); - fl4.flowi4_iif = net_dev->ifindex; + fl4.flowi4_oif = net_dev->ifindex; fl4.daddr = daddr; fl4.saddr = saddr;
From: Dean Luick dean.luick@cornelisnetworks.com
[ Upstream commit 1afac08b39d85437187bb2a92d89a741b1078f55 ]
Commit 13bac861952a ("IB/hfi1: Fix abba locking issue with sc_disable()") incorrectly tries to move a list from one list head to another. The result is a kernel crash.
The crash is triggered when a link goes down and there are waiters for a send to complete. The following signature is seen:
BUG: kernel NULL pointer dereference, address: 0000000000000030 [...] Call Trace: sc_disable+0x1ba/0x240 [hfi1] pio_freeze+0x3d/0x60 [hfi1] handle_freeze+0x27/0x1b0 [hfi1] process_one_work+0x1b0/0x380 ? process_one_work+0x380/0x380 worker_thread+0x30/0x360 ? process_one_work+0x380/0x380 kthread+0xd7/0x100 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30
The fix is to use the correct call to move the list.
Fixes: 13bac861952a ("IB/hfi1: Fix abba locking issue with sc_disable()") Signed-off-by: Dean Luick dean.luick@cornelisnetworks.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@cornelisnetworks.com Link: https://lore.kernel.org/r/166610327042.674422.6146908799669288976.stgit@awfm... Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/hfi1/pio.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/pio.c b/drivers/infiniband/hw/hfi1/pio.c index 3d42bd2b36bd..51ae58c02b15 100644 --- a/drivers/infiniband/hw/hfi1/pio.c +++ b/drivers/infiniband/hw/hfi1/pio.c @@ -913,8 +913,7 @@ void sc_disable(struct send_context *sc) spin_unlock(&sc->release_lock);
write_seqlock(&sc->waitlock); - if (!list_empty(&sc->piowait)) - list_move(&sc->piowait, &wake_list); + list_splice_init(&sc->piowait, &wake_list); write_sequnlock(&sc->waitlock); while (!list_empty(&wake_list)) { struct iowait *wait;
From: Xinhao Liu liuxinhao5@hisilicon.com
[ Upstream commit 9c3631d17054a8766dbdc1abf8d29306260e7c7f ]
Don't use unintelligible constants.
Link: https://lore.kernel.org/r/20211119140208.40416-10-liangwenpeng@huawei.com Signed-off-by: Xinhao Liu liuxinhao5@hisilicon.com Signed-off-by: Wenpeng Liang liangwenpeng@huawei.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Stable-dep-of: 9e272ed69ad6 ("RDMA/hns: Disable local invalidate operation") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c index 1dbad159f379..1782626b54db 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c @@ -678,6 +678,7 @@ static void hns_roce_write512(struct hns_roce_dev *hr_dev, u64 *val, static void write_dwqe(struct hns_roce_dev *hr_dev, struct hns_roce_qp *qp, void *wqe) { +#define HNS_ROCE_SL_SHIFT 2 struct hns_roce_v2_rc_send_wqe *rc_sq_wqe = wqe;
/* All kinds of DirectWQE have the same header field layout */ @@ -685,7 +686,8 @@ static void write_dwqe(struct hns_roce_dev *hr_dev, struct hns_roce_qp *qp, roce_set_field(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_DB_SL_L_M, V2_RC_SEND_WQE_BYTE_4_DB_SL_L_S, qp->sl); roce_set_field(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_DB_SL_H_M, - V2_RC_SEND_WQE_BYTE_4_DB_SL_H_S, qp->sl >> 2); + V2_RC_SEND_WQE_BYTE_4_DB_SL_H_S, + qp->sl >> HNS_ROCE_SL_SHIFT); roce_set_field(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_WQE_INDEX_M, V2_RC_SEND_WQE_BYTE_4_WQE_INDEX_S, qp->sq.head);
From: Wenpeng Liang liangwenpeng@huawei.com
[ Upstream commit 82600b2d3cd57428bdb03c66ae67708d3c8f7281 ]
To reduce the code size and make the code clearer, replace all roce_set_xxx() with hr_reg_xxx() to write the data fields.
Link: https://lore.kernel.org/r/20220512080012.38728-2-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang liangwenpeng@huawei.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Stable-dep-of: 9e272ed69ad6 ("RDMA/hns: Disable local invalidate operation") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 261 ++++++++------------- drivers/infiniband/hw/hns/hns_roce_hw_v2.h | 166 +++++-------- 2 files changed, 157 insertions(+), 270 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c index 1782626b54db..3b26c74efee0 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c @@ -149,8 +149,7 @@ static void set_atomic_seg(const struct ib_send_wr *wr, aseg->cmp_data = 0; }
- roce_set_field(rc_sq_wqe->byte_16, V2_RC_SEND_WQE_BYTE_16_SGE_NUM_M, - V2_RC_SEND_WQE_BYTE_16_SGE_NUM_S, valid_num_sge); + hr_reg_write(rc_sq_wqe, RC_SEND_WQE_SGE_NUM, valid_num_sge); }
static int fill_ext_sge_inl_data(struct hns_roce_qp *qp, @@ -271,8 +270,7 @@ static int set_rc_inl(struct hns_roce_qp *qp, const struct ib_send_wr *wr, dseg += sizeof(struct hns_roce_v2_rc_send_wqe);
if (msg_len <= HNS_ROCE_V2_MAX_RC_INL_INN_SZ) { - roce_set_bit(rc_sq_wqe->byte_20, - V2_RC_SEND_WQE_BYTE_20_INL_TYPE_S, 0); + hr_reg_clear(rc_sq_wqe, RC_SEND_WQE_INL_TYPE);
for (i = 0; i < wr->num_sge; i++) { memcpy(dseg, ((void *)wr->sg_list[i].addr), @@ -280,17 +278,13 @@ static int set_rc_inl(struct hns_roce_qp *qp, const struct ib_send_wr *wr, dseg += wr->sg_list[i].length; } } else { - roce_set_bit(rc_sq_wqe->byte_20, - V2_RC_SEND_WQE_BYTE_20_INL_TYPE_S, 1); + hr_reg_enable(rc_sq_wqe, RC_SEND_WQE_INL_TYPE);
ret = fill_ext_sge_inl_data(qp, wr, &curr_idx, msg_len); if (ret) return ret;
- roce_set_field(rc_sq_wqe->byte_16, - V2_RC_SEND_WQE_BYTE_16_SGE_NUM_M, - V2_RC_SEND_WQE_BYTE_16_SGE_NUM_S, - curr_idx - *sge_idx); + hr_reg_write(rc_sq_wqe, RC_SEND_WQE_SGE_NUM, curr_idx - *sge_idx); }
*sge_idx = curr_idx; @@ -309,12 +303,10 @@ static int set_rwqe_data_seg(struct ib_qp *ibqp, const struct ib_send_wr *wr, int j = 0; int i;
- roce_set_field(rc_sq_wqe->byte_20, - V2_RC_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_M, - V2_RC_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_S, - (*sge_ind) & (qp->sge.sge_cnt - 1)); + hr_reg_write(rc_sq_wqe, RC_SEND_WQE_MSG_START_SGE_IDX, + (*sge_ind) & (qp->sge.sge_cnt - 1));
- roce_set_bit(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_INLINE_S, + hr_reg_write(rc_sq_wqe, RC_SEND_WQE_INLINE, !!(wr->send_flags & IB_SEND_INLINE)); if (wr->send_flags & IB_SEND_INLINE) return set_rc_inl(qp, wr, rc_sq_wqe, sge_ind); @@ -339,9 +331,7 @@ static int set_rwqe_data_seg(struct ib_qp *ibqp, const struct ib_send_wr *wr, valid_num_sge - HNS_ROCE_SGE_IN_WQE); }
- roce_set_field(rc_sq_wqe->byte_16, - V2_RC_SEND_WQE_BYTE_16_SGE_NUM_M, - V2_RC_SEND_WQE_BYTE_16_SGE_NUM_S, valid_num_sge); + hr_reg_write(rc_sq_wqe, RC_SEND_WQE_SGE_NUM, valid_num_sge);
return 0; } @@ -412,8 +402,7 @@ static int set_ud_opcode(struct hns_roce_v2_ud_send_wqe *ud_sq_wqe,
ud_sq_wqe->immtdata = get_immtdata(wr);
- roce_set_field(ud_sq_wqe->byte_4, V2_UD_SEND_WQE_BYTE_4_OPCODE_M, - V2_UD_SEND_WQE_BYTE_4_OPCODE_S, to_hr_opcode(ib_op)); + hr_reg_write(ud_sq_wqe, UD_SEND_WQE_OPCODE, to_hr_opcode(ib_op));
return 0; } @@ -424,21 +413,15 @@ static int fill_ud_av(struct hns_roce_v2_ud_send_wqe *ud_sq_wqe, struct ib_device *ib_dev = ah->ibah.device; struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
- roce_set_field(ud_sq_wqe->byte_24, V2_UD_SEND_WQE_BYTE_24_UDPSPN_M, - V2_UD_SEND_WQE_BYTE_24_UDPSPN_S, ah->av.udp_sport); - - roce_set_field(ud_sq_wqe->byte_36, V2_UD_SEND_WQE_BYTE_36_HOPLIMIT_M, - V2_UD_SEND_WQE_BYTE_36_HOPLIMIT_S, ah->av.hop_limit); - roce_set_field(ud_sq_wqe->byte_36, V2_UD_SEND_WQE_BYTE_36_TCLASS_M, - V2_UD_SEND_WQE_BYTE_36_TCLASS_S, ah->av.tclass); - roce_set_field(ud_sq_wqe->byte_40, V2_UD_SEND_WQE_BYTE_40_FLOW_LABEL_M, - V2_UD_SEND_WQE_BYTE_40_FLOW_LABEL_S, ah->av.flowlabel); + hr_reg_write(ud_sq_wqe, UD_SEND_WQE_UDPSPN, ah->av.udp_sport); + hr_reg_write(ud_sq_wqe, UD_SEND_WQE_HOPLIMIT, ah->av.hop_limit); + hr_reg_write(ud_sq_wqe, UD_SEND_WQE_TCLASS, ah->av.tclass); + hr_reg_write(ud_sq_wqe, UD_SEND_WQE_FLOW_LABEL, ah->av.flowlabel);
if (WARN_ON(ah->av.sl > MAX_SERVICE_LEVEL)) return -EINVAL;
- roce_set_field(ud_sq_wqe->byte_40, V2_UD_SEND_WQE_BYTE_40_SL_M, - V2_UD_SEND_WQE_BYTE_40_SL_S, ah->av.sl); + hr_reg_write(ud_sq_wqe, UD_SEND_WQE_SL, ah->av.sl);
ud_sq_wqe->sgid_index = ah->av.gid_index;
@@ -448,10 +431,8 @@ static int fill_ud_av(struct hns_roce_v2_ud_send_wqe *ud_sq_wqe, if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP09) return 0;
- roce_set_bit(ud_sq_wqe->byte_40, V2_UD_SEND_WQE_BYTE_40_UD_VLAN_EN_S, - ah->av.vlan_en); - roce_set_field(ud_sq_wqe->byte_36, V2_UD_SEND_WQE_BYTE_36_VLAN_M, - V2_UD_SEND_WQE_BYTE_36_VLAN_S, ah->av.vlan_id); + hr_reg_write(ud_sq_wqe, UD_SEND_WQE_VLAN_EN, ah->av.vlan_en); + hr_reg_write(ud_sq_wqe, UD_SEND_WQE_VLAN, ah->av.vlan_id);
return 0; } @@ -476,27 +457,19 @@ static inline int set_ud_wqe(struct hns_roce_qp *qp,
ud_sq_wqe->msg_len = cpu_to_le32(msg_len);
- roce_set_bit(ud_sq_wqe->byte_4, V2_UD_SEND_WQE_BYTE_4_CQE_S, + hr_reg_write(ud_sq_wqe, UD_SEND_WQE_CQE, !!(wr->send_flags & IB_SEND_SIGNALED)); - - roce_set_bit(ud_sq_wqe->byte_4, V2_UD_SEND_WQE_BYTE_4_SE_S, + hr_reg_write(ud_sq_wqe, UD_SEND_WQE_SE, !!(wr->send_flags & IB_SEND_SOLICITED));
- roce_set_field(ud_sq_wqe->byte_16, V2_UD_SEND_WQE_BYTE_16_PD_M, - V2_UD_SEND_WQE_BYTE_16_PD_S, to_hr_pd(qp->ibqp.pd)->pdn); - - roce_set_field(ud_sq_wqe->byte_16, V2_UD_SEND_WQE_BYTE_16_SGE_NUM_M, - V2_UD_SEND_WQE_BYTE_16_SGE_NUM_S, valid_num_sge); - - roce_set_field(ud_sq_wqe->byte_20, - V2_UD_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_M, - V2_UD_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_S, - curr_idx & (qp->sge.sge_cnt - 1)); + hr_reg_write(ud_sq_wqe, UD_SEND_WQE_PD, to_hr_pd(qp->ibqp.pd)->pdn); + hr_reg_write(ud_sq_wqe, UD_SEND_WQE_SGE_NUM, valid_num_sge); + hr_reg_write(ud_sq_wqe, UD_SEND_WQE_MSG_START_SGE_IDX, + curr_idx & (qp->sge.sge_cnt - 1));
ud_sq_wqe->qkey = cpu_to_le32(ud_wr(wr)->remote_qkey & 0x80000000 ? qp->qkey : ud_wr(wr)->remote_qkey); - roce_set_field(ud_sq_wqe->byte_32, V2_UD_SEND_WQE_BYTE_32_DQPN_M, - V2_UD_SEND_WQE_BYTE_32_DQPN_S, ud_wr(wr)->remote_qpn); + hr_reg_write(ud_sq_wqe, UD_SEND_WQE_DQPN, ud_wr(wr)->remote_qpn);
ret = fill_ud_av(ud_sq_wqe, ah); if (ret) @@ -516,8 +489,7 @@ static inline int set_ud_wqe(struct hns_roce_qp *qp, dma_wmb();
*sge_idx = curr_idx; - roce_set_bit(ud_sq_wqe->byte_4, V2_UD_SEND_WQE_BYTE_4_OWNER_S, - owner_bit); + hr_reg_write(ud_sq_wqe, UD_SEND_WQE_OWNER, owner_bit);
return 0; } @@ -553,7 +525,7 @@ static int set_rc_opcode(struct hns_roce_dev *hr_dev, ret = -EOPNOTSUPP; break; case IB_WR_LOCAL_INV: - roce_set_bit(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_SO_S, 1); + hr_reg_enable(rc_sq_wqe, RC_SEND_WQE_SO); fallthrough; case IB_WR_SEND_WITH_INV: rc_sq_wqe->inv_key = cpu_to_le32(wr->ex.invalidate_rkey); @@ -565,11 +537,11 @@ static int set_rc_opcode(struct hns_roce_dev *hr_dev, if (unlikely(ret)) return ret;
- roce_set_field(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_OPCODE_M, - V2_RC_SEND_WQE_BYTE_4_OPCODE_S, to_hr_opcode(ib_op)); + hr_reg_write(rc_sq_wqe, RC_SEND_WQE_OPCODE, to_hr_opcode(ib_op));
return ret; } + static inline int set_rc_wqe(struct hns_roce_qp *qp, const struct ib_send_wr *wr, void *wqe, unsigned int *sge_idx, @@ -590,13 +562,13 @@ static inline int set_rc_wqe(struct hns_roce_qp *qp, if (WARN_ON(ret)) return ret;
- roce_set_bit(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_FENCE_S, + hr_reg_write(rc_sq_wqe, RC_SEND_WQE_FENCE, (wr->send_flags & IB_SEND_FENCE) ? 1 : 0);
- roce_set_bit(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_SE_S, + hr_reg_write(rc_sq_wqe, RC_SEND_WQE_SE, (wr->send_flags & IB_SEND_SOLICITED) ? 1 : 0);
- roce_set_bit(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_CQE_S, + hr_reg_write(rc_sq_wqe, RC_SEND_WQE_CQE, (wr->send_flags & IB_SEND_SIGNALED) ? 1 : 0);
if (wr->opcode == IB_WR_ATOMIC_CMP_AND_SWP || @@ -616,8 +588,7 @@ static inline int set_rc_wqe(struct hns_roce_qp *qp, dma_wmb();
*sge_idx = curr_idx; - roce_set_bit(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_OWNER_S, - owner_bit); + hr_reg_write(rc_sq_wqe, RC_SEND_WQE_OWNER, owner_bit);
return ret; } @@ -682,14 +653,11 @@ static void write_dwqe(struct hns_roce_dev *hr_dev, struct hns_roce_qp *qp, struct hns_roce_v2_rc_send_wqe *rc_sq_wqe = wqe;
/* All kinds of DirectWQE have the same header field layout */ - roce_set_bit(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_FLAG_S, 1); - roce_set_field(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_DB_SL_L_M, - V2_RC_SEND_WQE_BYTE_4_DB_SL_L_S, qp->sl); - roce_set_field(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_DB_SL_H_M, - V2_RC_SEND_WQE_BYTE_4_DB_SL_H_S, - qp->sl >> HNS_ROCE_SL_SHIFT); - roce_set_field(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_WQE_INDEX_M, - V2_RC_SEND_WQE_BYTE_4_WQE_INDEX_S, qp->sq.head); + hr_reg_enable(rc_sq_wqe, RC_SEND_WQE_FLAG); + hr_reg_write(rc_sq_wqe, RC_SEND_WQE_DB_SL_L, qp->sl); + hr_reg_write(rc_sq_wqe, RC_SEND_WQE_DB_SL_H, + qp->sl >> HNS_ROCE_SL_SHIFT); + hr_reg_write(rc_sq_wqe, RC_SEND_WQE_WQE_INDEX, qp->sq.head);
hns_roce_write512(hr_dev, wqe, qp->sq.db_reg); } @@ -1785,17 +1753,16 @@ static int __hns_roce_set_vf_switch_param(struct hns_roce_dev *hr_dev, swt = (struct hns_roce_vf_switch *)desc.data; hns_roce_cmq_setup_basic_desc(&desc, HNS_SWITCH_PARAMETER_CFG, true); swt->rocee_sel |= cpu_to_le32(HNS_ICL_SWITCH_CMD_ROCEE_SEL); - roce_set_field(swt->fun_id, VF_SWITCH_DATA_FUN_ID_VF_ID_M, - VF_SWITCH_DATA_FUN_ID_VF_ID_S, vf_id); + hr_reg_write(swt, VF_SWITCH_VF_ID, vf_id); ret = hns_roce_cmq_send(hr_dev, &desc, 1); if (ret) return ret;
desc.flag = cpu_to_le16(HNS_ROCE_CMD_FLAG_IN); desc.flag &= cpu_to_le16(~HNS_ROCE_CMD_FLAG_WR); - roce_set_bit(swt->cfg, VF_SWITCH_DATA_CFG_ALW_LPBK_S, 1); - roce_set_bit(swt->cfg, VF_SWITCH_DATA_CFG_ALW_LCL_LPBK_S, 0); - roce_set_bit(swt->cfg, VF_SWITCH_DATA_CFG_ALW_DST_OVRD_S, 1); + hr_reg_enable(swt, VF_SWITCH_ALW_LPBK); + hr_reg_clear(swt, VF_SWITCH_ALW_LCL_LPBK); + hr_reg_enable(swt, VF_SWITCH_ALW_DST_OVRD);
return hns_roce_cmq_send(hr_dev, &desc, 1); } @@ -2944,10 +2911,8 @@ static int config_sgid_table(struct hns_roce_dev *hr_dev,
hns_roce_cmq_setup_basic_desc(&desc, HNS_ROCE_OPC_CFG_SGID_TB, false);
- roce_set_field(sgid_tb->table_idx_rsv, CFG_SGID_TB_TABLE_IDX_M, - CFG_SGID_TB_TABLE_IDX_S, gid_index); - roce_set_field(sgid_tb->vf_sgid_type_rsv, CFG_SGID_TB_VF_SGID_TYPE_M, - CFG_SGID_TB_VF_SGID_TYPE_S, sgid_type); + hr_reg_write(sgid_tb, CFG_SGID_TB_TABLE_IDX, gid_index); + hr_reg_write(sgid_tb, CFG_SGID_TB_VF_SGID_TYPE, sgid_type);
copy_gid(&sgid_tb->vf_sgid_l, gid);
@@ -2982,19 +2947,14 @@ static int config_gmv_table(struct hns_roce_dev *hr_dev,
copy_gid(&tb_a->vf_sgid_l, gid);
- roce_set_field(tb_a->vf_sgid_type_vlan, CFG_GMV_TB_VF_SGID_TYPE_M, - CFG_GMV_TB_VF_SGID_TYPE_S, sgid_type); - roce_set_bit(tb_a->vf_sgid_type_vlan, CFG_GMV_TB_VF_VLAN_EN_S, - vlan_id < VLAN_CFI_MASK); - roce_set_field(tb_a->vf_sgid_type_vlan, CFG_GMV_TB_VF_VLAN_ID_M, - CFG_GMV_TB_VF_VLAN_ID_S, vlan_id); + hr_reg_write(tb_a, GMV_TB_A_VF_SGID_TYPE, sgid_type); + hr_reg_write(tb_a, GMV_TB_A_VF_VLAN_EN, vlan_id < VLAN_CFI_MASK); + hr_reg_write(tb_a, GMV_TB_A_VF_VLAN_ID, vlan_id);
tb_b->vf_smac_l = cpu_to_le32(*(u32 *)mac); - roce_set_field(tb_b->vf_smac_h, CFG_GMV_TB_SMAC_H_M, - CFG_GMV_TB_SMAC_H_S, *(u16 *)&mac[4]);
- roce_set_field(tb_b->table_idx_rsv, CFG_GMV_TB_SGID_IDX_M, - CFG_GMV_TB_SGID_IDX_S, gid_index); + hr_reg_write(tb_b, GMV_TB_B_SMAC_H, *(u16 *)&mac[4]); + hr_reg_write(tb_b, GMV_TB_B_SGID_IDX, gid_index);
return hns_roce_cmq_send(hr_dev, desc, 2); } @@ -3043,10 +3003,8 @@ static int hns_roce_v2_set_mac(struct hns_roce_dev *hr_dev, u8 phy_port, reg_smac_l = *(u32 *)(&addr[0]); reg_smac_h = *(u16 *)(&addr[4]);
- roce_set_field(smac_tb->tb_idx_rsv, CFG_SMAC_TB_IDX_M, - CFG_SMAC_TB_IDX_S, phy_port); - roce_set_field(smac_tb->vf_smac_h_rsv, CFG_SMAC_TB_VF_SMAC_H_M, - CFG_SMAC_TB_VF_SMAC_H_S, reg_smac_h); + hr_reg_write(smac_tb, CFG_SMAC_TB_IDX, phy_port); + hr_reg_write(smac_tb, CFG_SMAC_TB_VF_SMAC_H, reg_smac_h); smac_tb->vf_smac_l = cpu_to_le32(reg_smac_l);
return hns_roce_cmq_send(hr_dev, &desc, 1); @@ -3075,21 +3033,15 @@ static int set_mtpt_pbl(struct hns_roce_dev *hr_dev,
mpt_entry->pbl_size = cpu_to_le32(mr->npages); mpt_entry->pbl_ba_l = cpu_to_le32(pbl_ba >> 3); - roce_set_field(mpt_entry->byte_48_mode_ba, - V2_MPT_BYTE_48_PBL_BA_H_M, V2_MPT_BYTE_48_PBL_BA_H_S, - upper_32_bits(pbl_ba >> 3)); + hr_reg_write(mpt_entry, MPT_PBL_BA_H, upper_32_bits(pbl_ba >> 3));
mpt_entry->pa0_l = cpu_to_le32(lower_32_bits(pages[0])); - roce_set_field(mpt_entry->byte_56_pa0_h, V2_MPT_BYTE_56_PA0_H_M, - V2_MPT_BYTE_56_PA0_H_S, upper_32_bits(pages[0])); + hr_reg_write(mpt_entry, MPT_PA0_H, upper_32_bits(pages[0]));
mpt_entry->pa1_l = cpu_to_le32(lower_32_bits(pages[1])); - roce_set_field(mpt_entry->byte_64_buf_pa1, V2_MPT_BYTE_64_PA1_H_M, - V2_MPT_BYTE_64_PA1_H_S, upper_32_bits(pages[1])); - roce_set_field(mpt_entry->byte_64_buf_pa1, - V2_MPT_BYTE_64_PBL_BUF_PG_SZ_M, - V2_MPT_BYTE_64_PBL_BUF_PG_SZ_S, - to_hr_hw_page_shift(mr->pbl_mtr.hem_cfg.buf_pg_shift)); + hr_reg_write(mpt_entry, MPT_PA1_H, upper_32_bits(pages[1])); + hr_reg_write(mpt_entry, MPT_PBL_BUF_PG_SZ, + to_hr_hw_page_shift(mr->pbl_mtr.hem_cfg.buf_pg_shift));
return 0; } @@ -3151,24 +3103,19 @@ static int hns_roce_v2_rereg_write_mtpt(struct hns_roce_dev *hr_dev, u32 mr_access_flags = mr->access; int ret = 0;
- roce_set_field(mpt_entry->byte_4_pd_hop_st, V2_MPT_BYTE_4_MPT_ST_M, - V2_MPT_BYTE_4_MPT_ST_S, V2_MPT_ST_VALID); - - roce_set_field(mpt_entry->byte_4_pd_hop_st, V2_MPT_BYTE_4_PD_M, - V2_MPT_BYTE_4_PD_S, mr->pd); + hr_reg_write(mpt_entry, MPT_ST, V2_MPT_ST_VALID); + hr_reg_write(mpt_entry, MPT_PD, mr->pd);
if (flags & IB_MR_REREG_ACCESS) { - roce_set_bit(mpt_entry->byte_8_mw_cnt_en, - V2_MPT_BYTE_8_BIND_EN_S, + hr_reg_write(mpt_entry, MPT_BIND_EN, (mr_access_flags & IB_ACCESS_MW_BIND ? 1 : 0)); - roce_set_bit(mpt_entry->byte_8_mw_cnt_en, - V2_MPT_BYTE_8_ATOMIC_EN_S, + hr_reg_write(mpt_entry, MPT_ATOMIC_EN, mr_access_flags & IB_ACCESS_REMOTE_ATOMIC ? 1 : 0); - roce_set_bit(mpt_entry->byte_8_mw_cnt_en, V2_MPT_BYTE_8_RR_EN_S, + hr_reg_write(mpt_entry, MPT_RR_EN, mr_access_flags & IB_ACCESS_REMOTE_READ ? 1 : 0); - roce_set_bit(mpt_entry->byte_8_mw_cnt_en, V2_MPT_BYTE_8_RW_EN_S, + hr_reg_write(mpt_entry, MPT_RW_EN, mr_access_flags & IB_ACCESS_REMOTE_WRITE ? 1 : 0); - roce_set_bit(mpt_entry->byte_8_mw_cnt_en, V2_MPT_BYTE_8_LW_EN_S, + hr_reg_write(mpt_entry, MPT_LW_EN, mr_access_flags & IB_ACCESS_LOCAL_WRITE ? 1 : 0); }
@@ -3199,37 +3146,28 @@ static int hns_roce_v2_frmr_write_mtpt(struct hns_roce_dev *hr_dev, return -ENOBUFS; }
- roce_set_field(mpt_entry->byte_4_pd_hop_st, V2_MPT_BYTE_4_MPT_ST_M, - V2_MPT_BYTE_4_MPT_ST_S, V2_MPT_ST_FREE); - roce_set_field(mpt_entry->byte_4_pd_hop_st, V2_MPT_BYTE_4_PBL_HOP_NUM_M, - V2_MPT_BYTE_4_PBL_HOP_NUM_S, 1); - roce_set_field(mpt_entry->byte_4_pd_hop_st, - V2_MPT_BYTE_4_PBL_BA_PG_SZ_M, - V2_MPT_BYTE_4_PBL_BA_PG_SZ_S, - to_hr_hw_page_shift(mr->pbl_mtr.hem_cfg.ba_pg_shift)); - roce_set_field(mpt_entry->byte_4_pd_hop_st, V2_MPT_BYTE_4_PD_M, - V2_MPT_BYTE_4_PD_S, mr->pd); + hr_reg_write(mpt_entry, MPT_ST, V2_MPT_ST_FREE); + hr_reg_write(mpt_entry, MPT_PD, mr->pd); + + hr_reg_enable(mpt_entry, MPT_RA_EN); + hr_reg_enable(mpt_entry, MPT_R_INV_EN); + hr_reg_enable(mpt_entry, MPT_L_INV_EN);
- roce_set_bit(mpt_entry->byte_8_mw_cnt_en, V2_MPT_BYTE_8_RA_EN_S, 1); - roce_set_bit(mpt_entry->byte_8_mw_cnt_en, V2_MPT_BYTE_8_R_INV_EN_S, 1); - roce_set_bit(mpt_entry->byte_8_mw_cnt_en, V2_MPT_BYTE_8_L_INV_EN_S, 1); + hr_reg_enable(mpt_entry, MPT_FRE); + hr_reg_clear(mpt_entry, MPT_MR_MW); + hr_reg_enable(mpt_entry, MPT_BPD); + hr_reg_clear(mpt_entry, MPT_PA);
- roce_set_bit(mpt_entry->byte_12_mw_pa, V2_MPT_BYTE_12_FRE_S, 1); - roce_set_bit(mpt_entry->byte_12_mw_pa, V2_MPT_BYTE_12_PA_S, 0); - roce_set_bit(mpt_entry->byte_12_mw_pa, V2_MPT_BYTE_12_MR_MW_S, 0); - roce_set_bit(mpt_entry->byte_12_mw_pa, V2_MPT_BYTE_12_BPD_S, 1); + hr_reg_write(mpt_entry, MPT_PBL_HOP_NUM, 1); + hr_reg_write(mpt_entry, MPT_PBL_BA_PG_SZ, + to_hr_hw_page_shift(mr->pbl_mtr.hem_cfg.ba_pg_shift)); + hr_reg_write(mpt_entry, MPT_PBL_BUF_PG_SZ, + to_hr_hw_page_shift(mr->pbl_mtr.hem_cfg.buf_pg_shift));
mpt_entry->pbl_size = cpu_to_le32(mr->npages);
mpt_entry->pbl_ba_l = cpu_to_le32(lower_32_bits(pbl_ba >> 3)); - roce_set_field(mpt_entry->byte_48_mode_ba, V2_MPT_BYTE_48_PBL_BA_H_M, - V2_MPT_BYTE_48_PBL_BA_H_S, - upper_32_bits(pbl_ba >> 3)); - - roce_set_field(mpt_entry->byte_64_buf_pa1, - V2_MPT_BYTE_64_PBL_BUF_PG_SZ_M, - V2_MPT_BYTE_64_PBL_BUF_PG_SZ_S, - to_hr_hw_page_shift(mr->pbl_mtr.hem_cfg.buf_pg_shift)); + hr_reg_write(mpt_entry, MPT_PBL_BA_H, upper_32_bits(pbl_ba >> 3));
return 0; } @@ -3241,36 +3179,29 @@ static int hns_roce_v2_mw_write_mtpt(void *mb_buf, struct hns_roce_mw *mw) mpt_entry = mb_buf; memset(mpt_entry, 0, sizeof(*mpt_entry));
- roce_set_field(mpt_entry->byte_4_pd_hop_st, V2_MPT_BYTE_4_MPT_ST_M, - V2_MPT_BYTE_4_MPT_ST_S, V2_MPT_ST_FREE); - roce_set_field(mpt_entry->byte_4_pd_hop_st, V2_MPT_BYTE_4_PD_M, - V2_MPT_BYTE_4_PD_S, mw->pdn); - roce_set_field(mpt_entry->byte_4_pd_hop_st, V2_MPT_BYTE_4_PBL_HOP_NUM_M, - V2_MPT_BYTE_4_PBL_HOP_NUM_S, - mw->pbl_hop_num == HNS_ROCE_HOP_NUM_0 ? 0 : - mw->pbl_hop_num); - roce_set_field(mpt_entry->byte_4_pd_hop_st, - V2_MPT_BYTE_4_PBL_BA_PG_SZ_M, - V2_MPT_BYTE_4_PBL_BA_PG_SZ_S, - mw->pbl_ba_pg_sz + PG_SHIFT_OFFSET); - - roce_set_bit(mpt_entry->byte_8_mw_cnt_en, V2_MPT_BYTE_8_R_INV_EN_S, 1); - roce_set_bit(mpt_entry->byte_8_mw_cnt_en, V2_MPT_BYTE_8_L_INV_EN_S, 1); - roce_set_bit(mpt_entry->byte_8_mw_cnt_en, V2_MPT_BYTE_8_LW_EN_S, 1); - - roce_set_bit(mpt_entry->byte_12_mw_pa, V2_MPT_BYTE_12_PA_S, 0); - roce_set_bit(mpt_entry->byte_12_mw_pa, V2_MPT_BYTE_12_MR_MW_S, 1); - roce_set_bit(mpt_entry->byte_12_mw_pa, V2_MPT_BYTE_12_BPD_S, 1); - roce_set_bit(mpt_entry->byte_12_mw_pa, V2_MPT_BYTE_12_BQP_S, - mw->ibmw.type == IB_MW_TYPE_1 ? 0 : 1); + hr_reg_write(mpt_entry, MPT_ST, V2_MPT_ST_FREE); + hr_reg_write(mpt_entry, MPT_PD, mw->pdn);
- roce_set_field(mpt_entry->byte_64_buf_pa1, - V2_MPT_BYTE_64_PBL_BUF_PG_SZ_M, - V2_MPT_BYTE_64_PBL_BUF_PG_SZ_S, - mw->pbl_buf_pg_sz + PG_SHIFT_OFFSET); + hr_reg_enable(mpt_entry, MPT_R_INV_EN); + hr_reg_enable(mpt_entry, MPT_L_INV_EN); + hr_reg_enable(mpt_entry, MPT_LW_EN); + + hr_reg_enable(mpt_entry, MPT_MR_MW); + hr_reg_enable(mpt_entry, MPT_BPD); + hr_reg_clear(mpt_entry, MPT_PA); + hr_reg_write(mpt_entry, MPT_BQP, + mw->ibmw.type == IB_MW_TYPE_1 ? 0 : 1);
mpt_entry->lkey = cpu_to_le32(mw->rkey);
+ hr_reg_write(mpt_entry, MPT_PBL_HOP_NUM, + mw->pbl_hop_num == HNS_ROCE_HOP_NUM_0 ? 0 : + mw->pbl_hop_num); + hr_reg_write(mpt_entry, MPT_PBL_BA_PG_SZ, + mw->pbl_ba_pg_sz + PG_SHIFT_OFFSET); + hr_reg_write(mpt_entry, MPT_PBL_BUF_PG_SZ, + mw->pbl_buf_pg_sz + PG_SHIFT_OFFSET); + return 0; }
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h index d3d5b5f57052..ddb99a7ff135 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h @@ -790,12 +790,15 @@ struct hns_roce_v2_mpt_entry { #define MPT_LKEY MPT_FIELD_LOC(223, 192) #define MPT_VA MPT_FIELD_LOC(287, 224) #define MPT_PBL_SIZE MPT_FIELD_LOC(319, 288) -#define MPT_PBL_BA MPT_FIELD_LOC(380, 320) +#define MPT_PBL_BA_L MPT_FIELD_LOC(351, 320) +#define MPT_PBL_BA_H MPT_FIELD_LOC(380, 352) #define MPT_BLK_MODE MPT_FIELD_LOC(381, 381) #define MPT_RSV0 MPT_FIELD_LOC(383, 382) -#define MPT_PA0 MPT_FIELD_LOC(441, 384) +#define MPT_PA0_L MPT_FIELD_LOC(415, 384) +#define MPT_PA0_H MPT_FIELD_LOC(441, 416) #define MPT_BOUND_VA MPT_FIELD_LOC(447, 442) -#define MPT_PA1 MPT_FIELD_LOC(505, 448) +#define MPT_PA1_L MPT_FIELD_LOC(479, 448) +#define MPT_PA1_H MPT_FIELD_LOC(505, 480) #define MPT_PERSIST_EN MPT_FIELD_LOC(506, 506) #define MPT_RSV2 MPT_FIELD_LOC(507, 507) #define MPT_PBL_BUF_PG_SZ MPT_FIELD_LOC(511, 508) @@ -901,48 +904,24 @@ struct hns_roce_v2_ud_send_wqe { u8 dgid[GID_LEN_V2]; };
-#define V2_UD_SEND_WQE_BYTE_4_OPCODE_S 0 -#define V2_UD_SEND_WQE_BYTE_4_OPCODE_M GENMASK(4, 0) - -#define V2_UD_SEND_WQE_BYTE_4_OWNER_S 7 - -#define V2_UD_SEND_WQE_BYTE_4_CQE_S 8 - -#define V2_UD_SEND_WQE_BYTE_4_SE_S 11 - -#define V2_UD_SEND_WQE_BYTE_16_PD_S 0 -#define V2_UD_SEND_WQE_BYTE_16_PD_M GENMASK(23, 0) - -#define V2_UD_SEND_WQE_BYTE_16_SGE_NUM_S 24 -#define V2_UD_SEND_WQE_BYTE_16_SGE_NUM_M GENMASK(31, 24) - -#define V2_UD_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_S 0 -#define V2_UD_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_M GENMASK(23, 0) - -#define V2_UD_SEND_WQE_BYTE_24_UDPSPN_S 16 -#define V2_UD_SEND_WQE_BYTE_24_UDPSPN_M GENMASK(31, 16) - -#define V2_UD_SEND_WQE_BYTE_32_DQPN_S 0 -#define V2_UD_SEND_WQE_BYTE_32_DQPN_M GENMASK(23, 0) - -#define V2_UD_SEND_WQE_BYTE_36_VLAN_S 0 -#define V2_UD_SEND_WQE_BYTE_36_VLAN_M GENMASK(15, 0) - -#define V2_UD_SEND_WQE_BYTE_36_HOPLIMIT_S 16 -#define V2_UD_SEND_WQE_BYTE_36_HOPLIMIT_M GENMASK(23, 16) - -#define V2_UD_SEND_WQE_BYTE_36_TCLASS_S 24 -#define V2_UD_SEND_WQE_BYTE_36_TCLASS_M GENMASK(31, 24) - -#define V2_UD_SEND_WQE_BYTE_40_FLOW_LABEL_S 0 -#define V2_UD_SEND_WQE_BYTE_40_FLOW_LABEL_M GENMASK(19, 0) - -#define V2_UD_SEND_WQE_BYTE_40_SL_S 20 -#define V2_UD_SEND_WQE_BYTE_40_SL_M GENMASK(23, 20) - -#define V2_UD_SEND_WQE_BYTE_40_UD_VLAN_EN_S 30 - -#define V2_UD_SEND_WQE_BYTE_40_LBI_S 31 +#define UD_SEND_WQE_FIELD_LOC(h, l) FIELD_LOC(struct hns_roce_v2_ud_send_wqe, h, l) + +#define UD_SEND_WQE_OPCODE UD_SEND_WQE_FIELD_LOC(4, 0) +#define UD_SEND_WQE_OWNER UD_SEND_WQE_FIELD_LOC(7, 7) +#define UD_SEND_WQE_CQE UD_SEND_WQE_FIELD_LOC(8, 8) +#define UD_SEND_WQE_SE UD_SEND_WQE_FIELD_LOC(11, 11) +#define UD_SEND_WQE_PD UD_SEND_WQE_FIELD_LOC(119, 96) +#define UD_SEND_WQE_SGE_NUM UD_SEND_WQE_FIELD_LOC(127, 120) +#define UD_SEND_WQE_MSG_START_SGE_IDX UD_SEND_WQE_FIELD_LOC(151, 128) +#define UD_SEND_WQE_UDPSPN UD_SEND_WQE_FIELD_LOC(191, 176) +#define UD_SEND_WQE_DQPN UD_SEND_WQE_FIELD_LOC(247, 224) +#define UD_SEND_WQE_VLAN UD_SEND_WQE_FIELD_LOC(271, 256) +#define UD_SEND_WQE_HOPLIMIT UD_SEND_WQE_FIELD_LOC(279, 272) +#define UD_SEND_WQE_TCLASS UD_SEND_WQE_FIELD_LOC(287, 280) +#define UD_SEND_WQE_FLOW_LABEL UD_SEND_WQE_FIELD_LOC(307, 288) +#define UD_SEND_WQE_SL UD_SEND_WQE_FIELD_LOC(311, 308) +#define UD_SEND_WQE_VLAN_EN UD_SEND_WQE_FIELD_LOC(318, 318) +#define UD_SEND_WQE_LBI UD_SEND_WQE_FIELD_LOC(319, 319)
struct hns_roce_v2_rc_send_wqe { __le32 byte_4; @@ -957,42 +936,23 @@ struct hns_roce_v2_rc_send_wqe { __le64 va; };
-#define V2_RC_SEND_WQE_BYTE_4_OPCODE_S 0 -#define V2_RC_SEND_WQE_BYTE_4_OPCODE_M GENMASK(4, 0) - -#define V2_RC_SEND_WQE_BYTE_4_DB_SL_L_S 5 -#define V2_RC_SEND_WQE_BYTE_4_DB_SL_L_M GENMASK(6, 5) - -#define V2_RC_SEND_WQE_BYTE_4_DB_SL_H_S 13 -#define V2_RC_SEND_WQE_BYTE_4_DB_SL_H_M GENMASK(14, 13) - -#define V2_RC_SEND_WQE_BYTE_4_WQE_INDEX_S 15 -#define V2_RC_SEND_WQE_BYTE_4_WQE_INDEX_M GENMASK(30, 15) - -#define V2_RC_SEND_WQE_BYTE_4_OWNER_S 7 - -#define V2_RC_SEND_WQE_BYTE_4_CQE_S 8 - -#define V2_RC_SEND_WQE_BYTE_4_FENCE_S 9 - -#define V2_RC_SEND_WQE_BYTE_4_SO_S 10 - -#define V2_RC_SEND_WQE_BYTE_4_SE_S 11 - -#define V2_RC_SEND_WQE_BYTE_4_INLINE_S 12 - -#define V2_RC_SEND_WQE_BYTE_4_FLAG_S 31 - -#define V2_RC_SEND_WQE_BYTE_16_XRC_SRQN_S 0 -#define V2_RC_SEND_WQE_BYTE_16_XRC_SRQN_M GENMASK(23, 0) - -#define V2_RC_SEND_WQE_BYTE_16_SGE_NUM_S 24 -#define V2_RC_SEND_WQE_BYTE_16_SGE_NUM_M GENMASK(31, 24) - -#define V2_RC_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_S 0 -#define V2_RC_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_M GENMASK(23, 0) - -#define V2_RC_SEND_WQE_BYTE_20_INL_TYPE_S 31 +#define RC_SEND_WQE_FIELD_LOC(h, l) FIELD_LOC(struct hns_roce_v2_rc_send_wqe, h, l) + +#define RC_SEND_WQE_OPCODE RC_SEND_WQE_FIELD_LOC(4, 0) +#define RC_SEND_WQE_DB_SL_L RC_SEND_WQE_FIELD_LOC(6, 5) +#define RC_SEND_WQE_DB_SL_H RC_SEND_WQE_FIELD_LOC(14, 13) +#define RC_SEND_WQE_OWNER RC_SEND_WQE_FIELD_LOC(7, 7) +#define RC_SEND_WQE_CQE RC_SEND_WQE_FIELD_LOC(8, 8) +#define RC_SEND_WQE_FENCE RC_SEND_WQE_FIELD_LOC(9, 9) +#define RC_SEND_WQE_SO RC_SEND_WQE_FIELD_LOC(10, 10) +#define RC_SEND_WQE_SE RC_SEND_WQE_FIELD_LOC(11, 11) +#define RC_SEND_WQE_INLINE RC_SEND_WQE_FIELD_LOC(12, 12) +#define RC_SEND_WQE_WQE_INDEX RC_SEND_WQE_FIELD_LOC(30, 15) +#define RC_SEND_WQE_FLAG RC_SEND_WQE_FIELD_LOC(31, 31) +#define RC_SEND_WQE_XRC_SRQN RC_SEND_WQE_FIELD_LOC(119, 96) +#define RC_SEND_WQE_SGE_NUM RC_SEND_WQE_FIELD_LOC(127, 120) +#define RC_SEND_WQE_MSG_START_SGE_IDX RC_SEND_WQE_FIELD_LOC(151, 128) +#define RC_SEND_WQE_INL_TYPE RC_SEND_WQE_FIELD_LOC(159, 159)
struct hns_roce_wqe_frmr_seg { __le32 pbl_size; @@ -1114,12 +1074,12 @@ struct hns_roce_vf_switch { __le32 resv3; };
-#define VF_SWITCH_DATA_FUN_ID_VF_ID_S 3 -#define VF_SWITCH_DATA_FUN_ID_VF_ID_M GENMASK(10, 3) +#define VF_SWITCH_FIELD_LOC(h, l) FIELD_LOC(struct hns_roce_vf_switch, h, l)
-#define VF_SWITCH_DATA_CFG_ALW_LPBK_S 1 -#define VF_SWITCH_DATA_CFG_ALW_LCL_LPBK_S 2 -#define VF_SWITCH_DATA_CFG_ALW_DST_OVRD_S 3 +#define VF_SWITCH_VF_ID VF_SWITCH_FIELD_LOC(42, 35) +#define VF_SWITCH_ALW_LPBK VF_SWITCH_FIELD_LOC(65, 65) +#define VF_SWITCH_ALW_LCL_LPBK VF_SWITCH_FIELD_LOC(66, 66) +#define VF_SWITCH_ALW_DST_OVRD VF_SWITCH_FIELD_LOC(67, 67)
struct hns_roce_post_mbox { __le32 in_param_l; @@ -1182,11 +1142,10 @@ struct hns_roce_cfg_sgid_tb { __le32 vf_sgid_type_rsv; };
-#define CFG_SGID_TB_TABLE_IDX_S 0 -#define CFG_SGID_TB_TABLE_IDX_M GENMASK(7, 0) +#define SGID_TB_FIELD_LOC(h, l) FIELD_LOC(struct hns_roce_cfg_sgid_tb, h, l)
-#define CFG_SGID_TB_VF_SGID_TYPE_S 0 -#define CFG_SGID_TB_VF_SGID_TYPE_M GENMASK(1, 0) +#define CFG_SGID_TB_TABLE_IDX SGID_TB_FIELD_LOC(7, 0) +#define CFG_SGID_TB_VF_SGID_TYPE SGID_TB_FIELD_LOC(161, 160)
struct hns_roce_cfg_smac_tb { __le32 tb_idx_rsv; @@ -1194,11 +1153,11 @@ struct hns_roce_cfg_smac_tb { __le32 vf_smac_h_rsv; __le32 rsv[3]; }; -#define CFG_SMAC_TB_IDX_S 0 -#define CFG_SMAC_TB_IDX_M GENMASK(7, 0)
-#define CFG_SMAC_TB_VF_SMAC_H_S 0 -#define CFG_SMAC_TB_VF_SMAC_H_M GENMASK(15, 0) +#define SMAC_TB_FIELD_LOC(h, l) FIELD_LOC(struct hns_roce_cfg_smac_tb, h, l) + +#define CFG_SMAC_TB_IDX SMAC_TB_FIELD_LOC(7, 0) +#define CFG_SMAC_TB_VF_SMAC_H SMAC_TB_FIELD_LOC(79, 64)
struct hns_roce_cfg_gmv_tb_a { __le32 vf_sgid_l; @@ -1209,16 +1168,11 @@ struct hns_roce_cfg_gmv_tb_a { __le32 resv; };
-#define CFG_GMV_TB_SGID_IDX_S 0 -#define CFG_GMV_TB_SGID_IDX_M GENMASK(7, 0) - -#define CFG_GMV_TB_VF_SGID_TYPE_S 0 -#define CFG_GMV_TB_VF_SGID_TYPE_M GENMASK(1, 0) +#define GMV_TB_A_FIELD_LOC(h, l) FIELD_LOC(struct hns_roce_cfg_gmv_tb_a, h, l)
-#define CFG_GMV_TB_VF_VLAN_EN_S 2 - -#define CFG_GMV_TB_VF_VLAN_ID_S 16 -#define CFG_GMV_TB_VF_VLAN_ID_M GENMASK(27, 16) +#define GMV_TB_A_VF_SGID_TYPE GMV_TB_A_FIELD_LOC(129, 128) +#define GMV_TB_A_VF_VLAN_EN GMV_TB_A_FIELD_LOC(130, 130) +#define GMV_TB_A_VF_VLAN_ID GMV_TB_A_FIELD_LOC(155, 144)
struct hns_roce_cfg_gmv_tb_b { __le32 vf_smac_l; @@ -1227,8 +1181,10 @@ struct hns_roce_cfg_gmv_tb_b { __le32 resv[3]; };
-#define CFG_GMV_TB_SMAC_H_S 0 -#define CFG_GMV_TB_SMAC_H_M GENMASK(15, 0) +#define GMV_TB_B_FIELD_LOC(h, l) FIELD_LOC(struct hns_roce_cfg_gmv_tb_b, h, l) + +#define GMV_TB_B_SMAC_H GMV_TB_B_FIELD_LOC(47, 32) +#define GMV_TB_B_SGID_IDX GMV_TB_B_FIELD_LOC(71, 64)
#define HNS_ROCE_QUERY_PF_CAPS_CMD_NUM 5 struct hns_roce_query_pf_caps_a {
From: Yangyang Li liyangyang20@huawei.com
[ Upstream commit 9e272ed69ad6f6952fafd0599d6993575512408e ]
When function reset and local invalidate are mixed, HNS RoCEE may hang. Before introducing the cause of the problem, two hardware internal concepts need to be introduced:
1. Execution queue: The queue of hardware execution instructions, function reset and local invalidate are queued for execution in this queue.
2.Local queue: A queue that stores local operation instructions. The instructions in the local queue will be sent to the execution queue for execution. The instructions in the local queue will not be removed until the execution is completed.
The reason for the problem is as follows:
1. There is a function reset instruction in the execution queue, which is currently being executed. A necessary condition for the successful execution of function reset is: the hardware pipeline needs to empty the instructions that were not completed before;
2. A local invalidate instruction at the head of the local queue is sent to the execution queue. Now there are two instructions in the execution queue, the first is the function reset instruction, and the second is the local invalidate instruction, which will be executed in se quence;
3. The user has issued many local invalidate operations, causing the local queue to be filled up.
4. The user still has a new local operation command and is queuing to enter the local queue. But the local queue is full and cannot receive new instructions, this instruction is temporarily stored at the hardware pipeline.
5. The function reset has been waiting for the instruction before the hardware pipeline stage is drained. The hardware pipeline stage also caches a local invalidate instruction, so the function reset cannot be completed, and the instructions after it cannot be executed.
These factors together cause the execution logic deadlock of the hardware, and the consequence is that RoCEE will not have any response. Considering that the local operation command may potentially cause RoCEE to hang, this feature is no longer supported.
Fixes: e93df0108579 ("RDMA/hns: Support local invalidate for hip08 in kernel space") Signed-off-by: Yangyang Li liyangyang20@huawei.com Signed-off-by: Wenpeng Liang liangwenpeng@huawei.com Signed-off-by: Haoyue Xu xuhaoyue1@hisilicon.com Link: https://lore.kernel.org/r/20221024083814.1089722-2-xuhaoyue1@hisilicon.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 11 ----------- drivers/infiniband/hw/hns/hns_roce_hw_v2.h | 2 -- 2 files changed, 13 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c index 3b26c74efee0..1421896abaf0 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c @@ -82,7 +82,6 @@ static const u32 hns_roce_op_code[] = { HR_OPC_MAP(ATOMIC_CMP_AND_SWP, ATOM_CMP_AND_SWAP), HR_OPC_MAP(ATOMIC_FETCH_AND_ADD, ATOM_FETCH_AND_ADD), HR_OPC_MAP(SEND_WITH_INV, SEND_WITH_INV), - HR_OPC_MAP(LOCAL_INV, LOCAL_INV), HR_OPC_MAP(MASKED_ATOMIC_CMP_AND_SWP, ATOM_MSK_CMP_AND_SWAP), HR_OPC_MAP(MASKED_ATOMIC_FETCH_AND_ADD, ATOM_MSK_FETCH_AND_ADD), HR_OPC_MAP(REG_MR, FAST_REG_PMR), @@ -524,9 +523,6 @@ static int set_rc_opcode(struct hns_roce_dev *hr_dev, else ret = -EOPNOTSUPP; break; - case IB_WR_LOCAL_INV: - hr_reg_enable(rc_sq_wqe, RC_SEND_WQE_SO); - fallthrough; case IB_WR_SEND_WITH_INV: rc_sq_wqe->inv_key = cpu_to_le32(wr->ex.invalidate_rkey); break; @@ -3058,7 +3054,6 @@ static int hns_roce_v2_write_mtpt(struct hns_roce_dev *hr_dev,
hr_reg_write(mpt_entry, MPT_ST, V2_MPT_ST_VALID); hr_reg_write(mpt_entry, MPT_PD, mr->pd); - hr_reg_enable(mpt_entry, MPT_L_INV_EN);
hr_reg_write_bool(mpt_entry, MPT_BIND_EN, mr->access & IB_ACCESS_MW_BIND); @@ -3151,7 +3146,6 @@ static int hns_roce_v2_frmr_write_mtpt(struct hns_roce_dev *hr_dev,
hr_reg_enable(mpt_entry, MPT_RA_EN); hr_reg_enable(mpt_entry, MPT_R_INV_EN); - hr_reg_enable(mpt_entry, MPT_L_INV_EN);
hr_reg_enable(mpt_entry, MPT_FRE); hr_reg_clear(mpt_entry, MPT_MR_MW); @@ -3183,7 +3177,6 @@ static int hns_roce_v2_mw_write_mtpt(void *mb_buf, struct hns_roce_mw *mw) hr_reg_write(mpt_entry, MPT_PD, mw->pdn);
hr_reg_enable(mpt_entry, MPT_R_INV_EN); - hr_reg_enable(mpt_entry, MPT_L_INV_EN); hr_reg_enable(mpt_entry, MPT_LW_EN);
hr_reg_enable(mpt_entry, MPT_MR_MW); @@ -3540,7 +3533,6 @@ static const u32 wc_send_op_map[] = { HR_WC_OP_MAP(RDMA_READ, RDMA_READ), HR_WC_OP_MAP(RDMA_WRITE, RDMA_WRITE), HR_WC_OP_MAP(RDMA_WRITE_WITH_IMM, RDMA_WRITE), - HR_WC_OP_MAP(LOCAL_INV, LOCAL_INV), HR_WC_OP_MAP(ATOM_CMP_AND_SWAP, COMP_SWAP), HR_WC_OP_MAP(ATOM_FETCH_AND_ADD, FETCH_ADD), HR_WC_OP_MAP(ATOM_MSK_CMP_AND_SWAP, MASKED_COMP_SWAP), @@ -3590,9 +3582,6 @@ static void fill_send_wc(struct ib_wc *wc, struct hns_roce_v2_cqe *cqe) case HNS_ROCE_V2_WQE_OP_RDMA_WRITE_WITH_IMM: wc->wc_flags |= IB_WC_WITH_IMM; break; - case HNS_ROCE_V2_WQE_OP_LOCAL_INV: - wc->wc_flags |= IB_WC_WITH_INVALIDATE; - break; case HNS_ROCE_V2_WQE_OP_ATOM_CMP_AND_SWAP: case HNS_ROCE_V2_WQE_OP_ATOM_FETCH_AND_ADD: case HNS_ROCE_V2_WQE_OP_ATOM_MSK_CMP_AND_SWAP: diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h index ddb99a7ff135..2f4a0019a716 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h @@ -184,7 +184,6 @@ enum { HNS_ROCE_V2_WQE_OP_ATOM_MSK_CMP_AND_SWAP = 0x8, HNS_ROCE_V2_WQE_OP_ATOM_MSK_FETCH_AND_ADD = 0x9, HNS_ROCE_V2_WQE_OP_FAST_REG_PMR = 0xa, - HNS_ROCE_V2_WQE_OP_LOCAL_INV = 0xb, HNS_ROCE_V2_WQE_OP_BIND_MW = 0xc, HNS_ROCE_V2_WQE_OP_MASK = 0x1f, }; @@ -944,7 +943,6 @@ struct hns_roce_v2_rc_send_wqe { #define RC_SEND_WQE_OWNER RC_SEND_WQE_FIELD_LOC(7, 7) #define RC_SEND_WQE_CQE RC_SEND_WQE_FIELD_LOC(8, 8) #define RC_SEND_WQE_FENCE RC_SEND_WQE_FIELD_LOC(9, 9) -#define RC_SEND_WQE_SO RC_SEND_WQE_FIELD_LOC(10, 10) #define RC_SEND_WQE_SE RC_SEND_WQE_FIELD_LOC(11, 11) #define RC_SEND_WQE_INLINE RC_SEND_WQE_FIELD_LOC(12, 12) #define RC_SEND_WQE_WQE_INDEX RC_SEND_WQE_FIELD_LOC(30, 15)
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 1ba04394e028ea8b45d92685cc0d6ab582cf7647 ]
If the server reboots while we are engaged in a delegation return, and there is a pNFS layout with return-on-close set, then the current code can end up deadlocking in pnfs_roc() when nfs_inode_set_delegation() tries to return the old delegation. Now that delegreturn actually uses its own copy of the stateid, it should be safe to just always update the delegation stateid in place.
Fixes: 078000d02d57 ("pNFS: We want return-on-close to complete when evicting the inode") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/delegation.c | 36 +++++++++++++++++------------------- 1 file changed, 17 insertions(+), 19 deletions(-)
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c index 7c9eb679dbdb..6a3ba306c321 100644 --- a/fs/nfs/delegation.c +++ b/fs/nfs/delegation.c @@ -228,8 +228,7 @@ static int nfs_delegation_claim_opens(struct inode *inode, * */ void nfs_inode_reclaim_delegation(struct inode *inode, const struct cred *cred, - fmode_t type, - const nfs4_stateid *stateid, + fmode_t type, const nfs4_stateid *stateid, unsigned long pagemod_limit) { struct nfs_delegation *delegation; @@ -239,25 +238,24 @@ void nfs_inode_reclaim_delegation(struct inode *inode, const struct cred *cred, delegation = rcu_dereference(NFS_I(inode)->delegation); if (delegation != NULL) { spin_lock(&delegation->lock); - if (nfs4_is_valid_delegation(delegation, 0)) { - nfs4_stateid_copy(&delegation->stateid, stateid); - delegation->type = type; - delegation->pagemod_limit = pagemod_limit; - oldcred = delegation->cred; - delegation->cred = get_cred(cred); - clear_bit(NFS_DELEGATION_NEED_RECLAIM, - &delegation->flags); - spin_unlock(&delegation->lock); - rcu_read_unlock(); - put_cred(oldcred); - trace_nfs4_reclaim_delegation(inode, type); - return; - } - /* We appear to have raced with a delegation return. */ + nfs4_stateid_copy(&delegation->stateid, stateid); + delegation->type = type; + delegation->pagemod_limit = pagemod_limit; + oldcred = delegation->cred; + delegation->cred = get_cred(cred); + clear_bit(NFS_DELEGATION_NEED_RECLAIM, &delegation->flags); + if (test_and_clear_bit(NFS_DELEGATION_REVOKED, + &delegation->flags)) + atomic_long_inc(&nfs_active_delegations); spin_unlock(&delegation->lock); + rcu_read_unlock(); + put_cred(oldcred); + trace_nfs4_reclaim_delegation(inode, type); + } else { + rcu_read_unlock(); + nfs_inode_set_delegation(inode, cred, type, stateid, + pagemod_limit); } - rcu_read_unlock(); - nfs_inode_set_delegation(inode, cred, type, stateid, pagemod_limit); }
static int nfs_do_return_delegation(struct inode *inode, struct nfs_delegation *delegation, int issync)
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 5d917cba3201e5c25059df96c29252fd99c4f6a7 ]
If RECLAIM_COMPLETE sets the NFS4CLNT_BIND_CONN_TO_SESSION flag, then we need to loop back in order to handle it.
Fixes: 0048fdd06614 ("NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4state.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 32ee79a99246..2795aa1c3cf4 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -2651,6 +2651,7 @@ static void nfs4_state_manager(struct nfs_client *clp) if (status < 0) goto out_error; nfs4_state_end_reclaim_reboot(clp); + continue; }
/* Detect expired delegations... */
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit e59679f2b7e522ecad99974e5636291ffd47c184 ]
Currently, we are only guaranteed to send RECLAIM_COMPLETE if we have open state to recover. Fix the client to always send RECLAIM_COMPLETE after setting up the lease.
Fixes: fce5c838e133 ("nfs41: RECLAIM_COMPLETE functionality") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4state.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 2795aa1c3cf4..ecac56be6cb7 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -1778,6 +1778,7 @@ static void nfs4_state_mark_reclaim_helper(struct nfs_client *clp,
static void nfs4_state_start_reclaim_reboot(struct nfs_client *clp) { + set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state); /* Mark all delegations for reclaim */ nfs_delegation_mark_reclaim(clp); nfs4_state_mark_reclaim_helper(clp, nfs4_state_mark_reclaim_reboot);
From: Zhang Xiaoxu zhangxiaoxu5@huawei.com
[ Upstream commit cbdeaee94a415800c65a8c3fa04d9664a8b8fb3a ]
There is a null-ptr-deref when xps sysfs alloc failed: BUG: KASAN: null-ptr-deref in sysfs_do_create_link_sd+0x40/0xd0 Read of size 8 at addr 0000000000000030 by task gssproxy/457
CPU: 5 PID: 457 Comm: gssproxy Not tainted 6.0.0-09040-g02357b27ee03 #9 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 kasan_report+0xa3/0x120 sysfs_do_create_link_sd+0x40/0xd0 rpc_sysfs_client_setup+0x161/0x1b0 rpc_new_client+0x3fc/0x6e0 rpc_create_xprt+0x71/0x220 rpc_create+0x1d4/0x350 gssp_rpc_create+0xc3/0x160 set_gssp_clnt+0xbc/0x140 write_gssp+0x116/0x1a0 proc_reg_write+0xd6/0x130 vfs_write+0x177/0x690 ksys_write+0xb9/0x150 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0
When the xprt_switch sysfs alloc failed, should not add xprt and switch sysfs to it, otherwise, maybe null-ptr-deref; also initialize the 'xps_sysfs' to NULL to avoid oops when destroy it.
Fixes: 2a338a543163 ("sunrpc: add a symlink from rpc-client directory to the xprt_switch") Fixes: d408ebe04ac5 ("sunrpc: add add sysfs directory per xprt under each xprt_switch") Fixes: baea99445dd4 ("sunrpc: add xprt_switch direcotry to sunrpc's sysfs") Signed-off-by: Zhang Xiaoxu zhangxiaoxu5@huawei.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/sysfs.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/sysfs.c b/net/sunrpc/sysfs.c index a7020b1f3ec7..55da1b627a7d 100644 --- a/net/sunrpc/sysfs.c +++ b/net/sunrpc/sysfs.c @@ -525,13 +525,16 @@ void rpc_sysfs_client_setup(struct rpc_clnt *clnt, struct net *net) { struct rpc_sysfs_client *rpc_client; + struct rpc_sysfs_xprt_switch *xswitch = + (struct rpc_sysfs_xprt_switch *)xprt_switch->xps_sysfs; + + if (!xswitch) + return;
rpc_client = rpc_sysfs_client_alloc(rpc_sunrpc_client_kobj, net, clnt->cl_clid); if (rpc_client) { char name[] = "switch"; - struct rpc_sysfs_xprt_switch *xswitch = - (struct rpc_sysfs_xprt_switch *)xprt_switch->xps_sysfs; int ret;
clnt->cl_sysfs = rpc_client; @@ -565,6 +568,8 @@ void rpc_sysfs_xprt_switch_setup(struct rpc_xprt_switch *xprt_switch, rpc_xprt_switch->xprt_switch = xprt_switch; rpc_xprt_switch->xprt = xprt; kobject_uevent(&rpc_xprt_switch->kobject, KOBJ_ADD); + } else { + xprt_switch->xps_sysfs = NULL; } }
@@ -576,6 +581,9 @@ void rpc_sysfs_xprt_setup(struct rpc_xprt_switch *xprt_switch, struct rpc_sysfs_xprt_switch *switch_obj = (struct rpc_sysfs_xprt_switch *)xprt_switch->xps_sysfs;
+ if (!switch_obj) + return; + rpc_xprt = rpc_sysfs_xprt_alloc(&switch_obj->kobject, xprt, gfp_flags); if (rpc_xprt) { xprt->xprt_sysfs = rpc_xprt;
From: Benjamin Coddington bcodding@redhat.com
[ Upstream commit 038efb6348ce96228f6828354cb809c22a661681 ]
When holding a delegation, the NFS client optimizes away setting the attributes of a file from the GETATTR in the compound after CLONE, and for a zero-length CLONE we will end up setting the inode's size to zero in nfs42_copy_dest_done(). Handle this case by computing the resulting count from the server's reported size after CLONE's GETATTR.
Suggested-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Benjamin Coddington bcodding@redhat.com Fixes: 94d202d5ca39 ("NFSv42: Copy offload should update the file size when appropriate") Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs42proc.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index 93f4d8257525..da94bf2afd07 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -1077,6 +1077,9 @@ static int _nfs42_proc_clone(struct rpc_message *msg, struct file *src_f, status = nfs4_call_sync(server->client, server, msg, &args.seq_args, &res.seq_res, 0); if (status == 0) { + /* a zero-length count means clone to EOF in src */ + if (count == 0 && res.dst_fattr->valid & NFS_ATTR_FATTR_SIZE) + count = nfs_size_to_loff_t(res.dst_fattr->size) - dst_offset; nfs42_copy_dest_done(dst_inode, dst_offset, count); status = nfs_post_op_update_inode(dst_inode, res.dst_fattr); }
From: Zhang Xiaoxu zhangxiaoxu5@huawei.com
[ Upstream commit 7e8436728e22181c3f12a5dbabd35ed3a8b8c593 ]
If one of the slot allocate failed, should cleanup all the other allocated slots, otherwise, the allocated slots will leak:
unreferenced object 0xffff8881115aa100 (size 64): comm ""mount.nfs"", pid 679, jiffies 4294744957 (age 115.037s) hex dump (first 32 bytes): 00 cc 19 73 81 88 ff ff 00 a0 5a 11 81 88 ff ff ...s......Z..... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000007a4c434a>] nfs4_find_or_create_slot+0x8e/0x130 [<000000005472a39c>] nfs4_realloc_slot_table+0x23f/0x270 [<00000000cd8ca0eb>] nfs40_init_client+0x4a/0x90 [<00000000128486db>] nfs4_init_client+0xce/0x270 [<000000008d2cacad>] nfs4_set_client+0x1a2/0x2b0 [<000000000e593b52>] nfs4_create_server+0x300/0x5f0 [<00000000e4425dd2>] nfs4_try_get_tree+0x65/0x110 [<00000000d3a6176f>] vfs_get_tree+0x41/0xf0 [<0000000016b5ad4c>] path_mount+0x9b3/0xdd0 [<00000000494cae71>] __x64_sys_mount+0x190/0x1d0 [<000000005d56bdec>] do_syscall_64+0x35/0x80 [<00000000687c9ae4>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: abf79bb341bf ("NFS: Add a slot table to struct nfs_client for NFSv4.0 transport blocking") Signed-off-by: Zhang Xiaoxu zhangxiaoxu5@huawei.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4client.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c index ed06b68b2b4e..1bf7a72ebda6 100644 --- a/fs/nfs/nfs4client.c +++ b/fs/nfs/nfs4client.c @@ -346,6 +346,7 @@ int nfs40_init_client(struct nfs_client *clp) ret = nfs4_setup_slot_table(tbl, NFS4_MAX_SLOT_TABLE, "NFSv4.0 transport Slot table"); if (ret) { + nfs4_shutdown_slot_table(tbl); kfree(tbl); return ret; }
From: Chen Zhongjin chenzhongjin@huawei.com
[ Upstream commit 633efc8b3dc96f56f5a57f2a49764853a2fa3f50 ]
kmemleak reported memory leaks in dsa_loop_init():
kmemleak: 12 new suspected memory leaks
unreferenced object 0xffff8880138ce000 (size 2048): comm "modprobe", pid 390, jiffies 4295040478 (age 238.976s) backtrace: [<000000006a94f1d5>] kmalloc_trace+0x26/0x60 [<00000000a9c44622>] phy_device_create+0x5d/0x970 [<00000000d0ee2afc>] get_phy_device+0xf3/0x2b0 [<00000000dca0c71f>] __fixed_phy_register.part.0+0x92/0x4e0 [<000000008a834798>] fixed_phy_register+0x84/0xb0 [<0000000055223fcb>] dsa_loop_init+0xa9/0x116 [dsa_loop] ...
There are two reasons for memleak in dsa_loop_init().
First, fixed_phy_register() create and register phy_device:
fixed_phy_register() get_phy_device() phy_device_create() # freed by phy_device_free() phy_device_register() # freed by phy_device_remove()
But fixed_phy_unregister() only calls phy_device_remove(). So the memory allocated in phy_device_create() is leaked.
Second, when mdio_driver_register() fail in dsa_loop_init(), it just returns and there is no cleanup for phydevs.
Fix the problems by catching the error of mdio_driver_register() in dsa_loop_init(), then calling both fixed_phy_unregister() and phy_device_free() to release phydevs. Also add a function for phydevs cleanup to avoid duplacate.
Fixes: 98cd1552ea27 ("net: dsa: Mock-up driver") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/dsa_loop.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/net/dsa/dsa_loop.c b/drivers/net/dsa/dsa_loop.c index e638e3eea911..e6e1054339b5 100644 --- a/drivers/net/dsa/dsa_loop.c +++ b/drivers/net/dsa/dsa_loop.c @@ -376,6 +376,17 @@ static struct mdio_driver dsa_loop_drv = {
#define NUM_FIXED_PHYS (DSA_LOOP_NUM_PORTS - 2)
+static void dsa_loop_phydevs_unregister(void) +{ + unsigned int i; + + for (i = 0; i < NUM_FIXED_PHYS; i++) + if (!IS_ERR(phydevs[i])) { + fixed_phy_unregister(phydevs[i]); + phy_device_free(phydevs[i]); + } +} + static int __init dsa_loop_init(void) { struct fixed_phy_status status = { @@ -383,23 +394,23 @@ static int __init dsa_loop_init(void) .speed = SPEED_100, .duplex = DUPLEX_FULL, }; - unsigned int i; + unsigned int i, ret;
for (i = 0; i < NUM_FIXED_PHYS; i++) phydevs[i] = fixed_phy_register(PHY_POLL, &status, NULL);
- return mdio_driver_register(&dsa_loop_drv); + ret = mdio_driver_register(&dsa_loop_drv); + if (ret) + dsa_loop_phydevs_unregister(); + + return ret; } module_init(dsa_loop_init);
static void __exit dsa_loop_exit(void) { - unsigned int i; - mdio_driver_unregister(&dsa_loop_drv); - for (i = 0; i < NUM_FIXED_PHYS; i++) - if (!IS_ERR(phydevs[i])) - fixed_phy_unregister(phydevs[i]); + dsa_loop_phydevs_unregister(); } module_exit(dsa_loop_exit);
From: Chen Zhongjin chenzhongjin@huawei.com
[ Upstream commit 07c0d131cc0fe1f3981a42958fc52d573d303d89 ]
KASAN reported a null-ptr-deref error:
KASAN: null-ptr-deref in range [0x0000000000000118-0x000000000000011f] CPU: 1 PID: 379 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:destroy_workqueue+0x2f/0x740 RSP: 0018:ffff888016137df8 EFLAGS: 00000202 ... Call Trace: ib_core_cleanup+0xa/0xa1 [ib_core] __do_sys_delete_module.constprop.0+0x34f/0x5b0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fa1a0d221b7 ...
It is because the fail of roce_gid_mgmt_init() is ignored:
ib_core_init() roce_gid_mgmt_init() gid_cache_wq = alloc_ordered_workqueue # fail ... ib_core_cleanup() roce_gid_mgmt_cleanup() destroy_workqueue(gid_cache_wq) # destroy an unallocated wq
Fix this by catching the fail of roce_gid_mgmt_init() in ib_core_init().
Fixes: 03db3a2d81e6 ("IB/core: Add RoCE GID table management") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Link: https://lore.kernel.org/r/20221025024146.109137-1-chenzhongjin@huawei.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/core/device.c | 10 +++++++++- drivers/infiniband/core/nldev.c | 2 +- 2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 6ab46648af90..1519379b116e 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -2814,10 +2814,18 @@ static int __init ib_core_init(void)
nldev_init(); rdma_nl_register(RDMA_NL_LS, ibnl_ls_cb_table); - roce_gid_mgmt_init(); + ret = roce_gid_mgmt_init(); + if (ret) { + pr_warn("Couldn't init RoCE GID management\n"); + goto err_parent; + }
return 0;
+err_parent: + rdma_nl_unregister(RDMA_NL_LS); + nldev_exit(); + unregister_pernet_device(&rdma_dev_net_ops); err_compat: unregister_blocking_lsm_notifier(&ibdev_lsm_nb); err_sa: diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c index e9b4b2cccaa0..cd89e59cbe33 100644 --- a/drivers/infiniband/core/nldev.c +++ b/drivers/infiniband/core/nldev.c @@ -2349,7 +2349,7 @@ void __init nldev_init(void) rdma_nl_register(RDMA_NL_NLDEV, nldev_cb_table); }
-void __exit nldev_exit(void) +void nldev_exit(void) { rdma_nl_unregister(RDMA_NL_NLDEV); }
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit 7a47e077e503feb73d56e491ce89aa73b67a3972 ]
Add a check for if create_singlethread_workqueue() fails and also destroy the work queue on failure paths.
Fixes: e411e0587e0d ("RDMA/qedr: Add iWARP connection management functions") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Link: https://lore.kernel.org/r/Y1gBkDucQhhWj5YM@kili Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/qedr/main.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c index 755930be01b8..6b59f97a182b 100644 --- a/drivers/infiniband/hw/qedr/main.c +++ b/drivers/infiniband/hw/qedr/main.c @@ -345,6 +345,10 @@ static int qedr_alloc_resources(struct qedr_dev *dev) if (IS_IWARP(dev)) { xa_init(&dev->qps); dev->iwarp_wq = create_singlethread_workqueue("qedr_iwarpq"); + if (!dev->iwarp_wq) { + rc = -ENOMEM; + goto err1; + } }
/* Allocate Status blocks for CNQ */ @@ -352,7 +356,7 @@ static int qedr_alloc_resources(struct qedr_dev *dev) GFP_KERNEL); if (!dev->sb_array) { rc = -ENOMEM; - goto err1; + goto err_destroy_wq; }
dev->cnq_array = kcalloc(dev->num_cnq, @@ -403,6 +407,9 @@ static int qedr_alloc_resources(struct qedr_dev *dev) kfree(dev->cnq_array); err2: kfree(dev->sb_array); +err_destroy_wq: + if (IS_IWARP(dev)) + destroy_workqueue(dev->iwarp_wq); err1: kfree(dev->sgid_tbl); return rc;
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit a2c65a9d0568b6737c02b54f00b80716a53fac61 ]
DSA tagging protocol drivers can be changed at runtime through sysfs and at probe time through the device tree (support for the latter was added later).
When changing through sysfs, it is assumed that the module for the new tagging protocol was already loaded into the kernel (in fact this is only a concern for Ocelot/Felix switches, where we have tag_ocelot.ko and tag_ocelot_8021q.ko; for every other switch, the default and alternative protocols are compiled within the same .ko, so there is nothing for the user to load).
The kernel cannot currently call request_module(), because it has no way of constructing the modalias name of the tagging protocol driver ("dsa_tag-%d", where the number is one of DSA_TAG_PROTO_*_VALUE). The device tree only contains the string name of the tagging protocol ("ocelot-8021q"), and the only mapping between the string and the DSA_TAG_PROTO_OCELOT_8021Q_VALUE is present in tag_ocelot_8021q.ko. So this is a chicken-and-egg situation and dsa_core.ko has nothing based on which it can automatically request the insertion of the module.
As a consequence, if CONFIG_NET_DSA_TAG_OCELOT_8021Q is built as module, the switch will forever defer probing.
The long-term solution is to make DSA call request_module() somehow, but that probably needs some refactoring.
What we can do to keep operating with existing device tree blobs is to cancel the attempt to change the tagging protocol with the one specified there, and to remain operating with the default one. Depending on the situation, the default protocol might still allow some functionality (in the case of ocelot, it does), and it's better to have that than to fail to probe.
Fixes: deff710703d8 ("net: dsa: Allow default tag protocol to be overridden from DT") Link: https://lore.kernel.org/lkml/20221027113248.420216-1-michael@walle.cc/ Reported-by: Heiko Thiery heiko.thiery@gmail.com Reported-by: Michael Walle michael@walle.cc Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Tested-by: Michael Walle michael@walle.cc Reviewed-by: Florian Fainelli f.fainelli@gmail.com Link: https://lore.kernel.org/r/20221027145439.3086017-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/dsa/dsa2.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c index 64a56db3de58..34763f575c30 100644 --- a/net/dsa/dsa2.c +++ b/net/dsa/dsa2.c @@ -1253,9 +1253,9 @@ static enum dsa_tag_protocol dsa_get_tag_protocol(struct dsa_port *dp, static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *master, const char *user_protocol) { + const struct dsa_device_ops *tag_ops = NULL; struct dsa_switch *ds = dp->ds; struct dsa_switch_tree *dst = ds->dst; - const struct dsa_device_ops *tag_ops; enum dsa_tag_protocol default_proto;
/* Find out which protocol the switch would prefer. */ @@ -1278,10 +1278,17 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *master, }
tag_ops = dsa_find_tagger_by_name(user_protocol); - } else { - tag_ops = dsa_tag_driver_get(default_proto); + if (IS_ERR(tag_ops)) { + dev_warn(ds->dev, + "Failed to find a tagging driver for protocol %s, using default\n", + user_protocol); + tag_ops = NULL; + } }
+ if (!tag_ops) + tag_ops = dsa_tag_driver_get(default_proto); + if (IS_ERR(tag_ops)) { if (PTR_ERR(tag_ops) == -ENOPROTOOPT) return -EPROBE_DEFER;
From: Shang XiaoJing shangxiaojing@huawei.com
[ Upstream commit 8e4aae6b8ca76afb1fb64dcb24be44ba814e7f8a ]
fdp_nci_send() will call fdp_nci_i2c_write that will not free skb in the function. As a result, when fdp_nci_i2c_write() finished, the skb will memleak. fdp_nci_send() should free skb after fdp_nci_i2c_write() finished.
Fixes: a06347c04c13 ("NFC: Add Intel Fields Peak NFC solution driver") Signed-off-by: Shang XiaoJing shangxiaojing@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/fdp/fdp.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/nfc/fdp/fdp.c b/drivers/nfc/fdp/fdp.c index c6b3334f24c9..f12f903a9dd1 100644 --- a/drivers/nfc/fdp/fdp.c +++ b/drivers/nfc/fdp/fdp.c @@ -249,11 +249,19 @@ static int fdp_nci_close(struct nci_dev *ndev) static int fdp_nci_send(struct nci_dev *ndev, struct sk_buff *skb) { struct fdp_nci_info *info = nci_get_drvdata(ndev); + int ret;
if (atomic_dec_and_test(&info->data_pkt_counter)) info->data_pkt_counter_cb(ndev);
- return info->phy_ops->write(info->phy, skb); + ret = info->phy_ops->write(info->phy, skb); + if (ret < 0) { + kfree_skb(skb); + return ret; + } + + consume_skb(skb); + return 0; }
static int fdp_nci_request_firmware(struct nci_dev *ndev)
From: Shang XiaoJing shangxiaojing@huawei.com
[ Upstream commit 7bf1ed6aff0f70434bd0cdd45495e83f1dffb551 ]
nxp_nci_send() will call nxp_nci_i2c_write(), and only free skb when nxp_nci_i2c_write() failed. However, even if the nxp_nci_i2c_write() run succeeds, the skb will not be freed in nxp_nci_i2c_write(). As the result, the skb will memleak. nxp_nci_send() should also free the skb when nxp_nci_i2c_write() succeeds.
Fixes: dece45855a8b ("NFC: nxp-nci: Add support for NXP NCI chips") Signed-off-by: Shang XiaoJing shangxiaojing@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/nxp-nci/core.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/nfc/nxp-nci/core.c b/drivers/nfc/nxp-nci/core.c index 518e2afb43a8..13c433eb694d 100644 --- a/drivers/nfc/nxp-nci/core.c +++ b/drivers/nfc/nxp-nci/core.c @@ -77,10 +77,13 @@ static int nxp_nci_send(struct nci_dev *ndev, struct sk_buff *skb) return -EINVAL;
r = info->phy_ops->write(info->phy_id, skb); - if (r < 0) + if (r < 0) { kfree_skb(skb); + return r; + }
- return r; + consume_skb(skb); + return 0; }
static const struct nci_ops nxp_nci_ops = {
From: Shang XiaoJing shangxiaojing@huawei.com
[ Upstream commit 3a146b7e3099dc7cf3114f627d9b79291e2d2203 ]
s3fwrn5_nci_send() will call s3fwrn5_i2c_write() or s3fwrn82_uart_write(), and free the skb if write() failed. However, even if the write() run succeeds, the skb will not be freed in write(). As the result, the skb will memleak. s3fwrn5_nci_send() should also free the skb when write() succeeds.
Fixes: c04c674fadeb ("nfc: s3fwrn5: Add driver for Samsung S3FWRN5 NFC Chip") Signed-off-by: Shang XiaoJing shangxiaojing@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/s3fwrn5/core.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/nfc/s3fwrn5/core.c b/drivers/nfc/s3fwrn5/core.c index 1c412007fabb..0270e05b68df 100644 --- a/drivers/nfc/s3fwrn5/core.c +++ b/drivers/nfc/s3fwrn5/core.c @@ -110,11 +110,15 @@ static int s3fwrn5_nci_send(struct nci_dev *ndev, struct sk_buff *skb) }
ret = s3fwrn5_write(info, skb); - if (ret < 0) + if (ret < 0) { kfree_skb(skb); + mutex_unlock(&info->mutex); + return ret; + }
+ consume_skb(skb); mutex_unlock(&info->mutex); - return ret; + return 0; }
static int s3fwrn5_nci_post_setup(struct nci_dev *ndev)
From: Shang XiaoJing shangxiaojing@huawei.com
[ Upstream commit 93d904a734a74c54d945a9884b4962977f1176cd ]
nfcmrvl_i2c_nci_send() will be called by nfcmrvl_nci_send(), and skb should be freed in nfcmrvl_i2c_nci_send(). However, nfcmrvl_nci_send() will only free skb when i2c_master_send() return >=0, which means skb will memleak when i2c_master_send() failed. Free skb no matter whether i2c_master_send() succeeds.
Fixes: b5b3e23e4cac ("NFC: nfcmrvl: add i2c driver") Signed-off-by: Shang XiaoJing shangxiaojing@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/nfcmrvl/i2c.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/nfc/nfcmrvl/i2c.c b/drivers/nfc/nfcmrvl/i2c.c index 01329b91d59d..a902720cd849 100644 --- a/drivers/nfc/nfcmrvl/i2c.c +++ b/drivers/nfc/nfcmrvl/i2c.c @@ -132,10 +132,15 @@ static int nfcmrvl_i2c_nci_send(struct nfcmrvl_private *priv, ret = -EREMOTEIO; } else ret = 0; + } + + if (ret) { kfree_skb(skb); + return ret; }
- return ret; + consume_skb(skb); + return 0; }
static void nfcmrvl_i2c_nci_update_config(struct nfcmrvl_private *priv,
From: Zhang Changzhong zhangchangzhong@huawei.com
[ Upstream commit 06a4df5863f73af193a4ff7abf7cb04058584f06 ]
The ndo_start_xmit() method must not free skb when returning NETDEV_TX_BUSY, since caller is going to requeue freed skb.
Fix it by returning NETDEV_TX_OK in case of dma_map_single() fails.
Fixes: 79f339125ea3 ("net: fec: Add software TSO support") Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/freescale/fec_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 313ae8112067..a829ba128b9d 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -656,7 +656,7 @@ fec_enet_txq_put_data_tso(struct fec_enet_priv_tx_q *txq, struct sk_buff *skb, dev_kfree_skb_any(skb); if (net_ratelimit()) netdev_err(ndev, "Tx DMA memory map failed\n"); - return NETDEV_TX_BUSY; + return NETDEV_TX_OK; }
bdp->cbd_datlen = cpu_to_fec16(size); @@ -718,7 +718,7 @@ fec_enet_txq_put_hdr_tso(struct fec_enet_priv_tx_q *txq, dev_kfree_skb_any(skb); if (net_ratelimit()) netdev_err(ndev, "Tx DMA memory map failed\n"); - return NETDEV_TX_BUSY; + return NETDEV_TX_OK; } }
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit 171a93182eccd6e6835d2c86b40787f9f832efaa ]
Clang gives a warning when compiling pata_legacy.c with 'make W=1' about the 'rt' local variable in pdc20230_set_piomode() being set but unused. Quite obviously, there is an outb() call missing to write back the updated variable. Moreover, checking the docs by Petr Soucek revealed that bitwise AND should have been done with a negated timing mask and the master/slave timing masks were swapped while updating...
Fixes: 669a5db411d8 ("[libata] Add a bunch of PATA drivers.") Reported-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/pata_legacy.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/ata/pata_legacy.c b/drivers/ata/pata_legacy.c index 0a8bf09a5c19..03c580625c2c 100644 --- a/drivers/ata/pata_legacy.c +++ b/drivers/ata/pata_legacy.c @@ -315,9 +315,10 @@ static void pdc20230_set_piomode(struct ata_port *ap, struct ata_device *adev) outb(inb(0x1F4) & 0x07, 0x1F4);
rt = inb(0x1F3); - rt &= 0x07 << (3 * adev->devno); + rt &= ~(0x07 << (3 * !adev->devno)); if (pio) - rt |= (1 + 3 * pio) << (3 * adev->devno); + rt |= (1 + 3 * pio) << (3 * !adev->devno); + outb(rt, 0x1F3);
udelay(100); outb(inb(0x1F2) | 0x01, 0x1F2);
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit 8bdc2acd420c6f3dd1f1c78750ec989f02a1e2b9 ]
We can't use "skb" again after passing it to qdisc_enqueue(). This is basically identical to commit 2f09707d0c97 ("sch_sfb: Also store skb len before calling child enqueue").
Fixes: d7f4f332f082 ("sch_red: update backlog as well") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_red.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c index f1e013e3f04a..935d90874b1b 100644 --- a/net/sched/sch_red.c +++ b/net/sched/sch_red.c @@ -72,6 +72,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch, { struct red_sched_data *q = qdisc_priv(sch); struct Qdisc *child = q->qdisc; + unsigned int len; int ret;
q->vars.qavg = red_calc_qavg(&q->parms, @@ -126,9 +127,10 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch, break; }
+ len = qdisc_pkt_len(skb); ret = qdisc_enqueue(skb, child, to_free); if (likely(ret == NET_XMIT_SUCCESS)) { - qdisc_qstats_backlog_inc(sch, skb); + sch->qstats.backlog += len; sch->q.qlen++; } else if (net_xmit_drop_count(ret)) { q->stats.pdrop++;
From: Ziyang Xuan william.xuanziyang@huawei.com
[ Upstream commit 363a5328f4b0517e59572118ccfb7c626d81dca9 ]
Recently, we got two syzkaller problems because of oversize packet when napi frags enabled.
One of the problems is because the first seg size of the iov_iter from user space is very big, it is 2147479538 which is bigger than the threshold value for bail out early in __alloc_pages(). And skb->pfmemalloc is true, __kmalloc_reserve() would use pfmemalloc reserves without __GFP_NOWARN flag. Thus we got a warning as following:
======================================================== WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 ... Call trace: __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 __alloc_pages_node include/linux/gfp.h:550 [inline] alloc_pages_node include/linux/gfp.h:564 [inline] kmalloc_large_node+0x94/0x350 mm/slub.c:4038 __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545 __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151 pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654 __skb_grow include/linux/skbuff.h:2779 [inline] tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477 tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835 tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036
The other problem is because odd IPv6 packets without NEXTHDR_NONE extension header and have big packet length, it is 2127925 which is bigger than ETH_MAX_MTU(65535). After ipv6_gso_pull_exthdrs() in ipv6_gro_receive(), network_header offset and transport_header offset are all bigger than U16_MAX. That would trigger skb->network_header and skb->transport_header overflow error, because they are all '__u16' type. Eventually, it would affect the value for __skb_push(skb, value), and make it be a big value. After __skb_push() in ipv6_gro_receive(), skb->data would less than skb->head, an out of bounds memory bug occurred. That would trigger the problem as following:
================================================================== BUG: KASAN: use-after-free in eth_type_trans+0x100/0x260 ... Call trace: dump_backtrace+0xd8/0x130 show_stack+0x1c/0x50 dump_stack_lvl+0x64/0x7c print_address_description.constprop.0+0xbc/0x2e8 print_report+0x100/0x1e4 kasan_report+0x80/0x120 __asan_load8+0x78/0xa0 eth_type_trans+0x100/0x260 napi_gro_frags+0x164/0x550 tun_get_user+0xda4/0x1270 tun_chr_write_iter+0x74/0x130 do_iter_readv_writev+0x130/0x1ec do_iter_write+0xbc/0x1e0 vfs_writev+0x13c/0x26c
To fix the problems, restrict the packet size less than (ETH_MAX_MTU - NET_SKB_PAD - NET_IP_ALIGN) which has considered reserved skb space in napi_alloc_skb() because transport_header is an offset from skb->head. Add len check in tun_napi_alloc_frags() simply.
Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20221029094101.1653855-1-william.xuanziyang@huawei... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/tun.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index f92d6a12831f..9909f430d723 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1445,7 +1445,8 @@ static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile, int err; int i;
- if (it->nr_segs > MAX_SKB_FRAGS + 1) + if (it->nr_segs > MAX_SKB_FRAGS + 1 || + len > (ETH_MAX_MTU - NET_SKB_PAD - NET_IP_ALIGN)) return ERR_PTR(-EMSGSIZE);
local_bh_disable();
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit d4bc8271db21ea9f1c86a1ca4d64999f184d4aae ]
commit release path is invoked via call_rcu and it runs lockless to release the objects after rcu grace period. The netlink notifier handler might win race to remove objects that the transaction context is still referencing from the commit release path.
Call rcu_barrier() to ensure pending rcu callbacks run to completion if the list of transactions to be destroyed is not empty.
Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership") Reported-by: syzbot+8f747f62763bc6c32916@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index f7a5b8414423..8cd11a7e5a3e 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -9824,6 +9824,8 @@ static int nft_rcv_nl_event(struct notifier_block *this, unsigned long event, nft_net = nft_pernet(net); deleted = 0; mutex_lock(&nft_net->commit_mutex); + if (!list_empty(&nf_tables_destroy_list)) + rcu_barrier(); again: list_for_each_entry(table, &nft_net->tables, list) { if (nft_table_has_owner(table) &&
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 26b5934ff4194e13196bedcba373cd4915071d0e ]
No need to postpone this to the commit release path, since no packets are walking over this object, this is accessed from control plane only. This helped uncovered UAF triggered by races with the netlink notifier.
Fixes: 9dd732e0bdf5 ("netfilter: nf_tables: memleak flow rule from commit path") Reported-by: syzbot+8f747f62763bc6c32916@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 8cd11a7e5a3e..899f01c6c26c 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -8308,9 +8308,6 @@ static void nft_commit_release(struct nft_trans *trans) nf_tables_chain_destroy(&trans->ctx); break; case NFT_MSG_DELRULE: - if (trans->ctx.chain->flags & NFT_CHAIN_HW_OFFLOAD) - nft_flow_rule_destroy(nft_trans_flow_rule(trans)); - nf_tables_rule_destroy(&trans->ctx, nft_trans_rule(trans)); break; case NFT_MSG_DELSET: @@ -8767,6 +8764,9 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) nft_rule_expr_deactivate(&trans->ctx, nft_trans_rule(trans), NFT_TRANS_COMMIT); + + if (trans->ctx.chain->flags & NFT_CHAIN_HW_OFFLOAD) + nft_flow_rule_destroy(nft_trans_flow_rule(trans)); break; case NFT_MSG_NEWSET: nft_clear(net, nft_trans_set(trans));
From: Jason A. Donenfeld Jason@zx2c4.com
[ Upstream commit 5c26159c97b324dc5174a5713eafb8c855cf8106 ]
The `char` type with no explicit sign is sometimes signed and sometimes unsigned. This code will break on platforms such as arm, where char is unsigned. So mark it here as explicitly signed, so that the todrop_counter decrement and subsequent comparison is correct.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Acked-by: Julian Anastasov ja@ssi.bg Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/ipvs/ip_vs_conn.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c index fb67f1ca2495..db13288fddfa 100644 --- a/net/netfilter/ipvs/ip_vs_conn.c +++ b/net/netfilter/ipvs/ip_vs_conn.c @@ -1265,8 +1265,8 @@ static inline int todrop_entry(struct ip_vs_conn *cp) * The drop rate array needs tuning for real environments. * Called from timer bh only => no locking */ - static const char todrop_rate[9] = {0, 1, 2, 3, 4, 5, 6, 7, 8}; - static char todrop_counter[9] = {0}; + static const signed char todrop_rate[9] = {0, 1, 2, 3, 4, 5, 6, 7, 8}; + static signed char todrop_counter[9] = {0}; int i;
/* if the conn entry hasn't lasted for 60 seconds, don't drop it.
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit 3d00c6a0da8ddcf75213e004765e4a42acc71d5d ]
During the initialization of ip_vs_conn_net_init(), if file ip_vs_conn or ip_vs_conn_sync fails to be created, the initialization is successful by default. Therefore, the ip_vs_conn or ip_vs_conn_sync file doesn't be found during the remove.
The following is the stack information: name 'ip_vs_conn_sync' WARNING: CPU: 3 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460 Modules linked in: Workqueue: netns cleanup_net RIP: 0010:remove_proc_entry+0x389/0x460 Call Trace: <TASK> __ip_vs_cleanup_batch+0x7d/0x120 ops_exit_list+0x125/0x170 cleanup_net+0x4ea/0xb00 process_one_work+0x9bf/0x1710 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 </TASK>
Fixes: 61b1ab4583e2 ("IPVS: netns, add basic init per netns.") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Acked-by: Julian Anastasov ja@ssi.bg Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/ipvs/ip_vs_conn.c | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c index db13288fddfa..cb6d68220c26 100644 --- a/net/netfilter/ipvs/ip_vs_conn.c +++ b/net/netfilter/ipvs/ip_vs_conn.c @@ -1447,20 +1447,36 @@ int __net_init ip_vs_conn_net_init(struct netns_ipvs *ipvs) { atomic_set(&ipvs->conn_count, 0);
- proc_create_net("ip_vs_conn", 0, ipvs->net->proc_net, - &ip_vs_conn_seq_ops, sizeof(struct ip_vs_iter_state)); - proc_create_net("ip_vs_conn_sync", 0, ipvs->net->proc_net, - &ip_vs_conn_sync_seq_ops, - sizeof(struct ip_vs_iter_state)); +#ifdef CONFIG_PROC_FS + if (!proc_create_net("ip_vs_conn", 0, ipvs->net->proc_net, + &ip_vs_conn_seq_ops, + sizeof(struct ip_vs_iter_state))) + goto err_conn; + + if (!proc_create_net("ip_vs_conn_sync", 0, ipvs->net->proc_net, + &ip_vs_conn_sync_seq_ops, + sizeof(struct ip_vs_iter_state))) + goto err_conn_sync; +#endif + return 0; + +#ifdef CONFIG_PROC_FS +err_conn_sync: + remove_proc_entry("ip_vs_conn", ipvs->net->proc_net); +err_conn: + return -ENOMEM; +#endif }
void __net_exit ip_vs_conn_net_cleanup(struct netns_ipvs *ipvs) { /* flush all the connection entries first */ ip_vs_conn_flush(ipvs); +#ifdef CONFIG_PROC_FS remove_proc_entry("ip_vs_conn", ipvs->net->proc_net); remove_proc_entry("ip_vs_conn_sync", ipvs->net->proc_net); +#endif }
int __init ip_vs_conn_init(void)
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit 5663ed63adb9619c98ab7479aa4606fa9b7a548c ]
During the initialization of ip_vs_app_net_init(), if file ip_vs_app fails to be created, the initialization is successful by default. Therefore, the ip_vs_app file doesn't be found during the remove in ip_vs_app_net_cleanup(). It will cause WRNING.
The following is the stack information: name 'ip_vs_app' WARNING: CPU: 1 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460 Modules linked in: Workqueue: netns cleanup_net RIP: 0010:remove_proc_entry+0x389/0x460 Call Trace: <TASK> ops_exit_list+0x125/0x170 cleanup_net+0x4ea/0xb00 process_one_work+0x9bf/0x1710 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 </TASK>
Fixes: 457c4cbc5a3d ("[NET]: Make /proc/net per network namespace") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Acked-by: Julian Anastasov ja@ssi.bg Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/ipvs/ip_vs_app.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_app.c b/net/netfilter/ipvs/ip_vs_app.c index f9b16f2b2219..fdacbc3c15be 100644 --- a/net/netfilter/ipvs/ip_vs_app.c +++ b/net/netfilter/ipvs/ip_vs_app.c @@ -599,13 +599,19 @@ static const struct seq_operations ip_vs_app_seq_ops = { int __net_init ip_vs_app_net_init(struct netns_ipvs *ipvs) { INIT_LIST_HEAD(&ipvs->app_list); - proc_create_net("ip_vs_app", 0, ipvs->net->proc_net, &ip_vs_app_seq_ops, - sizeof(struct seq_net_private)); +#ifdef CONFIG_PROC_FS + if (!proc_create_net("ip_vs_app", 0, ipvs->net->proc_net, + &ip_vs_app_seq_ops, + sizeof(struct seq_net_private))) + return -ENOMEM; +#endif return 0; }
void __net_exit ip_vs_app_net_cleanup(struct netns_ipvs *ipvs) { unregister_ip_vs_app(ipvs, NULL /* all */); +#ifdef CONFIG_PROC_FS remove_proc_entry("ip_vs_app", ipvs->net->proc_net); +#endif }
From: Zhang Qilong zhangqilong3@huawei.com
[ Upstream commit e97c089d7a49f67027395ddf70bf327eeac2611e ]
The syzkaller reported an issue:
KASAN: null-ptr-deref in range [0x0000000000000380-0x0000000000000387] CPU: 0 PID: 4069 Comm: kworker/0:15 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Workqueue: rcu_gp srcu_invoke_callbacks RIP: 0010:rose_send_frame+0x1dd/0x2f0 net/rose/rose_link.c:101 Call Trace: <IRQ> rose_transmit_clear_request+0x1d5/0x290 net/rose/rose_link.c:255 rose_rx_call_request+0x4c0/0x1bc0 net/rose/af_rose.c:1009 rose_loopback_timer+0x19e/0x590 net/rose/rose_loopback.c:111 call_timer_fn+0x1a0/0x6b0 kernel/time/timer.c:1474 expire_timers kernel/time/timer.c:1519 [inline] __run_timers.part.0+0x674/0xa80 kernel/time/timer.c:1790 __run_timers kernel/time/timer.c:1768 [inline] run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1803 __do_softirq+0x1d0/0x9c8 kernel/softirq.c:571 [...] </IRQ>
It triggers NULL pointer dereference when 'neigh->dev->dev_addr' is called in the rose_send_frame(). It's the first occurrence of the `neigh` is in rose_loopback_timer() as `rose_loopback_neigh', and the 'dev' in 'rose_loopback_neigh' is initialized sa nullptr.
It had been fixed by commit 3b3fd068c56e3fbea30090859216a368398e39bf ("rose: Fix Null pointer dereference in rose_send_frame()") ever. But it's introduced by commit 3c53cd65dece47dd1f9d3a809f32e59d1d87b2b8 ("rose: check NULL rose_loopback_neigh->loopback") again.
We fix it by add NULL check in rose_transmit_clear_request(). When the 'dev' in 'neigh' is NULL, we don't reply the request and just clear it.
syzkaller don't provide repro, and I provide a syz repro like: r0 = syz_init_net_socket$bt_sco(0x1f, 0x5, 0x2) ioctl$sock_inet_SIOCSIFFLAGS(r0, 0x8914, &(0x7f0000000180)={'rose0\x00', 0x201}) r1 = syz_init_net_socket$rose(0xb, 0x5, 0x0) bind$rose(r1, &(0x7f00000000c0)=@full={0xb, @dev, @null, 0x0, [@null, @null, @netrom, @netrom, @default, @null]}, 0x40) connect$rose(r1, &(0x7f0000000240)=@short={0xb, @dev={0xbb, 0xbb, 0xbb, 0x1, 0x0}, @remote={0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0x1}, 0x1, @netrom={0xbb, 0xbb, 0xbb, 0xbb, 0xbb, 0x0, 0x0}}, 0x1c)
Fixes: 3c53cd65dece ("rose: check NULL rose_loopback_neigh->loopback") Signed-off-by: Zhang Qilong zhangqilong3@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/rose/rose_link.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/rose/rose_link.c b/net/rose/rose_link.c index f6102e6f5161..730d2205f197 100644 --- a/net/rose/rose_link.c +++ b/net/rose/rose_link.c @@ -236,6 +236,9 @@ void rose_transmit_clear_request(struct rose_neigh *neigh, unsigned int lci, uns unsigned char *dptr; int len;
+ if (!neigh->dev) + return; + len = AX25_BPQ_HEADER_LEN + AX25_MAX_HEADER_LEN + ROSE_MIN_LEN + 3;
if ((skb = alloc_skb(len, GFP_ATOMIC)) == NULL)
From: Yang Yingliang yangyingliang@huawei.com
[ Upstream commit e7d1d4d9ac0dfa40be4c2c8abd0731659869b297 ]
Afer commit 1fa5ae857bb1 ("driver core: get rid of struct device's bus_id string array"), the name of device is allocated dynamically, add put_device() to give up the reference, so that the name can be freed in kobject_cleanup() when the refcount is 0.
Set device class before put_device() to avoid null release() function WARN message in device_release().
Fixes: 1fa5ae857bb1 ("driver core: get rid of struct device's bus_id string array") Signed-off-by: Yang Yingliang yangyingliang@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/isdn/mISDN/core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/isdn/mISDN/core.c b/drivers/isdn/mISDN/core.c index a41b4b264594..7ea0100f218a 100644 --- a/drivers/isdn/mISDN/core.c +++ b/drivers/isdn/mISDN/core.c @@ -233,11 +233,12 @@ mISDN_register_device(struct mISDNdevice *dev, if (debug & DEBUG_CORE) printk(KERN_DEBUG "mISDN_register %s %d\n", dev_name(&dev->dev), dev->id); + dev->dev.class = &mISDN_class; + err = create_stack(dev); if (err) goto error1;
- dev->dev.class = &mISDN_class; dev->dev.platform_data = dev; dev->dev.parent = parent; dev_set_drvdata(&dev->dev, dev); @@ -249,8 +250,8 @@ mISDN_register_device(struct mISDNdevice *dev,
error3: delete_stack(dev); - return err; error1: + put_device(&dev->dev); return err;
}
From: Yang Yingliang yangyingliang@huawei.com
[ Upstream commit bf00f5426074249058a106a6edbb89e4b25a4d79 ]
The class is set in mISDN_register_device(), but if device_add() returns error, it will lead to delete a device without added, fix this by using device_is_registered() to check if the device is registered.
Fixes: a900845e5661 ("mISDN: Add support for Traverse Technologies NETJet PCI cards") Signed-off-by: Yang Yingliang yangyingliang@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/isdn/hardware/mISDN/netjet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/isdn/hardware/mISDN/netjet.c b/drivers/isdn/hardware/mISDN/netjet.c index a52f275f8263..f8447135a902 100644 --- a/drivers/isdn/hardware/mISDN/netjet.c +++ b/drivers/isdn/hardware/mISDN/netjet.c @@ -956,7 +956,7 @@ nj_release(struct tiger_hw *card) } if (card->irq > 0) free_irq(card->irq, card); - if (card->isac.dch.dev.dev.class) + if (device_is_registered(&card->isac.dch.dev.dev)) mISDN_unregister_device(&card->isac.dch.dev);
for (i = 0; i < 2; i++) {
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 5614dc3a47e3310fbc77ea3b67eaadd1c6417bf1 ]
During backref walking, at resolve_indirect_refs(), if we get an error we jump to the 'out' label and call ulist_free() on the 'parents' ulist, which frees all the elements in the ulist - however that does not free any inode lists that may be attached to elements, through the 'aux' field of a ulist node, so we end up leaking lists if we have any attached to the unodes.
Fix this by calling free_leaf_list() instead of ulist_free() when we exit from resolve_indirect_refs(). The static function free_leaf_list() is moved up for this to be possible and it's slightly simplified by removing unnecessary code.
Fixes: 3301958b7c1d ("Btrfs: add inodes before dropping the extent lock in find_all_leafs") Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/backref.c | 36 +++++++++++++++++------------------- 1 file changed, 17 insertions(+), 19 deletions(-)
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 2e7c3e48bc9c..26e6867ff023 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -648,6 +648,18 @@ unode_aux_to_inode_list(struct ulist_node *node) return (struct extent_inode_elem *)(uintptr_t)node->aux; }
+static void free_leaf_list(struct ulist *ulist) +{ + struct ulist_node *node; + struct ulist_iterator uiter; + + ULIST_ITER_INIT(&uiter); + while ((node = ulist_next(ulist, &uiter))) + free_inode_elem_list(unode_aux_to_inode_list(node)); + + ulist_free(ulist); +} + /* * We maintain three separate rbtrees: one for direct refs, one for * indirect refs which have a key, and one for indirect refs which do not @@ -762,7 +774,11 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info, cond_resched(); } out: - ulist_free(parents); + /* + * We may have inode lists attached to refs in the parents ulist, so we + * must free them before freeing the ulist and its refs. + */ + free_leaf_list(parents); return ret; }
@@ -1412,24 +1428,6 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, return ret; }
-static void free_leaf_list(struct ulist *blocks) -{ - struct ulist_node *node = NULL; - struct extent_inode_elem *eie; - struct ulist_iterator uiter; - - ULIST_ITER_INIT(&uiter); - while ((node = ulist_next(blocks, &uiter))) { - if (!node->aux) - continue; - eie = unode_aux_to_inode_list(node); - free_inode_elem_list(eie); - node->aux = 0; - } - - ulist_free(blocks); -} - /* * Finds all leafs with a reference to the specified combination of bytenr and * offset. key_list_head will point to a list of corresponding keys (caller must
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 92876eec382a0f19f33d09d2c939e9ca49038ae5 ]
During backref walking, at find_parent_nodes(), if we are dealing with a data extent and we get an error while resolving the indirect backrefs, at resolve_indirect_refs(), or in the while loop that iterates over the refs in the direct refs rbtree, we end up leaking the inode lists attached to the direct refs we have in the direct refs rbtree that were not yet added to the refs ulist passed as argument to find_parent_nodes(). Since they were not yet added to the refs ulist and prelim_release() does not free the lists, on error the caller can only free the lists attached to the refs that were added to the refs ulist, all the remaining refs get their inode lists never freed, therefore leaking their memory.
Fix this by having prelim_release() always free any attached inode list to each ref found in the rbtree, and have find_parent_nodes() set the ref's inode list to NULL once it transfers ownership of the inode list to a ref added to the refs ulist passed to find_parent_nodes().
Fixes: 86d5f9944252 ("btrfs: convert prelimary reference tracking to use rbtrees") Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/backref.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 26e6867ff023..60bec5a108fa 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -289,8 +289,10 @@ static void prelim_release(struct preftree *preftree) struct prelim_ref *ref, *next_ref;
rbtree_postorder_for_each_entry_safe(ref, next_ref, - &preftree->root.rb_root, rbnode) + &preftree->root.rb_root, rbnode) { + free_inode_elem_list(ref->inode_list); free_pref(ref); + }
preftree->root = RB_ROOT_CACHED; preftree->count = 0; @@ -1387,6 +1389,12 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, if (ret < 0) goto out; ref->inode_list = eie; + /* + * We transferred the list ownership to the ref, + * so set to NULL to avoid a double free in case + * an error happens after this. + */ + eie = NULL; } ret = ulist_add_merge_ptr(refs, ref->parent, ref->inode_list, @@ -1412,6 +1420,14 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, eie->next = ref->inode_list; } eie = NULL; + /* + * We have transferred the inode list ownership from + * this ref to the ref we added to the 'refs' ulist. + * So set this ref's inode list to NULL to avoid + * use-after-free when our caller uses it or double + * frees in case an error happens before we return. + */ + ref->inode_list = NULL; } cond_resched(); }
From: Filipe Manana fdmanana@suse.com
[ Upstream commit d37de92b38932d40e4a251e876cc388f9aee5f42 ]
In the test_no_shared_qgroup() and test_multiple_refs() qgroup self tests, if we fail to add the tree ref, remove the extent item or remove the extent ref, we are returning from the test function without freeing the "old_roots" ulist that was allocated by the previous calls to btrfs_find_all_roots(). Fix that by calling ulist_free() before returning.
Fixes: 442244c96332 ("btrfs: qgroup: Switch self test to extent-oriented qgroup mechanism.") Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tests/qgroup-tests.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/tests/qgroup-tests.c b/fs/btrfs/tests/qgroup-tests.c index 19ba7d5b7d8f..a374b62c9de9 100644 --- a/fs/btrfs/tests/qgroup-tests.c +++ b/fs/btrfs/tests/qgroup-tests.c @@ -232,8 +232,10 @@ static int test_no_shared_qgroup(struct btrfs_root *root,
ret = insert_normal_tree_ref(root, nodesize, nodesize, 0, BTRFS_FS_TREE_OBJECTID); - if (ret) + if (ret) { + ulist_free(old_roots); return ret; + }
ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { @@ -266,8 +268,10 @@ static int test_no_shared_qgroup(struct btrfs_root *root, }
ret = remove_extent_item(root, nodesize, nodesize); - if (ret) + if (ret) { + ulist_free(old_roots); return -EINVAL; + }
ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { @@ -329,8 +333,10 @@ static int test_multiple_refs(struct btrfs_root *root,
ret = insert_normal_tree_ref(root, nodesize, nodesize, 0, BTRFS_FS_TREE_OBJECTID); - if (ret) + if (ret) { + ulist_free(old_roots); return ret; + }
ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { @@ -362,8 +368,10 @@ static int test_multiple_refs(struct btrfs_root *root,
ret = add_tree_ref(root, nodesize, nodesize, 0, BTRFS_FIRST_FREE_OBJECTID); - if (ret) + if (ret) { + ulist_free(old_roots); return ret; + }
ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) { @@ -401,8 +409,10 @@ static int test_multiple_refs(struct btrfs_root *root,
ret = remove_extent_ref(root, nodesize, nodesize, 0, BTRFS_FIRST_FREE_OBJECTID); - if (ret) + if (ret) { + ulist_free(old_roots); return ret; + }
ret = btrfs_find_all_roots(&trans, fs_info, nodesize, 0, &new_roots, false); if (ret) {
From: Jozsef Kadlecsik kadlec@netfilter.org
[ Upstream commit 510841da1fcc16f702440ab58ef0b4d82a9056b7 ]
Daniel Xu reported that the hash:net,iface type of the ipset subsystem does not limit adding the same network with different interfaces to a set, which can lead to huge memory usage or allocation failure.
The quick reproducer is
$ ipset create ACL.IN.ALL_PERMIT hash:net,iface hashsize 1048576 timeout 0 $ for i in $(seq 0 100); do /sbin/ipset add ACL.IN.ALL_PERMIT 0.0.0.0/0,kaf_$i timeout 0 -exist; done
The backtrace when vmalloc fails:
[Tue Oct 25 00:13:08 2022] ipset: vmalloc error: size 1073741848, exceeds total pages <...> [Tue Oct 25 00:13:08 2022] Call Trace: [Tue Oct 25 00:13:08 2022] <TASK> [Tue Oct 25 00:13:08 2022] dump_stack_lvl+0x48/0x60 [Tue Oct 25 00:13:08 2022] warn_alloc+0x155/0x180 [Tue Oct 25 00:13:08 2022] __vmalloc_node_range+0x72a/0x760 [Tue Oct 25 00:13:08 2022] ? hash_netiface4_add+0x7c0/0xb20 [Tue Oct 25 00:13:08 2022] ? __kmalloc_large_node+0x4a/0x90 [Tue Oct 25 00:13:08 2022] kvmalloc_node+0xa6/0xd0 [Tue Oct 25 00:13:08 2022] ? hash_netiface4_resize+0x99/0x710 <...>
The fix is to enforce the limit documented in the ipset(8) manpage:
The internal restriction of the hash:net,iface set type is that the same network prefix cannot be stored with more than 64 different interfaces in a single set.
Fixes: ccf0a4b7fc68 ("netfilter: ipset: Add bucketsize parameter to all hash types") Reported-by: Daniel Xu dxu@dxuuu.xyz Signed-off-by: Jozsef Kadlecsik kadlec@netfilter.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/ipset/ip_set_hash_gen.h | 30 ++++++--------------------- 1 file changed, 6 insertions(+), 24 deletions(-)
diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h index 6e391308431d..3adc291d9ce1 100644 --- a/net/netfilter/ipset/ip_set_hash_gen.h +++ b/net/netfilter/ipset/ip_set_hash_gen.h @@ -42,31 +42,8 @@ #define AHASH_MAX_SIZE (6 * AHASH_INIT_SIZE) /* Max muber of elements in the array block when tuned */ #define AHASH_MAX_TUNED 64 - #define AHASH_MAX(h) ((h)->bucketsize)
-/* Max number of elements can be tuned */ -#ifdef IP_SET_HASH_WITH_MULTI -static u8 -tune_bucketsize(u8 curr, u32 multi) -{ - u32 n; - - if (multi < curr) - return curr; - - n = curr + AHASH_INIT_SIZE; - /* Currently, at listing one hash bucket must fit into a message. - * Therefore we have a hard limit here. - */ - return n > curr && n <= AHASH_MAX_TUNED ? n : curr; -} -#define TUNE_BUCKETSIZE(h, multi) \ - ((h)->bucketsize = tune_bucketsize((h)->bucketsize, multi)) -#else -#define TUNE_BUCKETSIZE(h, multi) -#endif - /* A hash bucket */ struct hbucket { struct rcu_head rcu; /* for call_rcu */ @@ -936,7 +913,12 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext, goto set_full; /* Create a new slot */ if (n->pos >= n->size) { - TUNE_BUCKETSIZE(h, multi); +#ifdef IP_SET_HASH_WITH_MULTI + if (h->bucketsize >= AHASH_MAX_TUNED) + goto set_full; + else if (h->bucketsize < multi) + h->bucketsize += AHASH_INIT_SIZE; +#endif if (n->size >= AHASH_MAX(h)) { /* Trigger rehashing */ mtype_data_next(&h->next, d);
From: Maxim Mikityanskiy maxtram95@gmail.com
[ Upstream commit 3aff8aaca4e36dc8b17eaa011684881a80238966 ]
Fix the race condition between the following two flows that run in parallel:
1. l2cap_reassemble_sdu -> chan->ops->recv (l2cap_sock_recv_cb) -> __sock_queue_rcv_skb.
2. bt_sock_recvmsg -> skb_recv_datagram, skb_free_datagram.
An SKB can be queued by the first flow and immediately dequeued and freed by the second flow, therefore the callers of l2cap_reassemble_sdu can't use the SKB after that function returns. However, some places continue accessing struct l2cap_ctrl that resides in the SKB's CB for a short time after l2cap_reassemble_sdu returns, leading to a use-after-free condition (the stack trace is below, line numbers for kernel 5.19.8).
Fix it by keeping a local copy of struct l2cap_ctrl.
BUG: KASAN: use-after-free in l2cap_rx_state_recv (net/bluetooth/l2cap_core.c:6906) bluetooth Read of size 1 at addr ffff88812025f2f0 by task kworker/u17:3/43169
Workqueue: hci0 hci_rx_work [bluetooth] Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) print_report.cold (mm/kasan/report.c:314 mm/kasan/report.c:429) ? l2cap_rx_state_recv (net/bluetooth/l2cap_core.c:6906) bluetooth kasan_report (mm/kasan/report.c:162 mm/kasan/report.c:493) ? l2cap_rx_state_recv (net/bluetooth/l2cap_core.c:6906) bluetooth l2cap_rx_state_recv (net/bluetooth/l2cap_core.c:6906) bluetooth l2cap_rx (net/bluetooth/l2cap_core.c:7236 net/bluetooth/l2cap_core.c:7271) bluetooth ret_from_fork (arch/x86/entry/entry_64.S:306) </TASK>
Allocated by task 43169: kasan_save_stack (mm/kasan/common.c:39) __kasan_slab_alloc (mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:469) kmem_cache_alloc_node (mm/slab.h:750 mm/slub.c:3243 mm/slub.c:3293) __alloc_skb (net/core/skbuff.c:414) l2cap_recv_frag (./include/net/bluetooth/bluetooth.h:425 net/bluetooth/l2cap_core.c:8329) bluetooth l2cap_recv_acldata (net/bluetooth/l2cap_core.c:8442) bluetooth hci_rx_work (net/bluetooth/hci_core.c:3642 net/bluetooth/hci_core.c:3832) bluetooth process_one_work (kernel/workqueue.c:2289) worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2437) kthread (kernel/kthread.c:376) ret_from_fork (arch/x86/entry/entry_64.S:306)
Freed by task 27920: kasan_save_stack (mm/kasan/common.c:39) kasan_set_track (mm/kasan/common.c:45) kasan_set_free_info (mm/kasan/generic.c:372) ____kasan_slab_free (mm/kasan/common.c:368 mm/kasan/common.c:328) slab_free_freelist_hook (mm/slub.c:1780) kmem_cache_free (mm/slub.c:3536 mm/slub.c:3553) skb_free_datagram (./include/net/sock.h:1578 ./include/net/sock.h:1639 net/core/datagram.c:323) bt_sock_recvmsg (net/bluetooth/af_bluetooth.c:295) bluetooth l2cap_sock_recvmsg (net/bluetooth/l2cap_sock.c:1212) bluetooth sock_read_iter (net/socket.c:1087) new_sync_read (./include/linux/fs.h:2052 fs/read_write.c:401) vfs_read (fs/read_write.c:482) ksys_read (fs/read_write.c:620) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
Link: https://lore.kernel.org/linux-bluetooth/CAKErNvoqga1WcmoR3-0875esY6TVWFQDand... Fixes: d2a7ac5d5d3a ("Bluetooth: Add the ERTM receive state machine") Fixes: 4b51dae96731 ("Bluetooth: Add streaming mode receive and incoming packet classifier") Signed-off-by: Maxim Mikityanskiy maxtram95@gmail.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/l2cap_core.c | 48 ++++++++++++++++++++++++++++++++------ 1 file changed, 41 insertions(+), 7 deletions(-)
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 8f1a95b9d320..d12d66f31b47 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -6885,6 +6885,7 @@ static int l2cap_rx_state_recv(struct l2cap_chan *chan, struct l2cap_ctrl *control, struct sk_buff *skb, u8 event) { + struct l2cap_ctrl local_control; int err = 0; bool skb_in_use = false;
@@ -6909,15 +6910,32 @@ static int l2cap_rx_state_recv(struct l2cap_chan *chan, chan->buffer_seq = chan->expected_tx_seq; skb_in_use = true;
+ /* l2cap_reassemble_sdu may free skb, hence invalidate + * control, so make a copy in advance to use it after + * l2cap_reassemble_sdu returns and to avoid the race + * condition, for example: + * + * The current thread calls: + * l2cap_reassemble_sdu + * chan->ops->recv == l2cap_sock_recv_cb + * __sock_queue_rcv_skb + * Another thread calls: + * bt_sock_recvmsg + * skb_recv_datagram + * skb_free_datagram + * Then the current thread tries to access control, but + * it was freed by skb_free_datagram. + */ + local_control = *control; err = l2cap_reassemble_sdu(chan, skb, control); if (err) break;
- if (control->final) { + if (local_control.final) { if (!test_and_clear_bit(CONN_REJ_ACT, &chan->conn_state)) { - control->final = 0; - l2cap_retransmit_all(chan, control); + local_control.final = 0; + l2cap_retransmit_all(chan, &local_control); l2cap_ertm_send(chan); } } @@ -7297,11 +7315,27 @@ static int l2cap_rx(struct l2cap_chan *chan, struct l2cap_ctrl *control, static int l2cap_stream_rx(struct l2cap_chan *chan, struct l2cap_ctrl *control, struct sk_buff *skb) { + /* l2cap_reassemble_sdu may free skb, hence invalidate control, so store + * the txseq field in advance to use it after l2cap_reassemble_sdu + * returns and to avoid the race condition, for example: + * + * The current thread calls: + * l2cap_reassemble_sdu + * chan->ops->recv == l2cap_sock_recv_cb + * __sock_queue_rcv_skb + * Another thread calls: + * bt_sock_recvmsg + * skb_recv_datagram + * skb_free_datagram + * Then the current thread tries to access control, but it was freed by + * skb_free_datagram. + */ + u16 txseq = control->txseq; + BT_DBG("chan %p, control %p, skb %p, state %d", chan, control, skb, chan->rx_state);
- if (l2cap_classify_txseq(chan, control->txseq) == - L2CAP_TXSEQ_EXPECTED) { + if (l2cap_classify_txseq(chan, txseq) == L2CAP_TXSEQ_EXPECTED) { l2cap_pass_to_tx(chan, control);
BT_DBG("buffer_seq %u->%u", chan->buffer_seq, @@ -7324,8 +7358,8 @@ static int l2cap_stream_rx(struct l2cap_chan *chan, struct l2cap_ctrl *control, } }
- chan->last_acked_seq = control->txseq; - chan->expected_tx_seq = __next_seq(chan, control->txseq); + chan->last_acked_seq = txseq; + chan->expected_tx_seq = __next_seq(chan, txseq);
return 0; }
From: Soenke Huster soenke.huster@eknoes.de
[ Upstream commit 160fbcf3bfb93c3c086427f9f4c8bc70f217e9be ]
By using skb_put we ensure that skb->tail is set correctly. Currently, skb->tail is always zero, which leads to errors, such as the following page fault in rfcomm_recv_frame:
BUG: unable to handle page fault for address: ffffed1021de29ff #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page RIP: 0010:rfcomm_run+0x831/0x4040 (net/bluetooth/rfcomm/core.c:1751)
Fixes: afd2daa26c7a ("Bluetooth: Add support for virtio transport driver") Signed-off-by: Soenke Huster soenke.huster@eknoes.de Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/virtio_bt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/bluetooth/virtio_bt.c b/drivers/bluetooth/virtio_bt.c index 076e4942a3f0..612f10456849 100644 --- a/drivers/bluetooth/virtio_bt.c +++ b/drivers/bluetooth/virtio_bt.c @@ -219,7 +219,7 @@ static void virtbt_rx_work(struct work_struct *work) if (!skb) return;
- skb->len = len; + skb_put(skb, len); virtbt_rx_handle(vbt, skb);
if (virtbt_add_inbuf(vbt) < 0)
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit 0d0e2d032811280b927650ff3c15fe5020e82533 ]
When l2cap_recv_frame() is invoked to receive data, and the cid is L2CAP_CID_A2MP, if the channel does not exist, it will create a channel. However, after a channel is created, the hold operation of the channel is not performed. In this case, the value of channel reference counting is 1. As a result, after hci_error_reset() is triggered, l2cap_conn_del() invokes the close hook function of A2MP to release the channel. Then l2cap_chan_unlock(chan) will trigger UAF issue.
The process is as follows: Receive data: l2cap_data_channel() a2mp_channel_create() --->channel ref is 2 l2cap_chan_put() --->channel ref is 1
Triger event: hci_error_reset() hci_dev_do_close() ... l2cap_disconn_cfm() l2cap_conn_del() l2cap_chan_hold() --->channel ref is 2 l2cap_chan_del() --->channel ref is 1 a2mp_chan_close_cb() --->channel ref is 0, release channel l2cap_chan_unlock() --->UAF of channel
The detailed Call Trace is as follows: BUG: KASAN: use-after-free in __mutex_unlock_slowpath+0xa6/0x5e0 Read of size 8 at addr ffff8880160664b8 by task kworker/u11:1/7593 Workqueue: hci0 hci_error_reset Call Trace: <TASK> dump_stack_lvl+0xcd/0x134 print_report.cold+0x2ba/0x719 kasan_report+0xb1/0x1e0 kasan_check_range+0x140/0x190 __mutex_unlock_slowpath+0xa6/0x5e0 l2cap_conn_del+0x404/0x7b0 l2cap_disconn_cfm+0x8c/0xc0 hci_conn_hash_flush+0x11f/0x260 hci_dev_close_sync+0x5f5/0x11f0 hci_dev_do_close+0x2d/0x70 hci_error_reset+0x9e/0x140 process_one_work+0x98a/0x1620 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 </TASK>
Allocated by task 7593: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0xa9/0xd0 l2cap_chan_create+0x40/0x930 amp_mgr_create+0x96/0x990 a2mp_channel_create+0x7d/0x150 l2cap_recv_frame+0x51b8/0x9a70 l2cap_recv_acldata+0xaa3/0xc00 hci_rx_work+0x702/0x1220 process_one_work+0x98a/0x1620 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30
Freed by task 7593: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0x167/0x1c0 slab_free_freelist_hook+0x89/0x1c0 kfree+0xe2/0x580 l2cap_chan_put+0x22a/0x2d0 l2cap_conn_del+0x3fc/0x7b0 l2cap_disconn_cfm+0x8c/0xc0 hci_conn_hash_flush+0x11f/0x260 hci_dev_close_sync+0x5f5/0x11f0 hci_dev_do_close+0x2d/0x70 hci_error_reset+0x9e/0x140 process_one_work+0x98a/0x1620 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30
Last potentially related work creation: kasan_save_stack+0x1e/0x40 __kasan_record_aux_stack+0xbe/0xd0 call_rcu+0x99/0x740 netlink_release+0xe6a/0x1cf0 __sock_release+0xcd/0x280 sock_close+0x18/0x20 __fput+0x27c/0xa90 task_work_run+0xdd/0x1a0 exit_to_user_mode_prepare+0x23c/0x250 syscall_exit_to_user_mode+0x19/0x50 do_syscall_64+0x42/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Second to last potentially related work creation: kasan_save_stack+0x1e/0x40 __kasan_record_aux_stack+0xbe/0xd0 call_rcu+0x99/0x740 netlink_release+0xe6a/0x1cf0 __sock_release+0xcd/0x280 sock_close+0x18/0x20 __fput+0x27c/0xa90 task_work_run+0xdd/0x1a0 exit_to_user_mode_prepare+0x23c/0x250 syscall_exit_to_user_mode+0x19/0x50 do_syscall_64+0x42/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: d0be8347c623 ("Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/l2cap_core.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index d12d66f31b47..a18a1c608ed1 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -7615,6 +7615,7 @@ static void l2cap_data_channel(struct l2cap_conn *conn, u16 cid, return; }
+ l2cap_chan_hold(chan); l2cap_chan_lock(chan); } else { BT_DBG("unknown cid 0x%4.4x", cid);
From: Hawkins Jiawei yin31149@gmail.com
[ Upstream commit 7c9524d929648935bac2bbb4c20437df8f9c3f42 ]
Syzkaller reports a memory leak as follows: ==================================== BUG: memory leak unreferenced object 0xffff88810d81ac00 (size 240): [...] hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff838733d9>] __alloc_skb+0x1f9/0x270 net/core/skbuff.c:418 [<ffffffff833f742f>] alloc_skb include/linux/skbuff.h:1257 [inline] [<ffffffff833f742f>] bt_skb_alloc include/net/bluetooth/bluetooth.h:469 [inline] [<ffffffff833f742f>] vhci_get_user drivers/bluetooth/hci_vhci.c:391 [inline] [<ffffffff833f742f>] vhci_write+0x5f/0x230 drivers/bluetooth/hci_vhci.c:511 [<ffffffff815e398d>] call_write_iter include/linux/fs.h:2192 [inline] [<ffffffff815e398d>] new_sync_write fs/read_write.c:491 [inline] [<ffffffff815e398d>] vfs_write+0x42d/0x540 fs/read_write.c:578 [<ffffffff815e3cdd>] ksys_write+0x9d/0x160 fs/read_write.c:631 [<ffffffff845e0645>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff845e0645>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff84600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd ====================================
HCI core will uses hci_rx_work() to process frame, which is queued to the hdev->rx_q tail in hci_recv_frame() by HCI driver.
Yet the problem is that, HCI core may not free the skb after handling ACL data packets. To be more specific, when start fragment does not contain the L2CAP length, HCI core just copies skb into conn->rx_skb and finishes frame process in l2cap_recv_acldata(), without freeing the skb, which triggers the above memory leak.
This patch solves it by releasing the relative skb, after processing the above case in l2cap_recv_acldata().
Fixes: 4d7ea8ee90e4 ("Bluetooth: L2CAP: Fix handling fragmented length") Link: https://lore.kernel.org/all/0000000000000d0b1905e6aaef64@google.com/ Reported-and-tested-by: syzbot+8f819e36e01022991cfa@syzkaller.appspotmail.com Signed-off-by: Hawkins Jiawei yin31149@gmail.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/l2cap_core.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index a18a1c608ed1..4802583f1071 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -8461,9 +8461,8 @@ void l2cap_recv_acldata(struct hci_conn *hcon, struct sk_buff *skb, u16 flags) * expected length. */ if (skb->len < L2CAP_LEN_SIZE) { - if (l2cap_recv_frag(conn, skb, conn->mtu) < 0) - goto drop; - return; + l2cap_recv_frag(conn, skb, conn->mtu); + break; }
len = get_unaligned_le16(skb->data) + L2CAP_HDR_SIZE; @@ -8507,7 +8506,7 @@ void l2cap_recv_acldata(struct hci_conn *hcon, struct sk_buff *skb, u16 flags)
/* Header still could not be read just continue */ if (conn->rx_skb->len < L2CAP_LEN_SIZE) - return; + break; }
if (skb->len > conn->rx_len) {
From: Gaosheng Cui cuigaosheng1@huawei.com
[ Upstream commit 40e4eb324c59e11fcb927aa46742d28aba6ecb8a ]
Shifting signed 32-bit value by 31 bits is undefined, so changing significant bit to unsigned. The UBSAN warning calltrace like below:
UBSAN: shift-out-of-bounds in drivers/net/phy/mdio_bus.c:586:27 left shift of 1 by 31 places cannot be represented in type 'int' Call Trace: <TASK> dump_stack_lvl+0x7d/0xa5 dump_stack+0x15/0x1b ubsan_epilogue+0xe/0x4e __ubsan_handle_shift_out_of_bounds+0x1e7/0x20c __mdiobus_register+0x49d/0x4e0 fixed_mdio_bus_init+0xd8/0x12d do_one_initcall+0x76/0x430 kernel_init_freeable+0x3b3/0x422 kernel_init+0x24/0x1e0 ret_from_fork+0x1f/0x30 </TASK>
Fixes: 4fd5f812c23c ("phylib: allow incremental scanning of an mii bus") Signed-off-by: Gaosheng Cui cuigaosheng1@huawei.com Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/20221031132645.168421-1-cuigaosheng1@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/mdio_bus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c index 2c0216fe58de..dd7739b5f791 100644 --- a/drivers/net/phy/mdio_bus.c +++ b/drivers/net/phy/mdio_bus.c @@ -577,7 +577,7 @@ int __mdiobus_register(struct mii_bus *bus, struct module *owner) }
for (i = 0; i < PHY_MAX_ADDR; i++) { - if ((bus->phy_mask & (1 << i)) == 0) { + if ((bus->phy_mask & BIT(i)) == 0) { struct phy_device *phydev;
phydev = mdiobus_scan(bus, i);
From: Nick Child nnac123@linux.ibm.com
[ Upstream commit d6dd2fe71153f0ff748bf188bd4af076fe09a0a6 ]
Free the rwi structure in the event that the last rwi in the list processed successfully. The logic in commit 4f408e1fa6e1 ("ibmvnic: retry reset if there are no other resets") introduces an issue that results in a 32 byte memory leak whenever the last rwi in the list gets processed.
Fixes: 4f408e1fa6e1 ("ibmvnic: retry reset if there are no other resets") Signed-off-by: Nick Child nnac123@linux.ibm.com Link: https://lore.kernel.org/r/20221031150642.13356-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ibm/ibmvnic.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 4a070724a8fb..8a92c6a6e764 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2621,19 +2621,19 @@ static void __ibmvnic_reset(struct work_struct *work) rwi = get_next_rwi(adapter);
/* - * If there is another reset queued, free the previous rwi - * and process the new reset even if previous reset failed - * (the previous reset could have failed because of a fail - * over for instance, so process the fail over). - * * If there are no resets queued and the previous reset failed, * the adapter would be in an undefined state. So retry the * previous reset as a hard reset. + * + * Else, free the previous rwi and, if there is another reset + * queued, process the new reset even if previous reset failed + * (the previous reset could have failed because of a fail + * over for instance, so process the fail over). */ - if (rwi) - kfree(tmprwi); - else if (rc) + if (!rwi && rc) rwi = tmprwi; + else + kfree(tmprwi);
if (rwi && (rwi->reset_reason == VNIC_RESET_FAILOVER || rwi->reset_reason == VNIC_RESET_MOBILITY || rc))
From: Liu Peibao liupeibao@loongson.cn
[ Upstream commit 2ae34111fe4eebb69986f6490015b57c88804373 ]
In current code "plat->mdio_node" is always NULL, the mdio support is lost as there is no "mdio_bus_data". The original driver could work as the "mdio" variable is never set to false, which is described in commit <b0e03950dd71> ("stmmac: dwmac-loongson: fix uninitialized variable ......"). And after this commit merged, the "mdio" variable is always false, causing the mdio supoort logic lost.
Fixes: 30bba69d7db4 ("stmmac: pci: Add dwmac support for Loongson") Signed-off-by: Liu Peibao liupeibao@loongson.cn Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/20221101060218.16453-1-liupeibao@loongson.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c index ecf759ee1c9f..220bb454626c 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c @@ -51,7 +51,6 @@ static int loongson_dwmac_probe(struct pci_dev *pdev, const struct pci_device_id struct stmmac_resources res; struct device_node *np; int ret, i, phy_mode; - bool mdio = false;
np = dev_of_node(&pdev->dev);
@@ -69,12 +68,10 @@ static int loongson_dwmac_probe(struct pci_dev *pdev, const struct pci_device_id if (!plat) return -ENOMEM;
+ plat->mdio_node = of_get_child_by_name(np, "mdio"); if (plat->mdio_node) { - dev_err(&pdev->dev, "Found MDIO subnode\n"); - mdio = true; - } + dev_info(&pdev->dev, "Found MDIO subnode\n");
- if (mdio) { plat->mdio_bus_data = devm_kzalloc(&pdev->dev, sizeof(*plat->mdio_bus_data), GFP_KERNEL);
From: Chen Zhongjin chenzhongjin@huawei.com
[ Upstream commit 62ff373da2534534c55debe6c724c7fe14adb97f ]
In smc_init(), register_pernet_subsys(&smc_net_stat_ops) is called without any error handling. If it fails, registering of &smc_net_ops won't be reverted. And if smc_nl_init() fails, &smc_net_stat_ops itself won't be reverted.
This leaves wild ops in subsystem linkedlist and when another module tries to call register_pernet_operations() it triggers page fault:
BUG: unable to handle page fault for address: fffffbfff81b964c RIP: 0010:register_pernet_operations+0x1b9/0x5f0 Call Trace: <TASK> register_pernet_subsys+0x29/0x40 ebtables_init+0x58/0x1000 [ebtables] ...
Fixes: 194730a9beb5 ("net/smc: Make SMC statistics network namespace aware") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Reviewed-by: Tony Lu tonylu@linux.alibaba.com Reviewed-by: Wenjia Zhang wenjia@linux.ibm.com Link: https://lore.kernel.org/r/20221101093722.127223-1-chenzhongjin@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/smc/af_smc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 26f81e2e1dfb..d5ddf283ed8e 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -2744,14 +2744,14 @@ static int __init smc_init(void)
rc = register_pernet_subsys(&smc_net_stat_ops); if (rc) - return rc; + goto out_pernet_subsys;
smc_ism_init(); smc_clc_init();
rc = smc_nl_init(); if (rc) - goto out_pernet_subsys; + goto out_pernet_subsys_stat;
rc = smc_pnet_init(); if (rc) @@ -2829,6 +2829,8 @@ static int __init smc_init(void) smc_pnet_exit(); out_nl: smc_nl_exit(); +out_pernet_subsys_stat: + unregister_pernet_subsys(&smc_net_stat_ops); out_pernet_subsys: unregister_pernet_subsys(&smc_net_ops);
From: Chen Zhongjin chenzhongjin@huawei.com
[ Upstream commit f8017317cb0b279b8ab98b0f3901a2e0ac880dad ]
When IPv6 module gets initialized but hits an error in the middle, kenel panic with:
KASAN: null-ptr-deref in range [0x0000000000000598-0x000000000000059f] CPU: 1 PID: 361 Comm: insmod Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:__neigh_ifdown.isra.0+0x24b/0x370 RSP: 0018:ffff888012677908 EFLAGS: 00000202 ... Call Trace: <TASK> neigh_table_clear+0x94/0x2d0 ndisc_cleanup+0x27/0x40 [ipv6] inet6_init+0x21c/0x2cb [ipv6] do_one_initcall+0xd3/0x4d0 do_init_module+0x1ae/0x670 ... Kernel panic - not syncing: Fatal exception
When ipv6 initialization fails, it will try to cleanup and calls:
neigh_table_clear() neigh_ifdown(tbl, NULL) pneigh_queue_purge(&tbl->proxy_queue, dev_net(dev == NULL)) # dev_net(NULL) triggers null-ptr-deref.
Fix it by passing NULL to pneigh_queue_purge() in neigh_ifdown() if dev is NULL, to make kernel not panic immediately.
Fixes: 66ba215cb513 ("neigh: fix possible DoS due to net iface start/stop loop") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: Denis V. Lunev den@openvz.org Link: https://lore.kernel.org/r/20221101121552.21890-1-chenzhongjin@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/neighbour.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c index b3556c5c1c08..0e22ecb46977 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -372,7 +372,7 @@ static int __neigh_ifdown(struct neigh_table *tbl, struct net_device *dev, write_lock_bh(&tbl->lock); neigh_flush_dev(tbl, dev, skip_perm); pneigh_ifdown_and_unlock(tbl, dev); - pneigh_queue_purge(&tbl->proxy_queue, dev_net(dev)); + pneigh_queue_purge(&tbl->proxy_queue, dev ? dev_net(dev) : NULL); if (skb_queue_empty_lockless(&tbl->proxy_queue)) del_timer_sync(&tbl->proxy_timer); return 0;
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit 768b3c745fe5789f2430bdab02f35a9ad1148d97 ]
During the initialization of ip6_route_net_init_late(), if file ipv6_route or rt6_stats fails to be created, the initialization is successful by default. Therefore, the ipv6_route or rt6_stats file doesn't be found during the remove in ip6_route_net_exit_late(). It will cause WRNING.
The following is the stack information: name 'rt6_stats' WARNING: CPU: 0 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460 Modules linked in: Workqueue: netns cleanup_net RIP: 0010:remove_proc_entry+0x389/0x460 PKRU: 55555554 Call Trace: <TASK> ops_exit_list+0xb0/0x170 cleanup_net+0x4ea/0xb00 process_one_work+0x9bf/0x1710 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 </TASK>
Fixes: cdb1876192db ("[NETNS][IPV6] route6 - create route6 proc files for the namespace") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20221102020610.351330-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/route.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 27274fc3619a..0655fd8c67e9 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -6570,10 +6570,16 @@ static void __net_exit ip6_route_net_exit(struct net *net) static int __net_init ip6_route_net_init_late(struct net *net) { #ifdef CONFIG_PROC_FS - proc_create_net("ipv6_route", 0, net->proc_net, &ipv6_route_seq_ops, - sizeof(struct ipv6_route_iter)); - proc_create_net_single("rt6_stats", 0444, net->proc_net, - rt6_stats_seq_show, NULL); + if (!proc_create_net("ipv6_route", 0, net->proc_net, + &ipv6_route_seq_ops, + sizeof(struct ipv6_route_iter))) + return -ENOMEM; + + if (!proc_create_net_single("rt6_stats", 0444, net->proc_net, + rt6_stats_seq_show, NULL)) { + remove_proc_entry("ipv6_route", net->proc_net); + return -ENOMEM; + } #endif return 0; }
From: Dexuan Cui decui@microsoft.com
[ Upstream commit 466a85336fee6e3b35eb97b8405a28302fd25809 ]
Currently vsock_connectible_has_data() may miss a wakeup operation between vsock_connectible_has_data() == 0 and the prepare_to_wait().
Fix the race by adding the process to the wait queue before checking vsock_connectible_has_data().
Fixes: b3f7fd54881b ("af_vsock: separate wait data loop") Signed-off-by: Dexuan Cui decui@microsoft.com Reviewed-by: Stefano Garzarella sgarzare@redhat.com Reported-by: Frédéric Dalleau frederic.dalleau@docker.com Tested-by: Frédéric Dalleau frederic.dalleau@docker.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/vmw_vsock/af_vsock.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 5d46036f3ad7..dc36a46ce0e7 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1897,8 +1897,11 @@ static int vsock_connectible_wait_data(struct sock *sk, err = 0; transport = vsk->transport;
- while ((data = vsock_connectible_has_data(vsk)) == 0) { + while (1) { prepare_to_wait(sk_sleep(sk), wait, TASK_INTERRUPTIBLE); + data = vsock_connectible_has_data(vsk); + if (data != 0) + break;
if (sk->sk_err != 0 || (sk->sk_shutdown & RCV_SHUTDOWN) ||
From: Daniel Thompson daniel.thompson@linaro.org
[ Upstream commit 088604d37e23e9ec01a501d0e3630bc4f02027a0 ]
Quoting the header comments, IRQF_ONESHOT is "Used by threaded interrupts which need to keep the irq line disabled until the threaded handler has been run.". When applied to an interrupt that doesn't request a threaded irq then IRQF_ONESHOT has a lesser known (undocumented?) side effect, which it to disable the forced threading of irqs. For "normal" kernels if there is no thread_fn then IRQF_ONESHOT is a nop.
In this case disabling forced threading is not appropriate because the driver calls wake_up_all() (via msm_hdmi_i2c_irq) and also directly uses the regular spinlock API for locking (in msm_hdmi_hdcp_irq() ). Neither of these APIs can be called from no-thread interrupt handlers on PREEMPT_RT systems.
Fix this by removing IRQF_ONESHOT.
Signed-off-by: Daniel Thompson daniel.thompson@linaro.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Link: https://lore.kernel.org/r/20220201174734.196718-3-daniel.thompson@linaro.org Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Stable-dep-of: 152d394842bb ("drm/msm/hdmi: fix IRQ lifetime") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/hdmi/hdmi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/hdmi/hdmi.c b/drivers/gpu/drm/msm/hdmi/hdmi.c index e082e01b5e8d..0b756e84d393 100644 --- a/drivers/gpu/drm/msm/hdmi/hdmi.c +++ b/drivers/gpu/drm/msm/hdmi/hdmi.c @@ -331,7 +331,7 @@ int msm_hdmi_modeset_init(struct hdmi *hdmi, }
ret = devm_request_irq(&pdev->dev, hdmi->irq, - msm_hdmi_irq, IRQF_TRIGGER_HIGH | IRQF_ONESHOT, + msm_hdmi_irq, IRQF_TRIGGER_HIGH, "hdmi_isr", hdmi); if (ret < 0) { DRM_DEV_ERROR(dev->dev, "failed to request IRQ%u: %d\n",
From: Johan Hovold johan+linaro@kernel.org
[ Upstream commit 152d394842bb564148e68b92486a87db0bf54859 ]
Device-managed resources allocated post component bind must be tied to the lifetime of the aggregate DRM device or they will not necessarily be released when binding of the aggregate device is deferred.
This is specifically true for the HDMI IRQ, which will otherwise remain requested so that the next bind attempt fails when requesting the IRQ a second time.
Fix this by tying the device-managed lifetime of the HDMI IRQ to the DRM device so that it is released when bind fails.
Fixes: 067fef372c73 ("drm/msm/hdmi: refactor bind/init") Cc: stable@vger.kernel.org # 3.19 Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Tested-by: Kuogee Hsieh quic_khsieh@quicinc.com Reviewed-by: Kuogee Hsieh quic_khsieh@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/502666/ Link: https://lore.kernel.org/r/20220913085320.8577-9-johan+linaro@kernel.org Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/hdmi/hdmi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/hdmi/hdmi.c b/drivers/gpu/drm/msm/hdmi/hdmi.c index 0b756e84d393..f6b09e8eca67 100644 --- a/drivers/gpu/drm/msm/hdmi/hdmi.c +++ b/drivers/gpu/drm/msm/hdmi/hdmi.c @@ -330,7 +330,7 @@ int msm_hdmi_modeset_init(struct hdmi *hdmi, goto fail; }
- ret = devm_request_irq(&pdev->dev, hdmi->irq, + ret = devm_request_irq(dev->dev, hdmi->irq, msm_hdmi_irq, IRQF_TRIGGER_HIGH, "hdmi_isr", hdmi); if (ret < 0) {
From: Helge Deller deller@gmx.de
[ Upstream commit 9c379c65241707e44072139d782bc2dfec9b4ab3 ]
The stifb driver (for Artist/HCRX graphics on PA-RISC) was missing the fillrect function. Tested on a 715/64 PA-RISC machine and in qemu.
Signed-off-by: Helge Deller deller@gmx.de Stable-dep-of: 776d875fd4cb ("fbdev: stifb: Fall back to cfb_fillrect() on 32-bit HCRX cards") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/video/fbdev/stifb.c | 45 +++++++++++++++++++++++++++++++++++-- 1 file changed, 43 insertions(+), 2 deletions(-)
diff --git a/drivers/video/fbdev/stifb.c b/drivers/video/fbdev/stifb.c index b0470f4f595e..7753e586e65a 100644 --- a/drivers/video/fbdev/stifb.c +++ b/drivers/video/fbdev/stifb.c @@ -1041,6 +1041,47 @@ stifb_copyarea(struct fb_info *info, const struct fb_copyarea *area) SETUP_FB(fb); }
+#define ARTIST_VRAM_SIZE 0x000804 +#define ARTIST_VRAM_SRC 0x000808 +#define ARTIST_VRAM_SIZE_TRIGGER_WINFILL 0x000a04 +#define ARTIST_VRAM_DEST_TRIGGER_BLOCKMOVE 0x000b00 +#define ARTIST_SRC_BM_ACCESS 0x018008 +#define ARTIST_FGCOLOR 0x018010 +#define ARTIST_BGCOLOR 0x018014 +#define ARTIST_BITMAP_OP 0x01801c + +static void +stifb_fillrect(struct fb_info *info, const struct fb_fillrect *rect) +{ + struct stifb_info *fb = container_of(info, struct stifb_info, info); + + if (rect->rop != ROP_COPY) + return cfb_fillrect(info, rect); + + SETUP_HW(fb); + + if (fb->info.var.bits_per_pixel == 32) { + WRITE_WORD(0xBBA0A000, fb, REG_10); + + NGLE_REALLY_SET_IMAGE_PLANEMASK(fb, 0xffffffff); + } else { + WRITE_WORD(fb->id == S9000_ID_HCRX ? 0x13a02000 : 0x13a01000, fb, REG_10); + + NGLE_REALLY_SET_IMAGE_PLANEMASK(fb, 0xff); + } + + WRITE_WORD(0x03000300, fb, ARTIST_BITMAP_OP); + WRITE_WORD(0x2ea01000, fb, ARTIST_SRC_BM_ACCESS); + NGLE_QUICK_SET_DST_BM_ACCESS(fb, 0x2ea01000); + NGLE_REALLY_SET_IMAGE_FG_COLOR(fb, rect->color); + WRITE_WORD(0, fb, ARTIST_BGCOLOR); + + NGLE_SET_DSTXY(fb, (rect->dx << 16) | (rect->dy)); + SET_LENXY_START_RECFILL(fb, (rect->width << 16) | (rect->height)); + + SETUP_FB(fb); +} + static void __init stifb_init_display(struct stifb_info *fb) { @@ -1105,7 +1146,7 @@ static const struct fb_ops stifb_ops = { .owner = THIS_MODULE, .fb_setcolreg = stifb_setcolreg, .fb_blank = stifb_blank, - .fb_fillrect = cfb_fillrect, + .fb_fillrect = stifb_fillrect, .fb_copyarea = stifb_copyarea, .fb_imageblit = cfb_imageblit, }; @@ -1297,7 +1338,7 @@ static int __init stifb_init_fb(struct sti_struct *sti, int bpp_pref) goto out_err0; } info->screen_size = fix->smem_len; - info->flags = FBINFO_DEFAULT | FBINFO_HWACCEL_COPYAREA; + info->flags = FBINFO_HWACCEL_COPYAREA | FBINFO_HWACCEL_FILLRECT; info->pseudo_palette = &fb->pseudo_palette;
/* This has to be done !!! */
From: Helge Deller deller@gmx.de
[ Upstream commit 776d875fd4cbb3884860ea7f63c3958f02b0c80e ]
When the text console is scrolling text upwards it calls the fillrect() function to empty the new line. The current implementation doesn't seem to work correctly on HCRX cards in 32-bit mode and leave garbage in that line instead. Fix it by falling back to standard cfb_fillrect() in that case.
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/video/fbdev/stifb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/stifb.c b/drivers/video/fbdev/stifb.c index 7753e586e65a..3feb6e40d56d 100644 --- a/drivers/video/fbdev/stifb.c +++ b/drivers/video/fbdev/stifb.c @@ -1055,7 +1055,8 @@ stifb_fillrect(struct fb_info *info, const struct fb_fillrect *rect) { struct stifb_info *fb = container_of(info, struct stifb_info, info);
- if (rect->rop != ROP_COPY) + if (rect->rop != ROP_COPY || + (fb->id == S9000_ID_HCRX && fb->info.var.bits_per_pixel == 32)) return cfb_fillrect(info, rect);
SETUP_HW(fb);
From: Rafał Miłecki rafal@milecki.pl
[ Upstream commit 4c38eded807043f40f4dc49da6df097f9dcac393 ]
mtd_read() gets called with offset + 0x8000 as argument so use the same value in pr_err().
Signed-off-by: Rafał Miłecki rafal@milecki.pl Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/20220317114316.29827-1-zajec5@gmail.com Stable-dep-of: 05e258c6ec66 ("mtd: parsers: bcm47xxpart: Fix halfblock reads") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mtd/parsers/bcm47xxpart.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/parsers/bcm47xxpart.c b/drivers/mtd/parsers/bcm47xxpart.c index 6012a10f10c8..50fcf4c2174b 100644 --- a/drivers/mtd/parsers/bcm47xxpart.c +++ b/drivers/mtd/parsers/bcm47xxpart.c @@ -237,7 +237,7 @@ static int bcm47xxpart_parse(struct mtd_info *master, (uint8_t *)buf); if (err && !mtd_is_bitflip(err)) { pr_err("mtd_read error while parsing (offset: 0x%X): %d\n", - offset, err); + offset + 0x8000, err); continue; }
From: Linus Walleij linus.walleij@linaro.org
[ Upstream commit 05e258c6ec669d6d18c494ea03d35962d6f5b545 ]
There is some code in the parser that tries to read 0x8000 bytes into a block to "read in the middle" of the block. Well that only works if the block is also 0x10000 bytes all the time, else we get these parse errors as we reach the end of the flash:
spi-nor spi0.0: mx25l1606e (2048 Kbytes) mtd_read error while parsing (offset: 0x200000): -22 mtd_read error while parsing (offset: 0x201000): -22 (...)
Fix the code to do what I think was intended.
Cc: stable@vger.kernel.org Fixes: f0501e81fbaa ("mtd: bcm47xxpart: alternative MAGIC for board_data partition") Cc: Rafał Miłecki zajec5@gmail.com Cc: Florian Fainelli f.fainelli@gmail.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/20221018091129.280026-1-linus.walleij@lina... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mtd/parsers/bcm47xxpart.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/parsers/bcm47xxpart.c b/drivers/mtd/parsers/bcm47xxpart.c index 50fcf4c2174b..13daf9bffd08 100644 --- a/drivers/mtd/parsers/bcm47xxpart.c +++ b/drivers/mtd/parsers/bcm47xxpart.c @@ -233,11 +233,11 @@ static int bcm47xxpart_parse(struct mtd_info *master, }
/* Read middle of the block */ - err = mtd_read(master, offset + 0x8000, 0x4, &bytes_read, + err = mtd_read(master, offset + (blocksize / 2), 0x4, &bytes_read, (uint8_t *)buf); if (err && !mtd_is_bitflip(err)) { pr_err("mtd_read error while parsing (offset: 0x%X): %d\n", - offset + 0x8000, err); + offset + (blocksize / 2), err); continue; }
From: Heiko Carstens hca@linux.ibm.com
[ Upstream commit 4e1b5a86a5edfbefc9396d41b0fc1a2ebd0101b6 ]
For some exception types the instruction address points behind the instruction that caused the exception. Take that into account and add the missing exception table entries.
Cc: stable@vger.kernel.org Reviewed-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/lib/uaccess.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/s390/lib/uaccess.c b/arch/s390/lib/uaccess.c index a596e69d3c47..0b012ce0921c 100644 --- a/arch/s390/lib/uaccess.c +++ b/arch/s390/lib/uaccess.c @@ -212,7 +212,7 @@ static inline unsigned long clear_user_mvcos(void __user *to, unsigned long size asm volatile( " llilh 0,%[spec]\n" "0: .insn ss,0xc80000000000,0(%0,%1),0(%4),0\n" - " jz 4f\n" + "6: jz 4f\n" "1: algr %0,%2\n" " slgr %1,%2\n" " j 0b\n" @@ -222,11 +222,11 @@ static inline unsigned long clear_user_mvcos(void __user *to, unsigned long size " clgr %0,%3\n" /* copy crosses next page boundary? */ " jnh 5f\n" "3: .insn ss,0xc80000000000,0(%3,%1),0(%4),0\n" - " slgr %0,%3\n" + "7: slgr %0,%3\n" " j 5f\n" "4: slgr %0,%0\n" "5:\n" - EX_TABLE(0b,2b) EX_TABLE(3b,5b) + EX_TABLE(0b,2b) EX_TABLE(6b,2b) EX_TABLE(3b,5b) EX_TABLE(7b,5b) : "+a" (size), "+a" (to), "+a" (tmp1), "=a" (tmp2) : "a" (empty_zero_page), [spec] "K" (0x81UL) : "cc", "memory", "0");
From: Peter Oberparleiter oberpar@linux.ibm.com
[ Upstream commit aa127a069ef312aca02b730d5137e1778d0c3ba7 ]
This patch enhances the kernel image adding a trailer as required for secure boot by future firmware versions.
Cc: stable@vger.kernel.org # 5.2+ Signed-off-by: Peter Oberparleiter oberpar@linux.ibm.com Reviewed-by: Sven Schnelle svens@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/boot/compressed/vmlinux.lds.S | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/s390/boot/compressed/vmlinux.lds.S b/arch/s390/boot/compressed/vmlinux.lds.S index 918e05137d4c..1686a852534f 100644 --- a/arch/s390/boot/compressed/vmlinux.lds.S +++ b/arch/s390/boot/compressed/vmlinux.lds.S @@ -93,8 +93,17 @@ SECTIONS _compressed_start = .; *(.vmlinux.bin.compressed) _compressed_end = .; - FILL(0xff); - . = ALIGN(4096); + } + +#define SB_TRAILER_SIZE 32 + /* Trailer needed for Secure Boot */ + . += SB_TRAILER_SIZE; /* make sure .sb.trailer does not overwrite the previous section */ + . = ALIGN(4096) - SB_TRAILER_SIZE; + .sb.trailer : { + QUAD(0) + QUAD(0) + QUAD(0) + QUAD(0x000000207a49504c) } _end = .;
From: Vineeth Vijayan vneethv@linux.ibm.com
[ Upstream commit 0c3812c347bfb0dc213556a195e79850c55702f5 ]
cdev->online for the purge function must not be checked for the non-IO subchannel type. Make sure that we are deriving the cdev only from sch-type SUBCHANNEL_TYPE_IO.
Signed-off-by: Vineeth Vijayan vneethv@linux.ibm.com Reviewed-by: Peter Oberparleiter oberpar@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Stable-dep-of: 1b6074112742 ("s390/cio: fix out-of-bounds access on cio_ignore free") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/s390/cio/css.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/s390/cio/css.c b/drivers/s390/cio/css.c index c27809792609..ce9e7517430f 100644 --- a/drivers/s390/cio/css.c +++ b/drivers/s390/cio/css.c @@ -792,10 +792,13 @@ static int __unset_online(struct device *dev, void *data) { struct idset *set = data; struct subchannel *sch = to_subchannel(dev); - struct ccw_device *cdev = sch_get_cdev(sch); + struct ccw_device *cdev;
- if (cdev && cdev->online) - idset_sch_del(set, sch->schid); + if (sch->st == SUBCHANNEL_TYPE_IO) { + cdev = sch_get_cdev(sch); + if (cdev && cdev->online) + idset_sch_del(set, sch->schid); + }
return 0; }
From: Peter Oberparleiter oberpar@linux.ibm.com
[ Upstream commit 1b6074112742f65ece71b0f299ca5a6a887d2db6 ]
The channel-subsystem-driver scans for newly available devices whenever device-IDs are removed from the cio_ignore list using a command such as:
echo free >/proc/cio_ignore
Since an I/O device scan might interfer with running I/Os, commit 172da89ed0ea ("s390/cio: avoid excessive path-verification requests") introduced an optimization to exclude online devices from the scan.
The newly added check for online devices incorrectly assumes that an I/O-subchannel's drvdata points to a struct io_subchannel_private. For devices that are bound to a non-default I/O subchannel driver, such as the vfio_ccw driver, this results in an out-of-bounds read access during each scan.
Fix this by changing the scan logic to rely on a driver-independent online indication. For this we can use struct subchannel->config.ena, which is the driver's requested subchannel-enabled state. Since I/Os can only be started on enabled subchannels, this matches the intent of the original optimization of not scanning devices where I/O might be running.
Fixes: 172da89ed0ea ("s390/cio: avoid excessive path-verification requests") Fixes: 0c3812c347bf ("s390/cio: derive cdev information only for IO-subchannels") Cc: stable@vger.kernel.org # v5.15 Reported-by: Alexander Egorenkov egorenar@linux.ibm.com Reviewed-by: Vineeth Vijayan vneethv@linux.ibm.com Signed-off-by: Peter Oberparleiter oberpar@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/s390/cio/css.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/s390/cio/css.c b/drivers/s390/cio/css.c index ce9e7517430f..2ba9e0135565 100644 --- a/drivers/s390/cio/css.c +++ b/drivers/s390/cio/css.c @@ -792,13 +792,9 @@ static int __unset_online(struct device *dev, void *data) { struct idset *set = data; struct subchannel *sch = to_subchannel(dev); - struct ccw_device *cdev;
- if (sch->st == SUBCHANNEL_TYPE_IO) { - cdev = sch_get_cdev(sch); - if (cdev && cdev->online) - idset_sch_del(set, sch->schid); - } + if (sch->st == SUBCHANNEL_TYPE_IO && sch->config.ena) + idset_sch_del(set, sch->schid);
return 0; }
From: Laurent Pinchart laurent.pinchart@ideasonboard.com
[ Upstream commit 711d91497e203b058cf0a08c0f7d41c04efbde76 ]
The rkisp1_csm_config() function takes a pointer to the rkisp1_params structure which contains the quantization value. There's no need to pass it separately to the function. Drop it from the function parameters.
Signed-off-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Reviewed-by: Dafna Hirschfeld dafna@fastmail.com Reviewed-by: Paul Elder paul.elder@ideasonboard.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/platform/rockchip/rkisp1/rkisp1-params.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-params.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-params.c index 8fa5b0abf1f9..8461e88c1288 100644 --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-params.c +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-params.c @@ -751,7 +751,7 @@ static void rkisp1_ie_enable(struct rkisp1_params *params, bool en) } }
-static void rkisp1_csm_config(struct rkisp1_params *params, bool full_range) +static void rkisp1_csm_config(struct rkisp1_params *params) { static const u16 full_range_coeff[] = { 0x0026, 0x004b, 0x000f, @@ -765,7 +765,7 @@ static void rkisp1_csm_config(struct rkisp1_params *params, bool full_range) }; unsigned int i;
- if (full_range) { + if (params->quantization == V4L2_QUANTIZATION_FULL_RANGE) { for (i = 0; i < ARRAY_SIZE(full_range_coeff); i++) rkisp1_write(params->rkisp1, full_range_coeff[i], RKISP1_CIF_ISP_CC_COEFF_0 + i * 4); @@ -1235,11 +1235,7 @@ static void rkisp1_params_config_parameter(struct rkisp1_params *params) rkisp1_param_set_bits(params, RKISP1_CIF_ISP_HIST_PROP, rkisp1_hst_params_default_config.mode);
- /* set the range */ - if (params->quantization == V4L2_QUANTIZATION_FULL_RANGE) - rkisp1_csm_config(params, true); - else - rkisp1_csm_config(params, false); + rkisp1_csm_config(params);
spin_lock_irq(¶ms->config_lock);
From: Laurent Pinchart laurent.pinchart@ideasonboard.com
[ Upstream commit 83b9296e399367862845d3b19984444fc756bd61 ]
Initialize the four color space fields on the sink and source video pads of the resizer in the .init_cfg() operation. The resizer can't perform any color space conversion, so set the sink and source color spaces to the same defaults, which match the ISP source video pad default.
Signed-off-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Reviewed-by: Paul Elder paul.elder@ideasonboard.com Reviewed-by: Dafna Hirschfeld dafna@fastmail.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/platform/rockchip/rkisp1/rkisp1-resizer.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-resizer.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-resizer.c index 2070f4b06705..a166ede40967 100644 --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-resizer.c +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-resizer.c @@ -510,6 +510,10 @@ static int rkisp1_rsz_init_config(struct v4l2_subdev *sd, sink_fmt->height = RKISP1_DEFAULT_HEIGHT; sink_fmt->field = V4L2_FIELD_NONE; sink_fmt->code = RKISP1_DEF_FMT; + sink_fmt->colorspace = V4L2_COLORSPACE_SRGB; + sink_fmt->xfer_func = V4L2_XFER_FUNC_SRGB; + sink_fmt->ycbcr_enc = V4L2_YCBCR_ENC_601; + sink_fmt->quantization = V4L2_QUANTIZATION_LIM_RANGE;
sink_crop = v4l2_subdev_get_try_crop(sd, sd_state, RKISP1_RSZ_PAD_SINK);
From: Laurent Pinchart laurent.pinchart@ideasonboard.com
[ Upstream commit 4c3501f13e8e60f6e7e7308c77ac4404e1007c18 ]
The rkisp1_lsc_config() function incorrectly uses the RKISP1_CIF_ISP_LSC_SECT_SIZE() macro for the gradient registers. Replace it with the correct macro, and rename it from RKISP1_CIF_ISP_LSC_GRAD_SIZE() to RKISP1_CIF_ISP_LSC_SECT_GRAD() as the corresponding registers store the gradients for each sector, not a size. This doesn't cause any functional change as the two macros are defined identically (the size and gradient registers store fields in the same number of bits at the same positions).
Signed-off-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Reviewed-by: Dafna Hirschfeld dafna@fastmail.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/platform/rockchip/rkisp1/rkisp1-params.c | 4 ++-- drivers/media/platform/rockchip/rkisp1/rkisp1-regs.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-params.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-params.c index 8461e88c1288..e0e7d0b4ea04 100644 --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-params.c +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-params.c @@ -275,7 +275,7 @@ static void rkisp1_lsc_config(struct rkisp1_params *params, RKISP1_CIF_ISP_LSC_XSIZE_01 + i * 4);
/* program x grad tables */ - data = RKISP1_CIF_ISP_LSC_SECT_SIZE(arg->x_grad_tbl[i * 2], + data = RKISP1_CIF_ISP_LSC_SECT_GRAD(arg->x_grad_tbl[i * 2], arg->x_grad_tbl[i * 2 + 1]); rkisp1_write(params->rkisp1, data, RKISP1_CIF_ISP_LSC_XGRAD_01 + i * 4); @@ -287,7 +287,7 @@ static void rkisp1_lsc_config(struct rkisp1_params *params, RKISP1_CIF_ISP_LSC_YSIZE_01 + i * 4);
/* program y grad tables */ - data = RKISP1_CIF_ISP_LSC_SECT_SIZE(arg->y_grad_tbl[i * 2], + data = RKISP1_CIF_ISP_LSC_SECT_GRAD(arg->y_grad_tbl[i * 2], arg->y_grad_tbl[i * 2 + 1]); rkisp1_write(params->rkisp1, data, RKISP1_CIF_ISP_LSC_YGRAD_01 + i * 4); diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-regs.h b/drivers/media/platform/rockchip/rkisp1/rkisp1-regs.h index fa33080f51db..f584ccfe0286 100644 --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-regs.h +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-regs.h @@ -480,7 +480,7 @@ (((v0) & 0xFFF) | (((v1) & 0xFFF) << 12)) #define RKISP1_CIF_ISP_LSC_SECT_SIZE(v0, v1) \ (((v0) & 0xFFF) | (((v1) & 0xFFF) << 16)) -#define RKISP1_CIF_ISP_LSC_GRAD_SIZE(v0, v1) \ +#define RKISP1_CIF_ISP_LSC_SECT_GRAD(v0, v1) \ (((v0) & 0xFFF) | (((v1) & 0xFFF) << 16))
/* LSC: ISP_LSC_TABLE_SEL */
From: Laurent Pinchart laurent.pinchart@ideasonboard.com
[ Upstream commit c53e3a049f35978a150526671587fd46b1ae7ca1 ]
The local sd_fmt variable in rkisp1_capture_link_validate() has uninitialized fields, which causes random failures when calling the subdev .get_fmt() operation. Fix it by initializing the variable when declaring it, which zeros all other fields.
Signed-off-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Reviewed-by: Paul Elder paul.elder@ideasonboard.com Reviewed-by: Dafna Hirschfeld dafna@fastmail.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c index 41988eb0ec0a..0f980f68058c 100644 --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c @@ -1270,11 +1270,12 @@ static int rkisp1_capture_link_validate(struct media_link *link) struct rkisp1_capture *cap = video_get_drvdata(vdev); const struct rkisp1_capture_fmt_cfg *fmt = rkisp1_find_fmt_cfg(cap, cap->pix.fmt.pixelformat); - struct v4l2_subdev_format sd_fmt; + struct v4l2_subdev_format sd_fmt = { + .which = V4L2_SUBDEV_FORMAT_ACTIVE, + .pad = link->source->index, + }; int ret;
- sd_fmt.which = V4L2_SUBDEV_FORMAT_ACTIVE; - sd_fmt.pad = link->source->index; ret = v4l2_subdev_call(sd, pad, get_fmt, NULL, &sd_fmt); if (ret) return ret;
From: Hans Verkuil hverkuil-cisco@xs4all.nl
[ Upstream commit 93f65ce036863893c164ca410938e0968964b26c ]
I expect that the hardware will have limited this to 16, but just in case it hasn't, check for this corner case.
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/cec/platform/s5p/s5p_cec.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/media/cec/platform/s5p/s5p_cec.c b/drivers/media/cec/platform/s5p/s5p_cec.c index 028a09a7531e..102f1af01000 100644 --- a/drivers/media/cec/platform/s5p/s5p_cec.c +++ b/drivers/media/cec/platform/s5p/s5p_cec.c @@ -115,6 +115,8 @@ static irqreturn_t s5p_cec_irq_handler(int irq, void *priv) dev_dbg(cec->dev, "Buffer overrun (worker did not process previous message)\n"); cec->rx = STATE_BUSY; cec->msg.len = status >> 24; + if (cec->msg.len > CEC_MAX_MSG_SIZE) + cec->msg.len = CEC_MAX_MSG_SIZE; cec->msg.rx_status = CEC_RX_STATUS_OK; s5p_cec_get_rx_buf(cec, cec->msg.len, cec->msg.msg);
From: Hans Verkuil hverkuil-cisco@xs4all.nl
[ Upstream commit 2dc73b48665411a08c4e5f0f823dea8510761603 ]
I expect that the hardware will have limited this to 16, but just in case it hasn't, check for this corner case.
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/cec/platform/cros-ec/cros-ec-cec.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/media/cec/platform/cros-ec/cros-ec-cec.c b/drivers/media/cec/platform/cros-ec/cros-ec-cec.c index 2d95e16cd248..f66699d5dc66 100644 --- a/drivers/media/cec/platform/cros-ec/cros-ec-cec.c +++ b/drivers/media/cec/platform/cros-ec/cros-ec-cec.c @@ -44,6 +44,8 @@ static void handle_cec_message(struct cros_ec_cec *cros_ec_cec) uint8_t *cec_message = cros_ec->event_data.data.cec_message; unsigned int len = cros_ec->event_size;
+ if (len > CEC_MAX_MSG_SIZE) + len = CEC_MAX_MSG_SIZE; cros_ec_cec->rx_msg.len = len; memcpy(cros_ec_cec->rx_msg.msg, cec_message, len);
From: Hans Verkuil hverkuil-cisco@xs4all.nl
[ Upstream commit 20694e96ca089ce6693c2348f8f628ee621e4e74 ]
Fix a compiler warning:
drivers/media/dvb-frontends/drxk_hard.c: In function 'drxk_read_ucblocks': drivers/media/dvb-frontends/drxk_hard.c:6673:21: warning: 'err' may be used uninitialized [-Wmaybe-uninitialized] 6673 | *ucblocks = (u32) err; | ^~~~~~~~~ drivers/media/dvb-frontends/drxk_hard.c:6663:13: note: 'err' was declared here 6663 | u16 err; | ^~~
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/dvb-frontends/drxk_hard.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/media/dvb-frontends/drxk_hard.c b/drivers/media/dvb-frontends/drxk_hard.c index d7fc2595f15b..efe92eef67db 100644 --- a/drivers/media/dvb-frontends/drxk_hard.c +++ b/drivers/media/dvb-frontends/drxk_hard.c @@ -6673,7 +6673,7 @@ static int drxk_read_snr(struct dvb_frontend *fe, u16 *snr) static int drxk_read_ucblocks(struct dvb_frontend *fe, u32 *ucblocks) { struct drxk_state *state = fe->demodulator_priv; - u16 err; + u16 err = 0;
dprintk(1, "\n");
From: Hangyu Hua hbh25y@gmail.com
[ Upstream commit 7718999356234d9cc6a11b4641bb773928f1390f ]
v4l2_device_unregister need to be called to put the refcount got by v4l2_device_register when vdec_probe fails or vdec_remove is called.
Signed-off-by: Hangyu Hua hbh25y@gmail.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/media/meson/vdec/vdec.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/staging/media/meson/vdec/vdec.c b/drivers/staging/media/meson/vdec/vdec.c index e51d69c4729d..040ed56eb24f 100644 --- a/drivers/staging/media/meson/vdec/vdec.c +++ b/drivers/staging/media/meson/vdec/vdec.c @@ -1105,6 +1105,7 @@ static int vdec_probe(struct platform_device *pdev)
err_vdev_release: video_device_release(vdev); + v4l2_device_unregister(&core->v4l2_dev); return ret; }
@@ -1113,6 +1114,7 @@ static int vdec_remove(struct platform_device *pdev) struct amvdec_core *core = platform_get_drvdata(pdev);
video_unregister_device(core->vdev_dec); + v4l2_device_unregister(&core->v4l2_dev);
return 0; }
From: Sakari Ailus sakari.ailus@linux.intel.com
[ Upstream commit 2ba3e38517f5a4ebf9c997168079dca01b7f9fc6 ]
The state argument for the functions for obtaining various parts of the state is NULL if it is called by drivers for active state. Fail graciously in that case instead of dereferencing a NULL pointer.
Suggested-by: Bingbu Cao bingbu.cao@intel.com Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Reviewed-by: Tomi Valkeinen tomi.valkeinen@ideasonboard.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/media/v4l2-subdev.h | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/include/media/v4l2-subdev.h b/include/media/v4l2-subdev.h index 95ec18c2f49c..9a476f902c42 100644 --- a/include/media/v4l2-subdev.h +++ b/include/media/v4l2-subdev.h @@ -995,6 +995,8 @@ v4l2_subdev_get_try_format(struct v4l2_subdev *sd, struct v4l2_subdev_state *state, unsigned int pad) { + if (WARN_ON(!state)) + return NULL; if (WARN_ON(pad >= sd->entity.num_pads)) pad = 0; return &state->pads[pad].try_fmt; @@ -1013,6 +1015,8 @@ v4l2_subdev_get_try_crop(struct v4l2_subdev *sd, struct v4l2_subdev_state *state, unsigned int pad) { + if (WARN_ON(!state)) + return NULL; if (WARN_ON(pad >= sd->entity.num_pads)) pad = 0; return &state->pads[pad].try_crop; @@ -1031,6 +1035,8 @@ v4l2_subdev_get_try_compose(struct v4l2_subdev *sd, struct v4l2_subdev_state *state, unsigned int pad) { + if (WARN_ON(!state)) + return NULL; if (WARN_ON(pad >= sd->entity.num_pads)) pad = 0; return &state->pads[pad].try_compose;
From: Ashish Kalra ashish.kalra@amd.com
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causing subsequent vmalloc() to fail.
The overflow happens in ghes_estatus_pool_init() when calculating len during execution of the statement below as both multiplication operands here are signed int:
len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);
The following call trace is observed because of this bug:
[ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1 [ 9.317131] Call Trace: [ 9.317134] <TASK> [ 9.317137] dump_stack_lvl+0x49/0x5f [ 9.317145] dump_stack+0x10/0x12 [ 9.317146] warn_alloc.cold+0x7b/0xdf [ 9.317150] ? __device_attach+0x16a/0x1b0 [ 9.317155] __vmalloc_node_range+0x702/0x740 [ 9.317160] ? device_add+0x17f/0x920 [ 9.317164] ? dev_set_name+0x53/0x70 [ 9.317166] ? platform_device_add+0xf9/0x240 [ 9.317168] __vmalloc_node+0x49/0x50 [ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0 [ 9.317176] vmalloc+0x21/0x30 [ 9.317177] ghes_estatus_pool_init+0x43/0xa0 [ 9.317179] acpi_hest_init+0x129/0x19c [ 9.317185] acpi_init+0x434/0x4a4 [ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a [ 9.317190] do_one_initcall+0x48/0x200 [ 9.317195] kernel_init_freeable+0x221/0x284 [ 9.317200] ? rest_init+0xe0/0xe0 [ 9.317204] kernel_init+0x1a/0x130 [ 9.317205] ret_from_fork+0x22/0x30 [ 9.317208] </TASK>
Signed-off-by: Ashish Kalra ashish.kalra@amd.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/apei/ghes.c | 2 +- include/acpi/ghes.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index d490670f8d55..8678e162181f 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -163,7 +163,7 @@ static void ghes_unmap(void __iomem *vaddr, enum fixed_addresses fixmap_idx) clear_fixmap(fixmap_idx); }
-int ghes_estatus_pool_init(int num_ghes) +int ghes_estatus_pool_init(unsigned int num_ghes) { unsigned long addr, len; int rc; diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h index 34fb3431a8f3..292a5c40bd0c 100644 --- a/include/acpi/ghes.h +++ b/include/acpi/ghes.h @@ -71,7 +71,7 @@ int ghes_register_vendor_record_notifier(struct notifier_block *nb); void ghes_unregister_vendor_record_notifier(struct notifier_block *nb); #endif
-int ghes_estatus_pool_init(int num_ghes); +int ghes_estatus_pool_init(unsigned int num_ghes);
/* From drivers/edac/ghes_edac.c */
From: Uday Shankar ushankar@purestorage.com
[ Upstream commit 2331ce6126be8864b39490e705286b66e2344aac ]
Userspace can currently write to sysfs to transition sdev_state to RUNNING or OFFLINE from any source state. This causes issues because proper transitioning out of some states involves steps besides just changing sdev_state, so allowing userspace to change sdev_state regardless of the source state can result in inconsistencies; e.g. with ISCSI we can end up with sdev_state == SDEV_RUNNING while the device queue is quiesced. Any task attempting I/O on the device will then hang, and in more recent kernels, iscsid will hang as well.
More detail about this bug is provided in my first attempt:
https://groups.google.com/g/open-iscsi/c/PNKca4HgPDs/m/CXaDkntOAQAJ
Link: https://lore.kernel.org/r/20220924000241.2967323-1-ushankar@purestorage.com Signed-off-by: Uday Shankar ushankar@purestorage.com Suggested-by: Mike Christie michael.christie@oracle.com Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_sysfs.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 920aae661c5b..774864b54b97 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -816,6 +816,14 @@ store_state_field(struct device *dev, struct device_attribute *attr, }
mutex_lock(&sdev->state_mutex); + switch (sdev->sdev_state) { + case SDEV_RUNNING: + case SDEV_OFFLINE: + break; + default: + mutex_unlock(&sdev->state_mutex); + return -EINVAL; + } if (sdev->sdev_state == SDEV_RUNNING && state == SDEV_RUNNING) { ret = 0; } else {
From: Samuel Bailey samuel.bailey1@gmail.com
[ Upstream commit 79425b297f56bd481c6e97700a9a4e44c7bcfa35 ]
The MadCatz variant of the MMO7 mouse has the ID 0738:1713 and the same quirks as the Saitek variant.
Signed-off-by: Samuel Bailey samuel.bailey1@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-ids.h | 1 + drivers/hid/hid-quirks.c | 1 + drivers/hid/hid-saitek.c | 2 ++ 3 files changed, 4 insertions(+)
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index a5aac9cc2075..c8a313c84a57 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -845,6 +845,7 @@ #define USB_DEVICE_ID_MADCATZ_BEATPAD 0x4540 #define USB_DEVICE_ID_MADCATZ_RAT5 0x1705 #define USB_DEVICE_ID_MADCATZ_RAT9 0x1709 +#define USB_DEVICE_ID_MADCATZ_MMO7 0x1713
#define USB_VENDOR_ID_MCC 0x09db #define USB_DEVICE_ID_MCC_PMD1024LS 0x0076 diff --git a/drivers/hid/hid-quirks.c b/drivers/hid/hid-quirks.c index 544d1197aca4..8d36cb7551cf 100644 --- a/drivers/hid/hid-quirks.c +++ b/drivers/hid/hid-quirks.c @@ -608,6 +608,7 @@ static const struct hid_device_id hid_have_special_driver[] = { { HID_USB_DEVICE(USB_VENDOR_ID_SAITEK, USB_DEVICE_ID_SAITEK_MMO7) }, { HID_USB_DEVICE(USB_VENDOR_ID_MADCATZ, USB_DEVICE_ID_MADCATZ_RAT5) }, { HID_USB_DEVICE(USB_VENDOR_ID_MADCATZ, USB_DEVICE_ID_MADCATZ_RAT9) }, + { HID_USB_DEVICE(USB_VENDOR_ID_MADCATZ, USB_DEVICE_ID_MADCATZ_MMO7) }, #endif #if IS_ENABLED(CONFIG_HID_SAMSUNG) { HID_USB_DEVICE(USB_VENDOR_ID_SAMSUNG, USB_DEVICE_ID_SAMSUNG_IR_REMOTE) }, diff --git a/drivers/hid/hid-saitek.c b/drivers/hid/hid-saitek.c index c7bf14c01960..b84e975977c4 100644 --- a/drivers/hid/hid-saitek.c +++ b/drivers/hid/hid-saitek.c @@ -187,6 +187,8 @@ static const struct hid_device_id saitek_devices[] = { .driver_data = SAITEK_RELEASE_MODE_RAT7 }, { HID_USB_DEVICE(USB_VENDOR_ID_SAITEK, USB_DEVICE_ID_SAITEK_MMO7), .driver_data = SAITEK_RELEASE_MODE_MMO7 }, + { HID_USB_DEVICE(USB_VENDOR_ID_MADCATZ, USB_DEVICE_ID_MADCATZ_MMO7), + .driver_data = SAITEK_RELEASE_MODE_MMO7 }, { } };
From: Danijel Slivka danijel.slivka@amd.com
[ Upstream commit 65f8682b9aaae20c2cdee993e6fe52374ad513c9 ]
For asic with VF MMIO access protection avoid using CPU for VM table updates. CPU pagetable updates have issues with HDP flush as VF MMIO access protection blocks write to mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register during sriov runtime.
v3: introduce virtualization capability flag AMDGPU_VF_MMIO_ACCESS_PROTECT which indicates that VF MMIO write access is not allowed in sriov runtime
Signed-off-by: Danijel Slivka danijel.slivka@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 6 ++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 4 ++++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 +++++- 3 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c index a0803425b456..b508126a9738 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c @@ -706,6 +706,12 @@ void amdgpu_detect_virtualization(struct amdgpu_device *adev) adev->virt.caps |= AMDGPU_PASSTHROUGH_MODE; }
+ if (amdgpu_sriov_vf(adev) && adev->asic_type == CHIP_SIENNA_CICHLID) + /* VF MMIO access (except mailbox range) from CPU + * will be blocked during sriov runtime + */ + adev->virt.caps |= AMDGPU_VF_MMIO_ACCESS_PROTECT; + /* we have the ability to check now */ if (amdgpu_sriov_vf(adev)) { switch (adev->asic_type) { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h index 9adfb8d63280..ce31d4fdee93 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h @@ -31,6 +31,7 @@ #define AMDGPU_SRIOV_CAPS_IS_VF (1 << 2) /* this GPU is a virtual function */ #define AMDGPU_PASSTHROUGH_MODE (1 << 3) /* thw whole GPU is pass through for VM */ #define AMDGPU_SRIOV_CAPS_RUNTIME (1 << 4) /* is out of full access mode */ +#define AMDGPU_VF_MMIO_ACCESS_PROTECT (1 << 5) /* MMIO write access is not allowed in sriov runtime */
/* all asic after AI use this offset */ #define mmRCC_IOV_FUNC_IDENTIFIER 0xDE5 @@ -278,6 +279,9 @@ struct amdgpu_video_codec_info; #define amdgpu_passthrough(adev) \ ((adev)->virt.caps & AMDGPU_PASSTHROUGH_MODE)
+#define amdgpu_sriov_vf_mmio_access_protection(adev) \ +((adev)->virt.caps & AMDGPU_VF_MMIO_ACCESS_PROTECT) + static inline bool is_virtual_machine(void) { #ifdef CONFIG_X86 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index fd37bb39774c..01710cd0d972 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -3224,7 +3224,11 @@ void amdgpu_vm_manager_init(struct amdgpu_device *adev) */ #ifdef CONFIG_X86_64 if (amdgpu_vm_update_mode == -1) { - if (amdgpu_gmc_vram_full_visible(&adev->gmc)) + /* For asic with VF MMIO access protection + * avoid using CPU for VM table updates + */ + if (amdgpu_gmc_vram_full_visible(&adev->gmc) && + !amdgpu_sriov_vf_mmio_access_protection(adev)) adev->vm_manager.vm_update_mode = AMDGPU_VM_USE_CPU_FOR_COMPUTE; else
From: Martin Tůma martin.tuma@digiteqautomotive.com
[ Upstream commit b8caf0a0e04583fb71e21495bef84509182227ea ]
The missing "platform" alias is required for the mgb4 v4l2 driver to load the i2c controller driver when probing the HW.
Signed-off-by: Martin Tůma martin.tuma@digiteqautomotive.com Acked-by: Michal Simek michal.simek@amd.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-xiic.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/i2c/busses/i2c-xiic.c b/drivers/i2c/busses/i2c-xiic.c index 612343771ce2..34b8da949462 100644 --- a/drivers/i2c/busses/i2c-xiic.c +++ b/drivers/i2c/busses/i2c-xiic.c @@ -934,6 +934,7 @@ static struct platform_driver xiic_i2c_driver = {
module_platform_driver(xiic_i2c_driver);
+MODULE_ALIAS("platform:" DRIVER_NAME); MODULE_AUTHOR("info@mocean-labs.com"); MODULE_DESCRIPTION("Xilinx I2C bus driver"); MODULE_LICENSE("GPL v2");
From: Jerry Snitselaar jsnitsel@redhat.com
[ Upstream commit f4cd18c5b2000df0c382f6530eeca9141ea41faf ]
memblock_reserve() expects a physical address, but the address being passed for the TPM final events log is what was returned from early_memremap(). This results in something like the following:
[ 0.000000] memblock_reserve: [0xffffffffff2c0000-0xffffffffff2c00e4] efi_tpm_eventlog_init+0x324/0x370
Pass the address from efi like what is done for the TPM events log.
Fixes: c46f3405692d ("tpm: Reserve the TPM final events table") Cc: Matthew Garrett mjg59@google.com Cc: Jarkko Sakkinen jarkko@kernel.org Cc: Bartosz Szczepanek bsz@semihalf.com Cc: Ard Biesheuvel ardb@kernel.org Signed-off-by: Jerry Snitselaar jsnitsel@redhat.com Acked-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/tpm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/tpm.c b/drivers/firmware/efi/tpm.c index 8f665678e9e3..e8d69bd548f3 100644 --- a/drivers/firmware/efi/tpm.c +++ b/drivers/firmware/efi/tpm.c @@ -97,7 +97,7 @@ int __init efi_tpm_eventlog_init(void) goto out_calc; }
- memblock_reserve((unsigned long)final_tbl, + memblock_reserve(efi.tpm_final_log, tbl_size + sizeof(*final_tbl)); efi_tpm_final_log_size = tbl_size;
From: Taniya Das quic_tdas@quicinc.com
[ Upstream commit ffa20aa581cf5377fc397b0d0ff9d67ea823629b ]
There are few GPU clocks which are powering up the memories and thus enable the FORCE_MEM_PERIPH always for these clocks to force the periph_on signal to remain active during halt state of the clock.
Fixes: a3cc092196ef ("clk: qcom: Add Global Clock controller (GCC) driver for SC7280") Fixes: 3e0f01d6c7e7 ("clk: qcom: Add graphics clock controller driver for SC7280") Signed-off-by: Taniya Das quic_tdas@quicinc.com Signed-off-by: Satya Priya quic_c_skakit@quicinc.com Link: https://lore.kernel.org/r/1666159535-6447-1-git-send-email-quic_c_skakit@qui... Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/qcom/gcc-sc7280.c | 1 + drivers/clk/qcom/gpucc-sc7280.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/drivers/clk/qcom/gcc-sc7280.c b/drivers/clk/qcom/gcc-sc7280.c index ce7c5ba2b9b7..d10efbf260b7 100644 --- a/drivers/clk/qcom/gcc-sc7280.c +++ b/drivers/clk/qcom/gcc-sc7280.c @@ -3571,6 +3571,7 @@ static int gcc_sc7280_probe(struct platform_device *pdev) regmap_update_bits(regmap, 0x28004, BIT(0), BIT(0)); regmap_update_bits(regmap, 0x28014, BIT(0), BIT(0)); regmap_update_bits(regmap, 0x71004, BIT(0), BIT(0)); + regmap_update_bits(regmap, 0x7100C, BIT(13), BIT(13));
ret = qcom_cc_register_rcg_dfs(regmap, gcc_dfs_clocks, ARRAY_SIZE(gcc_dfs_clocks)); diff --git a/drivers/clk/qcom/gpucc-sc7280.c b/drivers/clk/qcom/gpucc-sc7280.c index 9a832f2bcf49..1490cd45a654 100644 --- a/drivers/clk/qcom/gpucc-sc7280.c +++ b/drivers/clk/qcom/gpucc-sc7280.c @@ -463,6 +463,7 @@ static int gpu_cc_sc7280_probe(struct platform_device *pdev) */ regmap_update_bits(regmap, 0x1170, BIT(0), BIT(0)); regmap_update_bits(regmap, 0x1098, BIT(0), BIT(0)); + regmap_update_bits(regmap, 0x1098, BIT(13), BIT(13));
return qcom_cc_really_probe(pdev, &gpu_cc_sc7280_desc, regmap); }
From: Tim Harvey tharvey@gateworks.com
[ Upstream commit bb5ad73941dc3f4e3c2241348f385da6501d50ea ]
The GW5910 and GW5913 have a user pushbutton that is tied to the Gateworks System Controller GPIO offset 2. Fix the invalid offset of 0.
Fixes: 64bf0a0af18d ("ARM: dts: imx6qdl-gw: add Gateworks System Controller support") Signed-off-by: Tim Harvey tharvey@gateworks.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/imx6qdl-gw5910.dtsi | 2 +- arch/arm/boot/dts/imx6qdl-gw5913.dtsi | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm/boot/dts/imx6qdl-gw5910.dtsi b/arch/arm/boot/dts/imx6qdl-gw5910.dtsi index 68e5ab2e27e2..6bb4855d13ce 100644 --- a/arch/arm/boot/dts/imx6qdl-gw5910.dtsi +++ b/arch/arm/boot/dts/imx6qdl-gw5910.dtsi @@ -29,7 +29,7 @@ gpio-keys {
user-pb { label = "user_pb"; - gpios = <&gsc_gpio 0 GPIO_ACTIVE_LOW>; + gpios = <&gsc_gpio 2 GPIO_ACTIVE_LOW>; linux,code = <BTN_0>; };
diff --git a/arch/arm/boot/dts/imx6qdl-gw5913.dtsi b/arch/arm/boot/dts/imx6qdl-gw5913.dtsi index 8e23cec7149e..696427b487f0 100644 --- a/arch/arm/boot/dts/imx6qdl-gw5913.dtsi +++ b/arch/arm/boot/dts/imx6qdl-gw5913.dtsi @@ -26,7 +26,7 @@ gpio-keys {
user-pb { label = "user_pb"; - gpios = <&gsc_gpio 0 GPIO_ACTIVE_LOW>; + gpios = <&gsc_gpio 2 GPIO_ACTIVE_LOW>; linux,code = <BTN_0>; };
From: Peng Fan peng.fan@nxp.com
[ Upstream commit 06acb824d7d00a30e9400f67eee481b218371b5a ]
Per bindings/mmc/fsl-imx-esdhc.yaml, the clock order is ipg, ahb, per, otherwise warning: " mmc@5b020000: clock-names:1: 'ahb' was expected mmc@5b020000: clock-names:2: 'per' was expected "
Fixes: 16c4ea7501b1 ("arm64: dts: imx8: switch to new lpcg clock binding") Signed-off-by: Peng Fan peng.fan@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../arm64/boot/dts/freescale/imx8-ss-conn.dtsi | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi b/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi index a79f42a9618e..639220dbff00 100644 --- a/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi @@ -38,9 +38,9 @@ usdhc1: mmc@5b010000 { interrupts = <GIC_SPI 232 IRQ_TYPE_LEVEL_HIGH>; reg = <0x5b010000 0x10000>; clocks = <&sdhc0_lpcg IMX_LPCG_CLK_4>, - <&sdhc0_lpcg IMX_LPCG_CLK_5>, - <&sdhc0_lpcg IMX_LPCG_CLK_0>; - clock-names = "ipg", "per", "ahb"; + <&sdhc0_lpcg IMX_LPCG_CLK_0>, + <&sdhc0_lpcg IMX_LPCG_CLK_5>; + clock-names = "ipg", "ahb", "per"; power-domains = <&pd IMX_SC_R_SDHC_0>; status = "disabled"; }; @@ -49,9 +49,9 @@ usdhc2: mmc@5b020000 { interrupts = <GIC_SPI 233 IRQ_TYPE_LEVEL_HIGH>; reg = <0x5b020000 0x10000>; clocks = <&sdhc1_lpcg IMX_LPCG_CLK_4>, - <&sdhc1_lpcg IMX_LPCG_CLK_5>, - <&sdhc1_lpcg IMX_LPCG_CLK_0>; - clock-names = "ipg", "per", "ahb"; + <&sdhc1_lpcg IMX_LPCG_CLK_0>, + <&sdhc1_lpcg IMX_LPCG_CLK_5>; + clock-names = "ipg", "ahb", "per"; power-domains = <&pd IMX_SC_R_SDHC_1>; fsl,tuning-start-tap = <20>; fsl,tuning-step= <2>; @@ -62,9 +62,9 @@ usdhc3: mmc@5b030000 { interrupts = <GIC_SPI 234 IRQ_TYPE_LEVEL_HIGH>; reg = <0x5b030000 0x10000>; clocks = <&sdhc2_lpcg IMX_LPCG_CLK_4>, - <&sdhc2_lpcg IMX_LPCG_CLK_5>, - <&sdhc2_lpcg IMX_LPCG_CLK_0>; - clock-names = "ipg", "per", "ahb"; + <&sdhc2_lpcg IMX_LPCG_CLK_0>, + <&sdhc2_lpcg IMX_LPCG_CLK_5>; + clock-names = "ipg", "ahb", "per"; power-domains = <&pd IMX_SC_R_SDHC_2>; status = "disabled"; };
From: Ioana Ciornei ioana.ciornei@nxp.com
[ Upstream commit c126a0abc5dadd7df236f20aae6d8c3d103f095c ]
Up until now, the external MDIO controller frequency values relied either on the default ones out of reset or on those setup by u-boot. Let's just properly specify the MDC frequency in the DTS so that even without u-boot's intervention Linux can drive the MDIO bus.
Fixes: 6e1b8fae892d ("arm64: dts: lx2160a: add emdio1 node") Fixes: 5705b9dcda57 ("arm64: dts: lx2160a: add emdio2 node") Signed-off-by: Ioana Ciornei ioana.ciornei@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi index 51c4f61007cd..1bc7f538f690 100644 --- a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi @@ -1369,6 +1369,9 @@ emdio1: mdio@8b96000 { #address-cells = <1>; #size-cells = <0>; little-endian; + clock-frequency = <2500000>; + clocks = <&clockgen QORIQ_CLK_PLATFORM_PLL + QORIQ_CLK_PLL_DIV(2)>; status = "disabled"; };
@@ -1379,6 +1382,9 @@ emdio2: mdio@8b97000 { little-endian; #address-cells = <1>; #size-cells = <0>; + clock-frequency = <2500000>; + clocks = <&clockgen QORIQ_CLK_PLATFORM_PLL + QORIQ_CLK_PLL_DIV(2)>; status = "disabled"; };
From: Ioana Ciornei ioana.ciornei@nxp.com
[ Upstream commit d78a57426e64fc4c61e6189e450a0432d24536ca ]
Up until now, the external MDIO controller frequency values relied either on the default ones out of reset or on those setup by u-boot. Let's just properly specify the MDC frequency in the DTS so that even without u-boot's intervention Linux can drive the MDIO bus.
Fixes: bbe75af7b092 ("arm64: dts: ls1088a: add external MDIO device nodes") Signed-off-by: Ioana Ciornei ioana.ciornei@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi index 605072317243..63441028622a 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi @@ -758,6 +758,9 @@ emdio1: mdio@8b96000 { little-endian; #address-cells = <1>; #size-cells = <0>; + clock-frequency = <2500000>; + clocks = <&clockgen QORIQ_CLK_PLATFORM_PLL + QORIQ_CLK_PLL_DIV(1)>; status = "disabled"; };
@@ -767,6 +770,9 @@ emdio2: mdio@8b97000 { little-endian; #address-cells = <1>; #size-cells = <0>; + clock-frequency = <2500000>; + clocks = <&clockgen QORIQ_CLK_PLATFORM_PLL + QORIQ_CLK_PLL_DIV(1)>; status = "disabled"; };
From: Ioana Ciornei ioana.ciornei@nxp.com
[ Upstream commit d5c921a53c80dfa942f6dff36253db5a50775a5f ]
Up until now, the external MDIO controller frequency values relied either on the default ones out of reset or on those setup by u-boot. Let's just properly specify the MDC frequency in the DTS so that even without u-boot's intervention Linux can drive the MDIO bus.
Fixes: 0420dde30a90 ("arm64: dts: ls208xa: add the external MDIO nodes") Signed-off-by: Ioana Ciornei ioana.ciornei@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi index 1282b61da8a5..12e59777363f 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi @@ -525,6 +525,9 @@ emdio1: mdio@8b96000 { little-endian; #address-cells = <1>; #size-cells = <0>; + clock-frequency = <2500000>; + clocks = <&clockgen QORIQ_CLK_PLATFORM_PLL + QORIQ_CLK_PLL_DIV(2)>; status = "disabled"; };
@@ -534,6 +537,9 @@ emdio2: mdio@8b97000 { little-endian; #address-cells = <1>; #size-cells = <0>; + clock-frequency = <2500000>; + clocks = <&clockgen QORIQ_CLK_PLATFORM_PLL + QORIQ_CLK_PLL_DIV(2)>; status = "disabled"; };
From: Chen Zhongjin chenzhongjin@huawei.com
[ Upstream commit fa81cbafbf5764ad5053512152345fab37a1fe18 ]
kmemleak reported memory leaks in device_add_disk():
kmemleak: 3 new suspected memory leaks
unreferenced object 0xffff88800f420800 (size 512): comm "modprobe", pid 4275, jiffies 4295639067 (age 223.512s) hex dump (first 32 bytes): 04 00 00 00 08 00 00 00 01 00 00 00 00 00 00 00 ................ 00 e1 f5 05 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000d3662699>] kmalloc_trace+0x26/0x60 [<00000000edc7aadc>] wbt_init+0x50/0x6f0 [<0000000069601d16>] wbt_enable_default+0x157/0x1c0 [<0000000028fc393f>] blk_register_queue+0x2a4/0x420 [<000000007345a042>] device_add_disk+0x6fd/0xe40 [<0000000060e6aab0>] nbd_dev_add+0x828/0xbf0 [nbd] ...
It is because the memory allocated in wbt_enable_default() is not released in device_add_disk() error path. Normally, these memory are freed in:
del_gendisk() rq_qos_exit() rqos->ops->exit(rqos); wbt_exit()
So rq_qos_exit() is called to free the rq_wb memory for wbt_init(). However in the error path of device_add_disk(), only blk_unregister_queue() is called and make rq_wb memory leaked.
Add rq_qos_exit() to the error path to fix it.
Fixes: 83cbce957446 ("block: add error handling for device_add_disk / add_disk") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20221029071355.35462-1-chenzhongjin@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/genhd.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/block/genhd.c b/block/genhd.c index 74e19d67ceab..68065189ca17 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -527,6 +527,7 @@ int device_add_disk(struct device *parent, struct gendisk *disk, bdi_unregister(disk->bdi); out_unregister_queue: blk_unregister_queue(disk); + rq_qos_exit(disk->queue); out_put_slave_dir: kobject_put(disk->slave_dir); out_put_holder_dir:
From: Cristian Marussi cristian.marussi@arm.com
[ Upstream commit fd96fbc8fad35d6b1872c90df8a2f5d721f14d91 ]
Suppress the capability to unbind the core SCMI driver since all the SCMI stack protocol drivers depend on it.
Fixes: aa4f886f3893 ("firmware: arm_scmi: add basic driver infrastructure for SCMI") Signed-off-by: Cristian Marussi cristian.marussi@arm.com Link: https://lore.kernel.org/r/20221028140833.280091-2-cristian.marussi@arm.com Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/arm_scmi/driver.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c index e815b8f98739..2c1d5f943dd3 100644 --- a/drivers/firmware/arm_scmi/driver.c +++ b/drivers/firmware/arm_scmi/driver.c @@ -2009,6 +2009,7 @@ MODULE_DEVICE_TABLE(of, scmi_of_match); static struct platform_driver scmi_driver = { .driver = { .name = "arm-scmi", + .suppress_bind_attrs = true, .of_match_table = scmi_of_match, .dev_groups = versions_groups, },
From: Cristian Marussi cristian.marussi@arm.com
[ Upstream commit be9ba1f7f9e0b565b19f4294f5871da9d654bc6d ]
SCMI Rx channels are optional and they can fail to be setup when not present but anyway channels setup routines must bail-out on memory errors.
Make channels setup, and related probing, fail when memory errors are reported on Rx channels.
Fixes: 5c8a47a5a91d ("firmware: arm_scmi: Make scmi core independent of the transport type") Signed-off-by: Cristian Marussi cristian.marussi@arm.com Link: https://lore.kernel.org/r/20221028140833.280091-4-cristian.marussi@arm.com Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/arm_scmi/driver.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c index 2c1d5f943dd3..16569af4a2ba 100644 --- a/drivers/firmware/arm_scmi/driver.c +++ b/drivers/firmware/arm_scmi/driver.c @@ -1516,8 +1516,12 @@ scmi_txrx_setup(struct scmi_info *info, struct device *dev, int prot_id) { int ret = scmi_chan_setup(info, dev, prot_id, true);
- if (!ret) /* Rx is optional, hence no error check */ - scmi_chan_setup(info, dev, prot_id, false); + if (!ret) { + /* Rx is optional, report only memory errors */ + ret = scmi_chan_setup(info, dev, prot_id, false); + if (ret && ret != -ENOMEM) + ret = 0; + }
return ret; }
From: Cristian Marussi cristian.marussi@arm.com
[ Upstream commit 5ffc1c4cb896f8d2cf10309422da3633a616d60f ]
SCMI virtio transport device managed allocations must use the main platform device in devres operations instead of the channel devices.
Cc: Peter Hilber peter.hilber@opensynergy.com Fixes: 46abe13b5e3d ("firmware: arm_scmi: Add virtio transport") Signed-off-by: Cristian Marussi cristian.marussi@arm.com Link: https://lore.kernel.org/r/20221028140833.280091-5-cristian.marussi@arm.com Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/arm_scmi/virtio.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/firmware/arm_scmi/virtio.c b/drivers/firmware/arm_scmi/virtio.c index 87039c5c03fd..0c351eeee746 100644 --- a/drivers/firmware/arm_scmi/virtio.c +++ b/drivers/firmware/arm_scmi/virtio.c @@ -247,19 +247,19 @@ static int virtio_chan_setup(struct scmi_chan_info *cinfo, struct device *dev, for (i = 0; i < vioch->max_msg; i++) { struct scmi_vio_msg *msg;
- msg = devm_kzalloc(cinfo->dev, sizeof(*msg), GFP_KERNEL); + msg = devm_kzalloc(dev, sizeof(*msg), GFP_KERNEL); if (!msg) return -ENOMEM;
if (tx) { - msg->request = devm_kzalloc(cinfo->dev, + msg->request = devm_kzalloc(dev, VIRTIO_SCMI_MAX_PDU_SIZE, GFP_KERNEL); if (!msg->request) return -ENOMEM; }
- msg->input = devm_kzalloc(cinfo->dev, VIRTIO_SCMI_MAX_PDU_SIZE, + msg->input = devm_kzalloc(dev, VIRTIO_SCMI_MAX_PDU_SIZE, GFP_KERNEL); if (!msg->input) return -ENOMEM;
From: Cristian Marussi cristian.marussi@arm.com
[ Upstream commit c4a7b9b587ca1bb4678d48d8be7132492b23a81c ]
When thermnal zones are defined, trip points definitions are mandatory. Define a couple of critical trip points for monitoring of existing PMIC and SOC thermal zones.
This was lost between txt to yaml conversion and was re-enforced recently via the commit 8c596324232d ("dt-bindings: thermal: Fix missing required property")
Cc: Rob Herring robh+dt@kernel.org Cc: Krzysztof Kozlowski krzysztof.kozlowski+dt@linaro.org Cc: devicetree@vger.kernel.org Signed-off-by: Cristian Marussi cristian.marussi@arm.com Fixes: f7b636a8d83c ("arm64: dts: juno: add thermal zones for scpi sensors") Link: https://lore.kernel.org/r/20221028140833.280091-8-cristian.marussi@arm.com Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/arm/juno-base.dtsi | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/arch/arm64/boot/dts/arm/juno-base.dtsi b/arch/arm64/boot/dts/arm/juno-base.dtsi index 34e5549ea748..a00b0f14c222 100644 --- a/arch/arm64/boot/dts/arm/juno-base.dtsi +++ b/arch/arm64/boot/dts/arm/juno-base.dtsi @@ -597,12 +597,26 @@ pmic { polling-delay = <1000>; polling-delay-passive = <100>; thermal-sensors = <&scpi_sensors0 0>; + trips { + pmic_crit0: trip0 { + temperature = <90000>; + hysteresis = <2000>; + type = "critical"; + }; + }; };
soc { polling-delay = <1000>; polling-delay-passive = <100>; thermal-sensors = <&scpi_sensors0 3>; + trips { + soc_crit0: trip0 { + temperature = <80000>; + hysteresis = <2000>; + type = "critical"; + }; + }; };
big_cluster_thermal_zone: big-cluster {
From: Chen Zhongjin chenzhongjin@huawei.com
[ Upstream commit 569bea74c94d37785682b11bab76f557520477cd ]
In piix4_probe(), the piix4 adapter will be registered in:
piix4_probe() piix4_add_adapters_sb800() / piix4_add_adapter() i2c_add_adapter()
Based on the probed device type, piix4_add_adapters_sb800() or single piix4_add_adapter() will be called. For the former case, piix4_adapter_count is set as the number of adapters, while for antoher case it is not set and kept default *zero*.
When piix4 is removed, piix4_remove() removes the adapters added in piix4_probe(), basing on the piix4_adapter_count value. Because the count is zero for the single adapter case, the adapter won't be removed and makes the sources allocated for adapter leaked, such as the i2c client and device.
These sources can still be accessed by i2c or bus and cause problems. An easily reproduced case is that if a new adapter is registered, i2c will get the leaked adapter and try to call smbus_algorithm, which was already freed:
Triggered by: rmmod i2c_piix4 && modprobe max31730
BUG: unable to handle page fault for address: ffffffffc053d860 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 3752 Comm: modprobe Tainted: G Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:i2c_default_probe (drivers/i2c/i2c-core-base.c:2259) i2c_core RSP: 0018:ffff888107477710 EFLAGS: 00000246 ... <TASK> i2c_detect (drivers/i2c/i2c-core-base.c:2302) i2c_core __process_new_driver (drivers/i2c/i2c-core-base.c:1336) i2c_core bus_for_each_dev (drivers/base/bus.c:301) i2c_for_each_dev (drivers/i2c/i2c-core-base.c:1823) i2c_core i2c_register_driver (drivers/i2c/i2c-core-base.c:1861) i2c_core do_one_initcall (init/main.c:1296) do_init_module (kernel/module/main.c:2455) ... </TASK> ---[ end trace 0000000000000000 ]---
Fix this problem by correctly set piix4_adapter_count as 1 for the single adapter so it can be normally removed.
Fixes: 528d53a1592b ("i2c: piix4: Fix probing of reserved ports on AMD Family 16h Model 30h") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Reviewed-by: Jean Delvare jdelvare@suse.de Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-piix4.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/i2c/busses/i2c-piix4.c b/drivers/i2c/busses/i2c-piix4.c index 39cb1b7bb865..809fbd014cd6 100644 --- a/drivers/i2c/busses/i2c-piix4.c +++ b/drivers/i2c/busses/i2c-piix4.c @@ -1080,6 +1080,7 @@ static int piix4_probe(struct pci_dev *dev, const struct pci_device_id *id) "", &piix4_main_adapters[0]); if (retval < 0) return retval; + piix4_adapter_count = 1; }
/* Check for auxiliary SMBus on some AMD chipsets */
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
commit 711f8c3fb3db61897080468586b970c87c61d9e4 upstream.
The Bluetooth spec states that the valid range for SPSM is from 0x0001-0x00ff so it is invalid to accept values outside of this range:
BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A page 1059: Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges
CVE: CVE-2022-42896 CC: stable@vger.kernel.org Reported-by: Tamás Koczka poprdi@google.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Reviewed-by: Tedd Ho-Jeong An tedd.an@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/l2cap_core.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
--- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -5813,6 +5813,19 @@ static int l2cap_le_connect_req(struct l BT_DBG("psm 0x%2.2x scid 0x%4.4x mtu %u mps %u", __le16_to_cpu(psm), scid, mtu, mps);
+ /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A + * page 1059: + * + * Valid range: 0x0001-0x00ff + * + * Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges + */ + if (!psm || __le16_to_cpu(psm) > L2CAP_PSM_LE_DYN_END) { + result = L2CAP_CR_LE_BAD_PSM; + chan = NULL; + goto response; + } + /* Check if we have socket listening on psm */ pchan = l2cap_global_chan_by_psm(BT_LISTEN, psm, &conn->hcon->src, &conn->hcon->dst, LE_LINK); @@ -6001,6 +6014,18 @@ static inline int l2cap_ecred_conn_req(s
psm = req->psm;
+ /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A + * page 1059: + * + * Valid range: 0x0001-0x00ff + * + * Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges + */ + if (!psm || __le16_to_cpu(psm) > L2CAP_PSM_LE_DYN_END) { + result = L2CAP_CR_LE_BAD_PSM; + goto response; + } + BT_DBG("psm 0x%2.2x mtu %u mps %u", __le16_to_cpu(psm), mtu, mps);
memset(&pdu, 0, sizeof(pdu));
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
commit b1a2cd50c0357f243b7435a732b4e62ba3157a2e upstream.
On l2cap_parse_conf_req the variable efs is only initialized if remote_efs has been set.
CVE: CVE-2022-42895 CC: stable@vger.kernel.org Reported-by: Tamás Koczka poprdi@google.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Reviewed-by: Tedd Ho-Jeong An tedd.an@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/l2cap_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -3764,7 +3764,8 @@ done: l2cap_add_conf_opt(&ptr, L2CAP_CONF_RFC, sizeof(rfc), (unsigned long) &rfc, endptr - ptr);
- if (test_bit(FLAG_EFS_ENABLE, &chan->flags)) { + if (remote_efs && + test_bit(FLAG_EFS_ENABLE, &chan->flags)) { chan->remote_id = efs.id; chan->remote_stype = efs.stype; chan->remote_msdu = le16_to_cpu(efs.msdu);
From: Yu Kuai yukuai3@huawei.com
commit 181490d5321806e537dc5386db5ea640b826bf78 upstream.
If bfq_schedule_dispatch() is called from bfq_idle_slice_timer_body(), then 'bfqd->queued' is read without holding 'bfqd->lock'. This is wrong since it can be wrote concurrently.
Fix the problem by holding 'bfqd->lock' in such case.
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Chaitanya Kulkarni kch@nvidia.com Link: https://lore.kernel.org/r/20220513023507.2625717-2-yukuai3@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk Cc: Khazhy Kumykov khazhy@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- block/bfq-iosched.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -461,6 +461,8 @@ static struct bfq_io_cq *bfq_bic_lookup( */ void bfq_schedule_dispatch(struct bfq_data *bfqd) { + lockdep_assert_held(&bfqd->lock); + if (bfqd->queued != 0) { bfq_log(bfqd, "schedule dispatch"); blk_mq_run_hw_queues(bfqd->queue, true); @@ -6745,8 +6747,8 @@ bfq_idle_slice_timer_body(struct bfq_dat bfq_bfqq_expire(bfqd, bfqq, true, reason);
schedule_dispatch: - spin_unlock_irqrestore(&bfqd->lock, flags); bfq_schedule_dispatch(bfqd); + spin_unlock_irqrestore(&bfqd->lock, flags); }
/*
From: Kuniyuki Iwashima kuniyu@amazon.com
commit 7a62ed61367b8fd01bae1e18e30602c25060d824 upstream.
syzbot reported a sequence of memory leaks, and one of them indicated we failed to free a whole sk:
unreferenced object 0xffff8880126e0000 (size 1088): comm "syz-executor419", pid 326, jiffies 4294773607 (age 12.609s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 7d 00 00 00 00 00 00 00 ........}....... 01 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 ...@............ backtrace: [<000000006fefe750>] sk_prot_alloc+0x64/0x2a0 net/core/sock.c:1970 [<0000000074006db5>] sk_alloc+0x3b/0x800 net/core/sock.c:2029 [<00000000728cd434>] unix_create1+0xaf/0x920 net/unix/af_unix.c:928 [<00000000a279a139>] unix_create+0x113/0x1d0 net/unix/af_unix.c:997 [<0000000068259812>] __sock_create+0x2ab/0x550 net/socket.c:1516 [<00000000da1521e1>] sock_create net/socket.c:1566 [inline] [<00000000da1521e1>] __sys_socketpair+0x1a8/0x550 net/socket.c:1698 [<000000007ab259e1>] __do_sys_socketpair net/socket.c:1751 [inline] [<000000007ab259e1>] __se_sys_socketpair net/socket.c:1748 [inline] [<000000007ab259e1>] __x64_sys_socketpair+0x97/0x100 net/socket.c:1748 [<000000007dedddc1>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<000000007dedddc1>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 [<000000009456679f>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
We can reproduce this issue by creating two AF_UNIX SOCK_STREAM sockets, send()ing an OOB skb to each other, and close()ing them without consuming the OOB skbs.
int skpair[2];
socketpair(AF_UNIX, SOCK_STREAM, 0, skpair);
send(skpair[0], "x", 1, MSG_OOB); send(skpair[1], "x", 1, MSG_OOB);
close(skpair[0]); close(skpair[1]);
Currently, we free an OOB skb in unix_sock_destructor() which is called via __sk_free(), but it's too late because the receiver's unix_sk(sk)->oob_skb is accounted against the sender's sk->sk_wmem_alloc and __sk_free() is called only when sk->sk_wmem_alloc is 0.
In the repro sequences, we do not consume the OOB skb, so both two sk's sock_put() never reach __sk_free() due to the positive sk->sk_wmem_alloc. Then, no one can consume the OOB skb nor call __sk_free(), and we finally leak the two whole sk.
Thus, we must free the unconsumed OOB skb earlier when close()ing the socket.
Fixes: 314001f0bf92 ("af_unix: Add OOB support") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Anil Altinay aaltinay@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/unix/af_unix.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
--- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -504,12 +504,6 @@ static void unix_sock_destructor(struct
skb_queue_purge(&sk->sk_receive_queue);
-#if IS_ENABLED(CONFIG_AF_UNIX_OOB) - if (u->oob_skb) { - kfree_skb(u->oob_skb); - u->oob_skb = NULL; - } -#endif WARN_ON(refcount_read(&sk->sk_wmem_alloc)); WARN_ON(!sk_unhashed(sk)); WARN_ON(sk->sk_socket); @@ -556,6 +550,13 @@ static void unix_release_sock(struct soc
unix_state_unlock(sk);
+#if IS_ENABLED(CONFIG_AF_UNIX_OOB) + if (u->oob_skb) { + kfree_skb(u->oob_skb); + u->oob_skb = NULL; + } +#endif + wake_up_interruptible_all(&u->peer_wait);
if (skpair != NULL) {
From: Eric Biggers ebiggers@google.com
commit d7e7b9af104c7b389a0c21eb26532511bce4b510 upstream.
The approach of fs/crypto/ internally managing the fscrypt_master_key structs as the payloads of "struct key" objects contained in a "struct key" keyring has outlived its usefulness. The original idea was to simplify the code by reusing code from the keyrings subsystem. However, several issues have arisen that can't easily be resolved:
- When a master key struct is destroyed, blk_crypto_evict_key() must be called on any per-mode keys embedded in it. (This started being the case when inline encryption support was added.) Yet, the keyrings subsystem can arbitrarily delay the destruction of keys, even past the time the filesystem was unmounted. Therefore, currently there is no easy way to call blk_crypto_evict_key() when a master key is destroyed. Currently, this is worked around by holding an extra reference to the filesystem's request_queue(s). But it was overlooked that the request_queue reference is *not* guaranteed to pin the corresponding blk_crypto_profile too; for device-mapper devices that support inline crypto, it doesn't. This can cause a use-after-free.
- When the last inode that was using an incompletely-removed master key is evicted, the master key removal is completed by removing the key struct from the keyring. Currently this is done via key_invalidate(). Yet, key_invalidate() takes the key semaphore. This can deadlock when called from the shrinker, since in fscrypt_ioctl_add_key(), memory is allocated with GFP_KERNEL under the same semaphore.
- More generally, the fact that the keyrings subsystem can arbitrarily delay the destruction of keys (via garbage collection delay, or via random processes getting temporary key references) is undesirable, as it means we can't strictly guarantee that all secrets are ever wiped.
- Doing the master key lookups via the keyrings subsystem results in the key_permission LSM hook being called. fscrypt doesn't want this, as all access control for encrypted files is designed to happen via the files themselves, like any other files. The workaround which SELinux users are using is to change their SELinux policy to grant key search access to all domains. This works, but it is an odd extra step that shouldn't really have to be done.
The fix for all these issues is to change the implementation to what I should have done originally: don't use the keyrings subsystem to keep track of the filesystem's fscrypt_master_key structs. Instead, just store them in a regular kernel data structure, and rework the reference counting, locking, and lifetime accordingly. Retain support for RCU-mode key lookups by using a hash table. Replace fscrypt_sb_free() with fscrypt_sb_delete(), which releases the keys synchronously and runs a bit earlier during unmount, so that block devices are still available.
A side effect of this patch is that neither the master keys themselves nor the filesystem keyrings will be listed in /proc/keys anymore. ("Master key users" and the master key users keyrings will still be listed.) However, this was mostly an implementation detail, and it was intended just for debugging purposes. I don't know of anyone using it.
This patch does *not* change how "master key users" (->mk_users) works; that still uses the keyrings subsystem. That is still needed for key quotas, and changing that isn't necessary to solve the issues listed above. If we decide to change that too, it would be a separate patch.
I've marked this as fixing the original commit that added the fscrypt keyring, but as noted above the most important issue that this patch fixes wasn't introduced until the addition of inline encryption support.
Fixes: 22d94f493bfb ("fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl") Signed-off-by: Eric Biggers ebiggers@google.com Link: https://lore.kernel.org/r/20220901193208.138056-2-ebiggers@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/crypto/fscrypt_private.h | 71 ++++-- fs/crypto/hooks.c | 10 fs/crypto/keyring.c | 484 +++++++++++++++++++++++--------------------- fs/crypto/keysetup.c | 81 ++----- fs/crypto/policy.c | 8 fs/super.c | 2 include/linux/fs.h | 2 include/linux/fscrypt.h | 4 8 files changed, 352 insertions(+), 310 deletions(-)
--- a/fs/crypto/fscrypt_private.h +++ b/fs/crypto/fscrypt_private.h @@ -220,7 +220,7 @@ struct fscrypt_info { * will be NULL if the master key was found in a process-subscribed * keyring rather than in the filesystem-level keyring. */ - struct key *ci_master_key; + struct fscrypt_master_key *ci_master_key;
/* * Link in list of inodes that were unlocked with the master key. @@ -431,6 +431,40 @@ struct fscrypt_master_key_secret { struct fscrypt_master_key {
/* + * Back-pointer to the super_block of the filesystem to which this + * master key has been added. Only valid if ->mk_active_refs > 0. + */ + struct super_block *mk_sb; + + /* + * Link in ->mk_sb->s_master_keys->key_hashtable. + * Only valid if ->mk_active_refs > 0. + */ + struct hlist_node mk_node; + + /* Semaphore that protects ->mk_secret and ->mk_users */ + struct rw_semaphore mk_sem; + + /* + * Active and structural reference counts. An active ref guarantees + * that the struct continues to exist, continues to be in the keyring + * ->mk_sb->s_master_keys, and that any embedded subkeys (e.g. + * ->mk_direct_keys) that have been prepared continue to exist. + * A structural ref only guarantees that the struct continues to exist. + * + * There is one active ref associated with ->mk_secret being present, + * and one active ref for each inode in ->mk_decrypted_inodes. + * + * There is one structural ref associated with the active refcount being + * nonzero. Finding a key in the keyring also takes a structural ref, + * which is then held temporarily while the key is operated on. + */ + refcount_t mk_active_refs; + refcount_t mk_struct_refs; + + struct rcu_head mk_rcu_head; + + /* * The secret key material. After FS_IOC_REMOVE_ENCRYPTION_KEY is * executed, this is wiped and no new inodes can be unlocked with this * key; however, there may still be inodes in ->mk_decrypted_inodes @@ -438,7 +472,10 @@ struct fscrypt_master_key { * FS_IOC_REMOVE_ENCRYPTION_KEY can be retried, or * FS_IOC_ADD_ENCRYPTION_KEY can add the secret again. * - * Locking: protected by this master key's key->sem. + * While ->mk_secret is present, one ref in ->mk_active_refs is held. + * + * Locking: protected by ->mk_sem. The manipulation of ->mk_active_refs + * associated with this field is protected by ->mk_sem as well. */ struct fscrypt_master_key_secret mk_secret;
@@ -459,23 +496,13 @@ struct fscrypt_master_key { * * This is NULL for v1 policy keys; those can only be added by root. * - * Locking: in addition to this keyring's own semaphore, this is - * protected by this master key's key->sem, so we can do atomic - * search+insert. It can also be searched without taking any locks, but - * in that case the returned key may have already been removed. + * Locking: protected by ->mk_sem. (We don't just rely on the keyrings + * subsystem semaphore ->mk_users->sem, as we need support for atomic + * search+insert along with proper synchronization with ->mk_secret.) */ struct key *mk_users;
/* - * Length of ->mk_decrypted_inodes, plus one if mk_secret is present. - * Once this goes to 0, the master key is removed from ->s_master_keys. - * The 'struct fscrypt_master_key' will continue to live as long as the - * 'struct key' whose payload it is, but we won't let this reference - * count rise again. - */ - refcount_t mk_refcount; - - /* * List of inodes that were unlocked using this key. This allows the * inodes to be evicted efficiently if the key is removed. */ @@ -500,10 +527,10 @@ static inline bool is_master_key_secret_present(const struct fscrypt_master_key_secret *secret) { /* - * The READ_ONCE() is only necessary for fscrypt_drop_inode() and - * fscrypt_key_describe(). These run in atomic context, so they can't - * take the key semaphore and thus 'secret' can change concurrently - * which would be a data race. But they only need to know whether the + * The READ_ONCE() is only necessary for fscrypt_drop_inode(). + * fscrypt_drop_inode() runs in atomic context, so it can't take the key + * semaphore and thus 'secret' can change concurrently which would be a + * data race. But fscrypt_drop_inode() only need to know whether the * secret *was* present at the time of check, so READ_ONCE() suffices. */ return READ_ONCE(secret->size) != 0; @@ -532,7 +559,11 @@ static inline int master_key_spec_len(co return 0; }
-struct key * +void fscrypt_put_master_key(struct fscrypt_master_key *mk); + +void fscrypt_put_master_key_activeref(struct fscrypt_master_key *mk); + +struct fscrypt_master_key * fscrypt_find_master_key(struct super_block *sb, const struct fscrypt_key_specifier *mk_spec);
--- a/fs/crypto/hooks.c +++ b/fs/crypto/hooks.c @@ -5,8 +5,6 @@ * Encryption hooks for higher-level filesystem operations. */
-#include <linux/key.h> - #include "fscrypt_private.h"
/** @@ -142,7 +140,6 @@ int fscrypt_prepare_setflags(struct inod unsigned int oldflags, unsigned int flags) { struct fscrypt_info *ci; - struct key *key; struct fscrypt_master_key *mk; int err;
@@ -158,14 +155,13 @@ int fscrypt_prepare_setflags(struct inod ci = inode->i_crypt_info; if (ci->ci_policy.version != FSCRYPT_POLICY_V2) return -EINVAL; - key = ci->ci_master_key; - mk = key->payload.data[0]; - down_read(&key->sem); + mk = ci->ci_master_key; + down_read(&mk->mk_sem); if (is_master_key_secret_present(&mk->mk_secret)) err = fscrypt_derive_dirhash_key(ci, mk); else err = -ENOKEY; - up_read(&key->sem); + up_read(&mk->mk_sem); return err; } return 0; --- a/fs/crypto/keyring.c +++ b/fs/crypto/keyring.c @@ -18,6 +18,7 @@ * information about these ioctls. */
+#include <asm/unaligned.h> #include <crypto/skcipher.h> #include <linux/key-type.h> #include <linux/random.h> @@ -25,6 +26,18 @@
#include "fscrypt_private.h"
+/* The master encryption keys for a filesystem (->s_master_keys) */ +struct fscrypt_keyring { + /* + * Lock that protects ->key_hashtable. It does *not* protect the + * fscrypt_master_key structs themselves. + */ + spinlock_t lock; + + /* Hash table that maps fscrypt_key_specifier to fscrypt_master_key */ + struct hlist_head key_hashtable[128]; +}; + static void wipe_master_key_secret(struct fscrypt_master_key_secret *secret) { fscrypt_destroy_hkdf(&secret->hkdf); @@ -38,20 +51,70 @@ static void move_master_key_secret(struc memzero_explicit(src, sizeof(*src)); }
-static void free_master_key(struct fscrypt_master_key *mk) +static void fscrypt_free_master_key(struct rcu_head *head) +{ + struct fscrypt_master_key *mk = + container_of(head, struct fscrypt_master_key, mk_rcu_head); + /* + * The master key secret and any embedded subkeys should have already + * been wiped when the last active reference to the fscrypt_master_key + * struct was dropped; doing it here would be unnecessarily late. + * Nevertheless, use kfree_sensitive() in case anything was missed. + */ + kfree_sensitive(mk); +} + +void fscrypt_put_master_key(struct fscrypt_master_key *mk) +{ + if (!refcount_dec_and_test(&mk->mk_struct_refs)) + return; + /* + * No structural references left, so free ->mk_users, and also free the + * fscrypt_master_key struct itself after an RCU grace period ensures + * that concurrent keyring lookups can no longer find it. + */ + WARN_ON(refcount_read(&mk->mk_active_refs) != 0); + key_put(mk->mk_users); + mk->mk_users = NULL; + call_rcu(&mk->mk_rcu_head, fscrypt_free_master_key); +} + +void fscrypt_put_master_key_activeref(struct fscrypt_master_key *mk) { + struct super_block *sb = mk->mk_sb; + struct fscrypt_keyring *keyring = sb->s_master_keys; size_t i;
- wipe_master_key_secret(&mk->mk_secret); + if (!refcount_dec_and_test(&mk->mk_active_refs)) + return; + /* + * No active references left, so complete the full removal of this + * fscrypt_master_key struct by removing it from the keyring and + * destroying any subkeys embedded in it. + */ + + spin_lock(&keyring->lock); + hlist_del_rcu(&mk->mk_node); + spin_unlock(&keyring->lock); + + /* + * ->mk_active_refs == 0 implies that ->mk_secret is not present and + * that ->mk_decrypted_inodes is empty. + */ + WARN_ON(is_master_key_secret_present(&mk->mk_secret)); + WARN_ON(!list_empty(&mk->mk_decrypted_inodes));
for (i = 0; i <= FSCRYPT_MODE_MAX; i++) { fscrypt_destroy_prepared_key(&mk->mk_direct_keys[i]); fscrypt_destroy_prepared_key(&mk->mk_iv_ino_lblk_64_keys[i]); fscrypt_destroy_prepared_key(&mk->mk_iv_ino_lblk_32_keys[i]); } + memzero_explicit(&mk->mk_ino_hash_key, + sizeof(mk->mk_ino_hash_key)); + mk->mk_ino_hash_key_initialized = false;
- key_put(mk->mk_users); - kfree_sensitive(mk); + /* Drop the structural ref associated with the active refs. */ + fscrypt_put_master_key(mk); }
static inline bool valid_key_spec(const struct fscrypt_key_specifier *spec) @@ -61,44 +124,6 @@ static inline bool valid_key_spec(const return master_key_spec_len(spec) != 0; }
-static int fscrypt_key_instantiate(struct key *key, - struct key_preparsed_payload *prep) -{ - key->payload.data[0] = (struct fscrypt_master_key *)prep->data; - return 0; -} - -static void fscrypt_key_destroy(struct key *key) -{ - free_master_key(key->payload.data[0]); -} - -static void fscrypt_key_describe(const struct key *key, struct seq_file *m) -{ - seq_puts(m, key->description); - - if (key_is_positive(key)) { - const struct fscrypt_master_key *mk = key->payload.data[0]; - - if (!is_master_key_secret_present(&mk->mk_secret)) - seq_puts(m, ": secret removed"); - } -} - -/* - * Type of key in ->s_master_keys. Each key of this type represents a master - * key which has been added to the filesystem. Its payload is a - * 'struct fscrypt_master_key'. The "." prefix in the key type name prevents - * users from adding keys of this type via the keyrings syscalls rather than via - * the intended method of FS_IOC_ADD_ENCRYPTION_KEY. - */ -static struct key_type key_type_fscrypt = { - .name = "._fscrypt", - .instantiate = fscrypt_key_instantiate, - .destroy = fscrypt_key_destroy, - .describe = fscrypt_key_describe, -}; - static int fscrypt_user_key_instantiate(struct key *key, struct key_preparsed_payload *prep) { @@ -131,32 +156,6 @@ static struct key_type key_type_fscrypt_ .describe = fscrypt_user_key_describe, };
-/* Search ->s_master_keys or ->mk_users */ -static struct key *search_fscrypt_keyring(struct key *keyring, - struct key_type *type, - const char *description) -{ - /* - * We need to mark the keyring reference as "possessed" so that we - * acquire permission to search it, via the KEY_POS_SEARCH permission. - */ - key_ref_t keyref = make_key_ref(keyring, true /* possessed */); - - keyref = keyring_search(keyref, type, description, false); - if (IS_ERR(keyref)) { - if (PTR_ERR(keyref) == -EAGAIN || /* not found */ - PTR_ERR(keyref) == -EKEYREVOKED) /* recently invalidated */ - keyref = ERR_PTR(-ENOKEY); - return ERR_CAST(keyref); - } - return key_ref_to_ptr(keyref); -} - -#define FSCRYPT_FS_KEYRING_DESCRIPTION_SIZE \ - (CONST_STRLEN("fscrypt-") + sizeof_field(struct super_block, s_id)) - -#define FSCRYPT_MK_DESCRIPTION_SIZE (2 * FSCRYPT_KEY_IDENTIFIER_SIZE + 1) - #define FSCRYPT_MK_USERS_DESCRIPTION_SIZE \ (CONST_STRLEN("fscrypt-") + 2 * FSCRYPT_KEY_IDENTIFIER_SIZE + \ CONST_STRLEN("-users") + 1) @@ -164,21 +163,6 @@ static struct key *search_fscrypt_keyrin #define FSCRYPT_MK_USER_DESCRIPTION_SIZE \ (2 * FSCRYPT_KEY_IDENTIFIER_SIZE + CONST_STRLEN(".uid.") + 10 + 1)
-static void format_fs_keyring_description( - char description[FSCRYPT_FS_KEYRING_DESCRIPTION_SIZE], - const struct super_block *sb) -{ - sprintf(description, "fscrypt-%s", sb->s_id); -} - -static void format_mk_description( - char description[FSCRYPT_MK_DESCRIPTION_SIZE], - const struct fscrypt_key_specifier *mk_spec) -{ - sprintf(description, "%*phN", - master_key_spec_len(mk_spec), (u8 *)&mk_spec->u); -} - static void format_mk_users_keyring_description( char description[FSCRYPT_MK_USERS_DESCRIPTION_SIZE], const u8 mk_identifier[FSCRYPT_KEY_IDENTIFIER_SIZE]) @@ -199,20 +183,15 @@ static void format_mk_user_description( /* Create ->s_master_keys if needed. Synchronized by fscrypt_add_key_mutex. */ static int allocate_filesystem_keyring(struct super_block *sb) { - char description[FSCRYPT_FS_KEYRING_DESCRIPTION_SIZE]; - struct key *keyring; + struct fscrypt_keyring *keyring;
if (sb->s_master_keys) return 0;
- format_fs_keyring_description(description, sb); - keyring = keyring_alloc(description, GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, - current_cred(), KEY_POS_SEARCH | - KEY_USR_SEARCH | KEY_USR_READ | KEY_USR_VIEW, - KEY_ALLOC_NOT_IN_QUOTA, NULL, NULL); - if (IS_ERR(keyring)) - return PTR_ERR(keyring); - + keyring = kzalloc(sizeof(*keyring), GFP_KERNEL); + if (!keyring) + return -ENOMEM; + spin_lock_init(&keyring->lock); /* * Pairs with the smp_load_acquire() in fscrypt_find_master_key(). * I.e., here we publish ->s_master_keys with a RELEASE barrier so that @@ -222,21 +201,75 @@ static int allocate_filesystem_keyring(s return 0; }
-void fscrypt_sb_free(struct super_block *sb) +/* + * This is called at unmount time to release all encryption keys that have been + * added to the filesystem, along with the keyring that contains them. + * + * Note that besides clearing and freeing memory, this might need to evict keys + * from the keyslots of an inline crypto engine. Therefore, this must be called + * while the filesystem's underlying block device(s) are still available. + */ +void fscrypt_sb_delete(struct super_block *sb) { - key_put(sb->s_master_keys); + struct fscrypt_keyring *keyring = sb->s_master_keys; + size_t i; + + if (!keyring) + return; + + for (i = 0; i < ARRAY_SIZE(keyring->key_hashtable); i++) { + struct hlist_head *bucket = &keyring->key_hashtable[i]; + struct fscrypt_master_key *mk; + struct hlist_node *tmp; + + hlist_for_each_entry_safe(mk, tmp, bucket, mk_node) { + /* + * Since all inodes were already evicted, every key + * remaining in the keyring should have an empty inode + * list, and should only still be in the keyring due to + * the single active ref associated with ->mk_secret. + * There should be no structural refs beyond the one + * associated with the active ref. + */ + WARN_ON(refcount_read(&mk->mk_active_refs) != 1); + WARN_ON(refcount_read(&mk->mk_struct_refs) != 1); + WARN_ON(!is_master_key_secret_present(&mk->mk_secret)); + wipe_master_key_secret(&mk->mk_secret); + fscrypt_put_master_key_activeref(mk); + } + } + kfree_sensitive(keyring); sb->s_master_keys = NULL; }
+static struct hlist_head * +fscrypt_mk_hash_bucket(struct fscrypt_keyring *keyring, + const struct fscrypt_key_specifier *mk_spec) +{ + /* + * Since key specifiers should be "random" values, it is sufficient to + * use a trivial hash function that just takes the first several bits of + * the key specifier. + */ + unsigned long i = get_unaligned((unsigned long *)&mk_spec->u); + + return &keyring->key_hashtable[i % ARRAY_SIZE(keyring->key_hashtable)]; +} + /* - * Find the specified master key in ->s_master_keys. - * Returns ERR_PTR(-ENOKEY) if not found. + * Find the specified master key struct in ->s_master_keys and take a structural + * ref to it. The structural ref guarantees that the key struct continues to + * exist, but it does *not* guarantee that ->s_master_keys continues to contain + * the key struct. The structural ref needs to be dropped by + * fscrypt_put_master_key(). Returns NULL if the key struct is not found. */ -struct key *fscrypt_find_master_key(struct super_block *sb, - const struct fscrypt_key_specifier *mk_spec) +struct fscrypt_master_key * +fscrypt_find_master_key(struct super_block *sb, + const struct fscrypt_key_specifier *mk_spec) { - struct key *keyring; - char description[FSCRYPT_MK_DESCRIPTION_SIZE]; + struct fscrypt_keyring *keyring; + struct hlist_head *bucket; + struct fscrypt_master_key *mk;
/* * Pairs with the smp_store_release() in allocate_filesystem_keyring(). @@ -246,10 +279,38 @@ struct key *fscrypt_find_master_key(stru */ keyring = smp_load_acquire(&sb->s_master_keys); if (keyring == NULL) - return ERR_PTR(-ENOKEY); /* No keyring yet, so no keys yet. */ + return NULL; /* No keyring yet, so no keys yet. */
- format_mk_description(description, mk_spec); - return search_fscrypt_keyring(keyring, &key_type_fscrypt, description); + bucket = fscrypt_mk_hash_bucket(keyring, mk_spec); + rcu_read_lock(); + switch (mk_spec->type) { + case FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR: + hlist_for_each_entry_rcu(mk, bucket, mk_node) { + if (mk->mk_spec.type == + FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR && + memcmp(mk->mk_spec.u.descriptor, + mk_spec->u.descriptor, + FSCRYPT_KEY_DESCRIPTOR_SIZE) == 0 && + refcount_inc_not_zero(&mk->mk_struct_refs)) + goto out; + } + break; + case FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER: + hlist_for_each_entry_rcu(mk, bucket, mk_node) { + if (mk->mk_spec.type == + FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER && + memcmp(mk->mk_spec.u.identifier, + mk_spec->u.identifier, + FSCRYPT_KEY_IDENTIFIER_SIZE) == 0 && + refcount_inc_not_zero(&mk->mk_struct_refs)) + goto out; + } + break; + } + mk = NULL; +out: + rcu_read_unlock(); + return mk; }
static int allocate_master_key_users_keyring(struct fscrypt_master_key *mk) @@ -277,17 +338,30 @@ static int allocate_master_key_users_key static struct key *find_master_key_user(struct fscrypt_master_key *mk) { char description[FSCRYPT_MK_USER_DESCRIPTION_SIZE]; + key_ref_t keyref;
format_mk_user_description(description, mk->mk_spec.u.identifier); - return search_fscrypt_keyring(mk->mk_users, &key_type_fscrypt_user, - description); + + /* + * We need to mark the keyring reference as "possessed" so that we + * acquire permission to search it, via the KEY_POS_SEARCH permission. + */ + keyref = keyring_search(make_key_ref(mk->mk_users, true /*possessed*/), + &key_type_fscrypt_user, description, false); + if (IS_ERR(keyref)) { + if (PTR_ERR(keyref) == -EAGAIN || /* not found */ + PTR_ERR(keyref) == -EKEYREVOKED) /* recently invalidated */ + keyref = ERR_PTR(-ENOKEY); + return ERR_CAST(keyref); + } + return key_ref_to_ptr(keyref); }
/* * Give the current user a "key" in ->mk_users. This charges the user's quota * and marks the master key as added by the current user, so that it cannot be - * removed by another user with the key. Either the master key's key->sem must - * be held for write, or the master key must be still undergoing initialization. + * removed by another user with the key. Either ->mk_sem must be held for + * write, or the master key must be still undergoing initialization. */ static int add_master_key_user(struct fscrypt_master_key *mk) { @@ -309,7 +383,7 @@ static int add_master_key_user(struct fs
/* * Remove the current user's "key" from ->mk_users. - * The master key's key->sem must be held for write. + * ->mk_sem must be held for write. * * Returns 0 if removed, -ENOKEY if not found, or another -errno code. */ @@ -327,63 +401,49 @@ static int remove_master_key_user(struct }
/* - * Allocate a new fscrypt_master_key which contains the given secret, set it as - * the payload of a new 'struct key' of type fscrypt, and link the 'struct key' - * into the given keyring. Synchronized by fscrypt_add_key_mutex. + * Allocate a new fscrypt_master_key, transfer the given secret over to it, and + * insert it into sb->s_master_keys. */ -static int add_new_master_key(struct fscrypt_master_key_secret *secret, - const struct fscrypt_key_specifier *mk_spec, - struct key *keyring) +static int add_new_master_key(struct super_block *sb, + struct fscrypt_master_key_secret *secret, + const struct fscrypt_key_specifier *mk_spec) { + struct fscrypt_keyring *keyring = sb->s_master_keys; struct fscrypt_master_key *mk; - char description[FSCRYPT_MK_DESCRIPTION_SIZE]; - struct key *key; int err;
mk = kzalloc(sizeof(*mk), GFP_KERNEL); if (!mk) return -ENOMEM;
+ mk->mk_sb = sb; + init_rwsem(&mk->mk_sem); + refcount_set(&mk->mk_struct_refs, 1); mk->mk_spec = *mk_spec;
- move_master_key_secret(&mk->mk_secret, secret); - - refcount_set(&mk->mk_refcount, 1); /* secret is present */ INIT_LIST_HEAD(&mk->mk_decrypted_inodes); spin_lock_init(&mk->mk_decrypted_inodes_lock);
if (mk_spec->type == FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER) { err = allocate_master_key_users_keyring(mk); if (err) - goto out_free_mk; + goto out_put; err = add_master_key_user(mk); if (err) - goto out_free_mk; + goto out_put; }
- /* - * Note that we don't charge this key to anyone's quota, since when - * ->mk_users is in use those keys are charged instead, and otherwise - * (when ->mk_users isn't in use) only root can add these keys. - */ - format_mk_description(description, mk_spec); - key = key_alloc(&key_type_fscrypt, description, - GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, current_cred(), - KEY_POS_SEARCH | KEY_USR_SEARCH | KEY_USR_VIEW, - KEY_ALLOC_NOT_IN_QUOTA, NULL); - if (IS_ERR(key)) { - err = PTR_ERR(key); - goto out_free_mk; - } - err = key_instantiate_and_link(key, mk, sizeof(*mk), keyring, NULL); - key_put(key); - if (err) - goto out_free_mk; + move_master_key_secret(&mk->mk_secret, secret); + refcount_set(&mk->mk_active_refs, 1); /* ->mk_secret is present */
+ spin_lock(&keyring->lock); + hlist_add_head_rcu(&mk->mk_node, + fscrypt_mk_hash_bucket(keyring, mk_spec)); + spin_unlock(&keyring->lock); return 0;
-out_free_mk: - free_master_key(mk); +out_put: + fscrypt_put_master_key(mk); return err; }
@@ -392,42 +452,34 @@ out_free_mk: static int add_existing_master_key(struct fscrypt_master_key *mk, struct fscrypt_master_key_secret *secret) { - struct key *mk_user; - bool rekey; int err;
/* * If the current user is already in ->mk_users, then there's nothing to - * do. (Not applicable for v1 policy keys, which have NULL ->mk_users.) + * do. Otherwise, we need to add the user to ->mk_users. (Neither is + * applicable for v1 policy keys, which have NULL ->mk_users.) */ if (mk->mk_users) { - mk_user = find_master_key_user(mk); + struct key *mk_user = find_master_key_user(mk); + if (mk_user != ERR_PTR(-ENOKEY)) { if (IS_ERR(mk_user)) return PTR_ERR(mk_user); key_put(mk_user); return 0; } - } - - /* If we'll be re-adding ->mk_secret, try to take the reference. */ - rekey = !is_master_key_secret_present(&mk->mk_secret); - if (rekey && !refcount_inc_not_zero(&mk->mk_refcount)) - return KEY_DEAD; - - /* Add the current user to ->mk_users, if applicable. */ - if (mk->mk_users) { err = add_master_key_user(mk); - if (err) { - if (rekey && refcount_dec_and_test(&mk->mk_refcount)) - return KEY_DEAD; + if (err) return err; - } }
/* Re-add the secret if needed. */ - if (rekey) + if (!is_master_key_secret_present(&mk->mk_secret)) { + if (!refcount_inc_not_zero(&mk->mk_active_refs)) + return KEY_DEAD; move_master_key_secret(&mk->mk_secret, secret); + } + return 0; }
@@ -436,38 +488,36 @@ static int do_add_master_key(struct supe const struct fscrypt_key_specifier *mk_spec) { static DEFINE_MUTEX(fscrypt_add_key_mutex); - struct key *key; + struct fscrypt_master_key *mk; int err;
mutex_lock(&fscrypt_add_key_mutex); /* serialize find + link */ -retry: - key = fscrypt_find_master_key(sb, mk_spec); - if (IS_ERR(key)) { - err = PTR_ERR(key); - if (err != -ENOKEY) - goto out_unlock; + + mk = fscrypt_find_master_key(sb, mk_spec); + if (!mk) { /* Didn't find the key in ->s_master_keys. Add it. */ err = allocate_filesystem_keyring(sb); - if (err) - goto out_unlock; - err = add_new_master_key(secret, mk_spec, sb->s_master_keys); + if (!err) + err = add_new_master_key(sb, secret, mk_spec); } else { /* * Found the key in ->s_master_keys. Re-add the secret if * needed, and add the user to ->mk_users if needed. */ - down_write(&key->sem); - err = add_existing_master_key(key->payload.data[0], secret); - up_write(&key->sem); + down_write(&mk->mk_sem); + err = add_existing_master_key(mk, secret); + up_write(&mk->mk_sem); if (err == KEY_DEAD) { - /* Key being removed or needs to be removed */ - key_invalidate(key); - key_put(key); - goto retry; + /* + * We found a key struct, but it's already been fully + * removed. Ignore the old struct and add a new one. + * fscrypt_add_key_mutex means we don't need to worry + * about concurrent adds. + */ + err = add_new_master_key(sb, secret, mk_spec); } - key_put(key); + fscrypt_put_master_key(mk); } -out_unlock: mutex_unlock(&fscrypt_add_key_mutex); return err; } @@ -731,19 +781,19 @@ int fscrypt_verify_key_added(struct supe const u8 identifier[FSCRYPT_KEY_IDENTIFIER_SIZE]) { struct fscrypt_key_specifier mk_spec; - struct key *key, *mk_user; struct fscrypt_master_key *mk; + struct key *mk_user; int err;
mk_spec.type = FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER; memcpy(mk_spec.u.identifier, identifier, FSCRYPT_KEY_IDENTIFIER_SIZE);
- key = fscrypt_find_master_key(sb, &mk_spec); - if (IS_ERR(key)) { - err = PTR_ERR(key); + mk = fscrypt_find_master_key(sb, &mk_spec); + if (!mk) { + err = -ENOKEY; goto out; } - mk = key->payload.data[0]; + down_read(&mk->mk_sem); mk_user = find_master_key_user(mk); if (IS_ERR(mk_user)) { err = PTR_ERR(mk_user); @@ -751,7 +801,8 @@ int fscrypt_verify_key_added(struct supe key_put(mk_user); err = 0; } - key_put(key); + up_read(&mk->mk_sem); + fscrypt_put_master_key(mk); out: if (err == -ENOKEY && capable(CAP_FOWNER)) err = 0; @@ -913,11 +964,10 @@ static int do_remove_key(struct file *fi struct super_block *sb = file_inode(filp)->i_sb; struct fscrypt_remove_key_arg __user *uarg = _uarg; struct fscrypt_remove_key_arg arg; - struct key *key; struct fscrypt_master_key *mk; u32 status_flags = 0; int err; - bool dead; + bool inodes_remain;
if (copy_from_user(&arg, uarg, sizeof(arg))) return -EFAULT; @@ -937,12 +987,10 @@ static int do_remove_key(struct file *fi return -EACCES;
/* Find the key being removed. */ - key = fscrypt_find_master_key(sb, &arg.key_spec); - if (IS_ERR(key)) - return PTR_ERR(key); - mk = key->payload.data[0]; - - down_write(&key->sem); + mk = fscrypt_find_master_key(sb, &arg.key_spec); + if (!mk) + return -ENOKEY; + down_write(&mk->mk_sem);
/* If relevant, remove current user's (or all users) claim to the key */ if (mk->mk_users && mk->mk_users->keys.nr_leaves_on_tree != 0) { @@ -951,7 +999,7 @@ static int do_remove_key(struct file *fi else err = remove_master_key_user(mk); if (err) { - up_write(&key->sem); + up_write(&mk->mk_sem); goto out_put_key; } if (mk->mk_users->keys.nr_leaves_on_tree != 0) { @@ -963,26 +1011,22 @@ static int do_remove_key(struct file *fi status_flags |= FSCRYPT_KEY_REMOVAL_STATUS_FLAG_OTHER_USERS; err = 0; - up_write(&key->sem); + up_write(&mk->mk_sem); goto out_put_key; } }
/* No user claims remaining. Go ahead and wipe the secret. */ - dead = false; + err = -ENOKEY; if (is_master_key_secret_present(&mk->mk_secret)) { wipe_master_key_secret(&mk->mk_secret); - dead = refcount_dec_and_test(&mk->mk_refcount); - } - up_write(&key->sem); - if (dead) { - /* - * No inodes reference the key, and we wiped the secret, so the - * key object is free to be removed from the keyring. - */ - key_invalidate(key); + fscrypt_put_master_key_activeref(mk); err = 0; - } else { + } + inodes_remain = refcount_read(&mk->mk_active_refs) > 0; + up_write(&mk->mk_sem); + + if (inodes_remain) { /* Some inodes still reference this key; try to evict them. */ err = try_to_lock_encrypted_files(sb, mk); if (err == -EBUSY) { @@ -998,7 +1042,7 @@ static int do_remove_key(struct file *fi * has been fully removed including all files locked. */ out_put_key: - key_put(key); + fscrypt_put_master_key(mk); if (err == 0) err = put_user(status_flags, &uarg->removal_status_flags); return err; @@ -1045,7 +1089,6 @@ int fscrypt_ioctl_get_key_status(struct { struct super_block *sb = file_inode(filp)->i_sb; struct fscrypt_get_key_status_arg arg; - struct key *key; struct fscrypt_master_key *mk; int err;
@@ -1062,19 +1105,18 @@ int fscrypt_ioctl_get_key_status(struct arg.user_count = 0; memset(arg.__out_reserved, 0, sizeof(arg.__out_reserved));
- key = fscrypt_find_master_key(sb, &arg.key_spec); - if (IS_ERR(key)) { - if (key != ERR_PTR(-ENOKEY)) - return PTR_ERR(key); + mk = fscrypt_find_master_key(sb, &arg.key_spec); + if (!mk) { arg.status = FSCRYPT_KEY_STATUS_ABSENT; err = 0; goto out; } - mk = key->payload.data[0]; - down_read(&key->sem); + down_read(&mk->mk_sem);
if (!is_master_key_secret_present(&mk->mk_secret)) { - arg.status = FSCRYPT_KEY_STATUS_INCOMPLETELY_REMOVED; + arg.status = refcount_read(&mk->mk_active_refs) > 0 ? + FSCRYPT_KEY_STATUS_INCOMPLETELY_REMOVED : + FSCRYPT_KEY_STATUS_ABSENT /* raced with full removal */; err = 0; goto out_release_key; } @@ -1096,8 +1138,8 @@ int fscrypt_ioctl_get_key_status(struct } err = 0; out_release_key: - up_read(&key->sem); - key_put(key); + up_read(&mk->mk_sem); + fscrypt_put_master_key(mk); out: if (!err && copy_to_user(uarg, &arg, sizeof(arg))) err = -EFAULT; @@ -1109,13 +1151,9 @@ int __init fscrypt_init_keyring(void) { int err;
- err = register_key_type(&key_type_fscrypt); - if (err) - return err; - err = register_key_type(&key_type_fscrypt_user); if (err) - goto err_unregister_fscrypt; + return err;
err = register_key_type(&key_type_fscrypt_provisioning); if (err) @@ -1125,7 +1163,5 @@ int __init fscrypt_init_keyring(void)
err_unregister_fscrypt_user: unregister_key_type(&key_type_fscrypt_user); -err_unregister_fscrypt: - unregister_key_type(&key_type_fscrypt); return err; } --- a/fs/crypto/keysetup.c +++ b/fs/crypto/keysetup.c @@ -9,7 +9,6 @@ */
#include <crypto/skcipher.h> -#include <linux/key.h> #include <linux/random.h>
#include "fscrypt_private.h" @@ -151,6 +150,7 @@ void fscrypt_destroy_prepared_key(struct { crypto_free_skcipher(prep_key->tfm); fscrypt_destroy_inline_crypt_key(prep_key); + memzero_explicit(prep_key, sizeof(*prep_key)); }
/* Given a per-file encryption key, set up the file's crypto transform object */ @@ -404,20 +404,18 @@ static bool fscrypt_valid_master_key_siz /* * Find the master key, then set up the inode's actual encryption key. * - * If the master key is found in the filesystem-level keyring, then the - * corresponding 'struct key' is returned in *master_key_ret with its semaphore - * read-locked. This is needed to ensure that only one task links the - * fscrypt_info into ->mk_decrypted_inodes (as multiple tasks may race to create - * an fscrypt_info for the same inode), and to synchronize the master key being - * removed with a new inode starting to use it. + * If the master key is found in the filesystem-level keyring, then it is + * returned in *mk_ret with its semaphore read-locked. This is needed to ensure + * that only one task links the fscrypt_info into ->mk_decrypted_inodes (as + * multiple tasks may race to create an fscrypt_info for the same inode), and to + * synchronize the master key being removed with a new inode starting to use it. */ static int setup_file_encryption_key(struct fscrypt_info *ci, bool need_dirhash_key, - struct key **master_key_ret) + struct fscrypt_master_key **mk_ret) { - struct key *key; - struct fscrypt_master_key *mk = NULL; struct fscrypt_key_specifier mk_spec; + struct fscrypt_master_key *mk; int err;
err = fscrypt_select_encryption_impl(ci); @@ -442,11 +440,10 @@ static int setup_file_encryption_key(str return -EINVAL; }
- key = fscrypt_find_master_key(ci->ci_inode->i_sb, &mk_spec); - if (IS_ERR(key)) { - if (key != ERR_PTR(-ENOKEY) || - ci->ci_policy.version != FSCRYPT_POLICY_V1) - return PTR_ERR(key); + mk = fscrypt_find_master_key(ci->ci_inode->i_sb, &mk_spec); + if (!mk) { + if (ci->ci_policy.version != FSCRYPT_POLICY_V1) + return -ENOKEY;
/* * As a legacy fallback for v1 policies, search for the key in @@ -456,9 +453,7 @@ static int setup_file_encryption_key(str */ return fscrypt_setup_v1_file_key_via_subscribed_keyrings(ci); } - - mk = key->payload.data[0]; - down_read(&key->sem); + down_read(&mk->mk_sem);
/* Has the secret been removed (via FS_IOC_REMOVE_ENCRYPTION_KEY)? */ if (!is_master_key_secret_present(&mk->mk_secret)) { @@ -486,18 +481,18 @@ static int setup_file_encryption_key(str if (err) goto out_release_key;
- *master_key_ret = key; + *mk_ret = mk; return 0;
out_release_key: - up_read(&key->sem); - key_put(key); + up_read(&mk->mk_sem); + fscrypt_put_master_key(mk); return err; }
static void put_crypt_info(struct fscrypt_info *ci) { - struct key *key; + struct fscrypt_master_key *mk;
if (!ci) return; @@ -507,24 +502,18 @@ static void put_crypt_info(struct fscryp else if (ci->ci_owns_key) fscrypt_destroy_prepared_key(&ci->ci_enc_key);
- key = ci->ci_master_key; - if (key) { - struct fscrypt_master_key *mk = key->payload.data[0]; - + mk = ci->ci_master_key; + if (mk) { /* * Remove this inode from the list of inodes that were unlocked - * with the master key. - * - * In addition, if we're removing the last inode from a key that - * already had its secret removed, invalidate the key so that it - * gets removed from ->s_master_keys. + * with the master key. In addition, if we're removing the last + * inode from a master key struct that already had its secret + * removed, then complete the full removal of the struct. */ spin_lock(&mk->mk_decrypted_inodes_lock); list_del(&ci->ci_master_key_link); spin_unlock(&mk->mk_decrypted_inodes_lock); - if (refcount_dec_and_test(&mk->mk_refcount)) - key_invalidate(key); - key_put(key); + fscrypt_put_master_key_activeref(mk); } memzero_explicit(ci, sizeof(*ci)); kmem_cache_free(fscrypt_info_cachep, ci); @@ -538,7 +527,7 @@ fscrypt_setup_encryption_info(struct ino { struct fscrypt_info *crypt_info; struct fscrypt_mode *mode; - struct key *master_key = NULL; + struct fscrypt_master_key *mk = NULL; int res;
res = fscrypt_initialize(inode->i_sb->s_cop->flags); @@ -561,8 +550,7 @@ fscrypt_setup_encryption_info(struct ino WARN_ON(mode->ivsize > FSCRYPT_MAX_IV_SIZE); crypt_info->ci_mode = mode;
- res = setup_file_encryption_key(crypt_info, need_dirhash_key, - &master_key); + res = setup_file_encryption_key(crypt_info, need_dirhash_key, &mk); if (res) goto out;
@@ -577,12 +565,9 @@ fscrypt_setup_encryption_info(struct ino * We won the race and set ->i_crypt_info to our crypt_info. * Now link it into the master key's inode list. */ - if (master_key) { - struct fscrypt_master_key *mk = - master_key->payload.data[0]; - - refcount_inc(&mk->mk_refcount); - crypt_info->ci_master_key = key_get(master_key); + if (mk) { + crypt_info->ci_master_key = mk; + refcount_inc(&mk->mk_active_refs); spin_lock(&mk->mk_decrypted_inodes_lock); list_add(&crypt_info->ci_master_key_link, &mk->mk_decrypted_inodes); @@ -592,9 +577,9 @@ fscrypt_setup_encryption_info(struct ino } res = 0; out: - if (master_key) { - up_read(&master_key->sem); - key_put(master_key); + if (mk) { + up_read(&mk->mk_sem); + fscrypt_put_master_key(mk); } put_crypt_info(crypt_info); return res; @@ -759,7 +744,6 @@ EXPORT_SYMBOL(fscrypt_free_inode); int fscrypt_drop_inode(struct inode *inode) { const struct fscrypt_info *ci = fscrypt_get_info(inode); - const struct fscrypt_master_key *mk;
/* * If ci is NULL, then the inode doesn't have an encryption key set up @@ -769,7 +753,6 @@ int fscrypt_drop_inode(struct inode *ino */ if (!ci || !ci->ci_master_key) return 0; - mk = ci->ci_master_key->payload.data[0];
/* * With proper, non-racy use of FS_IOC_REMOVE_ENCRYPTION_KEY, all inodes @@ -788,6 +771,6 @@ int fscrypt_drop_inode(struct inode *ino * then the thread removing the key will either evict the inode itself * or will correctly detect that it wasn't evicted due to the race. */ - return !is_master_key_secret_present(&mk->mk_secret); + return !is_master_key_secret_present(&ci->ci_master_key->mk_secret); } EXPORT_SYMBOL_GPL(fscrypt_drop_inode); --- a/fs/crypto/policy.c +++ b/fs/crypto/policy.c @@ -692,12 +692,8 @@ int fscrypt_set_context(struct inode *in * delayed key setup that requires the inode number. */ if (ci->ci_policy.version == FSCRYPT_POLICY_V2 && - (ci->ci_policy.v2.flags & FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32)) { - const struct fscrypt_master_key *mk = - ci->ci_master_key->payload.data[0]; - - fscrypt_hash_inode_number(ci, mk); - } + (ci->ci_policy.v2.flags & FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32)) + fscrypt_hash_inode_number(ci, ci->ci_master_key);
return inode->i_sb->s_cop->set_context(inode, &ctx, ctxsize, fs_data); } --- a/fs/super.c +++ b/fs/super.c @@ -293,7 +293,6 @@ static void __put_super(struct super_blo WARN_ON(s->s_inode_lru.node); WARN_ON(!list_empty(&s->s_mounts)); security_sb_free(s); - fscrypt_sb_free(s); put_user_ns(s->s_user_ns); kfree(s->s_subtype); call_rcu(&s->rcu, destroy_super_rcu); @@ -454,6 +453,7 @@ void generic_shutdown_super(struct super evict_inodes(sb); /* only nonzero refcount inodes can have marks */ fsnotify_sb_delete(sb); + fscrypt_sb_delete(sb); security_sb_delete(sb);
if (sb->s_dio_done_wq) { --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1487,7 +1487,7 @@ struct super_block { const struct xattr_handler **s_xattr; #ifdef CONFIG_FS_ENCRYPTION const struct fscrypt_operations *s_cop; - struct key *s_master_keys; /* master crypto keys in use */ + struct fscrypt_keyring *s_master_keys; /* master crypto keys in use */ #endif #ifdef CONFIG_FS_VERITY const struct fsverity_operations *s_vop; --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -294,7 +294,7 @@ fscrypt_free_dummy_policy(struct fscrypt }
/* keyring.c */ -void fscrypt_sb_free(struct super_block *sb); +void fscrypt_sb_delete(struct super_block *sb); int fscrypt_ioctl_add_key(struct file *filp, void __user *arg); int fscrypt_ioctl_remove_key(struct file *filp, void __user *arg); int fscrypt_ioctl_remove_key_all_users(struct file *filp, void __user *arg); @@ -482,7 +482,7 @@ fscrypt_free_dummy_policy(struct fscrypt }
/* keyring.c */ -static inline void fscrypt_sb_free(struct super_block *sb) +static inline void fscrypt_sb_delete(struct super_block *sb) { }
From: Eric Biggers ebiggers@google.com
commit ccd30a476f8e864732de220bd50e6f372f5ebcab upstream.
Commit d7e7b9af104c ("fscrypt: stop using keyrings subsystem for fscrypt_master_key") moved the keyring destruction from __put_super() to generic_shutdown_super() so that the filesystem's block device(s) are still available. Unfortunately, this causes a memory leak in the case where a mount is attempted with the test_dummy_encryption mount option, but the mount fails after the option has already been processed.
To fix this, attempt the keyring destruction in both places.
Reported-by: syzbot+104c2a89561289cec13e@syzkaller.appspotmail.com Fixes: d7e7b9af104c ("fscrypt: stop using keyrings subsystem for fscrypt_master_key") Signed-off-by: Eric Biggers ebiggers@google.com Reviewed-by: Christian Brauner (Microsoft) brauner@kernel.org Link: https://lore.kernel.org/r/20221011213838.209879-1-ebiggers@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/crypto/keyring.c | 17 +++++++++++------ fs/super.c | 3 ++- include/linux/fscrypt.h | 4 ++-- 3 files changed, 15 insertions(+), 9 deletions(-)
--- a/fs/crypto/keyring.c +++ b/fs/crypto/keyring.c @@ -202,14 +202,19 @@ static int allocate_filesystem_keyring(s }
/* - * This is called at unmount time to release all encryption keys that have been - * added to the filesystem, along with the keyring that contains them. + * Release all encryption keys that have been added to the filesystem, along + * with the keyring that contains them. * - * Note that besides clearing and freeing memory, this might need to evict keys - * from the keyslots of an inline crypto engine. Therefore, this must be called - * while the filesystem's underlying block device(s) are still available. + * This is called at unmount time. The filesystem's underlying block device(s) + * are still available at this time; this is important because after user file + * accesses have been allowed, this function may need to evict keys from the + * keyslots of an inline crypto engine, which requires the block device(s). + * + * This is also called when the super_block is being freed. This is needed to + * avoid a memory leak if mounting fails after the "test_dummy_encryption" + * option was processed, as in that case the unmount-time call isn't made. */ -void fscrypt_sb_delete(struct super_block *sb) +void fscrypt_destroy_keyring(struct super_block *sb) { struct fscrypt_keyring *keyring = sb->s_master_keys; size_t i; --- a/fs/super.c +++ b/fs/super.c @@ -293,6 +293,7 @@ static void __put_super(struct super_blo WARN_ON(s->s_inode_lru.node); WARN_ON(!list_empty(&s->s_mounts)); security_sb_free(s); + fscrypt_destroy_keyring(s); put_user_ns(s->s_user_ns); kfree(s->s_subtype); call_rcu(&s->rcu, destroy_super_rcu); @@ -453,7 +454,7 @@ void generic_shutdown_super(struct super evict_inodes(sb); /* only nonzero refcount inodes can have marks */ fsnotify_sb_delete(sb); - fscrypt_sb_delete(sb); + fscrypt_destroy_keyring(sb); security_sb_delete(sb);
if (sb->s_dio_done_wq) { --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -294,7 +294,7 @@ fscrypt_free_dummy_policy(struct fscrypt }
/* keyring.c */ -void fscrypt_sb_delete(struct super_block *sb); +void fscrypt_destroy_keyring(struct super_block *sb); int fscrypt_ioctl_add_key(struct file *filp, void __user *arg); int fscrypt_ioctl_remove_key(struct file *filp, void __user *arg); int fscrypt_ioctl_remove_key_all_users(struct file *filp, void __user *arg); @@ -482,7 +482,7 @@ fscrypt_free_dummy_policy(struct fscrypt }
/* keyring.c */ -static inline void fscrypt_sb_delete(struct super_block *sb) +static inline void fscrypt_destroy_keyring(struct super_block *sb) { }
From: Filipe Manana fdmanana@suse.com
commit 8184620ae21213d51eaf2e0bd4186baacb928172 upstream.
When doing a direct IO write using a iocb with nowait and dsync set, we end up not syncing the file once the write completes.
This is because we tell iomap to not call generic_write_sync(), which would result in calling btrfs_sync_file(), in order to avoid a deadlock since iomap can call it while we are holding the inode's lock and btrfs_sync_file() needs to acquire the inode's lock. The deadlock happens only if the write happens synchronously, when iomap_dio_rw() calls iomap_dio_complete() before it returns. Instead we do the sync ourselves at btrfs_do_write_iter().
For a nowait write however we can end up not doing the sync ourselves at at btrfs_do_write_iter() because the write could have been queued, and therefore we get -EIOCBQUEUED returned from iomap in such case. That makes us skip the sync call at btrfs_do_write_iter(), as we don't do it for any error returned from btrfs_direct_write(). We can't simply do the call even if -EIOCBQUEUED is returned, since that would block the task waiting for IO, both for the data since there are bios still in progress as well as potentially blocking when joining a log transaction and when syncing the log (writing log trees, super blocks, etc).
So let iomap do the sync call itself and in order to avoid deadlocks for the case of synchronous writes (without nowait), use __iomap_dio_rw() and have ourselves call iomap_dio_complete() after unlocking the inode.
A test case will later be sent for fstests, after this is fixed in Linus' tree.
Fixes: 51bd9563b678 ("btrfs: fix deadlock due to page faults during direct IO reads and writes") Reported-by: Марк Коренберг socketpair@gmail.com Link: https://lore.kernel.org/linux-btrfs/CAEmTpZGRKbzc16fWPvxbr6AfFsQoLmz-Lcg-7Og... CC: stable@vger.kernel.org # 6.0+ Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/file.c | 39 ++++++++++++++++----------------------- 1 file changed, 16 insertions(+), 23 deletions(-)
--- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1906,7 +1906,6 @@ static ssize_t check_direct_IO(struct bt
static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) { - const bool is_sync_write = (iocb->ki_flags & IOCB_DSYNC); struct file *file = iocb->ki_filp; struct inode *inode = file_inode(file); struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); @@ -1917,6 +1916,7 @@ static ssize_t btrfs_direct_write(struct loff_t endbyte; ssize_t err; unsigned int ilock_flags = 0; + struct iomap_dio *dio;
if (iocb->ki_flags & IOCB_NOWAIT) ilock_flags |= BTRFS_ILOCK_TRY; @@ -1960,15 +1960,6 @@ relock: }
/* - * We remove IOCB_DSYNC so that we don't deadlock when iomap_dio_rw() - * calls generic_write_sync() (through iomap_dio_complete()), because - * that results in calling fsync (btrfs_sync_file()) which will try to - * lock the inode in exclusive/write mode. - */ - if (is_sync_write) - iocb->ki_flags &= ~IOCB_DSYNC; - - /* * The iov_iter can be mapped to the same file range we are writing to. * If that's the case, then we will deadlock in the iomap code, because * it first calls our callback btrfs_dio_iomap_begin(), which will create @@ -1986,12 +1977,23 @@ relock: * So here we disable page faults in the iov_iter and then retry if we * got -EFAULT, faulting in the pages before the retry. */ -again: from->nofault = true; - err = iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops, &btrfs_dio_ops, - IOMAP_DIO_PARTIAL, written); + dio = __iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops, &btrfs_dio_ops, + IOMAP_DIO_PARTIAL, written); from->nofault = false;
+ /* + * iomap_dio_complete() will call btrfs_sync_file() if we have a dsync + * iocb, and that needs to lock the inode. So unlock it before calling + * iomap_dio_complete() to avoid a deadlock. + */ + btrfs_inode_unlock(inode, ilock_flags); + + if (IS_ERR_OR_NULL(dio)) + err = PTR_ERR_OR_ZERO(dio); + else + err = iomap_dio_complete(dio); + /* No increment (+=) because iomap returns a cumulative value. */ if (err > 0) written = err; @@ -2017,19 +2019,10 @@ again: } else { fault_in_iov_iter_readable(from, left); prev_left = left; - goto again; + goto relock; } }
- btrfs_inode_unlock(inode, ilock_flags); - - /* - * Add back IOCB_DSYNC. Our caller, btrfs_file_write_iter(), will do - * the fsync (call generic_write_sync()). - */ - if (is_sync_write) - iocb->ki_flags |= IOCB_DSYNC; - /* If 'err' is -ENOTBLK then it means we must fallback to buffered IO. */ if ((err < 0 && err != -ENOTBLK) || !iov_iter_count(from)) goto out;
From: Josef Bacik josef@toxicpanda.com
commit 968b71583130b6104c9f33ba60446d598e327a8b upstream.
We have been seeing the following panic in production
kernel BUG at fs/btrfs/tree-mod-log.c:677! invalid opcode: 0000 [#1] SMP RIP: 0010:tree_mod_log_rewind+0x1b4/0x200 RSP: 0000:ffffc9002c02f890 EFLAGS: 00010293 RAX: 0000000000000003 RBX: ffff8882b448c700 RCX: 0000000000000000 RDX: 0000000000008000 RSI: 00000000000000a7 RDI: ffff88877d831c00 RBP: 0000000000000002 R08: 000000000000009f R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000100c40 R12: 0000000000000001 R13: ffff8886c26d6a00 R14: ffff88829f5424f8 R15: ffff88877d831a00 FS: 00007fee1d80c780(0000) GS:ffff8890400c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fee1963a020 CR3: 0000000434f33002 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: btrfs_get_old_root+0x12b/0x420 btrfs_search_old_slot+0x64/0x2f0 ? tree_mod_log_oldest_root+0x3d/0xf0 resolve_indirect_ref+0xfd/0x660 ? ulist_alloc+0x31/0x60 ? kmem_cache_alloc_trace+0x114/0x2c0 find_parent_nodes+0x97a/0x17e0 ? ulist_alloc+0x30/0x60 btrfs_find_all_roots_safe+0x97/0x150 iterate_extent_inodes+0x154/0x370 ? btrfs_search_path_in_tree+0x240/0x240 iterate_inodes_from_logical+0x98/0xd0 ? btrfs_search_path_in_tree+0x240/0x240 btrfs_ioctl_logical_to_ino+0xd9/0x180 btrfs_ioctl+0xe2/0x2ec0 ? __mod_memcg_lruvec_state+0x3d/0x280 ? do_sys_openat2+0x6d/0x140 ? kretprobe_dispatcher+0x47/0x70 ? kretprobe_rethook_handler+0x38/0x50 ? rethook_trampoline_handler+0x82/0x140 ? arch_rethook_trampoline_callback+0x3b/0x50 ? kmem_cache_free+0xfb/0x270 ? do_sys_openat2+0xd5/0x140 __x64_sys_ioctl+0x71/0xb0 do_syscall_64+0x2d/0x40
Which is this code in tree_mod_log_rewind()
switch (tm->op) { case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING: BUG_ON(tm->slot < n);
This occurs because we replay the nodes in order that they happened, and when we do a REPLACE we will log a REMOVE_WHILE_FREEING for every slot, starting at 0. 'n' here is the number of items in this block, which in this case was 1, but we had 2 REMOVE_WHILE_FREEING operations.
The actual root cause of this was that we were replaying operations for a block that shouldn't have been replayed. Consider the following sequence of events
1. We have an already modified root, and we do a btrfs_get_tree_mod_seq(). 2. We begin removing items from this root, triggering KEY_REPLACE for it's child slots. 3. We remove one of the 2 children this root node points to, thus triggering the root node promotion of the remaining child, and freeing this node. 4. We modify a new root, and re-allocate the above node to the root node of this other root.
The tree mod log looks something like this
logical 0 op KEY_REPLACE (slot 1) seq 2 logical 0 op KEY_REMOVE (slot 1) seq 3 logical 0 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 4 logical 4096 op LOG_ROOT_REPLACE (old logical 0) seq 5 logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 1) seq 6 logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 7 logical 0 op LOG_ROOT_REPLACE (old logical 8192) seq 8
From here the bug is triggered by the following steps
1. Call btrfs_get_old_root() on the new_root. 2. We call tree_mod_log_oldest_root(btrfs_root_node(new_root)), which is currently logical 0. 3. tree_mod_log_oldest_root() calls tree_mod_log_search_oldest(), which gives us the KEY_REPLACE seq 2, and since that's not a LOG_ROOT_REPLACE we incorrectly believe that we don't have an old root, because we expect that the most recent change should be a LOG_ROOT_REPLACE. 4. Back in tree_mod_log_oldest_root() we don't have a LOG_ROOT_REPLACE, so we don't set old_root, we simply use our existing extent buffer. 5. Since we're using our existing extent buffer (logical 0) we call tree_mod_log_search(0) in order to get the newest change to start the rewind from, which ends up being the LOG_ROOT_REPLACE at seq 8. 6. Again since we didn't find an old_root we simply clone logical 0 at it's current state. 7. We call tree_mod_log_rewind() with the cloned extent buffer. 8. Set n = btrfs_header_nritems(logical 0), which would be whatever the original nritems was when we COWed the original root, say for this example it's 2. 9. We start from the newest operation and work our way forward, so we see LOG_ROOT_REPLACE which we ignore. 10. Next we see KEY_REMOVE_WHILE_FREEING for slot 0, which triggers the BUG_ON(tm->slot < n), because it expects if we've done this we have a completely empty extent buffer to replay completely.
The correct thing would be to find the first LOG_ROOT_REPLACE, and then get the old_root set to logical 8192. In fact making that change fixes this particular problem.
However consider the much more complicated case. We have a child node in this tree and the above situation. In the above case we freed one of the child blocks at the seq 3 operation. If this block was also re-allocated and got new tree mod log operations we would have a different problem. btrfs_search_old_slot(orig root) would get down to the logical 0 root that still pointed at that node. However in btrfs_search_old_slot() we call tree_mod_log_rewind(buf) directly. This is not context aware enough to know which operations we should be replaying. If the block was re-allocated multiple times we may only want to replay a range of operations, and determining what that range is isn't possible to determine.
We could maybe solve this by keeping track of which root the node belonged to at every tree mod log operation, and then passing this around to make sure we're only replaying operations that relate to the root we're trying to rewind.
However there's a simpler way to solve this problem, simply disallow reallocations if we have currently running tree mod log users. We already do this for leaf's, so we're simply expanding this to nodes as well. This is a relatively uncommon occurrence, and the problem is complicated enough I'm worried that we will still have corner cases in the reallocation case. So fix this in the most straightforward way possible.
Fixes: bd989ba359f2 ("Btrfs: add tree modification log functions") CC: stable@vger.kernel.org # 3.3+ Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Josef Bacik josef@toxicpanda.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent-tree.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-)
--- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3307,21 +3307,22 @@ void btrfs_free_tree_block(struct btrfs_ }
/* - * If this is a leaf and there are tree mod log users, we may - * have recorded mod log operations that point to this leaf. - * So we must make sure no one reuses this leaf's extent before - * mod log operations are applied to a node, otherwise after - * rewinding a node using the mod log operations we get an - * inconsistent btree, as the leaf's extent may now be used as - * a node or leaf for another different btree. + * If there are tree mod log users we may have recorded mod log + * operations for this node. If we re-allocate this node we + * could replay operations on this node that happened when it + * existed in a completely different root. For example if it + * was part of root A, then was reallocated to root B, and we + * are doing a btrfs_old_search_slot(root b), we could replay + * operations that happened when the block was part of root A, + * giving us an inconsistent view of the btree. + * * We are safe from races here because at this point no other * node or root points to this extent buffer, so if after this - * check a new tree mod log user joins, it will not be able to - * find a node pointing to this leaf and record operations that - * point to this leaf. + * check a new tree mod log user joins we will not have an + * existing log of operations on this node that we have to + * contend with. */ - if (btrfs_header_level(buf) == 0 && - test_bit(BTRFS_FS_TREE_MOD_LOG_USERS, &fs_info->flags)) + if (test_bit(BTRFS_FS_TREE_MOD_LOG_USERS, &fs_info->flags)) must_pin = true;
if (must_pin || btrfs_is_zoned(fs_info)) {
From: David Sterba dsterba@suse.com
commit 2398091f9c2c8e0040f4f9928666787a3e8108a7 upstream.
The type of parameter generation has been u32 since the beginning, however all callers pass a u64 generation, so unify the types to prevent potential loss.
CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/export.c | 2 +- fs/btrfs/export.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/fs/btrfs/export.c +++ b/fs/btrfs/export.c @@ -58,7 +58,7 @@ static int btrfs_encode_fh(struct inode }
struct dentry *btrfs_get_dentry(struct super_block *sb, u64 objectid, - u64 root_objectid, u32 generation, + u64 root_objectid, u64 generation, int check_generation) { struct btrfs_fs_info *fs_info = btrfs_sb(sb); --- a/fs/btrfs/export.h +++ b/fs/btrfs/export.h @@ -19,7 +19,7 @@ struct btrfs_fid { } __attribute__ ((packed));
struct dentry *btrfs_get_dentry(struct super_block *sb, u64 objectid, - u64 root_objectid, u32 generation, + u64 root_objectid, u64 generation, int check_generation); struct dentry *btrfs_get_parent(struct dentry *child);
From: Li Huafei lihuafei1@huawei.com
commit 0e792b89e6800cd9cb4757a76a96f7ef3e8b6294 upstream.
KASAN reported a use-after-free with ftrace ops [1]. It was found from vmcore that perf had registered two ops with the same content successively, both dynamic. After unregistering the second ops, a use-after-free occurred.
In ftrace_shutdown(), when the second ops is unregistered, the FTRACE_UPDATE_CALLS command is not set because there is another enabled ops with the same content. Also, both ops are dynamic and the ftrace callback function is ftrace_ops_list_func, so the FTRACE_UPDATE_TRACE_FUNC command will not be set. Eventually the value of 'command' will be 0 and ftrace_shutdown() will skip the rcu synchronization.
However, ftrace may be activated. When the ops is released, another CPU may be accessing the ops. Add the missing synchronization to fix this problem.
[1] BUG: KASAN: use-after-free in __ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline] BUG: KASAN: use-after-free in ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049 Read of size 8 at addr ffff56551965bbc8 by task syz-executor.2/14468
CPU: 1 PID: 14468 Comm: syz-executor.2 Not tainted 5.10.0 #7 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x40c arch/arm64/kernel/stacktrace.c:132 show_stack+0x30/0x40 arch/arm64/kernel/stacktrace.c:196 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b4/0x248 lib/dump_stack.c:118 print_address_description.constprop.0+0x28/0x48c mm/kasan/report.c:387 __kasan_report mm/kasan/report.c:547 [inline] kasan_report+0x118/0x210 mm/kasan/report.c:564 check_memory_region_inline mm/kasan/generic.c:187 [inline] __asan_load8+0x98/0xc0 mm/kasan/generic.c:253 __ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline] ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049 ftrace_graph_call+0x0/0x4 __might_sleep+0x8/0x100 include/linux/perf_event.h:1170 __might_fault mm/memory.c:5183 [inline] __might_fault+0x58/0x70 mm/memory.c:5171 do_strncpy_from_user lib/strncpy_from_user.c:41 [inline] strncpy_from_user+0x1f4/0x4b0 lib/strncpy_from_user.c:139 getname_flags+0xb0/0x31c fs/namei.c:149 getname+0x2c/0x40 fs/namei.c:209 [...]
Allocated by task 14445: kasan_save_stack+0x24/0x50 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc mm/kasan/common.c:479 [inline] __kasan_kmalloc.constprop.0+0x110/0x13c mm/kasan/common.c:449 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:493 kmem_cache_alloc_trace+0x440/0x924 mm/slub.c:2950 kmalloc include/linux/slab.h:563 [inline] kzalloc include/linux/slab.h:675 [inline] perf_event_alloc.part.0+0xb4/0x1350 kernel/events/core.c:11230 perf_event_alloc kernel/events/core.c:11733 [inline] __do_sys_perf_event_open kernel/events/core.c:11831 [inline] __se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723 __arm64_sys_perf_event_open+0x6c/0x80 kernel/events/core.c:11723 [...]
Freed by task 14445: kasan_save_stack+0x24/0x50 mm/kasan/common.c:48 kasan_set_track+0x24/0x34 mm/kasan/common.c:56 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:358 __kasan_slab_free.part.0+0x11c/0x1b0 mm/kasan/common.c:437 __kasan_slab_free mm/kasan/common.c:445 [inline] kasan_slab_free+0x2c/0x40 mm/kasan/common.c:446 slab_free_hook mm/slub.c:1569 [inline] slab_free_freelist_hook mm/slub.c:1608 [inline] slab_free mm/slub.c:3179 [inline] kfree+0x12c/0xc10 mm/slub.c:4176 perf_event_alloc.part.0+0xa0c/0x1350 kernel/events/core.c:11434 perf_event_alloc kernel/events/core.c:11733 [inline] __do_sys_perf_event_open kernel/events/core.c:11831 [inline] __se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723 [...]
Link: https://lore.kernel.org/linux-trace-kernel/20221103031010.166498-1-lihuafei1...
Fixes: edb096e00724f ("ftrace: Fix memleak when unregistering dynamic ops when tracing disabled") Cc: stable@vger.kernel.org Suggested-by: Steven Rostedt rostedt@goodmis.org Signed-off-by: Li Huafei lihuafei1@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/ftrace.c | 16 +++------------- 1 file changed, 3 insertions(+), 13 deletions(-)
--- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -2948,18 +2948,8 @@ int ftrace_shutdown(struct ftrace_ops *o command |= FTRACE_UPDATE_TRACE_FUNC; }
- if (!command || !ftrace_enabled) { - /* - * If these are dynamic or per_cpu ops, they still - * need their data freed. Since, function tracing is - * not currently active, we can just free them - * without synchronizing all CPUs. - */ - if (ops->flags & FTRACE_OPS_FL_DYNAMIC) - goto free_ops; - - return 0; - } + if (!command || !ftrace_enabled) + goto out;
/* * If the ops uses a trampoline, then it needs to be @@ -2996,6 +2986,7 @@ int ftrace_shutdown(struct ftrace_ops *o removed_ops = NULL; ops->flags &= ~FTRACE_OPS_FL_REMOVING;
+out: /* * Dynamic ops may be freed, we must make sure that all * callers are done before leaving this function. @@ -3023,7 +3014,6 @@ int ftrace_shutdown(struct ftrace_ops *o if (IS_ENABLED(CONFIG_PREEMPTION)) synchronize_rcu_tasks();
- free_ops: ftrace_trampoline_free(ops); }
From: Kuniyuki Iwashima kuniyu@amazon.com
commit 11052589cf5c0bab3b4884d423d5f60c38fcf25d upstream.
Commit e21145a9871a ("ipv4: namespacify ip_early_demux sysctl knob") made it possible to enable/disable early_demux on a per-netns basis. Then, we introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for TCP/UDP in commit dddb64bcb346 ("net: Add sysctl to toggle early demux for tcp and udp"). However, the .proc_handler() was wrong and actually disabled us from changing the behaviour in each netns.
We can execute early_demux if net.ipv4.ip_early_demux is on and each proto .early_demux() handler is not NULL. When we toggle (tcp|udp)_early_demux, the change itself is saved in each netns variable, but the .early_demux() handler is a global variable, so the handler is switched based on the init_net's sysctl variable. Thus, netns (tcp|udp)_early_demux knobs have nothing to do with the logic. Whether we CAN execute proto .early_demux() is always decided by init_net's sysctl knob, and whether we DO it or not is by each netns ip_early_demux knob.
This patch namespacifies (tcp|udp)_early_demux again. For now, the users of the .early_demux() handler are TCP and UDP only, and they are called directly to avoid retpoline. So, we can remove the .early_demux() handler from inet6?_protos and need not dereference them in ip6?_rcv_finish_core(). If another proto needs .early_demux(), we can restore it at that time.
Fixes: dddb64bcb346 ("net: Add sysctl to toggle early demux for tcp and udp") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/protocol.h | 4 --- include/net/tcp.h | 2 - include/net/udp.h | 2 - net/ipv4/af_inet.c | 14 +--------- net/ipv4/ip_input.c | 37 ++++++++++++++++------------ net/ipv4/sysctl_net_ipv4.c | 59 +-------------------------------------------- net/ipv6/ip6_input.c | 23 +++++++++-------- net/ipv6/tcp_ipv6.c | 9 +----- net/ipv6/udp.c | 9 +----- 9 files changed, 45 insertions(+), 114 deletions(-)
--- a/include/net/protocol.h +++ b/include/net/protocol.h @@ -35,8 +35,6 @@
/* This is used to register protocols. */ struct net_protocol { - int (*early_demux)(struct sk_buff *skb); - int (*early_demux_handler)(struct sk_buff *skb); int (*handler)(struct sk_buff *skb);
/* This returns an error if we weren't able to handle the error. */ @@ -52,8 +50,6 @@ struct net_protocol {
#if IS_ENABLED(CONFIG_IPV6) struct inet6_protocol { - void (*early_demux)(struct sk_buff *skb); - void (*early_demux_handler)(struct sk_buff *skb); int (*handler)(struct sk_buff *skb);
/* This returns an error if we weren't able to handle the error. */ --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -921,7 +921,7 @@ extern const struct inet_connection_sock
INDIRECT_CALLABLE_DECLARE(void tcp_v6_send_check(struct sock *sk, struct sk_buff *skb)); INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff *skb)); -INDIRECT_CALLABLE_DECLARE(void tcp_v6_early_demux(struct sk_buff *skb)); +void tcp_v6_early_demux(struct sk_buff *skb);
#endif
--- a/include/net/udp.h +++ b/include/net/udp.h @@ -173,7 +173,7 @@ INDIRECT_CALLABLE_DECLARE(int udp4_gro_c INDIRECT_CALLABLE_DECLARE(struct sk_buff *udp6_gro_receive(struct list_head *, struct sk_buff *)); INDIRECT_CALLABLE_DECLARE(int udp6_gro_complete(struct sk_buff *, int)); -INDIRECT_CALLABLE_DECLARE(void udp_v6_early_demux(struct sk_buff *)); +void udp_v6_early_demux(struct sk_buff *skb); INDIRECT_CALLABLE_DECLARE(int udpv6_rcv(struct sk_buff *));
struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb, --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1728,24 +1728,14 @@ static const struct net_protocol igmp_pr }; #endif
-/* thinking of making this const? Don't. - * early_demux can change based on sysctl. - */ -static struct net_protocol tcp_protocol = { - .early_demux = tcp_v4_early_demux, - .early_demux_handler = tcp_v4_early_demux, +static const struct net_protocol tcp_protocol = { .handler = tcp_v4_rcv, .err_handler = tcp_v4_err, .no_policy = 1, .icmp_strict_tag_validation = 1, };
-/* thinking of making this const? Don't. - * early_demux can change based on sysctl. - */ -static struct net_protocol udp_protocol = { - .early_demux = udp_v4_early_demux, - .early_demux_handler = udp_v4_early_demux, +static const struct net_protocol udp_protocol = { .handler = udp_rcv, .err_handler = udp_err, .no_policy = 1, --- a/net/ipv4/ip_input.c +++ b/net/ipv4/ip_input.c @@ -310,14 +310,13 @@ static bool ip_can_use_hint(const struct ip_hdr(hint)->tos == iph->tos; }
-INDIRECT_CALLABLE_DECLARE(int udp_v4_early_demux(struct sk_buff *)); -INDIRECT_CALLABLE_DECLARE(int tcp_v4_early_demux(struct sk_buff *)); +int tcp_v4_early_demux(struct sk_buff *skb); +int udp_v4_early_demux(struct sk_buff *skb); static int ip_rcv_finish_core(struct net *net, struct sock *sk, struct sk_buff *skb, struct net_device *dev, const struct sk_buff *hint) { const struct iphdr *iph = ip_hdr(skb); - int (*edemux)(struct sk_buff *skb); int err, drop_reason; struct rtable *rt;
@@ -330,21 +329,29 @@ static int ip_rcv_finish_core(struct net goto drop_error; }
- if (net->ipv4.sysctl_ip_early_demux && + if (READ_ONCE(net->ipv4.sysctl_ip_early_demux) && !skb_dst(skb) && !skb->sk && !ip_is_fragment(iph)) { - const struct net_protocol *ipprot; - int protocol = iph->protocol; - - ipprot = rcu_dereference(inet_protos[protocol]); - if (ipprot && (edemux = READ_ONCE(ipprot->early_demux))) { - err = INDIRECT_CALL_2(edemux, tcp_v4_early_demux, - udp_v4_early_demux, skb); - if (unlikely(err)) - goto drop_error; - /* must reload iph, skb->head might have changed */ - iph = ip_hdr(skb); + switch (iph->protocol) { + case IPPROTO_TCP: + if (READ_ONCE(net->ipv4.sysctl_tcp_early_demux)) { + tcp_v4_early_demux(skb); + + /* must reload iph, skb->head might have changed */ + iph = ip_hdr(skb); + } + break; + case IPPROTO_UDP: + if (READ_ONCE(net->ipv4.sysctl_udp_early_demux)) { + err = udp_v4_early_demux(skb); + if (unlikely(err)) + goto drop_error; + + /* must reload iph, skb->head might have changed */ + iph = ip_hdr(skb); + } + break; } }
--- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -363,61 +363,6 @@ bad_key: return ret; }
-static void proc_configure_early_demux(int enabled, int protocol) -{ - struct net_protocol *ipprot; -#if IS_ENABLED(CONFIG_IPV6) - struct inet6_protocol *ip6prot; -#endif - - rcu_read_lock(); - - ipprot = rcu_dereference(inet_protos[protocol]); - if (ipprot) - ipprot->early_demux = enabled ? ipprot->early_demux_handler : - NULL; - -#if IS_ENABLED(CONFIG_IPV6) - ip6prot = rcu_dereference(inet6_protos[protocol]); - if (ip6prot) - ip6prot->early_demux = enabled ? ip6prot->early_demux_handler : - NULL; -#endif - rcu_read_unlock(); -} - -static int proc_tcp_early_demux(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos) -{ - int ret = 0; - - ret = proc_dou8vec_minmax(table, write, buffer, lenp, ppos); - - if (write && !ret) { - int enabled = init_net.ipv4.sysctl_tcp_early_demux; - - proc_configure_early_demux(enabled, IPPROTO_TCP); - } - - return ret; -} - -static int proc_udp_early_demux(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos) -{ - int ret = 0; - - ret = proc_dou8vec_minmax(table, write, buffer, lenp, ppos); - - if (write && !ret) { - int enabled = init_net.ipv4.sysctl_udp_early_demux; - - proc_configure_early_demux(enabled, IPPROTO_UDP); - } - - return ret; -} - static int proc_tfo_blackhole_detect_timeout(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) @@ -720,14 +665,14 @@ static struct ctl_table ipv4_net_table[] .data = &init_net.ipv4.sysctl_udp_early_demux, .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_udp_early_demux + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_early_demux", .data = &init_net.ipv4.sysctl_tcp_early_demux, .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_tcp_early_demux + .proc_handler = proc_dou8vec_minmax, }, { .procname = "nexthop_compat_mode", --- a/net/ipv6/ip6_input.c +++ b/net/ipv6/ip6_input.c @@ -45,20 +45,23 @@ #include <net/inet_ecn.h> #include <net/dst_metadata.h>
-INDIRECT_CALLABLE_DECLARE(void tcp_v6_early_demux(struct sk_buff *)); static void ip6_rcv_finish_core(struct net *net, struct sock *sk, struct sk_buff *skb) { - void (*edemux)(struct sk_buff *skb); - - if (net->ipv4.sysctl_ip_early_demux && !skb_dst(skb) && skb->sk == NULL) { - const struct inet6_protocol *ipprot; - - ipprot = rcu_dereference(inet6_protos[ipv6_hdr(skb)->nexthdr]); - if (ipprot && (edemux = READ_ONCE(ipprot->early_demux))) - INDIRECT_CALL_2(edemux, tcp_v6_early_demux, - udp_v6_early_demux, skb); + if (READ_ONCE(net->ipv4.sysctl_ip_early_demux) && + !skb_dst(skb) && !skb->sk) { + switch (ipv6_hdr(skb)->nexthdr) { + case IPPROTO_TCP: + if (READ_ONCE(net->ipv4.sysctl_tcp_early_demux)) + tcp_v6_early_demux(skb); + break; + case IPPROTO_UDP: + if (READ_ONCE(net->ipv4.sysctl_udp_early_demux)) + udp_v6_early_demux(skb); + break; + } } + if (!skb_valid_dst(skb)) ip6_route_input(skb); } --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1854,7 +1854,7 @@ do_time_wait: goto discard_it; }
-INDIRECT_CALLABLE_SCOPE void tcp_v6_early_demux(struct sk_buff *skb) +void tcp_v6_early_demux(struct sk_buff *skb) { const struct ipv6hdr *hdr; const struct tcphdr *th; @@ -2209,12 +2209,7 @@ struct proto tcpv6_prot = { }; EXPORT_SYMBOL_GPL(tcpv6_prot);
-/* thinking of making this const? Don't. - * early_demux can change based on sysctl. - */ -static struct inet6_protocol tcpv6_protocol = { - .early_demux = tcp_v6_early_demux, - .early_demux_handler = tcp_v6_early_demux, +static const struct inet6_protocol tcpv6_protocol = { .handler = tcp_v6_rcv, .err_handler = tcp_v6_err, .flags = INET6_PROTO_NOPOLICY|INET6_PROTO_FINAL, --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1046,7 +1046,7 @@ static struct sock *__udp6_lib_demux_loo return NULL; }
-INDIRECT_CALLABLE_SCOPE void udp_v6_early_demux(struct sk_buff *skb) +void udp_v6_early_demux(struct sk_buff *skb) { struct net *net = dev_net(skb->dev); const struct udphdr *uh; @@ -1659,12 +1659,7 @@ int udpv6_getsockopt(struct sock *sk, in return ipv6_getsockopt(sk, level, optname, optval, optlen); }
-/* thinking of making this const? Don't. - * early_demux can change based on sysctl. - */ -static struct inet6_protocol udpv6_protocol = { - .early_demux = udp_v6_early_demux, - .early_demux_handler = udp_v6_early_demux, +static const struct inet6_protocol udpv6_protocol = { .handler = udpv6_rcv, .err_handler = udpv6_err, .flags = INET6_PROTO_NOPOLICY|INET6_PROTO_FINAL,
From: Shang XiaoJing shangxiaojing@huawei.com
commit 66f0919c953ef7b55e5ab94389a013da2ce80a2c upstream.
test_gen_kprobe_cmd() only free buf in fail path, hence buf will leak when there is no failure. Move kfree(buf) from fail path to common path to prevent the memleak. The same reason and solution in test_gen_kretprobe_cmd().
unreferenced object 0xffff888143b14000 (size 2048): comm "insmod", pid 52490, jiffies 4301890980 (age 40.553s) hex dump (first 32 bytes): 70 3a 6b 70 72 6f 62 65 73 2f 67 65 6e 5f 6b 70 p:kprobes/gen_kp 72 6f 62 65 5f 74 65 73 74 20 64 6f 5f 73 79 73 robe_test do_sys backtrace: [<000000006d7b836b>] kmalloc_trace+0x27/0xa0 [<0000000009528b5b>] 0xffffffffa059006f [<000000008408b580>] do_one_initcall+0x87/0x2a0 [<00000000c4980a7e>] do_init_module+0xdf/0x320 [<00000000d775aad0>] load_module+0x3006/0x3390 [<00000000e9a74b80>] __do_sys_finit_module+0x113/0x1b0 [<000000003726480d>] do_syscall_64+0x35/0x80 [<000000003441e93b>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Link: https://lore.kernel.org/all/20221102072954.26555-1-shangxiaojing@huawei.com/
Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module") Cc: stable@vger.kernel.org Signed-off-by: Shang XiaoJing shangxiaojing@huawei.com Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/kprobe_event_gen_test.c | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-)
--- a/kernel/trace/kprobe_event_gen_test.c +++ b/kernel/trace/kprobe_event_gen_test.c @@ -100,20 +100,20 @@ static int __init test_gen_kprobe_cmd(vo KPROBE_GEN_TEST_FUNC, KPROBE_GEN_TEST_ARG0, KPROBE_GEN_TEST_ARG1); if (ret) - goto free; + goto out;
/* Use kprobe_event_add_fields to add the rest of the fields */
ret = kprobe_event_add_fields(&cmd, KPROBE_GEN_TEST_ARG2, KPROBE_GEN_TEST_ARG3); if (ret) - goto free; + goto out;
/* * This actually creates the event. */ ret = kprobe_event_gen_cmd_end(&cmd); if (ret) - goto free; + goto out;
/* * Now get the gen_kprobe_test event file. We need to prevent @@ -136,13 +136,11 @@ static int __init test_gen_kprobe_cmd(vo goto delete; } out: + kfree(buf); return ret; delete: /* We got an error after creating the event, delete it */ ret = kprobe_event_delete("gen_kprobe_test"); - free: - kfree(buf); - goto out; }
@@ -170,14 +168,14 @@ static int __init test_gen_kretprobe_cmd KPROBE_GEN_TEST_FUNC, "$retval"); if (ret) - goto free; + goto out;
/* * This actually creates the event. */ ret = kretprobe_event_gen_cmd_end(&cmd); if (ret) - goto free; + goto out;
/* * Now get the gen_kretprobe_test event file. We need to @@ -201,13 +199,11 @@ static int __init test_gen_kretprobe_cmd goto delete; } out: + kfree(buf); return ret; delete: /* We got an error after creating the event, delete it */ ret = kprobe_event_delete("gen_kretprobe_test"); - free: - kfree(buf); - goto out; }
From: Li Qiang liq3ea@163.com
commit 4a6f316d6855a434f56dbbeba05e14c01acde8f8 upstream.
In aggregate kprobe case, when arm_kprobe failed, we need set the kp->flags with KPROBE_FLAG_DISABLED again. If not, the 'kp' kprobe will been considered as enabled but it actually not enabled.
Link: https://lore.kernel.org/all/20220902155820.34755-1-liq3ea@163.com/
Fixes: 12310e343755 ("kprobes: Propagate error from arm_kprobe_ftrace()") Cc: stable@vger.kernel.org Signed-off-by: Li Qiang liq3ea@163.com Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/kprobes.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -2208,8 +2208,11 @@ int enable_kprobe(struct kprobe *kp) if (!kprobes_all_disarmed && kprobe_disabled(p)) { p->flags &= ~KPROBE_FLAG_DISABLED; ret = arm_kprobe(p); - if (ret) + if (ret) { p->flags |= KPROBE_FLAG_DISABLED; + if (p != kp) + kp->flags |= KPROBE_FLAG_DISABLED; + } } out: mutex_unlock(&kprobe_mutex);
From: Steven Rostedt (Google) rostedt@goodmis.org
commit 7433632c9ff68a991bd0bc38cabf354e9d2de410 upstream.
On some machines the number of listed CPUs may be bigger than the actual CPUs that exist. The tracing subsystem allocates a per_cpu directory with access to the per CPU ring buffer via a cpuX file. But to save space, the ring buffer will only allocate buffers for online CPUs, even though the CPU array will be as big as the nr_cpu_ids.
With the addition of waking waiters on the ring buffer when closing the file, the ring_buffer_wake_waiters() now needs to make sure that the buffer is allocated (with the irq_work allocated with it) before trying to wake waiters, as it will cause a NULL pointer dereference.
While debugging this, I added a NULL check for the buffer itself (which is OK to do), and also NULL pointer checks against buffer->buffers (which is not fine, and will WARN) as well as making sure the CPU number passed in is within the nr_cpu_ids (which is also not fine if it isn't).
Link: https://lore.kernel.org/all/87h6zklb6n.wl-tiwai@suse.de/ Link: https://lore.kernel.org/all/CAM6Wdxc0KRJMXVAA0Y=u6Jh2V=uWB-_Fn6M4xRuNppfXzL1... Link: https://lkml.kernel.org/linux-trace-kernel/20221101191009.1e7378c8@rorschach...
Cc: stable@vger.kernel.org Cc: Steven Noonan steven.noonan@gmail.com Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1204705 Reported-by: Takashi Iwai tiwai@suse.de Reported-by: Roland Ruckerbauer roland.rucky@gmail.com Fixes: f3ddb74ad079 ("tracing: Wake up ring buffer waiters on closing of the file") Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/ring_buffer.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -901,6 +901,9 @@ void ring_buffer_wake_waiters(struct tra struct ring_buffer_per_cpu *cpu_buffer; struct rb_irq_work *rbwork;
+ if (!buffer) + return; + if (cpu == RING_BUFFER_ALL_CPUS) {
/* Wake up individual ones too. One level recursion */ @@ -909,7 +912,15 @@ void ring_buffer_wake_waiters(struct tra
rbwork = &buffer->irq_work; } else { + if (WARN_ON_ONCE(!buffer->buffers)) + return; + if (WARN_ON_ONCE(cpu >= nr_cpu_ids)) + return; + cpu_buffer = buffer->buffers[cpu]; + /* The CPU buffer may not have been initialized yet */ + if (!cpu_buffer) + return; rbwork = &cpu_buffer->irq_work; }
From: Rasmus Villemoes linux@rasmusvillemoes.dk
commit b3f4f51ea68a495f8a5956064c33dce711a2df91 upstream.
The C standard says that memcmp() must treat the buffers as consisting of "unsigned chars". If char happens to be unsigned, the casts are ok, but then obviously the c1 variable can never contain a negative value. And when char is signed, the casts are wrong, and there's still a problem with using an 8-bit quantity to hold the difference, because that can range from -255 to +255.
For example, assuming char is signed, comparing two 1-byte buffers, one containing 0x00 and another 0x80, the current implementation would return -128 for both memcmp(a, b, 1) and memcmp(b, a, 1), whereas one of those should of course return something positive.
Signed-off-by: Rasmus Villemoes linux@rasmusvillemoes.dk Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc") Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Willy Tarreau w@1wt.eu Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/include/nolibc/nolibc.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/tools/include/nolibc/nolibc.h +++ b/tools/include/nolibc/nolibc.h @@ -2408,9 +2408,9 @@ static __attribute__((unused)) int memcmp(const void *s1, const void *s2, size_t n) { size_t ofs = 0; - char c1 = 0; + int c1 = 0;
- while (ofs < n && !(c1 = ((char *)s1)[ofs] - ((char *)s2)[ofs])) { + while (ofs < n && !(c1 = ((unsigned char *)s1)[ofs] - ((unsigned char *)s2)[ofs])) { ofs++; } return c1;
From: Zheng Yejian zhengyejian1@huawei.com
commit a635beeacc6d56d2b71c39e6c0103f85b53d108e upstream.
After commit 4f36c2d85ced ("tracing: Increase tracing map KEYS_MAX size"), 'keys' supports up to three fields.
Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Link: https://lore.kernel.org/r/20221017103806.2479139-1-zhengyejian1@huawei.com Signed-off-by: Jonathan Corbet corbet@lwn.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/trace/histogram.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/Documentation/trace/histogram.rst +++ b/Documentation/trace/histogram.rst @@ -39,7 +39,7 @@ Documentation written by Tom Zanussi will use the event's kernel stacktrace as the key. The keywords 'keys' or 'key' can be used to specify keys, and the keywords 'values', 'vals', or 'val' can be used to specify values. Compound - keys consisting of up to two fields can be specified by the 'keys' + keys consisting of up to three fields can be specified by the 'keys' keyword. Hashing a compound key produces a unique entry in the table for each unique combination of component keys, and can be useful for providing more fine-grained summaries of event data.
From: Gaosheng Cui cuigaosheng1@huawei.com
commit 8cf0a1bc12870d148ae830a4ba88cfdf0e879cee upstream.
In cap_inode_getsecurity(), we will use vfs_getxattr_alloc() to complete the memory allocation of tmpbuf, if we have completed the memory allocation of tmpbuf, but failed to call handler->get(...), there will be a memleak in below logic:
|-- ret = (int)vfs_getxattr_alloc(mnt_userns, ...) | /* ^^^ alloc for tmpbuf */ |-- value = krealloc(*xattr_value, error + 1, flags) | /* ^^^ alloc memory */ |-- error = handler->get(handler, ...) | /* error! */ |-- *xattr_value = value | /* xattr_value is &tmpbuf (memory leak!) */
So we will try to free(tmpbuf) after vfs_getxattr_alloc() fails to fix it.
Cc: stable@vger.kernel.org Fixes: 8db6c34f1dbc ("Introduce v3 namespaced file capabilities") Signed-off-by: Gaosheng Cui cuigaosheng1@huawei.com Acked-by: Serge Hallyn serge@hallyn.com [PM: subject line and backtrace tweaks] Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/commoncap.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/security/commoncap.c +++ b/security/commoncap.c @@ -401,8 +401,10 @@ int cap_inode_getsecurity(struct user_na &tmpbuf, size, GFP_NOFS); dput(dentry);
- if (ret < 0 || !tmpbuf) - return ret; + if (ret < 0 || !tmpbuf) { + size = ret; + goto out_free; + }
fs_ns = inode->i_sb->s_user_ns; cap = (struct vfs_cap_data *) tmpbuf;
From: Miklos Szeredi mszeredi@redhat.com
commit 4a6f278d4827b59ba26ceae0ff4529ee826aa258 upstream.
Add missing file_modified() call to fuse_file_fallocate(). Without this fallocate on fuse failed to clear privileges.
Fixes: 05ba1f082300 ("fuse: add FALLOCATE operation") Cc: stable@vger.kernel.org Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/fuse/file.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2975,6 +2975,10 @@ static long fuse_file_fallocate(struct f goto out; }
+ err = file_modified(file); + if (err) + goto out; + if (!(mode & FALLOC_FL_KEEP_SIZE)) set_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
From: Ard Biesheuvel ardb@kernel.org
commit 161a438d730dade2ba2b1bf8785f0759aba4ca5f upstream.
We no longer need at least 64 bytes of random seed to permit the early crng init to complete. The RNG is now based on Blake2s, so reduce the EFI seed size to the Blake2s hash size, which is sufficient for our purposes.
While at it, drop the READ_ONCE(), which was supposed to prevent size from being evaluated after seed was unmapped. However, this cannot actually happen, so READ_ONCE() is unnecessary here.
Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Ard Biesheuvel ardb@kernel.org Reviewed-by: Jason A. Donenfeld Jason@zx2c4.com Acked-by: Ilias Apalodimas ilias.apalodimas@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/efi/efi.c | 2 +- include/linux/efi.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -590,7 +590,7 @@ int __init efi_config_parse_tables(const
seed = early_memremap(efi_rng_seed, sizeof(*seed)); if (seed != NULL) { - size = READ_ONCE(seed->size); + size = min(seed->size, EFI_RANDOM_SEED_SIZE); early_memunmap(seed, sizeof(*seed)); } else { pr_err("Could not map UEFI random seed!\n"); --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1165,7 +1165,7 @@ efi_status_t efi_random_get_seed(void); arch_efi_call_virt_teardown(); \ })
-#define EFI_RANDOM_SEED_SIZE 64U +#define EFI_RANDOM_SEED_SIZE 32U // BLAKE2S_HASH_SIZE
struct linux_efi_random_seed { u32 size;
From: Ard Biesheuvel ardb@kernel.org
commit 7d866e38c7e9ece8a096d0d098fa9d92b9d4f97e upstream.
EFI runtime services data is guaranteed to be preserved by the OS, making it a suitable candidate for the EFI random seed table, which may be passed to kexec kernels as well (after refreshing the seed), and so we need to ensure that the memory is preserved without support from the OS itself.
However, runtime services data is intended for allocations that are relevant to the implementations of the runtime services themselves, and so they are unmapped from the kernel linear map, and mapped into the EFI page tables that are active while runtime service invocations are in progress. None of this is needed for the RNG seed.
So let's switch to EFI 'ACPI reclaim' memory: in spite of the name, there is nothing exclusively ACPI about it, it is simply a type of allocation that carries firmware provided data which may or may not be relevant to the OS, and it is left up to the OS to decide whether to reclaim it after having consumed its contents.
Given that in Linux, we never reclaim these allocations, it is a good choice for the EFI RNG seed, as the allocation is guaranteed to survive kexec reboots.
One additional reason for changing this now is to align it with the upcoming recommendation for EFI bootloader provided RNG seeds, which must not use EFI runtime services code/data allocations.
Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Ard Biesheuvel ardb@kernel.org Reviewed-by: Ilias Apalodimas ilias.apalodimas@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/efi/libstub/random.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/drivers/firmware/efi/libstub/random.c +++ b/drivers/firmware/efi/libstub/random.c @@ -75,7 +75,12 @@ efi_status_t efi_random_get_seed(void) if (status != EFI_SUCCESS) return status;
- status = efi_bs_call(allocate_pool, EFI_RUNTIME_SERVICES_DATA, + /* + * Use EFI_ACPI_RECLAIM_MEMORY here so that it is guaranteed that the + * allocation will survive a kexec reboot (although we refresh the seed + * beforehand) + */ + status = efi_bs_call(allocate_pool, EFI_ACPI_RECLAIM_MEMORY, sizeof(*seed) + EFI_RANDOM_SEED_SIZE, (void **)&seed); if (status != EFI_SUCCESS)
From: Mark Rutland mark.rutland@arm.com
commit 024f4b2e1f874934943eb2d3d288ebc52c79f55c upstream.
The cortex_a76_erratum_1463225_debug_handler() function is called when handling debug exceptions (and synchronous exceptions from BRK instructions), and so is called when a probed function executes. If the compiler does not inline cortex_a76_erratum_1463225_debug_handler(), it can be probed.
If cortex_a76_erratum_1463225_debug_handler() is probed, any debug exception or software breakpoint exception will result in recursive exceptions leading to a stack overflow. This can be triggered with the ftrace multiple_probes selftest, and as per the example splat below.
This is a regression caused by commit:
6459b8469753e9fe ("arm64: entry: consolidate Cortex-A76 erratum 1463225 workaround")
... which removed the NOKPROBE_SYMBOL() annotation associated with the function.
My intent was that cortex_a76_erratum_1463225_debug_handler() would be inlined into its caller, el1_dbg(), which is marked noinstr and cannot be probed. Mark cortex_a76_erratum_1463225_debug_handler() as __always_inline to ensure this.
Example splat prior to this patch (with recursive entries elided):
| # echo p cortex_a76_erratum_1463225_debug_handler > /sys/kernel/debug/tracing/kprobe_events | # echo p do_el0_svc >> /sys/kernel/debug/tracing/kprobe_events | # echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable | Insufficient stack space to handle exception! | ESR: 0x0000000096000047 -- DABT (current EL) | FAR: 0xffff800009cefff0 | Task stack: [0xffff800009cf0000..0xffff800009cf4000] | IRQ stack: [0xffff800008000000..0xffff800008004000] | Overflow stack: [0xffff00007fbc00f0..0xffff00007fbc10f0] | CPU: 0 PID: 145 Comm: sh Not tainted 6.0.0 #2 | Hardware name: linux,dummy-virt (DT) | pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : arm64_enter_el1_dbg+0x4/0x20 | lr : el1_dbg+0x24/0x5c | sp : ffff800009cf0000 | x29: ffff800009cf0000 x28: ffff000002c74740 x27: 0000000000000000 | x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 | x23: 00000000604003c5 x22: ffff80000801745c x21: 0000aaaac95ac068 | x20: 00000000f2000004 x19: ffff800009cf0040 x18: 0000000000000000 | x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 | x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 | x11: 0000000000000010 x10: ffff800008c87190 x9 : ffff800008ca00d0 | x8 : 000000000000003c x7 : 0000000000000000 x6 : 0000000000000000 | x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000000043a4 | x2 : 00000000f2000004 x1 : 00000000f2000004 x0 : ffff800009cf0040 | Kernel panic - not syncing: kernel stack overflow | CPU: 0 PID: 145 Comm: sh Not tainted 6.0.0 #2 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0xe4/0x104 | show_stack+0x18/0x4c | dump_stack_lvl+0x64/0x7c | dump_stack+0x18/0x38 | panic+0x14c/0x338 | test_taint+0x0/0x2c | panic_bad_stack+0x104/0x118 | handle_bad_stack+0x34/0x48 | __bad_stack+0x78/0x7c | arm64_enter_el1_dbg+0x4/0x20 | el1h_64_sync_handler+0x40/0x98 | el1h_64_sync+0x64/0x68 | cortex_a76_erratum_1463225_debug_handler+0x0/0x34 ... | el1h_64_sync_handler+0x40/0x98 | el1h_64_sync+0x64/0x68 | cortex_a76_erratum_1463225_debug_handler+0x0/0x34 ... | el1h_64_sync_handler+0x40/0x98 | el1h_64_sync+0x64/0x68 | cortex_a76_erratum_1463225_debug_handler+0x0/0x34 | el1h_64_sync_handler+0x40/0x98 | el1h_64_sync+0x64/0x68 | do_el0_svc+0x0/0x28 | el0t_64_sync_handler+0x84/0xf0 | el0t_64_sync+0x18c/0x190 | Kernel Offset: disabled | CPU features: 0x0080,00005021,19001080 | Memory Limit: none | ---[ end Kernel panic - not syncing: kernel stack overflow ]---
With this patch, cortex_a76_erratum_1463225_debug_handler() is inlined into el1_dbg(), and el1_dbg() cannot be probed:
| # echo p cortex_a76_erratum_1463225_debug_handler > /sys/kernel/debug/tracing/kprobe_events | sh: write error: No such file or directory | # grep -w cortex_a76_erratum_1463225_debug_handler /proc/kallsyms | wc -l | 0 | # echo p el1_dbg > /sys/kernel/debug/tracing/kprobe_events | sh: write error: Invalid argument | # grep -w el1_dbg /proc/kallsyms | wc -l | 1
Fixes: 6459b8469753 ("arm64: entry: consolidate Cortex-A76 erratum 1463225 workaround") Cc: stable@vger.kernel.org # 5.12.x Signed-off-by: Mark Rutland mark.rutland@arm.com Cc: Will Deacon will@kernel.org Link: https://lore.kernel.org/r/20221017090157.2881408-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/entry-common.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/arm64/kernel/entry-common.c +++ b/arch/arm64/kernel/entry-common.c @@ -320,7 +320,8 @@ static void cortex_a76_erratum_1463225_s __this_cpu_write(__in_cortex_a76_erratum_1463225_wa, 0); }
-static bool cortex_a76_erratum_1463225_debug_handler(struct pt_regs *regs) +static __always_inline bool +cortex_a76_erratum_1463225_debug_handler(struct pt_regs *regs) { if (!__this_cpu_read(__in_cortex_a76_erratum_1463225_wa)) return false;
From: Kan Liang kan.liang@linux.intel.com
commit acc5568b90c19ac6375508a93b9676cd18a92a35 upstream.
According to the latest event list, update the MEM_INST_RETIRED events which support the DataLA facility.
Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support") Reported-by: Jannis Klinkenberg jannis.klinkenberg@rwth-aachen.de Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221031154119.571386-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/intel/ds.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -936,8 +936,13 @@ struct event_constraint intel_icl_pebs_e INTEL_FLAGS_UEVENT_CONSTRAINT(0x0400, 0x800000000ULL), /* SLOTS */
INTEL_PLD_CONSTRAINT(0x1cd, 0xff), /* MEM_TRANS_RETIRED.LOAD_LATENCY */ - INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x1d0, 0xf), /* MEM_INST_RETIRED.LOAD */ - INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x2d0, 0xf), /* MEM_INST_RETIRED.STORE */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x11d0, 0xf), /* MEM_INST_RETIRED.STLB_MISS_LOADS */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x12d0, 0xf), /* MEM_INST_RETIRED.STLB_MISS_STORES */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x21d0, 0xf), /* MEM_INST_RETIRED.LOCK_LOADS */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x41d0, 0xf), /* MEM_INST_RETIRED.SPLIT_LOADS */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x42d0, 0xf), /* MEM_INST_RETIRED.SPLIT_STORES */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x81d0, 0xf), /* MEM_INST_RETIRED.ALL_LOADS */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x82d0, 0xf), /* MEM_INST_RETIRED.ALL_STORES */
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD_RANGE(0xd1, 0xd4, 0xf), /* MEM_LOAD_*_RETIRED.* */
From: Kan Liang kan.liang@linux.intel.com
commit 6f8faf471446844bb9c318e0340221049d5c19f4 upstream.
The intel_pebs_isolation quirk checks both model number and stepping. Cooper Lake has a different stepping (11) than the other Skylake Xeon. It cannot benefit from the optimization in commit 9b545c04abd4f ("perf/x86/kvm: Avoid unnecessary work in guest filtering").
Add the stepping of Cooper Lake into the isolation_ucodes[] table.
Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221031154550.571663-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/intel/core.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -4713,6 +4713,7 @@ static const struct x86_cpu_desc isolati INTEL_CPU_DESC(INTEL_FAM6_SKYLAKE_X, 5, 0x00000000), INTEL_CPU_DESC(INTEL_FAM6_SKYLAKE_X, 6, 0x00000000), INTEL_CPU_DESC(INTEL_FAM6_SKYLAKE_X, 7, 0x00000000), + INTEL_CPU_DESC(INTEL_FAM6_SKYLAKE_X, 11, 0x00000000), INTEL_CPU_DESC(INTEL_FAM6_SKYLAKE_L, 3, 0x0000007c), INTEL_CPU_DESC(INTEL_FAM6_SKYLAKE, 3, 0x0000007c), INTEL_CPU_DESC(INTEL_FAM6_KABYLAKE, 9, 0x0000004e),
From: Kan Liang kan.liang@linux.intel.com
commit 0916886bb978e7eae1ca3955ba07f51c020da20c upstream.
According to the latest event list, update the MEM_INST_RETIRED events which support the DataLA facility for SPR.
Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids") Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221031154119.571386-2-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/intel/ds.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -963,8 +963,13 @@ struct event_constraint intel_spr_pebs_e INTEL_FLAGS_EVENT_CONSTRAINT(0xc0, 0xfe), INTEL_PLD_CONSTRAINT(0x1cd, 0xfe), INTEL_PSD_CONSTRAINT(0x2cd, 0x1), - INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x1d0, 0xf), - INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x2d0, 0xf), + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x11d0, 0xf), /* MEM_INST_RETIRED.STLB_MISS_LOADS */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x12d0, 0xf), /* MEM_INST_RETIRED.STLB_MISS_STORES */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x21d0, 0xf), /* MEM_INST_RETIRED.LOCK_LOADS */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x41d0, 0xf), /* MEM_INST_RETIRED.SPLIT_LOADS */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x42d0, 0xf), /* MEM_INST_RETIRED.SPLIT_STORES */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x81d0, 0xf), /* MEM_INST_RETIRED.ALL_LOADS */ + INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x82d0, 0xf), /* MEM_INST_RETIRED.ALL_STORES */
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD_RANGE(0xd1, 0xd4, 0xf),
From: Helge Deller deller@gmx.de
commit e8a18e3f00f3ee8d07c17ab1ea3ad4df4a3b6fe0 upstream.
Although the name of the driver 8250_gsc.c suggests that it handles only serial ports on the GSC bus, it does handle serial ports listed in the parisc machine inventory as well, e.g. the serial ports in a C8000 PCI-only workstation.
Change the dependency to CONFIG_PARISC, so that the driver gets included in the kernel even if CONFIG_GSC isn't set.
Reported-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/8250/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/tty/serial/8250/Kconfig +++ b/drivers/tty/serial/8250/Kconfig @@ -118,7 +118,7 @@ config SERIAL_8250_CONSOLE
config SERIAL_8250_GSC tristate - depends on SERIAL_8250 && GSC + depends on SERIAL_8250 && PARISC default SERIAL_8250
config SERIAL_8250_DMA
From: Helge Deller deller@gmx.de
commit a0c9f1f2e53b8eb2ae43987a30e547ba56b4fa18 upstream.
The parisc serial port driver needs this symbol when it's compiled as module.
Signed-off-by: Helge Deller deller@gmx.de Reported-by: kernel test robot lkp@intel.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/parisc/iosapic.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/parisc/iosapic.c +++ b/drivers/parisc/iosapic.c @@ -875,6 +875,7 @@ int iosapic_serial_irq(struct parisc_dev
return vi->txn_irq; } +EXPORT_SYMBOL(iosapic_serial_irq); #endif
From: Helge Deller deller@gmx.de
commit 2b6ae0962b421103feb41a80406732944b0665b3 upstream.
Avoid that the hardware path is shown twice in the kernel log, and clean up the output of the version numbers to show up in the same order as they are listed in the hardware database in the hardware.c file. Additionally, optimize the memory footprint of the hardware database and mark some code as init code.
Fixes: cab56b51ec0e ("parisc: Fix device names in /proc/iomem") Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/include/asm/hardware.h | 12 ++++++------ arch/parisc/kernel/drivers.c | 14 ++++++-------- 2 files changed, 12 insertions(+), 14 deletions(-)
--- a/arch/parisc/include/asm/hardware.h +++ b/arch/parisc/include/asm/hardware.h @@ -10,12 +10,12 @@ #define SVERSION_ANY_ID PA_SVERSION_ANY_ID
struct hp_hardware { - unsigned short hw_type:5; /* HPHW_xxx */ - unsigned short hversion; - unsigned long sversion:28; - unsigned short opt; - const char name[80]; /* The hardware description */ -}; + unsigned int hw_type:8; /* HPHW_xxx */ + unsigned int hversion:12; + unsigned int sversion:12; + unsigned char opt; + unsigned char name[59]; /* The hardware description */ +} __packed;
struct parisc_device;
--- a/arch/parisc/kernel/drivers.c +++ b/arch/parisc/kernel/drivers.c @@ -882,15 +882,13 @@ void __init walk_central_bus(void) &root); }
-static void print_parisc_device(struct parisc_device *dev) +static __init void print_parisc_device(struct parisc_device *dev) { - char hw_path[64]; - static int count; + static int count __initdata;
- print_pa_hwpath(dev, hw_path); - pr_info("%d. %s at %pap [%s] { %d, 0x%x, 0x%.3x, 0x%.5x }", - ++count, dev->name, &(dev->hpa.start), hw_path, dev->id.hw_type, - dev->id.hversion_rev, dev->id.hversion, dev->id.sversion); + pr_info("%d. %s at %pap { type:%d, hv:%#x, sv:%#x, rev:%#x }", + ++count, dev->name, &(dev->hpa.start), dev->id.hw_type, + dev->id.hversion, dev->id.sversion, dev->id.hversion_rev);
if (dev->num_addrs) { int k; @@ -1079,7 +1077,7 @@ static __init int qemu_print_iodc_data(s
-static int print_one_device(struct device * dev, void * data) +static __init int print_one_device(struct device * dev, void * data) { struct parisc_device * pdev = to_parisc_device(dev);
From: Ye Bin yebin10@huawei.com
commit 1b8f787ef547230a3249bcf897221ef0cc78481b upstream.
Syzkaller report issue as follows: EXT4-fs (loop0): Free/Dirty block details EXT4-fs (loop0): free_blocks=0 EXT4-fs (loop0): dirty_blocks=0 EXT4-fs (loop0): Block reservation details EXT4-fs (loop0): i_reserved_data_blocks=0 EXT4-fs warning (device loop0): ext4_da_release_space:1527: ext4_da_release_space: ino 18, to_free 1 with only 0 reserved data blocks ------------[ cut here ]------------ WARNING: CPU: 0 PID: 92 at fs/ext4/inode.c:1528 ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1524 Modules linked in: CPU: 0 PID: 92 Comm: kworker/u4:4 Not tainted 6.0.0-syzkaller-09423-g493ffd6605b2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Workqueue: writeback wb_workfn (flush-7:0) RIP: 0010:ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1528 RSP: 0018:ffffc900015f6c90 EFLAGS: 00010296 RAX: 42215896cd52ea00 RBX: 0000000000000000 RCX: 42215896cd52ea00 RDX: 0000000000000000 RSI: 0000000080000001 RDI: 0000000000000000 RBP: 1ffff1100e907d96 R08: ffffffff816aa79d R09: fffff520002bece5 R10: fffff520002bece5 R11: 1ffff920002bece4 R12: ffff888021fd2000 R13: ffff88807483ecb0 R14: 0000000000000001 R15: ffff88807483e740 FS: 0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005555569ba628 CR3: 000000000c88e000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ext4_es_remove_extent+0x1ab/0x260 fs/ext4/extents_status.c:1461 mpage_release_unused_pages+0x24d/0xef0 fs/ext4/inode.c:1589 ext4_writepages+0x12eb/0x3be0 fs/ext4/inode.c:2852 do_writepages+0x3c3/0x680 mm/page-writeback.c:2469 __writeback_single_inode+0xd1/0x670 fs/fs-writeback.c:1587 writeback_sb_inodes+0xb3b/0x18f0 fs/fs-writeback.c:1870 wb_writeback+0x41f/0x7b0 fs/fs-writeback.c:2044 wb_do_writeback fs/fs-writeback.c:2187 [inline] wb_workfn+0x3cb/0xef0 fs/fs-writeback.c:2227 process_one_work+0x877/0xdb0 kernel/workqueue.c:2289 worker_thread+0xb14/0x1330 kernel/workqueue.c:2436 kthread+0x266/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 </TASK>
Above issue may happens as follows: ext4_da_write_begin ext4_create_inline_data ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS); ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA); __ext4_ioctl ext4_ext_migrate -> will lead to eh->eh_entries not zero, and set extent flag ext4_da_write_begin ext4_da_convert_inline_data_to_extent ext4_da_write_inline_data_begin ext4_da_map_blocks ext4_insert_delayed_block if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk)) if (!ext4_es_scan_clu(inode, &ext4_es_is_mapped, lblk)) ext4_clu_mapped(inode, EXT4_B2C(sbi, lblk)); -> will return 1 allocated = true; ext4_es_insert_delayed_block(inode, lblk, allocated); ext4_writepages mpage_map_and_submit_extent(handle, &mpd, &give_up_on_write); -> return -ENOSPC mpage_release_unused_pages(&mpd, give_up_on_write); -> give_up_on_write == 1 ext4_es_remove_extent ext4_da_release_space(inode, reserved); if (unlikely(to_free > ei->i_reserved_data_blocks)) -> to_free == 1 but ei->i_reserved_data_blocks == 0 -> then trigger warning as above
To solve above issue, forbid inode do migrate which has inline data.
Cc: stable@kernel.org Reported-by: syzbot+c740bb18df70ad00952e@syzkaller.appspotmail.com Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20221018022701.683489-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/migrate.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/ext4/migrate.c +++ b/fs/ext4/migrate.c @@ -425,7 +425,8 @@ int ext4_ext_migrate(struct inode *inode * already is extent-based, error out. */ if (!ext4_has_feature_extents(inode->i_sb) || - (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) + ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS) || + ext4_has_inline_data(inode)) return -EINVAL;
if (S_ISLNK(inode->i_mode) && inode->i_blocks == 0)
From: Luís Henriques lhenriques@suse.de
commit 17a0bc9bd697f75cfdf9b378d5eb2d7409c91340 upstream.
The rec_len field in the directory entry has to be a multiple of 4. A corrupted filesystem image can be used to hit a BUG() in ext4_rec_len_to_disk(), called from make_indexed_dir().
------------[ cut here ]------------ kernel BUG at fs/ext4/ext4.h:2413! ... RIP: 0010:make_indexed_dir+0x53f/0x5f0 ... Call Trace: <TASK> ? add_dirent_to_buf+0x1b2/0x200 ext4_add_entry+0x36e/0x480 ext4_add_nondir+0x2b/0xc0 ext4_create+0x163/0x200 path_openat+0x635/0xe90 do_filp_open+0xb4/0x160 ? __create_object.isra.0+0x1de/0x3b0 ? _raw_spin_unlock+0x12/0x30 do_sys_openat2+0x91/0x150 __x64_sys_open+0x6c/0xa0 do_syscall_64+0x3c/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0
The fix simply adds a call to ext4_check_dir_entry() to validate the directory entry, returning -EFSCORRUPTED if the entry is invalid.
CC: stable@kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=216540 Signed-off-by: Luís Henriques lhenriques@suse.de Link: https://lore.kernel.org/r/20221012131330.32456-1-lhenriques@suse.de Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/namei.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2259,8 +2259,16 @@ static int make_indexed_dir(handle_t *ha memset(de, 0, len); /* wipe old data */ de = (struct ext4_dir_entry_2 *) data2; top = data2 + len; - while ((char *)(de2 = ext4_next_entry(de, blocksize)) < top) + while ((char *)(de2 = ext4_next_entry(de, blocksize)) < top) { + if (ext4_check_dir_entry(dir, NULL, de, bh2, data2, len, + (data2 + (blocksize - csum_size) - + (char *) de))) { + brelse(bh2); + brelse(bh); + return -EFSCORRUPTED; + } de = de2; + } de->rec_len = ext4_rec_len_to_disk(data2 + (blocksize - csum_size) - (char *) de, blocksize);
From: Jiri Olsa olsajiri@gmail.com
commit 9440c42941606af4c379afa3cf8624f0dc43a629 upstream.
With just the forward declaration of the 'struct pt_regs' in syscall_wrapper.h, the syscall stub functions:
__[x64|ia32]_sys_*(struct pt_regs *regs)
will have different definition of 'regs' argument in BTF data based on which object file they are defined in.
If the syscall's object includes 'struct pt_regs' definition, the BTF argument data will point to a 'struct pt_regs' record, like:
[226] STRUCT 'pt_regs' size=168 vlen=21 'r15' type_id=1 bits_offset=0 'r14' type_id=1 bits_offset=64 'r13' type_id=1 bits_offset=128 ...
If not, it will point to a fwd declaration record:
[15439] FWD 'pt_regs' fwd_kind=struct
and make bpf tracing program hooking on those functions unable to access fields from 'struct pt_regs'.
Include asm/ptrace.h directly in syscall_wrapper.h to make sure all syscalls see 'struct pt_regs' definition. This then results in BTF for '__*_sys_*(struct pt_regs *regs)' functions to point to the actual struct, not just the forward declaration.
[ bp: No Fixes tag as this is not really a bug fix but "adjustment" so that BTF is happy. ]
Reported-by: Akihiro HARAI jharai0815@gmail.com Signed-off-by: Jiri Olsa jolsa@kernel.org Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Andrii Nakryiko andrii@kernel.org Cc: stable@vger.kernel.org # this is needed only for BTF so kernels >= 5.15 Link: https://lore.kernel.org/r/20221018122708.823792-1-jolsa@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/syscall_wrapper.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h index 59358d1bf880..fd2669b1cb2d 100644 --- a/arch/x86/include/asm/syscall_wrapper.h +++ b/arch/x86/include/asm/syscall_wrapper.h @@ -6,7 +6,7 @@ #ifndef _ASM_X86_SYSCALL_WRAPPER_H #define _ASM_X86_SYSCALL_WRAPPER_H
-struct pt_regs; +#include <asm/ptrace.h>
extern long __x64_sys_ni_syscall(const struct pt_regs *regs); extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
From: Jim Mattson jmattson@google.com
commit eeb69eab57c6604ac90b3fd8e5ac43f24a5535b1 upstream.
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM actually supports. CPUID.80000006H:EDX[17:16] are reserved bits and should be masked off.
Fixes: 43d05de2bee7 ("KVM: pass through CPUID(0x80000006)") Signed-off-by: Jim Mattson jmattson@google.com Message-Id: 20220929225203.2234702-2-jmattson@google.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/cpuid.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -906,7 +906,8 @@ static inline int __do_cpuid_func(struct cpuid_entry_override(entry, CPUID_8000_0001_ECX); break; case 0x80000006: - /* L2 cache and TLB: pass through host info. */ + /* Drop reserved bits, pass host L2 cache and TLB info. */ + entry->edx &= ~GENMASK(17, 16); break; case 0x80000007: /* Advanced power management */ /* invariant TSC is CPUID.80000007H:EDX[8] */
From: Jim Mattson jmattson@google.com
commit 079f6889818dd07903fb36c252532ab47ebb6d48 upstream.
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM actually supports. In the case of CPUID.8000001AH, only three bits are currently defined. The 125 reserved bits should be masked off.
Fixes: 24c82e576b78 ("KVM: Sanitize cpuid") Signed-off-by: Jim Mattson jmattson@google.com Message-Id: 20220929225203.2234702-4-jmattson@google.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/cpuid.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -956,6 +956,9 @@ static inline int __do_cpuid_func(struct entry->ecx = entry->edx = 0; break; case 0x8000001a: + entry->eax &= GENMASK(2, 0); + entry->ebx = entry->ecx = entry->edx = 0; + break; case 0x8000001e: break; case 0x8000001F:
From: Jim Mattson jmattson@google.com
commit 7030d8530e533844e2f4b0e7476498afcd324634 upstream.
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM actually supports. The following ranges of CPUID.80000008H are reserved and should be masked off: ECX[31:18] ECX[11:8]
In addition, the PerfTscSize field at ECX[17:16] should also be zero because KVM does not set the PERFTSC bit at CPUID.80000001H.ECX[27].
Fixes: 24c82e576b78 ("KVM: Sanitize cpuid") Signed-off-by: Jim Mattson jmattson@google.com Message-Id: 20220929225203.2234702-3-jmattson@google.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/cpuid.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -937,6 +937,7 @@ static inline int __do_cpuid_func(struct g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8); + entry->ecx &= ~(GENMASK(31, 16) | GENMASK(11, 8)); entry->edx = 0; cpuid_entry_override(entry, CPUID_8000_0008_EBX); break;
From: Jim Mattson jmattson@google.com
commit 0469e56a14bf8cfb80507e51b7aeec0332cdbc13 upstream.
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM actually supports. CPUID.80000001:EBX[27:16] are reserved bits and should be masked off.
Fixes: 0771671749b5 ("KVM: Enhance guest cpuid management") Signed-off-by: Jim Mattson jmattson@google.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/cpuid.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -902,6 +902,7 @@ static inline int __do_cpuid_func(struct entry->eax = min(entry->eax, 0x8000001f); break; case 0x80000001: + entry->ebx &= ~GENMASK(27, 16); cpuid_entry_override(entry, CPUID_8000_0001_EDX); cpuid_entry_override(entry, CPUID_8000_0001_ECX); break;
From: Jim Mattson jmattson@google.com
commit 86c4f0d547f6460d0426ebb3ba0614f1134b8cda upstream.
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM actually supports. CPUID.8000001FH:EBX[31:16] are reserved bits and should be masked off.
Fixes: 8765d75329a3 ("KVM: X86: Extend CPUID range to include new leaf") Signed-off-by: Jim Mattson jmattson@google.com Message-Id: 20220929225203.2234702-6-jmattson@google.com Cc: stable@vger.kernel.org [Clear NumVMPL too. - Paolo] Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/cpuid.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -968,7 +968,8 @@ static inline int __do_cpuid_func(struct entry->eax = entry->ebx = entry->ecx = entry->edx = 0; } else { cpuid_entry_override(entry, CPUID_8000_001F_EAX); - + /* Clear NumVMPL since KVM does not support VMPL. */ + entry->ebx &= ~GENMASK(31, 12); /* * Enumerate '0' for "PA bits reduction", the adjusted * MAXPHYADDR is enumerated directly (see 0x80000008).
From: Emanuele Giuseppe Esposito eesposit@redhat.com
commit 1c1a41497ab879ac9608f3047f230af833eeef3d upstream.
Clear enable_sgx if ENCLS-exiting is not supported, i.e. if SGX cannot be virtualized. When KVM is loaded, adjust_vmx_controls checks that the bit is available before enabling the feature; however, other parts of the code check enable_sgx and not clearing the variable caused two different bugs, mostly affecting nested virtualization scenarios.
First, because enable_sgx remained true, SECONDARY_EXEC_ENCLS_EXITING would be marked available in the capability MSR that are accessed by a nested hypervisor. KVM would then propagate the control from vmcs12 to vmcs02 even if it isn't supported by the processor, thus causing an unexpected VM-Fail (exit code 0x7) in L1.
Second, vmx_set_cpu_caps() would not clear the SGX bits when hardware support is unavailable. This is a much less problematic bug as it only happens if SGX is soft-disabled (available in the processor but hidden in CPUID) or if SGX is supported for bare metal but not in the VMCS (will never happen when running on bare metal, but can theoertically happen when running in a VM).
Last but not least, this ensures that module params in sysfs reflect KVM's actual configuration.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=2127128 Fixes: 72add915fbd5 ("KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC") Cc: stable@vger.kernel.org Suggested-by: Sean Christopherson seanjc@google.com Suggested-by: Bandan Das bsd@redhat.com Signed-off-by: Emanuele Giuseppe Esposito eesposit@redhat.com Message-Id: 20221025123749.2201649-1-eesposit@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/vmx.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7902,6 +7902,11 @@ static __init int hardware_setup(void) if (!cpu_has_virtual_nmis()) enable_vnmi = 0;
+#ifdef CONFIG_X86_SGX_KVM + if (!cpu_has_vmx_encls_vmexit()) + enable_sgx = false; +#endif + /* * set_apic_access_page_addr() is used to reload apic access * page upon invalidation. No need to do anything if not
From: Ryan Roberts ryan.roberts@arm.com
commit b6bcdc9f6b8321e4471ff45413b6410e16762a8d upstream.
enter_exception64() performs an MTE check, which involves dereferencing vcpu->kvm. While vcpu has already been fixed up to be a HYP VA pointer, kvm is still a pointer in the kernel VA space.
This only affects nVHE configurations with MTE enabled, as in other cases, the pointer is either valid (VHE) or not dereferenced (!MTE).
Fix this by first converting kvm to a HYP VA pointer.
Fixes: ea7fc1bb1cd1 ("KVM: arm64: Introduce MTE VM feature") Signed-off-by: Ryan Roberts ryan.roberts@arm.com Reviewed-by: Steven Price steven.price@arm.com [maz: commit message tidy-up] Signed-off-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221027120945.29679-1-ryan.roberts@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/hyp/exception.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/arm64/kvm/hyp/exception.c +++ b/arch/arm64/kvm/hyp/exception.c @@ -13,6 +13,7 @@ #include <hyp/adjust_pc.h> #include <linux/kvm_host.h> #include <asm/kvm_emulate.h> +#include <asm/kvm_mmu.h>
#if !defined (__KVM_NVHE_HYPERVISOR__) && !defined (__KVM_VHE_HYPERVISOR__) #error Hypervisor code only! @@ -115,7 +116,7 @@ static void enter_exception64(struct kvm new |= (old & PSR_C_BIT); new |= (old & PSR_V_BIT);
- if (kvm_has_mte(vcpu->kvm)) + if (kvm_has_mte(kern_hyp_va(vcpu->kvm))) new |= PSR_TCO_BIT;
new |= (old & PSR_DIT_BIT);
From: Maxim Levitsky mlevitsk@redhat.com
commit 5015bb89b58225f97df6ac44383e7e8c8662c8c9 upstream.
SYSEXIT is one of the instructions that can change the processor mode, thus ctxt->mode should be updated after it.
Note that this is likely a benign bug, because the only problematic mode change is from 32 bit to 64 bit which can lead to truncation of RIP, and it is not possible to do with sysexit, since sysexit running in 32 bit mode will be limited to 32 bit version.
Signed-off-by: Maxim Levitsky mlevitsk@redhat.com Message-Id: 20221025124741.228045-11-mlevitsk@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/emulate.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2861,6 +2861,7 @@ static int em_sysexit(struct x86_emulate ops->set_segment(ctxt, ss_sel, &ss, 0, VCPU_SREG_SS);
ctxt->_eip = rdx; + ctxt->mode = usermode; *reg_write(ctxt, VCPU_REGS_RSP) = rcx;
return X86EMUL_CONTINUE;
From: Maxim Levitsky mlevitsk@redhat.com
commit d087e0f79fa0dd336a9a6b2f79ec23120f5eff73 upstream.
Some instructions update the cpu execution mode, which needs to update the emulation mode.
Extract this code, and make assign_eip_far use it.
assign_eip_far now reads CS, instead of getting it via a parameter, which is ok, because callers always assign CS to the same value before calling this function.
No functional change is intended.
Signed-off-by: Maxim Levitsky mlevitsk@redhat.com Message-Id: 20221025124741.228045-12-mlevitsk@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/emulate.c | 85 ++++++++++++++++++++++++++++++++----------------- 1 file changed, 57 insertions(+), 28 deletions(-)
--- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -795,8 +795,7 @@ static int linearize(struct x86_emulate_ ctxt->mode, linear); }
-static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst, - enum x86emul_mode mode) +static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst) { ulong linear; int rc; @@ -806,41 +805,71 @@ static inline int assign_eip(struct x86_
if (ctxt->op_bytes != sizeof(unsigned long)) addr.ea = dst & ((1UL << (ctxt->op_bytes << 3)) - 1); - rc = __linearize(ctxt, addr, &max_size, 1, false, true, mode, &linear); + rc = __linearize(ctxt, addr, &max_size, 1, false, true, ctxt->mode, &linear); if (rc == X86EMUL_CONTINUE) ctxt->_eip = addr.ea; return rc; }
+static inline int emulator_recalc_and_set_mode(struct x86_emulate_ctxt *ctxt) +{ + u64 efer; + struct desc_struct cs; + u16 selector; + u32 base3; + + ctxt->ops->get_msr(ctxt, MSR_EFER, &efer); + + if (!(ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PE)) { + /* Real mode. cpu must not have long mode active */ + if (efer & EFER_LMA) + return X86EMUL_UNHANDLEABLE; + ctxt->mode = X86EMUL_MODE_REAL; + return X86EMUL_CONTINUE; + } + + if (ctxt->eflags & X86_EFLAGS_VM) { + /* Protected/VM86 mode. cpu must not have long mode active */ + if (efer & EFER_LMA) + return X86EMUL_UNHANDLEABLE; + ctxt->mode = X86EMUL_MODE_VM86; + return X86EMUL_CONTINUE; + } + + if (!ctxt->ops->get_segment(ctxt, &selector, &cs, &base3, VCPU_SREG_CS)) + return X86EMUL_UNHANDLEABLE; + + if (efer & EFER_LMA) { + if (cs.l) { + /* Proper long mode */ + ctxt->mode = X86EMUL_MODE_PROT64; + } else if (cs.d) { + /* 32 bit compatibility mode*/ + ctxt->mode = X86EMUL_MODE_PROT32; + } else { + ctxt->mode = X86EMUL_MODE_PROT16; + } + } else { + /* Legacy 32 bit / 16 bit mode */ + ctxt->mode = cs.d ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16; + } + + return X86EMUL_CONTINUE; +} + static inline int assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst) { - return assign_eip(ctxt, dst, ctxt->mode); + return assign_eip(ctxt, dst); }
-static int assign_eip_far(struct x86_emulate_ctxt *ctxt, ulong dst, - const struct desc_struct *cs_desc) +static int assign_eip_far(struct x86_emulate_ctxt *ctxt, ulong dst) { - enum x86emul_mode mode = ctxt->mode; - int rc; + int rc = emulator_recalc_and_set_mode(ctxt);
-#ifdef CONFIG_X86_64 - if (ctxt->mode >= X86EMUL_MODE_PROT16) { - if (cs_desc->l) { - u64 efer = 0; + if (rc != X86EMUL_CONTINUE) + return rc;
- ctxt->ops->get_msr(ctxt, MSR_EFER, &efer); - if (efer & EFER_LMA) - mode = X86EMUL_MODE_PROT64; - } else - mode = X86EMUL_MODE_PROT32; /* temporary value */ - } -#endif - if (mode == X86EMUL_MODE_PROT16 || mode == X86EMUL_MODE_PROT32) - mode = cs_desc->d ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16; - rc = assign_eip(ctxt, dst, mode); - if (rc == X86EMUL_CONTINUE) - ctxt->mode = mode; - return rc; + return assign_eip(ctxt, dst); }
static inline int jmp_rel(struct x86_emulate_ctxt *ctxt, int rel) @@ -2153,7 +2182,7 @@ static int em_jmp_far(struct x86_emulate if (rc != X86EMUL_CONTINUE) return rc;
- rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc); + rc = assign_eip_far(ctxt, ctxt->src.val); /* Error handling is not implemented. */ if (rc != X86EMUL_CONTINUE) return X86EMUL_UNHANDLEABLE; @@ -2234,7 +2263,7 @@ static int em_ret_far(struct x86_emulate &new_desc); if (rc != X86EMUL_CONTINUE) return rc; - rc = assign_eip_far(ctxt, eip, &new_desc); + rc = assign_eip_far(ctxt, eip); /* Error handling is not implemented. */ if (rc != X86EMUL_CONTINUE) return X86EMUL_UNHANDLEABLE; @@ -3458,7 +3487,7 @@ static int em_call_far(struct x86_emulat if (rc != X86EMUL_CONTINUE) return rc;
- rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc); + rc = assign_eip_far(ctxt, ctxt->src.val); if (rc != X86EMUL_CONTINUE) goto fail;
From: Maxim Levitsky mlevitsk@redhat.com
commit 055f37f84e304e59c046d1accfd8f08462f52c4c upstream.
Update the emulation mode after RSM so that RIP will be correctly written back, because the RSM instruction can switch the CPU mode from 32 bit (or less) to 64 bit.
This fixes a guest crash in case the #SMI is received while the guest runs a code from an address > 32 bit.
Signed-off-by: Maxim Levitsky mlevitsk@redhat.com Message-Id: 20221025124741.228045-13-mlevitsk@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/emulate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2646,7 +2646,7 @@ static int em_rsm(struct x86_emulate_ctx * those side effects need to be explicitly handled for both success * and shutdown. */ - return X86EMUL_CONTINUE; + return emulator_recalc_and_set_mode(ctxt);
emulate_shutdown: ctxt->ops->triple_fault(ctxt);
From: Maxim Levitsky mlevitsk@redhat.com
commit ad8f9e69942c7db90758d9d774157e53bce94840 upstream.
Update the emulation mode when handling writes to CR0, because toggling CR0.PE switches between Real and Protected Mode, and toggling CR0.PG when EFER.LME=1 switches between Long and Protected Mode.
This is likely a benign bug because there is no writeback of state, other than the RIP increment, and when toggling CR0.PE, the CPU has to execute code from a very low memory address.
Signed-off-by: Maxim Levitsky mlevitsk@redhat.com Message-Id: 20221025124741.228045-14-mlevitsk@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/emulate.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3629,11 +3629,25 @@ static int em_movbe(struct x86_emulate_c
static int em_cr_write(struct x86_emulate_ctxt *ctxt) { - if (ctxt->ops->set_cr(ctxt, ctxt->modrm_reg, ctxt->src.val)) + int cr_num = ctxt->modrm_reg; + int r; + + if (ctxt->ops->set_cr(ctxt, cr_num, ctxt->src.val)) return emulate_gp(ctxt, 0);
/* Disable writeback. */ ctxt->dst.type = OP_NONE; + + if (cr_num == 0) { + /* + * CR0 write might have updated CR0.PE and/or CR0.PG + * which can affect the cpu's execution mode. + */ + r = emulator_recalc_and_set_mode(ctxt); + if (r != X86EMUL_CONTINUE) + return r; + } + return X86EMUL_CONTINUE; }
From: Sumit Garg sumit.garg@linaro.org
Commit 056d3fed3d1f ("tee: add tee_shm_register_{user,kernel}_buf()") refactored tee_shm_register() into corresponding user and kernel space functions named tee_shm_register_{user,kernel}_buf(). The upstream fix commit 573ae4f13f63 ("tee: add overflow check in register_shm_helper()") only applied to tee_shm_register_user_buf().
But the stable kernel 4.19, 5.4, 5.10 and 5.15 don't have the above mentioned tee_shm_register() refactoring commit. Hence a direct backport wasn't possible and the fix has to be rather applied to tee_ioctl_shm_register().
Somehow the fix was correctly backported to 4.19 and 5.4 stable kernels but the backports for 5.10 and 5.15 stable kernels were broken as fix was applied to common tee_shm_register() function which broke its kernel space users such as trusted keys driver.
Fortunately the backport for 5.10 stable kernel was incidently fixed by: commit 606fe84a4185 ("tee: fix memory leak in tee_shm_register()"). So fix the backport for 5.15 stable kernel as well.
Fixes: 578c349570d2 ("tee: add overflow check in register_shm_helper()") Cc: stable@vger.kernel.org # 5.15 Reported-by: Sahil Malhotra sahil.malhotra@nxp.com Signed-off-by: Sumit Garg sumit.garg@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tee/tee_core.c | 3 +++ drivers/tee/tee_shm.c | 3 --- 2 files changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/tee/tee_core.c +++ b/drivers/tee/tee_core.c @@ -334,6 +334,9 @@ tee_ioctl_shm_register(struct tee_contex if (data.flags) return -EINVAL;
+ if (!access_ok((void __user *)(unsigned long)data.addr, data.length)) + return -EFAULT; + shm = tee_shm_register(ctx, data.addr, data.length, TEE_SHM_DMA_BUF | TEE_SHM_USER_MAPPED); if (IS_ERR(shm)) --- a/drivers/tee/tee_shm.c +++ b/drivers/tee/tee_shm.c @@ -223,9 +223,6 @@ struct tee_shm *tee_shm_register(struct goto err; }
- if (!access_ok((void __user *)addr, length)) - return ERR_PTR(-EFAULT); - mutex_lock(&teedev->mutex); shm->id = idr_alloc(&teedev->idr, shm, 1, 0, GFP_KERNEL); mutex_unlock(&teedev->mutex);
From: Matthew Wilcox (Oracle) willy@infradead.org
commit 4fa0e3ff217f775cb58d2d6d51820ec519243fb9 upstream.
The recent change of page_cache_ra_unbounded() arguments was buggy in the two callers, causing us to readahead the wrong pages. Move the definition of ractl down to after the index is set correctly. This affected performance on configurations that use fs-verity.
Link: https://lkml.kernel.org/r/20221012193419.1453558-1-willy@infradead.org Fixes: 73bb49da50cd ("mm/readahead: make page_cache_ra_unbounded take a readahead_control") Signed-off-by: Matthew Wilcox (Oracle) willy@infradead.org Reported-by: Jintao Yin nicememory@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Cc: Eric Biggers ebiggers@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/verity.c | 3 ++- fs/f2fs/verity.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-)
--- a/fs/ext4/verity.c +++ b/fs/ext4/verity.c @@ -364,13 +364,14 @@ static struct page *ext4_read_merkle_tre pgoff_t index, unsigned long num_ra_pages) { - DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, index); struct page *page;
index += ext4_verity_metadata_pos(inode) >> PAGE_SHIFT;
page = find_get_page_flags(inode->i_mapping, index, FGP_ACCESSED); if (!page || !PageUptodate(page)) { + DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, index); + if (page) put_page(page); else if (num_ra_pages > 1) --- a/fs/f2fs/verity.c +++ b/fs/f2fs/verity.c @@ -261,13 +261,14 @@ static struct page *f2fs_read_merkle_tre pgoff_t index, unsigned long num_ra_pages) { - DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, index); struct page *page;
index += f2fs_verity_metadata_pos(inode) >> PAGE_SHIFT;
page = find_get_page_flags(inode->i_mapping, index, FGP_ACCESSED); if (!page || !PageUptodate(page)) { + DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, index); + if (page) put_page(page); else if (num_ra_pages > 1)
From: Ronnie Sahlberg lsahlber@redhat.com
commit 2f6f19c7aaad5005dc75298a413eb0243c5d312d upstream.
BZ: 215375
Fixes: 76a3c92ec9e0 ("cifs: remove support for NTLM and weaker authentication algorithms") Reviewed-by: Paulo Alcantara (SUSE) pc@cjr.nz Signed-off-by: Ronnie Sahlberg lsahlber@redhat.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/connect.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
--- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -3711,12 +3711,11 @@ CIFSTCon(const unsigned int xid, struct pSMB->AndXCommand = 0xFF; pSMB->Flags = cpu_to_le16(TCON_EXTENDED_SECINFO); bcc_ptr = &pSMB->Password[0]; - if (tcon->pipe || (ses->server->sec_mode & SECMODE_USER)) { - pSMB->PasswordLength = cpu_to_le16(1); /* minimum */ - *bcc_ptr = 0; /* password is null byte */ - bcc_ptr++; /* skip password */ - /* already aligned so no need to do it below */ - } + + pSMB->PasswordLength = cpu_to_le16(1); /* minimum */ + *bcc_ptr = 0; /* password is null byte */ + bcc_ptr++; /* skip password */ + /* already aligned so no need to do it below */
if (ses->server->sign) smb_buffer->Flags2 |= SMBFLG2_SECURITY_SIGNATURE;
From: Brian Norris briannorris@chromium.org
commit 0be67e0556e469c57100ffe3c90df90abc796f3b upstream.
If we fail to attach the first time (especially: EPROBE_DEFER), we fail to clean up 'usage_mode', and thus will fail to attach on any subsequent attempts, with "dsi controller already in use".
Re-set to DW_DSI_USAGE_IDLE on attach failure.
This is especially common to hit when enabling asynchronous probe on a duel-DSI system (such as RK3399 Gru/Scarlet), such that we're more likely to fail dw_mipi_dsi_rockchip_find_second() the first time.
Fixes: 71f68fe7f121 ("drm/rockchip: dsi: add ability to work as a phy instead of full dsi") Cc: stable@vger.kernel.org Signed-off-by: Brian Norris briannorris@chromium.org Signed-off-by: Heiko Stuebner heiko@sntech.de Link: https://patchwork.freedesktop.org/patch/msgid/20221019170255.1.Ia68dfb27b835... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c +++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c @@ -1027,23 +1027,31 @@ static int dw_mipi_dsi_rockchip_host_att if (ret) { DRM_DEV_ERROR(dsi->dev, "Failed to register component: %d\n", ret); - return ret; + goto out; }
second = dw_mipi_dsi_rockchip_find_second(dsi); - if (IS_ERR(second)) - return PTR_ERR(second); + if (IS_ERR(second)) { + ret = PTR_ERR(second); + goto out; + } if (second) { ret = component_add(second, &dw_mipi_dsi_rockchip_ops); if (ret) { DRM_DEV_ERROR(second, "Failed to register component: %d\n", ret); - return ret; + goto out; } }
return 0; + +out: + mutex_lock(&dsi->usage_mutex); + dsi->usage_mode = DW_DSI_USAGE_IDLE; + mutex_unlock(&dsi->usage_mutex); + return ret; }
static int dw_mipi_dsi_rockchip_host_detach(void *priv_data,
From: Brian Norris briannorris@chromium.org
commit 81e592f86f7afdb76d655e7fbd7803d7b8f985d8 upstream.
We can't safely probe a dual-DSI display asynchronously (driver_async_probe='*' or driver_async_probe='dw-mipi-dsi-rockchip' cmdline), because dw_mipi_dsi_rockchip_find_second() pokes one DSI device's drvdata from the other device without any locking.
Request synchronous probe, at least until this driver learns some appropriate locking for dual-DSI initialization.
Cc: stable@vger.kernel.org Signed-off-by: Brian Norris briannorris@chromium.org Signed-off-by: Heiko Stuebner heiko@sntech.de Link: https://patchwork.freedesktop.org/patch/msgid/20221019170255.2.I6b985b0ca372... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c +++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c @@ -1638,5 +1638,11 @@ struct platform_driver dw_mipi_dsi_rockc .of_match_table = dw_mipi_dsi_rockchip_dt_ids, .pm = &dw_mipi_dsi_rockchip_pm_ops, .name = "dw-mipi-dsi-rockchip", + /* + * For dual-DSI display, one DSI pokes at the other DSI's + * drvdata in dw_mipi_dsi_rockchip_find_second(). This is not + * safe for asynchronous probe. + */ + .probe_type = PROBE_FORCE_SYNCHRONOUS, }, };
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit 3e206b6aa6df7eed4297577e0cf8403169b800a2 upstream.
We try to filter out the corresponding xxx1 output if the xxx0 output is not present. But the way that is being done is pretty awkward. Make it less so.
Cc: stable@vger.kernel.org Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20221026101134.20865-2-ville.s... Reviewed-by: Jani Nikula jani.nikula@intel.com (cherry picked from commit cc1e66394daaa7e9f005e2487a84e34a39f9308b) Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_sdvo.c | 27 ++++++++++++++++++++++----- 1 file changed, 22 insertions(+), 5 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_sdvo.c +++ b/drivers/gpu/drm/i915/display/intel_sdvo.c @@ -2939,16 +2939,33 @@ err: return false; }
+static u16 intel_sdvo_filter_output_flags(u16 flags) +{ + flags &= SDVO_OUTPUT_MASK; + + /* SDVO requires XXX1 function may not exist unless it has XXX0 function.*/ + if (!(flags & SDVO_OUTPUT_TMDS0)) + flags &= ~SDVO_OUTPUT_TMDS1; + + if (!(flags & SDVO_OUTPUT_RGB0)) + flags &= ~SDVO_OUTPUT_RGB1; + + if (!(flags & SDVO_OUTPUT_LVDS0)) + flags &= ~SDVO_OUTPUT_LVDS1; + + return flags; +} + static bool intel_sdvo_output_setup(struct intel_sdvo *intel_sdvo, u16 flags) { - /* SDVO requires XXX1 function may not exist unless it has XXX0 function.*/ + flags = intel_sdvo_filter_output_flags(flags);
if (flags & SDVO_OUTPUT_TMDS0) if (!intel_sdvo_dvi_init(intel_sdvo, 0)) return false;
- if ((flags & SDVO_TMDS_MASK) == SDVO_TMDS_MASK) + if (flags & SDVO_OUTPUT_TMDS1) if (!intel_sdvo_dvi_init(intel_sdvo, 1)) return false;
@@ -2969,7 +2986,7 @@ intel_sdvo_output_setup(struct intel_sdv if (!intel_sdvo_analog_init(intel_sdvo, 0)) return false;
- if ((flags & SDVO_RGB_MASK) == SDVO_RGB_MASK) + if (flags & SDVO_OUTPUT_RGB1) if (!intel_sdvo_analog_init(intel_sdvo, 1)) return false;
@@ -2977,11 +2994,11 @@ intel_sdvo_output_setup(struct intel_sdv if (!intel_sdvo_lvds_init(intel_sdvo, 0)) return false;
- if ((flags & SDVO_LVDS_MASK) == SDVO_LVDS_MASK) + if (flags & SDVO_OUTPUT_LVDS1) if (!intel_sdvo_lvds_init(intel_sdvo, 1)) return false;
- if ((flags & SDVO_OUTPUT_MASK) == 0) { + if (flags == 0) { unsigned char bytes[2];
intel_sdvo->controlled_output = 0;
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit e79762512120f11c51317570519a1553c70805d8 upstream.
Call intel_sdvo_select_ddc_bus() before initializing any of the outputs. And before that is functional (assuming no VBT) we have to set up the controlled_outputs thing. Otherwise DDC won't be functional during the output init but LVDS really needs it for the fixed mode setup.
Note that the whole multi output support still looks very bogus, and more work will be needed to make it correct. But for now this should at least fix the LVDS EDID fixed mode setup.
Cc: stable@vger.kernel.org Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7301 Fixes: aa2b88074a56 ("drm/i915/sdvo: Fix multi function encoder stuff") Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20221026101134.20865-3-ville.s... Reviewed-by: Jani Nikula jani.nikula@intel.com (cherry picked from commit 64b7b557dc8a96d9cfed6aedbf81de2df80c025d) Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_sdvo.c | 31 +++++++++++------------------- 1 file changed, 12 insertions(+), 19 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_sdvo.c +++ b/drivers/gpu/drm/i915/display/intel_sdvo.c @@ -2762,13 +2762,10 @@ intel_sdvo_dvi_init(struct intel_sdvo *i if (!intel_sdvo_connector) return false;
- if (device == 0) { - intel_sdvo->controlled_output |= SDVO_OUTPUT_TMDS0; + if (device == 0) intel_sdvo_connector->output_flag = SDVO_OUTPUT_TMDS0; - } else if (device == 1) { - intel_sdvo->controlled_output |= SDVO_OUTPUT_TMDS1; + else if (device == 1) intel_sdvo_connector->output_flag = SDVO_OUTPUT_TMDS1; - }
intel_connector = &intel_sdvo_connector->base; connector = &intel_connector->base; @@ -2823,7 +2820,6 @@ intel_sdvo_tv_init(struct intel_sdvo *in encoder->encoder_type = DRM_MODE_ENCODER_TVDAC; connector->connector_type = DRM_MODE_CONNECTOR_SVIDEO;
- intel_sdvo->controlled_output |= type; intel_sdvo_connector->output_flag = type;
if (intel_sdvo_connector_init(intel_sdvo_connector, intel_sdvo) < 0) { @@ -2864,13 +2860,10 @@ intel_sdvo_analog_init(struct intel_sdvo encoder->encoder_type = DRM_MODE_ENCODER_DAC; connector->connector_type = DRM_MODE_CONNECTOR_VGA;
- if (device == 0) { - intel_sdvo->controlled_output |= SDVO_OUTPUT_RGB0; + if (device == 0) intel_sdvo_connector->output_flag = SDVO_OUTPUT_RGB0; - } else if (device == 1) { - intel_sdvo->controlled_output |= SDVO_OUTPUT_RGB1; + else if (device == 1) intel_sdvo_connector->output_flag = SDVO_OUTPUT_RGB1; - }
if (intel_sdvo_connector_init(intel_sdvo_connector, intel_sdvo) < 0) { kfree(intel_sdvo_connector); @@ -2900,13 +2893,10 @@ intel_sdvo_lvds_init(struct intel_sdvo * encoder->encoder_type = DRM_MODE_ENCODER_LVDS; connector->connector_type = DRM_MODE_CONNECTOR_LVDS;
- if (device == 0) { - intel_sdvo->controlled_output |= SDVO_OUTPUT_LVDS0; + if (device == 0) intel_sdvo_connector->output_flag = SDVO_OUTPUT_LVDS0; - } else if (device == 1) { - intel_sdvo->controlled_output |= SDVO_OUTPUT_LVDS1; + else if (device == 1) intel_sdvo_connector->output_flag = SDVO_OUTPUT_LVDS1; - }
if (intel_sdvo_connector_init(intel_sdvo_connector, intel_sdvo) < 0) { kfree(intel_sdvo_connector); @@ -2959,8 +2949,14 @@ static u16 intel_sdvo_filter_output_flag static bool intel_sdvo_output_setup(struct intel_sdvo *intel_sdvo, u16 flags) { + struct drm_i915_private *i915 = to_i915(intel_sdvo->base.base.dev); + flags = intel_sdvo_filter_output_flags(flags);
+ intel_sdvo->controlled_output = flags; + + intel_sdvo_select_ddc_bus(i915, intel_sdvo); + if (flags & SDVO_OUTPUT_TMDS0) if (!intel_sdvo_dvi_init(intel_sdvo, 0)) return false; @@ -3001,7 +2997,6 @@ intel_sdvo_output_setup(struct intel_sdv if (flags == 0) { unsigned char bytes[2];
- intel_sdvo->controlled_output = 0; memcpy(bytes, &intel_sdvo->caps.output_flags, 2); DRM_DEBUG_KMS("%s: Unknown SDVO output type (0x%02x%02x)\n", SDVO_NAME(intel_sdvo), @@ -3413,8 +3408,6 @@ bool intel_sdvo_init(struct drm_i915_pri */ intel_sdvo->base.cloneable = 0;
- intel_sdvo_select_ddc_bus(dev_priv, intel_sdvo); - /* Set the input timing to the screen. Assume always input 0. */ if (!intel_sdvo_set_target_input(intel_sdvo)) goto err_output;
From: Dokyung Song dokyung.song@gmail.com
commit 6788ba8aed4e28e90f72d68a9d794e34eac17295 upstream.
This patch fixes an intra-object buffer overflow in brcmfmac that occurs when the device provides a 'bsscfgidx' equal to or greater than the buffer size. The patch adds a check that leads to a safe failure if that is the case.
This fixes CVE-2022-3628.
UBSAN: array-index-out-of-bounds in drivers/net/wireless/broadcom/brcm80211/brcmfmac/fweh.c index 52 is out of range for type 'brcmf_if *[16]' CPU: 0 PID: 1898 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Workqueue: events brcmf_fweh_event_worker Call Trace: dump_stack_lvl+0x57/0x7d ubsan_epilogue+0x5/0x40 __ubsan_handle_out_of_bounds+0x69/0x80 ? memcpy+0x39/0x60 brcmf_fweh_event_worker+0xae1/0xc00 ? brcmf_fweh_call_event_handler.isra.0+0x100/0x100 ? rcu_read_lock_sched_held+0xa1/0xd0 ? rcu_read_lock_bh_held+0xb0/0xb0 ? lockdep_hardirqs_on_prepare+0x273/0x3e0 process_one_work+0x873/0x13e0 ? lock_release+0x640/0x640 ? pwq_dec_nr_in_flight+0x320/0x320 ? rwlock_bug.part.0+0x90/0x90 worker_thread+0x8b/0xd10 ? __kthread_parkme+0xd9/0x1d0 ? process_one_work+0x13e0/0x13e0 kthread+0x379/0x450 ? _raw_spin_unlock_irq+0x24/0x30 ? set_kthread_struct+0x100/0x100 ret_from_fork+0x1f/0x30 ================================================================================ general protection fault, probably for non-canonical address 0xe5601c0020023fff: 0000 [#1] SMP KASAN KASAN: maybe wild-memory-access in range [0x2b0100010011fff8-0x2b0100010011ffff] CPU: 0 PID: 1898 Comm: kworker/0:2 Tainted: G O 5.14.0+ #132 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Workqueue: events brcmf_fweh_event_worker RIP: 0010:brcmf_fweh_call_event_handler.isra.0+0x42/0x100 Code: 89 f5 53 48 89 fb 48 83 ec 08 e8 79 0b 38 fe 48 85 ed 74 7e e8 6f 0b 38 fe 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 8b 00 00 00 4c 8b 7d 00 44 89 e0 48 ba 00 00 00 RSP: 0018:ffffc9000259fbd8 EFLAGS: 00010207 RAX: dffffc0000000000 RBX: ffff888115d8cd50 RCX: 0000000000000000 RDX: 0560200020023fff RSI: ffffffff8304bc91 RDI: ffff888115d8cd50 RBP: 2b0100010011ffff R08: ffff888112340050 R09: ffffed1023549809 R10: ffff88811aa4c047 R11: ffffed1023549808 R12: 0000000000000045 R13: ffffc9000259fca0 R14: ffff888112340050 R15: ffff888112340000 FS: 0000000000000000(0000) GS:ffff88811aa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000004053ccc0 CR3: 0000000112740000 CR4: 0000000000750ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: brcmf_fweh_event_worker+0x117/0xc00 ? brcmf_fweh_call_event_handler.isra.0+0x100/0x100 ? rcu_read_lock_sched_held+0xa1/0xd0 ? rcu_read_lock_bh_held+0xb0/0xb0 ? lockdep_hardirqs_on_prepare+0x273/0x3e0 process_one_work+0x873/0x13e0 ? lock_release+0x640/0x640 ? pwq_dec_nr_in_flight+0x320/0x320 ? rwlock_bug.part.0+0x90/0x90 worker_thread+0x8b/0xd10 ? __kthread_parkme+0xd9/0x1d0 ? process_one_work+0x13e0/0x13e0 kthread+0x379/0x450 ? _raw_spin_unlock_irq+0x24/0x30 ? set_kthread_struct+0x100/0x100 ret_from_fork+0x1f/0x30 Modules linked in: 88XXau(O) 88x2bu(O) ---[ end trace 41d302138f3ff55a ]--- RIP: 0010:brcmf_fweh_call_event_handler.isra.0+0x42/0x100 Code: 89 f5 53 48 89 fb 48 83 ec 08 e8 79 0b 38 fe 48 85 ed 74 7e e8 6f 0b 38 fe 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 8b 00 00 00 4c 8b 7d 00 44 89 e0 48 ba 00 00 00 RSP: 0018:ffffc9000259fbd8 EFLAGS: 00010207 RAX: dffffc0000000000 RBX: ffff888115d8cd50 RCX: 0000000000000000 RDX: 0560200020023fff RSI: ffffffff8304bc91 RDI: ffff888115d8cd50 RBP: 2b0100010011ffff R08: ffff888112340050 R09: ffffed1023549809 R10: ffff88811aa4c047 R11: ffffed1023549808 R12: 0000000000000045 R13: ffffc9000259fca0 R14: ffff888112340050 R15: ffff888112340000 FS: 0000000000000000(0000) GS:ffff88811aa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000004053ccc0 CR3: 0000000112740000 CR4: 0000000000750ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Kernel panic - not syncing: Fatal exception
Reported-by: Dokyung Song dokyungs@yonsei.ac.kr Reported-by: Jisoo Jang jisoo.jang@yonsei.ac.kr Reported-by: Minsuk Kang linuxlovemin@yonsei.ac.kr Reviewed-by: Arend van Spriel aspriel@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Dokyung Song dokyung.song@gmail.com Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://lore.kernel.org/r/20221021061359.GA550858@laguna Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/fweh.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fweh.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fweh.c @@ -228,6 +228,10 @@ static void brcmf_fweh_event_worker(stru brcmf_fweh_event_name(event->code), event->code, event->emsg.ifidx, event->emsg.bsscfgidx, event->emsg.addr); + if (event->emsg.bsscfgidx >= BRCMF_MAX_IFS) { + bphy_err(drvr, "invalid bsscfg index: %u\n", event->emsg.bsscfgidx); + goto event_free; + }
/* convert event message */ emsg_be = &event->emsg;
On 11/8/2022 5:37 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.78 release. There are 144 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.78-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli f.fainelli@gmail.com
Hi,
On Tue, Nov 8, 2022 at 8:59 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.15.78 release. There are 144 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
5.15.78-rc1 compiled and booted on my x86_64 test system. No errors or regressions.
Tested-by: Slade Watkins srw@sladewatkins.net
Best, -srw
This is the start of the stable review cycle for the 5.15.78 release. There are 144 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.78-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
Compiled and booted on my x86_64 and ARM64 test systems. No errors or regressions.
Tested-by: Allen Pais apais@linux.microsoft.com
Thanks.
On Tue, Nov 08, 2022 at 02:37:57PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.78 release. There are 144 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
Build results: total: 159 pass: 159 fail: 0 Qemu test results: total: 489 pass: 489 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Tue, Nov 08, 2022 at 02:37:57PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.78 release. There are 144 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully cross-compiled for arm64 (bcm2711_defconfig, GCC 10.2.0) and powerpc (ps3_defconfig, GCC 12.2.0).
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
On 11/8/22 5:37 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.78 release. There are 144 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.78-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On Tue, 8 Nov 2022 at 19:29, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.15.78 release. There are 144 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.78-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro's test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
NOTE: 1) following warning noticed while booting x86 machine with kselftest configs enabled. this is always reproducible.
[ 1.465970] RETBleed: Mitigation: IBRS [ 1.466971] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier [ 1.467970] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp [ 1.468983] MDS: Mitigation: Clear CPU buffers [ 1.469970] TAA: Mitigation: Clear CPU buffers [ 1.470970] MMIO Stale Data: Mitigation: Clear CPU buffers [ 1.471972] SRBDS: Mitigation: Microcode [ 1.499045] ------------[ cut here ]------------ [ 1.499970] missing return thunk: lkdtm_rodata_do_nothing+0x0/0x10-lkdtm_rodata_do_nothing+0x5/0x10: e9 00 00 00 00 [ 1.499978] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:557 apply_returns+0x1d4/0x210 [ 1.501970] Modules linked in: [ 1.502970] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.78-rc1 #1 [ 1.503970] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.5 11/26/2020 [ 1.504970] RIP: 0010:apply_returns+0x1d4/0x210
Full test log link, - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.15.y/build/v5.15....
metadata: build_name: gcc-11-lkftconfig-kselftest-kernel git_ref: linux-5.15.y git_repo: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc git_sha: f98185b81e483128439a76d6e64217b606c09bed git_describe: v5.15.77-145-gf98185b81e48 kernel_version: 5.15.78-rc1 kernel-config: https://builds.tuxbuild.com/2HGeJ0HPMmHQP8DQlZikW685I9F/config build-url: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc/-/pipelines/68... artifact-location: https://builds.tuxbuild.com/2HGeJ0HPMmHQP8DQlZikW685I9F toolchain: gcc-11
2) The arm x15 device kernel crash log was shared on another email thread.
[ 5.393585] Internal error: Oops: 5 [#1] SMP ARM [ 7.917999] WARNING: CPU: 0 PID: 8 at kernel/sched/core.c:9542 __might_sleep+0xa8/0xac [ 10.529235] kernel BUG at kernel/sched/core.c:6360!
Email thread link, - https://lore.kernel.org/stable/CA+G9fYt49jY+sAqHXYwpJtF0oa-jL8t8nArY6W1_zui0...
## Build * kernel: 5.15.78-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-5.15.y * git commit: f98185b81e483128439a76d6e64217b606c09bed * git describe: v5.15.77-145-gf98185b81e48 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.15.y/build/v5.15....
## Test Regressions (compared to v5.15.76-133-g55ed865a9c8f)
## Metric Regressions (compared to v5.15.76-133-g55ed865a9c8f)
## Test Fixes (compared to v5.15.76-133-g55ed865a9c8f)
## Metric Fixes (compared to v5.15.76-133-g55ed865a9c8f)
## Test result summary total: 154331, pass: 129988, fail: 4055, skip: 19753, xfail: 535
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 149 total, 148 passed, 1 failed * arm64: 47 total, 45 passed, 2 failed * i386: 37 total, 35 passed, 2 failed * mips: 27 total, 27 passed, 0 failed * parisc: 6 total, 6 passed, 0 failed * powerpc: 30 total, 30 passed, 0 failed * riscv: 10 total, 10 passed, 0 failed * s390: 12 total, 12 passed, 0 failed * sh: 12 total, 12 passed, 0 failed * sparc: 6 total, 6 passed, 0 failed * x86_64: 40 total, 38 passed, 2 failed
## Test suites summary * fwts * igt-gpu-tools * kself[ * kselftest-android * kselftest-arm64 * kselftest-arm64/arm64.btitest.bti_c_func * kselftest-arm64/arm64.btitest.bti_j_func * kselftest-arm64/arm64.btitest.bti_jc_func * kselftest-arm64/arm64.btitest.bti_none_func * kselftest-arm64/arm64.btitest.nohint_func * kselftest-arm64/arm64.btitest.paciasp_func * kselftest-arm64/arm64.nobtitest.bti_c_func * kselftest-arm64/arm64.nobtitest.bti_j_func * kselftest-arm64/arm64.nobtitest.bti_jc_func * kselftest-arm64/arm64.nobtitest.bti_none_func * kselftest-arm64/arm64.nobtitest.nohint_func * kselftest-arm64/arm64.nobtitest.paciasp_func * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-fsx * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-open-posix-tests * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-syscalls * ltp-tracing * network-basic-tests * packetdrill * perf * perf/Zstd-perf.data-compression * rcutorture * v4l2-compliance * vdso
-- Linaro LKFT https://lkft.linaro.org
On 11/8/22 06:37, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.78 release. There are 144 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.78-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
Hi Greg,
On Tue, Nov 08, 2022 at 02:37:57PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.78 release. There are 144 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Nov 2022 13:33:17 +0000. Anything received after that time might be too late.
Build test (gcc version 12.2.1 20221016): mips: 62 configs -> no failure arm: 99 configs -> no failure arm64: 3 configs -> no failure x86_64: 4 configs -> no failure alpha allmodconfig -> no failure csky allmodconfig -> no failure powerpc allmodconfig -> no failure riscv allmodconfig -> no failure s390 allmodconfig -> no failure xtensa allmodconfig -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression. [1] arm64: Booted on rpi4b (4GB model). No regression. [2]
[1]. https://openqa.qa.codethink.co.uk/tests/2128 [2]. https://openqa.qa.codethink.co.uk/tests/2134
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
linux-stable-mirror@lists.linaro.org